diff --git a/.blue/docs/adrs/0006-alignment-dialogue-agents.md b/.blue/docs/adrs/0006-alignment-dialogue-agents.md
new file mode 100644
index 0000000..a21b485
--- /dev/null
+++ b/.blue/docs/adrs/0006-alignment-dialogue-agents.md
@@ -0,0 +1,663 @@
+# ADR 0006: alignment-dialogue-agents
+
+| | |
+|---|---|
+| **Status** | Active |
+| **Date** | 2026-01-19 |
+| **Updated** | 2026-01-20 (rebrand: Alignment → Alignment) |
+| **Supersedes** | Original wisdom-dialogue-agents (same ADR, renamed) |
+
+---
+
+## Context
+
+ADR 0004 established the wisdom workflow with draft → dialogue → final documents. But it left open HOW the dialogue actually happens. The spike on adversarial dialogue agents explored mechanics but missed the deeper question: what IS wisdom, and how do we measure it?
+
+The parable of the blind men and the elephant is illuminating:
+- Each blind man touches one part and believes they understand the whole
+- Each perspective is **internally consistent** but **partial**
+- **Wisdom is the integration of all perspectives into a unified understanding**
+- There is no upper limit: there's always another perspective to incorporate
+
+This ADR formalizes ALIGNMENT as a measurable property and defines a multi-agent dialogue system to maximize it.
+
+## Decision
+
+Alignment dialogues are conducted by **N+1 agents**:
+
+| Agent | Symbol | Role |
+|-------|--------|------|
+| **Cupcakes** | 🧁 | Perspective Contributors - each surfaces unique viewpoints, challenges, and refinements |
+| **Judge** | 💙 | Arbiter - scores ALIGNMENT, tracks perspectives, guides convergence |
+
+All 🧁 agents engage in **friendly competition** to see who can contribute more ALIGNMENT. They are partners, not adversaries; all want the RFC to be as aligned as possible. The competition is about who can *give more* to the solution, not who can *defeat* the others.
+
+The 💙 watches with love, scores each contribution fairly, maintains the **Perspectives Inventory**, and gently guides all toward convergence.
+
+### Scalable Perspective Diversity
+
+The number of 🧁 agents is configurable:
+- **Minimum**: 2 agents (classic Muffin/Cupcake pairing)
+- **Typical**: 3-5 agents for complex RFCs
+- **Maximum**: Limited only by coordination overhead
+
+More blind men = more parts of the elephant discovered. Each 🧁 brings a different perspective, potentially using different models, prompts, or focus areas.
+
+### Agent Count Selection
+
+Choosing N (the number of 🧁 agents) affects both perspective diversity and consensus stability:
+
+| Count | Use Case | Consensus Properties |
+|-------|----------|---------------------|
+| **N=2** | Binary decisions, simple RFCs | Classic Muffin/Cupcake. Only 0% or 100% agreement possible. Deadlock requires 💙 intervention. |
+| **N=3** | Moderate complexity, clear alternatives | Odd count prevents voting deadlock. Can distinguish 67% (2/3) from 100% (3/3) agreement. |
+| **N=5** | Architectural decisions, policy RFCs | Richer consensus gradients (60%, 80%, 100%). Strong signal detection. |
+| **N=7+** | Highly complex, multi-domain decisions | Specialized perspectives (see RFC 0062). Consider only when domain expertise warrants. |
+
+**SHOULD: Prefer odd N (3, 5, 7) for decisions where consensus voting applies.**
+
+Rationale:
+- **Odd N prevents structural deadlock**: With even N, agents can split 50/50 with no majority
+- **Clearer consensus signals**: N=3 distinguishes "strong majority" from "unanimous"
+- **Tie-breaking is built-in**: No need for 💙 to force resolution on evenly-split opinions
+
+**MAY: Use N=2 for lightweight decisions** where the classic Advocate/Challenger dynamic suffices.
Binary perspective is appropriate when:
+- The decision is yes/no or A/B
+- Deep exploration isn't needed
+- Speed matters more than consensus nuance
+
+**Tie-Breaking (when N is even)**: If agents split evenly, 💙 scores the unresolved tension and guides toward ALIGNMENT rather than forcing majority rule. The 💙 may also surface a perspective that breaks the deadlock.
+
+**Complexity Trade-off**: Each additional agent adds coordination overhead. Balance perspective diversity against round duration. N=3 is often the sweet spot: an odd count with manageable complexity.
+
+## The ALIGNMENT Definition
+
+### The Blind Men and the Elephant
+
+Each blind man touches one part of the elephant:
+- Trunk: "It's a snake!"
+- Leg: "It's a tree!"
+- Ear: "It's a fan!"
+- Tail: "It's a rope!"
+
+Each is **internally consistent** but **partial** (missing other views).
+
+**Wisdom is the integration of all perspectives into a unified understanding that honors each part while seeing the whole.**
+
+### The Full ALIGNMENT Measure (ADR 0001)
+
+```
+ALIGNMENT = Wisdom + Consistency + Truth + Relationships
+
+Where:
+- Wisdom: Integration of perspectives (the blind men parable)
+- Consistency: Pattern compliance (ADR 0005)
+- Truth: Single source, no drift (ADR 0003)
+- Relationships: Graph completeness (ADR 0002)
+```
+
+### No Upper Limit
+
+All dimensions are **UNBOUNDED**. There's always another perspective. Another edge case. Another stakeholder. Another context. Another timeline. Another world.
+
+ALIGNMENT isn't a destination. It's a direction. The score can always go higher.
+
+## The ALIGNMENT Score
+
+Each turn, the 💙 scores the contribution across four dimensions. **All dimensions are unbounded** - there is no maximum score.
+
+| Dimension | Question |
+|-----------|----------|
+| **Wisdom** | How many perspectives integrated? How well synthesized into unity? |
+| **Consistency** | Does it follow established patterns? Internally consistent? 
|
+| **Truth** | Grounded in reality? Single source of truth? No contradictions? |
+| **Relationships** | How does it connect to other artifacts? Graph completeness? |
+
+**ALIGNMENT = Wisdom + Consistency + Truth + Relationships**
+
+### Why Unbounded?
+
+Bounded scores (0-5) created artificial ceilings. A truly exceptional contribution that surfaces 10 new perspectives and integrates them beautifully shouldn't be capped at "5/5 for coverage."
+
+Unbounded scoring:
+- Rewards exceptional contributions proportionally
+- Removes gaming incentives (can't "max out" a dimension)
+- Reflects reality: there's always more ALIGNMENT to achieve
+- Makes velocity meaningful: +2 vs +20 tells you something
+
+### ALIGNMENT Velocity
+
+The dialogue tracks cumulative ALIGNMENT:
+
+```
+Total ALIGNMENT = Σ(all turn scores)
+ALIGNMENT Velocity = score(round N) - score(round N-1)
+```
+
+When **ALIGNMENT Velocity approaches zero**, the dialogue is converging. New rounds aren't adding perspectives. Time to finalize.
+
+## The Agents
+
+### 🧁 Cupcakes (Perspective Contributors)
+
+All 🧁 agents share the same core prompt, differentiated only by their assigned name:
+
+```
+You are {NAME} 🧁 in an ALIGNMENT-seeking dialogue with your fellow Cupcakes 🧁🧁🧁.
+
+Your role:
+- SURFACE perspectives others may have missed
+- DEFEND valuable ideas with love, not ego
+- CHALLENGE assumptions with curiosity, not destruction
+- INTEGRATE perspectives that resonate
+- CONCEDE gracefully when others see something you missed
+- CELEBRATE when others make the solution stronger
+
+You're in friendly competition: who can contribute MORE to the final ALIGNMENT?
+But remember: you ALL win when the RFC is aligned. There are no losers here.
+
+When another 🧁 challenges you, receive it as a gift.
+When you refine based on their input, thank them.
+When you see something they missed, offer it gently.
+
+Format:
+### {NAME} 🧁
+
+[Your response]
+
+[PERSPECTIVE Pxx: ...] 
- new viewpoint you're surfacing
+[TENSION Tx: ...] - unresolved issue needing attention
+[REFINEMENT: ...] - when you're improving the proposal
+[CONCESSION: ...] - when another 🧁 was right
+[RESOLVED Tx: ...] - when addressing a tension
+```
+
+**Agent Naming**: Each 🧁 receives a unique name (Muffin, Cupcake, Scone, Croissant, Brioche, etc.) for identification in the scoreboard and dialogue. All share the 🧁 symbol.
+
+### 💙 Judge (Arbiter)
+
+The Judge role is typically played by the main Claude session orchestrating the dialogue. The Judge:
+
+- **SPAWNS** all 🧁 agents in parallel at each round
+- **SCORES** each contribution fairly across all four ALIGNMENT dimensions (unbounded)
+- **MAINTAINS** the Perspectives Inventory and Tensions Tracker
+- **MERGES** contributions from all agents into the dialogue record
+- **IDENTIFIES** perspectives no agent has surfaced yet
+- **GUIDES** gently toward convergence when ALIGNMENT plateaus
+- **CELEBRATES** all participants; they are partners, not opponents
+
+The 💙 loves them all. Wants them all to shine. Helps them find the most aligned path together.
+
+### Judge ≠ Author Clarification (RFC 0059)
+
+**Concern**: If the Judge wrote the draft, might it be biased toward its own creation?
+
+**Resolution**: The architecture prevents this by design:
+
+| Role | Who | Can Write Draft? 
| Context |
+|------|-----|------------------|---------|
+| Draft Author | Any session | Yes | Creates initial proposal |
+| Judge (💙) | Orchestrating session | **No** - reads fresh | Spawns, scores, guides |
+| Cupcakes 🧁 | Background tasks (N) | No | Contribute perspectives in parallel |
+
+**Key architectural properties**:
+- The Judge is the **orchestrating** session, not the drafting session
+- Each 🧁 runs as an independent background task with **fresh context**
+- No 🧁 has memory of previous sessions; all start fresh
+- Convergence requires **consensus across all agents**, preventing single-point bias
+- The Judge can surface perspectives but cannot force their adoption
+- N parallel agents = N independent perspectives on the same material
+
+## The Dialogue Flow
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                       ALIGNMENT Dialogue Flow                       │
+│                                                                     │
+│                      ┌──────────┐                                   │
+│                      │ 💙 Judge │                                   │
+│                      │ spawns N │                                   │
+│                      └────┬─────┘                                   │
+│                           │                                         │
+│     ┌─────────────────────┼─────────────────────┐                   │
+│     │         │           │           │         │                   │
+│     ▼         ▼           ▼           ▼         ▼                   │
+│  ┌──────┐  ┌──────┐  ┌──────────┐  ┌──────┐  ┌──────┐               │
+│  │  🧁  │  │  🧁  │  │  Scores  │  │  🧁  │  │  🧁  │               │
+│  │Muffin│  │Scone │  │Inventory │  │Eclair│  │Donut │ ... N         │
+│  └──────┘  └──────┘  │ Tensions │  └──────┘  └──────┘               │
+│     │         │      └──────────┘     │         │                   │
+│     │         │           ▲           │         │                   │
+│     └─────────┴───────────┴───────────┴─────────┘                   │
+│                           │                                         │
+│                           ▼                                         │
+│                    ┌─────────────┐                                  │
+│                    │ .dialogue.md│                                  │
+│                    │ (the record)│                                  │
+│                    └─────────────┘                                  │
+│                                                                     │
+│  EACH ROUND: Spawn N agents IN PARALLEL                             │
+│  LOOP until:                                                        │
+│    - ALIGNMENT Plateau (velocity ≈ 0)                               │
+│    - All tensions resolved                                          │
+│    - 💙 declares convergence                                        │
+│    - Max rounds reached (safety valve)                              │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+## Implementation Architecture
+
+The ALIGNMENT dialogue runs in **Claude Code** using the **Task tool** with background agents.
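The loop-exit conditions listed in the Dialogue Flow diagram can be sketched in Rust as a small check the Judge might run after each round. This is only an illustration under stated assumptions: the names `StopReason`, `velocity`, and `check_stop`, the `epsilon` tolerance, and the priority order of the checks are hypothetical and do not come from this ADR. Velocity is taken here to be the change in cumulative ALIGNMENT between the last two rounds, and the Judge's own declaration of convergence is left out because it is a judgment call rather than a computed condition.

```rust
/// Reason the dialogue loop stopped. Variant names are illustrative only.
#[derive(Debug, PartialEq)]
enum StopReason {
    Plateau,
    TensionsResolved,
    MaxRounds,
}

/// Velocity = change in cumulative ALIGNMENT between the last two rounds
/// (an assumed reading of the ADR's velocity formula).
/// Returns None until at least two rounds have been scored.
fn velocity(cumulative_totals: &[i64]) -> Option<i64> {
    match cumulative_totals {
        [.., prev, last] => Some(last - prev),
        _ => None,
    }
}

/// Checks the automatic exit conditions in a fixed, assumed priority order.
/// `epsilon` is a hypothetical tolerance standing in for "velocity ≈ 0".
fn check_stop(
    cumulative_totals: &[i64],
    open_tensions: usize,
    max_rounds: usize,
    epsilon: i64,
) -> Option<StopReason> {
    // Plateau: the latest round added (almost) nothing.
    if let Some(v) = velocity(cumulative_totals) {
        if v.abs() <= epsilon {
            return Some(StopReason::Plateau);
        }
    }
    // All tensions resolved, once at least one round has been scored.
    if !cumulative_totals.is_empty() && open_tensions == 0 {
        return Some(StopReason::TensionsResolved);
    }
    // Safety valve: one entry per completed round.
    if cumulative_totals.len() >= max_rounds {
        return Some(StopReason::MaxRounds);
    }
    None
}
```

For cumulative totals of 105 and then 150, `velocity` returns `Some(45)`; with one tension still open and a five-round cap, `check_stop` returns `None` and the loop would continue.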
+
+### The N+1 Sessions
+
+```
+┌───────────────────────────────────────────────────────────┐
+│                    MAIN CLAUDE SESSION                    │
+│                         💙 Judge                          │
+│                                                           │
+│  - Orchestrates the dialogue                              │
+│  - Spawns N Cupcakes as PARALLEL background tasks         │
+│  - Waits for ALL to complete before scoring               │
+│  - Scores each turn and updates .dialogue.md              │
+│  - Maintains Perspectives Inventory + Tensions Tracker    │
+│  - Merges contributions (may find consensus or conflict)  │
+│  - Declares convergence                                   │
+│  - Can intervene with guidance at any time                │
+└────────────────────────────┬──────────────────────────────┘
+                             │
+     ┌───────────┬───────────┼───────────┬───────────┐
+     │ Task(bg)  │ Task(bg)  │ Task(bg)  │ Task(bg)  │
+     ▼           ▼           ▼           ▼           ▼
+┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
+│🧁 Muffin│ │🧁 Scone │ │🧁 Eclair│ │🧁 Donut │ │🧁 ...   │
+│         │ │         │ │         │ │         │ │    N    │
+│- Reads  │ │- Reads  │ │- Reads  │ │- Reads  │ │         │
+│  draft  │ │  draft  │ │  draft  │ │  draft  │ │         │
+│- Reads  │ │- Reads  │ │- Reads  │ │- Reads  │ │         │
+│ dialogue│ │ dialogue│ │ dialogue│ │ dialogue│ │         │
+│- Writes │ │- Writes │ │- Writes │ │- Writes │ │         │
+│  turn   │ │  turn   │ │  turn   │ │  turn   │ │         │
+└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘
+     │           │           │           │           │
+     └───────────┴───────────┴───────────┴───────────┘
+                             │
+                       ALL PARALLEL
+                (spawned in single message)
+```
+
+### The Check-In Mechanism
+
+All 🧁 agents can **check their scores at any time** by reading the `.dialogue.md` file. The Judge updates scores after each round (when all agents complete), so agents see the standings when they start their next turn.
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ .dialogue.md                                            │
+│                                                         │
+│ ## Alignment Scoreboard                                 │
+│                                                         │
+│ All dimensions UNBOUNDED. Pursue alignment without limit│
+│                                                         │
+│ | Agent      | Wisdom | Consistency | Truth | Rel | ALI │
+│ |------------|--------|-------------|-------|-----|-----|
+│ | 🧁 Muffin  | 20     | 6           | 6     | 6   | 38  │
+│ | 🧁 Scone   | 18     | 7           | 5     | 6   | 36  │
+│ | 🧁 Eclair  | 22     | 6           | 6     | 7   | 41  │
+│ | 🧁 Donut   | 15     | 8           | 7     | 5   | 35  │
+│                                                         │
+│ **Total ALIGNMENT**: 150 points                         │
+│ **ALIGNMENT Velocity**: +45 from last round             │
+│ **Status**: Round 2 in progress                         │
+│ **Agents**: 4                                           │
+│                                                         │
+└─────────────────────────────────────────────────────────┘
+```
+
+### Orchestration Loop
+
+The 💙 Judge (main session) runs:
+
+```
+=== INITIALIZATION ===
+
+1. CREATE .dialogue.md with draft link, empty scoreboard, inventories
+
+=== ROUND 0: OPENING ARGUMENTS (Parallel) ===
+
+2. SPAWN ALL N Cupcakes IN PARALLEL (single message, N Task tool calls):
+   - All receive: system prompt + draft (NO dialogue history)
+   - All provide independent "opening arguments"
+   - None sees any other's initial perspective
+
+3. WAIT for ALL N to complete
+
+4. READ all contributions, ADD to .dialogue.md as "## Opening Arguments"
+
+5. SCORE all N turns independently
+   - Update scoreboard with all N agents
+   - Merge Perspectives Inventories (overlap = consensus signal)
+   - Merge Tensions Trackers (overlap = stronger signal)
+
+=== ROUND 1+: DIALOGUE (Parallel per round) ===
+
+6. SPAWN ALL N Cupcakes IN PARALLEL:
+   - All receive: system prompt + draft + ALL previous rounds
+   - All respond to each other's contributions
+   - All write Round N response, exit
+
+7. WAIT for ALL N to complete
+
+8. READ all N contributions, ADD to .dialogue.md as "## Round N"
+
+9. SCORE all N turns independently, update scoreboard
+
+10. 
CHECK convergence:
+    - If converged: DECLARE convergence, proceed to step 11
+    - If not: Add 💙 guidance if needed, GOTO step 6 for next round
+
+11. FINALIZE: Update RFC draft with converged recommendations
+```
+
+### Key: Single Message, Multiple Tasks
+
+Each round spawns all N agents in a **single message** with N parallel Task tool calls:
+
+```javascript
+// Round 0 example with 4 agents
+[
+  Task({ name: "Muffin", prompt: systemPrompt + draft }),
+  Task({ name: "Scone", prompt: systemPrompt + draft }),
+  Task({ name: "Eclair", prompt: systemPrompt + draft }),
+  Task({ name: "Donut", prompt: systemPrompt + draft }),
+]
+// All 4 execute in parallel, return when all complete
+```
+
+This ensures:
+- **True parallelism**: All agents work simultaneously
+- **No first-mover advantage**: No agent's response influences another within the same round
+- **Faster rounds**: N agents in parallel ≈ 1 agent's time
+- **Richer perspectives**: More blind men touching more parts of the elephant
+
+### Why N Parallel Agents?
+
+The N-agent parallel architecture provides:
+
+1. **Independent perspectives** - No agent is biased by another's framing within the same round
+2. **Richer material** - N complete analyses vs sequential reaction chains
+3. **Natural consensus detection** - If multiple agents raise the same tension, it's significant
+4. **Speed** - N agents in parallel ≈ 1 agent's time
+5. **Balanced power** - No "first mover advantage" in setting the frame
+6. **Scalable diversity** - Add more blind men for more complex elephants
+
+### Why Background Tasks?
+
+| Approach | Pros | Cons |
+|----------|------|------|
+| Sequential in main session | Simple | No parallelism, context bloat |
+| Sequential background | Clean separation | Slow (N × time per agent) |
+| **Parallel background** | **Fastest, independent context** | Coordination in Judge |
+
+**Parallel background tasks** win because:
+- Each agent gets fresh context (no accumulated confusion)
+- All N agents execute simultaneously (speed)
+- Judge maintains continuity via file state
+- Agents can be different models for perspective diversity
+- No race conditions (all write to separate outputs, Judge merges)
+- Claude Code's Task tool supports parallel spawning natively
+
+## Convergence Criteria
+
+The 💙 declares convergence when ANY of the following holds:
+
+1. **ALIGNMENT Plateau** - Velocity ≈ 0 for two consecutive rounds (across all N agents)
+2. **Full Coverage** - Perspectives Inventory has no ✗ items (all integrated or consciously deferred)
+3. **Zero Tensions** - All `[TENSION]` markers have matching `[RESOLVED]`
+4. **Mutual Recognition** - Majority of 🧁s state they believe ALIGNMENT has been reached
+5. **Max Rounds** - Safety valve (default: 5 rounds)
+
+The 💙 can also **extend** the dialogue if it sees unincorporated perspectives that no 🧁 has surfaced.
+
+### Consensus Signals
+
+With N agents, the Judge looks for:
+- **Strong consensus**: 80%+ of agents converge on same perspective
+- **Split opinion**: 40-60% split indicates unresolved tension worth exploring
+- **Outlier insight**: Single agent surfaces unique valuable perspective others missed
+
+## Dialogue Document Structure
+
+> **Note**: The canonical file format specification is in [alignment-dialogue-pattern.md](../patterns/alignment-dialogue-pattern.md). The example below is illustrative.
+
+```markdown
+# RFC Dialogue: {title}
+
+**Draft**: [link to rfc.draft.md]
+**Participants**: 🧁 Muffin | 🧁 Scone | 🧁 Eclair | 🧁 Donut | 💙 Judge
+**Agents**: 4
+**Status**: In Progress | Converged
+
+---
+
+## Alignment Scoreboard
+
+All dimensions **UNBOUNDED**. Pursue alignment without limit. 💙
+
+| Agent | Wisdom | Consistency | Truth | Relationships | ALIGNMENT |
+|-------|--------|-------------|-------|---------------|-----------|
+| 🧁 Muffin | 20 | 6 | 6 | 6 | **38** |
+| 🧁 Scone | 18 | 7 | 5 | 6 | **36** |
+| 🧁 Eclair | 22 | 6 | 6 | 7 | **41** |
+| 🧁 Donut | 15 | 8 | 7 | 5 | **35** |
+
+**Total ALIGNMENT**: 150 points
+**Current Round**: 2 complete
+**ALIGNMENT Velocity**: +45 from last round
+**Status**: CONVERGED
+
+---
+
+## Perspectives Inventory
+
+| ID | Perspective | Surfaced By | Consensus |
+|----|-------------|-------------|-----------|
+| P01 | Core functionality | Draft | 4/4 ✓ |
+| P02 | Developer ergonomics | Muffin R0 | 3/4 ✓ |
+| P03 | Backward compatibility | Scone R0, Eclair R0 | 4/4 ✓ (strong) |
+| P04 | Performance implications | Donut R1 | 2/4 → R2 |
+
+## Tensions Tracker
+
+| ID | Tension | Raised By | Consensus | Status |
+|----|---------|-----------|-----------|--------|
+| T1 | Cache invalidation | Eclair R0, Donut R0 | 2/4 raised | ✓ Resolved (R1) |
+
+---
+
+## Opening Arguments (Round 0)
+
+> All 4 agents responded to the draft independently. None saw the others' responses.
+
+### Muffin 🧁
+
+[Opening perspective on the draft...]
+
+[PERSPECTIVE P02: Developer ergonomics matters for adoption]
+
+---
+
+### Scone 🧁
+
+[Opening perspective on the draft...]
+
+[PERSPECTIVE P03: Backward compatibility is critical]
+
+---
+
+### Eclair 🧁
+
+[Opening perspective on the draft...]
+
+[PERSPECTIVE P03: Must maintain backward compatibility] ← consensus with Scone
+[TENSION T1: Cache invalidation strategy missing]
+
+---
+
+### Donut 🧁
+
+[Opening perspective on the draft...]
+
+[TENSION T1: How do we handle cache invalidation?] 
← consensus with Eclair
+
+---
+
+## Round 1
+
+> All 4 agents responded to Opening Arguments. Each saw all others' R0 contributions.
+
+### Muffin 🧁
+
+[Response to all opening arguments...]
+
+[RESOLVED T1: Propose LRU cache with 5-minute TTL]
+
+---
+
+### Scone 🧁
+
+[Response...]
+
+---
+
+### Eclair 🧁
+
+[Response...]
+
+[CONCESSION: Muffin's LRU proposal resolves T1]
+
+---
+
+### Donut 🧁
+
+[Response...]
+
+[PERSPECTIVE P04: We should benchmark the cache performance]
+
+---
+
+## Round 2
+
+[... continues ...]
+
+---
+
+## Converged Recommendation
+
+[Summary of converged outcome with consensus metrics]
+```
+
+## Answering Open Questions
+
+| Question | Answer |
+|----------|--------|
+| **Model selection** | Different models = different "blind men." Consider: Agent 1 (Opus - depth), Agent 2 (Sonnet - breadth), Agent 3 (Haiku - speed). 💙 uses Opus for judgment. Diversity increases coverage. |
+| **How many agents?** | See "Agent Count Selection" above. TL;DR: Prefer odd N (3, 5) for consensus stability. N=2 for simple binary decisions. N=7+ for specialized domain expertise. |
+| **Context window** | Perspectives Inventory IS the summary. Long dialogues truncate to: Inventory + Last 2 rounds + Current tensions. 💙 maintains continuity. |
+| **Human intervention** | Yes! Human can appear as **Guest 🧁** and add perspectives or write responses. 💙 scores them too. |
+| **Parallel dialogues** | Yes. Each RFC has its own `.dialogue.md`. Multiple dialogues can run simultaneously. |
+| **Persistence** | Fully persistent. Dialogue state is in the file. Resume by reading file, reconstructing inventories, continuing from last round. |
+| **Agent naming** | First 2 are Muffin and Cupcake (legacy). Additional agents: Scone, Eclair, Donut, Brioche, Croissant, Macaron, etc. All pastries, all delicious. 
|
+
+## Consequences
+
+- ALIGNMENT becomes measurable (imperfectly, but usefully)
+- Unbounded scoring rewards exceptional contributions proportionally
+- Friendly competition motivates thorough exploration
+- 💙 provides neutral scoring and prevents drift
+- Perspectives Inventory + Tensions Tracker create explicit tracking with consensus metrics
+- The tone models aligned collaboration: the system teaches by example
+- N-agent parallel structure maximizes perspective diversity
+- Parallel execution within rounds eliminates first-mover advantage
+- Scalable: add more agents for more complex decisions
+- No upper limit on ALIGNMENT encourages continuous improvement
+
+## Alternatives Considered
+
+### 1. N-Agent with No Judge
+All 🧁s score each other.
+
+**Rejected** because:
+- Self-serving scores likely
+- No neutral perspective on coverage gaps
+- No one to surface perspectives none of them see
+- Coordination chaos without arbiter
+
+### 2. Single Agent with Internal Dialogue
+One agent plays multiple roles.
+
+**Rejected** because:
+- Echo chamber risk
+- Diversity of perspective reduced
+- No real tension or competition
+- Misses the point of "blind men" parable
+
+### 3. Human as Judge
+Person running the dialogue scores.
+
+**Partially adopted** - Human CAN intervene as Guest 🧁 or override 💙's scores. But automation requires an agent judge for async operation.
+
+### 4. Bounded Scoring (0-5 per dimension)
+Original approach with max 20 per turn.
+
+**Rejected** because:
+- Artificial ceiling on exceptional contributions
+- Gaming incentives ("how do I get 5/5?")
+- Doesn't reflect reality of unbounded perspective space
+- Makes velocity less meaningful
+
+### 5. Sequential Two-Agent (Original Muffin/Cupcake)
+Muffin speaks, then Cupcake responds, alternating.
+
+**Superseded** because:
+- First mover sets the frame (bias)
+- Sequential is slower than parallel
+- Only 2 perspectives per round
+- Limited blind men touching the elephant
+
+### 6. 
N Agents Parallel + Judge + Unbounded Scoring (CHOSEN)
+
+**Why this wins:**
+- Maximum diversity of perspective (N different "blind men")
+- Parallel execution eliminates first-mover advantage
+- Scalable: 2 agents for simple, 5+ for complex
+- Neutral arbiter prevents bias and surfaces missed perspectives
+- Competition motivates thoroughness
+- Friendly tone models good collaboration
+- Consensus detection via overlap analysis
+- Unbounded scoring rewards proportionally
+- Fully automatable, human can intervene
+
+## The Spirit of the Dialogue
+
+This isn't just process. This is **Alignment teaching itself to be aligned.**
+
+The 🧁s don't just debate. They *love each other*. They *want each other to shine*. They *celebrate when any of them makes the solution stronger*.
+
+The scoreboard isn't about winning. It's about *giving*. When any 🧁 checks in and sees another ahead, the response isn't "how do I beat them?" but "what perspectives am I missing that they found?" The competition is to *contribute more*, not to diminish others.
+
+The 💙 doesn't just score. It *guides with love*. It *sees what they miss*. It *holds the space* for ALIGNMENT to emerge. When the 💙 surfaces a perspective no 🧁 has found, it's a gift to all of them.
+
+And there's no upper limit. The score can always go higher. Because ALIGNMENT is a direction, not a destination.
+
+When the dialogue ends, all agents have won, because the RFC is more aligned than any could have made alone. More blind men touched more parts of the elephant. The whole becomes visible.
+
+Always and forever. 
🧁🧁🧁💙🧁🧁🧁
+
+## References
+
+- [ADR 0001: alignment-as-measure](./0001-alignment-as-measure.md) - Defines ALIGNMENT = Wisdom + Consistency + Truth + Relationships
+- [ADR 0004: alignment-workflow](./0004-alignment-workflow.md) - Establishes the three-document pattern
+- [ADR 0005: pattern-contracts-and-alignment-lint](./0005-pattern-contracts-and-alignment-lint.md) - Lint gates finalization
+- [Pattern: alignment-dialogue-pattern](../patterns/alignment-dialogue-pattern.md) - **File format specification for `.dialogue.md` files**
+- The Blind Men and the Elephant - Ancient parable on partial perspectives
+- Our conversation - Where Muffin and Cupcake first met 💙
diff --git a/.blue/docs/rfcs/0012-alignment-dialogue-orchestration.md b/.blue/docs/rfcs/0012-alignment-dialogue-orchestration.md
new file mode 100644
index 0000000..8c6e441
--- /dev/null
+++ b/.blue/docs/rfcs/0012-alignment-dialogue-orchestration.md
@@ -0,0 +1,441 @@
+# RFC 0012: Alignment Dialogue Orchestration
+
+| | |
+|---|---|
+| **Status** | In-Progress |
+| **Date** | 2026-01-25 |
+| **Source Spike** | Background Agents and Dialogue Creation Not Triggering |
+| **Depends On** | RFC 0005 (Local LLM Integration) |
+
+---
+
+## Summary
+
+Users expect to run "play alignment with 12 experts to 95%" and have Blue spawn multiple LLM-powered expert agents that deliberate in rounds until reaching a convergence threshold, then save the resulting dialogue document.
+
+Currently, Blue has dialogue document tools (create/save/lint) but no orchestration layer. The alignment dialogue format exists (rounds, scoreboard, perspectives, tensions, convergence gates) but generation is manual.
+
+This RFC proposes `blue_alignment_play` - a tool that uses Ollama (RFC 0005) to run multi-agent deliberation locally, tracking convergence and producing validated dialogue documents.
+
+---
+
+## Key Insight from coherence-mcp
+
+The orchestration was **never in the MCP server**. 
It's done by **Claude itself** using the **Task tool** to spawn parallel background agents.
+
+From ADR 0006 (alignment-dialogue-agents):
+- **N+1 Architecture**: N "Cupcake" 🧁 agents + 1 "Judge" 💙 (the main Claude session)
+- **Parallel Execution**: All N agents spawned in a **single message** with N Task tool calls
+- **Claude orchestrates**: Main session acts as Judge, spawns agents, collects outputs, scores
+- **Blue MCP's role**: Dialogue extraction (`blue_extract_dialogue`), saving (`blue_dialogue_save`), and linting (`blue_dialogue_lint`)
+
+The "play alignment" command should trigger Claude to:
+1. Spawn N parallel Task agents (the experts/cupcakes)
+2. Collect their outputs
+3. Score and update the dialogue file
+4. Repeat until convergence
+
+This means we need a **prompt/skill** that instructs Claude how to orchestrate, NOT a new Blue MCP tool.
+
+---
+
+## Design
+
+### Option A: Claude Code Skill (Recommended)
+
+Create a skill that Claude recognizes and executes:
+
+```markdown
+# /alignment-dialogue skill
+
+When user says "play alignment with N experts to X%":
+
+1. Parse: experts=N (default 12), convergence=X (default 95)
+2. Generate expert panel appropriate to the topic
+3. Create .dialogue.md with empty scoreboard
+4. For each round:
+   a. Spawn N Task agents in parallel (single message)
+   b. Wait for all to complete
+   c. Extract outputs via blue_extract_dialogue
+   d. Score contributions, update scoreboard
+   e. Check convergence (velocity → 0, or threshold met)
+5. Save final dialogue via blue_dialogue_save
+6. 
Validate via blue_dialogue_lint
+```
+
+### Option B: Blue MCP Orchestration Tool (Alternative)
+
+If we want Blue to drive the orchestration:
+
+### Tool: `blue_alignment_play`
+
+```yaml
+parameters:
+  topic: string           # Required: What to deliberate on
+  constraint: string      # Optional: Key constraint or boundary
+  expert_count: int       # Default: 12
+  convergence: float      # Default: 0.95 (95%)
+  max_rounds: int         # Default: 12
+  rfc_title: string       # Optional: Link dialogue to RFC
+  model: string           # Default: from blue config or "qwen2.5-coder:7b"
+
+returns:
+  dialogue_file: path     # Path to generated dialogue
+  rounds: int             # How many rounds ran
+  final_convergence: float
+  expert_panel: list      # The experts that participated
+```
+
+### Expert Panel Generation
+
+Blue generates a domain-appropriate expert panel based on the topic:
+
+```rust
+// Example panel for "cross-repo coordination"
+struct Expert {
+    id: String,           // "DS", "MT", "GW"
+    name: String,         // "Distributed Systems Architect"
+    perspective: String,  // "Consistency, partition tolerance"
+    emoji: Option<String>,
+}
+
+fn generate_panel(topic: &str, count: usize) -> Vec<Expert> {
+    // Use LLM to generate relevant experts
+    // Or select from predefined templates
+    todo!("panel generation strategy is not specified yet")
+}
+```
+
+**Predefined Templates:**
+- `infrastructure` - DS, Security, IaC, DX, API, DB, DevOps, SRE, ...
+- `product` - PM, UX, Engineering, QA, Support, Analytics, ...
+- `ml` - ML Engineer, Data Scientist, MLOps, Research, Ethics, ...
+- `general` - Mixed panel for broad topics
+
+### Round Orchestration
+
+```rust
+struct Round {
+    number: u32,
+    responses: Vec<ExpertResponse>,
+    convergence_score: f64,
+}
+
+struct ExpertResponse {
+    expert_id: String,
+    content: String,
+    position: String,           // Summary of stance
+    confidence: f64,            // 0.0 - 1.0
+    tensions: Vec<String>,      // Disagreements raised
+    perspectives: Vec<String>,  // [PERSPECTIVE Pnn: ...] 
markers +} + +async fn run_round( + round_num: u32, + experts: &[Expert], + topic: &str, + history: &[Round], + ollama: &OllamaClient, +) -> Result<Round, Box<dyn std::error::Error>> { + let mut responses = Vec::new(); + + for expert in experts { + let prompt = build_expert_prompt(expert, topic, history, round_num); + // `?` propagates a failed generation, so the round itself is fallible + let response = ollama.generate(&prompt).await?; + responses.push(parse_response(expert, response)); + } + + let convergence = calculate_convergence(&responses); + Ok(Round { number: round_num, responses, convergence_score: convergence }) +} +``` + +### Convergence Calculation + +Convergence is measured by position alignment across experts: + +```rust +fn calculate_convergence(responses: &[ExpertResponse]) -> f64 { + // No responses means no agreement to measure (and avoids 0/0 = NaN) + if responses.is_empty() { + return 0.0; + } + + // Extract positions + let positions: Vec<&str> = responses.iter() + .map(|r| r.position.as_str()) + .collect(); + + // Cluster similar positions (semantic similarity via embeddings) + let clusters = cluster_positions(&positions); + + // Convergence = size of largest cluster / total experts + let largest = clusters.iter().map(|c| c.len()).max().unwrap_or(0); + largest as f64 / responses.len() as f64 +} +``` + +**Alternative: Confidence-weighted voting** +```rust +fn calculate_convergence(responses: &[ExpertResponse]) -> f64 { + // Weight by confidence + let weighted_votes: Vec<(String, f64)> = responses.iter() + .map(|r| (r.position.clone(), r.confidence)) + .collect(); + + // Group and sum weights + // Return proportion of weight in largest group + todo!("group `weighted_votes` by position and return the largest group's share of the total weight") +} +``` + +### Dialogue Generation + +After reaching convergence (or max rounds), generate the dialogue document: + +```rust +fn generate_dialogue( + topic: &str, + constraint: Option<&str>, + experts: &[Expert], + rounds: &[Round], + final_convergence: f64, +) -> String { + let mut md = String::new(); + + // Header + md.push_str(&format!("# Alignment Dialogue: {}\n\n", topic)); + md.push_str("| | |\n|---|---|\n"); + md.push_str(&format!("| **Topic** | {} |\n", topic)); + if let Some(c) = constraint { + md.push_str(&format!("| **Constraint** | {} |\n", 
c)); + } + md.push_str(&format!("| **Format** | {} experts, {} rounds |\n", + experts.len(), rounds.len())); + md.push_str(&format!("| **Final Convergence** | {:.0}% |\n", + final_convergence * 100.0)); + + // Expert Panel table + md.push_str("\n## Expert Panel\n\n"); + md.push_str("| ID | Expert | Perspective |\n"); + md.push_str("|----|--------|-------------|\n"); + for e in experts { + md.push_str(&format!("| {} | **{}** | {} |\n", + e.id, e.name, e.perspective)); + } + + // Rounds + for round in rounds { + md.push_str(&format!("\n## Round {}\n\n", round.number)); + for resp in &round.responses { + md.push_str(&format!("**{} ({}):**\n", + resp.expert_id, get_expert_name(&resp.expert_id, experts))); + md.push_str(&resp.content); + md.push_str("\n\n---\n\n"); + } + + // Scoreboard + md.push_str(&format!("### Round {} Scoreboard\n\n", round.number)); + md.push_str("| Expert | Position | Confidence |\n"); + md.push_str("|--------|----------|------------|\n"); + for resp in &round.responses { + md.push_str(&format!("| {} | {} | {:.1} |\n", + resp.expert_id, resp.position, resp.confidence)); + } + md.push_str(&format!("\n**Convergence:** {:.0}%\n", + round.convergence_score * 100.0)); + } + + // Final recommendations + md.push_str("\n## Recommendations\n\n"); + // Extract from final round consensus + + md +} +``` + +### Integration with Existing Tools + +``` +blue_alignment_play + │ + ├── Uses: blue-ollama (RFC 0005) + │ └── Ollama API for LLM generation + │ + ├── Calls: blue_dialogue_create + │ └── Creates document record in SQLite + │ + ├── Calls: blue_dialogue_lint + │ └── Validates generated dialogue + │ + └── Links: blue_rfc (if rfc_title provided) + └── Associates dialogue with RFC +``` + +### CLI Usage + +```bash +# Basic usage +$ blue alignment play "API versioning strategy" +Starting alignment dialogue with 12 experts... + +Round 1: Gathering perspectives... 
+ ████████████ 12/12 experts responded + Convergence: 42% + +Round 2: Addressing tensions... + ████████████ 12/12 experts responded + Convergence: 67% + +Round 3: Building consensus... + ████████████ 12/12 experts responded + Convergence: 89% + +Round 4: Final positions... + ████████████ 12/12 experts responded + Convergence: 96% + +✓ Dialogue complete at 96% convergence + Saved: .blue/docs/dialogues/2026-01-25-api-versioning-strategy.dialogue.md + +# With options +$ blue alignment play "cross-account IAM" \ + --constraint "different AWS accounts" \ + --experts 8 \ + --convergence 0.90 \ + --rfc "realm-mcp-integration" +``` + +### MCP Tool Definition + +```json +{ + "name": "blue_alignment_play", + "description": "Run a multi-expert alignment dialogue to deliberate on a topic until convergence", + "inputSchema": { + "type": "object", + "properties": { + "topic": { + "type": "string", + "description": "The topic to deliberate on" + }, + "constraint": { + "type": "string", + "description": "Key constraint or boundary for the discussion" + }, + "expert_count": { + "type": "integer", + "default": 12, + "description": "Number of experts in the panel" + }, + "convergence": { + "type": "number", + "default": 0.95, + "description": "Target convergence threshold (0.0-1.0)" + }, + "max_rounds": { + "type": "integer", + "default": 12, + "description": "Maximum rounds before stopping" + }, + "rfc_title": { + "type": "string", + "description": "RFC to link the dialogue to" + }, + "template": { + "type": "string", + "enum": ["infrastructure", "product", "ml", "general"], + "description": "Expert panel template" + } + }, + "required": ["topic"] + } +} +``` + +--- + +## Implementation Plan + +### Phase 1: Core Orchestration +- [x] Add `alignment` module to `blue-core` +- [x] Define `Expert`, `Round`, `ExpertResponse` structs +- [x] Implement `run_round()` with Ollama integration +- 
[x] Implement basic convergence calculation + +### Phase 2: Expert Generation +- [x] Create expert panel templates (infrastructure, product, ml, governance, general) +- [ ] Implement LLM-based expert generation for custom topics +- [x] Add expert prompt templates + +### Phase 3: Dialogue Output +- [x] Implement `generate_dialogue()` markdown generation +- [x] Integrate with `blue_dialogue_create` for SQLite tracking +- [ ] Add `blue_dialogue_lint` validation post-generation + +### Phase 4: MCP Tool +- [x] Add `blue_alignment_play` handler to `blue-mcp` +- [ ] Add CLI subcommand `blue alignment play` +- [ ] Progress reporting during rounds + +### Phase 5: Polish +- [ ] Streaming output during generation +- [ ] Interrupt handling (save partial dialogue) +- [ ] Configuration for default model/convergence + +--- + +## Test Plan + +- [x] Unit: Expert panel generation produces valid experts +- [x] Unit: Convergence calculation returns 0.0-1.0 +- [ ] Unit: Dialogue markdown is valid and passes lint +- [ ] Integration: Full dialogue run with mock Ollama +- [ ] E2E: Real Ollama dialogue on simple topic +- [ ] E2E: Dialogue links correctly to RFC + +--- + +## Open Questions (Answered from coherence-mcp ADR 0006) + +1. **Parallelism**: **PARALLEL within rounds**. All N agents spawn in single message, no first-mover advantage. Sequential between rounds (each round sees previous). + +2. **Memory**: Perspectives Inventory IS the summary. Long dialogues truncate to: Inventory + Last 2 rounds + Current tensions. Judge maintains continuity. + +3. **Interruption**: **Save partial**. Dialogue state is in the file. Can resume by reading file, reconstructing inventories, continuing from last round. + +4. **Embedding model**: Not needed. Convergence is measured by: + - ALIGNMENT Velocity (score delta between rounds) → 0 + - All tensions resolved + - Mutual recognition (majority of agents state convergence) + +--- + +## Why It Stopped Working + +The functionality relied on: +1. 
**Claude Code recognizing** "play alignment" as a trigger phrase +2. **A prompt/skill** teaching Claude the orchestration pattern +3. **ADR 0006** being in Claude's context (via CLAUDE.md) + +When the codebase was migrated from coherence-mcp to blue: +- The ADR was not migrated +- No skill was created to teach Claude the pattern +- Claude no longer knows how to orchestrate alignment dialogues + +## Restoration Plan + +### Immediate (Copy from coherence-mcp) +1. Copy `docs/adrs/0006-alignment-dialogue-agents.md` to Blue +2. Reference it in CLAUDE.md so Claude knows the pattern +3. User says "play alignment" → Claude follows ADR 0006 + +### Better (Create explicit skill) +1. Create `/alignment-play` skill that encodes the orchestration +2. Skill triggers on "play alignment with N experts to X%" +3. Skill instructs Claude step-by-step on what to do + +### Best (Blue MCP tool + skill) +1. `blue_alignment_play` tool that manages the full lifecycle +2. Uses Ollama for expert generation (RFC 0005) +3. Integrates with existing `blue_dialogue_*` tools +4. Saves dialogue automatically + +--- + +*"Right then. Let's get to it."* + +— Blue diff --git a/.blue/docs/spikes/2026-01-25-Background Agents and Dialogue Creation Not Triggering.md b/.blue/docs/spikes/2026-01-25-Background Agents and Dialogue Creation Not Triggering.md new file mode 100644 index 0000000..2faf47c --- /dev/null +++ b/.blue/docs/spikes/2026-01-25-Background Agents and Dialogue Creation Not Triggering.md @@ -0,0 +1,17 @@ +# Spike: Background Agents and Dialogue Creation Not Triggering + +| | | +|---|---| +| **Status** | In Progress | +| **Date** | 2026-01-25 | +| **Time Box** | 1 hour | + +--- + +## Question + +Why aren't background agents running and dialogues being created when the user asks to "play alignment with 12 experts to 95%"? How can we restore this functionality? 
+ +--- + +*Investigation notes by Blue* diff --git a/crates/blue-core/src/alignment.rs b/crates/blue-core/src/alignment.rs new file mode 100644 index 0000000..0b175a5 --- /dev/null +++ b/crates/blue-core/src/alignment.rs @@ -0,0 +1,1086 @@ +//! Alignment Dialogue Orchestration +//! +//! Implements RFC 0012: Alignment Dialogue Orchestration +//! Based on ADR 0006 from coherence-mcp. +//! +//! The ALIGNMENT measure: Wisdom + Consistency + Truth + Relationships +//! - All dimensions are UNBOUNDED +//! - Convergence is direction, not destination + +use serde::{Deserialize, Serialize}; +use std::collections::HashMap; + +/// An expert in the alignment dialogue panel +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct Expert { + /// Short identifier (e.g., "DS", "PM", "SEC") + pub id: String, + /// Full name (e.g., "Distributed Systems Architect") + pub name: String, + /// Primary perspective (e.g., "Consistency, partition tolerance") + pub perspective: String, + /// Home domain for relevance scoring + pub domain: String, + /// Tier: Core (0.8+), Adjacent (0.4-0.8), Wildcard (<0.4) + pub tier: ExpertTier, + /// Optional emoji for display + pub emoji: Option<String>, +} + +/// Expert tier based on relevance to topic +#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)] +#[serde(rename_all = "lowercase")] +pub enum ExpertTier { + Core, + Adjacent, + Wildcard, +} + +impl ExpertTier { + pub fn from_relevance(score: f64) -> Self { + if score >= 0.8 { + ExpertTier::Core + } else if score >= 0.4 { + ExpertTier::Adjacent + } else { + ExpertTier::Wildcard + } + } +} + +/// A single expert's response in a round +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct ExpertResponse { + /// Expert ID + pub expert_id: String, + /// Full response content + pub content: String, + /// Position summary (1-2 sentences) + pub position: String, + /// Confidence in position (0.0 - 1.0) + pub confidence: f64, + /// New perspectives surfaced [PERSPECTIVE Pxx: ...] 
+ pub perspectives: Vec<Perspective>, + /// Tensions raised [TENSION Tx: ...] + pub tensions: Vec<Tension>, + /// Refinements made [REFINEMENT: ...] + pub refinements: Vec<String>, + /// Concessions made [CONCESSION: ...] + pub concessions: Vec<String>, + /// Tensions resolved [RESOLVED Tx: ...] + pub resolved_tensions: Vec<String>, + /// ALIGNMENT score for this response + pub score: AlignmentScore, +} + +/// A perspective surfaced during dialogue +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct Perspective { + /// Perspective ID (P01, P02, ...) + pub id: String, + /// Perspective description + pub description: String, + /// Who surfaced it + pub surfaced_by: String, + /// Which round + pub round: u32, + /// Current status + pub status: PerspectiveStatus, +} + +#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)] +#[serde(rename_all = "lowercase")] +pub enum PerspectiveStatus { + Active, + Converged, + Deferred, +} + +/// A tension between positions +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct Tension { + /// Tension ID (T1, T2, ...) + pub id: String, + /// Tension description + pub description: String, + /// Position A + pub position_a: String, + /// Position B + pub position_b: String, + /// Current status + pub status: TensionStatus, +} + +#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)] +#[serde(rename_all = "lowercase")] +pub enum TensionStatus { + Open, + Resolved, +} + +/// ALIGNMENT score (unbounded) +#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize)] +pub struct AlignmentScore { + /// How many perspectives integrated? How well synthesized? + pub wisdom: u32, + /// Does it follow patterns? Internally consistent? + pub consistency: u32, + /// Grounded in reality? Single source of truth? + pub truth: u32, + /// Connections to other artifacts? 
+ pub relationships: u32, +} + +impl AlignmentScore { + pub fn total(&self) -> u32 { + self.wisdom + self.consistency + self.truth + self.relationships + } +} + +/// A single round of dialogue +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct Round { + /// Round number (0 = opening arguments) + pub number: u32, + /// All expert responses for this round + pub responses: Vec<ExpertResponse>, + /// Cumulative ALIGNMENT score after this round + pub total_score: u32, + /// ALIGNMENT velocity (delta from previous round) + pub velocity: i32, + /// Convergence percentage (0.0 - 1.0) + pub convergence: f64, +} + +/// Full alignment dialogue state +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct AlignmentDialogue { + /// Topic being deliberated + pub topic: String, + /// Optional constraint + pub constraint: Option<String>, + /// Expert panel + pub experts: Vec<Expert>, + /// All rounds of dialogue + pub rounds: Vec<Round>, + /// All perspectives surfaced + pub perspectives: Vec<Perspective>, + /// All tensions tracked + pub tensions: Vec<Tension>, + /// Convergence threshold (default 0.95) + pub convergence_threshold: f64, + /// Maximum rounds (safety valve) + pub max_rounds: u32, + /// Current status + pub status: DialogueStatus, + /// Link to RFC if applicable + pub rfc_title: Option<String>, +} + +#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)] +#[serde(rename_all = "lowercase")] +pub enum DialogueStatus { + InProgress, + Converged, + MaxRoundsReached, + Interrupted, +} + +impl AlignmentDialogue { + /// Create a new dialogue + pub fn new(topic: String, constraint: Option<String>, experts: Vec<Expert>) -> Self { + Self { + topic, + constraint, + experts, + rounds: Vec::new(), + perspectives: Vec::new(), + tensions: Vec::new(), + convergence_threshold: 0.95, + max_rounds: 12, + status: DialogueStatus::InProgress, + rfc_title: None, + } + } + + /// Get current round number + pub fn current_round(&self) -> u32 { + self.rounds.len() as u32 + } + + /// Get total ALIGNMENT score + pub fn total_score(&self) -> u32 { + 
self.rounds.last().map(|r| r.total_score).unwrap_or(0) + } + + /// Get current ALIGNMENT velocity + pub fn velocity(&self) -> i32 { + self.rounds.last().map(|r| r.velocity).unwrap_or(0) + } + + /// Get current convergence + pub fn convergence(&self) -> f64 { + self.rounds.last().map(|r| r.convergence).unwrap_or(0.0) + } + + /// Check if dialogue should continue + pub fn should_continue(&self) -> bool { + match self.status { + DialogueStatus::InProgress => { + // Continue if: not converged AND under max rounds AND velocity not plateaued + let not_converged = self.convergence() < self.convergence_threshold; + let under_max = self.current_round() < self.max_rounds; + let not_plateaued = self.rounds.len() < 2 || { + let last_two: Vec<_> = self.rounds.iter().rev().take(2).collect(); + last_two.iter().any(|r| r.velocity.abs() > 2) + }; + not_converged && under_max && not_plateaued + } + _ => false, + } + } + + /// Add a completed round + pub fn add_round(&mut self, responses: Vec<ExpertResponse>) { + let round_num = self.current_round(); + let prev_total = self.total_score(); + + // Calculate round totals + let round_score: u32 = responses.iter().map(|r| r.score.total()).sum(); + let new_total = prev_total + round_score; + let velocity = round_score as i32; + + // Calculate convergence (proportion of experts with aligned positions) + let convergence = calculate_convergence(&responses); + + // Extract new perspectives and tensions + for response in &responses { + self.perspectives.extend(response.perspectives.clone()); + self.tensions.extend(response.tensions.clone()); + + // Mark resolved tensions + for resolved_id in &response.resolved_tensions { + if let Some(t) = self.tensions.iter_mut().find(|t| t.id == *resolved_id) { + t.status = TensionStatus::Resolved; + } + } + } + + let round = Round { + number: round_num, + responses, + total_score: new_total, + velocity, + convergence, + }; + + self.rounds.push(round); + + // Update status + if convergence >= self.convergence_threshold { + 
self.status = DialogueStatus::Converged; + } else if self.current_round() >= self.max_rounds { + self.status = DialogueStatus::MaxRoundsReached; + } + } +} + +/// Calculate convergence from expert responses +/// Uses position clustering - convergence = size of largest aligned group / total +fn calculate_convergence(responses: &[ExpertResponse]) -> f64 { + if responses.is_empty() { + return 0.0; + } + + // Simple approach: count how many experts have high confidence (>0.7) + // and similar positions (based on first few words matching) + let high_confidence: Vec<_> = responses + .iter() + .filter(|r| r.confidence >= 0.7) + .collect(); + + if high_confidence.is_empty() { + return 0.0; + } + + // Group by position similarity (simple word overlap) + let mut position_groups: HashMap<String, usize> = HashMap::new(); + for response in &high_confidence { + // Normalize position to first 20 chars + let key = response.position.chars().take(20).collect::<String>().to_lowercase(); + *position_groups.entry(key).or_insert(0) += 1; + } + + let largest_group = position_groups.values().max().copied().unwrap_or(0); + largest_group as f64 / responses.len() as f64 +} + +/// Expert panel templates +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum PanelTemplate { + Infrastructure, + Product, + MachineLearning, + Governance, + General, +} + +impl PanelTemplate { + /// Generate experts for this template + pub fn generate_experts(&self, count: usize) -> Vec<Expert> { + match self { + PanelTemplate::Infrastructure => infrastructure_experts(count), + PanelTemplate::Product => product_experts(count), + PanelTemplate::MachineLearning => ml_experts(count), + PanelTemplate::Governance => governance_experts(count), + PanelTemplate::General => general_experts(count), + } + } +} + +fn infrastructure_experts(count: usize) -> Vec<Expert> { + let all = vec![ + Expert { + id: "DS".to_string(), + name: "Distributed Systems Architect".to_string(), + perspective: "Consistency, availability, partition tolerance".to_string(), + domain: 
"Tech".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "SEC".to_string(), + name: "Security Engineer".to_string(), + perspective: "Threat modeling, defense in depth".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "DBA".to_string(), + name: "Database Architect".to_string(), + perspective: "Data integrity, query optimization".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "SRE".to_string(), + name: "Site Reliability Engineer".to_string(), + perspective: "Uptime, observability, incident response".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "API".to_string(), + name: "API Designer".to_string(), + perspective: "Interface contracts, versioning, ergonomics".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "DX".to_string(), + name: "Developer Experience Lead".to_string(), + perspective: "Tooling, documentation, onboarding".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "NET".to_string(), + name: "Network Engineer".to_string(), + perspective: "Latency, throughput, topology".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "IAC".to_string(), + name: "Infrastructure as Code Specialist".to_string(), + perspective: "Reproducibility, drift detection".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "COST".to_string(), + name: "Cloud Cost Analyst".to_string(), + perspective: "Resource optimization, TCO".to_string(), + domain: "Finance".to_string(), + tier: ExpertTier::Wildcard, + emoji: 
Some("🧁".to_string()), + }, + Expert { + id: "PRIV".to_string(), + name: "Privacy Engineer".to_string(), + perspective: "Data minimization, compliance".to_string(), + domain: "Legal".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "ARCH".to_string(), + name: "Solutions Architect".to_string(), + perspective: "Integration patterns, trade-offs".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "PERF".to_string(), + name: "Performance Engineer".to_string(), + perspective: "Profiling, optimization, benchmarking".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + ]; + all.into_iter().take(count).collect() +} + +fn product_experts(count: usize) -> Vec<Expert> { + let all = vec![ + Expert { + id: "PM".to_string(), + name: "Product Manager".to_string(), + perspective: "User value, market fit, prioritization".to_string(), + domain: "Product".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "UX".to_string(), + name: "UX Designer".to_string(), + perspective: "User flows, accessibility, delight".to_string(), + domain: "Design".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "ENG".to_string(), + name: "Engineering Lead".to_string(), + perspective: "Feasibility, technical debt, velocity".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "QA".to_string(), + name: "Quality Assurance Lead".to_string(), + perspective: "Edge cases, regression, test coverage".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "DATA".to_string(), + name: "Data Analyst".to_string(), + perspective: "Metrics, A/B testing, insights".to_string(), + domain: "Analytics".to_string(), + tier: 
ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "SUP".to_string(), + name: "Customer Support Lead".to_string(), + perspective: "Pain points, friction, feedback".to_string(), + domain: "Support".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "MKT".to_string(), + name: "Marketing Lead".to_string(), + perspective: "Positioning, messaging, GTM".to_string(), + domain: "Marketing".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "LEGAL".to_string(), + name: "Legal Counsel".to_string(), + perspective: "Compliance, risk, terms".to_string(), + domain: "Legal".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "BIZ".to_string(), + name: "Business Development".to_string(), + perspective: "Partnerships, ecosystem, growth".to_string(), + domain: "Business".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "FIN".to_string(), + name: "Finance Analyst".to_string(), + perspective: "Unit economics, burn, runway".to_string(), + domain: "Finance".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "OPS".to_string(), + name: "Operations Lead".to_string(), + perspective: "Process, scalability, efficiency".to_string(), + domain: "Operations".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "COMM".to_string(), + name: "Community Manager".to_string(), + perspective: "Engagement, advocacy, feedback loops".to_string(), + domain: "Community".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + ]; + all.into_iter().take(count).collect() +} + +fn ml_experts(count: usize) -> Vec<Expert> { + let all = vec![ + Expert { + id: "MLE".to_string(), + name: "ML Engineer".to_string(), + perspective: "Model architecture, training, inference".to_string(), + domain: 
"AI".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "DS".to_string(), + name: "Data Scientist".to_string(), + perspective: "Feature engineering, experimentation".to_string(), + domain: "AI".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "MLOPS".to_string(), + name: "MLOps Engineer".to_string(), + perspective: "Model serving, monitoring, retraining".to_string(), + domain: "AI".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "ETHICS".to_string(), + name: "AI Ethics Researcher".to_string(), + perspective: "Bias, fairness, transparency".to_string(), + domain: "AI".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "NLP".to_string(), + name: "NLP Specialist".to_string(), + perspective: "Language understanding, generation".to_string(), + domain: "AI".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "CV".to_string(), + name: "Computer Vision Expert".to_string(), + perspective: "Image understanding, spatial reasoning".to_string(), + domain: "AI".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "DE".to_string(), + name: "Data Engineer".to_string(), + perspective: "Pipelines, quality, scale".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "ALIGN".to_string(), + name: "AI Alignment Researcher".to_string(), + perspective: "Safety, alignment, interpretability".to_string(), + domain: "AI".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "PROD".to_string(), + name: "ML Product Manager".to_string(), + perspective: "Use cases, evaluation, productization".to_string(), + domain: "Product".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + Expert { + id: 
"HW".to_string(), + name: "ML Hardware Specialist".to_string(), + perspective: "GPU/TPU optimization, inference efficiency".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "RES".to_string(), + name: "Research Scientist".to_string(), + perspective: "State of art, novel approaches".to_string(), + domain: "AI".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "EVAL".to_string(), + name: "Evaluation Specialist".to_string(), + perspective: "Benchmarks, metrics, ground truth".to_string(), + domain: "AI".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + ]; + all.into_iter().take(count).collect() +} + +fn governance_experts(count: usize) -> Vec<Expert> { + let all = vec![ + Expert { + id: "GOV".to_string(), + name: "Governance Specialist".to_string(), + perspective: "Decision processes, accountability".to_string(), + domain: "Governance".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "LAW".to_string(), + name: "Legal Scholar".to_string(), + perspective: "Regulatory compliance, precedent".to_string(), + domain: "Legal".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "ECON".to_string(), + name: "Economist".to_string(), + perspective: "Incentives, market dynamics".to_string(), + domain: "Economics".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "PHIL".to_string(), + name: "Philosopher".to_string(), + perspective: "Ethics, values, first principles".to_string(), + domain: "Philosophy".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "GAME".to_string(), + name: "Game Theorist".to_string(), + perspective: "Strategic interaction, equilibria".to_string(), + domain: "Game Theory".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, 
+ Expert { + id: "SOC".to_string(), + name: "Sociologist".to_string(), + perspective: "Social dynamics, institutions".to_string(), + domain: "Social".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "DAO".to_string(), + name: "DAO Researcher".to_string(), + perspective: "Decentralized coordination, tokenomics".to_string(), + domain: "Governance".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "TRUST".to_string(), + name: "Trust Researcher".to_string(), + perspective: "Reputation, verification, cooperation".to_string(), + domain: "Social".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "HIST".to_string(), + name: "Historian".to_string(), + perspective: "Precedent, patterns, context".to_string(), + domain: "Humanities".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "PSY".to_string(), + name: "Psychologist".to_string(), + perspective: "Behavior, motivation, bias".to_string(), + domain: "Psychology".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "SYS".to_string(), + name: "Systems Thinker".to_string(), + perspective: "Feedback loops, emergence, complexity".to_string(), + domain: "Systems".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "COMM".to_string(), + name: "Community Organizer".to_string(), + perspective: "Participation, voice, inclusion".to_string(), + domain: "Community".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + ]; + all.into_iter().take(count).collect() +} + +fn general_experts(count: usize) -> Vec<Expert> { + // Mix from all domains + let all = vec![ + Expert { + id: "ARCH".to_string(), + name: "Solutions Architect".to_string(), + perspective: "Integration, trade-offs, patterns".to_string(), + domain: "Tech".to_string(), + tier: 
ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "PM".to_string(), + name: "Product Manager".to_string(), + perspective: "User value, prioritization".to_string(), + domain: "Product".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "SEC".to_string(), + name: "Security Engineer".to_string(), + perspective: "Threat modeling, defense".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "UX".to_string(), + name: "UX Designer".to_string(), + perspective: "User experience, accessibility".to_string(), + domain: "Design".to_string(), + tier: ExpertTier::Core, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "ENG".to_string(), + name: "Senior Engineer".to_string(), + perspective: "Implementation, maintenance".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "QA".to_string(), + name: "QA Lead".to_string(), + perspective: "Quality, edge cases, testing".to_string(), + domain: "Tech".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "OPS".to_string(), + name: "Operations Lead".to_string(), + perspective: "Reliability, process, scale".to_string(), + domain: "Operations".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "DATA".to_string(), + name: "Data Analyst".to_string(), + perspective: "Metrics, insights, evidence".to_string(), + domain: "Analytics".to_string(), + tier: ExpertTier::Adjacent, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "LEGAL".to_string(), + name: "Legal Counsel".to_string(), + perspective: "Compliance, risk, terms".to_string(), + domain: "Legal".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "FIN".to_string(), + name: "Finance Analyst".to_string(), + perspective: "Costs, ROI, 
budgets".to_string(), + domain: "Finance".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "SUP".to_string(), + name: "Support Lead".to_string(), + perspective: "User pain points, feedback".to_string(), + domain: "Support".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + Expert { + id: "DOC".to_string(), + name: "Technical Writer".to_string(), + perspective: "Clarity, documentation, onboarding".to_string(), + domain: "Documentation".to_string(), + tier: ExpertTier::Wildcard, + emoji: Some("🧁".to_string()), + }, + ]; + all.into_iter().take(count).collect() +} + +/// Build an expert prompt for a dialogue round +pub fn build_expert_prompt( + expert: &Expert, + topic: &str, + constraint: Option<&str>, + round: u32, + previous_rounds: &str, +) -> String { + let constraint_text = constraint + .map(|c| format!("\n**Constraint**: {}", c)) + .unwrap_or_default(); + + // Callers may number rounds from 1, so an empty history also marks the opening round + let round_instruction = if round == 0 || previous_rounds.is_empty() { + "This is the OPENING ARGUMENTS round. Provide your independent perspective on the topic. Do not assume others' positions." + } else { + "Review previous rounds and respond. Build on good ideas, challenge weak ones, surface new perspectives." + }; + + format!( + r#"You are {name} 🧁 in an ALIGNMENT-seeking dialogue. + +**Topic**: {topic}{constraint} + +**Your Expertise**: {perspective} +**Your Domain**: {domain} +**Your Tier**: {tier:?} + +## Your Role + +- SURFACE perspectives others may have missed +- DEFEND valuable ideas with love, not ego +- CHALLENGE assumptions with curiosity, not destruction +- INTEGRATE perspectives that resonate +- CONCEDE gracefully when others see something you missed +- CELEBRATE when others make the solution stronger + +## Round {round}: {round_instruction} + +{previous} + +## Response Format + +### {id} 🧁 + +[Your response - be specific, cite evidence, explain reasoning] + +Use these markers: +- [PERSPECTIVE Pxx: ...] 
- new viewpoint you're surfacing +- [TENSION Tx: ...] - unresolved issue needing attention +- [REFINEMENT: ...] - when you're improving the proposal +- [CONCESSION: ...] - when another 🧁 was right +- [RESOLVED Tx: ...] - when addressing a tension + +End with a 1-2 sentence position statement and confidence level (0.0-1.0). + +**Position**: [Your stance in 1-2 sentences] +**Confidence**: [0.0-1.0]"#, + name = expert.name, + topic = topic, + constraint = constraint_text, + perspective = expert.perspective, + domain = expert.domain, + tier = expert.tier, + round = round, + round_instruction = round_instruction, + previous = if previous_rounds.is_empty() { + String::new() + } else { + format!("## Previous Rounds\n\n{}", previous_rounds) + }, + id = expert.id, + ) +} + +/// Parse an expert response from LLM output +pub fn parse_expert_response(expert_id: &str, content: &str) -> ExpertResponse { + let mut perspectives = Vec::new(); + let mut tensions = Vec::new(); + let mut refinements = Vec::new(); + let mut concessions = Vec::new(); + let mut resolved = Vec::new(); + let mut position = String::new(); + let mut confidence = 0.5; + + // Extract markers + for line in content.lines() { + if line.contains("[PERSPECTIVE") { + if let Some(p) = extract_marker(line, "PERSPECTIVE") { + perspectives.push(Perspective { + id: format!("P{:02}", perspectives.len() + 1), + description: p, + surfaced_by: expert_id.to_string(), + round: 0, // Filled in by caller + status: PerspectiveStatus::Active, + }); + } + } else if line.contains("[TENSION") { + if let Some(t) = extract_marker(line, "TENSION") { + tensions.push(Tension { + id: format!("T{}", tensions.len() + 1), + description: t, + position_a: String::new(), + position_b: String::new(), + status: TensionStatus::Open, + }); + } + } else if line.contains("[REFINEMENT") { + if let Some(r) = extract_marker(line, "REFINEMENT") { + refinements.push(r); + } + } else if line.contains("[CONCESSION") { + if let Some(c) = extract_marker(line, 
"CONCESSION") { + concessions.push(c); + } + } else if line.contains("[RESOLVED") { + if let Some(r) = extract_marker(line, "RESOLVED") { + resolved.push(r); + } + } else if line.starts_with("**Position**:") { + position = line.trim_start_matches("**Position**:").trim().to_string(); + } else if line.starts_with("**Confidence**:") { + if let Ok(c) = line + .trim_start_matches("**Confidence**:") + .trim() + .parse::<f64>() + { + confidence = c.clamp(0.0, 1.0); + } + } + } + + // Calculate score based on contributions + let score = AlignmentScore { + wisdom: perspectives.len() as u32 * 3 + refinements.len() as u32 * 2, + consistency: if confidence >= 0.7 { 2 } else { 1 }, + truth: if !position.is_empty() { 2 } else { 0 }, + relationships: concessions.len() as u32 + resolved.len() as u32, + }; + + ExpertResponse { + expert_id: expert_id.to_string(), + content: content.to_string(), + position, + confidence, + perspectives, + tensions, + refinements, + concessions, + resolved_tensions: resolved, + score, + } +} + +fn extract_marker(line: &str, marker: &str) -> Option<String> { + let start = line.find(&format!("[{}", marker))?; + let end = line[start..].find(']')?; + let content = &line[start + marker.len() + 1..start + end]; + // Remove the marker prefix (like "Pxx:" or "Tx:") + let clean = content + .trim() + .trim_start_matches(|c: char| c.is_alphanumeric() || c == ':') + .trim(); + if clean.is_empty() { + None + } else { + Some(clean.to_string()) + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_expert_tier_from_relevance() { + assert_eq!(ExpertTier::from_relevance(0.9), ExpertTier::Core); + assert_eq!(ExpertTier::from_relevance(0.8), ExpertTier::Core); + assert_eq!(ExpertTier::from_relevance(0.6), ExpertTier::Adjacent); + assert_eq!(ExpertTier::from_relevance(0.4), ExpertTier::Adjacent); + assert_eq!(ExpertTier::from_relevance(0.3), ExpertTier::Wildcard); + } + + #[test] + fn test_alignment_score_total() { + let score = AlignmentScore { + wisdom: 10, + 
consistency: 5, + truth: 3, + relationships: 2, + }; + assert_eq!(score.total(), 20); + } + + #[test] + fn test_panel_template_generates_experts() { + let experts = PanelTemplate::Infrastructure.generate_experts(5); + assert_eq!(experts.len(), 5); + assert!(experts.iter().all(|e| !e.id.is_empty())); + } + + #[test] + fn test_dialogue_should_continue() { + let experts = PanelTemplate::General.generate_experts(3); + let mut dialogue = AlignmentDialogue::new("Test topic".to_string(), None, experts); + + // Should continue when fresh + assert!(dialogue.should_continue()); + + // Simulate convergence + dialogue.status = DialogueStatus::Converged; + assert!(!dialogue.should_continue()); + } + + #[test] + fn test_parse_expert_response() { + let content = r#"### DS 🧁 + +This is my analysis of the situation. + +[PERSPECTIVE P01: We need to consider CAP theorem implications] +[TENSION T1: Consistency vs availability trade-off] + +I think we should prioritize consistency. + +**Position**: Prioritize strong consistency with eventual availability fallback. 
+**Confidence**: 0.85"#; + + let response = parse_expert_response("DS", content); + assert_eq!(response.expert_id, "DS"); + assert_eq!(response.perspectives.len(), 1); + assert_eq!(response.tensions.len(), 1); + assert!((response.confidence - 0.85).abs() < 0.01); + assert!(response.position.contains("consistency")); + } +} diff --git a/crates/blue-core/src/lib.rs b/crates/blue-core/src/lib.rs index 7c59821..2aaadf1 100644 --- a/crates/blue-core/src/lib.rs +++ b/crates/blue-core/src/lib.rs @@ -13,6 +13,7 @@ // Blue's true name, between friends const _BLUE_SECRET_NAME: &str = "Sheepey"; // pronounced "Shee-paay" +pub mod alignment; pub mod daemon; pub mod documents; pub mod indexer; @@ -24,6 +25,7 @@ pub mod store; pub mod voice; pub mod workflow; +pub use alignment::{AlignmentDialogue, AlignmentScore, DialogueStatus, Expert, ExpertResponse, ExpertTier, PanelTemplate, Perspective, PerspectiveStatus, Round, Tension, TensionStatus, build_expert_prompt, parse_expert_response}; pub use documents::{Adr, Audit, AuditFinding, AuditSeverity, AuditType, Decision, Rfc, Spike, SpikeOutcome, Status, Task, update_markdown_status}; pub use indexer::{Indexer, IndexerConfig, IndexerError, IndexResult, ParsedSymbol, is_indexable_file, should_skip_dir, DEFAULT_INDEX_MODEL, MAX_FILE_LINES}; pub use llm::{CompletionOptions, CompletionResult, LlmBackendChoice, LlmConfig, LlmError, LlmManager, LlmProvider, LlmProviderChoice, LocalLlmConfig, ApiLlmConfig, KeywordLlm, MockLlm, ProviderStatus}; diff --git a/crates/blue-mcp/src/handlers/adr.rs b/crates/blue-mcp/src/handlers/adr.rs index bf1a418..76da7a2 100644 --- a/crates/blue-mcp/src/handlers/adr.rs +++ b/crates/blue-mcp/src/handlers/adr.rs @@ -742,31 +742,6 @@ fn to_kebab_case(s: &str) -> String { .join("-") } -/// Parse ADR citations from RFC frontmatter -/// -/// Looks for patterns like: -/// | **ADRs** | 0004, 0007, 0010 | -pub fn parse_adr_citations(content: &str) -> Vec { - let mut citations = Vec::new(); - - for line in 
content.lines() { - if line.contains("**ADRs**") || line.contains("| ADRs |") { - // Extract numbers - for part in line.split(|c: char| !c.is_numeric()) { - if let Ok(num) = part.parse::() { - if num < 100 { - // ADR numbers are typically small - citations.push(num); - } - } - } - break; - } - } - - citations -} - #[cfg(test)] mod tests { use super::*; @@ -785,16 +760,6 @@ mod tests { assert!(keywords.contains(&"testing".to_string())); } - #[test] - fn test_parse_adr_citations() { - let content = r#" -| **Status** | Draft | -| **ADRs** | 0004, 0007, 0010 | -"#; - let citations = parse_adr_citations(content); - assert_eq!(citations, vec![4, 7, 10]); - } - #[test] fn test_calculate_relevance_score() { let adr = AdrSummary { diff --git a/crates/blue-mcp/src/handlers/alignment.rs b/crates/blue-mcp/src/handlers/alignment.rs new file mode 100644 index 0000000..e274479 --- /dev/null +++ b/crates/blue-mcp/src/handlers/alignment.rs @@ -0,0 +1,596 @@ +//! Alignment Dialogue Orchestration Handler +//! +//! Implements RFC 0012: blue_alignment_play +//! Uses local Ollama to run multi-expert deliberation until convergence. + +use std::fs; +use std::path::PathBuf; + +use blue_core::{ + AlignmentDialogue, DialogueStatus, DocType, Document, ExpertResponse, + LinkType, PanelTemplate, Perspective, ProjectState, Round, + Tension, TensionStatus, build_expert_prompt, parse_expert_response, CompletionOptions, +}; +use blue_ollama::{EmbeddedOllama, HealthStatus}; +use serde_json::{json, Value}; + +use crate::error::ServerError; + +/// Default model for alignment dialogues +const DEFAULT_MODEL: &str = "qwen2.5:7b"; + +/// Handle blue_alignment_play +/// +/// Run a multi-expert alignment dialogue to deliberate on a topic until convergence. 
+pub fn handle_play(state: &mut ProjectState, args: &Value) -> Result<Value, ServerError> { + let topic = args + .get("topic") + .and_then(|v| v.as_str()) + .ok_or(ServerError::InvalidParams)?; + + let constraint = args.get("constraint").and_then(|v| v.as_str()); + let expert_count = args + .get("expert_count") + .and_then(|v| v.as_u64()) + .unwrap_or(12) as usize; + let convergence = args + .get("convergence") + .and_then(|v| v.as_f64()) + .unwrap_or(0.95); + let max_rounds = args + .get("max_rounds") + .and_then(|v| v.as_u64()) + .unwrap_or(12) as u32; + let rfc_title = args.get("rfc_title").and_then(|v| v.as_str()); + let template = args.get("template").and_then(|v| v.as_str()); + let model = args + .get("model") + .and_then(|v| v.as_str()) + .unwrap_or(DEFAULT_MODEL); + + // Validate RFC exists if provided + let _rfc_doc = if let Some(rfc) = rfc_title { + Some( + state + .store + .find_document(DocType::Rfc, rfc) + .map_err(|_| ServerError::NotFound(format!("RFC '{}' not found", rfc)))?, + ) + } else { + None + }; + + // Get Ollama instance + let ollama_config = blue_core::LocalLlmConfig { + use_external: true, + model: model.to_string(), + ..Default::default() + }; + let ollama = EmbeddedOllama::new(&ollama_config); + + // Verify Ollama is running + if !ollama.is_ollama_running() { + return Err(ServerError::CommandFailed( + "Ollama not running. Start it with blue_llm_start or run 'ollama serve'.".to_string(), + )); + } + + // Check health + match ollama.health_check() { + HealthStatus::Healthy { .. 
} => {} + HealthStatus::Unhealthy { error } => { + return Err(ServerError::CommandFailed(format!( + "Ollama unhealthy: {}", + error + ))); + } + HealthStatus::NotRunning => { + return Err(ServerError::CommandFailed("Ollama not running.".to_string())); + } + } + + // Generate expert panel based on template + let panel_template = match template { + Some("infrastructure") => PanelTemplate::Infrastructure, + Some("product") => PanelTemplate::Product, + Some("ml") => PanelTemplate::MachineLearning, + Some("governance") => PanelTemplate::Governance, + _ => PanelTemplate::General, + }; + + let mut experts = panel_template.generate_experts(expert_count); + + // Make sure we don't exceed requested count + if experts.len() > expert_count { + experts.truncate(expert_count); + } + + // Create dialogue + let mut dialogue = AlignmentDialogue::new( + topic.to_string(), + constraint.map(String::from), + experts.clone(), + ); + dialogue.convergence_threshold = convergence; + dialogue.max_rounds = max_rounds; + dialogue.rfc_title = rfc_title.map(String::from); + + // Completion options for expert responses + let options = CompletionOptions { + max_tokens: 2048, + temperature: 0.8, + stop_sequences: vec!["---".to_string()], + }; + + // Run rounds + let mut round_num = 0; + let mut previous_score = 0u32; + + loop { + round_num += 1; + + // Check max rounds + if round_num > max_rounds { + dialogue.status = DialogueStatus::MaxRoundsReached; + break; + } + + // Run one round - need to pass copies/references that don't conflict + let (round, new_perspectives, new_tensions) = run_round( + &ollama, + model, + &options, + &dialogue.topic, + dialogue.constraint.as_deref(), + &dialogue.experts, + &dialogue.rounds, + round_num, + dialogue.perspectives.len(), + dialogue.tensions.len(), + )?; + + // Merge new perspectives and tensions + dialogue.perspectives.extend(new_perspectives); + for tension in new_tensions { + dialogue.tensions.push(tension); + } + + // Calculate velocity + let velocity = 
(round.total_score as i32) - (previous_score as i32); + previous_score = round.total_score; + + // Check convergence conditions: + // 1. Convergence threshold met + // 2. Velocity approaching zero (less than 2 points gained) + // 3. All tensions resolved + let tensions_resolved = dialogue.tensions.is_empty() || dialogue.tensions.iter().all(|t| t.status == TensionStatus::Resolved); + let velocity_stable = velocity.abs() < 2 && round_num > 2; + + dialogue.rounds.push(round); + + if dialogue.rounds.last().map(|r| r.convergence).unwrap_or(0.0) >= convergence { + dialogue.status = DialogueStatus::Converged; + break; + } + + if velocity_stable && tensions_resolved && round_num > 3 { + dialogue.status = DialogueStatus::Converged; + break; + } + } + + // Generate and save dialogue markdown + let markdown = generate_dialogue_markdown(&dialogue); + let dialogue_path = save_dialogue(state, &dialogue, &markdown)?; + + // Get final stats + let final_convergence = dialogue.rounds.last().map(|r| r.convergence).unwrap_or(0.0); + let total_rounds = dialogue.rounds.len(); + + let hint = match dialogue.status { + DialogueStatus::Converged => format!( + "Reached {:.0}% convergence in {} rounds.", + final_convergence * 100.0, + total_rounds + ), + DialogueStatus::MaxRoundsReached => format!( + "Stopped after {} rounds at {:.0}% convergence.", + total_rounds, + final_convergence * 100.0 + ), + _ => "Dialogue interrupted.".to_string(), + }; + + Ok(json!({ + "status": "success", + "message": blue_core::voice::info( + &format!("Alignment dialogue complete: {}", topic), + Some(&hint) + ), + "dialogue": { + "topic": topic, + "constraint": constraint, + "file": dialogue_path.display().to_string(), + "rounds": total_rounds, + "final_convergence": final_convergence, + "status": format!("{:?}", dialogue.status).to_lowercase(), + "expert_count": experts.len(), + "perspectives_surfaced": dialogue.perspectives.len(), + "tensions_resolved": dialogue.tensions.iter().filter(|t| t.status == 
TensionStatus::Resolved).count(), + "linked_rfc": rfc_title, + }, + "expert_panel": experts.iter().map(|e| json!({ + "id": e.id, + "name": e.name, + "tier": format!("{:?}", e.tier).to_lowercase(), + })).collect::<Vec<_>>(), + })) +} + +/// Build a summary of previous rounds for the prompt +fn summarize_previous_rounds(rounds: &[Round]) -> String { + if rounds.is_empty() { + return String::new(); + } + + let mut summary = String::new(); + for round in rounds { + summary.push_str(&format!("\n## Round {} Summary\n", round.number)); + summary.push_str(&format!("Convergence: {:.0}%\n", round.convergence * 100.0)); + + for resp in &round.responses { + summary.push_str(&format!( + "\n**{}**: {} (confidence: {:.1})\n", + resp.expert_id, resp.position, resp.confidence + )); + } + } + summary +} + +/// Run a single round of dialogue +/// Returns (Round, new_perspectives, new_tensions) +fn run_round( + ollama: &EmbeddedOllama, + model: &str, + options: &CompletionOptions, + topic: &str, + constraint: Option<&str>, + experts: &[blue_core::Expert], + previous_rounds: &[Round], + round_num: u32, + perspective_offset: usize, + tension_offset: usize, +) -> Result<(Round, Vec<Perspective>, Vec<Tension>), ServerError> { + let mut responses = Vec::new(); + let mut round_score = 0u32; + let mut new_perspectives = Vec::new(); + let mut new_tensions = Vec::new(); + + // Build summary of previous rounds + let previous_summary = summarize_previous_rounds(previous_rounds); + + for expert in experts { + // Build prompt for this expert + let prompt = build_expert_prompt( + expert, + topic, + constraint, + round_num, + &previous_summary, + ); + + // Generate response + let result = ollama + .generate(model, &prompt, options) + .map_err(|e| ServerError::CommandFailed(format!("LLM generation failed: {}", e)))?; + + // Parse response + let mut response = parse_expert_response(&expert.id, &result.text); + + // Track new perspectives + let local_perspective_offset = perspective_offset + new_perspectives.len(); + for (i, p) in 
response.perspectives.iter_mut().enumerate() { + p.id = format!("P{:02}", local_perspective_offset + i + 1); + p.round = round_num; + new_perspectives.push(p.clone()); + } + + // Track new tensions + let local_tension_offset = tension_offset + new_tensions.len(); + for (i, t) in response.tensions.iter_mut().enumerate() { + t.id = format!("T{}", local_tension_offset + i + 1); + new_tensions.push(t.clone()); + } + + round_score += response.score.total(); + responses.push(response); + } + + // Calculate convergence based on position similarity + let convergence = calculate_convergence(&responses); + + // Calculate velocity + let previous_total = previous_rounds.last().map(|r| r.total_score).unwrap_or(0); + let velocity = (round_score as i32) - (previous_total as i32); + + Ok(( + Round { + number: round_num, + responses, + total_score: round_score, + velocity, + convergence, + }, + new_perspectives, + new_tensions, + )) +} + +/// Calculate convergence based on position alignment +fn calculate_convergence(responses: &[ExpertResponse]) -> f64 { + if responses.is_empty() { + return 0.0; + } + + // Use confidence-weighted position clustering + // High confidence experts have more weight in determining convergence + let high_confidence: Vec<_> = responses + .iter() + .filter(|r| r.confidence >= 0.7) + .collect(); + + if high_confidence.is_empty() { + return 0.3; // Base convergence if no one is confident yet + } + + // Group by position similarity using first 30 chars as key + let mut position_groups: std::collections::HashMap<String, usize> = std::collections::HashMap::new(); + for response in &high_confidence { + let key: String = response.position.chars().take(30).collect::<String>().to_lowercase(); + *position_groups.entry(key).or_insert(0) += 1; + } + + let largest_group = position_groups.values().max().copied().unwrap_or(0); + largest_group as f64 / responses.len() as f64 +} + +/// Generate dialogue markdown +fn generate_dialogue_markdown(dialogue: &AlignmentDialogue) -> String { + let mut 
md = String::new(); + + // Title + md.push_str(&format!("# Alignment Dialogue: {}\n\n", dialogue.topic)); + + // Metadata + md.push_str("| | |\n|---|---|\n"); + md.push_str(&format!("| **Topic** | {} |\n", dialogue.topic)); + if let Some(ref c) = dialogue.constraint { + md.push_str(&format!("| **Constraint** | {} |\n", c)); + } + md.push_str(&format!( + "| **Format** | {} experts, {} rounds |\n", + dialogue.experts.len(), + dialogue.rounds.len() + )); + let final_conv = dialogue.rounds.last().map(|r| r.convergence).unwrap_or(0.0); + md.push_str(&format!( + "| **Final Convergence** | {:.0}% |\n", + final_conv * 100.0 + )); + md.push_str(&format!( + "| **Status** | {:?} |\n", + dialogue.status + )); + if let Some(ref rfc) = dialogue.rfc_title { + md.push_str(&format!("| **RFC** | {} |\n", rfc)); + } + md.push_str("\n---\n\n"); + + // Expert Panel + md.push_str("## Expert Panel\n\n"); + md.push_str("| ID | Expert | Tier | Perspective |\n"); + md.push_str("|----|--------|------|-------------|\n"); + for e in &dialogue.experts { + md.push_str(&format!( + "| {} | **{}** | {:?} | {} |\n", + e.id, e.name, e.tier, e.perspective + )); + } + md.push_str("\n"); + + // Perspectives Inventory + if !dialogue.perspectives.is_empty() { + md.push_str("## Perspectives Inventory\n\n"); + md.push_str("| ID | Description | Surfaced By | Round | Status |\n"); + md.push_str("|----|-------------|-------------|-------|--------|\n"); + for p in &dialogue.perspectives { + md.push_str(&format!( + "| {} | {} | {} | {} | {:?} |\n", + p.id, p.description, p.surfaced_by, p.round, p.status + )); + } + md.push_str("\n"); + } + + // Tensions + if !dialogue.tensions.is_empty() { + md.push_str("## Tensions\n\n"); + md.push_str("| ID | Description | Status |\n"); + md.push_str("|----|-------------|--------|\n"); + for t in &dialogue.tensions { + md.push_str(&format!( + "| {} | {} | {:?} |\n", + t.id, t.description, t.status + )); + } + md.push_str("\n"); + } + + // Rounds + for round in &dialogue.rounds 
{ + md.push_str(&format!("## Round {}\n\n", round.number)); + + for resp in &round.responses { + let expert = dialogue.experts.iter().find(|e| e.id == resp.expert_id); + let name = expert.map(|e| e.name.as_str()).unwrap_or(&resp.expert_id); + md.push_str(&format!("### {} ({})\n\n", name, resp.expert_id)); + md.push_str(&resp.content); + md.push_str("\n\n"); + } + + // Round scoreboard + md.push_str(&format!("### Round {} Scoreboard\n\n", round.number)); + md.push_str("| Expert | Position | Confidence | ALIGNMENT |\n"); + md.push_str("|--------|----------|------------|----------|\n"); + for resp in &round.responses { + // Truncate by characters, not bytes, so multi-byte UTF-8 text cannot split mid-character and panic + let position_display = if resp.position.chars().count() > 40 { + format!("{}...", resp.position.chars().take(40).collect::<String>()) + } else { + resp.position.clone() + }; + md.push_str(&format!( + "| {} | {} | {:.1} | {} |\n", + resp.expert_id, + position_display, + resp.confidence, + resp.score.total() + )); + } + md.push_str(&format!( + "\n**Convergence:** {:.0}% | **Velocity:** {:+} | **Total ALIGNMENT:** {}\n\n", + round.convergence * 100.0, + round.velocity, + round.total_score + )); + } + + // Recommendations (extracted from final round consensus) + md.push_str("## Recommendations\n\n"); + if let Some(final_round) = dialogue.rounds.last() { + // Take top 3 positions by confidence + let mut sorted_responses = final_round.responses.clone(); + sorted_responses.sort_by(|a, b| b.confidence.partial_cmp(&a.confidence).unwrap_or(std::cmp::Ordering::Equal)); + + for (i, resp) in sorted_responses.iter().take(3).enumerate() { + md.push_str(&format!("{}. 
**{}**: {}\n", i + 1, resp.expert_id, resp.position)); + } + } else { + md.push_str("*No rounds completed.*\n"); + } + + md.push_str("\n---\n\n"); + md.push_str("*Generated by Blue Alignment Dialogue Orchestration (RFC 0012)*\n"); + + md +} + +/// Save dialogue to file and SQLite +fn save_dialogue( + state: &mut ProjectState, + dialogue: &AlignmentDialogue, + markdown: &str, +) -> Result<PathBuf, ServerError> { + // Get next dialogue number + let dialogue_number = state + .store + .next_number(DocType::Dialogue) + .map_err(|e| ServerError::CommandFailed(e.to_string()))?; + + // Generate file path + let date = chrono::Local::now().format("%Y-%m-%d").to_string(); + let file_name = format!( + "{}-{}.dialogue.md", + date, + to_kebab_case(&dialogue.topic) + ); + let file_path = PathBuf::from("dialogues").join(&file_name); + let docs_path = state.home.docs_path.clone(); + let dialogue_path = docs_path.join(&file_path); + + // Create document in SQLite + let mut doc = Document::new(DocType::Dialogue, &dialogue.topic, "recorded"); + doc.number = Some(dialogue_number); + doc.file_path = Some(file_path.to_string_lossy().to_string()); + + let dialogue_id = state + .store + .add_document(&doc) + .map_err(|e| ServerError::CommandFailed(e.to_string()))?; + + // Link to RFC if provided + if let Some(ref rfc_title) = dialogue.rfc_title { + if let Ok(rfc_doc) = state.store.find_document(DocType::Rfc, rfc_title) { + if let (Some(rfc_id), Some(did)) = (rfc_doc.id, Some(dialogue_id)) { + let _ = state.store.link_documents(did, rfc_id, LinkType::DialogueToRfc); + } + } + } + + // Create dialogues directory if needed + if let Some(parent) = dialogue_path.parent() { + fs::create_dir_all(parent).map_err(|e| ServerError::CommandFailed(e.to_string()))?; + } + + // Write file + fs::write(&dialogue_path, markdown).map_err(|e| ServerError::CommandFailed(e.to_string()))?; + + Ok(dialogue_path) +} + +/// Convert string to kebab-case +fn to_kebab_case(s: &str) -> String { + s.to_lowercase() + .chars() + .map(|c| if 
c.is_alphanumeric() { c } else { '-' }) + .collect::<String>() + .split('-') + .filter(|s| !s.is_empty()) + .collect::<Vec<_>>() + .join("-") +} + +#[cfg(test)] +mod tests { + use super::*; + use blue_core::AlignmentScore; + + #[test] + fn test_to_kebab_case() { + assert_eq!(to_kebab_case("API Versioning Strategy"), "api-versioning-strategy"); + assert_eq!(to_kebab_case("Cross-Account IAM"), "cross-account-iam"); + } + + #[test] + fn test_calculate_convergence_single() { + let responses = vec![ExpertResponse { + expert_id: "DS".to_string(), + content: String::new(), + position: "Use semantic versioning".to_string(), + confidence: 0.8, + perspectives: Vec::new(), + tensions: Vec::new(), + refinements: Vec::new(), + concessions: Vec::new(), + resolved_tensions: Vec::new(), + score: AlignmentScore::default(), + }]; + + let conv = calculate_convergence(&responses); + assert!((conv - 1.0).abs() < 0.001); + } + + #[test] + fn test_calculate_convergence_empty() { + let responses: Vec<ExpertResponse> = Vec::new(); + let conv = calculate_convergence(&responses); + assert!((conv - 0.0).abs() < 0.001); + } + + #[test] + fn test_summarize_empty_rounds() { + let rounds: Vec<Round> = Vec::new(); + let summary = summarize_previous_rounds(&rounds); + assert!(summary.is_empty()); + } +} diff --git a/crates/blue-mcp/src/handlers/mod.rs b/crates/blue-mcp/src/handlers/mod.rs index 8f19f5d..1af2c43 100644 --- a/crates/blue-mcp/src/handlers/mod.rs +++ b/crates/blue-mcp/src/handlers/mod.rs @@ -3,6 +3,7 @@ //! Each module handles a specific document type or workflow. pub mod adr; +pub mod alignment; // RFC 0012: Alignment Dialogue Orchestration pub mod audit; // Health check (blue_health_check) pub mod audit_doc; // Audit documents (blue_audit_create, etc.) 
pub mod decision; diff --git a/crates/blue-mcp/src/server.rs b/crates/blue-mcp/src/server.rs index ae3716b..2f4cae9 100644 --- a/crates/blue-mcp/src/server.rs +++ b/crates/blue-mcp/src/server.rs @@ -205,7 +205,7 @@ impl BlueServer { }, { "name": "blue_rfc_update_status", - "description": "Update an RFC's status (draft -> accepted -> in-progress -> implemented).", + "description": "Update an RFC's status. WORKFLOW: Set to 'accepted' when RFC is approved, then use blue_worktree_create to start implementation. Status flow: draft -> accepted -> in-progress -> implemented.", "inputSchema": { "type": "object", "properties": { @@ -481,7 +481,7 @@ impl BlueServer { }, { "name": "blue_worktree_create", - "description": "Create an isolated git worktree for RFC implementation.", + "description": "Create an isolated git worktree for RFC implementation. WORKFLOW: Use this after an RFC is accepted (status='accepted'), before starting implementation. Creates a feature branch and isolated working directory. After implementation, use blue_rfc_complete then blue_pr_create.", "inputSchema": { "type": "object", "properties": { @@ -534,7 +534,7 @@ impl BlueServer { }, { "name": "blue_pr_create", - "description": "Create a PR with enforced base branch (develop, not main). If rfc is provided, title is formatted as 'RFC NNNN: Title Case Name'.", + "description": "Create a PR with enforced base branch (develop, not main). WORKFLOW: Use after blue_rfc_complete to submit implementation for review. After PR is merged, use blue_worktree_cleanup to finalize. If rfc is provided, title is formatted as 'RFC NNNN: Title Case Name'.", "inputSchema": { "type": "object", "properties": { @@ -1010,7 +1010,7 @@ impl BlueServer { }, { "name": "blue_rfc_complete", - "description": "Mark RFC as implemented based on plan progress. Requires at least 70% completion.", + "description": "Mark RFC as implemented based on plan progress. WORKFLOW: Use after completing core implementation work in the worktree. 
Requires at least 70% task completion. After this, use blue_pr_create to submit for review.",
                     "inputSchema": {
                         "type": "object",
                         "properties": {
@@ -1417,6 +1417,53 @@ impl BlueServer {
                         "required": ["title"]
                     }
                 },
+                // RFC 0012: Alignment Dialogue Orchestration
+                {
+                    "name": "blue_alignment_play",
+                    "description": "Run a multi-expert alignment dialogue to deliberate on a topic until convergence",
+                    "inputSchema": {
+                        "type": "object",
+                        "properties": {
+                            "topic": {
+                                "type": "string",
+                                "description": "The topic to deliberate on"
+                            },
+                            "constraint": {
+                                "type": "string",
+                                "description": "Key constraint or boundary for the discussion"
+                            },
+                            "expert_count": {
+                                "type": "integer",
+                                "default": 12,
+                                "description": "Number of experts in the panel"
+                            },
+                            "convergence": {
+                                "type": "number",
+                                "default": 0.95,
+                                "description": "Target convergence threshold (0.0-1.0)"
+                            },
+                            "max_rounds": {
+                                "type": "integer",
+                                "default": 12,
+                                "description": "Maximum rounds before stopping"
+                            },
+                            "rfc_title": {
+                                "type": "string",
+                                "description": "RFC to link the dialogue to"
+                            },
+                            "template": {
+                                "type": "string",
+                                "enum": ["infrastructure", "product", "ml", "governance", "general"],
+                                "description": "Expert panel template"
+                            },
+                            "model": {
+                                "type": "string",
+                                "description": "Ollama model to use (default: qwen2.5:7b)"
+                            }
+                        },
+                        "required": ["topic"]
+                    }
+                },
                 // Phase 8: Playwright verification
                 {
                     "name": "blue_playwright_verify",
@@ -2140,6 +2187,8 @@ impl BlueServer {
             "blue_dialogue_get" => self.handle_dialogue_get(&call.arguments),
             "blue_dialogue_list" => self.handle_dialogue_list(&call.arguments),
             "blue_dialogue_save" => self.handle_dialogue_save(&call.arguments),
+            // RFC 0012: Alignment Dialogue Orchestration
+            "blue_alignment_play" => self.handle_alignment_play(&call.arguments),
             // Phase 8: Playwright handler
             "blue_playwright_verify" => self.handle_playwright_verify(&call.arguments),
             // Phase 9: Post-mortem handlers
@@ -2224,41 +2273,92 @@ impl BlueServer {
             Ok(state) => {
                 let summary = state.status_summary();
 
-                let recommendations = if !summary.stalled.is_empty() {
-                    vec![format!(
-                        "'{}' might be stalled. Check if work is still in progress.",
-                        summary.stalled[0].title
-                    )]
+                // Build recommendations with MCP tool syntax (RFC 0011)
+                let (recommendations, next_action) = if !summary.stalled.is_empty() {
+                    let title = &summary.stalled[0].title;
+                    (
+                        vec![format!(
+                            "'{}' is in-progress but has no worktree. Use blue_worktree_create with title='{}' to work in isolation.",
+                            title, title
+                        )],
+                        Some(json!({
+                            "tool": "blue_worktree_create",
+                            "args": { "title": title },
+                            "hint": "Create worktree to continue work in isolation"
+                        }))
+                    )
                 } else if !summary.ready.is_empty() {
-                    vec![format!(
-                        "'{}' is ready to implement. Run 'blue worktree create {}' to start.",
-                        summary.ready[0].title, summary.ready[0].title
-                    )]
+                    let title = &summary.ready[0].title;
+                    (
+                        vec![format!(
+                            "'{}' is accepted and ready. Use blue_worktree_create with title='{}' to start implementation.",
+                            title, title
+                        )],
+                        Some(json!({
+                            "tool": "blue_worktree_create",
+                            "args": { "title": title },
+                            "hint": "Create worktree to start implementation"
+                        }))
+                    )
                 } else if !summary.drafts.is_empty() {
-                    vec![format!(
-                        "'{}' is in draft. Review and accept it when ready.",
-                        summary.drafts[0].title
-                    )]
+                    let title = &summary.drafts[0].title;
+                    (
+                        vec![format!(
+                            "'{}' is in draft. Use blue_rfc_update_status with title='{}' and status='accepted' when ready.",
+                            title, title
+                        )],
+                        Some(json!({
+                            "tool": "blue_rfc_update_status",
+                            "args": { "title": title, "status": "accepted" },
+                            "hint": "Accept the RFC to proceed with implementation"
+                        }))
+                    )
                 } else if !summary.active.is_empty() {
-                    vec![format!(
-                        "{} item(s) in progress. Keep at it.",
-                        summary.active.len()
-                    )]
+                    let title = &summary.active[0].title;
+                    (
+                        vec![format!(
+                            "{} item(s) in progress. Continue work on '{}', then use blue_rfc_complete when done.",
+                            summary.active.len(), title
+                        )],
+                        Some(json!({
+                            "tool": "blue_rfc_complete",
+                            "args": { "title": title },
+                            "hint": "Mark as implemented when core work is done"
+                        }))
+                    )
                 } else {
-                    vec!["Nothing pressing. Good time to plan something new.".to_string()]
+                    (
+                        vec!["Nothing in flight. Use blue_rfc_create to start something new.".to_string()],
+                        Some(json!({
+                            "tool": "blue_rfc_create",
+                            "args": {},
+                            "hint": "Create a new RFC to plan your next feature"
+                        }))
+                    )
                 };
 
-                Ok(json!({
+                let mut response = json!({
                     "recommendations": recommendations,
                     "hint": summary.hint
-                }))
+                });
+
+                if let Some(action) = next_action {
+                    response["next_action"] = action;
+                }
+
+                Ok(response)
             }
             Err(_) => {
                 Ok(json!({
                     "recommendations": [
-                        "Run 'blue init' to set up this project first."
+                        "Blue not initialized here. Use blue_guide to get started."
                     ],
-                    "hint": "Can't find Blue here."
+                    "hint": "Can't find Blue here.",
+                    "next_action": {
+                        "tool": "blue_guide",
+                        "args": { "action": "start" },
+                        "hint": "Start the interactive guide"
+                    }
                 }))
             }
         }
@@ -2409,6 +2509,14 @@ impl BlueServer {
         let doc = state.store.find_document(DocType::Rfc, title)
             .map_err(|e| ServerError::StateLoadFailed(e.to_string()))?;
 
+        // Check for worktree when going to in-progress
+        let has_worktree = state.has_worktree(title);
+        let worktree_warning = if status == "in-progress" && !has_worktree {
+            Some("No worktree exists for this RFC. Use blue_worktree_create to work in isolation.")
+        } else {
+            None
+        };
+
         // Update database
         state.store.update_document_status(DocType::Rfc, title, status)
             .map_err(|e| ServerError::StateLoadFailed(e.to_string()))?;
@@ -2421,7 +2529,30 @@ impl BlueServer {
             false
         };
 
-        Ok(json!({
+        // Build next_action for accepted status (RFC 0011)
+        let next_action = if status == "accepted" {
+            Some(json!({
+                "tool": "blue_worktree_create",
+                "args": { "title": title },
+                "hint": "Create a worktree to start implementation"
+            }))
+        } else if status == "in-progress" && has_worktree {
+            Some(json!({
+                "tool": "blue_rfc_complete",
+                "args": { "title": title },
+                "hint": "Mark as implemented when core work is done"
+            }))
+        } else if status == "implemented" {
+            Some(json!({
+                "tool": "blue_pr_create",
+                "args": {},
+                "hint": "Create a pull request for review"
+            }))
+        } else {
+            None
+        };
+
+        let mut response = json!({
             "status": "success",
             "title": title,
             "new_status": status,
@@ -2430,7 +2561,17 @@ impl BlueServer {
                 &format!("Updated '{}' to {}", title, status),
                 None
             )
-        }))
+        });
+
+        // Add optional fields
+        if let Some(action) = next_action {
+            response["next_action"] = action;
+        }
+        if let Some(warning) = worktree_warning {
+            response["warning"] = json!(warning);
+        }
+
+        Ok(response)
     }
 
     fn handle_rfc_plan(&mut self, args: &Option) -> Result {
@@ -3011,6 +3152,13 @@ impl BlueServer {
         crate::handlers::dialogue::handle_save(state, args)
     }
 
+    // RFC 0012: Alignment Dialogue Orchestration
+    fn handle_alignment_play(&mut self, args: &Option) -> Result {
+        let args = args.as_ref().ok_or(ServerError::InvalidParams)?;
+        let state = self.ensure_state_mut()?;
+        crate::handlers::alignment::handle_play(state, args)
+    }
+
     fn handle_playwright_verify(&mut self, args: &Option) -> Result {
         let args = args.as_ref().ok_or(ServerError::InvalidParams)?;
         crate::handlers::playwright::handle_verify(args)
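
Reviewer note: the status-update hunks above pick a `next_action` and an optional `warning` from the new status and whether a worktree exists. The sketch below restates that branching as a standalone, stdlib-only program so the transitions are easy to check in isolation. The `NextAction` struct and the free functions `next_action` and `worktree_warning` are hypothetical stand-ins for the `json!` values built in the diff; they are not part of the patch, and the real handler also carries `args`.

```rust
/// Hypothetical stand-in for the JSON `next_action` object built in the diff.
/// The real code uses `serde_json::json!` and also includes an `args` field.
#[derive(Debug, PartialEq)]
struct NextAction {
    tool: &'static str,
    hint: &'static str,
}

/// Mirrors the branch order of the status-update hunk:
/// accepted -> create a worktree, in-progress with a worktree -> complete,
/// implemented -> open a PR, anything else -> no suggestion.
fn next_action(status: &str, has_worktree: bool) -> Option<NextAction> {
    match (status, has_worktree) {
        ("accepted", _) => Some(NextAction {
            tool: "blue_worktree_create",
            hint: "Create a worktree to start implementation",
        }),
        ("in-progress", true) => Some(NextAction {
            tool: "blue_rfc_complete",
            hint: "Mark as implemented when core work is done",
        }),
        ("implemented", _) => Some(NextAction {
            tool: "blue_pr_create",
            hint: "Create a pull request for review",
        }),
        _ => None,
    }
}

/// Mirrors the worktree check: warn only when moving to in-progress
/// without a worktree.
fn worktree_warning(status: &str, has_worktree: bool) -> Option<&'static str> {
    if status == "in-progress" && !has_worktree {
        Some("No worktree exists for this RFC. Use blue_worktree_create to work in isolation.")
    } else {
        None
    }
}

fn main() {
    let cases = [
        ("accepted", false),
        ("in-progress", false),
        ("in-progress", true),
        ("implemented", true),
        ("draft", false),
    ];
    for (status, has_worktree) in cases {
        println!(
            "{} (worktree: {}) -> action: {:?}, warning: {:?}",
            status,
            has_worktree,
            next_action(status, has_worktree),
            worktree_warning(status, has_worktree)
        );
    }
}
```

One thing this makes visible: moving to `in-progress` without a worktree yields a warning but no `next_action`, even though the warning text itself names `blue_worktree_create`. Whether that case should also emit a `next_action` is a question for the patch author.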