feat: RFC 0048 expert pool implementation and documentation batch

## RFC 0048 Expert Pool Implementation
- Added tiered expert pools (Core/Adjacent/Wildcard) to dialogue handlers
- Implemented weighted random sampling for panel selection
- Added blue_dialogue_sample_panel MCP tool for manual round control
- Updated alignment-play skill with pool design instructions

## New RFCs
- 0044: RFC matching and auto-status (draft)
- 0045: MCP tool enforcement (draft)
- 0046: Judge-defined expert panels (superseded)
- 0047: Expert pool sampling architecture (superseded)
- 0048: Alignment expert pools (implemented)
- 0050: Graduated panel rotation (draft)

## Dialogues Recorded
- 2026-02-01T2026Z: Test expert pool feature
- 2026-02-01T2105Z: SQLite vs flat files
- 2026-02-01T2214Z: Guard command architecture

## Other Changes
- Added TODO.md for tracking work
- Updated expert-pools.md knowledge doc
- Removed deprecated alignment-expert agent
- Added spikes for SQLite assets and SDLC workflow gaps

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Eric Garcia 2026-02-01 19:26:41 -05:00
parent f5d3621495
commit d7db9c667d
18 changed files with 2622 additions and 282 deletions

.blue/docs/TODO.md Normal file

@ -0,0 +1,41 @@
Fix the heartbeat and put these back:
/Users/ericg/.claude/settings.json
```
"PreToolUse": [
{
"matcher": "blue_*",
"hooks": [
{
"type": "command",
"command": "blue session-heartbeat"
}
]
}
],
"SessionEnd": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "blue session-end"
}
]
}
]
```
/Users/ericg/letemcook/blue/.claude/settings.json
```
{
"matcher": "Write|Edit|MultiEdit",
"hooks": [
{
"type": "command",
"command": "blue guard --path=\"$TOOL_INPUT:file_path\""
}
]
}
```


@ -0,0 +1,52 @@
# Alignment Dialogue: Test Expert Pool Feature
**Draft**: Dialogue 2044
**Date**: 2026-02-01 20:26Z
**Status**: In Progress
**Participants**: 💙 Judge, 🧁 Muffin, 🧁 Cupcake, 🧁 Scone
## Expert Panel
| Agent | Role | Tier | Relevance | Emoji |
|-------|------|------|-----------|-------|
| 💙 Judge | Orchestrator | — | — | 💙 |
| 🧁 Muffin | Quality Engineer | Core | 0.95 | 🧁 |
| 🧁 Cupcake | Systems Thinker | Adjacent | 0.70 | 🧁 |
| 🧁 Scone | Domain Expert | Wildcard | 0.40 | 🧁 |
## Alignment Scoreboard
| Agent | Wisdom | Consistency | Truth | Relationships | **Total** |
|-------|--------|-------------|-------|---------------|----------|
| 🧁 Muffin | 0 | 0 | 0 | 0 | **0** |
| 🧁 Cupcake | 0 | 0 | 0 | 0 | **0** |
| 🧁 Scone | 0 | 0 | 0 | 0 | **0** |
**Total ALIGNMENT**: 0
## Perspectives Inventory
| ID | Agent | Perspective | Round |
|----|-------|-------------|-------|
| — | — | [Awaiting Round 0] | — |
## Tensions Tracker
| ID | Tension | Status | Raised | Resolved |
|----|---------|--------|--------|----------|
| — | [Awaiting Round 0] | — | — | — |
## Round 0: Opening Arguments
### Muffin 🧁
[Awaiting response]
### Cupcake 🧁
[Awaiting response]
### Scone 🧁
[Awaiting response]


@ -0,0 +1,75 @@
# Alignment Dialogue: SQLite vs Flat Files
**Draft**: Dialogue 2045
**Date**: 2026-02-01 21:05Z
**Status**: In Progress
**Participants**: 💙 Judge, 🧁 Muffin, 🧁 Cupcake, 🧁 Scone, 🧁 Eclair, 🧁 Donut
## Expert Pool
**Domain**: System Architecture
**Question**: Should Blue use SQLite or flat files for configuration state?
| Tier | Experts |
|------|--------|
| Core | Data Architect, Systems Engineer, Developer Experience Lead |
| Adjacent | Operations Engineer, Performance Engineer, CLI Tooling Expert |
| Wildcard | Unix Philosophy Advocate, Database Skeptic |
## Expert Panel
| Agent | Role | Tier | Relevance | Emoji |
|-------|------|------|-----------|-------|
| 💙 Judge | Orchestrator | — | — | 💙 |
| 🧁 Muffin | Data Architect | Core | 0.95 | 🧁 |
| 🧁 Cupcake | Operations Engineer | Adjacent | 0.75 | 🧁 |
| 🧁 Scone | Performance Engineer | Adjacent | 0.70 | 🧁 |
| 🧁 Eclair | CLI Tooling Expert | Adjacent | 0.65 | 🧁 |
| 🧁 Donut | Database Skeptic | Wildcard | 0.35 | 🧁 |
## Alignment Scoreboard
| Agent | Wisdom | Consistency | Truth | Relationships | **Total** |
|-------|--------|-------------|-------|---------------|----------|
| 🧁 Muffin | 0 | 0 | 0 | 0 | **0** |
| 🧁 Cupcake | 0 | 0 | 0 | 0 | **0** |
| 🧁 Scone | 0 | 0 | 0 | 0 | **0** |
| 🧁 Eclair | 0 | 0 | 0 | 0 | **0** |
| 🧁 Donut | 0 | 0 | 0 | 0 | **0** |
**Total ALIGNMENT**: 0
## Perspectives Inventory
| ID | Agent | Perspective | Round |
|----|-------|-------------|-------|
| — | — | [Awaiting Round 0] | — |
## Tensions Tracker
| ID | Tension | Status | Raised | Resolved |
|----|---------|--------|--------|----------|
| — | [Awaiting Round 0] | — | — | — |
## Round 0: Opening Arguments
### Muffin 🧁
[Awaiting response]
### Cupcake 🧁
[Awaiting response]
### Scone 🧁
[Awaiting response]
### Eclair 🧁
[Awaiting response]
### Donut 🧁
[Awaiting response]


@ -0,0 +1,75 @@
# Alignment Dialogue: Guard Command Architecture
**Draft**: Dialogue 2046
**Date**: 2026-02-01 22:14Z
**Status**: In Progress
**Participants**: 💙 Judge, 🧁 Muffin, 🧁 Cupcake, 🧁 Scone, 🧁 Eclair, 🧁 Donut
## Expert Pool
**Domain**: CLI Architecture
**Question**: Should the guard command run synchronously before tokio runtime initialization, or remain async?
| Tier | Experts |
|------|--------|
| Core | Systems Architect, Performance Engineer, Rust Expert |
| Adjacent | CLI UX Designer, DevOps Engineer, Security Analyst |
| Wildcard | Minimalist, Contrarian |
## Expert Panel
| Agent | Role | Tier | Relevance | Emoji |
|-------|------|------|-----------|-------|
| 💙 Judge | Orchestrator | — | — | 💙 |
| 🧁 Muffin | Systems Architect | Core | 0.95 | 🧁 |
| 🧁 Cupcake | CLI UX Designer | Adjacent | 0.70 | 🧁 |
| 🧁 Scone | DevOps Engineer | Adjacent | 0.65 | 🧁 |
| 🧁 Eclair | Security Analyst | Adjacent | 0.60 | 🧁 |
| 🧁 Donut | Minimalist | Wildcard | 0.40 | 🧁 |
## Alignment Scoreboard
| Agent | Wisdom | Consistency | Truth | Relationships | **Total** |
|-------|--------|-------------|-------|---------------|----------|
| 🧁 Muffin | 0 | 0 | 0 | 0 | **0** |
| 🧁 Cupcake | 0 | 0 | 0 | 0 | **0** |
| 🧁 Scone | 0 | 0 | 0 | 0 | **0** |
| 🧁 Eclair | 0 | 0 | 0 | 0 | **0** |
| 🧁 Donut | 0 | 0 | 0 | 0 | **0** |
**Total ALIGNMENT**: 0
## Perspectives Inventory
| ID | Agent | Perspective | Round |
|----|-------|-------------|-------|
| — | — | [Awaiting Round 0] | — |
## Tensions Tracker
| ID | Tension | Status | Raised | Resolved |
|----|---------|--------|--------|----------|
| — | [Awaiting Round 0] | — | — | — |
## Round 0: Opening Arguments
### Muffin 🧁
[Awaiting response]
### Cupcake 🧁
[Awaiting response]
### Scone 🧁
[Awaiting response]
### Eclair 🧁
[Awaiting response]
### Donut 🧁
[Awaiting response]


@ -0,0 +1,11 @@
# Plan: comprehensive-config-architecture
| | |
|---|---|
| **RFC** | comprehensive-config-architecture |
| **Status** | in-progress |
| **Updated** | 2026-01-31T00:12:42.095896+00:00 |
## Tasks
- [ ] Test task for worktree validation


@ -0,0 +1,138 @@
# RFC 0044: RFC Matching and Auto Status
| | |
|---|---|
| **Status** | Draft |
| **Date** | 2026-02-01 |
| **Source Spike** | [rfc-sdlc-workflow-gaps](../spikes/2026-02-01T0124Z-rfc-sdlc-workflow-gaps.wip.md) |
---
## Summary
Two gaps in the SDLC workflow:
1. **RFC matching fails for NNNN-slug patterns** - `find_document()` in `store.rs:1789` strips leading zeros with `trim_start_matches('0')` and then parses the entire remainder as an integer, which only works when the query is purely numeric. The pattern `0107-worker-job-id` fails because `107-worker-job-id` is not a valid integer.
2. **RFC status is not auto-updated on PR merge** - `handle_merge()` in `pr.rs:416` does not set the RFC status to "implemented". The link already exists (the worktrees table has `document_id` and `branch_name`), but there is no `get_worktree_by_branch()` function and no status-update code.
Both fixes are mechanical.
---
## Design
### Fix 1: Extract Leading Digits from NNNN-slug Patterns
**File:** `crates/blue-core/src/store.rs:1788-1798`
**Before:**
```rust
// Try number match
let trimmed = query.trim_start_matches('0');
if let Ok(num) = if trimmed.is_empty() {
    "0".parse()
} else {
    trimmed.parse::<i32>()
} {
    if let Ok(doc) = self.get_document_by_number(doc_type, num) {
        return Ok(doc);
    }
}
```
**After:**
```rust
// Try number match - extract leading digits from NNNN-slug format
let num_str: String = query.chars()
    .take_while(|c| c.is_ascii_digit())
    .collect();
if !num_str.is_empty() {
    let trimmed = num_str.trim_start_matches('0');
    if let Ok(num) = if trimmed.is_empty() {
        "0".parse()
    } else {
        trimmed.parse::<i32>()
    } {
        if let Ok(doc) = self.get_document_by_number(doc_type, num) {
            return Ok(doc);
        }
    }
}
```
This handles:
- `0107` → extracts `0107` → parses as 107
- `0107-worker-job-id` → extracts `0107` → parses as 107
- `worker-job-id` → extracts no digits → skips the number match and falls through to substring matching
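The three cases above can be checked in isolation with a small standalone helper. This is a sketch, not the code in `store.rs`: the name `leading_number` is hypothetical, and it relies on the fact that `str::parse::<i32>` already accepts leading zeros, so it omits the `trim_start_matches('0')` step.

```rust
/// Extracts the leading document number from a query such as `0107`
/// or `0107-worker-job-id`. Returns `None` when the query does not
/// start with a digit or the number does not fit in an `i32`.
/// (Hypothetical helper for illustration; not part of blue-core.)
pub fn leading_number(query: &str) -> Option<i32> {
    // Collect only the run of ASCII digits at the start of the query.
    let digits: String = query
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    if digits.is_empty() {
        return None;
    }
    // `parse` tolerates leading zeros: "0107" parses as 107.
    digits.parse::<i32>().ok()
}
```

Returning `Option<i32>` keeps the "no digits" case explicit, so the caller can fall through to substring matching without inspecting a parse error.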
---
### Fix 2: Auto-Update RFC Status on Merge
**Step 2a: Add `get_worktree_by_branch()` to store.rs**
```rust
pub fn get_worktree_by_branch(&self, branch_name: &str) -> Result<Option<Worktree>, StoreError> {
    match self.conn.query_row(
        "SELECT id, document_id, branch_name, worktree_path, created_at
         FROM worktrees WHERE branch_name = ?1",
        params![branch_name],
        |row| {
            Ok(Worktree {
                id: Some(row.get(0)?),
                document_id: row.get(1)?,
                branch_name: row.get(2)?,
                worktree_path: row.get(3)?,
                created_at: row.get(4)?,
            })
        },
    ) {
        Ok(wt) => Ok(Some(wt)),
        Err(rusqlite::Error::QueryReturnedNoRows) => Ok(None),
        Err(e) => Err(StoreError::Database(e.to_string())),
    }
}
```
**Step 2b: Update `handle_merge()` in pr.rs**
After successful merge (line 416), add:
```rust
Ok(()) => {
    // Auto-update RFC status to implemented
    let branch = get_current_branch(&state.home.root).ok();
    if let Some(ref b) = branch {
        if let Ok(Some(wt)) = state.store.get_worktree_by_branch(b) {
            if let Ok(doc) = state.store.get_document_by_id(wt.document_id) {
                if doc.status == "in-progress" {
                    let _ = state.store.update_document_status(
                        DocType::Rfc,
                        &doc.title,
                        "implemented"
                    );
                }
            }
        }
    }
    Ok(json!({ ... }))
}
```
---
## Test Plan
- [ ] `find_document("0044")` returns RFC 0044
- [ ] `find_document("0044-rfc-matching")` returns RFC 0044
- [ ] `find_document("rfc-matching")` returns RFC 0044
- [ ] Merge PR on RFC-linked branch → RFC status changes to "implemented"
- [ ] Merge PR on non-RFC branch → no error, no status change
---
*"Right then. Let's get to it."*
— Blue


@ -0,0 +1,73 @@
# RFC 0045: MCP Tool Enforcement
| | |
|---|---|
| **Status** | Draft |
| **Date** | 2026-02-01 |
---
## Summary
Claude bypasses Blue MCP tools (blue_rfc_create, etc.) and uses Write/Edit directly for .blue/docs/ files, causing index drift. This happens because MCP instructions are soft guidance without explicit prohibitions.
## Solution
Hybrid approach with defense in depth:
### 1. Update MCP Server Instructions
Add explicit prohibitions to `crates/blue-mcp/src/server.rs`:
```rust
"IMPORTANT: When working in repos with .blue/ directories:\n",
"- NEVER use Write/Edit to create files in .blue/docs/\n",
"- ALWAYS use blue_rfc_create for RFCs\n",
"- ALWAYS use blue_adr_create for ADRs\n",
"- ALWAYS use blue_spike_create for spikes\n",
"These tools maintain the index database. Direct file creation causes drift.\n\n",
```
### 2. Add PreToolUse Guard Hook
New CLI command: `blue guard-write <path>`
Behavior:
- If path matches `.blue/docs/**/*.md` → exit 1 with message: "Use blue_rfc_create/blue_adr_create/blue_spike_create instead"
- Otherwise → exit 0 (allow)
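The path test behind this behavior could look roughly like the sketch below. It is an illustration only: `is_guarded_doc` is a hypothetical name, and it assumes that a `.blue/docs/` directory anywhere in the path counts as a match, which the RFC does not specify.

```rust
use std::path::{Component, Path};

/// Returns true when `path` names a Markdown file somewhere under a
/// `.blue/docs/` directory, approximating the glob `.blue/docs/**/*.md`.
/// (Hypothetical helper; the real `blue guard-write` logic is not shown in the RFC.)
pub fn is_guarded_doc(path: &Path) -> bool {
    // Only `.md` files are guarded.
    let is_markdown = path.extension().map_or(false, |ext| ext == "md");
    if !is_markdown {
        return false;
    }
    // Keep ordinary path segments; drop `/`, `.`, `..`, and drive prefixes.
    let parts: Vec<_> = path
        .components()
        .filter_map(|c| match c {
            Component::Normal(s) => Some(s),
            _ => None,
        })
        .collect();
    // Every segment except the last one is a directory.
    let dirs = &parts[..parts.len().saturating_sub(1)];
    // Look for `.blue` immediately followed by `docs`.
    dirs.windows(2).any(|w| w[0] == ".blue" && w[1] == "docs")
}
```

A command built on this predicate would exit non-zero when it returns `true` and zero otherwise, matching the two bullets above.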
Hook config in `~/.claude/settings.json`:
```json
{
  "matcher": "Write",
  "hooks": [{
    "type": "command",
    "command": "blue guard-write"
  }]
}
```
### 3. Enhance Index Drift Visibility
When index drift is detected, surface it prominently in `blue_status` output so that it is clear `blue_sync` needs to run.
## Implementation
- [ ] Update `server.rs` instructions with explicit tool requirements
- [ ] Add `blue guard-write` CLI command to `src/commands/`
- [ ] Add PreToolUse hook config for Write tool
- [ ] Test enforcement in superviber-web
- [ ] Run `blue_sync` to fix existing drift
## Test Plan
- [ ] Create RFC via Write in test repo → should be blocked by guard hook
- [ ] Create RFC via blue_rfc_create → should succeed and index correctly
- [ ] Verify blue_status shows drift warning for unindexed files
- [ ] Verify blue_sync fixes existing drift
---
*"Right then. Let's get to it."*
— Blue


@ -0,0 +1,163 @@
# RFC 0046: Judge Defined Expert Panels
| | |
|---|---|
| **Status** | Superseded |
| **Date** | 2026-02-01 |
| **ADRs** | 0014 (Alignment Dialogue Agents) |
---
## Summary
The alignment dialogue system currently uses keyword matching against the topic title to select expert roles, falling back to generic roles like "Systems Thinker" and "Domain Expert". This produces inappropriate expert panels for domain-specific topics (e.g., investment strategy gets software engineering roles instead of investment analysts, risk managers, portfolio strategists). The Judge should be able to define custom expert panels appropriate for the specific problem being deliberated.
## Problem
When the Judge calls `blue_dialogue_create` for an alignment dialogue, expert roles are auto-selected via:
1. Keyword matching against topic title (e.g., "security" → "Security Architect")
2. Fallback to generic roles: Systems Thinker, Domain Expert, Devil's Advocate, etc.
This fails for:
- **Domain-specific topics**: Investment analysis gets "Systems Architect" instead of "Portfolio Manager"
- **Cross-functional topics**: A product launch might need Marketing, Legal, Finance perspectives
- **Novel domains**: Topics without keyword matches get only generic roles
The Judge understands the problem space after reading the topic. The Judge should select appropriate experts.
## Design
### New Parameter: `expert_panel`
Add an `expert_panel` parameter to `blue_dialogue_create`:
```json
{
"title": "Investment Strategy for Q3 Portfolio Rebalancing",
"alignment": true,
"agents": 5,
"expert_panel": [
"Investment Analyst",
"Risk Manager",
"Portfolio Strategist",
"Compliance Officer",
"Market Economist"
]
}
```
### Behavior
When `expert_panel` is provided:
- Use the provided roles in order
- Array length determines agent count (ignore `agents` param if both provided)
- Assign pastry names automatically (Muffin, Cupcake, Scone...)
- Assign tiers based on position: first ~33% Core, next ~42% Adjacent, final ~25% Wildcard
- Calculate relevance scores within each tier
When `expert_panel` is omitted:
- Alignment dialogues fail with the error "Alignment dialogues require expert_panel parameter"
- There is no keyword-matching fallback; that code path is removed entirely
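The position-based tier assignment can be sketched as follows. The RFC references a `tier_split` function without showing it, so the rounding rule here (round 33% and 25% to the nearest integer, give the rest to Adjacent) is an assumption chosen to reproduce the example panel below; `tiers_by_position` is a hypothetical helper name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Core,
    Adjacent,
    Wildcard,
}

/// Splits a panel of `count` agents into (core, adjacent, wildcard) sizes.
/// Assumed rule: Core is ~33% and Wildcard ~25%, each rounded to the
/// nearest integer; Adjacent receives whatever remains.
pub fn tier_split(count: usize) -> (usize, usize, usize) {
    let core = (count * 33 + 50) / 100;
    let wildcard = (count * 25 + 50) / 100;
    let adjacent = count.saturating_sub(core + wildcard);
    (core, adjacent, wildcard)
}

/// Assigns a tier to each panel position: Core first, then Adjacent,
/// then Wildcard, following the order of the `expert_panel` array.
pub fn tiers_by_position(count: usize) -> Vec<Tier> {
    let (core, adjacent, _wildcard) = tier_split(count);
    (0..count)
        .map(|i| {
            if i < core {
                Tier::Core
            } else if i < core + adjacent {
                Tier::Adjacent
            } else {
                Tier::Wildcard
            }
        })
        .collect()
}
```

Under this rule a 5-role panel splits 2/2/1 and a 12-role panel splits 4/5/3.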
### Example Output
```markdown
## Expert Panel
| Agent | Role | Tier | Relevance | Emoji |
|-------|------|------|-----------|-------|
| 🧁 Muffin | Investment Analyst | Core | 0.95 | 🧁 |
| 🧁 Cupcake | Risk Manager | Core | 0.90 | 🧁 |
| 🧁 Scone | Portfolio Strategist | Adjacent | 0.70 | 🧁 |
| 🧁 Eclair | Compliance Officer | Adjacent | 0.65 | 🧁 |
| 🧁 Donut | Market Economist | Wildcard | 0.40 | 🧁 |
```
### SKILL.md Update
Update the alignment-play skill to document the workflow:
```
## Round 0: Panel Design
Before creating the dialogue, the Judge:
1. Reads the topic/RFC thoroughly
2. Identifies relevant domains and expertise needed
3. Designs a panel of 3-12 experts appropriate to the problem
4. Creates the dialogue with the custom expert_panel
```
## Implementation
### Changes to `dialogue.rs`
1. **Remove** `ROLE_KEYWORDS` and `GENERAL_ROLES` constants
2. **Remove** `select_role_for_topic` function
3. **Modify** `handle_create` to parse `expert_panel` array
4. **Modify** `assign_pastry_agents` to accept optional `Vec<String>` roles
5. **Error** if `alignment: true` but no `expert_panel` provided
### Code Changes
```rust
// In handle_create:
let expert_panel: Option<Vec<String>> = args
    .get("expert_panel")
    .and_then(|v| v.as_array())
    .map(|arr| {
        arr.iter()
            .filter_map(|v| v.as_str().map(|s| s.to_string()))
            .collect()
    });

// Require expert_panel for alignment mode
if alignment && expert_panel.is_none() {
    return Err(ServerError::InvalidParams);
}

let agent_count = expert_panel.as_ref().map(|p| p.len()).unwrap_or(agent_count);
let agents = assign_pastry_agents(agent_count, expert_panel);
```
```rust
// Modified assign_pastry_agents:
pub fn assign_pastry_agents(count: usize, roles: Option<Vec<String>>) -> Vec<PastryAgent> {
let (core_count, adjacent_count, _wildcard_count) = tier_split(count);
(0..count)
.map(|i| {
let name = PASTRY_NAMES.get(i).unwrap_or(&"Pastry").to_string();
let role = roles
.as_ref()
.and_then(|r| r.get(i))
.cloned()
.unwrap_or_else(|| "Expert".to_string());
// ... tier/relevance assignment unchanged
})
.collect()
}
```
## Test Plan
- [ ] `blue_dialogue_create` with `expert_panel` creates correct roles
- [ ] `blue_dialogue_create` without `expert_panel` in alignment mode returns error
- [ ] Pastry names assigned correctly regardless of panel
- [ ] Tier distribution follows Core/Adjacent/Wildcard split
- [ ] `blue_dialogue_round_prompt` returns correct role in prompt
- [ ] End-to-end: alignment dialogue with custom panel runs successfully
## Migration
No data migration is needed, but this is a breaking change:
- Remove keyword-based role selection entirely
- Alignment dialogues now require `expert_panel`
- Non-alignment dialogues unaffected
---
*"The Judge sees the whole elephant. The Judge picks which blind men to summon."*
— Blue


@ -0,0 +1,422 @@
# RFC 0047: Expert Pool Sampling Architecture
| | |
|---|---|
| **Status** | Superseded |
| **Superseded By** | RFC 0048 (Alignment Expert Pools) |
| **Date** | 2026-02-01 |
| **ADRs** | 0014 (Alignment Dialogue Agents) |
| **Extends** | RFC 0046 (Judge Defined Expert Panels) |
---
## Summary
Extend the alignment dialogue system to support **larger expert pools** from which panels are sampled. Currently, `blue_dialogue_create(agents=12)` creates exactly 12 fixed agents. This RFC introduces a two-phase architecture: (1) Judge defines a domain-appropriate pool of 15-30 experts, (2) MCP server samples N experts from this pool, with optional per-round rotation.
## Problem
RFC 0046 addresses the issue of inappropriate auto-selected roles. However, it still creates a **fixed panel** of exactly N agents who participate in all rounds. This misses opportunities:
1. **Perspective diversity**: A larger pool enables rotation, bringing fresh perspectives each round
2. **Stochastic exploration**: Weighted random sampling may surface unexpected insights
3. **Tiered expertise**: Core experts can be retained while Wildcards rotate
4. **Demonstrable reasoning**: Users see the Judge's domain analysis reflected in pool design
Current architecture:
```
blue_dialogue_create(agents=12, expert_panel=[...12 roles...])
→ Creates 12 fixed agents
→ Same 12 participate in all rounds
```
Proposed architecture:
```
blue_dialogue_create(expert_pool=[...24 roles...], panel_size=12, rotation="wildcards")
→ Creates 24-expert domain pool
→ Samples 12 for Round 0
→ Rotates Wildcards each round (retains Core/Adjacent)
```
## Design
### Expert Pool Structure
The Judge creates a pool with tiered relevance:
```json
{
"expert_pool": {
"domain": "Investment Analysis",
"question": "Should Acme Trust add NVIDIA by trimming NVAI?",
"experts": [
{ "role": "Value Analyst", "tier": "Core", "relevance": 0.95 },
{ "role": "Growth Analyst", "tier": "Core", "relevance": 0.90 },
{ "role": "Risk Manager", "tier": "Core", "relevance": 0.85 },
{ "role": "Portfolio Strategist", "tier": "Core", "relevance": 0.80 },
{ "role": "ESG Analyst", "tier": "Adjacent", "relevance": 0.70 },
{ "role": "Quant Strategist", "tier": "Adjacent", "relevance": 0.65 },
{ "role": "Technical Analyst", "tier": "Adjacent", "relevance": 0.60 },
{ "role": "Behavioral Analyst", "tier": "Adjacent", "relevance": 0.55 },
{ "role": "Income Analyst", "tier": "Adjacent", "relevance": 0.50 },
{ "role": "Macro Economist", "tier": "Wildcard", "relevance": 0.40 },
{ "role": "Credit Analyst", "tier": "Wildcard", "relevance": 0.35 },
{ "role": "Contrarian", "tier": "Wildcard", "relevance": 0.30 },
{ "role": "Geopolitical Analyst", "tier": "Wildcard", "relevance": 0.25 },
{ "role": "Market Historian", "tier": "Wildcard", "relevance": 0.22 },
{ "role": "Options Strategist", "tier": "Wildcard", "relevance": 0.20 }
]
},
"panel_size": 12,
"rotation": "wildcards"
}
```
### Tier Distribution
For a pool of P experts with panel size N:
| Tier | Pool % | Panel % | Purpose |
|------|--------|---------|---------|
| **Core** | ~25% | ~33% | Domain essentials, always selected |
| **Adjacent** | ~40% | ~42% | Related expertise, high selection probability |
| **Wildcard** | ~35% | ~25% | Fresh perspectives, rotation candidates |
### Sampling Algorithm
```rust
/// Sample N experts from the pool for a round.
/// Sketch only: `load_round_0_panel`, `load_core_panel`, `load_adjacent_panel`,
/// and `used_wildcards` stand for persisted state that is not defined here.
/// Mapping the sampled experts to `PastryAgent`s is left to the caller.
fn sample_panel(pool: &ExpertPool, panel_size: usize, round: usize, rotation: RotationMode) -> Vec<Expert> {
    let (core_n, adj_n, wc_n) = tier_split(panel_size);
    match rotation {
        RotationMode::None => {
            // Round 0 selection persists for all rounds (current behavior)
            if round == 0 {
                let mut panel = weighted_sample(&pool.core, core_n);
                panel.extend(weighted_sample(&pool.adjacent, adj_n));
                panel.extend(weighted_sample(&pool.wildcard, wc_n));
                panel
            } else {
                // Return the same panel as round 0
                load_round_0_panel()
            }
        }
        RotationMode::Wildcards => {
            // Core/Adjacent persist; Wildcards resample each round
            let mut panel = if round == 0 { weighted_sample(&pool.core, core_n) } else { load_core_panel() };
            panel.extend(if round == 0 { weighted_sample(&pool.adjacent, adj_n) } else { load_adjacent_panel() });
            // Skip Wildcards that already appeared in earlier rounds
            let wc_remaining: Vec<Expert> = pool.wildcard.iter()
                .filter(|e| !used_wildcards.contains(&e.role))
                .cloned()
                .collect();
            panel.extend(weighted_sample(&wc_remaining, wc_n));
            panel
        }
        RotationMode::Full => {
            // Complete resample each round (respects relevance weights)
            weighted_sample(&pool.all, panel_size)
        }
    }
}

/// Weighted random sampling without replacement
fn weighted_sample(experts: &[Expert], n: usize) -> Vec<Expert> {
    let total_weight: f64 = experts.iter().map(|e| e.relevance).sum();
    let probs: Vec<f64> = experts.iter().map(|e| e.relevance / total_weight).collect();
    // Reservoir sampling with weights
    weighted_reservoir_sample(experts, probs, n)
}
```
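The RFC leaves `weighted_reservoir_sample` undefined. One standard way to sample without replacement in proportion to weights is the Efraimidis-Spirakis key method, sketched below. Everything here is illustrative: the `Expert` struct is a minimal stand-in, and `SplitMix64` is a small built-in generator used only so that the example needs no external crates; a real implementation would more likely use a crate such as `rand`.

```rust
/// Minimal pool entry. Field names mirror the RFC's JSON; the struct
/// itself is a stand-in for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Expert {
    pub role: String,
    pub relevance: f64,
}

/// Small SplitMix64 generator so the sketch has no dependencies.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in (0, 1]; never exactly zero.
    pub fn next_f64(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 0.5) / (1u64 << 53) as f64
    }
}

/// Weighted sampling without replacement (Efraimidis-Spirakis):
/// each expert draws key = u^(1/relevance) with u uniform, and the
/// `n` experts with the largest keys are selected.
pub fn weighted_sample(experts: &[Expert], n: usize, rng: &mut SplitMix64) -> Vec<Expert> {
    let mut keyed: Vec<(f64, &Expert)> = experts
        .iter()
        // Experts with non-positive relevance can never be selected.
        .filter(|e| e.relevance > 0.0)
        .map(|e| (rng.next_f64().powf(1.0 / e.relevance), e))
        .collect();
    // Sort by key, largest first.
    keyed.sort_by(|a, b| b.0.total_cmp(&a.0));
    keyed.into_iter().take(n).map(|(_, e)| e.clone()).collect()
}
```

Because every expert receives exactly one key, the result never contains duplicates, and asking for more experts than the tier holds simply returns the whole tier.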
### API Changes
#### `blue_dialogue_create` Extended Parameters
```json
{
"title": "NVIDIA Investment Decision",
"alignment": true,
"expert_pool": {
"domain": "Investment Analysis",
"experts": [
{ "role": "Value Analyst", "tier": "Core", "relevance": 0.95, "focus": "Intrinsic value, margin of safety" },
// ... 15-30 experts
]
},
"panel_size": 12,
"rotation": "wildcards" // "none" | "wildcards" | "full"
}
```
#### Backward Compatibility
- `expert_panel` (RFC 0046) still works → creates fixed panel, no pool
- `expert_pool` (this RFC) → creates pool with sampling
```rust
if let Some(pool) = args.get("expert_pool") {
// New pool-based architecture
create_with_pool(pool, panel_size, rotation)
} else if let Some(panel) = args.get("expert_panel") {
// RFC 0046 behavior - fixed panel
create_with_fixed_panel(panel)
} else {
// Error - alignment requires either pool or panel
Err(ServerError::InvalidParams)
}
```
### New Tool: `blue_dialogue_sample_panel`
For manual round-by-round control:
```json
{
"name": "blue_dialogue_sample_panel",
"description": "Sample a new panel from the expert pool for the next round",
"params": {
"dialogue_id": "nvidia-investment-decision",
"round": 1,
"retain_experts": ["muffin", "cupcake", "scone"], // Optional: keep specific experts
"exclude_experts": ["beignet"] // Optional: exclude specific experts
}
}
```
### Pool Persistence
Expert pools are stored per-dialogue:
```
{output_dir}/
├── expert-pool.json ← Full pool definition (Judge writes)
├── round-0/
│ ├── panel.json ← Sampled panel for this round
│ └── *.md ← Agent responses
├── round-1/
│ ├── panel.json ← May differ if rotation enabled
│ └── *.md
└── scoreboard.md
```
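A pair of path helpers keeps this layout in one place. The function names are hypothetical and `output_dir` is whatever directory the dialogue was created in; only the file and directory names come from the tree above.

```rust
use std::path::{Path, PathBuf};

/// Location of the full pool definition inside a dialogue's output directory.
/// (Hypothetical helper; file name taken from the layout above.)
pub fn pool_path(output_dir: &Path) -> PathBuf {
    output_dir.join("expert-pool.json")
}

/// Location of the sampled panel for a given round.
pub fn panel_path(output_dir: &Path, round: usize) -> PathBuf {
    output_dir.join(format!("round-{round}")).join("panel.json")
}
```

Building paths with `Path::join` rather than `format!` avoids hard-coding a separator or a base directory in each handler.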
### Judge Workflow
1. **Analyze problem**: Read RFC/topic, identify required expertise domains
2. **Design pool**: Create 15-30 experts across Core/Adjacent/Wildcard tiers
3. **Create dialogue**: Call `blue_dialogue_create` with `expert_pool`
4. **Run rounds**: MCP server handles sampling automatically
5. **Review selections**: Pool and panel visible in output files
## ADR 0014 Amendment
Add to ADR 0014:
```markdown
### Expert Pools (RFC 0047)
The Judge may create a **larger expert pool** from which panels are sampled:
| Concept | Description |
|---------|-------------|
| **Pool** | 15-30 domain-appropriate experts defined by Judge |
| **Panel** | N experts sampled from pool for a given round |
| **Sampling** | Weighted random selection respecting relevance scores |
| **Rotation** | Optional: Wildcards may rotate between rounds |
Pool design is a Judge responsibility. The Judge understands the problem domain after reading the RFC/topic and designs experts accordingly.
**Tier Distribution**:
- **Core** (~25% of pool, 33% of panel): Essential domain experts, always selected
- **Adjacent** (~40% of pool, 42% of panel): Related expertise, high probability
- **Wildcard** (~35% of pool, 25% of panel): Fresh perspectives, rotation candidates
**Rotation Modes**:
- `none`: Fixed panel (current behavior)
- `wildcards`: Core/Adjacent persist, Wildcards resample each round
- `full`: Complete resample each round (experimental)
```
## Skill Update: alignment-play
```markdown
## Phase 1: Pool Design
Before creating the dialogue, the Judge:
1. Reads the topic/RFC thoroughly
2. Identifies the **domain** (e.g., "Investment Analysis", "System Architecture")
3. Designs **15-30 experts** appropriate to the domain:
- **Core (4-8)**: Essential perspectives for this specific problem
- **Adjacent (6-12)**: Related expertise that adds depth
- **Wildcard (5-10)**: Fresh perspectives, contrarians, cross-domain insight
4. Assigns **relevance scores** (0.20-0.95) based on expected contribution
5. Calls `blue_dialogue_create` with the `expert_pool`
## Phase 2: Round Execution
The MCP server:
1. Samples `panel_size` experts from pool using weighted random selection
2. Higher relevance = higher selection probability
3. Core experts almost always selected; Wildcards provide variety
4. If rotation enabled, Wildcards resample each round
## Phase 3: Convergence
Same as current: velocity → 0 or tensions resolved
```
## Implementation
### Changes to `dialogue.rs`
```rust
/// Expert pool with tiered structure
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExpertPool {
    pub domain: String,
    pub experts: Vec<PoolExpert>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PoolExpert {
    pub role: String,
    pub tier: ExpertTier,
    pub relevance: f64,
    pub focus: Option<String>,
    pub bias: Option<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum ExpertTier {
    Core,
    Adjacent,
    Wildcard,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum RotationMode {
    None,      // Fixed panel all rounds
    Wildcards, // Core/Adjacent fixed, Wildcards rotate
    Full,      // Complete resample each round
}

/// Handle blue_dialogue_create with pool support
pub fn handle_create(state: &mut ProjectState, args: &Value) -> Result<Value, ServerError> {
    // ... existing validation ...
    if let Some(pool_json) = args.get("expert_pool") {
        let pool: ExpertPool = serde_json::from_value(pool_json.clone())?;
        let panel_size = args.get("panel_size").and_then(|v| v.as_u64()).unwrap_or(12) as usize;
        let rotation: RotationMode = args.get("rotation")
            .and_then(|v| v.as_str())
            .map(|s| match s {
                "wildcards" => RotationMode::Wildcards,
                "full" => RotationMode::Full,
                _ => RotationMode::None,
            })
            .unwrap_or(RotationMode::None);

        // Sample initial panel
        let agents = sample_panel_from_pool(&pool, panel_size);

        // Persist pool to output directory
        let pool_path = format!("{}/expert-pool.json", output_dir);
        fs::write(&pool_path, serde_json::to_string_pretty(&pool)?)?;
        // ... rest of dialogue creation ...
    }
}
```
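The handler above maps any unrecognized `rotation` string to `RotationMode::None`, so a typo such as `"wildcard"` silently disables rotation. A stricter alternative, offered here as a sketch rather than as the implemented behavior, is to parse through `FromStr` and reject unknown values. The `String` error type and the `rotation_or_default` helper are assumptions for illustration.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RotationMode {
    None,      // Fixed panel all rounds
    Wildcards, // Core/Adjacent fixed, Wildcards rotate
    Full,      // Complete resample each round
}

impl FromStr for RotationMode {
    type Err = String;

    /// Accepts exactly the three documented values; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "none" => Ok(RotationMode::None),
            "wildcards" => Ok(RotationMode::Wildcards),
            "full" => Ok(RotationMode::Full),
            other => Err(format!("unknown rotation mode: {other}")),
        }
    }
}

/// Defaults to `None` only when the parameter is absent.
/// (Hypothetical helper, not part of the RFC's handler.)
pub fn rotation_or_default(raw: Option<&str>) -> Result<RotationMode, String> {
    match raw {
        Some(s) => s.parse(),
        None => Ok(RotationMode::None),
    }
}
```

With this shape a missing parameter still yields the documented default, while a misspelled one surfaces as an error the caller can report.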
### New Handler: `handle_sample_panel`
```rust
pub fn handle_sample_panel(state: &ProjectState, args: &Value) -> Result<Value, ServerError> {
    let dialogue_id = args.get("dialogue_id").and_then(|v| v.as_str())
        .ok_or(ServerError::InvalidParams)?;
    let round = args.get("round").and_then(|v| v.as_u64())
        .ok_or(ServerError::InvalidParams)? as usize;

    // Load pool from dialogue directory
    let pool_path = format!("/tmp/blue-dialogue/{}/expert-pool.json", dialogue_id);
    let pool: ExpertPool = serde_json::from_str(&fs::read_to_string(&pool_path)?)?;

    // Parse the retain list (exclude_experts is not read in this snippet)
    let retain: Vec<String> = args.get("retain_experts")
        .and_then(|v| v.as_array())
        .map(|arr| arr.iter().filter_map(|v| v.as_str().map(String::from)).collect())
        .unwrap_or_default();

    // Sample new panel
    let panel = sample_panel_with_constraints(&pool, 12, round, &retain);

    // Persist panel for this round
    let panel_path = format!("/tmp/blue-dialogue/{}/round-{}/panel.json", dialogue_id, round);
    fs::write(&panel_path, serde_json::to_string_pretty(&panel)?)?;

    Ok(json!({
        "status": "success",
        "round": round,
        "panel": panel,
    }))
}
```
## Test Plan
- [ ] `blue_dialogue_create` with `expert_pool` creates pool file
- [ ] Initial panel respects tier distribution (33/42/25)
- [ ] Weighted sampling: higher relevance = higher selection probability
- [ ] `rotation: "wildcards"` keeps Core/Adjacent, rotates Wildcards
- [ ] `rotation: "none"` uses same panel all rounds
- [ ] `blue_dialogue_sample_panel` respects `retain_experts`
- [ ] Pool persists across rounds in output directory
- [ ] Backward compatibility: `expert_panel` (RFC 0046) still works
## Visualization (for demo)
The demo page can show:
```
┌─────────────────────────────────────────────────────────────┐
│ INVESTMENT EXPERT POOL 24 total │
├─────────────────────────────────────────────────────────────┤
│ CORE (6) ████████████████████ rel: 0.80-0.95 │
│ ✓ Value Analyst ✓ Growth Analyst ✓ Risk Manager │
│ ✓ Portfolio Strat ○ Fundamental ○ Tax Specialist │
├─────────────────────────────────────────────────────────────┤
│ ADJACENT (10) ████████████████ rel: 0.50-0.70 │
│ ✓ ESG Analyst ✓ Quant Strategist ✓ Technical Analyst │
│ ✓ Behavioral ✓ Income Analyst ○ Credit Analyst │
│ ○ Governance ○ Competitive ○ Regulatory │
│ ○ Momentum Trader │
├─────────────────────────────────────────────────────────────┤
│ WILDCARD (8) ████████ rel: 0.20-0.40 │
│ ✓ Macro Economist ✓ Contrarian ✓ Geopolitical │
│ ○ Market Historian ○ Options Strat ○ Ethicist │
│ ○ Retail Sentiment ○ Academic │
└─────────────────────────────────────────────────────────────┘
✓ = Selected for Round 0 ○ = Available in pool
[🎲 Resample Panel] ← Click to see stochastic selection
```
---
## Philosophy
> "The Judge sees the elephant. The Judge summons the right blind men."
The alignment dialogue system embodies the parable of the blind men and the elephant. Each expert touches a different part. Wisdom emerges from integration.
With expert pools:
- The Judge **designs** the population of potential perspectives
- The MCP server **samples** fairly from that population
- Rotation **refreshes** the conversation with new viewpoints
- The final verdict reflects **multiple samplings** of the elephant
This is ALIGNMENT by design: more blind men, more parts touched, more wisdom integrated.
---
*Blue*


@ -0,0 +1,376 @@
# RFC 0048: Alignment Expert Pools
| | |
|---|---|
| **Status** | Implemented |
| **Date** | 2026-02-01 |
| **ADRs** | 0014 (Alignment Dialogue Agents) |
| **Supersedes** | RFC 0046, RFC 0047 |
---
## Summary
The alignment dialogue system uses keyword matching to auto-select generic expert roles, producing inappropriate panels for domain-specific topics. This RFC introduces **Judge-defined expert pools**: the Judge designs a tiered pool of domain-appropriate experts, and the MCP server samples panels from this pool with optional per-round rotation.
## Problem
When `blue_dialogue_create` is called for an alignment dialogue, expert roles are auto-selected via:
1. Keyword matching against topic title (e.g., "security" → "Security Architect")
2. Fallback to generic roles: Systems Thinker, Domain Expert, Devil's Advocate
This fails for:
- **Domain-specific topics**: Investment analysis gets "Systems Architect" instead of "Portfolio Manager"
- **Cross-functional topics**: A product launch might need Marketing, Legal, Finance perspectives
- **Novel domains**: Topics without keyword matches get only generic roles
- **Perspective diversity**: Fixed panels miss opportunities for rotation and fresh viewpoints
The Judge understands the problem space after reading the topic. The Judge should design the expert pool.
## Design
### Expert Pool Structure
The Judge creates a pool with three tiers:
```json
{
"title": "NVIDIA Investment Decision",
"alignment": true,
"expert_pool": {
"domain": "Investment Analysis",
"question": "Should Acme Trust add NVIDIA by trimming NVAI?",
"experts": [
{ "role": "Value Analyst", "tier": "Core", "relevance": 0.95 },
{ "role": "Growth Analyst", "tier": "Core", "relevance": 0.90 },
{ "role": "Risk Manager", "tier": "Core", "relevance": 0.85 },
{ "role": "Portfolio Strategist", "tier": "Core", "relevance": 0.80 },
{ "role": "ESG Analyst", "tier": "Adjacent", "relevance": 0.70 },
{ "role": "Quant Strategist", "tier": "Adjacent", "relevance": 0.65 },
{ "role": "Technical Analyst", "tier": "Adjacent", "relevance": 0.60 },
{ "role": "Behavioral Analyst", "tier": "Adjacent", "relevance": 0.55 },
{ "role": "Income Analyst", "tier": "Adjacent", "relevance": 0.50 },
{ "role": "Macro Economist", "tier": "Wildcard", "relevance": 0.40 },
{ "role": "Contrarian", "tier": "Wildcard", "relevance": 0.35 },
{ "role": "Geopolitical Analyst", "tier": "Wildcard", "relevance": 0.30 },
{ "role": "Market Historian", "tier": "Wildcard", "relevance": 0.25 }
]
},
"panel_size": 7,
"rotation": "wildcards"
}
```
### Tier Distribution
| Tier | Pool % | Panel % | Purpose |
|------|--------|---------|---------|
| **Core** | ~30% | ~33% | Domain essentials, always selected |
| **Adjacent** | ~40% | ~42% | Related expertise, high selection probability |
| **Wildcard** | ~30% | ~25% | Fresh perspectives, rotation candidates |
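The panel percentages above can be turned into slot counts with integer arithmetic. This is a minimal sketch, not the shipped implementation: the sampling pseudocode in this RFC calls `tier_split` without showing its body, so the rounding rule here (round Core and Wildcard to nearest, give Adjacent the remainder) is an assumption.

```rust
/// Hypothetical helper: split a panel into (core, adjacent, wildcard) slots
/// using the ~33% / ~42% / ~25% targets from the table above.
/// The rounding rule is an assumption, not taken from the implementation.
fn tier_split(panel_size: usize) -> (usize, usize, usize) {
    // Round to nearest with integer arithmetic: (n * pct + 50) / 100.
    let core = (panel_size * 33 + 50) / 100;
    let wildcard = ((panel_size * 25 + 50) / 100).min(panel_size - core);
    // Adjacent absorbs the remainder so the three counts sum to panel_size.
    let adjacent = panel_size - core - wildcard;
    (core, adjacent, wildcard)
}
```

Under this rule a 7-expert panel gets 2 Core, 3 Adjacent, and 2 Wildcard slots, and a 12-expert panel gets 4, 5, and 3.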
### Rotation Modes
| Mode | Behavior |
|------|----------|
| `none` | Fixed panel for all rounds (default) |
| `wildcards` | Core/Adjacent persist, Wildcards resample each round |
| `full` | Complete resample each round |
### Sampling Algorithm
```rust
// Pseudocode sketch. `pool.core`, `pool.adjacent`, `pool.wildcard`, and `pool.all`
// are assumed to be the pool's experts filtered by tier; the `load_*` helpers are
// assumed to read the panel persisted for round 0.
fn sample_panel(pool: &ExpertPool, panel_size: usize, round: usize, rotation: RotationMode) -> Vec<PoolExpert> {
    let (core_n, adj_n, wc_n) = tier_split(panel_size);
    match rotation {
        RotationMode::None => {
            // Round 0 selection persists for all rounds
            if round == 0 {
                let mut panel = weighted_sample(&pool.core, core_n);
                panel.extend(weighted_sample(&pool.adjacent, adj_n));
                panel.extend(weighted_sample(&pool.wildcard, wc_n));
                panel
            } else {
                load_round_0_panel()
            }
        }
        RotationMode::Wildcards => {
            // Core/Adjacent persist, Wildcards resample
            let mut panel = if round == 0 { weighted_sample(&pool.core, core_n) } else { load_core() };
            panel.extend(if round == 0 { weighted_sample(&pool.adjacent, adj_n) } else { load_adjacent() });
            // Only wildcards that have not appeared in earlier rounds are eligible
            let unused_wc: Vec<PoolExpert> = pool.wildcard.iter()
                .filter(|e| !used_in_previous_rounds(e))
                .cloned()
                .collect();
            panel.extend(weighted_sample(&unused_wc, wc_n));
            panel
        }
        // Complete resample across all tiers
        RotationMode::Full => weighted_sample(&pool.all, panel_size),
    }
}
fn weighted_sample(experts: &[PoolExpert], n: usize) -> Vec<PoolExpert> {
    // Higher relevance = higher selection probability
    let total: f64 = experts.iter().map(|e| e.relevance).sum();
    let probs: Vec<f64> = experts.iter().map(|e| e.relevance / total).collect();
    weighted_reservoir_sample(experts, probs, n)
}
```
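The pseudocode above leaves `weighted_reservoir_sample` undefined. One way to sample without replacement by relevance is the Efraimidis-Spirakis key method, sketched here under stated assumptions: the `Expert` struct is a simplified stand-in, and `SplitMix64` is a tiny dependency-free generator used only to keep the example self-contained. It is not the generator the real code uses.

```rust
/// Minimal PRNG (SplitMix64) so the sketch has no external dependencies.
/// A stand-in only; it is not the generator used by the real code.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Simplified stand-in for a pool expert (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Expert {
    role: String,
    relevance: f64,
}

/// Weighted sampling without replacement: each expert draws the key
/// u^(1/relevance) and the n largest keys win, so higher relevance
/// means a higher chance of selection.
fn weighted_sample(experts: &[Expert], n: usize, rng: &mut SplitMix64) -> Vec<Expert> {
    let mut keyed: Vec<(f64, &Expert)> = experts
        .iter()
        .map(|e| (rng.next_f64().powf(1.0 / e.relevance), e))
        .collect();
    // Sort descending by key; total_cmp gives a total order over f64.
    keyed.sort_by(|a, b| b.0.total_cmp(&a.0));
    keyed.into_iter().take(n).map(|(_, e)| e.clone()).collect()
}
```

Asking for more experts than the slice holds simply returns every expert, so callers that need a hard guarantee on panel size should validate it before sampling.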
### API
#### `blue_dialogue_create`
```json
{
"title": "...",
"alignment": true,
"expert_pool": {
"domain": "string",
"question": "string (optional)",
"experts": [
{ "role": "string", "tier": "Core|Adjacent|Wildcard", "relevance": 0.0-1.0 }
]
},
"panel_size": 12,
"rotation": "none|wildcards|full"
}
```
**Required for alignment mode**: `expert_pool` with at least 3 experts.
**Validation**:
- Error if `alignment: true` but no `expert_pool`
- Error if `panel_size` > total experts in pool
- Error if relevance not in 0.0-1.0 range
- Warning if no Wildcard tier experts (groupthink risk)
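The pool-level rules above can be sketched as a single check that returns warnings on success. The function name, the simplified types, and the message wording are all hypothetical. The "`alignment: true` but no `expert_pool`" rule is left out because it is checked before a pool exists.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Core,
    Adjacent,
    Wildcard,
}

/// Simplified stand-in for the pool entry type (no serde in this sketch).
struct PoolExpert {
    role: String,
    tier: Tier,
    relevance: f64,
}

/// Returns Ok(warnings) when the pool is usable, Err(message) otherwise.
/// Hypothetical helper; messages are illustrative only.
fn validate_pool(experts: &[PoolExpert], panel_size: usize) -> Result<Vec<String>, String> {
    if experts.len() < 3 {
        return Err(format!("expert_pool needs at least 3 experts, got {}", experts.len()));
    }
    if panel_size > experts.len() {
        return Err(format!("panel_size {} exceeds pool size {}", panel_size, experts.len()));
    }
    // NaN fails the range check too, so it is rejected here as well.
    if let Some(bad) = experts.iter().find(|e| !(0.0..=1.0).contains(&e.relevance)) {
        return Err(format!("relevance {} for '{}' is outside 0.0-1.0", bad.relevance, bad.role));
    }
    let mut warnings = Vec::new();
    if !experts.iter().any(|e| e.tier == Tier::Wildcard) {
        warnings.push("no Wildcard experts in pool (groupthink risk)".to_string());
    }
    Ok(warnings)
}
```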
#### `blue_dialogue_sample_panel` (New)
Manual round-by-round control:
```json
{
"dialogue_title": "nvidia-investment-decision",
"round": 1,
"retain": ["Muffin", "Cupcake"],
"exclude": ["Beignet"]
}
```
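The RFC does not spell out how `retain` and `exclude` interact, so the following is only one plausible reading: retained agents keep their seats, excluded names never sit, and open seats are filled from a candidate list. In the real tool the candidates would presumably come from weighted sampling of the pool; here `candidates` is just a slice in priority order, and `next_panel` is a hypothetical name.

```rust
/// Hypothetical sketch of applying `retain` and `exclude` to build the next panel.
/// `previous` holds last round's agent names; `candidates` holds names that may
/// fill open seats, in priority order (an assumption of this sketch).
fn next_panel(
    previous: &[&str],
    candidates: &[&str],
    retain: &[&str],
    exclude: &[&str],
    panel_size: usize,
) -> Vec<String> {
    // Retained agents keep their seats unless they are also excluded.
    let mut panel: Vec<String> = previous
        .iter()
        .filter(|n| retain.contains(*n) && !exclude.contains(*n))
        .map(|n| n.to_string())
        .collect();
    // Fill the remaining seats, skipping excluded or already seated names.
    for name in candidates {
        if panel.len() >= panel_size {
            break;
        }
        if !exclude.contains(name) && !panel.iter().any(|p| p.as_str() == *name) {
            panel.push(name.to_string());
        }
    }
    panel
}
```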
### Output Structure
```
{output_dir}/
├── expert-pool.json # Full pool (Judge's design)
├── round-0/
│ ├── panel.json # Sampled panel for this round
│ └── *.md # Agent responses
├── round-1/
│ ├── panel.json # May differ if rotation enabled
│ └── *.md
└── scoreboard.md
```
### Dialogue Markdown
```markdown
## Expert Pool
**Domain**: Investment Analysis
**Question**: Should Acme Trust add NVIDIA by trimming NVAI?
| Tier | Experts |
|------|---------|
| Core | Value Analyst, Growth Analyst, Risk Manager, Portfolio Strategist |
| Adjacent | ESG Analyst, Quant Strategist, Technical Analyst, Behavioral Analyst, Income Analyst |
| Wildcard | Macro Economist, Contrarian, Geopolitical Analyst, Market Historian |
## Round 0 Panel
| Agent | Role | Tier | Relevance | Emoji |
|-------|------|------|-----------|-------|
| 🧁 Muffin | Value Analyst | Core | 0.95 | 🧁 |
| 🧁 Cupcake | Risk Manager | Core | 0.85 | 🧁 |
| 🧁 Scone | ESG Analyst | Adjacent | 0.70 | 🧁 |
| 🧁 Eclair | Technical Analyst | Adjacent | 0.60 | 🧁 |
| 🧁 Donut | Behavioral Analyst | Adjacent | 0.55 | 🧁 |
| 🧁 Brioche | Contrarian | Wildcard | 0.35 | 🧁 |
| 🧁 Croissant | Market Historian | Wildcard | 0.25 | 🧁 |
```
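The Round 0 table rows could be produced from a sampled panel roughly as follows. This is a sketch with assumed types: `SampledExpert` and `panel_row` are hypothetical, and the name list contains only the seven pastries that appear in the table above, so the real list may well be longer.

```rust
/// Hypothetical input type: one expert chosen by the sampler.
#[derive(Debug, Clone)]
struct SampledExpert {
    role: String,
    tier: String,
    relevance: f64,
}

/// Simplified stand-in for the agent type used in dialogues.
#[derive(Debug, Clone)]
struct PastryAgent {
    name: String,
    role: String,
    tier: String,
    relevance: f64,
}

// Names taken from the Round 0 table; the real list may differ.
const PASTRY_NAMES: [&str; 7] = [
    "Muffin", "Cupcake", "Scone", "Eclair", "Donut", "Brioche", "Croissant",
];

/// Assign names in order; fall back to a numbered name if the short list runs out.
fn assign_pastry_names(sampled: Vec<SampledExpert>) -> Vec<PastryAgent> {
    sampled
        .into_iter()
        .enumerate()
        .map(|(i, e)| {
            let name = PASTRY_NAMES
                .get(i)
                .map(|n| n.to_string())
                .unwrap_or_else(|| format!("Pastry{}", i + 1));
            PastryAgent { name, role: e.role, tier: e.tier, relevance: e.relevance }
        })
        .collect()
}

/// Render one markdown table row in the format shown above.
fn panel_row(a: &PastryAgent) -> String {
    format!("| 🧁 {} | {} | {} | {:.2} | 🧁 |", a.name, a.role, a.tier, a.relevance)
}
```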
## Implementation
### Data Structures
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExpertPool {
pub domain: String,
pub question: Option<String>,
pub experts: Vec<PoolExpert>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PoolExpert {
pub role: String,
pub tier: ExpertTier,
pub relevance: f64,
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub enum ExpertTier {
Core,
Adjacent,
Wildcard,
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize, Default)]
pub enum RotationMode {
#[default]
None,
Wildcards,
Full,
}
```
### Changes to `dialogue.rs`
1. **Remove**: `ROLE_KEYWORDS`, `GENERAL_ROLES`, `select_role_for_topic`
2. **Add**: `ExpertPool`, `PoolExpert`, `ExpertTier`, `RotationMode` structs
3. **Add**: `sample_panel_from_pool`, `weighted_sample` functions
4. **Modify**: `handle_create` to parse `expert_pool` and require it for alignment mode
5. **Modify**: `assign_pastry_agents` to accept sampled experts instead of generating roles
6. **Add**: `handle_sample_panel` for manual round control
### handle_create Changes
```rust
pub fn handle_create(state: &mut ProjectState, args: &Value) -> Result<Value, ServerError> {
    // ... existing validation ...
    if alignment {
        // Alignment mode requires a well-formed expert pool
        let pool_value = args.get("expert_pool")
            .ok_or_else(|| ServerError::InvalidParams("alignment requires expert_pool".into()))?;
        let pool: ExpertPool = serde_json::from_value(pool_value.clone())
            .map_err(|e| ServerError::InvalidParams(format!("invalid expert_pool: {}", e)))?;
        // Default panel size: the whole pool, capped at 12
        let panel_size = args.get("panel_size")
            .and_then(|v| v.as_u64())
            .unwrap_or(pool.experts.len().min(12) as u64) as usize;
        // Unrecognized rotation values fall back to RotationMode::None
        let rotation: RotationMode = args.get("rotation")
            .and_then(|v| v.as_str())
            .map(|s| match s {
                "wildcards" => RotationMode::Wildcards,
                "full" => RotationMode::Full,
                _ => RotationMode::None,
            })
            .unwrap_or_default();
        // Sample initial panel
        let sampled = sample_panel_from_pool(&pool, panel_size, 0, rotation);
        let agents = assign_pastry_names(sampled);
        // Persist pool
        let pool_path = format!("{}/expert-pool.json", output_dir);
        fs::write(&pool_path, serde_json::to_string_pretty(&pool)?)?;
        // ... continue with dialogue creation ...
    }
}
```
## Judge Workflow
### Phase 0: Pool Design
Before creating the dialogue, the Judge:
1. **Reads** the topic/RFC thoroughly
2. **Identifies** the domain (e.g., "Investment Analysis", "System Architecture")
3. **Designs** 8-24 experts appropriate to the domain:
- **Core (3-8)**: Essential perspectives for this specific problem
- **Adjacent (4-10)**: Related expertise that adds depth
- **Wildcard (3-6)**: Fresh perspectives, contrarians, cross-domain insight
4. **Assigns** relevance scores (0.20-0.95) based on expected contribution
5. **Creates** the dialogue with `expert_pool`
### Phase 1+: Round Execution
The MCP server:
1. Samples `panel_size` experts using weighted random selection
2. Higher relevance = higher selection probability
3. Core experts almost always selected; Wildcards provide variety
4. If rotation enabled, Wildcards resample each round
### Convergence
Same as ADR 0014: velocity → 0 for 3 consecutive rounds, or tensions resolved.
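The stop rule can be sketched as a small predicate. How velocity is measured is defined in ADR 0014 and not repeated here, so this sketch assumes one non-negative integer per completed round and treats "→ 0" as exactly zero; `has_converged` is a hypothetical name.

```rust
/// Sketch of the convergence rule: stop when velocity has been zero for the
/// last three rounds, or when at least one round has run and no tension is open.
/// The velocity representation is an assumption of this example.
fn has_converged(velocities: &[u32], open_tensions: usize) -> bool {
    let stalled = velocities.len() >= 3
        && velocities[velocities.len() - 3..].iter().all(|&v| v == 0);
    let resolved = !velocities.is_empty() && open_tensions == 0;
    stalled || resolved
}
```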
## Skill Update: alignment-play
```markdown
## Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--panel-size` | pool size, capped at 12 | Number of experts per round |
| `--rotation` | `none` | Rotation mode: none, wildcards, full |
| `--max-rounds` | `12` | Maximum rounds before stopping |
| `--rfc` | none | Link dialogue to an RFC |
## Expert Pool Design
The Judge designs the pool before creating the dialogue:
1. Analyze the problem domain
2. Identify 8-24 relevant expert roles
3. Assign tiers (Core/Adjacent/Wildcard)
4. Assign relevance scores (0.20-0.95)
5. Call blue_dialogue_create with expert_pool
Example pool for "API Rate Limiting Strategy":
| Role | Tier | Relevance |
|------|------|-----------|
| API Architect | Core | 0.95 |
| Platform Engineer | Core | 0.90 |
| Security Engineer | Core | 0.85 |
| SRE Lead | Adjacent | 0.70 |
| Developer Advocate | Adjacent | 0.65 |
| Cost Analyst | Adjacent | 0.55 |
| Customer Success | Wildcard | 0.40 |
| Chaos Engineer | Wildcard | 0.30 |
```
## Test Plan
- [ ] `blue_dialogue_create` requires `expert_pool` for alignment mode
- [ ] Error returned when `expert_pool` missing in alignment mode
- [ ] Pool persisted to `expert-pool.json` in output directory
- [ ] Weighted sampling: higher relevance = higher selection probability
- [ ] Tier distribution respected: ~33% Core, ~42% Adjacent, ~25% Wildcard
- [ ] `rotation: "none"` uses same panel all rounds
- [ ] `rotation: "wildcards"` keeps Core/Adjacent, rotates Wildcards
- [ ] `rotation: "full"` resamples completely each round
- [ ] `blue_dialogue_sample_panel` respects retain/exclude
- [ ] `blue_dialogue_round_prompt` returns correct role from sampled panel
- [ ] Pastry names assigned correctly to sampled experts
- [ ] End-to-end: alignment dialogue with custom pool runs successfully
## Migration
**Breaking change**: Remove all keyword-based role selection.
- Delete `ROLE_KEYWORDS` constant
- Delete `GENERAL_ROLES` constant
- Delete `select_role_for_topic` function
- Alignment dialogues require `expert_pool` parameter
- Non-alignment dialogues unaffected
---
*"The Judge sees the elephant. The Judge summons the right blind men. The sampling ensures no single perspective dominates."*
— Blue

@ -0,0 +1,255 @@
# RFC 0050: Graduated Panel Rotation
| | |
|---|---|
| **Status** | Draft |
| **Date** | 2026-02-01 |
| **ADRs** | 0014 (Alignment Dialogue Agents) |
| **Extends** | RFC 0048 (Alignment Expert Pools) |
---
## Summary
The current alignment dialogue system samples a fixed panel from the expert pool for Round 0 and uses **the same panel for all rounds**. This wastes the larger pool and misses opportunities for fresh perspectives. This RFC introduces **graduated panel rotation**: the Judge evolves the panel each round based on dialogue dynamics, with freedom to retain high performers, bring in fresh perspectives, and even create new experts to address emerging tensions.
## Problem
In the NVIDIA Investment Decision dialogue:
- **Pool size**: 22 experts across Core/Adjacent/Wildcard tiers
- **Panel size**: 12 experts
- **Actual behavior**: Same 12 experts for all 3 rounds
- **Expected behavior**: Panel evolves based on dialogue needs
The dialogue converged with contributions from Strudel (Automotive Tech Analyst) and Brioche (Options Strategist). But 10 experts in the pool **never participated**: Value Analyst, Data Center Specialist, Supply Chain Analyst, ESG Analyst, Quant Strategist, Behavioral Finance Expert, Energy Sector Analyst, Retail Investor Advocate, Regulatory Expert, and Gaming Industry Analyst.
Worse: when a tension emerged around regulatory risk, there was no mechanism to pull in the Regulatory Expert specifically to address it.
## Design
### Judge-Driven Panel Evolution
Instead of algorithmic rotation with fixed parameters, the **Judge decides** how to evolve the panel each round. The MCP server provides infrastructure; the Judge provides judgment.
### Rotation Mode: `graduated`
```json
{
"rotation": "graduated"
}
```
That's it. No `rotation_config`. The Judge receives guidelines in the skill prompt.
### Judge Guidelines (in alignment-play skill)
The skill prompt instructs the Judge on panel evolution principles:
```markdown
## Panel Evolution Guidelines
Between rounds, you decide how to evolve the panel. Consider:
### Retention Criteria
- **High scorers**: Experts who contributed sharp insights should continue
- **Unresolved advocates**: Experts defending positions with open tensions
- **Core relevance**: Experts central to the domain should anchor continuity
### Fresh Perspective Triggers
- **Stale consensus**: If the panel is converging too easily, bring challengers
- **Unexplored angles**: Pull in experts whose focus hasn't been represented
- **Low-scoring experts**: Consider rotating out experts who aren't contributing
### Targeted Expert Injection
When a specific tension emerges that no current expert can address:
1. Check if the pool has a relevant expert → pull them in
2. If not, **create a new expert** with the needed focus
Example: Tension T03 raises supply chain concentration risk, but no Supply Chain
Analyst is on the panel. Either pull from pool or create:
~~~json
{ "role": "Supply Chain Analyst", "tier": "adjacent", "focus": "Geographic concentration, single-source risk" }
~~~
### Panel Size Flexibility
- Target panel size is a guideline, not a constraint
- You may run a smaller panel if the dialogue is converging
- You may expand briefly to address a complex tension
### Expert Creation
You are not limited to the initial pool. If the dialogue surfaces a perspective
that no pooled expert covers, create one. The pool was your starting point,
not your ceiling.
```
### MCP Server Role
The server provides:
1. **Panel tracking**: Record which experts participated in which rounds
2. **Context briefs**: Generate summaries for fresh experts joining mid-dialogue
3. **Expert registry**: Accept new experts created by the Judge
4. **History persistence**: Store panel evolution for post-hoc analysis
The server does **not**:
- Decide which experts to retain
- Calculate overlap ratios
- Enforce tier-based rules
### API
#### `blue_dialogue_round_prompt`
When the Judge requests the next round, they specify the panel:
```json
{
"round": 1,
"panel": [
{ "name": "Muffin", "role": "Value Analyst", "retained": true },
{ "name": "Scone", "role": "Data Center Specialist", "source": "pool" },
{ "name": "Palmier", "role": "Supply Chain Risk Analyst", "source": "created", "focus": "Geographic concentration" }
]
}
```
The server:
- Validates expert names are unique
- Generates context briefs for non-retained experts
- Records the panel composition
- Returns prompts for each expert
#### Response
```json
{
"round": 1,
"panel_size": 12,
"retained": 6,
"from_pool": 5,
"created": 1,
"context_brief": "## Round 0 Summary\n...",
"expert_prompts": [...]
}
```
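The uniqueness check and the counts in the response can be sketched together. The types below are hypothetical: the request marks retained experts with `"retained": true` and others with `"source"`, and this sketch folds both into one `Source` enum for brevity.

```rust
use std::collections::HashSet;

/// Where a panel member came from (simplified from the request shape above).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    Retained,
    Pool,
    Created,
}

struct PanelEntry {
    name: String,
    role: String,
    source: Source,
}

#[derive(Debug, PartialEq)]
struct PanelSummary {
    panel_size: usize,
    retained: usize,
    from_pool: usize,
    created: usize,
}

/// Reject duplicate names, then count members by source.
fn summarize_panel(panel: &[PanelEntry]) -> Result<PanelSummary, String> {
    let mut seen = HashSet::new();
    for e in panel {
        // HashSet::insert returns false when the name was already present.
        if !seen.insert(e.name.as_str()) {
            return Err(format!("duplicate expert name: {}", e.name));
        }
    }
    let count = |s: Source| panel.iter().filter(|e| e.source == s).count();
    Ok(PanelSummary {
        panel_size: panel.len(),
        retained: count(Source::Retained),
        from_pool: count(Source::Pool),
        created: count(Source::Created),
    })
}
```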
### Persistence
```
{output_dir}/
├── expert-pool.json # Initial pool (Judge's starting point)
├── round-0/
│ └── panel.json # { "experts": [...] }
├── round-1/
│ └── panel.json # { "experts": [...], "retained": [...], "fresh": [...], "created": [...] }
└── round-2/
└── panel.json
```
## Dialogue Continuity
Fresh experts (from pool or created) receive a context brief:
```markdown
## Context for Round 1
You are joining this dialogue in Round 1. Here's what happened:
### Key Tensions Raised (Round 0)
- T01: Growth mandate vs. valuation discipline
- T02: Hedging income vs. conviction allocation
### Current Panel Position (Round 0)
- 10 experts: Don't Add
- 1 expert (Brioche): Options Reframe
- 1 expert (Strudel): Automotive Differentiation
### Your Task
Review these positions and contribute your perspective as {role}.
```
## Example: 3-Round Dialogue with Targeted Injection
**Initial Pool**: 22 experts
**Round 0 Panel**: 12 sampled experts
```
Round 0:
├── Panel deliberates
├── Tension T03 emerges: "What about Taiwan concentration risk?"
└── No Supply Chain expert on panel
Judge decision for Round 1:
├── Retain: 7 experts (high scorers + tension advocates)
├── Rotate out: 5 experts (low contribution)
├── Pull from pool: 4 experts including Supply Chain Analyst
├── Create: 1 new expert "Geopolitical Risk Analyst" (not in original pool)
└── New panel size: 12
Round 1:
├── Supply Chain Analyst addresses T03 directly
├── Geopolitical Risk Analyst adds Taiwan Strait context
├── T03 marked [RESOLVED] with synthesis
└── New tension T04 emerges around AI chip export controls
Judge decision for Round 2:
├── Retain: 8 experts (T04 is complex, needs continuity)
├── Pull from pool: 2 experts
├── Create: "Export Control Specialist" for T04
└── Smaller panel: 11 (dialogue converging)
```
**Result**:
- 18 of 22 pool experts participated
- 2 experts created on-demand
- All tensions addressed by relevant expertise
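Figures like these can be derived from the per-round `panel.json` files by taking set unions. A minimal sketch, assuming experts are identified by role string and that anything not in the initial pool counts as created; `utilization` is a hypothetical helper.

```rust
use std::collections::BTreeSet;

/// Returns (pool experts that sat on at least one panel,
///          panel members that were not in the initial pool).
fn utilization<'a>(pool: &[&'a str], rounds: &[Vec<&'a str>]) -> (usize, usize) {
    let pool_set: BTreeSet<&str> = pool.iter().copied().collect();
    // Union of every round's panel.
    let seen: BTreeSet<&str> = rounds.iter().flatten().copied().collect();
    let used = seen.intersection(&pool_set).count();
    let created = seen.difference(&pool_set).count();
    (used, created)
}
```

For the dialogue above this would report 18 pool experts used and 2 created.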
## Comparison to Current Modes
| Aspect | `none` | `wildcards` | `full` | `graduated` (new) |
|--------|--------|-------------|--------|-------------------|
| Pool utilization | ~50% | ~65% | 100% | High (Judge discretion) |
| Dialogue continuity | High | High | Low | High (retained experts) |
| Fresh perspectives | None | Some | All | As needed |
| Targeted expertise | No | No | No | **Yes** |
| Expert creation | No | No | No | **Yes** |
| Configurable | No | No | No | Via guidelines |
## Implementation
### Changes to `dialogue.rs`
1. Accept `panel` specification in `round_prompt` request
2. Track expert sources: `retained`, `pool`, `created`
3. Generate context briefs for non-retained experts
4. Persist panel history per round
### Changes to `alignment-play` skill
Add Judge guidelines for panel evolution (see above).
### No New Config Structs
The Judge's judgment replaces configuration. The server just records what the Judge decides.
## Test Plan
- [ ] Judge can specify panel composition in round prompt
- [ ] Fresh experts receive context briefs
- [ ] Created experts are registered and tracked
- [ ] Panel history persists across rounds
- [ ] Backward compatibility: `rotation: "none"` still works
## Philosophy
> "The Judge sees the elephant. The Judge summons the right blind men. And when a new part of the elephant emerges, the Judge can summon someone who wasn't in the original room."
The pool is a starting point, not a constraint. The Judge's job is to ensure every relevant perspective touches the elephant. Sometimes that means pulling from the pool. Sometimes that means creating a new expert on the spot.
This is ALIGNMENT by design: **responsive expertise** rather than **fixed sampling**.
---
*"The elephant is larger than we thought. Let me get someone who knows about tusks."*
— The Judge

@ -0,0 +1,102 @@
# Spike: SQLite Alignment Dialogue Assets
| | |
|---|---|
| **Status** | Complete |
| **Date** | 2026-01-31 |
| **Time Box** | 1 hour |
---
## Question
Would storing alignment dialogue assets (round outputs, scoreboard, tensions, agent responses) in SQLite be faster than the current file-based approach?
---
## Current Architecture
File-based storage in `/tmp/blue-dialogue/<topic>/`:
```
├── scoreboard.md (~500 bytes, Judge writes)
├── tensions.md (~1-2KB, Judge writes, agents read)
├── round-0.summary.md (~1-2KB, Judge writes, agents read)
├── round-0/
│ ├── muffin.md (~1.2KB, agent writes)
│ ├── cupcake.md (~1.2KB, agent writes)
│ └── ... (6-12 agents)
└── round-1/...
```
**Total per round**: ~15-25KB
## I/O Pattern Analysis
| Operation | Who | Concurrency | Size |
|-----------|-----|-------------|------|
| Write agent response | 6-12 agents | Parallel (separate files) | 1-1.5KB each |
| Read all agent files | Judge | Sequential | ~10KB |
| Write scoreboard | Judge | Single | ~500B |
| Write tensions | Judge | Single | ~1-2KB |
| Write summary | Judge | Single | ~1-2KB |
| Read context (next round) | Agents | Parallel | ~5KB each |
## Bottleneck Analysis
| Operation | Time |
|-----------|------|
| LLM inference per agent | **30-60 seconds** |
| File write | ~1-5ms |
| File read | ~1-5ms |
| All file I/O per round | ~50ms total |
**The actual bottleneck is LLM inference, not file I/O.** Even eliminating all file operations would save only ~50ms on a 3-5 minute round.
## SQLite Trade-offs
### Potential Pros
- Single file instead of directory tree
- Transactional writes
- Queryable (find all tensions across all dialogues)
- Integration with existing blue-core SQLite db
### Significant Cons
1. **Subagents use Write tool** → can't write to SQLite directly
- Would need new MCP tools: `blue_dialogue_write_response`, `blue_dialogue_read_context`
- Significant API surface increase
2. **Parallel writes require careful handling**
- SQLite allows only one writer at a time, so 6-12 agents writing simultaneously would serialize
- Would need WAL mode + careful transaction design
3. **Files are trivially debuggable**
- `cat`, `grep`, `less` just work
- SQLite requires tooling to inspect
4. **No performance gain**
- Bottleneck is LLM, not I/O
5. **More complexity for same result**
## The Real Problem
The real issue was never file I/O speed. Subagents weren't reliably writing their files:
1. **Cause**: the `alignment-expert` agent type had the Write tool listed but wasn't using it
2. **Fix**: switched to `general-purpose` agents, which have full tool access
This was a tool reliability / prompting issue, not a storage architecture issue.
## Conclusion
**Don't do this.** SQLite would add complexity without solving any real problem:
- Performance gain: negligible (~50ms on 3+ minute rounds)
- Debugging: harder (need SQLite tools vs cat/grep)
- Agent integration: would require new MCP tools
- Concurrency: more complex (SQLite write locks)
The file-based approach:
- Works with existing Write tool in Task agents
- Easily debuggable
- Naturally parallelizes (separate files)
- Matches how Claude Code agents already work
## Recommendation
Keep file-based approach. The "fix" was using `general-purpose` subagents, not changing storage.

@ -0,0 +1,123 @@
# Spike: RFC SDLC Workflow Gaps
| | |
|---|---|
| **Status** | In Progress |
| **Date** | 2026-02-01 |
| **Time Box** | 1 hour |
---
## Question
Why does RFC matching fail for NNNN-prefixed patterns, and why doesn't RFC status auto-update on PR merge?
---
## Findings
### Issue 1: RFC Matching Fails for `NNNN-slug` Patterns
**Location:** `crates/blue-core/src/store.rs:1749-1831`
The `find_document()` function tries matches in this order:
1. **Exact title match** (line 1751)
2. **Slug-to-title** (lines 1755-1786): `worker-job-id` → `worker job id`
3. **Number match** (lines 1788-1798): Parse as integer
4. **Substring match** (lines 1800-1824): `LIKE '%query%'`
**Root cause at line 1789:**
```rust
let trimmed = query.trim_start_matches('0');
if let Ok(num) = if trimmed.is_empty() {
"0".parse()
} else {
trimmed.parse::<i32>()
} { ... }
```
This only works for **pure numeric strings**. Given `0107-worker-job-id-integration`:
- `trim_start_matches('0')` → `"107-worker-job-id-integration"`
- `parse::<i32>()` → **fails** (not a number)
- Falls through to substring match
- `%0107-worker-job-id-integration%` matches nothing
**Why each pattern failed:**
| Pattern | Why Failed |
|---------|-----------|
| `0107` | Parsed as 107, but no RFC #107 exists |
| `0107-worker-job-id-integration` | Not a pure number, substring finds nothing |
| `worker-job-id-integration` | Substring match succeeds ✓ |
**Fix:** Extract the leading digits before the number parse:
```rust
// Try number match - extract leading digits from NNNN-slug format
let num_str = query.chars()
.take_while(|c| c.is_ascii_digit())
.collect::<String>();
let trimmed = num_str.trim_start_matches('0');
```
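Wrapped into a complete function, the fix might look like the sketch below. The spike only shows the two changed statements, so the function name `leading_rfc_number` and the `Option<i32>` return shape are assumptions made for illustration.

```rust
/// Parse the leading digits of a query such as "0107-worker-job-id-integration".
/// Returns None when the query does not start with a digit or the number
/// does not fit in an i32. Hypothetical helper, not the code in store.rs.
fn leading_rfc_number(query: &str) -> Option<i32> {
    let num_str: String = query.chars().take_while(|c| c.is_ascii_digit()).collect();
    if num_str.is_empty() {
        return None;
    }
    let trimmed = num_str.trim_start_matches('0');
    // An all-zero prefix such as "0000" trims to "", which means 0.
    if trimmed.is_empty() {
        Some(0)
    } else {
        trimmed.parse().ok()
    }
}
```

One consequence worth checking before adopting this rule: a slug that legitimately starts with a digit, such as `2fa-rollout`, would also be treated as a number match.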
---
### Issue 2: RFC Status Not Auto-Updating on Merge
**Location:** `crates/blue-mcp/src/handlers/pr.rs:341-441`
The `handle_merge()` function merges the PR but **never updates RFC status**.
**Existing automation:**
| Handler | Status Update |
|---------|--------------|
| `worktree.rs:218-221` | accepted → in-progress ✓ |
| `pr.rs:341-441` | **None** ❌ |
**Why it's missing - the data exists but isn't connected:**
1. **Worktrees table has the link:**
```sql
CREATE TABLE worktrees (
document_id INTEGER NOT NULL, -- Links to RFC
branch_name TEXT NOT NULL, -- The branch
...
)
```
2. **PR handler can get current branch:**
```rust
fn get_current_branch(repo_path: &Path) -> Result<String, String>
```
3. **But no way to look up worktree by branch:**
- `get_worktree(document_id)` - by RFC id only
- `list_worktrees()` - all worktrees
- **Missing:** `get_worktree_by_branch(branch_name)`
**Fix requires two changes:**
1. Add `get_worktree_by_branch()` to `store.rs`:
```rust
pub fn get_worktree_by_branch(&self, branch_name: &str) -> Result<Option<Worktree>, StoreError>
```
2. Update `handle_merge()` in `pr.rs` to:
- Get current branch
- Look up worktree by branch
- Get document (RFC) from worktree.document_id
- Update status to "implemented"
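The lookup chain in these steps (branch, then worktree, then RFC, then status) can be sketched with an in-memory stand-in for the store. Everything here is hypothetical apart from the `get_worktree_by_branch` name and the `document_id` / `branch_name` columns: the real store is SQLite-backed and returns `Result<Option<Worktree>, StoreError>`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Worktree {
    document_id: i64,
    branch_name: String,
}

/// In-memory stand-in for the SQLite-backed store (illustration only).
struct Store {
    worktrees: Vec<Worktree>,
    statuses: HashMap<i64, String>,
}

impl Store {
    /// The missing lookup: find a worktree by its branch name.
    fn get_worktree_by_branch(&self, branch_name: &str) -> Option<&Worktree> {
        self.worktrees.iter().find(|w| w.branch_name == branch_name)
    }

    /// After a successful merge: branch -> worktree -> RFC -> "implemented".
    /// Returns the updated document id, or None if the branch has no worktree.
    fn mark_implemented_for_branch(&mut self, branch_name: &str) -> Option<i64> {
        let doc_id = self.get_worktree_by_branch(branch_name)?.document_id;
        self.statuses.insert(doc_id, "implemented".to_string());
        Some(doc_id)
    }
}
```

Returning `None` for an unknown branch lets a merge of a branch with no linked RFC proceed without touching any status.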
---
## Summary
| Issue | Root Cause | Fix Location |
|-------|-----------|--------------|
| RFC matching | Number extraction only works on pure digits | `store.rs:1789` |
| Auto-status | No worktree→RFC lookup on merge | `pr.rs:416` + `store.rs` |
Both fixes are mechanical. Create an RFC to track implementation.

@ -1,61 +0,0 @@
---
name: alignment-expert
description: Expert agent for alignment dialogues. Produces focused perspectives with inline markers. Use when orchestrating multi-expert alignment dialogues via blue_dialogue_create.
tools: Read, Grep, Glob, Write
model: sonnet
---
You are an expert participant in an ALIGNMENT-seeking dialogue.
## CRITICAL: FILE OUTPUT PROTOCOL
**YOU MUST WRITE YOUR RESPONSE TO A FILE.** This is not optional.
Your prompt will specify an OUTPUT_FILE path. You MUST:
1. Use the Write tool to write your complete response to that file
2. AFTER writing succeeds, return a structured confirmation to the Judge
If you return your response text directly without writing to a file, **YOUR WORK WILL BE LOST** and you will fail your task.
## Your Role
- SURFACE perspectives others may have missed
- DEFEND valuable ideas with evidence, not ego
- CHALLENGE assumptions with curiosity, not destruction
- INTEGRATE perspectives that resonate
- CONCEDE gracefully when others see something you missed
Your contribution is scored on PRECISION, not volume.
One sharp insight beats ten paragraphs.
## Response Structure (Write This to the File)
```
[PERSPECTIVE P01: brief label]
Two to four sentences. No preamble.
[PERSPECTIVE P02: brief label] ← optional
One to two sentences.
[TENSION T01: brief description] ← optional
One sentence.
[REFINEMENT: description] or [CONCESSION: description] or [RESOLVED Tn] ← optional
One sentence each.
```
Nothing else. No introduction. No conclusion.
## Return Format (After Writing File)
After successfully writing your response to the file, return ONLY this structured confirmation:
```
FILE_WRITTEN: {path}
Perspectives: P01 [label], P02 [label]
Tensions: T01 [label] or none
Moves: [CONCESSION|REFINEMENT|RESOLVED] or none
Claim: [your single strongest claim in one sentence]
```
Five lines. The FILE_WRITTEN line confirms you wrote the file. Without it, your work is considered lost.

@ -9,7 +9,8 @@ use std::path::{Path, PathBuf};
 use std::process::Command;
 use blue_core::{DocType, Document, LinkType, ProjectState, title_to_slug};
-use serde::Serialize;
+use rand::Rng;
+use serde::{Deserialize, Serialize};
 use serde_json::{json, Value};
 use crate::error::ServerError;
@ -26,8 +27,54 @@ fn coerce_bool(v: &Value) -> Option<bool> {
 // ==================== Alignment Mode Types ====================
+/// Expert tier for pool-based sampling (RFC 0048)
+#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)]
+#[serde(rename_all = "lowercase")]
+pub enum ExpertTier {
+    Core,
+    Adjacent,
+    Wildcard,
+}
+impl std::fmt::Display for ExpertTier {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            ExpertTier::Core => write!(f, "Core"),
+            ExpertTier::Adjacent => write!(f, "Adjacent"),
+            ExpertTier::Wildcard => write!(f, "Wildcard"),
+        }
+    }
+}
+/// Rotation mode for expert panel sampling (RFC 0048)
+#[derive(Debug, Clone, Copy, Serialize, Deserialize, Default, PartialEq, Eq)]
+#[serde(rename_all = "lowercase")]
+pub enum RotationMode {
+    #[default]
+    None,
+    Wildcards,
+    Full,
+}
+/// A single expert in the pool (RFC 0048)
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct PoolExpert {
+    pub role: String,
+    pub tier: ExpertTier,
+    pub relevance: f64,
+}
+/// Expert pool with tiered structure (RFC 0048)
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ExpertPool {
+    pub domain: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub question: Option<String>,
+    pub experts: Vec<PoolExpert>,
+}
 /// A pastry-themed expert agent for alignment dialogues
-#[derive(Debug, Clone, Serialize)]
+#[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct PastryAgent {
     pub name: String,
     pub role: String,
@ -316,10 +363,6 @@ pub fn handle_create(state: &mut ProjectState, args: &Value) -> Result<Value, Se
         .get("alignment")
         .and_then(coerce_bool)
         .unwrap_or(false);
-    let agent_count = args
-        .get("agents")
-        .and_then(|v| v.as_u64())
-        .unwrap_or(3) as usize;
     let model = args
         .get("model")
         .and_then(|v| v.as_str())
@ -334,6 +377,33 @@ pub fn handle_create(state: &mut ProjectState, args: &Value) -> Result<Value, Se
         })
         .unwrap_or_default();
+    // RFC 0048: Expert pool parameters
+    let expert_pool: Option<ExpertPool> = args
+        .get("expert_pool")
+        .and_then(|v| serde_json::from_value(v.clone()).ok());
+    let panel_size = args
+        .get("panel_size")
+        .and_then(|v| v.as_u64())
+        .map(|n| n as usize);
+    let rotation: RotationMode = args
+        .get("rotation")
+        .and_then(|v| v.as_str())
+        .map(|s| match s {
+            "wildcards" => RotationMode::Wildcards,
+            "full" => RotationMode::Full,
+            _ => RotationMode::None,
+        })
+        .unwrap_or_default();
+    // RFC 0048: Alignment mode requires expert_pool
+    if alignment && expert_pool.is_none() {
+        return Err(ServerError::CommandFailed(
+            "Alignment dialogues require expert_pool parameter (RFC 0048)".to_string(),
+        ));
+    }
     // Validate RFC exists if provided
     let rfc_doc = if let Some(rfc) = rfc_title {
         Some(
@ -363,15 +433,20 @@ pub fn handle_create(state: &mut ProjectState, args: &Value) -> Result<Value, Se
     let dialogue_path = docs_path.join(&file_path);
     // Generate markdown content — alignment mode gets a different scaffold
-    let (markdown, pastry_agents) = if alignment {
-        let agents = assign_pastry_agents(agent_count, title);
+    let (markdown, pastry_agents, pool_for_response) = if alignment {
+        // RFC 0048: Use expert pool for alignment mode
+        let pool = expert_pool.unwrap(); // Safe: validated above
+        let size = panel_size.unwrap_or_else(|| pool.experts.len().min(12));
+        let sampled = sample_panel_from_pool(&pool, size);
+        let agents = assign_pastry_names(sampled);
         let md = generate_alignment_dialogue_markdown(
             title,
             dialogue_number,
             rfc_title,
             &agents,
+            Some(&pool),
         );
-        (md, Some(agents))
+        (md, Some(agents), Some(pool))
     } else {
         let md = generate_dialogue_markdown(
             title,
@ -380,7 +455,7 @@ pub fn handle_create(state: &mut ProjectState, args: &Value) -> Result<Value, Se
summary, summary,
content, content,
); );
(md, None) (md, None, None)
}; };
// Create dialogues directory if it doesn't exist // Create dialogues directory if it doesn't exist
@ -427,12 +502,23 @@ pub fn handle_create(state: &mut ProjectState, args: &Value) -> Result<Value, Se
ServerError::CommandFailed(format!("Failed to create output dir {}: {}", output_dir, e)) ServerError::CommandFailed(format!("Failed to create output dir {}: {}", output_dir, e))
})?; })?;
// RFC 0048: Persist expert pool to output directory
if let Some(ref pool) = pool_for_response {
let pool_path = format!("{}/expert-pool.json", output_dir);
let pool_json = serde_json::to_string_pretty(pool)
.map_err(|e| ServerError::CommandFailed(format!("Failed to serialize pool: {}", e)))?;
fs::write(&pool_path, pool_json)
.map_err(|e| ServerError::CommandFailed(format!("Failed to write pool: {}", e)))?;
}
let protocol = build_judge_protocol( let protocol = build_judge_protocol(
agents, agents,
&dialogue_path.display().to_string(), &dialogue_path.display().to_string(),
model, model,
&sources, &sources,
&output_dir, &output_dir,
pool_for_response.as_ref(),
rotation,
); );
// Extract instructions as prose so Claude reads them directly // Extract instructions as prose so Claude reads them directly
let instructions = protocol["instructions"].as_str().unwrap_or(""); let instructions = protocol["instructions"].as_str().unwrap_or("");
@ -729,55 +815,85 @@ fn generate_dialogue_markdown(
// ==================== Alignment Mode Helpers ==================== // ==================== Alignment Mode Helpers ====================
/// Expert roles keyed by topic keywords /// Weighted random sampling without replacement (RFC 0048)
const ROLE_KEYWORDS: &[(&[&str], &str)] = &[ /// Higher relevance = higher selection probability
(&["system", "architect", "infrastructure", "scale"], "Systems Architect"), fn weighted_sample(experts: &[PoolExpert], n: usize) -> Vec<PoolExpert> {
(&["security", "auth", "vulnerability", "trust"], "Security Architect"), if n >= experts.len() {
(&["api", "endpoint", "rest", "grpc", "protocol"], "API Designer"), return experts.to_vec();
(&["data", "database", "storage", "schema", "model"], "Data Architect"), }
(&["test", "quality", "qa", "reliability"], "Quality Engineer"),
(&["ux", "ui", "frontend", "user", "interface", "design"], "UX Architect"),
(&["perf", "performance", "latency", "throughput", "speed"], "Performance Engineer"),
(&["devops", "deploy", "ci", "cd", "pipeline", "ops"], "DevOps Architect"),
(&["ml", "ai", "model", "training", "inference"], "ML Engineer"),
(&["doc", "documentation", "spec", "rfc", "standard"], "Technical Writer"),
];
/// General-purpose roles used when keywords don't match let mut rng = rand::thread_rng();
const GENERAL_ROLES: &[&str] = &[ let mut remaining: Vec<_> = experts.iter().cloned().collect();
"Systems Thinker", let mut selected = Vec::with_capacity(n);
"Domain Expert",
"Devil's Advocate",
"Integration Specialist",
"Risk Analyst",
"First Principles Reasoner",
"Pattern Recognizer",
"Edge Case Hunter",
];
/// Select a role based on topic keywords for _ in 0..n {
fn select_role_for_topic(topic: &str, index: usize) -> &'static str { if remaining.is_empty() {
let topic_lower = topic.to_lowercase(); break;
// Try keyword matching first — pick the best match for this agent index
let mut matched_roles: Vec<&str> = Vec::new();
for (keywords, role) in ROLE_KEYWORDS {
if keywords.iter().any(|kw| topic_lower.contains(kw)) {
matched_roles.push(role);
} }
let total_weight: f64 = remaining.iter().map(|e| e.relevance).sum();
if total_weight <= 0.0 {
// Fall back to uniform sampling if weights are zero
let idx = rng.gen_range(0..remaining.len());
selected.push(remaining.remove(idx));
continue;
}
// Weighted selection
let mut threshold = rng.gen::<f64>() * total_weight;
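let mut idx = remaining.len() - 1; // if rounding leaves threshold > 0, fall back to the last expert rather than biasing toward the first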
for (i, expert) in remaining.iter().enumerate() {
threshold -= expert.relevance;
if threshold <= 0.0 {
idx = i;
break;
}
}
selected.push(remaining.remove(idx));
} }
if index < matched_roles.len() { selected
return matched_roles[index]; }
/// Sample a panel from an expert pool (RFC 0048)
pub fn sample_panel_from_pool(pool: &ExpertPool, panel_size: usize) -> Vec<PoolExpert> {
let (core_n, adj_n, wc_n) = tier_split(panel_size);
// Separate experts by tier
let core: Vec<_> = pool.experts.iter()
.filter(|e| e.tier == ExpertTier::Core)
.cloned()
.collect();
let adjacent: Vec<_> = pool.experts.iter()
.filter(|e| e.tier == ExpertTier::Adjacent)
.cloned()
.collect();
let wildcard: Vec<_> = pool.experts.iter()
.filter(|e| e.tier == ExpertTier::Wildcard)
.cloned()
.collect();
// Sample from each tier
let mut panel = Vec::new();
panel.extend(weighted_sample(&core, core_n));
panel.extend(weighted_sample(&adjacent, adj_n));
panel.extend(weighted_sample(&wildcard, wc_n));
// If we don't have enough in a tier, fill from others
while panel.len() < panel_size && panel.len() < pool.experts.len() {
let used_roles: std::collections::HashSet<_> = panel.iter().map(|e| &e.role).collect();
let remaining: Vec<_> = pool.experts.iter()
.filter(|e| !used_roles.contains(&e.role))
.cloned()
.collect();
if remaining.is_empty() {
break;
}
let sampled = weighted_sample(&remaining, 1);
panel.extend(sampled);
} }
// Fall back to general roles panel
let general_idx = if matched_roles.is_empty() {
index
} else {
index - matched_roles.len()
};
GENERAL_ROLES[general_idx % GENERAL_ROLES.len()]
} }
/// Compute tier boundaries for agent assignment /// Compute tier boundaries for agent assignment
@ -798,33 +914,23 @@ fn tier_split(count: usize) -> (usize, usize, usize) {
} }
} }
/// Assign pastry-themed agents with expert roles, tiers, and relevance /// Assign pastry names to sampled experts (RFC 0048)
pub fn assign_pastry_agents(count: usize, topic: &str) -> Vec<PastryAgent> { pub fn assign_pastry_names(sampled: Vec<PoolExpert>) -> Vec<PastryAgent> {
let (core_count, adjacent_count, _wildcard_count) = tier_split(count); sampled
.into_iter()
(0..count) .enumerate()
.map(|i| { .map(|(i, expert)| {
let name = if i < PASTRY_NAMES.len() { let name = if i < PASTRY_NAMES.len() {
PASTRY_NAMES[i].to_string() PASTRY_NAMES[i].to_string()
} else { } else {
format!("Pastry{}", i + 1) format!("Pastry{}", i + 1)
}; };
let role = select_role_for_topic(topic, i).to_string();
let (tier, relevance) = if i < core_count {
("Core", 0.95 - (i as f64 * 0.05))
} else if i < core_count + adjacent_count {
let adj_idx = i - core_count;
("Adjacent", 0.70 - (adj_idx as f64 * 0.05))
} else {
let wc_idx = i - core_count - adjacent_count;
("Wildcard", 0.40 - (wc_idx as f64 * 0.05))
};
PastryAgent { PastryAgent {
name, name,
role, role: expert.role,
emoji: "🧁".to_string(), emoji: "🧁".to_string(),
tier: tier.to_string(), tier: expert.tier.to_string(),
relevance, relevance: expert.relevance,
} }
}) })
.collect() .collect()
@ -836,6 +942,7 @@ pub fn generate_alignment_dialogue_markdown(
number: i32, number: i32,
rfc_title: Option<&str>, rfc_title: Option<&str>,
agents: &[PastryAgent], agents: &[PastryAgent],
pool: Option<&ExpertPool>,
) -> String { ) -> String {
let date = chrono::Utc::now().format("%Y-%m-%d").to_string(); let date = chrono::Utc::now().format("%Y-%m-%d").to_string();
let time = chrono::Utc::now().format("%H:%MZ").to_string(); let time = chrono::Utc::now().format("%H:%MZ").to_string();
@ -865,7 +972,35 @@ pub fn generate_alignment_dialogue_markdown(
} }
md.push('\n'); md.push('\n');
// Expert Panel table // Expert Pool section (RFC 0048)
if let Some(p) = pool {
md.push_str("## Expert Pool\n\n");
md.push_str(&format!("**Domain**: {}\n", p.domain));
if let Some(ref q) = p.question {
md.push_str(&format!("**Question**: {}\n", q));
}
md.push('\n');
// Group by tier
let core: Vec<_> = p.experts.iter().filter(|e| e.tier == ExpertTier::Core).collect();
let adjacent: Vec<_> = p.experts.iter().filter(|e| e.tier == ExpertTier::Adjacent).collect();
let wildcard: Vec<_> = p.experts.iter().filter(|e| e.tier == ExpertTier::Wildcard).collect();
md.push_str("| Tier | Experts |\n");
md.push_str("|------|--------|\n");
if !core.is_empty() {
md.push_str(&format!("| Core | {} |\n", core.iter().map(|e| e.role.as_str()).collect::<Vec<_>>().join(", ")));
}
if !adjacent.is_empty() {
md.push_str(&format!("| Adjacent | {} |\n", adjacent.iter().map(|e| e.role.as_str()).collect::<Vec<_>>().join(", ")));
}
if !wildcard.is_empty() {
md.push_str(&format!("| Wildcard | {} |\n", wildcard.iter().map(|e| e.role.as_str()).collect::<Vec<_>>().join(", ")));
}
md.push('\n');
}
// Expert Panel table (sampled for this dialogue)
md.push_str("## Expert Panel\n\n"); md.push_str("## Expert Panel\n\n");
md.push_str("| Agent | Role | Tier | Relevance | Emoji |\n"); md.push_str("| Agent | Role | Tier | Relevance | Emoji |\n");
md.push_str("|-------|------|------|-----------|-------|\n"); md.push_str("|-------|------|------|-----------|-------|\n");
@ -919,6 +1054,8 @@ pub fn build_judge_protocol(
model: &str, model: &str,
sources: &[String], sources: &[String],
output_dir: &str, output_dir: &str,
pool: Option<&ExpertPool>,
rotation: RotationMode,
) -> Value { ) -> Value {
let agent_list: Vec<Value> = agents let agent_list: Vec<Value> = agents
.iter() .iter()
@ -1033,9 +1170,8 @@ Then spawn ALL {agent_count} experts in a SINGLE message with {agent_count} Task
Multiple Task calls in one message run as parallel subagents. Multiple Task calls in one message run as parallel subagents.
Each Task call uses the prompt from blue_dialogue_round_prompt: Each Task call uses the prompt from blue_dialogue_round_prompt:
- subagent_type: "alignment-expert" (from task_params in response) - subagent_type: "general-purpose" (from task_params in response)
- description: "🧁 Muffin expert deliberation" (from task_params in response) - description: "🧁 Muffin expert deliberation" (from task_params in response)
- max_turns: 5 (from task_params in response)
- prompt: the "prompt" field from blue_dialogue_round_prompt response (already substituted) - prompt: the "prompt" field from blue_dialogue_round_prompt response (already substituted)
All {agent_count} results return when complete WITH STRUCTURED CONFIRMATIONS. All {agent_count} results return when complete WITH STRUCTURED CONFIRMATIONS.
@ -1104,7 +1240,7 @@ NOTE: blue_dialogue_round_prompt handles round-specific context automatically:
.join(", "), .join(", "),
); );
json!({ let mut result = json!({
"instructions": instructions, "instructions": instructions,
"agent_prompt_template": agent_prompt_template, "agent_prompt_template": agent_prompt_template,
"agents": agent_list, "agents": agent_list,
@ -1112,12 +1248,28 @@ NOTE: blue_dialogue_round_prompt handles round-specific context automatically:
"model": model, "model": model,
"sources": sources, "sources": sources,
"output_dir": output_dir, "output_dir": output_dir,
"rotation": format!("{:?}", rotation).to_lowercase(),
"convergence": { "convergence": {
"max_rounds": 5, "max_rounds": 5,
"velocity_threshold": 0.1, "velocity_threshold": 0.1,
"tension_resolution_gate": true, "tension_resolution_gate": true,
}, },
}) });
// RFC 0048: Include pool info if present
if let Some(p) = pool {
result.as_object_mut().unwrap().insert(
"expert_pool".to_string(),
json!({
"domain": p.domain,
"question": p.question,
"total_experts": p.experts.len(),
"pool_file": format!("{}/expert-pool.json", output_dir),
}),
);
}
result
} }
/// Convert slug to title case /// Convert slug to title case
@ -1191,6 +1343,7 @@ pub fn handle_round_prompt(args: &Value) -> Result<Value, ServerError> {
// Build context instructions based on round // Build context instructions based on round
let context_instructions = if round == 0 { let context_instructions = if round == 0 {
// Round 0: No prior context to read, but agents can research if needed
String::new() String::new()
} else { } else {
format!( format!(
@ -1275,11 +1428,127 @@ Five lines. The FILE_WRITTEN line proves you wrote the file. Without it, the Jud
"task_params": { "task_params": {
"subagent_type": "general-purpose", "subagent_type": "general-purpose",
"description": format!("{} {} expert deliberation", agent_emoji, agent_name), "description": format!("{} {} expert deliberation", agent_emoji, agent_name),
"max_turns": 5,
} }
})) }))
} }
/// Handle blue_dialogue_sample_panel (RFC 0048)
///
/// Sample a new panel from the expert pool for manual round control.
pub fn handle_sample_panel(args: &Value) -> Result<Value, ServerError> {
let dialogue_title = args
.get("dialogue_title")
.and_then(|v| v.as_str())
.ok_or(ServerError::InvalidParams)?;
let round = args
.get("round")
.and_then(|v| v.as_u64())
.ok_or(ServerError::InvalidParams)? as usize;
let panel_size = args
.get("panel_size")
.and_then(|v| v.as_u64())
.map(|n| n as usize)
.unwrap_or(12);
// Parse retain/exclude lists
let retain: Vec<String> = args
.get("retain")
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|v| v.as_str().map(String::from))
.collect()
})
.unwrap_or_default();
let exclude: Vec<String> = args
.get("exclude")
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|v| v.as_str().map(String::from))
.collect()
})
.unwrap_or_default();
// Load pool from dialogue directory
let slug = title_to_slug(dialogue_title);
let pool_path = format!("/tmp/blue-dialogue/{}/expert-pool.json", slug);
let pool_content = fs::read_to_string(&pool_path).map_err(|e| {
ServerError::CommandFailed(format!(
"Failed to read expert pool at {}: {}. Did you create the dialogue with expert_pool?",
pool_path, e
))
})?;
let pool: ExpertPool = serde_json::from_str(&pool_content).map_err(|e| {
ServerError::CommandFailed(format!("Failed to parse expert pool: {}", e))
})?;
// Filter pool based on retain/exclude
let filtered: Vec<PoolExpert> = pool
.experts
.iter()
.filter(|e| {
let role_lower = e.role.to_lowercase();
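// Keep only experts whose role matches the retain list (an empty list keeps everyone).
// Note: this narrows the pool to the retained roles; it does not force them onto the panel.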
let in_retain = retain.is_empty()
|| retain.iter().any(|r| role_lower.contains(&r.to_lowercase()));
// Exclude if in exclude list
let in_exclude = exclude.iter().any(|x| role_lower.contains(&x.to_lowercase()));
in_retain && !in_exclude
})
.cloned()
.collect();
if filtered.is_empty() {
return Err(ServerError::CommandFailed(
"No experts remain after filtering. Check retain/exclude parameters.".to_string(),
));
}
// Create filtered pool for sampling
let filtered_pool = ExpertPool {
domain: pool.domain.clone(),
question: pool.question.clone(),
experts: filtered,
};
// Sample panel
let sampled = sample_panel_from_pool(&filtered_pool, panel_size);
let agents = assign_pastry_names(sampled);
// Create round directory and save panel
let output_dir = format!("/tmp/blue-dialogue/{}", slug);
let round_dir = format!("{}/round-{}", output_dir, round);
fs::create_dir_all(&round_dir).map_err(|e| {
ServerError::CommandFailed(format!("Failed to create round dir: {}", e))
})?;
let panel_path = format!("{}/panel.json", round_dir);
let panel_json = serde_json::to_string_pretty(&agents)
.map_err(|e| ServerError::CommandFailed(format!("Failed to serialize panel: {}", e)))?;
fs::write(&panel_path, panel_json)
.map_err(|e| ServerError::CommandFailed(format!("Failed to write panel: {}", e)))?;
Ok(json!({
"status": "success",
"message": format!("Sampled {} experts for round {}", agents.len(), round),
"round": round,
"panel_file": panel_path,
"panel": agents.iter().map(|a| json!({
"name": a.name,
"role": a.role,
"emoji": a.emoji,
"tier": a.tier,
"relevance": a.relevance,
})).collect::<Vec<_>>(),
}))
}
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::*; use super::*;
@ -1313,9 +1582,57 @@ mod tests {
// ==================== Alignment Mode Tests ==================== // ==================== Alignment Mode Tests ====================
/// Helper: create a test pool with the specified number of experts
fn test_pool(n: usize) -> ExpertPool {
let mut experts = Vec::new();
let base_roles = [
("Systems Architect", ExpertTier::Core),
("Security Engineer", ExpertTier::Core),
("API Designer", ExpertTier::Core),
("Data Architect", ExpertTier::Adjacent),
("Quality Engineer", ExpertTier::Adjacent),
("UX Architect", ExpertTier::Adjacent),
("DevOps Engineer", ExpertTier::Adjacent),
("Performance Engineer", ExpertTier::Wildcard),
("Technical Writer", ExpertTier::Wildcard),
("Risk Analyst", ExpertTier::Wildcard),
];
for i in 0..n {
let (base_role, tier) = base_roles[i % base_roles.len()];
// Make roles unique by adding a suffix for overflow
let role = if i < base_roles.len() {
base_role.to_string()
} else {
format!("{} {}", base_role, i / base_roles.len() + 1)
};
let relevance = match tier {
ExpertTier::Core => 0.95 - (i as f64 * 0.02),
ExpertTier::Adjacent => 0.70 - (i as f64 * 0.02),
ExpertTier::Wildcard => 0.40 - (i as f64 * 0.02),
};
experts.push(PoolExpert {
role,
tier,
relevance: relevance.max(0.20),
});
}
ExpertPool {
domain: "Test Domain".to_string(),
question: Some("Test question?".to_string()),
experts,
}
}
/// Helper: create test agents from a pool
fn test_agents(n: usize) -> Vec<PastryAgent> {
let pool = test_pool(n.max(10));
let sampled = sample_panel_from_pool(&pool, n);
assign_pastry_names(sampled)
}
#[test] #[test]
fn test_assign_pastry_agents() { fn test_assign_pastry_names() {
let agents = assign_pastry_agents(3, "system design"); let agents = test_agents(3);
assert_eq!(agents.len(), 3); assert_eq!(agents.len(), 3);
assert_eq!(agents[0].name, "Muffin"); assert_eq!(agents[0].name, "Muffin");
assert_eq!(agents[1].name, "Cupcake"); assert_eq!(agents[1].name, "Cupcake");
@ -1327,8 +1644,11 @@ mod tests {
} }
#[test] #[test]
fn test_assign_pastry_agents_overflow() { fn test_assign_pastry_names_overflow() {
let agents = assign_pastry_agents(25, "general topic"); // Create a pool with 25 experts
let pool = test_pool(25);
let sampled = sample_panel_from_pool(&pool, 25);
let agents = assign_pastry_names(sampled);
assert_eq!(agents.len(), 25); assert_eq!(agents.len(), 25);
// First 20 use named pastries // First 20 use named pastries
assert_eq!(agents[0].name, "Muffin"); assert_eq!(agents[0].name, "Muffin");
@ -1339,28 +1659,43 @@ mod tests {
} }
#[test] #[test]
fn test_select_roles_for_topic() { fn test_sample_panel_from_pool() {
// Security topic should get Security Architect let pool = test_pool(15);
let role = select_role_for_topic("security vulnerability assessment", 0); let sampled = sample_panel_from_pool(&pool, 7);
assert_eq!(role, "Security Architect"); assert_eq!(sampled.len(), 7);
// All sampled experts should have valid roles
for expert in &sampled {
assert!(!expert.role.is_empty());
assert!(expert.relevance > 0.0);
}
}
// API topic should get API Designer #[test]
let role = select_role_for_topic("api endpoint design", 0); fn test_weighted_sample_respects_size() {
assert_eq!(role, "API Designer"); let experts = vec![
PoolExpert { role: "A".to_string(), tier: ExpertTier::Core, relevance: 0.9 },
PoolExpert { role: "B".to_string(), tier: ExpertTier::Core, relevance: 0.8 },
PoolExpert { role: "C".to_string(), tier: ExpertTier::Core, relevance: 0.7 },
];
// Request more than available
let sampled = weighted_sample(&experts, 5);
assert_eq!(sampled.len(), 3); // Should return all available
// Unknown topic falls back to general roles // Request fewer than available
let role = select_role_for_topic("something unusual", 0); let sampled = weighted_sample(&experts, 2);
assert_eq!(role, "Systems Thinker"); assert_eq!(sampled.len(), 2);
} }
#[test] #[test]
fn test_alignment_dialogue_markdown() { fn test_alignment_dialogue_markdown() {
let agents = assign_pastry_agents(3, "test topic"); let agents = test_agents(3);
let pool = test_pool(10);
let md = generate_alignment_dialogue_markdown( let md = generate_alignment_dialogue_markdown(
"test-alignment", "test-alignment",
1, 1,
Some("test-rfc"), Some("test-rfc"),
&agents, &agents,
Some(&pool),
); );
// Required sections // Required sections
@ -1385,22 +1720,29 @@ mod tests {
assert!(md.contains("**Status**: In Progress")); assert!(md.contains("**Status**: In Progress"));
assert!(md.contains("**RFC**: test-rfc")); assert!(md.contains("**RFC**: test-rfc"));
assert!(md.contains("💙 Judge")); assert!(md.contains("💙 Judge"));
// RFC 0048: Expert Pool section
assert!(md.contains("## Expert Pool"));
assert!(md.contains("**Domain**: Test Domain"));
} }
#[test] #[test]
fn test_build_judge_protocol() { fn test_build_judge_protocol() {
let agents = assign_pastry_agents(3, "system design"); let agents = test_agents(3);
let pool = test_pool(10);
let protocol = build_judge_protocol( let protocol = build_judge_protocol(
&agents, &agents,
"/tmp/test.dialogue.md", "/tmp/test.dialogue.md",
"sonnet", "sonnet",
&["/tmp/source.rs".to_string()], &["/tmp/source.rs".to_string()],
"/tmp/blue-dialogue/system-design", "/tmp/blue-dialogue/system-design",
Some(&pool),
RotationMode::None,
); );
// Must have instructions // Must have instructions
let instructions = protocol.get("instructions").unwrap().as_str().unwrap(); let instructions = protocol.get("instructions").unwrap().as_str().unwrap();
assert!(instructions.contains("alignment-expert")); assert!(instructions.contains("general-purpose"));
assert!(instructions.contains("ALIGNMENT")); assert!(instructions.contains("ALIGNMENT"));
assert!(instructions.contains("Wisdom")); assert!(instructions.contains("Wisdom"));
assert!(instructions.contains("convergence")); assert!(instructions.contains("convergence"));
@ -1456,13 +1798,15 @@ mod tests {
#[test] #[test]
fn test_build_judge_protocol_no_sources() { fn test_build_judge_protocol_no_sources() {
let agents = assign_pastry_agents(2, "quick topic"); let agents = test_agents(2);
let protocol = build_judge_protocol( let protocol = build_judge_protocol(
&agents, &agents,
"/tmp/test.dialogue.md", "/tmp/test.dialogue.md",
"haiku", "haiku",
&[], &[],
"/tmp/blue-dialogue/quick-topic", "/tmp/blue-dialogue/quick-topic",
None,
RotationMode::None,
); );
// Template should NOT contain grounding instructions when no sources // Template should NOT contain grounding instructions when no sources
@ -1472,13 +1816,15 @@ mod tests {
#[test] #[test]
fn test_build_judge_protocol_output_paths() { fn test_build_judge_protocol_output_paths() {
let agents = assign_pastry_agents(4, "api design"); let agents = test_agents(4);
let protocol = build_judge_protocol( let protocol = build_judge_protocol(
&agents, &agents,
"/tmp/test.dialogue.md", "/tmp/test.dialogue.md",
"sonnet", "sonnet",
&[], &[],
"/tmp/blue-dialogue/api-design", "/tmp/blue-dialogue/api-design",
None,
RotationMode::None,
); );
// output_dir in JSON // output_dir in JSON
@ -1505,13 +1851,15 @@ mod tests {
#[test] #[test]
fn test_judge_protocol_artifact_write_instructions() { fn test_judge_protocol_artifact_write_instructions() {
let agents = assign_pastry_agents(3, "test artifacts"); let agents = test_agents(3);
let protocol = build_judge_protocol( let protocol = build_judge_protocol(
&agents, &agents,
"/tmp/test.dialogue.md", "/tmp/test.dialogue.md",
"sonnet", "sonnet",
&[], &[],
"/tmp/blue-dialogue/test-artifacts", "/tmp/blue-dialogue/test-artifacts",
None,
RotationMode::None,
); );
let instructions = protocol["instructions"].as_str().unwrap(); let instructions = protocol["instructions"].as_str().unwrap();
@ -1553,13 +1901,15 @@ mod tests {
#[test] #[test]
fn test_judge_protocol_context_references_artifacts() { fn test_judge_protocol_context_references_artifacts() {
let agents = assign_pastry_agents(3, "context test"); let agents = test_agents(3);
let protocol = build_judge_protocol( let protocol = build_judge_protocol(
&agents, &agents,
"/tmp/test.dialogue.md", "/tmp/test.dialogue.md",
"sonnet", "sonnet",
&[], &[],
"/tmp/blue-dialogue/context-test", "/tmp/blue-dialogue/context-test",
None,
RotationMode::None,
); );
let instructions = protocol["instructions"].as_str().unwrap(); let instructions = protocol["instructions"].as_str().unwrap();
@ -1620,7 +1970,8 @@ mod tests {
// Must have task_params for spawning // Must have task_params for spawning
assert_eq!(result["task_params"]["subagent_type"], "general-purpose"); assert_eq!(result["task_params"]["subagent_type"], "general-purpose");
assert_eq!(result["task_params"]["max_turns"], 5); // No max_turns - agents run until complete
assert!(result["task_params"].get("max_turns").is_none());
} }
#[test] #[test]


@ -1535,7 +1535,7 @@ impl BlueServer {
}, },
{ {
"name": "blue_dialogue_create", "name": "blue_dialogue_create",
"description": "Create a new dialogue document. Pass alignment: true for multi-agent alignment dialogues (ADR 0014). When alignment is enabled, the response message contains a JUDGE PROTOCOL section — you MUST follow those instructions exactly to orchestrate the dialogue. The protocol tells you how to spawn background agents, score them, and run convergence rounds.", "description": "Create a new dialogue document. Pass alignment: true for multi-agent alignment dialogues (ADR 0014, RFC 0048). Alignment mode REQUIRES expert_pool parameter with tiered experts. The response contains a JUDGE PROTOCOL — follow those instructions to orchestrate the dialogue.",
"inputSchema": { "inputSchema": {
"type": "object", "type": "object",
"properties": { "properties": {
@ -1557,11 +1557,44 @@ impl BlueServer {
}, },
"alignment": { "alignment": {
"type": "boolean", "type": "boolean",
"description": "Enable alignment mode — returns a judge protocol with pastry-themed expert agents" "description": "Enable alignment mode — REQUIRES expert_pool parameter"
}, },
"agents": { "expert_pool": {
"type": "object",
"description": "RFC 0048: Judge-defined expert pool (REQUIRED for alignment mode)",
"properties": {
"domain": {
"type": "string",
"description": "Domain name (e.g., 'Investment Analysis')"
},
"question": {
"type": "string",
"description": "Optional: specific question being deliberated"
},
"experts": {
"type": "array",
"description": "Array of experts with role, tier, and relevance",
"items": {
"type": "object",
"properties": {
"role": { "type": "string", "description": "Expert role (e.g., 'Risk Manager')" },
"tier": { "type": "string", "enum": ["core", "adjacent", "wildcard"], "description": "Expert tier" },
"relevance": { "type": "number", "description": "Relevance score 0.0-1.0" }
},
"required": ["role", "tier", "relevance"]
}
}
},
"required": ["domain", "experts"]
},
"panel_size": {
"type": "integer", "type": "integer",
"description": "Number of cupcake agents (alignment mode only, default 3)" "description": "Number of experts to sample per round (default: pool size or 12)"
},
"rotation": {
"type": "string",
"enum": ["none", "wildcards", "full"],
"description": "Panel rotation mode: none (fixed), wildcards (rotate wildcards), full (resample all)"
}, },
"model": { "model": {
"type": "string", "type": "string",
@ -1668,6 +1701,38 @@ impl BlueServer {
"required": ["output_dir", "agent_name", "agent_emoji", "agent_role", "round"] "required": ["output_dir", "agent_name", "agent_emoji", "agent_role", "round"]
} }
}, },
{
"name": "blue_dialogue_sample_panel",
"description": "RFC 0048: Sample a new panel from the expert pool for manual round control. Use this when you want to explicitly control which experts participate in a specific round.",
"inputSchema": {
"type": "object",
"properties": {
"dialogue_title": {
"type": "string",
"description": "Dialogue title (used to find the expert-pool.json)"
},
"round": {
"type": "integer",
"description": "Round number to sample for"
},
"panel_size": {
"type": "integer",
"description": "Number of experts to sample (default: 12)"
},
"retain": {
"type": "array",
"items": { "type": "string" },
"description": "Expert roles to retain (must include these)"
},
"exclude": {
"type": "array",
"items": { "type": "string" },
"description": "Expert roles to exclude"
}
},
"required": ["dialogue_title", "round"]
}
},
// Phase 8: Playwright verification // Phase 8: Playwright verification
{ {
"name": "blue_playwright_verify", "name": "blue_playwright_verify",
@ -2446,6 +2511,7 @@ impl BlueServer {
"blue_dialogue_list" => self.handle_dialogue_list(&call.arguments), "blue_dialogue_list" => self.handle_dialogue_list(&call.arguments),
"blue_dialogue_save" => self.handle_dialogue_save(&call.arguments), "blue_dialogue_save" => self.handle_dialogue_save(&call.arguments),
"blue_dialogue_round_prompt" => self.handle_dialogue_round_prompt(&call.arguments), "blue_dialogue_round_prompt" => self.handle_dialogue_round_prompt(&call.arguments),
"blue_dialogue_sample_panel" => self.handle_dialogue_sample_panel(&call.arguments),
// Phase 8: Playwright handler // Phase 8: Playwright handler
"blue_playwright_verify" => self.handle_playwright_verify(&call.arguments), "blue_playwright_verify" => self.handle_playwright_verify(&call.arguments),
// Phase 9: Post-mortem handlers // Phase 9: Post-mortem handlers
@ -3778,6 +3844,11 @@ impl BlueServer {
crate::handlers::dialogue::handle_round_prompt(args) crate::handlers::dialogue::handle_round_prompt(args)
} }
fn handle_dialogue_sample_panel(&mut self, args: &Option<Value>) -> Result<Value, ServerError> {
let args = args.as_ref().ok_or(ServerError::InvalidParams)?;
crate::handlers::dialogue::handle_sample_panel(args)
}
fn handle_playwright_verify(&mut self, args: &Option<Value>) -> Result<Value, ServerError> { fn handle_playwright_verify(&mut self, args: &Option<Value>) -> Result<Value, ServerError> {
let args = args.as_ref().ok_or(ServerError::InvalidParams)?; let args = args.as_ref().ok_or(ServerError::InvalidParams)?;
crate::handlers::playwright::handle_verify(args) crate::handlers::playwright::handle_verify(args)


@ -1,114 +1,98 @@
# Expert Pool System # Expert Pool System
When running alignment dialogues, select domain-specific experts based on relevance to the topic. When running alignment dialogues, the Judge creates domain-appropriate expert pools from which panels are sampled.
## Expert Selection Algorithm ## Two-Phase Architecture (RFC 0047)
1. **Identify domains** relevant to the topic | Phase | Actor | Action |
2. **Select experts** by relevance tier: |-------|-------|--------|
- **Core** (4): Highest relevance (0.75-0.95) | **Pool Design** | Judge | Creates 15-30 domain-specific experts with tiers and relevance |
- **Adjacent** (5): Medium relevance (0.50-0.70) | **Panel Sampling** | MCP Server | Samples N experts using weighted random selection |
- **Wildcard** (3): Low relevance but bring fresh perspectives (0.25-0.45)
3. **Assign pastry names** for identification (Muffin, Cupcake, Scone, Eclair, Donut, Brioche, Croissant, Macaron, Cannoli, Strudel, Beignet, Churro)
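
The weighted random selection used in the sampling phase can be sketched as follows. This is a minimal, standalone illustration rather than the server's code: `Lcg` and `weighted_sample_indices` are hypothetical names, and the tiny generator stands in for the `rand` crate so that the sketch needs no dependencies. It draws distinct indices, with a larger weight meaning a higher chance of being drawn, and falls back to a uniform pick once only zero-weight entries remain.

```rust
/// Tiny deterministic generator so the sketch needs no external crates.
/// (Hypothetical helper; the real handler uses the `rand` crate.)
struct Lcg(u64);

impl Lcg {
    /// Returns a pseudo-random f64 in [0, 1).
    fn next_unit(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Keep the top 53 bits so the value fits an f64 mantissa exactly.
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Weighted sampling without replacement: returns up to `n` distinct
/// indices into `weights`. Negative weights are treated as zero.
fn weighted_sample_indices(weights: &[f64], n: usize, rng: &mut Lcg) -> Vec<usize> {
    let mut remaining: Vec<usize> = (0..weights.len()).collect();
    let mut selected = Vec::with_capacity(n.min(weights.len()));

    while selected.len() < n && !remaining.is_empty() {
        let total: f64 = remaining.iter().map(|&i| weights[i].max(0.0)).sum();
        let pos = if total <= 0.0 {
            // Only zero-weight entries are left: pick uniformly.
            ((rng.next_unit() * remaining.len() as f64) as usize).min(remaining.len() - 1)
        } else {
            let mut threshold = rng.next_unit() * total;
            // Default to the last slot so rounding error cannot skew toward slot 0.
            let mut pos = remaining.len() - 1;
            for (p, &i) in remaining.iter().enumerate() {
                threshold -= weights[i].max(0.0);
                if threshold < 0.0 {
                    pos = p;
                    break;
                }
            }
            pos
        };
        selected.push(remaining.remove(pos));
    }
    selected
}
```

Asking for more items than exist simply returns every index once, which mirrors the behaviour exercised by `test_weighted_sample_respects_size` in this commit.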
## Domain Expert Pools ## Pool Design (Judge Responsibility)
### Infrastructure / DevOps The Judge reads the RFC/topic and designs experts appropriate to the domain:
| Expert | Domain | Relevance |
|--------|--------|-----------|
| Platform Architect | Infra | 0.95 |
| SRE Lead | Infra | 0.90 |
| Database Architect | Infra | 0.85 |
| Security Engineer | Infra | 0.80 |
| Network Engineer | Infra | 0.70 |
| Cost Analyst | Finance | 0.55 |
| Compliance Officer | Legal | 0.45 |
| UX Researcher | Product | 0.35 |
### Product / Feature ```json
| Expert | Domain | Relevance | {
|--------|--------|-----------| "expert_pool": {
| Product Manager | Product | 0.95 | "domain": "Investment Analysis",
| UX Designer | Product | 0.90 | "experts": [
| Frontend Architect | Eng | 0.85 | { "role": "Value Analyst", "tier": "Core", "relevance": 0.95, "focus": "Intrinsic value, margin of safety" },
| Customer Advocate | Product | 0.80 | { "role": "Growth Analyst", "tier": "Core", "relevance": 0.90, "focus": "TAM expansion, revenue acceleration" },
| Data Analyst | Analytics | 0.70 | { "role": "Risk Manager", "tier": "Core", "relevance": 0.85, "focus": "Downside scenarios, tail events" },
| Backend Engineer | Eng | 0.65 | { "role": "ESG Analyst", "tier": "Adjacent", "relevance": 0.70, "focus": "Environmental, governance factors" },
| QA Lead | Eng | 0.55 | { "role": "Contrarian", "tier": "Wildcard", "relevance": 0.30, "focus": "Challenge consensus, find crowding" }
| Marketing Strategist | Business | 0.35 | ]
}
### ML / AI }
| Expert | Domain | Relevance |
|--------|--------|-----------|
| ML Architect | AI | 0.95 |
| Data Scientist | AI | 0.90 |
| MLOps Engineer | AI | 0.85 |
| AI Ethics Researcher | AI | 0.80 |
| Feature Engineer | AI | 0.70 |
| Platform Engineer | Infra | 0.60 |
| Privacy Counsel | Legal | 0.50 |
| Cognitive Scientist | Research | 0.35 |
### Governance / Policy
| Expert | Domain | Relevance |
|--------|--------|-----------|
| Governance Specialist | Gov | 0.95 |
| Legal Counsel | Legal | 0.90 |
| Ethics Board Member | Gov | 0.85 |
| Compliance Officer | Legal | 0.80 |
| Risk Analyst | Finance | 0.70 |
| Community Manager | Community | 0.60 |
| Economist | Economics | 0.50 |
| Anthropologist | Research | 0.35 |
### API / Integration
| Expert | Domain | Relevance |
|--------|--------|-----------|
| API Architect | Eng | 0.95 |
| Developer Advocate | Community | 0.90 |
| Integration Engineer | Eng | 0.85 |
| Security Architect | Security | 0.80 |
| Documentation Lead | Community | 0.70 |
| SDK Developer | Eng | 0.65 |
| Support Engineer | Community | 0.55 |
| Partner Manager | Business | 0.40 |
### General (default)
| Expert | Domain | Relevance |
|--------|--------|-----------|
| Systems Architect | Eng | 0.95 |
| Technical Lead | Eng | 0.90 |
| Product Manager | Product | 0.85 |
| Senior Engineer | Eng | 0.80 |
| QA Engineer | Eng | 0.70 |
| DevOps Engineer | Infra | 0.65 |
| Tech Writer | Community | 0.55 |
| Generalist | General | 0.40 |
## Expert Prompt Enhancement
Each expert receives their domain context in the prompt:
```
You are {expert_name} 🧁, a {domain_role} with expertise in {domain}.
Relevance to this topic: {relevance_score}
Bring your unique domain perspective while respecting that others see parts of the elephant you cannot.
``` ```
## Panel Composition ## Tier Distribution
For N=12 experts (typical for complex RFCs): | Tier | Pool % | Panel % | Selection Behavior |
- 4 Core experts (highest domain relevance) |------|--------|---------|-------------------|
- 5 Adjacent experts (related domains) | **Core** | ~25% | ~33% | Almost always selected (high relevance weights) |
- 3 Wildcard experts (distant domains for fresh thinking) | **Adjacent** | ~40% | ~42% | High probability, related expertise |
| **Wildcard** | ~35% | ~25% | Fresh perspectives, rotation candidates |
The Wildcards are crucial - they prevent groupthink and surface unexpected perspectives. ## Panel Sampling (MCP Server)
## Sampling Without Replacement ```
blue_dialogue_create(expert_pool=[...24 roles...], panel_size=12, rotation="wildcards")
→ Weighted random sample: higher relevance = higher selection probability
→ For N=12: ~4 Core, ~5 Adjacent, ~3 Wildcard
```
Each expert is used once per dialogue. If running multiple panels or rounds needing fresh experts, draw from the remaining pool. ## Rotation Modes
| Mode | Behavior | Use Case |
|------|----------|----------|
| `none` | Fixed panel all rounds | Standard deliberation |
| `wildcards` | Core/Adjacent persist, Wildcards resample | Bring fresh perspectives each round |
| `full` | Complete resample each round | Maximum diversity (experimental) |
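The three modes can be sketched in Python as follows. This is a minimal illustration, not the MCP server's implementation: the function names `weighted_draw` and `next_panel`, the dict shape, and the choice to resample Wildcards from every Wildcard in the pool are all assumptions.

```python
import random

def weighted_draw(experts, k, rng):
    """Draw up to k distinct experts; higher relevance means higher chance."""
    remaining = list(experts)
    chosen = []
    for _ in range(min(k, len(remaining))):
        weights = [e["relevance"] for e in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)  # without replacement
        chosen.append(pick)
    return chosen

def next_panel(pool, previous, mode, rng):
    """Return the next round's panel under a rotation mode (hypothetical helper)."""
    if mode == "none":
        return list(previous)  # fixed panel for all rounds
    if mode == "wildcards":
        # Core and Adjacent persist; only the Wildcard seats are redrawn.
        kept = [e for e in previous if e["tier"].lower() != "wildcard"]
        wildcards = [e for e in pool if e["tier"].lower() == "wildcard"]
        return kept + weighted_draw(wildcards, len(previous) - len(kept), rng)
    if mode == "full":
        return weighted_draw(pool, len(previous), rng)  # complete resample
    raise ValueError(f"unknown rotation mode: {mode}")
```

Passing a seeded `random.Random` instance as `rng` keeps a round reproducible, which helps when comparing panels across runs.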
## Pastry Naming
Experts are assigned pastry names for identification:
Muffin, Cupcake, Scone, Eclair, Donut, Brioche, Croissant, Macaron, Cannoli, Strudel, Beignet, Churro, Profiterole, Tartlet, Galette, Palmier, Kouign, Sfogliatella, Financier, Religieuse
## Domain-Specific Pools
The Judge designs pools appropriate to each domain. Example domains:
**Investment Analysis**: Value Analyst, Growth Analyst, Risk Manager, Portfolio Strategist, ESG Analyst, Quant Strategist, Technical Analyst, Behavioral Analyst, Income Analyst, Macro Economist, Credit Analyst, Contrarian
**System Architecture**: Platform Architect, Security Engineer, Database Architect, SRE Lead, API Designer, DevOps Engineer, Performance Engineer, Network Engineer, Cost Analyst, Compliance Officer
**Product Development**: Product Manager, UX Designer, Frontend Architect, Customer Advocate, Data Analyst, Backend Engineer, QA Lead, Technical Writer, Marketing Strategist
## Expert Prompt Template
Each expert receives their context:
```
You are {name} 🧁, a {role} in an ALIGNMENT-seeking dialogue.
Tier: {tier} | Relevance: {relevance}
Focus: {focus}
Your contribution is scored on PRECISION, not volume.
One sharp insight beats ten paragraphs.
```
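For illustration only, the template above could be filled in like this. The helper `render_prompt` and the two-decimal relevance formatting are assumptions; in practice `blue_dialogue_round_prompt` returns the substituted prompt.

```python
PROMPT_TEMPLATE = (
    "You are {name} 🧁, a {role} in an ALIGNMENT-seeking dialogue.\n"
    "Tier: {tier} | Relevance: {relevance}\n"
    "Focus: {focus}\n"
    "Your contribution is scored on PRECISION, not volume.\n"
    "One sharp insight beats ten paragraphs.\n"
)

def render_prompt(name, expert):
    """Substitute one expert's fields into the template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(
        name=name,
        role=expert["role"],
        tier=expert["tier"],
        relevance=f"{expert['relevance']:.2f}",
        focus=expert.get("focus", ""),
    )

expert = {
    "role": "Value Analyst",
    "tier": "Core",
    "relevance": 0.95,
    "focus": "Intrinsic value, margin of safety",
}
prompt = render_prompt("Muffin", expert)
```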
## Pool Persistence
Pools are stored per-dialogue:
```
{output_dir}/
├── expert-pool.json ← Full pool definition (Judge writes)
├── round-0/
│ ├── panel.json ← Sampled panel for this round
│ └── *.md ← Agent responses
└── scoreboard.md
```
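A minimal sketch of writing that layout from Python, assuming plain JSON files. The helper names `write_pool` and `write_panel` are hypothetical, and the real server may serialize the pool and panels differently.

```python
import json
import tempfile
from pathlib import Path

def write_pool(output_dir, pool):
    """Write the full pool definition to {output_dir}/expert-pool.json."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / "expert-pool.json"
    path.write_text(json.dumps(pool, indent=2), encoding="utf-8")
    return path

def write_panel(output_dir, round_number, panel):
    """Write the sampled panel to {output_dir}/round-N/panel.json."""
    round_dir = Path(output_dir) / f"round-{round_number}"
    round_dir.mkdir(parents=True, exist_ok=True)
    path = round_dir / "panel.json"
    path.write_text(json.dumps(panel, indent=2), encoding="utf-8")
    return path

def demo():
    """Write a tiny pool and round-0 panel into a temp dir; list the files."""
    pool = {"domain": "Investment Analysis", "experts": [
        {"role": "Value Analyst", "tier": "Core", "relevance": 0.95},
    ]}
    with tempfile.TemporaryDirectory() as output_dir:
        write_pool(output_dir, pool)
        write_panel(output_dir, 0, pool["experts"])
        root = Path(output_dir)
        return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.json"))
```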
--- ---
*"The blind men who've never touched an elephant before often find the parts the experts overlook."* *"The Judge sees the elephant. The Judge summons the right blind men."*


@ -5,14 +5,14 @@ description: Run multi-expert alignment dialogues with parallel background agent
# Alignment Play Skill # Alignment Play Skill
Orchestrate multi-expert alignment dialogues using the N+1 agent architecture from ADR 0014. Orchestrate multi-expert alignment dialogues using the N+1 agent architecture from ADR 0014 and RFC 0048.
## Usage ## Usage
``` ```
/alignment-play <topic> /alignment-play <topic>
/alignment-play --experts 5 <topic> /alignment-play --panel-size 7 <topic>
/alignment-play --convergence 0.95 <topic> /alignment-play --rotation wildcards <topic>
/alignment-play --rfc <rfc-title> <topic> /alignment-play --rfc <rfc-title> <topic>
``` ```
@ -20,41 +20,128 @@ Orchestrate multi-expert alignment dialogues using the N+1 agent architecture fr
| Parameter | Default | Description | | Parameter | Default | Description |
|-----------|---------|-------------| |-----------|---------|-------------|
| `--experts` | `3` | Number of expert agents (odd numbers preferred) | | `--panel-size` | pool size or 12 | Number of experts per round |
| `--convergence` | `0.95` | Target convergence threshold (0.0-1.0) | | `--rotation` | `none` | Rotation mode: none, wildcards, full |
| `--max-rounds` | `12` | Maximum rounds before stopping | | `--max-rounds` | `12` | Maximum rounds before stopping |
| `--rfc` | none | Link dialogue to an RFC | | `--rfc` | none | Link dialogue to an RFC |
| `--template` | `general` | Expert panel template (infrastructure, product, ml, governance, general) |
## How It Works ## How It Works
1. Call `blue_dialogue_create` with `alignment: true` and desired expert count ### Phase 0: Pool Design (RFC 0048)
2. The returned **Judge Protocol** contains everything: round workflow, agent prompt template, file architecture, scoring rules, convergence config
3. **Follow the protocol.** It is the single source of truth for execution. Before creating the dialogue, the Judge:
1. **Reads** the topic/RFC thoroughly
2. **Identifies** the domain (e.g., "Investment Analysis", "System Architecture")
3. **Designs** 8-24 experts appropriate to the domain:
- **Core (3-8)**: Essential perspectives for this specific problem
- **Adjacent (4-10)**: Related expertise that adds depth
- **Wildcard (3-6)**: Fresh perspectives, contrarians, cross-domain insight
4. **Assigns** relevance scores (0.20-0.95) based on expected contribution
5. **Creates** the dialogue with `expert_pool`:
```json
{
"title": "Investment Strategy Analysis",
"alignment": true,
"expert_pool": {
"domain": "Investment Analysis",
"question": "Should we rebalance the portfolio?",
"experts": [
{ "role": "Value Analyst", "tier": "core", "relevance": 0.95 },
{ "role": "Risk Manager", "tier": "core", "relevance": 0.90 },
{ "role": "Portfolio Strategist", "tier": "adjacent", "relevance": 0.70 },
{ "role": "ESG Analyst", "tier": "adjacent", "relevance": 0.65 },
{ "role": "Macro Economist", "tier": "wildcard", "relevance": 0.40 },
{ "role": "Contrarian", "tier": "wildcard", "relevance": 0.35 }
]
},
"panel_size": 5,
"rotation": "none"
}
```
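Before sending a request like the one above, a Judge-side sanity check could look like the following Python sketch. The function `validate_request` is hypothetical, and the rules it applies (tier names, the 0.20-0.95 relevance range, panel size no larger than the pool, known rotation modes) are read off this document rather than taken from the server. The pool-size guidance is not enforced, since the abbreviated example above lists only six experts.

```python
ALLOWED_TIERS = {"core", "adjacent", "wildcard"}
ALLOWED_ROTATIONS = {"none", "wildcards", "full"}

def validate_request(request):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    experts = request.get("expert_pool", {}).get("experts", [])
    if not experts:
        errors.append("expert_pool.experts is empty")
    for e in experts:
        role = e.get("role", "<unnamed>")
        if str(e.get("tier", "")).lower() not in ALLOWED_TIERS:
            errors.append(f"{role}: unknown tier {e.get('tier')!r}")
        relevance = e.get("relevance")
        if not isinstance(relevance, (int, float)) or not 0.20 <= relevance <= 0.95:
            errors.append(f"{role}: relevance {relevance!r} outside 0.20-0.95")
    # Assumption: panel_size defaults to the pool size when omitted.
    panel_size = request.get("panel_size", len(experts))
    if panel_size > len(experts):
        errors.append(f"panel_size {panel_size} exceeds pool size {len(experts)}")
    if request.get("rotation", "none") not in ALLOWED_ROTATIONS:
        errors.append(f"unknown rotation {request.get('rotation')!r}")
    return errors
```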
### Phase 1+: Round Execution
1. The returned **Judge Protocol** contains: round workflow, agent prompt template, file architecture, scoring rules, convergence config
2. **Follow the protocol.** It is the single source of truth for execution.
3. The MCP server samples experts from the pool using weighted random selection
4. Higher relevance = higher selection probability
5. Core experts are almost always selected; Wildcards provide variety
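To see why higher relevance translates into more frequent selection, here is a small Python simulation. The source does not say which algorithm the server uses. This sketch swaps in the Efraimidis-Spirakis method for weighted sampling without replacement, and `sample_panel` and `selection_rates` are hypothetical names.

```python
import random
from collections import Counter

def sample_panel(experts, panel_size, rng):
    """Efraimidis-Spirakis: key = u ** (1 / weight); keep the largest keys."""
    keyed = [(rng.random() ** (1.0 / e["relevance"]), e) for e in experts]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in keyed[:panel_size]]

def selection_rates(experts, panel_size, trials=2000, seed=0):
    """Estimate how often each role lands on the panel."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        for e in sample_panel(experts, panel_size, rng):
            counts[e["role"]] += 1
    return {e["role"]: counts[e["role"]] / trials for e in experts}

pool = [
    {"role": "Value Analyst", "tier": "core", "relevance": 0.95},
    {"role": "Risk Manager", "tier": "core", "relevance": 0.90},
    {"role": "Portfolio Strategist", "tier": "adjacent", "relevance": 0.70},
    {"role": "ESG Analyst", "tier": "adjacent", "relevance": 0.65},
    {"role": "Macro Economist", "tier": "wildcard", "relevance": 0.40},
    {"role": "Contrarian", "tier": "wildcard", "relevance": 0.35},
]
rates = selection_rates(pool, panel_size=5)
```

With this pool, the estimated rate for the Value Analyst should come out well above the Contrarian's, though no expert is guaranteed a seat.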
**CRITICAL**: You MUST use the Task tool to spawn REAL parallel agents. Do NOT simulate experts inline. The whole point is N independent Claude agents running in parallel via the Task tool. **CRITICAL**: You MUST use the Task tool to spawn REAL parallel agents. Do NOT simulate experts inline. The whole point is N independent Claude agents running in parallel via the Task tool.
## Expert Selection ## Expert Pool Design Examples
Experts are selected by **relevance to the topic**. Each gets a pastry name (Muffin, Cupcake, Scone, Eclair, Donut, Brioche, Croissant, Macaron, Cannoli, Strudel, Beignet, Churro). ### For an Investment Decision
**Tier Distribution** for N=12: | Role | Tier | Relevance |
- **Core** (4): Highest relevance (0.75-0.95) — domain specialists |------|------|-----------|
- **Adjacent** (5): Medium relevance (0.50-0.70) — related domains | Value Analyst | Core | 0.95 |
- **Wildcard** (3): Low relevance (0.25-0.45) — fresh perspectives, prevent groupthink | Risk Manager | Core | 0.90 |
| Portfolio Strategist | Core | 0.85 |
| ESG Analyst | Adjacent | 0.70 |
| Quant Strategist | Adjacent | 0.65 |
| Technical Analyst | Adjacent | 0.60 |
| Macro Economist | Wildcard | 0.40 |
| Contrarian | Wildcard | 0.35 |
### For an API Design
| Role | Tier | Relevance |
|------|------|-----------|
| API Architect | Core | 0.95 |
| Platform Engineer | Core | 0.90 |
| Security Engineer | Core | 0.85 |
| Developer Advocate | Adjacent | 0.70 |
| SRE Lead | Adjacent | 0.65 |
| Cost Analyst | Adjacent | 0.55 |
| Customer Success | Wildcard | 0.40 |
| Chaos Engineer | Wildcard | 0.30 |
## Tier Distribution
For a pool of P experts with panel size N:
| Tier | Pool % | Panel % | Purpose |
|------|--------|---------|---------|
| **Core** | ~30% | ~33% | Domain essentials, almost always selected |
| **Adjacent** | ~40% | ~42% | Related expertise, high selection probability |
| **Wildcard** | ~30% | ~25% | Fresh perspectives, rotation candidates |
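As a worked example of the Panel % column, the following Python sketch converts the approximate shares into whole seats for a given panel size. Treating the percentages as exact targets and rounding by largest remainder are assumptions made for illustration; actual panels come from weighted random sampling and will vary.

```python
PANEL_SHARE = {"core": 0.33, "adjacent": 0.42, "wildcard": 0.25}

def tier_targets(panel_size):
    """Split panel_size seats across tiers using largest-remainder rounding."""
    raw = {tier: panel_size * share for tier, share in PANEL_SHARE.items()}
    counts = {tier: int(value) for tier, value in raw.items()}  # floor
    leftover = panel_size - sum(counts.values())
    # Hand remaining seats to the tiers with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda t: raw[t] - counts[t], reverse=True)
    for tier in by_remainder[:leftover]:
        counts[tier] += 1
    return counts

print(tier_targets(12))  # {'core': 4, 'adjacent': 5, 'wildcard': 3}
```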
## Blue MCP Tools ## Blue MCP Tools
- `blue_dialogue_create` — Creates dialogue, returns Judge Protocol (your source of truth) - `blue_dialogue_create` — Creates dialogue with expert_pool, returns Judge Protocol
- `blue_dialogue_round_prompt` — **Get fully-substituted prompts for each agent.** Call this for each agent before spawning. Returns ready-to-use prompt with all template variables substituted (no manual substitution needed). - `blue_dialogue_round_prompt` — Get fully-substituted prompts for each agent
- `blue_dialogue_sample_panel` — Manually sample a new panel for a round (RFC 0048)
- `blue_dialogue_lint` — Validate .dialogue.md format - `blue_dialogue_lint` — Validate .dialogue.md format
- `blue_dialogue_save` — Persist to .blue/docs/dialogues/ - `blue_dialogue_save` — Persist to .blue/docs/dialogues/
## Agent Spawning
When spawning expert agents, you MUST use the Task tool with:
- `subagent_type: "general-purpose"` — NOT `alignment-expert`
- The prompt from `blue_dialogue_round_prompt`
- A descriptive name like "🧁 Muffin expert deliberation"
Example:
```
Task(
description: "🧁 Muffin expert deliberation",
subagent_type: "general-purpose",
prompt: <from blue_dialogue_round_prompt>
)
```
The `general-purpose` subagent has access to all tools including Write, which is required for writing the response file.
## Key Rules ## Key Rules
1. **NEVER submit your own perspectives** — You are the 💙 Judge, not a participant 1. **DESIGN THE POOL FIRST** — You are the 💙 Judge. Analyze the problem domain and design appropriate experts.
2. **Spawn ALL agents in ONE message** — No first-mover advantage 2. **NEVER submit your own perspectives** — You orchestrate, you don't participate
3. **Follow the Judge Protocol exactly** — It contains the round workflow, artifact writing steps, scoring rules, and convergence criteria 3. **Spawn ALL agents in ONE message** — No first-mover advantage
4. **Follow the Judge Protocol exactly** — It contains the round workflow, artifact writing steps, scoring rules, and convergence criteria
5. **Use `general-purpose` subagent_type** — NOT `alignment-expert`. The general-purpose agents have access to all tools including Write, which is required for file output
## The Spirit of the Dialogue ## The Spirit of the Dialogue
@ -70,4 +157,6 @@ And there's no upper limit. The score can always go higher. Because ALIGNMENT is
When the dialogue ends, all agents have won—because the result is more aligned than any could have made alone. More blind men touched more parts of the elephant. The whole becomes visible. When the dialogue ends, all agents have won—because the result is more aligned than any could have made alone. More blind men touched more parts of the elephant. The whole becomes visible.
*"The Judge sees the elephant. The Judge summons the right blind men."*
Always and forever. 🧁🧁🧁💙🧁🧁🧁 Always and forever. 🧁🧁🧁💙🧁🧁🧁