# RFC 0059: Expert-Judge Context Efficiency

**Status:** Draft
**Created:** 2026-02-04
**Author:** Eric + Claude
**Supersedes:** Portions of RFC 0036 (Expert Output Discipline)

## Problem

Current alignment dialogue architecture has unclear separation between audit artifacts and Judge context:

1. **Prompt confusion**: Experts are told to write full content to files and return only a 5-line confirmation
2. **Frequent failures**: Agents frequently don't write files (0 tool uses observed in 7/12 experts)
3. **Confirmation too sparse**: The 5-line return gives labels but no content—Judge can't synthesize
4. **Hallucination cascade**: When file writes fail and confirmations lack content, Judge fabricates expert contributions
5. **Context waste**: When experts DO include prose reasoning, Judge receives ~12k tokens it doesn't need

### Observed Failure Mode

```
12 Task agents finished
├─ Muffin Filesystem Architect · 0 tool uses    ← No file written
├─ Cupcake Knowledge Engineer · 10 tool uses    ← File written
├─ Scone AI Agent Specialist · 3 tool uses      ← File written
├─ Eclair DevEx Lead · 0 tool uses              ← No file written
├─ Donut API Designer · 0 tool uses             ← No file written
...
```

Judge then writes detailed scoreboard crediting Muffin, Eclair, Donut with specific insights they never provided.

## Analysis

### What the Judge Actually Needs

| Need | Current | Proposed |
|------|---------|----------|
| Know what perspectives were raised | Labels only | Labels + content |
| Score W/C/T/R dimensions | Must read prose | Structured markers sufficient |
| Track tensions | Labels only | Labels + content |
| Identify convergence signals | Implicit | Explicit [MOVE:CONVERGE] |
| Full reasoning chain | Not needed | Audit trail in file |

### Context Budget Comparison

| Approach | Per Expert | 12 Experts | Notes |
|----------|------------|------------|-------|
| Full prose + reasoning | ~1000 tokens | ~12k | Current when file read |
| Structured markers + content | ~300 tokens | ~3.6k | Proposed |
| Labels only (5-line) | ~50 tokens | ~600 | Current confirmation—too sparse |

**3.3x context reduction** while providing everything Judge needs.

## Proposal

### Preserve RFC 0051 Marker Syntax

Use the full marker syntax from RFC 0051 / `alignment-expert` skill:

**Local IDs:** `{EXPERT}-{TYPE}{round:02d}{seq:02d}`
- `MUFFIN-P0101` — Perspective
- `MUFFIN-R0101` — Recommendation
- `MUFFIN-T0101` — Tension
- `MUFFIN-E0101` — Evidence
- `MUFFIN-C0101` — Claim
- `MUFFIN-S0101` — Stance (NEW - this RFC)

**Cross-references:** `[RE:SUPPORT P0001]`, `[RE:OPPOSE R0001]`, `[RE:RESOLVE T0001]`, etc.

**Moves:** `[MOVE:CONVERGE]`, `[MOVE:CHALLENGE target]`, `[MOVE:CONCEDE target]`, etc.

### Tighten Output Discipline

The change is **format discipline**, not marker syntax:

```markdown
[MUFFIN-P0101: Income mandate mismatch]
NVIDIA's zero dividend conflicts with the trust's 4% income requirement.
The gap is substantial: zero income from a $2.1M position.

[MUFFIN-T0101: Growth vs income obligation]
[RE:ADDRESS T0001]
Fundamental conflict between NVIDIA's growth profile and income mandate.

[MUFFIN-R0101: Options collar structure]
[RE:RESOLVE MUFFIN-T0101]
Implement 30-delta covered call strategy. Historical premium: 2.1-2.8% monthly.

[MOVE:CHALLENGE P0023]
Prior "hold and wait" ignores opportunity cost of 8% dead weight.

---
[MUFFIN-S0101: CONDITIONAL | 0.85]
Requires options overlay to satisfy income mandate.
```

### Rules

1. **No prose preamble**: No "As a Value Analyst, I've considered..."
2. **No prose transitions**: No "Building on Cupcake's point..."
3. **Content in markers**: Each marker includes 1-3 sentences of substance
4. **Cross-refs inline**: Put `[RE:*]` on same line or immediately after marker
5. **Stance marker required**: Every expert must declare stance at end
6. **Separator required**: `---` before stance marker

### Stance: New First-Class Entity

Stance captures an expert's overall position on the dialogue question. Unlike Perspectives (observations) or Recommendations (proposals), Stance is the expert's **vote**.

#### Marker Syntax

```
[{EXPERT}-S{round}01: {stance_type} | {confidence}]
{conditions if CONDITIONAL}
```

**Stance Types:**
| Type | Meaning |
|------|---------|
| `APPROVE` | Support the proposal/direction |
| `REJECT` | Oppose the proposal/direction |
| `HOLD` | Need more information before deciding |
| `CONDITIONAL` | Support with specific conditions (must specify) |
| `ABSTAIN` | Declining to vote (conflict of interest, outside expertise) |

**Examples:**

```markdown
[MUFFIN-S0101: APPROVE | 0.90]

[CUPCAKE-S0101: CONDITIONAL | 0.75]
Requires options overlay to satisfy income mandate.

[CHURRO-S0101: REJECT | 0.60]
Concentration risk unaddressed.

[STRUDEL-S0101: HOLD | 0.50]
Need implementation evidence before committing.
```

#### Database Schema

```sql
CREATE TABLE stances (
  dialogue_id    TEXT NOT NULL,
  expert_slug    TEXT NOT NULL,
  round          INTEGER NOT NULL,
  stance_type    TEXT NOT NULL CHECK (stance_type IN ('APPROVE', 'REJECT', 'HOLD', 'CONDITIONAL', 'ABSTAIN')),
  confidence     REAL NOT NULL CHECK (confidence >= 0.0 AND confidence <= 1.0),
  conditions     TEXT,  -- required if CONDITIONAL
  created_at     TEXT NOT NULL DEFAULT (datetime('now')),

  PRIMARY KEY (dialogue_id, expert_slug, round),
  FOREIGN KEY (dialogue_id) REFERENCES dialogues(dialogue_id)
);

CREATE INDEX idx_stances_dialogue_round ON stances(dialogue_id, round);
```

#### Stance Tracking Across Rounds

Experts may change stance between rounds. The DB tracks history:

```
Round 0: MUFFIN-S0001: REJECT | 0.70
Round 1: MUFFIN-S0101: CONDITIONAL | 0.80 (after options proposal)
Round 2: MUFFIN-S0201: APPROVE | 0.90 (after evidence)
```

**Stance velocity** = number of stance changes in a round. High velocity indicates unresolved tensions.

#### Convergence Integration

Stance formalizes convergence tracking (RFC 0057):

```
Converge % = (APPROVE + CONDITIONAL with met conditions) / (total - ABSTAIN) × 100
```

| Metric | Calculation |
|--------|-------------|
| Unanimous | 100% APPROVE or CONDITIONAL |
| Supermajority | ≥75% |
| Majority | >50% |
| Deadlocked | No majority after max rounds |

**Confidence-weighted voting** (optional):
```
Weighted APPROVE = Σ(confidence where stance=APPROVE) / Σ(all confidence)
```

#### MCP Tools

Add to `blue_dialogue_round_register`:
```json
{
  "stances": [
    { "expert_slug": "muffin", "stance_type": "APPROVE", "confidence": 0.90 },
    { "expert_slug": "cupcake", "stance_type": "CONDITIONAL", "confidence": 0.75, "conditions": "Requires options overlay" }
  ]
}
```

Add to `blue_dialogue_round_context` response:
```json
{
  "stances": [
    { "expert_slug": "muffin", "round": 0, "stance_type": "REJECT", "confidence": 0.70 },
    { "expert_slug": "muffin", "round": 1, "stance_type": "APPROVE", "confidence": 0.90 }
  ],
  "current_stance_summary": {
    "APPROVE": 5,
    "CONDITIONAL": 2,
    "REJECT": 1,
    "HOLD": 1,
    "ABSTAIN": 0,
    "converge_percent": 77.8,
    "weighted_approve": 0.82
  }
}
```

### Prompt Construction: Judge Responsibility

The Judge builds prompts using `blue_dialogue_round_context` (RFC 0051), NOT `blue_dialogue_round_prompt`:

```
1. Judge calls blue_dialogue_round_context(dialogue_id, round)
   → Returns: experts, perspectives, tensions, open_tensions, convergence status

2. Judge constructs prompt for each expert using:
   - Context data from step 1
   - Output discipline rules (this RFC)
   - alignment-expert skill reference for marker syntax

3. Judge spawns Task with constructed prompt
   → Expert returns structured markers
   → Judge receives response directly
```

This keeps prompt construction in the Judge (flexible, no code changes for prompt tweaks) and data retrieval in MCP (structured, queryable).

### File Persistence: Judge Writes After Task Completion

After receiving Task results, Judge persists each expert's response:

```
4. Judge receives expert output from Task result
5. Judge calls blue_dialogue_expert_write(dialogue_id, round, expert_slug, content)
   → MCP writes to {output_dir}/round-{n}/{expert}.md
```

This removes "write to file" from expert responsibility—they just return structured content.

### Implementation

#### Phase 1: Prompt Construction Template (Judge-Side)

The Judge builds prompts following this template:

```rust
let prompt = format!(r##"
You are {name} {emoji}, a {role} in an ALIGNMENT-seeking dialogue.

Use the marker syntax from the alignment-expert skill:
- Local IDs: {name_upper}-P0101, {name_upper}-R0101, {name_upper}-T0101, etc.
- Cross-refs: [RE:SUPPORT P0001], [RE:RESOLVE T0001], etc.
- Moves: [MOVE:CONVERGE], [MOVE:CHALLENGE target], etc.

OUTPUT DISCIPLINE:
- NO prose preamble ("As a Value Analyst...")
- NO prose transitions ("Building on Cupcake's point...")
- NO prose conclusion ("In summary...")
- ONLY structured markers with 1-3 sentence content each
- END with: --- then one-line stance + confidence

EXAMPLE:
[{name_upper}-P0101: Income mandate mismatch]
NVIDIA's zero dividend conflicts with the trust's 4% income requirement.

[{name_upper}-T0101: Growth vs income]
[RE:ADDRESS T0001]
Fundamental conflict between growth profile and income mandate.

[{name_upper}-R0101: Options collar]
[RE:RESOLVE {name_upper}-T0101]
30-delta covered call strategy. Historical premium: 2.1-2.8% monthly.

[MOVE:CONCEDE P0023]
Donut's options proposal was directionally correct.

---
Stance: Conditional APPROVE with options overlay | Confidence: 0.85

Your output will be scored on PRECISION. One sharp insight beats ten paragraphs.
"##);
```

#### Phase 2: Add `blue_dialogue_expert_write` MCP Tool

New tool for Judge to persist expert outputs after Task completion:

```rust
/// Handle blue_dialogue_expert_write
///
/// Persist expert output to round directory for audit trail.
pub fn handle_expert_write(args: &Value) -> Result<Value, ServerError> {
    let output_dir = args.get("output_dir").and_then(|v| v.as_str())
        .ok_or(ServerError::InvalidParams)?;
    let round = args.get("round").and_then(|v| v.as_u64())
        .ok_or(ServerError::InvalidParams)? as usize;
    let expert_slug = args.get("expert_slug").and_then(|v| v.as_str())
        .ok_or(ServerError::InvalidParams)?;
    let content = args.get("content").and_then(|v| v.as_str())
        .ok_or(ServerError::InvalidParams)?;

    let round_dir = format!("{}/round-{}", output_dir, round);
    fs::create_dir_all(&round_dir)?;

    let output_path = format!("{}/{}.md", round_dir, expert_slug.to_lowercase());
    fs::write(&output_path, content)?;

    Ok(json!({
        "status": "success",
        "path": output_path
    }))
}
```

#### Phase 3: Remove `blue_dialogue_round_prompt`

The `blue_dialogue_round_prompt` tool conflates data retrieval with prompt construction. With this RFC:
- **Keep**: `blue_dialogue_round_context` for structured data
- **Add**: `blue_dialogue_expert_write` for persistence
- **Remove**: `blue_dialogue_round_prompt` entirely

**Rationale:**
- Prompt construction is orchestration, not data - belongs in Judge/skill
- Prompt iteration is common - shouldn't require Rust recompile
- Two approaches creates confusion
- Template text belongs in markdown, not Rust code

#### Phase 4: Update `alignment-play` Skill with Prompt Template

Add prompt template to skill (Judge fills from `round_context` data):

```markdown
## Expert Prompt Template

Build this prompt for each expert using data from `blue_dialogue_round_context`:

---

You are {expert.name} 🧁, a {expert.role} in an ALIGNMENT dialogue.

**Question:** {dialogue.question}

### Prior Round Context

**Open Tensions:**
{for t in open_tensions}
- {t.id}: {t.label} — {t.description}
{/for}

**Key Perspectives:**
{for p in perspectives where p.round == round - 1}
- {p.id}: {p.label} — {p.content}
{/for}

### Output Discipline (RFC 0059)

Return ONLY structured markers. No prose preamble. No transitions. No conclusion.

Use marker syntax from alignment-expert skill:
- Local IDs: {EXPERT}-P{round}01, {EXPERT}-T{round}01, etc.
- Cross-refs: [RE:SUPPORT P0001], [RE:RESOLVE T0001]
- Moves: [MOVE:CONVERGE], [MOVE:CHALLENGE target]
- Stance: REQUIRED - your vote on the question

End with:
---
[{EXPERT}-S{round}01: {APPROVE|REJECT|HOLD|CONDITIONAL|ABSTAIN} | {confidence}]
{conditions if CONDITIONAL}

Your contribution is scored on PRECISION. One sharp insight beats ten paragraphs.

---
```

Update skill workflow:
1. Call `blue_dialogue_round_context(dialogue_id, round)` for data
2. Build prompts using template above
3. Spawn all experts in parallel via Task
4. Receive structured marker responses
5. Call `blue_dialogue_expert_write` for each expert to persist
6. Score and synthesize

## Success Criteria

1. **Zero hallucination**: Judge only scores perspectives actually returned
2. **100% file capture**: All expert outputs persisted for audit
3. **Context efficiency**: <4k tokens for 12-expert round
4. **Clear failure mode**: If expert returns empty, Judge explicitly notes "no contribution"
5. **Stance tracking**: Every expert declares stance each round; history preserved
6. **Convergence calculation**: Automatic converge % from stance data, not manual counting

## Migration

- **RFC 0051** (Global Perspective Tension Tracking): Marker syntax preserved unchanged
- **RFC 0036** (Expert Output Discipline): Verbosity guidance superseded by stricter rules
- **alignment-expert skill**: Referenced for syntax, not duplicated in prompts
- **alignment-play skill**: Currently inconsistent—shows both `round_prompt` (lines 91, 119, 306) and `round_context` (line 254). Must be updated to use only `round_context` + Judge-built prompts.
- Existing dialogues unaffected (different prompt version)
- New dialogues use updated prompt template with output discipline

### Skill Updates Required

**alignment-play/SKILL.md:**
- Remove ALL references to `blue_dialogue_round_prompt` (lines 91, 119, 122-131, 225, 306, 314)
- Add prompt template section (Judge constructs prompts)
- Update workflow to use `blue_dialogue_round_context` + Judge prompt construction
- Add `blue_dialogue_expert_write` call after Task completion
- Add output discipline rules

**MCP Code:**
- Add `blue_dialogue_expert_write` handler
- Remove `handle_round_prompt` function from `dialogue.rs`
- Remove tool registration for `blue_dialogue_round_prompt`
- Add `stances` table to SQLite schema (`alignment_db.rs`)
- Add `register_stance` function
- Update `blue_dialogue_round_register` to accept stances
- Update `blue_dialogue_round_context` to return stance history + summary

## Alternatives Considered

### A: Keep file-primary, fix agent compliance

Problem: Can't force subagents to use Write tool. They have autonomy.

### B: Full prose to Judge, summarize later

Problem: 12k+ tokens per round is expensive and mostly wasted.

### C: Two-phase (agents write, Judge reads files)

Problem: Adds latency, requires Judge to glob/read, still fails if agents don't write.

### D: Keep `blue_dialogue_round_prompt` alongside `round_context`

Problem: Two ways to do the same thing creates confusion. Prompt construction is orchestration (Judge domain), not data retrieval (MCP domain). Prompt changes shouldn't require Rust recompile.

## Decision

**Adopt structured markers as canonical format, MCP-side file capture for audit.**

This inverts the current model: experts return content (not confirmation), MCP handles persistence (not experts).