Spike: Judge Writing Expert Outputs (Inefficiency Diagnosis)
Created: 2026-01-27
Status: RESOLVED
Issue: Judge (main Claude) writes expert output files instead of experts writing in parallel
Verification: CONFIRMED
Nanosecond timestamp analysis proves Judge writes expert files sequentially:
muffin.md .582672674
cupcake.md .611072573 (+28ms)
scone.md .649223592 (+38ms)
eclair.md .684142057 (+35ms)
donut.md .721882199 (+38ms)
brioche.md .755007032 (+33ms)
6 files written in 172ms total with consistent ~30-40ms gaps.
If agents wrote in parallel, we'd see:
- Different absolute timestamps (agents finish at different times)
- Random ordering (whichever agent finishes first writes first)
- Gaps of seconds/minutes (based on agent thinking time)
Instead, the files appear in a fixed sequential order with millisecond gaps, which points to a single batch write by the Judge.
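The gap arithmetic can be reproduced with a short sketch. It assumes all six mtimes fall within the same wall-clock second, since only the fractional parts are listed here. The 100 ms threshold and the function names are illustrative assumptions, not part of the protocol.

```python
from decimal import Decimal

# Fractional-second parts of the six mtimes listed above.
# Assumption: all six fall within the same wall-clock second.
MTIMES = {
    "muffin.md": Decimal("0.582672674"),
    "cupcake.md": Decimal("0.611072573"),
    "scone.md": Decimal("0.649223592"),
    "eclair.md": Decimal("0.684142057"),
    "donut.md": Decimal("0.721882199"),
    "brioche.md": Decimal("0.755007032"),
}


def gaps_ms(mtimes):
    """Return (name, gap in ms since the previously written file), in mtime order."""
    ordered = sorted(mtimes.items(), key=lambda item: item[1])
    return [
        (name, (t - prev_t) * 1000)
        for (_, prev_t), (name, t) in zip(ordered, ordered[1:])
    ]


def looks_like_batch_write(mtimes, max_gap_ms=100):
    """Heuristic (threshold is an assumption): all gaps tiny implies one writer in a loop."""
    return all(gap < max_gap_ms for _, gap in gaps_ms(mtimes))


gaps = gaps_ms(MTIMES)
total_ms = (max(MTIMES.values()) - min(MTIMES.values())) * 1000

for name, gap in gaps:
    print(f"{name}: +{round(gap)}ms")
print(f"total: {int(total_ms)}ms, batch write: {looks_like_batch_write(MTIMES)}")
```

Independent agents with seconds or minutes of thinking time would be expected to fail this heuristic, which is why the uniform gaps are treated as evidence of a single writer.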
Observed Behavior
From user transcript:
5 alignment-expert agents finished (ctrl+o to expand)
├─ Muffin resume · 5 tool uses · 34.1k tokens
├─ Scone resume · 4 tool uses · 38.4k tokens
...
I have all 6 expert responses. Let me synthesize, score, and write the artifacts.
Write(/tmp/blue-dialogue/.../round-0/muffin.md)
Write(/tmp/blue-dialogue/.../round-0/cupcake.md)
Write(/tmp/blue-dialogue/.../round-0/scone.md)
...
The Judge writes expert files sequentially after all agents finish, instead of experts writing in parallel during their execution.
Expected Behavior (Per Protocol)
Agent Prompt Template (dialogue.rs:964-966):
WRITE YOUR OUTPUT — THIS IS MANDATORY:
Use the Write tool to write your COMPLETE response to:
{{OUTPUT_FILE}}
Judge Protocol (dialogue.rs:1007):
├─ round-0/
│ └─ {agent}.md ← Agents write, peers read (~2-3KB each)
Agents SHOULD:
- Write their full response to {output_dir}/round-N/{agent_lowercase}.md
- Return 4-line summary to Judge
Judge SHOULD:
- Receive summaries (not full responses)
- NOT write expert files (they're already written)
- Only write: scoreboard.md, tensions.md, round-N.summary.md
Root Cause Analysis
VERIFIED: Agents Return Full Content, Judge Writes Files
Timestamp analysis confirms Hypothesis 2:
- Agents produce correctly formatted output (P01, P02, T01 markers)
- Agents return full content to Judge instead of 4-line summary
- Judge receives content and writes to files sequentially
- Agents do NOT call Write tool themselves
Evidence:
- User transcript shows Judge calling Write(round-0/muffin.md) after agents finish
- Nanosecond timestamps show sequential writes (~30-40ms gaps)
- Agent tool uses (5-9 per agent) are likely Read operations for grounding files, not Write
- File content matches expected format but was written by Judge
Why Agents Don't Write
The agent prompt template includes write instructions:
WRITE YOUR OUTPUT — THIS IS MANDATORY:
Use the Write tool to write your COMPLETE response to:
{{OUTPUT_FILE}}
But agents appear to ignore this and return full content instead. Possible causes:
- Instruction buried in prompt: the Write instruction may not be salient enough
- Default agent behavior: agents may default to returning content rather than writing
- Template not properly forwarded: Judge may not include the full template in the Task prompt
Inefficiency Impact
Current Flow (Inefficient):
Expert A runs → returns full content
Expert B runs → returns full content (parallel)
Expert C runs → returns full content (parallel)
↓
All finish
↓
Judge writes muffin.md (sequential)
Judge writes cupcake.md (sequential)
Judge writes scone.md (sequential)
Judge writes scoreboard.md
Judge writes tensions.md
Judge writes round-N.summary.md
Expected Flow (Efficient):
Expert A runs → writes muffin.md → returns 4-line summary
Expert B runs → writes cupcake.md → returns 4-line summary (parallel)
Expert C runs → writes scone.md → returns 4-line summary (parallel)
↓
All finish
↓
Judge writes scoreboard.md
Judge writes tensions.md (could be parallel)
Judge writes round-N.summary.md
Savings:
- N sequential Write operations eliminated (where N = expert count)
- Write operations moved to parallel agent execution
- Judge API calls reduced
Verification Steps
- Check if agents write files: After expert execution, verify if {output_dir}/round-N/{agent}.md exists before Judge writes
- Check agent return content: Examine what Task tool returns - is it 4-line summary or full content?
- Check Judge prompt handling: Trace whether Judge properly substitutes and forwards the full template
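The second check can be approximated with a rough line-count heuristic. This is a minimal sketch: the function name, the "empty" category, and the sample strings are illustrative assumptions, and a real check might also look for the P01/T01 markers.

```python
def classify_return(returned_text, max_summary_lines=4):
    """Classify what an agent returned to the Judge, by non-empty line count.

    Assumption: a compliant return is at most `max_summary_lines` non-empty lines;
    anything longer is treated as the full response being passed back.
    """
    lines = [line for line in returned_text.splitlines() if line.strip()]
    if not lines:
        return "empty"
    return "summary" if len(lines) <= max_summary_lines else "full_content"


# Illustrative inputs only; the real summary wording is defined by the prompt template.
summary_example = (
    "Perspectives: P01 [label]\nTensions: none\nMoves: none\nClaim: [single sentence]"
)
full_example = "\n".join(f"paragraph {i}" for i in range(40))

print(classify_return(summary_example))
print(classify_return(full_example))
```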
Proposed Fix
If diagnosis confirms agents aren't writing:
Option A: Strengthen Agent Write Instruction
Move write instruction from template to agent definition (.claude/agents/alignment-expert.md):
---
name: alignment-expert
tools: Read, Grep, Glob, Write
model: sonnet
---
...
## OUTPUT PROTOCOL
You MUST write your response to the file path provided in your prompt using the Write tool.
This is mandatory. Do not return your full response to the Judge.
Option B: Validate Writes in Judge Protocol
Add verification step:
3a. VERIFY: After agents return, check that files exist at {output_dir}/round-N/*.md
If missing, write from returned content (fallback only)
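A minimal sketch of what the verify-then-fallback step could look like, assuming the Judge has each agent's returned text available as a string. The function name and the `returns` mapping are hypothetical; the path layout follows the {output_dir}/round-N/{agent_lowercase}.md convention described earlier.

```python
import tempfile
from pathlib import Path


def ensure_expert_files(output_dir, round_no, returns):
    """Write an expert file only when the agent did not write it itself.

    `returns` maps agent name -> text the agent returned (hypothetical shape).
    Returns the agents that needed the fallback write.
    """
    round_dir = Path(output_dir) / f"round-{round_no}"
    round_dir.mkdir(parents=True, exist_ok=True)
    fallbacks = []
    for agent, content in returns.items():
        path = round_dir / f"{agent.lower()}.md"
        if path.is_file():
            continue  # the agent already wrote its own file; leave it untouched
        path.write_text(content, encoding="utf-8")
        fallbacks.append(agent)
    return fallbacks


# Usage: Muffin wrote its file, Scone did not.
with tempfile.TemporaryDirectory() as tmp:
    round_dir = Path(tmp) / "round-0"
    round_dir.mkdir()
    (round_dir / "muffin.md").write_text("written by agent", encoding="utf-8")
    needed = ensure_expert_files(
        tmp, 0, {"Muffin": "returned text", "Scone": "returned text"}
    )
    muffin_text = (round_dir / "muffin.md").read_text(encoding="utf-8")
    scone_text = (round_dir / "scone.md").read_text(encoding="utf-8")

print(needed)
```

The list of fallbacks doubles as a diagnostic: a non-empty result means some agents are still not following the write instruction.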
Option C: Structured Return Contract
Have agents return structured JSON that explicitly indicates file was written:
{
"file_written": true,
"path": "/tmp/blue-dialogue/.../round-0/muffin.md",
"summary": "Perspectives: P01...\nTensions:..."
}
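On the Judge side, a return of this shape could be validated roughly as follows. This is a sketch under the assumption that the three keys shown above are the whole contract; `AgentReturn` and `parse_agent_return` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentReturn:
    file_written: bool
    path: str
    summary: str


def parse_agent_return(raw):
    """Parse and validate the structured return; raise ValueError if malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("file_written"), bool):
        raise ValueError("file_written must be a boolean")
    for key in ("path", "summary"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"{key} must be a string")
    return AgentReturn(data["file_written"], data["path"], data["summary"])


# The elided path is kept exactly as it appears in the example above.
raw = json.dumps({
    "file_written": True,
    "path": "/tmp/blue-dialogue/.../round-0/muffin.md",
    "summary": "Perspectives: P01...\nTensions:...",
})
parsed = parse_agent_return(raw)
print(parsed.file_written, parsed.path)
```

Note that `file_written: true` is only the agent's claim; checking that the file exists on disk is still the stronger verification.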
Next Steps
- Run a test dialogue and capture agent tool use details ✓ Done
- Verify if agents actually call Write tool ✓ Verified: They don't
- Check what content agents return to Judge ✓ Full content, not 4-line summary
- Implement fix: choose from the options above
Resolution: Options A + C Implemented
Changes Made
1. Agent Definition (Option A) — .claude/agents/alignment-expert.md
Added ## CRITICAL: FILE OUTPUT PROTOCOL section at the top of the agent definition:
- Emphasizes file writing is mandatory, not optional
- Warns that work will be lost if not written to file
- Placed at identity level (before role description) for maximum salience
2. Structured Return Contract (Option C) — dialogue.rs agent prompt template
Changed return format from 4-line summary to 5-line structured confirmation:
FILE_WRITTEN: {path}
Perspectives: P01 [label], P02 [label]
Tensions: T01 [label] or none
Moves: [CONCESSION|REFINEMENT|RESOLVED] or none
Claim: [single sentence]
The FILE_WRITTEN: line serves as proof the agent wrote to the file.
3. Judge Verification — dialogue.rs Judge protocol
Updated COLLECT step to verify FILE_WRITTEN line:
- If present: Agent wrote file, no action needed
- If missing: Fallback - check if file exists, write from return content if needed
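The COLLECT check can be sketched as a small parser plus a decision function. The names and the action labels ("no_action", "write_from_return") are illustrative assumptions; the actual logic lives in the Judge protocol text in dialogue.rs, not in code like this.

```python
def parse_confirmation(returned_text):
    """Parse the 5-line confirmation into a dict, or return None if it does not
    start with a FILE_WRITTEN: line."""
    lines = [ln.strip() for ln in returned_text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("FILE_WRITTEN:"):
        return None
    fields = {}
    for line in lines:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def collect_action(returned_text, file_exists):
    """Decide what the Judge does for one agent after it returns."""
    if parse_confirmation(returned_text) is not None:
        return "no_action"  # agent confirmed it wrote the file
    # Fallback path: only write when the file is actually missing.
    return "no_action" if file_exists else "write_from_return"


sample = (
    "FILE_WRITTEN: .../muffin.md\n"
    "Perspectives: P01 [label], P02 [label]\n"
    "Tensions: none\n"
    "Moves: none\n"
    "Claim: [single sentence]"
)
fields = parse_confirmation(sample)
print(fields["FILE_WRITTEN"], collect_action(sample, file_exists=True))
```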
Expected Behavior After Fix
Expert A runs → writes muffin.md → returns "FILE_WRITTEN: .../muffin.md\n..."
Expert B runs → writes cupcake.md → returns "FILE_WRITTEN: .../cupcake.md\n..." (parallel)
Expert C runs → writes scone.md → returns "FILE_WRITTEN: .../scone.md\n..." (parallel)
↓
All finish (files already written)
↓
Judge verifies FILE_WRITTEN in returns
Judge writes ONLY: scoreboard.md, tensions.md, round-N.summary.md
Savings: N fewer sequential Write operations by Judge (Opus), moved to parallel agent execution (Sonnet).