blue/.blue/docs/spikes/2026-01-26T1843Z-read-tool-token-limit-on-assembled-dialogue-documents.wip.md
Eric Garcia 02901dfec7 chore: batch commit - ADRs, RFCs, dialogues, spikes, and code updates
2026-01-30 16:28:31 -05:00


# Spike: Read tool token limit on assembled dialogue documents
| | |
|---|---|
| **Status** | In Progress |
| **Date** | 2026-01-26 |
| **Time Box** | 30 minutes |
---
## Question
Why does the alignment dialogue fail with token limit errors when using file-based subagent output?

---
## Root Cause
The error occurs when the **Judge agent** tries to read the assembled dialogue document after completing all rounds. Individual agent output files are small (~2-3KB each, ~400 words), but the combined dialogue document accumulates:
- 3-6 expert perspectives per round
- Multiple rounds (typically 2-3)
- Each perspective ~400 words
- Plus judge synthesis, tension markers, and metadata
**Result:** A 4-round dialogue with 5 experts produces ~10KB per round × 4 = ~40KB+, which can exceed the Read tool's 25,000-token limit (the failing read reported 31,767 tokens).
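The size estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function name and the `overhead_kb_per_round` parameter are hypothetical, and no conversion from KB to tokens is attempted because the spike does not state a bytes-per-token ratio.

```python
def assembled_size_kb(rounds: int, experts: int, kb_per_perspective: float,
                      overhead_kb_per_round: float = 0.0) -> float:
    """Rough size of the assembled dialogue document, in KB.

    overhead_kb_per_round stands in for judge synthesis, tension markers,
    and metadata; it is an illustrative parameter, not a measured value.
    """
    per_round = experts * kb_per_perspective + overhead_kb_per_round
    return rounds * per_round


# 5 experts at ~2KB each is ~10KB per round; 4 rounds is ~40KB before overhead.
estimate = assembled_size_kb(rounds=4, experts=5, kb_per_perspective=2.0)
print(estimate)
```

Any non-zero overhead pushes the total above 40KB, which is what the "+" in the estimate refers to.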
## Evidence
Error observed:
```
Read(~/.claude/projects/-Users-ericg-letemcook-fungal-image-analysis/acd9a1b2-29fd-437c-a1
Error: File content (31767 tokens) exceeds maximum allowed tokens (25000)
```
The path `~/.claude/projects/...` is where Claude stores Task output, suggesting the Judge was reading back its own assembled document (not the individual `/tmp/blue-dialogue/{slug}/round-N/{agent}.md` files).
## Already Documented
RFC 0029 (file-based-subagent-output) captured this as **Churro T02** (open question at line 159):
> When agent output exceeds Write tool buffer limits, should the Task system JSONL approach serve as fallback?
The original dialogue noted:
> TENSION T02: Stream vs document modes — when agent output exceeds buffer
## What Works
- Individual agent files in `/tmp/blue-dialogue/{slug}/round-N/{agent}.md` (~2-3KB each)
- Write tool successfully stores agent perspectives
- Round-scoped paths prevent collisions
- Fallback to `blue_extract_dialogue(task_id=...)` exists for missing files
## What Breaks
1. **Assembled dialogue documents** can exceed Read tool's 25K token limit
2. **Judge can't verify its own writes** to large dialogue files
3. **No paginated read strategy** in the judge protocol
## Options
### A. Paginated reading
Judge reads dialogue with offset/limit parameters. Requires tracking document structure to know what to skip.
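As a rough illustration of what Option A involves, the sketch below splits a document into `(offset, limit)` line windows that each stay under a token budget. The names `paginate_lines` and `rough_token_estimate` are hypothetical, the estimator is a crude stand-in for a real tokenizer, and offsets are 0-based line indexes, which may not match the Read tool's own convention.

```python
from typing import Callable, List, Tuple


def rough_token_estimate(line: str) -> int:
    """Crude placeholder: assumes about 4 characters per token."""
    return len(line) // 4 + 1


def paginate_lines(lines: List[str], max_tokens_per_page: int,
                   estimate: Callable[[str], int] = rough_token_estimate
                   ) -> List[Tuple[int, int]]:
    """Return (offset, limit) windows whose estimated cost fits the budget.

    A single line that exceeds the budget on its own still gets a page,
    so the caller must handle that case separately.
    """
    pages = []
    offset, count, used = 0, 0, 0
    for index, line in enumerate(lines):
        cost = estimate(line)
        if count > 0 and used + cost > max_tokens_per_page:
            pages.append((offset, count))      # close the current page
            offset, count, used = index, 0, 0  # start a new page here
        count += 1
        used += cost
    if count:
        pages.append((offset, count))
    return pages


demo_lines = ["x" * 39] * 10  # each line is estimated at 10 tokens
pages = paginate_lines(demo_lines, max_tokens_per_page=25)
print(pages)
```

The windows are purely line-based, which is why the option notes that the Judge would still need to track document structure to know which pages it can skip.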
### B. Streaming writes, chunk reads
Each round writes to a separate section file. Judge assembles by reading chunks. More complexity.
### C. Trust-but-verify pattern
Judge writes without reading back the full document. Only reads individual agent files which stay small. Final document assembly happens at dialogue completion, not during.
### D. Summary-based continuation
After each round, Judge writes a summary of accumulated state rather than re-reading the full document. Avoids needing to read large files.
## Recommendation
**Option C (trust-but-verify)** aligns with the file-based approach:
1. Judge reads individual agent output files (always small)
2. Judge appends to dialogue document without re-reading it
3. `blue_dialogue_save` handles final assembly and validation
4. Remove any Judge instructions that require reading the full assembled document mid-dialogue
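The first two steps above can be sketched as an append-only write. This is an illustration in Python rather than the project's Rust code: `append_round`, the `### {agent}` section headings, and the file names in the demo are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def append_round(dialogue_path: Path, round_dir: Path, synthesis: str) -> int:
    """Append one round to the dialogue document without reading it back.

    Only the small per-agent files are read. Returns the bytes appended.
    """
    sections = []
    for agent_file in sorted(round_dir.glob("*.md")):
        body = agent_file.read_text(encoding="utf-8").strip()
        sections.append(f"### {agent_file.stem}\n\n{body}\n")
    sections.append(f"### Judge synthesis\n\n{synthesis.strip()}\n")
    block = "\n".join(sections) + "\n"
    # Append mode: the existing document is never loaded.
    with dialogue_path.open("a", encoding="utf-8") as handle:
        handle.write(block)
    return len(block.encode("utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    round_dir = Path(tmp) / "round-0"
    round_dir.mkdir()
    (round_dir / "muffin.md").write_text("Perspective from muffin.", encoding="utf-8")
    (round_dir / "cupcake.md").write_text("Perspective from cupcake.", encoding="utf-8")
    dialogue = Path(tmp) / "dialogue.md"
    written = append_round(dialogue, round_dir, "Round 0 synthesis.")
    # Read back once here only to show the result; the Judge would not do this.
    final_text = dialogue.read_text(encoding="utf-8")
```

The returned byte count gives the Judge a cheap running total of document size without a read-back, which is one way to keep some verification in a trust-but-verify flow.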
Adopting Option C requires updating `build_judge_protocol` in `dialogue.rs` so that it no longer instructs the Judge to read back its own document.

---
## Alignment Dialogue Outcome
A 3-expert alignment dialogue reached **100% convergence** on an improved architecture:
**Dialogue:** `.blue/docs/dialogues/2026-01-26T1850Z-round-scoped-file-architecture-for-alignment-dialogues.dialogue.recorded.md`
### Final Architecture
```
/tmp/blue-dialogue/{slug}/
├─ round-0/
│ ├─ muffin.md ← Agents write (working artifacts)
│ ├─ cupcake.md
│ └─ scone.md
├─ round-0.dialogue.md ← Judge assembles (continuity artifact)
├─ round-1/
│ └─ {agent}.md
├─ round-1.dialogue.md
└─ .archive/ ← Post-round archive (optional)
```
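The layout above maps onto two path helpers. The sketch below is illustrative only: the function names and the example slug are hypothetical, and `PurePosixPath` is used so the result is the same on any platform.

```python
from pathlib import PurePosixPath

DIALOGUE_ROOT = PurePosixPath("/tmp/blue-dialogue")


def agent_output_path(slug: str, round_no: int, agent: str) -> PurePosixPath:
    """Working artifact written by one agent in one round."""
    return DIALOGUE_ROOT / slug / f"round-{round_no}" / f"{agent}.md"


def round_dialogue_path(slug: str, round_no: int) -> PurePosixPath:
    """Continuity artifact assembled by the Judge after the round."""
    return DIALOGUE_ROOT / slug / f"round-{round_no}.dialogue.md"


print(agent_output_path("demo-slug", 0, "muffin"))
print(round_dialogue_path("demo-slug", 0))
```

Because the round number is part of every path, files from different rounds cannot collide, and each per-round dialogue file stays bounded in size.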
### Key Resolutions
| Tension | Resolution |
|---------|------------|
| Stateless vs stateful synthesis | **Stateful by reference** — global tension IDs (T01, T02...) enable cross-round references without copying content |
| What content in synthesis | **Full round content** — synthesis + all expert perspectives + metadata (~8-12KB per round, safely under 25K) |
| Cross-round tension references | **Global namespace** — T01, T02, T03... never reused across rounds |
| Dual-write burden on Judge | **Necessary separation of concerns** — prompt templating (pre-round) and synthesis assembly (post-round) serve different consumers |
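The global tension namespace can be illustrated with a small allocator that hands out IDs in sequence and never reuses one. The class and method names are hypothetical; the dialogue record only specifies the `T01, T02, T03...` format and the no-reuse rule.

```python
class TensionRegistry:
    """Allocates dialogue-wide tension IDs (T01, T02, ...) across all rounds."""

    def __init__(self) -> None:
        self._next_number = 1
        self.opened_in_round = {}  # tension ID -> round where it was raised

    def allocate(self, round_no: int) -> str:
        tension_id = f"T{self._next_number:02d}"
        self._next_number += 1  # the counter never resets between rounds
        self.opened_in_round[tension_id] = round_no
        return tension_id


registry = TensionRegistry()
first = registry.allocate(round_no=0)
second = registry.allocate(round_no=0)
third = registry.allocate(round_no=1)
print(first, second, third)
```

A later round can then refer to `T01` by ID alone, which is what lets synthesis stay stateful by reference without copying earlier content.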
### Implementation Changes Required
1. **Judge reads per round:** ~15-20KB max
   - Current round agent files (~2-3KB × agents)
   - Prior round's `round-N.dialogue.md` only (~8-12KB), NOT full history
2. **Judge writes per round:**
   - Agent prompt files (pre-round, templated)
   - Round dialogue file (post-round, synthesis + perspectives)
3. **Agents read per round:**
   - All prior `round-N.dialogue.md` files for context
   - Source grounding files specified in prompt
This eliminates the token overflow by ensuring no single Read exceeds 25K tokens.
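The per-round read sets can be written down as two small functions. This is a sketch under stated assumptions: the function names are hypothetical, paths are relative to `/tmp/blue-dialogue/{slug}/`, the "prior round" file for round N is taken to be `round-(N-1).dialogue.md`, and source grounding files are left out because they are specified per prompt.

```python
from typing import List


def judge_reads(round_no: int, agents: List[str]) -> List[str]:
    """Files the Judge reads when synthesizing round_no."""
    files = [f"round-{round_no}/{agent}.md" for agent in agents]
    if round_no > 0:
        # Only the immediately preceding round, never the full history.
        files.append(f"round-{round_no - 1}.dialogue.md")
    return files


def agent_reads(round_no: int) -> List[str]:
    """Prior round dialogue files an agent reads before writing round_no."""
    return [f"round-{n}.dialogue.md" for n in range(round_no)]


print(judge_reads(2, ["muffin", "cupcake", "scone"]))
print(agent_reads(2))
```

Every entry in either list is a single bounded file, so each is read with its own Read call and none of those calls grows with the number of rounds.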