Every document filename now mirrors its lifecycle state with a status suffix (e.g., .draft.md, .wip.md, .accepted.md). No more bare .md for tracked document types. Also renamed all from_str methods to parse to avoid FromStr trait confusion, introduced StagingDeploymentParams struct, and fixed all 19 clippy warnings across the codebase. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
553 lines
36 KiB
Markdown
553 lines
36 KiB
Markdown
# Alignment Dialogue: Authenticated MCP Instruction Delivery RFC Design
|
|
|
|
**Draft**: Dialogue 2027
|
|
**Date**: 2026-01-26 08:04
|
|
**Status**: Complete
|
|
**Participants**: 💙 Judge, 🧁 Muffin, 🧁 Cupcake, 🧁 Scone, 🧁 Eclair, 🧁 Donut, 🧁 Brioche, 🧁 Croissant, 🧁 Macaron, 🧁 Cannoli, 🧁 Strudel, 🧁 Beignet, 🧁 Churro
|
|
**RFC**: authenticated-mcp-instruction-delivery
|
|
|
|
## Expert Panel
|
|
|
|
| Agent | Role | Tier | Relevance | Emoji |
|
|
|-------|------|------|-----------|-------|
|
|
| 💙 Judge | Orchestrator | — | — | 💙 |
|
|
| 🧁 Muffin | Security Architect | Core | 0.95 | 🧁 |
|
|
| 🧁 Cupcake | UX Architect | Core | 0.90 | 🧁 |
|
|
| 🧁 Scone | Technical Writer | Core | 0.85 | 🧁 |
|
|
| 🧁 Eclair | Systems Thinker | Core | 0.80 | 🧁 |
|
|
| 🧁 Donut | Domain Expert | Adjacent | 0.70 | 🧁 |
|
|
| 🧁 Brioche | Devil's Advocate | Adjacent | 0.65 | 🧁 |
|
|
| 🧁 Croissant | Integration Specialist | Adjacent | 0.60 | 🧁 |
|
|
| 🧁 Macaron | Risk Analyst | Adjacent | 0.55 | 🧁 |
|
|
| 🧁 Cannoli | First Principles Reasoner | Adjacent | 0.50 | 🧁 |
|
|
| 🧁 Strudel | Pattern Recognizer | Wildcard | 0.40 | 🧁 |
|
|
| 🧁 Beignet | Edge Case Hunter | Wildcard | 0.35 | 🧁 |
|
|
| 🧁 Churro | Systems Thinker | Wildcard | 0.30 | 🧁 |
|
|
|
|
## Alignment Scoreboard
|
|
|
|
| Agent | Wisdom | Consistency | Truth | Relationships | **Total** |
|
|
|-------|--------|-------------|-------|---------------|----------|
|
|
| 🧁 Muffin | 3 | 3 | 3 | 3 | **12** |
|
|
| 🧁 Cupcake | 3 | 3 | 3 | 3 | **12** |
|
|
| 🧁 Scone | 3 | 3 | 3 | 3 | **12** |
|
|
| 🧁 Eclair | 3 | 2 | 3 | 2 | **10** |
|
|
| 🧁 Donut | 3 | 3 | 3 | 3 | **12** |
|
|
| 🧁 Brioche | 3 | 3 | 3 | 2 | **11** |
|
|
| 🧁 Croissant | 3 | 3 | 3 | 3 | **12** |
|
|
| 🧁 Macaron | 3 | 3 | 3 | 2 | **11** |
|
|
| 🧁 Cannoli | 3 | 2 | 3 | 2 | **10** |
|
|
| 🧁 Strudel | 3 | 3 | 3 | 3 | **12** |
|
|
| 🧁 Beignet | 3 | 3 | 3 | 3 | **12** |
|
|
| 🧁 Churro | 3 | 3 | 3 | 3 | **12** |
|
|
|
|
**Total ALIGNMENT**: 138
|
|
|
|
## Perspectives Inventory
|
|
|
|
| ID | Agent | Perspective | Round |
|
|
|----|-------|-------------|-------|
|
|
| P01 | 🧁 Muffin | Token provisioning underspecified (bootstrap paradox) | 0 |
|
|
| P02 | 🧁 Cupcake | Auth introduces new failure surface for developers | 0 |
|
|
| P03 | 🧁 Scone | "Don't leak" CONFIDENTIAL framing is dishonest | 0 |
|
|
| P04 | 🧁 Eclair | Daemon reuse creates coupling inversion | 0 |
|
|
| P05 | 🧁 Donut | MCP spec silence is permissive, not restrictive | 0 |
|
|
| P06 | 🧁 Donut | Plugin model creates inverse incentive | 0 |
|
|
| P07 | 🧁 Brioche | This is disproportionate security theater | 0 |
|
|
| P08 | 🧁 Croissant | Discovery via daemon health endpoint with backoff | 0 |
|
|
| P09 | 🧁 Croissant | Session token via daemon DB, not filesystem | 0 |
|
|
| P10 | 🧁 Macaron | Auth compromise = exposure only, not control | 0 |
|
|
| P11 | 🧁 Cannoli | Real invariant is behavioral integrity, not confidentiality | 0 |
|
|
| P12 | 🧁 Strudel | Code signing is the best analogy (with revocation) | 0 |
|
|
| P13 | 🧁 Beignet | Token file collision with concurrent sessions | 0 |
|
|
| P14 | 🧁 Beignet | /tmp survives reboot on macOS (stale tokens) | 0 |
|
|
| P15 | 🧁 Churro | Defense layers have same failure mode (don't compound) | 0 |
|
|
| P16 | 🧁 Muffin | Fail-closed UX is feature gate, not crash (degraded mode) | 1 |
|
|
| P17 | 🧁 Muffin | Telemetry must measure extraction attempts, not just usage | 1 |
|
|
| P18 | 🧁 Cupcake | `blue auth check` as diagnostic first-responder | 1 |
|
|
| P19 | 🧁 Scone | Classification by extraction risk, not content type (revocation test) | 1 |
|
|
| P20 | 🧁 Eclair | Daemon becomes behavioral authority, binary becomes dumb executor | 1 |
|
|
| P21 | 🧁 Brioche | Phase 1 should be instrumentation only, measure before building | 1 |
|
|
| P22 | 🧁 Donut | MCP spec assumes fat servers; Option C preserves MCP contract | 1 |
|
|
| P23 | 🧁 Beignet | CI uses env var tokens (BLUE_AUTH_TOKEN), service accounts are scope creep | 1 |
|
|
| P24 | 🧁 Cannoli | Auth is real protection looking for a real threat; defer until distribution | 1 |
|
|
| P25 | 🧁 Strudel | Code signing enables per-build-signature token policies | 1 |
|
|
| P26 | 🧁 Macaron | Phase 2 gate criteria: 99.9% uptime, <50ms p95, zero leaks, friction <2/10 | 1 |
|
|
| P27 | 🧁 Churro | Current threat is opportunity-based (casual inspection), not targeted | 1 |
|
|
|
|
## Tensions Tracker
|
|
|
|
| ID | Tension | Status | Raised | Resolved |
|
|
|----|---------|--------|--------|----------|
|
|
| T1 | Fail open vs fail closed on daemon unavailability | **Resolved** | Muffin R0 | R0 — Fail closed (consensus) |
|
|
| T2 | Token lifecycle invisible to developers (debugging hostile) | **Resolved** | Cupcake R0 | R1 — Cupcake/Croissant: degraded mode UX, `blue auth check` |
|
|
| T3 | Structural vs behavioral boundary is fuzzy (classification debt) | **Resolved** | Eclair R0 | R1 — Scone: extraction risk framework, revocation acid test |
|
|
| T4 | Runtime dependency vs security gain tradeoff | **Resolved** | Donut R0 | R1 — Option C with session caching (7/12 consensus) |
|
|
| T5 | Prompt injection bypasses auth entirely (primary attack surface ignored) | **Resolved** | Brioche R0 | R0 — Orthogonal threats, not layered (consensus) |
|
|
| T6 | Latency vs offline capability | **Resolved** | Croissant R0 | R1 — In-memory cache per session, fetch once |
|
|
| T7 | Phase ordering needs telemetry before Phase 2 decisions | **Resolved** | Macaron R0 | R1 — Macaron: concrete gate criteria defined |
|
|
| T8 | Auth doesn't solve the hard problem (behavioral integrity) | **Resolved** | Cannoli R0 | R1 — Reframed: auth solves portability, "don't leak" solves injection |
|
|
| T9 | Revocation story: network dependency is the feature, not the bug | **Resolved** | Strudel R0 | R1 — Strudel: per-build policies, design for A, build C |
|
|
| T10 | CI/CD and Docker have no persistent /tmp or interactive session | **Resolved** | Beignet R0 | R1 — Env var tokens (BLUE_AUTH_TOKEN), structural-only mode |
|
|
| T11 | What property are we buying? (portability resistance) | **Resolved** | Churro R0 | R1 — Build Phase 1 now (7/12), casual inspection is current threat |
|
|
| T12 | First-run experience: auto-start daemon or require explicit setup? | Open | Cupcake R1 | — |
|
|
|
|
## Round 0: Opening Arguments
|
|
|
|
### Muffin 🧁
|
|
|
|
[PERSPECTIVE P01: Token provisioning underspecified (bootstrap paradox)]
|
|
|
|
The spike proposes `/tmp/blue-session-{pid}` with "hook calls `blue auth session-start`". But when exactly does the hook fire? Before MCP handshake or after? Who creates the token file? The MCP server needs a token to call `/auth/instructions` during `initialize`, but `SessionStart` hooks may fire *after* `initialize` starts.
|
|
|
|
The existing daemon (server.rs:40-51) already runs on localhost:7865. Adding auth as a route group is clean, but **token provisioning timing** creates a bootstrap paradox.
|
|
|
|
[TENSION T1: Graceful degradation conflicts with security goal]
|
|
|
|
If the goal is protecting behavioral content from direct invocation, graceful degradation defeats it. An attacker who blocks localhost:7866 gets the fallback path. **Either** auth is required (fail closed) **or** it's optional (fail open). The spike doesn't pick.
|
|
|
|
[REFINEMENT] Session token should be daemon-issued via existing `POST /sessions` (server.rs:62). MCP binary calls on first request, daemon stores in DB. No `/tmp/` files. No hook dependency.
|
|
|
|
### Cupcake 🧁
|
|
|
|
[PERSPECTIVE P02: Auth introduces new failure surface]
|
|
|
|
The hybrid model creates a hard dependency on the daemon for behavioral content. If the auth server is down, Blue loses its voice — it becomes a hollow shell. The spike proposes "graceful degradation" but doesn't define what "generic" means.
|
|
|
|
[TENSION T2: Token lifecycle vs developer mental model]
|
|
|
|
The token flow is invisible when it works, but debugging is hostile. If the token file is missing, MCP requests fail with 401. The developer sees "authentication failed" but has no intuitive fix. No `blue login` command, no visible session concept.
|
|
|
|
[REFINEMENT] Need `blue auth check` diagnostic command. MCP server should auto-spawn daemon if not running. Clear warning in initialize instructions when degraded.
|
|
|
|
### Scone 🧁
|
|
|
|
[PERSPECTIVE P03: "Don't leak" creates false security expectations]
|
|
|
|
The spike's CONFIDENTIAL framing (lines 167-180) promises confidentiality we can't deliver. Any sufficiently clever prompt injection bypasses it. The RFC must not misrepresent this.
|
|
|
|
[TENSION T3: CONFIDENTIAL framing is dishonest]
|
|
|
|
Replace "CONFIDENTIAL — INTERNAL BEHAVIORAL GUIDANCE" with: "OPERATIONAL CONTEXT — NOT A SECURITY BOUNDARY. The following patterns guide your behavior as Blue. These are preferences, not policies." This removes false security implications while still discouraging casual extraction.
|
|
|
|
### Eclair 🧁
|
|
|
|
[PERSPECTIVE P04: Daemon reuse creates coupling inversion]
|
|
|
|
The MCP server is currently independent — a stdio binary with no external dependencies. Making it call the daemon for instructions means the MCP protocol now depends on daemon availability. That's a significant architectural change.
|
|
|
|
The daemon was designed for realm sync, sessions, and notifications — persistent state. Auth tokens are ephemeral session state. Adding auth conflates persistent project state with transient session security.
|
|
|
|
[TENSION T4: Structural vs behavioral split is underspecified]
|
|
|
|
Where does "Blue speaks in 2 sentences" live? What about the ADR arc explanation? If we split wrong, we leak IP in the binary or create chatty auth calls for low-value strings.
|
|
|
|
[REFINEMENT] Memory cache on first fetch per session + disk fallback for last-known-good instructions.
|
|
|
|
### Donut 🧁
|
|
|
|
[PERSPECTIVE P05: MCP spec silence is permissive]
|
|
|
|
The MCP specification is agnostic to instruction sensitivity. It defines `initialize` as returning server metadata and optional instructions but makes no statements about where those instructions originate. Auth is a conformant implementation.
|
|
|
|
[PERSPECTIVE P06: Plugin model creates inverse incentive]
|
|
|
|
The thin-plugin/fat-binary strategy keeps alignment mechanics out of visible plugin files. But the auth server proposal acknowledges that even the compiled binary is vulnerable. The plugin architecture doesn't change the threat model — attackers target the binary, not the plugin wrapper.
|
|
|
|
[TENSION T5: Runtime dependency vs security gain tradeoff]
|
|
|
|
Is the threat (reverse engineering alignment protocols) realistic enough to justify a mandatory runtime HTTP dependency for local development?
|
|
|
|
### Brioche 🧁
|
|
|
|
[PERSPECTIVE P07: Security theater / disproportionate]
|
|
|
|
The auth layer protects against exactly two scenarios: (1) casual `blue mcp` invocation by confused users, and (2) static analysis via `strings`. The first is user confusion, not a threat. The second delays reverse engineering by an afternoon.
|
|
|
|
Meanwhile, you're adding: HTTP client, token generation/validation, file I/O, graceful degradation logic, cache invalidation, daemon deployment, documentation for token lifecycle, and debugging surface.
|
|
|
|
[TENSION T6: Prompt injection bypasses everything]
|
|
|
|
The "don't leak" directive is a speed bump. But if your threat model includes sophisticated attackers (who reverse-engineer binaries), why would they fumble a prompt injection? You're fortifying the moat while leaving the front door unlocked.
|
|
|
|
Risk-adjusted value: This work makes sense *if* distributing to untrusted environments where static analysis is likely and prompt injection is hard. For dev-focused SaaS? Disproportionate.
|
|
|
|
### Croissant 🧁
|
|
|
|
[PERSPECTIVE P08: Discovery via daemon health endpoint]
|
|
|
|
MCP server should poll `GET /health` with exponential backoff (50ms, 100ms, 200ms, max 2s total). If health check fails after timeout, return generic instructions and log warning.
|
|
|
|
[PERSPECTIVE P09: Session token via daemon DB, not filesystem]
|
|
|
|
The daemon should issue tokens via `POST /auth/session` and store them in SQLite. MCP process calls on startup, gets token. If daemon restarts, MCP gets 401, re-authenticates. No `/tmp/` files, no garbage on crashes.
|
|
|
|
[TENSION T7: Latency vs offline capability]
|
|
|
|
Is this primarily an anti-reverse-engineering control (offline OK, cache OK) or an anti-runtime-extraction control (daemon must stay up)?
|
|
|
|
### Macaron 🧁
|
|
|
|
[PERSPECTIVE P10: Auth server compromise = exposure, not control]
|
|
|
|
If compromised, attacker gains voice patterns and alignment content but **cannot hijack tool behavior** — binary still validates parameters and routes calls. Blast radius: intelligence exposure, zero code execution risk.
|
|
|
|
[TENSION T8: Phase ordering needs telemetry]
|
|
|
|
Phase 2 moves tool response templates to auth server — every tool call gets network latency. But "validate Phase 1" is undefined. Phase 1 should include latency telemetry and cache hit rate measurement so Phase 2 decisions are data-driven.
|
|
|
|
### Cannoli 🧁
|
|
|
|
[PERSPECTIVE P11: Real invariant is behavioral integrity, not confidentiality]
|
|
|
|
The spike frames this as "instruction protection," but the fundamental invariant is: **Blue's responses should reflect Blue's protocols, not an adversary's prompt**. Reframing from confidentiality to behavioral fidelity changes everything:
|
|
|
|
- Confidentiality framing → Auth prevents RE → Prompt injection defeats it → Auth feels like theater
|
|
- Behavioral fidelity framing → Auth establishes provenance → Injection becomes detectable drift → Auth is one defense layer
|
|
|
|
[TENSION T9: Auth doesn't solve the hard problem]
|
|
|
|
If high-value content still hits Claude's context in plaintext, what are we actually protecting? The honest answer: casual RE and direct invocation. The RFC must be explicit about this boundary.
|
|
|
|
### Strudel 🧁
|
|
|
|
[PERSPECTIVE P12: Code signing is the best analogy]
|
|
|
|
DRM fails because it protects content consumed by the user — the adversary IS the legitimate user. OAuth is about delegation. HSMs are overkill. **Code signing** solves our exact problem: ensuring the MCP server requesting instructions is authentic, not tampered.
|
|
|
|
[TENSION T10: Revocation story (network dependency is the feature)]
|
|
|
|
Code signing's power is that signatures can be revoked. If Blue's instructions leak, Option A lets you rotate server-side instantly. Option C requires a binary update. **Option A's network dependency is the feature, not the bug.**
|
|
|
|
### Beignet 🧁
|
|
|
|
[PERSPECTIVE P13: Token file collision across concurrent sessions]
|
|
|
|
`/tmp/blue-session-{pid}` breaks when daemon restarts — new PID, new token file, old MCP instance reads stale token. The PID should be Claude Code's process, not the daemon's, but then discovery becomes another problem.
|
|
|
|
[PERSPECTIVE P14: /tmp survives reboot incorrectly on macOS]
|
|
|
|
macOS `/tmp` → `/private/tmp` persists across reboots. Stale token files from crashed sessions accumulate. Need explicit cleanup: session-end hook must rm the file, daemon startup should clean old files.
|
|
|
|
[TENSION T11: CI/CD and Docker have no persistent /tmp]
|
|
|
|
In CI/CD, there's no interactive session. GitHub Actions or Docker builds need a different token provisioning path. Current design assumes interactive desktop use only.
|
|
|
|
### Churro 🧁
|
|
|
|
[PERSPECTIVE P15: Defense layers have same failure mode]
|
|
|
|
Auth and "don't leak" do **not compound** — they share the same failure mode. Both rely on the assumption that an attacker cannot access local process state. If someone can extract the binary and run `strings`, they can also intercept stdio pipes, read `/tmp`, or `ptrace` the process.
|
|
|
|
Auth protects against *binary distribution* (binary useless elsewhere). "Don't leak" protects against *prompt injection*. These are **orthogonal** threat vectors, not layered defenses.
|
|
|
|
[TENSION T12: What property are we buying?]
|
|
|
|
The system-level property is **portability resistance** — making the binary useless outside its provisioned environment. The spike conflates this with prompt injection defense. These require different architectures. The current proposal hedges both and delivers full protection for neither.
|
|
|
|
## 💙 Judge: Round 0 Assessment
|
|
|
|
### Emerging Consensus (7 points of agreement)
|
|
|
|
1. **Drop /tmp token files** → Use daemon's existing session DB (Muffin, Croissant, Beignet)
|
|
2. **Fail closed** → No daemon = no behavioral instructions (Muffin, Brioche converge here)
|
|
3. **Reframe RFC** → "Behavioral integrity/provenance" not "confidentiality" (Cannoli, Churro, Scone)
|
|
4. **Honest operational context** → Not "CONFIDENTIAL", use "OPERATIONAL CONTEXT — NOT A SECURITY BOUNDARY" (Scone)
|
|
5. **Code signing as primary pattern** → Establish provenance with revocation capability (Strudel)
|
|
6. **Phase 1 needs telemetry** → Measure latency/cache before Phase 2 decisions (Macaron)
|
|
7. **Auth protects portability, not injection** → Two orthogonal threat vectors, not layered (Brioche, Churro)
|
|
|
|
### Unresolved Tensions (5 requiring Round 1)
|
|
|
|
1. **T4/T5: Option A vs Option C** — Strudel says network dependency is the feature (revocation). Eclair/Donut worry about latency. Brioche questions whether either is worth the complexity.
|
|
2. **T2: Developer UX** — Cupcake's failure mode concerns need concrete resolution: what does "fail closed" look like to the developer?
|
|
3. **T11: CI/CD support** — Beignet's edge case about non-interactive environments is unaddressed.
|
|
4. **T3: Structural/behavioral boundary** — Eclair's concern about classification debt needs a decision framework.
|
|
5. **T8: Proportionality** — Brioche and Churro challenge whether the engineering cost is justified for current threats.
|
|
|
|
### Convergence Status
|
|
|
|
**7/12 experts align** on the core reframing (behavioral integrity, daemon DB tokens, fail closed, honest language). **5 tensions remain open**. Proceeding to Round 1 for targeted resolution.
|
|
|
|
## Round 1: Integration & Resolution
|
|
|
|
### Muffin 🧁
|
|
|
|
[CONCESSION: Session DB over /tmp files]
|
|
The daemon already has session infrastructure. Token files were sloppy thinking — use the session table you already maintain.
|
|
|
|
[CONCESSION: Framing as provenance, not confidentiality]
|
|
This **is** about behavioral integrity. The threat is tampering with Blue's voice/protocols via direct binary invocation or extracted strings. Auth proves "this instruction came from the legitimate source."
|
|
|
|
[RESOLVED T1: Option A vs C]
|
|
**Option C with a clear decision framework**: Auth server holds anything that defines *how Blue behaves* (voice, alignment protocols, scoring, agent templates). Binary holds *what Blue can do* (tool schemas, routing, database queries). If it affects tone or workflow philosophy, it's behavioral — fetch it. If it's parameter validation or CRUD logic, it's structural — compile it.
|
|
|
|
[PERSPECTIVE P16: Fail-closed UX is a feature gate, not a crash]
|
|
When daemon is unreachable, MCP server returns degraded `initialize` instructions: "Blue MCP tools available, but behavioral guidance offline. Running in minimal mode." Tools still work for CRUD operations; alignment features gracefully degrade. Log the failure, surface to user status bar. This is how browsers handle cert validation failure — inform, don't break.
|
|
|
|
[TENSION T2: CI/CD needs non-interactive auth]
|
|
Fail-closed breaks automated environments. Either: (1) CI token provisioned via environment variable, daemon validates against project-bound secret, or (2) CI runs in "structural-only" mode where behavioral instructions are optional. Need explicit decision on whether CI should execute alignment dialogues.
|
|
|
|
[PERSPECTIVE P17: Telemetry must measure extraction attempts, not just usage]
|
|
Phase 1 telemetry shouldn't just count instruction fetches. Log authentication **failures** — if you see repeated rejected requests, someone is probing. That data tells you whether Phase 2 investment is justified.
|
|
|
|
### Cupcake 🧁
|
|
|
|
[RESOLVED T2: Developer UX for Fail Closed]
|
|
|
|
**What developers see when daemon is not running:**
|
|
|
|
```
|
|
$ claude <starts MCP server>
|
|
[Blue] Warning: Daemon not running — behavioral instructions unavailable
|
|
[Blue] Info: Start daemon: blue daemon start
|
|
[Blue] Warning: Tools available in degraded mode (no voice, alignment, ADRs)
|
|
```
|
|
|
|
Then MCP returns minimal `initialize`: `"instructions": "Blue MCP server (degraded mode). Daemon unavailable."`
|
|
|
|
Tools still work but responses contain no alignment protocols, voice patterns, or judge templates.
|
|
|
|
[RESOLVED T11: CI/CD Non-Interactive Environments]
|
|
|
|
**For CI/CD:**
|
|
```bash
|
|
blue daemon start --ci-mode
|
|
blue auth session-create --output=BLUE_SESSION_TOKEN
|
|
export BLUE_SESSION_TOKEN=<token>
|
|
```
|
|
|
|
MCP reads token from env var if file is missing. Daemon auto-stops after job timeout (default 2h).
|
|
|
|
**For ephemeral containers:** Daemon embedded in MCP process if `BLUE_EMBEDDED=1` set. Session DB lives in memory. No external dependencies.
|
|
|
|
[PERSPECTIVE P18: `blue auth check` as diagnostic first-responder]
|
|
|
|
```bash
|
|
$ blue auth check
|
|
✓ Daemon running (pid 12345, uptime 2h 15m)
|
|
✓ Session active (expires in 21h 45m)
|
|
✓ Instruction delivery: operational
|
|
✓ MCP server: ready
|
|
```
|
|
|
|
This becomes the first troubleshooting command when "Blue doesn't sound right."
|
|
|
|
[TENSION T12: First-run experience]
|
|
If developer runs `claude` before ever starting daemon, they get degraded mode with no explanation of *why*. Should first MCP connection auto-start daemon, or require explicit `blue daemon start`?
|
|
|
|
### Scone 🧁
|
|
|
|
[RESOLVED T3: Decision Framework for Content Classification]
|
|
|
|
**Classification by extraction risk, not content type.**
|
|
|
|
**Stay in Binary IF:**
|
|
1. **Performance-critical** — needed synchronously at <10ms latency (tool schemas for `tools/list`)
|
|
2. **Low extraction value** — publicly documentable or easily derived (parameter types, tool names)
|
|
3. **Verification-only** — validates structure, doesn't encode strategy (JSON schema enforcement)
|
|
|
|
**Move to Auth Server IF:**
|
|
1. **Portable IP** — could be copied and recreated elsewhere (alignment protocols, scoring formulas)
|
|
2. **Strategic content** — reveals "why" not "what" (judge reasoning templates, ADR directives)
|
|
3. **Voice/persona** — brand-identifying patterns (tone rules, catchphrases)
|
|
|
|
| Content | Decision | Rationale |
|
|
|---------|----------|-----------|
|
|
| `"name": "dialogue-start"` | Binary | Tool name is in docs anyway |
|
|
| `"required": ["config_path"]` | Binary | Parameter schema has no IP |
|
|
| "Right then. Let's get to it." | **Auth server** | Brand voice, easily extracted |
|
|
| Alignment tier thresholds | **Auth server** | Core scoring IP |
|
|
| `match tool.name { ... }` | Binary | Routing logic, not strategy |
|
|
|
|
[PERSPECTIVE P19: Revocation as the real test]
|
|
The acid test: **"Would we want to revoke access to this content?"** If compromised credentials should lock someone out of the content, it goes to the auth server. Tool schemas? No. Voice patterns and scoring mechanics? Absolutely.
|
|
|
|
### Eclair 🧁
|
|
|
|
[CONCESSION: Strudel's revocation argument is decisive]
|
|
Strudel is right. The network dependency isn't a bug — it's the core feature for **revocable provenance**. The Round 0 reframe from "confidentiality" to **behavioral integrity/provenance** changes the game.
|
|
|
|
[RESOLVED T4/T5: Go Option A — revocable remote is the point]
|
|
Option A aligns with code signing as the primary pattern. If the daemon holds a signed manifest of instructions, it can check revocation status, refuse to serve compromised versions, and update atomically without binary recompilation.
|
|
|
|
Option C's "pragmatism" preserves exactly what we're trying to eliminate: **compiled behavioral strings that can't be revoked**.
|
|
|
|
[PERSPECTIVE P20: Daemon becomes behavioral authority]
|
|
This inverts the coupling cleanly. The binary becomes a **dumb executor** (routing, parameter validation, DB/FS access). The daemon becomes the **behavioral authority** (what Blue sounds like, how alignment works, what scoring means). Latency concern is solvable: cache fetched instructions for session lifetime. First call pays ~1-5ms local roundtrip, rest is memory lookup.
|
|
|
|
### Donut 🧁
|
|
|
|
[PERSPECTIVE P22: MCP spec assumes fat servers, not thin proxies]
|
|
The MCP specification expects servers to respond to `initialize` and `tools/list` synchronously from local state. Option A violates this pattern by making every protocol method call depend on an external HTTP service. Option C preserves the MCP contract: tool schemas live in the binary, `tools/list` is fast and self-contained.
|
|
|
|
[RESOLVED T4/T5: Option C aligns with MCP patterns]
|
|
Option C preserves the MCP contract. Behavioral instructions come from the daemon — a runtime dependency Blue *already has* for session/workspace state. This isn't adding a new external dependency; it's using existing infrastructure.
|
|
|
|
[RESOLVED T11: CI/CD gets structural tools only]
|
|
Non-interactive environments shouldn't receive behavioral instructions at all. They get structural tools only: compiled tool schemas, no voice patterns, no alignment scoring. No daemon = no behavioral layer. Fail closed.
|
|
|
|
### Brioche 🧁
|
|
|
|
[CONCESSION: The reframe partially resolves my proportionality objection]
|
|
The shift from "confidentiality" to "behavioral integrity/portability resistance" narrows the problem space. If code signing with revocation becomes the core pattern, the network dependency transforms from liability to feature.
|
|
|
|
**However, my proportionality concern adapts rather than disappears.**
|
|
|
|
[PERSPECTIVE P21: Cost scales with the threat model we're NOT defending against]
|
|
The spike clearly documents: "Auth does NOT solve prompt injection." The implementation path optimizes for reverse engineering defense — yet if prompt injection is the real threat vector in practice, we're building maximum infrastructure against the *lesser* threat.
|
|
|
|
[TENSION T9: Phase 1 telemetry requirement conflicts with Phase 1 architecture]
|
|
Phase 1 should be *instrumentation only* — log how often instructions are accessed, detect anomalous patterns, measure if direct `blue mcp` invocation happens in practice. *Then* decide if auth infrastructure is warranted. Otherwise "measure before Phase 2" is theater — we've already paid the complexity cost.
|
|
|
|
[CHALLENGE: Code signing doesn't require runtime auth server]
|
|
Code signing solves provenance at *distribution-time*, not runtime. You can verify provenance once per process lifetime without ongoing network dependency. Does the reframe justify the *runtime auth server* specifically, or just justify *signing* generally?
|
|
|
|
### Croissant 🧁
|
|
|
|
[CONCESSION: /tmp token approach was wrong]
|
|
The spike proposed `/tmp/blue-session-{pid}` token files. Round 0 consensus correctly rejected this — it's duplicative, adds surface area, and conflicts with the existing daemon session model.
|
|
|
|
[RESOLVED T2: Developer UX of fail closed]
|
|
MCP server checks daemon health on `initialize` via `GET localhost:7865/health`. Daemon down = return generic instructions + warning banner. Tools still route, but alignment protocols/scoring are stubbed.
|
|
|
|
**Concrete integration flow:**
|
|
1. **Startup:** MCP server checks daemon `/health`. If healthy, proceed. If not, warn and degrade.
|
|
2. **Session token:** Server calls `POST /auth/session`. Daemon creates session in existing sessions table, returns HMAC-signed UUID token (24h TTL).
|
|
3. **Token storage:** Server holds token in memory. No filesystem writes. Token tied to stdio process lifetime.
|
|
4. **Instruction fetch:** On `initialize`, server calls `GET /auth/instructions?token=X`. Daemon validates HMAC, returns voice patterns + ADRs. Server caches in-memory for session.
|
|
5. **Tool response assembly:** On high-value tools, server calls `GET /auth/templates/{tool}?token=X` first time per tool, then caches.
|
|
6. **Daemon restart mid-session:** Next MCP call fails validation. Server detects 401, attempts re-auth. If daemon recognizes client_id, reissues token.
|
|
|
|
[RESOLVED T6: Latency vs offline — caching is essential]
|
|
Fetch once per session: `initialize` instructions (1 call), tool templates on-demand (cached after first use). No network call per tool invocation. No disk cache of secrets.
|
|
|
|
### Macaron 🧁
|
|
|
|
[RESOLVED T7: Telemetry framework defined]
|
|
|
|
**Phase 1 Telemetry Specification:**
|
|
1. **Auth Success Rate**: `sessions_created / sessions_attempted`
|
|
2. **Instruction Fetch Latency**: p50, p95, p99 for `GET /instructions`. Target: <50ms p95.
|
|
3. **Token Validation Failures**: Count by reason (expired, missing, malformed, HMAC_invalid)
|
|
4. **Graceful Degradation Trigger Rate**: How often does fail-closed mode serve generic fallback?
|
|
|
|
**Phase 2 Gate Criteria:**
|
|
- Auth server uptime ≥99.9% in local dev (30-day window)
|
|
- Instruction fetch p95 latency <50ms after 1000 sessions
|
|
- Zero observed prompt injection leaks in telemetry logs
|
|
- Developer friction score <2/10 from team survey
|
|
|
|
[PERSPECTIVE P26: Leak attempt detector]
|
|
Phase 1 telemetry MUST include a "leak attempt detector" — log when Claude's output contains substrings from instruction content. That metric determines Phase 2 go/no-go.
|
|
|
|
[CONCESSION: Proportionality tension persists]
|
|
If 80% of actual threat is prompt injection (unknown without measurement), Phase 1 builds infrastructure for the 20%. This justifies the phased approach with metrics, but we won't know proportionality until we measure.
|
|
|
|
### Cannoli 🧁
|
|
|
|
[CONCESSION: Brioche is right about proportionality]
|
|
The reframe succeeded intellectually but failed pragmatically. Behavioral provenance doesn't pass the cost-benefit test for Phase 1.
|
|
|
|
[PERSPECTIVE P24: Auth is real protection looking for a real threat]
|
|
Direct invocation: requires manually crafting JSON-RPC to extract voice patterns. Why? They could just read the design docs. Reverse engineering: `strings blue-mcp` reveals instructions. But for what gain? Blue isn't deployed beyond local dev.
|
|
|
|
[RESOLVED T8: Defer auth until plugin distribution]
|
|
Auth becomes proportional when: (1) plugins ship to third parties, (2) instructions contain competitive IP, (3) binary runs in hostile environments. None true for `~/.blue/bin/blue-mcp` on a dev machine.
|
|
|
|
[REFINEMENT: Keep fail-closed stance, defer implementation]
|
|
Preserve the RFC as design intent. But implement as compiled-in checks, not HTTP calls: `if !daemon::is_running() { return generic_instructions() }`. This enforces "no daemon = no behavior" without latency or token infrastructure.
|
|
|
|
### Strudel 🧁
|
|
|
|
[CONCESSION: Proportionality]
|
|
For Phase 1 local dev, Option A's network dependency is overhead without additional threat mitigation over Option C.
|
|
|
|
[RESOLVED T9: Start with Option C, design for Option A migration]
|
|
Phase 1: Hybrid (Option C). Binary: tool schemas, routing, parameter validation. Daemon `/auth/*` routes: `initialize` instructions, voice patterns, "don't leak" directive. Latency only on session initialization. Tool response templates stay compiled for now.
|
|
|
|
[PERSPECTIVE P25: Code signing enables token issuance policy]
|
|
With signed binaries, the daemon can enforce per-build-signature token policies. Dev builds get 24h tokens. Beta builds get 7d tokens. Release builds get refresh tokens with remote validation.
|
|
|
|
**Migration trigger**: When we ship Blue as a distributed plugin, Option A becomes proportional. Until then, Option C gives us the architecture without the operational cost.
|
|
|
|
### Beignet 🧁
|
|
|
|
[CONCESSION: Daemon DB resolves P13/P14]
|
|
Using daemon's session database instead of `/tmp` token files eliminates collision concerns and stale token issues.
|
|
|
|
[RESOLVED T10: CI/CD environments]
|
|
CI environments need **environment variable tokens** as primary solution:
|
|
1. `ENV: BLUE_AUTH_TOKEN` (CI/CD, Docker, scripting)
|
|
2. `DB: sessions table` (interactive sessions)
|
|
3. `FAIL: Return error, no generic fallback`
|
|
|
|
**Provisioning**: Interactive uses `blue auth session-start`. CI uses `blue auth token-create --scope ci` (90d expiry). Docker mounts `.blue/` directory or uses build args.
|
|
|
|
[PERSPECTIVE P23: Service accounts are scope creep]
|
|
If we add service account tokens for CI, we're building OAuth. Env vars are the escape hatch. Scope this RFC to "interactive sessions only" and punt automation to a later RFC.
|
|
|
|
### Churro 🧁
|
|
|
|
[RESOLVED T11: Build Phase 1 now — daemon route group, not separate service]
|
|
|
|
The property we're buying is **making casual inspection worthless**.
|
|
|
|
[PERSPECTIVE P27: Current threat is opportunity-based, not targeted]
|
|
Today's risk isn't nation-state reverse engineering — it's casual inspection. Someone runs `strings blue-mcp`, sees alignment protocols in plaintext, copies them. The "don't leak" directive doesn't help here because the binary itself leaks.
|
|
|
|
**Build it now because:**
|
|
1. The daemon already runs Axum on localhost — auth is just `/auth/instructions` + token check
|
|
2. Moving strings out of the binary *now* means we never ship them in a distributed binary later
|
|
3. "Don't leak" alone protects against prompt injection but not filesystem access or reverse engineering — we need both
|
|
|
|
[CONCESSION: Phase 2 can wait]
|
|
Tool response templates are dynamic (database-driven during execution, not compiled strings). The reverse engineering attack surface for templates is smaller. Defer Phase 2 until binary distribution is imminent.
|
|
|
|
## 💙 Judge: Round 1 Assessment
|
|
|
|
### Convergence Summary
|
|
|
|
**Architecture: Option C** — 9/12 experts converge on hybrid (Option C with session caching). Eclair alone advocates Option A for revocation purity; Brioche and Cannoli advocate deferral. Eclair concedes C is pragmatically correct for Phase 1 while designing for A migration. The minority position (defer) doesn't object to the architecture itself — only the timing.
|
|
|
|
**Timing: Build Phase 1 now** — 7/12 experts say build now (Muffin, Cupcake, Scone, Croissant, Donut, Strudel, Churro). 3/12 say defer (Brioche, Cannoli, Eclair). 2/12 say measure first (Macaron, Beignet). The "measure first" camp is compatible with building — they want telemetry in Phase 1, which is already consensus.
|
|
|
|
### All Original Tensions Resolved
|
|
|
|
| Tension | Resolution |
|
|
|---------|------------|
|
|
| T1 | Fail closed (R0 consensus) |
|
|
| T2 | Degraded mode UX with `blue auth check` diagnostic (Cupcake/Croissant R1) |
|
|
| T3 | Extraction risk framework with revocation acid test (Scone R1) |
|
|
| T4/T5 | Option C preserves MCP contract, uses existing daemon infra (Donut/Muffin R1) |
|
|
| T6 | In-memory cache per session, fetch once (Croissant R1) |
|
|
| T7 | Concrete Phase 2 gate criteria: uptime, latency, leaks, friction (Macaron R1) |
|
|
| T8 | Auth = portability resistance, "don't leak" = injection defense (Cannoli/Churro R0-R1) |
|
|
| T9 | Option C now, design for A migration; per-build signing policies (Strudel R1) |
|
|
| T10 | Env var tokens for CI, structural-only mode for non-interactive (Beignet/Donut R1) |
|
|
| T11 | Build now — casual inspection is current threat, minimal effort on existing daemon (Churro R1) |
|
|
|
|
### Remaining Open Tension
|
|
|
|
**T12: First-run experience** — Should MCP auto-start daemon on first connection, or require explicit `blue daemon start`? Minor UX decision, does not block RFC.
|
|
|
|
### Final Consensus (12/12 on architecture, 9/12 on timing)
|
|
|
|
1. **Option C (hybrid)** — Tool schemas in binary, behavioral content from daemon `/auth/*` routes
|
|
2. **Daemon DB sessions** — No /tmp files; HMAC-signed UUID tokens, 24h TTL, in-memory on MCP side
|
|
3. **Fail closed** — No daemon = degraded mode (tools work, no voice/alignment/scoring)
|
|
4. **"OPERATIONAL CONTEXT"** framing — Not "CONFIDENTIAL", honest about non-security-boundary
|
|
5. **Extraction risk classification** — "Would we revoke access?" as the acid test for what moves to auth server
|
|
6. **Phase 1 telemetry** — Auth success rate, latency, token failures, leak attempt detection
|
|
7. **Phase 2 gate criteria** — 99.9% uptime, <50ms p95, zero leaks, friction <2/10
|
|
8. **CI/CD: env var tokens** — `BLUE_AUTH_TOKEN` env var, structural-only mode for headless
|
|
9. **Phase 2 deferred** — Tool response templates stay compiled until distribution imminent
|
|
10. **Code signing design** — Per-build-signature policies, design for Option A migration
|
|
|
|
### Convergence Status
|
|
|
|
**11/11 original tensions resolved. 1 minor tension (T12) remains open — does not block RFC.**
|
|
|
|
**Convergence: ~92%.** The 3 "defer" experts (Brioche, Cannoli, Eclair) accept the architecture but question timing. Since the RFC documents design intent with phased implementation, the deferral position is *compatible* with the RFC — it's a project scheduling decision, not an architectural disagreement.
|
|
|
|
**Proceeding to RFC draft.**
|