blue/docs/dialogues/realm-mcp-design.dialogue.md
Eric Garcia daaaea5c82 feat(realm): Implement RFC 0001 cross-repo coordination and RFC 0002 Phase 1 MCP tools
RFC 0001 - Cross-Repo Coordination with Realms:
- Daemon architecture with HTTP server on localhost:7865
- SQLite persistence for sessions, realms, notifications
- Realm service with git-based storage and caching
- CLI commands: realm status/sync/check/worktree/pr/admin
- Session coordination for multi-repo work

RFC 0002 Phase 1 - Realm MCP Integration:
- realm_status: Get realm overview (repos, domains, contracts)
- realm_check: Validate contracts/bindings with errors/warnings
- contract_get: Get contract details with bindings
- Context detection from .blue/config.yaml
- 98% expert panel alignment via 12-expert dialogue

Also includes:
- CLI documentation in docs/cli/
- Spike for Forgejo tunnelless access
- 86 tests passing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 07:14:35 -05:00

Dialogue: Realm MCP Integration Design

RFC: 0002-realm-mcp-integration
Goal: Reach 95% alignment on open design questions
Format: 12 experts, structured rounds


Open Questions

  1. Tool granularity - One realm tool with subcommands, or separate tools?
  2. Notification delivery - Poll on each tool call, or separate subscription?
  3. Multi-realm - How to handle repos in multiple realms?

Expert Panel

| Expert | Domain | Perspective |
|--------|--------|-------------|
| Ada | API Design | Clean interfaces, discoverability |
| Ben | Developer Experience | Friction, learning curve |
| Carmen | Systems Architecture | Scalability, performance |
| David | MCP Protocol | Tool conventions, client compatibility |
| Elena | Claude Integration | LLM tool use patterns |
| Felix | Distributed Systems | Consistency, coordination |
| Grace | Security | Trust boundaries, access control |
| Hassan | Product | User workflows, value delivery |
| Iris | Simplicity | Minimalism, YAGNI |
| James | Observability | Debugging, transparency |
| Kim | Testing | Testability, reliability |
| Luna | Documentation | Learnability, examples |

Round 1: Initial Positions

Question 1: Tool Granularity

Ada (API Design): Separate tools. Each tool has a clear contract. realm_status returns status, realm_check returns validation results. Easier to document, easier to version independently.

Ben (DX): Separate tools, but not too many. 5-7 tools max in the "realm" namespace. Too many tools overwhelms. Group by workflow: status, validation, session, worktree.

David (MCP Protocol): MCP tools should be atomic operations. One tool = one action. Subcommand patterns work poorly because the LLM has to understand nested schemas. Separate tools with clear names.

Elena (Claude Integration): Claude performs better with focused tools. A tool that does one thing well gets used correctly. A multi-purpose tool with modes leads to parameter confusion.

Iris (Simplicity): Start with 3 tools: realm_status, realm_check, realm_action. The action tool can handle mutations. Expand only when pain is proven.

Luna (Documentation): Separate tools are easier to document with examples. Each tool gets its own "when to use this" section.

Alignment: 85% toward separate tools, debate on how many.

Question 2: Notification Delivery

Carmen (Systems): Polling is simpler and more reliable. MCP doesn't have a push channel. Each tool call can check for pending notifications and include them in the response.

Felix (Distributed): Polling with piggybacking. Don't make a separate notification tool - just include notifications in every response when relevant. The daemon tracks what's been delivered.

David (MCP Protocol): MCP has no subscription model. Polling is the only option. But we can be smart: return notifications with any tool response, mark as delivered, client sees them naturally.

Hassan (Product): Users don't want to manually check notifications. Piggybacking is right - if Claude calls any realm tool and there are notifications, surface them. Claude can then decide to act.

James (Observability): Need a way to explicitly list/ack notifications too. Piggybacking is good for discovery, but sometimes you want to see "what happened while I was away."

Grace (Security): Notifications should only show changes the current session is authorized to see. Scope to domains the repo participates in.

Alignment: 90% toward poll-with-piggyback, explicit list/ack as supplement.

Question 3: Multi-Realm

Iris (Simplicity): Defer. MVP is one realm per repo. Multi-realm is complexity we don't need yet. YAGNI.

Ada (API Design): If we defer, design the API to not preclude it. All tools should accept optional realm parameter. Default to detected realm from cwd.

Ben (DX): Agree with Iris. I've never seen a repo that needs to be in multiple realms. Solve real problems first.

Carmen (Systems): One repo, one realm is a reasonable constraint. It simplifies the daemon state model significantly.

Felix (Distributed): Multi-realm creates coordination nightmares. Which realm's contract wins? Keep it simple: one repo belongs to one realm.

Grace (Security): Multi-realm means multi-trust-boundary. That's a security model I'd want to think very carefully about. Defer.

Kim (Testing): Single realm is much easier to test. The state space explodes with multi-realm.

Hassan (Product): What's the user story for multi-realm? I can't think of one. A repo coordinates with its team. That's one realm.

Alignment: 95% toward single-realm constraint for MVP.


Round 2: Convergence

Question 1: How Many Tools?

Elena: Let me propose a concrete list based on user workflows:

Status/Read (3 tools):

  • realm_status - Overview of realm state
  • realm_check - Validation for CI
  • contract_get - Get specific contract details

Session (2 tools):

  • session_start - Begin work session
  • session_stop - End session (status is included in other responses)

Workflow (2 tools):

  • worktree_create - Set up RFC worktrees
  • pr_status - Check PR readiness (prepare is a CLI action, not MCP)

Notifications (1 tool):

  • notifications_list - Explicit check (ack happens implicitly when acted upon)

Total: 8 tools

Ben: That's reasonable. Each maps to a clear user intent.

Ada: I'd merge session_start and session_stop into session with an action parameter. "Start" and "stop" are two sides of one concept.

Elena: Fair, but Claude handles "start a session" better than "call session with action=start". The verb in the tool name guides usage.

David: Agree with Elena. MCP tools work best when the name is the action. session_start is clearer than session(action="start").

Iris: 8 tools feels like a lot. Can we cut?

Hassan: Which would you cut? Each serves a distinct workflow.

Iris: contract_get could be part of realm_status with a filter. notifications_list could be piggybacked only.

James: I want notifications_list as explicit tool. "Show me what changed" is a real user intent.

Luna: 8 tools is fine if they're well-documented. The CLI has more commands than that.

Alignment on Q1: 90% - 8 tools as proposed, with room to consolidate if usage shows overlap.
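
The naming argument from Elena and David can be made concrete by comparing tool descriptors. The sketch below assumes the usual MCP descriptor fields (`name`, `description`, `inputSchema`). The `realm` parameter follows Ada's Round 1 suggestion. The description strings and the `REALM_PARAM` helper are illustrative and not taken from the RFC.

```python
# Hypothetical descriptors: field values are illustrative, not from the RFC.
REALM_PARAM = {
    "type": "string",
    "description": "Optional realm override; defaults to the detected realm.",
}

# Option the panel preferred: one tool per action, the verb lives in the name.
separate_tools = [
    {
        "name": "session_start",
        "description": "Begin a work session.",
        "inputSchema": {"type": "object", "properties": {"realm": REALM_PARAM}},
    },
    {
        "name": "session_stop",
        "description": "End the current work session.",
        "inputSchema": {"type": "object", "properties": {"realm": REALM_PARAM}},
    },
]

# Alternative Ada raised: one tool whose behaviour depends on a mode parameter.
merged_tool = {
    "name": "session",
    "description": "Start or stop a work session.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": ["start", "stop"]},
            "realm": REALM_PARAM,
        },
        "required": ["action"],
    },
}
```

With separate tools the model picks an action by picking a name. With the merged tool it must also fill in a required `action` value correctly, which is the parameter confusion Elena described.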

Question 2: Notification Details

Felix: Proposal for piggybacking:

  1. Every tool response includes notifications: [] field
  2. Daemon marks notifications as "delivered" when returned
  3. notifications_list shows all (including delivered) with filter options
  4. No explicit ack needed - acting on a notification is implicit ack

Carmen: What triggers a notification? Contract version bump?

Felix: Three triggers:

  • Contract updated (version change)
  • Contract schema changed (even same version - dangerous)
  • Binding added/removed in shared domain

Grace: Notifications are scoped to the domains the current repo participates in. If aperture and fungal share the s3-access domain, aperture sees fungal's changes to contracts in that domain only.
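
Grace's scoping rule amounts to a filter on domain membership. A minimal sketch, assuming each notification carries a `domain` field and that domain membership is available as a mapping from domain name to repo names (both are assumptions, not part of the RFC):

```python
from typing import Any


def visible_notifications(repo: str, domains: dict[str, set[str]],
                          notifications: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only notifications raised in a domain the repo participates in."""
    my_domains = {name for name, members in domains.items() if repo in members}
    return [n for n in notifications if n["domain"] in my_domains]
```

In this sketch, aperture would receive a change raised in s3-access but not one raised in a domain that only fungal belongs to.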

Kim: How do we test piggybacking? Every tool needs to include the notification check.

Ada: Extract to middleware. Every MCP handler calls check_notifications() and merges into response.

Alignment on Q2: 95% - Piggyback with explicit list, middleware pattern, three trigger types.
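
One way to read Ada's middleware suggestion is as a decorator applied to every handler. The sketch below is illustrative only. `check_notifications()` is the name Ada used, but its body, the in-memory `PENDING` list standing in for the daemon's store, and the `with_notifications` decorator are assumptions.

```python
import functools
from typing import Any, Callable

# Hypothetical in-memory stand-in for the daemon's notification store.
PENDING: list[dict[str, Any]] = []


def check_notifications() -> list[dict[str, Any]]:
    """Return undelivered notifications and mark them as delivered."""
    delivered = list(PENDING)
    PENDING.clear()
    return delivered


def with_notifications(handler: Callable[..., dict[str, Any]]) -> Callable[..., dict[str, Any]]:
    """Wrap an MCP handler so its response always carries a `notifications` field."""

    @functools.wraps(handler)
    def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
        response = dict(handler(*args, **kwargs))
        response["notifications"] = check_notifications()
        return response

    return wrapper


@with_notifications
def realm_status() -> dict[str, Any]:
    # Placeholder payload; the real tool returns repos, domains, and contracts.
    return {"repos": ["aperture", "fungal"]}
```

Because the check lives in one wrapper, Kim's testing concern reduces to testing the decorator once: a notification queued before the first call appears in that response, and the next call returns an empty `notifications` list.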

Question 3: Single Realm Confirmed

All: Consensus. One repo, one realm. The realm parameter is optional (defaults to cwd detection) but exists for explicit override in edge cases.

Ada: Document clearly: "A repo belongs to one realm. To coordinate across organizational boundaries, create a shared realm."

Alignment on Q3: 95% - Single realm constraint, documented clearly.


Round 3: Final Positions

Resolved Design

Tool Inventory (8 tools):

| Tool | Purpose | Notifications |
|------|---------|---------------|
| realm_status | Realm overview | Yes |
| realm_check | Validation | Yes |
| contract_get | Contract details | Yes |
| session_start | Begin session | Yes |
| session_stop | End session | No (final) |
| worktree_create | Create RFC worktrees | Yes |
| pr_status | PR readiness | Yes |
| notifications_list | Explicit notification check | N/A |

Notification Model:

  • Piggybacked on tool responses
  • Three triggers: version change, schema change, binding change
  • Scoped to shared domains
  • Middleware pattern for implementation
  • Explicit list for "catch up" workflow

Realm Constraint:

  • One repo belongs to one realm
  • Optional realm parameter for explicit override
  • Detected from .blue/config.yaml by default
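
Realm resolution under this constraint could look roughly like the following. The dialogue only says the realm is detected from `.blue/config.yaml` relative to the cwd. Walking up parent directories, and the helper names `find_blue_config` and `resolve_realm`, are assumptions made for this sketch. Parsing the file is left out because its layout is not specified here.

```python
from pathlib import Path
from typing import Optional


def find_blue_config(start: Path) -> Optional[Path]:
    """Return the nearest .blue/config.yaml at or above `start`, if any."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".blue" / "config.yaml"
        if candidate.is_file():
            return candidate
    return None


def resolve_realm(explicit: Optional[str], detected: Optional[str]) -> str:
    """Prefer the explicit `realm` parameter, else the realm read from config."""
    realm = explicit or detected
    if realm is None:
        raise ValueError("no realm given and none detected from .blue/config.yaml")
    return realm
```

The explicit parameter wins so that the edge-case override stays possible without changing the one-repo, one-realm default.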

Round 4: Resolving the Deferred 5%

Question 4: Notification Persistence

Carmen (Systems): Notifications need a lifecycle. Options:

  • A) Session-scoped: live until session ends
  • B) Time-based: live for N hours
  • C) Ack-based: live until explicitly acknowledged
  • D) Hybrid: session OR time, whichever comes first

Felix (Distributed): Session-scoped is problematic. What if I start a session, see a notification, don't act on it, end session, start new session - is it gone? That's data loss.

James (Observability): Notifications are events. Events should be durable. I want to see "what changed in the last week" even if I wasn't in a session.

Hassan (Product): User story: "I was on vacation for a week. I come back, start a session. What changed?" Time-based with reasonable window.

Grace (Security): Notifications contain information about what changed. Long retention = larger attack surface if daemon db is compromised. Keep it short.

Iris (Simplicity): 7 days, no ack needed. Old notifications auto-expire. Simple to implement, simple to reason about.

Ben (DX): What about "I've seen this, stop showing me"? Piggyback means I see the same notification every tool call until it expires.

Ada (API Design): Two states: pending and seen. Piggyback only returns pending. First piggyback delivery marks as seen. notifications_list can show both with filter.

Felix: So the lifecycle is:

  1. Created (pending) - triggered by contract change
  2. Seen - first piggybacked delivery
  3. Expired - 7 days after creation

Kim (Testing): That's testable. Clear state machine.

Elena (Claude): Claude sees notification once via piggyback, can ask for history via notifications_list. Clean.

Luna (Docs): Easy to document: "Notifications appear once automatically, then move to history. History retained 7 days."

Alignment on Q4: 95%

  • Lifecycle: pending → seen → expired
  • Retention: 7 days from creation
  • Piggyback: only pending notifications
  • List: shows all with state filter
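
The agreed lifecycle is small enough to sketch as a state function plus two accessors. This is an illustration under assumptions: the `Notification` fields, the function signatures, and the choice to drop expired entries from the list view are not specified by the panel.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RETENTION = timedelta(days=7)  # agreed retention, counted from creation


@dataclass
class Notification:
    message: str
    created_at: datetime
    seen_at: Optional[datetime] = None

    def state(self, now: datetime) -> str:
        """pending -> seen -> expired; expiry depends only on creation time."""
        if now - self.created_at >= RETENTION:
            return "expired"
        return "seen" if self.seen_at is not None else "pending"


def piggyback(store: list[Notification], now: datetime) -> list[Notification]:
    """Return pending notifications and mark them seen (first delivery only)."""
    pending = [n for n in store if n.state(now) == "pending"]
    for n in pending:
        n.seen_at = now
    return pending


def notifications_list(store: list[Notification], now: datetime,
                       state: Optional[str] = None) -> list[Notification]:
    """History view: everything not yet expired, optionally filtered by state."""
    live = [n for n in store if n.state(now) != "expired"]
    return live if state is None else [n for n in live if n.state(now) == state]
```

A notification is returned by `piggyback` exactly once, remains visible through `notifications_list` with `state="seen"`, and disappears seven days after creation whether or not it was ever delivered.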

Question 5: Schema Change Detection

Carmen (Systems): JSON Schema diffing is hard. Deciding semantic equivalence is intractable in general. Options:

  • A) Hash comparison (fast, false positives on formatting)
  • B) Normalized hash (canonicalize then hash)
  • C) Structural diff (expensive, accurate)
  • D) Don't detect schema changes, only version changes

Ada (API Design): What's the user need? "Contract schema changed" means "you might need to update your code." Version bump should signal that.

David (MCP): If we require version bump for schema changes, we don't need schema diffing. The version IS the signal.

Iris (Simplicity): I like D. Schema changes without version bump is a bug. Don't build tooling for buggy workflows.

Grace (Security): Counter-point: malicious or careless actor changes schema without bumping version. Consumer code breaks silently. Detection is a safety net.

Felix (Distributed): Schema hash as secondary check. If schema hash changes but version doesn't, that's a warning, not a notification. Different severity.

Ben (DX): So we have:

  • Version change → notification (normal)
  • Schema change without version change → warning in realm_check (smells bad)

Kim (Testing): Normalized hash is deterministic. Canonicalize JSON (sorted keys, no whitespace), SHA256. Same schema always produces same hash.

Carmen: Canonicalization is well-defined for JSON. Use RFC 8785 (JSON Canonicalization Scheme) or similar.

James (Observability): Store schema hash in contract metadata. On load, compute hash, compare. Mismatch = warning. No complex diffing needed.

Hassan (Product): I like the split: version changes are notifications (expected), schema-without-version is a check warning (unexpected, possibly buggy).

Elena (Claude): Clear for Claude too. Notifications are "things happened." Warnings are "something might be wrong."

Alignment on Q5: 95%

  • Version change: notification (normal workflow)
  • Schema change without version: warning in realm_check (smells bad)
  • Detection method: canonical JSON hash (RFC 8785 style)
  • Storage: hash stored in contract, computed on load, compared
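
The hash check could be sketched as follows. This approximates the canonical form with `json.dumps` (sorted keys, compact separators) rather than implementing RFC 8785 in full, which also fixes number formatting and key ordering details. The contract field names (`name`, `version`, `schema`, `schema_hash`) and the warning text are hypothetical.

```python
import hashlib
import json
from typing import Any, Optional


def schema_hash(schema: Any) -> str:
    """SHA-256 over a canonical form: sorted keys, no insignificant whitespace.

    Approximates RFC 8785; full JCS also pins number serialization and
    key ordering rules that json.dumps does not guarantee.
    """
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_schema(contract: dict[str, Any]) -> Optional[str]:
    """Return a realm_check-style warning if the schema drifted from its stored hash."""
    if schema_hash(contract["schema"]) != contract["schema_hash"]:
        return (f"contract {contract['name']} v{contract['version']}: "
                "schema changed without a version bump")
    return None
```

Key order and whitespace do not affect the hash, so reformatting a schema file stays silent, while any change to the schema's content at the same version produces the warning.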

Question 6: Worktree Tool Scope

Hassan (Product): User stories:

  1. "I'm starting RFC work, set up worktrees for all repos in my realm"
  2. "I only need to touch aperture and fungal for this RFC, not the others"
  3. "I'm in aperture, create a worktree just for this repo"

Ben (DX): Default should be "smart" - create worktrees for repos in domains I participate in, not all repos in realm.

Ada (API Design): Parameters:

  • rfc (required): branch name
  • repos (optional): specific list, default = domain peers

Felix (Distributed): "Domain peers" = repos that share at least one domain with current repo. If aperture and fungal share s3-access, they're peers.

Iris (Simplicity): What if I just want current repo? That's the simplest case.

Luna (Docs): Three modes:

  1. worktree_create(rfc="x") → domain peers (smart default)
  2. worktree_create(rfc="x", repos=["a","b"]) → specific list
  3. worktree_create(rfc="x", repos=["self"]) → just current repo

Kim (Testing): "self" is a magic value. I'd prefer explicit: repos=["aperture"] where aperture is current repo.

Elena (Claude): Claude can figure out current repo name from context. Magic values are confusing for LLMs.

Ada: Revised:

  • repos omitted → domain peers
  • repos=[] (empty) → error, must specify something
  • repos=["aperture"] → just aperture

Ben: What if repo has no domain peers? Solo repo in realm.

Felix: Then domain peers = empty = just self. Natural fallback.

Carmen: Edge case: repo in multiple domains with different peer sets. Union of all peers?

Grace: Union. If you share any domain, you might need to coordinate.

James (Observability): Log which repos were selected and why. "Creating worktrees for domain peers: aperture, fungal (shared domain: s3-access)"

Alignment on Q6: 95%

  • Default: domain peers (repos sharing at least one domain)
  • Explicit: repos parameter for specific list
  • Solo repo: defaults to just self
  • Multiple domains: union of all peers
  • Logging: explain selection reasoning
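
The selection rules above can be sketched as a single function. The signature, the domain-to-members mapping, and the decision to always include the current repo alongside its peers are assumptions. The log wording is modelled on James's example.

```python
from typing import Optional


def select_repos(current: str, domains: dict[str, set[str]],
                 repos: Optional[list[str]] = None) -> tuple[list[str], str]:
    """Pick worktree targets and a log line explaining the choice."""
    if repos is not None:
        if not repos:
            raise ValueError("repos must not be empty; omit it to use domain peers")
        return list(repos), f"Creating worktrees for requested repos: {', '.join(repos)}"

    # Domains the current repo shares with at least one other repo.
    shared = sorted(name for name, members in domains.items()
                    if current in members and len(members) > 1)
    # Union of members across all shared domains, always including the current repo.
    selected = {current}
    for name in shared:
        selected |= domains[name]
    chosen = sorted(selected)
    if not shared:
        return chosen, f"Creating worktree for {current} only (no domain peers)"
    return chosen, (f"Creating worktrees for domain peers: {', '.join(chosen)} "
                    f"(shared domains: {', '.join(shared)})")
```

Omitting `repos` yields the union of peers, an explicit list is used as given, an empty list is rejected, and a repo with no peers falls back to itself.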

Remaining 2%: Truly Deferred

  1. Notification aggregation - If contract changes 5 times in an hour, 5 notifications or 1? (Decide during implementation based on UX testing)

Final Alignment: 98%

Consensus reached on:

Core Design (Rounds 1-3)

  • 8 focused tools mapping to user workflows
  • Piggyback notifications with explicit list fallback
  • Single realm constraint with documented rationale

Notification Persistence (Round 4, Question 4)

  • Lifecycle: pending → seen → expired
  • Retention: 7 days from creation
  • Piggyback delivers pending only, marks as seen
  • List tool shows all with state filter

Schema Change Detection (Round 4, Question 5)

  • Version changes → notifications (normal workflow)
  • Schema-without-version → realm_check warning (smells bad)
  • Detection via canonical JSON hash (RFC 8785 style)

Worktree Scope (Round 4, Question 6)

  • Default: domain peers (repos sharing domains with current repo)
  • Explicit: repos parameter overrides default
  • Solo repos default to self
  • Multiple domains: union of all peers
  • Log selection reasoning for transparency

Truly Deferred (2%)

  • Notification aggregation (rapid changes: batch or individual?)

Panel Sign-off:

  • Ada ✓, Ben ✓, Carmen ✓, David ✓, Elena ✓, Felix ✓
  • Grace ✓, Hassan ✓, Iris ✓, James ✓, Kim ✓, Luna ✓