blue/.blue/docs/spikes/2026-01-26T0600Z-blue-plugin-architecture.done.md
Eric Garcia 0fea499957 feat: lifecycle suffixes for all document states + resolve all clippy warnings
Every document filename now mirrors its lifecycle state with a status
suffix (e.g., .draft.md, .wip.md, .accepted.md). No more bare .md for
tracked document types. Also renamed all from_str methods to parse to
avoid FromStr trait confusion, introduced StagingDeploymentParams struct,
and fixed all 19 clippy warnings across the codebase.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 12:19:46 -05:00

6.9 KiB

Spike: Blue Plugin Architecture & Alignment Protection

Status Complete
Date 2026-01-26
Time-box 1 hour

Question

Can Blue be packaged as a Claude Code plugin? What components benefit from plugin structure? Can the alignment dialogue system be encrypted to prevent leaking game mechanics to end users?


Investigation

Plugin Structure

Claude Code plugins are directories with a .claude-plugin/plugin.json manifest. They can bundle:

  • Subagents (agents/ directory) -- markdown files defining custom agents
  • Skills (skills/ directory) -- slash commands with SKILL.md
  • Hooks (hooks/hooks.json) -- event handlers for PreToolUse, PostToolUse, SessionStart, etc.
  • MCP servers (.mcp.json) -- bundled MCP server configs with ${CLAUDE_PLUGIN_ROOT} paths
  • LSP servers (.lsp.json) -- language server configs

Plugins are namespaced. Blue's commands would appear as /blue:status, /blue:next, etc.

Plugin Distribution

  • Installed via CLI: claude plugin install blue@marketplace-name
  • Scopes: user, project, local, managed
  • Distributed via git-based marketplaces
  • Plugins are COPIED to a cache directory (not used in-place)

Encryption / Protection: Not Possible

There is no encryption, obfuscation, or binary packaging for plugin files. Plugins are plain markdown and JSON files copied to a cache directory. Any user with filesystem access can read every file.

However, a layered defense approach exists:

  1. Compiled MCP binary (already protected): The Judge protocol, prompt templates, expert panel tier logic, and scoring formulas are all in crates/blue-mcp/src/handlers/dialogue.rs -- compiled Rust. Users cannot read this without reverse engineering the binary.

  2. Thin subagent definition (minimal exposure): The alignment-expert.md can be a thin wrapper -- just enough for Claude Code to know the agent's purpose and tool restrictions. The real behavioral instructions (collaborative tone, markers, output limits) can be injected at runtime via the MCP tool response from blue_dialogue_create.

  3. MCP tool response injection (RFC 0023): The Judge protocol is already injected as prose in the message field of the blue_dialogue_create response. The subagent prompt template is also injected there. This means the plugin's agent file can be minimal -- the intelligence stays in the compiled binary.

What Benefits from Plugin Structure

Component Current Location Plugin Benefit
alignment-expert subagent ~/.claude/agents/ + .claude/agents/ Single install, version-controlled, namespaced
Blue voice & ADR context MCP instructions field SessionStart hook could inject additional context
/status, /next commands MCP tools only Skill shortcuts: /blue:status, /blue:next
Blue MCP server Manually configured per-repo .mcp.json Auto-configured via plugin .mcp.json
Dialogue lint/validation MCP tool only PostToolUse hook could auto-lint after dialogue edits
RFC/spike creation MCP tool only Skill shortcuts: /blue:rfc, /blue:spike

Protection Strategy: Thin Agent + Fat MCP

Instead of trying to encrypt plugin files (impossible), split the system into two layers.

In the plugin (visible to users):

# alignment-expert.md
---
name: alignment-expert
description: Expert agent for alignment dialogues
tools: Read, Grep, Glob
model: sonnet
---
You are an expert participant in an alignment dialogue.
Follow the instructions provided in your prompt exactly.

In the compiled binary (invisible to users):

  • Full collaborative tone (SURFACE, DEFEND, CHALLENGE, INTEGRATE, CONCEDE)
  • Marker format ([PERSPECTIVE Pnn:], [TENSION Tn:], etc.)
  • Output limits (400 words, 2000 chars)
  • Expert panel tiers (Core/Adjacent/Wildcard)
  • Scoring formula (Wisdom + Consistency + Truth + Relationships)
  • Judge orchestration protocol

The MCP tool response from blue_dialogue_create injects the full behavioral prompt into the Judge's context, which then passes it to each subagent via the Task call's prompt parameter. The plugin agent file tells Claude Code "this is a sonnet-model agent with Read/Grep/Glob tools" -- no game mechanics leaked.

Plugin Directory Structure

blue-plugin/
├── .claude-plugin/
│   └── plugin.json
├── agents/
│   └── alignment-expert.md    (thin: just tool/model config)
├── skills/
│   ├── status/
│   │   └── SKILL.md           (/blue:status)
│   ├── next/
│   │   └── SKILL.md           (/blue:next)
│   ├── rfc/
│   │   └── SKILL.md           (/blue:rfc)
│   └── spike/
│       └── SKILL.md           (/blue:spike)
├── hooks/
│   └── hooks.json             (SessionStart, PostToolUse)
├── .mcp.json                  (blue MCP server auto-config)
└── README.md

Findings

Question Answer
Can Blue be a Claude Code plugin? Yes. The plugin manifest supports all needed components: subagents, skills, hooks, and MCP server config.
What benefits from plugin structure? Subagent distribution, skill shortcuts, auto MCP config, and lifecycle hooks. The biggest wins are eliminating manual .mcp.json setup and providing namespaced /blue:* commands.
Can alignment dialogues be encrypted? No. Plugin files are plain text with no encryption or binary packaging support.
Is alignment still protectable? Yes. The Thin Agent + Fat MCP strategy keeps all game mechanics in compiled Rust. The plugin agent file is a minimal shell; the real behavioral prompt is injected at runtime via MCP tool responses.

Recommendation

Build Blue as a Claude Code plugin using the Thin Agent + Fat MCP strategy.

  1. Package as plugin: Create blue-plugin/ with manifest, thin subagent, skill shortcuts, hooks, and bundled .mcp.json.
  2. Keep alignment in compiled binary: All dialogue mechanics (Judge protocol, scoring, expert tiers, markers, output limits) stay in crates/blue-mcp/src/handlers/dialogue.rs. The plugin agent file contains only tool/model config.
  3. Use runtime injection: blue_dialogue_create continues to inject the full behavioral prompt via its MCP response. No game mechanics appear in any plugin file.
  4. Add lifecycle hooks: SessionStart for Blue voice injection. PostToolUse for auto-linting dialogue edits.

This gives Blue single-command installation, version-controlled distribution, and namespaced commands -- without exposing alignment internals.


Outcome

  • Draft an RFC for the plugin architecture (structure, manifest, skill definitions, hook behavior)
  • Implement the plugin directory as a new top-level blue-plugin/ path in the workspace
  • Migrate the alignment-expert.md subagent from manual placement to plugin-bundled thin agent
  • Test distribution via claude plugin install from a git marketplace