# RFC 0004: ADR Adherence
| | |
|---|---|
| **Status** | Implemented |
| **Date** | 2026-01-24 |
| **Source Spike** | adr-adherence |
| **ADRs** | 0004 (Evidence), 0007 (Integrity), 0008 (Honor) |
## Summary

There is currently no mechanism to surface relevant ADRs during work, to track ADR citations, or to verify adherence to testable architectural decisions.
## Philosophy

Guide, don't block. ADRs are beliefs, not bureaucracy. Blue should:

- Help you find relevant ADRs
- Make citing ADRs easy
- Verify testable ADRs optionally
- Never require ADR approval to proceed
## Proposal

### Layer 1: Awareness (Passive)

#### blue_adr_list

List all ADRs with summaries:

```
blue_adr_list
```

Returns:

```json
{
  "adrs": [
    { "number": 0, "title": "Never Give Up", "summary": "The only rule we need" },
    { "number": 4, "title": "Evidence", "summary": "Show, don't tell" },
    ...
  ]
}
```

#### blue_adr_get

Get full ADR content:

```
blue_adr_get number=4
```

Returns the ADR's markdown and metadata.
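A minimal sketch of the data shape behind these two tools; the field names mirror the JSON keys above, while the struct name and lookup helper are hypothetical, not part of Blue's actual API:

```rust
// Illustrative only: AdrSummary mirrors one entry of the blue_adr_list
// response; the lookup models what blue_adr_get number=4 would do.
#[derive(Debug, Clone, PartialEq)]
struct AdrSummary {
    number: u32,
    title: String,
    summary: String,
}

fn find_adr(adrs: &[AdrSummary], number: u32) -> Option<&AdrSummary> {
    adrs.iter().find(|a| a.number == number)
}

fn main() {
    let adrs = vec![
        AdrSummary { number: 0, title: "Never Give Up".into(), summary: "The only rule we need".into() },
        AdrSummary { number: 4, title: "Evidence".into(), summary: "Show, don't tell".into() },
    ];
    // Lookup by number, as blue_adr_get would do.
    let found = find_adr(&adrs, 4).unwrap();
    assert_eq!(found.title, "Evidence");
    println!("ok");
}
```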
### Layer 2: Contextual Relevance (Active)

#### blue_adr_relevant

Given context, use AI to suggest relevant ADRs:

```
blue_adr_relevant context="testing strategy"
```

Returns:

```json
{
  "relevant": [
    {
      "number": 4,
      "title": "Evidence",
      "confidence": 0.95,
      "why": "Testing is the primary form of evidence that code works. This ADR's core principle 'show, don't tell' directly applies to test strategy decisions."
    },
    {
      "number": 7,
      "title": "Integrity",
      "confidence": 0.82,
      "why": "Tests verify structural wholeness - that the system holds together under various conditions."
    }
  ]
}
```
**AI-Powered Relevance:**

Keyword matching fails for philosophical ADRs. "Courage" won't match "deleting legacy code" even though ADR 0009 is highly relevant.

The AI evaluator:

- Receives the full context (RFC title, problem, code diff, etc.)
- Reads all ADR content (cached in the prompt)
- Determines semantic relevance with reasoning
- Returns confidence scores and explanations
**Prompt Structure:**

```
You are evaluating which ADRs are relevant to this work.

Context: {user_context}

ADRs:
{all_adr_summaries}

For each ADR, determine:
1. Is it relevant? (yes/no)
2. Confidence (0.0-1.0)
3. Why is it relevant? (1-2 sentences)

Only return ADRs with confidence > 0.7.
```
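The post-processing of the evaluator's output can be sketched as follows. `AdrSuggestion` and `filter_relevant` are illustrative names, not Blue's actual types; the sketch only shows the thresholding and ordering implied by the prompt:

```rust
// Illustrative only: keep suggestions above the 0.7 confidence
// threshold, highest confidence first.
#[derive(Debug)]
struct AdrSuggestion {
    number: u32,
    confidence: f64,
}

fn filter_relevant(mut suggestions: Vec<AdrSuggestion>) -> Vec<AdrSuggestion> {
    // Drop anything at or below the threshold from the prompt.
    suggestions.retain(|s| s.confidence > 0.7);
    // Present the strongest matches first.
    suggestions.sort_by(|a, b| b.confidence.partial_cmp(&a.confidence).unwrap());
    suggestions
}

fn main() {
    let raw = vec![
        AdrSuggestion { number: 7, confidence: 0.82 },
        AdrSuggestion { number: 2, confidence: 0.40 },
        AdrSuggestion { number: 4, confidence: 0.95 },
    ];
    let kept = filter_relevant(raw);
    let numbers: Vec<u32> = kept.iter().map(|s| s.number).collect();
    assert_eq!(numbers, vec![4, 7]); // 0.95 first, 0.82 second, 0.40 dropped
    println!("ok");
}
```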
**Model Selection:**

- Use a fast, cheap model (Haiku) for relevance checks
- Results are suggestions, not authoritative
- The user can override or ignore them
**Graceful Degradation:**

| Condition | Behavior |
|---|---|
| API key configured, API up | AI relevance (default) |
| API key configured, API down | Fallback to keywords + warning |
| No API key | Keywords only (no warning) |
| `--no-ai` flag | Keywords only (explicit) |
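The degradation rules above can be sketched as a small decision function. The enum and function names here are hypothetical, chosen to mirror the table rather than Blue's actual code:

```rust
// Illustrative only: method selection mirroring the degradation table.
#[derive(Debug, PartialEq)]
enum RelevanceMethod {
    Ai,
    KeywordWithWarning,
    Keyword,
}

fn choose_method(api_key_configured: bool, api_up: bool, no_ai_flag: bool) -> RelevanceMethod {
    if no_ai_flag || !api_key_configured {
        // Explicit opt-out, or no key at all: keywords, no warning.
        RelevanceMethod::Keyword
    } else if api_up {
        RelevanceMethod::Ai
    } else {
        // Key configured but API down: fall back and warn.
        RelevanceMethod::KeywordWithWarning
    }
}

fn main() {
    assert_eq!(choose_method(true, true, false), RelevanceMethod::Ai);
    assert_eq!(choose_method(true, false, false), RelevanceMethod::KeywordWithWarning);
    assert_eq!(choose_method(false, true, false), RelevanceMethod::Keyword);
    assert_eq!(choose_method(true, true, true), RelevanceMethod::Keyword);
    println!("ok");
}
```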
**Response Metadata:**

```json
{
  "method": "ai", // or "keyword"
  "cached": false,
  "latency_ms": 287,
  "relevant": [...]
}
```
**Privacy:**

- Only the context string is sent to the API (not code, not files)
- No PII should be in the context string
- The user controls what context to send
### RFC ADR Suggestions

When creating an RFC, Blue suggests relevant ADRs based on its title and problem statement:

```
blue_rfc_create title="testing-framework" ...
→ "Consider these ADRs: 0004 (Evidence), 0010 (No Dead Code)"
```
### ADR Citations in Documents

RFCs can cite ADRs in frontmatter:

```
| **ADRs** | 0004, 0007, 0010 |
```

Or inline:

```
Per ADR 0004 (Evidence), we require test coverage > 80%.
```
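Parsing the frontmatter row into ADR numbers could look something like the sketch below; `parse_adr_row` is a hypothetical helper, not part of Blue's actual citation parser:

```rust
// Illustrative only: extract ADR numbers from a frontmatter row
// like "| **ADRs** | 0004, 0007, 0010 |".
fn parse_adr_row(row: &str) -> Vec<u32> {
    row.trim()
        .trim_matches('|')
        .replace("**ADRs**", "")
        // Split on any non-digit, so commas, pipes, and spaces all delimit.
        .split(|c: char| !c.is_ascii_digit())
        .filter(|s| !s.is_empty())
        .filter_map(|s| s.parse().ok())
        .collect()
}

fn main() {
    assert_eq!(parse_adr_row("| **ADRs** | 0004, 0007, 0010 |"), vec![4, 7, 10]);
    assert_eq!(parse_adr_row("| **ADRs** | |"), Vec::<u32>::new());
    println!("ok");
}
```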
### Layer 3: Lightweight Verification (Optional)

#### blue_adr_audit

Scan for potential ADR violations. Only testable ADRs are checked:

```
blue_adr_audit
```

Returns:

```json
{
  "findings": [
    {
      "adr": 10,
      "title": "No Dead Code",
      "type": "warning",
      "message": "3 unused exports in src/utils.rs",
      "locations": ["src/utils.rs:45", "src/utils.rs:67", "src/utils.rs:89"]
    },
    {
      "adr": 4,
      "title": "Evidence",
      "type": "info",
      "message": "Test coverage at 72% (threshold: 80%)"
    }
  ],
  "passed": [
    { "adr": 5, "title": "Single Source", "message": "No duplicate definitions found" }
  ]
}
```
**Testable ADRs:**

| ADR | Check |
|---|---|
| 0004 Evidence | Test coverage, assertion ratios |
| 0005 Single Source | Duplicate definitions, copy-paste detection |
| 0010 No Dead Code | Unused exports, unreachable branches |

**Non-testable ADRs (human judgment):**

| ADR | Guidance |
|---|---|
| 0001 Purpose | Does this serve meaning? |
| 0002 Presence | Are we actually here? |
| 0009 Courage | Are we acting rightly? |
| 0013 Overflow | Are we building from fullness? |
### Layer 4: Documentation Trail

#### ADR-Document Links

Store citations in the `document_links` table:

```sql
INSERT INTO document_links (source_id, target_id, link_type)
VALUES (rfc_id, adr_doc_id, 'cites_adr');
```

#### Search by ADR

```
blue_search query="adr:0004"
```

Returns all documents citing ADR 0004.
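Recognizing the `adr:` query prefix is a one-liner; the sketch below is illustrative and `parse_adr_query` is a hypothetical name, not Blue's actual search code:

```rust
// Illustrative only: detect an "adr:NNNN" query and extract the number;
// any other query returns None and falls through to normal search.
fn parse_adr_query(query: &str) -> Option<u32> {
    query.strip_prefix("adr:")?.parse().ok()
}

fn main() {
    assert_eq!(parse_adr_query("adr:0004"), Some(4));
    assert_eq!(parse_adr_query("testing strategy"), None);
    println!("ok");
}
```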
#### ADR "Referenced By"

```
blue_adr_get number=4
```

Includes:

```json
{
  "referenced_by": [
    { "type": "rfc", "title": "testing-framework", "date": "2026-01-20" },
    { "type": "decision", "title": "require-integration-tests", "date": "2026-01-15" }
  ]
}
```
### ADR Metadata Enhancement

Add to each ADR:

```markdown
## Applies When
- Writing or modifying tests
- Reviewing pull requests
- Evaluating technical claims

## Anti-Patterns
- Claiming code works without tests
- Trusting documentation over running code
- Accepting "it works on my machine"
```

This gives the AI richer context for relevance matching. Anti-patterns are particularly useful: they help identify when work might be violating an ADR.
## Implementation

- Add ADR document type and loader
- Implement `blue_adr_list` and `blue_adr_get`
- Implement the AI relevance evaluator:
  - Load all ADRs into prompt context
  - Send context + ADRs to the LLM (Haiku for speed/cost)
  - Parse the structured response with confidence scores
  - Cache ADR summaries to minimize token usage
- Implement `blue_adr_relevant` using the AI evaluator
- Add ADR citation parsing to RFC creation
- Implement `blue_adr_audit` for testable ADRs
- Add "referenced_by" to ADR responses
- Extend `blue_search` for ADR queries
**AI Integration Notes:**

- The Blue MCP server needs LLM access (API key in `.blue/config.yaml`)
- Use streaming for responsiveness
- Fall back to keyword matching if the AI is unavailable
- Cache relevance results per context hash (5 min TTL)
**Caching Strategy:**

```sql
CREATE TABLE adr_relevance_cache (
  context_hash TEXT PRIMARY KEY,
  adr_versions_hash TEXT, -- Invalidate if ADRs change
  result_json TEXT,
  created_at TEXT,
  expires_at TEXT
);
```
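How the cache key and expiry could be computed can be sketched as below. Hashing with std's `DefaultHasher` is an illustrative stand-in (not necessarily the hash Blue uses, and not stable across Rust versions, so a real implementation would pick a fixed algorithm):

```rust
// Illustrative only: derive a cache key from the context string and
// check the 5-minute TTL from the table above.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::{Duration, SystemTime};

const TTL: Duration = Duration::from_secs(5 * 60); // 5 min TTL

fn context_hash(context: &str) -> String {
    let mut h = DefaultHasher::new();
    context.hash(&mut h);
    format!("{:016x}", h.finish())
}

fn is_expired(created_at: SystemTime, now: SystemTime) -> bool {
    now.duration_since(created_at)
        .map(|age| age > TTL)
        .unwrap_or(false) // clock skew: treat "created in the future" as fresh
}

fn main() {
    // Same context → same key; different context → different key.
    assert_eq!(context_hash("testing strategy"), context_hash("testing strategy"));
    assert_ne!(context_hash("testing strategy"), context_hash("deleting old code"));

    let created = SystemTime::now();
    assert!(!is_expired(created, created + Duration::from_secs(60)));
    assert!(is_expired(created, created + Duration::from_secs(600)));
    println!("ok");
}
```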
**Testing AI Relevance:**

- Golden test cases with expected ADRs (fuzzy match)
- Confidence thresholds: 0004 should score > 0.8 for "testing"
- Mock AI responses in unit tests
- Integration tests hit the real API (rate limited)
## Test Plan

- List all ADRs returns correct count and summaries
- Get specific ADR returns full content
- AI relevance: "testing" context suggests 0004 (Evidence)
- AI relevance: "deleting old code" suggests 0009 (Courage), 0010 (No Dead Code)
- AI relevance: confidence scores are reasonable (0.7-1.0 range)
- AI relevance: explanations are coherent
- Fallback: keyword matching works when AI is unavailable
- RFC with `| **ADRs** | 0004 |` creates a document link
- Search `adr:0004` finds citing documents
- Audit detects unused exports (ADR 0010)
- Audit reports test coverage (ADR 0004)
- Non-testable ADRs are not included in audit findings
- Caching: repeating the same context uses the cached result
- Cache invalidation: ADR content change clears relevant cache entries
- `--no-ai` flag forces keyword matching
- Response includes method (ai/keyword), cached, latency
- Graceful degradation when API is unavailable
## FAQ

**Q: Will this block my PRs?**
A: No. All ADR features are informational. Nothing blocks.

**Q: Do I have to cite ADRs in every RFC?**
A: No. Citations are optional but encouraged for significant decisions.

**Q: What if I disagree with an ADR?**
A: ADRs can be superseded. Create a new ADR documenting why.

**Q: How do I add a new ADR?**
A: `blue_adr_create` (future work) or manually add to `docs/adrs/`.

**Q: Why use AI for relevance instead of keywords?**
A: Keywords fail for philosophical ADRs. "Courage" won't match "deleting legacy code" but ADR 0009 is highly relevant. AI understands semantic meaning.

**Q: What if I don't have an API key configured?**
A: It falls back to keyword matching. Less accurate but still functional.

**Q: How much does the AI relevance check cost?**
A: It uses Haiku (~$0.00025 per check). Results are cached for 5 minutes per unique context.
> "The beliefs that guide us, made visible."
> — Blue