
RFC 0004: ADR Adherence

| **Status** | Implemented |
| **Date** | 2026-01-24 |
| **Source** | Spike adr-adherence |
| **ADRs** | 0004 (Evidence), 0007 (Integrity), 0008 (Honor) |

Summary

Blue currently has no mechanism to surface relevant ADRs during work, to track ADR citations, or to verify adherence to testable architectural decisions. This RFC adds those capabilities in layers, none of which block work.

Philosophy

Guide, don't block. ADRs are beliefs, not bureaucracy. Blue should:

  • Help you find relevant ADRs
  • Make citing ADRs easy
  • Verify testable ADRs optionally
  • Never require ADR approval to proceed

Proposal

Layer 1: Awareness (Passive)

blue_adr_list

List all ADRs with summaries:

blue_adr_list

Returns:

{
  "adrs": [
    { "number": 0, "title": "Never Give Up", "summary": "The only rule we need" },
    { "number": 4, "title": "Evidence", "summary": "Show, don't tell" },
    ...
  ]
}

blue_adr_get

Get full ADR content:

blue_adr_get number=4

Returns ADR markdown and metadata.

Layer 2: Contextual Relevance (Active)

blue_adr_relevant

Given context, use AI to suggest relevant ADRs:

blue_adr_relevant context="testing strategy"

Returns:

{
  "relevant": [
    {
      "number": 4,
      "title": "Evidence",
      "confidence": 0.95,
      "why": "Testing is the primary form of evidence that code works. This ADR's core principle 'show, don't tell' directly applies to test strategy decisions."
    },
    {
      "number": 7,
      "title": "Integrity",
      "confidence": 0.82,
      "why": "Tests verify structural wholeness - that the system holds together under various conditions."
    }
  ]
}

AI-Powered Relevance:

Keyword matching fails for philosophical ADRs. "Courage" won't match "deleting legacy code" even though ADR 0009 is highly relevant.

The AI evaluator:

  1. Receives the full context (RFC title, problem, code diff, etc.)
  2. Reads all ADR content (cached in prompt)
  3. Determines semantic relevance with reasoning
  4. Returns confidence scores and explanations

Prompt Structure:

You are evaluating which ADRs are relevant to this work.

Context: {user_context}

ADRs:
{all_adr_summaries}

For each ADR, determine:
1. Is it relevant? (yes/no)
2. Confidence (0.0-1.0)
3. Why is it relevant? (1-2 sentences)

Only return ADRs with confidence > 0.7.
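The prompt above can be assembled mechanically from the cached summaries. A minimal Rust sketch of that step, assuming the evaluator holds ADR summaries in memory (`AdrSummary` and `build_relevance_prompt` are illustrative names, not Blue's actual internals):

```rust
/// Illustrative shape for a cached ADR summary.
struct AdrSummary {
    number: u32,
    title: String,
    summary: String,
}

/// Assemble the relevance prompt from the user context and cached summaries.
fn build_relevance_prompt(context: &str, adrs: &[AdrSummary]) -> String {
    let mut prompt =
        String::from("You are evaluating which ADRs are relevant to this work.\n\n");
    prompt.push_str(&format!("Context: {context}\n\nADRs:\n"));
    for adr in adrs {
        // One summary line per ADR keeps token usage low.
        prompt.push_str(&format!("{:04} {}: {}\n", adr.number, adr.title, adr.summary));
    }
    prompt.push_str(
        "\nFor each ADR, determine:\n\
         1. Is it relevant? (yes/no)\n\
         2. Confidence (0.0-1.0)\n\
         3. Why is it relevant? (1-2 sentences)\n\n\
         Only return ADRs with confidence > 0.7.\n",
    );
    prompt
}
```

Because the ADR set changes rarely, the summary section of the prompt can be built once and reused across calls.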

Model Selection:

  • Use fast/cheap model (Haiku) for relevance checks
  • Results are suggestions, not authoritative
  • User can override or ignore

Graceful Degradation:

| Condition | Behavior |
|---|---|
| API key configured, API up | AI relevance (default) |
| API key configured, API down | Fallback to keywords + warning |
| No API key | Keywords only (no warning) |
| `--no-ai` flag | Keywords only (explicit) |
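The degradation table above reduces to a small decision function. A sketch in Rust (`RelevanceMethod` and `choose_method` are illustrative names):

```rust
/// Which relevance method to use, per the degradation table.
#[derive(Debug, PartialEq)]
enum RelevanceMethod {
    Ai,
    Keyword { warn: bool },
}

fn choose_method(has_api_key: bool, api_up: bool, no_ai_flag: bool) -> RelevanceMethod {
    if no_ai_flag || !has_api_key {
        // Explicit opt-out or no key: keywords, silently.
        RelevanceMethod::Keyword { warn: false }
    } else if api_up {
        RelevanceMethod::Ai
    } else {
        // Key configured but API down: fall back and surface a warning.
        RelevanceMethod::Keyword { warn: true }
    }
}
```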

Response Metadata:

{
  "method": "ai",        // or "keyword"
  "cached": false,
  "latency_ms": 287,
  "relevant": [...]
}

Privacy:

  • Only context string sent to API (not code, not files)
  • No PII should be in context string
  • User controls what context to send

RFC ADR Suggestions

When creating an RFC, Blue suggests relevant ADRs based on title/problem:

blue_rfc_create title="testing-framework" ...

→ "Consider these ADRs: 0004 (Evidence), 0010 (No Dead Code)"

ADR Citations in Documents

RFCs can cite ADRs in frontmatter:

| **ADRs** | 0004, 0007, 0010 |

Or inline:

Per ADR 0004 (Evidence), we require test coverage > 80%.
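Inline citations like the one above can be extracted with a simple scan for `ADR ` followed by digits. A sketch of that parser, assuming citations always use the `ADR NNNN` form (`cited_adrs` is an illustrative helper, not Blue's actual implementation):

```rust
/// Extract ADR numbers cited inline as "ADR 0004", deduplicated,
/// in order of first appearance.
fn cited_adrs(text: &str) -> Vec<u32> {
    let mut found = Vec::new();
    let mut rest = text;
    while let Some(pos) = rest.find("ADR ") {
        rest = &rest[pos + 4..];
        let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
        if let Ok(n) = digits.parse::<u32>() {
            if !found.contains(&n) {
                found.push(n);
            }
        }
    }
    found
}
```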

Layer 3: Lightweight Verification (Optional)

blue_adr_audit

Scan for potential ADR violations. Only for testable ADRs:

blue_adr_audit

Returns:

{
  "findings": [
    {
      "adr": 10,
      "title": "No Dead Code",
      "type": "warning",
      "message": "3 unused exports in src/utils.rs",
      "locations": ["src/utils.rs:45", "src/utils.rs:67", "src/utils.rs:89"]
    },
    {
      "adr": 4,
      "title": "Evidence",
      "type": "info",
      "message": "Test coverage at 72% (threshold: 80%)"
    }
  ],
  "passed": [
    { "adr": 5, "title": "Single Source", "message": "No duplicate definitions found" }
  ]
}
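The ADR 0004 coverage finding above is a pure threshold check. A sketch of how an individual audit check could produce it (`coverage_finding` is an illustrative helper; the message format mirrors the JSON above):

```rust
/// Emit an info-level finding when coverage is below the threshold,
/// or nothing when the check passes.
fn coverage_finding(coverage_pct: u32, threshold_pct: u32) -> Option<String> {
    if coverage_pct >= threshold_pct {
        None // passes: no finding emitted
    } else {
        Some(format!(
            "Test coverage at {coverage_pct}% (threshold: {threshold_pct}%)"
        ))
    }
}
```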

Testable ADRs:

| ADR | Check |
|---|---|
| 0004 Evidence | Test coverage, assertion ratios |
| 0005 Single Source | Duplicate definitions, copy-paste detection |
| 0010 No Dead Code | Unused exports, unreachable branches |

Non-testable ADRs (human judgment):

| ADR | Guidance |
|---|---|
| 0001 Purpose | Does this serve meaning? |
| 0002 Presence | Are we actually here? |
| 0009 Courage | Are we acting rightly? |
| 0013 Overflow | Building from fullness? |

Layer 4: Documentation Trail

Store citations in document_links table:

INSERT INTO document_links (source_id, target_id, link_type)
VALUES (rfc_id, adr_doc_id, 'cites_adr');

Search by ADR

blue_search query="adr:0004"

Returns all documents citing ADR 0004.

ADR "Referenced By"

blue_adr_get number=4

Includes:

{
  "referenced_by": [
    { "type": "rfc", "title": "testing-framework", "date": "2026-01-20" },
    { "type": "decision", "title": "require-integration-tests", "date": "2026-01-15" }
  ]
}

ADR Metadata Enhancement

Add to each ADR:

## Applies When

- Writing or modifying tests
- Reviewing pull requests
- Evaluating technical claims

## Anti-Patterns

- Claiming code works without tests
- Trusting documentation over running code
- Accepting "it works on my machine"

This gives the AI richer context for relevance matching. Anti-patterns are particularly useful: they help identify when work might be violating an ADR.

Implementation

  1. Add ADR document type and loader
  2. Implement blue_adr_list and blue_adr_get
  3. Implement AI relevance evaluator:
    • Load all ADRs into prompt context
    • Send context + ADRs to LLM (Haiku for speed/cost)
    • Parse structured response with confidence scores
    • Cache ADR summaries to minimize token usage
  4. Implement blue_adr_relevant using AI evaluator
  5. Add ADR citation parsing to RFC creation
  6. Implement blue_adr_audit for testable ADRs
  7. Add "referenced_by" to ADR responses
  8. Extend blue_search for ADR queries

AI Integration Notes:

  • Blue MCP server needs LLM access (API key in .blue/config.yaml)
  • Use streaming for responsiveness
  • Fallback to keyword matching if AI unavailable
  • Cache relevance results per context hash (5 min TTL)

Caching Strategy:

CREATE TABLE adr_relevance_cache (
    context_hash TEXT PRIMARY KEY,
    adr_versions_hash TEXT,  -- Invalidate if ADRs change
    result_json TEXT,
    created_at TEXT,
    expires_at TEXT
);
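A sketch of deriving the `context_hash` cache key, using std's `DefaultHasher` for illustration; a real implementation would likely use a stable content hash (e.g. SHA-256) so keys survive process restarts (`context_hash` is an illustrative name):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash the context string into a hex cache key for adr_relevance_cache.
/// The ADR versions hash is stored alongside it for invalidation.
fn context_hash(context: &str) -> String {
    let mut h = DefaultHasher::new();
    context.hash(&mut h);
    format!("{:016x}", h.finish())
}
```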

Testing AI Relevance:

  • Golden test cases with expected ADRs (fuzzy match)
  • Confidence thresholds: 0004 should be > 0.8 for "testing"
  • Mock AI responses in unit tests
  • Integration tests hit real API (rate limited)

Test Plan

  • List all ADRs returns correct count and summaries
  • Get specific ADR returns full content
  • AI relevance: "testing" context suggests 0004 (Evidence)
  • AI relevance: "deleting old code" suggests 0009 (Courage), 0010 (No Dead Code)
  • AI relevance: confidence scores are reasonable (0.7-1.0 range)
  • AI relevance: explanations are coherent
  • Fallback: keyword matching works when AI unavailable
  • RFC with | **ADRs** | 0004 | creates document link
  • Search adr:0004 finds citing documents
  • Audit detects unused exports (ADR 0010)
  • Audit reports test coverage (ADR 0004)
  • Non-testable ADRs not included in audit findings
  • Caching: repeated same context uses cached result
  • Cache invalidation: ADR content change clears relevant cache
  • --no-ai flag forces keyword matching
  • Response includes method (ai/keyword), cached, latency
  • Graceful degradation when API unavailable

FAQ

Q: Will this block my PRs? A: No. All ADR features are informational. Nothing blocks.

Q: Do I have to cite ADRs in every RFC? A: No. Citations are optional but encouraged for significant decisions.

Q: What if I disagree with an ADR? A: ADRs can be superseded. Create a new ADR documenting why.

Q: How do I add a new ADR? A: blue_adr_create (future work) or manually add to docs/adrs/.

Q: Why use AI for relevance instead of keywords? A: Keywords fail for philosophical ADRs. "Courage" won't match "deleting legacy code" but ADR 0009 is highly relevant. AI understands semantic meaning.

Q: What if I don't have an API key configured? A: Falls back to keyword matching. Less accurate but still functional.

Q: How much does the AI relevance check cost? A: Uses Haiku (~$0.00025 per check). Cached for 5 minutes per unique context.


"The beliefs that guide us, made visible."

— Blue