blue/.blue/docs/rfcs/0004-adr-adherence.md
Eric Garcia 0150a5d1ed chore: move adrs, rfcs, spikes to .blue/docs
Per RFC 0003, Blue-tracked documents live in per-repo .blue/ directories.
ADRs needed for semantic adherence checking.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:16:34 -05:00

363 lines
8.6 KiB
Markdown

# RFC 0004: ADR Adherence
| | |
|---|---|
| **Status** | Draft |
| **Date** | 2026-01-24 |
| **Source Spike** | adr-adherence |
| **ADRs** | 0004 (Evidence), 0007 (Integrity), 0008 (Honor) |
---
## Summary
No mechanism to surface relevant ADRs during work, track ADR citations, or verify adherence to testable architectural decisions.
## Philosophy
**Guide, don't block.** ADRs are beliefs, not bureaucracy. Blue should:
- Help you find relevant ADRs
- Make citing ADRs easy
- Verify testable ADRs optionally
- Never require ADR approval to proceed
## Proposal
### Layer 1: Awareness (Passive)
#### `blue_adr_list`
List all ADRs with summaries:
```
blue_adr_list
```
Returns:
```json
{
"adrs": [
{ "number": 0, "title": "Never Give Up", "summary": "The only rule we need" },
{ "number": 4, "title": "Evidence", "summary": "Show, don't tell" },
...
]
}
```
#### `blue_adr_get`
Get full ADR content:
```
blue_adr_get number=4
```
Returns ADR markdown and metadata.
### Layer 2: Contextual Relevance (Active)
#### `blue_adr_relevant`
Given context, use AI to suggest relevant ADRs:
```
blue_adr_relevant context="testing strategy"
```
Returns:
```json
{
"relevant": [
{
"number": 4,
"title": "Evidence",
"confidence": 0.95,
"why": "Testing is the primary form of evidence that code works. This ADR's core principle 'show, don't tell' directly applies to test strategy decisions."
},
{
"number": 7,
"title": "Integrity",
"confidence": 0.82,
"why": "Tests verify structural wholeness - that the system holds together under various conditions."
}
]
}
```
**AI-Powered Relevance:**
Keyword matching fails for philosophical ADRs. "Courage" won't match "deleting legacy code" even though ADR 0009 is highly relevant.
The AI evaluator:
1. Receives the full context (RFC title, problem, code diff, etc.)
2. Reads all ADR content (cached in prompt)
3. Determines semantic relevance with reasoning
4. Returns confidence scores and explanations
**Prompt Structure:**
```
You are evaluating which ADRs are relevant to this work.
Context: {user_context}
ADRs:
{all_adr_summaries}
For each ADR, determine:
1. Is it relevant? (yes/no)
2. Confidence (0.0-1.0)
3. Why is it relevant? (1-2 sentences)
Only return ADRs with confidence > 0.7.
```
**Model Selection:**
- Use fast/cheap model (Haiku) for relevance checks
- Results are suggestions, not authoritative
- User can override or ignore
**Graceful Degradation:**
| Condition | Behavior |
|-----------|----------|
| API key configured, API up | AI relevance (default) |
| API key configured, API down | Fallback to keywords + warning |
| No API key | Keywords only (no warning) |
| `--no-ai` flag | Keywords only (explicit) |
**Response Metadata:**
```json
{
"method": "ai", // or "keyword"
"cached": false,
"latency_ms": 287,
"relevant": [...]
}
```
**Privacy:**
- Only context string sent to API (not code, not files)
- No PII should be in context string
- User controls what context to send
#### RFC ADR Suggestions
When creating an RFC, Blue suggests relevant ADRs based on title/problem:
```
blue_rfc_create title="testing-framework" ...
→ "Consider these ADRs: 0004 (Evidence), 0010 (No Dead Code)"
```
#### ADR Citations in Documents
RFCs can cite ADRs in frontmatter:
```markdown
| **ADRs** | 0004, 0007, 0010 |
```
Or inline:
```markdown
Per ADR 0004 (Evidence), we require test coverage > 80%.
```
### Layer 3: Lightweight Verification (Optional)
#### `blue_adr_audit`
Scan for potential ADR violations. Only for testable ADRs:
```
blue_adr_audit
```
Returns:
```json
{
"findings": [
{
"adr": 10,
"title": "No Dead Code",
"type": "warning",
"message": "3 unused exports in src/utils.rs",
"locations": ["src/utils.rs:45", "src/utils.rs:67", "src/utils.rs:89"]
},
{
"adr": 4,
"title": "Evidence",
"type": "info",
"message": "Test coverage at 72% (threshold: 80%)"
}
],
"passed": [
{ "adr": 5, "title": "Single Source", "message": "No duplicate definitions found" }
]
}
```
**Testable ADRs:**
| ADR | Check |
|-----|-------|
| 0004 Evidence | Test coverage, assertion ratios |
| 0005 Single Source | Duplicate definitions, copy-paste detection |
| 0010 No Dead Code | Unused exports, unreachable branches |
**Non-testable ADRs** (human judgment):
| ADR | Guidance |
|-----|----------|
| 0001 Purpose | Does this serve meaning? |
| 0002 Presence | Are we actually here? |
| 0009 Courage | Are we acting rightly? |
| 0013 Overflow | Building from fullness? |
### Layer 4: Documentation Trail
#### ADR-Document Links
Store citations in `document_links` table:
```sql
INSERT INTO document_links (source_id, target_id, link_type)
VALUES (rfc_id, adr_doc_id, 'cites_adr');
```
#### Search by ADR
```
blue_search query="adr:0004"
```
Returns all documents citing ADR 0004.
#### ADR "Referenced By"
```
blue_adr_get number=4
```
Includes:
```json
{
"referenced_by": [
{ "type": "rfc", "title": "testing-framework", "date": "2026-01-20" },
{ "type": "decision", "title": "require-integration-tests", "date": "2026-01-15" }
]
}
```
## ADR Metadata Enhancement
Add to each ADR:
```markdown
## Applies When
- Writing or modifying tests
- Reviewing pull requests
- Evaluating technical claims
## Anti-Patterns
- Claiming code works without tests
- Trusting documentation over running code
- Accepting "it works on my machine"
```
This gives the AI richer context for relevance matching. Anti-patterns are particularly useful - they help identify when work might be *violating* an ADR.
## Implementation
1. Add ADR document type and loader
2. Implement `blue_adr_list` and `blue_adr_get`
3. **Implement AI relevance evaluator:**
- Load all ADRs into prompt context
- Send context + ADRs to LLM (Haiku for speed/cost)
- Parse structured response with confidence scores
- Cache ADR summaries to minimize token usage
4. Implement `blue_adr_relevant` using AI evaluator
5. Add ADR citation parsing to RFC creation
6. Implement `blue_adr_audit` for testable ADRs
7. Add "referenced_by" to ADR responses
8. Extend `blue_search` for ADR queries
**AI Integration Notes:**
- Blue MCP server needs LLM access (API key in `.blue/config.yaml`)
- Use streaming for responsiveness
- Fallback to keyword matching if AI unavailable
- Cache relevance results per context hash (5 min TTL)
**Caching Strategy:**
```sql
CREATE TABLE adr_relevance_cache (
context_hash TEXT PRIMARY KEY,
adr_versions_hash TEXT, -- Invalidate if ADRs change
result_json TEXT,
created_at TEXT,
expires_at TEXT
);
```
**Testing AI Relevance:**
- Golden test cases with expected ADRs (fuzzy match)
- Confidence thresholds: 0004 should be > 0.8 for "testing"
- Mock AI responses in unit tests
- Integration tests hit real API (rate limited)
## Test Plan
- [ ] List all ADRs returns correct count and summaries
- [ ] Get specific ADR returns full content
- [ ] AI relevance: "testing" context suggests 0004 (Evidence)
- [ ] AI relevance: "deleting old code" suggests 0009 (Courage), 0010 (No Dead Code)
- [ ] AI relevance: confidence scores are reasonable (0.7-1.0 range)
- [ ] AI relevance: explanations are coherent
- [ ] Fallback: keyword matching works when AI unavailable
- [ ] RFC with `| **ADRs** | 0004 |` creates document link
- [ ] Search `adr:0004` finds citing documents
- [ ] Audit detects unused exports (ADR 0010)
- [ ] Audit reports test coverage (ADR 0004)
- [ ] Non-testable ADRs not included in audit findings
- [ ] Caching: repeated same context uses cached result
- [ ] Cache invalidation: ADR content change clears relevant cache
- [ ] `--no-ai` flag forces keyword matching
- [ ] Response includes method (ai/keyword), cached, latency
- [ ] Graceful degradation when API unavailable
## FAQ
**Q: Will this block my PRs?**
A: No. All ADR features are informational. Nothing blocks.
**Q: Do I have to cite ADRs in every RFC?**
A: No. Citations are optional but encouraged for significant decisions.
**Q: What if I disagree with an ADR?**
A: ADRs can be superseded. Create a new ADR documenting why.
**Q: How do I add a new ADR?**
A: `blue_adr_create` (future work) or manually add to `docs/adrs/`.
**Q: Why use AI for relevance instead of keywords?**
A: Keywords fail for philosophical ADRs. "Courage" won't match "deleting legacy code" but ADR 0009 is highly relevant. AI understands semantic meaning.
**Q: What if I don't have an API key configured?**
A: Falls back to keyword matching. Less accurate but still functional.
**Q: How much does the AI relevance check cost?**
A: Uses Haiku (~$0.00025 per check). Cached for 5 minutes per unique context.
---
*"The beliefs that guide us, made visible."*
— Blue