Every document filename now mirrors its lifecycle state with a status suffix (e.g., .draft.md, .wip.md, .accepted.md). No more bare .md for tracked document types. Also renamed all from_str methods to parse to avoid FromStr trait confusion, introduced StagingDeploymentParams struct, and fixed all 19 clippy warnings across the codebase. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
363 lines
8.6 KiB
Markdown
363 lines
8.6 KiB
Markdown
# RFC 0004: ADR Adherence
|
|
|
|
| | |
|
|
|---|---|
|
|
| **Status** | Implemented |
|
|
| **Date** | 2026-01-24 |
|
|
| **Source Spike** | adr-adherence |
|
|
| **ADRs** | 0004 (Evidence), 0007 (Integrity), 0008 (Honor) |
|
|
|
|
---
|
|
|
|
## Summary
|
|
|
|
No mechanism to surface relevant ADRs during work, track ADR citations, or verify adherence to testable architectural decisions.
|
|
|
|
## Philosophy
|
|
|
|
**Guide, don't block.** ADRs are beliefs, not bureaucracy. Blue should:
|
|
- Help you find relevant ADRs
|
|
- Make citing ADRs easy
|
|
- Verify testable ADRs optionally
|
|
- Never require ADR approval to proceed
|
|
|
|
## Proposal
|
|
|
|
### Layer 1: Awareness (Passive)
|
|
|
|
#### `blue_adr_list`
|
|
|
|
List all ADRs with summaries:
|
|
|
|
```
|
|
blue_adr_list
|
|
```
|
|
|
|
Returns:
|
|
```json
|
|
{
|
|
"adrs": [
|
|
{ "number": 0, "title": "Never Give Up", "summary": "The only rule we need" },
|
|
{ "number": 4, "title": "Evidence", "summary": "Show, don't tell" },
|
|
...
|
|
]
|
|
}
|
|
```
|
|
|
|
#### `blue_adr_get`
|
|
|
|
Get full ADR content:
|
|
|
|
```
|
|
blue_adr_get number=4
|
|
```
|
|
|
|
Returns ADR markdown and metadata.
|
|
|
|
### Layer 2: Contextual Relevance (Active)
|
|
|
|
#### `blue_adr_relevant`
|
|
|
|
Given context, use AI to suggest relevant ADRs:
|
|
|
|
```
|
|
blue_adr_relevant context="testing strategy"
|
|
```
|
|
|
|
Returns:
|
|
```json
|
|
{
|
|
"relevant": [
|
|
{
|
|
"number": 4,
|
|
"title": "Evidence",
|
|
"confidence": 0.95,
|
|
"why": "Testing is the primary form of evidence that code works. This ADR's core principle 'show, don't tell' directly applies to test strategy decisions."
|
|
},
|
|
{
|
|
"number": 7,
|
|
"title": "Integrity",
|
|
"confidence": 0.82,
|
|
"why": "Tests verify structural wholeness - that the system holds together under various conditions."
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
**AI-Powered Relevance:**
|
|
|
|
Keyword matching fails for philosophical ADRs. "Courage" won't match "deleting legacy code" even though ADR 0009 is highly relevant.
|
|
|
|
The AI evaluator:
|
|
1. Receives the full context (RFC title, problem, code diff, etc.)
|
|
2. Reads all ADR content (cached in prompt)
|
|
3. Determines semantic relevance with reasoning
|
|
4. Returns confidence scores and explanations
|
|
|
|
**Prompt Structure:**
|
|
|
|
```
|
|
You are evaluating which ADRs are relevant to this work.
|
|
|
|
Context: {user_context}
|
|
|
|
ADRs:
|
|
{all_adr_summaries}
|
|
|
|
For each ADR, determine:
|
|
1. Is it relevant? (yes/no)
|
|
2. Confidence (0.0-1.0)
|
|
3. Why is it relevant? (1-2 sentences)
|
|
|
|
Only return ADRs with confidence > 0.7.
|
|
```
|
|
|
|
**Model Selection:**
|
|
- Use fast/cheap model (Haiku) for relevance checks
|
|
- Results are suggestions, not authoritative
|
|
- User can override or ignore
|
|
|
|
**Graceful Degradation:**
|
|
|
|
| Condition | Behavior |
|
|
|-----------|----------|
|
|
| API key configured, API up | AI relevance (default) |
|
|
| API key configured, API down | Fallback to keywords + warning |
|
|
| No API key | Keywords only (no warning) |
|
|
| `--no-ai` flag | Keywords only (explicit) |
|
|
|
|
**Response Metadata:**
|
|
|
|
```json
|
|
{
|
|
"method": "ai", // or "keyword"
|
|
"cached": false,
|
|
"latency_ms": 287,
|
|
"relevant": [...]
|
|
}
|
|
```
|
|
|
|
**Privacy:**
|
|
- Only context string sent to API (not code, not files)
|
|
- No PII should be in context string
|
|
- User controls what context to send
|
|
|
|
#### RFC ADR Suggestions
|
|
|
|
When creating an RFC, Blue suggests relevant ADRs based on title/problem:
|
|
|
|
```
|
|
blue_rfc_create title="testing-framework" ...
|
|
|
|
→ "Consider these ADRs: 0004 (Evidence), 0010 (No Dead Code)"
|
|
```
|
|
|
|
#### ADR Citations in Documents
|
|
|
|
RFCs can cite ADRs in frontmatter:
|
|
|
|
```markdown
|
|
| **ADRs** | 0004, 0007, 0010 |
|
|
```
|
|
|
|
Or inline:
|
|
|
|
```markdown
|
|
Per ADR 0004 (Evidence), we require test coverage > 80%.
|
|
```
|
|
|
|
### Layer 3: Lightweight Verification (Optional)
|
|
|
|
#### `blue_adr_audit`
|
|
|
|
Scan for potential ADR violations. Only for testable ADRs:
|
|
|
|
```
|
|
blue_adr_audit
|
|
```
|
|
|
|
Returns:
|
|
```json
|
|
{
|
|
"findings": [
|
|
{
|
|
"adr": 10,
|
|
"title": "No Dead Code",
|
|
"type": "warning",
|
|
"message": "3 unused exports in src/utils.rs",
|
|
"locations": ["src/utils.rs:45", "src/utils.rs:67", "src/utils.rs:89"]
|
|
},
|
|
{
|
|
"adr": 4,
|
|
"title": "Evidence",
|
|
"type": "info",
|
|
"message": "Test coverage at 72% (threshold: 80%)"
|
|
}
|
|
],
|
|
"passed": [
|
|
{ "adr": 5, "title": "Single Source", "message": "No duplicate definitions found" }
|
|
]
|
|
}
|
|
```
|
|
|
|
**Testable ADRs:**
|
|
|
|
| ADR | Check |
|
|
|-----|-------|
|
|
| 0004 Evidence | Test coverage, assertion ratios |
|
|
| 0005 Single Source | Duplicate definitions, copy-paste detection |
|
|
| 0010 No Dead Code | Unused exports, unreachable branches |
|
|
|
|
**Non-testable ADRs** (human judgment):
|
|
|
|
| ADR | Guidance |
|
|
|-----|----------|
|
|
| 0001 Purpose | Does this serve meaning? |
|
|
| 0002 Presence | Are we actually here? |
|
|
| 0009 Courage | Are we acting rightly? |
|
|
| 0013 Overflow | Building from fullness? |
|
|
|
|
### Layer 4: Documentation Trail
|
|
|
|
#### ADR-Document Links
|
|
|
|
Store citations in `document_links` table:
|
|
|
|
```sql
|
|
INSERT INTO document_links (source_id, target_id, link_type)
|
|
VALUES (rfc_id, adr_doc_id, 'cites_adr');
|
|
```
|
|
|
|
#### Search by ADR
|
|
|
|
```
|
|
blue_search query="adr:0004"
|
|
```
|
|
|
|
Returns all documents citing ADR 0004.
|
|
|
|
#### ADR "Referenced By"
|
|
|
|
```
|
|
blue_adr_get number=4
|
|
```
|
|
|
|
Includes:
|
|
```json
|
|
{
|
|
"referenced_by": [
|
|
{ "type": "rfc", "title": "testing-framework", "date": "2026-01-20" },
|
|
{ "type": "decision", "title": "require-integration-tests", "date": "2026-01-15" }
|
|
]
|
|
}
|
|
```
|
|
|
|
## ADR Metadata Enhancement
|
|
|
|
Add to each ADR:
|
|
|
|
```markdown
|
|
## Applies When
|
|
|
|
- Writing or modifying tests
|
|
- Reviewing pull requests
|
|
- Evaluating technical claims
|
|
|
|
## Anti-Patterns
|
|
|
|
- Claiming code works without tests
|
|
- Trusting documentation over running code
|
|
- Accepting "it works on my machine"
|
|
```
|
|
|
|
This gives the AI richer context for relevance matching. Anti-patterns are particularly useful - they help identify when work might be *violating* an ADR.
|
|
|
|
## Implementation
|
|
|
|
1. Add ADR document type and loader
|
|
2. Implement `blue_adr_list` and `blue_adr_get`
|
|
3. **Implement AI relevance evaluator:**
|
|
- Load all ADRs into prompt context
|
|
- Send context + ADRs to LLM (Haiku for speed/cost)
|
|
- Parse structured response with confidence scores
|
|
- Cache ADR summaries to minimize token usage
|
|
4. Implement `blue_adr_relevant` using AI evaluator
|
|
5. Add ADR citation parsing to RFC creation
|
|
6. Implement `blue_adr_audit` for testable ADRs
|
|
7. Add "referenced_by" to ADR responses
|
|
8. Extend `blue_search` for ADR queries
|
|
|
|
**AI Integration Notes:**
|
|
|
|
- Blue MCP server needs LLM access (API key in `.blue/config.yaml`)
|
|
- Use streaming for responsiveness
|
|
- Fallback to keyword matching if AI unavailable
|
|
- Cache relevance results per context hash (5 min TTL)
|
|
|
|
**Caching Strategy:**
|
|
|
|
```sql
|
|
CREATE TABLE adr_relevance_cache (
|
|
context_hash TEXT PRIMARY KEY,
|
|
adr_versions_hash TEXT, -- Invalidate if ADRs change
|
|
result_json TEXT,
|
|
created_at TEXT,
|
|
expires_at TEXT
|
|
);
|
|
```
|
|
|
|
**Testing AI Relevance:**
|
|
|
|
- Golden test cases with expected ADRs (fuzzy match)
|
|
- Confidence thresholds: 0004 should be > 0.8 for "testing"
|
|
- Mock AI responses in unit tests
|
|
- Integration tests hit real API (rate limited)
|
|
|
|
## Test Plan
|
|
|
|
- [ ] List all ADRs returns correct count and summaries
|
|
- [ ] Get specific ADR returns full content
|
|
- [ ] AI relevance: "testing" context suggests 0004 (Evidence)
|
|
- [ ] AI relevance: "deleting old code" suggests 0009 (Courage), 0010 (No Dead Code)
|
|
- [ ] AI relevance: confidence scores are reasonable (0.7-1.0 range)
|
|
- [ ] AI relevance: explanations are coherent
|
|
- [ ] Fallback: keyword matching works when AI unavailable
|
|
- [ ] RFC with `| **ADRs** | 0004 |` creates document link
|
|
- [ ] Search `adr:0004` finds citing documents
|
|
- [ ] Audit detects unused exports (ADR 0010)
|
|
- [ ] Audit reports test coverage (ADR 0004)
|
|
- [ ] Non-testable ADRs not included in audit findings
|
|
- [ ] Caching: repeated same context uses cached result
|
|
- [ ] Cache invalidation: ADR content change clears relevant cache
|
|
- [ ] `--no-ai` flag forces keyword matching
|
|
- [ ] Response includes method (ai/keyword), cached, latency
|
|
- [ ] Graceful degradation when API unavailable
|
|
|
|
## FAQ
|
|
|
|
**Q: Will this block my PRs?**
|
|
A: No. All ADR features are informational. Nothing blocks.
|
|
|
|
**Q: Do I have to cite ADRs in every RFC?**
|
|
A: No. Citations are optional but encouraged for significant decisions.
|
|
|
|
**Q: What if I disagree with an ADR?**
|
|
A: ADRs can be superseded. Create a new ADR documenting why.
|
|
|
|
**Q: How do I add a new ADR?**
|
|
A: `blue_adr_create` (future work) or manually add to `docs/adrs/`.
|
|
|
|
**Q: Why use AI for relevance instead of keywords?**
|
|
A: Keywords fail for philosophical ADRs. "Courage" won't match "deleting legacy code" but ADR 0009 is highly relevant. AI understands semantic meaning.
|
|
|
|
**Q: What if I don't have an API key configured?**
|
|
A: Falls back to keyword matching. Less accurate but still functional.
|
|
|
|
**Q: How much does the AI relevance check cost?**
|
|
A: Uses Haiku (~$0.00025 per check). Cached for 5 minutes per unique context.
|
|
|
|
---
|
|
|
|
*"The beliefs that guide us, made visible."*
|
|
|
|
— Blue
|