blue/.blue/docs/rfcs/0018-document-import-sync.md
Eric Garcia 16d45d9a11 feat: alignment dialogue subagents, MCP instructions, and document batch
Alignment dialogues now use custom `alignment-expert` subagents with
max_turns: 10, tool restrictions (Read/Grep/Glob), and hard 400-word
output limits. Judge protocol injects as prose via RFC 0023. Moved
Blue voice patterns from CLAUDE.md to MCP server instructions field
for cross-repo portability.

Includes RFCs 0017-0026, spikes, and alignment dialogues from
2026-01-25/26 sessions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 07:09:39 -05:00


# RFC 0018: Document Import/Sync Mechanism
| | |
|---|---|
| **Status** | Superseded |
| **Superseded By** | [RFC 0022: Filesystem Authority](./0022-filesystem-authority.md) |
| **Date** | 2026-01-25 |
| **Dialogue** | [rfc-document-import-sync](../dialogues/rfc-document-import-sync.dialogue.md) |
---
## Summary
Blue maintains documents in both the filesystem (`.blue/docs/*.md`) and a database (`blue.db`). When the two diverge, Blue reports "not found" for files that visibly exist. This RFC establishes the filesystem as the single source of truth, with the database serving as a rebuildable index and cache.
## Problem
1. **Files invisible to Blue**: `find_document()` does not find manually created files, copied files, or any file left on disk after a database reset
2. **ADR 0005 violation**: Two sources of truth (filesystem and database) inevitably diverge
3. **Git collaboration broken**: Database doesn't survive `git clone`, so collaborators can't see each other's documents
4. **Branch isolation**: Database state persists across branch switches, so documents from one branch show up as phantoms on another
## Architecture
### Authority Model
```
CURRENT (problematic):
  blue_rfc_create → writes file AND database (can diverge)
  find_document() → queries database ONLY (misses files)

PROPOSED:
  Filesystem      = SOURCE OF TRUTH (survives git clone)
  Database        = DERIVED INDEX (rebuildable, disposable)
  find_document() = checks index, falls back to filesystem scan
```
### Metadata Location
| Location | Contents | Rationale |
|----------|----------|-----------|
| **Frontmatter** | title, number, status, date | Human-readable identity |
| **Content** | Relationships (as links) | Parseable from text |
| **Database Only** | id, file_path, content_hash, indexed_at, computed relationships | Derived/computed |

**Principle**: If the database is deleted, the files alone must be sufficient for a full rebuild.
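
As a rough illustration of that principle, the sketch below recovers identity metadata from a document's text alone. It assumes the header conventions this RFC itself uses (a `# RFC NNNN: Title` heading and `| **Key** | Value |` table rows); `DocMeta` and `parse_doc_meta` are hypothetical names, and the real frontmatter format may differ.

```rust
use std::collections::BTreeMap;

/// Identity metadata recovered from a document's text alone (hypothetical shape).
#[derive(Debug, Default, PartialEq)]
pub struct DocMeta {
    pub number: Option<u32>,
    pub title: Option<String>,
    /// Header-table fields such as "Status" or "Date", keyed by label.
    pub fields: BTreeMap<String, String>,
}

/// Parse a heading like `# RFC 0018: Title` and rows like `| **Status** | Draft |`.
/// Assumption: only the first level-1 heading carries the number and title.
pub fn parse_doc_meta(text: &str) -> DocMeta {
    let mut meta = DocMeta::default();
    for line in text.lines().map(str::trim) {
        if let Some(heading) = line.strip_prefix("# ") {
            if meta.title.is_none() {
                if let Some((prefix, title)) = heading.split_once(':') {
                    // "RFC 0018" -> 18; a non-numeric prefix leaves `number` unset.
                    meta.number = prefix
                        .split_whitespace()
                        .last()
                        .and_then(|n| n.parse().ok());
                    meta.title = Some(title.trim().to_string());
                } else {
                    meta.title = Some(heading.trim().to_string());
                }
            }
        } else if line.starts_with("| **") {
            // Row shape: | **Key** | Value |
            let cells: Vec<&str> = line.trim_matches('|').split('|').map(str::trim).collect();
            if let [key, value] = cells.as_slice() {
                meta.fields
                    .insert(key.trim_matches('*').to_string(), value.to_string());
            }
        }
    }
    meta
}
```

Everything in `DocMeta` comes from the file, so a record built this way can be dropped and rebuilt at any time; only derived values such as `content_hash` and `indexed_at` need to live in the database.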
### Staleness Detection
Hash-based lazy revalidation:
```rust
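use std::{fs, io, path::Path};

// Assumes `indexed_at` is held as a `SystemTime` and `hash_content()` returns a `String`.
fn is_document_stale(doc: &Document, file_path: &Path) -> io::Result<bool> {
    // Fast path: an mtime no newer than the index means the record is current
    let file_mtime = fs::metadata(file_path)?.modified()?;
    if file_mtime <= doc.indexed_at {
        return Ok(false);
    }
    // Slow path: mtime moved, so confirm a real content change by hash
    let content = fs::read_to_string(file_path)?;
    let current_hash = hash_content(&content);
    Ok(doc.content_hash.as_deref() != Some(current_hash.as_str()))
}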
```
No file watchers: they are fragile across platforms and introduce race conditions.
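
The RFC does not specify `hash_content()`. The sketch below is one possible stand-in using 64-bit FNV-1a, chosen only because it needs no external crates and gives the same output on every platform; a real implementation might prefer a cryptographic hash. `content_changed` is a hypothetical helper showing how a missing stored hash could be treated.

```rust
/// Stand-in for the RFC's `hash_content()`: 64-bit FNV-1a, rendered as hex.
/// Assumption: the real implementation may use a different hash function.
pub fn hash_content(content: &str) -> String {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for byte in content.as_bytes() {
        // FNV-1a: XOR the byte in first, then multiply by the prime.
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(PRIME);
    }
    format!("{hash:016x}")
}

/// A stored hash of `None` means "never indexed", which counts as changed.
pub fn content_changed(stored_hash: Option<&str>, content: &str) -> bool {
    stored_hash != Some(hash_content(content).as_str())
}
```

Because the hash depends only on file bytes, touching a file without editing it fails the mtime fast path but still resolves to "not stale" on the slow path.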
### Reconciliation
| Condition | Action |
|-----------|--------|
| File exists, no DB record | Create DB record from file |
| DB record exists, no file | Soft-delete DB record (`deleted_at = now()`) |
| Both exist, hash mismatch | Update DB from file (filesystem wins) |
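
The table above can be read as a pure decision function. The following is a minimal sketch under that reading; `Action` and `plan_action` are hypothetical names, and each side is modeled as an optional content hash (`None` meaning "file missing" or "no DB record").

```rust
/// Reconciliation outcomes, mirroring the rows of the table (hypothetical enum).
#[derive(Debug, PartialEq)]
pub enum Action {
    /// File exists, no DB record.
    CreateRecord,
    /// DB record exists, no file.
    SoftDeleteRecord,
    /// Both exist, hashes differ: filesystem wins.
    UpdateRecord,
    /// Both exist and match, or nothing exists on either side.
    NoOp,
}

/// `file_hash` is `None` when the file is missing;
/// `indexed_hash` is `None` when there is no DB record.
pub fn plan_action(file_hash: Option<&str>, indexed_hash: Option<&str>) -> Action {
    match (file_hash, indexed_hash) {
        (Some(_), None) => Action::CreateRecord,
        (None, Some(_)) => Action::SoftDeleteRecord,
        (Some(file), Some(indexed)) if file != indexed => Action::UpdateRecord,
        _ => Action::NoOp,
    }
}
```

Keeping the decision separate from the side effects would also make a `--dry-run` mode straightforward: compute the actions, print them, and skip the writes.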
### User-Facing Commands
```bash
# Explicit reconciliation
blue sync # Full filesystem scan, reconcile all
blue sync --dry-run # Report drift without fixing
blue sync rfcs/ # Scope to directory
# Status shows drift
blue status # Warns if index drift detected
# Normal operations use index (fast)
blue search "feature" # Queries index
blue rfc get 0042 # Queries index, falls back to filesystem
```
## Implementation
### Phase 1: Add content_hash to Document
```rust
pub struct Document {
    // ... existing fields ...
    pub content_hash: Option<String>,
    pub indexed_at: Option<String>,
}
```
### Phase 2: Implement `find_document` fallback
```rust
pub fn find_document(&self, doc_type: DocType, query: &str) -> Result<Document, StoreError> {
    // Try database first (fast path)
    if let Ok(doc) = self.find_document_in_db(doc_type, query) {
        return Ok(doc);
    }
    // Fall back to filesystem scan
    self.scan_and_register(doc_type, query)
}
```
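
To make the fallback concrete, here is a small in-memory model of "index first, then scan". It is only a sketch: `Index`, its `by_number` map, and the `files_on_disk` slice are hypothetical stand-ins for the store and the filesystem, and it assumes file names follow the `NNNN-slug.md` pattern seen in paths such as `rfcs/0043-new-feature.md`.

```rust
use std::collections::HashMap;

/// Toy index mapping a document number to its path (hypothetical stand-in
/// for the database-backed store).
#[derive(Debug, Default)]
pub struct Index {
    pub by_number: HashMap<u32, String>,
}

impl Index {
    /// Look up a document by number: index first, then a scan of known files.
    /// A scan hit is registered so the next lookup takes the fast path.
    pub fn find(&mut self, number: u32, files_on_disk: &[&str]) -> Option<String> {
        // Fast path: the index already knows this document.
        if let Some(path) = self.by_number.get(&number) {
            return Some(path.clone());
        }
        // Fallback: match on the zero-padded file name prefix, e.g. "0043-".
        let prefix = format!("{number:04}-");
        let hit = files_on_disk.iter().find(|path| {
            path.rsplit('/')
                .next()
                .map_or(false, |name| name.starts_with(&prefix))
        })?;
        self.by_number.insert(number, hit.to_string());
        Some(hit.to_string())
    }
}
```

After the first successful fallback the record exists in the index, so the scan cost is paid once per unindexed document rather than on every lookup.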
### Phase 3: Add `blue sync` command
```rust
pub fn reconcile(&self) -> ReconcileResult {
    let mut result = ReconcileResult::default();
    // Scan filesystem (`glob` here is pseudocode for a recursive directory walk)
    for file in glob(".blue/docs/**/*.md") {
        if !self.has_record_for(&file) {
            self.register_from_file(&file);
            result.added.push(file);
        }
    }
    // Check for orphan records
    for doc in self.all_documents() {
        if let Some(path) = &doc.file_path {
            if !Path::new(path).exists() {
                self.soft_delete(doc.id);
                result.orphaned.push(doc);
            }
        }
    }
    // Note: the hash-mismatch case from the Reconciliation table is not shown here
    result
}
```
### Phase 4: Update `blue status` to show drift
```
$ blue status
RFC 0042 in-progress (3/5 tasks)

⚠ Index drift detected:
  + rfcs/0043-new-feature.md (not indexed)
  - rfcs/0037-old-thing.md (file missing)

Run `blue sync` to reconcile.
```
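
One way to produce a report like this is to compare the set of paths on disk with the set of indexed paths, without modifying either. The sketch below assumes both sets have already been collected; `Drift`, `detect_drift`, and `render_drift` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Paths that differ between the filesystem and the index (hypothetical shape).
#[derive(Debug, Default, PartialEq)]
pub struct Drift {
    /// On disk but not in the index.
    pub unindexed: Vec<String>,
    /// In the index but no longer on disk.
    pub missing: Vec<String>,
}

/// Read-only comparison: two set differences, returned in sorted order.
pub fn detect_drift(on_disk: &BTreeSet<String>, indexed: &BTreeSet<String>) -> Drift {
    Drift {
        unindexed: on_disk.difference(indexed).cloned().collect(),
        missing: indexed.difference(on_disk).cloned().collect(),
    }
}

/// Render the warning block, or `None` when there is nothing to report.
pub fn render_drift(drift: &Drift) -> Option<String> {
    if drift.unindexed.is_empty() && drift.missing.is_empty() {
        return None;
    }
    let mut out = String::from("⚠ Index drift detected:\n");
    for path in &drift.unindexed {
        out.push_str(&format!("  + {path} (not indexed)\n"));
    }
    for path in &drift.missing {
        out.push_str(&format!("  - {path} (file missing)\n"));
    }
    out.push_str("Run `blue sync` to reconcile.");
    Some(out)
}
```

Since detection never writes, the same functions could back both `blue status` and `blue sync --dry-run`.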
## Implementation Plan
### Phase 1: Schema & Hashing
- [ ] Add `content_hash` and `indexed_at` fields to Document struct in `store.rs`
- [ ] Add migration to create `content_hash` and `indexed_at` columns in documents table
- [ ] Update document creation/update to populate `content_hash` via `hash_content()`
### Phase 2: Fallback Logic
- [ ] Implement `is_document_stale()` with mtime fast path and hash slow path
- [ ] Add `scan_and_register()` to parse frontmatter and create DB record from file
- [ ] Modify `find_document()` to fall back to filesystem scan when DB lookup fails
### Phase 3: Sync Command
- [ ] Create `blue_sync` MCP handler with `ReconcileResult` struct
- [ ] Implement `reconcile()` - scan filesystem, register unindexed files
- [ ] Implement orphan detection - soft-delete records for missing files
- [ ] Add `--dry-run` flag to report drift without fixing
- [ ] Add directory scoping (`blue sync rfcs/`)
### Phase 4: Status Integration
- [ ] Update `blue_status` to detect and warn about index drift
- [ ] Show count of unindexed files and orphan records in status output
## Guardrails
1. **Never auto-fix**: Always report drift and require an explicit `blue sync` to repair it
2. **Soft delete only**: DB records for missing files get `deleted_at` set and are never hard-deleted by sync
3. **30-day retention**: Soft-deleted records are purged after 30 days via `blue purge`
4. **Frontmatter validation**: Files with malformed frontmatter are indexed with warnings, not rejected
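
The retention rule in guardrail 3 could be checked with a function along these lines. This is a sketch only: `RETENTION` and `is_purgeable` are hypothetical names, and it assumes `deleted_at` is available as a `SystemTime` rather than a stored string.

```rust
use std::time::{Duration, SystemTime};

/// Retention window from the guardrails: 30 days.
pub const RETENTION: Duration = Duration::from_secs(30 * 24 * 60 * 60);

/// Returns true when a soft-deleted record is old enough to purge.
/// `deleted_at == None` means the record is live and is never purged.
pub fn is_purgeable(deleted_at: Option<SystemTime>, now: SystemTime) -> bool {
    match deleted_at {
        None => false,
        // A `deleted_at` in the future (clock skew) makes `duration_since`
        // return Err, so the record is kept rather than purged.
        Some(at) => now
            .duration_since(at)
            .map_or(false, |age| age >= RETENTION),
    }
}
```

Passing `now` in explicitly, instead of calling `SystemTime::now()` inside, keeps the check deterministic and easy to test.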
## Test Plan
- [ ] `find_document` returns file that exists but has no DB record
- [ ] `blue sync` creates records for unindexed files
- [ ] `blue sync` soft-deletes records for missing files
- [ ] `blue status` warns when drift detected
- [ ] Database can be deleted and rebuilt from files
- [ ] Frontmatter parse errors don't block indexing
- [ ] Hash-based staleness detection works correctly
## References
- **ADR 0005**: Single Source of Truth - "One truth, one location"
- **ADR 0007**: Integrity - "Hidden state is a crack"
- **RFC 0008**: Status Update File Sync - Already syncs status to files
- **RFC 0017**: Plan File Authority - Companion files as source of truth
- **Dialogue**: 6-expert alignment achieved 97% convergence
---
*"If I can `cat` the file, Blue should know about it."*
— The 🧁 Consensus