blue/.blue/docs/rfcs/0018-document-import-sync.super.md
Eric Garcia 0fea499957 feat: lifecycle suffixes for all document states + resolve all clippy warnings
Every document filename now mirrors its lifecycle state with a status
suffix (e.g., .draft.md, .wip.md, .accepted.md). No more bare .md for
tracked document types. Also renamed all from_str methods to parse to
avoid FromStr trait confusion, introduced StagingDeploymentParams struct,
and fixed all 19 clippy warnings across the codebase.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 12:19:46 -05:00

6.7 KiB

RFC 0018: Document Import/Sync Mechanism

Status Superseded
Superseded By RFC 0022: Filesystem Authority
Date 2026-01-25
Dialogue rfc-document-import-sync

Summary

Blue maintains documents in both filesystem (.blue/docs/*.md) and database (blue.db). When these diverge, Blue reports "not found" for files that visibly exist. This RFC establishes the filesystem as the single source of truth, with the database serving as a rebuildable index/cache.

Problem

  1. Files invisible to Blue: Manually created files, copied files, or files after database reset aren't found by find_document()
  2. ADR 0005 violation: Two sources of truth (filesystem and database) inevitably diverge
  3. Git collaboration broken: Database doesn't survive git clone, so collaborators can't see each other's documents
  4. Branch isolation: Database state persists across branch switches, causing phantom documents

Architecture

Authority Model

CURRENT (problematic):
  blue_rfc_create → writes file AND database (can diverge)
  find_document() → queries database ONLY (misses files)

PROPOSED:
  Filesystem = SOURCE OF TRUTH (survives git clone)
  Database = DERIVED INDEX (rebuildable, disposable)
  find_document() = checks index, falls back to filesystem scan

Metadata Location

Location Contents Rationale
Frontmatter title, number, status, date Human-readable identity
Content Relationships (as links) Parseable from text
Database Only id, file_path, content_hash, indexed_at, computed relationships Derived/computed

Principle: If the database is deleted, files alone must be sufficient for full rebuild.

Staleness Detection

Hash-based lazy revalidation:

fn is_document_stale(doc: &Document, file_path: &Path) -> bool {
    // Fast path: check mtime
    let file_mtime = fs::metadata(file_path).modified();
    if file_mtime <= doc.indexed_at { return false; }

    // Slow path: verify with hash
    let content = fs::read_to_string(file_path)?;
    let current_hash = hash_content(&content);
    current_hash != doc.content_hash
}

No file watchers - they're fragile across platforms and introduce race conditions.

Reconciliation

Condition Action
File exists, no DB record Create DB record from file
DB record exists, no file Soft-delete DB record (deleted_at = now())
Both exist, hash mismatch Update DB from file (filesystem wins)

User-Facing Commands

# Explicit reconciliation
blue sync                # Full filesystem scan, reconcile all
blue sync --dry-run      # Report drift without fixing
blue sync rfcs/          # Scope to directory

# Status shows drift
blue status              # Warns if index drift detected

# Normal operations use index (fast)
blue search "feature"    # Queries index
blue rfc get 0042        # Queries index, falls back to filesystem

Implementation

Phase 1: Add content_hash to Document

pub struct Document {
    // ... existing fields ...
    pub content_hash: Option<String>,
    pub indexed_at: Option<String>,
}

Phase 2: Implement find_document fallback

pub fn find_document(&self, doc_type: DocType, query: &str) -> Result<Document, StoreError> {
    // Try database first (fast path)
    if let Ok(doc) = self.find_document_in_db(doc_type, query) {
        return Ok(doc);
    }

    // Fall back to filesystem scan
    self.scan_and_register(doc_type, query)
}

Phase 3: Add blue sync command

pub fn reconcile(&self) -> ReconcileResult {
    let mut result = ReconcileResult::default();

    // Scan filesystem
    for file in glob(".blue/docs/**/*.md") {
        if !self.has_record_for(&file) {
            self.register_from_file(&file);
            result.added.push(file);
        }
    }

    // Check for orphan records
    for doc in self.all_documents() {
        if let Some(path) = &doc.file_path {
            if !Path::new(path).exists() {
                self.soft_delete(doc.id);
                result.orphaned.push(doc);
            }
        }
    }

    result
}

Phase 4: Update blue status to show drift

$ blue status
RFC 0042 in-progress (3/5 tasks)

⚠ Index drift detected:
  + rfcs/0043-new-feature.md (not indexed)
  - rfcs/0037-old-thing.md (file missing)

Run `blue sync` to reconcile.

Implementation Plan

Phase 1: Schema & Hashing

  • Add content_hash and indexed_at fields to Document struct in store.rs
  • Add migration to create content_hash and indexed_at columns in documents table
  • Update document creation/update to populate content_hash via hash_content()

Phase 2: Fallback Logic

  • Implement is_document_stale() with mtime fast path and hash slow path
  • Add scan_and_register() to parse frontmatter and create DB record from file
  • Modify find_document() to fall back to filesystem scan when DB lookup fails

Phase 3: Sync Command

  • Create blue_sync MCP handler with ReconcileResult struct
  • Implement reconcile() - scan filesystem, register unindexed files
  • Implement orphan detection - soft-delete records for missing files
  • Add --dry-run flag to report drift without fixing
  • Add directory scoping (blue sync rfcs/)

Phase 4: Status Integration

  • Update blue_status to detect and warn about index drift
  • Show count of unindexed files and orphan records in status output

Guardrails

  1. Never auto-fix: Always report drift, require explicit blue sync
  2. Soft delete only: DB records for missing files get deleted_at, never hard-deleted
  3. 30-day retention: Soft-deleted records purged after 30 days via blue purge
  4. Frontmatter validation: Files with malformed frontmatter get indexed with warnings, not rejected

Test Plan

  • find_document returns file that exists but has no DB record
  • blue sync creates records for unindexed files
  • blue sync soft-deletes records for missing files
  • blue status warns when drift detected
  • Database can be deleted and rebuilt from files
  • Frontmatter parse errors don't block indexing
  • Hash-based staleness detection works correctly

References

  • ADR 0005: Single Source of Truth - "One truth, one location"
  • ADR 0007: Integrity - "Hidden state is a crack"
  • RFC 0008: Status Update File Sync - Already syncs status to files
  • RFC 0017: Plan File Authority - Companion files as source of truth
  • Dialogue: 6-expert alignment achieved 97% convergence

"If I can cat the file, Blue should know about it."

— The 🧁 Consensus