Eric Garcia 0fea499957 feat: lifecycle suffixes for all document states + resolve all clippy warnings

Every document filename now mirrors its lifecycle state with a status
suffix (e.g., .draft.md, .wip.md, .accepted.md). No more bare .md for
tracked document types. Also renamed all from_str methods to parse to
avoid FromStr trait confusion, introduced StagingDeploymentParams struct,
and fixed all 19 clippy warnings across the codebase.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-26 12:19:46 -05:00

5.4 KiB

Raw Blame History

Spike: Realm Semantic Index


Status	In Progress
Date	2026-01-24
Time Box	4 hours

Question

How can we create an AI-maintained semantic index of files within a realm, tracking what each file does (with line references), its relationships to other files, and enabling semantic search for change impact analysis?

Context

Realms coordinate across repos. Domains define relationships (provider/consumer, exports/imports). But when a file changes, there's no quick way to know:

What does this file actually do?
What other files depend on it?
What's the blast radius of a change?

We want an AI-maintained index that answers these questions via semantic search.

Design Space

What Gets Indexed

For each file in a realm:

file: src/realm/domain.rs
last_indexed: 2026-01-24T10:30:00Z
hash: abc123  # for change detection

summary: "Domain definitions for cross-repo coordination. Defines Domain, Binding, ExportBinding, ImportBinding types."

symbols:
  - name: Domain
    kind: struct
    lines: [13, 73]
    description: "A coordination context between repos with name, description, creation time, and member list"

  - name: Binding
    kind: struct
    lines: [76, 143]
    description: "Declares what a repo exports or imports in a domain"

  - name: ImportStatus
    kind: enum
    lines: [259, 274]
    description: "Status of an import binding: Pending, Current, Outdated, Broken"

relationships:
  - target: src/realm/service.rs
    kind: used_by
    description: "RealmService uses Domain and Binding to manage cross-repo state"

  - target: src/realm/repo.rs
    kind: used_by
    description: "Repo operations load/save Domain and Binding files"

Storage Options

Option	Pros	Cons
SQLite + FTS5	Already have blue.db, full-text search built-in	No semantic/vector search
SQLite + sqlite-vec	Vector similarity search, keeps single DB	Requires extension, Rust bindings unclear
Separate JSON files	Human-readable, git-tracked	Slow to search at scale
Embedded vector DB (lancedb)	Purpose-built for semantic search	Another dependency

Recommendation: Start with SQLite + FTS5 for keyword search. Add embeddings later if needed.

Index Update Triggers

On-demand - blue index command regenerates
Git hook - Post-commit hook calls blue index --changed
File watcher - Daemon watches for changes (already have daemon infrastructure)
MCP tool - blue_index_file for AI agents to update during work

Likely want combination: daemon watches + on-demand refresh.

Semantic Search Approaches

Phase 1: Keyword + Structure

FTS5 for text search across summaries and descriptions
Filter by file path, symbol kind, relationship type
Good enough for "find files related to authentication"

Phase 2: Embeddings

Generate embeddings for each symbol description
Store in sqlite-vec or similar
Query: "what handles S3 bucket permissions" → vector similarity

Relationship Detection

AI needs to identify relationships. Approaches:

Static analysis - Parse imports/uses (language-specific, complex)
AI inference - "Given file A and file B, describe their relationship"
Explicit declarations - Like current ExportBinding/ImportBinding
Hybrid - AI suggests, human confirms

Recommendation: AI inference with caching. When indexing file A, ask AI to describe relationships to files it references.

Proposed Schema

-- File-level index
CREATE TABLE file_index (
    id INTEGER PRIMARY KEY,
    realm TEXT NOT NULL,
    repo TEXT NOT NULL,
    file_path TEXT NOT NULL,
    file_hash TEXT NOT NULL,
    summary TEXT,
    indexed_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(realm, repo, file_path)
);

-- Symbol-level index
CREATE TABLE symbol_index (
    id INTEGER PRIMARY KEY,
    file_id INTEGER REFERENCES file_index(id),
    name TEXT NOT NULL,
    kind TEXT NOT NULL,  -- struct, fn, enum, class, etc.
    start_line INTEGER,
    end_line INTEGER,
    description TEXT
);

-- Relationships between files
CREATE TABLE file_relationships (
    id INTEGER PRIMARY KEY,
    source_file_id INTEGER REFERENCES file_index(id),
    target_file_id INTEGER REFERENCES file_index(id),
    kind TEXT NOT NULL,  -- uses, used_by, imports, exports, tests
    description TEXT
);

-- FTS5 virtual table for search
CREATE VIRTUAL TABLE file_search USING fts5(
    file_path,
    summary,
    symbol_names,
    symbol_descriptions,
    content=file_index
);

Proposed MCP Tools

Tool	Purpose
`blue_index_realm`	Index all files in a realm
`blue_index_file`	Index a single file (for incremental updates)
`blue_index_search`	Semantic search across the index
`blue_index_impact`	Given a file, show what depends on it
`blue_index_status`	Show indexing status and staleness

Open Questions

Which AI model for indexing? Local (Ollama) for cost, or API for quality?
How to handle large files? Chunk by function/class? Summary only?
Cross-realm relationships? Index within realm first, cross-realm later?
Embedding model? If we go vector route, which embedding model?

Next Steps

If this spike looks good:

Create RFC for the full design
Start with SQLite schema + FTS5
Add blue_index_file tool that takes AI-generated index data
Add daemon file watcher for auto-indexing

Investigation notes by Blue

5.4 KiB Raw Blame History