docs: add RFC 0010 for realm semantic index

Spike investigation into AI-maintained semantic indexing for realms.
12-expert dialogue refined through 6 rounds to 96% alignment.

Key decisions:
- Storage: SQLite + FTS5, relationships stored at index time
- Triggers: Git pre-commit hook on diff, --all for bootstrap
- Model: Qwen2.5:3b via Ollama (speed/quality sweet spot)
- Granularity: Symbol-level with line numbers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 18:33:02 -05:00
commit 8f31288b55
parent 1be95dd4a1
4 changed files with 968 additions and 0 deletions
--- a/.blue/blue.db
+++ b/.blue/blue.db
--- a/.blue/docs/dialogues/realm-semantic-index.dialogue.md
+++ b/.blue/docs/dialogues/realm-semantic-index.dialogue.md
@@ -0,0 +1,441 @@
+# Dialogue: Realm Semantic Index
+
+**Spike**: [2026-01-24-Realm Semantic Index](../spikes/2026-01-24-Realm%20Semantic%20Index.md)
+**Goal**: Reach 96% alignment on semantic indexing design
+**Format**: 12 experts, structured rounds
+
+---
+
+## Open Questions
+
+1. **Storage backend** - SQLite+FTS5, sqlite-vec, or dedicated vector DB?
+2. **Update triggers** - Daemon watcher, git hooks, on-demand, or hybrid?
+3. **Relationship detection** - Static analysis, AI inference, or explicit declaration?
+4. **AI model** - Local (Ollama) vs API for indexing?
+5. **Index granularity** - File-level summaries vs symbol-level detail?
+
+---
+
+## Expert Panel
+
+| Expert | Domain | Perspective |
+|--------|--------|-------------|
+| **Ada** | API Design | Clean interfaces, discoverability |
+| **Ben** | Developer Experience | Friction, learning curve |
+| **Carmen** | Systems Architecture | Scalability, performance |
+| **David** | Search Systems | Retrieval quality, ranking |
+| **Elena** | Claude Integration | LLM tool use patterns |
+| **Felix** | Distributed Systems | Consistency, coordination |
+| **Grace** | Security | Trust boundaries, data sensitivity |
+| **Hassan** | Product | User workflows, value delivery |
+| **Iris** | Simplicity | Minimalism, YAGNI |
+| **James** | Observability | Debugging, transparency |
+| **Kim** | Testing | Testability, reliability |
+| **Luna** | AI/ML | Model selection, embedding quality |
+
+---
+
+## Round 1: Initial Positions
+
+### Question 1: Storage Backend
+
+**David (Search)**: Vector search is the future, but sqlite-vec is immature. For semantic search over code descriptions, embeddings will outperform keyword matching. But we can layer - FTS5 now, vectors later.
+
+**Carmen (Systems)**: Keep the stack simple. We already have blue.db. Adding sqlite-vec means native extensions, cross-platform builds, Rust binding complexity. FTS5 is built-in and good enough for thousands of files.
+
+**Iris (Simplicity)**: SQLite+FTS5. Period. We're searching human-readable descriptions, not raw code. Keywords work. The whole point of AI-generated summaries is they use natural language. "What handles S3 permissions" will match "manages S3 bucket access policies."
+
+**Luna (AI/ML)**: Embeddings give fuzzy matching - "authentication" matches "login", "credentials", "session tokens". FTS5 won't. But generating embeddings adds latency and storage. Hybrid is ideal: FTS5 for exact, embeddings for semantic fallback.
+
+**Ben (DX)**: Whatever doesn't require extra setup. Developers won't install special extensions just to use indexing. FTS5 ships with SQLite.
+
+**Kim (Testing)**: FTS5 is deterministic and easy to test. Vector similarity has floating-point fuzziness. Start with testable.
+
+**Alignment**: 88% toward SQLite+FTS5 for MVP, design for embedding extension later.
+
+### Question 2: Update Triggers
+
+**Felix (Distributed)**: Git hooks are fragile - users disable them, forget to install them, CI doesn't run them. Daemon watcher is reliable but adds an always-on process. Best: on-demand with staleness detection.
+
+**Carmen (Systems)**: We have daemon infrastructure already. File watcher is cheap. Index on file save, async in background. User never waits.
+
+**James (Observability)**: Whatever we choose, need clear visibility into "is my index fresh?" `blue index status` should show staleness per file.
+
+**Ben (DX)**: On-demand is safest. I run `blue index` when I need to search. No magic, no surprises. If daemon is available, it can pre-warm, but never required.
+
+**Hassan (Product)**: The killer use case is impact analysis before a change. User is about to modify a file, asks "what depends on this?" Index needs to be fresh. Daemon watching makes this instant.
+
+**Iris (Simplicity)**: On-demand only. Daemon watching is scope creep. User changes file, runs `blue index --file foo.rs`, searches. Simple mental model.
+
+**Grace (Security)**: Daemon watching means reading all files continuously. For repos with sensitive code, that's a concern. On-demand gives user control.
+
+**Alignment**: 75% - split between daemon-assisted and pure on-demand. Need to reconcile.
+
+### Question 3: Relationship Detection
+
+**Luna (AI/ML)**: AI inference is the only practical option for polyglot codebases. Static analysis means parsers for every language. AI can read Python, Rust, TypeScript and understand "this file imports that one."
+
+**Ada (API Design)**: Explicit declaration is most reliable. Like the existing Binding/ExportBinding structure. Developer says "this file provides X, that file consumes X." AI can suggest, human confirms.
+
+**Elena (Claude Integration)**: AI should suggest relationships during indexing. "I see this file imports domain.rs, they have a uses relationship." Store as tentative until confirmed. Over time, learn which suggestions are right.
+
+**Kim (Testing)**: AI-inferred relationships are non-deterministic. Same file might get different relationships on re-index. Hard to test, hard to trust.
+
+**Iris (Simplicity)**: Skip relationships for MVP. File summaries and symbol descriptions are enough. Relationships add complexity. Search "Domain struct" and you'll find both the definition and usages.
+
+**Felix (Distributed)**: Relationships are critical for impact analysis. That's the whole point. But Kim is right about determinism. Solution: cache AI suggestions, only re-analyze on significant change.
+
+**David (Search)**: Relationships improve search ranking. "Files related to X" is a better query than "files mentioning X". But explicit > inferred for reliability.
+
+**Alignment**: 70% - tension between AI inference and explicit declaration. Need synthesis.
+
+### Question 4: AI Model for Indexing
+
+**Luna (AI/ML)**: Local models (Ollama) for privacy and cost. Indexing happens frequently; API costs add up. Quality difference is narrowing. Llama 3.2 or Qwen 2.5 can summarize code well.
+
+**Carmen (Systems)**: Local means requiring Ollama installed and running. Not everyone has that. Need graceful degradation - use API if local unavailable.
+
+**Ben (DX)**: Make it configurable. Some teams have API keys, some run local. Default to local if Ollama detected, fall back to "index not available."
+
+**Grace (Security)**: Local keeps code on-device. Important for proprietary codebases. API means sending code snippets to third party. Local should be default.
+
+**Hassan (Product)**: API gives consistent quality. Local varies by hardware. But the privacy story matters. Local-first, API opt-in.
+
+**Iris (Simplicity)**: Require Ollama for now. We already integrated it for `blue agent`. Don't add API complexity. If someone wants API, they can run Ollama with API backend.
+
+**Elena (Claude Integration)**: For Claude Code integration, the AI doing the work IS the API. When user asks to index, Claude can do it inline. No separate model needed.
+
+**Alignment**: 82% toward local-first (Ollama), with inline-Claude option for MCP context.
+
+### Question 5: Index Granularity
+
+**David (Search)**: Symbol-level is necessary for useful search. "Find the function that validates S3 paths" needs to match `validate_s3_path` at line 47, not just "this file does S3 stuff."
+
+**Iris (Simplicity)**: File-level summaries first. Symbol extraction is expensive and language-specific. A good file summary mentions key functions: "Defines Domain struct (line 13) and Binding struct (line 76)."
+
+**Carmen (Systems)**: Symbol-level means more rows, more storage, more indexing time. For a 10,000 file realm, that's 50,000+ symbol entries. Worth it?
+
+**Luna (AI/ML)**: AI can extract symbols naturally. "List the main components in this file with line numbers." One prompt, structured output. Not that expensive.
+
+**Ada (API Design)**: Symbol-level enables richer queries: "Find all functions that return Result<Domain>" vs just "files about domains." Worth the complexity.
+
+**Ben (DX)**: Impact analysis needs symbol-level. "What calls this function?" requires knowing what functions exist and where. File-level is just better grep.
+
+**Kim (Testing)**: Symbol extraction can be validated - run on known files, check expected symbols appear. More testable than pure summaries.
+
+**Hassan (Product)**: Users think in symbols: functions, classes, types. Not files. Index what users think about.
+
+**Alignment**: 85% toward symbol-level indexing with structured extraction.
+
+---
+
+## Round 2: Convergence
+
+### Reconciling Question 2: Update Triggers
+
+**Felix**: Proposal: *tiered freshness*. On-demand is always available. Daemon watching is enhancement. MCP tools report staleness.
+
+```
+Index freshness:
+- 3 files stale (modified since last index)
+- Last full index: 2 hours ago
+```
+
+User can ignore staleness for quick searches, or run `blue index` when precision matters.
+
+**Carmen**: I can accept that. Daemon is optional optimization. Core functionality works without it.
+
+**Iris**: If daemon is optional and clearly optional, I'm in. No invisible magic.
+
+**Ben**: Add `--watch` flag to explicitly start watching. Default is on-demand.
+
+**James**: Staleness in every search result. "This file was indexed 3 hours ago, file has changed since." User knows to re-index if needed.
+
+**Hassan**: This works for the "impact before change" story. User sees staleness, re-indexes the files they care about, gets fresh results.
+
+**Alignment**: 92% toward tiered freshness with optional daemon.
+
+### Reconciling Question 3: Relationships
+
+**Elena**: Synthesis: *AI-suggested, query-time materialized.*
+
+Don't store relationships persistently. When user asks "what depends on X?", AI analyzes X's symbols and searches for usages across the index. Results are relationships, computed on demand.
+
+**David**: This is how good code search works. You don't precompute all relationships - you find them at query time. The index gives you fast symbol lookup, the query gives you relationships.
+
+**Luna**: This avoids the determinism problem. Each query is a fresh analysis. If the index is fresh, relationships are fresh.
+
+**Kim**: Much easier to test. Query "depends on Domain" should return files containing "Domain" in their symbol usages. Deterministic given the index.
+
+**Iris**: I like this. No relationship storage, no relationship staleness. Index symbols well, derive relationships at query time.
+
+**Ada**: We could cache frequent queries. "Depends on auth.rs" gets cached until auth.rs changes. Optimization, not architecture.
+
+**Felix**: Cache is good. Query-time computation with LRU cache. Cache invalidates when any involved file changes.
+
+**Alignment**: 94% toward query-time relationship derivation with optional caching.
+
+---
+
+## Round 3: Final Positions
+
+### Consolidated Design
+
+**Storage**: SQLite + FTS5, schema designed for future embedding column.
+
+**Update Triggers**: On-demand primary, optional daemon watching with `--watch`. Staleness always visible.
+
+**Relationships**: Query-time derivation, not stored. Optional caching for frequent queries.
+
+**AI Model**: Local (Ollama) primary, inline-Claude when called from MCP. Configurable.
+
+**Granularity**: Symbol-level with file summary. Structured extraction: name, kind, lines, description.
+
+### Final Alignment Scores
+
+| Question | Alignment |
+|----------|-----------|
+| Storage backend | 88% |
+| Update triggers | 92% |
+| Relationship detection | 94% |
+| AI model | 82% |
+| Index granularity | 85% |
+| **Overall** | **88%** |
+
+### Remaining Dissent
+
+**Luna (8%)**: Embeddings should be MVP, not "future." Keyword search will disappoint users expecting semantic matching.
+
+**Iris (4%)**: Symbol-level is over-engineering. Start with file summaries, add symbols when proven needed.
+
+**Grace (5%)**: Local-only is too restrictive. Some teams can't run Ollama. Need API option from day one.
+
+---
+
+## Round 4: Closing the Gap
+
+### Addressing Luna's Concern
+
+**David**: Counter-proposal: support *optional* embedding column from day one. If user has embedding model configured, populate it. Search uses embeddings when available, falls back to FTS5.
+
+**Luna**: That works. Embeddings are enhancement, not requirement. Users who care can enable them.
+
+**Carmen**: Minimal code change - add nullable `embedding BLOB` column to schema. Search checks if populated.
+
+**Alignment**: Luna satisfied. +4% → 92%
+
+### Addressing Iris's Concern
+
+**Ben**: What if symbol extraction is *optional*? Default indexer produces file summary only. `--symbols` flag enables deep extraction.
+
+**Iris**: I can accept that. Users who want symbols opt in. Default is simple.
+
+**Hassan**: Disagree. Symbols are the product. We shouldn't hide the value behind a flag.
+
+**Kim**: Compromise: extract symbols by default, but don't fail if extraction fails. Some files might only get summaries.
+
+**Iris**: Fine. Best-effort symbols, graceful degradation to summary-only.
+
+**Alignment**: Iris satisfied. +2% → 94%
+
+### Addressing Grace's Concern
+
+**Elena**: We already support `blue agent --model provider/model`. Same pattern for indexing. Default to Ollama, `--model anthropic/claude-3-haiku` works too.
+
+**Grace**: That's acceptable. Local is default, API is opt-in with explicit flag.
+
+**Ben**: Document the privacy implications clearly. "By default, code stays local. API option sends code to provider."
+
+**Alignment**: Grace satisfied. +3% → 97%
+
+---
+
+## Final Alignment: 97%
+
+### Consensus Design
+
+1. **Storage**: SQLite + FTS5, optional embedding column for future/power users
+2. **Updates**: On-demand primary, optional `--watch` daemon, staleness always shown
+3. **Relationships**: Query-time derivation from symbol index, optional LRU cache
+4. **AI Model**: Ollama default, API opt-in with `--model`, inline-Claude in MCP context
+5. **Granularity**: Symbol-level by default, graceful fallback to file summary
+
+### Remaining 3% Dissent
+
+**Iris**: Still think we're building too much. But I'll trust the process.
+
+---
+
+## Round 5: Design Refinements
+
+New questions surfaced during RFC drafting:
+
+1. **Update triggers revised** - Git pre-commit hook instead of daemon?
+2. **Relationships revised** - Store AI descriptions at index time instead of query-time derivation?
+3. **Model sizing** - Which Qwen model balances speed and quality for indexing?
+
+### Question 6: Git Pre-Commit Hook
+
+**Felix (Distributed)**: I take back my earlier concern about hooks. Pre-commit is reliable because it's tied to an action the developer already does. Post-save watchers are invisible; pre-commit is explicit.
+
+**Ben (DX)**: `blue index --install-hook` is one command. Developer opts in consciously. Hook runs on staged files only — fast, focused.
+
+**Carmen (Systems)**: Hook calls `blue index --diff`, indexes only changed files. No daemon process. No file watcher. Clean.
+
+**James (Observability)**: Hook should be non-blocking. If indexing fails, warn but don't abort commit. Developers will disable blocking hooks.
+
+**Iris (Simplicity)**: Much better than daemon. Git is already the source of truth for changes. Hook respects that. I'm fully on board.
+
+**Kim (Testing)**: Easy to test: stage files, run hook, verify index updated. Deterministic.
+
+**Hassan (Product)**: Need `blue index --all` for bootstrap. First clone, run once, then hooks maintain it.
+
+**Alignment**: 98% toward git pre-commit hook with `--all` for bootstrap.
+
+### Question 7: Stored Relationships
+
+**Luna (AI/ML)**: Storing relationships at index time is better for search quality. Query-time derivation means another AI call per search. Slow. Stored descriptions are instant FTS5 matches.
+
+**David (Search)**: Agree. The AI writes natural language: "Uses Domain from domain.rs for state management." That's searchable. "What uses Domain" hits it directly.
+
+**Kim (Testing)**: Stored is more deterministic. Same index = same search results. Query-time AI adds variability.
+
+**Elena (Claude Integration)**: One AI call per file at index time, zero at search time. Much better UX. Search feels instant.
+
+**Iris (Simplicity)**: I was wrong earlier. Stored relationships are simpler operationally. No AI inference during search. Just text matching.
+
+**Carmen (Systems)**: Relationships field is just another TEXT column in file_index. FTS5 includes it. Minimal schema change.
+
+**Felix (Distributed)**: When file A changes, we re-index A. A's relationships update. Files depending on A don't need re-indexing — their descriptions still say "uses A". Search still works.
+
+**Alignment**: 96% toward AI-generated relationships stored at index time.
+
+### Question 8: Qwen Model Size for Indexing
+
+**Luna (AI/ML)**: The task is structured extraction: summary, relationships, symbols with line numbers. Not creative writing. Smaller models excel at structured tasks.
+
+Let me break down the options:
+
+| Model | Size | Speed (tok/s on M2) | Quality | Use Case |
+|-------|------|---------------------|---------|----------|
+| Qwen2.5:0.5b | 0.5B | ~200 | Basic | Too small for code understanding |
+| Qwen2.5:1.5b | 1.5B | ~150 | Good | Fast, handles simple files |
+| Qwen2.5:3b | 3B | ~100 | Very Good | Sweet spot for code analysis |
+| Qwen2.5:7b | 7B | ~50 | Excellent | Overkill for structured extraction |
+| Qwen2.5:14b | 14B | ~25 | Excellent | Way too slow for batch indexing |
+
+**Carmen (Systems)**: For batch indexing hundreds of files, speed matters. 3B at 100 tok/s means a 500-token file takes 5 seconds. 7B doubles that.
+
+**Ben (DX)**: Pre-commit hook needs to be fast. Developer commits 5 files, waits... how long? At 3B, maybe 25 seconds total. At 7B, 50 seconds. 3B is the limit.
+
+**David (Search)**: Quality requirements: can it identify the main symbols? Can it describe relationships accurately? 3B Qwen2.5 handles this well. I've tested it on code summarization.
+
+**Elena (Claude Integration)**: Qwen2.5:3b handles code well, and the code-tuned qwen2.5-coder variants are available at the same size. For structured extraction with a good prompt, 3B is sufficient.
+
+**Grace (Security)**: Smaller model = smaller attack surface, less memory, faster. Security likes smaller when quality is adequate.
+
+**Iris (Simplicity)**: 3B. It's the middle path. Not too slow, not too dumb.
+
+**Hassan (Product)**: What about variable sizing? Use 3B for most files, 7B for complex/critical files?
+
+**Luna (AI/ML)**: Complexity detection adds overhead. Start with 3B uniform. If users report quality issues on specific file types, add heuristics later.
+
+**James (Observability)**: Log model performance per file. We'll see patterns: "Rust files take 2x longer" or "3B struggles with files over 1000 lines."
+
+**Kim (Testing)**: 3B is testable. Run on known files, verify expected symbols extracted. If tests pass, quality is sufficient.
+
+**Alignment check on model size:**
+
+| Model | Votes | Alignment |
+|-------|-------|-----------|
+| Qwen2.5:1.5b | 1 (Iris fallback) | 8% |
+| Qwen2.5:3b | 10 | 84% |
+| Qwen2.5:7b | 1 (Luna for quality) | 8% |
+
+**Luna**: I'll concede to 3B for MVP. Add `--model` flag for users who want 7B quality and have patience.
+
+**Alignment**: 94% toward Qwen2.5:3b default, configurable via `--model`.
+
+---
+
+## Round 6: Final Refinements
+
+### Handling Large Files
+
+**Carmen (Systems)**: What about files over 1000 lines? 3B context is 32K tokens, but very long files might need chunking.
+
+**Luna (AI/ML)**: Chunk by logical units: functions, classes. Index each chunk. Reassemble into single file entry.
+
+**Iris (Simplicity)**: Or just truncate. Index the first 500 lines. Most important code is at the top. Pragmatic.
+
+**David (Search)**: Truncation loses symbols at the bottom. Chunking is better. But adds complexity.
+
+**Elena (Claude Integration)**: Proposal: for files under 1000 lines (95% of files), index whole file. For larger files, summarize with explicit note: "Large file, partial index."
+
+**Ben (DX)**: I like Elena's approach. Don't over-engineer for edge cases. Note the limitation, move on.
+
+**Alignment**: 92% toward whole-file indexing with "large file" warning for 1000+ lines.
+
+### Prompt Engineering
+
+**Luna (AI/ML)**: The indexing prompt is critical. Needs to be:
+- Structured output (YAML or JSON)
+- Explicit about line numbers
+- Focused on relationships
+
+```
+Analyze this source file and provide:
+1. A one-sentence summary of what this file does
+2. A paragraph describing relationships to other files (imports, exports, dependencies)
+3. A list of key symbols (functions, classes, structs, enums) with:
+   - name
+   - kind (function/class/struct/enum/const)
+   - start and end line numbers
+   - one-sentence description
+
+Output as YAML.
+```
+
+**Kim (Testing)**: Prompt should be versioned. If we change the prompt, re-index everything.
+
+**Ada (API Design)**: Store prompt version in file_index. `prompt_version INTEGER`. When prompt changes, all entries are stale.
+
+**Alignment**: 96% toward structured prompt with versioning.
+
+---
+
+## Final Alignment: 96%
+
+### Updated Consensus Design
+
+1. **Storage**: SQLite + FTS5, optional embedding column
+2. **Updates**: Git pre-commit hook (`--diff`), bootstrap with `--all`
+3. **Relationships**: AI-generated descriptions stored at index time
+4. **AI Model**: Qwen2.5:3b default (Ollama), configurable via `--model`
+5. **Granularity**: Symbol-level with line numbers, whole-file for <1000 lines
+6. **Prompt**: Structured YAML output, versioned
+
+### Final Alignment Scores
+
+| Question | Alignment |
+|----------|-----------|
+| Storage backend | 92% |
+| Update triggers (git hook) | 98% |
+| Relationships (stored) | 96% |
+| AI model (Qwen2.5:3b) | 94% |
+| Index granularity | 92% |
+| Large file handling | 92% |
+| Prompt design | 96% |
+| **Overall** | **96%** |
+
+### Remaining 4% Dissent
+
+**Luna (2%)**: Would prefer 7B for quality, but accepts 3B with `--model` escape hatch.
+
+**Hassan (2%)**: Wants adaptive model selection, but accepts uniform 3B for MVP.
+
+---
+
+*"Twelve voices, refined twice. That's how you ship."*
+
+— Blue
--- a/.blue/docs/rfcs/0010-realm-semantic-index.md
+++ b/.blue/docs/rfcs/0010-realm-semantic-index.md
@@ -0,0 +1,347 @@
+# RFC 0010: Realm Semantic Index
+
+| | |
+|---|---|
+| **Status** | Draft |
+| **Date** | 2026-01-24 |
+| **Source Spike** | Realm Semantic Index |
+| **Dialogue** | [realm-semantic-index.dialogue.md](../dialogues/realm-semantic-index.dialogue.md) |
+| **Alignment** | 96% |
+
+---
+
+## Summary
+
+An AI-maintained semantic index for files within a realm. Each file gets a summary and symbol-level descriptions with line references. Enables semantic search for impact analysis: "what depends on this file?" and "what's the blast radius of this change?"
+
+## Problem
+
+When working across repos in a realm:
+- No quick way to know what a file does without reading it
+- No way to find files related to a concept ("authentication", "S3 access")
+- No impact analysis before making changes
+- Existing search is keyword-only, misses semantic matches
+
+## Proposal
+
+### Index Structure
+
+Each indexed file contains:
+
+```yaml
+file: src/realm/domain.rs
+last_indexed: 2026-01-24T10:30:00Z
+file_hash: abc123
+
+summary: "Domain definitions for cross-repo coordination"
+
+relationships: |
+  Core types used by service.rs for realm state management.
+  Loaded/saved by repo.rs for persistence.
+  Referenced by daemon/client.rs for cross-repo messaging.
+
+symbols:
+  - name: Domain
+    kind: struct
+    lines: [13, 73]
+    description: "Coordination context between repos with name, members, timestamps"
+
+  - name: Binding
+    kind: struct
+    lines: [76, 143]
+    description: "Declares repo exports and imports within a domain"
+
+  - name: ImportStatus
+    kind: enum
+    lines: [259, 274]
+    description: "Binding status: Pending, Current, Outdated, Broken"
+```
+
+### Storage: SQLite + FTS5
+
+Use existing blue.db with full-text search:
+
+```sql
+-- File-level index
+CREATE TABLE file_index (
+    id INTEGER PRIMARY KEY,
+    realm TEXT NOT NULL,
+    repo TEXT NOT NULL,
+    file_path TEXT NOT NULL,
+    file_hash TEXT NOT NULL,
+    summary TEXT,
+    relationships TEXT,  -- AI-generated relationship descriptions
+    indexed_at DATETIME DEFAULT CURRENT_TIMESTAMP,
+    prompt_version INTEGER DEFAULT 1,  -- Invalidate on prompt changes
+    embedding BLOB,  -- Optional, for future vector search
+    UNIQUE(realm, repo, file_path)
+);
+
+-- Symbol-level index
+CREATE TABLE symbol_index (
+    id INTEGER PRIMARY KEY,
+    file_id INTEGER REFERENCES file_index(id) ON DELETE CASCADE,
+    name TEXT NOT NULL,
+    kind TEXT NOT NULL,
+    start_line INTEGER,
+    end_line INTEGER,
+    description TEXT
+);
+
+-- FTS5 for search
+CREATE VIRTUAL TABLE file_search USING fts5(
+    file_path,
+    summary,
+    relationships,
+    content=file_index,
+    content_rowid=id
+);
+
+CREATE VIRTUAL TABLE symbol_search USING fts5(
+    name,
+    description,
+    content=symbol_index,
+    content_rowid=id
+);
+```
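Because `file_search` is declared as an external-content FTS5 table, SQLite does not keep it in sync with `file_index` automatically. The following is a minimal sketch, not the RFC's implementation, of one common way to do that with triggers, using Python's standard `sqlite3` module. The trigger names, the helper functions `open_index` and `search_files`, and the trimmed-down column set are assumptions made for illustration.

```python
import sqlite3

# Reduced version of the RFC schema (only the searchable columns), plus
# hypothetical sync triggers for the external-content FTS5 table.
SCHEMA = """
CREATE TABLE file_index (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    summary TEXT,
    relationships TEXT
);
CREATE VIRTUAL TABLE file_search USING fts5(
    file_path, summary, relationships,
    content='file_index', content_rowid='id'
);
CREATE TRIGGER file_index_ai AFTER INSERT ON file_index BEGIN
    INSERT INTO file_search(rowid, file_path, summary, relationships)
    VALUES (new.id, new.file_path, new.summary, new.relationships);
END;
CREATE TRIGGER file_index_ad AFTER DELETE ON file_index BEGIN
    INSERT INTO file_search(file_search, rowid, file_path, summary, relationships)
    VALUES ('delete', old.id, old.file_path, old.summary, old.relationships);
END;
CREATE TRIGGER file_index_au AFTER UPDATE ON file_index BEGIN
    INSERT INTO file_search(file_search, rowid, file_path, summary, relationships)
    VALUES ('delete', old.id, old.file_path, old.summary, old.relationships);
    INSERT INTO file_search(rowid, file_path, summary, relationships)
    VALUES (new.id, new.file_path, new.summary, new.relationships);
END;
"""


def open_index(path=":memory:"):
    """Open a database and create the sketch schema (requires an FTS5-enabled SQLite)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def search_files(conn, match_expr):
    """Return file paths whose indexed text matches an FTS5 expression, best match first."""
    rows = conn.execute(
        "SELECT file_path FROM file_search WHERE file_search MATCH ? ORDER BY rank",
        (match_expr,),
    )
    return [row[0] for row in rows]
```

With the triggers in place, re-indexing a file is an ordinary `UPDATE` on `file_index`; the old tokens are removed from the search index and the new ones added in the same statement.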
+
+### Update Triggers: Git-Driven
+
+**Primary: Pre-commit hook on diff**
+
+```bash
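+#!/bin/sh
+# .git/hooks/pre-commit (installed by blue index --install-hook)
+# Warn but never block the commit if indexing fails.
+blue index --diff || echo "blue index: indexing failed, commit continues" >&2
+exit 0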
+```
+
+The hook runs `blue index --diff` which:
+1. Gets staged files from `git diff --cached --name-only`
+2. Indexes only those files
+3. Commits include fresh index entries
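The first step above can be sketched as follows. This is an illustrative Python helper, not how `blue` does it: the function names are hypothetical, and the extra `-z` and `--diff-filter=ACMR` flags are assumptions added so that paths with spaces survive and deleted files are skipped. The RFC itself only specifies `git diff --cached --name-only`.

```python
import subprocess


def parse_name_list(raw: bytes) -> list[str]:
    """Split NUL-separated `git diff -z --name-only` output into paths."""
    return [p.decode("utf-8", "surrogateescape") for p in raw.split(b"\0") if p]


def staged_files(repo_dir: str = ".") -> list[str]:
    """List staged files that still exist (added, copied, modified, renamed)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "-z", "--diff-filter=ACMR"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
    )
    return parse_name_list(result.stdout)
```

Filtering out deletions matters because a deleted file has no content to index; its `file_index` row would instead need removing, which this sketch does not cover.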
+
+**Bootstrap: Full index from scratch**
+
+```bash
+# First time setup - index everything
+blue index --all
+
+# Or index specific directory
+blue index --all src/
+```
+
+**On-demand: Single file or refresh**
+
+```bash
+# Re-index specific file
+blue index --file src/domain.rs
+
+# Refresh stale entries (re-index files where hash changed)
+blue index --refresh
+```
+
+**MCP inline**: When invoked from Claude through MCP, the active model can index files inline during the conversation.
+
+### Staleness Detection
+
+```
+blue index status
+
+Index status:
+  Total files: 147
+  Indexed: 142 (96%)
+  Stale: 3 (hash mismatch)
+  Unindexed: 2 (new files)
+
+  Stale:
+    - src/realm/domain.rs
+    - src/realm/service.rs
+
+  Unindexed:
+    - src/new_feature.rs
+    - tests/new_test.rs
+```
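The status report above follows from comparing each file's current hash and the current prompt version against what `file_index` recorded. A rough sketch of that classification, assuming SHA-256 as the hash (the RFC does not name an algorithm) and with hypothetical names `IndexedFile`, `content_hash`, and `classify`:

```python
import hashlib
from dataclasses import dataclass

PROMPT_VERSION = 1  # assumed current prompt version


@dataclass(frozen=True)
class IndexedFile:
    """The two file_index fields that decide freshness."""
    file_hash: str
    prompt_version: int


def content_hash(data: bytes) -> str:
    """Hash file contents; SHA-256 is an assumption, not specified by the RFC."""
    return hashlib.sha256(data).hexdigest()


def classify(on_disk, indexed, prompt_version=PROMPT_VERSION):
    """Sort paths into fresh, stale (hash or prompt mismatch), and unindexed."""
    status = {"fresh": [], "stale": [], "unindexed": []}
    for path in sorted(on_disk):
        entry = indexed.get(path)
        if entry is None:
            status["unindexed"].append(path)
        elif (entry.file_hash != content_hash(on_disk[path])
              or entry.prompt_version != prompt_version):
            status["stale"].append(path)
        else:
            status["fresh"].append(path)
    return status
```

Treating a prompt-version mismatch as stale means a prompt change naturally queues every entry for `blue index --refresh`.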
+
+### Relationships: AI-Generated at Index Time
+
+When indexing a file, AI generates a concise `relationships` description alongside the summary:
+
+```yaml
+file: src/realm/service.rs
+summary: "RealmService coordinates cross-repo state and notifications"
+
+relationships: |
+  Uses Domain and Binding from domain.rs for state representation.
+  Calls RepoConfig from config.rs for realm settings.
+  Provides notifications consumed by daemon/server.rs.
+  Tested by tests/realm_service_test.rs.
+
+symbols:
+  - name: RealmService
+    kind: struct
+    lines: [15, 89]
+    description: "Main service coordinating realm operations"
+```
+
+The `relationships` field is a natural language description — searchable via FTS5:
+
+```
+Query: "what uses Domain"
+→ Matches service.rs: "Uses Domain and Binding from domain.rs..."
+
+Query: "what provides notifications"
+→ Matches service.rs: "Provides notifications consumed by..."
+```
+
+AI does the relationship analysis once during indexing. Search is just text matching over stored descriptions. Fast and deterministic.
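One caveat worth illustrating: FTS5 treats space-separated terms as an implicit AND, so passing the literal question "what uses Domain" to `MATCH` would require the word "what" to appear in the stored text. A possible approach, sketched here under assumptions, is to drop question words before building the expression. The stop list and the `to_match_expr` name are hypothetical and not part of the RFC.

```python
import re

# Hypothetical stop list of words that carry no search value in a question.
QUESTION_WORDS = {
    "what", "which", "who", "where", "how", "does", "do",
    "is", "are", "the", "a", "an", "of", "in", "on", "to",
}


def to_match_expr(question: str, prefix: bool = False) -> str:
    """Turn a natural-language question into an FTS5 MATCH expression.

    Each remaining term is double-quoted so FTS5 reads it as a string rather
    than query syntax. With prefix=True a trailing * makes it a prefix query.
    """
    terms = [
        t for t in re.findall(r"[A-Za-z0-9_]+", question)
        if t.lower() not in QUESTION_WORDS
    ]
    return " ".join(f'"{t}"' + ("*" if prefix else "") for t in terms)
```

For example, `to_match_expr("what uses Domain")` yields `"uses" "Domain"`, which matches the stored sentence "Uses Domain and Binding from domain.rs." A caller would need to handle the empty string returned when every word is filtered out.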
+
+### AI Model: Qwen2.5:3b via Ollama
+
+**Recommended**: `qwen2.5:3b` — optimal balance of speed and quality for code indexing.
+
+| Model | Speed (M2) | Quality | Verdict |
+|-------|------------|---------|---------|
+| qwen2.5:1.5b | ~150 tok/s | Good | Fine for simple files, too shallow for code analysis |
+| **qwen2.5:3b** | ~100 tok/s | Very Good | **Sweet spot** — fast, accurate |
+| qwen2.5:7b | ~50 tok/s | Excellent | Too slow for batch indexing |
+
+At 3b, a 500-token file indexes in ~5 seconds. A 5-file commit takes ~25 seconds — acceptable for pre-commit hook.
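The arithmetic behind these figures is simply tokens divided by throughput. A small sketch, assuming files are indexed one after another and ignoring model load time and prompt overhead (both of which would add to real-world numbers); `index_seconds` is a hypothetical helper:

```python
def index_seconds(tokens_per_file: int, tok_per_s: float, files: int = 1) -> float:
    """Estimate sequential indexing time from token count and model throughput."""
    return files * tokens_per_file / tok_per_s


# Figures from the table above (approximate throughput on an M2).
print(index_seconds(500, 100))      # one file on qwen2.5:3b
print(index_seconds(500, 100, 5))   # five-file commit on qwen2.5:3b
print(index_seconds(500, 50, 5))    # same commit on qwen2.5:7b
```

The same commit on the 7b model takes twice as long, which is why the dialogue treats 3b as the ceiling for a pre-commit hook.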
+
+```
+Model priority:
+1. Ollama qwen2.5:3b (default) - fast, local, private
+2. --model flag - explicit override (e.g., qwen2.5:7b for quality)
+3. Inline Claude - when called from MCP, use active model
+```
+
+Privacy: code stays local by default. API requires explicit opt-in.
+
+### Large File Handling
+
+Files under 1000 lines: index whole file.
+Files of 1000 lines or more: summarize with warning "Large file, partial index."
+
+No chunking for MVP. Note the limitation, move on.
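As a sketch of the rule above, the decision reduces to a line count check. The function name and return shape are hypothetical, and treating a file of exactly 1000 lines as large is an assumption, since the RFC text leaves that boundary open.

```python
LARGE_FILE_LINES = 1000  # threshold from the RFC; boundary handling is assumed


def plan_indexing(source: str, limit: int = LARGE_FILE_LINES) -> dict:
    """Decide whether a file gets a whole-file index or a flagged partial one."""
    line_count = len(source.splitlines())
    if line_count < limit:
        return {"lines": line_count, "partial": False, "warning": None}
    return {
        "lines": line_count,
        "partial": True,
        "warning": "Large file, partial index.",
    }
```

Storing the `partial` flag alongside the entry would let `blue index status` surface which files were only partially covered.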
+
+### Indexing Prompt
+
+Versioned prompt for structured extraction:
+
+```
+Analyze this source file and provide:
+1. A one-sentence summary of what this file does
+2. A paragraph describing relationships to other files (imports, exports, dependencies)
+3. A list of key symbols (functions, classes, structs, enums) with:
+   - name
+   - kind (function/class/struct/enum/const)
+   - start and end line numbers
+   - one-sentence description
+
+Output as YAML.
+```
+
+Store `prompt_version` in file_index. When prompt changes, all entries are stale.
+
+### CLI Commands
+
+```bash
+# Bootstrap: index everything from scratch
+blue index --all
+
+# Install git pre-commit hook
+blue index --install-hook
+
+# Index staged files (called by hook)
+blue index --diff
+
+# Index single file
+blue index --file src/domain.rs
+
+# Refresh stale entries
+blue index --refresh
+
+# Check index freshness
+blue index status
+
+# Search the index
+blue search "S3 permissions"
+blue search --symbols "validate"
+
+# Impact analysis
+blue impact src/domain.rs
+```
+
+### MCP Tools
+
+| Tool | Purpose |
+|------|---------|
+| `blue_index_realm` | Index all files in current realm |
+| `blue_index_file` | Index a single file |
+| `blue_index_status` | Show index freshness |
+| `blue_index_search` | Search across indexed files |
+| `blue_index_impact` | Show files depending on target |
+
+## Non-Goals
+
+- Cross-realm search (scope to single realm for MVP)
+- Query-time relationship derivation (relationships are generated and stored at index time)
+- Required embeddings (FTS5 is sufficient, embeddings are optional)
+- Language-specific parsing (AI inference works across languages)
+
+## Test Plan
+
+- [ ] Schema created in blue.db on first index
+- [ ] `blue index --all` indexes all files in realm, extracts symbols
+- [ ] `blue index --diff` indexes only staged files
+- [ ] `blue index --file` indexes single file, updates existing entry
+- [ ] `blue index --install-hook` creates valid pre-commit hook
+- [ ] `blue index --refresh` re-indexes stale entries only
+- [ ] `blue index status` shows staleness accurately
+- [ ] `blue search` returns relevant files ranked by match quality
+- [ ] `blue impact` shows files with symbols referencing target
+- [ ] Staleness detection works (file hash comparison)
+- [ ] Prompt version tracked; old versions marked stale
+- [ ] Qwen2.5:3b produces valid YAML output
+- [ ] Large files (>1000 lines) get partial index warning
+- [ ] Ollama integration works for local indexing
+- [ ] `--model` flag allows override to different model
+- [ ] MCP tools available and functional
+- [ ] FTS5 search handles partial matches
+- [ ] Pre-commit hook runs without blocking commit on failure
+- [ ] Relationships field searchable via FTS5
+
+## Implementation Plan
+
+- [ ] Add schema to blue.db (file_index, symbol_index, FTS5 tables)
+- [ ] Create versioned indexing prompt for structured YAML extraction
+- [ ] Implement Ollama integration with qwen2.5:3b default
+- [ ] Implement `blue index --all` for bootstrap
+- [ ] Implement `blue index --diff` for staged files
+- [ ] Implement `blue index --file` for single-file updates
+- [ ] Implement `blue index --install-hook` for git hook setup
+- [ ] Implement `blue index --refresh` for stale entry updates
+- [ ] Implement `blue index status` for freshness reporting
+- [ ] Add large file handling (>1000 lines warning)
+- [ ] Implement `blue search` with FTS5 backend
+- [ ] Implement `blue impact` for dependency queries
+- [ ] Add MCP tools (5 tools)
+- [ ] Add `--model` flag for model override
+- [ ] Optional: embedding column support
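
One way the non-blocking hook behaviour from the test plan could be approximated: a wrapper that runs the indexer and always reports success to git. `blue index --diff` is the command from this RFC; the wrapper name `run_index_hook` and its warning text are hypothetical.

```python
import subprocess
import sys


def run_index_hook(cmd=("blue", "index", "--diff")) -> int:
    """Run the indexer on staged files, but never block the commit."""
    try:
        result = subprocess.run(list(cmd), check=False)
        if result.returncode != 0:
            print(
                f"warning: indexing exited with {result.returncode}; commit continues",
                file=sys.stderr,
            )
    except OSError as exc:
        # Covers a missing or non-executable indexer binary.
        print(f"warning: could not run indexer ({exc}); commit continues", file=sys.stderr)
    # Exit status 0 tells git to proceed with the commit.
    return 0
```

A pre-commit script would then end with `sys.exit(run_index_hook())`, so an indexing failure surfaces as a warning rather than a rejected commit.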
+
+## Open Questions (Resolved)
+
+| Question | Resolution | Alignment |
+|----------|------------|-----------|
+| Storage backend | SQLite + FTS5, optional embedding column | 92% |
+| Update triggers | Git pre-commit hook on diff, `--all` for bootstrap | 98% |
+| Relationships | AI-generated descriptions stored at index time | 96% |
+| AI model | Qwen2.5:3b via Ollama, `--model` for override | 94% |
+| Granularity | Symbol-level with line numbers | 92% |
+| Large files | Whole-file <1000 lines, warning for larger | 92% |
+| Prompt design | Structured YAML, versioned | 96% |
+
+---
+
+*"Index the realm. Know the impact. Change with confidence."*
+
+— Blue
--- a/.blue/docs/spikes/2026-01-24-Realm
+++ b/.blue/docs/spikes/2026-01-24-Realm
@ -0,0 +1,180 @@
+# Spike: Realm Semantic Index
+
+| | |
+|---|---|
+| **Status** | In Progress |
+| **Date** | 2026-01-24 |
+| **Time Box** | 4 hours |
+
+---
+
+## Question
+
+How can we create an AI-maintained semantic index of files within a realm, tracking what each file does (with line references), its relationships to other files, and enabling semantic search for change impact analysis?
+
+---
+
+## Context
+
+Realms coordinate across repos. Domains define relationships (provider/consumer, exports/imports). But when a file changes, there's no quick way to know:
+- What does this file actually do?
+- What other files depend on it?
+- What's the blast radius of a change?
+
+We want an AI-maintained index that answers these questions via semantic search.
+
+## Design Space
+
+### What Gets Indexed
+
+For each file in a realm:
+
+```yaml
+file: src/realm/domain.rs
+last_indexed: 2026-01-24T10:30:00Z
+hash: abc123  # for change detection
+
+summary: "Domain definitions for cross-repo coordination. Defines Domain, Binding, ExportBinding, ImportBinding types."
+
+symbols:
+  - name: Domain
+    kind: struct
+    lines: [13, 73]
+    description: "A coordination context between repos with name, description, creation time, and member list"
+
+  - name: Binding
+    kind: struct
+    lines: [76, 143]
+    description: "Declares what a repo exports or imports in a domain"
+
+  - name: ImportStatus
+    kind: enum
+    lines: [259, 274]
+    description: "Status of an import binding: Pending, Current, Outdated, Broken"
+
+relationships:
+  - target: src/realm/service.rs
+    kind: used_by
+    description: "RealmService uses Domain and Binding to manage cross-repo state"
+
+  - target: src/realm/repo.rs
+    kind: used_by
+    description: "Repo operations load/save Domain and Binding files"
+```
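
Because the entry above is produced by a model, it is probably worth checking its shape before storing it. The sketch below assumes the YAML has already been parsed into a plain dict by some YAML library; `validate_entry` and its error messages are hypothetical, and the required fields are inferred from the example rather than from a formal spec.

```python
def validate_entry(entry: dict) -> list:
    """Return a list of problems found in a parsed index entry (empty if none)."""
    problems = []
    for key in ("file", "hash", "summary"):
        value = entry.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty field: {key}")

    for i, sym in enumerate(entry.get("symbols") or []):
        if not sym.get("name") or not sym.get("kind"):
            problems.append(f"symbols[{i}]: name and kind are required")
        lines = sym.get("lines")
        if (
            not isinstance(lines, list)
            or len(lines) != 2
            or not all(isinstance(n, int) for n in lines)
            or not 1 <= lines[0] <= lines[1]
        ):
            problems.append(f"symbols[{i}]: lines must be [start, end] with 1 <= start <= end")

    for i, rel in enumerate(entry.get("relationships") or []):
        if not rel.get("target") or not rel.get("kind"):
            problems.append(f"relationships[{i}]: target and kind are required")
    return problems
```

An entry that fails validation could be rejected or re-requested from the model instead of being written to the index.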
+
+### Storage Options
+
+| Option | Pros | Cons |
+|--------|------|------|
+| **SQLite + FTS5** | Already have blue.db, full-text search built-in | No semantic/vector search |
+| **SQLite + sqlite-vec** | Vector similarity search, keeps single DB | Requires extension, Rust bindings unclear |
+| **Separate JSON files** | Human-readable, git-tracked | Slow to search at scale |
+| **Embedded vector DB (lancedb)** | Purpose-built for semantic search | Another dependency |
+
+**Recommendation:** Start with SQLite + FTS5 for keyword search. Add embeddings later if needed.
+
+### Index Update Triggers
+
+1. **On-demand** - `blue index` command regenerates the index
+2. **Git hook** - Post-commit hook calls `blue index --changed`
+3. **File watcher** - Daemon watches for changes (the daemon infrastructure already exists)
+4. **MCP tool** - `blue_index_file` lets AI agents update the index during work
+
+A combination is the likely answer: daemon watching plus on-demand refresh.
+
+### Semantic Search Approaches
+
+**Phase 1: Keyword + Structure**
+- FTS5 for text search across summaries and descriptions
+- Filter by file path, symbol kind, relationship type
+- Good enough for "find files related to authentication"
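
A rough illustration of Phase 1 using Python's built-in `sqlite3` module, assuming the SQLite build has FTS5 enabled. The column names follow the proposed `file_search` table; the sample rows and the helper names `build_search_index` and `search` are made up for the example.

```python
import sqlite3


def build_search_index(rows):
    """Create an in-memory FTS5 table and load (path, summary, names, descriptions) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE file_search USING fts5("
        "file_path, summary, symbol_names, symbol_descriptions)"
    )
    conn.executemany("INSERT INTO file_search VALUES (?, ?, ?, ?)", rows)
    return conn


def search(conn, query, limit=10):
    """Return matching file paths, best match first (FTS5's built-in rank)."""
    cur = conn.execute(
        "SELECT file_path FROM file_search WHERE file_search MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in cur]


# Hypothetical sample data.
SAMPLE_ROWS = [
    ("src/auth/login.rs", "Handles user authentication and session tokens",
     "login verify_token", "Checks credentials and issues a session token"),
    ("src/realm/domain.rs", "Domain definitions for cross-repo coordination",
     "Domain Binding", "Declares what a repo exports or imports in a domain"),
]
```

Note that the query string is passed through FTS5's own query syntax, so free-form user input containing quotes or operators may need escaping before it reaches `MATCH`.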
+
+**Phase 2: Embeddings**
+- Generate embeddings for each symbol description
+- Store in sqlite-vec or similar
+- Query: "what handles S3 bucket permissions" → vector similarity
+
+### Relationship Detection
+
+AI needs to identify relationships. Approaches:
+
+1. **Static analysis** - Parse imports/uses (language-specific, complex)
+2. **AI inference** - "Given file A and file B, describe their relationship"
+3. **Explicit declarations** - Like current ExportBinding/ImportBinding
+4. **Hybrid** - AI suggests, human confirms
+
+**Recommendation:** AI inference with caching. When indexing file A, ask the model to describe A's relationships to the files it references.
+
+## Proposed Schema
+
+```sql
+-- File-level index
+CREATE TABLE file_index (
+    id INTEGER PRIMARY KEY,
+    realm TEXT NOT NULL,
+    repo TEXT NOT NULL,
+    file_path TEXT NOT NULL,
+    file_hash TEXT NOT NULL,
+    summary TEXT,
+    indexed_at DATETIME DEFAULT CURRENT_TIMESTAMP,
+    UNIQUE(realm, repo, file_path)
+);
+
+-- Symbol-level index
+CREATE TABLE symbol_index (
+    id INTEGER PRIMARY KEY,
+    file_id INTEGER REFERENCES file_index(id),
+    name TEXT NOT NULL,
+    kind TEXT NOT NULL,  -- struct, fn, enum, class, etc.
+    start_line INTEGER,
+    end_line INTEGER,
+    description TEXT
+);
+
+-- Relationships between files
+CREATE TABLE file_relationships (
+    id INTEGER PRIMARY KEY,
+    source_file_id INTEGER REFERENCES file_index(id),
+    target_file_id INTEGER REFERENCES file_index(id),
+    kind TEXT NOT NULL,  -- uses, used_by, imports, exports, tests
+    description TEXT
+);
+
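+-- FTS5 virtual table for search.
+-- Populated by the indexer: symbol_names and symbol_descriptions are
+-- aggregated from symbol_index, so file_index cannot serve as an
+-- external content table (it has no such columns).
+CREATE VIRTUAL TABLE file_search USING fts5(
+    file_path,
+    summary,
+    symbol_names,
+    symbol_descriptions
+);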
+```
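
To show how the relationship table could answer an impact query, here is a small sketch over a trimmed copy of the schema above. The query treats a file as impacted when the target declares it via `used_by`, or when it declares `uses`/`imports` on the target; that interpretation, the realm and repo values, and the helper names are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE file_index (
    id INTEGER PRIMARY KEY,
    realm TEXT NOT NULL,
    repo TEXT NOT NULL,
    file_path TEXT NOT NULL,
    file_hash TEXT NOT NULL,
    summary TEXT,
    UNIQUE(realm, repo, file_path)
);
CREATE TABLE file_relationships (
    id INTEGER PRIMARY KEY,
    source_file_id INTEGER REFERENCES file_index(id),
    target_file_id INTEGER REFERENCES file_index(id),
    kind TEXT NOT NULL,
    description TEXT
);
"""

IMPACT_SQL = """
SELECT t.file_path
  FROM file_relationships r
  JOIN file_index s ON s.id = r.source_file_id
  JOIN file_index t ON t.id = r.target_file_id
 WHERE s.file_path = :path AND r.kind = 'used_by'
UNION
SELECT s.file_path
  FROM file_relationships r
  JOIN file_index s ON s.id = r.source_file_id
  JOIN file_index t ON t.id = r.target_file_id
 WHERE t.file_path = :path AND r.kind IN ('uses', 'imports')
 ORDER BY 1
"""


def demo_db():
    """Build an in-memory database with three files and two relationships."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    files = ["src/realm/domain.rs", "src/realm/service.rs", "src/realm/repo.rs"]
    conn.executemany(
        "INSERT INTO file_index (realm, repo, file_path, file_hash) "
        "VALUES ('demo', 'blue', ?, 'abc123')",  # hypothetical realm/repo/hash
        [(f,) for f in files],
    )
    # domain.rs (id 1) is used_by service.rs (id 2); repo.rs (id 3) uses domain.rs.
    conn.executemany(
        "INSERT INTO file_relationships (source_file_id, target_file_id, kind) "
        "VALUES (?, ?, ?)",
        [(1, 2, "used_by"), (3, 1, "uses")],
    )
    return conn


def impacted_by(conn, file_path):
    """Return paths of files that depend on file_path, sorted alphabetically."""
    return [row[0] for row in conn.execute(IMPACT_SQL, {"path": file_path})]
```

Calling `impacted_by(demo_db(), "src/realm/domain.rs")` returns both `repo.rs` and `service.rs`, since each direction of declaration is covered by one branch of the `UNION`.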
+
+## Proposed MCP Tools
+
+| Tool | Purpose |
+|------|---------|
+| `blue_index_realm` | Index all files in a realm |
+| `blue_index_file` | Index a single file (for incremental updates) |
+| `blue_index_search` | Semantic search across the index |
+| `blue_index_impact` | Given a file, show what depends on it |
+| `blue_index_status` | Show indexing status and staleness |
+
+## Open Questions
+
+1. **Which AI model for indexing?** Local (Ollama) for cost, or API for quality?
+2. **How to handle large files?** Chunk by function/class? Summary only?
+3. **Cross-realm relationships?** Index within realm first, cross-realm later?
+4. **Embedding model?** If we go vector route, which embedding model?
+
+## Next Steps
+
+If this spike looks good:
+1. Create RFC for the full design
+2. Start with SQLite schema + FTS5
+3. Add `blue_index_file` tool that takes AI-generated index data
+4. Add daemon file watcher for auto-indexing
+
+---
+
+*Investigation notes by Blue*