diff --git a/.blue/blue.db b/.blue/blue.db index e516d36..4add833 100644 Binary files a/.blue/blue.db and b/.blue/blue.db differ diff --git a/.blue/docs/dialogues/realm-semantic-index.dialogue.md b/.blue/docs/dialogues/realm-semantic-index.dialogue.md new file mode 100644 index 0000000..d84f8a7 --- /dev/null +++ b/.blue/docs/dialogues/realm-semantic-index.dialogue.md @@ -0,0 +1,441 @@ +# Dialogue: Realm Semantic Index + +**Spike**: [2026-01-24-Realm Semantic Index](../spikes/2026-01-24-Realm%20Semantic%20Index.md) +**Goal**: Reach 96% alignment on semantic indexing design +**Format**: 12 experts, structured rounds + +--- + +## Open Questions + +1. **Storage backend** - SQLite+FTS5, sqlite-vec, or dedicated vector DB? +2. **Update triggers** - Daemon watcher, git hooks, on-demand, or hybrid? +3. **Relationship detection** - Static analysis, AI inference, or explicit declaration? +4. **AI model** - Local (Ollama) vs API for indexing? +5. **Index granularity** - File-level summaries vs symbol-level detail? + +--- + +## Expert Panel + +| Expert | Domain | Perspective | +|--------|--------|-------------| +| **Ada** | API Design | Clean interfaces, discoverability | +| **Ben** | Developer Experience | Friction, learning curve | +| **Carmen** | Systems Architecture | Scalability, performance | +| **David** | Search Systems | Retrieval quality, ranking | +| **Elena** | Claude Integration | LLM tool use patterns | +| **Felix** | Distributed Systems | Consistency, coordination | +| **Grace** | Security | Trust boundaries, data sensitivity | +| **Hassan** | Product | User workflows, value delivery | +| **Iris** | Simplicity | Minimalism, YAGNI | +| **James** | Observability | Debugging, transparency | +| **Kim** | Testing | Testability, reliability | +| **Luna** | AI/ML | Model selection, embedding quality | + +--- + +## Round 1: Initial Positions + +### Question 1: Storage Backend + +**David (Search)**: Vector search is the future, but sqlite-vec is immature. 
For semantic search over code descriptions, embeddings will outperform keyword matching. But we can layer - FTS5 now, vectors later. + +**Carmen (Systems)**: Keep the stack simple. We already have blue.db. Adding sqlite-vec means native extensions, cross-platform builds, Rust binding complexity. FTS5 is built-in and good enough for thousands of files. + +**Iris (Simplicity)**: SQLite+FTS5. Period. We're searching human-readable descriptions, not raw code. Keywords work. The whole point of AI-generated summaries is they use natural language. "What handles S3 permissions" will match "manages S3 bucket access policies." + +**Luna (AI/ML)**: Embeddings give fuzzy matching - "authentication" matches "login", "credentials", "session tokens". FTS5 won't. But generating embeddings adds latency and storage. Hybrid is ideal: FTS5 for exact, embeddings for semantic fallback. + +**Ben (DX)**: Whatever doesn't require extra setup. Developers won't install special extensions just to use indexing. FTS5 ships with SQLite. + +**Kim (Testing)**: FTS5 is deterministic and easy to test. Vector similarity has floating-point fuzziness. Start with testable. + +**Alignment**: 88% toward SQLite+FTS5 for MVP, design for embedding extension later. + +### Question 2: Update Triggers + +**Felix (Distributed)**: Git hooks are fragile - users disable them, forget to install them, CI doesn't run them. Daemon watcher is reliable but adds always-on process. Best: on-demand with staleness detection. + +**Carmen (Systems)**: We have daemon infrastructure already. File watcher is cheap. Index on file save, async in background. User never waits. + +**James (Observability)**: Whatever we choose, need clear visibility into "is my index fresh?" `blue index status` should show staleness per file. + +**Ben (DX)**: On-demand is safest. I run `blue index` when I need to search. No magic, no surprises. If daemon is available, it can pre-warm, but never required. 
+ +**Hassan (Product)**: The killer use case is impact analysis before a change. User is about to modify a file, asks "what depends on this?" Index needs to be fresh. Daemon watching makes this instant. + +**Iris (Simplicity)**: On-demand only. Daemon watching is scope creep. User changes file, runs `blue index --file foo.rs`, searches. Simple mental model. + +**Grace (Security)**: Daemon watching means reading all files continuously. For repos with sensitive code, that's a concern. On-demand gives user control. + +**Alignment**: 75% - split between daemon-assisted and pure on-demand. Need to reconcile. + +### Question 3: Relationship Detection + +**Luna (AI/ML)**: AI inference is the only practical option for polyglot codebases. Static analysis means parsers for every language. AI can read Python, Rust, TypeScript and understand "this file imports that one." + +**Ada (API Design)**: Explicit declaration is most reliable. Like the existing Binding/ExportBinding structure. Developer says "this file provides X, that file consumes X." AI can suggest, human confirms. + +**Elena (Claude Integration)**: AI should suggest relationships during indexing. "I see this file imports domain.rs, they have a uses relationship." Store as tentative until confirmed. Over time, learn which suggestions are right. + +**Kim (Testing)**: AI-inferred relationships are non-deterministic. Same file might get different relationships on re-index. Hard to test, hard to trust. + +**Iris (Simplicity)**: Skip relationships for MVP. File summaries and symbol descriptions are enough. Relationships add complexity. Search "Domain struct" and you'll find both the definition and usages. + +**Felix (Distributed)**: Relationships are critical for impact analysis. That's the whole point. But Kim is right about determinism. Solution: cache AI suggestions, only re-analyze on significant change. + +**David (Search)**: Relationships improve search ranking. 
"Files related to X" is a better query than "files mentioning X". But explicit > inferred for reliability. + +**Alignment**: 70% - tension between AI inference and explicit declaration. Need synthesis. + +### Question 4: AI Model for Indexing + +**Luna (AI/ML)**: Local models (Ollama) for privacy and cost. Indexing happens frequently; API costs add up. Quality difference is narrowing. Llama 3.2 or Qwen 2.5 can summarize code well. + +**Carmen (Systems)**: Local means requiring Ollama installed and running. Not everyone has that. Need graceful degradation - use API if local unavailable. + +**Ben (DX)**: Make it configurable. Some teams have API keys, some run local. Default to local if Ollama detected, fall back to "index not available." + +**Grace (Security)**: Local keeps code on-device. Important for proprietary codebases. API means sending code snippets to third party. Local should be default. + +**Hassan (Product)**: API gives consistent quality. Local varies by hardware. But the privacy story matters. Local-first, API opt-in. + +**Iris (Simplicity)**: Require Ollama for now. We already integrated it for `blue agent`. Don't add API complexity. If someone wants API, they can run Ollama with API backend. + +**Elena (Claude Integration)**: For Claude Code integration, the AI doing the work IS the API. When user asks to index, Claude can do it inline. No separate model needed. + +**Alignment**: 82% toward local-first (Ollama), with inline-Claude option for MCP context. + +### Question 5: Index Granularity + +**David (Search)**: Symbol-level is necessary for useful search. "Find the function that validates S3 paths" needs to match `validate_s3_path` at line 47, not just "this file does S3 stuff." + +**Iris (Simplicity)**: File-level summaries first. Symbol extraction is expensive and language-specific. A good file summary mentions key functions: "Defines Domain struct (line 13) and Binding struct (line 76)." 
+ +**Carmen (Systems)**: Symbol-level means more rows, more storage, more indexing time. For a 10,000 file realm, that's 50,000+ symbol entries. Worth it? + +**Luna (AI/ML)**: AI can extract symbols naturally. "List the main components in this file with line numbers." One prompt, structured output. Not that expensive. + +**Ada (API Design)**: Symbol-level enables richer queries: "Find all functions that return Result" vs just "files about domains." Worth the complexity. + +**Ben (DX)**: Impact analysis needs symbol-level. "What calls this function?" requires knowing what functions exist and where. File-level is just better grep. + +**Kim (Testing)**: Symbol extraction can be validated - run on known files, check expected symbols appear. More testable than pure summaries. + +**Hassan (Product)**: Users think in symbols: functions, classes, types. Not files. Index what users think about. + +**Alignment**: 85% toward symbol-level indexing with structured extraction. + +--- + +## Round 2: Convergence + +### Reconciling Question 2: Update Triggers + +**Felix**: Proposal: *tiered freshness*. On-demand is always available. Daemon watching is enhancement. MCP tools report staleness. + +``` +Index freshness: +- 3 files stale (modified since last index) +- Last full index: 2 hours ago +``` + +User can ignore staleness for quick searches, or run `blue index` when precision matters. + +**Carmen**: I can accept that. Daemon is optional optimization. Core functionality works without it. + +**Iris**: If daemon is optional and clearly optional, I'm in. No invisible magic. + +**Ben**: Add `--watch` flag to explicitly start watching. Default is on-demand. + +**James**: Staleness in every search result. "This file was indexed 3 hours ago, file has changed since." User knows to re-index if needed. + +**Hassan**: This works for the "impact before change" story. User sees staleness, re-indexes the files they care about, gets fresh results. 
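The staleness report Felix and James describe comes down to comparing what the index recorded with what is on disk now. A minimal Python sketch, assuming the index keeps a SHA-256 content hash per path (the dialogue itself only says "modified since last index"); `content_hash` and `classify_freshness` are hypothetical names, not part of `blue`:

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex SHA-256 of file contents; the choice of hash is an assumption."""
    return hashlib.sha256(data).hexdigest()


def classify_freshness(indexed: dict[str, str], current: dict[str, bytes]) -> dict[str, list[str]]:
    """Split the current files into fresh, stale, and unindexed paths."""
    report: dict[str, list[str]] = {"fresh": [], "stale": [], "unindexed": []}
    for path in sorted(current):
        digest = content_hash(current[path])
        if path not in indexed:
            report["unindexed"].append(path)
        elif indexed[path] != digest:
            report["stale"].append(path)
        else:
            report["fresh"].append(path)
    return report


# Toy data standing in for the index table and the working tree.
indexed = {
    "src/a.rs": content_hash(b"fn a() {}"),
    "src/b.rs": content_hash(b"fn b() {}"),
}
current = {
    "src/a.rs": b"fn a() {}",
    "src/b.rs": b"fn b() { /* edited */ }",
    "src/c.rs": b"fn c() {}",
}
report = classify_freshness(indexed, current)
print(report)
```

Because the check only reads hashes, it is cheap enough to run before every search, which is what makes "staleness in every search result" plausible without a daemon.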
+ +**Alignment**: 92% toward tiered freshness with optional daemon. + +### Reconciling Question 3: Relationships + +**Elena**: Synthesis: *AI-suggested, query-time materialized.* + +Don't store relationships persistently. When user asks "what depends on X?", AI analyzes X's symbols and searches for usages across the index. Results are relationships, computed on demand. + +**David**: This is how good code search works. You don't precompute all relationships - you find them at query time. The index gives you fast symbol lookup, the query gives you relationships. + +**Luna**: This avoids the determinism problem. Each query is a fresh analysis. If the index is fresh, relationships are fresh. + +**Kim**: Much easier to test. Query "depends on Domain" should return files containing "Domain" in their symbol usages. Deterministic given the index. + +**Iris**: I like this. No relationship storage, no relationship staleness. Index symbols well, derive relationships at query time. + +**Ada**: We could cache frequent queries. "Depends on auth.rs" gets cached until auth.rs changes. Optimization, not architecture. + +**Felix**: Cache is good. Query-time computation with LRU cache. Cache invalidates when any involved file changes. + +**Alignment**: 94% toward query-time relationship derivation with optional caching. + +--- + +## Round 3: Final Positions + +### Consolidated Design + +**Storage**: SQLite + FTS5, schema designed for future embedding column. + +**Update Triggers**: On-demand primary, optional daemon watching with `--watch`. Staleness always visible. + +**Relationships**: Query-time derivation, not stored. Optional caching for frequent queries. + +**AI Model**: Local (Ollama) primary, inline-Claude when called from MCP. Configurable. + +**Granularity**: Symbol-level with file summary. Structured extraction: name, kind, lines, description. 
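One consequence of the SQLite + FTS5 choice is worth checking early: FTS5 treats a multi-word query as an implicit AND, so a natural-language question matches only if every word appears in the row. A small Python sketch using the standard `sqlite3` module, assuming the SQLite build has FTS5 enabled; the two-column table, the file paths, and the `to_or_query` helper are illustrative and not the project's schema:

```python
import re
import sqlite3


def to_or_query(question: str) -> str:
    """Rewrite free text as an FTS5 query that matches ANY of its words."""
    words = re.findall(r"[A-Za-z0-9]+", question)
    return " OR ".join(f'"{w}"' for w in words)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE file_search USING fts5(file_path, summary)")
conn.executemany(
    "INSERT INTO file_search (file_path, summary) VALUES (?, ?)",
    [
        # Hypothetical path; the summary text is Iris's example from Round 1.
        ("src/storage/policy.rs", "Manages S3 bucket access policies"),
        ("src/realm/domain.rs", "Domain definitions for cross-repo coordination"),
    ],
)


def search(query: str) -> list[str]:
    """Return matching file paths, best match first (FTS5 bm25 rank)."""
    rows = conn.execute(
        "SELECT file_path FROM file_search WHERE file_search MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


question = "What handles S3 permissions"
strict = search(question)               # implicit AND: every word must appear
loose = search(to_or_query(question))   # OR: any shared word is enough
print(strict, loose)
```

Under these assumptions the raw question finds nothing, while the OR form finds the S3 file through the single shared token. Some query preprocessing of this kind (or the optional embeddings) is probably needed before questions phrased in plain English behave the way Round 1 expects.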
+ +### Final Alignment Scores + +| Question | Alignment | +|----------|-----------| +| Storage backend | 88% | +| Update triggers | 92% | +| Relationship detection | 94% | +| AI model | 82% | +| Index granularity | 85% | +| **Overall** | **88%** | + +### Remaining Dissent + +**Luna (8%)**: Embeddings should be MVP, not "future." Keyword search will disappoint users expecting semantic matching. + +**Iris (4%)**: Symbol-level is over-engineering. Start with file summaries, add symbols when proven needed. + +**Grace (5%)**: Local-only is too restrictive. Some teams can't run Ollama. Need API option from day one. + +--- + +## Round 4: Closing the Gap + +### Addressing Luna's Concern + +**David**: Counter-proposal: support *optional* embedding column from day one. If user has embedding model configured, populate it. Search uses embeddings when available, falls back to FTS5. + +**Luna**: That works. Embeddings are enhancement, not requirement. Users who care can enable them. + +**Carmen**: Minimal code change - add nullable `embedding BLOB` column to schema. Search checks if populated. + +**Alignment**: Luna satisfied. +4% → 92% + +### Addressing Iris's Concern + +**Ben**: What if symbol extraction is *optional*? Default indexer produces file summary only. `--symbols` flag enables deep extraction. + +**Iris**: I can accept that. Users who want symbols opt in. Default is simple. + +**Hassan**: Disagree. Symbols are the product. We shouldn't hide the value behind a flag. + +**Kim**: Compromise: extract symbols by default, but don't fail if extraction fails. Some files might only get summaries. + +**Iris**: Fine. Best-effort symbols, graceful degradation to summary-only. + +**Alignment**: Iris satisfied. +2% → 94% + +### Addressing Grace's Concern + +**Elena**: We already support `blue agent --model provider/model`. Same pattern for indexing. Default to Ollama, `--model anthropic/claude-3-haiku` works too. + +**Grace**: That's acceptable. 
Local is default, API is opt-in with explicit flag. + +**Ben**: Document the privacy implications clearly. "By default, code stays local. API option sends code to provider." + +**Alignment**: Grace satisfied. +3% → 97% + +--- + +## Final Alignment: 97% + +### Consensus Design + +1. **Storage**: SQLite + FTS5, optional embedding column for future/power users +2. **Updates**: On-demand primary, optional `--watch` daemon, staleness always shown +3. **Relationships**: Query-time derivation from symbol index, optional LRU cache +4. **AI Model**: Ollama default, API opt-in with `--model`, inline-Claude in MCP context +5. **Granularity**: Symbol-level by default, graceful fallback to file summary + +### Remaining 3% Dissent + +**Iris**: Still think we're building too much. But I'll trust the process. + +--- + +## Round 5: Design Refinements + +New questions surfaced during RFC drafting: + +1. **Update triggers revised** - Git pre-commit hook instead of daemon? +2. **Relationships revised** - Store AI descriptions at index time instead of query-time derivation? +3. **Model sizing** - Which Qwen model balances speed and quality for indexing? + +### Question 6: Git Pre-Commit Hook + +**Felix (Distributed)**: I take back my earlier concern about hooks. Pre-commit is reliable because it's tied to an action the developer already does. Post-save watchers are invisible; pre-commit is explicit. + +**Ben (DX)**: `blue index --install-hook` is one command. Developer opts in consciously. Hook runs on staged files only — fast, focused. + +**Carmen (Systems)**: Hook calls `blue index --diff`, indexes only changed files. No daemon process. No file watcher. Clean. + +**James (Observability)**: Hook should be non-blocking. If indexing fails, warn but don't abort commit. Developers will disable blocking hooks. + +**Iris (Simplicity)**: Much better than daemon. Git is already the source of truth for changes. Hook respects that. I'm fully on board. 
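James's non-blocking requirement can be sketched independently of the real CLI. In the Python below, `run_hook` and `fake_index` are hypothetical stand-ins: a real hook would obtain the staged paths from `git diff --cached --name-only` and call the actual indexer. The point shown is only that indexing errors become warnings and the exit code stays 0:

```python
import sys
from typing import Callable, Iterable


def run_hook(staged: Iterable[str], index_file: Callable[[str], None]) -> int:
    """Index each staged path; report failures but always return exit code 0."""
    for path in staged:
        try:
            index_file(path)
        except Exception as exc:  # any indexing error is downgraded to a warning
            print(f"warning: could not index {path}: {exc}", file=sys.stderr)
    return 0


# Fake indexer standing in for the real work; it fails on one file type.
indexed: list[str] = []


def fake_index(path: str) -> None:
    if path.endswith(".bin"):
        raise ValueError("unsupported file type")
    indexed.append(path)


exit_code = run_hook(["src/domain.rs", "assets/logo.bin", "src/service.rs"], fake_index)
print(exit_code, indexed)
```

Continuing past a failed file, instead of stopping at the first error, keeps the rest of the commit's files fresh in the index.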
+ +**Kim (Testing)**: Easy to test: stage files, run hook, verify index updated. Deterministic. + +**Hassan (Product)**: Need `blue index --all` for bootstrap. First clone, run once, then hooks maintain it. + +**Alignment**: 98% toward git pre-commit hook with `--all` for bootstrap. + +### Question 7: Stored Relationships + +**Luna (AI/ML)**: Storing relationships at index time is better for search quality. Query-time derivation means another AI call per search. Slow. Stored descriptions are instant FTS5 matches. + +**David (Search)**: Agree. The AI writes natural language: "Uses Domain from domain.rs for state management." That's searchable. "What uses Domain" hits it directly. + +**Kim (Testing)**: Stored is more deterministic. Same index = same search results. Query-time AI adds variability. + +**Elena (Claude Integration)**: One AI call per file at index time, zero at search time. Much better UX. Search feels instant. + +**Iris (Simplicity)**: I was wrong earlier. Stored relationships are simpler operationally. No AI inference during search. Just text matching. + +**Carmen (Systems)**: Relationships field is just another TEXT column in file_index. FTS5 includes it. Minimal schema change. + +**Felix (Distributed)**: When file A changes, we re-index A. A's relationships update. Files depending on A don't need re-indexing — their descriptions still say "uses A". Search still works. + +**Alignment**: 96% toward AI-generated relationships stored at index time. + +### Question 8: Qwen Model Size for Indexing + +**Luna (AI/ML)**: The task is structured extraction: summary, relationships, symbols with line numbers. Not creative writing. Smaller models excel at structured tasks. 
+ +Let me break down the options: + +| Model | Size | Speed (tok/s on M2) | Quality | Use Case | +|-------|------|---------------------|---------|----------| +| Qwen2.5:0.5b | 0.5B | ~200 | Basic | Too small for code understanding | +| Qwen2.5:1.5b | 1.5B | ~150 | Good | Fast, handles simple files | +| Qwen2.5:3b | 3B | ~100 | Very Good | Sweet spot for code analysis | +| Qwen2.5:7b | 7B | ~50 | Excellent | Overkill for structured extraction | +| Qwen2.5:14b | 14B | ~25 | Excellent | Way too slow for batch indexing | + +**Carmen (Systems)**: For batch indexing hundreds of files, speed matters. 3B at 100 tok/s means a 500-token file takes 5 seconds. 7B doubles that. + +**Ben (DX)**: Pre-commit hook needs to be fast. Developer commits 5 files, waits... how long? At 3B, maybe 25 seconds total. At 7B, 50 seconds. 3B is the limit. + +**David (Search)**: Quality requirements: can it identify the main symbols? Can it describe relationships accurately? 3B Qwen2.5 handles this well. I've tested it on code summarization. + +**Elena (Claude Integration)**: Qwen2.5:3b is a general-purpose model that handles code well. The qwen2.5-coder variants are the ones tuned specifically for code, and they come in the same size. For structured extraction with a good prompt, 3B is sufficient. + +**Grace (Security)**: Smaller model = smaller attack surface, less memory, faster. Security likes smaller when quality is adequate. + +**Iris (Simplicity)**: 3B. It's the middle path. Not too slow, not too dumb. + +**Hassan (Product)**: What about variable sizing? Use 3B for most files, 7B for complex/critical files? + +**Luna (AI/ML)**: Complexity detection adds overhead. Start with 3B uniform. If users report quality issues on specific file types, add heuristics later. + +**James (Observability)**: Log model performance per file. We'll see patterns: "Rust files take 2x longer" or "3B struggles with files over 1000 lines." + +**Kim (Testing)**: 3B is testable. Run on known files, verify expected symbols extracted. If tests pass, quality is sufficient. 
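The latency figures quoted by Carmen and Ben follow from one division. A short Python sketch of that arithmetic, using the approximate speeds from the table above and assuming the time is dominated by about 500 tokens of work per file (model load and prompt processing are ignored; `hook_seconds` is a hypothetical helper):

```python
def hook_seconds(files: int, tokens_per_file: int, tokens_per_second: float) -> float:
    """Rough wall-clock estimate: total tokens divided by throughput."""
    return files * tokens_per_file / tokens_per_second


# Approximate tok/s on an M2, taken from the table above.
speeds = {"qwen2.5:3b": 100, "qwen2.5:7b": 50}
estimates = {model: hook_seconds(5, 500, tps) for model, tps in speeds.items()}
print(estimates)
```

For a 5-file commit this gives 25 seconds at 3B and 50 seconds at 7B, matching the numbers in the discussion. Real timings will differ with hardware and file length.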
+ +**Alignment check on model size:** + +| Model | Votes | Alignment | +|-------|-------|-----------| +| Qwen2.5:1.5b | 1 (Iris fallback) | 8% | +| Qwen2.5:3b | 10 | 84% | +| Qwen2.5:7b | 1 (Luna for quality) | 8% | + +**Luna**: I'll concede to 3B for MVP. Add `--model` flag for users who want 7B quality and have patience. + +**Alignment**: 94% toward Qwen2.5:3b default, configurable via `--model`. + +--- + +## Round 6: Final Refinements + +### Handling Large Files + +**Carmen (Systems)**: What about files over 1000 lines? 3B context is 32K tokens, but very long files might need chunking. + +**Luna (AI/ML)**: Chunk by logical units: functions, classes. Index each chunk. Reassemble into single file entry. + +**Iris (Simplicity)**: Or just truncate. Index the first 500 lines. Most important code is at the top. Pragmatic. + +**David (Search)**: Truncation loses symbols at the bottom. Chunking is better. But adds complexity. + +**Elena (Claude Integration)**: Proposal: for files under 1000 lines (95% of files), index whole file. For larger files, summarize with explicit note: "Large file, partial index." + +**Ben (DX)**: I like Elena's approach. Don't over-engineer for edge cases. Note the limitation, move on. + +**Alignment**: 92% toward whole-file indexing with "large file" warning for 1000+ lines. + +### Prompt Engineering + +**Luna (AI/ML)**: The indexing prompt is critical. Needs to be: +- Structured output (YAML or JSON) +- Explicit about line numbers +- Focused on relationships + +``` +Analyze this source file and provide: +1. A one-sentence summary of what this file does +2. A paragraph describing relationships to other files (imports, exports, dependencies) +3. A list of key symbols (functions, classes, structs, enums) with: + - name + - kind (function/class/struct/enum/const) + - start and end line numbers + - one-sentence description + +Output as YAML. +``` + +**Kim (Testing)**: Prompt should be versioned. If we change the prompt, re-index everything. 
+ +**Ada (API Design)**: Store prompt version in file_index. `prompt_version INTEGER`. When prompt changes, all entries are stale. + +**Alignment**: 96% toward structured prompt with versioning. + +--- + +## Final Alignment: 96% + +### Updated Consensus Design + +1. **Storage**: SQLite + FTS5, optional embedding column +2. **Updates**: Git pre-commit hook (`--diff`), bootstrap with `--all` +3. **Relationships**: AI-generated descriptions stored at index time +4. **AI Model**: Qwen2.5:3b default (Ollama), configurable via `--model` +5. **Granularity**: Symbol-level with line numbers, whole-file for <1000 lines +6. **Prompt**: Structured YAML output, versioned + +### Final Alignment Scores + +| Question | Alignment | +|----------|-----------| +| Storage backend | 92% | +| Update triggers (git hook) | 98% | +| Relationships (stored) | 96% | +| AI model (Qwen2.5:3b) | 94% | +| Index granularity | 92% | +| Large file handling | 92% | +| Prompt design | 96% | +| **Overall** | **96%** | + +### Remaining 4% Dissent + +**Luna (2%)**: Would prefer 7B for quality, but accepts 3B with `--model` escape hatch. + +**Hassan (2%)**: Wants adaptive model selection, but accepts uniform 3B for MVP. + +--- + +*"Twelve voices, refined twice. That's how you ship."* + +— Blue diff --git a/.blue/docs/rfcs/0010-realm-semantic-index.md b/.blue/docs/rfcs/0010-realm-semantic-index.md new file mode 100644 index 0000000..eb0b5e9 --- /dev/null +++ b/.blue/docs/rfcs/0010-realm-semantic-index.md @@ -0,0 +1,347 @@ +# RFC 0010: Realm Semantic Index + +| | | +|---|---| +| **Status** | Draft | +| **Date** | 2026-01-24 | +| **Source Spike** | Realm Semantic Index | +| **Dialogue** | [realm-semantic-index.dialogue.md](../dialogues/realm-semantic-index.dialogue.md) | +| **Alignment** | 96% | + +--- + +## Summary + +An AI-maintained semantic index for files within a realm. Each file gets a summary and symbol-level descriptions with line references. 
Enables semantic search for impact analysis: "what depends on this file?" and "what's the blast radius of this change?" + +## Problem + +When working across repos in a realm: +- No quick way to know what a file does without reading it +- No way to find files related to a concept ("authentication", "S3 access") +- No impact analysis before making changes +- Existing search is keyword-only, misses semantic matches + +## Proposal + +### Index Structure + +Each indexed file contains: + +```yaml +file: src/realm/domain.rs +last_indexed: 2026-01-24T10:30:00Z +file_hash: abc123 + +summary: "Domain definitions for cross-repo coordination" + +relationships: | + Core types used by service.rs for realm state management. + Loaded/saved by repo.rs for persistence. + Referenced by daemon/client.rs for cross-repo messaging. + +symbols: + - name: Domain + kind: struct + lines: [13, 73] + description: "Coordination context between repos with name, members, timestamps" + + - name: Binding + kind: struct + lines: [76, 143] + description: "Declares repo exports and imports within a domain" + + - name: ImportStatus + kind: enum + lines: [259, 274] + description: "Binding status: Pending, Current, Outdated, Broken" +``` + +### Storage: SQLite + FTS5 + +Use existing blue.db with full-text search: + +```sql +-- File-level index +CREATE TABLE file_index ( + id INTEGER PRIMARY KEY, + realm TEXT NOT NULL, + repo TEXT NOT NULL, + file_path TEXT NOT NULL, + file_hash TEXT NOT NULL, + summary TEXT, + relationships TEXT, -- AI-generated relationship descriptions + indexed_at DATETIME DEFAULT CURRENT_TIMESTAMP, + prompt_version INTEGER DEFAULT 1, -- Invalidate on prompt changes + embedding BLOB, -- Optional, for future vector search + UNIQUE(realm, repo, file_path) +); + +-- Symbol-level index +CREATE TABLE symbol_index ( + id INTEGER PRIMARY KEY, + file_id INTEGER REFERENCES file_index(id) ON DELETE CASCADE, + name TEXT NOT NULL, + kind TEXT NOT NULL, + start_line INTEGER, + end_line INTEGER, + 
description TEXT +); + +-- FTS5 for search +CREATE VIRTUAL TABLE file_search USING fts5( + file_path, + summary, + relationships, + content=file_index, + content_rowid=id +); + +CREATE VIRTUAL TABLE symbol_search USING fts5( + name, + description, + content=symbol_index, + content_rowid=id +); +``` + +### Update Triggers: Git-Driven + +**Primary: Pre-commit hook on diff** + +```bash +#!/bin/sh +# .git/hooks/pre-commit (installed by blue index --install-hook); warns but never blocks the commit +blue index --diff || echo "warning: blue index failed; continuing with commit" >&2 +``` + +The hook runs `blue index --diff` which: +1. Gets staged files from `git diff --cached --name-only` +2. Indexes only those files +3. Commits include fresh index entries + +**Bootstrap: Full index from scratch** + +```bash +# First time setup - index everything +blue index --all + +# Or index specific directory +blue index --all src/ +``` + +**On-demand: Single file or refresh** + +```bash +# Re-index specific file +blue index --file src/domain.rs + +# Refresh stale entries (re-index files where hash changed) +blue index --refresh +``` + +**MCP inline**: When called from Claude, can index files during conversation. + +### Staleness Detection + +``` +blue index status + +Index status: + Total files: 147 + Indexed: 142 (96%) + Stale: 3 (hash mismatch) + Unindexed: 2 (new files) + + Stale: + - src/realm/domain.rs + - src/realm/service.rs + + Unindexed: + - src/new_feature.rs + - tests/new_test.rs +``` + +### Relationships: AI-Generated at Index Time + +When indexing a file, AI generates a concise `relationships` description alongside the summary: + +```yaml +file: src/realm/service.rs +summary: "RealmService coordinates cross-repo state and notifications" + +relationships: | + Uses Domain and Binding from domain.rs for state representation. + Calls RepoConfig from config.rs for realm settings. + Provides notifications consumed by daemon/server.rs. + Tested by tests/realm_service_test.rs. 
+ +symbols: + - name: RealmService + kind: struct + lines: [15, 89] + description: "Main service coordinating realm operations" +``` + +The `relationships` field is a natural language description — searchable via FTS5: + +``` +Query: "what uses Domain" +→ Matches service.rs: "Uses Domain and Binding from domain.rs..." + +Query: "what provides notifications" +→ Matches service.rs: "Provides notifications consumed by..." +``` + +AI does the relationship analysis once during indexing. Search is just text matching over stored descriptions. Fast and deterministic. + +### AI Model: Qwen2.5:3b via Ollama + +**Recommended**: `qwen2.5:3b` — optimal balance of speed and quality for code indexing. + +| Model | Speed (M2) | Quality | Verdict | +|-------|------------|---------|---------| +| qwen2.5:1.5b | ~150 tok/s | Basic | Too shallow for code analysis | +| **qwen2.5:3b** | ~100 tok/s | Very Good | **Sweet spot** — fast, accurate | +| qwen2.5:7b | ~50 tok/s | Excellent | Too slow for batch indexing | + +At 3b, a 500-token file indexes in ~5 seconds. A 5-file commit takes ~25 seconds — acceptable for pre-commit hook. + +``` +Model priority: +1. Ollama qwen2.5:3b (default) - fast, local, private +2. --model flag - explicit override (e.g., qwen2.5:7b for quality) +3. Inline Claude - when called from MCP, use active model +``` + +Privacy: code stays local by default. API requires explicit opt-in. + +### Large File Handling + +Files under 1000 lines: index whole file. +Files over 1000 lines: summarize with warning "Large file, partial index." + +No chunking for MVP. Note the limitation, move on. + +### Indexing Prompt + +Versioned prompt for structured extraction: + +``` +Analyze this source file and provide: +1. A one-sentence summary of what this file does +2. A paragraph describing relationships to other files (imports, exports, dependencies) +3. 
A list of key symbols (functions, classes, structs, enums) with: + - name + - kind (function/class/struct/enum/const) + - start and end line numbers + - one-sentence description + +Output as YAML. +``` + +Store `prompt_version` in file_index. When prompt changes, all entries are stale. + +### CLI Commands + +```bash +# Bootstrap: index everything from scratch +blue index --all + +# Install git pre-commit hook +blue index --install-hook + +# Index staged files (called by hook) +blue index --diff + +# Index single file +blue index --file src/domain.rs + +# Refresh stale entries +blue index --refresh + +# Check index freshness +blue index status + +# Search the index +blue search "S3 permissions" +blue search --symbols "validate" + +# Impact analysis +blue impact src/domain.rs +``` + +### MCP Tools + +| Tool | Purpose | +|------|---------| +| `blue_index_realm` | Index all files in current realm | +| `blue_index_file` | Index a single file | +| `blue_index_status` | Show index freshness | +| `blue_index_search` | Search across indexed files | +| `blue_index_impact` | Show files depending on target | + +## Non-Goals + +- Cross-realm search (scope to single realm for MVP) +- Query-time relationship inference (relationships are AI-generated and stored at index time) +- Required embeddings (FTS5 is sufficient, embeddings are optional) +- Language-specific parsing (AI inference works across languages) + +## Test Plan + +- [ ] Schema created in blue.db on first index +- [ ] `blue index --all` indexes all files in realm, extracts symbols +- [ ] `blue index --diff` indexes only staged files +- [ ] `blue index --file` indexes single file, updates existing entry +- [ ] `blue index --install-hook` creates valid pre-commit hook +- [ ] `blue index --refresh` re-indexes stale entries only +- [ ] `blue index status` shows staleness accurately +- [ ] `blue search` returns relevant files ranked by match quality +- [ ] `blue impact` shows files with symbols referencing target +- [ ] Staleness detection works (file hash 
comparison) +- [ ] Prompt version tracked; old versions marked stale +- [ ] Qwen2.5:3b produces valid YAML output +- [ ] Large files (>1000 lines) get partial index warning +- [ ] Ollama integration works for local indexing +- [ ] `--model` flag allows override to different model +- [ ] MCP tools available and functional +- [ ] FTS5 search handles partial matches +- [ ] Pre-commit hook runs without blocking commit on failure +- [ ] Relationships field searchable via FTS5 + +## Implementation Plan + +- [ ] Add schema to blue.db (file_index, symbol_index, FTS5 tables) +- [ ] Create versioned indexing prompt for structured YAML extraction +- [ ] Implement Ollama integration with qwen2.5:3b default +- [ ] Implement `blue index --all` for bootstrap +- [ ] Implement `blue index --diff` for staged files +- [ ] Implement `blue index --file` for single-file updates +- [ ] Implement `blue index --install-hook` for git hook setup +- [ ] Implement `blue index --refresh` for stale entry updates +- [ ] Implement `blue index status` for freshness reporting +- [ ] Add large file handling (>1000 lines warning) +- [ ] Implement `blue search` with FTS5 backend +- [ ] Implement `blue impact` for dependency queries +- [ ] Add MCP tools (5 tools) +- [ ] Add `--model` flag for model override +- [ ] Optional: embedding column support + +## Open Questions (Resolved) + +| Question | Resolution | Alignment | +|----------|------------|-----------| +| Storage backend | SQLite + FTS5, optional embedding column | 92% | +| Update triggers | Git pre-commit hook on diff, `--all` for bootstrap | 98% | +| Relationships | AI-generated descriptions stored at index time | 96% | +| AI model | Qwen2.5:3b via Ollama, `--model` for override | 94% | +| Granularity | Symbol-level with line numbers | 92% | +| Large files | Whole-file <1000 lines, warning for larger | 92% | +| Prompt design | Structured YAML, versioned | 96% | + +--- + +*"Index the realm. Know the impact. 
Change with confidence."* + +— Blue diff --git a/.blue/docs/spikes/2026-01-24-Realm Semantic Index.md b/.blue/docs/spikes/2026-01-24-Realm Semantic Index.md new file mode 100644 index 0000000..910a4b8 --- /dev/null +++ b/.blue/docs/spikes/2026-01-24-Realm Semantic Index.md @@ -0,0 +1,180 @@ +# Spike: Realm Semantic Index + +| | | +|---|---| +| **Status** | In Progress | +| **Date** | 2026-01-24 | +| **Time Box** | 4 hours | + +--- + +## Question + +How can we create an AI-maintained semantic index of files within a realm, tracking what each file does (with line references), its relationships to other files, and enabling semantic search for change impact analysis? + +--- + +## Context + +Realms coordinate across repos. Domains define relationships (provider/consumer, exports/imports). But when a file changes, there's no quick way to know: +- What does this file actually do? +- What other files depend on it? +- What's the blast radius of a change? + +We want an AI-maintained index that answers these questions via semantic search. + +## Design Space + +### What Gets Indexed + +For each file in a realm: + +```yaml +file: src/realm/domain.rs +last_indexed: 2026-01-24T10:30:00Z +hash: abc123 # for change detection + +summary: "Domain definitions for cross-repo coordination. Defines Domain, Binding, ExportBinding, ImportBinding types." 
+ +symbols: + - name: Domain + kind: struct + lines: [13, 73] + description: "A coordination context between repos with name, description, creation time, and member list" + + - name: Binding + kind: struct + lines: [76, 143] + description: "Declares what a repo exports or imports in a domain" + + - name: ImportStatus + kind: enum + lines: [259, 274] + description: "Status of an import binding: Pending, Current, Outdated, Broken" + +relationships: + - target: src/realm/service.rs + kind: used_by + description: "RealmService uses Domain and Binding to manage cross-repo state" + + - target: src/realm/repo.rs + kind: used_by + description: "Repo operations load/save Domain and Binding files" +``` + +### Storage Options + +| Option | Pros | Cons | +|--------|------|------| +| **SQLite + FTS5** | Already have blue.db, full-text search built-in | No semantic/vector search | +| **SQLite + sqlite-vec** | Vector similarity search, keeps single DB | Requires extension, Rust bindings unclear | +| **Separate JSON files** | Human-readable, git-tracked | Slow to search at scale | +| **Embedded vector DB (lancedb)** | Purpose-built for semantic search | Another dependency | + +**Recommendation:** Start with SQLite + FTS5 for keyword search. Add embeddings later if needed. + +### Index Update Triggers + +1. **On-demand** - `blue index` command regenerates +2. **Git hook** - Post-commit hook calls `blue index --changed` +3. **File watcher** - Daemon watches for changes (already have daemon infrastructure) +4. **MCP tool** - `blue_index_file` for AI agents to update during work + +Likely want combination: daemon watches + on-demand refresh. 
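Whichever trigger runs the indexer, an on-demand refresh needs a way to decide which files changed. A minimal sketch of hash-based change detection follows, in line with the `hash` field shown in the index entry above. The function names, the choice of SHA-256, and the in-memory `indexed` mapping are assumptions for illustration, not part of Blue.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_stale(root: Path, indexed: dict[str, str]) -> list[str]:
    """List files under root whose current hash differs from the indexed hash.

    `indexed` maps relative file paths to the hash stored at index time
    (a hypothetical stand-in for rows in the index). Files missing from
    `indexed` count as stale; files deleted from disk are not reported.
    """
    stale = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if indexed.get(rel) != file_hash(path):
            stale.append(rel)
    return stale
```

A daemon watcher and an on-demand command could share a check like this: the watcher re-indexes eagerly, and the command uses the hash comparison to catch anything the watcher missed.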
+ +### Semantic Search Approaches + +**Phase 1: Keyword + Structure** +- FTS5 for text search across summaries and descriptions +- Filter by file path, symbol kind, relationship type +- Good enough for "find files related to authentication" + +**Phase 2: Embeddings** +- Generate embeddings for each symbol description +- Store in sqlite-vec or similar +- Query: "what handles S3 bucket permissions" → vector similarity + +### Relationship Detection + +AI needs to identify relationships. Approaches: + +1. **Static analysis** - Parse imports/uses (language-specific, complex) +2. **AI inference** - "Given file A and file B, describe their relationship" +3. **Explicit declarations** - Like current ExportBinding/ImportBinding +4. **Hybrid** - AI suggests, human confirms + +**Recommendation:** AI inference with caching. When indexing file A, ask AI to describe relationships to files it references. + +## Proposed Schema + +```sql +-- File-level index +CREATE TABLE file_index ( + id INTEGER PRIMARY KEY, + realm TEXT NOT NULL, + repo TEXT NOT NULL, + file_path TEXT NOT NULL, + file_hash TEXT NOT NULL, + summary TEXT, + indexed_at DATETIME DEFAULT CURRENT_TIMESTAMP, + UNIQUE(realm, repo, file_path) +); + +-- Symbol-level index +CREATE TABLE symbol_index ( + id INTEGER PRIMARY KEY, + file_id INTEGER REFERENCES file_index(id), + name TEXT NOT NULL, + kind TEXT NOT NULL, -- struct, fn, enum, class, etc. 
+ start_line INTEGER, + end_line INTEGER, + description TEXT +); + +-- Relationships between files +CREATE TABLE file_relationships ( + id INTEGER PRIMARY KEY, + source_file_id INTEGER REFERENCES file_index(id), + target_file_id INTEGER REFERENCES file_index(id), + kind TEXT NOT NULL, -- uses, used_by, imports, exports, tests + description TEXT +); + +-- FTS5 virtual table for search. Standalone (no content= option): file_index has no +-- symbol_names or symbol_descriptions columns, so it cannot back this table as external content. +CREATE VIRTUAL TABLE file_search USING fts5( + file_path, + summary, + symbol_names, + symbol_descriptions +); +``` + +## Proposed MCP Tools + +| Tool | Purpose | +|------|---------| +| `blue_index_realm` | Index all files in a realm | +| `blue_index_file` | Index a single file (for incremental updates) | +| `blue_index_search` | Semantic search across the index | +| `blue_index_impact` | Given a file, show what depends on it | +| `blue_index_status` | Show indexing status and staleness | + +## Open Questions + +1. **Which AI model for indexing?** Local (Ollama) for cost, or API for quality? +2. **How to handle large files?** Chunk by function/class? Summary only? +3. **Cross-realm relationships?** Index within realm first, cross-realm later? +4. **Embedding model?** If we go the vector route, which embedding model? + +## Next Steps + +If this spike looks good: +1. Create RFC for the full design +2. Start with SQLite schema + FTS5 +3. Add `blue_index_file` tool that takes AI-generated index data +4. Add daemon file watcher for auto-indexing + +--- + +*Investigation notes by Blue*
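The Phase 1 keyword search over the proposed `file_search` table can be sketched as follows. This is an illustration under assumptions, not Blue's implementation: it uses a standalone FTS5 table in an in-memory database, the `search` helper is hypothetical, and the `src/storage/s3.rs` row is invented sample data. It also assumes the SQLite build bundled with Python has FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Standalone FTS5 table with the four columns from the proposed schema.
conn.execute(
    "CREATE VIRTUAL TABLE file_search USING fts5("
    "file_path, summary, symbol_names, symbol_descriptions)"
)
conn.executemany(
    "INSERT INTO file_search VALUES (?, ?, ?, ?)",
    [
        (
            "src/realm/domain.rs",
            "Domain definitions for cross-repo coordination",
            "Domain Binding ImportStatus",
            "Declares what a repo exports or imports in a domain",
        ),
        (
            "src/storage/s3.rs",  # hypothetical file, for illustration only
            "Manages S3 bucket access policies",
            "BucketPolicy",
            "Checks S3 permissions for a bucket",
        ),
    ],
)


def search(query: str) -> list[str]:
    """Return matching file paths, best match first (FTS5 built-in rank)."""
    cur = conn.execute(
        "SELECT file_path FROM file_search "
        "WHERE file_search MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in cur]


print(search("S3 permissions"))  # terms are ANDed across all columns
print(search("bind*"))           # trailing * makes a prefix query
```

Raw user input is passed straight to `MATCH` here, so queries containing FTS5 syntax characters (quotes, hyphens, colons) would need quoting or escaping in a real tool.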