# RFC 0010: Realm Semantic Index

| | |
|---|---|
| **Status** | In Progress |
| **Date** | 2026-01-24 |
| **Source Spike** | Realm Semantic Index |
| **Dialogue** | [realm-semantic-index.dialogue.md](../dialogues/realm-semantic-index.dialogue.md) |
| **Alignment** | 97% |

---

## Summary

An AI-maintained semantic index for files within a realm. Each file gets a summary and symbol-level descriptions with line references. Enables semantic search for impact analysis: "what depends on this file?" and "what's the blast radius of this change?"

## Problem

When working across repos in a realm:
- No quick way to know what a file does without reading it
- No way to find files related to a concept ("authentication", "S3 access")
- No impact analysis before making changes
- Existing search is keyword-only, misses semantic matches

## Proposal

### Index Structure

Each indexed file contains:

```yaml
file: src/realm/domain.rs
last_indexed: 2026-01-24T10:30:00Z
file_hash: abc123

summary: "Domain definitions for cross-repo coordination"

relationships: |
  Core types used by service.rs for realm state management.
  Loaded/saved by repo.rs for persistence.
  Referenced by daemon/client.rs for cross-repo messaging.

symbols:
  - name: Domain
    kind: struct
    lines: [13, 73]
    description: "Coordination context between repos with name, members, timestamps"

  - name: Binding
    kind: struct
    lines: [76, 143]
    description: "Declares repo exports and imports within a domain"

  - name: ImportStatus
    kind: enum
    lines: [259, 274]
    description: "Binding status: Pending, Current, Outdated, Broken"
```

### Storage: SQLite + FTS5

Use existing blue.db with full-text search:

```sql
-- File-level index
CREATE TABLE file_index (
    id INTEGER PRIMARY KEY,
    realm TEXT NOT NULL,
    repo TEXT NOT NULL,
    file_path TEXT NOT NULL,
    file_hash TEXT NOT NULL,
    summary TEXT,
    relationships TEXT,               -- AI-generated relationship descriptions
    indexed_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    prompt_version INTEGER DEFAULT 1, -- Invalidate on prompt changes
    embedding BLOB,                   -- Optional, for future vector search
    UNIQUE(realm, repo, file_path)
);

-- Symbol-level index
CREATE TABLE symbol_index (
    id INTEGER PRIMARY KEY,
    file_id INTEGER REFERENCES file_index(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    kind TEXT NOT NULL,
    start_line INTEGER,
    end_line INTEGER,
    description TEXT
);

-- FTS5 for search
CREATE VIRTUAL TABLE file_search USING fts5(
    file_path,
    summary,
    relationships,
    content=file_index,
    content_rowid=id
);

CREATE VIRTUAL TABLE symbol_search USING fts5(
    name,
    description,
    content=symbol_index,
    content_rowid=id
);
```

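FTS5 external-content tables (those declared with `content=...`) are not updated automatically when the content table changes, so something has to mirror writes into `file_search`. The sketch below shows one way to do that with triggers, using Python's built-in `sqlite3` module and a trimmed copy of the schema. The trigger names and the `open_index`, `upsert_file`, and `search_files` helpers are hypothetical and not part of this RFC; Blue may handle synchronization differently.

```python
import sqlite3

# Trimmed copy of the file-level schema plus hypothetical sync triggers.
SCHEMA = """
CREATE TABLE file_index (
    id INTEGER PRIMARY KEY,
    realm TEXT NOT NULL,
    repo TEXT NOT NULL,
    file_path TEXT NOT NULL,
    file_hash TEXT NOT NULL,
    summary TEXT,
    relationships TEXT,
    UNIQUE(realm, repo, file_path)
);
CREATE VIRTUAL TABLE file_search USING fts5(
    file_path, summary, relationships,
    content=file_index, content_rowid=id
);
-- Assumption: triggers mirror every write into the FTS index.
CREATE TRIGGER file_index_ai AFTER INSERT ON file_index BEGIN
    INSERT INTO file_search(rowid, file_path, summary, relationships)
    VALUES (new.id, new.file_path, new.summary, new.relationships);
END;
CREATE TRIGGER file_index_ad AFTER DELETE ON file_index BEGIN
    INSERT INTO file_search(file_search, rowid, file_path, summary, relationships)
    VALUES ('delete', old.id, old.file_path, old.summary, old.relationships);
END;
CREATE TRIGGER file_index_au AFTER UPDATE ON file_index BEGIN
    INSERT INTO file_search(file_search, rowid, file_path, summary, relationships)
    VALUES ('delete', old.id, old.file_path, old.summary, old.relationships);
    INSERT INTO file_search(rowid, file_path, summary, relationships)
    VALUES (new.id, new.file_path, new.summary, new.relationships);
END;
"""


def open_index() -> sqlite3.Connection:
    """Create an in-memory database with the schema above (for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def upsert_file(conn, realm, repo, path, file_hash, summary, relationships):
    """Insert a file entry, or overwrite it if (realm, repo, path) already exists."""
    conn.execute(
        """INSERT INTO file_index
               (realm, repo, file_path, file_hash, summary, relationships)
           VALUES (?, ?, ?, ?, ?, ?)
           ON CONFLICT(realm, repo, file_path) DO UPDATE SET
               file_hash = excluded.file_hash,
               summary = excluded.summary,
               relationships = excluded.relationships""",
        (realm, repo, path, file_hash, summary, relationships),
    )


def search_files(conn, match_expr):
    """Return matching file paths, best match first (FTS5 `rank` ordering)."""
    rows = conn.execute(
        "SELECT file_path FROM file_search WHERE file_search MATCH ? ORDER BY rank",
        (match_expr,),
    )
    return [row[0] for row in rows]
```

Re-indexing a file goes through the same `upsert_file` call; the update trigger removes the old tokens before adding the new ones, so stale summaries stop matching.
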
### Update Triggers: Git-Driven

**Primary: Pre-commit hook on diff**

```bash
#!/bin/sh
# .git/hooks/pre-commit (installed by blue index --install-hook)
blue index --diff
```

The hook runs `blue index --diff`, which:
1. Gets staged files from `git diff --cached --name-only`
2. Indexes only those files
3. Ensures commits include fresh index entries

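The first step above can be sketched as follows. This is an illustration in Python rather than Blue's actual implementation; `parse_name_only` and `staged_files` are hypothetical names, and the `--diff-filter=ACMR` option is an added assumption that skips deleted files, since there is nothing left to index for them.

```python
import subprocess


def parse_name_only(output: str) -> list[str]:
    """Split `git diff --name-only` output into a list of paths."""
    return [line for line in output.splitlines() if line.strip()]


def staged_files(repo_dir: str = ".") -> list[str]:
    """Return paths staged for the next commit in `repo_dir`."""
    result = subprocess.run(
        # Assumption: only added, copied, modified, or renamed files are indexed.
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_name_only(result.stdout)
```

Keeping the parsing separate from the `git` call makes the list handling easy to test without a repository.
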
**Bootstrap: Full index from scratch**

```bash
# First time setup - index everything
blue index --all

# Or index specific directory
blue index --all src/
```

**On-demand: Single file or refresh**

```bash
# Re-index specific file
blue index --file src/domain.rs

# Refresh stale entries (re-index files where hash changed)
blue index --refresh
```

**MCP inline**: When invoked from Claude through MCP, files can be indexed during the conversation.

### Staleness Detection

```
blue index status

Index status:
  Total files: 147
  Indexed: 142 (96%)
  Stale: 3 (hash mismatch)
  Unindexed: 2 (new files)

Stale:
  - src/realm/domain.rs
  - src/realm/service.rs

Unindexed:
  - src/new_feature.rs
  - tests/new_test.rs
```

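The classification behind this report can be sketched as a comparison between each file's current content hash and the `file_hash` stored at index time. The RFC does not name a hash algorithm, so SHA-256 is an assumption here, and `content_hash` and `classify` are hypothetical helper names.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash file contents (assumption: SHA-256; the RFC does not specify)."""
    return hashlib.sha256(data).hexdigest()


def classify(files: dict[str, bytes], indexed: dict[str, str]) -> dict[str, list[str]]:
    """Sort paths into fresh, stale (hash mismatch), and unindexed (no entry).

    `files` maps path -> current contents; `indexed` maps path -> stored hash.
    """
    status = {"fresh": [], "stale": [], "unindexed": []}
    for path in sorted(files):
        stored = indexed.get(path)
        if stored is None:
            status["unindexed"].append(path)
        elif stored != content_hash(files[path]):
            status["stale"].append(path)
        else:
            status["fresh"].append(path)
    return status
```

`blue index --refresh` would then only need to re-index the paths in the `stale` bucket.
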
### Relationships: AI-Generated at Index Time

When indexing a file, AI generates a concise `relationships` description alongside the summary:

```yaml
file: src/realm/service.rs
summary: "RealmService coordinates cross-repo state and notifications"

relationships: |
  Uses Domain and Binding from domain.rs for state representation.
  Calls RepoConfig from config.rs for realm settings.
  Provides notifications consumed by daemon/server.rs.
  Tested by tests/realm_service_test.rs.

symbols:
  - name: RealmService
    kind: struct
    lines: [15, 89]
    description: "Main service coordinating realm operations"
```

The `relationships` field is a natural language description, searchable via FTS5:

```
Query: "what uses Domain"
→ Matches service.rs: "Uses Domain and Binding from domain.rs..."

Query: "what provides notifications"
→ Matches service.rs: "Provides notifications consumed by..."
```

AI does the relationship analysis once, during indexing. Search is then plain text matching over the stored descriptions, which keeps it fast and deterministic.

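One detail worth noting for queries like these: FTS5 treats a bare multi-word query as an implicit AND of every term, so the literal string `what uses Domain` would only match rows that also contain the word `what`. A minimal sketch of one way around this is to drop question words and OR the remaining terms together. The stop-word list and the `to_match_expr` and `search_relationships` helpers are assumptions for illustration, not Blue's actual query handling.

```python
import re
import sqlite3

# Hypothetical stop-word list; a real implementation would likely tune this.
STOPWORDS = {
    "what", "which", "who", "the", "a", "an", "is", "are",
    "this", "that", "of", "on", "in", "to", "by",
}


def to_match_expr(question: str) -> str:
    """Turn a natural-language question into an FTS5 MATCH expression.

    Each remaining term is double-quoted so it is treated as a plain string,
    and terms are joined with OR so that any one of them can match.
    """
    terms = [
        term
        for term in re.findall(r"[A-Za-z0-9_]+", question.lower())
        if term not in STOPWORDS
    ]
    return " OR ".join(f'"{term}"' for term in terms)


def search_relationships(conn: sqlite3.Connection, question: str) -> list[str]:
    """Return file paths whose indexed text matches the question, best first."""
    expr = to_match_expr(question)
    if not expr:
        return []  # nothing searchable left; avoid an empty MATCH expression
    rows = conn.execute(
        "SELECT file_path FROM file_search WHERE file_search MATCH ? ORDER BY rank",
        (expr,),
    )
    return [row[0] for row in rows]
```

With this, `to_match_expr("what uses Domain")` produces `"uses" OR "domain"`, which matches the `service.rs` entry shown above. Using OR trades precision for recall; ordering by `rank` puts rows that match more terms first.
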
### AI Model: Qwen2.5:3b via Ollama

**Recommended**: `qwen2.5:3b`, which balances speed and quality for code indexing.

| Model | Speed (M2) | Quality | Verdict |
|-------|------------|---------|---------|
| qwen2.5:1.5b | ~150 tok/s | Basic | Too shallow for code analysis |
| **qwen2.5:3b** | ~100 tok/s | Very Good | **Sweet spot**: fast, accurate |
| qwen2.5:7b | ~50 tok/s | Excellent | Too slow for batch indexing |

With the 3b model at ~100 tok/s, a 500-token file indexes in ~5 seconds, so a 5-file commit takes ~25 seconds, which is acceptable for a pre-commit hook.

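The timing estimate is simple throughput arithmetic, sketched below. It assumes the quoted tok/s figure covers all of the work per file and ignores model load time and prompt overhead, so real numbers will be somewhat higher. `estimate_seconds` is a hypothetical helper.

```python
def estimate_seconds(tokens_per_file: int, files: int, tokens_per_second: float) -> float:
    """Rough indexing time: total tokens divided by model throughput."""
    return tokens_per_file * files / tokens_per_second


# Figures from the table above.
one_file_3b = estimate_seconds(500, 1, 100)   # 5.0 seconds
commit_3b = estimate_seconds(500, 5, 100)     # 25.0 seconds
commit_7b = estimate_seconds(500, 5, 50)      # 50.0 seconds
```

The same 5-file commit on `qwen2.5:7b` would take roughly twice as long, which is why the larger model is left as an explicit override rather than the default.
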
```
Model priority:
1. Ollama qwen2.5:3b (default) - fast, local, private
2. --model flag - explicit override (e.g., qwen2.5:7b for quality)
3. Inline Claude - when called from MCP, use active model
```

Privacy: code stays local by default. Sending code to a remote API requires explicit opt-in.

### Large File Handling

- Files of 1000 lines or fewer: index the whole file.
- Files over 1000 lines: summarize with the warning "Large file, partial index."

No chunking for MVP. Note the limitation, move on.

### Indexing Prompt

Versioned prompt for structured extraction:

```
Analyze this source file and provide:
1. A one-sentence summary of what this file does
2. A paragraph describing relationships to other files (imports, exports, dependencies)
3. A list of key symbols (functions, classes, structs, enums) with:
   - name
   - kind (function/class/struct/enum/const)
   - start and end line numbers
   - one-sentence description

Output as YAML.
```

Store `prompt_version` in `file_index`. When the prompt changes, every entry indexed with an older version is treated as stale.

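Finding entries invalidated by a prompt change can be a single query against `file_index`. The sketch below assumes the current prompt version is available as an integer constant; `CURRENT_PROMPT_VERSION` and `stale_by_prompt` are hypothetical names, and treating a NULL `prompt_version` as stale is an added assumption.

```python
import sqlite3

# Hypothetical: bumped by hand whenever the indexing prompt text changes.
CURRENT_PROMPT_VERSION = 2


def stale_by_prompt(conn: sqlite3.Connection, current_version: int) -> list[str]:
    """Return paths of entries indexed with an older (or unknown) prompt version."""
    rows = conn.execute(
        """SELECT file_path FROM file_index
           WHERE prompt_version IS NULL OR prompt_version < ?
           ORDER BY file_path""",
        (current_version,),
    )
    return [row[0] for row in rows]
```

These paths can be merged with the hash-mismatch list so that one refresh pass handles both kinds of staleness.
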
### CLI Commands

```bash
# Bootstrap: index everything from scratch
blue index --all

# Install git pre-commit hook
blue index --install-hook

# Index staged files (called by hook)
blue index --diff

# Index single file
blue index --file src/domain.rs

# Refresh stale entries
blue index --refresh

# Check index freshness
blue index status

# Search the index
blue search "S3 permissions"
blue search --symbols "validate"

# Impact analysis
blue impact src/domain.rs
```

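Because relationships are stored as free text, one plausible way to answer `blue impact src/domain.rs` is to look for other files whose `relationships` text mentions the target's file name. The sketch below illustrates that idea only; the RFC does not specify how `blue impact` is implemented, and the `impact` helper is hypothetical. It uses the FTS5 column filter syntax (`column : query`) to restrict matching to the `relationships` column.

```python
import posixpath
import sqlite3


def impact(conn: sqlite3.Connection, target_path: str) -> list[str]:
    """Return files whose relationships text mentions the target's file name."""
    name = posixpath.basename(target_path)  # e.g. "domain.rs"
    # Quote the name so FTS5 treats it as one phrase; double any embedded quotes.
    phrase = '"' + name.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT file_path FROM file_search "
        "WHERE file_search MATCH ? AND file_path != ? "
        "ORDER BY rank",
        ("relationships : " + phrase, target_path),
    )
    return [row[0] for row in rows]
```

Matching on the bare file name can produce false positives when two files share a name in different directories, so results are best read as candidates to review.
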
### MCP Tools

| Tool | Purpose |
|------|---------|
| `blue_index_realm` | Index all files in current realm |
| `blue_index_file` | Index a single file |
| `blue_index_status` | Show index freshness |
| `blue_index_search` | Search across indexed files |
| `blue_index_impact` | Show files depending on target |

## Non-Goals

- Cross-realm search (scope to single realm for MVP)
- Structured relationship graph (relationships are stored as free-text descriptions and matched at query time)
- Required embeddings (FTS5 is sufficient, embeddings are optional)
- Language-specific parsing (AI inference works across languages)

## Test Plan

- [ ] Schema created in blue.db on first index
- [ ] `blue index --all` indexes all files in realm, extracts symbols
- [ ] `blue index --diff` indexes only staged files
- [ ] `blue index --file` indexes single file, updates existing entry
- [ ] `blue index --install-hook` creates valid pre-commit hook
- [ ] `blue index --refresh` re-indexes stale entries only
- [ ] `blue index status` shows staleness accurately
- [ ] `blue search` returns relevant files ranked by match quality
- [ ] `blue impact` shows files with symbols referencing target
- [ ] Staleness detection works (file hash comparison)
- [ ] Prompt version tracked; old versions marked stale
- [ ] Qwen2.5:3b produces valid YAML output
- [ ] Large files (>1000 lines) get partial index warning
- [ ] Ollama integration works for local indexing
- [ ] `--model` flag allows override to different model
- [ ] MCP tools available and functional
- [ ] FTS5 search handles partial matches
- [ ] Pre-commit hook runs without blocking commit on failure
- [ ] Relationships field searchable via FTS5

## Implementation Plan

- [x] Add schema to blue.db (file_index, symbol_index, FTS5 tables)
- [x] Create versioned indexing prompt for structured YAML extraction
- [x] Implement Ollama integration with qwen2.5:3b default
- [x] Implement `blue index --all` for bootstrap
- [x] Implement `blue index --diff` for staged files
- [x] Implement `blue index --file` for single-file updates
- [x] Implement `blue index --install-hook` for git hook setup
- [x] Implement `blue index --refresh` for stale entry updates
- [x] Implement `blue index status` for freshness reporting
- [x] Add large file handling (>1000 lines warning)
- [x] Implement `blue search` with FTS5 backend
- [x] Implement `blue impact` for dependency queries
- [x] Add MCP tools (5 tools)
- [x] Add `--model` flag for model override
- [ ] Optional: embedding column support

## Open Questions (Resolved)

| Question | Resolution | Alignment |
|----------|------------|-----------|
| Storage backend | SQLite + FTS5, optional embedding column | 92% |
| Update triggers | Git pre-commit hook on diff, `--all` for bootstrap | 98% |
| Relationships | AI-generated descriptions stored at index time | 96% |
| AI model | Qwen2.5:3b via Ollama, `--model` for override | 94% |
| Granularity | Symbol-level with line numbers | 92% |
| Large files | Whole-file <1000 lines, warning for larger | 92% |
| Prompt design | Structured YAML, versioned | 96% |

---

*"Index the realm. Know the impact. Change with confidence."*

— Blue