Problem
Schemas evolve. You add a required field to a type, tighten a controlled vocabulary, or rename a field — and now hundreds of existing entries are technically invalid. `pyrite ci` will fail on entries that were fine yesterday. There's no migration story: no way to add defaults to existing entries, no way to distinguish "required for new entries" vs "required for all entries," and no way to track which schema version an entry was created under.
This matters especially for corporate teams and long-lived KBs where schema changes are inevitable.
Relationship to ODM Layer
Schema versioning is decoupled from the ODM layer and ships independently (pre-0.8). The migration pattern (Ming-style on-load migration) hooks into the existing `KBRepository` and `IndexManager` load/save paths — no `DocumentManager` or `SearchBackend` abstraction required.
The ODM layer (see odm-layer) ships post-launch (0.9+) as a backend abstraction refactor. When it lands, the schema versioning hooks move from `KBRepository` into `DocumentManager` — a straightforward relocation, not a redesign.
See ADR-0015 addendum for the rationale.
Solution
Schema Version Tracking
KB-level and per-type versioning in `kb.yaml`:
```yaml
name: my-kb
kb_type: journalism
schema_version: 3  # increments when any type changes

types:
  finding:
    version: 3
    fields:
      confidence:
        type: number
        required: true
        since_version: 2  # required for entries created at v2+
      evidence:
        type: multi-ref
        required: true
        since_version: 1
      methodology:
        type: string
        required: true
        since_version: 3
```
Entries track their schema version in frontmatter:
```yaml
---
id: finding-001
type: finding
_schema_version: 2
confidence: 0.85
evidence: [doc-001, doc-002]
# no 'methodology' (predates v3)
---
```

On-Load Migration (Ming Pattern)
When `KBRepository` loads an entry, it checks `_schema_version` against the current type version. If behind, the `MigrationRegistry` applies the migration chain:
```python
@migration_registry.register(type="finding", from_version=2, to_version=3)
def finding_v2_to_v3(entry_data: dict) -> dict:
    """Add methodology field with default."""
    if "methodology" not in entry_data:
        entry_data["methodology"] = "unspecified"
    return entry_data
```
Migrations are registered by core code and extensions via the plugin protocol's `get_migrations()` method.
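To make the chain behaviour concrete, here is a minimal sketch of what a registry with this decorator shape could look like. It is an illustration under assumptions, not Pyrite's actual implementation: the `MigrationRegistry` internals, the `migrate()` method, the default of version 1 for entries without `_schema_version`, and the `finding_v1_to_v2` step are all hypothetical.

```python
from typing import Callable, Dict, Tuple

Migration = Callable[[dict], dict]


class MigrationRegistry:
    """Hypothetical registry: maps (type, from_version) to one upgrade step."""

    def __init__(self) -> None:
        self._steps: Dict[Tuple[str, int], Tuple[int, Migration]] = {}

    def register(self, type: str, from_version: int, to_version: int):
        """Decorator that records one migration step for an entry type."""
        if to_version <= from_version:
            raise ValueError("to_version must be greater than from_version")

        def decorator(func: Migration) -> Migration:
            self._steps[(type, from_version)] = (to_version, func)
            return func

        return decorator

    def migrate(self, entry_data: dict, target_version: int) -> dict:
        """Apply registered steps in order until the entry reaches target_version."""
        data = dict(entry_data)  # shallow copy so the caller's dict is untouched
        # Assumption: entries without _schema_version are treated as version 1.
        version = data.get("_schema_version", 1)
        while version < target_version:
            step = self._steps.get((data["type"], version))
            if step is None:
                raise LookupError(
                    f"no migration for {data['type']} from version {version}"
                )
            version, func = step
            data = func(data)
            data["_schema_version"] = version
        return data


migration_registry = MigrationRegistry()


@migration_registry.register(type="finding", from_version=1, to_version=2)
def finding_v1_to_v2(entry_data: dict) -> dict:
    # Illustrative default only; this document does not specify the v1 -> v2 change.
    entry_data.setdefault("confidence", 0.0)
    return entry_data


@migration_registry.register(type="finding", from_version=2, to_version=3)
def finding_v2_to_v3(entry_data: dict) -> dict:
    entry_data.setdefault("methodology", "unspecified")
    return entry_data


legacy = {"id": "finding-001", "type": "finding", "_schema_version": 1, "evidence": ["doc-001"]}
migrated = migration_registry.migrate(legacy, target_version=3)
```

Keying steps by `(type, from_version)` keeps the chain lookup trivial: a v1 entry walks v1 → v2 → v3 one step at a time, and a missing step fails loudly instead of silently leaving the entry at an intermediate version.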
Implementation
The `MigrationRegistry` and version tracking hook into existing code paths: the `KBRepository` and `IndexManager` load/save paths.
No new abstraction layers are needed. The migration registry is a standalone module (`pyrite/schema/migrations.py` or similar) that `KBRepository` calls during load.
Migration Commands
```bash
# Show what would change
pyrite schema diff --from 2 --to 3

# Dry-run migration: forces load of every entry, reports what would change
pyrite schema migrate --kb research --dry-run

# Apply migration: forces load + save of every entry
# On-load migration does the actual work; save writes migrated files + updates index
pyrite schema migrate --kb research
# Result: "247 entries checked, 31 migrated, 0 errors"
# git diff shows exactly what changed, reviewable before commit

# Validate at specific version
pyrite ci --schema-version 2  # lenient mode for legacy entries
```

Because files in git are the source of truth, migration produces a reviewable diff. Run on a branch, review with `git diff`, merge when satisfied. This is something the original Ming/MongoDB pattern couldn't provide.
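As a rough sketch of how the "checked / migrated / errors" summary might be computed, the loop below runs a migration callable over every entry and only collects results for writing when not in dry-run mode. Everything here is hypothetical: `migrate_all`, the in-memory list of dicts standing in for files on disk, and `add_methodology` standing in for the registry's migration chain.

```python
from typing import Callable, Iterable, List, Tuple


def migrate_all(
    entries: Iterable[dict],
    migrate_entry: Callable[[dict], dict],
    dry_run: bool = True,
) -> Tuple[str, List[dict]]:
    """Run migrate_entry over every entry and summarize the outcome.

    Returns the summary line and the entries that should be written back.
    In dry-run mode nothing is queued for writing, so the list is empty.
    """
    checked = migrated = errors = 0
    to_write: List[dict] = []
    for entry in entries:
        checked += 1
        try:
            result = migrate_entry(dict(entry))  # migrate a copy, keep the original
        except Exception:
            errors += 1
            continue
        if result != entry:
            migrated += 1
            if not dry_run:
                to_write.append(result)
    summary = f"{checked} entries checked, {migrated} migrated, {errors} errors"
    return summary, to_write


def add_methodology(entry: dict) -> dict:
    # Stand-in for the migration chain (the v2 -> v3 step from the example above).
    if entry.get("_schema_version", 1) < 3:
        entry.setdefault("methodology", "unspecified")
        entry["_schema_version"] = 3
    return entry


entries = [
    {"id": "finding-001", "type": "finding", "_schema_version": 2},
    {"id": "finding-002", "type": "finding", "_schema_version": 3, "methodology": "interview"},
]
summary, to_write = migrate_all(entries, add_methodology, dry_run=True)
print(summary)  # 2 entries checked, 1 migrated, 0 errors
```

Comparing the migrated copy against the original is what lets the dry run report changes without touching anything; a real command would presumably write the queued entries to disk and refresh the index afterwards.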
Migration Strategies
QA Integration
`pyrite ci` and QA validation should be schema-version-aware:
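One way a version-aware required-field check could work is sketched below, using the `since_version` values from the `finding` type in the `kb.yaml` example. The function name `missing_required`, the `FINDING_FIELDS` dict, and the default of version 1 for entries without `_schema_version` are assumptions made for illustration, not Pyrite's API.

```python
from typing import Dict, List, Optional

# Field definitions mirroring the `finding` type from the kb.yaml example.
FINDING_FIELDS: Dict[str, dict] = {
    "confidence": {"required": True, "since_version": 2},
    "evidence": {"required": True, "since_version": 1},
    "methodology": {"required": True, "since_version": 3},
}


def missing_required(
    entry: dict,
    fields: Dict[str, dict],
    schema_version: Optional[int] = None,
) -> List[str]:
    """Return required fields the entry lacks at the version it is validated against.

    If schema_version is None, the entry's own _schema_version is used
    (assumed to default to 1 when absent).
    """
    if schema_version is None:
        schema_version = entry.get("_schema_version", 1)
    return [
        name
        for name, spec in fields.items()
        if spec.get("required")
        and spec.get("since_version", 1) <= schema_version
        and name not in entry
    ]


entry = {
    "id": "finding-001",
    "type": "finding",
    "_schema_version": 2,
    "confidence": 0.85,
    "evidence": ["doc-001", "doc-002"],
}

print(missing_required(entry, FINDING_FIELDS))                    # []
print(missing_required(entry, FINDING_FIELDS, schema_version=3))  # ['methodology']
```

Validating against the entry's own version keeps legacy entries green, while passing an explicit version shows what a migration would still need to fill in. An explicit lower version would roughly correspond to the lenient mode of `pyrite ci --schema-version 2`.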
Prerequisites
Success Criteria
Launch Context
Must ship before 0.8. Without this, the first schema change after launch breaks every existing KB. The `since_version` pattern is the minimum — it lets schemas evolve without invalidating existing content. The on-load migration pattern (from Ming/Allura) means the system tolerates mixed schema versions gracefully — entries migrate when accessed, and `pyrite schema migrate` provides a clean "everything is migrated" checkpoint.