ADR-0015: Object-Document Mapper (ODM) Layer with Schema Versioning and On-Load Migration
Context
Pyrite stores knowledge as typed documents (markdown + YAML frontmatter) but indexes them through SQLAlchemy ORM on SQLite — a relational database. This creates an impedance mismatch:
Additionally, there is no schema migration story. Adding a required field to a type invalidates every existing entry. There's no way to evolve schemas without breaking existing content.
Prior art: Ming + Allura (SourceForge)
Ming was an ODM built for MongoDB at SourceForge. Coupled with Allura's Artifact base class, it provided:
This pattern is proven at scale (SourceForge operated one of the largest MongoDB deployments at the time) and maps directly to Pyrite's needs.
Storage backend flexibility
Multiple storage backends are planned or desirable:
Without an abstraction layer, each backend requires rewriting the service layer. With an ODM, the service layer talks to a stable API and backends are configuration choices.
Decision
Schema versioning is decoupled from the full ODM refactor
Addendum (2026-02-28): The original decision bundled schema versioning with the full ODM/backend abstraction as a single phased effort. After evaluating the critical path to 0.8 (Announceable Alpha), we're decoupling them:
The migration pattern (Ming-style on-load migration with forced-load scripts, reviewable via git diff) is the same regardless of whether it hooks into a `DocumentManager` or directly into `KBRepository`. The pattern is what matters, not the abstraction layer it runs through.
Introduce a Pyrite ODM layer between KBService and storage
The ODM sits between the service layer (KBService, TaskService, etc.) and the storage backends. It handles:
1. Schema-versioned loading: Load from file → parse frontmatter → check `_schema_version` → apply migration chain → return typed entry
2. Validation on save: Entry → validate against current schema → serialize → write file + update index
3. Object versioning: Track which schema version created an entry, which version last modified it
4. Migration registry: Ordered chain of migration functions per type, keyed by version range
5. Backend abstraction: Index operations delegated to pluggable backends (SQLite, Postgres)
On-load migration pattern (from Ming)
Documents migrate lazily when loaded. The migration chain is a sequence of versioned transform functions:
```python
@migrate(type="finding", from_version=1, to_version=2)
def add_confidence_field(entry):
    if "confidence" not in entry.metadata:
        entry.metadata["confidence"] = 0.5
    return entry


@migrate(type="finding", from_version=2, to_version=3)
def rename_sources_to_evidence(entry):
    if "sources" in entry.metadata:
        entry.metadata["evidence"] = entry.metadata.pop("sources")
    return entry
```
On load, if an entry is at version 1 and current schema is version 3, the ODM runs both migrations in sequence. The entry in memory is always at the current version. If the entry was modified by migration, the ODM optionally writes the migrated version back to the file.
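As a rough illustration of how such a chain could be registered and applied, here is a minimal sketch. The `Entry` dataclass, the `_REGISTRY` dict, and `apply_migrations` are hypothetical names invented for this example, and treating a missing `_schema_version` as version 1 is an assumption rather than something this ADR specifies.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Entry:
    """Hypothetical stand-in for a Pyrite entry: a type plus frontmatter."""
    type: str
    metadata: dict = field(default_factory=dict)


Migration = Callable[[Entry], Entry]

# Hypothetical registry: entry type -> {from_version: (to_version, function)}
_REGISTRY: dict[str, dict[int, tuple[int, Migration]]] = defaultdict(dict)


def migrate(type: str, from_version: int, to_version: int):
    """Register a transform that upgrades `type` entries between two versions."""
    def decorator(fn: Migration) -> Migration:
        _REGISTRY[type][from_version] = (to_version, fn)
        return fn
    return decorator


def apply_migrations(entry: Entry, current_version: int) -> tuple[Entry, bool]:
    """Run the chain until the entry reaches current_version.

    Returns the entry and a flag saying whether any migration ran.
    Assumption: an entry without `_schema_version` is treated as version 1.
    """
    version = entry.metadata.get("_schema_version", 1)
    changed = False
    while version < current_version:
        step = _REGISTRY[entry.type].get(version)
        if step is None:
            raise LookupError(f"no migration for {entry.type} from v{version}")
        version, fn = step
        entry = fn(entry)
        entry.metadata["_schema_version"] = version
        changed = True
    return entry, changed


@migrate(type="finding", from_version=1, to_version=2)
def add_confidence_field(entry):
    entry.metadata.setdefault("confidence", 0.5)
    return entry


@migrate(type="finding", from_version=2, to_version=3)
def rename_sources_to_evidence(entry):
    if "sources" in entry.metadata:
        entry.metadata["evidence"] = entry.metadata.pop("sources")
    return entry


# A version 1 entry loaded while the current schema is version 3.
old = Entry(type="finding", metadata={"_schema_version": 1, "sources": ["a.md"]})
migrated, changed = apply_migrations(old, current_version=3)
```

The `changed` flag is what a loader could use to decide whether to write the migrated entry back to its file.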
Migration script
```bash
# Dry run: show what would change
pyrite schema migrate --kb research --dry-run

# Migrate all entries: forces a load of every entry, triggering on-load migration
pyrite schema migrate --kb research

# Result: every entry is at the current schema version
# git diff shows exactly what changed in each file
```

The migration script is just a forced load of every entry. On-load migration does the actual work. After the script completes, every entry is at the current version: the "everything is migrated" event.
Because the source of truth is files in git, the migration produces a reviewable diff. You can run the migration on a branch, `git diff` the results, and merge when satisfied. This is something the original Ming/MongoDB pattern couldn't provide — git-backed storage turns schema migration into a reviewable PR.
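The forced-load idea can be sketched in a few lines. This is an illustration under stated assumptions, not the actual `pyrite schema migrate` implementation: `migrate_all`, `load`, and `save` are hypothetical names, `load` is assumed to return the entry together with a flag saying whether on-load migration changed it, and an in-memory dict stands in for the files on disk.

```python
from typing import Callable, Iterable


def migrate_all(
    entry_ids: Iterable[str],
    load: Callable[[str], tuple[dict, bool]],
    save: Callable[[str, dict], None],
    dry_run: bool = False,
) -> list[str]:
    """Force-load every entry and write back the ones migration changed.

    Assumption: `load` has already run on-load migration by the time it
    returns, and reports whether the entry was modified.
    """
    changed_ids = []
    for entry_id in entry_ids:
        entry, was_migrated = load(entry_id)
        if not was_migrated:
            continue
        changed_ids.append(entry_id)
        if not dry_run:
            save(entry_id, entry)
    return changed_ids


# Toy store standing in for markdown files; "a" is one version behind.
CURRENT_VERSION = 2
store = {
    "a": {"_schema_version": 1},
    "b": {"_schema_version": 2, "confidence": 0.9},
}


def load(entry_id):
    entry = dict(store[entry_id])  # copy, so a dry run leaves the store untouched
    if entry["_schema_version"] < CURRENT_VERSION:
        entry.setdefault("confidence", 0.5)
        entry["_schema_version"] = CURRENT_VERSION
        return entry, True
    return entry, False


def save(entry_id, entry):
    store[entry_id] = entry


would_change = migrate_all(sorted(store), load, save, dry_run=True)
changed = migrate_all(sorted(store), load, save)
```

Running the same function a second time reports nothing to change, which matches the idea that the script only triggers migrations that on-load logic would have applied anyway.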
Two-layer storage architecture
The ODM splits storage into two concerns:
Application state (relational, needs ACID):
Knowledge index (document-shaped, needs search):
`pyrite index sync` rebuilds the knowledge index from source files, regardless of backend. Application state persists independently (backed up via `pyrite db backup`).
Backend interface
The ODM defines a `SearchBackend` protocol that any index backend implements:
```python
from __future__ import annotations  # keeps the Pyrite type names below as unevaluated hints

from typing import Iterable, Protocol


class SearchBackend(Protocol):
    def upsert_entry(self, entry: Entry, embedding: list[float] | None) -> None: ...
    def delete_entry(self, entry_id: str, kb_name: str) -> None: ...
    def search_fts(self, query: str, kb_name: str, **filters) -> list[SearchResult]: ...
    def search_semantic(self, vector: list[float], kb_name: str, **filters) -> list[SearchResult]: ...
    def search_hybrid(self, query: str, vector: list[float], kb_name: str, **filters) -> list[SearchResult]: ...
    def get_entry(self, entry_id: str, kb_name: str) -> EntryRecord | None: ...
    def query_entries(self, kb_name: str, **filters) -> list[EntryRecord]: ...
    def get_backlinks(self, entry_id: str, kb_name: str) -> list[str]: ...
    def get_blocks(self, entry_id: str, kb_name: str) -> list[Block]: ...
    def rebuild(self, entries: Iterable[Entry]) -> None: ...
```
The current SQLite/FTS5 implementation wraps existing code behind this interface. The Postgres/pgvector backend implements it with tsvector and pgvector. The service layer calls `search_backend.search_hybrid()` and doesn't know which engine is underneath.
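To make the "service layer doesn't know which engine is underneath" point concrete, here is a small sketch using structural typing. Everything in it is hypothetical: `HybridSearch` is a trimmed-down slice of the protocol, `InMemoryBackend` is a toy that scores by substring count and ignores the vector, and `SearchService` is an invented caller, not Pyrite's `KBService`.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class SearchResult:
    """Hypothetical result shape: an entry id and a relevance score."""
    entry_id: str
    score: float


class HybridSearch(Protocol):
    """Trimmed-down, hypothetical slice of the SearchBackend protocol."""
    def search_hybrid(
        self, query: str, vector: list[float], kb_name: str, **filters
    ) -> list[SearchResult]: ...


class InMemoryBackend:
    """Toy backend: scores by substring count only and ignores the vector."""

    def __init__(self, docs: dict[str, dict[str, str]]):
        self._docs = docs  # kb_name -> {entry_id: body}

    def search_hybrid(self, query, vector, kb_name, **filters):
        hits = [
            SearchResult(entry_id, float(body.lower().count(query.lower())))
            for entry_id, body in self._docs.get(kb_name, {}).items()
        ]
        # Drop non-matches and return the best score first.
        return sorted((h for h in hits if h.score > 0), key=lambda h: -h.score)


class SearchService:
    """Depends on the protocol only, never on a concrete engine."""

    def __init__(self, search_backend: HybridSearch):
        self.search_backend = search_backend

    def top_ids(self, query: str, kb_name: str) -> list[str]:
        results = self.search_backend.search_hybrid(query, [], kb_name)
        return [r.entry_id for r in results]


service = SearchService(InMemoryBackend({
    "research": {
        "e1": "sqlite and postgres",
        "e2": "postgres postgres",
        "e3": "lancedb",
    },
}))
ids = service.top_ids("postgres", "research")
```

Because `Protocol` matching is structural, `InMemoryBackend` never has to inherit from anything; any object with a compatible `search_hybrid` method can be passed to the service.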
Configuration
```yaml
# pyrite.yaml
storage:
  # Application state (relational)
  app_backend: sqlite             # or postgres
  app_url: .pyrite/app.db         # or postgresql://...

  # Knowledge index (document search)
  index_backend: sqlite           # or postgres
  index_path: .pyrite/index.db    # or postgresql://...
```
Default is SQLite for both (current behavior, zero config). Postgres is opt-in for server deployments.
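One way the zero-config default could be modeled is a frozen dataclass whose field defaults mirror the YAML above. This is a sketch only: `StorageConfig` and `storage_from_mapping` are hypothetical names, the validation rule is an assumption, and the Postgres URL in the usage lines is a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass

ALLOWED_BACKENDS = ("sqlite", "postgres")


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical model of the `storage:` section; defaults are SQLite."""
    app_backend: str = "sqlite"
    app_url: str = ".pyrite/app.db"
    index_backend: str = "sqlite"
    index_path: str = ".pyrite/index.db"

    def __post_init__(self):
        # Assumed rule: only the two backends named in the config comments.
        for name in ("app_backend", "index_backend"):
            value = getattr(self, name)
            if value not in ALLOWED_BACKENDS:
                raise ValueError(f"{name} must be one of {ALLOWED_BACKENDS}, got {value!r}")


def storage_from_mapping(raw: dict | None) -> StorageConfig:
    """Build a config from the parsed `storage:` mapping.

    Keys that are missing fall back to the SQLite defaults, so an empty or
    absent section yields the zero-config behavior.
    """
    return StorageConfig(**(raw or {}))


default = storage_from_mapping(None)
server = storage_from_mapping({
    "index_backend": "postgres",
    "index_path": "postgresql://localhost/pyrite",  # placeholder URL
})
```

Here `server` opts only the knowledge index into Postgres while application state stays on SQLite, since the two settings are independent.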
Consequences
Positive
Negative
Implementation sequence (revised)
1. Schema versioning (pre-0.8): `MigrationRegistry`, `_schema_version` tracking, `since_version` field semantics, `pyrite schema migrate` command. Hooks into existing `KBRepository` load/save paths. No new abstraction layer required.
2. ODM interfaces + SQLite wrapping (0.9+): Define `SearchBackend` protocol, implement `SQLiteBackend` wrapping existing code, introduce `DocumentManager`. Route services through ODM. Move schema versioning hooks from `KBRepository` into `DocumentManager`.
3. Alternative backends (0.9+): Postgres backend implementing `SearchBackend` (done, 66/66 conformance tests). LanceDB evaluated and rejected (ADR-0016).
Step 1 is the risk-reducing deliverable — schemas can evolve without breaking existing KBs. Steps 2-3 are architectural improvements that enable backend flexibility but aren't blocking launch.