ADR-0008: Structured Data and Schema-as-Config
Context
Pyrite's entry model is document-oriented: YAML frontmatter for structured metadata, Markdown body for freeform content. Entry type schemas are defined in Python dataclasses (core types in `core_types.py`, plugin types in extension code). This works well for document-heavy entries (notes, investigations, writeups) but creates friction for structured data objects where the value is in typed fields, not prose.
The problem: A user who wants a "meeting" type with a date field, attendee references, and a status dropdown must write a Python plugin. There is no way to define types with typed, validated fields through configuration alone.
The opportunity: Every major knowledge tool (Notion, Anytype, Capacities, Tana, Obsidian) has converged on a model where structured properties coexist with freeform content. Notion's "every row is a page" and Anytype's "Relations + Canvas" are the most relevant patterns for Pyrite because they maintain the structured/freeform split that Pyrite already has (frontmatter/body).
Current state:
What we learned from competitor analysis:
Decision
1. Three-Layer Schema Model
Schema definitions come from three sources, in order of increasing power:
| Layer | Definition | Power | Audience | |-------|-----------|-------|----------| | Config schema | `kb.yaml` types section | Field types, constraints, select options, object refs | Users, no code needed | | Plugin declared schema | Plugin's `get_field_schemas()` | Same as config + programmatic generation | Plugin authors | | Plugin code | Python dataclass + methods | Full power: hooks, computed fields, workflows, MCP tools | Plugin developers |
All three produce entries stored as the same Markdown+YAML files in git. The difference is where the schema lives and how much behavior is attached.
Config-defined types use `GenericEntry` (already exists) with automatic field validation from the schema. Plugin-defined types can optionally declare field schemas for auto-validation and UI form generation, reducing boilerplate while keeping custom code for behavior.
2. Rich Field Type System
Extend `TypeSchema` with a `fields` dict where each field has a type and optional constraints:
```yaml
In kb.yaml
types: meeting: description: "Meeting notes" layout: document fields: date: type: date required: true attendees: type: list items: { type: object-ref, target_type: person } status: type: select options: [scheduled, completed, cancelled] default: scheduled action_items: type: list items: { type: text } subdirectory: meetings/ ```Field types:
| Type | YAML Storage | Validation | |------|-------------|------------| | `text` | string | optional: `format` (email, url, phone), `min_length`, `max_length` | | `number` | number | optional: `min`, `max` | | `date` | ISO 8601 string | date format validation | | `datetime` | ISO 8601 string | datetime format validation | | `checkbox` | boolean | true/false | | `select` | string | must be in `options` list | | `multi-select` | list[string] | each must be in `options` list | | `object-ref` | `{ref: id}` or string | target must match `target_type` if specified | | `list` | list | items validated by inner `items` type | | `tags` | list[string] | (alias for multi-select without fixed options) |
This covers the 80% case. Plugin code handles the rest (computed fields, custom validation logic, complex constraints).
3. Two Entry Layouts
| Layout | Description | Use Case | |--------|------------|----------| | `document` | Rich frontmatter + substantial Markdown body | Notes, writeups, investigations (default) | | `record` | Rich frontmatter + minimal/empty body | Contacts, tasks, bookmarks, data objects |
Both are Markdown files with YAML frontmatter. The difference is a hint to the UI: `document` gets a full editor, `record` gets a form-first view with an optional notes area. The storage format is identical.
4. Typed Object References
A new field type `object-ref` enables relational queries between entries:
```yaml
In frontmatter
project: { ref: project-alpha } attendees: - { ref: jane-doe } - { ref: bob-smith } ```Object references can optionally constrain the target type (`target_type: person`). The indexer builds a reference table enabling reverse lookups ("which meetings reference this person?"), powering backlinks, relation views, and graph queries.
5. Migrate to ruamel.yaml
Replace PyYAML with `ruamel.yaml` for round-trip-safe YAML serialization:
6. Schema Validation is Automatic
When a type has field schemas (from config or plugin), core validation automatically:
Plugin validators remain for domain-specific rules beyond what the type system expresses.