ADR-0010: Content Negotiation and Multi-Format Support
Context
Pyrite entries have a single canonical format: YAML frontmatter + Markdown body, stored as `.md` files in git. All API endpoints return JSON via Pydantic serialization. The MCP server wraps everything in JSON text content blocks. There is no content negotiation — clients cannot request data in alternative formats.
This creates several friction points:
1. AI agents pay a token tax
When an LLM agent retrieves entries via MCP or API, it receives JSON. JSON is verbose — quotes around every key, braces, commas, brackets. For structured data like entry listings, search results, and tabular queries, this verbosity costs real tokens. Formats like TOON (`text/toon`) are purpose-built for LLM consumption: ~40% fewer tokens than JSON on tabular data with equal or better LLM parse accuracy.
2. Human consumers want human formats
A CLI user piping search results to another tool may want CSV or TSV. A researcher exporting findings may want Markdown. A dashboard may want a summary, not a full entry. Currently there is no way to request these.
3. The canonical format is already multi-format
A Pyrite entry is simultaneously:
The conversion infrastructure partially exists — `to_markdown()`, `to_frontmatter()`, `to_db_dict()` — but it's not exposed through content negotiation and there's no unified conversion layer.
4. Import is the flip side of export
Users need to bring data into Pyrite from other systems (CSV contact lists, JSON API dumps, Obsidian vaults, Notion exports). This requires the same format awareness, run in reverse.
Decision
1. Accept Header Content Negotiation on API
API endpoints honor the `Accept` header to select response format. The default remains `application/json`.
``` GET /api/entries/meeting-2024-03-15 Accept: application/json → JSON (current behavior, default) Accept: text/markdown → Full markdown with frontmatter Accept: text/toon → TOON encoding of the entry data Accept: text/csv → CSV (for list/search endpoints only) Accept: application/yaml → YAML frontmatter only (no body) ```
Implementation: A response format layer sits between the endpoint logic and the HTTP response. Endpoints return structured data (Pydantic models or dicts). The format layer serializes based on `Accept` header, falling back to JSON for unsupported types.
```python
Conceptual — actual implementation may vary
from pyrite.formats import negotiate_response@router.get("/api/entries/{entry_id}") def get_entry(entry_id: str, request: Request): data = service.get_entry(entry_id) return negotiate_response(request, data) ```
2. Format Registry
A pluggable format registry maps MIME types to serializer/deserializer pairs:
| MIME Type | Ext | Serialize | Deserialize | Best for | |-----------|-----|-----------|-------------|----------| | `application/json` | `.json` | Built-in (Pydantic) | Built-in | API default, web UI | | `text/markdown` | `.md` | `Entry.to_markdown()` | `Entry.from_markdown()` | Human reading, git storage | | `text/toon` | `.toon` | toon-python `encode()` | toon-python `decode()` | LLM agents (token-efficient) | | `text/csv` | `.csv` | stdlib csv | stdlib csv | Tabular export, spreadsheets | | `application/yaml` | `.yaml` | ruamel.yaml `dump()` | ruamel.yaml `load()` | Frontmatter-only, config tools | | `text/plain` | `.txt` | Body text only | — | Grep, simple tools |
New formats can be added by registering a serializer. Plugins could contribute formats via a future `get_formats()` protocol method, but this is not required for the initial implementation.
3. TOON for LLM-Optimized Responses
TOON (Token-Oriented Object Notation) is a format designed for LLM input. It uses indentation-based nesting (like YAML) with CSV-style tabular arrays, achieving ~40% fewer tokens than JSON on structured data while maintaining lossless round-trip fidelity.
Where TOON excels in Pyrite:
Where TOON doesn't help:
Integration:
4. Automatic Conversion Where It Makes Sense
Some conversions are lossless and automatic; others are lossy and require explicit opt-in:
| From → To | Lossless? | Automatic? | Notes | |-----------|-----------|------------|-------| | Markdown → JSON | Yes | Yes | Full entry with all fields | | JSON → Markdown | Yes | Yes | Reconstruct frontmatter + body | | JSON → TOON | Yes | Yes | Structural encoding, round-trips | | TOON → JSON | Yes | Yes | Structural decoding | | JSON → CSV | Lossy | Yes (for lists) | Flattens nested fields; body truncated | | CSV → JSON | Lossy | Import only | Requires column→field mapping | | JSON → YAML | Partial | Yes | Frontmatter only, no body | | JSON → Plain text | Lossy | Yes | Body only, metadata stripped | | Markdown → Plain text | Lossy | Yes | Strip formatting |
Automatic means: the API/MCP will perform the conversion when the client requests it. No explicit "convert" step needed.
Import conversions (CSV → entries, JSON → entries) require explicit field mapping and are handled by the import/export system, not content negotiation.
5. MCP Format Awareness
The MCP protocol returns text content blocks. Currently all tool responses are `json.dumps(result)`. With format support:
```json { "content": [ { "type": "text", "mimeType": "text/toon", "text": "results[5]{id,title,type,date,tags}:\n meeting-01,Sprint Planning,meeting,2024-03-15,[dev]\n ..." } ] } ```
6. CLI Output Formats
The CLI gains a `--format` flag:
```bash pyrite search "budget" --format json # default pyrite search "budget" --format csv pyrite search "budget" --format toon pyrite get meeting-01 --format markdown pyrite get meeting-01 --format yaml # frontmatter only ```
7. Import Pipeline (Future)
Content negotiation establishes the format registry. The import pipeline (backlog item #36) reuses it:
```bash pyrite import contacts.csv --type person --mapping name=title,email=metadata.email pyrite import notion-export.json --format notion pyrite import vault/ --format obsidian ```
Import is out of scope for this ADR but benefits directly from the format infrastructure.
Implementation Order
1. Format registry — `pyrite/formats/` module with serializer/deserializer protocol 2. API content negotiation — `Accept` header handling, response format layer 3. JSON, Markdown, YAML, CSV serializers — using existing code paths 4. TOON serializer — optional dependency, LLM-optimized output 5. MCP format support — `mimeType` in content blocks 6. CLI `--format` flag — wire into existing CLI output