Problem
Knowledge base quality currently depends entirely on human review at commit time. There is no continuous, automated quality assurance across the corpus. Entries can have missing fields, inconsistent importance scores, unsupported claims, stale cross-references, and style drift — none of which are caught after initial creation.
Solution
A multi-tier QA agent system that evaluates KB entries against type-level criteria, KB-level editorial guidelines, and factual accuracy standards. QA results are themselves KB entries (type: `qa_assessment`), making quality a queryable, trackable property of the knowledge base.
Design
Three evaluation tiers
Tier 1: Structural validation (fully automatable, no LLM)
Deterministic checks that run on every save or as a batch sweep:
Required fields present per entry type (date on events, target/source on relationships)
Date formats parseable
Importance scores within valid range (1-10)
Capture lanes from controlled vocabulary (if KB has `kb.yaml` with vocabulary)
Tags exist and aren't orphaned
Wikilink targets resolve to existing entries
Sources have URLs or citations
No empty bodies on non-stub entries
Implementation: Pure Python validation functions, no LLM needed. Can run as a post-save hook or CLI batch command.
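A minimal sketch of what a few of these Tier 1 checks might look like as plain functions. The entry is assumed to be a plain dict, and `Issue`, `REQUIRED_FIELDS`, the `stub` flag, and the `known_ids` argument are hypothetical names chosen for illustration; the real rules would be derived from the schema and type metadata.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical required-field table; real rules would come from the schema.
REQUIRED_FIELDS = {
    "event": ["date"],
    "relationship": ["source", "target"],
}

# Captures the target of [[target]], [[target|label]] and [[target#anchor]].
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")


@dataclass
class Issue:
    rule: str
    message: str


def validate_entry(entry: dict, known_ids: set[str]) -> list[Issue]:
    """Run deterministic Tier 1 checks on one entry (assumed dict shape)."""
    issues: list[Issue] = []
    body = entry.get("body") or ""

    # Required fields per entry type.
    for name in REQUIRED_FIELDS.get(entry.get("type", ""), []):
        if not entry.get(name):
            issues.append(Issue("required_field", f"missing field: {name}"))

    # Dates must parse as ISO 8601 when present.
    raw_date = entry.get("date")
    if raw_date:
        try:
            date.fromisoformat(str(raw_date))
        except ValueError:
            issues.append(Issue("date_format", f"unparseable date: {raw_date!r}"))

    # Importance must be an integer in 1..10 when present.
    importance = entry.get("importance")
    if importance is not None and not (
        isinstance(importance, int) and 1 <= importance <= 10
    ):
        issues.append(Issue("importance_range", f"out of range: {importance!r}"))

    # Every wikilink target must resolve to a known entry id.
    for target in WIKILINK_RE.findall(body):
        if target.strip() not in known_ids:
            issues.append(Issue("broken_link", f"unresolved wikilink: {target.strip()}"))

    # Non-stub entries need a body.
    if not entry.get("stub") and not body.strip():
        issues.append(Issue("empty_body", "body is empty on a non-stub entry"))

    return issues
```

Because each rule only appends to a list, the same function could serve both the post-save hook and a batch sweep; a batch command would call it once per entry and collect the results.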
Tier 2: Consistency and appropriateness (LLM-assisted, high confidence)
AI-judged checks that flag issues for human review:
Body text supports the title's claim
Importance score consistent with comparable entries in the KB
Tags and capture lanes appropriate for the content
Entry contextualizes rather than merely records (for investigative KBs)
Relationships are bidirectional where expected
Summaries accurately reflect body content
No duplicate or near-duplicate entries (semantic similarity check)
Implementation: LLM evaluation against type-level AI instructions (already in CORE_TYPE_METADATA) plus KB-level guidelines. Produces confidence-scored assessments.
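As an illustration of the importance-consistency check, the sketch below compares one entry's score with the scores of comparable entries. It assumes the comparables have already been found (for example by semantic similarity) and are passed in as a list. The function name, the three-comparable minimum, and the `tolerance` default are hypothetical, and a median comparison is only one possible heuristic that could run before or alongside the LLM judgment.

```python
from statistics import median


def importance_outlier(score: int, comparable_scores: list[int], tolerance: float = 1.5):
    """Return a warning string if `score` is far from comparable entries, else None."""
    if len(comparable_scores) < 3:
        return None  # too few comparables to judge consistency
    mid = median(comparable_scores)
    if abs(score - mid) > tolerance:
        low, high = min(comparable_scores), max(comparable_scores)
        return f"Importance {score} but comparable entries scored {low}-{high} (median {mid})"
    return None
```

A non-`None` result would be recorded as a WARN-level issue for human review rather than corrected automatically, since the score may be legitimately unusual.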
Tier 3: Factual verification (LLM + research, lower confidence)
Deep verification requiring external research:
Specific claims match cited sources
Dates are historically accurate
Quotes are correctly attributed
Causal claims are defensible
Statistics and figures are verifiable
Cross-reference against existing KB for contradictions
Implementation: Research agent with web search capability, source document retrieval, and cross-KB consistency checking. Produces confidence-scored assessments with source chains.
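One narrow piece of Tier 3, checking that a quote actually appears in a retrieved source, can be sketched deterministically. The `Finding` dataclass, the `check_quote` function, and the 0.5 confidence assigned to an unconfirmed quote are all assumptions made for illustration. Retrieval of the source text is out of scope here and is represented by a plain `url -> text` dict.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Finding:
    claim: str
    status: str        # "pass", "warn", or "fail"
    confidence: float  # 0.0 to 1.0
    source_chain: list[str] = field(default_factory=list)


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wrapping does not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def check_quote(quote: str, sources: dict[str, str]) -> Finding:
    """Look for `quote` verbatim in each retrieved source text."""
    needle = _normalize(quote)
    for url, text in sources.items():
        if needle and needle in _normalize(text):
            # Exact match: record the one source that supports the quote.
            return Finding(quote, "pass", 1.0, [url])
    # Not found: flag for review instead of failing outright, because the
    # retrieved text may be incomplete. The 0.5 value is a placeholder.
    return Finding(quote, "warn", 0.5, list(sources))
```

Verbatim matching only covers direct quotes; paraphrased claims, causal claims, and statistics would still need the research agent described above.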
QA assessment entries
Each QA run produces entries of type `qa_assessment`:
```yaml
---
id: qa-{entry-id}-{timestamp}
type: qa_assessment
title: "QA: {entry title}"
tags: [qa, tier-{1|2|3}]
target_entry: {entry-id}
tier: 1|2|3
status: pass|warn|fail
issues_found: 3
issues_resolved: 1
last_assessed: 2026-02-28
---
Assessment Summary
Overall: WARN (2 open issues)
Issues
1. Missing source citation (tier-1, FAIL)
Body claims "$25,000 donation" but no source is linked.
Confidence: 1.0
2. Importance score inconsistency (tier-2, WARN)
Importance 8 but comparable events scored 5-6.
Confidence: 0.85
3. Date verified (tier-3, PASS)
"March 2024" confirmed via source URL.
Confidence: 0.92
```
KB-level editorial guidelines
New optional section in `kb.yaml`:
```yaml
editorial_guidelines:
  tone: analytical
  framework: "collective punishment and institutional lineage"
  sourcing: "every factual claim must link to a source entry or URL"
  style_notes:
    - "contextualize events within broader patterns, don't just record"
    - "name specific actors and mechanisms, avoid vague systemic claims"
    - "trace institutional lineage rather than treating events as isolated"
```
These guidelines are passed to Tier 2/3 evaluations alongside type-level AI instructions.
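A rough sketch of how the guidelines might be combined with type-level instructions into a single evaluation prompt. The function name, the prompt wording, and the dict shapes are assumptions. The guidelines are taken as an already-parsed dict mirroring the `editorial_guidelines` section, with every key treated as optional.

```python
def build_review_prompt(entry: dict, type_instructions: str, guidelines: dict) -> str:
    """Combine type-level instructions and KB-level guidelines into one prompt."""
    lines = [
        "You are reviewing a knowledge base entry.",
        f"Type instructions: {type_instructions}",
    ]
    # Each guideline key is optional, so include only the ones that are set.
    for key in ("tone", "framework", "sourcing"):
        if guidelines.get(key):
            lines.append(f"{key.capitalize()}: {guidelines[key]}")
    for note in guidelines.get("style_notes", []):
        lines.append(f"Style note: {note}")
    lines.append(f"Title: {entry['title']}")
    lines.append(f"Body:\n{entry['body']}")
    return "\n".join(lines)
```

Keeping prompt assembly in one pure function would make it straightforward to test that a KB without `editorial_guidelines` still produces a valid prompt.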
Phases
Phase 1: Tier 1 structural validation (effort: M)
`QAService` with `validate_entry()` and `validate_all()` methods
Validation rules per entry type (derived from schema + type metadata)
CLI: `pyrite qa validate [--kb ] [--entry ] [--fix]`
MCP: `kb_qa_validate` read-tier tool
Output: structured issue list, no LLM needed
Phase 2: QA assessment entry type + storage (effort: M)
`qa_assessment` entry type with schema
Link assessments to target entries
Query interface: "show all entries with open issues", "unassessed entries", "verification rate by capture lane"
CLI: `pyrite qa status [--kb ]` — dashboard of assessment state
MCP: `kb_qa_status` read-tier tool
Phase 3: Tier 2 LLM-assisted consistency checks (effort: L)
LLM evaluation prompts using type AI instructions + KB editorial guidelines
Consistency scoring against comparable entries (semantic similarity to find comparables)
Confidence-scored assessments
CLI: `pyrite qa assess [--kb ] [--entry ] [--tier 2]`
MCP: `kb_qa_assess` write-tier tool (creates assessment entries)
Phase 4: Tier 3 factual verification (effort: XL)
Research agent with web search for claim verification
Cross-KB contradiction detection
Source chain verification (do cited sources actually support the claims?)
Confidence-scored factual assessments
CLI: `pyrite qa verify [--kb ] [--entry ]`
Phase 5: Continuous QA pipeline (effort: L, partially done)
~~Post-save hook triggers Tier 1 validation automatically~~ Done: `validate` param on `kb_create`/`kb_update` MCP tools + `qa_on_write: true` KB-level setting in `kb.yaml`. Issues returned as `qa_issues` in MCP response.
Scheduled batch runs for Tier 2/3 (configurable frequency)
QA dashboard in web UI: verification rates, issue trends, coverage gaps
"Entries needing review" collection (virtual collection with QA-based query)
Plugin architecture
The QA system should be domain-agnostic at its core:
Core: field validation, type consistency, dedup detection, link integrity
Plugin config: domain-specific evaluation rubrics
- Legal KB: citation accuracy, procedural correctness
- Scientific KB: methodology descriptions, statistical claims
- Investigative KB: sourcing standards, analytical framework consistency
This means the QA service accepts pluggable evaluation criteria, and plugins can register custom Tier 2/3 checks via the plugin protocol.
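One way the pluggable criteria could be organized is a small registry keyed by tier. This is a sketch only: `CheckRegistry`, the `Check` signature, and the example legal-KB rule are hypothetical and are not taken from the existing plugin protocol.

```python
from typing import Callable

# A check takes an entry dict and returns a list of issue messages.
Check = Callable[[dict], list[str]]


class CheckRegistry:
    """Holds core checks plus checks contributed by plugins, keyed by tier."""

    def __init__(self) -> None:
        self._checks: dict[int, list[tuple[str, Check]]] = {1: [], 2: [], 3: []}

    def register(self, tier: int, name: str, check: Check) -> None:
        if tier not in self._checks:
            raise ValueError(f"unknown tier: {tier}")
        self._checks[tier].append((name, check))

    def run(self, tier: int, entry: dict) -> dict[str, list[str]]:
        """Run every check for `tier`; return only checks that found issues."""
        results: dict[str, list[str]] = {}
        for name, check in self._checks[tier]:
            messages = check(entry)
            if messages:
                results[name] = messages
        return results


# Example plugin check for a legal KB (hypothetical rule).
def has_citation(entry: dict) -> list[str]:
    return [] if entry.get("citations") else ["no case citation provided"]
```

A legal plugin would then call something like `registry.register(2, "legal.citation", has_citation)` at load time, leaving the core service unaware of any domain-specific rule.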
Dependencies
Tier 1: No dependencies (pure validation against existing schema)
Tier 2: Depends on LLM abstraction service (#6, done) and type metadata (#42, done)
Tier 3: Depends on Tier 2 + web search capability
Phase 5: Depends on hooks system (#24, done) and collections (#61, done)
KB editorial guidelines: Depends on capture lane validation (#72)
Files likely affected
New: `pyrite/services/qa_service.py`
New: `pyrite/models/qa_types.py` (or extension entry type)
Modified: `pyrite/server/mcp_server.py` (new QA tools)
Modified: `pyrite/cli/__init__.py` (new `qa` command group)
New: `pyrite/server/endpoints/qa.py`
Modified: `pyrite/config.py` (editorial_guidelines in KBConfig)
New: `tests/test_qa_service.py`