QA Agent Workflows — pyrite

Problem

Knowledge base quality currently depends entirely on human review at commit time. There is no continuous, automated quality assurance across the corpus. Entries can have missing fields, inconsistent importance scores, unsupported claims, stale cross-references, and style drift — none of which are caught after initial creation.

Solution

A multi-tier QA agent system that evaluates KB entries against type-level criteria, KB-level editorial guidelines, and factual accuracy standards. QA results are themselves KB entries (type: `qa_assessment`), making quality a queryable, trackable property of the knowledge base.

Design

Three evaluation tiers

Tier 1: Structural validation (fully automatable, no LLM)

Deterministic checks that run on every save or as a batch sweep:

Required fields present per entry type (date on events, target/source on relationships)

Date formats parseable

Importance scores within valid range (1-10)

Capture lanes drawn from the controlled vocabulary (when the KB's `kb.yaml` defines one)

Tags exist and aren't orphaned

Wikilink targets resolve to existing entries

Sources have URLs or citations

No empty bodies on non-stub entries

Implementation: Pure Python validation functions, no LLM needed. Can run as a post-save hook or CLI batch command.
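A minimal sketch of what such validation functions might look like, covering a few of the checks listed above. Everything here is illustrative: the `Issue` dataclass, the `REQUIRED_FIELDS` table, the dict-shaped entry, the `stub` flag, and the `[[target]]` wikilink syntax are assumptions, not pyrite's actual schema or API.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical required-field table; the real rules would be derived
# from the schema and type metadata.
REQUIRED_FIELDS = {
    "event": ["date"],
    "relationship": ["source", "target"],
}

# Assumes wikilinks are written [[target]] or [[target|label]].
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")


@dataclass
class Issue:
    rule: str
    message: str


def validate_entry(entry: dict, known_ids: set[str]) -> list[Issue]:
    """Run deterministic Tier 1 checks on one entry (a plain dict here)."""
    issues: list[Issue] = []

    # Required fields per entry type.
    for name in REQUIRED_FIELDS.get(entry.get("type", ""), []):
        if not entry.get(name):
            issues.append(Issue("required_field", f"missing field: {name}"))

    # Dates must parse as ISO dates when present.
    raw_date = entry.get("date")
    if raw_date:
        try:
            date.fromisoformat(str(raw_date))
        except ValueError:
            issues.append(Issue("date_format", f"unparseable date: {raw_date}"))

    # Importance must be an integer in the range 1-10 when present.
    importance = entry.get("importance")
    if importance is not None:
        if not isinstance(importance, int) or not 1 <= importance <= 10:
            issues.append(Issue("importance_range", f"out of range: {importance}"))

    # Every wikilink target must resolve to an existing entry id.
    body = entry.get("body", "")
    for target in WIKILINK_RE.findall(body):
        if target.strip() not in known_ids:
            issues.append(Issue("broken_link", f"unresolved link: {target.strip()}"))

    # Non-stub entries need a body.
    if not body.strip() and not entry.get("stub", False):
        issues.append(Issue("empty_body", "body is empty on a non-stub entry"))

    return issues
```

Because each check only reads the entry and a set of known ids, the same function could serve both a post-save hook and a batch sweep.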

Tier 2: Consistency and appropriateness (LLM-assisted, high confidence)

AI-judged checks that flag issues for human review:

Body text supports the title's claim

Importance score consistent with comparable entries in the KB

Tags and capture lanes appropriate for the content

Entry contextualizes rather than merely records (for investigative KBs)

Relationships are bidirectional where expected

Summaries accurately reflect body content

No duplicate or near-duplicate entries (semantic similarity check)

Implementation: LLM evaluation against type-level AI instructions (already in CORE_TYPE_METADATA) plus KB-level guidelines. Produces confidence-scored assessments.
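One way the Tier 2 evaluation step could be wired together is sketched below. It assumes the LLM is reachable as a plain `prompt -> text` callable that answers in JSON; the `Tier2Finding` dataclass, the prompt wording, and the response shape are all hypothetical and do not describe pyrite's LLM abstraction service.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tier2Finding:
    check: str
    status: str        # "pass" | "warn" | "fail"
    confidence: float  # clamped to 0.0 .. 1.0
    note: str


def build_prompt(entry: dict, type_instructions: str, kb_guidelines: str) -> str:
    """Combine type-level instructions, KB guidelines, and the entry itself."""
    return "\n\n".join([
        "You are reviewing a knowledge base entry.",
        f"Type-level instructions:\n{type_instructions}",
        f"KB editorial guidelines:\n{kb_guidelines}",
        f"Entry title: {entry['title']}\nEntry body:\n{entry['body']}",
        'Reply with a JSON list of objects with keys '
        '"check", "status", "confidence", "note".',
    ])


def assess_entry(
    entry: dict,
    type_instructions: str,
    kb_guidelines: str,
    llm: Callable[[str], str],
) -> list[Tier2Finding]:
    """Ask the LLM for findings and parse them into confidence-scored records."""
    raw = llm(build_prompt(entry, type_instructions, kb_guidelines))
    findings = []
    for item in json.loads(raw):
        # Clamp confidence so a sloppy model reply cannot exceed the valid range.
        confidence = min(1.0, max(0.0, float(item["confidence"])))
        findings.append(
            Tier2Finding(item["check"], item["status"], confidence, item.get("note", ""))
        )
    return findings
```

Passing the model in as a callable keeps the sketch testable with a stub and leaves the choice of provider to the caller.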

Tier 3: Factual verification (LLM + research, lower confidence)

Deep verification requiring external research:

Specific claims match cited sources

Dates are historically accurate

Quotes are correctly attributed

Causal claims are defensible

Statistics and figures are verifiable

Cross-reference against existing KB for contradictions

Implementation: Research agent with web search capability, source document retrieval, and cross-KB consistency checking. Produces confidence-scored assessments with source chains.

QA assessment entries

Each QA run produces entries of type `qa_assessment`:

```yaml
---
id: qa-{entry-id}-{timestamp}
type: qa_assessment
title: "QA: {entry title}"
tags: [qa, tier-{1|2|3}]
target_entry: {entry-id}
tier: 1|2|3
status: pass|warn|fail
issues_found: 3
issues_resolved: 1
last_assessed: 2026-02-28
---

Assessment Summary

Overall: WARN (2 open issues)

Issues

1. Missing source citation (tier-1, FAIL)

Body claims "$25,000 donation" but no source is linked. Confidence: 1.0

2. Importance score inconsistency (tier-2, WARN)

Importance 8 but comparable events scored 5-6. Confidence: 0.85

3. Date verified (tier-3, PASS)

"March 2024" confirmed via source URL. Confidence: 0.92
```

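The front matter of a `qa_assessment` entry could be assembled along these lines. This is a sketch under assumptions: the timestamp format used in the id, the per-issue `resolved` flag, and the function name are hypothetical, and the overall status is passed in rather than derived because the roll-up rule is not specified here.

```python
from datetime import datetime, timezone
from typing import Optional


def build_assessment_frontmatter(
    entry_id: str,
    entry_title: str,
    tier: int,
    status: str,
    issues: list[dict],
    now: Optional[datetime] = None,
) -> dict:
    """Build the metadata block for one qa_assessment entry."""
    now = now or datetime.now(timezone.utc)
    # Hypothetical: each issue dict may carry a boolean "resolved" flag.
    resolved = sum(1 for issue in issues if issue.get("resolved"))
    return {
        # Compact UTC timestamp; the real id format is an assumption.
        "id": f"qa-{entry_id}-{now.strftime('%Y%m%dT%H%M%SZ')}",
        "type": "qa_assessment",
        "title": f"QA: {entry_title}",
        "tags": ["qa", f"tier-{tier}"],
        "target_entry": entry_id,
        "tier": tier,
        "status": status,
        "issues_found": len(issues),
        "issues_resolved": resolved,
        "last_assessed": now.date().isoformat(),
    }
```

The number of open issues shown in the summary is then `issues_found - issues_resolved`.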
KB-level editorial guidelines

New optional section in `kb.yaml`:

```yaml
editorial_guidelines:
  tone: analytical
  framework: "collective punishment and institutional lineage"
  sourcing: "every factual claim must link to a source entry or URL"
  style_notes:
    - "contextualize events within broader patterns, don't just record"
    - "name specific actors and mechanisms, avoid vague systemic claims"
    - "trace institutional lineage rather than treating events as isolated"
```

These guidelines are passed to Tier 2/3 evaluations alongside type-level AI instructions.
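As an illustration of how the optional section might be carried through the code, the sketch below parses an already-loaded `kb.yaml` dict into a typed object and renders it as text for an evaluation prompt. The `EditorialGuidelines` class and its method names are hypothetical; only the four keys come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class EditorialGuidelines:
    tone: str = ""
    framework: str = ""
    sourcing: str = ""
    style_notes: list[str] = field(default_factory=list)

    @classmethod
    def from_config(cls, kb_config: dict) -> "EditorialGuidelines":
        """Read the optional section; a missing section yields empty guidelines."""
        raw = kb_config.get("editorial_guidelines") or {}
        return cls(
            tone=raw.get("tone", ""),
            framework=raw.get("framework", ""),
            sourcing=raw.get("sourcing", ""),
            style_notes=list(raw.get("style_notes", [])),
        )

    def to_prompt_text(self) -> str:
        """Render only the fields that are set, one per line."""
        lines = []
        if self.tone:
            lines.append(f"Tone: {self.tone}")
        if self.framework:
            lines.append(f"Framework: {self.framework}")
        if self.sourcing:
            lines.append(f"Sourcing: {self.sourcing}")
        if self.style_notes:
            lines.append("Style notes:")
            lines.extend(f"- {note}" for note in self.style_notes)
        return "\n".join(lines)
```

Since the section is optional, a KB without it simply contributes an empty string to the prompt.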

Phases

Phase 1: Tier 1 structural validation (effort: M)

`QAService` with `validate_entry()` and `validate_all()` methods

Validation rules per entry type (derived from schema + type metadata)

CLI: `pyrite qa validate [--kb ] [--entry ] [--fix]`

MCP: `kb_qa_validate` read-tier tool

Output: structured issue list, no LLM needed

Phase 2: QA assessment entry type + storage (effort: M)

`qa_assessment` entry type with schema

Link assessments to target entries

Query interface: "show all entries with open issues", "unassessed entries", "verification rate by capture lane"

CLI: `pyrite qa status [--kb ]` — dashboard of assessment state

MCP: `kb_qa_status` read-tier tool
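The three example queries could be expressed over in-memory records roughly as follows. This sketch assumes entries and assessments are plain dicts, that the lane lives in a `capture_lane` field, and that an entry counts as verified once it has at least one `pass` assessment; none of these are confirmed by the design, and a real implementation would presumably query the index rather than scan lists.

```python
from collections import defaultdict


def entries_with_open_issues(assessments: list[dict]) -> list[str]:
    """Target ids whose assessments report more issues found than resolved."""
    open_ids = {
        a["target_entry"]
        for a in assessments
        if a.get("issues_found", 0) > a.get("issues_resolved", 0)
    }
    return sorted(open_ids)


def unassessed_entries(entries: list[dict], assessments: list[dict]) -> list[str]:
    """Entry ids that no assessment points at."""
    assessed = {a["target_entry"] for a in assessments}
    return [e["id"] for e in entries if e["id"] not in assessed]


def verification_rate_by_lane(
    entries: list[dict], assessments: list[dict]
) -> dict[str, float]:
    """Share of entries per capture lane with at least one passing assessment."""
    passed = {a["target_entry"] for a in assessments if a.get("status") == "pass"}
    totals: dict[str, int] = defaultdict(int)
    verified: dict[str, int] = defaultdict(int)
    for entry in entries:
        lane = entry.get("capture_lane", "unassigned")
        totals[lane] += 1
        if entry["id"] in passed:
            verified[lane] += 1
    return {lane: verified[lane] / totals[lane] for lane in totals}
```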

Phase 3: Tier 2 LLM-assisted consistency checks (effort: L)

LLM evaluation prompts using type AI instructions + KB editorial guidelines

Consistency scoring against comparable entries (semantic similarity to find comparables)

Confidence-scored assessments

CLI: `pyrite qa assess [--kb ] [--entry ] [--tier 2]`

MCP: `kb_qa_assess` write-tier tool (creates assessment entries)

Phase 4: Tier 3 factual verification (effort: XL)

Research agent with web search for claim verification

Cross-KB contradiction detection

Source chain verification (do cited sources actually support the claims?)

Confidence-scored factual assessments

CLI: `pyrite qa verify [--kb ] [--entry ]`

Phase 5: Continuous QA pipeline (effort: L) — partially done

~~Post-save hook triggers Tier 1 validation automatically~~ Done: `validate` param on `kb_create`/`kb_update` MCP tools + `qa_on_write: true` KB-level setting in `kb.yaml`. Issues returned as `qa_issues` in MCP response.

Scheduled batch runs for Tier 2/3 (configurable frequency)

QA dashboard in web UI: verification rates, issue trends, coverage gaps

"Entries needing review" collection (virtual collection with QA-based query)

Plugin architecture

The QA system should be domain-agnostic at core:

Core: field validation, type consistency, dedup detection, link integrity

Plugin config: domain-specific evaluation rubrics

- Legal KB: citation accuracy, procedural correctness
- Scientific KB: methodology descriptions, statistical claims
- Investigative KB: sourcing standards, analytical framework consistency

This means the QA service accepts pluggable evaluation criteria, and plugins can register custom Tier 2/3 checks via the plugin protocol.
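A registry for pluggable checks might look like the sketch below. It is not pyrite's plugin protocol: the `CheckRegistry` class, the decorator-based registration, and the toy legal check are hypothetical and only illustrate how domain plugins could contribute Tier 2/3 checks without touching the core.

```python
from collections import defaultdict
from typing import Callable

# A check takes an entry (plain dict here) and returns a list of issue messages.
Check = Callable[[dict], list[str]]


class CheckRegistry:
    """Holds custom checks keyed by tier; only tiers 2 and 3 are pluggable."""

    def __init__(self) -> None:
        self._checks: dict[int, list[tuple[str, Check]]] = defaultdict(list)

    def register(self, tier: int, name: str) -> Callable[[Check], Check]:
        if tier not in (2, 3):
            raise ValueError("plugins may only register Tier 2 or Tier 3 checks")

        def decorator(fn: Check) -> Check:
            self._checks[tier].append((name, fn))
            return fn

        return decorator

    def run(self, tier: int, entry: dict) -> dict[str, list[str]]:
        """Run every check registered for a tier and collect messages by name."""
        return {name: fn(entry) for name, fn in self._checks[tier]}


registry = CheckRegistry()


# Toy example of a domain rubric a legal KB plugin might add.
@registry.register(tier=2, name="legal_citation_present")
def citation_present(entry: dict) -> list[str]:
    return [] if " v. " in entry.get("body", "") else ["no case citation found"]
```

The core service would then call `registry.run(tier, entry)` after its own built-in checks and merge the results into the assessment.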

Dependencies

Tier 1: No dependencies (pure validation against existing schema)

Tier 2: Depends on LLM abstraction service (#6, done) and type metadata (#42, done)

Tier 3: Depends on Tier 2 + web search capability

Phase 5: Depends on hooks system (#24, done) and collections (#61, done)

KB editorial guidelines: Depends on capture lane validation (#72)

Files likely affected

New: `pyrite/services/qa_service.py`

New: `pyrite/models/qa_types.py` (or extension entry type)

Modified: `pyrite/server/mcp_server.py` (new QA tools)

Modified: `pyrite/cli/__init__.py` (new `qa` command group)

New: `pyrite/server/endpoints/qa.py`

Modified: `pyrite/config.py` (editorial_guidelines in KBConfig)

New: `tests/test_qa_service.py`
