Source content verification via LLM (Phase 2) — pyrite

Problem

Even when source URLs are live, there's no verification that the source content actually supports the claims in the KB entry. AI research agents may accurately cite real URLs but misrepresent or hallucinate the content of those sources.

Context

This is Phase 2 of the source verification pipeline. Phase 1 (URL liveness checking) must be complete first. This phase requires LLM integration and has associated API costs.

Scope

Fetch source URL content and extract key claims from the entry body

Use LLM to compare: "Does this source support these claims?"

Flag entries where sources don't appear to support key claims

Store verification results as QA assessment entries linked to the original entry

Support batch processing with cost controls

Acceptance Criteria

`pyrite qa verify-sources --kb=timeline --sample=50` verifies a random sample

Each entry gets a verification score (0-1) based on source-claim alignment

Low-scoring entries are flagged for human review

Cost-controlled: respects `--max-cost` budget parameter

Results stored as QA assessment entries with evidence links

Incremental: skips entries already verified unless `--force`

Open Questions

Which LLM provider to use? Should respect Pyrite's existing LLMService config (BYOK)

What content extraction strategy for web pages? (readability, trafilatura, etc.)

How to handle paywalled sources? (skip with warning? use cached content?)

Source content verification via LLM (Phase 2)backlog_item

Problem

Context

Scope

Acceptance Criteria

Open Questions

Linked from

Links to