Problem
Even when source URLs are live, there's no verification that the source content actually supports the claims in the KB entry. AI research agents may accurately cite real URLs but misrepresent or hallucinate the content of those sources.
Context
This is Phase 2 of the source verification pipeline. Phase 1 (URL liveness checking) must be complete first. This phase requires LLM integration and has associated API costs.
Scope
Fetch source URL content and extract key claims from the entry body
Use LLM to compare: "Does this source support these claims?"
Flag entries where sources don't appear to support key claims
Store verification results as QA assessment entries linked to the original entry
Support batch processing with cost controlsAcceptance Criteria
`pyrite qa verify-sources --kb=timeline --sample=50` verifies a random sample
Each entry gets a verification score (0-1) based on source-claim alignment
Low-scoring entries are flagged for human review
Cost-controlled: respects `--max-cost` budget parameter
Results stored as QA assessment entries with evidence links
Incremental: skips entries already verified unless `--force`Open Questions
Which LLM provider to use? Should respect Pyrite's existing LLMService config (BYOK)
What content extraction strategy for web pages? (readability, trafilatura, etc.)
How to handle paywalled sources? (skip with warning? use cached content?)