Background Embedding Pipelinebacklog_item

performanceaiembedding
1 min read · Edit on Pyrite

Background Embedding Pipeline

Decouple write latency from embedding computation by moving embedding updates to a background pipeline.

Context

Currently, `KBService.create_entry()` and `update_entry()` call `EmbeddingService.embed_entry()` synchronously after indexing. For large entries or slow models, this blocks the API response. The embedding is also silently skipped on failure, meaning some entries may lack embeddings without the user knowing.

Scope

Background queue (SQLite-backed)

  • New `embed_queue` table: `entry_id`, `kb_name`, `queued_at`, `status` (pending/processing/done/failed), `error`, `attempts`
  • On create/update: insert into queue instead of embedding synchronously
  • Worker loop: poll queue, embed, update status
  • Retry with backoff: max 3 attempts before marking failed
  • Worker implementation

  • `EmbeddingWorker` class with `process_queue()` method
  • Runs as a background task in the REST API server (asyncio task started on app startup)
  • CLI command `pyrite index embed --background` for manual triggering
  • Batch processing: embed up to N entries per cycle (configurable, default 10)
  • Graceful shutdown: finish current entry, stop accepting new work
  • Status visibility

  • `GET /api/index/embed-status` — queue depth, processing count, last error
  • `pyrite index embed --status` — CLI equivalent
  • WebSocket event `embed_complete` when an entry is freshly embedded
  • Fallback behavior

  • If no background worker is running (CLI usage, single-request mode): embed synchronously as today
  • REST API server always starts the worker
  • MCP server does not start the worker (MCP is request/response, not long-running)
  • Rationale

    Embedding on every save is the primary source of write latency when sentence-transformers is loaded. Moving to a background queue keeps writes fast while ensuring all entries eventually get embeddings. The queue also provides visibility into embedding failures that are currently swallowed silently.

    References

  • Embedding Service
  • KB Service — current synchronous embedding in CRUD pipeline
  • WebSocket Server — for embed_complete events