Embedding Service
The embedding service provides local vector embeddings for semantic similarity search across knowledge base entries. It uses the `sentence-transformers` library with the `all-MiniLM-L6-v2` model to generate 384-dimensional embeddings, stored in `sqlite-vec` for efficient KNN queries. The entire pipeline runs locally with no external API calls.
Key Files
| File | Purpose | |------|---------| | `pyrite/services/embedding_service.py` | Full implementation: model loading, embedding generation, batch indexing, KNN search |
API / Key Classes
`EmbeddingService`
Initialized with a `PyriteDB` instance and an optional model name (defaults to `all-MiniLM-L6-v2`).
| Method | Description | |--------|-------------| | `embed_text(text)` | Generate a float32 embedding vector for arbitrary text | | `embed_entry(entry_id, kb_name)` | Embed a single entry (title + summary + first 500 chars of body) and upsert into `vec_entry` | | `embed_all(kb_name, force, progress_callback)` | Batch embed all entries for a KB (or all KBs). Returns `{embedded, skipped, errors}` counts. Skips already-embedded entries unless `force=True` | | `search_similar(query, kb_name, limit, max_distance)` | KNN search: embeds query, runs `sqlite-vec MATCH` with cosine distance, filters by KB and distance cutoff (default 1.1), returns up to `limit` results with `distance` and `snippet` fields | | `has_embeddings()` | Check if any embeddings exist in the database | | `embedding_stats()` | Returns `{available, count, total_entries, coverage}` statistics |
Module-Level Functions
| Function | Description | |----------|-------------| | `is_available()` | Check if `sentence-transformers` is installed (optional dependency) | | `_entry_text(entry)` | Combine title + summary + body[:500] into embedding input text | | `_generate_snippet(entry, max_len)` | Generate a display snippet from summary or first paragraph of body | | `_embedding_to_blob(embedding)` / `_blob_to_embedding(blob)` | Serialize/deserialize float32 vectors to bytes for sqlite-vec storage |