Add embedding service pre-warm option to reduce cold-start latency

Problem

The embedding service is lazily initialized — the first semantic search request triggers model loading (1-5s depending on model size). This creates a noticeable delay for the first user/agent query after server startup.
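A minimal sketch of the lazy pattern described above, assuming the service wraps a slow model-loading callable. `LazyEmbeddingService`, `warm()`, `is_ready`, and the `loader` parameter are hypothetical names used for illustration, not Pyrite's actual API. The lock matters once pre-warming exists: a startup pre-warm and an early request can both reach `warm()` at the same time, and only one of them should load the model.

```python
import threading
from typing import Callable, List, Optional

# Hypothetical type: an encoder maps a batch of texts to one vector per text.
Encoder = Callable[[List[str]], List[List[float]]]


class LazyEmbeddingService:
    """Hypothetical sketch: the model loads on first use, or earlier via warm()."""

    def __init__(self, loader: Callable[[], Encoder]) -> None:
        self._loader = loader  # the slow step (model loading, 1-5 s for a real model)
        self._encoder: Optional[Encoder] = None
        self._lock = threading.Lock()

    @property
    def is_ready(self) -> bool:
        """True once the model has been loaded."""
        return self._encoder is not None

    def warm(self) -> None:
        """Load the model exactly once, even if several threads call this concurrently."""
        if self._encoder is not None:
            return  # fast path: already loaded, no locking needed
        with self._lock:
            # Re-check: another thread may have finished loading while we waited.
            if self._encoder is None:
                self._encoder = self._loader()

    def embed(self, texts: List[str]) -> List[List[float]]:
        self.warm()  # lazy path: the first caller pays the load cost
        return self._encoder(texts)
```

Without a pre-warm, the first `embed()` call is the one that blocks on `loader()`, which is exactly the cold-start delay described in the Problem section.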

Solution

1. Add a `PYRITE_PREWARM_EMBEDDINGS=true` environment variable.
2. On server startup (FastAPI `lifespan`), optionally initialize the embedding service and load the model.
3. Add a `/health` endpoint field indicating embedding readiness.
4. Keep lazy loading as the default for CLI and lightweight deployments.
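Steps 1 to 4 can be sketched as a lifespan handler plus a health payload. This is a framework-free sketch under stated assumptions: `EmbeddingServiceStub`, `prewarm_enabled()`, `health_payload()`, and the `status`/`embeddings_ready` field names are hypothetical, and the stub stands in for the real embedding service. The lifespan uses the async-context-manager shape that FastAPI's `lifespan` parameter expects, so it runs without FastAPI installed.

```python
import asyncio
import os
from contextlib import asynccontextmanager
from typing import AsyncIterator, Dict, Optional


class EmbeddingServiceStub:
    """Hypothetical stand-in for the real embedding service."""

    def __init__(self) -> None:
        self.is_ready = False

    def warm(self) -> None:
        # The real service would load the model here (blocking, 1-5 s).
        self.is_ready = True


embedding_service = EmbeddingServiceStub()


def prewarm_enabled() -> bool:
    """Pre-warm only when explicitly enabled; unset means lazy loading (step 4)."""
    raw = os.environ.get("PYRITE_PREWARM_EMBEDDINGS", "")
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@asynccontextmanager
async def lifespan(app: object) -> AsyncIterator[None]:
    task: Optional["asyncio.Task[None]"] = None
    if prewarm_enabled():
        # Load in a worker thread as a background task so startup is not blocked
        # and the event loop keeps serving requests while the model loads.
        task = asyncio.create_task(asyncio.to_thread(embedding_service.warm))
    try:
        yield  # the application serves requests here
    finally:
        if task is not None:
            await task  # let an in-flight load finish before shutdown completes


def health_payload() -> Dict[str, object]:
    """Body for a /health response, including the embedding readiness field (step 3)."""
    return {"status": "ok", "embeddings_ready": embedding_service.is_ready}
```

With FastAPI this would be wired as `app = FastAPI(lifespan=lifespan)`, with the `/health` route returning `health_payload()`. Loading in the background keeps startup fast and gives the readiness field a purpose; awaiting `asyncio.to_thread(...)` directly before `yield` is the alternative if the server should not accept traffic until the model is loaded.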

Files

`pyrite/services/embedding_service.py` — initialization logic

`pyrite/server/api.py` — lifespan event for pre-warm

`pyrite/config.py` — new config option
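For the new config option, a minimal sketch of strict boolean parsing with lazy loading as the default. The real `pyrite/config.py` may use a settings library instead; `EmbeddingSettings`, `parse_bool`, and the accepted spellings are hypothetical. Rejecting unrecognized values keeps a typo such as `PYRITE_PREWARM_EMBEDDINGS=ture` from silently falling back to lazy loading.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical accepted spellings; an unset or empty variable counts as false.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}


def parse_bool(raw: str, name: str) -> bool:
    """Parse a boolean environment value, failing loudly on anything unrecognized."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean (true/false), got {raw!r}")


@dataclass(frozen=True)
class EmbeddingSettings:
    prewarm_embeddings: bool = False  # lazy loading stays the default

    @classmethod
    def from_env(cls, environ: Optional[Mapping[str, str]] = None) -> "EmbeddingSettings":
        # Accepting an explicit mapping makes the parser testable without touching os.environ.
        env = os.environ if environ is None else environ
        raw = env.get("PYRITE_PREWARM_EMBEDDINGS", "")
        return cls(prewarm_embeddings=parse_bool(raw, "PYRITE_PREWARM_EMBEDDINGS"))
```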
