Embeddings & Vector Search
Embeddings turn text into vectors; ANN indexes find nearest neighbors fast. Pick the model for your language and domain, and benchmark recall — defaults are rarely best.
Deep dive
Model selection
- Multilingual content → multilingual-e5 or bge-m3.
- Code search → CodeBERT, voyage-code.
- Latency-critical → smaller models with re-rank.
Always benchmark on your queries — MTEB rankings don't predict your corpus.
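A benchmark on your own queries can be as small as a recall@k check over a labeled sample. The sketch below assumes you already have query and document vectors from a candidate model plus one known-relevant document index per query; the function names and toy vectors are illustrative, not from any library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recall_at_k(query_vecs, doc_vecs, relevant_ids, k=10):
    """Fraction of queries whose labeled relevant doc appears in the top k."""
    hits = 0
    for q, rel in zip(query_vecs, relevant_ids):
        ranked = sorted(range(len(doc_vecs)),
                        key=lambda i: cosine(q, doc_vecs[i]),
                        reverse=True)
        hits += rel in ranked[:k]
    return hits / len(query_vecs)

# Toy data standing in for real embeddings (assumption: 3 docs, 2 queries).
docs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
queries = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.9]]
relevant = [0, 2]  # index of the doc each query should retrieve

print(recall_at_k(queries, docs, relevant, k=1))
```

Run the same function once per candidate model on the same labeled queries and compare the numbers, rather than relying on a leaderboard position.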
Index choice
- HNSW: best quality/latency for <50M vectors in memory.
- IVF-PQ: scales further by compressing vectors with product quantization, at the cost of some recall.
- Hosted: pgvector for <10M and operational simplicity; dedicated stores (Qdrant, Weaviate, Pinecone) when scale demands.
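A back-of-the-envelope memory estimate helps show why the in-memory HNSW ceiling and the IVF-PQ option sit where they do. The per-vector formulas below are rough rules of thumb (similar to those given in the Faiss index guidelines), and the 768 dimensions, M=16 links, and 64-byte PQ codes are assumptions chosen only for illustration.

```python
def flat_bytes(n, dim, bytes_per_value=4):
    """Raw float32 vectors with no index structure."""
    return n * dim * bytes_per_value

def hnsw_bytes(n, dim, m=16):
    """Approximation: float32 vector plus about 2*m 4-byte graph links each."""
    return n * (dim * 4 + m * 2 * 4)

def ivfpq_bytes(n, code_bytes=64, id_bytes=8):
    """Approximation: one PQ code plus one 8-byte id per vector."""
    return n * (code_bytes + id_bytes)

n, dim = 50_000_000, 768  # assumed corpus size and embedding dimension
for name, size in [("flat", flat_bytes(n, dim)),
                   ("hnsw", hnsw_bytes(n, dim)),
                   ("ivf-pq", ivfpq_bytes(n))]:
    print(f"{name:7s} {size / 1e9:8.1f} GB")
```

Under these assumptions HNSW needs roughly 160 GB of RAM at 50M vectors while IVF-PQ needs under 4 GB, which is the scale-versus-recall trade the list describes.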
Hybrid search
Combine BM25 and dense via reciprocal rank fusion (RRF). Catches exact-match queries dense embeddings often miss.
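A minimal sketch of RRF: each document scores the sum of 1 / (k + rank) over every ranked list it appears in, and the fused list is sorted by that score. The constant k=60 is the commonly used default; the document IDs and the two input rankings are made up for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for one query from a BM25 index and a dense index.
bm25_hits = ["doc_err_code", "doc_a", "doc_b"]
dense_hits = ["doc_a", "doc_c", "doc_err_code"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)  # ['doc_a', 'doc_err_code', 'doc_c', 'doc_b']
```

RRF uses only ranks, so BM25 scores and cosine similarities never need to be normalized onto a common scale.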
Real-world example
From production: Internal docs search with pgvector handled 2M chunks at <100 ms p95. That avoided a dedicated vector DB and its ops cost. Migration path: when we hit 10M+ vectors we'll move to Qdrant, but until then it's one less service to operate.
Interview questions
Q1 (senior-level): When do you need a dedicated vector DB?
When pgvector's recall/latency tradeoffs no longer fit — usually past 5–10M vectors or when you need advanced filtering at scale. Below that, pgvector wins on operational simplicity (one database, transactions, backups you already have).
Common mistakes
Picking an embedding model from a leaderboard without testing on your queries.
Storing raw text as the vector ID — bad locality, slow updates.
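One way to avoid the raw-text ID is to derive a short, deterministic ID from the source document and chunk position, and keep the text in the payload or metadata. The sketch below uses UUIDv5 from the standard library; the namespace URL and the "doc_id:chunk_index" scheme are assumptions, not a convention of any particular vector store.

```python
import uuid

# Hypothetical namespace for this corpus; any fixed UUID works.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/docs-search")

def chunk_id(doc_id: str, chunk_index: int) -> str:
    """Deterministic fixed-length ID: same inputs always give the same ID."""
    return str(uuid.uuid5(NAMESPACE, f"{doc_id}:{chunk_index}"))

record = {
    "id": chunk_id("handbook/onboarding.md", 3),
    "text": "Chunk text lives in the payload, not in the ID.",
}
print(record["id"])
```

Because the ID depends only on the document and chunk position, re-embedding an edited chunk overwrites the old entry instead of creating a duplicate.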
Trade-offs
Higher-dimensional embeddings often improve recall, at the cost of more storage and latency.
Quantization saves cost; measure recall loss against an unquantized baseline.
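Measuring recall loss means running the same queries against the float vectors and the quantized ones and comparing the top-k sets. The sketch below uses simple int8 scalar quantization and brute-force search on random data; the scale, sizes, and function names are assumptions for illustration, and a real test would use your own vectors and your index's quantizer.

```python
import random

def quantize_int8(vec, scale):
    """Map floats in [-scale, scale] to integers in [-127, 127], clamping."""
    return [max(-127, min(127, round(x / scale * 127))) for x in vec]

def top_k(query, vectors, k):
    """Exact top-k by dot product (brute force)."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]

def recall_vs_baseline(queries, vectors, quantized, k=10):
    """Average overlap between float top-k and quantized top-k."""
    total = 0.0
    for q in queries:
        exact = set(top_k(q, vectors, k))
        approx = set(top_k(q, quantized, k))
        total += len(exact & approx) / k
    return total / len(queries)

random.seed(0)
dim = 32
vectors = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(500)]
queries = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(20)]
quantized = [quantize_int8(v, scale=1.0) for v in vectors]

recall = recall_vs_baseline(queries, vectors, quantized, k=10)
print(f"recall@10 vs float baseline: {recall:.3f}")
```

The float baseline scores 1.0 against itself by construction, so any drop in the printed number is attributable to quantization.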