AI Integration

Embeddings & Vector Search

Embeddings turn text into vectors; ANN indexes find nearest neighbors fast. Pick the model for your language and domain, and benchmark recall — defaults are rarely best.

Tags: embeddings, vectors, search

Deep dive

Model selection

  • Multilingual content → multilingual-e5 or bge-m3.
  • Code search → CodeBERT, voyage-code.
  • Latency-critical → a smaller embedding model for first-stage retrieval, with a re-ranker over the top candidates.

Always benchmark on your own queries: MTEB rankings don't reliably predict performance on your corpus.
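One way to run that benchmark is recall@k over a small set of hand-labeled queries. The sketch below assumes you already have, per candidate model, a ranked list of document IDs for each query; the names `judgments`, `model_a`, and `model_b` and all IDs are made up for illustration.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    results: Dict[str, List[str]],
    judgments: Dict[str, Set[str]],
    k: int,
) -> float:
    """Average recall@k over every labeled query; missing queries score 0."""
    scores = [
        recall_at_k(results.get(query, []), relevant, k)
        for query, relevant in judgments.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels: query -> IDs a human marked as relevant.
judgments = {"reset password": {"d1", "d7"}, "vpn setup": {"d3"}}

# Hypothetical ranked output of two candidate embedding models.
model_a = {"reset password": ["d1", "d2", "d7"], "vpn setup": ["d9", "d3", "d4"]}
model_b = {"reset password": ["d2", "d1", "d5"], "vpn setup": ["d4", "d9", "d3"]}

print(mean_recall_at_k(model_a, judgments, k=2))  # 0.75
print(mean_recall_at_k(model_b, judgments, k=2))  # 0.25
```

Even a few dozen labeled queries drawn from real search logs are usually enough to separate candidates that look equivalent on a public leaderboard.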

Index choice

  • HNSW: best quality/latency for <50M vectors in memory.
  • IVF-PQ: scales further through product quantization, at the cost of some recall.
  • Storage: pgvector for <10M vectors and operational simplicity; dedicated stores (Qdrant, Weaviate, Pinecone) when scale demands it.
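A back-of-envelope memory estimate helps show why the in-memory threshold matters. This sketch assumes float32 vectors of dimension 768 and an HNSW graph with M = 16 and 4-byte neighbor IDs, counting only the bottom layer (up to 2·M links per node). All three values are illustrative assumptions, and real implementations add their own overhead.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold the vectors themselves (float32 by default)."""
    return num_vectors * dim * bytes_per_value


def hnsw_link_bytes(num_vectors: int, m: int = 16, bytes_per_link: int = 4) -> int:
    """Rough size of the HNSW bottom-layer graph: up to 2*m links per node.

    Upper layers and per-node bookkeeping are ignored, so treat this as a
    lower bound, not an exact figure.
    """
    return num_vectors * 2 * m * bytes_per_link


n, dim = 50_000_000, 768  # assumed corpus size and embedding dimension
vectors_gb = raw_vector_bytes(n, dim) / 1e9
links_gb = hnsw_link_bytes(n) / 1e9
int8_gb = raw_vector_bytes(n, dim, bytes_per_value=1) / 1e9

print(f"float32 vectors: {vectors_gb:.1f} GB")  # 153.6 GB
print(f"HNSW links:      {links_gb:.1f} GB")    # 6.4 GB
print(f"int8 vectors:    {int8_gb:.1f} GB")     # 38.4 GB
```

Under these assumptions the vectors, not the graph, dominate memory, which is why quantized indexes become attractive as the corpus grows.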

Hybrid search

Combine BM25 and dense retrieval via reciprocal rank fusion (RRF). The lexical side catches exact-match queries (identifiers, error codes, rare terms) that dense embeddings often miss.
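A minimal RRF sketch: each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The constant k = 60 is a commonly used default rather than a tuned value, and the document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    A document's score is the sum of 1 / (k + rank) over the lists that
    contain it, with ranks starting at 1. Ties are broken by ID so the
    output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for a query containing an exact error code.
bm25_hits = ["doc_err42", "doc_a", "doc_b"]   # lexical match ranks it first
dense_hits = ["doc_a", "doc_c", "doc_err42"]  # dense retrieval ranks it third

print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# ['doc_a', 'doc_err42', 'doc_c', 'doc_b']
```

RRF uses only ranks, so the BM25 and cosine scores never need to be normalized onto a shared scale.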

Real-world example

From production

Internal docs search with pgvector handled 2M chunks at <100 ms p95. Avoided a dedicated vector DB and the ops cost. Migration path: when we hit 10M+ vectors we'll move to Qdrant — but until then, one less service to operate.

Interview questions

Q1. When do you need a dedicated vector DB?

When pgvector's recall/latency tradeoffs no longer fit — usually past 5–10M vectors or when you need advanced filtering at scale. Below that, pgvector wins on operational simplicity (one database, transactions, backups you already have).

Common mistakes

  • Picking an embedding model from a leaderboard without testing on your queries.

  • Storing raw text as the vector ID — bad locality, slow updates.

Trade-offs

  • Higher-dimensional embeddings often improve recall, at the cost of more storage and higher latency.

  • Quantization saves cost; measure recall loss against an unquantized baseline.
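To make the quantization trade-off measurable, compare the top-k results from quantized vectors against exact search on the original floats. The sketch below uses a simple symmetric int8 scalar quantizer and brute-force dot-product search on random data. It is an illustration under those assumptions, not how any particular vector store quantizes.

```python
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(query: Vector, vectors: List[Vector], k: int) -> List[int]:
    """Indices of the k vectors with the highest dot product (brute force)."""
    order = sorted(range(len(vectors)), key=lambda i: -dot(query, vectors[i]))
    return order[:k]


def quantize_int8(vec: Vector, scale: float) -> Vector:
    """Symmetric scalar quantization: map [-scale, scale] onto [-127, 127]."""
    return [float(max(-127, min(127, round(x / scale * 127)))) for x in vec]


def recall_vs_baseline(queries: List[Vector], vectors: List[Vector], k: int) -> float:
    """Mean overlap between quantized top-k and exact (unquantized) top-k."""
    scale = max(abs(x) for v in vectors for x in v) or 1.0
    # One shared scale only multiplies every score by the same constant,
    # so rankings can be computed on the integer codes directly.
    quantized = [quantize_int8(v, scale) for v in vectors]
    total = 0.0
    for q in queries:
        exact = set(top_k(q, vectors, k))
        approx = set(top_k(q, quantized, k))
        total += len(exact & approx) / k
    return total / len(queries)


random.seed(0)  # synthetic data; use real embeddings and queries in practice
corpus = [[random.gauss(0, 1) for _ in range(16)] for _ in range(200)]
queries = [[random.gauss(0, 1) for _ in range(16)] for _ in range(5)]
print(f"recall@10 vs float baseline: {recall_vs_baseline(queries, corpus, 10):.2f}")
```

Run the same comparison on real embeddings and real queries before committing to a quantized index, because the recall loss depends heavily on the model and the data distribution.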
