AI Integration/senior/freq 4/5

Embeddings & Vector Search

Embeddings turn text into vectors; approximate nearest neighbor (ANN) indexes find the closest vectors quickly. Pick the model for your language and domain, and benchmark recall on your own data: defaults are rarely best.

embeddings, vectors, search

Deep dive

Model selection

Multilingual content → multilingual-e5 or bge-m3.
Code search → CodeBERT, voyage-code.
Latency-critical → a smaller embedding model for first-pass retrieval, then a re-ranker on the top results.

Always benchmark on your own queries: MTEB rankings do not reliably predict performance on your corpus.
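One way to run that benchmark is to label a small set of your own queries with the chunk IDs that should come back, then compare candidate models on recall@k. The sketch below is a minimal harness under assumptions: `recall_at_k`, `mean_recall_at_k`, and the sample IDs are hypothetical, and `results` stands in for whatever your search call returns.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("each query needs at least one relevant ID")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(
    results: Dict[str, List[str]],   # query -> ranked IDs returned by the system
    labels: Dict[str, Set[str]],     # query -> IDs a human marked as relevant
    k: int,
) -> float:
    """Average recall@k over every labeled query."""
    scores = [recall_at_k(results.get(q, []), rel, k) for q, rel in labels.items()]
    return sum(scores) / len(scores)


# Hypothetical labeled queries and the ranked output of one candidate model.
labels = {
    "reset password": {"doc-12", "doc-40"},
    "vpn setup": {"doc-7"},
}
results = {
    "reset password": ["doc-12", "doc-3", "doc-40"],
    "vpn setup": ["doc-9", "doc-7", "doc-1"],
}
score_at_2 = mean_recall_at_k(results, labels, k=2)
```

Run the same labeled set against each candidate model and index configuration; the comparison matters more than the absolute number.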

Index choice

HNSW: best recall/latency trade-off when the index fits in memory, roughly up to 50M vectors.
IVF-PQ: scales further by compressing vectors with product quantization, at the cost of some recall.
Store: pgvector (inside Postgres) for <10M vectors and operational simplicity; dedicated stores (Qdrant, Weaviate, Pinecone) when scale demands it.
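The "in memory" constraint on HNSW is easy to sanity-check with arithmetic before committing to an index. This is a rough sketch, not a vendor formula: it assumes float32 vectors and about `2 * m` four-byte neighbor links per vector on the base layer, and it ignores upper layers and metadata. The function name, the 768-dimension figure, and the default `m = 16` are illustrative assumptions.

```python
def hnsw_memory_gib(num_vectors: int, dim: int, m: int = 16) -> float:
    """Rough RAM estimate for an in-memory HNSW index, in GiB.

    Assumptions (not taken from any specific library's documentation):
    - vectors are stored as float32 (4 bytes per dimension)
    - each vector keeps roughly 2 * m neighbor links of 4 bytes on layer 0
    - upper layers, IDs, and allocator overhead are ignored
    """
    vector_bytes = dim * 4
    link_bytes = 2 * m * 4
    return num_vectors * (vector_bytes + link_bytes) / 2**30


small = hnsw_memory_gib(2_000_000, 768)     # a 2M-chunk docs corpus
large = hnsw_memory_gib(50_000_000, 768)    # near the in-memory ceiling
print(f"2M x 768d:  {small:.1f} GiB")
print(f"50M x 768d: {large:.1f} GiB")
```

Under these assumptions, 2M vectors of 768 dimensions need about 6 GiB while 50M need about 149 GiB, which is where compressed indexes such as IVF-PQ start to look attractive.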

Hybrid search

Combine BM25 and dense retrieval via reciprocal rank fusion (RRF). The keyword side catches exact-match queries (IDs, error codes, rare terms) that dense embeddings often miss.
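RRF needs only the ranked ID lists from each retriever, not their raw scores. A minimal sketch, assuming each retriever returns IDs in rank order; the function name and sample IDs are hypothetical, and `k = 60` is a commonly used constant rather than a tuned value.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["err-4012", "doc-a", "doc-b"]    # exact match on an error code
dense_hits = ["doc-a", "doc-c", "err-4012"]   # semantic neighbors
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
# fused == ["doc-a", "err-4012", "doc-c", "doc-b"]
```

Documents that both retrievers rank highly rise to the top, while a strong hit from only one side still survives into the fused list.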

Real-world example

From production

Internal docs search with pgvector handled 2M chunks at <100 ms p95, so we avoided a dedicated vector DB and its ops cost. Migration path: when we hit 10M+ vectors we'll move to Qdrant; until then, it's one less service to operate.

Interview questions

1 senior-level

Q1. When do you need a dedicated vector DB?

When pgvector's recall/latency tradeoffs no longer fit — usually past 5–10M vectors or when you need advanced filtering at scale. Below that, pgvector wins on operational simplicity (one database, transactions, backups you already have).

Common mistakes

Picking an embedding model from a leaderboard without testing on your queries.
Storing raw text as the vector ID — bad locality, slow updates.
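For the second mistake, one alternative is a fixed-size, deterministic ID derived from the document and chunk position, with the raw text kept in the payload. This is a sketch under assumptions: `ChunkRecord`, `make_chunk`, the namespace URL, and the `doc_id:chunk_index` key format are all hypothetical choices, not a convention of any particular vector store.

```python
import uuid
from dataclasses import dataclass

# Hypothetical namespace for this corpus; any fixed UUID works.
CHUNK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/docs-search")


@dataclass
class ChunkRecord:
    id: str          # fixed-size key stored in the index
    doc_id: str
    chunk_index: int
    text: str        # raw text lives in the payload, not in the key


def make_chunk(doc_id: str, chunk_index: int, text: str) -> ChunkRecord:
    """Build a record whose ID depends on position, not on the text itself."""
    chunk_id = uuid.uuid5(CHUNK_NAMESPACE, f"{doc_id}:{chunk_index}")
    return ChunkRecord(str(chunk_id), doc_id, chunk_index, text)


before = make_chunk("handbook.md", 3, "Old wording of the section.")
after = make_chunk("handbook.md", 3, "Edited wording of the section.")
# Same ID for both, so an edit becomes an in-place upsert of one record.
```

Because the ID stays stable when the text changes, re-embedding an edited chunk overwrites the old vector instead of leaving an orphan behind.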

Trade-offs

Higher-dimensional embeddings often improve recall, but cost more storage and latency.
Quantization saves cost; measure recall loss against an unquantized baseline.
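Measuring that loss means running the same queries against the unquantized and quantized vectors and comparing the top-k sets. The sketch below uses synthetic Gaussian vectors and a simple int8 scalar quantizer purely for illustration; the helper names, sizes, and the shared-scale scheme are assumptions, and real numbers depend entirely on your embeddings and your index's quantizer.

```python
import random
from typing import List, Sequence


def quantize_int8(vec: Sequence[float], scale: float) -> List[int]:
    """Map each float to an integer in [-127, 127] using a shared scale."""
    return [max(-127, min(127, round(x / scale * 127))) for x in vec]


def top_k(query: Sequence[float], corpus: Sequence[Sequence[float]], k: int) -> List[int]:
    """Brute-force top-k indices by dot product."""
    scores = [sum(q * c for q, c in zip(query, vec)) for vec in corpus]
    return sorted(range(len(corpus)), key=lambda i: -scores[i])[:k]


def recall_vs_baseline(baseline: List[int], candidate: List[int]) -> float:
    """Share of the baseline's top-k that the candidate also returned."""
    return len(set(baseline) & set(candidate)) / len(baseline)


random.seed(0)  # synthetic data; replace with real embeddings and queries
dim, n, k = 32, 500, 10
corpus = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(20)]

# One scale for the whole corpus, taken from the largest absolute component.
scale = max(abs(x) for vec in corpus for x in vec)
corpus_q = [quantize_int8(vec, scale) for vec in corpus]

recalls = []
for q in queries:
    exact = top_k(q, corpus, k)        # unquantized baseline
    approx = top_k(q, corpus_q, k)     # query stays float; stored vectors are int8
    recalls.append(recall_vs_baseline(exact, approx))

mean_recall = sum(recalls) / len(recalls)
print(f"recall@{k} of int8 vs float baseline: {mean_recall:.3f}")
```

The same loop works for any quantized index: swap the second `top_k` call for a query against the compressed index and keep the exact search as ground truth.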
