Dense Retrieval

Dense retrieval is the family of retrieval techniques in which queries and documents are encoded as dense vectors in a continuous embedding space and matched by cosine similarity or dot product, in contrast to sparse retrieval, where each vocabulary term is its own dimension. Dense retrieval became practical for production with the 2019-2021 wave of bi-encoder models: DPR (Dense Passage Retrieval, Facebook, 2020), ANCE, Contriever, and the explosion of embedding models that followed. The basic recipe: encode every document with an embedding model offline and store the vectors in a vector database; at query time, encode the question with the same model and run approximate k-nearest-neighbor search over an HNSW or IVF index for sublinear lookup. Dense retrieval excels at paraphrase, multilingual, and conceptual queries that lexical methods miss: "how do I terminate an employee for cause" can retrieve a clause that uses the word "discharge" instead of "terminate." It struggles with rare entities (product SKUs, drug brand names, legal citations) that the embedding model never saw often enough to encode meaningfully. The state of the art combines dense retrieval with BM25 via hybrid search, often followed by a cross-encoder reranker. A how-to with sentence-transformers in Python: from sentence_transformers import SentenceTransformer; model = SentenceTransformer('BAAI/bge-large-en-v1.5'); doc_embeds = model.encode(docs, normalize_embeddings=True); query_embed = model.encode([query], normalize_embeddings=True); scores = query_embed @ doc_embeds.T. Because both sets of embeddings are normalized to unit length, the dot product in the last step equals cosine similarity. AI governance teams track the embedding model version for every index because reindexing a billion documents to swap embedding models is expensive enough to be a deliberate platform decision.
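To make the hybrid idea concrete, here is a minimal sketch of one common way to merge a dense ranking with a lexical one: reciprocal rank fusion (RRF). RRF is an assumption chosen for illustration, not necessarily the fusion method any particular product uses. The three-dimensional vectors, the document IDs, and the lexical ranking are hypothetical stand-ins for real embedding-model and BM25 output, and the brute-force scan stands in for an HNSW or IVF index.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dense_rank(query_vec, doc_vecs, k):
    """Exact (brute-force) top-k by cosine similarity.

    At scale an approximate index such as HNSW or IVF would replace this scan.
    """
    q = normalize(query_vec)
    scored = []
    for doc_id, vec in doc_vecs.items():
        d = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, d)), doc_id))
    # Highest similarity first; ties broken by document ID for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scored[:k]]

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to a document's score."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))

# Toy 3-dimensional "embeddings" (hypothetical values, not real model output).
doc_vecs = {
    "discharge_clause": [0.9, 0.1, 0.0],
    "vacation_policy": [0.1, 0.9, 0.1],
    "sku_ab1234_spec": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]

dense_ranking = dense_rank(query_vec, doc_vecs, k=3)

# Hypothetical lexical ranking standing in for BM25 output.
lexical_ranking = ["sku_ab1234_spec", "discharge_clause"]

hybrid_ranking = rrf_fuse([dense_ranking, lexical_ranking])
print(dense_ranking)
print(hybrid_ranking)
```

In this toy setup the dense path ranks the rare-entity document last, while the lexical path ranks it first; fusing the two lifts it above the unrelated document without displacing the best semantic match. A cross-encoder reranker, if used, would then rescore the top of the fused list.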

Dense retrieval as the newest layer on a 25-year search stack: Centralpoint added dense retrieval as the third leg of its hybrid index, sitting beside the lexical and natural-language paths Oxcyon refined over 25 years for clients like the US Congress and Samsung. Vectors are generated on-premise by embedded models (Llama, Qwen, Nomic), token usage is metered per skill, and dense-retrieval-aware chatbots deploy through one line of JavaScript.

Related Keywords:
Dense Retrieval, Oxcyon, AI, AI Governance, Generative AI, Inference, Inferencing, RAG, Prompts, Skills Manager
