Reranking

Reranking is the second-pass retrieval step in production RAG pipelines: an initial set of candidates from dense retrieval or hybrid search is re-scored by a more accurate but slower model, typically a cross-encoder, before being passed to the LLM. The first-pass retriever optimizes for recall (don't miss anything relevant) over a corpus that may hold millions of documents, returning roughly 50-100 candidates. The reranker then optimizes for precision over those 50-100 and returns the 5-10 most relevant passages. Widely used rerankers include the commercial Cohere Rerank (rerank-3 and rerank-multilingual-3), Voyage rerank-2, and Jina Reranker, as well as the open-weight BGE Reranker.

A typical how-to: run BM25 and dense retrieval in parallel, merge the two result lists with reciprocal rank fusion and keep the top 50, send those 50 candidates plus the query to a rerank endpoint, take the top 5 by reranker score, and pass those 5 to the LLM.

Reranking typically lifts retrieval quality (measured by nDCG@10 or recall@5) by 10-30 percentage points on real corpora and is one of the highest-ROI optimizations in a RAG stack. The latency cost is real: a Cohere Rerank call on 50 candidates adds ~150-300ms, so production systems often skip reranking for low-stakes queries and apply it only to high-value ones. AI governance teams log reranker scores alongside final LLM outputs so that an audit can reproduce why a particular passage was deemed authoritative.
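
The retrieve, fuse, rerank, and log flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the function names, the `use_reranker` switch, the audit record layout, and the RRF constant k=60 are all hypothetical choices, and `overlap_score` is a toy stand-in for a real cross-encoder or hosted rerank endpoint.

```python
import json
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc ids; each doc scores sum(1 / (k + rank))."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query terms in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def retrieve_and_rerank(
    query: str,
    bm25_hits: Sequence[str],
    dense_hits: Sequence[str],
    docs: Dict[str, str],
    score_fn: Callable[[str, str], float],
    first_pass_k: int = 50,
    top_n: int = 5,
    use_reranker: bool = True,
) -> Tuple[List[Tuple[str, Optional[float]]], dict]:
    # First pass: merge lexical and dense results, keep the top candidates.
    candidates = reciprocal_rank_fusion([bm25_hits, dense_hits])[:first_pass_k]
    scored: List[Tuple[str, Optional[float]]]
    if use_reranker:
        # Second pass: score every (query, passage) pair, best first.
        scored = [(d, score_fn(query, docs[d])) for d in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
    else:
        # Low-stakes path: skip the reranker and keep the fused order.
        scored = [(d, None) for d in candidates]
    top = scored[:top_n]
    # Audit record: which candidates were considered and which were selected.
    audit = {
        "query": query,
        "reranked": use_reranker,
        "candidates": candidates,
        "selected": [{"doc_id": d, "score": s} for d, s in top],
    }
    return top, audit


docs = {
    "d1": "reranking uses a cross-encoder to rescore candidates",
    "d2": "bm25 is a lexical ranking function",
    "d3": "dense retrieval embeds queries and passages",
    "d4": "a cross-encoder reads query and passage together",
}
top, audit = retrieve_and_rerank(
    "cross-encoder reranking", ["d2", "d1", "d4"], ["d3", "d1", "d2"],
    docs, overlap_score, top_n=2,
)
print([doc_id for doc_id, _ in top])  # ['d1', 'd4']
print(json.dumps(audit))
```

In a real deployment, `score_fn` would wrap a cross-encoder or a rerank API call, and the audit record would be written to whatever log store the governance team uses. Passing `use_reranker=False` shows the latency-saving path for low-stakes queries, where the fused first-pass order is used as-is.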

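The two quality metrics named above, nDCG@10 and recall@5, are simple to compute once relevance labels exist. The sketch below is illustrative only: the function names are hypothetical, it uses the linear-gain form of DCG (some evaluations use 2^rel - 1 instead), and it assumes the ideal ordering can be derived from the labels of the returned list itself.

```python
import math
from typing import Collection, Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: relevance at rank i is divided by log2(i + 1)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: Collection[str], k: int = 5) -> float:
    """Share of all relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Relevance labels (1 = relevant) in ranked order, before and after reranking.
before = [0, 1, 0, 1]
after = [1, 1, 0, 0]
print(ndcg_at_k(before) < ndcg_at_k(after))  # True
print(ndcg_at_k(after))  # 1.0
```

Comparing these numbers on a labeled query set with and without the reranker is one way to check whether the quality lift justifies the added latency on a given corpus.
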
Reranking on a 25-year-old relevance discipline: Long before vector embeddings existed, Oxcyon spent 25 years tuning relevance for client search engines (synonyms, weights, boosts, audience filters). Centralpoint's hybrid index couples that lexical pedigree with modern dense retrieval and on-premise reranking, with rerank scores audit-logged, tokens metered per skill, and chatbots deployed through one line of JavaScript.


Related Keywords:
Reranking, Oxcyon, AI, AI Governance, Generative AI, Inference, Inferencing, RAG, Prompts, Skills Manager