Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) combines a language model with a search system that fetches relevant documents before answering. Grounding the answer in retrieved evidence reduces hallucination and lets the model use knowledge beyond its training cutoff. A typical RAG pipeline: embed documents into a vector database, embed the user query, retrieve the most similar documents, include them in the LLM prompt, and generate the answer. Modern RAG systems add reranking, hybrid search (lexical plus semantic), query rewriting, and citation tracking. RAG is the dominant pattern in enterprise AI — used in legal research (Harvey, Casetext), customer support (Intercom Fin, Zendesk), internal-knowledge chatbots (Glean, Notion AI, Microsoft 365 Copilot), and healthcare (clinical decision support grounded in guidelines). Popular RAG frameworks include LangChain, LlamaIndex, Haystack, and Vespa. AI governance, AI compliance, and AI risk management programs treat RAG as a core responsible AI architecture — but it requires careful attention to source quality, permissions, citation accuracy, and which content the AI is allowed to retrieve for each user.
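The pipeline above (embed, retrieve by similarity, build a grounded prompt) can be sketched in a few lines of Python. This is a toy illustration only: the bag-of-words "embedding" and the in-memory index stand in for a real embedding model and vector database, and all function names here are hypothetical, not from any particular framework.

```python
# Minimal RAG retrieval sketch. The embed() function below is a toy
# bag-of-words stand-in; a production system would call a learned
# embedding model and store vectors in a vector database.
import math
from collections import Counter

def embed(text):
    # Toy "embedding": token counts. Assumption for illustration only.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

documents = [
    "RAG retrieves documents before the model answers",
    "Vector databases store document embeddings",
    "Reranking reorders retrieved passages by relevance",
]

# Index step: embed every document once, up front.
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query, k=2):
    # Query step: embed the query and rank documents by similarity.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query, k=2):
    # The retrieved documents become grounding context in the LLM prompt.
    context = "\n".join(f"- {doc}" for doc in retrieve(query, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

A production version would add the steps the text mentions next: reranking the top-k results, hybrid lexical-plus-semantic scoring, and tracking which retrieved passage supports each claim in the answer.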
Centralpoint Is RAG-Native: Oxcyon's Centralpoint AI Governance Platform combines retrieval against your governed content with model-agnostic LLM access — ChatGPT, Gemini, Llama, or embedded. Centralpoint meters every call, keeps prompts and skills on-premise, and embeds RAG-powered chatbots into your sites and portals with a single line of JavaScript.