Retrieval-Augmented Generation (RAG)

Retrieval-augmented generation (RAG) is a technique in which a language model is given relevant external information (retrieved from a database, documents, or the web) as context before it generates an answer. Grounding responses in real source data improves accuracy and reduces hallucination.

How it works

When a user asks a question, a RAG system first searches a knowledge base (often using vector embeddings) for the most relevant passages, then feeds those passages to the language model alongside the question. The model answers using that retrieved context, so it can cite current, organization-specific, or proprietary information it was never trained on.
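The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the knowledge base, the `embed`, `retrieve`, and `build_prompt` helpers, and the prompt wording are all hypothetical, and a bag-of-words count vector stands in for a learned embedding model. The final call to a language model is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; a real system would index far more text.
KNOWLEDGE_BASE = [
    "Employees accrue 20 days of paid vacation per year.",
    "The VPN must be used when accessing internal systems remotely.",
    "Expense reports are due by the fifth business day of each month.",
]


def embed(text: str) -> Counter:
    """Toy embedding: word counts (an assumed stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Place the retrieved passages alongside the question for the model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    question = "How many days of paid vacation do employees get?"
    print(build_prompt(question, retrieve(question, KNOWLEDGE_BASE)))
```

In a real deployment the toy `embed` function would typically be replaced by an embedding model and the sorted list by a vector index, but the shape of the pipeline (embed the question, rank passages, assemble the prompt) stays the same.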

Why it matters

RAG is one of the most common enterprise AI architectures because it lets companies apply LLMs to their own data safely and verifiably: answers can be checked against the retrieved sources. Talent capable of building RAG systems is a noted bottleneck to enterprise adoption; see Enterprise AI Statistics 2026.

Related terms: Large Language Model · Agentic AI