RAG (Retrieval-Augmented Generation)
Definition
Retrieval-augmented generation (RAG) is an AI architecture that supplements a large language model's static training knowledge with real-time retrieval from a private or external knowledge base. RAG reduces hallucinations by grounding LLM responses in verified source documents, making it the standard pattern for enterprise AI assistants built on proprietary data.
A vanilla LLM only knows what it learned during training. RAG adds a retrieval step: at query time, the system searches a vector database for the most relevant documents and passes them as context to the LLM before generation. The model is then prompted to answer from the retrieved passages rather than from memory alone.
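The "passes them as context" step amounts to assembling a prompt around the retrieved text. The following is a minimal sketch, not a prescribed format: the function name build_grounded_prompt, the instruction wording, and the sample chunks are all hypothetical, and the retrieval itself is assumed to have already happened.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that asks the model to answer only from the given chunks."""
    # Number each retrieved chunk so the answer can cite it as [1], [2], ...
    sources = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the sources below. "
        "Cite sources by number, and say so if they do not contain the answer.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical chunks that a retriever might have returned for this question.
retrieved = [
    "Refunds are available within 30 days of purchase.",
    "Refund requests are submitted through the billing portal.",
]
prompt = build_grounded_prompt(
    "How long do customers have to request a refund?", retrieved
)
print(prompt)
```

Numbering the sources is one simple way to make citations possible; whether the model actually stays within the sources still depends on the model and the instructions.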
RAG architecture components
- Document ingestion pipeline -- split, embed, and index source documents
- Vector database -- stores embeddings for fast similarity search (Pinecone, pgvector, Weaviate)
- Retriever -- returns the top-k most relevant chunks at query time
- LLM generator -- synthesizes a response from retrieved context
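The four components above can be sketched end to end in a few lines. This is an illustration under loud assumptions: the "embedding" is a toy word-count vector rather than a real embedding model, the "vector database" is a Python list, the generator is a stub that only returns the prompt it would send, and every function name and sample document is hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, max_words: int = 8) -> list[str]:
    """Ingestion: split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts, not a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Ingestion: split, embed, and index every chunk (a list stands in for the database)."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in split_into_chunks(doc)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Retriever: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def generate(query: str, context: list[str]) -> str:
    """Generator stub: returns the prompt an LLM would receive instead of calling one."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"


documents = [
    "Employees accrue vacation days monthly. Unused vacation days expire in March.",
    "The office VPN requires a hardware token. Lost tokens are replaced by the IT desk.",
]
index = build_index(documents)
query = "When do vacation days expire?"
top_chunks = retrieve(index, query, k=2)
print(generate(query, top_chunks))
```

Fixed-size word windows are the crudest chunking strategy and can cut a sentence in half, as they do here; real pipelines usually split on sentence or section boundaries and often overlap adjacent chunks.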
When RAG beats fine-tuning
Use RAG when your knowledge base changes frequently or when you need source citations. Fine-tuning is better for teaching tone, format, or domain-specific reasoning patterns that do not require up-to-date facts.
Related terms
LLM (Large Language Model)
A large language model (LLM) is a deep-learning model trained on billions to trillions of text tokens to predict and generate human-readable language. LLMs such as GPT-4, Claude, and Gemini power chatbots, document summarization, code generation, and AI workflow automation -- and serve as the reasoning engine inside RAG systems and AI agents.
Fine-Tuning
Fine-tuning is the process of further training a pre-trained large language model on a curated dataset of domain-specific examples to adjust its tone, format, or reasoning patterns. Because the desired style no longer has to be spelled out through long instructions or few-shot examples in every request, a fine-tuned model can match a specialized style with 10-100x fewer prompt tokens at inference time, reducing API cost and latency for high-volume production workloads.
Vector Database
A vector database is a specialized data store that indexes and retrieves high-dimensional numerical embeddings by similarity rather than by exact match. Vector databases power retrieval-augmented generation (RAG) systems by finding the documents most semantically relevant to a user query in milliseconds, even across millions of stored records.
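Retrieval by similarity rather than exact match can be illustrated with a tiny in-memory store. This is a sketch only: the class InMemoryVectorStore, the record IDs, and the three-dimensional vectors are made up for illustration, and the search is a brute-force linear scan. Production vector databases typically rely on approximate nearest-neighbor indexes to stay fast across millions of records.

```python
import math


class InMemoryVectorStore:
    """Brute-force similarity search over stored vectors (illustrative only)."""

    def __init__(self) -> None:
        self._records: list[tuple[str, list[float]]] = []

    def add(self, record_id: str, vector: list[float]) -> None:
        """Store a vector under an ID."""
        self._records.append((record_id, vector))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        """Return the k stored records with the highest cosine similarity to the query."""
        scored = [(rid, self._cosine(query, vec)) for rid, vec in self._records]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add("refund-policy", [0.9, 0.1, 0.0])
store.add("vpn-setup", [0.0, 0.2, 0.9])
store.add("billing-faq", [0.7, 0.3, 0.1])

# No stored vector equals the query exactly; the nearest ones are returned anyway.
results = store.search([0.8, 0.2, 0.0], k=2)
for record_id, score in results:
    print(f"{record_id}: {score:.3f}")
```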
Embeddings
Embeddings are dense numerical vectors -- typically 768 to 3,072 floating-point numbers -- that represent the semantic meaning of a piece of text, image, or other data. Documents with similar meaning produce embeddings that are close together in vector space, enabling AI systems to find relevant content by meaning rather than keyword matching.
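"Close together in vector space" is usually measured with cosine similarity. The sketch below uses hand-made three-dimensional vectors purely for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions, and the labels and values here are assumptions, not output from any model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical embeddings: the first two texts are related, the third is not.
cat = [0.90, 0.80, 0.10]
kitten = [0.85, 0.75, 0.20]
invoice = [0.10, 0.20, 0.95]

sim_related = cosine_similarity(cat, kitten)
sim_unrelated = cosine_similarity(cat, invoice)
print(f"cat vs kitten:  {sim_related:.3f}")
print(f"cat vs invoice: {sim_unrelated:.3f}")
```

The related pair scores much higher than the unrelated pair even though the texts share no keywords, which is the property that lets a retriever match by meaning.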