RAG (Retrieval-Augmented Generation)

Definition

Retrieval-augmented generation (RAG) is an AI architecture that supplements a large language model's static training knowledge with query-time retrieval from a private or external knowledge base. RAG reduces hallucinations by grounding LLM responses in retrieved source documents, which makes it a widely used pattern for enterprise AI assistants built on proprietary data.

On its own, an LLM knows only what it learned during training. RAG adds a retrieval step: at query time, the system searches a vector database for the most relevant documents and passes them to the LLM as context before generation. The model is then prompted to answer from the retrieved passages rather than from memory alone.
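The query-time step described above can be sketched in a few lines. This is a minimal illustration, not a production design: the three-dimensional vectors in CORPUS are hand-written stand-ins for what an embedding model would produce, and the names retrieve and build_prompt are hypothetical helpers introduced here. The LLM call itself is left out; the sketch stops at the prompt the model would receive.

```python
import math

# Hypothetical pre-computed embeddings. A real system would obtain these
# from an embedding model and store them in a vector database.
CORPUS = [
    ("Refunds are issued within 14 days of purchase.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.1, 0.9, 0.1]),
    ("Refund requests require an order number.", [0.8, 0.0, 0.3]),
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vector, k=2):
    """Return the k passages whose embeddings are closest to the query."""
    ranked = sorted(
        CORPUS,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Place the retrieved passages ahead of the question as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


query_vector = [1.0, 0.0, 0.1]  # stand-in for the embedded question
passages = retrieve(query_vector, k=2)
prompt = build_prompt("How long do refunds take?", passages)
print(prompt)
```

The prompt ends up containing the two refund passages and omits the unrelated one, so the model sees only material relevant to the question.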

RAG architecture components

  • Document ingestion pipeline -- split, embed, and index source documents
  • Vector database -- stores embeddings for fast similarity search (Pinecone, pgvector, Weaviate)
  • Retriever -- returns the top-k most relevant chunks at query time
  • LLM generator -- synthesizes a response from retrieved context
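As a rough sketch of how the four components fit together, the example below wires up an ingestion function, an in-memory index, a retriever, and a generator hook. Everything here is an assumption made for illustration: embed uses simple word counts in place of a real embedding model, InMemoryVectorIndex stands in for a vector database, and llm is any callable that takes a prompt string and returns text.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=50, overlap=10):
    """Ingestion: split text into overlapping word windows (overlap < max_words)."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts. A stand-in for a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryVectorIndex:
    """Stand-in for a vector database: stores chunks with their embeddings."""

    def __init__(self):
        self.entries = []  # list of (chunk_text, embedding)

    def add_document(self, text, max_words=50, overlap=10):
        for chunk in split_into_chunks(text, max_words, overlap):
            self.entries.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Retriever: return the top-k chunks by similarity to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def answer(question, index, llm, k=2):
    """Generator: pass retrieved chunks plus the question to the LLM callable."""
    context = "\n".join(index.search(question, k))
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


index = InMemoryVectorIndex()
for doc in [
    "Employees accrue two vacation days per month.",
    "Unused vacation days expire at the end of March.",
    "Expense reports must be filed within thirty days of travel.",
]:
    index.add_document(doc)

top_chunk = index.search("When do unused vacation days expire?", k=1)
```

Swapping the toy pieces for real ones changes the implementations of embed and InMemoryVectorIndex but not the overall flow: ingest, index, retrieve, generate.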

When RAG beats fine-tuning

Use RAG when your knowledge base changes frequently or when you need source citations. Fine-tuning is better for teaching tone, format, or domain-specific reasoning patterns that do not require up-to-date facts.
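The two RAG advantages named above, fresh facts and source citations, both come from keeping knowledge in data rather than in model weights. The sketch below is a simplified illustration with hypothetical names (knowledge_base, upsert, cited_context) and made-up pricing text: updating a fact is a data write, and each passage carries a source label the model can cite.

```python
# Hypothetical in-memory store mapping a source id to its current text.
knowledge_base = {}


def upsert(source_id, text):
    """Add or replace a document. No model retraining is involved."""
    knowledge_base[source_id] = text


def cited_context(source_ids):
    """Format passages with their source ids so answers can cite them."""
    return "\n".join(f"[{sid}] {knowledge_base[sid]}" for sid in source_ids)


upsert("pricing.md", "The Pro plan costs $40 per month.")
before = cited_context(["pricing.md"])

# The price changes: overwrite the document and the next query sees it.
upsert("pricing.md", "The Pro plan costs $45 per month.")
after = cited_context(["pricing.md"])
```

With fine-tuning, the same price change would require another training run, and the model would have no source label to point to.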
