
RAG (Retrieval-Augmented Generation): Building an LLM That Knows Your Company's Internal Data

Understand RAG architecture from scratch: how embeddings, vector databases, and retrieval work together to enable LLMs to answer from internal documents without fine-tuning.

The Problem RAG Solves


LLMs are trained on data up to a certain cutoff date and know nothing about your company's internal documents: SOPs, reports, policies, or recent emails. Fine-tuning the model for every update is prohibitively expensive.


RAG solves this more elegantly: before answering, the system automatically retrieves relevant documents and includes them in the LLM's context.


## How It Works: 3 Phases


Phase 1 – Indexing (done once):

1. Cut documents into ~500-word chunks

2. Convert each chunk into an 'embedding', a numerical vector representing its semantic meaning

3. Store all embeddings in a vector database (Pinecone, ChromaDB)
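Step 1 above (chunking) is usually the part you write yourself. A minimal sketch of a word-based chunker, with a small overlap between chunks so sentences at a boundary are not lost (the function name, sizes, and overlap are illustrative choices, not part of any library):

```python
def chunk_text(text, max_words=500, overlap=50):
    """Split text into ~max_words-word chunks.

    Consecutive chunks share `overlap` words so that a sentence
    cut at a boundary still appears whole in one of the chunks.
    """
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
    return chunks
```

In practice you may prefer to split on paragraph or sentence boundaries rather than raw word counts, but the sliding-window idea is the same.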


Phase 2 – Retrieval (when a query arrives):

1. Convert user question into an embedding

2. Find k most similar embeddings (cosine similarity)

3. Retrieve the corresponding document text


Phase 3 – Generation:

Combine: [system instructions] + [relevant docs] + [user question] → send to LLM


## Simple Python Implementation


```python
from openai import OpenAI
import chromadb

client = OpenAI()
chroma = chromadb.Client()
collection = chroma.create_collection('docs')


def index_doc(text, doc_id):
    """Phase 1: embed a chunk and store it in the vector database."""
    emb = client.embeddings.create(
        model='text-embedding-3-small', input=text
    ).data[0].embedding
    collection.add(embeddings=[emb], documents=[text], ids=[doc_id])


def rag_query(question, k=3):
    """Phases 2 and 3: retrieve the k most similar chunks, then generate."""
    q_emb = client.embeddings.create(
        model='text-embedding-3-small', input=question
    ).data[0].embedding
    results = collection.query(query_embeddings=[q_emb], n_results=k)
    context = '\n'.join(results['documents'][0])
    resp = client.chat.completions.create(
        model='gpt-4o',
        messages=[
            {'role': 'system',
             'content': f'Answer based on this context:\n{context}'},
            {'role': 'user', 'content': question},
        ],
    )
    return resp.choices[0].message.content
```


## Keys to RAG Success


- Chunking: too small loses context, too large wastes tokens

- Metadata filtering: filter docs by department or date

- Hybrid search: combine semantic + keyword search
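Metadata filtering can be understood as a pre-filter applied before similarity search (vector databases such as ChromaDB expose this via a `where` argument on queries). A minimal self-contained sketch of the idea, using a hypothetical record schema with `department` and `date` fields:

```python
def filter_candidates(records, department=None, after_date=None):
    """Keep only records whose metadata matches the given filters.

    `records` is a list of dicts with a 'metadata' dict; the field
    names here (department, date) are illustrative, not a standard.
    Dates are ISO-8601 strings, so plain string comparison works.
    """
    kept = []
    for rec in records:
        meta = rec["metadata"]
        if department is not None and meta.get("department") != department:
            continue
        if after_date is not None and meta.get("date", "") < after_date:
            continue
        kept.append(rec)
    return kept
```

Filtering first shrinks the candidate set, so the similarity search is both faster and less likely to surface a semantically similar but irrelevant document from the wrong department or an outdated version.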