# The Problem RAG Solves
LLMs are trained on data up to a certain cutoff date and know nothing about your company's internal documents: SOPs, reports, policies, or recent emails. Fine-tuning the model for every update is prohibitively expensive.
RAG solves this more elegantly: before answering, the system automatically retrieves relevant documents and includes them in the LLM's context.
## How It Works: 3 Phases
Phase 1 - Indexing (done once):
1. Cut documents into ~500-word chunks
2. Convert each chunk into an 'embedding', a numerical vector representing its semantic meaning
3. Store all embeddings in a vector database (Pinecone, ChromaDB)
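The chunking in step 1 can be sketched as a simple word-based splitter. This is a minimal illustration, not any library's API; the `chunk_words` and `overlap` parameters are assumptions (a small overlap keeps sentences cut at a boundary intact in at least one chunk):

```python
def chunk_text(text, chunk_words=500, overlap=50):
    """Split text into ~chunk_words-word pieces with a small overlap."""
    words = text.split()
    step = chunk_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(' '.join(words[start:start + chunk_words]))
    return chunks
```

Production systems often split on sentence or paragraph boundaries instead of raw word counts, but the idea is the same.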
Phase 2 - Retrieval (when a query arrives):
1. Convert user question into an embedding
2. Find k most similar embeddings (cosine similarity)
3. Retrieve the corresponding document text
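The vector database handles step 2 internally, but the similarity measure itself is simple. A minimal sketch of cosine similarity and a brute-force top-k search (the helper names are illustrative):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): 1.0 = same direction, 0.0 = orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_emb, embeddings, k=3):
    # Rank stored embeddings by similarity to the query; return their indices
    ranked = sorted(range(len(embeddings)),
                    key=lambda i: cosine_similarity(query_emb, embeddings[i]),
                    reverse=True)
    return ranked[:k]
```

Real vector databases use approximate nearest-neighbor indexes (e.g. HNSW) to avoid scanning every embedding, but the score they optimize is the same.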
Phase 3 โ Generation:
Combine: [system instructions] + [relevant docs] + [user question] → send to LLM
## Simple Python Implementation
```python
from openai import OpenAI
import chromadb

client = OpenAI()
chroma = chromadb.Client()
collection = chroma.create_collection('docs')

def index_doc(text, doc_id):
    # Embed the chunk and store it alongside the raw text
    emb = client.embeddings.create(
        model='text-embedding-3-small', input=text
    ).data[0].embedding
    collection.add(embeddings=[emb], documents=[text], ids=[doc_id])

def rag_query(question, k=3):
    # Embed the question with the same model used at indexing time
    q_emb = client.embeddings.create(
        model='text-embedding-3-small', input=question
    ).data[0].embedding
    # Fetch the k most similar chunks
    results = collection.query(query_embeddings=[q_emb], n_results=k)
    context = '\n'.join(results['documents'][0])
    resp = client.chat.completions.create(
        model='gpt-4o',
        messages=[
            {'role': 'system',
             'content': f'Answer based on this context:\n{context}'},
            {'role': 'user', 'content': question},
        ])
    return resp.choices[0].message.content
```
## Keys to RAG Success
- Chunking: chunks that are too small lose context; chunks that are too large waste tokens and dilute relevance
- Metadata filtering: restrict retrieval to the right documents, e.g. by department or date
- Hybrid search: combine semantic (embedding) search with keyword search for exact terms
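One common way to implement hybrid search is reciprocal rank fusion (RRF), which merges the ranked result lists from semantic and keyword search without needing to compare their raw scores. A minimal sketch (the function name is illustrative; `k=60` is the constant commonly used in the RRF literature):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids into one ranking.

    Each list contributes 1 / (k + rank + 1) to a document's score,
    so documents ranked highly by multiple searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, a document ranked second by semantic search and first by keyword search will usually beat one that only appears in a single list.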