Document Loaders and Retrieval: LangChain RAG Foundations
Retrieval-augmented generation (RAG) grounds LLM responses in your data. Instead of relying on a model's training knowledge, RAG retrieves relevant documents and feeds them to the model. LangChain's document loaders ingest PDFs, web pages, or databases; text splitters chunk large documents; embeddings convert text to vectors; and vector stores index and retrieve relevant passages.
I built a chatbot that hallucinated constantly until I integrated RAG. Same model, but with access to our documentation. Suddenly it cited sources and gave accurate answers. That's the power of RAG.
Document Loaders: Ingesting Text from Sources
Document loaders read from various sources and return a list of Document objects:
from langchain_community.document_loaders import PyPDFLoader, TextLoader, WebBaseLoader
# Load a PDF
pdf_loader = PyPDFLoader("document.pdf")
documents = pdf_loader.load()
# Load a plain text file
text_loader = TextLoader("notes.txt")
documents = text_loader.load()
# Load from a web page
web_loader = WebBaseLoader("https://python.langchain.com/docs/")
documents = web_loader.load()
print(documents[0].page_content) # The text
print(documents[0].metadata) # Source, page number, etc.
Common loaders:
- PyPDFLoader: PDF files
- CSVLoader: CSV data
- DirectoryLoader: Multiple files from a directory
- WebBaseLoader: Web pages
- GitLoader: Files from a local or cloned Git repository
- UnstructuredURLLoader: HTML with fallback parsing
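To make the shape of a loader's output concrete, here is a minimal standard-library sketch. `SimpleDocument` and `load_text_files` are hypothetical names invented for illustration, not LangChain classes; the only thing taken from the text above is that each document carries `page_content` and `metadata`.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loader's output: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_files(directory: str, pattern: str = "*.txt") -> list[SimpleDocument]:
    """Read every matching file and record its path as the source."""
    docs = []
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        docs.append(SimpleDocument(page_content=text, metadata={"source": str(path)}))
    return docs
```

Keeping the source path in `metadata` is what later lets a RAG answer cite where a passage came from.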
Text Splitters: Chunking Documents
Large documents must be split into chunks that fit within the model's context window and are small enough for retrieval to return focused, relevant passages:
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Create a splitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # Characters per chunk
    chunk_overlap=200,  # Overlap between chunks to preserve context
    separators=["\n\n", "\n", " ", ""]  # Try paragraphs first, then lines, then words, then characters
)
# Split documents
chunks = splitter.split_documents(documents)
print(f"Created {len(chunks)} chunks")
print(f"First chunk: {chunks[0].page_content[:200]}...")
Key parameters:
- chunk_size: Larger = more context per chunk, fewer searches needed; smaller = more granular retrieval
- chunk_overlap: Prevents important information from being cut off
- separators: Order matters; split on logical boundaries first
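The interaction between `chunk_size` and `chunk_overlap` is easier to see on a tiny input. The sketch below is a simplified fixed-window splitter, not the recursive separator-based algorithm; `sliding_window_chunks` is a hypothetical helper written only to show how overlap repeats the tail of one chunk at the head of the next.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # How far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(text, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
# abcdefghij
# ghijklmnop
# mnopqrstuv
# stuvwxyz
```

Each chunk starts with the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is long enough.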
For code or structured text, use Language-specific splitters:
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter
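code_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=512,
    chunk_overlap=100
)
# python_code is any Python source held as a string (a short illustrative sample here)
python_code = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))\n"
code_chunks = code_splitter.split_text(python_code)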
Embeddings: Converting Text to Vectors
Embeddings convert text into numerical vectors that capture semantic meaning. Similar texts have similar vectors:
from langchain_openai import OpenAIEmbeddings
from langchain_community.embeddings import OllamaEmbeddings
# Use OpenAI embeddings (requires API key)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Or use local Ollama embeddings
embeddings = OllamaEmbeddings(model="nomic-embed-text")
# Embed a single text
vector = embeddings.embed_query("What is Python?")
print(len(vector))  # 768 for nomic-embed-text; 1536 for text-embedding-3-small
# Embed multiple texts
texts = ["Python is a language", "JavaScript is also a language"]
vectors = embeddings.embed_documents(texts)
OpenAI embeddings are high quality but billed per token. Ollama embeddings run locally with no per-token cost, though quality depends on the model you pull and is often lower. Choose based on your accuracy/cost tradeoff.
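"Similar texts have similar vectors" usually means their cosine similarity is high. The sketch below computes it by hand; the three-dimensional vectors are invented for illustration and do not come from any embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration (real embeddings have hundreds of dimensions)
python_vec = [0.9, 0.1, 0.0]
javascript_vec = [0.8, 0.2, 0.1]
cooking_vec = [0.0, 0.1, 0.9]

related = cosine_similarity(python_vec, javascript_vec)
unrelated = cosine_similarity(python_vec, cooking_vec)
print(related > unrelated)  # True
```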
Vector Stores: Indexing and Retrieval
Vector stores index embeddings and retrieve the most similar documents:
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
# Create embeddings
embeddings = OpenAIEmbeddings()
# Create vector store from documents
vector_store = FAISS.from_documents(
    documents=chunks,
    embedding=embeddings
)
# Retrieve similar documents
query = "How do I use LangChain chains?"
similar_docs = vector_store.similarity_search(query, k=3)
for doc in similar_docs:
    print(f"Source: {doc.metadata['source']}")
    print(f"Content: {doc.page_content[:200]}...")
Popular vector stores:
- FAISS: Fast, in-memory, great for prototyping
- Pinecone: Managed cloud, scales to billions of vectors
- Weaviate: Open-source; self-hosted or managed cloud
- Chroma: Lightweight, embedded in Python
- Milvus: Open-source vector database
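Underneath, every store in this list answers the same question: which stored vectors are closest to the query vector? A brute-force sketch of that idea follows. `TinyVectorStore` and `toy_embed` are hypothetical names, and word counts stand in for a real embedding model; production stores use approximate indexes rather than scoring every item.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical embedding: word counts instead of a learned vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Brute-force index: keeps every vector and scores all of them per query."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self._items.append((text, toy_embed(text)))

    def similarity_search(self, query: str, k: int = 3) -> list[str]:
        query_vec = toy_embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(query_vec, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = TinyVectorStore()
store.add_texts([
    "chains combine prompts and models",
    "loaders read pdf files",
    "splitters chunk long documents",
])
top = store.similarity_search("how do chains combine models", k=1)
print(top)  # ['chains combine prompts and models']
```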
Building a RAG Retriever
Combine embeddings and vector store into a retriever:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI
# Basic retriever
retriever = vector_store.as_retriever(
    search_kwargs={"k": 3}  # Return top 3 matches
)
# Or use MultiQueryRetriever: generates multiple queries to improve recall
model = ChatOpenAI(model="gpt-4o-mini")
retriever = MultiQueryRetriever.from_llm(
    retriever=retriever,
    llm=model
)
# Retrieve documents
results = retriever.invoke("How do chains work?")
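The multi-query idea itself is simple: ask several rephrasings of the question and keep the union of what comes back. The sketch below shows only that merge step, under the assumption that the query variants are already available; here they are hardcoded where an LLM would normally generate them, and `fake_index` is a hypothetical stand-in for a real retriever.

```python
from typing import Callable


def multi_query_retrieve(
    queries: list[str],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """Run every query variant and return the union of results in first-seen order."""
    seen = set()
    merged = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical stand-in for a retriever: a fixed lookup table
fake_index = {
    "How do chains work?": ["doc-chains", "doc-lcel"],
    "What is a LangChain chain?": ["doc-chains", "doc-intro"],
    "How are runnables composed?": ["doc-lcel", "doc-runnables"],
}

merged_results = multi_query_retrieve(list(fake_index), lambda q: fake_index.get(q, []))
print(merged_results)  # ['doc-chains', 'doc-lcel', 'doc-intro', 'doc-runnables']
```

The trade-off is extra LLM and search calls per question in exchange for passages that a single phrasing would have missed.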
Building a Complete RAG Chain
Combine retrieval with generation:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Retriever
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
# Prompt template
prompt = ChatPromptTemplate.from_template("""
Answer the question using only the provided context.
Context:
{context}
Question: {question}
Answer:
""")
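# Join retrieved documents into a single context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
# RAG chain: retrieve with the question text, format context, then generate
# (model is the ChatOpenAI instance created in the previous section)
rag_chain = (
    {
        "context": (lambda x: x["question"]) | retriever | format_docs,
        "question": lambda x: x["question"],
    }
    | prompt
    | model
    | StrOutputParser()
)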
# Run it
result = rag_chain.invoke({"question": "How do I build a chain?"})
print(result)
The retriever receives the question, returns relevant documents, and passes them as context to the prompt.
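It can help to see the text the model actually receives once retrieval has run. The sketch below fills the same prompt wording with plain string formatting; `build_prompt` and the two passages are hypothetical, and no retriever or model is involved.

```python
PROMPT_TEMPLATE = """Answer the question using only the provided context.
Context:
{context}
Question: {question}
Answer:"""


def build_prompt(question: str, passages: list[str]) -> str:
    """Join retrieved passages and substitute them into the template."""
    context = "\n\n".join(passages)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Hypothetical retrieved passages
passages = [
    "Chains are built by piping runnables together with the | operator.",
    "A prompt template fills named variables before the model is called.",
]
final_prompt = build_prompt("How do I build a chain?", passages)
print(final_prompt)
```

Printing the assembled prompt like this is a quick way to debug a RAG chain: if the right passage is not in the context, no prompt wording will fix the answer.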
RAG Pipeline Comparison
| Component | Purpose | Complexity |
|---|---|---|
| Document Loader | Ingest text from sources | Very low |
| Text Splitter | Chunk documents into passages | Low |
| Embeddings | Convert text to vectors | Very low (API call) |
| Vector Store | Index and search vectors | Low |
| Retriever | Expose vector store search through a standard query interface | Low |
| RAG Chain | Combine retrieval and generation | Medium |
Key Takeaways
- Document loaders ingest text from PDFs, web pages, databases, and files
- Text splitters chunk documents with controlled overlap to preserve context
- Embeddings convert text to semantic vectors for similarity search
- Vector stores index embeddings and retrieve the most relevant documents
- Retrievers wrap vector stores behind a common interface and can add query logic (e.g., multi-query expansion)
- RAG chains retrieve documents and feed them as context to the model
- For production, persist vector stores to disk or cloud (Pinecone, Weaviate)
Frequently Asked Questions
What chunk size and overlap should I use?
Start with 1000 characters and a 200-character overlap. For dense documents, smaller chunks (500) improve precision. For sparse documents, larger chunks (2000) improve recall. Experiment.
Which vector store should I use?
FAISS for local prototyping, Pinecone for production cloud, Chroma for lightweight embedded use. All support similarity search; choose based on scale and deployment model.
How much do embeddings cost?
OpenAI charges per token: text-embedding-3-small is ~$0.02 per 1M tokens. A 1000-word document is ~1300 tokens, so ~$0.00003 per document. Free local alternatives (Ollama) exist, though their quality is often lower.
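The arithmetic scales linearly, so a small helper makes it easy to estimate a whole corpus. The per-document figures below are the ones from the answer above; the 10,000-document corpus is a hypothetical size chosen for illustration.

```python
def embedding_cost_usd(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of embedding total_tokens at a given price per 1M tokens."""
    return total_tokens / 1_000_000 * usd_per_million_tokens


# ~1300 tokens per document at ~$0.02 per 1M tokens
per_document = embedding_cost_usd(1300, 0.02)
# Hypothetical corpus of 10,000 such documents
corpus = embedding_cost_usd(1300 * 10_000, 0.02)

print(f"{per_document:.6f} USD per document")   # 0.000026 USD per document
print(f"{corpus:.2f} USD for 10,000 documents")  # 0.26 USD for 10,000 documents
```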
Can I update the vector store after creating it?
Yes. Use vector_store.add_documents(new_chunks) to add documents, or vector_store.delete(ids=...) with the IDs of the documents to remove. For large indices, recreating is sometimes faster.
What if my documents are structured (JSON, tables)?
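Use loaders that preserve structure (JSONLoader, CSVLoader) and splitters that respect boundaries. Alternatively, convert each record to plain text before splitting, keeping field names next to their values: for retrieval, each chunk carrying its own context matters more than the original layout.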
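One way to convert a structured record to text without losing its field names is to flatten it into "path: value" lines. This is a minimal sketch under that assumption; `record_to_text` is a hypothetical helper and the sample record is invented for illustration.

```python
import json


def record_to_text(record: dict, prefix: str = "") -> str:
    """Flatten a nested dict into 'path: value' lines so field names survive as text."""
    lines = []
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            lines.append(record_to_text(value, path))
        else:
            lines.append(f"{path}: {value}")
    return "\n".join(lines)


raw = '{"name": "FAISS", "type": "vector store", "limits": {"persistence": "manual"}}'
flattened = record_to_text(json.loads(raw))
print(flattened)
# name: FAISS
# type: vector store
# limits.persistence: manual
```

Treating each flattened record as one chunk keeps a value together with the keys that give it meaning, which is the context an embedding needs.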