Document Loaders and Retrieval: LangChain RAG Foundations

Retrieval-augmented generation (RAG) grounds LLM responses in your data. Instead of relying only on what a model learned during training, a RAG pipeline retrieves relevant documents at query time and passes them to the model as context. LangChain's document loaders ingest PDFs, web pages, or databases; text splitters chunk large documents; embeddings convert text to vectors; and vector stores index and retrieve relevant passages.

I built a chatbot that hallucinated constantly until I integrated RAG. Same model, but with access to our documentation. Suddenly it cited sources and gave accurate answers. That's the power of RAG.

Document Loaders: Ingesting Text from Sources

Document loaders read from various sources and return a list of Document objects:

from langchain_community.document_loaders import PyPDFLoader, TextLoader, WebBaseLoader

# Load a PDF
pdf_loader = PyPDFLoader("document.pdf")
documents = pdf_loader.load()

# Load a plain text file
text_loader = TextLoader("notes.txt")
documents = text_loader.load()

# Load from a web page
web_loader = WebBaseLoader("https://python.langchain.com/docs/")
documents = web_loader.load()

print(documents[0].page_content) # The text
print(documents[0].metadata) # Source, page number, etc.

Common loaders:

  • PyPDFLoader: PDF files
  • CSVLoader: CSV data
  • DirectoryLoader: Multiple files from a directory
  • WebBaseLoader: Web pages
  • GitLoader: Git repositories (GithubFileLoader reads files through the GitHub API)
  • UnstructuredURLLoader: HTML with fallback parsing
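
To make the loader contract concrete, here is a minimal sketch in plain Python of what a loader does conceptually: read a source and return a list of objects carrying text plus metadata. SimpleDocument and load_text_file are hypothetical names for illustration, not LangChain classes, and the real loaders handle encodings, pagination, and richer metadata.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str) -> list[SimpleDocument]:
    """Read one UTF-8 text file and wrap it in a single-item list."""
    text = Path(path).read_text(encoding="utf-8")
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Demo with a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = str(Path(tmp) / "notes.txt")
    Path(demo_path).write_text("RAG grounds answers in your data.", encoding="utf-8")
    docs = load_text_file(demo_path)

print(docs[0].page_content)  # The text
print(docs[0].metadata)      # Where it came from
```

Keeping the source in metadata is what later lets a RAG answer cite where a passage came from.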

Text Splitters: Chunking Documents

Large documents must be split into chunks that fit the model's context window and are small enough for retrieval to return only the relevant passages:

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create a splitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # Maximum characters per chunk
    chunk_overlap=200,  # Characters shared between neighboring chunks to preserve context
    separators=["\n\n", "\n", " ", ""],  # Try paragraphs first, then lines, then words, then characters
)

# Split documents
chunks = splitter.split_documents(documents)

print(f"Created {len(chunks)} chunks")
print(f"First chunk: {chunks[0].page_content[:200]}...")
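
The idea behind recursive splitting can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's implementation: it ignores overlap and drops the separator at chunk boundaries. The function name recursive_split is an assumption for this example.

```python
def recursive_split(text: str, chunk_size: int, separators: list[str]) -> list[str]:
    """Split text into pieces of at most chunk_size, trying coarse separators first."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # Keep merging small neighbors
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # This piece is still too large: retry with finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Intro paragraph.\n\n"
    "Second paragraph that is quite a bit longer than the first one."
)
pieces = recursive_split(sample, chunk_size=40, separators=["\n\n", "\n", " ", ""])
for piece in pieces:
    print(repr(piece))
```

The short first paragraph survives intact, while the long second paragraph falls through to word-level splitting, which is the behavior the separator order is meant to produce.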

Key parameters:

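  • chunk_size: Larger chunks carry more context each and leave fewer chunks to index; smaller chunks give more granular, precise retrieval
  • chunk_overlap: Repeats the end of one chunk at the start of the next so information that spans a boundary is not cut off
  • separators: Order matters. The splitter tries each separator in turn, so list logical boundaries first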
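
To see how chunk_size and chunk_overlap interact, here is a minimal fixed-width sketch in plain Python. It ignores separators entirely, and chunk_with_overlap is a hypothetical helper for illustration, not part of LangChain.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slide a window of chunk_size characters, stepping by size minus overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    result = []
    for start in range(0, len(text), step):
        result.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The window already reached the end of the text
    return result


demo_chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo_chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, so a phrase that straddles a boundary still appears whole in at least one chunk. The cost is more chunks to embed and store.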

For code or structured text, use language-specific splitters:

from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Example source to split; replace with your own code as a string
python_code = '''
def greet(name):
    return f"Hello, {name}"
'''

code_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=512,
    chunk_overlap=100,
)

code_chunks = code_splitter.split_text(python_code)

Embeddings: Converting Text to Vectors

Embeddings convert text into numerical vectors that capture semantic meaning. Similar texts have similar vectors:

from langchain_openai import OpenAIEmbeddings
from langchain_community.embeddings import OllamaEmbeddings

# Use OpenAI embeddings (requires the OPENAI_API_KEY environment variable)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Or use local Ollama embeddings instead (requires a running Ollama server)
# embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Embed a single text
vector = embeddings.embed_query("What is Python?")
print(len(vector)) # 1536 dimensions by default for text-embedding-3-small

# Embed multiple texts
texts = ["Python is a language", "JavaScript is also a language"]
vectors = embeddings.embed_documents(texts)

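OpenAI embeddings are high-quality but billed per token and require network access. Ollama embeddings run locally with no per-token cost, but their quality depends on the model you pull and is often lower. Choose based on your accuracy/cost tradeoff, and use the same embedding model for indexing and querying, since vectors from different models are not comparable.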
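
"Similar vectors" is usually measured with cosine similarity. The sketch below computes it in plain Python on made-up three-dimensional vectors. The numbers are invented for illustration and are not output from any real embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy vectors; real embeddings have hundreds or thousands of dimensions.
python_vec = [0.9, 0.1, 0.0]
javascript_vec = [0.8, 0.2, 0.1]
cooking_vec = [0.0, 0.1, 0.9]

sim_languages = cosine_similarity(python_vec, javascript_vec)
sim_unrelated = cosine_similarity(python_vec, cooking_vec)
print(f"python vs javascript: {sim_languages:.3f}")
print(f"python vs cooking:    {sim_unrelated:.3f}")
```

The two programming-language vectors point in nearly the same direction and score close to 1, while the unrelated vector scores close to 0.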

Vector Stores: Indexing and Retrieval

Vector stores index embeddings and retrieve the most similar documents:

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Create embeddings
embeddings = OpenAIEmbeddings()

# Create vector store from documents
vector_store = FAISS.from_documents(
    documents=chunks,
    embedding=embeddings,
)

# Retrieve similar documents
query = "How do I use LangChain chains?"
similar_docs = vector_store.similarity_search(query, k=3)

for doc in similar_docs:
    # Not every loader sets "source", so fall back to a placeholder
    print(f"Source: {doc.metadata.get('source', 'unknown')}")
    print(f"Content: {doc.page_content[:200]}...")

Popular vector stores:

  • FAISS: Fast, in-memory, great for prototyping
  • Pinecone: Managed cloud, scales to billions of vectors
  • Weaviate: Open-source, self-hosted
  • Chroma: Lightweight, embedded in Python
  • Milvus: Open-source vector database
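
Under the hood, a similarity search embeds the query, scores it against every stored vector, and returns the top k. Here is a brute-force sketch in plain Python. ToyVectorStore and the bag-of-words toy_embed function are hypothetical stand-ins: real stores use learned embeddings and approximate indexes to avoid scanning everything.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical 'embedding': word counts instead of a learned vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Brute-force index: stores (text, vector) pairs and scans them all."""

    def __init__(self, texts: list[str]):
        self._entries = [(text, toy_embed(text)) for text in texts]

    def similarity_search(self, query: str, k: int = 3) -> list[str]:
        query_vec = toy_embed(query)
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine(query_vec, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "chains combine prompts and models",
    "loaders read pdf files",
    "vector stores index embeddings",
])
top = store.similarity_search("how do chains combine models", k=1)
print(top)
```

A full scan like this is fine for a few thousand chunks. The stores listed above exist largely because it stops being fine at larger scales.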

Building a RAG Retriever

Combine embeddings and vector store into a retriever:

from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

# Basic retriever
base_retriever = vector_store.as_retriever(
    search_kwargs={"k": 3}  # Return top 3 matches
)

# Or use MultiQueryRetriever: an LLM rewrites the question several ways to improve recall
model = ChatOpenAI(model="gpt-4o-mini")
retriever = MultiQueryRetriever.from_llm(
    retriever=base_retriever,
    llm=model,
)

# Retrieve documents
results = retriever.invoke("How do chains work?")
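
The multi-query idea itself is simple: run several phrasings of the question and merge the results without duplicates. The plain-Python sketch below shows that merge step only. The query variants are hand-written here (in LangChain an LLM generates them), and keyword_retrieve is a hypothetical keyword matcher standing in for a real retriever.

```python
from typing import Callable

CORPUS = [
    "chains pipe one step into the next",
    "a pipeline is a sequence of runnables",
    "loaders read files from disk",
]


def keyword_retrieve(query: str, k: int) -> list[str]:
    """Hypothetical retriever: return texts sharing at least one word with the query."""
    words = set(query.lower().split())
    hits = [text for text in CORPUS if words & set(text.lower().split())]
    return hits[:k]


def multi_query_retrieve(
    queries: list[str], retrieve: Callable[[str, int], list[str]], k: int = 3
) -> list[str]:
    """Run every query variant and merge the results, keeping first-seen order."""
    seen, merged = set(), []
    for query in queries:
        for doc in retrieve(query, k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


single = keyword_retrieve("how do chains work", 3)
variants = ["how do chains work", "what is a pipeline", "composing runnables into chains"]
merged = multi_query_retrieve(variants, keyword_retrieve)
print(len(single), len(merged))
```

The single query finds one passage. The three variants together find two, which is the recall gain multi-query retrieval is after, at the cost of extra LLM and search calls.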

Building a Complete RAG Chain

Combine retrieval with generation:

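from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Model and retriever
model = ChatOpenAI(model="gpt-4o-mini")
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

def format_docs(docs):
    # Join the retrieved documents into one context string
    return "\n\n".join(doc.page_content for doc in docs)

# Prompt template
prompt = ChatPromptTemplate.from_template("""
Answer the question using only the provided context.

Context:
{context}

Question: {question}

Answer:
""")

# RAG chain: retrieve, format context, then generate
rag_chain = (
    {
        # The retriever expects a string, so pull the question out of the input dict first
        "context": itemgetter("question") | retriever | format_docs,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | StrOutputParser()
)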

# Run it
result = rag_chain.invoke({"question": "How do I build a chain?"})
print(result)

The retriever receives the question, returns relevant documents, and passes them as context to the prompt.
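
What the model finally sees is just a string: the retrieved passages joined together and dropped into the template. This plain-Python sketch shows that assembly step without any LangChain objects. The passages are invented examples, and build_prompt is a hypothetical helper.

```python
PROMPT_TEMPLATE = (
    "Answer the question using only the provided context.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)


def format_docs(passages: list[str]) -> str:
    """Join retrieved passages into one context block, separated by blank lines."""
    return "\n\n".join(passages)


def build_prompt(question: str, passages: list[str]) -> str:
    """Fill the template with the question and the formatted context."""
    return PROMPT_TEMPLATE.format(context=format_docs(passages), question=question)


# Invented passages standing in for retriever output.
retrieved = [
    "Chains are composed from smaller steps.",
    "Each step receives the previous step's output.",
]
prompt_text = build_prompt("How do I build a chain?", retrieved)
print(prompt_text)
```

Printing the assembled prompt like this is a quick way to debug a RAG chain: if the answer is wrong, check first whether the right passages made it into the context.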

RAG Pipeline Comparison

Component | Purpose | Complexity
Document Loader | Ingest text from sources | Very low
Text Splitter | Chunk documents into passages | Low
Embeddings | Convert text to vectors | Very low (API call)
Vector Store | Index and search vectors | Low
Retriever | Wrap vector store with LLM-aware queries | Low
RAG Chain | Combine retrieval and generation | Medium

Key Takeaways

  • Document loaders ingest text from PDFs, web pages, databases, and files
  • Text splitters chunk documents with controlled overlap to preserve context
  • Embeddings convert text to semantic vectors for similarity search
  • Vector stores index embeddings and retrieve the most relevant documents
  • Retrievers wrap vector stores behind a common query interface and can add behavior such as multi-query expansion
  • RAG chains retrieve documents and feed them as context to the model
  • For production, persist vector stores to disk or cloud (Pinecone, Weaviate)

Frequently Asked Questions

What chunk size and overlap should I use?

Start with 1000-character chunks and a 200-character overlap. For dense documents, smaller chunks (500) improve precision. For sparse documents, larger chunks (2000) improve recall. Experiment, and judge the results on a sample of real questions rather than by feel.

Which vector store should I use?

FAISS for local prototyping, Pinecone for production cloud, Chroma for lightweight embedded use. All support similarity search; choose based on scale and deployment model.

How much do embeddings cost?

OpenAI charges per token: text-embedding-3-small is ~$0.02 per 1M tokens. A 1000-word document is ~1300 tokens, so ~$0.00003 per document. Local models served through Ollama have no per-token cost, though their retrieval quality is often lower.
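
The arithmetic is easy to script when estimating a whole corpus. This sketch uses the price and token count quoted above; the 10,000-document corpus size is an assumption chosen for illustration.

```python
def embedding_cost_usd(num_tokens: int, price_per_million_usd: float) -> float:
    """Cost of embedding num_tokens at a given price per one million tokens."""
    return num_tokens / 1_000_000 * price_per_million_usd


PRICE_PER_MILLION = 0.02   # USD, figure quoted above for text-embedding-3-small
TOKENS_PER_DOCUMENT = 1300  # Roughly a 1000-word document

per_document = embedding_cost_usd(TOKENS_PER_DOCUMENT, PRICE_PER_MILLION)
# Hypothetical corpus of 10,000 such documents
corpus_cost = embedding_cost_usd(TOKENS_PER_DOCUMENT * 10_000, PRICE_PER_MILLION)

print(f"${per_document:.6f} per document")         # $0.000026 per document
print(f"${corpus_cost:.2f} for 10,000 documents")  # $0.26 for 10,000 documents
```

Chunk overlap raises the real bill somewhat, because overlapping characters are embedded more than once.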

Can I update the vector store after creating it?

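Yes. Use vector_store.add_documents(new_chunks) to add documents and vector_store.delete(ids=[...]) to remove them by ID; support for deletion varies by vector store. For large indices, rebuilding the index is sometimes faster than applying many incremental changes.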
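
Updates are much easier if you assign your own stable IDs when adding chunks, so you can find and replace them later. The sketch below shows that bookkeeping with a plain dictionary. ToyIndex is a hypothetical class whose method names only mirror the calls mentioned above, and the "file:chunk-number" ID scheme is an assumption.

```python
class ToyIndex:
    """Hypothetical ID-keyed store illustrating add, replace, and delete."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def add_documents(self, texts: list[str], ids: list[str]) -> None:
        # Re-adding an existing ID overwrites the old text (an update).
        for doc_id, text in zip(ids, texts):
            self._docs[doc_id] = text

    def delete(self, ids: list[str]) -> None:
        for doc_id in ids:
            self._docs.pop(doc_id, None)  # Ignore IDs that are already gone

    def __len__(self) -> int:
        return len(self._docs)


index = ToyIndex()
index.add_documents(["old intro", "setup guide"], ids=["guide.md:0", "guide.md:1"])
index.add_documents(["new intro"], ids=["guide.md:0"])  # Replace chunk 0
index.delete(ids=["guide.md:1"])                        # Remove chunk 1
print(len(index))
```

Deriving IDs from the source path and chunk position means a re-ingested file overwrites its old chunks instead of piling up duplicates.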

What if my documents are structured (JSON, tables)?

Use loaders that preserve structure (JSONLoader, CSVLoader) and splitters that respect boundaries. Or parse to plain text before splitting. For retrieval, keeping related fields together in one chunk matters more than preserving the original structure.
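
One way to parse structured data to text, sketched in plain Python: turn each JSON record into its own small document of "key: value" lines, so related fields stay together in one chunk. The records and the records_to_documents helper are invented for illustration, and JSONLoader's own behavior may differ.

```python
import json


def records_to_documents(raw_json: str) -> list[dict]:
    """Convert a JSON array of flat objects into one text document per record."""
    records = json.loads(raw_json)
    documents = []
    for row_number, record in enumerate(records):
        text = "\n".join(f"{key}: {value}" for key, value in record.items())
        documents.append({"page_content": text, "metadata": {"row": row_number}})
    return documents


# Invented sample data
raw = '[{"name": "FAISS", "hosting": "local"}, {"name": "Pinecone", "hosting": "managed"}]'
json_docs = records_to_documents(raw)
print(json_docs[0]["page_content"])
print(json_docs[1]["metadata"])
```

Because every record becomes one chunk, a query about Pinecone retrieves its name and hosting model together rather than a fragment cut from the middle of an array.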

Further Reading