
Managing Memory in LangChain Apps: Context and Conversation

Memory allows LLM applications to maintain context across multiple turns of conversation. Without memory, each message is stateless—the model forgets everything said before. With memory, you store and retrieve conversation history, entity information, or summaries, giving the model awareness of prior interactions. LangChain provides multiple memory abstractions for different use cases.

I built a customer support chatbot that initially lost context after two messages. Users got frustrated at having to ask the same questions twice. Switching to LangChain's ConversationBufferMemory solved it: the model could recall the entire conversation history, which made interactions coherent.

Types of Memory in LangChain

LangChain offers several memory backends, each with different tradeoffs:

ConversationBufferMemory: Stores all conversation history. Simple, full context, but token usage grows with conversation length.

from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()
model = ChatOpenAI(model="gpt-4o-mini", temperature=0.5)

# ConversationChain automatically manages memory
conversation = ConversationChain(
    llm=model,
    memory=memory,
    verbose=True
)

# First turn
response = conversation.run(input="What's your name?")
print(response)  # e.g. "I'm an AI assistant."

# Second turn—model remembers the context
response = conversation.run(input="What did I just ask?")
print(response) # "You asked what my name is."

Under the hood, ConversationBufferMemory stores messages in a list and formats them as a string for the prompt. The memory is inserted into the prompt template automatically.
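
To make the list-then-string idea concrete, here is a minimal pure-Python sketch. It is not LangChain's implementation; the class name SimpleBufferMemory and its method load_history are made up for illustration, and only the general shape (append each exchange, join into a transcript) follows the description above.

```python
class SimpleBufferMemory:
    """Toy stand-in for a buffer memory: keeps every turn, renders a transcript."""

    def __init__(self, human_prefix="Human", ai_prefix="AI"):
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix
        self.messages = []  # list of (role, text) tuples, oldest first

    def save_context(self, user_text, ai_text):
        # Append one full exchange to the history.
        self.messages.append((self.human_prefix, user_text))
        self.messages.append((self.ai_prefix, ai_text))

    def load_history(self):
        # Flatten the list into the string that would fill {history}.
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


memory = SimpleBufferMemory()
memory.save_context("What's your name?", "I'm an AI assistant.")
memory.save_context("What did I just ask?", "You asked what my name is.")
history = memory.load_history()
print(history)
```

Because nothing is ever removed from the list, the rendered transcript grows with every turn, which is exactly the token-growth tradeoff noted for this memory type.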

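ConversationSummaryMemory: Keeps a running summary of the conversation instead of the raw transcript. After each exchange, an LLM call folds the new messages into the summary, so memory stays compact at the cost of an extra model call per turn. To keep recent turns verbatim alongside a summary of older ones, use ConversationSummaryBufferMemory instead.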

from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=model)
conversation = ConversationChain(llm=model, memory=memory)

# The summary is updated after every turn and is sent in place of the raw history
response = conversation.run(input="Your message here")

Summaries are shorter but may lose nuance. Use this when conversations are long-running.
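
The rolling-summary pattern can be sketched without any LLM. In the sketch below, fake_summarize is a hypothetical stand-in for the model call (it just clips each new line to six words so the example runs offline), and RollingSummaryMemory is an illustrative name, not a LangChain class.

```python
def fake_summarize(previous_summary, new_lines):
    """Hypothetical stand-in for the LLM call that folds new lines into the summary."""
    # A real implementation would prompt a model; here we keep only the
    # first six words of each new line.
    clipped = [" ".join(line.split()[:6]) for line in new_lines]
    parts = ([previous_summary] if previous_summary else []) + clipped
    return " | ".join(parts)


class RollingSummaryMemory:
    def __init__(self, summarize):
        self.summarize = summarize
        self.summary = ""  # the only state kept between turns

    def save_context(self, user_text, ai_text):
        # Fold the newest exchange into the existing summary.
        new_lines = [f"Human: {user_text}", f"AI: {ai_text}"]
        self.summary = self.summarize(self.summary, new_lines)

    def load_history(self):
        return self.summary


memory = RollingSummaryMemory(fake_summarize)
memory.save_context(
    "I need a refund for order 1234 that arrived damaged last week",
    "Sorry to hear that, I have started the refund for you today",
)
memory.save_context(
    "How long will the refund take to reach my bank account?",
    "It usually takes three to five business days to arrive",
)
print(memory.load_history())
```

The design point is that only the summary string survives between turns, so any detail the summarizer drops is gone for good.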

ConversationEntityMemory: Extracts entities (people, places, objects) and their attributes from conversation history.

from langchain.memory import ConversationEntityMemory

memory = ConversationEntityMemory(llm=model)
conversation = ConversationChain(llm=model, memory=memory)

conversation.run(input="My name is Alice and I work at OpenAI.")
conversation.run(input="What's my job?")
# Model remembers: Alice → works at OpenAI

Useful for applications tracking character information across long conversations.

ConversationKGMemory (Knowledge Graph Memory): Builds a knowledge graph from conversation, storing relationships and facts.

from langchain.memory import ConversationKGMemory

memory = ConversationKGMemory(llm=model)
conversation = ConversationChain(llm=model, memory=memory)

conversation.run(input="Alice works at OpenAI. OpenAI created ChatGPT.")
# Stores: Alice → works_at → OpenAI, OpenAI → created → ChatGPT

Advanced but heavier; reserve for applications requiring structured knowledge extraction.
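
The storage side of a knowledge-graph memory can be pictured as a set of subject, predicate, object triples. The sketch below is a simplified illustration under that assumption; TripleStore and facts_about are hypothetical names, and the LLM-driven extraction step is left out entirely.

```python
from collections import defaultdict


class TripleStore:
    """Toy knowledge graph: subject -> list of (predicate, object)."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add(self, subject, predicate, obj):
        # Skip exact duplicates so repeated statements do not pile up.
        if (predicate, obj) not in self.edges[subject]:
            self.edges[subject].append((predicate, obj))

    def facts_about(self, entity):
        # Return readable facts where the entity is the subject.
        return [f"{entity} {pred} {obj}" for pred, obj in self.edges.get(entity, [])]


kg = TripleStore()
kg.add("Alice", "works_at", "OpenAI")
kg.add("OpenAI", "created", "ChatGPT")
kg.add("Alice", "works_at", "OpenAI")  # duplicate, ignored

print(kg.facts_about("Alice"))
print(kg.facts_about("OpenAI"))
```

Only the facts about entities mentioned in the current input need to be looked up and placed in the prompt, which is what keeps this approach structured.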

Building a Multi-Turn Chatbot with Memory

Here's a complete chatbot example using ConversationBufferMemory:

from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.chains import LLMChain

# Set up memory and model
memory = ConversationBufferMemory(human_prefix="User", ai_prefix="Assistant")
model = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

# Create a custom prompt that includes memory
prompt_template = """You are a helpful Python tutor. Keep responses concise.

{history}

User: {input}
Assistant:"""

prompt = ChatPromptTemplate.from_template(prompt_template)

# Create a chain with memory
chain = LLMChain(
    llm=model,
    prompt=prompt,
    memory=memory,
    verbose=False
)

# Simulate conversation
print(chain.run(input="What's the difference between a list and a tuple?"))
print(chain.run(input="Can I modify tuples?")) # Model recalls prior context
print(chain.run(input="How do I convert one to the other?"))

Each chain.run() call:

  1. Retrieves memory as a formatted string
  2. Passes it to the prompt as {history}
  3. Appends the user's new input
  4. Gets a response from the model
  5. Stores the new exchange in memory
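
The five steps above can be sketched as a plain function. This is an illustration of the control flow only: fake_model is a hypothetical stand-in for the LLM, and run_turn and TEMPLATE are names invented for this example rather than anything LangChain exposes.

```python
TEMPLATE = "You are a helpful Python tutor.\n\n{history}\n\nUser: {input}\nAssistant:"


def fake_model(prompt):
    """Hypothetical stand-in for the LLM: numbers its replies by turns seen."""
    # Count the earlier "User: " lines (everything except the current one).
    earlier = prompt.count("User: ") - 1
    return f"reply #{earlier + 1}"


def run_turn(history_lines, user_input, model=fake_model):
    history = "\n".join(history_lines)            # 1. retrieve memory as a string
    prompt = TEMPLATE.format(history=history,     # 2. pass it in as {history}
                             input=user_input)    # 3. append the new input
    reply = model(prompt)                         # 4. get a response
    history_lines.append(f"User: {user_input}")   # 5. store the new exchange
    history_lines.append(f"Assistant: {reply}")
    return reply


history_lines = []
first = run_turn(history_lines, "What's the difference between a list and a tuple?")
second = run_turn(history_lines, "Can I modify tuples?")
print(first, second)
```

The second call sees the first exchange inside its prompt, which is the only reason the stand-in model can tell it is answering turn two.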

Windowed Memory (Recent-Only Context)

For very long conversations, keep only the last N exchanges (one user message plus one model reply each) to control token usage:

from langchain.memory import ConversationBufferWindowMemory

# Only keep last 5 exchanges
memory = ConversationBufferWindowMemory(k=5)

# Older exchanges are left out of the prompt; only the most recent k are sent to the model

This prevents unbounded growth while maintaining context for recent turns.
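
As a rough picture of the windowing behavior, the sketch below keeps the last k exchanges in a bounded deque. It is a simplified stand-in, not LangChain's code: WindowMemory is a hypothetical class, and this toy version discards old exchanges outright.

```python
from collections import deque


class WindowMemory:
    def __init__(self, k):
        # Each exchange is one (user, ai) pair; the deque evicts the oldest
        # pair automatically once k pairs are stored.
        self.exchanges = deque(maxlen=k)

    def save_context(self, user_text, ai_text):
        self.exchanges.append((user_text, ai_text))

    def load_history(self):
        lines = []
        for user_text, ai_text in self.exchanges:
            lines.append(f"Human: {user_text}")
            lines.append(f"AI: {ai_text}")
        return "\n".join(lines)


memory = WindowMemory(k=2)
for i in range(1, 5):
    memory.save_context(f"question {i}", f"answer {i}")

print(memory.load_history())
```

After four exchanges with k=2, only exchanges 3 and 4 remain, so the prompt size is bounded by the window rather than by the conversation length.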

Custom Memory Backends

For specialized needs, implement your own Memory class:

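from typing import Any, Dict, List

from langchain_core.memory import BaseMemory


class CustomMemory(BaseMemory):
    """Store conversation in a database instead of process memory."""

    @property
    def memory_variables(self) -> List[str]:
        # Keys this memory contributes to the prompt
        return ["history"]

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Return the formatted history string
        return {"history": self.fetch_from_db()}

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        # Save the new exchange to the database
        self.save_to_db(inputs, outputs)

    def clear(self) -> None:
        # Delete the stored conversation
        self.delete_from_db()

    # fetch_from_db, save_to_db, and delete_from_db are placeholders:
    # implement them against your own storage layer.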

This lets you persist memory to Redis, PostgreSQL, or other backends.
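
One way the fetch, save, and delete helpers could look is sketched below using SQLite from the standard library, chosen only so the example runs without a server. The class SQLiteHistoryStore, its table layout, and the session_id column are all assumptions made for illustration; a Redis or PostgreSQL version would follow the same three operations.

```python
import sqlite3


class SQLiteHistoryStore:
    """Hypothetical helper: persists exchanges per session in SQLite."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS exchanges ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, "
            "user_text TEXT NOT NULL, "
            "ai_text TEXT NOT NULL)"
        )

    def save(self, session_id, user_text, ai_text):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO exchanges (session_id, user_text, ai_text) "
                "VALUES (?, ?, ?)",
                (session_id, user_text, ai_text),
            )

    def fetch(self, session_id):
        # Return the session's transcript in insertion order.
        rows = self.conn.execute(
            "SELECT user_text, ai_text FROM exchanges "
            "WHERE session_id = ? ORDER BY id",
            (session_id,),
        ).fetchall()
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in rows)

    def delete(self, session_id):
        with self.conn:
            self.conn.execute(
                "DELETE FROM exchanges WHERE session_id = ?", (session_id,)
            )


store = SQLiteHistoryStore()
store.save("s1", "hi", "hello")
store.save("s2", "x", "y")
store.save("s1", "how are you?", "fine")
print(store.fetch("s1"))
```

Keying every row by a session identifier keeps one user's history from appearing in another user's prompt, and passing a file path instead of ":memory:" makes the history survive restarts.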

Memory with Different Chain Types

Different chain types handle memory differently. ConversationChain automates memory injection. For custom chains using LCEL, manually inject memory:

from langchain_core.runnables import RunnablePassthrough

memory = ConversationBufferMemory()

# Manually pass memory variable to the prompt
chain = (
    RunnablePassthrough.assign(
        history=lambda x: memory.load_memory_variables(x)["history"]
    )
    | prompt
    | model
    | StrOutputParser()
)

# Remember to save context manually
result = chain.invoke({"input": "Your question"})
memory.save_context({"input": "Your question"}, {"output": result})

Memory Comparison Table

Type               | Max History       | Token Efficiency           | Complexity | Best For
-------------------|-------------------|----------------------------|------------|----------------------------------------
BufferMemory       | Unlimited         | Low (grows linearly)       | Very low   | Short conversations, prototypes
BufferWindowMemory | Last N turns      | Medium (bounded by window) | Very low   | Long conversations with recent context
SummaryMemory      | Full (compressed) | High (summarized)          | Medium     | Multi-hour conversations
EntityMemory       | Entities only     | Medium                     | Medium     | Tracking character/user information
KGMemory           | Knowledge graph   | Low (structured)           | High       | Complex reasoning with relationships

Key Takeaways

  • ConversationBufferMemory stores all history; use it for short conversations that fit comfortably within the model's context window
  • ConversationBufferWindowMemory keeps only the last N exchanges, controlling token growth
  • ConversationSummaryMemory compresses older messages into summaries for efficiency
  • ConversationEntityMemory tracks named entities and their attributes across turns
  • ConversationChain and LLMChain save each exchange automatically; in LCEL chains, call memory.save_context() yourself after every turn
  • Persist memory to a database for production apps using custom Memory backends

Frequently Asked Questions

How many turns of conversation can I store before hitting token limits?

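It depends on the model's context window and message length. GPT-4o has a 128K-token context window; if each turn averages 500 tokens, about 256 turns fit (128,000 / 500) as an upper bound. The system prompt and the model's reply also count against the window, so the practical limit is lower. Beyond that, use BufferWindowMemory (keep the last 10-20 turns) or SummaryMemory.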
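
The estimate above is simple integer arithmetic, sketched here as a small function. The name max_turns and the 4,000-token reserve for the system prompt and reply are illustrative assumptions, not figures from any model's documentation.

```python
def max_turns(context_window, tokens_per_turn, reserved=0):
    """Whole turns that fit after reserving tokens for the system prompt and reply."""
    if tokens_per_turn <= 0:
        raise ValueError("tokens_per_turn must be positive")
    return max(context_window - reserved, 0) // tokens_per_turn


upper_bound = max_turns(128_000, 500)                    # nothing reserved
with_reserve = max_turns(128_000, 500, reserved=4_000)   # hypothetical reserve
print(upper_bound, with_reserve)
```

Average turn length varies a lot in practice, so it is safer to measure token counts on real conversations than to rely on a fixed per-turn figure.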

Do I need to clear memory manually?

Yes, whenever a memory object is reused. Between users or conversations, call memory.clear() to avoid leaking context, or create a separate memory instance per session. In production, tie memory clearance to the user session lifecycle.
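
A minimal sketch of session-scoped memory, assuming each user session has a unique identifier: histories live in a dictionary keyed by session, and ending a session drops its history. SessionMemories and its method names are hypothetical and stand in for whatever session handling your application already has.

```python
class SessionMemories:
    """Hypothetical registry that keeps one history list per session."""

    def __init__(self):
        self._by_session = {}

    def history_for(self, session_id):
        # Create an empty history the first time a session is seen.
        return self._by_session.setdefault(session_id, [])

    def end_session(self, session_id):
        # Drop the history so it cannot leak into a later conversation.
        self._by_session.pop(session_id, None)


sessions = SessionMemories()
sessions.history_for("alice").append("Human: My order number is 1234")
sessions.history_for("bob").append("Human: Hello")

alice_before = list(sessions.history_for("alice"))
sessions.end_session("alice")
alice_after = list(sessions.history_for("alice"))
print(alice_before, alice_after)
```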

Can I save memory to a database?

Yes. Implement a custom Memory class that reads from and writes to PostgreSQL, MongoDB, or Redis. This persists conversations across restarts.

What if the model misremembers something?

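LLMs hallucinate occasionally. To mitigate, use structured memory (EntityMemory, KGMemory), which stores facts explicitly instead of relying on a long transcript, or fact-check with retrieval-augmented generation (RAG). Keep in mind that these memories extract facts with an LLM, so the stored entries can themselves be wrong.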

Does memory work with function calling (tool use)?

Yes, but you'll need to track tool calls and results as well as messages. Use ConversationBufferMemory with return_messages=True so that history is kept as structured message objects (HumanMessage, AIMessage, ToolMessage) rather than a flattened string.

Further Reading