# Why Memory Matters for AI Agents
Without memory, every message is a fresh start. Ask a chatbot “What did I just say?” and it has no idea. For real applications — customer support, coding assistants, personal tutors — conversation memory is essential.
LangChain provides several memory strategies, from simple in-memory buffers to summarization-based approaches that compress long histories. This guide walks through each pattern with working code.
## The Problem: LLMs Are Stateless
Every call to an LLM API is independent. The model has no built-in knowledge of previous messages. To give an LLM “memory,” you must explicitly include past messages in the current prompt.
The challenge: chat histories grow quickly. A 30-message conversation can exceed many models’ context windows. Good memory management means deciding what to include and what to summarize or discard.
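To make the statelessness concrete, here is a minimal sketch using a stand-in `fake_llm` function (a placeholder for a real chat-completion call, not a real API): the "model" can only answer from whatever messages you include in the current request.

```python
# Stand-in for a real chat-completion call: the "model" sees ONLY
# the messages passed in this one request, nothing else.
def fake_llm(messages: list[dict]) -> str:
    seen = [m["content"] for m in messages if m["role"] == "user"]
    if "name" in seen[-1].lower():
        # It can answer only from what is inside `messages`
        for text in seen:
            if "Alex" in text:
                return "You said your name is Alex."
        return "You haven't told me your name."
    return "OK."

# Call 1: the user introduces themselves
fake_llm([{"role": "user", "content": "My name is Alex."}])

# Call 2 WITHOUT history: the model has no idea
print(fake_llm([{"role": "user", "content": "What is my name?"}]))
# -> You haven't told me your name.

# Call 2 WITH the history replayed in the prompt: now it can answer
print(fake_llm([
    {"role": "user", "content": "My name is Alex."},
    {"role": "user", "content": "What is my name?"},
]))
# -> You said your name is Alex.
```

Every memory strategy below is a different answer to the same question: which past messages get replayed into the next request.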
## Approach 1: Manual Message History
The simplest approach — maintain a list of messages and pass it directly:
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Start with a system message
messages = [
    SystemMessage(content="You are a helpful Python programming assistant."),
]

def chat(user_input: str) -> str:
    # Add the user's message
    messages.append(HumanMessage(content=user_input))
    # Call the LLM with the full history
    response = llm.invoke(messages)
    # Add the AI's response to history
    messages.append(AIMessage(content=response.content))
    return response.content

# Multi-turn conversation
print(chat("My name is Alex and I'm learning Python."))
print(chat("What's a good first project for me?"))
print(chat("What did I say my name was?"))  # Will remember "Alex"
```
This works well for short conversations. The downside: the history grows without bound and will eventually exceed the model’s context limit.
## Approach 2: ChatMessageHistory with LCEL
LangChain’s `RunnableWithMessageHistory` wraps any chain and handles history automatically:
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

# Store sessions in a dict (replace with Redis/DB for production)
store = {}

def get_session_history(session_id: str) -> ChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),  # past messages injected here
    ("human", "{input}"),
])

chain = prompt | llm

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Each session_id maintains separate history
config = {"configurable": {"session_id": "user-alice"}}

response1 = chain_with_history.invoke(
    {"input": "My favorite language is Rust."},
    config=config,
)
print(response1.content)

response2 = chain_with_history.invoke(
    {"input": "What language did I mention?"},
    config=config,
)
print(response2.content)  # "You mentioned Rust."
```
The `session_id` key lets you maintain separate histories per user — critical for multi-user applications.
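The session routing itself is simple enough to sketch without LangChain at all. In this illustrative stand-in, a hypothetical `store` dict maps each session ID to its own message list, so one user's turns never leak into another's prompt:

```python
# Hypothetical in-memory session store: one message list per session_id
store: dict[str, list[dict]] = {}

def get_session_history(session_id: str) -> list[dict]:
    # Create the session's history on first access
    return store.setdefault(session_id, [])

def record_turn(session_id: str, role: str, content: str) -> None:
    get_session_history(session_id).append({"role": role, "content": content})

record_turn("user-alice", "human", "My favorite language is Rust.")
record_turn("user-bob", "human", "My favorite language is Go.")

# Alice's history contains only Alice's message
print(len(get_session_history("user-alice")))          # 1
print(get_session_history("user-alice")[0]["content"])  # Rust, not Go
```

A production backend (Redis, SQL) replaces the dict, but the lookup-by-session-ID shape stays the same.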
## Approach 3: Windowed Buffer (Limit to Recent Messages)
For long conversations, only keep the most recent messages:
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import trim_messages
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Keep only the most recent messages that fit within 1,000 tokens
trimmer = trim_messages(
    max_tokens=1000,
    strategy="last",
    token_counter=ChatOpenAI(model="gpt-4o-mini"),
    include_system=True,  # always keep the system message
    allow_partial=False,
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="messages"),
    ("human", "{input}"),
])

chain = (
    RunnablePassthrough.assign(
        messages=lambda x: trimmer.invoke(x["messages"])
    )
    | prompt
    | llm
)
```
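The windowing logic itself is easy to picture. This self-contained sketch uses plain Python, with a naive whitespace word count standing in for a real tokenizer: keep the system message, then take messages from the end until the budget runs out.

```python
def count_tokens(text: str) -> int:
    # Naive stand-in for a real tokenizer: one token per word
    return len(text.split())

def trim_to_budget(messages: list[dict], max_tokens: int) -> list[dict]:
    # Always keep the system message, then walk backwards from the
    # most recent message until the token budget is exhausted.
    system = [m for m in messages if m["role"] == "system"]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    for m in reversed([m for m in messages if m["role"] != "system"]):
        cost = count_tokens(m["content"])
        if cost > budget:
            break
        kept.append(m)
        budget -= cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "human", "content": "first question about Python lists"},
    {"role": "ai", "content": "answer one"},
    {"role": "human", "content": "second question"},
    {"role": "ai", "content": "answer two"},
]
print([m["content"] for m in trim_to_budget(history, max_tokens=10)])
# -> ['You are a helpful assistant.', 'second question', 'answer two']
```

LangChain's `trim_messages` does the same thing with real token counts plus extras like `allow_partial` and keeping human/AI pairs intact.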
## Approach 4: Conversation Summary Memory
When you need to remember the gist of a long conversation without storing every message, use summarization. Note that `ConversationSummaryBufferMemory` and `ConversationChain` are legacy `langchain.memory` APIs, deprecated in recent LangChain releases in favor of LangGraph-based memory (Approach 6), but they still illustrate the pattern clearly:
```python
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Keep recent messages in full, older messages as a summary
memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=500,  # summarize when history exceeds 500 tokens
    return_messages=True,
)

conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True,
)

conversation.predict(input="I'm building a recommendation engine for an e-commerce site.")
conversation.predict(input="The main challenge is cold-start for new users.")
conversation.predict(input="I'm thinking of using collaborative filtering.")

# Memory now contains a summary of early messages + recent messages verbatim
print(memory.moving_summary_buffer)
```
This approach is ideal for customer support bots or long-running assistants where the full history would be too large.
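Under the hood, the idea is a two-tier buffer: recent turns kept verbatim, everything older folded into a running summary. A minimal sketch, with a placeholder `summarize` function where the real implementation makes an LLM call:

```python
def summarize(previous_summary: str, evicted: str) -> str:
    # Placeholder: the real memory class asks the LLM to fold the
    # evicted message into the running summary. Here we just concatenate.
    return (previous_summary + " " + evicted).strip()

class SummaryBuffer:
    def __init__(self, max_messages: int = 2):
        self.max_messages = max_messages  # recent messages kept verbatim
        self.recent: list[str] = []
        self.summary = ""

    def add(self, message: str) -> None:
        self.recent.append(message)
        # Evict the oldest messages into the summary once over the limit
        while len(self.recent) > self.max_messages:
            self.summary = summarize(self.summary, self.recent.pop(0))

    def context(self) -> str:
        # What gets sent to the LLM: summary first, then recent turns
        return self.summary + " | " + " | ".join(self.recent)

buf = SummaryBuffer(max_messages=2)
for msg in ["building a recommender", "cold-start is hard", "trying collaborative filtering"]:
    buf.add(msg)

print(buf.summary)  # building a recommender
print(buf.recent)   # ['cold-start is hard', 'trying collaborative filtering']
```

The trade-off is visible in the sketch: the summary tier is lossy by construction, which is why the real class keeps the most recent turns verbatim.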
## Approach 5: Persistent Memory with Redis
For production applications, persist memory in Redis so it survives server restarts:
```bash
pip install redis langchain-redis
```
```python
from langchain_redis import RedisChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

def get_session_history(session_id: str) -> RedisChatMessageHistory:
    return RedisChatMessageHistory(
        session_id=session_id,
        redis_url="redis://localhost:6379",
        ttl=86400,  # expire sessions after 24 hours
    )

# `chain` is the prompt | llm pipeline from Approach 2
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)
```
Swap `RedisChatMessageHistory` for `SQLChatMessageHistory` (Postgres/SQLite) or `DynamoDBChatMessageHistory` (AWS) depending on your infrastructure.
## Approach 6: LangGraph Checkpointer (Recommended for Agents)
For agents that use tools, LangGraph with a checkpointer is the modern recommended approach. It persists the entire agent state (messages, tool results, reasoning):
```python
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI(model="gpt-4o-mini")
checkpointer = MemorySaver()  # in-memory; use SqliteSaver for persistence

agent = create_react_agent(llm, tools=[], checkpointer=checkpointer)

# Each thread_id is a separate conversation
config = {"configurable": {"thread_id": "conversation-1"}}

result = agent.invoke(
    {"messages": [{"role": "user", "content": "My name is Jordan."}]},
    config=config,
)

result2 = agent.invoke(
    {"messages": [{"role": "user", "content": "What is my name?"}]},
    config=config,
)
print(result2["messages"][-1].content)  # "Your name is Jordan."
```
For persistent storage, use `SqliteSaver` from the `langgraph-checkpoint-sqlite` package:
```python
from langgraph.checkpoint.sqlite import SqliteSaver

with SqliteSaver.from_conn_string("memory.db") as checkpointer:
    agent = create_react_agent(llm, tools=[], checkpointer=checkpointer)
```
## Choosing the Right Memory Strategy
| Strategy | Best for | Limitation |
|---|---|---|
| Manual message list | Prototypes, simple scripts | Grows unbounded |
| `RunnableWithMessageHistory` | Multi-user chatbots | Still grows without trimming |
| Windowed buffer | Long conversations | May lose important early context |
| Summary memory | Very long conversations | Lossy — details may be dropped |
| Redis/SQL persistence | Production chatbots | Requires infrastructure |
| LangGraph checkpointer | Tool-using agents | Requires LangGraph setup |
## Frequently Asked Questions
### What is the difference between ConversationBufferMemory and ConversationSummaryMemory?
`ConversationBufferMemory` stores every message verbatim. It’s simple but grows without bound. `ConversationSummaryMemory` periodically compresses old messages into a summary, keeping the context window manageable at the cost of some detail. For conversations longer than ~20 turns, use summary or windowed approaches.
### How do I give each user their own memory in a web app?
Use the `session_id` pattern with `RunnableWithMessageHistory`. Map each user (by user ID or session token) to their own `ChatMessageHistory` object stored in Redis or a database. Never share a single history object across users.
### Does memory work with streaming responses?
Yes. Use `.astream()` instead of `.invoke()` and pass the same config:
```python
async for chunk in chain_with_history.astream(
    {"input": "Tell me a long story"},
    config={"configurable": {"session_id": "user-1"}},
):
    print(chunk.content, end="", flush=True)
```
### How do I clear a user’s conversation history?
```python
history = get_session_history("user-alice")
history.clear()
```
For Redis/SQL backends, this deletes the records from the store.
### What happens when the conversation exceeds the model’s context window?
Without trimming, the API call fails — for OpenAI, with a `BadRequestError` whose error code is `context_length_exceeded`. Always use either a token-based trimmer or summary memory in production. As a safety net, catch the error and drop the oldest non-system messages before retrying.
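The safety-net pattern looks like this sketch. `call_model` is a stand-in that raises a hypothetical `ContextTooLong` error (in real code you would catch `openai.BadRequestError`); on failure, the oldest non-system message is dropped and the call retried:

```python
class ContextTooLong(Exception):
    """Stand-in for the provider's context-length error."""

MAX_MESSAGES = 3  # pretend the model only fits 3 messages

def call_model(messages: list[dict]) -> str:
    # Stand-in for llm.invoke(); a real call would raise
    # openai.BadRequestError when the context is too long.
    if len(messages) > MAX_MESSAGES:
        raise ContextTooLong
    return f"ok ({len(messages)} messages)"

def invoke_with_fallback(messages: list[dict]) -> str:
    history = list(messages)  # work on a copy
    while True:
        try:
            return call_model(history)
        except ContextTooLong:
            # Drop the oldest non-system message and retry
            idx = next(i for i, m in enumerate(history) if m["role"] != "system")
            del history[idx]

history = [{"role": "system", "content": "helpful"}] + [
    {"role": "human", "content": f"turn {i}"} for i in range(5)
]
print(invoke_with_fallback(history))  # -> ok (3 messages)
```

Treat this as a last resort: silently dropping turns loses context, so a trimmer or summarizer applied before the call is the better primary defense.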
## Next Steps
- LangChain Agents and Tools — Combine memory with tool-using agents
- Getting Started with LlamaIndex — Add document retrieval alongside conversation memory
- Letta Memory Architecture — Explore a framework purpose-built for persistent agent memory