# Why Memory Matters for AI Agents
Without memory, every message is a fresh start. Ask a chatbot “What did I just say?” and it has no idea. For real applications — customer support, coding assistants, personal tutors — conversation memory is essential.
LangChain provides several memory strategies, from simple in-memory buffers to summarization-based approaches that compress long histories. This guide walks through each pattern with working code.
## The Problem: LLMs Are Stateless
Every call to an LLM API is independent. The model has no built-in knowledge of previous messages. To give an LLM “memory,” you must explicitly include past messages in the current prompt.
The challenge: chat histories grow quickly. A 30-message conversation can exceed many models’ context windows. Good memory management means deciding what to include and what to summarize or discard.
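To make the statelessness concrete, here is a minimal sketch using a stand-in `fake_llm` function (a placeholder for a real chat-completion call, not a real API): the "model" can only answer from whatever messages you include in the current request.

```python
# Stand-in for a real chat-completion call: the "model" sees ONLY
# the messages passed in this one request, nothing else.
def fake_llm(messages: list[dict]) -> str:
    seen = [m["content"] for m in messages if m["role"] == "user"]
    if "name" in seen[-1].lower():
        # It can answer only from what is inside `messages`
        for text in seen:
            if "Alex" in text:
                return "You said your name is Alex."
        return "You haven't told me your name."
    return "OK."

# Call 1: the user introduces themselves
fake_llm([{"role": "user", "content": "My name is Alex."}])

# Call 2 WITHOUT history: the model has no idea
print(fake_llm([{"role": "user", "content": "What is my name?"}]))
# -> You haven't told me your name.

# Call 2 WITH the history replayed in the prompt: now it can answer
print(fake_llm([
    {"role": "user", "content": "My name is Alex."},
    {"role": "user", "content": "What is my name?"},
]))
# -> You said your name is Alex.
```

Every memory strategy below is a different answer to the same question: which past messages get replayed into the next request.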
## Approach 1: Manual Message History
The simplest approach — maintain a list of messages and pass it directly:
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Start with a system message
messages = [
    SystemMessage(content="You are a helpful Python programming assistant."),
]

def chat(user_input: str) -> str:
    # Add the user's message
    messages.append(HumanMessage(content=user_input))
    # Call the LLM with the full history
    response = llm.invoke(messages)
    # Add the AI's response to history
    messages.append(AIMessage(content=response.content))
    return response.content

# Multi-turn conversation
print(chat("My name is Alex and I'm learning Python."))
print(chat("What's a good first project for me?"))
print(chat("What did I say my name was?"))  # Will remember "Alex"
```
This works well for short conversations. The downside: the history grows without bound and will eventually exceed the model’s context limit.
## Approach 2: ChatMessageHistory with LCEL
LangChain’s `RunnableWithMessageHistory` wraps any chain and handles history automatically:
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

# Store sessions in a dict (replace with Redis/DB for production)
store = {}

def get_session_history(session_id: str) -> ChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),  # past messages injected here
    ("human", "{input}"),
])

chain = prompt | llm

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Each session_id maintains separate history
config = {"configurable": {"session_id": "user-alice"}}

response1 = chain_with_history.invoke(
    {"input": "My favorite language is Rust."},
    config=config,
)
print(response1.content)

response2 = chain_with_history.invoke(
    {"input": "What language did I mention?"},
    config=config,
)
print(response2.content)  # "You mentioned Rust."
```
The `session_id` key lets you maintain separate histories per user — critical for multi-user applications.
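The session routing itself is simple enough to sketch without LangChain at all. In this illustrative stand-in, a hypothetical `store` dict maps each session ID to its own message list, so one user's turns never leak into another's prompt:

```python
# Hypothetical in-memory session store: one message list per session_id
store: dict[str, list[dict]] = {}

def get_session_history(session_id: str) -> list[dict]:
    # Create the session's history on first access
    return store.setdefault(session_id, [])

def record_turn(session_id: str, role: str, content: str) -> None:
    get_session_history(session_id).append({"role": role, "content": content})

record_turn("user-alice", "human", "My favorite language is Rust.")
record_turn("user-bob", "human", "My favorite language is Go.")

# Alice's history contains only Alice's message
print(len(get_session_history("user-alice")))          # 1
print(get_session_history("user-alice")[0]["content"])  # Rust, not Go
```

A production backend (Redis, SQL) replaces the dict, but the lookup-by-session-ID shape stays the same.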
## Approach 3: Windowed Buffer (Limit to Recent Messages)
For long conversations, only keep the most recent messages:
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import trim_messages
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Keep only the most recent messages that fit within 1,000 tokens
trimmer = trim_messages(
    max_tokens=1000,
    strategy="last",
    token_counter=ChatOpenAI(model="gpt-4o-mini"),
    include_system=True,  # always keep the system message
    allow_partial=False,
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="messages"),
    ("human", "{input}"),
])

chain = (
    RunnablePassthrough.assign(
        messages=lambda x: trimmer.invoke(x["messages"])
    )
    | prompt
    | llm
)
```
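The windowing logic itself is easy to picture. This self-contained sketch uses plain Python, with a naive whitespace word count standing in for a real tokenizer: keep the system message, then take messages from the end until the budget runs out.

```python
def count_tokens(text: str) -> int:
    # Naive stand-in for a real tokenizer: one token per word
    return len(text.split())

def trim_to_budget(messages: list[dict], max_tokens: int) -> list[dict]:
    # Always keep the system message, then walk backwards from the
    # most recent message until the token budget is exhausted.
    system = [m for m in messages if m["role"] == "system"]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    for m in reversed([m for m in messages if m["role"] != "system"]):
        cost = count_tokens(m["content"])
        if cost > budget:
            break
        kept.append(m)
        budget -= cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "human", "content": "first question about Python lists"},
    {"role": "ai", "content": "answer one"},
    {"role": "human", "content": "second question"},
    {"role": "ai", "content": "answer two"},
]
print([m["content"] for m in trim_to_budget(history, max_tokens=10)])
# -> ['You are a helpful assistant.', 'second question', 'answer two']
```

LangChain's `trim_messages` does the same thing with real token counts plus extras like `allow_partial` and keeping human/AI pairs intact.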
## Approach 4: Conversation Summary Memory
When you need to remember the gist of a long conversation without storing every message, use summarization. Note that `ConversationSummaryBufferMemory` and `ConversationChain` are legacy `langchain.memory` APIs, deprecated in recent LangChain releases in favor of LangGraph-based memory (Approach 6), but they still illustrate the pattern clearly:
```python
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Keep recent messages in full, older messages as a summary
memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=500,  # summarize when history exceeds 500 tokens
    return_messages=True,
)

conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True,
)

conversation.predict(input="I'm building a recommendation engine for an e-commerce site.")
conversation.predict(input="The main challenge is cold-start for new users.")
conversation.predict(input="I'm thinking of using collaborative filtering.")

# Memory now contains a summary of early messages + recent messages verbatim
print(memory.moving_summary_buffer)
```
This approach is ideal for customer support bots or long-running assistants where the full history would be too large.
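Under the hood, the idea is a two-tier buffer: recent turns kept verbatim, everything older folded into a running summary. A minimal sketch, with a placeholder `summarize` function where the real implementation makes an LLM call:

```python
def summarize(previous_summary: str, evicted: str) -> str:
    # Placeholder: the real memory class asks the LLM to fold the
    # evicted message into the running summary. Here we just concatenate.
    return (previous_summary + " " + evicted).strip()

class SummaryBuffer:
    def __init__(self, max_messages: int = 2):
        self.max_messages = max_messages  # recent messages kept verbatim
        self.recent: list[str] = []
        self.summary = ""

    def add(self, message: str) -> None:
        self.recent.append(message)
        # Evict the oldest messages into the summary once over the limit
        while len(self.recent) > self.max_messages:
            self.summary = summarize(self.summary, self.recent.pop(0))

    def context(self) -> str:
        # What gets sent to the LLM: summary first, then recent turns
        return self.summary + " | " + " | ".join(self.recent)

buf = SummaryBuffer(max_messages=2)
for msg in ["building a recommender", "cold-start is hard", "trying collaborative filtering"]:
    buf.add(msg)

print(buf.summary)  # building a recommender
print(buf.recent)   # ['cold-start is hard', 'trying collaborative filtering']
```

The trade-off is visible in the sketch: the summary tier is lossy by construction, which is why the real class keeps the most recent turns verbatim.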
## Approach 5: Persistent Memory with Redis
For production applications, persist memory in Redis so it survives server restarts:
```bash
pip install redis langchain-redis
```
```python
from langchain_redis import RedisChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

def get_session_history(session_id: str) -> RedisChatMessageHistory:
    return RedisChatMessageHistory(
        session_id=session_id,
        redis_url="redis://localhost:6379",
        ttl=86400,  # expire sessions after 24 hours
    )

# `chain` is the prompt | llm pipeline from Approach 2
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)
```
Swap `RedisChatMessageHistory` for `SQLChatMessageHistory` (Postgres/SQLite) or `DynamoDBChatMessageHistory` (AWS) depending on your infrastructure.
## Approach 6: LangGraph Checkpointer (Recommended for Agents)
For agents that use tools, LangGraph with a checkpointer is the modern recommended approach. It persists the entire agent state (messages, tool results, reasoning):
```python
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI(model="gpt-4o-mini")
checkpointer = MemorySaver()  # in-memory; use SqliteSaver for persistence

agent = create_react_agent(llm, tools=[], checkpointer=checkpointer)

# Each thread_id is a separate conversation
config = {"configurable": {"thread_id": "conversation-1"}}

result = agent.invoke(
    {"messages": [{"role": "user", "content": "My name is Jordan."}]},
    config=config,
)

result2 = agent.invoke(
    {"messages": [{"role": "user", "content": "What is my name?"}]},
    config=config,
)
print(result2["messages"][-1].content)  # "Your name is Jordan."
```
For persistent storage, use `SqliteSaver` from the `langgraph-checkpoint-sqlite` package:
```python
from langgraph.checkpoint.sqlite import SqliteSaver

with SqliteSaver.from_conn_string("memory.db") as checkpointer:
    agent = create_react_agent(llm, tools=[], checkpointer=checkpointer)
```
## Choosing the Right Memory Strategy
| Strategy | Best for | Limitation |
|---|---|---|
| Manual message list | Prototypes, simple scripts | Grows unbounded |
| `RunnableWithMessageHistory` | Multi-user chatbots | Still grows without trimming |
| Windowed buffer | Long conversations | May lose important early context |
| Summary memory | Very long conversations | Lossy — details may be dropped |
| Redis/SQL persistence | Production chatbots | Requires infrastructure |
| LangGraph checkpointer | Tool-using agents | Requires LangGraph setup |
## Frequently Asked Questions
### What is the difference between ConversationBufferMemory and ConversationSummaryMemory?
`ConversationBufferMemory` stores every message verbatim. It’s simple but grows without bound. `ConversationSummaryMemory` periodically compresses old messages into a summary, keeping the context window manageable at the cost of some detail. For conversations longer than ~20 turns, use summary or windowed approaches.
### How do I give each user their own memory in a web app?
Use the `session_id` pattern with `RunnableWithMessageHistory`. Map each user (by user ID or session token) to their own `ChatMessageHistory` object stored in Redis or a database. Never share a single history object across users.
### Does memory work with streaming responses?
Yes. Use `.astream()` instead of `.invoke()` and pass the same config:
```python
async for chunk in chain_with_history.astream(
    {"input": "Tell me a long story"},
    config={"configurable": {"session_id": "user-1"}},
):
    print(chunk.content, end="", flush=True)
```
### How do I clear a user’s conversation history?
```python
history = get_session_history("user-alice")
history.clear()
```
For Redis/SQL backends, this deletes the records from the store.
### What happens when the conversation exceeds the model’s context window?
Without trimming, the API call fails — for OpenAI, with a `BadRequestError` whose error code is `context_length_exceeded`. Always use either a token-based trimmer or summary memory in production. As a safety net, catch the error and drop the oldest non-system messages before retrying.
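The safety-net pattern looks like this sketch. `call_model` is a stand-in that raises a hypothetical `ContextTooLong` error (in real code you would catch `openai.BadRequestError`); on failure, the oldest non-system message is dropped and the call retried:

```python
class ContextTooLong(Exception):
    """Stand-in for the provider's context-length error."""

MAX_MESSAGES = 3  # pretend the model only fits 3 messages

def call_model(messages: list[dict]) -> str:
    # Stand-in for llm.invoke(); a real call would raise
    # openai.BadRequestError when the context is too long.
    if len(messages) > MAX_MESSAGES:
        raise ContextTooLong
    return f"ok ({len(messages)} messages)"

def invoke_with_fallback(messages: list[dict]) -> str:
    history = list(messages)  # work on a copy
    while True:
        try:
            return call_model(history)
        except ContextTooLong:
            # Drop the oldest non-system message and retry
            idx = next(i for i, m in enumerate(history) if m["role"] != "system")
            del history[idx]

history = [{"role": "system", "content": "helpful"}] + [
    {"role": "human", "content": f"turn {i}"} for i in range(5)
]
print(invoke_with_fallback(history))  # -> ok (3 messages)
```

Treat this as a last resort: silently dropping turns loses context, so a trimmer or summarizer applied before the call is the better primary defense.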
## Next Steps
- LangChain Agents and Tools — Combine memory with tool-using agents
- Getting Started with LlamaIndex — Add document retrieval alongside conversation memory
- Letta Memory Architecture — Explore a framework purpose-built for persistent agent memory