Intermediate · LangChain · 4 min read

LangChain Memory Management: Build Chatbots That Remember

#langchain #memory #chatbot #conversation-history #langgraph #python

Why Memory Matters for AI Agents

Without memory, every message is a fresh start. Ask a chatbot “What did I just say?” and it has no idea. For real applications — customer support, coding assistants, personal tutors — conversation memory is essential.

LangChain provides several memory strategies, from simple in-memory buffers to summarization-based approaches that compress long histories. This guide walks through each pattern with working code.

The Problem: LLMs Are Stateless

Every call to an LLM API is independent. The model has no built-in knowledge of previous messages. To give an LLM “memory,” you must explicitly include past messages in the current prompt.

The challenge: chat histories grow quickly. A 30-message conversation can exceed many models’ context windows. Good memory management means deciding what to include and what to summarize or discard.
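To put rough numbers on that, here is a back-of-envelope sketch in plain Python. The ~4-characters-per-token ratio is a common heuristic for English text, not an exact tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

# A 30-message conversation averaging ~800 characters per message
history = ["x" * 800] * 30
total = sum(estimate_tokens(msg) for msg in history)
print(total)  # 6000 tokens of history alone, before the system prompt
```

At that rate, history alone can consume a large share of a model's context window, which is exactly what the approaches below manage.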

Approach 1: Manual Message History

The simplest approach — maintain a list of messages and pass it directly:

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Start with a system message
messages = [
    SystemMessage(content="You are a helpful Python programming assistant."),
]

def chat(user_input: str) -> str:
    # Add the user's message
    messages.append(HumanMessage(content=user_input))

    # Call the LLM with the full history
    response = llm.invoke(messages)

    # Add the AI's response to history
    messages.append(AIMessage(content=response.content))

    return response.content

# Multi-turn conversation
print(chat("My name is Alex and I'm learning Python."))
print(chat("What's a good first project for me?"))
print(chat("What did I say my name was?"))  # Will remember "Alex"

This works perfectly for short conversations. The downside: the history grows unbounded and will eventually exceed the model’s context limit.
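One easy mitigation is to cap the list before each call: always keep the system message plus the last N messages. A minimal sketch using plain dicts (LangChain's message objects would work the same way):

```python
def cap_history(messages: list[dict], max_recent: int = 10) -> list[dict]:
    """Keep the first (system) message plus the most recent max_recent messages."""
    if len(messages) <= max_recent + 1:
        return messages
    return [messages[0]] + messages[-max_recent:]

history = [{"role": "system", "content": "You are helpful."}]
history += [{"role": "user", "content": f"msg {i}"} for i in range(20)]

capped = cap_history(history, max_recent=4)
print(len(capped))           # 5: system message + last 4
print(capped[1]["content"])  # "msg 16"
```

This drops older turns silently; the windowed and summary approaches later in this guide handle that trade-off more carefully.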

Approach 2: ChatMessageHistory with LCEL

LangChain’s RunnableWithMessageHistory wraps any chain and handles history automatically:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

# Store sessions in a dict (replace with Redis/DB for production)
store = {}

def get_session_history(session_id: str) -> ChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),  # messages injected here
    ("human", "{input}"),
])

chain = prompt | llm

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Each session_id maintains separate history
config = {"configurable": {"session_id": "user-alice"}}

response1 = chain_with_history.invoke(
    {"input": "My favorite language is Rust."},
    config=config,
)
print(response1.content)

response2 = chain_with_history.invoke(
    {"input": "What language did I mention?"},
    config=config,
)
print(response2.content)  # "You mentioned Rust."

The session_id key lets you maintain separate histories per user — critical for multi-user applications.
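That isolation is easy to see with a stripped-down stand-in for the dict-backed store above (plain Python, no LangChain): each session_id gets its own list, and nothing leaks between sessions:

```python
store: dict[str, list[str]] = {}

def get_session_history(session_id: str) -> list[str]:
    """Return (creating if needed) the message list for this session."""
    return store.setdefault(session_id, [])

get_session_history("user-alice").append("My favorite language is Rust.")
get_session_history("user-bob").append("I prefer Go.")

print(len(get_session_history("user-alice")))  # 1 -- Bob's message is not here
print(get_session_history("user-bob")[0])      # "I prefer Go."
```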

Approach 3: Windowed Buffer (Limit to Last N Messages)

For long conversations, only keep the most recent messages:

from langchain_core.messages import trim_messages

# Keep only the most recent messages that fit within ~1,000 tokens
trimmer = trim_messages(
    max_tokens=1000,
    strategy="last",            # keep the newest messages
    token_counter=ChatOpenAI(model="gpt-4o-mini"),
    include_system=True,        # always keep the system message
    allow_partial=False,        # never truncate mid-message
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="messages"),
    ("human", "{input}"),
])

from langchain_core.runnables import RunnablePassthrough

chain = (
    RunnablePassthrough.assign(
        messages=lambda x: trimmer.invoke(x["messages"])
    )
    | prompt
    | llm
)

Approach 4: Conversation Summary Memory

When you need to remember the gist of a long conversation without storing every message, use summarization. ConversationSummaryBufferMemory keeps recent messages verbatim and folds older ones into a running summary (note: this class and ConversationChain are legacy APIs in recent LangChain releases, but they illustrate the pattern clearly):

from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Keep recent messages in full, older messages as summary
memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=500,  # summarize when history exceeds 500 tokens
    return_messages=True,
)

conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True,
)

conversation.predict(input="I'm building a recommendation engine for an e-commerce site.")
conversation.predict(input="The main challenge is cold-start for new users.")
conversation.predict(input="I'm thinking of using collaborative filtering.")

# Memory now contains a summary of early messages + recent messages verbatim
print(memory.moving_summary_buffer)

This approach is ideal for customer support bots or long-running assistants where the full history would be too large.
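The mechanism behind summary-buffer memory is simple to sketch in plain Python. Here summarize is a stand-in for the LLM call the real class makes; everything else is bookkeeping:

```python
def compact(history: list[str], max_messages: int, summarize) -> list[str]:
    """When history exceeds max_messages, fold the oldest messages into a summary."""
    if len(history) <= max_messages:
        return history
    overflow = history[:-max_messages]   # oldest messages, to be summarized
    recent = history[-max_messages:]     # kept verbatim
    summary = summarize(overflow)
    return [f"[Summary of earlier conversation: {summary}]"] + recent

# Stand-in summarizer; the real one prompts an LLM
fake_summarize = lambda msgs: f"{len(msgs)} earlier messages about the project"

history = [f"turn {i}" for i in range(12)]
compacted = compact(history, max_messages=4, summarize=fake_summarize)
print(len(compacted))  # 5: one summary line + 4 recent messages
print(compacted[-1])   # "turn 11"
```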

Approach 5: Persistent Memory with Redis

For production applications, persist memory in Redis so it survives server restarts:

pip install redis langchain-redis

from langchain_redis import RedisChatMessageHistory

def get_session_history(session_id: str):
    return RedisChatMessageHistory(
        session_id=session_id,
        url="redis://localhost:6379",
        ttl=86400,  # expire sessions after 24 hours
    )

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

Swap RedisChatMessageHistory for SQLChatMessageHistory (Postgres/SQLite) or DynamoDBChatMessageHistory (AWS) depending on your infrastructure.

Approach 6: LangGraph Checkpointers (for Agents)

For agents that use tools, LangGraph with a checkpointer is the modern recommended approach. It persists the entire agent state (messages, tool results, reasoning):

from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI(model="gpt-4o-mini")
checkpointer = MemorySaver()  # in-memory; use SqliteSaver for persistence

agent = create_react_agent(llm, tools=[], checkpointer=checkpointer)

# Each thread_id is a separate conversation
config = {"configurable": {"thread_id": "conversation-1"}}

result = agent.invoke(
    {"messages": [{"role": "user", "content": "My name is Jordan."}]},
    config=config,
)

result2 = agent.invoke(
    {"messages": [{"role": "user", "content": "What is my name?"}]},
    config=config,
)
print(result2["messages"][-1].content)  # "Your name is Jordan."

For persistent storage, use SqliteSaver:

from langgraph.checkpoint.sqlite import SqliteSaver

# The agent must be created and invoked inside the context manager
with SqliteSaver.from_conn_string("memory.db") as checkpointer:
    agent = create_react_agent(llm, tools=[], checkpointer=checkpointer)

Choosing the Right Memory Strategy

| Strategy | Best for | Limitation |
|---|---|---|
| Manual message list | Prototypes, simple scripts | Grows unbounded |
| RunnableWithMessageHistory | Multi-user chatbots | Still grows without trimming |
| Windowed buffer | Long conversations | May lose important early context |
| Summary memory | Very long conversations | Lossy; details may be dropped |
| Redis/SQL persistence | Production chatbots | Requires infrastructure |
| LangGraph checkpointer | Tool-using agents | Requires LangGraph setup |

Frequently Asked Questions

What is the difference between ConversationBufferMemory and ConversationSummaryMemory?

ConversationBufferMemory stores every message verbatim. It’s simple but grows without bound. ConversationSummaryMemory periodically compresses old messages into a summary, keeping the context window manageable at the cost of some detail. For conversations longer than ~20 turns, use summary or windowed approaches.

How do I give each user their own memory in a web app?

Use the session_id pattern with RunnableWithMessageHistory. Map each user (by user ID or session token) to their own ChatMessageHistory object stored in Redis or a database. Never share a single history object across users.

Does memory work with streaming responses?

Yes. Use .astream() instead of .invoke() and pass the same config:

async for chunk in chain_with_history.astream(
    {"input": "Tell me a long story"},
    config={"configurable": {"session_id": "user-1"}}
):
    print(chunk.content, end="", flush=True)

How do I clear a user’s conversation history?

history = get_session_history("user-alice")
history.clear()

For Redis/SQL backends, this deletes the records from the store.

What happens when the conversation exceeds the model’s context window?

Without trimming, the API rejects the request with a context-length error (the OpenAI SDK raises openai.BadRequestError with code context_length_exceeded). Always use either a token-based trimmer or summary memory in production. As a safety net, catch the error, drop the oldest messages, and retry.
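That trim-and-retry safety net can be sketched generically. ContextTooLong below is a stand-in for the provider-specific error (openai.BadRequestError in the OpenAI SDK), and call stands in for chain.invoke:

```python
class ContextTooLong(Exception):
    """Stand-in for a provider's context-length error."""

def invoke_with_trimming(call, messages, max_retries=3):
    """Retry after dropping the oldest non-system messages on overflow."""
    for _ in range(max_retries):
        try:
            return call(messages)
        except ContextTooLong:
            if len(messages) <= 2:
                raise
            # drop the two oldest messages after the system prompt
            messages = [messages[0]] + messages[3:]
    raise ContextTooLong("history could not be trimmed enough")

# Fake model that only accepts up to 4 messages
def fake_call(msgs):
    if len(msgs) > 4:
        raise ContextTooLong()
    return f"ok with {len(msgs)} messages"

msgs = ["system"] + [f"m{i}" for i in range(7)]  # 8 messages
print(invoke_with_trimming(fake_call, msgs))     # "ok with 4 messages"
```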
