
Letta Memory Architecture: How Stateful Agents Remember

#letta #memgpt #memory #archival-memory #core-memory #stateful-agents #python

The Problem with Stateless AI

Every call to an LLM starts from zero. By default, there is no concept of yesterday, last week, or “that customer who called three times about the same issue.” The model’s context window is its entire world — and it disappears after each response.

Letta (formerly MemGPT) was built to solve this. Its core innovation is a three-tier memory architecture that gives agents memory at different timescales and capacities:

  1. Core Memory — Always in context. Small, fast, always available.
  2. Archival Memory — Infinite storage. Searched semantically on demand.
  3. Recall Memory — The conversation history. Searchable by time and content.

Understanding these three layers is essential to building agents that actually remember users, learn from past interactions, and maintain long-running context.
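Before diving into each tier, it helps to picture the division of labor. The toy class below is not Letta's implementation, just a minimal sketch of how the three stores relate: core memory rides along with every prompt, while archival and recall live outside the context window and are queried only when needed.

```python
from dataclasses import dataclass, field

@dataclass
class ToyAgentMemory:
    """Illustrative model of the three memory tiers (not Letta's real implementation)."""
    core: dict = field(default_factory=dict)      # always included in the prompt
    archival: list = field(default_factory=list)  # unlimited, searched on demand
    recall: list = field(default_factory=list)    # full message history

    def build_prompt_context(self) -> str:
        # Only core memory is included unconditionally in every LLM call.
        return "\n".join(f"[{label}]\n{text}" for label, text in self.core.items())

    def remember(self, role: str, content: str) -> None:
        # Every message lands in recall memory automatically.
        self.recall.append({"role": role, "content": content})

    def archive(self, passage: str) -> None:
        # Explicit long-term storage; retrieved later via search.
        self.archival.append(passage)

memory = ToyAgentMemory(core={"persona": "Helpful support agent.", "human": "Name: Sarah"})
memory.remember("user", "Hi there")
memory.archive("Sarah prefers Python.")
print(memory.build_prompt_context())
```

Note how only `core` feeds the prompt directly; the other two tiers stay out of context until something fetches from them.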

Tier 1: Core Memory (In-Context)

Core memory is what the agent always knows. It’s included in every LLM call, so the model can reference it without any retrieval step.

Core memory is structured as memory blocks — labeled sections of text:

Block label   Purpose
persona       The agent's identity, personality, and behavioral guidelines
human         What the agent knows about the current user
system        Task-specific context, rules, or knowledge
Custom        Any labeled block you define

Because core memory is always in-context, it’s size-limited. Each block has a character limit (default ~2,000 characters). The agent can update its own core memory during a conversation by calling internal memory tools.

Reading Core Memory

from letta_client import Letta
import os

client = Letta(api_key=os.getenv("LETTA_API_KEY"))

# Get the persona block
persona_block = client.agents.core_memory.blocks.retrieve(
    agent_id="agent-your-id-here",
    block_label="persona"
)
print(f"Agent persona:\n{persona_block.value}")

# Get the human block (what the agent knows about the user)
human_block = client.agents.core_memory.blocks.retrieve(
    agent_id="agent-your-id-here",
    block_label="human"
)
print(f"User context:\n{human_block.value}")

Updating Core Memory from Outside

You can update memory blocks directly via the API:

# Update what the agent knows about the user
client.agents.core_memory.blocks.modify(
    agent_id="agent-your-id-here",
    block_label="human",
    value=(
        "Name: Sarah Chen\n"
        "Role: Senior Software Engineer\n"
        "Preferred language: Python\n"
        "Current project: Building a RAG system for internal docs\n"
        "Last interaction: Asked about Pinecone vs Weaviate"
    )
)

The Agent Updates Its Own Memory

This is the key to Letta’s design: the agent can call memory tools mid-conversation to update its core memory when it learns something important:

User: "By the way, I prefer concise responses — I'm an experienced developer."

Agent internal thought: I should update the human memory block with this preference.
[calls: core_memory_replace(label="human", old_content="...", new_content="... Prefers concise, technical responses")]

Agent response: "Got it — I'll keep responses concise and technical."

This happens automatically, without you writing any code: the agent decides when to update its memory based on the conversation.
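Conceptually, the replace tool is just a targeted string edit on a block. The function below is an illustrative sketch of those semantics, not Letta's source; only the tool name `core_memory_replace` comes from the transcript above.

```python
def core_memory_replace(blocks: dict, label: str, old_content: str, new_content: str) -> None:
    """Illustrative version of the agent's replace tool: swap a substring in one block."""
    current = blocks[label]
    if old_content not in current:
        raise ValueError(f"'{old_content}' not found in block '{label}'")
    # Replace only the first occurrence, leaving the rest of the block untouched.
    blocks[label] = current.replace(old_content, new_content, 1)

blocks = {"human": "Name: Sarah Chen\nPreferences: unknown"}
core_memory_replace(
    blocks, "human",
    old_content="Preferences: unknown",
    new_content="Preferences: concise, technical responses",
)
print(blocks["human"])
```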

Tier 2: Archival Memory (External Storage)

Archival memory is unlimited storage — think of it as the agent’s long-term knowledge base. Items stored in archival memory are embedded as vectors and retrieved via semantic search when relevant.

Unlike core memory, archival memory is not always in context. The agent must explicitly call archival_memory_search to retrieve relevant passages.

Inserting into Archival Memory

# Store a customer interaction summary
passages = client.agents.archival_memory.create(
    agent_id="agent-your-id-here",
    text=(
        "Customer: Sarah Chen (ID: sc-4821)\n"
        "Date: 2026-04-08\n"
        "Issue: Confusion about Pinecone vs Weaviate performance\n"
        "Resolution: Explained that Weaviate is better for hybrid search, "
        "Pinecone for pure vector similarity at scale.\n"
        "Outcome: Customer satisfied, will evaluate Weaviate."
    ),
    tags=["customer-sarah-chen", "vector-db", "resolved"]
)
print(f"Stored {len(passages)} passage(s)")

How the Agent Searches Archival Memory

During a conversation, if a user says “Do you remember our conversation last month about databases?”, the agent will call archival_memory_search with a query like “database conversation Sarah Chen” and retrieve the relevant passage.
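Under the hood, archival search is vector similarity over embedded passages. As a stand-in for learned embeddings, the sketch below scores passages by bag-of-words cosine similarity; Letta uses real embedding models, but the retrieval shape is the same: query in, ranked passages out.

```python
import math
from collections import Counter

def toy_archival_search(passages: list[str], query: str, limit: int = 5) -> list[str]:
    """Rank passages by bag-of-words cosine similarity to the query (embedding stand-in)."""
    def vec(text: str) -> Counter:
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    q = vec(query)
    return sorted(passages, key=lambda p: cosine(vec(p), q), reverse=True)[:limit]

passages = [
    "Customer asked about Pinecone vs Weaviate for vector search",
    "Billing question about the enterprise plan",
    "Weaviate recommended for hybrid search workloads",
]
top = toy_archival_search(passages, "vector database search", limit=2)
print(top)
```

The point of the sketch: nothing returns unless the query's representation overlaps with a stored passage, which is why the agent must issue an explicit search call rather than "just knowing" archival contents.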

You can also search manually via the API:

# Search archival memory semantically
results = client.agents.archival_memory.list(
    agent_id="agent-your-id-here",
    query="vector database recommendation",
    limit=5,
)

for passage in results:
    print(f"ID: {passage.id}")
    print(f"Text: {passage.text}")
    print(f"Tags: {passage.tags}")
    print("---")

Shared Memory Blocks

One powerful Letta pattern: create a memory block once and attach it to multiple agents. All agents read from the same source of truth:

# Create a shared knowledge block
shared_block = client.blocks.create(
    label="company_info",
    value=(
        "Acme Corp is a B2B SaaS company founded in 2020.\n"
        "Products: TaskFlow (project management), DataPulse (analytics)\n"
        "Support hours: 9am-6pm PST, Mon-Fri\n"
        "SLA: 4-hour response for enterprise, 24h for standard"
    ),
    description="Shared company context for all support agents",
    tags=["shared", "company", "support"]
)

# Attach to multiple agents
for agent_id in support_agent_ids:
    client.agents.core_memory.blocks.attach(
        agent_id=agent_id,
        block_id=shared_block.id
    )

Now all support agents share the same company_info block. Update it once, and all agents immediately see the change.
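Updates propagate instantly because agents hold a reference to the block, not a copy of its text. A plain-Python analogy (not Letta code) makes the aliasing concrete:

```python
# Two "agents" attach the same mutable block object rather than copying its text.
shared_block = {"label": "company_info", "value": "SLA: 24h standard"}

agent_a_memory = {"company_info": shared_block}
agent_b_memory = {"company_info": shared_block}

# One update to the shared object...
shared_block["value"] = "SLA: 4h enterprise, 24h standard"

# ...is visible to every agent that attached it.
print(agent_a_memory["company_info"]["value"])
print(agent_b_memory["company_info"]["value"])
```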

Tier 3: Recall Memory (Conversation History)

Recall memory is the conversation history — every message ever sent to this agent, stored in a searchable database. Unlike other frameworks where conversation history is either in-context (expensive) or lost, Letta stores the full history and retrieves relevant portions on demand.

The agent can search recall memory by:

  • Time range — “What did we discuss in March?”
  • Semantic query — “Find previous questions about billing”
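Both query styles can be sketched over a plain message log. This is illustrative filtering, not Letta's query engine (which also supports semantic ranking):

```python
from datetime import datetime

messages = [
    {"time": datetime(2026, 3, 5), "role": "user", "content": "How does billing work?"},
    {"time": datetime(2026, 3, 20), "role": "user", "content": "Can I upgrade my plan?"},
    {"time": datetime(2026, 4, 2), "role": "user", "content": "My API key stopped working"},
]

def search_recall(log, start=None, end=None, keyword=None):
    """Filter a message log by optional time range and substring match."""
    hits = log
    if start:
        hits = [m for m in hits if m["time"] >= start]
    if end:
        hits = [m for m in hits if m["time"] <= end]
    if keyword:
        hits = [m for m in hits if keyword.lower() in m["content"].lower()]
    return hits

# "What did we discuss in March?"
march = search_recall(messages, start=datetime(2026, 3, 1), end=datetime(2026, 3, 31))
# "Find previous questions about billing"
billing = search_recall(messages, keyword="billing")
print(len(march), len(billing))
```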

From the outside, you can list recent messages:

messages = client.agents.messages.list(
    agent_id="agent-your-id-here",
    limit=20,
)

for msg in messages:
    print(f"[{msg.role}] {msg.content[:100]}...")

Memory Architecture Diagram

┌─────────────────────────────────────────┐
│           LLM Context Window             │
│                                          │
│  ┌─────────────────────────────────┐    │
│  │       Core Memory (always here) │    │
│  │  [persona] [human] [system]     │    │
│  └─────────────────────────────────┘    │
│                                          │
│  ┌─────────────────────────────────┐    │
│  │   Retrieved snippets (on demand)│    │
│  │   from archival / recall        │    │
│  └─────────────────────────────────┘    │
│                                          │
│  [current conversation messages]         │
└─────────────────────────────────────────┘

     ↑ retrieved by search ↑
┌────────────────┐    ┌──────────────────┐
│ Archival Memory│    │  Recall Memory   │
│ (vector store) │    │  (message DB)    │
│ Unlimited      │    │  Full history    │
└────────────────┘    └──────────────────┘

Practical Pattern: Personalized Support Agent

Here’s how to build an agent that remembers each customer across unlimited interactions:

from letta_client import Letta
import os

client = Letta(api_key=os.getenv("LETTA_API_KEY"))

def get_or_create_agent(customer_id: str, customer_name: str):
    """Return existing agent for a customer or create a new one."""
    # Check if agent already exists for this customer
    agents = client.agents.list(tags=[f"customer:{customer_id}"])
    if agents:
        return agents[0]

    # Create a new agent with customer-specific memory
    agent = client.agents.create(
        name=f"support-agent-{customer_id}",
        system=(
            "You are a helpful support agent. "
            "Use your memory tools to remember details about this customer "
            "and provide personalized help. "
            "Always update the human memory block when you learn something new."
        ),
        memory_blocks=[
            {"label": "human", "value": f"Customer name: {customer_name}\nID: {customer_id}"},
            {"label": "persona", "value": "I am a patient, knowledgeable support agent."},
        ],
        tags=[f"customer:{customer_id}"],
    )
    return agent

# Use the agent
agent = get_or_create_agent("cust-001", "Alice Johnson")

response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Hi, I'm having trouble with my API key."}]
)
print(response.messages[-1].content)

Each customer gets a persistent agent that remembers them forever. No session management, no re-sending context — Letta handles it all.

Frequently Asked Questions

How is Letta memory different from LangChain’s ConversationBufferMemory?

ConversationBufferMemory stores messages in memory for the current Python process — it disappears when the process restarts. Letta’s memory is persisted to a database and survives restarts, server crashes, and deployments. Additionally, Letta uses a tiered architecture: core memory is always in context (like a working memory), while archival memory is searched on demand (like long-term memory). LangChain’s memory is all-or-nothing.

What happens when core memory gets full?

The agent uses its core_memory_replace tool to overwrite less important information with more important new information — just like humans prioritize what to remember. For information that shouldn’t be lost, the agent first moves it to archival memory with archival_memory_insert, then frees up the core memory space.
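That spill-to-archival flow amounts to a simple eviction policy. The sketch below is illustrative only; in Letta the agent itself decides what to move via tool calls, but the invariant is the same: the block stays under its limit and nothing is silently dropped.

```python
BLOCK_CHAR_LIMIT = 2000  # default core-memory block limit mentioned above

def append_with_eviction(core_block: str, archival: list[str], new_fact: str,
                         limit: int = BLOCK_CHAR_LIMIT) -> str:
    """Append a fact to a core block; spill the oldest lines to archival if over the limit."""
    lines = core_block.splitlines() + [new_fact]
    while len("\n".join(lines)) > limit and len(lines) > 1:
        # Move the oldest line to archival instead of losing it.
        archival.append(lines.pop(0))
    return "\n".join(lines)

archival: list[str] = []
block = "old fact A\nold fact B"
block = append_with_eviction(block, archival, "new fact C", limit=25)
print(block)     # → old fact B\nnew fact C
print(archival)  # → ['old fact A']
```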

Can I use Letta without the cloud API?

Yes. Run the Letta server locally:

pip install letta
letta server

Then connect to http://localhost:8283. The local server uses SQLite by default.

How do I export an agent’s memories for backup?

# Export all archival memories
passages = client.agents.archival_memory.list(agent_id=agent_id, limit=1000)
import json
with open("agent_memories.json", "w") as f:
    json.dump([p.dict() for p in passages], f)

Is Letta suitable for production customer-facing chatbots?

Yes, that’s a primary use case. The Letta Cloud API is production-ready. For self-hosted deployments, you need PostgreSQL (not SQLite) for concurrent access and proper vector search with pgvector or a dedicated vector DB.
