Letta’s Deployment Model
Letta runs as a server — a persistent process that hosts all your agents, their memories, and tools. Unlike stateless function calls, Letta agents live on the server between requests.
This means deployment is different from deploying a Python script: you’re running a long-lived service that needs uptime, storage, and proper configuration.
Running Letta Server
Local Development
# Install
pip install letta
# Start server (default: localhost:8283)
letta server
# Check it's running
curl http://localhost:8283/v1/health
# → {"status": "ok"}
Configuration File
Letta reads from ~/.letta/config:
letta configure
# Interactive setup: sets LLM provider, embedding model, storage backend
Or set via environment variables:
export OPENAI_API_KEY="sk-..."
export LETTA_PG_URI="postgresql://user:pass@localhost/letta" # optional: use PostgreSQL
letta server
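If a deployment script assembles `LETTA_PG_URI` from separate secrets, the password must be percent-encoded, or characters like `@` and `/` will break the URI. A minimal illustrative helper (the function name is ours, not part of Letta):

```python
import os
from urllib.parse import quote

def build_pg_uri(user: str, password: str, host: str, db: str) -> str:
    """Build a postgresql:// URI, percent-encoding the password."""
    return f"postgresql://{user}:{quote(password, safe='')}@{host}/{db}"

os.environ["LETTA_PG_URI"] = build_pg_uri("letta", "p@ss/word", "localhost", "letta")
print(os.environ["LETTA_PG_URI"])
# → postgresql://letta:p%40ss%2Fword@localhost/letta
```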
Server Options
# --host 0.0.0.0 listens on all interfaces; --debug enables verbose logging
letta server \
  --host 0.0.0.0 \
  --port 8283 \
  --debug
Docker Deployment
Single-Container Setup
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
RUN pip install letta
# Copy configuration (or use env vars)
ENV OPENAI_API_KEY=""
ENV LETTA_SERVER_PASS="" # server authentication token
EXPOSE 8283
CMD ["letta", "server", "--host", "0.0.0.0", "--port", "8283"]
docker build -t letta-server .
docker run -d \
  -p 8283:8283 \
  -e OPENAI_API_KEY="sk-..." \
  -e LETTA_SERVER_PASS="your-token" \
  -v letta-data:/root/.letta \
  --name letta \
  letta-server
Docker Compose with PostgreSQL
For production, use PostgreSQL instead of SQLite:
# docker-compose.yml
version: "3.8"
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: letta
      POSTGRES_USER: letta
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pg-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U letta"]
      interval: 10s
      retries: 5

  letta:
    image: python:3.11-slim
    command: sh -c "pip install letta && letta server --host 0.0.0.0 --port 8283"
    ports:
      - "8283:8283"
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      LETTA_SERVER_PASS: ${LETTA_SERVER_PASS}
      LETTA_PG_URI: postgresql://letta:${POSTGRES_PASSWORD}@postgres/letta
    depends_on:
      postgres:
        condition: service_healthy
    restart: always

volumes:
  pg-data:
# .env
OPENAI_API_KEY=sk-...
POSTGRES_PASSWORD=secure-password-here
LETTA_SERVER_PASS=api-access-token-here
docker compose up -d
Integrating with Web Applications
REST API
Letta exposes a full REST API. Authenticate with the server password:
import httpx
base_url = "http://localhost:8283"
headers = {"Authorization": "Bearer your-server-pass"}
# Create an agent via REST
response = httpx.post(
    f"{base_url}/v1/agents/",
    headers=headers,
    json={
        "name": "my_agent",
        "system": "You are a helpful assistant.",
        "llm_config": {
            "model": "gpt-4o-mini",
            "model_endpoint_type": "openai",
            "model_endpoint": "https://api.openai.com/v1",
            "context_window": 128000,
        },
        "embedding_config": {
            "embedding_model": "text-embedding-3-small",
            "embedding_endpoint_type": "openai",
            "embedding_endpoint": "https://api.openai.com/v1",
            "embedding_dim": 1536,
        },
        "memory": {
            "memory": {
                "human": {"label": "human", "value": "", "limit": 2000},
                "persona": {"label": "persona", "value": "I am a helpful assistant.", "limit": 2000},
            }
        },
    },
)
agent_id = response.json()["id"]
# Send a message
msg_response = httpx.post(
    f"{base_url}/v1/agents/{agent_id}/messages",
    headers=headers,
    json={"messages": [{"role": "user", "text": "Hello, remember my name is Alex."}]},
)
print(msg_response.json()["messages"][-1]["text"])
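The last entry in `messages` is not always the assistant's reply text (a tool call may come last), so it is more robust to scan backwards for the first message with a non-empty `text` field. A sketch over plain dicts, assuming the response shape shown above:

```python
def last_text(messages: list[dict]) -> str:
    """Return the text of the most recent message that has a non-empty 'text' field."""
    for m in reversed(messages):
        if m.get("text"):
            return m["text"]
    return "No response"

payload = {"messages": [
    {"role": "user", "text": "Hello"},
    {"role": "assistant", "text": ""},        # e.g. a tool call with no text
    {"role": "assistant", "text": "Hi Alex!"},
]}
print(last_text(payload["messages"]))  # → Hi Alex!
```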
FastAPI Integration
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from letta import create_client
app = FastAPI()
client = create_client(base_url="http://localhost:8283", token="your-server-pass")
# Cache: user_id → agent_id mapping
user_agents: dict[str, str] = {}
class ChatRequest(BaseModel):
    user_id: str
    message: str

class ChatResponse(BaseModel):
    response: str
    agent_id: str
def get_or_create_agent(user_id: str) -> str:
    """Get existing agent for user, or create a new one."""
    if user_id in user_agents:
        return user_agents[user_id]

    # Check if agent exists in Letta
    agents = client.list_agents()
    for agent in agents:
        if agent.name == f"user_{user_id}":
            user_agents[user_id] = agent.id
            return agent.id

    # Create new agent
    from letta.schemas.memory import ChatMemory
    from letta.schemas.llm_config import LLMConfig
    from letta.schemas.embedding_config import EmbeddingConfig

    agent = client.create_agent(
        name=f"user_{user_id}",
        system="You are a helpful personal assistant. Remember user preferences and context.",
        memory=ChatMemory(human="", persona="I am a persistent personal assistant."),
        llm_config=LLMConfig(
            model="gpt-4o-mini",
            model_endpoint_type="openai",
            model_endpoint="https://api.openai.com/v1",
            context_window=128000,
        ),
        embedding_config=EmbeddingConfig(
            embedding_model="text-embedding-3-small",
            embedding_endpoint_type="openai",
            embedding_endpoint="https://api.openai.com/v1",
            embedding_dim=1536,
        ),
    )
    user_agents[user_id] = agent.id
    return agent.id
@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    agent_id = get_or_create_agent(request.user_id)
    response = client.send_message(
        agent_id=agent_id,
        message=request.message,
        role="user",
    )
    text = next(
        (m.text for m in reversed(response.messages) if hasattr(m, "text") and m.text),
        "No response",
    )
    return ChatResponse(response=text, agent_id=agent_id)
@app.get("/agents/{user_id}/memory")
async def get_memory(user_id: str):
    agent_id = get_or_create_agent(user_id)
    memory = client.get_core_memory(agent_id)
    return {
        "human": memory.get_block("human").value,
        "persona": memory.get_block("persona").value,
    }
Scaling Considerations
Multiple Workers (Read-Heavy)
For read-heavy workloads (many users reading agent state), run multiple Letta server instances behind a load balancer, all pointing to the same PostgreSQL database:
upstream letta {
    server letta-1:8283;
    server letta-2:8283;
    server letta-3:8283;
}
Important: Agent memory writes are serialized per-agent in PostgreSQL. Avoid concurrent writes to the same agent from multiple servers.
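One way to honor that constraint is to pin each agent to a fixed backend instead of round-robin routing, for example by hashing the agent ID (nginx can do something similar with a `hash` directive in the upstream block). A sketch; the backend list mirrors the upstream block above:

```python
import hashlib

BACKENDS = ["letta-1:8283", "letta-2:8283", "letta-3:8283"]

def backend_for(agent_id: str) -> str:
    """Deterministically map an agent to one backend so its writes never hit two servers."""
    digest = hashlib.sha256(agent_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(BACKENDS)
    return BACKENDS[index]

# The same agent always routes to the same server:
assert backend_for("agent-123") == backend_for("agent-123")
```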
Per-User Agent Management
Don’t create a new agent per request. Create one agent per user and reuse it:
# Good: agent persists, accumulates memory
agent_id = get_or_create_agent(user_id)
client.send_message(agent_id=agent_id, message=msg, role="user")
# Bad: creates new agent every time, no memory accumulation
agent = client.create_agent(...)
client.send_message(agent_id=agent.id, message=msg, role="user")
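The module-level `user_agents` dict in the FastAPI example is not safe if two concurrent requests for the same new user race to create an agent. A lock-guarded cache sketch with the creation call injected (class and names are ours; in practice the factory would wrap `client.create_agent(...)`):

```python
import threading

class AgentCache:
    """Per-user agent cache: creates at most one agent per user, even under concurrency."""

    def __init__(self, create_agent):
        self._create = create_agent          # callable: user_id -> agent_id
        self._ids: dict[str, str] = {}
        self._lock = threading.Lock()

    def get_or_create(self, user_id: str) -> str:
        with self._lock:
            if user_id not in self._ids:
                self._ids[user_id] = self._create(user_id)
            return self._ids[user_id]

calls = []
cache = AgentCache(lambda uid: (calls.append(uid), f"agent-{uid}")[1])
a1 = cache.get_or_create("alex")
a2 = cache.get_or_create("alex")
assert a1 == a2 and calls == ["alex"]  # the factory ran only once
```

Note this cache is per-process; across multiple server processes, the name-lookup fallback (`list_agents` plus a naming convention) still acts as the source of truth.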
Memory Maintenance
For long-lived production agents, periodically compact archival memory to prevent context window bloat:
# Search and summarize old memories
old_memories = client.get_archival_memory(
agent_id=agent_id,
query="", # all memories
limit=100,
)
# Summarize old memories into the agent's human block
if len(old_memories) > 50:
client.send_message(
agent_id=agent_id,
message="Please summarize the key facts about me stored in your archival memory and update your core memory.",
role="system",
)
Monitoring and Observability
# List all agents and their stats
agents = client.list_agents()
for agent in agents:
    print(f"{agent.name}: {agent.id}")

# Check whether an agent has any messages (proxy for activity)
messages = client.get_messages(agent_id=agent_id, limit=1)
print(f"Agent has recent messages: {len(messages) > 0}")
# Health check endpoint
import httpx
health = httpx.get("http://localhost:8283/v1/health")
print(health.json()) # {"status": "ok"}
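A deploy script can poll that health endpoint before routing traffic. A sketch with the probe passed in as a callable, so the retry logic can be tested without a running server (function name is ours):

```python
import time

def wait_until_healthy(check, attempts: int = 5, delay: float = 1.0) -> bool:
    """Poll a health check (a callable returning True/False) with a fixed delay between tries."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False

# In production the probe might be:
#   lambda: httpx.get("http://localhost:8283/v1/health").status_code == 200
results = iter([False, False, True])
assert wait_until_healthy(lambda: next(results), attempts=5, delay=0)
```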
Frequently Asked Questions
Can I run Letta without OpenAI?
Yes. Configure Letta to use any OpenAI-compatible API endpoint:
letta configure
# Select: openai_chat_completions (compatible)
# Set endpoint: http://localhost:11434/v1 (Ollama example)
# Set model: llama3.2
Local models work but require more RAM and produce lower-quality memory management.
How do I back up agent memories?
Back up the PostgreSQL database (or the SQLite file at ~/.letta/sqlite.db). All agent state, memories, and tools are stored there. For PostgreSQL:
pg_dump -U letta letta > backup.sql
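To automate this, a script run from cron or a systemd timer can generate a timestamped `pg_dump` command. A sketch that only builds the command list (execute it with `subprocess.run`); the filename convention is ours:

```python
import datetime

def backup_command(db: str = "letta", user: str = "letta") -> list[str]:
    """Build a pg_dump command writing to a timestamped .sql file."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    return ["pg_dump", "-U", user, "-f", f"{db}-{stamp}.sql", db]

cmd = backup_command()
# e.g. ['pg_dump', '-U', 'letta', '-f', 'letta-20250101-120000.sql', 'letta']
```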
What’s the maximum number of agents I can run?
Memory and compute are the only limits. Each agent’s active context window is loaded on first message. For inactive agents, storage is the only cost. A 4GB server can support hundreds of concurrent agents and thousands of total agents.
Can I use Letta with Claude instead of GPT-4o?
Yes:
llm_config = LLMConfig(
    model="claude-sonnet-4-6",
    model_endpoint_type="anthropic",
    model_endpoint="https://api.anthropic.com",
    context_window=200000,
)
Set ANTHROPIC_API_KEY in your environment.
How do I handle agent versioning?
Letta doesn’t have built-in versioning. Best practice: use agent name conventions (e.g., user_{id}_v2) and migrate by creating a new agent and copying key memories via archival_memory_insert.
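The `_vN` naming convention can be automated with a small helper that bumps the version suffix (illustrative, not part of Letta):

```python
import re

def next_version(name: str) -> str:
    """Bump the trailing _vN suffix on an agent name, adding _v2 if absent."""
    m = re.fullmatch(r"(.*)_v(\d+)", name)
    if m:
        return f"{m.group(1)}_v{int(m.group(2)) + 1}"
    return f"{name}_v2"

assert next_version("user_42") == "user_42_v2"
assert next_version("user_42_v2") == "user_42_v3"
```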
Next Steps
- Letta Multi-Agent Collaboration — Deploy multi-agent systems
- Letta Tool Use and External Integrations — Add capabilities to production agents
- AutoGen Multi-Agent Workflows — Compare with AutoGen’s approach to persistent agents