Why Multi-Agent in Letta?
Letta’s persistent memory architecture makes it uniquely powerful for multi-agent systems. Unlike frameworks where agents lose state after each call, Letta agents maintain:
- Persistent identity — each agent has a consistent persona and history
- Shared memory blocks — agents can read/write shared state without external storage
- Cross-agent messaging — agents communicate asynchronously
This enables architectures where specialized agents collaborate over time, each maintaining their own expertise and context while sharing relevant knowledge.
Core Multi-Agent Concepts
In Letta, multi-agent collaboration happens through:
- Agent-to-Agent Messaging — one agent sends a message to another
- Shared Memory Blocks — agents read from the same memory block
- Orchestration Agents — a manager agent that routes tasks to sub-agents
- Tool Calls to Other Agents — wrapping agents as callable tools
Setting Up a Multi-Agent System
from letta import create_client
from letta.schemas.memory import ChatMemory
from letta.schemas.llm_config import LLMConfig
from letta.schemas.embedding_config import EmbeddingConfig
client = create_client() # connects to local Letta server
# Shared LLM config
llm_config = LLMConfig(
    model="gpt-4o-mini",
    model_endpoint_type="openai",
    model_endpoint="https://api.openai.com/v1",
    context_window=128000,
)
embed_config = EmbeddingConfig(
    embedding_model="text-embedding-3-small",
    embedding_endpoint_type="openai",
    embedding_endpoint="https://api.openai.com/v1",
    embedding_dim=1536,
)
Pattern 1: Orchestrator + Specialist Agents
The most common pattern: one orchestrator routes tasks to specialist agents.
# Create a shared memory block for task status
shared_state = client.create_block(
    label="shared_state",
    value="Task queue: []\nCompleted: []\nErrors: []",
    limit=2000,
)
# Create specialist agents
research_agent = client.create_agent(
    name="researcher",
    system=(
        "You are a specialized research agent. "
        "When asked to research a topic, search for information, "
        "summarize key findings, and store them in your archival memory. "
        "Report back with a structured summary."
    ),
    memory=ChatMemory(
        human="",
        persona="I am a research specialist with deep web search capabilities.",
    ),
    block_ids=[shared_state.id],  # shared access
    llm_config=llm_config,
    embedding_config=embed_config,
)
writer_agent = client.create_agent(
    name="writer",
    system=(
        "You are a specialized content writer. "
        "When given research findings, turn them into clear, engaging articles. "
        "Store completed drafts in your archival memory."
    ),
    memory=ChatMemory(
        human="",
        persona="I am a professional technical writer with a clear, direct style.",
    ),
    block_ids=[shared_state.id],
    llm_config=llm_config,
    embedding_config=embed_config,
)
# Orchestrator agent
orchestrator = client.create_agent(
    name="orchestrator",
    system=(
        "You are the task orchestrator. You coordinate research_agent and writer_agent. "
        "For any content creation task:\n"
        "1. First send the research task to the researcher\n"
        "2. Once research is done, send findings to the writer\n"
        "3. Compile the final output"
    ),
    memory=ChatMemory(
        human="",
        persona="I am an orchestrator who manages a team of specialized agents.",
    ),
    block_ids=[shared_state.id],
    llm_config=llm_config,
    embedding_config=embed_config,
)
Sending Messages Between Agents
# Step 1: Orchestrator receives the user task
orchestrator_response = client.send_message(
    agent_id=orchestrator.id,
    message="Create an article about LLM memory management techniques.",
    role="user",
)
print("Orchestrator:", orchestrator_response.messages[-1].text)
# Step 2: Research agent gets the research task
research_response = client.send_message(
    agent_id=research_agent.id,
    message="Research LLM memory management techniques. Cover: types of memory, popular frameworks, and best practices. Summarize in 500 words.",
    role="user",
)
research_findings = research_response.messages[-1].text
print("Research findings:", research_findings[:300])
# Step 3: Writer agent gets the research and produces content
writer_response = client.send_message(
    agent_id=writer_agent.id,
    message=f"Write a 1000-word article based on this research:\n\n{research_findings}",
    role="user",
)
final_article = writer_response.messages[-1].text
print("Final article:", final_article[:500])
Pattern 2: Shared Memory Blocks
Shared memory blocks are Letta's most distinctive multi-agent feature: multiple agents read and write the same mutable state, with no external storage layer required.
# Create a shared project memory block
project_memory = client.create_block(
    label="project_memory",
    value="""
# Project: AI Agent Cookbook Articles
## Assigned Topics
- LLM memory management: ASSIGNED to researcher
- RAG pipeline optimization: ASSIGNED to researcher
## Research Complete
(none yet)
## Articles Written
(none yet)
""",
    limit=5000,
)
# Both agents share this block
agent_a = client.create_agent(
    name="agent_a",
    block_ids=[project_memory.id],
    # ... other config
)
agent_b = client.create_agent(
    name="agent_b",
    block_ids=[project_memory.id],
    # ... other config
)
# Agent A updates the shared memory block
client.send_message(
    agent_id=agent_a.id,
    message="Update the project memory to mark 'LLM memory management' research as complete with key findings.",
    role="user",
)
# Agent B can now read Agent A's updates
client.send_message(
    agent_id=agent_b.id,
    message="Check the project memory and write an article for the first completed research topic.",
    role="user",
)
Pattern 3: Agents as Tools
Wrap agents as callable tools for seamless orchestration:
from letta.schemas.tool import Tool
def call_research_agent(topic: str) -> str:
    """
    Call the research agent to investigate a topic.
    Returns a structured research summary.

    Args:
        topic: The topic to research

    Returns:
        Research findings as a string
    """
    response = client.send_message(
        agent_id=research_agent.id,
        message=f"Research: {topic}",
        role="user",
    )
    return response.messages[-1].text
# Register as a tool
research_tool = client.create_tool(call_research_agent)
# Attach to orchestrator
client.add_tool_to_agent(
    agent_id=orchestrator.id,
    tool_id=research_tool.id,
)
# Now the orchestrator can call the research agent as a tool
orchestrator_response = client.send_message(
    agent_id=orchestrator.id,
    message="Create a comprehensive guide on RAG pipelines. Use your tools.",
    role="user",
)
Conversation History Across Agents
A key advantage of Letta multi-agent systems: each agent maintains its full conversation history independently.
# Check what each agent remembers
research_memory = client.get_core_memory(research_agent.id)
writer_memory = client.get_core_memory(writer_agent.id)
print("Researcher core memory:", research_memory.get_block("human").value)
print("Writer core memory:", writer_memory.get_block("human").value)
# Search archival memory of an agent
research_archive = client.get_archival_memory(
    agent_id=research_agent.id,
    query="LLM memory management",
    limit=5,
)
for passage in research_archive:
    print(passage.text[:200])
Practical Example: Editorial Team
# Build a 3-agent editorial team
agents = {
    "researcher": client.create_agent(
        name="editorial_researcher",
        system="Research topics thoroughly. Store findings in archival memory.",
        # ...
    ),
    "writer": client.create_agent(
        name="editorial_writer",
        system="Write clear technical articles from research briefs.",
        # ...
    ),
    "editor": client.create_agent(
        name="editorial_editor",
        system="Review articles for accuracy, clarity, and completeness. Return edited version.",
        # ...
    ),
}
# Pipeline
def create_article(topic: str) -> str:
    # Stage 1: Research
    research = client.send_message(
        agent_id=agents["researcher"].id,
        message=f"Research: {topic}",
        role="user",
    ).messages[-1].text
    # Stage 2: Write
    draft = client.send_message(
        agent_id=agents["writer"].id,
        message=f"Write article based on:\n{research}",
        role="user",
    ).messages[-1].text
    # Stage 3: Edit
    final = client.send_message(
        agent_id=agents["editor"].id,
        message=f"Edit this article:\n{draft}",
        role="user",
    ).messages[-1].text
    return final
article = create_article("Letta memory architecture for production AI agents")
print(article)
Frequently Asked Questions
How do I limit what each agent can see in shared memory?
Shared blocks are all-or-nothing per block: any agent attached to a block can read and write all of it. For finer-grained control, split state across several blocks and attach each agent only to the blocks it should see. For example, Agent A gets block_a and block_shared, while Agent B gets block_b and block_shared.
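One way to sketch that split in plain Python, using placeholder block IDs (block_a, block_b, block_shared are illustrative names, not real Letta IDs):

```python
# Map each agent to the block IDs it should be created with.
AGENT_BLOCKS = {
    "agent_a": ["block_a", "block_shared"],
    "agent_b": ["block_b", "block_shared"],
}

def blocks_for(agent_name: str) -> list:
    """Return the block IDs to pass as block_ids= when creating this agent."""
    return AGENT_BLOCKS[agent_name]

# Only block_shared is visible to both agents; block_a and block_b stay private.
shared = set(blocks_for("agent_a")) & set(blocks_for("agent_b"))
```

Each agent's private block never appears in the other agent's `block_ids`, so the access boundary is enforced at creation time.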
Can Letta agents run in parallel?
Yes. client.send_message() is synchronous but you can use Python’s asyncio or ThreadPoolExecutor to call multiple agents concurrently. The Letta server handles concurrent agent state safely.
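A minimal fan-out sketch with ThreadPoolExecutor; ask_agent is a stub here standing in for a wrapper around client.send_message, which in real code would block on the Letta server:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub for a wrapper around client.send_message(agent_id=..., message=...).
def ask_agent(agent_id: str, message: str) -> str:
    return f"{agent_id} handled: {message}"

agent_ids = ["researcher", "writer", "editor"]

# Send the same prompt to several agents concurrently; map preserves order.
with ThreadPoolExecutor(max_workers=len(agent_ids)) as pool:
    results = list(pool.map(lambda aid: ask_agent(aid, "status report"), agent_ids))
```

Because each call targets a different agent, there is no contention over a single agent's state; the server serializes updates per agent.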
What happens if a sub-agent fails?
Exceptions from client.send_message() propagate normally. Wrap calls in try/except and implement retry logic. The orchestrator agent can also be instructed via its system prompt to handle sub-agent failures gracefully.
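A minimal retry wrapper along those lines; `send` stands in for any zero-argument closure over client.send_message:

```python
import time

def call_with_retry(send, attempts: int = 3, delay: float = 1.0) -> str:
    """Call a sub-agent, retrying on failure with a fixed delay between tries.

    `send` is a zero-argument callable, e.g.
    lambda: client.send_message(agent_id=..., message=..., role="user").
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return send()
        except Exception as exc:  # retry any sub-agent failure
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    raise RuntimeError("sub-agent failed after retries") from last_error
```

A fixed delay keeps the sketch short; exponential backoff is a common refinement for production use.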
How many agents can share a memory block?
There’s no hard limit — any number of agents can reference the same block ID. However, concurrent writes to the same block may conflict. For high-concurrency systems, use separate blocks with periodic merge operations.
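One illustrative merge helper (plain Python, not a Letta API) that combines per-agent block values into a single view, assuming each agent writes only to its own block:

```python
def merge_blocks(block_values: dict) -> str:
    """Combine {agent_name: block_value} into one markdown-style document.

    Sorting by agent name makes the merged output deterministic.
    """
    sections = []
    for agent, value in sorted(block_values.items()):
        sections.append(f"## {agent}\n{value.strip()}")
    return "\n\n".join(sections)

merged = merge_blocks({
    "researcher": "Research Complete: LLM memory management",
    "writer": "Articles Written: (none yet)",
})
```

The merged string could then be written back to a single summary block on whatever schedule the system needs.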
Next Steps
- Letta Tool Use and External Integrations — Equip your multi-agent system with external capabilities
- Letta Deployment and Production — Deploy your agent network in production
- CrewAI Multi-Agent Workflows — Compare with CrewAI’s team-based approach