Why Code Execution Changes Everything
Most AI assistants tell you what code to write. AutoGen agents write the code, run it, observe the output, and fix errors — all without human intervention. This loop transforms AutoGen from a chatbot into a programming partner that actually gets things done.
The core pattern is simple: an AssistantAgent writes Python or shell code, a UserProxyAgent executes it in a sandbox, and the result goes back to the assistant. If it fails, the assistant reads the error and tries again. This code → run → observe → fix cycle enables AutoGen to solve complex programming tasks autonomously.
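The loop is easy to picture without any framework at all. Here is a minimal hand-rolled sketch of code → run → observe → fix, with no AutoGen and no API key — the "model" is stubbed as a list of attempts (the first deliberately buggy), and `run_python` is a hypothetical helper written just for this illustration:

```python
# A hand-rolled version of the code -> run -> observe -> fix loop.
# The "model" is stubbed so the mechanics are visible without an API key.
import os
import subprocess
import sys
import tempfile

def run_python(code: str) -> tuple[int, str]:
    """Execute a code string in a fresh interpreter; return (exit_code, output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=30
    )
    os.unlink(path)
    return proc.returncode, proc.stdout + proc.stderr

# Stub: the first attempt has a bug; "the model" fixes it after seeing the error.
attempts = ['print(1 / 0)', 'print("fixed:", 42)']
for code in attempts:
    exit_code, output = run_python(code)
    if exit_code == 0:
        break  # success -- in AutoGen the coder would now say DONE
    # On failure, the error text would go back to the model for a retry.
print(output)
```

In AutoGen, the stubbed list is replaced by a real LLM that reads the error output and rewrites the code on each turn.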
Installation
pip install autogen-agentchat "autogen-ext[openai]"
export OPENAI_API_KEY="sk-your-key"
The Two Executors
AutoGen provides two built-in code executors:
LocalCommandLineCodeExecutor — runs code directly on your host machine. Fast, but the agent can access your file system.
DockerCommandLineCodeExecutor — runs code in an isolated Docker container. Slower to start but safe for untrusted code.
Basic Code Execution Setup
import asyncio
from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

async def main():
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")

    # The AI that writes code
    coder = AssistantAgent(
        name="coder",
        model_client=model_client,
        system_message=(
            "You are an expert Python developer. Write clean, working code to solve tasks. "
            "Always put code in ```python blocks. Say DONE when the task is verified."
        ),
    )

    # The agent that runs the code
    executor = CodeExecutorAgent(
        name="executor",
        code_executor=LocalCommandLineCodeExecutor(
            work_dir="./output",  # files saved here
            timeout=30,           # max seconds per execution
        ),
    )

    team = RoundRobinGroupChat(
        [coder, executor],
        termination_condition=TextMentionTermination("DONE"),
        max_turns=10,
    )

    await Console(team.run_stream(
        task="Write a Python script that generates the Fibonacci sequence up to 100 and saves it to fibonacci.txt"
    ))

asyncio.run(main())
Watch the loop in action:
- The coder writes the Python code
- The executor runs it
- If there’s an error, the coder reads it and fixes the code
- When it works, the coder says DONE
Docker Sandbox (Safer Execution)
For production or untrusted environments, use Docker:
pip install "autogen-ext[docker]"
from autogen_agentchat.agents import CodeExecutorAgent
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor

# Inside an async function:
code_executor = DockerCommandLineCodeExecutor(
    image="python:3.12-slim",
    work_dir="./output",
    timeout=60,
    auto_remove=True,  # remove container after execution
)
await code_executor.start()  # pull the image and start the container

executor = CodeExecutorAgent(
    name="executor",
    code_executor=code_executor,
)

# ...run the team as before, then clean up:
await code_executor.stop()
Docker adds ~2–5 second startup time but completely isolates the execution from your host.
Data Analysis Agent
A practical example — an agent that analyzes a CSV file:
import asyncio
from pathlib import Path
from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

# Create a sample CSV for the agent to analyze
Path("output").mkdir(exist_ok=True)
Path("output/sales.csv").write_text(
    "month,revenue,units\n"
    "Jan,45000,150\nFeb,52000,175\nMar,48000,160\n"
    "Apr,61000,205\nMay,58000,195\nJun,70000,235\n"
)

async def main():
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")

    analyst = AssistantAgent(
        name="analyst",
        model_client=model_client,
        system_message=(
            "You are a data analyst. Use pandas and matplotlib to analyze data. "
            # Code runs inside work_dir, so the path is relative to ./output
            "The CSV file is sales.csv in the current directory. "
            "Say VERIFIED when analysis is complete."
        ),
    )

    executor = CodeExecutorAgent(
        name="executor",
        code_executor=LocalCommandLineCodeExecutor(work_dir="./output"),
    )

    team = RoundRobinGroupChat(
        [analyst, executor],
        termination_condition=TextMentionTermination("VERIFIED"),
        max_turns=8,
    )

    await Console(team.run_stream(
        task="Analyze the sales CSV. Calculate total revenue, average monthly revenue, best month, and growth rate. Print a summary report."
    ))

asyncio.run(main())
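For reference, here are the numbers the agent should arrive at, computed directly with the standard library against the same data written to output/sales.csv above:

```python
# The analysis the agent is asked for, done by hand so you can
# sanity-check its report (same data as output/sales.csv).
import csv
import io

CSV = """month,revenue,units
Jan,45000,150
Feb,52000,175
Mar,48000,160
Apr,61000,205
May,58000,195
Jun,70000,235
"""

rows = list(csv.DictReader(io.StringIO(CSV)))
revenues = [int(r["revenue"]) for r in rows]

total = sum(revenues)                               # total revenue
average = total / len(revenues)                     # average monthly revenue
best = max(rows, key=lambda r: int(r["revenue"]))   # best month
growth = (revenues[-1] - revenues[0]) / revenues[0] # Jan -> Jun growth rate

print(f"Total: {total}, Avg: {average:.0f}, Best: {best['month']}, Growth: {growth:.1%}")
```

If the agent's summary disagrees with these figures, that is a sign its code misread the file rather than a quirk of the data.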
Test Writing Agent
Use AutoGen to automatically write and run pytest tests:
# (uses the same imports as the earlier examples)
async def main():
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")

    tester = AssistantAgent(
        name="tester",
        model_client=model_client,
        system_message=(
            "You are a QA engineer. Write pytest unit tests for the provided code. "
            "Run tests, fix failures, and confirm all tests pass. Say PASS when done."
        ),
    )

    executor = CodeExecutorAgent(
        name="executor",
        code_executor=LocalCommandLineCodeExecutor(work_dir="./output"),
    )

    team = RoundRobinGroupChat(
        [tester, executor],
        termination_condition=TextMentionTermination("PASS"),
        max_turns=10,
    )

    await Console(team.run_stream(
        task="""
Write and run pytest tests for this function:

def calculate_discount(price: float, discount_pct: float) -> float:
    if discount_pct < 0 or discount_pct > 100:
        raise ValueError("Discount must be 0-100")
    return price * (1 - discount_pct / 100)

Test: normal discount, zero discount, 100% discount, invalid values.
"""
    ))

asyncio.run(main())
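For comparison, here is one plausible test file the agent might produce (your run will differ — the model writes its own tests, and the file name test_discount.py is just illustrative):

```python
# One plausible test_discount.py the agent could write and run.
import pytest

def calculate_discount(price: float, discount_pct: float) -> float:
    if discount_pct < 0 or discount_pct > 100:
        raise ValueError("Discount must be 0-100")
    return price * (1 - discount_pct / 100)

def test_normal_discount():
    # 20% off 100 -> 80
    assert calculate_discount(100.0, 20.0) == 80.0

def test_zero_discount():
    assert calculate_discount(50.0, 0.0) == 50.0

def test_full_discount():
    # 100% off -> free
    assert calculate_discount(80.0, 100.0) == 0.0

def test_invalid_discount():
    with pytest.raises(ValueError):
        calculate_discount(100.0, 150.0)
    with pytest.raises(ValueError):
        calculate_discount(100.0, -5.0)
```

The executor would run this with pytest, and the tester agent would only say PASS once all four tests are green.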
Controlling What the Agent Can Execute
Tighten what the executor allows via its settings, and expose only vetted helper functions:
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

# Tighten the sandbox: short timeout, no helper functions exposed
executor_config = LocalCommandLineCodeExecutor(
    work_dir="./output",
    timeout=15,       # kill runaway scripts after 15 seconds
    functions=[],     # no extra functions made available to executed code
)

# Or expose vetted functions explicitly, e.g. functions=[safe_read_file]:
def safe_read_file(path: str) -> str:
    """Read a file, restricted to the ./output directory."""
    from pathlib import Path
    # resolve() defeats ../ tricks before the containment check
    p = Path(path).resolve()
    if not p.is_relative_to(Path("./output").resolve()):
        return "Error: can only read files in ./output"
    return p.read_text()
Frequently Asked Questions
Is it safe to let AutoGen execute code on my machine?
LocalCommandLineCodeExecutor runs with your user permissions — the agent can read and write files in work_dir and run any Python code. For development and trusted tasks, this is fine. For untrusted inputs or production, use DockerCommandLineCodeExecutor.
How do I see what files the agent created?
Check the work_dir folder after the run. The executor saves all code files and output there. You can also add a task like “list the files you created” and the agent will run ls to show them.
Can the agent install packages?
Yes — the agent can emit a shell code block like pip install pandas (or call pip via subprocess from Python), and the executor will run it. Set the timeout high enough for package installs (~120s). In Docker mode, installed packages are wiped when the container stops.
How do I prevent infinite loops?
Set max_turns on the team and timeout on the executor. Also give the termination condition a clear signal word (DONE, VERIFIED, PASS) and tell the agent in the system message to use it when finished.
Can the agent debug existing code I provide?
Yes. Include the buggy code in the task description. The agent will read it, run it, observe the error, and fix it iteratively.
Next Steps
- AutoGen Human-in-the-Loop Workflows — Add human review before agents execute critical code
- Getting Started with AutoGen — Back to basics
- CrewAI Multi-Agent Workflows — Compare with CrewAI’s task-based approach