
AutoGen Code Execution: Build Agents That Write and Run Code

#autogen #code-execution #testing #python #sandbox #docker

Why Code Execution Changes Everything

Most AI assistants tell you what code to write. AutoGen agents write the code, run it, observe the output, and fix errors — all without human intervention. This loop transforms AutoGen from a chatbot into a programming partner that actually gets things done.

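The core pattern is simple: an AssistantAgent writes Python or shell code, an executor agent (CodeExecutorAgent in the autogen-agentchat 0.4 API) runs it through a code executor, and the result goes back to the assistant. If the code fails, the assistant reads the error and tries again. This write → run → observe → fix cycle lets AutoGen work through multi-step programming tasks with little or no human input. Note that only the Docker executor is a real sandbox; the local executor runs code directly on your machine.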
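To make the cycle concrete, here is a minimal standard-library sketch of the same loop without any LLM involved. It is not AutoGen's implementation: `run_python`, `solve`, and the `ATTEMPTS` list are hypothetical names, and the list of canned scripts stands in for the assistant, which in a real run would generate each new attempt from the previous error output.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python(code: str, work_dir: Path, timeout: float = 30.0) -> tuple[int, str]:
    """Write code to a file in work_dir, run it, and return (exit_code, output)."""
    script = work_dir / "attempt.py"
    script.write_text(code)
    proc = subprocess.run(
        [sys.executable, script.name],
        cwd=work_dir,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout + proc.stderr


def solve(attempts: list[str], work_dir: Path) -> tuple[int, str]:
    """Run each candidate script in order and stop at the first clean exit.

    Returns (turn_number, output), or (-1, last_output) if every attempt failed.
    """
    output = ""
    for turn, code in enumerate(attempts, start=1):
        exit_code, output = run_python(code, work_dir)
        if exit_code == 0:
            return turn, output
        # A real assistant would read `output` here and write a corrected script.
    return -1, output


# Hypothetical transcript: the first attempt has a bug, the second fixes it.
ATTEMPTS = [
    "print(undefined_name)",  # fails with NameError
    "print(sum(range(5)))",   # succeeds
]

with tempfile.TemporaryDirectory() as tmp:
    turns, result = solve(ATTEMPTS, Path(tmp))

print(f"succeeded on turn {turns}: {result.strip()}")
```

The error text captured from the failed run is exactly what the assistant sees on its next turn, which is why clear tracebacks tend to lead to quicker fixes.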

Installation

pip install -U autogen-agentchat "autogen-ext[openai]"
export OPENAI_API_KEY="sk-your-key"

The Two Executors

AutoGen provides two built-in code executors:

LocalCommandLineCodeExecutor — runs code directly on your host machine. Fast, but the agent can access your file system.

DockerCommandLineCodeExecutor — runs code in an isolated Docker container. Slower to start but safe for untrusted code.

Basic Code Execution Setup

import asyncio
from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

async def main():
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")

    # The AI that writes code
    coder = AssistantAgent(
        name="coder",
        model_client=model_client,
        system_message=(
            "You are an expert Python developer. Write clean, working code to solve tasks. "
            "Always put code in ```python blocks. Say DONE when the task is verified."
        ),
    )

    # The agent that runs the code blocks it receives
    # (in the 0.4 API this is CodeExecutorAgent, not UserProxyAgent)
    executor = CodeExecutorAgent(
        name="executor",
        code_executor=LocalCommandLineCodeExecutor(
            work_dir="./output",  # files saved here
            timeout=30,           # max seconds per execution
        ),
    )

    team = RoundRobinGroupChat(
        [coder, executor],
        termination_condition=TextMentionTermination("DONE"),
        max_turns=10,
    )

    await Console(team.run_stream(
        task="Write a Python script that generates a Fibonacci sequence up to 100 and saves it to fibonacci.txt"
    ))

asyncio.run(main())

Watch the loop:

  1. The coder writes the Python code
  2. The executor runs it
  3. If there’s an error, the coder reads it and fixes the code
  4. When it works, the coder says DONE
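Step 2 depends on the executor finding the code inside the coder's chat message, which is why the system message insists on fenced python blocks. The sketch below shows one way such extraction could work with a regular expression. It is an illustration only: `extract_code_blocks` is a hypothetical helper, and AutoGen's own parser may behave differently.

```python
import re

# Built programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3

CODE_BLOCK = re.compile(
    FENCE + r"(\w+)?[ \t]*\r?\n(.*?)\r?\n" + FENCE,
    re.DOTALL,
)


def extract_code_blocks(message: str) -> list[tuple[str, str]]:
    """Return (language, code) pairs for each fenced block in a chat message."""
    return [(lang or "", code) for lang, code in CODE_BLOCK.findall(message)]


reply = (
    "Here is the script:\n"
    + FENCE + "python\n"
    + "print('hello')\n"
    + FENCE + "\n"
    + "I will say DONE once it is verified."
)

blocks = extract_code_blocks(reply)
print(blocks)
```

A reply with no fenced block yields an empty list, so there is nothing to execute. That is a common reason a run stalls when the model answers in prose instead of code.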

Docker Sandbox (Safer Execution)

For production or untrusted environments, use Docker:

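pip install "autogen-ext[docker]"

from autogen_agentchat.agents import CodeExecutorAgent
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor

# Run this inside an async function: the context manager starts the
# container on entry and stops it on exit.
async with DockerCommandLineCodeExecutor(
    image="python:3.12-slim",
    work_dir="./output",
    timeout=60,
    auto_remove=True,  # remove the container once it stops
) as docker_executor:
    executor = CodeExecutorAgent(name="executor", code_executor=docker_executor)
    # Build the team and run it here, as in the basic setup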

Docker adds roughly 2–5 seconds of startup time, but the code runs inside the container rather than on your host. Only the mounted work_dir is shared between the two.

Data Analysis Agent

A practical example — an agent that analyzes a CSV file:

import asyncio
from pathlib import Path
from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

# Create a sample CSV for the agent to analyze
Path("output").mkdir(exist_ok=True)
Path("output/sales.csv").write_text(
    "month,revenue,units\n"
    "Jan,45000,150\nFeb,52000,175\nMar,48000,160\n"
    "Apr,61000,205\nMay,58000,195\nJun,70000,235\n"
)

async def main():
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")

    analyst = AssistantAgent(
        name="analyst",
        model_client=model_client,
        system_message=(
            "You are a data analyst. Use pandas and matplotlib to analyze data. "
            # Generated code runs with work_dir (./output) as its working directory
            "The CSV file is sales.csv in the current working directory. "
            "Say VERIFIED when analysis is complete."
        ),
    )

    executor = CodeExecutorAgent(
        name="executor",
        code_executor=LocalCommandLineCodeExecutor(work_dir="./output"),
    )

    team = RoundRobinGroupChat(
        [analyst, executor],
        termination_condition=TextMentionTermination("VERIFIED"),
        max_turns=8,
    )

    from autogen_agentchat.ui import Console
    await Console(team.run_stream(
        task="Analyze the sales CSV. Calculate total revenue, average monthly revenue, best month, and growth rate. Print a summary report."
    ))

asyncio.run(main())
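It helps to know the right answers before trusting the agent's report. The sketch below computes the four requested figures for the sample CSV using only the standard library. The `summarize` helper is hypothetical, and "growth rate" is assumed here to mean the change from the first month to the last, which the agent may interpret differently (for example, as month-over-month growth).

```python
import csv
import io

SALES_CSV = (
    "month,revenue,units\n"
    "Jan,45000,150\nFeb,52000,175\nMar,48000,160\n"
    "Apr,61000,205\nMay,58000,195\nJun,70000,235\n"
)


def summarize(csv_text: str) -> dict:
    """Compute the headline figures for a month,revenue,units CSV."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    revenue = [int(row["revenue"]) for row in rows]
    best = max(rows, key=lambda row: int(row["revenue"]))
    return {
        "total_revenue": sum(revenue),
        "average_monthly_revenue": sum(revenue) / len(revenue),
        "best_month": best["month"],
        # Assumption: growth from the first month to the last, in percent.
        "growth_pct": (revenue[-1] - revenue[0]) / revenue[0] * 100,
    }


summary = summarize(SALES_CSV)
for key, value in summary.items():
    print(f"{key}: {value}")
```

If the agent's printed summary disagrees with these figures, the usual cause is a different definition of growth rather than a calculation error, so check its code before its conclusion.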

Test Writing Agent

Use AutoGen to automatically write and run pytest tests:

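import asyncio
from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

async def main():
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")

    tester = AssistantAgent(
        name="tester",
        model_client=model_client,
        system_message=(
            "You are a QA engineer. Write pytest unit tests for the provided code. "
            "Run tests, fix failures, and confirm all tests pass. Say PASS when done."
        ),
    )

    executor = CodeExecutorAgent(
        name="executor",
        code_executor=LocalCommandLineCodeExecutor(work_dir="./output"),
    )

    team = RoundRobinGroupChat(
        [tester, executor],
        # Note: "pytest -v" prints PASSED, which also contains this keyword
        termination_condition=TextMentionTermination("PASS"),
        max_turns=10,  # cap the run in case PASS is never said
    )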

    await Console(team.run_stream(
        task="""
        Write and run pytest tests for this function:

        def calculate_discount(price: float, discount_pct: float) -> float:
            if discount_pct < 0 or discount_pct > 100:
                raise ValueError("Discount must be 0-100")
            return price * (1 - discount_pct / 100)

        Test: normal discount, zero discount, 100% discount, invalid values.
        """
    ))

asyncio.run(main())
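For comparison, here is a sketch of the kind of test file the tester might produce for the four requested cases. The test names and sample values are illustrative, not output from a real run. To keep the sketch free of third-party imports, the hypothetical `raises_value_error` helper stands in for `pytest.raises`; pytest would still collect these `test_` functions and run their plain assertions.

```python
import math


def calculate_discount(price: float, discount_pct: float) -> float:
    if discount_pct < 0 or discount_pct > 100:
        raise ValueError("Discount must be 0-100")
    return price * (1 - discount_pct / 100)


def raises_value_error(price: float, discount_pct: float) -> bool:
    """Return True if calculate_discount rejects the inputs with ValueError."""
    try:
        calculate_discount(price, discount_pct)
    except ValueError:
        return True
    return False


def test_normal_discount():
    # isclose guards against floating-point rounding in the multiplication
    assert math.isclose(calculate_discount(200.0, 25.0), 150.0)


def test_zero_discount():
    assert calculate_discount(80.0, 0.0) == 80.0


def test_full_discount():
    assert calculate_discount(80.0, 100.0) == 0.0


def test_invalid_values():
    assert raises_value_error(50.0, -1.0)
    assert raises_value_error(50.0, 100.5)
    # Both boundaries are valid inputs and must not raise
    assert not raises_value_error(50.0, 0.0)
    assert not raises_value_error(50.0, 100.0)
```

A test file from the agent that skips the boundary values (exactly 0 and exactly 100) is worth a follow-up prompt, since those are the cases the range check most easily gets wrong.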

Controlling What the Agent Can Execute

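Limit what the executor can reach, and expose vetted helpers for anything sensitive:

from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

# A tight timeout and a dedicated work_dir limit the damage a bad script can do.
# Note: an empty `functions` list does not block shell commands by itself.
executor_config = LocalCommandLineCodeExecutor(
    work_dir="./output",
    timeout=15,
    functions=[],           # no registered functions
)

# Or define helpers to expose explicitly (pass them as functions=[safe_read_file]):
def safe_read_file(path: str) -> str:
    """Read a file, but only from inside ./output."""
    from pathlib import Path
    base = Path("./output").resolve()
    p = Path(path).resolve()  # resolve first so "../" cannot slip past the check
    if not p.is_relative_to(base):
        return "Error: can only read files in ./output"
    return p.read_text()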
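A path guard like this is easy to get subtly wrong, because `Path.is_relative_to` compares path components without collapsing `..` segments. The sketch below contrasts a resolved check with a purely lexical one. `is_inside` is a hypothetical helper, and a temporary directory stands in for ./output.

```python
import tempfile
from pathlib import Path


def is_inside(base: Path, candidate: str) -> bool:
    """True if candidate, interpreted relative to base, stays under base."""
    base = base.resolve()
    target = (base / candidate).resolve()  # collapses ".." and follows symlinks
    return target.is_relative_to(base)     # requires Python 3.9+


with tempfile.TemporaryDirectory() as tmp:
    sandbox = Path(tmp) / "output"
    sandbox.mkdir()

    allowed = is_inside(sandbox, "report.txt")
    escaped = is_inside(sandbox, "../secret.txt")

    # The same traversal path passes a lexical check that skips resolve().
    lexical = (sandbox / "../secret.txt").is_relative_to(sandbox)

print(allowed, escaped, lexical)
```

Resolving both sides before comparing is what turns the check from a string comparison into a real containment test.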

Frequently Asked Questions

Is it safe to let AutoGen execute code on my machine?

LocalCommandLineCodeExecutor runs with your user permissions, so generated code can read, modify, or delete anything your account can, not just the files in work_dir. For development and trusted tasks, this is usually acceptable. For untrusted inputs or production, use DockerCommandLineCodeExecutor.

How do I see what files the agent created?

Check the work_dir folder after the run. The executor saves all code files and output there. You can also add a task like “list the files you created” and the agent will run ls to show them.

Can the agent install packages?

Yes. The agent can write import subprocess; subprocess.run(['pip', 'install', 'pandas']) and the executor will run it. Set timeout high enough for package installs (~120s). In Docker mode, installed packages live only in the container, so they are gone once the container is removed.

How do I prevent infinite loops?

Set max_turns on the team and timeout on the executor. Also give the termination condition a clear signal word (DONE, VERIFIED, PASS) and tell the agent in the system message to use it when finished.
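The executor timeout is the guard that catches a script which never returns. The sketch below shows the same idea with the standard library's `subprocess.run`. It is an illustration of the mechanism rather than AutoGen's code, and `run_with_timeout` is a hypothetical helper.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout: float) -> str:
    """Run a Python snippet; return its stdout, or a message if it ran too long."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return f"Timed out after {timeout} seconds"
    return proc.stdout


fast = run_with_timeout("print('ok')", timeout=10)
stuck = run_with_timeout("while True: pass", timeout=1)

print(fast.strip())
print(stuck)
```

The two limits cover different failures: timeout stops a single runaway script, while max_turns stops an agent that keeps producing scripts that finish quickly but never satisfy the task.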

Can the agent debug existing code I provide?

Yes. Include the buggy code in the task description. The agent will read it, run it, observe the error, and fix it iteratively.
