
OpenHands Advanced Configuration: Agents, Models, and Runtime

#opendevin #openhands #configuration #agents #models #runtime #settings

Configuration Overview

OpenHands is highly configurable. Key areas:

  1. Agent selection — which reasoning architecture to use
  2. LLM configuration — model selection and API settings
  3. Runtime configuration — sandbox environment settings
  4. Microagent configuration — specialized sub-agents
  5. Security settings — what the agent can and cannot do

Configuration lives in config.toml in the project root; most values can also be set via environment variables.

Configuration File

Create config.toml in your OpenHands directory:

[core]
# Base directory for workspace (mounted into the sandbox)
workspace_base = "./workspace"

# Maximum iterations before stopping (prevents infinite loops)
max_iterations = 100

# Default agent to use
default_agent = "CodeActAgent"

[llm]
# Your primary LLM
model = "gpt-4o"
api_key = "${OPENAI_API_KEY}"

# Optional: different model for lightweight tasks
# cheap_model = "gpt-4o-mini"

# Retry configuration
num_retries = 3
retry_min_wait = 10
retry_max_wait = 60

[sandbox]
# Docker image for the execution environment
sandbox_container_image = "ghcr.io/all-hands-ai/runtime:0.14-nikolaik"

# Resource limits
timeout = 120  # seconds per action

# Allowed hostnames for web browsing
# browsing_allowed_hosts = ["github.com", "docs.python.org"]

Choosing the Right Agent

OpenHands supports multiple agent architectures:

CodeActAgent

[core]
default_agent = "CodeActAgent"

CodeActAgent uses a code-execution loop: the LLM writes executable code (shell commands, Python scripts) to accomplish tasks. It observes the output and iterates.

Best for: Software development, file operations, data analysis, running tests.

BrowsingAgent

[core]
default_agent = "BrowsingAgent"

Specialized for web browsing and research. Uses Playwright to control a real browser.

Best for: Research tasks, web scraping, filling out web forms.

SWEAgent

[core]
default_agent = "SWEAgent"

Modeled after the SWE-bench benchmark setup. More methodical approach to software engineering tasks.

Best for: Complex bug fixing, when reliability is more important than speed.

LLM Configuration

OpenAI

[llm]
model = "gpt-4o"
api_key = "sk-..."
base_url = "https://api.openai.com/v1"

Anthropic Claude

[llm]
model = "claude-sonnet-4-6"
api_key = "${ANTHROPIC_API_KEY}"

Azure OpenAI

[llm]
model = "azure/gpt-4o"
api_key = "${AZURE_OPENAI_API_KEY}"
base_url = "https://YOUR_RESOURCE.openai.azure.com/"
api_version = "2024-05-01-preview"

Local Models via Ollama

[llm]
model = "ollama/llama3.2:3b"
base_url = "http://localhost:11434"
# No API key needed

Start Ollama before launching OpenHands:

ollama pull llama3.2:3b
ollama serve

Note: Local models have lower performance for complex tasks. Use at least a 70B parameter model for reliable results.

Multiple Models (Cost Optimization)

Use a capable model for reasoning and a cheaper model for simple tasks:

[llm]
model = "gpt-4o"
api_key = "${OPENAI_API_KEY}"

[llm.cheap]
model = "gpt-4o-mini"
api_key = "${OPENAI_API_KEY}"
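To get a feel for the savings, here is a rough back-of-the-envelope comparison. The per-token prices below are made-up placeholders, not real API pricing; substitute your provider's current rates:

```python
# Illustrative cost comparison between a primary and a cheap model.
# Prices are placeholder values, not real API pricing.
PRICE_PER_1K_INPUT = {"gpt-4o": 0.0025, "gpt-4o-mini": 0.00015}

def cost(model: str, input_tokens: int) -> float:
    """Cost in dollars for a given number of input tokens."""
    return input_tokens / 1000 * PRICE_PER_1K_INPUT[model]

# Routing 80% of a 1M-input-token workload to the cheap model:
full = cost("gpt-4o", 1_000_000)
mixed = cost("gpt-4o", 200_000) + cost("gpt-4o-mini", 800_000)
print(f"all gpt-4o: ${full:.2f}, mixed: ${mixed:.2f}")
```

Even with these rough numbers, routing lightweight tasks to the cheap model cuts the bill by several times.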

Sandbox Configuration

The sandbox is a Docker container where the agent executes code.

Custom Docker Image

Build a custom sandbox with your project’s dependencies pre-installed:

# Dockerfile.sandbox
FROM ghcr.io/all-hands-ai/runtime:0.14-nikolaik

# Install project-specific dependencies
RUN pip install pandas numpy scikit-learn torch

# Or install from requirements.txt
COPY requirements.txt /tmp/
RUN pip install -r /tmp/requirements.txt

# Pre-configure git
RUN git config --global user.email "[email protected]"
RUN git config --global user.name "OpenHands Agent"

Build the image:

docker build -f Dockerfile.sandbox -t my-openhands-sandbox .

Then point the sandbox at it:

[sandbox]
sandbox_container_image = "my-openhands-sandbox"

Resource Limits

[sandbox]
timeout = 300           # 5 minutes per action
# memory_limit = "4g"   # RAM limit (Docker syntax)
# cpu_count = 2         # CPU cores

Network Access

Control what the agent can access on the network:

[sandbox]
# Disable all outbound network access (most secure)
# browsing_allowed_hosts = []

# Allow only specific domains
browsing_allowed_hosts = ["github.com", "pypi.org", "docs.python.org"]
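Conceptually, an allowlist like this compares each request's hostname against the configured domains. A minimal sketch of such a check (illustrative, not OpenHands' actual implementation):

```python
# Sketch of a hostname allowlist check, matching exact hosts and subdomains.
from urllib.parse import urlparse

ALLOWED_HOSTS = ["github.com", "pypi.org", "docs.python.org"]

def is_allowed(url: str) -> bool:
    host = urlparse(url).hostname or ""
    # Accept the domain itself or any subdomain of it.
    return any(host == h or host.endswith("." + h) for h in ALLOWED_HOSTS)

print(is_allowed("https://github.com/All-Hands-AI/OpenHands"))  # True
print(is_allowed("https://example.com/page"))                   # False
```

Note the subdomain check uses "." + h, so lookalike domains such as evilgithub.com do not match.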

Microagents

Microagents are specialized knowledge snippets that OpenHands activates based on keywords in the task. They provide domain-specific instructions without being loaded for every task.

Create microagents/my_project.md:

---
name: my_project
type: knowledge
triggers:
  - "my-app"
  - "myapp"
  - "our API"
---

# My Project Context

## Architecture
- FastAPI backend at /workspace/backend/
- React frontend at /workspace/frontend/
- PostgreSQL database, connection in .env as DATABASE_URL
- Tests run with: cd backend && pytest

## Conventions
- All API endpoints in /workspace/backend/app/routers/
- Database models in /workspace/backend/app/models/
- Always use async/await for database operations
- Run linting: cd backend && ruff check .

## Common Commands
- Start dev server: docker compose up
- Run tests: pytest backend/tests/
- Format code: black backend/ && ruff check --fix backend/

When a task mentions “my-app” or “our API”, this context is automatically injected.
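The trigger mechanism amounts to keyword matching against the task text. A minimal sketch of the idea (illustrative, not OpenHands' actual matching logic; the microagent name and triggers mirror the example above):

```python
# Sketch of keyword-triggered microagent selection.
MICROAGENTS = {
    "my_project": ["my-app", "myapp", "our API"],
}

def active_microagents(task: str) -> list[str]:
    """Return the microagents whose triggers appear in the task text."""
    task_lower = task.lower()
    return [
        name
        for name, triggers in MICROAGENTS.items()
        if any(t.lower() in task_lower for t in triggers)
    ]

print(active_microagents("Add an endpoint to our API"))  # ['my_project']
print(active_microagents("Write a sorting function"))    # []
```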

Security Hardening

For production deployments:

1. Disable dangerous commands:

[sandbox]
# These restrict what can be run in the sandbox
restricted_commands = ["rm -rf /", "dd if=/dev/", "mkfs"]
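A restriction list like this works as pattern matching against the command string. A minimal sketch (illustrative, not the sandbox's real filter); note that substring matching is easy to bypass, which is why the read-only mounts and network isolation below matter too:

```python
# Sketch of a restricted-command filter using substring matching.
RESTRICTED = ["rm -rf /", "dd if=/dev/", "mkfs"]

def is_blocked(command: str) -> bool:
    """Block any command containing a restricted pattern."""
    return any(pattern in command for pattern in RESTRICTED)

print(is_blocked("sudo rm -rf / --no-preserve-root"))  # True
print(is_blocked("rm -rf ./build"))                    # False
```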

2. Read-only workspace options: Mount parts of your filesystem as read-only:

# Mount the source tree read-only and give the agent a writable output dir
docker run \
  -v /path/to/source:/workspace/source:ro \
  -v /path/to/output:/workspace/output:rw \
  openhands

3. Network isolation:

# Run Docker container without internet access
docker run --network=none openhands

4. Server authentication:

[server]
jwt_secret = "${OPENHANDS_JWT_SECRET}"

Performance Tuning

Prompt Extensions

Enable prompt extensions, which let OpenHands inject microagent knowledge and other contextual helpers into the prompt:

[agent]
enable_prompt_extensions = true

Context Management

For large codebases, control how much context is sent to the LLM:

[llm]
# Max tokens to include in context
max_input_tokens = 100000

# Truncation strategy
truncation_strategy = "recent"  # keep most recent context

Caching

OpenHands caches LLM responses during a session. For repeated development cycles, this reduces API costs significantly.

Monitoring and Logging

[core]
# Log level: DEBUG, INFO, WARNING, ERROR
log_level = "INFO"

# Save conversation history
save_trajectory_path = "./trajectories/"

Trajectories are saved as JSON and contain the full agent reasoning + action history — useful for debugging agent behavior.

# Load and analyze a trajectory
import json

with open("trajectories/task_20260408.json") as f:
    trajectory = json.load(f)

for step in trajectory["history"]:
    print(f"Action: {step['action']['action_type']}")
    if "command" in step["action"]:
        print(f"  Command: {step['action']['command']}")
    print(f"  Observation: {step['observation']['content'][:200]}")

Frequently Asked Questions

Which LLM gives the best results?

Based on SWE-bench evaluations:

  1. Claude Sonnet 4+ / GPT-4o — best performance
  2. GPT-4o-mini — good for simpler tasks, much cheaper
  3. Local models (Llama 3.1 70B+) — viable but lower success rate

For production use, Claude Sonnet or GPT-4o is recommended.

How do I prevent the agent from modifying certain files?

Use a .openhands_instructions file in your repository with explicit constraints:

Never modify: .env, config/production.yaml, migrations/
Always run tests after changes.

Or configure read-only volume mounts for critical directories.

Can I run OpenHands without Docker?

Not recommended for production. Docker provides the isolated execution environment that makes OpenHands safe. For development, you can use the local runtime:

[sandbox]
runtime = "local"  # runs directly on host (not sandboxed!)

How do I update OpenHands?

docker pull ghcr.io/all-hands-ai/openhands:main
# or specific version:
docker pull ghcr.io/all-hands-ai/openhands:0.14

What’s the difference between max_iterations and action timeout?

  • max_iterations: total number of action steps the agent can take (prevents infinite loops)
  • timeout: seconds per individual action (prevents a single command from hanging)
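The interaction between the two limits can be sketched as a loop: the iteration cap bounds how many steps run in total, while the timeout bounds each step. This is illustrative pseudologic, not OpenHands internals (a real runtime would enforce the timeout by killing the process, not by checking afterwards):

```python
# Sketch: max_iterations caps total steps; timeout caps each individual action.
import time

MAX_ITERATIONS = 5
ACTION_TIMEOUT = 2.0  # seconds per action

def run_action(i: int) -> str:
    start = time.monotonic()
    # ... the action would execute here ...
    if time.monotonic() - start > ACTION_TIMEOUT:
        return "timeout"  # single action exceeded its budget
    return f"ok-{i}"

results = []
for i in range(MAX_ITERATIONS):  # hard cap prevents infinite loops
    results.append(run_action(i))

print(len(results))  # 5
```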
