Building a Self-Debugging Agent in Claw Code using ReAct Principles is one of the most practical applications of agentic AI: giving your toolchain the ability to observe its own failures and reason its way to a fix. This tutorial walks you through installing Claw Code from source, understanding the ReAct (Reason + Act) loop, and wiring together a Python orchestrator that sends broken code to the claw CLI, captures the error, and iterates until the code runs cleanly.
If you’ve worked with frameworks like LangChain or are familiar with LlamaIndex Workflows: Event-Driven AI Pipelines, the pattern here will feel natural — a tight observe-reason-act cycle driving incremental improvement.
Installing Claw Code from Source
The claw-code package on crates.io is deprecated and installs the wrong binary. You must build from the ultraworkers/claw-code repository directly.
Prerequisites: Rust toolchain (rustup, cargo), Git, Python 3.10+.
# Clone the repository
git clone https://github.com/ultraworkers/claw-code
cd claw-code/rust
# Build the full workspace
cargo build --workspace
The build output lands at ./target/debug/claw. Verify everything is working:
# Set your API key (Anthropic recommended for this tutorial)
export ANTHROPIC_API_KEY="sk-ant-..."
# Run the health check
./target/debug/claw doctor
A successful doctor run prints your authenticated model, available tools, and sandbox status. If it fails, confirm that ANTHROPIC_API_KEY is exported in the same shell session — a Claude web subscription login is not accepted here.
On Windows (PowerShell), the syntax differs slightly:
$env:ANTHROPIC_API_KEY = "sk-ant-..."
.\target\debug\claw.exe doctor
Keep the absolute path to claw handy — the orchestrator script below will need it.
ReAct Principles: Reason, Act, Observe
ReAct is a prompting and architecture pattern introduced in the 2022 paper ReAct: Synergizing Reasoning and Acting in Language Models. The core idea is simple: instead of generating a single response, the agent alternates between:
- Thought — internal reasoning about the current state
- Action — invoking a tool or executing code
- Observation — reading the result of that action
Then it loops. The loop terminates when the agent decides its goal is satisfied or a maximum iteration count is reached.
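Stripped to its essentials, the loop is a bounded iteration over three callbacks. The sketch below is illustrative only: `reason_and_act` and `run_tool` are hypothetical stand-ins for your model call and tool executor, not part of Claw Code.

```python
def react_loop(goal, reason_and_act, run_tool, max_steps=5):
    """Generic ReAct loop: alternate Thought and Action, feeding back Observations.

    reason_and_act(goal, history) -> (thought, action), where action is either
    ("finish", answer) or ("tool", tool_input).
    run_tool(tool_input) -> observation string.
    """
    history = []  # interleaved (thought, action, observation) records
    for _ in range(max_steps):
        thought, action = reason_and_act(goal, history)  # Reason
        if action[0] == "finish":
            return action[1]  # goal satisfied, exit the loop
        observation = run_tool(action[1])  # Act, then Observe
        history.append((thought, action, observation))
    raise RuntimeError("iteration budget exhausted")
```

The self-debugging agent later in this tutorial is a specialization of this shape: the "tool" is a Python interpreter, and the observation is stderr.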
For a self-debugging agent, the loop looks like this:
flowchart TD
A[User Submits Broken Code] --> B[Agent: Reason about error]
B --> C[Agent: Propose Fix]
C --> D[Execute Fixed Code]
D --> E{Execution Result}
E -- Error --> F[Observation: Capture stderr]
F --> B
E -- Success --> G[Return Fixed Code to User]
G --> H[End]
Each iteration, the agent receives the previous code, the error output, and a reasoning prompt. It produces a corrected version, which the orchestrator executes. This continues until the code exits cleanly (exit code 0) or the iteration budget is exhausted.
This is fundamentally the same pattern used in autonomous agent frameworks; compare it to the inner loop described in What Is AutoGPT? The Autonomous AI Agent Explained.
Architecture of the Orchestrator
Before writing code, let’s define the moving parts:
| Component | Role |
|---|---|
| claw prompt | Calls the LLM with a structured prompt, returns text |
| Python subprocess | Executes the LLM-generated code snippet |
| Orchestrator loop | Feeds error output back as observation |
| Iteration limit | Prevents infinite loops on unsolvable problems |
The orchestrator is intentionally thin — it does not reimplement LLM calling logic. All model interaction goes through claw prompt, keeping the Rust binary as the single source of truth for auth, retries, and model routing.
Implementation: The Self-Debugging Agent
Save this as self_debug_agent.py alongside your claw-code checkout:
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path
# Path to the compiled claw binary — adjust if you ran cargo build --release
CLAW_BIN = Path(__file__).parent / "claw-code" / "rust" / "target" / "debug" / "claw"
MAX_ITERATIONS = 5
def run_claw(prompt: str) -> str:
    """Send a prompt to the claw CLI and return the text response."""
    result = subprocess.run(
        [str(CLAW_BIN), "prompt", prompt],
        capture_output=True,
        text=True,
        timeout=120,
    )
    if result.returncode != 0:
        raise RuntimeError(f"claw exited {result.returncode}: {result.stderr.strip()}")
    return result.stdout.strip()

def execute_python(code: str) -> tuple[bool, str]:
    """Write code to a temp file and execute it. Returns (success, output)."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
        f.write(code)
        tmp_path = f.name
    try:
        result = subprocess.run(
            [sys.executable, tmp_path],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except subprocess.TimeoutExpired:
        # Treat a hang as a failed observation rather than crashing the loop
        return False, "Execution timed out after 30 seconds."
    finally:
        # Always remove the temp file (delete=False leaves it behind otherwise)
        Path(tmp_path).unlink(missing_ok=True)
    output = result.stdout + result.stderr
    return result.returncode == 0, output.strip()

def extract_code_block(text: str) -> str:
    """Pull the first ```python ... ``` block from the model response."""
    in_block = False
    collected = []
    for line in text.splitlines():
        if line.strip().startswith("```python"):
            in_block = True
            continue
        if in_block and line.strip() == "```":
            break
        if in_block:
            collected.append(line)
    if not collected:
        # Fall back: return the whole response if no fenced block was found
        return text
    return "\n".join(collected)

def build_fix_prompt(code: str, error: str, iteration: int) -> str:
    return textwrap.dedent(f"""
        You are a Python debugging assistant operating in a ReAct loop (iteration {iteration}).
        THOUGHT: Analyze why the following code produced an error.
        ACT: Produce a corrected, complete, runnable Python script.
        Rules:
        - Return ONLY a single ```python ... ``` fenced code block.
        - Do not explain. Do not add prose outside the code block.
        - The fixed code must be self-contained and executable with no arguments.
        --- BROKEN CODE ---
        {code}
        --- ERROR OUTPUT ---
        {error}
        Now return the fixed code.
    """).strip()

def self_debug(broken_code: str) -> str:
    """
    ReAct loop: reason about the error, act by generating a fix,
    observe the result, repeat until success or budget exhausted.
    """
    current_code = broken_code
    for iteration in range(1, MAX_ITERATIONS + 1):
        print(f"\n[Iteration {iteration}] Executing code...")
        success, output = execute_python(current_code)
        if success:
            print(f"[Iteration {iteration}] Code executed successfully.")
            return current_code
        print(f"[Iteration {iteration}] Error detected:\n{output}")
        prompt = build_fix_prompt(current_code, output, iteration)
        print(f"[Iteration {iteration}] Asking claw for a fix...")
        response = run_claw(prompt)
        current_code = extract_code_block(response)
    raise RuntimeError(
        f"Agent could not fix the code within {MAX_ITERATIONS} iterations."
    )
# ---------------------------------------------------------------------------
# Example: deliberately broken code for the agent to fix
# ---------------------------------------------------------------------------
BROKEN_CODE = """
import statistics
data = [10, 20, 30, 40, 50]
# Bug 1: wrong function name
mean_val = statistics.average(data)
# Bug 2: undefined variable
print(f"Mean: {mean_val}, Median: {meadian_val}")
"""
if __name__ == "__main__":
    print("=== Self-Debugging Agent (ReAct) ===")
    print("Initial broken code:")
    print(BROKEN_CODE)
    try:
        fixed = self_debug(BROKEN_CODE.strip())
        print("\n=== FINAL FIXED CODE ===")
        print(fixed)
    except RuntimeError as e:
        print(f"\nFailed: {e}")
        sys.exit(1)
Run it from the repo root:
python self_debug_agent.py
You should see the agent iterate — first catching AttributeError: module 'statistics' has no attribute 'average', generating a fix (statistics.mean), then catching NameError: name 'meadian_val' is not defined, producing the correct median_val, and finally printing a clean result.
Testing and Extending the Agent
Verify the sandbox before running untrusted code through the agent:
./target/debug/claw sandbox
This shows the execution context — whether the binary detects a container (Docker/Podman) and what filesystem isolation is in place. For a self-debugging agent that executes arbitrary generated code, running inside a container is strongly recommended in any non-development environment.
To test more complex failure modes, replace BROKEN_CODE with real-world broken snippets:
# Test: import error
BROKEN_CODE = """
import nonexistent_package
print("hello")
"""
# Test: type mismatch
BROKEN_CODE = """
values = {"a": 1, "b": 2}
total = sum(values) # TypeError: unsupported operand type
print(total)
"""
# Test: off-by-one
BROKEN_CODE = """
items = [1, 2, 3]
print(items[3]) # IndexError: list index out of range
"""
Each of these will trigger a different observation in the loop, forcing the agent to reason about a distinct class of Python error.
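If you want the fix prompt to name the error category explicitly, a small helper can pull the exception class out of the captured traceback. This is an optional, hypothetical addition (classify_error is not part of the script above); it relies on CPython tracebacks ending with a "SomeError: message" line.

```python
import re

def classify_error(stderr_text: str) -> str:
    """Best-effort: extract the exception class name from a Python traceback.

    CPython tracebacks end with a line like
    "NameError: name 'x' is not defined", so we read the last non-empty
    line and take the leading Error/Exception identifier from it.
    """
    lines = [ln for ln in stderr_text.strip().splitlines() if ln.strip()]
    if not lines:
        return "UnknownError"
    match = re.match(r"([A-Za-z_][\w.]*(?:Error|Exception))\b", lines[-1].strip())
    return match.group(1) if match else "UnknownError"
```

You could then include its result in the prompt string built by build_fix_prompt, giving the model a one-word hint about which class of bug it is facing.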
Extending to multi-file projects: The current orchestrator writes a single temp file. For projects with imports, use tempfile.TemporaryDirectory(), write all files there, and run python main.py from that directory. The run_claw prompt can include multiple code blocks — just expand extract_code_block to handle them by filename.
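One way to sketch that extension is to adopt a `filename=` convention in the fence info string. Both the convention and the helpers below are assumptions for illustration; nothing in claw prompt enforces them.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path

def extract_named_blocks(text: str) -> dict[str, str]:
    """Parse blocks like ```python filename=main.py ... ``` into {name: code}."""
    pattern = re.compile(r"```python\s+filename=(\S+)\n(.*?)```", re.DOTALL)
    return {name: body.rstrip("\n") + "\n" for name, body in pattern.findall(text)}

def execute_project(blocks: dict[str, str]) -> tuple[bool, str]:
    """Write all files into a temp directory and run main.py from there."""
    with tempfile.TemporaryDirectory() as tmpdir:
        for name, code in blocks.items():
            (Path(tmpdir) / name).write_text(code)
        result = subprocess.run(
            [sys.executable, "main.py"],
            cwd=tmpdir,  # so imports resolve against the generated files
            capture_output=True,
            text=True,
            timeout=30,
        )
    return result.returncode == 0, (result.stdout + result.stderr).strip()
```

The prompt would then instruct the model to emit one fenced block per file, with main.py as the entry point.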
For larger agent orchestration patterns — including role-based task delegation — see Getting Started with CrewAI: Multi-Agent Workflows in Python, which covers a similar observe-plan-act design at the multi-agent level.
Frequently Asked Questions
Why build from source instead of cargo install claw-code?
The claw-code name on crates.io points to a deprecated, unmaintained package. The active project lives at ultraworkers/claw-code and must be compiled from source using cargo build --workspace inside the rust/ directory. The resulting binary is named claw, not claw-code.
Can I use OpenAI models instead of Anthropic?
Yes. Set export OPENAI_API_KEY="..." instead of ANTHROPIC_API_KEY. The claw doctor command will confirm which provider is active. No changes to the orchestrator script are needed — claw prompt handles model routing internally.
How do I prevent the agent from running dangerous generated code?
Two layers of defense: first, run the orchestrator inside a Docker or Podman container (the Containerfile in the repository provides a ready-made image). Second, limit subprocess permissions using Python’s resource module on Linux (set CPU and memory limits before subprocess.run). For production use, treat every generated code snippet as untrusted user input.
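As a concrete sketch of the second layer (Linux/POSIX only, since preexec_fn and the resource module are unavailable on Windows; the specific limit values are illustrative):

```python
import resource
import subprocess
import sys

def run_limited(cmd: list[str], cpu_seconds: int = 5,
                mem_bytes: int = 512 * 1024 * 1024) -> subprocess.CompletedProcess:
    """Run cmd in a child process with CPU-time and address-space limits."""
    def apply_limits() -> None:
        # Runs in the child between fork() and exec(), before cmd starts
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
    return subprocess.run(cmd, preexec_fn=apply_limits,
                          capture_output=True, text=True, timeout=30)
```

Swapping this in for the plain subprocess.run call in execute_python caps runaway generated code at the OS level, independent of the wall-clock timeout.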
What happens if the model returns code that doesn’t include a fenced code block?
The extract_code_block function falls back to returning the entire response. This is intentional: some models return plain code without fencing on simple prompts. If you find the agent frequently producing malformed responses, tighten the prompt by adding "IMPORTANT: your response must contain exactly one ```python block and nothing else." to build_fix_prompt.
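If the fallback is too permissive for your use case, a stricter variant can refuse unfenced responses so the orchestrator can retry instead of executing prose. A sketch (the retry wiring is left to the caller):

```python
def extract_code_block_strict(text: str) -> str:
    """Like extract_code_block, but raise if no closed ```python block exists."""
    in_block = False
    collected = []
    for line in text.splitlines():
        stripped = line.strip()
        if not in_block and stripped.startswith("```python"):
            in_block = True
            continue
        if in_block and stripped == "```":
            return "\n".join(collected)  # first complete fenced block
        if in_block:
            collected.append(line)
    raise ValueError("model response contained no closed ```python block")
```

Catching the ValueError in the loop and re-prompting once is usually enough to recover from a malformed response.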
How do I increase the iteration limit safely?
Change MAX_ITERATIONS at the top of the script. Each iteration makes one claw prompt call (one API request). With Anthropic’s API, each call has a cost. For interactive development, 5 iterations is a reasonable budget. For CI/CD pipelines where you want higher confidence, 8–10 is practical. Always log the iteration count and final code so you can audit how many attempts were needed.