Every AI agent tutorial eventually reaches the same question: how does the agent actually decide what to do next? Understanding the core reasoning loop is the difference between building a system that works once in a demo and one that handles real-world complexity reliably. This article breaks down the core loop architectures, implements a ReAct loop from scratch in Python, and shows the production patterns that keep agents from going off the rails.
What Is a Reasoning Loop?
A reasoning loop is the iterative cycle an AI agent runs to move from an initial goal to a completed task. Unlike a simple prompt-response pair, an agent’s loop interleaves thinking, tool use, and observation — repeating until a stopping condition is met.
The three fundamental components of every reasoning loop are:
- Thought — the model’s internal reasoning about the current state
- Action — a concrete step the agent takes (calling a tool, writing output, delegating a subtask)
- Observation — the result of the action, fed back into the next thought
Most modern agent frameworks are built on top of one of two canonical loop patterns: ReAct (Reasoning + Acting) and Plan-and-Execute. Knowing when to use each is half the battle.
flowchart TD
A([User Goal]) --> B[Thought: Analyze goal]
B --> C{Sufficient info?}
C -- No --> D[Action: Call Tool]
D --> E[Observation: Tool Result]
E --> B
C -- Yes --> F[Action: Generate Final Answer]
F --> G([Output to User])
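The flowchart can be sketched in a few lines of Python. This is a minimal illustration rather than a real agent: the model is replaced by a hypothetical `decide` function with hard-coded rules, and `lookup` is a made-up tool, so only the thought, action, observation cycle is shown.

```python
from typing import Callable


def lookup(term: str) -> str:
    """Hypothetical tool: a one-entry knowledge base."""
    facts = {"asyncio": "asyncio is Python's built-in async framework."}
    return facts.get(term, f"No entry for {term}")


def decide(goal: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for the model's Thought step (hard-coded rules, not an LLM).

    Returns ("tool", argument) while information is missing,
    or ("final", answer) once at least one observation exists.
    """
    if not observations:
        return ("tool", goal)
    return ("final", f"Answer based on: {observations[-1]}")


def run_loop(goal: str, tool: Callable[[str], str], max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        kind, payload = decide(goal, observations)  # Thought
        if kind == "final":                         # Sufficient info? Yes
            return payload
        observations.append(tool(payload))          # Action, then Observation
    return "Stopped: step limit reached."


print(run_loop("asyncio", lookup))
```

Swapping `decide` for a model call and `lookup` for a tool registry gives the shape of the implementation later in this article.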
The ReAct Pattern in Depth
ReAct (introduced by Yao et al., 2022) is one of the most widely used reasoning loops. The model alternates between free-form reasoning traces and discrete tool calls, grounding each thought in real observations.
The loop runs as a structured prompt where each turn appends:
Thought: <what the model is reasoning>
Action: <tool_name>
Action Input: <arguments>
Observation: <tool result>
This continues until the model emits a line beginning with Final Answer:. The main benefit of ReAct is that the reasoning trace is visible, so you can debug why an agent took a particular path.
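When the model produces this format as raw text, the loop has to parse each turn to find out whether to call a tool or stop. Below is a minimal parsing sketch, assuming the exact labels shown above; the function name `parse_step` and the returned dictionary shape are illustrative choices, not part of any library.

```python
import re

# Matches "Action: <tool_name>" followed on the next line by "Action Input: <arguments>"
ACTION_RE = re.compile(r"Action:[ \t]*(?P<tool>\w+)[ \t]*\nAction Input:[ \t]*(?P<args>.*)")
# Everything after "Final Answer:" is treated as the answer
FINAL_RE = re.compile(r"Final Answer:\s*(?P<answer>.*)", re.DOTALL)


def parse_step(text: str) -> dict:
    """Classify one model turn as a final answer, a tool call, or unparsed text."""
    final = FINAL_RE.search(text)
    if final:
        return {"type": "final", "answer": final.group("answer").strip()}
    action = ACTION_RE.search(text)
    if action:
        return {
            "type": "action",
            "tool": action.group("tool"),
            "input": action.group("args").strip(),
        }
    return {"type": "unparsed", "raw": text}


turn = "Thought: I need the quotient.\nAction: calculate\nAction Input: 144 / 12"
print(parse_step(turn))
```

Returning an explicit "unparsed" result lets the loop feed a format reminder back to the model instead of crashing on malformed output.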
Plan-and-Execute, by contrast, separates planning from execution entirely. A planner LLM produces a full step list upfront; an executor works through it sequentially. This is better for long, structured tasks (multi-file codegen, research reports) but less adaptive when early steps reveal unexpected results.
| | ReAct | Plan-and-Execute |
|---|---|---|
| Adaptability | High | Low |
| Token cost | Higher (iterative) | Lower per step |
| Debuggability | Easy (trace visible) | Moderate |
| Best for | Dynamic, open-ended tasks | Structured, predictable tasks |
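To make the contrast concrete, here is a rough Plan-and-Execute skeleton. The `plan` and `execute_step` functions are hypothetical stubs standing in for LLM calls; the point is only the control flow, where the full step list exists before any step runs.

```python
from typing import Callable


def plan(goal: str) -> list[str]:
    """Hypothetical planner: a real one would ask an LLM for the step list."""
    return [f"research: {goal}", f"outline: {goal}", f"write: {goal}"]


def execute_step(step: str) -> str:
    """Hypothetical executor: a real one would call tools or an LLM."""
    return f"done({step})"


def plan_and_execute(
    goal: str,
    planner: Callable[[str], list[str]] = plan,
    executor: Callable[[str], str] = execute_step,
) -> list[tuple[str, str]]:
    steps = planner(goal)  # planning happens once, up front
    results = []
    for step in steps:     # execution walks the list in order
        results.append((step, executor(step)))
    return results


for step, result in plan_and_execute("asyncio report"):
    print(step, "->", result)
```

Because nothing feeds results back into `planner`, an unexpected result in step one cannot change steps two and three, which is the adaptability trade-off shown in the table.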
Implementing a ReAct Loop from Scratch
You don’t need LangChain to build a working ReAct agent. The following implementation uses the Anthropic SDK directly, making every part of the loop transparent.
Prerequisites:
pip install anthropic
export ANTHROPIC_API_KEY="your-key-here"
Tool definitions — tools are just Python functions with a schema:
import json
import anthropic

client = anthropic.Anthropic()

# Tool implementations
def search_web(query: str) -> str:
    """Simulated web search; replace with a real search API."""
    results = {
        "python asyncio": "asyncio is Python's built-in async framework, introduced in 3.4.",
        "langchain agents": "LangChain agents use tool-calling LLMs to solve multi-step tasks.",
    }
    for key, val in results.items():
        if key.lower() in query.lower():
            return val
    return f"No results found for: {query}"
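def calculate(expression: str) -> str:
    """Arithmetic evaluator restricted to digits, spaces, and + - * / ( ) ."""
    allowed = set("0123456789 +-*/().")
    # "**" passes the character check but can request enormous exponents, so reject it too
    if "**" in expression or not all(c in allowed for c in expression):
        return "Error: only basic arithmetic allowed"
    try:
        return str(eval(expression))  # noqa: S307 (input restricted by the checks above)
    except Exception as e:
        return f"Error: {e}"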
# Tool registry
TOOLS = {
    "search_web": search_web,
    "calculate": calculate,
}

# Anthropic tool schemas
TOOL_SCHEMAS = [
    {
        "name": "search_web",
        "description": "Search the web for factual information.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"}
            },
            "required": ["query"],
        },
    },
    {
        "name": "calculate",
        "description": "Evaluate a basic arithmetic expression.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "e.g. '(3 + 5) * 2'"}
            },
            "required": ["expression"],
        },
    },
]
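A stricter alternative to character filtering for the calculator is to parse the expression and allow only specific syntax-tree nodes, which rules out exponentiation, names, and function calls by construction. The sketch below uses the standard-library `ast` module; `calculate_ast` is a hypothetical drop-in name, not something the rest of this article depends on.

```python
import ast
import operator

# Only these operators are permitted; anything else is rejected
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST):
    """Recursively evaluate a node, raising ValueError on anything unexpected."""
    if isinstance(node, ast.Constant):
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def calculate_ast(expression: str) -> str:
    """Evaluate + - * / arithmetic without calling eval()."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError) as e:
        return f"Error: {e}"


print(calculate_ast("(3 + 5) * 2"))
print(calculate_ast("2 ** 10"))
```

The first call prints 16 and the second prints an error string, because the power operator is not in the allowed set.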
The core reasoning loop:
def run_react_agent(goal: str, max_iterations: int = 10) -> str:
    """
    ReAct reasoning loop using Claude's native tool-calling API.
    Returns the agent's final answer as a string.
    """
    messages = [{"role": "user", "content": goal}]

    for iteration in range(1, max_iterations + 1):
        print(f"\n--- Iteration {iteration} ---")
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=TOOL_SCHEMAS,
            messages=messages,
        )

        # Stop condition: the model finished its turn without requesting a tool
        if response.stop_reason == "end_turn":
            # Join every text block into the final answer
            texts = [block.text for block in response.content if block.type == "text"]
            return "\n".join(texts) if texts else "Agent completed without text output."

        # Process tool calls
        if response.stop_reason == "tool_use":
            # Append assistant message with all content blocks
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"  Action: {block.name}({block.input})")
                    observation = TOOLS[block.name](**block.input)
                    print(f"  Observation: {observation}")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": observation,
                    })
            # Feed observations back into the loop
            messages.append({"role": "user", "content": tool_results})
            continue

        # Any other stop reason (for example "max_tokens"): report it rather than
        # falling through to the iteration-limit message
        return f"Stopped early: unexpected stop_reason {response.stop_reason!r}."

    return "Max iterations reached without a final answer."


if __name__ == "__main__":
    goal = "What is asyncio in Python, and what is 144 divided by 12?"
    answer = run_react_agent(goal)
    print(f"\nFinal Answer:\n{answer}")
Run it:
python react_agent.py
Expected output structure (the exact queries, and how tool calls are grouped into iterations, can vary between runs):
--- Iteration 1 ---
Action: search_web({'query': 'python asyncio'})
Observation: asyncio is Python's built-in async framework, introduced in 3.4.
--- Iteration 2 ---
Action: calculate({'expression': '144 / 12'})
Observation: 12.0
--- Iteration 3 ---
Final Answer:
asyncio is Python's built-in asynchronous I/O framework, introduced in version 3.4...
And 144 divided by 12 equals 12.
Adding Memory to the Loop
A bare reasoning loop is stateless: it forgets everything between runs. Episodic memory carries key observations from one run to the next, which helps on repeated or follow-up tasks.
For more on memory architecture, see LangChain Memory Management: Build Chatbots That Remember — the same patterns apply to any reasoning loop, not just LangChain.
Here’s a minimal in-process memory layer:
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MemoryEntry:
    content: str
    # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    relevance_score: float = 1.0


class AgentMemory:
    def __init__(self, max_entries: int = 20):
        self.entries: list[MemoryEntry] = []
        self.max_entries = max_entries

    def add(self, content: str) -> None:
        self.entries.append(MemoryEntry(content=content))
        # Keep only most recent entries
        if len(self.entries) > self.max_entries:
            self.entries = self.entries[-self.max_entries:]

    def format_for_prompt(self) -> str:
        if not self.entries:
            return "No prior context."
        # Only the five most recent entries go into the prompt
        return "\n".join(
            f"[{e.timestamp}] {e.content}" for e in self.entries[-5:]
        )


def run_agent_with_memory(goal: str, memory: AgentMemory) -> str:
    # Relies on run_react_agent() from the previous section
    context = memory.format_for_prompt()
    enriched_goal = f"Prior context:\n{context}\n\nCurrent task: {goal}"
    answer = run_react_agent(enriched_goal)
    memory.add(f"Task: {goal} → Answer: {answer[:200]}")
    return answer


# Usage
memory = AgentMemory()
print(run_agent_with_memory("What is asyncio?", memory))
print(run_agent_with_memory("Give me an asyncio example using what you know.", memory))
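The memory layer above lives only as long as the Python process. One simple way to carry it across sessions is to write the entries to a JSON file and reload them at startup. The sketch below is an assumption about how that might look: the helper names and the file name `agent_memory.json` are made up, and entries are plain dictionaries (for the dataclass above, `dataclasses.asdict(entry)` would produce them).

```python
import json
import tempfile
from pathlib import Path


def save_entries(entries: list[dict], path: Path) -> None:
    """Write memory entries to disk as JSON."""
    path.write_text(json.dumps(entries, indent=2), encoding="utf-8")


def load_entries(path: Path) -> list[dict]:
    """Load entries from disk, or return an empty list on first run."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


# Usage: a temporary directory stands in for a real storage location
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "agent_memory.json"  # hypothetical file name
    entry = {"content": "Task: What is asyncio?", "timestamp": "2024-01-01T00:00:00+00:00"}
    save_entries([entry], store)
    restored = load_entries(store)
    print(restored[0]["content"])
```

A flat file is enough for a single agent process; concurrent writers or large histories would call for a database instead.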
Production Patterns and Failure Modes
Reasoning loops fail in predictable ways. Here are the four you’ll encounter most often and how to handle them.
1. Infinite loops
The max_iterations guard in the implementation above is non-negotiable. Always set it. A model stuck in a cycle (thought → wrong tool → bad observation → repeat) will drain your token budget in minutes.
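An iteration cap stops a runaway loop eventually, but a cycle can often be caught sooner by noticing that the agent keeps issuing the same call. The following is a small sketch of that idea; `RepeatGuard` and its `max_repeats` threshold are hypothetical names chosen for illustration.

```python
import json
from collections import Counter


class RepeatGuard:
    """Counts identical (tool, input) pairs and flags excessive repetition."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def should_stop(self, tool_name: str, tool_input: dict) -> bool:
        # Serialize with sorted keys so equal inputs always map to the same key
        key = json.dumps([tool_name, tool_input], sort_keys=True)
        self.seen[key] += 1
        return self.seen[key] > self.max_repeats


guard = RepeatGuard(max_repeats=2)
for attempt in range(1, 4):
    stop = guard.should_stop("search_web", {"query": "python asyncio"})
    print(f"attempt {attempt}: stop={stop}")
```

In a loop, a `True` result could end the run or be turned into an observation telling the model that the call has already been tried.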
2. Hallucinated tool calls
Models sometimes invent tool names that don’t exist. Validate every tool call against your registry before executing:
if block.name not in TOOLS:
    observation = f"Error: tool '{block.name}' does not exist. Available: {list(TOOLS.keys())}"
else:
    observation = TOOLS[block.name](**block.input)
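The same check can be folded into a single dispatch helper that also catches bad arguments and tool failures, so every problem comes back to the model as an observation instead of crashing the loop. This is a sketch under the assumption that tools take keyword arguments and return something printable; `dispatch_tool` is a hypothetical name.

```python
from typing import Callable


def dispatch_tool(registry: dict[str, Callable], name: str, arguments: dict) -> str:
    """Run a tool by name, converting every failure into an error string."""
    if name not in registry:
        return f"Error: tool '{name}' does not exist. Available: {sorted(registry)}"
    try:
        return str(registry[name](**arguments))
    except TypeError as e:
        # Typically raised for missing or unexpected keyword arguments
        return f"Error: bad arguments for '{name}': {e}"
    except Exception as e:
        return f"Error: '{name}' failed: {e}"


def echo(text: str) -> str:
    return text


demo_registry = {"echo": echo}  # illustrative registry with one tool
print(dispatch_tool(demo_registry, "echo", {"text": "hi"}))
print(dispatch_tool(demo_registry, "translate", {"text": "hi"}))
print(dispatch_tool(demo_registry, "echo", {"wrong": "hi"}))
```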
3. Uncontrolled autonomy
For tasks with side effects (sending emails, writing files, deploying code), add a human approval gate before execution. See AutoGen Human-in-the-Loop: Keep Humans in Control of AI Agents for a pattern that works across frameworks.
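A minimal approval gate can be a function that sits between the loop and the tool. The sketch below assumes a hand-maintained set of side-effecting tool names (`send_email` and `write_file` are invented for the example) and an `approve` callback; `console_approve` shows one possible callback that asks on the terminal.

```python
from typing import Callable

SIDE_EFFECT_TOOLS = {"send_email", "write_file"}  # hypothetical tool names


def console_approve(name: str, arguments: dict) -> bool:
    """Ask a human on the terminal; anything other than 'y' counts as a rejection."""
    reply = input(f"Allow {name}({arguments})? [y/N] ")
    return reply.strip().lower() == "y"


def gated_call(
    name: str,
    arguments: dict,
    tool: Callable[..., str],
    approve: Callable[[str, dict], bool],
) -> str:
    """Run the tool only if it is harmless or a human approved it."""
    if name in SIDE_EFFECT_TOOLS and not approve(name, arguments):
        return f"Error: '{name}' was rejected by the human reviewer."
    return tool(**arguments)


def send_email(to: str) -> str:
    return f"email sent to {to}"  # stub: no email is actually sent


# An automatic rejection stands in for a human saying "no"
print(gated_call("send_email", {"to": "a@example.com"}, send_email, lambda n, a: False))
```

Returning the rejection as an observation lets the agent explain the situation or try a different approach instead of stopping silently.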
4. Context window overflow
Long loops accumulate many messages. Implement a rolling window or summarize older turns:
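def trim_messages(messages: list, keep_last: int = 20) -> list:
    """Keep the first message (the original goal) plus the most recent ones."""
    # With the Anthropic API the system prompt is passed separately,
    # so messages[0] here is the user's goal, not a system message.
    if len(messages) <= keep_last:
        return messages
    # Caution: a cut can separate a tool_use message from its tool_result.
    return messages[:1] + messages[-(keep_last - 1):]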
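One caveat with a fixed-size window: in a tool-calling conversation, a cut can land between an assistant message that requests a tool and the user message that carries its tool_result, and to my knowledge the Anthropic Messages API rejects a tool_result with no matching tool_use before it. A possible variant, assuming tool results are plain dictionaries as in the loop above, moves the cut forward until the kept tail no longer starts with an orphaned result. The name `trim_messages_safely` is hypothetical.

```python
def is_tool_result_message(message: dict) -> bool:
    """True if a message's content list contains a tool_result block."""
    content = message.get("content")
    if not isinstance(content, list):
        return False
    return any(isinstance(b, dict) and b.get("type") == "tool_result" for b in content)


def trim_messages_safely(messages: list, keep_last: int = 20) -> list:
    """Keep the first message plus recent ones without orphaning a tool_result."""
    if len(messages) <= keep_last:
        return messages
    start = len(messages) - (keep_last - 1)
    # Skip forward while the tail would begin with a tool_result whose
    # tool_use request has been trimmed away
    while start < len(messages) and is_tool_result_message(messages[start]):
        start += 1
    return messages[:1] + messages[start:]


# Demo history: the goal, then five request/result pairs
history = [{"role": "user", "content": "goal"}]
for i in range(5):
    history.append({"role": "assistant", "content": f"tool request {i}"})
    history.append({
        "role": "user",
        "content": [{"type": "tool_result", "tool_use_id": f"id{i}", "content": "ok"}],
    })

print(len(trim_messages_safely(history, keep_last=4)))
```

The result can be one message shorter than `keep_last`, which is the price of never sending a result without its request.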
A quick checklist before shipping a reasoning loop to production:
- Maximum iteration count enforced
- All tool inputs validated before execution
- Tool errors caught and returned as observations (not exceptions)
- Message history trimmed to fit context window
- Human-in-the-loop gate for destructive actions
- Structured logging of each thought/action/observation cycle
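For the last checklist item, one lightweight option is to emit a JSON line per cycle through the standard `logging` module so the trace can be searched later. The field names and the 500-character truncation below are arbitrary choices for this sketch, not a required schema.

```python
import json
import logging
import time

logger = logging.getLogger("agent.loop")


def format_step(iteration: int, tool: str, tool_input: dict, observation: str) -> str:
    """Serialize one action/observation cycle as a single JSON line."""
    record = {
        "ts": time.time(),
        "iteration": iteration,
        "tool": tool,
        "input": tool_input,
        "observation": observation[:500],  # cap size so logs stay readable
    }
    return json.dumps(record, default=str)


def log_step(iteration: int, tool: str, tool_input: dict, observation: str) -> None:
    logger.info(format_step(iteration, tool, tool_input, observation))


logging.basicConfig(level=logging.INFO)
log_step(1, "calculate", {"expression": "144 / 12"}, "12.0")
```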
Frequently Asked Questions
What is the difference between a ReAct agent and a function-calling agent?
They’re closely related. Function-calling (or tool-calling) is the mechanism — the model emits a structured request to invoke a function. ReAct is a loop architecture that uses function-calling as its action primitive. A function-calling agent without a loop just calls one tool and stops; a ReAct agent repeats the cycle until the goal is reached.
How many iterations should I allow before stopping the loop?
It depends on task complexity. For simple lookup tasks, 3–5 iterations is usually enough. For research or multi-step coding tasks, 10–15 is reasonable. Set a hard ceiling at 20–25 to prevent runaway loops. Monitor your p95 iteration count in production and tune from there.
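As a rough illustration of the tuning step, the p95 can be computed from logged per-run iteration counts with the nearest-rank method. The sample counts below are invented for the example.

```python
def p95(values: list[int]) -> int:
    """Nearest-rank 95th percentile: the smallest value covering 95% of runs."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) using integer math
    return ordered[rank - 1]


iteration_counts = [2, 3, 3, 4, 3, 2, 5, 3, 4, 12]  # made-up sample data
print(p95(iteration_counts))
```

With only ten samples the p95 is simply the largest value (12 here); the number becomes more meaningful once there are hundreds of runs.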
Can I mix ReAct and Plan-and-Execute in the same system?
Yes — this is called a hybrid agent. A common pattern is to use Plan-and-Execute for the outer structure (break the goal into phases) and ReAct within each phase (adaptive tool use per step). This gives you structure without sacrificing adaptability.
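The control flow of such a hybrid can be sketched with stubs: an outer planner produces phases, and each phase runs its own bounded inner loop. Everything here is hypothetical scaffolding (`demo_planner`, `demo_step`, and the two-step completion rule stand in for model calls).

```python
from typing import Callable

StepFn = Callable[[str, list[str]], tuple[bool, str]]


def run_phase(phase: str, step_fn: StepFn, max_steps: int = 5) -> list[str]:
    """Inner adaptive loop: keep stepping until the phase reports it is done."""
    observations: list[str] = []
    for _ in range(max_steps):
        done, observation = step_fn(phase, observations)
        observations.append(observation)
        if done:
            break
    return observations


def hybrid_agent(
    goal: str, planner: Callable[[str], list[str]], step_fn: StepFn
) -> dict[str, list[str]]:
    """Outer Plan-and-Execute structure with an inner loop for each phase."""
    return {phase: run_phase(phase, step_fn) for phase in planner(goal)}


def demo_planner(goal: str) -> list[str]:
    return [f"research {goal}", f"write {goal}"]  # stub for an LLM planner


def demo_step(phase: str, observations: list[str]) -> tuple[bool, str]:
    step_number = len(observations) + 1
    return (step_number >= 2, f"{phase} / step {step_number}")  # done after two steps


for phase, trace in hybrid_agent("asyncio guide", demo_planner, demo_step).items():
    print(phase, trace)
```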
How do I debug a reasoning loop that produces wrong answers?
Print the full messages list at each iteration. The reasoning trace is your primary debugging tool — look for the step where the model’s thought diverges from reality. Common culprits: a tool returning unexpected output format, a missing observation being silently ignored, or context window truncation cutting off earlier relevant information.
Does the loop pattern change when using open-source models instead of Claude?
The loop logic is identical — it’s just Python. What changes is the tool-calling API. Open-source models may require a different schema format or may not support native tool-calling at all, in which case you implement tool dispatch via regex parsing of the model’s raw text output (the original ReAct paper approach). Claude’s native tool-calling is more reliable and structured, but the loop architecture stays the same regardless.