# LlamaIndex Agents vs LangChain Agents
LlamaIndex has its own agent system built around query engine tools — tools that wrap RAG pipelines. This makes LlamaIndex agents uniquely suited for tasks where the agent needs to query different knowledge bases to answer a question.
While LangChain agents excel at general tool use, LlamaIndex agents shine when your tools are primarily document retrieval operations.
## Setup

```bash
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
export OPENAI_API_KEY="sk-your-key"
```
## The Simplest Agent: ReActAgent with Tools
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.tools import QueryEngineTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure global settings
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Build a RAG index
documents = SimpleDirectoryReader("data/").load_data()
index = VectorStoreIndex.from_documents(documents)

# Wrap the query engine as a tool
rag_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="knowledge_base",
    description=(
        "Search the internal knowledge base for information about our products, "
        "policies, and technical documentation. "
        "Use this for any factual questions about our company."
    ),
)

# Create the agent
agent = ReActAgent.from_tools(
    tools=[rag_tool],
    verbose=True,
    max_iterations=10,
)

# Ask a question — the agent decides whether and how to use the tool
response = agent.chat("What is the pricing for the Pro plan?")
print(str(response))
```
## Function Tools: Custom Capabilities
Add any Python function as a tool:
```python
from datetime import datetime, timezone

from llama_index.core.tools import FunctionTool

def get_current_date() -> str:
    """Returns the current UTC date and time."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")

def calculate(expression: str) -> str:
    """Evaluate a math expression. Input: Python math expression like '100 * 1.15'."""
    try:
        # Strip builtins to limit what eval can reach (still not a true sandbox)
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"

def lookup_customer(customer_id: str) -> str:
    """Look up customer information by ID."""
    # In production: query your database
    customers = {
        "C001": "Alice Johnson, Premium, joined 2023-05",
        "C002": "Bob Smith, Free, joined 2024-01",
    }
    return customers.get(customer_id, f"Customer {customer_id} not found")

# Create FunctionTool objects; the function name and docstring become
# the tool's name and description
date_tool = FunctionTool.from_defaults(fn=get_current_date)
calc_tool = FunctionTool.from_defaults(fn=calculate)
lookup_tool = FunctionTool.from_defaults(fn=lookup_customer)
```
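Note that even with `__builtins__` stripped, `eval` is not a real sandbox. If you want a stricter calculator tool, one option is to walk the parsed AST and allow only arithmetic nodes. A minimal sketch (`safe_calculate` and its operator whitelist are illustrative, not part of LlamaIndex):

```python
import ast
import operator

# Whitelisted arithmetic operators
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}

def safe_calculate(expression: str) -> str:
    """Evaluate arithmetic only: numeric literals plus the operators in _OPS."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("disallowed expression")
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval(tree.body))
    except Exception as e:
        return f"Error: {e}"

print(safe_calculate("2 + 3 * 4"))        # 14
print(safe_calculate("__import__('os')"))  # Error: disallowed expression
```

It wraps into a tool the same way: `FunctionTool.from_defaults(fn=safe_calculate)`.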
## Multi-Source Agent
The real power: an agent that queries different knowledge bases based on what the question needs:
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool
from llama_index.core.agent import ReActAgent

# Build separate indexes for different document sets
product_docs = SimpleDirectoryReader("data/products/").load_data()
policy_docs = SimpleDirectoryReader("data/policies/").load_data()
technical_docs = SimpleDirectoryReader("data/technical/").load_data()

product_index = VectorStoreIndex.from_documents(product_docs)
policy_index = VectorStoreIndex.from_documents(policy_docs)
technical_index = VectorStoreIndex.from_documents(technical_docs)

# Wrap each as a tool with a clear description
tools = [
    QueryEngineTool.from_defaults(
        query_engine=product_index.as_query_engine(),
        name="product_catalog",
        description="Information about products, pricing, and features. Use for product questions.",
    ),
    QueryEngineTool.from_defaults(
        query_engine=policy_index.as_query_engine(),
        name="company_policies",
        description="HR policies, legal terms, and company procedures. Use for policy questions.",
    ),
    QueryEngineTool.from_defaults(
        query_engine=technical_index.as_query_engine(),
        name="technical_docs",
        description="API references, setup guides, and technical documentation.",
    ),
]

agent = ReActAgent.from_tools(tools=tools, verbose=True)

# The agent automatically chooses which knowledge base to query
response = agent.chat("What's the refund policy for enterprise customers?")
# → Uses company_policies tool

response = agent.chat("How do I configure the API rate limits?")
# → Uses technical_docs tool
```
## FunctionCallingAgent (Recommended for Tool Calling)
For models with native function calling (GPT-4o, Claude), use FunctionCallingAgent instead of ReActAgent:
```python
from llama_index.core.agent import FunctionCallingAgent

# More reliable than ReAct for tool-heavy workflows
agent = FunctionCallingAgent.from_tools(
    tools=[rag_tool, calc_tool, date_tool],
    verbose=True,
    allow_parallel_tool_calls=True,  # call multiple tools simultaneously
)

response = agent.chat("What's today's date and how many products do we have?")
# Calls date_tool and rag_tool in parallel
```
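What parallel tool calls buy you is latency: independent calls overlap instead of running back to back. The effect can be sketched with plain asyncio, using fake tools that stand in for the real ones (the names and sleep times are illustrative):

```python
import asyncio
import time

async def fake_date_tool() -> str:
    await asyncio.sleep(0.1)  # simulate I/O latency
    return "2024-06-01"

async def fake_rag_tool() -> str:
    await asyncio.sleep(0.1)
    return "12 products"

async def run_parallel() -> list:
    start = time.perf_counter()
    # Both calls in flight at once, as with allow_parallel_tool_calls=True
    results = await asyncio.gather(fake_date_tool(), fake_rag_tool())
    elapsed = time.perf_counter() - start
    print(f"{list(results)} in {elapsed:.2f}s")  # ~0.1s total, not ~0.2s
    return list(results)

asyncio.run(run_parallel())
```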
## Conversational Agent with Memory
```python
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=4096)

agent = FunctionCallingAgent.from_tools(
    tools=[rag_tool, calc_tool],
    memory=memory,
    verbose=True,
)

# Multi-turn conversation — agent remembers context
print(agent.chat("What is the price of the Pro plan?"))
print(agent.chat("And how much would that be for 10 users?"))
print(agent.chat("What about the Enterprise plan?"))
```
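The idea behind a token-limited buffer is simple: once the history exceeds the budget, the oldest messages are evicted first. A toy plain-Python sketch of that behavior (a crude word count stands in for real tokenization, which ChatMemoryBuffer does properly):

```python
from collections import deque

class TrimmingBuffer:
    """Toy chat memory: drop oldest messages when over a rough token budget."""

    def __init__(self, token_limit: int):
        self.token_limit = token_limit
        self.messages: deque = deque()

    def _tokens(self, text: str) -> int:
        # Crude stand-in for a real tokenizer
        return len(text.split())

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Evict oldest messages until the history fits the budget again
        while sum(self._tokens(m) for m in self.messages) > self.token_limit:
            self.messages.popleft()

buf = TrimmingBuffer(token_limit=8)
buf.add("What is the price of the Pro plan?")  # 8 words, fits exactly
buf.add("And for 10 users?")                   # over budget: oldest evicted
print(list(buf.messages))                      # ['And for 10 users?']
```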
## Async Agent for Web Applications
```python
import asyncio

from llama_index.core.agent import FunctionCallingAgent

async def handle_user_query(user_message: str) -> str:
    # `agent` is the FunctionCallingAgent built above; achat is the async API
    response = await agent.achat(user_message)
    return str(response)

async def main():
    queries = [
        "What is the pricing?",
        "How do I get started?",
        "What integrations are available?",
    ]
    responses = await asyncio.gather(*[handle_user_query(q) for q in queries])
    for q, r in zip(queries, responses):
        print(f"Q: {q}\nA: {r}\n")

asyncio.run(main())
```
## Streaming Agent Responses
```python
streaming_agent = FunctionCallingAgent.from_tools(
    tools=[rag_tool],
    llm=OpenAI(model="gpt-4o-mini"),
)

# stream_chat returns a streaming response; iterate tokens as they arrive
response = streaming_agent.stream_chat("Explain the main features.")
for token in response.response_gen:
    print(token, end="", flush=True)
print()
```
## Frequently Asked Questions
### What’s the difference between ReActAgent and FunctionCallingAgent?
ReActAgent uses the ReAct pattern (Reasoning + Acting) via prompt engineering. It works with any LLM. FunctionCallingAgent uses native function/tool calling supported by OpenAI and Anthropic models. FunctionCallingAgent is more reliable and supports parallel tool calls. Use FunctionCallingAgent for GPT-4o and Claude; use ReActAgent for other models.
### How do I limit which tools the agent can use?
Create the agent with only the tools you want to expose, or use a routing layer that selects tools based on the query type before passing to the agent.
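One lightweight routing layer is a keyword-based selector that narrows the tool list before the agent ever sees it. A minimal sketch (the tool names match the multi-source example above; the keyword lists are illustrative, and a production router would more likely use an LLM or embedding classifier):

```python
# Map each tool name to keywords that suggest it is relevant
TOOL_KEYWORDS = {
    "product_catalog": ["product", "pricing", "feature", "plan"],
    "company_policies": ["policy", "refund", "hr", "legal"],
    "technical_docs": ["api", "configure", "setup", "integration"],
}

def select_tool_names(query: str) -> list:
    """Return tool names whose keywords appear in the query (all if none match)."""
    q = query.lower()
    matches = [
        name for name, keywords in TOOL_KEYWORDS.items()
        if any(kw in q for kw in keywords)
    ]
    return matches or list(TOOL_KEYWORDS)

print(select_tool_names("What's the refund policy?"))  # ['company_policies']
```

You would then build the agent from only the selected tools, e.g. filtering the `tools` list from the multi-source example by `tool.metadata.name`.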
### Can the agent call a tool multiple times?
Yes, this is normal. The agent will call tools iteratively until it has enough information. Set max_iterations to prevent infinite loops.
### How do I debug why the agent chose a particular tool?
Set verbose=True on the agent. Each tool call prints the reasoning (ReActAgent) or function call details (FunctionCallingAgent). For deeper inspection, access agent.chat_history after a run.
### Can I build an agent that writes and executes code?
Yes, add a code execution function tool:
```python
import subprocess
import sys

def execute_python(code: str) -> str:
    """Execute Python code and return stdout (or stderr on failure)."""
    # Warning: this runs arbitrary code; sandbox it (container, restricted
    # user, resource limits) before exposing it to an LLM in production.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=10,
    )
    return result.stdout or result.stderr

code_tool = FunctionTool.from_defaults(fn=execute_python)
```
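A quick sanity check of the tool function on its own, before handing it to an agent (this sketch uses `sys.executable` so it works even when `python` is not on the PATH):

```python
import subprocess
import sys

def execute_python(code: str) -> str:
    """Execute Python code in a subprocess and return stdout (or stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=10,
    )
    return result.stdout or result.stderr

print(execute_python("print(sum(range(10)))"))  # 45
print(execute_python("1/0"))                    # traceback text from stderr
```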
## Next Steps
- LlamaIndex Workflows — Orchestrate agents in event-driven pipelines
- LlamaIndex Advanced Retrieval Techniques — Improve the RAG tools your agents use
- LangChain Agents and Tools — Compare with LangChain’s agent approach