Unlocking Agent Capabilities: A Beginner's Guide to Function Calling and Tool Use

If you’ve been exploring AI agents, you’ve likely wondered how they move beyond conversation into actually doing things: searching the web, reading files, calling APIs, running code. The answer is function calling and tool use. This guide walks you through it from first principles to a working implementation you can run today.

What Is Function Calling?

By default, a large language model (LLM) outputs text. It can reason, summarize, and generate — but it cannot directly query a database or call a REST API. Function calling (also called tool use) bridges that gap: the model outputs a structured request to invoke a specific function, your application executes it, and the result feeds back to the model.

The key insight is that the model never runs code itself. It decides when and with what arguments to call a function. Your runtime actually executes it and returns the result. This separation is intentional — it keeps the model sandboxed while giving it real-world reach.

Here’s the high-level flow:

sequenceDiagram
    participant User
    participant Agent
    participant LLM
    participant Tool

    User->>Agent: "What's the weather in Seoul?"
    Agent->>LLM: User message + tool definitions
    LLM-->>Agent: tool_call: get_weather(city="Seoul")
    Agent->>Tool: Execute get_weather("Seoul")
    Tool-->>Agent: {"temp": 18, "condition": "cloudy"}
    Agent->>LLM: Tool result appended to context
    LLM-->>Agent: "It's 18°C and cloudy in Seoul."
    Agent-->>User: Final answer

This loop — reason → call → observe → reason — is the foundation of every capable agent you’ll build.

Setting Up Your Environment

We’ll use the Anthropic Python SDK, which has first-class support for tool use. The patterns here map directly onto OpenAI’s function calling API with minor syntax differences, so the concepts transfer immediately.

pip install anthropic python-dotenv

Create a .env file:

ANTHROPIC_API_KEY=your_key_here

Your project layout for this tutorial:

tool_use_demo/
├── .env
├── tools.py          # tool definitions and implementations
├── agent.py          # agent loop
└── main.py           # entry point

Core Concepts: Defining Tools

A tool definition is a JSON schema that tells the model what functions exist, what they do, and what parameters they accept. The model uses this schema to decide whether to call a tool and how to structure the arguments.

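Each tool definition in this guide uses three fields. The API requires name and input_schema; description is technically optional, but you should always provide it: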

Field	Purpose
`name`	Unique identifier the model uses to call the function
`description`	Natural language description — this is what the model reads to decide when to use the tool
`input_schema`	JSON Schema describing the expected parameters

The description field is the most important part. A vague description leads to wrong tool selection. A precise description — including when not to use the tool — leads to accurate, reliable behavior.
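To make the point about descriptions concrete, here is a minimal sketch of a sanity check you could run over your tool definitions before sending them. The helper name check_tool_definition and the 40-character threshold are assumptions made for illustration, not part of any SDK. The name pattern follows the constraint Anthropic documents for tool names, but verify it against the current API reference.

```python
import re

# Assumed name constraint; check your provider's current API reference.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")


def check_tool_definition(tool: dict, min_description_chars: int = 40) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    name = tool.get("name", "")
    if not NAME_PATTERN.match(name):
        problems.append(f"invalid name: {name!r}")
    description = tool.get("description", "")
    if len(description) < min_description_chars:
        problems.append("description is missing or too short to guide tool selection")
    schema = tool.get("input_schema")
    if not isinstance(schema, dict) or schema.get("type") != "object":
        problems.append("input_schema must be a JSON Schema with type 'object'")
    else:
        # Every required parameter should be declared under properties.
        declared = set(schema.get("properties", {}))
        for param in schema.get("required", []):
            if param not in declared:
                problems.append(f"required parameter {param!r} is not in properties")
    return problems


vague = {
    "name": "search",
    "description": "Searches stuff.",
    "input_schema": {"type": "object", "properties": {}, "required": ["query"]},
}
precise = {
    "name": "search_knowledge_base",
    "description": (
        "Searches a local knowledge base and returns matching documents. "
        "Do NOT use for general web search."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

print(check_tool_definition(vague))    # two problems: short description, undeclared parameter
print(check_tool_definition(precise))  # no problems
```

A length check is only a crude proxy for description quality. It catches the obvious one-liners, but whether a description actually steers the model well is something you still have to test with real prompts.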

Here’s tools.py with three practical tools:

# tools.py
import json
import math
from datetime import datetime

# --- Tool Definitions (what the model sees) ---

TOOL_DEFINITIONS = [
    {
        "name": "get_current_time",
        "description": (
            "Returns the current UTC date and time. "
            "Use this when the user asks about the current time, date, or day of week."
        ),
        "input_schema": {
            "type": "object",
            "properties": {},
            "required": [],
        },
    },
    {
        "name": "calculate",
        "description": (
            "Evaluates a safe mathematical expression and returns the result. "
            "Use for arithmetic, trigonometry, or any numeric calculation. "
            "Do NOT use for string operations."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A Python-evaluable math expression, e.g. '2 ** 10' or 'math.sqrt(144)'",
                }
            },
            "required": ["expression"],
        },
    },
    {
        "name": "search_knowledge_base",
        "description": (
            "Searches a local knowledge base and returns matching documents. "
            "Use when the user asks about topics that may be in stored documentation."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query string",
                },
                "max_results": {
                    "type": "integer",
                    "description": "Maximum number of results to return (default: 3)",
                    "default": 3,
                },
            },
            "required": ["query"],
        },
    },
]

# --- Tool Implementations (what your code actually runs) ---

KNOWLEDGE_BASE = {
    "langchain": "LangChain is a framework for building LLM-powered applications with chains and agents.",
    "llamaindex": "LlamaIndex specializes in connecting LLMs with external data via retrieval pipelines.",
    "rag": "RAG (Retrieval-Augmented Generation) combines vector search with LLM generation.",
    "vector database": "Vector databases store embeddings for semantic similarity search.",
}


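def get_current_time() -> dict:
    # datetime.utcnow() is deprecated since Python 3.12; build an aware UTC datetime instead.
    from datetime import timezone

    now = datetime.now(timezone.utc)
    return {
        "utc_time": now.strftime("%Y-%m-%d %H:%M:%S"),
        "day_of_week": now.strftime("%A"),
    }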


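def calculate(expression: str) -> dict:
    # Expose the math module's public names both bare ("sqrt(144)") and prefixed
    # ("math.sqrt(144)"), since the tool description advertises the prefixed form.
    allowed_names = {k: v for k, v in math.__dict__.items() if not k.startswith("_")}
    allowed_names["math"] = math
    allowed_names["abs"] = abs
    try:
        # Caution: eval with empty builtins limits accidents but is not a real sandbox.
        # Use a proper expression parser before exposing this to untrusted input.
        result = eval(expression, {"__builtins__": {}}, allowed_names)
        return {"result": result, "expression": expression}
    except Exception as e:
        return {"error": str(e), "expression": expression}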


def search_knowledge_base(query: str, max_results: int = 3) -> dict:
    query_lower = query.lower()
    matches = [
        {"topic": k, "summary": v}
        for k, v in KNOWLEDGE_BASE.items()
        if any(word in k for word in query_lower.split())
    ]
    return {"results": matches[:max_results], "total_found": len(matches)}


def dispatch_tool(tool_name: str, tool_input: dict) -> str:
    """Route a tool call to the correct implementation."""
    if tool_name == "get_current_time":
        result = get_current_time()
    elif tool_name == "calculate":
        result = calculate(**tool_input)
    elif tool_name == "search_knowledge_base":
        result = search_knowledge_base(**tool_input)
    else:
        result = {"error": f"Unknown tool: {tool_name}"}

    return json.dumps(result)

Implementing the Agent Loop

The agent loop keeps calling the model until a response arrives with a stop_reason of end_turn, meaning the model has produced its final text instead of requesting another tool. Here’s agent.py:

# agent.py
import anthropic
from tools import TOOL_DEFINITIONS, dispatch_tool

client = anthropic.Anthropic()


def run_agent(user_message: str, verbose: bool = True) -> str:
    """
    Run a single-turn agent that can call tools until it produces a final answer.
    Returns the final text response.
    """
    messages = [{"role": "user", "content": user_message}]

    if verbose:
        print(f"\n[User] {user_message}")

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=TOOL_DEFINITIONS,
            messages=messages,
        )

        # Check stop reason
        if response.stop_reason == "end_turn":
            # Model produced a final text response
            final_text = next(
                (block.text for block in response.content if block.type == "text"),
                "",  # fall back to an empty string if the response has no text block
            )
            if verbose:
                print(f"[Agent] {final_text}")
            return final_text

        if response.stop_reason == "tool_use":
            # Collect all tool calls from this response turn
            tool_calls = [b for b in response.content if b.type == "tool_use"]
            tool_results = []

            for tool_call in tool_calls:
                if verbose:
                    print(f"[Tool Call] {tool_call.name}({tool_call.input})")

                result_content = dispatch_tool(tool_call.name, tool_call.input)

                if verbose:
                    print(f"[Tool Result] {result_content}")

                tool_results.append(
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_call.id,
                        "content": result_content,
                    }
                )

            # Append assistant turn (with tool_use blocks) + tool results to messages
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})

        else:
            # Unexpected stop reason
            break

    return "[Agent stopped unexpectedly]"

And main.py to wire it together:

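# main.py
from dotenv import load_dotenv

# Load .env before importing agent: agent.py builds the Anthropic client at import
# time, and the client reads ANTHROPIC_API_KEY from the environment at that point.
load_dotenv()

from agent import run_agent  # noqa: E402 (import intentionally follows load_dotenv)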

if __name__ == "__main__":
    queries = [
        "What time is it right now?",
        "What is 2 to the power of 32?",
        "Tell me about LlamaIndex and RAG.",
        "What's the square root of 1764, and what day is today?",
    ]

    for query in queries:
        print("\n" + "=" * 60)
        run_agent(query)

Run it:

python main.py

You’ll see the agent work through multi-step queries. For the last question, depending on the model’s choices, it may request both get_current_time and calculate in a single turn, then synthesize a natural language answer from the results.

Production Patterns

Once your basic loop works, three patterns separate toy implementations from production-grade agents.

Parallel tool calls: Modern models can request multiple tool calls in a single response turn. The loop above already handles the protocol side: it collects all tool_use blocks and returns all results together in one user message. It does, however, run the tools one after another. When the calls are independent and I/O-bound, dispatch them concurrently (for example with asyncio or ThreadPoolExecutor) so total latency is closer to the slowest call than to the sum of all calls.
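As a rough sketch of concurrent dispatch, the snippet below runs every requested tool in a thread pool and builds the tool_result blocks in the original order. FakeToolCall and slow_dispatch are hypothetical stand-ins for the SDK's tool_use blocks and for dispatch_tool, so the example runs without an API key; dispatch_all is likewise an illustrative name.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class FakeToolCall:
    """Hypothetical stand-in for an SDK tool_use block (id, name, input)."""
    id: str
    name: str
    input: dict = field(default_factory=dict)


def slow_dispatch(tool_name: str, tool_input: dict) -> str:
    """Placeholder for dispatch_tool; sleeps to simulate I/O latency."""
    time.sleep(0.2)
    return json.dumps({"tool": tool_name, "input": tool_input})


def dispatch_all(tool_calls, dispatch, max_workers: int = 4) -> list[dict]:
    """Run every tool call in a thread pool and return tool_result blocks in request order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with tool_calls.
        contents = list(pool.map(lambda c: dispatch(c.name, c.input), tool_calls))
    return [
        {"type": "tool_result", "tool_use_id": call.id, "content": content}
        for call, content in zip(tool_calls, contents)
    ]


calls = [
    FakeToolCall("toolu_1", "get_current_time"),
    FakeToolCall("toolu_2", "calculate", {"expression": "math.sqrt(1764)"}),
]
start = time.perf_counter()
results = dispatch_all(calls, slow_dispatch)
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")  # roughly one sleep, not two, when calls overlap
```

Threads suit blocking I/O such as HTTP requests. If a dispatch function raises, pool.map re-raises the exception when its results are consumed, so in practice you would combine this with per-call error handling.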

Tool result validation — Before returning tool output to the model, validate and truncate. If a search returns 50KB of text, the model’s context fills up fast. Add a max_chars guard:

def safe_dispatch(tool_name: str, tool_input: dict, max_chars: int = 2000) -> str:
    raw = dispatch_tool(tool_name, tool_input)
    if len(raw) > max_chars:
        raw = raw[:max_chars] + "... [truncated]"
    return raw
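
One caveat with slicing a JSON string is that the truncated text no longer parses as JSON. A possible variant, sketched below under the assumption that you want the model to always receive well-formed JSON, wraps the cut-off prefix in a small envelope. The function name truncate_as_json and the envelope keys are illustrative choices.

```python
import json


def truncate_as_json(raw: str, max_chars: int = 2000) -> str:
    """Return raw unchanged if it fits; otherwise wrap a prefix in a valid JSON envelope."""
    if len(raw) <= max_chars:
        return raw
    return json.dumps(
        {
            "truncated": True,
            "original_chars": len(raw),
            "preview": raw[:max_chars],
        }
    )


big = json.dumps({"results": ["x" * 100] * 50})
out = truncate_as_json(big, max_chars=200)
parsed = json.loads(out)  # still parses, unlike a raw slice of the JSON string
print(parsed["truncated"], parsed["original_chars"], len(parsed["preview"]))
```

The envelope adds a few dozen characters on top of max_chars, so treat the limit as approximate.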

Graceful error handling — Tools fail. Network timeouts, bad inputs, permission errors. Return the error as a tool_result with is_error: True instead of raising an exception — the model can then decide to retry, try a different tool, or inform the user:

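# Inside the for loop in agent.py; needs `import json` at the top of that file.
try:
    content, failed = dispatch_tool(tool_call.name, tool_call.input), False
except Exception as e:
    # Report the failure to the model instead of crashing the loop
    content, failed = json.dumps({"error": str(e)}), True

tool_results.append(
    {
        "type": "tool_result",
        "tool_use_id": tool_call.id,
        "content": content,
        "is_error": failed,
    }
)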

For deeper patterns around giving agents persistent state across tool calls, see Advanced State and Memory Management in AgentScope. If you want to see how tool use underpins self-correcting agents, Building a Self-Debugging Agent in Claw Code using ReAct Principles applies the same loop to code execution.

Frequently Asked Questions

What’s the difference between function calling and tool use?

They refer to the same mechanism with different naming conventions. OpenAI coined “function calling” in their API; Anthropic uses “tool use.” The underlying pattern is identical: the model emits a structured JSON call, your code executes it, the result returns to the model. When reading documentation across providers, treat the terms as interchangeable.

Can a tool call trigger another tool call?

Yes — this is exactly what happens in a multi-step agent. After your code returns a tool result, the model reasons again and may emit another tool_use block. The while True loop in agent.py handles this naturally. To prevent infinite loops in production, add a max_iterations counter and break out if it’s exceeded.
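
A minimal sketch of such a guard follows. To keep it runnable without an API key, the model round trip is replaced by a step callable, which is an assumption of this example: it returns the final text when the model is done, or None when more tool calls were requested. The name run_with_limit is also illustrative.

```python
from typing import Callable, Optional


def run_with_limit(step: Callable[[int], Optional[str]], max_iterations: int = 10) -> str:
    """Call step() until it returns a final answer or the iteration budget runs out.

    step stands in for one model round trip: it returns the final text when the
    model is done, or None when it requested more tool calls.
    """
    for iteration in range(max_iterations):
        final_text = step(iteration)
        if final_text is not None:
            return final_text
    return f"[Agent stopped after {max_iterations} iterations without a final answer]"


# A fake step that "finishes" on the third round trip.
print(run_with_limit(lambda i: "done" if i == 2 else None))
# A fake step that never finishes, to show the guard firing.
print(run_with_limit(lambda i: None, max_iterations=3))
```

In agent.py the same idea amounts to replacing while True with a bounded for loop and returning a clear message when the budget is exhausted.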

How do I prevent the model from calling a tool when it shouldn’t?

Two levers: (1) write precise description fields that specify both the ideal use case and explicit exclusions (“do NOT use for X”), and (2) use the tool_choice parameter. Setting tool_choice={"type": "none"} disables all tools for a turn; {"type": "auto"} lets the model decide; {"type": "any"} requires the model to call some tool of its choosing; {"type": "tool", "name": "calculate"} forces a specific tool.
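
As an illustration of the second lever, the sketch below builds the tool_choice value from a simple mode string. The helper build_tool_choice is hypothetical. The "any" mode (the model must call some tool but picks which one) is included alongside the modes described above; confirm the supported values against your provider's current documentation.

```python
from typing import Optional


def build_tool_choice(mode: str, tool_name: Optional[str] = None) -> dict:
    """Map a simple mode string to a tool_choice value."""
    if mode in ("auto", "any", "none"):
        return {"type": mode}
    if mode == "tool":
        if not tool_name:
            raise ValueError("mode 'tool' needs a tool_name")
        return {"type": "tool", "name": tool_name}
    raise ValueError(f"unknown mode: {mode!r}")


# Keyword arguments you might pass to client.messages.create(...);
# model and tools are omitted here for brevity.
request_kwargs = {
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What is 2 ** 10?"}],
    "tool_choice": build_tool_choice("tool", "calculate"),
}
print(request_kwargs["tool_choice"])
```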

What happens if my tool takes too long?

Tool execution happens in your runtime, so standard async/timeout patterns apply. Wrap slow tools in asyncio.wait_for with a timeout, and return the error as a tool_result with is_error: True. The model sees the failure and can decide to retry or proceed without the result.
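
A minimal sketch of that pattern, assuming your tools are written as coroutines: call_with_timeout and slow_tool are hypothetical names, and the tool here just sleeps to simulate a slow backend.

```python
import asyncio
import json


async def call_with_timeout(tool_use_id: str, tool_coro, timeout_s: float) -> dict:
    """Await a tool coroutine; on timeout, return an error tool_result instead of raising."""
    try:
        content = await asyncio.wait_for(tool_coro, timeout=timeout_s)
        return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
    except asyncio.TimeoutError:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": json.dumps({"error": f"tool timed out after {timeout_s}s"}),
            "is_error": True,
        }


async def slow_tool(delay_s: float) -> str:
    """Hypothetical async tool that takes delay_s seconds to respond."""
    await asyncio.sleep(delay_s)
    return json.dumps({"status": "ok"})


async def main() -> list:
    fast = await call_with_timeout("toolu_fast", slow_tool(0.01), timeout_s=1.0)
    slow = await call_with_timeout("toolu_slow", slow_tool(1.0), timeout_s=0.05)
    return [fast, slow]


fast_result, slow_result = asyncio.run(main())
print(fast_result)
print(slow_result)
```

asyncio.wait_for cancels the awaited coroutine on timeout. A blocking synchronous tool would first need to be moved off the event loop (for instance with asyncio.to_thread), and in that case the underlying thread keeps running after the timeout fires.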

How many tools can I define at once?

You can define dozens of tools, but selection accuracy tends to suffer and debugging gets harder as the list grows, so keep tool sets focused. If you have more than 10–15 tools, consider a tool router pattern: a lightweight first call classifies the user intent and loads only the relevant subset of tools for the main agent call.
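
The routing idea can be sketched as follows. To stay self-contained, this example swaps the lightweight classification call for plain keyword matching, which is a deliberate simplification; the group names, keyword lists, and the route_tools helper are all hypothetical.

```python
TOOL_GROUPS = {
    "math": ["calculate"],
    "time": ["get_current_time"],
    "docs": ["search_knowledge_base"],
}

# Hypothetical keyword lists; a real router would likely use a small model call.
INTENT_KEYWORDS = {
    "math": ["calculate", "sum", "power", "root", "+", "*"],
    "time": ["time", "date", "today", "day"],
    "docs": ["tell me about", "documentation", "explain"],
}


def route_tools(user_message: str, all_tools: list[dict]) -> list[dict]:
    """Return only the tool definitions whose group matches the message."""
    text = user_message.lower()
    selected = set()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            selected.update(TOOL_GROUPS[intent])
    # If nothing matched, fall back to the full set rather than hiding every tool.
    if not selected:
        return all_tools
    return [tool for tool in all_tools if tool["name"] in selected]


all_tools = [
    {"name": "calculate"},
    {"name": "get_current_time"},
    {"name": "search_knowledge_base"},
]
print([t["name"] for t in route_tools("What's the square root of 1764?", all_tools)])
print([t["name"] for t in route_tools("Hello there", all_tools)])
```

The returned list is what you would pass as tools to the main agent call. Falling back to the full set on no match trades some focus for safety, since a router that hides the right tool is worse than no router.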