Part of Python AI Tutorial Series

Build AI Apps with Python: Tool Error Handling — Make Tools Bulletproof | Episode 11

Celest Kim

Video: Build AI Apps with Python: Tool Error Handling — Make Tools Bulletproof | Episode 11 by Taught by Celeste AI - AI Coding Coach


Student code: github.com/GoCelesteAI/build-ai-apps-python/tree/main/episode11

A bulletproof tool returns an error dict instead of crashing, and Claude reads it and adapts.

There's a moment in every agent project where a tool fails and the script either crashes, loops forever, or — worst — silently does the wrong thing. The way to prevent all three is the same: never let a tool raise an exception, never return raw failures, always return structured error data, and let Claude react to it.

That's the entire content of this episode. It sounds small. In practice, the difference between an agent that works in demos and an agent that holds up in production is almost entirely about error discipline at the tool layer.

What we're building

Three tools, hardened with try/except, input validation, and structured errors:

  • read_file(path) — refuses paths containing .., returns {"error": ...} for missing files.
  • write_file(path, content) — refuses empty paths and .., returns errors instead of raising.
  • divide(a, b) — validates that both arguments are numbers, refuses zero division.

Three test prompts that will hit errors and let us watch Claude handle them:

  • "Read the file secret.txt" — file doesn't exist; tool returns an error; Claude explains.
  • "Divide 100 by 0, then try dividing 100 by 4" — first call errors, second succeeds; Claude does both.
  • "Write a greeting to hello.txt, then read it back" — happy path, both calls succeed.

Errors as values

A typical Python instinct is to let exceptions propagate. "If the file doesn't exist, raise FileNotFoundError. The caller will deal with it." For agent tools, that instinct is wrong.

The reason is a simple chain of facts. Inside the agent loop:

  1. The model sends a tool_use block.
  2. We dispatch to the corresponding Python function.
  3. The function returns a result.
  4. We wrap the result in a tool_result block and send it back.

If step 3 throws, the script crashes. The model never finds out the tool failed. The user gets a Python traceback, not a helpful answer.

Worse: if you wrap step 3 in a generic try/except that swallows the exception and returns nothing, the agent sees a null tool result and can't tell what happened. Maybe the tool worked and returned nothing. Maybe it crashed. Maybe the file is genuinely empty. The model has no information.

The fix is to catch the exception inside the tool and return it as data:

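import os

# WORK_DIR: the workspace directory the tools are confined to.
def read_file(path):
    try:
        # Validate the raw input before building or touching any path.
        if ".." in path:
            return {"error": "Invalid path: directory traversal not allowed"}
        full_path = os.path.join(WORK_DIR, path)
        if not os.path.exists(full_path):
            return {"error": f"File not found: {path}"}
        with open(full_path, "r") as f:
            return {"path": path, "content": f.read()}
    except Exception as e:
        # Anything unexpected (permissions, decoding, bad types) becomes data too.
        return {"error": f"Read failed: {e}"}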

The function never raises. It always returns a dict. On success, the dict has the success keys (path, content). On failure, it has an error key with a human-readable message.

Claude can read this. The model gets {"error": "File not found: secret.txt"} and reasons: "The file doesn't exist. I should tell the user it isn't there, or suggest they check the filename." That's the magic. The error becomes feedback the model uses to plan the next step.
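To make step 4 of the loop concrete, here is a minimal sketch of how a returned dict might be wrapped before it goes back to the model. The helper name to_tool_result and the IDs are hypothetical. The is_error field is an optional flag on tool_result blocks in the Anthropic Messages API; setting it is a choice made for this sketch, not something the episode's code is confirmed to do.

```python
import json

def to_tool_result(tool_use_id, result):
    """Wrap a tool's return dict in a tool_result block (hypothetical helper)."""
    block = {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(result),
    }
    # Assumption: also flag failures explicitly, in addition to the error key.
    if isinstance(result, dict) and "error" in result:
        block["is_error"] = True
    return block

ok = to_tool_result("toolu_01", {"path": "hello.txt", "content": "Hello!"})
bad = to_tool_result("toolu_02", {"error": "File not found: secret.txt"})
print(bad["content"])  # {"error": "File not found: secret.txt"}
```

Either way, the model receives the message text inside the result, which is what lets it explain the failure or change course.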

Validation before action

if ".." in path:
    return {"error": "Invalid path: directory traversal not allowed"}
if not os.path.exists(full_path):
    return {"error": f"File not found: {path}"}

Two validation rules before we do anything risky.

The first protects the sandbox. We discussed in Episode 9 that joining onto WORK_DIR doesn't fully constrain the path: WORK_DIR/../../etc/passwd resolves outside the workspace. The simple defence is to refuse any path containing "..". It's blunt but effective for tutorial-grade safety. It does not cover every case, though: on POSIX systems os.path.join(WORK_DIR, "/etc/passwd") discards WORK_DIR and returns "/etc/passwd", so an absolute path slips past this check. (For production, use os.path.realpath() to resolve the path and confirm it stays inside WORK_DIR.)

The second is friendly: we check existence before opening the file, so we can return a clear error instead of letting open() raise FileNotFoundError.

The pattern is validate first, then act. Every external boundary — file system, network, database, user input — deserves the same shape.
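The production-grade check mentioned above could look roughly like the following. This is a sketch under assumptions: resolve_in_workdir is a hypothetical helper, and a temporary directory stands in for the real WORK_DIR. It resolves the joined path with os.path.realpath() and compares whole path components with os.path.commonpath().

```python
import os
import tempfile

# Assumption: a temp directory stands in for the real workspace.
WORK_DIR = os.path.realpath(tempfile.mkdtemp())

def resolve_in_workdir(path):
    """Return (full_path, None) if path stays inside WORK_DIR, else (None, error dict)."""
    if not isinstance(path, str) or not path:
        return None, {"error": "Invalid path: path must be a non-empty string"}
    try:
        # realpath collapses ".." segments and follows symlinks.
        full_path = os.path.realpath(os.path.join(WORK_DIR, path))
        # commonpath compares whole components, so /work_evil never passes as /work.
        inside = os.path.commonpath([WORK_DIR, full_path]) == WORK_DIR
    except ValueError:
        inside = False
    if not inside:
        return None, {"error": "Invalid path: outside the workspace"}
    return full_path, None

print(resolve_in_workdir("../../etc/passwd")[1])
print(resolve_in_workdir("/etc/passwd")[1])
```

Both calls return the "outside the workspace" error, while a harmless path such as a/../b.txt resolves to a file directly under WORK_DIR and is allowed.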

The numerical tool: validating types

def divide(a, b):
    try:
        if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
            return {"error": "Both values must be numbers"}
        if b == 0:
            return {"error": "Cannot divide by zero"}
        return {"result": a / b}
    except Exception as e:
        return {"error": f"Division failed: {str(e)}"}

Two validations: type, then value. Even though our schema declares a and b as number, validating inside the function is still the right call, because schemas are advisory to the model, not enforced by the SDK at runtime. If Claude misreads the schema or you forget a constraint, the type check catches it.

Notice we check b == 0 before dividing. We could have let Python raise ZeroDivisionError and caught it, but explicit checks read better and produce cleaner error messages. "Cannot divide by zero" is more useful to the model than "Division failed: division by zero".
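One detail the isinstance check lets through: bool is a subclass of int in Python, so True and False count as numbers, and float values such as NaN or infinity pass as well. If that matters for your tool, a stricter variant might look like this. The names is_real_number and divide_strict are hypothetical, introduced only for this sketch.

```python
import math

def is_real_number(value):
    """True for int and finite float; False for bool, NaN, infinity, and other types."""
    if isinstance(value, bool):  # bool is a subclass of int, so rule it out first
        return False
    if isinstance(value, int):
        return True
    return isinstance(value, float) and math.isfinite(value)

def divide_strict(a, b):
    try:
        if not is_real_number(a) or not is_real_number(b):
            return {"error": "Both values must be finite numbers"}
        if b == 0:
            return {"error": "Cannot divide by zero"}
        return {"result": a / b}
    except Exception as e:
        # For example, OverflowError when an integer is too large for a float result.
        return {"error": f"Division failed: {e}"}

print(divide_strict(100, 4))     # {'result': 25.0}
print(divide_strict(True, 2))    # {'error': 'Both values must be finite numbers'}
```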

The system prompt: tell the model errors are normal

system="You are a helpful assistant. Respond in plain text. No markdown. If a tool returns an error, explain what went wrong and try a different approach."

That last clause is doing a lot of work. Without it, Claude might:

  • Give up after the first error.
  • Apologise and stop without trying alternatives.
  • Confidently invent a result ("I successfully read the file...") despite the error.

With it, the model is expected to read the error, react, and either retry, suggest a fix to the user, or move on. This is part of the prompt-engineering discipline we'll spend more time on in Episode 24: tell the model what to do when things don't go to plan.

Watching it run

Task: Read the file secret.txt
--------------------------------------------------
  Step 1: read_file({'path': 'secret.txt'})
  Result: {'error': 'File not found: secret.txt'}

  Answer: I couldn't find a file called secret.txt in the workspace. Could you check the filename or let me know what you'd like to read instead?

The tool returned an error. Claude saw the error inside the result, decided not to retry (correctly — there's nothing to retry without a different filename), and wrote a useful answer.

Task: Divide 100 by 0, then try dividing 100 by 4
--------------------------------------------------
  Step 1: divide({'a': 100, 'b': 0})
  Result: {'error': 'Cannot divide by zero'}
  Step 2: divide({'a': 100, 'b': 4})
  Result: {'result': 25.0}

  Answer: 100 divided by 0 is undefined (cannot divide by zero), and 100 divided by 4 equals 25.

Two tool calls. The first errored, but Claude understood the error, didn't retry that call, and proceeded to the second one. The final answer mentions both — the failure and the success.

Task: Write a greeting to hello.txt, then read it back
--------------------------------------------------
  Step 1: write_file({'path': 'hello.txt', 'content': 'Hello!'})
  Result: {'path': 'hello.txt', 'status': 'written', 'size': 6}
  Step 2: read_file({'path': 'hello.txt'})
  Result: {'path': 'hello.txt', 'content': 'Hello!'}

  Answer: I've written a greeting to hello.txt and verified its contents: "Hello!"

Happy path still works. The error infrastructure doesn't degrade success cases.
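The write_file tool is not listed in this article, so here is a minimal sketch that is consistent with its description and with the output above. Several details are assumptions: a temporary directory stands in for WORK_DIR, size is taken to be the character count of the content, and the exact error wording is illustrative.

```python
import os
import tempfile

# Assumption: a temp directory stands in for the real workspace.
WORK_DIR = tempfile.mkdtemp()

def write_file(path, content):
    try:
        # Validate first, then act.
        if not isinstance(path, str) or not path.strip():
            return {"error": "Invalid path: path must not be empty"}
        if ".." in path:
            return {"error": "Invalid path: directory traversal not allowed"}
        if not isinstance(content, str):
            return {"error": "Content must be a string"}
        full_path = os.path.join(WORK_DIR, path)
        with open(full_path, "w") as f:
            f.write(content)
        # Assumption: size is the number of characters written.
        return {"path": path, "status": "written", "size": len(content)}
    except Exception as e:
        return {"error": f"Write failed: {e}"}

print(write_file("hello.txt", "Hello!"))
# {'path': 'hello.txt', 'status': 'written', 'size': 6}
```

A write into a directory that does not exist raises inside open(), and the outer except turns that into a "Write failed" error instead of a crash.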

Patterns for robust tools

A short checklist for any tool you write going forward:

  1. Always return a dict. Success keys on success, error key on failure.
  2. Wrap the function body in try/except Exception. Catch unexpected errors and return them. Never let the function raise.
  3. Validate inputs explicitly. Type checks, bounds checks, sandbox checks. Return clear error messages.
  4. Use specific error messages, not generic ones. "File not found: hello.txt" beats "Error."
  5. Don't leak sensitive data in errors. Stack traces, credentials, internal paths — sanitise.
  6. Tell Claude in the system prompt how to handle errors. "If a tool returns an error, ..."
  7. Cap the number of agent loop steps. Even with great error handling, a buggy tool that always errors on the same input could otherwise cause an infinite loop.
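Item 7 can be sketched without touching the API at all. In the sketch below, call_model and run_tool are hypothetical stand-ins for the real model call and the tool dispatcher, and the cap of 5 is an arbitrary value chosen for illustration.

```python
MAX_STEPS = 5  # assumption: a small cap chosen for illustration

def run_agent(call_model, run_tool, task, max_steps=MAX_STEPS):
    """Drive a tool loop, stopping after max_steps even if the model keeps calling tools.

    call_model and run_tool are hypothetical stand-ins, not part of any SDK.
    """
    history = [task]
    for step in range(1, max_steps + 1):
        action = call_model(history)
        if action["type"] == "answer":
            return {"answer": action["text"], "steps": step}
        # Tool errors are plain data, so they go into the history like any result.
        history.append(run_tool(action["name"], action["input"]))
    return {"error": f"Stopped after {max_steps} steps without a final answer"}

# A stub model that keeps requesting the same failing call.
def stubborn_model(history):
    return {"type": "tool", "name": "divide", "input": {"a": 1, "b": 0}}

def failing_tool(name, args):
    return {"error": "Cannot divide by zero"}

print(run_agent(stubborn_model, failing_tool, "Divide 1 by 0"))
# {'error': 'Stopped after 5 steps without a final answer'}
```

The loop itself follows the same rule as the tools: when it gives up, it returns a dict with an error key rather than raising.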

Parallel tool calls

You may notice this script handles multiple tool calls per assistant turn:

tool_blocks = [b for b in response.content if b.type == "tool_use"]
tool_results = []
for tool_block in tool_blocks:
    fn = tool_dispatch[tool_block.name]
    result = fn(**tool_block.input)
    ...
    tool_results.append({"type": "tool_result", "tool_use_id": tool_block.id, "content": json.dumps(result)})

messages.append({"role": "assistant", "content": response.content})
messages.append({"role": "user", "content": tool_results})

Claude can return multiple tool_use blocks in one response. When it does, you collect each result and send them all back as a list in a single user message. Each tool_result matches its corresponding call by tool_use_id. The agent loop handles both serial and parallel cases with the same code.
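One gap remains in the dispatch lines above: looking up an unknown tool name raises KeyError, and calling a function with missing or unexpected arguments raises TypeError before the tool's own try/except ever runs. A small wrapper can apply the errors-as-data rule at the dispatch layer too. safe_dispatch is a hypothetical helper, and divide here is a trimmed copy of the tool from earlier.

```python
def divide(a, b):
    if b == 0:
        return {"error": "Cannot divide by zero"}
    return {"result": a / b}

def safe_dispatch(tool_dispatch, name, tool_input):
    """Look up and call a tool, turning dispatch failures into error dicts."""
    fn = tool_dispatch.get(name)
    if fn is None:
        return {"error": f"Unknown tool: {name}"}
    if not isinstance(tool_input, dict):
        return {"error": f"Input for {name} must be an object"}
    try:
        return fn(**tool_input)
    except TypeError as e:
        # Typically raised when arguments are missing or unexpected.
        return {"error": f"Bad arguments for {name}: {e}"}
    except Exception as e:
        return {"error": f"{name} failed: {e}"}

tools = {"divide": divide}
print(safe_dispatch(tools, "multiply", {"a": 2, "b": 3}))  # {'error': 'Unknown tool: multiply'}
print(safe_dispatch(tools, "divide", {"a": 100, "b": 4}))  # {'result': 25.0}
```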

This is a small but important upgrade from earlier episodes. In production, parallel tool use saves real wall-clock time when tools are independent (read three files at once instead of sequentially).
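Note that the for loop shown above still runs the tool functions one after another; what the model parallelises is the request, not the execution. If the tools really are independent, one way to run them concurrently is a thread pool. This is a sketch, not the episode's code: run_parallel and slow_tool are hypothetical names, and the sleep stands in for I/O such as reading a file.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_tool(name):
    time.sleep(0.2)  # stand-in for file or network I/O
    return {"name": name, "status": "done"}

def run_parallel(calls):
    """Run (function, kwargs) pairs concurrently; results come back in call order."""
    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(fn, **kwargs) for fn, kwargs in calls]
        results = []
        for future in futures:
            try:
                results.append(future.result())
            except Exception as e:
                # A tool that raises still yields an error dict, not a crash.
                results.append({"error": f"Tool failed: {e}"})
        return results

calls = [(slow_tool, {"name": n}) for n in ("a.txt", "b.txt", "c.txt")]
print(run_parallel(calls))
```

Because results are collected in submission order, each one can still be paired with the tool_use_id of the call that produced it.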

Common mistakes

Letting tools raise exceptions. Crashes the agent. Wrap and return errors as data.

Returning vague errors. "Something went wrong." Useless to the model. Be specific.

Skipping validation. A tool that does dangerous things on bad input (delete, format, send) without checking is a footgun. Validate before acting.

Forgetting to cap loop iterations. Even with errors-as-data, a poorly designed tool could trap the model. Cap the loop.

Not telling Claude in the system prompt how to handle errors. Without instruction, the model may give up too early or invent results. Tell it explicitly.

What's next

Phase 2 is now complete. You have a real agent — multiple tools, the agentic loop, parallel calls, robust error handling. From here we move to Phase 3: knowledge augmentation (RAG). The model is great at reasoning over text it sees in its prompt; it's weaker on facts that aren't in its training data. RAG fixes that by retrieving relevant text from your own documents and showing it to the model alongside the question.

Next episode: why RAG? A motivating problem and the architecture that solves it. Episodes 13–17 then build the full pipeline: chunking, embeddings, vector search, retrieval, multi-document RAG.

Recap

What we did today. Hardened three tools — read, write, divide — with try/except, input validation, and dict-shaped errors instead of exceptions. Updated the system prompt to instruct Claude on how to react to errors. Generalised the loop to handle parallel tool calls. Watched Claude take a graceful path through a failing call (divide by 0) and a missing-file scenario.

You haven't built a production agent. You've removed the largest cliff between toy agents and production ones: tools that hide their failures or crash the loop.

Next episode: why RAG. See you in the next one.
