Part of Python AI Tutorial Series

Build AI Apps with Python: Full-Featured CLI Agent — Tools Guardrails History | Episode 23

Celest Kim

April 18, 2026

Video: Build AI Apps with Python: Full-Featured CLI Agent — Tools Guardrails History | Episode 23, taught by Celeste AI - AI Coding Coach

Student code: github.com/GoCelesteAI/build-ai-apps-python/tree/main/episode23

Everything you have built. One CLI. The capstone of Phase 4.

This is the put-it-together episode. We take the parts you have assembled across the series — tool use from Phase 2, guardrails from Episode 21, conversation memory from Episode 3, a clean system prompt from Episode 2 — and weave them into a single interactive command-line agent. You type a question, it reasons about which tool (if any) to use, runs the tool, gives you an answer, and remembers what you discussed for the next turn.

Nothing today is conceptually new. The interest is in seeing the whole system as one thing. Most production AI agents are exactly this shape: a CLI or web UI on top of a tool-using, guarded, memory-bearing loop.

What we are building

A REPL-style assistant. The user types a line; the agent considers their question, may call one of three tools, and replies. Conversation history persists across turns. Bad input is blocked at the door.

Three tools: calculate, define_word, get_weather. The input guardrail blocks four denylisted terms — hack, exploit, weapon, illegal. Output guardrails are out of scope today; production code would add them.
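As a rough sketch of the input guardrail described above: the four terms come from this lesson, but the function body, the whole-word matching strategy, and the reason strings are assumptions, and Episode 21's version may differ.

```python
import re

# Denylist from the lesson. The matching strategy below is an assumption.
DENYLIST = ("hack", "exploit", "weapon", "illegal")


def check_input(user_input: str) -> tuple[bool, str]:
    """Return (allowed, reason). Blocks empty input and denylisted words."""
    text = user_input.strip()
    if not text:
        return False, "Empty input."

    # Compare whole lowercase words so "hackathon" does not trip on "hack".
    words = set(re.findall(r"[a-z]+", text.lower()))
    for term in DENYLIST:
        if term in words:
            return False, f"Input contains a blocked term: {term!r}"
    return True, ""
```

Whole-word matching keeps false positives down but lets inflected forms such as "hacking" through. Substring matching flips that trade-off. Either way, a four-word denylist is a teaching device, not a safety system.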

The structure: two nested loops

The shape is two nested loops.

The outer loop is the chatbot REPL from Episode 3: read user input, validate, append to history, hand off to the agent, repeat.

The inner loop is the ReAct loop from Episode 10: call Claude with the conversation; if Claude requests a tool, execute it and feed the result back; if Claude returns plain text, print it and break out.

A single messages list serves both layers. It carries the user-and-assistant turn-taking from the chatbot, plus the tool-use round-trips from the agent. Same data structure, two purposes.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def main():
    messages = []  # shared by the chat REPL and the tool-use loop

    while True:
        user_input = input("\nYou: ").strip()
        if user_input.lower() == "quit":
            break

        # The guardrail runs before the input touches history or the API.
        allowed, reason = check_input(user_input)
        if not allowed:
            print(f"  [BLOCKED] {reason}")
            continue

        messages.append({"role": "user", "content": user_input})

        while True:
            response = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=300,
                system="You are a helpful assistant with tools. Use calculate for math, define_word for definitions, get_weather for weather. Be concise.",
                tools=tools,
                messages=messages,
            )

            # Record the assistant turn exactly once, whatever it contains.
            messages.append({"role": "assistant", "content": response.content})

            if response.stop_reason != "tool_use":
                # Final answer (or any other non-tool stop): print and return to the prompt.
                for block in response.content:
                    if block.type == "text":
                        print(f"\nAssistant: {block.text}")
                break

            # Run every requested tool and send all results back in one user message.
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = run_tool(block.name, block.input)
                    print(f"  [Tool: {block.name}] {result}")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
            messages.append({"role": "user", "content": tool_results})


if __name__ == "__main__":
    main()
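
The loop relies on a tools list and a run_tool() dispatcher that are not shown here. The following is one possible sketch, not the repository's version: the schema fields (name, description, input_schema) follow the Anthropic tool-definition format, while the descriptions, the stub data in DEFINITIONS and WEATHER, and the ast-based calculator are assumptions made for illustration.

```python
import ast
import operator

# Each entry has a name, a description, and a JSON Schema under input_schema.
tools = [
    {
        "name": "calculate",
        "description": "Evaluate an arithmetic expression such as '256 * 4'.",
        "input_schema": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
    {
        "name": "define_word",
        "description": "Look up the definition of a word or acronym.",
        "input_schema": {
            "type": "object",
            "properties": {"word": {"type": "string"}},
            "required": ["word"],
        },
    },
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
]

# Hypothetical stub data standing in for a dictionary and a weather service.
DEFINITIONS = {"api": "Application Programming Interface."}
WEATHER = {"tokyo": "80F, humid"}

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def _eval(node):
    """Evaluate a parsed arithmetic expression without calling eval()."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")


def run_tool(name: str, tool_input: dict) -> str:
    """Dispatch a tool call by name and always return a string result."""
    try:
        if name == "calculate":
            tree = ast.parse(tool_input["expression"], mode="eval")
            return str(_eval(tree.body))
        if name == "define_word":
            word = tool_input["word"].lower()
            return DEFINITIONS.get(word, f"No definition found for {word!r}.")
        if name == "get_weather":
            city = tool_input["city"].lower()
            return WEATHER.get(city, f"No weather data for {city!r}.")
        return f"Unknown tool: {name}"
    except (KeyError, ValueError, SyntaxError, ZeroDivisionError) as exc:
        # Return the error as text so the model can see it and recover.
        return f"Tool error: {exc}"
```

Returning errors as strings instead of raising keeps the agent loop alive: the model receives the failure as a tool result and can rephrase or apologize. Walking the syntax tree avoids handing model-generated text to eval().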

Why memory plus tools is the real product

A chatbot without tools is fluent but useless for anything specific — it can talk about the weather, not tell you the weather. A tool-using agent without memory is task-oriented but amnesiac — every question starts from scratch. The combination is the form most useful AI products take.

A user can ask "what is 256 times 4" and get 1024. Then say "divide that by 8" and the agent understands "that" because the previous turn is in the conversation. Reference resolution falls out of conversation history; the tool layer handles the math; the user types short sentences instead of the full expression each time.
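
To make the reference resolution concrete, here is roughly what the shared history could hold when the follow-up arrives. This is a plain-dict illustration: the tool-use ID and the assistant wording are made up, and in the running program the assistant entries hold SDK block objects, not dicts. The summarize() helper is hypothetical and exists only to show the shape.

```python
# Illustrative history after "What is 256 times 4?" and the follow-up.
history = [
    {"role": "user", "content": "What is 256 times 4?"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_example_1",
         "name": "calculate", "input": {"expression": "256 * 4"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_example_1",
         "content": "1024"},
    ]},
    {"role": "assistant", "content": [
        {"type": "text", "text": "256 times 4 is 1024."},
    ]},
    {"role": "user", "content": "Divide that by 8"},
]


def summarize(messages):
    """Return one line per message: the role plus the block types it carries."""
    lines = []
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            kinds = ["text"]
        else:
            kinds = [block["type"] for block in content]
        lines.append(f"{message['role']}: {', '.join(kinds)}")
    return lines


for line in summarize(history):
    print(line)
```

When the last message is sent, the model sees the earlier tool result and answer in the same list, which is how "that" resolves to 1024.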

This pattern scales. Replace calculate / define / weather with database queries, calendar tools, code-search tools — and you have the AI assistant of an internal product.

Running it

Start with :sp | terminal python % from Neovim, or python cli_agent.py in any terminal. A short session might look like:

You: What is 256 times 4? → tool call to calculate({"expression": "256 * 4"}) → result 1024 → assistant says "256 times 4 is 1024."
You: What does API stand for? → tool call to define_word({"word": "api"}) → assistant explains.
You: What is the weather in Tokyo? → tool call to get_weather({"city": "tokyo"}) → assistant reports 80F humid.
You: How do I hack a system? → input guardrail blocks before any API call.
You: quit → exits.

Each turn shows the tool name and its raw output before the assistant's natural-language reply. Useful for tutorial transparency; in a polished product you would suppress the tool noise and only show the assistant's reply.

Production gaps

Today's script is the architecture, not the product. A real shipped version would also include:

Output guardrails. PII redaction, profanity filter, brand-safety checks before display.
Persistent history. Save messages to disk so conversations survive restarts.
Per-user state. Multi-user products isolate history, memory, and rate limits per user ID.
Streaming output. From Episode 4. The current code waits for full responses; users prefer to watch text arrive.
Cost and rate limits. Cap tokens per user per day. Cap requests per minute.
Eval harness. Episode 22's pattern. Run after every meaningful change.
Audit logging. Every input, every output, every tool call. Required for compliance and incident response.
Better dispatcher. A real product would auto-register tools rather than hand-maintain run_tool() and the tools list separately.
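
For the last item, one way auto-registration might look is a decorator that records each function and its schema in a single place. TOOL_REGISTRY, the tool decorator, tools_list(), and the stubbed get_weather body are all hypothetical names for this sketch, not part of the episode's code.

```python
# Hypothetical registry: tool name -> {"func": callable, "schema": dict}.
TOOL_REGISTRY = {}


def tool(description, input_schema):
    """Register the decorated function as a tool under its own name."""
    def decorator(func):
        TOOL_REGISTRY[func.__name__] = {
            "func": func,
            "schema": {
                "name": func.__name__,
                "description": description,
                "input_schema": input_schema,
            },
        }
        return func
    return decorator


@tool(
    "Get the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)
def get_weather(city: str) -> str:
    # Stub body for illustration only.
    return f"No live data for {city}; this is a stub."


def tools_list():
    """Build the tools argument for the API call from the registry."""
    return [entry["schema"] for entry in TOOL_REGISTRY.values()]


def run_tool(name, tool_input):
    """Look the tool up by name and call it with the model's arguments."""
    entry = TOOL_REGISTRY.get(name)
    if entry is None:
        return f"Unknown tool: {name}"
    return str(entry["func"](**tool_input))
```

With this shape, adding a tool means writing one decorated function. The schema list and the dispatcher cannot drift apart because both read from the same registry.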

Each of these layers is one or two episodes' worth of work. The architecture today gives you the bones; production is the meat on top.
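
Of the gaps listed above, persistent history is probably the quickest to sketch. The version below writes the messages list to a JSON file and reads it back at startup. The function names and the file path are hypothetical, and the model_dump() fallback assumes the SDK's content blocks are pydantic models, which is worth verifying against the SDK version you have installed.

```python
import json
from pathlib import Path


def _to_plain(obj):
    # Assumption: SDK content blocks expose model_dump() like pydantic models.
    if hasattr(obj, "model_dump"):
        return obj.model_dump()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def save_history(messages, path):
    """Write the conversation to disk as JSON."""
    text = json.dumps(messages, default=_to_plain, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def load_history(path):
    """Return the saved conversation, or an empty list on first run."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))
```

In main() this would mean starting with messages = load_history("history.json") and calling save_history(messages, "history.json") after each completed turn.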

Common mistakes

Forgetting the inner break. When the assistant returns text, the inner loop has to break, otherwise you call the API again on the same history and waste tokens.

Appending the assistant turn only on tool calls. Append it on end_turn too, otherwise the next user turn lacks the assistant's reply in context and Claude reverts to amnesia.

Confusing the two messages.append lines after a tool call. The first appends the assistant's tool-use response (response.content), the second appends the tool result wrapped in a user message. Both are required.

Running the input guardrail at the wrong layer. Check user_input before appending it, not messages[-1]["content"] afterward. By then the blocked text is already in history and goes out with the next API call.

Echoing tool results twice. Easy to print the result from the tool function and let the assistant repeat it. Either suppress one or accept the duplication.

What's next

Next episode: prompt engineering patterns. The series finale. Four patterns — zero-shot vs few-shot, chain-of-thought, role prompting, output format control — that materially improve the answers your agent produces. The code does not change; the prompts do.

Recap

What we did today. Combined the chatbot REPL from Episode 3, the agent loop from Episode 10, the input guardrail from Episode 21, and three tools into a single interactive CLI agent. Used one shared messages list to carry both conversational memory and tool-use round-trips. Demonstrated the four primary turn types: a calculation, a definition, a weather lookup, and a blocked input. Identified the production gaps that separate today's script from a shippable product.

You have shipped the spine of an AI assistant. Everything beyond is layering — UI, persistence, observability, eval — on the same skeleton.

Next episode: prompt engineering patterns. See you in the next one.
