Build AI Apps with Python: Test Your AI Agent — Keyword Matching Eval Framework | Episode 22
Taught by Celeste AI - AI Coding Coach
Description
How do you know your agent works? Test it. Define expected answers, run the agent, check the results. A score report tells you exactly what
passed and what broke.
We build an evaluation framework from scratch. Five test cases with expected keywords — capital of France expects "Paris", Django language
expects "Python", Linux creator expects "Linus" and "Torvalds". The evaluate function runs the agent against each case, checks keyword
matches case-insensitively, and produces a pass/fail report with a percentage score. Run it after every change to catch regressions before
deploying.
Student code: https://github.com/GoCelesteAI/build-ai-apps-python/tree/main/episode22
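The repository above holds the actual episode code. As a rough illustration of the flow described here, the following is a minimal sketch, not the code from the video. The question wording, the names TEST_CASES, check_keywords, evaluate, and fake_agent, and the rule that a case passes only when every expected keyword is present are all assumptions. Only three of the five cases are shown because the description names only those, and fake_agent is a hypothetical stand-in for the real agent call.

```python
from typing import Callable

# Assumed test-case shape: a question plus the keywords the answer must contain.
# Only the three cases named in the description are included.
TEST_CASES = [
    {"question": "What is the capital of France?", "expected": ["Paris"]},
    {"question": "What language is Django written in?", "expected": ["Python"]},
    {"question": "Who created Linux?", "expected": ["Linus", "Torvalds"]},
]


def check_keywords(answer: str, expected: list[str]) -> tuple[list[str], list[str]]:
    """Case-insensitive substring check; returns (found, missing)."""
    lowered = answer.lower()
    found = [kw for kw in expected if kw.lower() in lowered]
    missing = [kw for kw in expected if kw.lower() not in lowered]
    return found, missing


def evaluate(agent: Callable[[str], str], cases: list[dict]) -> dict:
    """Run the agent on every case and summarise pass/fail with a percentage."""
    results = []
    for case in cases:
        answer = agent(case["question"])
        found, missing = check_keywords(answer, case["expected"])
        results.append({
            "question": case["question"],
            "found": found,
            "missing": missing,
            "passed": not missing,  # assumption: all keywords are required
        })
    passed = sum(1 for r in results if r["passed"])
    total = len(results)
    score = 100.0 * passed / total if total else 0.0
    return {"results": results, "passed": passed, "total": total, "score": score}


def fake_agent(question: str) -> str:
    """Hypothetical stand-in for the real agent; the last answer omits 'Torvalds'."""
    canned = {
        "What is the capital of France?": "The capital of France is paris.",
        "What language is Django written in?": "Django is a web framework written in Python.",
        "Who created Linux?": "Linux was started by Linus in 1991.",
    }
    return canned.get(question, "")


def print_report(report: dict) -> None:
    """Print one line per case, then the overall score."""
    for r in report["results"]:
        status = "PASS" if r["passed"] else "FAIL"
        print(f"[{status}] {r['question']} found={r['found']} missing={r['missing']}")
    print(f"Score: {report['passed']}/{report['total']} ({report['score']:.1f}%)")


if __name__ == "__main__":
    print_report(evaluate(fake_agent, TEST_CASES))
```

With the stand-in agent, two of the three cases pass (about 66.7%), and the Linux case fails because "Torvalds" is missing. To test a real agent, pass any function that takes a question string and returns the answer text in place of fake_agent.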
Every keystroke is shown on screen with 3-second pauses so you can follow along at your own pace.
What You'll Learn:
• Test cases with expected keywords
• Running agents against test suites
• Case-insensitive keyword matching
• Tracking found and missing keywords
• Pass/fail determination per test
• Score calculation with percentage
• Regression testing for agents
Key Takeaways:
1. Test cases define expected keywords — simple to write, easy to extend
2. Keyword matching is simple and repeatable: a case-insensitive search records which keywords were found and which are missing
3. The score catches regressions — run after every change, fix drops before deploying
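One way to act on the third takeaway is to compare each run's score with the last known good score. The sketch below is an assumption, not something the episode describes: the function name check_regression, the JSON baseline file, and the tolerance parameter are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def check_regression(score: float, baseline_path, tolerance: float = 0.0) -> bool:
    """Return False when score drops below the stored baseline.

    The first run creates the baseline file; later runs raise the
    baseline only when the score improves. (Hypothetical helper.)
    """
    path = Path(baseline_path)
    if not path.exists():
        path.write_text(json.dumps({"score": score}))
        return True
    baseline = json.loads(path.read_text())["score"]
    if score + tolerance < baseline:
        return False  # regression: fix before deploying
    if score > baseline:
        path.write_text(json.dumps({"score": score}))
    return True


if __name__ == "__main__":
    # Demo in a temporary directory so no real file is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        baseline_file = Path(tmp) / "baseline.json"
        print(check_regression(80.0, baseline_file))  # first run sets the baseline
        print(check_regression(60.0, baseline_file))  # lower score is flagged
```

The score argument would be the percentage from the evaluation report. In a deployment script, a False result could stop the release until the failing cases are fixed.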
This is Episode 22 of Build AI Apps with Python in Neovim — Phase 4 (AI Agents).
Taught by CelesteAI. Like & subscribe for more tutorials!
#python #ai #evaluation #testing #agenteval #claudeapi #anthropic #neovim #programming #tutorial #machinelearning #artificialintelligence
#coding #pythontutorial #buildaiapps #llm #genai #regressiontesting
Tags
python agent evaluation, ai testing python, evaluate ai agents, keyword matching test, claude api testing, anthropic sdk, agent scoring python, ai tutorial 2026, build ai apps python, neovim tutorial, generative ai python, screenkey, code along, regression testing, pass fail report