Build AI Apps with Python: Vision — AI That Can See Images | Episode 6

12views
00
12:04
T
Taught by Celeste AI - AI Coding Coach
View on YouTube
Description
Claude can see! In this episode, we send images to Claude and get back detailed descriptions. Read any image file, convert it to base64, and send it alongside a text prompt — Claude analyzes the image and tells you what it sees. This completes Phase 1 of the series. You now know all the core API capabilities: text, system prompts, conversations, streaming, structured output, and vision. Next up: giving AI the ability to call your functions. Student code: https://github.com/GoCelesteAI/build-ai-apps-python/tree/main/episode06 Every keystroke is shown on screen with 3-second pauses so you can follow along at your own pace. What You'll Learn: • base64 encoding — converting images to text strings for APIs • Multi-modal messages — image block + text block in one message • The image source structure: type, media_type, data • Building a reusable describe_image() function • Handling different image formats (PNG, JPEG) • Sending two different images with the same function • Running Python scripts with :!python % Timestamps: 0:00 - Introduction 0:12 - Claude Can See Images (Preview) 0:44 - Image 1: Sunset 0:49 - Image 2: City Skyline 0:54 - Creating vision.py 1:10 - Imports (new: base64 module) 1:50 - Setup 2:10 - describe_image() function 2:28 - Reading image files in binary mode 2:48 - base64.b64encode — the key conversion 3:05 - Media type detection (PNG vs JPEG) 3:35 - Save progress 3:55 - API call with image block 4:38 - Image source: base64, media_type, data 5:48 - Text block: "Describe this image in detail" 6:22 - Close brackets and return 7:00 - First call: sunset.png 7:28 - Second call: city.png 7:52 - Save and run 8:18 - Claude describes both images! 9:40 - Code review 10:00 - Recap: 3 Key Takeaways 10:32 - End Screen Key Takeaways: 1. base64 encoding converts images to text strings the API can handle 2. Message content becomes a list — image block plus text block together 3. Claude sees and understands images — same API, new capability This completes Phase 1 (API Fundamentals) of Build AI Apps with Python in Neovim. Next: Phase 2 — Tool Use & Function Calling (Episodes 7-11).

Tags

python ai visionclaude api image inputbase64 pythonanthropic sdk visionmultimodal aiimage description aiclaude api pythonai image analysiscomputer vision pythonai tutorial 2026build ai apps pythonneovim tutorialgenerative ai pythonscreenkeycode along
Back to tutorials

Duration

12:04

Published

March 28, 2026

Added to Codegiz

March 30, 2026

Open in YouTube