Build AI Apps with Python: Vision — AI That Can See Images | Episode 6
Video: Build AI Apps with Python: Vision — AI That Can See Images | Episode 6 by Taught by Celeste AI - AI Coding Coach
Watch full page →Build AI Apps with Python: Vision — AI That Can See Images
In this episode, you'll learn how to send images to an AI model like Claude by converting image files to base64-encoded strings and combining them with text prompts. This approach enables the AI to analyze and describe images, unlocking powerful multi-modal capabilities in your Python applications.
Code
import base64
def encode_image_to_base64(image_path):
"""Read an image file and return its base64-encoded string."""
with open(image_path, "rb") as image_file:
encoded_bytes = base64.b64encode(image_file.read())
return encoded_bytes.decode("utf-8")
def describe_image(image_path, client):
"""
Send an image and a prompt to the AI client to get a detailed description.
The message includes an image block and a text prompt block.
"""
image_data = encode_image_to_base64(image_path)
message = [
{
"type": "image",
"image": {
"type": "base64",
"media_type": "image/png", # adjust if your image is jpg or other format
"data": image_data
}
},
{
"type": "text",
"text": "Describe this image in detail."
}
]
response = client.chat.completions.create(
model="claude-2",
messages=message
)
return response.choices[0].message.content
# Example usage:
# from your_ai_sdk import AIClient
# client = AIClient(api_key="your_api_key")
# description = describe_image("cat.png", client)
# print(description)
Key Points
- Convert images to base64 strings to embed them in text-based API messages.
- Combine image data and text prompts in a single multi-modal message for AI processing.
- The image object requires specifying the type ("base64"), media_type (e.g., "image/png"), and encoded data.
- Encapsulating image description logic in a reusable function simplifies your app code.
- This technique enables AI models to "see" and interpret images alongside text inputs.