Back to Blog

Build AI Apps with Python: Vision — AI That Can See Images | Episode 6

Celest KimCelest Kim

Video: Build AI Apps with Python: Vision — AI That Can See Images | Episode 6 by Taught by Celeste AI - AI Coding Coach

Watch full page →

Build AI Apps with Python: Vision — AI That Can See Images

In this episode, you'll learn how to send images to an AI model like Claude by converting image files to base64-encoded strings and combining them with text prompts. This approach enables the AI to analyze and describe images, unlocking powerful multi-modal capabilities in your Python applications.

Code

import base64

def encode_image_to_base64(image_path):
  """Read an image file and return its base64-encoded string."""
  with open(image_path, "rb") as image_file:
    encoded_bytes = base64.b64encode(image_file.read())
    return encoded_bytes.decode("utf-8")

def describe_image(image_path, client):
  """
  Send an image and a prompt to the AI client to get a detailed description.
  The message includes an image block and a text prompt block.
  """
  image_data = encode_image_to_base64(image_path)
  message = [
    {
      "type": "image",
      "image": {
        "type": "base64",
        "media_type": "image/png",  # adjust if your image is jpg or other format
        "data": image_data
      }
    },
    {
      "type": "text",
      "text": "Describe this image in detail."
    }
  ]
  response = client.chat.completions.create(
    model="claude-2",
    messages=message
  )
  return response.choices[0].message.content

# Example usage:
# from your_ai_sdk import AIClient
# client = AIClient(api_key="your_api_key")
# description = describe_image("cat.png", client)
# print(description)

Key Points

  • Convert images to base64 strings to embed them in text-based API messages.
  • Combine image data and text prompts in a single multi-modal message for AI processing.
  • The image object requires specifying the type ("base64"), media_type (e.g., "image/png"), and encoded data.
  • Encapsulating image description logic in a reusable function simplifies your app code.
  • This technique enables AI models to "see" and interpret images alongside text inputs.