Build AI Apps with Python: Streaming Responses — Real-Time AI Output | Episode 4

Celest Kim

Video: Build AI Apps with Python: Streaming Responses — Real-Time AI Output | Episode 4 by Taught by Celeste AI - AI Coding Coach

Streaming lets your chatbot display text as the model generates it. Instead of waiting for the full reply, users see words and phrases appear in real time, just as they do in popular AI chat interfaces.

Code

# some_ai_sdk and AIClient are placeholder names; substitute your provider's SDK
from some_ai_sdk import AIClient

client = AIClient(api_key="your_api_key")

messages = [
    {"role": "user", "content": "Tell me a joke about cats."}
]

full_response = ""

# Use a context manager to handle the streaming response
with client.chat.stream(messages=messages) as stream:
    # Iterate over text chunks as they arrive
    for chunk in stream.text_stream:
        # end="" avoids a newline per chunk; flush=True writes the text out immediately
        print(chunk, end="", flush=True)
        full_response += chunk

print()  # Newline after streaming finishes

# Add the AI's full response to the conversation history
messages.append({"role": "assistant", "content": full_response})
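To try the loop without an API key, the stream can be swapped for a local stand-in. The sketch below is not part of any real SDK: FakeStream, consume, and the canned chunks are hypothetical names that only mimic the two things the example above relies on, a context manager and a text_stream iterable.

```python
import time
from typing import Iterator, List


class FakeStream:
    """Hypothetical stand-in for an SDK stream object (not a real API)."""

    def __init__(self, chunks: List[str], delay: float = 0.0) -> None:
        self._chunks = chunks
        self._delay = delay
        self.closed = False

    def __enter__(self) -> "FakeStream":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # A real stream would release its network connection here.
        self.closed = True

    @property
    def text_stream(self) -> Iterator[str]:
        # Yield one chunk at a time, pausing to imitate generation latency.
        for chunk in self._chunks:
            time.sleep(self._delay)
            yield chunk


def consume(stream: FakeStream) -> str:
    """Print chunks as they arrive and return the assembled text."""
    full_response = ""
    with stream as s:
        for chunk in s.text_stream:
            print(chunk, end="", flush=True)
            full_response += chunk
    print()  # Newline after streaming finishes
    return full_response


demo = FakeStream(["Why did ", "the cat ", "sit on the laptop? ", "To watch the mouse."])
reply = consume(demo)
```

Passing a non-zero delay (for example, FakeStream(chunks, delay=0.2)) makes the chunk-by-chunk printing visible in a terminal.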

Key Points

  • Call the SDK's streaming method (client.chat.stream() in the example above) instead of a blocking call that returns only the finished reply, so you receive partial AI output incrementally.
  • The with statement manages the lifecycle of the streaming response cleanly and safely.
  • Iterate over stream.text_stream to get chunks of text as they are generated by the AI.
  • Print with end="" and flush=True to display output immediately without buffering delays.
  • Collect all streamed chunks into a string to maintain complete conversation history for context in future messages.
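The last point, keeping the full reply in the history for later turns, can be sketched as a small helper. Everything here is illustrative: stream_turn, open_stream, and canned_stream are hypothetical names, and canned_stream fakes the SDK so the example runs offline. With a real client you would pass a function that opens your provider's stream instead.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Callable, Dict, List

Message = Dict[str, str]


@contextmanager
def canned_stream(messages: List[Message]):
    """Hypothetical offline stand-in: echoes the last user message in pieces."""
    last = messages[-1]["content"]
    chunks = ["You ", "said: ", last]
    yield SimpleNamespace(text_stream=iter(chunks))


def stream_turn(open_stream: Callable, messages: List[Message], user_text: str) -> str:
    """Run one chat turn: send the user text, stream the reply, record both."""
    messages.append({"role": "user", "content": user_text})
    parts: List[str] = []
    with open_stream(messages) as stream:
        for chunk in stream.text_stream:
            print(chunk, end="", flush=True)  # Show each chunk immediately
            parts.append(chunk)
    print()  # Newline after streaming finishes
    reply = "".join(parts)
    # Store the assembled reply so the next turn has the full context
    messages.append({"role": "assistant", "content": reply})
    return reply


history: List[Message] = []
stream_turn(canned_stream, history, "Tell me a joke about cats.")
stream_turn(canned_stream, history, "Another one, please.")
```

After the two calls, history holds four messages alternating between the user and assistant roles, which is the shape a follow-up request needs.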