Build AI Apps with Python: Multi-Turn Conversations — Chatbot with Memory | Episode 3
Video: Build AI Apps with Python: Multi-Turn Conversations — Chatbot with Memory | Episode 3 by Taught by Celeste AI - AI Coding Coach
Student code: github.com/GoCelesteAI/build-ai-apps-python/tree/main/episode03
One Python list. That's the whole memory system.
If you've used ChatGPT or Claude.ai in a browser, you've had multi-turn conversations. You ask a question. The model answers. You ask a follow-up — "explain that more" — and it knows what that refers to. Memory feels built in.
It isn't. The Claude API has no memory. Every call is stateless: same model, same parameters, no recollection of yesterday or the previous request or the previous line in the same script. If you call the API twice, the second call has no idea the first one happened.
So how do chatbots remember? They cheat. They keep the conversation in a Python list, and they send the whole list with every request.
That's it. That's the entire mechanism. By the end of this episode you'll have a working CLI chatbot, about two dozen lines of Python, that proves it.
What "memory" actually means here
A conversation is a sequence of turns: user, assistant, user, assistant. The Claude API expects this exact shape. The messages parameter is a list, not a string, because it represents a transcript:
messages=[
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level..."},
    {"role": "user", "content": "What are its main uses?"}
]
When Claude reads this list, it sees three turns of context. It "remembers" the first user message because that message is still in the list. It "remembers" what it said before because its own response is in the list too. The model isn't recalling anything — you handed it the recall.
The implication is enormous. Memory in an AI app is your responsibility, not the model's. You decide what goes into the list, what comes out, how long it lives, and when it gets pruned. Every chatbot, every assistant, every long-running agent in this series will reduce to one job: managing the messages list.
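Since the list is yours to manage, it can help to sanity-check its shape before sending it. The sketch below is a local check for this episode's conventions only (strict user/assistant alternation, ending on a user turn); it is not the API's own validation, and the name check_transcript is hypothetical.

```python
def check_transcript(messages):
    """Return a list of problems found in a messages list (empty if none).

    Checks this episode's conventions only: a list of {"role", "content"}
    dicts, alternating user/assistant, ending on the user turn to answer.
    """
    if not isinstance(messages, list):
        return [f"messages must be a list of dicts, not {type(messages).__name__}"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict) or set(msg) != {"role", "content"}:
            problems.append(f"item {i}: expected a dict with 'role' and 'content'")
            continue
        # Even positions should be user turns, odd positions assistant turns.
        expected = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected:
            problems.append(
                f"item {i}: expected role {expected!r}, got {msg['role']!r}"
            )

    last = messages[-1] if messages else None
    if isinstance(last, dict) and last.get("role") != "user":
        problems.append("last item should be the user turn being answered")
    return problems


transcript = [
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level..."},
    {"role": "user", "content": "What are its main uses?"},
]
print(check_transcript(transcript))
```

An empty result means the list has the transcript shape described above.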
What we're building
A command-line chatbot. You type a question, Claude answers, you can follow up. Type quit to exit. The whole thing is a while True loop with a list called history.
We'll demonstrate memory by asking three questions where the second and third only make sense if Claude remembers the first:
- "What is Python?"
- "What are its main uses?" — note: never says the word "Python"
- "Which one should a beginner start with?" — only makes sense if Claude knows what we're picking from
If the chatbot has memory, all three answers will be coherent. If it doesn't, turn 2 will be confused ("What are what's main uses?") and turn 3 will hallucinate context.
The script
import os
from dotenv import load_dotenv
from anthropic import Anthropic

load_dotenv()
client = Anthropic()
history = []
print("Chatbot ready! Type quit to exit.\n")

while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    history.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=150,
        system="You are a helpful assistant. Keep responses brief and clear.",
        messages=history,
    )
    assistant_message = response.content[0].text
    print(f"\nClaude: {assistant_message}\n")
    history.append({"role": "assistant", "content": assistant_message})
print("Goodbye!")
Twenty-four lines. Let's walk the new bits.
The history list
history = []
This is the entire memory system. An empty Python list, declared once before the loop starts.
It will live for the duration of the script. Every turn appends two items to it: the user's message and Claude's reply. After three turns it has six items. After a hundred turns it has two hundred. The list grows linearly with the conversation.
This list is the chatbot's mind. If your script crashes, the list is gone and the next run starts fresh. Persistence — saving the history to a file or database between sessions — is something you'd add later. For now, the lifetime of history is the lifetime of the process.
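Persistence is outside this episode's script, but because history is a plain list of dicts, one possible approach is to write it out as JSON. This is a minimal sketch, not the series' method; save_history, load_history, and the file name are hypothetical.

```python
import json
from pathlib import Path


def save_history(history, path):
    """Write the messages list to a JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps(history, indent=2), encoding="utf-8")


def load_history(path):
    """Read a saved messages list, or start fresh if the file is missing."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))
```

With helpers like these, the script could start with history = load_history("history.json") and call save_history(history, "history.json") after each turn.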
The input loop
while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
Standard Python interactive loop. input() blocks until the user types a line and hits Enter. We check for "quit" (lowercased so Quit, QUIT, quit all work) and break out; otherwise the rest of the loop body runs.
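The loop above is deliberately bare. A slightly more defensive variant, sketched here as an optional extra and not part of the episode's script, also stops on Ctrl-D or Ctrl-C and skips blank lines, since an empty message is unlikely to be useful. The name read_user_message and the input_fn parameter are hypothetical; input_fn exists only so the helper can be exercised without a keyboard.

```python
def read_user_message(prompt="You: ", input_fn=input):
    """Return the next non-empty user message, or None when the user is done."""
    while True:
        try:
            text = input_fn(prompt).strip()
        except (EOFError, KeyboardInterrupt):
            # Ctrl-D / Ctrl-C: treat as a request to stop.
            return None
        if text.lower() == "quit":
            return None
        if text:
            return text
        # Blank line: ask again instead of sending an empty message.
```

In the chatbot loop, a None return value would take the place of the "quit" check and trigger the break.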
This is also the reason we're running the script differently in this episode. From Neovim, :!python % won't work because that command captures stdout and doesn't give you an interactive stdin. Instead, use :sp | terminal python % — split the window, open a terminal in the new pane, and run Python there. Now input() has a real keyboard to read from.
Appending user input
history.append({"role": "user", "content": user_input})
Before we call the API, we add the user's new message to the history. The format matches exactly what messages.create() wants: a dict with role and content.
We append before the call, not after. Claude needs to see the question we're asking right now, not just the previous turns. The list passed to the API always ends with the most recent user turn — that's the turn the model is responding to.
Sending the whole list
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=150,
    system="You are a helpful assistant. Keep responses brief and clear.",
    messages=history,
)
Notice messages=history. We're not passing a single message. We're passing the full conversation so far.
This is the key call. Every time it runs, Claude gets everything: the system prompt, the user's first turn, the assistant's first reply, the user's second turn, the assistant's second reply, and so on. Then the model generates one new assistant turn that fits the whole story.
The model sees the entire history every time. That's why it's expensive at scale — long conversations make every subsequent call larger — and that's why prompt caching exists. We'll cover caching later in the series; for now, accept that "Claude remembers" really means "we keep telling Claude what happened."
Appending the assistant's reply
assistant_message = response.content[0].text
print(f"\nClaude: {assistant_message}\n")
history.append({"role": "assistant", "content": assistant_message})
We extract the text from the response (same .content[0].text pattern as before), print it for the user, and append it back to the history with role: "assistant".
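response.content is a list of content blocks, and .content[0].text reads the first one. For a plain chat like this there is normally a single text block, so the episode's pattern is fine. If you would rather not rely on that, a small helper can join every text block instead. This is a sketch; extract_text is a hypothetical name.

```python
def extract_text(response):
    """Join the text of every text block in a Messages API response."""
    return "".join(
        block.text
        for block in response.content
        # Skip any non-text blocks (for example, tool-use blocks).
        if getattr(block, "type", None) == "text"
    )
```

It would be used as assistant_message = extract_text(response) in place of response.content[0].text.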
This is the step beginners forget most often. They send the user message, get a reply, print it, and then on the next turn history holds only user messages. Claude looks at the list and sees [user, user, user]. The alternating transcript is broken: depending on the API version, the request is either rejected or the consecutive user turns are merged into one, and in both cases Claude has no record of its own replies. Always append both sides.
The pattern is: append user → call API → extract text → print → append assistant. Same five steps, every turn, forever.
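The same steps can be packaged as one function. In this sketch the API call is passed in as a callable named send (a hypothetical parameter) so the bookkeeping can be tried without an API key; fake_send is a stand-in, not a real client. The rollback on failure is an addition the episode's script does not have.

```python
def chat_turn(history, user_input, send):
    """Run one turn: append user, call `send`, append assistant, return reply.

    `send` takes the full messages list and returns the reply text.
    """
    history.append({"role": "user", "content": user_input})
    try:
        reply = send(history)
    except Exception:
        # Keep the transcript consistent: drop the unanswered user turn.
        history.pop()
        raise
    history.append({"role": "assistant", "content": reply})
    return reply


def fake_send(messages):
    # Stand-in for the real API call; reports how much context it was given.
    return f"(reply after seeing {len(messages)} messages)"


history = []
print(chat_turn(history, "What is Python?", fake_send))
print(chat_turn(history, "What are its main uses?", fake_send))
```

With the real client, send would wrap client.messages.create(...) with messages=history and return response.content[0].text.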
Running it
Run :sp | terminal python % from Neovim. The terminal split appears at the bottom. The chatbot prints its prompt:
Chatbot ready! Type quit to exit.
You:
Type the first question:
You: What is Python?
Claude: Python is a high-level, interpreted programming language known for its readable syntax and versatility...
Now the second question — phrased deliberately to require memory:
You: What are its main uses?
Claude: Python is widely used in web development, data science, machine learning, automation, and scientific computing...
It worked. Claude understood that "its" referred to Python. The trick? On the second call, messages contained three items: the first user question, the first assistant reply, and the new user question. Claude saw the previous exchange and knew the antecedent.
Third question:
You: Which one should a beginner start with?
Claude: For a beginner, I'd recommend starting with web development using a framework like Flask or Django, or scripting and automation. These give you quick, visible results...
"Which one" — out of what? Out of the list it just gave in turn two. The chatbot kept the thread.
Type quit and the loop exits cleanly.
What's actually expensive
A subtle thing worth noticing: turn three sent five list items to the API (the sixth, Claude's reply, is appended only after the call returns). Turn ten will send nineteen. Turn one hundred will send 199.
Each item costs tokens. Tokens cost money and time. A long-running chatbot conversation gets monotonically more expensive per turn, and eventually slower as the prompt grows. This is the core scaling problem in any chat app.
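The growth is easy to work out. Counting the items in each request (the reply is appended only after the call returns), turn n sends 2n - 1 items, and the total sent over n turns is the sum of the first n odd numbers, which is n squared. The function names below are hypothetical, and item counts are only a rough proxy for tokens, since messages vary in length.

```python
def messages_sent_on_turn(turn):
    """Items in the request for a given turn (1-based)."""
    return 2 * turn - 1


def total_messages_sent(turns):
    """Items sent across all requests in a conversation of `turns` turns."""
    return sum(messages_sent_on_turn(t) for t in range(1, turns + 1))


for turn in (3, 10, 100):
    print(turn, messages_sent_on_turn(turn), total_messages_sent(turn))
# prints:
# 3 5 9
# 10 19 100
# 100 199 10000
```

Per-turn cost grows linearly, so the cumulative cost of a conversation grows roughly with the square of its length.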
Solutions exist — and we'll get to them later in the series. The main families:
- Truncation: keep only the last N turns. Cheap, but Claude forgets early context.
- Summarisation: periodically have Claude summarise the early conversation into one short message and replace those turns. Preserves gist, loses detail.
- Prompt caching: tell Anthropic to cache the static prefix of your conversation so subsequent calls don't pay full price for the same tokens.
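Of the three, truncation is the simplest to sketch. The version below is an illustration under stated assumptions, not the series' implementation: truncate_history and max_messages are hypothetical names, and the only rule enforced is that the trimmed list still starts on a user turn, which the Messages API expects of the first message.

```python
def truncate_history(history, max_messages):
    """Return the most recent messages, starting on a user turn.

    The original list is left untouched, so the full transcript
    can still be kept locally.
    """
    if max_messages <= 0:
        raise ValueError("max_messages must be positive")
    recent = history[-max_messages:]
    # Drop a leading assistant reply whose question was cut off.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "A programming language."},
    {"role": "user", "content": "What are its main uses?"},
    {"role": "assistant", "content": "Web, data science, automation."},
    {"role": "user", "content": "Which one should a beginner start with?"},
]
trimmed = truncate_history(history, 2)
print([m["role"] for m in trimmed])
```

Here only the final user question survives, which shows the trade-off named above: the request is cheap, but "which one" no longer has anything to refer to.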
For a 5-minute personal chatbot, none of this matters. For a production app, it's the architecture of your context-management system.
Common mistakes
Forgetting to append the assistant turn. History becomes user-only. Depending on the API version, the request is rejected or the consecutive user turns are merged into one, and either way Claude behaves as if it never said anything. Always append both sides.
Mutating history while iterating elsewhere. If you parallelise multiple chats and they share a list by accident, you get cross-contaminated conversations. Each user session needs its own list.
Using a string instead of a list. Newcomers sometimes try to pass messages="some giant transcript". The API requires the list-of-dicts shape. Pass the structured list; don't flatten.
Sending no history at all. This is the Episode-1 pattern — every turn starts fresh, Claude has no memory. Useful for stateless API endpoints. Wrong for chatbots.
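For the shared-list mistake above, one way to keep conversations apart is to key each history by a session identifier. This is a minimal single-process sketch; SessionStore and its methods are hypothetical names, and it makes no attempt at thread safety or persistence.

```python
from collections import defaultdict


class SessionStore:
    """Keep one independent messages list per session id."""

    def __init__(self):
        self._histories = defaultdict(list)

    def history_for(self, session_id):
        # Each session id gets its own list, created on first use.
        return self._histories[session_id]

    def reset(self, session_id):
        # Forget a conversation entirely.
        self._histories.pop(session_id, None)


store = SessionStore()
store.history_for("alice").append({"role": "user", "content": "What is Python?"})
store.history_for("bob").append({"role": "user", "content": "What is Rust?"})
print(len(store.history_for("alice")), len(store.history_for("bob")))
```

Each request handler would then pass messages=store.history_for(session_id), so one user's turns never leak into another's transcript.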
What's next
You now have the two pieces every conversational AI app needs: system prompt for persona/instructions, messages list for memory. From here, we'll add the things that make AI apps feel fast and modern.
Next episode: streaming responses. Right now, every reply arrives as one big chunk after a multi-second pause. That's how most chat apps used to feel. The modern feel — words appearing as they're generated, like watching someone type — is what stream=True gives you. Same chatbot, dramatically different UX.
Recap
What we did today. Built a CLI chatbot in twenty-four lines of Python. Used a list called history to keep every turn — both user and assistant — and sent the whole list with each messages.create() call. Proved memory works by asking three questions where each depended on the previous answer. Identified the cost-and-latency problem that makes long conversations a real engineering concern.
You haven't built a chatbot product. You've built the loop every chatbot product is made of. The shape we wrote — append user, call API, extract text, print, append assistant — is the same shape Slack apps, customer-support widgets, and AI tutors use. The system prompt and the model change. The loop doesn't.
Next episode: streaming. See you in the next one.