Part of Python AI Tutorial Series

Build AI Apps with Python: System Prompts and Personas | Episode 2

Celest KimCelest Kim

Video: Build AI Apps with Python: System Prompts and Personas | Episode 2 by Taught by Celeste AI - AI Coding Coach

Take the quiz on the full lesson page
Test what you've read · interactive walkthrough

Student code: github.com/GoCelesteAI/build-ai-apps-python/tree/main/episode02 Three personas. Three responses. One parameter does the work.

In Episode 1 we made our first API call: ask Claude a question, print the answer. The script worked, but it was a one-shot. There's no personality, no role, no constraint on how the answer should sound. Claude defaulted to its general "helpful assistant" tone — which is fine for a demo, but not how anyone actually ships an AI feature.

The real product use case is different. "Be terse. Speak like a senior engineer. Answer only in JSON. Refuse to discuss anything off-topic. Always end with a follow-up question." These are not separate APIs. They are all the same API, with one parameter changed.

That parameter is system. This episode is the whole story of what it does, why it matters, and the pattern you'll use in every Claude app from here on.

What a system prompt is

The Claude API takes a list of messages — turns in a conversation between "user" and "assistant". The system parameter sits outside that list. It's a separate string that tells Claude who it is and how to behave before the conversation even starts.

Think of it as the briefing the model gets in its earpiece. The user can't see it. The user can't override it (well, mostly — we'll get to that in later episodes on prompt injection). It runs once, before turn one, and shapes every word the model produces afterwards.

This is the single most powerful lever in the Claude API. Tone, format, persona, constraints, refusal behaviour, output schema — all of it lives in the system prompt. Most "I'm building an AI feature" reduces, after the marketing, to I wrote a clever system prompt and pointed it at user input.

What we're building

Same question, three different personalities. We'll ask Claude "What is the best way to learn programming?" three times. Once with no instructions other than "be a pirate." Once as a chef. Once as a patient tutor. Same model, same messages, same code structure — three radically different answers.

By the end of the episode, you'll have a script that proves to your fingers what the docs say in words: changing one line changes everything.

The script

Setup is identical to Episode 1: virtualenv, anthropic and python-dotenv installed, .env with your ANTHROPIC_API_KEY. From there:

import os
from dotenv import load_dotenv
from anthropic import Anthropic

load_dotenv()

client = Anthropic()

question = "What is the best way to learn programming?"

# Pirate persona
print("=== Pirate ===")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=100,
    system="You are a pirate. Speak like a pirate in every response.",
    messages=[
        {"role": "user", "content": question}
    ],
)
print(response.content[0].text)

# Chef persona
print("\n=== Chef ===")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=100,
    system="You are a chef. Relate everything to cooking and food.",
    messages=[
        {"role": "user", "content": question}
    ],
)
print(response.content[0].text)

# Tutor persona
print("\n=== Tutor ===")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=100,
    system="You are a patient tutor. Give clear, step-by-step advice.",
    messages=[
        {"role": "user", "content": question}
    ],
)
print(response.content[0].text)

Three near-identical blocks. The only thing that changes between them is the string passed to system.

Reading the pattern

Notice what is the same in all three calls:

  • The model. Same claude-sonnet-4-20250514.
  • The token budget. Same max_tokens=100.
  • The user message. Same question variable, the same string of text.
  • The shape of the call. client.messages.create(...) returning a response whose content[0].text we print.

The only difference, the only difference, is the value of system. And yet you'll get a pirate, a chef, and a tutor — three different answers in three different tones, with different vocabularies, different sentence rhythms, different priorities about what to mention first.

That asymmetry — small input change, large output change — is what makes prompt engineering a skill rather than a job. The messages you send don't change much from call to call. The system is where the product lives.

Why we kept the question in a variable

question = "What is the best way to learn programming?"

This isn't aesthetic. It's the single most important habit you can pick up early.

In production, question won't be a string literal. It'll be request.body.user_message, or a chunk read from a CSV, or the latest line from a CLI input loop. Pulling the user input into a variable separates the thing the user controls from the thing you control. Your system prompt is yours. Your model selection is yours. The user's content is theirs. Once they're separate, you can wrap, validate, log, or filter the user portion independently.

It also makes the comparison fair. The same exact bytes go into all three calls. Whatever differences you see in the output came from the system prompt and nothing else. That's the experiment we're running.

max_tokens, deliberately small

Episode 1 used max_tokens=1024. Here we used max_tokens=100. Why?

Two reasons. The first is comparison. Reading three answers stacked in a terminal is easier when each is one short paragraph. A pirate going on for 800 tokens followed by a chef going for 800 tokens is hard to compare visually; three 60-word blurbs lined up are easy.

The second is intent. max_tokens is a hard cap, not a target. Claude won't pad the answer to reach 100. It'll write what it would naturally write and stop, unless it would have gone past 100 — in which case it gets cut off mid-sentence. So setting a low cap is also a forcing function: "Be brief, even if you have more to say." Personas tend to ramble in character; clamping the budget keeps the comparison honest.

In production, choose max_tokens based on the longest answer you're prepared to display and pay for. There's no penalty for setting it generously — Claude won't fill the budget for its own sake. It just stops you from getting a 4,000-token surprise.

Running it

Save as system_prompts.py. From Neovim, :!python %. The terminal pauses for a few seconds while three sequential API calls go out, and then prints something like:

=== Pirate ===
Arrr, matey! The finest way to learn the ways o' programmin' be to set sail with a single language — Python be a fair first ship — and write a wee program every day...

=== Chef ===
Learning programming is like mastering a kitchen. Start with one cuisine — a single language — and cook from it daily. Read recipes (tutorials) for technique, but don't just read...

=== Tutor ===
1. Pick one language and stick with it for the first three months — Python is a great starting point.
2. Set a daily practice goal of 30 minutes — consistency matters more than session length.
3. Build small, complete projects rather than copying tutorials line by line...

Three answers, structurally identical (they're all about how to learn), tonally different (pirate vocabulary vs. cooking metaphors vs. numbered list with concrete advice). Same model. Same question. One parameter.

That's the whole episode in three blocks of output.

What's actually happening underneath

It's worth being clear about the mechanics, because beginners sometimes assume system is doing something magical.

It isn't. The system prompt is just more text prepended to the model's context window before it starts generating. Claude is autoregressively predicting the next token based on everything in front of it: the system prompt, the user message, and whatever it's already generated. A system prompt that says "You are a pirate" loads the early context with the pattern of pirate speech, which makes the next-token distribution favour pirate-like words. There's no separate "persona" component in the model. There's just text in, text out.

This matters because it tells you what system prompts can and can't do.

Can: set tone, format, role, scope, refusal style, output schema, length preferences, language, audience level, list-vs-prose, tool-usage policy.

Can't: make Claude know facts it doesn't know, disable safety behaviours, prevent the model from refusing things, hide instructions from a sufficiently-determined user, guarantee zero hallucination.

Treat the system prompt as a strong default, not an enforcement layer.

Common mistakes with system prompts

A few things you'll want to avoid as you start writing your own.

Putting instructions in the user message instead. "Hi, you are a pirate. Now answer this..." technically works, but it muddles user intent with developer intent. Claude is more likely to ignore or contradict role instructions when they're inside a user turn, especially in long conversations. Keep developer voice in system, user voice in messages.

Stacking three different personas in one prompt. "You are a pirate, a chef, and a tutor. Answer in all three voices." You'll get a mush. If you want three answers, make three calls. The model handles one persona well; it handles "be three different things at once" badly.

Conflicting constraints. "Always be brief. Always cite sources for every claim with full URLs." Pick one. The model will satisfy whichever one shows up later in the prompt and silently violate the other.

Forgetting system exists at all. This is the most common one for newcomers. They write a 600-word user message that opens with "Pretend you're an expert in...". That belongs in system. Move it. Your call site becomes shorter, your prompts more reusable, and the model's behaviour more reliable.

Where this leads in the rest of the series

System prompts are the single most reused tool in the API. Every episode from here on uses one — sometimes a small one, sometimes a 30-line one with embedded instructions, examples, and tool-usage policies.

In Episode 3 we'll add the second axis: multi-turn conversations, where messages grows over time and Claude has to keep track of earlier turns. The system prompt sets the persona; the messages list carries the memory.

By Episode 7 you'll be writing system prompts that explain which tools Claude has and when to use them. By Episode 24 you'll be combining system prompts with few-shot examples and chain-of-thought patterns to wring better answers out of the same model.

If you remember nothing else from this episode: the system prompt is where the product lives. Same model, same question — different system prompt, different app.

Recap

What we did today. Sent the same user question through client.messages.create() three times. Changed only the system parameter between calls. Got three personality-shifted answers — a pirate, a chef, a tutor. Confirmed that max_tokens is a hard cap, not a target. Saw that pulling the user input into a variable lets you separate the thing the user controls from the thing you control.

Three lines of intent encoded as three system prompts. Three completely different products from one piece of code.

Next episode: multi-turn conversations. Where messages becomes a list that grows with every turn, and Claude starts to remember what was said before.

See you in the next one.

Ready? Take the quiz on the full lesson page →
Test what you've learned. Watch the lesson and try the interactive quiz on the same page.