Build AI Apps with Python: Search by Meaning with ChromaDB | Episode 15
Video: Build AI Apps with Python: Search by Meaning with ChromaDB | Episode 15 by Taught by Celeste AI - AI Coding Coach
Student code: github.com/GoCelesteAI/build-ai-apps-python/tree/main/episode15
Six recipes. Three queries. ChromaDB returns the right ones: by meaning, not keyword.
In Episode 14 we computed similarity between four sentences by hand. That doesn't scale. Real RAG systems index thousands or millions of chunks, and the cost of computing similarity to every chunk on every query is prohibitive.
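To make the scaling problem concrete, here is a minimal sketch of the brute-force approach: score the query against every stored vector, then sort. The three-dimensional vectors and the function names are made up for illustration; real embeddings have hundreds of dimensions, and the loop runs once per stored chunk on every query.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query_vec, stored_vecs, n_results=2):
    """Score the query against every stored vector: O(N) work per query."""
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(stored_vecs)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:n_results]

# Toy 3-dimensional "embeddings" (hypothetical values, not real model output)
stored = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.8, 0.2, 0.1],
]
top = brute_force_search([1.0, 0.0, 0.0], stored, n_results=2)
print([idx for _, idx in top])
```

With three vectors this is instant. With millions, every query pays for millions of similarity computations, which is the cost a vector store's index is designed to avoid.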
The fix is a vector store — a database designed to store embeddings and answer the question "which N items in my collection are most similar to this query?" in milliseconds. ChromaDB is the simplest one to start with: pip install, two lines to set up, in-memory or persistent.
This is the kind of database "chat with your docs" products are built on. Once you've used one, you understand the core of vector databases (Pinecone, Weaviate, Qdrant, pgvector, and the rest), because they share the same basic operations: add items with embeddings, then query for the nearest neighbours.
What we're building
A recipe search engine. Six recipes — chicken fried rice, pasta carbonara, vegetable stir-fry, banana smoothie, tomato soup, grilled salmon — go into a collection. Then we ask three semantic questions:
- "What can I make with chicken?" — should retrieve the chicken fried rice.
- "I want something healthy for breakfast" — should retrieve the banana smoothie.
- "How do I make Italian pasta?" — should retrieve the carbonara.
These queries are natural-language questions, and their keyword overlap with the recipes is thin (no recipe says "healthy", for instance), so exact keyword matching would be unreliable. Vector search finds the right recipes anyway, by meaning.
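For contrast, here is a sketch of what naive keyword matching does with the breakfast query. It uses the recipe snippets exactly as they are printed in this article (truncated at the "..."), and the tokens and keyword_overlap helpers are hypothetical names introduced only for this illustration.

```python
import re

# Recipe snippets as shown in this article (the full texts are longer)
recipes = [
    "Chicken Fried Rice: Cook rice and let it cool. Stir-fry diced chicken with garlic and ginger...",
    "Pasta Carbonara: Boil spaghetti until al dente. Fry pancetta until crispy...",
    "Vegetable Stir Fry: Slice bell peppers, broccoli, snap peas, and mushrooms...",
    "Banana Smoothie: Blend frozen bananas with milk, a spoonful of peanut butter, and honey...",
    "Tomato Soup: Roast tomatoes, onion, and garlic at 400F for 30 minutes...",
    "Grilled Salmon: Season salmon fillets with lemon, dill, salt, and pepper...",
]

def tokens(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_overlap(query, document):
    """Count the distinct words the query and the document share."""
    return len(tokens(query) & tokens(document))

query = "I want something healthy for breakfast"
scores = [keyword_overlap(query, r) for r in recipes]
best = max(range(len(recipes)), key=lambda i: scores[i])

print(scores)
print(recipes[best].split(":")[0])
```

On these snippets the scores come out as [0, 0, 0, 0, 1, 0], and the "winner" is Tomato Soup, purely because it shares the word "for" with the query. Word overlap is a poor stand-in for meaning here.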
The script
import chromadb
client = chromadb.Client()
collection = client.create_collection(name="recipes")
recipes = [
    "Chicken Fried Rice: Cook rice and let it cool. Stir-fry diced chicken with garlic and ginger...",
    "Pasta Carbonara: Boil spaghetti until al dente. Fry pancetta until crispy...",
    "Vegetable Stir Fry: Slice bell peppers, broccoli, snap peas, and mushrooms...",
    "Banana Smoothie: Blend frozen bananas with milk, a spoonful of peanut butter, and honey...",
    "Tomato Soup: Roast tomatoes, onion, and garlic at 400F for 30 minutes...",
    "Grilled Salmon: Season salmon fillets with lemon, dill, salt, and pepper...",
]
collection.add(
    documents=recipes,
    ids=[f"recipe_{i}" for i in range(len(recipes))],
)
results = collection.query(
    query_texts=["What can I make with chicken?"],
    n_results=2,
)
for i, doc in enumerate(results["documents"][0]):
    dist = results["distances"][0][i]
    print(f"Result {i+1} (distance: {dist:.3f}):")
    print(f" {doc}\n")
That's the whole pipeline. Four calls into ChromaDB: Client(), create_collection(), add(), query().
Notice what's not there. We didn't load an embedding model. We didn't write cosine_similarity. ChromaDB embeds on add() and on query() for us using its default embedding function (a sentence-transformer under the hood). For tutorial purposes, this is the right level of abstraction. In production you'd want to swap in a specific embedding function — we'll discuss when.
Collections, IDs, and documents
collection = client.create_collection(name="recipes")
collection.add(
    documents=recipes,
    ids=[f"recipe_{i}" for i in range(len(recipes))],
)
A collection is like a table for vectors. You give it a name; you add items.
An item has at least two parts: a document (the actual text) and an id (a unique string). When you add documents, ChromaDB embeds each one with the collection's embedding function and stores both the text and the embedding.
The IDs let you update or delete specific items later. They have to be unique within a collection. We're using recipe_0, recipe_1, ... — predictable and traceable. Real apps often use UUIDs or content hashes.
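As one way to get content-hash IDs, the sketch below derives each ID from the document text with SHA-256. The content_id helper, the "recipe" prefix, and the 16-character truncation are all choices made for this illustration, not anything ChromaDB requires.

```python
import hashlib

def content_id(text, prefix="recipe"):
    """Derive a stable ID from the text itself (SHA-256, first 16 hex chars)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{prefix}_{digest[:16]}"

docs = [
    "Banana Smoothie: Blend frozen bananas with milk...",
    "Tomato Soup: Roast tomatoes, onion, and garlic...",
]
ids = [content_id(d) for d in docs]

# The same text always produces the same ID, so re-indexing is repeatable
print(content_id(docs[0]) == ids[0])
# Different texts produce different IDs
print(len(set(ids)) == len(ids))
```

The trade-off: with index-based IDs like recipe_0, reordering the list changes which ID points at which text. With content hashes the ID follows the text, but editing a document gives it a new ID, so the old entry has to be deleted explicitly.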
You can also attach metadata to each item — which document this chunk came from, which page, which section — and filter queries on those fields. We'll use metadata in Episode 17.
Querying
results = collection.query(
    query_texts=["What can I make with chicken?"],
    n_results=2,
)
query_texts is a list because ChromaDB supports batched queries (multiple questions at once). n_results=2 means return the top 2 most similar items.
The result is a dictionary of parallel lists under the keys "documents", "distances", "ids", and "metadatas". Each holds one inner list per query, which is why the code indexes with [0]: we asked one query, and index 0 is "the results for query 0."
for i, doc in enumerate(results["documents"][0]):
    dist = results["distances"][0][i]
    print(f"Result {i+1} (distance: {dist:.3f}):")
    print(f" {doc}\n")
Loop, print, done. Distance is a measure where lower is better, the opposite convention from cosine similarity (where higher is better). ChromaDB's default metric is squared L2 (Euclidean) distance. Over normalised vectors it equals 2 minus twice the cosine similarity, so 0 means identical and values for text pairs typically fall between 0 and 2.
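The link between distance and cosine similarity can be checked by hand. The sketch below uses two toy two-dimensional vectors (made-up values) and assumes, per my reading of the ChromaDB docs, that the default l2 space reports the squared form of Euclidean distance. For unit-length vectors, squared L2 distance equals 2 - 2 * cosine similarity.

```python
import math

def normalise(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy vectors, normalised to unit length before comparing
a = normalise([3.0, 4.0])
b = normalise([4.0, 3.0])

cos = cosine_similarity(a, b)
dist = squared_l2(a, b)

print(f"cosine similarity:   {cos:.3f}")
print(f"squared L2 distance: {dist:.3f}")
print(f"2 - 2 * cos:         {2 - 2 * cos:.3f}")
```

The last two printed values agree, which is why a small distance and a high cosine similarity are two views of the same thing. Under that assumption, a distance of 0.689 corresponds to a cosine similarity of roughly 0.66.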
Running it
Run it with :!python % (from inside Vim). The output below covers all three queries, two results each:
=== Query 1 ===
Question: What can I make with chicken?
Result 1 (distance: 0.689):
Chicken Fried Rice: Cook rice and let it cool. Stir-fry diced chicken with garlic and ginger...
Result 2 (distance: 1.124):
Grilled Salmon: Season salmon fillets with lemon, dill, salt, and pepper...
=== Query 2 ===
Question: I want something healthy for breakfast
Result 1 (distance: 1.142):
Banana Smoothie: Blend frozen bananas with milk, a spoonful of peanut butter, and honey...
Result 2 (distance: 1.298):
Vegetable Stir Fry: Slice bell peppers, broccoli, snap peas, and mushrooms...
=== Query 3 ===
Question: How do I make Italian pasta?
Result 1 (distance: 0.748):
Pasta Carbonara: Boil spaghetti until al dente. Fry pancetta until crispy...
Result 2 (distance: 1.265):
Vegetable Stir Fry: Slice bell peppers, broccoli, snap peas, and mushrooms...
Each query's top result is the right one. Notice the distance gap: the relevant recipe scores noticeably lower distance than the runner-up. That gap is the signal you have a confident hit. Two near-identical distances would mean "either could be right; I'm not sure."
Notice also that "healthy" doesn't appear in the banana smoothie text. The text does mention "breakfast" in the second sentence of the recipe, which the embedding picks up. The vector model captures the semantic gist (fruity blended drink with peanut butter, post-workout, breakfast) and matches it to "healthy for breakfast" even though the two share only one word.
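The distance gap can be computed directly from the result. This is a sketch, assuming a results dictionary shaped like the return value of collection.query() for three batched questions. The distances are copied from the output above, the IDs follow from the recipe_{i} numbering, and distance_gap is a hypothetical helper.

```python
# Shaped like a collection.query() result for three batched questions
results = {
    "ids": [
        ["recipe_0", "recipe_5"],
        ["recipe_3", "recipe_2"],
        ["recipe_1", "recipe_2"],
    ],
    "distances": [
        [0.689, 1.124],
        [1.142, 1.298],
        [0.748, 1.265],
    ],
}

def distance_gap(distances):
    """Gap between the best hit and the runner-up (results are sorted best-first)."""
    return distances[1] - distances[0]

gaps = [distance_gap(d) for d in results["distances"]]

for q, (ids, gap) in enumerate(zip(results["ids"], gaps), start=1):
    print(f"Query {q}: top hit {ids[0]}, gap to runner-up {gap:.3f}")
```

The breakfast query has the smallest gap (0.156, against 0.435 and 0.517), which fits it being the loosest match of the three. Any cut-off for "confident enough" is something to tune on your own data.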
In-memory vs persistent
client = chromadb.Client()
The default Client() is in-memory. Everything you add() lives only for the duration of the Python process. Quit and the collection is gone. Fine for tutorials and tests.
For real apps you want persistent:
client = chromadb.PersistentClient(path="./chroma_db")
Pass a directory. ChromaDB writes the embeddings and metadata to disk. Restart the script and the collection is still there.
For production, ChromaDB also runs as a server. Other apps can be written against it like any database. The Python client API is the same.
Custom embedding functions
ChromaDB's default embedding is a built-in sentence-transformer model. You can swap it:
from chromadb.utils import embedding_functions
ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="...", model_name="text-embedding-3-small"
)
collection = client.create_collection(name="recipes", embedding_function=ef)
Or write your own:
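class MyEmbedding(chromadb.EmbeddingFunction):
    # Recent ChromaDB versions require the parameter to be named `input`
    def __call__(self, input):
        # embed_with_my_model is a placeholder for your own model call
        return [embed_with_my_model(t) for t in input]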
When does this matter? When you care about embedding quality (use a top-tier model) or when you've already computed embeddings somewhere else (provide them directly via add(embeddings=[...], documents=[...], ids=[...])).
Common mistakes
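Mismatched embedding functions on add and query. If you indexed with one model and query with another, the vectors live in different spaces and the distances are meaningless (and the query may fail outright if the dimensions differ). Use the same function consistently.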
Forgetting to persist. In-memory clients lose data on restart. Use PersistentClient for anything you want to keep.
Treating distance as similarity. ChromaDB returns distances (lower = better), and it is easy to sort or threshold them the wrong way round if you treat them as similarity scores (higher = better). Check which one you have before ranking.
Indexing entire documents instead of chunks. A 50-page handbook indexed as one document means every query retrieves the whole 50 pages. Split first (Episode 13), then index the chunks.
Not filtering by metadata when you can. If your query is about leave policy, you can filter to chunks tagged source: handbook. We'll cover this in Episode 17.
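The last two mistakes are easier to avoid if chunks, IDs, and metadata are prepared together before indexing. Below is a sketch with a deliberately simple character-window splitter. The chunk_text helper, the chunk sizes, and the stand-in handbook text are all hypothetical, and this is not the splitter from Episode 13.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    step = chunk_size - overlap
    # Stop early enough that the final window is not pure overlap
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

# Hypothetical stand-in for a long handbook
handbook = "Leave policy: employees accrue leave monthly. " * 12

chunks = chunk_text(handbook)
ids = [f"handbook_{i}" for i in range(len(chunks))]
metadatas = [{"source": "handbook", "chunk": i} for i in range(len(chunks))]

# The three lists line up one-to-one and could be passed to
# collection.add(documents=chunks, ids=ids, metadatas=metadatas)
print(len(chunks), ids[0], metadatas[0])
```

With metadata stored this way, a query can be narrowed with a filter such as where={"source": "handbook"} on collection.query(), so only chunks from that source are searched.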
What's next
You now have all the pieces of a RAG system: chunks (Episode 13), embeddings (Episode 14), and a vector store with semantic search (today). Next episode wires them together with Claude.
Episode 16: the RAG pipeline. Question in. Vector search retrieves the most relevant chunks. We build a prompt that includes those chunks. Claude answers using only what was retrieved. Single function, three steps, working end-to-end.
Recap
What we did today. Spun up a ChromaDB collection. Added six recipes with auto-generated embeddings. Asked three semantic questions and watched ChromaDB return the right recipes — even when keywords didn't overlap. Compared in-memory vs persistent clients. Discussed when to swap in custom embedding functions.
You haven't done RAG yet. You've built the search primitive RAG depends on.
Next episode: the RAG pipeline. See you in the next one.