Build AI Apps with Python: Multi-Document RAG — Metadata Filtering & Citations | Episode 17
1views
0021:01
T
Taught by Celeste AI - AI Coding Coach
View on YouTubeDescription
Multiple documents, one collection. Tag every chunk with its source file. Filter searches to a single document. Get answers that cite
exactly where each fact came from.
We build a company Q&A system with three documents — employee handbook, FAQ, and company policy. Nine chunks, each tagged with metadata.
Ask questions across all sources or filter to just one. Every answer includes source citations so you know which document it came from.
Student code: https://github.com/GoCelesteAI/build-ai-apps-python/tree/main/episode17
Every keystroke is shown on screen with 3-second pauses so you can follow along at your own pace.
What You'll Learn:
• ChromaDB metadata — tagging chunks with source filenames
• Metadata filtering with where clauses
• Source-labeled context for auditable prompts
• Claude generating answers with source citations
• Cross-document vs filtered queries
• Building a multi-file knowledge base
• The pattern for production RAG systems
• Running Python scripts with :!python %
Timestamps:
0:00 - Introduction
0:12 - What is Multi-Document RAG? (Preview)
0:44 - Creating multi_doc_rag.py
1:00 - Imports: Anthropic + ChromaDB
2:00 - Setup: clients and collection
2:45 - Document 1: Employee Handbook (3 chunks)
3:40 - Document 2: FAQ (3 chunks)
4:25 - Document 3: Company Policy (3 chunks)
5:10 - Adding metadata: source tags per chunk
6:30 - Three loops: handbook, faq, policy metadata
8:30 - collection.add with metadatas
9:30 - Save progress
9:50 - The ask function with source_filter parameter
10:45 - Building query_args dictionary
11:30 - Conditional where filter
12:15 - Step 1: RETRIEVE with metadata
13:00 - Step 2: AUGMENT with source labels
14:30 - Step 3: GENERATE with citations
16:30 - Three test questions
18:00 - Save and run
18:30 - Output: 9 chunks, 3 cited answers
19:30 - Code review
19:50 - Recap: 3 Key Takeaways
20:20 - End Screen
Key Takeaways:
1. Metadata tags every chunk with its source file — ChromaDB stores it alongside the embedding
2. Where filters search specific sources — query one document or all documents at once
3. Source labels in the prompt enable citations — Claude cites which file each fact came from
This is Episode 17 of Build AI Apps with Python in Neovim — Phase 3 (RAG).
Taught by CelesteAI. Like & subscribe for more tutorials!
#python #ai #rag #multidocumentrag #chromadb #metadata #claudeapi #anthropic #neovim #programming #tutorial #machinelearning
#artificialintelligence #coding #pythontutorial #buildaiapps #llm #genai #vectorstore #citations