Build AI Apps with Python: Multi-Document RAG — Metadata Filtering & Citations | Episode 17

Taught by Celeste AI - AI Coding Coach
Description
Multiple documents, one collection. Tag every chunk with its source file. Filter searches to a single document. Get answers that cite exactly where each fact came from.

We build a company Q&A system with three documents: employee handbook, FAQ, and company policy. Nine chunks, each tagged with metadata. Ask questions across all sources or filter to just one. Every answer includes source citations so you know which document it came from.

Student code: https://github.com/GoCelesteAI/build-ai-apps-python/tree/main/episode17

Every keystroke is shown on screen with 3-second pauses so you can follow along at your own pace.

What You'll Learn:
• ChromaDB metadata: tagging chunks with source filenames
• Metadata filtering with where clauses
• Source-labeled context for auditable prompts
• Claude generating answers with source citations
• Cross-document vs filtered queries
• Building a multi-file knowledge base
• The pattern for production RAG systems
• Running Python scripts with :!python %

Timestamps:
0:00 - Introduction
0:12 - What is Multi-Document RAG? (Preview)
0:44 - Creating multi_doc_rag.py
1:00 - Imports: Anthropic + ChromaDB
2:00 - Setup: clients and collection
2:45 - Document 1: Employee Handbook (3 chunks)
3:40 - Document 2: FAQ (3 chunks)
4:25 - Document 3: Company Policy (3 chunks)
5:10 - Adding metadata: source tags per chunk
6:30 - Three loops: handbook, faq, policy metadata
8:30 - collection.add with metadatas
9:30 - Save progress
9:50 - The ask function with source_filter parameter
10:45 - Building query_args dictionary
11:30 - Conditional where filter
12:15 - Step 1: RETRIEVE with metadata
13:00 - Step 2: AUGMENT with source labels
14:30 - Step 3: GENERATE with citations
16:30 - Three test questions
18:00 - Save and run
18:30 - Output: 9 chunks, 3 cited answers
19:30 - Code review
19:50 - Recap: 3 Key Takeaways
20:20 - End Screen

Key Takeaways:
1. Metadata tags every chunk with its source file; ChromaDB stores it alongside the embedding
2. Where filters search specific sources: query one document or all documents at once
3. Source labels in the prompt enable citations; Claude cites which file each fact came from

This is Episode 17 of Build AI Apps with Python in Neovim, Phase 3 (RAG). Taught by CelesteAI. Like & subscribe for more tutorials!

#python #ai #rag #multidocumentrag #chromadb #metadata #claudeapi #anthropic #neovim #programming #tutorial #machinelearning #artificialintelligence #coding #pythontutorial #buildaiapps #llm #genai #vectorstore #citations
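The three key takeaways can be sketched in a few small helpers. This is a minimal illustration and not the episode's actual multi_doc_rag.py: the filenames, chunk texts, ID scheme, and the helper names build_records, build_query_args, build_context, and build_prompt are all assumptions made for the example. Only the names query_args and source_filter and the use of collection.add with metadatas come from the timestamps above. The helpers are kept free of ChromaDB and Anthropic imports so they can be read and run on their own.

```python
from typing import Any, Optional

# Placeholder data (assumption): invented filenames and chunk texts,
# standing in for the episode's handbook, FAQ, and policy documents.
DOCUMENTS = {
    "handbook.txt": ["Placeholder handbook chunk."],
    "faq.txt": ["Placeholder FAQ chunk."],
    "policy.txt": ["Placeholder policy chunk."],
}


def build_records(documents: dict[str, list[str]]):
    """Flatten {filename: [chunks]} into parallel ids, texts, and metadatas lists."""
    ids, texts, metadatas = [], [], []
    for source, chunks in documents.items():
        for i, chunk in enumerate(chunks):
            ids.append(f"{source}-{i}")            # hypothetical ID scheme
            texts.append(chunk)
            metadatas.append({"source": source})   # tag each chunk with its file
    return ids, texts, metadatas


def build_query_args(question: str, source_filter: Optional[str] = None,
                     n_results: int = 3) -> dict[str, Any]:
    """Build keyword arguments for a query; add a where filter only when asked."""
    query_args: dict[str, Any] = {"query_texts": [question], "n_results": n_results}
    if source_filter is not None:
        query_args["where"] = {"source": source_filter}
    return query_args


def build_context(results: dict[str, Any]) -> str:
    """Label every retrieved chunk with its source so the model can cite it."""
    docs = results["documents"][0]     # results are nested: one list per query text
    metas = results["metadatas"][0]
    return "\n\n".join(f"[Source: {m['source']}]\n{d}" for d, m in zip(docs, metas))


def build_prompt(question: str, context: str) -> str:
    """Ask for an answer grounded in the labeled context, with citations."""
    return (
        "Answer the question using only the context below. "
        "Cite the source file for each fact.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# With ChromaDB installed, the helpers would plug in roughly like this:
#   import chromadb
#   collection = chromadb.Client().create_collection("company_docs")
#   ids, texts, metadatas = build_records(DOCUMENTS)
#   collection.add(ids=ids, documents=texts, metadatas=metadatas)
#   results = collection.query(**build_query_args("A question?", "faq.txt"))
#   prompt = build_prompt("A question?", build_context(results))
# The prompt would then be sent to Claude through the Anthropic client.
```

Leaving the where key out entirely when source_filter is None lets one function handle both the cross-document query and the single-document query.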

Duration: 21:01
Published: April 4, 2026
Added to Codegiz: April 5, 2026
