Build AI Apps with Python: Text Splitting — Break Documents into Chunks | Episode 13

Taught by Celeste AI - AI Coding Coach
Description
Before you can search documents, you have to break them into pieces. In this episode, we build a text splitter that cuts a company handbook into chunks with configurable size and overlap.

Why split? Because large documents do not fit in one prompt. You split them into chunks, then retrieve only the relevant ones. Overlap repeats the end of one chunk at the start of the next, so a sentence cut at a chunk boundary is more likely to appear whole in one of the two neighbouring chunks. We compare 200-, 500-, and 1000-character chunks to see the trade-off between precision and context.

Student code: https://github.com/GoCelesteAI/build-ai-apps-python/tree/main/episode13

Every keystroke is shown on screen with 3-second pauses so you can follow along at your own pace.

What You'll Learn:
• Why large documents need splitting for RAG
• Building a text splitter function from scratch
• chunk_size: how many characters per piece
• overlap: shared text that keeps boundary sentences from being lost
• The while loop sliding window pattern
• Displaying numbered chunks with character counts
• Comparing different chunk sizes (200, 500, 1000)
• The precision vs context trade-off
• Running Python scripts with :!python %

Timestamps:
0:00 - Introduction
0:12 - Why Split? (Preview)
0:46 - Creating text_splitting.py
0:58 - Company handbook: 5 chapters
1:51 - Paste handbook with :r
2:18 - Save and review the document
2:34 - The split_text function
3:34 - While loop sliding window
4:08 - Display chunks with numbering
5:32 - Compare chunk sizes (200, 500, 1000)
6:18 - Save and run
6:55 - 3 chunks at size 500 with overlap
7:15 - Size 200: 9 chunks, Size 1000: 2 chunks
7:52 - Code review
8:02 - Recap: 3 Key Takeaways
8:34 - End Screen

Key Takeaways:
1. Large documents must be split into chunks to fit token limits
2. Overlap reduces the chance that a sentence is lost by being cut in half at a chunk boundary
3. Chunk size is a trade-off: smaller means more precise, larger means more context

This is Episode 13 of Build AI Apps with Python in Neovim, Phase 3 (RAG). Taught by CelesteAI. Like & subscribe for more tutorials!
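
For readers who want to try the idea before watching, here is a minimal sketch of a character-based splitter using a while loop as a sliding window. It is not the episode's actual code (that lives in the student repository linked above). The name split_text and the parameters chunk_size and overlap come from the description, but the default overlap of 50, the validation checks, and the placeholder handbook text are assumptions made for illustration. The chunk counts it prints will not match the ones in the video, because the sample text is different.

```python
def split_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters. The default overlap
    of 50 is an assumption, not a value taken from the episode.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    chunks = []
    start = 0
    step = chunk_size - overlap  # how far the window slides each iteration
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window has reached the end of the text, so we do
        # not emit a final chunk made only of already-seen overlap.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


# Hypothetical stand-in for the company handbook used in the episode.
handbook = " ".join(
    f"Section {i}: placeholder policy text for the company handbook."
    for i in range(1, 21)
)

# Compare the three chunk sizes mentioned in the description.
for size in (200, 500, 1000):
    chunks = split_text(handbook, chunk_size=size, overlap=50)
    print(f"chunk_size={size}: {len(chunks)} chunks")
    for number, chunk in enumerate(chunks, start=1):
        print(f"  Chunk {number} ({len(chunk)} chars): {chunk[:40]!r}")
```

Smaller values of chunk_size produce more, shorter chunks (more precise retrieval, less surrounding context), while larger values produce fewer, longer ones. Because this sketch cuts on raw character positions, a sentence longer than the overlap can still be split across every chunk that contains part of it.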
Tags python text splitting, rag text chunks, chunk size overlap, document splitting python, text chunking, ai tutorial 2026, build ai apps python, neovim tutorial, generative ai python, screenkey, code along, rag pipeline, retrieval augmented generation, anthropic sdk, claude api


Duration

9:37

Published

April 1, 2026

Added to Codegiz

April 5, 2026
