Build AI Apps with Python: Text Splitting — Break Documents into Chunks | Episode 13
Taught by Celeste AI - AI Coding Coach
Description
Before you can search documents, you have to break them into pieces. In this episode, we build a text splitter that cuts a company handbook into
chunks with configurable size and overlap.
Why split? Because large documents do not fit in one prompt. You split them into chunks, then retrieve only the relevant ones. Overlap repeats the
end of each chunk at the start of the next, so a sentence cut at one boundary can still appear whole in the neighboring chunk. We compare 200, 500,
and 1000 character chunks to see the trade-off between precision and context.
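As a rough illustration of the idea, here is a minimal sketch of a character-based splitter. The names split_text, chunk_size, and overlap come from the description; the default values, the validation checks, and the sample string are assumptions for illustration, and the code shown in the episode may differ.

```python
def split_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Each chunk starts `overlap` characters before the previous one ended.
    The defaults here are illustrative assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        # overlap >= chunk_size would stop the window from moving forward
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])  # slicing past the end is safe in Python
        start = end - overlap           # slide forward, stepping back by overlap
    return chunks


# Tiny sample so the overlap is easy to see by eye
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_text(sample, chunk_size=10, overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk begins with the last three characters of the one before it, and the final chunk is simply whatever text is left.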
Student code: https://github.com/GoCelesteAI/build-ai-apps-python/tree/main/episode13
Every keystroke is shown on screen with 3-second pauses so you can follow along at your own pace.
What You'll Learn:
• Why large documents need splitting for RAG
• Building a text splitter function from scratch
• chunk_size: how many characters go into each piece
• overlap: text shared by neighboring chunks, so sentences at a boundary are not lost
• The while loop sliding window pattern
• Displaying numbered chunks with character counts
• Comparing different chunk sizes (200, 500, 1000)
• The precision vs context trade-off
• Running Python scripts with :!python %
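The display and comparison steps in the list above could look something like the sketch below. The document string is a placeholder rather than the handbook from the episode, the overlap of 50 is an assumed value, and describe_chunks is a hypothetical helper name.

```python
def split_text(text, chunk_size, overlap):
    # Sliding window: take chunk_size characters, then advance by
    # chunk_size - overlap so neighboring chunks share some text.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


def describe_chunks(chunks):
    # Hypothetical helper: one numbered line per chunk with its length
    # and a short preview of its content.
    lines = []
    for number, chunk in enumerate(chunks, start=1):
        lines.append(f"Chunk {number} ({len(chunk)} chars): {chunk[:40]!r}...")
    return lines


# Placeholder text standing in for the handbook (an assumption)
document = "Employees accrue paid leave each month. " * 30

for line in describe_chunks(split_text(document, 500, 50)):
    print(line)

# Compare how many chunks each size produces for the same text
counts = {size: len(split_text(document, size, 50)) for size in (200, 500, 1000)}
for size, count in counts.items():
    print(f"chunk_size={size}: {count} chunks")
```

Smaller sizes produce more, narrower chunks from the same text; the exact counts depend on the document length and the overlap chosen.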
Timestamps:
0:00 - Introduction
0:12 - Why Split? (Preview)
0:46 - Creating text_splitting.py
0:58 - Company handbook — 5 chapters
1:51 - Paste handbook with :r
2:18 - Save and review the document
2:34 - The split_text function
3:34 - While loop sliding window
4:08 - Display chunks with numbering
5:32 - Compare chunk sizes (200, 500, 1000)
6:18 - Save and run
6:55 - 3 chunks at size 500 with overlap
7:15 - Size 200: 9 chunks, Size 1000: 2 chunks
7:52 - Code review
8:02 - Recap: 3 Key Takeaways
8:34 - End Screen
Key Takeaways:
1. Large documents must be split into chunks to fit token limits
2. Overlap repeats text across chunk boundaries, so a sentence cut at the end of one chunk can still appear whole in the next
3. Chunk size is a trade-off — smaller means more precise, larger means more context
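To see the second takeaway in action, the sketch below splits the same short text with and without overlap and checks whether one sentence survives intact in any chunk. The sample text, the sizes, and the contains_whole helper are all assumptions chosen to make the effect visible.

```python
def split_text(text, chunk_size, overlap):
    # Same sliding-window approach: advance by chunk_size - overlap each step
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


def contains_whole(chunks, sentence):
    # Hypothetical helper: True if any single chunk holds the full sentence
    return any(sentence in chunk for chunk in chunks)


# Illustrative text; the middle sentence straddles character 40
text = ("Alpha policy applies. Remote work is allowed on Fridays. "
        "Expenses need receipts.")
target = "Remote work is allowed on Fridays."

no_overlap = split_text(text, chunk_size=40, overlap=0)
with_overlap = split_text(text, chunk_size=40, overlap=20)

print(contains_whole(no_overlap, target))    # False: the sentence is cut at 40
print(contains_whole(with_overlap, target))  # True: the second chunk covers it
```

The overlap only helps when it is long enough to cover the part of the sentence that was cut off, so it reduces the problem rather than removing it for every sentence length.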
This is Episode 13 of Build AI Apps with Python in Neovim — Phase 3 (RAG).
Taught by CelesteAI. Like & subscribe for more tutorials!
Tags
python text splitting, rag text chunks, chunk size overlap, document splitting python, text chunking, ai tutorial 2026, build ai apps python,
neovim tutorial, generative ai python, screenkey, code along, rag pipeline, retrieval augmented generation, anthropic sdk, claude api