Iterators & Generators (yield, Lazy Evaluation, Data Pipelines) - Python Tutorial for Beginners #24
Video: Iterators & Generators (yield, Lazy Evaluation, Data Pipelines) - Python Tutorial for Beginners #24 by Taught by Celeste AI - AI Coding Coach
Python Iterators and Generators: iter, next, yield
An iterator is any object with __iter__ and __next__. A generator is a function that uses yield instead of return. It pauses, hands back a value, and resumes on the next call. Generators are lazy: they produce values on demand, not all at once.
When you write for x in something, Python is calling iter(something) and then next() repeatedly. Understanding the protocol lets you build your own.
The iterator protocol
An iterator has two methods:
- __iter__(self): returns the iterator (usually return self).
- __next__(self): returns the next value, or raises StopIteration when done.
class Countdown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        self.current = self.start
        return self

    def __next__(self):
        if self.current < 1:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for num in Countdown(5):
    print(num)
# 5 4 3 2 1
The for loop calls iter(Countdown(5)) (which calls __iter__), then next() repeatedly until StopIteration.
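The loop mechanics can be written out by hand. This is a rough sketch of the behavior, not the interpreter's actual implementation, and the helper name manual_for is hypothetical:

```python
def manual_for(iterable):
    """Collect items in the order a for loop would visit them (hypothetical helper)."""
    seen = []
    it = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(it)      # calls it.__next__()
        except StopIteration:    # the signal that the iterator is done
            break
        seen.append(item)        # a for loop's body would run here
    return seen

print(manual_for([10, 20, 30]))  # [10, 20, 30]
print(manual_for("abc"))         # ['a', 'b', 'c']
```

The for statement hides the try/except, which is why StopIteration never surfaces in ordinary loops.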
Manual iteration
counter = Countdown(3)
it = iter(counter)
print(next(it)) # 3
print(next(it)) # 2
print(next(it)) # 1
print(next(it)) # raises StopIteration
iter() and next() are how for works under the hood. Useful for "peeking" or stepwise consumption.
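A small sketch of stepwise consumption, using a plain list as the data source for illustration. It relies on the optional second argument to next(), which is returned instead of raising StopIteration:

```python
it = iter([3, 2, 1])

first = next(it)             # consume one item by hand
rest = list(it)              # the iterator carries on from where it stopped

# With a default, next() returns it instead of raising StopIteration.
after_end = next(it, None)

print(first)      # 3
print(rest)       # [2, 1]
print(after_end)  # None
```

A default is handy when "no more items" is an expected outcome rather than an error.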
Iterables vs iterators
- Iterable: has __iter__. Can produce an iterator. Lists, tuples, dicts, strings, ranges, files.
- Iterator: has both __iter__ and __next__. Tracks position; gets exhausted.
nums = [1, 2, 3] # iterable, not iterator
it = iter(nums) # iterator
print(next(it)) # 1
# Lists can be iterated multiple times:
for x in nums: print(x)
for x in nums: print(x) # works again
# Iterators get exhausted:
for x in it: print(x)
for x in it: print(x) # nothing — already consumed
A list is iterable but not itself an iterator. Each for loop creates a fresh iterator from it.
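The distinction can be checked at runtime. This sketch uses the abstract base classes in collections.abc; the variable names are only for illustration:

```python
from collections.abc import Iterable, Iterator

nums = [1, 2, 3]
it = iter(nums)

print(isinstance(nums, Iterable))  # True
print(isinstance(nums, Iterator))  # False: a list has no __next__
print(isinstance(it, Iterator))    # True
print(iter(it) is it)              # True: an iterator returns itself
print(iter(nums) is iter(nums))    # False: each call builds a new iterator
```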
Generators: yield does it for you
Writing __iter__ + __next__ is verbose. Generators are easier:
def countdown(n):
    while n > 0:
        yield n
        n -= 1

for num in countdown(5):
    print(num)
# 5 4 3 2 1
yield is the magic. Calling countdown(5) doesn't run the function — it returns a generator object. Each next() runs until the next yield, then pauses.
The function's local variables are preserved between yields. When the function returns (or hits the end), StopIteration is raised automatically.
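To watch the pause-and-resume behavior, here is a sketch that records when each part of the function body actually runs. The log list is a hypothetical tracing aid added for this demo:

```python
def countdown(n):
    log.append("started")        # runs only on the first next()
    while n > 0:
        yield n
        n -= 1
    log.append("finished")       # runs when the loop ends

log = []                          # hypothetical trace list for this demo
gen = countdown(2)
print(log)                # []  (nothing has run yet)
print(next(gen))          # 2
print(log)                # ['started']
print(next(gen))          # 1
print(next(gen, "done"))  # done  (StopIteration was raised internally)
print(log)                # ['started', 'finished']
```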
Why generators are cheaper
import sys
nums_list = [x * x for x in range(1000)]
nums_gen = (x * x for x in range(1000))
print(sys.getsizeof(nums_list)) # ~8000 bytes
print(sys.getsizeof(nums_gen)) # ~100 bytes
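A list builds all values up front. A generator stores only its own paused state and computes each value on demand. For a million items, that's several megabytes for the list versus a small constant for the generator. Exact sizes vary by Python version, and sys.getsizeof counts only the container, not the integers it refers to, so the list's real footprint is larger still.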
For huge sequences or infinite streams, generators are essential.
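The same comparison at a larger scale, as a sketch. It assumes CPython, where sys.getsizeof is available, and compares against loose thresholds because exact byte counts differ between versions:

```python
import sys

n = 1_000_000
squares_list = [x * x for x in range(n)]   # all values exist at once
squares_gen = (x * x for x in range(n))    # nothing computed yet

list_bytes = sys.getsizeof(squares_list)   # container only, not the ints
gen_bytes = sys.getsizeof(squares_gen)

print(list_bytes > 1_000_000)  # True on CPython: megabytes of pointers
print(gen_bytes < 1_000)       # True on CPython: a small fixed-size object

# Aggregates can consume the generator without ever building a list.
total = sum(squares_gen)
print(total == sum(squares_list))  # True
```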
Fibonacci as a generator
def fibonacci(limit):
    a, b = 0, 1
    while a <= limit:
        yield a
        a, b = b, a + b
fibs = list(fibonacci(100))
print(fibs) # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
The state — a, b — lives across yields. The generator pauses with the variables intact.
For an infinite sequence:
def naturals():
    n = 0
    while True:
        yield n
        n += 1
# Take the first 5
from itertools import islice
print(list(islice(naturals(), 5))) # [0, 1, 2, 3, 4]
Don't call list(naturals()). It never finishes and keeps consuming memory. Always cap with islice, takewhile, or a break condition.
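Besides islice, two other ways to bound an infinite generator are sketched here: itertools.takewhile and a plain break. The naturals function is repeated so the block runs on its own, and the predicates are arbitrary examples:

```python
from itertools import takewhile

def naturals():
    n = 0
    while True:
        yield n
        n += 1

# Stop as soon as the predicate fails.
squares = (n * n for n in naturals())
small_squares = list(takewhile(lambda sq: sq < 30, squares))
print(small_squares)  # [0, 1, 4, 9, 16, 25]

# Or stop by hand with break.
first_multiple = None
for n in naturals():
    if n > 0 and n % 7 == 0:
        first_multiple = n
        break
print(first_multiple)  # 7
```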
Generator expressions (recap)
gen = (x * 2 for x in range(10))
print(next(gen)) # 0
print(next(gen)) # 2
Same as a list comprehension but with () instead of []. Lazy. Covered in lesson 14.
Pipelines
Generators chain naturally:
def numbers(n):
    for i in range(1, n + 1):
        yield i

def double(nums):
    for n in nums:
        yield n * 2

def keep_even(nums):
    for n in nums:
        if n % 2 == 0:
            yield n

result = list(keep_even(double(numbers(10))))
print(result)
# [2, 4, 6, 8, 10, 12, 14, 16, 18, 20] (every doubled value is already even, so keep_even drops nothing)
Each stage processes one value at a time. No intermediate lists. The pipeline's own memory is O(1) regardless of input size; only the final list() call materializes the results.
This is the core idea of streaming pipelines — the same pattern Unix uses with pipes (|).
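To see that values really flow through one at a time, this sketch records the order in which two stages touch each item. The stage names, the trace list, and the sample input are all hypothetical, standing in for a file or socket:

```python
def read_lines(lines, trace):
    for line in lines:
        trace.append(f"read {line!r}")
        yield line

def non_blank(lines):
    for line in lines:
        if line.strip():
            yield line

def to_int(lines, trace):
    for line in lines:
        trace.append(f"parse {line!r}")
        yield int(line)

trace = []
raw = ["10", "", "20"]  # stand-in for a file or socket
pipeline = to_int(non_blank(read_lines(raw, trace)), trace)

print(trace)            # []  (building the pipeline runs nothing)
values = list(pipeline)
print(values)           # [10, 20]
print(trace)
# ["read '10'", "parse '10'", "read ''", "read '20'", "parse '20'"]
```

The first line is parsed before the second is even read, which is what lets a pipeline like this handle inputs larger than memory.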
yield from
def inner():
    yield 1
    yield 2

def outer():
    yield "start"
    yield from inner()
    yield "end"
print(list(outer()))
# ['start', 1, 2, 'end']
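yield from inner() is shorthand for "yield each value from this iterable in turn." For plain iteration it is equivalent to for x in inner(): yield x. It also forwards send() and throw() calls to the sub-generator and evaluates to the sub-generator's return value.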
Useful for delegating to sub-generators or for flattening nested generators.
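A sketch of the flattening use case: a recursive generator that delegates to itself with yield from. The function name flatten and the choice to descend only into lists and tuples are assumptions for this example:

```python
def flatten(items):
    """Yield leaf values from arbitrarily nested lists and tuples."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)   # delegate to the sub-generator
        else:
            yield item

nested = [1, [2, [3, 4]], (5, 6), "seven"]
print(list(flatten(nested)))  # [1, 2, 3, 4, 5, 6, 'seven']
```

Strings are deliberately treated as leaves here; descending into them would recurse forever, since each character is itself a one-character string.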
Sending values into a generator
def echo():
    while True:
        x = yield
        print(f"got {x}")
g = echo()
next(g) # prime
g.send("hello") # got hello
g.send(42) # got 42
g.send(value) resumes the generator with value as the result of the yield expression. This makes generators full coroutines — they can both produce and consume.
In practice, modern Python uses async/await for coroutines (lesson 31). .send() is rarely used directly.
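For a case where the generator both consumes and produces, here is a sketch of a running average. The name running_average is hypothetical; the pattern is the yield expression receiving whatever send() passes in:

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average      # hand back the average, wait for a value
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)              # prime: run to the first yield
print(avg.send(10))    # 10.0
print(avg.send(20))    # 15.0
print(avg.send(30))    # 20.0
avg.close()
```

The priming next() is required because a freshly created generator has not reached a yield yet, so there is nothing to receive a sent value.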
Generator cleanup
def session():
    print("opening")
    try:
        yield "data"
    finally:
        print("closing")

for x in session():
    print(x)
# opening
# data
# closing
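If the generator is fully consumed, finally runs. If it's abandoned partway, finally still runs once the generator is closed, either by an explicit .close() call or when the generator object is garbage-collected. .close() raises GeneratorExit at the paused yield, which lets finally clean up.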
This pattern is the foundation of contextlib.contextmanager (lesson 25).
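The abandoned case can be demonstrated with an explicit close(). This sketch swaps the print calls for a hypothetical events list so the order is easy to inspect:

```python
events = []   # hypothetical trace list for this demo

def session():
    events.append("opening")
    try:
        yield "data"
        yield "more data"
    finally:
        events.append("closing")

s = session()
print(next(s))            # data
s.close()                 # raises GeneratorExit at the paused yield
print(events)             # ['opening', 'closing']
print(next(s, "closed"))  # closed  (a closed generator is exhausted)
```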
itertools: a toolbox
The itertools module provides a set of lazy, iterator-based utilities:
from itertools import count, cycle, repeat, chain, islice, accumulate, groupby, takewhile, combinations, permutations, product
# Infinite counter
list(islice(count(10), 5)) # [10, 11, 12, 13, 14]
# Cycle through values
list(islice(cycle("ABC"), 7)) # ['A', 'B', 'C', 'A', 'B', 'C', 'A']
# Chain multiple iterables
list(chain([1, 2], [3, 4])) # [1, 2, 3, 4]
# Running totals
list(accumulate([1, 2, 3, 4])) # [1, 3, 6, 10]
# Take while predicate true
list(takewhile(lambda x: x < 5, range(10))) # [0, 1, 2, 3, 4]
# Combinations and permutations
list(combinations([1,2,3], 2)) # [(1,2), (1,3), (2,3)]
Whenever you find yourself writing nested loops or accumulator code, check itertools first.
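As an illustration of that advice, here is a sketch that replaces a nested loop with product and a manual grouping loop with groupby. The sample data is made up for the example:

```python
from itertools import groupby, product

sizes = ["S", "M"]
colors = ["red", "blue"]

# Nested loops...
nested = []
for size in sizes:
    for color in colors:
        nested.append((size, color))

# ...and the itertools equivalent.
flat = list(product(sizes, colors))
print(flat == nested)  # True
print(flat)  # [('S', 'red'), ('S', 'blue'), ('M', 'red'), ('M', 'blue')]

# groupby collapses runs of adjacent items with the same key,
# so the input must already be sorted by that key.
words = sorted(["apple", "avocado", "banana", "blueberry", "cherry"])
by_letter = {
    letter: list(group)
    for letter, group in groupby(words, key=lambda w: w[0])
}
print(by_letter)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```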
Common stumbles
Re-iterating an exhausted generator. Once consumed, it's done. Call the function again to get a fresh one.
Forgetting iter(obj) for the iterator. Looping a list directly works (Python calls iter for you), but if you need manual next(), call iter() first.
list() of an infinite generator. Hangs forever. Use islice or a cap.
Using return instead of yield. A function with return is just a function — no generator. Mix-and-match: return exits the generator (raises StopIteration); yield produces a value.
yield inside a list comprehension. Comprehensions can't contain yield. Use a generator function instead.
Side effects in generators. They run at iteration time, not creation time. gen = my_generator() doesn't execute anything yet — surprising if you expect file-opens or DB calls to happen up front.
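Several of these stumbles can be reproduced in a few lines. In this sketch, load_rows and the calls list are hypothetical stand-ins for a function that opens a file or database connection:

```python
calls = []   # hypothetical trace list for this demo

def load_rows():
    calls.append("connect")   # side effect: runs on first next(), not at creation
    yield "row 1"
    yield "row 2"
    return "summary"          # ends the generator; the value rides on StopIteration

rows = load_rows()
print(calls)        # []  (creating the generator ran nothing)
print(list(rows))   # ['row 1', 'row 2']
print(calls)        # ['connect']
print(list(rows))   # []  (exhausted; call load_rows() again for a fresh one)

fresh = load_rows()
next(fresh)
next(fresh)
try:
    next(fresh)
    returned = None
except StopIteration as stop:
    returned = stop.value
print(returned)     # summary
```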
What's next
Lesson 25: context managers. with statement, __enter__/__exit__, @contextmanager.
Recap
Iterator protocol: __iter__ returns the iterator, __next__ returns the next value or raises StopIteration. Generators are functions that use yield — Python builds the iterator for you. Lazy evaluation: O(1) memory. Pipelines via chained generators. yield from delegates to sub-generators. itertools has reusable building blocks.