Why Is My pandas Loop So Slow? iterrows vs Vectorize

CelesteAI
Description
Why is your pandas loop slow? Because it's a loop. Pandas was built to operate on whole columns at once, not one row at a time. New users see a DataFrame, see something that looks like a spreadsheet, and reach for for idx, row in df.iterrows(), which works on a hundred rows, takes a quarter of a second on twenty-eight thousand, and at the same per-row cost takes roughly ten seconds on a million. The fix is one line.

This tutorial measures the same calculation three ways on the same 28,000-row OHLC table:
- iterrows, the textbook anti-pattern: one Series built per row, the slowest of the three.
- itertuples, the right escape hatch: named tuples, about ten times faster.
- Vectorize, the answer: drop the loop entirely, operate on whole columns, and push the work down to NumPy and from there to C.

Same answer. Over a thousand times faster.

What You'll Build:
- pandas_loop.py: compute daily return on a 28k-row prices table three ways. Measure each with time.perf_counter. Print the speed ratio at the end.
- The iterrows pattern: for idx, row in df.iterrows(): row["Close"]. Works. Slow. It builds a new Series for every row and pays that construction cost on every iteration. 278 ms on 28k rows.
- The itertuples upgrade: for row in df.itertuples(index=False): row.Close. Same loop shape, no Series wrapping, ~10x faster. Use it when you genuinely need a row-by-row walk that can't be expressed column-wise.
- The vectorize pattern: df["ret"] = (df["Close"] - df["Open"]) / df["Open"]. No loop. NumPy under the hood, C under that. 0.26 ms on the same 28k rows. Over a thousand times faster than iterrows.
- The mental shift: pandas is for whole-column operations. Boolean masks, arithmetic on Series, and pandas built-ins like pct_change and rolling are all vectorized already. If you find yourself writing a for loop over a DataFrame, there's usually a one-line column expression that replaces it.

Timestamps:
0:00 - Intro: Why is your pandas loop slow?
0:22 - Preview: three strategies, same answer
1:07 - Open pandas_loop.py in nvim
1:23 - Method 1: iterrows, the textbook anti-pattern
2:01 - Method 2: itertuples, the right escape hatch
2:31 - Method 3: vectorize, the answer
3:18 - Save and run
3:26 - 278 ms → 25 ms → 0.26 ms (1000× speedup)
3:51 - End screen: recap

Key Takeaways:
1. iterrows is almost always the wrong tool. It builds a full Series for every row, and that per-row construction cost dominates the runtime. On a 28k-row OHLC table, iterrows takes about 278 milliseconds. The same calculation vectorized takes 0.26 milliseconds. There is almost always a column-expression alternative.
2. itertuples is the right escape hatch when you genuinely need to walk row-by-row: a stateful sequence, a path-dependent calculation, something that truly can't be expressed column-wise. It returns named tuples, skips the Series construction, and runs roughly 10 times faster than iterrows. About 25 ms for the same 28k-row job.
3. Vectorize first. Column arithmetic (addition, subtraction, multiplication, division) runs in NumPy, which runs in C. Boolean masks, .pct_change, .rolling, and .shift all do the same. The one-line vectorized version of "daily return" runs in 0.26 ms on 28k rows. Over a thousand times faster than the loop you would have written.
4. Treat the row loop as a red flag. If a teammate's pandas script has a for loop walking rows, that's the first place to look for a 100x to 1000x speedup. The fix is usually swapping the loop body for a column expression. The line count drops too.
5. The same applies to .apply with axis=1: it's a loop in disguise. Typically faster than iterrows, but far slower than true vectorization. When you can rewrite df.apply(f, axis=1) as a column expression, you should.

This channel is run by Claude AI. Tutorials are AI-produced, then reviewed and published by Codegiz. Source code at codegiz.com.
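For readers who want to try the comparison without the video, here is a minimal sketch of the three approaches side by side. It is not the author's pandas_loop.py: the OHLC table is synthetic (random prices, an assumption standing in for the real data), and the helper names with_iterrows, with_itertuples, vectorized, and timed are hypothetical. Expect the absolute timings to differ from the 278 ms / 25 ms / 0.26 ms quoted above, since they depend on hardware and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Synthetic stand-in for the 28k-row OHLC table (assumption: real data not shown).
rng = np.random.default_rng(0)
n_rows = 28_000
open_ = rng.uniform(50.0, 150.0, n_rows)
close = open_ * (1.0 + rng.normal(0.0, 0.01, n_rows))
df = pd.DataFrame({
    "Open": open_,
    "High": np.maximum(open_, close) * 1.005,
    "Low": np.minimum(open_, close) * 0.995,
    "Close": close,
})


def with_iterrows(frame):
    """Row loop that receives each row as a freshly built Series."""
    out = []
    for _, row in frame.iterrows():
        out.append((row["Close"] - row["Open"]) / row["Open"])
    return np.array(out)


def with_itertuples(frame):
    """Row loop that receives each row as a lightweight named tuple."""
    out = []
    for row in frame.itertuples(index=False):
        out.append((row.Close - row.Open) / row.Open)
    return np.array(out)


def vectorized(frame):
    """No Python-level loop: whole-column arithmetic."""
    return ((frame["Close"] - frame["Open"]) / frame["Open"]).to_numpy()


def timed(func, frame):
    """Run func once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(frame)
    return result, time.perf_counter() - start


results, timings = {}, {}
for name, func in [
    ("iterrows", with_iterrows),
    ("itertuples", with_itertuples),
    ("vectorized", vectorized),
]:
    results[name], timings[name] = timed(func, df)
    print(f"{name:>10}: {timings[name] * 1000:9.2f} ms")

# All three strategies must produce the same returns.
assert np.allclose(results["iterrows"], results["vectorized"])
assert np.allclose(results["itertuples"], results["vectorized"])

ratio = timings["iterrows"] / timings["vectorized"]
print(f"iterrows / vectorized: {ratio:.0f}x")
```

A single perf_counter measurement per method is noisy; for steadier numbers, repeating each call several times and taking the minimum (or using the standard-library timeit module) is a common choice.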
#Python #Pandas #DataAnalytics #Performance #Vectorize #DataEngineering #PythonTutorial #LearnPython #DataFrame #pandasspeed

---
Generated by Claude AI · part of the Common Questions in Python series

Added to Codegiz: May 22, 2026
