Polars for Finance: Rolling Windows in Polars — Finance Tutorial
Video: Rolling Windows in Polars — Finance Tutorial by CelesteAI
pl.col("Close").rolling_mean(window_size=20).over("Ticker")— every twenty-day moving average for every ticker, in one line. Addrolling_std, throw two standard deviations on either side, and you’ve built Bollinger bands across fourteen tickers in three expressions. The window is rolling; the engine is Rust; the API is the same expression grammar from Episode 3.
A rolling window is a fixed-width slice that walks down the time series, recomputing a statistic at every step. Twenty-day SMA: average the last twenty closes, drop the oldest, add the next one, repeat. The output has the same row count as the input — one new value per row — which is why .over("Ticker") from Episode 3 is the natural pairing.
If returns are the language of finance, rolling windows are the grammar. Moving averages, volatility bands, momentum signals, rolling Sharpe, drawdown curves — every chart you’ve ever seen on a Bloomberg terminal is a rolling-window computation.
What rolling_mean actually does
For a Series [x0, x1, …, xn] with window_size=20:
- The first 19 rows return
null— not enough prior data to fill the window. - Row 19 (the twentieth bar): mean of
[x0..x19]. - Row 20: mean of
[x1..x20]. - Row n: mean of
[xn-19..xn].
Same shape as pct_change from Episode 3 — a windowed expression that aligns one-to-one with the original frame. Append .over("Ticker") and each ticker gets its own fresh-start rolling window; the first 19 rows of every ticker are null, not just the first 19 of the whole frame.
Pandas equivalent:
df["sma20"] = df.groupby("Ticker")["Close"].rolling(20).mean().reset_index(0, drop=True)
Note the reset_index ceremony pandas requires to put the result back in row order. Polars’s .over handles alignment automatically — no ceremony.
Setup
Same venv, same dataset.
source .venv/bin/activate
nvim rolling.py
Your rolling-windows script
import polars as pl
df = pl.read_parquet("data/prices.parquet").sort(["Ticker", "Date"])
result = df.with_columns(
sma20=pl.col("Close").rolling_mean(window_size=20).over("Ticker"),
sma50=pl.col("Close").rolling_mean(window_size=50).over("Ticker"),
std20=pl.col("Close").rolling_std(window_size=20).over("Ticker"),
).with_columns(
boll_upper=pl.col("sma20") + 2 * pl.col("std20"),
boll_lower=pl.col("sma20") - 2 * pl.col("std20"),
)
# Latest 8 days of AAPL — moving averages and Bollinger bands
aapl_tail = result.filter(pl.col("Ticker") == "AAPL").select(
["Date", "Close", "sma20", "sma50", "boll_upper", "boll_lower"]
).tail(8)
print(aapl_tail)
Run:
python rolling.py
Output (AAPL, last eight trading days):
shape: (8, 6)
┌─────────────────────┬────────┬────────┬────────┬────────────┬────────────┐
│ Date ┆ Close ┆ sma20 ┆ sma50 ┆ boll_upper ┆ boll_lower │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[ms] ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═════════════════════╪════════╪════════╪════════╪════════════╪════════════╡
│ 2025-12-18 00:00:00 ┆ 272.19 ┆ 277.14 ┆ 269.13 ┆ 286.44 ┆ 267.85 │
│ 2025-12-19 00:00:00 ┆ 273.67 ┆ 277.51 ┆ 269.52 ┆ 285.48 ┆ 269.55 │
…
│ 2025-12-30 00:00:00 ┆ 273.08 ┆ 276.19 ┆ 272.47 ┆ 284.50 ┆ 267.88 │
└─────────────────────┴────────┴────────┴────────┴────────────┴────────────┘
AAPL closes around $273. The 20-day moving average is $276 — price is trading slightly below the short-term mean. The 50-day SMA at $272 says the medium-term trend is sideways. Bollinger upper band at $286, lower at $268 — about $9 of room each direction. The setup every technical analyst eyeballs first.
Two .with_columns chained
The script uses two with_columns calls. The first computes sma20, sma50, std20 from the raw Close. The second uses sma20 and std20 to build boll_upper and boll_lower.
You can’t reference a newly-created column within the same with_columns call — Polars evaluates all expressions against the input frame in parallel. To use a derived column, finish that with_columns and start another. The engine still optimizes the combined query into a single pass.
The null prefix is your sanity check
Same pattern as Episode 3: the first window_size - 1 rows of each ticker have null in the rolling column. With fourteen tickers and window_size=20, you should see exactly 266 null rows in sma20 (19 × 14). With window_size=50, 686 nulls in sma50 (49 × 14).
print(result.filter(pl.col("sma20").is_null()).shape[0]) # 266
print(result.filter(pl.col("sma50").is_null()).shape[0]) # 686
If you see the wrong count, your .over("Ticker") is missing and the rolling crossed ticker boundaries — same silent-bug shape as Episode 3.
Other rolling functions you’ll use
Polars exposes the full family on pl.Expr:
| Method | Purpose |
|---|---|
rolling_mean |
Moving average — trend |
rolling_std |
Rolling volatility — risk |
rolling_min / rolling_max |
Channel breakouts, support/resistance |
rolling_sum |
Cumulative-window aggregates — volume in last N days |
rolling_median |
Robust trend, less sensitive to outliers |
rolling_quantile |
Rolling percentiles — VaR-style bands |
rolling_skew |
Rolling distribution shape |
Each takes window_size=N. Each composes with .over("Ticker") the same way. The grammar is uniform — one substitution, dozens of indicators.
Variable-width windows with rolling_*_by
For time-based windows that span calendar days instead of fixed-row counts:
result = df.with_columns(
sma_5d=pl.col("Close").rolling_mean_by("Date", window_size="5d").over("Ticker"),
)
A 5d window includes every row whose Date is within the last five calendar days of the current row, regardless of how many trading bars that covers. Useful when your data has holes (skipped weekends, missing days) and you want a true time-based window, not a row-count window.
The _by family also gives you rolling_sum_by, rolling_std_by, etc. — same shape, time-keyed instead of count-keyed.
Volatility scaling — annualize the rolling stddev
The std20 column above is daily volatility. To annualize:
result = result.with_columns(
vol_annual=pl.col("std20") * (252 ** 0.5),
)
252 is the trading-day year. The **0.5 is the square root. The result is what every risk system reports as the “annualized volatility” of the ticker.
Cumulative variants — cum_max for drawdown
Rolling’s cousin is cumulative — expanding window from the start:
result = df.with_columns(
peak=pl.col("Close").cum_max().over("Ticker"),
drawdown=(pl.col("Close") / pl.col("Close").cum_max() - 1).over("Ticker"),
)
cum_max gives the running peak per ticker. Dividing the current close by the running peak, minus one, is the standard drawdown formula. Every backtest report contains this column.
Pandas → Polars cheatsheet for rolling
| Operation | Pandas | Polars |
|---|---|---|
| 20-day SMA per ticker | df.groupby("T")["X"].rolling(20).mean() |
pl.col("X").rolling_mean(20).over("T") |
| 20-day rolling std | df.groupby("T")["X"].rolling(20).std() |
pl.col("X").rolling_std(20).over("T") |
| Time-based window | df.set_index("Date").rolling("5d") |
pl.col("X").rolling_mean_by("Date", "5d") |
| Cumulative max | df.groupby("T")["X"].cummax() |
pl.col("X").cum_max().over("T") |
| Expanding mean | df.groupby("T")["X"].expanding().mean() |
pl.col("X").cum_sum() / pl.int_range(1, pl.len() + 1) (manual) |
The two-line pandas idiom with reset_index collapses to a one-line Polars expression.
Common stumbles
InvalidOperationError: NaN/inf produces null. A rolling_std on a column with leading nulls (e.g. computed from pct_change) will propagate. Compute Close-based stats first, returns-based later, OR use min_periods=1 to relax the warm-up.
Want a different warm-up. Default min_periods equals window_size. To accept partial-window means (e.g. row 5 gets the mean of the first 5 rows even though you asked for window 20):
pl.col("Close").rolling_mean(window_size=20, min_periods=1)
Returns zero nulls — every row has a value. Most analysts want the strict version; the relaxed version is useful for live dashboards.
Bollinger bands look wrong. Standard formula is mean ± 2 × std. If yours are inverted, you computed std on returns instead of prices, or you forgot the multiplier.
AttributeError: 'Expr' object has no attribute 'rolling'. You’re on a very old Polars. Upgrade — the rolling_* methods have been first-class since v0.18.
Result has the right column count but wrong values. Frame wasn’t sorted by [Ticker, Date]. rolling_* walks rows in order; no .over auto-sort. Always .sort first.
Recap
pl.col("Close").rolling_mean(window_size=20).over("Ticker") is the substitute for every pandas groupby + rolling pattern. The .over clause keeps each ticker’s window self-contained. The first window_size - 1 rows of each ticker are null — the count is your proof the partitioning ran. Compose rolling_mean, rolling_std, and arithmetic in two with_columns calls to get Bollinger bands. The rolling_*_by("Date", "5d") variants give you calendar-time windows. cum_max + the drawdown formula is the same shape, expanding instead of rolling. Once the grip is in place, every chart on every Bloomberg terminal is one expression away.
Subscribe to the playlist for the next episode in the series.