Polars for Finance: Compute Daily Returns in Polars — Finance Tutorial
Video: Compute Daily Returns in Polars — Finance Tutorial by CelesteAI
pl.col("Close").pct_change().over("Ticker"): one line, fourteen tickers, six years of daily returns computed in parallel. The .over is the window function pattern that replaces every pandas groupby + apply you’ve ever written for time-series math.
Returns are the language of finance. Every alpha model, every risk metric, every backtest starts here: take the close price column, divide today by yesterday, subtract one. Trivial on a single ticker. The interesting question is what happens when you stack fourteen tickers in the same DataFrame.
The answer in Polars is .over(...) — a clause you append to any expression to make it run per-group while keeping the result aligned to the original frame. Pandas writes the same operation as df.groupby("Ticker")["Close"].pct_change() and the alignment is fragile. Polars treats .over as a first-class part of the expression API, which means it composes cleanly with filter, select, with_columns, and lazy evaluation.
What pct_change actually does
For a Series [x0, x1, x2, …, xn], pct_change returns [null, x1/x0 - 1, x2/x1 - 1, …]. The first row is null (no prior value), every subsequent row is the period-over-period return as a decimal — 0.012 for +1.2%, -0.018 for -1.8%.
log_ret is the variant you want for cumulative math: log(x1) - log(x0). Log returns sum across periods (log_ret_1 + log_ret_2 = log_ret_total) while simple returns compound. Both belong in your toolkit; build them at the same time.
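The additivity claim is easy to check with plain Python. This sketch uses three made-up prices and only the standard library: a +10% day followed by a -10% day does not net to zero, which is why simple returns must be compounded while log returns can be summed.

```python
import math

# Hypothetical closes: up 10%, then down 10%.
prices = [100.0, 110.0, 99.0]

simple = [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]
log_ret = [math.log(prices[i] / prices[i - 1]) for i in range(1, len(prices))]

# Ground truth over the whole period.
total_simple = prices[-1] / prices[0] - 1      # about -0.01
total_log = math.log(prices[-1] / prices[0])

# Log returns add up to the total log return.
sum_log = sum(log_ret)

# Simple returns must be compounded; summing them gives the wrong answer.
compounded = math.prod(1 + r for r in simple) - 1
naive_sum = sum(simple)                        # about 0.0, not -0.01

print(sum_log, total_log)
print(compounded, total_simple, naive_sum)
```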
Setup
Same venv, same dataset.
source .venv/bin/activate
nvim returns.py
import polars as pl
df = pl.read_parquet("data/prices.parquet").sort(["Ticker", "Date"])
result = df.with_columns(
    daily_ret=pl.col("Close").pct_change().over("Ticker"),
    log_ret=(pl.col("Close").log() - pl.col("Close").log().shift(1)).over("Ticker"),
)
print("=== Daily and log returns (first 8 rows) ===")
print(result.head(8).select(["Date", "Ticker", "Close", "daily_ret", "log_ret"]))
null_count = result.filter(pl.col("daily_ret").is_null()).shape[0]
print(f"\nNull returns (one per ticker): {null_count}")
print(f"Total tickers: {result['Ticker'].n_unique()}")
Run:
python returns.py
Output:
=== Daily and log returns (first 8 rows) ===
shape: (8, 5)
┌─────────────────────┬────────┬───────────┬───────────┬───────────┐
│ Date ┆ Ticker ┆ Close ┆ daily_ret ┆ log_ret │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[ms] ┆ str ┆ f64 ┆ f64 ┆ f64 │
╞═════════════════════╪════════╪═══════════╪═══════════╪═══════════╡
│ 2018-01-02 00:00:00 ┆ AAPL ┆ 43.064999 ┆ null ┆ null │
│ 2018-01-03 00:00:00 ┆ AAPL ┆ 43.057499 ┆ -0.000174 ┆ -0.000174 │
│ 2018-01-04 00:00:00 ┆ AAPL ┆ 43.2575 ┆ 0.004645 ┆ 0.004634 │
│ 2018-01-05 00:00:00 ┆ AAPL ┆ 43.75 ┆ 0.011385 ┆ 0.011321 │
│ 2018-01-08 00:00:00 ┆ AAPL ┆ 43.587502 ┆ -0.003714 ┆ -0.003721 │
…
└─────────────────────┴────────┴───────────┴───────────┴───────────┘
Null returns (one per ticker): 14
Total tickers: 14
Fourteen nulls. One for each ticker’s first trading day. That’s the proof — the window ran per group, not across the whole 28k frame. If .over("Ticker") were missing, you’d see one null total (the very first row) and bogus returns at every ticker boundary as the engine computed first_AMZN_close / last_AAPL_close - 1, which is nonsense.
What .over does mechanically
pl.col("Close").pct_change() is one expression. .over("Ticker") is a clause that wraps it. Conceptually:
- Sort the frame by Ticker (handled by your prior .sort(["Ticker", "Date"])).
- Partition into 14 ticker-blocks.
- Run pct_change independently on each block.
- Stitch the results back into the original row order.
In Polars source you’ll see this called a window expression — borrowed from SQL’s OVER (PARTITION BY ...). The semantics are identical. If you’ve written SQL window functions for finance, every .over reads as the same shape.
Pandas equivalent — for migration:
# Pandas
df["daily_ret"] = df.groupby("Ticker")["Close"].pct_change()
# Polars
df = df.with_columns(daily_ret=pl.col("Close").pct_change().over("Ticker"))
The pandas version mutates df. The Polars version returns a new frame. The Polars version also runs ~10× faster on this dataset because the windowed operation parallelizes across cores.
Composing .over with other expressions
.over is a clause, not a method on a special object. It works anywhere a window-shaped operation makes sense:
result = df.with_columns(
    # Trailing 5-day mean close, per ticker (preview of Episode 6)
    ma5=pl.col("Close").rolling_mean(window_size=5).over("Ticker"),
    # Cumulative max (running peak), per ticker, useful for drawdown
    peak=pl.col("Close").cum_max().over("Ticker"),
    # Rank within ticker by volume
    vol_rank=pl.col("Volume").rank("ordinal").over("Ticker"),
)
Every line uses the same shape: build the expression, append .over("Ticker"). The engine handles partitioning. You think in terms of “do X per ticker” and write exactly that.
A common bug: forgetting .over
The dataset is sorted by Ticker first, Date second — 14 ticker-blocks of ~2000 daily bars each. Run pct_change without .over:
bad = df.with_columns(naive=pl.col("Close").pct_change())
You’ll get a return for every row, but at each ticker boundary the return is garbage — it computes the ratio of the first AMZN day to the last AAPL day, which is meaningless. Pandas does this silently too; both libraries assume you know what pct_change means when the underlying group changes.
Rule of thumb: if your frame has more than one entity stacked, every per-period expression needs .over(entity_col). Mark it as a checklist item.
Filtering on the returns
The natural follow-up is to list the biggest up-days across all tickers:
big_up = (
    result.filter(pl.col("daily_ret") > 0.10)
    .select(["Date", "Ticker", "Close", "daily_ret"])
    .sort("daily_ret", descending=True)
)
print(big_up.head(10))
Filter (Episode 2) + the returns we just computed. Polars chains them through the optimizer; the result is a sorted leaderboard of every ten-percent-plus daily move across all 14 tickers in six years.
Pandas → Polars cheatsheet for returns
| Operation | Pandas | Polars |
|---|---|---|
| Daily return per ticker | df.groupby("T")["Close"].pct_change() | pl.col("Close").pct_change().over("T") |
| Log return per ticker | np.log(df["Close"]).groupby(df["T"]).diff() | (pl.col("Close").log() - pl.col("Close").log().shift(1)).over("T") |
| Cumulative return | (1 + df["r"]).groupby(df["T"]).cumprod() - 1 | ((1 + pl.col("r")).cum_prod() - 1).over("T") |
| Annualized vol | df.groupby("T")["r"].std() * np.sqrt(252) | pl.col("r").std().over("T") * 252**0.5 |
| Rolling 20-day mean | df.groupby("T")["Close"].rolling(20).mean() | pl.col("Close").rolling_mean(20).over("T") |
Every pandas groupby + method becomes a Polars expression.over(group_col). The mental model is one substitution rule, applied uniformly.
Common stumbles
Returns look wrong even with .over. Frame isn’t sorted by Date within each Ticker. pct_change works on row order, not on the Date column, so it assumes the rows are already chronological. Always .sort(["Ticker", "Date"]) first.
Returns garbage at ticker boundaries. Missing .over("Ticker"). The pandas error mode is identical — silently wrong. Polars doesn’t catch it for you; this is craft, not type-checking.
AttributeError: 'Expr' object has no attribute 'over'. You’re most likely on a very old Polars release. Upgrade to v1.x: pip install --upgrade polars.
Want returns starting from zero, not null. fill_null(0) after .over: pl.col("Close").pct_change().over("Ticker").fill_null(0). The first row of each ticker becomes 0% return, which is what most backtest scaffolding expects.
Confused by .over vs .group_by. .over is for windowed operations that return the same row count as the input. .group_by is for aggregations that collapse to one row per group (Episode 4). Use .over when you want to add a column. Use .group_by when you want a summary.
What’s next
Episode 4 covers group_by and the .agg clause — the per-ticker statistics that pair with the returns we just computed. Mean return, standard deviation, annualized Sharpe ratio, max drawdown — one frame, 14 rows, one expression each.
Recap
.over("Ticker") is the window function clause that turns any expression into a per-group operation while keeping the result aligned to the original frame. Daily returns are pl.col("Close").pct_change().over("Ticker"); log returns are the log-diff variant of the same shape. Always sort by [Ticker, Date] first. The first row of each ticker is null — fourteen nulls in a 28k frame is your proof that the window ran. Forgetting .over silently produces garbage at every ticker boundary; mark it as a checklist item. The same .over clause unlocks rolling means, cumulative maxes, ranks, and every other window-shaped operation you’ll need.
Next episode: group_by and aggregation. Per-ticker statistics in a 14-row summary frame.