GroupBy on Time: Buckets, Pivots, Named Aggregations | Pandas for Finance Ep7
CelesteAI
Description
Episode 7 of *Pandas for Finance*. The pivot table you don't have to drag.
Most analyst questions are "for each time bucket, what is some metric?" pandas' `df.groupby(key).agg(func)` answers them in one line. The episode walks through three shapes: a single-key groupby (average close per ticker), a multi-key time-bucket groupby with `unstack` (monthly close pivoted across all fourteen tickers), and a named multi-stat aggregation (`AvgClose`, `MaxClose`, `AvgVolume`) that produces a Bloomberg-style fact sheet in three lines of pandas.
What You'll Build:
- `groups.py` — load the cached prices, set wide-table display options, and build three groupby reports against the long-format universe.
- A monthly pivot: `df.groupby([df["Date"].dt.to_period("M"), "Ticker"])["Close"].last().unstack("Ticker")` — one row per month, one column per ticker, last close in each cell.
- A named multi-stat report: `.agg(AvgClose=("Close", "mean"), MaxClose=("Close", "max"), AvgVolume=("Volume", "mean"))` — three derived columns per ticker in one expression.
- The `pd.set_option("display.max_columns", None)` and `pd.set_option("display.width", 200)` setup so all 14 ticker columns render side-by-side without truncation.
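A minimal sketch of what `groups.py` might look like. The tiny inline DataFrame stands in for the cached parquet universe (the real script would `pd.read_parquet` the cached file); the tickers, prices, and volumes here are invented for illustration:

```python
import pandas as pd

# Stand-in for the episode's cached parquet universe (Date, Ticker, Close, Volume).
# Two tickers and two months are enough to run all three reports end to end.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-15", "2024-01-31", "2024-02-15", "2024-02-29"] * 2),
    "Ticker": ["AAPL"] * 4 + ["MSFT"] * 4,
    "Close": [185.0, 184.4, 182.3, 180.8, 390.0, 397.6, 406.6, 413.6],
    "Volume": [50e6, 55e6, 60e6, 58e6, 20e6, 22e6, 25e6, 24e6],
})

# Wide-table display: render every ticker column side by side, no truncation.
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 200)

# Report 1: single-key groupby -- average close per ticker, sorted.
avg_close = df.groupby("Ticker")["Close"].mean().sort_values(ascending=False)

# Report 2: multi-key time bucket -- last monthly close, pivoted wide.
monthly = (df.groupby([df["Date"].dt.to_period("M"), "Ticker"])["Close"]
             .last()
             .unstack("Ticker"))

# Report 3: named multi-stat aggregation -- three derived columns per ticker.
report = df.groupby("Ticker").agg(
    AvgClose=("Close", "mean"),
    MaxClose=("Close", "max"),
    AvgVolume=("Volume", "mean"),
)

print(avg_close, monthly, report, sep="\n\n")
```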
Timestamps:
0:00 - Intro — Episode 7 starts here
0:19 - Preview — groupby anatomy
0:56 - Open nvim, write groups.py
1:09 - Read parquet, convert to datetime
1:20 - groupby Ticker, average close
1:32 - Time bucket: monthly close, unstacked
1:47 - Multi-stat named report
2:01 - Save and run
2:07 - Average close per ticker (sorted)
2:20 - Monthly close, 14 ticker columns
2:33 - Multi-stat report output
2:50 - Recap — bucket, key, agg, unstack
3:29 - End screen
Key Takeaways:
1. **`df.groupby(key).agg(func)` is the entire pivot table mental model.** Three pieces: a key (column, list, or derived expression), an aggregation (string name like `"mean"`, a function, or a list/dict for multi-stat), and a result indexed by the group. For the long-format universe, every analyst report — average per ticker, monthly stats, year-over-year — collapses to this one shape.
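The anatomy above can be sketched on a hypothetical two-ticker frame (tickers and prices invented for illustration), cycling through the three key shapes:

```python
import pandas as pd

# Hypothetical long-format frame: two tickers, two dates each.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-02", "2024-07-01", "2024-01-02", "2024-07-01"]),
    "Ticker": ["AAPL", "AAPL", "MSFT", "MSFT"],
    "Close": [185.0, 210.0, 370.0, 450.0],
})

# Key = a column name; agg = a string; result indexed by the group.
per_ticker = df.groupby("Ticker")["Close"].agg("mean")

# Key = a derived expression: any Series aligned with df, not just a column.
# Here a boolean "first half of the year?" splits the frame into two groups.
per_half = df.groupby(df["Date"].dt.quarter <= 2)["Close"].agg("mean")

# Key = a list; agg = a dict for multi-stat (yields MultiIndex columns).
per_both = df.groupby([df["Date"].dt.year, "Ticker"]).agg({"Close": ["mean", "max"]})
```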
2. **Time buckets come from `df["Date"].dt.to_period(...)`.** Pass `"D"`, `"W"`, `"M"`, `"Q"`, or `"Y"` for day/week/month/quarter/year buckets. The `.dt` accessor only works after you `pd.to_datetime` the column, so always convert dates before grouping. For fiscal years ending in November, use the anchored annual frequency `"Y-NOV"` (`"A-NOV"` in older pandas) — pandas understands fiscal calendars.
3. **Multi-key groupby + `unstack` = the wide pivot.** Group by `[period, "Ticker"]`, take the last close, then `.unstack("Ticker")` lifts the inner level into columns. Result: dates as rows, tickers as columns, last close in each cell. That's the drag-and-drop pivot table replaced with three method calls. For `unstack` to render readably with many columns, set `pd.set_option("display.max_columns", None)` and a `display.width` wide enough.
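A small sketch of the pivot mechanics, with invented data and one deliberately absent ticker-month, which shows how `unstack` fills missing (row, column) pairs with NaN:

```python
import pandas as pd

# Invented long-format data: MSFT has no February row on purpose.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-10", "2024-01-20", "2024-01-10",
                            "2024-01-20", "2024-02-10"]),
    "Ticker": ["AAPL", "AAPL", "MSFT", "MSFT", "AAPL"],
    "Close": [185.0, 190.0, 400.0, 405.0, 182.0],
})

# Two-level index (month period, ticker), one value per pair: the last close.
long = df.groupby([df["Date"].dt.to_period("M"), "Ticker"])["Close"].last()

# unstack by level *name* rotates the inner "Ticker" level into columns.
wide = long.unstack("Ticker")
```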
4. **Named aggregations beat positional ones.** `df.groupby("Ticker").agg(AvgClose=("Close", "mean"), MaxClose=("Close", "max"), AvgVolume=("Volume", "mean"))` — each output column is named and bound to a (column, function) pair. Far cleaner than `agg(["mean", "max"])` which gives you a MultiIndex column you have to flatten. Named is the modern pandas idiom; use it everywhere.
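The contrast, sketched on a made-up two-ticker frame:

```python
import pandas as pd

# Invented data, just enough to show the two column shapes.
df = pd.DataFrame({
    "Ticker": ["AAPL", "AAPL", "MSFT", "MSFT"],
    "Close": [180.0, 190.0, 400.0, 420.0],
    "Volume": [1e6, 2e6, 3e6, 4e6],
})

# Positional: columns come back as a MultiIndex like ("Close", "mean")
# that you have to flatten before joining or exporting.
positional = df.groupby("Ticker")[["Close"]].agg(["mean", "max"])

# Named: each output column is named and bound to a (column, function)
# pair, so the result has flat, self-documenting columns.
named = df.groupby("Ticker").agg(
    AvgClose=("Close", "mean"),
    MaxClose=("Close", "max"),
    AvgVolume=("Volume", "mean"),
)
```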
5. **`size()` vs `count()` matters when nulls exist.** `groupby(...).size()` returns the total rows in each group (including nulls). `groupby(...).count()` returns the per-column count of non-null values. For "how big is each group?" you want `size()`. For "how many non-null values do I have in this column?" you want `count()`. Subtle, but the difference shows up the first time you try to debug a missing-data report.
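A minimal illustration with one null close (data invented):

```python
import numpy as np
import pandas as pd

# AAPL has three rows but only two non-null closes.
df = pd.DataFrame({
    "Ticker": ["AAPL", "AAPL", "AAPL", "MSFT"],
    "Close": [185.0, np.nan, 190.0, 400.0],
})

sizes = df.groupby("Ticker").size()              # rows per group, nulls included
counts = df.groupby("Ticker")["Close"].count()   # non-null Close values per group
```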
#Pandas #Python #Finance #GroupBy #PivotTable #DataAnalytics #PythonForFinance #Aggregation #LearnPandas #DataFrame