GroupBy on Time: Buckets, Pivots, Named Aggregations | Pandas for Finance Ep7

0views
C
CelesteAI
Description
Episode 7 of Pandas for Finance. The pivot table you don't have to drag. Source code: https://github.com/GoCelesteAI/pandas-for-finance Most analyst questions are "for each time bucket, what is some metric?" Pandas's df.groupby(key).agg(func) answers them in one line. The episode walks through three shapes: a single-key groupby (average close per ticker), a multi-key time-bucket groupby with unstack (monthly close pivoted across all fourteen tickers), and a named multi-stat aggregation (AvgClose, MaxClose, AvgVolume) that produces a Bloomberg fact-sheet in three lines of pandas. What You'll Build: - groups.py — load the cached prices, set wide-table display options, and build three groupby reports against the long-format universe. - A monthly pivot: df.groupby([df["Date"].dt.to_period("M"), "Ticker"])["Close"].last().unstack("Ticker") — one row per month, one column per ticker, last close in each cell. - A named multi-stat report: .agg(AvgClose=("Close", "mean"), MaxClose=("Close", "max"), AvgVolume=("Volume", "mean")) — three derived columns per ticker in one expression. - The pd.set_option("display.max_columns", None) and pd.set_option("display.width", 200) setup so all 14 ticker columns render side-by-side without truncation. Timestamps: 0:00 - Intro — Episode 7 starts here 0:19 - Preview — groupby anatomy 0:56 - Open nvim, write groups.py 1:09 - Read parquet, convert to datetime 1:20 - groupby Ticker, average close 1:32 - Time bucket: monthly close, unstacked 1:47 - Multi-stat named report 2:01 - Save and run 2:07 - Average close per ticker (sorted) 2:20 - Monthly close, 14 ticker columns 2:33 - Multi-stat report output 2:50 - Recap — bucket, key, agg, unstack 3:29 - End screen Key Takeaways: 1. df.groupby(key).agg(func) is the entire pivot table mental model. Three pieces: a key (column, list, or derived expression), an aggregation (string name like "mean", a function, or a list/dict for multi-stat), and a result indexed by the group. For the long-format universe, every analyst report — average per ticker, monthly stats, year-over-year — collapses to this one shape. 2. Time buckets come from df["Date"].dt.to_period(...). Pass "D", "W", "M", "Q", "Y" for day/week/month/quarter/year. The .dt accessor only works after you pd.to_datetime the column, so always convert dates before grouping. For fiscal years ending in November, use "M-NOV" — pandas understands fiscal calendars. 3. Multi-key groupby + unstack = the wide pivot. Group by [period, "Ticker"], take the last close, then .unstack("Ticker") lifts the inner level into columns. Result: dates as rows, tickers as columns, last close in each cell. The "drag-and-drop pivot table" replaced with three method calls. For unstack to render readably with many columns, set pd.set_option("display.max_columns", None) and display.width wide enough. 4. Named aggregations beat positional ones. df.groupby("Ticker").agg(AvgClose=("Close", "mean"), MaxClose=("Close", "max"), AvgVolume=("Volume", "mean")) — each output column is named and bound to a (column, function) pair. Far cleaner than agg(["mean", "max"]) which gives you a MultiIndex column you have to flatten. Named is the modern pandas idiom; use it everywhere. 5. size() vs count() matter when nulls exist. groupby(...).size() returns the total rows in each group (including nulls). groupby(...).count() returns the per-column count of non-null values. For "how big is each group?" you want size(). For "how many non-null values do I have in this column?" you want count(). Subtle, but the difference shows up the first time you try to debug a missing-data report. #Pandas #Python #Finance #GroupBy #PivotTable #DataAnalytics #PythonForFinance #Aggregation #LearnPandas #DataFrame --- Generated by GoCelesteAI · part of the Pandas for Finance series