Part of Pandas for Finance

Pandas for Finance: GroupBy on Time: Buckets, Pivots, Named Aggregations

Celest Kim

Video: GroupBy on Time: Buckets, Pivots, Named Aggregations | Pandas for Finance Ep7 by CelesteAI


df.groupby(df["Date"].dt.to_period("M"))["Close"].mean() — monthly average close. df.groupby([df["Date"].dt.year, "Ticker"])["Return"].sum() — annual return per ticker. The pivot table you don’t have to drag.

Most analyst questions are “for each [time bucket], what is [metric]?” Pandas’s groupby answers them in one line.

groupby anatomy

df.groupby(key).agg(func) — three pieces:

  1. key — what to group by. A column, a list of columns, or a derived expression.
  2. agg — what to compute per group. mean, sum, count, custom function.
  3. result — a Series or DataFrame with the group key as index.
import pandas as pd

df = pd.read_parquet("data/prices.parquet")
df["Date"] = pd.to_datetime(df["Date"])

# Average close per ticker
avg_close = df.groupby("Ticker")["Close"].mean()
print(avg_close)
Ticker
AAPL    140.86
AMZN    145.40
GOOGL    138.12
JNJ     158.23
JPM     157.40
...

The result is a Series indexed by Ticker.
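The same pattern works on any small frame, so if you don't have the parquet file handy you can try it on synthetic data (the tickers and prices below are made up):

```python
import pandas as pd

# Tiny synthetic frame standing in for the price data
toy = pd.DataFrame({
    "Ticker": ["AAPL", "AAPL", "JPM", "JPM"],
    "Close": [100.0, 110.0, 50.0, 52.0],
})

# One mean per group; the group key becomes the index
avg = toy.groupby("Ticker")["Close"].mean()
print(avg)
# AAPL → 105.0, JPM → 51.0
```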

Time buckets

The killer use of groupby is bucketing by time period.

# Monthly average close, all tickers combined
monthly = df.groupby(df["Date"].dt.to_period("M"))["Close"].mean()
print(monthly.head())
Date
2018-01     85.22
2018-02     86.71
2018-03     87.04
2018-04     87.60
2018-05     90.12

.dt.to_period("M") converts each Date to “the month it belongs to.” Group by that.

Other periods:

Code Period
"D" day
"W" week (ends Sunday by default; "W-FRI" etc. to change the anchor)
"M" month
"Q" quarter
"Y" year
"Y-NOV" fiscal year ending November

Multi-key groupby

# Monthly average per ticker
monthly_per_ticker = (
  df.groupby([df["Date"].dt.to_period("M"), "Ticker"])["Close"]
    .mean()
    .unstack("Ticker")
)
print(monthly_per_ticker.head())

unstack("Ticker") pivots the inner level to columns:

Ticker     AAPL    AMZN   GOOGL    JNJ    JPM ...
Date
2018-01   42.96   71.39   55.69  131.81  111.42
2018-02   42.43   76.08   55.48  127.95  113.43

A pivot table: dates as rows, tickers as columns, no drag-and-drop required.
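If you prefer a single call, pivot_table can produce the same shape as groupby + unstack. A minimal sketch on synthetic data (the real frame would come from the parquet file above):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-01-05", "2024-01-06"]),
    "Ticker": ["AAPL", "AAPL", "JPM", "JPM"],
    "Close": [100.0, 110.0, 50.0, 52.0],
})

# Month periods as rows, tickers as columns, mean close in the cells
wide = df.pivot_table(
    index=df["Date"].dt.to_period("M"),
    columns="Ticker",
    values="Close",
    aggfunc="mean",
)
print(wide)
# one row (2024-01): AAPL 105.0, JPM 51.0
```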

Multiple aggregations

stats = (
  df.groupby("Ticker")["Close"]
    .agg(["mean", "min", "max", "std"])
    .round(2)
)
print(stats)
            mean    min     max     std
Ticker
AAPL      140.86  35.55  258.39   60.20
AMZN      145.40  73.30  226.95   34.45
GOOGL     138.12  53.22  198.66   31.04
...

Pass a list to .agg(). Each becomes a column in the result.

For different aggs on different columns:

report = df.groupby("Ticker").agg(
  AvgClose=("Close", "mean"),
  MaxClose=("Close", "max"),
  AvgVolume=("Volume", "mean"),
  TotalVolume=("Volume", "sum"),
)

Name=("col", "func") — named aggregations. The output column name is Name, the function applies to col.
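On a toy frame the named form reads like a small report spec (synthetic numbers, same shape as the report above):

```python
import pandas as pd

toy = pd.DataFrame({
    "Ticker": ["AAPL", "AAPL", "JPM"],
    "Close": [100.0, 110.0, 50.0],
    "Volume": [1_000, 2_000, 500],
})

# Each keyword becomes an output column: Name=("source col", "func")
rep = toy.groupby("Ticker").agg(
    AvgClose=("Close", "mean"),
    TotalVolume=("Volume", "sum"),
)
print(rep)
# AAPL: AvgClose 105.0, TotalVolume 3000; JPM: 50.0, 500
```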

Custom aggregations

def daily_to_annual(returns):
  return (1 + returns.mean()) ** 252 - 1

annualized = (
  df.groupby("Ticker")["Return"]
    .agg(daily_to_annual)
    .sort_values(ascending=False)
)

Any function that takes a Series and returns a scalar works.

For inline:

df.groupby("Ticker")["Return"].agg(lambda r: (1+r.mean())**252 - 1)
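The same shape fits other custom stats. A hedged sketch of annualized volatility (the √252 scaling is the usual daily-to-annual convention; the returns below are synthetic):

```python
import numpy as np
import pandas as pd

rets = pd.DataFrame({
    "Ticker": ["AAPL"] * 3 + ["JPM"] * 3,
    "Return": [0.01, -0.02, 0.015, 0.002, -0.001, 0.004],
})

# Standard deviation of daily returns, scaled to an annual figure
ann_vol = rets.groupby("Ticker")["Return"].agg(lambda r: r.std() * np.sqrt(252))
print(ann_vol)
```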

Year-over-year, quarter-over-quarter

# Returns aggregated to monthly
monthly_ret = (
  df.set_index("Date")
    .groupby("Ticker")["Adj Close"]
    .resample("ME").last()
    .groupby(level=0)
    .pct_change()
)

(We’ll cover .resample() properly in Ep 10. For now, the takeaway: bucket → aggregate → compare buckets.)

For YoY at the table level:

df["Year"] = df["Date"].dt.year
yearly_close = df.groupby(["Year", "Ticker"])["Adj Close"].last().unstack()
yoy = yearly_close.pct_change()
print(yoy.tail())
Ticker     AAPL    AMZN   GOOGL    JNJ    JPM
Year
2021     0.348   0.022   0.652  0.108  0.250
2022    -0.265 -0.495  -0.391  0.032 -0.140
2023     0.480   0.808   0.583  0.018  0.270
2024     0.305   0.444   0.357  0.054  0.330

A return table you’d recognize from Bloomberg.
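Quarter-over-quarter is the same recipe with a different bucket. A sketch on synthetic quarter-end prices:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-03-29", "2024-06-28", "2024-09-30"]),
    "Ticker": ["AAPL"] * 3,
    "Adj Close": [100.0, 110.0, 99.0],
})

# Last close per quarter per ticker, then percent change between quarters
q_close = (
    df.groupby([df["Date"].dt.to_period("Q"), "Ticker"])["Adj Close"]
      .last()
      .unstack("Ticker")
)
qoq = q_close.pct_change()
print(qoq.round(3))
# Q2 vs Q1: +10%, Q3 vs Q2: -10%
```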

Year × Month heatmap input

df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["DailyReturn"] = df.groupby("Ticker")["Adj Close"].pct_change()

aapl = df[df["Ticker"] == "AAPL"]

# Compound monthly return
monthly = aapl.groupby(["Year", "Month"])["DailyReturn"].apply(
  lambda r: (1 + r).prod() - 1
).unstack("Month")

print(monthly.round(3))
Month       1      2      3      4      5      6     ...
Year
2020    -0.078 -0.111  -0.067  0.157  0.083  0.149   ...
2021     0.000 -0.080  -0.082  0.076  0.045 -0.020   ...
2022    -0.014 -0.057   0.058 -0.105 -0.054 -0.082   ...

Year as rows, month as columns. Hand this to seaborn or plotly for a heatmap.
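The compounding step itself is easy to sanity-check on a toy Series, and shows why (1 + r).prod() - 1 is not the same as summing:

```python
import pandas as pd

# Two daily returns of +1% compound to +2.01%, not +2%
daily = pd.Series([0.01, 0.01])
monthly_ret = (1 + daily).prod() - 1
print(round(monthly_ret, 4))  # 0.0201
```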

Counting

# How many trading days per ticker per year
counts = (
  df.groupby([df["Date"].dt.year, "Ticker"])
    .size()
    .unstack("Ticker")
)

.size() returns the count regardless of nulls; .count() only non-null values per column.
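The difference shows up as soon as a group contains a NaN (toy example):

```python
import numpy as np
import pandas as pd

t = pd.DataFrame({"g": ["a", "a", "b"], "x": [1.0, np.nan, 2.0]})

sizes = t.groupby("g").size()        # a → 2, b → 1 (all rows, NaN included)
counts = t.groupby("g")["x"].count() # a → 1, b → 1 (non-null values only)
print(sizes)
print(counts)
```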

Weekday effect

df["DailyReturn"] = df.groupby("Ticker")["Adj Close"].pct_change()
df["Weekday"] = df["Date"].dt.day_name()

weekday_ret = df.groupby("Weekday")["DailyReturn"].mean()
print(weekday_ret.reindex(["Monday","Tuesday","Wednesday","Thursday","Friday"]))

The classic “what day of the week is best?” analysis. Spoiler: not very different.

Try it

groups.py:

import pandas as pd

pd.set_option("display.float_format", lambda x: f"{x:,.2f}")

df = pd.read_parquet("data/prices.parquet")
df["Date"] = pd.to_datetime(df["Date"])
df["Return"] = df.groupby("Ticker")["Adj Close"].pct_change()

print("=== Average close per ticker ===")
print(df.groupby("Ticker")["Close"].mean().sort_values(ascending=False))

print("\n=== Monthly close (last value) per ticker ===")
monthly = (
  df.groupby([df["Date"].dt.to_period("M"), "Ticker"])["Close"]
    .last()
    .unstack("Ticker")
    .tail()
)
print(monthly)

print("\n=== Multi-stat report ===")
report = df.groupby("Ticker").agg(
  AvgClose=("Close", "mean"),
  MaxClose=("Close", "max"),
  AvgVolume=("Volume", "mean"),
)
print(report.round(0))

print("\n=== Year-over-year return ===")
df["Year"] = df["Date"].dt.year
yoy = (
  df.groupby(["Year", "Ticker"])["Adj Close"]
    .last()
    .unstack("Ticker")
    .pct_change()
)
print(yoy.tail())

Common stumbles

Forgetting to convert to datetime. df["Date"].dt... errors if Date is still a string. pd.to_datetime(df["Date"]) first.

size() vs count(). size() is the total number of rows in the group, NaNs included. count() is the number of non-null values per column. For “how big is this group?” use size().

groupby().mean() on string columns. Raises an error in pandas 2.0+ unless you pass numeric_only=True, or select the numeric columns first.

unstack with too many groups. A million unique tickers → million-column DataFrame, OOM. Stack/unstack carefully.

Period vs Timestamp. .dt.to_period("M") produces Period objects (displayed as “2024-01”). There is no .dt.month_year accessor; use df["Date"].dt.year and df["Date"].dt.month as two separate columns.

Time-aware groupby on long format. Always df.groupby(["Date_period", "Ticker"])..., not just Date_period; grouping by the period alone blends every ticker into each date bucket.
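That last stumble is easy to see on two tickers sharing one date (synthetic data):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-05", "2024-01-05"]),
    "Ticker": ["AAPL", "JPM"],
    "Close": [100.0, 50.0],
})
per = df["Date"].dt.to_period("M")

mixed = df.groupby(per)["Close"].mean()            # 75.0: AAPL and JPM blended
split = df.groupby([per, "Ticker"])["Close"].mean()  # 100.0 and 50.0 kept apart
print(mixed)
print(split)
```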

What’s next

Ep 8: joins. Prices × sector lookup; the VLOOKUP-for-adults episode.

Recap

df.groupby(key).agg(func). Time bucket: df["Date"].dt.to_period("M"). Multi-key for “ticker × month.” .agg([...]) for multiple stats; agg(Name=("col", "func")) for named outputs. .unstack(level) pivots one level to columns. .dt.year, .dt.month, .dt.day_name() for date pieces. The pivot table replaced.

Next episode: joins.
