Pandas for Finance: Resampling: Daily to Weekly OHLC, Monthly Returns, Yearly Matrix
Video: Resampling: Daily to Weekly OHLC, Monthly Returns, Yearly Matrix | Pandas for Finance Ep10 by CelesteAI
df.resample("ME").last()— month-end last value.df.resample("W").agg({"High":"max","Low":"min","Close":"last"})— weekly OHLC bars from daily.df.shift(252)for year-over-year. The thing that’s brutal in Excel.
Resampling changes the time frequency: daily → weekly → monthly → yearly, or the other direction. It’s the most-used time-series tool in finance.
Setup: DatetimeIndex
Resampling needs a DatetimeIndex.
import pandas as pd
df = pd.read_parquet("data/prices.parquet")
aapl = (
df[df["Ticker"] == "AAPL"]
.copy()
.sort_values("Date")
.set_index("Date")
)
Now aapl.index is a DatetimeIndex, and resampling works.
Downsample: daily → weekly
weekly_close = aapl["Close"].resample("W").last()
print(weekly_close.head())
Date
2018-01-07 43.41
2018-01-14 44.27
2018-01-21 44.50
2018-01-28 43.82
2018-02-04 38.59
.resample(freq) buckets rows by frequency. .last() aggregates each bucket. Other aggs:
aapl["Close"].resample("W").mean() # average close per week
aapl["Close"].resample("W").max() # weekly high based on close
aapl["Volume"].resample("W").sum() # total weekly volume
Custom OHLC bar
weekly_ohlc = aapl[["Open", "High", "Low", "Close", "Volume"]].resample("W").agg({
"Open": "first",
"High": "max",
"Low": "min",
"Close": "last",
"Volume": "sum",
})
print(weekly_ohlc.tail())
The classic candle bar at a different frequency. Each .agg(dict) keys says how to fold that column.
Frequency aliases
| Alias | Meaning |
|---|---|
"D" |
daily |
"B" |
business day |
"W" |
weekly (default Sunday end) |
"W-FRI" |
weekly Friday end |
"ME" |
month end |
"MS" |
month start |
"BME" |
business month end |
"QE" |
quarter end |
"BQE" |
business quarter end |
"YE" |
year end |
Pandas 2.2+ uses uppercase suffixes like ME, QE, YE. Older code shows M, Q, Y — still works, deprecation warning.
Monthly returns from daily
monthly_close = aapl["Adj Close"].resample("ME").last()
monthly_return = monthly_close.pct_change()
print(monthly_return.tail())
The “what’s my month-over-month return” calculation in two lines.
For multiple tickers:
df = df.set_index("Date").sort_index()
monthly = (
df.groupby("Ticker")["Adj Close"]
.resample("ME").last()
)
print(monthly.head())
groupby + resample is a common combo. Result is a Series with MultiIndex (Ticker, Date).
To compute monthly returns within each ticker:
monthly_ret = (
df.groupby("Ticker")["Adj Close"]
.resample("ME").last()
.groupby(level=0)
.pct_change()
)
groupby(level=0) reuses the Ticker level for the second groupby.
Upsampling: weekly → daily
Less common in finance but shows up:
weekly = aapl["Close"].resample("W").last()
daily = weekly.resample("D").ffill() # forward-fill: each day = last week's value
Upsampling needs a fill strategy:
- ffill() — forward fill (carry last value).
- bfill() — backward fill.
- interpolate() — linear interpolation.
For prices, ffill is usually right (no trading on the day → carry last close).
Year-over-year
aapl["YoY"] = aapl["Adj Close"].pct_change(252)
.pct_change(N) looks back N rows. With ~252 trading days per year, pct_change(252) is “today vs ~one year ago.”
For period-aligned YoY (calendar year):
yoy = (
aapl["Adj Close"]
.resample("YE").last()
.pct_change()
)
print(yoy)
Calendar-year-end vs prior calendar-year-end.
Quarter-end snapshots
quarterly = (
df.groupby("Ticker")["Adj Close"]
.resample("QE").last()
.unstack("Ticker")
)
print(quarterly.tail())
A snapshot every quarter — what the press release would quote.
Resampling and grouping in one go
df["Date"] = pd.to_datetime(df["Date"])
# Quarter × Ticker close
q = (
df.groupby([pd.Grouper(key="Date", freq="QE"), "Ticker"])["Adj Close"]
.last()
.unstack("Ticker")
)
pd.Grouper(key="Date", freq="QE") is “groupby this column at this frequency.” Useful when the date isn’t the index.
Custom aggregation
# Annualized volatility, computed monthly
monthly_vol = (
aapl["Close"].pct_change()
.resample("ME")
.agg(lambda r: r.std() * (252 ** 0.5))
)
print(monthly_vol.tail())
Standard deviation per month, scaled to annual. Any function that takes a Series and returns a scalar works.
Try it
resample.py:
import pandas as pd
pd.set_option("display.float_format", lambda x: f"{x:,.4f}")
df = pd.read_parquet("data/prices.parquet")
aapl = df[df["Ticker"] == "AAPL"].copy().set_index("Date").sort_index()
print("=== Weekly OHLC ===")
weekly = aapl[["Open", "High", "Low", "Close", "Volume"]].resample("W").agg({
"Open": "first",
"High": "max",
"Low": "min",
"Close": "last",
"Volume": "sum",
})
print(weekly.tail())
print("\n=== Monthly returns ===")
monthly_close = aapl["Adj Close"].resample("ME").last()
monthly_ret = monthly_close.pct_change()
print(monthly_ret.tail(8).map(lambda x: f"{x:+.2%}" if pd.notna(x) else "—"))
print("\n=== Yearly returns (all tickers) ===")
yearly = (
df.set_index("Date").groupby("Ticker")["Adj Close"]
.resample("YE").last()
.groupby(level=0)
.pct_change()
.dropna()
.unstack("Ticker")
)
print(yearly.map(lambda x: f"{x:+.2%}" if pd.notna(x) else "—"))
print("\n=== Year-over-year (rolling 252-day) for AAPL ===")
aapl["YoY"] = aapl["Adj Close"].pct_change(252)
print(aapl["YoY"].tail().map(lambda x: f"{x:+.2%}"))
Common stumbles
Resampling without a DatetimeIndex. Errors. df.set_index("Date") first.
M deprecation. Pandas 2.2+ wants ME (month-end) explicit. Old code shows M and works with a warning. Fix: replace M→ME, Q→QE, Y→YE.
groupby + resample creates MultiIndex. Result has (Ticker, Date) as index. .unstack("Ticker") to spread tickers across columns.
first and last on NaN. They skip NaN. Sometimes you want first non-NaN which is the same; for “first regardless,” use .head(1) per group.
Resampling shifts the bucket label. By default month-end goes to the last day of the month. label="left" to put it on the first.
Forward-fill across tickers. Naïve df["Adj Close"].ffill() fills cross-ticker boundaries. Use df.groupby("Ticker")["Adj Close"].ffill() instead.
What’s next
Ep 11: rolling windows — moving averages, Bollinger bands, drawdown.
Recap
df.resample(freq).agg(func) for time-frequency conversion. "ME" "QE" "YE" "W" are the common targets. Custom OHLC bar via agg({"Open":"first","High":"max","Low":"min","Close":"last"}). For multi-ticker, df.groupby("Ticker")["col"].resample(freq).func() then groupby(level=0).pct_change() for within-ticker returns. pd.Grouper(key="Date", freq="QE") if Date isn’t the index.
Next episode: rolling windows.