Resampling: Daily to Weekly OHLC, Monthly Returns, Yearly Matrix | Pandas for Finance Ep10

0views
C
CelesteAI
Description
Episode 10 of Pandas for Finance. Time-frequency conversion — the most-used time-series tool in finance. Source code: https://github.com/GoCelesteAI/pandas-for-finance df.resample("W").agg({"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}) builds a weekly OHLC bar from daily prices in one line. df["Adj Close"].resample("ME").last().pct_change() gives you monthly returns. And df.groupby("Ticker")["Adj Close"].resample("YE").last().groupby(level=0).pct_change().unstack("Ticker") produces a year-by-year, ticker-by-ticker return matrix you'd recognize from any Bloomberg page. The thing that's brutal in Excel reduces to one chained line in pandas. What You'll Build: - resample.py — load the cached prices, set Date as the index, build a weekly OHLC bar from daily, then monthly returns, then a year-by-year return matrix across all 14 tickers. - The OHLC aggregation pattern: .resample("W").agg({"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}) — each column folds with its own function. - Monthly returns at month-end: .resample("ME").last().pct_change() — the standard month-over-month report shape. - A 7-row × 14-column yearly return matrix: groupby + resample + groupby level 0 + pct_change + unstack — the chained pipeline that produces what every annual review prints. Timestamps: 0:00 - Intro — Episode 10 starts here 0:19 - Preview — daily to weekly to monthly 0:58 - Open nvim, write resample.py 1:14 - Daily to weekly OHLC 1:30 - Monthly returns 1:43 - Yearly returns across 14 tickers 2:16 - Save and run 2:24 - Weekly OHLC tail 2:37 - Monthly returns 2:47 - Yearly leaderboard 3:08 - Recap 3:50 - End screen Key Takeaways: 1. Resampling needs a DatetimeIndex. The first line of every resampling script: df.set_index("Date").sort_index(). Without that, .resample(...) errors. With it, every .resample(freq) call buckets rows by the frequency and lets you aggregate per bucket. Frequency aliases: D daily, W weekly, ME month-end, QE quarter-end, YE year-end. Pandas 2.2+ wants the uppercase suffix forms (ME not M); old code still works with a deprecation warning. 2. OHLC at any frequency = .resample(freq).agg(dict). Pass {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"} and each column folds with its own rule. The same pattern produces weekly bars from daily, monthly bars from weekly, or quarterly bars from any of those — change the freq string, the agg dict stays the same. 3. Monthly returns: .resample("ME").last().pct_change(). Last close of each month, then percent change vs the prior month. Two lines for month-over-month — the standard analyst question. For year-over-year on a row-by-row basis, .pct_change(252) looks back ~one trading year. For period-aligned YoY (calendar year vs prior calendar year), .resample("YE").last().pct_change() is cleaner. 4. Multi-ticker: groupby("Ticker") → resample → groupby(level=0) → pct_change. The groupby preserves the Ticker boundary so percent change doesn't bleed across tickers. The result is a MultiIndex Series; .unstack("Ticker") lifts the ticker level to columns and you have a 7-row × 14-ticker matrix in one chained line. 5. pd.Grouper(key="Date", freq="QE") when Date isn't the index. If you don't want to set the index, groupby([pd.Grouper(key="Date", freq="QE"), "Ticker"]) does both bucketing and grouping in one step. Useful for ad-hoc reports where you want the Date column kept around. For most production scripts, set the index up front and .resample directly — fewer keystrokes. #Pandas #Python #Finance #Resample #OHLC #DataAnalytics #PythonForFinance #LearnPandas #DataFrame #TimeSeries --- Generated by GoCelesteAI · part of the Pandas for Finance series