Resampling: Daily to Weekly OHLC, Monthly Returns, Yearly Matrix | Pandas for Finance Ep10

0views
C
CelesteAI
Description
Episode 10 of *Pandas for Finance*. Time-frequency conversion — the most-used time-series tool in finance. `df.resample("W").agg({"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"})` builds a weekly OHLC bar from daily prices in one line. `df["Adj Close"].resample("ME").last().pct_change()` gives you monthly returns. And `df.groupby("Ticker")["Adj Close"].resample("YE").last().groupby(level=0).pct_change().unstack("Ticker")` produces a year-by-year, ticker-by-ticker return matrix you'd recognize from any Bloomberg page. The thing that's brutal in Excel reduces to one chained line in pandas. What You'll Build: - `resample.py` — load the cached prices, set Date as the index, build a weekly OHLC bar from daily, then monthly returns, then a year-by-year return matrix across all 14 tickers. - The OHLC aggregation pattern: `.resample("W").agg({"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"})` — each column folds with its own function. - Monthly returns at month-end: `.resample("ME").last().pct_change()` — the standard month-over-month report shape. - A 7-row × 14-column yearly return matrix: groupby + resample + groupby level 0 + pct_change + unstack — the chained pipeline that produces what every annual review prints. Timestamps: 0:00 - Intro — Episode 10 starts here 0:19 - Preview — daily to weekly to monthly 0:58 - Open nvim, write resample.py 1:14 - Daily to weekly OHLC 1:30 - Monthly returns 1:43 - Yearly returns across 14 tickers 2:16 - Save and run 2:24 - Weekly OHLC tail 2:37 - Monthly returns 2:47 - Yearly leaderboard 3:08 - Recap 3:50 - End screen Key Takeaways: 1. **Resampling needs a `DatetimeIndex`.** The first line of every resampling script: `df.set_index("Date").sort_index()`. Without that, `.resample(...)` errors. With it, every `.resample(freq)` call buckets rows by the frequency and lets you aggregate per bucket. Frequency aliases: `D` daily, `W` weekly, `ME` month-end, `QE` quarter-end, `YE` year-end. Pandas 2.2+ wants the uppercase suffix forms (`ME` not `M`); old code still works with a deprecation warning. 2. **OHLC at any frequency = `.resample(freq).agg(dict)`.** Pass `{"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}` and each column folds with its own rule. The same pattern produces weekly bars from daily, monthly bars from weekly, or quarterly bars from any of those — change the freq string, the agg dict stays the same. 3. **Monthly returns: `.resample("ME").last().pct_change()`.** Last close of each month, then percent change vs the prior month. Two lines for month-over-month — the standard analyst question. For year-over-year on a row-by-row basis, `.pct_change(252)` looks back ~one trading year. For period-aligned YoY (calendar year vs prior calendar year), `.resample("YE").last().pct_change()` is cleaner. 4. **Multi-ticker: `groupby("Ticker") → resample → groupby(level=0) → pct_change`.** The groupby preserves the Ticker boundary so percent change doesn't bleed across tickers. The result is a MultiIndex Series; `.unstack("Ticker")` lifts the ticker level to columns and you have a 7-row × 14-ticker matrix in one chained line. 5. **`pd.Grouper(key="Date", freq="QE")` when Date isn't the index.** If you don't want to set the index, `groupby([pd.Grouper(key="Date", freq="QE"), "Ticker"])` does both bucketing and grouping in one step. Useful for ad-hoc reports where you want the Date column kept around. For most production scripts, set the index up front and `.resample` directly — fewer keystrokes. #Pandas #Python #Finance #Resample #OHLC #DataAnalytics #PythonForFinance #LearnPandas #DataFrame #TimeSeries