Dates and the Trading Calendar: .dt, bdate_range, Calendar Gaps | Pandas for Finance Ep9

0views
C
CelesteAI
Description
Episode 9 of Pandas for Finance. The calendar quirks that bite finance scripts. Source code: https://github.com/GoCelesteAI/pandas-for-finance If your Date column is a string, half of pandas's time superpowers don't work. Step one: pd.to_datetime. Once the dtype is datetime64[ns], the .dt accessor unlocks year, month, weekday, quarter, day-of-year, and a dozen .is_*_end booleans for free. The episode walks through Apple's weekday returns, the count of business days in 2024, and a sneaky-clean trick — diffing dates and looking for jumps over 3 days reveals every Monday market holiday in the dataset. What You'll Build: - dates.py — load AAPL, extract Year/Weekday/Quarter via .dt, then group by Weekday and rank the daily returns Mon-Fri. - A trading-day count for 2024 with pd.bdate_range (262 weekend-only business days; about 10 fewer once you swap in CustomBusinessDay(calendar=USFederalHolidayCalendar())). - A holiday-detection trick: index.to_series().diff().dt.days.gt(3) flags every long weekend in the dataset — MLK, Presidents, Good Friday, Memorial, Labor Day all visible. - The reorder pattern: groupby("Weekday").mean().reindex(["Monday","Tuesday",...]) so the days print in calendar order, not alphabetical. Timestamps: 0:00 - Intro — Episode 9 starts here 0:19 - Preview — .dt, bdate_range, gaps 0:58 - Open nvim, write dates.py 1:12 - .dt accessor: year, weekday, quarter 1:24 - Average return by weekday 1:38 - Business-day calendar 1:51 - Calendar gaps = Monday holidays 2:04 - Save and run 2:10 - Date components output 2:22 - Weekday returns 2:37 - Trading days in 2024 2:50 - Recap 3:29 - End screen Key Takeaways: 1. pd.to_datetime is the gateway. .dt.year, .dt.month, .dt.day_name(), .dt.is_month_end — none of it works while the dtype is object/string. Convert first, then everything else flows. For messy dates, errors="coerce" turns parse failures into NaT instead of raising; for non-standard formats, pass format="%d/%m/%Y" for a 5-10x speed boost over the default infer-from-each-value behavior. 2. .dt is the analyst's date-piece extractor. .dt.year, .dt.month, .dt.day_name() for ordinary components. .dt.quarter, .dt.dayofyear, .dt.weekday (0=Mon) for less obvious ones. .dt.is_month_end, .dt.is_quarter_end, .dt.is_year_end for the booleans that drive end-of-period rebalancing logic. Every analyst report that buckets by "month" or "quarter" runs through this accessor. 3. pd.bdate_range(start, end) for business-day ranges. 262 weekdays in 2024. For US market days specifically, swap to CustomBusinessDay(calendar=USFederalHolidayCalendar()) to drop ~10 federal holidays — the result is close to NYSE but not identical (NYSE has Good Friday off, federal doesn't). For exact NYSE/CME calendars install pandas_market_calendars or exchange_calendars. Ten holidays sounds small until your annualization is off by 4%. 4. Calendar gaps reveal market holidays. index.to_series().diff().dt.days.gt(3) — any consecutive trading days more than 3 apart are a Monday holiday (Sat + Sun + Mon = 3 missing days, so Tuesday's diff to Friday is 4). It's a one-line audit that catches missing data too: a 6-day gap means you're missing a session. 5. Period vs Timestamp matter for monthly logic. A Timestamp is a specific instant; a Period is a span. df["Date"].dt.to_period("M") converts each date to "the month it belongs to" (Period('2024-01', 'M')), which is the right type for monthly aggregations and renames. For day-level operations, stick with Timestamp; for span-level, use Period. Mixing them is the sneakiest bug in time-series code. #Pandas #Python #Finance #Datetime #Calendar #BusinessDays #DataAnalytics #PythonForFinance #LearnPandas #DataFrame --- Generated by GoCelesteAI · part of the Pandas for Finance series