Dates and the Trading Calendar: .dt, bdate_range, Calendar Gaps | Pandas for Finance Ep9
0views
C
CelesteAI
Description
Episode 9 of *Pandas for Finance*. The calendar quirks that bite finance scripts.
If your `Date` column is a string, half of pandas's time superpowers don't work. Step one: `pd.to_datetime`. Once the dtype is `datetime64[ns]`, the `.dt` accessor unlocks year, month, weekday, quarter, day-of-year, and a dozen `.is_*_end` booleans for free. The episode walks through Apple's weekday returns, the count of business days in 2024, and a sneaky-clean trick — diffing dates and looking for jumps over 3 days reveals every Monday market holiday in the dataset.
What You'll Build:
- `dates.py` — load AAPL, extract Year/Weekday/Quarter via `.dt`, then group by Weekday and rank the daily returns Mon-Fri.
- A trading-day count for 2024 with `pd.bdate_range` (262 weekend-only business days; about 10 fewer once you swap in `CustomBusinessDay(calendar=USFederalHolidayCalendar())`).
- A holiday-detection trick: `index.to_series().diff().dt.days.gt(3)` flags every long weekend in the dataset — MLK, Presidents, Good Friday, Memorial, Labor Day all visible.
- The reorder pattern: `groupby("Weekday").mean().reindex(["Monday","Tuesday",...])` so the days print in calendar order, not alphabetical.
Timestamps:
0:00 - Intro — Episode 9 starts here
0:19 - Preview — .dt, bdate_range, gaps
0:58 - Open nvim, write dates.py
1:12 - .dt accessor: year, weekday, quarter
1:24 - Average return by weekday
1:38 - Business-day calendar
1:51 - Calendar gaps = Monday holidays
2:04 - Save and run
2:10 - Date components output
2:22 - Weekday returns
2:37 - Trading days in 2024
2:50 - Recap
3:29 - End screen
Key Takeaways:
1. **`pd.to_datetime` is the gateway.** `.dt.year`, `.dt.month`, `.dt.day_name()`, `.dt.is_month_end` — none of it works while the dtype is `object`/string. Convert first, then everything else flows. For messy dates, `errors="coerce"` turns parse failures into `NaT` instead of raising; for non-standard formats, pass `format="%d/%m/%Y"` for a 5-10x speed boost over the default infer-from-each-value behavior.
2. **`.dt` is the analyst's date-piece extractor.** `.dt.year`, `.dt.month`, `.dt.day_name()` for ordinary components. `.dt.quarter`, `.dt.dayofyear`, `.dt.weekday` (0=Mon) for less obvious ones. `.dt.is_month_end`, `.dt.is_quarter_end`, `.dt.is_year_end` for the booleans that drive end-of-period rebalancing logic. Every analyst report that buckets by "month" or "quarter" runs through this accessor.
3. **`pd.bdate_range(start, end)` for business-day ranges.** 262 weekdays in 2024. For US market days specifically, swap to `CustomBusinessDay(calendar=USFederalHolidayCalendar())` to drop ~10 federal holidays — the result is close to NYSE but not identical (NYSE has Good Friday off, federal doesn't). For exact NYSE/CME calendars install `pandas_market_calendars` or `exchange_calendars`. Ten holidays sounds small until your annualization is off by 4%.
4. **Calendar gaps reveal market holidays.** `index.to_series().diff().dt.days.gt(3)` — any consecutive trading days more than 3 apart are a Monday holiday (Sat + Sun + Mon = 3 missing days, so Tuesday's diff to Friday is 4). It's a one-line audit that catches missing data too: a 6-day gap means you're missing a session.
5. **Period vs Timestamp matter for monthly logic.** A `Timestamp` is a specific instant; a `Period` is a span. `df["Date"].dt.to_period("M")` converts each date to "the month it belongs to" (`Period('2024-01', 'M')`), which is the right type for monthly aggregations and renames. For day-level operations, stick with `Timestamp`; for span-level, use `Period`. Mixing them is the sneakiest bug in time-series code.
#Pandas #Python #Finance #Datetime #Calendar #BusinessDays #DataAnalytics #PythonForFinance #LearnPandas #DataFrame