Pick Tickers and Columns: Brackets, .loc, .iloc, drop, rename | Pandas for Finance Ep5
0views
C
CelesteAI
Description
Episode 5 of *Pandas for Finance*. The SELECT clause for DataFrames.
Episode 4 was filtering — keep the rows you want. This episode is the column-side counterpart. `df[["Date", "Close"]]` selects a subset of columns into a clean new DataFrame. `.loc[rows, cols]` slices both at once with labels. `.iloc[...]` does the same with integer positions. Drop with `df.drop(columns=[...])`, rename with `df.rename(columns={old: new})`, reorder by re-listing. Together with Episode 4's filter you have everything to shape any DataFrame down to exactly the columns and rows you need before plotting, exporting, or moving to the next analytical step.
What You'll Build:
- `pick.py` — load the cached parquet, take a four-column subset, filter to AAPL with two columns, demonstrate `.loc` date slicing, then a drop-and-rename pipeline.
- The single-bracket vs double-bracket distinction live in the REPL output: `df["Close"]` (Series) vs `df[["Close"]]` (DataFrame).
- A `.loc["2024-01", ["Close", "Volume"]]` partial-date slice that works because the index is a `DatetimeIndex` — set once, slice forever.
- A method-chaining example: `df.drop(columns=["Adj Close"]).rename(columns={"Volume": "Vol"})` — readable cleaning pipeline that returns the column list you actually want.
Timestamps:
0:00 - Intro — Episode 5 starts here
0:19 - Preview — SELECT for DataFrames
0:56 - Open nvim, write pick.py
1:08 - Subset of columns with double brackets
1:18 - One ticker, two columns
1:28 - .loc with date slice
1:39 - Drop and rename, method chaining
1:50 - Save and run
1:56 - Subset output
2:04 - AAPL Date+Close, 2010 rows
2:12 - Date-indexed loc slice
2:23 - Recap — brackets, .loc, drop, rename
3:02 - End screen
Key Takeaways:
1. **Single brackets vs double brackets is the first distinction to internalize.** `df["Close"]` returns a 1-D Series. `df[["Close"]]` returns a 2-D DataFrame with one column. Same name, different shape — and the shape difference matters when you call methods that expect 2-D input. For analyst pipelines, you almost always want the DataFrame form so chained operations keep working uniformly.
2. **`.loc` is your label-based tool, `.iloc` is positional.** `.loc[rows, cols]` slices by labels — dates, ticker names, column names — and is the right answer for prices because the index is meaningful. `.iloc[100:200]` slices by integer position and is the right answer when the index is just `0, 1, 2, …`. Use `.loc` for both row-and-column selection in one expression and to avoid `SettingWithCopyWarning` on assignment.
3. **`.loc["2024-01"]` works because the index is a `DatetimeIndex`.** Partial date strings resolve to the full month range; partial year strings to the full year. `.loc["2024-01":"2024-03"]` is inclusive on both ends (unlike `.iloc` which is exclusive on the upper bound, like Python lists). Set the date as your index once and the rest of the series gets these slices for free.
4. **Drop, rename, reorder are returned-not-mutated.** `df.drop(columns=[...])` and `df.rename(columns={...})` return *new* DataFrames; the originals are unchanged unless you reassign. Method chaining handles this naturally — the cleaning pipeline reads top-down and each step is an explicit transformation. Reorder columns simply by re-listing in the order you want: `df[["Date", "Ticker", "Close"]]`.
5. **`.select_dtypes(include="number")` grabs all numeric columns at once.** Useful when you want to apply math to every float column in a frame without enumerating their names. `.select_dtypes(include="object")` for strings, `.select_dtypes(include="bool")` for booleans. The shortcut that turns "all the numeric ones" from an enumeration problem into a one-liner.
#Pandas #Python #Finance #DataFrame #Loc #Iloc #DataAnalytics #PythonForFinance #LearnPandas #ColumnSelection