Pick Tickers and Columns: Brackets, .loc, .iloc, drop, rename | Pandas for Finance Ep5

0views
C
CelesteAI
Description
Episode 5 of Pandas for Finance. The SELECT clause for DataFrames. Source code: https://github.com/GoCelesteAI/pandas-for-finance Episode 4 was filtering — keep the rows you want. This episode is the column-side counterpart. df[["Date", "Close"]] selects a subset of columns into a clean new DataFrame. .loc[rows, cols] slices both at once with labels. .iloc[...] does the same with integer positions. Drop with df.drop(columns=[...]), rename with df.rename(columns={old: new}), reorder by re-listing. Together with Episode 4's filter you have everything to shape any DataFrame down to exactly the columns and rows you need before plotting, exporting, or moving to the next analytical step. What You'll Build: - pick.py — load the cached parquet, take a four-column subset, filter to AAPL with two columns, demonstrate .loc date slicing, then a drop-and-rename pipeline. - The single-bracket vs double-bracket distinction live in the REPL output: df["Close"] (Series) vs df[["Close"]] (DataFrame). - A .loc["2024-01", ["Close", "Volume"]] partial-date slice that works because the index is a DatetimeIndex — set once, slice forever. - A method-chaining example: df.drop(columns=["Adj Close"]).rename(columns={"Volume": "Vol"}) — readable cleaning pipeline that returns the column list you actually want. Timestamps: 0:00 - Intro — Episode 5 starts here 0:19 - Preview — SELECT for DataFrames 0:56 - Open nvim, write pick.py 1:08 - Subset of columns with double brackets 1:18 - One ticker, two columns 1:28 - .loc with date slice 1:39 - Drop and rename, method chaining 1:50 - Save and run 1:56 - Subset output 2:04 - AAPL Date+Close, 2010 rows 2:12 - Date-indexed loc slice 2:23 - Recap — brackets, .loc, drop, rename 3:02 - End screen Key Takeaways: 1. Single brackets vs double brackets is the first distinction to internalize. df["Close"] returns a 1-D Series. df[["Close"]] returns a 2-D DataFrame with one column. Same name, different shape — and the shape difference matters when you call methods that expect 2-D input. For analyst pipelines, you almost always want the DataFrame form so chained operations keep working uniformly. 2. .loc is your label-based tool, .iloc is positional. .loc[rows, cols] slices by labels — dates, ticker names, column names — and is the right answer for prices because the index is meaningful. .iloc[100:200] slices by integer position and is the right answer when the index is just 0, 1, 2, …. Use .loc for both row-and-column selection in one expression and to avoid SettingWithCopyWarning on assignment. 3. .loc["2024-01"] works because the index is a DatetimeIndex. Partial date strings resolve to the full month range; partial year strings to the full year. .loc["2024-01":"2024-03"] is inclusive on both ends (unlike .iloc which is exclusive on the upper bound, like Python lists). Set the date as your index once and the rest of the series gets these slices for free. 4. Drop, rename, reorder are returned-not-mutated. df.drop(columns=[...]) and df.rename(columns={...}) return new DataFrames; the originals are unchanged unless you reassign. Method chaining handles this naturally — the cleaning pipeline reads top-down and each step is an explicit transformation. Reorder columns simply by re-listing in the order you want: df[["Date", "Ticker", "Close"]]. 5. .select_dtypes(include="number") grabs all numeric columns at once. Useful when you want to apply math to every float column in a frame without enumerating their names. .select_dtypes(include="object") for strings, .select_dtypes(include="bool") for booleans. The shortcut that turns "all the numeric ones" from an enumeration problem into a one-liner. #Pandas #Python #Finance #DataFrame #Loc #Iloc #DataAnalytics #PythonForFinance #LearnPandas #ColumnSelection --- Generated by GoCelesteAI · part of the Pandas for Finance series