Joins: Prices x Sector Lookup, merge() and the SQL JOIN | Pandas for Finance Ep8

Name: Joins: Prices x Sector Lookup, merge() and the SQL JOIN | Pandas for Finance Ep8
Uploaded: 2026-05-09 04:33:18
Channel: CelesteAI

0views

CelesteAI

Description

Episode 8 of Pandas for Finance. The VLOOKUP for adults — pandas's merge() is SQL's JOIN, and it's how analysts attach metadata to a main table. Source code: https://github.com/GoCelesteAI/pandas-for-finance The episode walks through prices.merge(sectors, on="Ticker", how="left") to glue a fourteen-ticker sector lookup onto the long-format price table, then groups by Sector to answer the analyst question — which sectors led the period? Consumer Discretionary 50%, Technology 46%, Communication Services 31%, and so on. The four how values, the validate parameter that catches unexpected cartesian explosions, and the named-aggregation pattern from Episode 7 all carry forward. What You'll Build: - join.py — load the cached prices.parquet and the sector_map.csv lookup, merge on Ticker with how="left", attach a Sector column to every price row, then group by Sector for the leaderboard. - The merge with explicit how and the validate-many-to-one safety check that catches accidental cartesian explosions. - A sector-level annualized-return ranking computed in one chained pipeline: groupby → mean → sort → annualize → format as percent. - The cat data/sector_map.csv warm-up so viewers see the small reference table that powers the join. Timestamps: 0:00 - Intro — Episode 8 starts here 0:19 - Preview — merge anatomy 0:58 - cat the sector lookup 1:11 - Open nvim, write join.py 1:23 - merge(prices, sectors, on=Ticker, how=left) 1:35 - groupby Sector, annualize 1:49 - Save and run 1:55 - Joined head (Sector column attached) 2:09 - Annualized return by sector 2:31 - Recap 3:10 - End screen Key Takeaways: 1. df1.merge(df2, on="key", how=...) is SQL's JOIN. The four how values mirror SQL exactly: inner keeps only rows with the key in both (default and easy to silently drop rows on); left keeps everything on the left and matches what it can on the right; right is the symmetric case; outer keeps everything from both, NaN where unmatched. For "attach metadata to my main table," how="left" is the right answer ninety percent of the time. 2. Use validate="many_to_one" to catch cartesian explosions. When you expect each key on the right to be unique (one sector per ticker, one industry per company), pandas can verify the relationship and raise a clear error if it isn't. Without validate, a duplicated row on the supposed-unique side silently multiplies your row count and the bug surfaces hours later in a downstream aggregation. One keyword argument; saves debugging time. 3. Different key column names → left_on and right_on. When the join key is named Ticker on one side and Symbol on the other, left_on="Ticker", right_on="Symbol" does the join without renaming first. The output keeps both columns; drop one if you don't need it. 4. The point of the join is the groupby on the new column. Joining is rarely the goal; aggregating the result is. After prices.merge(sectors), groupby("Sector")["Return"].mean() is the actual analysis — the join just made the column available. For finance work, the pattern is always: load main table, attach lookups, compute returns, group by the lookup column. 5. Concat is not merge. pd.concat([df1, df2]) stacks DataFrames vertically without matching keys — use it for same-schema chunks like yearly files or month-by-month dumps. merge is for matching on a key. Mixing them up is one of the top mistakes new pandas users make; remember that concat is "glue them together" and merge is "match them up." #Pandas #Python #Finance #Merge #Join #SQL #DataAnalytics #PythonForFinance #LearnPandas #DataFrame --- Generated by GoCelesteAI · part of the Pandas for Finance series

Back to tutorials

Duration

Added to Codegiz

May 9, 2026

📖 Read the article Open in YouTube