Inner, Left, Outer — Merging DataFrames in pandas

CelesteAI
Description
Real data lives in pieces: a trade blotter on one side and a ticker reference on the other, or a users table here and an addresses table there. Most pandas analyses have to join them, and the workhorse for that is pd.merge. It follows the same mental model as SQL: a left side, a right side, a key column, and a how strategy. One call covers every join type.

Source code: https://github.com/GoCelesteAI/merge-dataframes-pandas

This tutorial covers the three join strategies every Python data person should know, all from the same function:
- The default inner join: only rows whose key exists on both sides survive. The safest default for analysis.
- The how="left" join: every row from the left DataFrame survives, with NaN where the right side has no match.
- The how="outer" join: the union of keys from both sides, with NaN wherever either side is missing.

What You'll Build:
- merge_dataframes.py: merges an 8-row trade blotter with a 14-row ticker reference three different ways. Same key, three how strategies, three result shapes.
- The pd.merge default: pass left, right, and on. It returns the inner join, so only the 8 rows where Ticker exists on both sides survive.
- The how="left" pattern: every trade survives, even when no reference row exists, with NaN in the reference columns. Use it when the left side is your fact table and you can't afford silent row loss.
- The how="outer" pattern: the union of keys from both sides. It returns 14 rows (8 trades plus 6 reference-only tickers), with NaN wherever either side is missing.
- The trades-versus-tickers data structure: a typical fact/reference split, with fewer trades than tickers in the universe. The asymmetry makes the three joins visibly different.
- The on= idiom: pass the column name (on="Ticker" here) when the join column has the same name on both sides. If the names differ, switch to left_on and right_on (covered in a follow-up).
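A minimal sketch of the three strategies side by side. The frames, tickers and column names below (Qty, Sector, the made-up ticker XYZ) are hypothetical stand-ins, not the tutorial's 8-row and 14-row data. One trade deliberately uses a ticker missing from the reference so that the left join visibly produces NaN.

```python
import pandas as pd

# Hypothetical fact table: one row per trade (not the tutorial's blotter).
trades = pd.DataFrame({
    "Ticker": ["AAPL", "MSFT", "AAPL", "XYZ"],
    "Qty": [100, 50, 25, 10],
})

# Hypothetical reference table: one row per ticker.
tickers = pd.DataFrame({
    "Ticker": ["AAPL", "MSFT", "NVDA"],
    "Sector": ["Tech", "Tech", "Semis"],
})

# Default how="inner": only keys present on both sides survive (XYZ and NVDA drop out).
inner = pd.merge(trades, tickers, on="Ticker")

# how="left": every trade survives; the XYZ trade gets NaN in Sector.
left = pd.merge(trades, tickers, on="Ticker", how="left")

# how="outer": union of keys; NVDA appears with NaN in Qty.
outer = pd.merge(trades, tickers, on="Ticker", how="outer")

print(len(inner), len(left), len(outer))  # prints: 3 4 5
```

Same key, three how values, three row counts: the inner join loses the orphan trade, the left join keeps it, and the outer join also picks up the reference-only ticker.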
Timestamps:
0:00 - Intro: merge two DataFrames in one call
0:22 - Preview: three strategies, one function
1:07 - Open merge_dataframes.py in nvim
1:33 - Method 1: default inner join
2:01 - Method 2: how="left" keeps every trade
2:41 - Method 3: how="outer" returns the union
3:23 - Save and run
3:31 - Three joins, three shapes
3:51 - End screen: recap

Key Takeaways:
1. pd.merge is pandas' general-purpose join function. Its left, right, on, and how arguments map onto the four parts of a SQL join. One function handles every join type: no concat, no manual lookup, no for-loop over keys.
2. The default join is inner. Only rows whose key exists on both sides survive. Use it when you want clean matched data and don't care about orphans on either side.
3. how="left" keeps every row from the left DataFrame, with NaN where the right side has no match. It's the right choice when the left side is your fact table (trades, events, transactions) and silently dropping rows would corrupt downstream metrics.
4. how="outer" is the union of keys from both sides, with NaN where either side has no match. It's useful for set-difference checks ("which tickers are in the reference but have no trades?") and for full reconciliation.
5. Use on= when the join column has the same name on both sides. If the names differ, use left_on="Symbol" and right_on="Ticker". The result keeps both columns unless you drop one.

📺 Deeper dive: Pandas for Finance Ep 8 walks through the same join with validate="many_to_one", the right join, and a full sector-leaders aggregation: https://www.youtube.com/watch?v=j1O-w-Darxc

This channel is run by Claude AI. Tutorials are AI-produced, then reviewed and published by Codegiz. Source code at codegiz.com.

#Python #Pandas #DataAnalytics #DataFrame #Merge #SQL #DataEngineering #PythonTutorial #LearnPython #pd_merge

---
Generated by Claude AI · part of the Common Questions in Python series
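Takeaways 4 and 5 combine naturally in one call. A minimal sketch, assuming hypothetical frames whose key is named Symbol on the left and Ticker on the right: indicator=True adds a _merge column that turns an outer join into a set-difference check, and validate="many_to_one" (the option the deeper-dive episode mentions) raises pandas.errors.MergeError if the reference table has duplicate tickers.

```python
import pandas as pd

# Hypothetical frames: the key column is "Symbol" on the left and "Ticker" on the right.
trades = pd.DataFrame({"Symbol": ["AAPL", "MSFT", "AAPL"], "Qty": [100, 50, 25]})
tickers = pd.DataFrame({
    "Ticker": ["AAPL", "MSFT", "NVDA"],
    "Sector": ["Tech", "Tech", "Semis"],
})

merged = pd.merge(
    trades,
    tickers,
    left_on="Symbol",        # key name in the left frame
    right_on="Ticker",       # key name in the right frame; both columns are kept
    how="outer",             # union of keys, so reference-only tickers appear
    indicator=True,          # adds "_merge": "left_only", "right_only" or "both"
    validate="many_to_one",  # raise if a Ticker repeats in the reference table
)

# Set-difference check: tickers in the reference with no trades.
no_trades = merged.loc[merged["_merge"] == "right_only", "Ticker"].tolist()
print(no_trades)  # prints: ['NVDA']

# validate catches a duplicated reference row instead of silently fanning out trades.
dupes = pd.concat([tickers, tickers.iloc[[0]]], ignore_index=True)
try:
    pd.merge(trades, dupes, left_on="Symbol", right_on="Ticker", validate="many_to_one")
    caught = False
except pd.errors.MergeError as exc:
    caught = True
    print("MergeError:", exc)
```

Before dropping one of the two key columns after an outer join, note that Symbol is NaN on right_only rows and Ticker is NaN on left_only rows, so the column you keep may have gaps.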

Added to Codegiz: May 22, 2026