Inner, Left, Outer — Merging DataFrames in pandas
About this lesson
Real data lives in pieces: a trade blotter on one side, a ticker reference on the other; a users table here, an addresses table there. Much of the work in a pandas analysis is joining those pieces, and the main tool for that is pd.merge. It follows the same mental model as SQL: a left side, a right side, a key column, and a how strategy. One call covers every join type.

Source code: https://github.com/GoCelesteAI/merge-dataframes-pandas

This tutorial covers the three join strategies every Python data person should know, all from the same function:
- The default inner join: only rows where the key exists on both sides. It is the safest default for analysis, since every result row has data from both sides.
- The how="left" join: every row from the left DataFrame survives, with NaN where the right side has no match.
- The how="outer" join: the union of keys from both sides, with NaN wherever either side is missing.

What You'll Build:
- merge_dataframes.py: merge an 8-row trade blotter with a 14-row ticker reference three different ways. Same key, three how strategies, three result shapes.
- The pd.merge default: pass left, right, and on. It returns the inner join, so only the 8 rows where Ticker exists on both sides survive.
- The how="left" pattern: every trade survives, even when no reference row exists, with NaN in the reference columns. Use this when the left side is your fact table and you can't afford silent row loss.
- The how="outer" pattern: the union of keys from both sides. It returns 14 rows: 8 trades plus 6 reference-only tickers, with NaN wherever either side is missing.
- The trades-versus-tickers data structure: a typical fact/reference split, with fewer trades and a broader ticker universe. The asymmetry makes the three joins visibly different.
- The on="key" idiom: for when the join column has the same name on both sides. If the names differ, swap to left_on and right_on (covered in a follow-up).

Timestamps:
0:00 - Intro: Merge two DataFrames in one call
0:22 - Preview: three strategies, one function
1:07 - Open merge_datafra
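
As a rough sketch of the three joins this lesson describes, the snippet below merges a small trade blotter with a ticker reference. The data is hypothetical and is not taken from the lesson's merge_dataframes.py: the column names other than Ticker (Quantity, Sector), the values, and the made-up ticker ZZZZ are assumptions chosen only to reproduce the 8-row and 14-row shapes mentioned above.

```python
import pandas as pd

# Hypothetical 8-row trade blotter (illustrative values, not the lesson's data).
trades = pd.DataFrame({
    "Ticker": ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA", "META", "TSLA", "JPM"],
    "Quantity": [100, 50, 20, 30, 75, 40, 10, 200],
})

# Hypothetical 14-row ticker reference; sector labels are illustrative only.
reference = pd.DataFrame({
    "Ticker": ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA", "META", "TSLA",
               "JPM", "V", "JNJ", "WMT", "XOM", "KO", "DIS"],
    "Sector": ["Technology", "Technology", "Communication Services",
               "Consumer Discretionary", "Technology",
               "Communication Services", "Consumer Discretionary",
               "Financials", "Financials", "Health Care",
               "Consumer Staples", "Energy", "Consumer Staples",
               "Communication Services"],
})

# Default how="inner": only tickers present on both sides.
inner = pd.merge(trades, reference, on="Ticker")

# how="left": every trade is kept; unmatched reference columns become NaN.
left = pd.merge(trades, reference, on="Ticker", how="left")

# how="outer": union of keys; reference-only tickers get NaN in Quantity.
outer = pd.merge(trades, reference, on="Ticker", how="outer")

print(inner.shape)  # (8, 3)
print(left.shape)   # (8, 3)
print(outer.shape)  # (14, 3)

# To see where inner and left differ, add one trade whose ticker is
# missing from the reference ("ZZZZ" is a made-up symbol).
extra_trade = pd.DataFrame({"Ticker": ["ZZZZ"], "Quantity": [5]})
trades_plus = pd.concat([trades, extra_trade], ignore_index=True)

inner_plus = pd.merge(trades_plus, reference, on="Ticker")
left_plus = pd.merge(trades_plus, reference, on="Ticker", how="left")

print(len(inner_plus))  # 8: the unmatched trade is dropped
print(len(left_plus))   # 9: the unmatched trade is kept, Sector is NaN
```

In this sample every trade has a reference row, so the inner and left results have the same shape. The extra ZZZZ trade shows the case the left join exists for: the inner join drops the row without warning, while the left join keeps it and marks the missing reference data with NaN.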
DeepCamp AI