Join DataFrames in Polars — Python Tutorial
About this lesson
Finance data lives in pieces. Prices in one table, sectors in another, fundamentals in a third. Production work is the join — the line that stitches the row you have to the lookup table you need. Polars's dot join is the SQL JOIN you have used a thousand times, with the Polars expression API on either side of it. Source code: https://github.com/GoCelesteAI/polars-for-finance This episode covers the four join types you will actually use: left for tagging, inner for intersection, outer for union, and anti for finding the gaps. Same dataset universe — twenty eight thousand price rows joined against a fourteen-row sector lookup, then grouped by sector for a seven-row sector-level summary. The full analyst pipeline shape: load pieces, join them, derive columns, summarize. What You'll Build: - joins.py — read prices.parquet and sector_map.csv, left join on Ticker, then chain through with_columns and group_by Sector to produce the seven-row sector statistics table. - The left join idiom — prices.join(sectors, on=Ticker, how=left). One method, three arguments, twenty eight thousand rows tagged with their sector in one line. - The four join semantics — left, inner, outer, anti — with one-line examples of each. When to use which is the central decision in every join you write. - left_on and right_on for differently-named keys. List arguments to on for multi-column joins like ticker plus date. The suffix argument when both sides have a non-key column with the same name. - The anti-join data-quality check — find every left row without a right match. If empty, your lookup is complete. If not, you have prices for tickers you cannot classify. - join_asof for sorted time-series merges — match each left row to the nearest prior right row. The pandas merge_asof equivalent, the standard tick-data join shape. Timestamps: 0:00 - Intro — joins, the finance pipeline staple 0:18 - Preview — left join is the most-used flavor 0:54 - Open joins.py in nvim 1:14 - Load prices and sectors 1
DeepCamp AI