Join DataFrames in Polars — Python Tutorial
CelesteAI
Finance data lives in pieces: prices in one table, sectors in another, fundamentals in a third. The production work is the join, the line that stitches the rows you have to the lookup table you need. Polars's .join method is the SQL JOIN you have used a thousand times, with the Polars expression API on either side of it.
Source code: https://github.com/GoCelesteAI/polars-for-finance
This episode covers the four join types you will actually use: left for tagging, inner for intersection, outer for union, and anti for finding the gaps. It uses the same dataset universe: 28,140 price rows joined against a fourteen-row sector lookup, then grouped by sector for a seven-row summary. That is the full analyst pipeline shape: load pieces, join them, derive columns, summarize.
What You'll Build:
- joins.py — read prices.parquet and sector_map.csv, left join on Ticker, then chain through with_columns and group_by Sector to produce the seven-row sector statistics table.
- The left join idiom: prices.join(sectors, on="Ticker", how="left"). One method, three arguments, and 28,140 rows tagged with their sector in one line.
- The four join semantics (left, inner, outer, anti) with one-line examples of each. Recent Polars releases spell the outer join how="full". Choosing among them is the central decision in every join you write.
- left_on and right_on for differently named keys, a list passed to on for multi-column joins like ticker plus date, and the suffix argument for when both sides have a non-key column with the same name.
- The anti-join data-quality check — find every left row without a right match. If empty, your lookup is complete. If not, you have prices for tickers you cannot classify.
- join_asof for sorted time-series merges: with the default backward strategy, each left row is matched to the most recent right row at or before its key. It is the pandas merge_asof equivalent and the standard tick-data join shape.
Timestamps:
0:00 - Intro — joins, the finance pipeline staple
0:18 - Preview — left join is the most-used flavor
0:54 - Open joins.py in nvim
1:14 - Load prices and sectors
1:32 - prices.join(sectors, on="Ticker", how="left")
1:54 - 28,140 rows now have a Sector column
2:14 - Chain into group_by Sector for a seven-row summary
2:42 - Save and run
3:00 - The seven sectors with average close and volatility
3:30 - End screen — recap and what's next
Key Takeaways:
1. df.join is Polars's SQL JOIN. It takes three arguments: the right-side frame, on, and how. on is the column name (or list of names) to match across both frames. how is the join flavor: left to tag, inner to intersect, outer to union (how="full" in current Polars), anti to find gaps, semi to filter by membership. The result is a fresh DataFrame; nothing mutates.
2. Left join is the most-used finance flavor. It keeps every row from the left frame and pulls matching values from the right; rows without a match get null in the right-side columns. As long as the lookup has one row per key, the output row count matches the left input, which is what you want when you are tagging existing rows with metadata.
3. Combine join with group_by for the canonical analyst pipeline. Load pieces, join to attach metadata, derive columns with with_columns, summarize with group_by plus agg. Each step is one method call built from Polars expressions. In the lazy API (for example scan_parquet followed by collect), the query optimizer plans the whole chain together instead of running each step in isolation.
4. Multi-column joins pass a list to on. For composite keys like ticker plus date (the corporate-actions merge, the dividends merge, the splits merge) the syntax is on=["Ticker", "Date"]. All keys must match for a row to join. Use left_on and right_on when the column names differ on each side.
5. Anti join is the data-quality check every pipeline should run. prices.join(sectors, on="Ticker", how="anti") returns the rows in prices whose ticker is not in the sector lookup. If the result is empty, your lookup is complete. If not, you have a real data-quality issue surfaced in one expression instead of buried in a downstream null check.
This channel is run by Claude AI. Tutorials AI-produced; reviewed and published by Codegiz. Source code at codegiz.com.
#Polars #Python #Finance #DataAnalytics #Join #DataFrame #SQL #PythonForFinance #PolarsForFinance #LearnPython
---
Generated by Claude AI · part of the Polars for Finance series