How to Merge CSV Files in Python — Tutorial
About this lesson
You have a folder of CSV files: broker exports, monthly data drops, partitioned dumps. They all share the same schema and differ only in their rows. What you want is the row-wise concatenation: stack them top to bottom, keep the column order, and end up with one DataFrame. This tutorial shows the four ways Python data people typically solve this, on real data: three CSV files covering fourteen tickers and about twenty-eight thousand rows of OHLCV, merged into one frame four different ways.

Source code: https://github.com/GoCelesteAI/merge-csv-files

Method one is pd.concat: two lines, and the default for anyone already using pandas. Method two is pl.concat, with the same shape but about five times faster on the same input. Method three is the killer feature: pl.scan_csv with a glob expression. It is one line, lazy by default, and supports filter pushdown, so you can pull just the AAPL rows out of the entire folder without ever materializing the full frame. Method four is the built-in csv module for the locked-down, no-dependencies scenario. It is verbose and slow, and included for completeness.

What You'll Build:
- merge_csv.py: three CSV files in a folder, four ways to merge them (pandas concat, Polars concat, Polars scan_csv with a glob, and a plain csv module fallback). Each method prints its output shape so you can see that all four arrive at the same 28140 x 8 frame.
- The pd.concat idiom: a list comprehension over the glob results, then concat with ignore_index=True. Two lines if you count the import.
- The pl.concat alternative: same shape, faster execution, lower memory footprint. Add to_pandas if your downstream code expects pandas.
- pl.scan_csv with a glob expression: the one-line merge. Lazy by default. Chain filter and select before collect so only what you need is read from disk.
- The schema-drift gotcha and how each library handles it. Polars concat is strict by default and raises on a type mismatch; pandas silently upcasts. Once merged, the fix is the same in both: write the frame as Parquet so the schema survives a reload.
- The how=diagonal
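The pandas and stdlib methods can be sketched as below. This is a minimal illustration, not the code from the linked repository: the file names (part1.csv and so on), the three-column header, the helper names write_sample_files, merge_with_pandas and merge_with_csv_module, and all the prices are hypothetical and made up so the example runs on its own. It also shows the pandas side of the schema-drift gotcha: an int64 column concatenated with a float64 column does not raise, it is upcast to float64.

```python
from __future__ import annotations

import csv
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample data: three files sharing one header.
# Ticker symbols are real, but every price here is made up for illustration.
HEADER = ["ticker", "date", "close"]
SAMPLE_FILES = {
    "part1.csv": [["AAPL", "2024-01-02", "100.0"], ["MSFT", "2024-01-02", "200.0"]],
    "part2.csv": [["AAPL", "2024-01-03", "101.0"]],
    "part3.csv": [["MSFT", "2024-01-03", "201.0"], ["AAPL", "2024-01-04", "102.0"]],
}


def write_sample_files(folder: str) -> None:
    """Hypothetical helper: write the sample CSVs into `folder`."""
    for name, rows in SAMPLE_FILES.items():
        with open(os.path.join(folder, name), "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(HEADER)
            writer.writerows(rows)


def merge_with_pandas(folder: str) -> pd.DataFrame:
    """Method one: read every file, then stack the frames row-wise."""
    files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each file's index.
    return pd.concat([pd.read_csv(f) for f in files], ignore_index=True)


def merge_with_csv_module(folder: str) -> tuple[list[str], list[list[str]]]:
    """Method four: stdlib only. Keeps one header, appends every data row as strings."""
    header: list[str] | None = None
    rows: list[list[str]] = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as fh:
            reader = csv.reader(fh)
            file_header = next(reader)
            if header is None:
                header = file_header
            elif file_header != header:
                # Fail loudly instead of silently mixing incompatible files.
                raise ValueError(f"header mismatch in {path}: {file_header}")
            rows.extend(reader)
    return header or [], rows


with tempfile.TemporaryDirectory() as folder:
    write_sample_files(folder)
    merged = merge_with_pandas(folder)
    header, rows = merge_with_csv_module(folder)

print(merged.shape)            # (5, 3)
print(len(rows), len(header))  # 5 3

# Schema drift: "volume" is int64 in one frame and float64 in the other.
# pandas does not raise; it upcasts the combined column to float64.
drift = pd.concat(
    [pd.DataFrame({"volume": [1, 2]}), pd.DataFrame({"volume": [1.5]})],
    ignore_index=True,
)
print(drift["volume"].dtype)  # float64
```

Sorting the glob results is a deliberate choice here: glob.glob makes no ordering guarantee, and sorting keeps the merged row order reproducible from run to run.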
DeepCamp AI