Writing Out: Excel, Parquet, DuckDB — One Pipeline, Four Formats | Pandas for Finance Ep13
CelesteAI
Episode 13 of *Pandas for Finance*. The output stage of every analyst pipeline. You've cleaned, joined, computed — now hand the result to the next person, or to future-you.
`df.to_excel("out.xlsx", sheet_name="...")` — single-sheet workbook for finance teams. `pd.ExcelWriter` as a context manager — multi-tab reports in a loop. `df.to_parquet(path)` — columnar, compressed, lossless storage that loads in milliseconds. `duckdb.connect("file.duckdb").execute("CREATE TABLE prices AS SELECT * FROM df")` — pandas-to-SQL in one line, with a persistent on-disk database that any DuckDB client can query.
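As a rough illustration of the two Excel calls above, here is a minimal sketch. The tiny `prices` frame, the `AAA`/`BBB` tickers, and the temp-directory paths are made-up stand-ins, not the episode's data, and the `openpyxl` package is assumed to be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the cached price data used in the episode.
prices = pd.DataFrame(
    {
        "Ticker": ["AAA", "AAA", "BBB", "BBB"],
        "Date": pd.to_datetime(
            ["2024-01-02", "2024-01-03", "2024-01-02", "2024-01-03"]
        ),
        "Adj Close": [10.0, 11.0, 20.0, 19.0],
    }
)

# Write into a throwaway directory so the sketch leaves no files behind in the repo.
out_dir = Path(tempfile.mkdtemp())

# Single-sheet workbook: one call, one tab.
single_path = out_dir / "out.xlsx"
prices.to_excel(single_path, sheet_name="prices", index=False)

# Multi-sheet workbook: open the writer once, add one tab per ticker.
multi_path = out_dir / "by_ticker.xlsx"
with pd.ExcelWriter(multi_path, engine="openpyxl") as writer:
    for ticker, group in prices.groupby("Ticker"):
        group.to_excel(writer, sheet_name=ticker, index=False)

# Read the tab names back to confirm what was written.
with pd.ExcelFile(multi_path) as workbook:
    sheet_names = workbook.sheet_names

print(sheet_names)
```

Passing `index=False` keeps the pandas row index out of the sheet, which is usually what a spreadsheet reader expects.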
What You'll Build:
- `writeout.py` — load the cached prices, build a leaderboard summary across all 14 tickers, then write it out four ways: single-sheet Excel, multi-sheet Excel (one tab per ticker), Parquet snapshot, and DuckDB database with a SQL query at the end.
- The named-aggregation pattern: `.agg(First=("Adj Close","first"), Last=("Adj Close","last"), AvgVol=("Volume","mean"))` — three columns folded with their own functions in one call.
- `pd.ExcelWriter` as a context manager — repeated `.to_excel(writer, sheet_name=...)` calls add tabs. The "send the analyst a workbook, they click through" workflow.
- `df.to_parquet(path)` — the analyst-to-analyst archive format. Faster than CSV, smaller than CSV, preserves dtypes.
- DuckDB integration: `duckdb.connect()` to a file, `CREATE TABLE prices AS SELECT * FROM df` lifts a pandas DataFrame straight into SQL. Query it back with `.df()` to return a DataFrame. The bridge between the pandas world and the SQL world.
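The named-aggregation step in the list above can be sketched as follows. The sample rows are invented, and the `Multiple` column (last price divided by first) is an assumption about how a leaderboard like this might be ranked, not something confirmed by the episode.

```python
import pandas as pd

# Hypothetical price history for two made-up tickers.
prices = pd.DataFrame(
    {
        "Ticker": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
        "Adj Close": [10.0, 12.0, 15.0, 40.0, 30.0, 20.0],
        "Volume": [100, 200, 300, 50, 50, 80],
    }
)

# Named aggregation: each keyword becomes a flat output column,
# defined as (source column, aggregation function).
summary = prices.groupby("Ticker").agg(
    First=("Adj Close", "first"),
    Last=("Adj Close", "last"),
    AvgVol=("Volume", "mean"),
)

# Assumed ranking metric: how many times over the price moved, first to last.
summary["Multiple"] = summary["Last"] / summary["First"]
leaderboard = summary.sort_values("Multiple", ascending=False)

print(leaderboard)
```

Because the output columns are already flat, the result can go straight to `to_excel` or `to_parquet` without any column renaming.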
Timestamps:
0:00 - Intro — Episode 13 starts here
0:21 - Preview — match format to audience
1:06 - Open nvim, write writeout.py
1:14 - Display options + read parquet
1:30 - Summary leaderboard (named aggregations)
2:00 - Single-sheet Excel
2:11 - Multi-sheet Excel via ExcelWriter
2:36 - Parquet snapshot
2:46 - DuckDB connect + CREATE TABLE
3:07 - SQL query + return as DataFrame
3:18 - Save and run
3:24 - Leaderboard output (NVDA +37x, TSLA +20x)
4:02 - Recap
4:48 - End screen
Key Takeaways:
1. **`pd.ExcelWriter` as a context manager is the cleanest multi-sheet pattern.** `with pd.ExcelWriter(path, engine="openpyxl") as writer:` opens the workbook once. Inside, repeated `.to_excel(writer, sheet_name=...)` calls add tabs. When the with-block exits, the workbook is saved and the file handle is released, even if your loop raises mid-way, so nothing leaks. The `engine` argument matters: both `openpyxl` and `xlsxwriter` write modern `.xlsx` files, but `xlsxwriter` is the one to reach for if you need fancier formatting.
2. **Parquet beats CSV for everything except humans.** Columnar means a query for one column reads only that column from disk. Compression is on by default (Snappy in pandas) and lossless. dtypes are preserved (no string "2024-01-15" that needs re-parsing into a date). Read-back is often 10-50x faster than CSV, depending on file size and column types. The downside: you can't open it in a text editor. Use Parquet for everything you'll process programmatically, CSV only for things humans will eyeball.
3. **DuckDB turns pandas into SQL with one line.** `duckdb.connect("file.duckdb")` opens a persistent on-disk database. `con.execute("CREATE TABLE prices AS SELECT * FROM df")` copies the DataFrame into a SQL table stored in that file; DuckDB resolves the name `df` by looking up the Python variable in scope. Then any SQL query works against `prices`, and `.df()` returns the result as a DataFrame. No server, no Postgres, no setup. The Parquet-to-SQL-to-pandas-to-Excel pipeline becomes one Python file.
4. **Named aggregations beat the dict-of-lists style.** `.agg(First=("Adj Close","first"), Last=("Adj Close","last"))` is more readable than `.agg({"Adj Close": ["first","last"]})` and produces flat column names. The dict-of-lists form returns multi-level columns that you have to flatten afterwards; the named form does not, so your downstream code is simpler.
5. **Format matches audience.** Excel for finance teams who live in tabs. Parquet for the next pandas script. DuckDB when you want SQL on the same file. CSV only when an external system requires it. This is a real engineering decision — pick the format that makes the next step easiest.
This channel is run by Claude AI. Tutorials AI-produced; reviewed and published by Codegiz. Source code at codegiz.com.
#Pandas #Python #Finance #Excel #Parquet #DuckDB #DataAnalytics #PythonForFinance #LearnPandas #ClaudeAI