Pandas for Finance: Writing Out: Excel, Parquet, DuckDB — One Pipeline, Four Formats

Celest Kim

Video: Writing Out: Excel, Parquet, DuckDB — One Pipeline, Four Formats | Pandas for Finance Ep13 by CelesteAI


df.to_excel("out.xlsx", sheet_name="...") — single sheet. pd.ExcelWriter for multi-sheet. df.to_parquet(path) — fast columnar storage. duckdb.execute("CREATE TABLE prices AS FROM df") — persistent SQL DB. The output stage of every analyst pipeline.

You’ve cleaned, joined, computed. Now hand the result to the next person — or future-you. Match the format to the audience.

Single-sheet Excel

import pandas as pd

df = pd.read_parquet("data/prices.parquet")
summary = (
  df.groupby("Ticker")["Adj Close"]
    .agg(["first", "last", "min", "max", "mean"])
    .round(2)
)

summary.to_excel("summary.xlsx", sheet_name="Stats")

.to_excel(path, sheet_name="...") writes a single tab. The index becomes the leftmost column unless you pass index=False.

For sharing with finance teams, Excel is the universal hand-off format. Pandas → openpyxl → .xlsx.

Multi-sheet Excel

with pd.ExcelWriter("report.xlsx", engine="openpyxl") as writer:
  for ticker in df["Ticker"].unique():
    sub = df[df["Ticker"] == ticker].sort_values("Date")
    sub.to_excel(writer, sheet_name=ticker, index=False)

pd.ExcelWriter is a context manager. Inside, repeated .to_excel(writer, sheet_name=...) calls add tabs.

For our 14-ticker universe: one tab per ticker. The “send the analyst the data, they’ll click the tabs” workflow.

Excel formatting

For real reports, use the writer’s formatting:

with pd.ExcelWriter("formatted.xlsx", engine="xlsxwriter") as writer:
  summary.to_excel(writer, sheet_name="Summary")
  workbook = writer.book
  worksheet = writer.sheets["Summary"]

  # Currency format on columns B through F
  money = workbook.add_format({"num_format": "$#,##0.00"})
  worksheet.set_column("B:F", 14, money)

  # Header bold
  header = workbook.add_format({"bold": True, "bg_color": "#D9E1F2"})
  worksheet.set_row(0, None, header)

xlsxwriter is faster than openpyxl for new files; use it when starting fresh. openpyxl is needed for editing existing files.

For frequent reports, write a wrapper function once, reuse.
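One possible shape for such a wrapper — the function name, column widths, and format choices here are illustrative, not a fixed recipe:

```python
import pandas as pd

def write_report(df: pd.DataFrame, path: str, sheet_name: str = "Summary") -> None:
  """Write df to an xlsx file with a currency format on the data columns.

  Hypothetical helper -- adjust the formats to your report's columns.
  """
  with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name=sheet_name)
    workbook = writer.book
    worksheet = writer.sheets[sheet_name]
    money = workbook.add_format({"num_format": "$#,##0.00"})
    # Column 0 holds the index; format the data columns after it
    worksheet.set_column(1, len(df.columns), 14, money)
```

Then every report is a one-liner: `write_report(summary, "weekly.xlsx")`.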

Parquet

df.to_parquet("output.parquet")
df.to_parquet("output.parquet.gz", compression="gzip")

Default compression is snappy — fast, decent ratio. gzip for max compression at slower read.

Index handling:

df.to_parquet("out.parquet")              # writes index as a column
df.to_parquet("out.parquet", index=False) # drops the index

For DataFrames with a meaningful index (DatetimeIndex), keep it.

Partitioned Parquet

df.to_parquet("data/prices/", partition_cols=["Ticker"])

Creates data/prices/Ticker=AAPL/, data/prices/Ticker=MSFT/, etc. Each partition is a separate directory containing its own parquet file(s). Tools that read parquet (DuckDB, Polars, Spark) can read just the partitions they need.

aapl = pd.read_parquet("data/prices/", filters=[("Ticker", "==", "AAPL")])

For our universe, single-file is fine. Partition when you have millions of rows or many tickers.

DuckDB persistent storage

import duckdb

con = duckdb.connect("market.duckdb")
con.execute("CREATE OR REPLACE TABLE prices AS FROM read_parquet('data/prices.parquet')")
con.execute("CREATE INDEX idx_ticker ON prices(Ticker)")
con.close()

DuckDB stores everything in market.duckdb — a single file, like SQLite for analytics. Reopen later:

import duckdb
con = duckdb.connect("market.duckdb")
df = con.execute("SELECT * FROM prices WHERE Ticker = 'AAPL'").df()

.df() materializes results as a pandas DataFrame. For repeated queries or in-place updates, a persistent DuckDB file beats parquet-only: the data already sits in DuckDB's native format, and indexes (like idx_ticker above) speed up selective lookups.

CSV (when you must)

df.to_csv("out.csv", index=False)
df.to_csv("out.csv.gz", index=False, compression="gzip")

Avoid for analysis files. Use for handoff to non-pandas tools (R, Stata, old systems).

df = pd.read_csv("out.csv", parse_dates=["Date"])

parse_dates= to recover datetime columns. Without it, dates come back as strings.
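A quick round-trip check makes the difference visible (data and file name are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
  "Date": pd.to_datetime(["2024-01-02", "2024-01-03"]),
  "Close": [185.6, 184.3],
})
df.to_csv("out.csv", index=False)

back_raw = pd.read_csv("out.csv")
print(back_raw["Date"].dtype)    # object -- plain strings

back = pd.read_csv("out.csv", parse_dates=["Date"])
print(back["Date"].dtype)        # datetime64[ns]
```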

JSON

# Records — list of dicts, each row a dict
df.to_json("out.json", orient="records", date_format="iso")

# Read back
df2 = pd.read_json("out.json", orient="records")

orient="records" is the most useful — easy to consume from JS, web APIs, etc. date_format="iso" for machine-readable timestamps.
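Loading the file with the standard-library json module confirms the record shape (data and file name are illustrative):

```python
import json
import pandas as pd

df = pd.DataFrame({"Ticker": ["AAPL"], "Close": [185.6]})
df.to_json("out.json", orient="records", date_format="iso")

# Each row is one dict -- exactly what a JS consumer expects
with open("out.json") as f:
  records = json.load(f)
print(records)
```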

A complete report writer

import pandas as pd

df = pd.read_parquet("data/prices.parquet")
sectors = pd.read_csv("data/sector_map.csv")

# Compute report sections
df["Date"] = pd.to_datetime(df["Date"])
df["Return"] = df.groupby("Ticker")["Adj Close"].pct_change()

summary = df.groupby("Ticker").agg(
  FirstClose=("Adj Close", "first"),
  LastClose=("Adj Close", "last"),
  AvgVolume=("Volume", "mean"),
).reset_index().merge(sectors, on="Ticker")  # reset_index: merge needs Ticker as a column
summary["TotalReturn"] = summary["LastClose"] / summary["FirstClose"] - 1
summary = summary.sort_values("TotalReturn", ascending=False).round(4)

monthly = (
  df.set_index("Date")
    .groupby("Ticker")["Adj Close"]
    .resample("ME").last()
    .unstack("Ticker")
)

sector_perf = (
  df.merge(sectors, on="Ticker")
    .groupby("Sector")["Return"]
    .agg(lambda r: (1 + r.mean()) ** 252 - 1)
    .sort_values(ascending=False)
    .round(4)
)

# Write multi-sheet report
with pd.ExcelWriter("market_report.xlsx", engine="openpyxl") as writer:
  summary.to_excel(writer, sheet_name="Summary", index=False)
  monthly.to_excel(writer, sheet_name="Monthly Close")
  sector_perf.to_excel(writer, sheet_name="Sector Performance")

print("Wrote market_report.xlsx")

# Also write parquet for downstream Python consumers
df.to_parquet("data/clean_prices.parquet")
print("Wrote data/clean_prices.parquet")

Two outputs: Excel for humans, Parquet for the next script. Matches how real teams operate.

Try it

writeout.py:

import pandas as pd

pd.set_option("display.float_format", lambda x: f"{x:,.2f}")

df = pd.read_parquet("data/prices.parquet")
df["Date"] = pd.to_datetime(df["Date"])

summary = (
  df.groupby("Ticker")
    .agg(
      First=("Adj Close", "first"),
      Last=("Adj Close", "last"),
      AvgVol=("Volume", "mean"),
    )
    .round(2)
)
summary["TotalReturn"] = summary["Last"] / summary["First"] - 1

print("=== Summary ===")
print(summary)

print("\n=== Writing multi-sheet Excel ===")
with pd.ExcelWriter("/tmp/p4f_report.xlsx", engine="openpyxl") as writer:
  summary.to_excel(writer, sheet_name="Summary")
  for ticker in df["Ticker"].unique()[:3]:    # first 3 tickers
    sub = df[df["Ticker"] == ticker].sort_values("Date")
    sub.to_excel(writer, sheet_name=ticker, index=False)
print("  → /tmp/p4f_report.xlsx")

print("\n=== Writing parquet ===")
df.to_parquet("/tmp/p4f_clean.parquet")
print("  → /tmp/p4f_clean.parquet")

print("\n=== Sanity: reload parquet ===")
back = pd.read_parquet("/tmp/p4f_clean.parquet")
print(f"Reloaded {len(back):,} rows")

Common stumbles

openpyxl missing. pip install openpyxl for Excel. pip install xlsxwriter for the alternative engine.

to_excel is slow. For 100k+ rows, expect seconds. Save parquet for analysis; Excel only for final reports.

Index in CSV. Default to_csv writes the index. Use index=False for clean Excel-friendly files.

Date round-trip in CSV. Dates become strings in CSV. pd.read_csv(..., parse_dates=["Date"]) to recover.

Parquet engine differences. pyarrow (default) and fastparquet produce slightly different files. Stick with pyarrow — the modern standard.

Multi-sheet write order. Sheets appear in the order you write them. Plan accordingly.

xlsxwriter can’t edit existing. It creates new files only. For editing, use openpyxl with mode="a" (append).
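A sketch of appending a tab to an existing workbook — the base file is created here just so the example runs standalone, and the path and sheet names are illustrative:

```python
import pandas as pd

# Start from an existing workbook (created here for the example)
pd.DataFrame({"Ticker": ["AAPL"]}).to_excel(
  "report.xlsx", sheet_name="Summary", index=False
)

extra = pd.DataFrame({"Note": ["Generated by the nightly job"]})

# mode="a" opens the existing file; if_sheet_exists="replace" overwrites
# a same-named tab instead of raising ValueError
with pd.ExcelWriter("report.xlsx", engine="openpyxl",
                    mode="a", if_sheet_exists="replace") as writer:
  extra.to_excel(writer, sheet_name="Notes", index=False)
```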

What’s next

Ep 14: build a portfolio backtest — equal-weight, monthly rebalanced, vs SPY benchmark.

Recap

Excel for humans (pd.ExcelWriter for multi-sheet). Parquet for analysis (fast, columnar, typed). DuckDB for SQL workloads (persistent file like SQLite). CSV only for hand-off to non-pandas tools. Always index=False for CSV unless the index is meaningful. For dates in CSV, round-trip with parse_dates=. For huge parquet, partition by frequently-filtered column (partition_cols=["Ticker"]).

Next episode: portfolio backtest.
