Part of Pandas for Finance

Pandas for Finance: Capstone Backtest: Equal-Weight Portfolio vs SPY in Pandas

Celest KimCelest Kim

Video: Capstone Backtest: Equal-Weight Portfolio vs SPY in Pandas | Pandas for Finance Ep14 by CelesteAI

Take the quiz on the full lesson page
Test what you've read · interactive walkthrough

Equal-weight 5 stocks, monthly rebalanced. Compute the wealth curve, Sharpe, max drawdown, vs SPY. The “I want to learn pandas to build a backtester” moment — assembled from everything in the previous 13 episodes.

This is the capstone: load → clean → compute returns → build a portfolio → benchmark → report. Real, end-to-end.

The setup

import pandas as pd

PORTFOLIO = ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA"]
BENCHMARK = "SPY"
RF = 0.04   # risk-free rate, for Sharpe

df = pd.read_parquet("data/prices.parquet")
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values(["Ticker", "Date"])

Five mega-caps. Equal weight (20% each). Monthly rebalance back to equal weight.

Pivot to wide

For portfolio math, wide format is easier — one column per ticker:

wide = (
  df[df["Ticker"].isin(PORTFOLIO + [BENCHMARK])]
    .pivot(index="Date", columns="Ticker", values="Adj Close")
    .ffill()
)
print(wide.tail(3))
Ticker         AAPL    AMZN   GOOGL    MSFT    NVDA      SPY
Date
2024-12-30   252.20  219.39  191.59  416.76  134.29   586.08
2024-12-31   250.42  219.39  189.30  421.50  134.29   586.08

Each column is a ticker; index is date.

Daily returns

returns = wide.pct_change().dropna()
print(returns.head())

pct_change() works on each column. Drop the first row (NaN).

Equal-weight portfolio

weights = pd.Series(1 / len(PORTFOLIO), index=PORTFOLIO)
print(weights)
# AAPL    0.2
# AMZN    0.2
# GOOGL   0.2
# MSFT    0.2
# NVDA    0.2

portfolio_returns = (returns[PORTFOLIO] * weights).sum(axis=1)

Multiply each column by its weight (broadcasts), sum across columns. The portfolio’s daily return — still in long-only, no leverage.

Add the benchmark

benchmark_returns = returns[BENCHMARK]

combined = pd.DataFrame({
  "Portfolio": portfolio_returns,
  "Benchmark": benchmark_returns,
})

print(combined.head())

Two return streams aligned by date.

Wealth curves

combined["PortfolioWealth"] = (1 + combined["Portfolio"]).cumprod()
combined["BenchmarkWealth"] = (1 + combined["Benchmark"]).cumprod()

print(combined[["PortfolioWealth", "BenchmarkWealth"]].iloc[[0, -1]])
Date
2018-01-04   1.0000  1.0000
2024-12-31   5.7124  2.0892

Five-bagger vs roughly doubling. Dramatic — but heavily weighted toward the post-2019 tech rally.

Performance metrics

def stats(returns: pd.Series, rf: float = 0.04) -> dict:
  mu = returns.mean()
  sigma = returns.std()
  ann_return = (1 + mu) ** 252 - 1
  ann_vol = sigma * (252 ** 0.5)
  sharpe = (ann_return - rf) / ann_vol if ann_vol > 0 else float("nan")

  wealth = (1 + returns.fillna(0)).cumprod()
  peak = wealth.expanding().max()
  drawdown = wealth / peak - 1
  max_dd = drawdown.min()

  return {
    "AnnReturn": ann_return,
    "AnnVol": ann_vol,
    "Sharpe": sharpe,
    "MaxDD": max_dd,
  }


port_stats = stats(combined["Portfolio"], rf=RF)
bench_stats = stats(combined["Benchmark"], rf=RF)

report = pd.DataFrame({"Portfolio": port_stats, "Benchmark": bench_stats}).round(4)
print(report)
            Portfolio  Benchmark
AnnReturn      0.2876     0.1132
AnnVol         0.2455     0.1820
Sharpe         1.01       0.40
MaxDD         -0.3540    -0.3392

Portfolio crushed the benchmark on return, with ~35% more vol — Sharpe still wins. Max drawdown ~35% in both (the 2022 bear).

Monthly rebalancing

Daily rebalancing back to equal weight is unrealistic. Real portfolios rebalance monthly or quarterly:

# Resample to month-end
monthly_close = wide[PORTFOLIO].resample("ME").last()
monthly_returns_per_stock = monthly_close.pct_change().dropna()

# Each month, equal weights
monthly_port_return = (monthly_returns_per_stock * weights).sum(axis=1)

monthly_wealth = (1 + monthly_port_return).cumprod()
print(monthly_wealth.tail())

Result is a monthly-rebalanced portfolio: at each month-end, sell winners, buy losers, return to 20% each.

In practice, monthly rebalancing modestly underperforms holding (in trending markets) and modestly outperforms in mean-reverting markets. For our 7 years, it’s roughly the same.

Buy-and-hold (no rebalance)

# Each stock's wealth curve, equally-weighted on day 1 only
initial_value = 1 / len(PORTFOLIO)
stock_wealth = wide[PORTFOLIO].divide(wide[PORTFOLIO].iloc[0])
buy_hold = stock_wealth.sum(axis=1) * initial_value
print(buy_hold.tail())

Day 1: each stock gets 20% of $1. Then let it ride. Tickers that grew faster end up larger weights (NVDA dominates by 2024).

Compare to equal-weight monthly:

print("Buy-and-hold final:", buy_hold.iloc[-1])
print("Monthly EW final:", (1 + monthly_port_return).cumprod().iloc[-1])

Per-ticker contribution

Where did the returns come from?

contributions = stock_wealth.iloc[-1] - 1
contributions = contributions / contributions.sum()
print((contributions * 100).round(1))

Spoiler: NVDA, then AAPL, then MSFT.

The full backtest in 50 lines

import pandas as pd

PORTFOLIO = ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA"]
BENCHMARK = "SPY"
RF = 0.04

# Load + pivot
df = pd.read_parquet("data/prices.parquet")
df["Date"] = pd.to_datetime(df["Date"])
wide = (
  df[df["Ticker"].isin(PORTFOLIO + [BENCHMARK])]
    .pivot(index="Date", columns="Ticker", values="Adj Close")
    .ffill()
)

# Returns
returns = wide.pct_change().dropna()

# Equal-weight portfolio
weights = pd.Series(1 / len(PORTFOLIO), index=PORTFOLIO)
port_ret = (returns[PORTFOLIO] * weights).sum(axis=1)
bench_ret = returns[BENCHMARK]


def stats(r, rf=RF):
  mu, sd = r.mean(), r.std()
  ann_r = (1 + mu) ** 252 - 1
  ann_v = sd * (252 ** 0.5)
  wealth = (1 + r.fillna(0)).cumprod()
  dd = (wealth / wealth.expanding().max() - 1).min()
  return {
    "AnnReturn": f"{ann_r:.2%}",
    "AnnVol":    f"{ann_v:.2%}",
    "Sharpe":    f"{(ann_r - rf) / ann_v:.2f}",
    "MaxDD":     f"{dd:.2%}",
  }


report = pd.DataFrame({
  "Portfolio": stats(port_ret),
  "Benchmark (SPY)": stats(bench_ret),
})
print(report)

# Wealth curves to CSV for plotting
out = pd.DataFrame({
  "Portfolio": (1 + port_ret).cumprod(),
  "Benchmark": (1 + bench_ret).cumprod(),
})
out.to_csv("backtest_wealth.csv")
print("Wrote backtest_wealth.csv")

End-to-end. Load, pivot, weight, aggregate, report, save. The 50 lines that replace 200 lines of Excel and 50 SQL queries.

Try it

backtest.py:

import pandas as pd

PORTFOLIO = ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA"]
BENCHMARK = "SPY"
RF = 0.04

df = pd.read_parquet("data/prices.parquet")
df["Date"] = pd.to_datetime(df["Date"])
wide = (
  df[df["Ticker"].isin(PORTFOLIO + [BENCHMARK])]
    .pivot(index="Date", columns="Ticker", values="Adj Close")
    .ffill()
)
returns = wide.pct_change().dropna()

weights = pd.Series(1 / len(PORTFOLIO), index=PORTFOLIO)
port_ret = (returns[PORTFOLIO] * weights).sum(axis=1)
bench_ret = returns[BENCHMARK]


def stats(r, rf=RF):
  mu, sd = r.mean(), r.std()
  ann_r = (1 + mu) ** 252 - 1
  ann_v = sd * (252 ** 0.5)
  wealth = (1 + r.fillna(0)).cumprod()
  dd = (wealth / wealth.expanding().max() - 1).min()
  return {
    "AnnReturn": f"{ann_r:.2%}",
    "AnnVol":    f"{ann_v:.2%}",
    "Sharpe":    f"{(ann_r - rf) / ann_v:.2f}",
    "MaxDD":     f"{dd:.2%}",
  }


report = pd.DataFrame({
  "Portfolio": stats(port_ret),
  "Benchmark (SPY)": stats(bench_ret),
})
print(report)

out = pd.DataFrame({
  "Portfolio": (1 + port_ret).cumprod(),
  "Benchmark": (1 + bench_ret).cumprod(),
})
out.to_csv("/tmp/backtest_wealth.csv")
print("Wrote /tmp/backtest_wealth.csv")

Common stumbles

Look-ahead bias. Using day N’s close to “buy at day N’s open” — impossible IRL. Lag execution by one day if simulating fills.

Survivorship bias. Our universe excludes tickers that delisted/went bankrupt. Real backtests need a survivor-free universe.

Ignoring transaction costs. Daily rebalancing in the real world costs basis points. Add a fee per turnover.

Pivot misalignment. If a ticker doesn’t trade on a date, it’s NaN in wide. ffill() handles this — without it, your portfolio return becomes NaN.

Annualizing wrong. mu * 252 for simple returns is approximate; geometrically correct: (1 + mu) ** 252 - 1.

Starting from 0% return on day 1. First row’s pct_change is NaN. dropna() or fillna(0).

Forgetting the benchmark. Always compare to a passive benchmark. Outperforming SPY by 2 points means you earned alpha; matching it means you got lucky and paid fees.

What’s next

Ep 15: pandas + DuckDB at scale. When pandas slows on big universes, hand off SQL to DuckDB.

Recap

End-to-end backtest in pandas: load → pivot wide → daily returns → weighted portfolio return → wealth curve → Sharpe + drawdown. (returns[tickers] * weights).sum(axis=1) for daily portfolio return. (1 + r).cumprod() for wealth. wealth / wealth.expanding().max() - 1 for drawdown. Always benchmark against a passive ETF (SPY, ACWI, AGG) — beating the market is the only meaningful claim.

Next episode: pandas + DuckDB at scale.

Ready? Take the quiz on the full lesson page →
Test what you've learned. Watch the lesson and try the interactive quiz on the same page.