Pandas for Finance: Why Pandas When You Have Excel: 5 Years of Apple in 5 Lines
Video: Why Pandas When You Have Excel: 5 Years of Apple in 5 Lines | Pandas for Finance Ep1 by CelesteAI
If you can already query SQL and pivot tables in Excel, pandas is the next mile. Three lines:
import yfinance as yf→df = yf.download("AAPL", start="2020-01-01")→df.head(). That’s 5 years of Apple price history in your terminal — ready to filter, group, and plot.
You’re a finance analyst. You know Excel. Maybe SQL too. This series is the bridge to Python’s data toolkit, with real market data as the running example.
What pandas gives you that Excel doesn’t
| Need | Excel | Pandas |
|---|---|---|
| Open 2 million rows | crashes | instant |
| Reproducible from a CSV | screenshot the steps | run a script |
| 30-day moving average across 50 stocks | hand-build, fragile | one line |
| Glue together stock prices, sectors, fundamentals | Power Query if you’re patient | merge |
| Version-control your analysis | … | git |
Pandas is a Python library. You write small scripts that read data, transform it, and print or plot results. The first script is 5 lines.
Setup
You need Python 3.11+ and a few packages.
python3 -m venv .venv
source .venv/bin/activate
pip install pandas yfinance duckdb pyarrow matplotlib
That’s it. yfinance downloads stock data straight from Yahoo Finance, free and no API key.
Your first script
Open a terminal in a fresh directory. Write a Python file:
nvim first.py
Type:
import yfinance as yf
df = yf.download("AAPL", start="2020-01-01", end="2025-01-01")
print(df.head())
print(df.shape)
Save (:wq), then run:
python first.py
Output (truncated):
Open High Low Close Adj Close Volume
Date
2020-01-02 74.060 75.150 73.797 75.087 72.876 135480400
2020-01-03 74.287 75.144 74.125 74.357 72.167 146322800
2020-01-06 73.447 74.989 73.187 74.949 72.741 118387200
2020-01-07 74.959 75.224 74.370 74.597 72.399 108872000
2020-01-08 74.290 76.110 74.290 75.797 73.564 132079200
(1257, 6)
Five years of daily Apple bars. (1257, 6) is the shape — 1,257 rows, 6 columns. That’s it; you’ve loaded a market dataset.
What just happened
import yfinance as yf
Loads the yfinance library and gives it the short name yf. Standard convention.
df = yf.download("AAPL", start="2020-01-01", end="2025-01-01")
yf.download(...) fetches the data and returns a DataFrame — the pandas equivalent of an Excel sheet, named df by long convention.
The DataFrame has:
- Index: Date (the row labels — auto-set to the trading dates).
- Columns: Open, High, Low, Close, Adj Close, Volume.
- 1,257 rows of trading days.
print(df.head())
.head() returns the first 5 rows. There’s also .tail() (last 5), .sample(5) (random 5).
print(df.shape)
.shape is a (rows, cols) tuple. Quick way to confirm “yes I got the data I expected.”
Cleaner output
The default DataFrame print can be wide. Add a couple of display options at the top of the script:
import pandas as pd
import yfinance as yf
pd.set_option("display.max_columns", 10)
pd.set_option("display.width", 120)
pd.set_option("display.float_format", lambda x: f"{x:,.2f}")
df = yf.download("AAPL", start="2020-01-01", end="2025-01-01")
print(df.head())
Now floats print with two decimals and commas. Set once in any script that prints DataFrames.
A first calculation
Still in first.py:
print(df["Close"].max()) # all-time high close in this window
print(df["Close"].min()) # low
print(df["Close"].mean()) # average
Output:
$258.39
$56.09
$155.27
df["Close"] selects one column — a 1D Series. .max(), .min(), .mean() return scalars.
What was the highest day? Index of the max:
print(df["Close"].idxmax())
# 2024-12-26 00:00:00
That’s a Date — pandas understands the index is dates.
Why scripts and not Jupyter?
You’ll see most pandas tutorials use Jupyter notebooks. They’re great for exploration. But for this series:
- Scripts are easier to version-control (git diffs work).
- Scripts are easier to run on a schedule (cron, GitHub Actions).
- Scripts force you to think about the whole flow in one file.
- Print statements are enough — you don’t need cells to inspect.
You can copy any script from this series, drop it in a notebook, and it’ll work. We’re keeping the runtime simple.
What’s coming
Each episode is a small standalone script that builds one new skill:
- Ep 2 — the DataFrame mental model: what is the index, what’s a column, what’s a Series.
- Ep 3 — caching to parquet so you stop re-downloading.
- Ep 4–5 — filtering rows, picking columns.
- Ep 6 — daily and cumulative returns.
- Ep 7–8 — groupby and joins.
- Ep 9–11 — dates, resampling, rolling windows (the time-series superpowers).
- Ep 12–13 — cleaning, exporting.
- Ep 14 — a full backtest: equal-weight portfolio vs SPY.
- Ep 15 — pandas + DuckDB for big universes.
By the end you’ll be able to write a Monday-morning report script: download, clean, aggregate, save to Excel. The skill stack analysts ask for in 2026.
Common stumbles
pip install failing. macOS sometimes ships an old Python. python3.11 -m venv .venv (or whichever version you have) avoids the system Python entirely.
No module named 'yfinance'. You’re in the wrong terminal — venv not activated. source .venv/bin/activate before running.
Empty DataFrame. Yahoo sometimes throttles unauthenticated requests. Wait a minute and retry, or pin a period="5y" instead of date range.
Can’t see all columns. Set pd.set_option("display.max_columns", None) to print every column. The default is 20.
MultiIndex columns when downloading multiple tickers. yf.download(["AAPL","MSFT"]) returns columns like (Open, AAPL). We tackle this in Ep 2 — a different shape, same DataFrame.
What’s next
Ep 2: the DataFrame mental model. Index, columns, Series. Why the DataFrame is the analyst’s home in Python.
Recap
Install pandas + yfinance. Open nvim, write a 5-line script, run with python file.py. yf.download(ticker, start, end) returns a DataFrame; .head() shows first rows, .shape shows dimensions, df["Close"].max() etc. for column stats. Set display options once for clean output. Scripts, not notebooks — easier to version, schedule, share. The series builds toward a real Monday-morning analyst pipeline.
Next episode: the DataFrame mental model.