Part of Pandas for Finance

Pandas for Finance: Why Pandas When You Have Excel: 5 Years of Apple in 5 Lines

Celest KimCelest Kim

Video: Why Pandas When You Have Excel: 5 Years of Apple in 5 Lines | Pandas for Finance Ep1 by CelesteAI

Take the quiz on the full lesson page
Test what you've read · interactive walkthrough

If you can already query SQL and pivot tables in Excel, pandas is the next mile. Three lines: import yfinance as yfdf = yf.download("AAPL", start="2020-01-01")df.head(). That’s 5 years of Apple price history in your terminal — ready to filter, group, and plot.

You’re a finance analyst. You know Excel. Maybe SQL too. This series is the bridge to Python’s data toolkit, with real market data as the running example.

What pandas gives you that Excel doesn’t

Need Excel Pandas
Open 2 million rows crashes instant
Reproducible from a CSV screenshot the steps run a script
30-day moving average across 50 stocks hand-build, fragile one line
Glue together stock prices, sectors, fundamentals Power Query if you’re patient merge
Version-control your analysis git

Pandas is a Python library. You write small scripts that read data, transform it, and print or plot results. The first script is 5 lines.

Setup

You need Python 3.11+ and a few packages.

python3 -m venv .venv
source .venv/bin/activate
pip install pandas yfinance duckdb pyarrow matplotlib

That’s it. yfinance downloads stock data straight from Yahoo Finance, free and no API key.

Your first script

Open a terminal in a fresh directory. Write a Python file:

nvim first.py

Type:

import yfinance as yf

df = yf.download("AAPL", start="2020-01-01", end="2025-01-01")
print(df.head())
print(df.shape)

Save (:wq), then run:

python first.py

Output (truncated):

                  Open      High       Low     Close  Adj Close      Volume
Date
2020-01-02  74.060    75.150   73.797   75.087     72.876  135480400
2020-01-03  74.287    75.144   74.125   74.357     72.167  146322800
2020-01-06  73.447    74.989   73.187   74.949     72.741  118387200
2020-01-07  74.959    75.224   74.370   74.597     72.399  108872000
2020-01-08  74.290    76.110   74.290   75.797     73.564  132079200
(1257, 6)

Five years of daily Apple bars. (1257, 6) is the shape — 1,257 rows, 6 columns. That’s it; you’ve loaded a market dataset.

What just happened

import yfinance as yf

Loads the yfinance library and gives it the short name yf. Standard convention.

df = yf.download("AAPL", start="2020-01-01", end="2025-01-01")

yf.download(...) fetches the data and returns a DataFrame — the pandas equivalent of an Excel sheet, named df by long convention.

The DataFrame has: - Index: Date (the row labels — auto-set to the trading dates). - Columns: Open, High, Low, Close, Adj Close, Volume. - 1,257 rows of trading days.

print(df.head())

.head() returns the first 5 rows. There’s also .tail() (last 5), .sample(5) (random 5).

print(df.shape)

.shape is a (rows, cols) tuple. Quick way to confirm “yes I got the data I expected.”

Cleaner output

The default DataFrame print can be wide. Add a couple of display options at the top of the script:

import pandas as pd
import yfinance as yf

pd.set_option("display.max_columns", 10)
pd.set_option("display.width", 120)
pd.set_option("display.float_format", lambda x: f"{x:,.2f}")

df = yf.download("AAPL", start="2020-01-01", end="2025-01-01")
print(df.head())

Now floats print with two decimals and commas. Set once in any script that prints DataFrames.

A first calculation

Still in first.py:

print(df["Close"].max())   # all-time high close in this window
print(df["Close"].min())   # low
print(df["Close"].mean())  # average

Output:

$258.39
$56.09
$155.27

df["Close"] selects one column — a 1D Series. .max(), .min(), .mean() return scalars.

What was the highest day? Index of the max:

print(df["Close"].idxmax())
# 2024-12-26 00:00:00

That’s a Date — pandas understands the index is dates.

Why scripts and not Jupyter?

You’ll see most pandas tutorials use Jupyter notebooks. They’re great for exploration. But for this series:

  • Scripts are easier to version-control (git diffs work).
  • Scripts are easier to run on a schedule (cron, GitHub Actions).
  • Scripts force you to think about the whole flow in one file.
  • Print statements are enough — you don’t need cells to inspect.

You can copy any script from this series, drop it in a notebook, and it’ll work. We’re keeping the runtime simple.

What’s coming

Each episode is a small standalone script that builds one new skill:

  • Ep 2 — the DataFrame mental model: what is the index, what’s a column, what’s a Series.
  • Ep 3 — caching to parquet so you stop re-downloading.
  • Ep 4–5 — filtering rows, picking columns.
  • Ep 6 — daily and cumulative returns.
  • Ep 7–8 — groupby and joins.
  • Ep 9–11 — dates, resampling, rolling windows (the time-series superpowers).
  • Ep 12–13 — cleaning, exporting.
  • Ep 14 — a full backtest: equal-weight portfolio vs SPY.
  • Ep 15 — pandas + DuckDB for big universes.

By the end you’ll be able to write a Monday-morning report script: download, clean, aggregate, save to Excel. The skill stack analysts ask for in 2026.

Common stumbles

pip install failing. macOS sometimes ships an old Python. python3.11 -m venv .venv (or whichever version you have) avoids the system Python entirely.

No module named 'yfinance'. You’re in the wrong terminal — venv not activated. source .venv/bin/activate before running.

Empty DataFrame. Yahoo sometimes throttles unauthenticated requests. Wait a minute and retry, or pin a period="5y" instead of date range.

Can’t see all columns. Set pd.set_option("display.max_columns", None) to print every column. The default is 20.

MultiIndex columns when downloading multiple tickers. yf.download(["AAPL","MSFT"]) returns columns like (Open, AAPL). We tackle this in Ep 2 — a different shape, same DataFrame.

What’s next

Ep 2: the DataFrame mental model. Index, columns, Series. Why the DataFrame is the analyst’s home in Python.

Recap

Install pandas + yfinance. Open nvim, write a 5-line script, run with python file.py. yf.download(ticker, start, end) returns a DataFrame; .head() shows first rows, .shape shows dimensions, df["Close"].max() etc. for column stats. Set display options once for clean output. Scripts, not notebooks — easier to version, schedule, share. The series builds toward a real Monday-morning analyst pipeline.

Next episode: the DataFrame mental model.

Ready? Take the quiz on the full lesson page →
Test what you've learned. Watch the lesson and try the interactive quiz on the same page.