Excel to DataFrame in Python — pandas Tutorial

Celest Kim

•May 20, 2026

Video: Excel to DataFrame in Python — pandas Tutorial by CelesteAI

Take the quiz on the full lesson page

Test what you've read · interactive walkthrough

Most finance data lives in spreadsheets. Pricing sheets, contract terms, fund holdings, board decks. Excel is still the lingua franca of business data, and getting it into Python is the most common handoff every analyst has to solve.

The good news: pandas reads xlsx in one line. The output is a DataFrame — the same rows, the same columns, ready for everything that comes next.

The shape of the problem

Excel is great for capture and review. People type into cells, paste from web pages, share sheets back and forth. But once you need to join two datasets, run a rolling window, or aggregate by group, Excel starts to bend. Formulas slow down. Filters get stale. Pivot tables disagree.

That is the moment you reach for Python — and the very first line of every analysis script is the same: read the workbook into a DataFrame.

Method 1 — single sheet by name

The canonical pattern:

import pandas as pd

df = pd.read_excel("stock_summary.xlsx", sheet_name="Prices")
print(df.shape)   # (420, 8)

Two things to note:

sheet_name accepts either a name (a string) or a position (an integer, zero-indexed). Passing nothing defaults to the first sheet.
Pandas uses the openpyxl engine under the hood. It ships with most pandas installs. If you ever see ImportError: openpyxl is required, the fix is pip install openpyxl.

That’s it. One line in, one DataFrame out. Most analyst scripts never need more.

Method 2 — every sheet at once

What if the workbook has a dozen sheets and you want them all? Skip the loop. Pass sheet_name=None:

all_sheets = pd.read_excel("stock_summary.xlsx", sheet_name=None)
print({k: v.shape for k, v in all_sheets.items()})
# {'Prices': (420, 8), 'Tickers': (14, 2)}

The return type changes. Instead of one DataFrame, you get a dictionary keyed by sheet name. Each value is the DataFrame for that sheet. From there:

prices = all_sheets["Prices"]
tickers = all_sheets["Tickers"]
joined = prices.merge(tickers, on="Ticker")

This is the move that saves people the most code. Every for-loop-over-sheets pattern you’ve ever written is one keyword argument away from going away.

Method 3 — the ExcelFile context manager

When you need different slices of the same workbook — different columns from one sheet, different rows from another, parse-on-demand — wrapping the file in pd.ExcelFile is faster and cleaner:

with pd.ExcelFile("stock_summary.xlsx") as xl:
    prices = xl.parse("Prices")
    tickers = xl.parse("Tickers")

ExcelFile opens the workbook once. Each .parse() call reuses that open handle instead of cracking the zip and walking the XML again. On a small workbook the win is invisible. On a multi-megabyte book with twenty sheets, you’ll feel it.

The context manager (with ... as xl:) also guarantees the file closes cleanly when you’re done — useful in long-running scripts where you don’t want lingering file handles.

xl.parse() takes the same arguments as pd.read_excel: sheet_name, header, usecols, dtype, everything. Think of it as read_excel with the workbook pre-loaded.

When to use which

Need	Pattern
One sheet, one read	`pd.read_excel(file, sheet_name="Sheet1")`
Every sheet, into a dict	`pd.read_excel(file, sheet_name=None)`
Different slices of the same workbook	`with pd.ExcelFile(file) as xl: xl.parse(...)`
Repeated reads in a loop	`pd.ExcelFile` — open once, parse N times

What you’re not going to do

A few patterns that show up on Stack Overflow but you should avoid:

csv.reader then manual list-of-lists. Just use pandas. The CSV reader has its place, but for Excel it’s never the right answer.
openpyxl directly. It’s a low-level library — useful when you need cell-by-cell formatting, but overkill for reading data into a DataFrame.
Save as CSV then read_csv. Slower, lossy, and breaks every time someone re-opens the workbook and Excel changes a column type.
pd.read_excel in a for loop over sheet names. Use sheet_name=None or ExcelFile. Opening the file N times is wasted work.

The full demo

Here’s the complete script the video walks through:

import pandas as pd

FILE = "stock_summary.xlsx"

# Method 1: single sheet by name
df = pd.read_excel(FILE, sheet_name="Prices")
print(f"single sheet:         {df.shape}")

# Method 2: every sheet at once as a dict
all_sheets = pd.read_excel(FILE, sheet_name=None)
print(f"all sheets:           {[ (k, v.shape) for k, v in all_sheets.items() ]}")

# Method 3: ExcelFile context — read multiple sheets without re-opening
with pd.ExcelFile(FILE) as xl:
    prices = xl.parse("Prices")
    tickers = xl.parse("Tickers")
print(f"ExcelFile parsed:     prices={prices.shape}, tickers={tickers.shape}")

Running it:

single sheet:         (420, 8)
all sheets:           [('Prices', (420, 8)), ('Tickers', (14, 2))]
ExcelFile parsed:     prices=(420, 8), tickers=(14, 2)

Three patterns, one workbook, one library, same shape every time.

Takeaways

pd.read_excel is the canonical Excel-to-DataFrame in Python. One line per sheet. Works with any modern xlsx.
sheet_name=None returns every sheet as a dict. Skip every for-loop-over-sheets pattern.
pd.ExcelFile is the context manager for repeated reads. Open once, parse many times.
Don’t reach for openpyxl directly unless you need formatting. Pandas wraps it for the 95% case.
Excel is not your final layer. Read once, work in DataFrame.

Watch the video for the full demo, including the all-sheets-as-dict pattern and the ExcelFile context manager in action.

Ready? Take the quiz on the full lesson page →

Test what you've learned. Watch the lesson and try the interactive quiz on the same page.