Excel to DataFrame in Python — pandas Tutorial
Video: Excel to DataFrame in Python — pandas Tutorial by CelesteAI
Most finance data lives in spreadsheets. Pricing sheets, contract terms, fund holdings, board decks. Excel is still the lingua franca of business data, and getting it into Python is the most common handoff every analyst has to solve.
The good news: pandas reads xlsx in one line. The output is a DataFrame — the same rows, the same columns, ready for everything that comes next.
The shape of the problem
Excel is great for capture and review. People type into cells, paste from web pages, share sheets back and forth. But once you need to join two datasets, run a rolling window, or aggregate by group, Excel starts to bend. Formulas slow down. Filters get stale. Pivot tables disagree.
That is the moment you reach for Python — and the very first line of every analysis script is the same: read the workbook into a DataFrame.
Method 1 — single sheet by name
The canonical pattern:
import pandas as pd
df = pd.read_excel("stock_summary.xlsx", sheet_name="Prices")
print(df.shape) # (420, 8)
Two things to note:
sheet_nameaccepts either a name (a string) or a position (an integer, zero-indexed). Passing nothing defaults to the first sheet.- Pandas uses the
openpyxlengine under the hood. It ships with most pandas installs. If you ever seeImportError: openpyxl is required, the fix ispip install openpyxl.
That’s it. One line in, one DataFrame out. Most analyst scripts never need more.
Method 2 — every sheet at once
What if the workbook has a dozen sheets and you want them all? Skip the loop. Pass sheet_name=None:
all_sheets = pd.read_excel("stock_summary.xlsx", sheet_name=None)
print({k: v.shape for k, v in all_sheets.items()})
# {'Prices': (420, 8), 'Tickers': (14, 2)}
The return type changes. Instead of one DataFrame, you get a dictionary keyed by sheet name. Each value is the DataFrame for that sheet. From there:
prices = all_sheets["Prices"]
tickers = all_sheets["Tickers"]
joined = prices.merge(tickers, on="Ticker")
This is the move that saves people the most code. Every for-loop-over-sheets pattern you’ve ever written is one keyword argument away from going away.
Method 3 — the ExcelFile context manager
When you need different slices of the same workbook — different columns from one sheet, different rows from another, parse-on-demand — wrapping the file in pd.ExcelFile is faster and cleaner:
with pd.ExcelFile("stock_summary.xlsx") as xl:
prices = xl.parse("Prices")
tickers = xl.parse("Tickers")
ExcelFile opens the workbook once. Each .parse() call reuses that open handle instead of cracking the zip and walking the XML again. On a small workbook the win is invisible. On a multi-megabyte book with twenty sheets, you’ll feel it.
The context manager (with ... as xl:) also guarantees the file closes cleanly when you’re done — useful in long-running scripts where you don’t want lingering file handles.
xl.parse() takes the same arguments as pd.read_excel: sheet_name, header, usecols, dtype, everything. Think of it as read_excel with the workbook pre-loaded.
When to use which
| Need | Pattern |
|---|---|
| One sheet, one read | pd.read_excel(file, sheet_name="Sheet1") |
| Every sheet, into a dict | pd.read_excel(file, sheet_name=None) |
| Different slices of the same workbook | with pd.ExcelFile(file) as xl: xl.parse(...) |
| Repeated reads in a loop | pd.ExcelFile — open once, parse N times |
What you’re not going to do
A few patterns that show up on Stack Overflow but you should avoid:
csv.readerthen manual list-of-lists. Just use pandas. The CSV reader has its place, but for Excel it’s never the right answer.openpyxldirectly. It’s a low-level library — useful when you need cell-by-cell formatting, but overkill for reading data into a DataFrame.- Save as CSV then
read_csv. Slower, lossy, and breaks every time someone re-opens the workbook and Excel changes a column type. pd.read_excelin aforloop over sheet names. Usesheet_name=NoneorExcelFile. Opening the file N times is wasted work.
The full demo
Here’s the complete script the video walks through:
import pandas as pd
FILE = "stock_summary.xlsx"
# Method 1: single sheet by name
df = pd.read_excel(FILE, sheet_name="Prices")
print(f"single sheet: {df.shape}")
# Method 2: every sheet at once as a dict
all_sheets = pd.read_excel(FILE, sheet_name=None)
print(f"all sheets: {[ (k, v.shape) for k, v in all_sheets.items() ]}")
# Method 3: ExcelFile context — read multiple sheets without re-opening
with pd.ExcelFile(FILE) as xl:
prices = xl.parse("Prices")
tickers = xl.parse("Tickers")
print(f"ExcelFile parsed: prices={prices.shape}, tickers={tickers.shape}")
Running it:
single sheet: (420, 8)
all sheets: [('Prices', (420, 8)), ('Tickers', (14, 2))]
ExcelFile parsed: prices=(420, 8), tickers=(14, 2)
Three patterns, one workbook, one library, same shape every time.
Takeaways
pd.read_excelis the canonical Excel-to-DataFrame in Python. One line per sheet. Works with any modern xlsx.sheet_name=Nonereturns every sheet as a dict. Skip every for-loop-over-sheets pattern.pd.ExcelFileis the context manager for repeated reads. Open once, parse many times.- Don’t reach for
openpyxldirectly unless you need formatting. Pandas wraps it for the 95% case. - Excel is not your final layer. Read once, work in DataFrame.
Watch the video for the full demo, including the all-sheets-as-dict pattern and the ExcelFile context manager in action.