Back to Blog

How to Read SQL into a DataFrame in Python

Celest KimCelest Kim

Video: How to Read SQL into a DataFrame in Python — Tutorial by CelesteAI

Take the quiz on the full lesson page
Test what you've read · interactive walkthrough

Three lines. import sqlite3conn = sqlite3.connect("prices.db")pd.read_sql("SELECT * FROM prices WHERE Close > 200", conn). You now have a pandas DataFrame containing every row your SQL query returned. The same shape works with Polars via pl.read_database. The same shape works against Postgres, MySQL, BigQuery, and Snowflake — just swap the connection.

The single most common analyst pipeline starts with a SQL query and ends with a DataFrame. The database has the data; the SQL filters it; the DataFrame is where the real work happens. Python’s standard library and every major dataframe library make this a two-line operation.

What’s actually happening

A database client opens a connection. The connection talks SQL — you send a query string, the server runs it, the result rows come back as cursor records. Both pandas and Polars wrap that cursor and materialize the result as a DataFrame, handling type conversion and column naming automatically.

The dataframe library doesn’t speak SQL itself — it leans on the underlying driver (sqlite3, psycopg2, pymysql, pyarrow.flight). Your job is to pass it a connection and a query string. The library handles the rest.

Setup

python3 -m venv .venv
source .venv/bin/activate
pip install pandas polars connectorx

sqlite3 is in the standard library. connectorx is the fast Rust-based connector polars uses for URI-style reads — optional but worth installing.

For the demo we’ll use a SQLite database with a prices table — fourteen tickers, daily OHLCV, ~28 thousand rows. The setup is one line: df.to_pandas().to_sql("prices", sqlite3.connect("prices.db")).

Method 1 — pandas read_sql (most common)

import sqlite3
import pandas as pd

conn = sqlite3.connect("prices.db")
df = pd.read_sql("SELECT * FROM prices WHERE Ticker='AAPL' AND Close > 200", conn)

print(df.shape)   # (367, 8)

Three lines if you count the import. pd.read_sql accepts any DB-API 2.0 connection — SQLite, Postgres, MySQL, Oracle, anything that exposes a cursor.execute(sql) method.

For postgres:

import psycopg2
conn = psycopg2.connect("postgresql://user:pass@host:5432/db")
df = pd.read_sql("SELECT * FROM prices", conn)

For SQLAlchemy (preferred for production):

from sqlalchemy import create_engine
engine = create_engine("postgresql+psycopg2://user:pass@host/db")
df = pd.read_sql("SELECT * FROM prices", engine)

The query string is whatever your database speaks. Parameterized queries belong in params=:

df = pd.read_sql(
  "SELECT * FROM prices WHERE Ticker = ? AND Close > ?",
  conn,
  params=("AAPL", 200),
)

Never f-string user input into a SQL query string. Parameters get escaped automatically; concatenation gets you SQL injection.

Method 2 — polars read_database (connection-based)

import sqlite3
import polars as pl

conn = sqlite3.connect("prices.db")
df = pl.read_database("SELECT * FROM prices WHERE Ticker='AAPL' AND Close > 200", conn)

print(df.shape)   # (367, 8)

Same shape. Polars’s read_database accepts the same DB-API connection objects pandas does. The result is a pl.DataFrame — call .to_pandas() if downstream code wants it back as pandas.

For execution-engine reasons, polars’s reader is faster on large result sets — it streams rows through Arrow buffers rather than building a pandas frame and converting.

Method 3 — polars read_database_uri (one-liner)

import polars as pl

df = pl.read_database_uri(
  "SELECT * FROM prices WHERE Ticker='AAPL' AND Close > 200",
  "sqlite:///prices.db",
)

print(df.shape)   # (367, 8)

No connection management. Polars (via the connectorx backend) opens, reads, and closes the connection in one call. The URI string tells it everything it needs — driver, host, credentials, database name.

URI examples:

sqlite:///prices.db
postgresql://user:pass@localhost:5432/finance
mysql://user:pass@db.example.com:3306/finance
mssql://user:pass@host:1433/finance

For ad-hoc scripts and notebooks, this is the cleanest path. For production pipelines, the connection-pool methods (read_database with SQLAlchemy/psycopg2) give you finer control over reconnects and transactions.

Filter at the SQL layer, not in Python

The single most common mistake: pulling the whole table into memory and then filtering with pandas.

Wrong:

df = pd.read_sql("SELECT * FROM prices", conn)
big = df[df["Close"] > 200]      # filtered in Python after reading 28k rows

Right:

big = pd.read_sql("SELECT * FROM prices WHERE Close > 200", conn)

The database is the right place to filter. For a 28k-row table the difference is small. For a 100-million-row table it’s the difference between a 5-second query and a 5-minute Python crash. Push the filter into SQL whenever you can.

Same principle for column selection — use SELECT col1, col2, … instead of SELECT * then dropping columns in Python.

What the libraries do well

pandas read_sql polars read_database polars read_database_uri
Needs a connection object yes yes no
Output type pd.DataFrame pl.DataFrame pl.DataFrame
Speed (large results) baseline ~2-5× faster ~2-5× faster
Production pipelines yes (SQLAlchemy) yes best for ad-hoc
Parameterized queries params= execute_options not via URI directly
Chunked reads chunksize= yes no

For most analyst work, pandas read_sql is the right default if you’re already on pandas. Switch to polars read_database when result sets are large enough that the speed-up matters (rule of thumb: millions of rows).

Reading the schema first

When you don’t know the columns yet:

df = pd.read_sql("SELECT * FROM prices LIMIT 0", conn)
print(df.dtypes)

LIMIT 0 returns zero rows but the schema comes back. Cheap way to inspect a table you didn’t create.

For polars:

df = pl.read_database("SELECT * FROM prices LIMIT 0", conn)
print(df.schema)

Iterating in chunks for tables too big to fit in memory

pd.read_sql supports streamed reads:

for chunk in pd.read_sql("SELECT * FROM huge_table", conn, chunksize=100_000):
  process(chunk)

Each chunk is a DataFrame of up to chunksize rows. The connection stays open across iterations. The full table is never materialized in memory.

Polars handles this differently — it would rather you push the filter into SQL so chunking isn’t necessary. For genuinely massive reads, scan into Parquet first:

import duckdb
duckdb.sql("COPY (SELECT * FROM postgres_scan('postgres://...', 'prices') ) TO 'prices.parquet'")
df = pl.scan_parquet("prices.parquet").filter(...).collect()

A whole other tutorial. For most reads, parameterized SQL + a single-call read is enough.

Common stumbles

OperationalError: no such table. Your connection is pointed at the wrong database. SELECT name FROM sqlite_master WHERE type='table' lists every table SQLite can see.

Date columns come back as strings. SQLite doesn’t have a native date type — it stores dates as text. Cast on read:

df["Date"] = pd.to_datetime(df["Date"])

Polars has the same issue but resolves with .str.to_datetime() after the read.

UserWarning: pandas only supports SQLAlchemy connectable. Modern pandas wants a SQLAlchemy engine for non-SQLite databases. Either swap to SQLAlchemy or wrap your raw connection: pd.read_sql(sql, sqlalchemy_engine).

Connection times out on long queries. Set the timeout on the connection: sqlite3.connect("prices.db", timeout=60). For postgres, use the engine’s connect_args={"connect_timeout": 60}.

sqlite3.OperationalError: database is locked. Two processes wrote at the same time. SQLite serializes writes; use connection.execute("PRAGMA busy_timeout = 5000") to wait up to 5 seconds for the lock.

Recap

pd.read_sql(query, connection) is the canonical two-line SQL-to-DataFrame in Python. Same shape with pl.read_database if you want a Polars frame. pl.read_database_uri skips connection management entirely, taking a URI string instead. Use parameterized queries (params=) to escape user input; never f-string SQL. Filter and select at the SQL layer, not in Python — let the database do its job. chunksize= for streamed reads when the result set won’t fit in memory. SQLAlchemy engines for production; raw DB-API connections fine for one-off scripts.

Subscribe to the playlist for more Python answers.

Ready? Take the quiz on the full lesson page →
Test what you've learned. Watch the lesson and try the interactive quiz on the same page.