
Pandas for Finance: Filter Rows: Boolean Masks, .query(), .nlargest, Date Slices

Celest Kim

Video: Filter Rows: Boolean Masks, .query(), .nlargest, Date Slices | Pandas for Finance Ep4 by CelesteAI


df[df["Close"].pct_change() > 0.03] keeps days with >3% up moves. Combine with &/|. Use .query("Volume > 1e8") for readable filters. Use .loc["2023-01":"2023-06"] for date slices. The WHERE clause for DataFrames.

In SQL: WHERE close_pct_change > 0.03. In pandas: pass a boolean Series as the index. Same idea, different syntax.

Boolean indexing — the core idea

import pandas as pd

df = pd.read_parquet("data/prices.parquet")
aapl = df[df["Ticker"] == "AAPL"].copy()

# Daily return as a column
aapl["Return"] = aapl["Close"].pct_change()

# Days up more than 3%
big_up = aapl[aapl["Return"] > 0.03]
print(big_up[["Date", "Close", "Return"]].head())

aapl["Return"] > 0.03 produces a Series of True/False. Indexing the DataFrame with that Series keeps rows where it’s True.
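To see the mask itself, here's a tiny self-contained sketch with toy values (not the lesson's parquet data):

```python
import pandas as pd

# Toy stand-in for the AAPL frame (illustrative values only)
toy = pd.DataFrame({"Return": [0.01, 0.05, -0.02, 0.04]})

mask = toy["Return"] > 0.03   # a boolean Series aligned to toy's index
print(mask.tolist())          # [False, True, False, True]

big = toy[mask]               # keeps only the rows where mask is True
print(len(big))               # 2
```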

Multiple conditions

Use & (and), | (or), ~ (not). Wrap each condition in parentheses — Python operator precedence is sneaky.

# Big up days with high volume
big_active = aapl[(aapl["Return"] > 0.03) & (aapl["Volume"] > 1e8)]

# Big moves either way
big_move = aapl[aapl["Return"].abs() > 0.05]

# Anything but the smallest moves
not_tiny = aapl[~(aapl["Return"].abs() < 0.001)]

and / or keywords don’t work for vectorized boolean Series — they only work on scalar True/False. Use & / |.
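A quick self-contained demo of why: `and` asks Python for a single True/False, and a whole Series can't answer that (toy values, not the lesson data):

```python
import pandas as pd

s = pd.Series([0.01, 0.05])

try:
    s > 0.03 and s < 0.10   # `and` needs one scalar bool from the Series
except ValueError as e:
    print(type(e).__name__)  # ValueError: truth value of a Series is ambiguous

print(((s > 0.03) & (s < 0.10)).tolist())  # [False, True], element-wise
```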

.query() — readable filters

big_up = aapl.query("Return > 0.03")
big_active = aapl.query("Return > 0.03 and Volume > 1e8")
big_move = aapl.query("abs(Return) > 0.05")

.query() takes a string expression. You can use and/or because the parser converts to bitwise. Reads more like SQL.

For variables outside the DataFrame, prefix with @:

threshold = 0.03
result = aapl.query("Return > @threshold")

For columns with spaces or special chars: backticks.

df.query("`Adj Close` > 100")

I use .query() for anything past two conditions — easier to read.

isin and between

# Multiple tickers
megacaps = df[df["Ticker"].isin(["AAPL", "MSFT", "GOOGL"])]

# Volume range
mid_volume = aapl[aapl["Volume"].between(5e7, 1e8)]

# Specific dates
specific = aapl[aapl["Date"].isin(["2024-01-02", "2024-12-31"])]

.isin(list) is the SQL IN. .between(low, high) is BETWEEN (inclusive on both ends by default).
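One detail worth verifying: .between() keeps both endpoints unless you say otherwise. A small sketch with toy volumes (the inclusive= string values need pandas 1.3+):

```python
import pandas as pd

vol = pd.Series([5e7, 7.5e7, 1e8, 1.2e8])

# Default inclusive="both": the endpoints 5e7 and 1e8 themselves pass
print(vol.between(5e7, 1e8).tolist())                       # [True, True, True, False]

# Strict range: drop both endpoints
print(vol.between(5e7, 1e8, inclusive="neither").tolist())  # [False, True, False, False]
```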

Date slicing with .loc

If Date is your index (not a column), date slicing becomes magical:

aapl_indexed = aapl.set_index("Date")

# Just January 2024
jan = aapl_indexed.loc["2024-01"]

# Q1 2024
q1 = aapl_indexed.loc["2024-01":"2024-03"]

# Specific day
print(aapl_indexed.loc["2024-01-02"])

.loc with a DatetimeIndex accepts partial date strings — pandas resolves them to the right range.
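A self-contained sketch of that resolution, using synthetic daily data rather than the lesson's file: a partial string like "2024-01" selects the same rows as the explicit start/end slice.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", "2024-03-31", freq="D")
s = pd.Series(range(len(idx)), index=idx)

jan = s.loc["2024-01"]   # pandas expands "2024-01" to the whole month
print(len(jan))          # 31
print(jan.equals(s.loc["2024-01-01":"2024-01-31"]))  # True
```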

String filters with .str

# Tickers starting with "X" (sector ETFs)
etfs = df[df["Ticker"].str.startswith("X")]

# Ticker contains "P" (case-insensitive)
matches = df[df["Ticker"].str.contains("P", case=False)]

# Length 4 chars
four_letter = df[df["Ticker"].str.len() == 4]

.str is the string accessor — every standard string method, vectorized.

Common patterns

Top N by value

top_volume = aapl.nlargest(10, "Volume")
top_returns = aapl.nlargest(10, "Return")

.nlargest(N, "col") and .nsmallest(N, "col"). Faster than sort + head.
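The two spellings return the same rows; nlargest just avoids sorting the whole frame. A toy check (distinct values assumed, so tie-handling can't differ):

```python
import pandas as pd

toy = pd.DataFrame({"Volume": [3, 9, 1, 7, 5]})

top2 = toy.nlargest(2, "Volume")
via_sort = toy.sort_values("Volume", ascending=False).head(2)

print(top2["Volume"].tolist())   # [9, 7]
print(top2.equals(via_sort))     # True
```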

Filter and aggregate

# Average volume on big-up days
big_up = aapl[aapl["Return"] > 0.03]
print(big_up["Volume"].mean())

# Compare to baseline
print("Baseline avg volume:", aapl["Volume"].mean())
print("Big-up days avg volume:", big_up["Volume"].mean())

A common analyst question: do big moves happen on high-volume days? Filter, aggregate, compare.

Drop rows where a condition is true

# Drop NaN rows
clean = aapl.dropna(subset=["Return"])

# Drop zero-volume days (data glitches)
real = aapl[aapl["Volume"] > 0]

# Equivalent: invert the condition
real = aapl[~(aapl["Volume"] == 0)]

.dropna() for missing values. Boolean filtering for everything else.

Performance note

Boolean indexing returns a new copy every time. For one-time filters, fine. For repeated filters in a loop:

# Slow — creates many copies
for t in tickers:
  subset = df[df["Ticker"] == t]
  process(subset)

# Faster — group once
for t, subset in df.groupby("Ticker"):
  process(subset)

groupby splits the frame into groups in a single pass, so the loop doesn't re-scan every row for every ticker. We cover groupby in Ep 7.

Try it

big_days.py:

import pandas as pd

pd.set_option("display.float_format", lambda x: f"{x:,.4f}")

df = pd.read_parquet("data/prices.parquet")
aapl = df[df["Ticker"] == "AAPL"].copy()
aapl["Return"] = aapl["Close"].pct_change()

print("=== Big up days (>3%) ===")
big_up = aapl.query("Return > 0.03")
print(big_up[["Date", "Close", "Return"]].head(10))

print("\n=== Big up + high volume ===")
big_active = aapl.query("Return > 0.03 and Volume > 1e8")
print(f"Count: {len(big_active)}")

print("\n=== Top 5 by daily return ===")
print(aapl.nlargest(5, "Return")[["Date", "Return"]])

print("\n=== Q1 2024 only ===")
q1 = aapl.set_index("Date").loc["2024-01":"2024-03"]
print(f"Q1 2024 days: {len(q1)}")
print(f"Q1 2024 avg close: {q1['Close'].mean():,.2f}")

Common stumbles

and/or don’t work for Series. df[(a) & (b)], not df[a and b]. Using and/or on a vector won’t silently misbehave: it raises “The truth value of a Series is ambiguous.” Always &/| with parens.

Forgetting parens. df[df.x > 1 & df.y < 5] parses as df[df.x > (1 & df.y) < 5] — disaster. df[(df.x > 1) & (df.y < 5)].
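You can watch the bad parse fail on a toy frame (illustrative values; the exact error text varies by pandas version):

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 2, 3], "y": [1, 4, 9]})

# No parens: & binds tighter than >, so this is df.x > (1 & df.y) < 5,
# a chained comparison that needs a scalar bool and blows up
try:
    df[df.x > 1 & df.y < 5]
except ValueError:
    print("ambiguous truth value")

# With parens: element-wise, as intended
print(df[(df.x > 1) & (df.y < 5)]["x"].tolist())  # [2]
```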

SettingWithCopyWarning on filtered results. aapl = df[df["Ticker"] == "AAPL"]; aapl["Return"] = ... warns because aapl may be a view. Add .copy(): aapl = df[df["Ticker"] == "AAPL"].copy().

.query() and column names with spaces. Use backticks: query("`Adj Close` > 100").

Date slicing without DatetimeIndex. df.loc["2024-01"] errors if Date is just a column. df.set_index("Date").loc["2024-01"] works.

pct_change on the first row. Returns NaN. Filters that depend on it drop that row. Use .dropna() if you want clean rows.
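A three-row sketch of that NaN in action (toy prices):

```python
import pandas as pd

close = pd.Series([100.0, 103.0, 101.0])
ret = close.pct_change()

print(ret.isna().tolist())              # [True, False, False]: no prior row for day 1
print(ret.dropna().round(4).tolist())   # [0.03, -0.0194]
```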

What’s next

Ep 5: picking columns. .loc, .iloc, .xs, drop, rename, reorder.

Recap

df[boolean_series] for filtering. Combine with &/|/~ and parens. .query("...") for readable multi-condition filters. .isin([...]), .between(a, b), .str.startswith() for special cases. .nlargest(N, "col") for top-N. Date slicing requires a DatetimeIndex: df.set_index("Date").loc["2024-01"]. Watch the SettingWithCopyWarning — add .copy() after filtering when you’ll modify.

Next episode: picking columns.

Ready? Take the quiz on the full lesson page →
Test what you've learned. Watch the lesson and try the interactive quiz on the same page.