Filter Rows: Boolean Masks, .query(), .nlargest, Date Slices | Pandas for Finance Ep4

Name: Filter Rows: Boolean Masks, .query(), .nlargest, Date Slices | Pandas for Finance Ep4
Uploaded: 2026-05-09 01:29:16
Channel: CelesteAI

0views

CelesteAI

Description

Episode 4 of Pandas for Finance. The WHERE clause for DataFrames. Source code: https://github.com/GoCelesteAI/pandas-for-finance If you finished Episode 3, you have the cached parquet on disk. Now we filter. In SQL, you write WHERE close_return greater than 0.03. In pandas, you pass a boolean Series as the index. Same idea — df[df["Close"].pct_change().gt(0.03)] keeps the days Apple closed up more than three percent. Combine with & and |, wrap each condition in parens, and you have everything you need to find the moments that matter in seven years of price data. What You'll Build: - big_days.py — load the cached prices, filter to AAPL, compute daily returns, and pull the days the stock moved more than three percent. - Two-condition filtering with .query("Return greater than 0.03 and Volume greater than 1e8") — readable, SQL-shaped, no operator-precedence traps. - .nlargest(5, "Return") to pull the five biggest single-day moves across the whole range — including April 2025 up fifteen percent. - Date slicing with .set_index("Date").loc["2024-01":"2024-03"] to get just Q1 2024 and its average close. Timestamps: 0:00 - Intro — Episode 4 starts here 0:17 - Preview — boolean masks, .query, top-N 0:54 - Open nvim, write big_days.py 1:08 - Filter to AAPL, compute daily Return 1:20 - Boolean filter for big up days 1:32 - Two-condition .query 1:44 - .nlargest for top 5 1:54 - Save and run 2:01 - Big up days output 2:11 - Big up + high volume; top 5 returns 2:31 - Recap — boolean masks, .query, .nlargest 3:08 - End screen Key Takeaways: 1. Boolean indexing is the core idea. df[boolean_series] keeps the rows where the Series is True. aapl["Return"].gt(0.03) produces a Series of True/False aligned to the index, and aapl[that] returns the matching rows. This single pattern handles ninety percent of the row-filtering you'll ever do — anything more complex is just a different way to construct the boolean Series. 2. Use &, |, ~ — never and, or, not. Vectorized operations need bitwise operators on element-wise Series. Python's and/or keywords expect a single scalar truth value and will raise "ambiguous truth value of a Series." And always wrap each condition in parens: (a.gt(1)) & (b.lt(5)). Without the parens, operator precedence parses the bitwise ampersand before the comparisons — silently wrong. 3. .query() is the readable filter. Takes a string expression, parses it, and applies it. You can use and/or inside the string because the parser converts them to bitwise. Reference outside variables with @variable. Reference column names with spaces using backticks. Anything past two conditions is more readable as a .query() string than as a chain of bitwise expressions. 4. .isin([...]), .between(a, b), .str.contains(...) cover the SQL IN, BETWEEN, and LIKE cases. df[df["Ticker"].isin(["AAPL", "MSFT", "GOOGL"])] for whitelist. aapl[aapl["Volume"].between(5e7, 1e8)] for ranges. df[df["Ticker"].str.startswith("X")] for string patterns. Every SQL filter has a one-line pandas equivalent. 5. Date slicing requires a DatetimeIndex. df.set_index("Date").loc["2024-01":"2024-03"] works because loc understands partial date strings on a DatetimeIndex. Without set_index, you get a KeyError. For series operations on time-indexed data — rolling windows, resample, asfreq — the DatetimeIndex pays back the conversion cost across every later episode. #Pandas #Python #Finance #Filter #BooleanIndexing #PandasQuery #DataAnalytics #PythonForFinance #LearnPandas #DataFrame --- Generated by GoCelesteAI · part of the Pandas for Finance series

Back to tutorials

Duration

Added to Codegiz

May 9, 2026

📖 Read the article Open in YouTube