SQL for Finance: Window Functions: When Aggregates Shouldn't Collapse Your Rows
Video: Window Functions: When Aggregates Shouldn't Collapse Your Rows | SQL for Finance Ep6 by CelesteAI
Download the AtlasParts dataset and
atlasparts.db— same as previous episodes: github.com/GoCelesteAI/excel-powerbi-for-finance.
By Episode 5 you can stage any reporting query inside a WITH clause and read it top-to-bottom. That covers most of the queries a finance team writes — but not all of them. The moment you need a running total, a month-over-month change, a rank, or “each customer’s share of their region,” GROUP BY runs out of room.
GROUP BY collapses rows. One row per group, in. One row per group, out. But finance reporting constantly asks for both: keep every row and attach an aggregate. That’s what window functions are for. They compute aggregates over a defined “window” of rows without collapsing the result. By the end of this episode you’ll have built a 12-month revenue dashboard with prior month, change, running total, and percent-of-year — all in one query, no helper columns, no Excel formulas in column N.
The problem GROUP BY can’t solve
Here’s a fair finance question: show me every month’s revenue, what it was the month before, and the running total for the year.
With GROUP BY you can only get the first column. The moment you want “the month before” or “the running total,” you need a value that depends on a neighbor row — and GROUP BY has already thrown those neighbor rows away.
You can simulate it with a self-join (orders aliased twice, joined on month = month - 1) or a correlated subquery in SELECT. Both work. Both are slow, brittle, and unreadable. This is exactly the gap window functions fill.
The OVER clause: aggregate without collapsing
A window function is any aggregate or ranking function followed by OVER (...). The contents of the parentheses describe the window: which rows count, in what order. Without OVER, SUM(revenue) collapses every row into one. With OVER, it stays attached to each row.
SELECT month,
revenue,
SUM(revenue) OVER () AS year_total,
ROUND(100.0 * revenue / SUM(revenue) OVER (), 1) AS pct_of_year
FROM monthly_revenue
ORDER BY month;
SUM(revenue) OVER () — empty window — means “sum across the whole result.” That single number is attached to every row, so we can compute each month’s percent of the year without a self-join. November is 14.9% of the year, January is 4.3%.
PARTITION BY: groups within rows
Add PARTITION BY and the window splits into independent groups. Same column shows up multiple times, with a different “total” depending on which partition the row belongs to.
SELECT region,
customer_name,
revenue,
SUM(revenue) OVER (PARTITION BY region) AS region_total,
ROUND(100.0 * revenue / SUM(revenue) OVER (PARTITION BY region), 1) AS pct_of_region
FROM per_customer_rev;
Frontier Delivery does $284K and is 12.1% of Midwest. Oxford Transport does $296K — almost the same dollars — but it’s 15.9% of Northeast because Northeast has fewer customers. Same SUM aggregate, two different denominators, both held next to the row. That’s the move.
ORDER BY in OVER: running totals, LAG, LEAD
Add ORDER BY inside OVER and the window becomes “all rows up to this one.” That’s how you get a running total. Or pair it with LAG and LEAD to look at the previous or next row directly.
SELECT month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month,
revenue - LAG(revenue) OVER (ORDER BY month) AS change,
SUM(revenue) OVER (ORDER BY month) AS cumulative
FROM monthly_revenue
ORDER BY month;
LAG looks one row back in the ordered window. LEAD looks one ahead. SUM(...) OVER (ORDER BY ...) becomes a running cumulative because the default frame is “everything from the start through the current row.”
This one query gives finance every comparison they care about: prior-period, period change, year-to-date, all without leaving SQL.
Ranking: ROW_NUMBER, RANK, DENSE_RANK
Three ranking functions, all OVER (...), all subtly different. The difference matters as soon as you have ties.
| Function | On a tie at 9th | Next value |
|---|---|---|
ROW_NUMBER() |
Picks one as 9, one as 10 (arbitrary tiebreak) | 11 |
RANK() |
Both are 9 | 12 (skip) |
DENSE_RANK() |
Both are 9 | 10 (no skip) |
In the AtlasParts product catalog, SKU-10048 and SKU-10098 both sold exactly 581 units. ROW_NUMBER ranks them 9 and 10 (whichever the engine sorts first); RANK calls them both 9 and skips to 11; DENSE_RANK calls them both 9 and continues to 10. Pick RANK for a leaderboard where ties truly tie. Pick ROW_NUMBER when you need a unique row id. Pick DENSE_RANK when you want a dense ranking of distinct values.
Top N per group
This is the pattern that earns its keep. Give me the top 3 customers in each region. Pure GROUP BY can’t do it — it gives you one row per group. The window-function version is two stages: rank inside the partition, then filter on the rank.
WITH ranked AS (
SELECT region,
customer_name,
revenue,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC) AS rn
FROM per_customer_rev
)
SELECT region, customer_name, revenue
FROM ranked
WHERE rn <= 3
ORDER BY region, rn;
That returns the top 3 of every region in one query. Try writing that with GROUP BY alone — you’ll end up with a self-join on rev > rev, counting how many customers beat each row, which is the kind of code people quietly rewrite at 2 AM.
The centerpiece: a 12-month revenue dashboard
Here’s everything in one shot — the kind of query a finance team would build once and re-run every month-end.
WITH monthly AS (
SELECT strftime('%Y-%m', g.posting_date) AS month,
SUM(CAST(g.credit AS REAL)) AS revenue
FROM gl_journal g
JOIN chart_of_accounts a USING (account_id)
WHERE a.account_type = 'Revenue'
GROUP BY month
)
SELECT month,
ROUND(revenue, 0) AS revenue,
ROUND(LAG(revenue) OVER (ORDER BY month), 0) AS prev_month,
ROUND(revenue - LAG(revenue) OVER (ORDER BY month), 0) AS change,
ROUND(SUM(revenue) OVER (ORDER BY month), 0) AS cumulative,
ROUND(100.0 * revenue / SUM(revenue) OVER (), 1) AS pct_of_year
FROM monthly
ORDER BY month;
The result tells AtlasParts’ year as a single sheet:
| month | revenue | prev_month | change | cumulative | pct_of_year |
|---|---|---|---|---|---|
| 2025-01 | 447,166 | 447,166 | 4.3 | ||
| 2025-02 | 504,074 | 447,166 | 56,908 | 951,240 | 4.8 |
| 2025-03 | 716,093 | 504,074 | 212,019 | 1,667,334 | 6.8 |
| 2025-04 | 886,348 | 716,093 | 170,255 | 2,553,682 | 8.5 |
| 2025-05 | 784,731 | 886,348 | -101,618 | 3,338,413 | 7.5 |
| 2025-06 | 717,665 | 784,731 | -67,066 | 4,056,078 | 6.9 |
| 2025-07 | 877,149 | 717,665 | 159,483 | 4,933,227 | 8.4 |
| 2025-08 | 967,301 | 877,149 | 90,152 | 5,900,528 | 9.2 |
| 2025-09 | 763,483 | 967,301 | -203,818 | 6,664,011 | 7.3 |
| 2025-10 | 1,248,662 | 763,483 | 485,180 | 7,912,673 | 11.9 |
| 2025-11 | 1,558,835 | 1,248,662 | 310,173 | 9,471,508 | 14.9 |
| 2025-12 | 990,052 | 1,558,835 | -568,783 | 10,461,560 | 9.5 |
That’s a story: Q1 ramp, summer plateau, September pullback, October–November holiday explosion, December back to baseline. The full year is $10.46M, with 14.9% landing in November alone. A dashboard built on this query updates every month without anyone editing a formula.
When to reach for window functions
Reach for a window function when you need a row-level aggregate — when the answer should keep every row and add a column from a group calculation. The four signals:
- Ranking inside groups — top N per region, customer rank within industry.
- Row-to-row comparisons — prior period, next period, period change.
LAGandLEAD. - Running totals — cumulative revenue, AR aging buckets that compound, depreciation schedules.
- Share-of-total or share-of-group — percent of region, percent of year, percent of category.
If the answer can be a flat GROUP BY summary, use GROUP BY — it’s simpler and faster. The moment you find yourself writing a self-join “to get the previous row” or an outer query “to compute the total again,” that’s a window function.
A note on the frame clause
We’ve used the default frame everywhere: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. That’s why SUM(...) OVER (ORDER BY month) becomes a running total. Window functions also let you write ROWS BETWEEN 2 PRECEDING AND CURRENT ROW for a 3-month rolling average, or ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING for a window covering everything regardless of order. Frame clauses are a deep topic; the takeaway is that “the window” can be tuned far past the defaults — and the moment you need a rolling average, that’s where you go.
What’s next
In Episode 7 we’ll cover dates and reporting patterns — strftime, fiscal periods, quarter and year groupings, and the date-spine trick that makes “every month including the empty ones” work. Window functions plus date logic is most of what powers a real finance reporting layer.
The full SQL file from this episode (every query, runnable against atlasparts.db) is on GitHub: atlasparts/sql/ep06_window_functions.sql.