Part of SQL for Finance

SQL for Finance: Window Functions: When Aggregates Shouldn't Collapse Your Rows

Celest Kim

•May 5, 2026

Video: Window Functions: When Aggregates Shouldn't Collapse Your Rows | SQL for Finance Ep6 by CelesteAI

Take the quiz on the full lesson page

Test what you've read · interactive walkthrough

Download the AtlasParts dataset and atlasparts.db — same as previous episodes: github.com/GoCelesteAI/excel-powerbi-for-finance.

By Episode 5 you can stage any reporting query inside a WITH clause and read it top-to-bottom. That covers most of the queries a finance team writes — but not all of them. The moment you need a running total, a month-over-month change, a rank, or “each customer’s share of their region,” GROUP BY runs out of room.

GROUP BY collapses rows. One row per group, in. One row per group, out. But finance reporting constantly asks for both: keep every row and attach an aggregate. That’s what window functions are for. They compute aggregates over a defined “window” of rows without collapsing the result. By the end of this episode you’ll have built a 12-month revenue dashboard with prior month, change, running total, and percent-of-year — all in one query, no helper columns, no Excel formulas in column N.

The problem `GROUP BY` can’t solve

Here’s a fair finance question: show me every month’s revenue, what it was the month before, and the running total for the year.

With GROUP BY you can only get the first column. The moment you want “the month before” or “the running total,” you need a value that depends on a neighbor row — and GROUP BY has already thrown those neighbor rows away.

You can simulate it with a self-join (orders aliased twice, joined on month = month - 1) or a correlated subquery in SELECT. Both work. Both are slow, brittle, and unreadable. This is exactly the gap window functions fill.

The `OVER` clause: aggregate without collapsing

A window function is any aggregate or ranking function followed by OVER (...). The contents of the parentheses describe the window: which rows count, in what order. Without OVER, SUM(revenue) collapses every row into one. With OVER, it stays attached to each row.

SELECT month,
       revenue,
       SUM(revenue) OVER ()                       AS year_total,
       ROUND(100.0 * revenue / SUM(revenue) OVER (), 1) AS pct_of_year
FROM   monthly_revenue
ORDER  BY month;

SUM(revenue) OVER () — empty window — means “sum across the whole result.” That single number is attached to every row, so we can compute each month’s percent of the year without a self-join. November is 14.9% of the year, January is 4.3%.

`PARTITION BY`: groups within rows

Add PARTITION BY and the window splits into independent groups. Same column shows up multiple times, with a different “total” depending on which partition the row belongs to.

SELECT region,
       customer_name,
       revenue,
       SUM(revenue) OVER (PARTITION BY region) AS region_total,
       ROUND(100.0 * revenue / SUM(revenue) OVER (PARTITION BY region), 1) AS pct_of_region
FROM   per_customer_rev;

Frontier Delivery does $284K and is 12.1% of Midwest. Oxford Transport does $296K — almost the same dollars — but it’s 15.9% of Northeast because Northeast has fewer customers. Same SUM aggregate, two different denominators, both held next to the row. That’s the move.

`ORDER BY` in `OVER`: running totals, `LAG`, `LEAD`

Add ORDER BY inside OVER and the window becomes “all rows up to this one.” That’s how you get a running total. Or pair it with LAG and LEAD to look at the previous or next row directly.

SELECT month,
       revenue,
       LAG(revenue)  OVER (ORDER BY month) AS prev_month,
       revenue - LAG(revenue) OVER (ORDER BY month) AS change,
       SUM(revenue) OVER (ORDER BY month) AS cumulative
FROM   monthly_revenue
ORDER  BY month;

LAG looks one row back in the ordered window. LEAD looks one ahead. SUM(...) OVER (ORDER BY ...) becomes a running cumulative because the default frame is “everything from the start through the current row.”

This one query gives finance every comparison they care about: prior-period, period change, year-to-date, all without leaving SQL.

Ranking: `ROW_NUMBER`, `RANK`, `DENSE_RANK`

Three ranking functions, all OVER (...), all subtly different. The difference matters as soon as you have ties.

Function	On a tie at 9th	Next value
`ROW_NUMBER()`	Picks one as 9, one as 10 (arbitrary tiebreak)	11
`RANK()`	Both are 9	12 (skip)
`DENSE_RANK()`	Both are 9	10 (no skip)

In the AtlasParts product catalog, SKU-10048 and SKU-10098 both sold exactly 581 units. ROW_NUMBER ranks them 9 and 10 (whichever the engine sorts first); RANK calls them both 9 and skips to 11; DENSE_RANK calls them both 9 and continues to 10. Pick RANK for a leaderboard where ties truly tie. Pick ROW_NUMBER when you need a unique row id. Pick DENSE_RANK when you want a dense ranking of distinct values.

Top N per group

This is the pattern that earns its keep. Give me the top 3 customers in each region. Pure GROUP BY can’t do it — it gives you one row per group. The window-function version is two stages: rank inside the partition, then filter on the rank.

WITH ranked AS (
  SELECT region,
         customer_name,
         revenue,
         ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC) AS rn
  FROM   per_customer_rev
)
SELECT region, customer_name, revenue
FROM   ranked
WHERE  rn <= 3
ORDER  BY region, rn;

That returns the top 3 of every region in one query. Try writing that with GROUP BY alone — you’ll end up with a self-join on rev > rev, counting how many customers beat each row, which is the kind of code people quietly rewrite at 2 AM.

The centerpiece: a 12-month revenue dashboard

Here’s everything in one shot — the kind of query a finance team would build once and re-run every month-end.

WITH monthly AS (
  SELECT strftime('%Y-%m', g.posting_date) AS month,
         SUM(CAST(g.credit AS REAL))       AS revenue
  FROM   gl_journal       g
  JOIN   chart_of_accounts a USING (account_id)
  WHERE  a.account_type = 'Revenue'
  GROUP  BY month
)
SELECT month,
       ROUND(revenue, 0)                                                           AS revenue,
       ROUND(LAG(revenue) OVER (ORDER BY month), 0)                                AS prev_month,
       ROUND(revenue - LAG(revenue) OVER (ORDER BY month), 0)                      AS change,
       ROUND(SUM(revenue) OVER (ORDER BY month), 0)                                AS cumulative,
       ROUND(100.0 * revenue / SUM(revenue) OVER (), 1)                            AS pct_of_year
FROM   monthly
ORDER  BY month;

The result tells AtlasParts’ year as a single sheet:

month	revenue	prev_month	change	cumulative	pct_of_year
2025-01	447,166			447,166	4.3
2025-02	504,074	447,166	56,908	951,240	4.8
2025-03	716,093	504,074	212,019	1,667,334	6.8
2025-04	886,348	716,093	170,255	2,553,682	8.5
2025-05	784,731	886,348	-101,618	3,338,413	7.5
2025-06	717,665	784,731	-67,066	4,056,078	6.9
2025-07	877,149	717,665	159,483	4,933,227	8.4
2025-08	967,301	877,149	90,152	5,900,528	9.2
2025-09	763,483	967,301	-203,818	6,664,011	7.3
2025-10	1,248,662	763,483	485,180	7,912,673	11.9
2025-11	1,558,835	1,248,662	310,173	9,471,508	14.9
2025-12	990,052	1,558,835	-568,783	10,461,560	9.5

That’s a story: Q1 ramp, summer plateau, September pullback, October–November holiday explosion, December back to baseline. The full year is $10.46M, with 14.9% landing in November alone. A dashboard built on this query updates every month without anyone editing a formula.

When to reach for window functions

Reach for a window function when you need a row-level aggregate — when the answer should keep every row and add a column from a group calculation. The four signals:

Ranking inside groups — top N per region, customer rank within industry.
Row-to-row comparisons — prior period, next period, period change. LAG and LEAD.
Running totals — cumulative revenue, AR aging buckets that compound, depreciation schedules.
Share-of-total or share-of-group — percent of region, percent of year, percent of category.

If the answer can be a flat GROUP BY summary, use GROUP BY — it’s simpler and faster. The moment you find yourself writing a self-join “to get the previous row” or an outer query “to compute the total again,” that’s a window function.

A note on the frame clause

We’ve used the default frame everywhere: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. That’s why SUM(...) OVER (ORDER BY month) becomes a running total. Window functions also let you write ROWS BETWEEN 2 PRECEDING AND CURRENT ROW for a 3-month rolling average, or ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING for a window covering everything regardless of order. Frame clauses are a deep topic; the takeaway is that “the window” can be tuned far past the defaults — and the moment you need a rolling average, that’s where you go.

What’s next

In Episode 7 we’ll cover dates and reporting patterns — strftime, fiscal periods, quarter and year groupings, and the date-spine trick that makes “every month including the empty ones” work. Window functions plus date logic is most of what powers a real finance reporting layer.

The full SQL file from this episode (every query, runnable against atlasparts.db) is on GitHub: atlasparts/sql/ep06_window_functions.sql.

Ready? Take the quiz on the full lesson page →

Test what you've learned. Watch the lesson and try the interactive quiz on the same page.

View all episodes in SQL for Finance →