SQL for Finance: Aggregations and GROUP BY: The Pivot Table, Replaced
Video: Aggregations and GROUP BY: The Pivot Table, Replaced | SQL for Finance Ep3 by CelesteAI
Download the AtlasParts dataset and
atlasparts.db— same as Episodes 1 and 2: github.com/GoCelesteAI/excel-powerbi-for-finance.
The first thing every finance analyst builds in Excel is a pivot table. Drag a category field into rows, drag a number field into values, change the aggregation to Sum. That’s how Total revenue by region, Average invoice by vendor, and Count of orders by status all get answered.
In SQL, the same answer is one keyword: GROUP BY. Combined with the five aggregate functions — COUNT, SUM, AVG, MIN, MAX — it replaces 80% of the pivot tables your team builds, and it does it as one query you can save and re-run forever.
Aggregate functions: condense many rows into one number
Aggregate functions take a column of values and return a single number. There are five you’ll use almost daily:
SELECT COUNT(*),
SUM(CAST(order_total AS REAL)) AS revenue,
AVG(CAST(order_total AS REAL)) AS avg_order,
MIN(CAST(order_total AS REAL)) AS smallest,
MAX(CAST(order_total AS REAL)) AS largest
FROM sales_orders;
Run that against AtlasParts and you get one row back: 2,242 orders, $11,505,630 in revenue, $5,131.86 average, $28.56 smallest, $29,906.06 largest. That’s a finance-team-meeting answer to “how was the year?” in five lines and one query.
COUNT(*) counts rows including NULLs. COUNT(column) counts rows where that column is not NULL — which is how you’d answer “how many orders have a close date.” SUM, AVG, MIN, and MAX ignore NULLs by default; that’s usually what you want.
CAST again. As in Episode 2, CSV-loaded columns are TEXT. Every numeric aggregate needs
CAST(col AS REAL)until Episode 7 covers schema fixes. It’s noise in every query; once you accept it, it stops being annoying.
GROUP BY: aggregate per category
The aggregate functions above collapsed everything to one row. GROUP BY collapses the rows within each group of a category column.
SELECT status,
COUNT(*) AS orders,
ROUND(SUM(CAST(order_total AS REAL)), 0) AS total
FROM sales_orders
GROUP BY status
ORDER BY total DESC;
Three rows back, one per status:
| status | orders | total |
|---|---|---|
| Shipped | 1,216 | $6,026,319 |
| Closed | 821 | $4,435,241 |
| Open | 205 | $1,044,070 |
That is a pivot table, end to end, in five lines.
The rule for GROUP BY: every column in your SELECT list must either appear in the GROUP BY clause or be inside an aggregate function. The reason is logical — if you group by status, the engine collapses each status group to one row, so any other column (like customer_id) would have many possible values and SQL doesn’t know which to show. Group it, or aggregate it, or drop it.
GROUP BY across tables
Pivot tables in Excel work on one sheet. Once your category lives in another file — region in customers.csv, amount in sales_orders.csv — you’re suddenly building a VLOOKUP layer first. SQL does it inline:
SELECT c.region,
COUNT(*) AS orders,
ROUND(SUM(CAST(o.order_total AS REAL)), 0) AS revenue
FROM sales_orders o
JOIN customers c USING (customer_id)
GROUP BY c.region
ORDER BY revenue DESC
LIMIT 8;
| region | orders | revenue |
|---|---|---|
| Midwest | 465 | $2,352,437 |
| Northeast | 344 | $1,856,540 |
| West | 290 | $1,433,102 |
| South | 257 | $1,247,029 |
| Quebec | 144 | $759,176 |
| Scotland | 133 | $647,251 |
| England | 99 | $514,710 |
| Ontario | 92 | $501,344 |
The four US regions dominate. Quebec, Scotland, and Ontario each show their own row because the customer field stores them by sub-national region rather than country. (Episode 4 covers JOINs in depth — the one-line JOIN here is the version you’ll use most often.)
A finance trap: Revenue debits are zero
Now run this — the JOIN-and-group from Episode 1 — but read the result carefully:
SELECT account_type,
COUNT(*) AS entries,
ROUND(SUM(CAST(debit AS REAL)), 0) AS total_debits
FROM gl_journal
JOIN chart_of_accounts USING (account_id)
GROUP BY account_type
ORDER BY total_debits DESC;
| account_type | entries | total_debits |
|---|---|---|
| Asset | 10,031 | $27,396,281 |
| Expense | 5,713 | $10,000,777 |
| Liability | 764 | $4,817,995 |
| Revenue | 5,518 | $0 |
| Equity | 2 | $0 |
Revenue total: zero. Equity total: zero. 5,518 revenue lines and the SUM is zero?
This is correct. In double-entry bookkeeping, revenue accounts are credit-normal — every revenue entry is a credit, not a debit. Summing the debit column on revenue rows gives you exactly $0. To get revenue you’d sum the credit column, or use credit - debit.
The lesson is bigger than the syntax: aggregates over the wrong column give you the wrong number, silently. The query runs, the result table looks plausible, the CFO reads “Revenue = $0” and wonders if you’ve lost your mind. Always sanity-check the column you’re aggregating against the meaning of the data.
HAVING: filter the groups themselves
WHERE filters rows before aggregation. HAVING filters groups after aggregation. The classic question — “which regions did more than a million dollars of business?” — needs HAVING:
SELECT c.region,
ROUND(SUM(CAST(o.order_total AS REAL)), 0) AS revenue
FROM sales_orders o
JOIN customers c USING (customer_id)
GROUP BY c.region
HAVING SUM(CAST(o.order_total AS REAL)) > 1000000
ORDER BY revenue DESC;
Four rows: Midwest, Northeast, West, South — the regions that crossed seven figures.
Could you do this with WHERE instead? No — the row-level WHERE doesn’t know about group totals because grouping hasn’t happened yet. HAVING runs after GROUP BY collapses the rows, and it can reference aggregate functions directly.
The mental order is:
FROM/JOIN— gather rowsWHERE— filter rowsGROUP BY— collapse into groupsHAVING— filter groupsSELECT— pick columns and aggregatesORDER BY— sortLIMIT— take top N
Memorize that ordering. It’s the lifecycle of every aggregate query you’ll ever write.
The full pivot-table pattern
Combine everything from this episode and Episode 2 and you have the universal “pivot-table-as-query” template:
SELECT <category>,
<aggregate functions>
FROM <table or join>
WHERE <row-level filter>
GROUP BY <category>
HAVING <group-level filter>
ORDER BY <column or aggregate>
LIMIT <top N>;
That single shape covers the vast majority of analytical queries finance teams write — month-over-month revenue, top-twenty vendors by spend, average days-to-pay by customer segment. Once it’s in your fingers, the next thousand variations are minor edits.
What’s next
Episode 4: JOINs in depth. We’ve used the one-line JOIN casually in Episodes 1 and 3; next episode is the careful walk-through — INNER, LEFT, RIGHT, the AtlasParts ER diagram, why USING is sometimes wrong, and how to debug a join that returns the wrong number of rows.
Code
The queries from this episode are in atlasparts/sql/ep03_aggregations.sql on the repo. Run with sqlite3 atlasparts.db < ep03_aggregations.sql.