Part of SQL for Finance

SQL for Finance: Aggregations and GROUP BY: The Pivot Table, Replaced

Celest KimCelest Kim

Video: Aggregations and GROUP BY: The Pivot Table, Replaced | SQL for Finance Ep3 by CelesteAI

Take the quiz on the full lesson page
Test what you've read · interactive walkthrough

Download the AtlasParts dataset and atlasparts.db — same as Episodes 1 and 2: github.com/GoCelesteAI/excel-powerbi-for-finance.

The first thing every finance analyst builds in Excel is a pivot table. Drag a category field into rows, drag a number field into values, change the aggregation to Sum. That’s how Total revenue by region, Average invoice by vendor, and Count of orders by status all get answered.

In SQL, the same answer is one keyword: GROUP BY. Combined with the five aggregate functions — COUNT, SUM, AVG, MIN, MAX — it replaces 80% of the pivot tables your team builds, and it does it as one query you can save and re-run forever.

Aggregate functions: condense many rows into one number

Aggregate functions take a column of values and return a single number. There are five you’ll use almost daily:

SELECT COUNT(*),
       SUM(CAST(order_total AS REAL))   AS revenue,
       AVG(CAST(order_total AS REAL))   AS avg_order,
       MIN(CAST(order_total AS REAL))   AS smallest,
       MAX(CAST(order_total AS REAL))   AS largest
FROM   sales_orders;

Run that against AtlasParts and you get one row back: 2,242 orders, $11,505,630 in revenue, $5,131.86 average, $28.56 smallest, $29,906.06 largest. That’s a finance-team-meeting answer to “how was the year?” in five lines and one query.

COUNT(*) counts rows including NULLs. COUNT(column) counts rows where that column is not NULL — which is how you’d answer “how many orders have a close date.” SUM, AVG, MIN, and MAX ignore NULLs by default; that’s usually what you want.

CAST again. As in Episode 2, CSV-loaded columns are TEXT. Every numeric aggregate needs CAST(col AS REAL) until Episode 7 covers schema fixes. It’s noise in every query; once you accept it, it stops being annoying.

GROUP BY: aggregate per category

The aggregate functions above collapsed everything to one row. GROUP BY collapses the rows within each group of a category column.

SELECT status,
       COUNT(*)                                     AS orders,
       ROUND(SUM(CAST(order_total AS REAL)), 0)     AS total
FROM   sales_orders
GROUP  BY status
ORDER  BY total DESC;

Three rows back, one per status:

status orders total
Shipped 1,216 $6,026,319
Closed 821 $4,435,241
Open 205 $1,044,070

That is a pivot table, end to end, in five lines.

The rule for GROUP BY: every column in your SELECT list must either appear in the GROUP BY clause or be inside an aggregate function. The reason is logical — if you group by status, the engine collapses each status group to one row, so any other column (like customer_id) would have many possible values and SQL doesn’t know which to show. Group it, or aggregate it, or drop it.

GROUP BY across tables

Pivot tables in Excel work on one sheet. Once your category lives in another file — region in customers.csv, amount in sales_orders.csv — you’re suddenly building a VLOOKUP layer first. SQL does it inline:

SELECT c.region,
       COUNT(*)                                     AS orders,
       ROUND(SUM(CAST(o.order_total AS REAL)), 0)   AS revenue
FROM   sales_orders o
JOIN   customers   c USING (customer_id)
GROUP  BY c.region
ORDER  BY revenue DESC
LIMIT  8;
region orders revenue
Midwest 465 $2,352,437
Northeast 344 $1,856,540
West 290 $1,433,102
South 257 $1,247,029
Quebec 144 $759,176
Scotland 133 $647,251
England 99 $514,710
Ontario 92 $501,344

The four US regions dominate. Quebec, Scotland, and Ontario each show their own row because the customer field stores them by sub-national region rather than country. (Episode 4 covers JOINs in depth — the one-line JOIN here is the version you’ll use most often.)

A finance trap: Revenue debits are zero

Now run this — the JOIN-and-group from Episode 1 — but read the result carefully:

SELECT account_type,
       COUNT(*)                              AS entries,
       ROUND(SUM(CAST(debit AS REAL)), 0)    AS total_debits
FROM   gl_journal
JOIN   chart_of_accounts USING (account_id)
GROUP  BY account_type
ORDER  BY total_debits DESC;
account_type entries total_debits
Asset 10,031 $27,396,281
Expense 5,713 $10,000,777
Liability 764 $4,817,995
Revenue 5,518 $0
Equity 2 $0

Revenue total: zero. Equity total: zero. 5,518 revenue lines and the SUM is zero?

This is correct. In double-entry bookkeeping, revenue accounts are credit-normal — every revenue entry is a credit, not a debit. Summing the debit column on revenue rows gives you exactly $0. To get revenue you’d sum the credit column, or use credit - debit.

The lesson is bigger than the syntax: aggregates over the wrong column give you the wrong number, silently. The query runs, the result table looks plausible, the CFO reads “Revenue = $0” and wonders if you’ve lost your mind. Always sanity-check the column you’re aggregating against the meaning of the data.

HAVING: filter the groups themselves

WHERE filters rows before aggregation. HAVING filters groups after aggregation. The classic question — “which regions did more than a million dollars of business?” — needs HAVING:

SELECT c.region,
       ROUND(SUM(CAST(o.order_total AS REAL)), 0) AS revenue
FROM   sales_orders o
JOIN   customers   c USING (customer_id)
GROUP  BY c.region
HAVING SUM(CAST(o.order_total AS REAL)) > 1000000
ORDER  BY revenue DESC;

Four rows: Midwest, Northeast, West, South — the regions that crossed seven figures.

Could you do this with WHERE instead? No — the row-level WHERE doesn’t know about group totals because grouping hasn’t happened yet. HAVING runs after GROUP BY collapses the rows, and it can reference aggregate functions directly.

The mental order is:

  1. FROM / JOIN — gather rows
  2. WHERE — filter rows
  3. GROUP BY — collapse into groups
  4. HAVING — filter groups
  5. SELECT — pick columns and aggregates
  6. ORDER BY — sort
  7. LIMIT — take top N

Memorize that ordering. It’s the lifecycle of every aggregate query you’ll ever write.

The full pivot-table pattern

Combine everything from this episode and Episode 2 and you have the universal “pivot-table-as-query” template:

SELECT   <category>,
         <aggregate functions>
FROM     <table or join>
WHERE    <row-level filter>
GROUP BY <category>
HAVING   <group-level filter>
ORDER BY <column or aggregate>
LIMIT    <top N>;

That single shape covers the vast majority of analytical queries finance teams write — month-over-month revenue, top-twenty vendors by spend, average days-to-pay by customer segment. Once it’s in your fingers, the next thousand variations are minor edits.

What’s next

Episode 4: JOINs in depth. We’ve used the one-line JOIN casually in Episodes 1 and 3; next episode is the careful walk-through — INNER, LEFT, RIGHT, the AtlasParts ER diagram, why USING is sometimes wrong, and how to debug a join that returns the wrong number of rows.

Code

The queries from this episode are in atlasparts/sql/ep03_aggregations.sql on the repo. Run with sqlite3 atlasparts.db < ep03_aggregations.sql.

Ready? Take the quiz on the full lesson page →
Test what you've learned. Watch the lesson and try the interactive quiz on the same page.