Aggregations and GROUP BY: The Pivot Table, Replaced | SQL for Finance Ep3

0views
C
CelesteAI
Description
๐Ÿ“‚ AtlasParts dataset and atlasparts.db: https://github.com/GoCelesteAI/excel-powerbi-for-finance Source code: https://github.com/GoCelesteAI/sql-for-finance Episode 3 of SQL for Finance. The pivot table, replaced by one keyword: GROUP BY. The first thing every finance analyst builds in Excel is a pivot table. Drag a category into rows, drag a number into values, change the aggregation to Sum. Total revenue by region, average invoice by vendor, count of orders by status โ€” all answered the same way. In SQL, the same answer is one keyword: GROUP BY. Combined with the five aggregate functions โ€” COUNT, SUM, AVG, MIN, MAX โ€” it replaces 80% of the pivot tables your team builds, and it does it as one query you can save and re-run forever. What You'll Learn: - The five aggregate functions (COUNT, SUM, AVG, MIN, MAX) and how NULL-handling differs between them. - GROUP BY โ€” collapse rows by category and aggregate within each group. - The "every column either grouped or aggregated" rule and why it exists. - GROUP BY with JOIN โ€” the canonical revenue-by-region query that VLOOKUP can barely manage. - The debit-vs-credit trap: aggregates over the wrong column give silently wrong numbers (Revenue debits = $0 in AtlasParts). - HAVING vs WHERE โ€” filter groups after aggregation, not before. - The 7-step query lifecycle (FROM / WHERE / GROUP BY / HAVING / SELECT / ORDER BY / LIMIT). Timestamps: 0:00 - Intro โ€” the pivot table replaced by one keyword 0:30 - The five aggregate functions 1:45 - GROUP BY โ€” orders by status 3:00 - GROUP BY across tables โ€” revenue by region 4:30 - The debit/credit trap (Revenue total = $0) 5:50 - HAVING โ€” filter groups themselves 7:00 - Recap and the 7-step lifecycle 8:00 - End screen Key Takeaways: 1. GROUP BY plus the five aggregates replaces the vast majority of pivot tables a finance team builds. The shape โ€” SELECT category, SUM(value) FROM table GROUP BY category โ€” is the universal "pivot table as query" pattern. Once it's in your fingers, the next thousand variations are minor edits to the category and the aggregate. 2. Every column in the SELECT list of a GROUP BY query must either appear in the GROUP BY clause or be wrapped in an aggregate function. The reason is logical โ€” once rows collapse into groups, any other column has many possible values per group, and SQL doesn't know which to show. Group it, aggregate it, or drop it. 3. The aggregate functions have different NULL behavior. COUNT(*) counts all rows including NULLs. COUNT(column) counts only rows where the column is not NULL. SUM, AVG, MIN, and MAX ignore NULLs by default. That's usually what you want โ€” but if you wonder why your average looks too high, the NULLs got skipped. 4. Aggregates over the wrong column give silently wrong numbers. Summing the debit column on revenue accounts in AtlasParts returns $0 โ€” and it's correct, because revenue is credit-normal in double-entry bookkeeping. The query runs, the result table looks plausible, and the CFO reads "Revenue = $0" and wonders if you've lost your mind. Always sanity-check the column you're aggregating against the meaning of the data. 5. HAVING and WHERE are not interchangeable. WHERE filters rows before grouping. HAVING filters groups after. The mental order of every aggregate query: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT. Memorize it โ€” every aggregate query you'll ever write follows this lifecycle. #SQL #SQLite #FinanceAnalytics #DataAnalytics #SQLforFinance #GroupBy #Aggregations #FinancialReporting #ERPdata #Database #LearnSQL --- Generated by GoCelesteAI ยท part of the SQL for Finance series