SQL for Finance: Subqueries and CTEs: Stop Writing One Giant Query
Video: Subqueries and CTEs: Stop Writing One Giant Query | SQL for Finance Ep5 by CelesteAI
Download the AtlasParts dataset and
atlasparts.db— same as previous episodes: github.com/GoCelesteAI/excel-powerbi-for-finance.
By Episode 4 you can stitch any two AtlasParts tables together, aggregate at the right grain, and avoid the cardinality traps. That covers most one-shot reporting questions. But the moment a question has two parts — “show me customers above the average” or “build me a monthly P&L” — a single SELECT runs out of room.
This episode is about how you break a hard question into stages. There are two tools for that: subqueries (a query inside another query) and CTEs (a WITH clause that names each stage). Both compute the same things. CTEs make the code readable; subqueries make it compact. By the end you’ll know when to reach for which, and you’ll have built a four-stage monthly P&L pipeline that mirrors what your finance team actually builds in a real reporting database.
The two-step problem
The single hardest gear-shift moving from Excel to SQL is realizing that a single query can answer questions like “show me everyone above average revenue.”
In Excel you’d write the average in cell B1, then sort the column and filter B > $B$1. Two steps, two cells, easy.
In SQL you can’t reference a “previous result” — a query is one closed expression. Your tool for that is the subquery: a complete SELECT that returns one or more rows, dropped inside another query.
SELECT customer_id,
ROUND(SUM(CAST(order_total AS REAL)), 0) AS revenue
FROM sales_orders
GROUP BY customer_id
HAVING revenue > (
SELECT AVG(rev) FROM (
SELECT SUM(CAST(order_total AS REAL)) AS rev
FROM sales_orders
GROUP BY customer_id
)
)
ORDER BY revenue DESC;
That inner (SELECT AVG(rev) FROM ...) returns one number — 230,113 — the average revenue per customer across all 50 customers. The outer query keeps only the 25 customers above that line. Without the subquery, you’d run two queries, copy the average into the second one by hand, and re-run it tomorrow.
Subqueries can show up in three places:
| Position | Returns | Example |
|---|---|---|
WHERE |
One row, one col | WHERE total > (SELECT AVG(total) FROM ...) |
FROM |
A table | FROM (SELECT ... FROM orders GROUP BY ...) x |
SELECT |
One row, one col | SELECT name, (SELECT COUNT(*) FROM lines ...) |
Where subqueries break down
Try writing a four-stage monthly P&L using only subqueries. Revenue per month, COGS per month, OpEx per month, then revenue minus COGS minus OpEx. The query becomes a Russian doll — each stage nested inside the next, indented across half your screen. You can’t tell which part computes what; you can’t reuse a stage; and adding a fifth stage means rewriting the whole tree.
That’s where CTEs save you.
CTEs: name your stages
A CTE (Common Table Expression) is a temporary, named result set you define with WITH at the top of a query and then reference like a table. Same SQL engine, same execution plan in most cases — but you read top-to-bottom instead of inside-out.
Here is the same “above-average customers” query, refactored:
WITH per_customer AS (
SELECT customer_id,
SUM(CAST(order_total AS REAL)) AS revenue
FROM sales_orders
GROUP BY customer_id
),
average AS (
SELECT AVG(revenue) AS avg_rev FROM per_customer
)
SELECT customer_id, ROUND(revenue, 0) AS revenue
FROM per_customer, average
WHERE revenue > avg_rev
ORDER BY revenue DESC;
Same answer (25 customers, top one is Express Distributors at $347,694). But now anyone reading it sees the steps spelled out: first compute revenue per customer; then compute the average; then filter.
CTEs can also reference other CTEs, which is what makes them the right tool for a real pipeline.
The centerpiece: a four-stage monthly P&L
Here’s the kind of query a finance team actually maintains in a reporting database. It’s the building block of every “monthly performance” dashboard. Watch how each WITH stage answers one finance question, then the final SELECT lays them side by side.
WITH monthly_pnl AS (
SELECT strftime('%Y-%m', g.posting_date) AS month,
SUM(CASE WHEN a.account_type = 'Revenue'
THEN CAST(g.credit AS REAL) ELSE 0 END) AS revenue,
SUM(CASE WHEN a.account_type = 'Expense'
AND CAST(a.account_id AS INTEGER) BETWEEN 5000 AND 5999
THEN CAST(g.debit AS REAL) ELSE 0 END) AS cogs,
SUM(CASE WHEN a.account_type = 'Expense'
AND CAST(a.account_id AS INTEGER) >= 6000
THEN CAST(g.debit AS REAL) ELSE 0 END) AS opex
FROM gl_journal g
JOIN chart_of_accounts a USING (account_id)
GROUP BY month
)
SELECT month,
ROUND(revenue, 0) AS revenue,
ROUND(cogs, 0) AS cogs,
ROUND(revenue - cogs, 0) AS gross_profit,
ROUND(opex, 0) AS opex,
ROUND(revenue - cogs - opex, 0) AS net_profit
FROM monthly_pnl
ORDER BY month;
The result is a real income statement, by month, for AtlasParts’ first year:
| month | revenue | cogs | gross_profit | opex | net_profit |
|---|---|---|---|---|---|
| 2025-01 | 447,166 | 323,580 | 123,587 | 179,837 | -56,250 |
| 2025-02 | 504,074 | 368,058 | 136,016 | 165,664 | -29,647 |
| 2025-03 | 716,093 | 523,362 | 192,731 | 189,055 | 3,676 |
| 2025-04 | 886,348 | 646,932 | 239,416 | 203,288 | 36,128 |
| 2025-05 | 784,731 | 571,073 | 213,657 | 163,116 | 50,541 |
| 2025-06 | 717,665 | 527,735 | 189,930 | 183,101 | 6,829 |
| 2025-07 | 877,149 | 642,899 | 234,249 | 182,078 | 52,172 |
| 2025-08 | 967,301 | 704,490 | 262,811 | 180,246 | 82,565 |
| 2025-09 | 763,483 | 553,619 | 209,864 | 195,926 | 13,938 |
| 2025-10 | 1,248,662 | 913,721 | 334,942 | 269,945 | 64,997 |
| 2025-11 | 1,558,835 | 1,139,106 | 419,729 | 231,139 | 188,590 |
| 2025-12 | 990,052 | 725,891 | 264,161 | 216,917 | 47,244 |
That’s a story: AtlasParts loses money in January and February as fixed OpEx outruns gross profit, breaks even in March, and grows steadily through November’s holiday peak. You couldn’t see that pattern in a single un-pipelined query — it would either be unreadable or run in three passes with three CSV exports.
Subquery vs CTE: when to use which
Both compute the same thing. The difference is readability and reuse.
Reach for a subquery when:
- The inner query returns one number used in exactly one place (e.g. WHERE x > (SELECT AVG(x) FROM ...))
- The whole query fits on a screen
- You’re doing a quick, throwaway check
Reach for a CTE when: - A stage is reused more than once - The query has more than two logical stages - Anyone other than you will read the code (every stage names what it computes) - You’re building a saved view, a dashboard query, or anything that lives past today
A good rule: if you nest a subquery more than two levels deep, stop and rewrite as CTEs. The reader will thank you. So will future-you in three months when you need to debug it.
A closing note on WITH RECURSIVE
Most CTE work in finance is the kind you saw above — staged calculations. But CTEs have one more trick: WITH RECURSIVE lets a CTE refer to itself, which is how you walk a chart-of-accounts hierarchy (“show me every account under ‘6000 Operating Expenses’”) or build a date spine. We won’t cover recursive CTEs today, but know they exist — when you’re ready to flatten an account tree, that’s the tool.
What’s next
In Episode 6 we’ll cover window functions — the OVER clause, RANK, ROW_NUMBER, LAG and LEAD. These are the tools you reach for when “GROUP BY” can’t help you, like running totals, rolling averages, year-over-year comparisons, and “show me each customer’s biggest order.” Window functions are the single most powerful SQL feature finance pros tend to discover late — once you have them, your dashboards stop needing helper columns.
The full SQL file from this episode (every query, runnable against atlasparts.db) is on GitHub: atlasparts/sql/ep05_subqueries_ctes.sql.