Part of SQL for Finance

SQL for Finance: Subqueries and CTEs: Stop Writing One Giant Query

Celest KimCelest Kim

Video: Subqueries and CTEs: Stop Writing One Giant Query | SQL for Finance Ep5 by CelesteAI

Take the quiz on the full lesson page
Test what you've read · interactive walkthrough

Download the AtlasParts dataset and atlasparts.db — same as previous episodes: github.com/GoCelesteAI/excel-powerbi-for-finance.

By Episode 4 you can stitch any two AtlasParts tables together, aggregate at the right grain, and avoid the cardinality traps. That covers most one-shot reporting questions. But the moment a question has two parts — “show me customers above the average” or “build me a monthly P&L” — a single SELECT runs out of room.

This episode is about how you break a hard question into stages. There are two tools for that: subqueries (a query inside another query) and CTEs (a WITH clause that names each stage). Both compute the same things. CTEs make the code readable; subqueries make it compact. By the end you’ll know when to reach for which, and you’ll have built a four-stage monthly P&L pipeline that mirrors what your finance team actually builds in a real reporting database.

The two-step problem

The single hardest gear-shift moving from Excel to SQL is realizing that a single query can answer questions like “show me everyone above average revenue.”

In Excel you’d write the average in cell B1, then sort the column and filter B > $B$1. Two steps, two cells, easy.

In SQL you can’t reference a “previous result” — a query is one closed expression. Your tool for that is the subquery: a complete SELECT that returns one or more rows, dropped inside another query.

SELECT customer_id,
       ROUND(SUM(CAST(order_total AS REAL)), 0) AS revenue
FROM   sales_orders
GROUP  BY customer_id
HAVING revenue > (
  SELECT AVG(rev) FROM (
    SELECT SUM(CAST(order_total AS REAL)) AS rev
    FROM   sales_orders
    GROUP  BY customer_id
  )
)
ORDER  BY revenue DESC;

That inner (SELECT AVG(rev) FROM ...) returns one number — 230,113 — the average revenue per customer across all 50 customers. The outer query keeps only the 25 customers above that line. Without the subquery, you’d run two queries, copy the average into the second one by hand, and re-run it tomorrow.

Subqueries can show up in three places:

Position Returns Example
WHERE One row, one col WHERE total > (SELECT AVG(total) FROM ...)
FROM A table FROM (SELECT ... FROM orders GROUP BY ...) x
SELECT One row, one col SELECT name, (SELECT COUNT(*) FROM lines ...)

Where subqueries break down

Try writing a four-stage monthly P&L using only subqueries. Revenue per month, COGS per month, OpEx per month, then revenue minus COGS minus OpEx. The query becomes a Russian doll — each stage nested inside the next, indented across half your screen. You can’t tell which part computes what; you can’t reuse a stage; and adding a fifth stage means rewriting the whole tree.

That’s where CTEs save you.

CTEs: name your stages

A CTE (Common Table Expression) is a temporary, named result set you define with WITH at the top of a query and then reference like a table. Same SQL engine, same execution plan in most cases — but you read top-to-bottom instead of inside-out.

Here is the same “above-average customers” query, refactored:

WITH per_customer AS (
  SELECT customer_id,
         SUM(CAST(order_total AS REAL)) AS revenue
  FROM   sales_orders
  GROUP  BY customer_id
),
average AS (
  SELECT AVG(revenue) AS avg_rev FROM per_customer
)
SELECT customer_id, ROUND(revenue, 0) AS revenue
FROM   per_customer, average
WHERE  revenue > avg_rev
ORDER  BY revenue DESC;

Same answer (25 customers, top one is Express Distributors at $347,694). But now anyone reading it sees the steps spelled out: first compute revenue per customer; then compute the average; then filter.

CTEs can also reference other CTEs, which is what makes them the right tool for a real pipeline.

The centerpiece: a four-stage monthly P&L

Here’s the kind of query a finance team actually maintains in a reporting database. It’s the building block of every “monthly performance” dashboard. Watch how each WITH stage answers one finance question, then the final SELECT lays them side by side.

WITH monthly_pnl AS (
  SELECT strftime('%Y-%m', g.posting_date) AS month,

         SUM(CASE WHEN a.account_type = 'Revenue'
                  THEN CAST(g.credit AS REAL) ELSE 0 END) AS revenue,

         SUM(CASE WHEN a.account_type = 'Expense'
                       AND CAST(a.account_id AS INTEGER) BETWEEN 5000 AND 5999
                  THEN CAST(g.debit AS REAL) ELSE 0 END) AS cogs,

         SUM(CASE WHEN a.account_type = 'Expense'
                       AND CAST(a.account_id AS INTEGER) >= 6000
                  THEN CAST(g.debit AS REAL) ELSE 0 END) AS opex
  FROM   gl_journal       g
  JOIN   chart_of_accounts a USING (account_id)
  GROUP  BY month
)
SELECT month,
       ROUND(revenue, 0)                  AS revenue,
       ROUND(cogs, 0)                     AS cogs,
       ROUND(revenue - cogs, 0)           AS gross_profit,
       ROUND(opex, 0)                     AS opex,
       ROUND(revenue - cogs - opex, 0)    AS net_profit
FROM   monthly_pnl
ORDER  BY month;

The result is a real income statement, by month, for AtlasParts’ first year:

month revenue cogs gross_profit opex net_profit
2025-01 447,166 323,580 123,587 179,837 -56,250
2025-02 504,074 368,058 136,016 165,664 -29,647
2025-03 716,093 523,362 192,731 189,055 3,676
2025-04 886,348 646,932 239,416 203,288 36,128
2025-05 784,731 571,073 213,657 163,116 50,541
2025-06 717,665 527,735 189,930 183,101 6,829
2025-07 877,149 642,899 234,249 182,078 52,172
2025-08 967,301 704,490 262,811 180,246 82,565
2025-09 763,483 553,619 209,864 195,926 13,938
2025-10 1,248,662 913,721 334,942 269,945 64,997
2025-11 1,558,835 1,139,106 419,729 231,139 188,590
2025-12 990,052 725,891 264,161 216,917 47,244

That’s a story: AtlasParts loses money in January and February as fixed OpEx outruns gross profit, breaks even in March, and grows steadily through November’s holiday peak. You couldn’t see that pattern in a single un-pipelined query — it would either be unreadable or run in three passes with three CSV exports.

Subquery vs CTE: when to use which

Both compute the same thing. The difference is readability and reuse.

Reach for a subquery when: - The inner query returns one number used in exactly one place (e.g. WHERE x > (SELECT AVG(x) FROM ...)) - The whole query fits on a screen - You’re doing a quick, throwaway check

Reach for a CTE when: - A stage is reused more than once - The query has more than two logical stages - Anyone other than you will read the code (every stage names what it computes) - You’re building a saved view, a dashboard query, or anything that lives past today

A good rule: if you nest a subquery more than two levels deep, stop and rewrite as CTEs. The reader will thank you. So will future-you in three months when you need to debug it.

A closing note on WITH RECURSIVE

Most CTE work in finance is the kind you saw above — staged calculations. But CTEs have one more trick: WITH RECURSIVE lets a CTE refer to itself, which is how you walk a chart-of-accounts hierarchy (“show me every account under ‘6000 Operating Expenses’”) or build a date spine. We won’t cover recursive CTEs today, but know they exist — when you’re ready to flatten an account tree, that’s the tool.

What’s next

In Episode 6 we’ll cover window functions — the OVER clause, RANK, ROW_NUMBER, LAG and LEAD. These are the tools you reach for when “GROUP BY” can’t help you, like running totals, rolling averages, year-over-year comparisons, and “show me each customer’s biggest order.” Window functions are the single most powerful SQL feature finance pros tend to discover late — once you have them, your dashboards stop needing helper columns.

The full SQL file from this episode (every query, runnable against atlasparts.db) is on GitHub: atlasparts/sql/ep05_subqueries_ctes.sql.

Ready? Take the quiz on the full lesson page →
Test what you've learned. Watch the lesson and try the interactive quiz on the same page.