SQL for Finance: The Finance Reporting Query Library (Capstone)
Video: The Finance Reporting Query Library (Capstone) | SQL for Finance Ep8 by CelesteAI
Seven episodes ago we opened a .db file for the first time and ran SELECT * FROM customers LIMIT 5. Since then we’ve layered in WHERE, GROUP BY, JOIN, CTEs, window functions, dates. Each episode added one tool. Each tool got demonstrated on a question a CFO might actually ask.
This is the finale. We’re going to do something different. Instead of teaching one new tool, we’re going to take everything from the previous seven episodes and bundle it into the thing finance teams actually use SQL for: a query library.
A query library is a folder of .sql files, each one answering a recurring question. P&L. Balance sheet. AR aging. AP aging. The same four reports your accounting software produces, except yours run in 200 milliseconds against the raw GL, you control every line of the calculation, and the file is yours forever. Open Excel, paste the result, refresh tomorrow, refresh next month. The library outlives the analyst.
That’s the mental shift this episode is about. From “I’ll write a query when someone asks me a question” to “I have a folder of queries, and I run them on the first of every month.”
Why a library, not a notebook
You can build a P&L in a Jupyter notebook. You can build it in a 30-row CTE. You can build it in 14 stacked Excel formulas. People do all three.
A .sql file has properties the others don’t:
- It’s plain text. Diff it. Commit it. Email it. Open it in Notepad on a locked-down work machine where you can’t install anything. The text is the contract.
- It runs in any client. sqlite3 atlasparts.db < pnl.sql > pnl.csv works in CI, in a cron job, in a terminal a junior just opened. No kernel state. No “did you run cell 4 first?”
- It’s the artifact handed to engineering. When the data team rebuilds the warehouse, your .sql file is the spec. They translate the SQLite dialect to Postgres or BigQuery and the logic survives. Notebooks don’t survive that translation. SQL does.
This is also the literal pattern dbt uses, the pattern Looker uses, the pattern any modern analytics stack uses. A folder of files, each describing one piece of business logic. We’re just going to do it without the framework — because the framework adds complexity we don’t need for a 70-account chart (53 accounts active in 2025) and 22,000 journal lines.
The four reports
Four reports cover most of what a small finance team produces every month. They’re our library.
- Trial balance — every active account, debits and credits, net. The sanity check. If this doesn’t sum to zero across all accounts, nothing else matters.
- P&L (income statement) — revenue minus expense, broken out by account, plus the net-income line.
- Balance sheet snapshot — assets, liabilities, and equity at a point in time. The static portrait of where the company stands.
- AR aging — who owes us money, how long it’s been outstanding, how much of it is in trouble. The mirror of last episode’s AP aging.
Four queries. Four files. One folder. That’s the library.
Trial balance — the integrity check
Before you trust any other report, you trust the trial balance. Sum every debit, sum every credit, take the difference per account, and confirm the books balance.
SELECT c.account_type,
c.account_id,
c.account_name,
ROUND(SUM(CAST(j.debit AS REAL)), 2) AS total_dr,
ROUND(SUM(CAST(j.credit AS REAL)), 2) AS total_cr,
ROUND(SUM(CAST(j.debit AS REAL)
- CAST(j.credit AS REAL)), 2) AS net_dr_cr
FROM gl_journal j
JOIN chart_of_accounts c ON c.account_id = j.account_id
WHERE c.is_active = 'TRUE'
GROUP BY c.account_id
ORDER BY c.account_type, c.account_id;
Run it on AtlasParts and you get 53 rows — one per account with activity in 2025 — with debit, credit, and net columns. Sum the net_dr_cr column down the page and you should get zero. Any non-zero number is a posting error: a one-sided journal, a cut-off mistake, a typo. The trial balance is the first query you run and the last query you trust.
This single result flows through every other report. P&L, balance sheet, aging — all of them aggregate the same gl_journal rows. If the trial balance is wrong, everything downstream is wrong. So the file at the top of your library is 01_trial_balance.sql.
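The zero-sum property is easy to sanity-check on a toy example. Here’s a minimal sketch using Python’s built-in sqlite3 module; the table and column names mirror the episode’s gl_journal and chart_of_accounts, but the two accounts and the single balanced journal entry are invented for illustration.

```python
import sqlite3

# In-memory mock of the AtlasParts schema (names from the episode;
# the rows themselves are invented).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE chart_of_accounts (account_id TEXT, account_name TEXT,
                                account_type TEXT, is_active TEXT);
CREATE TABLE gl_journal (account_id TEXT, debit TEXT, credit TEXT);
INSERT INTO chart_of_accounts VALUES
  ('1000', 'Cash',    'Asset',   'TRUE'),
  ('4000', 'Revenue', 'Revenue', 'TRUE');
-- One balanced journal entry: debit cash, credit revenue.
INSERT INTO gl_journal VALUES ('1000', '500.00', '0'), ('4000', '0', '500.00');
""")

# Same shape as the trial-balance query: net debit/credit per account.
rows = con.execute("""
    SELECT c.account_id,
           ROUND(SUM(CAST(j.debit AS REAL) - CAST(j.credit AS REAL)), 2) AS net
    FROM gl_journal j
    JOIN chart_of_accounts c ON c.account_id = j.account_id
    WHERE c.is_active = 'TRUE'
    GROUP BY c.account_id
""").fetchall()

total = sum(net for _, net in rows)  # a balanced book nets to exactly zero
```

Because every journal entry posts equal debits and credits, the per-account nets offset each other and the grand total is zero; any non-zero total means a one-sided posting somewhere.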
P&L — the marquee
Revenue accounts are credit-natural. Expense accounts are debit-natural. To present revenue as a positive number, you flip its sign: SUM(credit - debit). For expenses you keep the natural sign: SUM(debit - credit).
WITH activity AS (
SELECT c.account_type,
c.account_id,
c.account_name,
ROUND(SUM(CAST(j.credit AS REAL)
- CAST(j.debit AS REAL)), 0) AS amount
FROM gl_journal j
JOIN chart_of_accounts c ON c.account_id = j.account_id
WHERE c.account_type IN ('Revenue', 'Expense')
GROUP BY c.account_id
)
SELECT account_type,
account_name,
CASE WHEN account_type = 'Revenue' THEN amount
ELSE -amount END AS pnl_amount
FROM activity
ORDER BY account_type DESC, account_id;
For AtlasParts 2025 the answer is $10.46M revenue, $10.00M expense, $461K net income. A 4.4% operating margin. The CTE separates the aggregation from the presentation — once you’ve summed the activity, the second SELECT just decides how to display it. Add a UNION ALL at the bottom that produces the totals row and you have a board-ready report. Pipe it to a CSV and your CFO can paste it into a deck.
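Here’s one way the totals row can be sketched, again with Python’s built-in sqlite3 module against a tiny invented dataset (two mock accounts, not AtlasParts data). The nice property of the CTE shape is that SUM(amount) over both account types — revenue positive, expense negative — is already net income.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE chart_of_accounts (account_id TEXT, account_name TEXT, account_type TEXT);
CREATE TABLE gl_journal (account_id TEXT, debit TEXT, credit TEXT);
INSERT INTO chart_of_accounts VALUES
  ('4000', 'Product revenue', 'Revenue'),
  ('6000', 'Rent expense',    'Expense');
INSERT INTO gl_journal VALUES
  ('4000', '0', '1000'),    -- revenue posts as a credit
  ('6000', '400', '0');     -- expense posts as a debit
""")

# Same CTE shape as the episode's P&L, plus a UNION ALL totals row.
rows = con.execute("""
WITH activity AS (
    SELECT c.account_type, c.account_name,
           SUM(CAST(j.credit AS REAL) - CAST(j.debit AS REAL)) AS amount
    FROM gl_journal j
    JOIN chart_of_accounts c ON c.account_id = j.account_id
    WHERE c.account_type IN ('Revenue', 'Expense')
    GROUP BY c.account_id
)
SELECT account_type, account_name,
       CASE WHEN account_type = 'Revenue' THEN amount ELSE -amount END AS pnl_amount
FROM activity
UNION ALL
SELECT 'Total', 'Net income', SUM(amount) FROM activity
""").fetchall()
```

With 1000 of revenue and 400 of expense, the body shows both as positive numbers and the totals row lands on 600 of net income.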
The P&L is the marquee report because it answers the headline question: did we make money this period? Everything else explains how and why, but the P&L answers the whether.
Balance sheet — the snapshot
A balance sheet is a slice through time. Run it as of December 31 and you get the closing position. Run it as of July 31 and you get mid-year.
SELECT c.account_type,
c.account_name,
CASE WHEN c.account_type = 'Asset'
THEN ROUND(SUM(CAST(j.debit AS REAL)
- CAST(j.credit AS REAL)), 0)
ELSE ROUND(SUM(CAST(j.credit AS REAL)
- CAST(j.debit AS REAL)), 0) END AS balance
FROM gl_journal j
JOIN chart_of_accounts c ON c.account_id = j.account_id
WHERE c.account_type IN ('Asset', 'Liability', 'Equity')
AND j.posting_date <= '2025-12-31'
GROUP BY c.account_id
HAVING ABS(balance) > 0
ORDER BY c.account_type, c.account_id;
Two things to notice. First, the date filter — posting_date <= '2025-12-31' — is what makes it a snapshot. Change the date and you change the snapshot. Pass the date in as a parameter and you have a balance-sheet generator.
Second, the sign-flip is per account-type. Assets are debit-natural so we use debit - credit. Liabilities and equity are credit-natural so we use credit - debit. The CASE expression handles both in one query.
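Passing the date in as a parameter can look like this — a minimal sketch using Python’s built-in sqlite3 module and a `?` placeholder, against a one-account mock (the rows are invented; the column names follow the episode’s schema). Run the same SQL twice with two as-of dates and you get two snapshots.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE chart_of_accounts (account_id TEXT, account_name TEXT, account_type TEXT);
CREATE TABLE gl_journal (account_id TEXT, posting_date TEXT, debit TEXT, credit TEXT);
INSERT INTO chart_of_accounts VALUES ('1000', 'Cash', 'Asset');
INSERT INTO gl_journal VALUES
  ('1000', '2025-06-15', '300', '0'),
  ('1000', '2025-11-01', '200', '0');
""")

BALANCE_SQL = """
SELECT c.account_name,
       SUM(CAST(j.debit AS REAL) - CAST(j.credit AS REAL)) AS balance
FROM gl_journal j
JOIN chart_of_accounts c ON c.account_id = j.account_id
WHERE c.account_type = 'Asset'
  AND j.posting_date <= ?           -- the as-of date is now a parameter
GROUP BY c.account_id
"""

mid_year = con.execute(BALANCE_SQL, ("2025-07-31",)).fetchall()
year_end = con.execute(BALANCE_SQL, ("2025-12-31",)).fetchall()
```

The mid-year run only sees the June posting; the year-end run sees both. One query, any snapshot date.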
For AtlasParts at year-end: $7.99M in assets, $3.44M in liabilities, $4.55M in equity (the equity is $1M of common stock plus $3.09M of retained earnings carried forward, plus $461K of period net income that has not yet been closed out). The accounting equation holds — assets equal liabilities plus equity. If it didn’t, the trial balance would have caught it first.
AR aging — the mirror of AP
Last episode we built AP aging — invoices we owe, bucketed by how late they are. AR aging is the same query with three substitutions: sales_orders instead of ap_invoices, the customer’s payment terms instead of the vendor’s, and Shipped status instead of Open.
WITH ar_open AS (
SELECT s.order_id,
s.customer_id,
s.order_date,
CAST(s.order_total AS REAL) AS amount,
DATE(s.order_date,
'+' || CAST(SUBSTR(c.payment_terms, 5) AS INT)
|| ' days') AS due_date
FROM sales_orders s
JOIN customers c ON c.customer_id = s.customer_id
WHERE s.status = 'Shipped'
AND c.payment_terms LIKE 'Net %'
)
SELECT CASE
WHEN julianday('2025-12-31') - julianday(due_date) <= 0 THEN 'Current'
WHEN julianday('2025-12-31') - julianday(due_date) <= 30 THEN '1-30'
WHEN julianday('2025-12-31') - julianday(due_date) <= 60 THEN '31-60'
WHEN julianday('2025-12-31') - julianday(due_date) <= 90 THEN '61-90'
ELSE '90+'
END AS bucket,
COUNT(*) AS invoices,
ROUND(SUM(amount), 0) AS balance
FROM ar_open
GROUP BY bucket
ORDER BY MIN(julianday('2025-12-31') - julianday(due_date));
Three small changes from last episode’s AP query. The CTE pulls from sales_orders joined to customers and parses the Net 30 / Net 45 / Net 60 string into a number of days. The CASE block buckets the difference. The ORDER BY MIN(...) trick — same pattern as last episode — keeps the buckets in time order without a manual CASE in the ORDER BY.
Drop one query, change three identifiers, get a different report. That’s the value of the mental model from Episode 7. The aging pattern isn’t just for AP; it’s for any “how old is this thing” report. AR aging. Inventory aging. Open service tickets. Contract renewals. The pattern transfers.
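The terms-parsing step is the fiddliest part of the pattern, so here’s the same SUBSTR/julianday mechanics isolated in a one-row sketch via Python’s built-in sqlite3 module (the dates are invented; 'Net 30' is one of the terms strings the episode parses):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Parse a 'Net 30'-style terms string into a due date, then measure
# days overdue as of a report date -- the same expressions the CTE uses.
row = con.execute("""
SELECT DATE('2025-11-01',
            '+' || CAST(SUBSTR('Net 30', 5) AS INT) || ' days') AS due_date,
       julianday('2025-12-31')
         - julianday(DATE('2025-11-01',
                          '+' || CAST(SUBSTR('Net 30', 5) AS INT) || ' days'))
         AS days_overdue
""").fetchone()
```

An order placed 2025-11-01 on Net 30 terms comes due 2025-12-01, so as of 2025-12-31 it is 30 days overdue — landing in the '1-30' bucket.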
Organizing the library
Four queries. One folder. Numbered prefixes so the order is preserved when you ls:
sql/
  finance-reports/
    01_trial_balance.sql
    02_pnl.sql
    03_balance_sheet.sql
    04_ar_aging.sql
    README.md
The numbering matches the order you run them. Trial balance first — if it’s broken, stop. P&L next — the headline. Balance sheet third — the position. AR aging last — the working-capital concern.
To run the whole pack:
sqlite3 atlasparts.db < 01_trial_balance.sql > out/trial.csv
sqlite3 atlasparts.db < 02_pnl.sql > out/pnl.csv
sqlite3 atlasparts.db < 03_balance_sheet.sql > out/bs.csv
sqlite3 atlasparts.db < 04_ar_aging.sql > out/ar.csv
Or as a single shell loop:
for f in sql/finance-reports/*.sql; do
name=$(basename "$f" .sql)
sqlite3 atlasparts.db < "$f" > "out/${name}.csv"
done
Each file is its own contract. Each output is its own CSV. Open the CSVs in Excel, paste them into a board pack, refresh next month.
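If you’d rather drive the pack from Python than from a shell — say, on a locked-down machine with no sqlite3 CLI — the loop can be sketched with the stdlib sqlite3 and csv modules. One assumption to flag: this runs a single plain SELECT per file, so the .headers on / .mode csv dot-directives (which belong to the sqlite3 shell client, not to SQL) must not appear in files run this way; the header row is written by Python instead. The function name run_report is mine.

```python
import csv
import sqlite3
from pathlib import Path

def run_report(db_path, sql_path, out_path):
    """Run one .sql file (a single SELECT, no CLI dot-directives)
    against the database and write the result set as a CSV with headers."""
    sql = Path(sql_path).read_text()
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(sql)
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([col[0] for col in cur.description])  # header row
            writer.writerows(cur)                                 # data rows
    finally:
        con.close()

# Usage sketch, mirroring the shell loop (paths assumed from the layout above):
# for f in sorted(Path("sql/finance-reports").glob("*.sql")):
#     run_report("atlasparts.db", f, Path("out") / (f.stem + ".csv"))
```

Same contract as the shell version: one .sql file in, one CSV out.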
Inside each .sql file, there’s a small ritual that makes the library nicer to use:
-- 02_pnl.sql
-- AtlasParts P&L for the period 2025-01-01 .. 2025-12-31.
-- Source: gl_journal joined to chart_of_accounts.
-- Output: account_type, account_name, pnl_amount.
.headers on
.mode csv
WITH activity AS (
...
)
SELECT ...;
Header comments tell future-you what the file does. The .headers on and .mode csv directives mean the output is ready to import without post-processing. Three lines of preamble, but they save you ten minutes of reformatting every time you regenerate the report.
What the library is not
A library is not a dashboard. It produces flat results — CSVs and tabular output — that get assembled into a dashboard somewhere downstream. The dashboard is Power BI’s job, or Excel’s, or a notebook’s. Keep the SQL focused on the business logic.
A library is not a substitute for an ETL pipeline. If you need scheduled refreshes, a warehouse, role-based access, you eventually outgrow the loose-files approach. dbt is what you graduate to. But the queries you wrote in the loose-files phase translate almost line-for-line. The investment compounds.
A library is not finished. Every month you find a new question worth saving, you add another file. 05_vendor_concentration.sql. 06_top_customers_ytd.sql. The folder grows. Six months in, you can answer most ad-hoc finance questions in five minutes because someone — you, last quarter — already wrote the query.
Eight episodes, one mental model
We started with SELECT * FROM customers LIMIT 5. We ended with a folder of .sql files that produces a board-ready financial pack.
The arc:
1. Episode 1. Why SQL when you have Excel. The 22K-row aggregation in 12 milliseconds.
2. Episode 2. WHERE, ORDER BY, LIMIT. The first filtered question.
3. Episode 3. GROUP BY. The pivot-table replacement.
4. Episode 4. JOINs. The chart-of-accounts join.
5. Episode 5. CTEs and subqueries. The pipeline shape that produces a P&L.
6. Episode 6. Window functions. Running totals and prior-period comparisons.
7. Episode 7. Dates, fiscal years, AP aging. The date-mechanics layer.
8. Episode 8. The library. Everything bundled, run from the CLI, output to CSV, ready for production.
Every tool earned its place by answering a question a CFO actually asks. None of it was demonstrated on toy data — every example ran on AtlasParts, the same 22K-row GL we opened in Episode 1.
The skill you have now is the skill that bridges spreadsheet work and the rest of the data team. You can read the same SQL the engineers write. You can write the same .sql files dbt expects. You can answer the question in the time the meeting takes, because the query is already in your folder.
Where to go next
If you want to keep building on this, three directions:
- Add reports. Vendor concentration. Top customers. Inventory turnover. Each new report is one new file. The library compounds.
- Move to a warehouse. When AtlasParts becomes a real company with a real Postgres, the SQLite queries port across with minor dialect changes. Try BigQuery’s free tier with a public dataset to feel the difference.
- Adopt dbt. dbt is the loose-files-of-SQL pattern, formalized. The queries you wrote in this series are exactly what a dbt model looks like. The leap from this library to a models/ folder is small.
That’s the series. Eight episodes, one dataset, one mental model. The full query library is on the GitHub repo — copy it, modify it, ship your own version. Subscribe and the next series shows up in your feed.
See you in the next one.