Excel & Power BI for Finance: Power Query for Finance: Clean Once, Refresh Forever
Video: Power Query for Finance: Clean Once, Refresh Forever | Excel & Power BI S1 Ep5 by CelesteAI
Watch full page →Download the AtlasParts dataset and the Episode 5 starter / completed workbooks — follow along in your own copy of Excel: github.com/GoCelesteAI/excel-powerbi-for-finance.
In Episode 4 we built the Episode 3 P&L panel as a pivot — same numbers, no formulas. At the very end we hit a ceiling: pivots are snapshots. They don’t run on a schedule, they don’t auto-update, and they break if the source file’s structure changes. Worse, even when the pivot itself is fine, the data feeding it is rarely clean. The customer master in AtlasParts has the country US spelled five different ways. The pivot dutifully aggregates each spelling into its own column. The report shows duplicate revenue stripes. Nobody trusts it.
This episode is about Power Query — the transformation layer that sits between your raw data and your report. You build the cleanup once, as a recipe of steps. The recipe runs every time you click Refresh, on whatever data is sitting in the source file. Tomorrow’s export goes through the same pipeline as today’s. The customer master gets canonicalized. The blanks get filled. The header drift gets handled. And the pivot gets clean data without you babysitting it.
It’s the difference between cleaning your spreadsheet and building a process that cleans your spreadsheet.
The mess we’re cleaning
Open customers.csv from the AtlasParts dataset. Fifty rows. Seven columns: customer_id, customer_name, country, region, industry, credit_limit, payment_terms. The structure is fine. The data is real-life messy:
- The
countrycolumn hasUS,USA,us(lowercase), andUnited States of Americafor what should be one country UKshows up alongsideGreat BritainGermanyshows up alongsideDE(the ISO country code)- Several
customer_namevalues have leading or trailing whitespace —" Ashford Auto Repair","Eastgate Wholesale " - The
regioncolumn uses inconsistent capitalisation and abbreviation
In Episode 3 we built a chained XLOOKUP that looked up customer_country from this master to populate the GL. The lookup worked — but it pulled in the dirty values. So our pivot in Episode 4 has columns for US, USA, us, and United States of America — four columns that should be one. The numbers are right; the dimensional grouping is wrong.
We could clean the CSV by hand. That works once. It doesn’t work on next month’s export. Power Query is the answer because it builds a recipe — an ordered list of transformations — that runs on whatever file is at the source path, every time you refresh.
Loading data into the Power Query Editor
Power Query lives inside Excel under Data → Get & Transform Data. There are several entry points: From File (CSV, Excel, JSON), From Database (SQL Server, Oracle), From Web, From Folder. For our customer master, From Text/CSV is the right choice. Browse to customers.csv and Excel shows a preview pane: detected delimiter, encoding, a sample of the first 200 rows. Click Transform Data (not Load) and the Power Query Editor opens in a new window.
The editor has three regions worth knowing:
- Left sidebar —
Queriespanel. Lists every query in this workbook. Queries can chain: one query’s output is another query’s input. - Centre — the data preview. Shows what the query produces at this step. Above it, a formula bar holds the M code for the current step. M is Power Query’s underlying language; you don’t need to write it from scratch but reading it helps debugging.
- Right sidebar — the
Applied Stepspanel. The recipe. Every transformation you apply in the UI shows up here as a step. You can click any step to see what the data looked like at that point, edit it, delete it, or reorder.
That Applied Steps panel is the entire point. The pipeline is visible. Every transformation is an inspectable, editable, reversible operation. When the source data changes next month, every step re-runs in order. When something breaks, you can step through the recipe and see exactly which step failed and on which row.
Cleaning the customer master
Let’s walk the customer master cleanup step by step.
Trim whitespace. Select the customer_name and country columns, then Transform → Format → Trim. The leading and trailing spaces on Ashford Auto Repair, Eastgate Wholesale, and a handful of others disappear. A new step Trimmed Text appears in Applied Steps. The M code shows = Table.TransformColumns(Source, {{"customer_name", Text.Trim}, {"country", Text.Trim}}) — readable, even if you’ve never written M.
Uppercase country codes for matching. Right-click the country column → Transform → UPPERCASE. Now us becomes US, which collapses one of our duplicates immediately. Two of the five US variants are now the same.
Replace values for canonicalisation. Right-click country → Replace Values. Replace USA with US. Apply. Replace UNITED STATES OF AMERICA with US. Replace GREAT BRITAIN with UK. Replace DE with GERMANY. Each replacement is a separate Applied Step — meaning they’re independently inspectable and reversible. This is much better than a giant nested IF formula in a helper column, because a future reader sees the cleanup as a documented, ordered transformation.
Fix the country casing back to title case. Now that all variants are unified, change country back to title case so the pivot shows Us — wait, that’s wrong, we want US and UK to stay uppercase. So: change to title case for everything, then a single Replace Values for Us → US and Uk → UK. (In a real workflow you’d use a small reference table; we’ll show that in the workbook.)
Change column types. Power Query infers types but it’s worth being explicit. Click each column header and set: customer_id Text, country Text, credit_limit Whole Number, customer_name Text. Wrong types cause subtle failures downstream — a credit_limit imported as Text won’t sum correctly in a pivot.
Rename the query. In the right pane, change the query name from customers to customers_clean. This is what shows up in your worksheet’s queries pane and what other queries reference.
When you click Home → Close & Load, Power Query loads the cleaned table into a new sheet. Queries on the right pane shows customers_clean with row count. Crucially: the data is now connected to the source file. Edit customers.csv, save, then Data → Refresh All, and the cleanup runs again on the new data.
The Group By and Merge moves
Two transformations carry their weight in finance work and deserve names of their own.
Group By. From the Power Query Editor, Home → Group By. Pick country and the aggregation Sum of credit_limit. The result collapses fifty customer rows into eight country totals. This is what the pivot does, but baked into the query — useful when you want a pre-aggregated table to feed multiple pivots without paying the aggregation cost each time.
Merge. This is Power Query’s join. Home → Merge Queries. Pick a left table (gl_journal), a right table (customers_clean), and the join column on each side (source_doc matched against another query that stitches it through sales_orders). Choose the join type (left outer is the default; left-outer gives you all GL rows even if the customer is missing). Power Query expands the matched columns inline. The chained-XLOOKUP work from Episode 3 — looking up customer_country across two tables — becomes two Merge steps in Power Query, each with named columns instead of nested formulas.
Once you have the cleaned customer master and the merged GL, build the pivot in Episode 4’s style. The country column in the pivot now has four country columns instead of seven — Canada, France, Germany, UK, US. The numbers stayed the same; the duplicates collapsed. That’s the win.
What the recipe buys you
The first time you build this, it feels like more work than just cleaning the CSV by hand. It is. The payoff is in week two.
Next month, finance exports a new customers.csv. Same path, same shape, same dirty data — and probably some new dirty values you haven’t seen before. You drop the file in place. You click Data → Refresh All. The recipe runs. The new file flows through the same trim, uppercase, replace, type-set, rename pipeline. The new dirty values that match existing replacement rules get cleaned automatically. The new ones that don’t match show up in your country list as obvious outliers — MEXICO shows up as a new column instead of being silently aggregated into something wrong.
You added a row to the source file? It’s in the cleaned table. You renamed a column upstream? Power Query throws a clear error pointing at the step that referenced the missing column — much better than a SUMIFS silently returning zero. You want to add a new transformation? Edit the recipe; the change applies retroactively to every previous and future refresh.
This is what Excel-driven finance teams move to once they realise the unit of work is the recipe, not the spreadsheet.
Three Power Query gotchas
Refresh order matters. If query B uses query A’s output, A has to refresh before B. Excel handles this automatically when you click Refresh All, but if you run individual refreshes manually you can produce a stale state. Stick with Refresh All.
Type inference can lie. Power Query inspects the first 200 rows to infer types. If your first 200 rows have an integer credit_limit but row 250 has a decimal, the column gets typed as Whole Number and row 250 silently errors. Set types explicitly, and check the Errors link in the Queries pane after every refresh.
Sources pinned to absolute paths break on share. If your query references C:\Users\daryl\Documents\customers.csv, sending the workbook to a colleague gives them a broken refresh. Use a parameter for the file path, or store the source file alongside the workbook in a known relative location. Both fixes are documented and standard; pick one and stick with it.
What’s next
We now have a cleaned customer master and the start of a real pipeline. Episode 6 closes Season 1 by walking what makes spreadsheets robust — Excel Tables (Ctrl+T from Episode 4 done properly), named ranges, dynamic array functions like FILTER and UNIQUE and LET, and the discipline that makes a workbook hand-offable rather than personal.
After Season 1 wraps, Season 2 picks up where Power Query left off. Power Query in Excel is the single-workbook version. Power Query in Power BI is the same engine, scaled up — multiple sources, full data model, DAX, and dashboards that refresh on their own schedule. Episode 7 starts the cross-over.
Same data. Same Atlas Parts. Different ceiling.