3 Ways to Read JSON into a pandas DataFrame
Video: 3 Ways to Read JSON into a pandas DataFrame by CelesteAI
Every REST API, every webhook, every BigQuery export hands you JSON. The shape varies — sometimes a records array, sometimes one-object-per-line, sometimes deeply nested — but the destination is always a DataFrame.
The good news: pandas reads all three shapes in a single line. No manual parsing, no for-loop over keys, no flattening helper. The wire format changes; the call stays simple.
The shape of the problem
JSON is how the web talks to itself. It’s also how data warehouses unload, how log shippers stream, how every modern API hands back results. Three shapes show up over and over:
- Records — a JSON array of objects:
[{...}, {...}]. The default REST response. - JSON Lines (also called NDJSON) — one object per line, no enclosing array:
{...}\n{...}\n. Streaming-friendly. Every BigQuery export, every log pipeline. - Nested — objects inside objects:
{"id": 1, "execution": {"price": 273.08, "venue": "NYSE"}}. What real APIs actually return.
Pandas reads all three. Two with pd.read_json. One with pd.json_normalize.
Method 1 — records array
The canonical pattern:
import pandas as pd
df = pd.read_json("prices.json")
print(df.shape) # (420, 8)
Two things to note:
pd.read_jsonauto-detects records orientation when the top-level value is a list of objects. The dict keys become column names. The objects become rows.- If you ever need to be explicit — say, a server hands back the columnar shape — pass
orient="records",orient="index",orient="columns", ororient="split". Records is the default for what 99% of REST endpoints return.
One line in, one DataFrame out. Most analyst scripts never need more.
Method 2 — JSON Lines
What if the file isn’t a single JSON document but a stream — one object per line, no enclosing array? That’s JSON Lines, also called NDJSON (“newline-delimited JSON”). It’s the format that BigQuery, Snowflake, Loki, and most log shippers emit because it’s streamable: each line parses independently.
Same call, one keyword:
lines = pd.read_json("prices.jsonl", lines=True)
print(lines.shape) # (420, 8)
The data is identical to the records array. Only the wire format changed: instead of [ {...}, {...} ], it’s {...}\n{...}\n. Pandas reads it line-by-line and stitches the result into the same DataFrame you’d get from read_json("prices.json").
This is the move that saves you for line in file: json.loads(line) boilerplate. One keyword, done.
Method 3 — nested → flat with json_normalize
Real API responses are rarely flat. A trade execution comes back with a nested execution block. A user record comes back with a nested address. A search result comes back with a nested metadata.
read_json keeps the nesting — the column ends up holding a dict. That’s almost never what you want for analysis. You want flat columns you can filter and group by.
That’s what pd.json_normalize does:
import json
import pandas as pd
with open("trades.json") as f:
raw = json.load(f)
trades = pd.json_normalize(raw)
print(list(trades.columns))
# ['trade_id', 'symbol', 'sector', 'execution.price', 'execution.size', 'execution.venue']
Each nested key becomes a dot-path column — execution.price, execution.size, execution.venue. The same convention used by every JSON path library, every API doc, every flattening tool. Predictable, consistent, machine-readable.
If your nesting goes deeper than one level — user.address.city — json_normalize flattens that too. You can also pass record_path to pull a sub-array out into rows, or meta to keep parent keys alongside the flattened children. Read the json_normalize docs when you need those.
When to use which
| Wire shape | Pattern |
|---|---|
[ {...}, {...} ] (records array) |
pd.read_json("file.json") |
{...}\n{...}\n (JSON Lines / NDJSON) |
pd.read_json("file.jsonl", lines=True) |
{...} with nested objects inside |
pd.json_normalize(raw) (after json.load) |
| HTTP response → DataFrame | pd.json_normalize(response.json()) |
| Sub-array inside parent object | pd.json_normalize(raw, record_path=[...], meta=[...]) |
What you’re not going to do
A few patterns that show up on Stack Overflow but you should avoid:
- Manual
json.load+ dict-of-dicts comprehension. Just usepd.read_jsonorjson_normalize. The comprehension version always misses an edge case. for line in file: json.loads(line)thenpd.DataFrame(rows). Uselines=True. One keyword replaces a five-line loop.- String-replacing the nesting flat before parsing.
json_normalizeexists for exactly this. Don’t fight JSON with regex. - Reading every line of NDJSON as a single document. That’ll raise
JSONDecodeErroron the second line. Uselines=True.
The full demo
Here’s the complete script the video walks through:
import pandas as pd
# Method 1: records-shaped JSON (list of dicts — REST API style)
df = pd.read_json("prices.json")
print(f"records: {df.shape}")
# Method 2: JSON Lines / NDJSON (one object per line)
lines = pd.read_json("prices.jsonl", lines=True)
print(f"json lines: {lines.shape}")
# Method 3: nested JSON → flat columns with json_normalize
import json
with open("trades.json") as f:
raw = json.load(f)
trades = pd.json_normalize(raw)
print(f"normalized columns: {list(trades.columns)}")
print(f"normalized shape: {trades.shape}")
Running it:
records: (420, 8)
json lines: (420, 8)
normalized columns: ['trade_id', 'symbol', 'sector', 'execution.price', 'execution.size', 'execution.venue']
normalized shape: (8, 6)
Three shapes, one library, the same DataFrame surface every time.
Takeaways
pd.read_jsonreads any records-shaped JSON in one line. No loop, no manual parse.lines=Truereads JSON Lines / NDJSON. Same call, streamable wire format.pd.json_normalizeflattens nested objects into dot-path columns.execution.price,execution.venue— ready for filter and groupby.- The wire format is a transport detail. Records and JSON Lines yield the same DataFrame. Pick records for human readability, lines for streamability.
- JSON is not your final layer. Read once, work in DataFrame.
Watch the video for the full demo, including the nested-to-flat normalization in action.