Back to Blog

3 Ways to Read JSON into a pandas DataFrame

Celest KimCelest Kim

Video: 3 Ways to Read JSON into a pandas DataFrame by CelesteAI

Take the quiz on the full lesson page
Test what you've read · interactive walkthrough

Every REST API, every webhook, every BigQuery export hands you JSON. The shape varies — sometimes a records array, sometimes one-object-per-line, sometimes deeply nested — but the destination is always a DataFrame.

The good news: pandas reads all three shapes in a single line. No manual parsing, no for-loop over keys, no flattening helper. The wire format changes; the call stays simple.

The shape of the problem

JSON is how the web talks to itself. It’s also how data warehouses unload, how log shippers stream, how every modern API hands back results. Three shapes show up over and over:

  • Records — a JSON array of objects: [{...}, {...}]. The default REST response.
  • JSON Lines (also called NDJSON) — one object per line, no enclosing array: {...}\n{...}\n. Streaming-friendly. Every BigQuery export, every log pipeline.
  • Nested — objects inside objects: {"id": 1, "execution": {"price": 273.08, "venue": "NYSE"}}. What real APIs actually return.

Pandas reads all three. Two with pd.read_json. One with pd.json_normalize.

Method 1 — records array

The canonical pattern:

import pandas as pd

df = pd.read_json("prices.json")
print(df.shape)   # (420, 8)

Two things to note:

  • pd.read_json auto-detects records orientation when the top-level value is a list of objects. The dict keys become column names. The objects become rows.
  • If you ever need to be explicit — say, a server hands back the columnar shape — pass orient="records", orient="index", orient="columns", or orient="split". Records is the default for what 99% of REST endpoints return.

One line in, one DataFrame out. Most analyst scripts never need more.

Method 2 — JSON Lines

What if the file isn’t a single JSON document but a stream — one object per line, no enclosing array? That’s JSON Lines, also called NDJSON (“newline-delimited JSON”). It’s the format that BigQuery, Snowflake, Loki, and most log shippers emit because it’s streamable: each line parses independently.

Same call, one keyword:

lines = pd.read_json("prices.jsonl", lines=True)
print(lines.shape)   # (420, 8)

The data is identical to the records array. Only the wire format changed: instead of [ {...}, {...} ], it’s {...}\n{...}\n. Pandas reads it line-by-line and stitches the result into the same DataFrame you’d get from read_json("prices.json").

This is the move that saves you for line in file: json.loads(line) boilerplate. One keyword, done.

Method 3 — nested → flat with json_normalize

Real API responses are rarely flat. A trade execution comes back with a nested execution block. A user record comes back with a nested address. A search result comes back with a nested metadata.

read_json keeps the nesting — the column ends up holding a dict. That’s almost never what you want for analysis. You want flat columns you can filter and group by.

That’s what pd.json_normalize does:

import json
import pandas as pd

with open("trades.json") as f:
    raw = json.load(f)

trades = pd.json_normalize(raw)
print(list(trades.columns))
# ['trade_id', 'symbol', 'sector', 'execution.price', 'execution.size', 'execution.venue']

Each nested key becomes a dot-path columnexecution.price, execution.size, execution.venue. The same convention used by every JSON path library, every API doc, every flattening tool. Predictable, consistent, machine-readable.

If your nesting goes deeper than one level — user.address.cityjson_normalize flattens that too. You can also pass record_path to pull a sub-array out into rows, or meta to keep parent keys alongside the flattened children. Read the json_normalize docs when you need those.

When to use which

Wire shape Pattern
[ {...}, {...} ] (records array) pd.read_json("file.json")
{...}\n{...}\n (JSON Lines / NDJSON) pd.read_json("file.jsonl", lines=True)
{...} with nested objects inside pd.json_normalize(raw) (after json.load)
HTTP response → DataFrame pd.json_normalize(response.json())
Sub-array inside parent object pd.json_normalize(raw, record_path=[...], meta=[...])

What you’re not going to do

A few patterns that show up on Stack Overflow but you should avoid:

  • Manual json.load + dict-of-dicts comprehension. Just use pd.read_json or json_normalize. The comprehension version always misses an edge case.
  • for line in file: json.loads(line) then pd.DataFrame(rows). Use lines=True. One keyword replaces a five-line loop.
  • String-replacing the nesting flat before parsing. json_normalize exists for exactly this. Don’t fight JSON with regex.
  • Reading every line of NDJSON as a single document. That’ll raise JSONDecodeError on the second line. Use lines=True.

The full demo

Here’s the complete script the video walks through:

import pandas as pd

# Method 1: records-shaped JSON (list of dicts — REST API style)
df = pd.read_json("prices.json")
print(f"records:              {df.shape}")

# Method 2: JSON Lines / NDJSON (one object per line)
lines = pd.read_json("prices.jsonl", lines=True)
print(f"json lines:           {lines.shape}")

# Method 3: nested JSON → flat columns with json_normalize
import json
with open("trades.json") as f:
    raw = json.load(f)
trades = pd.json_normalize(raw)
print(f"normalized columns:   {list(trades.columns)}")
print(f"normalized shape:     {trades.shape}")

Running it:

records:              (420, 8)
json lines:           (420, 8)
normalized columns:   ['trade_id', 'symbol', 'sector', 'execution.price', 'execution.size', 'execution.venue']
normalized shape:     (8, 6)

Three shapes, one library, the same DataFrame surface every time.

Takeaways

  1. pd.read_json reads any records-shaped JSON in one line. No loop, no manual parse.
  2. lines=True reads JSON Lines / NDJSON. Same call, streamable wire format.
  3. pd.json_normalize flattens nested objects into dot-path columns. execution.price, execution.venue — ready for filter and groupby.
  4. The wire format is a transport detail. Records and JSON Lines yield the same DataFrame. Pick records for human readability, lines for streamability.
  5. JSON is not your final layer. Read once, work in DataFrame.

Watch the video for the full demo, including the nested-to-flat normalization in action.

Ready? Take the quiz on the full lesson page →
Test what you've learned. Watch the lesson and try the interactive quiz on the same page.