Dataclasses (@dataclass, field, frozen, post_init) - Python Tutorial #30
Video: Dataclasses (@dataclass, field, frozen, post_init) - Python Tutorial #30 by Taught by Celeste AI - AI Coding Coach
Python Dataclasses: @dataclass, field, frozen, post_init
@dataclass generates __init__, __repr__, and __eq__ from class annotations. field(default_factory=list) for mutable defaults. frozen=True for immutability. __post_init__ for derived state. The right tool for "data-shaped" classes.
For classes whose main job is to bundle data, writing __init__, __repr__, and __eq__ by hand is tedious. @dataclass does it for you.
A regular class, then a dataclass
# Regular class: verbose
class Point:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x and self.y == other.y

# Dataclass: same behavior
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

p = Point(3.0, 4.0)
print(p)                     # Point(x=3.0, y=4.0)
print(p == Point(3.0, 4.0))  # True
@dataclass reads the class annotations and generates:
- __init__ accepting all annotated attributes.
- __repr__ showing class + fields.
- __eq__ comparing all fields.
Less code, fewer bugs.
Defaults
@dataclass
class User:
    name: str
    email: str
    age: int = 0
    active: bool = True

u = User("Alice", "alice@example.com")
print(u)
# User(name='Alice', email='alice@example.com', age=0, active=True)
Like regular function parameters, defaults must come after non-defaults.
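The ordering rule is easy to see by letting a bad definition fail. A minimal sketch; the class name BadUser is hypothetical, and the exact wording of the error message may vary between Python versions:

```python
from dataclasses import dataclass

try:
    @dataclass
    class BadUser:
        name: str
        age: int = 0
        email: str  # non-default field placed after a default field
except TypeError as exc:
    # The error is raised while the class is being defined,
    # before any instance is created.
    error_message = str(exc)
    print(error_message)
```

Moving email above age, or giving it a default, makes the definition valid.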
The mutable default trap
@dataclass
class Team:
    name: str
    members: list = []  # ERROR: same trap as function defaults

# If this were allowed, every instance would share one list:
# Team("X").members is Team("Y").members → True
@dataclass catches this at class definition time and raises ValueError: mutable default <class 'list'> for field members is not allowed: use default_factory.
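To confirm that the check fires when the class is defined, not when an instance is created, the definition can be wrapped in try/except. A small sketch; BadTeam is a hypothetical name:

```python
from dataclasses import dataclass

try:
    @dataclass
    class BadTeam:
        name: str
        members: list = []  # rejected by @dataclass
except ValueError as exc:
    # No BadTeam instance was ever created; the decorator itself raised.
    caught = str(exc)
    print(caught)
```

The same rejection applies to dict and set defaults.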
The fix: field(default_factory=...):
from dataclasses import dataclass, field

@dataclass
class Team:
    name: str
    members: list[str] = field(default_factory=list)

t1 = Team("Backend")
t1.members.append("Alice")
t2 = Team("Frontend")
print(t1.members)  # ['Alice']
print(t2.members)  # [] (separate list)
default_factory is called once per instance — fresh list each time.
field() options
field(default=value) # plain default
field(default_factory=callable) # dynamic default
field(repr=False) # exclude from __repr__
field(compare=False) # exclude from __eq__
field(init=False) # not in __init__
field(metadata={"key": "val"}) # arbitrary metadata
Example:
@dataclass
class Team:
    name: str
    members: list[str] = field(default_factory=list)
    max_size: int = field(default=5, repr=False)
max_size doesn't show in repr. Useful for noisy defaults.
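Several of the field() options can be combined in one class. The sketch below is illustrative only; the Account class and its field names are hypothetical:

```python
from dataclasses import dataclass, field, fields

@dataclass
class Account:
    username: str
    password: str = field(repr=False)                       # hidden from __repr__
    last_login: float = field(default=0.0, compare=False)   # ignored by __eq__
    quota_mb: int = field(default=100, metadata={"unit": "MB"})

a = Account("alice", "s3cret", last_login=1.0)
b = Account("alice", "s3cret", last_login=2.0)

print(a)       # Account(username='alice', last_login=1.0, quota_mb=100)
print(a == b)  # True: last_login is excluded from the comparison

# metadata is not used by dataclasses itself; read it back via fields().
units = {f.name: f.metadata.get("unit") for f in fields(Account)}
print(units["quota_mb"])  # MB
```

metadata is purely for your own code or third-party tools; dataclasses stores it and otherwise ignores it.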
Frozen dataclasses
@dataclass(frozen=True)
class Config:
    host: str
    port: int
    debug: bool = False

cfg = Config("localhost", 8080)
cfg.port = 9090  # FrozenInstanceError
frozen=True makes the dataclass immutable. Setting attributes raises an error.
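Frozen dataclasses are also hashable by default: with eq=True, a __hash__ based on the fields is generated. As long as every field value is itself hashable, you can use them as dict keys or in sets: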
configs = {Config("a", 1), Config("b", 2)}
For "value objects" — coordinates, IDs, settings — frozen is the right default.
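A short sketch of how a frozen value object behaves in practice, reusing the Config class from above: assignment fails, dataclasses.replace builds a modified copy, and equal instances hash the same.

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Config:
    host: str
    port: int
    debug: bool = False

cfg = Config("localhost", 8080)

try:
    cfg.port = 9090
except FrozenInstanceError:
    print("cannot assign to a frozen instance")

# replace() returns a new instance with the given fields changed.
new_cfg = replace(cfg, port=9090)
print(new_cfg)  # Config(host='localhost', port=9090, debug=False)

# Equal field values give equal hashes, so lookups by value work.
labels = {cfg: "old", new_cfg: "new"}
print(labels[Config("localhost", 8080)])  # old
```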
__post_init__: derived attributes
@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self):
        self.area = self.width * self.height

r = Rectangle(5.0, 3.0)
print(r.area)  # 15.0
__post_init__ is called by the generated __init__ after all fields have been assigned. Use for:
- Derived attributes (compute from inputs).
- Validation (raise if invalid).
- Type conversions.
field(init=False) says "don't take this as an __init__ argument" — needed for derived fields.
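The type-conversion use has no example above, so here is one possible sketch. Dataclasses do not enforce annotations at runtime, so __post_init__ is a natural place to coerce inputs. The LogFile class is hypothetical:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class LogFile:
    path: Path
    max_lines: int

    def __post_init__(self):
        # Annotations are not checked at runtime; coerce explicitly.
        self.path = Path(self.path)
        self.max_lines = int(self.max_lines)

log = LogFile("logs/app.log", "500")
print(log.path.name)      # app.log
print(log.max_lines + 1)  # 501
```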
Validation in __post_init__
@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self):
        if self.age < 0:
            raise ValueError(f"Age must be non-negative, got {self.age}")
Common pattern. For more rigorous validation, look at pydantic — full schema validation with similar syntax.
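One limit of this pattern is worth seeing in code: __post_init__ runs only during construction, so later assignments are not re-validated. A sketch using the Person class from above:

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self):
        if self.age < 0:
            raise ValueError(f"Age must be non-negative, got {self.age}")

try:
    Person("Bob", -1)
except ValueError as exc:
    message = str(exc)
    print(message)  # Age must be non-negative, got -1

p = Person("Bob", 30)
p.age = -5    # not rejected: __post_init__ does not run on assignment
print(p.age)  # -5
```

Combining the check with frozen=True is one way to keep a validated instance valid.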
All the @dataclass options
@dataclass(
    init=True,          # generate __init__ (default)
    repr=True,          # generate __repr__ (default)
    eq=True,            # generate __eq__ (default)
    order=False,        # generate __lt__, __le__, __gt__, __ge__
    frozen=False,       # immutable
    unsafe_hash=False,  # generate __hash__ even when not frozen
    slots=False,        # use __slots__ (Python 3.10+)
)
class C:
    ...
order=True generates comparison methods that compare instances as tuples of their fields, in definition order:
@dataclass(order=True)
class Score:
    points: int
    name: str

scores = sorted([Score(80, "Bob"), Score(95, "Alice")])
# [Score(points=80, name='Bob'), Score(points=95, name='Alice')]
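Because the comparison works field by field, ties on the first field fall through to the next one, and field(compare=False) removes a field from the ordering. A sketch extending Score with a hypothetical note field:

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Score:
    points: int
    name: str
    note: str = field(default="", compare=False)  # ignored by ==, <, >, ...

a = Score(80, "Bob")
b = Score(80, "Alice", note="late entry")
c = Score(95, "Carol")

# Equal points, so the tie is broken by name.
names = [s.name for s in sorted([a, b, c])]
print(names)  # ['Alice', 'Bob', 'Carol']
print(a > b)  # True: (80, 'Bob') > (80, 'Alice')
```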
slots=True (Python 3.10+)
@dataclass(slots=True)
class Point:
    x: float
    y: float
Generates __slots__ — fixed attribute set, no per-instance __dict__. Smaller memory, slightly faster attribute access. Useful when you create many instances.
Tradeoff: can't add attributes outside the declared set. Usually a non-issue for data classes.
A product catalog
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Product:
    sku: str
    name: str
    price: float
    tags: list[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)

    def __post_init__(self):
        if self.price < 0:
            raise ValueError("Price cannot be negative")

@dataclass(frozen=True)
class OrderItem:
    product_sku: str
    quantity: int

@dataclass
class Order:
    customer: str
    items: list[OrderItem] = field(default_factory=list)

    @property
    def total_quantity(self) -> int:
        return sum(item.quantity for item in self.items)
Real-world shape: a few classes, validated, immutable where it matters, with derived properties. Compare that to writing all the dunder methods manually.
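To see the three classes working together, here is one way they might be used. The class definitions are repeated so the snippet runs on its own; the SKUs, prices, and the catalog dict are made-up illustration data, not part of the original example:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Product:
    sku: str
    name: str
    price: float
    tags: list[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)

    def __post_init__(self):
        if self.price < 0:
            raise ValueError("Price cannot be negative")

@dataclass(frozen=True)
class OrderItem:
    product_sku: str
    quantity: int

@dataclass
class Order:
    customer: str
    items: list[OrderItem] = field(default_factory=list)

    @property
    def total_quantity(self) -> int:
        return sum(item.quantity for item in self.items)

# Hypothetical catalog keyed by SKU.
catalog = {
    "KB-1": Product("KB-1", "Keyboard", 49.0, tags=["input"]),
    "MS-1": Product("MS-1", "Mouse", 19.5),
}

order = Order("Alice")
order.items.append(OrderItem("KB-1", 1))
order.items.append(OrderItem("MS-1", 2))

# Look each item up in the catalog to price the order.
total_price = sum(catalog[i.product_sku].price * i.quantity for i in order.items)
print(order.total_quantity)  # 3
print(total_price)           # 88.0
```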
Inheritance
@dataclass
class Animal:
    name: str
    age: int

@dataclass
class Dog(Animal):
    breed: str

d = Dog("Rex", 5, "Lab")
print(d)  # Dog(name='Rex', age=5, breed='Lab')
Subclasses inherit fields. Constructor args are parent fields first, then child fields.
If the parent has fields with defaults, a child field without a default raises TypeError at class definition, the same rule as for function parameters. Either give the child fields defaults or use kw_only=True (3.10+).
When to use a dataclass vs alternatives
- Plain class — when you need significant logic, multiple constructors, complex behavior.
- NamedTuple — when you need a lightweight immutable record (and accept tuple semantics).
- TypedDict — when the data is already a dict and you want type checking.
- pydantic.BaseModel — when you need rigorous validation, JSON schema, parse/serialize.
- dataclass — the sweet spot: structured data, optional methods, clean syntax.
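The "tuple semantics" caveat for NamedTuple can be made concrete with a small comparison. The names PointNT and PointDC are hypothetical:

```python
from dataclasses import dataclass
from typing import NamedTuple

class PointNT(NamedTuple):
    x: float
    y: float

@dataclass(frozen=True)
class PointDC:
    x: float
    y: float

nt = PointNT(1.0, 2.0)
dc = PointDC(1.0, 2.0)

print(nt == (1.0, 2.0))  # True: a NamedTuple is a tuple
print(dc == (1.0, 2.0))  # False: dataclass __eq__ only matches its own class

x, y = nt                # unpacking and indexing come for free
print(nt[0], x, y)       # 1.0 1.0 2.0
```

If accidental equality with plain tuples or positional unpacking would be a bug in your code, the frozen dataclass is the safer choice.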
Common stumbles
Mutable default without field(default_factory=...). members: list = [] raises immediately. Use the factory.
Field after default-field. name: str after age: int = 0 raises TypeError: "non-default argument 'name' follows default argument."
@dataclass without annotations. Fields must have type hints, or they're just class attributes (not constructor args). Even Any works as a hint.
Assigning to frozen. cfg.port = 9090 on a frozen=True dataclass raises FrozenInstanceError. Create a new instance with dataclasses.replace(cfg, port=9090).
Forgetting __post_init__ for validation. Construction succeeds with bad data; problems show up later.
Inheritance default ordering. Child can't have non-default field after parent's default field. Use kw_only=True if you can.
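The missing-annotation stumble is the quietest of these, since nothing raises. A sketch with a hypothetical Settings class, using dataclasses.fields to show what @dataclass actually picked up:

```python
from dataclasses import dataclass, fields

@dataclass
class Settings:
    host: str
    retries = 3  # no annotation: a plain class attribute, not a field

field_names = [f.name for f in fields(Settings)]
print(field_names)  # ['host']

s = Settings("localhost")
print(s)          # Settings(host='localhost')
print(s.retries)  # 3, read from the class, shared by all instances
```

Writing retries: int = 3 instead turns it into a real field that appears in __init__, __repr__, and __eq__.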
What's next
Lesson 31: async / await. Concurrency for I/O-bound work. asyncio, async def, await.
Recap
@dataclass generates __init__, __repr__, __eq__ from class annotations. Use field(default_factory=list) for mutable defaults. frozen=True for immutable, hashable instances. __post_init__ for derived attributes and validation. order=True for comparison methods. slots=True (3.10+) for memory efficiency. The right tool for "data with a few methods" — between plain classes and pydantic.
Next lesson: async / await.