Dataclasses

`dataclass` is Python's most natural tool for building lightweight structured objects. It is more than a convenience decorator for `__init__`; it is a clean way to model value objects, internal command payloads, settings, and pattern-matching nodes without hand-writing repetitive boilerplate.

Quick takeaway: a dataclass is not a validation engine or an ORM replacement. It is a value-oriented class generator that removes boilerplate around initialization, equality, and representation. In practice, `frozen=True`, `slots=True`, `kw_only=True`, `default_factory`, and `__post_init__()` are the features worth knowing deeply.

The Mental Model

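A dataclass does not replace the class model. The decorator leaves you with an ordinary class and generates methods such as `__init__()`, `__repr__()`, and `__eq__()` from the annotated fields, which makes value-oriented usage much lighter.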
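As a rough illustration of what "generates methods" means, the sketch below puts a dataclass next to a hand-written class with similar behavior. `Point` and `ManualPoint` are hypothetical names chosen for this example, and the hand-written version only approximates the generated code.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


# Approximately what the decorator writes for you.
class ManualPoint:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        return f"ManualPoint(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other: object) -> bool:
        # Compare field by field, but only against the same class.
        if other.__class__ is self.__class__:
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented


p = Point(1, 2)
print(p)                 # Point(x=1, y=2)
print(p == Point(1, 2))  # True
```

Both classes are plain Python classes; the dataclass version simply derives the three methods from the field annotations.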

When Dataclasses Fit Extremely Well

  • value objects
  • internal commands and queries
  • settings objects
  • pattern-matching nodes
  • fixture data bundles in tests

When They Are a Poor Fit

  • external input boundaries that need strong runtime validation
  • ORM entities with richer lifecycle and lazy loading
  • framework objects that rely on dynamic attributes
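The first point deserves a concrete look: a dataclass does not check field types at runtime, so untrusted input needs its own validation step. The sketch below assumes a hypothetical `CreateUser` command and a hypothetical `parse_create_user()` boundary function; neither comes from a library.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CreateUser:  # hypothetical internal command
    email: str
    age: int


# Annotations are not enforced: this constructs without any error.
unchecked = CreateUser(email=123, age="old")  # type: ignore[arg-type]


def parse_create_user(payload: dict[str, Any]) -> CreateUser:
    """Hypothetical boundary function: validate first, then build the value object."""
    email = payload.get("email")
    age = payload.get("age")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError("email must be a string containing '@'")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or age < 0:
        raise ValueError("age must be a non-negative integer")
    return CreateUser(email=email.strip().lower(), age=age)


command = parse_create_user({"email": " Alice@Example.com ", "age": 30})
print(command)  # CreateUser(email='alice@example.com', age=30)
```

Hand-written checks like these are manageable for one or two types. Once they multiply, a dedicated validation library at the boundary is usually the better trade.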

A Safe High-Value Combination

py
from dataclasses import dataclass


@dataclass(slots=True, frozen=True, kw_only=True)
class RetryPolicy:
    max_attempts: int
    base_delay_ms: int = 100
    # Tuples are immutable, so a plain default is safe here;
    # default_factory is only needed for mutable values.
    retry_on: tuple[str, ...] = ("timeout", "busy")

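`frozen=True` gives immutable value-object semantics by rejecting attribute assignment after construction. `slots=True` generates `__slots__`, so instances carry no per-instance `__dict__`, use less memory, and reject attributes that were never declared. `kw_only=True` forces every field to be passed by keyword, which reduces positional-call mistakes in settings and command payloads. Both `slots` and `kw_only` require Python 3.10 or later.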

Why default_factory Matters

py
from dataclasses import dataclass, field


@dataclass
class Batch:
    job_id: str
    items: list[str] = field(default_factory=list)
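
  • Mutable defaults should not be written as `[]`, `{}`, or `set()` directly. The decorator rejects such a default with a `ValueError` when the class is defined.
  • The official dataclasses docs explicitly recommend `default_factory` for this reason: the factory runs once per instance, so each object gets its own container.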
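The following sketch demonstrates both halves of that advice: the error raised for a bare mutable default, and the per-instance list produced by `default_factory`. `BadBatch` is a hypothetical counterexample added for illustration.

```python
from dataclasses import dataclass, field

# A bare mutable default is rejected while the class is being defined.
try:
    @dataclass
    class BadBatch:
        job_id: str
        items: list[str] = []
except ValueError as exc:
    definition_error = exc


@dataclass
class Batch:
    job_id: str
    items: list[str] = field(default_factory=list)


a = Batch("job-a")
b = Batch("job-b")
a.items.append("x")

# Each instance received its own list from the factory.
print(a.items, b.items)  # ['x'] []
```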

__post_init__() Is the Normalization Hook

py
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailAddress:
    value: str

    def __post_init__(self) -> None:
        normalized = self.value.strip().lower()
        if "@" not in normalized:
            raise ValueError("invalid email address")
        # frozen=True blocks normal assignment, so go through object.__setattr__.
        object.__setattr__(self, "value", normalized)

__post_init__() is great for normalization and cross-field invariants. But once this grows into a full input-validation framework, a separate validation boundary such as Pydantic often becomes a better fit.
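For the cross-field case, a minimal sketch might look like the following. `DateRange` is a hypothetical type chosen for illustration; the point is that `__post_init__()` runs after all fields are assigned, so it can compare them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:  # hypothetical value object
    start: date
    end: date

    def __post_init__(self) -> None:
        # Both fields are already set here, so the invariant can span them.
        if self.end < self.start:
            raise ValueError("end must not be before start")


january = DateRange(date(2024, 1, 1), date(2024, 1, 31))

try:
    DateRange(date(2024, 2, 1), date(2024, 1, 1))
except ValueError as exc:
    range_error = exc
```

An invalid range never exists as an object, which is what keeps the invariant trustworthy elsewhere in the code.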

Dataclasses and Pattern Matching Work Well Together

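  • Dataclasses generate `__match_args__` by default, so class patterns can match fields positionally in declaration order.
  • That makes them pleasant for AST-like nodes or internal command objects used with `match`/`case`.
  • The `match_args` and `kw_only` parameters both arrived in Python 3.10. Keyword-only fields are left out of `__match_args__`, so a class pattern has to match them by keyword.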

Practical Guidance

| Situation | Good default |
| --- | --- |
| internal value object | `frozen=True`, and often `slots=True` |
| settings or command payload | use `kw_only=True` aggressively |
| list/dict/set field | use `default_factory` |
| API boundary | keep validation separate |
| ORM entity | do not force dataclass and ORM concerns into one model |

Common Mistakes

  • using mutable defaults directly
  • treating dataclasses like request validators
  • putting mutable objects such as lists inside a frozen dataclass and assuming the whole value is immutable; `frozen=True` only blocks attribute reassignment
  • collapsing ORM models, dataclass DTOs, and Pydantic schemas together
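The third mistake is easy to miss, so here is a minimal sketch of it. `Tags` and `SafeTags` are hypothetical types: the first holds a list inside a frozen dataclass, the second uses a tuple instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tags:  # hypothetical: frozen, but holds a mutable list
    names: list[str]


@dataclass(frozen=True)
class SafeTags:  # hypothetical: immutable all the way down
    names: tuple[str, ...]


tags = Tags(["a"])
tags.names.append("b")  # allowed: frozen only blocks rebinding the attribute

# The generated __hash__ hashes the fields, and a list is unhashable.
try:
    hash(tags)
except TypeError as exc:
    hash_error = exc

safe = SafeTags(("a",))
lookup = {safe: "ok"}  # hashable, so it works as a dict key
```

Using tuples (or other immutable containers) for fields keeps the value-object promise intact.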

Practical Checklist

Is this a value object?

Dataclasses work best when data shape and equality matter more than rich behavior.

Are mutable defaults avoided?

Use `default_factory` for list, dict, and set fields by default.

Is validation separated?

External boundaries usually need runtime validation beyond what a dataclass alone should own.

Would keyword-only be safer?

For settings-like or longer payload types, `kw_only=True` often improves call-site clarity.
