Post-Init Validation: Securing Your Dataclass Data

The __post_init__ method in dataclasses allows you to run custom validation or initialization logic immediately after __init__ completes. This is your guard rail: enforce data invariants, compute derived fields, and prevent invalid objects from existing. Skipping validation in dataclasses is a common mistake that costs debugging time in production.

I've seen APIs crash because dataclass fields were never validated, allowing negative ages, empty required strings, and out-of-range values to slip through. This article shows you how to add the discipline of assertion and validation at instantiation time.

How `__post_init__` Works

When you define a __post_init__ method in a dataclass, the generated __init__ calls it automatically as its final step, after every field has been assigned. It receives only the instance, unless the class declares InitVar pseudo-fields, whose values are passed along as extra arguments. This is the ideal place to run validation, compute derived fields, or initialize complex state.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    
    def __post_init__(self) -> None:
        if not self.name or not self.name.strip():
            raise ValueError("name cannot be empty or whitespace-only")
        if self.age < 0 or self.age > 150:
            raise ValueError(f"age must be between 0 and 150, got {self.age}")

# Valid instance
alice = Person("Alice", 30)

# Invalid: raises ValueError
try:
    bob = Person("", 25)
except ValueError as e:
    print(f"Error: {e}")  # Error: name cannot be empty or whitespace-only

The dataclass generates a standard __init__, assigns all fields, and then calls __post_init__(). If __post_init__ raises an exception, the object is never returned to the caller.
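
One nuance is worth a short sketch: when a class declares an InitVar pseudo-field, its value is handed to __post_init__ as an argument instead of being stored on the instance. The Reading class and its fahrenheit parameter below are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field, InitVar


@dataclass
class Reading:
    # Init-only pseudo-field: accepted by __init__, forwarded to
    # __post_init__, and never stored on the instance.
    fahrenheit: InitVar[float]
    celsius: float = field(init=False)

    def __post_init__(self, fahrenheit: float) -> None:
        if fahrenheit < -459.67:
            raise ValueError("temperature below absolute zero")
        self.celsius = (fahrenheit - 32) * 5 / 9


r = Reading(212.0)
print(r.celsius)                  # 100.0
print(hasattr(r, "fahrenheit"))   # False
```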

Common Validation Patterns

Range and Constraint Checks

from dataclasses import dataclass

@dataclass
class Product:
    sku: str
    price: float
    stock: int
    
    def __post_init__(self) -> None:
        if len(self.sku) < 3 or len(self.sku) > 20:
            raise ValueError(f"sku must be 3–20 chars, got {len(self.sku)}")
        if self.price < 0:
            raise ValueError("price cannot be negative")
        if self.stock < 0:
            raise ValueError("stock cannot be negative")

# Valid
item = Product("HAMMER-001", 29.99, 100)

# Invalid
try:
    bad = Product("X", 10.0, 5)
except ValueError as e:
    print(e)  # sku must be 3–20 chars, got 1

Interdependent Field Validation

Validate relationships between fields:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class DateRange:
    start: datetime
    end: datetime
    
    def __post_init__(self) -> None:
        if self.start > self.end:
            raise ValueError("start date cannot be after end date")
        if self.start == self.end:
            raise ValueError("start and end dates must differ")

range1 = DateRange(
    datetime(2026, 1, 1),
    datetime(2026, 12, 31)
)

try:
    range2 = DateRange(
        datetime(2026, 12, 31),
        datetime(2026, 1, 1)
    )
except ValueError as e:
    print(e)  # start date cannot be after end date

Normalization in `__post_init__`

You can also use __post_init__ to normalize or transform fields:

from dataclasses import dataclass
import re

@dataclass
class Email:
    address: str
    
    def __post_init__(self) -> None:
        # Basic email validation
        if not re.match(r"^[^@]+@[^@]+\.[^@]+$", self.address):
            raise ValueError(f"invalid email format: {self.address}")
        # Normalize to lowercase
        self.address = self.address.lower()

email = Email("Alice@Example.COM")  # illustrative address
print(email.address)  # alice@example.com
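
If the dataclass is declared with frozen=True, the plain assignment to self.address above raises FrozenInstanceError, because frozen classes block attribute assignment even inside __post_init__. Here is a minimal sketch of the usual workaround, calling object.__setattr__ directly. FrozenEmail is a hypothetical name, and the format check is reduced to a simple "@" test for brevity.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenEmail:
    address: str

    def __post_init__(self) -> None:
        if "@" not in self.address:
            raise ValueError(f"invalid email format: {self.address}")
        # `self.address = ...` would raise FrozenInstanceError here,
        # so bypass the frozen __setattr__ for this one normalization.
        object.__setattr__(self, "address", self.address.lower())


email = FrozenEmail("Alice@Example.COM")
print(email.address)  # alice@example.com

try:
    email.address = "other@example.com"
except FrozenInstanceError:
    print("still immutable after construction")
```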

Computed Fields via `__post_init__`

You can compute derived fields that depend on other fields:

from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)
    
    def __post_init__(self) -> None:
        if self.width <= 0 or self.height <= 0:
            raise ValueError("dimensions must be positive")
        self.area = self.width * self.height

rect = Rectangle(10.0, 20.0)
print(rect.area)  # 200.0

Using field(init=False) tells the dataclass not to include area in __init__, since we compute it ourselves. This keeps the public API clean.
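
A short check of what init=False implies in practice, using a class of the same shape (named Box here so the snippet is self-contained; the name is illustrative). The field is rejected as a constructor argument but still appears in the repr, and it is computed only once.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self) -> None:
        if self.width <= 0 or self.height <= 0:
            raise ValueError("dimensions must be positive")
        self.area = self.width * self.height


box = Box(2.0, 3.0)
print(box)  # Box(width=2.0, height=3.0, area=6.0)

try:
    Box(2.0, 3.0, area=1.0)  # area is not an __init__ parameter
except TypeError as e:
    print(f"TypeError: {e}")

box.width = 10.0
print(box.area)  # 6.0 (stale: __post_init__ does not run again)
```

If the derived value has to track later mutations, a @property is the usual alternative to a stored field.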

Validation with Type-Checked Defaults

Defaults get no special treatment: __init__ assigns them like any other value, so __post_init__ validates them too. That holds for plain defaults, as below, and for values produced by field(default_factory=...):

from dataclasses import dataclass, field

@dataclass
class Config:
    name: str
    timeout_seconds: int = field(default=30)
    retry_count: int = field(default=3)
    
    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("name is required")
        if self.timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")
        if self.retry_count < 0:
            raise ValueError("retry_count cannot be negative")

# Default values are also validated
config = Config("api-call", timeout_seconds=60)

Raising vs. Asserting

Raise exceptions in production code. Assertions are stripped when Python runs with the -O flag, which makes them unsuitable for invariant checks:

from dataclasses import dataclass

@dataclass
class Account:
    balance: float
    
    def __post_init__(self) -> None:
        # GOOD: exception is always raised
        if self.balance < 0:
            raise ValueError("balance cannot be negative")
        
        # BAD: assertion can be disabled with -O
        # assert self.balance >= 0, "balance cannot be negative"
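
The effect can be reproduced without restarting the interpreter. This sketch relies on the optimize argument of the built-in compile(), where level 0 keeps assert statements and level 1 strips them, as -O does. The load_account_class helper and the "assert_demo" names are hypothetical.

```python
SOURCE = '''
from dataclasses import dataclass

@dataclass
class Account:
    balance: float

    def __post_init__(self):
        assert self.balance >= 0, "balance cannot be negative"
'''


def load_account_class(optimize: int):
    """Compile SOURCE at the given optimization level and return Account."""
    namespace = {"__name__": "assert_demo"}
    exec(compile(SOURCE, "<assert_demo>", "exec", optimize=optimize), namespace)
    return namespace["Account"]


Checked = load_account_class(optimize=0)    # asserts kept, like plain `python`
Unchecked = load_account_class(optimize=1)  # asserts stripped, like `python -O`

try:
    Checked(-50.0)
except AssertionError as e:
    print(f"optimize=0: AssertionError: {e}")

account = Unchecked(-50.0)
print(f"optimize=1: created {account}")  # the invalid object now exists
```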

Integration with Serialization

Validation in __post_init__ integrates well with deserialization (e.g., from JSON):

from dataclasses import dataclass
import json

@dataclass
class User:
    email: str
    age: int
    
    def __post_init__(self) -> None:
        if not self.email or "@" not in self.email:
            raise ValueError("invalid email")
        if self.age < 18:
            raise ValueError("user must be 18 or older")

def load_user(json_str: str) -> User:
    data = json.loads(json_str)
    return User(**data)  # Validation happens here in __post_init__

# Valid
user = load_user('{"email": "alice@example.com", "age": 25}')

# Invalid: raises ValueError during deserialization
try:
    bad = load_user('{"email": "invalid", "age": 15}')
except ValueError as e:
    print(e)  # invalid email (the email check runs before the age check)
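
Two gaps remain in a loader like the one above: User(**data) raises TypeError rather than ValueError when keys are missing or unexpected, and nothing checks that age is actually an integer. A hedged sketch of a stricter loader follows. The load_user_strict name and the decision to reject unknown keys are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
import json


@dataclass
class User:
    email: str
    age: int

    def __post_init__(self) -> None:
        # Check types first: comparing "25" < 18 would raise TypeError.
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(self.age, int) or isinstance(self.age, bool):
            raise ValueError("age must be an integer")
        if not isinstance(self.email, str) or "@" not in self.email:
            raise ValueError("invalid email")
        if self.age < 18:
            raise ValueError("user must be 18 or older")


def load_user_strict(json_str: str) -> User:
    data = json.loads(json_str)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Every field of User is required here because none has a default.
    expected = {f.name for f in fields(User)}
    unknown = set(data) - expected
    missing = expected - set(data)
    if unknown or missing:
        raise ValueError(
            f"unexpected keys: {sorted(unknown)}, missing keys: {sorted(missing)}"
        )
    return User(**data)


for payload in (
    '{"email": "alice@example.com", "age": 25}',
    '{"email": "alice@example.com", "age": "25"}',
    '{"email": "alice@example.com"}',
):
    try:
        print(load_user_strict(payload))
    except ValueError as e:
        print(f"rejected: {e}")
```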

Performance Considerations

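__post_init__ runs once per instance, at creation time. If validation is CPU-intensive, it adds latency to every instantiation. For batch operations, consider separating validation from creation. The baseline below validates inline, so the list comprehension pays the validation cost once per element: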

from dataclasses import dataclass

@dataclass
class DataPoint:
    value: float
    timestamp: str
    
    def __post_init__(self) -> None:
        if self.value < -100 or self.value > 100:
            raise ValueError("value out of range")

# Validation runs inside every constructor call, so this pays the cost 1000 times.
# Values stay within [-100, 100]; anything outside would raise ValueError here.
raw_data = [{"value": x / 10, "timestamp": "2026-06-02"} for x in range(1000)]
points = [DataPoint(**d) for d in raw_data]  # May be slow if validation is heavy

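For very hot code paths, you can skip runtime validation and rely on type hints plus static analysis (mypy) to catch type errors at development time. A type checker cannot enforce value constraints such as ranges, so this only suits data you already trust.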
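
One way to separate validation from creation, sketched under the assumption that the input is checked in a later pass: keep construction cheap and expose the checks as an explicit method. RawDataPoint and validate() are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RawDataPoint:
    value: float
    timestamp: str

    # No __post_init__: construction does no checking at all.
    def validate(self) -> None:
        if self.value < -100 or self.value > 100:
            raise ValueError(f"value out of range: {self.value}")


raw_data = [{"value": x, "timestamp": "2026-06-02"} for x in range(98, 103)]
points = [RawDataPoint(**d) for d in raw_data]  # cheap, unvalidated

# Later pass: sort instances into valid and rejected.
valid, rejected = [], []
for p in points:
    try:
        p.validate()
        valid.append(p)
    except ValueError:
        rejected.append(p)

print(len(valid), len(rejected))  # 3 2
```

The trade-off is that invalid RawDataPoint objects can exist until validate() is called, which gives up the guarantee that __post_init__ validation provides.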

Key Takeaways

__post_init__ is called automatically after __init__ completes, making it ideal for validation.
Raise ValueError or custom exceptions in __post_init__ to prevent invalid instances.
Use field(init=False) to define computed fields set in __post_init__.
Validate interdependent fields and data invariants in __post_init__.
Normalization (e.g., lowercase, strip whitespace) can happen in __post_init__.
Always raise exceptions rather than relying on assert, because assertions are disabled under -O.

Frequently Asked Questions

Can I have multiple validation steps?

Yes. Define multiple methods called from __post_init__, or write all validation inline. Extracting to helper methods improves readability.
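
A minimal sketch of the helper-method layout. The Signup class and the _validate_* method names are illustrative, not a required convention.

```python
from dataclasses import dataclass


@dataclass
class Signup:
    username: str
    age: int

    def __post_init__(self) -> None:
        # Order matters: the first failing check raises.
        self._validate_username()
        self._validate_age()

    def _validate_username(self) -> None:
        if not self.username.strip():
            raise ValueError("username cannot be empty")

    def _validate_age(self) -> None:
        if not 0 <= self.age <= 150:
            raise ValueError(f"age must be between 0 and 150, got {self.age}")


try:
    Signup("  ", 200)
except ValueError as e:
    print(e)  # username cannot be empty
```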

What if a default value fails validation?

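It fails like any other value: every instance that falls back to that default raises. Plain defaults and field(default_factory=...) results are both assigned by __init__ before __post_init__ runs, so the same checks cover them. A default that fails validation is a bug in the class definition, so fix the default rather than loosening the check.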
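
To make this concrete, here is a sketch with a deliberately invalid factory default. Cluster and default_hosts are hypothetical names, and list[str] assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


def default_hosts() -> list[str]:
    # Deliberately wrong for illustration: an empty list violates the check below.
    return []


@dataclass
class Cluster:
    name: str
    hosts: list[str] = field(default_factory=default_hosts)

    def __post_init__(self) -> None:
        if not self.hosts:
            raise ValueError("at least one host is required")


try:
    Cluster("staging")  # falls back to the factory default
except ValueError as e:
    print(e)  # at least one host is required

print(Cluster("prod", hosts=["10.0.0.1"]))
```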

Can I access the generated `__init__` from `__post_init__`?

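There is no need to. By the time __post_init__ runs, the generated __init__ has already assigned every field, and calling __post_init__ is its last step. Calling self.__init__ from inside __post_init__ would re-enter __post_init__ and recurse, so avoid it.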

How do I validate without raising (e.g., log a warning)?

You can, but it's not best practice. If an invalid state exists, the object itself is invalid, and logging a warning before returning usually masks the problem. Raise an exception, or use a factory function that returns None or a Result-style value for invalid data.
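
A sketch of the factory-function option, assuming callers are prepared to handle None. The try_create_account name is hypothetical. The dataclass still raises, and only the wrapper turns the failure into a logged warning.

```python
from dataclasses import dataclass
from typing import Optional
import logging

logger = logging.getLogger(__name__)


@dataclass
class Account:
    balance: float

    def __post_init__(self) -> None:
        if self.balance < 0:
            raise ValueError("balance cannot be negative")


def try_create_account(balance: float) -> Optional[Account]:
    """Return an Account, or None (after logging a warning) if validation fails."""
    try:
        return Account(balance)
    except ValueError as e:
        logger.warning("rejected account: %s", e)
        return None


print(try_create_account(10.0))  # Account(balance=10.0)
print(try_create_account(-5.0))  # None
```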
