Post-Init Validation: Securing Your Dataclass Data
The __post_init__ method in dataclasses lets you run custom validation or initialization logic as soon as the generated __init__ has assigned every field. This is your guard rail: enforce data invariants, compute derived fields, and prevent invalid objects from existing. Dataclasses do not check field values or type annotations at runtime, so skipping validation is a common mistake that costs debugging time in production.
I've seen APIs crash because dataclass fields were never validated, allowing negative ages, empty required strings, and out-of-range values to slip through. This article shows you how to build validation into instantiation itself.
How __post_init__ Works
When you define a __post_init__ method in a dataclass, the generated __init__ calls it automatically as its final step, after every field has been assigned. It receives only the instance, plus any InitVar pseudo-fields you declare. This is the ideal place to run validation, compute derived fields, or initialize complex state.
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self) -> None:
        if not self.name or not self.name.strip():
            raise ValueError("name cannot be empty or whitespace-only")
        if self.age < 0 or self.age > 150:
            raise ValueError(f"age must be between 0 and 150, got {self.age}")

# Valid instance
alice = Person("Alice", 30)

# Invalid: raises ValueError
try:
    bob = Person("", 25)
except ValueError as e:
    print(f"Error: {e}")  # Error: name cannot be empty or whitespace-only
The dataclass generates a standard __init__, assigns all fields, and then calls __post_init__(). If __post_init__ raises an exception, the object is never returned to the caller.
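One related detail is easier to see in code: a field annotated with dataclasses.InitVar is accepted by the generated __init__ but not stored on the instance; it is handed to __post_init__ as an argument instead. A minimal sketch, where Temperature and its unit parameter are hypothetical names chosen for illustration:

```python
from dataclasses import InitVar, dataclass, fields


@dataclass
class Temperature:
    celsius: float
    # InitVar pseudo-field: accepted by __init__, passed to __post_init__,
    # and not reported by fields().
    unit: InitVar[str] = "C"

    def __post_init__(self, unit: str) -> None:
        if unit == "F":
            # Convert Fahrenheit input to the stored Celsius value.
            self.celsius = (self.celsius - 32) * 5 / 9
        elif unit != "C":
            raise ValueError(f"unit must be 'C' or 'F', got {unit!r}")
        if self.celsius < -273.15:
            raise ValueError("temperature below absolute zero")


boiling = Temperature(212.0, unit="F")
stored_fields = [f.name for f in fields(boiling)]

print(boiling.celsius)  # 100.0
print(stored_fields)    # ['celsius']
```

This keeps one-off construction inputs out of the stored state while still letting validation see them.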
Common Validation Patterns
Range and Constraint Checks
from dataclasses import dataclass

@dataclass
class Product:
    sku: str
    price: float
    stock: int

    def __post_init__(self) -> None:
        if len(self.sku) < 3 or len(self.sku) > 20:
            raise ValueError(f"sku must be 3–20 chars, got {len(self.sku)}")
        if self.price < 0:
            raise ValueError("price cannot be negative")
        if self.stock < 0:
            raise ValueError("stock cannot be negative")

# Valid
item = Product("HAMMER-001", 29.99, 100)

# Invalid
try:
    bad = Product("X", 10.0, 5)
except ValueError as e:
    print(e)  # sku must be 3–20 chars, got 1
Interdependent Field Validation
Validate relationships between fields:
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DateRange:
    start: datetime
    end: datetime

    def __post_init__(self) -> None:
        if self.start > self.end:
            raise ValueError("start date cannot be after end date")
        if self.start == self.end:
            raise ValueError("start and end dates must differ")

range1 = DateRange(
    datetime(2026, 1, 1),
    datetime(2026, 12, 31)
)

try:
    range2 = DateRange(
        datetime(2026, 12, 31),
        datetime(2026, 1, 1)
    )
except ValueError as e:
    print(e)  # start date cannot be after end date
Normalization in __post_init__
You can also use __post_init__ to normalize or transform fields:
from dataclasses import dataclass
import re

@dataclass
class Email:
    address: str

    def __post_init__(self) -> None:
        # Basic email validation
        if not re.match(r"^[^@]+@[^@]+\.[^@]+$", self.address):
            raise ValueError(f"invalid email format: {self.address}")
        # Normalize to lowercase
        self.address = self.address.lower()

email = Email("Alice@Example.COM")
print(email.address)  # alice@example.com
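The assignment to self.address works because the dataclass is mutable. With @dataclass(frozen=True), plain attribute assignment inside __post_init__ raises FrozenInstanceError, and the usual workaround is object.__setattr__. A sketch under that assumption, with FrozenEmail as a hypothetical name and a deliberately simplified format check:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FrozenEmail:
    address: str

    def __post_init__(self) -> None:
        if "@" not in self.address:
            raise ValueError(f"invalid email format: {self.address}")
        # `self.address = ...` would raise FrozenInstanceError here, so
        # bypass the frozen __setattr__ for this one normalization step.
        object.__setattr__(self, "address", self.address.strip().lower())


email = FrozenEmail("  Alice@Example.COM ")

try:
    email.address = "other@example.com"  # still blocked after construction
    mutated = True
except FrozenInstanceError:
    mutated = False

print(email.address)  # alice@example.com
print(mutated)        # False
```

Freezing the class also means the normalized value cannot drift after construction, so the invariant checked in __post_init__ keeps holding.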
Computed Fields via __post_init__
You can compute derived fields that depend on other fields:
from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self) -> None:
        if self.width <= 0 or self.height <= 0:
            raise ValueError("dimensions must be positive")
        self.area = self.width * self.height

rect = Rectangle(10.0, 20.0)
print(rect.area)  # 200.0
Using field(init=False) tells the dataclass not to include area in __init__, since we compute it ourselves. This keeps the public API clean.
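Two consequences of init=False are easy to check. The sketch below redefines the same Rectangle so it runs on its own: passing area to the constructor is rejected, and the stored value is not recomputed if width changes later.

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self) -> None:
        if self.width <= 0 or self.height <= 0:
            raise ValueError("dimensions must be positive")
        self.area = self.width * self.height


rect = Rectangle(2.0, 3.0)

# area is not an __init__ parameter, so passing it is a TypeError.
try:
    Rectangle(2.0, 3.0, area=100.0)
    accepted = True
except TypeError:
    accepted = False

# __post_init__ runs only once: mutating width leaves area stale.
rect.width = 10.0
stale_area = rect.area

print(accepted)    # False
print(stale_area)  # 6.0
```

If instances need to stay mutable, a read-only property that computes the value on access avoids the stale result; frozen=True is another option when mutation is not needed.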
Validation with Type-Checked Defaults
Defaults pass through __post_init__ like any other value, so the same checks cover them, whether they are set with field(default=...) or field(default_factory=...):
from dataclasses import dataclass, field

@dataclass
class Config:
    name: str
    timeout_seconds: int = field(default=30)
    retry_count: int = field(default=3)

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("name is required")
        if self.timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")
        if self.retry_count < 0:
            raise ValueError("retry_count cannot be negative")

# Default values are also validated
config = Config("api-call", timeout_seconds=60)
Raising vs. Asserting
Raise exceptions in production code. Running Python with the -O flag strips assert statements, which makes them unsuitable for invariant checks:
from dataclasses import dataclass

@dataclass
class Account:
    balance: float

    def __post_init__(self) -> None:
        # GOOD: exception is always raised
        if self.balance < 0:
            raise ValueError("balance cannot be negative")
        # BAD: assertion can be disabled with -O
        # assert self.balance >= 0, "balance cannot be negative"
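The effect of -O can be observed directly. The sketch below runs the same two-line script in a child interpreter with and without the flag; it assumes PYTHONOPTIMIZE is not already set in the environment, and the script text is made up for the demonstration.

```python
import subprocess
import sys

# A tiny script whose assertion always fails.
snippet = "assert 1 > 2, 'invariant violated'\nprint('assert was skipped')"

# Normal mode: the assert raises AssertionError and the process exits non-zero.
normal = subprocess.run(
    [sys.executable, "-c", snippet], capture_output=True, text=True
)

# Optimized mode (-O): assert statements are compiled out, so the print runs.
optimized = subprocess.run(
    [sys.executable, "-O", "-c", snippet], capture_output=True, text=True
)

print(normal.returncode != 0)    # True
print(optimized.stdout.strip())  # assert was skipped
```

An explicit if/raise check has no such switch, which is why it is the safer choice for invariants.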
Integration with Serialization
Validation in __post_init__ integrates well with deserialization (e.g., from JSON):
from dataclasses import dataclass
import json

@dataclass
class User:
    email: str
    age: int

    def __post_init__(self) -> None:
        if not self.email or "@" not in self.email:
            raise ValueError("invalid email")
        if self.age < 18:
            raise ValueError("user must be 18 or older")

def load_user(json_str: str) -> User:
    data = json.loads(json_str)
    return User(**data)  # Validation happens here in __post_init__

# Valid
user = load_user('{"email": "user@example.com", "age": 25}')

# Invalid: raises ValueError during deserialization
try:
    bad = load_user('{"email": "invalid", "age": 15}')
except ValueError as e:
    print(e)  # invalid email (the email check runs first)
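Two things can go wrong before the value checks even run: User(**data) raises TypeError for unexpected or missing keys, and dataclasses do not enforce annotations at runtime, so a JSON value like "age": "25" would reach the comparison as a string. A hedged sketch of a stricter loader follows; load_user_strict and the isinstance checks are illustrative choices, not part of the dataclasses API.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class User:
    email: str
    age: int

    def __post_init__(self) -> None:
        # Annotations are not enforced at runtime, so check types explicitly.
        if not isinstance(self.email, str) or "@" not in self.email:
            raise ValueError("invalid email")
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(self.age, int) or isinstance(self.age, bool):
            raise ValueError("age must be an integer")
        if self.age < 18:
            raise ValueError("user must be 18 or older")


def load_user_strict(json_str: str) -> User:
    data = json.loads(json_str)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Compare the payload's keys against the dataclass's declared fields.
    known = {f.name for f in fields(User)}
    unknown = set(data) - known
    missing = known - set(data)
    if unknown or missing:
        raise ValueError(
            f"unexpected keys: {sorted(unknown)}, missing keys: {sorted(missing)}"
        )
    return User(**data)


user = load_user_strict('{"email": "user@example.com", "age": 25}')
```

Converting key mismatches to ValueError here is a design choice: it lets callers handle every rejected payload with a single except clause.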
Performance Considerations
__post_init__ runs once per instance, at creation time. If validation is CPU-intensive, that cost is added to every instantiation, which matters most in bulk loads like this one:
from dataclasses import dataclass

@dataclass
class DataPoint:
    value: float
    timestamp: str

    def __post_init__(self) -> None:
        if self.value < -100 or self.value > 100:
            raise ValueError("value out of range")

# Every DataPoint(...) call runs __post_init__, so validation cost scales with volume
raw_data = [{"value": x / 10, "timestamp": "2026-06-02"} for x in range(1000)]
points = [DataPoint(**d) for d in raw_data]  # May be slow if validation is heavy
For batch operations with expensive checks, consider separating validation from creation: build the objects first and validate them in one pass later (or in the background). For very high-performance code fed from trusted sources, you might skip runtime validation and lean on type hints plus static analysis (mypy) during development, keeping in mind that static analysis catches type mismatches, not out-of-range values.
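One way to separate validation from creation is to keep the dataclass free of __post_init__ checks and validate a whole batch in a single pass. This is a sketch only: RawDataPoint and find_invalid are hypothetical names, and the range check mirrors the one in DataPoint.

```python
from dataclasses import dataclass


@dataclass
class RawDataPoint:
    value: float
    timestamp: str
    # No __post_init__: construction does no validation work.


def find_invalid(points: list[RawDataPoint]) -> list[tuple[int, str]]:
    """Return (index, message) for every point that fails the range check."""
    problems = []
    for i, p in enumerate(points):
        if not -100 <= p.value <= 100:
            problems.append((i, f"value out of range: {p.value}"))
    return problems


raw_data = [{"value": x, "timestamp": "2026-06-02"} for x in (5, 250, -7, -101)]
points = [RawDataPoint(**d) for d in raw_data]  # cheap: no checks run here
problems = find_invalid(points)                 # validate once, when convenient

print(problems)
# [(1, 'value out of range: 250'), (3, 'value out of range: -101')]
```

The trade-off is that invalid RawDataPoint objects exist until find_invalid runs, so this fits trusted or pre-screened pipelines better than public APIs.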
Key Takeaways
- __post_init__ is called automatically after __init__ completes, making it ideal for validation.
- Raise ValueError or custom exceptions in __post_init__ to prevent invalid instances.
- Use field(init=False) to define computed fields set in __post_init__.
- Validate interdependent fields and data invariants in __post_init__.
- Normalization (e.g., lowercase, strip whitespace) can happen in __post_init__.
- Always raise exceptions, never rely on assertions (assert), because assertions can be disabled.
Frequently Asked Questions
Can I have multiple validation steps?
Yes. Define multiple methods called from __post_init__, or write all validation inline. Extracting to helper methods improves readability.
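A minimal sketch of the helper-method layout, using a hypothetical Signup class whose rules are invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class Signup:
    username: str
    password: str

    def __post_init__(self) -> None:
        # Each helper owns one rule; __post_init__ just sequences them.
        self._validate_username()
        self._validate_password()

    def _validate_username(self) -> None:
        # str.isalnum() is False for the empty string, so "" is rejected too.
        if not self.username.isalnum():
            raise ValueError("username must be alphanumeric")

    def _validate_password(self) -> None:
        if len(self.password) < 8:
            raise ValueError("password must be at least 8 characters")


ok = Signup("alice1", "correct-horse")
```

Because the helpers are ordinary methods, they can also be unit-tested or overridden in subclasses one rule at a time.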
What if a default value fails validation?
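Defaults get no special treatment. Whether a default comes from field(default=...) or field(default_factory=...), it is assigned before __post_init__ runs, so the same checks apply and an invalid default raises just like an invalid argument.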
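As a sketch of the default_factory case, the hypothetical Cluster class below checks a list that may come either from the caller or from the factory; default_hosts stands in for whatever produces the real default.

```python
from dataclasses import dataclass, field


def default_hosts() -> list[str]:
    # Hypothetical factory; a real one might read configuration instead.
    return ["localhost"]


@dataclass
class Cluster:
    name: str
    hosts: list[str] = field(default_factory=default_hosts)

    def __post_init__(self) -> None:
        # hosts is already populated here, whether it was passed in or
        # produced by the factory, so one check covers both paths.
        if not self.hosts:
            raise ValueError("hosts cannot be empty")


default_cluster = Cluster("dev")
print(default_cluster.hosts)  # ['localhost']
```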
Can I access the generated __init__ from __post_init__?
No, and you shouldn't need to. __post_init__ is called at the end of the generated __init__, when all fields are already set. Calling self.__init__() from inside it would trigger __post_init__ again and recurse.
How do I validate without raising (e.g., log a warning)?
You can, but it's not best practice. If an invalid state exists, the object itself is invalid. Logging and returning usually masks the problem. Raise an exception or use a factory function that returns None or Result for invalid data.
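A factory function of that kind might look like the sketch below. Reading and try_make_reading are hypothetical names; the dataclass still raises, and the factory is the only place where the exception is turned into None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    sensor_id: str
    value: float

    def __post_init__(self) -> None:
        if not self.sensor_id:
            raise ValueError("sensor_id is required")
        if not -100 <= self.value <= 100:
            raise ValueError(f"value out of range: {self.value}")


def try_make_reading(sensor_id: str, value: float) -> Optional[Reading]:
    """Return a Reading, or None if validation rejects the input."""
    try:
        return Reading(sensor_id, value)
    except ValueError:
        return None


good = try_make_reading("s-1", 42.0)
bad = try_make_reading("s-1", 500.0)
print(good)  # Reading(sensor_id='s-1', value=42.0)
print(bad)   # None
```

Callers that want the failure reason can still construct Reading directly and catch the ValueError themselves.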