Advanced Dataclass Patterns: Metadata and Serialization
Dataclass field metadata lets you attach context to fields without changing how the dataclass itself behaves. Typical uses are JSON key mapping, validation rules, documentation, and custom serialization logic. Combined with the fields() introspection API, metadata enables patterns such as auto-serialization, field renaming, and ORM-style column mapping.
I've used metadata to build generic serialization layers that work across dozens of dataclass definitions, eliminating boilerplate JSON mapping code. This article shows you how to build similar systems.
What Is Field Metadata?
The field() function accepts a metadata parameter: a dict of arbitrary key-value pairs associated with a field. At runtime, you can inspect fields using fields() and access their metadata:
from dataclasses import dataclass, field, fields
@dataclass
class Article:
    id: int = field(metadata={"db_column": "article_id"})
    title: str = field(metadata={"json_key": "headline", "required": True})
    body: str = field(metadata={"json_key": "content"})
# Inspect metadata at runtime
for f in fields(Article):
    # f.metadata is a read-only mappingproxy; dict() gives a plain dict for printing
    print(f"Field: {f.name}, Metadata: {dict(f.metadata)}")
# Output:
# Field: id, Metadata: {'db_column': 'article_id'}
# Field: title, Metadata: {'json_key': 'headline', 'required': True}
# Field: body, Metadata: {'json_key': 'content'}
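Metadata is a mapping attached to each field: you pass a dict, and the dataclasses module exposes it on the Field object as a read-only mappingproxy. It doesn't affect instantiation or comparison; it's purely informational for your code to introspect and act on.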
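A minimal sketch of what "purely informational" means in practice. The `Point` class and the `unit` key are made up for illustration; the point is only that the mapping cannot be modified through the field and that equality ignores it.

```python
from dataclasses import dataclass, field, fields
from types import MappingProxyType


@dataclass
class Point:
    # "unit" is an arbitrary, illustrative metadata key
    x: float = field(default=0.0, metadata={"unit": "m"})


x_field = fields(Point)[0]

# Field.metadata is a read-only view, not a plain dict
is_proxy = isinstance(x_field.metadata, MappingProxyType)

try:
    x_field.metadata["unit"] = "cm"
    mutated = True
except TypeError:
    mutated = False

# Metadata plays no part in __init__ or __eq__
same = Point(1.0) == Point(1.0)
print(is_proxy, mutated, same)  # True False True
```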
JSON Serialization with Metadata
A common use case: map dataclass field names to different JSON keys. Metadata enables clean mapping without hand-written converters:
from dataclasses import dataclass, field, fields
import json
@dataclass
class Product:
    id: int = field(metadata={"json_key": "product_id"})
    name: str = field(metadata={"json_key": "product_name"})
    price: float = field(metadata={"json_key": "list_price"})
    in_stock: bool
def dataclass_to_json(obj) -> str:
    """Convert dataclass to JSON, respecting json_key metadata."""
    result = {}
    for f in fields(obj):
        json_key = f.metadata.get("json_key", f.name)
        result[json_key] = getattr(obj, f.name)
    return json.dumps(result)
def json_to_dataclass(json_str: str, cls):
    """Convert JSON to dataclass, respecting json_key metadata."""
    data = json.loads(json_str)
    # Reverse map: json_key -> field_name
    key_to_field = {}
    for f in fields(cls):
        json_key = f.metadata.get("json_key", f.name)
        key_to_field[json_key] = f.name
    # Reconstruct with correct field names
    args = {}
    for json_key, value in data.items():
        field_name = key_to_field.get(json_key, json_key)
        args[field_name] = value
    return cls(**args)
# Usage
product = Product(id=101, name="Hammer", price=29.99, in_stock=True)
json_str = dataclass_to_json(product)
print(json_str)
# {"product_id": 101, "product_name": "Hammer", "list_price": 29.99, "in_stock": true}
product_restored = json_to_dataclass(json_str, Product)
print(product_restored)
# Product(id=101, name='Hammer', price=29.99, in_stock=True)
This pattern scales to dozens of fields without explicit mapping code.
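One gap worth noting: json_to_dataclass passes unrecognized JSON keys straight to the constructor, so a stray key surfaces as a generic TypeError. A stricter loader might look like the sketch below. `from_json_strict` is a hypothetical helper, not part of dataclasses, and the trimmed `Product` class is redefined only to keep the example self-contained.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class Product:
    id: int = field(metadata={"json_key": "product_id"})
    name: str = field(metadata={"json_key": "product_name"})
    in_stock: bool = True


def from_json_strict(json_str: str, cls):
    """Hypothetical strict loader: unknown JSON keys raise ValueError."""
    data = json.loads(json_str)
    key_to_field = {f.metadata.get("json_key", f.name): f.name for f in fields(cls)}
    unknown = sorted(set(data) - set(key_to_field))
    if unknown:
        raise ValueError(f"unknown keys for {cls.__name__}: {unknown}")
    return cls(**{key_to_field[k]: v for k, v in data.items()})


ok = from_json_strict('{"product_id": 1, "product_name": "Saw"}', Product)
print(ok)  # Product(id=1, name='Saw', in_stock=True)

message = None
try:
    from_json_strict('{"product_id": 1, "product_name": "Saw", "colour": "red"}', Product)
except ValueError as exc:
    message = str(exc)
print(message)  # unknown keys for Product: ['colour']
```

Building the reverse map with a dict comprehension also makes the key-to-field relationship visible in one line.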
Validation Metadata
Attach validation rules as metadata, then check them in a generic validator:
from dataclasses import dataclass, field, fields
import re
@dataclass
class User:
    email: str = field(
        metadata={"validator": "email", "required": True}
    )
    age: int = field(
        metadata={"min": 0, "max": 150, "required": True}
    )
    username: str = field(
        metadata={"pattern": r"^[a-z0-9_]{3,20}$", "required": True}
    )
def validate_dataclass(obj) -> list[str]:
    """Validate dataclass based on metadata rules. Return list of errors."""
    errors = []
    for f in fields(obj):
        value = getattr(obj, f.name)
        meta = f.metadata
        if meta.get("required") and (value is None or value == ""):
            errors.append(f"{f.name} is required")
            continue  # skip range and pattern checks on a missing value
        if "min" in meta and value < meta["min"]:
            errors.append(f"{f.name} must be >= {meta['min']}")
        if "max" in meta and value > meta["max"]:
            errors.append(f"{f.name} must be <= {meta['max']}")
        if "pattern" in meta and not re.match(meta["pattern"], value):
            errors.append(f"{f.name} must match {meta['pattern']}")
    # Note: the "validator" key (e.g. "email") is declared but not enforced here
    return errors
# Valid
user = User(email="alice@example.com", age=30, username="alice_123")
print(validate_dataclass(user)) # []
# Invalid
bad_user = User(email="", age=200, username="x")
print(validate_dataclass(bad_user))
# ['email is required', 'age must be <= 150', 'username must match ^[a-z0-9_]{3,20}$']
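If invalid instances should never exist at all, the same kind of check can run at construction time. The sketch below wires a small range checker into `__post_init__`. `Reading` and `check_ranges` are hypothetical names, and only the min/max convention from the validator above is assumed.

```python
from dataclasses import dataclass, field, fields


def check_ranges(obj) -> list[str]:
    """Collect min/max violations declared in field metadata."""
    errors = []
    for f in fields(obj):
        value = getattr(obj, f.name)
        if "min" in f.metadata and value < f.metadata["min"]:
            errors.append(f"{f.name} must be >= {f.metadata['min']}")
        if "max" in f.metadata and value > f.metadata["max"]:
            errors.append(f"{f.name} must be <= {f.metadata['max']}")
    return errors


@dataclass
class Reading:
    # Hypothetical class; min/max are the same illustrative keys as above
    celsius: float = field(metadata={"min": -90.0, "max": 60.0})

    def __post_init__(self):
        errors = check_ranges(self)
        if errors:
            raise ValueError("; ".join(errors))


good = Reading(celsius=21.5)
print(good)  # Reading(celsius=21.5)

failure = None
try:
    Reading(celsius=400.0)
except ValueError as exc:
    failure = str(exc)
print(failure)  # celsius must be <= 60.0
```

Returning a list of errors suits form-style validation; raising in `__post_init__` suits domain objects that must always be valid.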
ORM Integration with Metadata
Map dataclass fields to database columns using metadata:
from dataclasses import dataclass, field, fields
@dataclass
class User:
    id: int = field(metadata={"db_column": "user_id", "primary_key": True})
    email: str = field(metadata={"db_column": "email_address", "unique": True})
    created_at: str = field(metadata={"db_column": "created_at", "readonly": True})
def generate_sql_insert(obj, table_name: str) -> str:
    """Generate INSERT SQL from dataclass.
    Illustration only: values are inlined without escaping, so never
    feed this untrusted input.
    """
    cols = []
    vals = []
    for f in fields(obj):
        if f.metadata.get("primary_key"):
            continue # Skip auto-increment primary keys
        db_col = f.metadata.get("db_column", f.name)
        cols.append(db_col)
        vals.append(f"'{getattr(obj, f.name)}'")
    return f"INSERT INTO {table_name} ({', '.join(cols)}) VALUES ({', '.join(vals)})"
user = User(id=1, email="user@example.com", created_at="2026-06-02")
print(generate_sql_insert(user, "users"))
# INSERT INTO users (email_address, created_at) VALUES ('user@example.com', '2026-06-02')
This pattern decouples dataclass field names from database schema, enabling refactoring without SQL changes.
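Because generate_sql_insert pastes values directly into the SQL string, a value containing a quote breaks the statement or opens the door to injection. A safer variant returns placeholders plus a parameter list. This is a sketch using the standard library's sqlite3 and its `?` placeholder style; `build_insert` is a hypothetical name, and the table and column names are still interpolated, so they must come from trusted code.

```python
import sqlite3
from dataclasses import dataclass, field, fields


@dataclass
class User:
    id: int = field(metadata={"db_column": "user_id", "primary_key": True})
    email: str = field(metadata={"db_column": "email_address"})
    created_at: str = field(metadata={"db_column": "created_at"})


def build_insert(obj, table_name: str) -> tuple[str, list]:
    """Return (sql, params) with ? placeholders instead of inlined values."""
    cols, params = [], []
    for f in fields(obj):
        if f.metadata.get("primary_key"):
            continue  # let the database assign the key
        cols.append(f.metadata.get("db_column", f.name))
        params.append(getattr(obj, f.name))
    placeholders = ", ".join("?" for _ in cols)
    sql = f"INSERT INTO {table_name} ({', '.join(cols)}) VALUES ({placeholders})"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, email_address TEXT, created_at TEXT)"
)
user = User(id=0, email="o'brien@example.com", created_at="2026-06-02")
sql, params = build_insert(user, "users")
conn.execute(sql, params)
row = conn.execute("SELECT user_id, email_address FROM users").fetchone()
print(sql)  # INSERT INTO users (email_address, created_at) VALUES (?, ?)
print(row)  # (1, "o'brien@example.com")
```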
Structured Field Metadata
Organize metadata into nested dicts for complex scenarios:
from dataclasses import dataclass, field, fields
@dataclass
class BlogPost:
    title: str = field(
        metadata={
            "api": {"json_key": "headline"},
            "db": {"column": "post_title"},
            "validation": {"min_len": 5, "max_len": 200},
        }
    )
    body: str = field(
        metadata={
            "api": {"json_key": "content"},
            "db": {"column": "post_body"},
            "validation": {"min_len": 10},
        }
    )
# Access nested metadata
post = BlogPost(title="Hello World", body="This is content")
for f in fields(post):
    api_key = f.metadata.get("api", {}).get("json_key", f.name)
    db_col = f.metadata.get("db", {}).get("column", f.name)
    print(f"Field: {f.name} -> API: {api_key}, DB: {db_col}")
# Output:
# Field: title -> API: headline, DB: post_title
# Field: body -> API: content, DB: post_body
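The "validation" namespace can be consumed the same way as "api" and "db". Here is a sketch of a length checker that reads only that namespace. `check_lengths` is a hypothetical helper, and the `BlogPost` class is cut down to the keys it needs.

```python
from dataclasses import dataclass, field, fields


@dataclass
class BlogPost:
    title: str = field(metadata={"validation": {"min_len": 5, "max_len": 200}})
    body: str = field(metadata={"validation": {"min_len": 10}})


def check_lengths(obj) -> list[str]:
    """Apply min_len/max_len rules found under the "validation" namespace."""
    errors = []
    for f in fields(obj):
        rules = f.metadata.get("validation", {})
        length = len(getattr(obj, f.name))
        if "min_len" in rules and length < rules["min_len"]:
            errors.append(f"{f.name} is shorter than {rules['min_len']} characters")
        if "max_len" in rules and length > rules["max_len"]:
            errors.append(f"{f.name} is longer than {rules['max_len']} characters")
    return errors


valid_errors = check_lengths(BlogPost(title="Hello World", body="This is content"))
invalid_errors = check_lengths(BlogPost(title="Hi", body="Too short"))
print(valid_errors)  # []
print(invalid_errors)
# ['title is shorter than 5 characters', 'body is shorter than 10 characters']
```

Each consumer touches only its own namespace, so API, database, and validation code can evolve without stepping on each other's keys.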
Custom Serialization Strategies
Use metadata to define field-specific serialization:
from dataclasses import dataclass, field, fields
from datetime import datetime
import json
@dataclass
class Event:
    id: int
    name: str
    timestamp: datetime = field(
        metadata={"serializer": lambda x: x.isoformat()}
    )
    tags: list[str] = field(
        metadata={"serializer": lambda x: ",".join(x)}
    )
def serialize_dataclass(obj) -> dict:
    """Serialize dataclass using field-level serializers."""
    result = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        serializer = f.metadata.get("serializer")
        if serializer:
            result[f.name] = serializer(value)
        else:
            result[f.name] = value
    return result
event = Event(
    id=1,
    name="Conference",
    timestamp=datetime(2026, 6, 2, 14, 30),
    tags=["python", "conference"]
)
serialized = serialize_dataclass(event)
print(json.dumps(serialized))
# {"id": 1, "name": "Conference", "timestamp": "2026-06-02T14:30:00", "tags": "python,conference"}
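Serialization is only half of a round trip. One way to get back to an instance is to store a matching loader next to each serializer. The "deserializer" key and `deserialize_dataclass` are assumptions made for this sketch, mirroring the "serializer" convention above.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime


@dataclass
class Event:
    name: str
    timestamp: datetime = field(
        metadata={
            "serializer": lambda x: x.isoformat(),
            # "deserializer" is a hypothetical key mirroring "serializer"
            "deserializer": datetime.fromisoformat,
        }
    )
    tags: list[str] = field(
        metadata={
            "serializer": lambda x: ",".join(x),
            "deserializer": lambda s: s.split(",") if s else [],
        }
    )


def deserialize_dataclass(cls, data: dict):
    """Rebuild an instance, applying per-field deserializers where present."""
    kwargs = {}
    for f in fields(cls):
        loader = f.metadata.get("deserializer")
        raw = data[f.name]
        kwargs[f.name] = loader(raw) if loader else raw
    return cls(**kwargs)


payload = {"name": "Conference", "timestamp": "2026-06-02T14:30:00", "tags": "python,conference"}
event = deserialize_dataclass(Event, payload)
print(event.timestamp)  # 2026-06-02 14:30:00
print(event.tags)  # ['python', 'conference']
```

Note that the comma-joined tag format only round-trips when no tag contains a comma; a JSON list would avoid that limitation.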
Default Factory Strategies with Metadata
Document default strategies using metadata:
from dataclasses import dataclass, field, fields
from datetime import datetime
@dataclass
class Audit:
    created_at: datetime = field(
        default_factory=datetime.now,
        metadata={"strategy": "timestamp", "readonly": True}
    )
    updated_at: datetime = field(
        default_factory=datetime.now,
        metadata={"strategy": "timestamp"}
    )
    tags: list[str] = field(
        default_factory=list,
        metadata={"strategy": "empty_list"}
    )
# Inspect what fields are auto-populated
for f in fields(Audit):
    strategy = f.metadata.get("strategy", "none")
    readonly = f.metadata.get("readonly", False)
    print(f"Field: {f.name}, Strategy: {strategy}, Readonly: {readonly}")
# Output:
# Field: created_at, Strategy: timestamp, Readonly: True
# Field: updated_at, Strategy: timestamp, Readonly: False
# Field: tags, Strategy: empty_list, Readonly: False
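A "readonly" flag does nothing by itself; some piece of code has to honor it. One possible enforcement point is an update helper that refuses to touch flagged fields. `apply_updates` is a hypothetical function written for this sketch, and the `Audit` class is reduced to two fields.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime


@dataclass
class Audit:
    created_at: datetime = field(
        default_factory=datetime.now, metadata={"readonly": True}
    )
    note: str = field(default="")


def apply_updates(obj, changes: dict) -> None:
    """Hypothetical helper: set attributes, refusing fields marked readonly."""
    by_name = {f.name: f for f in fields(obj)}
    for name, value in changes.items():
        if name not in by_name:
            raise AttributeError(f"{type(obj).__name__} has no field {name!r}")
        if by_name[name].metadata.get("readonly", False):
            raise PermissionError(f"{name} is readonly")
        setattr(obj, name, value)


audit = Audit()
original_created = audit.created_at
apply_updates(audit, {"note": "reviewed"})
print(audit.note)  # reviewed

refusal = None
try:
    apply_updates(audit, {"created_at": datetime(2000, 1, 1)})
except PermissionError as exc:
    refusal = str(exc)
print(refusal)  # created_at is readonly
```

Direct attribute assignment still bypasses the flag; only callers that go through the helper are protected.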
Integration with External Libraries
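Metadata-driven design can also feed external tools such as marshmallow (serialization) or SQLAlchemy (ORM), usually through a thin adapter that reads the metadata, and it works equally well in custom frameworks. The example here is a custom one that generates a CREATE TABLE statement from column_type metadata: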
from dataclasses import dataclass, field, fields
@dataclass
class Person:
    id: int = field(metadata={"column_type": "INTEGER PRIMARY KEY"})
    name: str = field(metadata={"column_type": "VARCHAR(100) NOT NULL"})
    age: int = field(metadata={"column_type": "INTEGER CHECK (age > 0)"})
def generate_create_table(cls, table_name: str) -> str:
    """Generate CREATE TABLE from dataclass metadata."""
    columns = []
    for f in fields(cls):
        col_type = f.metadata.get("column_type", "TEXT")
        columns.append(f"{f.name} {col_type}")
    return f"CREATE TABLE {table_name} ({', '.join(columns)})"
print(generate_create_table(Person, "people"))
# CREATE TABLE people (id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL, age INTEGER CHECK (age > 0))
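Falling back to TEXT for every field without column_type is blunt. A possible refinement is to consult the field's annotation before giving up. The `SQL_TYPES` mapping below is an assumption chosen for illustration (roughly SQLite-flavored), not a standard.

```python
from dataclasses import dataclass, field, fields

# Assumed mapping from Python types to SQL column types; adjust per database
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL", bool: "INTEGER"}


@dataclass
class Person:
    id: int = field(metadata={"column_type": "INTEGER PRIMARY KEY"})
    name: str
    height_m: float


def column_type(f) -> str:
    """Prefer explicit metadata, then fall back to the annotation."""
    explicit = f.metadata.get("column_type")
    if explicit:
        return explicit
    # f.type holds the annotation; it is a string under
    # `from __future__ import annotations`, which this lookup does not handle
    return SQL_TYPES.get(f.type, "TEXT")


columns = ", ".join(f"{f.name} {column_type(f)}" for f in fields(Person))
statement = f"CREATE TABLE people ({columns})"
print(statement)
# CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, height_m REAL)
```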
Real-World Example: Config File Serialization
Here's a production pattern: serialize/deserialize config files with metadata:
from dataclasses import dataclass, field, fields
import yaml # third-party: pip install pyyaml
# kw_only=True (Python 3.10+) lets fields without defaults follow `port`;
# without it, the class definition raises TypeError
@dataclass(kw_only=True)
class DatabaseConfig:
    host: str = field(metadata={"env_var": "DB_HOST", "required": True})
    port: int = field(
        default=5432,
        metadata={"env_var": "DB_PORT", "type": "int"}
    )
    username: str = field(metadata={"env_var": "DB_USER"})
    password: str = field(metadata={"env_var": "DB_PASS", "secret": True})
def config_to_yaml(obj) -> str:
    """Serialize config, hiding secrets."""
    result = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        is_secret = f.metadata.get("secret", False)
        result[f.name] = "***" if is_secret else value
    # sort_keys=False keeps field order; PyYAML sorts keys alphabetically by default
    return yaml.dump(result, sort_keys=False)
config = DatabaseConfig(
    host="prod.example.com",
    port=5432,
    username="admin",
    password="super-secret-123"
)
print(config_to_yaml(config))
# Output:
# host: prod.example.com
# port: 5432
# username: admin
# password: '***'
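The env_var metadata is declared above but never read. A loader that uses it could look like the following sketch. `config_from_env` and the "cast" key are hypothetical, and the environment is passed in as a plain mapping so the example does not depend on the real process environment.

```python
import os
from dataclasses import MISSING, dataclass, field, fields


@dataclass
class DatabaseConfig:
    host: str = field(metadata={"env_var": "DB_HOST"})
    port: int = field(default=5432, metadata={"env_var": "DB_PORT", "cast": int})


def config_from_env(cls, environ=os.environ):
    """Build cls from environment variables named in "env_var" metadata."""
    kwargs = {}
    for f in fields(cls):
        var = f.metadata.get("env_var")
        if var and var in environ:
            # "cast" is a hypothetical key holding a converter; default is str
            kwargs[f.name] = f.metadata.get("cast", str)(environ[var])
        elif f.default is MISSING and f.default_factory is MISSING:
            raise KeyError(f"{var or f.name} is not set and has no default")
    return cls(**kwargs)


config = config_from_env(DatabaseConfig, {"DB_HOST": "db.internal", "DB_PORT": "6543"})
print(config)  # DatabaseConfig(host='db.internal', port=6543)
```

Fields with a default simply keep it when their variable is absent, so only truly required settings raise.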
Key Takeaways
- Field metadata is a dict attached to each field via field(metadata={...}).
- Use the fields() function to introspect fields and access metadata at runtime.
- Metadata enables generic serialization, validation, and ORM mapping without boilerplate.
- Organize metadata hierarchically for complex multi-system usage (API, DB, validation).
- Metadata-driven design scales across large codebases, reducing duplication.
Frequently Asked Questions
Is metadata performance-expensive?
No. Metadata is stored in the Field object (created once at class definition time). Accessing it via fields() is O(n) where n is the number of fields (typically small). Negligible overhead.
Can I use metadata with frozen dataclasses?
Yes, metadata is independent of immutability. It works with frozen=True, slots=True, and any combination of the two.
What's the difference between metadata and field defaults?
Defaults are used during __init__ if the caller omits a field. Metadata is purely informational; it doesn't affect initialization.
Can I require certain metadata keys?
No, metadata is unvalidated. Use documentation or conventions (e.g., "all fields must have a 'description' key"). Consider custom decorators or a validation function if strict requirements matter.
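As one possible shape for such a decorator, the sketch below checks every field when the class is defined and raises if a required key is absent. `require_metadata` is a hypothetical name; nothing like it ships with the dataclasses module.

```python
from dataclasses import dataclass, field, fields


def require_metadata(*keys):
    """Hypothetical class decorator: fail at definition time if a key is missing."""
    def check(cls):
        for f in fields(cls):
            missing = [k for k in keys if k not in f.metadata]
            if missing:
                raise TypeError(f"{cls.__name__}.{f.name} lacks metadata keys: {missing}")
        return cls
    return check


# Applied above @dataclass so that fields() already works on the class
@require_metadata("description")
@dataclass
class Sensor:
    name: str = field(metadata={"description": "Human-readable label"})


error = None
try:
    @require_metadata("description")
    @dataclass
    class Broken:
        name: str = field(metadata={"description": "ok"})
        value: float = 0.0  # no metadata at all
except TypeError as exc:
    error = str(exc)
print(error)  # Broken.value lacks metadata keys: ['description']
```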