Default Values and Factories in Python Dataclasses

Default values in dataclasses let you omit arguments when creating instances. Mutable defaults (lists, dicts, sets) are a common pitfall: a default is a single object, so every instance that relies on it would share it. Dataclasses reject list, dict, and set defaults with a ValueError at class-definition time, but other mutable objects can still slip through and be shared silently. The field() function and its default_factory parameter solve this by creating a fresh object for each instance. Understanding when and how to use them is critical for writing correct dataclasses.

I've debugged countless production issues where a shared list was accidentally mutated across instances. This article shows you the exact pattern to avoid that trap.

The Mutable Default Problem

In plain Python functions, mutable default arguments are dangerous because the default is evaluated once, when the function is defined:

def greet(name, tags=[]):  # WRONG: shared list!
    tags.append(name)
    return f"Hello, {name}. Tags: {tags}"

print(greet("Alice")) # Tags: ['Alice']
print(greet("Bob")) # Tags: ['Alice', 'Bob'] -- SHARED!

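Dataclasses guard against the most common form of this mistake. If you assign a list, dict, or set as a default, the @dataclass decorator raises a ValueError when the class is defined:

from dataclasses import dataclass

@dataclass
class User:
    name: str
    tags: list[str] = []  # WRONG: rejected at class-definition time

# ValueError: mutable default <class 'list'> for field tags is not allowed: use default_factory

The check exists because a class-level default is a single object: without it, every User created without a tags argument would share the same list, and appending to one would change them all. The guard is not complete, though. Python 3.11 and later reject any unhashable default, earlier versions check only for list, dict, and set, and in every version a mutable object that happens to be hashable (such as an instance of an ordinary class) is accepted and shared silently.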
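To see real sharing in a dataclass, the default has to be a mutable object that the decorator does not reject. The sketch below assumes a hypothetical Stats class and Page dataclass (neither is part of the original example). Stats instances are hashable, so @dataclass accepts one as a default and every Page ends up pointing at the same object.

```python
from dataclasses import dataclass


class Stats:
    """Hypothetical mutable helper; instances are hashable by default."""

    def __init__(self):
        self.hits = 0


@dataclass
class Page:
    url: str
    stats: Stats = Stats()  # WRONG: one Stats object shared by every Page


home = Page("/")
about = Page("/about")
home.stats.hits += 1

print(about.stats.hits)           # 1 -- the change leaked into another instance
print(home.stats is about.stats)  # True
```

Replacing the default with field(default_factory=Stats) would give each Page its own Stats object.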

The Solution: field() and default_factory

The dataclasses module provides a field() function that accepts a default_factory parameter: a callable that creates a fresh instance for each object.

from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    tags: list[str] = field(default_factory=list)  # Fresh list per instance

alice = User("Alice")
alice.tags.append("admin")

bob = User("Bob")
print(bob.tags) # [] -- NOT shared!
print(alice.tags) # ['admin']

Now each instance gets its own list() object created at instantiation time. The default_factory callable is invoked once per instance that omits the field.
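The "once per instance that omits the field" behavior can be checked with a factory that records its own calls. This is a minimal sketch: make_tags and factory_calls are hypothetical names introduced only for the demonstration.

```python
from dataclasses import dataclass, field

factory_calls = []  # one entry is appended every time the factory runs


def make_tags():
    """Hypothetical factory that records each invocation."""
    factory_calls.append("called")
    return []


@dataclass
class User:
    name: str
    tags: list[str] = field(default_factory=make_tags)


User("Alice")                # tags omitted: factory runs
User("Bob", tags=["admin"])  # tags supplied: factory is skipped
User("Carol")                # tags omitted: factory runs

print(len(factory_calls))  # 2
```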

Using field() for Default Values

Beyond default_factory, field() supports several other options:

Basic default Parameter

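For immutable values, pass the default parameter to field(). This is equivalent to a plain assignment such as currency: str = "USD" and is mainly useful when you also need other field() options: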

from dataclasses import dataclass, field

@dataclass
class Product:
    name: str
    price: float
    currency: str = field(default="USD")
    in_stock: bool = field(default=True)

item = Product("Laptop", 999.99)
print(item.currency) # USD
print(item.in_stock) # True

default_factory with Lambdas and Functions

You can use any callable for default_factory. Common choices are built-in types (list, dict, set) or custom functions:

import time
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class LogEntry:
    message: str
    timestamp: datetime = field(default_factory=datetime.now)
    metadata: dict = field(default_factory=dict)
    retry_count: int = field(default_factory=lambda: 0)  # works, but "= 0" is simpler

log1 = LogEntry("Error 1")
time.sleep(0.1)
log2 = LogEntry("Error 2")

print(log1.timestamp != log2.timestamp)  # True: different times
print(log1.metadata is not log2.metadata)  # True: different dicts

Here, datetime.now is called fresh for each instance, and a new dict is created per instance. The lambda for retry_count works, but it is unnecessary: 0 is immutable, so retry_count: int = 0 behaves identically. Reserve lambdas for defaults that have to be constructed, such as a prefilled list or dict.
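A lambda or small named function is most useful when the default has to be built rather than just named. The following sketch uses hypothetical names (AuditEntry, utc_now, and the "source" key are assumptions, not part of the original example) to show a prefilled dict and a timezone-aware timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Hypothetical helper: timezone-aware current time in UTC."""
    return datetime.now(timezone.utc)


@dataclass
class AuditEntry:
    message: str
    created_at: datetime = field(default_factory=utc_now)
    # The lambda builds a new prefilled dict on every call.
    context: dict[str, str] = field(default_factory=lambda: {"source": "app"})


first = AuditEntry("started")
second = AuditEntry("stopped")
first.context["user"] = "alice"

print(second.context)           # {'source': 'app'}
print(first.created_at.tzinfo)  # UTC
```

A named function like utc_now is often easier to test and reuse than an inline lambda. Either works as a factory, provided it takes no arguments.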

Field Metadata and Options

The field() function accepts several parameters beyond defaults:

from dataclasses import dataclass, field

@dataclass
class Article:
    title: str
    slug: str = field(
        default="",
        metadata={"description": "URL-friendly identifier"},
    )
    published: bool = field(
        default=False,
        init=False,  # Don't include in __init__
        repr=False,  # Don't include in __repr__
    )
    internal_id: int = field(
        default=0,
        compare=False,  # Exclude from __eq__ and ordering methods such as __lt__
    )

article = Article("My Blog Post")
print(article)  # Article(title='My Blog Post', slug='', internal_id=0)
# Note: 'published' is hidden from repr; 'internal_id' still appears in repr
# but is excluded from equality checks

  • metadata: A dict of arbitrary metadata about the field (useful for serialization or validation libraries).
  • init=False: Do not add this field to __init__. Useful for computed or internal fields.
  • repr=False: Do not include in the __repr__ string.
  • compare=False: Exclude from equality and ordering comparisons.
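The options above combine naturally with __post_init__, which the generated __init__ calls after assigning the fields. The sketch below uses a hypothetical Rectangle class to show a computed init=False field and a compare=False field.

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)                # computed, not a constructor argument
    label: str = field(default="", compare=False)  # ignored by ==

    def __post_init__(self):
        # Runs at the end of the generated __init__.
        self.area = self.width * self.height


a = Rectangle(2.0, 3.0, label="kitchen")
b = Rectangle(2.0, 3.0, label="hallway")

print(a.area)  # 6.0
print(a == b)  # True: label is excluded from comparison
```

Passing area to the constructor raises a TypeError, because init=False removes it from the __init__ signature.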

Practical Patterns

1. Configuration Objects with Sensible Defaults

from dataclasses import dataclass, field

@dataclass
class DatabaseConfig:
    host: str
    port: int = 5432
    username: str = "postgres"
    password: str = ""
    pool_size: int = field(default=10, metadata={"min": 1, "max": 100})
    options: dict[str, str] = field(default_factory=dict)

# Use defaults
config1 = DatabaseConfig("localhost")
print(config1.port) # 5432

# Override some
config2 = DatabaseConfig("prod-db.example.com", port=3306, pool_size=50)
print(config2.pool_size) # 50
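Note that dataclasses never interprets metadata, so the "min" and "max" keys above do nothing on their own. One possible way to enforce them, sketched here with a hypothetical PoolConfig class, is to read the metadata back in __post_init__. The key names are this example's own convention, not anything the dataclasses module defines.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PoolConfig:
    host: str
    pool_size: int = field(default=10, metadata={"min": 1, "max": 100})

    def __post_init__(self):
        # The "min"/"max" keys are our own convention; dataclasses ignores them.
        for f in fields(self):
            value = getattr(self, f.name)
            low = f.metadata.get("min")
            high = f.metadata.get("max")
            if low is not None and value < low:
                raise ValueError(f"{f.name} must be >= {low}, got {value}")
            if high is not None and value > high:
                raise ValueError(f"{f.name} must be <= {high}, got {value}")


print(PoolConfig("localhost").pool_size)  # 10

try:
    PoolConfig("localhost", pool_size=500)
except ValueError as exc:
    print(exc)  # pool_size must be <= 100, got 500
```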

2. Gradual Field Extension

When you add optional fields to an existing dataclass, give them a default (or a default_factory for mutable values) so that existing call sites keep working:

from dataclasses import dataclass, field

@dataclass
class Event:
    name: str
    timestamp: str
    # New optional field added later
    tags: list[str] = field(default_factory=list)

# Old code without 'tags' still works
event = Event("conference", "2026-06-02T10:00:00Z")
print(event.tags) # []
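The same idea applies to stored data. As a rough sketch, assume records were saved as dicts before the tags field existed (the two records below are hypothetical); unpacking them into the dataclass keeps working because the missing key falls back to the factory.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str
    timestamp: str
    tags: list[str] = field(default_factory=list)  # added later


# Hypothetical records: the first was stored before 'tags' existed.
old_record = {"name": "conference", "timestamp": "2026-06-02T10:00:00Z"}
new_record = {"name": "meetup", "timestamp": "2026-07-01T18:00:00Z", "tags": ["local"]}

events = [Event(**record) for record in (old_record, new_record)]

print(events[0].tags)  # []
print(events[1].tags)  # ['local']
```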

3. Nested Structures

Factories are essential for nested mutable structures:

from dataclasses import dataclass, field

@dataclass
class BlogPost:
    title: str
    comments: list[dict[str, str]] = field(default_factory=list)
    metadata: dict[str, object] = field(default_factory=dict)

post = BlogPost("My Post")
post.comments.append({"author": "Alice", "text": "Great!"})

post2 = BlogPost("Another Post")
print(post2.comments) # [] -- Not shared!
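Nesting also covers dataclasses inside dataclasses. In this sketch (Author and its fields are hypothetical), the nested class itself serves as the factory, so each post gets its own Author. On Python 3.11 and later, writing author: Author = Author() instead should be rejected, because instances of a regular dataclass are unhashable.

```python
from dataclasses import dataclass, field


@dataclass
class Author:
    name: str = "anonymous"
    links: list[str] = field(default_factory=list)


@dataclass
class BlogPost:
    title: str
    # The class itself is the factory: Author() is called for each new post.
    author: Author = field(default_factory=Author)


first = BlogPost("My Post")
second = BlogPost("Another Post")
first.author.links.append("https://example.com")

print(second.author.links)            # []
print(first.author is second.author)  # False
```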

Field Metadata for Serialization

You can attach metadata to fields and retrieve it at runtime. This is useful for JSON serialization, ORM mappings, or custom validators:

from dataclasses import dataclass, field, fields

@dataclass
class Product:
    name: str = field(metadata={"json_key": "product_name"})
    price: float = field(metadata={"json_key": "list_price"})

def to_json_keys(obj):
    result = {}
    for f in fields(obj):
        json_key = f.metadata.get("json_key", f.name)
        result[json_key] = getattr(obj, f.name)
    return result

product = Product("Hammer", 29.99)
print(to_json_keys(product))
# {'product_name': 'Hammer', 'list_price': 29.99}

The fields() function accepts a dataclass or an instance of one and returns a tuple of Field objects, each with a read-only metadata mapping that you can query.
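Two details of fields() are worth seeing in code: it works on the class as well as on an instance, and the metadata it exposes cannot be modified after the class is created. A short sketch, reusing the Product definition from above:

```python
from dataclasses import dataclass, field, fields


@dataclass
class Product:
    name: str = field(metadata={"json_key": "product_name"})
    price: float = field(metadata={"json_key": "list_price"})


# fields() accepts the class itself, so no instance is needed.
key_map = {f.name: f.metadata["json_key"] for f in fields(Product)}
print(key_map)  # {'name': 'product_name', 'price': 'list_price'}

# metadata is exposed as a read-only mapping.
try:
    fields(Product)[0].metadata["json_key"] = "title"
except TypeError:
    print("metadata is read-only")
```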

Field Ordering Rules

When a dataclass has both default and non-default fields, all non-default fields must come first in the definition. The generated __init__ takes its parameters in definition order, and Python does not allow a parameter without a default to follow one with a default:

from dataclasses import dataclass, field

# CORRECT: non-default first, defaults last
@dataclass
class Post:
    title: str  # required
    body: str  # required
    published: bool = False  # optional
    tags: list[str] = field(default_factory=list)  # optional

# WRONG: this would raise a TypeError
# @dataclass
# class BadOrder:
#     published: bool = False
#     title: str  # ERROR: non-default after default!

Key Takeaways

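  • Never use mutable defaults directly (e.g., = []); dataclasses reject list, dict, and set defaults with a ValueError, and other mutable objects are silently shared across instances.
  • Use field(default_factory=...) to provide a callable that creates a fresh object per instance.
  • Common factories: list, dict, set, datetime.now, or custom functions and lambdas that build a value. For immutable values such as 0, use a plain default instead.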
  • field() also supports init, repr, and compare to control code generation.
  • Attach metadata to fields with field(metadata={...}) for serialization or validation.
  • Respect field ordering: non-defaults must come before defaults in the class definition.

Frequently Asked Questions

What if I need a default that is a mutable value I share intentionally?

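Annotate it with typing.ClassVar so the dataclass treats it as a class attribute rather than a field, or have a default_factory return the shared object explicitly. Generally, shared mutable state indicates a design issue; consider passing the shared object in explicitly or building instances through a classmethod or factory function instead.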
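As an illustration of the ClassVar approach, here is a minimal sketch with a hypothetical Plugin class whose registry list is deliberately shared. Attributes annotated with ClassVar are skipped by @dataclass, so the list is neither a field nor an __init__ parameter.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class Plugin:
    name: str
    # ClassVar attributes are skipped by @dataclass: not a field, not in __init__.
    registry: ClassVar[list[str]] = []

    def __post_init__(self):
        Plugin.registry.append(self.name)


Plugin("auth")
Plugin("cache")

print(Plugin.registry)                   # ['auth', 'cache']
print([f.name for f in fields(Plugin)])  # ['name']
```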

Can I use default and default_factory on the same field?

No. field() raises a ValueError if you pass both. Choose one: default for immutable values, default_factory for mutable ones or for values that must be computed per instance.
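The error comes from the field() call itself, before any class is built, which a short check makes visible:

```python
from dataclasses import field

conflict_error = None
try:
    field(default=0, default_factory=int)
except ValueError as exc:
    conflict_error = str(exc)

print(conflict_error)  # cannot specify both default and default_factory
```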

How do I inspect field defaults and factories at runtime?

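Import fields from dataclasses. It returns a tuple of Field objects with attributes such as name, type, default, default_factory, and metadata. A default or default_factory that was not set holds the sentinel dataclasses.MISSING.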
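A small sketch of such an inspection follows. Settings and describe_default are hypothetical names; the comparison against dataclasses.MISSING is what distinguishes a required field from one with a default or a factory.

```python
from dataclasses import MISSING, dataclass, field, fields


@dataclass
class Settings:
    host: str
    port: int = 5432
    options: dict[str, str] = field(default_factory=dict)


def describe_default(f):
    """Hypothetical helper: summarize how a field gets its value."""
    if f.default is not MISSING:
        return f"default={f.default!r}"
    if f.default_factory is not MISSING:
        return f"factory={f.default_factory.__name__}"
    return "required"


summary = {f.name: describe_default(f) for f in fields(Settings)}
print(summary)
# {'host': 'required', 'port': 'default=5432', 'options': 'factory=dict'}
```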

Does default_factory affect performance?

Minimally. The factory is called once per instance inside __init__, so the cost is one extra function call. For list and dict, this is negligible next to the rest of instance creation.
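If the overhead matters for your workload, it is easy to measure rather than guess. The sketch below compares two hypothetical classes with timeit. Absolute numbers depend on the machine and Python version, so no expected output is shown.

```python
import timeit
from dataclasses import dataclass, field


@dataclass
class WithFactory:
    items: list = field(default_factory=list)


@dataclass
class WithNone:
    items: object = None


n = 100_000
# timeit accepts a zero-argument callable; here each call creates one instance.
factory_time = timeit.timeit(WithFactory, number=n)
plain_time = timeit.timeit(WithNone, number=n)

print(f"default_factory: {factory_time / n * 1e9:.0f} ns per instance")
print(f"plain default:   {plain_time / n * 1e9:.0f} ns per instance")
```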

Further Reading