Building a Lightweight ORM with Metaclasses in Python
Building a lightweight ORM is a capstone exercise for understanding descriptors and metaclasses: the ORM declares fields as descriptors, uses a metaclass to collect them into a schema, and generates SQL from that schema. This hands-on project shows the kind of machinery behind Django's and SQLAlchemy's declarative models and gives you a working example of metaprogramming in a realistic setting. The minimal ORM built here demonstrates field tracking, descriptor-based validation, model inheritance, and metaclass-driven code generation. The result is a small framework in which users write simple class definitions and get schema generation, validation, and basic query building.
Part 1: Define Field Descriptors
Start with field types that handle validation and store values:
class Field:
    """Base descriptor for ORM fields."""

    def __init__(self, type_=str, default=None, primary_key=False, nullable=True):
        self.type_ = type_
        self.default = default
        self.primary_key = primary_key
        self.nullable = nullable
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(f'_{self.name}', self.default)

    def __set__(self, obj, value):
        if value is None and not self.nullable:
            raise ValueError(f'{self.name} cannot be null')
        if value is not None and not isinstance(value, self.type_):
            raise TypeError(
                f'{self.name} must be {self.type_.__name__}, '
                f'got {type(value).__name__}'
            )
        obj.__dict__[f'_{self.name}'] = value

    def sql_type(self):
        """Return SQL type for this field."""
        type_map = {str: 'TEXT', int: 'INTEGER', float: 'REAL', bool: 'BOOLEAN'}
        return type_map.get(self.type_, 'TEXT')


class IntegerField(Field):
    def __init__(self, **kwargs):
        super().__init__(type_=int, **kwargs)


class StringField(Field):
    def __init__(self, max_length=255, **kwargs):
        super().__init__(type_=str, **kwargs)
        self.max_length = max_length

    def sql_type(self):
        return f'VARCHAR({self.max_length})'


class TextField(Field):
    def __init__(self, **kwargs):
        super().__init__(type_=str, **kwargs)

    def sql_type(self):
        return 'TEXT'
These field descriptors handle type checking and provide SQL schema information—the foundation of the ORM.
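As a quick check of the descriptor behaviour described above, the following minimal sketch re-creates a trimmed-down field and exercises it. `BoundedString` and `Account` are hypothetical names, and the `max_length` check is an assumption added for illustration: the `StringField` above stores `max_length` but does not enforce it.

```python
class Field:
    """Trimmed-down version of the Field descriptor, for illustration."""

    def __init__(self, type_=str, default=None, nullable=True):
        self.type_ = type_
        self.default = default
        self.nullable = nullable
        self.name = None

    def __set_name__(self, owner, name):
        # Called once at class creation with the attribute name
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class-level access returns the descriptor itself
        return obj.__dict__.get(f'_{self.name}', self.default)

    def __set__(self, obj, value):
        if value is None and not self.nullable:
            raise ValueError(f'{self.name} cannot be null')
        if value is not None and not isinstance(value, self.type_):
            raise TypeError(f'{self.name} must be {self.type_.__name__}')
        obj.__dict__[f'_{self.name}'] = value


class BoundedString(Field):
    """Hypothetical string field that also enforces max_length on assignment."""

    def __init__(self, max_length=255, **kwargs):
        super().__init__(type_=str, **kwargs)
        self.max_length = max_length

    def __set__(self, obj, value):
        if isinstance(value, str) and len(value) > self.max_length:
            raise ValueError(f'{self.name} exceeds {self.max_length} characters')
        super().__set__(obj, value)


class Account:
    username = BoundedString(max_length=8, nullable=False)
    nickname = BoundedString(max_length=8, default='anon')


acct = Account()
acct.username = 'alice'

# Each bad value should be rejected by a different check
errors = []
for bad_value in (None, 42, 'much-too-long'):
    try:
        acct.username = bad_value
    except (TypeError, ValueError) as exc:
        errors.append(type(exc).__name__)

print(acct.username, acct.nickname, errors)
# alice anon ['ValueError', 'TypeError', 'ValueError']
```

Rejected assignments leave the previously stored value in place, because the descriptor writes to the instance only after every check has passed.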
Part 2: Build the Metaclass
The metaclass collects fields and generates __init__ and __repr__; the Model base class built on it produces the schema:
class ORMMeta(type):
    """Metaclass that collects fields and generates __init__ and __repr__."""

    def __new__(mcs, name, bases, namespace):
        # Start from fields inherited from base models (leftmost base wins)
        fields = {}
        for base in reversed(bases):
            fields.update(getattr(base, '_fields', {}))
        # Add fields declared on this class, overriding inherited ones
        for key, value in namespace.items():
            if isinstance(value, Field):
                fields[key] = value
        namespace['_fields'] = fields

        # Generate __init__
        def __init__(self, **kwargs):
            for field_name, field in self._fields.items():
                # Fall back to the field default when no value is passed
                setattr(self, field_name, kwargs.get(field_name, field.default))
        namespace['__init__'] = __init__

        # Generate __repr__
        def __repr__(self):
            fields_str = ', '.join(
                f'{k}={getattr(self, k, None)!r}'
                for k in self._fields
            )
            return f'{name}({fields_str})'
        namespace['__repr__'] = __repr__

        # Create the class
        cls = super().__new__(mcs, name, bases, namespace)

        # Store table metadata
        if name != 'Model':
            cls._table_name = name.lower() + 's'
        return cls
class Model(metaclass=ORMMeta):
    """Base ORM model class."""

    @classmethod
    def get_schema(cls):
        """Generate CREATE TABLE statement."""
        fields = []
        for name, field in cls._fields.items():
            constraints = []
            if field.primary_key:
                constraints.append('PRIMARY KEY')
            if not field.nullable:
                constraints.append('NOT NULL')
            constraint_str = ' '.join(constraints)
            sql = f'{name} {field.sql_type()}'
            if constraint_str:
                sql += f' {constraint_str}'
            fields.append(sql)
        return f'CREATE TABLE {cls._table_name} ({", ".join(fields)});'


class User(Model):
    id = IntegerField(primary_key=True)
    # Fields are nullable by default, so NOT NULL must be requested explicitly
    name = StringField(max_length=100, nullable=False)
    email = StringField(max_length=255, nullable=False)
    bio = TextField(nullable=True)


print(User.get_schema())
# Output: CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL, email VARCHAR(255) NOT NULL, bio TEXT);
The metaclass discovered every field definition and generated __init__ and __repr__, while get_schema() built the CREATE TABLE statement from the collected _fields. Users never write any of this by hand; it all follows from the class definition.
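To see what the generated methods do at runtime, here is a compact, self-contained sketch. `MiniField`, `MiniMeta`, and `Point` are hypothetical stand-ins for the classes above, with validation omitted. Falling back to each field's default and rejecting unknown keyword arguments are assumptions about the desired behaviour, not something the article's code is stated to do.

```python
class MiniField:
    """Stand-in descriptor: stores values per instance, no validation."""

    def __init__(self, default=None):
        self.default = default

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(f'_{self.name}', self.default)

    def __set__(self, obj, value):
        obj.__dict__[f'_{self.name}'] = value


class MiniMeta(type):
    def __new__(mcs, name, bases, namespace):
        # The class body has already run, so its fields sit in `namespace`
        fields = {k: v for k, v in namespace.items() if isinstance(v, MiniField)}
        namespace['_fields'] = fields

        def __init__(self, **kwargs):
            unknown = set(kwargs) - set(self._fields)
            if unknown:
                raise TypeError(f'unexpected fields: {sorted(unknown)}')
            for field_name, field in self._fields.items():
                setattr(self, field_name, kwargs.get(field_name, field.default))

        def __repr__(self):
            args = ', '.join(f'{k}={getattr(self, k)!r}' for k in self._fields)
            return f'{type(self).__name__}({args})'

        namespace['__init__'] = __init__
        namespace['__repr__'] = __repr__
        return super().__new__(mcs, name, bases, namespace)


class Point(metaclass=MiniMeta):
    x = MiniField(default=0)
    y = MiniField(default=0)


p = Point(x=3)
print(list(Point._fields))  # ['x', 'y'], in declaration order
print(repr(p))              # Point(x=3, y=0)
```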
Part 3: Add Query Building
Extend the Model base class with a simple query builder:
class QueryBuilder:
    """Simple query builder for SELECT statements."""

    def __init__(self, model_cls):
        self.model_cls = model_cls
        self.filters = []

    def filter(self, **kwargs):
        """Add WHERE clause conditions (unknown field names are ignored)."""
        for key, value in kwargs.items():
            if key in self.model_cls._fields:
                self.filters.append((key, value))
        return self

    def build_select(self):
        """Generate SELECT statement."""
        query = f'SELECT * FROM {self.model_cls._table_name}'
        if self.filters:
            # Values are inlined for readability only. Doubling single quotes
            # is minimal escaping; real code should use bound parameters.
            where_clauses = [
                "{} = '{}'".format(k, str(v).replace("'", "''"))
                for k, v in self.filters
            ]
            query += ' WHERE ' + ' AND '.join(where_clauses)
        query += ';'
        return query
class Model(metaclass=ORMMeta):
    """Base ORM model class."""

    @classmethod
    def get_schema(cls):
        """Generate CREATE TABLE statement."""
        fields = []
        for name, field in cls._fields.items():
            constraints = []
            if field.primary_key:
                constraints.append('PRIMARY KEY')
            if not field.nullable:
                constraints.append('NOT NULL')
            constraint_str = ' '.join(constraints)
            sql = f'{name} {field.sql_type()}'
            if constraint_str:
                sql += f' {constraint_str}'
            fields.append(sql)
        return f'CREATE TABLE {cls._table_name} ({", ".join(fields)});'

    @classmethod
    def objects(cls):
        """Return a QueryBuilder for this model."""
        return QueryBuilder(cls)


# Usage: models must be defined after this version of Model to get objects()
class User(Model):
    id = IntegerField(primary_key=True)
    name = StringField(max_length=100)
    email = StringField(max_length=255)


query = User.objects().filter(name='Alice', email='alice@example.com').build_select()
print(query)
# Output: SELECT * FROM users WHERE name = 'Alice' AND email = 'alice@example.com';
The query builder is obtained via .objects(), a pattern inspired by Django's Model.objects manager (an attribute in Django, a method call here).
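Interpolating values straight into the SQL string, as build_select does, is open to SQL injection. A safer variant returns the SQL with placeholders plus a separate parameter tuple and lets the database driver bind the values. The sketch below assumes SQLite's `?` placeholder style and uses the standard-library `sqlite3` module; `build_select_params` and the sample table are hypothetical.

```python
import sqlite3


def build_select_params(table_name, allowed_fields, **filters):
    """Return (sql, params) with ? placeholders instead of inlined values."""
    unknown = set(filters) - set(allowed_fields)
    if unknown:
        raise ValueError(f'unknown fields: {sorted(unknown)}')
    sql = f'SELECT * FROM {table_name}'
    if filters:
        # Column names come from the model's own field list, never from user data
        sql += ' WHERE ' + ' AND '.join(f'{col} = ?' for col in filters)
    return sql + ';', tuple(filters.values())


# In-memory database with two sample rows (illustrative data only)
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(100), email VARCHAR(255))'
)
conn.executemany(
    'INSERT INTO users (name, email) VALUES (?, ?)',
    [('Alice', 'alice@example.com'), ("O'Brien", 'ob@example.com')],
)

sql, params = build_select_params('users', ['id', 'name', 'email'], name="O'Brien")
rows = conn.execute(sql, params).fetchall()
print(sql)   # SELECT * FROM users WHERE name = ?;
print(rows)  # [(2, "O'Brien", 'ob@example.com')]
```

Because the value travels separately from the statement, a name containing a quote needs no escaping, and input such as `' OR '1'='1` is compared as a literal string rather than parsed as SQL.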
Part 4: Inheritance and Polymorphism
Support model inheritance to share common fields:
class BaseModel(Model):
    """A base model with auto-generated id and timestamps."""
    id = IntegerField(primary_key=True)
    created_at = StringField(nullable=True)


class User(BaseModel):
    name = StringField(max_length=100)
    email = StringField(max_length=255)


class Product(BaseModel):
    title = StringField(max_length=200)
    price = Field(type_=float, default=0.0)


user_schema = User.get_schema()
product_schema = Product.get_schema()
print('User table:')
print(user_schema)
print('\nProduct table:')
print(product_schema)
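Both User and Product inherit the id and created_at fields from BaseModel, provided the metaclass merges each base class's _fields with the fields declared on the child; collecting fields from the class namespace alone would miss them. Because each child table gets its own copy of the inherited columns, this is closest to Django's abstract base classes rather than its multi-table inheritance, which links parent and child tables through a one-to-one key.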
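Field inheritance only works if the metaclass looks beyond the class's own namespace. One way to do that, sketched below with hypothetical `Column`, `MergingMeta`, and model names, is to copy `_fields` from each base class before adding the fields declared on the class itself.

```python
class Column:
    """Hypothetical marker for a field; only the SQL type matters here."""

    def __init__(self, sql_type):
        self.sql_type = sql_type


class MergingMeta(type):
    def __new__(mcs, name, bases, namespace):
        fields = {}
        # Walk bases right to left so the leftmost base wins, which
        # approximates MRO precedence for simple hierarchies
        for base in reversed(bases):
            fields.update(getattr(base, '_fields', {}))
        # Fields declared on this class are added last and override inherited ones
        fields.update({k: v for k, v in namespace.items() if isinstance(v, Column)})
        namespace['_fields'] = fields
        return super().__new__(mcs, name, bases, namespace)


class Base(metaclass=MergingMeta):
    id = Column('INTEGER')
    created_at = Column('TEXT')


class Article(Base):
    title = Column('VARCHAR(200)')
    created_at = Column('TIMESTAMP')  # overrides the inherited column


print(list(Base._fields))
# ['id', 'created_at']
print({k: v.sql_type for k, v in Article._fields.items()})
# {'id': 'INTEGER', 'created_at': 'TIMESTAMP', 'title': 'VARCHAR(200)'}
```

Each class receives its own `_fields` dictionary, so overriding a column in a child leaves the parent's definition untouched.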
Real-World Patterns Used in Django and SQLAlchemy
| Feature | Mini ORM | Django | SQLAlchemy |
|---|---|---|---|
| Field definitions | IntegerField() | models.IntegerField() | Column(Integer) |
| Model metaclass | ORMMeta | ModelBase | DeclarativeMeta |
| Query builder | .objects().filter() | .objects.filter() | session.query(Model).filter() |
| Schema generation | .get_schema() | Migrations (makemigrations / migrate) | Base.metadata.create_all(engine) |
Key Takeaways
- A minimal ORM demonstrates the synergy between descriptors (field definitions), metaclasses (schema building), and code generation.
- Descriptors handle instance-level data; metaclasses handle class-level infrastructure and schema.
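- Query builders provide a fluent, chainable API so that callers never hand-write SQL strings; values should still be passed as bound parameters in real code.
- Inheritance works once the metaclass merges fields from base classes, following Python's MRO so that child definitions override parent ones.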
Frequently Asked Questions
How does this ORM handle relationships (foreign keys)?
The mini ORM above omits relationships for simplicity. Django models them with a ForeignKey field, and SQLAlchemy pairs a ForeignKey column constraint with relationship(); in both cases the mapping layer registers the relationship when the class is created and uses it to generate JOINs or follow-up queries.
How do real ORMs handle lazy loading?
Real ORMs usually implement relationship attributes as descriptors whose __get__ queries the database on first access and caches the result on the instance, using whatever connection or session the ORM manages. __getattr__ (covered in earlier articles) is an alternative hook for intercepting attributes that are missing from the instance.
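The caching-descriptor idea can be sketched in a few lines. This is an illustration of the general technique, not how Django or SQLAlchemy implement it; `LazyRelated`, `load_author`, and the in-memory `FAKE_AUTHORS` table are all hypothetical.

```python
class LazyRelated:
    """Hypothetical descriptor that loads a related object on first access."""

    def __init__(self, loader):
        self.loader = loader  # callable taking the owning instance

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Non-data descriptor: once the value is cached in the instance
        # __dict__ under the same name, Python finds it there first and
        # never calls __get__ again for this instance.
        value = self.loader(obj)
        obj.__dict__[self.name] = value
        return value


FAKE_AUTHORS = {1: 'Alice', 2: 'Bob'}  # stands in for a database table
load_calls = []


def load_author(post):
    load_calls.append(post.author_id)  # record each simulated query
    return FAKE_AUTHORS[post.author_id]


class Post:
    author = LazyRelated(load_author)

    def __init__(self, author_id):
        self.author_id = author_id


post = Post(author_id=2)
first = post.author   # triggers the loader
second = post.author  # served from the instance __dict__
print(first, second, load_calls)
# Bob Bob [2]
```

The trick relies on `LazyRelated` defining no `__set__`: a data descriptor would take precedence over the instance dictionary and would need to manage its own cache.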
Can this ORM handle multiple databases?
The mini ORM is single-database for simplicity. Real ORMs like Django support routing queries to multiple databases via a using() method on the query builder.
How do I add custom methods to ORM models?
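Simply define methods on the model class; the metaclass passes them through unchanged. The exceptions are __init__ and __repr__, which ORMMeta as written always replaces; to allow overrides, the metaclass would need to install its generated versions only when the class body does not define them.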
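A metaclass that respects user-defined methods can be sketched with `dict.setdefault`. `RespectfulMeta`, `Plain`, and `Custom` are hypothetical names used only for this illustration.

```python
class RespectfulMeta(type):
    """Hypothetical metaclass that only fills in methods the class body lacks."""

    def __new__(mcs, name, bases, namespace):
        def __repr__(self):
            return f'<{type(self).__name__} (generated repr)>'

        # setdefault leaves a user-supplied __repr__ untouched
        namespace.setdefault('__repr__', __repr__)
        return super().__new__(mcs, name, bases, namespace)


class Plain(metaclass=RespectfulMeta):
    pass


class Custom(metaclass=RespectfulMeta):
    def __repr__(self):
        return 'Custom(hand-written)'

    def greet(self):
        # Ordinary methods pass through the metaclass unchanged
        return 'hello'


print(repr(Plain()), repr(Custom()), Custom().greet())
# <Plain (generated repr)> Custom(hand-written) hello
```

One caveat of this simple form: a subclass that does not define `__repr__` itself receives the generated version rather than inheriting its parent's, so a fuller implementation would also check the base classes.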
Does this approach scale to large schemas?
For simple use cases, yes. Production ORMs like SQLAlchemy add query optimization, connection pooling, and caching on top. The metaprogramming foundation stays the same; the extra work lies in the execution layer rather than in the class machinery.
Further Reading
- Django Models Documentation — official guide to Django ORM and its metaclass implementation.
- SQLAlchemy Declarative Base — how SQLAlchemy uses metaclasses for declarative model syntax.
- Building a Python ORM from Scratch — Real Python's deeper dive into ORM implementation.
- Query Builder Patterns — design pattern reference for fluent APIs.