Skip to main content

Building Data Models for Multi-Tenant SaaS

A multi-tenant SaaS database stores data for multiple customers (tenants) in a shared schema while guaranteeing complete isolation. Your data model must enforce tenant boundaries at the database level to prevent one tenant from querying another's data. SQLAlchemy is an industry-standard ORM for Python that lets you define models as classes, handle migrations, and apply indexes—all in code. This guide shows you how to structure models for tenant isolation, set up relationships correctly, and avoid common data leaks.

Why Multi-Tenancy Requires Careful Schema Design

Multi-tenancy means all customers' data lives in one database under one AWS RDS instance, saving ops overhead compared to separate databases per tenant. However, a single mistake—querying without a tenant filter—exposes Customer A's data to Customer B. Companies like Slack and Notion handle this by joining every single query to a tenant_id, enforced at the ORM level. A poorly designed schema makes this manual and error-prone; a good one makes tenant isolation automatic.

Setting Up SQLAlchemy and Your First Model

Install dependencies:

pip install sqlalchemy[postgresql] alembic psycopg[binary] python-dotenv

Create a database connection and base class:

from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base, Session
from sqlalchemy.pool import NullPool
import os

DATABASE_URL = os.getenv("DATABASE_URL", "postgresql://user:pass@localhost/saas")

engine = create_engine(
DATABASE_URL,
echo=False, # Set to True to see SQL queries
poolclass=NullPool, # NullPool for serverless; QueuePool for long-lived processes
connect_args={"connect_timeout": 10}
)

Base = declarative_base()

def get_db():
"""Yields a database session for dependency injection."""
session = Session(engine)
try:
yield session
finally:
session.close()

Core Multi-Tenant Models

Every tenant has users, and every user belongs to one tenant. Define this relationship explicitly:

from sqlalchemy import Column, Integer, String, DateTime, ForeignKey, Boolean, UniqueConstraint, Index
from sqlalchemy.orm import relationship
from datetime import datetime
from uuid import uuid4

class Tenant(Base):
"""
Represents a single customer in the SaaS.
All data is partitioned by tenant_id.
"""
__tablename__ = "tenants"

id = Column(Integer, primary_key=True)
slug = Column(String(50), unique=True, nullable=False, index=True)
name = Column(String(255), nullable=False)
created_at = Column(DateTime, default=datetime.utcnow, nullable=False)

# Relationships
users = relationship("User", back_populates="tenant", cascade="all, delete-orphan")
teams = relationship("Team", back_populates="tenant", cascade="all, delete-orphan")

class User(Base):
"""
A user always belongs to exactly one tenant.
Every query must filter by tenant_id to prevent data leaks.
"""
__tablename__ = "users"

id = Column(Integer, primary_key=True)
tenant_id = Column(Integer, ForeignKey("tenants.id", ondelete="CASCADE"), nullable=False, index=True)
email = Column(String(255), nullable=False)
password_hash = Column(String(255), nullable=False)
first_name = Column(String(100), nullable=False)
last_name = Column(String(100), nullable=False)
is_active = Column(Boolean, default=True, nullable=False)
created_at = Column(DateTime, default=datetime.utcnow, nullable=False)

# Unique constraint: email is unique per tenant, not globally
__table_args__ = (
UniqueConstraint("tenant_id", "email", name="uq_tenant_email"),
Index("idx_tenant_id_email", "tenant_id", "email"),
)

# Relationships
tenant = relationship("Tenant", back_populates="users")
teams = relationship("TeamMember", back_populates="user", cascade="all, delete-orphan")

class Team(Base):
"""
A team belongs to one tenant and contains multiple users.
Teams allow role-based access control (e.g., admin vs. member).
"""
__tablename__ = "teams"

id = Column(Integer, primary_key=True)
tenant_id = Column(Integer, ForeignKey("tenants.id", ondelete="CASCADE"), nullable=False, index=True)
name = Column(String(255), nullable=False)
description = Column(String(1000), nullable=True)
created_at = Column(DateTime, default=datetime.utcnow, nullable=False)

__table_args__ = (
UniqueConstraint("tenant_id", "name", name="uq_team_name_per_tenant"),
Index("idx_tenant_teams", "tenant_id"),
)

tenant = relationship("Tenant", back_populates="teams")
members = relationship("TeamMember", back_populates="team", cascade="all, delete-orphan")

class TeamMember(Base):
"""
Join table linking users to teams with a role.
"""
__tablename__ = "team_members"

id = Column(Integer, primary_key=True)
team_id = Column(Integer, ForeignKey("teams.id", ondelete="CASCADE"), nullable=False, index=True)
user_id = Column(Integer, ForeignKey("users.id", ondelete="CASCADE"), nullable=False, index=True)
role = Column(String(50), default="member", nullable=False) # "owner", "admin", "member"
joined_at = Column(DateTime, default=datetime.utcnow, nullable=False)

__table_args__ = (
UniqueConstraint("team_id", "user_id", name="uq_team_user"),
Index("idx_team_members", "team_id", "user_id"),
)

team = relationship("Team", back_populates="members")
user = relationship("User", back_populates="teams")

Enforcing Tenant Isolation in Queries

Define a custom query class that always filters by tenant:

from sqlalchemy.orm import Query
from fastapi import HTTPException

class TenantAwareQuery(Query):
"""
A custom query that automatically filters by the current tenant.
Prevents accidental leaks where a developer forgets to add tenant_id to WHERE.
"""
def __init__(self, *args, tenant_id: int | None = None, **kwargs):
super().__init__(*args, **kwargs)
self._tenant_id = tenant_id

def filter_by_tenant(self, tenant_id: int):
"""Set the tenant ID for this query."""
self._tenant_id = tenant_id
return self

def all(self):
"""Override all() to apply tenant filter automatically."""
if self._tenant_id is None:
raise RuntimeError("Tenant ID not set. Use filter_by_tenant() before querying.")

# Get the primary entity being queried
mapper = self.column_descriptions[0]["type"]
if hasattr(mapper, "tenant_id"):
return super().filter(mapper.tenant_id == self._tenant_id).all()
return super().all()

def first(self):
"""Override first() to apply tenant filter automatically."""
if self._tenant_id is None:
raise RuntimeError("Tenant ID not set. Use filter_by_tenant() before querying.")

mapper = self.column_descriptions[0]["type"]
if hasattr(mapper, "tenant_id"):
return super().filter(mapper.tenant_id == self._tenant_id).first()
return super().first()

# Usage in FastAPI:
from fastapi import Depends
from sqlalchemy.orm import Session
from typing import Annotated

async def get_current_tenant(request: Request) -> int:
"""Extract tenant ID from request (JWT token, header, or domain)."""
# Placeholder: in production, decode a JWT and extract the tenant_id claim
return request.headers.get("X-Tenant-ID", 1)

@app.get("/teams")
async def list_user_teams(
session: Annotated[Session, Depends(get_db)],
tenant_id: Annotated[int, Depends(get_current_tenant)]
):
"""List all teams for the current tenant."""
query = TenantAwareQuery(Team, session=session).filter_by_tenant(tenant_id)
teams = query.all()
return teams

Indexes and Performance Optimization

Indexes speed up queries but slow down writes. For multi-tenant schemas, index (tenant_id, other_field) pairs:

from sqlalchemy import Index

class Workspace(Base):
"""Example: a workspace (project) belongs to a tenant."""
__tablename__ = "workspaces"

id = Column(Integer, primary_key=True)
tenant_id = Column(Integer, ForeignKey("tenants.id", ondelete="CASCADE"), nullable=False)
name = Column(String(255), nullable=False)
created_at = Column(DateTime, default=datetime.utcnow, nullable=False)

__table_args__ = (
# Composite index: speeds up queries like "WHERE tenant_id = ? AND created_at > ?"
Index("idx_tenant_created", "tenant_id", "created_at"),
UniqueConstraint("tenant_id", "name"),
)

A query like SELECT * FROM workspaces WHERE tenant_id = 42 AND created_at > '2024-01-01' uses the composite index; a query like SELECT * FROM workspaces WHERE name = 'MyWorkspace' (missing tenant_id) does a slow full table scan and violates isolation, so it should never happen in production.

Soft Deletes for Audit Trails

Some SaaS platforms use soft deletes (marking rows as deleted instead of removing them) to maintain audit trails:

class AuditMixin:
"""Mixin to add created_at, updated_at, and deleted_at to any model."""
created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, nullable=False)
deleted_at = Column(DateTime, nullable=True, index=True)

class Document(Base, AuditMixin):
__tablename__ = "documents"

id = Column(Integer, primary_key=True)
tenant_id = Column(Integer, ForeignKey("tenants.id", ondelete="CASCADE"), nullable=False)
title = Column(String(255), nullable=False)
content = Column(String(10000), nullable=False)

__table_args__ = (
Index("idx_tenant_docs_active", "tenant_id", "deleted_at"),
)

# When listing documents, always exclude soft-deleted rows:
def get_active_documents(session: Session, tenant_id: int):
return session.query(Document).filter(
Document.tenant_id == tenant_id,
Document.deleted_at == None
).all()

# When deleting, set deleted_at instead of removing the row:
def soft_delete_document(session: Session, document_id: int, tenant_id: int):
doc = session.query(Document).filter(
Document.id == document_id,
Document.tenant_id == tenant_id
).first()
if doc:
doc.deleted_at = datetime.utcnow()
session.commit()

Key Takeaways

  • Every table except tenants should have a tenant_id foreign key and a unique constraint pairing tenant_id with a business key.
  • Use composite indexes on (tenant_id, field) to speed up queries and prevent accidental queries that miss the tenant filter.
  • Define custom query classes that enforce tenant filtering at the ORM level to prevent data leaks.
  • Relationships (relationship()) navigate foreign keys; cascade settings control behavior when the parent is deleted.
  • Soft deletes (deleted_at column) preserve audit trails; always filter out deleted rows in queries.

Frequently Asked Questions

Can I use a UUID instead of an integer for tenant_id?

Yes. UUIDs are common for distributed systems. Use Column(UUID(as_uuid=True), primary_key=True, default=uuid4). Integers are slightly faster and smaller in indexes; UUIDs are safer for splitting tenants across databases later.

What if a query needs data from multiple tenants?

Redesign the query. If you're joining tables from multiple tenants, you're violating isolation. Fetch data per-tenant, then combine in application code, or redesign the schema so the data belongs to a single tenant.

How do I enforce tenant isolation at the database level?

Row-level security (RLS) in PostgreSQL using policies. Define: CREATE POLICY tenant_isolation ON users USING (tenant_id = current_setting('app.tenant_id')). Set current_setting per-request, and PostgreSQL rejects queries on other tenants even at the database layer. This is a defense-in-depth measure.

Should I use a shared schema or a schema per tenant?

Shared schema (all tenants in one schema, isolated by tenant_id) is easier to migrate, backup, and debug. Schema-per-tenant (separate schema for each tenant) is more complex but offers hard isolation. Most SaaS platforms use shared schema.

How do I test multi-tenant isolation?

Create two test tenants, insert data in each, and verify that queries on one don't return the other's data. Use pytest fixtures to set up test data. Mock the get_current_tenant() dependency to return different tenant IDs per test.

Further Reading