Eager Loading vs. Lazy Loading: SQLAlchemy Query Optimization
Lazy loading is SQLAlchemy's default strategy: relationships are queried only when accessed. This is convenient but causes the infamous N+1 query problem: loading N parent records and then one query per parent to fetch relationships. Eager loading strategies preload relationships, reducing database round trips and improving performance significantly.
The N+1 Query Problem
Lazy loading demonstrates the N+1 problem:
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from models import Author
engine = create_engine("sqlite:///library.db", echo=True)
with Session(engine) as session:
# Query 1: SELECT * FROM authors
authors = session.query(Author).all()
# Queries 2-N: SELECT * FROM books WHERE author_id = ?
# One query per author!
for author in authors:
print(f"{author.name}: {len(author.books)} books")
With echo=True, you see:
- One query to fetch all authors
- One query per author to fetch their books
If you have 100 authors, that's 101 queries. This is inefficient and becomes a bottleneck in production. Eager loading solves this by fetching relationships in 1-2 additional queries.
Eager Loading Strategies
joinedload(): Use a JOIN
joinedload() fetches related objects in a single query using a JOIN:
from sqlalchemy.orm import joinedload
with Session(engine) as session:
# 1 query: SELECT authors.*, books.* FROM authors LEFT OUTER JOIN books ...
authors = session.query(Author).options(
joinedload(Author.books)
).all()
# No additional queries when accessing author.books
for author in authors:
print(f"{author.name}: {len(author.books)} books")
With joinedload(), the result set contains duplicate author rows (one per book). SQLAlchemy's session automatically deduplicates them, so you get the right number of Author objects. Use .distinct() if the join produces unintended duplicates:
authors = session.query(Author).options(
joinedload(Author.books)
).distinct().all()
selectinload(): Use a Separate Query with IN Clause
selectinload() loads relationships in a separate query using an IN clause:
from sqlalchemy.orm import selectinload
with Session(engine) as session:
# Query 1: SELECT * FROM authors
# Query 2: SELECT * FROM books WHERE author_id IN (1, 2, 3, ...)
authors = session.query(Author).options(
selectinload(Author.books)
).all()
selectinload() is ideal when JOIN causes result set explosion (many-to-many with many related objects). It's also cleaner than joinedload() when joins get complex.
subqueryload(): Use a Correlated Subquery
subqueryload() uses a correlated subquery to fetch relationships:
from sqlalchemy.orm import subqueryload
with Session(engine) as session:
# Query 1: SELECT * FROM authors
# Query 2: SELECT * FROM books WHERE author_id IN (SELECT id FROM authors WHERE ...)
authors = session.query(Author).options(
subqueryload(Author.books)
).all()
subqueryload() is less common; use selectinload() instead (similar performance, clearer intent).
Comparing Eager Loading Strategies
| Strategy | SQL Queries | Best For | Caveats |
|---|---|---|---|
joinedload() | 1 (with JOIN) | One-to-many, small result sets | Duplicate rows; avoid with large result sets |
selectinload() | 2 (main + IN clause) | Many-to-many, large result sets | Requires 2 round trips |
subqueryload() | 2 (with subquery) | Rare; similar to selectinload | Less intuitive syntax |
Choose joinedload() for simple relationships with small result sets. Use selectinload() for complex joins or many-to-many with large datasets.
Eager Loading Nested Relationships
Load multiple levels of relationships:
from sqlalchemy.orm import joinedload, contains_eager
with Session(engine) as session:
# Load books and their reviews in a single query
authors = session.query(Author).options(
joinedload(Author.books).joinedload(Book.reviews)
).all()
Chain .joinedload() calls to eagerly load nested relationships. For complex nesting, selectinload() is often clearer.
Configuring Default Lazy Loading on the Model
Set a default lazy-loading strategy when defining the relationship:
from sqlalchemy.orm import relationship
class Author(Base):
__tablename__ = "authors"
id = Column(Integer, primary_key=True)
name = Column(String(100), nullable=False)
# Default to joinedload for this relationship
books = relationship(
"Book",
lazy="joined",
back_populates="author"
)
Lazy loading strategies:
| Strategy | Behavior |
|---|---|
select | Load on access (default lazy loading) |
joined | Load via JOIN in the same query |
selectin | Load in a separate query with IN clause |
raise | Raise error if accessed outside session |
dynamic | Return a Query object for manual filtering |
Set lazy="joined" if a relationship is always accessed. For optional access, keep the default lazy="select" and use .options() when needed.
Monitoring Query Performance
Use echo=True or logging to see generated SQL:
import logging
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
with Session(engine) as session:
authors = session.query(Author).all()
for author in authors:
print(author.books) # Watch the console for query count
Count SQL statements to verify lazy loading is eliminated. Tools like SQLAlchemy-utils provide timing utilities for profiling.
Real-World Example
Optimize a report query that fetches authors with all their books and reviews:
from sqlalchemy.orm import selectinload
with Session(engine) as session:
# BEFORE: 1 + 10 + 100 queries (N+1 problem)
authors = session.query(Author).all()
# AFTER: 3 queries (1 authors + 1 books with selectin + 1 reviews)
authors = session.query(Author).options(
selectinload(Author.books).selectinload(Book.reviews)
).all()
for author in authors:
for book in author.books:
print(f"{book.title} ({len(book.reviews)} reviews)")
The optimized query reduces database round trips from 1,000+ to 3, improving response time from seconds to milliseconds.
Key Takeaways
- N+1 problem: Lazy loading queries one parent, then one per child (inefficient).
joinedload(): Fetch relationships via JOIN in one query (best for small result sets).selectinload(): Fetch relationships in a separate query with IN clause (best for large result sets, many-to-many).subqueryload(): Similar to selectinload; less commonly used.- Eager load nested relationships by chaining
.joinedload()or.selectinload(). - Monitor query count with
echo=Trueto verify optimization.
Frequently Asked Questions
Which eager loading strategy should I use?
Start with selectinload() for safety. Use joinedload() if the result set is small and you need maximum performance. Benchmark both to confirm.
Does joinedload() duplicate results?
Yes. If an author has 10 books, the result set includes 10 duplicate author rows. SQLAlchemy's session deduplicates them automatically.
How do I eagerly load a relationship conditionally?
Use .options() in your query:
query = session.query(Author)
if load_books:
query = query.options(selectinload(Author.books))
return query.all()
Can I use eager loading with filters on relationships?
Yes, but it's tricky. Use contains_eager() for complex filtering:
session.query(Author).join(Author.books).filter(
Book.published_year > 2020
).options(contains_eager(Author.books)).all()
What if I set lazy="joined" on a relationship but only sometimes access it?
You'll always pay the JOIN cost, even if you don't need the relationship. Set lazy="select" (default) and use .options(joinedload(...)) when needed.