Skip to main content

Eager Loading vs. Lazy Loading: SQLAlchemy Query Optimization

Lazy loading is SQLAlchemy's default strategy: relationships are queried only when accessed. This is convenient but causes the infamous N+1 query problem: loading N parent records and then one query per parent to fetch relationships. Eager loading strategies preload relationships, reducing database round trips and improving performance significantly.

The N+1 Query Problem

Lazy loading demonstrates the N+1 problem:

from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from models import Author

engine = create_engine("sqlite:///library.db", echo=True)

with Session(engine) as session:
# Query 1: SELECT * FROM authors
authors = session.query(Author).all()

# Queries 2-N: SELECT * FROM books WHERE author_id = ?
# One query per author!
for author in authors:
print(f"{author.name}: {len(author.books)} books")

With echo=True, you see:

  1. One query to fetch all authors
  2. One query per author to fetch their books

If you have 100 authors, that's 101 queries. This is inefficient and becomes a bottleneck in production. Eager loading solves this by fetching relationships in 1-2 additional queries.

Eager Loading Strategies

joinedload(): Use a JOIN

joinedload() fetches related objects in a single query using a JOIN:

from sqlalchemy.orm import joinedload

with Session(engine) as session:
# 1 query: SELECT authors.*, books.* FROM authors LEFT OUTER JOIN books ...
authors = session.query(Author).options(
joinedload(Author.books)
).all()

# No additional queries when accessing author.books
for author in authors:
print(f"{author.name}: {len(author.books)} books")

With joinedload(), the result set contains duplicate author rows (one per book). SQLAlchemy's session automatically deduplicates them, so you get the right number of Author objects. Use .distinct() if the join produces unintended duplicates:

authors = session.query(Author).options(
joinedload(Author.books)
).distinct().all()

selectinload(): Use a Separate Query with IN Clause

selectinload() loads relationships in a separate query using an IN clause:

from sqlalchemy.orm import selectinload

with Session(engine) as session:
# Query 1: SELECT * FROM authors
# Query 2: SELECT * FROM books WHERE author_id IN (1, 2, 3, ...)
authors = session.query(Author).options(
selectinload(Author.books)
).all()

selectinload() is ideal when JOIN causes result set explosion (many-to-many with many related objects). It's also cleaner than joinedload() when joins get complex.

subqueryload(): Use a Correlated Subquery

subqueryload() uses a correlated subquery to fetch relationships:

from sqlalchemy.orm import subqueryload

with Session(engine) as session:
# Query 1: SELECT * FROM authors
# Query 2: SELECT * FROM books WHERE author_id IN (SELECT id FROM authors WHERE ...)
authors = session.query(Author).options(
subqueryload(Author.books)
).all()

subqueryload() is less common; use selectinload() instead (similar performance, clearer intent).

Comparing Eager Loading Strategies

StrategySQL QueriesBest ForCaveats
joinedload()1 (with JOIN)One-to-many, small result setsDuplicate rows; avoid with large result sets
selectinload()2 (main + IN clause)Many-to-many, large result setsRequires 2 round trips
subqueryload()2 (with subquery)Rare; similar to selectinloadLess intuitive syntax

Choose joinedload() for simple relationships with small result sets. Use selectinload() for complex joins or many-to-many with large datasets.

Eager Loading Nested Relationships

Load multiple levels of relationships:

from sqlalchemy.orm import joinedload, contains_eager

with Session(engine) as session:
# Load books and their reviews in a single query
authors = session.query(Author).options(
joinedload(Author.books).joinedload(Book.reviews)
).all()

Chain .joinedload() calls to eagerly load nested relationships. For complex nesting, selectinload() is often clearer.

Configuring Default Lazy Loading on the Model

Set a default lazy-loading strategy when defining the relationship:

from sqlalchemy.orm import relationship

class Author(Base):
__tablename__ = "authors"

id = Column(Integer, primary_key=True)
name = Column(String(100), nullable=False)

# Default to joinedload for this relationship
books = relationship(
"Book",
lazy="joined",
back_populates="author"
)

Lazy loading strategies:

StrategyBehavior
selectLoad on access (default lazy loading)
joinedLoad via JOIN in the same query
selectinLoad in a separate query with IN clause
raiseRaise error if accessed outside session
dynamicReturn a Query object for manual filtering

Set lazy="joined" if a relationship is always accessed. For optional access, keep the default lazy="select" and use .options() when needed.

Monitoring Query Performance

Use echo=True or logging to see generated SQL:

import logging

logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)

with Session(engine) as session:
authors = session.query(Author).all()
for author in authors:
print(author.books) # Watch the console for query count

Count SQL statements to verify lazy loading is eliminated. Tools like SQLAlchemy-utils provide timing utilities for profiling.

Real-World Example

Optimize a report query that fetches authors with all their books and reviews:

from sqlalchemy.orm import selectinload

with Session(engine) as session:
# BEFORE: 1 + 10 + 100 queries (N+1 problem)
authors = session.query(Author).all()

# AFTER: 3 queries (1 authors + 1 books with selectin + 1 reviews)
authors = session.query(Author).options(
selectinload(Author.books).selectinload(Book.reviews)
).all()

for author in authors:
for book in author.books:
print(f"{book.title} ({len(book.reviews)} reviews)")

The optimized query reduces database round trips from 1,000+ to 3, improving response time from seconds to milliseconds.

Key Takeaways

  • N+1 problem: Lazy loading queries one parent, then one per child (inefficient).
  • joinedload(): Fetch relationships via JOIN in one query (best for small result sets).
  • selectinload(): Fetch relationships in a separate query with IN clause (best for large result sets, many-to-many).
  • subqueryload(): Similar to selectinload; less commonly used.
  • Eager load nested relationships by chaining .joinedload() or .selectinload().
  • Monitor query count with echo=True to verify optimization.

Frequently Asked Questions

Which eager loading strategy should I use?

Start with selectinload() for safety. Use joinedload() if the result set is small and you need maximum performance. Benchmark both to confirm.

Does joinedload() duplicate results?

Yes. If an author has 10 books, the result set includes 10 duplicate author rows. SQLAlchemy's session deduplicates them automatically.

How do I eagerly load a relationship conditionally?

Use .options() in your query:

query = session.query(Author)
if load_books:
query = query.options(selectinload(Author.books))
return query.all()

Can I use eager loading with filters on relationships?

Yes, but it's tricky. Use contains_eager() for complex filtering:

session.query(Author).join(Author.books).filter(
Book.published_year > 2020
).options(contains_eager(Author.books)).all()

What if I set lazy="joined" on a relationship but only sometimes access it?

You'll always pay the JOIN cost, even if you don't need the relationship. Set lazy="select" (default) and use .options(joinedload(...)) when needed.

Further Reading