Python Databases: Persistence and Data Layer Fundamentals

Databases are where your application's critical data lives, and in Python, mastering data persistence means understanding both relational SQL databases and modern NoSQL alternatives. This chapter equips you with production-ready patterns for storing, querying, and scaling data—from setting up a simple SQLite prototype to architecting async database layers that handle thousands of concurrent requests without blocking. You'll learn to model complex data with SQLAlchemy ORM, track schema changes using Alembic migrations, cache frequently-accessed data with Redis, and integrate MongoDB for flexible document storage.

What You'll Learn

Write parameterized SQL queries and avoid injection vulnerabilities
Use SQLAlchemy ORM to map Python classes to database tables
Run schema migrations safely with Alembic in production
Cache application data with Redis for microsecond-fast retrievals
Query and index MongoDB collections at scale
Build async database drivers for non-blocking I/O
Design normalized schemas and understand database normalization levels
Monitor query performance and optimize slow operations
Implement connection pooling and retry logic
Deploy databases with Docker and manage migrations in CI/CD pipelines

Chapter Overview

This chapter covers five core themes that build on each other. Working with SQL Databases in Python introduces you to relational data models: you'll write SQL directly using libraries like sqlite3, then graduate to parameterized queries that prevent SQL injection. The chapter explains ACID transactions, indexes, and query planning—concepts you'll apply throughout your career. Database Modeling with SQLAlchemy ORM replaces hand-written SQL with declarative Python classes. You'll define models as subclasses of Base, configure relationships, and let the ORM generate SQL. This theme shows when to use ORM query methods versus raw SQL, and how to optimize N+1 queries that silently destroy performance. Schema Migrations with Alembic teaches you how to evolve your database structure without losing data. You'll create migrations that track every table change, write upgrade and downgrade functions, and integrate migrations into your deployment pipeline so your schema stays in sync across development, staging, and production. NoSQL and Caching with MongoDB and Redis introduces non-relational storage: MongoDB for documents that don't fit a rigid schema, and Redis for in-memory caching that turns database queries into microsecond operations. You'll learn when to choose each tool and how to keep cache consistent with your source of truth. Async Database Access at Scale shows you how to query databases without blocking the event loop—essential knowledge for building APIs that serve thousands of users. Using async drivers like asyncpg for PostgreSQL and motor for MongoDB, you'll measure the performance difference between synchronous and asynchronous database I/O, then architecture production systems that scale.

Together, these themes form a complete mental model of how data flows through applications: from schema design through ORM mapping, through migrations that evolve the schema safely, through caching layers that reduce load, to async patterns that scale horizontally. By the end of this chapter, you'll build a multi-tier data layer that feels native to Python and performs in production.

Frequently Asked Questions

Should I use an ORM or write SQL directly?

Both. SQLAlchemy ORM is ideal for CRUD operations, relationships, and rapid iteration—it handles SQL generation and prevents injection vulnerabilities. Use raw SQL for complex analytical queries, window functions, or performance-critical queries where the ORM can't generate efficient SQL. Many production apps use the ORM for 80% of queries and raw SQL for the remaining 20%.

What's the difference between SQL and NoSQL databases?

SQL databases (PostgreSQL, MySQL, SQLite) enforce a rigid schema and use ACID transactions to guarantee consistency; they scale vertically (bigger servers). NoSQL databases (MongoDB, DynamoDB) store flexible documents without a predefined schema and scale horizontally (more servers). Choose SQL for transactional workloads with strong consistency requirements; choose NoSQL for unstructured data, rapid schema evolution, or massive scale.

Why do I need async database drivers if I can use threads?

Async drivers run queries concurrently on a single OS thread using an event loop, so you can serve thousands of concurrent connections with minimal overhead. Threads cost ~2 MB of stack each, so thread pools max out around 100–200 concurrent requests. Async libraries like asyncpg scale to 10,000+ concurrent connections on the same machine. Use async for I/O-bound workloads like web APIs and background jobs; use threads only for CPU-bound work.

How do I avoid the N+1 query problem?

The N+1 problem occurs when you query one parent object then loop over it and query children, resulting in 1 + N queries. In SQLAlchemy, use eager loading with joinedload() or selectinload() to fetch related objects in one query. Use session.query(Parent).options(joinedload(Parent.children)) to load children immediately instead of on access.

Can I use multiple databases in one Python application?

Yes. You can create multiple SQLAlchemy engines (one per database), or configure a single engine with multiple schemas. Use different ORM sessions or async pools for each database. In production, separate databases by function (read replicas, sharded data, or OLTP versus OLAP) and manage connections with Docker and environment variables so configuration changes don't require code changes.

What You'll Learn​

Chapter Overview​

Frequently Asked Questions​

Should I use an ORM or write SQL directly?​

What's the difference between SQL and NoSQL databases?​

Why do I need async database drivers if I can use threads?​

How do I avoid the N+1 query problem?​

Can I use multiple databases in one Python application?​