Multi-Tier Caching: MongoDB, Redis, and Python
Multi-tier caching is a layered approach in which the hottest data lives closest to the computation. An in-process Python memory cache (microseconds or less) is fastest but limited by process RAM and private to one process. Redis (roughly 0.5–1 millisecond) is network-accessible, shared by all workers, and survives application restarts. MongoDB (10–50 milliseconds) is the source of truth. Requests cascade down: check the memory cache first, then Redis, then MongoDB. On a miss, the value found in a lower tier is written back into the tiers above it, so the next request is served faster.
In one recommendation engine serving 1 million requests per hour, multi-tier caching reduced average latency from 500 ms to 80 ms by keeping the hottest 1% of data in memory, 5% in Redis, and the rest in MongoDB. This guide shows how to architect multi-tier caches that scale.
Three-Tier Caching Architecture
User Request
↓
┌─────────────────────────┐
│ L1: Python Memory Cache │ (0.1 µs, process-local, 100 MB)
│ (LRU dict with TTL) │
└─────────────────────────┘
Cache Miss ↓
┌─────────────────────────┐
│ L2: Redis Cluster │ (0.5 ms, network, 64 GB)
│ (Shared, distributed) │
└─────────────────────────┘
Cache Miss ↓
┌─────────────────────────┐
│ L3: MongoDB │ (10 ms, disk, 1 TB)
│ (Authoritative store) │
└─────────────────────────┘
Each tier trades speed for capacity. L1 (process memory) is tiny but fast; L3 (MongoDB) is huge but slow.
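To see how hit rates translate into latency, the mean per-request latency can be estimated from each tier's latency and the share of requests it serves. The sketch below uses the latencies from the diagram above; the hit rates and the helper name expected_latency_ms are illustrative assumptions, not measurements.

```python
def expected_latency_ms(l1_hit_rate, l2_hit_rate, l1_ms=0.0001, l2_ms=0.5, l3_ms=10.0):
    """Estimate mean latency for a three-tier lookup.

    l1_hit_rate: fraction of all requests served from L1.
    l2_hit_rate: fraction of L1 misses served from L2.
    A request pays for every tier it has to check.
    """
    p_l1 = l1_hit_rate
    p_l2 = (1 - l1_hit_rate) * l2_hit_rate
    p_l3 = (1 - l1_hit_rate) * (1 - l2_hit_rate)
    return (
        p_l1 * l1_ms
        + p_l2 * (l1_ms + l2_ms)
        + p_l3 * (l1_ms + l2_ms + l3_ms)
    )


# Assumed hit rates: 60% at L1, then 80% of the remaining requests at L2
estimate = expected_latency_ms(0.60, 0.80)
print(f"Estimated mean latency: {estimate:.3f} ms")
```

Under these assumed rates only 8% of requests reach MongoDB, yet they account for most of the estimated latency, which is why raising the hit rate of the upper tiers pays off quickly.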
L1: In-Process Python Cache
Use Python's functools.lru_cache for simple caching, or implement a custom LRU with TTL for more control.
from collections import OrderedDict
from time import time
import threading

class TTLCache:
    """Simple in-process LRU cache with per-entry TTL."""

    def __init__(self, max_size=1000, default_ttl=300):
        # key -> (value, expiry); insertion order tracks recency of use
        self.cache = OrderedDict()
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.lock = threading.Lock()

    def get(self, key):
        """Return the value, or None if the key is missing or expired."""
        with self.lock:
            if key not in self.cache:
                return None
            value, expiry = self.cache[key]
            if time() > expiry:
                del self.cache[key]
                return None
            self.cache.move_to_end(key)  # Mark as most recently used
            return value

    def set(self, key, value, ttl=None):
        """Set a value with an optional TTL in seconds."""
        with self.lock:
            if key not in self.cache and len(self.cache) >= self.max_size:
                # Evict the least recently used entry
                self.cache.popitem(last=False)
            if ttl is None:
                ttl = self.default_ttl
            self.cache[key] = (value, time() + ttl)
            self.cache.move_to_end(key)

    def delete(self, key):
        """Remove a key if present (used for invalidation)."""
        with self.lock:
            self.cache.pop(key, None)
# Usage
# Assumes `r` (Redis client) and `db` (MongoDB database) are configured as
# shown in the L2 and L3 sections below.
import json

local_cache = TTLCache(max_size=10000, default_ttl=300)

def get_user_profile(user_id):
    """Get profile with three-tier cache."""
    cache_key = f'user:{user_id}'
    # L1: Check process memory
    cached = local_cache.get(cache_key)
    if cached is not None:
        print("Hit L1 (memory)")
        return cached
    # L2: Check Redis
    cached = r.get(cache_key)
    if cached is not None:
        profile = json.loads(cached)
        local_cache.set(cache_key, profile)  # Refill L1
        print("Hit L2 (Redis)")
        return profile
    # L3: Query MongoDB (slow)
    profile = db.users.find_one({'_id': user_id})
    if not profile:
        return None
    # Refill caches (L2, then L1). Round-trip through JSON so L1 holds the
    # same JSON-safe shape (ObjectId and datetime as strings) that L2 returns.
    payload = json.dumps(profile, default=str)
    r.setex(cache_key, 3600, payload)
    profile = json.loads(payload)
    local_cache.set(cache_key, profile)
    print("Hit L3 (MongoDB)")
    return profile
In-process caches are fastest but are not shared across processes, so each worker holds its own copy. Use them for read-heavy data where brief staleness is acceptable. For data that must be consistent across workers (such as user sessions), read from Redis instead.
L2: Redis Cache Layer
Redis is shared across all Python workers (threads, processes, app servers). Use it for distributed caching with TTLs and atomic operations.
import redis
import json

r = redis.Redis(host='redis', port=6379, decode_responses=True)

def cache_feed_in_redis(user_id, feed_data, ttl=600):
    """Cache user's feed in Redis (default TTL: 10 minutes)."""
    key = f'feed:user:{user_id}'
    r.setex(key, ttl, json.dumps(feed_data, default=str))

def get_cached_feed(user_id, local_cache):
    """Get feed from L1 (memory), then L2 (Redis), then L3 (MongoDB)."""
    # Use the same key in L1 and L2, matching cache_feed_in_redis()
    cache_key = f'feed:user:{user_id}'
    # L1: Memory
    cached = local_cache.get(cache_key)
    if cached is not None:  # An empty feed ([]) is still a valid hit
        return cached, 'L1'
    # L2: Redis
    cached = r.get(cache_key)
    if cached is not None:
        feed = json.loads(cached)
        local_cache.set(cache_key, feed, ttl=300)  # Refill memory cache
        return feed, 'L2'
    # L3: MongoDB (expensive)
    feed = fetch_feed_from_mongodb(user_id)
    # Refill caches
    cache_feed_in_redis(user_id, feed, ttl=600)
    local_cache.set(cache_key, feed, ttl=300)
    return feed, 'L3'

# Usage
feed, source = get_cached_feed(user_id=123, local_cache=local_cache)
print(f"Feed from {source}")
Redis is the distributed cache, shared by all app workers. Use it for data that must be consistent across instances.
L3: MongoDB as Source of Truth
MongoDB is the persistent store. Cache misses in L1 and L2 query MongoDB.
from pymongo import MongoClient
from datetime import datetime

client = MongoClient('mongodb://mongo:27017/')
db = client['app']

def fetch_feed_from_mongodb(user_id):
    """Fetch user's feed from MongoDB."""
    user = db.users.find_one({'_id': user_id})
    if not user:
        return []
    # Get posts from users this user follows
    following_ids = user.get('following', [])
    feed = list(db.posts.find(
        {'user_id': {'$in': following_ids}},
        sort=[('created_at', -1)],
        limit=20
    ))
    return feed
MongoDB is queried only on cache misses. With well-sized caches and a skewed access pattern, combined hit rates can exceed 95%, leaving MongoDB to handle only a small percentage of requests.
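As a rough worked example of what that means for database load, the sketch below converts a combined hit rate into the query rate that reaches MongoDB. It uses the 1 million requests per hour figure from the introduction; the helper name is hypothetical, and the 95% hit rate is the figure discussed above rather than a measurement.

```python
def mongo_queries_per_second(requests_per_hour, combined_hit_rate):
    """Requests that miss both cache tiers and reach MongoDB, per second."""
    misses_per_hour = requests_per_hour * (1 - combined_hit_rate)
    return misses_per_hour / 3600


# 1 million requests per hour with a 95% combined L1 + L2 hit rate
rate = mongo_queries_per_second(1_000_000, 0.95)
print(f"{rate:.1f} queries/s reach MongoDB")
```

At these numbers MongoDB sees roughly 14 queries per second instead of about 278, which is the difference the cache tiers are buying.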
Cache Invalidation: The Hard Part
Invalidate caches when data changes. Strategies:
TTL-Based Invalidation (Simple)
def update_user_profile(user_id, updates):
    """Update profile in MongoDB and invalidate caches."""
    # Update MongoDB
    db.users.update_one({'_id': user_id}, {'$set': updates})
    # Invalidate L2 (Redis)
    r.delete(f'user:{user_id}')
    # Invalidate L1 (process memory): this clears only the current process
    local_cache.delete(f'user:{user_id}')
The delete above clears L1 only in the process that handled the write. L1 caches in other processes cannot be reached directly, so rely on TTL expiration for them: set the L1 TTL to 5 minutes and accept slight staleness.
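The staleness window this creates can be illustrated with a small simulation. The sketch below is not the TTLCache class from earlier: WorkerCache and FakeClock are hypothetical helpers, the dict named store stands in for Redis/MongoDB, and the clock is advanced by hand so the example runs instantly.

```python
class FakeClock:
    """Manually advanced clock so the example does not need to sleep."""
    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


class WorkerCache:
    """Minimal per-process L1 cache: key -> (value, expiry)."""
    def __init__(self, clock, ttl):
        self.clock, self.ttl, self.data = clock, ttl, {}

    def get(self, key, load):
        entry = self.data.get(key)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]
        value = load(key)  # Miss or expired: reload from the shared store
        self.data[key] = (value, self.clock() + self.ttl)
        return value


clock = FakeClock()
store = {"user:1": "old name"}  # Stands in for Redis/MongoDB
worker_a = WorkerCache(clock, ttl=300)
worker_b = WorkerCache(clock, ttl=300)

# Both workers cache the value at t=0
worker_a.get("user:1", store.get)
worker_b.get("user:1", store.get)

# Worker A handles a write and clears only its own L1 entry
store["user:1"] = "new name"
worker_a.data.pop("user:1", None)

clock.now = 100
fresh = worker_a.get("user:1", store.get)  # Reloaded after invalidation
stale = worker_b.get("user:1", store.get)  # Still cached in worker B

clock.now = 301
after_ttl = worker_b.get("user:1", store.get)  # TTL expired, so reloaded
print(fresh, "|", stale, "|", after_ttl)
```

Worker B keeps serving the old value until its entry expires, so the L1 TTL is the upper bound on how stale a read can be under this strategy.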
Event-Based Invalidation (Advanced)
import json
import redis

def update_user_profile(user_id, updates):
    """Update profile and publish invalidation event."""
    # Update MongoDB
    db.users.update_one({'_id': user_id}, {'$set': updates})
    # Clear the shared Redis entry once, on the writer
    r.delete(f'user:{user_id}')
    # Publish cache invalidation event so every worker clears its L1 copy.
    # Reuse the shared client `r` instead of opening a connection per call.
    r.publish('cache:invalidate', json.dumps({
        'type': 'user_profile',
        'user_id': str(user_id)
    }))

# In a separate listener (background thread in every worker process):
def cache_invalidation_listener():
    """Listen for invalidation events and clear this process's L1 cache."""
    sub = redis.Redis(host='redis', port=6379, decode_responses=True)
    pubsub = sub.pubsub()
    pubsub.subscribe('cache:invalidate')
    for message in pubsub.listen():
        if message['type'] == 'message':
            event = json.loads(message['data'])
            if event['type'] == 'user_profile':
                # Pub/sub delivers the event to every subscribed worker,
                # so each one only needs to clear its own memory cache.
                local_cache.delete(f"user:{event['user_id']}")
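Event-based invalidation is precise but complex. Redis pub/sub is fire-and-forget, so a worker that is disconnected when an event is published never receives it; keep a TTL on L1 entries as a backstop. For most applications, TTL-based invalidation (with slightly stale reads) is sufficient.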
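Keeping the message handling separate from the listening loop makes it testable without a Redis server. The sketch below assumes the message dict shape that redis-py yields from pubsub.listen() and the event payload used in the example above; handle_invalidation is a hypothetical helper, and a plain dict stands in for the L1 cache.

```python
import json


def handle_invalidation(message, cache):
    """Apply one pub/sub message to a local cache; return the key removed, if any.

    message: dict in the shape yielded by pubsub.listen().
    cache:   any mapping-like object with pop(), such as a dict.
    """
    if message.get("type") != "message":
        return None  # Skip subscribe/unsubscribe confirmations
    try:
        event = json.loads(message["data"])
    except (KeyError, TypeError, ValueError):
        return None  # Ignore malformed payloads instead of crashing the listener
    if not isinstance(event, dict) or event.get("type") != "user_profile":
        return None
    if "user_id" not in event:
        return None
    key = f"user:{event['user_id']}"
    cache.pop(key, None)
    return key


# Usage with a dict standing in for the process-local cache
l1 = {"user:42": {"name": "Ada"}}
msg = {
    "type": "message",
    "pattern": None,
    "channel": "cache:invalidate",
    "data": json.dumps({"type": "user_profile", "user_id": 42}),
}
removed = handle_invalidation(msg, l1)
print(removed, l1)
```

Ignoring malformed or unknown events keeps one bad message from stopping the listener thread, at the cost of silently skipping it; logging those cases would be a reasonable addition.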
Pattern: Cache-Aside (Lazy Loading)
Check caches before querying the database.
def get_user(user_id, local_cache):
    """Get user with cache-aside pattern."""
    cache_key = f'user:{user_id}'
    # L1
    user = local_cache.get(cache_key)
    if user is not None:
        return user
    # L2
    user_json = r.get(cache_key)
    if user_json is not None:
        user = json.loads(user_json)
        local_cache.set(cache_key, user)
        return user
    # L3
    user = db.users.find_one({'_id': user_id})
    if user:
        # Refill caches: long TTL in Redis, short TTL in process memory
        r.setex(cache_key, 3600, json.dumps(user, default=str))
        local_cache.set(cache_key, user, ttl=300)
    return user
Cache-aside is simple and does not require coordination. Application code is responsible for populating caches.
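The same lookup-and-refill logic can be written once for any number of tiers. The sketch below is a generic version under simplifying assumptions: each tier is a dict-like object (plain dicts here, standing in for the memory cache and Redis), TTLs are omitted, and get_through_tiers is a hypothetical helper name.

```python
_MISSING = object()  # Sentinel so cached falsy values (0, "", []) still count as hits


def get_through_tiers(key, tiers, load):
    """Check each tier in order; on a hit, refill the faster tiers above it.

    tiers: list of dict-like caches, fastest first.
    load:  function called with the key when every tier misses.
    Returns (value, source), where source is the tier index or 'origin'.
    """
    for index, tier in enumerate(tiers):
        value = tier.get(key, _MISSING)
        if value is not _MISSING:
            for faster in tiers[:index]:
                faster[key] = value
            return value, index
    value = load(key)
    if value is not None:  # Do not cache "not found"
        for tier in tiers:
            tier[key] = value
    return value, "origin"


# Usage: two cache tiers in front of a dict that stands in for the database
l1, l2 = {}, {}
database = {"user:7": {"name": "Grace"}}

first = get_through_tiers("user:7", [l1, l2], database.get)   # All tiers miss
second = get_through_tiers("user:7", [l1, l2], database.get)  # Served by tier 0
del l1["user:7"]                                              # Simulate L1 eviction
third = get_through_tiers("user:7", [l1, l2], database.get)   # Served by tier 1
print(first[1], second[1], third[1])
```

Using a sentinel rather than a truthiness check avoids treating legitimately empty values as misses, which would otherwise send every request for them down to the next tier.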
Pattern: Write-Through (Synchronous Update)
Update all cache tiers synchronously.
from datetime import datetime, timezone

def create_post(user_id, content):
    """Create post and update caches synchronously."""
    # Write to MongoDB (L3); store timestamps as timezone-aware UTC
    result = db.posts.insert_one({
        'user_id': user_id,
        'content': content,
        'created_at': datetime.now(timezone.utc)
    })
    post_id = result.inserted_id
    # Update the cached feed in Redis (L2), but only if one exists.
    # An absent entry is rebuilt in full from MongoDB on the next read.
    feed_key = f'feed:user:{user_id}'
    cached = r.get(feed_key)
    if cached is not None:
        feed = json.loads(cached)
        feed.insert(0, {'_id': str(post_id), 'content': content})
        feed = feed[:20]  # Keep the 20 most recent posts
        r.setex(feed_key, 600, json.dumps(feed, default=str))
    # Invalidate memory cache (L1) in this process
    local_cache.delete(feed_key)
    return post_id
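Write-through keeps the shared cache in step with the database but makes writes slower, because the request waits for every tier. Note that the read-modify-write on the Redis feed is not atomic, so two concurrent posts can overwrite each other's update, and L1 caches in other processes still depend on TTL expiry.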
Monitoring Cache Performance
Track hit rates and latencies to optimize cache configuration.
from time import sleep

class CacheStats:
    def __init__(self):
        self.l1_hits = 0
        self.l1_misses = 0
        self.l2_hits = 0
        self.l2_misses = 0
        self.l3_hits = 0

    def hit_rate(self):
        # Every request checks L1 exactly once, so L1 lookups equal total
        # requests. Adding L2 lookups as well would count L1 misses twice.
        total = self.l1_hits + self.l1_misses
        hits = self.l1_hits + self.l2_hits
        return hits / total if total > 0 else 0

    def report(self):
        print(f"L1 hits: {self.l1_hits}, misses: {self.l1_misses}")
        print(f"L2 hits: {self.l2_hits}, misses: {self.l2_misses}")
        print(f"L3 hits: {self.l3_hits}")
        print(f"Hit rate: {self.hit_rate():.2%}")

stats = CacheStats()

# In get_user():
def get_user(user_id, local_cache):
    user = local_cache.get(f'user:{user_id}')
    if user is not None:
        stats.l1_hits += 1
        return user
    stats.l1_misses += 1
    user_json = r.get(f'user:{user_id}')
    if user_json is not None:
        stats.l2_hits += 1
        user = json.loads(user_json)
        local_cache.set(f'user:{user_id}', user)
        return user
    stats.l2_misses += 1
    user = db.users.find_one({'_id': user_id})
    if user:
        stats.l3_hits += 1
        # Cache refill omitted here; see the cache-aside pattern above
    return user

# Log stats periodically (run in a background thread)
def log_stats():
    while True:
        sleep(60)
        stats.report()
Aim for hit rates above 90%. Below 80% indicates undersized caches or misaligned TTLs.
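When several threads serve requests, plain integer counters can lose increments. A minimal thread-safe variant is sketched below; it records the tier label that served each request (the 'L1', 'L2', 'L3' strings returned by get_cached_feed earlier). TierStats is a hypothetical class name, and the sample counts are made up for illustration.

```python
import threading
from collections import Counter


class TierStats:
    """Thread-safe counters keyed by the tier that served the request."""

    def __init__(self):
        self._lock = threading.Lock()
        self._served = Counter()

    def record(self, source):
        """source is 'L1', 'L2', or 'L3' (the tier that returned the value)."""
        with self._lock:
            self._served[source] += 1

    def hit_rate(self):
        """Fraction of requests answered by L1 or L2."""
        with self._lock:
            total = sum(self._served.values())
            cached = self._served["L1"] + self._served["L2"]
        return cached / total if total else 0.0


# Illustrative sample: 90 requests served by L1, 5 by L2, 5 by MongoDB
tier_stats = TierStats()
for source, count in (("L1", 90), ("L2", 5), ("L3", 5)):
    for _ in range(count):
        tier_stats.record(source)
print(f"Hit rate: {tier_stats.hit_rate():.2%}")
```

Counting by serving tier means each request is recorded exactly once, so the total is simply the sum of the counters.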
Key Takeaways
- Multi-tier caching reduces latency: L1 memory (microseconds) -> L2 Redis (milliseconds) -> L3 MongoDB (tens of milliseconds)
- Use cache-aside (lazy loading) for simplicity; write-through when readers must see writes without waiting for a TTL
- Invalidate caches on writes: TTL-based (simple) or event-based (precise)
- Monitor cache hit rates; aim for above 90% to minimize database load
- L1 (process memory) is fastest but process-local; L2 (Redis) is slower but shared across workers
- For distributed systems, use Redis for global cache consistency; L1 for local, short-lived acceleration
Frequently Asked Questions
How large should each cache tier be?
Typical ratios, as rough starting points: L1 (process) = 1–10% of the working set, L2 (Redis) = 10–50%, L3 (database) = 100%. For 10 million user profiles at roughly 1 KB each, cache the top 1% in L1 (100,000 users ≈ 100 MB), the top 5% in L2 (500,000 users ≈ 500 MB), and leave the rest in MongoDB.
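The arithmetic behind those figures is easy to rerun for your own data. The sketch below assumes an average serialized profile of about 1 KB (the size implied by the numbers above) and counts 1 MB as 10^6 bytes; tier_size_mb is a hypothetical helper, and real memory use will be higher once Python object and Redis overhead are included.

```python
def tier_size_mb(total_items, fraction, avg_item_bytes):
    """Approximate memory needed to hold a fraction of the items."""
    items = round(total_items * fraction)
    return items, items * avg_item_bytes / 1_000_000


# 10 million profiles at an assumed ~1 KB each
l1_items, l1_mb = tier_size_mb(10_000_000, 0.01, 1_000)
l2_items, l2_mb = tier_size_mb(10_000_000, 0.05, 1_000)
print(f"L1: {l1_items:,} items, about {l1_mb:.0f} MB")
print(f"L2: {l2_items:,} items, about {l2_mb:.0f} MB")
```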
What TTL should I use for each tier?
L1: 5–15 minutes (short, accept staleness), L2: 30 minutes to 24 hours (depends on data update frequency), L3: N/A (persistent). Balance between freshness and hit rate.
How do I handle cache stampede?
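A cache stampede happens when many requests miss the same key at once and all query MongoDB. There are two common mitigations. With probabilistic early expiration, a request that finds an entry close to expiry refreshes it ahead of time, so refreshes are spread out instead of clustering at the expiry moment. With cache locks, the first requester recomputes the value while the others wait for it.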
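Both ideas can be sketched in a few lines. The example below is an in-process illustration only: get_or_compute uses one threading lock per key so a single thread recomputes a missing value, and should_refresh_early implements one common form of probabilistic early expiration. Both function names and the beta parameter are assumptions for this sketch, a plain dict stands in for the cache, and coordinating across processes would need a shared lock (for example in Redis), which is not shown.

```python
import math
import random
import threading
import time
from collections import defaultdict

_key_locks = defaultdict(threading.Lock)  # One lock per key (grows with key count)
_registry_lock = threading.Lock()


def get_or_compute(cache, key, compute):
    """Cache lock: only one thread per key runs compute(); None means 'missing'."""
    value = cache.get(key)
    if value is not None:
        return value
    with _registry_lock:  # Protect the lock registry itself
        key_lock = _key_locks[key]
    with key_lock:
        value = cache.get(key)  # Another thread may have filled it meanwhile
        if value is None:
            value = compute(key)
            cache[key] = value
        return value


def should_refresh_early(now, expiry, compute_seconds, beta=1.0, rand=random.random):
    """Probabilistic early expiration: more likely to return True near expiry."""
    u = max(rand(), 1e-12)  # Avoid log(0)
    head_start = compute_seconds * beta * -math.log(u)  # Non-negative random offset
    return now + head_start >= expiry


# Demo: eight threads miss the same key at once
calls = []

def slow_load(key):
    calls.append(key)
    time.sleep(0.05)  # Simulate a slow MongoDB query
    return {"id": key}

shared = {}
threads = [
    threading.Thread(target=get_or_compute, args=(shared, "user:1", slow_load))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"slow_load ran {len(calls)} time(s) for {len(threads)} requests")
```

The second cache check inside the lock is what prevents the waiting threads from recomputing the value once the first one has stored it.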
Can I use Memcached instead of Redis for L2?
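Yes. Memcached is simpler and supports per-item expiry, but it lacks pub/sub, persistence, and Redis's richer data types. Use Memcached for pure caching, and Redis when you also need other features (leaderboards, sessions, pub/sub). Note that the event-based invalidation shown earlier depends on pub/sub.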
Should I cache everything?
No. Cache only hot data (frequently accessed) and expensive operations (database queries, computations). Caching rarely-accessed data wastes memory. Use monitoring to identify what to cache.