Multi-Tier Caching: MongoDB, Redis, and Python

Multi-tier caching is a layered approach in which the hottest data lives closest to the computation. An in-process Python cache (microseconds or less) is fastest but limited by process RAM. Redis (0.5–1 millisecond) is reachable over the network and outlives application restarts. MongoDB (10–50 milliseconds) is the source of truth. Reads cascade down: check the memory cache first, then Redis, then MongoDB. On a miss, the value found in a lower tier refills the caches above it on the way back.

On a recommendation engine that served 1 million requests per hour, multi-tier caching reduced average latency from 500 ms to 80 ms by keeping the hottest 1% of data in process memory, 5% in Redis, and the rest in MongoDB. This guide shows how to architect multi-tier caches that scale.

Three-Tier Caching Architecture

User Request
    ↓
┌─────────────────────────┐
│ L1: Python Memory Cache │ (0.1 µs, process-local, 100 MB)
│ (LRU dict with TTL)     │
└─────────────────────────┘
    Cache Miss ↓
┌─────────────────────────┐
│ L2: Redis Cluster       │ (0.5 ms, network, 64 GB)
│ (Shared, distributed)   │
└─────────────────────────┘
    Cache Miss ↓
┌─────────────────────────┐
│ L3: MongoDB             │ (10 ms, disk, 1 TB)
│ (Authoritative store)   │
└─────────────────────────┘

Each tier trades speed for capacity. L1 (process memory) is tiny but fast; L3 (MongoDB) is huge but slow.
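The cascade in the diagram can be sketched without any external services. This is a minimal illustration, not the production design: `DictTier` and `tiered_get` are hypothetical names, and plain dicts stand in for the memory cache, Redis, and MongoDB.

```python
from typing import Any, Callable, Optional


class DictTier:
    """Hypothetical stand-in for one cache tier, backed by a plain dict."""

    def __init__(self, name: str):
        self.name = name
        self.data: dict = {}

    def get(self, key: str) -> Optional[Any]:
        return self.data.get(key)

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value


def tiered_get(key: str, tiers: list, load_from_source: Callable[[str], Any]):
    """Check each tier in order; on a hit, refill the faster tiers above it."""
    for depth, tier in enumerate(tiers):
        value = tier.get(key)
        if value is not None:
            for faster in tiers[:depth]:
                faster.set(key, value)
            return value, tier.name

    # Every cache missed: read the source of truth and refill all tiers.
    value = load_from_source(key)
    if value is not None:
        for tier in tiers:
            tier.set(key, value)
    return value, "source"


l1, l2 = DictTier("L1"), DictTier("L2")
source = {"user:1": {"name": "Ada"}}  # stands in for MongoDB

first = tiered_get("user:1", [l1, l2], source.get)   # served by the source
second = tiered_get("user:1", [l1, l2], source.get)  # now served by L1
```

The second call is answered by L1 because the first call refilled both tiers on its way back.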

L1: In-Process Python Cache

Use Python's functools.lru_cache for simple caching, or implement a custom LRU with TTL for more control.

import threading
from collections import OrderedDict
from time import time

class TTLCache:
    """Simple in-process LRU cache with TTL."""

    def __init__(self, max_size=1000, default_ttl=300):
        self.cache = OrderedDict()  # Least recently used entry comes first
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.lock = threading.Lock()

    def get(self, key):
        """Get a value if present and not expired."""
        with self.lock:
            if key not in self.cache:
                return None

            value, expiry = self.cache[key]
            if time() > expiry:
                del self.cache[key]
                return None

            self.cache.move_to_end(key)  # Mark as most recently used
            return value

    def set(self, key, value, ttl=None):
        """Set a value with an optional TTL in seconds."""
        with self.lock:
            if key in self.cache:
                del self.cache[key]  # Overwrite without evicting another entry
            elif len(self.cache) >= self.max_size:
                self.cache.popitem(last=False)  # Evict least recently used

            ttl = self.default_ttl if ttl is None else ttl
            self.cache[key] = (value, time() + ttl)

    def delete(self, key):
        """Remove a key if present (used for invalidation)."""
        with self.lock:
            self.cache.pop(key, None)

# Usage (r is the Redis client and db the MongoDB database set up in the
# L2 and L3 sections; json is the standard-library module)
local_cache = TTLCache(max_size=10000, default_ttl=300)

def get_user_profile(user_id):
    """Get profile with three-tier cache."""
    key = f'user:{user_id}'

    # L1: Check process memory
    cached = local_cache.get(key)
    if cached is not None:
        print("Hit L1 (memory)")
        return cached

    # L2: Check Redis
    cached = r.get(key)
    if cached is not None:
        profile = json.loads(cached)
        local_cache.set(key, profile)  # Refill L1
        print("Hit L2 (Redis)")
        return profile

    # L3: Query MongoDB (slow)
    profile = db.users.find_one({'_id': user_id})
    if profile is None:
        return None

    # Refill caches (L2, then L1)
    r.setex(key, 3600, json.dumps(profile, default=str))
    local_cache.set(key, profile)

    print("Hit L3 (MongoDB)")
    return profile

In-process caches are the fastest tier but are not shared across processes. Use them for read-heavy, non-critical data. For critical data such as user sessions, use Redis so that every process sees the same value.
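For the simpler `functools.lru_cache` option mentioned at the start of this section, a minimal sketch follows. The function `load_country_name` and its lookup table are hypothetical; a real version might query MongoDB instead.

```python
from functools import lru_cache

call_count = 0  # counts how often the slow path actually runs


@lru_cache(maxsize=1024)
def load_country_name(country_code: str) -> str:
    """Hypothetical expensive lookup, cached per argument value."""
    global call_count
    call_count += 1
    return {"DE": "Germany", "JP": "Japan"}.get(country_code, "Unknown")


load_country_name("DE")  # miss: runs the function body
load_country_name("DE")  # hit: served from the cache
info = load_country_name.cache_info()  # hits, misses, maxsize, currsize
```

Note that `lru_cache` evicts by recency only and has no TTL, so entries stay until they are evicted or `load_country_name.cache_clear()` is called. That makes it a better fit for data that rarely changes.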

L2: Redis Cache Layer

Redis is shared across all Python workers (threads, processes, app servers). Use it for distributed caching with TTLs and atomic operations.

import redis
import json
from datetime import datetime, timedelta

r = redis.Redis(host='redis', port=6379, decode_responses=True)

def cache_feed_in_redis(user_id, feed_data, ttl=600):
    """Cache user's feed in Redis with 10-minute TTL."""
    key = f'feed:user:{user_id}'
    r.setex(key, ttl, json.dumps(feed_data, default=str))

def get_cached_feed(user_id, local_cache):
    """Get feed from L1 (memory), then L2 (Redis), then L3 (MongoDB)."""
    # Use the same key that cache_feed_in_redis writes, or L2 never hits
    cache_key = f'feed:user:{user_id}'

    # L1: Memory
    cached = local_cache.get(cache_key)
    if cached is not None:  # An empty feed is still a valid cached value
        return cached, 'L1'

    # L2: Redis
    cached = r.get(cache_key)
    if cached is not None:
        feed = json.loads(cached)
        local_cache.set(cache_key, feed, ttl=300)  # Refill memory cache
        return feed, 'L2'
    
    # L3: MongoDB (expensive)
    feed = fetch_feed_from_mongodb(user_id)
    
    # Refill caches
    cache_feed_in_redis(user_id, feed, ttl=600)
    local_cache.set(cache_key, feed, ttl=300)
    
    return feed, 'L3'

# Usage
feed, source = get_cached_feed(user_id=123, local_cache=local_cache)
print(f"Feed from {source}")

Redis is the distributed cache, shared by all app workers. Use it for data that must be consistent across instances.
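One consequence of serializing with `json.dumps(..., default=str)` is that a value read back from Redis is not identical to the document MongoDB returned: dates (and ObjectIds) come back as strings. The sketch below shows the effect with a hand-built document; `restore_types` is a hypothetical helper, not part of any library.

```python
import json
from datetime import datetime

post = {"_id": 42, "content": "hello", "created_at": datetime(2024, 1, 15, 9, 30)}

# What gets written to Redis: default=str turns the datetime into a string.
encoded = json.dumps(post, default=str)

# What an L2 hit returns: created_at is now a plain string.
decoded = json.loads(encoded)


def restore_types(doc: dict) -> dict:
    """Hypothetical helper: parse created_at back into a datetime."""
    restored = dict(doc)
    restored["created_at"] = datetime.fromisoformat(doc["created_at"])
    return restored


round_tripped = restore_types(decoded)
```

If callers depend on the field types, either restore them after an L2 hit as above or store the decoded form in L1 as well, so every tier returns the same shape.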

L3: MongoDB as Source of Truth

MongoDB is the persistent store. Cache misses in L1 and L2 query MongoDB.

from pymongo import MongoClient
from datetime import datetime

client = MongoClient('mongodb://mongo:27017/')
db = client['app']

def fetch_feed_from_mongodb(user_id):
    """Fetch user's feed from MongoDB."""
    user = db.users.find_one({'_id': user_id})
    
    if not user:
        return []
    
    # Get posts from users this user follows
    following_ids = user.get('following', [])
    
    feed = list(db.posts.find(
        {'user_id': {'$in': following_ids}},
        sort=[('created_at', -1)],
        limit=20
    ))
    
    return feed

MongoDB is queried only on cache misses. With well-sized caches, combined hit rates can exceed 95%, so MongoDB handles only a small percentage of requests.
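To see why the hit rate matters so much, the average latency can be estimated from the per-tier figures in the architecture diagram. The 70% / 25% split between L1 and L2 below is an illustrative assumption, not a measurement.

```python
# Per-tier lookup latencies in milliseconds, taken from the architecture diagram.
L1_MS, L2_MS, L3_MS = 0.0001, 0.5, 10.0


def expected_latency_ms(l1_share: float, l2_share: float) -> float:
    """Average latency when l1_share of requests hit L1 and l2_share hit L2.

    The remainder falls through to MongoDB. A request served by a lower tier
    also pays for the failed lookups in the tiers above it.
    """
    l3_share = 1.0 - l1_share - l2_share
    return (
        l1_share * L1_MS
        + l2_share * (L1_MS + L2_MS)
        + l3_share * (L1_MS + L2_MS + L3_MS)
    )


with_cache = expected_latency_ms(l1_share=0.70, l2_share=0.25)  # 95% hit rate
without_cache = L3_MS  # every request goes to MongoDB
```

Under these assumptions the average drops from 10 ms to about 0.65 ms, and most of what remains comes from the 5% of requests that still reach MongoDB.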

Cache Invalidation: The Hard Part

Invalidate caches when data changes. Strategies:

TTL-Based Invalidation (Simple)

def update_user_profile(user_id, updates):
    """Update profile in MongoDB and invalidate caches."""
    
    # Update MongoDB
    db.users.update_one({'_id': user_id}, {'$set': updates})
    
    # Invalidate L2 (Redis)
    r.delete(f'user:{user_id}')
    
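    # Invalidate L1 (process memory): this clears only the current process;
    # other processes keep their copies until the TTL expires
    local_cache.delete(f'user:{user_id}')

For L1 caches across multiple processes, rely on TTL expiration. Set the L1 TTL to 5 minutes and accept that a process may serve a stale value for up to that long.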
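The staleness window can be made concrete with two simulated workers. This is a sketch under stated assumptions: `ClockedTTLCache` is a hypothetical minimal cache with an injectable clock, used here so that time can be advanced without sleeping.

```python
class ClockedTTLCache:
    """Hypothetical minimal TTL cache whose clock can be injected for testing."""

    def __init__(self, ttl, clock):
        self.ttl = ttl
        self.clock = clock
        self.entries = {}

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[key]
            return None
        return value

    def set(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        self.entries.pop(key, None)


now = [0.0]  # fake clock, in seconds


def clock():
    return now[0]


worker_a = ClockedTTLCache(ttl=300, clock=clock)
worker_b = ClockedTTLCache(ttl=300, clock=clock)
worker_a.set("user:1", {"name": "old"})
worker_b.set("user:1", {"name": "old"})

# Worker A handles the write and clears only its own L1 entry.
worker_a.delete("user:1")

now[0] = 120.0
stale_read = worker_b.get("user:1")    # still the old value

now[0] = 300.0
expired_read = worker_b.get("user:1")  # None: TTL elapsed, next read refills
```

Worker B keeps serving the old profile until its entry expires, which is the trade-off a TTL-only strategy accepts.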

Event-Based Invalidation (Advanced)

import json
import redis

def update_user_profile(user_id, updates):
    """Update profile and publish invalidation event."""

    # Update MongoDB
    db.users.update_one({'_id': user_id}, {'$set': updates})

    # Clear the shared Redis entry once, at write time
    r.delete(f'user:{user_id}')

    # Publish cache invalidation event so every worker drops its L1 copy
    r.publish('cache:invalidate', json.dumps({
        'type': 'user_profile',
        'user_id': user_id
    }))

# Run in a background thread inside every worker process:
def cache_invalidation_listener():
    """Listen for invalidation events and clear this process's L1 cache."""
    sub = redis.Redis(host='redis', port=6379, decode_responses=True)
    pubsub = sub.pubsub()
    pubsub.subscribe('cache:invalidate')

    for message in pubsub.listen():
        if message['type'] != 'message':
            continue  # Skip subscribe confirmations

        event = json.loads(message['data'])
        if event['type'] == 'user_profile':
            # Requires a delete() method on the L1 cache
            local_cache.delete(f"user:{event['user_id']}")

Event-based invalidation is precise but complex. For most applications, TTL-based invalidation (with slightly stale reads) is sufficient.
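The fan-out behavior is easier to see without a Redis server. In the sketch below, `InvalidationBus` is a hypothetical in-process stand-in for a pub/sub channel, and three plain dicts stand in for the L1 caches of three workers.

```python
class InvalidationBus:
    """Hypothetical in-process stand-in for a Redis pub/sub channel."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, key):
        """Deliver the key to every subscriber; return how many received it."""
        for callback in self.subscribers:
            callback(key)
        return len(self.subscribers)


bus = InvalidationBus()
worker_caches = [{"user:1": "old"}, {"user:1": "old"}, {"user:2": "other"}]

# Each worker registers a handler that drops the key from its own cache.
for cache in worker_caches:
    bus.subscribe(lambda key, cache=cache: cache.pop(key, None))

receivers = bus.publish("user:1")
```

Unlike this stand-in, Redis pub/sub does not queue messages for subscribers that are disconnected when an event is published, so keeping a TTL on L1 entries as a backstop is a reasonable precaution.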

Pattern: Cache-Aside (Lazy Loading)

Check caches before querying the database.

def get_user(user_id, local_cache):
    """Get user with cache-aside pattern."""
    
    # L1
    user = local_cache.get(f'user:{user_id}')
    if user:
        return user
    
    # L2
    user_json = r.get(f'user:{user_id}')
    if user_json:
        user = json.loads(user_json)
        local_cache.set(f'user:{user_id}', user)
        return user
    
    # L3
    user = db.users.find_one({'_id': user_id})
    
    if user:
        # Refill caches
        r.setex(f'user:{user_id}', 3600, json.dumps(user, default=str))
        local_cache.set(f'user:{user_id}', user, ttl=300)  # Short L1 TTL limits staleness
    
    return user

Cache-aside is simple and does not require coordination. Application code is responsible for populating caches.

Pattern: Write-Through (Synchronous Update)

Update all cache tiers synchronously.

def create_post(user_id, content):
    """Create post and update all caches synchronously."""
    
    # Write to MongoDB (L3)
    result = db.posts.insert_one({
        'user_id': user_id,
        'content': content,
        'created_at': datetime.now()
    })
    post_id = result.inserted_id
    
    # Update the Redis feed cache (L2) only if an entry already exists;
    # otherwise a one-item list would be cached as if it were the full feed
    feed_key = f'feed:user:{user_id}'
    cached_feed = r.get(feed_key)
    if cached_feed is not None:
        feed = json.loads(cached_feed)
        feed.insert(0, {'_id': str(post_id), 'content': content})
        r.setex(feed_key, 600, json.dumps(feed[:20]))  # Keep the 20 newest posts

    # Invalidate memory cache (L1)
    local_cache.delete(feed_key)
    
    return post_id

Write-through keeps the caches consistent with the database, but each write is slower because it waits for every tier to be updated.

Monitoring Cache Performance

Track hit rates and latencies to optimize cache configuration.

import time

class CacheStats:
    def __init__(self):
        self.l1_hits = 0
        self.l1_misses = 0
        self.l2_hits = 0
        self.l2_misses = 0
        self.l3_hits = 0

    def hit_rate(self):
        # Every request checks L1 once, so L1 hits + misses is the request
        # count. L2 lookups are a subset of L1 misses and must not be added.
        total = self.l1_hits + self.l1_misses
        hits = self.l1_hits + self.l2_hits
        return hits / total if total > 0 else 0

    def report(self):
        print(f"L1 hits: {self.l1_hits}, misses: {self.l1_misses}")
        print(f"L2 hits: {self.l2_hits}, misses: {self.l2_misses}")
        print(f"L3 hits: {self.l3_hits}")
        print(f"Hit rate: {self.hit_rate():.2%}")

stats = CacheStats()

# In get_user():
def get_user(user_id, local_cache):
    user = local_cache.get(f'user:{user_id}')
    if user:
        stats.l1_hits += 1
        return user

    stats.l1_misses += 1

    user_json = r.get(f'user:{user_id}')
    if user_json:
        stats.l2_hits += 1
        user = json.loads(user_json)
        local_cache.set(f'user:{user_id}', user)
        return user

    stats.l2_misses += 1
    user = db.users.find_one({'_id': user_id})
    if user:
        stats.l3_hits += 1
        # Refill caches, as in the cache-aside pattern
        r.setex(f'user:{user_id}', 3600, json.dumps(user, default=str))
        local_cache.set(f'user:{user_id}', user)

    return user

# Log stats periodically (run in a background thread)
def log_stats():
    while True:
        time.sleep(60)
        stats.report()

Aim for hit rates above 90%. Below 80% indicates undersized caches or misaligned TTLs.
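A single combined number can hide which tier is underperforming, so it may help to compute a rate per tier as well. The function below is a hypothetical helper that takes raw counters; the sample counts are made up for illustration.

```python
def tier_hit_rates(l1_hits: int, l1_misses: int, l2_hits: int, l2_misses: int) -> dict:
    """Return the hit rate of each tier plus the combined cache hit rate.

    Every request checks L1, so L1 lookups equal the request count.
    Only L1 misses reach L2, so the L2 rate is measured against those.
    """
    l1_lookups = l1_hits + l1_misses
    l2_lookups = l2_hits + l2_misses
    return {
        "l1": l1_hits / l1_lookups if l1_lookups else 0.0,
        "l2": l2_hits / l2_lookups if l2_lookups else 0.0,
        "overall": (l1_hits + l2_hits) / l1_lookups if l1_lookups else 0.0,
    }


rates = tier_hit_rates(l1_hits=700, l1_misses=300, l2_hits=250, l2_misses=50)
```

With these sample counts, L1 answers 70% of requests, L2 answers about 83% of what reaches it, and only 5% of requests go to the database. A low L1 rate next to a high L2 rate would suggest the memory cache is too small or its TTL too short.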

Key Takeaways

Multi-tier caching reduces latency: L1 memory (microseconds) -> L2 Redis (milliseconds) -> L3 MongoDB (tens of milliseconds)
Use cache-aside (lazy loading) for simplicity; write-through for strong consistency
Invalidate caches on writes: TTL-based (simple) or event-based (precise)
Monitor cache hit rates; aim for above 90% to minimize database load
L1 (process memory) is fastest but process-local; L2 (Redis) is slower but shared across workers
For distributed systems, use Redis for global cache consistency; L1 for local, short-lived acceleration

Frequently Asked Questions

How large should each cache tier be?

Typical ratios: L1 (process) = 1–10% of the working set, L2 (Redis) = 10–50%, L3 (database) = 100%. For 10 million user profiles at roughly 1 KB each, cache the top 1% in L1 (100,000 users = 100 MB), the top 5% in L2 (500,000 users = 500 MB), and leave the rest in MongoDB.
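The arithmetic behind those figures can be reproduced for other data sets. The helper below is a hypothetical sketch; the 1,000-byte item size is the per-profile size implied by the numbers above, not a measured value.

```python
def tier_plan(total_items: int, item_bytes: int, l1_fraction: float, l2_fraction: float) -> dict:
    """Return (item count, size in MB) for the L1 and L2 tiers."""

    def size(fraction: float):
        count = round(total_items * fraction)
        return count, count * item_bytes / 1_000_000

    return {"l1": size(l1_fraction), "l2": size(l2_fraction)}


plan = tier_plan(10_000_000, item_bytes=1_000, l1_fraction=0.01, l2_fraction=0.05)
```

Treat the result as a lower bound for L1: Python objects held in process memory may occupy noticeably more space than their serialized size.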

What TTL should I use for each tier?

L1: 5–15 minutes (short, accept staleness), L2: 30 minutes to 24 hours (depends on data update frequency), L3: N/A (persistent). Balance between freshness and hit rate.

How do I handle cache stampede?

A cache stampede occurs when many requests miss the same key at the same time and all of them query MongoDB. There are two common defenses. With probabilistic early expiration, a request that sees an entry nearing its expiry refreshes it before it actually expires. With cache locks, the first requester recomputes the value while the others wait for it.
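The cache-lock approach can be sketched inside a single process with one lock per key. `SingleFlight` and `slow_loader` are hypothetical names, and the loader stands in for a MongoDB query.

```python
import threading


class SingleFlight:
    """Let one caller compute a missing key while concurrent callers wait."""

    def __init__(self):
        self._cache = {}
        self._key_locks = {}
        self._guard = threading.Lock()

    def get(self, key, loader):
        value = self._cache.get(key)
        if value is not None:
            return value

        # One lock per key, created under a guard so only one ever exists.
        with self._guard:
            key_lock = self._key_locks.setdefault(key, threading.Lock())

        with key_lock:
            # Re-check: another thread may have filled the cache while we waited.
            value = self._cache.get(key)
            if value is None:
                value = loader(key)
                self._cache[key] = value
        return value


load_calls = []


def slow_loader(key):
    load_calls.append(key)  # stands in for a MongoDB query
    return {"key": key}


flight = SingleFlight()
threads = [
    threading.Thread(target=flight.get, args=("feed:1", slow_loader))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All eight threads ask for the same key, but the loader runs once. This sketch never removes per-key locks and only coordinates threads in one process; coordinating across processes needs a shared lock, for which a Redis `SET` with the `NX` and `EX` options is a common choice.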

Can I use Memcached instead of Redis for L2?

Yes. Memcached is simpler but lacks pub/sub, persistence, and Redis's richer data structures. Use Memcached for pure caching, and Redis when you also need other features (leaderboards, sessions, pub/sub).

Should I cache everything?

No. Cache only hot data (frequently accessed) and expensive operations (database queries, computations). Caching rarely-accessed data wastes memory. Use monitoring to identify what to cache.
