Response Caching Strategies: REST API Performance
Response caching stores the result of an API call so that subsequent identical requests are served from the cache instead of re-executing the business logic. A well-designed cache can cut response time from 200 ms (database query) to 2 ms (memory lookup), a 100× improvement on cache hits. This article covers in-memory caching, distributed caching with Redis, and four invalidation strategies, and shows how to balance hit rate against the cost of stale data.
I spent six months optimizing a user profile endpoint that was bottlenecking our sign-in flow. The endpoint made three database queries. After adding a local in-memory cache with a 5-minute TTL (functools.lru_cache has no expiry of its own, so the TTL has to be layered on top), the 99th-percentile latency dropped from 180 ms to 8 ms, and database load fell by 70%. Caching is one of the highest-ROI optimizations you can make.
Why Caching Matters: The Numbers
Consider a typical REST API endpoint that fetches user data:
# Flask route; `app`, `db`, `User`, and `jsonify` come from the application setup
@app.get('/users/<int:user_id>')
def get_user(user_id: int):
    # Typical costs in real systems (2026 measurements)
    # Database query: 50-150 ms
    # Network round-trip: 20-50 ms
    # JSON serialization: 5-20 ms
    # Total: ~100 ms per request
    user = db.session.query(User).filter(User.id == user_id).first()
    return jsonify(user.to_dict())
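A cache check takes 1-10 microseconds, roughly 10,000× faster. If 80% of requests hit the cache (realistic for user profiles), average latency drops from 100 ms to about 20 ms, because only the 20% of requests that miss still pay the full cost. The database benefits in the same proportion: at an 80% hit rate it sees one request in five, so a database that saturates at 1,000 queries per second can sit behind an API serving roughly 5,000 requests per second, and at a 90% hit rate roughly 10,000.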
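The arithmetic above can be checked with a short sketch. The function names `expected_latency_ms` and `max_api_rps` are illustrative, not part of any library, and the model assumes every miss costs the same and every hit is equally fast.

```python
def expected_latency_ms(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency when a fraction `hit_rate` of requests is served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


def max_api_rps(db_capacity_rps: float, hit_rate: float) -> float:
    """API request rate at which the database reaches its capacity.

    Only cache misses reach the database, so the API can serve
    db_capacity_rps / (1 - hit_rate) requests per second.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return db_capacity_rps / (1.0 - hit_rate)
```

With an 80% hit rate, a 0.01 ms hit, and a 100 ms miss, `expected_latency_ms(0.8, 0.01, 100.0)` comes out at roughly 20 ms, almost all of it from the misses. `max_api_rps(1000, 0.9)` gives roughly 10,000.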
In-Memory Caching with functools
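For simple cases where the cache fits in a single application server's RAM, an in-process cache is the fastest option. Python's functools.lru_cache and functools.cache memoize function results but have no built-in expiry, so a TTL needs a small wrapper such as this class: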
import time
from typing import Any, Optional

class TTLCache:
    """Simple in-memory cache with time-to-live (unbounded, not thread-safe)."""

    def __init__(self, ttl_seconds: int = 300):
        self.ttl_seconds = ttl_seconds
        self.cache = {}
        self.timestamps = {}

    def get(self, key: str) -> Optional[Any]:
        """Get a value if it exists and hasn't expired."""
        if key not in self.cache:
            return None
        # monotonic() is unaffected by wall-clock adjustments
        if time.monotonic() - self.timestamps[key] > self.ttl_seconds:
            del self.cache[key]
            del self.timestamps[key]
            return None
        return self.cache[key]

    def set(self, key: str, value: Any) -> None:
        """Store a value with current timestamp."""
        self.cache[key] = value
        self.timestamps[key] = time.monotonic()

# Global cache instance
_cache = TTLCache(ttl_seconds=300)

@app.get('/users/<int:user_id>')
def get_user(user_id: int):
    cache_key = f'user:{user_id}'
    # Check cache first; compare with None so falsy values still count as hits
    cached = _cache.get(cache_key)
    if cached is not None:
        return jsonify(cached)
    # Cache miss: query database
    user = db.session.query(User).filter(User.id == user_id).first()
    if not user:
        return {'error': 'Not found'}, 404
    # Cache the plain dict, not the Response object
    result = user.to_dict()
    _cache.set(cache_key, result)
    return jsonify(result)
This approach is simple and has zero external dependencies. The downside: if you have 10 application servers behind a load balancer, each has its own cache. A user update on server A won't be visible in server B's cache for up to 5 minutes. This is acceptable for read-heavy endpoints but dangerous for frequently-updated data.
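If you would rather keep functools.lru_cache and its size bound, one common trick is to pass a time bucket as an extra argument so that entries stop matching once the bucket changes. The sketch below is a minimal version of that idea; `ttl_lru_cache` and its `clock` parameter are hypothetical names introduced here, and all arguments to the decorated function must be hashable.

```python
import functools
import time


def ttl_lru_cache(ttl_seconds: float = 300, maxsize: int = 1024, clock=time.monotonic):
    """Decorator: lru_cache whose entries stop matching when the time bucket rolls over."""

    def decorator(fn):
        @functools.lru_cache(maxsize=maxsize)
        def _cached(bucket, *args, **kwargs):
            # `bucket` only participates in the cache key
            return fn(*args, **kwargs)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # All calls inside the same ttl_seconds window share one bucket
            bucket = int(clock() // ttl_seconds)
            return _cached(bucket, *args, **kwargs)

        wrapper.cache_clear = _cached.cache_clear
        return wrapper

    return decorator
```

Entries expire at bucket boundaries, so a value lives for at most `ttl_seconds` and sometimes much less. Stale buckets are not deleted eagerly; they are pushed out by the LRU policy once `maxsize` is reached. This does nothing for the cross-server inconsistency described above, since each process still has its own cache.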
Distributed Caching with Redis
Redis is a shared, in-memory data store that acts as a single source of truth for cached data across all application servers. Every request checks Redis before hitting the database:
import json

import redis

# Connect to Redis (shared across all servers)
cache = redis.Redis(host='localhost', port=6379, decode_responses=True)

def cache_get_user(user_id: int):
    """Fetch user with Redis cache."""
    cache_key = f'user:{user_id}'
    # Try cache; a missing key comes back as None
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)
    # Cache miss
    user = db.session.query(User).filter(User.id == user_id).first()
    if not user:
        return None
    # Store in cache with 5-minute expiry
    user_data = user.to_dict()
    cache.setex(cache_key, 300, json.dumps(user_data))
    return user_data

@app.get('/users/<int:user_id>')
def get_user(user_id: int):
    user = cache_get_user(user_id)
    if not user:
        return {'error': 'Not found'}, 404
    return jsonify(user)
The setex command stores a key with an expiry time—Redis automatically deletes it after 5 minutes. Now all 10 servers share the same cache. If server A updates a user, it invalidates the key:
@app.post('/users/<int:user_id>')
def update_user(user_id: int):
    data = request.json
    user = db.session.query(User).filter(User.id == user_id).first()
    if not user:
        return {'error': 'Not found'}, 404
    # Update database
    user.name = data['name']
    db.session.commit()
    # Invalidate cache only after the commit has succeeded
    cache.delete(f'user:{user_id}')
    return jsonify(user.to_dict())
Now all servers see the fresh data on their next request.
Cache Invalidation Strategies
The hardest problem in caching is invalidation—knowing when cached data is stale. Here are four strategies:
1. Time-Based (TTL)
Set an expiry time and accept that data may be stale for up to that duration. Works well for slowly-changing data (user profiles, product catalogs). Set TTL based on how often data changes and how much staleness is acceptable.
# User profile: changes rarely, 5-minute staleness is fine
cache.setex(f'user:{user_id}', 300, data)
# Product price: changes frequently, use shorter TTL
cache.setex(f'product:{product_id}', 60, data)
2. Event-Based (Active Invalidation)
When you modify data, immediately invalidate the cache. This requires discipline: every write operation must remember to delete relevant cache keys.
def update_user(user_id, **fields):
    # Update database
    user = db.session.query(User).filter(User.id == user_id).first()
    if user is None:
        raise LookupError(f'user {user_id} not found')
    for key, value in fields.items():
        setattr(user, key, value)
    db.session.commit()
    # Invalidate cache
    cache.delete(f'user:{user_id}')
    # Also invalidate related caches
    cache.delete(f'user:list:{user.organization_id}')
    cache.delete(f'stats:{user.organization_id}')
The risk: if you forget to invalidate one related key, users see stale data until the TTL expires.
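One way to reduce that risk is to list the related keys in a single place and have every write path call one helper. The sketch below assumes the three key patterns used in this article; `user_cache_keys` and `invalidate_user_caches` are hypothetical names, and `client` is any object with a redis-py style `delete(*names)` method.

```python
from typing import List


def user_cache_keys(user_id: int, organization_id: int) -> List[str]:
    """Every cache key that depends on this user's data, listed in one place."""
    return [
        f"user:{user_id}",
        f"user:list:{organization_id}",
        f"stats:{organization_id}",
    ]


def invalidate_user_caches(client, user_id: int, organization_id: int) -> int:
    """Delete all dependent keys in one call; returns how many keys existed."""
    return client.delete(*user_cache_keys(user_id, organization_id))
```

When a new cache that depends on user data is introduced, its key pattern is added to `user_cache_keys` once, rather than to every function that writes user data.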
3. Hierarchical Tags
Group cache keys by logical units and invalidate entire groups at once:
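def cache_user_profile(user_id):
    cache_key = f'user:{user_id}'
    # fetch_user is assumed to return a dict that includes 'org_id'
    data = fetch_user(user_id)
    cache.setex(cache_key, 300, json.dumps(data))
    # Record the key under each tag; one Redis set per tag, so several
    # keys can share a tag without overwriting each other
    cache.sadd(f'cache_tags:user_{user_id}', cache_key)
    cache.sadd(f'cache_tags:org_{data["org_id"]}', cache_key)
    return data

def invalidate_user(user_id):
    # Get all keys tagged with this user
    tag_key = f'cache_tags:user_{user_id}'
    keys = cache.smembers(tag_key)
    if keys:
        cache.delete(*keys)
    # Remove the tag set itself so it does not grow without bound
    cache.delete(tag_key)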
4. Versioning
Include a version number in the cache key. When data changes, increment the version:
# Cache key includes version (0 until the first update, since INCR on a
# missing key yields 1)
user_version = cache.get(f'user:{user_id}:version') or 0
cache_key = f'user:{user_id}:v{user_version}'
cached = cache.get(cache_key)

# On update, increment version
def update_user(user_id, **fields):
    user = db.session.query(User).filter(User.id == user_id).first()
    for key, value in fields.items():
        setattr(user, key, value)
    db.session.commit()
    # Increment version; entries under old versions are never read again
    # and expire naturally, provided they were written with a TTL (setex)
    cache.incr(f'user:{user_id}:version')
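The read and write sides of versioning can be put together in a minimal sketch. The helper names are hypothetical, `client` is any object exposing redis-py style `get`, `setex`, and `incr`, and `loader` stands in for the database query.

```python
import json
from typing import Any, Callable


def versioned_key(client, user_id: int) -> str:
    """Build the cache key for the user's current version (0 if never bumped)."""
    version = client.get(f"user:{user_id}:version") or 0
    if isinstance(version, bytes):  # client created without decode_responses
        version = version.decode()
    return f"user:{user_id}:v{version}"


def get_user_versioned(client, user_id: int, loader: Callable[[int], Any], ttl: int = 300) -> Any:
    """Cache-aside read that only ever looks at the current version's key."""
    key = versioned_key(client, user_id)
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)
    value = loader(user_id)
    # The TTL matters here: it is what eventually removes superseded versions
    client.setex(key, ttl, json.dumps(value))
    return value


def bump_user_version(client, user_id: int) -> int:
    """Call after a successful write; readers switch to a fresh key."""
    return client.incr(f"user:{user_id}:version")
```

The cost of this design is one extra Redis read per request to look up the version. The benefit is that invalidation is a single atomic INCR, with no need to know which keys exist.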
Comparison: When to Use Each Strategy
| Strategy | Hit Rate | Latency | Staleness Risk | Complexity | Best For |
|---|---|---|---|---|---|
| In-Memory | 70–85% | <1 ms | Per-server inconsistency | Low | Single server, read-heavy |
| Redis (TTL) | 80–95% | 1–5 ms | Stale up to TTL | Low | Most APIs, acceptable staleness |
| Redis (Event-Based) | 95%+ | 1–5 ms | Only if invalidation missed | Medium | Critical data, strict consistency |
| Versioning | 90%+ | 1–5 ms | Minimal, old versions safe | Medium | Distributed systems |
Key Takeaways
- Caching can cut latency by up to 100× on cache hits and database load by 70% or more. Start with response caching if your API isn't already optimized.
- Use Redis for multi-server deployments; in-memory caching for single servers.
- TTL-based caching is simplest; you accept brief staleness in exchange for that simplicity. Event-based caching requires discipline but gives you control over freshness.
- Always return cache-related headers (Cache-Control, ETag, Last-Modified) so clients and CDNs know how to cache.
- Monitor cache hit rate. Below 70% means your TTL is too short or your working set is too large.
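As an illustration of the header advice, here is a framework-independent sketch of ETag handling for a conditional GET. The function names are hypothetical, the ETag is simply a truncated SHA-256 of the body, and `private, max-age=300` is an assumed policy for per-user data rather than a recommendation for every endpoint.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Strong ETag derived from the response body (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def conditional_response(
    body: bytes, if_none_match: Optional[str], max_age: int = 300
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload), answering 304 when the client copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": f"private, max-age={max_age}"}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so ignore any W/ prefix
        candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""
    return 200, headers, body
```

In a Flask handler, `if_none_match` would come from `request.headers.get("If-None-Match")`, and the returned tuple maps directly onto the response status, headers, and body.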
Frequently Asked Questions
What is cache hit rate and why does it matter?
Hit rate = cached requests / total requests. High hit rate means fewer database queries. Aim for 80%+. If hit rate is low, either increase TTL (accept staleness) or increase cache size (if data fits in memory).
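For Redis, the hit rate can be derived from the `keyspace_hits` and `keyspace_misses` counters in the server's INFO output. The sketch below takes the stats as a plain mapping so it can be fed from redis-py's `client.info("stats")`; `hit_rate` is a hypothetical helper name.

```python
from typing import Mapping


def hit_rate(stats: Mapping[str, int]) -> float:
    """Fraction of key lookups that found a value, from Redis INFO stats counters."""
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    # No lookups yet: report 0.0 rather than dividing by zero
    return hits / total if total else 0.0
```

These counters are server-wide and cumulative since the last restart or stats reset, so they cover every key in the instance, not just one endpoint. To track a single endpoint, count hits and misses in the application instead.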
Should I cache database results or API responses?
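Cache API responses when you control the format and want to handle serialization once. Cache database results when the same query runs across multiple endpoints. A common split is to keep serialized responses in Redis, where values are stored as strings or bytes anyway, and query results in in-memory caches, where Python objects can be kept as they are.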
What is a cache stampede and how do I prevent it?
Cache stampede: when a key expires, hundreds of requests simultaneously miss the cache and query the database, overloading it. Prevent by using cache locking—only one request fetches fresh data while others wait:
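import time

def get_with_lock(cache_key, fetch_fn, ttl, max_wait=5.0):
    lock_key = f'{cache_key}:lock'
    deadline = time.monotonic() + max_wait
    while True:
        cached = cache.get(cache_key)
        if cached is not None:
            return cached
        # SET with NX and EX acquires the lock and sets its expiry atomically,
        # so a crash between the two steps cannot leave a lock without a timeout
        if cache.set(lock_key, 1, nx=True, ex=10):  # Only one wins
            try:
                data = fetch_fn()
                cache.setex(cache_key, ttl, data)
                return data
            finally:
                cache.delete(lock_key)
        if time.monotonic() >= deadline:
            # Waited long enough: fetch directly instead of blocking forever
            return fetch_fn()
        # Another request holds the lock: wait, then check the cache again
        time.sleep(0.1)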
How much Redis memory do I need?
Estimate: (average response size) × (expected cache entries). For 1 million users with 2 KB profiles each, you need ~2 GB Redis. Add 30% headroom. Monitor with redis-cli info memory.
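The estimate can be written as a one-line calculation. `estimate_cache_bytes` is a hypothetical helper; it ignores Redis's per-key overhead and the size of the keys themselves, so treat the result as a lower bound.

```python
def estimate_cache_bytes(entries: int, avg_value_bytes: int, headroom: float = 0.30) -> float:
    """Rough memory estimate: entries × average value size, plus fractional headroom."""
    if entries < 0 or avg_value_bytes < 0 or headroom < 0:
        raise ValueError("arguments must be non-negative")
    return entries * avg_value_bytes * (1.0 + headroom)
```

For 1 million profiles of 2,000 bytes each, `estimate_cache_bytes(1_000_000, 2_000)` comes to about 2.6 GB: the 2 GB of data plus 30% headroom.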
Can I cache POST requests?
Technically yes, but be careful. Cache POST results only if the body is identical and the response is safe to replay. Never cache if side effects matter. Most caches exclude POST by default.
Further Reading
- HTTP Caching Specification - MDN — How browsers cache and how servers control it.
- Redis Documentation - Key Expiration — TTL and expiry mechanisms.
- Cache-Aside Pattern - AWS Architecture — Design pattern for safe caching.
- Django Cache Framework — Caching support in popular Python framework.