Connection Pool Optimization: Tuning for High Concurrency
Connection pool tuning is the difference between an API that serves 100 concurrent users and one that serves 10,000. Misconfigured pools cause queueing (p99 latency spikes), resource exhaustion (out-of-memory crashes), or idle waste (unnecessary connections consuming RAM). This guide teaches the math behind pool sizing, how to benchmark different configurations, and how to adjust for your specific workload. You'll learn production tuning patterns used by companies handling 1M+ requests per second.
Understanding these mechanics lets you read monitoring dashboards, diagnose production issues, and scale confidently.
The Pool Sizing Formula
Pool size depends on three factors:
- Concurrent users (
C): How many simultaneous requests your API handles. - Average query time (
Q): How long each database query takes (in seconds). - Connection reuse ratio (
R): How many requests a single connection serves per second.
The formula:
pool_size = C * Q
Example: your API gets 1,000 concurrent requests, each query takes 10 ms (0.01 s).
pool_size = 1000 * 0.01 = 10 connections
This seems counterintuitive—1,000 concurrent users need only 10 connections? Yes. Because while one request waits for the database, the thread serves another request on a different connection. This is the magic of async: connection time is overlapped with CPU time.
In practice, add a safety margin:
min_size = max(5, C * Q * 0.5)
max_size = C * Q * 1.5 # 50% buffer for bursts
For the 1,000-user, 10 ms query example:
min_size = max(5, 1000 * 0.01 * 0.5) = 5
max_size = 1000 * 0.01 * 1.5 = 15
Benchmarking Pool Configurations
Test your specific workload to verify the formula. Use this script:
import asyncio
import asyncpg
import time
from statistics import mean, stdev
async def benchmark_pool(min_size: int, max_size: int, num_requests: int, query_time_ms: int):
"""Simulate concurrent requests and measure latency."""
pool = await asyncpg.create_pool(
'postgresql://postgres:password@localhost/postgres',
min_size=min_size,
max_size=max_size
)
async def mock_query():
"""Simulate a query that takes query_time_ms."""
async with pool.acquire() as conn:
# Simulate work (don't actually query to isolate pool behavior)
await asyncio.sleep(query_time_ms / 1000)
return await conn.fetchval('SELECT 1')
# Run requests and measure latency
latencies = []
for _ in range(num_requests):
start = time.perf_counter()
await mock_query()
latencies.append((time.perf_counter() - start) * 1000) # ms
await pool.close()
return {
'min_size': min_size,
'max_size': max_size,
'p50': sorted(latencies)[len(latencies) // 2],
'p95': sorted(latencies)[int(len(latencies) * 0.95)],
'p99': sorted(latencies)[int(len(latencies) * 0.99)],
'max': max(latencies)
}
async def run_benchmarks():
"""Test multiple pool configurations."""
# Simulate: 100 concurrent requests, 10 ms query time
# Formula says: min=5, max=15
configs = [
(2, 10), # Too small
(5, 15), # Optimal
(20, 50), # Overkill
]
for min_sz, max_sz in configs:
result = await benchmark_pool(min_sz, max_sz, 1000, 10)
print(f"min={result['min_size']}, max={result['max_size']}: "
f"p50={result['p50']:.1f}ms, p95={result['p95']:.1f}ms, "
f"p99={result['p99']:.1f}ms")
asyncio.run(run_benchmarks())
Sample output:
min=2, max=10: p50=10.2ms, p95=25.3ms, p99=55.8ms
min=5, max=15: p50=10.1ms, p95=11.2ms, p99=12.5ms
min=20, max=50: p50=10.1ms, p95=10.8ms, p99=11.3ms
The optimal config (min=5, max=15) has tight p99 latency. Too small causes queueing (high p99); too large wastes memory with minimal benefit.
Handling Query Variance
The formula assumes consistent query time. Real workloads have variance—some queries take 1 ms, others take 100 ms. Account for this:
max_size = C * Q_avg * 1.5 + (Q_p95 - Q_avg) * 10
This adds a buffer for slow queries. If your queries average 10 ms but p95 is 50 ms, increase max_size.
Measure query percentiles in production:
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
async def query_with_timing(pool, sql: str):
"""Log query execution time."""
import time
start = time.perf_counter()
async with pool.acquire() as conn:
result = await conn.fetch(sql)
elapsed = (time.perf_counter() - start) * 1000 # ms
logger.info(f"Query took {elapsed:.1f}ms")
return result
Collect these logs, compute percentiles, and adjust your pool size.
Memory Accounting
Each connection consumes ~4-8 MB (depending on PostgreSQL version and configuration). For a max_size of 100, expect ~400-800 MB per pool.
total_memory = max_size * memory_per_connection
= 100 * 6 MB = 600 MB
For a server with 8 GB RAM running a single FastAPI app, using 600 MB for a pool is reasonable (roughly 7.5% overhead).
If memory is tight, consider:
- External pooler (PgBouncer): Fewer Python-side connections, shared by multiple app instances.
- Reduce max_size: Accept slower peak handling but use less memory.
- Connection recycling: Recycle connections after N queries (
max_queriesparameter) to prevent accumulation of cached prepared statements.
Connection Recycling and Max Queries
Long-lived connections can accumulate prepared statements or stale state. Use max_queries to recycle:
pool = await asyncpg.create_pool(
'postgresql://postgres:password@localhost/postgres',
min_size=10,
max_size=50,
max_queries=10000 # Recycle after 10k queries
)
A connection that exceeds max_queries is closed and replaced. This prevents memory bloat but adds slight overhead (new connection every 10k queries on a busy connection). For most apps, 10,000–50,000 is reasonable.
Timeout Configuration
Set acquisition timeout to prevent infinite waits:
async with pool.acquire(timeout=5.0) as conn:
result = await conn.fetch('SELECT ...')
If no connection becomes available in 5 seconds, raise asyncio.TimeoutError. Monitor timeout errors; they indicate your pool is undersized.
Connection Lifecycle Visualization
Understand what's happening in your pool:
import asyncio
async def visualize_pool_state(pool):
"""Log pool state as requests are processed."""
async def background_stats():
"""Print pool stats every 1 second."""
for _ in range(10):
size = pool.get_size()
idle = pool.get_idle_size()
in_use = size - idle
print(f"Pool: {size} total, {in_use} in-use, {idle} idle")
await asyncio.sleep(1)
async def simulate_requests():
"""Simulate requests that take 2 seconds each."""
tasks = []
for i in range(20):
async def request(req_id):
print(f" Request {req_id} acquiring connection...")
async with pool.acquire() as conn:
print(f" Request {req_id} got connection, sleeping 2s")
await asyncio.sleep(2)
print(f" Request {req_id} released connection")
tasks.append(request(i))
await asyncio.sleep(0.2) # Stagger requests
await asyncio.gather(*tasks)
pool = await asyncpg.create_pool(
'postgresql://postgres:password@localhost/postgres',
min_size=5,
max_size=10
)
try:
await asyncio.gather(
background_stats(),
simulate_requests()
)
finally:
await pool.close()
asyncio.run(visualize_pool_state(None))
Output:
Pool: 5 total, 0 in-use, 5 idle
Request 0 acquiring connection...
Request 0 got connection, sleeping 2s
Pool: 5 total, 1 in-use, 4 idle
Request 1 acquiring connection...
Request 1 got connection, sleeping 2s
Pool: 5 total, 2 in-use, 3 idle
...
Pool: 10 total, 10 in-use, 0 idle (pool scaled up)
When you exceed min_size, the pool creates connections up to max_size. If all are busy, new requests wait.
Production Checklist
Before deploying, verify:
| Check | Action |
|---|---|
| Pool size formula | Calculate min/max using C * Q formula + variance buffer |
| Benchmark at peak | Simulate expected concurrent load, measure p95/p99 latency |
| Memory usage | Ensure max_size * 6 MB is within available RAM |
| Timeout handling | All pool.acquire() calls have timeout; verify behavior when exceeded |
| Monitoring | Expose pool stats (get_size(), get_idle_size()) via admin endpoint |
| Queries logged | Capture query times; identify and optimize slow queries |
| Connection recycling | Set max_queries to prevent state accumulation |
| Error handling | Retry transient errors, fail fast on permanent ones |
Key Takeaways
- Pool size formula:
min = C * Q * 0.5,max = C * Q * 1.5(where C = concurrent users, Q = query time in seconds). - Benchmark your specific workload; variance matters.
- Each connection consumes ~6 MB; budget accordingly.
- Set acquisition timeout to 5 seconds; monitor timeout errors as a sign of undersizing.
- Monitor pool stats in production; adjust if p99 latency is high or errors spike.
Frequently Asked Questions
How do I know if my pool is undersized?
Look for these signals: (1) high acquisition timeout errors, (2) p99 latency spikes, (3) frequent TooManyConnectionsError exceptions, (4) connection wait queue backing up. Increase max_size by 50% and re-benchmark.
Should I use a single large pool or multiple smaller pools?
Use a single pool per database in a single-process app. If you run multiple app processes (e.g., 4 uvicorn workers), each gets its own pool. For connection sharing across processes, use PgBouncer.
What's the overhead of increasing pool size?
Memory: each connection is ~6 MB. CPU: negligible. If you're not memory-constrained, it's safe to increase max_size generously (e.g., max_size = peak_concurrent * 2).
Do I need to warm up the pool?
No. Asyncpg opens min_size connections on create_pool(). If you want to pre-allocate max_size, you can, but it's unnecessary. The pool will scale automatically.