Kubernetes Resource Limits: Optimize Python Deployments
Resource requests and limits tell Kubernetes how much CPU and memory your Python application needs and how much it is allowed to use. Requests inform the scheduler where to place pods; limits cap resource usage and prevent one pod from starving others. Without explicit resource management, Kubernetes cannot make intelligent scheduling decisions, leading to uneven resource utilization, cascading failures, and poor application performance.
Understanding Resource Requests vs. Limits
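A resource request is the amount of CPU or memory that Kubernetes reserves for a container; a pod's request is the sum across its containers. The scheduler uses requests to decide which node can accommodate a new pod. If a node has 2 CPU cores and 4 GB RAM available for pods, and you request 1 CPU and 2 GB, the scheduler can fit 2 such pods on that node. In practice slightly less than the full node is allocatable, because part of each node is reserved for the operating system and Kubernetes components. A resource limit is the maximum a container is allowed to consume. If a container exceeds its memory limit, the kernel kills it (out-of-memory kill); if it tries to exceed its CPU limit, it is throttled.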
I deployed a Python batch job without specifying limits, and it consumed all available memory on a node, triggering the Linux OOM killer. This killed every pod on that node in a cascading failure. After setting limits, only the batch job was killed when it exceeded its allocation, and the other workloads on the node kept running.
Setting Appropriate Resource Requests and Limits for Python Apps
Here's a Deployment with realistic resource specifications for a Python web application:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: python-web
  template:
    metadata:
      labels:
        app: python-web
    spec:
      containers:
        - name: app
          image: my-registry/python-web:1.0.0
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
Each replica of this pod reserves 0.1 CPU cores and 256 MiB of RAM. The scheduler ensures these resources are available on the node before placing the pod. The pod is allowed to burst up to 0.5 cores and 512 MiB of RAM. The ratio of requests to limits (1:5 for CPU, 1:2 for memory) is typical for web applications with variable load.
Understanding CPU Units and Memory Units
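Kubernetes measures CPU in cores, usually written in millicores (m): 1000m = 1 full CPU core. Memory is measured in bytes, usually written with binary suffixes such as mebibytes (Mi) or gibibytes (Gi): 1024Mi = 1Gi. The binary and decimal suffixes are not interchangeable: 1Mi is 1,048,576 bytes, while 1M is 1,000,000 bytes.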
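To make the units concrete, here is a minimal sketch that converts quantity strings into plain numbers. The helper names parse_cpu and parse_memory are hypothetical (they are not part of any Kubernetes client), and the sketch handles only the common suffixes, not every form the Kubernetes quantity format allows.

```python
from decimal import Decimal

# Binary (power-of-two) and decimal suffixes commonly used for memory.
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> Decimal:
    """Return CPU cores for a quantity such as '100m', '0.5', or '2'."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity: str) -> int:
    """Return bytes for a quantity such as '256Mi', '1Gi', or '500M'."""
    # Try two-character suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_MEMORY_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            number = Decimal(quantity[: -len(suffix)])
            return int(number * _MEMORY_SUFFIXES[suffix])
    return int(quantity)  # no suffix: plain bytes


print(parse_cpu("500m"))      # 0.5
print(parse_memory("256Mi"))  # 268435456
```

Decimal is used for CPU so that values like 100m come out as exactly 0.1 rather than a binary floating-point approximation.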
For a Python Flask application serving typical web traffic:
- Request CPU: 100m — 0.1 CPU core is adequate for a single-threaded Python app handling moderate load.
- Request Memory: 256Mi — A Python Flask app with basic dependencies uses 50-100 MB; 256 MB accommodates the application and overhead.
- Limit CPU: 500m — Allow bursting to 0.5 cores during traffic spikes.
- Limit Memory: 512Mi — Cap memory to prevent runaway allocations due to memory leaks.
For a data-processing job that parallelizes across multiple workers:
- Request CPU: 1000m (1 core) — Full core for compute-intensive work.
- Request Memory: 2Gi — Data processing often requires significant memory.
- Limit CPU: 2000m (2 cores) — Burst to 2 cores for parallelism.
- Limit Memory: 4Gi — Cap memory to prevent OOM killer surprises.
How Resource Requests Affect Pod Scheduling
The Kubernetes scheduler places pods based on their requests, not on their actual usage. Suppose you have a 3-node cluster with 4 allocatable cores per node (12 cores total) and you deploy 10 pods requesting 1 core each. One placement that satisfies every request is:
- Node 1: 4 pods (4 cores used)
- Node 2: 4 pods (4 cores used)
- Node 3: 2 pods (2 cores used, 2 cores free)
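The default scheduler also scores the nodes that pass this fit check and tends to favor less-loaded ones, so the real distribution may be more even (for example 4, 3, and 3). If a pod requests more resources than any single node has free, it stays Pending until a node with enough capacity becomes available. Always verify that nodes have sufficient allocatable capacity: kubectl describe nodes.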
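The fit check can be illustrated with a simplified first-fit pass over the nodes. This is only a sketch: the function place_pods is hypothetical, it looks at CPU alone, and the real scheduler filters and then scores nodes rather than taking the first one that fits.

```python
def place_pods(node_free_millicores, pod_requests_millicores):
    """First-fit placement by CPU request.

    Returns (placements, remaining), where placements holds the node index
    chosen for each pod, or None if no node has enough free CPU.
    """
    remaining = list(node_free_millicores)
    placements = []
    for request in pod_requests_millicores:
        for index, available in enumerate(remaining):
            if request <= available:
                remaining[index] -= request
                placements.append(index)
                break
        else:
            placements.append(None)  # no node fits: the pod would stay Pending
    return placements, remaining


# Three nodes with 4 allocatable cores each, ten pods requesting 1 core each.
placements, remaining = place_pods([4000, 4000, 4000], [1000] * 10)
print(placements)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
print(remaining)   # [0, 0, 2000]
```

A pod requesting 5000m against these nodes gets None, which corresponds to the Pending case: no single node can satisfy the request even though the cluster as a whole has enough cores.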
Detecting Resource Under-allocation
If your Python application is killed for running out of memory even though the node still has free memory, the container's memory limit is too low. Check why the container last terminated:
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState}'
If the output contains "reason":"OOMKilled" under terminated, increase the memory limit. If the container has not been restarted yet, the same information appears under .state instead of .lastState. Monitor actual usage:
kubectl top pod <pod-name>
This shows current CPU and memory usage and requires metrics-server to be installed in the cluster. If actual usage is close to the limit, increase the limit. If it is much lower, consider reducing the request and limit to save costs.
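The same check can be scripted against the JSON that kubectl get pod <pod-name> -o json returns. The sketch below works on an already-parsed pod object; the function name oom_killed_containers and the sample data are illustrative, not real cluster output.

```python
import json


def oom_killed_containers(pod: dict) -> list[str]:
    """Return names of containers whose current or last state is OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        for key in ("state", "lastState"):
            terminated = status.get(key, {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                names.append(status["name"])
                break
    return names


# Hypothetical, trimmed-down pod status for illustration only.
sample_pod = json.loads("""
{
  "status": {
    "containerStatuses": [
      {
        "name": "app",
        "state": {"running": {}},
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}
      },
      {
        "name": "sidecar",
        "state": {"running": {}},
        "lastState": {}
      }
    ]
  }
}
""")

print(oom_killed_containers(sample_pod))  # ['app']
```

Checking both state and lastState matters because a restarted container reports running again, and the OOM kill is then only visible in lastState.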
CPU Requests and Throttling
CPU limits work differently from memory limits. If a container tries to use more CPU than its limit, Kubernetes does not kill it; instead, the kernel throttles it, pausing its threads once the CPU budget for the current scheduling period is spent. This causes performance degradation, often visible as latency spikes, but not a crash. For Python applications doing compute-intensive work, CPU throttling can be noticeable. Set limits generously to avoid throttling.
Here's a Python script that demonstrates CPU usage:
import time
import math

def cpu_intensive_task(duration_seconds=30):
    """Consume CPU for the specified duration."""
    end_time = time.time() + duration_seconds
    while time.time() < end_time:
        # Compute-heavy operation
        _ = math.factorial(1000)

if __name__ == "__main__":
    print("Starting CPU-intensive task...")
    cpu_intensive_task()
    print("Task complete")
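This loop is single-threaded, so it can use at most one full core. If it runs in a pod with limits.cpu: 100m, it will be heavily throttled and get roughly a tenth of a core. Raising the limit to 500m reduces the throttling, and a limit of 1000m or higher removes it for this script.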
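The effect of a CPU limit can be estimated with simple arithmetic. The sketch below assumes the default 100 ms enforcement period and models a fixed amount of CPU work (unlike the script above, which runs for a fixed wall-clock duration and would simply complete fewer iterations when throttled). The function names are hypothetical.

```python
def cfs_quota_us(cpu_limit_millicores: int, period_us: int = 100_000) -> int:
    """CPU time in microseconds the container may use per period."""
    return cpu_limit_millicores * period_us // 1000


def wall_seconds(cpu_seconds: float, cpu_limit_millicores: int, threads: int = 1) -> float:
    """Estimated wall-clock time for a fixed amount of CPU work under a limit."""
    # A task cannot use more cores than it has threads, so the usable rate
    # is capped by both the limit and the thread count.
    usable_millicores = min(cpu_limit_millicores, threads * 1000)
    return cpu_seconds * 1000 / usable_millicores


for limit in (100, 500, 1000):
    print(limit, cfs_quota_us(limit), wall_seconds(30, limit))
# 100 10000 300.0
# 500 50000 60.0
# 1000 100000 30.0
```

Under these assumptions, 30 seconds of single-threaded CPU work takes about 300 seconds at 100m and 60 seconds at 500m. Raising the limit beyond 1000m does not help a single thread further.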
Memory Requests and the Linux OOM Killer
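Memory is stricter than CPU because it cannot be throttled. If a container exceeds its memory limit, the Linux kernel's OOM killer terminates it, and Kubernetes restarts the container according to the pod's restart policy. For Python, this is especially important because: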
- Unbounded data structures: Loading a large file into a list can consume gigabytes quickly.
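- Memory leaks: Unbounded caches, module-level collections that only grow, and objects kept alive by lingering references (including unclosed file handles and their buffers) accumulate memory over time.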
- Third-party libraries: NumPy, Pandas, and other data libraries allocate memory aggressively.
Always set memory limits and monitor actual usage. Use tools like memory_profiler to detect leaks in development:
from memory_profiler import profile

@profile
def load_and_process_data():
    data = []
    for i in range(1_000_000):
        data.append({"id": i, "value": i * 2})
    # Process...
    return len(data)

if __name__ == "__main__":
    result = load_and_process_data()
    print(f"Processed {result} records")
Run with python -m memory_profiler script.py to see line-by-line memory allocation.
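If installing a third-party profiler is not an option, the standard library's tracemalloc module gives a rough peak figure. This is a minimal sketch with hypothetical helper names; note that tracemalloc only tracks memory allocated through Python's allocator, so the process's real footprint (which is what the memory limit applies to) will be somewhat higher.

```python
import tracemalloc


def peak_python_allocation(func, *args, **kwargs):
    """Run func and return (result, peak bytes allocated by Python objects)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_records(count):
    """Build the same kind of list of dicts as the example above."""
    return [{"id": i, "value": i * 2} for i in range(count)]


if __name__ == "__main__":
    records, peak = peak_python_allocation(build_records, 100_000)
    print(f"{len(records)} records, peak ~{peak / 2**20:.1f} MiB")
```

Comparing the peak for a representative input against the container's memory limit gives an early warning before the workload ever reaches a cluster.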
Resource Quotas: Limiting Namespace Consumption
For shared clusters, set namespace-level resource quotas to prevent one team from consuming all resources:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: python-app-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
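This quota allows the production namespace to request up to 10 cores and 20Gi of RAM in total, with limits up to 20 cores and 40Gi, across at most 50 pods. If a Deployment would exceed the quota, the API server rejects the new pods at creation time, so they never appear as Pending. The Deployment stays below its desired replica count, and the quota error shows up in the ReplicaSet's events (kubectl describe replicaset <name>). Once a quota covers requests or limits for a resource, every new pod in the namespace must specify them, or it is rejected as well.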
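Before scaling a Deployment, it can help to check the arithmetic against the quota. The sketch below is a hypothetical helper, not a Kubernetes API; it expresses CPU in millicores and memory in MiB, and uses the quota above together with the per-pod values from the python-web-app Deployment shown earlier.

```python
def quota_excess(hard, used, per_pod, replicas):
    """Return {resource: amount over quota}; an empty dict means it fits."""
    excess = {}
    for resource, limit in hard.items():
        needed = used.get(resource, 0) + per_pod.get(resource, 0) * replicas
        if needed > limit:
            excess[resource] = needed - limit
    return excess


# Quota from the manifest above (CPU in millicores, memory in MiB).
hard = {
    "requests.cpu": 10_000, "requests.memory": 20 * 1024,
    "limits.cpu": 20_000, "limits.memory": 40 * 1024,
    "pods": 50,
}
# Per-pod values from the python-web-app Deployment.
per_pod = {
    "requests.cpu": 100, "requests.memory": 256,
    "limits.cpu": 500, "limits.memory": 512,
    "pods": 1,
}

print(quota_excess(hard, {}, per_pod, replicas=40))  # {}
print(quota_excess(hard, {}, per_pod, replicas=41))  # {'limits.cpu': 500}
```

In an otherwise empty namespace, the CPU limit total is the first constraint to bind: 40 replicas fit, while the 41st is rejected even though the pod count and the request totals are still well below the quota.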
Right-Sizing Through Monitoring and Iteration
Begin with conservative estimates, deploy, and monitor actual usage for a week:
# Current CPU and memory usage for all pods (requires metrics-server)
kubectl top pods -n default
# Configured limits and requests for a specific pod (not usage history)
kubectl describe pod <pod-name> | grep -A 5 "Limits"
Adjust requests and limits based on observed peaks and trends. Document your reasoning in comments:
resources:
  requests:
    cpu: 200m      # Typical usage ~150m at peak traffic
    memory: 512Mi  # Peak observed: 450Mi during data import
  limits:
    cpu: 1000m     # Allow 5x burst for sudden traffic spikes
    memory: 1Gi    # Prevent OOM; crash is safer than cascading failure
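One way to turn a week of observations into numbers is a simple percentile-plus-headroom rule. The sketch below is an assumed heuristic, not an official recommendation: the function name, the 95th-percentile choice, the 25% headroom, and the sample values are all illustrative.

```python
import math


def round_up(value: float, step: int) -> int:
    """Round value up to the next multiple of step."""
    return math.ceil(value / step) * step


def suggest_memory_mib(samples_mib, headroom=1.25, step=64):
    """Suggest a memory request and limit (in MiB) from observed usage samples.

    Request: 95th percentile (nearest-rank), rounded up to a multiple of step.
    Limit: observed peak times headroom, rounded up to a multiple of step.
    """
    if not samples_mib:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_mib)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) using integers
    p95 = ordered[rank - 1]
    peak = ordered[-1]
    return {
        "request": round_up(p95, step),
        "limit": round_up(peak * headroom, step),
    }


# Hypothetical usage samples in MiB, e.g. collected from kubectl top.
samples = [300, 320, 310, 450, 330, 340, 305, 315, 325, 335]
print(suggest_memory_mib(samples))  # {'request': 512, 'limit': 576}
```

Treat the output as a starting point for the next iteration rather than a final answer; workloads with rare but large spikes may need more headroom than this rule provides.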
Key Takeaways
- Requests reserve resources; limits cap usage. Both are essential for stable scheduling and preventing resource starvation.
- CPU is measured in millicores (m); memory in mebibytes (Mi) or gibibytes (Gi).
- Start with typical values (100m CPU, 256Mi memory for web apps) and adjust based on monitoring.
- CPU limits cause throttling (performance degradation); memory limits cause termination (crashes).
- Use ResourceQuotas for shared clusters to prevent resource hoarding.
Frequently Asked Questions
What happens if I set requests too high?
The scheduler cannot find nodes with enough free resources, and pods stay pending indefinitely. Check kubectl describe pod <pod-name> for "Insufficient cpu" or "Insufficient memory" events. Reduce requests or add more nodes.
What is a good requests-to-limits ratio?
For web applications, 1:5 for CPU and 1:2 for memory is common. For batch jobs with bursty load, 1:10 for CPU is reasonable. For stateless microservices, 1:2 to 1:3 is typical. Test and adjust based on your workload patterns.
How do I measure my Python application's actual CPU and memory usage?
Use kubectl top pod <pod-name> to see current usage. For historical trends, install Prometheus and Grafana (part of most observability stacks). For development, use psutil or memory_profiler libraries.
Can I set requests and limits per container (not just per pod)?
Yes. Requests and limits are in fact defined per container, and the scheduler adds up the requests of all containers in a pod when choosing a node. This is common for sidecar patterns where a logging sidecar needs minimal resources.
What should I do if my application is consistently hitting its memory limit?
Either the limit is too low, or your application has a memory leak. Increase the limit temporarily to rule out a leak. Use a memory profiler to identify the leak. Fix it, test, and then set an appropriate limit.