Python Performance Metrics: What to Measure

Python performance optimization starts with measurement, not guesswork. Understanding which metrics to track—execution time, memory footprint, and CPU utilization—allows you to identify real bottlenecks and measure whether your optimizations actually work. This tutorial introduces the essential profiling tools and metrics every Python developer should know.

What Performance Metrics Should You Track?

Effective optimization requires data. There are three core metrics that matter in production Python:

  1. Execution time (wall-clock): How long does a function or program take to run from start to finish? Measure in milliseconds or seconds.
  2. Memory consumption: How much RAM is your process using? Measured in MB or GB. Memory leaks and unbounded growth can crash servers.
  3. CPU utilization: What percentage of available CPU cores is your code using? This matters for parallelism and server efficiency.

The first step is always to measure your baseline (before optimizing) so you can prove that changes help. Optimization without measurement is expensive guesswork.
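As a rough illustration of the three metrics, the sketch below records wall-clock time, CPU time, and peak Python memory for a single call using only the standard library. The helper name measure and the workload build_squares are hypothetical, chosen for this example. Note that tracemalloc slows the traced code, so treat the timings from this helper as indicative rather than precise.

```python
import time
import tracemalloc


def measure(func, *args):
    """Return (result, wall_seconds, cpu_seconds, peak_bytes) for one call."""
    tracemalloc.start()
    wall_start = time.perf_counter()   # wall-clock time, includes waiting
    cpu_start = time.process_time()    # CPU time used by this process only
    try:
        result = func(*args)
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, wall, cpu, peak


def build_squares(n):
    # Hypothetical workload: CPU-bound and allocates a list of n integers.
    return [i * i for i in range(n)]


squares, wall, cpu, peak = measure(build_squares, 200_000)
print(f"wall: {wall * 1000:.1f} ms | cpu: {cpu * 1000:.1f} ms | "
      f"peak: {peak / 1024 / 1024:.1f} MiB")
```

If wall time is much larger than CPU time, the code is mostly waiting (I/O, sleep, locks) rather than computing, which changes where optimization effort should go.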

Why Measure Before Optimizing?

Python's simplicity can hide performance surprises. A line of code that looks efficient might be orders of magnitude slower than an alternative. For example, building a string by repeated concatenation in a loop can be O(n²), because strings are immutable and each step may copy everything accumulated so far, while collecting the pieces and calling str.join() is O(n). Profiling reveals the real cost.
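A minimal sketch of that comparison follows. The function names and the input size are assumptions made for illustration. CPython can sometimes optimize repeated string concatenation in place, so the measured gap may be smaller than the theoretical one, which is exactly why measuring beats assuming.

```python
import timeit


def build_with_concat(parts):
    text = ""
    for part in parts:
        text = text + part   # may copy the accumulated string each time
    return text


def build_with_join(parts):
    return "".join(parts)    # single pass over the pieces


parts = [str(i) for i in range(10_000)]

# Both approaches must produce the same string before timing them.
assert build_with_concat(parts) == build_with_join(parts)

t_concat = timeit.timeit(lambda: build_with_concat(parts), number=50)
t_join = timeit.timeit(lambda: build_with_join(parts), number=50)
print(f"concat: {t_concat:.4f}s | join: {t_join:.4f}s for 50 runs each")
```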

I learned this lesson early in my career when I spent a week optimizing a loop that turned out to consume less than 1% of a program's runtime. The real bottleneck was in a data structure operation I'd overlooked. Profiling first would have saved that time.

Using timeit for Execution Time

The timeit module is Python's standard tool for measuring execution time. It runs your code many times in a tight loop and reports the total elapsed time, which makes very short operations measurable and reduces (but does not eliminate) the effect of OS scheduling and system load.

import timeit

# Compare list concatenation (builds a new list each time) with in-place extend
code_concat = """
result = []
for i in range(1000):
    result = result + [i]
"""

code_extend = """
result = []
for i in range(1000):
    result.extend([i])
"""

time_concat = timeit.timeit(code_concat, number=100)
time_extend = timeit.timeit(code_extend, number=100)

print(f"Concatenation: {time_concat:.4f}s for 100 runs")
print(f"Extend: {time_extend:.4f}s for 100 runs")
print(f"Extend is {time_concat/time_extend:.1f}x faster")

Output (illustrative; absolute timings and the ratio depend heavily on your machine and Python version):

Concatenation: 8.5432s for 100 runs
Extend: 0.0234s for 100 runs
Extend is 365.2x faster

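The number parameter controls how many times the code executes, and timeit.timeit() returns the total time for all of those executions, not an average. Set it explicitly: the default is 1,000,000, which is far too many for anything but tiny snippets. For faster code, increase number; for slower code, decrease it to avoid waiting too long.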
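A single total can still be skewed by whatever else the machine was doing at the time. One common approach, sketched here with an arbitrary example statement, is to run several independent trials with timeit.repeat() and report the minimum, since the fastest trial is the one least disturbed by other processes.

```python
import timeit

setup = "data = list(range(1000))"
stmt = "sorted(data, reverse=True)"

# Five independent trials of 1,000 executions each.
trials = timeit.repeat(stmt, setup=setup, repeat=5, number=1000)

best = min(trials)
print(f"best of 5: {best / 1000 * 1e6:.2f} usec per loop")
print(f"spread:    {min(trials):.4f}s to {max(trials):.4f}s per 1000 loops")
```

The setup string runs once per trial and is not included in the timing, so the cost of building data does not pollute the measurement.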

Using timeit from the Command Line

You can also time a snippet without writing a script:

python -m timeit -n 1000 "sum(range(1000))"
# Output: 1000 loops, best of 5: 47.2 usec per loop

Measuring Memory with memory_profiler

The memory_profiler package shows how memory grows during function execution. Install it with pip install memory-profiler.

from memory_profiler import profile

@profile
def create_large_list():
    data = [i**2 for i in range(1000000)]
    return sum(data)

result = create_large_list()

Run with:

python -m memory_profiler your_script.py

Output shows memory usage (in MiB) line by line:

Filename: your_script.py

Line #    Mem usage    Increment  Occurrences   Line Contents
     1     11.2 MiB      0.0 MiB            1   @profile
     2                                          def create_large_list():
     3     50.4 MiB     39.2 MiB            1       data = [i**2 for i in range(1000000)]
     4     50.4 MiB      0.0 MiB            1       return sum(data)

This reveals that the list comprehension allocates about 39 MiB in a single line. If you're tight on memory, pass a generator expression to sum() instead: it produces one value at a time rather than materializing the whole list, so the increment would be close to zero.
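To check the generator claim without installing anything, the sketch below compares peak allocations using the standard library's tracemalloc module, swapped in here for memory_profiler. It tracks only memory allocated by Python, not the process's resident size, so its numbers will not match the table above. The helper peak_bytes and the two function names are hypothetical.

```python
import tracemalloc


def peak_bytes(func):
    """Run func once and return (result, peak traced allocation in bytes)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def sum_with_list():
    data = [i**2 for i in range(1_000_000)]      # materializes every value
    return sum(data)


def sum_with_generator():
    return sum(i**2 for i in range(1_000_000))   # one value at a time


list_result, list_peak = peak_bytes(sum_with_list)
gen_result, gen_peak = peak_bytes(sum_with_generator)

print(f"list:      peak {list_peak / 1024 / 1024:.1f} MiB")
print(f"generator: peak {gen_peak / 1024 / 1024:.3f} MiB")
```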

Using cProfile for CPU Profiling

The built-in cProfile module shows which functions consume the most CPU time. It's invaluable for finding hot spots.

import cProfile
import pstats

def fibonacci(n):
    # Deliberately naive recursion: makes an exponential number of calls
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

profiler = cProfile.Profile()
profiler.enable()

result = fibonacci(30)

profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)

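The report lists, for each function, the call count (ncalls), the time spent in the function itself (tottime), and the cumulative time including everything it calls (cumtime), with per-call averages for both. Sorting by 'cumulative' and passing 10 to print_stats() shows the ten entries with the highest cumulative time.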
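When the report needs to go into a log or a test rather than straight to the terminal, pstats.Stats accepts a stream argument. The sketch below captures the report as a string and sorts by internal time instead of cumulative time, which highlights functions that are expensive in their own right. The smaller input (20 rather than 30) is only there to keep the example quick.

```python
import cProfile
import io
import pstats


def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


profiler = cProfile.Profile()
profiler.enable()
fibonacci(20)
profiler.disable()

# Write the report into an in-memory buffer instead of stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats(pstats.SortKey.TIME).print_stats(5)

report = buffer.getvalue()
print(report)
```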

Building a Performance Dashboard

For ongoing optimization work, create a simple benchmark script that tracks multiple metrics:

import timeit
import psutil
import os

def benchmark_operation(label, code_snippet, iterations=100):
    """Measure time and memory for a code snippet."""
    process = psutil.Process(os.getpid())
    mem_before = process.memory_info().rss / 1024 / 1024  # MB

    exec_time = timeit.timeit(code_snippet, number=iterations)

    mem_after = process.memory_info().rss / 1024 / 1024
    mem_delta = mem_after - mem_before

    # Report the average time per iteration and the change in resident memory
    print(f"{label:30s} | Time: {exec_time/iterations*1000:8.3f}ms | Memory: {mem_delta:+.1f}MB")

benchmark_operation("List append", "result = []\nfor i in range(10000):\n    result.append(i)")
benchmark_operation("List comprehension", "[i for i in range(10000)]")
benchmark_operation("Set creation", "s = set(range(10000))")

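This creates a repeatable way to compare operations side by side. Treat the memory column as a rough signal only: RSS covers the whole process, and the allocator may not return freed memory to the operating system, so the delta for a small snippet is often zero or noisy.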
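One possible variation, sketched below under stated assumptions, takes callables instead of code strings, returns structured results that can be sorted or saved, and uses tracemalloc peak allocation in place of the RSS delta. The benchmark function and its result keys are hypothetical names for this example, not part of any library.

```python
import timeit
import tracemalloc


def benchmark(label, func, iterations=100):
    """Return per-call time (ms) and peak traced memory (MiB) for func."""
    # Time first, without tracemalloc, so tracing overhead does not skew it.
    total = timeit.timeit(func, number=iterations)

    # Then trace a single call to capture its peak Python allocations.
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    return {
        "label": label,
        "ms_per_call": total / iterations * 1000,
        "peak_mib": peak / 1024 / 1024,
    }


def append_loop():
    result = []
    for i in range(10_000):
        result.append(i)
    return result


candidates = {
    "List append": append_loop,
    "List comprehension": lambda: [i for i in range(10_000)],
    "Set creation": lambda: set(range(10_000)),
}

results = [benchmark(label, func) for label, func in candidates.items()]
for row in sorted(results, key=lambda r: r["ms_per_call"]):
    print(f"{row['label']:30s} | {row['ms_per_call']:8.3f} ms | "
          f"{row['peak_mib']:6.2f} MiB")
```

Separating the timing pass from the memory pass keeps each number honest, at the cost of running every candidate one extra time.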

Key Takeaways

  • Measure baseline performance before optimizing; never guess where time is spent
  • Use timeit for execution time, memory_profiler for memory, cProfile for CPU profiling
  • Always run measurements multiple times (number parameter in timeit) for statistical validity
  • Memory growth and CPU utilization are as important as raw speed
  • Create benchmark scripts to track multiple metrics and compare alternative implementations

Frequently Asked Questions

How many iterations should I use with timeit?

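Use enough iterations that the total run time is at least 0.2 seconds. If your code runs in 1 microsecond, that means roughly number=200000. The timeit.Timer.autorange() method chooses a suitable count automatically by increasing the loop count until a run takes at least 0.2 seconds, and the command-line interface does the same when you omit -n. The command-line interface also prints a warning when the slowest trial is more than four times slower than the best, which suggests the results are unreliable.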
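A short sketch of autorange() in use, with an arbitrary example statement:

```python
import timeit

timer = timeit.Timer("sum(range(1000))")

# autorange() increases the loop count until one run takes at least 0.2 s,
# then returns that loop count and the time the run took.
loops, total = timer.autorange()
print(f"{loops} loops took {total:.3f}s "
      f"({total / loops * 1e6:.2f} usec per loop)")
```

The returned loop count can then be passed as number to timeit.repeat() so that follow-up trials all use the same, sensibly sized workload.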

Should I measure on my laptop or production?

Measure both. Your laptop might have different CPU, RAM, and OS characteristics than production. For algorithmic comparisons (which implementation is faster), relative differences on your machine predict production rank. For absolute latency targets, measure production with real data.

What's the difference between time.time() and timeit?

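time.time() reads the system clock, which can jump if the clock is adjusted, and a single before/after measurement includes whatever scheduling noise occurred during that one run. timeit uses time.perf_counter() by default, temporarily disables garbage collection, and runs the code many times; with timeit.repeat() or the command-line interface you can then take the best of several trials. For micro-benchmarks under 1 millisecond, timeit is far more reliable.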

Does profiling slow down my code?

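Yes, measurably. Treat these figures as rough guides, because the overhead depends on the workload: cProfile often adds around 15–30%, but code dominated by many small function calls can slow down far more, and memory_profiler can add 2–4x. Use them only on the code you're optimizing, not in production. For production performance monitoring, use lighter tools like py-spy or APM services.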
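Rather than relying on quoted figures, you can measure the overhead on your own workload. The sketch below times the same call with and without cProfile. The timed helper is hypothetical, and a call-heavy function like this naive fibonacci is close to a worst case for a tracing profiler, so expect a larger slowdown here than for typical application code.

```python
import cProfile
import time


def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def timed(func, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


plain_result, plain_seconds = timed(fibonacci, 22)

# runcall() profiles exactly one call and returns that call's result.
profiler = cProfile.Profile()
profiled_result, profiled_seconds = timed(profiler.runcall, fibonacci, 22)

print(f"plain:    {plain_seconds:.4f}s")
print(f"profiled: {profiled_seconds:.4f}s")
print(f"slowdown: {profiled_seconds / plain_seconds:.1f}x")
```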

How do I profile code that's already running?

Use py-spy (https://github.com/benfred/py-spy), a sampling profiler that attaches to a running Python process without modifying or restarting it. Its overhead is low enough for use on production processes, and it can produce flame graphs showing which functions consume the most CPU.

Further Reading