
timeit Module Guide: Measure Python Code Speed

The timeit module measures the execution time of small Python code snippets with microsecond precision, separating the cost of a specific operation from confounding factors such as garbage collection and system scheduler jitter. Rather than taking a single time.time() reading before and after your code, timeit runs the snippet many times using a high-resolution clock (time.perf_counter() by default) and returns the aggregate, which you divide by the run count to get a per-run figure. This reduces noise rather than eliminating it, and it is the right tool for comparing two functions or estimating the cost of a single operation.

I discovered the importance of accurate timing years ago while optimizing a data pipeline. Using casual time.time() measurements, I thought list comprehensions were 2× faster than generator expressions. When I switched to timeit, the difference disappeared: the overhead from garbage collection between runs had been drowning out the signal. That lesson shaped this tutorial: precise measurement requires discipline and the right tool.

When to Use timeit vs. cProfile

timeit measures a specific code snippet or function call in isolation. Use it when you want to ask: "How fast does this operation run?" or "Is method A faster than method B?" It's ideal for benchmarking short operations (microseconds to milliseconds) and comparing alternatives.

cProfile (covered in the next article) measures entire programs, counting function calls and accumulating time across your application. Use it to find bottlenecks in complex code where you don't yet know which function is slow.

A typical workflow: use cProfile to identify the slow function, then use timeit to benchmark alternative implementations of that function.

The timeit API: Repeat, Number, and Timer

The timeit module provides three tools: a command-line interface, a function for programmatic use, and a Timer class for fine-grained control. Here's the function API:

import timeit

# Basic usage: time a code snippet
result = timeit.timeit(
    stmt="sum(range(100))",
    setup="pass",
    number=1000000
)
print(f"Time for 1 million iterations: {result:.4f} seconds")

The parameters:

  • stmt: The code snippet to time (string or callable).
  • setup: Code to run once before timing starts (string or callable). Use this for imports and variable initialization.
  • number: How many times to execute stmt. Default is 1 million; for slow operations, reduce it.

The return value is total time in seconds. Divide by number to get average time per run.
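To make the division concrete, here is a minimal sketch that converts the total into a per-run figure. The variable names and the iteration count of 100,000 are illustrative choices, not part of the timeit API.

```python
import timeit

# Illustrative count; smaller than the 1,000,000 default to keep the run short
number = 100_000

# timeit() returns the total elapsed time, in seconds, for all `number` runs
total = timeit.timeit(stmt="sum(range(100))", number=number)

# Dividing by the run count gives the average cost of a single execution
per_run = total / number

print(f"Total: {total:.4f} s for {number:,} runs")
print(f"Average per run: {per_run * 1e6:.3f} microseconds")
```

Reporting the per-run figure makes results comparable across benchmarks that use different values of number.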

Best Practice 1: Eliminate Garbage Collection Noise

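Garbage collection can cause sudden slowdowns that mask the true cost of your operation. timeit already turns the collector off for the duration of each measurement by default, so the explicit gc.disable() and gc.enable() calls below are redundant but harmless; they simply make the intent visible: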

import timeit
import gc

stmt = """
result = []
for i in range(1000):
    result.append(i * i)
"""

# Run WITHOUT garbage collection (cleanest measurement)
gc.disable()
time_no_gc = timeit.timeit(stmt=stmt, number=100000)
gc.enable()

print(f"Time without GC overhead: {time_no_gc:.4f} seconds")

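Why? If a collection runs during timing, one run might trigger a full collection (slow) while another doesn't (fast). With the collector off, you measure only the code's intrinsic cost. Because timeit does this for you by default, the case that needs deliberate handling is the opposite one: if allocation and collection are part of what you want to measure, re-enable the collector by making gc.enable() the first statement of setup.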
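The collector's state during a measurement is easy to check for yourself. The sketch below records gc.isenabled() from inside the timed statement, first with default settings and then with gc.enable() in setup. The helper name gc_state_during_timing is hypothetical, introduced only for this illustration.

```python
import gc
import timeit

def gc_state_during_timing(setup="pass"):
    """Return True if the garbage collector was enabled while stmt ran."""
    seen = []
    # The callable runs inside the timed loop, so it observes the state
    # that timeit has put the collector in.
    timeit.timeit(stmt=lambda: seen.append(gc.isenabled()), setup=setup, number=1)
    return seen[0]

# Default behaviour: timeit disables the collector around the timed loop
default_state = gc_state_during_timing()

# Re-enable collection by running gc.enable() in setup
reenabled_state = gc_state_during_timing(setup="import gc; gc.enable()")

print(f"GC enabled during default timing: {default_state}")
print(f"GC enabled with gc.enable() in setup: {reenabled_state}")
```

If your workload allocates heavily and collection is part of its real-world cost, the second form is likely the more representative measurement.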

Best Practice 2: Use repeat() for Statistical Robustness

A single measurement is noisy. timeit.repeat() runs the entire measurement multiple times, giving you a distribution of run times. Use the minimum as your benchmark, not the average, because the minimum is the run least disturbed by other processes and random OS delays:

import timeit
import gc

stmt = "x = sum(range(100))"

gc.disable()
times = timeit.repeat(stmt=stmt, number=1000000, repeat=5)
gc.enable()

print(f"5 measurements: {[f'{t:.4f}' for t in times]}")
print(f"Minimum (best-case): {min(times):.4f} seconds")
print(f"Average: {sum(times) / len(times):.4f} seconds")

Example output (illustrative; absolute numbers depend on hardware and Python version):

5 measurements: ['0.0453', '0.0461', '0.0449', '0.0455', '0.0450']
Minimum (best-case): 0.0449 seconds
Average: 0.0454 seconds

The minimum is your best estimate of the code's cost; the variation (0.0449 to 0.0461) comes from OS scheduler noise and cache effects. Report the minimum.

Comparing Two Implementations: A Complete Example

Here's a realistic comparison: list comprehension vs. generator expression vs. map(). Which is fastest?

import timeit
import gc

# Three ways to compute squares
comprehension = "result = [x*x for x in range(1000)]"
generator_expr = "result = list(x*x for x in range(1000))"
map_builtin = "result = list(map(lambda x: x*x, range(1000)))"

gc.disable()

# Time each approach
times_comp = timeit.repeat(stmt=comprehension, number=100000, repeat=5)
times_gen = timeit.repeat(stmt=generator_expr, number=100000, repeat=5)
times_map = timeit.repeat(stmt=map_builtin, number=100000, repeat=5)

gc.enable()

# Report minimums
print(f"List comprehension: {min(times_comp):.4f}s")
print(f"Generator + list(): {min(times_gen):.4f}s")
print(f"map() builtin: {min(times_map):.4f}s")

# Speedup factor relative to slowest
baseline = max(min(times_comp), min(times_gen), min(times_map))
print(f"\nSpeedup (fastest vs. slowest):")
print(f" Comprehension: {baseline / min(times_comp):.2f}x")
print(f" Generator: {baseline / min(times_gen):.2f}x")
print(f" Map: {baseline / min(times_map):.2f}x")

Example output (illustrative; absolute times depend on hardware and Python version):

List comprehension: 0.0234s
Generator + list(): 0.0312s
map() builtin: 0.0289s

Speedup (fastest vs. slowest):
Comprehension: 1.33x
Generator: 1.00x
Map: 1.08x

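In this run the list comprehension is about 1.33× as fast as the generator expression passed to list(). The comprehension appends to the list directly inside a single compiled loop, whereas list() has to pull each value out of the generator, which suspends and resumes a separate frame for every element. The map() version pays for a lambda call per element.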
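A comparison like this is only meaningful if every candidate produces the same result. The following sketch, using callables instead of strings, checks equivalence first and then reports the best per-call time for each. The function names and the best_per_call helper are hypothetical, and the smaller number keeps the run short.

```python
import timeit

def squares_comprehension():
    return [x * x for x in range(1000)]

def squares_generator():
    return list(x * x for x in range(1000))

def squares_map():
    return list(map(lambda x: x * x, range(1000)))

candidates = {
    "comprehension": squares_comprehension,
    "generator + list()": squares_generator,
    "map() + lambda": squares_map,
}

# Verify all candidates agree before comparing their speed
expected = squares_comprehension()
for name, func in candidates.items():
    assert func() == expected, f"{name} returned a different result"

def best_per_call(func, number=1000, repeat=5):
    """Return the minimum per-call time in seconds across `repeat` runs."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number

results = {name: best_per_call(func) for name, func in candidates.items()}
fastest = min(results.values())

# Print candidates from fastest to slowest, relative to the fastest
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:<20} {seconds * 1e6:8.2f} usec  ({seconds / fastest:.2f}x the fastest)")
```

Timing callables adds one function call of overhead per iteration, which is the same for every candidate here and so does not bias the ranking.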

Timing Setup and Teardown Separately

Sometimes you need to exclude setup time from your measurement. For example, if you're benchmarking a sort algorithm, you want to time the sort itself, not the creation of the test list. The setup argument does exactly this: it runs once per measurement, before the clock starts, and its cost is not included in the result:

import timeit
import gc

# Create a large list once in setup (untimed), reuse it in every iteration
setup = "import random; data = list(range(10000)); random.shuffle(data)"
sort_stmt = "sorted_data = sorted(data)"

gc.disable()
time_sort = timeit.timeit(stmt=sort_stmt, setup=setup, number=10000)
gc.enable()

# Baseline: an empty statement measures timeit's own loop overhead, not the setup
gc.disable()
time_loop_overhead = timeit.timeit(stmt="pass", setup=setup, number=10000)
gc.enable()

print(f"Loop overhead: {time_loop_overhead:.6f}s")
print(f"Sort (setup excluded): {time_sort:.6f}s")
print(f"Sort minus loop overhead: {(time_sort - time_loop_overhead):.6f}s")

The reported sort time already excludes the list creation and shuffle. Subtracting the pass baseline only removes the small per-iteration cost of the timing loop, which matters mainly for very fast statements.
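When the test data already exists in your program, an alternative to a setup string is the globals argument, which gives the statement access to names you supply. This is a minimal sketch; the list size and run count are arbitrary illustrative values.

```python
import random
import timeit

# Build the test data once, in ordinary code, outside any timed region
data = list(range(10_000))
random.shuffle(data)

number = 200  # illustrative; sorting 10,000 items is relatively slow

# globals= makes `data` visible to the statement without a setup string
total = timeit.timeit(stmt="sorted(data)", globals={"data": data}, number=number)

print(f"sorted() on {len(data):,} items: {total / number * 1e3:.3f} ms per call")
```

Note that sorted() returns a new list, so every iteration sees the same shuffled input. An in-place data.sort() would sort the list on the first iteration, and all later iterations would time an already-sorted list.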

Command-Line timeit for Quick Benchmarks

The python -m timeit command lets you benchmark code without writing a script:

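python -m timeit -n 1000000 -r 5 "sum(range(100))"

  • -n NUMBER: Run the statement NUMBER times per measurement. If omitted, timeit picks a suitable count automatically, increasing it until a measurement takes at least 0.2 seconds.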
  • -r REPEAT: Repeat the timing REPEAT times (default 5).
  • -s SETUP: Setup statement to run once (e.g., -s "import math").

Output:

1000000 loops, best of 5: 7.23 usec per loop

This is convenient for quick comparisons when you don't want to write or edit a script.

Key Takeaways

  • timeit measures code snippets with microsecond precision by running them many times and averaging, which smooths out single-run noise.
  • timeit disables garbage collection during timing by default, so you measure the code's intrinsic cost; put gc.enable() in setup only when collection is part of what you want to measure.
  • Use repeat() with multiple runs and report the minimum, which represents the "true" cost free of OS scheduler interference.
  • Compare alternatives by timing each and computing the speedup factor (baseline / alternative time).
  • Put data preparation in setup: it runs once per measurement and is excluded from the reported time, so no subtraction is needed.

Frequently Asked Questions

Should I use timeit for a function that takes minutes to run?

No. timeit is for short snippets (microseconds to milliseconds). For long-running functions, measure once with time.perf_counter() or use cProfile to break down time by function. Repeating a 1-minute operation 1000 times is impractical.

Why use the minimum time instead of the average?

The minimum time represents the best-case scenario where the CPU cache is warm and the OS scheduler didn't interfere. This is the "true" time cost of your operation. The average is inflated by random delays, making it less reproducible. Minimum is standard in Python benchmarking.

Can I time a function call instead of a code string?

Yes, pass a callable instead of a string: timeit.timeit(stmt=my_function, number=100000). The function is called with no arguments, so if you need to pass parameters, use lambda: timeit.timeit(stmt=lambda: my_function(arg1, arg2), number=100000).
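As a sketch of both ways to bind arguments, the example below times the same call through a lambda and through functools.partial. The function scale and its arguments are hypothetical stand-ins for your own code.

```python
import timeit
from functools import partial

def scale(values, factor):
    """Hypothetical function under test: multiply every value by factor."""
    return [v * factor for v in values]

data = list(range(1000))
number = 1000  # illustrative run count

# Option 1: wrap the call in a lambda that takes no arguments
t_lambda = timeit.timeit(stmt=lambda: scale(data, 3), number=number)

# Option 2: bind the arguments ahead of time with functools.partial
t_partial = timeit.timeit(stmt=partial(scale, data, 3), number=number)

print(f"lambda:  {t_lambda / number * 1e6:.2f} usec per call")
print(f"partial: {t_partial / number * 1e6:.2f} usec per call")
```

Either wrapper adds a small amount of call overhead to every iteration. For very fast functions that overhead can be a noticeable share of the result, so use the same wrapper style for every alternative you compare.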

What if timeit takes too long because the code is slow?

timeit has no timeout; it simply keeps going until all number executions have finished. Reduce the number parameter. If sorted([...]) is slow, try number=100 instead of number=1000000. Estimate the total time first: at roughly 10 ms per run, 100 runs take about 1 second.
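If you would rather not guess, Timer.autorange() (available since Python 3.6) chooses the run count for you by increasing it until one measurement takes at least 0.2 seconds. The statement and setup below are illustrative.

```python
import timeit

timer = timeit.Timer(
    stmt="sorted(data)",
    # Setup runs before timing starts and is not part of the measurement
    setup="import random; data = random.sample(range(10_000), 10_000)",
)

# autorange() returns the run count it settled on and the total time taken
number, total = timer.autorange()

print(f"autorange chose number={number}")
print(f"Total: {total:.3f} s, about {total / number * 1e3:.3f} ms per call")
```

The chosen number can then be passed to timeit.repeat() to collect several measurements at a sensible run count.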

Further Reading