Performance optimization in Python — best practices and tools
The automated pipelines from article 158 catch regressions, yet latency problems usually originate from asymptotic inefficiency (such as O(n²) merges), I/O stalls, or thrashing C extensions. Optimization starts with measurement, not instinct.
📚 Prerequisites
- Ability to reason about asymptotic complexity at a conversational level.
🎯 What you'll learn
- Use time.perf_counter for microbench discipline.
- Reach for cProfile/pyinstrument when hotspots span functions.
- Know when rewriting in Cython/Rust extensions is folly vs necessary.
Measure first
import time
payload = range(500_000)
# perf_counter is a high-resolution clock meant for timing short durations
start = time.perf_counter()
sum(value * value for value in payload)
print(f"elapsed: {time.perf_counter() - start:.4f} s")
Wrap experiments in repeatable scripts or Jupyter cells with fixed seeds; otherwise noise dominates the measurement.
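One way to make the measurement repeatable is to let the standard-library timeit module run the workload several times. The following is a minimal sketch, not a definitive benchmark harness: the function name sum_of_squares and the repeat counts are illustrative assumptions, and the workload mirrors the snippet above.

```python
import statistics
import timeit

RUNS_PER_SAMPLE = 3  # arbitrary illustrative choice
SAMPLE_COUNT = 5     # arbitrary illustrative choice


def sum_of_squares(n: int = 500_000) -> int:
    """Workload under test (hypothetical name, same computation as above)."""
    return sum(value * value for value in range(n))


# Each sample is the total time for RUNS_PER_SAMPLE consecutive calls.
samples = timeit.repeat(sum_of_squares, repeat=SAMPLE_COUNT, number=RUNS_PER_SAMPLE)
per_call = [sample / RUNS_PER_SAMPLE for sample in samples]

# The minimum is the sample least disturbed by other system activity;
# the median gives a sense of the typical run.
print(f"best:   {min(per_call):.4f} s per call")
print(f"median: {statistics.median(per_call):.4f} s per call")
```

Reporting the minimum alongside the median makes it easier to tell a real change from background jitter when comparing two versions of the same code.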
Profiling toolchain sketch
python -m cProfile -o stats.prof scripts/heavy_job.py
Inspect stats.prof visually with snakeviz or as text with pstats. Look for cumulative time concentrated in low-level helpers that you did not expect to be expensive.
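The textual route can also be driven from inside Python. The sketch below profiles a stand-in workload in-process and prints the entries with the highest cumulative time; heavy_job and slow_helper are hypothetical placeholders, since the contents of scripts/heavy_job.py are not shown here.

```python
import cProfile
import io
import pstats


def slow_helper(n: int) -> int:
    """Hypothetical low-level helper that does the real work."""
    return sum(i * i for i in range(n))


def heavy_job() -> int:
    """Hypothetical stand-in for the profiled script."""
    return sum(slow_helper(2_000) for _ in range(200))


profiler = cProfile.Profile()
profiler.enable()
heavy_job()
profiler.disable()

# Sort by cumulative time and print at most the ten most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats(pstats.SortKey.CUMULATIVE).print_stats(10)
report = buffer.getvalue()
print(report)
```

A file written with the -o flag can be loaded the same way by passing its path, for example pstats.Stats("stats.prof"), and then sorted and printed as above.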
Common wins before exotic extensions:
- Replace Python-level tight loops manipulating lists with NumPy vector ops.
- Cache redundant pure-function calls (functools.lru_cache) when inputs repeat.
- Stream large files rather than .read() everything.
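Two of these wins need nothing beyond the standard library. The sketch below illustrates caching and streaming under stated assumptions: fib and count_matching_lines are hypothetical examples, and the log file is generated on the spot purely for the demo.

```python
import functools
import os
import tempfile


@functools.lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Pure function whose inputs repeat heavily; the cache removes the rework."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def count_matching_lines(path: str, needle: str) -> int:
    """Iterate over the file line by line instead of loading it with .read()."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if needle in line)


print(fib(80))           # finishes quickly because each n is computed once
print(fib.cache_info())  # hit/miss counters show how much work was reused

# Demo on a throwaway file; the contents are made up for illustration.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.writelines(
        f"line {i} {'ERROR' if i % 10 == 0 else 'ok'}\n" for i in range(1_000)
    )
matches = count_matching_lines(tmp.name, "ERROR")
os.remove(tmp.name)
print(matches)
```

Iterating over the file handle keeps memory use roughly constant regardless of file size, while lru_cache trades memory for time and only suits functions whose result depends solely on their arguments.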
💡 Key takeaways
- “Fast enough” hinges on SLA and hardware budget—profile user-visible paths first.
➡️ Next steps
Close the book mindset loop with idiomatic Python — PEP 8 recap.