
Generators vs Lists: Memory Efficiency Trade-off

Generators compute values on demand instead of storing entire lists in memory. A generator expression (x**2 for x in range(1000000)) is an object of a few hundred bytes at most, whatever the size of the range. The list comprehension [x**2 for x in range(1000000)] holds roughly 40 MB: about 8 MB for the list itself plus the integer objects it references. Understanding generators changes how you process large datasets, streams, and pipelines.

Lists: Materialized in Memory

A list comprehension creates and stores all values immediately:

import sys

# Creates a list with 1 million integers in memory
squares_list = [x**2 for x in range(1000000)]

# getsizeof counts only the list's array of references (~8 MB);
# the integer objects bring the total to roughly 40 MB
print(sys.getsizeof(squares_list))

# Iterate over the list
for square in squares_list:
    print(square)

Every element is computed, stored, and remains in memory until the list is deleted. Large lists consume significant RAM.
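One caveat when measuring: sys.getsizeof counts only the list's own array of references, not the objects those references point to. The sketch below estimates the fuller figure. The helper name approx_list_size is hypothetical, and the estimate assumes the elements are distinct objects (it over-counts shared ones such as cached small integers).

```python
import sys

def approx_list_size(items):
    """Approximate bytes used by a list plus its elements (hypothetical helper).

    Over-counts if the same object appears in the list more than once.
    """
    return sys.getsizeof(items) + sum(sys.getsizeof(item) for item in items)

# A smaller list than the article's example, to keep the run short
squares_list = [x**2 for x in range(100_000)]

container_bytes = sys.getsizeof(squares_list)
total_bytes = approx_list_size(squares_list)

print(f"container only:       {container_bytes:,} bytes")
print(f"container + elements: {total_bytes:,} bytes")
```

The second figure is the one that matters when deciding whether a dataset fits in RAM.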

Generators: Computed On-Demand

A generator expression computes values one at a time as you iterate:

import sys

# Creates a generator; no values are computed or stored yet
squares_gen = (x**2 for x in range(1000000))

# The generator object is tiny: a few hundred bytes at most
print(sys.getsizeof(squares_gen))  # ~100-200 bytes, varies by Python version

# Iterate: each value is computed, used, then discarded
for square in squares_gen:
    print(square)  # Only one square in memory at a time

The generator doesn't store values. It's a blueprint for producing them on-demand. Memory usage is constant regardless of iteration size.
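To make the on-demand behavior concrete, here is a small sketch that pulls values by hand with the built-in next() and then shows what happens once the generator is exhausted. The variable names are illustrative only.

```python
squares_gen = (x**2 for x in range(3))

first = next(squares_gen)             # computes 0**2 only now
second = next(squares_gen)            # computes 1**2
remaining = list(squares_gen)         # consumes whatever is left
after_exhaustion = list(squares_gen)  # nothing left to produce

print(first, second, remaining, after_exhaustion)
# prints: 0 1 [4] []
```

Each value exists only once it is requested, and a spent generator silently yields nothing on a second pass.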

Performance Comparison

import timeit
import sys

def benchmark_list_vs_gen():
    # List: Creates and stores 1 million integers
    list_time = timeit.timeit(
        "[x**2 for x in range(1000000)]",
        number=10
    )

    # Generator: Creates generator object (instant)
    gen_time = timeit.timeit(
        "(x**2 for x in range(1000000))",
        number=10
    )

    print(f"List creation: {list_time:.3f}s (allocates 40 MB)")
    print(f"Generator creation: {gen_time:.6f}s (allocates 128 bytes)")
    print(f"List creation is {list_time/gen_time:.0f}x slower")

benchmark_list_vs_gen()

Output:

List creation: 0.523s (allocates 40 MB)
Generator creation: 0.000008s (allocates 128 bytes)
List creation is 65375x slower

Creating a generator is nearly free because no values are computed until you iterate: the work is deferred, not avoided. Creating a list costs memory and CPU time proportional to its size up front.
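The benchmark above times construction only. A fairer comparison consumes every value in both cases. The sketch below does that with sum(); the size N is an assumption chosen to keep the run short, and the timings will vary by machine, so no expected output is shown.

```python
import timeit

N = 100_000  # smaller than the 1,000,000 used above, to keep the run short

# Both variants compute every square and add them up
list_total = timeit.timeit(lambda: sum([x**2 for x in range(N)]), number=5)
gen_total = timeit.timeit(lambda: sum(x**2 for x in range(N)), number=5)

print(f"sum over list:      {list_total:.3f}s")
print(f"sum over generator: {gen_total:.3f}s")
```

When the whole sequence is consumed, the two usually land in the same ballpark on CPython. The generator's advantage is memory, not raw speed.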

Using yield to Create Generators

Define a generator function with yield instead of return:

# List version: stores all results in memory
def get_numbers_list(n):
    result = []
    for i in range(n):
        result.append(i)
    return result

# Generator version: computes on demand
def get_numbers_gen(n):
    for i in range(n):
        yield i

# Usage is identical
for num in get_numbers_gen(5):
    print(num)  # prints 0, 1, 2, 3, 4

When a function contains yield, calling it returns a generator object instead of running the body. Calling get_numbers_gen(1000000) creates a generator instantly without computing any numbers.
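The deferred execution can be observed directly. In this sketch, a hypothetical generator function get_numbers_logged records each step it takes in a list named calls, so you can see that nothing runs at creation time.

```python
calls = []  # records when the generator body actually runs

def get_numbers_logged(n):
    calls.append("started")
    for i in range(n):
        calls.append(f"yield {i}")
        yield i

gen = get_numbers_logged(3)
calls_after_creation = list(calls)  # still empty: the body has not started

first = next(gen)                   # runs the body up to the first yield
calls_after_first = list(calls)

print(calls_after_creation)  # prints: []
print(calls_after_first)     # prints: ['started', 'yield 0']
```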

Generator Chaining: Efficient Pipelines

Generators enable efficient data pipelines where data flows through multiple stages:

# Without generators: materializes all intermediate lists
def process_file_list(filepath):
    # Stage 1: read all lines
    with open(filepath) as f:
        lines = f.readlines()  # reads entire file into memory

    # Stage 2: split all lines
    split_lines = [line.strip().split() for line in lines]

    # Stage 3: keep lines with more than 5 words
    long_lines = [line for line in split_lines if len(line) > 5]

    return long_lines

# With generators: only one line in memory at a time
def process_file_gen(filepath):
    # Stage 1: read line by line
    def read_lines():
        with open(filepath) as f:
            for line in f:
                yield line.strip()

    # Stage 2: split lines
    def split_lines(lines):
        for line in lines:
            yield line.split()

    # Stage 3: keep lines with more than 5 words
    def filter_long_lines(lines):
        for line in lines:
            if len(line) > 5:
                yield line

    # Pipeline: connect stages
    return filter_long_lines(split_lines(read_lines()))

# Use the pipeline (process_line stands in for your own handler)
for line in process_file_gen('large_file.txt'):
    process_line(line)  # Only one line in memory

The generator pipeline reads one line, splits it, filters it, and yields the result. Then it reads the next line. Memory usage stays constant regardless of file size.
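The interleaving described above can be traced with a toy pipeline. This sketch uses two hypothetical stages, source and doubled, over an in-memory list rather than a file, and logs every step to an events list.

```python
events = []  # ordered log of what each stage does

def source(items):
    for item in items:
        events.append(f"read {item}")
        yield item

def doubled(items):
    for item in items:
        events.append(f"double {item}")
        yield item * 2

for value in doubled(source([1, 2])):
    events.append(f"consume {value}")

print(events)
# prints: ['read 1', 'double 1', 'consume 2', 'read 2', 'double 2', 'consume 4']
```

The first item travels through every stage before the second item is even read, which is why only one item needs to be in flight at a time.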

Filtering with Generators

Use generators for filtering large datasets:

# BAD: builds the full filtered list up front
def get_users_over_age_list(users, min_age):
    return [user for user in users if user['age'] > min_age]

# Placeholder data: `...` stands for further user dicts
users = [{'name': 'Alice', 'age': 30}, ...] * 1000000 # 1 million users
old_users = get_users_over_age_list(users, 50)  # allocates a list holding every match

# GOOD: filters lazily as a generator
def get_users_over_age_gen(users, min_age):
    for user in users:
        if user['age'] > min_age:
            yield user

for user in get_users_over_age_gen(users, 50):
    process_user(user)  # no filtered list is ever built

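The generator version yields matching users one at a time and never materializes the filtered result. Note that the source list users is still fully in memory here; the saving is the second, filtered list. If you only need a few matches before stopping, the generator also skips scanning the rest of the data.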
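Early termination can be sketched with itertools.islice, which stops pulling from a generator after a fixed number of items. The stats counter, the sample data, and the function name users_over_age are assumptions made for illustration.

```python
from itertools import islice

stats = {"examined": 0}  # counts how many users the filter looked at

def users_over_age(users, min_age):
    for user in users:
        stats["examined"] += 1
        if user["age"] > min_age:
            yield user

# Synthetic data: ages cycle through 20..79
users = [{"name": f"user{i}", "age": 20 + i % 60} for i in range(10_000)]

# Take only the first three matches, then stop
first_three = list(islice(users_over_age(users, 50), 3))

print([u["name"] for u in first_three])  # prints: ['user31', 'user32', 'user33']
print(stats["examined"])                 # prints: 34
```

Only 34 of the 10,000 users are examined. A list comprehension would have scanned all of them before returning anything.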

Trading CPU for Memory

Generators use CPU on-demand, which can be slower if you iterate multiple times:

# Each call creates a fresh generator that recomputes every value
def fibonacci_gen(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

# First iteration: computes all values
result1 = list(fibonacci_gen(100))

# Second iteration: recomputes all values
result2 = list(fibonacci_gen(100))

A generator can be consumed only once, so every extra pass means creating a new one and recomputing every value. If you need to access the data multiple times, a list is better.

However, for single-pass iteration (reading a file, processing a stream), generators are superior.
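The trade-off can be seen in a few lines. This sketch reuses the fibonacci_gen function from above: a second pass over the same generator object finds nothing, while a cached list supports as many passes as needed.

```python
def fibonacci_gen(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

fib = fibonacci_gen(10)
first_pass = sum(fib)   # 0+1+1+2+3+5+8+13+21+34
second_pass = sum(fib)  # the generator is exhausted, so nothing is summed

cached = list(fibonacci_gen(10))            # pay the memory cost once
total, largest = sum(cached), max(cached)   # two passes, no recomputation

print(first_pass, second_pass)  # prints: 88 0
print(total, largest)           # prints: 88 34
```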

Generator Expressions

Generator expressions are compact syntax for simple generators:

# Generator expression (compact, lazy)
squares_gen = (x**2 for x in range(10))

# Equivalent generator function (more readable for complex logic)
def squares_func():
    for x in range(10):
        yield x**2

squares_from_func = squares_func()

# Both produce identical results
for val, val_from_func in zip(squares_gen, squares_from_func):
    print(val, val_from_func)

Use generator expressions for simple transformations (map/filter). Use generator functions for complex logic.
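Generator expressions are most convenient when passed straight to a function that consumes an iterable, such as sum, any, or max. When the expression is the sole argument, the extra parentheses can be dropped. The word list below is made up for illustration.

```python
words = ["generator", "list", "yield", "iterator"]

total_chars = sum(len(w) for w in words)   # no intermediate list of lengths
has_long = any(len(w) > 8 for w in words)  # stops at the first match
longest_len = max(len(w) for w in words)

print(total_chars, has_long, longest_len)
# prints: 26 True 9
```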

Processing Multiple Generators

A single generator expression can combine several for clauses and a filter:

# Pipeline: read file, strip lines, split words, filter
words = (
    word
    for line in open('file.txt')
    for word in line.strip().split()
    if len(word) > 3
)

# Only one word in memory at a time
for word in words:
    process_word(word)

This reads the file line-by-line, extracts words, filters short words, and processes them all without materializing any intermediate lists.
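One thing the expression form cannot do is close the file deterministically, because open() is called without a with block. A generator function can. The sketch below is one way to write it; the function name long_words, the min_len parameter, and the temporary sample file are assumptions for the demo.

```python
import os
import tempfile

def long_words(path, min_len=3):
    """Yield words longer than min_len; the file closes when iteration ends."""
    with open(path) as f:
        for line in f:
            for word in line.strip().split():
                if len(word) > min_len:
                    yield word

# Write a small sample file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the quick brown fox\njumps over a lazy dog\n")
    tmp_path = tmp.name

found = list(long_words(tmp_path))
os.remove(tmp_path)

print(found)
# prints: ['quick', 'brown', 'jumps', 'over', 'lazy']
```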

When to Use Lists vs Generators

| Scenario | Use List | Use Generator |
| --- | --- | --- |
| Single iteration of large data | No | Yes |
| Multiple iterations needed | Yes | No |
| Data fits in memory | Yes | Either |
| Data exceeds available RAM | No | Yes |
| Indexing (e.g., data[5]) | Yes | No |
| Length query (e.g., len(data)) | Yes | No |
| Early termination | Either (yes for speed) | Yes (more efficient) |

Combining Generators with itertools

The itertools module provides powerful generator utilities:

import itertools

# Combine multiple generators
gen1 = (x for x in range(5))
gen2 = (x for x in range(5, 10))
combined = itertools.chain(gen1, gen2)
# yields 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

# Repeat a value a fixed number of times (indefinitely if the count is omitted)
repeated = itertools.repeat('x', 3)
# yields 'x', 'x', 'x'

# Cycle through list repeatedly
cycled = itertools.cycle([1, 2, 3])
# yields 1, 2, 3, 1, 2, 3, ...

# Compress list with selector
data = ['a', 'b', 'c', 'd']
selector = [True, False, True, False]
compressed = itertools.compress(data, selector)
# yields 'a', 'c'

These tools create efficient pipelines without intermediate lists.
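Two more itertools functions, islice and takewhile, are useful for taking a bounded slice of an unbounded generator. The sketch below applies them to an infinite stream of even numbers built with itertools.count; the variable names are illustrative.

```python
import itertools

# Infinite generator: never materialized, only advanced on request
evens = (n for n in itertools.count() if n % 2 == 0)

first_five = list(itertools.islice(evens, 5))
# evens resumes where islice stopped
below_20 = list(itertools.takewhile(lambda n: n < 20, evens))

print(first_five)  # prints: [0, 2, 4, 6, 8]
print(below_20)    # prints: [10, 12, 14, 16, 18]
```

Note that takewhile consumes the first value that fails its test (20 here), so that value is not available to later consumers of evens.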

Memory Benchmarking

Measure the difference:

import tracemalloc

# List version
tracemalloc.start()
squares_list = [x**2 for x in range(1000000)]
current, peak = tracemalloc.get_traced_memory()
print(f"List memory: {peak / 1024 / 1024:.1f} MB")
tracemalloc.stop()

# Generator version (measuring generator creation)
tracemalloc.start()
squares_gen = (x**2 for x in range(1000000))
current, peak = tracemalloc.get_traced_memory()
print(f"Generator memory: {peak / 1024 / 1024:.1f} MB")
tracemalloc.stop()

Output:

List memory: 45.3 MB
Generator memory: 0.0 MB

Generators use negligible memory. Lists allocate memory proportional to size.
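The measurement above captures the generator at creation, before any value is produced. To compare peak memory while the values are actually consumed, a sketch along these lines can be used. The helper peak_bytes and the size N are assumptions, and exact figures vary by Python version, so no output is shown.

```python
import tracemalloc

N = 200_000  # kept modest because tracing slows execution

def peak_bytes(work):
    """Run work() under tracemalloc and return the peak traced bytes."""
    tracemalloc.start()
    work()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# Both variants compute the same sum; only the list holds all values at once
list_peak = peak_bytes(lambda: sum([x**2 for x in range(N)]))
gen_peak = peak_bytes(lambda: sum(x**2 for x in range(N)))

print(f"list peak:      {list_peak / 1024:.1f} KiB")
print(f"generator peak: {gen_peak / 1024:.1f} KiB")
```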

Key Takeaways

  • Generators compute values on-demand using O(1) memory; lists materialize all values using O(n) memory
  • Create generators with yield statements or generator expressions (x for x in ...)
  • Chain generators for efficient pipelines where data flows through multiple stages without materializing intermediate lists
  • Use generators for single-pass iteration of large data; use lists when you need to iterate multiple times or access by index
  • Generator expressions are compact and lazy; generator functions offer clearer code for complex logic

Frequently Asked Questions

Can I get the length of a generator with len()?

No. Generators don't know their length until they're exhausted. To get length, materialize to a list with len(list(generator)), but this defeats the memory advantage. If you need length, generators aren't the right choice.
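If only the count is needed and the values themselves can be thrown away, one option is to count while iterating instead of building a list. This sketch uses an arbitrary example generator; note that counting still consumes it.

```python
gen = (x for x in range(1000) if x % 7 == 0)

count = sum(1 for _ in gen)  # constant memory, but exhausts the generator
leftover = list(gen)

print(count)     # prints: 143
print(leftover)  # prints: []
```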

Can I reset a generator to the beginning?

No. Generators are consumed once. To iterate multiple times, either recreate the generator or use a list. If you need to reset, generators aren't appropriate.

What if I want to iterate a generator multiple times?

Convert to a list (losing the memory advantage) or wrap the generator's creation in a function and call it again for each pass. For repeated iteration over data that fits in memory, a list is usually the right choice despite the higher memory use.
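Recreating a generator for each pass can be sketched with a small factory function. The name make_squares is hypothetical; each call returns a fresh, unconsumed generator.

```python
def make_squares(n):
    """Return a new generator of squares each time it is called."""
    return (x**2 for x in range(n))

total = sum(make_squares(5))    # first pass over a fresh generator
largest = max(make_squares(5))  # second pass recomputes the values

print(total, largest)
# prints: 30 16
```

This keeps memory constant at the cost of recomputing the values on every pass.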

Are generator expressions faster than generator functions?

Not meaningfully. Both produce generator objects and compile to very similar bytecode, so any speed difference is negligible. Choose based on readability: generator expressions for simple operations, generator functions for complex logic.

Can I use generators with NumPy?

Yes, but NumPy operates on arrays, so a generator has to be materialized first (for example with numpy.fromiter). If you're doing numerical work, NumPy arrays are faster. Use generators for I/O-bound tasks (file processing, network streams) where memory efficiency matters most.

Further Reading