Subinterpreters vs Threads: When to Use Each

Choosing between threads, processes, and subinterpreters shapes your architecture. The GIL made this simple: threads for I/O, processes for CPU work. Free-threaded Python and subinterpreters add a third dimension. I've architected systems using all three; this article distills when each is optimal.

A thread is a lightweight unit of execution that shares its process's address space. On GIL-bound Python, all threads in an interpreter also share one GIL; on free-threaded Python there is no GIL at all. A process is a fully isolated OS-level instance with its own memory and kernel resources. A subinterpreter is a Python-level isolated environment within one process, sharing the OS address space but with its own namespace and, since Python 3.12 (PEP 684), its own GIL. The trade-offs differ dramatically.

Quick Comparison Table

Aspect	Threads	Processes	Subinterpreters
Overhead	<1 MB per thread	~10-50 MB per process	~1-5 MB per interpreter
Startup latency	<1 ms	50-500 ms	<1 ms (after Python init)
Data sharing	Direct (shared memory)	Serialization (IPC)	Channels (serialization)
Isolation	No (shared heap)	Yes (separate OS process)	Yes (separate Python namespace)
GIL contention	On GIL-bound Python (bad)	None (separate GILs)	None (per-interpreter GILs)
Parallelism (CPU-bound, GIL)	Single-threaded (1x core)	N processes (Nx cores)	N interpreters (Nx cores, per-interpreter GIL)
Parallelism (I/O-bound)	N threads (N concurrent I/O)	N processes (overhead)	N interpreters (overhead)
Context switches	Fast (shared heap)	Slow (kernel, TLB flush)	Medium (Python level)
Debugging	Simple (shared debugger)	Complex (per-process gdb)	Medium (inspect per interpreter)

I/O-Bound Work: Use Threads

Threads shine for I/O-bound tasks: network requests, file reads, database queries. The GIL releases during syscalls, so hundreds of threads can wait on I/O concurrently without blocking others.

Example: web scraper fetching 100 URLs in parallel.

import requests  # third-party: pip install requests
from concurrent.futures import ThreadPoolExecutor

urls = [
    f"https://jsonplaceholder.typicode.com/posts/{i}" 
    for i in range(1, 101)
]

def fetch_url(url):
    """Fetch a URL and return its size."""
    try:
        resp = requests.get(url, timeout=5)
        return len(resp.content)
    except Exception as e:
        print(f"Error fetching {url}: {e}")
        return 0

# ThreadPoolExecutor is the pythonic way
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(fetch_url, urls))

total_size = sum(results)
print(f"Fetched {len(urls)} URLs, total size: {total_size} bytes")

This works on any Python version, GIL or free-threaded. The GIL doesn't bottleneck because threads release it during network I/O. Ten threads waiting on network calls don't compete for the GIL.
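
To see the effect without touching the network, here is a minimal sketch that simulates blocking I/O with time.sleep, which releases the GIL the same way a socket read does. The names fake_io and timed_run are illustrative, not part of any library, and the timings will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    """Stand-in for a blocking network call; time.sleep releases the GIL."""
    time.sleep(delay)
    return delay

def timed_run(n_tasks, workers, delay=0.1):
    """Run n_tasks simulated I/O calls on a thread pool; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_io, [delay] * n_tasks))
    return time.perf_counter() - start

serial = timed_run(n_tasks=10, workers=1)
pooled = timed_run(n_tasks=10, workers=10)
print(f"1 worker: {serial:.2f}s, 10 workers: {pooled:.2f}s")
```

With one worker the waits add up; with ten workers they overlap, so the pooled run should finish in roughly the time of a single call.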

Use threads for:

Network I/O (HTTP, WebSocket, gRPC).
File I/O (disk reads/writes).
Database queries (psycopg2, PyMySQL; these release the GIL during blocking calls).
Any async-like pattern with concurrent.futures.ThreadPoolExecutor.

CPU-Bound Work: GIL-Bound Python = Processes; Free-Threaded = Threads or Subinterpreters

On GIL-bound Python, threads take turns holding the GIL, so CPU-bound work is effectively serialized. You have two options: use multiprocessing, or accept single-core execution.

import multiprocessing
import time

def fibonacci(n):
    """Compute Fibonacci(n) recursively (CPU-bound)."""
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

if __name__ == "__main__":
    # Multiprocessing (one process per core)
    with multiprocessing.Pool(processes=4) as pool:
        start = time.time()
        results = pool.map(fibonacci, [35] * 4)
        elapsed = time.time() - start
        print(f"Multiprocessing (4 cores): {elapsed:.2f}s")

On free-threaded Python, threads parallelize CPU work. Choose based on overhead:

import threading
import time

def fibonacci(n):
    """CPU-bound computation."""
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# Threads on free-threaded Python (no GIL)
start = time.time()
threads = []
for _ in range(4):
    t = threading.Thread(target=fibonacci, args=(35,))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print(f"Threads (free-threaded, 4 cores): {elapsed:.2f}s")
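
Whether a threaded CPU-bound run actually executes in parallel depends on the build it runs on. A minimal sketch of a runtime check follows, assuming CPython: sysconfig reports whether the build is free-threaded, and sys._is_gil_enabled() (available from 3.13) reports whether the GIL is active right now. The helper name gil_status is illustrative.

```python
import sys
import sysconfig

def gil_status():
    """Return (free_threaded_build, gil_enabled_now) for the running interpreter."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+; older versions always have a GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(check()) if check is not None else True
    return free_threaded_build, gil_enabled

build, gil = gil_status()
print(f"free-threaded build: {build}, GIL enabled: {gil}")
```

The two values can differ: a free-threaded build can still run with the GIL enabled, for example when an imported extension module does not declare free-threading support.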

Free-threaded threads offer:

Lower overhead (~0.5 MB vs ~20 MB per process).
Faster startup (~1 ms vs ~100 ms per process).
Shared address space (useful for read-only data: code, pre-loaded models).

Subinterpreters offer similarly low overhead plus isolation, although Python objects are not shared between interpreters. The example below uses the concurrent.interpreters module, which is public from Python 3.14 (PEP 734); on 3.12 and 3.13 the functionality is only exposed through private modules.

from concurrent import interpreters  # Python 3.14+
import threading
import time

code_template = """
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

result = fibonacci(35)
"""

# Create 4 subinterpreters
interps = [interpreters.create() for _ in range(4)]

start = time.time()
threads = []
for interp in interps:
    # Each interpreter runs the code in its own OS thread, under its own GIL
    t = threading.Thread(target=interp.exec, args=(code_template,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()
elapsed = time.time() - start
print(f"Subinterpreters (4 cores): {elapsed:.2f}s")

# Clean up
for interp in interps:
    interp.close()

When to Use Subinterpreters

Subinterpreters are best when you need isolation + low overhead. Scenarios:

Multi-tenant systems: Each tenant's code runs in its own interpreter, so an unhandled exception or corrupted module state in one does not affect the others. A crash in C code still takes down the whole process.
Batch job pooling: Pre-create a pool of workers, pre-load heavy libraries, and dispatch tasks to avoid startup overhead. Check that each C extension you rely on supports subinterpreters; not all do.
Sandboxed plugins: Run user-supplied code in an interpreter that you can destroy afterward. This isolates Python state; it is not a security boundary.
Mixed I/O and CPU work: A subinterpreter thread doing both network requests (I/O) and image processing (CPU) contends only for its own interpreter's GIL, not with threads running in other interpreters.
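
The batch-pooling scenario can be sketched with concurrent.futures.InterpreterPoolExecutor, which was added in Python 3.14. This is a sketch under assumptions, not a definitive setup: the fallback to ThreadPoolExecutor is only there so the example runs on older versions, run_batch is an illustrative name, and math.factorial stands in for real work because it is importable in every interpreter.

```python
import math

try:
    # Python 3.14+: each worker runs in its own subinterpreter.
    from concurrent.futures import InterpreterPoolExecutor as WorkerPool
except ImportError:
    # Assumption: fall back to threads so the sketch runs on older versions.
    from concurrent.futures import ThreadPoolExecutor as WorkerPool

def run_batch(values, workers=4):
    """Dispatch math.factorial over a pool of workers and collect the results."""
    with WorkerPool(max_workers=workers) as pool:
        return list(pool.map(math.factorial, values))

results = run_batch([5, 10, 15])
print(results)
```

Because work crosses an interpreter boundary, the callable and its arguments have to be transferable between interpreters; functions importable from a module and plain data such as ints and strings are the safe choices.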

Example: plugin system.

from concurrent import interpreters  # Python 3.14+ (PEP 734)

def create_plugin_env(name):
    """Create an isolated environment for a plugin."""
    interp = interpreters.create()
    
    # Pre-load common utilities into the interpreter's __main__ namespace
    interp.exec("""
import json
import time

loaded_plugins = []
""")
    
    return interp

def load_plugin(interp, plugin_code):
    """Load and run user plugin code in isolation."""
    try:
        interp.exec(plugin_code)
        return True
    except interpreters.ExecutionFailed as e:
        # Raised when the code inside the interpreter ends with an uncaught exception
        print(f"Plugin error: {e}")
        return False

# User-supplied plugin (potentially buggy)
plugin = """
def process_data(data):
    return json.dumps({"input": data, "timestamp": time.time()})

result = process_data({"x": 1, "y": 2})
print(f"Plugin result: {result}")
"""

# Run in isolated interpreter
interp = create_plugin_env("user_plugin")
load_plugin(interp, plugin)
interp.close()

Decision Tree: Which to Use?

Is the work I/O-bound (network, disk, DB)? → Use threads (simplest, lowest latency).
Is the work CPU-bound?
- Running on GIL-bound Python? → Use processes (multiprocessing.Pool).
- Running on free-threaded Python?
  - Need isolation (plugins, multi-tenant)? → Use subinterpreters.
  - Just need parallelism? → Use threads (simpler, lower overhead).
Do I need a worker pool (preload models, avoid startup)? → Use subinterpreters with interpreters.create() at startup.
Do I need to limit resource usage or kill long-running tasks? → Use processes (OS-level limits) or subinterpreters (Python-level limit, but not as strong).
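
The first two branches of the tree can be written down as a small function. This is only a sketch of the rules as stated above; the name choose_concurrency and its parameters are illustrative, and real decisions also weigh pooling and resource limits.

```python
def choose_concurrency(io_bound, gil_enabled=True, needs_isolation=False):
    """Map the decision tree's questions to a recommended concurrency model."""
    if io_bound:
        # Network, disk, DB: threads are the simplest option on any build.
        return "threads"
    if gil_enabled:
        # CPU-bound on GIL-bound Python: sidestep the GIL with processes.
        return "processes"
    # CPU-bound on free-threaded Python: isolate only when you need to.
    return "subinterpreters" if needs_isolation else "threads"

print(choose_concurrency(io_bound=True))
print(choose_concurrency(io_bound=False, gil_enabled=True))
print(choose_concurrency(io_bound=False, gil_enabled=False, needs_isolation=True))
```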

Memory and Startup Comparison

Approximate figures for 10 workers on a typical machine (they vary with platform, Python version, and what each worker imports):

Approach	Startup Time	Memory per Worker
10 threads	<1 ms	0.5 MB
10 processes	~500 ms	25 MB
10 subinterpreters	~50 ms (if pre-created)	3 MB

Startup time dominates for short-lived tasks. Memory is a concern for large pools.
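
Numbers like these are easy to check on your own hardware. A minimal sketch follows, assuming that launching a fresh Python child with subprocess is a fair proxy for process startup; the function names are illustrative, and it measures a single start rather than a pool of ten.

```python
import subprocess
import sys
import threading
import time

def thread_startup_ms():
    """Time starting and joining one no-op thread, in milliseconds."""
    start = time.perf_counter()
    t = threading.Thread(target=lambda: None)
    t.start()
    t.join()
    return (time.perf_counter() - start) * 1000

def process_startup_ms():
    """Time launching a fresh Python process that exits immediately."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) * 1000

thread_ms = thread_startup_ms()
process_ms = process_startup_ms()
print(f"thread: {thread_ms:.2f} ms, process: {process_ms:.2f} ms")
```

Run it several times and take the median; a single measurement is noisy, especially the first process launch after a cold disk cache.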

Recommendation Summary

Web servers (HTTP requests): Use asyncio or threads with ThreadPoolExecutor. Threads are simpler; asyncio is more scalable.
Data processing (CPU-heavy): Use multiprocessing.Pool (GIL-bound) or threads (free-threaded).
Real-time services (latency-critical): Prefer free-threaded threads over processes (500 ms startup overhead is too much).
Batch jobs (throughput-critical): Multiprocessing or subinterpreter pools; startup cost is amortized.
Sandboxing (security): Use subinterpreters if you control the code, separate processes if you don't trust it.

Key Takeaways

Threads: low overhead, best for I/O; GIL limits CPU parallelism on GIL-bound Python.
Processes: high overhead, true isolation, mandatory for CPU work on GIL-bound Python.
Subinterpreters: medium overhead, isolation + shared address space, best for worker pools and for isolating plugins you trust. Each has its own GIL (Python 3.12+), so they do not require a free-threaded build.
Choose based on workload type (I/O vs CPU), isolation needs, and startup latency requirements.

Frequently Asked Questions

Can I mix threads and processes in one app?

Yes. Use multiprocessing to spawn worker processes, and within each process, use threading for I/O-bound tasks. Common in request handlers that do both network calls and spawning subprocess utilities.

Why would I use subinterpreters instead of processes?

Subinterpreters offer lower overhead (startup, memory) and shared address space (useful for caching, pre-loaded models). Use processes if you need OS-level isolation (security, resource limits) or if running untrusted code.

Does `asyncio` make threads obsolete?

No. asyncio is excellent for I/O-bound work in a single thread. Threads are better for blocking I/O that asyncio cannot await (e.g., blocking database drivers that don't cooperate with asyncio). Use whichever fits your libraries and code style.
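
The two also combine well. A minimal sketch, assuming Python 3.9+ for asyncio.to_thread: the blocking call runs on a worker thread while the event loop stays free. Here blocking_query is a hypothetical stand-in for a driver call, not a real database API.

```python
import asyncio
import time

def blocking_query(user_id):
    """Hypothetical stand-in for a blocking database driver call."""
    time.sleep(0.05)  # simulates waiting on the database
    return {"id": user_id, "name": f"user{user_id}"}

async def fetch_users(user_ids):
    """Run each blocking query on a worker thread and await them together."""
    tasks = [asyncio.to_thread(blocking_query, uid) for uid in user_ids]
    return await asyncio.gather(*tasks)

users = asyncio.run(fetch_users([1, 2, 3]))
print(users)
```

asyncio.gather returns results in the order the awaitables were passed, so the output lines up with the input IDs even if the threads finish out of order.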

What if my code uses both CPU-bound and I/O-bound work?

Use threads for I/O. For the CPU part, use a thread pool on free-threaded Python or a multiprocessing pool on GIL-bound Python. Libraries like Ray can manage process-based task distribution for you.

Is free-threaded Python production-ready in 2026?

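Increasingly, with caveats. Free-threaded builds were experimental in Python 3.13 and became officially supported, though still optional rather than the default, in 3.14 (PEP 779). Many major libraries (NumPy, pandas, Torch) ship or are adding free-threaded wheels, but some C extensions still lack support, and importing one can re-enable the GIL at runtime. Test in staging first.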
