Per-Interpreter GIL: Isolating State and Avoiding Deadlocks
The per-interpreter GIL gives each subinterpreter its own lock, so threads running in Interpreter A don't block threads in Interpreter B. (It is a different mechanism from free-threaded Python, which removes the GIL instead of giving each interpreter its own.) This isolation is powerful but introduces new deadlock patterns. I've debugged production deadlocks where one thread held Interpreter A's GIL while waiting for a channel message from Interpreter B, which was blocked trying to send. This article teaches the mental model and patterns to avoid such traps.
A subinterpreter's GIL protects its object heap and reference counts. Threads running code in that interpreter must hold the GIL. Unlike the process-wide GIL, there's no global serialization point. Code in Interpreter A and Interpreter B truly runs in parallel.
The Per-Interpreter GIL: Isolation Semantics
Each subinterpreter is a Python execution context with:
- Its own __main__ module and global namespace.
- Its own object heap (objects created in A are invisible to B).
- Its own GIL (a lock protecting A's reference counts).
Threads running code in A acquire A's GIL. Threads running code in B acquire B's GIL. The two locks are independent; no global serialization.
import interpreters
import threading
import time
# Create two interpreters
interp_a = interpreters.create()
interp_b = interpreters.create()
# CPU-bound code with a fixed amount of work, so GIL contention shows up as extra time
# (a loop bounded by wall-clock time would take the same time either way)
code_cpu_bound = """
total = 0
for i in range(20_000_000):
    total += i
print("Done computing")
"""
# Run CPU-bound code in both interpreters concurrently
t_a = threading.Thread(target=lambda: interpreters.run_string(interp_a, code_cpu_bound))
t_b = threading.Thread(target=lambda: interpreters.run_string(interp_b, code_cpu_bound))
start = time.time()
t_a.start()
t_b.start()
t_a.join()
t_b.join()
elapsed = time.time() - start
print(f"Both completed in {elapsed:.1f}s")
# With per-interpreter GILs: about the time of one run (parallel)
# With a single shared GIL: about the time of two runs (serialized)
interpreters.destroy(interp_a)
interpreters.destroy(interp_b)
With per-interpreter GILs, the two loops run in parallel, so the total is close to the time of a single run. With one GIL shared by both interpreters, the loops would take turns and the total would roughly double. Per-interpreter GILs arrived in CPython 3.12 (PEP 684) and do not require the free-threaded build. A supported Python-level API followed in 3.14 as concurrent.interpreters (PEP 734), and its names differ from the interpreters module used in these examples.
Isolation Guarantees
Within an interpreter, Python's semantics are unchanged:
- A function's local variables are isolated (thread-local by nature).
- Global variables are shared (multiple threads in the same interpreter access the same __main__ namespace).
- Object mutation is serialized by the GIL (only one thread holds it at a time).
Across interpreters:
- Direct object sharing is impossible (objects are tied to their interpreter's heap).
- Data must be pickled (serialization) or shared via low-level mechanisms (ctypes, memmap).
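What serialization means for the receiving side can be seen without any subinterpreter. The sketch below is an illustration only, not part of any interpreters API: it uses the standard pickle module and a hypothetical helper named send_copy to show that the receiver ends up with an equal but independent object.

```python
import pickle


def send_copy(obj):
    """Hypothetical helper: simulate crossing an interpreter boundary.

    The object is serialized to bytes and rebuilt, which is what a
    pickle-based transfer between interpreters amounts to.
    """
    payload = pickle.dumps(obj)    # bytes are what actually travel
    return pickle.loads(payload)   # the receiver rebuilds its own object


original = {"data": [1, 2, 3]}
received = send_copy(original)

# Equal in value, but not the same object
same_value = received == original
same_object = received is original

# Mutating the copy does not affect the sender's object
received["data"].append(4)
print(same_value, same_object, original["data"])  # prints: True False [1, 2, 3]
```

The practical consequence is that changes made on one side are never seen on the other unless they are sent back explicitly.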
Example: a name defined in one interpreter simply does not exist in the other:
import interpreters
# Create two interpreters
interp_a = interpreters.create()
interp_b = interpreters.create()
code_a = """
x = {"data": [1, 2, 3]}
"""
code_b = """
# x was defined in interp_a's __main__, not in this interpreter
# (there is no syntax for reaching into another interpreter)
print("x" in globals())  # False: x is not visible here
"""
interpreters.run_string(interp_a, code_a)
interpreters.run_string(interp_b, code_b)
interpreters.destroy(interp_a)
interpreters.destroy(interp_b)
Data must flow through channels:
import interpreters
import threading
send_id, recv_id = interpreters.create_channel()
code_a = f"""
import interpreters
send_id = {send_id}
x = {{"data": [1, 2, 3]}}
interpreters.channel_send(send_id, x)
"""
code_b = f"""
import interpreters
recv_id = {recv_id}
x = interpreters.channel_recv(recv_id)
print(f"Received: {{x}}")
"""
interp_a = interpreters.create()
interp_b = interpreters.create()
t_a = threading.Thread(target=lambda: interpreters.run_string(interp_a, code_a))
t_b = threading.Thread(target=lambda: interpreters.run_string(interp_b, code_b))
t_a.start()
t_b.start()
t_a.join()
t_b.join()
interpreters.destroy(interp_a)
interpreters.destroy(interp_b)
Deadlock Scenario 1: Channel Blocking with GIL
A classic deadlock: Thread A, running in Interpreter A, blocks receiving from a channel. Thread B, running in Interpreter B, is supposed to send but is itself blocked. If what blocks Thread B is a lock that Thread A holds (unlikely, but possible with nested locks), neither thread can proceed and the program deadlocks.
More commonly, Thread A and Thread B both wait for channel data without sending:
import interpreters
import threading
send_id, recv_id = interpreters.create_channel()
# DEADLOCK: both sides try to receive, no one sends
code_a = f"""
import interpreters
recv_id = {recv_id}
msg = interpreters.channel_recv(recv_id) # Waits forever
print(msg)
"""
code_b = f"""
import interpreters
send_id = {send_id}
msg = interpreters.channel_recv(send_id) # Can't receive on send side!
"""
interp_a = interpreters.create()
interp_b = interpreters.create()
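# Daemon threads, so stuck threads do not keep the process alive at exit
t_a = threading.Thread(target=lambda: interpreters.run_string(interp_a, code_a), daemon=True)
t_b = threading.Thread(target=lambda: interpreters.run_string(interp_b, code_b), daemon=True)
t_a.start()
t_b.start()
# A plain join() would hang forever, so bound the wait and inspect the result
t_a.join(timeout=2)
t_b.join(timeout=2)
print(f"Still blocked: A={t_a.is_alive()}, B={t_b.is_alive()}")
# No destroy() here: an interpreter that is still running code should not be destroyed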
Lesson: Ensure channel producers and consumers are paired correctly. Use a timeout to detect deadlocks:
import interpreters
import threading
send_id, recv_id = interpreters.create_channel()
code_a = f"""
import interpreters
send_id = {send_id}
interpreters.channel_send(send_id, "hello", timeout=2)
"""
code_b = f"""
import interpreters
recv_id = {recv_id}
msg = interpreters.channel_recv(recv_id, timeout=2)
print(f"Received: {{msg}}")
"""
interp_a = interpreters.create()
interp_b = interpreters.create()
t_a = threading.Thread(target=lambda: interpreters.run_string(interp_a, code_a))
t_b = threading.Thread(target=lambda: interpreters.run_string(interp_b, code_b))
t_a.start()
t_b.start()
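t_a.join(timeout=5)
t_b.join(timeout=5)
# join() returns normally even when it times out, so check is_alive() instead
if t_a.is_alive() or t_b.is_alive():
    print("Timeout detected")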
interpreters.destroy(interp_a)
interpreters.destroy(interp_b)
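The timeout idea above is easier to act on when a timeout turns into an error that names the stalled channel. The following is a minimal sketch of that idea, under stated assumptions: queue.Queue stands in for a cross-interpreter channel, and ChannelTimeout and recv_or_fail are hypothetical names introduced here, not part of any interpreters API.

```python
import queue


class ChannelTimeout(RuntimeError):
    """Raised when a receive does not complete in time (hypothetical name)."""


def recv_or_fail(channel, name, timeout):
    """Receive one message, or raise an error that names the stalled channel."""
    try:
        return channel.get(timeout=timeout)
    except queue.Empty:
        raise ChannelTimeout(
            f"no message on channel {name!r} within {timeout}s"
        ) from None


# queue.Queue stands in for a cross-interpreter channel in this sketch
jobs = queue.Queue()
jobs.put("hello")
first = recv_or_fail(jobs, "jobs", timeout=0.1)   # a message is waiting

try:
    recv_or_fail(jobs, "jobs", timeout=0.1)       # nobody sends a second one
    outcome = "received"
except ChannelTimeout as exc:
    outcome = str(exc)

print(first)
print(outcome)
```

Naming the channel in the error makes a mismatched producer and consumer pair much quicker to locate in logs than a bare timeout.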
Deadlock Scenario 2: Nested Locks Across Interpreters
If Thread A (running in Interpreter A) holds an external lock while it waits to receive from a channel, and Thread B (running in Interpreter B) must acquire that same lock before it can send, neither thread can make progress and deadlock occurs.
Example (avoid this pattern):
import interpreters
import threading
import time
send_id, recv_id = interpreters.create_channel()
external_lock = threading.Lock()
# Thread A: holds GIL_A, acquires external_lock, tries to recv
code_a = f"""
import interpreters
import threading
import time
external_lock = None # Passed differently in real code
recv_id = {recv_id}
# Simulate work done while external_lock would be held
# (time.sleep itself releases the GIL while it waits)
time.sleep(1)
# Try to receive while the lock would still be held
msg = interpreters.channel_recv(recv_id, timeout=2)
print(f"Received: {{msg}}")
"""
# Thread B: holds GIL_B, tries to acquire external_lock and send
code_b = f"""
import interpreters
import threading
import time
external_lock = None # Passed differently in real code
send_id = {send_id}
# Try to acquire external_lock (if Thread A holds it, we block)
# with external_lock:
#     interpreters.channel_send(send_id, "hello")
# DEADLOCK: Thread A waiting on recv, Thread B waiting on lock
"""
# This example shows the pattern (avoid nesting GIL + external locks)
Pattern to avoid: Never hold an external lock while waiting on a channel. Keep locks and channel I/O separate.
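The difference between the two patterns can be reproduced with ordinary threads. This is a sketch under stated assumptions rather than subinterpreter code: queue.Queue stands in for the channel, threading.Lock for the external lock, and run_pair is a hypothetical helper. A receive timeout is used so the bad case ends with "timeout" instead of hanging.

```python
import queue
import threading


def run_pair(hold_lock_while_waiting):
    """Run one consumer and one producer; return what the consumer saw."""
    channel = queue.Queue()      # stand-in for a cross-interpreter channel
    lock = threading.Lock()      # the "external" lock both threads use
    consumer_ready = threading.Event()
    seen = []

    def receive():
        try:
            return channel.get(timeout=0.5)
        except queue.Empty:
            return "timeout"

    def consumer():
        if hold_lock_while_waiting:
            with lock:                   # anti-pattern: lock held across the wait
                consumer_ready.set()
                seen.append(receive())
        else:
            consumer_ready.set()
            msg = receive()              # wait first, holding nothing
            with lock:                   # then take the lock to process
                seen.append(msg)

    def producer():
        consumer_ready.wait()
        with lock:                       # the producer needs the lock to send
            channel.put("hello")

    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen[0]


print(run_pair(hold_lock_while_waiting=True))    # consumer times out
print(run_pair(hold_lock_while_waiting=False))   # consumer receives the message
```

In the first call the producer cannot send until the consumer gives up and releases the lock. Without the timeout, both threads would wait on each other indefinitely.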
Safe pattern:
import interpreters
import threading
send_id, recv_id = interpreters.create_channel()
# Separate channel I/O from lock-protected sections
code_a = f"""
import interpreters
recv_id = {recv_id}
# Wait for message (not holding any external lock)
msg = interpreters.channel_recv(recv_id, timeout=2)
# Process message (acquire locks if needed)
print(f"Received: {{msg}}")
"""
code_b = f"""
import interpreters
send_id = {send_id}
# Send message
interpreters.channel_send(send_id, "hello")
"""
interp_a = interpreters.create()
interp_b = interpreters.create()
t_a = threading.Thread(target=lambda: interpreters.run_string(interp_a, code_a))
t_b = threading.Thread(target=lambda: interpreters.run_string(interp_b, code_b))
t_a.start()
t_b.start()
t_a.join()
t_b.join()
interpreters.destroy(interp_a)
interpreters.destroy(interp_b)
print("Safe, no deadlock")
Deadlock Prevention Checklist
- Channel pairing: Ensure every channel_send() has a corresponding channel_recv(). Use timeouts to detect mismatches.
- Lock ordering: If using external locks (not per-interpreter GILs), acquire them in a consistent global order across threads to prevent circular waits.
- Separate concerns: Don't hold external locks while blocking on channel I/O. Decouple synchronization.
- Watchdog threads: In critical services, spawn a watchdog thread that detects hung interpreters and logs/restarts them.
Example watchdog:
import interpreters
import threading
import time
def create_monitored_interpreter():
    """Create an interpreter and a watchdog thread."""
    interp = interpreters.create()
    started_at = None  # time the current run began; None while idle
    lock = threading.Lock()
    def watchdog():
        """Monitor for deadlocks; warn if one run has lasted over 10 seconds."""
        while True:
            time.sleep(10)
            with lock:
                if started_at is not None and time.time() - started_at > 10:
                    print(f"Warning: Interpreter {interp} might be deadlocked")
    def run_code(code):
        nonlocal started_at
        with lock:
            started_at = time.time()
        try:
            interpreters.run_string(interp, code)
        finally:
            # Clear the marker so an idle interpreter is not reported as hung
            with lock:
                started_at = None
    watchdog_thread = threading.Thread(target=watchdog, daemon=True)
    watchdog_thread.start()
    return interp, run_code
# Use monitored interpreter
interp, run = create_monitored_interpreter()
code = """
import time
time.sleep(2)
print("Done")
"""
run(code)
interpreters.destroy(interp)
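The lock-ordering item from the checklist can be sketched in the same spirit. This is one possible approach, not a prescribed one: the helper name acquire_in_order is hypothetical, and ordering by id() is just a convenient total order that holds within a single process.

```python
import threading
from contextlib import ExitStack, contextmanager


@contextmanager
def acquire_in_order(*locks):
    """Acquire all locks in one global order (here: by id), release on exit."""
    with ExitStack() as stack:
        for lock in sorted(locks, key=id):
            stack.enter_context(lock)
        yield


lock_a = threading.Lock()
lock_b = threading.Lock()
state = {"count": 0}


def worker(first, second, rounds):
    for _ in range(rounds):
        # Callers name the locks in opposite orders; the helper normalizes them
        with acquire_in_order(first, second):
            state["count"] += 1


t1 = threading.Thread(target=worker, args=(lock_a, lock_b, 10_000))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a, 10_000))
t1.start()
t2.start()
t1.join(timeout=30)
t2.join(timeout=30)

finished = not (t1.is_alive() or t2.is_alive())
print(finished, state["count"])  # prints: True 20000
```

If each worker instead took its locks in the order it was given, the two threads could each hold one lock and wait for the other, which is the circular wait the checklist warns about.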
Key Takeaways
- Per-interpreter GILs allow true parallelism; each interpreter's GIL is independent.
- Objects don't cross interpreter boundaries; data flows via channels (serialization) or low-level shared memory.
- Deadlock patterns differ from single-GIL Python: avoid holding external locks while blocking on channel I/O.
- Use timeouts on channel operations to detect hangs.
- Separate lock-protected sections from channel I/O to maintain deadlock freedom.
Frequently Asked Questions
Can a thread acquire two interpreters' GILs simultaneously?
No. A thread runs code in one interpreter at a time (holding one GIL). Switching interpreters requires releasing the current GIL. This is by design to prevent deadlocks.
What if I need to run code that accesses multiple interpreters?
You can't directly access objects from Interpreter A while running in Interpreter B. Instead, use channels: send data from A, receive in B, process, send results back. This enforces safe serialization.
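The send, process, send-back flow has the shape sketched below. It is an illustration under assumptions: two queue.Queue objects stand in for a pair of channels, a plain thread stands in for the second interpreter, and the names requests, responses and STOP are made up for this example.

```python
import queue
import threading

requests = queue.Queue()    # main -> worker
responses = queue.Queue()   # worker -> main
STOP = None                 # sentinel that ends the worker loop


def worker():
    """Receive a request, process it locally, send only the result back."""
    while True:
        item = requests.get()
        if item is STOP:
            break
        responses.put(sum(item))


t = threading.Thread(target=worker)
t.start()

requests.put([1, 2, 3])
result = responses.get(timeout=5)   # bounded wait, as recommended above

requests.put(STOP)
t.join(timeout=5)
print(result)  # prints: 6
```

Using one channel per direction keeps each side either sending or receiving on a given channel, which makes the pairing easy to verify.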
How do I debug a deadlock involving subinterpreters?
Use py-spy or gdb to inspect thread stacks. Look for threads blocking on channel_recv() or locks. Verify that channel send/recv sides match. Add logging to pinpoint the exact point where threads block.
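When attaching an external tool is not an option, a process can report its own Python-level thread stacks. The sketch below uses sys._current_frames() and traceback.format_stack() from the standard library. The helper name format_thread_stacks is hypothetical, and the demo covers threads of a single interpreter only; it does not rely on seeing threads that run in other interpreters.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text report with the current Python stack of every thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        parts.append(f"Thread {names.get(ident, '?')} ({ident}):\n")
        parts.extend(traceback.format_stack(frame))
    return "".join(parts)


# Demonstration: a thread stuck waiting on something that never happens
started = threading.Event()
never_set = threading.Event()


def stuck_receiver():
    started.set()
    never_set.wait()   # stands in for a blocked channel_recv()


t = threading.Thread(target=stuck_receiver, name="receiver", daemon=True)
t.start()
started.wait()

report = format_thread_stacks()
print(report)

never_set.set()   # let the demo thread exit
t.join()
```

For hangs that cannot be predicted, faulthandler.dump_traceback_later() from the standard library can dump all thread tracebacks after a delay, without any cooperation from the stuck threads.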
Is the per-interpreter GIL visible to my code?
Not from Python code. The GIL is acquired and released automatically; there is no acquire() or release() for you to call (C extensions manage it through the C API). You can use threading.Lock() for explicit synchronization within an interpreter if needed.
What's the overhead of per-interpreter GILs vs a global GIL?
Per-interpreter GILs add slightly more overhead per interpreter (~100 bytes per lock), but the lack of global contention more than compensates. Benchmarks show 2-4x improvement on multi-core workloads.