Per-Interpreter GIL: Isolating State and Avoiding Deadlocks

The per-interpreter GIL (PEP 684, CPython 3.12) gives each subinterpreter its own lock, so threads running in Interpreter A don't block threads in Interpreter B. It is a different mechanism from the free-threaded build (PEP 703), which removes the GIL instead of multiplying it. This isolation is powerful but introduces new deadlock patterns. I've debugged production deadlocks where one thread in Interpreter A was waiting for a channel message from Interpreter B, which was itself blocked trying to send. This article teaches the mental model and patterns to avoid such traps.

A subinterpreter's GIL protects its object heap and reference counts. Threads running code in that interpreter must hold the GIL. Unlike the process-wide GIL, there's no global serialization point. Code in Interpreter A and Interpreter B truly runs in parallel.

The Per-Interpreter GIL: Isolation Semantics

Each subinterpreter is a Python execution context with:

Its own __main__ module and global namespace.
Its own object heap (objects created in A are invisible to B).
Its own GIL (a lock protecting A's reference counts).

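Threads running code in A acquire A's GIL. Threads running code in B acquire B's GIL. The two locks are independent, so there is no global serialization.

The examples in this article import a module named interpreters and use function-style calls such as run_string() and channel_send(). Treat those names as this article's shorthand: in CPython 3.14 the public module is concurrent.interpreters (PEP 734), and its names differ, so adapt the calls to the version you run.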

import interpreters
import threading
import time

# Create two interpreters
interp_a = interpreters.create()
interp_b = interpreters.create()

# CPU-bound code: a fixed amount of work, so elapsed time reflects parallelism
code_cpu_bound = """
total = 0
for i in range(20_000_000):
    total += i
print("Done computing")
"""

# Run CPU-bound code in both interpreters concurrently
t_a = threading.Thread(target=lambda: interpreters.run_string(interp_a, code_cpu_bound))
t_b = threading.Thread(target=lambda: interpreters.run_string(interp_b, code_cpu_bound))

start = time.perf_counter()
t_a.start()
t_b.start()
t_a.join()
t_b.join()
elapsed = time.perf_counter() - start

print(f"Both completed in {elapsed:.1f}s")
# With per-interpreter GILs: about the time of one run (parallel)
# With a single shared GIL: about twice that (the loops take turns)

interpreters.destroy(interp_a)
interpreters.destroy(interp_b)

With per-interpreter GILs, both threads run in parallel, so the total is close to the time of a single run. When interpreters share one GIL (as subinterpreters did before CPython 3.12), the two loops take turns and the total is roughly double. Per-interpreter GILs do not require the free-threaded build; the two features are independent.

Isolation Guarantees

Within an interpreter, Python's semantics are unchanged:

A function's local variables are isolated (thread-local by nature).
Global variables are shared (multiple threads in the same interpreter access the same __main__ namespace).
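Bytecode execution is serialized by the GIL (only one thread holds it at a time), but compound updates such as counter += 1 are not guaranteed to be atomic and still need an explicit lock.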
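
The within-interpreter rules can be seen with plain threads. Here is a minimal sketch using only the standard threading module (counter, add_many, and run_workers are illustrative names, not part of any library). The module-level counter is shared by every thread, while each thread's loop variable stays local. The explicit lock is there because a read-modify-write spans several bytecode steps.

```python
import threading

counter = 0                      # module-level: shared by all threads in this interpreter
counter_lock = threading.Lock()  # guards the read-modify-write below

def add_many(n):
    """Increment the shared counter n times; the loop variable is thread-local."""
    global counter
    for _ in range(n):
        with counter_lock:
            counter += 1

def run_workers(workers=4, n=10_000):
    """Start several threads that all update the same global, then wait for them."""
    threads = [threading.Thread(target=add_many, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

total = run_workers()
print(total)
```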

Across interpreters:

Direct object sharing is impossible (objects are tied to their interpreter's heap).
Data must be serialized (for example with pickle) or shared via low-level mechanisms (ctypes, mmap).
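
Serialization means the receiver gets a copy, not the original object. A small sketch with the standard pickle module (the variable names are illustrative) shows the round trip that a cross-interpreter transfer implies. The bytes payload is what would travel, and mutating the reconstructed object leaves the sender's object untouched.

```python
import pickle

original = {"data": [1, 2, 3]}

# Sender side: turn the object into bytes, which hold no references into the heap
payload = pickle.dumps(original)

# Receiver side: rebuild an equal but entirely separate object
received = pickle.loads(payload)
received["data"].append(4)

print(original)              # the sender's object is unchanged
print(received)              # only the copy gained the extra element
print(received is original)  # the two are distinct objects
```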

Example: a name defined in one interpreter is simply not visible in another:

import interpreters

# Create two interpreters
interp_a = interpreters.create()
interp_b = interpreters.create()

code_a = """
x = {"data": [1, 2, 3]}
"""

code_b = """
# Each interpreter has its own __main__, so x from interp_a is not defined here
print("x" in globals())  # prints False
"""

interpreters.run_string(interp_a, code_a)
interpreters.run_string(interp_b, code_b)

interpreters.destroy(interp_a)
interpreters.destroy(interp_b)

Data must flow through channels:

import interpreters
import threading

send_id, recv_id = interpreters.create_channel()

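code_a = f"""
import pickle
import interpreters
send_id = {send_id}
x = {{"data": [1, 2, 3]}}
# Serialize explicitly so only plain bytes cross the interpreter boundary
interpreters.channel_send(send_id, pickle.dumps(x))
"""

code_b = f"""
import pickle
import interpreters
recv_id = {recv_id}
# Rebuild a private copy of the dict from the received bytes
x = pickle.loads(interpreters.channel_recv(recv_id))
print(f"Received: {{x}}")
"""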

interp_a = interpreters.create()
interp_b = interpreters.create()

t_a = threading.Thread(target=lambda: interpreters.run_string(interp_a, code_a))
t_b = threading.Thread(target=lambda: interpreters.run_string(interp_b, code_b))

t_a.start()
t_b.start()
t_a.join()
t_b.join()

interpreters.destroy(interp_a)
interpreters.destroy(interp_b)

Deadlock Scenario 1: Channel Blocking with GIL

A classic deadlock: Thread A, running in Interpreter A, blocks while receiving from a channel. Thread B, running in Interpreter B, is supposed to send but is itself blocked. If what Thread B is waiting for is a lock that Thread A holds (unlikely but possible with nested locks), neither thread can make progress.

More commonly, Thread A and Thread B both wait for channel data without sending:

import interpreters
import threading

send_id, recv_id = interpreters.create_channel()

# DEADLOCK: both sides try to receive, no one sends
code_a = f"""
import interpreters
recv_id = {recv_id}
msg = interpreters.channel_recv(recv_id)  # Waits forever
print(msg)
"""

code_b = f"""
import interpreters
send_id = {send_id}
msg = interpreters.channel_recv(send_id)  # Can't receive on send side!
"""

interp_a = interpreters.create()
interp_b = interpreters.create()

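# daemon=True lets the process exit even if a thread stays blocked
t_a = threading.Thread(target=lambda: interpreters.run_string(interp_a, code_a), daemon=True)
t_b = threading.Thread(target=lambda: interpreters.run_string(interp_b, code_b), daemon=True)

t_a.start()
t_b.start()

# Joining without a timeout would hang forever, so bound the wait instead
t_a.join(timeout=1)
t_b.join(timeout=1)
print(f"Still blocked: A={t_a.is_alive()}, B={t_b.is_alive()}")

# The interpreters may still be running blocked code, so they are not destroyed here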

Lesson: Ensure channel producers and consumers are paired correctly. Use a timeout to detect deadlocks:

import interpreters
import threading

send_id, recv_id = interpreters.create_channel()

code_a = f"""
import interpreters
send_id = {send_id}
interpreters.channel_send(send_id, "hello", timeout=2)
"""

code_b = f"""
import interpreters
recv_id = {recv_id}
msg = interpreters.channel_recv(recv_id, timeout=2)
print(f"Received: {{msg}}")
"""

interp_a = interpreters.create()
interp_b = interpreters.create()

t_a = threading.Thread(target=lambda: interpreters.run_string(interp_a, code_a))
t_b = threading.Thread(target=lambda: interpreters.run_string(interp_b, code_b))

t_a.start()
t_b.start()

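# join() does not raise on timeout; check is_alive() afterwards
t_a.join(timeout=5)
t_b.join(timeout=5)

if t_a.is_alive() or t_b.is_alive():
    print("Timeout detected")
else:
    # Only destroy interpreters whose code has finished running
    interpreters.destroy(interp_a)
    interpreters.destroy(interp_b)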
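
The same timeout idea can be tried without subinterpreters. The sketch below is a thread-level analogue, assuming a standard queue.Queue stands in for the channel (recv_with_timeout is a hypothetical helper, not part of any interpreters API). A receive with no matching sender reports failure after the timeout instead of blocking forever.

```python
import queue
import threading

def recv_with_timeout(chan, timeout):
    """Return (True, message), or (False, None) if nothing arrives in time."""
    try:
        return True, chan.get(timeout=timeout)
    except queue.Empty:
        return False, None

# Unpaired receive: nobody ever sends, so the timeout fires
lonely_chan = queue.Queue()
unpaired = recv_with_timeout(lonely_chan, timeout=0.2)

# Paired receive: a producer thread sends one message
paired_chan = queue.Queue()
producer = threading.Thread(target=paired_chan.put, args=("hello",))
producer.start()
paired = recv_with_timeout(paired_chan, timeout=2)
producer.join()

print(unpaired)  # (False, None)
print(paired)    # (True, 'hello')
```

Returning a flag rather than raising keeps the caller's control flow explicit, which is convenient when the timeout is an expected outcome and not an error.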

Deadlock Scenario 2: Nested Locks Across Interpreters

If Thread A (running in Interpreter A) holds an external lock while it waits to receive from a channel, and Thread B (running in Interpreter B) must acquire that same lock before it can send, neither makes progress: A waits for the message, B waits for the lock.

Example (avoid this pattern):

import interpreters
import threading
import time

send_id, recv_id = interpreters.create_channel()
external_lock = threading.Lock()

# Thread A: holds GIL_A, acquires external_lock, tries to recv
code_a = f"""
import interpreters
import threading
import time

external_lock = None  # Passed differently in real code
recv_id = {recv_id}

# Simulate one second of other work (sleeping releases the GIL)
time.sleep(1)

# Try to receive (in the deadlock pattern, external_lock is held at this point)
msg = interpreters.channel_recv(recv_id, timeout=2)
print(f"Received: {{msg}}")
"""

# Thread B: holds GIL_B, tries to acquire external_lock and send
code_b = f"""
import interpreters
import threading
import time

external_lock = None  # Passed differently in real code
send_id = {send_id}

# Try to acquire external_lock (if Thread A holds it, we block)
# with external_lock:
#     interpreters.channel_send(send_id, "hello")
#     DEADLOCK: Thread A waiting on recv, Thread B waiting on lock
"""

# This example shows the pattern (avoid nesting GIL + external locks)

Pattern to avoid: Never hold an external lock while waiting on a channel. Keep locks and channel I/O separate.
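
To see why this pattern stalls, here is a thread-level analogue that terminates on its own. It assumes a queue.Queue in place of the channel and uses timeouts so the stall is reported instead of left to hang. The names receiver, sender, and results are illustrative.

```python
import queue
import threading

chan = queue.Queue()
external_lock = threading.Lock()
lock_taken = threading.Event()
results = {}

def receiver():
    # Anti-pattern: block on the channel while holding the external lock
    with external_lock:
        lock_taken.set()
        try:
            results["received"] = chan.get(timeout=1)
        except queue.Empty:
            results["received"] = None  # nothing arrived while the lock was held

def sender():
    lock_taken.wait()  # make sure the receiver already holds the lock
    # The sender needs the same lock before it may send, so it stalls
    if external_lock.acquire(timeout=0.3):
        try:
            chan.put("hello")
            results["sent"] = True
        finally:
            external_lock.release()
    else:
        results["sent"] = False  # without the timeout this wait would never end

t_recv = threading.Thread(target=receiver)
t_send = threading.Thread(target=sender)
t_recv.start()
t_send.start()
t_recv.join()
t_send.join()

print(results)
```

Both sides give up: the sender never gets the lock, so the receiver never gets its message. Without the two timeouts the threads would wait on each other indefinitely.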

Safe pattern:

import interpreters
import threading

send_id, recv_id = interpreters.create_channel()

# Separate channel I/O from lock-protected sections
code_a = f"""
import interpreters

recv_id = {recv_id}

# Wait for message (not holding any external lock)
msg = interpreters.channel_recv(recv_id, timeout=2)

# Process message (acquire locks if needed)
print(f"Received: {{msg}}")
"""

code_b = f"""
import interpreters

send_id = {send_id}

# Send message
interpreters.channel_send(send_id, "hello")
"""

interp_a = interpreters.create()
interp_b = interpreters.create()

t_a = threading.Thread(target=lambda: interpreters.run_string(interp_a, code_a))
t_b = threading.Thread(target=lambda: interpreters.run_string(interp_b, code_b))

t_a.start()
t_b.start()
t_a.join()
t_b.join()

interpreters.destroy(interp_a)
interpreters.destroy(interp_b)

print("Safe, no deadlock")

Deadlock Prevention Checklist

Channel pairing: Ensure every channel_send() has a corresponding channel_recv(). Use timeouts to detect mismatches.
Lock ordering: If using external locks (not per-interpreter GILs), acquire them in a consistent global order across threads to prevent circular waits.
Separate concerns: Don't hold external locks while blocking on channel I/O. Decouple synchronization.
Watchdog threads: In critical services, spawn a watchdog thread that detects hung interpreters and logs/restarts them.
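
The lock-ordering item can be made mechanical. One possible sketch, assuming locks are ranked by an arbitrary but stable key (here id(), with acquire_in_order as a hypothetical helper): every thread sorts the locks it needs before acquiring them, so two threads can never hold them in opposite orders.

```python
import threading
from contextlib import ExitStack, contextmanager

@contextmanager
def acquire_in_order(*locks):
    """Acquire all locks in one global order (sorted by id) and release them on exit."""
    with ExitStack() as stack:
        for lock in sorted(locks, key=id):
            stack.enter_context(lock)
        yield

lock_x = threading.Lock()
lock_y = threading.Lock()
completed = []

def worker(name, first, second):
    # Callers name the locks in different orders; the helper normalizes them
    for _ in range(1000):
        with acquire_in_order(first, second):
            pass
    completed.append(name)

t1 = threading.Thread(target=worker, args=("t1", lock_x, lock_y))
t2 = threading.Thread(target=worker, args=("t2", lock_y, lock_x))
t1.start()
t2.start()
t1.join(timeout=10)
t2.join(timeout=10)

print(sorted(completed))
```

Sorting by id() is only meaningful within one process run; a long-lived service might prefer explicit rank numbers assigned when each lock is created.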

Example watchdog:

import interpreters
import threading
import time

def create_monitored_interpreter(limit=10.0):
    """Create an interpreter plus a watchdog that flags long-running calls."""
    interp = interpreters.create()
    started_at = None  # monotonic start time of the call in progress, else None
    lock = threading.Lock()

    def watchdog():
        """Warn when a single run_code() call has been running longer than limit."""
        while True:
            time.sleep(limit)
            with lock:
                stuck = started_at is not None and time.monotonic() - started_at > limit
            if stuck:
                print(f"Warning: Interpreter {interp} might be deadlocked")

    def run_code(code):
        nonlocal started_at
        with lock:
            started_at = time.monotonic()
        try:
            interpreters.run_string(interp, code)
        finally:
            # Clear the marker so an idle interpreter is never reported as hung
            with lock:
                started_at = None

    # Daemon thread: it must not keep the process alive on its own
    watchdog_thread = threading.Thread(target=watchdog, daemon=True)
    watchdog_thread.start()

    return interp, run_code

# Use monitored interpreter
interp, run = create_monitored_interpreter()

code = """
import time
time.sleep(2)
print("Done")
"""

run(code)

interpreters.destroy(interp)

Key Takeaways

Per-interpreter GILs allow true parallelism; each interpreter's GIL is independent.
Objects don't cross interpreter boundaries; data flows via channels (serialization) or low-level shared memory.
Deadlock patterns differ from single-GIL Python: avoid holding external locks while blocking on channel I/O.
Use timeouts on channel operations to detect hangs.
Separate lock-protected sections from channel I/O to maintain deadlock freedom.

Frequently Asked Questions

Can a thread acquire two interpreters' GILs simultaneously?

No. A thread runs code in one interpreter at a time, so it holds at most one GIL. Switching interpreters means releasing the current interpreter's GIL before acquiring the next one, which rules out a thread deadlocking on two GILs it holds at once.

What if I need to run code that accesses multiple interpreters?

You can't directly access objects from Interpreter A while running in Interpreter B. Instead, use channels: send data from A, receive in B, process, send results back. This enforces safe serialization.
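
The send, process, send-back flow looks like this in miniature. This is a thread-level sketch in which two queue.Queue objects stand in for a pair of channels (requests, replies, and worker are illustrative names). With real subinterpreters the payloads would additionally be serialized.

```python
import queue
import threading

requests = queue.Queue()  # side A -> side B
replies = queue.Queue()   # side B -> side A

def worker():
    """Side B: receive a request, process it, send the result back."""
    while True:
        item = requests.get()
        if item is None:  # sentinel: no more work
            break
        replies.put(sum(item))

t_b = threading.Thread(target=worker)
t_b.start()

# Side A: send data, then wait (with a timeout) for the processed result
requests.put([1, 2, 3])
answer = replies.get(timeout=2)

# Shut the worker down cleanly
requests.put(None)
t_b.join()

print(answer)
```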

How do I debug a deadlock involving subinterpreters?

Use py-spy or gdb to inspect thread stacks. Look for threads blocking on channel_recv() or locks. Verify that channel send/recv sides match. Add logging to pinpoint the exact point where threads block.
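
When attaching an external tool is not an option, the standard faulthandler module can dump thread stacks from inside the process. A minimal sketch follows; blocked_worker and the temporary file are illustrative. How much of another interpreter's threads appears in the dump may depend on the CPython version, so treat that as an assumption to verify.

```python
import faulthandler
import tempfile
import threading

ready = threading.Event()
stop = threading.Event()

def blocked_worker():
    """Stand-in for a thread stuck waiting on a channel or lock."""
    ready.set()
    stop.wait()

t = threading.Thread(target=blocked_worker)
t.start()
ready.wait()  # make sure the worker is inside blocked_worker before dumping

# faulthandler writes through a file descriptor, so it needs a real file
with tempfile.TemporaryFile(mode="w+") as dump_file:
    faulthandler.dump_traceback(file=dump_file, all_threads=True)
    dump_file.seek(0)
    dump = dump_file.read()

stop.set()
t.join()
print(dump)
```

For a process that may hang before you can trigger a dump by hand, faulthandler.dump_traceback_later() schedules the same output after a delay.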

Is the per-interpreter GIL visible to my code?

No. The GIL is acquired and released automatically; you can't explicitly call acquire() or release(). You can use threading.Lock() for explicit synchronization if needed.

What's the overhead of per-interpreter GILs vs a global GIL?

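Each interpreter carries its own lock, which adds a small fixed memory cost (roughly 100 bytes per lock), but removing global contention more than compensates on CPU-bound work. Actual gains depend on core count and on how much time is spent in channel I/O; benchmarks show 2-4x improvement on multi-core workloads.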
