Global interpreter lock
A global interpreter lock (GIL) is a mechanism used in computer-language interpreters to synchronize the execution of threads so that only one native thread (per process) can execute basic operations (such as memory allocation and reference counting) at a time.[1] As a general rule, an interpreter that uses a GIL allows only one thread to execute at a time, even when running on a multi-core processor, although some implementations let CPU-intensive code release the GIL so that multiple threads can use multiple cores. Some popular interpreters that have a GIL are CPython and Ruby MRI.
Technical background concepts
A global interpreter lock (GIL) is a mutual-exclusion lock held by a programming language interpreter thread to avoid sharing code that is not thread-safe with other threads. In implementations with a GIL, there is always one GIL for each interpreter process.
Applications running on implementations with a GIL can be designed to use separate processes to achieve full parallelism, as each process has its own interpreter and in turn has its own GIL. Otherwise, the GIL can be a significant barrier to parallelism.
Advantages
Reasons for employing a global interpreter lock include:
- increased speed of single-threaded programs (no necessity to acquire or release locks on all data structures separately),
- easy integration of C libraries that usually are not thread-safe,
- ease of implementation (having a single GIL is much simpler to implement than a lock-free interpreter or one using fine-grained locks).
A way to get around a GIL is to create a separate interpreter per thread, which is too expensive with most languages.[citation needed]
Drawbacks
Use of a global interpreter lock in a language effectively limits the amount of parallelism reachable through concurrency of a single interpreter process with multiple threads. If the process is almost purely made up of interpreted code and does not make calls outside of the interpreter that block for long periods of time (allowing the GIL to be released by that thread while they process), there is likely to be very little increase in speed when running the process on a multiprocessor machine. Because of signaling overhead with a CPU-bound thread, the GIL can cause a significant slowdown even on single processors.[2] More seriously, if the thread holding the lock makes a blocking OS call (such as disk access) without releasing the GIL, the entire process is blocked, even though other application threads may be waiting.
Examples
Some language implementations that implement a global interpreter lock are CPython, the most widely used implementation of Python,[3][4] and Ruby MRI, the reference implementation of Ruby (where it is called Global VM Lock).
JVM-based equivalents of these languages (Jython and JRuby) do not use global interpreter locks. IronPython and IronRuby are implemented on top of Microsoft's Dynamic Language Runtime and also avoid using a GIL.[5]
An example of an interpreted language without a GIL is Tcl, which is used in the benchmarking tool HammerDB.[6]
Example code
Example code in Python, modeling how a GIL serializes an interpreter's dispatch loop. Notice how the lock is acquired before and released after each instruction is executed. It uses the Lock object from the threading module.[7]
from threading import Lock

INSTRUCTION_TABLE = { ... }

def execute(bytecode: list) -> None:
    """Execute bytecode."""
    lock = Lock()
    for (opcode, args) in bytecode:
        lock.acquire()
        INSTRUCTION_TABLE[opcode](args)
        lock.release()
Recent developments
Free-threaded build (Python 3.13 and later)
In Python 3.13, an experimental "free-threaded" build of CPython was introduced as part of PEP 703 – Making the Global Interpreter Lock Optional in CPython. This build allows developers to compile Python without the Global Interpreter Lock (GIL), enabling true parallel execution of Python bytecode across multiple CPU cores. The feature is still experimental but represents a major step toward improved concurrency in future Python releases.[8]
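Whether a given interpreter was built and is running without the GIL can be checked at runtime. A minimal sketch, assuming CPython 3.13 or later (sys._is_gil_enabled() is a private API and absent in older versions):

import sys
import sysconfig

# True if this CPython was compiled with --disable-gil (free-threaded build).
free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# Reports whether the GIL is active right now; private, CPython 3.13+ only.
gil_active = getattr(sys, "_is_gil_enabled", lambda: True)()

print(f"free-threaded build: {free_threaded}, GIL active: {gil_active}")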
References
[edit]- ^ "GlobalInterpreterLock". Python Wiki. Retrieved 30 November 2015.
- ^ David Beazley (2009-06-11). "Inside the Python GIL" (PDF). Chicago: Chicago Python User Group. Retrieved 2009-10-07.
- ^ Shannon -jj Behrens (2008-02-03). "Concurrency and Python". Dr. Dobb's Journal. p. 2. Retrieved 2008-07-12.
The GIL is a lock that is used to protect all the critical sections in Python. Hence, even if you have multiple CPUs, only one thread may be doing "pythony" things at a time.
- ^ "Python/C API Reference Manual: Thread State and the Global Interpreter Lock". Archived from the original on 2008-09-14. Retrieved 2014-08-15.
- ^ "IronPython at python.org". python.org. Retrieved 2011-04-04.
IronPython has no GIL and multi-threaded code can use multi core processors.
- ^ "HammerDB Concepts and Architecture". HammerDB. 2018-11-30. Retrieved 2020-05-10.
It is important to understand at the outset that HammerDB is written in TCL because of the unique threading capabilities that TCL brings.
- ^ "threading — Thread-based parallelism". Python documentation. Retrieved 16 April 2025.
- ^ "PEP 703 – Making the Global Interpreter Lock Optional in CPython | peps.python.org". Python Enhancement Proposals (PEPs). Retrieved 2025-11-13.
A common workaround is the multiprocessing module, which spawns separate processes to bypass the GIL and utilize multiple cores.[1]
The GIL's impact has long been a point of contention in the Python community, hindering scalability in fields like scientific computing, machine learning, and high-performance computing, where multi-threading could otherwise leverage modern hardware.[2] Efforts to address this date back to proposals like PEP 703, accepted in 2023, which introduced experimental "free-threaded" builds of CPython starting in Python 3.13, configurable via a --disable-gil option.[2] These builds incorporate thread-safe alternatives, such as atomic reference counting and per-object locks, to eliminate the GIL while maintaining compatibility, though they incur a single-threaded performance overhead of approximately 5-10% due to added synchronization.[2] In Python 3.14 (released October 2025), free-threading became a stable, opt-in feature, with continued refinements for extension modules and the standard library.[4]
Fundamentals
Definition and Purpose
The Global Interpreter Lock (GIL) is a mutex-like mechanism in bytecode interpreters such as CPython that restricts execution to a single native thread at any given time, ensuring that only one thread can execute Python bytecode at a time within the same interpreter instance.[3] This lock serializes access to the interpreter's core components, preventing multiple threads from concurrently manipulating shared resources like Python objects.[5] By design, the GIL applies specifically to thread-level parallelism in multi-threaded programs, but it does not affect process-level concurrency (e.g., via the multiprocessing module) or asynchronous programming models like asyncio, which operate without shared interpreter state.[3]
The primary purpose of the GIL is to safeguard against race conditions in CPython's memory management system, particularly its reference counting mechanism for tracking object lifetimes and enabling automatic garbage collection. Without the GIL, concurrent threads could interfere with reference count increments or decrements: for instance, two threads attempting to increment an object's reference count might result in only one increment being recorded, leading to premature deallocation and potential crashes.[3] This protection is essential because CPython's memory management is inherently non-thread-safe, relying on the GIL to maintain consistency across threads without requiring complex per-object locking.[5]
Additionally, the GIL was introduced to simplify the development of C extensions and the interpreter's internals in multi-threaded environments, allowing extensions to interact with Python objects without implementing their own synchronization primitives in most cases.[6] By providing a centralized lock, it reduces the burden on extension authors to handle thread safety manually, promoting stability while enabling basic multi-threading for I/O-bound tasks.[3]
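The reference counts the GIL protects are observable from Python itself; a small illustration using sys.getrefcount (which reports one extra reference for its own argument):

import sys

x = object()
a = x  # binding another name increments the reference count
b = x  # and again
print(sys.getrefcount(x))  # typically 4: x, a, b, plus getrefcount's argument
# Without the GIL, two threads performing these increments concurrently
# could lose an update, eventually freeing the object while still in use.
del a
del b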
Historical Development
The Global Interpreter Lock (GIL) was introduced in CPython with the release of Python 1.5 in early 1998, primarily developed by Guido van Rossum to enable multithreading support while addressing implementation challenges in the interpreter.[6] At the time, Python's growing popularity prompted interest in leveraging operating system threading capabilities, but the reference-counting-based memory management system posed risks of race conditions if multiple threads accessed Python objects simultaneously.[6]
Early motivations for the GIL centered on simplifying garbage collection and ensuring compatibility with the C API for extensions. By serializing access to the Python virtual machine, the GIL prevented concurrent modifications to reference counts, avoiding the need for complex thread-safe garbage collection mechanisms that could complicate the interpreter's design.[6] This approach also streamlined interactions for C extensions, allowing developers to focus on functionality without extensive synchronization primitives, amid the late 1990s shift toward multithreaded applications in scripting languages.[6]
Throughout the 2000s and 2010s, the GIL persisted in CPython despite ongoing community debates about its limitations, contrasting with alternative implementations like Jython, which runs on the Java Virtual Machine and lacks a GIL due to Java's inherent thread safety model.[7] Key events included repeated discussions on the python-dev mailing list, where proposals to remove the GIL were rejected to preserve single-threaded performance.[5] In 2023, PEP 703 proposed making the GIL optional through a new build configuration (--disable-gil), marking a significant shift after years of experimentation.[2]
The GIL's evolution continued into the 2020s, transitioning from a mandatory feature to an optional one in Python 3.13 (released October 2024), where free-threaded builds became experimentally available without the lock.[8] By Python 3.14 (released October 2025), the implementation of PEP 703 was fully integrated, allowing users to compile CPython without the GIL while maintaining backward compatibility for standard builds, though the lock remains the default to avoid performance regressions in legacy code.[4]
Technical Implementation
Mechanism in CPython
In CPython, the Global Interpreter Lock (GIL) serves as a mutex that integrates deeply with the Python Virtual Machine (PVM) to safeguard bytecode execution. The PVM, responsible for interpreting compiled Python bytecode into machine instructions, relies on the GIL to ensure that only one native thread can execute Python code at any given moment, thereby maintaining the atomicity of operations on shared Python objects. This protection is crucial during the evaluation loop, where bytecode instructions are processed sequentially to prevent concurrent modifications that could corrupt object states.[2][9]
A key aspect of this integration is the GIL's role in protecting CPython's reference counting mechanism for memory management. Python objects maintain an internal reference count to track active references; increments via Py_INCREF and decrements via Py_DECREF must occur atomically to avoid race conditions in multi-threaded scenarios, where simultaneous updates could lead to premature deallocation or memory leaks. By serializing these operations under the GIL, CPython ensures thread-safe garbage collection without requiring finer-grained locks on every object, simplifying the interpreter's design while preventing crashes from inconsistent reference counts.[2]
CPython manages multi-threading through thread state structures defined in PyThreadState, which encapsulate each thread's execution context within the interpreter. These structures track GIL ownership and coordinate access during sensitive operations like garbage collection. This state management allows the interpreter to switch between threads efficiently while upholding the GIL's serialization.[3]
At the platform level, the GIL is realized as a pthread_mutex_t on Unix-like systems for POSIX-compliant locking, providing robust mutual exclusion with low overhead for contention scenarios. On Windows, it utilizes a CriticalSection object, a lightweight synchronization primitive optimized for single-process critical sections, ensuring compatibility and performance across operating systems. These implementations are abstracted through CPython's threading module to handle the underlying synchronization transparently.[2]
Acquisition and Release
In CPython, threads acquire the Global Interpreter Lock (GIL) prior to executing Python bytecode to ensure thread-safe access to the interpreter's shared resources. This process involves internal calls, such as those akin to the now-removed C API function PyEval_AcquireLock() (deprecated in Python 3.2 and removed in Python 3.13), which atomically attach the thread state and seize the lock; waiting threads employ a timed condition variable wait to avoid indefinite blocking and mitigate starvation risks.[3][10]
The GIL is relinquished through defined triggers to enable context switching among threads. Primary among these is the completion of a tunable time slice during bytecode execution, defaulting to 5 milliseconds and adjustable via sys.setswitchinterval() to balance responsiveness and overhead. Release also occurs automatically during blocking I/O operations or sleeps, allowing other threads to proceed without explicit intervention.[3]
Context switching is orchestrated by the interpreter's evaluation loop, where the GIL-holding thread periodically checks for drop requests after the time interval elapses. If a request is set, typically due to a timeout from waiting threads, the current thread finishes its ongoing bytecode instruction before yielding the lock via an internal release mechanism, such as drop_gil(), permitting the operating system scheduler to select the next thread. This voluntary or forced yielding promotes fair access in multithreaded environments.[10]
Starting with Python 3.2, the GIL underwent a significant rewrite, shifting from opcode-count-based to time-based switching for more precise control. This enables finer-grained releases during prolonged computations, enhancing overall thread responsiveness by minimizing extended lock holds in CPU-intensive scenarios without altering single-threaded performance.[10]
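The switch interval described above is exposed through the sys module; a short sketch:

import sys

print(sys.getswitchinterval())  # default: 0.005 seconds (5 ms)

# A longer interval means fewer GIL handoffs (less overhead, slower thread
# switching); a shorter one trades throughput for responsiveness.
sys.setswitchinterval(0.001)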
Advantages
Thread Safety and Simplicity
The Global Interpreter Lock (GIL) in CPython provides automatic thread safety for pure Python code by serializing the execution of Python bytecode across threads, thereby eliminating the need for developers to implement manual locks or synchronization primitives in most cases.[11] This mechanism ensures that only one thread can access and modify Python objects at a time, inherently preventing race conditions that could arise from concurrent manipulation of shared data structures like reference counts or built-in types such as dictionaries.[12] As a result, multithreaded Python applications achieve thread safety without the overhead of explicit locking, making concurrent programming more straightforward for tasks like I/O-bound operations.[2]
For C extensions that interface with Python objects, the GIL further simplifies development by offering a predictable single-threaded environment, reducing the complexity required to ensure thread safety when calling into or from Python code.[2] Extensions can leverage GIL-aware APIs, such as PyGILState_Ensure, to safely acquire the lock and interact with the interpreter, avoiding the need for intricate per-object locking mechanisms that would otherwise be necessary in a fully concurrent system.[12] This design choice lowers the barrier for creating performant, thread-compatible modules while maintaining compatibility with Python's core object model.[6]
The GIL's serialization also benefits debugging and maintenance of multithreaded code by minimizing the occurrence of race conditions, which leads to more deterministic behavior and easier testing.[6] With fewer nondeterministic errors, developers can focus on logical issues rather than elusive concurrency bugs, enhancing overall code reliability.[11] Moreover, the GIL preserves the consistency of Python's dynamic typing and object model across threads by enforcing exclusive access, ensuring that operations like attribute access or method calls behave predictably without interference.[2]
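As a concrete illustration, a single bytecode-level operation such as list.append is effectively atomic under the GIL, so threads can share a list without an explicit lock. A minimal sketch (composite read-modify-write sequences still need their own locking):

import threading

shared = []

def producer(tag):
    for i in range(100_000):
        shared.append((tag, i))  # one atomic operation under the GIL

threads = [threading.Thread(target=producer, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 400000: no appends are lost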
Performance in Single-Threaded Scenarios
In single-threaded applications, the Global Interpreter Lock (GIL) in CPython introduces negligible overhead, as there is no contention for the lock among multiple threads. The interpreter can execute Python bytecode without invoking synchronization primitives, allowing for efficient, uninterrupted operation in non-concurrent code paths.[2]
The GIL enables key optimizations in memory management that enhance single-threaded efficiency, such as non-atomic reference counting. Under the GIL, reference counts for Python objects can use standard integer operations rather than the slower atomic instructions required for thread safety, minimizing computational costs during object allocation and deallocation. Additionally, common operations on built-in types like lists and dictionaries avoid per-object locking, further reducing runtime expenses.[2]
With the GIL acquired at the start of bytecode evaluation and held throughout the interpreter loop, CPython's dispatch mechanism operates without the need for repeated lock checks or releases at each instruction. This streamlined design accelerates the execution of CPU-bound tasks by eliminating synchronization barriers that would otherwise interrupt the flow, contributing to faster overall performance in solo-thread environments.[2]
Benchmarks confirm the GIL's benefits for single-threaded workloads, where standard CPython consistently outperforms experimental no-GIL builds. For example, in Python 3.13 free-threaded mode, single-threaded execution on the pyperformance suite showed about 40% overhead compared to the GIL-enabled build, largely due to disabled adaptive interpreter specializations; this gap narrowed to 5-10% in Python 3.14 with optimizations reinstated.[13]
Disadvantages
Multithreading Limitations
The Global Interpreter Lock (GIL) in CPython enforces serialization of Python bytecode execution, permitting only one thread to run Python code at any given time, even on multi-core processors. This mechanism ensures thread safety by preventing concurrent access to shared interpreter state, but results in other threads idling while waiting for the lock, effectively negating the benefits of multithreading for parallel computation.[1][3]
In CPU-bound tasks, such as numerical computations implemented in pure Python, the GIL prevents any speedup from additional threads, as the workload cannot be distributed across cores. For instance, a pure Python matrix multiplication benchmark using multiple threads shows virtually no performance improvement over a single-threaded version, with runtimes remaining constant at approximately 22 seconds regardless of thread count on an Intel i7-10700K system with 8 cores. This limitation arises because the GIL serializes bytecode instructions like loops and arithmetic operations, forcing threads to contend for the lock rather than executing in parallel.[14][15]
A direct consequence is uneven CPU utilization, where a multi-threaded CPU-bound loop, such as a simple countdown iteration, pegs one core at 100% usage while leaving others idle, as observed in profiling tools on systems with four or more cores. Measurements using timing utilities like timeit further highlight this issue: a 50 million-iteration countdown takes about 6.2 seconds single-threaded but 6.9 seconds with multiple threads, revealing overhead from lock acquisition and release without any parallelism gains. This behavior underscores how the GIL limits Python's ability to leverage modern hardware for concurrent CPU-intensive workloads, as predicted by principles like Amdahl's law, where the serialized portion dominates execution time.[14][16]
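A sketch of the countdown measurement described above; exact timings vary by machine, but under the GIL the threaded variant is consistently no faster than the sequential one:

import threading
import time

def countdown(n):
    while n > 0:
        n -= 1

N = 50_000_000

t0 = time.perf_counter()
countdown(N)
print(f"sequential:  {time.perf_counter() - t0:.2f}s")

# Two threads, half the work each: no parallelism under the GIL, and the
# lock handoffs typically make this slightly slower than the loop above.
t0 = time.perf_counter()
t1 = threading.Thread(target=countdown, args=(N // 2,))
t2 = threading.Thread(target=countdown, args=(N // 2,))
t1.start(); t2.start()
t1.join(); t2.join()
print(f"two threads: {time.perf_counter() - t0:.2f}s")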
Scalability Challenges
The Global Interpreter Lock (GIL) in CPython significantly hinders multi-core processor utilization, as it serializes access to the Python interpreter, preventing multiple threads from executing Python bytecode simultaneously even on systems with numerous cores.[2] This limitation is particularly pronounced in high-performance computing (HPC) environments, where Python workflows often resort to multiprocessing to bypass the GIL, resulting in uncontrolled resource consumption and potential underutilization of available cores due to orphaned processes and inefficient thread management.[17] For instance, in HPC clusters, the GIL forces the creation of separate processes for parallel tasks, which can lead to excessive memory usage and difficulties in adhering to job scheduling constraints, thereby complicating Python's use for compute-intensive simulations on multi-core architectures.[17]
In server applications, the GIL poses substantial challenges for scaling web servers and data processing systems, as it bottlenecks CPU-bound operations and necessitates process forking or horizontal scaling across multiple instances to achieve concurrency, rather than leveraging threads within a single process.[2] This approach increases overhead from inter-process communication and memory duplication, making it harder to efficiently handle high-throughput workloads like real-time data ingestion or API serving on multi-core servers without significant infrastructure costs.[2]
As of 2025, the GIL continues to pose challenges in core machine learning training pipelines, where CPU-intensive tasks such as data preprocessing and loading are often offloaded to NumPy or C++ backends to circumvent the lock's restrictions on multi-threading.[18] In these pipelines, the GIL causes serialization in data loading stages, leading to underutilized cores during training on large datasets and prompting reliance on libraries that explicitly release the GIL during computations.[19]
The GIL's persistence has shaped Python's ecosystem toward hybrid approaches, where large-scale deployments integrate C extensions or external libraries to release the lock and enable parallelism, thereby adding layers of complexity in development, maintenance, and integration for distributed systems.[14] With the proliferation of multi-core processors, this reliance on hybrid strategies amplifies deployment challenges in resource-constrained environments, as developers must balance Python's ease of use with performance optimizations via lower-level integrations.[2]
Workarounds and Alternatives
Multiprocessing and Async IO
The multiprocessing module in Python provides a standard library approach to achieve true parallelism by leveraging operating system processes rather than threads, thereby circumventing the limitations imposed by the Global Interpreter Lock (GIL).[20] Each spawned process runs an independent instance of the Python interpreter, complete with its own memory space and GIL, which enables multiple processes to execute CPU-bound code simultaneously across multiple cores.[20] To facilitate coordination between these isolated processes, the module employs inter-process communication (IPC) mechanisms such as queues, pipes, and shared memory, allowing data exchange without shared state conflicts.[20]
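A minimal sketch of this IPC style, using a multiprocessing.Queue to collect results from worker processes:

from multiprocessing import Process, Queue

def square(n, out):
    out.put((n, n * n))  # results cross the process boundary via the queue

if __name__ == "__main__":
    out = Queue()
    procs = [Process(target=square, args=(n, out)) for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(sorted(out.get() for _ in procs))  # [(0, 0), (1, 1), (2, 4), (3, 9)]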
In contrast, the asyncio module implements asynchronous I/O through cooperative multitasking with coroutines, operating within a single thread to handle concurrent operations without invoking the threading model.[21] This design is particularly effective for I/O-bound workloads, where tasks frequently await external resources like network responses or file operations; during these waits, the GIL is temporarily released, permitting the event loop to schedule other coroutines efficiently.[21] As a result, asyncio avoids the overhead of thread creation and context switching while scaling well for scenarios involving numerous non-blocking I/O calls, such as serving multiple client connections.[21]
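A minimal sketch of this cooperative model; both coroutines wait concurrently on a single thread, so the total runtime is about one second rather than two:

import asyncio

async def fetch(label, delay):
    await asyncio.sleep(delay)  # stands in for a network or disk wait
    return f"{label} done after {delay}s"

async def main():
    # gather() schedules both coroutines on the same event loop; neither
    # blocks the thread while awaiting.
    results = await asyncio.gather(fetch("a", 1), fetch("b", 1))
    print(results)

asyncio.run(main())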
A key distinction in their application lies in task suitability: asyncio excels in I/O-intensive activities like web scraping or API polling, where concurrency rather than parallelism is paramount, whereas multiprocessing is better suited for compute-heavy simulations or data processing that benefit from parallel CPU utilization.[20][21]
Subinterpreters and Extensions
Subinterpreters provide a mechanism for running multiple isolated Python interpreters within a single process, enabling better concurrency by associating each with its own independent Global Interpreter Lock (GIL). This feature, enhanced by PEP 684, accepted for Python 3.12, relocates the GIL to a per-interpreter scope, allowing subinterpreters to execute Python bytecode simultaneously without contending for a shared lock.[22] Prior to this, subinterpreters shared a single GIL, limiting their utility for parallelism, but the per-interpreter GIL isolates runtime state and enables true multi-threaded execution across interpreters.[22] This approach mitigates GIL bottlenecks in multi-threaded applications by confining thread contention to within each subinterpreter, while maintaining isolation to prevent cross-interpreter data races.[23]
C extensions offer another workaround by explicitly releasing the GIL during compute-intensive or I/O-bound operations, permitting other threads to proceed. Developers use the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros in C code to yield the GIL temporarily, as documented in the CPython C API.[3] For instance, NumPy employs this technique in its low-level array operations, releasing the GIL to allow parallel execution of numerical computations across multiple threads. This selective release is particularly effective for extensions performing long-running tasks, balancing Python's thread safety with improved multi-core utilization without altering the core interpreter.
Third-party projects have explored advanced modifications to address GIL limitations through experimental locking strategies. The nogil project, initiated before 2023 as a proof-of-concept fork of CPython, investigated removing the GIL entirely by implementing fine-grained, per-object locks to enable multithreading without a global mutex.[24] These efforts, which demonstrated viable parallelism in benchmarks, have since been integrated into official no-GIL development tracks, influencing proposals like PEP 703 for optional GIL builds.[2]
Hybrid approaches combine Python with lower-level languages like C++ to bypass GIL constraints in parallel sections. Tools such as Cython facilitate this by compiling Python-like code to C, where developers can declare functions with the nogil qualifier to release the GIL during execution, enabling multi-threaded C++ integration via directives like OpenMP.[25] Similarly, the ctypes module allows calling C++ libraries from Python, and if those libraries manage their own threading without Python API interactions, they avoid GIL acquisition altogether. These methods are ideal for performance-critical code, such as scientific computing, where Python handles high-level logic and C++ manages parallel workloads.
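A minimal subinterpreter sketch, assuming the concurrent.interpreters module proposed in PEP 734 and targeted at Python 3.14 (earlier versions exposed only private APIs, and the interface may still change):

from concurrent import interpreters  # assumption: Python 3.14+, PEP 734

# Each subinterpreter has its own GIL (PEP 684), so work inside it does not
# contend with the main interpreter's lock.
interp = interpreters.create()
interp.exec("print('in a subinterpreter:', sum(range(1000)))")
interp.close()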
Recent Developments
Experimental No-GIL Builds
In Python 3.13, released in 2024, CPython introduced experimental support for free-threaded builds through the --disable-gil configuration flag, as outlined in PEP 703.[2] This option allows compiling the interpreter without the Global Interpreter Lock (GIL), enabling true parallelism in multithreaded Python code while maintaining backward compatibility via a distinct ABI tagged with "t" for threading.[13] The feature represents an early milestone in efforts to optionalize the GIL, building on years of community debate about its removal to better leverage multi-core processors.[26]
A key challenge in implementing no-GIL builds was ensuring thread-safe memory management, particularly reference counting, which traditionally relies on non-atomic operations protected by the GIL. PEP 703 addresses this through biased reference counting, where increments are optimistic and decrements may be deferred, combined with per-object locks for contested cases; this approach draws on prior advancements like immortal objects in PEP 683, which fix reference counts for common immutable objects to reduce synchronization overhead.[2][27] These mechanisms minimize the need for expensive atomic operations in most scenarios, though they introduce additional complexity for object lifecycle management.
Early no-GIL builds in Python 3.13 exhibited a notable single-threaded performance overhead of 20-50% compared to standard GIL-enabled builds, primarily due to the added synchronization costs in reference counting and bytecode execution.[28] For instance, in a prime-counting benchmark, the no-GIL variant took approximately 33% longer for single-threaded execution.[28] However, this comes with significant benefits for CPU-bound multithreaded workloads, where the absence of the GIL allows threads to run concurrently across cores.
Community-driven testing has highlighted these trade-offs through benchmarks on platforms like GitHub. In CPU-bound tasks, such as parallel numerical computations, no-GIL builds demonstrated speedups of up to 3.4x on four threads relative to their single-threaded baseline, approaching near-linear scaling on multi-core systems—contrasting sharply with GIL-limited threading, which offers little to no parallelism.[28][29] Repositories like faster-cpython/benchmarking-public provide ongoing comparative data, showing that while single-threaded slowdowns persist, multithreaded gains make no-GIL viable for parallel-intensive applications like scientific computing.[29]
Python 3.14 and Beyond
Python 3.14, released on October 7, 2025, marks a significant milestone in addressing the Global Interpreter Lock (GIL) by officially supporting free-threaded builds as a standard variant, no longer designating them as experimental.[4] These builds can be compiled from source using the ./configure --disable-gil option or obtained via official macOS and Windows installers, enabling true multi-core parallelism without the GIL by default.[13] In free-threaded mode, the GIL can optionally be re-enabled at runtime using the PYTHON_GIL=1 environment variable or the -X gil=1 command-line flag for compatibility testing.[13]
Performance in free-threaded Python 3.14 has been optimized, with single-threaded workloads incurring only a 5-10% overhead compared to GIL-enabled builds, achieved through enhancements like the thread-safe specializing adaptive interpreter and deferred reference counting for immortal objects.[4][13] For multi-threaded, CPU-bound applications, this mode delivers substantial speedups by allowing concurrent execution across cores, with gains scaling based on workload and hardware—typically 2-3x or more on multi-core systems for parallelizable tasks.[4] As of November 2025, default Python 3.14 distributions retain the GIL for backward compatibility, but free-threaded builds are recommended for new multi-threaded applications to leverage these parallelism benefits.[4]
Looking ahead, PEP 779 establishes criteria for advancing free-threaded support, including performance thresholds met in 3.14, and outlines a phased approach toward making it the default build in future releases, potentially Python 3.16 or later, pending community adoption and further optimizations.[30] Ecosystem adaptations are underway, with tools like Cython and pybind11 updating for compatibility, while major libraries such as Pandas are progressively supporting free-threaded mode through ongoing compatibility efforts and ABI stability.[4][30] This progression aims to balance innovation with the stability of Python's vast extension ecosystem.
Examples
Python Threading Demonstration
To demonstrate the impact of the Global Interpreter Lock (GIL) on Python threading, consider a simple CPU-bound task that performs intensive computations, such as summing large ranges of numbers in loops. In a single-threaded execution, the code runs straightforwardly without concurrency overhead.

import time

def cpu_bound_task(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def single_threaded():
    start = time.perf_counter()
    result1 = cpu_bound_task(10**7)
    result2 = cpu_bound_task(10**7)
    end = time.perf_counter()
    print(f"Single-threaded time: {end - start:.2f} seconds")
    print(f"Results: {result1}, {result2}")

single_threaded()
The script uses time.perf_counter() for high-resolution timing. On a typical modern CPU, this might take around 1-2 seconds, depending on hardware, as the computations fully utilize the single core without interruption.
Now, contrast this with a multi-threaded version using threading.Thread to run the same two tasks concurrently:
import threading
import time

def cpu_bound_task(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def threaded_task():
    result = cpu_bound_task(10**7)
    print(f"Thread result: {result}")

def multi_threaded():
    start = time.perf_counter()
    thread1 = threading.Thread(target=threaded_task)
    thread2 = threading.Thread(target=threaded_task)
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    end = time.perf_counter()
    print(f"Multi-threaded time: {end - start:.2f} seconds")

multi_threaded()
For this CPU-bound workload, the multi-threaded version typically takes about as long as, or slightly longer than, the single-threaded one: the GIL allows only one thread to execute bytecode at a time, and lock handoffs add overhead. Threading does help for I/O-bound work, however, because the GIL is released during blocking calls. The following example uses time.sleep() to mimic blocking I/O:
import threading
import time

def io_bound_task(duration):
    print(f"Task sleeping for {duration} seconds")
    time.sleep(duration)
    print(f"Task {duration} completed")

def multi_threaded_io():
    start = time.perf_counter()
    thread1 = threading.Thread(target=io_bound_task, args=(2,))
    thread2 = threading.Thread(target=io_bound_task, args=(2,))
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    end = time.perf_counter()
    print(f"Multi-threaded I/O time: {end - start:.2f} seconds")

multi_threaded_io()
Here both threads sleep concurrently, so the total runtime is roughly 2 seconds rather than 4: time.sleep() releases the GIL while blocking, letting the other thread run.
Comparative Code Analysis
To illustrate the impact of the Global Interpreter Lock (GIL) on parallelism, consider a CPU-bound task: computing the sum of a large range of integers (e.g., 0 to 100,000,000) divided across four workers. In standard CPython with the GIL enabled, multithreading fails to achieve true parallelism because only one thread executes Python bytecode at a time, leading to performance equivalent to a single-threaded implementation. The following code uses the threading module to attempt parallel summation, but due to the GIL, it does not utilize multiple cores effectively:
import threading
import time

def sum_range(start, end):
    total = 0
    for i in range(start, end):
        total += i
    return total

if __name__ == "__main__":
    n = 100_000_000
    chunk_size = n // 4
    partial_sums = [None] * 4
    threads = []
    start_time = time.time()

    def worker(start, end, idx):
        partial_sums[idx] = sum_range(start, end)

    for i in range(4):
        start = i * chunk_size
        end = start + chunk_size if i < 3 else n
        t = threading.Thread(target=worker, args=(start, end, i))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()

    total = sum(partial_sums)
    end_time = time.time()
    print(f"Total: {total}, Time: {end_time - start_time:.2f}s")
On a four-core machine, this threaded version runs no faster than a single-threaded loop, since the GIL serializes the workers. By contrast, the multiprocessing module bypasses the GIL by spawning separate processes, each with its own interpreter and memory space, enabling true parallelism across cores. The code below uses a process pool for the same summation task:
from multiprocessing import Pool
import time

def sum_range(start, end):
    total = 0
    for i in range(start, end):
        total += i
    return total

if __name__ == "__main__":
    n = 100_000_000
    chunk_size = n // 4
    segments = [(i * chunk_size, (i + 1) * chunk_size if i < 3 else n) for i in range(4)]
    start_time = time.time()
    with Pool(4) as p:
        partial_sums = p.starmap(sum_range, segments)
    total = sum(partial_sums)
    end_time = time.time()
    print(f"Total: {total}, Time: {end_time - start_time:.2f}s")
Finally, a free-threaded CPython build (enabled with --disable-gil during compilation or via official free-threaded installers), which became officially supported in Python 3.14 (released October 2025) per PEP 779, allows the same threading code to execute in parallel without lock contention.[30] The code remains identical to the multithreading example above, but requires a free-threaded Python environment:
- Build from source: ./configure --disable-gil && make && make install.[13]
- Or download pre-built binaries from python.org for supported platforms.
| Approach | Runtime on 4-Core CPU (approx.) | Scalability Notes |
|---|---|---|
| Single-Threaded | 10-12 seconds | Baseline; uses 1 core. |
| Threading (GIL-enabled) | 10-12 seconds | No parallelism; GIL serializes execution.[31] |
| Multiprocessing | 2.5-3 seconds | Linear scaling with cores; process overhead ~20-30%.[31] |
| Threading (No-GIL, Python 3.14) | 2-2.5 seconds | Matches multiprocessing; shared memory reduces overhead to 5-10%.[13][32] |
