Thread (computing)
from Wikipedia
Image: A process with two threads of execution, running on one processor

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system.[1] In many cases, a thread is a component of a process.

The multiple threads of a given process may be executed concurrently (via multithreading capabilities), sharing resources such as memory, while different processes do not share these resources. In particular, the threads of a process share its executable code and the values of its dynamically allocated variables and non-thread-local global variables at any given time.

The implementation of threads and processes differs between operating systems.[2][page needed]

History

Threads made an early appearance under the name of "tasks" in IBM's batch processing operating system, OS/360, in 1967. OS/360 was offered in three configurations of its control system, of which multiprogramming with a variable number of tasks (MVT) was one. Saltzer (1966) credits Victor A. Vyssotsky with the term "thread".[3]

The Mach implementation of threads was described in the summer of 1986.[4] OS/2 1.0, released in 1987, supported threads.[5] The first version of Windows to support threads was Windows NT, released in 1993.

The use of threads in software applications became more common in the early 2000s as CPUs began to feature multiple cores. To gain a performance advantage from the additional cores, applications had to employ concurrency.[6]

Scheduling can be done at the kernel level or user level, and multitasking can be done preemptively or cooperatively. This yields a variety of related concepts.

Processes

At the kernel level, a process contains one or more kernel threads, which share the process's resources, such as memory and file handles – a process is a unit of resources, while a thread is a unit of scheduling and execution. Kernel scheduling is typically uniformly done preemptively or, less commonly, cooperatively. At the user level a process such as a runtime system can itself schedule multiple threads of execution. If these do not share data, as in Erlang, they are usually analogously called processes,[7] while if they share data they are usually called (user) threads, particularly if preemptively scheduled. Cooperatively scheduled user threads are known as fibers; different processes may schedule user threads differently. User threads may be executed by kernel threads in various ways (one-to-one, many-to-one, many-to-many). The term light-weight process variously refers to user threads or to kernel mechanisms for scheduling user threads onto kernel threads.

A process is a heavyweight unit of kernel scheduling, as creating, destroying, and switching processes is relatively expensive. Processes own resources allocated by the operating system. Resources include memory (for both code and data), file handles, sockets, device handles, windows, and a process control block. Processes are isolated by process isolation, and do not share address spaces or file resources except through explicit methods such as inheriting file handles or shared memory segments, or mapping the same file in a shared way – see Interprocess communication. Creating or destroying a process is relatively expensive, as resources must be acquired or released. Processes are typically preemptively multitasked, and process switching is relatively expensive, beyond basic cost of context switching, due to issues such as cache flushing (in particular, process switching changes virtual memory addressing, causing invalidation and thus flushing of an untagged translation lookaside buffer (TLB), notably on x86).

Kernel threads

A kernel thread is a lightweight unit of kernel scheduling. At least one kernel thread exists within each process. If multiple kernel threads exist within a process, then they share the same memory and file resources. Kernel threads are preemptively multitasked if the operating system's process scheduler is preemptive. Kernel threads do not own resources except for a stack, a copy of the registers including the program counter, and thread-local storage (if any), and are thus relatively cheap to create and destroy. Thread switching is also relatively cheap: it requires a context switch (saving and restoring registers and stack pointer), but does not change virtual memory and is thus cache-friendly (leaving the TLB valid). The kernel can assign one or more software threads to each core in a CPU (each core being able to run multiple software threads, depending on its support for hardware multithreading), and can swap out threads that get blocked. However, kernel threads take much longer than user threads to be swapped.

User threads

Threads are sometimes implemented in userspace libraries, thus called user threads. The kernel is unaware of them, so they are managed and scheduled in userspace. Some implementations base their user threads on top of several kernel threads, to benefit from multi-processor machines (M:N model). User threads as implemented by virtual machines are also called green threads.

As user thread implementations are typically entirely in userspace, context switching between user threads within the same process is extremely efficient because it does not require any interaction with the kernel at all: a context switch can be performed by locally saving the CPU registers used by the currently executing user thread or fiber and then loading the registers required by the user thread or fiber to be executed. Since scheduling occurs in userspace, the scheduling policy can be more easily tailored to the requirements of the program's workload.

However, the use of blocking system calls in user threads (as opposed to kernel threads) can be problematic. If a user thread or a fiber performs a system call that blocks, the other user threads and fibers in the process are unable to run until the system call returns. A typical example of this problem is when performing I/O: most programs are written to perform I/O synchronously. When an I/O operation is initiated, a system call is made, and does not return until the I/O operation has been completed. In the intervening period, the entire process is "blocked" by the kernel and cannot run, which starves other user threads and fibers in the same process from executing.

A common solution to this problem (used, in particular, by many green threads implementations) is providing an I/O API that implements an interface that blocks the calling thread, rather than the entire process, by using non-blocking I/O internally, and scheduling another user thread or fiber while the I/O operation is in progress. Similar solutions can be provided for other blocking system calls. Alternatively, the program can be written to avoid the use of synchronous I/O or other blocking system calls (in particular, using non-blocking I/O, including lambda continuations and/or async/await primitives[8]).

Fibers

Fibers are an even lighter unit of scheduling which are cooperatively scheduled: a running fiber must explicitly yield to allow another fiber to run, which makes their implementation much easier than kernel or user threads. A fiber can be scheduled to run in any thread in the same process. This permits applications to gain performance improvements by managing scheduling themselves, instead of relying on the kernel scheduler (which may not be tuned for the application). Some research implementations of the OpenMP parallel programming model implement their tasks through fibers.[9][10] Closely related to fibers are coroutines, with the distinction being that coroutines are a language-level construct, while fibers are a system-level construct.

Threads vs processes

Threads differ from traditional multitasking operating-system processes in several ways:

  • processes are typically independent, while threads exist as subsets of a process
  • processes carry considerably more state information than threads, whereas multiple threads within a process share process state as well as memory and other resources
  • processes have separate address spaces, whereas threads share their address space
  • processes interact only through system-provided inter-process communication mechanisms
  • context switching between threads in the same process typically occurs faster than context switching between processes

Systems such as Windows NT and OS/2 are said to have cheap threads and expensive processes; in other operating systems there is not so great a difference except in the cost of an address-space switch, which on some architectures (notably x86) results in a translation lookaside buffer (TLB) flush.

Advantages and disadvantages of threads vs processes include:

  • Lower resource consumption of threads: using threads, an application can operate using fewer resources than it would need when using multiple processes.
  • Simplified sharing and communication of threads: unlike processes, which require a message passing or shared memory mechanism to perform inter-process communication (IPC), threads can communicate through data, code and files they already share.
  • Thread crashes a process: due to threads sharing the same address space, an illegal operation performed by a thread can crash the entire process; therefore, one misbehaving thread can disrupt the processing of all the other threads in the application.

Scheduling

Preemptive vs cooperative scheduling

Operating systems schedule threads either preemptively or cooperatively. Multi-user operating systems generally favor preemptive multithreading for its finer-grained control over execution time via context switching. However, preemptive scheduling may context-switch threads at moments unanticipated by programmers, thus causing lock convoy, priority inversion, or other side-effects. In contrast, cooperative multithreading relies on threads to relinquish control of execution, thus ensuring that threads run to completion. This can cause problems if a cooperatively-multitasked thread blocks by waiting on a resource or if it starves other threads by not yielding control of execution during intensive computation.

Single- vs multi-processor systems

Until the early 2000s, most desktop computers had only one single-core CPU, with no support for hardware threads, although threads were still used on such computers because switching between threads was generally still quicker than full-process context switches. In 2002, Intel added support for simultaneous multithreading to the Pentium 4 processor, under the name hyper-threading; in 2005, they introduced the dual-core Pentium D processor and AMD introduced the dual-core Athlon 64 X2 processor.

Systems with a single processor generally implement multithreading by time slicing: the central processing unit (CPU) switches between different software threads. This context switching usually occurs frequently enough that users perceive the threads or tasks as running in parallel (for popular server/desktop operating systems, the maximum time slice of a thread, when other threads are waiting, is often limited to 100–200 ms). On a multiprocessor or multi-core system, multiple threads can execute in parallel, with every processor or core executing a separate thread simultaneously; on a processor or core with hardware threads, separate software threads can also be executed concurrently by separate hardware threads.

Threading models

1:1 (kernel-level threading)

Threads created by the user in a 1:1 correspondence with schedulable entities in the kernel[11] are the simplest possible threading implementation. OS/2 and Win32 used this approach from the start, while on Linux the GNU C Library implements this approach (via the NPTL or older LinuxThreads). This approach is also used by Solaris, NetBSD, FreeBSD, macOS, and iOS.

M:1 (user-level threading)

An M:1 model implies that all application-level threads map to one kernel-level scheduled entity;[11] the kernel has no knowledge of the application threads. With this approach, context switching can be done very quickly and, in addition, it can be implemented even on simple kernels which do not support threading. One of the major drawbacks, however, is that it cannot benefit from the hardware acceleration on multithreaded processors or multi-processor computers: there is never more than one thread being scheduled at the same time.[11] For example, if one of the threads needs to execute an I/O request, the whole process is blocked and the threading advantage cannot be used. GNU Portable Threads uses user-level threading, as does State Threads.

M:N (hybrid threading)

M:N maps some M number of application threads onto some N number of kernel entities,[11] or "virtual processors." This is a compromise between kernel-level ("1:1") and user-level ("N:1") threading. In general, "M:N" threading systems are more complex to implement than either kernel or user threads, because changes to both kernel and user-space code are required[clarification needed]. In the M:N implementation, the threading library is responsible for scheduling user threads on the available schedulable entities; this makes context switching of threads very fast, as it avoids system calls. However, this increases complexity and the likelihood of priority inversion, as well as suboptimal scheduling without extensive (and expensive) coordination between the userland scheduler and the kernel scheduler.

Hybrid implementation examples

History of threading models in Unix systems

SunOS 4.x implemented light-weight processes or LWPs. NetBSD 2.x+ and DragonFly BSD implement LWPs as kernel threads (1:1 model). SunOS 5.2 through SunOS 5.8, as well as NetBSD 2 to NetBSD 4, implemented a two-level model, multiplexing one or more user-level threads on each kernel thread (M:N model). SunOS 5.9 and later, as well as NetBSD 5, eliminated user-threads support, returning to a 1:1 model.[12] FreeBSD 5 implemented the M:N model. FreeBSD 6 supported both 1:1 and M:N; users could choose which model should be used with a given program via /etc/libmap.conf. Starting with FreeBSD 7, 1:1 became the default. FreeBSD 8 no longer supports the M:N model.

Single-threaded vs multithreaded programs

In computer programming, single-threading is the processing of one instruction at a time.[13] In the formal analysis of the variables' semantics and process state, the term single threading can be used differently to mean "backtracking within a single thread", which is common in the functional programming community.[14]

Multithreading is mainly found in multitasking operating systems. Multithreading is a widespread programming and execution model that allows multiple threads to exist within the context of one process. These threads share the process's resources, but are able to execute independently. The threaded programming model provides developers with a useful abstraction of concurrent execution. Multithreading can also be applied to one process to enable parallel execution on a multiprocessing system.

Multithreading libraries tend to provide a function call to create a new thread, which takes a function as a parameter. A concurrent thread is then created which starts running the passed function and ends when the function returns. The thread libraries also offer data synchronization functions.
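
As an illustration, a minimal POSIX Threads (Pthreads) sketch of this pattern might look like the following; the worker function and its argument are invented for the example.

```c
#include <pthread.h>
#include <stdio.h>

/* Illustrative worker: runs concurrently and ends when it returns. */
static void *worker(void *arg) {
    const char *name = arg;
    printf("hello from %s\n", name);
    return NULL;
}

int main(void) {
    pthread_t tid;
    /* Create a thread that starts executing worker("worker-1"). */
    if (pthread_create(&tid, NULL, worker, "worker-1") != 0)
        return 1;
    /* Wait for the thread to finish (a basic form of synchronization). */
    pthread_join(tid, NULL);
    return 0;
}
```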

Threads and data synchronization

Threads in the same process share the same address space. This allows concurrently running code to couple tightly and conveniently exchange data without the overhead or complexity of an IPC. When shared between threads, however, even simple data structures become prone to race conditions if they require more than one CPU instruction to update: two threads may end up attempting to update the data structure at the same time and find it unexpectedly changing underfoot. Bugs caused by race conditions can be very difficult to reproduce and isolate.

To prevent this, threading application programming interfaces (APIs) offer synchronization primitives such as mutexes to lock data structures against concurrent access. On uniprocessor systems, a thread running into a locked mutex must sleep and hence trigger a context switch. On multi-processor systems, the thread may instead poll the mutex in a spinlock. Both of these may sap performance and force processors in symmetric multiprocessing (SMP) systems to contend for the memory bus, especially if the granularity of the locking is too fine.
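
A minimal sketch of this idea with a Pthreads mutex is shown below; the shared counter and loop count are illustrative. The increment is a read-modify-write sequence, so without the lock two threads could interleave and lose updates.

```c
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                                  /* shared data */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);   /* only one thread may update at a time */
        counter++;                   /* read-modify-write: not atomic by itself */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  /* 2000000 with the mutex; unpredictable without it */
    return 0;
}
```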

Other synchronization APIs include condition variables, critical sections, semaphores, and monitors.

Thread pools

A popular programming pattern involving threads is that of thread pools, where a set number of threads are created at startup and then wait for tasks to be assigned. When a new task arrives, an idle thread wakes up, completes the task, and goes back to waiting. This avoids the relatively expensive thread creation and destruction for every task performed and takes thread management out of the application developer's hands, leaving it to a library or the operating system, which is better suited to optimize thread management.
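
The following is a simplified sketch of a fixed-size thread pool built on Pthreads primitives; the worker count, queue capacity, and task type are arbitrary choices for the example, and shutdown and error handling are omitted.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4
#define QUEUE_CAP   64

/* A task is just a function pointer plus an argument in this sketch. */
typedef struct { void (*fn)(void *); void *arg; } task_t;

static task_t queue[QUEUE_CAP];
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;

/* Enqueue a task (silently dropped on overflow in this simplified version). */
static void submit(void (*fn)(void *), void *arg) {
    pthread_mutex_lock(&qlock);
    if (count < QUEUE_CAP) {
        queue[tail] = (task_t){ fn, arg };
        tail = (tail + 1) % QUEUE_CAP;
        count++;
        pthread_cond_signal(&qcond);   /* wake one waiting worker */
    }
    pthread_mutex_unlock(&qlock);
}

/* Each worker sleeps until a task arrives, runs it, then waits again. */
static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (count == 0)
            pthread_cond_wait(&qcond, &qlock);
        task_t t = queue[head];
        head = (head + 1) % QUEUE_CAP;
        count--;
        pthread_mutex_unlock(&qlock);
        t.fn(t.arg);                    /* execute outside the lock */
    }
    return NULL;
}

static void print_task(void *arg) { printf("task %ld\n", (long)arg); }

int main(void) {
    pthread_t workers[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&workers[i], NULL, worker, NULL);
    for (long i = 0; i < 10; i++)
        submit(print_task, (void *)i);
    pthread_exit(NULL);   /* keep the process alive so workers can drain the queue */
}
```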

Multithreaded programs vs single-threaded programs pros and cons

Multithreaded applications have the following advantages vs single-threaded ones:

  • Responsiveness: multithreading can allow an application to remain responsive to input. In a one-thread program, if the main execution thread blocks on a long-running task, the entire application can appear to freeze. By moving such long-running tasks to a worker thread that runs concurrently with the main execution thread, it is possible for the application to remain responsive to user input while executing tasks in the background. On the other hand, in most cases multithreading is not the only way to keep a program responsive, with non-blocking I/O and/or Unix signals being available for obtaining similar results.[15]
  • Parallelization: applications looking to use multicore or multi-CPU systems can use multithreading to split data and tasks into parallel subtasks and let the underlying architecture manage how the threads run, either concurrently on one core or in parallel on multiple cores. GPU computing environments like CUDA and OpenCL use the multithreading model where dozens to hundreds of threads run in parallel across data on a large number of cores. This, in turn, enables better system utilization and, provided that synchronization costs do not eat up the benefits, faster program execution.

Multithreaded applications have the following drawbacks:

  • Synchronization complexity and related bugs: when using shared resources typical for threaded programs, the programmer must be careful to avoid race conditions and other non-intuitive behaviors. In order for data to be correctly manipulated, threads will often need to rendezvous in time in order to process the data in the correct order. Threads may also require mutually exclusive operations (often implemented using mutexes) to prevent common data from being read or overwritten in one thread while being modified by another. Careless use of such primitives can lead to deadlocks, livelocks or races over resources. As Edward A. Lee has written: "Although threads seem to be a small step from sequential computation, in fact, they represent a huge step. They discard the most essential and appealing properties of sequential computation: understandability, predictability, and determinism. Threads, as a model of computation, are wildly non-deterministic, and the job of the programmer becomes one of pruning that nondeterminism."[16]
  • Being untestable. In general, multithreaded programs are non-deterministic, and as a result, are untestable. In other words, a multithreaded program can easily have bugs which never manifest on a test system, manifesting only in production.[17][16] This can be alleviated by restricting inter-thread communications to certain well-defined patterns (such as message-passing).
  • Synchronization costs. Since a thread context switch on modern CPUs can cost up to 1 million CPU cycles,[18] writing efficient multithreaded programs is difficult. In particular, special attention has to be paid to keep inter-thread synchronization from becoming too frequent.

Programming language support

Many programming languages support threading in some capacity.

  • IBM PL/I(F) included support for multithreading (called multitasking) as early as the late 1960s, and this was continued in the Optimizing Compiler and later versions. The IBM Enterprise PL/I compiler introduced a new model "thread" API. Neither version was part of the PL/I standard.
  • Many implementations of C and C++ support threading, and provide access to the native threading APIs of the operating system. A standardized interface for thread implementation is POSIX Threads (Pthreads), which is a set of C-function library calls. OS vendors are free to implement the interface as desired, but the application developer should be able to use the same interface across multiple platforms. Most Unix platforms, including Linux, support Pthreads. Microsoft Windows has its own set of thread functions in the process.h interface for multithreading, like beginthread.
  • Some higher level (and usually cross-platform) programming languages, such as Java, Python, and .NET Framework languages, expose threading to developers while abstracting the platform specific differences in threading implementations in the runtime. Several other programming languages and language extensions also try to abstract the concept of concurrency and threading from the developer fully (Cilk, OpenMP, Message Passing Interface (MPI)). Some languages are designed for sequential parallelism instead (especially using GPUs), without requiring concurrency or threads (Ateji PX, CUDA).
  • A few interpreted programming languages have implementations (e.g., Ruby MRI for Ruby, CPython for Python) which support threading and concurrency but not parallel execution of threads, due to a global interpreter lock (GIL). The GIL is a mutual exclusion lock held by the interpreter that can prevent the interpreter from interpreting the application's code on two or more threads at once. This effectively limits the parallelism on multiple-core systems. It also limits performance for processor-bound threads (which require the processor), but doesn't affect I/O-bound or network-bound ones as much. Other implementations of interpreted programming languages, such as Tcl using the Thread extension, avoid the GIL limit by using an Apartment model where data and code must be explicitly "shared" between threads. In Tcl each thread has one or more interpreters.
  • In programming models such as CUDA designed for data parallel computation, an array of threads run the same code in parallel using only its ID to find its data in memory. In essence, the application must be designed so that each thread performs the same operation on different segments of memory so that they can operate in parallel and use the GPU architecture.
  • Hardware description languages such as Verilog have a different threading model that supports extremely large numbers of threads (for modeling hardware).

from Grokipedia
In computing, a thread is the basic unit of CPU utilization and execution within a process, consisting of a program counter, a stack, registers, and a unique thread ID, enabling concurrent execution of code while sharing the process's memory and resources. Unlike a full process, which encompasses its own address space, threads are lightweight, allowing multiple threads to run within a single process to improve efficiency and responsiveness. Threads facilitate multithreading, where a program can perform several tasks simultaneously, such as handling user input in one thread while performing background computations in another, thereby enhancing performance on multiprocessor systems. They are managed by the operating system scheduler, which allocates processor time to threads, and can be implemented at the user level (handled by libraries within the application) or kernel level (directly by the OS for better control and parallelism). Key benefits include reduced overhead compared to creating separate processes, faster context switching, and improved resource sharing, though they introduce challenges such as the need for synchronization to prevent race conditions. Modern operating systems, such as Windows, Linux, and macOS, natively support threads as a fundamental concurrency mechanism.

Fundamentals

Definition and Characteristics

In computing, a thread is defined as the smallest sequence of programmed instructions that can be scheduled for execution, serving as the basic unit of CPU utilization within an operating system. It consists of essential components including a program counter to track the current instruction, a stack for managing local variables and function calls, a set of registers for temporary data storage, and a unique thread identifier. Unlike larger execution units, threads enable fine-grained control over concurrent operations by allowing multiple such sequences to run seemingly simultaneously under the management of the scheduler. Threads exhibit key characteristics that distinguish them as lightweight entities compared to full processes. They share the resources of their parent process, such as the address space, code, data, open files, and signals, which minimizes overhead in creation and context switching. This resource sharing facilitates concurrency within a single process, allowing multiple threads to perform independent tasks like handling user input, processing data, or managing I/O operations without the expense of duplicating process-level resources. On multi-core systems, threads support true parallelism, where distinct cores can execute different threads simultaneously, enhancing overall system throughput for compute-intensive applications. Threads operate as subunits of processes, providing a mechanism for intra-process concurrency, in contrast to standalone processes, which maintain isolated address spaces and resources and must rely on inter-process communication. While processes represent complete programs with their own address space and execution state, threads leverage the process's environment to focus solely on execution flow, making them more efficient for modular, responsive program design. The basic lifecycle of a thread encompasses creation, execution, blocking, and termination. A thread is created (or spawned) by allocating its private components like the stack and registers within an existing process, transitioning to a ready or runnable state awaiting scheduler assignment. During execution, it runs on the CPU until it completes, yields control voluntarily, or is preempted. Blocking occurs when the thread waits for an event, such as I/O completion or a resource lock, suspending its progress without terminating the process. Finally, termination happens upon finishing its instructions or explicit exit, freeing its resources while the process may continue with other threads.

Threads Within Processes

In computing, a process serves as a container for one or more threads, where each thread represents a distinct sequence of instructions that can execute independently within the shared context of the process. All threads belonging to the same process share the process's address space, including code, data, and heap segments, as well as resources such as open files and global variables. This structural integration allows threads to operate as lightweight subunits of the process, enabling concurrent execution without the full overhead of separate process creation. The sharing of resources among threads within a process provides significant benefits, particularly in facilitating efficient communication and reducing operational overhead. By accessing the same address space, threads can exchange data directly without the need for inter-process mechanisms like pipes or message queues, which are slower and more complex. This shared access minimizes the cost of thread creation and context switching compared to spawning new processes, as threads do not require duplicating the entire address space or reinitializing system resources. Despite this sharing, each thread maintains certain private elements to ensure isolation and prevent mutual interference during execution. Specifically, every thread has its own stack for local variables and function calls, a unique thread ID for identification by the operating system or threading library, and an independent execution state including the program counter, registers, and stack pointer. These private components allow threads to maintain distinct control flows while leveraging the process's common resources. A practical example of threads operating within a process is found in a multithreaded web server, where a single server process manages multiple client requests concurrently. Each incoming request is assigned to a separate thread, which uses the shared code base and global data structures (such as a request queue) for processing, but executes on its own private stack to handle client-specific operations like parsing HTTP headers or generating responses. This approach enables the server to scale efficiently with increasing traffic without creating separate processes for each connection.
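
A minimal sketch of such a thread-per-connection server, written against the POSIX socket and Pthreads APIs, is shown below; the port number and response are illustrative, and error handling and request parsing are omitted.

```c
#include <pthread.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Runs on its own stack in a worker thread; the listening socket, code,
   and any global state are shared with every other connection handler. */
static void *handle_client(void *arg) {
    int fd = (int)(long)arg;
    const char *reply = "HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok";
    char buf[1024];
    read(fd, buf, sizeof buf);          /* read (and ignore) the request */
    write(fd, reply, strlen(reply));
    close(fd);
    return NULL;
}

int main(void) {
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(8080),
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(srv, (struct sockaddr *)&addr, sizeof addr);
    listen(srv, 16);
    for (;;) {
        int client = accept(srv, NULL, NULL);
        pthread_t tid;
        /* One thread per connection: all handlers share the process's memory. */
        pthread_create(&tid, NULL, handle_client, (void *)(long)client);
        pthread_detach(tid);            /* no join needed; resources freed on exit */
    }
}
```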

Thread Types

Kernel-Level Threads

Kernel-level threads are threads managed directly by the operating system kernel, where the kernel creates and maintains a thread control block (TCB) for each thread to track its state, registers, stack, and scheduling information. Unlike user-level threads, which are handled by a user-space library invisible to the kernel, kernel-level threads are fully recognized by the operating system, allowing the kernel to treat them as schedulable entities akin to lightweight processes. This design requires system calls for thread creation (e.g., via APIs like pthread_create on Linux), termination, and synchronization, with the kernel allocating kernel resources such as memory and handling synchronization primitives. In implementation, the kernel performs context switches between kernel-level threads by saving and restoring the TCB contents, which involves mode switches from user to kernel space and back, enabling the scheduler to preempt or yield threads based on priorities or time slices. This kernel involvement ensures that threads from the same process can run concurrently on multiprocessor systems, as the scheduler can assign them to different CPUs for true parallelism and load balancing. Furthermore, when a kernel-level thread blocks on an I/O operation or system call, the kernel immediately schedules another ready thread from the same process, avoiding suspension of the entire process. The primary advantages of kernel-level threads stem from their integration with the OS scheduler, providing efficient support for blocking operations and multiprocessor utilization without the limitations of user-level threads, where a single blocking call can stall all threads in a process. They facilitate better resource allocation, as the kernel can migrate threads across processors to optimize performance. However, these benefits come with significant drawbacks, including higher overhead for thread operations due to mandatory kernel transitions; for instance, each context switch incurs costs averaging around 3.8 microseconds for direct switching time, plus additional indirect costs from cache and TLB flushes that can extend total overhead to several microseconds or more. This makes kernel-level threads slower to create and switch compared to user-level alternatives, potentially impacting performance in applications with frequent thread management. Examples of kernel-level threads include the native threading implementation in Windows NT and later versions, where threads are scheduled directly by the kernel executive, and Linux's native POSIX threads (NPTL), which map to kernel-level entities via the clone system call, enabling the OS to schedule them independently. These implementations are standard in modern Linux systems and Windows, supporting scalable concurrency for server applications and parallel tasks.

User-Level Threads

User-level threads are execution units managed entirely by a user-space library within a process, without any awareness or involvement from the operating system's kernel; from the kernel's perspective, the process appears as a single thread of execution. This approach allows multiple user-level threads to share the same address space and resources of the process, enabling efficient concurrency at the application level. In implementation, the user-level library maintains thread control blocks (TCBs) in user space to track each thread's state, stack, and registers. The library multiplexes these threads onto one or a limited number of kernel entities, such as a single kernel thread or lightweight process, handling creation, scheduling, synchronization, and switching through library calls rather than system calls. This user-space management avoids kernel intervention for routine operations, allowing thread switches to occur as simple procedure calls. The primary advantages of user-level threads stem from their isolation from kernel overhead: thread creation and context switching are significantly faster, often by orders of magnitude compared to kernel-level threads, since they bypass system calls and mode switches. Additionally, applications gain flexibility in scheduling, as the library can enforce custom policies optimized for specific workloads, such as priority-based or application-specific algorithms, without relying on the kernel's general-purpose scheduler. However, user-level threads have notable drawbacks. A blocking system call by one thread, such as for I/O, can suspend the entire process in the kernel, stalling all other threads and undermining concurrency. They also exhibit limited scalability on multiprocessor systems, as the kernel schedules the hosting process as a monolithic unit, preventing true parallelism across multiple CPUs despite multiple user-level threads. In contrast to kernel-level threads, this lack of kernel visibility leads to suboptimal scheduling by the OS. Examples of user-level thread implementations include green threads, which were employed in early versions of the Java Virtual Machine (JVM) to provide portable, lightweight concurrency without native OS support. Another is the GNU Portable Threads (Pth) library, a POSIX/ANSI-C-based system for Unix platforms that supports non-preemptive, priority-based scheduling of multiple threads in user space.

Lightweight Threads and Fibers

Fibers represent an ultra-lightweight form of execution context in computing, designed for manual control without reliance on operating system scheduling. Unlike traditional threads, fibers consist primarily of a user-allocated stack and a minimal execution state, such as processor registers and a program counter, omitting the full thread control block (TCB) found in kernel or user-level threads. This structure enables them to serve as building blocks for cooperative multitasking, where execution flows are paused and resumed explicitly by the application code, often to implement coroutines or asynchronous operations within a single host thread. Key characteristics of fibers include their stack-only footprint and programmer-driven switching, typically via calls that save the current context and restore another without kernel intervention. They share the address space and resources of the hosting thread, making them ideal for scenarios like event loops or non-blocking I/O, where multiple logical execution paths must interleave without the overhead of full thread creation. For instance, in asynchronous programming, a fiber can yield control during I/O waits, allowing other fibers to proceed on the same thread. Fibers thus approximate coroutines, providing a lightweight alternative to heavier user-level threads, which incorporate an automated scheduler. The primary advantages of fibers lie in their minimal memory footprint and fast context switching, with overhead limited to the specified stack size—often just a few kilobytes per fiber—compared to the megabytes typical for OS threads. This supports high-concurrency applications, such as servers handling thousands of connections, by enabling massive concurrency within limited memory and without the latency of kernel-mode transitions. Studies on threading frameworks highlight how such mechanisms can outperform OS threads in nested parallelism and task-heavy workloads by reducing scheduling costs. However, fibers have notable limitations due to their cooperative nature: they lack preemption, requiring explicit yields from the programmer, which can lead to starvation if one fiber performs prolonged computation without relinquishing control. Blocking system calls within a fiber will suspend the entire hosting thread, potentially undermining concurrency, and they do not constitute true parallel execution since all fibers serialize on the host thread. These constraints make fibers unsuitable for tasks needing automatic time-slicing. Prominent examples include the Windows Fibers API, which provides functions like CreateFiber to allocate a fiber with a custom stack and SwitchToFiber for explicit context switching, facilitating user-mode multitasking in applications like database servers. In Lua, coroutines offer fiber-like functionality through primitives such as coroutine.create and coroutine.resume, enabling efficient implementation of generators, simulators, and cooperative schedulers in scripting environments.
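
As a portable illustration of fiber-style cooperative switching, the following sketch uses the POSIX ucontext facilities (getcontext, makecontext, swapcontext) rather than the Windows Fibers API; the stack size and function names are arbitrary, and the host thread and the fiber simply pass control back and forth explicitly.

```c
#include <stdio.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, fiber_ctx;
static char fiber_stack[STACK_SIZE];

/* A "fiber": it must explicitly yield (swapcontext) to let the caller run. */
static void fiber_body(void) {
    printf("fiber: step 1\n");
    swapcontext(&fiber_ctx, &main_ctx);   /* cooperative yield */
    printf("fiber: step 2\n");
    /* falling off the end returns to uc_link (main_ctx) */
}

int main(void) {
    getcontext(&fiber_ctx);
    fiber_ctx.uc_stack.ss_sp = fiber_stack;   /* user-allocated stack */
    fiber_ctx.uc_stack.ss_size = sizeof fiber_stack;
    fiber_ctx.uc_link = &main_ctx;            /* where to go when the fiber ends */
    makecontext(&fiber_ctx, fiber_body, 0);

    printf("main: resuming fiber\n");
    swapcontext(&main_ctx, &fiber_ctx);       /* run fiber until it yields */
    printf("main: fiber yielded, resuming again\n");
    swapcontext(&main_ctx, &fiber_ctx);       /* run fiber to completion */
    printf("main: done\n");
    return 0;
}
```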

Threading Models

One-to-One Model

The one-to-one threading model establishes a direct correspondence between each user-level thread and a kernel-level thread, such that creating a user thread in an application automatically spawns a corresponding kernel thread managed by the operating system. In this approach, there is no multiplexing of user threads onto fewer kernel threads; instead, every user thread has its own dedicated kernel entity, enabling the kernel to treat them as independent schedulable units. Mechanically, the kernel assumes full responsibility for scheduling, context switching, and synchronization for all threads, bypassing any user-space thread library involvement in these operations. This direct mapping ensures that system calls and interrupts are handled at the kernel level without interference from other user threads in the same process, as each kernel thread operates autonomously. A key advantage of the one-to-one model is its ability to fully utilize multiprocessor systems, allowing multiple threads to execute in true parallel across available CPU cores without contention from shared kernel resources. Additionally, if one thread blocks on a system call or I/O operation, the kernel can immediately schedule another thread from the same process, preventing the entire process from stalling and enhancing overall concurrency. However, this model is resource-intensive because each user thread incurs the full overhead of kernel thread creation, including memory for thread control blocks and kernel stacks, which can limit scalability to typically a few thousand threads per process before performance degrades significantly. Context switches between kernel threads also involve higher costs due to mode transitions between user and kernel space. Prominent implementations include the Native POSIX Thread Library (NPTL) in modern Linux kernels starting from version 2.6, which adopted this model for POSIX threads (pthreads) to improve scalability and standards compliance. Windows operating systems, from Windows NT onward, employ a one-to-one model where user threads map directly to kernel threads managed by the executive scheduler. Similarly, Solaris versions 9 and later transitioned to this model as the default for lightweight processes (LWPs), replacing earlier hybrid approaches to simplify thread management and boost performance on multiprocessors.

Many-to-One Model

The many-to-one threading model, also known as the M:1 model, involves mapping multiple user-level threads to a single kernel-level thread, with thread creation, scheduling, and management performed entirely within a user-space library. In this approach, the operating system's kernel remains unaware of the individual user threads and treats the process as having only one schedulable entity, the kernel thread. The user-level thread library implements its own scheduler to multiplex the user threads onto this single kernel thread, handling context switches and synchronization without invoking kernel services for each operation. This model offers significant advantages in terms of efficiency and low overhead. Thread operations such as creation, termination, and switching occur in user space, avoiding the expense of kernel traps and system calls, which results in faster execution compared to kernel-managed threading. Context switching between user threads is particularly rapid, as it involves minimal state saving and restoration without kernel involvement, making it suitable for applications requiring high concurrency with limited resources. However, the many-to-one model has notable drawbacks that limit its scalability and reliability. A critical issue arises during blocking system calls: if the single kernel thread blocks—for instance, on I/O operations—all multiplexed user threads are effectively blocked, as the kernel cannot schedule any of them independently. Additionally, it provides no parallelism on multiprocessor systems, since only one kernel thread executes at a time, preventing true concurrent execution across multiple CPUs. In contrast to the one-to-one model, which maps each user thread directly to a kernel thread for better blocking behavior and multiprocessor support, the many-to-one approach prioritizes low-cost thread management at the expense of these limitations. Examples of the many-to-one model include early user-level implementations of POSIX threads, where user-level libraries managed multiple threads atop a single kernel entity before the adoption of native kernel support. It is also exemplified by green threads in early Java virtual machines, such as those on Solaris, which allowed applications to create numerous lightweight threads without kernel overhead. While largely deprecated in modern general-purpose operating systems due to its blocking and scalability issues, the model persists in some embedded systems and runtime environments where resource constraints favor user-space control.

Many-to-Many Model

The many-to-many threading model, also known as the two-level or hybrid model, implements an M:N mapping where multiple user-level threads (M) are dynamically assigned to multiple kernel-level threads (N), with N typically equal to or fewer than M to optimize resource usage. This approach employs a hybrid scheduler that operates partly in user space and partly in kernel space, allowing the operating system to create a sufficient number of kernel threads to support parallelism without requiring a one-to-one correspondence for every user thread. In terms of mechanics, a user-level thread library multiplexes the user threads onto the available kernel threads, managing context switches and scheduling decisions at the user level for efficiency, while the kernel scheduler handles the execution of the kernel threads across available processors. When a user thread blocks—such as during I/O operations—the user-level scheduler can reassign other runnable user threads to different kernel threads, preventing the entire process from stalling; this is facilitated by mechanisms like scheduler activations, where the kernel notifies the user-level scheduler of events such as thread blocking or processor availability. The dynamic assignment ensures that the mapping adapts to runtime conditions, balancing load across kernel threads without kernel awareness of individual user threads. This model offers advantages in balancing the efficiency of user-level threading—such as low-cost context switching—with the parallelism of kernel-level threading, enabling better utilization of multiprocessor systems and handling blocking operations without halting the whole process. It provides flexibility for applications needing many lightweight threads while avoiding the overhead of creating excessive kernel threads. However, drawbacks include the complexity of implementation, as coordinating the user-kernel interaction requires sophisticated synchronization to maintain mapping integrity, and potential overhead from maintaining and updating the thread-to-thread mappings during runtime. Examples of the many-to-many model include the Kernel Scheduled Entities (KSE) system in FreeBSD 5.x, which supported hybrid user-kernel threading through kernel-provided scheduling entities that allowed user libraries to manage multiple threads across kernel threads. Early versions of Solaris, prior to Solaris 9, utilized this model to map user threads to lightweight processes (kernel threads) for improved concurrency. Modern variants appear in Java's virtual threads, introduced in Java 21, where virtual threads are lightweight constructs scheduled by the JVM onto a pool of carrier threads (platform threads) in an M:N fashion, enhancing scalability for high-throughput applications.

Scheduling Mechanisms

Preemptive and Cooperative Scheduling

In cooperative scheduling, threads voluntarily relinquish control to the scheduler, typically through explicit yield calls or when blocking on I/O or synchronization primitives, allowing the next thread to execute without interruption from the operating system. This approach simplifies implementation at the user level, as it avoids the need for kernel intervention during thread execution, resulting in lower overhead from context switches and better predictability for short, well-behaved tasks. However, it risks system unresponsiveness if a thread enters an infinite loop or performs lengthy computations without yielding, potentially leading to starvation of other threads or even application hangs. Preemptive scheduling, in contrast, enables the operating system to forcibly suspend a running thread at any time, typically via hardware timer interrupts or when a higher-priority thread becomes ready, ensuring that no single thread can monopolize the CPU. This mechanism promotes fairness and responsiveness by allowing context switches based on time slices or priority levels, making it suitable for interactive and real-time applications where predictable progress across threads is essential. The scheduler evaluates thread priorities periodically—often every few milliseconds—and preempts the current thread if necessary, saving its state and loading the next one from the ready queue. The primary trade-offs between these approaches lie in overhead and reliability: cooperative scheduling incurs minimal runtime costs since threads manage their own switching, but it demands cooperative behavior from all threads, which is fragile in the presence of bugs or varying workloads. Preemptive scheduling, while introducing overhead from frequent interrupts and context switches, guarantees progress and prevents monopolization, though it can complicate debugging due to non-deterministic execution order. Hybrid models, such as those mixing cooperative execution within preemptive boundaries, have been explored to balance these aspects, but pure forms dominate standard implementations. Most modern operating systems, including Linux and Windows, employ preemptive scheduling for kernel-level threads to ensure robust multitasking, with timer-driven interrupts enforcing time slices typically ranging from 1 to 100 milliseconds depending on priority. In contrast, cooperative scheduling persists in lightweight constructs like Windows fibers, where execution is non-preemptive and divided manually among related tasks within a single kernel thread, offering efficiency for domain-specific concurrency without OS involvement. Early systems, such as Windows 3.x, relied heavily on cooperative models but transitioned to preemptive scheduling for better stability in multithreaded environments.

Scheduling on Uniprocessor and Multiprocessor Systems

In uniprocessor systems, thread scheduling achieves concurrency through time-slicing, where the operating system allocates fixed quanta of CPU time—typically 10 to 100 milliseconds—to each ready thread before performing a context switch to the next. This mechanism serializes execution on the single CPU, creating the illusion of simultaneous progress among multiple threads despite their inherent sequential processing. Context switches involve saving the state of the current thread (such as registers and the program counter) and restoring the state of the next, incurring overhead that the scheduler minimizes by selecting appropriate quantum sizes. A common algorithm for uniprocessor thread scheduling is round-robin, which cycles through ready threads in a fixed order, granting each one a time slice in turn to ensure fair CPU sharing and prevent indefinite blocking of lower-priority threads. This approach is particularly effective for interactive applications, as it maintains responsiveness by regularly preempting threads, though excessive context switches can degrade performance if quanta are too short. In practice, the scheduler maintains a ready queue to manage thread selection, prioritizing based on factors like virtual runtime to approximate fairness. Multiprocessor systems extend uniprocessor techniques by distributing threads across multiple CPUs, incorporating processor affinity to bind threads to specific processors and preserve locality in caches and memory. Affinity scheduling reduces migration-induced overhead by favoring the last processor a thread ran on, thereby minimizing cache misses and improving memory access speeds. Load balancing complements this by dynamically migrating threads from overloaded CPUs to idle ones, using techniques like periodic polling of runqueue lengths to maintain even utilization across processors. Key challenges in multiprocessor scheduling include the loss of cache affinity during thread migration, which invalidates cached data and increases latency, and considerations for Non-Uniform Memory Access (NUMA) architectures in large-scale systems where memory access times vary significantly between local and remote nodes. In NUMA environments, schedulers must balance load while aligning thread placement with memory affinity to avoid remote access penalties, often employing topology-aware policies to map threads to nodes with optimal data locality. These issues can amplify overhead if not addressed, leading to suboptimal performance in memory-intensive workloads. For multiprocessor load balancing, work-stealing queues provide an efficient mechanism in which each processor maintains a double-ended queue (deque) of tasks, allowing idle processors to "steal" tasks from the tail of busy processors' queues to equalize load without centralized coordination. This decentralized approach, with expected O(1) steal operations, scales well to many cores and is exemplified in Java's fork/join framework, where it supports recursive task partitioning for parallel computations. Work-stealing minimizes overhead and adapts dynamically to varying thread loads, outperforming global queues in reducing contention. Modern operating systems illustrate these concepts in practice. The Earliest Eligible Virtual Deadline First (EEVDF) scheduler, which superseded the Completely Fair Scheduler (CFS) in Linux kernel version 6.6 (2023), manages both uniprocessor and multiprocessor scenarios using per-CPU runqueues and a virtual deadline metric to select threads, ensuring fairness while supporting scalability through load balancing and affinity hints. Similarly, the Windows scheduler employs multiprocessor-aware policies, including ideal processor selection and soft affinity, to assign threads across CPUs, grouping related threads for efficient handling in symmetric multiprocessing environments. These implementations typically integrate preemptive scheduling to preempt threads at regular intervals, enhancing responsiveness in multithreaded applications.

Multithreading Applications

Single-Threaded Versus Multithreaded Programs

Single-threaded programs operate using a single thread of execution within a process, resulting in sequential processing where tasks are handled one at a time without overlap. This model limits concurrency, as the program cannot perform multiple operations simultaneously, making it suitable for straightforward, linear workflows. For instance, command-line interface (CLI) tools like text file processors or basic scripts exemplify this approach, executing operations such as reading input, performing computations, and producing output in a strictly ordered manner. In multithreaded programs, multiple threads share the same process address space and resources, enabling parallel execution of tasks that can overlap in time. This concurrency is advantageous for I/O-bound tasks, where one thread can wait for input/output operations while others proceed, or for CPU-bound tasks that benefit from distribution across available processors. Web servers represent a common use case, where separate threads handle incoming client requests concurrently, improving responsiveness without blocking the entire program on a single operation. Similarly, applications like games or scientific simulations leverage multithreading to update graphics, process user inputs, and run background computations in parallel. The synchronization implications differ markedly between the two models. Single-threaded programs require no synchronization mechanisms, as there is no risk of concurrent access to shared data, simplifying development and debugging while avoiding potential issues like race conditions. Multithreaded programs, by contrast, demand thread-safe code to manage shared resources, necessitating techniques such as mutexes or atomic operations to ensure correctness across threads. Transitioning from a single-threaded to a multithreaded design involves identifying independent tasks, then using thread creation APIs—like pthread_create in POSIX environments—to spawn and manage additional threads, thereby refactoring the program for concurrency.
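
As an illustration of such a refactoring, the sketch below splits a sequential array sum into per-thread chunks using pthread_create and pthread_join; the array contents, chunk count, and struct layout are invented for the example.

```c
#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define NTHREADS 4

static int data[N];

typedef struct { int start, end; long long sum; } chunk_t;

/* Each thread sums its own slice; no locking is needed because the slices
   do not overlap and each partial sum is private to its thread. */
static void *sum_chunk(void *arg) {
    chunk_t *c = arg;
    long long s = 0;
    for (int i = c->start; i < c->end; i++)
        s += data[i];
    c->sum = s;
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1;

    pthread_t tids[NTHREADS];
    chunk_t chunks[NTHREADS];
    int per = N / NTHREADS;
    for (int t = 0; t < NTHREADS; t++) {
        chunks[t] = (chunk_t){ t * per, (t == NTHREADS - 1) ? N : (t + 1) * per, 0 };
        pthread_create(&tids[t], NULL, sum_chunk, &chunks[t]);
    }
    long long total = 0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tids[t], NULL);
        total += chunks[t].sum;   /* combine partial results sequentially */
    }
    printf("total = %lld\n", total);
    return 0;
}
```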

Benefits and Drawbacks of Multithreading

Multithreading offers several key advantages in computing systems, primarily by enabling concurrent execution within a single process. One primary benefit is improved responsiveness, particularly in interactive applications, where a thread can continue processing events while other threads handle time-consuming operations like I/O or computations, preventing the application from appearing frozen. Another advantage is efficient resource sharing, as threads within the same process access a common memory space and resources without the need for inter-process communication mechanisms, which reduces overhead compared to separate processes. Additionally, multithreading enhances resource utilization on multiprocessor systems by allowing threads to execute in parallel across multiple cores, thereby exploiting hardware parallelism for compute-intensive tasks. This economy of thread creation and switching—where thread management is significantly cheaper than process management—further contributes to overall efficiency. Despite these benefits, multithreading introduces notable drawbacks that can complicate development and hurt performance. A major issue is increased programming complexity, stemming from the need to manage shared resources and avoid issues like race conditions, where concurrent access to shared data leads to unpredictable outcomes. Threads exhibit inherent nondeterminism due to unpredictable interleaving of executions, making program behavior difficult to reason about or verify, as even simple designs can produce varying results across runs. Debugging multithreaded code is particularly challenging, as concurrency bugs such as deadlocks may remain undetected for extended periods despite thorough testing, with one reported case persisting undetected for four years in a mature system. Overhead associated with thread management also poses performance challenges. Creating and switching threads incurs costs, with context switching for kernel-level threads typically requiring several microseconds; direct costs average around 3.8 μs as measured in 2007, but total costs can exceed 1,000 μs when factoring in indirect effects such as cache misses from disrupted memory access patterns. This overhead can result in performance degradation in scenarios with frequent switches, particularly in applications without sufficient parallelism. Furthermore, scalability is limited by Amdahl's law, which quantifies that the maximum speedup from multithreading is constrained by the fraction of the program that remains sequential, such that even with many processors, non-parallelizable portions bottleneck overall gains—for instance, if 5% of a task is sequential, the theoretical speedup is capped at 20x regardless of core count. Multithreading is particularly well-suited for I/O-intensive applications, where threads can overlap waiting periods to maintain responsiveness, but it offers limited value for purely CPU-bound workloads on single-core systems due to switching overheads outweighing parallelism benefits. In contrast to single-threaded programs, which provide deterministic execution but poor concurrency, multithreaded designs trade simplicity for these performance enhancements in suitable contexts.
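
The 20x figure follows directly from Amdahl's law. With parallelizable fraction p of the work and N processors, the speedup is bounded by

$$S(N) = \frac{1}{(1 - p) + p/N}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p},$$

so a 5% sequential portion (p = 0.95) caps the achievable speedup at 1/0.05 = 20 no matter how many cores are available.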

Thread Pools and Resource Management

A thread pool is a collection of pre-initialized worker threads that execute tasks concurrently, allowing applications to manage concurrency without repeatedly creating and destroying threads. This addresses the high cost of thread lifecycle management in multithreaded environments by maintaining a fixed or dynamic set of reusable threads. Tasks are submitted to the pool via a queue, and idle threads are assigned to process them as they become available.

The mechanics of thread pools involve a central task queue where incoming work units, such as runnables or callables, are enqueued for execution. Worker threads continuously poll this queue, dequeuing and performing tasks until the queue is empty or they are signaled to terminate. To prevent resource exhaustion, pools enforce bounds on their size—typically a minimum number of always-active threads and a maximum to cap concurrency—ensuring that the system does not spawn excessive threads during peak loads. For example, if the pool reaches its maximum size, additional tasks wait in the queue rather than triggering new thread creation. Dynamic pools may adjust thread counts based on demand, while fixed pools maintain a constant size for predictability.

Thread pools offer significant benefits by minimizing the overhead of thread creation, which involves kernel allocation and context switching, thereby improving performance in scenarios with frequent short-lived tasks. They also provide fine-grained control over resource usage: by limiting the maximum number of threads—often tuned to the number of CPU cores—pools avoid overwhelming the system with too much parallelism, reducing context-switching costs and memory footprint. This approach enhances scalability in server applications, where unbounded threading could lead to thrashing under high load.

Implementation of thread pools typically relies on executor frameworks that abstract thread management, task scheduling, and rejection policies for overflow conditions. Developers configure parameters such as core pool size, maximum pool size, and queue capacity, often monitoring metrics such as queue length and thread utilization to optimize sizing for specific workloads, for example aligning the pool size with the number of available CPU cores for compute-intensive tasks. In the Java ecosystem, the ExecutorService interface, backed by ThreadPoolExecutor, serves as a standard implementation for submitting tasks and shutting down pools gracefully. The Apache Tomcat server employs shared thread pools via its Executor component to process HTTP requests, configurable with minimum and maximum thread counts to balance throughput and responsiveness.
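
The sketch below outlines these mechanics in C with pthreads: a fixed set of workers drains a bounded circular task queue, submitters block when the queue is full instead of spawning extra threads, and shutdown wakes the workers once the queue is empty. The queue capacity, worker count, and names such as submit and print_task are illustrative; production frameworks like ThreadPoolExecutor layer rejection policies, dynamic sizing, and error handling on top of this pattern.

#include <pthread.h>
#include <stdio.h>

#define QUEUE_CAP   16   /* bounded task queue: submitters block when it is full    */
#define NUM_WORKERS  4   /* fixed pool size, often tuned to the number of CPU cores */

typedef struct {
    void (*fn)(void *);      /* a work unit: a function pointer plus its argument */
    void *arg;
} task_t;

static task_t queue[QUEUE_CAP];
static int q_head, q_tail, q_count;
static int shutting_down;
static pthread_mutex_t q_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

/* Each worker loops: wait for a task, dequeue it, run it outside the lock, repeat. */
static void *worker(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (q_count == 0 && !shutting_down)
            pthread_cond_wait(&not_empty, &q_lock);
        if (q_count == 0 && shutting_down) {      /* queue drained and pool closing */
            pthread_mutex_unlock(&q_lock);
            return NULL;
        }
        task_t t = queue[q_head];
        q_head = (q_head + 1) % QUEUE_CAP;
        q_count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&q_lock);

        t.fn(t.arg);                              /* execute the task outside the lock */
    }
}

/* Submitting blocks when the queue is full rather than creating new threads. */
static void submit(void (*fn)(void *), void *arg)
{
    pthread_mutex_lock(&q_lock);
    while (q_count == QUEUE_CAP)
        pthread_cond_wait(&not_full, &q_lock);
    queue[q_tail] = (task_t){ fn, arg };
    q_tail = (q_tail + 1) % QUEUE_CAP;
    q_count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&q_lock);
}

static void print_task(void *arg)
{
    printf("task %ld done\n", (long)arg);         /* small integer packed into the pointer */
}

int main(void)
{
    pthread_t workers[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&workers[i], NULL, worker, NULL);

    for (long i = 0; i < 32; i++)
        submit(print_task, (void *)i);

    /* Graceful shutdown: let queued tasks drain, then wake and join the workers. */
    pthread_mutex_lock(&q_lock);
    shutting_down = 1;
    pthread_cond_broadcast(&not_empty);
    pthread_mutex_unlock(&q_lock);

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL);
    return 0;
}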

Synchronization and Concurrency

Thread Synchronization Techniques

Thread synchronization techniques are essential mechanisms in multithreaded programming for coordinating access to shared resources among concurrent threads, ensuring data consistency and preventing race conditions. These techniques provide primitives for mutual exclusion, signaling, and coordination, allowing threads to operate safely in parallel without interfering with each other. Common approaches include locking mechanisms, signaling tools, and lock-free methods, each tailored to specific scenarios such as protecting critical sections or managing producer-consumer interactions.

Mutexes, or mutual-exclusion locks, are fundamental primitives that enforce exclusive access to a shared resource by allowing only one thread at a time to hold the lock. A thread acquires the mutex before entering a critical section and releases it upon exit, blocking other threads attempting to acquire the same mutex until it becomes available. In POSIX threads (pthreads), the pthread_mutex_lock() function implements this by atomically blocking the calling thread if the mutex is already locked, while pthread_mutex_unlock() releases it, potentially waking a waiting thread according to the system's scheduling policy. Mutexes are designed as low-level building blocks for higher-level synchronization constructs, with implementations often relying on hardware support for atomic operations to minimize overhead.

Semaphores extend mutex functionality by supporting counting, enabling synchronization in scenarios involving multiple resources or producer-consumer patterns. Introduced by Edsger Dijkstra in the 1960s, a semaphore maintains a non-negative integer counter that threads can increment with a signal operation (V) or decrement with a wait operation (P), blocking if the counter is zero. Binary semaphores, with a counter limited to 0 or 1, function similarly to mutexes for mutual exclusion, while counting semaphores allow up to N threads to access a resource pool. In practice, semaphores coordinate bounded buffers in producer-consumer systems, where producers signal availability and consumers wait for data without busy-waiting.

Condition variables complement mutexes by providing a way for threads to wait for specific conditions on shared data, avoiding inefficient polling. A condition variable is always used in conjunction with an associated mutex protecting the shared variables; a thread waits by releasing the mutex and blocking until signaled, then reacquires the mutex upon wakeup. In pthreads, pthread_cond_wait() atomically releases the mutex and suspends the thread until pthread_cond_signal() or pthread_cond_broadcast() is called, after which the waiting thread must recheck the condition predicate to handle spurious wakeups. This mechanism is crucial for efficient waiting in event-driven scenarios, such as queue management.
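
A minimal sketch of this wait/signal idiom in pthreads follows; the shared flag named ready, which serves as the condition predicate, and the waiter function are illustrative:

#include <pthread.h>
#include <stdio.h>

/* Shared state: 'ready' is the condition predicate, protected by 'lock'. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    /* The while loop rechecks the predicate to tolerate spurious wakeups. */
    while (!ready)
        pthread_cond_wait(&cond, &lock);   /* atomically unlocks, sleeps, then relocks */
    printf("condition observed, proceeding\n");
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, waiter, NULL);

    /* ... produce the event the waiter is interested in ... */

    pthread_mutex_lock(&lock);
    ready = 1;                             /* modify shared state under the mutex */
    pthread_cond_signal(&cond);            /* wake one waiting thread */
    pthread_mutex_unlock(&lock);

    pthread_join(t, NULL);
    return 0;
}
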
Atomic operations offer lock-free synchronization by performing read-modify-write actions on shared variables indivisibly, without explicit locks. The compare-and-swap (CAS) instruction is a widely used primitive: it atomically checks whether a memory location holds an expected value and, if so, replaces it with a new value; otherwise it fails without modifying the location. CAS enables non-blocking algorithms such as lock-free queues, where threads retry operations on failure, reducing contention in high-throughput environments (a brief sketch appears at the end of this section). Hardware support for CAS, available in most modern processors, ensures progress under reasonable contention, though algorithms built on it may require additional measures to mitigate the ABA problem.

Barriers synchronize threads at specific points, ensuring all participants reach a milestone before any proceeds, and are commonly used in parallel algorithms that run in phases. In pthreads, pthread_barrier_wait() blocks until the required number of threads arrive, then releases all of them simultaneously. Read-write locks optimize scenarios with many readers but exclusive writers, allowing concurrent reads while blocking writers and vice versa: pthread_rwlock_rdlock() permits multiple readers, whereas pthread_rwlock_wrlock() grants exclusive access for modifications, improving throughput in read-heavy workloads such as database caching.

Practical implementations include pthreads for C/C++ systems and Java's synchronized keyword, which implicitly acquires an intrinsic lock on an object or class for methods or blocks. In Java, marking a method as synchronized ensures that only one thread executes it at a time per object instance, simplifying thread-safe access to shared state.

Best practices for thread synchronization emphasize minimizing the duration of critical sections to reduce lock hold times and contention, as prolonged locking can degrade performance on multiprocessor systems. Developers should also prefer higher-level abstractions such as futures or promises for asynchronous coordination, which encapsulate synchronization internally and promote scalable designs over low-level locking.
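
As a sketch of the CAS-based, lock-free style described earlier in this section, and assuming a compiler with C11 <stdatomic.h> support, the retry loop below (the helper lock_free_add is a made-up name) updates a shared counter without taking any lock:

#include <stdatomic.h>
#include <stdio.h>

/* Lock-free increment: retry the compare-and-swap until no other thread has
   changed the value between our load and our exchange attempt. */
static _Atomic long counter = 0;

static void lock_free_add(long delta)
{
    long expected = atomic_load(&counter);
    /* On failure, atomic_compare_exchange_weak reloads 'expected' with the
       current value, so the loop simply retries with fresh data. */
    while (!atomic_compare_exchange_weak(&counter, &expected, expected + delta))
        ;   /* spin; a real implementation might back off under heavy contention */
}

int main(void)
{
    lock_free_add(5);
    lock_free_add(7);
    printf("counter = %ld\n", atomic_load(&counter));
    return 0;
}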

Common Data Race and Deadlock Issues

A data race occurs when two or more threads concurrently access the same memory location, with at least one access being a write and without appropriate synchronization, leading to nondeterministic behavior and potential data corruption. Such races can result in subtle bugs that are difficult to reproduce, as the outcome depends on thread scheduling and timing, often manifesting as incorrect computations or crashes in multithreaded applications.

Deadlocks arise in multithreaded systems when two or more threads are permanently blocked, each waiting for resources held by the others in a circular chain, preventing any progress. For a deadlock to occur, four necessary conditions (the Coffman conditions) must hold simultaneously: mutual exclusion, where resources cannot be shared and must be held exclusively; hold-and-wait, where a thread holds at least one resource while waiting for others; no preemption, meaning resources cannot be forcibly taken from a thread; and circular wait, where there is a cycle of threads each waiting for a resource held by the next.

Beyond deadlocks, livelocks represent another concurrency issue in which threads are actively running but unable to make meaningful progress, often because repeated attempts to resolve a conflict keep failing in a loop, such as two threads politely yielding to each other indefinitely. Starvation, conversely, happens when a thread is perpetually denied access to needed resources, typically because higher-priority threads or unfair scheduling mechanisms continually preempt it, leading to indefinite delays in execution.

Detection of data races and deadlocks often relies on dynamic tools such as ThreadSanitizer (TSan), which instruments code at compile time to track memory accesses and synchronization events, reporting races with few false positives by combining happens-before and lockset analyses at runtime. Static analysis techniques, including lockset-based tools and more recent hybrid approaches, examine code without executing it to identify potential races by modeling lock usage and access patterns, though they may produce false positives. For deadlocks, runtime monitoring can detect circular waits by building a graph of resource dependencies, while static methods can verify their absence through techniques such as model checking.

Avoidance strategies for data races emphasize proper synchronization, such as using atomic operations or mutexes to protect shared data, ensuring that all accesses are serialized. Deadlock prevention targets breaking one of the Coffman conditions, for instance by imposing a global lock ordering to eliminate circular waits or by using timeouts on lock requests to avoid indefinite holds. The Banker's algorithm provides a systematic avoidance method by simulating allocations in advance to ensure the system remains in a safe state, one in which some ordering of threads can complete without deadlock, and was originally developed for operating system resource management. For livelocks and starvation, techniques include randomized backoff to break symmetry between conflicting threads and priority-inheritance protocols to guarantee bounded waits for lower-priority threads.
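
A minimal sketch of the lock-ordering strategy in pthreads follows; the transfer and audit functions and the state they guard are illustrative. Both operations need the same pair of mutexes and always acquire them in the fixed order A then B; if one code path instead took B before A, two threads could each hold one lock and wait forever for the other.

#include <pthread.h>

/* Two locks that several threads may need at the same time. Giving them a
   fixed global order (A before B) removes the possibility of a circular wait. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void transfer(void)            /* illustrative operation needing both locks */
{
    pthread_mutex_lock(&lock_a);      /* always A first ...                 */
    pthread_mutex_lock(&lock_b);      /* ... then B, in every code path     */
    /* ... update state guarded by both locks ... */
    pthread_mutex_unlock(&lock_b);    /* release in the reverse order       */
    pthread_mutex_unlock(&lock_a);
}

static void audit(void)               /* a second operation obeys the same order */
{
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    /* ... read state guarded by both locks ... */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
}

static void *t1(void *arg) { (void)arg; transfer(); return NULL; }
static void *t2(void *arg) { (void)arg; audit();    return NULL; }

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}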

Historical Development

Early Concepts and Influences

The early motivations for the concepts that would evolve into threading stemmed from the limitations of batch processing systems in the 1950s and early 1960s, when computers executed one job at a time, resulting in substantial CPU idle time during input/output (I/O) operations and interrupts. To mitigate this, multiprogramming techniques were developed to keep the CPU busy by overlapping computation with I/O, allowing multiple programs to reside in memory and switch execution contexts upon interrupts, marking the transition from single-job execution to basic multitasking. This approach was essential for improving resource utilization in environments dominated by slow peripheral devices.

In the 1960s, time-sharing systems further advanced these ideas to support interactive multi-user access, influencing foundational threading concepts. The Multics project, launched in 1965 by MIT's Project MAC, Bell Labs, and General Electric, emphasized hierarchical and concurrent resource sharing to enable efficient timesharing, with early experimentation in lightweight concurrency such as Max Smith's 1970 prototype using multiple stacks within a single process for background tasks. Complementing this, Edsger W. Dijkstra's THE multiprogramming system, designed in 1965-1966 and detailed in 1968, organized the operating system into cooperating sequential processes arranged in layers, using semaphores for synchronization to manage concurrency without hardware support for parallelism.

Key contributions from prominent figures solidified these foundations. Dijkstra's 1965 work on cooperating sequential processes introduced semaphores as a primitive for coordinating concurrent activities, providing a mechanism to avoid race conditions in multiprogrammed environments and becoming a cornerstone of later threading synchronization. In parallel, Dennis Ritchie's contributions to Unix in the late 1960s and early 1970s, alongside Ken Thompson, established a process model in which the fork() system call creates child processes that begin as copies of the parent, promoting simple, relatively lightweight execution units that influenced threading by prioritizing simplicity in concurrency and inter-process communication.

Preceding formal threads, simpler units such as subroutines and coroutines served as conceptual precursors by enabling modular and cooperative control flow. Subroutines, a staple since the 1940s, allowed reusable code blocks with their own local state, laying the groundwork for independent execution paths within a program. Coroutines, termed by Melvin E. Conway in 1958 and elaborated in his 1963 paper, generalized subroutines to support symmetric, non-preemptive yielding and resumption between routines, facilitating structured concurrency in assembly-level programs without full operating-system involvement.

Evolution in Operating Systems

The evolution of threads in operating systems began in earnest during the 1990s, driven by the need for efficient concurrency on multiprocessor systems. The POSIX threads (pthreads) standard played a pivotal role in this development, with IEEE Std 1003.1c-1995 defining the pthreads API as an extension to the Portable Operating System Interface (POSIX), providing a standardized interface for creating and managing threads in C programs. This standardization facilitated portability across Unix variants, marking a shift toward kernel-supported threading models that improved performance over purely user-level implementations by allowing direct kernel scheduling and reducing context-switch overhead.

In Linux, early threading relied on the LinuxThreads library, whose performance and conformance limitations, including poor scalability on multiprocessors, prompted its replacement by the Native POSIX Thread Library (NPTL). Introduced with Linux kernel 2.6 in 2003, NPTL maps each user thread directly to a kernel thread in a one-to-one (1:1) model, giving better responsiveness and efficiency, particularly in handling blocking system calls without stalling other threads. NPTL's adoption addressed scalability issues, enabling thousands of threads with minimal overhead, and it became the default POSIX threads implementation in subsequent distributions.

Microsoft Windows introduced kernel-level threads with Windows NT in 1993, where the executive's scheduler dispatches threads as the basic unit of execution, supporting multiprocessor parallelism and preemptive multitasking from the outset. Later versions, beginning with Windows 7, added user-mode scheduling (UMS) through dedicated APIs, allowing applications to manage their own thread scheduling for specific workloads, though kernel threads remained the foundation for system-wide concurrency.

Other Unix derivatives followed with innovative threading architectures in the 1990s and 2000s. Sun Microsystems' Solaris implemented lightweight processes (LWPs) starting with Solaris 2 in the early 1990s, serving as kernel-visible entities that bridge user threads and kernel threads in a many-to-many (M:N) model, multiplexing many user threads onto fewer kernel resources and reducing kernel overhead for I/O-bound applications. In FreeBSD, the 2000s saw the adoption of a hybrid model via Kernel Scheduled Entities (KSE), proposed in 2000 and integrated as the default threading library by 2004, combining user-level threads with kernel entities to balance flexibility and performance on symmetric multiprocessors.

Key milestones of this era included the broader industry shift from user-level to kernel-level threads for superior scalability, since kernel threads avoid the pitfalls of user-level blocking, where one thread's blocking call could halt the entire process, and enable true parallelism on multicore hardware. POSIX standardization ensured interoperability, influencing implementations across operating systems and paving the way for scalable concurrency.

By 2025, trends emphasize asynchronous enhancements such as Linux's io_uring interface, introduced in 2019 and expanded with features like multishot operations (for example, multishot receive, added in kernel 6.0 in 2022), enabling efficient handling of multiple I/O events and reducing overhead in high-throughput applications. Additionally, Rust's memory-safe concurrency model has begun influencing kernel development, with Rust code integrated into the Linux kernel since version 6.1 to mitigate race conditions in drivers and subsystems, enhancing reliability without compromising performance.

Language and Library Support

Built-in Language Features

Programming languages provide built-in support for threading through core syntactic constructs and semantics that enable concurrent execution without relying solely on external libraries. In imperative languages such as Java, the Thread class, part of the core java.lang package since Java 1.0 in 1996, allows direct creation and management of threads by extending the class or implementing the Runnable interface. The synchronized keyword, also introduced in Java 1.0, provides intrinsic locking for methods and blocks to ensure thread safety by preventing concurrent access to shared resources. Starting with Java 21 in 2023, virtual threads were added as a built-in feature, enabling the creation of lightweight, JVM-managed threads via methods such as Thread.ofVirtual(); these support high-throughput concurrency for tasks that are often blocked on I/O, allowing millions of threads with low overhead while maintaining compatibility with existing code. Similarly, C# integrates concurrency through the Task Parallel Library (TPL), introduced in .NET 4.0 in 2010, which builds on the Task type to simplify parallel operations, though core thread management still uses the Thread class from System.Threading, available since .NET 1.0 in 2002.

Influenced by the communicating sequential processes (CSP) style of concurrency, Go incorporates goroutines as a concurrency primitive, launched via the go keyword since Go 1.0 in 2012, allowing thousands of concurrent executions with minimal overhead managed by the runtime. Channels, a built-in typed construct in Go since its 1.0 release, enable safe communication and synchronization between goroutines by passing values between them, promoting a "share memory by communicating" model over explicit locking of shared memory.

At the low level, C and C++ lack native syntactic keywords for threading, instead depending on libraries such as POSIX threads (pthreads), which require explicit inclusion and linking. In contrast, Rust provides threading via the std::thread module since its 1.0 stable release in 2015, but its true built-in innovation lies in the ownership system and the Send and Sync traits, which enforce compile-time guarantees against data races and unsafe shared access across threads.

Over time, languages have evolved toward asynchronous concurrency models for better handling of I/O-bound tasks. Python introduced the async and await keywords in version 3.5 in 2015 via PEP 492, enabling asynchronous programming through native coroutines that integrate with event loops, shifting from traditional preemptive threading to non-blocking execution. This progression reflects a broader trend in language design toward safer, more efficient concurrency primitives that reduce boilerplate while maintaining performance.

Standard Libraries and APIs

Standard libraries and APIs provide portable abstractions for thread creation, management, and synchronization across operating systems, enabling developers to write concurrent code without deep reliance on platform-specific details. These libraries standardize operations such as spawning threads, waiting for their completion, and handling attributes like stack size or scheduling policy, while promoting interoperability in multi-platform environments.

POSIX threads, or pthreads, form the foundational standard for threading on Unix-like systems, defined by the IEEE POSIX.1c specification as a set of C-language interfaces for creating and managing threads within a process. The pthread_create function initiates a new thread by specifying a start routine and its argument, returning a thread identifier for subsequent management, while pthread_join suspends the calling thread until the target thread terminates, facilitating synchronization and resource cleanup. These functions support thread attributes for customization, such as joinability and detachment, ensuring efficient resource handling in multi-threaded applications.

On Windows, the native API uses functions such as CreateThread to launch a new thread within the calling process's address space, specifying the start routine, its parameter, and optional security attributes, and returning a handle used for later control. Complementing this, WaitForSingleObject allows a thread to wait on that handle until the thread signals completion or a timeout occurs, enabling orderly termination similar to POSIX joins. In C++, the std::thread class from the standard library wraps these platform APIs, providing a higher-level interface for construction from callable objects and methods such as join() to block until completion, thus abstracting operating-system differences.

For cross-platform development, OpenMP offers compiler directives for parallelizing loops in C, C++, and Fortran: the #pragma omp parallel for construct distributes loop iterations across threads without explicit thread-creation code, targeting shared-memory systems for scalable performance (a brief sketch appears at the end of this section). Similarly, Boost.Thread in C++ delivers a portable threading model with classes such as boost::thread for creation and joining, along with synchronization primitives, bridging gaps between POSIX and Windows APIs before native C++ standardization.

Modern extensions approximate threading behaviors for specific domains: C++20 coroutines enable stackless suspension and resumption of functions via keywords like co_await, allowing asynchronous execution that mimics cooperative threading without full OS threads, which is useful for I/O-bound concurrency. In browser environments, JavaScript's Web Workers API spawns dedicated threads for script execution isolated from the main UI thread, using postMessage for communication, to handle computationally intensive tasks without blocking rendering.

Portability challenges arise when mapping POSIX-style APIs to non-native platforms; for example, pthreads can be provided on Windows through the pthreads-win32 library, which emulates much of the POSIX.1c interface using Win32 threads under the hood, though it may introduce minor semantic differences in scheduling or error handling. Developers must account for such mappings to ensure consistent behavior across Unix-like systems and Windows.
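
As an illustration of the OpenMP directive style mentioned above, and assuming an OpenMP-capable compiler (for example, gcc with -fopenmp), the loop below is split across a team of threads, with a reduction clause combining per-thread partial sums; the loop body itself is illustrative:

#include <omp.h>
#include <stdio.h>

/* The parallel-for directive divides the iterations among the threads in the
   team; reduction(+:sum) gives each thread a private partial sum that is
   combined into the shared variable when the loop finishes. */
int main(void)
{
    const int n = 1000000;
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1.0);

    printf("threads available: %d, sum = %f\n", omp_get_max_threads(), sum);
    return 0;
}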

References
