Kernel preemption
In computer operating system design, kernel preemption is a property possessed by some kernels, in which the CPU can be interrupted in the middle of executing kernel code and assigned other tasks (from which it later returns to finish its kernel tasks).
Details
Specifically, the scheduler is permitted to forcibly perform a context switch (on behalf of a runnable and higher-priority process) on a driver or other part of the kernel during its execution, rather than co-operatively waiting for the driver or kernel function (such as a system call) to complete its execution and return control of the processor to the scheduler when done.[1][2][3][4] It is used mainly in monolithic and hybrid kernels, where all or most device drivers are run in kernel space. Linux is an example of a monolithic-kernel operating system with kernel preemption.
The main benefit of kernel preemption is that it addresses two issues that would otherwise be problematic for monolithic and hybrid kernels, in which the kernel consists of one large binary.[5]
References
[edit]- ^ a b "Preemption under Linux". kernelnewbies.org. 2009-08-22. Retrieved 2016-06-10.
- ^ a b Jonathan Corbet (2003-02-24). "Driver porting: the preemptible kernel". LWN.net. Retrieved 2016-06-10.
- ^ "FreeBSD Architecture Handbook, Chapter 8. SMPng Design Document, Section 8.3. General Architecture and Design". freebsd.org. Retrieved 2016-06-10.
- ^ Robert Love (2002-05-01). "Lowering Latency in Linux: Introducing a Preemptible Kernel". Linux Journal. Retrieved 2016-06-10.
- ^ Robert Love (2010). Linux Kernel Development (3 ed.). Pearson Education. ISBN 978-0672329463.
Kernel preemption
Overview
Definition
Kernel preemption is the capability of an operating system kernel to interrupt its own execution in kernel mode, allowing a context switch to a higher-priority task without requiring voluntary yielding from the current kernel code.[9] This contrasts with user-space preemption, where the kernel routinely interrupts user-mode processes via timer interrupts or I/O events to enforce multitasking, but traditionally refrained from doing so in kernel mode to maintain consistency and avoid race conditions.[4] Key components of kernel preemption include the scheduler, which evaluates task priorities and decides on switches; kernel-level context switching, which saves and restores the state of kernel-mode execution (such as registers and stack pointers); and triggers like hardware interrupts or periodic timers that prompt the scheduler to assess preemption opportunities.[4] These elements enable the kernel to remain responsive even during prolonged operations, such as system calls or driver handling.[10]

In the basic workflow, a running kernel thread (executing code in kernel mode) may be paused mid-execution when an interrupt occurs, such as a clock tick signaling time slice expiration. The interrupt handler processes the event, and upon return, the scheduler invokes a context switch if a higher-priority runnable task exists, suspending the current thread and resuming the new one by loading its saved state.[4] This process ensures fair resource allocation among kernel activities without indefinite blocking.[11]

Terminology in preemptive contexts distinguishes kernel threads, which are lightweight execution units confined to kernel mode without associated user-space address spaces, from full processes that encompass both user-mode and kernel-mode components and maintain separate memory contexts.[4] Kernel threads are particularly relevant in preemption scenarios, as they represent the primary entities subject to involuntary interruption during kernel operations.
Importance
Kernel preemption plays a vital role in enhancing the responsiveness of operating systems by preventing the kernel from monopolizing the CPU during extended execution periods, thereby allowing timely switching to user-level tasks and handling of I/O events.[12] In non-preemptive kernels, such as those in early Unix systems, a single kernel thread could block user processes for hundreds of milliseconds, resulting in noticeable delays for interactive applications like desktop interfaces or multimedia playback.[12] By enabling interruptions at designated points, preemption ensures that high-priority user tasks resume execution promptly, reducing average and worst-case latencies to sub-millisecond levels (often under 100 µs in real-time configurations as of 2025).[13][14]

This mechanism is particularly crucial for real-time systems, where predictable latencies are essential for applications in embedded devices, industrial control, and multimedia processing.[14] Kernel preemption minimizes scheduling delays, aiming to bound maximum response times from event stimuli to task reactions, which supports meeting deadlines in time-sensitive environments.[10] For instance, in real-time Linux configurations like PREEMPT_RT (integrated into the mainline kernel since Linux 6.12 in 2024), interrupt handlers are managed as preemptible threaded interrupts, further lowering latencies and enabling the use of commercial off-the-shelf hardware for cost-effective real-time deployments.[14][15]

On a system-wide level, kernel preemption promotes fairness in multitasking scenarios by averting priority inversions and ensuring equitable CPU allocation among competing processes.[10] It also bolsters support for symmetric multiprocessing (SMP) environments, where preemption facilitates efficient load balancing across multiple processors by allowing kernel threads to yield control without prolonged lock holdings, leveraging existing SMP locking primitives to maintain data integrity while improving overall throughput.[12] This results in higher system utilization and reduced bottlenecks in multi-core setups, benefiting both general-purpose and specialized computing workloads.[12]
History
Early non-preemptive kernels
In early operating systems, kernels operated in a non-preemptive manner, meaning that once a process entered kernel mode, it executed to completion without interruption from higher-priority tasks or timers, relying instead on voluntary yielding mechanisms such as sleep or wakeup calls when awaiting resources.[16] This design ensured atomic execution of kernel code, where critical sections were protected by disabling interrupts or using simple locks to maintain data consistency on uniprocessor hardware.[16] Such voluntary cooperation simplified synchronization, as processes would explicitly release the CPU only upon blocking for I/O or other events, avoiding the need for complex preemption logic.[16]

Prominent examples include the original Unix kernels developed in the 1970s at Bell Labs, such as Version 6, where kernel operations like file system manipulations or process control ran atomically without preemption in kernel mode, though user-mode processes could be preempted via clock interrupts or signals upon returning from system calls.[16] Similarly, MS-DOS, introduced in 1981 for IBM PC compatibles, featured a single-tasking kernel (comprising IBMDOS.COM and IBMBIO.COM) that executed system calls atomically, with no support for concurrent kernel-mode execution, as the entire system effectively ran one user program at a time under kernel supervision.[17]

The non-preemptive approach stemmed from deliberate design choices prioritizing simplicity and reliability amid the hardware constraints of the 1970s and 1980s, including limited processor speeds, the absence of robust memory management units (MMUs) on early microprocessors such as the Intel 8086, and the challenges of implementing safe context switching without dedicated hardware support.[16] By avoiding forced interruptions, developers minimized the risk of race conditions in shared data structures, such as inodes or disk buffers, which could arise from untimely preemptions in resource-scarce environments; this was particularly evident in Unix's use of interrupt disabling for critical sections and MS-DOS's single-threaded model tailored to basic PC architectures.[17][16]

However, this design introduced significant limitations, including prolonged latencies during extended kernel operations like disk I/O or memory allocation, which could block the entire system and render it unresponsive to user input or other events under load.[16] In Unix, for instance, a lengthy system call might delay interrupt processing for seconds, exacerbating responsiveness issues on time-shared systems, while MS-DOS's lack of preemption meant that a poorly behaving application could monopolize the CPU indefinitely, leading to freezes without external intervention.[17] These drawbacks highlighted the trade-offs in early kernel architectures, where simplicity came at the cost of interactive performance in multi-user or resource-intensive scenarios.[16]
Introduction of preemption in Unix-like systems
Kernel preemption in Unix-like systems emerged as a response to the limitations of earlier non-preemptive kernel designs, which could lead to prolonged latencies during kernel execution. One of the earliest adoptions occurred in Solaris 2.0, released in 1992 by Sun Microsystems, where the kernel was made fully preemptible to support real-time threads and improve overall system responsiveness.[18] This marked a significant shift for Unix derivatives, allowing higher-priority processes to interrupt kernel-mode execution, thereby enhancing interactivity in server and workstation environments. In contrast, traditional BSD variants, such as those derived from 4.4BSD in the mid-1990s, initially retained non-preemptive kernels but began incorporating voluntary preemption mechanisms to balance performance and latency.[19]

The push for kernel preemption intensified in the late 1990s and early 2000s, driven by increasing demands for desktop usability and server efficiency amid the rise of multimedia applications and multi-user workloads. Early Linux kernels (versions 2.4 and prior) offered only partial or no kernel preemption, relying on voluntary yields at specific points, which often resulted in suboptimal responsiveness for interactive tasks.[20] This changed during the Linux 2.5 development series (2001–2003), when experimental full preemption was introduced to allow kernel code to be interrupted more freely, aiming to reduce scheduling latencies and support real-time extensions.[20]

A key milestone came with the stable release of Linux kernel 2.6.8 in August 2004, which integrated these preemption enhancements, making them available for production use and solidifying Linux's transition to a more responsive kernel model.[21] Meanwhile, other Unix-like systems varied in their approaches: FreeBSD adopted voluntary kernel preemption with the SMPng project in version 5.0 (2003), enabling preemption primarily at sleep points and for higher-priority kernel threads to improve multiprocessor scalability without full intrusiveness.[22] Solaris, having pioneered earlier adoption, continued to emphasize preemptible kernels for real-time capabilities, influencing subsequent derivatives like illumos. These developments reflected a broader evolution toward preemptible designs to meet growing expectations for low-latency performance in diverse computing scenarios.[23]
Types
Voluntary preemption
Voluntary kernel preemption refers to a mechanism where the kernel code explicitly yields control to the scheduler at designated safe points, allowing higher-priority tasks to run without forcible interruption. This typically involves inserting calls to functions such as cond_resched() or leveraging markers like might_sleep() to check whether a reschedule is needed, ensuring the kernel only preempts itself when it is in a state ready for context switching. Introduced in the Linux kernel around 2004 by developers Ingo Molnar and Arjan van de Ven as part of efforts to improve latency in version 2.6, voluntary preemption adds explicit scheduling points to kernel paths without enabling preemption everywhere.[24][25]
This approach is particularly useful in early preemptive kernel designs, such as the initial stages of Linux kernel preemption before the adoption of full preemption models, where minimizing implementation complexity was prioritized for desktop and interactive workloads. For instance, voluntary points are commonly placed at the end of system calls, after interrupt handlers, and within long-running loops or operations like user-space memory copies (e.g., in copy_from_user()), where might_sleep() serves as a natural preemption trigger. These placements allow the kernel to respond to urgent tasks, such as multimedia processing, without disrupting ongoing critical operations.[26][25][27]
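The placement of such points can be pictured with a minimal sketch. The loop, the item structure, and the helper below are hypothetical, invented purely for illustration; only cond_resched() is an actual Linux kernel API.

```c
#include <linux/types.h>
#include <linux/sched.h>    /* cond_resched() */

struct item { int payload; };                  /* hypothetical data type */

static void handle_one_item(struct item *it)   /* hypothetical per-item work */
{
        it->payload++;
}

/* A long-running kernel loop with an explicit voluntary preemption point on
 * each iteration, so that a higher-priority task can run without waiting for
 * the entire batch to finish. */
static void process_many_items(struct item *items, size_t n)
{
        size_t i;

        for (i = 0; i < n; i++) {
                handle_one_item(&items[i]);
                cond_resched();   /* yield here if a reschedule is pending */
        }
}
```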
In terms of design advantages, voluntary preemption simplifies locking mechanisms by confining interruptions to safe points outside critical sections, ensuring that spinlocks and other atomic operations complete without unexpected concurrency, which reduces the risk of deadlocks or reentrancy issues compared to more aggressive preemption strategies. It also balances system responsiveness with performance by avoiding the overhead of constant preemption checks throughout the kernel, thereby preserving throughput in compute-intensive scenarios while lowering latencies for interactive use—often reducing scheduling delays significantly without the stability concerns of full preemption.[28][26][10]
Involuntary preemption
Involuntary kernel preemption refers to the forced interruption of a running kernel task by a higher-priority task or event, enabling an immediate context switch to enhance system responsiveness and reduce latency.[12] This mechanism allows the kernel to suspend execution of the current task without its explicit consent, contrasting with voluntary preemption, where the kernel yields control at designated points.[29]

The primary triggers for involuntary preemption include periodic hardware timer interrupts, which signal the expiration of a time quantum, and higher-priority events such as the awakening of a critical task during a priority inversion, where a low-priority task holding a resource blocks a high-priority one.[29] These interrupts prompt the kernel to evaluate whether a context switch is necessary, ensuring that urgent tasks are not unduly delayed by ongoing kernel operations.[12]

During the preemption process, the kernel examines a preemption flag (typically set by the scheduler when a higher-priority task becomes runnable) upon returning from an interrupt or at designated safe windows where preemption is permitted.[12] If the flag indicates a need for rescheduling, the current kernel thread is suspended, its state is saved, and the scheduler selects and resumes the higher-priority thread, facilitating a seamless transition.[30] This check occurs only in non-critical sections to avoid disrupting atomic operations, with preemption explicitly disabled in areas like lock-held regions to maintain consistency.[12]

For full preemptibility, kernel threads require robust support, including efficient stack management to handle potential re-entrancy during nested interrupts or switches, ensuring that interrupted kernel code can resume correctly without data corruption.[12] This involves allocating sufficient stack space and using mechanisms to track preemption counts, preventing recursive preemptions in sensitive paths.[29]

Involuntary kernel preemption differs fundamentally from user-space preemption, as it must account for the complexities of kernel-level execution, such as managing device drivers and ongoing interrupt handlers during switches to prevent system instability or race conditions.[29] While user preemption relies on simple time-slicing, kernel preemption demands careful synchronization of shared resources and interrupt priorities to ensure safe re-entrancy and minimal overhead.[12]
Mechanisms
Preemption points and interrupt handling
In kernel design, preemption points are specific locations in the codebase where the scheduler is explicitly invoked to check for and potentially perform a context switch, enabling higher-priority tasks to run without undue delay. These points are typically inserted after operations that could consume significant CPU time, such as tight loops, memory allocations, or blocking I/O completions, to balance responsiveness with performance efficiency. For instance, in the Linux kernel, the cond_resched() macro serves as a common voluntary preemption point, yielding the CPU if a higher-priority task is ready.[26] This approach minimizes the overhead of constant scheduler checks while preventing long-running kernel code from monopolizing the processor.
Hardware interrupts serve as primary triggers for involuntary preemption in the kernel, disrupting normal execution flow and providing opportunities for rescheduling upon handler completion. When an interrupt arrives, the kernel saves the current context, executes the handler in a high-priority mode, and then, during exit, evaluates whether preemption is needed based on task priorities or elapsed time. In Linux, the irq_exit_rcu() function decrements the preemption count and invokes the scheduler if the TIF_NEED_RESCHED flag is set, ensuring seamless transitions back to user space or between kernel tasks.[31] This mechanism ties interrupt handling directly to preemption, as handlers themselves run non-preemptibly to maintain atomicity but open the door for switches at boundaries.
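The decision taken at this boundary can be summarized with the following heavily simplified sketch. The function name is hypothetical; the real Linux implementation spreads this logic across architecture-specific entry code and scheduler helpers, and only preempt_count(), need_resched(), and schedule() below are actual kernel interfaces.

```c
#include <linux/preempt.h>   /* preempt_count() */
#include <linux/sched.h>     /* need_resched(), schedule() */

/* Hypothetical, heavily simplified sketch of the check made once an
 * interrupt handler has finished and the kernel is about to resume the
 * interrupted kernel code: preempt only if no spinlock/preempt_disable()
 * section is active and the scheduler has flagged a pending reschedule. */
static void irq_return_to_kernel(void)
{
        if (!preempt_count() && need_resched())
                schedule();   /* switch to the higher-priority runnable task */
}
```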
Safe preemption requires protecting critical atomic sections—brief code paths that must execute without interruption—from both preemption and concurrent interrupts to avoid race conditions or data corruption. Kernels use primitives like spinlocks to enforce this, where acquiring a lock (e.g., via spin_lock()) increments a per-task preemption counter, disabling switches until release. This ensures that shared data structures remain consistent even in multiprocessor environments, where an interrupt on another CPU could otherwise interfere.[8] Similarly, explicit calls to preempt_disable() provide nestable protection for short sections, complementing interrupt disable functions like local_irq_disable() for full safety.[8]
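A sketch of this pattern follows; the lock, the counter, and the two functions are hypothetical examples, while the spinlock primitives themselves are standard Linux kernel APIs.

```c
#include <linux/spinlock.h>

/* Hypothetical shared counter protected by a spinlock. */
static DEFINE_SPINLOCK(stats_lock);
static unsigned long packets_seen;

/* Taking the spinlock implicitly disables preemption on this CPU, so the
 * update cannot be interleaved with another task's update of the counter. */
static void count_packet(void)
{
        spin_lock(&stats_lock);
        packets_seen++;
        spin_unlock(&stats_lock);
}

/* If the same data is also touched from an interrupt handler, the
 * interrupt-disabling variant must be used to rule out deadlock against
 * the handler on the same CPU. */
static void count_packet_irqsafe(void)
{
        unsigned long flags;

        spin_lock_irqsave(&stats_lock, flags);
        packets_seen++;
        spin_unlock_irqrestore(&stats_lock, flags);
}
```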
High-resolution timers integrate deeply with kernel preemption by generating precise periodic interrupts that enforce time quanta, the maximum CPU allocation per task before rescheduling. These timers, often based on hardware clocks like the APIC timer, fire at sub-millisecond intervals in preemptible kernels, prompting the scheduler to check and potentially preempt the current task if its quantum expires. In Linux, the hrtimer subsystem delivers nanosecond-resolution events, supporting fine-grained enforcement without relying on coarser jiffies, which improves latency in interactive and real-time workloads.[32] This timer-driven approach ensures fairness and responsiveness, as seen in periodic ticks that evaluate task priorities during interrupt handling.[33]
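A minimal sketch of a periodic high-resolution timer is shown below; the 250 µs period, the timer variable, and the callback body are illustrative assumptions, while hrtimer_init(), hrtimer_start(), and hrtimer_forward_now() are standard hrtimer calls.

```c
#include <linux/hrtimer.h>
#include <linux/ktime.h>

static struct hrtimer tick_timer;                /* hypothetical timer */
static const u64 tick_period_ns = 250 * 1000;    /* assumed 250 µs period */

/* The callback runs in hard-interrupt context; re-arming the timer here
 * produces a precise periodic tick with nanosecond-resolution spacing. */
static enum hrtimer_restart tick_fn(struct hrtimer *t)
{
        /* quantum/accounting checks would be performed here */
        hrtimer_forward_now(t, ns_to_ktime(tick_period_ns));
        return HRTIMER_RESTART;
}

static void start_tick(void)
{
        hrtimer_init(&tick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
        tick_timer.function = tick_fn;
        hrtimer_start(&tick_timer, ns_to_ktime(tick_period_ns), HRTIMER_MODE_REL);
}
```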
Enabling and disabling preemption
In kernel design, enabling and disabling preemption serves as a runtime mechanism to protect critical sections of code from interruption by higher-priority tasks, ensuring atomicity and preventing issues like data corruption or race conditions.[8] In the Linux kernel, this is primarily achieved through functions like preempt_disable() and preempt_enable(), which manipulate a per-task preemption counter to toggle the preemptible state.[8] When preempt_disable() is called, it increments the counter, signaling the scheduler to defer any task switches until the counter returns to zero, thereby suspending preemption for the duration of the protected operation.[34] Conversely, preempt_enable() decrements the counter, and preemption is only re-enabled once it reaches zero, allowing the scheduler to potentially reschedule the current task if a higher-priority one is pending.[8]
These functions are commonly used in contexts where kernel code must execute without interruption, such as when accessing per-CPU data structures or during brief, non-blocking operations that could otherwise lead to reentrancy problems.[8] For instance, while holding spinlocks, preemption is implicitly disabled to maintain consistency across multiple CPUs, as spinlocks are designed for short-held, busy-waiting synchronization primitives that assume uninterrupted execution. Similarly, in interrupt handlers, preemption is automatically disabled to avoid nested scheduling decisions that could complicate handler logic and increase stack usage, ensuring that the handler completes before any task switch occurs.[8] This usage prevents scenarios where a preempted task might access inconsistent shared state, such as CPU-local variables obtained via functions like smp_processor_id().[8]
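The per-CPU case can be illustrated with a short sketch; the counter and function are hypothetical, while DEFINE_PER_CPU, __this_cpu_inc(), and the preempt_disable()/preempt_enable() pair are real kernel primitives.

```c
#include <linux/percpu.h>
#include <linux/preempt.h>

/* Hypothetical per-CPU event counter. */
static DEFINE_PER_CPU(unsigned long, local_events);

/* Disabling preemption pins the task to its current CPU for the duration of
 * the access, so "the current CPU" cannot change between the increment's
 * read and write, and no other task can run on this CPU in between. */
static void note_event(void)
{
        preempt_disable();
        __this_cpu_inc(local_events);
        preempt_enable();
}
```

In practice the this_cpu_inc() helper performs the same update with preemption safety handled internally; the explicit pair above simply makes the protection visible.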
To support nested critical sections, the preemption mechanism employs a counting system that tracks the depth of disable-enable pairs, preventing premature re-enabling.[34] The counter, accessible via preempt_count(), is structured with the least-significant byte dedicated to preemption nesting levels, while higher bits track interrupt states; a non-zero value in any relevant field blocks preemption.[34] This allows multiple invocations of preempt_disable()—for example, within recursive function calls—without immediately re-enabling preemption, as each disable increments the count independently, and an equal number of preempt_enable() calls is required to restore the preemptible state.[8] Such nesting ensures balanced operation even in complex code paths, like those involving layered locking or conditional protections.[34]
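The counting behavior can be seen in the following illustrative nesting; the two functions are hypothetical, and the comments trace the value of the preemption count.

```c
#include <linux/preempt.h>

static void inner_section(void)
{
        preempt_disable();      /* count: 1 -> 2 */
        /* ... short atomic work on shared state ... */
        preempt_enable();       /* count: 2 -> 1; still non-preemptible */
}

static void outer_section(void)
{
        preempt_disable();      /* count: 0 -> 1 */
        inner_section();        /* safe: the disable/enable pairs nest */
        preempt_enable();       /* count: 1 -> 0; preemption allowed again */
}
```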
While effective for correctness, temporarily disabling preemption can elevate worst-case latency by delaying responses to time-sensitive events, as the scheduler cannot intervene until the critical section completes.[8] In real-time systems, this trade-off necessitates minimizing the scope and duration of disabled sections to bound maximum response times, often to microseconds in optimized kernels.[35] For example, guidelines recommend using these mechanisms only for short, atomic operations, as prolonged disabling exacerbates latency spikes in preemptible environments.[8]
