Kexec
kexec (kernel execute), analogous to the Unix/Linux kernel call exec, is a mechanism of the Linux kernel that allows booting of a new kernel from the currently running one.
Details
Essentially, kexec skips the bootloader stage and hardware initialization phase performed by the system firmware (BIOS or UEFI), and directly loads the new kernel into main memory and starts executing it immediately. This avoids the long times associated with a full reboot, and can help systems to meet high-availability requirements by minimizing downtime.[1][2]
While feasible, implementing a mechanism such as kexec raises two major challenges:
- Memory of the currently running kernel is overwritten by the new kernel, while the old one is still executing.
- The new kernel will usually expect all hardware devices to be in a well defined state, in which they are after a system reboot because the system firmware resets them to a "sane" state. Bypassing a real reboot may leave devices in an unknown state, and the new kernel will have to recover from that.
Support for allowing only signed kernels to be booted through kexec was merged into version 3.17 of the Linux kernel mainline, which was released on October 5, 2014.[3] This prevents a root user from loading arbitrary code via kexec and executing it, complementing UEFI secure boot and the in-kernel security mechanisms that ensure only signed Linux kernel modules can be inserted into the running kernel.[4][5][6]
Kexec is used by LinuxBoot to boot the main kernel from the Linux kernel located in the firmware.
See also
- kdump (Linux) – Linux kernel's crash dump mechanism, which internally uses kexec
- kGraft – Linux kernel live patching technology developed by SUSE
- kpatch – Linux kernel live patching technology developed by Red Hat
- Ksplice – Linux kernel live patching technology developed by Ksplice, Inc. and later bought by Oracle
References
- [1] Hariprasad Nellitheertha (May 4, 2004). "Reboot Linux faster using kexec". IBM. Archived from the original on January 21, 2013. Retrieved December 5, 2013.
- [2] David Pendell (August 16, 2008). "Reboot like a racecar with kexec". linux.com. Archived from the original on February 14, 2009. Retrieved December 5, 2013.
- [3] "Linux kernel 3.17, Section 1.10. Signed kexec kernels". kernelnewbies.org. October 5, 2014. Retrieved November 3, 2014.
- [4] Jake Edge (June 25, 2014). "Reworking kexec for signatures". LWN.net. Retrieved August 9, 2014.
- [5] Matthew Garrett (December 3, 2013). "Subverting security with kexec". dreamwidth.org. Retrieved December 5, 2013.
- [6] Kees Cook (December 10, 2013). "Live patching the kernel". outflux.net. Retrieved December 12, 2013.
Kexec
The kexec_load(2) interface allows userspace tools like the kexec utility to load kernel images (such as ELF or bzImage formats) into reserved memory regions, with options for normal reboots or crash scenarios.[1][4]
Key features include support for preserving system state across transitions and compatibility with various architectures including x86, ARM, and PowerPC; the mechanism is enabled through kernel configuration options like CONFIG_KEXEC=y.[1] In modern kernels, enhancements such as Kexec HandOver (KHO), introduced to serialize and transfer driver states, memory regions, and arbitrary properties to the new kernel, further improve reliability for scenarios like live migrations or complex hardware handoffs.[5] This evolution underscores kexec's role in enabling resilient, high-availability Linux environments, particularly in enterprise and embedded systems where minimizing reboot times is critical.[6]
Overview
Definition and Purpose
kexec is a Linux kernel mechanism that enables the direct loading and execution of a new kernel from within a running kernel, primarily through the kexec_load() system call.[4] This system call allows userspace applications to prepare a secondary kernel image in memory, which can later be booted without invoking hardware reinitialization or firmware components such as BIOS or UEFI.[2] By bypassing these traditional boot stages, kexec avoids the lengthy initialization sequences typically required during system startup.[7]
The primary purposes of kexec include accelerating system transitions for maintenance and updates, streamlining kernel development and testing workflows through quicker iteration cycles, and enabling reliable kernel crash analysis via mechanisms like kdump.[8][3] It addresses the inefficiencies of conventional reboot processes in Linux environments, where firmware boot times can significantly delay operations.[7]
In conceptual terms, kexec parallels the traditional exec system call used in user space to replace a running process with a new executable, but extends this paradigm to the kernel level for seamless kernel handoff while retaining access to system memory.[2]
Basic Mechanism
kexec operates by allowing the currently running Linux kernel to directly load and execute a new kernel image in memory, bypassing traditional firmware and bootloader initialization sequences. This transition is initiated through a system call from user space, typically using tools like the kexec utility, which interfaces with the kernel to prepare the handover. The process ensures that the new kernel starts from a clean state while minimizing hardware reinitialization, thereby reducing boot time.[1][2]
The mechanism proceeds in a two-phase load and execution sequence. In the first phase, the new kernel image, along with any associated initramfs and command-line parameters, is loaded into a designated memory region using the kexec_load system call. This phase relocates the kernel segments to avoid conflicts with the running kernel's memory usage, reserving specific address ranges (e.g., via options like --mem-min and --mem-max) to prevent overwriting critical data structures or code. A key component here is the purgatory code segment, an ELF-relocatable object that acts as an intermediary for verification and cleanup tasks, such as computing SHA-256 hashes to ensure the integrity of the loaded kernel image before execution.[1][2]
In the second phase, triggered by the reboot() system call with the LINUX_REBOOT_CMD_KEXEC command (which is what kexec -e issues), the running kernel performs a minimal shutdown procedure, halting the other CPUs via interrupts if necessary, and transfers control to the purgatory segment. The purgatory code then executes any required post-shutdown actions, such as saving register states, before jumping directly to the entry point of the new kernel. This handover preserves the overall system memory layout where possible, with the reserved region spanning a small portion of physical RAM to house the purgatory, new kernel, and parameters without interfering with ongoing operations.[2][1]
Conceptually, the process can be visualized as a linear transition: the old kernel loads the purgatory and new kernel segments into reserved memory, shuts down minimally, and passes control to purgatory, which in turn invokes the new kernel's startup routine. This avoids the full power cycle and hardware probing of a conventional reboot, enabling a seamless kernel-to-kernel switch.[2][1]
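Between the two phases, a staged image sits in memory waiting for the execute step. The following is a minimal sketch of how a script might check for that state. It assumes the sysfs flag files /sys/kernel/kexec_loaded and /sys/kernel/kexec_crash_loaded, which the text above does not mention; the helper name image_staged is hypothetical.

```python
from pathlib import Path

# Assumed sysfs flag files; they read "1" when an image is staged, "0" otherwise.
KEXEC_LOADED = Path("/sys/kernel/kexec_loaded")
KEXEC_CRASH_LOADED = Path("/sys/kernel/kexec_crash_loaded")


def image_staged(flag_file: Path = KEXEC_LOADED) -> bool:
    """Return True if the flag file reads "1", i.e. the load phase has completed."""
    try:
        return flag_file.read_text().strip() == "1"
    except OSError:
        # File missing or unreadable: treat as "nothing staged".
        return False


if __name__ == "__main__":
    print("normal image staged:", image_staged())
    print("crash image staged: ", image_staged(KEXEC_CRASH_LOADED))
```

Taking the path as a parameter keeps the check usable against a test fixture as well as the real sysfs tree.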
History
Origins and Development
The development of kexec began in the early 2000s amid discussions on the Linux kernel mailing list and the fastboot mailing list, focusing on mechanisms for faster kernel switching to enhance development workflows.[9] Eric W. Biederman emerged as the key contributor, posting the first substantial patches for a minimal kexec implementation on October 30, 2002, targeting Linux kernel version 2.5.44 on x86 architecture.[10] These early efforts laid the groundwork for a system call enabling direct loading and execution of a new kernel from within a running one.[10]
The primary motivations for kexec stemmed from the need to drastically reduce reboot times, which could take minutes on complex hardware due to BIOS initialization, device probing, and bootloader overhead, a particular burden for kernel developers iterating through frequent tests.[11] Biederman's patches addressed this by allowing a "warm" reboot that skips firmware stages, potentially cutting reboot duration from over a minute to seconds.[11] An additional driver was the desire to facilitate reliable kernel crash dumping without full hardware resets, preserving system memory for analysis in production environments.[2]
Biederman continued refining the implementation through 2003-2004, incorporating feedback from the community and expanding support for hardware compatibility.[10] By mid-2005, kexec had matured sufficiently for mainline inclusion, with multiple architecture-specific fixes and cleanups merged into the Linux kernel.[12] It became a standard feature in kernel version 2.6.13, released on August 29, 2005, marking its official adoption as a core capability for fast kernel transitions.[12] This integration enabled broader use in both development and operational contexts, later extending to tools like kdump for crash handling.[2]
Adoption and Milestones
kexec was integrated into the mainline Linux kernel with version 2.6.13 in 2005, enabling the core functionality for loading and booting a new kernel directly from a running one without firmware reinitialization.[13][14] Subsequent enhancements included the introduction of file-based loading via the kexec_file_load system call in kernel 3.17 in 2014, which allowed loading kernels and initramfs directly from files and improved compatibility with secure environments.[15][14]
Major Linux distributions adopted kexec shortly after its mainline inclusion. Red Hat integrated kexec-tools into Red Hat Enterprise Linux 5, released in 2007, to support fast reboots and crash dumping.[16] SUSE has packaged kexec-tools since at least SUSE Linux Enterprise Server 10 in 2006, providing ongoing support for reboot acceleration and kdump. By the 2010s, Arch Linux documented kexec usage in its official wiki, facilitating community adoption for custom kernel switching.[17]
Key milestones in kexec's evolution include ARM architecture support added in kernel 3.17 in 2014, enabling file-based loading on ARM platforms for embedded and mobile systems.[15] For x86_64, improvements for Secure Boot compatibility were implemented in Linux kernel 3.17 in 2014, allowing signed kernel loading to comply with UEFI firmware restrictions.[18] Ongoing updates for EFI/UEFI environments continued into the 6.x series, with features like kexec handover merged in kernel 6.16 in 2025 to enhance live kernel transitions in modern boot setups.[19][3]
kexec also influenced related projects, such as kboot, a 2006 proof-of-concept bootloader that leverages kexec to simplify multi-kernel booting from traditional loaders like GRUB.[20]
Technical Implementation
Kernel-Level Components
The kexec subsystem is implemented primarily in the kernel source file kernel/kexec.c, which provides the core functionality for loading and executing a new kernel image from within the running kernel. This subsystem handles the allocation of memory for kernel segments, validation of loaded images, and coordination of device shutdown before jumping to the new kernel, ensuring a direct transition without firmware reinitialization. A mutex is employed to serialize operations, particularly during the loading of crash kernels, preventing concurrent modifications to shared resources.
Central to the subsystem is the kimage structure, which manages the segments of the kernel image being loaded. This structure includes fields such as start for the entry point, nr_segments to track the number of segments (limited to KEXEC_SEGMENT_MAX), and an array of kexec_segment entries that define memory ranges for kernel code, data, and other components. The kimage facilitates segment validation, memory allocation via kimage_alloc_page, and preparation for execution, including handling control code pages and swap pages to avoid conflicts during the transition. Additionally, the purgatory component, an architecture-specific code segment, performs cleanup tasks such as disabling interrupts and preparing hardware state before the new kernel takes over.
The primary interface for loading kernel images is the kexec_load system call, defined as SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments, struct kexec_segment *, segments, unsigned long, flags).[14] This call, restricted to callers holding the CAP_SYS_BOOT capability (in practice, root), allows loading up to KEXEC_SEGMENT_MAX segments into memory, with flags controlling aspects like architecture-specific behavior (e.g., KEXEC_ARCH_MASK) and crash kernel designation.[14] For enhanced security, particularly in environments with secure boot, the kexec_file_load system call was introduced in Linux kernel 3.17 as SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, unsigned long, cmdline_len, const char __user *, cmdline, unsigned long, flags). It accepts file descriptors for the kernel and optional initramfs, enabling the kernel to verify signatures and load images directly from files rather than user-provided memory buffers.
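To make the kexec_load arguments more concrete, the sketch below mirrors struct kexec_segment and a few flag constants as documented in the kexec_load(2) manual page, using Python's ctypes. It does not invoke the system call. The check_segments and arch_bits helpers are hypothetical; they only approximate the kind of sanity checks the kernel applies to a segment list.

```python
import ctypes

KEXEC_SEGMENT_MAX = 16            # upper bound on nr_segments
KEXEC_ON_CRASH = 0x00000001       # load into the crash-kernel reservation
KEXEC_ARCH_MASK = 0xFFFF0000      # upper 16 bits of flags select the architecture
KEXEC_ARCH_X86_64 = 62 << 16      # ELF machine number for x86-64, shifted into place


class KexecSegment(ctypes.Structure):
    """Mirror of struct kexec_segment as described in kexec_load(2)."""
    _fields_ = [
        ("buf", ctypes.c_void_p),    # source buffer in the caller's address space
        ("bufsz", ctypes.c_size_t),  # number of bytes to copy from buf
        ("mem", ctypes.c_void_p),    # destination physical address
        ("memsz", ctypes.c_size_t),  # size of the destination region
    ]


def arch_bits(flags):
    """Extract the architecture selector from a kexec_load flags word."""
    return flags & KEXEC_ARCH_MASK


def check_segments(segments, page_size=4096):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    if len(segments) > KEXEC_SEGMENT_MAX:
        problems.append("too many segments")
    for i, seg in enumerate(segments):
        if seg.bufsz > seg.memsz:
            problems.append(f"segment {i}: bufsz exceeds memsz")
        # c_void_p fields read back as None when the pointer is NULL.
        if (seg.mem or 0) % page_size or seg.memsz % page_size:
            problems.append(f"segment {i}: destination not page-aligned")
    return problems
```

The page size is a parameter because it varies by architecture; 4096 is only a common default.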
Architecture-specific handling is integral to kexec, addressing variations in boot protocols and hardware transitions. On x86, the subsystem manages real-mode transitions by loading the new kernel's real-mode setup code and using purgatory to switch from protected mode back to real mode, mimicking the bootloader process to initialize the new kernel without BIOS/UEFI involvement. For ARM architectures, kexec passes the device tree blob (DTB) to the new kernel, ensuring hardware description continuity; this involves copying the DTB into a reserved segment and updating the kernel's entry parameters to reference it during the handoff.[21] Relocation of the initrd or initramfs is handled by mapping segments to available physical memory, avoiding overlaps with the running kernel's address space, and adjusting pointers in the kimage structure accordingly.
Memory management for kexec, especially in crash scenarios, relies on the crashkernel boot parameter to reserve a dedicated region at boot time. Specified as crashkernel=<size>[@offset], such as crashkernel=256M, it allocates a contiguous block (e.g., 256 megabytes) from low physical memory, preventing its use by the main kernel and ensuring availability for the crash dump kernel loaded via kexec. This reservation is parsed early in the boot process and enforced through the buddy allocator, with the offset allowing placement below the 4 GiB boundary on systems with high memory.
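As an illustration of the crashkernel=<size>[@offset] syntax, here is a minimal parsing sketch. It handles only the simple form described above with K, M, and G suffixes; the function name parse_crashkernel is hypothetical, and richer variants of the parameter are deliberately rejected rather than guessed at.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
_PATTERN = re.compile(
    r"^crashkernel=(\d+)([KMG]?)(?:@(\d+)([KMG]?))?$", re.IGNORECASE
)


def parse_crashkernel(param):
    """Parse 'crashkernel=<size>[@offset]' into (size_bytes, offset_bytes or None)."""
    m = _PATTERN.match(param.strip())
    if m is None:
        raise ValueError(f"unsupported crashkernel syntax: {param!r}")
    size = int(m.group(1)) * _UNITS[m.group(2).upper()]
    offset = None
    if m.group(3) is not None:
        offset = int(m.group(3)) * _UNITS[m.group(4).upper()]
    return size, offset


if __name__ == "__main__":
    print(parse_crashkernel("crashkernel=256M"))
    print(parse_crashkernel("crashkernel=256M@16M"))
```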
User-Space Tools and Interfaces
The primary user-space tool for interacting with kexec is the kexec command, provided by the kexec-tools package, which contains binaries and utilities to load kernel images into memory and initiate direct execution without invoking the bootloader. This package includes the /sbin/kexec binary, enabling administrators to prepare and trigger kernel transitions from the running system.[22][23]
The kexec-tools package has been distributed in major Linux repositories since the mid-2000s, with initial inclusion in Debian Etch (released April 2007, version 1.101-kdump10-2), and subsequent availability in Fedora (starting with Fedora Core 6 in 2006) and Ubuntu (from version 7.04 in 2007, with ongoing updates).[24][25][26]
Key command-line options facilitate kernel loading and execution; the -l (or --load) flag loads a kernel image along with optional initrd and append parameters, as in the example kexec -l /boot/vmlinuz --initrd=/boot/initrd.img --append="root=/dev/sda1", which specifies the root device and other boot arguments. Following loading, the -e (or --exec) option executes the prepared kernel, bypassing hardware reinitialization for a faster transition. These options support formats like ELF, bzImage, and PE for compatibility across architectures.[1]
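The load-then-execute sequence described above can be sketched as two argument lists. The example only builds and prints the commands; it does not run them, since kexec -e replaces the running kernel immediately. The helper names are hypothetical, and the paths are the ones from the example in the text.

```python
import shlex


def kexec_load_argv(kernel, initrd=None, append=None):
    """Build the argument list for the load step (kexec -l)."""
    argv = ["kexec", "-l", kernel]
    if initrd:
        argv.append(f"--initrd={initrd}")
    if append:
        argv.append(f"--append={append}")
    return argv


def kexec_exec_argv():
    """Build the argument list for the execute step (kexec -e)."""
    return ["kexec", "-e"]


if __name__ == "__main__":
    load = kexec_load_argv("/boot/vmlinuz", "/boot/initrd.img", "root=/dev/sda1")
    # Print shell-quoted commands for review instead of executing them.
    print(shlex.join(load))
    print(shlex.join(kexec_exec_argv()))
```

Keeping the commands as lists rather than strings avoids shell quoting problems if they are later passed to subprocess.run.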
For system integration, kexec works with init systems like systemd via the kexec.target, a special target that activates systemd-kexec.service during shutdown to coordinate unmounting filesystems, disabling swap, and process termination before invoking the loaded kernel. This allows seamless use of systemctl kexec for reboots in systemd-based environments. Additionally, automated setups can leverage GRUB configurations by employing helper tools that parse GRUB menu entries to load equivalent kernels via kexec, enabling script-based or default boot alignments without manual intervention.[27][28]
Applications and Use Cases
Fast System Rebooting
kexec enables fast system rebooting by permitting the running Linux kernel to directly load a new kernel into memory and execute it, circumventing the firmware's Power-On Self-Test (POST), hardware initialization, and bootloader phases such as GRUB. This bypass eliminates time-intensive steps like BIOS or UEFI checks and device probing, often shrinking reboot durations from minutes to seconds on compatible systems.[29][30][18] Recent advancements, such as memory persistence over kexec using the Kexec HandOver (KHO) subsystem, allow preservation of non-movable memory pages across transitions, further reducing state loss in cloud and virtualized environments.[31]
To set up kexec for fast reboots, administrators install the kexec-tools package and configure it to preload the production kernel, typically via distro-specific files such as /etc/kexec.conf or through scripts and systemd services like kexec-load.service. These configurations specify the kernel image, initramfs, and command-line arguments (often mirroring the current boot parameters) to ensure a seamless transition upon invocation of the kexec command or systemctl kexec.[32][30][1]
In high-availability servers, kexec minimizes downtime during maintenance; in cloud environments like AWS EC2 instances, it accelerates restarts without full hardware resets; and on development machines, it supports swift kernel testing cycles by avoiding lengthy firmware loads.[30][33][34]
Benchmarks demonstrate substantial performance gains, with reboot times typically reduced by 50-90% depending on hardware complexity: for example, from 184 seconds to 30 seconds on an x86 server with multiple cores and large RAM, or a 77% downtime reduction in controlled evaluations.[30][35][36]
Kernel Crash Analysis (kdump)
Kdump is a kernel crash dumping mechanism that leverages kexec to boot a secondary "capture" kernel during a system panic, enabling the preservation and analysis of the crashed kernel's memory state. This approach allows for the creation of a memory dump file, known as vmcore, which captures the volatile contents of RAM that would otherwise be lost upon a full system reboot. The capture kernel is typically a minimal, stable kernel configured specifically for dump collection, often generated using tools like dracut to create an initramfs image. Utilities such as makedumpfile are employed to process the dump, compressing it and applying filters to exclude unnecessary data like free pages, zero pages, or user-space memory, thereby reducing the file size and focusing on kernel-relevant information.[3][37]
Configuration of kdump begins with reserving a portion of physical memory for the capture kernel during the primary kernel's boot process, specified via the crashkernel= boot parameter in the bootloader configuration (e.g., GRUB). This parameter reserves a fixed amount of memory, such as crashkernel=256M, or uses auto-detection like crashkernel=auto for systems with varying RAM sizes; an optional offset (e.g., crashkernel=256M@16M) can be set to avoid conflicts with low-memory regions. The kexec tools load the capture kernel and initramfs into this reserved area. Further customization occurs through the /etc/kdump.conf file, which defines the dump target, such as a local path (e.g., path /var/crash), a network file system (e.g., nfs server.example.com:/export/cores), or secure shell transfer (e.g., ssh [email protected] with an SSH key), along with options for the core collector like makedumpfile for filtering and compression.[3][37]
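The directive-per-line layout of /etc/kdump.conf lends itself to a very small parser. The sketch below assumes each non-comment line is a directive followed by whitespace and a value. The sample content is hypothetical: it reuses the path and nfs examples from the text and adds a core_collector line whose makedumpfile options are an illustrative assumption, not a recommended setting.

```python
def parse_kdump_conf(text):
    """Return {directive: value} for each non-comment line; later lines win."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        parts = line.split(None, 1)          # directive, then the rest of the line
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


# Hypothetical configuration, for illustration only.
SAMPLE = """\
# where to write the vmcore
path /var/crash
nfs server.example.com:/export/cores
core_collector makedumpfile -l -d 31
"""

if __name__ == "__main__":
    for directive, value in parse_kdump_conf(SAMPLE).items():
        print(f"{directive:15} {value}")
```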
Upon a kernel panic (triggered by events like panic(), a fatal die() kernel error, or a non-maskable interrupt (NMI)), the primary kernel invokes kexec to immediately transfer control to the pre-loaded capture kernel without performing a hardware reset or BIOS reinitialization. The capture kernel then mounts the root file system (or network target if specified) and executes scripts to analyze the memory image accessible via /proc/vmcore, which appears as a device file representing the entire physical memory of the crashed system. makedumpfile processes this vmcore by applying filters (e.g., level 31 to exclude cache-private pages, free pages, and user data) and compression (e.g., zlib or LZO algorithms) before saving the output to the configured target, after which the system can reboot normally. This process ensures a complete and consistent dump, even in cases of hardware faults or triple faults that would corrupt traditional netdump or diskdump methods.[3][37]
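The filter level mentioned above is a bitmask, which is why 31 excludes every filterable page type. The following sketch decodes a level into the page categories it drops. The bit assignments follow the makedumpfile manual page as best understood here and should be treated as an assumption; the function name is hypothetical.

```python
# Assumed bit meanings for the makedumpfile dump level (-d).
DUMP_LEVEL_BITS = {
    1: "zero pages",
    2: "cache pages without private data",
    4: "cache pages with private data",
    8: "user process data",
    16: "free pages",
}


def excluded_page_types(level):
    """List the page categories that a given dump level filters out."""
    if not 0 <= level <= 31:
        raise ValueError("dump level must be between 0 and 31")
    return [name for bit, name in DUMP_LEVEL_BITS.items() if level & bit]


if __name__ == "__main__":
    for level in (0, 17, 31):
        print(level, excluded_page_types(level))
```

Lower levels keep more data at the cost of a larger vmcore, so the choice is a trade-off between dump size and how much context remains for analysis.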
The primary advantages of kdump lie in its ability to capture the full, unaltered kernel memory state—including registers, stacks, and dynamic data structures—that dissipates in conventional crash dumping techniques reliant on firmware or external debuggers, facilitating detailed post-mortem analysis with tools like the crash utility. By avoiding the full reboot cycle until after the dump, kdump minimizes data loss and supports debugging of subtle issues like memory corruption or driver bugs. This feature has been integrated into the Linux kernel since version 2.6.13, with ongoing enhancements for architectures including x86_64, ARM, PowerPC, s390, RISC-V, and others.[3][14]
