from Wikipedia
eBPF
Original authors: Alexei Starovoitov, Daniel Borkmann[1][2]
Developers: Open source community, Meta, Google, Isovalent, Microsoft, Netflix[1]
Initial release: 2014[3]
Repository: Linux: git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/; Windows: github.com/Microsoft/ebpf-for-windows/
Written in: C
Operating system: Linux, Windows[4]
Type: Runtime system
License: Linux: GPL; Windows: MIT License
Website: ebpf.io

eBPF is a technology that can run programs in a privileged context such as the operating system kernel.[5] It is the successor to the Berkeley Packet Filter (BPF, with the "e" originally meaning "extended") filtering mechanism in Linux, and it is also used in non-networking parts of the Linux kernel.

It is used to safely and efficiently extend the capabilities of the kernel at runtime without requiring changes to kernel source code or loading kernel modules.[6] Safety is provided by an in-kernel verifier, which performs static code analysis and rejects programs that would crash, hang, or otherwise interfere negatively with the kernel.[7][8]

This validation model differs from sandboxed environments, where the execution environment is restricted and the runtime has no insight into the program.[9] Examples of programs that are automatically rejected are programs without strong exit guarantees (e.g., for/while loops without exit conditions) and programs dereferencing pointers without safety checks.[10]

Design


Loaded programs which pass the verifier are either interpreted or just-in-time (JIT) compiled in the kernel for native execution performance. The execution model is event-driven and, with few exceptions, run-to-completion:[2] programs are attached to hook points in the operating system kernel and run when a corresponding event is triggered. eBPF use cases include (but are not limited to) networking such as XDP, tracing, and security subsystems.[5] Because eBPF's efficiency and flexibility opened up new possibilities for solving production issues, Brendan Gregg famously dubbed eBPF "superpowers for Linux".[11] Linus Torvalds said, "BPF has actually been really useful, and the real power of it is how it allows people to do specialized code that isn't enabled until asked for".[12] Following its success in Linux, the eBPF runtime has been ported to other operating systems such as Windows.[4]

History


eBPF evolved from the classic Berkeley Packet Filter (cBPF, a retroactively-applied name). At the most basic level, it introduced the use of ten 64-bit registers (instead of two 32-bit long registers for cBPF), different jump semantics, a call instruction and corresponding register passing convention, new instructions, and a different encoding for these instructions.[13]

Most significant milestones in the evolution of eBPF:

April 2011: The first in-kernel Linux just-in-time (JIT) compiler for the classic Berkeley Packet Filter was merged.[14]
January 2012: The first non-networking use case of the classic Berkeley Packet Filter, seccomp-bpf,[15] appeared; it allows filtering of system calls using a configurable policy implemented through BPF instructions.
March 2014: David S. Miller, primary maintainer of the Linux networking stack, accepted the rework of the old in-kernel BPF interpreter. It was replaced by an eBPF interpreter, and the Linux kernel now internally translates classic BPF (cBPF) into eBPF instructions.[16] The change shipped in version 3.18 of the Linux kernel.[17]
March 2015: The ability to attach eBPF to kprobes, the first tracing use case, was merged.[19] In the same month, initial infrastructure work was accepted for attaching eBPF to the networking traffic control (tc) layer, allowing eBPF to be attached to the core ingress and later also egress paths of the network stack; this was later heavily used by projects such as Cilium.[20][21][22]
August 2015: The eBPF compiler backend was merged into the LLVM 3.7.0 release.[23]
September 2015: Brendan Gregg announced a collection of new eBPF-based tracing tools as the bcc project, providing a front-end for eBPF to make it easier to write programs.[24]
July 2016: eBPF gained the ability to be attached to a network driver's core receive path. This layer is known today as eXpress Data Path (XDP) and was added as a response to DPDK, to create a fast data path that works in combination with the Linux kernel rather than bypassing it.[25][26][27]
August 2016: Cilium was initially announced during LinuxCon as a project providing fast IPv6 container networking with eBPF and XDP. Today, Cilium has been adopted by major cloud providers' Kubernetes offerings and is one of the most widely used CNIs.[28][22][29]
November 2016: Netronome added offload of eBPF programs for the XDP and tc BPF layers to their NICs.[30]
May 2017: Meta's layer-4 load balancer, Katran, went live. Every packet towards facebook.com has since been processed by eBPF and XDP.[31]
September 2017: Bpftool was added to the Linux kernel as a user space utility to introspect the eBPF subsystem.[33]
November 2017: eBPF became its own kernel subsystem to ease the continuously growing kernel patch management. The first pull request by eBPF maintainers was submitted.[32]
January 2018: A new socket family called AF_XDP was published, allowing high-performance packet processing with zero-copy semantics at the XDP layer.[34] Today, DPDK has official AF_XDP poll-mode driver support.[35]
February 2018: The bpfilter prototype was published, allowing translation of a subset of iptables rulesets into eBPF via a newly developed user mode driver. The work caused controversy due to the ongoing nftables development effort and has not been merged into mainline.[36][37]
October 2018: The new bpftrace tool was announced by Brendan Gregg as "DTrace 2.0 for Linux".[38]
November 2018: eBPF introspection was added for kTLS to support in-kernel TLS policy enforcement.[39]
November 2018: BTF (BPF Type Format) was added to the Linux kernel as an efficient metadata format that is approximately 100x smaller than DWARF.[40]
December 2019: The first book on BPF, an 880-page volume written by Brendan Gregg, was released.[41]
March 2020: Google upstreamed BPF LSM support into the Linux kernel, enabling programmable Linux Security Modules (LSMs) through eBPF.[42]
September 2020: The eBPF compiler backend for the GNU Compiler Collection (GCC) was merged.[43]
July 2022: Microsoft released eBPF for Windows, which runs code in the NT kernel.[4]
October 2024: The eBPF instruction set architecture (ISA) was published as RFC 9669.

Architecture and concepts


eBPF maps


eBPF maps are efficient key/value stores that reside in kernel space and can be used to share data among multiple eBPF programs or to communicate between a user space application and eBPF code running in the kernel. eBPF programs can leverage eBPF maps to store and retrieve data in a wide set of data structures. Map implementations are provided by the core kernel. There are various types,[44] including hash maps, arrays, and ring buffers.

In practice, eBPF maps are typically used for scenarios such as a user space program writing configuration information to be retrieved by an eBPF program, an eBPF program storing state for later retrieval by another eBPF program (or a future run of the same program), or an eBPF program writing results or metrics into a map for retrieval by a user space program that will present results.[45]

eBPF virtual machine


The eBPF virtual machine runs within the kernel and takes in a program in the form of eBPF bytecode instructions, which are converted to native machine instructions that run on the CPU. Early implementations interpreted the bytecode, but interpretation has since largely been replaced by just-in-time (JIT) compilation for performance and security reasons.[45]

The eBPF virtual machine consists of eleven 64-bit registers with 32-bit subregisters, a program counter, and a 512-byte BPF stack. The general-purpose registers keep track of state while eBPF programs execute.[46]

Tail calls


Tail calls can call and execute another eBPF program and replace the execution context, similar to how the execve() system call operates for regular processes; in effect, one eBPF program calls another. Tail calls are implemented as a long jump that reuses the current stack frame, which makes them particularly useful in eBPF, where the stack is limited to 512 bytes. At runtime, functionality can be added or replaced atomically, altering the BPF program's execution behavior.[46] A popular use case for tail calls is to spread the complexity of eBPF programs over several programs. Another is to replace or extend logic by swapping the contents of the program array while it is in use, for example to update a program version without downtime or to enable and disable logic.[47]

BPF to BPF calls


It is generally considered good practice in software development to group common code into a function, encapsulating logic for reusability. Prior to Linux kernel 4.16 and LLVM 6.0, a typical eBPF C program had to explicitly direct the compiler to inline every function, resulting in a BPF object file with duplicated code. This restriction has been lifted, and mainstream eBPF compilers now support ordinary function calls in eBPF programs. This reduces the generated eBPF code size, making it friendlier to the CPU instruction cache.[45][46]

eBPF verifier


The verifier is a core component of eBPF; its main responsibility is to ensure that an eBPF program is safe to execute. It performs a static analysis of the eBPF bytecode, stepping through the instructions in order and evaluating all possible execution paths. Verification starts with a depth-first search through the program's paths; the verifier simulates the execution of each instruction using abstract interpretation,[48] tracking the state of the registers and the stack. If any instruction could lead to an unsafe state, verification fails. This process continues until all paths have been analyzed or a violation is found. Depending on the type of program, the verifier checks for violations of specific rules, including:

- the program always terminates within a reasonable amount of time (no infinite loops or unbounded recursion);
- the program does not read arbitrary memory, since the ability to read arbitrary memory could allow a program to leak sensitive information;
- network programs do not access memory outside of packet bounds, since adjacent memory could contain sensitive information;
- the program cannot deadlock: any held spinlock must be released, and only one lock may be held at a time to avoid deadlocks across multiple programs;
- the program does not read uninitialized memory.

This is not an exhaustive list of the checks the verifier performs, and there are exceptions to these rules. For example, tracing programs have access to helpers that let them read memory in a controlled way, but these program types require root privileges and thus do not pose a security risk.[47][45]

Over time the eBPF verifier has evolved to include newer features and optimizations, such as support for bounded loops, dead-code elimination, function-by-function verification, and callbacks.

eBPF CO-RE (Compile Once - Run Everywhere)


eBPF programs use memory and data structures from the kernel. Some structures can change between kernel versions, altering the memory layout; since the Linux kernel is continuously developed, there is no guarantee that internal data structures will remain the same across versions. CO-RE is a fundamental concept in modern eBPF development that allows eBPF programs to be portable across different kernel versions and configurations, addressing the challenge of kernel structure variations between Linux distributions and versions.

CO-RE builds on BTF (BPF Type Format), a metadata format that describes the types used in the kernel and in eBPF programs and provides detailed information about struct layouts, field offsets, and data types. BTF makes kernel types accessible at runtime, which is crucial for BPF program development and verification, and it is included in the kernel image of BTF-enabled kernels. The compiler (e.g., LLVM) emits special relocations that capture high-level descriptions of the information the eBPF program intends to access.

The libbpf library adapts eBPF programs to the data structure layout of the target kernel on which they run, even if that layout differs from the kernel on which the code was compiled. To do this, libbpf needs the BPF CO-RE relocation information generated by Clang as part of the compilation process.[45] The compiled eBPF program is stored in an ELF (Executable and Linkable Format) object file containing the BTF type information and Clang-generated relocations, which allows the eBPF loader (e.g., libbpf) to process and adjust the BPF program dynamically for the target kernel.[49]

Branding


The name eBPF is often used interchangeably with BPF,[2][50] for example by the Linux kernel community. eBPF and BPF are referred to as a technology name, much like LLVM.[2] eBPF evolved from the machine language of the filtering virtual machine in the Berkeley Packet Filter as an extended version, but as its use cases outgrew networking, "eBPF" is today preferentially interpreted as a pseudo-acronym.[2]

The bee is the official logo of eBPF. At the first eBPF Summit a vote was taken, and the bee mascot was named "eBee".[51][52] The logo was originally created by Vadim Shchekoldin.[52] Earlier unofficial eBPF mascots existed,[53] but did not see widespread adoption.

Governance


The eBPF Foundation was created in August 2021 with the goal of expanding the contributions being made to extend the capabilities of eBPF and of growing it beyond Linux.[1] Founding members include Meta, Google, Isovalent, Microsoft, and Netflix. Its purpose is to raise, budget, and spend funds in support of open source, open data, and open standards projects relating to eBPF technologies,[54] to further drive the growth and adoption of the eBPF ecosystem. Since its inception, Red Hat, Huawei, CrowdStrike, Tigera, DaoCloud, Datoms, and FutureWei have also joined.[55]

Adoption


eBPF has been adopted by a number of large-scale production users.

Security


Due to its ease of programmability, eBPF has been used as a tool for implementing microarchitectural timing side-channel attacks such as Spectre against vulnerable microprocessors.[100] While unprivileged eBPF implements mitigations against Spectre v1, v2, and v4 for x86-64,[101][102] unprivileged use has ultimately been disabled by the kernel community by default to protect users of unsupported architectures and limit the impact of future hardware vulnerabilities.[103] On x86-64, Spectre v1 is mitigated through a combination of branchless bounds enforcement (e.g., masking instructions) and the verification of speculative execution paths. Spectre v4 is mitigated exclusively through speculation barriers (i.e., lfence), and Spectre v2 is mitigated through retpoline when available[104] or speculation barriers. These mitigations prevent sensitive information owned by the kernel (e.g., kernel addresses) from being leaked by malicious eBPF programs, but are not designed to prevent innocuous eBPF programs from accidentally leaking sensitive information they own or process (e.g., cryptographic keys stored as numbers).[102]

from Grokipedia
eBPF, or extended Berkeley Packet Filter, is a powerful in-kernel virtual machine technology in the Linux operating system that enables the safe and efficient execution of user-defined programs directly within kernel space, allowing for runtime extensibility, observability, and customization without requiring changes to kernel source code or the loading of kernel modules.[1] Originally evolved from the classic Berkeley Packet Filter (BPF), which was developed in 1992 for efficient network packet filtering, eBPF expands these foundations into a general-purpose compute engine by introducing a richer instruction set, bounded loops, and direct access to kernel data structures.[2][3] The development of eBPF began in 2013 when it was initially proposed as "internal BPF" (iBPF) by Alexei Starovoitov, a kernel developer at PLUMgrid, to address limitations in classic BPF for broader kernel instrumentation tasks like tracing and networking.[3] It was renamed extended BPF and progressively integrated into the Linux kernel starting with version 3.15 in 2014, with key enhancements such as just-in-time (JIT) compilation for performance and a verifier for safety added in subsequent releases up to version 4.4 in 2015.[4][3] Today, eBPF is maintained as a core Linux kernel feature, supported by the eBPF Foundation, a Linux Foundation project established in 2021 to guide its upstream development, promote adoption, and ensure security and reliability.[5]

At its core, eBPF operates through a combination of components that ensure both power and safety: programs written in a restricted subset of C are compiled to eBPF bytecode using tools like LLVM, then loaded into the kernel where a static verifier analyzes them for safety guarantees, such as termination, bounded resource usage, and no invalid memory accesses.[1][2] Supporting structures include maps (efficient key-value stores like hash maps or arrays for persisting data across program executions) and helpers, a stable set of kernel-provided functions for tasks like packet manipulation or random number generation.[2] Programs attach to kernel hooks, such as tracepoints, network interfaces (via XDP for early packet processing), or security modules, and are executed in a sandboxed environment with protections like constant blinding against speculative attacks and privilege requirements (root or the CAP_BPF capability).[1][2]

eBPF's versatility has made it indispensable for modern cloud-native and data center applications, particularly in networking for high-speed packet processing and load balancing (e.g., in tools like Cilium), observability for low-overhead tracing and profiling (used by companies like Google and Netflix), and security for runtime enforcement and threat detection without kernel modifications.[2][6] Its ability to run at near-native speeds via JIT compilation, combined with composability through tail calls and global variables, positions eBPF as a foundational technology for innovation in operating system extensibility, with ongoing expansions to support unprivileged execution and cross-platform portability. As of 2025, ongoing research includes isolated execution environments and virtual memory interfaces to enhance security and portability.[1][7][8]

Overview and History

Definition and Core Purpose

eBPF, or extended Berkeley Packet Filter, is an in-kernel virtual machine technology in the Linux kernel that enables the execution of sandboxed bytecode programs attached to various kernel hooks, allowing safe runtime extension and instrumentation without modifying kernel source code or loading kernel modules.[9] This mechanism provides programmable extensibility for key areas such as networking, security, and observability, while ensuring kernel stability through a rigorous verifier that checks programs for safety and just-in-time (JIT) compilation that translates bytecode to native machine code for efficient execution.[2] By running user-supplied programs in a privileged kernel context, eBPF facilitates dynamic customization of kernel behavior, such as packet processing in networking stacks or event tracing for performance analysis.[10]

Unlike the original Berkeley Packet Filter (BPF), which was primarily designed for simple packet filtering on network sockets with limitations like 32-bit registers and no support for loops, eBPF expands the instruction set to include 64-bit registers (ten general-purpose registers R0–R9 plus a read-only frame pointer) and bounded loops introduced in Linux kernel 5.3, enabling more complex computations.[11] It also broadens attachment points beyond basic socket filters to include diverse kernel events, such as kprobes for dynamic instrumentation of kernel functions and tracepoints for predefined tracing locations, allowing programs to hook into a wide range of system activities.[2] This evolution transforms eBPF from a narrow filtering tool into a versatile framework for kernel-level programmability.
At a high level, eBPF programs are developed in a restricted C-like subset, compiled in user space to platform-independent bytecode using tools like LLVM, then loaded into the kernel via the bpf() system call, often facilitated by libraries such as libbpf.[12] Upon loading, the kernel's verifier analyzes the bytecode to ensure it terminates without crashing or exceeding resource limits, after which it can be attached to specific events or hooks for execution.[2] Once triggered by kernel events, the programs run with JIT acceleration, interacting with kernel facilities through helper functions while maintaining isolation to prevent disruptions.[10]

Historical Development and Milestones

The Berkeley Packet Filter (BPF) originated in 1992 as a mechanism for efficient user-level packet capture and filtering in BSD Unix systems, developed by Steven McCanne and Van Jacobson at Lawrence Berkeley Laboratory.[13] Their work introduced a virtual machine-based architecture that executed user-supplied bytecode in the kernel to filter network packets, avoiding the overhead of copying data to user space, and was first integrated into the 4.4BSD release.[13] This classic BPF laid the groundwork for safe, in-kernel program execution without compromising system stability.

The extended BPF (eBPF) emerged in 2013–2014 through efforts led by Alexei Starovoitov, aiming to generalize BPF beyond packet filtering for broader kernel observability and programmability.[14] Initial patches for eBPF were merged into the Linux kernel version 3.15 in 2014, introducing an expanded instruction set, a larger stack, and support for non-packet use cases like tracing.[15]

Subsequent milestones accelerated eBPF's evolution: kernel 3.18 (2014) added eBPF maps for data storage and sharing between kernel and user space; 4.1 (2015) introduced just-in-time (JIT) compilation for faster execution and enabled dynamic tracing via kprobes; 4.7 (2016) added tracepoints and extended socket-level filtering; 4.8 (2016) launched eXpress Data Path (XDP) for high-performance packet processing at the driver level; 5.3 (2019) incorporated bounded loops to enhance program expressiveness while maintaining verifier tractability; and 5.10 (2020) supported sleepable eBPF programs for longer-running tasks without blocking kernel threads.
Key contributors to eBPF's development include Alexei Starovoitov, who authored much of the core infrastructure, and Daniel Borkmann, a co-maintainer focused on networking integrations.[16] Jesper Dangaard Brouer advanced XDP and packet processing features, while teams from Cloudflare contributed to early adoption in production networking and the Cilium project (from Isovalent) drove container-native use cases like service mesh and security.[17]

eBPF later saw significant growth with the integration of the BPF Type Format (BTF) in kernel 4.18, providing structured type metadata for portable debugging and CO-RE relocations.[18] In August 2021, the eBPF Foundation was established under the Linux Foundation by founding members including Meta, Google, Isovalent, Microsoft, and Netflix to coordinate upstream development and ecosystem expansion.[19]

Recent advancements through 2024–2025 emphasize scalability and new applications: Linux kernel 6.11 (released September 2024) included enhancements to the verifier and other eBPF features for better handling of complex programs, improving compilation times and reducing false rejections in large-scale deployments.[20] The eBPF Summit 2024 highlighted emerging integrations with artificial intelligence, such as using eBPF for low-latency data collection in AI training pipelines and model serving.[21]

Technical Architecture

Core Components and Program Lifecycle

The core components of eBPF form the foundation for its in-kernel execution model, enabling safe and efficient program deployment. eBPF programs consist of bytecode compiled from user-space languages like C, representing the executable logic that runs in the kernel virtual machine.[15] Maps serve as versatile data stores, providing key-value structures for sharing state between eBPF programs, the kernel, and user space, with types such as hash tables and arrays supporting operations like lookups and updates.[22] Helpers are predefined kernel functions that eBPF programs can invoke, such as bpf_trace_printk for logging or bpf_map_lookup_elem for map access, with availability restricted by program type to ensure safety.[23] Attachment points define the kernel hooks where programs are triggered, including kprobes for dynamic kernel function entry/exit, tracepoints for static instrumentation sites, and XDP for early packet processing in network drivers.[15]

The eBPF program lifecycle begins in user space with compilation, where source code is translated via LLVM/Clang into BPF bytecode embedded in an ELF object file.[22] Loading occurs through the bpf() system call using the BPF_PROG_LOAD command, which passes the bytecode, program type (e.g., BPF_PROG_TYPE_KPROBE), and license details to the kernel, returning a file descriptor (FD) upon success.[24] The kernel's verifier then statically analyzes the program for safety, checking for termination guarantees, memory bounds, and valid instruction flows without executing it.[15] If verified, the bytecode is JIT-compiled to native machine code for the architecture (e.g., x86-64 or ARM64) to optimize performance, potentially yielding 1.5x to 4x speedups over interpretation in network filtering scenarios.[15]

Attachment integrates the loaded program into the kernel's execution paths, using mechanisms such as the BPF_PROG_ATTACH command of the bpf() syscall or specific hooks: perf events for sampling, network devices via XDP or tc classifiers, and sockets for traffic filtering.[12] Multi-program composition is supported through tail calls, where the bpf_tail_call helper allows a program to jump to another in a program array map, enabling modular chaining without stack overhead or return to the caller.[23]

Execution is event-driven: triggers like a kprobe firing or packet ingress invoke the attached program, which processes data and may update maps or call helpers before returning a verdict (e.g., accept, drop, or modify).[15] Unloading happens by closing the program FD, though attached programs may persist until explicitly detached to avoid disrupting ongoing operations.[24]

Error handling in the loading phase relies on the bpf() syscall's return codes; for instance, -EPERM indicates insufficient privileges (requiring CAP_SYS_ADMIN or equivalent), while -EINVAL signals invalid bytecode, such as unrecognized instructions or out-of-bounds jumps detected by the verifier.[24] Other common errors include -E2BIG for programs exceeding size limits and -EACCES for unsafe access patterns, ensuring robust failure modes without kernel crashes.[24]

eBPF Maps and Data Structures

eBPF maps serve as generic key-value stores that enable data sharing between eBPF programs running in the kernel and user-space applications. They are created in user space using the bpf() system call with the BPF_MAP_CREATE command, where parameters such as map type, key size, value size, and maximum number of entries (max_entries) are specified to define the map's structure and capacity.[25] Once created, maps are referenced by file descriptors, which can be passed to eBPF programs during loading or shared across processes via inheritance (e.g., after fork()) or Unix domain sockets.[25]

From within eBPF programs, maps are accessed using kernel helper functions such as bpf_map_lookup_elem() for reading values by key, bpf_map_update_elem() for inserting or updating entries (with flags like BPF_ANY, BPF_NOEXIST, or BPF_EXIST to control behavior), and bpf_map_delete_elem() for removal; these operations are bounds-checked by the eBPF verifier to ensure safety.[25] User-space applications perform similar operations via bpf() commands like BPF_MAP_LOOKUP_ELEM, BPF_MAP_UPDATE_ELEM, BPF_MAP_DELETE_ELEM, and BPF_MAP_GET_NEXT_KEY for iterating over entries.[25] The kernel enforces size limits based on max_entries to prevent memory exhaustion, returning E2BIG if exceeded during updates.[25]

eBPF supports a variety of map types tailored to different storage and access patterns, all functioning as key-value stores but optimized for specific use cases like fast lookups, per-CPU aggregation, or efficient data streaming. The following table summarizes key map types:
Map Type | Description | Key Characteristics | Introduced In
BPF_MAP_TYPE_HASH | General-purpose hash table for arbitrary keys and values. | Optimized for lookup speed; supports atomic updates; LRU variants available for eviction. | Initial eBPF
BPF_MAP_TYPE_PERCPU_HASH | Per-CPU variant of hash map, aggregating values across CPUs. | Reduces contention in multi-CPU environments; value size multiplied by number of CPUs. | 4.6
BPF_MAP_TYPE_ARRAY | Fixed-size array using 32-bit integer keys as indices. | Fastest access; pre-allocated and zero-initialized; no deletes supported. | Initial eBPF
BPF_MAP_TYPE_PERCPU_ARRAY | Per-CPU variant of array map. | Ideal for per-CPU counters; supports atomic operations like increment. | 4.6
BPF_MAP_TYPE_RINGBUF | Lockless ring buffer for high-performance data transfer to user space. | Multi-producer, single-consumer; used for perf events and tracing output; no lookup/update/delete. | 5.8
BPF_MAP_TYPE_QUEUE | FIFO queue for ordered data storage. | Supports push (enqueue) and pop (dequeue); bounded by max_entries. | 4.20
BPF_MAP_TYPE_STACK | LIFO stack for last-in, first-out operations. | Supports push and pop; useful for temporary storage. | 4.20
BPF_MAP_TYPE_BLOOM_FILTER | Probabilistic data structure for membership queries. | Space-efficient; supports insert and lookup with false positives but no false negatives. | 5.16
BPF_MAP_TYPE_SOCKMAP | Array-based map for storing sockets. | Enables socket redirection and load balancing; integrates with networking helpers. | 4.14
BPF_MAP_TYPE_SOCKHASH | Hash-based variant for sockets with arbitrary keys. | Similar to SOCKMAP but with flexible keying; supports policy application at socket level. | 4.18
These types facilitate use cases such as sharing counters across multiple eBPF programs for aggregated statistics or exchanging data between user space and kernel, exemplified by sock maps that redirect packets for load balancing in networking scenarios.[25][26]

Advanced features enhance map flexibility and persistence. Maps can be pinned to a filesystem (e.g., under /sys/fs/bpf/) using the BPF_OBJ_PIN command, allowing them to outlive the creating process and be discovered by other applications without file descriptor passing.[25] Additionally, map-in-map constructs enable nested storage through types like BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS, where an outer map holds file descriptors to inner maps of a uniform type, supporting one level of nesting for complex data hierarchies; inner maps are created separately in user space, and eBPF programs access them via bpf_map_lookup_elem() on the outer map.[27] This nesting, introduced in kernel 4.12, allows for dynamic organization of related data structures while maintaining verifier safety.[27]

eBPF Virtual Machine and Execution Model

The eBPF virtual machine (VM) is a register-based virtual machine embedded in the Linux kernel that executes user-supplied bytecode programs in a sandboxed environment. It features eleven 64-bit registers: ten general-purpose registers (r0 through r9) and a read-only frame pointer (r10) used for accessing the program's stack. The stack is limited to 512 bytes of temporary storage, while r0 serves as the return register for function calls and program exits.[28][29] The VM implements a 64-bit RISC-like architecture, ensuring deterministic behavior and safety through bounded resource usage.

The eBPF instruction set consists of 64-bit (basic) and 128-bit (wide) encodings, grouped into classes for load/store operations, arithmetic logic unit (ALU) computations, jumps, calls, and atomic operations. Load and store instructions (e.g., BPF_LDX for loading from memory into a register, as in r1 = *(u32 *)(r2 + 4), and BPF_STX for storing from a register to memory) handle data movement in several sizes (byte, half-word, word, double-word). ALU operations include 64-bit (BPF_ALU64) and 32-bit (BPF_ALU) instructions for arithmetic (add, subtract, multiply, divide, modulo) and bitwise operations (AND, OR, XOR, shifts), such as r0 += r1 or r0 ^= imm32. Jump instructions (BPF_JMP) provide conditional and unconditional branching, including ja for unconditional jumps, jeq for equality checks (if r1 == r2 then PC += off), and jgt for greater-than comparisons, with 32-bit variants (BPF_JMP32) for narrower operands. Call instructions (BPF_CALL) invoke kernel helper functions, returning results in r0, while exit instructions (BPF_EXIT) terminate the program.
Atomic operations, introduced in Linux kernel 5.10 via the BPF_ATOMIC class, support thread-safe memory updates such as exchange (BPF_XCHG), compare-and-exchange (BPF_CMPXCHG), and fetch-add (e.g., *(u64 *)(r1 + off) += r2), available in 32-bit (BPF_W) and 64-bit (BPF_DW) modes on compatible memory types.[28][29][30]

eBPF programs execute in an event-driven model, triggered by kernel hooks such as network packet arrivals, tracepoints, or kprobes, to which programs are attached via user-space APIs. Execution is bounded to prevent infinite loops: the verifier analyzes up to 1,000,000 instructions per program (the historic 4,096-instruction program limit was lifted in kernel 5.3) and must prove termination before the program may load.[24][31] Loops, supported since kernel 5.3, must be bounded and analyzable within this limit. Tail calls, available since kernel 4.2, allow dynamic chaining to other programs stored in a program array map using bpf_tail_call(ctx, &jmp_table, index), with chains limited to about 33 programs to avoid stack overflows. Sleepable programs, introduced in kernel 5.10, permit blocking operations (enabled via the BPF_F_SLEEPABLE flag) because they attach only to hooks running in preemptible process context, allowing interactions such as page-faulting memory accesses that are disallowed elsewhere.[32][33]

For performance, the kernel employs just-in-time (JIT) compilation to translate eBPF bytecode into native machine code for the host architecture (e.g., x86-64, ARM64), available since kernel 3.18 with a fallback to the interpreter when the JIT is disabled. JITs maintain a close mapping between eBPF and native instructions for efficiency, configurable via /proc/sys/net/core/bpf_jit_enable. Context is passed to a program in r1, which points to a hook-specific context structure (for example, struct sk_buff * for socket filters or struct xdp_md * for XDP), through which programs inspect and modify kernel data without raw pointer access.[24][11][34]

Verifier, Safety Mechanisms, and Optimization

The eBPF verifier performs static analysis on programs prior to loading, simulating all possible execution paths to ensure safety and termination without runtime crashes or kernel instability. It begins with a control-flow graph check that rejects unreachable instructions and, before bounded-loop support, any back-edges forming loops, followed by path simulation that tracks register and stack states across the program's control flow. This process enforces strict bounds, such as limiting explored paths to fewer than one million instructions to cap verification time, and prohibits unbounded memory accesses by validating pointer arithmetic and alignment for types like PTR_TO_CTX, PTR_TO_MAP, and PTR_TO_STACK.[35][36]

Safety mechanisms in the verifier include comprehensive type checking, which assigns and verifies register states (e.g., distinguishing scalars from pointers) using structures like struct bpf_reg_state, and range analysis that employs tristate numbers (tnums) to track minimum/maximum values and known bit patterns for precise bounds inference. Direct memory writes are restricted to verified safe locations, with pointer adjustments rejected in hardened contexts to avoid overflows, while helper function calls are gated by prototype validation via get_func_proto() to ensure correct argument types and to prevent issues like division by zero along any path. Programs violating these rules, such as those attempting stack reads before writes or speculative type confusion, are rejected outright.[35][36]

The verifier incorporates optimizations to enhance efficiency, including dead code elimination to remove unused instructions (initially for privileged programs in kernel v5.1), constant folding to simplify expressions with known values, and peephole optimizations for local instruction improvements during analysis. A state cache collapses equivalent verification states to mitigate growth in path exploration, while liveness tracking ignores computations that cannot affect outcomes. Just-in-time (JIT) compilation builds on this with register allocation tailored to the host architecture, though verifier-level optimizations focus on bytecode refinement before JIT. For modularity, bpf_tail_call enables bounded jumps to other programs, verified to prevent stack overflows.[35][37][36]

The verifier's main limitation is path explosion in branch-heavy programs, which makes verification expensive and leads to rejection of safe but intricate programs, since the verifier cannot solve the halting problem in general. Enhancements since kernel v5.3 have improved loop handling by supporting bounded iterations (with verifier-proven termination), extended in v5.17 via the bpf_loop helper for up to roughly 8 million iterations and in v6.4 with numeric iterators for range-based processing. Error reporting aids debugging through the detailed verifier log returned at load time, with messages like "R2 !read_ok" or "invalid stack off=8 size=8" pinpointing rejection causes.[35][38][36]

Compile Once – Run Everywhere (CO-RE)

The Compile Once – Run Everywhere (CO-RE) approach in eBPF enables portable programs that execute across different Linux kernel versions without recompilation for each target kernel. This portability is achieved by leveraging the BPF Type Format (BTF), which provides rich metadata about kernel data structures and types, allowing the eBPF loader to adjust program instructions at load time rather than embedding kernel-specific offsets at compile time. Introduced to address the challenges of kernel evolution, CO-RE replaces hardcoded field accesses (such as direct offsets into structures like sk_buff) with symbolic references resolved against the running kernel's BTF information.[39][40]

In implementation, the libbpf library is the core facilitator of CO-RE, processing BTF data extracted from the kernel's vmlinux image, typically available at /sys/kernel/btf/vmlinux. During compilation, the LLVM-based BPF backend (via Clang) generates relocation records for field accesses, marking them for adjustment; libbpf then uses the target kernel's BTF to compute and apply the correct offsets or sizes at load time, preserving compatibility even when kernel structures change between versions. For instance, accessing skb->data avoids brittle hardcoded offsets by relying on BTF-defined type layouts, with libbpf handling the relocation transparently through APIs like bpf_object__load. This mechanism depends on BTF support in the kernel, enabled via the CONFIG_DEBUG_INFO_BTF configuration option, which embeds type information directly into the kernel binary.[39][40][41]

The primary benefit of CO-RE is binary compatibility: a single eBPF binary can run across a wide range of kernels, such as versions 5.4 through 6.11, without version-specific builds or manual patches. This reduces maintenance overhead for developers and deployers, particularly in heterogeneous environments like cloud infrastructures where kernel versions vary, and minimizes the risk of kernel upgrades breaking existing programs. By decoupling eBPF applications from specific kernel headers, CO-RE promotes broader adoption and simplifies distribution, as programs can be shipped as precompiled ELF objects rather than source code requiring on-site compilation.[39][40]

The typical workflow for CO-RE-enabled eBPF programs begins with compiling the BPF code using a BTF-enabled LLVM/Clang, which embeds relocation metadata into the output ELF object. Developers then use libbpf to load the object via functions like bpf_object__load, where the library automatically fetches and applies the running kernel's BTF; if BTF is unavailable, libbpf can fall back to user-provided type information, though this is less robust. Tools like bpftool assist by generating header files such as vmlinux.h from BTF, providing type-safe accessors for kernel structures during development.[39][40]

Despite its advantages, CO-RE has limitations: it requires kernels built with BTF support, which is not enabled in all distributions or custom builds, and not all kernel structures or fields are exposed via BTF. Significant kernel changes, such as field renames or major layout shifts, can still cause incompatibilities if they alter the semantic mapping assumed by the program; in such cases, programs may fail to load or require updates to their relocation logic.[39][40]

Recent enhancements in 2024 have further improved CO-RE practicality, with more Linux distributions shipping kernels with embedded BTF information by default, reducing setup barriers and enabling smoother loading of type data across diverse deployments. These improvements build on ongoing kernel efforts to expand BTF coverage and optimize relocation processing, enhancing support for evolving hardware and module-based extensions.[20][42]

BPF Type Format (BTF) and DWARF Integration

The BPF Type Format (BTF) is a compact, self-describing format for encoding type information in eBPF programs and kernel objects. Derived from DWARF debugging information, BTF provides runtime-accessible type metadata without the overhead of full debug sections, enabling features like CO-RE (Compile Once – Run Everywhere) and enhanced observability.

DWARF in the Kernel

When CONFIG_DEBUG_INFO is enabled, the Linux kernel's vmlinux image includes DWARF sections such as:
  • .debug_info: Type, variable, and function information
  • .debug_abbrev: Abbreviation tables
  • .debug_line: Line number tables
  • .debug_str: String tables
  • .debug_rnglists: Range lists
These sections allow external tools to map machine code to source code but are not loaded at runtime.

BTF Derivation and Usage

BTF is generated from DWARF during the kernel build (by the pahole tool) or emitted directly into eBPF object files by Clang. It compactly encodes essential type details (structs, enums, functions). Key uses in eBPF:
  • Trampoline generation: BTF provides function argument locations (registers/stack offsets) and sizes for safe context passing in hooks.
  • CO-RE: Enables relocations for field offsets, avoiding recompilation across kernel versions.
  • Observability: Tools like bpftool inspect BTF for program types, maps, and stack traces with source context.
BTF is embedded in ELF object files under sections like .BTF and .BTF.ext, allowing verified, efficient runtime access without full DWARF parsing. This integration enhances eBPF safety, portability, and debugging while minimizing production overhead.

Programming and Development

Writing and Loading eBPF Programs

eBPF programs are authored in a restricted subset of the C programming language and compiled to eBPF bytecode with the LLVM/Clang toolchain. Developers typically invoke Clang with the -target bpf flag to generate an ELF object file containing the bytecode, along with metadata for maps and sections.[43] The code structure requires including kernel headers such as <linux/bpf.h> and <bpf/bpf_helpers.h>, which provide BPF macros and helper function prototypes. Programs are organized into sections using the SEC() macro to specify the program type (e.g., SEC("xdp") for XDP hooks), and licenses are declared with SEC("license"). Maps are declared in the SEC(".maps") section, while kernel interactions rely on helper functions like bpf_map_lookup_elem() and bpf_redirect(); there is no access to the standard library or floating-point operations.[43][44] A representative example is a simple XDP program that counts incoming packets using a BPF array map:
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_count SEC(".maps");

SEC("xdp")
int count_packets(struct xdp_md *ctx) {
    __u32 key = 0;
    __u64 *count = bpf_map_lookup_elem(&pkt_count, &key);
    if (count) {
        __sync_fetch_and_add(count, 1);
    }
    return XDP_PASS;
}

char _license[] SEC("license") = "Dual MIT/GPL";
This program increments a counter in the map for each packet and passes the packet on to the network stack. The map allows user-space access to the count via file descriptors.[43]

Loading an eBPF program involves using the libbpf library in user space to parse the ELF file, create kernel resources, and attach the program. The process begins with bpf_object__open() or bpf_object__open_file() to load the object file, followed by bpf_object__load() to run the program through the kernel verifier, create maps, and prepare for JIT compilation. File descriptors for maps and programs are obtained via bpf_map__fd() and bpf_program__fd(), and attachment to a hook (e.g., an XDP interface) uses bpf_program__attach() with appropriate configuration. Errors during loading often stem from verifier rejections due to safety violations.[44]

Debugging loaded programs relies on tools like bpftool for inspection, such as bpftool prog dump to display bytecode or bpftool map dump for map contents, and on reading the kernel's trace_pipe to capture output from helpers like bpf_trace_printk(). These tools enable runtime analysis without modifying the program.[45]

User-Space Interfaces and Helper Functions

User-space interactions with eBPF are primarily mediated by the bpf() system call, a versatile interface introduced in Linux kernel 3.18 that enables operations such as creating maps, loading programs, and managing eBPF objects.[24] The syscall takes a command code as its first argument, specifying the operation, followed by a union of structures carrying the relevant parameters. For instance, the BPF_MAP_CREATE command allocates an eBPF map from details such as map type, key and value sizes, maximum entries, and flags, returning a file descriptor (FD) on success that serves as a handle for subsequent operations.[24] Similarly, BPF_PROG_LOAD loads an eBPF program into the kernel, requiring the program type, bytecode, section names, and license, with the kernel verifier analyzing the program for safety before it can be attached to hooks via its FD.[24] These FDs can be passed between processes by inheritance across fork(2) or over Unix domain sockets, enabling shared access to eBPF resources across user-space applications.

To abstract the low-level details of the bpf() syscall, the libbpf library provides higher-level C APIs that simplify program loading and map management. Functions such as bpf_prog_load_xattr() load programs with extended attributes, handling ELF parsing, relocation, and attachment without direct manipulation of syscall arguments. libbpf also supports skeleton generation, letting user-space code access map FDs and program attachments through generated structures, reducing boilerplate and improving portability across kernel versions. Maintained under the Linux Foundation, the library has become the de facto standard for eBPF user-space tooling due to its robustness and integration with build systems like CMake.
eBPF programs interact with the kernel and access system state through a suite of over 300 helper functions, predefined intrinsics callable from eBPF bytecode via the call instruction (as of Linux kernel 6.12). These helpers, declared in the kernel's UAPI headers (e.g., uapi/linux/bpf.h), provide safe, bounded access to kernel facilities without direct memory manipulation. They are categorized by domain, such as tracing (e.g., bpf_get_current_pid_tgid() to retrieve the current process and thread group IDs), networking (e.g., bpf_skb_load_bytes() to read bytes from a socket buffer during packet processing), and memory access (e.g., bpf_probe_read_user() for reading user-space memory). Tracing helpers often emit data to perf event buffers or tracepoints, while networking helpers manipulate skbs or XDP frames, with domain-specific restrictions preventing misuse.[24]

User-space applications retrieve data produced by eBPF programs through mechanisms tied to these interfaces, such as reading from map FDs using bpf_map_lookup_elem() via the bpf() syscall or polling perf event FDs for traced events.[24] Maps act as shared memory between kernel and user space, with updates from helpers like bpf_map_update_elem() visible immediately through FD operations, while perf buffers enable ring-buffer-style streaming for high-volume outputs like trace data. This bidirectional access pattern allows real-time monitoring and control, with user space configuring maps or attachments after loading.

Helper functions evolve across kernel versions to add functionality or fix issues, with availability checked during program verification at load time. For example, bpf_probe_read_kernel() and bpf_probe_read_user() were introduced in kernel 5.5 to separate kernel- and user-memory reads safely; programs using unavailable helpers fail verification with errors like -EINVAL. Availability differences are typically bridged through feature probing or CO-RE relocations in libbpf, allowing programs to adapt to kernel differences without recompilation.

Error handling in helpers follows a consistent model: helpers return 0 (or a valid pointer) on success and a negative errno value (or NULL) on failure, which eBPF programs can inspect to branch accordingly. Common returns include -ENOENT for absent map keys in lookup operations or -EFAULT for invalid memory accesses, enabling programs to degrade gracefully or log issues without crashing the kernel.[24] This design promotes robust, fault-tolerant eBPF applications by propagating kernel errors back to user space through return codes or map-stored diagnostics.

Tools and Frameworks for Development

bpftool is a command-line tool for inspecting, loading, and manipulating eBPF programs and maps in the Linux kernel. It lets users list loaded programs with bpftool prog show, examine map contents via bpftool map dump, and load programs from ELF objects using bpftool prog load. It also supports debugging by disassembling JIT-compiled code with commands like bpftool prog dump jited id <ID>, which outputs the native machine code for analysis.[46]

The LLVM BPF backend provides the compiler toolchain for generating eBPF bytecode from high-level languages such as C, targeting the eBPF instruction set defined in the Linux kernel. It integrates with Clang to produce ELF object files containing eBPF programs, maps, and relocation information, facilitating the "Compile Once – Run Everywhere" (CO-RE) paradigm through BTF (BPF Type Format) metadata. Developers typically invoke Clang with the -target bpf flag to compile eBPF code, leveraging LLVM's optimizations for safe and efficient bytecode generation.[47][48]

BCC (BPF Compiler Collection) is a Python-based framework that simplifies eBPF development by providing bindings to compile and load C-written eBPF programs directly from user-space scripts. It includes pre-built tools for tasks like network tracing (e.g., tcpconnect) and CPU profiling (e.g., profile), allowing rapid prototyping of observability applications without manual handling of the bpf() syscall. The iovisor/bcc repository hosts examples and utilities demonstrating its use in kernel tracing prototypes.[49]

bpftrace offers a high-level tracing language inspired by awk and D, enabling concise eBPF scripts for kernel and user-space tracing. It compiles scripts to eBPF bytecode using LLVM and libbpf, supporting one-liners like bpftrace -e 'kprobe:do_sys_open { printf("%s %s\n", comm, str(args->filename)); }' to trace file opens by process. The tool excels at quick debugging and ad-hoc analysis, with built-in variables and functions for stack traces and histograms.[50]

libbpf is the canonical C library for loading and managing eBPF objects, offering APIs to parse ELF files, create maps, attach programs to kernel hooks, and handle CO-RE relocations using BTF. It abstracts the bpf() syscall and supports skeleton generation via bpftool for type-safe access to maps and global variables in user-space code. libbpf also covers advanced features like ring buffers and netfilter attachments, making it essential for production-grade eBPF applications.[12][51] For debugging, bpftool allows inspection of loaded objects, while verifier logs, generated during program loading to detail safety checks, can be captured via libbpf's log buffer or kernel dmesg output to diagnose rejections. Enhancements in libbpf version 1.6.0, released in 2025, include finer control over BPF object lifetime with the new bpf_object__prepare() API, improved error reporting with symbolic codes, and support for multi-uprobe sessions and BPF token mechanisms.[52][35]

The Aya crate integrates Rust with eBPF development, providing safe abstractions for writing both user-space loaders and kernel programs using the aya-ebpf library. It enforces memory safety and type checking at compile time, reducing common errors in eBPF coding, and supports CO-RE via BTF for cross-kernel compatibility. The framework has gained traction for building secure eBPF applications in Rust ecosystems. Other developments as of 2025 include additional Rust tooling such as RedBPF and kernel features such as BPF-based CPU scheduling in Linux 6.12, enabling dynamic scheduler implementations via eBPF.[53][54]

Applications and Use Cases

Networking and Packet Processing

eBPF enables high-performance modifications to the Linux network stack by attaching programs to specific hooks, allowing efficient packet inspection, filtering, and forwarding without traversing the full kernel networking path.[55] In networking applications, eBPF programs leverage kernel-level execution to process packets at line rate, supporting use cases like load balancing and traffic shaping with minimal overhead.

The eXpress Data Path (XDP), introduced in Linux kernel 4.8, provides an early ingress hook in network interface card (NIC) drivers, where eBPF programs execute before packets enter the kernel's networking stack. XDP programs can drop invalid packets (DROP), forward them to the stack (PASS), or redirect them to other interfaces or user space (REDIRECT).[55] This design enables rapid DDoS mitigation by filtering malicious traffic at the driver level, achieving up to ten times the packet processing rate of traditional iptables rules; 2017 benchmarks showed XDP handling up to 10 million packets per second (Mpps) on a single core compared with roughly 1 Mpps for iptables.[56][57]

Traffic Control (TC) has integrated eBPF since Linux kernel 4.1, enabling the offloading of queuing disciplines (qdiscs) and classifiers to programmable code for advanced traffic management.[58] eBPF programs attached to TC hooks, such as classifiers (BPF_PROG_TYPE_SCHED_CLS), allow custom packet classification and actions like policing or shaping, with a direct-action mode that enqueues or redirects packets without additional kernel lookups.[59] For load balancing, eBPF complements qdiscs like CAKE (Common Applications Kept Enhanced) by adjusting flows based on socket state, distributing traffic across multiple paths or endpoints efficiently.

Socket filters, using the BPF_PROG_TYPE_SOCKET_FILTER program type, attach eBPF code directly to sockets for per-socket packet processing, allowing custom filtering or redirection to user space.[60] These filters inspect incoming or outgoing packets on TCP/UDP sockets, enabling actions like selective redirection to application buffers via helpers such as bpf_sk_redirect_map(), bypassing parts of the traditional stack for lower latency.[61]

Prominent examples include Cilium, an eBPF-based Container Network Interface (CNI) for Kubernetes, which uses XDP and TC hooks to implement pod-to-pod networking, service load balancing, and encryption without a central proxy.[62] Similarly, Cloudflare's Magic Firewall employs eBPF with nftables for distributed packet filtering across its data centers, enforcing rules like IP allowlisting at line rate to mitigate attacks.[63] eBPF networking achieves zero-copy processing by operating on packet buffers in place, avoiding data duplication between kernel and user space, which is particularly effective in XDP's TX mode for forwarding. On modern NICs with hardware offload support, XDP programs processed up to 14 Mpps in hybrid software-hardware setups in 2018 benchmarks, demonstrating scalability for high-throughput environments.[57]

Observability and Tracing

eBPF enables observability and tracing by allowing programs to attach to kernel and user-space events, capturing data with minimal overhead for performance analysis and debugging.[10] These attachments leverage the kernel's tracing infrastructure to instrument code dynamically or statically, providing insight into system behavior without kernel recompilation or module loading.[64]

Tracing in eBPF relies on several hook types. Kprobes provide dynamic tracing of kernel functions by inserting breakpoints, enabling custom eBPF code execution at probe points since Linux 4.1.[10] Uprobes extend this to user-space applications for similar dynamic instrumentation, available since kernel 4.4.[10] Tracepoints offer static hooks into predefined kernel events for efficient, low-overhead observability, with eBPF support introduced in kernel 4.7.[10] Since Linux 5.5, fentry and fexit hooks allow tracing function entry and exit with reduced overhead compared with traditional kprobes, using BPF trampolines for near-zero-cost attachment.[10]

Tools like bpftrace and BCC integrate eBPF into simplified tracing workflows. bpftrace is a high-level scripting language inspired by DTrace, supporting one-liners and complex scripts for kernel and user-space tracing; for example, @x = count(); aggregates event counts in-kernel.[65][66] BCC provides a Python-based framework for building tracers, such as biolatency, which uses kprobes on block I/O functions to generate latency histograms, revealing distributions like multimodal disk I/O patterns.[49][67]

Data from eBPF tracing programs is exported to user space via efficient mechanisms that minimize latency. Perf event buffers (BPF_MAP_TYPE_PERF_EVENT_ARRAY) allow per-CPU event streaming, supporting variable-length records and epoll notifications for consumption.[68] Ring buffers (BPF_MAP_TYPE_RINGBUF), introduced in kernel 5.8, improve on this by preserving event order across CPUs, using a shared power-of-two circular structure with APIs like bpf_ringbuf_output() for low-overhead, non-blocking output of custom events.[68]

Common use cases include latency analysis and resource tracking. For instance, eBPF programs attached to syscall tracepoints can build histograms of execution times, identifying bottlenecks in system calls.[67] Off-CPU analysis employs kprobes on scheduler functions like finish_task_switch() to capture stack traces and time spent waiting, enabling in-kernel summarization of voluntary context switches for production profiling with under 6% overhead.[69] Custom metrics can also be generated for integration with monitoring systems, such as exporting kernel counters to Prometheus via eBPF-based exporters.[70] More recently, sleepable programs, available since Linux 5.10, allow certain tracing program types to perform blocking operations, such as page-faulting reads of user memory, during long-running traces while the verifier continues to guarantee safety.[10]

Security Enforcement and Monitoring

eBPF plays a crucial role in runtime security by enabling the attachment of programs to kernel hooks for policy enforcement and real-time anomaly detection, allowing fine-grained control over system behavior without modifying the kernel.[71][10]

One primary mechanism is the integration of eBPF with Linux Security Module (LSM) hooks, introduced in Linux kernel 5.7, which permits loading eBPF programs that implement custom mandatory access controls at key kernel points such as file operations and process creation.[72][71] These programs can enforce granular policies, for instance denying unauthorized access to sensitive files or monitoring privilege escalations, providing a flexible alternative to traditional LSMs like SELinux.[73] A notable example is Falco, an open-source runtime security tool that uses eBPF-based instrumentation to monitor system calls for suspicious activity, such as unexpected shell executions in containers.[74][75]

Seccomp (Secure Computing mode) filtering, known as Seccomp-BPF, applies BPF programs (in the classic BPF instruction set) to syscall entry, enabling process sandboxing beyond static allow/deny lists.[76][77] Applications install filters that inspect the syscall number and its register arguments in real time, for example restricting which socket families a process may create; because seccomp filters cannot dereference pointers, they operate only on scalar arguments, not on pointed-to data such as file path strings.[78]

In monitoring scenarios, eBPF facilitates network policy enforcement and file integrity checks by attaching to kernel tracepoints and kprobes. For Kubernetes environments, Tetragon uses eBPF to enforce runtime policies, such as blocking unauthorized pod-to-pod connections based on labels and context, integrating with Cilium for identity-aware networking.[79][80] File integrity monitoring can be achieved via kprobes on functions like vfs_open or vfs_unlink, where eBPF programs track modifications to critical paths and alert on deviations, as implemented in Tetragon's tracing policies.[81]

Practical deployments highlight these capabilities: Sysdig Secure employs eBPF for comprehensive runtime threat detection, analyzing syscall patterns to identify exploits and integrating with audit mechanisms for enriched logging similar to auditd's event capture.[75][82] This approach outperforms traditional auditd by reducing overhead while providing contextual security insight, such as correlating file accesses with process ancestry.[83]

As of 2025, eBPF has become a cornerstone for zero-trust architectures in cloud-native settings, with Isovalent's Cilium and Tetragon enforcing micro-segmentation policies that verify workload interactions at the kernel level.[84][85] Tetragon's 2024 releases introduced enhanced policy languages for proactively blocking threats such as supply-chain attacks, demonstrating eBPF's scalability in production Kubernetes clusters.[86] Emerging 2025 use cases include eBPF for non-invasive tracing of large language models (LLMs) and AI agents, providing kernel-level observability into GPU workloads and inference pipelines.[20]

Ecosystem, Adoption, and Governance

Community Governance and Standards

The eBPF Foundation was established in August 2021 under the Linux Foundation to foster collaboration among eBPF-based open source projects, guide technical direction, and promote portability across platforms.[19] Founding members included Facebook (now Meta), Google, Isovalent, Microsoft, and Netflix, with the aim of enhancing eBPF's security, reliability, and upstream development.[19] The Foundation operates with a Governing Board comprising representatives from platinum members such as CrowdStrike, Google, Isovalent, Meta, and Netflix, alongside silver members including Datadog and Intel.[87] Governance of eBPF development follows a consensus-based model within the Linux kernel community, where contributions are submitted via the bpf@vger.kernel.org mailing list and reviewed by maintainers such as Alexei Starovoitov and Daniel Borkmann.[88] Developers maintain separate Git trees for the eBPF subsystem, with changes merged after discussion and testing on the mailing list to ensure stability and compatibility.[89] The eBPF Foundation's BPF Steering Committee (BSC), composed of community volunteers including Alan Jowett, Alexei Starovoitov, Andrii Nakryiko, Brendan Gregg, Daniel Borkmann, Joe Stringer, and KP Singh as of 2025, facilitates technical projects, community engagement, and funding decisions under a defined charter.[90][91] Standards efforts center on the BPF Type Format (BTF), introduced in the Linux kernel in 2018 as a compact binary format for encoding type information of kernel and eBPF programs, which serves as the de facto standard for improving portability and developer tooling without relying on full kernel headers.
Ongoing work includes developing an eBPF specification for non-Linux environments, such as the Microsoft eBPF for Windows project, which entered preview stages around 2022-2023 to enable cross-platform compatibility while maintaining kernel safety.[92] In June 2023, the Internet Engineering Task Force (IETF) chartered the BPF Working Group to document the eBPF ecosystem and propose extensions, actively engaging the eBPF Foundation for consensus-driven standardization.[93] Community events include the annual eBPF Summit, a virtual conference held since 2022 to showcase advancements in eBPF for networking, security, and observability, with editions in 2022, 2023, 2024, and the 2025 Hackathon Edition featuring talks from kernel maintainers and adopters.[94][95] The Linux Plumbers Conference (LPC) regularly hosts eBPF tracks, such as in 2023, for discussions on kernel integration and tooling.[91] In 2024, the Foundation awarded $250,000 in research grants—five $50,000 awards to academic institutions—to advance eBPF scalability, verifier security, and virtual memory support. In 2025, the Foundation awarded an additional $100,000 in research grants—two $50,000 awards—to projects enhancing eBPF flexibility, safety, and verifier capabilities.[96][97] Collaborations extend to the Cloud Native Computing Foundation (CNCF) for integrating eBPF into cloud-native tools like Cilium, a CNCF-graduated project for Kubernetes networking and security.[98] The Foundation also partners with the IETF on networking standards, ensuring eBPF's programmability aligns with broader protocol developments.[99]

Branding and Terminology Evolution

The term "eBPF," short for extended Berkeley Packet Filter, was introduced to distinguish the modern, extended version of the original Berkeley Packet Filter (BPF), which was limited primarily to packet filtering, from its more versatile successor capable of broader kernel extensions.[2] This branding shift aimed to prevent confusion between classic BPF (cBPF) and the enhanced framework, with "eBPF" formally trademarked by the Linux Foundation through the eBPF Foundation to govern its usage in documentation, tools, and community resources.[100] Following its initial development in 2013–2014, terminology evolved from early references to "extended BPF" in kernel patches to a standardized "eBPF" designation post-2014, coinciding with its exposure to user space in Linux kernel version 3.18.[15] By Linux kernel 4.4 (released in 2016), eBPF was documented in kernel sources as a distinct in-kernel virtual machine with advanced features such as 64-bit registers and an expanded instruction set (verifier support for bounded loops followed in kernel 5.3), and ambiguous use of "BPF" alone was discouraged in modern contexts.[101] The eBPF Foundation's usage policy reinforces this by recommending "eBPF" in contemporary discussions to reflect the technology's scope beyond networking, since standalone "BPF" suggests packet filtering that many modern applications no longer perform.[2] Key terminology includes "helper functions," the kernel-provided interface (e.g., bpf_get_current_pid_tgid()) through which programs perform operations such as map access or packet manipulation, since eBPF programs execute in a sandboxed environment without direct syscall access.[102] Similarly, "maps" became the standard term for the primary data-structure abstraction, encompassing hash maps, arrays, and other types, evolving from cBPF's simpler array-only model to support complex state management across program invocations.[103] The libbpf community's introduction of CO-RE (Compile Once – Run Everywhere)
branding in 2020 further advanced portability terminology, addressing kernel version incompatibilities through BPF Type Format (BTF) and relocations, allowing programs to run across kernels without recompilation.[39] Community discussions, particularly on GitHub repositories like libbpf, influenced consistent API naming conventions, such as the bpf_prog_ prefix for program-related functions and types (e.g., bpf_prog_load()), ensuring clarity and uniformity in upstream kernel and user-space interfaces.[104]
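As an illustration of the helper-and-map vocabulary above, a minimal kernel-side sketch (assuming libbpf conventions, clang's BPF target, and a generated vmlinux.h; the map and program names are invented for this example) might count execve() calls per process:

```c
// Kernel-side sketch: count execve() invocations per TGID in a hash map.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

// Maps generalize cBPF's array-only model; this one is a hash map.
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, u32);
    __type(value, u64);
} exec_counts SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_execve")
int count_execs(void *ctx)
{
    // Programs cannot issue syscalls; all kernel interaction goes through
    // helpers such as bpf_get_current_pid_tgid() and the map helpers.
    u32 tgid = bpf_get_current_pid_tgid() >> 32;
    u64 one = 1, *cnt;

    cnt = bpf_map_lookup_elem(&exec_counts, &tgid);
    if (cnt)
        __sync_fetch_and_add(cnt, 1);
    else
        bpf_map_update_elem(&exec_counts, &tgid, &one, BPF_ANY);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

Because the map outlives any single invocation, state accumulates across events and can be read from user space, which is the role maps play in the terminology above.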

Industry Adoption and Recent Developments

eBPF has achieved significant industry adoption, particularly among leading technology firms leveraging it for high-scale operations. Meta has processed every packet entering its data centers using eBPF since 2017, and in 2023, it deployed eBPF-based CPU schedulers that delivered 5% bandwidth gains.[105] Google utilizes eBPF for security auditing, packet processing, and performance monitoring across its data centers, with plans to test eBPF-based schedulers in early 2024; it also integrates eBPF for tracing in Android environments.[106] Netflix relies on eBPF to generate billions of flow logs hourly for observability and to detect noisy neighbors through run queue latency monitoring.[106] AWS incorporates eBPF via Cilium for networking and security in services like EKS Anywhere, enhancing container isolation in Firecracker microVMs.[106] In cloud-native environments, eBPF powers key tools that support Kubernetes deployments at scale. Cilium, an eBPF-driven networking, security, and observability platform, is adopted by over 100 companies and manages some of the largest Kubernetes clusters worldwide.[106] Falco, a Cloud Native Computing Foundation (CNCF) graduated project, employs eBPF probes to monitor kernel-level activities and detect anomalous behavior in containerized applications.[107] Advancements in 2024 and 2025 have expanded eBPF's reach beyond Linux. 
The Linux kernel 6.12, released in November 2024, introduced BPF scheduling capabilities and further refined eBPF support for real-time processing and hardware integration.[54] Microsoft advanced eBPF for Windows, with previews beginning in 2024; select features, such as host routing in Azure Kubernetes Service, reached general availability by October 2025, while broader integration remains in development.[108][109] Concurrent research revealed eBPF's potential overhead in latency-sensitive network applications, with benchmarks showing up to 6 μs of added response time on certain hooks and increased latency when multiple programs are attached.[110] The eBPF ecosystem has grown rapidly, as outlined in the Linux Foundation's "State of eBPF 2024" report, which highlights its transformative role in observability, security, and networking across startups, financial institutions, and telecommunications.[111] This growth includes over 2,000 eBPF-related GitHub repositories by 2024 and integration into edge computing scenarios for efficient data processing.[112] The technology now features more than 200 kernel helper functions, supported by contributions from hundreds of developers through the eBPF Foundation.[113] Numerous Fortune 500 companies, including Goldman Sachs, employ eBPF-based tools like Cilium for production observability.[106]

Security and Limitations

Built-in Security Features

eBPF incorporates several built-in security features designed to mitigate the risks of executing user-supplied code in kernel space. Central to this is the eBPF verifier, a static analysis tool that examines programs before loading, ensuring they adhere to strict safety constraints to prevent kernel crashes or exploits.[2] Sandboxing is enforced by the verifier, which prohibits direct calls to arbitrary kernel APIs, instead restricting programs to a predefined set of safe helper functions that mediate interactions with kernel resources. The verifier also imposes limits on stack usage, typically capping it at 512 bytes per frame, and permits only bounded iteration constructs to avoid infinite loops, thereby containing resource consumption and potential denial-of-service vectors. Privilege separation further bolsters this by requiring the CAP_BPF Linux capability (or CAP_SYS_ADMIN in older configurations) for loading programs, ensuring only authorized processes can inject code into the kernel. Isolation mechanisms prevent eBPF programs from accessing arbitrary kernel memory, as the verifier symbolically tracks pointer bounds and rejects invalid dereferences or out-of-bounds accesses. Programs execute in a restricted virtual machine environment within the kernel, with all data exchanges routed through vetted helpers that perform necessary validations. For shared data structures like maps, access is governed by file descriptor (FD) permissions, where map FDs must be explicitly passed to programs, enforcing least-privilege access control at the user-kernel boundary.[37] Auditing capabilities give visibility into eBPF operations: the bpf() syscall's BPF_PROG_GET_NEXT_ID and BPF_PROG_GET_FD_BY_ID commands (surfaced by tools such as bpftool) allow enumerating loaded programs with their IDs, types, and attachment points, while the BPF_PROG_QUERY command lists programs attached to a given hook, facilitating inventory and monitoring.
Kernel tracepoints, such as trace_bpf_prog_load and trace_bpf_prog_unload, provide hooks for logging program loading and unloading events, supporting forensic analysis and compliance auditing without runtime overhead.[114] Recent kernel enhancements strengthen these protections; starting from Linux 5.11, stricter controls on unprivileged eBPF loading were implemented via the kernel.unprivileged_bpf_disabled sysctl, which in subsequent versions defaults to disabling non-root access to reduce the attack surface. For CO-RE (Compile Once – Run Everywhere) programs, BTF (BPF Type Format) metadata enables the verifier to validate type information and relocations at load time, ensuring compatibility and preventing type mismatches that could lead to unsafe memory accesses across kernel versions.[115] eBPF aligns with established mandatory access control (MAC) frameworks through its integration with Linux Security Modules (LSM), where BPF_PROG_TYPE_LSM programs attach to the same kernel hooks used by SELinux and AppArmor. This allows eBPF-based policies to complement or extend those modules, enforcing fine-grained security decisions while leveraging their established permission models for holistic system protection.[71]
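The verifier's pointer-bounds tracking is visible in the canonical XDP access pattern. The following kernel-side sketch (assuming clang's BPF target and libbpf headers; not loadable without a build setup) is rejected at load time if the explicit bounds check is removed:

```c
// XDP sketch: every packet-data access must be bounds-checked first,
// or bpf(BPF_PROG_LOAD) fails with an "invalid access to packet" error.
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int pass_only_ipv4(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    /* Verifier requirement: prove the Ethernet header lies in the packet. */
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;

    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_DROP;
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```

The comparison against data_end is what lets the verifier conclude that the subsequent dereference of eth->h_proto is in bounds; without it, the program never reaches the kernel.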

Known Risks, Vulnerabilities, and Mitigations

Despite its robust verification mechanisms, eBPF has faced several vulnerabilities, particularly in the verifier component, which is responsible for ensuring program safety before loading. A notable example is CVE-2023-2163, a flaw in the eBPF verifier that allowed specially crafted programs to bypass bounds checks, potentially leading to kernel memory corruption.[116] This issue was patched in Linux kernel version 6.2, highlighting the verifier's susceptibility to complex program constructs that exploit edge cases in type propagation and loop analysis.[116] Additionally, historical verifier bypasses have enabled unsafe memory accesses, with security analyses documenting dozens of such CVEs since 2014.[117] In 2025, additional vulnerabilities like CVE-2025-37963 were disclosed, allowing potential bypasses in eBPF mitigations for privileged users, alongside research presented at DEF CON 33 on exploiting eBPF subsystems.[118][119] eBPF has also been abused to create advanced rootkits capable of evading detection by hiding system artifacts. 
In 2025, the LinkPro rootkit utilized eBPF tracepoint and kretprobe programs to intercept system calls like getdents and sys_bpf, effectively concealing malicious processes, files, and network connections from tools such as ps and bpftool.[120] Similar techniques leverage kprobes to hook kernel functions, altering process listings and syscall returns to mask rootkit activity without modifying kernel code.[121] These rootkits demonstrate eBPF's potential for persistence in privileged environments, particularly on Linux systems exposed via vulnerabilities like CVE-2024-23897 in Jenkins servers.[120] Key risks include the ability for unprivileged users to load eBPF programs in kernels prior to version 5.11, where such loading was permitted by default, potentially allowing local attackers to execute arbitrary kernel code.[122] Map-related risks involve overflows or excessive allocations leading to denial-of-service (DoS) conditions; for instance, attackers can exhaust kernel memory by creating numerous large eBPF maps or locking kernel threads through concurrent program executions.[123] To mitigate these risks, Linux kernel configurations such as CONFIG_BPF_UNPRIV_DEFAULT_OFF can be enabled to disable unprivileged eBPF loading by default, requiring explicit CAP_BPF or CAP_SYS_ADMIN privileges. Seccomp filters can restrict access to the bpf(2) syscall, limiting which programs or parameters untrusted processes can pass to the kernel.[124] Runtime monitoring tools like Falco, which itself leverages eBPF probes, detect anomalous program loadings or hook installations by auditing kernel events in real time.[75] For environments requiring complete disablement, kernel builds without eBPF support or lockdown modes can be used, though no direct boot parameter exists for runtime disabling. 
Best practices emphasize the principle of least privilege, granting eBPF programs only necessary hook types and map access to minimize attack surface.[125] Administrators should maintain regular kernel updates to incorporate verifier fixes and use tools like bpflock for auditing and locking eBPF resource access, preventing unauthorized program attachments.[126] In containerized setups, platforms like Kubernetes can enforce eBPF restrictions via admission controllers to block unverified loads. The eBPF Foundation has supported vulnerability research through 2024 grants totaling $250,000, funding projects such as an independent verifier code review by NCC Group, which identified flaws allowing privileged read/write access to kernel memory, and a comprehensive security threat model outlining attack vectors like map hijacking.[96][127][125] These efforts resulted in recommendations for enhanced runtime defenses and supply chain verification to address emerging threats.[128]

Performance Implications and Best Practices

eBPF programs execute within a constrained virtual machine environment, leveraging just-in-time (JIT) compilation to native code with minimal runtime overhead. Map operations, central to data sharing and state management, achieve O(1) average-case lookup times with hash maps, though performance can degrade from cache misses in contended or large-scale scenarios, adding tens of nanoseconds per access.[129] In networking, the XDP hook delivers substantial efficiency gains, often 10-100 times higher throughput than traditional netfilter-based filtering, enabling line-rate processing of up to 10 million packets per second on commodity hardware when JIT is enabled.[130] These efficiencies come with trade-offs, particularly during program loading and at runtime. Complex eBPF programs whose analysis approaches the verifier's one-million-instruction complexity limit can prolong verification to several seconds, as the verifier exhaustively simulates execution paths to ensure safety.[131] Analyses indicate that attaching programs to hooks like XDP or TC can introduce additional latency under peak loads, primarily from map contention and helper function invocations, though this remains far superior to user-space alternatives. Overall, eBPF's in-kernel execution minimizes context switches, yielding sub-microsecond latencies for simple traces, but scales best when programs avoid excessive computation to prevent CPU saturation in multi-tenant systems.[110] To optimize eBPF implementations, developers should avoid loop structures that inflate verification time and instruction counts, opting instead for bounded iterations or refactoring into multiple programs linked by tail calls for modular, efficient chaining without stack growth.[131] Tail calls allow a chain of up to 33 programs per invocation, promoting code reuse while keeping individual programs lean.
Pinning maps to a filesystem path facilitates reuse across program loads, reducing recreation overhead and enabling persistent state in long-running applications.[132] Profiling tools like bpftool prog profile provide runtime insights into instruction execution and cache hit rates, allowing identification of hotspots such as frequent map updates.[133] Key optimizations include Compile Once – Run Everywhere (CO-RE), which relocates kernel structure offsets at load time via BTF metadata, enhancing portability across kernel versions.[134] For transferring event data to user space, batching events into ring buffers minimizes per-event overhead, supporting high-volume observability with low-latency polling via single-consumer designs.[135] eBPF imposes inherent limitations that influence performance design: the absence of floating-point operations necessitates integer approximations for computations like statistics, potentially requiring more instructions for precision.[136] Stack space is limited to 512 bytes per program, compelling reliance on maps for larger state and discouraging deep call chains; global variables are themselves backed by map-like data sections rather than ordinary kernel memory. To scale across multi-core systems, per-CPU maps distribute contention by maintaining CPU-local instances, enabling lock-free access and linear throughput growth with core count.[137]
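The per-CPU map pattern described above can be sketched as follows (kernel-side, illustrative; assumes libbpf headers and clang's BPF target, with invented names):

```c
// Per-CPU packet counter: each CPU increments its own slot, so no
// atomics or locks are needed; user space sums the slots when reading.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_count SEC(".maps");

SEC("xdp")
int count_packets(struct xdp_md *ctx)
{
    __u32 key = 0;
    __u64 *val = bpf_map_lookup_elem(&pkt_count, &key);

    if (val)
        (*val)++;  /* CPU-local, lock-free increment */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```

When user space looks up a per-CPU map entry, it receives one value per possible CPU and sums them, trading a slightly more expensive read path for contention-free writes on the hot path.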

References
