Perf (Linux)
from Wikipedia
perf
Repository: https://github.com/torvalds/linux/tree/master/tools/perf
Written in: C
Operating system: Linux kernel
Type: Performance monitor and testing
License: GNU GPL
Website: perf.wiki.kernel.org/index.php/Main_Page

perf (sometimes called perf_events[1] or perf tools, originally Performance Counters for Linux, PCL)[2] is a performance analysis tool in Linux, available from Linux kernel version 2.6.31 in 2009.[3] The userspace controlling utility, named perf, is accessed from the command line and provides a number of subcommands; it is capable of statistical profiling of the entire system (both kernel and userland code).

It supports hardware performance counters, tracepoints, software performance counters (e.g. hrtimer), and dynamic probes (for example, kprobes or uprobes).[4] In 2012, two IBM engineers recognized perf (along with OProfile) as one of the two most commonly used performance counter profiling tools on Linux.[5]

Implementation


The interface between the perf utility and the kernel consists of only one syscall and is done via a file descriptor and a mapped memory region.[6] Unlike LTTng or older versions of oprofile, no service daemons are needed, as most functionality is integrated into the kernel. The perf utility dumps raw data from the mapped buffer to disk when the buffer fills up. According to R. Vitillo (LBNL), profiling performed by perf has very low overhead.[6]
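To make the single-syscall design concrete, here is a minimal Python sketch (not perf's own code) that calls perf_event_open through ctypes, counts the software task-clock event for a piece of work, and reads the total back from the returned file descriptor. It assumes Linux on x86_64 or aarch64 (the syscall numbers 298 and 241 are hard-coded for those two only), packs just the original 64-byte PERF_ATTR_SIZE_VER0 prefix of perf_event_attr, and excludes kernel time so that the default perf_event_paranoid level should permit it. It returns None whenever the kernel, a seccomp filter, or the paranoid setting refuses the call.

```python
import ctypes
import fcntl
import os
import platform
import struct

# Values from <linux/perf_event.h>.
PERF_TYPE_SOFTWARE = 1
PERF_COUNT_SW_TASK_CLOCK = 1      # CPU time of the task, in nanoseconds
PERF_ATTR_SIZE_VER0 = 64          # size of the original perf_event_attr layout
ATTR_DISABLED = 1 << 0            # create the event stopped; enable it via ioctl
ATTR_EXCLUDE_KERNEL = 1 << 5
ATTR_EXCLUDE_HV = 1 << 6
PERF_EVENT_IOC_ENABLE = 0x2400    # _IO('$', 0)
PERF_EVENT_IOC_DISABLE = 0x2401   # _IO('$', 1)
# Assumption: syscall numbers for two common architectures only.
SYS_PERF_EVENT_OPEN = {"x86_64": 298, "aarch64": 241}


def pack_attr(ev_type, config, flags):
    """Pack the 64-byte PERF_ATTR_SIZE_VER0 prefix of struct perf_event_attr."""
    return struct.pack(
        "=IIQQQQQIIQ",
        ev_type, PERF_ATTR_SIZE_VER0, config,
        0,     # sample_period = 0: count only, no sampling
        0,     # sample_type
        0,     # read_format = 0: read() returns a single u64
        flags,
        0, 0,  # wakeup_events, bp_type
        0,     # config1
    )


def count_task_clock(work):
    """Task-clock nanoseconds spent in work(), or None if perf_events is unavailable."""
    nr = SYS_PERF_EVENT_OPEN.get(platform.machine())
    if nr is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    attr = ctypes.create_string_buffer(pack_attr(
        PERF_TYPE_SOFTWARE, PERF_COUNT_SW_TASK_CLOCK,
        ATTR_DISABLED | ATTR_EXCLUDE_KERNEL | ATTR_EXCLUDE_HV))
    # pid=0: calling thread; cpu=-1: any CPU; group_fd=-1: new group leader; flags=0.
    fd = libc.syscall(ctypes.c_long(nr), attr, ctypes.c_long(0),
                      ctypes.c_long(-1), ctypes.c_long(-1), ctypes.c_long(0))
    if fd < 0:
        return None  # e.g. refused by perf_event_paranoid or a seccomp filter
    try:
        fcntl.ioctl(fd, PERF_EVENT_IOC_ENABLE, 0)
        work()
        fcntl.ioctl(fd, PERF_EVENT_IOC_DISABLE, 0)
        return struct.unpack("=Q", os.read(fd, 8))[0]
    finally:
        os.close(fd)
```

For example, count_task_clock(lambda: sum(range(10**6))) should return the CPU time of that loop in nanoseconds where perf_events is usable. A software event was chosen because many virtual machines expose no hardware PMU.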

As of 2010, architectures that provide support for hardware counters include x86, PowerPC64, UltraSPARC (III and IV), ARM (v5, v6, v7, Cortex-A8 and -A9), Alpha EV56 and SuperH.[4] Usage of Last Branch Records,[7] a branch tracing implementation available in Intel CPUs since Pentium 4, is available as a patch.[6] Since version 3.14 of the Linux kernel mainline, released on 31 March 2014, perf also supports running average power limit (RAPL) for power consumption measurements, which is available as a feature of certain Intel CPUs.[8][9][10]

Perf is natively supported in many popular Linux distributions, including Red Hat Enterprise Linux (since its version 6 released in 2010)[11] and Debian in the linux-tools-common package (since Debian 6.0 (Squeeze) released in 2011).[12]

Subcommands


perf is used with several subcommands:

  • stat: measure the total event count for a single program, or for the whole system over a period of time
  • top: top-like dynamic view of the hottest functions
  • record: measure and save sampling data for a single program[13]
  • report: analyze the file generated by perf record; can generate a flat or graph profile.[13]
  • annotate: annotate sources or assembly
  • sched: tracing/measuring of scheduler actions and latencies[14]
  • list: list available events

Criticism


The documentation of perf is not very detailed (as of 2014); for example, it does not document most events or explain their aliases (often external tools are used to get names and codes of events[15]).[16] Perf tools also cannot profile based on true wall-clock time,[16] something that has been addressed by the addition of off-CPU profiling.

Security


The perf subsystem of Linux kernels from 2.6.37 up to 3.8.8 and RHEL6 kernel 2.6.32 contained a security vulnerability (CVE-2013-2094), which was exploited to gain root privileges by a local user.[17][18] The problem was due to an incorrect type being used (32-bit int instead of 64-bit) in the event_id verification code path.[19]
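The class of bug described above can be illustrated with a toy Python model (not the kernel's code): a 64-bit value supplied by userspace is stored in a signed 32-bit variable before a bounds check, so a huge value wraps to a negative number and slips past the check. PERF_COUNT_SW_MAX is given a hypothetical value here, and ctypes is used only to reproduce C's silently truncating assignment.

```python
import ctypes

# Hypothetical value for illustration; the real enum size varies by kernel version.
PERF_COUNT_SW_MAX = 9


def check_event_id_flawed(config):
    """Toy model of the bug: the u64 attr.config lands in a signed 32-bit int."""
    event_id = ctypes.c_int32(config).value  # truncates silently, like the C assignment
    if event_id >= PERF_COUNT_SW_MAX:
        return None  # rejected
    return event_id  # accepted, later used as an array index


def check_event_id_fixed(config):
    """The same check with an unsigned 64-bit type: large values stay large."""
    event_id = ctypes.c_uint64(config).value
    if event_id >= PERF_COUNT_SW_MAX:
        return None
    return event_id
```

In this model check_event_id_flawed(0xFFFFFFFF) accepts the value as -1, which in C would become an out-of-bounds array index, while the 64-bit version rejects it.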


References

from Grokipedia
Perf is a comprehensive performance analysis framework and toolset integrated into the Linux kernel, enabling the monitoring, recording, and analysis of hardware and software events such as CPU cycles, cache misses, and tracepoints to diagnose system bottlenecks and optimize workloads. Developed primarily by Ingo Molnar with contributions from Peter Zijlstra, Paul Mackerras, and others, it was introduced in Linux kernel version 2.6.31 in September 2009 as the "Performance Counters for Linux" subsystem, unifying previously fragmented tools for hardware performance monitoring units (PMUs) across architectures like x86 and PowerPC. The core perf command serves as a modular interface, supporting subcommands such as perf stat for event counting, perf record for sampling profiles into data files, perf report for visualizing results, and perf top for real-time monitoring, all leveraging the kernel's perf_events API for low-overhead data collection. Over time, perf has evolved to include advanced features like scripting support in Python and Perl, integration with eBPF for custom probes, and architecture-specific extensions, making it a standard tool for kernel developers, system administrators, and performance engineers. The kernel side is enabled through the CONFIG_PERF_EVENTS configuration option, and access to sensitive hardware counters requires appropriate privileges (e.g., root, the CAP_PERFMON capability, or membership in an administrator-configured perf_users group) to prevent information leaks.

Overview

Definition and Purpose

Perf is a command-line performance analysis tool integrated into the Linux kernel, enabling the profiling of CPU, memory, I/O, and other system events through hardware performance monitoring units (PMUs), software counters, tracepoints, and dynamic tracing mechanisms such as kprobes and uprobes. It provides a unified interface to the perf_events kernel subsystem, allowing users to collect and analyze data on hardware-level events like instructions executed, cache misses, and branch mispredictions, as well as software-level events that describe broader system behavior. The primary purposes of perf include identifying performance bottlenecks in applications and the kernel, measuring overall system and workload efficiency, and facilitating kernel-level observability to debug and optimize resource utilization. Because it supports sampling-based profiling, it reveals event frequencies and hotspots without invasive instrumentation, helping developers and system administrators improve software performance across diverse workloads. Originating from the Performance Counters for Linux (PCL) project, perf has evolved from a basic framework for hardware counter access into a comprehensive suite that encompasses both sampling and tracing, with ongoing development within the Linux kernel source tree. Its key benefits include a low-overhead sampling approach that minimizes perturbation of the system being analyzed, support for multiple architectures such as x86, ARM, and PowerPC, and extensibility through dynamic probes and scripting that allow analysis to be customized for specific needs.

Historical Development

The perf_events subsystem originated in 2009 as a unified interface for accessing hardware performance counters in the Linux kernel, developed primarily by Thomas Gleixner and Ingo Molnar in response to earlier proposals like perfmon2. This effort addressed the fragmentation of performance monitoring tools by providing a standardized kernel API that supported sampling, counting, and multiplexing of events across diverse architectures. The subsystem was merged into the mainline Linux kernel in version 2.6.31, released in September 2009, which also marked the debut of the associated userspace tool, perf. At that point the project was known as Performance Counters for Linux (PCL); it was subsequently renamed perf_events to reflect its expanded scope beyond counters to a broader performance events framework. The rename took place in September 2009, shortly after the 2.6.31 release, shipped with 2.6.32, and emphasized compatibility with the existing kernel tracing infrastructure. Key early contributions from core kernel developers, including patches for event handling and syscall integration, laid the foundation for subsequent enhancements. Major enhancements followed rapidly: dynamic Performance Monitoring Unit (PMU) support was added in December 2010 through kernel commit 2e80a82a, enabling runtime registration of PMU types for greater flexibility across hardware vendors. Dynamic tracing groundwork such as uprobes, merged in kernel 3.5 (2012), paved the way for programmable event processing with the Berkeley Packet Filter (BPF). By 2015, extended BPF (eBPF) allowed custom tracing programs to output data directly to perf events, substantially expanding kernel observability by enabling safe, efficient user-defined probes and summaries without modifying kernel code.
In the 2020s, development shifted toward scalability for cloud environments and multi-core systems, with optimizations for high-core-count processors and distributed tracing to handle the demands of modern datacenters. A prominent kernel tracing expert contributed extensively to these areas through tools like bpftrace and extensions integrating perf with eBPF for advanced observability. In late 2024, Linux 6.12 included perf improvements for emerging hardware, such as enhanced PMU support for Intel Lunar Lake and Arrow Lake processors, enabling better profiling of compute-intensive workloads. Development has continued into 2025, with kernel 6.17 (released September 2025) incorporating further refinements to perf_events for hardware support and observability.

Architecture

Kernel Infrastructure

The perf_events subsystem forms the core kernel framework in Linux for accessing hardware performance monitoring units (PMUs) and software events, enabling the collection of performance data through a unified interface. Introduced in version 2.6.31, it abstracts the differences between CPU architectures and PMU implementations, allowing tools to monitor events like CPU cycles, cache misses, and system calls without hardware-specific programming. The subsystem manages event allocation, scheduling, and data delivery across x86, ARM, and other architectures. The primary kernel interface is the perf_event_open(2) system call, which returns a file descriptor for an event and takes the parameters pid (the process to monitor: 0 for the calling process, or -1 together with a specific cpu to monitor all processes on that CPU), cpu (the CPU to monitor, or -1 for any CPU; pid=-1 combined with cpu=-1 is invalid), and group_fd (the group leader's file descriptor, or -1 to create a new group leader). The event itself is described by the perf_event_attr structure, whose type field selects the event class (e.g., PERF_TYPE_HARDWARE for PMU events or PERF_TYPE_SOFTWARE for kernel software events) and whose config field selects the specific event (e.g., PERF_COUNT_HW_CPU_CYCLES for CPU cycle counting). The structure also specifies sampling periods, frequency modes, and inheritance options, and its capabilities have grown across kernel versions (e.g., dynamic PMU support added in 2.6.38). Support for hardware counters is provided through PMU drivers, which expose vendor-specific features such as Intel's Precise Event-Based Sampling (PEBS) for low-overhead, precise instruction-level profiling and AMD's Instruction-Based Sampling (IBS) for detailed fetch and op execution analysis. Software events, in contrast, are kernel-generated counters like page faults (PERF_COUNT_SW_PAGE_FAULTS) and context switches (PERF_COUNT_SW_CONTEXT_SWITCHES), offering insights into system behavior without relying on hardware. 
Both types integrate seamlessly, with hardware events leveraging PMU capabilities for high-precision measurement and software events providing aggregated kernel statistics. Data capture occurs via a ring buffer that userspace maps with mmap(2), giving low-latency access to sampled events; the metadata page (struct perf_event_mmap_page) holds the data_head and data_tail pointers used for producer-consumer synchronization. This design minimizes overhead by allowing asynchronous reads and overflow handling through signals or polling, while multiplexing rotates events onto the hardware counters when there are more events than counters (the time_enabled and time_running values are then used to normalize counts). Scalability is enhanced by per-CPU buffers, where events can be bound to specific CPUs (cpu >= 0) for system-wide collection on multi-core systems, reducing contention and enabling parallel data gathering across processors. Group events further support this by allowing multiple related counters (e.g., cycles and instructions) to be scheduled atomically as a unit, ensuring correlated sampling and synchronized enabling/disabling via ioctls like PERF_EVENT_IOC_ENABLE, which is critical for accurate ratio computations in performance analysis.
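When multiplexing is in effect, a counter's raw value covers only the fraction of time it actually held a hardware counter. The Python sketch below assumes a single (non-group) event opened with read_format set to PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING; it decodes the three u64 values that read(2) returns in that case and applies the usual linear scaling, value × time_enabled / time_running. The scaling assumes the event rate is uniform over the run, so the result is an estimate rather than an exact count.

```python
import struct

# read_format bits from <linux/perf_event.h>.
PERF_FORMAT_TOTAL_TIME_ENABLED = 1 << 0
PERF_FORMAT_TOTAL_TIME_RUNNING = 1 << 1


def decode_read(buf):
    """Unpack (value, time_enabled, time_running) as returned by read(2)."""
    return struct.unpack("=QQQ", buf[:24])


def scaled_count(buf):
    """Extrapolate a multiplexed count to the full enabled time."""
    value, time_enabled, time_running = decode_read(buf)
    if time_running == 0:
        return 0.0  # the event never got onto a hardware counter
    return value * time_enabled / time_running
```

A counter that was scheduled for half of its enabled time and saw 1,000,000 events is extrapolated to 2,000,000.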

Userspace Components

The primary userspace component of perf is the perf binary, which serves as the command-line interface for performance analysis and monitoring. The executable is built from the tools/perf directory within the Linux kernel source tree and interacts with kernel performance events through syscalls like perf_event_open. In most Linux distributions, perf is distributed as part of the linux-tools package family; for instance, on Debian-based systems such as Ubuntu, it can be installed via apt install linux-tools-common linux-tools-generic linux-tools-$(uname -r), ensuring compatibility with the running kernel. The version of perf is typically aligned with the corresponding kernel version (for example, perf 6.12 accompanies Linux 6.12) to maintain feature parity. perf's userspace functionality is supported by several key libraries that handle specialized tasks such as event access, trace parsing, and symbol resolution. The libperf library, located in tools/lib/perf, provides a C API for accessing the kernel's perf events subsystem, including functions for opening events (perf_evsel__open), reading counter values (perf_evsel__read), and memory-mapping buffers (perf_evsel__mmap), abstracting low-level syscall details for developers building custom tools. libtraceevent, a separately maintained library under the kernel's libtrace umbrella, parses kernel trace event formats, enabling perf to decode raw trace data into human-readable output during analysis. Additionally, libdw from the elfutils package enables DWARF-based unwinding of call stacks, allowing perf to resolve symbols and generate accurate stack traces from sampled data without requiring frame pointers in binaries. Building perf from source requires specific dependencies to compile its userspace components fully. 
Kernel headers must be installed to access performance event definitions and syscall interfaces, while the elfutils development package (providing libelf for ELF file handling and libdw for debugging support) is required for features like symbol resolution and unwinding. Optional dependencies include Python development headers for scripting support, such as custom event-processing scripts. The build process involves changing to the tools/perf directory in the kernel source and running make, which configures and compiles the binary along with embedded libraries like libperf. perf's design emphasizes extensibility in userspace, allowing customization beyond its core features. Plugins can be developed to support alternative output formats, such as integrating with visualization tools or exporting data in proprietary schemas, by leveraging the plugin API in the perf build system. Furthermore, the perf script subcommand lets users process recorded event streams with custom Python or Perl scripts for tailored analysis, such as filtering events or generating reports. This modular approach, combined with the libraries' APIs, supports integration into larger profiling workflows while maintaining a lightweight footprint.
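Because distribution packages tie perf to a kernel release, a mismatch between the installed tool and the running kernel is a common source of confusion. The Python sketch below compares the major.minor version in the output of perf --version with the release reported by os.uname(); treating a matching major.minor as "aligned", and the assumed "perf version X.Y.Z" output format, are choices made for this sketch rather than a documented perf rule.

```python
import os
import re
import subprocess


def major_minor(text):
    """Extract the first "X.Y" version number from a string, e.g. (6, 12)."""
    m = re.search(r"(\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def perf_matches_kernel(perf_version_text, kernel_release):
    """True when perf and the kernel share the same major.minor version."""
    pv = major_minor(perf_version_text)
    return pv is not None and pv == major_minor(kernel_release)


def check_installed_perf():
    """True/False for the local system, or None if perf cannot be run."""
    try:
        out = subprocess.run(["perf", "--version"], capture_output=True,
                             text=True, check=False).stdout
    except OSError:
        return None
    return perf_matches_kernel(out, os.uname().release)
```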

Core Functionality

Event Monitoring and Sampling

Perf supports a variety of event types for monitoring system and application behavior. Hardware events, accessed via the PERF_TYPE_HARDWARE type, capture low-level CPU metrics such as cycles executed, instructions retired, cache references and misses, branch instructions and misses, bus cycles, and stalled cycles. These events use on-chip performance monitoring units (PMUs) to provide direct hardware counts without significant software intervention. Software events, defined under PERF_TYPE_SOFTWARE, track kernel-level occurrences including CPU clock ticks, task clock time, page faults (major and minor), context switches, and CPU migrations. Tracepoint events, using PERF_TYPE_TRACEPOINT, interface with static kernel probes to observe specific kernel subsystems, such as system calls (e.g., entry and exit points for calls like execve), scheduling decisions, and block I/O operations, by referencing IDs from the debugfs tracing/events hierarchy. Additionally, hardware breakpoints, enabled through PERF_TYPE_BREAKPOINT since Linux 2.6.33, allow monitoring of read/write accesses or instruction execution at specific addresses using CPU debug registers. Sampling in perf operates in modes that balance precision, overhead, and granularity. Periodic sampling collects samples at fixed intervals, either every N events (via sample_period) or at a target frequency (via sample_freq with the freq flag enabled), in which case the kernel adjusts the period dynamically to approximate the desired rate. In contrast, precise sampling minimizes "skid" (the displacement between the instruction that caused the event and the recorded instruction pointer) using hardware features like Intel's Precise Event-Based Sampling (PEBS). PEBS, available on Intel x86 processors when precise_ip is set to 1, 2, or 3, records the instruction pointer and register state when the sampled instruction retires, reducing uncertainty in attribution compared to standard interrupt-based sampling (precise_ip=0). 
The precise_ip level specifies the requested precision: 1 requires constant skid, 2 requests zero skid, and 3 requires zero skid (the event cannot be opened otherwise). When the performance counter overflows, reaching the configured sample_period or the frequency-derived threshold, the PMU raises an interrupt and the kernel records the sample. The sample data (such as timestamp, PID, CPU, and event value) is written to an mmap(2)-ed buffer, and userspace is notified via poll(2), select(2), or signals. For deeper analysis, call stack capture can be enabled with the PERF_SAMPLE_CALLCHAIN bit in sample_type, recording the user or kernel stack backtrace up to a depth limited by /proc/sys/kernel/perf_event_max_stack (default 127 frames since Linux 4.8). Stack unwinding during overflow processing incurs additional CPU cost, particularly for user-space frames requiring frame pointer or DWARF-based reconstruction. Aggregation in perf allows tailoring the monitoring scope to specific contexts. CPU-wide aggregation (pid=-1, cpu>=0) captures events across all processes on a designated CPU, requiring elevated privileges like CAP_PERFMON or CAP_SYS_ADMIN. Process-specific monitoring targets a single process (pid>0, with cpu=-1 for any CPU or cpu>=0 for a specific one), while thread-level granularity follows individual threads by their task IDs. This enables focused profiling without system-wide noise, though multiplexing may occur if hardware counters are exhausted. In period mode, the number of samples equals the total number of events observed divided by the sampling period (total_events / sampling_period). For overhead estimation, a rough approximation multiplies the sampling frequency, the stack depth, and the per-frame unwind cost in cycles, overhead ≈ sampling_freq × stack_depth × unwind_cost, giving the cycles per second spent unwinding; this highlights how higher sampling rates and deeper traces amplify intrusion.
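The two rules of thumb above reduce to simple arithmetic. The Python sketch below uses hypothetical inputs (a 3 GHz core and an assumed unwind cost of 50 cycles per frame, not measured values) to show how the sample count follows from the period and how the unwind-overhead approximation becomes a fraction of CPU time.

```python
def expected_samples(total_events, sample_period):
    """Period mode: one sample every `sample_period` events."""
    return total_events // sample_period


def unwind_overhead_fraction(sample_freq_hz, stack_depth, unwind_cycles_per_frame, cpu_hz):
    """overhead ~= sampling_freq * stack_depth * unwind_cost, as a share of CPU time."""
    cycles_per_second = sample_freq_hz * stack_depth * unwind_cycles_per_frame
    return cycles_per_second / cpu_hz


# Hypothetical workload: 2e9 cycles sampled every 100,000 cycles.
samples = expected_samples(2_000_000_000, 100_000)
# 4 kHz sampling, 64-frame stacks, assumed 50 cycles per frame on a 3 GHz core.
overhead = unwind_overhead_fraction(4_000, 64, 50, 3e9)
print(f"{samples} samples, ~{overhead:.2%} of CPU time spent unwinding")
# prints: 20000 samples, ~0.43% of CPU time spent unwinding
```

Because the estimate is linear in each factor, halving the sampling frequency or the stack depth halves the predicted unwind overhead.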

Key Subcommands

perf provides a suite of subcommands for performance analysis, each tailored to specific aspects of event monitoring and data handling. These tools use the underlying perf_events kernel interface to capture and interpret hardware and software events efficiently. perf stat counts performance events over the duration of a workload, delivering aggregate statistics such as instructions executed, cycles, and derived metrics like instructions per cycle (IPC), which measures CPU efficiency. It supports system-wide or process-specific collection with options for event selection and detailed breakdowns, enabling quick assessment of basic performance characteristics without generating large data files. perf record samples performance data into a file for offline analysis, capturing events like CPU cycles or cache misses at specified frequencies. A key option, -g (or --call-graph, which also selects the unwinding method), enables recording of call graphs to trace function call stacks in both kernel and user space, facilitating deeper profiling of code paths. This subcommand is essential for workloads requiring post-execution examination. perf report serves as an interactive viewer for data recorded by perf record, presenting hierarchical profiles sorted by overhead and allowing navigation through call graphs. Users can filter by symbols or apply thresholds like --percent-limit to focus on entries exceeding a specified overhead percentage, aiding in the identification of bottlenecks. It also supports sorting by various keys through the --sort option. perf list enumerates all available performance events and performance monitoring units (PMUs), including hardware, software, cache, and tracepoint types. It displays symbolic names, raw encodings, and PMU-specific details, helping users select appropriate events with modifiers like precise sampling levels. This subcommand is crucial for discovering configurable monitoring capabilities on a given system. 
perf top offers real-time monitoring akin to the top utility, continuously sampling and displaying profiles of hot functions ordered by overhead. It updates dynamically to show current system or process activity, with options for event selection and PID targeting, providing immediate insights into performance hotspots without file I/O. Among other essential subcommands, perf script dumps the trace data from perf.data files written by perf record for custom processing or scripting, with options like --dump-raw-trace to output verbose event details for further tools. perf mem specializes in memory access profiling, recording and reporting load/store operations with support for latency analysis on supported platforms, using options such as -t to select load or store operations. These extend perf's versatility for targeted investigations.
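perf stat's -x option switches to separator-delimited output that is easy to post-process. The Python sketch below parses that output and derives IPC from the instructions and cycles counts. It assumes the simple layout produced without -I, -r, or per-CPU/per-socket aggregation, where each line begins with the counter value, its unit, and the event name; hybrid CPUs may report names such as cpu_core/cycles/, which this sketch does not normalize.

```python
import subprocess


def parse_perf_stat_csv(text):
    """Map event name -> count from `perf stat -x,` output (simple layout only)."""
    counts = {}
    for line in text.splitlines():
        fields = line.split(",")
        if len(fields) < 3 or line.startswith("#"):
            continue  # blank lines and comments
        value, event = fields[0], fields[2]
        if not value or value.startswith("<"):
            continue  # "<not counted>" / "<not supported>"
        counts[event.split(":")[0]] = float(value)  # "cycles:u" -> "cycles"
    return counts


def ipc(counts):
    """Instructions per cycle from a parsed perf stat result."""
    return counts["instructions"] / counts["cycles"]


def perf_stat(cmd):
    """Run `perf stat` on cmd; perf writes its counts to stderr."""
    proc = subprocess.run(
        ["perf", "stat", "-x,", "-e", "cycles,instructions", "--", *cmd],
        capture_output=True, text=True, check=False)
    return parse_perf_stat_csv(proc.stderr)
```

On a machine with hardware counters available, ipc(perf_stat(["ls"])) would give the instructions-per-cycle ratio for that command; if the workload itself writes comma-separated lines to stderr, they would need filtering first.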

Advanced Usage

Profiling Techniques

Perf provides several profiling techniques to identify performance bottlenecks in applications and the system, leveraging its sampling and tracing capabilities for detailed analysis. One common approach is CPU profiling, which captures instruction-level hotspots by sampling hardware events such as CPU cycles. To perform this, users execute perf record -e cycles -g to record samples on cycle events with call-graph information (enabled by the -g flag), producing a perf.data file that includes stack traces for hotspots. Subsequent analysis with perf report displays the data hierarchically; the output is often processed by external tools into flame graphs that illustrate stack depths and sample frequencies, highlighting the functions consuming the most cycles. This method is particularly effective for pinpointing compute-intensive code paths in user-space applications or kernel routines. For memory analysis, perf employs specialized sampling to examine access patterns and latencies. The perf mem record command captures load and store events, recording details like memory addresses and latencies to identify cache misses and bandwidth issues. Analysis via perf mem report aggregates this data, showing distributions of access types (e.g., L1/L2 cache hits, DRAM accesses) and the functions responsible for high-latency operations, such as those causing frequent last-level cache misses. This technique helps diagnose memory-bound workloads by quantifying stall cycles due to data movement, guiding optimizations like data locality improvements. I/O tracing in perf focuses on block device interactions to uncover disk bottlenecks. By recording tracepoints such as block:* (e.g., perf record -e 'block:*'), users capture events like request issues, completions, and latencies for read/write operations. 
The resulting traces, viewed with perf script or perf report, reveal per-request details including process IDs, byte counts, and queue depths, enabling identification of I/O-intensive processes or suboptimal access patterns like random seeks instead of sequential reads. Distinguishing kernel and user-space contributions requires accurate stack unwinding, often achieved with the --call-graph dwarf option in perf record. This uses DWARF debugging information to reconstruct full call stacks across the user/kernel boundary, capturing transitions like syscalls without relying on frame pointers, which may be absent in optimized builds. It ensures profiles show complete paths, such as user-space functions invoking kernel I/O routines, providing context for mixed-mode performance issues. Best practices for effective profiling begin with perf stat for quick, non-intrusive metrics like total cycles or cache miss rates over short runs, establishing baselines before deeper sampling. For comprehensive investigations, move on to perf record with targeted events, adjusting the sampling rate (e.g., with -F to set the sampling frequency) to balance overhead and resolution. To mitigate noisy profiles from system variability, conduct multiple runs and aggregate the results, using statistical methods to compute averages and confidence intervals for stable hotspot identification. Always profile under representative workloads to ensure relevance, starting broad and narrowing to specific events as insights emerge.
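The advice to aggregate repeated runs can be sketched in Python as follows: given one measurement per run (for example, cycle counts from separate perf stat invocations; perf stat -r can also repeat a run itself), it reports the mean and an approximate 95% confidence interval. The normal approximation (z = 1.96) is an assumption of this sketch; with only a handful of runs, a Student's t critical value gives a wider and more honest interval.

```python
import math
import statistics


def mean_with_ci(samples, z=1.96):
    """Mean and approximate confidence interval (normal approximation)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs")
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, (mean - half_width, mean + half_width)


# Hypothetical cycle counts (in millions) from five runs of the same workload.
runs = [10, 12, 11, 13, 9]
mean, (low, high) = mean_with_ci(runs)
print(f"mean {mean:.1f}M cycles, 95% CI [{low:.2f}, {high:.2f}]")
```

If the intervals of two profiles overlap heavily, the difference between them is probably noise rather than a real regression.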

Integration with Other Tools

Perf works alongside eBPF through the BPF Compiler Collection (BCC), a toolkit that enables the creation of custom eBPF probes and scripts for kernel and user-space tracing. The perf trace subcommand can capture system-wide events, which BCC tools extend by attaching eBPF programs to kernel tracepoints, kprobes, or uprobes for dynamic instrumentation without kernel recompilation. For instance, BCC's Python-based tools, such as execsnoop or biolatency, use eBPF to script complex traces that build on perf's event infrastructure, allowing users to filter and aggregate data in-kernel for low-overhead tracing. For visualization, perf exports sampled stack traces via perf script, which can be processed into formats compatible with external tools like Flame Graphs. The process involves collapsing stacks with scripts from the FlameGraph repository (e.g., perf script | ./stackcollapse-perf.pl > out.folded followed by ./flamegraph.pl out.folded > profile.svg) to generate interactive SVGs highlighting CPU hotspots. Similarly, perf script output, including fields such as timestamps and symbols, can be imported into speedscope, a web-based viewer for interactive flame graph analysis of perf data. In debugging scenarios, perf interfaces with GDB by generating symbolic traces via perf script --fields sym,cpu, which provide stack frames, program counters, and CPU details for post-mortem analysis in GDB sessions. This allows correlating perf's sampled events with GDB's disassembly and variable inspection for deeper code-level insights. For kernel tracing, perf integrates with ftrace through the perf ftrace subcommand, a wrapper that reads from /sys/kernel/debug/tracing/trace_pipe to capture function graphs, latencies, and profiles while supporting filters for targeted events. 
Perf supports containerized environments like Docker when the container is granted elevated capabilities such as --cap-add SYS_ADMIN (or --cap-add PERFMON on newer kernels) to access performance monitoring units (PMUs) and tracepoints inside isolated namespaces. This enables low-level profiling of container workloads without full --privileged mode, complementing higher-level monitoring tools like cAdvisor, which aggregates cgroup-based metrics (e.g., CPU and memory usage) for Prometheus export, allowing combined views of container performance from coarse-grained resource stats to fine-grained perf traces. As of 2025, perf also fits alongside Rust-based eBPF tooling such as the aya-rs framework, which compiles Rust code to eBPF bytecode and loads it without depending on libbpf. This allows developers to author custom eBPF programs in Rust that attach to perf events, with memory-safe userspace tooling.
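The stack-collapsing step of the flame graph workflow described above can be illustrated with a simplified Python stand-in for stackcollapse-perf.pl (not a port of it): it reads perf script output recorded with call graphs, reverses each sample's frames so the root comes first, prefixes the command name, and counts identical stacks. It assumes the default perf script layout (a header line per sample, then tab-indented "address symbol+offset (dso)" frame lines, then a blank separator line) and command names without spaces; the real script handles many more cases.

```python
from collections import Counter


def fold_perf_script(text):
    """Collapse `perf script` call-graph output into {"comm;root;...;leaf": count}."""
    folded = Counter()
    comm, frames = None, []
    for line in text.splitlines() + [""]:  # the extra "" flushes the last sample
        if not line.strip():
            if comm is not None:
                # perf script lists the leaf frame first; flame graphs want root first.
                folded[";".join([comm] + frames[::-1])] += 1
            comm, frames = None, []
        elif not line[0].isspace():
            comm = line.split()[0]  # header: "comm pid [cpu] time: period event:"
        else:
            _addr, _, rest = line.strip().partition(" ")
            symbol = rest.rsplit(" (", 1)[0]  # drop the trailing "(dso)"
            frames.append(symbol.split("+0x")[0])  # drop the "+0x1f" offset
    return folded


def to_folded_lines(folded):
    """Render the "stack count" lines that flamegraph.pl reads."""
    return [f"{stack} {count}" for stack, count in sorted(folded.items())]
```

Feeding it the text of perf script and writing to_folded_lines(...) to out.folded would give input in the shape flamegraph.pl expects.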

Concerns and Limitations

Performance Overhead and Criticisms

The primary sources of performance overhead in perf are sampling interrupts, which can consume 1-5% of CPU cycles in typical configurations, escalating to 14% or higher with multiple event instances. Ring buffer copies also contribute by transferring sampled data from kernel to userspace, adding latency during high-volume event capture, while stack unwinding, particularly for deep call stacks using frame pointers or DWARF debugging information, introduces additional costs. These overheads arise during event monitoring, where interrupt handling and data processing interrupt the normal execution flow, as seen in the event sampling mechanics. Criticisms of perf often highlight its steep learning curve, attributed to the extensive array of options and subcommands that require familiarity with kernel internals and hardware specifics to use effectively. Cross-architecture support remains incomplete, with notably weaker performance monitoring capabilities on RISC-V platforms until improvements in 2024 via the SBI PMU and Sscofpmf extensions. Additionally, sampling can yield misleading results for short workloads, which may collect too few samples to capture a representative event distribution even at a fixed frequency. To mitigate these overheads, users are advised to prefer hardware performance monitoring unit (PMU) events over software approximations, which reduces kernel intervention and its associated costs. Limiting the sampling frequency to 1-2 kHz, below perf record's default of 4 kHz, balances detail against overhead by lowering interrupt rates. Community discussions reveal debates over perceived bloat from perf's growing number of subcommands, which expand functionality but complicate the toolset for casual users. There have been calls for improved defaults, with kernel 6.10 introducing enhancements like better event subsystem features to streamline usage without custom tuning. As of kernel 6.12 (released in late 2024), PMU support enhancements continue across architectures.

Security Implications

The security model of perf in Linux is designed to mitigate risks associated with performance monitoring, which can expose sensitive system information. Access to perf_events is primarily controlled through kernel capabilities and the perf_event_paranoid parameter, which governs unprivileged user access. Processes with the CAP_SYS_ADMIN capability can bypass all restrictions, enabling full system-wide monitoring, though this is considered overly permissive for security-conscious environments. Alternatively, the CAP_PERFMON capability, introduced in Linux 5.8, provides a more targeted privilege for performance monitoring without the broader scope of CAP_SYS_ADMIN. The perf_event_paranoid tunable offers four levels (-1 to 2) to restrict access: -1 imposes no limits; 0 allows system-wide monitoring but not raw tracepoints; 1 limits users to per-process events, including kernel profiling; and 2 (the default) restricts them to per-process user-space events only. These mechanisms ensure that unprivileged users cannot monitor arbitrary processes or access kernel internals without explicit authorization. Key risks in using perf stem from its ability to observe hardware performance counters and tracepoints, which can enable side-channel attacks leaking sensitive data such as memory addresses, execution contexts, or process behaviors. For instance, unauthorized monitoring of other processes could reveal timing information exploitable for inferring cryptographic keys or private data, while improper configuration might allow information leaks through performance counter side effects. To prevent unauthorized process monitoring, perf enforces ptrace-like scoping, where access is limited to processes under the same user ID or those attachable under ptrace rules, preventing cross-user surveillance without elevated privileges. 
Since Linux 5.8, group-based access can be set up through a perf_users group, allowing non-root users to perform monitoring by assigning CAP_PERFMON (and related capabilities such as CAP_SYS_PTRACE) to the perf binary via file capabilities. Administrators can create this group with groupadd perf_users, set ownership with chgrp perf_users /usr/bin/perf, restrict permissions with chmod o-rwx /usr/bin/perf, and apply capabilities using setcap "cap_perfmon,cap_sys_ptrace=ep" /usr/bin/perf. This setup enables scoped, non-root usage while maintaining isolation from full administrative access. SELinux adds further control through its perf_event object class, whose permissions (open, cpu, kernel, tracepoint, read, and write) let policy deny unauthorized perf_event_open use. Additionally, integration with eBPF relies on the kernel's BPF verifier for sandboxing: it statically analyzes eBPF programs attached to tracepoints and restricts them to safe operations, mitigating risks from malicious or erroneous tracing code. Best practices for securing perf include setting kernel.perf_event_paranoid=2 in production environments to limit exposure, as this balances usability with protection against broad monitoring. Enabling audit logging for perf_event_open calls via auditd rules (e.g., -a always,exit -F arch=b64 -S perf_event_open -k perf_access) allows tracking and alerting on monitoring attempts, facilitating forensic analysis and policy enforcement.
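The paranoid levels listed above form a simple threshold scheme for unprivileged users, which the Python sketch below encodes. It models only the upstream levels -1 through 2 as described in this section and ignores capabilities such as CAP_PERFMON or CAP_SYS_ADMIN, which bypass these checks; treating any value above 2 (as some distributions use) as at least as strict as 2 is an assumption of the sketch.

```python
PARANOID_PATH = "/proc/sys/kernel/perf_event_paranoid"

# Highest paranoid level at which an unprivileged user may still do each kind of monitoring.
MAX_LEVEL_ALLOWED = {
    "user": 2,             # per-process, user-space only
    "kernel": 1,           # per-process, including kernel profiling
    "cpu": 0,              # system-wide (per-CPU) monitoring
    "raw_tracepoint": -1,  # raw tracepoint access
}


def read_paranoid(path=PARANOID_PATH):
    """Current perf_event_paranoid value, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def unprivileged_allowed(level, kind):
    """Would an unprivileged user be allowed this kind of monitoring at `level`?"""
    return level <= MAX_LEVEL_ALLOWED[kind]
```

For example, at the default level 2, unprivileged_allowed(2, "user") is True while unprivileged_allowed(2, "kernel") is False.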
