CoreMark
from Wikipedia

CoreMark is a benchmark that measures the performance of central processing units (CPUs) used in embedded systems. It was developed in 2009[1] by Shay Gal-On at EEMBC and is intended to become an industry standard, replacing the Dhrystone benchmark.[2] The code is written in C and contains implementations of the following algorithms: list processing (find and sort), matrix manipulation (common matrix operations), state machine (determining whether an input stream contains valid numbers), and CRC (cyclic redundancy check). The code is available under the Apache License 2.0 and is free of cost to use, but ownership is retained by the Consortium, and publication of modified versions under the CoreMark name is prohibited.[3]

Issues addressed by CoreMark


The CRC algorithm serves a dual function; it provides a workload commonly seen in embedded applications and ensures correct operation of the CoreMark benchmark, essentially providing a self-checking mechanism. Specifically, to verify correct operation, a 16-bit CRC is performed on the data contained in elements of the linked list.

To ensure compilers cannot pre-compute the results at compile time every operation in the benchmark derives a value that is not available at compile time. Furthermore, all code used within the timed portion of the benchmark is part of the benchmark itself (no library calls).

CoreMark versus Dhrystone


CoreMark draws on the strengths that made Dhrystone so resilient: it is small, portable, easy to understand, free, and reports a single-number benchmark score. Unlike Dhrystone, CoreMark has specific run and reporting rules and was designed to avoid the well-understood issues that have been cited against Dhrystone.

Major portions of Dhrystone are susceptible to a compiler’s ability to optimize the work away; thus it is more a compiler benchmark than a hardware benchmark. This also makes it very difficult to compare results when different compilers/flags are used.

Library calls are made within the timed portion of Dhrystone, and those calls typically account for the majority of the benchmark's run time. Since the library code is not part of the benchmark, it is difficult to compare results when different libraries are used. Guidelines exist on how to run Dhrystone, but since results are not certified or verified, they are not enforced.[citation needed] There is also no standardization on how Dhrystone results should be reported, with various formats in use (DMIPS, Dhrystones per second, DMIPS/MHz).

CoreMark tests


Source code for, and instructions for running, both the original CoreMark benchmark and the CoreMark-PRO benchmark are available on GitHub.[4][5]

The tests in the original CoreMark benchmark are intended to run on a wide range of processors, from 8-bit microcontrollers to 64-bit microprocessors, and focus on single-threaded integer processor performance, with support for testing parallel threads.[6]

The CoreMark-PRO benchmark is intended to run on a much narrower range of 32-bit to 64-bit processors, and includes larger data sets for stress-testing larger memory subsystems, tests for floating-point performance, and tests for parallel-processing performance.[6]

While both CoreMark and CoreMark-PRO can be run "bare metal", they are also often run on machines with an operating system, such as with the Phoronix Test Suite.[7]

Results


CoreMark results can be found on the CoreMark web site,[8] and on processor data sheets. Results are in the following format:

CoreMark 1.0 : N / C / P / M

  • N – Number of iterations per second (with seeds 0,0,0x66,size=2000)
  • C – Compiler version and flags
  • P – Parameters such as data and code allocation specifics
  • M – Type of parallel algorithm execution (if used) and number of contexts

For example: CoreMark 1.0 : 128 / GCC 4.1.2 -O2 -fprofile-use / Heap in TCRAM / FORK:2

from Grokipedia
CoreMark is a standardized benchmark suite designed to evaluate the core performance of central processing units (CPUs) and microcontrollers (MCUs) in embedded systems by executing a set of representative algorithms and producing a comparable single-number score. Developed by the Embedded Microprocessor Benchmark Consortium (EEMBC), it was introduced in June 2009 as a free, open-source alternative to outdated benchmarks such as Dhrystone, which suffered from synthetic code, heavy compiler-optimization biases, and a lack of standardization in reporting. The motivation behind its creation was to provide a meaningful, real-world metric for CPU efficiency in resource-constrained environments, focusing on common data structures and operations rather than misleading indicators such as clock speed or floating-point operations per second. At its core, CoreMark incorporates three primary algorithms to test diverse aspects of processor functionality: list processing, which involves linked lists with pointer manipulations, sorting, and reversal to assess memory access and cache behavior; matrix manipulation, featuring integer matrix multiplications with constants, vectors, and other matrices to evaluate loop efficiency and instruction-set accelerators; and state machine behavior, using switch and if-else statements to process input strings and count state transitions, probing control flow and branching performance. These algorithms are self-verifying through 16-bit cyclic redundancy checks (CRC) and require only about 2 KB of memory, making the benchmark suitable for devices ranging from 8-bit MCUs to 64-bit processors. To prevent compiler optimizations from skewing results, all data is initialized at runtime, and the timed execution avoids external library calls.
Performance is quantified in CoreMarks, representing iterations completed per second, often normalized as CoreMarks/MHz to account for clock-speed variations, with results requiring disclosure of compiler versions, flags, and hardware details for transparency. Certified scores, verified by EEMBC's lab, ensure adherence to strict rules, and the benchmark has become the de facto standard for embedded CPU comparisons, with extensions such as CoreMark-Pro adding parallelism and floating-point workloads, and integrations in power-efficiency suites such as ULPMark. Widely adopted by vendors, CoreMark facilitates objective evaluations in applications from IoT devices to automotive systems.

Overview

Definition and Purpose

CoreMark is an industry-standard benchmark developed by the Embedded Microprocessor Benchmark Consortium (EEMBC) to evaluate the performance of central processing units (CPUs) and microcontrollers (MCUs) in embedded systems. It serves as a simple, portable tool for measuring core CPU efficiency, independent of system-specific characteristics such as operating system or processor architecture. The primary purpose of CoreMark is to deliver a single, standardized metric that focuses on workloads common in embedded applications, facilitating fair and repeatable comparisons across diverse hardware platforms and compilers. By isolating core performance from external factors such as memory subsystems and I/O operations, it provides a quick indicator of processor capability suitable for early-stage evaluations. Key design principles emphasize hardware performance over compiler optimizations, achieved through realistic mixtures of read/write, integer, and control operations that prevent pre-computable results and misleading synthetic metrics. The benchmark excludes external library calls, relying on compact ANSI C code limited to 16 KB in size, to ensure scores accurately reflect the processor's intrinsic abilities without dependencies on system-specific features. Introduced to overcome limitations in legacy benchmarks, CoreMark simulates practical embedded tasks involving data manipulation and control flow, such as list processing and matrix operations.

Development History

CoreMark was developed in 2009 by Shay Gal-On, then director of software development at the Embedded Microprocessor Benchmark Consortium (EEMBC), to address the limitations of existing embedded processor benchmarks and provide a modern standard for measuring CPU performance in resource-constrained environments. The benchmark emerged as a response to the inadequacies of older metrics such as Dhrystone, which had become outdated for contemporary embedded systems. EEMBC launched CoreMark 1.0 on June 1, 2009, making it the organization's first openly available benchmark, distributed freely under a permissive license to encourage widespread adoption and porting across diverse architectures. The initial release included comprehensive documentation, including webinar resources from September 2009, to facilitate implementation on various embedded platforms while adhering to strict execution rules for consistent scoring. By 2010, CoreMark had surpassed 2,000 downloads, establishing it as a de facto industry standard for embedded performance evaluation. A key milestone occurred in 2012 when EEMBC highlighted its growing impact, with nearly 8,000 users having downloaded the benchmark, underscoring its role in standardizing measurements under EEMBC's oversight. EEMBC integrated CoreMark into its certification processes through a dedicated lab, ensuring verified scores for official submissions and promoting reliability across processor vendors. CoreMark's evolution continued with ongoing updates to enhance portability, including adaptations for new compiler toolchains and architectures while maintaining its core simplicity. In 2015, EEMBC introduced CoreMark-Pro as an advanced variant, incorporating multiple workloads for more comprehensive testing of multi-core and higher-end processors, though the original remained the baseline for basic embedded benchmarking.
In October 2023, EEMBC merged with the Standard Performance Evaluation Corporation (SPEC) to form SPEC's Embedded Group, which continues to maintain CoreMark, certify scores, and support its periodic refinement for evolving embedded ecosystems, solidifying its position as a standardized tool for performance assessment.

Methodology

Algorithms Used

CoreMark employs three distinct algorithms designed to evaluate key aspects of embedded processor performance, focusing exclusively on integer operations without system or input/output dependencies. These algorithms (list processing, matrix manipulation, and a state machine) use fixed-size datasets to ensure portability across different hardware architectures and compilers, with a 16-bit cyclic redundancy check (CRC) integrated for self-verification of outputs.

The list processing workload simulates common memory-access patterns and pointer chasing in embedded applications by performing find and sort operations on a linked list. It operates on a number of elements determined by the pointer size and the available memory block, consisting of list headers and data items, where the list is initialized with a mix of sequential and non-sequential pointers to test cache efficiency and pointer manipulation. Operations include searching for specific values, reversing the list structure, and sorting on 16-bit data values and indices, all without relying on dynamic allocation, to maintain portability. This targets the integer computations and data-structure handling typical of resource-constrained systems. The CRC is computed over the list data to verify its integrity.

Matrix manipulation assesses arithmetic throughput and loop efficiency by executing multiply operations on small matrices of fixed dimensions derived from the memory block. It uses two 16-bit input matrices and one 32-bit output matrix, with operations involving constants, vectors, or full matrices, including bit extractions to exercise data handling. These computations mimic filtering and control tasks in embedded devices, emphasizing efficient integer math without specialized instructions unless naturally supported by the processor. The fixed matrix sizes prevent variability in execution time across platforms, and a CRC is applied to the results for validation.

The state machine algorithm models the behavioral control logic found in embedded systems, such as protocol parsing, by implementing a simple finite state machine that processes a byte stream to detect valid comma-separated numbers. It transitions through nine states, counting visits and restoring any corrupted data, which tests branch prediction and conditional execution. The input stream is initialized at runtime to a fixed size, ensuring a consistent workload while evaluating the overhead of integer-based decision-making. A CRC verifies the state machine outputs.

The algorithms are executed sequentially in each iteration, with the benchmark running multiple iterations until the total execution time reaches at least 10 seconds to ensure statistical reliability; their combined iterations form the basis for the final score.

Execution and Porting Rules

CoreMark porting requires adapting the benchmark to the target platform solely through modifications to the platform-specific porting-layer files, such as core_portme.h, core_portme.c, and core_portme.mak, without altering the core algorithm logic in files such as core_main.c, core_list_join.c, core_matrix.c, core_state.c, or core_util.c. This ensures portability across diverse embedded systems while maintaining the benchmark's integrity, as the implementation relies exclusively on standard ANSI C with integer arithmetic and prohibits the use of external libraries or floating-point operations. For example, the matrix manipulation and list processing routines must remain unchanged, preserving their original integer-based computations. During execution, CoreMark runs in an iterative loop, performing the core algorithms repeatedly until accumulating at least 10 seconds of wall-clock time to ensure statistical reliability, though longer runs (e.g., 30 seconds) are recommended for precision. The benchmark uses a default memory block size of 2000 bytes, but this can be scaled by adjusting the block size or iteration count to suit varying hardware capabilities. Comprehensive reporting is mandatory, detailing the compiler version, optimization flags, hardware specifications (e.g., clock speed and memory configuration), and any parallel-execution details to enable fair comparisons. Validation is enforced through built-in cyclic redundancy check (CRC) mechanisms that compute expected checksums for the list join, matrix multiplication, and state machine components using predefined seed values (e.g., 0, 0, 0x66 or 0x3415, 0x3415, 0x66) and buffer sizes; discrepancies indicate invalid implementations or illicit optimizations that bypass the computational work. For official certification, vendors must submit their results, including build artifacts, to the EEMBC lab, where they undergo rigorous verification to confirm adherence to the run rules.
Time measurement employs platform-specific timers integrated into the porting layer (e.g., via start_time() and stop_time() functions), capturing elapsed wall-clock time to derive iteration rates, with scores often normalized by megahertz (CoreMark/MHz) to mitigate biases from differing clock speeds in raw reporting. This normalization facilitates architecture-agnostic comparisons, focusing on efficiency rather than absolute speed.
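A hypothetical POSIX-hosted port of these timing hooks might look like the following. The function names mirror the porting layer described above, but the bodies and exact signatures are assumptions, not CoreMark's shipped code.

```c
#include <time.h>

/* Sketch of timing hooks as a hosted port of core_portme.c might supply
 * them; an embedded port would typically read a hardware cycle counter
 * instead of clock(). Signatures here are assumptions. */
static clock_t start_t, stop_t;

void start_time(void) { start_t = clock(); }
void stop_time(void)  { stop_t  = clock(); }

/* Elapsed seconds; CoreMark divides total iterations by this value. */
double time_in_secs(void) {
    return (double)(stop_t - start_t) / CLOCKS_PER_SEC;
}
```

On bare-metal targets the same three hooks are usually backed by a timer peripheral, which is exactly why they live in the porting layer rather than the core sources.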

Scoring System

The CoreMark score is calculated as the total number of iterations completed across all algorithms divided by the execution time in seconds, where each iteration represents a complete cycle of the benchmark's workloads. This formula yields iterations per second, providing a direct measure of throughput while ensuring the benchmark runs for a minimum duration, typically at least 10 seconds, to minimize timing inaccuracies. To facilitate comparisons across processors operating at different clock speeds, scores are often normalized as CoreMarks per MHz by dividing the raw score by the processor's frequency in megahertz. Compiler optimizations can influence results, so reports include details on the compiler version and flags used, though the primary emphasis remains on hardware performance rather than software tuning. CoreMark employs a single-number score for straightforward reporting and comparison, with full disclosure of parameters such as seed values (e.g., 0, 0, 0x66), buffer sizes (e.g., 2000 bytes), and platform specifics required for reproducibility. Certified scores, verified by the EEMBC Certification Lab, undergo rigorous analysis to confirm adherence to official run and reporting rules, distinguishing them from self-reported results. A score of 1.0 corresponds to the performance of a reference implementation executed on a baseline processor under standardized conditions, serving as the foundational metric against which higher values indicate superior performance.

Comparisons with Other Benchmarks

Versus Dhrystone

Dhrystone, developed in the 1980s, suffers from several limitations that undermine its reliability as a benchmark for modern embedded processors. Its code base is outdated and synthetic, making it highly susceptible to aggressive compiler optimizations, such as loop unrolling or inlining, which can bypass intended computational work and inflate scores disproportionately compared to real application performance. Additionally, Dhrystone heavily relies on string library functions like strcmp() and strcpy(), which account for 10-20% of execution time and primarily measure library optimization rather than core processor capabilities. Its scoring in DMIPS (Dhrystone MIPS) or VAX-equivalent MIPS is based on an obsolete reference machine (VAX 11/750), resulting in non-intuitive metrics that do not reflect contemporary embedded workloads involving real-time control or data processing. CoreMark was explicitly developed by EEMBC in 2009 as a replacement for Dhrystone to address these vulnerabilities, particularly its exposure to compiler tweaks that distort meaningful performance evaluation. Unlike Dhrystone, CoreMark employs non-optimizable algorithms—such as list processing, matrix manipulation, state machine behaviors, and CRC computations—that are driven by runtime values and include self-verification checks to prevent code elimination or pre-computation by compilers. It excludes external library calls from the timed execution portion, ensuring that all measured code is self-contained and focused on intrinsic processor performance. Furthermore, CoreMark uses an iteration-based scoring system (iterations per second, normalized to CoreMark/MHz) that emphasizes portable integer workloads representative of embedded tasks like data manipulation and control logic, without relying on architecture-specific conversions. 
These design choices highlight fundamental differences between the benchmarks: Dhrystone prioritizes synthetic, MIPS-like performance metrics that are prone to variability across tools and hardware, whereas CoreMark delivers hardware-centric, directly comparable scores that better align with the needs of modern embedded systems. By enforcing strict run and reporting rules, such as specifying exact compiler versions and flags, CoreMark ensures reproducibility and fairness, mitigating the standardization gaps that plague Dhrystone results.

Versus Other Embedded Benchmarks

CoreMark-Pro extends the original CoreMark benchmark by incorporating additional workloads that include floating-point operations and multi-threaded execution, providing broader coverage of processor capabilities beyond basic integer performance. Specifically, while CoreMark consists of a single workload with four functions (list processing, matrix manipulation, state machine operations, and CRC calculations), CoreMark-Pro adds five integer workloads, such as JPEG compression, ZIP compression, XML parsing, SHA-256 hashing, and a more memory-intensive variant of the original, and four floating-point workloads, including an FFT, linear algebra derived from LINPACK, improved Livermore loops, and a neural-network algorithm. This expansion allows CoreMark-Pro to evaluate memory subsystems and diverse performance characteristics in 32-bit to 64-bit microprocessors, contrasting with CoreMark's emphasis on core pipeline efficiency in simpler 8-bit to 64-bit devices. However, CoreMark remains preferable for basic testing due to its smaller footprint, making it simpler to port and execute on resource-limited microcontrollers.

In comparison with EEMBC's ULPMark suite, particularly ULPMark-CoreMark, CoreMark prioritizes raw performance metrics without incorporating power-consumption analysis. ULPMark-CoreMark builds directly on CoreMark by measuring iterations per millijoule, integrating energy efficiency alongside performance through optimized configurations at varying voltages, such as a performance-focused mode and energy-efficient modes at the lowest supported voltage or a 3 V baseline. This gives ULPMark a holistic view of active-power efficiency for ultra-low-power MCUs, whereas CoreMark focuses exclusively on computational throughput in iterations per second, suitable for scenarios where power profiling is not required.

Relative to academic and industry alternatives such as MiBench, CoreMark offers superior portability and a minimal resource footprint tailored for microcontrollers.
MiBench, a suite of application-specific benchmarks simulating embedded workloads in categories such as automotive, network, and consumer applications, demands more complex setups involving larger codebases and dependencies, making it less suitable for constrained MCU environments. CoreMark, by contrast, employs a single synthetic program with straightforward algorithms that avoid dynamic allocation and external library calls, ensuring easy porting across architectures and emphasizing single-threaded core speed over detailed application simulation. As of 2025, Embench represents another open-source alternative, with versions such as Embench IOT 2.0 and Embench DSP 1.0 focusing on realistic, portable workloads for modern IoT and digital-signal-processing uses in embedded systems. Unlike CoreMark's synthetic algorithms, Embench draws from real-world applications to better reflect connected-device behavior while maintaining low resource demands; however, CoreMark's standardized, certified scoring continues to make it a preferred choice for simple core-performance comparisons in industry. Unlike the SPEC CPU benchmark suite, which is designed for desktops and servers to evaluate compute-intensive workloads, CoreMark is optimized for resource-constrained embedded systems with its lightweight design and focus on MCU-relevant operations. SPEC CPU includes diverse integer and floating-point tests that require substantial memory and compute resources, often exceeding the capabilities of typical embedded devices, while CoreMark's minimal code size and avoidance of complex I/O or threading enable reliable execution on low-end hardware without specialized setups.

Adoption and Results

Usage in Industry

CoreMark has been widely adopted by semiconductor vendors for evaluating and marketing microcontroller units (MCUs) in embedded systems, with numerous companies submitting certified scores to EEMBC for validation and comparative purposes. These vendors leverage CoreMark to demonstrate processor performance in product datasheets and technical specifications, enabling fair comparisons across diverse architectures from 8-bit to 64-bit devices. The benchmark's strict porting rules, which emphasize platform-neutral implementation, have facilitated this broad uptake by ensuring consistent and verifiable results across vendor ecosystems. In practical applications, CoreMark supports processor selection during the development of IoT devices, automotive electronic control units (ECUs), and consumer electronics, where it provides a simple metric for assessing computational efficiency in resource-constrained environments. For instance, Renesas integrates CoreMark into its MCU portfolios for automotive and industrial applications, using it to highlight performance in embedded processing tasks. Its integration into compiler toolchains, such as GCC, and vendor-specific SDKs allows automated testing and optimization during software development. CoreMark is often combined with power-consumption metrics in benchmarks such as ULPMark-CoreMark, which extends its utility to low-power designs by measuring energy efficiency in active scenarios, a critical factor for battery-operated IoT and wearable consumer products. The open-source release of CoreMark in 2009, with ongoing maintenance by EEMBC, has enabled its use in academic research and custom extensions, while over 800 scores, many certified, have been submitted to EEMBC since then (as of November 2025, 858 scores are listed), influencing iterative improvements in chip designs across the industry.

Example Performance Scores

CoreMark performance scores vary significantly across processor architectures, influenced primarily by factors such as clock speed, pipeline depth, and cache size. For instance, the Arm Cortex-M0 processor achieves an official rating of 2.33 CoreMarks/MHz, while the Cortex-M0+ variant improves to 2.46 CoreMarks/MHz, reflecting enhancements in instruction-execution efficiency. The Cortex-M4, with its more advanced pipeline and optional DSP and floating-point extensions, attains 3.54 CoreMarks/MHz, enabling higher throughput in signal-processing tasks. Higher-end processors such as the Cortex-A series can exceed 10 CoreMarks/MHz under optimized conditions; for example, the Allwinner H616, based on Cortex-A53 cores, reports 13.10 CoreMarks/MHz at 1.5 GHz. These per-MHz metrics allow fair comparisons independent of clock frequency, though total scores scale with it: for a typical 100 MHz MCU using a Cortex-M4 core, total CoreMarks might range from 300 to 400, depending on implementation details such as memory-access latency. Only scores verified by the EEMBC Certification Lab are considered official and eligible for the CoreMark logo, ensuring adherence to strict run rules; a baseline certified score of 721 CoreMarks has been established for devices such as certain Renesas RX family processors running at around 120 MHz. The public EEMBC database, which includes both certified and self-reported results, reveals variations due to compiler choices; for example, the Renesas RA4C1 (Cortex-M4) achieves 314 CoreMarks with Arm Compiler 6.16 but 324 CoreMarks with the IAR compiler at 80 MHz, a difference attributable to optimization levels and code-generation efficiency. GCC and IAR compilers often show 5-10% discrepancies in similar setups, underscoring the importance of standardized reporting.
As of recent uploads to the EEMBC database (November 2025), top scores for advanced microcontrollers demonstrate CoreMark's scalability; the Renesas RA8T2, using a Cortex-M85 core, reaches 6,379 CoreMarks at 1 GHz (6.38 CoreMarks/MHz), far surpassing entry-level devices and highlighting gains from deeper pipelines and larger caches in modern embedded systems. These examples illustrate how CoreMark quantifies performance evolution, with the scoring system described above applied consistently to yield these iteration-based results.
Processor Example | CoreMarks/MHz | Total CoreMarks | Clock Speed | Compiler | Certified? | Source
ARM Cortex-M0 (generic) | 2.33 | N/A | N/A | N/A | Official rating | ARM Developer
Renesas RA4C1 (Cortex-M4) | 3.93 | 314 | 80 MHz | Arm 6.16 | No | EEMBC scores
Allwinner H616 (Cortex-A53) | 13.10 | 19,656 | 1.5 GHz | GCC 7.5.0 | No | EEMBC scores
Renesas RA8T2 (Cortex-M85) | 6.38 | 6,379 | 1 GHz | Arm Compiler | No | EEMBC scores
Baseline device (e.g., Renesas RX) | ~6.01 | 721 | ~120 MHz | N/A | Yes | EEMBC CoreMark