ARM Cortex-A9
MediaTek MT6575
| General information | |
|---|---|
| Launched | 2007 |
| Designed by | ARM Holdings |
| Performance | |
| Max. CPU clock rate | 0.8 GHz to 2 GHz |
| Physical specifications | |
| Cores | 1–4 |
| Cache | |
| L1 cache | 32 KB I, 32 KB D |
| L2 cache | 128 KB–8 MB (configurable with L2C-310 cache controller) |
| Architecture and classification | |
| Instruction set | ARMv7-A |
| History | |
| Predecessor | ARM Cortex-A8 |
| Successor | ARM Cortex-A12 |
The ARM Cortex-A9 MPCore is a 32-bit multi-core processor that provides up to 4 cache-coherent cores, each implementing the ARM v7 architecture instruction set.[1] It was introduced in 2007.[2]
Features
Key features of the Cortex-A9 core are:[3]
- Out-of-order speculative issue superscalar execution 8-stage[4] pipeline giving 2.50 DMIPS/MHz/core.
- NEON SIMD instruction set extension performing up to 16 operations per instruction (optional).
- High performance VFPv3 floating point unit doubling the performance of previous ARM FPUs (optional).
- Thumb-2 instruction set encoding reduces the size of programs with little impact on performance.
- TrustZone security extensions.
- Jazelle DBX support for Java execution.
- Jazelle RCT for JIT compilation.
- Program Trace Macrocell and CoreSight Design Kit for non-intrusive tracing of instruction execution.
- L2 cache controller (0–4 MB).
- Multi-core processing.
ARM states that the TSMC 40G hard macro implementation typically operates at 2 GHz; a single core (excluding caches) occupies less than 1.5 mm² when designed in a TSMC 65 nanometer (nm) generic process[5] and can be clocked at speeds over 1 GHz, consuming less than 250 mW per core.[2]
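A rough aggregate Dhrystone figure follows from multiplying the per-core rating by clock frequency and core count. The sketch below assumes ARM's published figure of 2.5 DMIPS/MHz per core and perfect scaling across cores, which real workloads rarely reach, so treat the result as an upper bound rather than a benchmark; `estimated_dmips` is a hypothetical helper.

```python
def estimated_dmips(clock_mhz: float, cores: int = 1, dmips_per_mhz: float = 2.5) -> float:
    """Estimate aggregate Dhrystone MIPS for a Cortex-A9 cluster.

    Assumes every core runs at the same clock and that throughput scales
    perfectly with core count (an upper bound, not a measurement).
    """
    if clock_mhz <= 0 or cores < 1:
        raise ValueError("clock_mhz must be positive and cores must be >= 1")
    return dmips_per_mhz * clock_mhz * cores


# A single core at the 2 GHz quoted for the TSMC 40G hard macro:
print(estimated_dmips(2000))           # 5000.0
# A quad-core cluster at 1 GHz:
print(estimated_dmips(1000, cores=4))  # 10000.0
```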
Chips
Several system on a chip (SoC) devices implement the Cortex-A9 core, including:
- Altera SoC FPGA[6]
- AMLogic AML8726-M[7]
- Apple A5, A5X
- Broadcom BCM11311 (Persona ICE)[8]
- Calxeda EnergyCore ECX-1000[9]
- Entropic EN7588,[10] EN7530
- NXP Semiconductors (Formerly Freescale) QorIQ Layerscape LS1024A[11]
- Freescale Semiconductor i.MX6[12]
- HiSilicon K3V2 (Hi3620)[13]
- MediaTek MT6575[14] (single core), MT6577[15] (dual core)
- Mindspeed Technologies Comcerto 2000[16][17][18]
- Nufront NuSmart 2816, 2816M, 115[19]
- Nvidia Tegra 2 (without NEON extensions), Tegra 3 and Tegra 4i
- Trident Microsystems 847x/8x/9x SoC family[20]
- Renesas Electronics RZ/A1H, M, L, LU Family
- Samsung Exynos 4210,[21] 4212, 4412, 4415
- Rockchip RK3066,[22] RK292x, RK31xx
- STMicroelectronics SPEAr1310,[23] SPEAr1340[24]
- ST-Ericsson Nova A9500, NovaThor U8500,[25] NovaThor U9500[26]
- Texas Instruments OMAP4 processors
- Texas Instruments Sitara AM437x[27]
- WonderMedia WM8850, WM8950 and WM8980[28]
- Xilinx Extensible Processing Platform[29]
- ZiiLABS ZMS-20[30]
Systems on a chip
| Developer | Name | Cores | Process | NEON SIMD | Vector floating point unit | GPU |
|---|---|---|---|---|---|---|
| Altera | SoC FPGA | 1–2 | 28 nm | Yes | VFPv3 | optionally implemented in FPGA; TES Electronic Solutions D/AVE HD |
| Ambarella Inc. | S3L | 1 | 28 nm | Yes | VFPv3 | — |
| AMLogic | AML8726-M | 1 | 65 nm | Yes | VFPv3 | ARM Mali-400 |
| AMLogic | AML8726-MX | 2 | 40 nm | Yes | VFPv3 | ARM Mali-400 MP2 |
| AMLogic | AML8726-M8 | 4 | 28 nm | Yes | VFPv3 | ARM Mali-450 MP6 |
| Apple Inc. | A5 | 2 | 45 nm, 32 nm | Yes | VFPv3 | PowerVR SGX543MP2 |
| Apple Inc. | A5X | 2 | 45 nm | Yes | VFPv3 | PowerVR SGX543MP4 |
| Broadcom | BCM11311 (Persona ICE) | 2 | 40 nm | ? | ? | Broadcom Videocore IV |
| Broadcom | BCM21654 | 1 | 40 nm | Yes | VFPv3 | Broadcom Videocore IV |
| Broadcom | BCM21664T | 2 | 40 nm | Yes | VFPv3 | Broadcom Videocore IV |
| Calxeda | EnergyCore ECX-1000[9] | 4 | 40 nm | Yes | VFPv3 | — |
| ELVEES Multicore | 1892VM14Ya | 2 | 40 nm | Yes | VFPv3 | ARM Mali-300 |
| Freescale Semiconductor | i.MX6[31] | 1–4 | 40 nm | Yes | VFPv3-D32 | Vivante Corporation GPU IP cores[32] |
| HiSilicon | K3V2 (Hi3620) | 4 | 40 nm | Yes | VFPv3 | Vivante GC4000 |
| Intel | Cyclone V | 1–2 | 28 nm | Yes | VFPv3 | — |
| LG Corp | LG L9 | 2 | ? | ? | ? | ARM Mali-400 MP4 |
| Marvell | ARMADA 38x | 1–2 | 28 nm | Yes | VFPv3 | — |
| Marvell | PXA986 | 2 | 45 nm | Yes | VFPv3 | PowerVR SGX540 / Vivante GC1000 (Galaxy Tab 3 7-inch) |
| Marvell | PXA988 | 2 | 45 nm | Yes | VFPv3 | Vivante GC1000 |
| MediaTek | MT6575 | 1 | 40 nm | Yes | VFPv3 | PowerVR SGX531[14] |
| MediaTek | MT6577 | 2 | 40 nm | Yes | VFPv3 | PowerVR SGX531[15] |
| Mindspeed Technologies | Comcerto 2000 | 2 | ? | Yes | ? | — |
| Nufront | NuSmart 2816 (NS2816) | 2 | ? | Yes | VFPv3 | ARM Mali-400[33] |
| Nufront | NuSmart 2816M (NS2816M) | 2 | ? | Yes | VFPv3 | ARM Mali-400 |
| Nufront | NuSmart 115 (NS115) | 2 | ? | Yes | VFPv3 | ARM Mali-400 |
| Nvidia | Tegra 2 series | 2 | 40 nm | No | VFPv3-D16 | GeForce ULP |
| Nvidia | Tegra 3 (Kal-El) series | 4 | 40 nm | Yes | VFPv3 | GeForce ULP |
| Renesas Electronics | [1] | ? | ? | ? | ? | — |
| Renesas Electronics | RZ/A1H[34] | 1 | various | Yes | VFPv3 | WXGA 2D graphics 10MByte RAM SoC |
| Renesas Electronics | RZ/A1M[34] | 1 | various | Yes | VFPv3 | WXGA 2D graphics 5MByte RAM SoC |
| Renesas Electronics | RZ/A1L[34] | 1 | various | Yes | VFPv3 | WXGA 2D graphics 3MByte RAM SoC |
| Renesas Electronics | RZ/A1LU[34] | 1 | various | Yes | VFPv3 | RZ/A1L plus Ethernet AVB support and a JPEG codec unit, 3MByte RAM SoC |
| Rockchip | RK2928 | 1 | 40 nm | ? | ? | ARM Mali-400 |
| Rockchip | RK3066[22] | 2 | 40 nm | Yes | VFPv3 | ARM Mali-400 MP4 |
| Rockchip | RK3128 | 2 | ? | Yes | VFPv3 | ARM Mali-400 MP4 |
| Rockchip | RK3188[35] | 4 | 28 nm | Yes | VFPv3 | ARM Mali-400 MP4 |
| Samsung | Exynos 4 Dual (4210) | 2 | 45 nm | Yes | VFPv3 | ARM Mali-400 MP4 |
| Samsung | Exynos 4 Dual (4212) | 2 | 32 nm | Yes | VFPv3 | ARM Mali-400 MP4 |
| Samsung | Exynos 4 Quad (4412) | 4 | 32 nm | Yes | VFPv3 | ARM Mali-400 MP4 |
| Samsung | Exynos 4 Quad (4415) | 4 | 28 nm | Yes | VFPv3 | ARM Mali-400 MP4 |
| STMicroelectronics | SPEAr1310 | ? | ? | No | VFPv3 | — |
| STMicroelectronics | SPEAr1340 | 2 | ? | No | VFPv3-D16 | ARM Mali-200[36] |
| ST-Ericsson | Nova A9500 | 2 | 45 nm | Yes | VFPv3 | ARM Mali-400 |
| ST-Ericsson | NovaThor U8500 | 2 | 45 nm | Yes | VFPv3 | ARM Mali-400 |
| ST-Ericsson | NovaThor U9500 | 2 | 45 nm | Yes | VFPv3 | ARM Mali-400 |
| Sony | PlayStation Vita | 4 | 40 nm | Yes | VFPv3 | PowerVR SGX543MP4+ |
| Texas Instruments | Sitara AM437x | 1 | 45 nm | Yes | VFPv3 | SGX530 Graphics Engine |
| Texas Instruments | OMAP4430, OMAP4460 | 2 | 45 nm | Yes | VFPv3 | PowerVR SGX540 |
| Texas Instruments | OMAP4470 | 2 | 45 nm | Yes | VFPv3 | PowerVR SGX544 |
| Trident Microsystems | PNX8473[37] | 1 | ? | ? | ? | PowerVR SGX531 |
| Trident Microsystems | PNX8483[38] | 1 | ? | ? | ? | PowerVR SGX531 |
| Trident Microsystems | PNX8491[39] | 1 | ? | ? | ? | PowerVR SGX531 |
| WonderMedia | WM8850 | 1 | 40 nm | Yes | VFPv3 | ARM Mali-400 |
| WonderMedia | WM8880 | 2 | 40 nm | ? | ? | ARM Mali-400 MP2 |
| WonderMedia | WM8950 | 1 | 40 nm | ? | ? | ARM Mali-400[28] |
| WonderMedia | WM8980 | 2 | 40 nm | ? | ? | ARM Mali-400 MP2 |
| Xilinx | Zynq-7000[40] | 2 | 28 nm | Yes | VFPv3 | — |
| ZiiLABS | ZMS-20 | ? | ? | Yes | VFPv3 | ZiiLABS flexible Stemcell media processing |
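Because several parts in the table omit NEON (Tegra 2, the SPEAr devices) or expose only sixteen double-precision registers (VFPv3-D16), portable software generally probes CPU features at run time instead of assuming them. On ARM Linux the kernel lists hardware capabilities on the `Features` line of `/proc/cpuinfo`, using hwcap names such as `neon`, `vfpv3`, `vfpv3d16` and `vfpd32`. The sketch below parses that text; the helper names are hypothetical, and the sample is illustrative rather than captured from a real board.

```python
def parse_cpu_features(cpuinfo_text: str) -> set[str]:
    """Collect the hwcap names listed on 'Features' lines of /proc/cpuinfo text."""
    features: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def describe_fpu(features: set[str]) -> str:
    """Summarize NEON and VFP register-file support from hwcap names."""
    neon = "NEON" if "neon" in features else "no NEON"
    if "vfpd32" in features:
        regs = "32 double-precision registers"
    elif "vfpv3d16" in features:
        regs = "16 double-precision registers (VFPv3-D16)"
    else:
        regs = "unknown VFP register count"  # older kernels may report neither flag
    return f"{neon}, {regs}"


# Abridged, illustrative sample in the style of a NEON-less VFPv3-D16 part.
sample = """processor\t: 0
Features\t: swp half thumb fastmult vfp edsp vfpv3 vfpv3d16 tls
"""
print(describe_fpu(parse_cpu_features(sample)))
# no NEON, 16 double-precision registers (VFPv3-D16)
```

From C, the same capability bits can be read with `getauxval(AT_HWCAP)` instead of parsing text.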
References
- [1] "ARM Cortex-A9 MPCore". Arm.com. Retrieved 2 February 2012.
- [2] "ARM spins multicore-enabled Cortex core - News - Linux for Devices". Archived from the original on 6 September 2012. Retrieved 7 January 2010.
- [3] "Cortex-A9 Processor Specifications". ARM.
- [4] "White paper: The ARM Cortex-A9 Processors" (PDF). ARM. Archived from the original (PDF) on 17 November 2014.
- [5] "Cortex-A9 Single Core Processor". Arm.com. Retrieved 2 February 2012.
- [6] SoC FPGA overview, Altera.
- [7] Mobile Internet Devices, Amlogic. Archived from the original on 4 May 2014.
- [8] "BCM11311 - Persona ICE Application Processor". Broadcom.
- [9] "EnergyCore ECX-1000: Technical Specifications". Calxeda. Archived from the original on 25 April 2012. Retrieved 8 May 2012.
- [10] "High Performance, Dual-Core IP Set-top Box SoC". Entropic. Archived from the original on 29 October 2013. Retrieved 13 June 2013.
- [11] QorIQ® Layerscape 1024A Dual-Core Communications Processor, NXP Semiconductors.
- [12] "Introducing the i.MX 6 Series". Freescale Semiconductor.
- [13] "HiSilicon Unveils Quad-Core Cortex A9 K3V2 Processor (Hi3620)". 27 February 2012.
- [14] "MediaTek - MT6575". MediaTek. Archived from the original on 15 January 2013. Retrieved 8 January 2013.
- [15] "MediaTek - MT6577". MediaTek. Archived from the original on 13 January 2013. Retrieved 8 January 2013.
- [16] Roy Rubenstein (9 October 2012). "An ARM based programmable processor is set to enable new communications products".
- [17] Kevin Trosian (8 January 2013). "Mindspeed to Showcase the Industry's First ARM Cortex A9-based Communications Processor with Integrated DPI at 2013 CES".
- [18] "MACOM to Showcase Newly Acquired Mindspeed Comcerto 2000 System-on-Chip (SoC) Processors at the 2014 International CES". 7 January 2014.
- [19] "Computer System Chip". Nufront. Archived from the original on 30 August 2011. Retrieved 26 September 2011.
- [20] NXP to show the first fully integrated 45nm set top box SoC based on ARM Cortex-A9 processors.
- [21] "Exynos 4210". samsung.com. 20 January 2012. Retrieved 2 February 2012.
- [22] RK3066 Dual-Core Era is coming.
- [23] SPEAr1310 Dual-core Cortex A9 embedded MPU for communications.
- [24] SPEAr1340 Dual-core Cortex A9 embedded MPU for communications. Archived from the original on 2 May 2012. Retrieved 3 March 2012.
- [25] ST-Ericsson NovaThor U8500, ST-Ericsson. Archived from the original on 22 July 2013. Retrieved 19 February 2011.
- [26] ST-Ericsson NovaThor U9500, ST-Ericsson. Archived from the original on 2 October 2011. Retrieved 25 September 2011.
- [27] "AM437x Sitara Processors".
- [28] "WonderMedia Announces PRIZM WM8950 with Android 4.0 Support". 19 May 2013. Archived from the original on 15 August 2013. Retrieved 17 June 2013.
- [29] White Paper: Extensible Processing Platform (PDF). Archived from the original (PDF) on 2 September 2011. Retrieved 25 September 2011.
- [30] ZiiLABS ZMS-20 Dual ARM Cortex A9 Media Processor. Archived from the original on 25 September 2011. Retrieved 26 September 2011.
- [31] Introducing the i.MX 6 Series of Applications Processors (PDF). Archived from the original (PDF) on 11 August 2013. Retrieved 25 September 2011.
- [32] Vivante GPU IP Cores Power the Latest Freescale i.MX 6 Series of Application Processors. Archived from the original on 20 November 2011. Retrieved 25 September 2011.
- [33] Nufront 2GHz ARM Cortex-A9 for Desktop, Laptop and Netbook – NuSmart 2816.
- [34] RZ A1H Home. Archived from the original on 22 December 2015. Retrieved 21 December 2015.
- [35] "Review of Rockchip RK3188 Quad-core chipset". 6 December 2012.
- [36] SPEAr family of embedded microprocessors (PDF). Archived from the original (PDF) on 28 June 2011. Retrieved 25 September 2011.
- [37] PNX8473. Archived from the original on 29 October 2013. Retrieved 26 June 2013.
- [38] PNX8483. Archived from the original on 29 October 2013. Retrieved 26 June 2013.
- [39] Samsung Galaxy A9 sale in India today: Price in India, offers and features (2018), November 2018 [permanent dead link].
- [40] "Xilinx Zynq-7000 Extensible Processing Platform". Archived from the original on 7 April 2012. Retrieved 13 April 2012.
ARM Cortex-A9
The ARM Cortex-A9 is a high-performance, power-efficient 32-bit processor core developed by Arm, implementing the ARMv7-A architecture and designed for embedded applications in low-power, thermally constrained, and cost-sensitive devices.[1] Introduced on March 31, 2008, with its initial revision (r0p0), it supports the ARM, Thumb, and Thumb-2 instruction sets, enabling versatile execution in single-core or multi-core configurations.[2]
Key features of the Cortex-A9 include a dual-issue, partially out-of-order 8-stage superscalar pipeline for enhanced instruction throughput, dynamic branch prediction, and configurable L1 caches of 16KB, 32KB, or 64KB per core, with support for an optional unified L2 cache up to 8MB.[1] It incorporates the ARMv7 Memory Management Unit (MMU) for virtual memory handling, TrustZone security extensions for protected execution environments, and optional NEON Advanced SIMD and Vector Floating-Point (VFPv3) units for multimedia and signal processing acceleration.[2] The multiprocessor variant, known as Cortex-A9 MPCore, scales to up to four cores with cache coherency via the Accelerator Coherency Port (ACP) and a Snoop Control Unit (SCU), facilitating symmetric multiprocessing (SMP) in systems requiring parallel performance.[1]
In terms of performance, the Cortex-A9 delivers over 50% improvement in single-core efficiency compared to its predecessor, the Cortex-A8, while maintaining low power consumption suitable for battery-operated devices; it also integrates CoreSight components for comprehensive debug and trace capabilities.[1] Widely deployed since its launch, the core powers applications in smartphones, digital TVs, consumer electronics, and enterprise systems, with notable implementations in devices such as the NVIDIA Tegra 2, STMicroelectronics SPEAr1300, and Texas Instruments OMAP4 SoCs.[3] Its maturity and configurability as either speed-optimized or power-optimized IP have made it a foundational choice for Arm-based system-on-chips (SoCs) in the late 2000s and early 2010s.[1]
NEON acceleration further boosts multimedia benchmarks, contributing to the A9's edge in vector-heavy workloads over in-order designs like the A8.[51]
Introduction and History
Development Timeline
The ARM Cortex-A9 was developed by ARM Holdings as part of the ARMv7-A architecture family, succeeding the single-core Cortex-A8 and emphasizing multi-core scalability to address increasing performance needs in mobile devices.[4] ARM officially announced the Cortex-A9 single-core and MPCore multi-core processors on October 8, 2007, at the ARM Developers' Conference in Santa Clara, California, highlighting their support for up to four cache-coherent cores based on the ARMv7 instruction set.[5][6] The initial processor release occurred in 2008, with first silicon samples becoming available in late 2009; early demonstrations included ST-Ericsson's multiprocessing implementation running Symbian OS at a private event in February 2009.[7][8] Commercial availability began in 2010, as volume shipments of Cortex-A9-based silicon entered multiple market segments, including smartphones and embedded systems, with key partnerships such as ST-Ericsson enabling rapid adoption through early implementations like the U8500 platform.[7][9]
Position in ARM Portfolio
The ARM Cortex-A9 serves as a high-performance, out-of-order processor core within the ARMv7-A architecture profile, designed specifically for applications processors in devices requiring robust computational capabilities while maintaining power efficiency.[3][10] It introduced partial out-of-order execution to the ARM portfolio, marking a significant advancement over its predecessor, the Cortex-A8, which relied on an in-order pipeline and emphasized single-core implementations for simpler mobile applications.[11] In contrast, the Cortex-A9 supported multi-core configurations, paving the way for its successor, the Cortex-A15, which further refined out-of-order processing with enhanced superscalar capabilities for even higher performance demands.[4]
Targeted at markets such as smartphones, tablets, and embedded systems, the Cortex-A9 balanced high performance with low power consumption, making it suitable for thermally constrained and cost-sensitive environments where multimedia and general-purpose computing were key.[3] Within the broader ARMv7-A family, it sat above lower-power options like the Cortex-A5, optimized for minimal area and energy use in basic embedded tasks, and the Cortex-A7, which focused on efficiency for entry-level devices with performance comparable to the A9 but in a smaller footprint.[12][13]
ARM offered the Cortex-A9 under a flexible licensing model, providing it as synthesizable intellectual property (soft core) in RTL format for custom integration across various process nodes, or as pre-optimized hard macros tailored for specific manufacturing processes to accelerate time-to-market and ensure performance guarantees.[14][3] This approach enabled scalability, including dual-core configurations, to meet diverse system requirements without overhauling the core design.[1]
Core Architecture
Processor Microarchitecture
The ARM Cortex-A9 processor employs an out-of-order superscalar microarchitecture to deliver high performance in embedded and mobile applications, implementing the ARMv7-A architecture with support for the Thumb-2 instruction set for efficient code density.[1] This design incorporates dynamic scheduling, allowing instructions to execute out of program order when dependencies permit, thereby maximizing resource utilization and reducing stalls in the execution pipeline. The integer pipeline consists of up to 8 stages, enabling efficient handling of speculative execution while balancing power and area constraints typical of ARM's application processors.[1][15]
A key aspect of the microarchitecture is its support for dual-issue in integer operations, where up to two instructions can be dispatched per cycle from a variable-length decoder that processes the mixed 16- and 32-bit Thumb-2 encodings.[1] This partially out-of-order model applies primarily to integer execution, with load/store operations also benefiting from dynamic reordering to overlap memory accesses effectively. Branch prediction uses a 2-level dynamic predictor built around a configurable Global History Buffer (GHB) with 1024, 2048, 4096, 8192, or 16384 entries, together with a Branch Target Address Cache (BTAC) and a return stack, to anticipate control flow and minimize misprediction penalties.[16][17]
The core's scalability allows configuration as a single processor or in multi-core setups, such as the dual-core variant in the Cortex-A9 MPCore, where coherence between cores is maintained through the AMBA AXI interconnect protocol.[18] This flexibility enables designers to tailor the processor for varying performance needs while integrating with AMBA-based system buses for instruction, data, and peripheral access.
Pipeline and Execution Units
The ARM Cortex-A9 features an 8-stage integer pipeline designed for out-of-order execution, enabling superscalar processing with up to two instructions issued per cycle in optimal conditions.[2] The pipeline stages consist of fetch, where instructions are retrieved from the instruction cache; decode, which can process up to two instructions simultaneously; rename, for register renaming to handle dependencies; dispatch, allocating instructions to appropriate queues; issue, scheduling ready instructions to execution units; execute, performing the computations; writeback, returning results to the register file; and retire, committing instructions in program order while handling exceptions.[2] This structure supports speculative execution to minimize stalls from branches and dependencies.[2]
The execution units include two integer arithmetic logic units (ALUs) for handling address calculations and general-purpose operations, a dedicated multiply-accumulate (MAC) unit for multiplication and accumulation tasks, and a load/store unit capable of one load and one store operation per cycle.[2] These units allow for concurrent processing of up to four instructions in a cycle, including two ALU operations, one memory access, and one branch, enhancing throughput in integer workloads.[2]
Floating-point operations are supported through an integrated VFPv3 unit, which features a separate pipeline for scalar floating-point instructions compliant with IEEE 754 and handles both single- and double-precision arithmetic.[19] The unit completes one double-precision multiply-accumulate every two cycles; on the Cortex-A9 these are chained operations that round the product before the addition, since fused multiply-accumulate (VFMA) arrived only with VFPv4 in later cores such as the Cortex-A5, Cortex-A7 and Cortex-A15.[19][20]
In multi-core configurations, the Snoop Control Unit (SCU) manages cache coherence by implementing a snooping protocol that ensures data consistency across up to four cores through directed snoop requests and responses.
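The snooping behavior described for the SCU can be illustrated with a textbook MESI model of one cache line shared by several cores. This is a simplified sketch, not ARM's SCU logic: it tracks only L1 line states and ignores the L2, data transfer, write-backs and timing, and the class and method names are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class SnoopedLine:
    """Track the MESI state of one cache line in each core's L1 data cache."""

    def __init__(self, cores: int) -> None:
        self.states = [State.INVALID] * cores

    def read(self, core: int) -> None:
        if self.states[core] != State.INVALID:
            return  # read hit: no snoop traffic needed
        holders = [i for i, s in enumerate(self.states)
                   if i != core and s != State.INVALID]
        for i in holders:
            # A snooped read demotes other copies to Shared; a Modified copy
            # would supply its dirty data at this point.
            self.states[i] = State.SHARED
        self.states[core] = State.SHARED if holders else State.EXCLUSIVE

    def write(self, core: int) -> None:
        for i in range(len(self.states)):
            if i != core:
                # Invalidate every other copy before the write completes.
                self.states[i] = State.INVALID
        self.states[core] = State.MODIFIED

    def snapshot(self) -> str:
        return "".join(s.value for s in self.states)


line = SnoopedLine(cores=2)
line.read(0)
print(line.snapshot())  # EI
line.read(1)
print(line.snapshot())  # SS
line.write(1)
print(line.snapshot())  # IM
line.read(0)
print(line.snapshot())  # SS
```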
Power efficiency is enhanced via clock gating, which disables clocks to inactive pipeline stages and units, and power gating, allowing individual cores to enter low-power states while supporting dynamic voltage and frequency scaling.
Memory Hierarchy
The ARM Cortex-A9 processor features a multi-level memory hierarchy optimized for high-performance embedded applications, comprising Level 1 (L1) caches tightly integrated with the core, an optional external Level 2 (L2) unified cache, a two-level Translation Lookaside Buffer (TLB) for address translation, and a Memory Management Unit (MMU) for virtual memory support. This design balances low-latency access with scalability in single- and multi-core configurations, leveraging the ARMv7-A architecture.[2]
The L1 caches are Harvard-style, with separate instruction and data caches that are configurable as 16 KB, 32 KB, or 64 KB per cache. Both are 4-way set-associative with 32-byte cache lines, enabling efficient prefetching and branch target buffering integration. The data cache operates in write-back mode to minimize bus traffic, supporting write-allocate policies for cacheable regions.[2]
The L2 cache is a unified, external structure implemented via the ARM PrimeCell PL310 controller, configurable from 128 KB to 8 MB in 128 KB increments and typically organized as 16-way set-associative. It connects to the core through dedicated AXI master interfaces, providing shared access in multi-core setups and supporting exclusive caching modes to avoid data duplication between L1 and L2 levels.
The TLB architecture uses a two-level hierarchy to reduce MMU lookup overhead. The first level includes separate micro-TLBs: a 32-entry fully associative data micro-TLB and a configurable 32- or 64-entry instruction micro-TLB.
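The TLB lookup order can be sketched as a toy model: a small fully associative micro-TLB is checked first, misses fall back to a larger set-associative main TLB, and only a main-TLB miss triggers a page-table walk. The sizes (a 32-entry micro-TLB and a main TLB of 64 sets by 2 ways), the LRU replacement, the dictionary standing in for the page tables, and the class name are all assumptions for illustration, not the Cortex-A9's documented policies.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # 4 KB small pages


class TwoLevelTLB:
    """Toy model: a fully associative micro-TLB backed by a set-associative main TLB.

    Sizes, LRU replacement and the dict standing in for page tables are
    illustrative assumptions, not the Cortex-A9's exact policies.
    """

    def __init__(self, page_table: dict[int, int], micro_entries: int = 32,
                 main_sets: int = 64, main_ways: int = 2) -> None:
        self.page_table = page_table  # virtual page number -> physical page number
        self.micro: OrderedDict[int, int] = OrderedDict()
        self.micro_entries = micro_entries
        self.main = [OrderedDict() for _ in range(main_sets)]
        self.main_ways = main_ways
        self.stats = {"micro_hit": 0, "main_hit": 0, "walk": 0}

    @staticmethod
    def _insert(cache: OrderedDict, capacity: int, vpn: int, ppn: int) -> None:
        cache[vpn] = ppn
        cache.move_to_end(vpn)
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used entry

    def translate(self, va: int) -> int:
        vpn, offset = va >> PAGE_SHIFT, va & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.micro:
            self.stats["micro_hit"] += 1
            self.micro.move_to_end(vpn)
            ppn = self.micro[vpn]
        else:
            ways = self.main[vpn % len(self.main)]  # set index from low VPN bits
            if vpn in ways:
                self.stats["main_hit"] += 1
                ways.move_to_end(vpn)
                ppn = ways[vpn]
            else:
                self.stats["walk"] += 1
                ppn = self.page_table[vpn]  # a KeyError models a translation fault
                self._insert(ways, self.main_ways, vpn, ppn)
            self._insert(self.micro, self.micro_entries, vpn, ppn)
        return (ppn << PAGE_SHIFT) | offset


tlb = TwoLevelTLB(page_table={0x12345: 0xABC, 0x1: 0x100, 0x2: 0x200}, micro_entries=2)
print(hex(tlb.translate(0x12345678)))  # 0xabc678 (page-table walk)
tlb.translate(0x12345000)              # micro-TLB hit
tlb.translate(0x1000)                  # walk
tlb.translate(0x2000)                  # walk; evicts page 0x12345 from the micro-TLB
tlb.translate(0x12345FFF)              # micro-TLB miss, main-TLB hit
print(tlb.stats)  # {'micro_hit': 1, 'main_hit': 1, 'walk': 3}
```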
The second-level main TLB is unified for instruction and data, implemented as a configurable 2-way set-associative array of 64 to 512 entries plus four fully associative lockable entries, allowing selective retention of critical translations.[2]
The MMU provides comprehensive virtual-to-physical address translation and protection, supporting 4 KB small pages as the base granule, along with 64 KB large pages, 1 MB sections and 16 MB supersections in the standard ARMv7 short-descriptor format.
In multi-core variants, the Cortex-A9 employs AMBA AXI interfaces (typically two 64-bit AXI masters per core) for all external memory accesses, with the Snoop Control Unit (SCU) ensuring cache coherence by snooping AXI transactions and broadcasting invalidations across cores. This AXI-based setup supports system-level interconnects while maintaining low-latency coherence for up to four cores.
Key Features
SIMD and Vector Processing
The ARM Cortex-A9 incorporates the NEON advanced SIMD extension as part of its ARMv7-A architecture, providing a dedicated media processing engine for vector operations. The NEON unit is 128-bit wide, enabling parallel processing of multiple data elements within this vector length, and features a register file consisting of 32 64-bit registers (equivalent to 16 full 128-bit vectors) that support both integer and floating-point operations. These registers are shared with the VFPv3 unit, allowing seamless integration between scalar and vector floating-point computations. Integer operations handle unsigned and signed data types from 8-bit to 64-bit, including polynomial arithmetic over GF(2), while floating-point support focuses on single-precision (32-bit) formats, with limited double-precision scalar capabilities.[21]
NEON instructions enable efficient vector arithmetic, such as VADD for element-wise addition and VMUL for multiplication, operating on vectors with up to 16 elements (e.g., sixteen 8-bit integers or four 32-bit floats per 128-bit vector). These instructions incorporate saturation modes to prevent overflow by clamping results to the representable range, and rounding modes for precise shifts and conversions, enhancing accuracy in signal processing tasks. Integration with VFPv3 extends this to vectorized single-precision floating-point operations, including multiply-accumulate (VMLA) on up to four single-precision elements per instruction. On the Cortex-A9 these multiply-accumulates are chained, rounding the product before the addition; the fused VFMA form, which computes a*b + c with a single rounding, was added later with VFPv4 and Advanced SIMDv2.[22]
In terms of performance, the NEON unit can achieve up to 8 single-precision floating-point operations per cycle when leveraging the Cortex-A9's dual-issue capability, where two NEON instructions (e.g., a multiply followed by an add) are dispatched simultaneously to the execution pipelines.
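The saturation modes mentioned above are easiest to see lane by lane. The Python sketch below models what a wrapping `VADD.I8` and a saturating `VQADD.S8` produce for the sixteen signed 8-bit lanes of a 128-bit register; it captures the arithmetic only, not how the hardware executes it, and the function names are hypothetical.

```python
LANES = 16  # a 128-bit Q register holds sixteen 8-bit lanes
INT8_MIN, INT8_MAX = -128, 127


def _check(vec: list[int]) -> None:
    if len(vec) != LANES or any(not INT8_MIN <= x <= INT8_MAX for x in vec):
        raise ValueError("expected 16 signed 8-bit lanes")


def vadd_i8(a: list[int], b: list[int]) -> list[int]:
    """Lane-wise add that wraps modulo 2**8, as VADD.I8 does."""
    _check(a)
    _check(b)
    return [((x + y + 128) % 256) - 128 for x, y in zip(a, b)]


def vqadd_s8(a: list[int], b: list[int]) -> list[int]:
    """Lane-wise signed add that clamps to [-128, 127], as VQADD.S8 does."""
    _check(a)
    _check(b)
    return [max(INT8_MIN, min(INT8_MAX, x + y)) for x, y in zip(a, b)]


a = [100] * 8 + [-100] * 8
b = [50] * 8 + [-50] * 8
print(vadd_i8(a, b)[0], vadd_i8(a, b)[8])    # -106 106  (wrapped)
print(vqadd_s8(a, b)[0], vqadd_s8(a, b)[8])  # 127 -128  (saturated)
```

Wrapping is what plain integer code expects, while saturation avoids the bright-pixel-turns-dark artifacts that wraparound causes in image and audio processing.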
NEON's vector throughput is put to use in multimedia acceleration, such as H.264 video decoding, where NEON handles motion compensation and inverse transforms on multiple pixel blocks in parallel, and in 3D graphics processing, including vertex shading and texture filtering. These capabilities make NEON particularly suited for embedded applications requiring efficient handling of audio, video, and image data streams.[23][24]
Integer and Floating-Point Operations
The ARM Cortex-A9 processor implements scalar integer operations as part of the ARMv7-A architecture, supporting both the traditional 32-bit ARM instruction set and the Thumb-2 instruction set, which combines 16-bit and 32-bit instructions to achieve better code density while maintaining performance comparable to ARM instructions.[2] Scalar integer instructions can be conditionally executed (in ARM state through a condition field on each instruction, in Thumb-2 through IT blocks), so they take effect only when a specified condition such as equality or greater-than holds; this helps minimize branching and improve pipeline efficiency. The architecture also includes media-oriented instructions for digital signal processing (DSP), such as SMLAD, which performs two 16-bit signed multiplies and adds both products to a 32-bit accumulator, useful for audio and image processing applications.
Cycle timings for integer operations vary by instruction type but emphasize low latency for common arithmetic. Basic data-processing instructions like ADD and SUB complete in a single cycle, allowing high throughput in sequential computations.[25] Multiply operations, such as MUL for 32-bit results, typically require 3–5 cycles depending on operand size and whether accumulation is involved, balancing precision with performance.[25] The Cortex-A9 does not implement the optional SDIV and UDIV hardware divide instructions, so integer division is performed in software by compiler runtime routines (for example, __aeabi_idiv and __aeabi_uidiv under the ARM EABI). These timings assume in-order execution without interlocks; out-of-order execution in the Cortex-A9 can further improve overall performance by scheduling around dependent operations.[2]
For floating-point operations, the Cortex-A9 integrates an optional Vector Floating-Point (VFPv3) unit that handles single-precision (32-bit) and double-precision (64-bit) computations in compliance with the IEEE 754 standard, providing robust support for scientific and graphics workloads.
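Whether a multiply-accumulate rounds once or twice is visible in the last bits of single-precision results. VFPv3's VMLA is a chained operation (round the product, then round the sum), while the fused VFMA introduced with VFPv4 rounds only once. The Python sketch below emulates binary32 rounding with the standard `struct` module; it assumes round-to-nearest-even and relies on binary64 holding the intermediate exactly, which is true for the inputs shown but not in general, so it illustrates the effect rather than modelling the hardware bit for bit. The helper names are hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value (ties to even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def chained_mac(a: float, b: float, c: float) -> float:
    """VMLA-style: round the product to binary32, then round the sum."""
    return to_f32(to_f32(a * b) + c)


def fused_mac(a: float, b: float, c: float) -> float:
    """VFMA-style: round a*b + c once (exact here because binary64 holds it)."""
    return to_f32(a * b + c)


a = b = to_f32(1 + 2**-12)   # exactly representable in binary32
c = -to_f32(1 + 2**-11)
print(chained_mac(a, b, c))          # 0.0
print(fused_mac(a, b, c) == 2**-24)  # True
```

The exact result is 2^-24; the chained form loses it entirely because rounding the product (1 + 2^-11 + 2^-24) to binary32 discards the low bit before the subtraction could expose it.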
The VFPv3 unit's multiply-accumulate instructions (VMLA, VMLS) are chained rather than fused: the product is rounded before it is added, so each step can introduce two rounding errors. Fused multiply-accumulate, which rounds only once, was introduced with VFPv4 in later cores. Floating-point addition and subtraction require 3 cycles, enabling efficient scalar math in loops, while division operations range from 14 cycles for single-precision to 28 cycles for double-precision, due to the reciprocal approximation method employed. These timings position the VFPv3 as a high-performance coprocessor when enabled, though it can be disabled for power savings in integer-only applications.
The Cortex-A9 also supports the optional Jazelle extension, which accelerates Java bytecode execution by allowing direct hardware interpretation of most bytecodes as a third execution state alongside ARM and Thumb modes, though it is rarely utilized in modern implementations due to advancements in just-in-time compilation.[26]
Security and Virtualization Support
The ARM Cortex-A9 processor incorporates ARM TrustZone technology, which provides hardware-enforced isolation between a secure world for sensitive operations, such as cryptographic processing, and a normal world for general-purpose computing. This separation is achieved through a dedicated secure state in the processor, where the secure world maintains exclusive access to protected resources while the normal world operates under restricted privileges. All bus transactions originating from the processor include a Non-Secure (NS) bit, which tags accesses as secure or non-secure, enabling peripherals and memory systems to enforce isolation at the hardware level.[27][28]
The Cortex-A9 does not implement the ARMv7-A Virtualization Extensions. Those extensions, introduced with the Cortex-A15 and Cortex-A7, add a Hyp mode and a second stage of address translation: stage 1 maps a guest's virtual addresses to intermediate physical addresses (IPAs), and stage 2, managed by the hypervisor, maps IPAs to physical addresses. Hypervisors on the Cortex-A9 therefore isolate guests in software, for example through paravirtualization.
World switching between secure and normal states is facilitated by Secure Monitor Calls (SMC), which trigger an exception to enter the monitor mode, a privileged state dedicated to handling transitions and maintaining isolation. The processor's interrupt controller integrates TrustZone by routing interrupts to either secure or non-secure handlers based on configuration bits, such as the FIQ enable bit, ensuring that secure interrupts remain protected from normal-world software.
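The routing rule in the preceding paragraph can be made concrete with a toy model. Interrupts marked secure are signalled as FIQ when the CPU interface's FIQ enable bit is set, and configuration writes from the normal world to a secure interrupt are silently ignored, so non-secure software cannot mask or retarget it. The class, its fields, and interrupt ID 29 are hypothetical choices for illustration; a real interrupt controller exposes this through memory-mapped distributor and CPU-interface registers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGic:
    """Hypothetical model of TrustZone-aware interrupt routing.

    secure_ids: interrupt IDs configured as secure.
    fiq_en: models the CPU-interface FIQ enable bit; when set, secure
    interrupts are signalled as FIQ instead of IRQ.
    """
    secure_ids: set[int]
    fiq_en: bool = True
    enabled: set[int] = field(default_factory=set)

    def signal_for(self, irq_id: int) -> str:
        """Return the exception line a pending interrupt would be signalled on."""
        if irq_id in self.secure_ids and self.fiq_en:
            return "FIQ"
        return "IRQ"

    def enable(self, irq_id: int, caller_is_secure: bool) -> None:
        """Enable an interrupt; normal-world writes to secure interrupts are ignored."""
        if irq_id in self.secure_ids and not caller_is_secure:
            return  # write ignored, so normal-world code cannot tamper with it
        self.enabled.add(irq_id)


gic = ToyGic(secure_ids={29})           # hypothetical secure timer interrupt
gic.enable(29, caller_is_secure=False)  # ignored: issued from the normal world
gic.enable(42, caller_is_secure=False)  # a non-secure peripheral interrupt
print(gic.signal_for(29), gic.signal_for(42), sorted(gic.enabled))  # FIQ IRQ [42]
```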
Handling secure interrupts entirely within the secure world prevents unauthorized access from normal-world software and supports real-time secure operations.[27]
The Cortex-A9 does not implement the Large Physical Address Extension (LPAE); its physical address space is 32 bits (4 GB). LPAE, which widens physical addresses to 40 bits (up to 1 TB), first appeared alongside the Virtualization Extensions in the Cortex-A15 and Cortex-A7.
TrustZone also extends into the Memory Management Unit (MMU): the translation table base registers are banked between the secure and non-secure worlds, so each world uses its own page tables, with the current security state (controlled through the NS bit of the Secure Configuration Register) selecting which banked copy is active. Secure page tables are thereby isolated from normal-world tampering, reinforcing TrustZone's protection model.[27][28]
Implementations
Single-Core Configurations
The ARM Cortex-A9 single-core processor, also known as the uniprocessor variant, is implemented as a standalone high-performance core without multi-core clustering, targeting embedded and mobile applications requiring scalable performance. ARM offers this configuration in both synthesizable RTL and hard macro forms to facilitate integration into system-on-chips (SoCs) on advanced process nodes. Hard macros are available on 40 nm and 28 nm processes, enabling optimized area and power for production designs.[29]
In terms of operating frequencies, the single-core Cortex-A9 achieves up to 2.5 GHz in speed-optimized hard macro implementations on 28 nm, supporting demanding workloads while maintaining compatibility with ARMv7-A architecture. Typical clock speeds in mobile deployments range from 1 to 2 GHz, balancing performance and thermal constraints in battery-powered devices. Power consumption for a single core is approximately 500 mW at 1 GHz in power-optimized variants, contributing to energy-efficient operation.[30][31]
Configuration flexibility is a key aspect of single-core setups, allowing designers to tailor the processor to specific needs. L1 caches can be configured as 16 KB, 32 KB, or 64 KB for both instruction and data sides, with four-way set associativity. An optional unified L2 cache, managed via the L2C-310 controller, supports sizes up to 8 MB for improved memory bandwidth. Additional options include Jazelle hardware acceleration for direct Java bytecode execution and ThumbEE extensions for just-in-time compilation in dynamic environments.[32]
ARM delivers the single-core Cortex-A9 as intellectual property (IP) suitable for standalone use, often integrated via the uniprocessor package that excludes multi-core interconnects. This out-of-order execution design enables efficient instruction throughput, supporting the high clock rates observed in these configurations.[1]
Multi-Core Variants
The ARM Cortex-A9 MPCore implements multi-core configurations for symmetric multiprocessing (SMP), supporting up to four cores in a single cluster with hardware cache coherence.[33] The dual-core variant is the most common implementation because it balances performance gains against power, whereas quad-core designs raise thermal and energy demands without proportional benefits in typical embedded workloads.[18][34]

In dual-core MPCore designs, the two Cortex-A9 processors share a unified L2 cache of up to 8 MB through the PL310 controller, which provides low-latency access and supports speculative linefills to optimize bandwidth.[35] The Snoop Control Unit (SCU) keeps the cores' L1 data caches coherent with a snoop-based mechanism that propagates cache operations between cores so that data remains consistent across the cluster.[33] The SCU also arbitrates accesses to the L2 cache and handles evictions, connecting to the memory system through the cores' AXI interfaces.[33]

Within the cluster, L1 coherency follows a MESI-based protocol. The Cortex-A9 predates the AMBA 4 AXI Coherency Extensions (ACE); coherent accesses from external masters instead enter through the Accelerator Coherency Port (ACP), an AXI slave port on the SCU.[33] The integrated Generic Interrupt Controller (GIC), architecture version 1.0, distributes interrupts across the cores, supporting up to 224 shared peripheral interrupts (SPIs) plus per-core private interrupts for timers and watchdogs, which simplifies task scheduling in SMP environments.[33]

Performance scales nearly linearly in threaded applications on dual-core configurations: representative implementations achieve almost twice the single-core throughput while consuming only about 40% more power.[34]
Integration in SoCs
The ARM Cortex-A9 core was widely integrated into systems-on-chip (SoCs) for mobile and embedded applications during the early 2010s, with its ARMv7-A compatibility enabling efficient multi-core processing in power-constrained devices.[1]

NVIDIA's Tegra 2, released in 2010, paired a dual-core Cortex-A9 clocked at 1 GHz with an integrated GeForce GPU, making it one of the first mobile SoCs with symmetric multiprocessing. It powered early Android tablets such as the Motorola Xoom and Samsung Galaxy Tab 10.1.[36][34]

Samsung's Exynos 4210, introduced in 2011 and manufactured on a 45 nm process, combined a dual-core Cortex-A9 at 1.4 GHz with a Mali-400 MP4 GPU for improved graphics rendering in smartphones. It was used most prominently in the Samsung Galaxy S II, supporting high-definition video playback and multitasking.[37][38]

Apple's A5 SoC, also launched in 2011 on a 45 nm process (later revised to 32 nm), used a dual-core Cortex-A9 clocked at 800 MHz in the iPhone 4S and 1 GHz in the iPad 2, alongside a PowerVR SGX543MP2 GPU and custom power-efficiency optimizations. In iOS devices the A5 supported features such as Siri and improved game graphics.[39][40]

Texas Instruments' OMAP 4 series, spanning models such as the OMAP4430 and OMAP4460 from 2011 onward, used dual-core Cortex-A9 processors clocked at up to 1.5 GHz and targeted both consumer mobile devices and industrial embedded systems. These SoCs included dedicated hardware accelerators for imaging and video, suiting them to smartphones such as the Motorola Droid RAZR and to automotive infotainment.[41][42]

A representative quad-core implementation is the NXP i.MX 6Quad, released in 2012 on a 40 nm process, with four Cortex-A9 cores at 1.0 GHz and integrated 2D/3D graphics acceleration. It has been widely adopted in industrial, automotive, and consumer embedded systems that need more parallelism.[43]

Low-cost SoCs for budget tablets also adopted the core, such as Rockchip's RK3066 from 2012, a dual-core Cortex-A9 at up to 1.6 GHz with a Mali-400 GPU aimed at affordable Android media devices. By contrast, Allwinner's A10, which targeted similar markets, used a single Cortex-A8 core; parts like the RK3066 illustrate the Cortex-A9's role in bridging performance and cost in emerging consumer electronics.[44]
Applications and Performance
Device Adoption
The ARM Cortex-A9 powered several first-generation 4G smartphones, including the Motorola Atrix 4G with its Nvidia Tegra 2 SoC. These devices marked early adoption of high-speed mobile connectivity, bringing advanced multimedia and multitasking to the Android ecosystem. In the tablet market, the Cortex-A9 gained significant uptake through the Apple iPad 2, whose custom A5 SoC used a dual-core Cortex-A9; the iPad 2 sold over 30 million units during its lifecycle and helped establish tablets as mainstream consumer devices.[45] Similarly, the Samsung Galaxy Tab 10.1 used the dual-core Cortex-A9 Tegra 2 SoC, giving an early Android tablet the performance needed for media consumption.[46]

The processor also appeared in set-top boxes and early smart televisions, notably Google TV platforms such as LG's models based on the L9 chipset, which integrated a dual-core Cortex-A9 for streaming and app support.[47] These implementations brought internet-connected features to home entertainment systems, with LG's early Google TV devices such as the 47LM6700 series exemplifying the shift toward smart home interfaces.

In automotive and embedded applications, the Freescale (now NXP) i.MX 6 series, spanning single- to quad-core Cortex-A9 configurations, was widely used in infotainment systems for navigation, media playback, and connectivity.[48] Its scalability and support for rugged environments carried it into vehicle dashboards, including those of manufacturers working on precursors to Android Automotive OS.[49]

The Cortex-A9 reached its market peak as the dominant processor in the 2011-2013 Android ecosystem, with licensees shipping it in billions of devices across smartphones, tablets, and embedded systems.[11] This era cemented its role in the rapid growth of mobile computing.
Benchmark Comparisons
The ARM Cortex-A9 delivers substantial gains over the Cortex-A8: more than 50% higher overall performance in single-core setups, owing to its out-of-order execution and dual-issue pipeline.[3] In integer workloads it achieves roughly twice the performance of the Cortex-A8 at equal clock speeds, while multimedia tasks using the NEON SIMD extension reach up to three times the throughput, benefiting from enhanced vector processing and fewer pipeline stalls.[50] [51] Geekbench 2 results place dual-core Cortex-A9 configurations at approximately 800-1000 points, on par with the Intel Atom N450 used in contemporary netbooks.[52] [53] Compared with the later Cortex-A15, the A9 is 30-50% slower per clock cycle in CPU-intensive tasks but consumes less power, making it suitable for efficiency-focused designs.[54] Measured with Dhrystone, the core is rated at 2.5 DMIPS/MHz and reaches around 1000 DMIPS per watt in 28 nm processes.[55] [56]
| Benchmark | Cortex-A9 result (~1 GHz) | Comparison context |
|---|---|---|
| Dhrystone | 2.5 DMIPS/MHz | Baseline for power-normalized efficiency in 28 nm.[56] |
| Geekbench 2 (Dual-Core) | ~800-1000 | Comparable to the Intel Atom N450 in multi-threaded workloads.[52] [53] |
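Because the Dhrystone rating in the table is normalized per megahertz and per core, absolute throughput follows from simple multiplication. The Python sketch below works through that arithmetic. The clock speeds are illustrative assumptions, and multiplying by the core count assumes ideal linear scaling, which is an upper bound rather than a measured result: Dhrystone itself runs on a single core, and real multi-threaded scaling falls short of linear.

```python
def dmips(dmips_per_mhz: float, clock_mhz: float, cores: int = 1) -> float:
    """Estimate Dhrystone MIPS from a per-MHz rating.

    Multiplying by `cores` assumes ideal linear scaling, so any
    multi-core result is an upper bound, not a benchmark figure.
    """
    if dmips_per_mhz <= 0 or clock_mhz <= 0 or cores < 1:
        raise ValueError("rating and clock must be positive, cores >= 1")
    return dmips_per_mhz * clock_mhz * cores


# Per-core, per-MHz Dhrystone rating quoted for the Cortex-A9.
CORTEX_A9_DMIPS_PER_MHZ = 2.5

# Illustrative clock speeds (assumptions, not specific products).
single_1ghz = dmips(CORTEX_A9_DMIPS_PER_MHZ, 1000)          # 2500.0
dual_1ghz = dmips(CORTEX_A9_DMIPS_PER_MHZ, 1000, cores=2)   # 5000.0, ideal scaling
quad_2ghz = dmips(CORTEX_A9_DMIPS_PER_MHZ, 2000, cores=4)   # 20000.0, ideal scaling

for label, value in [("1 core @ 1 GHz", single_1ghz),
                     ("2 cores @ 1 GHz", dual_1ghz),
                     ("4 cores @ 2 GHz", quad_2ghz)]:
    print(f"{label}: {value:.0f} DMIPS")
```

Dividing the clock out of the score is what makes per-MHz ratings useful for comparing cores: it separates the microarchitecture's work per cycle from the frequency a particular SoC vendor chose to ship.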
Legacy and Modern Relevance
The ARM Cortex-A9 contributed significantly to ARM's dominance in mobile computing by introducing scalable multi-core configurations that balanced performance and power efficiency for battery-constrained devices.[18] Its MPCore variant, supporting up to four cache-coherent cores, enabled high-performance applications in early smartphones and tablets and set the stage for heterogeneous architectures such as big.LITTLE.[57] This multi-core approach helped ARM capture a substantial share of the growing mobile processor market and influenced the shift toward clustered processing in portable electronics.[58]

As of 2025, the Cortex-A9 remains relevant in legacy embedded and industrial applications, particularly where cost and long-term stability outweigh the need for cutting-edge performance. For instance, NXP's i.MX 6DualPlus processor, with dual Cortex-A9 cores, remains actively available for multimedia-enabled edge computing, industrial IoT devices, and automotive systems such as e-cockpits.[59] Similarly, Artila's Matrix-770 is an Ubuntu Core-based IIoT gateway for industrial networking that relies on the A9 for low-to-mid-range connectivity.[60] These uses show its persistence in sectors such as IoT gateways and as an alternative to higher-end single-board computers like the Raspberry Pi, where a mature ecosystem ensures continued viability.[61]

ARM has not declared the Cortex-A9 end-of-life and maintains support through long-term maintenance agreements, and implementations such as NXP's i.MX 6 series are projected to receive updates until at least 2035.[62] Although new licensing of the A9 has declined since the mid-2010s in favor of ARMv8-based designs, existing deployments continue to receive vendor support, including compatibility fixes and security patches for embedded systems.[63]

The Cortex-A9 shaped subsequent multi-core ARM designs by bringing cache-coherent multiprocessing, introduced earlier with the ARM11 MPCore, to ARM's high-performance application cores, easing scaling in symmetric multiprocessing environments.[18] Because it implements ARMv7-A, its code can run on ARMv8 processors that provide the AArch32 execution state, so legacy A9 software runs on many 64-bit ARM systems without major rewrites.[64] However, ARMv8 cores have overtaken it in power efficiency: the Cortex-A53, for example, delivers comparable single-threaded performance while using approximately 40% less area and energy, making newer cores preferable for demanding applications.[65] The A9 nevertheless remains cost-effective for low-end embedded tasks, where its proven integration and lower licensing overhead justify continued use over more advanced alternatives.[66] Later cores such as the Cortex-A53 have built on this foundation, emphasizing efficiency in entry-level multi-core scenarios.[67]
References
- https://en.wikichip.org/wiki/samsung/exynos/4210