List of ARM processors
This is a list of central processing units based on the ARM family of instruction sets, designed by ARM Ltd. and third parties and sorted by version of the ARM instruction set, release, and name. In 2005, ARM provided a summary of the numerous vendors who implement ARM cores in their designs.[1] Keil also provides a somewhat newer summary of vendors of ARM-based processors.[2] ARM further provides a chart[3] giving an overview of the ARM processor lineup, with performance and functionality versus capabilities for the more recent ARM core families.
Processors
Designed by ARM
| Product family | ARM architecture | Processor | Feature | Cache (I / D), MMU | Typical MIPS @ MHz | Reference |
|---|---|---|---|---|---|---|
| ARM1 | ARMv1 | ARM1 | First implementation | None | ||
| ARM2 | ARMv2 | ARM2 | ARMv2 added the MUL (multiply) instruction | None | 0.33 DMIPS/MHz | |
| ARM2aS | ARMv2a | ARM250 | Integrated MEMC (MMU), graphics and I/O processor. ARMv2a added the SWP and SWPB (swap) instructions | None, MEMC1a | ||
| ARM3 | First integrated memory cache | 4 KB unified | 0.50 DMIPS/MHz | |||
| ARM6 | ARMv3 | ARM60 | ARMv3 first to support 32-bit memory address space (previously 26-bit). ARMv3M first added long multiply instructions (32×32=64). | None | 10 MIPS @ 12 MHz | |
| ARM600 | As ARM60, cache and coprocessor bus (for FPA10 floating-point unit) | 4 KB unified | 28 MIPS @ 33 MHz | |||
| ARM610 | As ARM60, cache, no coprocessor bus | 4 KB unified | 17 MIPS @ 20 MHz, 0.65 DMIPS/MHz | [4] | | |
| ARM7 | ARMv3 | ARM700 | coprocessor bus (for FPA11 floating-point unit) | 8 KB unified | 40 MHz | |
| ARM710 | As ARM700, no coprocessor bus | 8 KB unified | 40 MHz | [5] | ||
| ARM710a | As ARM710, also used as core of ARM7100 | 8 KB unified | 40 MHz, 0.68 DMIPS/MHz | | | |
| ARM7T | ARMv4T | ARM7TDMI(-S) | 3-stage pipeline, Thumb, ARMv4 first to drop legacy ARM 26-bit addressing | None | 15 MIPS @ 16.8 MHz, 63 DMIPS @ 70 MHz | |
| ARM710T | As ARM7TDMI, cache | 8 KB unified, MMU | 36 MIPS @ 40 MHz | |||
| ARM720T | As ARM7TDMI, cache | 8 KB unified, MMU with FCSE (Fast Context Switch Extension) | 60 MIPS @ 59.8 MHz | |||
| ARM740T | As ARM7TDMI, cache | MPU | ||||
| ARM7EJ | ARMv5TEJ | ARM7EJ-S | 5-stage pipeline, Thumb, Jazelle DBX, enhanced DSP instructions | None | ||
| ARM8 | ARMv4 | ARM810 | 5-stage pipeline, static branch prediction, double-bandwidth memory | 8 KB unified, MMU | 84 MIPS @ 72 MHz, 1.16 DMIPS/MHz | [6][7] |
| ARM9T | ARMv4T | ARM9TDMI | 5-stage pipeline, Thumb | None | ||
| ARM920T | As ARM9TDMI, cache | 16 KB / 16 KB, MMU with FCSE (Fast Context Switch Extension) | 200 MIPS @ 180 MHz | [8] | ||
| ARM922T | As ARM9TDMI, caches | 8 KB / 8 KB, MMU | ||||
| ARM940T | As ARM9TDMI, caches | 4 KB / 4 KB, MPU | ||||
| ARM9E | ARMv5TE | ARM946E-S | Thumb, enhanced DSP instructions, caches | Variable, tightly coupled memories, MPU | ||
| ARM966E-S | Thumb, enhanced DSP instructions | No cache, TCMs | ||||
| ARM968E-S | As ARM966E-S | No cache, TCMs | ||||
| ARMv5TEJ | ARM926EJ-S | Thumb, Jazelle DBX, enhanced DSP instructions | Variable, TCMs, MMU | 220 MIPS @ 200 MHz | ||
| ARMv5TE | ARM996HS | Clockless processor, as ARM966E-S | No caches, TCMs, MPU | |||
| ARM10E | ARMv5TE | ARM1020E | 6-stage pipeline, Thumb, enhanced DSP instructions, (VFP) | 32 KB / 32 KB, MMU | ||
| ARM1022E | As ARM1020E | 16 KB / 16 KB, MMU | ||||
| ARMv5TEJ | ARM1026EJ-S | Thumb, Jazelle DBX, enhanced DSP instructions, (VFP) | Variable, MMU or MPU | |||
| ARM11 | ARMv6 | ARM1136J(F)-S | 8-stage pipeline, SIMD, Thumb, Jazelle DBX, (VFP), enhanced DSP instructions, unaligned memory access | Variable, MMU | 740 @ 532–665 MHz (i.MX31 SoC), 400–528 MHz | [9] |
| ARMv6T2 | ARM1156T2(F)-S | 9-stage pipeline, SIMD, Thumb-2, (VFP), enhanced DSP instructions | Variable, MPU | [10] | ||
| ARMv6Z | ARM1176JZ(F)-S | As ARM1136EJ(F)-S | Variable, MMU + TrustZone | 965 DMIPS @ 772 MHz | [11] | |
| ARMv6K | ARM11MPCore | As ARM1136EJ(F)-S, 1–4 core SMP | Variable, MMU | |||
| SecurCore | ARMv6-M | SC000 | As Cortex-M0 | 0.9 DMIPS/MHz | ||
| ARMv4T | SC100 | As ARM7TDMI | ||||
| ARMv7-M | SC300 | As Cortex-M3 | 1.25 DMIPS/MHz | |||
| Cortex-M | ARMv6-M | Cortex-M0 | Microcontroller profile, most Thumb + some Thumb-2,[12] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory | Optional cache, no TCM, no MPU | 0.84 DMIPS/MHz | [13] |
| Cortex-M0+ | Microcontroller profile, most Thumb + some Thumb-2,[12] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 0.93 DMIPS/MHz | [14] | ||
| Cortex-M1 | Microcontroller profile, most Thumb + some Thumb-2,[12] hardware multiply instruction (optional small), OS option adds SVC / banked stack pointer, optional system timer, no bit-banding memory | Optional cache, 0–1024 KB I-TCM, 0–1024 KB D-TCM, no MPU | 136 DMIPS @ 170 MHz,[15] (0.8 DMIPS/MHz FPGA-dependent)[16] | [17] | ||
| ARMv7-M | Cortex-M3 | Microcontroller profile, Thumb / Thumb-2, hardware multiply and divide instructions, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 1.25 DMIPS/MHz | [18] | |
| ARMv7E-M | Cortex-M4 | Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv4-SP single-precision FPU, hardware multiply and divide instructions, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 1.25 DMIPS/MHz (1.27 w/FPU) | [19] | |
| Cortex-M7 | Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv5 single and double precision FPU, hardware multiply and divide instructions | 0−64 KB I-cache, 0−64 KB D-cache, 0–16 MB I-TCM, 0–16 MB D-TCM (all these w/optional ECC), optional MPU with 8 or 16 regions | 2.14 DMIPS/MHz | [20] | ||
| ARMv8-M Baseline | Cortex-M23 | Microcontroller profile, Thumb-1 (most), Thumb-2 (some), Divide, TrustZone | Optional cache, no TCM, optional MPU with 16 regions | 1.03 DMIPS/MHz | [21] | |
| ARMv8-M Mainline | Cortex-M33 | Microcontroller profile, Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor | Optional cache, no TCM, optional MPU with 16 regions | 1.50 DMIPS/MHz | [22] | |
| Cortex-M35P | Microcontroller profile, Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor | Built-in cache (with option 2–16 KB), I-cache, no TCM, optional MPU with 16 regions | 1.50 DMIPS/MHz | [23] | ||
| ARMv8.1-M Mainline | Cortex-M52 | 1.60 DMIPS/MHz | [24] | |||
| Cortex-M55 | 1.69 DMIPS/MHz | [25] | ||||
| Cortex-M85 | 3.13 DMIPS/MHz | [26] | ||||
| Cortex-R | ARMv7-R | Cortex-R4 | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lockstep with fault logic | 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt. MPU with 8/12 regions | 1.67 DMIPS/MHz[27] | [28] |
| Cortex-R5 | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lock-step with fault logic / optional as 2 independent cores, low-latency peripheral port (LLPP), accelerator coherency port (ACP)[29] | 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt. MPU with 12/16 regions | 1.67 DMIPS/MHz[27] | [30] | ||
| Cortex-R7 | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 11-stage pipeline dual-core running lock-step with fault logic / out-of-order execution / dynamic register renaming / optional as 2 independent cores, low-latency peripheral port (LLPP), ACP[29] | 0–64 KB / 0–64 KB, ? of 0–128 KB TCM, opt. MPU with 16 regions | 2.50 DMIPS/MHz[27] | [31] | ||
| Cortex-R8 | TBD | 0–64 KB / 0–64 KB L1, 0–1 / 0–1 MB TCM, opt MPU with 24 regions | 2.50 DMIPS/MHz[27] | [32] | ||
| ARMv8-R | Cortex-R52 | TBD | 0–32 KB / 0–32 KB L1, 0–1 / 0–1 MB TCM, opt MPU with 24+24 regions | 2.09 DMIPS/MHz | [33] | |
| Cortex-R52+ | TBD | 0–32 KB / 0–32 KB L1, 0–1 / 0–1 MB TCM, opt MPU with 24+24 regions | 2.09 DMIPS/MHz | [34] | ||
| Cortex-R82 | TBD | 16–128 KB / 16–64 KB L1, 64 KB–1 MB L2, 0.16–1 / 0.16–1 MB TCM, opt. MPU with 32+32 regions | 3.41 DMIPS/MHz[35] | [36] | ||
| Cortex-A (32-bit) | ARMv7-A | Cortex-A5 | Application profile, ARM / Thumb / Thumb-2 / DSP / SIMD / Optional VFPv4-D16 FPU / Optional NEON / Jazelle RCT and DBX, 1–4 cores / optional MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 4−64 KB / 4−64 KB L1, MMU + TrustZone | 1.57 DMIPS/MHz per core | [37] |
| Cortex-A7 | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / Jazelle RCT and DBX / Hardware virtualization, in-order execution, superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), architecture and feature set are identical to A15, 8–10 stage pipeline, low-power design[38] | 8−64 KB / 8−64 KB L1, 0–1 MB L2, MMU + TrustZone | 1.9 DMIPS/MHz per core | [39] | ||
| Cortex-A8 | Application profile, ARM / Thumb / Thumb-2 / VFPv3 FPU / NEON / Jazelle RCT and DAC, 13-stage superscalar pipeline | 16–32 KB / 16–32 KB L1, 0–1 MB L2 opt. ECC, MMU + TrustZone | 2.0 DMIPS/MHz | [40] | ||
| Cortex-A9 | Application profile, ARM / Thumb / Thumb-2 / DSP / Optional VFPv3 FPU / Optional NEON / Jazelle RCT and DBX, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 16–64 KB / 16–64 KB L1, 0–8 MB L2 opt. parity, MMU + TrustZone | 2.5 DMIPS/MHz per core, 10,000 DMIPS @ 2 GHz on Performance Optimized TSMC 40G (dual-core) | [41] | ||
| Cortex-A12 | Application profile, ARM / Thumb-2 / DSP / VFPv4 FPU / NEON / Hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 32−64 KB | 3.0 DMIPS/MHz per core | [42] | ||
| Cortex-A15 | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP, 15-24 stage pipeline[38] | 32 KB w/parity / 32 KB w/ECC L1, 0–4 MB L2, L2 has ECC, MMU + TrustZone | At least 3.5 DMIPS/MHz per core (up to 4.01 DMIPS/MHz depending on implementation)[43] | [44] | ||
| Cortex-A17 | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP | 32 KB L1, 256 KB–8 MB L2 w/optional ECC | 2.8 DMIPS/MHz | [45] | ||
| ARMv8-A | Cortex-A32 | Application profile, AArch32, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, dual issue, in-order pipeline | 8–64 KB w/optional parity / 8−64 KB w/optional ECC L1 per core, 128 KB–1 MB L2 w/optional ECC shared | [46] | ||
| Cortex-A (64-bit) | ARMv8-A | Cortex-A34 | Application profile, AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline | 8−64 KB w/parity / 8−64 KB w/ECC L1 per core, 128 KB–1 MB L2 shared, 40-bit physical addresses | | [47] |
| Cortex-A35 | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline | 8−64 KB w/parity / 8−64 KB w/ECC L1 per core, 128 KB–1 MB L2 shared, 40-bit physical addresses | 1.78 DMIPS/MHz | [48] | ||
| Cortex-A53 | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline | 8−64 KB w/parity / 8−64 KB w/ECC L1 per core, 128 KB–2 MB L2 shared, 40-bit physical addresses | 2.3 DMIPS/MHz | [49] | ||
| Cortex-A57 | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width decode superscalar, deeply out-of-order pipeline | 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses | 4.1–4.8 DMIPS/MHz[50][51] | [52] | ||
| Cortex-A72 | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width superscalar, deeply out-of-order pipeline | 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses | 6.3-7.3 DMIPS/MHz[53] | [54] | ||
| Cortex-A73 | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width superscalar, deeply out-of-order pipeline | 64 KB / 32−64 KB L1 per core, 256 KB–8 MB L2 shared w/ optional ECC, 44-bit physical addresses | 7.4-8.5 DMIPS/MHz[53] | [55] | ||
| ARMv8.2-A | Cortex-A55 | Application profile, AArch32 and AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline[56] | 16−64 KB / 16−64 KB L1, 256 KB L2 per core, 4 MB L3 shared | 3 DMIPS/MHz[53] | [57] | |
| Cortex-A65 | Application profile, AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-wide decode superscalar, 3-width issue, out-of-order pipeline, SMT | [58] | ||||
| Cortex-A65AE | As ARM Cortex-A65, adds dual core lockstep for safety applications | 64 / 64 KB L1, 256 KB L2 per core, 4 MB L3 shared | [59] | |||
| Cortex-A75 | Application profile, AArch32 and AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width decode superscalar, deeply out-of-order pipeline[60] | 64 / 64 KB L1, 512 KB L2 per core, 4 MB L3 shared | 8.2-9.5 DMIPS/MHz[53] | [61] | ||
| Cortex-A76 | Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 8-way issue, 13 stage pipeline, deeply out-of-order pipeline[62] | 64 / 64 KB L1, 256−512 KB L2 per core, 512 KB−4 MB L3 shared | 10.7-12.4 DMIPS/MHz[53] | [63] | ||
| Cortex-A76AE | As ARM Cortex-A76, adds dual core lockstep for safety applications | [64] | ||||
| Cortex-A77 | Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 6-width instruction fetch, 12-way issue, 13 stage pipeline, deeply out-of-order pipeline[62] | 1.5K L0 MOPs cache, 64 / 64 KB L1, 256−512 KB L2 per core, 512 KB−4 MB L3 shared | 13-16 DMIPS/MHz[65] | [66] | ||
| Cortex-A78 | [67] | |||||
| Cortex-A78AE | As ARM Cortex-A78, adds dual core lockstep for safety applications | [68] | ||||
| Cortex-A78C | [69] | |||||
| ARMv9-A | Cortex-A510 | [70] | ||||
| Cortex-A710 | [71] | |||||
| Cortex-A715 | [72] | |||||
| ARMv9.2-A | Cortex-A320 | [73] | ||||
| Cortex-A520 | [74] | |||||
| Cortex-A720 | [75] | |||||
| Cortex-A725 | [76] | |||||
| Cortex-X | ARMv8.2-A | Cortex-X1 | Performance-tuned variant of Cortex-A78 | |||
| ARMv9-A | Cortex-X2 | 64 / 64 KB L1, 512–1024 KiB L2 per core, 512 KiB–8 MiB L3 shared | [77] | |||
| Cortex-X3 | 64 / 64 KB L1, 512–2048 KiB L2 per core, 512 KiB–16 MiB L3 shared | [78] | ||||
| ARMv9.2-A | Cortex-X4 | 64 / 64 KB L1, 512–2048 KiB L2 per core, 512 KiB–32 MiB L3 shared | [79] | |||
| Cortex-X925 | [80] | |||||
| Neoverse | ARMv8.2-A | Neoverse N1 | Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 8-way dispatch/issue, 13 stage pipeline, deeply out-of-order pipeline[62] | 64 / 64 KB L1, 512−1024 KB L2 per core, 2−128 MB L3 shared, 128 MB system level cache | [81] | |
| Neoverse E1 | Application profile, AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-wide decode superscalar, 3-width issue, 10 stage pipeline, out-of-order pipeline, SMT | 32−64 KB / 32−64 KB L1, 256 KB L2 per core, 4 MB L3 shared | [82] | |||
| ARMv8.4-A | Neoverse V1 | [83] | ||||
| ARMv9-A | Neoverse N2 | [84] | ||||
| Neoverse V2 | [85] | |||||
| ARMv9.2-A | Neoverse N3 | [86] | ||||
| Neoverse V3 | [87] | |||||
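The "Typical MIPS @ MHz" column mixes absolute Dhrystone MIPS figures with per-clock DMIPS/MHz ratings; converting between the two is simple multiplication. A minimal sketch (the function name is illustrative; the Cortex-A9 numbers are taken from the table above):

```python
def dmips(rating_dmips_per_mhz: float, clock_mhz: float, cores: int = 1) -> float:
    """Absolute Dhrystone MIPS from a per-clock rating, a clock speed, and a core count."""
    return rating_dmips_per_mhz * clock_mhz * cores

# Cortex-A9: 2.5 DMIPS/MHz per core at 2 GHz, dual-core
print(dmips(2.5, 2000, cores=2))  # 10000.0, matching the quoted 10,000 DMIPS @ 2 GHz
```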
Designed by third parties
These cores implement the ARM instruction set and were developed independently by companies holding an architectural license from ARM.
| Product family | ARM architecture | Processor | Feature | Cache (I / D), MMU | Typical MIPS @ MHz |
|---|---|---|---|---|---|
| StrongARM (Digital) | ARMv4 | SA-110 | 5-stage pipeline | 16 KB / 16 KB, MMU | 100–233 MHz, 1.0 DMIPS/MHz |
| SA-1100 | Derivative of the SA-110 | 16 KB / 8 KB, MMU | | | |
| Faraday[88] (Faraday Technology) | ARMv4 | FA510 | 6-stage pipeline | Up to 32 KB / 32 KB cache, MPU | 1.26 DMIPS/MHz, 100–200 MHz |
| FA526 | Up to 32 KB / 32 KB cache, MMU | 1.26 MIPS/MHz 166–300 MHz | |||
| FA626 | 8-stage pipeline | 32 KB / 32 KB cache, MMU | 1.35 DMIPS/MHz 500 MHz | ||
| ARMv5TE | FA606TE | 5-stage pipeline | No cache, no MMU | 1.22 DMIPS/MHz 200 MHz | |
| FA626TE | 8-stage pipeline | 32 KB / 32 KB cache, MMU | 1.43 MIPS/MHz 800 MHz | ||
| FMP626TE | 8-stage pipeline, SMP | 1.43 MIPS/MHz 500 MHz | |||
| FA726TE | 13 stage pipeline, dual issue | 2.4 DMIPS/MHz 1000 MHz | |||
| XScale (Intel / Marvell) | ARMv5TE | XScale | 7-stage pipeline, Thumb, enhanced DSP instructions | 32 KB / 32 KB, MMU | 133–400 MHz |
| Bulverde | Wireless MMX, wireless SpeedStep added | 32 KB / 32 KB, MMU | 312–624 MHz | ||
| Monahans[89] | Wireless MMX2 added | 32 KB / 32 KB L1, optional L2 cache up to 512 KB, MMU | Up to 1.25 GHz | ||
| Sheeva (Marvell) | ARMv5 | Feroceon | 5–8 stage pipeline, single-issue | 16 KB / 16 KB, MMU | 600–2000 MHz |
| Jolteon | 5–8 stage pipeline, dual-issue | 32 KB / 32 KB, MMU | |||
| PJ1 (Mohawk) | 5–8 stage pipeline, single-issue, Wireless MMX2 | 32 KB / 32 KB, MMU | 1.46 DMIPS/MHz 1.06 GHz | ||
| ARMv6 / ARMv7-A | PJ4 | 6–9 stage pipeline, dual-issue, Wireless MMX2, SMP | 32 KB / 32 KB, MMU | 2.41 DMIPS/MHz 1.6 GHz | |
| Snapdragon (Qualcomm) | ARMv7-A | Scorpion[90] | 1 or 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv3 FPU / NEON (128-bit wide) | 256 KB L2 per core | 2.1 DMIPS/MHz per core |
| Krait[90] | 1, 2, or 4 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON (128-bit wide) | 4 KB / 4 KB L0, 16 KB / 16 KB L1, 512 KB L2 per core | 3.3 DMIPS/MHz per core | ||
| ARMv8-A | Kryo[91] | 4 cores. | ? | Up to 2.2 GHz (6.3 DMIPS/MHz) | |
| A series (Apple) | ARMv7-A | Swift[92] | 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON | L1: 32 KB / 32 KB, L2: 1 MB shared | 3.5 DMIPS/MHz per core |
| ARMv8-A | Cyclone[93] | 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64. Out-of-order, superscalar. | L1: 64 KB / 64 KB, L2: 1 MB shared, SLC: 4 MB | 1.3 or 1.4 GHz | |
| ARMv8-A | Typhoon[93][94] | 2 or 3 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64 | L1: 64 KB / 64 KB, L2: 1 MB or 2 MB shared, SLC: 4 MB | 1.4 or 1.5 GHz | |
| ARMv8-A | Twister[95] | 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64 | L1: 64 KB / 64 KB, L2: 2 MB shared, SLC: 4 MB or 0 MB | 1.85 or 2.26 GHz | |
| ARMv8-A | Hurricane and Zephyr[96] | Hurricane: 2 or 3 cores. AArch64, out-of-order, superscalar, 6-decode, 6-issue, 9-wide. Zephyr: 2 or 3 cores. AArch64, out-of-order, superscalar. | Hurricane: L1: 64 KB / 64 KB, L2: 3 MB or 8 MB shared. Zephyr: L1: 32 KB / 32 KB, L2: none. SLC: 4 MB or 0 MB | 2.34 or 2.38 GHz / 1.05 GHz | |
| ARMv8.2-A | Monsoon and Mistral[97] | Monsoon: 2 cores. AArch64, out-of-order, superscalar, 7-decode, ?-issue, 11-wide. Mistral: 4 cores. AArch64, out-of-order, superscalar. Based on Swift. | Monsoon: L1I: 128 KB, L1D: 64 KB, L2: 8 MB shared. Mistral: L1: 32 KB / 32 KB, L2: 1 MB shared. SLC: 4 MB | 2.39 GHz / 1.70 GHz | |
| ARMv8.3-A | Vortex and Tempest[98] | Vortex: 2 or 4 cores. AArch64, out-of-order, superscalar, 7-decode, ?-issue, 11-wide. Tempest: 4 cores. AArch64, out-of-order, superscalar, 3-decode. Based on Swift. | Vortex: L1: 128 KB / 128 KB, L2: 8 MB shared. Tempest: L1: 32 KB / 32 KB, L2: 2 MB shared. SLC: 8 MB | 2.49 GHz / 1.59 GHz | |
| ARMv8.4-A | Lightning and Thunder[99] | Lightning: 2 cores. AArch64, out-of-order, superscalar, 7-decode, ?-issue, 11-wide. Thunder: 4 cores. AArch64, out-of-order, superscalar. | Lightning: L1: 128 KB / 128 KB, L2: 8 MB shared. Thunder: L1: 32 KB / 48 KB, L2: 4 MB shared. SLC: 16 MB | 2.66 GHz / 1.73 GHz | |
| ARMv8.5-A | Firestorm and Icestorm[100] | Firestorm: 2 cores. AArch64, out-of-order, superscalar, 8-decode, ?-issue, 14-wide. Icestorm: 4 cores. AArch64, out-of-order, superscalar, 4-decode, ?-issue, 7-wide. | Firestorm: L1: 192 KB / 128 KB, L2: 8 MB shared. Icestorm: L1: 128 KB / 64 KB, L2: 4 MB shared. SLC: 16 MB | 3.0 GHz / 1.82 GHz | |
| ARMv8.6-A | Avalanche and Blizzard | Avalanche: 2 cores. AArch64, out-of-order, superscalar, 8-decode, ?-issue, 14-wide. Blizzard: 4 cores. AArch64, out-of-order, superscalar, 4-decode, ?-issue, 8-wide. | Avalanche: L1: 192 KB / 128 KB, L2: 12 MB shared. Blizzard: L1: 128 KB / 64 KB, L2: 4 MB shared. SLC: 32 MB | 2.93 or 3.23 GHz / 2.02 GHz | |
| ARMv8.6-A | Everest and Sawtooth | Everest: 2 cores. AArch64, out-of-order, superscalar, 8-decode, ?-issue, 14-wide. Sawtooth: 4 cores. AArch64, out-of-order, superscalar, 4-decode, ?-issue, 8-wide. | Everest: L1: 192 KB / 128 KB, L2: 16 MB shared. Sawtooth: L1: 128 KB / 64 KB, L2: 4 MB shared. SLC: 24 MB | 3.46 GHz / 2.02 GHz | |
| ARMv8.6-A | Apple A17 Pro | P-cores: 2 cores. AArch64, out-of-order, superscalar, 8-decode, ?-issue, 14-wide. E-cores: 4 cores. AArch64, out-of-order, superscalar, 4-decode, ?-issue, 8-wide. | P-cores: L1: 192 KB / 128 KB, L2: 16 MB shared. E-cores: L1: 128 KB / 64 KB, L2: 4 MB shared. SLC: 24 MB | 3.78 GHz / 2.11 GHz | |
| M series (Apple) | ARMv8.5-A | Firestorm and Icestorm | Firestorm: 4, 6, 8 or 16 cores. AArch64, out-of-order, superscalar, 8-decode, 8-issue, 14-wide. Icestorm: 2 or 4 cores. AArch64, out-of-order, superscalar, 4-decode, 4-issue, 7-wide. | Firestorm: L1: 192 KB / 128 KB, L2: 12, 24 or 48 MB shared. Icestorm: L1: 128 KB / 64 KB, L2: 4 or 8 MB shared. SLC: 8, 24, 48 or 96 MB | 3.2–3.23 GHz / 2.06 GHz |
| ARMv8.6-A | Avalanche and Blizzard | Avalanche: 4, 6, 8 or 16 cores. AArch64, out-of-order, superscalar, 8-decode, 8-issue, 14-wide. Blizzard: 4 or 8 cores. AArch64, out-of-order, superscalar, 4-decode, 4-issue, 8-wide. | Avalanche: L1: 192 KB / 128 KB, L2: 16, 32 or 64 MB shared. Blizzard: L1: 128 KB / 64 KB, L2: 4 or 8 MB shared. SLC: 8, 24, 48 or 96 MB | 3.49 GHz / 2.42 GHz | |
| ARMv8.6-A | Apple M3 | P-cores: 4, 5, 6, 10, 12 or 16 cores. AArch64, out-of-order, superscalar, 9-decode, 9-issue, 14-wide. E-cores: 4 or 6 cores. AArch64, out-of-order, superscalar, 5-decode, 5-issue, 8-wide. | P-cores: L1: 192 KB / 128 KB, L2: 16, 32 or 64 MB shared. E-cores: L1: 128 KB / 64 KB, L2: 4 or 8 MB shared. SLC: 8, 24, 48 or 96 MB | 4.05 GHz / 2.75 GHz | |
| ARMv9.2-A | Apple M4 | P-cores: 3 or 4 cores. AArch64, out-of-order, superscalar, 10-decode, 10-issue, 16-wide. E-cores: 6 cores. AArch64, out-of-order, superscalar, 5-decode, 5-issue, 8-wide. | P-cores: L1: 192 KB / 128 KB, L2: 16, 32 or 64 MB shared. E-cores: L1: 128 KB / 64 KB, L2: 4 or 8 MB shared. SLC: 8, 24, 48 or 96 MB | 4.40 GHz / 2.85 GHz | |
| X-Gene (Applied Micro) | ARMv8-A | X-Gene | 64-bit, quad issue, SMP, 64 cores[101] | Cache, MMU, virtualization | 3 GHz (4.2 DMIPS/MHz per core) |
| Denver (Nvidia) | ARMv8-A | Denver[102][103] | 2 cores. AArch64, 7-wide superscalar, in-order, dynamic code optimization, 128 MB optimization cache; Denver 1: 28 nm, Denver 2: 16 nm | 128 KB I-cache / 64 KB D-cache | Up to 2.5 GHz |
| Carmel (Nvidia) | ARMv8.2-A | Carmel[104][105] | 2 cores. AArch64, 10-wide superscalar, in-order, dynamic code optimization, ? MB optimization cache, functional safety, dual execution, parity & ECC | ? KB I-cache / ? KB D-cache | Up to ? GHz |
| ThunderX (Cavium) | ARMv8-A | ThunderX | 64-bit, two models with 8–16 or 24–48 cores (×2 with two chips) | ? | Up to 2.2 GHz |
| K12 (AMD) | ARMv8-A | K12[106] | ? | ? | ? |
| Exynos (Samsung) | ARMv8-A | M1 ("Mongoose")[107] | 4 cores. AArch64, 4-wide, quad-issue, superscalar, out-of-order | 64 KB I-cache / 32 KB D-cache, L2: 16-way shared 2 MB | 5.1 DMIPS/MHz (2.6 GHz) |
| ARMv8-A | M2 ("Mongoose") | 4 cores. AArch64, 4-wide, quad-issue, superscalar, out-of-order | 64 KB I-cache / 32 KB D-cache, L2: 16-way shared 2 MB | 2.3 GHz | |
| ARMv8-A | M3 ("Meerkat")[108] | 4 cores, AArch64, 6-decode, 6-issue, 6-wide superscalar, out-of-order | 64 KB I-cache / 64 KB D-cache, L2: 8-way private 512 KB, L3: 16-way shared 4 MB | 2.7 GHz | |
| ARMv8.2-A | M4 ("Cheetah")[109] | 2 cores, AArch64, 6-decode, 6-issue, 6-wide superscalar, out-of-order | 64 KB I-cache / 64 KB D-cache, L2: 8-way private 1 MB, L3: 16-way shared 3 MB | 2.73 GHz | |
| ARMv8.2-A | M5 ("Lion") | 2 cores, AArch64, 6-decode, 6-issue, 6-wide superscalar, out-of-order | 64 KB I-cache / 64 KB D-cache, L2: 8-way shared 2 MB, L3: 12-way shared 3 MB | 2.73 GHz |
Timeline
The following table lists each core by the year it was announced.[110][111]
ARM Classic
| Year | ARM1-3 | ARM6 | ARM7 | ARM8 | ARM9 | ARM10 | ARM11 |
|---|---|---|---|---|---|---|---|
| 1985 | ARM1 | | | | | | |
| 1986 | ARM2 | | | | | | |
| 1989 | ARM3 | | | | | | |
| 1992 | ARM250 | | | | | | |
| 1993 | | ARM60, ARM610 | ARM700 | | | | |
| 1994 | | | ARM710, ARM7DI, ARM7TDMI | | | | |
| 1995 | | | ARM710a | | | | |
| 1996 | | | | ARM810 | | | |
| 1997 | | | ARM710T, ARM720T, ARM740T | | | | |
| 1998 | | | | | ARM9TDMI, ARM940T | | |
| 1999 | | | | | ARM9E-S, ARM966E-S | | |
| 2000 | | | | | ARM920T, ARM922T, ARM946E-S | ARM1020T | |
| 2001 | | | ARM7EJ-S, ARM7TDMI-S | | ARM9EJ-S, ARM926EJ-S | ARM1020E, ARM1022E | |
| 2002 | | | | | | ARM1026EJ-S | ARM1136J(F)-S |
| 2003 | | | | | ARM968E-S | | ARM1156T2(F)-S, ARM1176JZ(F)-S |
| 2004 | | | | | | | |
| 2005 | | | | | | | ARM11MPCore |
| 2006 | | | | | ARM996HS | | |
ARM Cortex / Neoverse
| Year | Microcontroller (Cortex-M) | Real-time (Cortex-R) | Application (Cortex-A, 32-bit) | Application (Cortex-A, 64-bit) | Application (Cortex-X, 64-bit) | Application (Neoverse, 64-bit) |
|---|---|---|---|---|---|---|
| 2004 | Cortex-M3 | | | | | |
| 2005 | | | Cortex-A8 | | | |
| 2006 | | | | | | |
| 2007 | Cortex-M1 | | Cortex-A9 | | | |
| 2008 | | | | | | |
| 2009 | Cortex-M0 | | Cortex-A5 | | | |
| 2010 | Cortex-M4(F) | | Cortex-A15 | | | |
| 2011 | | Cortex-R4(F), Cortex-R5(F), Cortex-R7(F) | Cortex-A7 | | | |
| 2012 | Cortex-M0+ | | | Cortex-A53, Cortex-A57 | | |
| 2013 | | | Cortex-A12 | | | |
| 2014 | Cortex-M7(F) | | Cortex-A17 | | | |
| 2015 | | | | Cortex-A35, Cortex-A72 | | |
| 2016 | Cortex-M23, Cortex-M33(F) | Cortex-R8(F), Cortex-R52(F) | Cortex-A32 | Cortex-A73 | | |
| 2017 | | | | Cortex-A55, Cortex-A75 | | |
| 2018 | Cortex-M35P(F) | | | Cortex-A65, Cortex-A65AE, Cortex-A76, Cortex-A76AE | | |
| 2019 | | | | Cortex-A34, Cortex-A77 | | Neoverse E1, Neoverse N1 |
| 2020 | Cortex-M55(F) | Cortex-R82(F) | | Cortex-A78, Cortex-A78AE, Cortex-A78C | Cortex-X1[112] | Neoverse V1[113] |
| 2021 | | | | Cortex-A510, Cortex-A710 | Cortex-X2 | Neoverse E2, Neoverse N2 |
| 2022 | Cortex-M85(F) | Cortex-R52+(F) | | Cortex-A715 | Cortex-X3 | Neoverse V2 |
| 2023 | Cortex-M52(F) | | | Cortex-A520, Cortex-A720 | Cortex-X4 | Neoverse E3, Neoverse N3 |
| 2024 | | Cortex-R82AE | | Cortex-A520AE, Cortex-A720AE, Cortex-A725 | Cortex-X925 | Neoverse V3, Neoverse V3AE, Neoverse VN |
| 2025 | | | | Cortex-A320, Cortex-A530, Cortex-A730 | Cortex-X930 | Neoverse E4, Neoverse N4, Neoverse V4 |
See also
References
- ^ "ARM Powered Standard Products" (PDF). 2005. Archived from the original (PDF) on 20 October 2017. Retrieved 23 December 2017.
- ^ ARM Ltd and ARM Germany GmbH. "Device Database". Keil. Archived from the original on 10 January 2011. Retrieved 6 January 2011.
- ^ "Processors". ARM. 2011. Archived from the original on 17 January 2011. Retrieved 6 January 2011.
- ^ "ARM610 Datasheet" (PDF). ARM Holdings. August 1993. Retrieved 29 January 2019.
- ^ "ARM710 Datasheet" (PDF). ARM Holdings. July 1994. Retrieved 29 January 2019.
- ^ ARM Holdings (7 August 1996). "ARM810 – Dancing to the Beat of a Different Drum" (PDF). Hot Chips. Archived (PDF) from the original on 24 December 2018. Retrieved 14 November 2018.
- ^ "VLSI Technology Now Shipping ARM810". EE Times. 26 August 1996. Archived from the original on 26 September 2013. Retrieved 21 September 2013.
- ^ "Register 13, FCSE PID register", ARM920T Technical Reference Manual. Archived 7 July 2011 at the Wayback Machine.
- ^ "ARM1136J(F)-S – ARM Processor". Arm.com. Archived from the original on 21 March 2009. Retrieved 18 April 2009.
- ^ "ARM1156 Processor". Arm Holdings. Archived from the original on 13 February 2010.
- ^ "ARM11 Processor Family". ARM. Archived from the original on 15 January 2011. Retrieved 12 December 2010.
- ^ a b c "Cortex-M0/M0+/M1 Instruction set; ARM Holding". Archived from the original on 18 April 2013.
- ^ "Cortex-M0". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-M0+". Arm Developer. Retrieved 23 September 2020.
- ^ "ARM Extends Cortex Family with First Processor Optimized for FPGA" (Press release). ARM Holdings. 19 March 2007. Archived from the original on 5 May 2007. Retrieved 11 April 2007.
- ^ "ARM Cortex-M1". ARM product website. Archived from the original on 1 April 2007. Retrieved 11 April 2007.
- ^ "Cortex-M1". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-M3". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-M4". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-M7". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-M23". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-M33". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-M35P". Arm Developer. Archived from the original on 8 May 2019. Retrieved 29 April 2019.
- ^ "Cortex-M52". Arm Developer. Retrieved 23 November 2023.
- ^ "Cortex-M55". Arm Developer. Retrieved 28 September 2020.
- ^ "Cortex-M85". Arm Developer. Retrieved 7 July 2022.
- ^ a b c d "Cortex-R – Arm Developer". ARM Developer. Arm Ltd. Archived from the original on 30 March 2018. Retrieved 29 March 2018.
- ^ "Cortex-R4". Arm Developer. Retrieved 23 September 2020.
- ^ a b "Cortex-R5 & Cortex-R7 Press Release; ARM Holdings; 31 January 2011". Archived from the original on 7 July 2011. Retrieved 13 June 2011.
- ^ "Cortex-R5". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-R7". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-R8". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-R52". Arm Developer. Archived from the original on 23 November 2023. Retrieved 23 November 2023.
- ^ "Cortex-R52". Arm Developer. Archived from the original on 23 November 2023. Retrieved 23 November 2023.
- ^ "Cortex-R82". Arm Developer. Retrieved 30 September 2020.
- ^ "Arm Cortex-R comparison Table_v2" (PDF). ARM Developer. 2020. Retrieved 30 September 2020.
- ^ "Cortex-A5". Arm Developer. Retrieved 23 September 2020.
- ^ a b "Deep inside ARM's new Intel killer". The Register. 20 October 2011. Archived from the original on 10 August 2017. Retrieved 10 August 2017.
- ^ "Cortex-A7". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-A8". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-A9". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-A12 Summary; ARM Holdings". Archived from the original on 7 June 2013. Retrieved 3 June 2013.
- ^ "Exclusive : ARM Cortex-A15 "40 Per Cent" Faster Than Cortex-A9 | ITProPortal.com". Archived from the original on 21 July 2011. Retrieved 13 June 2011.
- ^ "Cortex-A15". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-A17". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-A32". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-A34". Arm Developer. Retrieved 11 October 2019.
- ^ "Cortex-A35". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-A53". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-Ax vs performance". Archived from the original on 15 June 2017. Retrieved 5 May 2017.
- ^ "Relative Performance of ARM Cortex-A 32-bit and 64-bit Cores". 9 April 2015. Archived from the original on 1 May 2017. Retrieved 5 May 2017.
- ^ "Cortex-A57". Arm Developer. Retrieved 23 September 2020.
- ^ a b c d e Sima, Dezső (November 2018). "ARM's processor lines" (PDF). University of Óbuda, Neumann Faculty. Retrieved 26 May 2022.
- ^ "Cortex-A72". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-A73". Arm Developer. Retrieved 23 September 2020.
- ^ "Hardware.Info Nederland". nl.hardware.info (in Dutch). Archived from the original on 24 December 2018. Retrieved 27 November 2017.
- ^ "Cortex-A55". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-A65". Arm Developer. Retrieved 3 October 2020.
- ^ "Cortex-A65AE". Arm Developer. Retrieved 11 October 2019.
- ^ "Hardware.Info Nederland". nl.hardware.info (in Dutch). Archived from the original on 24 December 2018. Retrieved 27 November 2017.
- ^ "Cortex-A75". Arm Developer. Retrieved 23 September 2020.
- ^ a b c "Arm's Cortex-A76 CPU Unveiled: Taking Aim at the Top for 7nm". AnandTech. Archived from the original on 16 November 2018. Retrieved 15 November 2018.
- ^ "Cortex-A76". Arm Developer. Retrieved 23 September 2020.
- ^ "Cortex-A76AE". Arm Developer. Retrieved 29 September 2020.
- ^ According to ARM, the Cortex-A77 has a 20% IPC single-thread performance improvement over its predecessor in Geekbench 4, 23% in SPECint2006, 35% in SPECfp2006, 20% in SPECint2017, and 25% in SPECfp2017.
- ^ "Cortex-A77". Arm Developer. Retrieved 16 June 2019.
- ^ "Cortex-A78". Arm Developer. Retrieved 29 September 2020.
- ^ "Cortex-A78AE". Arm Developer. Retrieved 30 September 2020.
- ^ "Cortex-A78C". Arm Developer. Retrieved 26 November 2020.
- ^ "Cortex-A510". developer.arm.com. Retrieved 11 October 2024.
- ^ "First Armv9 Cortex CPUs for Consumer Compute". community.arm.com. Retrieved 24 August 2021.
- ^ "Cortex-A715". developer.arm.com. Retrieved 11 October 2024.
- ^ "Cortex-A320". developer.arm.com. Retrieved 26 February 2025.
- ^ "Cortex-A520". developer.arm.com. Retrieved 11 October 2024.
- ^ "Cortex-A720". developer.arm.com. Retrieved 11 October 2024.
- ^ "Cortex-A725". developer.arm.com. Retrieved 11 October 2024.
- ^ "Cortex-X2". developer.arm.com. Retrieved 11 October 2024.
- ^ "Cortex-X3". developer.arm.com. Retrieved 11 October 2024.
- ^ "Cortex-X4". developer.arm.com. Retrieved 11 October 2024.
- ^ "Cortex-X925". developer.arm.com. Retrieved 11 October 2024.
- ^ "Neoverse N1". Arm Developer. Retrieved 16 June 2019.
- ^ "Neoverse E1". Arm Developer. Retrieved 3 October 2020.
- ^ "Neoverse V1". developer.arm.com. Retrieved 30 August 2022.
- ^ "Neoverse N2". developer.arm.com. Retrieved 30 August 2022.
- ^ "Neoverse V2". developer.arm.com. Retrieved 8 May 2022.
- ^ "Neoverse N3". developer.arm.com. Retrieved 8 May 2024.
- ^ "Neoverse V3". developer.arm.com. Retrieved 8 May 2022.
- ^ "Processor Cores". Faraday Technology. Archived from the original on 19 February 2015. Retrieved 19 February 2015.
- ^ "3rd Generation Intel XScale Microarchitecture: Developer's Manual" (PDF). download.intel.com. Intel. May 2007. Archived (PDF) from the original on 25 February 2008. Retrieved 2 December 2010.
- ^ a b "Qualcomm's New Snapdragon S4: MSM8960 & Krait Architecture Explored". AnandTech. Archived from the original on 9 October 2011. Retrieved 23 September 2020.
- ^ "Snapdragon 820 and Kryo CPU: heterogeneous computing and the role of custom compute". Qualcomm. 2 September 2015. Archived from the original on 5 September 2015. Retrieved 6 September 2015.
- ^ Lal Shimpi, Anand (15 September 2012). "The iPhone 5's A6 SoC: Not A15 or A9, a Custom Apple Core Instead". AnandTech. Archived from the original on 15 September 2012. Retrieved 15 September 2012.
- ^ a b Smith, Ryan (11 November 2014). "Apple A8X's GPU - GAX6850, Even Better Than I Thought". AnandTech. Archived from the original on 30 November 2014. Retrieved 29 November 2014.
- ^ Chester, Brandon (15 July 2015). "Apple Refreshes The iPod Touch With A8 SoC And New Cameras". AnandTech. Archived from the original on 5 September 2015. Retrieved 11 September 2015.
- ^ Ho, Joshua (28 September 2015). "iPhone 6s and iPhone 6s Plus Preliminary Results". AnandTech. Archived from the original on 26 May 2016. Retrieved 18 December 2015.
- ^ Ho, Joshua (28 September 2015). "The iPhone 7 and iPhone 7 Plus Review". AnandTech. Archived from the original on 14 September 2017. Retrieved 14 September 2017.
- ^ "A11 Bionic - Apple". WikiChip. Retrieved 1 February 2019.
- ^ "The iPhone XS & XS Max Review: Unveiling the Silicon Secrets". AnandTech. Archived from the original on 12 February 2019. Retrieved 11 February 2019.
- ^ Frumusanu, Andrei. "The Apple iPhone 11, 11 Pro & 11 Pro Max Review: Performance, Battery, & Camera Elevated". AnandTech. Archived from the original on 16 October 2019. Retrieved 20 October 2019.
- ^ Frumusanu, Andrei. "The iPhone 12 & 12 Pro Review: New Design and Diminishing Returns". AnandTech. Archived from the original on 30 November 2020. Retrieved 5 April 2021.
- ^ "AppliedMicro's 64-core chip could spark off ARM core war copy". 12 August 2014. Archived from the original on 21 August 2014. Retrieved 21 August 2014.
- ^ "NVIDIA Denver Hot Chips Disclosure". Archived from the original on 5 December 2014. Retrieved 29 November 2014.
- ^ "Mile High Milestone: Tegra K1 "Denver" Will Be First 64-bit ARM Processor for Android". Archived from the original on 12 August 2014. Retrieved 29 November 2014.
- ^ "Drive Xavier für autonome Autos wird ausgeliefert" (in German). Archived from the original on 5 March 2018. Retrieved 5 March 2018.
- ^ "NVIDIA Drive Xavier SOC Detailed – A Marvel of Engineering, Biggest and Most Complex SOC Design To Date With 9 Billion Transistors". 8 January 2018. Archived from the original on 24 February 2018. Retrieved 5 March 2018.
- ^ "AMD Announces K12 Core: Custom 64-bit ARM Design in 2016". Archived from the original on 26 June 2015. Retrieved 26 June 2015.
- ^ "Samsung Announces Exynos 8890 with Cat.12/13 Modem and Custom CPU". AnandTech. Archived from the original on 25 January 2016. Retrieved 23 September 2020.
- ^ "Hot Chips 2018: Samsung's Exynos-M3 CPU Architecture Deep Dive". AnandTech. Archived from the original on 20 August 2018. Retrieved 20 August 2018.
- ^ "ISCA 2020: Evolution of the Samsung Exynos CPU Microarchitecture". AnandTech. 3 June 2020. Archived from the original on 3 June 2020. Retrieved 27 December 2021.
- ^ "ARM Company Milestones". Archived from the original on 28 March 2014. Retrieved 6 April 2014.
- ^ "ARM Press Releases". Archived from the original on 9 April 2014. Retrieved 6 April 2014.
- ^ "Arm's New Cortex-A78 and Cortex-X1 Microarchitectures: An Efficiency and Performance Divergence". Anandtech. Archived from the original on 26 May 2020. Retrieved 15 April 2021.
- ^ "Arm Announces Neoverse V1 & N2 Infrastructure CPUs: +50% IPC, SVE Server Cores". Anandtech. 22 September 2020. Archived from the original on 22 September 2020. Retrieved 15 April 2021.
- Cortex-A series: High-performance application processors for smartphones, tablets, and laptops, supporting advanced features like simultaneous multithreading (SMT) and Armv9's Scalable Vector Extension (SVE2) for AI and machine learning workloads. In 2025, Arm began rebranding its processor lines, moving away from the traditional Cortex nomenclature.[6][7]
- Cortex-M series: Ultra-low-power microcontrollers for IoT, wearables, and industrial sensors, emphasizing deterministic execution and energy efficiency in devices like smart home gadgets.[8]
- Cortex-R series: Real-time processors for automotive, networking, and storage systems, providing low-latency response critical for safety-critical applications such as engine control units.[6]
- Neoverse series: Infrastructure-grade cores for servers and cloud computing, optimized for high-throughput tasks in data centers, with models like Neoverse V2 delivering scalable performance for hyperscale environments.[6]
Instruction Set Architectures
Early Architectures (ARMv1 to ARMv6)
The ARM architecture originated with ARMv1 in 1985, establishing a foundational 32-bit reduced instruction set computing (RISC) design optimized for low power and efficiency in embedded systems. It employed a load/store architecture, where data processing instructions operated only on registers, while loads and stores handled memory access. The processor featured a 3-stage pipeline consisting of fetch, decode, and execute stages, enabling simple yet effective instruction throughput. At its core were 16 visible 32-bit general-purpose registers (R0–R12 for general use, R13 as stack pointer, R14 as link register, and R15 as program counter), drawn from a total of 37 banked registers to support context switching. Condition flags for negative (N), zero (Z), carry (C), and overflow (V) were maintained in the Current Program Status Register (CPSR). The architecture supported multiple execution modes—User for unprivileged code, Supervisor for operating system tasks, IRQ for interrupt requests, and FIQ for fast interrupts—to manage privilege levels and exceptions securely. A key innovation was the barrel shifter integrated into arithmetic logic unit (ALU) operations, allowing efficient single-cycle shifts, rotates, or multiplies by powers of two on operands. Both big-endian and little-endian byte ordering were configurable, providing flexibility for diverse system requirements.[12] ARMv2, introduced in 1986, built upon this foundation by adding multiply and multiply-accumulate instructions, which accelerated integer computations essential for early signal processing tasks, along with coprocessor interface support for extending functionality via external units. The register set, pipeline, modes, and barrel shifter remained consistent, preserving backward compatibility while enhancing performance. 
ARMv3, released in 1990, advanced the design to a full 32-bit address space from the prior 26-bit limitation, improving exception handling with added modes: Abort for memory faults and Undefined for unimplemented instructions. It refined load/store multiple (LDM/STM) instructions for efficient block transfers, such as stack operations, and strengthened interworking between modes via banked registers. These changes solidified ARM's suitability for more complex embedded applications without altering the core RISC principles.[12] By 1994, ARMv4 introduced halfword (16-bit) load and store instructions to better handle mixed data sizes, alongside a new privileged System mode, which shares the User-mode registers, for low-level system access. The T variant (ARMv4T) marked a significant evolution with the Thumb instruction set, a 16-bit compressed subset of the 32-bit ARM instructions, aimed at reducing code size by up to 30% in memory-constrained environments like early mobile devices; interworking between ARM and Thumb modes was seamless via branch instructions. Endianness support was formalized with BE-32 for big-endian compatibility. Pipeline designs evolved to 3–5 stages in implementations, balancing performance and complexity.[12] ARMv5, launched in 1999, emphasized digital signal processing (DSP) and embedded Java with its TE variants: the E extension added DSP instructions like saturated arithmetic, single-instruction multiple-data (SIMD)-like 16-bit multiplies, and packing/unpacking operations for media handling. The J extension introduced Jazelle DBX, enabling direct execution of Java bytecodes in hardware for faster virtual machine performance. These built on the Thumb mode with improved interworking and added instructions like load/store doubleword (LDRD/STRD) for paired accesses. Condition flags expanded in E variants to include a Q flag for saturation overflow.
The architecture retained the banked register model and barrel shifter, with pipelines scaling to 5 stages or more in advanced designs.[12] ARMv6, introduced in 2002 with the ARM11 family, further optimized for multimedia and real-time embedded use by incorporating integer SIMD extensions for parallel byte and halfword operations, such as signed/unsigned saturating additions and multiplies, which boosted media processing efficiency without dedicated vector units. It added configurable unaligned access support for words and halfwords, eliminating software workarounds for misaligned data common in packed structures. The Vector Floating Point (VFP) coprocessor was integrated as an optional extension, providing single- and double-precision floating-point operations with 32 single-precision registers (or 16 double-precision equivalents). Jazelle evolved toward RCT (Runtime Compilation Target), enhancing dynamic binary translation for bytecodes beyond Java. New exception-handling instructions like Change Processor State (CPS), Store Return State (SRS), and Return From Exception (RFE) streamlined mode switches, while media-specific instructions (e.g., sign/zero extensions and bit reversals) supported embedded audio/video tasks. Endianness options expanded to include BE-8 for byte-invariant big-endian. These enhancements, while maintaining 32-bit compatibility, laid groundwork for the multicore and virtualization advances in ARMv7.[12][13]

ARMv7 Architecture
The ARMv7 architecture, introduced in 2006, represents a significant evolution in the ARM instruction set, emphasizing enhanced performance, security, and efficiency for embedded and mobile applications.[14] It defines three distinct profiles tailored to specific use cases: the A-profile for high-performance application processors in devices like smartphones and servers, the R-profile for real-time systems requiring deterministic behavior in automotive and industrial controls, and the M-profile for low-cost, low-power microcontrollers in deeply embedded systems.[15] These profiles share a 32-bit AArch32 execution state while allowing optional extensions for specialized functionality, enabling scalable designs that bridge earlier ARM versions to more advanced computing paradigms. A key advancement in ARMv7 is the Thumb-2 instruction set, which mixes 16-bit and 32-bit instructions to combine near-Thumb code density with near-ARM performance, reducing memory footprint by up to 30% over equivalent ARM code in typical workloads.[16] Thumb-2 is supported across all three profiles and is the sole instruction set of the M-profile, facilitating denser binaries for resource-constrained environments. Security features were bolstered with TrustZone, a hardware-enforced partitioning mechanism that divides the system into secure and non-secure worlds, protecting sensitive operations like cryptographic keys from malware in the normal world.[17] Optional virtualization extensions further enable hypervisor support, allowing multiple guest operating systems to run isolated in the non-secure world while maintaining TrustZone isolation. ARMv7 also introduced symmetric multiprocessing (SMP) capabilities, supporting coherent multi-core configurations typically up to four cores with cache coherency protocols for shared memory access.
For digital signal processing and multimedia, ARMv7 incorporates the NEON advanced SIMD extension and the VFPv3 floating-point unit, providing vectorized operations on 128-bit registers to accelerate tasks like video decoding and image processing.[18] NEON, optional but widely implemented in A-profile cores, enabled efficient handling of multimedia workloads in early smartphones, such as those using the Cortex-A8 processor, by processing multiple data elements in parallel for formats like H.264 video.[19] Power efficiency was improved through hardware integer divide instructions (SDIV and UDIV), which avoid the overhead of software emulation, alongside advanced sleep modes that allow cores to enter low-power states while retaining context.[20] The variants of ARMv7 reflect its profile-specific optimizations: ARMv7-A targets high-performance scenarios with full Thumb-2, NEON, and virtualization support for complex OSes; ARMv7-R emphasizes real-time determinism with enhanced interrupt handling and a memory protection unit (MPU) for safety-critical applications; and ARMv7-M focuses on low-power, interrupt-driven execution with Thumb-2 and divide instructions but without virtualization, suiting simple RTOS environments.[15] These features collectively positioned ARMv7 as a foundational architecture for the proliferation of multi-core, secure mobile processors.

ARMv8 Architecture
The ARMv8 architecture, introduced in 2011, marked a significant evolution in the ARM instruction set by incorporating a 64-bit execution state known as AArch64 alongside backward compatibility with the 32-bit AArch32 mode, enabling seamless transitions for existing software ecosystems.[21] This dual-mode design allowed processors to operate in either state, with AArch64 providing a greatly expanded virtual address space (up to 48 bits, or 256 TB) and enhanced integer arithmetic capabilities, while AArch32 ensured compatibility with prior ARMv7 applications.[22] The architecture's foundation emphasized performance improvements for mobile devices and emerging server markets, laying the groundwork for widespread adoption in high-efficiency computing.[21] At its core, ARMv8 features 31 general-purpose registers accessible as 64-bit X0-X30 or 32-bit aliases W0-W30, complemented by 32 advanced vector registers for SIMD operations via the enhanced NEON extension, which supports wider data types and fused multiply-add instructions for better parallel processing.[22] Key extensions include optional cryptographic instructions for AES encryption/decryption and SHA-1/SHA-256 hashing, accelerating secure data handling without external coprocessors.[23] Virtualization support is provided through Exception Levels EL2 (for hypervisors) and EL3 (for secure monitoring), enabling efficient stage-2 address translation and trap handling for multi-tenant environments.
ARMv8 defines profiles tailored to specific use cases: ARMv8-A targets application processors for general computing, ARMv8-R serves real-time systems requiring deterministic performance, and ARMv8-M adapts the architecture's security model, including TrustZone, for microcontrollers.[24] Power efficiency is bolstered by instructions like conditional select (CSEL), which avoids branch mispredictions by directly choosing register values based on flags, reducing energy overhead in control-flow intensive code.[25] The architecture supports advanced branch prediction mechanisms, allowing implementations to minimize pipeline stalls and further optimize power in dynamic workloads.[26] For scalability, ARMv8 theoretically accommodates up to 4096 logical cores through its generic interrupt controller and coherence protocols, paired with hardware virtualization for large-scale deployments.[27] A notable application of AArch64 was Apple's transition to 64-bit processing in the iPhone 5s in 2013, which utilized the architecture's first commercial implementation to enhance app performance and pave the way for iOS optimizations.[28] This shift, exemplified in cores like Cortex-A53 and its successors, underscored ARMv8's role in enabling efficient 64-bit mobile ecosystems.[21]

ARMv9 Architecture and Extensions
The ARMv9 architecture, introduced in March 2021, represents the latest major evolution of the ARM A-profile instruction set, building upon the 64-bit AArch64 execution state of ARMv8 while introducing enhancements targeted at artificial intelligence, security, and scalable computing.[29] It maintains full backward compatibility with ARMv8, allowing seamless migration for existing software ecosystems, and emphasizes a "total compute" approach that integrates CPU, GPU, and accelerator optimizations for diverse workloads from edge devices to data centers.[30] A core focus of ARMv9 is advancing AI and machine learning capabilities through the Scalable Vector Extension 2 (SVE2), which supports vector lengths ranging from 128 bits to 2048 bits in 128-bit increments, enabling efficient processing of large-scale vector and matrix operations without fixed register widths that limit scalability.[31] SVE2 builds on the original SVE from ARMv8 by adding instructions for fixed-point arithmetic, gather-scatter operations, and string processing, making it particularly suited for AI/ML algorithms like neural network training and inference.[29] In machine learning workloads, ARMv9 implementations can achieve up to a 30% improvement in instructions per cycle (IPC) compared to ARMv8 equivalents, driven by these vector enhancements and improved memory bandwidth.[32] Security is another pillar of ARMv9, with the Memory Tagging Extension (MTE) providing hardware-assisted detection of spatial memory errors such as buffer overflows by assigning 4-bit tags to 16-byte memory granules, enabling runtime checks that mitigate common vulnerabilities without significant performance overhead.[33] Complementing MTE, Pointer Authentication Codes (PAC) cryptographically sign pointers to prevent return-oriented programming attacks, a feature matured in ARMv9 from its ARMv8.3 origins.[29] The Realm Management Extension (RME), introduced as part of the Confidential Compute Architecture, enables secure 
enclaves called "Realms" that isolate sensitive code and data from privileged software like hypervisors, supporting dynamic provisioning for confidential computing in cloud and edge environments.[30] Subsequent branches of ARMv9 have iteratively expanded these foundations. ARMv9.2, announced in 2021, introduced the Scalable Matrix Extension (SME) with support for matrix multiply-accumulate operations using up to 256x256 element tiles, optimized for machine learning kernels like generalized matrix multiplication (GEMM) and enhancing AI performance through bfloat16 and FP16 data types.[34] ARMv9.3, announced in late 2021, refined branch target identification via an enhanced Branch Record Buffer Extension (BRBE v1p1), which extends control-flow tracing to Exception Level 3 (EL3) for better debugging and security analysis in secure boot scenarios, while also advancing SME with predication and multi-vector support for more flexible AI workloads.[35] The ARMv9.4 branch, announced in 2022, emphasizes infrastructure and high-performance computing (HPC) profiles, particularly for Neoverse series implementations, by introducing advanced fault handling mechanisms such as Exception-Based Event Profiling (EBEP) and Synchronous Exception-Based Event Profiling (SEBEP), which report performance monitor unit overflows as low-latency exceptions to improve reliability in large-scale systems.[36] Additionally, v9.4 mandates enhancements to SVE2 and SME2, including non-widening bfloat16 arithmetic and 128-bit data path support, boosting HPC applications like scientific simulations with up to 512-bit vector operations for decompression and data movement.[36] ARMv9.5, announced in 2023, added support for FP8 formats in SME2, SVE2, and NEON to optimize neural network processing, along with checked pointer arithmetic instructions for enhanced security against pointer corruption and features like FEAT_HDBSS for efficient live migration in virtualized environments.[37] ARMv9.6, announced in
October 2024, enhanced SME efficiency for AI with support for 2:4 structured sparsity and quarter-tile operations, introduced MPAM domains for multi-chiplet systems, and added granular data isolation (GDI) for improved confidential computing.[38] These extensions underscore ARMv9's role in powering scalable infrastructure, with broad adoption in recent Cortex-X and Neoverse cores demonstrating its versatility across consumer and server domains.[29]

ARM-Designed Cores
Legacy Cores
The legacy cores represent ARM's foundational 32-bit processor designs, developed from the mid-1980s to the early 2000s, which established the company's emphasis on low-power, RISC-based architectures for embedded systems and early mobile devices.[3] These in-order execution cores, implementing ARMv1 through ARMv6 instruction sets, prioritized efficiency and simplicity, enabling widespread adoption in battery-constrained applications like personal digital assistants and portable gaming.[4] Their evolution introduced features such as pipelining, caches, coprocessors, and compressed instructions, laying the groundwork for ARM's dominance in embedded computing without venturing into the out-of-order or multi-core paradigms seen in later designs.[3] The ARM1, introduced in 1985, was the inaugural ARM processor, implementing the ARMv1 architecture with a 3-stage pipeline and operating at around 6 MHz on a 3 μm process.[39] Designed as a prototype for Acorn Computers, it served as a second processor for the BBC Micro, marking the birth of ARM's RISC philosophy without cache or memory management unit (MMU).[3][40] Succeeding it, the ARM2 arrived in 1986, implementing ARMv2, which retained the 3-stage pipeline and added multiply instructions for improved efficiency, with clock speeds of 8-12 MHz.[3] The ARM3, released in 1989, implemented ARMv2a, pairing the same core with a first integrated 4 KB unified cache, the SWP swap instructions, and support for the Floating Point Accelerator (FPA) coprocessor, achieving up to 25 MHz and enabling more capable personal computing in systems like the Acorn Archimedes.[3][41] The ARM6 family, launched in 1991, implemented ARMv3 with a 3-stage pipeline and full 32-bit addressing, offering variants like the ARM600 with 4 KB cache and optional FPA10 floating-point coprocessor at up to 33 MHz; its ARM610 variant powered the Apple Newton MessagePad.[3][42] Building on this, the ARM7 debuted in 1994 with ARMv3 and evolved
to ARMv4T by 1996, featuring a 3-stage pipeline, the Thumb compressed instruction set for code density, and on-chip debug support in the popular ARM7TDMI variant.[4] Fabricated on 0.35-0.18 μm processes with clock speeds up to 133 MHz, the ARM7TDMI achieved over 10 billion shipments, dominating early mobile phones such as the Nokia 6110, PDAs, and handheld games including the Nintendo Game Boy Advance.[43][40][44][45] The ARM8, introduced in 1996 as the ARM810, implemented ARMv4 with a 5-stage pipeline, in-order execution, static branch prediction, and 8 KB unified cache plus MMU, running at up to 72 MHz for enhanced embedded performance without superscalar capabilities.[42] The ARM9 family, spanning 1997-2001 and supporting ARMv4T to v5TE, featured 5-6 stage pipelines, Jazelle direct bytecode execution in later variants like the ARM926EJ-S, and DSP extensions for multimedia; the ARM926EJ-S, with configurable caches and an MMU, reached 200 MHz and was widely used in smartphones and networking gear.[4][41] Marking a shift toward higher performance, the ARM10, announced in 1998, implemented ARMv5TE with a 6-stage pipeline and branch prediction, offering variants like the ARM1020E with 32 KB instruction/data caches and VFP floating-point support for demanding embedded tasks. The ARM11 family, from 2002-2005, adopted ARMv6 with 8-9 stage pipelines, SIMD instructions, Thumb-2 support in the ARM1156T2-S variant, and multi-core support via the ARM11MPCore for symmetric multiprocessing (SMP), achieving clocks up to 1 GHz in variants like the ARM1176JZ(F)-S and powering early multimedia devices.[4][46]

Cortex-A and Cortex-X Series
The Cortex-A series comprises ARM's high-performance application processor cores targeted at consumer devices like smartphones and tablets, emphasizing a balance of performance, power efficiency, and support for rich operating systems. Introduced with the ARMv7-A architecture, these cores evolved to incorporate out-of-order execution, vector processing via NEON extensions, and multi-core scalability, enabling complex tasks such as multimedia rendering and machine learning inference. The series progressed to 64-bit ARMv8-A in 2012 and ARMv9-A in 2021, introducing enhancements like Scalable Vector Extensions (SVE) for AI workloads and improved branch prediction for sustained performance.[47] The Cortex-X series, launched in 2020, extends this lineage with custom, ultra-high-performance variants optimized for flagship devices, featuring wider execution units and deeper pipelines to push single-threaded speeds while maintaining compatibility with DynamIQ heterogeneous clustering. Both series support big.LITTLE and DynamIQ technologies for pairing high-performance "big" cores with efficient "LITTLE" ones, optimizing battery life in mobile scenarios. These designs have powered billions of devices, with process nodes scaling from 65 nm for early implementations to 3 nm in recent generations for greater transistor density and efficiency.[3] In September 2025, ARM announced a rebranding for its mobile CPU cores, dropping the "Cortex" name and introducing the Lumex C1 series for next-generation smartphones and devices, focusing on enhanced on-device AI performance and efficiency.[48]

| Core | Announcement Year | Architecture | Key Features | Performance Notes | Notable Uses |
|---|---|---|---|---|---|
| Cortex-A8 | 2005 | ARMv7-A | Dual-issue in-order superscalar, NEON SIMD, up to 1 GHz clock | Roughly 2x performance over ARM11-class predecessors | iPhone 3GS (2009) |
| Cortex-A9 | 2007 | ARMv7-A | First out-of-order Cortex-A core, SMP up to 4 cores, optional NEON | 30-50% faster than A8 in multi-threaded tasks | NVIDIA Tegra 2 |
| Cortex-A15 | 2011 | ARMv7-A | 3-wide out-of-order, introduced big.LITTLE pairing with A7 | Up to 40% IPC uplift over A9, supports 2.5 GHz | Samsung Exynos 5 Octa |
| Cortex-A7 | 2011 | ARMv7-A | Efficient in-order, big.LITTLE companion | 20% better efficiency than A9 at iso-performance | Paired with A15 in early heterogeneous SoCs |
| Cortex-A57 | 2012 | ARMv8-A | High-performance out-of-order, 64-bit, NEON | 1.9x single-thread perf vs. A15 | NVIDIA Tegra X1, Qualcomm Snapdragon 810 |
| Cortex-A53 | 2012 | ARMv8-A | Efficient in-order, 64-bit, high-density multi-core | Balances perf/watt for background tasks | Ubiquitous in mid-range Android devices |
| Cortex-A72 | 2015 | ARMv8-A | Out-of-order, big.LITTLE support, improved cache | Up to 90% perf uplift over A57 at same power | HiSilicon Kirin 950, Qualcomm Snapdragon 650 |
| Cortex-A73 | 2016 | ARMv8-A | Power-optimized out-of-order, 30% efficiency gain | Sustained perf focus for mobile | HiSilicon Kirin 960 |
| Cortex-A75 | 2017 | ARMv8-A | DynamIQ compatible, 3-wide decode, out-of-order | 22% faster than A73 in single-thread | Broad adoption in 2018 flagships |
| Cortex-A55 | 2017 | ARMv8-A | High-efficiency LITTLE, DynamIQ, in-order | 15% better perf than A53 | Paired in big.LITTLE configs |
| Cortex-A76 | 2018 | ARMv8-A | Wider dispatch (4-wide), DynamIQ, branch predictor | 35% IPC gain over A75 | Qualcomm Snapdragon 855 |
| Cortex-A77 | 2019 | ARMv8-A | DynamIQ, improved load/store, 5G focus | 20% IPC over A76 | Qualcomm Snapdragon 865 |
| Cortex-A78 | 2020 | ARMv8.2-A | DynamIQ, 20% sustained perf boost, ML enhancements | 20% IPC gain over A77 at 1W power | Samsung Exynos 1080, foldables |
| Cortex-X1 | 2020 | ARMv8.2-A | Custom high-perf, 6-wide dispatch, DynamIQ | 30% faster than A78 in single-thread | Qualcomm Snapdragon 888, Samsung Exynos 2100 |
| Cortex-A710 | 2021 | ARMv9-A | Out-of-order, SVE2 support, retains AArch32 at EL0 | 30% energy efficiency vs. A78 | Samsung Exynos 2200, Qualcomm Snapdragon 8 Gen 1 |
| Cortex-A715 | 2022 | ARMv9.2-A | Balanced perf/efficiency, matches X1 in some workloads | 20% power efficiency gain over A710 | MediaTek Dimensity 9200+ |
| Cortex-A720 | 2023 | ARMv9.2-A | Premium-efficiency, sustained perf for gaming | Optimized for laptops/wearables | Expected in 2024-2025 devices |
| Cortex-A520 | 2023 | ARMv9.2-A | High-efficiency LITTLE, 22% better than A510 | 3x ML perf vs. A55 | Heterogeneous clusters in mobiles |
| Cortex-A725 | 2024 | ARMv9.2-A | Premium-efficiency, out-of-order, AI-focused | Balances speed and battery life | 2025 smartphone SoCs |
| Cortex-X2 | 2021 | ARMv9-A | Wider execution, improved ML accel | Up to 16% perf over X1 | Qualcomm Snapdragon 8 Gen 1, Samsung Exynos 2200 |
| Cortex-X3 | 2022 | ARMv9-A | Enhanced branch prediction, vector processing | 11% IPC over X2 | Qualcomm Snapdragon 8 Gen 2, MediaTek Dimensity 9200 |
| Cortex-X4 | 2023 | ARMv9.2-A | Flagship perf, 64-bit only | 15% faster than X3 in single-thread | Qualcomm Snapdragon 8 Gen 3, MediaTek Dimensity 9300 |
| Cortex-X925 | 2024 | ARMv9.2-A | Ultimate perf, large IPC uplift for on-device AI | 25% single-thread vs. prior gen | NVIDIA GB10, 2025 premium devices |
| Lumex C1-Ultra | 2025 | Armv9.3-A | Flagship AI-focused, rebranded from Cortex-X, enhanced SME2 | Up to 25% faster than X925 in single-thread, 45% multi-core uplift | Expected in 2026 flagship smartphones |
| Lumex C1-Pro | 2025 | Armv9.3-A | High-performance balanced core, AI acceleration | Successor to A725, improved efficiency | Premium mobile SoCs 2026 |
| Lumex C1-Premium | 2025 | Armv9.3-A | Mid-range performance core | Balanced perf/watt for mainstream devices | 2026 mid-tier smartphones |
| Lumex C1-Nano | 2025 | Armv9.3-A | Ultra-efficient LITTLE core, rebranded from A520 | High-density, low-power for clusters | Efficient cores in heterogeneous 2026 designs |
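Several of the designs above pair high-performance and high-efficiency cores in big.LITTLE (later DynamIQ) clusters, where the scheduler places each task on the cluster that can sustain it at the lowest energy cost. The following toy model is purely illustrative of that trade-off; the capacity and energy numbers are made up, and this is not Arm's or Linux's actual energy-aware scheduler:

```python
# Toy model of big.LITTLE task placement: each task carries a
# utilization estimate, and the scheduler picks the cluster that can
# sustain it with the lowest estimated energy cost. All numbers are
# illustrative, not taken from any real SoC.

# (capacity, energy-per-unit-of-work) per cluster: a LITTLE core has
# less capacity but spends less energy per unit of work done.
CLUSTERS = {
    "LITTLE": {"capacity": 0.4, "energy_per_util": 1.0},
    "big":    {"capacity": 1.0, "energy_per_util": 2.5},
}

def place_task(util: float) -> str:
    """Pick a cluster for a task whose utilization is expressed
    relative to a big core running at maximum frequency."""
    # Keep only clusters that can actually sustain the task.
    feasible = {n: c for n, c in CLUSTERS.items() if c["capacity"] >= util}
    if not feasible:
        return "big"  # oversized task: fall back to the fastest cluster
    # Among feasible clusters, choose the lowest estimated energy.
    return min(feasible, key=lambda n: util * feasible[n]["energy_per_util"])

if __name__ == "__main__":
    for util in (0.1, 0.35, 0.6, 0.95):
        print(f"util={util:.2f} -> {place_task(util)}")
```

Light background work lands on the LITTLE cluster and demanding threads migrate to the big cores, which is the behavior the heterogeneous pairings in the table above are built for.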
Cortex-M Series
The Cortex-M series comprises ARM's microcontroller-oriented processors, tailored for low-power embedded systems, IoT endpoints, and deterministic applications requiring high code density and interrupt responsiveness. These cores implement the M-profile architecture, emphasizing energy efficiency and integration with peripherals like the NVIC for low-latency interrupts, making them ideal for devices from sensors to wearables. Over 50 billion Cortex-M based chips have been shipped globally, powering diverse markets including industrial controls and consumer electronics.[51][52] The series began with foundational cores supporting ARMv6-M, evolving to incorporate advanced features like floating-point units, DSP extensions, and security enhancements in later ARMv7-M and ARMv8-M iterations. Key designs prioritize a balance of performance, area, and power, with metrics such as DMIPS/MHz indicating efficiency: early cores achieve around 0.9 DMIPS/MHz while later ones exceed 2 DMIPS/MHz without compromising low-power profiles under 1 mW in active states.[8] Cortex-M0, introduced in 2009, is the entry-level core based on ARMv6-M with a 3-stage pipeline and Thumb-only instruction set, targeting ultra-low-power applications like smart sensors; it consumes less than 1 mW at typical operating frequencies and supports up to 32 external interrupts via a basic NVIC. Cortex-M0+, announced in 2012 and also on ARMv6-M, refines the M0 with a shorter 2-stage pipeline, enhanced debug capabilities, sleep modes, and fault handling, reducing active power by up to 30% over its predecessor; it is widely integrated in microcontrollers such as the STM32 family for cost-sensitive IoT nodes. Cortex-M1, announced in 2007 as an ARMv6-M FPGA soft core, facilitates rapid prototyping and ASIC migration with a 2-stage pipeline and AHB-Lite bus interface, optimized for low gate count in programmable logic environments.
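The DMIPS/MHz figures used throughout this section reduce to simple arithmetic when sizing a part: multiply the rating by the clock frequency in MHz. A small illustrative helper, using ratings quoted in this article (real results depend on compiler, flash wait states, and memory system):

```python
# Convert a core's DMIPS/MHz rating into Dhrystone throughput at a
# given clock. Ratings below are the ones quoted in this article;
# actual silicon varies with compiler and memory configuration.

RATINGS_DMIPS_PER_MHZ = {
    "Cortex-M0": 0.87,
    "Cortex-M0+": 0.95,
    "Cortex-M3": 1.25,
    "Cortex-M7": 2.14,
}

def dmips(core: str, clock_mhz: float) -> float:
    """DMIPS = (DMIPS/MHz rating) x clock in MHz."""
    return RATINGS_DMIPS_PER_MHZ[core] * clock_mhz

if __name__ == "__main__":
    # A Cortex-M0+ at a typical 48 MHz microcontroller clock:
    print(round(dmips("Cortex-M0+", 48), 1))   # 45.6
    # A Cortex-M7 at a 480 MHz high-end MCU clock:
    print(round(dmips("Cortex-M7", 480), 1))   # 1027.2
```

This is why a modest clock bump on a higher-rated core (e.g. M7 vs. M0+) yields an outsized throughput difference: both the rating and the achievable frequency scale up together.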
Shifting to ARMv7-M, Cortex-M3, launched in 2004, introduced a 3-stage pipeline, Thumb-2 instructions for improved code density, and an advanced NVIC supporting up to 240 interrupts with sub-microsecond latency, establishing a benchmark for general-purpose embedded processing at 1.25 DMIPS/MHz.[53] Cortex-M4, from 2010 and based on ARMv7-M, extends the M3 with a single-precision FPU and DSP instructions for signal processing tasks, delivering 1.25 DMIPS/MHz while enabling efficient vector math in applications like motor control.[54][55] Cortex-M7, unveiled in 2014 on ARMv7-M, offers the highest performance in the pre-v8 lineup with a 6-stage dual-issue pipeline, double-precision FPU, and optional L1 caches up to 64 KB, supporting clock speeds to 1 GHz and 2.14 DMIPS/MHz for demanding DSP and control in automotive and industrial systems.[56] Transitioning to ARMv8-M, Cortex-M33, introduced in 2016, integrates optional TrustZone for secure IoT partitioning, an MPU with up to 16 regions, and DSP support at 1.5 DMIPS/MHz, facilitating isolated execution environments in connected devices.[57] Cortex-M55, announced in 2020 under ARMv8.1-M Mainline, incorporates Helium vector extensions for machine learning acceleration, yielding up to 15x ML performance uplift and 15% better efficiency than the M7 at 1.6 DMIPS/MHz, with support for up to 16 MB TCM and an AXI bus.[58] Cortex-M85, announced in 2022 on ARMv8.1-M, builds on the M55 with enhanced security via Pointer Authentication and Branch Target Identification, plus Helium for superior scalar and ML workloads, targeting secure, high-performance embedded AI at up to 4x ML inference speed over prior generations.[59][60] More recently, Cortex-M52, launched in 2023 and also based on ARMv8.1-M, provides a compact Helium-enabled core for cost-optimized AIoT, emphasizing area efficiency and deterministic behavior for industrial endpoints with ML needs, as the smallest such processor in the series.[61][62]

| Processor | Architecture | Pipeline Stages | Key Features | DMIPS/MHz | Notes |
|---|---|---|---|---|---|
| Cortex-M0 | ARMv6-M | 3 | Thumb-only, basic NVIC | 0.87 | <1 mW |
| Cortex-M0+ | ARMv6-M | 2 | Enhanced debug/sleep | 0.95 | <0.7 mW |
| Cortex-M1 | ARMv6-M | 2 | FPGA soft core | 0.8 | Low gate count |
| Cortex-M3 | ARMv7-M | 3 | Thumb-2, NVIC (240 ints) | 1.25 | Low-cost |
| Cortex-M4 | ARMv7-M | 3 | FPU, DSP | 1.25 | Signal control |
| Cortex-M7 | ARMv7-M | 6 (dual-issue) | Double FPU, caches/TCM | 2.14 | Up to 1 GHz |
| Cortex-M33 | ARMv8-M | 3 | TrustZone, MPU (16) | 1.5 | Secure IoT |
| Cortex-M55 | ARMv8.1-M | 4 | Helium ML, AXI/TCM | 1.6 | 15% > M7 |
| Cortex-M85 | ARMv8.1-M | 7 (dual-issue) | PAC/BTI security, Helium | 3.13 | High-security ML |
| Cortex-M52 | ARMv8.1-M | 3 | Compact Helium | ~1.6 | Area-efficient AI |
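One concrete benefit of the DSP extensions on cores like the Cortex-M4 is hardware saturating arithmetic (e.g. the QADD instruction): results clamp at the signed 32-bit limits rather than wrapping around, which keeps audio and control-loop math from producing wildly wrong values on overflow. A pure-Python model of the behavior (illustration only, not Arm code):

```python
# Conceptual model of saturating vs. wrapping 32-bit signed addition,
# as provided in hardware by the Cortex-M DSP extension's QADD.

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def qadd32(a: int, b: int) -> int:
    """Signed 32-bit saturating add: clamp at the representable limits."""
    s = a + b
    return max(INT32_MIN, min(INT32_MAX, s))

def wrap_add32(a: int, b: int) -> int:
    """Ordinary 32-bit add with two's-complement wrap-around."""
    s = (a + b) & 0xFFFFFFFF
    return s - 2**32 if s > INT32_MAX else s

if __name__ == "__main__":
    big = INT32_MAX - 10
    print(qadd32(big, 100))      # clamps to 2147483647
    print(wrap_add32(big, 100))  # wraps to a large negative number
```

In a control loop, the clamped value is merely pessimistic, while the wrapped value flips sign entirely; doing this clamping in a single instruction rather than with compare-and-branch code is part of what the "DSP" feature in the table above buys.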
Cortex-R Series
The Cortex-R series comprises ARM-designed processors optimized for real-time applications in safety-critical domains such as automotive systems, industrial control, and storage controllers, emphasizing deterministic performance, low-latency interrupt handling, and fault tolerance through features like error-correcting code (ECC) memory support and memory protection units (MPUs). These cores implement the R-profile of the ARM architecture, starting with ARMv7-R for 32-bit operation and extending to ARMv8-R for enhanced 64-bit capabilities in later models, enabling predictable execution in environments requiring high reliability. Unlike application or microcontroller-focused series, Cortex-R processors prioritize time-sensitive tasks with configurable tightly coupled memory (TCM) for zero-latency access and support for symmetric multiprocessing (SMP) to scale performance while maintaining real-time guarantees. Key safety mechanisms across the series include dual-core lockstep (DCLS) for fault detection and compliance with ISO 26262 up to ASIL-D when configured appropriately, alongside time-triggered interrupt controllers for precise scheduling in automotive and industrial use cases.[64][65] The Cortex-R4, introduced in 2006, is a foundational 32-bit processor based on the ARMv7-R architecture, featuring an 8-stage in-order dual-issue pipeline for efficient real-time processing in embedded systems. It supports ECC on TCM and data caches to enhance reliability, along with a 12-region MPU for memory isolation, and is designed for single-core configurations without hardware coherency, making it suitable for cost-sensitive deterministic applications like motor control. Performance reaches up to 1.67 DMIPS/MHz, with configurable 4-64 KB instruction and data caches, and 0-8 MB TCM for low-latency code execution. 
Dual-core lockstep modes enable fault tolerance for safety-critical deployments.[66][67] Building on the R4, the Cortex-R5, announced in 2010, refines the ARMv7-R design with an 8-stage in-order dual-issue pipeline and improved deterministic performance, achieving up to 1.5 GHz clock speeds in advanced nodes for applications like motor control and powertrain systems. It introduces enhanced error management, including a 16-region MPU, DCLS for redundant execution, and support for Software Test Libraries (STL) for fault detection, alongside ECC support on caches and TCM up to 8 MB. Dual-core configurations with I/O coherency enable scalability for real-time multitasking, delivering 1.67 DMIPS/MHz while prioritizing predictability over raw throughput. This core's focus on safety has made it prevalent in automotive ECUs certified to ISO 26262 ASIL-D.[66][68] The Cortex-R7, announced in 2011, advances the series with an 11-stage out-of-order superscalar pipeline on ARMv7-R, offering 2.5 DMIPS/MHz for higher performance in demanding real-time scenarios like baseband processing. It supports up to four cores in SMP with L1/L2 cache hierarchies (4-64 KB L1, up to 1 MB shared L2), a 16-region MPU, and ECC for fault tolerance, though it lacks native DCLS. Time-triggered interrupts via the Generic Interrupt Controller facilitate precise real-time scheduling, and its design emphasizes scalability for industrial and automotive control systems. The R7 has since been superseded by newer models.[69][66][70] Launched in 2016, the Cortex-R8 is a high-end 32-bit ARMv7-R processor with an 11-stage out-of-order superscalar pipeline, targeting 2.5 DMIPS/MHz for applications in LTE/5G modems, storage controllers, and advanced driver-assistance systems (ADAS). It features a 24-region MPU, ECC on 0-1 MB TCM and 0-64 KB caches, and supports up to four cores in SMP configurations for balanced real-time and throughput needs.
The core's deterministic response times and optional Neon SIMD extensions enhance efficiency in data-intensive tasks, with ISO 26262 ASIL-D support through lockstep and diagnostic features. Dual-core lockstep configurations provide redundancy for automotive safety.[71][66][72] The Cortex-R82, introduced in 2020, marks the series' shift to 64-bit Armv8-R (AArch64) with an 8-stage in-order triple-issue pipeline, delivering 3.4 DMIPS/MHz, up to 2.25 times faster than the R8 in real-world workloads, for high-performance real-time processing in storage and automotive domains. It supports up to eight cores with hardware coherency, 32+32 region MPU/MMUs, ECC on 0.16-1 MB TCM, 16-128 KB L1 instruction/16-64 KB data caches, and optional 0-4 MB L2 cache, enabling up to 1 TB DRAM addressing for computational storage. Safety features include DCLS and Software Test Libraries (STL) for ISO 26262 ASIL-D compliance, with time-triggered interrupts ensuring predictability in ADAS and autonomous vehicle controls. Optional Neon and machine learning extensions further boost efficiency in 2025-era automotive applications.[66][73]

| Processor | Architecture | Pipeline | DMIPS/MHz | Max Cores | Key Safety Features | Primary Applications |
|---|---|---|---|---|---|---|
| Cortex-R4 | ARMv7-R | 8-stage in-order dual-issue | 1.67 | 1 | ECC, 12-region MPU, dual-core lockstep | Embedded real-time control |
| Cortex-R5 | ARMv7-R | 8-stage in-order dual-issue | 1.67 | 2 (IO coherency) | ECC, 16-region MPU, DCLS, STL | Motor control, automotive ECUs |
| Cortex-R7 | ARMv7-R | 11-stage OoO superscalar | 2.5 | 4 (SMP) | ECC, 16-region MPU | Baseband, industrial systems (discontinued) |
| Cortex-R8 | ARMv7-R | 11-stage OoO superscalar | 2.5 | 4 (SMP) | ECC, 24-region MPU, lockstep | 5G modems, ADAS, storage |
| Cortex-R82 | Armv8-R (64-bit) | 8-stage in-order triple-issue | 3.4 | 8 (coherency) | ECC, 32+32 MPU/MMUs, DCLS, STL | Computational storage, autonomous vehicles |
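The dual-core lockstep (DCLS) feature that recurs in the table above can be understood as redundant execution plus a comparator: the same inputs drive two copies of the logic, and any divergence flags a fault before corrupted state propagates. In real silicon the shadow core runs a few clock cycles behind the primary and the comparison happens in hardware; the toy sketch below just re-runs a function and compares, purely as a conceptual illustration:

```python
# Conceptual model of dual-core lockstep: run the same step on a
# primary and a shadow computation, compare outputs, and raise a
# fault on any divergence. Real DCLS does this per-cycle in hardware.

from typing import Callable

class LockstepFault(Exception):
    """Raised when primary and shadow results disagree."""

def lockstep(step: Callable[[int], int],
             shadow: Callable[[int], int], x: int) -> int:
    """Run both computations on the same input and compare results."""
    primary, redundant = step(x), shadow(x)
    if primary != redundant:
        raise LockstepFault(f"divergence on input {x}: {primary} != {redundant}")
    return primary

def control_step(x: int) -> int:
    return 3 * x + 1  # stand-in for one iteration of a control loop

def flipped_bit_step(x: int) -> int:
    return (3 * x + 1) ^ (1 << 4)  # simulated transient bit flip

if __name__ == "__main__":
    print(lockstep(control_step, control_step, 10))  # 31, cores agree
    try:
        lockstep(control_step, flipped_bit_step, 10)
    except LockstepFault as e:
        print("fault detected:", e)
```

The fault is detected on the cycle it occurs, which is the property that lets lockstepped Cortex-R parts claim high diagnostic coverage for ISO 26262 certification.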
Neoverse Series
The Neoverse series comprises Arm's family of 64-bit CPU cores tailored for infrastructure workloads, emphasizing scalability, energy efficiency, and performance in data centers, cloud computing, high-performance computing (HPC), and edge applications. These cores support high core counts—up to hundreds in multi-socket systems—and integrate advanced interconnects like the Coherent Mesh Network (CMN) for cache-coherent scaling across clusters. Unlike mobile-oriented designs, Neoverse prioritizes sustained throughput for server environments, with features such as enhanced branch prediction, larger caches, and vector processing extensions to handle diverse workloads from general-purpose computing to AI acceleration. The series builds on the Armv8 and Armv9 instruction set architectures, incorporating extensions for security and vectorization to enable efficient multi-threaded operation in large-scale deployments. Introduced in 2019, the Neoverse N1 core implements the Armv8.2-A architecture and derives from the Cortex-A76 microarchitecture, featuring a 4-wide out-of-order execution pipeline, 64 KB L1 instruction and data caches per core, and up to 1 MB private L2 cache. It supports up to 128 cores in a single coherent domain via a mesh interconnect, enabling high-density configurations for cloud-native applications. The N1 delivers up to 2.5 times the performance of prior-generation Arm infrastructure cores on key cloud benchmarks, with optimizations for virtualization and memory bandwidth exceeding 175 GB/s in 64-core systems. 
It powers the AWS Graviton2 processor, which integrates 64 N1 cores on a 7 nm process for cost-effective EC2 instances offering 40% better price-performance over comparable x86 alternatives.[74][75][76] The Neoverse V1, announced in 2020, targets HPC and AI/ML workloads with the Armv8.4-A architecture, including the first implementation of the Scalable Vector Extension (SVE) in Arm's infrastructure line, with two 256-bit vector pipelines and double the floating-point throughput of scalar designs. This out-of-order core provides over 50% instructions-per-cycle (IPC) uplift compared to the N1, with a 10-stage pipeline, 1 MB L2 cache per core, and support for up to 64 cores per cluster. It excels in vector-heavy tasks, achieving leadership in HPC benchmarks through SVE's flexible vector lengths. Deployments include the AWS Graviton3 processor, which integrates 64 V1 cores and delivers 25% faster performance and 60% greater energy efficiency than Graviton2 for EC2 instances.[77][78][79] Launched in 2021, the Neoverse N2 core advances efficiency as the first Armv9-based infrastructure core, offering a 40% IPC improvement over the N1 through enhancements like a 1.5K-entry micro-op cache, 5-wide dispatch, and 13 execution ports. It features 64 KB L1 caches and configurable 512 KB to 1 MB L2 per core, with support for up to 192 cores at 350 W TDP or low-power 8-core variants at 20 W. The design yields up to 50% better performance-per-watt for scale-out cloud workloads, including improved branch prediction and memory subsystem latency under 4 cycles for loads. Adopters include Alibaba's 128-core Yitian 710 and Microsoft's Azure Cobalt 100 processors for web services and databases.[80][81][82] The Neoverse V2, released in 2022, builds on Armv9-A with SVE2 extensions for enhanced vector processing, providing up to twice the machine learning performance of the V1 through four 128-bit SVE2 pipelines and improved matrix multiply acceleration.
This high-performance core supports up to 128 cores (scalable to 256 with the CMN-700 mesh), 2 MB L2 cache per core, and a 6-wide decode with 12K-entry branch target buffer for reduced misprediction penalties. It targets HPC and cloud AI, with 1.7x overall performance gains over V1 in floating-point intensive tasks. Notable implementations include the NVIDIA Grace CPU Superchip, which pairs two 72-core V2 dies for a total of 144 cores serving data center AI training and inference.[83][84][85] Announced in 2024 and available in 2025, the Neoverse N3 core utilizes Armv9.2-A for cloud and general-purpose infrastructure, delivering a 30% IPC uplift over the N2 via a refined 5-wide pipeline, 2 MB L2 cache options, and advanced ML optimizations like int8 and bfloat16 support. It scales to 64 cores per cluster with 48-bit physical addressing and ECC-protected caches, emphasizing power efficiency for dense server racks. The core achieves nearly triple the ML inference throughput of N2, suitable for hyperscale environments. Early adoptions include custom designs from cloud providers targeting 5G and edge infrastructure.[86][87] The Neoverse V3, introduced in 2024 under Armv9.2-A, focuses on maximum performance for HPC, cloud, and ML with confidential computing via the Realm Management Extension (RME) for secure enclaves. It supports up to 128 cores per socket, 4 MB L2 cache per core, and PCIe Gen5/CXL 3.0 integration, offering 1.3x the performance of V2 in integer and floating-point workloads. Designed for 2025 server deployments, it includes enhanced SVE2 for AI vectorization and branch prediction with 2x larger history tables. Implementations are slated for next-generation supercomputers and AI clusters, with NVIDIA and others developing V3-based systems.[88][87][89] Across the series, Neoverse cores support CCIX and CHI protocols for cache coherence in multi-socket setups, enabling seamless data sharing up to thousands of cores in disaggregated systems.
This infrastructure optimizes Armv9 extensions like pointer authentication and memory tagging for enhanced security in virtualized environments. In November 2025, ARM announced integration of NVIDIA NVLink Fusion into Neoverse platforms for improved coherency with GPUs in AI data centers.[90][91]
Third-Party Designed Cores
Apple Processors
Apple began designing its own ARM-based system-on-chip (SoC) processors in 2010 with the A4, marking a shift from buying merchant silicon to building SoCs optimized for its iOS and later macOS ecosystems, first around licensed ARM cores and later around fully custom microarchitectures. These processors power iPhones, iPads, and other devices, emphasizing power efficiency, integrated graphics, and neural processing units (NPUs) for AI tasks. The A-series targets mobile devices, while the M-series, introduced in 2020, extends to Macs and high-end iPads with a unified memory architecture that shares RAM across CPU, GPU, and other components for reduced latency and higher bandwidth.[92] The A4, released in 2010, was Apple's first in-house ARMv7-based SoC, featuring a single ARM Cortex-A8 core clocked at up to 1.0 GHz and integrated with a PowerVR SGX535 GPU; it debuted in the iPhone 4 and first-generation iPad.[93] In 2011, the A5 moved to dual Cortex-A9 cores at up to 1.0 GHz, doubling CPU performance over the A4 while maintaining a similar power envelope, and powered the iPhone 4S and iPad 2; the A5X variant enhanced GPU performance with quad-core graphics for the third-generation iPad. The A6, launched in 2012 for the iPhone 5, introduced Apple's first custom microarchitecture, Swift, on a 32 nm process for up to 2x CPU and graphics performance compared to the A5, still on ARMv7. Apple's transition to 64-bit computing came with the A7 in 2013, the first 64-bit ARMv8 smartphone processor, with a dual-core custom Cyclone design at 1.3-1.4 GHz, delivering up to 2x faster CPU and GPU performance than the A6 while introducing 64-bit support to iOS; it featured in the iPhone 5s. The A8 in 2014 for the iPhone 6 and 6 Plus used a dual-core Typhoon design on ARMv8 at up to 1.4 GHz, built on a 20 nm process with a 25% faster CPU and 50% faster GPU than the A7, shipped alongside the M8 motion coprocessor.
Subsequent A9 (2015, Twister cores, iPhone 6s) and A10 Fusion (2016, Hurricane/Zephyr cores, iPhone 7) on ARMv8 continued scaling, with quad-core configurations starting in the A10 (two high-performance and two efficiency cores) in a big.LITTLE-style asymmetry for balanced power and performance. From the A11 Bionic in 2017 for the iPhone X, Apple fully embraced custom microarchitectures with Monsoon high-performance and Mistral efficiency cores in a hexa-core setup on ARMv8, adding the first embedded Neural Engine for machine learning acceleration at 600 billion operations per second (OPS). The A12 Bionic (2018, Vortex/Tempest, iPhone XS) on ARMv8.2 advanced to 7 nm with an upgraded 8-core Neural Engine at 5 trillion OPS, while the A13 Bionic (2019, Lightning/Thunder, iPhone 11) refined efficiency on 7 nm. The A14 Bionic (2020, Firestorm/Icestorm, iPhone 12) on 5 nm introduced the first M-series-class cores to mobile, with a 16-core Neural Engine at 11 trillion OPS. The A15 Bionic (2021, Avalanche/Blizzard, iPhone 13) and A16 Bionic (2022, Everest/Sawtooth, iPhone 14 Pro) on ARMv8.5 scaled to 5 nm and 4 nm processes, enhancing AI with up to 17 trillion OPS and deeper Neural Engine integration. The A17 Pro in 2023 for the iPhone 15 Pro marked a leap to ARMv8.6 on a 3 nm process with dual high-performance cores at up to 3.78 GHz, four efficiency cores at 2.11 GHz, and a 16-core Neural Engine at 35 trillion OPS, enabling hardware ray tracing in the GPU. The A18 and A18 Pro in 2024 for the iPhone 16 series adopted ARMv9.2-A on 3 nm, with the A18 featuring a 6-core CPU (2 performance cores at 4.04 GHz, 4 efficiency cores at 2.2 GHz) and 30% faster CPU performance than the A17 Pro, alongside a 16-core Neural Engine optimized for Apple Intelligence AI features.[94]

| Processor | Year | ARM Version | CPU Cores (Perf/Eff) | Process Node | Key Innovation | Debut Device |
|---|---|---|---|---|---|---|
| A4 | 2010 | v7 | 1 | 45 nm | First Apple-designed SoC (Cortex-A8) | iPhone 4 |
| A5 | 2011 | v7 | 2 | 45 nm | Dual-core debut | iPhone 4S |
| A6 | 2012 | v7 | 2 | 32 nm | First custom Swift core | iPhone 5 |
| A7 | 2013 | v8 | 2 | 28 nm | First 64-bit ARM | iPhone 5s |
| A8 | 2014 | v8 | 2 | 20 nm | Integrated M8 coprocessor | iPhone 6 |
| A9 | 2015 | v8 | 2 | 14 nm | Peak performance focus | iPhone 6s |
| A10 Fusion | 2016 | v8 | 2/2 | 16 nm | First big.LITTLE | iPhone 7 |
| A11 Bionic | 2017 | v8 | 2/4 | 10 nm | First Neural Engine | iPhone X |
| A12 Bionic | 2018 | v8.2 | 2/4 | 7 nm | 7 nm process, AI uplift | iPhone XS |
| A13 Bionic | 2019 | v8.4 | 2/4 | 7 nm | Efficiency refinements | iPhone 11 |
| A14 Bionic | 2020 | v8.5 | 2/4 | 5 nm | Firestorm cores | iPhone 12 |
| A15 Bionic | 2021 | v8.5 | 2/4 | 5 nm | Scalable core count | iPhone 13 |
| A16 Bionic | 2022 | v8.5 | 2/4 | 4 nm | Advanced media engine | iPhone 14 Pro |
| A17 Pro | 2023 | v8.6 | 2/4 | 3 nm | Ray tracing GPU | iPhone 15 Pro |
| A18/A18 Pro | 2024 | v9.2 | 2/4 | 3 nm | Apple Intelligence NPU | iPhone 16 |
Qualcomm Processors
Qualcomm has developed a range of ARM-based processors primarily under its Snapdragon brand, targeting mobile devices, PCs, and embedded systems, with a focus on custom and licensed core designs to optimize performance and efficiency in Android smartphones and Windows on Arm laptops. The company's evolution from early custom architectures to adopting ARM's Cortex cores and back to proprietary designs reflects adaptations to power constraints and market demands for AI and multitasking capabilities. The Scorpion core, introduced in 2007 as part of the Snapdragon S1 platform, marked Qualcomm's first custom ARM-compatible CPU, implementing the ARMv7-A instruction set architecture in a single-core configuration clocked up to 1 GHz.[98] Designed with a 13-stage pipeline for higher clock speeds, Scorpion powered early smartphones like the Toshiba TG01 and HTC HD2, emphasizing multimedia processing alongside an integrated Adreno GPU.[99] Its superscalar design delivered improved integer and floating-point performance over standard ARM11 cores, though it remained a 32-bit design.[99] Building on this, the Krait cores, debuted in 2011 with the Snapdragon S4 series, represented Qualcomm's custom implementation of the ARMv7-A architecture, a 32-bit design shipped in dual- and quad-core variants up to 2.3 GHz. Krait, used through 2014 in devices like the HTC One and Samsung Galaxy S4, featured out-of-order execution and NEON SIMD support, achieving up to 40% better efficiency than ARM Cortex-A9 equivalents while pairing with Adreno 320 GPUs for enhanced graphics.[100] This custom microarchitecture allowed Qualcomm to tailor branch prediction and cache hierarchies for mobile workloads, powering the shift to LTE-enabled 4G handsets.
The Kryo family, launched in 2016 with the Snapdragon 820, introduced Qualcomm's ARMv8-A 64-bit custom cores in a quad-core big.LITTLE-style configuration up to 2.15 GHz.[101] The 2017 Snapdragon 835 shifted to the semi-custom Kryo 280, built on ARM's Cortex-A73 and A53 designs; these octa-core setups on 10 nm processes delivered 30% better CPU performance over predecessors, integrated with Adreno 540 GPUs for VR support in flagships like the Galaxy S8.[102] By 2018, the Snapdragon 845's Kryo 385 (Cortex-A75/A55 derived) further refined semi-custom elements for AI acceleration, emphasizing heterogeneous computing for gaming and camera tasks.[101] From 2018 to 2020, the Kryo 4xx and 5xx series continued with licensed ARM Cortex cores under ARMv8.2-A, with the Snapdragon 855 featuring Kryo 485 (based on Cortex-A76 for performance cores at 2.84 GHz and A55 for efficiency at 1.78 GHz) in an octa-core layout on 7 nm.[103] The Snapdragon 865 advanced this to Kryo 585 (Cortex-A77-based primes at 2.84 GHz with A55 efficiency cores), offering 25% CPU uplift and improved power efficiency for 5G devices like the Galaxy S20, while supporting an advanced ISP for computational photography.[103] These designs prioritized big.LITTLE asymmetry to balance sustained performance in multitasking scenarios.[104] The Snapdragon 8 Gen 1 through Gen 3 platforms (2021-2023) likewise used licensed Cortex cores, moving from a 1+3+4 layout on 4 nm with a Cortex-X2 prime to a 1+5+2 layout with a Cortex-X4 prime at 3.3 GHz for AI-driven tasks in devices like the Galaxy S24.[106] Qualcomm's Oryon cores, a fully custom microarchitecture announced in 2022 and derived from its Nuvia acquisition, implementing ARMv8.7-era extensions and evolving toward v9 in later generations, returned the company to proprietary designs for superior IPC and efficiency, debuting in production silicon in 2024.[105] The 2024 Snapdragon 8 Elite (successor to the 8 Gen 3) on TSMC 3 nm fully leverages second-generation Oryon in an all-big-core 2+6 configuration up to 4.32 GHz, delivering 45% faster CPU performance than Gen 3 while enhancing the NPU for on-device generative AI.[107] Plans for 2025 include 2 nm iterations with further custom optimizations.[108] In the PC space, the 2024 Snapdragon X Elite integrates 12 Oryon cores on 4 nm, configured in three quad-core clusters with dual-core boost to 4.2 GHz, targeting Windows on Arm laptops for multi-day battery life and x86 emulation via the Prism translation layer.[109] It outperforms prior ARM PCs in multi-threaded workloads, with 42 MB of total cache enabling seamless productivity and gaming.[110] The Snapdragon X Plus variant offers 10- or 8-core options up to 3.4 GHz for mid-range laptops, while 2025's Snapdragon X2 Elite and X2 Elite Extreme (successors to the X Elite) introduce third-generation Oryon with up to 18 cores on 3 nm nodes for improved AI inference and connectivity.[111] This expansion underscores Qualcomm's 2025 shift to predominantly custom ARM cores across portfolios, reducing reliance on licensed IP for differentiated efficiency.[112]
Samsung Processors
Samsung's Exynos series represents its line of system-on-chip (SoC) processors based on ARM architecture, primarily designed for mobile devices such as smartphones and wearables in the Galaxy lineup.[113] These processors often feature a mix of licensed ARM Cortex cores and, in earlier generations, Samsung's custom Mongoose CPU designs, optimized for performance in Samsung's premium devices while offering regional variants to complement Qualcomm's Snapdragon chips in global markets.[114] Exynos SoCs have evolved from single-core ARMv7 implementations to advanced ARMv9-based deca-core configurations, emphasizing AI capabilities, power efficiency, and integration with Samsung's ecosystem.[115] The series began with the Hummingbird processor in 2010, an ARMv7-based single-core design derived from the Cortex-A8, clocked at 1 GHz and fabricated on a 45 nm process.[114] It powered the original Galaxy S and Nexus S smartphones, supporting 1080p video recording and LPDDR2 memory, marking Samsung's entry into in-house mobile SoC development.[114] This was followed by the Exynos 4210 in 2011, a dual-core ARMv7 Cortex-A9 implementation at 1.2 GHz on a 45 nm node, used in devices like the Galaxy S II and Galaxy Note, with Mali-400MP4 GPU for 1080p/30fps playback.[114] Samsung introduced custom CPU cores with the Mongoose architecture in 2016, starting with the M1 in the Exynos 8890 SoC, an ARMv8-based quad-core design paired with four Cortex-A53 efficiency cores on a 14 nm process.[114] The 8890 powered the Galaxy S7 series and Note 7, featuring a Mali-T880 MP12 GPU and support for 4K/60fps video.[114] Subsequent iterations advanced to the M3 in the Exynos 9810 (2018) and M4 in the Exynos 9820/9825 (2019), both ARMv8.2 compliant, combining custom cores with Cortex-A55 for devices like the Galaxy S9, Note 9, S10, and Note 10 series, adding NPU for AI tasks and 8K video support on 10 nm and 8 nm nodes.[114] The Exynos 9xx series (9810–990) represented Samsung's peak in custom 
CPU innovation before shifting to fully licensed ARM designs.[114] The Exynos 2100 in 2021 featured a single Cortex-X1 prime core alongside three Cortex-A78 performance cores and four Cortex-A55 efficiency cores under ARMv8.2 on a 5 nm process, integrated with a Mali-G78 MP14 GPU and 5G modem for the Galaxy S21 series, supporting 200 MP cameras.[114] The transition to ARMv9 came with the Exynos 2200 in 2022, with a Cortex-X2 prime core, three Cortex-A710 cores, and four Cortex-A510 cores, paired with an AMD-designed Xclipse 920 GPU based on RDNA2 for ray tracing, targeted for the Galaxy S22 but ultimately limited to select regions due to yield issues.[114] The Exynos 2400, released in 2024, marks a return to all-Cortex configurations under ARMv9.2, with a 10-core setup: one Cortex-X4 at 3.21 GHz, five Cortex-A720 (two at 2.9 GHz, three at 2.6 GHz), and four Cortex-A520 at 1.95 GHz, built on Samsung's 4 nm LPP+ process for improved efficiency.[116][117] It includes the Xclipse 940 GPU (AMD RDNA3-based) and powers international variants of the Galaxy S24 series, emphasizing AI processing with a 14.7x NPU uplift over predecessors.[118] Looking ahead, the Exynos 2500, entering mass production in 2025, adopts ARMv9.2 with a deca-core layout: one Cortex-X925 at 3.3 GHz, seven Cortex-A725 (two at 2.74 GHz, five at 2.36 GHz), and two Cortex-A520 at 1.8 GHz on a 3 nm GAA process, featuring an enhanced NPU and Xclipse 950 GPU for Galaxy Z Flip 7 devices.[119] For wearables, Samsung's Exynos W9xx series includes the W930 (2023, dual Cortex-A55 at 1.4 GHz on 5 nm) for the Galaxy Watch6 and the W1000 (2024, one Cortex-A78 at 1.6 GHz plus four Cortex-A55 at 1.5 GHz on 3 nm GAA) for the Galaxy Watch7, both with Mali-G68 GPUs to enable always-on displays and GNSS tracking.[120][121] These processors highlight Samsung's strategy of regional optimization, where Exynos variants are deployed in non-U.S.
Galaxy models to leverage supply chain control and tailored performance.[122]

| Processor | Year | ARM Version | Core Configuration | Process Node | Key Devices | Notable Features |
|---|---|---|---|---|---|---|
| Hummingbird (Exynos 3110) | 2010 | v7 | 1x Cortex-A8 @1 GHz | 45 nm | Galaxy S, Nexus S | 1080p video, PowerVR SGX540 GPU[114] |
| Exynos 4210 | 2011 | v7 | 2x Cortex-A9 @1.2 GHz | 45 nm | Galaxy S II, Galaxy Note | Mali-400MP4 GPU, 1080p/30fps[114] |
| Exynos 8890 (Mongoose M1) | 2016 | v8 | 4x M1 + 4x A53 | 14 nm | Galaxy S7, Note 7 | Custom CPU, 4K/60fps, Mali-T880 MP12[114] |
| Exynos 9810 (M3)/9820 (M4) | 2018–2019 | v8.2 | 4x M3 + 4x A55 (9810); 2x M4 + 2x A75 + 4x A55 (9820) | 10 nm/8 nm | S9/Note9, S10/Note10 | NPU, 8K video, Mali-G72/G76[114] |
| Exynos 2100 | 2021 | v8.2 | 1x X1 + 3x A78 + 4x A55 | 5 nm | Galaxy S21 | Integrated 5G, 200 MP camera, Mali-G78 MP14[114] |
| Exynos 2200 | 2022 | v9.1 | 1x X2 + 3x A710 + 4x A510 | 4 nm | Galaxy S22 (select regions) | AMD Xclipse 920 (RDNA2), ray tracing[114] |
| Exynos 2400 | 2024 | v9.2 | 1x X4 + 5x A720 + 4x A520 | 4 nm LPP+ | Galaxy S24 (international) | Xclipse 940 (RDNA3), 14.7x NPU boost[116][117] |
| Exynos 2500 | 2025 | v9 | 1x X925 + 7x A725 + 2x A520 | 3 nm GAA | Galaxy Z Flip 7 (expected) | Enhanced NPU, Xclipse 950 (RDNA3)[119] |
| Exynos W930/W1000 | 2023–2024 | v8 | 2x A55 (W930); 1x A78 + 4x A55 (W1000) | 5 nm/3 nm | Galaxy Watch6/7 | Mali-G68 GPU, GNSS, always-on display[120][121] |
MediaTek Processors
MediaTek has established itself as a key player in the ARM ecosystem by designing affordable system-on-chips (SoCs) optimized for mid-range smartphones, tablets, and emerging personal computing devices, leveraging licensed ARM architectures to deliver balanced performance and power efficiency.[123] These SoCs emphasize cost-effectiveness, integrated connectivity like 5G, and features such as AI processing and advanced imaging, making them prevalent in budget to flagship Android devices across global markets, particularly in developing regions where they hold significant market share.[124] The company's early mobile efforts centered on the MT65xx series during the 2010s, which employed ARMv7 instruction set architecture with Cortex-A7 cores for basic multitasking and later Cortex-A17 for improved graphics in entry-level phones. These quad-core SoCs, fabricated on 28nm processes, powered numerous budget Android handsets by prioritizing low power consumption over high-end performance, enabling widespread adoption in feature-rich yet inexpensive devices. 
Transitioning to 64-bit computing, MediaTek introduced the Helio series in 2015 using ARMv8 architecture, targeting mid-range segments with heterogeneous core designs.[125] Flagship-oriented Helio X models, such as the X10 (octa-core Cortex-A53 at 2.0 GHz) and X20 (deca-core Cortex-A72/A53 tri-cluster configuration), supported LTE and enhanced multimedia, while the P series, like the P23 (octa-core A53 at 2.0 GHz), focused on efficient 4G connectivity for everyday tasks in mid-tier phones through 2019.[126] The series incorporated MediaTek's CorePilot technology for dynamic resource management, improving battery life in gaming and camera applications.[127] With the shift to 5G, the Dimensity series debuted in 2020 on ARMv8.2, integrating modems and Cortex-A76/A55 cores for cost-effective connectivity; for instance, the Dimensity 700 (octa-core up to 2.2 GHz) and 800 (A76/A55 hybrid) enabled sub-6GHz 5G in mid-range devices, supporting 90Hz displays and 108MP cameras.[128] Advancing to ARMv9, the Dimensity 9000 (2021) and 9200 (2022), built on 4nm processes, featured Cortex-X2 and Cortex-X3 prime cores respectively, alongside A710/A715 performance cores and A510 efficiency cores, delivering flagship-level multitasking and ray-tracing graphics via Mali GPUs.[129] The Dimensity 9300, released in 2023 under ARMv9.2, pioneered an all-big-core CPU layout with four Cortex-X4 cores clocked up to 3.25 GHz plus four Cortex-A720 cores, eliminating dedicated efficiency cores for superior sustained performance in AI and gaming workloads, complemented by the Imagiq 890 ISP for advanced computational photography. 
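The heterogeneous scheduling idea behind big.LITTLE and CorePilot can be illustrated with a toy model. This is a hedged sketch: the cluster names, capacity values, and the placement threshold are illustrative assumptions, not MediaTek's actual algorithm.

```python
# Toy big.LITTLE task placement: route light work to efficiency cores and
# heavy work to performance cores. Capacities follow the Linux convention
# of a relative scale where the biggest core is 1024 (values illustrative).
CLUSTERS = {
    "big":    {"capacity": 1024, "cores": 2},   # e.g. Cortex-A72-class
    "little": {"capacity": 446,  "cores": 4},   # e.g. Cortex-A53-class
}

def place_task(load: float, threshold: float = 0.8) -> str:
    """Pick a cluster for a task whose load is a fraction (0..1) of a
    big core's capacity. If the task would push a LITTLE core past
    `threshold` utilization, migrate it up to the big cluster."""
    little_util = load * CLUSTERS["big"]["capacity"] / CLUSTERS["little"]["capacity"]
    return "big" if little_util > threshold else "little"

print(place_task(0.2))  # light background work -> "little"
print(place_task(0.9))  # sustained heavy work  -> "big"
```

Real schedulers also weigh thermal headroom, frequency scaling, and migration cost, but the core decision is this capacity comparison.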
Building on this, the 2024 Dimensity 9400 on a 3nm TSMC node adopted ARMv9.2 with a single Cortex-X925 prime core at 3.62 GHz, three Cortex-X4 performance cores, and four Cortex-A720 cores, achieving a 35% CPU uplift and 28% GPU improvement over the 9300 through optimized power delivery.[130] In 2025, the Dimensity 9500 arrived on a 3nm process with ARMv9 architecture, featuring an all-big-core setup led by Arm's C1-Ultra prime core and integrated 5G Advanced for ultra-low latency in edge computing scenarios, with an enhanced NPU for AI enhancements.[131] Beyond mobiles, MediaTek expanded into personal computing with the MT8195 (also known as Kompanio 1380) in 2022, an ARMv8 octa-core SoC featuring four Cortex-A78 cores on a 6nm process, designed for premium Chromebooks with support for 8K video and AV1 decoding.[132] For Windows on Arm, MediaTek announced an upcoming 8-core ARM-based chip in late 2025, developed in collaboration with Nvidia, targeting AI PCs with integrated GPU acceleration and expected compatibility with major OEMs like Lenovo.[133]

| Series | Key Models | ARM Architecture | Core Configuration Example | Target Devices | Notable Features |
|---|---|---|---|---|---|
| MT65xx | MT6580, MT6595 | v7 | Quad A7 (MT6580); 4x A17 + 4x A7 (MT6595) | Budget phones | Low-cost 3G/4G, 28nm process |
| Helio | X20, P23 | v8 | 2x A72 + 8x A53 (X20) | Mid-range phones | LTE Cat-6, CorePilot scheduling |
| Dimensity (Mid) | 700, 800 | v8.2 | 2x A76 + 6x A55 | 5G entry-level | Integrated modem, 90Hz display support |
| Dimensity (Flagship) | 9300, 9400, 9500 | v9.x | All-big-core X4/X925/C1-Ultra | Premium phones/PCs | AI NPU, 3nm efficiency, ray-tracing GPU |
Other Third-Party Processors
Other third-party ARM processors encompass a diverse range of implementations developed by vendors beyond the primary mobile-focused designers, often targeting servers, embedded systems, AI acceleration, and niche consumer devices. These cores frequently build on ARM's Cortex-A or Neoverse architectures but include custom modifications for specific workloads, such as high-performance computing or energy-efficient edge processing. Notable examples include processors from Huawei, NVIDIA, Amazon, and others, which have expanded ARM's footprint into data centers and specialized applications since the early 2010s. Huawei's Kirin series, introduced in 2012, pairs licensed ARMv8 and ARMv9 cores with Huawei's Da Vinci NPU architecture for mobile devices, with the Kirin 9000 (2020) featuring an ARM Cortex-A77-based prime core for enhanced AI and graphics performance in smartphones. For server applications, Huawei's TaiShan processors, such as the TaiShan V120 (2022) based on the Kunpeng 920 (ARMv8), deliver up to 64 cores optimized for cloud computing and big data workloads, achieving significant power efficiency in enterprise environments. These designs incorporate Huawei's proprietary extensions for security and acceleration, positioning them as key players in China's domestic semiconductor ecosystem. NVIDIA's Tegra family, dating back to 2008, evolved from ARM11 and ARMv7 Cortex cores to more specialized designs; the Tegra X1 (2015) employed quad Cortex-A57 cores for gaming consoles like the Nintendo Switch, while the Orin SoC (2022) integrates 12 Cortex-A78AE cores (replacing the custom Carmel ARMv8.2 cores of the earlier Xavier) tailored for AI inference, supporting up to 275 TOPS of performance in autonomous vehicles and robotics. The upcoming NVIDIA Blackwell platform (announced 2024, shipping 2025) incorporates ARM-based elements for control and acceleration in AI superchips, enhancing scalability for exascale computing. 
Amazon's Graviton processors, launched in 2018 with Cortex-A72 cores and moving to ARM Neoverse foundations from the second generation, power AWS cloud instances; the Graviton3 (2021) uses Neoverse V1 cores to provide up to 25% better price-performance over predecessors in web servers, while the Graviton4 (2024) adopts Neoverse V2 for improved vector processing and up to 30% faster database workloads, emphasizing sustainable computing with lower energy use. Broadcom's BCM2711 (2019), powering the Raspberry Pi 4, features four Cortex-A72 cores at 1.5 GHz for hobbyist and educational computing, enabling GPIO integration and multimedia decoding in compact boards. AMD briefly entered the ARM server market with the Opteron A1100 (2016), an octa-core Cortex-A57 design aimed at data-center and storage workloads. Rockchip's RK3588 (2022) combines eight cores, four Cortex-A76 and four Cortex-A55, for high-end tablets and single-board computers, delivering 6 TOPS of NPU performance for edge AI tasks like video analytics. Allwinner Technology's processors, such as the H616 used in budget tablets, rely on quad Cortex-A53 cores for cost-effective media playback and basic computing in consumer electronics. Ampere Computing's Altra series (2020 onward) employs up to 128 Neoverse N1 cores per socket for cloud servers, offering scalable performance for virtual machines with densities up to 256 cores in dual-socket configurations. Qualcomm's Oryon CPU, initially for client devices, expanded to server applications in 2025 via Snapdragon X Elite adaptations, targeting data center efficiency with custom ARMv9 cores that provide up to 45% better power utilization in edge servers compared to x86 alternatives.

Timeline of Releases
1985–2000
The development of ARM processors began in 1985 when Acorn Computers introduced the ARM1, one of the first commercial RISC processors, designed by Steve Furber and Sophie Wilson primarily as a second processor for the BBC Micro computer to accelerate simulation software and CAD tasks.[5][40] This 32-bit design emphasized low power consumption and efficiency, setting the foundation for future embedded applications.[5] In 1987, Acorn launched the ARM2 processor at 8 MHz, integrated into the Archimedes series of personal computers, which became the first commercially successful RISC-based home computers and marked ARM's entry into the PC market.[134][135] The ARM3 followed in 1989, operating at 25 MHz with added cache support, powering upgraded Archimedes models like the A5000 and enhancing performance for educational and desktop use in the UK market.[136] These early cores shifted focus from general computing toward more efficient designs suitable for battery-powered devices.[134] A pivotal milestone occurred in November 1990 with the formation of ARM Holdings (initially Advanced RISC Machines Ltd.) 
as a joint venture between Acorn Computers, Apple Computer, and VLSI Technology, transitioning ARM from an in-house Acorn project to a licensable IP model focused on embedded systems.[4] In 1991, the ARM6 core debuted at 20 MHz as the ARM610, specifically tailored for Apple's Newton MessagePad PDA, which launched in 1993 and represented ARM's first major foray into portable consumer electronics.[137][4] This collaboration helped establish ARM's low-power credentials for mobile applications.[4] Texas Instruments became one of the first major licensees in May 1993, adopting ARM designs for digital signal processing in mobile communications and advising Nokia on GSM phone implementations.[138] Digital Equipment Corporation (DEC) followed as an architectural licensee in the mid-1990s, developing the StrongARM (implementing the ARMv4 architecture) announced in 1996 at speeds up to 200 MHz for high-performance embedded uses.[139] The ARM7TDMI, introduced in 1994, emerged as ARM's first major embedded success with its Thumb instruction set for code density, powering early Nokia GSM phones like the 8110 in 1996.[140][138] Between 1996 and 1998, the ARM8 (as in the ARM810) and ARM9 families expanded adoption in set-top boxes for digital TV decoding and PDAs, with the ARM9 announced in 1997 offering five-stage pipelining for improved multimedia performance.[141][142] Examples include the ARM7-based Nokia 6110 phone in 1997, which sold millions and solidified ARM in mobile telephony, and the Psion Series 5 PDA in 1997 using the ARM710 at 18 MHz for EPOC OS tasks.[4][143] ARM Holdings went public in 1998, listing on the London Stock Exchange and NASDAQ, reflecting growing industry confidence.[4] The ARM10 core, announced in 1998 with initial silicon in 1999, targeted over 400 MIPS for consumer devices and featured early symmetric multiprocessing (SMP) support in variants like the ARM1020 for multi-core experiments in handhelds.[144] This period saw a market shift from 
Acorn PCs to mobile and embedded sectors, driven by licensees like TI and DEC; partner shipments exceeded 180 million units in 1999 alone, reaching hundreds of millions cumulatively by 2000.[145]

2001–2010
The period from 2001 to 2010 marked a pivotal era for ARM processors, driven by the explosive growth of mobile devices and the transition from feature phones to smartphones, which propelled ARM architectures into billions of units shipped annually. At the start of the decade, the ARMv5TE architecture enhanced digital signal processing capabilities with support for saturated arithmetic and improved Thumb instruction set interworking, enabling more efficient implementations in embedded systems. This architecture underpinned the widespread adoption of ARM9 processors in Symbian-based smartphones, such as those from Nokia and Ericsson, where ARM9 cores like the ARM926EJ-S provided the computational foundation for early mobile operating systems, contributing to Symbian's dominance in the European market with over 100 million devices by mid-decade.[146][147] In 2004, ARM announced the Cortex-M3 processor, targeting low-cost, high-performance microcontroller applications with its ARMv7-M architecture, which included Thumb-2 instructions for better code density and interrupt handling, quickly finding use in consumer electronics and industrial controls. The ARM11 family, introduced in 2002 but gaining traction in portable media players, powered devices like Apple's iPod touch starting in 2007, where the Samsung-fabricated S5L8900 SoC with an ARM1176JZF-S core delivered 412 MHz performance for multimedia tasks. 
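The code-density benefit of Thumb and Thumb-2 comes from mixing 16-bit encodings into an otherwise 32-bit instruction stream. A rough back-of-the-envelope model (the instruction counts and the 70% mix are illustrative assumptions, not measured data):

```python
def code_size_bytes(n_insns: int, frac_16bit: float) -> int:
    """Estimate code size when a fraction of instructions use 16-bit
    Thumb encodings and the rest use full 32-bit encodings."""
    return int(n_insns * (frac_16bit * 2 + (1 - frac_16bit) * 4))

arm_only   = code_size_bytes(10_000, 0.0)  # pure 32-bit ARM encodings
thumb2_mix = code_size_bytes(10_000, 0.7)  # hypothetical 70% 16-bit mix
print(f"size reduction: {1 - thumb2_mix / arm_only:.0%}")  # prints "size reduction: 35%"
```

Real-world density gains depend on the compiler and workload, but this is the mechanism: each instruction that fits a 16-bit encoding halves its footprint.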
In 2005, ARM unveiled the Cortex-A8, its first superscalar processor and the first core based on ARMv7-A, offering up to 2x the performance of ARM11 at similar power levels through advanced branch prediction and NEON SIMD extensions, setting the stage for high-end mobile computing.[148][149] The 2007 launch of the original iPhone, featuring the Samsung-fabricated S5L8900 processor with an ARM11 core on a 90 nm process, revolutionized the smartphone industry by integrating touch interfaces and app ecosystems, accelerating ARM's penetration into premium mobile markets and inspiring competitors to adopt similar architectures. Concurrently, the ARM7TDMI core, a staple since the 1990s, powered billions of feature phones worldwide, with ARM partners shipping nearly 3 billion processors in 2007 alone, many embedded in basic cellular devices from manufacturers like Nokia and Motorola. In 2006, the Cortex-R4 processor was introduced for real-time applications, particularly in automotive systems, providing deterministic performance with dual-issue execution and optional floating-point support, enabling advancements in engine control units and safety-critical embedded environments.[150][151][67] By 2010, ARM's ecosystem had scaled dramatically, with cumulative shipments exceeding 10 billion cores since inception, fueled by the mobile revolution and the onset of Android's adoption in 2008, which leveraged ARMv7-compatible processors in devices like the HTC Dream. The iPhone 4 introduced Apple's A4 SoC, based on the Cortex-A8 core and fabricated by Samsung on a 45 nm process, delivering 1 GHz performance with integrated PowerVR graphics, underscoring the iPhone's outsized impact in driving demand for powerful, power-efficient ARM designs. 
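NEON's contribution is lane parallelism: a single instruction operates on a full vector register holding, for example, four 32-bit values. A scalar Python simulation of a 4-lane vector add conveys the concept (a sketch only; real NEON is accessed via assembly or compiler intrinsics, not this function):

```python
def vadd_4lane(a, b):
    """Simulate a NEON-style vector add on four 32-bit unsigned lanes:
    each lane is added independently, wrapping modulo 2**32."""
    assert len(a) == len(b) == 4
    return [(x + y) & 0xFFFFFFFF for x, y in zip(a, b)]

# One "instruction" does the work of four scalar adds; lanes never
# carry into each other, so the top lane wraps independently.
print(vadd_4lane([1, 2, 3, 0xFFFFFFFF], [10, 20, 30, 1]))
# -> [11, 22, 33, 0]
```

Media workloads (pixel blending, audio mixing) map naturally onto this model, which is why NEON mattered for the multimedia phones of this era.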
NVIDIA's Tegra platform, featuring ARM cores in early tablets and 2010-era development kits, expanded ARM into multimedia handhelds, while the conceptual groundwork for heterogeneous computing, later formalized as big.LITTLE in 2011, was laid through explorations of combining high-performance and efficiency cores. This decade's innovations not only solidified ARM's leadership in low-power computing but also transformed consumer electronics, with smartphones alone accounting for over 90% of ARM-based shipments by 2010.[43]

2011–2020
The decade from 2011 to 2020 marked a pivotal shift for ARM processors toward 64-bit architectures, enabling greater performance and efficiency that propelled widespread adoption in mobile devices and initial forays into server infrastructure. In 2011, ARM announced the ARMv8 architecture, introducing 64-bit capabilities while maintaining backward compatibility with 32-bit code to facilitate a smooth transition for developers and manufacturers. Concurrently, Apple's iPad 2 debuted with the A5 processor, featuring a dual-core ARM Cortex-A9 design that enhanced graphics and multitasking for tablets, underscoring ARM's growing dominance in consumer electronics.[4][152] By 2013, the 64-bit era arrived in smartphones with Apple's iPhone 5S, powered by the A7 chip, the first 64-bit ARM-based processor in a consumer device, which delivered up to double the CPU and GPU performance of its predecessor while improving energy efficiency. This milestone accelerated the industry's move to 64-bit computing, as ARMv8 enabled more complex applications and better handling of large datasets. ARM's Cortex-A53 and Cortex-A57 cores, announced in 2012 and shipping in devices from 2014, implemented the ARMv8-A instruction set, with the A57 providing high-performance capabilities and the A53 focusing on efficiency; both appeared in Qualcomm's Snapdragon 810 (2015), which employed a big.LITTLE heterogeneous configuration to dynamically balance power and performance in premium smartphones.[153][154] The mid-2010s saw explosive ecosystem growth, particularly in mobile. In 2016, ARM introduced the Cortex-A73, a more efficient successor to the A72 that improved single-threaded performance by 30% at iso-power, further refining big.LITTLE implementations for sustained battery life in always-on devices. 
Samsung advanced custom ARM designs with the Exynos 8890, incorporating four proprietary M1 cores alongside Cortex-A53 for flagship Galaxy S7 devices, demonstrating how licensees could tailor ARM IP for competitive edges in AI and multimedia processing. That year, ARM-based processors powered over 10 billion smartphone shipments cumulatively, reflecting the architecture's near-ubiquity in the global mobile market.[155] Entering the late 2010s, manufacturing advances amplified ARM's scalability. In 2018, the Cortex-A76 debuted as ARM's first core designed specifically for 7nm process nodes, offering 35% higher performance or 40% better efficiency compared to the A73, enabling laptop-class computing in thin smartphones. This period also saw ARM launch the Neoverse N1 platform in 2019 (building on 2018 announcements), a server-oriented core based on the A76 that supported up to 128 cores per socket for cloud workloads, marking ARM's strategic entry into data centers with hyperscalers like AWS adopting it for energy-efficient scaling. ARMv8 architectures achieved dominance, powering the majority of new mobile and embedded designs.[156] The year 2020 highlighted ARM's maturation across sectors amid global challenges. The Cortex-A78 arrived, emphasizing sustained performance and AI acceleration, and appeared in Qualcomm's Snapdragon 888 alongside a Cortex-X1 prime core and a dedicated 5G RF system for sub-6GHz and mmWave connectivity in next-generation devices. Apple announced the M1 chip, transitioning Macs to custom ARM-based silicon for the first time, promising superior performance-per-watt for professional workflows. The COVID-19 pandemic accelerated demand for ARM PCs by emphasizing remote computing needs, with efficient, low-power designs gaining traction in laptops and edge devices. 
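Vendor claims like the Cortex-A76's "35% higher performance or 40% better efficiency" describe two different points on a performance/power curve, and the arithmetic is worth making explicit. A hedged illustration using only the headline percentages from the text (interpreting "40% better efficiency" as 40% less power at equal performance, which is the usual reading):

```python
def perf_per_watt(perf: float, power: float) -> float:
    """Relative performance-per-watt, normalized to a 1.0/1.0 baseline."""
    return perf / power

baseline   = perf_per_watt(1.00, 1.00)        # A73-class reference point
same_power = perf_per_watt(1.35, 1.00)        # +35% perf at equal power
same_perf  = perf_per_watt(1.00, 1.00 - 0.40) # equal perf at 40% less power

print(f"{same_power / baseline:.2f}x perf/W at iso-power")  # 1.35x
print(f"{same_perf / baseline:.2f}x perf/W at iso-perf")    # 1.67x
```

The two marketing figures are thus not additive; a shipping SoC picks an operating point somewhere between the two extremes.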
By 2020, cumulative shipments of ARM-based chips exceeded 160 billion, with 5G integration becoming standard in premium processors to support emerging connectivity ecosystems.[157][158][159]

2021–Present
The period from 2021 onward marked a significant evolution in ARM processors, driven by the introduction of the ARMv9 architecture, which enhanced security, machine learning capabilities, and scalability for both mobile and server applications. Announced in March 2021, ARMv9 incorporated scalable vector extensions (SVE2) for improved AI and HPC workloads, building on the 64-bit foundation while addressing emerging needs in confidential computing and branch prediction. In parallel, the Cortex-X1 core debuted in high-end mobile SoCs, offering up to 22% single-threaded performance gains over its predecessor through wider execution units and advanced caching, first appearing in Qualcomm's Snapdragon 888 in late 2020; its successor, the Cortex-X2, anchored the Snapdragon 8 Gen 1, unveiled in December 2021 on a 4nm process for flagship Android devices. By 2022, Apple's M2 processor advanced the ARM ecosystem in personal computing, released in June for the MacBook Air and iPad Pro, featuring an 8-core CPU with improved Avalanche performance cores and up to 24 GPU cores on TSMC's second-generation 5nm node, delivering 18% faster CPU performance than the M1. Server-side progress included the Neoverse V2 core, announced in September 2022, which targeted cloud infrastructure with SVE support and up to 50% better integer performance per watt compared to V1, later adopted in AWS's Graviton4 instances. Process node shifts accelerated with MediaTek's Dimensity 9000 in November 2021 (devices in 2022), adopting TSMC's 4nm for efficient 5G and AI processing, signaling the industry's move toward sub-5nm fabrication. In 2023, mobile AI integration surged with dedicated NPUs becoming standard, exemplified by Apple's A17 Pro in the iPhone 15 series (September 2023), which included a 16-core Neural Engine capable of 35 trillion operations per second on a 3nm process, enhancing on-device machine learning for features like real-time translation. 
Server advancements continued with Neoverse V3, unveiled in early 2024, offering 50% more performance than V2 through mesh networking and confidential computing extensions, while AWS's Graviton3 processors, based on Neoverse V1 but scaled in 2023 deployments, provided up to 25% better price-performance for EC2 instances. The AI NPU boom reflected broader adoption, with over 80% of premium smartphones incorporating dedicated accelerators by year-end, boosting inference speeds for generative AI tasks. 2024 saw further maturation in high-performance computing with Apple's M4 chip in May for the iPad Pro, featuring a 10-core CPU on TSMC's 3nm enhanced process and a 16-core Neural Engine at 38 TOPS, prioritizing efficiency for slim designs. Qualcomm's Snapdragon 8 Elite, announced in October, shifted to custom Oryon cores with up to 45% faster CPU performance on a 3nm node, targeting AI-driven laptops and phones. Samsung's Exynos 2400, released in January for the Galaxy S24, integrated a deca-core design with Xclipse GPU on 4nm, while MediaTek's Dimensity 9400 in October emphasized 3nm efficiency and ray-tracing GPU for gaming. Announcements for 2nm processes proliferated, with TSMC and Samsung planning EUV-based production in 2025 for next-gen ARM SoCs, promising 15-20% density gains. As of November 2025, the ARM landscape continued to expand with Cortex-X925 and Cortex-A725 cores announced in May 2024, delivering up to 36% IPC uplift via ARMv9.2 extensions for AI and 5G, set for integration in 2025 flagships. Qualcomm's Snapdragon 8 Elite Gen 5, released in September 2025, featured enhanced NPU at 75 TOPS on 3nm, powering AI-centric devices. Apple's A18 (iPhone 16, September 2024) and anticipated M5 (expected late 2025 for Macs) emphasized 2nm readiness and 40+ TOPS AI, while MediaTek expanded into Windows on ARM with Dimensity variants for Copilot+ PCs in Q2 2025. 
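The growth in mobile NPU throughput over this period, from roughly 10 TOPS in 2021 flagships to around 75 TOPS in 2025, corresponds to a compound annual growth factor that can be computed directly (a sketch using the approximate endpoint figures quoted in this article):

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth factor between two data points."""
    return (end / start) ** (1 / years)

# ~10 TOPS (2021 flagships) -> ~75 TOPS (2025 flagships)
rate = cagr(10, 75, 2025 - 2021)
print(f"NPU throughput grew roughly {rate:.2f}x per year")
```

The result, about 1.65x per year, is brisk but short of a strict annual doubling; the headline TOPS numbers also depend on precision (INT8 vs INT4), so cross-vendor comparisons should be treated cautiously.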
ARM reported surpassing 310 billion total shipped cores as of mid-2025, underscoring ecosystem dominance.[160] Key milestones included the revival of Windows on ARM, fueled by Qualcomm's Snapdragon X series and Microsoft's Copilot+ initiative in 2024, achieving over 20 million units shipped by 2025 with native x86 emulation. EUV-enabled 2nm adoption began in production for select ARM designs, enabling denser integrations. ARMv9.3, announced in September 2025, saw rapid uptake in servers for enhanced virtualization. AI performance in ARM processors grew severalfold, from roughly 10 TOPS in 2021 mobiles to 50-75 TOPS by 2025, driven by NPU optimizations.

References
- https://en.wikichip.org/wiki/qualcomm/kryo
- https://en.wikichip.org/wiki/samsung/exynos
- https://en.wikichip.org/wiki/mediatek/helio
- https://en.wikichip.org/wiki/acorn/microarchitectures/arm3
- https://en.wikichip.org/wiki/arm_holdings/microarchitectures/arm6
- https://en.wikichip.org/wiki/dec/microarchitectures/strongarm
- https://en.wikichip.org/wiki/arm_holdings/microarchitectures/arm8
- https://en.wikichip.org/wiki/arm_holdings/microarchitectures/arm9/pr1
- https://en.wikichip.org/wiki/arm_holdings/microarchitectures/arm10
