ARM architecture family
| Designer | |
|---|---|
| Bits | 32-bit, 64-bit |
| Introduced | 1985 |
| Design | RISC |
| Type | Load–store |
| Branching | Condition code, compare and branch |
| Open | No; proprietary |
| 64/32-bit architecture | |
|---|---|
| Introduced | 2011 |
| Version | ARMv8-R, ARMv8-A, ARMv8.1-A, ARMv8.2-A, ARMv8.3-A, ARMv8.4-A, ARMv8.5-A, ARMv8.6-A, ARMv8.7-A, ARMv8.8-A, ARMv8.9-A, ARMv9.0-A, ARMv9.1-A, ARMv9.2-A, ARMv9.3-A, ARMv9.4-A, ARMv9.5-A, ARMv9.6-A |
| Encoding | AArch64/A64 and AArch32/A32 use 32-bit instructions, AArch32/T32 (Thumb-2) uses mixed 16- and 32-bit instructions[1] |
| Endianness | Bi (little as default) |
| Extensions | SVE, SVE2, SME, AES, SM3, SM4, SHA, CRC32, RNDR, TME; All mandatory: Thumb-2, Neon, VFPv4-D16, VFPv4; obsolete: Thumb and Jazelle |
| Registers | |
| General-purpose | 31 × 64-bit integer registers[1] |
| Floating-point | 32 × 128-bit registers[1] for scalar 32- and 64-bit FP or SIMD FP or integer; or cryptography |
| 32-bit architectures (Cortex) | |
|---|---|
| Version | ARMv9-R, ARMv9-M, ARMv8-R, ARMv8-M, ARMv7-A, ARMv7-R, ARMv7E-M, ARMv7-M |
| Encoding | 32-bit, except Thumb-2 extensions use mixed 16- and 32-bit instructions. |
| Endianness | Bi (little as default) |
| Extensions | Thumb, Thumb-2, Neon, Jazelle, AES, SM3, SM4, SHA, CRC32, RNDR, DSP, Saturated, FPv4-SP, FPv5, Helium; obsolete since ARMv8: Thumb and Jazelle |
| Registers | |
| General-purpose | 15 × 32-bit integer registers, including R14 (link register), but not R15 (PC) |
| Floating-point | Up to 32 × 64-bit registers,[2] SIMD/floating-point (optional) |
| 32-bit architectures (legacy) | |
|---|---|
| Version | ARMv6, ARMv5, ARMv4T, ARMv3, ARMv2 |
| Encoding | 32-bit, except Thumb extension uses mixed 16- and 32-bit instructions. |
| Endianness | Bi (little as default) in ARMv3 and above |
| Extensions | Thumb, Jazelle |
| Registers | |
| General-purpose | 15 × 32-bit integer registers, including R14 (link register), but not R15 (PC, 26-bit addressing in older) |
| Floating-point | None |
ARM (stylised in lowercase as arm, formerly an acronym for Advanced RISC Machines and originally Acorn RISC Machine) is a family of RISC instruction set architectures for computer processors. Arm Holdings develops the instruction set architectures and licenses them to other companies, who build the physical devices that use them. It also designs and licenses cores that implement these architectures.
Due to their low costs, low power consumption, and low heat generation, ARM processors are useful for light, portable, battery-powered devices, including smartphones, laptops, and tablet computers, as well as embedded systems.[3][4][5] However, ARM processors are also used for desktops and servers, including Fugaku, the world's fastest supercomputer from 2020[6] to 2022. With over 230 billion ARM chips produced,[7][8] ARM has been the most widely used family of instruction set architectures since at least 2003, and its dominance has increased every year.[9][4][10][11][12]
There have been several generations of the ARM design. The original ARM1 used a 32-bit internal structure but had a 26-bit address space that limited it to 64 MB of main memory. This limitation was removed in the ARMv3 series, which has a 32-bit address space, and several additional generations up to ARMv7 remained 32-bit. Released in 2011, the ARMv8-A architecture added support for a 64-bit address space and 64-bit arithmetic with its new 32-bit fixed-length instruction set.[13] Arm Holdings has also released a series of additional instruction sets for different roles: the "Thumb" extensions add both 32- and 16-bit instructions for improved code density, while Jazelle added instructions for directly handling Java bytecode. More recent changes include the addition of simultaneous multithreading (SMT) for improved performance or fault tolerance.[14]
History
BBC Micro
Acorn Computers' first widely successful design was the BBC Micro, introduced in December 1981. This was a relatively conventional machine based on the MOS Technology 6502 CPU but ran at roughly double the performance of competing designs like the Apple II due to its use of faster dynamic random-access memory (DRAM). Typical DRAM of the era ran at about 2 MHz; Acorn arranged a deal with Hitachi for a supply of faster 4 MHz parts.[15]
Machines of the era generally shared memory between the processor and the framebuffer, which allowed the processor to quickly update the contents of the screen without having to perform separate input/output (I/O). As the timing of the video display is exacting, the video hardware had to have priority access to that memory. Due to a quirk of the 6502's design, the CPU left the memory untouched for half of the time. Thus, by running the CPU at 1 MHz, the video system could read data during those down times, using the full 2 MHz bandwidth of the RAM. In the BBC Micro, the use of 4 MHz RAM allowed the same technique to be used, but running at twice the speed. This allowed it to outperform any similar machine on the market.[16]
Acorn Business Computer
1981 was also the year that the IBM Personal Computer was introduced. Using the recently introduced Intel 8088, a 16-bit CPU compared to the 6502's 8-bit design, it offered higher overall performance. Its introduction changed the desktop computer market radically: what had been largely a hobby and gaming market emerging over the prior five years began to change to a must-have business tool where the earlier 8-bit designs simply could not compete. Newer 32-bit designs were also coming to market, such as the Motorola 68000[17] and National Semiconductor NS32016.[18]
Acorn began considering how to compete in this market and produced a new paper design named the Acorn Business Computer. They set themselves the goal of producing a machine with ten times the performance of the BBC Micro, but at the same price.[19] This would outperform and underprice the PC. At the same time, the recent introduction of the Apple Lisa brought the graphical user interface (GUI) concept to a wider audience and suggested the future belonged to machines with a GUI.[20] The Lisa, however, cost $9,995, as it was packed with support chips, large amounts of memory, and a hard disk drive, all very expensive then.[21]
The engineers then began studying all of the CPU designs available. Their conclusion about the existing 16-bit designs was that they were a lot more expensive and were still "a bit crap",[22] offering only slightly higher performance than their BBC Micro design. They also almost always demanded a large number of support chips to operate even at that level, which drove up the cost of the computer as a whole. These systems would simply not hit the design goal.[22] They also considered the new 32-bit designs, but these cost even more and had the same issues with support chips.[23] According to Sophie Wilson, all the processors tested at that time performed about the same, with about a 4 Mbit/s bandwidth.[24][a]
Two key events led Acorn down the path to ARM. One was the publication of a series of reports from the University of California, Berkeley, which suggested that a simple chip design could nevertheless have extremely high performance, much higher than the latest 32-bit designs on the market.[25] The second was a visit by Steve Furber and Sophie Wilson to the Western Design Center, a company run by Bill Mensch and his sister Kathryn,[26] which had become the logical successor to the MOS team and was offering new versions like the WDC 65C02. The Acorn team saw high school students producing chip layouts on Apple II machines, which suggested that anyone could do it.[27][28] In contrast, a visit to another design firm working on a modern 32-bit CPU revealed a team with over a dozen members who were already on revision H of their design and yet it still contained bugs.[b] This cemented their late 1983 decision to begin their own CPU design, the Acorn RISC Machine.[29]
Design concepts
The original Berkeley RISC designs were in some sense teaching systems, not designed specifically for outright performance. To the RISC's basic register-heavy and load/store concepts, ARM added a number of the well-received design notes of the 6502. Primary among them was the ability to quickly service interrupts, which allowed the machines to offer reasonable input/output performance with no added external hardware. To offer interrupts with similar performance as the 6502, the ARM design limited its physical address space to 64 MB of total addressable space, requiring 26 bits of address. As instructions were 4 bytes (32 bits) long and required to be aligned on 4-byte boundaries, the lower 2 bits of an instruction address were always zero. This meant the program counter (PC) only needed to be 24 bits, allowing it to be stored along with the eight status and mode bits in a single 32-bit register. Upon receiving an interrupt, the entire machine state could therefore be saved in a single operation, whereas had the PC been a full 32-bit value, it would have required separate operations to store the PC and the status flags. This decision halved the interrupt overhead.[30]
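The packing described above can be sketched in a few lines. This is a toy model, not production code; the exact bit layout shown (flags in the top six bits, word-aligned PC in bits 25..2, mode in the low two bits) is an assumption for illustration, and the helper names are ours:

```python
# Toy sketch of the 26-bit ARM's combined PC/status register (assumed layout):
#   bits 31..26 : status flags (N, Z, C, V plus interrupt masks)
#   bits 25..2  : word-aligned program counter (low 2 bits always zero)
#   bits  1..0  : processor mode
def pack_r15(pc: int, flags: int, mode: int) -> int:
    assert pc % 4 == 0 and pc < (1 << 26)      # word-aligned, 26-bit space
    return ((flags & 0x3F) << 26) | (pc & 0x03FF_FFFC) | (mode & 0x3)

def unpack_r15(r15: int):
    return (r15 & 0x03FF_FFFC,      # program counter
            (r15 >> 26) & 0x3F,     # status flags
            r15 & 0x3)              # mode bits

# One 32-bit store saves PC, flags, and mode together on interrupt entry.
r15 = pack_r15(0x8000, flags=0b100000, mode=0b00)
pc, flags, mode = unpack_r15(r15)
```

Because everything lives in one register, interrupt entry needs a single save rather than separate stores for PC and flags, which is the overhead halving described above.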
Another change, and among the most important in terms of practical real-world performance, was the modification of the instruction set to take advantage of page mode DRAM. Recently introduced, page mode allowed subsequent accesses of memory to run twice as fast if they were roughly in the same location, or "page", in the DRAM chip. Berkeley's design did not consider page mode and treated all memory equally. The ARM design added special vector-like memory access instructions, the "S-cycles", that could be used to fill or save multiple registers in a single page using page mode. This doubled memory performance when they could be used, and was especially important for graphics performance.[31]
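The benefit of the S-cycle instructions can be illustrated with a toy timing model. The cycle costs and page size below are assumptions for illustration only, not the real chip's timings; the point is that one multi-register transfer pays the non-sequential cost once:

```python
# Toy DRAM timing model (assumed costs): a non-sequential access takes
# 2 cycles, while a sequential access within the same page takes 1.
def cycles(addresses, n_cost=2, s_cost=1, page=1024):
    total, last = 0, None
    for a in addresses:
        sequential = (last is not None
                      and a == last + 4            # next word
                      and a // page == last // page)  # same DRAM page
        total += s_cost if sequential else n_cost
        last = a
    return total

# Saving 8 registers with one block transfer: 1 non-sequential + 7 sequential
block = cycles(range(0x1000, 0x1020, 4))   # 2 + 7*1 = 9 cycles
single = 8 * 2                             # eight separate accesses: 16 cycles
```

Under these assumed numbers the block transfer nearly doubles effective memory bandwidth, matching the article's claim that page-mode-aware instructions doubled memory performance when applicable.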
The Berkeley RISC designs used register windows to reduce the number of register saves and restores performed in procedure calls; the ARM design did not adopt this.
Wilson developed the instruction set, writing a simulation of the processor in BBC BASIC that ran on a BBC Micro with a second 6502 processor.[32][33] This convinced Acorn engineers they were on the right track. Wilson approached Acorn's CEO, Hermann Hauser, and requested more resources. Hauser gave his approval and assembled a small team to design the actual processor based on Wilson's instruction set architecture.[34] The official Acorn RISC Machine project started in October 1983.
ARM1
Acorn chose VLSI Technology as the "silicon partner", as they were a source of ROMs and custom chips for Acorn. Acorn provided the design and VLSI provided the layout and production. The first samples of ARM silicon worked properly when first received and tested on 26 April 1985.[3] Known as ARM1, these versions ran at 6 MHz.[35]
The first ARM application was as a second processor for the BBC Micro, where it helped in developing simulation software to finish development of the support chips (VIDC, IOC, MEMC), and sped up the CAD software used in ARM2 development. Wilson subsequently rewrote BBC BASIC in ARM assembly language. The in-depth knowledge gained from designing the instruction set enabled the code to be very dense, making ARM BBC BASIC an extremely good test for any ARM emulator.
ARM Evaluation Systems featuring ARM1 CPUs, supplied as second processors for BBC Micro and Master machines, were made available from July 1986[36] under the Acorn OEM Products brand to developers and researchers.[37]
The A500 Second Processor, another ARM1-based second processor for the BBC Micro and Master, featured the ARM support chipset (VIDC, IOC, MEMC), was capable of producing video output,[38] and operated nearly independently of the host BBC Micro.
ARM2
The result of the simulations on the ARM1 boards led to the late 1986 introduction of the ARM2 design running at 8 MHz, and the early 1987 speed-bumped version at 10 to 12 MHz.[c] A significant change in the underlying architecture was the addition of a Booth multiplier, whereas formerly multiplication had to be carried out in software.[40] Further, a new Fast Interrupt reQuest mode, FIQ for short, allowed registers 8 to 14 to be replaced as part of the interrupt itself. This meant FIQ requests did not have to save out their registers, further speeding interrupts.[41]
The first use of the ARM2 was in internal Acorn A500 development machines,[42] and in the Acorn Archimedes personal computer models A305, A310, and A440, launched on 6 June 1987.
According to the Dhrystone benchmark, the ARM2 was roughly seven times the performance of a typical 7 MHz 68000-based system like the Amiga or Macintosh SE. It was twice as fast as an Intel 80386 running at 16 MHz, and about the same speed as a multi-processor VAX-11/784 superminicomputer. The only systems that beat it were the Sun SPARC and MIPS R2000 RISC-based workstations.[43] Further, as the CPU was designed for high-speed I/O, it dispensed with many of the support chips seen in these machines; notably, it lacked any dedicated direct memory access (DMA) controller which was often found on workstations. The graphics system was also simplified based on the same set of underlying assumptions about memory and timing. The result was a dramatically simplified design, offering performance on par with expensive workstations but at a price point similar to contemporary desktops.[43]
The ARM2 featured a 32-bit data bus, a 26-bit address space, and 27 32-bit registers, of which 16 are accessible at any one time (including the PC).[44] The ARM2 had a transistor count of just 30,000,[45] compared to around 68,000 in Motorola's six-year-older 68000. Much of this simplicity came from the lack of microcode, which accounts for about one-quarter to one-third of the 68000's transistors, and from the lack of a cache, like most CPUs of the day. This simplicity gave the ARM2 low power consumption and simpler thermal packaging, since fewer transistors were powered. Nevertheless, the ARM2 offered better performance than the contemporary 1987 IBM PS/2 Model 50, which initially used an Intel 80286 offering 1.8 MIPS at 10 MHz, and later in 1987 the 2 MIPS of the PS/2 70 with its Intel 386 DX at 16 MHz.[46][47]
A successor, ARM3, was produced with a 4 KB cache, which further improved performance.[48] The address bus was extended to 32 bits in the ARM6, but program code still had to lie within the first 64 MB of memory in 26-bit compatibility mode, due to the reserved bits for the status flags.[49]
Advanced RISC Machines Ltd. – ARM6
In the late 1980s, Apple Computer and VLSI Technology started working with Acorn on newer versions of the ARM core. In 1990, Acorn spun off the design team into a new company named Advanced RISC Machines Ltd.,[50][51][52] which became ARM Ltd. when its parent company, Arm Holdings plc, floated on the London Stock Exchange and Nasdaq in 1998.[53] The new Apple–ARM work would eventually evolve into the ARM6, first released in early 1992. Apple used the ARM6-based ARM610 as the basis for their Apple Newton PDA.
Early licensees
In 1994, Acorn used the ARM610 as the main central processing unit (CPU) in their RiscPC computers. DEC licensed the ARMv4 architecture and produced the StrongARM.[54] At 233 MHz, this CPU drew only one watt (newer versions draw far less). This work was later passed to Intel as part of a lawsuit settlement, and Intel took the opportunity to supplement their i960 line with the StrongARM. Intel later developed its own high performance implementation named XScale, which it has since sold to Marvell. Transistor count of the ARM core remained essentially the same throughout these changes; ARM2 had 30,000 transistors,[55] while ARM6 grew only to 35,000.[56]
Market share
In 2005, about 98% of all mobile phones sold used at least one ARM processor.[57] In 2010, producers of chips based on ARM architectures reported shipments of 6.1 billion ARM-based processors, representing 95% of smartphones, 35% of digital televisions and set-top boxes, and 10% of mobile computers. In 2011, the 32-bit ARM architecture was the most widely used architecture in mobile devices and the most popular 32-bit one in embedded systems.[58] In 2013, 10 billion were produced[59] and "ARM-based chips are found in nearly 60 percent of the world's mobile devices".[60]
Licensing
Core licence
Arm Holdings's primary business is selling IP cores, which licensees use to create microcontrollers (MCUs), CPUs, and systems-on-chips based on those cores. The original design manufacturer combines the ARM core with other parts to produce a complete device, typically one that can be built in existing semiconductor fabrication plants (fabs) at low cost and still deliver substantial performance. The most successful implementation has been the ARM7TDMI, with hundreds of millions sold. Atmel has been a pioneering design centre for ARM7TDMI-based embedded systems.
The ARM architectures used in smartphones, PDAs and other mobile devices range from ARMv5 to ARMv8-A.
In 2009, some manufacturers introduced netbooks based on ARM architecture CPUs, in direct competition with netbooks based on Intel Atom.[61]
Arm Holdings offers a variety of licensing terms, varying in cost and deliverables. Arm Holdings provides to all licensees an integratable hardware description of the ARM core, a complete software development toolset (compiler, debugger, software development kit), and the right to sell manufactured silicon containing the ARM CPU.
SoC packages integrating ARM's core designs include Nvidia Tegra's first three generations, CSR plc's Quatro family, ST-Ericsson's Nova and NovaThor, Silicon Labs's Precision32 MCU, Texas Instruments's OMAP products, Samsung's Hummingbird and Exynos products, Apple's A4, A5, and A5X, and NXP's i.MX.
Fabless licensees, who wish to integrate an ARM core into their own chip design, are usually only interested in acquiring a ready-to-manufacture verified semiconductor intellectual property core. For these customers, Arm Holdings delivers a gate netlist description of the chosen ARM core, along with an abstracted simulation model and test programs to aid design integration and verification. More ambitious customers, including integrated device manufacturers (IDM) and foundry operators, choose to acquire the processor IP in synthesizable RTL (Verilog) form. With the synthesizable RTL, the customer has the ability to perform architectural level optimisations and extensions. This allows the designer to achieve exotic design goals not otherwise possible with an unmodified netlist (high clock speed, very low power consumption, instruction set extensions, etc.). While Arm Holdings does not grant the licensee the right to resell the ARM architecture itself, licensees may freely sell manufactured products such as chip devices, evaluation boards and complete systems. Merchant foundries can be a special case; not only are they allowed to sell finished silicon containing ARM cores, they generally hold the right to re-manufacture ARM cores for other customers.
Arm Holdings prices its IP based on perceived value. Lower performing ARM cores typically have lower licence costs than higher performing cores. In implementation terms, a synthesisable core costs more than a hard macro (blackbox) core. Complicating price matters, a merchant foundry that holds an ARM licence, such as Samsung or Fujitsu, can offer fab customers reduced licensing costs. In exchange for acquiring the ARM core through the foundry's in-house design services, the customer can reduce or eliminate payment of ARM's upfront licence fee.
Compared to dedicated semiconductor foundries (such as TSMC and UMC) without in-house design services, Fujitsu/Samsung charge two- to three-times more per manufactured wafer.[citation needed] For low to mid volume applications, a design service foundry offers lower overall pricing (through subsidisation of the licence fee). For high volume mass-produced parts, the long term cost reduction achievable through lower wafer pricing reduces the impact of ARM's NRE (non-recurring engineering) costs, making the dedicated foundry a better choice.
Companies that have developed chips with cores designed by Arm include Amazon.com's Annapurna Labs subsidiary,[62] Analog Devices, Apple, AppliedMicro (now: MACOM Technology Solutions[63]), Atmel, Broadcom, Cavium, Cypress Semiconductor, Freescale Semiconductor (now NXP Semiconductors), Huawei, Intel,[dubious – discuss] Maxim Integrated, Nvidia, NXP, Qualcomm, Renesas, Samsung Electronics, ST Microelectronics, Texas Instruments, and Xilinx.
Built on ARM Cortex Technology licence
In February 2016, ARM announced the Built on ARM Cortex Technology licence, often shortened to Built on Cortex (BoC) licence. This licence allows companies to partner with ARM and make modifications to ARM Cortex designs. These design modifications are not shared with other companies. These semi-custom core designs also have brand freedom, for example Kryo 280.
Companies that are current licensees of Built on ARM Cortex Technology include Qualcomm.[64]
Architectural licence
Companies can also obtain an ARM architectural licence for designing their own CPU cores using the ARM instruction sets. These cores must comply fully with the ARM architecture. Companies that have designed cores that implement an ARM architecture include Apple, AppliedMicro (now: Ampere Computing), Broadcom, Cavium (now: Marvell), Digital Equipment Corporation, Intel, Nvidia, Qualcomm, Samsung Electronics, Fujitsu, and NUVIA Inc. (acquired by Qualcomm in 2021).
ARM Flexible Access
On 16 July 2019, ARM announced ARM Flexible Access. ARM Flexible Access provides unlimited access to included ARM intellectual property (IP) for development. Per product licence fees are required once a customer reaches foundry tapeout or prototyping.[65][66]
75% of ARM's most recent IP from the last two years is included in ARM Flexible Access. As of October 2019:
- CPUs: Cortex-A5, Cortex-A7, Cortex-A32, Cortex-A34, Cortex-A35, Cortex-A53, Cortex-R5, Cortex-R8, Cortex-R52, Cortex-M0, Cortex-M0+, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, Cortex-M33
- GPUs: Mali-G52, Mali-G31. Includes Mali Driver Development Kits (DDK).
- Interconnect: CoreLink NIC-400, CoreLink NIC-450, CoreLink CCI-400, CoreLink CCI-500, CoreLink CCI-550, ADB-400 AMBA, XHB-400 AXI-AHB
- System Controllers: CoreLink GIC-400, CoreLink GIC-500, PL192 VIC, BP141 TrustZone Memory Wrapper, CoreLink TZC-400, CoreLink L2C-310, CoreLink MMU-500, BP140 Memory Interface
- Security IP: CryptoCell-312, CryptoCell-712, TrustZone True Random Number Generator
- Peripheral Controllers: PL011 UART, PL022 SPI, PL031 RTC
- Debug & Trace: CoreSight SoC-400, CoreSight SDC-600, CoreSight STM-500, CoreSight System Trace Macrocell, CoreSight Trace Memory Controller
- Design Kits: Corstone-101, Corstone-201
- Physical IP: Artisan PIK for Cortex-M33 TSMC 22ULL including memory compilers, logic libraries, GPIOs and documentation
- Tools & Materials: Socrates IP Tooling, ARM Design Studio, Virtual System Models
- Support: Standard ARM Technical support, ARM online training, maintenance updates, credits toward onsite training and design reviews
Cores
| Architecture | Core bit-width | Cores designed by Arm Ltd. | Cores designed by third parties | Profile |
|---|---|---|---|---|
| ARMv1 | 32[a] | ARM1 | | Classic |
| ARMv2 | 32[a] | ARM2, ARM250, ARM3 | Amber, STORM Open Soft Core[67] | Classic |
| ARMv3 | 32 | ARM6, ARM7 | | Classic |
| ARMv4 | 32 | ARM8 | StrongARM, FA526, ZAP Open Source Processor Core | Classic |
| ARMv4T | 32 | ARM7TDMI, ARM9TDMI, SecurCore SC100 | | Classic |
| ARMv5TE | 32 | ARM7EJ, ARM9E, ARM10E | XScale, FA626TE, Feroceon, PJ1/Mohawk | Classic |
| ARMv6 | 32 | ARM11 | | Classic |
| ARMv6-M | 32 | ARM Cortex-M0, ARM Cortex-M0+, ARM Cortex-M1, SecurCore SC000 | | Microcontroller |
| ARMv7-M | 32 | ARM Cortex-M3, SecurCore SC300 | Apple M7 motion coprocessor | Microcontroller |
| ARMv7E-M | 32 | ARM Cortex-M4, ARM Cortex-M7 | | Microcontroller |
| ARMv8-M | 32 | ARM Cortex-M23,[69] ARM Cortex-M33[70] | | Microcontroller |
| ARMv8.1-M | 32 | ARM Cortex-M55, ARM Cortex-M85 | | Microcontroller |
| ARMv7-R | 32 | ARM Cortex-R4, ARM Cortex-R5, ARM Cortex-R7, ARM Cortex-R8 | | Real-time |
| ARMv8-R | 32 | ARM Cortex-R52 | | Real-time |
| ARMv8-R | 64 | ARM Cortex-R82 | | Real-time |
| ARMv7-A | 32 | ARM Cortex-A5, ARM Cortex-A7, ARM Cortex-A8, ARM Cortex-A9, ARM Cortex-A12, ARM Cortex-A15, ARM Cortex-A17 | Qualcomm Scorpion/Krait, PJ4/Sheeva, Apple Swift (A6, A6X) | Application |
| ARMv8-A | 32 | ARM Cortex-A32[76] | | Application |
| ARMv8-A | 64/32 | ARM Cortex-A35,[77] ARM Cortex-A53, ARM Cortex-A57,[78] ARM Cortex-A72,[79] ARM Cortex-A73[80] | X-Gene, Nvidia Denver 1/2, Cavium ThunderX, AMD K12, Apple Cyclone (A7)/Typhoon (A8, A8X)/Twister (A9, A9X)/Hurricane+Zephyr (A10, A10X), Qualcomm Kryo, Samsung M1/M2 ("Mongoose")/M3 ("Meerkat") | Application |
| ARMv8-A | 64 | ARM Cortex-A34[86] | | Application |
| ARMv8.1-A | 64/32 | | Cavium ThunderX2 | Application |
| ARMv8.2-A | 64/32 | ARM Cortex-A55,[88] ARM Cortex-A75,[89] ARM Cortex-A76,[90] ARM Cortex-A77, ARM Cortex-A78, ARM Cortex-X1, ARM Neoverse N1 | Nvidia Carmel, Samsung M4 ("Cheetah"), Fujitsu A64FX (ARMv8 SVE 512-bit) | Application |
| ARMv8.2-A | 64 | ARM Cortex-A65, ARM Neoverse E1 with simultaneous multithreading (SMT), ARM Cortex-A65AE[94] (also having e.g. ARMv8.4 Dot Product; made for safety-critical tasks such as advanced driver-assistance systems (ADAS)) | Apple Monsoon+Mistral (A11) (September 2017) | Application |
| ARMv8.3-A | 64/32 | | | Application |
| ARMv8.3-A | 64 | | Apple Vortex+Tempest (A12, A12X, A12Z), Marvell ThunderX3 (v8.3+)[95] | Application |
| ARMv8.4-A | 64/32 | | | Application |
| ARMv8.4-A | 64 | ARM Neoverse V1 | Apple Lightning+Thunder (A13), Apple Firestorm+Icestorm (A14, M1) | Application |
| ARMv8.5-A | 64/32 | | | Application |
| ARMv8.5-A | 64 | | | Application |
| ARMv8.6-A | 64 | | Apple Avalanche+Blizzard (A15, M2), Apple Everest+Sawtooth (A16),[96] Apple Coll (A17), Apple Ibiza/Lobos/Palma (M3) | Application |
| ARMv8.7-A | 64 | | | Application |
| ARMv8.8-A | 64 | | | Application |
| ARMv8.9-A | 64 | | | Application |
| ARMv9.0-A | 64 | ARM Cortex-A510, ARM Cortex-A710, ARM Cortex-A715, ARM Cortex-X2, ARM Cortex-X3, ARM Neoverse E2, ARM Neoverse N2, ARM Neoverse V2 | | Application |
| ARMv9.1-A | 64 | | | Application |
| ARMv9.2-A | 64 | ARM Cortex-A520, ARM Cortex-A720, ARM Cortex-X4, ARM Neoverse V3,[100] ARM Cortex-X925,[101] ARM Cortex-A320[102] | Apple Donan/BravaChop/Brava (Apple M4),[103] Apple Tupai/Tahiti (A18) | Application |
| ARMv9.3-A | 64 | TBA | | Application |
| ARMv9.4-A | 64 | TBA | | Application |
| ARMv9.5-A | 64 | TBA | | Application |
| ARMv9.6-A | 64 | TBA | | Application |
- ^ a b Although most datapaths and CPU registers in the early ARM processors were 32-bit, addressable memory was limited to 26 bits; the upper bits of the program counter register were therefore used for status flags.
- ^ a b c ARMv3 included a compatibility mode to support the 26-bit addresses of earlier versions of the architecture. This compatibility mode was optional in ARMv4 and removed entirely in ARMv5.
Arm provides a list of vendors who implement ARM cores in their design (application specific standard products (ASSP), microprocessor and microcontrollers).[108]
Example applications of ARM cores
ARM cores are used in a number of products, particularly PDAs and smartphones. Some computing examples are Microsoft's first-generation Surface, Surface 2, and Pocket PC devices (from 2002), Apple's iPads, Asus's Eee Pad Transformer tablet computers, and several Chromebook laptops. Others include Apple's iPhone smartphones and iPod portable media players, Canon PowerShot digital cameras, the Nintendo Switch hybrid console, the Wii security processor, the 3DS handheld game console, and TomTom turn-by-turn navigation systems.
In 2005, Arm took part in the development of Manchester University's computer SpiNNaker, which used ARM cores to simulate the human brain.[109]
ARM chips are also used in Raspberry Pi, BeagleBoard, BeagleBone, PandaBoard, and other single-board computers, because they are very small, inexpensive, and consume very little power.
32-bit architecture
The 32-bit ARM architecture (ARM32), such as ARMv7-A (implementing AArch32; see the section on Armv8-A for details), was the most widely used architecture in mobile devices as of 2011.[58]
Since 1995, various versions of the ARM Architecture Reference Manual (see § External links) have been the primary source of documentation on the ARM processor architecture and instruction set, distinguishing interfaces that all ARM processors are required to support (such as instruction semantics) from implementation details that may vary. The architecture has evolved over time, and version seven of the architecture, ARMv7, defines three architecture "profiles":
- A-profile, the "Application" profile, implemented by 32-bit cores in the Cortex-A series and by some non-ARM cores
- R-profile, the "Real-time" profile, implemented by cores in the Cortex-R series
- M-profile, the "Microcontroller" profile, implemented by most cores in the Cortex-M series
Although the architecture profiles were first defined for ARMv7, ARM subsequently defined the ARMv6-M architecture (used by the Cortex M0/M0+/M1) as a subset of the ARMv7-M profile with fewer instructions.
Architecture versions
- ARMv1: 26-bit addressing; obsolete as of June 2000[110]
- ARMv2: multiply and multiply-accumulate instructions; coprocessor support; obsolete as of June 2000[110]
- ARMv2a: atomic load-and-store instructions; obsolete as of June 2000[110]
- ARMv3: 32-bit addressing;[110] obsolete as of July 2005[111]
- ARMv4: halfword load and store instructions; sign-extending byte and halfword load instructions; 26-bit addressing support removed[110]
- ARMv5: count leading zeros instruction;[110] obsolete as of July 2005[111]
- ARMv5T: ARMv5 plus version 2 of Thumb[110]
- ARMv5TE: ARMv5T plus enhanced DSP instructions[110]
- ARMv5TExP: ARMv5TE, but without the LDRD, MCRR, MRRC, PLD, and STRD enhanced DSP instructions[110]
- ARMv6: full ARMv5TEJ; byte reversal instructions; exclusive-access load and store instructions; byte and halfword sign-extend and zero-extend instructions; SIMD media instructions; unaligned access support[111]
- ARMv6K: ARMv6 plus instructions to support multiprocessor systems[112]
- ARMv7-A, ARMv7-R: optional signed and unsigned divide; memory and synchronization barrier instructions; preload hint instruction[112]
- ARMv7-M: Thumb-2 only[113]
- ARMv8: introduces two execution states, AArch32 and AArch64; the former supports the 32-bit ARM instruction set, called A32, and the Thumb-2 instruction set, called T32, while the latter supports a new instruction set with 32 64-bit registers, called A64
- ARMv8-A AArch32, ARMv8-R AArch32: load-acquire and store-release instructions, crypto instructions, data barrier instruction extensions, Send Event Locally instruction[114]
- ARMv8-M: variant of Thumb-2 only[115]
CPU modes
Except in the M-profile, the 32-bit ARM architecture specifies several CPU modes, depending on the implemented architecture features. At any moment in time, the CPU can be in only one mode, but it can switch modes due to external events (interrupts) or programmatically.[116]
- User mode: The only non-privileged mode.
- FIQ mode: A privileged mode that is entered whenever the processor accepts a fast interrupt request.
- IRQ mode: A privileged mode that is entered whenever the processor accepts an interrupt.
- Supervisor (svc) mode: A privileged mode entered whenever the CPU is reset or when an SVC instruction is executed.
- Abort mode: A privileged mode that is entered whenever a prefetch abort or data abort exception occurs.
- Undefined mode: A privileged mode that is entered whenever an undefined instruction exception occurs.
- System mode (ARMv4 and above): The only privileged mode that is not entered by an exception. It can only be entered by executing an instruction that explicitly writes to the mode bits of the Current Program Status Register (CPSR) from another privileged mode (not from user mode).
- Monitor mode (ARMv6 and ARMv7 Security Extensions, ARMv8 EL3): A monitor mode is introduced to support TrustZone extension in ARM cores.
- Hyp mode (ARMv7 Virtualization Extensions, ARMv8 EL2): A hypervisor mode that supports Popek and Goldberg virtualization requirements for the non-secure operation of the CPU.[117][118]
- Thread mode (ARMv6-M, ARMv7-M, ARMv8-M): A mode that can be specified as either privileged or unprivileged. Whether the Main Stack Pointer (MSP) or Process Stack Pointer (PSP) is used can also be specified in the CONTROL register with privileged access. This mode is designed for user tasks in an RTOS environment, but it is typically used in bare-metal systems for the main super-loop.
- Handler mode (ARMv6-M, ARMv7-M, ARMv8-M): A mode dedicated to exception handling (except reset, which is handled in Thread mode). Handler mode always uses the MSP and runs at privileged level.
Instruction set
The original (and subsequent) ARM implementation was hardwired without microcode, like the much simpler 8-bit 6502 processor used in prior Acorn microcomputers.
The 32-bit ARM architecture (and the 64-bit architecture for the most part) includes the following RISC features:
- Load–store architecture.
- No support for unaligned memory accesses in the original version of the architecture. ARMv6 and later, except some microcontroller versions, support unaligned accesses for half-word and single-word load/store instructions with some limitations, such as no guaranteed atomicity.[119][120]
- Uniform 16 × 32-bit register file (including the program counter, stack pointer and the link register).
- Fixed instruction width of 32 bits to ease decoding and pipelining, at the cost of decreased code density. Later, the Thumb instruction set added 16-bit instructions and increased code density.
- Mostly single clock-cycle execution.
To compensate for the simpler design, compared with processors like the Intel 80286 and Motorola 68020, some additional design features were used:
- Conditional execution of most instructions reduces branch overhead and compensates for the lack of a branch predictor in early chips.
- Arithmetic instructions alter condition codes only when desired.
- 32-bit barrel shifter can be used without performance penalty with most arithmetic instructions and address calculations.
- Has powerful indexed addressing modes.
- A link register supports fast leaf function calls.
- A simple, but fast, 2-priority-level interrupt subsystem with switched register banks.
Arithmetic instructions
ARM includes integer arithmetic operations for add, subtract, and multiply; some versions of the architecture also support divide operations.
ARM supports 32-bit × 32-bit multiplies with either a 32-bit result or 64-bit result, though Cortex-M0 / M0+ / M1 cores do not support 64-bit results.[121] Some ARM cores also support 16-bit × 16-bit and 32-bit × 16-bit multiplies.
The divide instructions are only included in the following ARM architectures:
- Armv7-M and Armv7E-M architectures always include divide instructions.[122]
- Armv7-R architecture always includes divide instructions in the Thumb instruction set, but optionally in its 32-bit instruction set.[123]
- Armv7-A architecture optionally includes the divide instructions. The instructions might not be implemented, or implemented only in the Thumb instruction set, or implemented in both the Thumb and ARM instruction sets, or implemented if the Virtualization Extensions are included.[123]
Registers
| usr | sys | svc | abt | und | irq | fiq |
|---|---|---|---|---|---|---|
| R0 | ||||||
| R1 | ||||||
| R2 | ||||||
| R3 | ||||||
| R4 | ||||||
| R5 | ||||||
| R6 | ||||||
| R7 | ||||||
| R8 | R8_fiq | |||||
| R9 | R9_fiq | |||||
| R10 | R10_fiq | |||||
| R11 | R11_fiq | |||||
| R12 | R12_fiq | |||||
| R13 | R13_svc | R13_abt | R13_und | R13_irq | R13_fiq | |
| R14 | R14_svc | R14_abt | R14_und | R14_irq | R14_fiq | |
| R15 | ||||||
| CPSR | ||||||
| SPSR_svc | SPSR_abt | SPSR_und | SPSR_irq | SPSR_fiq | ||
Registers R0 through R7 are the same across all CPU modes; they are never banked.
Registers R8 through R12 are the same across all CPU modes except FIQ mode. FIQ mode has its own distinct R8 through R12 registers.
R13 and R14 are banked across all privileged CPU modes except system mode. That is, each mode that can be entered because of an exception has its own R13 and R14. These registers generally contain the stack pointer and the return address from function calls, respectively.
Aliases:
- R13 is also referred to as SP, the stack pointer.
- R14 is also referred to as LR, the link register.
- R15 is also referred to as PC, the program counter.
The Current Program Status Register (CPSR) has the following 32 bits.[124]
- M (bits 0–4) is the processor mode bits.
- T (bit 5) is the Thumb state bit.
- F (bit 6) is the FIQ disable bit.
- I (bit 7) is the IRQ disable bit.
- A (bit 8) is the imprecise data abort disable bit.
- E (bit 9) is the data endianness bit.
- IT (bits 10–15 and 25–26) is the if-then state bits.
- GE (bits 16–19) is the greater-than-or-equal-to bits.
- DNM (bits 20–23) is the do not modify bits.
- J (bit 24) is the Java state bit.
- Q (bit 27) is the sticky overflow bit.
- V (bit 28) is the overflow bit.
- C (bit 29) is the carry/borrow/extend bit.
- Z (bit 30) is the zero bit.
- N (bit 31) is the negative/less than bit.
Conditional execution
Almost every ARM instruction has a conditional execution feature called predication, which is implemented with a 4-bit condition code selector (the predicate). To allow for unconditional execution, one of the four-bit codes causes the instruction to be always executed. Most other CPU architectures only have condition codes on branch instructions.[125]
Though the predicate takes up four of the 32 bits in an instruction code, and thus cuts down significantly on the encoding bits available for displacements in memory access instructions, it avoids branch instructions when generating code for small if statements. Apart from eliminating the branch instructions themselves, this preserves the fetch/decode/execute pipeline at the cost of only one cycle per skipped instruction.
An algorithm that provides a good example of conditional execution is the subtraction-based Euclidean algorithm for computing the greatest common divisor. In the C programming language, the algorithm can be written as:
int gcd(int a, int b) {
    while (a != b)  // We enter the loop when a < b or a > b, but not when a == b
        if (a > b)  // When a > b we do this
            a -= b;
        else        // When a < b we do that (no "if (a < b)" needed since a != b is checked in while condition)
            b -= a;
    return a;
}
The same algorithm can be rewritten in a way closer to target ARM instructions as:
loop:
// Compare a and b
GT = a > b;
LT = a < b;
NE = a != b;
// Perform operations based on flag results
if (GT) a -= b; // Subtract *only* if greater-than
if (LT) b -= a; // Subtract *only* if less-than
if (NE) goto loop; // Loop *only* if compared values were not equal
return a;
and coded in assembly language as:
; assign a to register r0, b to r1
loop: CMP r0, r1 ; set condition "NE" if (a ≠ b),
; "GT" if (a > b),
; or "LT" if (a < b)
SUBGT r0, r0, r1 ; if "GT" (Greater Than), then a = a − b
SUBLT r1, r1, r0 ; if "LT" (Less Than), then b = b − a
BNE loop ; if "NE" (Not Equal), then loop
BX lr ; return
which avoids the branches around the then and else clauses. If r0 and r1 are equal then neither of the SUB instructions is executed, so no conditional branch is needed to implement the while check at the top of the loop, as would have been required had, for example, SUBLE (subtract if less than or equal) been used.
One of the ways that Thumb code provides a more dense encoding is to remove the four-bit selector from non-branch instructions.
Other features
Another feature of the instruction set is the ability to fold shifts and rotates into the data processing (arithmetic, logical, and register-register move) instructions, so that, for example, the statement in C language:
a += (j << 2);
could be rendered as a one-word, one-cycle instruction:[126]
ADD Ra, Ra, Rj, LSL #2
This results in the typical ARM program being denser than expected with fewer memory accesses; thus the pipeline is used more efficiently.
The ARM processor also has features rarely seen in other RISC architectures, such as PC-relative addressing (indeed, on the 32-bit[1] ARM the PC is one of its 16 registers) and pre- and post-increment addressing modes.
The ARM instruction set has increased over time. Some early ARM processors (before ARM7TDMI), for example, have no instruction to store a two-byte quantity.
Pipelines and other implementation issues
The ARM7 and earlier implementations have a three-stage pipeline; the stages being fetch, decode, and execute. Higher-performance designs, such as the ARM9, have deeper pipelines: Cortex-A8 has thirteen stages. Additional implementation changes for higher performance include a faster adder and more extensive branch prediction logic. The difference between the ARM7DI and ARM7DMI cores, for example, was an improved multiplier; hence the added "M".
Coprocessors
The ARM architecture (pre-Armv8) provides a non-intrusive way of extending the instruction set using "coprocessors" that can be addressed using MCR, MRC, MRRC, MCRR, and similar instructions. The coprocessor space is divided logically into 16 coprocessors with numbers from 0 to 15, coprocessor 15 (cp15) being reserved for some typical control functions like managing the caches and MMU operation on processors that have one.
In ARM-based machines, peripheral devices are usually attached to the processor by mapping their physical registers into ARM memory space, into the coprocessor space, or by connecting to another device (a bus) that in turn attaches to the processor. Coprocessor accesses have lower latency, so some peripherals—for example, an XScale interrupt controller—are accessible in both ways: through memory and through coprocessors.
In other cases, chip designers only integrate hardware using the coprocessor mechanism. For example, an image processing engine might be a small ARM7TDMI core combined with a coprocessor that has specialised operations to support a specific set of HDTV transcoding primitives.
Debugging
All modern ARM processors include hardware debugging facilities, allowing software debuggers to perform operations such as halting, stepping, and breakpointing of code starting from reset. These facilities are built using JTAG support, though some newer cores optionally support ARM's own two-wire "SWD" protocol. In ARM7TDMI cores, the "D" represented JTAG debug support, and the "I" represented presence of an "EmbeddedICE" debug module. For ARM7 and ARM9 core generations, EmbeddedICE over JTAG was a de facto debug standard, though not architecturally guaranteed.
The ARMv7 architecture defines basic debug facilities at an architectural level. These include breakpoints, watchpoints and instruction execution in a "Debug Mode"; similar facilities were also available with EmbeddedICE. Both "halt mode" and "monitor" mode debugging are supported. The actual transport mechanism used to access the debug facilities is not architecturally specified, but implementations generally include JTAG support.
There is a separate ARM "CoreSight" debug architecture, which is not architecturally required by ARMv7 processors.
Debug Access Port
The Debug Access Port (DAP) is an implementation of an ARM Debug Interface.[127] There are two different supported implementations, the Serial Wire JTAG Debug Port (SWJ-DP) and the Serial Wire Debug Port (SW-DP).[128] CMSIS-DAP is a standard interface that describes how various debugging software on a host PC can communicate over USB to firmware running on a hardware debugger, which in turn talks over SWD or JTAG to a CoreSight-enabled ARM Cortex CPU.[129][130][131]
DSP enhancement instructions
To improve the ARM architecture for digital signal processing and multimedia applications, DSP instructions were added to the instruction set.[132] These are signified by an "E" in the name of the ARMv5TE and ARMv5TEJ architectures. E-variants also imply T, D, M, and I.
The new instructions are common in digital signal processor (DSP) architectures. They include variations on signed multiply–accumulate, saturated add and subtract, and count leading zeros.
First introduced in 1999, this extension of the core instruction set contrasted with ARM's earlier DSP coprocessor known as Piccolo, which employed a distinct, incompatible instruction set whose execution involved a separate program counter.[133] Piccolo instructions employed a distinct register file of sixteen 32-bit registers, with some instructions combining registers for use as 48-bit accumulators and other instructions addressing 16-bit half-registers. Some instructions were able to operate on two such 16-bit values in parallel. Communication with the Piccolo register file involved load to Piccolo and store from Piccolo coprocessor instructions via two buffers of eight 32-bit entries. Described as reminiscent of other approaches, notably Hitachi's SH-DSP and Motorola's 68356, Piccolo did not employ dedicated local memory and relied on the bandwidth of the ARM core for DSP operand retrieval, impacting concurrent performance.[134] Piccolo's distinct instruction set also proved not to be a "good compiler target".[133]
SIMD extensions for multimedia
Introduced in the ARMv6 architecture, this was a precursor to Advanced SIMD, also named Neon.[135]
Jazelle
Jazelle DBX (Direct Bytecode eXecution) is a technique that allows Java bytecode to be executed directly in the ARM architecture as a third execution state (and instruction set) alongside the existing ARM and Thumb modes. Support for this state is signified by the "J" in the ARMv5TEJ architecture, and in ARM9EJ-S and ARM7EJ-S core names. Support for this state is required starting in ARMv6 (except for the ARMv7-M profile), though newer cores only include a trivial implementation that provides no hardware acceleration.
Thumb
To improve compiled code density, processors since the ARM7TDMI (released in 1994[136]) have featured the Thumb compressed instruction set, which has its own state. (The "T" in "TDMI" indicates the Thumb feature.) When in this state, the processor executes the Thumb instruction set, a compact 16-bit encoding for a subset of the ARM instruction set.[137] Most of the Thumb instructions are directly mapped to normal ARM instructions. The space saving comes from making some of the instruction operands implicit and limiting the number of possibilities compared with instructions executed in the ARM instruction set state.
In Thumb, the 16-bit opcodes have less functionality. For example, only branches can be conditional, and many opcodes are restricted to accessing only half of all of the CPU's general-purpose registers. The shorter opcodes give improved code density overall, even though some operations require extra instructions. In situations where the memory port or bus width is constrained to less than 32 bits, the shorter Thumb opcodes allow increased performance compared with 32-bit ARM code, as less program code may need to be loaded into the processor over the constrained memory bandwidth.
Unlike processor architectures with variable length (16- or 32-bit) instructions, such as the Cray-1 and Hitachi SuperH, the ARM and Thumb instruction sets exist independently of each other. Embedded hardware, such as the Game Boy Advance, typically has a small amount of RAM accessible with a full 32-bit datapath; the majority is accessed via a 16-bit or narrower secondary datapath. In this situation, it usually makes sense to compile Thumb code and hand-optimise a few of the most CPU-intensive sections using full 32-bit ARM instructions, placing these wider instructions into the 32-bit bus accessible memory.
The first processor with a Thumb instruction decoder was the ARM7TDMI. All processors supporting 32-bit instruction sets, starting with ARM9, and including XScale, have included a Thumb instruction decoder. It includes instructions adopted from the Hitachi SuperH (1992), which was licensed by ARM.[138] ARM's smallest processor families (Cortex M0 and M1) implement only the 16-bit Thumb instruction set for maximum performance in lowest cost applications. ARM processors that don't support 32-bit addressing also omit Thumb.
Thumb-2
Thumb-2 technology was introduced in the ARM1156 core, announced in 2003. Thumb-2 extends the limited 16-bit instruction set of Thumb with additional 32-bit instructions to give the instruction set more breadth, thus producing a variable-length instruction set. A stated aim for Thumb-2 was to achieve code density similar to Thumb with performance similar to the ARM instruction set on 32-bit memory.
Thumb-2 extends the Thumb instruction set with bit-field manipulation, table branches and conditional execution. At the same time, the ARM instruction set was extended to maintain equivalent functionality in both instruction sets. A new "Unified Assembly Language" (UAL) supports generation of either Thumb or ARM instructions from the same source code; versions of Thumb seen on ARMv7 processors are essentially as capable as ARM code (including the ability to write interrupt handlers). This requires a bit of care, and use of a new "IT" (if-then) instruction, which permits up to four successive instructions to execute based on a tested condition, or on its inverse. When compiling into ARM code, this is ignored, but when compiling into Thumb it generates an actual instruction. For example:
; if (r0 == r1)
CMP r0, r1
ITE EQ ; ARM: no code ... Thumb: IT instruction
; then r0 = r2;
MOVEQ r0, r2 ; ARM: conditional; Thumb: condition via ITE 'T' (then)
; else r0 = r3;
MOVNE r0, r3 ; ARM: conditional; Thumb: condition via ITE 'E' (else)
; recall that the Thumb MOV instruction has no bits to encode "EQ" or "NE".
All ARMv7 chips support the Thumb instruction set. All chips in the Cortex-A series that support ARMv7, all Cortex-R series, and all ARM11 series support both "ARM instruction set state" and "Thumb instruction set state", while chips in the Cortex-M series support only the Thumb instruction set.[139][140][141]
Thumb Execution Environment (ThumbEE)
ThumbEE (erroneously called Thumb-2EE in some ARM documentation), which was marketed as Jazelle RCT[142] (Runtime Compilation Target), was announced in 2005 and deprecated in 2011. It first appeared in the Cortex-A8 processor. ThumbEE is a fourth instruction set state, making small changes to the Thumb-2 extended instruction set. These changes make the instruction set particularly suited to code generated at runtime (e.g. by JIT compilation) in managed Execution Environments. ThumbEE is a target for languages such as Java, C#, Perl, and Python, and allows JIT compilers to output smaller compiled code without reducing performance.[citation needed]
New features provided by ThumbEE include automatic null pointer checks on every load and store instruction, an instruction to perform an array bounds check, and special instructions that call a handler. In addition, because it utilises Thumb-2 technology, ThumbEE provides access to registers r8–r15 (where the Jazelle/DBX Java VM state is held).[143] Handlers are small sections of frequently called code, commonly used to implement high level languages, such as allocating memory for a new object. These changes come from repurposing a handful of opcodes, and knowing the core is in the new ThumbEE state.
On 23 November 2011, Arm deprecated any use of the ThumbEE instruction set,[144] and Armv8 removes support for ThumbEE.
Floating-point (VFP)
VFP (Vector Floating Point) technology is a floating-point unit (FPU) coprocessor extension to the ARM architecture[145] (implemented differently in Armv8 – coprocessors not defined there). It provides low-cost single-precision and double-precision floating-point computation fully compliant with the ANSI/IEEE Std 754-1985 Standard for Binary Floating-Point Arithmetic. VFP provides floating-point computation suitable for a wide spectrum of applications such as PDAs, smartphones, voice compression and decompression, three-dimensional graphics and digital audio, printers, set-top boxes, and automotive applications. The VFP architecture was intended to support execution of short "vector mode" instructions but these operated on each vector element sequentially and thus did not offer the performance of true single instruction, multiple data (SIMD) vector parallelism. This vector mode was therefore removed shortly after its introduction,[146] to be replaced with the much more powerful Advanced SIMD, also named Neon.
Some devices such as the ARM Cortex-A8 have a cut-down VFPLite module instead of a full VFP module, and require roughly ten times more clock cycles per float operation.[147] Pre-Armv8 architecture implemented floating-point/SIMD with the coprocessor interface. Other floating-point and/or SIMD units found in ARM-based processors using the coprocessor interface include FPA, FPE, iwMMXt, some of which were implemented in software by trapping but could have been implemented in hardware. They provide some of the same functionality as VFP but are not opcode-compatible with it. FPA10 also provides extended precision, but implements correct rounding (required by IEEE 754) only in single precision.[148]
- VFPv1
- Obsolete
- VFPv2
- An optional extension to the ARM instruction set in the ARMv5TE, ARMv5TEJ and ARMv6 architectures. VFPv2 has 16 64-bit FPU registers.
- VFPv3 or VFPv3-D32
- Implemented on most Cortex-A8 and A9 ARMv7 processors. It is backward-compatible with VFPv2, except that it cannot trap floating-point exceptions. VFPv3 has 32 64-bit FPU registers as standard, adds VCVT instructions to convert between scalar, float and double, adds immediate mode to VMOV such that constants can be loaded into FPU registers.
- VFPv3-D16
- As above, but with only 16 64-bit FPU registers. Implemented on Cortex-R4 and R5 processors and the Tegra 2 (Cortex-A9).
- VFPv3-F16
- Uncommon; it supports IEEE754-2008 half-precision (16-bit) floating point as a storage format.
- VFPv4 or VFPv4-D32
- Implemented on Cortex-A12 and A15 ARMv7 processors, Cortex-A7 optionally has VFPv4-D32 in the case of an FPU with Neon.[149] VFPv4 has 32 64-bit FPU registers as standard, adds both half-precision support as a storage format and fused multiply-accumulate instructions to the features of VFPv3.
- VFPv4-D16
- As above, but it has only 16 64-bit FPU registers. Implemented on Cortex-A5 and A7 processors in the case of an FPU without Neon.[149]
- VFPv5-D16-M
- Implemented on Cortex-M7 when single and double-precision floating-point core option exists.
In Debian Linux and derivatives such as Ubuntu and Linux Mint, armhf (ARM hard float) refers to the ARMv7 architecture including the additional VFP3-D16 floating-point hardware extension (and Thumb-2) above. Software packages and cross-compiler tools use the armhf vs. arm/armel suffixes to differentiate.[150]
Advanced SIMD (Neon)
The Advanced SIMD extension (also known as Neon or "MPE" Media Processing Engine) is a combined 64- and 128-bit SIMD instruction set that provides standardised acceleration for media and signal processing applications. Neon is included in all Cortex-A8 devices, but is optional in Cortex-A9 devices.[151] Neon can execute MP3 audio decoding on CPUs running at 10 MHz, and can run the GSM adaptive multi-rate (AMR) speech codec at 13 MHz. It features a comprehensive instruction set, separate register files, and independent execution hardware.[152] Neon supports 8-, 16-, 32-, and 64-bit integer and single-precision (32-bit) floating-point data and SIMD operations for handling audio and video processing as well as graphics and gaming processing. In Neon, the SIMD supports up to 16 operations at the same time. The Neon hardware shares the same floating-point registers as used in VFP. Devices such as the ARM Cortex-A8 and Cortex-A9 support 128-bit vectors, but will execute with 64 bits at a time,[147] whereas some more powerful CPUs such as Cortex-A15 can execute 128 bits at a time.[153][154]
A quirk of Neon in Armv7 devices is that it flushes all subnormal numbers to zero, and as a result the GCC compiler will not use it unless -funsafe-math-optimizations, which allows losing denormals, is turned on. "Enhanced" Neon defined since Armv8 does not have this quirk, but as of GCC 8.2 the same flag is still required to enable Neon instructions.[155] On the other hand, GCC does consider Neon safe on AArch64 for Armv8.
Project Ne10 is ARM's first project to be open source from its inception (an older, acquired project became Mbed TLS). The Ne10 library is a set of common, useful functions written in both Neon and C (for compatibility). The library was created to allow developers to use Neon optimisations without learning Neon, but it also serves as a set of highly optimised Neon intrinsic and assembly code examples for common DSP, arithmetic, and image processing routines. The source code is available on GitHub.[156]
ARM Helium technology
Helium is the M-Profile Vector Extension (MVE). It adds more than 150 scalar and vector instructions.[157]
Security extensions
[edit]TrustZone (for Cortex-A profile)
The Security Extensions, marketed as TrustZone Technology, are found in ARMv6KZ and later application profile architectures. They provide a low-cost alternative to adding another dedicated security core to an SoC, by providing two virtual processors backed by hardware-based access control. This lets the application core switch between two states, referred to as worlds (to reduce confusion with other names for capability domains), to prevent information leaking from the more trusted world (the Secure world) to the less trusted world (the Normal world).[158] This world switch is generally orthogonal to all other capabilities of the processor, thus each world can operate independently of the other while using the same core. Memory and peripherals are then made aware of the operating world of the core and may use this to provide access control to secrets and code on the device.[159]
Typically, a rich operating system is run in the less trusted world, with smaller security-specialised code in the more trusted world, aiming to reduce the attack surface. Typical applications include DRM functionality for controlling the use of media on ARM-based devices,[160] and preventing any unapproved use of the device.
In practice, since the specific implementation details of proprietary TrustZone implementations have not been publicly disclosed for review, it is unclear what level of assurance is provided for a given threat model, but they are not immune from attack.[161][162]
Open Virtualization[163] is an open source implementation of the trusted world architecture for TrustZone.
AMD has licensed and incorporated TrustZone technology into its Secure Processor Technology.[164] AMD's APUs include a Cortex-A5 processor for handling secure processing, which is enabled in some, but not all products.[165][166][167] In fact, the Cortex-A5 TrustZone core had been included in earlier AMD products, but was not enabled due to time constraints.[166]
Samsung Knox uses TrustZone for purposes such as detecting modifications to the kernel, storing certificates, and attesting keys.[168]
TrustZone for Armv8-M (for Cortex-M profile)
The Security Extension, marketed as TrustZone for Armv8-M Technology, was introduced in the Armv8-M architecture. While containing similar concepts to TrustZone for Armv8-A, it has a different architectural design, as world switching is performed using branch instructions instead of using exceptions.[169] It also supports safe interleaved interrupt handling from either world regardless of the current security state. Together these features provide low latency calls to the secure world and responsive interrupt handling. ARM provides a reference stack of secure world code in the form of Trusted Firmware for M and PSA Certified.
No-execute page protection
As of ARMv6, the ARM architecture supports no-execute page protection, which is referred to as XN, for eXecute Never.[170]
Large Physical Address Extension (LPAE)
The Large Physical Address Extension (LPAE), which extends the physical address size from 32 bits to 40 bits, was added to the Armv7-A architecture in 2011.[171]
The physical address size may be even larger in processors based on the 64-bit (Armv8-A) architecture. For example, it is 44 bits in Cortex-A75 and Cortex-A65AE.[172]
Armv8-R and Armv8-M
The Armv8-R and Armv8-M architectures, announced after the Armv8-A architecture, share some features with Armv8-A. However, Armv8-M does not include any 64-bit AArch64 instructions, and Armv8-R originally did not include any AArch64 instructions; those instructions were added to Armv8-R later.
Armv8.1-M
The Armv8.1-M architecture, announced in February 2019, is an enhancement of the Armv8-M architecture. It brings new features including:
- A new vector instruction set extension. The M-Profile Vector Extension (MVE), or Helium, is for signal processing and machine learning applications.
- Additional instruction set enhancements for loops and branches (Low Overhead Branch Extension).
- Instructions for half-precision floating-point support.
- Instruction set enhancement for TrustZone management for Floating Point Unit (FPU).
- New memory attribute in the Memory Protection Unit (MPU).
- Enhancements in debug including Performance Monitoring Unit (PMU), Unprivileged Debug Extension, and additional debug support focus on signal processing application developments.
- Reliability, Availability and Serviceability (RAS) extension.
64/32-bit architecture
Armv8
Armv8-A
Announced in October 2011,[13] Armv8-A (often called simply ARMv8, although the Armv8-R profile also exists) represents a fundamental change to the ARM architecture. It supports two Execution states: a 64-bit state named AArch64 and a 32-bit state named AArch32. In the AArch64 state, a new 64-bit A64 instruction set is supported; in the AArch32 state, two instruction sets are supported: the original 32-bit instruction set, named A32, and the 32-bit Thumb-2 instruction set, named T32. AArch32 provides user-space compatibility with Armv7-A. The processor state can change on an Exception level change; this allows 32-bit applications to be executed in AArch32 state under a 64-bit OS whose kernel executes in AArch64 state, and allows a 32-bit OS to run in AArch32 state under the control of a 64-bit hypervisor running in AArch64 state.[1] ARM announced their Cortex-A53 and Cortex-A57 cores on 30 October 2012.[78] Apple was the first to release an Armv8-A compatible core in a consumer product (Apple A7 in iPhone 5S). AppliedMicro, using an FPGA, was the first to demo Armv8-A.[173] The first Armv8-A SoC from Samsung is the Exynos 5433 used in the Galaxy Note 4, which features two clusters of four Cortex-A57 and Cortex-A53 cores in a big.LITTLE configuration; however, it runs only in AArch32 mode.[174]
To both AArch32 and AArch64, Armv8-A makes VFPv3/v4 and Advanced SIMD (Neon) standard. It also adds cryptography instructions supporting AES, SHA-1/SHA-256 and finite field arithmetic.[175] AArch64 was introduced in Armv8-A and is included in its subsequent revisions. AArch64 is not included in the Armv8-M architecture, and was only added to Armv8-R as a later option.
An ARMv8-A processor can support one or both of AArch32 and AArch64; it may support AArch32 and AArch64 at lower Exception levels and only AArch64 at higher Exception levels.[176] For example, the ARM Cortex-A32 supports only AArch32,[177] the ARM Cortex-A34 supports only AArch64,[178] and the ARM Cortex-A72 supports both AArch64 and AArch32.[179] An ARMv9-A processor must support AArch64 at all Exception levels, and may support AArch32 at EL0.[176]
Armv8-R
Optional AArch64 support was added to the Armv8-R profile, with the first ARM core implementing it being the Cortex-R82.[180] It adds the A64 instruction set.
Armv9
Armv9-A
Announced in March 2021, the updated architecture places a focus on secure execution and compartmentalisation.[181][182] The first ARMv9-A processors were released later that year, including the Cortex-A510, Cortex-A710 and Cortex-X2.
Arm SystemReady
Arm SystemReady is a compliance program that helps ensure the interoperability of an operating system on Arm-based hardware from datacenter servers to industrial edge and IoT devices. The key building blocks of the program are the specifications for minimum hardware and firmware requirements that the operating systems and hypervisors can rely upon. These specifications are:[183]
- Base System Architecture (BSA)[184] and the market segment specific supplements (e.g., Server BSA supplement)[185]
- Base Boot Requirements (BBR)[186] and Base Boot Security Requirements (BBSR)[187]
These specifications are co-developed by Arm and its partners in the System Architecture Advisory Committee (SystemArchAC).
The Architecture Compliance Suite (ACS) is a set of test tools that help verify compliance with these specifications. The Arm SystemReady Requirements Specification documents the requirements of the certifications.[188]
This program was introduced by Arm in 2020 at the first DevSummit event. Its predecessor Arm ServerReady was introduced in 2018 at the Arm TechCon event. This program currently includes two bands:
- SystemReady Band: this band focuses on operating system interoperability for Advanced Configuration and Power Interface (ACPI) environments, where generic operating systems can be installed on either new or old hardware without modification. This band is relevant for systems using Windows, Linux, VMware, and BSD environments.[189]
- SystemReady Devicetree Band: this band optimizes install and boot for embedded systems where devicetree is the preferred method of describing hardware, with a focus on forward compatibility. This applies to Linux distributions and BSD environments specifically.[190]
PSA Certified
PSA Certified, formerly named Platform Security Architecture, is an architecture-agnostic security framework and evaluation scheme. It is intended to help secure Internet of things (IoT) devices built on system-on-a-chip (SoC) processors.[191] It was introduced to increase security where a full trusted execution environment is too large or complex.[192]
The architecture was introduced by Arm in 2017 at the annual TechCon event.[192][193] Although the scheme is architecture agnostic, it was first implemented on Arm Cortex-M processor cores intended for microcontroller use. PSA Certified includes freely available threat models and security analyses that demonstrate the process for deciding on security features in common IoT products.[194] It also provides freely downloadable application programming interface (API) packages, architectural specifications, open-source firmware implementations, and related test suites.[195]
Following the development of the architecture security framework in 2017, the PSA Certified assurance scheme launched two years later at Embedded World in 2019.[196] PSA Certified offers a multi-level security evaluation scheme for chip vendors, OS providers and IoT device makers.[197] The Embedded World presentation introduced chip vendors to Level 1 Certification. A draft of Level 2 protection was presented at the same time.[198] Level 2 certification became a usable standard in February 2020.[199]
The certification was created by PSA Joint Stakeholders to enable a security-by-design approach for a diverse set of IoT products. PSA Certified specifications are implementation and architecture agnostic; as a result, they can be applied to any chip, software or device.[200][198] The certification also reduces industry fragmentation for IoT product manufacturers and developers.[201]
Operating system support
32-bit operating systems
Historical operating systems
The first 32-bit ARM-based personal computer, the Acorn Archimedes, was originally intended to run an ambitious operating system called ARX. The machines shipped with RISC OS, which was also used on later ARM-based systems from Acorn and other vendors. Some early Acorn machines were also able to run a Unix port called RISC iX. (Neither is to be confused with RISC/os, a contemporary Unix variant for the MIPS architecture.)
Embedded operating systems
The 32-bit ARM architecture is supported by a large number of embedded and real-time operating systems, including:
- A2
- Android
- ChibiOS/RT
- Deos
- DRYOS
- eCos
- embOS
- FreeBSD
- FreeRTOS
- INTEGRITY
- Linux
- Micro-Controller Operating Systems
- Mbed
- MINIX 3
- MQX
- Nucleus PLUS
- NuttX
- OKL4
- Operating System Embedded (OSE)
- OS-9[202]
- Pharos[203]
- Plan 9
- PikeOS[204]
- QNX
- RIOT
- RTEMS
- RTXC Quadros
- SCIOPTA[205]
- ThreadX
- TizenRT
- T-Kernel
- VxWorks
- Windows Embedded Compact
- Windows 10 IoT Core
- Zephyr
Mobile device operating systems
The 32-bit ARM architecture was formerly the primary hardware environment for most mobile device operating systems, such as the following; as of March 2024, many of these platforms, including Android and Apple iOS, have moved to the 64-bit ARM architecture:
Formerly, but now discontinued:
Desktop and server operating systems
The 32-bit ARM architecture is supported by RISC OS and by multiple Unix-like operating systems including:
64-bit operating systems
Embedded operating systems
Mobile device operating systems
- Android supports Armv8-A in Android Lollipop (5.0) and later.
- iOS supports Armv8-A in iOS 7 and later on 64-bit Apple SoCs. iOS 11 and later, and iPadOS, only support 64-bit ARM processors and applications.
- HarmonyOS NEXT was developed specifically for ARM processors, starting from its launch in 2024.
- Mobian
- PostmarketOS
- Arch Linux ARM
- Manjaro[212]
Desktop and server operating systems
- Support for Armv8-A was merged into the Linux kernel version 3.7 in late 2012.[213] Armv8-A is supported by a number of Linux distributions, such as:
- Support for Armv8-A was merged into FreeBSD in late 2014.[222]
- OpenBSD has Armv8 support as of 2023[update].[223]
- NetBSD has Armv8 support since early 2018.[224]
- Windows: Windows 10 runs 32-bit "x86 and 32-bit ARM applications",[225] as well as native ARM64 desktop apps;[226][227] Windows 11 runs native ARM64 apps and can also run x86 and x86-64 apps via emulation. Support for 64-bit ARM apps in the Microsoft Store has been available since November 2018.[228]
- macOS has ARM support since late 2020; the first release to support ARM is macOS Big Sur.[229] Rosetta 2 adds support for x86-64 applications but not virtualization of x86-64 computer platforms.[230]
Porting to 32- or 64-bit ARM operating systems
Windows applications recompiled for ARM and linked with Winelib, from the Wine project, can run on 32-bit or 64-bit ARM in Linux, FreeBSD, or other compatible operating systems.[231][232] x86 binaries, i.e. those not specially compiled for ARM, have been demonstrated on ARM using QEMU with Wine (on Linux and other systems),[citation needed] but they do not run at full speed or with the same capabilities as with Winelib.
Notes
- ^ Using 32-bit words, 4 Mbit/s corresponds to 1 MIPS.
- ^ Available references do not mention which design team this was, but given the timing and known history of designs of the era, it is likely this was the National Semiconductor team whose NS32016 suffered from a large number of bugs.
- ^ Matt Evans notes that it appears the faster versions were simply binned higher, and appear to have no underlying changes.[39]
See also
- Amber – an open-source ARM-compatible processor core
- AMULET – an asynchronous implementation of the ARM architecture
- Apple silicon
- ARM Accredited Engineer – certification program
- ARM big.LITTLE – ARM's heterogeneous computing architecture
- ARMulator – an instruction set simulator
- Comparison of ARM processors
- Meltdown (security vulnerability)[233]
- Reduced instruction set computer (RISC)
- RISC-V
- Spectre (security vulnerability)
- Unicore – a 32-register architecture based heavily on a 32-bit ARM
References
Citations
- ^ a b c d e f Grisenthwaite, Richard (2011). "ARMv8-A Technology Preview" (PDF). Archived from the original (PDF) on 11 November 2011. Retrieved 31 October 2011.
- ^ "6.1.2.1 VFP register usage conventions". Procedure Call Standard for the ARM Architecture. Arm Holdings. 6 October 2023. Retrieved 22 August 2024.
- ^ a b Wilson, Roger (2 November 1988). "Some facts about the Acorn RISC Machine". Newsgroup: comp.arch. Retrieved 25 May 2007.
- ^ a b Hachman, Mark (14 October 2002). "ARM Cores Climb into 3G Territory". ExtremeTech. Retrieved 24 May 2018.
- ^ Turley, Jim (18 December 2002). "The Two Percent Solution". Embedded. Retrieved 14 February 2023.
- ^ Cutress, Ian (22 June 2020). "New #1 Supercomputer: Fujitsu's Fugaku and A64FX take Arm to the Top with 415 PetaFLOPs". anandtech.com. Archived from the original on 12 June 2025. Retrieved 25 January 2021.
- ^ "Arm Partners Have Shipped 200 Billion Chips". Arm (Press release). Retrieved 3 November 2021.
- ^ "Enabling Mass IoT connectivity as ARM partners ship 100 billion chips". community.arm.com. 27 February 2017. Retrieved 8 April 2020.
the cumulative deployment of 100 billion chips, half of which shipped in the last four years. [..] why not a trillion or more? That is our target, seeing a trillion connected devices deployed over the next two decades.
- ^ "MCU Market on Migration Path to 32-bit and ARM-based Devices: 32-bit tops in sales; 16-bit leads in unit shipments". IC Insights. 25 April 2013. Retrieved 1 July 2014.
- ^ Turley, Jim (2002). "The Two Percent Solution". embedded.com. Archived from the original on 15 February 2023.
- ^ Prickett Morgan, Timothy (1 February 2011). "Arm Holdings eager for PC and server expansion". The Register.
- ^ McGuire-Balanza, Kerry (11 May 2010). "ARM from zero to billions in 25 short years". Arm Holdings. Retrieved 8 November 2012.
- ^ a b "ARM Discloses Technical Details of the Next Version of the ARM Architecture" (Press release). Arm Holdings. 27 October 2011. Archived from the original on 1 January 2019. Retrieved 20 September 2013.
- ^ "Announcing the ARM Neoverse N1 Platform". community.arm.com. 20 February 2019. Retrieved 8 April 2020.
- ^ Fairbairn, Douglas (31 January 2012). "Oral History of Sophie Wilson" (PDF). Archived (PDF) from the original on 3 March 2016. Retrieved 2 February 2016.
- ^ Smith, Tony (30 November 2011). "The BBC Micro turns 30". The Register Hardware. Archived from the original on 12 December 2011. Retrieved 12 December 2011.
- ^ Polsson, Ken. "Chronology of Microprocessors". Processortimeline.info. Archived from the original on 9 August 2018. Retrieved 27 September 2013.
- ^ Leedy, Glenn (April 1983). "The National Semiconductor NS16000 Microprocessor Family". Byte. pp. 53–66. Retrieved 22 August 2020.
- ^ Evans 2019, 6:00.
- ^ Manners, David (29 April 1998). "ARM's way". Electronics Weekly. Archived from the original on 29 July 2012. Retrieved 26 October 2012.
- ^ Evans 2019, 5:30.
- ^ a b Evans 2019, 7:45.
- ^ Evans 2019, 8:30.
- ^ Sophie Wilson at Alt Party 2009 (Part 3/8). Archived from the original on 11 December 2021.
- ^ Chisnall, David (23 August 2010). Understanding ARM Architectures. Retrieved 26 May 2013.
- ^ Bateman, Selby (22 September 1986). "Bill Mensch -- The Brains Behind The Brains". Retrieved 3 September 2025.
- ^ Evans 2019, 9:00.
- ^ Furber, Stephen B. (2000). ARM system-on-chip architecture. Boston: Addison-Wesley. ISBN 0-201-67519-6.
- ^ Evans 2019, 9:50.
- ^ Evans 2019, 23:30.
- ^ Evans 2019, 26:00.
- ^ "ARM Instruction Set design history with Sophie Wilson (Part 3)". 10 May 2015. Archived from the original on 11 December 2021. Retrieved 25 May 2020 – via YouTube.
- ^ "Oral History of Sophie Wilson – 2012 Computer History Museum Fellow" (PDF). Computer History Museum. 31 January 2012. Retrieved 25 May 2020.
- ^ Harker, T. (Summer 2009). "ARM gets serious about IP (Second in a two-part series) [Associated Editors' View]". IEEE Solid-State Circuits Magazine. 1 (3): 8–69. doi:10.1109/MSSC.2009.933674. ISSN 1943-0590. S2CID 36567166.
- ^ Evans 2019, 20:30.
- ^ "Chris's Acorns: Acorn OEM Products". chrisacorns.computinghistory.org.uk. Retrieved 24 April 2025.
- ^ "Acorn Computers Limited - Press Release - ARM Evaluation System Announcement" (PDF). Chris's Acorns Website, Archive hosted by The Cambridge Centre for Computing History, UK. 7 July 1986. Retrieved 13 October 2025.
- ^ "Chris's Acorns: Acorn A500 second processor". Chris's Acorns Website, Archive hosted by Cambridge Centre for Computing History, UK. Retrieved 13 October 2025.
- ^ Evans 2019, 22:00.
- ^ Evans 2019, 21:30.
- ^ Evans 2019, 22:0030.
- ^ "Chris's Acorns: Acorn A500 (prototype)". chrisacorns.computinghistory.org.uk. Retrieved 24 April 2025.
- ^ a b Evans 2019, 14:00.
- ^ "From one Arm to the next! ARM Processors and Architectures". Retrieved 31 May 2022.
- ^ Levy, Markus. "The History of The ARM Architecture: From Inception to IPO" (PDF). Archived from the original (PDF) on 18 July 2022. Retrieved 18 July 2022.
- ^ Introducing the Commodore Amiga 3000 (PDF). Commodore-Amiga, Inc. 1991.
- ^ "Computer MIPS and MFLOPS Speed Claims 1980 to 1996". www.roylongbottom.org.uk. Retrieved 17 June 2023.
- ^ Santanu Chattopadhyay (2010). Embedded System Design. PHI Learning Pvt. Ltd. p. 9. ISBN 978-81-203-4024-4.
- ^ Richard Murray. "32 bit operation".
- ^ "ARM Company Milestones". ARM. Archived from the original on 20 April 2015. Retrieved 8 April 2015.
- ^ Andrews, Jason (2005). "3 SoC Verification Topics for the ARM Architecture". Co-verification of hardware and software for ARM SoC design. Oxford, UK: Elsevier. pp. 69. ISBN 0-7506-7730-9.
ARM started as a branch of Acorn Computer in Cambridge, England, with the formation of a joint venture between Acorn, Apple and VLSI Technology. A team of twelve employees produced the design of the first ARM microprocessor between 1983 and 1985.
- ^ Weber, Jonathan (28 November 1990). "Apple to Join Acorn, VLSI in Chip-Making Venture". Los Angeles Times. Los Angeles. Retrieved 6 February 2012.
Apple has invested about $3 million (roughly 1.5 million pounds) for a 30% interest in the company, dubbed Advanced Risc Machines Ltd. (ARM) [...]
- ^ "ARM Corporate Backgrounder" (PDF). ARM. Archived from the original (PDF) on 4 October 2006.
- ^ Montanaro, James; et al. (1997). "A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor" (PDF). Digital Technical Journal. 9 (1): 49–62. Archived from the original (PDF) on 18 January 2024.
- ^ DeMone, Paul (9 November 2000). "ARM's Race to Embedded World Domination". Real World Technologies. Retrieved 6 October 2015.
- ^ "March of the Machines". technologyreview.com. MIT Technology Review. 20 April 2010. Archived from the original on 16 October 2015. Retrieved 6 October 2015.
- ^ Krazit, Tom (3 April 2006). "ARMed for the living room". CNET.
- ^ a b Fitzpatrick, J. (2011). "An Interview with Steve Furber". Communications of the ACM. 54 (5): 34–39. doi:10.1145/1941487.1941501.
- ^ Tracy Robinson (12 February 2014). "Celebrating 50 Billion shipped ARM-powered Chips". Archived from the original on 20 December 2015. Retrieved 31 January 2016.
- ^ Sarah Murry (3 March 2014). "ARM's Reach: 50 Billion Chip Milestone". Archived from the original on 16 September 2015.
- ^ Brown, Eric (2009). "ARM netbook ships with detachable tablet". Archived from the original on 3 January 2013. Retrieved 19 August 2009.
- ^ Peter Clarke (7 January 2016). "Amazon Now Sells Own ARM chips".
- ^ "MACOM Successfully Completes Acquisition of AppliedMicro" (Press release). 26 January 2017. Archived from the original on 25 May 2019. Retrieved 25 May 2019.
- ^ Frumusanu, Andrei. "ARM Details Built on ARM Cortex Technology License". AnandTech. Archived from the original on 31 May 2016. Retrieved 26 May 2019.
- ^ Cutress, Ian. "ARM Flexible Access: Design the SoC Before Spending Money". AnandTech. Archived from the original on 16 July 2019. Retrieved 9 October 2019.
- ^ "ARM Flexible Access Frequently Asked Questions". ARM. Retrieved 9 October 2019.
- ^ Nolting, Stephan. "STORM CORE Processor System" (PDF). OpenCores. Retrieved 1 April 2014.
- ^ ZAP on GitHub
- ^ "Cortex-M23 Processor". ARM. Retrieved 27 October 2016.
- ^ "Cortex-M33 Processor". ARM. Retrieved 27 October 2016.
- ^ "ARMv8-M Architecture Simplifies Security for Smart Embedded". ARM. Retrieved 10 November 2015.
- ^ "M-Profile Architectures". Arm. Retrieved 29 August 2023.
- ^ "ARMv8-R Architecture". Retrieved 10 July 2015.
- ^ Craske, Simon (October 2013). "ARM Cortex-R Architecture" (PDF). Arm Holdings. Archived from the original (PDF) on 6 April 2014. Retrieved 1 February 2014.
- ^ Smith, Ryan (20 September 2016). "ARM Announces Cortex-R52 CPU: Deterministic & Safe, for ADAS & More". AnandTech. Archived from the original on 21 September 2016. Retrieved 20 September 2016.
- ^ "Cortex-A32 Processor". ARM. Retrieved 10 October 2019.
- ^ "Cortex-A35 Processor". ARM. Retrieved 10 November 2015.
- ^ a b "ARM Launches Cortex-A50 Series, the World's Most Energy-Efficient 64-bit Processors" (Press release). Arm Holdings. Retrieved 31 October 2012.
- ^ "Cortex-A72 Processor". ARM. Retrieved 10 July 2015.
- ^ "Cortex-A73 Processor". ARM. Retrieved 2 June 2016.
- ^ "ARMv8-A Architecture". Retrieved 10 July 2015.
- ^ "Cavium Thunder X ups the ARM core count to 48 on a single chip". Semiaccurate. SemiAccurate. 3 June 2014. Archived from the original on 6 March 2018. Retrieved 9 December 2014.
- ^ "Cavium at Supercomputing 2014". Yahoo Finance. 17 November 2014. Archived from the original on 16 October 2015. Retrieved 15 January 2017.
- ^ Burt, Jeff (17 November 2014). "Cray to Evaluate ARM Chips in Its Supercomputers". eWeek.
- ^ "Samsung Announces Exynos 8890 with Cat.12/13 Modem and Custom CPU". AnandTech. Archived from the original on 12 November 2015.
- ^ "Cortex-A34 Processor". ARM. Retrieved 10 October 2019.
- ^ "D21500 [AARCH64] Add support for Broadcom Vulcan". reviews.llvm.org.
- ^ "Cortex-A55 Processor". ARM. Retrieved 29 May 2017.
- ^ "Cortex-A75 Processor". ARM. Retrieved 29 May 2017.
- ^ "Cortex-A76 Processor". ARM. Retrieved 11 October 2018.
- ^ Berenice Mann (April 2017). "ARM Architecture – ARMv8.2-A evolution and delivery". community.ARM.com.
- ^ Frumusanu, Andrei. "Samsung Announces the Exynos 9825 SoC: First 7nm EUV Silicon Chip". AnandTech. Archived from the original on 7 August 2019. Retrieved 11 October 2019.
- ^ "Fujitsu began to produce Japan's billions of super-calculations with the strongest ARM processor A64FX". China IT News. Archived from the original on 20 June 2019. Retrieved 17 August 2019.
ARMv8 SVE (Scalable Vector Extension) chip, which uses 512bit floating point.
- ^ "Cortex-A65AE – ARM". ARM. Retrieved 8 April 2020.
can execute two-threads in parallel on each cycle. Each thread can be at different exception levels and run different operating systems.
- ^ Frumusanu, Andrei. "Marvell Announces ThunderX3: 96 Cores & 384 Thread 3rd Gen ARM Server Processor". AnandTech. Archived from the original on 16 March 2020. Retrieved 26 May 2020.
- ^ "AArch64: add support for newer Apple CPUs · apple/llvm-project@677da09". GitHub. Retrieved 23 September 2022.
- ^ "New features for the Armv8-A architecture - Architectures and Processors blog - Arm Community blogs - Arm Community". community.arm.com. 21 September 2020. Retrieved 28 December 2021.
- ^ "Arm's solution to the future needs of AI, security and specialized computing is v9". Arm. Retrieved 16 August 2021.
- ^ "First Armv9 Cortex CPUs for Consumer Compute". community.arm.com. 25 May 2021. Retrieved 16 August 2021.
- ^ "Documentation – Arm Developer". developer.arm.com. Retrieved 3 October 2024.
- ^ "Documentation – Arm Developer". developer.arm.com. Retrieved 3 October 2024.
- ^ "Documentation – Arm Developer". developer.arm.com. Retrieved 3 October 2024.
- ^ "Apple M4 Support Added To The LLVM Compiler, Confirming Its ISA Capabilities". www.phoronix.com. Retrieved 15 June 2024.
- ^ "Arm A-Profile Architecture Developments 2021 - Architectures and Processors blog - Arm Community blogs - Arm Community". community.arm.com. 8 September 2021. Retrieved 25 September 2023.
- ^ "Arm A-Profile Architecture Developments 2022 - Architectures and Processors blog - Arm Community blogs - Arm Community". community.arm.com. 29 September 2022. Retrieved 25 September 2023.
- ^ "Arm A-Profile Architecture Developments 2023 - Architectures and Processors blog - Arm Community blogs - Arm Community". community.arm.com. 29 September 2022. Retrieved 11 October 2024.
- ^ "Arm A-Profile Architecture Developments 2024 - Architectures and Processors blog - Arm Community blogs - Arm Community". community.arm.com. 29 September 2022. Retrieved 11 October 2024.
- ^ "Line Card" (PDF). 2003. Retrieved 1 October 2012.
- ^ Parrish, Kevin (14 July 2011). "One Million ARM Cores Linked to Simulate Brain". EE Times. Retrieved 2 August 2011.
- ^ a b c d e f g h i j k l m n o p ARM Architecture Reference Manual (PDF) (E ed.). ARM. June 2000. pp. v–ix.
- ^ a b c d e f g h i j ARM Architecture Reference Manual (PDF) (I ed.). ARM. July 2005. pp. xiii–xvii.
- ^ a b c ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition (PDF) (C.c ed.). ARM. p. D12-2513.
- ^ Armv7-M Architecture Reference Manual. ARM.
- ^ "ARMv8 Instruction Set Overview" (PDF). ARM. 11 November 2011. A32 & T32 Instruction Sets.
- ^ Armv8-M Architecture Reference Manual. ARM.
- ^ "Processor mode". Arm Holdings. Retrieved 26 March 2013.
- ^ "KVM/ARM" (PDF). Retrieved 14 February 2023.
- ^ Brash, David (August 2010). Extensions to the ARMv7-A Architecture. 2010 IEEE Hot Chips 22 Symposium (HCS). pp. 1–21. doi:10.1109/HOTCHIPS.2010.7480070. ISBN 978-1-4673-8875-7. S2CID 46339775.
- ^ "How does the ARM Compiler support unaligned accesses?". 2011. Archived from the original on 14 October 2013. Retrieved 5 October 2013.
- ^ "Unaligned data access". Retrieved 5 October 2013.
- ^ "Cortex-M0 r0p0 Technical Reference Manual" (PDF). Arm.
- ^ "ARMv7-M Architecture Reference Manual". Arm. Retrieved 18 July 2022.
- ^ a b "ARMv7-A and ARMv7-R Architecture Reference Manual; Arm Holdings". arm.com. Retrieved 19 January 2013.
- ^ "ARM Information Center". Retrieved 10 July 2015.
- ^ "Condition Codes 1: Condition flags and codes". ARM Community. 11 September 2013. Retrieved 26 September 2019.
- ^ "9.1.2. Instruction cycle counts".
- ^ "CoreSight Components: About the Debug Access Port".
- ^ "The Cortex-M3: Debug Access Port (DAP)".
- ^ Anderson, Mike. "Understanding ARM HW Debug Options" (PDF).
- ^ "CMSIS-DAP Debugger User's Guide".
- ^ "CMSIS-DAP".
- ^ "ARM DSP Instruction Set Extensions". arm.com. Archived from the original on 14 April 2009. Retrieved 18 April 2009.
- ^ a b Clarke, Peter (3 May 1999). "EPF: ARC, ARM add DSP extensions to their RISC cores". EE Times. Retrieved 15 March 2024.
- ^ Turley, Jim (18 November 1996). "ARM Tunes Piccolo for DSP Performance" (PDF). Microprocessor Report. Retrieved 15 March 2024.
- ^ "DSP & SIMD". Retrieved 10 July 2015.
- ^ "ARM7TDMI Technical Reference Manual" (PDF). p. ii.
- ^ Jaggar, Dave (1996). ARM Architecture Reference Manual. Prentice Hall. pp. 6–1. ISBN 978-0-13-736299-8.
- ^ Willis, Nathan (10 June 2015). "Resurrecting the SuperH architecture". LWN.net.
- ^ "ARM Processor Instruction Set Architecture". ARM.com. Archived from the original on 15 April 2009. Retrieved 18 April 2009.
- ^ "ARM aims son of Thumb at uCs, ASSPs, SoCs". Linuxdevices.com. Archived from the original on 9 December 2012. Retrieved 18 April 2009.
- ^ "ARM Information Center". Infocenter.arm.com. Retrieved 18 April 2009.
- ^ "Jazelle". ARM Ltd. Archived from the original on 2 June 2017.
- ^ Halfhill, Tom R. (2005). "ARM strengthens Java compilers: New 16-Bit Thumb-2EE Instructions Conserve System Memory" (PDF). Archived from the original (PDF) on 5 October 2007.
- ^ ARM Architecture Reference Manual, Armv7-A and Armv7-R edition, issue C.b, Section A2.10, 25 July 2012.
- ^ "ARM Compiler toolchain Using the Assembler – VFP coprocessor". ARM.com. Retrieved 20 August 2014.
- ^ "VFP directives and vector notation". ARM.com. Retrieved 21 November 2011.
- ^ a b "Differences between ARM Cortex-A8 and Cortex-A9". Shervin Emami. Retrieved 21 September 2025.
- ^ "FPA10 Data Sheet" (PDF). chrisacorns.computinghistory.org.uk. GEC Plessey Semiconductors. 11 June 1993. Retrieved 26 November 2020.
In relation to IEEE 754-1985, the FPA achieves conformance in single-precision arithmetic [...] Occasionally, double- and extended-precision multiplications may be produced with an error of 1 or 2 units in the least significant place of the mantissa.
- ^ a b "Cortex-A7 MPCore Technical Reference Manual – 1.3 Features". ARM. Retrieved 11 July 2014.
- ^ "ArmHardFloatPort – Debian Wiki". Wiki.debian.org. 20 August 2012. Retrieved 8 January 2014.
- ^ "Cortex-A9 Processor". arm.com. Retrieved 21 November 2011.
- ^ "About the Cortex-A9 NEON MPE". arm.com. Retrieved 21 November 2011.
- ^ "US20050125476A1".
- ^ "US20080141004A1".
- ^ "ARM Options". GNU Compiler Collection Manual. Retrieved 20 September 2019.
- ^ Ne10: An open optimized software library project for the ARM Architecture on GitHub
- ^ Yiu, Joseph. "Introduction to ARMv8.1-M architecture" (PDF). Retrieved 18 July 2022.
- ^ "The TrustZone hardware architecture". ARM Developer.
- ^ "Genode – An Exploration of ARM TrustZone Technology". Retrieved 10 July 2015.
- ^ "ARM Announces Availability of Mobile Consumer DRM Software Solutions Based on ARM TrustZone Technology" (Press release). News.thomasnet.com. Retrieved 18 April 2009.
- ^ Laginimaineb (8 October 2015). "Bits, Please!: Full TrustZone exploit for MSM8974". Bits, Please!. Retrieved 3 May 2016.
- ^ Di Shen. "Attacking your 'Trusted Core' Exploiting TrustZone on Android" (PDF). Black Hat Briefings. Retrieved 3 May 2016.
- ^ "ARM TrustZone and ARM Hypervisor Open Source Software". Open Virtualization. Archived from the original on 14 June 2013. Retrieved 14 June 2013.
- ^ "AMD Secure Technology". AMD. Archived from the original on 23 July 2016. Retrieved 6 July 2016.
- ^ Smith, Ryan (13 June 2012). "AMD 2013 APUs to include ARM Cortex A5 Processor for Trustzone Capabilities". AnandTech. Archived from the original on 15 June 2012. Retrieved 6 July 2016.
- ^ a b Shimpi, Anand Lal (29 April 2014). "AMD Beema Mullins Architecture A10 micro 6700T Performance Preview". AnandTech. Archived from the original on 29 April 2014. Retrieved 6 July 2016.
- ^ Walton, Jarred (4 June 2014). "AMD Launches Mobile Kaveri APUs". AnandTech. Archived from the original on 6 June 2014. Retrieved 6 July 2016.
- ^ "Root of Trust" (white paper). Samsung Electronics. April 2016.
- ^ "Relationship between ARM TrustZone technology for ARMv8-M and ARM Cortex-A processors". ARM Developer.
- ^ "ARM Architecture Reference Manual" (PDF). p. B4-8. Archived from the original (PDF) on 6 February 2009.
APX and XN (execute never) bits have been added in VMSAv6 [Virtual Memory System Architecture]
- ^ ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition. ARM Limited.
- ^ "Cortex-A65AE". ARM Developer. Retrieved 26 April 2019.
- ^ "AppliedMicro Showcases World's First 64-bit ARM v8 Core" (Press release). AppliedMicro. 28 October 2011. Retrieved 11 February 2014.
- ^ "Samsung's Exynos 5433 is an A57/A53 ARM SoC". AnandTech. Archived from the original on 16 September 2014. Retrieved 17 September 2014.
- ^ "ARM Cortex-A53 MPCore Processor Technical Reference Manual: Cryptography Extension". ARM. Retrieved 11 September 2016.
- ^ a b "Impact of implemented Exception levels". Learn the architecture - AArch64 Exception Model. Arm.
- ^ "Cortex-A32". Arm Developer.
- ^ "Cortex-A34". Arm Developer.
- ^ "Cortex-A72". Arm Developer.
- ^ Frumusanu, Andrei (3 September 2020). "ARM Announced Cortex-R82: First 64-bit Real Time Processor". AnandTech. Archived from the original on 3 September 2020.
- ^ Frumusanu, Andrei (30 March 2021). "Arm Announces Armv9 Architecture: SVE2, Security, and the Next Decade". AnandTech. Archived from the original on 30 March 2021.
- ^ Harrod, Alex (30 March 2021). "Arm's Solution to the Future Needs of AI, Security and Specialized Computing is v9" (Press release). Arm Holdings.
- ^ "SystemReady Compliance Program". Arm.
- ^ "Arm Base System Architecture". Arm.
- ^ "Arm Server Base System Architecture". Arm.
- ^ "Arm Base Boot Requirements". Arm.
- ^ "Base Boot Security Requirements". Arm.
- ^ "Arm SystemReady Requirements Specification". Arm.
- ^ "Arm SystemReady Band". Arm.
- ^ "Arm SystemReady Devicetree Band". Arm.
- ^ Osborne, Charlie. "ARM announces PSA security architecture for IoT devices". ZDNet.
- ^ a b Wong, William (25 October 2017). "ARM's Platform Security Architecture Targets Cortex-M". Electronic Design. Archived from the original on 8 May 2019.
- ^ Hoffenberg, Steve (31 October 2017). "ARM: Security Isn't Just a Technological Imperative, It's a Social Responsibility". VDC Research. Archived from the original on 28 September 2023.
- ^ Armasu, Lucian (22 February 2018). "ARM Reveals More Details About Its IoT Platform Security Architecture". Tom's Hardware.
- ^ Williams, Chris. "ARM PSA IoT API? BRB... Toolbox of tech to secure net-connected kit opens up some more". The Register.
- ^ Hayes, Caroline (25 February 2019). "Embedded World: Arm introduces fourth security element to PSA". Electronics Weekly.
- ^ "PSA Certified: building trust in IoT". PSA Certified.
- ^ a b "PSA Certified–building trust, building value". EE Times. 4 March 2019.
- ^ "The $6trn importance of security standards and regulation in the IoT era". IoT Now. 16 March 2020.
- ^ McGregor, Jim (4 March 2019). "Arm Introduces Security Certification Testing For IoT". Forbes.
- ^ Speed, Richard (26 February 2019). "Azure IoT heads spaceward to maintain connectivity at the edge, courtesy of Inmarsat". TheRegister.
- ^ "OS-9 Specifications". Microware. Archived from the original on 7 January 2019. Retrieved 29 April 2014.
- ^ a b "Pharos". SourceForge. Retrieved 24 May 2018.
- ^ "PikeOS Safe and Secure Virtualization". Retrieved 10 July 2013.
- ^ a b "Safety Certified Real-Time Operating Systems – Supported CPUs".
- ^ "ARM Platform Port". opensolaris.org. Archived from the original on 2 December 2012. Retrieved 29 December 2012.
- ^ "Green Hills Software's INTEGRITY-based Multivisor Delivers Embedded Industry's First 64-bit Secure Virtualization Solution". ghs.com. Retrieved 14 March 2018.
- ^ "Enea OSE real-time operating system for 5G and LTE-A | Enea". enea.com. Archived from the original on 1 January 2019. Retrieved 17 April 2018.
- ^ "Supported Platforms". docs.sel4.systems. Retrieved 23 November 2018.
- ^ "QNX Software Development Platform (SDP 7.0) | BlackBerry QNX". blackberry.qnx.com. Retrieved 27 July 2020.
- ^ "Wind River Releases 64-Bit VxWorks RTOS" (Press release). Wind River Systems. 28 February 2011. Retrieved 24 October 2023.
- ^ "Manjaro-ARM". Manjaro wiki. 20 June 2022.
- ^ Torvalds, Linus (1 October 2012). "Re: [GIT PULL] arm64: Linux kernel port". Linux kernel mailing list (Mailing list). Retrieved 2 May 2019.
- ^ Larabel, Michael (27 February 2013). "64-bit ARM Version of Ubuntu/Debian Is Booting". Phoronix. Retrieved 17 August 2014.
- ^ "Debian Project News – August 14th, 2014". Debian. 14 August 2014. Retrieved 8 August 2025.
- ^ "Ubuntu Server for ARM". ubuntu.com.
- ^ "Architectures/AArch64". Retrieved 16 January 2015.
- ^ "NixOS on ARM". Retrieved 21 March 2025.
- ^ "Portal:ARM/AArch64". Retrieved 16 January 2015.
- ^ "SUSE Linux Enterprise 12 SP2 Release Notes". Retrieved 11 November 2016.
- ^ "Red Hat introduces ARM server support for Red Hat Enterprise Linux". redhat.com. Retrieved 18 January 2019.
- ^ "64-bit ARM architecture project update". The FreeBSD Foundation. 24 November 2014.
- ^ "OpenBSD/arm64". Retrieved 25 September 2023.
- ^ "NetBSD/arm64". Retrieved 5 August 2018.
- ^ "HP, Asus announce first Windows 10 ARM PCs: 20-hour battery life, gigabit LTE". Ars Technica. Retrieved 22 January 2018.
This new version of Windows 10 is Microsoft's first 64-bit ARM operating system. It'll run x86 and 32-bit ARM applications from the Store, and in due course, 64-bit ARM applications. However, Microsoft hasn't yet finalised its 64-bit ARM SDK. Many pieces are in place (there's a 64-bit ARM compiler, for example), but the company isn't yet taking 64-bit ARM applications submitted to the Store, and there aren't any 64-bit ARM desktop applications either.
- ^ Hassan, Mehedi (10 December 2016). "Windows 10 on ARM64 gets its first compiled apps". MSPoweruser.
- ^ Filippidis, Katrina (1 June 2018). "VLC becomes one of first ARM64 Windows apps". Engadget.
- ^ Sweetgall, Marc (15 November 2018). "Official support for Windows 10 on ARM development". Windows Developer. Windows Blogs. Microsoft. Retrieved 17 December 2019.
- ^ Gartenberg, Chaim (12 November 2020). "macOS Big Sur is now available to download". The Verge. Retrieved 13 November 2020.
- ^ Clover, Juli (23 June 2020). "Rosetta Won't Support x86 Virtualization Apps Running Windows". MacRumors. Retrieved 13 November 2020.
- ^ "ARM – The Official Wine Wiki". Retrieved 10 July 2015.
- ^ "ARM64 – The Official Wine Wiki". Retrieved 10 July 2015.
- ^ "ARM Security Updates". ARM Developer. Retrieved 24 May 2018.
Bibliography
- Evans, Matt (27 December 2019). The Ultimate Acorn Archimedes talk. 36th Chaos Communication Congress (36C3). YouTube. Archived from the original on 11 December 2021 – via media.ccc.de.
External links
- Official website, ARM Ltd.
Architecture manuals
- ARM Limited (1996–2005). "ARM Architecture Reference Manual". documentation-service.arm.com. Retrieved 16 July 2021. - covers ARMv4, ARMv4T, ARMv5T, (ARMv5TExP), ARMv5TE, ARMv5TEJ, and ARMv6
- ARM Limited (2007–2018). "Armv6-M Architecture Reference Manual". ARM documentation. Retrieved 17 July 2021.
- ARM Limited (2007–2018). "ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition". ARM documentation. Retrieved 17 July 2021.
- ARM Limited (2006–2021). "ARMv7-M Architecture Reference Manual". ARM documentation. Retrieved 24 August 2022.
- ARM Limited (2013–2022). "Arm Architecture Reference Manual for A-profile architecture". ARM documentation. Retrieved 24 August 2022.
- ARM Limited (2016–2020). "ARM Architecture Reference Manual Supplement - ARMv8, for the ARMv8-R AArch32 architecture profile". ARM documentation. Retrieved 17 July 2021.
- ARM Limited (2020–2022). "Arm Architecture Reference Manual Supplement - Armv8, for Armv8-R AArch64 architecture profile". ARM documentation. Retrieved 24 August 2022.
- ARM Limited (2015–2022). "Armv8-M Architecture Reference Manual". ARM documentation. Retrieved 24 August 2022.
- ARM Limited (2021). "Arm Armv9-A A64 Instruction Set Architecture". ARM documentation. Retrieved 17 July 2021.
- "ARM Virtualization Extensions". Archived from the original on 18 December 2013.
Quick-reference cards
Instructions
- Thumb Archived 20 June 2020 at the Wayback Machine
- ARM and Thumb-2 Archived 20 June 2020 at the Wayback Machine
- Vector Floating Point Archived 19 June 2020 at the Wayback Machine
Opcodes
- Thumb Archived 30 July 2022 at the Wayback Machine. Additional archives: 22 August 2022.
- ARM Archived 7 June 2022 at the Wayback Machine. Additional archives: 22 August 2022.
- GNU Assembler Directives Archived 30 April 2022 at the Wayback Machine. Additional archives: 22 August 2022.
ARM architecture family
- The A-profile targets high-performance, general-purpose computing in devices such as smartphones, PCs, and servers. The most prominent of the three profiles, it supports rich operating systems and has progressed from Armv8-A (introduced in 2011 with 64-bit AArch64 execution) to Armv9-A (launched in 2021), which adds scalable vector extensions for AI workloads and enhanced security features such as confidential computing.[5]
- The R-profile targets real-time, deterministic operation in safety-critical systems such as automotive braking and medical equipment; its implementations (up to Armv8-R) prioritize low-latency responses.[4]
- The M-profile targets low-power microcontrollers in IoT sensors, wearables, and smart home devices; its implementations (up to Armv8-M) focus on minimal code size and power consumption, with optional TrustZone security.[4]
History
Origins in Acorn Computers
The development of the ARM architecture began in 1983 at Acorn Computers, a British firm known for its BBC Micro home computer, which relied on the 8-bit MOS Technology 6502 processor.[7] As Acorn sought a successor to enable a shift to 32-bit processing for future systems, engineers Sophie Wilson and Steve Furber led the effort, with Wilson designing the instruction set and Furber handling the overall chip architecture.[8] The project was motivated by the need for a low-cost, high-performance CPU amid intensifying competition from 16- and 32-bit rivals like the Intel 80286 and Motorola 68000.[7] Drawing on emerging RISC (Reduced Instruction Set Computer) principles from academic research at institutions like the University of California, Berkeley, the team prioritized simplicity to minimize transistor count and power consumption.[9] The design incorporated a load/store architecture, a three-stage pipeline, and just 45 instructions, targeting under 1 W of power—ultimately achieving about 0.1 W—to suit battery-powered and embedded applications while integrating seamlessly with Acorn's existing ecosystem.[7] Named the Acorn RISC Machine (ARM), the initial prototype, ARM1, was fabricated on a 3 µm CMOS process by VLSI Technology Inc. 
and powered up on April 26, 1985, after just 18 months of development using rudimentary tools like BBC BASIC for simulation.[8] The ARM1 featured approximately 25,000 transistors on a compact 7 mm × 7 mm die and operated at a clock speed of 6 MHz, delivering around 4 million instructions per second (MIPS).[9] The ARM1 served as a proof-of-concept, tested in internal development boards, and paved the way for production variants.[7] Its architecture debuted commercially in the Acorn Archimedes personal computers launched in 1987, marking Acorn's transition from 8-bit to 32-bit systems and demonstrating the design's efficiency with a performance edge over contemporaries despite the modest clock speed.[8] This foundational work at Acorn ultimately led to the formation of an independent licensing company in 1990.[7]
Formation of ARM Holdings
In late 1990, Acorn Computers spun off its ARM processor technology into a new entity, Advanced RISC Machines Ltd (ARM Ltd), incorporated in Cambridge, United Kingdom, as a joint venture with Apple Computer and VLSI Technology.[2][10] Acorn contributed its intellectual property and a team of 12 engineers, Apple invested $3 million in cash to secure a significant ownership stake driven by its need for a low-power processor for the upcoming Newton personal digital assistant, and VLSI provided semiconductor design tools and fabrication expertise.[11][12] This structure gave Acorn and Apple each approximately 43% of the shares, with VLSI holding the remaining 14%.[13] The formation marked a pivotal shift from Acorn's in-house development to a fabless business model focused on licensing intellectual property rather than manufacturing chips, allowing ARM to commercialize the RISC architecture more broadly.[11][2] Apple's involvement was crucial, as the Newton project—initiated in 1987—required an efficient, battery-friendly CPU that the ARM design uniquely suited, leading Apple to champion the spin-off and fund its early operations.[11][14] VLSI's role extended to the first external license in 1990, enabling it to produce and integrate ARM-based chips while supporting the venture's goal of targeting embedded applications like portable devices and peripherals.[10][15] Early partnerships emphasized ARM's strategy of upfront licensing fees combined with royalties on produced silicon, fostering collaborations beyond the founding trio and positioning the company for global adoption in low-power computing.[11] This approach, rooted in the joint venture's inception on November 27, 1990, laid the foundation for ARM's expansion as an IP provider.[12][2]
Key Milestones in Development
The ARM2 processor, introduced in 1987, added multiply and multiply-accumulate instructions to the original ARM1 design, enabling more efficient handling of arithmetic operations in embedded systems.[16] This enhancement was crucial for improving performance in early applications like the Acorn Archimedes personal computer, marking ARM's initial foray into commercial computing beyond its Acorn origins.[17] In 1989, the ARM3 processor was released, incorporating an on-chip cache and support for a floating-point unit (FPU) coprocessor, which significantly boosted processing speeds for graphics and scientific computations in workstations.[18] These advancements solidified ARM's reputation for balancing power efficiency with capability, paving the way for broader adoption in battery-constrained devices. The formation of Advanced RISC Machines Ltd. in November 1990, as a joint venture between Acorn Computers, Apple Computer, and VLSI Technology, represented a pivotal shift toward commercial IP licensing and independent development.[2] This entity released the ARM6 processor in 1992, featuring a memory management unit (MMU) and enhanced 32-bit processing, which facilitated virtual memory support and integration into more complex operating systems.[8] A major collaboration emerged in 1996 with Digital Equipment Corporation, resulting in the StrongARM family of processors, which delivered high performance at low power—up to 185 MIPS at 160 MHz—while maintaining full compatibility with the ARMv4 instruction set.[19] This partnership expanded ARM's reach into networking and portable computing, demonstrating the architecture's scalability for demanding applications.
To address code density challenges in memory-limited environments, ARM introduced the Thumb instruction set in 1994 with the ARMv4T architecture, compressing common 32-bit instructions into 16-bit formats to reduce program size by approximately 30-40% without sacrificing much performance.[20] This innovation proved essential for embedded systems, allowing developers to fit more functionality into constrained ROM spaces. In 2002, ARM launched Jazelle technology, an extension enabling direct hardware execution of Java bytecode, which accelerated Java Virtual Machine (JVM) performance by up to 5-10 times compared to software interpretation alone.[21] By integrating bytecode handling into the processor pipeline, Jazelle optimized resource usage in mobile and embedded Java applications, anticipating the rise of platform-independent software. Key adoptions underscored these technical strides: the ARM architecture powered Apple's Newton personal digital assistant launched in 1993, utilizing the ARM610 processor to enable handwriting recognition and scheduling features in a portable form factor.[2] Texas Instruments licensed ARM cores in 1993, followed by Nokia's adoption for GSM handsets like the 6110 in 1998, which leveraged the ARM7 for efficient signal processing and helped establish ARM as a standard in mobile telephony.[22]
Market Growth and Adoption
The ARM architecture experienced significant commercial expansion in the 2000s, driven by its adoption in mobile phones due to superior power efficiency compared to competing architectures. Licensees such as Qualcomm with its Snapdragon processors and Samsung with Exynos chips integrated ARM cores into high-volume smartphone platforms, establishing ARM as the de facto standard for mobile computing by the mid-2000s.[23][24][25] This surge was fueled by the rapid growth of the smartphone market, where ARM's reduced instruction set computing (RISC) design enabled longer battery life and lower costs, leading to a 95% market share in mobile phone processors by 2010.[26] By the 2010s, ARM had solidified its dominance in embedded systems, powering devices from consumer electronics to industrial applications, with cumulative shipments of ARM-based chips exceeding 325 billion units as of 2025.[27] The post-2015 Internet of Things (IoT) boom further accelerated this adoption, as ARM's low-power cores like the Cortex-M series became integral to connected sensors, wearables, and smart home devices, contributing to a projected compound annual growth rate of 19% in IoT installations from 2014 to 2020.[28] ARM's revenue model, centered on upfront licensing fees and per-chip royalties, capitalized on this scale, with licensing revenue surging 56% year-over-year to $515 million in the fiscal second quarter of 2026, reflecting sustained demand across mobile and emerging sectors.[29] ARM's penetration extended to new markets in the late 2010s and 2020s, including servers and personal computers. 
Amazon Web Services introduced the Graviton processor in November 2018, marking ARM's entry into cloud computing with energy-efficient instances for scale-out workloads.[30] Apple's transition to its own ARM-based Apple Silicon chips for Macs, announced in June 2020 and rolled out starting late that year, accelerated ARM's adoption in high-performance PCs, breaking from Intel's x86 dominance.[31] By 2025, ARM powered over 99% of smartphones worldwide and was projected to capture more than 50% of the data center market, underscoring its broad industry penetration.[32][33]
Licensing Model
Core and IP Licensing
The primary mechanism for accessing ARM processor cores involves licensing pre-configured designs such as the Cortex family, which are delivered as complete intellectual property (IP) blocks including the processor core, associated caches, and interconnect buses like CoreLink.[34] These licenses enable licensees to integrate the IP directly into system-on-chip (SoC) designs, ensuring compatibility with the ARM ecosystem while minimizing development time.[35] Pricing for core licenses typically follows a hybrid model combining upfront fees with per-unit royalties. As reported in the early 2010s, upfront fees for standard Cortex core implementations ranged from approximately $1 million to $10 million, depending on the core's complexity and the licensee's scale, while royalties were generally 1% to 2% of the selling price per shipped chip; current terms are negotiated individually and not publicly disclosed.[36][37] A license for a high-performance core such as the Cortex-A78, for example, grants access to its synthesizable design for premium mobile applications under these terms.
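As a back-of-envelope illustration of that hybrid fee structure, the lifetime cost of a license can be sketched as follows. All figures are hypothetical, drawn only from the illustrative early-2010s ranges quoted above, not from any actual agreement:

```python
# Sketch: rough lifetime cost of a hypothetical ARM core license under the
# hybrid model described above (illustrative figures only, not real terms).

def license_cost(upfront_fee, royalty_rate, chip_price, units_shipped):
    """Upfront fee plus a per-unit royalty taken as a fraction of chip price."""
    return upfront_fee + royalty_rate * chip_price * units_shipped

# Example: $5M upfront, 1.5% royalty on a $20 chip, 100 million units shipped.
total = license_cost(5_000_000, 0.015, 20.0, 100_000_000)
print(f"${total:,.0f}")  # at volume, royalties dwarf the upfront fee
```

The shape of the formula explains the business model: the upfront fee covers design access, while per-unit royalties let revenue scale with the licensee's shipment volume.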
ARM supports customization through two main delivery formats: binary-compatible processor implementations, which are fixed, pre-verified designs for rapid integration, and synthesizable register-transfer level (RTL) code, which allows licensees to modify the core for optimization in power, performance, or area while preserving ARM instruction set compatibility.[38][39] The RTL format, provided in Verilog, facilitates architectural extensions and integration into custom SoCs, particularly for integrated device manufacturers (IDMs).[34] By 2025, ARM had over 350 active licenses across its programs, including 44 Arm Total Access licenses—a subscription-based program providing comprehensive access to Arm's IP portfolio—and 314 Arm Flexible Access licenses, enabling a vast array of partners to develop products.[40] This licensing approach plays a pivotal role in the fabless semiconductor ecosystem, allowing companies without fabrication facilities—such as Qualcomm and MediaTek—to design and outsource production of ARM-based chips, driving innovation in mobile, automotive, and IoT markets without the need for in-house architecture development.[41][42]
Architectural and Flexible Access Licenses
The Architectural License, also known as the Architecture License Agreement (ALA), grants licensees full access to Arm's Instruction Set Architecture (ISA) specifications, enabling the design of custom microarchitectures that remain compliant with Arm standards.[1] This license is particularly suited for companies seeking to optimize performance for specific workloads by developing proprietary processor cores, while ensuring broad software ecosystem compatibility across Arm-based devices.[1] Notable adopters include Apple, which utilizes the license for its M-series processors in Macs and other devices; Qualcomm, for custom Kryo CPU designs in Snapdragon SoCs; and Amazon Web Services (AWS), for the Graviton processor family powering cloud infrastructure.[1][43] Key terms of the Architectural License include coverage of major ISA versions such as Armv8-A and Armv9-A, providing detailed technical documentation for instruction sets, extensions, and system architectures without granting exclusive rights—licensees receive non-exclusive permissions to implement and commercialize compliant designs.[1] Royalties are typically assessed per shipped unit, scaled by volume and application, allowing differentiation through tailored implementations like high-efficiency cores for mobile or server environments.[41] This model benefits licensees by fostering innovation beyond off-the-shelf cores, as seen in Apple's performance-optimized M-series for AI and graphics tasks, or AWS Graviton's focus on cloud efficiency, which has delivered up to 20% better price-performance in EC2 instances compared to x86 alternatives.[1][44] Introduced in 2019, the Arm Flexible Access program serves as an entry-level licensing option, offering startups and small-to-medium enterprises upfront, no-cost or low-cost access to a curated portfolio of Arm IP, including processor cores, tools, and training resources, to prototype system-on-chip (SoC) designs.[45][46] Under this program, qualifying startups 
receive $0 entry-tier membership, enabling unlimited evaluation and design iterations without initial fees, with royalties and manufacturing licenses activating only upon tape-out of a production design.[46] It covers select ISA implementations, such as Armv8-A through Cortex-A series cores, Mali GPUs, and CoreLink interconnects, supporting applications from IoT to edge AI.[46] The Flexible Access model's royalty-based scaling—deferred until commercialization—lowers barriers for emerging companies, allowing them to experiment with Arm technology and achieve market differentiation without prohibitive upfront costs.[46] For instance, it has enabled over 60 partners, including first-time Arm IP users, to accelerate SoC development in high-growth areas like machine learning and automotive systems, often reducing time-to-market by providing pre-verified components and ecosystem support.[47] Non-exclusive rights ensure broad applicability, with three membership tiers (DesignStart for free basics, Entry at $0 for startups or $80,000 annually, and Standard at $212,000 annually) tailored to project scale.[46] This approach contrasts with traditional core licensing by emphasizing exploratory access, ultimately facilitating custom designs that leverage Arm's ISA for specialized benefits like power efficiency in startup-led innovations.[41]
Evolution of Licensing Programs
In the early 1990s, ARM's licensing model focused on straightforward intellectual property (IP) agreements for its processor designs, marking the company's initial shift toward a fabless, royalty-based business. The first such licenses were granted in 1991 to GEC Plessey Semiconductors, enabling the production of ARM-based chips for embedded applications.[48] Shortly thereafter, VLSI Technology and Sharp Corporation became licensees, with VLSI integrating ARM cores into its semiconductor offerings and Sharp targeting consumer electronics.[49] These early deals, often involving upfront fees and royalties per shipped unit, laid the foundation for ARM's expansion by allowing partners to manufacture without developing the core IP from scratch.[22] During the 2000s, ARM evolved its licensing to support broader market segments through the introduction of the Cortex family of processor cores, launched in 2005 to standardize designs across application, real-time, and microcontroller profiles.[50] The Cortex-A series targeted high-performance devices like smartphones, Cortex-R focused on real-time systems such as automotive controllers, and Cortex-M addressed low-power embedded uses, providing licensees with configurable, scalable options under a unified branding.[51] This multi-profile approach simplified adoption for partners, who could select cores tailored to specific needs while benefiting from ARM's ongoing architectural updates, fostering widespread integration in mobile and consumer products.[2] In the 2010s, ARM responded to rising competition from open-source alternatives like RISC-V by launching the Flexible Access program in 2019, which offered low-barrier entry to its IP portfolio without immediate full licensing commitments.[52] This initiative allowed developers to access over 75% of ARM's designs, including Cortex cores and tools, for a nominal annual fee, deferring royalties until production, thereby attracting startups and reducing upfront costs compared 
to traditional models.[45] The program directly addressed RISC-V's no-fee appeal by emphasizing ARM's mature ecosystem and performance optimizations, enabling faster prototyping in emerging markets like IoT.[53] The 2020s saw ARM pivot toward AI-centric licensing, incorporating the Scalable Vector Extension (SVE) and its enhancements in Armv9 to support machine learning workloads on edge devices.[54] SVE, initially developed for high-performance computing, enables vector lengths up to 2048 bits for efficient AI inference and training, with licensing available through core or architectural agreements that integrate these extensions for AI-optimized processors.[55] In 2025, ARM updated its Flexible Access to include edge AI IP bundles, such as the Armv9 platform with Cortex-A320 and Ethos-U85 NPU, providing zero upfront costs for startups to develop on-device AI solutions and compete in the growing edge computing sector.[56]
Processor Core Families
Cortex-A Profile Cores
The Cortex-A profile cores form the high-performance segment of ARM's processor family, designed primarily for application processors in devices requiring complex computation, such as smartphones, tablets, and embedded systems with rich operating systems like Android or Linux. These cores implement the ARMv7-A architecture for 32-bit processing and extend to the 64-bit ARMv8-A and ARMv9-A architectures, emphasizing scalability, virtual memory management, and support for advanced operating systems. Introduced to address the growing demands of mobile and consumer electronics, the Cortex-A series balances power efficiency with computational throughput, enabling seamless multitasking and multimedia processing.[57] Representative examples illustrate the evolution of Cortex-A cores across performance tiers and process nodes. The Cortex-A5, announced in 2009 and entering production in 2010, targets low-end applications like feature phones and ultra-low-cost handsets, featuring an in-order 8-stage pipeline, single-issue execution, and compatibility with the ARMv7-A instruction set for energy-efficient, compact designs. In contrast, the Cortex-A78, unveiled in 2020 and optimized for 5nm process technology, delivers high-end 64-bit performance under ARMv8.2-A, with out-of-order execution, improved branch prediction, and up to 20% higher single-threaded performance compared to its predecessor, the Cortex-A77, while reducing power consumption by approximately 50% at equivalent speeds on advanced nodes.
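The two Cortex-A78 headline figures quoted above are points on the same efficiency curve: for a fixed workload, "20% more performance at equal power" and "equal performance at roughly half the power" both translate into performance-per-watt gains. A quick check of the implied ratios, using only the numbers in the text:

```python
# Sketch: performance-per-watt ratios implied by the Cortex-A78 vs. A77
# figures quoted above (same workload, everything normalized to A77 = 1.0).

def perf_per_watt(perf, power):
    return perf / power

a77 = perf_per_watt(1.0, 1.0)          # baseline
a78_fast = perf_per_watt(1.2, 1.0)     # +20% performance at equal power
a78_frugal = perf_per_watt(1.0, 0.5)   # equal performance at ~50% power

print(a78_fast / a77)    # efficiency gain when the core is run for speed
print(a78_frugal / a77)  # efficiency gain when the core is run for battery life
```

The asymmetry (1.2x versus 2.0x) is typical of CPU announcements: efficiency claims look largest at the iso-performance point, because power grows faster than linearly with clock speed.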
More recently, the Cortex-A320, introduced in February 2025 as the first ultra-efficient ARMv9 core, focuses on AI-optimized edge computing for IoT devices, offering up to 50% better energy efficiency than the Cortex-A520 through a smaller footprint, enhanced AI acceleration via Scalable Matrix Extension (SME), and support for on-device machine learning models without compromising security features like Arm TrustZone.[58][59][60] Key architectural features in Cortex-A cores enhance their suitability for demanding workloads. High-end variants, such as the Cortex-A78 and later models like the Cortex-A720, incorporate out-of-order execution pipelines with dynamic scheduling, allowing up to triple-issue throughput and speculative execution to minimize stalls, which contributes to sustained performance in multi-threaded environments. The big.LITTLE heterogeneous architecture, widely adopted in Cortex-A implementations, pairs power-hungry "big" cores (e.g., Cortex-A78) with efficient "LITTLE" cores (e.g., Cortex-A55) to dynamically allocate tasks based on workload intensity, achieving up to 75% better energy efficiency in mixed-use scenarios like mobile browsing and gaming by idling high-performance cores during light loads. For instance, the Cortex-A720, part of the ARMv9.2 lineup, delivers approximately 20% better power efficiency compared to the Cortex-A715, enabling premium efficiency in sustained workloads. Cortex-A cores power a diverse ecosystem of applications, from consumer devices to enterprise infrastructure. In smartphones, they underpin flagship platforms like Qualcomm's Snapdragon 8 Gen series, where configurations such as the Snapdragon 8 Gen 3 integrate Cortex-X4 prime cores with A720 and A520 clusters for AI-enhanced photography and 5G processing. 
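The big.LITTLE allocation policy described above can be sketched as a toy scheduler that routes each task to a cluster by estimated load. The cluster names follow the text, but the fixed threshold is purely illustrative; real systems (e.g. Linux's energy-aware scheduling) use per-core energy models rather than a simple cutoff:

```python
# Sketch: toy big.LITTLE-style task placement. Heavy tasks go to the "big"
# high-performance cluster, light ones to the efficient "LITTLE" cluster.
# The threshold is illustrative, not Arm's actual scheduling algorithm.

BIG_THRESHOLD = 0.6  # fraction of peak demand above which a task is "heavy"

def place_tasks(tasks):
    """tasks: dict of name -> estimated load in [0, 1]. Returns a placement."""
    placement = {"big": [], "LITTLE": []}
    for name, load in tasks.items():
        cluster = "big" if load >= BIG_THRESHOLD else "LITTLE"
        placement[cluster].append(name)
    return placement

demo = {"game_render": 0.9, "background_sync": 0.1,
        "ui_scroll": 0.3, "video_encode": 0.7}
print(place_tasks(demo))
# → {'big': ['game_render', 'video_encode'], 'LITTLE': ['background_sync', 'ui_scroll']}
```

The energy win comes from the branch taken for light tasks: while they run on the LITTLE cluster, the power-hungry big cores can be idled or power-gated entirely.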
For personal computers, custom Arm-compatible implementations such as Apple's M4 chip in MacBooks leverage the Arm ISA and its extensions for desktop-class productivity and creative workflows, delivering over 50% faster CPU performance than prior Intel-based equivalents in battery-constrained scenarios. In servers, AWS Graviton4 processors, built on Neoverse V2 cores evolved from A-profile principles, utilize Cortex-A-derived scalability to handle cloud workloads, offering up to 30% better price-performance for web services and data analytics compared to previous generations. In 2025, ARM rebranded its mobile-oriented Cortex-A derivatives as the Lumex platform for smartphones and tablets, emphasizing AI-specific enhancements like SME2 for matrix computations, while PC-focused variants adopted the Niva branding to target laptop and desktop markets with improved thermal efficiency and vector processing. Under these platforms, Arm introduced the C1 series of CPU cores in September 2025, including the flagship C1-Ultra, which supports Armv9.3-A and delivers up to 25% higher performance than prior high-end designs, with advanced on-device AI capabilities.[61][62]
Cortex-R and Cortex-M Profile Cores
The Cortex-R profile of ARM cores is tailored for real-time systems that demand predictable, deterministic performance and minimal interrupt latency to ensure reliable operation in safety-critical environments.[63] These cores implement the Armv7-R and Armv8-R instruction set architectures, providing features such as tightly coupled memory for low-latency access and advanced branch prediction to maintain consistent timing in hard real-time applications.[64] Unlike application-oriented profiles, Cortex-R emphasizes fault tolerance and functional safety, often certified to standards like ISO 26262 for automotive use.[65] A representative example is the Cortex-R52, introduced in 2016 as the first Armv8-R implementation in AArch32 mode, which delivers high-performance 32-bit processing with efficient code density and integrated safety mechanisms, including dual-core lockstep operation for fault detection in redundant configurations.[66][67] The Cortex-R82, announced in 2020, advances this further as the highest-performance Cortex-R core, supporting 64-bit Armv8-R in AArch64 mode with up to 1TB addressable DRAM and enhanced safety features for real-time embedded systems.[68][69] Cortex-R cores are commonly deployed in automotive electronic control units (ECUs), where their deterministic execution handles time-sensitive tasks like engine management and braking systems.[70] The Cortex-M profile complements the R series by focusing on ultra-low-power microcontrollers for cost-sensitive, deeply embedded applications, spanning Armv6-M to Armv8-M architectures with scalable performance levels from basic control to signal processing.[71] These cores prioritize energy efficiency and simplicity, featuring a Harvard architecture with separate instruction and data buses to optimize power in battery-operated devices.[72] Key to their design is support for event-driven execution through the Nested Vectored Interrupt Controller (NVIC), which enables low-latency response to external 
events with deterministic interrupt handling.[73] The Cortex-M0, launched in 2009, exemplifies the profile's origins in ultra-low-power computing, offering a compact 32-bit core with minimal gate count for simple sensor interfaces and control loops.[74] More recent advancements include the Cortex-M85, introduced in 2022, which provides the highest performance in the series via Arm Helium vector processing and integrates TrustZone-M for hardware-enforced security isolation.[75][76] Cortex-M cores power IoT sensors and wearables, leveraging their event-driven capabilities for responsive, power-efficient operation in connected ecosystems.[77] Cortex-M processors have contributed significantly to the over 250 billion total Arm-based chips shipped as of 2025, dominating the microcontroller market.[78]
Legacy and Custom Cores
The ARM7 family of processor cores, introduced in 1993, became a cornerstone of early mobile computing due to its low power consumption and efficient 32-bit RISC design, making it ubiquitous in feature phones and embedded devices during the late 1990s and early 2000s.[2] A notable implementation, the ARM7TDMI, powered the Nokia 6110, the first GSM phone to incorporate an ARM core, which achieved massive commercial success and established ARM as the flagship architecture for mobile designs.[79] The core's compact three-stage pipeline, combined with its Thumb, debug, enhanced-multiplier, and EmbeddedICE extensions (the "TDMI" suffix), enabled widespread adoption in battery-constrained applications like early cellular handsets.[2] Succeeding the ARM7, the ARM9 cores, released in the late 1990s, enhanced performance through a five-stage pipeline, a Harvard arrangement with separate instruction and data caches, and support for the ARMv4T and later architectures, targeting more demanding embedded systems such as digital multimedia devices. The ARM11 family, introduced around 2002 and prevalent until the late 2000s, further advanced efficiency with an eight-stage pipeline and the introduction of Thumb-2 technology in ARMv6 implementations like the ARM1156T2F-S, which expanded the Thumb instruction set to include 32-bit instructions for improved code density and performance in resource-limited environments.[80][81] These pre-2009 designs emphasized scalar in-order execution, prioritizing power efficiency over aggressive parallelism, and were licensed for use in millions of devices before the shift to more scalable profiles.
In 2005, ARM transitioned from these classical cores to the Cortex family, starting with the Cortex-A8 as the first implementation of the ARMv7-A architecture, marking a move toward standardized, configurable designs for broader application scalability.[82] Despite this evolution, custom core development persisted through ARM's architectural licenses, which grant licensees the freedom to create proprietary implementations compliant with the ARM instruction set architecture (ISA) while optimizing for specific workloads.[83] Prominent examples of such custom cores include Apple's A-series and M-series processors, which build on the Armv8 ISA with tailored microarchitectures featuring wider execution units, advanced branch prediction, and integrated high-performance cores to deliver superior single-threaded performance in mobile and desktop systems.[84] Qualcomm's Kryo series represents semi-custom designs, such as the Kryo 280 in the Snapdragon 835, which modifies ARM Cortex cores like the A53 and A73 with custom tweaks to cache hierarchies and pipeline depths for balanced power and throughput in smartphones.[85] Similarly, Samsung's Mongoose cores, which debuted in the Exynos 8890 with an ARMv8 base, incorporated wider decode stages and custom floating-point units to enhance multimedia processing in mobile SoCs, though production of these fully custom variants ceased around 2019 in favor of hybrid approaches.[86][87] These custom implementations often achieve performance gains through targeted enhancements, such as increased instruction issue widths or specialized accelerators, without altering the core ISA compatibility.[83]
Instruction Set Architectures
Early Architectures (Armv1 to Armv3)
The ARMv1 architecture, introduced in 1985, marked the debut of the ARM reduced instruction set computing (RISC) design as a 32-bit load/store architecture implemented in the ARM1 core. This initial version featured a compact set of 25 instructions focused on essential operations, including data processing (such as ADD, SUB, and MOV), load/store memory access, branches, and software interrupts, without support for multiplication or coprocessor interfaces. The design emphasized simplicity and efficiency, with a 3-stage pipeline consisting of fetch, decode, and execute stages to enable single-cycle instruction execution in most cases. It utilized 16 general-purpose 32-bit registers labeled R0 through R15, where R15 packed the program counter together with the status flags negative (N), zero (Z), carry (C), and overflow (V); dedicated Current and Saved Program Status Registers (CPSR and SPSR) did not appear until ARMv3, and exception handling remained rudimentary. The architecture supported a 26-bit address space (64 MB) and operated in four processor modes: User, FIQ (Fast Interrupt), IRQ (Interrupt), and Supervisor, prioritizing low power and high performance per watt for embedded applications.[7][88][89] Building on ARMv1, the ARMv2 architecture emerged in 1986 (with refinements continuing into 1987) and introduced key enhancements to expand functionality while maintaining backward compatibility, primarily implemented in the ARM2 and later ARM3 cores. Notable additions included multiply instructions (MUL for single-word multiplication and MLA for multiply-accumulate) and the swap instruction (SWP/SWPB, arriving with the ARM3's ARMv2a revision) for atomic memory operations, increasing the instruction count to approximately 30-40 and enabling more efficient handling of arithmetic-intensive tasks. Coprocessor support was also integrated, allowing external units for tasks like floating-point operations via instructions such as MCR and MRC for data transfer.
The 3-stage pipeline remained central, now with improved interrupt handling through banked registers in FIQ mode (adding two extra registers for faster context switching), while processor status remained packed into R15 rather than held in a separate status register. The address space stayed at 26 bits, and the architecture continued to support the same four modes, but with better optimization for real-time systems, as seen in its use in the Acorn Archimedes computer released in 1987. These changes solidified ARMv2 as a more versatile foundation for commercial processors, balancing simplicity with expanded capabilities.[7][88][90] The ARMv3 architecture, released around 1990 and reaching notable implementations by 1993, further refined the series with a shift to a full 32-bit address space (4 GB) and enhanced support for protected memory, implemented in cores like the ARM6 and early ARM7 family. It built on prior versions by improving the multiplier with long multiply instructions (such as UMULL for unsigned long multiply and UMLAL for unsigned long multiply-accumulate), alongside signed variants, which proved crucial for signal processing and cryptography applications. Coprocessor support was deepened with better integration for memory management units (MMUs), and new instructions like MRS (Move to Register from Status) and MSR (Move to Status from Register) allowed direct access to the newly separated CPSR and SPSR for mode switching and flag manipulation. The instruction set grew to about 40-50 entries, incorporating enhanced load/store operations and six processor modes—User, FIQ, IRQ, Supervisor, Undefined, and Abort (for data and prefetch aborts)—for robust exception handling. Retaining the 3-stage pipeline, ARMv3 optimized it for higher clock speeds and added features like a 4 KB cache in some implementations, as exemplified by the ARM610.
This version gained prominence in desktop systems, notably powering the Acorn RISC PC released in 1994, which demonstrated its viability for multitasking environments with MMU-enabled operating systems like RISC OS.[7][88][89] Across ARMv1 to ARMv3, core concepts emphasized a uniform 3-stage pipeline for streamlined execution, a bank of 16 visible 32-bit registers (R0-R15) with mode-specific banking for efficiency, and a load/store model that separated data processing from memory access to reduce complexity and power consumption. These early architectures laid the groundwork for ARM's dominance in low-power computing by prioritizing orthogonal instructions and conditional execution on nearly all operations, enabling compact code without branches.[7][88]
32-Bit Architectures (Armv4 to Armv7)
The 32-bit ARM architectures from Armv4 to Armv7 represent a period of significant evolution in the instruction set architecture (ISA), focusing on code density, performance enhancements for embedded and multimedia applications, and support for diverse processor profiles. These versions built upon the foundational load/store RISC design of earlier architectures, emphasizing low power consumption and scalability for mobile and embedded systems. Key shared features include a set of 16 general-purpose 32-bit registers (R0–R15, where R13 serves as the stack pointer, R14 as the link register, and R15 as the program counter) and extensive conditional execution capabilities, allowing nearly all instructions to be predicated on the application program status register (APSR) flags without branching, which reduces code size and improves branch prediction efficiency in pipelines.[91] Pipeline implementations varied by core, ranging from simple 3-stage designs in early Armv4 processors to deeper 8–13 stage superscalar pipelines in Armv7 for higher performance, enabling out-of-order execution and better instruction throughput while maintaining compatibility.[92] Armv4, released in 1996, marked the introduction of the Thumb instruction set in its Armv4T variant, providing 16-bit compressed instructions that offered up to 30–40% better code density compared to the standard 32-bit ARM instructions, ideal for memory-constrained embedded devices. This version was prominently implemented in the ARM7TDMI core, a 3-stage pipelined processor widely used in early mobile phones and PDAs due to its balance of performance and low power. Thumb mode allowed seamless interworking with the full ARM set via branch-and-exchange instructions like BX, while retaining the core's load/store model and conditional execution for efficient control flow. 
Alignment requirements were strict, mandating natural boundaries for word and halfword accesses to avoid faults.[93][94] Released in 2001, Armv5 enhanced multimedia and signal processing capabilities through its Armv5TE extension, adding DSP-oriented instructions such as enhanced multiply-accumulate operations (e.g., SMULxy for 16-bit signed multiplies) and saturated arithmetic to support fixed-point algorithms with up to 2x performance gains in audio and video processing. The Armv5TEJ variant introduced Jazelle, a hardware acceleration for Java bytecode execution that directly interpreted common bytecodes, reducing software overhead for Java-enabled devices like early smartphones and set-top boxes by interpreting up to 80% of bytecodes natively. Additional features included dual-load/store instructions (LDRD/STRD) for 64-bit transfers and improved Thumb-ARM interworking with BLX, all while preserving the 16-register model and conditional predicates for backward compatibility.[95] Armv6, introduced in 2004, further optimized for media-rich applications with SIMD extensions for parallel 8/16-bit operations on multimedia data, enabling efficient video decoding and image processing in cores like the ARM11 family. It added support for unaligned memory accesses in load/store instructions (LDR/STR), configurable via system control registers, which eliminated penalties for non-aligned data common in packed structures and improved performance by up to 20% in data-intensive tasks without requiring software alignment fixes. The architecture also integrated the Vector Floating Point (VFP) unit as an optional coprocessor for single- and double-precision floating-point operations with SIMD capabilities, supporting media workloads in devices like digital cameras and portable media players. 
Multi-processor synchronization primitives, such as exclusive load/store pairs (LDREX/STREX), were introduced to facilitate scalable shared-memory systems.[96][97] The Armv7 architecture, announced in 2005 with the Cortex-A8, consolidated advancements into three profiles—A for applications (e.g., smartphones with MMU support), R for real-time (e.g., automotive with tightly coupled memory), and M for microcontrollers (e.g., low-power IoT)—each tailored to market needs while sharing the core ISA. Thumb-2 emerged as a major enhancement, mixing 16- and 32-bit instructions for near-ARM performance with Thumb density, including conditional branches and table branches for better loop handling and up to 30% code size reduction. Advanced SIMD was boosted via the NEON extension, a 128-bit vector unit supporting integer and floating-point operations for multimedia acceleration, delivering 4x–8x speedup in tasks like video encoding on Cortex-A8 cores. Virtualization support via the Virtualization Extensions (VE) enabled secure hypervisor modes with stage-2 address translation, facilitating isolated execution environments in Armv7-A profiles. These features, combined with Jazelle RCT for dynamic binary translation and enhanced pipelines (e.g., 8-stage in Cortex-A8), positioned Armv7 as the foundation for modern mobile computing.[98]
64-Bit Architectures (Armv8 and Armv9)
The Armv8 architecture, introduced in 2011, marked the transition to 64-bit computing within the Arm family by introducing the AArch64 execution state alongside the legacy AArch32 state for backward compatibility.[99][100] AArch64 features 31 general-purpose 64-bit registers named X0 through X30, enabling larger address spaces and enhanced integer arithmetic compared to the 32-bit registers of prior architectures.[99] This architecture supports multiple profiles: the A-profile for high-performance applications, the R-profile for real-time systems, and the M-profile for microcontrollers, each tailored to specific use cases; 64-bit execution is a feature of the A-profile (and, later, the R-profile), while the M-profile remains 32-bit.[101] For memory addressing in AArch32 mode, Armv8 carries forward the Large Physical Address Extension (LPAE) introduced with Armv7-A, which expands physical addressing to 40 bits, allowing up to 1 terabyte of addressable memory beyond the traditional 32-bit limit.[102] Backward compatibility with AArch32 ensures that existing 32-bit Arm software can run without modification by switching execution states, facilitating a gradual migration to 64-bit operations.[103] Subsequent refinements to Armv8, starting with Armv8.1 and continuing through later versions, introduced specialized extensions to enhance reliability and computational efficiency.
The Reliability, Availability, and Serviceability (RAS) extensions, mandatory from Armv8.2, provide mechanisms for error detection, reporting, and recovery, such as error record registers and fault injection support, improving system robustness in server and embedded environments.[104][105] Additionally, the Armv8.4 dot-product instructions enable efficient vectorized accumulation of 8-bit integer multiplications into 32-bit results, accelerating machine learning workloads like neural network inference by optimizing matrix operations.[106][107] The Armv9 architecture, announced in 2021, builds on Armv8 by integrating advanced vector processing and security features to address emerging demands in AI and data protection. Central to Armv9 is the Scalable Vector Extension version 2 (SVE2), a superset of the original SVE that supports vector lengths from 128 to 2048 bits in increments of 128 bits, enabling scalable SIMD operations for high-performance computing and machine learning across diverse hardware implementations.[108] SVE2 incorporates functionality from Advanced SIMD (Neon) while adding instructions for digital signal processing and gather-scatter memory access, promoting code portability without vector-length-specific optimizations.[109] For security, Armv9 introduces the Memory Tagging Extension (MTE), which assigns 4-bit tags to memory allocations and pointers, enabling hardware-enforced checks to detect spatial memory errors like buffer overflows at runtime.[110] Complementing MTE is the Confidential Compute Architecture (CCA), a framework for secure enclaves that isolates sensitive workloads from privileged software, including the hypervisor and OS, using realms and attestation for confidential computing scenarios.[111] In 2025, the Armv9.7-A extension further advances A-profile capabilities for AI-driven systems, adding new instructions to SVE and the Scalable Matrix Extension (SME) for handling 6-bit data types in formats like OCP MXFP6, which optimize 
memory usage and bandwidth for efficient AI model execution.[112] These enhancements, released in October 2025, also include scalability improvements such as targeted TLB invalidations for multi-chip configurations and expanded resource partitioning in MPAMv2, supporting larger-scale AI deployments without compromising performance.[112]
Architectural Features and Extensions
Instruction Set Modes and Enhancements
The ARM architecture supports multiple execution modes to manage privilege levels and handle exceptions, evolving from the 32-bit ARMv7 designs to the 64-bit AArch64 in Armv8 and later. In ARMv7-A and ARMv7-R profiles, there are seven processor modes: User (USR), which is unprivileged and used for application execution; Supervisor (SVC), a privileged mode for operating system tasks; Interrupt Request (IRQ) for general interrupts; Fast Interrupt Request (FIQ) for low-latency interrupts with dedicated registers; Abort for memory access errors; Undefined for unimplemented instructions; and System (SYS), a privileged mode for non-exception kernel code.[113] These modes determine access to registers and resources, with privileged modes (all except USR) enabling system control operations. In Armv8-A and Armv9-A, the model shifts to four exception levels (EL0 to EL3) for finer privilege separation: EL0 is unprivileged, akin to User mode for applications; EL1 is privileged for OS kernels, similar to Supervisor; EL2 supports hypervisors; and EL3 handles secure monitoring and TrustZone.[114] Exceptions taken to higher levels increase privilege, with EL3 being the highest for secure state management.[115] A key efficiency feature in the ARM instruction set is conditional execution, allowing most instructions to be predicated on the Application Program Status Register (APSR) flags without branching, thereby reducing pipeline stalls and improving performance in control-flow intensive code. 
There are 16 condition codes, including EQ (equal), NE (not equal), CS/HS (carry set/unsigned higher or same), CC/LO (carry clear/unsigned lower), MI (minus/negative), PL (plus/positive or zero), VS (overflow), VC (no overflow), HI (unsigned higher), LS (unsigned lower or same), GE (signed greater or equal), LT (signed less than), GT (signed greater than), LE (signed less than or equal), AL (always), and NV (never).[91] In the A32 (32-bit ARM) encoding, most instructions carry a four-bit condition field; in Thumb-2, the IT (If-Then) instruction makes up to four following instructions conditional, while AArch64 drops general predication in favor of dedicated conditional instructions such as CSEL (conditional select) and CCMP (conditional compare). This mechanism minimizes branch instructions, which can account for significant overhead in embedded and mobile applications.[116] To enhance code density, ARM introduced the Thumb instruction set in Armv4T, compressing common 32-bit ARM instructions into 16-bit encodings, followed by Thumb-2 in Armv6T2 and Armv7, which mixes 16-bit and 32-bit instructions for broader functionality while maintaining compactness. Thumb-2 achieves up to 40% smaller code size compared to pure ARM instructions, improving cache efficiency and reducing memory footprint in resource-constrained systems like mobiles and embedded devices.[117] ThumbEE, an extension in Armv7-A, modifies Thumb-2 for dynamic code generation, such as just-in-time compilation, by adding automatic null-pointer checks to loads and stores and handler branch instructions (HB, HBL) for compact calls into runtime support routines.[118] The architecture integrates coprocessors (CP0 to CP15) for specialized tasks, with instructions like MCR and MRC facilitating data transfer and control between the ARM core and these units.
CP15 serves as the system control coprocessor, managing cache, MMU, and privilege configurations via registers accessed in privileged modes.[119] Jazelle DBX, introduced in Armv5TEJ, enables direct execution of Java bytecode in a dedicated state (Jazelle mode), bypassing interpretation for faster virtual machine performance, with variable-length instructions aligned to bytes and support for dynamic binary translation.[120]
SIMD, DSP, and Multimedia Extensions
The ARM architecture incorporates several extensions to enhance single instruction, multiple data (SIMD) processing, digital signal processing (DSP), and multimedia workloads, enabling efficient parallel operations on vectors of data elements. These extensions build upon the base instruction set to accelerate tasks such as audio/video encoding, image processing, and machine learning inference, particularly in resource-constrained environments like mobile and embedded systems.[121][122] The Vector Floating Point (VFP) extension, introduced in Armv5 and further developed in subsequent versions including Armv7, provides dedicated hardware for single-precision and double-precision floating-point operations, supporting up to 32 64-bit registers for scalar and vector computations. It enables fused multiply-add operations and conversions between integer and floating-point formats, which are essential for multimedia algorithms requiring precise numerical handling. VFP is integrated with the Advanced SIMD unit in later implementations, allowing seamless switching between integer and floating-point modes without pipeline stalls.[123] Advanced SIMD, known as NEON and available from ARMv7 onward, introduces 128-bit vector registers that support operations on 8-bit, 16-bit, 32-bit, and 64-bit integer elements, including arithmetic, logical, and permutation instructions. NEON includes fused multiply-accumulate (MAC) instructions tailored for DSP tasks, such as filtering in audio processing, and is widely used for multimedia acceleration, including video decoding where it can process multiple pixels or coefficients in parallel to achieve up to several times the performance of scalar code. 
For instance, NEON's load/store instructions with structure handling optimize data movement for codecs like H.264, reducing memory bandwidth demands in real-time applications.[121][124][121] For the M-profile cores targeting embedded and microcontroller applications, the Helium technology—formally the M-Profile Vector Extension (MVE) in ARMv8.1-M—delivers SIMD and DSP capabilities with up to 128-bit vectors, supporting integer, fixed-point, and single-precision floating-point operations on 8- to 32-bit elements. Helium includes tail-predication and fault-handling mechanisms to manage variable-length vectors efficiently, making it suitable for machine learning workloads like neural network inference on low-power devices, where it can provide up to 15 times the performance uplift over scalar implementations for certain DSP functions. Its compact instruction encoding ensures minimal code size increase, ideal for resource-limited IoT systems.[122][125][126] The Scalable Vector Extension (SVE) in ARMv8-A and its enhancement SVE2 in ARMv9-A introduce vector lengths ranging from 128 to 2048 bits, allowing hardware-agnostic code that scales across implementations without recompilation. SVE supports gather-scatter memory accesses for non-contiguous data patterns common in sparse computations, along with first-faulting predication to handle irregular loops efficiently, which is crucial for high-performance computing and AI training. SVE2 expands this with additional integer and fixed-point instructions, bridging gaps for broader DSP and multimedia use cases beyond floating-point dominance in SVE. 
In 2025, optimizations in frameworks like PyTorch leverage SVE2 for enhanced AI performance on ARMv9 cores, including kernel fusions that exploit scalable vectors for up to 2.5 times faster inference on transformer-based models (e.g., BERT, Llama) compared to fixed-width SIMD.[108][127] The Scalable Matrix Extension (SME), introduced in Armv9.2-A, enhances matrix multiplication capabilities with scalable tiles up to 256x256 elements, accelerating AI training and inference workloads by providing dedicated hardware for outer-product operations on integers and floating-point data. SME, along with its enhancement SME2, supports a wide range of data types including bfloat16 and int8, enabling efficient deep learning computations in high-performance servers and AI accelerators.[128]
Security and Virtualization Features
The ARM architecture family incorporates hardware-based security and virtualization features to enable secure execution environments, isolation of sensitive operations, and protection against common software vulnerabilities. These mechanisms are integral to supporting trusted execution in diverse applications, from embedded devices to servers, by partitioning system resources and enforcing access controls at the hardware level. Key features include TrustZone for runtime isolation and extensions like Pointer Authentication and Memory Tagging for mitigating exploits.[129] TrustZone, introduced in Armv6 and available in subsequent architectures, partitions the system into Secure and Normal worlds, allowing secure software to access both while restricting normal world access to secure resources. This enables dual-OS support, where a rich OS runs in the normal world and a trusted OS or secure applications operate in the secure world, often augmented by dedicated crypto accelerators for operations like encryption and key management. The hardware enforces isolation through a non-secure (NS) bit in memory addresses and peripherals, preventing unauthorized access and protecting against software attacks.[130][129] For microcontroller units (MCUs), Armv8-M introduces a lightweight variant of TrustZone tailored for resource-constrained embedded systems. This extension provides secure and non-secure memory partitioning without the overhead of a full monitor mode, using signal-based transitions between security states and separate interrupt handling for each world. It supports multiple secure function entry points, enabling fine-grained protection for IoT devices while maintaining low power consumption.[131][132][133] Virtualization support begins with the Virtualization Extensions (VE) in Armv7, which introduce a hypervisor mode (Hyp mode in AArch32) for managing guest operating systems. 
In Armv8 and later, this evolves into Exception Level 2 (EL2) in AArch64, allowing hypervisors to oversee multiple virtual machines through stage-2 address translation, which applies additional memory mappings on top of guest-level stage-1 translations. This enables efficient isolation of virtualized workloads, with EL2 handling traps and context switches to prevent guest interference. Secure virtualization in Armv8.4 further extends EL2 to the secure world, supporting nested isolation for trusted payloads.[134][57] The Armv8.3 extension adds Pointer Authentication Codes (PAC), which embed cryptographic signatures into pointer values to detect and prevent manipulation in return-oriented programming (ROP) and jump-oriented programming (JOP) attacks. PAC uses dedicated keys stored in system registers, with instructions such as PACIA signing an instruction address using key A and AUTIA authenticating it before use, providing low-overhead protection without altering the ABI. This feature is mandatory in Armv8.3-A and extends to Armv9.[135][136] The Armv8.5-A extension introduces the Memory Tagging Extension (MTE), which is included in Armv9, to address memory safety issues like buffer overflows and use-after-free errors, which by widely cited industry estimates account for around 70% of serious security vulnerabilities. MTE assigns 4-bit tags to 16-byte memory granules, checked on every load/store against a pointer's allocation tag; mismatches trigger faults, enabling proactive detection with minimal performance impact through hardware acceleration.[110][137][5] The Realm Management Extension (RME) in Armv9-A enhances confidential computing by introducing Realms as isolated execution environments beyond Secure and Normal worlds, managed by a hardware root of trust and verified through attestation tokens.
RME adds two new security states (Realm and Root) alongside the existing Secure and Non-secure states, with granule protection checks layered on top of the existing two-stage translation so that a hypervisor can manage Realms without being able to read their data, thus enabling secure multi-tenant cloud workloads.[138][139][140]
Applications and Ecosystems
Embedded and Real-Time Systems
The ARM architecture family has established a dominant position in embedded and real-time systems, particularly through its Cortex-M and Cortex-R processor profiles, which prioritize low power consumption, deterministic performance, and reliability in resource-constrained environments. Cortex-M cores, optimized for microcontrollers (MCUs), power a wide array of devices from simple sensors to complex control units, enabling efficient operation in battery-powered or energy-limited scenarios. Meanwhile, Cortex-R cores target applications requiring predictable real-time responses, such as those in industrial automation and data management systems. Cortex-M processors hold a leading market share in the embedded MCU sector, capturing approximately 69% in 2024 and projected to maintain around 70% through 2025, driven by their balance of performance, power efficiency, and ecosystem support.[141] Prominent examples include STMicroelectronics' STM32 series, which leverages Cortex-M cores for versatile embedded applications like consumer electronics and industrial controls, and NXP's i.MX RT crossover MCUs, featuring Cortex-M7 and Cortex-M4 cores for high-performance real-time processing in motor control and human-machine interfaces.[142][143] These implementations highlight the M-profile's scalability, supporting everything from basic 8-bit replacements to advanced 32-bit tasks without compromising on low-power attributes. 
In real-time systems, Cortex-R processors excel in environments demanding low-latency and fault-tolerant operation, commonly deployed in storage controllers for data integrity and printers for precise timing in print mechanisms.[144][145] Safety-critical certifications further bolster their adoption; for instance, cores like Cortex-R52 and Cortex-R5 have achieved ISO 26262 compliance up to ASIL D, facilitating use in automotive and industrial systems where functional safety is paramount.[146][147] The proliferation of Internet of Things (IoT) devices underscores ARM's impact, with over 21 billion connected endpoints globally as of 2025, many powered by Cortex-M for their energy-efficient design.[148] These cores incorporate low-power modes, such as sleep and deep sleep states triggered by wait-for-interrupt (WFI) instructions, allowing devices to enter ultra-low consumption phases while maintaining rapid wake-up for event-driven tasks.[149] Armv8-M architecture enhances security in IoT deployments through TrustZone technology, partitioning resources into secure and non-secure worlds to protect sensitive data and firmware from unauthorized access, thereby addressing vulnerabilities in connected ecosystems.[150] Complementing this, ARM supports energy harvesting integrations, where Cortex-M-based systems draw power from ambient sources like vibrations or light, extending operational life in remote or battery-free applications through efficient power management circuits.[151]
Mobile, Desktop, and Server Deployments
The ARM architecture dominates the mobile computing landscape, powering over 99% of smartphones worldwide as of 2024, a position it has maintained through custom implementations by major vendors.[152] Apple's A-series and M-series processors, based on ARM's A-profile, drive iOS devices with integrated neural processing units for AI tasks, while Qualcomm's Snapdragon series, licensed from ARM, supports the majority of Android smartphones, emphasizing high-performance cores for gaming and multimedia.[153] This near-universal adoption stems from ARM's energy-efficient design, which balances battery life and performance in power-constrained environments.[154] A key innovation in mobile deployments is ARM's big.LITTLE technology, which integrates high-performance "big" cores for demanding tasks like video rendering with energy-efficient "LITTLE" cores for background operations, enabling dynamic workload allocation to optimize power consumption without sacrificing responsiveness.[155] Widely implemented in Snapdragon and other SoCs, big.LITTLE has become foundational for heterogeneous computing in smartphones, allowing devices to handle AI inference and 5G processing efficiently.[156] In desktop and PC markets, ARM-based systems are experiencing growth, particularly through Windows on ARM initiatives, reaching approximately 14% market share in early 2025, with ongoing growth driven by AI-capable hardware.[157] Microsoft's Copilot+ PCs, launched in 2024 and expanded in 2025, leverage Qualcomm's Snapdragon X Elite processors—featuring custom Oryon cores derived from Nuvia designs—to deliver native ARM performance for productivity and AI workloads, marking a shift from traditional x86 dominance in Windows ecosystems. 
Recent Armv9 adoption has accelerated in these AI PCs.[158] These deployments highlight ARM's scalability to higher-power scenarios, offering improved battery life in laptops compared to Intel counterparts.[159] ARM's expansion into servers focuses on cloud and data center applications, where processors like AWS Graviton and Ampere Altra provide alternatives to x86 for cost-sensitive, high-density computing.[160] As of mid-2025, ARM-based servers have captured approximately 25% of the server market, fueled by adoption in hyperscale environments for web services and machine learning inference.[161] Leading providers such as AWS utilize Graviton instances for their energy efficiency, achieving up to 60% better power utilization than comparable x86 systems, which translates to substantial cost savings in large-scale operations; for instance, a 10% efficiency gain can save millions annually for providers like AWS.[162][163] Ampere Altra complements this by targeting edge and cloud workloads with multi-threaded scalability, further emphasizing ARM's role in sustainable data center growth, supported by strong Q3 2025 revenue momentum.[164][165]
Automotive and Industrial Uses
The ARM architecture plays a pivotal role in automotive applications, particularly in safety-critical systems such as advanced driver-assistance systems (ADAS) and electronic control units (ECUs) for engine management and braking.[166] Cortex-R and Cortex-A processors, part of the R-profile and A-profile respectively, are widely deployed in these ECUs to handle real-time processing and complex computations, supporting functional safety up to Automotive Safety Integrity Level D (ASIL D) as defined by ISO 26262.[167] For redundancy, lockstep core configurations in processors such as the Cortex-R52 enable fault detection by running identical instructions in parallel on duplicated cores and comparing their outputs, enhancing reliability in harsh operating conditions.[168] These systems often operate across extended temperature ranges, typically from -40°C to 125°C, to withstand automotive environments.[169]

In-vehicle infotainment (IVI) systems also leverage ARM-based solutions for multimedia processing and connectivity, with scalable Cortex-A cores providing efficient performance for user interfaces and entertainment features.[166] Notable examples include NVIDIA's DRIVE Orin platform, which integrates Armv8-based Hercules CPU cores for ADAS and autonomous driving compute, delivering up to 254 TOPS of AI performance in a safety-certified design.[170] Similarly, Renesas' R-Car series, such as the R-Car V4H, employs multiple ARM Cortex-A cores for ADAS and IVI applications, achieving ASIL D systematic capability through integrated safety mechanisms.[171] ARM technology is used by 94% of global automakers, underscoring its dominance in automotive systems-on-chip (SoCs).[166]

In industrial applications, ARM architectures support rugged, safety-critical environments such as robotics and programmable logic controllers (PLCs), where real-time control and fault tolerance are essential.[172] The Armv8-R architecture, designed for deterministic performance, enables functional safety in these systems by providing features for error detection and recovery, suitable for applications requiring compliance with standards like IEC 61508.[173] For instance, Schneider Electric utilizes ARM-based platforms with SystemReady certification for software-defined PLCs, facilitating low-latency automation and secure operations in manufacturing.[174] In robotics, Cortex-A and Cortex-R processors manage motion control and sensor fusion, often incorporating lockstep redundancy to mitigate single-point failures in dynamic industrial settings.[168] Industrial ARM implementations commonly feature extended temperature ratings up to 125°C to endure factory floor conditions.[175]

Standards and Certifications
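The dual-core lockstep mechanism used in Cortex-R-class automotive and industrial parts can be modeled loosely in software: execute the same operation on two redundant copies of the state and flag any divergence. This is only a conceptual sketch; real lockstep duplicates cores in hardware and compares their outputs cycle by cycle.

```python
# Conceptual software model of lockstep execution (dual modular
# redundancy). Hardware lockstep compares duplicated cores every cycle;
# here we simply run one step twice and compare the results.

def lockstep_step(step, state_a, state_b):
    """Run `step` on two redundant state copies and compare outputs."""
    result_a = step(state_a)
    result_b = step(state_b)
    if result_a != result_b:
        # In a real ECU a mismatch would latch a fault and force the
        # system into a defined safe state.
        raise RuntimeError("lockstep mismatch: possible transient fault")
    return result_a
```

A single-bit upset in either redundant copy shows up as a mismatch at the comparison point, which is what makes the scheme effective against transient faults.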
Operating System Support
The Linux kernel has provided mainline support for ARM architectures since 1994, with kernel version 2.6 (released in 2003) introducing significant multi-platform enhancements that improved broad compatibility. Subsequent versions added support for 32-bit Armv7 (starting around 2007) and 64-bit Armv8/Armv9 (from 2012 onward) implementations across embedded, server, and desktop environments.[176][177] Major Linux distributions build on this support extensively; for instance, Ubuntu offers official 64-bit ARM server and desktop images optimized for processors such as those in Raspberry Pi boards and cloud instances, while Fedora provides comprehensive ARM editions for aarch64 hardware ranging from single-board computers to enterprise servers.[178][179]

Android, built on the Android Open Source Project (AOSP), has been designed predominantly for ARM architectures since its inception, with Armv7 and Armv8 dominating the ecosystem due to their efficiency in mobile devices; the platform includes specific optimizations for ARM's NEON SIMD extensions in the Native Development Kit (NDK) to accelerate multimedia and AI workloads.[180] Google's Chrome OS has supported ARM architectures since version 5 in 2010, with native Armv7 and later Armv8/Armv9 compatibility for Chromebooks, enabling efficient deployment in education and lightweight computing.[181]

Microsoft's Windows on ARM64, introduced in 2017 with Windows 10, supports native 64-bit applications on Armv8 processors, and by 2025 it incorporates the Prism emulation layer in Windows 11 24H2 and later to run x86/x64 software more efficiently, including advanced vector instructions such as AVX/AVX2 for broader app compatibility.[182][183] For embedded systems, FreeRTOS offers official ports for ARM Cortex-M and Cortex-A cores, providing a lightweight real-time OS kernel with a low memory footprint suitable for microcontrollers and IoT devices.[184] Apple's macOS, starting with Big Sur in 2020, runs natively only on Apple's custom ARM-based Apple Silicon processors (Armv8-A derivatives), leveraging the architecture's power efficiency for laptops and desktops; non-Apple ARM hardware is not supported.[185]

Porting operating systems to ARM involves challenges such as adapting to the ARM Application Binary Interface (ABI), which differs from x86 in areas like procedure call standards and data types (e.g., AAPCS64 for 64-bit), requiring recompilation or rewriting of binaries and libraries.[186] Additionally, driver support often necessitates custom development or upstreaming to the mainline kernel, as ARM's diverse SoC ecosystem demands platform-specific integrations for peripherals such as GPUs and interrupt controllers, potentially increasing porting time and testing effort.[187]

Arm SystemReady and PSA Certified
Arm SystemReady is a compliance program developed by Arm to promote interoperability across Arm-based hardware platforms by standardizing firmware and boot processes, enabling off-the-shelf operating systems such as Linux and Android to boot and operate without hardware-specific modifications.[188] The program is divided into bands tailored to different use cases: SystemReady SR targets desktop and server environments, ensuring compatibility with standard server OS distributions through defined hardware and firmware interfaces, while SystemReady ES focuses on embedded systems for IoT and edge applications, supporting lightweight boot flows suitable for resource-constrained devices.[189] This structure reduces ecosystem fragmentation, allowing developers to deploy software across diverse hardware without extensive validation effort.[190] Central to SystemReady compliance are components such as the Firmware Framework for A-profile (FF-A), which specifies secure interfaces through which firmware components manage resource access and isolation between the secure and non-secure worlds, often leveraging hardware features such as TrustZone for protection. For server-oriented SR compliance, baseboard management standards, including the Server Base Manageability Requirements (SBMR), integrate Baseboard Management Controllers (BMCs) to enable remote monitoring, firmware updates, and hardware oversight independent of the host OS. Validated platforms exemplify these standards; for instance, Qualcomm's Snapdragon-based platforms have achieved SystemReady compliance for embedded and IoT use cases, while Ampere's Mt. Jade server platform meets SR requirements, contributing to over 150 compliant systems available as of 2025.[191][192]

The PSA Certified framework, originally launched by Arm and transferred to GlobalPlatform governance in September 2025, provides a standardized IoT security assurance scheme for evaluating and certifying the security posture of chips, firmware, and devices against defined threat models.[193] It encompasses assurance levels from 1 to 4: Level 1 involves vendor self-declaration of security requirements for the Platform Security Architecture (PSA); Level 2 requires independent laboratory testing of the PSA Root of Trust (PSA-RoT) against basic software vulnerabilities; Level 3 extends the evaluation to substantial physical and sophisticated software attacks on the RoT; and Level 4 targets high robustness for integrated Secure Elements (iSE) or Secure Elements (SE), protecting high-value assets such as cryptographic keys.[194] Core elements include the PSA-RoT, a minimal trusted component providing immutable security functions such as secure boot, which verifies firmware integrity and prevents unauthorized code from compromising the system.[195] In 2025, PSA Certified expanded to address emerging needs in AI edge devices, incorporating certifications for processors with integrated AI accelerators that maintain secure isolation for machine learning models and data processing.[196] For example, Renesas' RZ/V2L microprocessor, featuring an Arm Cortex-A55 CPU and a built-in AI accelerator, achieved PSA Certified Level 2, demonstrating resistance to common IoT threats while supporting edge AI workloads.[197] As of late 2025, the program has surpassed 250 certifications across nearly 90 providers, with over 100 certified chips enabling secure deployment in connected ecosystems.[193]

Recent Developments and Innovations
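The secure-boot role of the PSA-RoT described above can be sketched as an integrity gate: measure the firmware image and compare the measurement against a reference provisioned in immutable storage. This sketch checks a bare SHA-256 digest for brevity; real root-of-trust implementations verify a cryptographic signature chain over the image rather than a stored digest.

```python
import hashlib
import hmac

# Reference digest, modeling a value provisioned into immutable storage
# at manufacture. The image contents here are hypothetical.
TRUSTED_DIGEST = hashlib.sha256(b"firmware v1.0").hexdigest()

def verify_and_boot(image: bytes) -> bool:
    """Return True only if the image matches the trusted reference."""
    measured = hashlib.sha256(image).hexdigest()
    # compare_digest avoids leaking match information through timing.
    return hmac.compare_digest(measured, TRUSTED_DIGEST)
```

Because the reference lives in storage the running system cannot rewrite, a tampered image fails the check before any of its code executes, which is the property secure boot provides.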
In 2025, Arm introduced the Armv9 Edge AI platform, optimized for Internet of Things (IoT) devices and featuring the new Cortex-A320 CPU core and Ethos-U85 Neural Processing Unit (NPU). This heterogeneous computing solution enables on-device execution of AI models exceeding 1 billion parameters, delivering up to 10 times the machine learning performance of prior generations while maintaining ultra-low power consumption for edge applications.[56][198][199]

Arm rebranded its processor platforms in June 2025 to better align with market-specific needs and emphasize full-system solutions. The mobile segment now falls under the Lumex branding, targeting smartphones and tablets with AI-optimized cores, while the Niva brand was introduced for personal computers (PCs), focusing on high-performance computing in desktops and laptops. This shift moves beyond the traditional Cortex naming, incorporating Compute Subsystems (CSS) for integrated CPU, GPU, and NPU designs to accelerate development for partners.[200][201][202]

Supporting these advancements, Arm released ExecuTorch 1.0 in October 2025, a lightweight runtime co-developed with Meta for deploying PyTorch models on edge devices. The tool enables efficient on-device AI inference across CPUs, GPUs, and NPUs, supporting large language models (LLMs) and vision tasks with broad hardware compatibility and production-ready stability. Concurrently, the A-profile architecture received updates in Armv9.7-A, including enhancements to power management through Memory Partitioning and Monitoring (MPAMv2) for improved resource partitioning, virtualization, and system profiling with up to 16-bit Partition Monitoring Groups (PMGs). These changes, alongside AI-focused extensions such as Scalable Vector Extension (SVE) and Scalable Matrix Extension (SME) instructions for 6-bit data types, reduce memory bandwidth demands in machine learning workloads.[203][204][112]

Ecosystem expansion gained momentum at Microsoft Build 2025, where Arm showcased deeper integrations for Azure cloud and Windows on Arm PCs, emphasizing AI acceleration and sustainable computing. The collaboration supports Arm's push into the PC market, with forecasts indicating that Arm-based laptops could reach 20% of global shipments by year-end, driven by premium devices from Qualcomm and emerging offerings from MediaTek and Nvidia. Arm's leadership has set a long-term ambition of over 50% Windows PC market share by 2029, building on a projected 13-20% foothold in 2025 amid competition from x86 architectures.[205][206][207]

Financially, Armv9 architectures contributed to robust growth, with quarterly revenue surpassing $1 billion in Q4 FY2025 (ending March 2025) and annual sales exceeding $4 billion, fueled by licensing and royalties from AI, cloud, and data center deployments. Royalty revenue grew 25-30% year over year in early FY2026 quarters, underscoring Armv9's impact on premium silicon shipments.[208][209][210]

References
- https://en.wikichip.org/wiki/acorn/microarchitectures/arm3
- https://en.wikichip.org/wiki/arm/armv2