| General information | |
|---|---|
| Designed by | ARM Holdings |
| Architecture and classification | |
| Microarchitecture | ARMv4T |
| Instruction set | ARM (32-bit), Thumb (16-bit) |
| Performance | |
|---|---|
| Max. CPU clock rate | 100 MHz to 600 MHz |
| Architecture and classification | |
| Microarchitecture | ARMv5TE |
| Instruction set | ARM (32-bit), Thumb (16-bit) |
| Architecture and classification | |
|---|---|
| Microarchitecture | ARMv5TEJ |
| Instruction set | ARM (32-bit), Thumb (16-bit), Jazelle (8-bit) |
ARM9 is a group of 32-bit RISC ARM processor cores licensed by ARM Holdings for microcontroller use.[1] The ARM9 core family consists of ARM9TDMI, ARM940T, ARM9E-S, ARM966E-S, ARM920T, ARM922T, ARM946E-S, ARM9EJ-S, ARM926EJ-S, ARM968E-S, and ARM996HS. ARM9 cores were released from 1998 to 2006 and are no longer recommended for new IC designs; newer alternatives are ARM Cortex-M cores.[2]
Overview
With this design generation, ARM moved from a von Neumann architecture (Princeton architecture) to a modified Harvard architecture (meaning split caches) with separate instruction and data buses and caches, significantly increasing its potential speed.[3] Most silicon chips integrating these cores will package them as modified Harvard architecture chips, combining the two address buses on the other side of separated CPU caches and tightly coupled memories.
There are two subfamilies, implementing different ARM architecture versions.
Differences from ARM7 cores
Key improvements over ARM7 cores, enabled by spending more transistors, include:[4]
- Clock frequency improvements. Shifting from a three-stage instruction pipeline to a five-stage one lets the clock speed be approximately doubled on the same silicon fabrication process.
- Cycle count improvements. Many unmodified ARM7 binaries were measured as taking about 30% fewer cycles to execute on ARM9 cores. Key improvements include:
  - Faster loads and stores; many instructions now cost just one cycle. This is helped by both the modified Harvard architecture (reducing bus and cache contention) and the new pipeline stages.
  - Exposing pipeline interlocks, enabling compiler optimizations to reduce blockage between stages.
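The two headline figures in the list above (roughly double the clock and about 30% fewer cycles) can be combined into a single rough speedup estimate. The sketch below is plain arithmetic rather than measured data; the function name `combined_speedup` is made up for illustration, and it assumes both improvements apply to the same workload at once.

```python
def combined_speedup(clock_ratio: float, cycle_reduction: float) -> float:
    """Return how many times faster a program runs.

    clock_ratio     -- new clock frequency divided by the old clock frequency
    cycle_reduction -- fraction of cycles saved (0.30 means 30% fewer cycles)
    """
    if clock_ratio <= 0:
        raise ValueError("clock_ratio must be positive")
    if not 0 <= cycle_reduction < 1:
        raise ValueError("cycle_reduction must be in [0, 1)")
    # Execution time = cycles / frequency, so the speedup is the ratio of
    # the old execution time to the new one.
    return clock_ratio / (1.0 - cycle_reduction)


# Figures quoted in the list above: about 2x clock, about 30% fewer cycles.
print(f"{combined_speedup(clock_ratio=2.0, cycle_reduction=0.30):.2f}x")
```

Under these assumptions the estimate is a little under three times faster; real gains depend on the memory system and the instruction mix.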
Additionally, some ARM9 cores incorporate "Enhanced DSP" instructions, such as a multiply-accumulate, to support more efficient implementations of digital signal processing algorithms.
Switching from a von Neumann architecture entailed using a non-unified cache, so that instruction fetches do not evict data (and vice versa). ARM9 cores have separate data and address bus signals, which chip designers use in various ways. In most cases they connect at least part of the address space in von Neumann style, used for both instructions and data, usually to an AHB interconnect connecting to a DRAM interface and an External Bus Interface usable with NOR flash memory. Such hybrids are no longer pure Harvard architecture processors.
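The pipeline interlocks mentioned in the list of improvements can be illustrated with a toy model. The sketch below assumes a one-cycle stall whenever an instruction uses a register loaded by the immediately preceding `LDR`; that stall length, the tuple encoding of instructions, and the function name `count_load_use_stalls` are all illustrative assumptions, not a cycle-accurate description of any ARM9 core.

```python
from typing import List, Optional, Tuple

# Each instruction is (opcode, destination register, source registers).
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def count_load_use_stalls(program: List[Instr], stall_cycles: int = 1) -> int:
    """Count stall cycles caused by using a loaded value in the very next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dst, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "LDR" and prev_dst is not None and prev_dst in cur_srcs:
            stalls += stall_cycles
    return stalls


# Straightforward ordering: each load is consumed by the next instruction.
naive: List[Instr] = [
    ("LDR", "r0", ("r4",)),
    ("ADD", "r2", ("r0", "r1")),  # needs r0 right after the load
    ("LDR", "r3", ("r5",)),
    ("SUB", "r6", ("r3", "r1")),  # needs r3 right after the load
]

# Same computation, reordered so an independent load fills each gap.
scheduled: List[Instr] = [
    ("LDR", "r0", ("r4",)),
    ("LDR", "r3", ("r5",)),
    ("ADD", "r2", ("r0", "r1")),
    ("SUB", "r6", ("r3", "r1")),
]

print(count_load_use_stalls(naive), count_load_use_stalls(scheduled))
```

A compiler that knows about the interlock can prefer the second ordering, which computes the same results without the stalls this model counts.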
ARM license
ARM Holdings neither manufactures nor sells CPU devices based on its own designs, but rather licenses the processor architecture to interested parties. ARM offers a variety of licensing terms, varying in cost and deliverables. To all licensees, ARM provides an integratable hardware description of the ARM core, as well as a complete software development toolset and the right to sell manufactured silicon containing the ARM CPU. This model of licensed CPU core design is called an intellectual property (IP) core.
Silicon customization
Integrated device manufacturers (IDM) receive the ARM Processor IP as synthesizable RTL (written in Verilog). In this form, they have the ability to perform architectural level optimizations and extensions. This allows the manufacturer to achieve custom design goals, such as higher clock speed, very low power consumption, instruction set extensions, optimizations for size, debug support, etc. To determine which components have been included in a particular ARM CPU chip, consult the manufacturer datasheet and related documentation.
Cores
| Year | ARM9 Cores |
|---|---|
| 1998 | ARM9TDMI |
| 1998 | ARM940T |
| 1999 | ARM9E-S |
| 1999 | ARM966E-S |
| 2000 | ARM920T |
| 2000 | ARM922T |
| 2000 | ARM946E-S |
| 2001 | ARM9EJ-S |
| 2001 | ARM926EJ-S |
| 2004 | ARM968E-S |
| 2006 | ARM996HS |
The ARM MPCore family of multicore processors supports software written using either the asymmetric (AMP) or symmetric (SMP) multiprocessor programming paradigms. For AMP development, each central processing unit within the MPCore may be viewed as an independent processor and as such can follow traditional single processor development strategies.[5]
ARM9TDMI
ARM9TDMI is a successor to the popular ARM7TDMI core, and is also based on the ARMv4T architecture. Cores based on it have a five-stage pipeline (fetch, decode, execute, data memory access, register write),[6] support both 32-bit ARM and 16-bit Thumb instruction sets, and include:
- ARM920T with 16 KB each of I/D cache and an MMU
- ARM922T with 8 KB each of I/D cache and an MMU
- ARM940T with cache and a Memory Protection Unit (MPU)
ARM9E-S and ARM9EJ-S
ARM9E, and its ARM9EJ sibling, implement the basic ARM9TDMI pipeline, but add support for the ARMv5TE architecture, which includes some DSP-esque instruction set extensions. In addition, the multiplier unit width has been doubled, halving the time required for most multiplication operations. They support 32-bit, 16-bit, and sometimes 8-bit instruction sets.
- ARM926EJ-S with ARM Jazelle technology, which enables the direct execution of 8-bit Java bytecode in hardware, and an MMU
- ARM946
- ARM966
- ARM968
The TI-Nspire CX (2011) and CX II (2019) graphing calculators use an ARM926EJ-S processor, clocked at 132 and 396 MHz respectively.[7]
Chips
- ARM920T
  - Atmel AT91RM9200[8]
  - Cirrus Logic EP9315 ARM9 CPU, 200 MHz
  - NXP i.MX1
  - Samsung S3C2410, S3C2440, S3C2442, S3C2443
- ARM925T
  - Texas Instruments OMAP 1510
- ARM926EJ-S
  - ASPEED AST2400
  - Cypress Semiconductor EZ-USB FX3
  - Microchip Technology (former Atmel) AT91SAM9260,[8] AT91SAM9G,[9] AT91SAM9M,[10] AT91SAM9N/CN,[11] AT91SAM9R/RL,[12] AT91SAM9X,[13] AT91SAM9XE[14] (see AT91SAM9)
  - Nintendo Starlet (Wii coprocessor)[15]
  - Nuvoton NUC900
  - NXP (former Freescale Semiconductor) i.MX2 Series[16] (see I.MX), LPC3100 and LPC3200 Series[17]
  - Samsung S3C2412, S3C2416, S3C2450
  - Spreadtrum SC6531, SC7701B
  - STMicroelectronics Nomadik
  - Texas Instruments OMAP 850, 750, 733, 730, 5912 (also 5948, which is a customer-specific version of it, made for Bosch), 1610
  - Texas Instruments Sitara AM1x, OMAP L137/L138, Davinci DA830/DA850/DM355/DM365
  - HP iLO 4[18] baseboard management controller
  - 5V Technologies 5VT1310/1312/1314
  - STMicroelectronics SPEAr300/600[19]
  - VIA WonderMedia 8505 and 8650
- ARM940T
- ARM946E-S
  - Nintendo NTR-CPU (Nintendo DS CPU), TWL-CPU (Nintendo DSi CPU; same as the DS but clocked at 133 MHz instead of 67 MHz)[20]
  - NXP Nexperia PNX5230
- ARM966E-S
  - LSI Logic LSI53C1030
  - STMicroelectronics STR9[21]
- ARM968E-S
- Unreferenced ARM9 core
  - Anyka AK32xx
  - Atmel AT91CAP9
  - CSR Quatro 4300
  - Centrality Atlas III
  - Digi NS9215, NS9210[22]
  - HiSilicon Kirin K3V1
  - Infineon Technologies S-GOLDlite PMB 8875
  - LeapFrog LF-1000
  - NXP Semiconductors (former Freescale Semiconductor) i.MX1x
  - MediaTek MT1000, MT6235-39, MT6268, MT6516
  - PRAGMATEC RABBITV3 (ARM920T rev 0 (v4l)), used in Karotz
  - Qualcomm MSM6xxx
  - Qualcomm Atheros AR6400
  - Texas Instruments TMS320DM365/TMS320DM368 ARM9EJ-S
  - Zilog Encore! 32
Documentation
The amount of documentation for all ARM chips is daunting, especially for newcomers. The documentation for microcontrollers from past decades would easily fit in a single document, but as chips have evolved, the documentation has grown with them. The total documentation is especially hard to grasp for ARM chips because it consists of documents from the IC manufacturer and documents from the CPU core vendor (ARM Holdings).
A typical top-down documentation tree is: high-level marketing slides, datasheet for the exact physical chip, a detailed reference manual that describes common peripherals and other aspects of physical chips within the same series, reference manual for the exact ARM core processor within the chip, reference manual for the ARM architecture of the core which includes detailed description of all instruction sets.
- Documentation tree (top to bottom)
- IC manufacturer marketing slides.
- IC manufacturer datasheets.
- IC manufacturer reference manuals.
- ARM core reference manuals.
- ARM architecture reference manuals.
The IC manufacturer usually has additional documents, including evaluation board user manuals, application notes, getting-started guides for development software, software library documents, errata, and more.
References
- [1] ARM9 Family Webpage; ARM Holdings.
- [2] ARM9; OEMDrivers.
- [3] Furber, Steve (2000). ARM System-on-Chip Architecture. Addison-Wesley. p. 344. ISBN 0201675196.
- [4] "Performance of the ARM9TDMI and ARM9E-S cores compared to the ARM7TDMI core", Issue 1.0, dated 9 February 2000, ARM Ltd.
- [5] "MPCore Sample Code". Archived from the original on 11 April 2015.
- [6] https://www.ecb.torontomu.ca/~courses/ee8205/Data-Sheets/ARM/ARM9TDMI.pdf
- [7] "Teardown Tuesday: Graphing Calculator - News". www.allaboutcircuits.com. Retrieved 2021-07-12.
- [8] Atmel Legacy ARM-Based Solutions; Atmel.
- [9] SAM9G ARM9 Microcontrollers; Atmel.
- [10] SAM9M ARM9 Microcontrollers; Microchip.
- [11] SAM9N/CN ARM9 Microcontrollers; Atmel.
- [12] SAM9R/RL ARM9 Microcontrollers; Atmel.
- [13] SAM9X ARM9 Microcontrollers; Atmel.
- [14] SAM9XE ARM9 Microcontrollers; Atmel.
- [15] "Hardware/Starlet". Wiibrew. Archived from the original on 16 May 2020. Retrieved 14 June 2020.
- [16] i.MX28 Applications Processors; NXP.
- [17] "LPC3100/200 Series: Arm9-based microcontrollers|NXP". www.nxp.com. Retrieved 2018-07-27.
- [18] "iLO 4 Cryptographic Module FIPS 140-2 Non-Proprietary Security Policy" (PDF). Hewlett Packard Enterprise. 10 February 2016.
- [19] "SPEAr ARM 926 Microprocessors - STMicroelectronics".
- [20] GBATEK - GBA/NDS Technical Info - ARM CP15 ID Codes; Martin Korth.
- [21] STR9 ARM9 Microcontrollers; STMicroelectronics.
- [22] "NS9210/NS9215 32-bit NET+ARM Processor Family" (PDF). Digi International.
External links
- ARM9 official documents
- ARM9 official website
- Architecture Reference Manual: ARMv4/5/6
- Core Reference Manuals: ARM9E-S, ARM9EJ-S, ARM9TDMI, ARM920T, ARM922T, ARM926EJ-S, ARM940T, ARM946E-S, ARM966E-S, ARM968E-S
- Coprocessor Reference Manuals: VFP9-S (Floating-Point), MOVE (MPEG4)
- Quick Reference Cards
Introduction
Overview
The ARM9 family comprises a series of 32-bit RISC processor cores developed and licensed by ARM Holdings for microcontroller and low-power embedded applications.[10] These cores represent an older generation in the ARM architecture lineup, emphasizing high performance and efficiency for cost-sensitive devices such as mobile phones and personal digital assistants.[10] The family was announced in 1997, with the ARM9TDMI serving as the inaugural core.[11] ARM9 implementations generally achieve clock speeds up to 200–300 MHz, delivering performance around 210 MIPS at 200 MHz in typical configurations.[10] This range positions the family as suitable for embedded systems requiring balanced power consumption and processing capability, often operating at power levels of approximately 1.5 mW per MHz.[12]
Within the broader ARM architecture evolution, the ARM9 family bridges the gap between the simpler, lower-performance ARM7 series and the more advanced ARM11 series, which introduced enhancements like Thumb-2 instruction set support for improved code density.[13] Architecturally, ARM9 cores adopt a Harvard design in many variants, featuring separate instruction and data buses to enhance memory access efficiency.[8]
Historical Development
The ARM9 family of processors was developed by ARM Holdings in the late 1990s to address the escalating demands for enhanced 32-bit RISC processing power in power-constrained portable and embedded applications, building on the widespread adoption of the preceding ARM7TDMI core.[14] This evolution was motivated by the need to support emerging multimedia, networking, and signal-processing workloads that exceeded the capabilities of the ARM7 era, while maintaining low power consumption essential for battery-operated devices.[15]
The ARM9TDMI, the inaugural core in the family, was announced on October 16, 1997, as a direct successor to the ARM7TDMI, promising approximately double the performance for applications in multimedia and networking.[16] It became available for licensing in 1998 and implemented the ARMv4T architecture, incorporating Thumb instructions for improved code density.[17] The first production silicon using the ARM9TDMI appeared in 1999, with licensees such as Samsung integrating it into system-on-chip designs for mobile and embedded systems.[18] This development occurred amid the late 1990s surge in personal digital assistants (PDAs) like the Palm Pilot series, early cellular phones from Nokia and others, and proliferating embedded systems in consumer electronics, where efficient 32-bit processing was increasingly vital.[19]
Subsequent milestones included the introduction of the ARM9E-S core in 1999, which extended the architecture to ARMv5TE with enhanced digital signal processing (DSP) instructions to better handle multimedia and communications tasks.[4] In 2001, the ARM9EJ-S variant followed, adding Jazelle technology for hardware-accelerated execution of Java bytecodes, targeting the rising popularity of Java-enabled portable devices.[20] These extensions solidified the ARM9's intellectual property foundation, evolving from ARMv4T's baseline features to ARMv5TE's DSP and multimedia optimizations.
Architecture
Key Features
The ARM9 family of processors implements the ARMv4T architecture in its base variant, the ARM9TDMI, supporting both 32-bit ARM instructions for full functionality and 16-bit Thumb instructions to enhance code density in memory-constrained embedded systems.[21] Later variants, such as the ARM9E-S family, adopt the ARMv5TE architecture, incorporating enhanced DSP extensions while retaining core compatibility with ARM and Thumb instruction sets.[22]
As a load/store architecture, ARM9 cores employ a 32-bit flat address space, where only load and store instructions access memory, enabling efficient pipelining and reduced complexity in data handling. In ARM state, almost every instruction can be executed conditionally: a 4-bit condition field in the instruction is tested against the status flags, which reduces the need for explicit branches around short code sequences. The processor operates in seven modes (User, FIQ, IRQ, Supervisor, Abort, Undefined, and System) to manage privilege levels, fast interrupts, general interrupts, and exception handling, ensuring secure and prioritized task execution.[21] Endianness is configurable, allowing selection between big-endian and little-endian byte ordering to accommodate diverse system requirements and legacy compatibility.
Power management features include basic sleep modes for halting execution during idle periods and clock gating to disable unused pipeline stages, optimizing energy consumption in low-power embedded applications.[21]
Compared to the preceding ARM7TDMI, which also uses ARMv4T but features a shallower three-stage pipeline, the ARM9 achieves approximately double the performance on the same fabrication process, mainly because its deeper five-stage pipeline allows a higher clock frequency. Base configurations do not include advanced features such as Jazelle mode or SIMD extensions.[10][2]
Pipeline and Performance
The ARM9TDMI core employs a five-stage pipeline consisting of fetch, decode, execute, memory, and writeback stages, adopting a Harvard architecture that separates instruction and data accesses to support higher clock frequencies than the ARM7TDMI's three-stage von Neumann design. This structure facilitates overlapping of operations, with typical operating frequencies ranging from 100 MHz to 400 MHz based on the semiconductor process node, such as 0.18 μm for up to 200 MHz implementations.[23][24]
Branch instructions are resolved during the execute stage without dynamic prediction hardware, resulting in a 2-3 cycle penalty for taken branches due to pipeline flushing and refilling from the target address; untaken branches incur no penalty. Some ARM9 variants incorporate static prediction heuristics or minimal target buffering in customized implementations to mitigate frequent branching in control-intensive code.[14]
Performance characteristics are evaluated using the Dhrystone benchmark, where the ARM9TDMI delivers approximately 1.1 DMIPS/MHz in ARM instruction mode, reflecting efficient integer processing but limited by the in-order pipeline. The ARM9E-S variant is quoted at a similar figure of up to 1.1 DMIPS/MHz; its DSP extensions mainly help by accelerating the multiply-accumulate operations common in signal processing, reducing overall cycles for mixed workloads. DMIPS is the Dhrystone score (iterations per second) divided by 1757, the score of the VAX 11/780 reference machine; dividing again by the clock frequency in MHz gives DMIPS/MHz.[25][26]
Memory interfaces include optional Harvard caches, exemplified by the ARM940T's 4 KB instruction cache and 4 KB data cache for low-latency access in cached configurations, alongside support for the AMBA AHB or ASB bus protocols to enable scalable system integration with peripherals.
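The DMIPS/MHz figures quoted in this section can be derived from a raw Dhrystone run as sketched below. The constant 1757 is the conventional VAX 11/780 reference score; the function names and the sample run (386,540 iterations in one second at 200 MHz) are hypothetical values chosen only to land on 1.1 DMIPS/MHz, not measurements of any real chip.

```python
# Dhrystones per second of the VAX 11/780, the conventional 1-MIPS reference.
VAX_11_780_DHRYSTONES_PER_SECOND = 1757


def dmips(iterations: int, elapsed_seconds: float) -> float:
    """Convert a raw Dhrystone result into DMIPS."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    dhrystones_per_second = iterations / elapsed_seconds
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SECOND


def dmips_per_mhz(iterations: int, elapsed_seconds: float, clock_mhz: float) -> float:
    """Normalize DMIPS by the clock frequency in MHz."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return dmips(iterations, elapsed_seconds) / clock_mhz


# Hypothetical run: 386,540 iterations in 1 s on a core clocked at 200 MHz.
print(round(dmips(386_540, 1.0), 1))
print(round(dmips_per_mhz(386_540, 1.0, 200.0), 2))
```

Normalizing by clock frequency is what makes the figure comparable across implementations built on different process nodes.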
Key bottlenecks stem from the strictly in-order execution model, which prevents reordering for parallelism, and load-use data hazards that impose 1-2 cycle interlocks when subsequent instructions depend on loaded data that has not yet arrived from the memory stage.[2]
Core Variants
ARM9TDMI and ARM940T
The ARM9TDMI core, released in 1998, serves as the foundational integer processor in the ARM9 family, implementing the ARMv4T architecture with support for both 32-bit ARM and 16-bit Thumb instructions to enhance code density in memory-constrained embedded systems.[10] It employs a five-stage pipeline—fetch, decode, execute, memory access, and write-back—with a Harvard bus architecture enabling simultaneous instruction and data fetches, achieving approximately 1.1 MIPS per MHz while maintaining low complexity through forwarding paths that reduce pipeline stalls. Lacking integrated cache or memory management unit (MMU), the ARM9TDMI is provided as a synthesizable RTL design suitable for custom ASIC or FPGA integration, prioritizing minimal area (around 4.15 mm² in 0.35 μm process, scaling below 0.5 mm² in advanced nodes like 0.18 μm) and power efficiency (typically 0.6–1.8 mW/MHz depending on process and voltage, such as 1.8 mW/MHz at 3.0 V in 0.35 μm).[10][27] This makes it ideal for cost-sensitive applications where external memory systems handle caching needs, with EmbeddedICE logic enabling JTAG-based debugging. 
The ARM940T variant, introduced in 1998, builds directly on the ARM9TDMI core by integrating a 4 KB instruction cache and 4 KB data cache in a Harvard configuration, each organized as 64-way set-associative with 1 KB modular blocks for flexibility in power-sensitive designs.[28] It also incorporates a memory protection unit (MPU) with 8 instruction and 8 data regions for basic access control and embedded OS support, such as task isolation without the overhead of a full MMU or TLB, alongside a 4-entry write buffer to mitigate memory latency.[28][10] Retaining the same five-stage pipeline and Thumb compatibility, the ARM940T targets closed embedded systems requiring moderate performance boosts (up to 120 MHz operation) and protection features, with power consumption around 400 mW at full speed in 0.35 μm process and an area of approximately 13 mm² including caches.[10] A related variant, the ARM920T, integrates the ARM9TDMI core with separate 16 KB instruction and 16 KB data caches, an MMU, and a write buffer, enabling support for open operating systems like Linux in embedded applications.[29]
The ARM9TDMI and ARM940T emphasize different trade-offs for embedded use: the ARM9TDMI minimizes silicon footprint and power for bare-metal or simple RTOS applications, while the ARM940T adds integrated memory protection for multitasking without escalating costs or complexity beyond basic protection needs.[10] Early implementations included high-volume consumer products such as mobile communications devices and next-generation PDAs. These designs laid the groundwork for subsequent ARM9 variants by demonstrating scalable performance in low-power scenarios.
ARM9E-S and ARM9EJ-S Families
The ARM9E-S core, introduced in 1999, implements the ARMv5TE architecture, which extends the ARMv4T baseline with enhancements for digital signal processing (DSP).[3] This synthesizable core supports the 32-bit ARM instruction set in ARM state and the 16-bit Thumb instruction set in Thumb state, without integrated caches, allowing licensees to add memory systems as needed.[3] Key additions include DSP instructions such as 16-bit and 32-bit multiply-accumulate (MAC) operations with single-cycle throughput after initial latency, enabling efficient signal processing tasks like filtering and transforms.
Building on the ARM9E-S in 2001, the ARM9EJ-S variant incorporates Jazelle technology, specifically Direct Bytecode Execution (DBE), to accelerate Java applications by executing bytecode directly on the hardware without full software emulation or just-in-time compilation.[30] Jazelle introduces a dedicated state machine that handles over 95% of Java bytecodes natively, reducing interpreted code overhead by a factor of 5 to 10 compared to pure software execution on prior ARM cores. This extension uses the ARMv5TEJ instruction set, combining DSP capabilities with Java acceleration for multimedia-rich embedded systems.
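To make the multiply-accumulate idea concrete, the following is a behavioural sketch of a 16-bit by 16-bit multiply added into a 32-bit accumulator, as used in a dot product or FIR filter tap loop. It clamps the accumulator on overflow (saturation), which is one common choice in fixed-point DSP code; the names `mac16` and `dot_product` are illustrative, and the model makes no claim about the exact semantics or timing of any particular ARMv5TE instruction.

```python
from typing import Iterable

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def saturate32(value: int) -> int:
    """Clamp an integer into the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def mac16(acc: int, a: int, b: int) -> int:
    """Return acc + a * b, with a and b signed 16-bit and the result clamped to 32 bits."""
    for operand in (a, b):
        if not INT16_MIN <= operand <= INT16_MAX:
            raise ValueError("operands must fit in a signed 16-bit value")
    return saturate32(acc + a * b)


def dot_product(xs: Iterable[int], ys: Iterable[int]) -> int:
    """Accumulate pairwise products, one MAC step per sample."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac16(acc, x, y)
    return acc


samples = [1000, -2000, 3000]
coeffs = [16384, 8192, 4096]
print(dot_product(samples, coeffs))
```

Clamping instead of wrapping keeps a filter output pinned at full scale when it overflows, which is usually less objectionable in audio or video data than a sign flip.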
Notable implementations include the ARM926EJ-S, a macrocell featuring an ARM9EJ-S core with a memory management unit (MMU), configurable Harvard caches (typically 16 KB instruction and data), and Jazelle support, targeted at multitasking operating systems in portable devices.[6] The ARM966E-S, based on the ARM9E-S core, omits the MMU for real-time applications, incorporating tightly coupled memory (TCM) interfaces for deterministic low-latency access and DSP extensions suited to control-oriented tasks.[31] Similarly, the ARM968E-S provides a security-oriented variant with a protection unit enabling hardware-based separation of secure and non-secure memory regions, serving as an early precursor to advanced security models like TrustZone, while retaining TCM and no caches for cost-sensitive embedded security.[32]
These families enhance multimedia processing through 16/32-bit MAC instructions and saturation arithmetic, which prevent overflow in fixed-point operations common in audio and video algorithms. The Jazelle state machine further optimizes bytecode handling by mapping Java opcodes to native execution paths, minimizing mode switches between ARM, Thumb, and Jazelle states.[33] Overall, the cores deliver up to 1.2 DMIPS/MHz, providing a performance uplift for DSP workloads and enabling mid-2000s mobile devices to handle audio and video decoding efficiently.[34]
Licensing and Customization
ARM Licensing Model
ARM Holdings licenses its ARM9 processor intellectual property (IP) through two primary models: processor IP licenses, which grant access to specific pre-designed core implementations such as the ARM9TDMI, and architecture licenses, which permit licensees to develop custom derivatives based on the ARM instruction set architecture (ISA).[35][36] Processor IP licenses typically involve an upfront fee, often in the range of several million dollars depending on the configuration and volume commitments, followed by royalty payments of 1-2% of the selling price per chip containing the IP, or equivalently low fixed amounts like $0.01 to $0.10 per unit for high-volume embedded applications.[35][37] Architecture licenses, more flexible but costlier upfront (potentially tens of millions), allow modifications to the core design while adhering to the ISA specifications, enabling tailored implementations for specialized needs.[38][39]
For the ARM9 family, licensing options have been available since 1998, starting with the ARM9TDMI core as a fixed-configuration soft core delivered as synthesizable RTL (Verilog).[40] Licensees could select parameterizable variants, such as those in the ARM9E-S family, allowing customization of cache sizes (e.g., instruction and data caches from 4 KB to 64 KB) to optimize for embedded applications, with additional royalties applied for enhanced features like cache inclusion.[40] These options emphasized flexibility for low-power microcontroller use, contrasting with higher-performance application processors.
The licensing process requires potential customers to sign a non-disclosure agreement (NDA) with ARM, followed by payment of the upfront fee to receive the IP deliverables, including design files, verification tools, and documentation, which licensees then integrate into their system-on-chip (SoC) designs.[35] By 2005, the ARM9 had attracted over 100 licensees worldwide, including major semiconductor firms like Texas Instruments, Samsung, and STMicroelectronics, who incorporated it into products for mobile and embedded markets.[41] This pre-Cortex-A era model (prior to 2005) prioritized broad accessibility and customization to differentiate embedded (e.g., real-time control) from application processors (e.g., multimedia handling).[37]
Licenses impose strict restrictions, prohibiting reverse engineering, decompiling, or disassembly of the IP to protect ARM's proprietary designs, with violations potentially leading to termination.[42] Compliance is mandated with the relevant ARM architecture versions, such as v4T for the ARM9TDMI (supporting Thumb instructions) and v5TE for the ARM9E-S (adding DSP extensions), ensuring interoperability and backward compatibility.[43]
Silicon Integration Options
The ARM9 family supports flexible silicon integration through synthesizable register-transfer level (RTL) designs, enabling licensees to customize core parameters such as cache sizes (ranging from 4 KB to 128 KB in power-of-2 increments for instruction and data caches) and tightly coupled memory (TCM) configurations (from 4 KB to 1 MB per region).[44] Bus width and associativity can be adjusted during synthesis to match target applications, while hard macrocell implementations provide pre-optimized IP blocks for faster integration into ASICs or SoCs.[45] These options allow for variants like the ARM926EJ-S, where cache lockdown and TLB configurations support time-critical operations without altering the base ARMv5TEJ instruction set architecture.[44]
Integration interfaces include the AMBA AHB bus for high-performance connections to peripherals and memory, with separate instruction and data ports supporting burst transfers (e.g., INCR4 or INCR8) and compatibility with AMBA APB for lower-speed components.[45] The coprocessor interface facilitates extensions such as floating-point units via CP15 registers and handshake signals for instructions like MCR/MRC and CDP, enabling seamless addition of domain-specific accelerators.[44] For example, the ARM946E-S variant incorporates a memory protection unit (MPU) for enhanced security, defining access rules and privilege modes for memory regions to protect operating systems and applications without modifying the core ISA.[46]
The cores are optimized for process nodes from 0.18 μm to 90 nm, with implementations achieving up to 200 MHz in 0.18 μm technology.[47] Low-power variants support voltage scaling techniques, such as dynamic voltage scaling (DVS) integrated with error-tolerant mechanisms like RAZOR for aggressive power reduction in ARM9-based designs.[48]
Tool support includes the ARM RealView Development Suite for RTL simulation and debugging, featuring the ARMulator instruction set simulator and JTAG-based interfaces like RealView ICE for hardware emulation.[49] Verification relies on compliance test suites, including Architectural Validation Suites (AVS) for instruction set checks and Device Validation Suites for peripheral and system behavior, ensuring architectural fidelity.[50]
Challenges in integration include managing area overhead, with core implementations typically occupying 2.1–4.2 mm² in 0.18–0.25 μm processes (excluding caches and MMU), balanced by performance gains from custom optimizations like power-down modes for unused components.[51][47] Real-time features, such as TCM for deterministic access, can be added via configurable interfaces, minimizing latency without ISA changes.[52]
Applications
Embedded Systems
The ARM9 architecture became a cornerstone of embedded systems in the 2000s, particularly for real-time operating system (RTOS)-based applications requiring efficient 32-bit processing at low power and cost. It dominated in sectors like automotive electronic control units (ECUs), where cores such as the ARM926EJ-S provided the deterministic performance needed for safety-critical tasks. Similarly, ARM9 enabled compact networking routers by integrating with Ethernet interfaces for reliable data handling, as seen in reference designs like Micrel's XceleRouter platform running at 166 MHz for near wire-speed WAN-to-LAN routing. In printers, ARM9-based solutions supported print engine control and image processing, contributing to the shift toward more capable embedded controllers in office and industrial printing devices.[53][54]
Key microcontroller examples highlight ARM9's versatility in these domains. NXP's LPC3180 series, built on a 90-nm ARM9 core, targeted low-power industrial and networking applications with integrated vector floating-point support for enhanced computational efficiency. Atmel's AT91SAM9 family, including the AT91SAM9G20 and AT91SAM9R64, integrated the ARM926EJ-S core with large SRAM and peripherals like USB and Ethernet, making it suitable for RTOS-driven embedded systems in automotive and industrial control. Cirrus Logic's EP93xx series, such as the EP9307, combined an ARM9 CPU with multimedia interfaces like I2S audio and AC'97 codecs, excelling in audio DSP for embedded devices requiring real-time signal processing. These chips exemplified ARM9's role in bridging general-purpose computing with specialized peripherals.[55][56][57]
ARM9's adoption extended to industrial control systems, such as programmable logic controllers (PLCs) utilizing the ARM926EJ-S for its Harvard-cached architecture and Jazelle technology for Java acceleration, ensuring low-latency responses in automation environments. Point-of-sale (POS) terminals and medical devices benefited from its 32-bit efficiency and low cost, enabling features like secure transaction processing and basic diagnostics without excessive power draw; NXP's LPC32xx series, for instance, supported USB OTG and LCD controllers for such portable applications. By the mid-2000s, ARM9-powered designs accounted for a substantial share of embedded MCUs, with ARM architectures overall capturing a significant portion of certain microcontroller markets through broad licensing. As of 2025, ARM9 remains in legacy industrial equipment, supported by current Keil MDK versions with legacy packs for ongoing maintenance in RTOS environments.[6][58][59]
A notable case study is ARM9's application in set-top boxes for MPEG decoding, where DSP extensions in cores like the ARM946E-S and ARM9E accelerated multimedia tasks. These extensions, including enhanced multiply-accumulate instructions, allowed efficient handling of MPEG audio and video streams, reducing CPU load for real-time playback in resource-constrained devices. Early implementations targeted audio coders and video decoders, enabling cost-effective integration in consumer broadcast systems.[60][61]
Consumer and Industrial Devices
ARM9 cores powered consumer electronics in the early to mid-2000s, particularly portable and multimedia-focused devices. One prominent example is the Nokia 6600 smartphone, released in 2003, which incorporated the Texas Instruments OMAP1510 processor with a 104 MHz ARM9 core to handle Symbian OS tasks and basic multimedia features like VGA camera processing and MMS support. Similarly, digital cameras such as the Canon PowerShot A720 used ARM9-based DIGIC III image processors for efficient raw image handling, JPEG encoding, and on-device watermarking.[62]
Portable media players and related devices also used ARM9 for balanced performance in audio and video decoding. The Sony mylo personal communicator, introduced in 2006, employed the Freescale i.MX21 processor featuring an ARM9 core to manage Wi-Fi connectivity, video playback, and instant messaging on a Linux-based platform.[63]
Key system-on-chip implementations included Samsung's S3C24xx series, which integrated ARM9 cores for mobile phones and PDAs with support for LCD controllers and USB interfaces; Intel's PXA255 XScale processor (an ARMv5TE design that is not an ARM9 core but is compatible with ARM9 binaries), used at 400 MHz in PDAs like the HP iPAQ for pocket computing tasks; and Freescale's i.MX1 family (e.g., MC9328MX1), optimized for low-power audio and video applications with an integrated LCD controller and MPEG-4 decoding at up to 200 MHz.[64][65][66]
In industrial applications, ARM9 cores enabled reliable processing in devices with moderate computational demands beyond pure embedded control. Multifunction printers from manufacturers like HP and Epson incorporated ARM9E variants for print job management, raster image processing, and network interfaces during the 2000s.
GPS navigators, such as the AvMap Geosat series, used 200 MHz ARM9 processors for rapid route recalculation and data compression in portable units.[67] Barcode scanners like the FEIG ID ECCO+ integrated a 400 MHz ARM9 core with 128 MB RAM for real-time RFID and barcode decoding and data logging in warehouse environments.[68]
Adoption of ARM9 peaked from 2002 to 2008, driven by its cost-effectiveness in feature phones and portable gadgets, where extensions like Jazelle accelerated Java ME bytecode execution for apps and games.[69] By around 2010, use of ARM9 in new consumer designs had declined with the transition to the higher-performance ARM11 and Cortex-A series for advanced multimedia and multitasking, though ARM9 persisted in low-end industrial tools for legacy compatibility and power efficiency.[70]
Legacy
Successors
The ARM11 family served as the direct successor to the ARM9, with the series announced on April 29, 2002, and specific cores such as the ARM1136J-S and ARM1136JF-S introduced on October 14, 2002. These processors implemented the ARMv6 architecture, building on the ARM9's ARMv5TE foundation with enhancements such as SIMD media instructions for accelerated multimedia processing and an 8-stage pipeline that enabled higher clock frequencies and up to twice the multimedia performance of ARMv5-based designs. Later variants, such as the ARM1156T2-S, added Thumb-2 instructions for improved code density and performance. The deeper pipeline and architectural optimizations allowed ARM11 cores to achieve significant efficiency gains, with implementations reaching over 600 Dhrystone MIPS at under 200 mW in 0.13-micron processes, a substantial leap for power-constrained applications.[71][72]
This transition marked a broader architectural evolution from the ARM9's ARMv5TE, which emphasized DSP extensions and saturating arithmetic, to ARMv6 in the ARM11 family, which added unaligned memory access, enhanced SIMD operations, multi-core support, and TrustZone security extensions. The ARM11 maintained binary compatibility with ARM9 software, supporting the same ARM and Thumb instruction sets so that legacy codebases could migrate without recompilation.
The subsequent shift to the ARMv7 architecture introduced the Cortex family, including the Cortex-A8 (announced in 2005) as the first high-performance ARMv7-A implementation, with mandatory Thumb-2, NEON advanced SIMD extensions, and scalable designs optimized for varying performance and power profiles across application, real-time, and microcontroller domains.[72][73]
ARM9 production peaked around 2006, with hundreds of millions of units shipped annually in embedded and mobile devices, but the cores began to be phased out of new designs by the late 2000s as licensees increasingly adopted Cortex processors. By 2010, the Cortex-A8 and later variants had largely supplanted ARM9 and ARM11 in high-end applications, while the Cortex-M series targeted embedded systems. This replacement was motivated by the escalating efficiency requirements of smartphones and other battery-powered devices, where ARM9's five-stage, single-core design struggled to scale with rising computational needs for multimedia and connectivity. The Cortex transition enabled better power-performance trade-offs, with ARMv7 designs delivering up to several times the efficiency of their predecessors in mobile workloads and easing the shift to more complex, multi-core SoCs.[70][74][75][76]
Enduring Impact
The ARM9 family played a pivotal role in establishing ARM's leadership in the embedded systems market during the late 1990s and early 2000s, contributing significantly to the shipment of billions of ARM-based processors that powered mobile devices, PDAs, and early consumer electronics. By 2011, ARM had shipped approximately 15 billion chips cumulatively, with the ARM9 series accounting for a substantial portion of embedded deployments due to its balance of performance and power efficiency in 32-bit RISC designs. This success familiarized developers with ARM's scalable architectures, enabling widespread adoption and helping ARM capture over 95% of the mobile processor market by 2010.[77][78][79]
Although ARM discontinued active development and support for new ARM9 designs around 2010 in favor of the Cortex series, the cores remain in use within legacy systems as of 2025, particularly in industrial controls, pre-2015 automotive ECUs, and IoT retrofits where replacement is uneconomical. For instance, older ARM9-based automotive ECUs continue to operate in vehicles, requiring secure bootloader updates to address evolving threats without full hardware overhauls. In industrial IoT applications, ARM9-based platforms serve as upgrade paths from even older architectures, supporting ongoing operations in cost-sensitive environments like smart factories.[59][80]
In electrical engineering curricula, the ARM9 remains a key example for teaching fundamental pipeline concepts. Its five-stage integer pipeline (fetch, decode, execute, memory, write-back) illustrates RISC execution efficiency and hazard mitigation in embedded systems courses. Resources like ARM's technical documentation and university modules, such as EC8791 on ARM9 processors, emphasize these principles to build conceptual understanding of processor design.
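As a rough illustration of the five-stage pipeline described above, the following Python sketch models an idealized pipeline in which one instruction enters per cycle and no stalls or hazards occur. The function names (`pipeline_schedule`, `total_cycles`) and the sample instruction mnemonics are illustrative assumptions, not taken from ARM documentation; real ARM9 cores add interlocks and forwarding that this model ignores.

```python
# Idealized five-stage pipeline model (no stalls, hazards, or forwarding).
STAGES = ["fetch", "decode", "execute", "memory", "write-back"]


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed to drain n_instructions through an ideal pipeline."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one adds one cycle.
    return n_instructions + n_stages - 1


def pipeline_schedule(instructions):
    """Return one dict per clock cycle mapping stage name -> instruction (or None)."""
    schedule = []
    for cycle in range(total_cycles(len(instructions))):
        row = {}
        for depth, stage in enumerate(STAGES):
            # The instruction in this stage entered the pipeline `depth` cycles ago.
            idx = cycle - depth
            row[stage] = instructions[idx] if 0 <= idx < len(instructions) else None
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    # Hypothetical three-instruction program, used only to show the overlap.
    for cycle, row in enumerate(pipeline_schedule(["ADD", "LDR", "SUB"]), start=1):
        print(cycle, row)
```

Under these assumptions, three instructions complete in 3 + 5 - 1 = 7 cycles, compared with 15 cycles if each instruction had to pass through all five stages before the next one began.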
Open-source emulators such as QEMU further enable hands-on experimentation with ARM9 behavior in academic settings.[81][82][83]
Economically, the ARM9 contributed to ARM Holdings' valuation growth through substantial licensing and royalties in the 2000s, representing a significant portion of licensing revenue during peak years and driving royalty streams from high-volume embedded chips into the 2010s and beyond. Even in 2016, classic cores like ARM9 accounted for a notable share of shipped units, sustaining royalties as legacy devices persisted in the market. This foundational revenue helped ARM transition to higher-margin architectures and contributed to a market capitalization of over $100 billion by the mid-2020s.[84][85]
Legacy ARM9 deployments face challenges, including security vulnerabilities in outdated features like Jazelle DBX, which, while innovative for Java acceleration, exposes unpatched systems to exploitation in the absence of modern mitigations such as TrustZone. Migration to the Cortex-M series incurs significant costs, including hardware redesign, software porting, and validation. These costs often deter upgrades in cost-sensitive industrial and automotive applications, where ARM9's established ecosystem offers short-term stability that outweighs the expense of moving to more efficient 32-bit alternatives.[86][74]
References
- https://en.wikichip.org/wiki/arm_holdings/microarchitectures/arm9/pr1

