ARM Cortex-A8
| General information | |
|---|---|
| Launched | 2005 |
| Designed by | ARM Holdings |
| Common manufacturer | |
| Performance | |
| Max. CPU clock rate | 0.6 GHz to at least 1.0 GHz[1] |
| Physical specifications | |
| Cores | 1 |
| Cache | |
| L1 cache | 32 KiB/32 KiB |
| L2 cache | 512 KiB |
| Architecture and classification | |
| Instruction set | ARMv7-A |
The ARM Cortex-A8 is a 32-bit processor core licensed by ARM Holdings implementing the ARMv7-A architecture.
Compared to the ARM11, the Cortex-A8 is a dual-issue superscalar design, achieving roughly twice the instructions per cycle. The Cortex-A8 was the first Cortex design to be adopted on a large scale in consumer devices.[2]
Features
Key features of the Cortex-A8 core are:
- Frequency from 600 MHz to 1 GHz and above
- Superscalar dual-issue microarchitecture
- NEON SIMD instruction set extension [3]
- 13-stage integer pipeline and 10-stage NEON pipeline [4]
- VFPv3 floating-point unit
- Thumb-2 instruction set encoding
- Jazelle RCT (also known as ThumbEE instruction set)
- Advanced branch prediction unit with >95% accuracy
- Integrated level 2 Cache (0–4 MiB)
- 2.0 DMIPS/MHz
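As a rough illustration of the DMIPS/MHz figure in the list above, the following Python sketch multiplies the quoted 2.0 DMIPS/MHz by a few clock rates from the stated 600 MHz to 1 GHz range. The function name `estimated_dmips` is hypothetical, and real Dhrystone results depend on the compiler, memory system, and SoC integration, so this is a back-of-envelope estimate only.

```python
def estimated_dmips(clock_mhz: float, dmips_per_mhz: float = 2.0) -> float:
    """Estimate Dhrystone MIPS from clock rate (hypothetical helper).

    The default of 2.0 DMIPS/MHz is the figure quoted in the feature list.
    """
    return clock_mhz * dmips_per_mhz


if __name__ == "__main__":
    # Clock rates picked from the 600 MHz to 1 GHz range in the feature list.
    for mhz in (600, 800, 1000):
        print(f"{mhz} MHz -> about {estimated_dmips(mhz):.0f} DMIPS")
```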
Chips
Several systems on chip (SoCs) have implemented the Cortex-A8 core, including:
- Allwinner A1X
- Apple A4
- Freescale Semiconductor i.MX51 [5]
- Rockchip RK2918, RK2906 [6]
- Samsung Exynos 3110
- TI OMAP3
- TI Sitara ARM Processors
- Conexant CX92755 [7]
References
- [1] "Cortex-A8". ARM Developer. Retrieved 2023-01-03.
- [2] Gupta, Rahul (2013-04-26). "ARM Cortex: The force that drives mobile devices". The Mobile Indian. Retrieved 2023-07-30.
- [3] Cortex-A8 Specification Summary. ARM Holdings.
- [4] Williamson, David. ARM Cortex A8: A High Performance Processor for Low Power Applications (PDF). Archived from the original (PDF) on 2015-01-01.
- [5] "i.MX51 Applications Processor and Linux Hands on" (PDF). Archived from the original (PDF) on 2011-11-19. Retrieved 2011-10-20.
- [6] "RK29XX". Archived from the original on 2011-11-05.
- [7] "CX97255" (PDF). Archived from the original (PDF) on 2012-11-19.
ARM Cortex-A8
The ARM Cortex-A8 is a high-performance, low-power, single-core 32-bit RISC processor core that implements the ARMv7-A architecture and provides full virtual memory capabilities through an integrated memory management unit (MMU). Introduced in 2005 as the first core in the Cortex-A family, it features a dual-issue superscalar pipeline with 13 stages, advanced branch prediction achieving over 95% accuracy, and support for technologies like NEON SIMD extensions for multimedia acceleration, ARM TrustZone for security, and the Thumb-2 instruction set for improved code density.[1][2]
Designed primarily for power-optimized mobile devices and embedded systems, the Cortex-A8 scales from 600 MHz to over 1 GHz clock speeds while consuming less than 300 mW of power, making it suitable for applications requiring efficient 32-bit computing such as smartphones, tablets, and consumer electronics from the late 2000s.[1] It includes optional integrated L1 and L2 caches (up to 1 MB for L2) and Vector Floating Point (VFPv3) for enhanced floating-point performance.[3] The core supports in-order execution and integrates with ARM's CoreSight debug and trace components for development and optimization. It also includes Jazelle RCT, which targets Java and similar managed languages through the ThumbEE instruction set.[1]
Historically, the Cortex-A8 marked a significant advancement over previous ARM designs like the ARM11 by roughly doubling instructions per cycle through its superscalar architecture, paving the way for subsequent Cortex-A series processors such as the A9 and A15.[2] First implemented in silicon around 2008 on processes down to 45 nm, it powered notable devices including the Apple iPhone 3GS and various Texas Instruments OMAP platforms, contributing to the proliferation of ARM-based computing in portable gadgets.[2] Although it has been superseded by more efficient multi-core designs, it remains in use in legacy systems and serves as a reference point for low-power, high-performance ARM IP.[1]
Overview
Introduction
The ARM Cortex-A8 is a 32-bit reduced instruction set computing (RISC) processor core developed by ARM Holdings that implements the ARMv7-A architecture, providing full support for virtual memory and advanced operating systems.[1] As the inaugural high-performance core in the Cortex-A series, it marked a significant evolution from earlier ARM designs by emphasizing enhanced instruction throughput while maintaining low power characteristics suitable for battery-constrained environments.[4]
Announced in October 2005, the Cortex-A8 represented ARM's push into more demanding application processing, with first silicon implementations appearing in 2008, enabling its integration into early smartphones and other portable devices.[4][5] It quickly became a cornerstone for the burgeoning mobile computing market, powering a wide range of consumer products and establishing the Cortex-A lineage as a standard for ARM-based application processors.[1]
At its core, the Cortex-A8 employs a dual-issue superscalar, in-order execution model augmented by advanced branch prediction mechanisms, such as a global history-based predictor with a branch target buffer, to achieve up to twice the instruction throughput of prior ARM cores like the ARM11.[6] This design targets applications in mobile devices, embedded systems, and consumer electronics, where it delivers balanced performance for multimedia and general computing tasks.[1]
Physically, the core occupies less than 3 mm² of die area in a 65 nm low-power process (excluding NEON coprocessor and caches), with typical power consumption ranging from 300 mW at 600 MHz to around 600 mW at 1 GHz.[7][2]
History
The ARM Cortex-A8 processor was announced on October 4, 2005, at the ARM Developers' Conference in Santa Clara, California, marking it as the first high-performance core based on the ARMv7-A architecture.[4] Designed to deliver up to twice the performance of the preceding ARM11 while maintaining low power consumption for mobile and consumer devices, the Cortex-A8 aimed to bridge the gap between the efficient but limited ARM11 and upcoming advanced high-end cores, emphasizing enhanced integer and floating-point processing alongside support for Thumb-2 instructions.[8] This positioning responded to growing demands for multimedia-rich applications in portable electronics, with initial licensing made available immediately to enable integration into system-on-chips (SoCs).[8]
The first tape-outs and silicon validations occurred in 2007-2008, led by licensees such as Texas Instruments (TI), which became ARM's inaugural silicon partner for the core and integrated it into its OMAP3 platform.[9] Samsung followed with early implementations around the same period, focusing on high-speed variants for mobile SoCs.[10] Under ARM's intellectual property (IP) licensing model, the Cortex-A8 core design was sold exclusively to semiconductor companies, who then customized and fabricated it within their own SoCs for specific applications, generating revenue for ARM through upfront fees and royalties per shipped unit.[11]
Initial widespread adoption surged in 2009-2010, powering the first wave of high-end smartphones and tablets as device makers sought its balance of performance and efficiency.[12] A pivotal milestone came in 2010 with its integration into Apple's A4 SoC, which debuted in the first-generation iPad and iPhone 4, propelling the core to mainstream success in consumer markets and solidifying its role in the smartphone revolution.[13]
However, competition intensified with the announcement of the more advanced Cortex-A9 in October 2007, which offered multicore capabilities and began displacing the single-core A8 in new designs by the early 2010s.[14] Support for the Cortex-A8, aligned with the ARMv7 architecture, continued through extensions and tools into the mid-2010s, after which focus shifted to ARMv8-based successors.[1] Despite this, the core persists in legacy industrial and embedded applications post-2020, benefiting from ongoing ARMv7 ecosystem maintenance for long-term deployments.[1]
Architecture
Core Design
The ARM Cortex-A8 core implements the ARMv7-A architecture with a register file comprising 16 general-purpose 32-bit registers (R0-R15) and program status registers (PSRs), where R15 functions as the program counter and R14 as the link register. In ARM state, all 16 registers and associated PSRs are directly accessible for data processing and control operations. In Thumb state, most 16-bit instructions are limited to the low registers (R0-R7), while the 32-bit Thumb-2 encodings can also address the high registers (R8-R15), which supports denser code without giving up access to the full register file.
For system integration, the core employs the AMBA AXI (Advanced eXtensible Interface) protocol as its primary bus interface, facilitating high-bandwidth connections to external memory, caches, and peripherals. This interface supports configurable read/write data bus widths of 64 bits or 128 bits, determined by the A64n128 input pin, and handles multiple outstanding transactions with burst lengths up to 16 words for efficient data movement.[15]
The integer execution unit at the heart of the core includes an arithmetic logic unit (ALU) for performing essential arithmetic (add, subtract, multiply) and logical (AND, OR, XOR) operations on 32-bit operands. Integrated with the ALU is a barrel shifter that enables fast variable shifts, rotations, and immediate value adjustments on the second operand for most data-processing instructions, reducing instruction count and enhancing efficiency in integer computations.
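The barrel shifter's effect on the second operand can be sketched in Python. This is a minimal software model under stated assumptions, not a description of the hardware: the helper names (`lsl`, `lsr`, `asr`, `ror`, `add_shifted`) are hypothetical, shift amounts are assumed to lie between 0 and 31, and flag updates are ignored. It shows how an instruction of the form `ADD Rd, Rn, Rm, LSL #2` folds a shift and an add into one operation, which is useful for array indexing.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def lsl(value: int, amount: int) -> int:
    """Logical shift left, truncated to 32 bits."""
    return (value << amount) & MASK32


def lsr(value: int, amount: int) -> int:
    """Logical shift right (zero fill)."""
    return (value & MASK32) >> amount


def asr(value: int, amount: int) -> int:
    """Arithmetic shift right (sign fill)."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret the bit pattern as a negative number
    return (value >> amount) & MASK32


def ror(value: int, amount: int) -> int:
    """Rotate right within 32 bits."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def add_shifted(rn: int, rm: int, shift, amount: int) -> int:
    """Model of ADD Rd, Rn, Rm, <shift> #amount (flags not modelled)."""
    return (rn + shift(rm, amount)) & MASK32


if __name__ == "__main__":
    # Address of element 5 in an array of 4-byte words based at 0x1000.
    print(hex(add_shifted(0x1000, 5, lsl, 2)))
```

Without the shifted operand, the same address calculation would need a separate shift instruction followed by an add.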
The overall microarchitecture of the Cortex-A8 is in-order superscalar, featuring dual symmetric integer units that allow dual-issue of compatible instructions within a 13-stage pipeline to achieve higher instruction throughput while maintaining simplicity and low power.[16]
Clocking and reset mechanisms in the core support dynamic voltage and frequency scaling (DVFS) via configurable clock domains and power control signals, permitting runtime adjustments to operating frequency and supply voltage for energy efficiency without halting execution. Reset functionality includes asynchronous inputs for the processor core, NEON unit, and debug components, ensuring reliable initialization and recovery from power-down states.[17]
Pipeline and Execution Units
The ARM Cortex-A8 processor implements a 13-stage dual-issue integer pipeline, enabling in-order execution of up to two instructions per cycle to enhance throughput while maintaining simplicity in design.[18] The pipeline is divided into key phases: fetch (including address generation and instruction buffering), decode (spanning multiple stages for instruction analysis and dependency resolution), issue (where instructions are dispatched to execution units), execute (comprising sub-stages E0 through E5 for arithmetic and memory operations), and writeback (for result commitment to the register file).[18] This structure allows for efficient handling of ARM and Thumb instructions, with the dual-issue capability restricted to compatible pairs such as two data-processing operations or one load/store alongside another instruction.[18]
Branch prediction in the Cortex-A8 employs a dynamic two-level global history mechanism to mitigate the impact of control flow changes in the deep pipeline. It features a 512-entry, two-way set-associative Branch Target Buffer (BTB) for storing branch targets and prediction patterns, augmented by a 4096-entry Global History Buffer (GHB) and an 8-entry return stack for subroutine calls.[18] A mispredicted branch incurs a penalty of 13 cycles, as the pipeline must flush and refill from the corrected target address.[18] This predictor achieves high accuracy for typical workloads, reducing stalls and supporting the processor's overall instruction-level parallelism.
The load/store unit uses a single load/store pipeline that handles one load or store per cycle, with non-blocking load operations that permit continued execution despite pending memory accesses.[18] It interfaces with the level-1 data cache and handles address generation, translation, and data movement, ensuring low-latency memory operations critical for performance in embedded applications.
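To give a feel for how the 13-cycle misprediction penalty interacts with prediction accuracy, the Python sketch below applies a standard first-order model: effective CPI = base CPI + branch fraction × misprediction rate × penalty. The 13-cycle penalty comes from the text above, and the 95% accuracy is the figure quoted elsewhere in this article. The base CPI of 0.5 (ideal dual issue) and the 20% branch fraction are illustrative assumptions, and `effective_cpi` is a hypothetical helper name.

```python
def effective_cpi(base_cpi: float,
                  branch_fraction: float,
                  accuracy: float,
                  penalty_cycles: int = 13) -> float:
    """First-order CPI model with branch misprediction stalls.

    base_cpi        -- cycles per instruction with perfect prediction (assumed)
    branch_fraction -- share of instructions that are branches (assumed)
    accuracy        -- predictor hit rate, e.g. 0.95
    penalty_cycles  -- flush-and-refill cost per misprediction
    """
    mispredict_rate = 1.0 - accuracy
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    for acc in (0.90, 0.95, 0.99):
        cpi = effective_cpi(base_cpi=0.5, branch_fraction=0.2, accuracy=acc)
        print(f"accuracy {acc:.0%}: CPI about {cpi:.2f}, IPC about {1 / cpi:.2f}")
```

Under these assumptions, even a small drop in accuracy visibly erodes the benefit of dual issue, which is one reason a deep in-order pipeline leans on a strong predictor.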
For integer arithmetic, the Cortex-A8 includes two symmetric Arithmetic Logic Units (ALUs) that enable parallel execution of simple operations like additions and logical functions, contributing to the pipeline's ability to sustain high throughput.[18] In terms of overall efficiency, the pipeline delivers up to 2 instructions per cycle (IPC) under optimal conditions, reflecting its dual-issue design.[18]
Instruction Set Support
The ARM Cortex-A8 implements the ARMv7-A architecture, supporting the A32 instruction set, which consists of 32-bit fixed-length instructions for high-performance applications, and the T32 instruction set, encompassing Thumb-2 technology that mixes 16-bit and 32-bit instructions to achieve code density comparable to earlier Thumb while maintaining performance close to A32.[1][19] Most A32 instructions can be executed conditionally based on the processor's condition flags (N, Z, C, V in the CPSR/APSR register), while in Thumb-2 the IT (If-Then) instruction makes up to four following instructions conditional without branching, which reduces overhead in control flow.[19][20]
The Cortex-A8 supports ThumbEE (Thumb Execution Environment), marketed as Jazelle RCT (Runtime Compilation Target), a variant of Thumb-2 designed as a target for just-in-time and ahead-of-time compilers for managed languages like Java, reducing the memory footprint of the generated code. Jazelle DBX (Direct Bytecode eXecution) is not supported; the Jazelle state cannot be entered, and the BXJ instruction behaves like BX.[21][6]
Security is enhanced through TrustZone extensions, which partition the system into secure and non-secure worlds. The NS (Non-Secure) bit in the Secure Configuration Register (SCR) of CP15 selects the current world, and a secure Monitor mode, entered via the SMC (Secure Monitor Call) instruction, handles transitions between worlds, ensuring isolation for trusted execution environments like digital rights management.[1][19]
The processor supports the standard ARMv7-A operating modes (User, Supervisor, System, IRQ, FIQ, Abort, and Undefined) for handling different privilege levels and exceptions, with User mode operating at privilege level 0 (unprivileged) and the others at level 1 (privileged); TrustZone adds a Monitor mode in the secure world to manage world switches.[19][20]
Memory and Peripherals
Cache Hierarchy
The ARM Cortex-A8 processor implements a two-level on-chip cache hierarchy to improve memory access performance while minimizing power consumption. The level 1 (L1) caches are split into separate instruction and data caches, both of which are 4-way set-associative with configurable sizes of 16 KB or 32 KB and 64-byte cache lines.[22] The L1 instruction cache is virtually indexed and physically tagged (VIPT), enabling parallel lookup with virtual address translation.[23] Similarly, the L1 data cache uses VIPT organization with alias detection to handle potential virtual address conflicts, ensuring correct operation in virtual memory environments.[24]
The L1 data cache operates with a write-back policy and allocates a line on write misses to maintain efficiency for sequential writes. To mitigate stalls from store operations, the cache system includes a write buffer with 8 doubleword entries (64 bytes total), which merges and buffers writes before committing them to the L1 cache or external memory, reducing bus traffic and pipeline disruptions.[25] L1 cache miss penalties are approximately 11 cycles for loads, allowing the pipeline to continue with critical-word-first refilling to minimize disruption.[25][26]
The level 2 (L2) cache is a unified cache integrated with the processor and configurable in size from 0 KB up to 1 MB; external memory is reached through the AMBA AXI interface.[27] It is physically indexed and physically tagged (PIPT) with 64-byte lines, is 8-way set-associative, and supports write-back and write-allocate policies.[23] L2 miss penalties are 18 cycles plus external memory latency (typically around 40-50 cycles, for a total of approximately 60 cycles), depending on system configuration and outstanding requests.[25][26]
The Cortex-A8 is designed for single-core use and does not include a Snoop Control Unit (SCU) or other hardware support for coherency between cores. Coherency with other bus masters, such as DMA engines, is maintained in software through cache maintenance operations.
Memory Management
The ARM Cortex-A8 implements a Memory Management Unit (MMU) compliant with the ARMv7 architecture's short-descriptor translation table format, enabling efficient virtual-to-physical address translation using 4 KB pages as the base granularity, while supporting larger page sizes of 64 KB and 1 MB for improved performance in handling bigger memory allocations.[28] This format organizes translation tables into hierarchical levels, with first-level descriptors pointing to second-level tables or directly specifying section mappings, allowing the MMU to resolve addresses through hardware walks when necessary.[28]
The TLB hierarchy in the Cortex-A8 consists of separate 32-entry fully associative L1 instruction TLB (I-TLB) and data TLB (D-TLB) for low-latency first-level lookups, supplemented by a 256-entry unified L2 TLB that captures misses from both L1 TLBs and supports all page sizes in a 4-way set-associative configuration.[28] The L1 TLBs are lockable to preserve critical translations, and the L2 TLB includes mechanisms for lockdown and preload operations to optimize access patterns in demanding workloads.[28] This setup operates within a 32-bit virtual address space, providing up to 4 GB of addressable memory per process, mapped to a 32-bit physical address space.[23]
Memory protection in the Cortex-A8 relies on domain-based access control with 16 configurable domains managed through the Domain Access Control Register in coprocessor 15 (CP15), where each domain can be set to modes such as No Access, Client (check page permissions), or Manager (full access regardless of permissions).
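The two-level lookup and the domain check described above can be sketched in Python. This is a minimal model of the ARMv7 short-descriptor layout for a 4 KB small page, assuming the translation table base covers the whole 32-bit address space (TTBCR.N = 0): bits [31:20] index the first-level table, bits [19:12] index the second-level table, and bits [11:0] are the page offset. The Domain Access Control Register holds sixteen 2-bit fields. The function names `split_small_page_va` and `domain_mode` are hypothetical, and no actual table walk or permission check is performed.

```python
# 2-bit field encodings in the Domain Access Control Register (DACR).
DOMAIN_MODES = {
    0b00: "No Access",
    0b01: "Client",    # accesses are checked against page permissions
    0b10: "Reserved",
    0b11: "Manager",   # accesses are not checked against page permissions
}


def split_small_page_va(va: int) -> dict:
    """Split a 32-bit virtual address for a 4 KB small-page mapping."""
    va &= 0xFFFFFFFF
    return {
        "l1_index": va >> 20,           # 4096 first-level entries, 1 MB each
        "l2_index": (va >> 12) & 0xFF,  # 256 second-level entries, 4 KB each
        "offset": va & 0xFFF,           # byte offset within the page
    }


def domain_mode(dacr: int, domain: int) -> str:
    """Return the access mode of one of the 16 domains from a DACR value."""
    if not 0 <= domain <= 15:
        raise ValueError("domain must be in the range 0..15")
    return DOMAIN_MODES[(dacr >> (2 * domain)) & 0b11]


if __name__ == "__main__":
    fields = split_small_page_va(0xC0123456)
    print({name: hex(value) for name, value in fields.items()})
    print(domain_mode(0x55555555, 3))  # every domain set to 0b01
```

For a 1 MB section mapping, the first-level index is the same and the low 20 bits form the offset, so no second-level lookup is needed.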
Page table entries further enforce granular permissions via Access Permission (AP) bits for read/write control and the Execute-Never (XN) bit to restrict execution, ensuring secure separation of user and privileged code regions.[28]
Context switching is accelerated by CP15 registers, including the 8-bit Address Space Identifier (ASID) in the Context ID Register, which tags TLB entries to avoid full flushes during process switches by invalidating only ASID-specific entries, and the Translation Table Base Registers (TTBR0 and TTBR1) that point to per-process translation tables for rapid reconfiguration. This design minimizes overhead in multitasking environments while maintaining isolation through ASID-based disambiguation.
Key Features
Performance Optimizations
The ARM Cortex-A8 employs a dual-issue, in-order pipeline that can issue two instructions per cycle, such as a load operation paired with an ALU computation, thereby improving instruction throughput without the complexity of out-of-order dynamic scheduling.[6] This static scheduling allows the processor to achieve higher instructions per cycle while maintaining low power consumption. Power efficiency is further optimized via extensive clock gating, which disables clocks to idle pipeline stages and execution units, and power gating using multi-threshold CMOS (MT-CMOS) techniques to cut leakage in standby modes, resulting in significant dynamic and static power reductions during varying workloads.[6]
On the software side, compiler optimizations leveraging the Thumb-2 instruction set extension deliver approximately 30% better code density compared to the traditional 32-bit ARM instructions, allowing denser binaries that fit more effectively in limited memory while preserving execution performance.[29][30]
Performance metrics underscore these optimizations, with the core delivering about 2.0 Dhrystone 2.1 MIPS per MHz, or roughly 2000 DMIPS at 1 GHz.[6][31] Similarly, CoreMark scores reach around 3200 at 1 GHz, reflecting strong integer processing capability.[32] In terms of scalability, implementations in 45 nm processes achieved clock speeds up to 1.5 GHz around 2010, supporting high-performance mobile applications while adhering to power constraints.[33]
Multimedia and SIMD Extensions
The ARM Cortex-A8 integrates the NEON advanced SIMD extension as a dedicated 128-bit wide co-processor to accelerate multimedia, signal processing, and data-parallel workloads. This unit features a register bank of 32 × 64-bit doubleword registers (D0–D31), which can also be viewed as 16 × 128-bit quadword registers (Q0–Q15) for vector processing, enabling flexible data handling across integer and floating-point formats.
NEON supports a comprehensive set of vector instructions, including arithmetic operations such as vector addition (VADD) and multiplication (VMUL) for 8-bit, 16-bit, and 32-bit signed/unsigned integers, as well as single-precision floating-point equivalents (e.g., VADD.F32, VMUL.F32); double-precision arithmetic (e.g., VMUL.F64) is handled by the VFPv3 unit rather than by Advanced SIMD. These instructions operate on packed data elements within the 128-bit vectors, allowing simultaneous processing of multiple pixels or samples to boost efficiency in tasks like filtering and transformations. Additional capabilities include shifts (VSHR), permutations, and load/store operations with support for unaligned accesses in normal and device memory regions.
The NEON unit is fully integrated with the VFPv3 floating-point unit, sharing the register file and execution pipelines to enable unified handling of scalar and vector floating-point computations compliant with IEEE 754 standards. This integration allows the VFP to execute instructions like multiply-accumulate (VMLA) and division (VDIV) using the NEON floating-point pipeline, which includes two dedicated floating-point execution units capable of issuing up to two SIMD instructions per cycle for integer and floating-point operations.
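The register views and packed arithmetic described above can be illustrated with a small Python sketch. It relies on the ARMv7 Advanced SIMD aliasing rule that quadword register Qn overlays doubleword registers D(2n) and D(2n+1), and it emulates a lane-wise 8-bit add with wraparound in the style of `VADD.I8` on two 128-bit operands. The function names (`q_to_d`, `lanes`, `vadd_i8`) are hypothetical, and the code models behaviour only, not NEON timing or encoding.

```python
def q_to_d(q: int) -> tuple:
    """Return the pair of D registers aliased by quadword register Qn."""
    if not 0 <= q <= 15:
        raise ValueError("Q register index must be in the range 0..15")
    return (2 * q, 2 * q + 1)


def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of packed elements of a given width in one vector."""
    return vector_bits // element_bits


def vadd_i8(a: bytes, b: bytes) -> bytes:
    """Lane-wise 8-bit add with wraparound on two 16-byte vectors."""
    if len(a) != 16 or len(b) != 16:
        raise ValueError("both operands must be 16 bytes (128 bits)")
    return bytes((x + y) & 0xFF for x, y in zip(a, b))


if __name__ == "__main__":
    print("Q3 aliases D%d and D%d" % q_to_d(3))
    for width in (8, 16, 32):
        print(f"{width}-bit elements per 128-bit vector: {lanes(128, width)}")
    print(vadd_i8(bytes(range(16)), bytes([1] * 16)).hex())
```

One 128-bit operation therefore processes sixteen 8-bit pixels or four 32-bit samples at once, which is the source of the speedups in the multimedia workloads discussed in this section.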
The combined architecture supports short-vector processing for single-precision operations in as few as 7 cycles under run-fast mode, providing up to four 32-bit words of throughput per cycle when backed by the L1 data cache.[34][35]
In multimedia applications, NEON's Advanced SIMD instructions excel at accelerating video codecs, such as H.264 baseline profile decoding, where vectorized motion compensation and inverse discrete cosine transforms reduce computational requirements; for instance, optimized implementations on Cortex-A8 achieve 30 frames per second for 720×480 D1 resolution streams at typical clock speeds. The extensions also facilitate audio processing, such as MP3 decoding through parallel SIMD operations on filter banks, and image processing tasks like edge detection via byte-level vector arithmetic.[36][37]
These capabilities proved essential in early smartphones, such as those based on the Cortex-A8 without discrete GPUs, where NEON handled software-based graphics acceleration, 2D rendering, and basic 3D transformations to deliver responsive user interfaces and media playback.[1]
Implementations
System-on-Chips
The ARM Cortex-A8 core was integrated into various single-core system-on-chips (SoCs) by multiple semiconductor manufacturers, targeting mobile, consumer, and industrial applications with clock speeds typically ranging from 600 MHz to 1 GHz.[38]
Texas Instruments' OMAP3630, released in 2009, featured a 1 GHz Cortex-A8 core fabricated on a 45 nm process node and included a PowerVR SGX530 graphics processing unit (GPU) for multimedia acceleration. This SoC was designed for high-performance mobile devices, emphasizing power efficiency and integration of imaging, video, and display peripherals.[39]
Samsung's S5PC110, codenamed Hummingbird and launched in 2009, incorporated a 1 GHz Cortex-A8 core on a 45 nm process, powering early smartphones with support for advanced connectivity and multimedia features. It was optimized for battery-constrained environments, delivering up to 2000 DMIPS of performance.[40]
Apple's A4 SoC, introduced in 2010, utilized a custom implementation of the 1 GHz Cortex-A8 core on a 45 nm process node fabricated by Samsung, paired with a PowerVR SGX535 GPU to enable hardware-accelerated graphics and video decoding.[13] This design focused on seamless integration for tablet and smartphone platforms, balancing compute power with thermal management.[41]
Freescale Semiconductor's i.MX51, announced in 2008, employed an 800 MHz Cortex-A8 core on a 65 nm process, tailored for industrial and automotive applications with robust peripheral support including Ethernet and LCD controllers.[38] It prioritized reliability and multimedia processing in embedded systems.[42]
All Cortex-A8 implementations were strictly single-core, lacking native multi-core support, with process nodes evolving from 65 nm in early designs to as low as 40 nm in later revisions for improved efficiency.[43]
Notable Devices and Applications
The ARM Cortex-A8 processor powered several landmark smartphones in the late 2000s and early 2010s, marking a significant step in mobile computing performance. The Apple iPhone 3GS, released in 2009, featured a Samsung S5PC100 system-on-chip with a 600 MHz Cortex-A8 core, enabling smoother multitasking and faster app launches compared to prior ARM11-based devices.[44] Similarly, the 2010 Samsung Galaxy S series utilized the Samsung S5PC110 (Hummingbird) SoC, clocked at 1 GHz, which supported advanced graphics rendering and contributed to the device's reputation for high-definition media playback.[45] The Apple iPhone 4, released in 2010, used the A4 SoC with an 800 MHz Cortex-A8 core, introducing Retina display support and improved performance for iOS applications.[46]
In tablets and media players, the Cortex-A8 facilitated the rise of portable multimedia consumption. Apple's first-generation iPad, launched in 2010, incorporated the custom A4 SoC with a 1 GHz Cortex-A8 core, allowing for fluid web browsing and video streaming on a larger form factor.[13] The Barnes & Noble Nook Color, also from 2010, employed a Texas Instruments OMAP3621 processor at 800 MHz, blending e-reading with Android app support and color touchscreen capabilities.[47]
Beyond consumer gadgets, the Cortex-A8 found applications in embedded systems, particularly automotive infotainment and set-top boxes.
Freescale's i.MX51 family, based on the Cortex-A8, was integrated into early automotive head units for navigation and media playback, offering robust processing for in-vehicle entertainment systems.[48] In set-top boxes, devices like the Optimum CloudAlive utilized Freescale i.MX53 SoCs with Cortex-A8 cores to deliver Android-based streaming and IPTV services.[49]
These implementations highlighted the Cortex-A8's role in enabling 720p video decoding and encoding, which supported early high-definition content in apps and media ecosystems, though its single-core design limited scalability for more demanding tasks.[50] By 2012, adoption shifted toward the multi-core Cortex-A9 in flagship devices, as seen in successors like the Samsung Galaxy S II and Apple iPhone 4S, phasing out the A8 in mainstream consumer markets.[51] As of 2025, the Cortex-A8 persists in legacy industrial and IoT applications, such as development boards like the BeagleBone Black with TI AM3358 processors, where vendors continue providing security patches to maintain compatibility in embedded environments.[52]
References
- https://en.wikichip.org/wiki/arm_holdings/microarchitectures/cortex-a8