ARM Cortex-A78
from Wikipedia
ARM Cortex-A78
General information
Launched: 2020
Designed by: ARM Ltd.
Performance
Max. CPU clock rate: 2.4 GHz to 3.0 GHz in phones and 3.3 GHz in tablets/laptops
Cache
L1 cache: 32–64 KB (parity); either 32 KB L1 instruction cache and 32 KB L1 data cache, or 64 KB L1 instruction cache and 64 KB L1 data cache
L2 cache: 256–512 KB (private, ECC)
L3 cache: Optional, 512 KB to 4 MB (A78, A78AE); optional, 512 KB to 8 MB (A78C)
Architecture and classification
Microarchitecture: ARM Cortex-A78
Instruction set: ARMv8-A
Physical specifications
Cores: 1–4 per cluster (A78, A78AE); 1–8 per cluster (A78C)
Products, models, variants
Product code name: Hercules
History
Predecessor: ARM Cortex-A77
Successor: ARM Cortex-A710

The ARM Cortex-A78 is a central processing unit implementing the ARMv8.2-A 64-bit instruction set, designed by ARM Ltd.'s Austin centre.[1]

Design


The ARM Cortex-A78 is the successor to the ARM Cortex-A77. It can be paired with the ARM Cortex-X1 and/or ARM Cortex-A55 CPUs in a DynamIQ configuration to deliver both performance and efficiency. ARM also claims energy savings of as much as 50% over its predecessor.[2]

The Cortex-A78 is an out-of-order superscalar design with a 4-wide decoder and a 1.5K-entry macro-op (MOP) cache. It can fetch 4 instructions and 6 MOPs per cycle, and rename and dispatch 6 MOPs and 12 μops per cycle. The out-of-order window holds 160 entries, and the backend has 13 execution ports with a pipeline depth of 14 stages, of which the execution latencies account for 10 stages.[2][3][4]

The processor follows the standard Cortex-A roadmap and, with a 2.1 GHz (5 nm) implementation, improves on its predecessor in the following ways:

  • 7% better performance
  • 4% lower power consumption
  • 5% smaller area, which amounts to a 15% area saving for a quad-core cluster that can be spent on an extra GPU or NPU
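
As a rough illustration of how the first two figures combine, the sketch below computes the implied performance-per-watt ratio. It assumes, purely for illustration, that the 7% performance gain and the 4% power reduction apply at the same operating point, which the list above does not state. The helper name `perf_per_watt_gain` is hypothetical.

```python
def perf_per_watt_gain(perf_gain: float, power_reduction: float) -> float:
    """Return the performance-per-watt ratio (new / old).

    perf_gain:       fractional performance increase, e.g. 0.07 for 7%
    power_reduction: fractional power decrease, e.g. 0.04 for 4%
    """
    if not 0.0 <= power_reduction < 1.0:
        raise ValueError("power_reduction must be in [0, 1)")
    return (1.0 + perf_gain) / (1.0 - power_reduction)


# Assumption: both quoted figures hold simultaneously.
ratio = perf_per_watt_gain(perf_gain=0.07, power_reduction=0.04)
print(f"perf/W ratio vs. predecessor: {ratio:.3f}")  # prints 1.115
```

Under that assumption the two figures compound to roughly an 11% gain in performance per watt; if they were measured at different operating points, the ratio would not apply.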

Scalability is also extended, with additional support from the DynamIQ Shared Unit (DSU). A smaller 32 KB L1 cache is available as an option alongside the 64 KB configuration. To offset the smaller L1, the branch predictor is better at covering irregular search patterns and can follow two taken branches per cycle, which results in fewer L1 cache misses and helps hide pipeline bubbles to keep the core well supplied. The pipeline is one cycle longer than the A77's, which helps the A78 hit its clock frequency target of around 3 GHz. The A78 is a 6-instruction-per-cycle design.

ARM also added a second integer multiply unit to the execution unit and an additional load address generation unit (AGU), increasing data load bandwidth by 50%. Other optimizations of the core include fused instructions[5] and efficiency improvements to the instruction schedulers, register renaming structures, and the re-order buffer.

The L2 cache is available in sizes up to 512 KB and has double the bandwidth to maximize performance, while the shared L3 cache is available in sizes up to 4 MB, double that of previous generations. The DynamIQ Shared Unit (DSU) also allows for an 8 MB configuration with the ARM Cortex-X1.[3][4][2][6]

Variants


Cortex-A78C


The Cortex-A78C is targeted at productivity and gaming applications. It increases the maximum core count per cluster from 4 to 8 and the maximum L3 cache from 4 MB to 8 MB.[7]

Cortex-A78AE


The Cortex-A78AE is targeted for security/safety applications.[8]

Licensing


The Cortex-A78 is available as a SIP core to licensees, and its design makes it suitable for integration with other SIP cores (e.g. GPU, display controller, DSP, image processor) into one die constituting a system on a chip (SoC).

Usage


The Cortex-A78 was first used in the Samsung Exynos 2100 SoC, introduced in November and December 2020 respectively.[9][10] The custom Kryo 680 Gold core used in the Snapdragon 888 SoC is based on the Cortex-A78 microarchitecture.[11][12] The Cortex-A78 is also used in the MediaTek Dimensity 1200 and 8000 series, in Nvidia's BlueField-3 and 3X DPUs, and in the HiSilicon Kirin 9000s, released in August 2023.

from Grokipedia
The ARM Cortex-A78 is a high-performance, power-efficient central processing unit (CPU) core developed by Arm as part of its DynamIQ shared-memory multiprocessing technology. It implements the Armv8.2-A 64-bit instruction set architecture with extensions for enhanced scalar, vector, and floating-point processing. Designed primarily for premium mobile devices, laptops, and emerging form factors such as foldables, it supports up to four cores per DynamIQ cluster and features configurable L1 instruction and data caches of 32 KB or 64 KB each, along with a private L2 cache of 256 KB or 512 KB per core. Announced in May 2020, the Cortex-A78 emphasizes sustained performance for demanding workloads such as gaming, extended reality (XR), and machine learning (ML), delivering up to 20% better sustained performance than its predecessor, the Cortex-A77, within the same mobile thermal power envelope.

Key architectural enhancements include improved branch prediction, instruction fusion, and load/store optimizations, resulting in 7% higher single-threaded performance and 4% lower power consumption at a given performance point compared to the Cortex-A77. The core also uses 8% less power for ML-based tasks, contributing to 10% overall efficiency gains in AI-driven applications. It supports 40-bit physical addressing and interfaces such as AMBA CHI for coherent memory access, and it integrates with little cores such as the Cortex-A55 in heterogeneous big.LITTLE configurations, enabling multi-day battery life in 5G-enabled smartphones and tablets. The Cortex-A78 powers flagship systems-on-chip (SoCs) from vendors such as Qualcomm, MediaTek, and Samsung, bridging the performance gap between mobile and laptop computing with scalability up to 3 GHz clock speeds in optimized designs.

Variants include the Cortex-A78AE for safety-critical automotive and industrial applications, offering lock-step dual-core redundancy and 48-bit addressing, and the Cortex-A78C for client PCs, supporting up to eight cores with 8 MB of shared L3 cache and frequencies up to 3.3 GHz. These adaptations highlight the design's versatility while prioritizing energy efficiency amid the rise of 5G and AI workloads.

Introduction

Overview

The ARM Cortex-A78 is a 64-bit CPU core compatible with the Armv8.2-A architecture, designed to deliver high performance with optimized power for demanding applications. As the fourth-generation premium core in Arm's DynamIQ lineup, it builds on the Austin family microarchitecture to enable sustained performance improvements while staying within the thermal constraints typical of mobile devices.

The core targets flagship smartphones, tablets, mainstream laptops, and 5G-enabled devices that support immersive experiences such as extended reality (XR), machine learning tasks, and multi-screen interactions. It plays a central role in bridging performance gaps between mobile and laptop computing, facilitating innovations in foldable devices and energy-efficient architectures.

The Cortex-A78 can be deployed in clusters of 1 to 4 cores within Arm's DynamIQ technology, and it is compatible with big.LITTLE heterogeneous architectures that pair high-performance "big" cores with efficient "LITTLE" cores such as the Cortex-A55. This flexibility allows system designers to balance performance, power, and area across diverse device form factors.

Development and Announcement

The ARM Cortex-A78 was officially announced on May 26, 2020, during Arm Tech Day 2020, as part of the company's 2020 mobile IP portfolio reveal. The event highlighted the core's role in advancing mobile capabilities amid the rise of 5G. The design originated from Arm's facility in Austin, where the team focused on creating a processor that could support evolving device form factors.

Development of the Cortex-A78 was driven by the need to close the performance divide between mobile devices and laptops, while emphasizing power efficiency to accommodate power-hungry 5G applications and enable multi-day battery life in premium smartphones and emerging devices such as foldables. Positioned as the direct successor to the Cortex-A77, it incorporated refinements aimed at mitigating the thermal throttling observed in sustained workloads on its predecessor, such as prolonged video playback or multitasking. These goals targeted a 20% uplift in sustained performance within the same thermal power envelope as the A77.

The Cortex-A78 became available to licensees in late 2020, allowing partners to integrate the core into their system-on-chip designs. The first commercial implementations appeared in 2021 SoCs, powering flagship mobile devices from several manufacturers.

Architecture

ISA and Extensions

The ARM Cortex-A78 implements the Armv8.2-A instruction set architecture (ISA), operating primarily in the 64-bit AArch64 execution state for high-performance applications. It maintains backward compatibility with the 32-bit AArch32 execution state, supporting the Thumb-2 instruction set at Exception Level 0 (EL0) only, so that legacy software can still run in user mode.

Key extensions enhance the core's capabilities for specific workloads. These include the dot product instructions that became mandatory in Armv8.4-A, which accelerate machine learning operations by enabling efficient 8-bit integer (INT8) dot product computations on vectors. The core also supports the half-precision floating-point (FP16) extension from Armv8.2-A, allowing half-precision operations in Advanced SIMD (NEON) for improved performance in graphics and AI tasks without full double-precision overhead. Additionally, it includes the Armv8.1-A and Armv8.2-A extensions for atomic operations, along with partial Armv8.3-A support limited to the Load-Acquire RCpc (LDAPR) instructions. It does not implement the Scalable Vector Extension (SVE) or SVE2, which arrive in subsequent cores such as the Cortex-A710. Cryptographic extensions for AES and SHA processing are optional but commonly integrated to bolster data security in system-on-chip designs.

Security features are integral to the Cortex-A78, with full support for Arm TrustZone technology to isolate the normal and secure worlds for trusted execution environments. The core does not support Armv8.3-A Pointer Authentication (PAC) or Armv8.5-A Branch Target Identification (BTI), nor Armv9-A features such as SVE2, positioning it firmly within the Armv8-A ecosystem.
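
To make the INT8 dot product operation concrete, the following is a minimal software model of what the A64 `SDOT` instruction computes in its 4 × 32-bit lane form: each 32-bit accumulator lane adds the sum of four signed 8-bit products. It is a sketch of the arithmetic only, not of how the core executes it; the function names (`to_int8`, `wrap_int32`, `sdot_4s`) are hypothetical helpers.

```python
def to_int8(x: int) -> int:
    """Interpret the low 8 bits of x as a signed 8-bit value."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def wrap_int32(x: int) -> int:
    """Wrap x to a signed 32-bit value (accumulators do not saturate)."""
    x &= 0xFFFFFFFF
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x


def sdot_4s(acc, a, b):
    """Model SDOT with four int32 lanes and sixteen int8 elements per source.

    Lane i receives acc[i] plus the dot product of a[4i:4i+4] and b[4i:4i+4].
    """
    if len(acc) != 4 or len(a) != 16 or len(b) != 16:
        raise ValueError("expected 4 accumulator lanes and 16 elements per source")
    out = []
    for lane in range(4):
        total = acc[lane]
        for i in range(4):
            total += to_int8(a[4 * lane + i]) * to_int8(b[4 * lane + i])
        out.append(wrap_int32(total))
    return out


acc = [0, 10, 0, 0]
a = [1, 2, 3, 4] * 4
b = [1, 1, 1, 1, -1, -1, -1, -1, 2, 2, 2, 2, 0, 0, 0, 0]
print(sdot_4s(acc, a, b))  # prints [10, 0, 20, 0]
```

A single instruction therefore performs sixteen multiplies and sixteen additions, which is why quantized neural-network inference maps well onto it.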

Microarchitecture Details

The ARM Cortex-A78 employs an out-of-order superscalar microarchitecture, which handles instruction dependencies efficiently while balancing performance and power consumption. Instructions are fetched and decoded into macro-operations (MOPs), which may be fused for optimization, before being split into micro-operations (μOPs) for dispatch to the execution backend. This design supports up to 6 MOPs dispatched per cycle, with a maximum of 12 μOPs per cycle under certain constraints.

The integer pipeline comprises 13 stages, facilitating high throughput while minimizing latency for common operations. Load-to-use latency is 4 cycles for L1 cache hits, with dual-issue support for loads and stores to enhance memory access efficiency. The core does not implement simultaneous multithreading (SMT); each core runs a single hardware thread, which favors power efficiency in mobile and embedded applications.

Key execution units include three integer arithmetic logic units (ALUs) for single-cycle operations, two dedicated branch units, and two load/store units capable of performing two 16-byte loads or one 32-byte store per cycle. The floating-point and Advanced SIMD unit provides 128-bit vector processing, with two pipelines (V0 and V1) for scalar and SIMD instructions, enabling efficient execution of multimedia and AI workloads. Branch prediction uses a hybrid mechanism combining TAGE and gshare predictors, offering improved accuracy and bandwidth (up to two taken branches per cycle) over previous generations to reduce misprediction penalties and boost overall instruction throughput.

The core is configurable for area optimization, particularly targeting 5 nm nodes, allowing licensees to balance die size, performance, and power through options such as cache sizing and extension inclusion while adhering to Armv8.2-A compatibility.
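
The per-cycle figures above translate into theoretical peak rates once a clock frequency is chosen. The sketch below does that arithmetic for the load/store widths and the MOP dispatch width quoted in this section. The 3.0 GHz clock is an assumption taken from the upper end of the frequencies mentioned in this article, and the results are upper bounds that real workloads will not sustain; `peak_rate` is a hypothetical helper.

```python
def peak_rate(units_per_cycle: float, clock_hz: float) -> float:
    """Theoretical peak rate in units per second."""
    return units_per_cycle * clock_hz


CLOCK_HZ = 3.0e9  # assumed clock; actual frequency depends on the SoC

LOAD_BYTES_PER_CYCLE = 2 * 16   # two 16-byte loads per cycle
STORE_BYTES_PER_CYCLE = 1 * 32  # one 32-byte store per cycle
MOPS_PER_CYCLE = 6              # macro-operations dispatched per cycle

load_gb_s = peak_rate(LOAD_BYTES_PER_CYCLE, CLOCK_HZ) / 1e9
store_gb_s = peak_rate(STORE_BYTES_PER_CYCLE, CLOCK_HZ) / 1e9
mops_per_s = peak_rate(MOPS_PER_CYCLE, CLOCK_HZ)

print(f"peak L1 load bandwidth:  {load_gb_s:.0f} GB/s")   # 96 GB/s
print(f"peak L1 store bandwidth: {store_gb_s:.0f} GB/s")  # 96 GB/s
print(f"peak dispatch rate:      {mops_per_s / 1e9:.0f} G MOPs/s")  # 18
```

Loads and stores come out at the same byte rate per cycle here because two 16-byte loads and one 32-byte store both move 32 bytes.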

Performance and Power Efficiency

Improvements over Predecessors

The ARM Cortex-A78 delivers a 20% increase in sustained performance over its predecessor, the Cortex-A77, within the same mobile thermal power envelope of 1 W, as measured by SPECint2006 scores where the A78 achieves higher throughput without exceeding power limits. This uplift addresses limitations in prior cores by prioritizing longevity under load, enabling more consistent operation in demanding scenarios. The core also provides a 7% gain in peak single-threaded performance through design refinements, allowing brief bursts of higher speed before thermal constraints apply.

Efficiency advancements are particularly notable, with up to a 50% power reduction compared to 2019 devices at equivalent Cortex-A77 performance levels, achieved through optimizations that minimize power draw across workloads. In ML tasks specifically, the A78 consumes 8% less power than the A77 for the same output, contributing to an overall 10% efficiency improvement that reduces throttling and extends battery life in multi-day usage patterns. These gains stem from broader architectural refinements, including wider execution bandwidth via an expanded dispatch width, enhanced branch prediction with greater accuracy and bandwidth to cut stalls, and an optimized dispatch queue that sustains high instruction throughput without the rapid drops seen in the A77 under prolonged stress.

Building on the microarchitectural lineage of the Cortex-A76 and A77 family, rooted in Armv8.2-A, the A78 refines these foundations for 5G-era demands such as high-resolution video streaming, balancing peak capabilities with sustained efficiency to support immersive, always-on experiences. It maintains compatibility with DynamIQ clustering while introducing targeted enhancements that better handle the irregular workloads common in modern mobile ecosystems.

Clock Speeds and Benchmarks

The ARM Cortex-A78 typically operates at clock speeds from 2.4 GHz to 3.0 GHz in smartphone implementations, such as the Qualcomm Snapdragon 888, where the three performance cores run at 2.4 GHz. In tablet and laptop configurations it can reach up to 3.0 GHz, as seen in MediaTek's Kompanio 1380 SoC, which uses four Cortex-A78 cores clocked at this frequency.

Benchmark results highlight the core's balance of performance and efficiency. In standardized tests on early 2021 implementations such as the Snapdragon 888, a Geekbench 5 single-core score of approximately 1135 and a multi-core score of around 3794 were recorded in a 1+3+4 core configuration (one Cortex-X1 prime core, three A78 cores, and four Cortex-A55 cores). For integer workloads, the core delivers a SPECint2006 base score of about 30 at 1 W per core, supporting multi-day battery life in mobile devices under typical thermal constraints. Compared to the predecessor Cortex-A77, the A78 exhibits 8% lower power draw for ML inference tasks, contributing to a 10% overall efficiency improvement in heterogeneous big.LITTLE configurations. These results vary with SoC integration, process technology (e.g., 5 nm), and cooling solution, and the data comes primarily from 2021 devices such as those powered by the Snapdragon 888.

Variants

Cortex-A78C

The Cortex-A78C is a variant of the Cortex-A78 CPU core, announced by ARM on November 2, 2020, and designed specifically for high-core-count configurations in compute-intensive devices. It builds on the base Cortex-A78 by extending support for larger DynamIQ clusters, enabling up to eight high-performance cores per cluster compared to the standard four-core limit of the Cortex-A78. The variant supports a shared L3 cache of up to 8 MB, double the 4 MB maximum of the standard Cortex-A78, to improve multi-threaded performance under heavy core utilization.

Cache configurations remain flexible, with L1 instruction and data caches each configurable at 32 KB or 64 KB per core, private L2 caches at 256 KB or 512 KB per core, and the optional L3 cache scaling up to 8 MB for the cluster. These features, combined with an interconnect fabric optimized for larger clusters, deliver better performance for shared-cache multi-threaded workloads while preserving the power efficiency of the underlying Cortex-A78 design.

Targeted at always-on laptops and automotive systems, the Cortex-A78C supports demanding applications such as productivity tasks and gaming, where sustained multi-core performance is critical. It maintains compatibility with the Mali-G78 GPU, allowing integration into system-on-chip (SoC) designs for these markets.
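
The configuration limits described above can be captured in a small validity check. The sketch below is illustrative only: the `ClusterConfig` type, the `LIMITS` table, and `validate` are hypothetical names, not part of any Arm tool, and the 512 KB lower bound for an L3 cache is taken from the infobox figures earlier in this article.

```python
from dataclasses import dataclass

# Per-variant limits as stated in this article (sizes in KB).
LIMITS = {
    "A78": {"max_cores": 4, "max_l3_kb": 4096},
    "A78C": {"max_cores": 8, "max_l3_kb": 8192},
}
L1_OPTIONS_KB = (32, 64)    # per cache (instruction and data), per core
L2_OPTIONS_KB = (256, 512)  # private, per core
MIN_L3_KB = 512             # assumed lower bound when an L3 is present


@dataclass(frozen=True)
class ClusterConfig:
    variant: str  # "A78" or "A78C"
    cores: int    # big cores in the cluster
    l1_kb: int
    l2_kb: int
    l3_kb: int    # 0 means no L3 cache


def validate(cfg: ClusterConfig) -> list:
    """Return a list of problems; an empty list means the config fits."""
    if cfg.variant not in LIMITS:
        return [f"unknown variant {cfg.variant!r}"]
    limits = LIMITS[cfg.variant]
    problems = []
    if not 1 <= cfg.cores <= limits["max_cores"]:
        problems.append(f"cores must be 1..{limits['max_cores']}")
    if cfg.l1_kb not in L1_OPTIONS_KB:
        problems.append("L1 must be 32 or 64 KB")
    if cfg.l2_kb not in L2_OPTIONS_KB:
        problems.append("L2 must be 256 or 512 KB")
    if cfg.l3_kb != 0 and not MIN_L3_KB <= cfg.l3_kb <= limits["max_l3_kb"]:
        problems.append(f"L3 must be 0 or 512..{limits['max_l3_kb']} KB")
    return problems


laptop = ClusterConfig("A78C", cores=8, l1_kb=64, l2_kb=512, l3_kb=8192)
print(validate(laptop))  # prints []
print(validate(ClusterConfig("A78", cores=8, l1_kb=64, l2_kb=512, l3_kb=8192)))
```

The second call reports two problems, because the same eight-core, 8 MB layout exceeds both limits of the base Cortex-A78.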

Cortex-A78AE

The Cortex-A78AE is a variant of the ARM Cortex-A78 processor core, engineered for safety-critical applications in automotive environments. Announced on September 29, 2020, as part of Arm's expanded automotive portfolio, it builds on the Cortex-A78 while incorporating dedicated hardware for functional safety compliance.

Key differences from the standard Cortex-A78 include support for ASIL-D certification under the ISO 26262 standard, as well as up to SIL 3, achieved through features such as dual-core lock-step execution modes and integrated safety mechanisms in the execution units. In lock-step mode, pairs of cores run identical workloads in parallel and compare outputs to detect faults and ensure deterministic behavior, while hybrid modes allow flexible allocation of safety levels without sacrificing performance. These enhancements enable extended diagnostics, such as parity checks on instruction caches and ECC on data caches, alongside safety wrappers that monitor and isolate potential errors in real time. The core maintains the performance profile of the Cortex-A78, offering approximately 30% higher single-threaded performance than its predecessor, the Cortex-A76AE, at equivalent power efficiency.

Targeted at advanced driver-assistance systems (ADAS), in-vehicle infotainment, and autonomous driving platforms, the Cortex-A78AE supports configurations of up to four cores per cluster, organized as two dual-core pairs. Its cache structures are similar to those of the base Cortex-A78 (64 KB L1 instruction cache with parity protection, 64 KB L1 data cache with ECC, and 512 KB private L2 cache per core) but are augmented with safety-specific monitoring to meet automotive reliability requirements. This design facilitates deployment in domain controllers and software-defined vehicles, where high reliability is paramount, without compromising the energy efficiency inherited from the Cortex-A78 family.
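
The lock-step idea described above, running the same work twice and comparing results, can be illustrated with a short software analogy. This is only a sketch of the principle: on the Cortex-A78AE the comparison is done in hardware, and none of the names below (`LockstepFault`, `run_lockstep`, `faulty_add`) correspond to a real Arm interface.

```python
class LockstepFault(Exception):
    """Raised when the two redundant executions disagree."""


def run_lockstep(primary, shadow, *args):
    """Run the same computation on two 'cores' and compare the outputs.

    primary and shadow stand in for the two cores of a lock-step pair;
    a mismatch is treated as a detected fault rather than a result.
    """
    result_a = primary(*args)
    result_b = shadow(*args)
    if result_a != result_b:
        raise LockstepFault(f"mismatch: {result_a!r} != {result_b!r}")
    return result_a


def add(x, y):
    return x + y


def faulty_add(x, y):
    # Simulates a transient fault by flipping the lowest bit of the result.
    return (x + y) ^ 1


print(run_lockstep(add, add, 2, 3))  # prints 5

try:
    run_lockstep(add, faulty_add, 2, 3)
except LockstepFault as exc:
    print("fault detected:", exc)  # prints fault detected: mismatch: 5 != 4
```

The trade-off is visible even in this toy form: two execution resources produce one result, which is why hybrid modes that relax redundancy for less critical work are useful.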

Licensing and Implementations

Licensing

The ARM Cortex-A78 is offered as a semiconductor intellectual property (SIP) core, delivered in synthesizable form so that partners can integrate it into custom systems-on-chip (SoCs) under ARM's core license agreements. These grant the right to use the pre-designed processor without the broader design freedoms of an architectural license. The model supports scalability across high-performance mobile and embedded applications while maintaining compatibility with the Armv8.2-A architecture.

Designed for integration within ARM's DynamIQ technology, the Cortex-A78 pairs with complementary IP blocks such as Mali GPUs for graphics processing, Ethos NPUs for ML acceleration, and display controllers, with the CPU cores connected via the DynamIQ Shared Unit (DSU) to form heterogeneous clusters. The DSU provides the shared L3 cache and interconnect, and it allows customization during RTL synthesis to optimize for specific area, power, and performance targets.

Licensing involves substantial upfront fees, coupled with per-unit royalties on manufactured and shipped chips, negotiated based on volume and integration scope. There are no open-source releases or free licensing tiers for the Cortex-A78, which restricts access to qualified commercial partners. The agreements are non-exclusive, permitting multiple licensees to implement the core independently, though access to certain proprietary implementation details may require non-disclosure agreements (NDAs) beyond the publicly available Technical Reference Manual. The core became available to partners in late 2020, following its announcement earlier that year.

SoC Integrations and Devices

The ARM Cortex-A78 core has been integrated into numerous high-performance system-on-chip (SoC) designs, primarily for premium mobile devices, typically in big.LITTLE configurations alongside efficiency cores such as the Cortex-A55. Key implementations include Qualcomm's Snapdragon 888, announced in December 2020, which features one Cortex-X1 prime core at 2.84 GHz, three Cortex-A78 performance cores at 2.42 GHz, and four Cortex-A55 efficiency cores at 1.8 GHz, fabricated on a 5 nm process. MediaTek's Dimensity 1200, launched in January 2021, employs one Cortex-A78 core at 3.0 GHz, three at 2.6 GHz, and four Cortex-A55 cores at 2.0 GHz on a 6 nm node, targeting mid-to-high-end smartphones. Samsung's Exynos 1080, introduced in November 2020, uses one Cortex-A78 core at 2.8 GHz, three at 2.6 GHz, and four Cortex-A55 cores at 2.0 GHz on 5 nm, marking an early adoption for balanced performance. The Exynos 2100, which debuted in January 2021 for flagship devices, mirrors the Snapdragon 888's structure with one Cortex-X1 at 2.91 GHz, three Cortex-A78 at 2.81 GHz, and four Cortex-A55 at 2.2 GHz on 5 nm. Other notable integrations include UNISOC's T820 series (launched in 2022), featuring up to four Cortex-A78 cores at 2.7 GHz for mid-range Android devices.

These SoCs have powered a range of consumer devices, particularly smartphones in the premium segment. Notable examples include the Samsung Galaxy S21 series, which used either the Snapdragon 888 or the Exynos 2100 depending on region, delivering enhanced connectivity and multitasking capabilities. Devices with the MediaTek Dimensity 1200, such as the Realme GT Neo 2 and vivo V23 Pro, emphasized fast charging and camera performance in mid-range flagships. Beyond phones, the Cortex-A78 appears in tablets and laptops via MediaTek's Kompanio series, such as the Kompanio 1300T (based on the Dimensity 1300, with four Cortex-A78 cores at up to 3.0 GHz), integrated into Chromebooks such as the Acer Chromebook Spin 714 for efficient web-based computing.

Typical cluster configurations pair 1–4 Cortex-A78 cores for high-performance tasks with 4–6 Cortex-A55 cores for background efficiency, enabling sustained operation in power-constrained environments. These were among the first widespread 5 nm implementations in 2021 flagships, contributing to better thermal management and battery life. The adoption of Cortex-A78-based SoCs facilitated the rollout of premium devices with improved multi-day battery performance and influenced the shift toward more efficient architectures. By 2023, the core began phasing out in favor of successors such as the Cortex-A710 in new designs, though it persists in some embedded and automotive applications for its proven reliability.
