Project Denver
| General information | |
|---|---|
| Launched | 2014 (Denver), 2016 (Denver 2) |
| Designed by | Nvidia |
| Cache | |
| L1 cache | 192 KiB per core (128 KiB I-cache with parity, 64 KiB D-cache with ECC) |
| L2 cache | 2 MiB @ 2 cores |
| Architecture and classification | |
| Technology node | 28 nm (Denver 1) to 16 nm (Denver 2) |
| Instruction set | ARMv8-A |
| Physical specifications | |
| Cores | 2 |
| General information | |
|---|---|
| Launched | 2018 |
| Designed by | Nvidia |
| Max. CPU clock rate | up to 2.3 GHz |
| Cache | |
| L1 cache | 192 KiB per core (128 KiB I-cache with parity, 64 KiB D-cache with ECC) |
| L2 cache | 2 MiB @ 2 cores |
| L3 cache | 4 MiB @ 8 cores (T194)[1] |
| Architecture and classification | |
| Technology node | 12 nm |
| Instruction set | ARMv8.2-A |
| Physical specifications | |
| Cores | 8 |
- For the Soviet HIV disinformation campaign, see Operation Denver.
Project Denver is the codename of a central processing unit designed by Nvidia that implements the ARMv8-A 64/32-bit instruction sets using a combination of a simple hardware decoder and software-based binary translation (dynamic recompilation): "Denver's binary translation layer runs in software, at a lower level than the operating system, and stores commonly accessed, already optimized code sequences in a 128 MB cache stored in main memory".[2] Denver is a very wide in-order superscalar pipeline. Its design makes it suitable for integration with other SIP cores (e.g. GPU, display controller, DSP, image processor) into one die, constituting a system on a chip (SoC).
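The mechanism quoted above — run frequently executed code through an optimizer once, then serve later executions from a cache — can be sketched in a few lines. This is a toy model, not Nvidia's implementation: the hot-path threshold, the block address, and the "translation" (here just a placeholder transformation) are all invented for illustration.

```python
# Toy model of a binary-translation layer with a software-managed
# translation cache. All names and thresholds are illustrative.

HOT_THRESHOLD = 3  # assumed: executions before a block is worth optimizing

class TranslationLayer:
    def __init__(self):
        self.exec_counts = {}   # block address -> times executed unoptimized
        self.cache = {}         # block address -> translated (optimized) form
        self.decoder_runs = 0   # executions that fell back to the slow decoder

    def execute(self, addr, arm_block):
        """Run one code block, using the translation cache when possible."""
        if addr in self.cache:
            return self.cache[addr]          # fast path: reuse optimized code
        self.exec_counts[addr] = self.exec_counts.get(addr, 0) + 1
        if self.exec_counts[addr] >= HOT_THRESHOLD:
            # One-time "optimization": a placeholder transformation standing
            # in for translation into internal VLIW micro-ops.
            self.cache[addr] = tuple(op.upper() for op in arm_block)
            return self.cache[addr]
        self.decoder_runs += 1               # slow path: plain decode/execute
        return tuple(arm_block)

layer = TranslationLayer()
for _ in range(10):                          # a hot loop body at address 0x1000
    result = layer.execute(0x1000, ["add", "ldr", "str"])

print(layer.decoder_runs)                    # only the first runs hit the decoder
print(result)                                # later runs come from the cache
```

In the real design the translations live in the 128 MB region of main memory hidden from the operating system; a plain dictionary stands in for that region here.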
Project Denver is targeted at mobile computers, personal computers, servers, and supercomputers.[3] The cores have been integrated into Nvidia's Tegra SoC series. The original Denver cores were designed for the 28 nm process node (Tegra model T132, aka "Tegra K1"). Denver 2 was an improved design built for the smaller, more efficient 16 nm node (Tegra model T186, aka "Tegra X2").
In 2018, Nvidia released an improved design (codename: "Carmel") based on ARMv8.2-A (64-bit)[1] with a 10-way superscalar pipeline, functional safety features, dual execution, and parity & ECC protection. It was integrated into the Tegra Xavier SoC, which offers a total of 8 cores (4 dual-core pairs).[4][failed verification] The Carmel CPU core supports full Advanced SIMD (ARM NEON), VFP (Vector Floating Point), and ARMv8.2-FP16.[1] The first third-party tests of Carmel cores, performed in September 2018 on the Jetson AGX development kit, indicated noticeably higher performance than predecessor systems, with the caveat that such quick test setups have limited rigor.[5] The Carmel design is found in the Tegra model T194 ("Tegra Xavier"), manufactured with a 12 nm structure size.
Overview
- Pipelined in-order superscalar processor
- 2-way decoder for ARM instructions
- On-the-fly binary translation of ARM code into internal VLIW instructions by a hardware translator, with software emulation as a fallback
- Translation can reorder ARM instructions and remove ones that do not contribute to the result[2]
- Up to 7 micro-ops per clock cycle with translated VLIW instructions; the VLIW path cannot run simultaneously with the ARM decoder
- L1 cache: 128 KiB instruction + 64 KiB data per core (4-way set-associative)
- 2 MiB shared L2 cache between two Denver cores (16-way set-associative)[6]
- Denver also sets aside 128 MiB of main memory to store translated VLIW code; this part of memory is inaccessible to the operating system.
- Up to 2.5 GHz clock speeds on TSMC's 28 nm process[7]
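The cache parameters above fix how a memory address is carved into tag, set index, and byte offset. The sketch below computes that split for the 64 KiB 4-way L1 data cache; the 64-byte line size is an assumption for illustration, since the sources here do not state it.

```python
# Address decomposition for a set-associative cache.
# Geometry from the list above (64 KiB, 4-way); the 64-byte line size is assumed.

def cache_geometry(size_bytes, ways, line_bytes):
    """Return (num_sets, offset_bits, index_bits) for a set-associative cache."""
    num_sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 of the line size
    index_bits = num_sets.bit_length() - 1      # log2 of the set count
    return num_sets, offset_bits, index_bits

def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# 64 KiB / (4 ways x 64-byte lines) = 256 sets -> 6 offset bits, 8 index bits.
sets, off_b, idx_b = cache_geometry(64 * 1024, 4, 64)
print(sets)                                   # 256
print(split_address(0x12345678, off_b, idx_b))
```

Applying the same helper to the 2 MiB 16-way L2 (again assuming 64-byte lines) gives 2048 sets, i.e. an 11-bit index.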
Chips
A dual-core Denver CPU was paired with a Kepler-based GPU to form the Tegra K1; the dual-core 2.3 GHz Denver-based K1 was first used in the HTC Nexus 9 tablet, released November 3, 2014.[8][9] Note, however, that the quad-core Tegra K1, while using the same name, is not based on Denver.
The Nvidia Tegra X2 has two Denver 2 cores paired with four Cortex-A57 cores using a coherent HMP (Heterogeneous Multi-Processor Architecture) approach.[10] The CPU cluster is combined with a Pascal GPU.
The Tegra Xavier has a Volta GPU and several special-purpose accelerators. Its 8 Carmel CPU cores are divided into 4 macro blocks (each containing 2 cores), connected to one another through a crossbar and sharing 4 MiB of L3 cache.
History
The existence of Project Denver was revealed at the 2011 Consumer Electronics Show.[11] In a March 4, 2011 Q&A article, CEO Jen-Hsun Huang revealed that Project Denver was a five-year 64-bit ARMv8-A CPU development effort on which hundreds of engineers had already worked for three and a half years, and which also has 32-bit ARM instruction set (ARMv7) backward compatibility.[12] Project Denver started at Stexar, a Colorado-based company, as an x86-compatible processor using binary translation, similar to projects by Transmeta. Stexar was acquired by Nvidia in 2006.[13][14][15]
According to Tom's Hardware, there are engineers from Intel, AMD, HP, Sun and Transmeta on the Denver team, and they have extensive experience designing superscalar CPUs with out-of-order execution, very long instruction words (VLIW) and simultaneous multithreading (SMT).[16]
According to Charlie Demerjian, the Project Denver CPU may internally translate the ARM instructions to an internal instruction set, using firmware in the CPU.[17] Also according to Demerjian, Project Denver was originally intended to support both ARM and x86 code using code morphing technology from Transmeta, but was changed to the ARMv8-A 64-bit instruction set because Nvidia could not obtain a license to Intel's patents.[17]
The first consumer device shipping with Denver CPU cores, Google's Nexus 9, was announced on October 15, 2014. The tablet was manufactured by HTC and features the dual-core Tegra K1 SoC. The Nexus 9 was the first 64-bit Android device available to consumers.[18]
See also
References
- ^ a b c NVIDIA Jetson AGX Xavier Delivers 32 TeraOps for New Era of AI in Robotics by Dustin Franklin (Nvidia development team for Jetson), December 12, 2018
- ^ a b Wasson, Scott (August 11, 2014). "Nvidia claims Haswell-class performance for Denver CPU core". The Tech Report. Retrieved August 14, 2014.
- ^ Dally, Bill (January 5, 2011). ""PROJECT DENVER" PROCESSOR TO USHER IN NEW ERA OF COMPUTING". Official Nvidia blog.
- ^ NVIDIA Drive Xavier SOC Detailed by Hassan Mujtaba on Jan 8, 2018 via WccfTech
- ^ "A Quick Test of NVIDIA's "Carmel" CPU Performance".
- ^ Hachman, Mark (August 11, 2014). "Nvidia reveals PC-like performance for 'Denver' Tegra K1". PC World. Retrieved September 19, 2014.
- ^ Anthony, Sebastian (January 6, 2014). "Tegra K1 64-bit Denver core analysis: Are Nvidia's x86 efforts hidden within?". ExtremeTech. Retrieved January 7, 2014.
- ^ "Nexus 9 storms through Geekbench, Tegra K1 outperforms Apple iPhone 6's A8". 16 October 2014.
- ^ Shimpi, Anand (January 5, 2014). "NVIDIA Announces Tegra K1 SoC with Optional Denver CPU Cores". Anandtech. Archived from the original on January 7, 2014. Retrieved January 6, 2014.
- ^ NVIDIA Unveils Tegra Parker SOC at Hot Chips – Built on 16nm TSMC Process, Features Pascal and Denver 2 Duo Architecture, August 22, 2016
- ^ Nvidia's press conference webcast: http://www.nvidia.com/object/ces2011.html
- ^ Takahashi, Dean (March 4, 2011). "Q&A: Nvidia chief explains his strategy for winning in mobile computing".
- ^ Valich, Theo (December 12, 2011). "NVIDIA Project Denver "Lost in Rockies", to Debut in 2014-15".
- ^ Miller, Paul (October 19, 2006). "NVIDIA has x86 CPU in the works?". Engadget. Retrieved October 19, 2013.
- ^ Valich, Theo (March 20, 2013). "New Tegra Roadmap Reveals Logan, Parker and Kayla CUDA Strategy".
- ^ Parrish, Kevin (October 14, 2013). "64-bit Nvidia Tegra 6 "Parker" Chip May Arrive in 2014. Devices with a 64-bit Tegra 6 could launch before the end of 2014". Tom's Hardware & ExtremeTech. Retrieved October 19, 2013.
- ^ a b Demerjian, Charlie (August 5, 2011). "What is Project Denver based on?". Semiaccurate.
- ^ Amadeo, Ron (October 15, 2014). "Google announces Nexus 6, Nexus 9, Nexus Player, and Android 5.0 Lollipop".
External links
- Valich, Theo (September 20, 2012). "NVIDIA Project Boulder Revealed: Tegra's Competitor Hides in GPU Group".
- Gwennap, Linley (August 18, 2014). "Nvidia's First CPU Is a Winner. Denver Uses Dynamic Translation to Outperform Mobile Rivals" (PDF). MPR, Linley Group.
Project Denver
Introduction
Overview
Project Denver is the codename for NVIDIA's custom central processing unit (CPU) core that implements the ARMv8-A instruction set architecture, supporting both 64-bit (AArch64) and 32-bit (AArch32) modes for full compatibility.[1] The core purpose of Project Denver is to combine the energy efficiency characteristic of ARM processors—traditionally dominant in mobile devices—with the computational demands of personal computers and servers, achieved through tightly integrated CPU and GPU designs that leverage NVIDIA's expertise in parallel processing.[2][4] This initiative targets a broad spectrum of applications, from tablets and personal computers to data center servers and supercomputers, enabling scalable performance across diverse computing environments.[2] By developing its own ARM-compatible CPU, NVIDIA extends the ARM ecosystem beyond low-power mobile applications into high-performance computing, fostering innovations in heterogeneous computing architectures.[2]

Objectives and Scope
Project Denver was initiated by NVIDIA with the strategic objective of developing high-performance, energy-efficient central processing units (CPUs) based on the ARM architecture to challenge the dominance of x86 processors in personal computers, servers, and supercomputing environments.[2] This effort aimed to leverage ARM's reduced instruction set computing (RISC) design principles to deliver superior power efficiency while maintaining competitive performance levels across diverse computing platforms.[5] The scope of Project Denver extended beyond initial mobile system-on-chips (SoCs) in the Tegra series, evolving toward integrated hybrid CPU-GPU architectures intended for widespread adoption in both consumer and enterprise applications, including tablets, workstations, and cloud infrastructure.[2] Through a strategic partnership with ARM Holdings, NVIDIA secured an architectural license to create fully custom CPU cores based on the ARM architecture, enabling tailored optimizations for advanced computing needs.[6] Anticipated benefits encompassed enhanced power efficiency to address the inefficiencies of traditional x86 systems, scalability for emerging workloads such as graphics processing and data analytics, and deep ecosystem integration with NVIDIA's parallel GPU technologies for accelerated computing.[5] These features positioned Project Denver as a foundational step toward heterogeneous computing paradigms that combine general-purpose processing with specialized acceleration.[2]

History
Origins and Announcement
Prior to the official launch of Project Denver, NVIDIA explored developing an x86-compatible CPU in the late 2000s, licensing Transmeta's Tokamak technology—a RISC-based design intended for low-power translation of x86 instructions—to target server and personal computer markets. Rumors of this x86 development using Transmeta technology emerged in late 2009.[7] This effort, which began quietly around 2007, marked NVIDIA's initial foray into general-purpose CPU design, aiming to leverage Transmeta's expertise in efficient x86 emulation for competitive entry into high-performance computing.[8] Due to legal challenges associated with x86 intellectual property, the project pivoted to ARM architecture.[9] On January 5, 2011, at the Consumer Electronics Show (CES) in Las Vegas, NVIDIA publicly announced Project Denver as an initiative to design custom high-performance ARM-based CPU cores, integrated with its GPUs on a single chip.[2] The announcement highlighted NVIDIA's ambition to challenge x86 dominance in computing by harnessing ARM's low-power efficiency and open ecosystem for applications spanning personal computers, servers, workstations, and supercomputers.[2] CEO Jen-Hsun Huang emphasized the project's role in enabling "Internet Everywhere" devices with advanced operating systems and parallel computing capabilities.[10] To support this endeavor, NVIDIA formed a dedicated CPU design group, building on its 2007 internal efforts, and secured an architecture license from ARM Holdings to develop proprietary cores based on future ARM instruction sets.[2] This investment extended to the broader ARM ecosystem, including licensing the Cortex-A15 processor for initial Tegra integrations, positioning NVIDIA to innovate within ARM's growing influence in high-end computing.[2]

Development Challenges and Architectural Shift
Following the 2011 announcement of Project Denver, NVIDIA encountered significant legal constraints stemming from its earlier licensing of Transmeta's x86 intellectual property, particularly the Tokamak technology designed for translating x86 code into a RISC instruction set.[9] These issues, which arose amid broader x86 patent litigations in the industry, ultimately forced NVIDIA to abandon its original x86-based plans for the processor.[11] As former Transmeta executive Dave Ditzel noted, "It originally started as an x86 but through certain legal issues, had to turn itself into an Arm CPU."[11] Following the pivot from x86, Project Denver was publicly announced as ARM-based in 2011, with a commitment to the ARMv8 instruction set by 2012 to enable 64-bit compatibility while leveraging its expertise in GPU integration for heterogeneous computing.[12] This redesign transformed Project Denver into a custom ARMv8-A CPU core, emphasizing dynamic code optimization (DCO) to bridge ARM's mobile heritage with server-grade performance needs.[9] The transition presented notable technical challenges, as ARM was primarily optimized for low-power mobile applications, requiring adaptations for high-performance workloads. Key hurdles included managing power efficiency in a superscalar, out-of-order execution model, where traditional designs incurred high energy costs and complexity; NVIDIA addressed this through DCO, which optimized hot code paths to deliver over seven ARM instructions per cycle while reducing branch misprediction penalties by up to 37% compared to contemporary ARM cores like the Cortex-A15. Scalability issues arose in balancing core performance with thermal and power budgets, particularly for integration with NVIDIA's GPU architectures, necessitating innovations like the CC4 retention state to lower voltage during short idle periods under 100 ms. 
Validation of the tightly coupled hardware-software system also proved complex, relying on extensive cosimulation to ensure reliability across AArch32 and AArch64 modes.

Design and Architecture
Microarchitecture Details
The Denver microarchitecture employs a dual-issue in-order pipeline as its core execution model, capable of natively dispatching up to two ARM instructions per cycle, while achieving out-of-order-like performance through dynamic code optimization (DCO) that translates and optimizes guest ARM code into native micro-operations for superscalar execution.[13] This DCO mechanism simulates out-of-order execution by enabling register renaming, loop unrolling, load hoisting, and redundancy elimination in translated code blocks, stored in a dedicated optimization cache to boost throughput beyond the hardware's in-order limitations.[13] The design supports the full ARMv8-A instruction set architecture, including AArch64 for 64-bit addressing, AArch32 compatibility mode, and extensions for virtualization, cryptography, and advanced SIMD (NEON).[13] The integer pipeline comprises 15 stages, structured to minimize load-use dependencies through a skewed design that delays register file reads by three cycles after L1 data cache access, facilitating efficient load-ALU-store bundling and intrabundle forwarding. 
Branch misprediction incurs a 13-cycle penalty, addressed by an advanced predictor incorporating a global history buffer, branch target buffer, return address stack, and indirect target predictor, which achieves up to 37% lower mispredict rates compared to contemporary ARM cores like Cortex-A15.[13] The execution backend features seven wide superscalar units, including two integer ALUs (one with multiplier support), two 128-bit FP/NEON units, two load/store units, and a dedicated branch unit, enabling peak dispatch of seven micro-operations per cycle under DCO.[13] Cache hierarchies are configured for balanced latency and capacity in power-constrained environments, with a 128 KB four-way set-associative L1 instruction cache, a 64 KB four-way L1 data cache (three-cycle load-to-use latency), and a shared 2 MB 16-way L2 cache per dual-core cluster (18-cycle latency).[13] Translation lookaside buffers include a 128-entry four-way I-TLB, a 256-entry eight-way D-TLB supporting multiple page sizes, and a 2048-entry L2 TLB, complemented by a hardware prefetcher tracking up to 32 streams to mitigate misses in irregular access patterns.[13] The initial implementation targeted the 28 nm HPM process node, with clock speeds ranging from 1 GHz in low-power modes to up to 2.5 GHz for peak performance.[13]

Key Innovations and Features
Project Denver introduced several innovative features that extended beyond the standard ARMv8 architecture, focusing on performance optimization, system integration, and efficiency tailored for NVIDIA's Tegra SoCs. A cornerstone innovation was its dynamic code optimization (DCO) mechanism, which employed a just-in-time (JIT) compiler to translate and optimize frequently executed ARM code regions on-the-fly. This approach identified "hot" code paths during runtime, recompiling them into more efficient micro-operations that reduced branch mispredictions and instruction redundancies, achieving up to 7 instructions per cycle in optimized workloads.[1] The CPU-GPU synergy in Project Denver represented a significant advancement in heterogeneous computing, with the Denver cores tightly integrated alongside NVIDIA's GPU within the Tegra K1 SoC. This on-chip architecture facilitated low-latency data sharing and unified memory access, enabling seamless task offloading between the CPU and GPU for compute-intensive applications like graphics rendering and parallel processing. By leveraging NVIDIA's CUDA ecosystem, the design supported direct CPU-to-GPU communication without external interfaces, enhancing overall system throughput in mobile and embedded scenarios.[14] Power efficiency was another key focus, incorporating adaptive voltage scaling and fine-grained clock gating optimized for battery-powered devices. The adaptive voltage scaling dynamically adjusted supply voltages based on workload demands, entering low-power states like CC4 during idle periods to minimize leakage while maintaining quick resumption. Complementing this, fine-grained clock gating disabled clocks to inactive pipeline stages and peripherals, achieving linear power scaling and 87% higher Dhrystone MIPS per watt compared to the Qualcomm APQ8084 at similar power levels.[1] Security extensions in Project Denver built upon ARM TrustZone by integrating NVIDIA-specific hardware root of trust mechanisms. 
This included secure boot processes rooted in immutable boot ROM and fused keys, ensuring authenticated code execution within isolated TrustZone environments to protect sensitive operations from software attacks. The hardware root of trust protected optimized regions against changes due to coherent I/O or CPU traffic, providing a robust foundation for trusted computing in Tegra-based systems.[1]

Implementations
Tegra K1 Integration
The Tegra K1-64 represented the inaugural commercial integration of Project Denver cores into NVIDIA's mobile system-on-chip lineup, announced in January 2014 alongside the broader Tegra K1 family at CES. This 64-bit variant featured NVIDIA's custom-designed Denver CPU architecture, marking a shift from off-the-shelf ARM cores to in-house development for enhanced performance in mobile computing. Architectural details of the Denver integration were further elaborated in August 2014, highlighting its dynamic code optimization and wide superscalar design for superior single-threaded efficiency. The chip began shipping in consumer devices later that year, with the Google Nexus 9 tablet serving as the flagship example, released in October 2014.[15][16][17] At its core, the Tegra K1-64 employed a dual-core Denver configuration clocked up to 2.5 GHz, paired with a 192-core Kepler GPU derived from NVIDIA's desktop graphics architecture to deliver PC-level rendering capabilities in a compact form. This setup supported advanced features like DirectX 11 and OpenGL 4.4, enabling high-fidelity gaming and multimedia on mobile platforms. Manufactured on TSMC's 28 nm HPM process, the SoC maintained a low-power envelope of approximately 5-10 W, optimized for battery-constrained environments while balancing compute demands. The Denver cores, building on the microarchitecture detailed in prior project phases, provided a 64-bit ARMv8 execution model with 7-way superscalar pipelines for improved instruction throughput.[18][14][19] Beyond tablets, the Tegra K1-64 found applications in gaming handhelds and early Android ecosystems, powering immersive experiences in devices like the NVIDIA Shield series derivatives. In automotive infotainment, it enabled advanced visual computing modules for in-vehicle systems, supporting Android-based interfaces, navigation, and multimedia rendering.
These deployments underscored the chip's versatility in delivering high-performance graphics and processing within power-sensitive, embedded scenarios.[20]

Project Denver 2 and Later Iterations
Following the initial implementation in the Tegra K1, NVIDIA developed Project Denver 2 as an enhanced iteration of its custom ARMv8-compatible CPU core, aimed at delivering superior single-threaded performance through advanced dynamic code optimization techniques. This second-generation design incorporated improvements to the original Denver's in-order pipeline, enabling higher instructions per cycle (IPC) rates—up to 7 micro-operations per cycle in optimized scenarios—while maintaining compatibility with ARMv8-A instruction sets. The core featured a wider execution pipeline and refined branch prediction mechanisms, including a global history buffer and return stack buffer, to reduce misprediction penalties and boost overall efficiency.[21][22] Announced as part of NVIDIA's 2015 roadmap during the Tegra X1 unveiling at CES, Denver 2 was initially planned for integration into the Tegra X1 SoC to provide out-of-order-like performance via binary translation and just-in-time compilation, targeting mobile and embedded applications with enhanced power efficiency on the 20 nm process. However, due to development timelines and a strategic "tick-tock" approach prioritizing rapid market entry with proven ARM IP, NVIDIA opted to replace Denver 2 with off-the-shelf ARM cores (four high-performance Cortex-A57 and four efficiency Cortex-A53 cores) in the final Tegra X1 design released later that year. This shift allowed the Tegra X1 to achieve broad adoption in devices like the NVIDIA Shield TV and Google Pixel C, while deferring custom core deployment.[23][24] Denver 2 ultimately debuted in 2016 within the Tegra X2 (codenamed Parker) SoC, fabricated on TSMC's 16 nm process, where it paired two Denver 2 cores with four Cortex-A57 cores in a heterogeneous big.LITTLE configuration alongside a 256-core Pascal GPU.
This integration powered automotive and AI platforms such as the NVIDIA Drive PX 2 and Jetson TX2, delivering up to 1.5 times the CPU performance of the Tegra X1 while emphasizing perf/watt gains for edge computing tasks.[25][26] Beyond mobile SoCs, NVIDIA explored Project Denver variants for server and data center use cases around 2014–2015, envisioning high-performance ARM-based processors to compete in cloud and HPC environments with superior energy efficiency over x86 alternatives. These efforts, building on the original Denver's architecture, were ultimately shelved amid shifting priorities toward GPU-accelerated computing and partnerships with ARM licensees.[27] The experiences from Project Denver iterations informed NVIDIA's later custom CPU developments, notably the Grace CPU Superchip announced in 2021, which employs proprietary ARM Neoverse V1 cores optimized for data center workloads, achieving up to 10 times the performance of contemporary server CPUs in AI and HPC scenarios through high-bandwidth NVLink interconnects and scalable coherency. This marked a revival of NVIDIA's in-house CPU ambitions, leveraging lessons in dynamic optimization and ARM ecosystem integration from the Denver lineage.

Impact and Legacy
Performance Evaluations
The Tegra K1 implementation of Project Denver, featuring dual 64-bit cores clocked up to 2.5 GHz, delivered competitive CPU performance in synthetic benchmarks suitable for mobile devices. In Geekbench 3 tests on devices like the Google Nexus 9, it recorded single-core scores of approximately 1,900 points, placing it on par with low-end Intel Core i3 processors such as the 4th-generation mobile variants in single-threaded workloads. Multi-core scores reached around 3,000 points, benefiting from the cores' high clock speeds despite the dual-core configuration. These results highlighted Denver's focus on single-thread efficiency over multi-thread parallelism compared to quad-core ARM contemporaries. Efficiency evaluations underscored Project Denver's advantages in power-constrained mobile scenarios, particularly when integrated with the Tegra K1's Kepler-based GPU. NVIDIA reported that the GPU provided 1.5 times the performance per watt of competing mobile graphics solutions, enabling up to twice the efficiency in graphics-intensive tasks like rendering and video processing relative to x86 equivalents in similar power envelopes. CPU power consumption under load typically ranged from 4-6 W, supporting extended battery life in tablets while outperforming ARM rivals like the Cortex-A15 in floating-point operations by up to 3x per core. In real-world applications on NVIDIA Shield devices, the Tegra K1 with Denver cores excelled in Android gaming and early 64-bit software. Titles such as Dead Trigger 2 and Real Racing 3 achieved frame rates exceeding 50 fps at high resolutions, while 64-bit apps like Google Maps ran smoothly with reduced latency compared to 32-bit counterparts. This performance extended to multimedia tasks, including 4K video decoding at 30 fps, demonstrating practical viability for gaming handhelds and tablets. 
Limitations emerged in sustained workloads, where thermal throttling could occur to maintain temperatures below 90°C, potentially reducing clock speeds after prolonged use in compact form factors. Additionally, Denver's in-order execution pipeline resulted in lower instructions per cycle (IPC) than the out-of-order Cortex-A15 in select integer-heavy tasks, such as certain database operations, despite overall higher clock-for-clock gains in other areas.

Discontinuation and Industry Influence
In the mid-2010s, NVIDIA discontinued further development of custom Project Denver cores primarily due to the high complexity and extended timelines associated with in-house CPU design, opting instead for off-the-shelf ARM Cortex cores to expedite product releases. This shift became evident with the Tegra X1 SoC in 2015, which employed ARM Cortex-A57 and Cortex-A53 cores rather than Denver derivatives, allowing faster integration into mobile and embedded devices.[28] Intense market competition exacerbated this decision, as Qualcomm's Snapdragon series dominated Android devices with optimized, volume-produced SoCs, while Apple's custom A-series chips set performance benchmarks in iOS ecosystems, marginalizing NVIDIA's Tegra lineup.[29] The Tegra K1 remained the final major implementation featuring Denver cores. Despite its discontinuation, Project Denver exerted significant influence on the broader ARM ecosystem by pioneering high-performance, custom ARM CPU designs targeted at servers and supercomputers, which helped catalyze industry-wide interest in ARM-based data center solutions. 
This early demonstration of ARM's viability for demanding workloads contributed to the momentum behind server-grade ARM adoption, exemplified by Amazon Web Services' Graviton processors, which leverage custom ARM cores for cloud computing efficiency.[30] Within NVIDIA, the project laid foundational expertise that paved the way for subsequent Arm-based innovations, including the Grace CPU superchip and its integration with Hopper GPUs for AI and high-performance computing.[31] On the mobile front, Project Denver accelerated the transition to 64-bit ARM architectures, with the Tegra K1's Denver CPU enabling the first 64-bit ARM processor in Android devices by late 2014, prompting Google to prioritize 64-bit support in Android 5.0 Lollipop and influencing ecosystem-wide upgrades.[32] As of 2025, Project Denver's legacy endures in NVIDIA's AI server CPUs, such as the Vera CPU, which reintroduces custom ARM cores for enhanced performance in data centers, though without reviving the Denver architecture directly.[33]

References
- https://en.wikichip.org/wiki/nvidia/microarchitectures/denver
