AMD APU
from Wikipedia

AMD APU
A-series APU
Release date: 2011 (original); 2017 (Zen-based)
Codename: Fusion
Desna
Ontario
Zacate
Llano
Hondo
Trinity
Weatherford
Richland
Kaveri
Godavari
Kabini
Temash
Carrizo
Bristol Ridge
Raven Ridge
Picasso
Renoir
Cezanne
Phoenix
IGP
Wrestler
WinterPark
BeaverCreek
Architecture: AMD64
Models
Cores: 1 to 8
Transistors
  • 32 nm 1.178B (Llano)
  • 32 nm 1.303B (Trinity)
  • 32 nm 1.3B (Richland)
  • 28 nm 2.41B (Kaveri)
  • 14 nm 4.95B (Raven Ridge)
  • 12 nm (Picasso)
  • 7 nm (Renoir & Cezanne)
  • 6 nm (Rembrandt)
  • 4 nm (Phoenix)
API support
OpenCL: 1.2
OpenGL: 4.1+
DirectX: Direct3D 11, Direct3D 12
History
Predecessor: Athlon II, Sempron
Successor: Ryzen, Zen-based Athlon

AMD Accelerated Processing Unit (APU), formerly known as Fusion, is a series of 64-bit microprocessors from Advanced Micro Devices (AMD), combining a general-purpose AMD64 central processing unit (CPU) and 3D integrated graphics processing unit (IGPU) on a single die.

AMD announced the first-generation APUs, Llano for high-performance and Brazos for low-power devices, in January 2011 and launched the first units on June 14.[1][2] The second-generation Trinity for high-performance and Brazos-2 for low-power devices were announced in June 2012. The third-generation Kaveri for high-performance devices was launched in January 2014, while Kabini and Temash for low-power devices were announced in the summer of 2013. Since the launch of the Zen microarchitecture, Ryzen and Athlon APUs have been released to the global market as Raven Ridge on the DDR4 platform, after Bristol Ridge a year prior.

AMD has also supplied semi-custom APUs for consoles starting with the release of Sony PlayStation 4 and Microsoft Xbox One eighth generation video game consoles.

History


The AMD Fusion project started in 2006 with the aim of developing a system on a chip that combined a CPU with a GPU on a single die. This effort was moved forward by AMD's acquisition of graphics chipset manufacturer ATI[3] in 2006. The project reportedly required three internal iterations of the Fusion concept to create a product deemed worthy of release.[3] Reasons contributing to the delay of the project include the technical difficulties of combining a CPU and GPU on the same die at a 45 nm process, and conflicting views on what the role of the CPU and GPU should be within the project.[4]

The first generation desktop and laptop APU, codenamed Llano, was announced on 4 January 2011 at the 2011 Consumer Electronics Show in Las Vegas and released shortly thereafter.[5][6] It featured K10 CPU cores and a Radeon HD 6000 series GPU on the same die on the FM1 socket. An APU for low-power devices was announced as the Brazos platform, based on the Bobcat microarchitecture and a Radeon HD 6000 series GPU on the same die.[7]

At a conference in January 2012, corporate fellow Phil Rogers announced that AMD would re-brand the Fusion platform as the Heterogeneous System Architecture (HSA), stating that "it's only fitting that the name of this evolving architecture and platform be representative of the entire, technical community that is leading the way in this very important area of technology and programming development."[8] However, it was later revealed that AMD had been the subject of a trademark infringement lawsuit by the Swiss company Arctic, who used the name "Fusion" for a line of power supply products.[9]

The second generation desktop and laptop APU, codenamed Trinity, was announced at AMD's 2010 Financial Analyst Day[10][11] and released in October 2012.[12] It featured Piledriver CPU cores and Radeon HD 7000 series GPU cores on the FM2 socket.[13] AMD released a new APU based on the Piledriver microarchitecture on 12 March 2013 for laptops and mobile devices and on 4 June 2013 for desktops under the codename Richland.[14] The second generation APU for low-power devices, Brazos 2.0, used exactly the same APU chip but ran at a higher clock speed, rebranded the GPU as Radeon HD 7000 series, and used a new I/O controller chip.

Semi-custom chips were introduced in the Microsoft Xbox One and Sony PlayStation 4 video game consoles,[15][16] and subsequently in the Microsoft Xbox Series X|S and Sony PlayStation 5 consoles.

A third generation of the technology was released on 14 January 2014, featuring greater integration between CPU and GPU. The desktop and laptop variant is codenamed Kaveri, based on the Steamroller architecture, while the low-power variants, codenamed Kabini and Temash, are based on the Jaguar architecture.[17]

Since the introduction of Zen-based processors, AMD has branded its APUs as Ryzen with Radeon Graphics and Athlon with Radeon Graphics. Desktop units carry a G suffix in their model numbers (e.g. Ryzen 5 3400G and Athlon 3000G) to distinguish them from regular processors, which have no integrated graphics or only basic graphics, and to differentiate them from the former Bulldozer-era A-series APUs. The mobile counterparts were always paired with Radeon Graphics regardless of suffix.
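
The desktop naming rule described above can be illustrated with a small check. This is only a sketch: the helper name `has_g_suffix` and the regular expression are illustrative assumptions, and the pattern covers just the plain G suffix from the two examples in the text, not other suffix variants.

```python
import re

# Hypothetical pattern: a model number that ends in digits followed by "G".
# It covers only the plain G suffix mentioned in the text.
_G_SUFFIX = re.compile(r"\d+G$")


def has_g_suffix(model: str) -> bool:
    """Return True if a desktop model name ends in a numeric part plus 'G'."""
    return bool(_G_SUFFIX.search(model.strip()))


if __name__ == "__main__":
    for name in ("Ryzen 5 3400G", "Athlon 3000G", "Ryzen 5 3600"):
        print(name, has_g_suffix(name))
```

A trailing-suffix match is used rather than a lookup table so the sketch does not need a list of model numbers, which the text does not provide.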

Features


Heterogeneous System Architecture


AMD is a founding member of the Heterogeneous System Architecture (HSA) Foundation and is consequently actively working on developing HSA in cooperation with other members. The following hardware and software implementations are available in AMD's APU-branded products:

HSA features by type, with the year and products in which each group was first implemented:

Optimized Platform (first implemented 2012, Trinity APUs)
  • GPU Compute C++ Support: Supports OpenCL C++ directions and Microsoft's C++ AMP language extension. This eases programming of the CPU and GPU working together to process parallel workloads.
  • HSA-aware MMU: The GPU can access the entire system memory through the translation services and page fault management of the HSA MMU.
  • Shared Power Management: The CPU and GPU share the power budget. Priority goes to the processor most suited to the current tasks.

Architectural Integration (first implemented 2014, PlayStation 4 and Kaveri APUs)
  • Heterogeneous Memory Management (the CPU's MMU and the GPU's IOMMU share the same address space[18][19]): The CPU and GPU access memory with the same address space. Pointers can be freely passed between CPU and GPU, enabling zero-copy.
  • Fully coherent memory between CPU and GPU: The GPU can access and cache data from coherent memory regions in system memory, and can also reference data from the CPU's cache. Cache coherency is maintained.
  • GPU uses pageable system memory via CPU pointers: The GPU can take advantage of the shared virtual memory between CPU and GPU, and pageable system memory can be referenced directly by the GPU instead of being copied or pinned before access.

System Integration (first implemented 2015, Carrizo APU)
  • GPU compute context switch: Compute tasks on the GPU can be context switched, allowing a multi-tasking environment and also faster interpretation between applications, compute and graphics.
  • GPU graphics pre-emption: Long-running graphics tasks can be pre-empted so that processes have low-latency access to the GPU.
  • Quality of service:[18] In addition to context switch and pre-emption, hardware resources can be either equalized or prioritized among multiple users and applications.
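
The zero-copy behaviour described above can be illustrated by analogy in plain Python. This sketch does not touch a GPU or any HSA API: a `bytearray` stands in for a buffer in shared system memory, `bytes(...)` stands in for the older copy-based model, and `memoryview` stands in for passing a pointer. The function name `zero_copy_demo` is hypothetical.

```python
def zero_copy_demo() -> tuple:
    """Contrast a copied buffer with a shared view of the same buffer."""
    # Stand-in for a buffer allocated in shared system memory.
    shared = bytearray(8)

    # Copy model: the second processor receives its own private copy.
    copied = bytes(shared)

    # Zero-copy model: the second processor receives a reference (a view)
    # onto the very same bytes, analogous to passing a pointer.
    view = memoryview(shared)

    # A write through the view is visible in the original buffer,
    # while the earlier copy is unaffected.
    view[0] = 0xFF
    return shared[0], copied[0]


if __name__ == "__main__":
    print(zero_copy_demo())  # (255, 0)
```

The point of the analogy is only that a shared reference avoids the duplicate buffer and keeps both sides looking at the same data, which is what a common address space makes possible.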

Feature overview


The following table shows features of AMD's processors with 3D graphics, including APUs (see also: List of AMD processors with 3D graphics).

Platform High, standard and low power Low and ultra-low power
Codename Server Basic Toronto
Micro Kyoto
Desktop Performance Raphael Phoenix
Mainstream Llano Trinity Richland Kaveri Kaveri Refresh (Godavari) Carrizo Bristol Ridge Raven Ridge Picasso Renoir Cezanne
Entry
Basic Kabini Dalí
Mobile Performance Renoir Cezanne Rembrandt Dragon Range
Mainstream Llano Trinity Richland Kaveri Carrizo Bristol Ridge Raven Ridge Picasso Renoir
Lucienne
Cezanne
Barceló
Phoenix
Entry Dalí Mendocino
Basic Desna, Ontario, Zacate Kabini, Temash Beema, Mullins Carrizo-L Stoney Ridge Pollock
Embedded Trinity Bald Eagle Merlin Falcon,
Brown Falcon
Great Horned Owl Grey Hawk Ontario, Zacate Kabini Steppe Eagle, Crowned Eagle,
LX-Family
Prairie Falcon Banded Kestrel River Hawk
Released Aug 2011 Oct 2012 Jun 2013 Jan 2014 2015 Jun 2015 Jun 2016 Oct 2017 Jan 2019 Mar 2020 Jan 2021 Jan 2022 Sep 2022 Jan 2023 Jan 2011 May 2013 Apr 2014 May 2015 Feb 2016 Apr 2019 Jul 2020 Jun 2022 Nov 2022
CPU microarchitecture K10 Piledriver Steamroller Excavator "Excavator+"[20] Zen Zen+ Zen 2 Zen 3 Zen 3+ Zen 4 Bobcat Jaguar Puma Puma+[21] "Excavator+" Zen Zen+ "Zen 2+"
ISA x86-64 v1 x86-64 v2 x86-64 v3 x86-64 v4 x86-64 v1 x86-64 v2 x86-64 v3
Socket Desktop Performance AM5
Mainstream AM4
Entry FM1 FM2 FM2+ FM2+[a], AM4 AM4
Basic AM1 FP5
Other FS1 FS1+, FP2 FP3 FP4 FP5 FP6 FP7 FL1 FP7
FP7r2
FP8
FT1 FT3 FT3b FP4 FP5 FT5 FP5 FT6
PCI Express version 2.0 3.0 4.0 5.0 4.0 2.0 3.0
CXL
Fab. (nm) GF 32SHP
(HKMG SOI)
GF 28SHP
(HKMG bulk)
GF 14LPP
(FinFET bulk)
GF 12LP
(FinFET bulk)
TSMC N7
(FinFET bulk)
TSMC N6
(FinFET bulk)
CCD: TSMC N5
(FinFET bulk)

cIOD: TSMC N6
(FinFET bulk)
TSMC 4nm
(FinFET bulk)
TSMC N40
(bulk)
TSMC N28
(HKMG bulk)
GF 28SHP
(HKMG bulk)
GF 14LPP
(FinFET bulk)
GF 12LP
(FinFET bulk)
TSMC N6
(FinFET bulk)
Die area (mm2) 228 246 245 245 250 210[22] 156 180 210 CCD: (2x) 70
cIOD: 122
178 75 (+ 28 FCH) 107 ? 125 149 ~100
Min TDP (W) 35 17 12 10 15 65 35 4.5 4 3.95 10 6 12 8
Max APU TDP (W) 100 95 65 45 170 54 18 25 6 54 15
Max stock APU base clock (GHz) 3 3.8 4.1 4.1 3.7 3.8 3.6 3.7 3.8 4.0 3.3 4.7 4.3 1.75 2.2 2 2.2 3.2 2.6 1.2 3.35 2.8
Max APUs per node[b] 1 1
Max core dies per CPU 1 2 1 1
Max CCX per core die 1 2 1 1
Max cores per CCX 4 8 2 4 2 4
Max CPU[c] cores per APU 4 8 16 8 2 4 2 4
Max threads per CPU core 1 2 1 2
Integer pipeline structure 3+3 2+2 4+2 4+2+1 1+3+3+1+2 1+1+1+1 2+2 4+2 4+2+1
i386, i486, i586, CMOV, NOPL, i686, PAE, NX bit, CMPXCHG16B, AMD-V, RVI, ABM, and 64-bit LAHF/SAHF Yes Yes
IOMMU[d] v2 v1 v2
BMI1, AES-NI, CLMUL, and F16C Yes Yes
MOVBE Yes
AVIC, BMI2, RDRAND, and MWAITX/MONITORX Yes
SME[e], TSME[e], ADX, SHA, RDSEED, SMAP, SMEP, XSAVEC, XSAVES, XRSTORS, CLFLUSHOPT, CLZERO, and PTE Coalescing Yes Yes
GMET, WBNOINVD, CLWB, QOS, PQE-BW, RDPID, RDPRU, and MCOMMIT Yes Yes
MPK, VAES Yes
SGX
FPUs per core 1 0.5 1 1 0.5 1
Pipes per FPU 2 2
FPU pipe width 128-bit 256-bit 80-bit 128-bit 256-bit
CPU instruction set SIMD level SSE4a[f] AVX AVX2 AVX-512 SSSE3 AVX AVX2
3DNow! 3DNow!+
PREFETCH/PREFETCHW Yes Yes
GFNI Yes
AMX
FMA4, LWP, TBM, and XOP Yes Yes
FMA3 Yes Yes
AMD XDNA Yes
L1 data cache per core (KiB) 64 16 32 32
L1 data cache associativity (ways) 2 4 8 8
L1 instruction caches per core 1 0.5 1 1 0.5 1
Max APU total L1 instruction cache (KiB) 256 128 192 256 512 256 64 128 96 128
L1 instruction cache associativity (ways) 2 3 4 8 2 3 4 8
L2 caches per core 1 0.5 1 1 0.5 1
Max APU total L2 cache (MiB) 4 2 4 16 1 2 1 2
L2 cache associativity (ways) 16 8 16 8
Max on-die L3 cache per CCX (MiB) 4 16 32 4
Max 3D V-Cache per CCD (MiB) 64
Max total in-CCD L3 cache per APU (MiB) 4 8 16 64 4
Max. total 3D V-Cache per APU (MiB) 64
Max. board L3 cache per APU (MiB)
Max total L3 cache per APU (MiB) 4 8 16 128 4
APU L3 cache associativity (ways) 16 16
L3 cache scheme Victim Victim
Max. L4 cache
Max stock DRAM support DDR3-1866 DDR3-2133 DDR3-2133, DDR4-2400 DDR4-2400 DDR4-2933 DDR4-3200, LPDDR4-4266 DDR5-4800, LPDDR5-6400 DDR5-5200 DDR5-5600, LPDDR5x-7500 DDR3L-1333 DDR3L-1600 DDR3L-1866 DDR3-1866, DDR4-2400 DDR4-2400 DDR4-1600 DDR4-3200 LPDDR5-5500
Max DRAM channels per APU 2 1 2 1 2
Max stock DRAM bandwidth (GB/s) per APU 29.866 34.132 38.400 46.932 68.256 102.400 83.200 120.000 10.666 12.800 14.933 19.200 38.400 12.800 51.200 88.000
GPU microarchitecture TeraScale 2 (VLIW5) TeraScale 3 (VLIW4) GCN 2nd gen GCN 3rd gen GCN 5th gen[23] RDNA 2 RDNA 3 TeraScale 2 (VLIW5) GCN 2nd gen GCN 3rd gen[23] GCN 5th gen RDNA 2
GPU instruction set TeraScale instruction set GCN instruction set RDNA instruction set TeraScale instruction set GCN instruction set RDNA instruction set
Max stock GPU base clock (MHz) 600 800 844 866 1108 1250 1400 2100 2400 400 538 600 ? 847 900 1200 600 1300 1900
Max stock GPU base GFLOPS[g] 480 614.4 648.1 886.7 1134.5 1760 1971.2 2150.4 3686.4 102.4 86 ? ? ? 345.6 460.8 230.4 1331.2 486.4
3D engine[h] Up to 400:20:8 Up to 384:24:6 Up to 512:32:8 Up to 704:44:16[24] Up to 512:32:8 768:48:8 128:8:4 80:8:4 128:8:4 Up to 192:12:8 Up to 192:12:4 192:12:4 Up to 512:?:? 128:?:?
IOMMUv1 IOMMUv2 IOMMUv1 ? IOMMUv2
Video decoder UVD 3.0 UVD 4.2 UVD 6.0 VCN 1.0[25] VCN 2.1[26] VCN 2.2[26] VCN 3.1 ? UVD 3.0 UVD 4.0 UVD 4.2 UVD 6.2 VCN 1.0 VCN 3.1
Video encoder VCE 1.0 VCE 2.0 VCE 3.1 VCE 2.0 VCE 3.4
AMD Fluid Motion No Yes No No Yes No
GPU power saving PowerPlay PowerTune PowerPlay PowerTune[27]
TrueAudio Yes[28] ? Yes
FreeSync 1
2
1
2
HDCP[i] ? 1.4 2.2 2.3 ? 1.4 2.2 2.3
PlayReady[i] 3.0 not yet 3.0 not yet
Supported displays[j] 2–3 2–4 3 3 (desktop)
4 (mobile, embedded)
4 2 3 4 4
/drm/radeon[k][30][31] Yes Yes
/drm/amdgpu[k][32] Yes[33] Yes[33]
  1. ^ For FM2+ Excavator models: A8-7680, A6-7480 & Athlon X4 845.
  2. ^ A PC would be one node.
  3. ^ An APU combines a CPU and a GPU. Both have cores.
  4. ^ Requires firmware support.
  5. ^ a b Requires firmware support.
  6. ^ No SSE4. No SSSE3.
  7. ^ Single-precision performance is calculated from the base (or boost) core clock speed based on a FMA operation.
  8. ^ Unified shaders : texture mapping units : render output units
  9. ^ a b To play protected video content, it also requires card, operating system, driver, and application support. A compatible HDCP display is also needed for this. HDCP is mandatory for the output of certain audio formats, placing additional constraints on the multimedia setup.
  10. ^ To feed more than two displays, the additional panels must have native DisplayPort support.[29] Alternatively active DisplayPort-to-DVI/HDMI/VGA adapters can be employed.
  11. ^ a b DRM (Direct Rendering Manager) is a component of the Linux kernel. Support in this table refers to the most current version.
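
The GFLOPS note above (single precision, computed from the clock speed on the basis of an FMA operation) and the "unified shaders : texture mapping units : render output units" notation can be combined into a short worked example. This is a sketch under stated assumptions: each shader is taken to complete one FMA (counted as two floating-point operations) per clock, the function names are hypothetical, and the pairing of a shader count with a clock speed is an assumption, because the flattened table does not preserve column alignment.

```python
from typing import Tuple


def parse_engine(config: str) -> Tuple[int, int, int]:
    """Split 'shaders:TMUs:ROPs' (e.g. '400:20:8') into three integers."""
    shaders, tmus, rops = (int(part) for part in config.split(":"))
    return shaders, tmus, rops


def fp32_gflops(shaders: int, clock_mhz: float) -> float:
    """Peak single-precision GFLOPS, assuming one FMA (2 FLOPs) per shader per clock."""
    return shaders * 2 * clock_mhz / 1000.0


if __name__ == "__main__":
    # Assumed pairings: 400 shaders at 600 MHz and 384 shaders at 800 MHz.
    for config, clock in (("400:20:8", 600), ("384:24:6", 800)):
        shaders, _, _ = parse_engine(config)
        print(config, clock, round(fp32_gflops(shaders, clock), 1))
```

With these assumed pairings the results come out at about 480 and 614.4 GFLOPS, which agree with the first two values in the table's GFLOPS row.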

APU or Radeon Graphics branded platforms


AMD APUs have CPU modules, cache, and a discrete-class graphics processor, all on the same die using the same bus. This architecture allows compute frameworks such as OpenCL to use the integrated graphics processor as an accelerator.[34] The goal is to create a "fully integrated" APU, which, according to AMD, will eventually feature 'heterogeneous cores' capable of processing both CPU and GPU work automatically, depending on the workload requirement.[35]

TeraScale-based GPU


K10 architecture (2011): Llano

AMD A6-3650 (Llano)

The first generation APU, released in June 2011, was used in both desktops and laptops. It was based on the K10 architecture and built on a 32 nm process, featuring two to four CPU cores with a thermal design power (TDP) of 65–100 W and integrated graphics based on the Radeon HD 6000 series with support for DirectX 11, OpenGL 4.2 and OpenCL 1.2. In performance comparisons against the similarly priced Intel Core i3-2105, the Llano APU was criticised for its poor CPU performance[38] and praised for its better GPU performance.[39][40] AMD was later criticised for abandoning Socket FM1 after one generation.[41]

Bobcat architecture (2011): Ontario, Zacate, Desna, Hondo


The AMD Brazos platform was introduced on 4 January 2011, targeting the subnotebook, netbook and low-power small form factor markets.[5] It features the 9-watt AMD C-Series APU (codename: Ontario) for netbooks and low-power devices as well as the 18-watt AMD E-Series APU (codename: Zacate) for mainstream and value notebooks, all-in-ones and small form factor desktops. Both APUs feature one or two Bobcat x86 cores and a Radeon Evergreen Series GPU with full DirectX 11, DirectCompute and OpenCL support, including UVD3 video acceleration for HD video up to 1080p.[5]

AMD expanded the Brazos platform on 5 June 2011 with the announcement of the 5.9-watt AMD Z-Series APU (codename: Desna) designed for the tablet market.[42] The Desna APU is based on the 9-watt Ontario APU. Energy savings were achieved by lowering the CPU, GPU and northbridge voltages, reducing the idle clocks of the CPU and GPU, and introducing a hardware thermal control mode.[42] A bidirectional turbo core mode was also introduced.

AMD announced the Brazos-T platform on 9 October 2012. It comprised the 4.5-watt AMD Z-Series APU (codenamed Hondo) and the A55T Fusion Controller Hub (FCH), designed for the tablet computer market.[43][44] The Hondo APU is a redesign of the Desna APU. AMD lowered energy use by optimizing the APU and FCH for tablet computers.[45][46]

The Deccan platform, including the Krishna and Wichita APUs, was cancelled in 2011. AMD had originally planned to release them in the second half of 2012.[47]

Piledriver architecture (2012): Trinity and Richland

Piledriver-based AMD APUs
An AMD A4-5300 for desktop systems
An AMD A10-4600M for mobile systems
Trinity

The first iteration of the second generation platform, released in October 2012, brought improvements in CPU and GPU performance to both desktops and laptops. The platform features two to four Piledriver CPU cores built on a 32 nm process with a TDP between 65 W and 100 W, and a GPU based on the Radeon HD 7000 series with support for DirectX 11, OpenGL 4.2, and OpenCL 1.2. The Trinity APU was praised for the improvements to CPU performance compared to the Llano APU.[50]

Richland
  • "Enhanced Piledriver" CPU cores[51]
  • Temperature Smart Turbo Core technology. An advancement of the existing Turbo Core technology, which allows internal software to adjust the CPU and GPU clock speed to maximise performance within the constraints of the Thermal design power of the APU.[52]
  • New low-power consumption CPUs with only 45 W TDP[53]

This second iteration of the generation was released on 12 March 2013 for mobile parts and on 5 June 2013 for desktop parts.

Graphics Core Next-based GPU


Jaguar architecture (2013): Kabini and Temash


In January 2013 the Jaguar-based Kabini and Temash APUs were unveiled as the successors of the Bobcat-based Ontario, Zacate and Hondo APUs.[54][55][56] The Kabini APU is aimed at the low-power, subnotebook, netbook, ultra-thin and small form factor markets, while the Temash APU is aimed at the tablet, ultra-low power and small form factor markets.[56] The two to four Jaguar cores of the Kabini and Temash APUs feature numerous architectural improvements regarding power requirement and performance, such as support for newer x86-instructions, a higher IPC count, a CC6 power state mode and clock gating.[57][58][59] Kabini and Temash are AMD's first, and also the first ever quad-core x86 based SoCs.[60] The integrated Fusion Controller Hubs (FCH) for Kabini and Temash are codenamed "Yangtze" and "Salton", respectively.[61] The Yangtze FCH features support for two USB 3.0 ports, two SATA 6 Gbit/s ports, as well as the xHCI 1.0 and SD/SDIO 3.0 protocols for SD-card support.[61] Both chips feature DirectX 11.1-compliant GCN-based graphics as well as numerous HSA improvements.[54][55] They were fabricated at a 28 nm process in an FT3 ball grid array package by Taiwan Semiconductor Manufacturing Company (TSMC), and were released on 23 May 2013.[57][62][63]

The PlayStation 4 and Xbox One were revealed to both be powered by 8-core semi-custom Jaguar-derived APUs.

Steamroller architecture (2014): Kaveri

AMD A8-7650K (Kaveri)

The third generation of the platform, codenamed Kaveri, was partly released on 14 January 2014.[66] Kaveri contains up to four Steamroller CPU cores clocked at 3.9 GHz with a turbo mode of 4.1 GHz, up to a 512-core Graphics Core Next GPU, two decode units per module instead of one (which allows each core to decode four instructions per cycle instead of two), AMD TrueAudio,[67] Mantle API,[68] and an on-chip ARM Cortex-A5 MPCore,[69] and it launched with a new socket, FM2+.[70] Ian Cutress and Rahul Garg of AnandTech asserted that Kaveri represented the unified system-on-a-chip realization of AMD's acquisition of ATI. The performance of the 45 W A8-7600 Kaveri APU was found to be similar to that of the 100 W Richland part, leading to the claim that AMD had made significant improvements in on-die graphics performance per watt;[64] however, CPU performance was found to lag behind similarly specified Intel processors, a lag that was unlikely to be resolved in the Bulldozer family APUs.[64] The A8-7600 component was delayed from a Q1 launch to an H1 launch because the Steamroller architecture components allegedly did not scale well at higher clock speeds.[71]

AMD announced the release of the Kaveri APU for the mobile market on 4 June 2014 at Computex 2014,[65] shortly after the accidental announcement on the AMD website on 26 May 2014.[72] The announcement included components targeted at the standard voltage, low-voltage, and ultra-low voltage segments of the market. In early-access performance testing of a Kaveri prototype laptop, AnandTech found that the 35 W FX-7600P was competitive with the similarly priced 17 W Intel i7-4500U in synthetic CPU-focused benchmarks, and was significantly better than previous integrated GPU systems on GPU-focused benchmarks.[73] Tom's Hardware reported the performance of the Kaveri FX-7600P against the 35 W Intel i7-4702MQ, finding that the i7-4702MQ was significantly better than the FX-7600P in synthetic CPU-focused benchmarks, whereas the FX-7600P was significantly better than the i7-4702MQ's Intel HD 4600 iGPU in the four games that could be tested in the time available to the team.[65]

Puma architecture (2014): Beema and Mullins


Puma+ architecture (2015): Carrizo-L


Excavator architecture (2015): Carrizo


Steamroller architecture (Q2–Q3 2015): Godavari

  • Update of the desktop Kaveri series with higher clock frequencies or smaller power envelope
  • Steamroller-based CPU with 4 cores[77]
  • Graphics Core Next 2nd Gen-based GPU
  • Memory controller supports DDR3 SDRAM at 2133 MHz
  • 65/95 W TDP with support for configurable TDP
  • Socket FM2+
  • Target segment desktop
  • Listed since Q2 2015

Excavator architecture (2016): Bristol Ridge and Stoney Ridge

AMD A12-9800 (Bristol Ridge)
  • Excavator-based CPU with 2–4 cores
  • 1 MB L2 cache per module
  • Graphics Core Next 3rd Gen-based GPU[78][79][80][81]
  • Memory controller supports DDR4 SDRAM
  • 15/35/45/65 W TDP with support for configurable TDP
  • 28 nm
  • Socket AM4 for desktop
  • Target segment desktop, mobile and ultra-mobile

Zen architecture (2017): Raven Ridge


Zen+ architecture (2018): Picasso

  • Zen+-based CPU microarchitecture[86]
  • Refresh of Raven Ridge on 12 nm with improved latency and efficiency/clock frequency. Features similar to Raven Ridge
  • Launched January 2019

Zen 2 architecture (2019): Renoir


Zen 3 architecture (2020): Cezanne


RDNA-based GPU


Zen 3+ architecture (2022): Rembrandt

  • Zen 3+ based CPU microarchitecture[92]
  • RDNA 2-based GPU[92]
  • Memory controller supports DDR5-4800 and LPDDR5-6400[92]
  • Up to 45 W TDP for mobile
  • Node: TSMC N6[92]
  • Socket FP7 for mobile
  • Released for mobile early 2022[92]
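
The memory speeds in the list above translate into peak bandwidth with simple arithmetic: transfers per second multiplied by the bytes moved per transfer. The sketch below assumes a 128-bit total memory bus (for example two 64-bit channels); that bus width and the function name `peak_bandwidth_gbs` are assumptions made for illustration, not figures stated in the list.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int = 128) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    transfer_rate_mts: data rate in megatransfers per second (e.g. 4800 for DDR5-4800).
    bus_width_bits: total bus width; 128 bits is an assumed default.
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    for label, rate in (("DDR5-4800", 4800), ("LPDDR5-6400", 6400)):
        print(label, round(peak_bandwidth_gbs(rate), 1), "GB/s")
```

Under the 128-bit assumption, LPDDR5-6400 works out to about 102.4 GB/s, which agrees with the 102.400 entry in the feature table's DRAM bandwidth row, and DDR5-4800 works out to about 76.8 GB/s.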

Zen 4 architecture (2023): Phoenix Point

  • Zen 4 based CPU microarchitecture[93]
  • RDNA 3-based GPU with up to 12 CU[93]
  • Memory controller supports DDR5-5600 and LPDDR5x-7500
  • XDNA-powered NPU with up to 16 TOPS[94]
  • Up to 54 W TDP for mobile
  • Up to 65 W TDP for desktop[94]
  • Node: TSMC N4[93]
  • Sockets FP7, FP7r2 & FP8 for mobile
  • Socket AM5 for desktop
  • Released for mobiles early 2023[93]
  • Released for desktop early 2024[94]

Zen 5 architecture (2024): Strix Point

  • Zen 5 based CPU microarchitecture with a mix of Zen 5 and 5c cores[95]
  • RDNA 3.5-based GPU[95] with up to 16 CU
  • Memory controller supports DDR5-5600 and LPDDR5x-8000
  • XDNA2-powered NPU with up to 55 TOPS[95]
  • Up to 54 W TDP for mobile
  • Node: TSMC N4[95]
  • Socket FP8 for mobile
  • Released for mobile in mid-2024

from Grokipedia
The AMD Accelerated Processing Unit (APU) is a microprocessor developed by Advanced Micro Devices (AMD) that integrates a central processing unit (CPU) and a graphics processing unit (GPU) onto a single die, enabling efficient combined processing for computing, graphics, and multimedia tasks in devices such as desktops, laptops, and embedded systems. Introduced in 2011 as part of AMD's Fusion initiative, the first APUs combined multi-core x86 CPU technology, initially based on the Bobcat microarchitecture, with a DirectX 11-capable Radeon GPU core, a parallel processing engine, and hardware-accelerated video decoding, all connected via a high-bandwidth on-chip bus to support data sharing with reduced latency. The launch included low-power E-Series and C-Series models for ultrathin notebooks and netbooks, followed by the higher-performance A-Series "Llano" APUs in mid-2011, which delivered up to 500 GFLOPS of graphics performance and all-day battery life in mobile systems.

Over the subsequent years, AMD's APU lineup evolved significantly, transitioning from its early architectures to the Zen family in the Ryzen series and incorporating GPU designs based on RDNA architectures for enhanced gaming and content creation capabilities. By 2022, Rembrandt-based 6000 Series mobile APUs on a 6 nm process featured Zen 3+ CPU cores and RDNA 2 graphics, with configurable thermal design powers (TDPs) from 15 W to 45 W. Desktop APUs, such as the Ryzen 7000 Series with integrated graphics on the AM5 socket, introduced DDR5 support and up to 16 cores for broader productivity and light gaming without discrete GPUs.

As of 2025, AMD's APU portfolio includes high-end mobile offerings like the Strix Point architecture, which employs a monolithic design with up to 12 Zen 5/Zen 5c cores and Radeon 890M integrated graphics, targeting premium thin-and-light laptops. By contrast, the chiplet-based Strix Halo offers up to 16 Zen 5 cores, Radeon 8060S graphics with up to 40 compute units, and greater memory bandwidth for discrete-GPU-level performance in thicker, high-performance devices. The portfolio also includes desktop Ryzen 8000G Series APUs and anticipated Ryzen 9000G refreshes emphasizing AI acceleration via XDNA NPUs. These designs prioritize power efficiency, cost savings through integration, and unified memory access, making APUs well suited to budget gaming rigs, embedded applications, and emerging AI workloads while competing in markets traditionally dominated by discrete graphics solutions.

Introduction

Definition and Purpose

An Accelerated Processing Unit (APU) is a system on a chip (SoC) developed by AMD that integrates x86 CPU cores and a discrete-level GPU on a single die, enabling enhanced efficiency and performance for general computing tasks. This design combines central processing capabilities with graphics acceleration in a unified package, distinct from traditional setups that pair separate CPU and discrete GPU components.

The primary purpose of an AMD APU is to let the CPU and GPU collaborate closely through shared system memory, which allows unified memory access and minimizes data transfer latency between the processors. This integration reduces overall power consumption and manufacturing costs compared to discrete CPU-GPU configurations, while simplifying system design by eliminating the need for high-bandwidth interconnects like PCIe for inter-processor communication. Additionally, it promotes energy-efficient operation suitable for mobile and embedded applications without sacrificing computational versatility.

At its core, an AMD APU consists of AMD64-compatible CPU cores for general-purpose computing, an integrated Radeon-branded GPU for graphics and parallel processing, and shared resources such as a memory controller and I/O interfaces that enable cohesive operation. The CPU handles sequential tasks and system management, while the GPU accelerates vectorized workloads, with both accessing the same memory pool to optimize data sharing.

AMD introduced the APU under the "Fusion" branding in 2011, later transitioning to the A-Series designation for consumer and embedded products, and subsequently incorporating APUs into the Ryzen lineup with integrated Radeon Graphics starting in 2018.

Significance and Applications

AMD Accelerated Processing Units (APUs) hold significant market importance in budget and mid-range laptops, all-in-one PCs, and embedded systems, where their integrated design facilitates compact, cost-effective solutions without the need for discrete graphics cards. This enables the creation of thin-and-light laptops and space-constrained devices like all-in-one desktops, which prioritize portability and affordability over high-end performance. In the embedded sector, APUs such as the AMD Embedded G-Series have been pivotal since their introduction, providing flexible platforms for industrial applications while adhering to stringent size and power requirements.

Key applications of APUs span entry-level gaming, content creation, AI acceleration, gaming consoles, and industrial or automotive embedded systems. In entry-level gaming, APUs like the Ryzen AI Max Series deliver capable integrated graphics for casual play without additional hardware. For content creation tasks, APUs support efficient processing through combined CPU and GPU resources. Recent models incorporate neural processing units (NPUs) for AI acceleration, enabling on-device AI processing in laptops and edge devices. Iconic examples include the custom Jaguar-based APUs powering the PlayStation 4 and Xbox One consoles, which integrated CPU and GPU capabilities to drive immersive gaming experiences. In industrial and automotive domains, APUs like the Ryzen Embedded V2000A Series handle real-time sensor data processing for advanced driver-assistance systems (ADAS) and in-vehicle infotainment.

The primary advantages of APUs stem from their space-saving integration of CPU and GPU on a single die, which reduces overall system complexity and board space compared to discrete configurations. This integration enhances power efficiency, with mobile APUs typically operating in a TDP range of 15 W to 65 W, contributing to longer battery life in laptops. It also allows original equipment manufacturers (OEMs) to tailor APUs for diverse form factors, from ultrabooks to rugged embedded units, while maintaining consistent performance profiles.

APUs address key challenges in traditional CPU-GPU setups by mitigating bandwidth bottlenecks through direct on-chip interconnects, enabling faster data exchange between processing elements. This is particularly beneficial for hybrid workloads like video encoding, where the CPU and GPU collaborate, aided by technologies such as Heterogeneous System Architecture (HSA), to optimize throughput without excessive memory transfers.

History

Origins and Fusion Initiative

The origins of the AMD Accelerated Processing Unit (APU) trace back to AMD's acquisition of ATI Technologies in October 2006, which provided the graphics expertise necessary to pursue integrated CPU-GPU designs. The $5.4 billion deal, completed on October 24, 2006, merged AMD's microprocessor capabilities with ATI's GPU technology, enabling the company to develop unified processor architectures. Immediately following the acquisition, AMD announced its "Fusion" initiative on October 25, 2006, aiming to create processors that combined central processing units (CPUs) and graphics processing units (GPUs) on a single die or package.

The primary goals of the Fusion initiative were to deliver discrete graphics-level performance within an integrated solution, thereby improving efficiency and reducing power consumption for mainstream computing. Targeted at laptops and desktops, Fusion sought to provide enhanced visual computing experiences without the need for separate discrete GPUs, positioning AMD to compete directly with Intel's evolving integrated graphics offerings in its Core i-series processors. This integration was envisioned to enable modular designs that leveraged both CPU and GPU compute capabilities, with initial platforms planned for commercial clients, mobile devices, and gaming by 2007, and full Fusion processors by late 2008 or early 2009.

Development of early Fusion prototypes accelerated in the late 2000s, pairing AMD's K10 CPU architecture with ATI's TeraScale GPU technology to create system-on-chip (SoC) solutions. This shift marked AMD's transition from selling standalone CPUs and GPUs to producing cohesive SoCs, responding to industry demands for more efficient, all-in-one processors amid Intel's advancements in integrated graphics. Internal codenames such as "Llano" were assigned to the first desktop-oriented APU prototype, which combined K10-based CPU cores with a TeraScale GPU on a single die, laying the groundwork for subsequent releases.

Launch and Early Adoption

The initial commercial launches of AMD's Accelerated Processing Units (APUs) occurred in 2011, beginning with the low-power Ontario and Zacate APUs of the Brazos platform in the first quarter, followed by the desktop Llano platform in June and its mobile variants later that year. The Llano APUs, built on a 32 nm process and integrating K10-based CPU cores with Radeon HD 6000-series graphics, were introduced under the A-Series branding to target mainstream consumer desktops. These releases represented AMD's first widespread deployment of fused CPU-GPU architectures, delivering enhanced multimedia and light gaming capabilities without discrete GPUs.

Early adoption was favorable, particularly for graphics-intensive tasks where Llano APUs outperformed Intel's integrated GPUs, achieving up to four times higher frame rates in benchmarks and select games. Positioned for budget systems, the APUs gained traction in entry-level PCs and netbooks, with models like the A8-3850 priced at $135 to undercut competitors and drive volume sales among cost-sensitive consumers and OEMs. Positive reviewer feedback on integrated graphics performance spurred uptake in affordable all-in-one systems and slim laptops, helping capture share in the sub-$500 PC segment.

Despite these strengths, early APUs encountered hurdles such as elevated power consumption, exemplified by Llano's 100 W TDP, which constrained designs in compact form factors. The 32 nm fabrication process also imposed efficiency limitations relative to emerging competitors' nodes, impacting battery life in mobile variants. In the nascent tablet space, AMD's x86-based APUs faced rivalry from Nvidia's ARM-based Tegra 2 SoCs, which prioritized ultra-low power for longer runtime in portable devices.

A key milestone came with the 2012 launch of the Trinity APUs, which transitioned to the FM2 socket for desktops and improved integration, broadening ecosystem support and paving the way for further refinements.
By 2013, APU adoption in OEM laptops had grown substantially, evidenced by partnerships like AMD's collaboration with HP to deliver over one million APU-equipped units to the Chinese market.

Evolution to Zen and Beyond

In the mid-2010s, AMD advanced its APU lineup by integrating Graphics Core Next (GCN) GPUs with the Kaveri APUs launched in 2014, marking the first use of this architecture in integrated graphics for improved compute and gaming performance. Concurrently, the company introduced the low-power Puma platform, featuring Mullins and Beema APUs targeted at tablets and ultrathin devices, which delivered up to twice the battery life and 25% better overall performance compared to prior generations like Temash. By 2015, the Carrizo APUs with Excavator cores further enhanced efficiency, achieving up to 25% longer battery life and 20% faster graphics performance through optimizations like improved voltage scaling and a more integrated SoC design.

The transition to Zen architectures began in 2017 with Raven Ridge, AMD's first APU combining Zen CPU cores and Vega graphics, which extended support for the AM4 socket through at least 2020 to prolong platform longevity. This shift enabled better multi-threaded performance and integrated graphics capable of gaming without discrete GPUs. In 2020, the Renoir APUs adopted the Zen 2 microarchitecture on a 7 nm process, doubling transistor density over prior 14 nm designs and boosting single-threaded performance by up to 25% while retaining Vega graphics for balanced mobile and desktop use.

Recent developments have focused on graphics and AI enhancements, with the 2022 Rembrandt APUs introducing integrated RDNA 2 GPUs with ray tracing support and up to 50% more compute units than predecessors on a 6 nm process. The 2023 Phoenix APUs paired Zen 4 cores with RDNA 3 graphics, delivering improved efficiency and AI acceleration on a 4 nm node. In 2024, Strix Point APUs added a dedicated neural processing unit (NPU) delivering 50 TOPS for AI workloads, powering the Ryzen AI 300 series and advancing Copilot+ PC capabilities with Zen 5 cores and RDNA 3.5 graphics.

APUs have expanded into high-impact applications, including custom Zen 2 variants in current-generation game consoles for close CPU-GPU integration in gaming. Their role in AI PCs has grown significantly, with Ryzen AI processors enabling on-device inference and over 150 AI PC models available by 2025, driving productivity and creative tools. In laptops, APUs now form the core of AMD's client portfolio, supporting the surge in thin-and-light designs with integrated AI and graphics.

Architectural Features

CPU Microarchitectures in APUs

The CPU microarchitectures in AMD APUs began with the K10-based "Stars" cores used in early desktop-oriented models like Llano. These designs featured up to four standard cores with dedicated floating-point units on a 32 nm process, emphasizing multi-threaded workloads to complement integrated graphics. Concurrently, the Bobcat microarchitecture powered low-power APUs, delivering a dual-issue, out-of-order execution model on a 40 nm process tailored for netbook and ultrathin applications, with single or dual cores optimized for efficiency over peak performance.

Subsequent iterations refined the Bulldozer lineage through the Piledriver microarchitecture, which introduced enhancements like improved branch prediction and floating-point scheduling, yielding approximately 15% higher instructions per clock (IPC) compared to Bulldozer while maintaining the modular structure. Steamroller followed, expanding execution units for wider integer and floating-point throughput, achieving up to 20% IPC gains over Piledriver through better resource sharing within modules and support for advanced instruction sets. The Puma family, optimized for mobile devices, refined Jaguar cores with enhanced power gating and decode efficiency for ultra-low-power scenarios, prioritizing battery life in thin-and-light systems. Excavator concluded this era, delivering a 14% IPC uplift over Steamroller via larger caches and optimized pipelines on a 28 nm process, marking a shift toward higher efficiency in mainstream mobile APUs.

The Zen series represented a complete redesign, debuting in APUs with Zen 1 cores on a 14 nm process, supporting 4 to 8 cores per chip with simultaneous multithreading (SMT) and improving single-threaded performance through a wider front-end and deeper execution resources. Zen+ refined this on a 12 nm process, reducing latencies in cache and memory subsystems for up to 3% IPC gains while enabling higher boost clocks in APU configurations. Zen 2 advanced to a 7 nm process; desktop CPUs adopted a chiplet layout with modular CCDs for scalability, while APUs retained a monolithic die, alongside doubled L3 cache per core complex for better multi-core efficiency. Zen 3 emphasized higher clock speeds through a unified L3 cache design and improved branch prediction, achieving a 19% IPC uplift over Zen 2 in APU variants. Zen 4 introduced AVX-512 support with double-pumped 256-bit execution units on a 4 nm monolithic die in APU configurations, enhancing vector workloads in integrated systems. Zen 5 further boosts IPC by about 16% via enhanced branch prediction with dual decode pipes and wider pipelines, targeting AI-accelerated APUs on advanced nodes.

APU-specific adaptations across these microarchitectures emphasize balanced core counts from 2 to 16 to optimize power efficiency, allowing scaling to the thermal constraints of laptops and desktops while pairing CPU performance with integrated GPUs. In some embedded variants, elements like programmable logic akin to FPGA fabrics are integrated alongside cores via the Embedded+ architecture, enabling customizable acceleration for edge AI and real-time processing.
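The per-generation IPC figures quoted above compound multiplicatively rather than additively. As a rough illustration only, the following Python sketch chains the "up to" uplifts stated in this section for the Bulldozer lineage; the dictionary and function names are hypothetical, and the result is an upper bound implied by those quoted figures, not a measured value.

```python
from math import prod

# "Up to" IPC uplifts as quoted in the text above (upper bounds, not measurements).
UPLIFTS = {
    "Piledriver": 0.15,   # over Bulldozer
    "Steamroller": 0.20,  # over Piledriver
    "Excavator": 0.14,    # over Steamroller
}

def compound_uplift(uplifts):
    """Chain relative uplifts multiplicatively and return the cumulative gain."""
    return prod(1.0 + u for u in uplifts) - 1.0

total = compound_uplift(UPLIFTS.values())
print(f"Excavator vs. Bulldozer (upper bound): {total:.1%}")
```

Summing the three figures would give 49%, whereas chaining them gives roughly 57%, which is why generational claims should be multiplied when compared across several steps.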

GPU Microarchitectures in APUs

The GPU microarchitectures in Accelerated Processing Units (APUs) have evolved significantly since the introduction of integrated graphics, transitioning from the TeraScale architecture to the more advanced RDNA family, with each generation enhancing rendering efficiency, compute capabilities, and API support tailored for power-constrained environments.

In the TeraScale era, AMD employed the TeraScale 2 microarchitecture, based on a very long instruction word (VLIW5) design, in early APUs such as Llano and the Bobcat-based parts. This architecture featured up to 400 shader processors organized into five SIMD engines of 80 each (a single such engine in the low-power parts) and provided DirectX 11 compatibility. Later Piledriver-based APUs incorporated refinements from TeraScale 3, which brought minor efficiency gains in the vertex pipeline, though still limited by the VLIW paradigm's scalability issues.

The shift to the Graphics Core Next (GCN) family marked a pivotal advancement, starting with GCN 1.0 in APUs like Kabini and Kaveri, which unified scalar and vector processing through a single-instruction multiple-data (SIMD) approach across compute units (CUs). This generation supported OpenCL 1.2 for general-purpose GPU computing and delivered up to 6 CUs in mobile variants, focusing on balanced rasterization and basic compute tasks. GCN 2.0, featured in Carrizo, added asynchronous compute engines to allow concurrent graphics and compute operations, reducing pipeline stalls and improving throughput in multi-threaded applications. Subsequent iterations included GCN 3.0 in Bristol Ridge, which emphasized higher clock speeds (up to 1 GHz in some configurations) for better performance in DirectX 12 workloads without major architectural overhauls. The GCN 5.0 variants, based on Vega, appeared in Zen-based APUs such as Raven Ridge, Picasso, Renoir, and Cezanne, scaling to 8–11 CUs.

The transition to the RDNA architecture brought further optimizations for gaming and AI workloads in APUs. RDNA 2, integrated in Rembrandt APUs, utilized 12 CUs with a dual-issue architecture, enhancing rasterization efficiency by up to 50% over Vega through improved primitive shaders, and added hardware ray tracing. RDNA 3, integrated in Phoenix APUs, incorporated dedicated encode hardware and dual compute engines per workgroup processor for better ray tracing performance. The latest RDNA 3.5, found in Strix Point APUs, expands to up to 16 CUs with refined ray tracing accelerators and integrated AI upscaling engines, enabling efficient hardware-accelerated neural rendering in thin-and-light devices.

APU-specific optimizations across these microarchitectures emphasize scalability and resource sharing, with configurable CU counts ranging from 2 to 16 to match thermal and power envelopes. Recent models achieve memory bandwidth exceeding 100 GB/s via unified L3 caches and Infinity Fabric interconnects, while supporting modern APIs like Vulkan 1.3 for cross-platform compute and DirectX 12 Ultimate for advanced mesh shading and variable rate shading.

Integration Technologies

The Heterogeneous System Architecture (HSA), co-developed by AMD and introduced in 2013 through the HSA Foundation, enables cohesive CPU-GPU integration in APUs by allowing direct pointer sharing between the CPU and GPU, eliminating the need for data copying. This architecture provides unified virtual addressing across processors, permitting both to access the same memory, and supports OpenCL for tasks that leverage both CPU and GPU resources without kernel recompilation. HSA's design facilitates low-latency task dispatching to the GPU independently of the CPU, enhancing overall system efficiency for parallel workloads. Full HSA implementation in AMD APUs began with the Carrizo series in 2015, building on partial support in earlier models like Kaveri, and has since become a cornerstone for compute-intensive applications in subsequent generations. By unifying the address space, HSA reduces data-movement overhead, allowing developers to treat the APU as a single coherent system for tasks such as image processing.

The memory subsystem in AMD APUs emphasizes shared access to optimize CPU-GPU interactions, evolving from DDR3 support in early architectures to high-bandwidth LPDDR5X-7500 in modern designs like Strix Point. In Zen-based APUs, a shared L3 cache, typically 4-16 MB depending on configuration, serves both CPU cores and the integrated GPU, reducing latency for common data sets and improving cache coherency through hardware-managed protocols. In chiplet-based Zen implementations, the Infinity Fabric interconnect provides scalable, high-speed communication between dies, achieving bandwidths up to 40-60 GB/s for inter-component data transfer in multi-chiplet APUs.

Power and thermal management in APUs incorporate dynamic voltage and frequency scaling (DVFS) applied jointly to CPU and GPU domains, enabling real-time adjustments based on workload demands to balance performance and efficiency. Thermal design power (TDP) configurations span from an ultra-low 4 W for embedded and handheld applications to 120 W for high-end desktop and mobile variants, with configurable profiles allowing designers to tailor power envelopes. An integrated northbridge on the APU die handles I/O coherency and memory controller functions, minimizing external dependencies and latency in data routing.

Further integrations enhance APU versatility, including an Input-Output Memory Management Unit (IOMMU) that supports secure virtualization by enabling direct memory access (DMA) remapping for GPU tasks in virtualized environments. PCIe support has advanced to version 4.0 in Zen 3+ APUs and 5.0 in select Zen 4+ models, providing up to 28 lanes in Phoenix APUs for expanded peripheral connectivity. Beginning with Zen 4+, APUs incorporate a dedicated Neural Processing Unit (NPU) based on the XDNA architecture, as seen in Strix Point, delivering up to 50 TOPS for AI inference while maintaining power efficiency.
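To see why scaling voltage and frequency together is effective, it helps to reason with the textbook approximation that dynamic CMOS power grows with the square of voltage times frequency. The Python sketch below assumes that simplified model; it is not taken from AMD documentation, it ignores static leakage, and the function name is hypothetical.

```python
def relative_dynamic_power(v_scale, f_scale):
    """Relative dynamic power under the simplified model P ~ C * V^2 * f.

    v_scale and f_scale are ratios against the baseline operating point
    (1.0 means unchanged). Capacitance is treated as constant, and static
    leakage is ignored, so this is an approximation only.
    """
    return (v_scale ** 2) * f_scale

# Illustrative operating point: 10% lower voltage and 10% lower clock.
scaled = relative_dynamic_power(0.9, 0.9)
print(f"Dynamic power relative to baseline: {scaled:.1%}")
```

Under this model, a 10% reduction in both voltage and clock cuts dynamic power by roughly 27% while giving up only about 10% of peak throughput, which is the trade-off DVFS governors exploit in power-constrained designs.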

TeraScale-based APUs

Llano (2011)

Llano, released in June 2011, marked AMD's entry into desktop accelerated processing units (APUs) and introduced the FM1 socket. The platform featured the A4, A6, and A8 series processors, offering dual- and quad-core configurations based on the Stars microarchitecture derived from the K10 family, providing 2 to 4 cores and a matching number of threads without simultaneous multithreading. The integrated graphics drew from the Radeon HD 6000 series using the TeraScale 2 architecture, with variants including the HD 6410D (160 shading units) for A4 models, HD 6530D (320 shading units) for A6 models, and HD 6550D (400 shading units) for A8 models, delivering up to 480 GFLOPS of peak performance. These APUs supported Dual Graphics mode, allowing combination with compatible discrete GPUs for enhanced rendering via CrossFire technology.

Fabricated on a 32 nm silicon-on-insulator (SOI) process by GlobalFoundries, Llano APUs contained 1.178 billion transistors across a 228 mm² die. They supported dual-channel DDR3 up to 1866 MT/s and operated within a 65–100 W thermal design power (TDP) envelope, balancing performance for mainstream desktops.

As the first desktop APU to integrate DirectX 11-capable graphics, Llano pioneered unified CPU-GPU designs for consumer systems, significantly outperforming Intel's HD Graphics 2000 in gaming workloads and delivering up to 3–4 times higher frame rates in titles like Dirt 2 at 1680x1050 resolution. Its efficient video decode capabilities and low-power integrated graphics made it popular for budget home theater PCs (HTPCs) and all-in-one systems, enabling video playback and light gaming without discrete GPUs.

Bobcat-based APUs (2011)

The Bobcat-based APUs, released in early 2011 as part of AMD's Brazos platform, targeted ultra-portable devices such as netbooks, mainstream laptops, and tablets with low-power requirements. These APUs integrated one or two Bobcat CPU cores, operating at clock speeds ranging from 1.0 to 1.6 GHz, under the E-Series (codenamed Zacate, for mainstream low-power laptops) and C-Series (codenamed Ontario, for netbooks). The Desna variant, aimed at tablets under the Z-Series branding, featured similar dual-core configurations at around 1.0 GHz but included enhanced display output support for touch-enabled devices. Fabricated on a 40 nm process node by TSMC, these APUs supported single-channel DDR3-1066 memory and had thermal design power ratings of 9–18 W, enabling fanless designs in slim form factors.

The integrated graphics were based on the TeraScale 2 architecture, branded as Radeon HD 6xxxG series, with five compute units delivering approximately 30–50 GFLOPS of peak performance depending on the model and clock speeds up to 500 MHz. For instance, the Zacate E-350 featured the Radeon HD 6310 with 80 shader processors at 488 MHz, while Ontario models like the C-50 used the Radeon HD 6250 at lower clocks for reduced power draw. These GPUs included the third-generation Unified Video Decoder (UVD3), supporting hardware-accelerated 1080p video playback for H.264 and other formats, which enhanced multimedia capabilities in battery-constrained systems. The overall design emphasized integration on a single die, with die sizes around 75 mm², prioritizing cost efficiency over high transistor density.

These APUs marked AMD's first sub-10 W offerings for the netbook and tablet markets, outperforming Intel's Atom processors in multimedia workloads and light productivity tasks by up to 80% while maintaining comparable power efficiency. The Brazos platform's reception was positive for reviving interest in x86-based ultra-portables, as the combined CPU-GPU setup enabled smooth video playback and basic 3D graphics without discrete components. However, the narrow dual-issue design of the Bobcat cores limited multithreaded performance, positioning these APUs as entry-level solutions rather than direct competitors to higher-end mobile chips. Desna's addition of multiple display outputs facilitated early tablet adoption, though overall adoption was tempered by the era's shift toward ARM-based alternatives.

Piledriver-based APUs (2012–2013)

The Piledriver-based APUs represented AMD's second-generation desktop and mobile accelerated processing units, succeeding the Llano architecture and closing out the TeraScale graphics era. Launched under the codenames Trinity and Richland, these APUs integrated Piledriver CPU cores with Radeon HD 7000 and 8000 series graphics, respectively, on the FM2 socket for desktops. They targeted mainstream consumer systems, emphasizing balanced performance for multimedia, light gaming, and everyday computing in budget-oriented PCs and laptops.

Trinity debuted in May 2012 for mobile platforms, with desktop variants following in October 2012, while Richland arrived as a refresh in March 2013 for mobiles and June 2013 for desktops. The lineup spanned the A4 to A10 series, featuring 1 to 2 Piledriver modules (2 to 4 cores and threads), with base clocks starting at 2.0 GHz for entry-level models and reaching 3.8 GHz for Trinity's A10-5800K, which could turbo up to 4.2 GHz. Richland models, such as the A10-6800K, pushed frequencies higher to a 4.1 GHz base and 4.4 GHz turbo, offering modest gains through architectural tweaks and higher binning. These APUs supported up to 4 MB of L2 cache and were designed for multi-threaded workloads, though single-thread performance remained competitive primarily in value segments.

Both platforms utilized a 32 nm SOI process node, with Trinity dies measuring 246 mm² and containing 1.303 billion transistors; Richland retained the same die size and transistor count as a silicon refresh, focusing on optimizations rather than a node shrink. Thermal design power (TDP) ranged from 65 W to 100 W for desktops and as low as 35 W for mobiles, enabling efficient operation in compact systems. Memory support included dual-channel DDR3-1866, enhancing bandwidth for integrated graphics tasks.

Piledriver delivered approximately 10-15% higher instructions per clock (IPC) compared to the original Bulldozer cores, primarily through improved branch prediction, floating-point execution, and reduced latency in the shared FPU, though multi-threaded scaling was limited by the module design. Graphics integration featured TeraScale 3 (VLIW4) in both Trinity and Richland, with up to 384 shader processors (6 compute units) configurable across models. Peak performance reached approximately 614 GFLOPS in top Trinity configurations at 800 MHz GPU clocks, scaling to around 650 GFLOPS in Richland's higher-binned 844 MHz variants, with support for DirectX 11 and hardware-accelerated video decode via UVD3. These iGPUs excelled in budget gaming, delivering playable frame rates at low settings when paired with dual-channel memory. PowerTune technology enabled dynamic GPU boosting within TDP limits, improving efficiency during bursty workloads like video playback or casual gaming.

Key innovations included native USB 3.0 support via the accompanying A85X and A75 chipsets, providing up to four ports alongside 10 USB 2.0 ports for enhanced peripheral connectivity in mainstream builds. The FM2 platform also introduced overclocking via AMD OverDrive for unlocked "K" models, appealing to enthusiasts on a budget. Reception was positive in the value market, where these APUs powered affordable all-in-one PCs and HTPCs, capturing significant share in emerging markets and contributing to APUs comprising nearly 75% of AMD's processor unit shipments by late 2012. Their integrated design reduced system costs, making them popular for entry-level gaming rigs capable of 30+ FPS in older titles without discrete GPUs.
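The peak GFLOPS figures quoted for Trinity and Richland follow from shader count and GPU clock. The Python sketch below reproduces them under the common convention, assumed here rather than stated in the source, that each shader processor retires one fused multiply-add (two floating-point operations) per clock; the function name is hypothetical.

```python
def peak_gflops(shaders, clock_mhz, flops_per_clock=2):
    """Theoretical single-precision peak in GFLOPS.

    Assumes every shader issues `flops_per_clock` operations each cycle
    (2 = one fused multiply-add), which real workloads rarely sustain.
    """
    return shaders * flops_per_clock * clock_mhz / 1000.0

trinity = peak_gflops(384, 800)    # top Trinity configuration from the text
richland = peak_gflops(384, 844)   # higher-binned Richland clock from the text

print(f"Trinity:  {trinity:.1f} GFLOPS")
print(f"Richland: {richland:.1f} GFLOPS")
```

The results, 614.4 and about 648.2 GFLOPS, match the approximate 614 and 650 GFLOPS values given above, which suggests those figures are theoretical peaks rather than measured throughput.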

Graphics Core Next-based APUs

Jaguar-based APUs (2013)

The Jaguar-based APUs, released in the second quarter of 2013, marked AMD's shift to a new low-power x86 core architecture paired with Graphics Core Next (GCN) graphics for mainstream mobile and ultra-low-power applications. Codenamed Kabini for entry-level laptops and Temash for tablets and embedded devices, these APUs targeted thin-and-light systems with improved power efficiency over prior generations. The A4 and A6 series processors featured four Jaguar cores, operating at clock speeds ranging from 1.5 GHz for the A4-5000 to 2.0 GHz for the A6-5200 in Kabini variants, and a lower 1.0 GHz base (up to 1.4 GHz turbo) in the Temash A6-1450. These APUs were fabricated on a 28 nm process node, supporting DDR3-1600 in a single-channel configuration and delivering thermal design power (TDP) ratings from 15-25 W for Kabini to as low as 4-8 W for Temash, enabling fanless designs in ultrathin tablets. The packages used BGA mounting typical for mobile SoCs, with some variants compatible with FT3/FT4 interfaces in modular systems. Kabini and Temash found adoption in entry-level laptops and tablets, while custom variants powered the PlayStation 4 and Xbox One consoles, contributing to over 100 million indirect unit shipments through these gaming platforms by leveraging the same Jaguar and GCN foundations.

The integrated GPUs, branded under the Radeon HD 8000 series, utilized the GCN 1.0 architecture with 2 or 4 compute units (CUs) delivering up to approximately 300 GFLOPS of single-precision compute performance, depending on configuration and clock speeds up to 600 MHz. For instance, the A4-5000 was paired with the HD 8330 (2 CUs at 497 MHz), while higher-end A6 models used variants like the HD 8400 (up to 4 CUs). These GPUs supported OpenCL 1.2 for general-purpose computing, enabling basic heterogeneous workloads alongside DirectX 11.1 graphics.

Jaguar-based APUs delivered 20-25% better power efficiency compared to the preceding Piledriver designs, allowing sustained performance at lower TDPs. The Temash platform, in particular, achieved a configurable 3.95-8 W TDP, facilitating extended battery life in tablets without compromising quad-core x86 compatibility. Reception highlighted their role in reviving AMD's mobile presence, especially via console integrations that validated the architecture for real-time rendering and multitasking.

Steamroller and Puma-based APUs (2014–2015)

The Steamroller and Puma microarchitectures marked AMD's transition to more efficient APUs in 2014, targeting both mainstream desktop/mobile and low-power portable devices. The Kaveri family, based on Steamroller, debuted in the first quarter of 2014 for FM2+ socket desktop and mobile platforms, with models like the A10-7850K offering up to four cores clocked at 3.7 GHz. A refresh under the Godavari codename followed in the second quarter of 2015, introducing minor clock boosts such as the A10-7870K at up to 3.9 GHz while retaining the core architecture. Concurrently, the Puma-based Beema and Mullins platforms launched in early 2014, optimized for low-power applications with quad-core configurations in the A6 to A10 series reaching up to 2.4 GHz, emphasizing battery life in tablets and convertibles.

These APUs were fabricated on a 28 nm process, with Kaveri featuring 2.41 billion transistors across a 245 mm² die, while Puma variants like Beema and Mullins integrated similar densities for compact SoCs. TDPs ranged from 65–95 W for desktop Kaveri models to 12–25 W for mobile Beema, with Mullins extending to ultra-low configurations as low as 2.5 W for fanless designs. Memory support included DDR3-2133, and Mullins notably incorporated SATA 6 Gb/s for enhanced storage connectivity in embedded and portable systems. The Steamroller cores in Kaveri improved over prior architectures, while Puma refined Jaguar's efficiency for bursty workloads.

GPU integration advanced with Graphics Core Next (GCN) 1.2/2.0 architectures, branded as Radeon R7 and R5 in Kaveri and Puma respectively, featuring up to 8 compute units (CUs) for Kaveri and up to 6 for Puma variants. Kaveri's top configurations delivered up to approximately 0.86 TFLOPS of FP32 compute at base clocks, enabling smooth gaming without discrete graphics. HSA preview functionality first appeared in Kaveri, allowing preliminary unified memory access between CPU and GPU to streamline heterogeneous computing, though full interoperability required software ecosystem maturity.

Innovations included Kaveri's support for 4K video decoding and display output via updated UVD 4.2 and VCE 2.0 engines, facilitating Ultra HD playback in compatible systems. Puma platforms, particularly Mullins at configurable TDPs around 10.6 W, were tailored for 2-in-1 convertibles and tablets, prioritizing all-day battery life and seamless mode switching. These APUs received mixed reception; Kaveri was praised for its integrated graphics leap and HSA potential but criticized for modest CPU gains over predecessors and premium pricing relative to Intel Haswell alternatives. Puma variants fared better in low-power niches, offering competitive efficiency for media consumption devices despite limited high-end appeal.

Excavator-based APUs (2015–2017)

The Excavator-based APUs represented AMD's sixth and seventh generation A-Series processors, integrating the Excavator CPU microarchitecture with Graphics Core Next (GCN) 3.0 graphics to target mobile and entry-level desktop markets. These platforms, including Carrizo, Carrizo-L, Bristol Ridge, and Stoney Ridge, emphasized energy efficiency improvements over prior designs while supporting Heterogeneous System Architecture (HSA) for unified CPU-GPU computing.

Carrizo APUs launched in the second quarter of 2015 for mobile devices, featuring quad-core Excavator configurations under the A6 to A12 branding with clock speeds up to 3.7 GHz. The lower-end Carrizo-L variant followed in the fourth quarter of 2015, also mobile-focused but with dual- or quad-core Puma+ cores (a Jaguar derivative rather than Excavator) for mainstream configurations at reduced power levels. Bristol Ridge and Stoney Ridge arrived in the second quarter of 2016, with Bristol Ridge serving desktop systems on the AM4 socket and Stoney Ridge targeting mobile ultrabooks via the FP4 package; both offered A6 to A12 models with up to four Excavator cores. Bristol Ridge, released in 2016-2017, forms the 7th generation of A-Series APUs for desktop and mobile.

Key Bristol Ridge models include:

Desktop APUs:
  • A12-9800 (4 cores, 3.8 GHz base, 4.2 GHz boost, 65W TDP)
  • A12-9800E (4 cores, 3.1 GHz base, 3.8 GHz boost, 35W TDP)
  • A10-9700 (4 cores, 3.5 GHz base, 3.8 GHz boost, 65W TDP)
  • A10-9700E (4 cores, 3.0 GHz base, 3.5 GHz boost, 35W TDP)
  • A6-9500 (2 cores, 3.5 GHz base, 3.8 GHz boost, 65W TDP)
  • A6-9500E (2 cores, 3.0 GHz base, 3.4 GHz boost, 35W TDP)
Mobile APUs:
  • A12-9700P (4 cores, 2.5 GHz base, 3.6 GHz boost, 15W TDP)
  • A10-9600P (4 cores, 2.4 GHz base, 3.3 GHz boost, 15W TDP)
  • A9-9410 (2 cores, 2.9 GHz base, 3.5 GHz boost, 10-25W TDP)
  • A6-9220 (2 cores, 2.5 GHz base, 3.0 GHz boost, 10-25W TDP)
  • A4-9120 (2 cores, 2.5 GHz base, 2.6 GHz boost, 10-25W TDP)
Note: Some sources group Stoney Ridge (lower-power 2-core variants with updated graphics) under the broader Bristol Ridge generation, but strictly Bristol Ridge refers to the models above. PRO business variants (e.g., PRO A12-9800) also exist.

These APUs integrated Radeon graphics based on GCN 3.0, scaling from 3 to 8 compute units (CUs) across variants for peak performance up to approximately 730 GFLOPS, enabling features like hardware-accelerated H.265 decoding for 4K video. Full HSA 1.0 compliance allowed GPU context switching and shared memory access between CPU and GPU, facilitating compute tasks without data copying overhead. All models supported DDR3L-1866 memory, in dual-channel configurations for Bristol Ridge and single-channel for Stoney Ridge, with configurable TDPs from 12-35 W in mobile SKUs and 65 W for desktops. Fabricated on a 28 nm process, Carrizo and Carrizo-L dies measured around 250 mm² with approximately 3.1 billion transistors, while Bristol Ridge and Stoney Ridge used smaller 124-182 mm² dies with 1.2-2.4 billion transistors for better efficiency. FreeSync technology was added for adaptive display refresh rates, enhancing multimedia playback.

Excavator cores delivered up to 15% higher instructions per clock (IPC) compared to Steamroller, achieved through wider execution units and improved branch prediction without increasing die size significantly. Carrizo platforms specifically boosted notebook battery life by around 20% over predecessors through optimized power gating and voltage-frequency scaling, targeting all-day usage in ultrabooks for tasks like video streaming and light productivity. As the first desktop APUs on the AM4 socket, Bristol Ridge bridged the Excavator generation to the platform later used by Zen, but the lineup overall found strength in cost-effective ultrabooks where integrated graphics excelled in casual gaming and media.
However, they were overshadowed by Intel's Skylake processors, which offered superior single-threaded performance and broader ecosystem adoption in the mainstream laptop segment.

Zen-based APUs with GCN Graphics (2017–2021)

The Zen-based APUs with GCN graphics marked AMD's transition to pairing its high-performance Zen CPU cores with fifth-generation Graphics Core Next (GCN) graphics, branded Vega, on a monolithic die. The series began with the Raven Ridge APUs, which debuted in mobile form at the end of 2017 and expanded to desktop with the Ryzen 2000G series, using the AM4 socket for desktops and FP5 for mobile. The Picasso refresh followed in January 2019 for mobile and July 2019 for desktop 3000G models, leveraging Zen+ enhancements on a 12 nm process. The lineup progressed to Renoir in Q1 2020 for the mobile 4000 series and July 2020 for desktop, adopting Zen 2 on 7 nm, and culminated with Cezanne in April 2021 for OEM mobile/desktop PRO variants and August 2021 for consumer 5000G desktop models on Zen 3. These APUs featured 4 to 8 cores, base clocks starting at 3.6 GHz, and boosts up to 4.6 GHz, targeting 35-65 W TDP envelopes for efficient all-in-one computing.

At the heart of these APUs was the integrated Vega graphics based on GCN 5.0, offered in configurations such as Vega 8 (512 shaders, 8 compute units) and Vega 11 (704 shaders, 11 compute units). Peak FP32 performance reached approximately 1.8 TFLOPS at official boost clocks around 1250 MHz in higher-end models like the Ryzen 5 2400G. The iGPU supported FP16 half-precision compute and accelerated tasks like video encoding, while providing DirectX 12 compatibility and hardware-accelerated video decode for H.264, H.265, and VP9.

Key system specifications included up to DDR4-3200 support in dual-channel configuration for optimal bandwidth and PCIe 3.0 connectivity throughout, with up to 20 lanes in Zen 1 models and up to 24 in Zen 2 and Zen 3 variants. Transistor counts ranged from 4.95 billion on the 14 nm, 210 mm² Raven Ridge die to 10.7 billion on the denser 7 nm, 180 mm² Cezanne die, with Renoir at approximately 9.8 billion transistors on a 156 mm² die and Picasso at approximately 4.94 billion on a 210 mm² die.

From Raven Ridge through Cezanne, these APUs used a monolithic design without chiplets, with Cezanne also benefiting from Zen 3's unified L3 cache. They represented a significant leap in integrated performance, delivering capabilities comparable to entry-level dedicated GPUs like the GeForce GTX 1050 in gaming at low to medium settings, which made them popular for budget builds without a separate graphics card. Innovations included the debut of the Zen architecture in APUs with Raven Ridge, offering up to 2x the IPC of prior Bulldozer-era designs, and the 7 nm shift in Renoir, which achieved nearly double the power efficiency of Picasso through process shrinks and optimizations, enabling sustained performance at lower TDPs. Cezanne further extended AM4 socket longevity into 2022 with unlocked multipliers for overclocking and enhanced iGPU clocks up to 2 GHz, solidifying the family's reception as a value-driven solution for light gaming and OEM systems.
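The peak-throughput figure quoted for Vega 11 can be reproduced with the usual back-of-the-envelope formula: shader count times FLOPs per shader per clock times clock speed. The following is a minimal sketch of that arithmetic, not an official AMD calculation. The function name is hypothetical, and the factor of 2 assumes each shader retires one fused multiply-add per clock. Shader counts and the 1250 MHz clock come from the text above.

```python
def peak_fp32_tflops(shaders: int, clock_mhz: float,
                     flops_per_shader_per_clock: int = 2) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumption: each shader retires one fused multiply-add
    (counted as 2 FLOPs) per clock cycle.
    """
    flops_per_second = shaders * flops_per_shader_per_clock * clock_mhz * 1e6
    return flops_per_second / 1e12


# Shader counts and the ~1250 MHz boost clock are the ones quoted in the text.
vega11 = peak_fp32_tflops(shaders=704, clock_mhz=1250)
vega8 = peak_fp32_tflops(shaders=512, clock_mhz=1250)

print(f"Vega 11 @ 1250 MHz: {vega11:.2f} TFLOPS")
print(f"Vega 8  @ 1250 MHz: {vega8:.2f} TFLOPS")
```

Under these assumptions Vega 11 lands at about 1.76 TFLOPS, consistent with the "approximately 1.8 TFLOPS" figure. Sustained throughput in real workloads is lower, since it also depends on memory bandwidth and thermal limits.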

RDNA-based APUs

Rembrandt (2022)

Rembrandt, the codename for AMD's Ryzen 6000 series mobile processors, was announced on January 4, 2022, with laptops featuring these APUs becoming available starting in February 2022. This laptop-focused lineup emphasizes the Ryzen 5 and Ryzen 7 6000U and HS variants, which incorporate 6 to 8 Zen 3+ CPU cores and 12 to 16 threads, with boost clocks reaching up to 4.9 GHz. Built on TSMC's 6 nm process, Rembrandt chips contain approximately 13.1 billion transistors on a 210 mm² die and support configurable TDPs from 15 W to 45 W, enabling efficient operation in thin-and-light designs. They use the FP7 socket and include four 32-bit memory channels compatible with LPDDR5-6400, alongside PCIe 4.0 and USB4 connectivity for enhanced data throughput.

A key advancement in Rembrandt is the integration of the Radeon 680M graphics processor, which made it AMD's first APU to employ the RDNA 2 architecture, with 12 compute units (768 shaders) operating at up to 2.4 GHz. This iGPU delivers up to 3.7 TFLOPS of FP32 compute performance, roughly doubling the graphics capabilities of the Vega-based iGPUs in the prior Cezanne generation (Ryzen 5000 series). It introduces hardware-accelerated ray tracing via 12 ray accelerators and AV1 video decode support, enabling smoother playback and improved efficiency for streaming and content creation. These features position Rembrandt as a strong contender for integrated gaming in ultrabooks, achieving 40-60 FPS in select AAA titles at low to medium settings without discrete GPUs.

Rembrandt's refinements to the Zen 3+ architecture, including optimized power delivery and higher average clocks, contribute to up to 28% better multi-threaded CPU performance over Cezanne while maintaining similar cache latencies. As the inaugural RDNA-based APU, it marked a significant shift from prior GCN/Vega integrations, prioritizing graphics efficiency for mobile workloads. The platform saw widespread adoption, powering over 200 premium laptop models from OEMs including HP by mid-2022, and capturing substantial share in the high-end segment for its balance of productivity, battery life (up to 29 hours in office tasks), and casual gaming.

Phoenix (2023–2024)

The Phoenix platform represents AMD's Zen 4-based accelerated processing unit (APU) architecture, introduced in the Ryzen 7040 series for mobile devices, with a refresh under the Hawk Point codename in the Ryzen 8040 series. Launched in the second quarter of 2023 following an announcement at CES, the initial lineup targeted thin-and-light laptops and handhelds, featuring 6 to 8 cores and 12 to 16 threads, with maximum boost clocks reaching up to 5.2 GHz on the flagship Ryzen 9 7940HS model. The Hawk Point refresh, announced in December 2023 and available starting early 2024, maintained the core design while enhancing AI capabilities, extending the platform's lifecycle into 2024 for broader commercial adoption.

Central to the Phoenix architecture is the integration of RDNA 3 graphics via the Radeon 760M and 780M iGPUs, marking AMD's first mobile APU with this GPU generation. The 780M, found in Ryzen 7 and Ryzen 9 models, employs 12 compute units (CUs) with 768 shaders, operating at up to 2.7–3.0 GHz for approximately 4.3 TFLOPS of FP32 performance, while the 760M in Ryzen 5 variants uses 8 CUs at similar clocks for around 2.9 TFLOPS. These iGPUs leverage RDNA 3's design of two compute units per workgroup processor (WGP), improving efficiency over prior architectures, alongside dedicated AI accelerators. Built on TSMC's 4 nm process node as a monolithic die of approximately 178 mm² with approximately 25.4 billion transistors, the platform supports configurable thermal design power (TDP) from 15 W to 54 W, LPDDR5X-7500 memory, and the FP8 socket for mobile integration.

A key innovation in Phoenix is the introduction of the Ryzen AI neural processing unit (NPU) based on AMD's XDNA architecture, the first dedicated AI engine in an x86 processor. The original 7040 series delivers up to 10 TOPS of INT8 performance via the NPU, while the Hawk Point refresh raises this to 16 TOPS, accelerating AI workloads such as image generation and video upscaling. Neither figure meets the 40 TOPS NPU requirement of Microsoft's Copilot+ PC program. Graphics performance sees over 50% uplift compared to the prior Rembrandt platform's Radeon 680M, driven by RDNA 3 advancements, allowing 1080p and 1440p gaming in titles like Cyberpunk 2077 at medium settings.

The platform gained positive reception for its balance of power efficiency and integrated graphics performance, particularly in handheld gaming devices that employ the custom Phoenix-derived Ryzen Z1 Extreme APU (8 cores, 12-CU 780M at 15–30 W TDP). Reviews highlighted its ability to deliver playable frame rates with low power draw, outperforming competing Intel parts in gaming efficiency by up to 139% in select benchmarks, while maintaining strong battery life in ultrathin laptops. This efficiency stems from the 4 nm process and optimized design, positioning Phoenix as a foundational step for AI-enhanced mobile platforms.
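The process shrinks described in this and the preceding sections are easier to compare as transistor density, that is, transistors per unit of die area. The sketch below is an illustrative calculation only: the `Die` class and its field names are hypothetical, and the inputs are the approximate transistor counts and die sizes quoted in the text, so the results inherit their uncertainty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Die:
    """Approximate die figures as quoted in the article text."""
    name: str
    node: str
    transistors_billion: float
    area_mm2: float

    @property
    def density_mtr_per_mm2(self) -> float:
        # Millions of transistors per square millimetre.
        return self.transistors_billion * 1000 / self.area_mm2


# Values copied from the surrounding sections; all are approximate.
DIES = [
    Die("Raven Ridge", "14 nm", 4.95, 210),
    Die("Renoir", "7 nm", 9.8, 156),
    Die("Cezanne", "7 nm", 10.7, 180),
    Die("Rembrandt", "6 nm", 13.1, 210),
    Die("Phoenix", "4 nm", 25.4, 178),
]

for die in DIES:
    print(f"{die.name:<12} {die.node:>6} "
          f"{die.density_mtr_per_mm2:6.1f} MTr/mm²")
```

On these inputs the 7 nm and 6 nm parts come out at roughly 60 MTr/mm², against roughly 24 MTr/mm² for 14 nm Raven Ridge, while Phoenix on 4 nm is more than twice as dense again. Whole-die averages like these blend logic, cache, and analog I/O, so they are not directly comparable to foundry-quoted logic densities.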

Strix Point (2024)

Strix Point, the codename for AMD's Ryzen AI 300 series, was announced in June 2024 and became available in premium laptops starting in July 2024. Targeted at high-end laptops, it features up to 12 CPU cores, a mix of full Zen 5 and dense Zen 5c cores, delivering up to 24 threads and a maximum boost clock of 5.1 GHz in models like the Ryzen AI 9 HX 370. These monolithic APUs emphasize AI-driven tasks in thin-and-light designs, with configurable power envelopes suited for creative professionals and mobile gamers.

Integrated graphics are powered by the Radeon 890M, based on the RDNA 3.5 architecture with 16 compute units (1024 shaders) clocked up to 2.9 GHz, providing up to approximately 5.9 TFLOPS of FP32 compute performance. Enhancements include improved ray-tracing accelerators over prior RDNA generations and support for AMD Fluid Motion Frames, enabling frame generation for smoother gameplay in supported titles.

The platform is fabricated on TSMC's 4 nm process node, with the die measuring about 233 mm² and incorporating roughly 15 billion transistors across the CPU, GPU, and other components. It supports LPDDR5X memory up to 8000 MT/s and operates within a 15-54 W TDP range, though some configurations extend to 80 W for higher performance. The dedicated XDNA 2 neural processing unit delivers 50 TOPS of INT8 performance, optimized for generative AI workloads such as on-device large language models.

Architecturally, Strix Point achieves a 16% increase in instructions per clock over its Zen 4-based predecessors, contributing to strong efficiency in multi-threaded applications. It is positioned as the first AMD APU series capable of running advanced on-device LLMs fluidly without cloud dependency, and the combined CPU, GPU, and NPU capabilities also suit creative workflows like video editing and 3D rendering. Early reviews highlight its strength in AI-accelerated content creation, with the integrated NPU handling tasks like image generation and real-time upscaling more efficiently than CPU-only alternatives.

Strix Halo (2025)

Strix Halo employs a chiplet-based design, contrasting with the monolithic architecture of Strix Point. It supports up to 16 Zen 5 cores, exceeding Strix Point's maximum of 12 cores combining Zen 5 and Zen 5c variants. The integrated Radeon 8060S graphics, based on RDNA 3.5, feature up to 40 compute units—more than double the 16 in Strix Point's Radeon 890M—delivering performance comparable to discrete GPUs. Enhanced memory bandwidth, up to 256 GB/s via a 256-bit LPDDR5X interface, supports these capabilities. Targeting thicker high-performance laptops and handhelds with greater cooling demands, Strix Halo suits intensive gaming and compute tasks, while Strix Point prioritizes efficiency in premium thin-and-light devices.
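The 256 GB/s figure follows from the bus width and the memory data rate: bytes per transfer multiplied by transfers per second. The sketch below works through that arithmetic. The function name is hypothetical, and the 8000 MT/s data rate for Strix Halo is an assumption, since the text gives only the bus width and the resulting bandwidth. The Rembrandt configuration (four 32-bit channels of LPDDR5-6400) is taken from the earlier section for comparison.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mts: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    bytes per transfer = bus width / 8
    transfers per second = data rate in MT/s * 1e6
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mts * 1e6 / 1e9


# Assumption: Strix Halo's 256-bit LPDDR5X interface runs at 8000 MT/s.
strix_halo = peak_bandwidth_gbs(bus_width_bits=256, data_rate_mts=8000)

# Rembrandt: four 32-bit channels of LPDDR5-6400, as described earlier.
rembrandt = peak_bandwidth_gbs(bus_width_bits=4 * 32, data_rate_mts=6400)

print(f"Strix Halo: {strix_halo:.1f} GB/s")
print(f"Rembrandt:  {rembrandt:.1f} GB/s")
print(f"Ratio:      {strix_halo / rembrandt:.2f}x")
```

Because an iGPU shares this bandwidth with the CPU, widening the bus is the main way a 40-CU design can be kept fed. These are theoretical peaks, and achieved bandwidth is lower in practice.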

References

  1. https://en.wikichip.org/wiki/amd/microarchitectures/zen
  2. https://en.wikichip.org/wiki/amd/microarchitectures/zen_2
  3. https://en.wikichip.org/wiki/amd/microarchitectures/zen_4