Zen (first generation)
from Wikipedia
AMD Zen
The logo for the Zen microarchitecture is a closed ensō
General information
Launched: March 2, 2017[1]
Designed by: AMD
Common manufacturer
CPUID code: Family 17h
Cache
L1 cache: 64 KB instruction, 32 KB data per core
L2 cache: 512 KB per core
L3 cache: 8 MB per CCX (APU: 4 MB)
Architecture and classification
Technology node: 14 nm (FinFET)[2]
Microarchitecture: Zen
Instruction set: AMD64 (x86-64)
Physical specifications
Transistors
  • 4.8 billion per 8-core "Zeppelin" die[3]
Cores
    • 2–4 (essential)
    • 4–8 (mainstream)
    • 8–16 (enthusiast)[4][5][6][7]
    • Up to 32 (server)[4][8]
Sockets
Products, models, variants
Product code names
  • Summit Ridge (Desktop)
  • Whitehaven (HEDT)
  • Raven Ridge (APU/Embedded)
  • Naples (Server CPU)
  • Snowy Owl (Server APU)[10]
Brand names
History
Predecessor: Excavator (4th gen)
Successor: Zen+
Support status
Supported

Zen is the first iteration in the Zen family of computer processor microarchitectures from AMD. It was first used with their Ryzen series of CPUs in February 2017.[4] The first Zen-based preview system was demonstrated at E3 2016, and the architecture was first substantially detailed at an event hosted a block away from the Intel Developer Forum 2016. The first Zen-based CPUs, codenamed "Summit Ridge", reached the market in early March 2017; Zen-derived Epyc server processors launched in June 2017,[11] and Zen-based APUs arrived in November 2017.[12]

Zen is a clean-sheet design that differs from AMD's previous long-standing Bulldozer architecture. Zen-based processors use a 14 nm FinFET process, are reportedly more energy efficient, and can execute significantly more instructions per cycle. SMT has been introduced, allowing each core to run two threads. The cache system has also been redesigned, making the L1 cache write-back. Zen processors use three different sockets: desktop Ryzen chips use the AM4 socket, bringing DDR4 support; the high-end desktop Zen-based Threadripper chips use the TR4 socket, support quad-channel DDR4 memory, and offer 64 PCIe 3.0 lanes (versus 24 on AM4);[13][14] and Epyc server processors use the SP3 socket, offering 128 PCIe 3.0 lanes and octa-channel DDR4.

Zen is based on a SoC design.[15] The memory controller and the PCIe, SATA, and USB controllers are incorporated into the same chip(s) as the processor cores. This has advantages in bandwidth and power, at the expense of chip complexity and die area.[16] This SoC design allows the Zen microarchitecture to scale from laptops and small-form factor mini PCs to high-end desktops and servers.

By 2020, 260 million Zen cores had been shipped by AMD.[17]

Design

A highly simplified illustration of the Zen microarchitecture: a core has a total of 512 KB of L2 cache.
Ryzen 3 1200 die shot
Photomontage of a delidded Zen CPU with an etched die
A delidded AMD EPYC 7001 processor used in servers. The four dies are similar to the ones used in mainstream processors. All EPYC processors contain four dies to provide structural support to the IHS (Integrated Heat Spreader).[18][19][20]
A delidded AMD Athlon 3000G APU, based on the Zen architecture. The die is physically smaller than those on mainstream Zen processors.
Die shot of an AMD Athlon 3000G

According to AMD, the main focus of Zen is on increasing per-core performance.[21][22][23]

New or improved features include:[24]

  • The L1 cache has been changed from write-through to write-back, allowing for lower latency and higher bandwidth.
  • SMT (simultaneous multithreading) architecture allows for two threads per core, a departure from the CMT (clustered multi-thread) design used in the previous Bulldozer architecture. This is a feature previously offered in some IBM, Intel and Oracle processors.[25]
  • A fundamental building block for all Zen-based CPUs is the Core Complex (CCX) consisting of four cores and their associated caches. Processors with more than four cores consist of multiple CCXs connected by Infinity Fabric.[26] Processors with non-multiple-of-four core counts have some cores disabled.
  • Four ALUs, two AGUs/load–store units, and two floating-point units per core.[27]
  • Newly introduced "large" micro-operation cache.[28]
  • Each SMT core can dispatch up to six micro-ops per cycle (a combination of 6 integer micro-ops and 4 floating point micro-ops per cycle).[29][30]
  • Close to 2× faster L1 and L2 bandwidth, with total L3 cache bandwidth up 5×.
  • Clock gating.
  • Larger retire, load, and store queues.
  • Improved branch prediction using a hashed perceptron system with Indirect Target Array similar to the Bobcat microarchitecture,[31] something that has been compared to a neural network by AMD engineer Mike Clark.[32]
  • The branch predictor is decoupled from the fetch stage.
  • A dedicated stack engine for modifying the stack pointer, similar to that of Intel Haswell and Broadwell processors.[33]
  • Move elimination, a method that reduces physical data movement to reduce power consumption.
  • Binary compatibility with Intel's Skylake (excluding VT-x and private MSRs).
  • CLZERO instruction for clearing a cache line.[34] Useful for handling ECC-related Machine-check exceptions.
  • PTE (page table entry) coalescing, which combines 4 KB page table entries into a 32 KB page size.
  • "Pure Power" (more accurate power monitoring sensors).[35]
    • Support for Intel-style running average power limit (RAPL) measurement.[36]
  • Smart Prefetch.
  • Precision Boost.
  • eXtended Frequency Range (XFR), an automated overclocking feature which boosts clock speeds beyond the advertised turbo frequency.[37]
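The CCX building block described in the list above can be illustrated with a short sketch. It assumes that disabled cores are spread evenly across the CCXs, which matches the "CCX × cores per CCX" configurations listed in the product tables (for example 2 × 3 for six-core parts) but is not stated as a general rule in the text. The names `CcxLayout` and `symmetric_layout` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass

MAX_CORES_PER_CCX = 4  # from the text: a CCX consists of four cores


@dataclass(frozen=True)
class CcxLayout:
    """Hypothetical helper describing how cores are spread over CCXs."""
    ccx_count: int
    active_cores_per_ccx: int

    @property
    def disabled_cores(self) -> int:
        # Cores physically present in the CCXs but not enabled.
        return self.ccx_count * (MAX_CORES_PER_CCX - self.active_cores_per_ccx)

    def label(self) -> str:
        # Same "CCX × cores per CCX" notation the product tables use.
        return f"{self.ccx_count} × {self.active_cores_per_ccx}"


def symmetric_layout(total_cores: int, ccx_count: int) -> CcxLayout:
    """Spread total_cores evenly over ccx_count CCXs (assumed symmetric)."""
    if total_cores <= 0 or ccx_count <= 0:
        raise ValueError("core and CCX counts must be positive")
    per_ccx, remainder = divmod(total_cores, ccx_count)
    if remainder or per_ccx > MAX_CORES_PER_CCX:
        raise ValueError("cores cannot be spread evenly across the CCXs")
    return CcxLayout(ccx_count, per_ccx)


six_core = symmetric_layout(total_cores=6, ccx_count=2)
print(six_core.label(), "with", six_core.disabled_cores, "cores disabled")
```

Under these assumptions a six-core part built from two CCXs has one core disabled in each CCX.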

This is the first time in a very long time that we engineers have been given the total freedom to build a processor from scratch and do the best we can do. It is a multi-year project with a really large team. It's like a marathon effort with some sprints in the middle. The team is working very hard, but they can see the finish line. I guarantee that it will deliver a huge improvement in performance and power consumption over the previous generation.

— Suzanne Plummer, Zen team leader, on September 19th, 2015.[38]

The Zen architecture is built on a 14 nanometer FinFET process subcontracted to GlobalFoundries,[39] which in turn licenses its 14 nm process from Samsung Electronics.[40] This gives greater efficiency than the 32 nm and 28 nm processes of previous AMD FX CPUs and AMD APUs, respectively.[41] The "Summit Ridge" Zen family of CPUs use the AM4 socket and feature DDR4 support and a 95 W TDP (thermal design power).[41] While newer roadmaps don't confirm the TDP for desktop products, they suggest a range for low-power mobile products with up to two Zen cores from 5 to 15 W and 15 to 35 W for performance-oriented mobile products with up to four Zen cores.[42]

Each Zen core can decode four instructions per clock cycle and includes a micro-op cache which feeds two schedulers, one each for the integer and floating point segments.[43][44] Each core has two address generation units, four integer units, and four floating point units. Two of the floating point units are adders, and two are multiply-adders. However, using multiply-add operations may prevent a simultaneous add operation in one of the adder units.[45] There are also improvements in the branch predictor. The L1 cache size is 64 KB for instructions per core and 32 KB for data per core. The L2 cache size is 512 KB per core, and the L3 is 1–2 MB per core. L3 caches offer 5× the bandwidth of previous AMD designs.

History and development


AMD began planning the Zen microarchitecture shortly after re-hiring Jim Keller in August 2012.[46] AMD formally revealed Zen in 2015.

The team in charge of Zen was led by Keller (who left in September 2015 after a 3-year tenure) and Zen Team Leader Suzanne Plummer.[47][48] The Chief Architect of Zen was AMD Senior Fellow Michael Clark.[49][50][51]

Zen was originally planned for 2017 following the ARM64-based K12 sister core, but on AMD's 2015 Financial Analyst Day it was revealed that K12 was delayed in favor of the Zen design, to allow it to enter the market within the 2016 timeframe,[9] with the release of the first Zen-based processors expected for October 2016.[52]

In November 2015, a source inside AMD reported that Zen microprocessors had been tested and "met all expectations" with "no significant bottlenecks found".[2][53]

In December 2015, it was rumored that Samsung may have been contracted as a fabricator for AMD's 14 nm FinFET processors, including both Zen and AMD's then-upcoming Polaris GPU architecture.[54] This was clarified by AMD's July 2016 announcement that products had been successfully produced on Samsung's 14 nm FinFET process.[55] AMD stated Samsung would be used "if needed", arguing this would reduce risk for AMD by decreasing dependence on any one foundry.

In December 2019, AMD started putting out first generation Ryzen products built using the second generation Zen+ architecture.[56]

Advantages over predecessors


Manufacturing process


Processors based on Zen use 14 nm FinFET silicon.[57] These processors are reportedly produced at GlobalFoundries.[58] Prior to Zen, AMD's smallest process size was 28 nm, as utilized by their Steamroller and Excavator microarchitectures.[59][60] The immediate competition, Intel's Skylake and Kaby Lake microarchitectures, is also fabricated on 14 nm FinFET,[61] though Intel planned to begin releasing 10 nm parts later in 2017.[62] In comparison to Intel's 14 nm FinFET, AMD claimed in February 2017 that the Zen cores would be 10% smaller.[63] Intel did not meet its 10 nm goal: in July 2018 it announced that mainstream 10 nm processors should not be expected before the second half of 2019,[64] and as of 2021 only mobile chips had been produced on the 10 nm process.

For identical designs, these die shrinks would use less current (and power) at the same frequency (or voltage). As CPUs are usually power limited (typically up to ~125 W, or ~45 W for mobile), smaller transistors allow for either lower power at the same frequency, or higher frequency at the same power.[65]
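The trade-off in the paragraph above can be sketched with the textbook first-order model of CMOS switching power, P ≈ α·C·V²·f. This model is an assumption brought in for illustration and is not taken from the article's sources; the capacitance, voltage, and power figures below are made up and do not describe any real Zen part.

```python
def dynamic_power(capacitance_f: float, voltage_v: float,
                  frequency_hz: float, activity: float = 1.0) -> float:
    """First-order switching power in watts: activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def max_frequency(power_budget_w: float, capacitance_f: float,
                  voltage_v: float, activity: float = 1.0) -> float:
    """Highest frequency (Hz) that stays within a fixed power budget."""
    return power_budget_w / (activity * capacitance_f * voltage_v ** 2)


# Illustrative numbers only: a shrink that halves switched capacitance.
old_c, new_c = 20e-9, 10e-9   # farads (hypothetical)
voltage = 1.0                 # volts (hypothetical)
budget = 60.0                 # watts (hypothetical)

# Same frequency, smaller transistors: lower power.
print(dynamic_power(old_c, voltage, 3e9), dynamic_power(new_c, voltage, 3e9))
# Same power budget, smaller transistors: higher attainable frequency.
print(max_frequency(budget, old_c, voltage), max_frequency(budget, new_c, voltage))
```

In this simplified model, halving the switched capacitance either halves power at a fixed frequency or doubles the frequency reachable at a fixed power; real designs are further limited by voltage scaling and leakage, which the sketch ignores.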

Performance


One of Zen's major goals in 2016 was to focus on performance per-core, and it was targeting a 40% improvement in instructions per cycle (IPC) over its predecessor.[66] Excavator, in comparison, offered 4–15% improvement over previous architectures.[67][68] AMD announced the final Zen microarchitecture actually achieved 52% improvement in IPC over Excavator.[69] The inclusion of SMT also allows each core to process up to two threads, increasing processing throughput by better use of available resources.

The Zen processors also employ sensors across the chip to dynamically scale frequency and voltage.[70] This allows for the maximum frequency to be dynamically and automatically defined by the processor itself based upon available cooling.

AMD has demonstrated an 8-core/16-thread Zen processor outperforming an equally-clocked Intel Broadwell-E processor in Blender rendering[4][10] and HandBrake benchmarks.[70]

Zen supports AVX2 but it requires two clock cycles to complete each AVX2 instruction compared to Intel's one.[71][72] This difference was corrected in Zen 2.

Memory


Zen supports DDR4 memory (up to eight channels)[73] and ECC.[74]

Pre-release reports stated APUs using the Zen architecture would also support High Bandwidth Memory (HBM).[75] However, the first demonstrated APU did not use HBM.[76] Previous APUs from AMD relied on shared memory for both the GPU and the CPU.

Power consumption and heat output


Processors built at the 14 nm node on FinFET silicon should show reduced power consumption and therefore heat over their 28 nm and 32 nm non-FinFET predecessors (for equivalent designs), or be more computationally powerful at equivalent heat output/power consumption.

Zen also uses clock gating,[44] which shuts off the clock signal to underutilized portions of the core to save power. This comes from AMD's SenseMI technology, using sensors across the chip to dynamically scale frequency and voltage.[70]

Enhanced security and virtualization support


Zen added support for AMD's Secure Memory Encryption (SME) and AMD's Secure Encrypted Virtualization (SEV). Secure Memory Encryption is real-time memory encryption done per page table entry. Encryption occurs on a hardware AES engine and keys are managed by the onboard "Security" Processor (ARM Cortex-A5) at boot time to encrypt each page, allowing any memory (including non-volatile varieties) to be encrypted. AMD SME also makes the contents of the memory more resistant to memory snooping and cold boot attacks.[77][78]

SME can be used to mark individual pages of memory as encrypted through the page tables. A page of memory that is marked encrypted will be automatically decrypted when read from DRAM and will be automatically encrypted when written to DRAM. The SME feature is identified through a CPUID function and enabled through the SYSCFG MSR. Once enabled, page table entries will determine how the memory is accessed. If a page table entry has the memory encryption mask set, then that memory will be accessed as encrypted memory. The memory encryption mask (as well as other related information) is determined from settings returned through the same CPUID function that identifies the presence of the feature.[79]
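The flow described above (detect the feature through CPUID, read the encryption mask from the same function, then set that mask in page table entries) can be sketched as follows. The leaf number and bit positions are assumptions based on AMD's public programming documentation rather than on this article, and Python cannot execute CPUID itself, so the register values are passed in as plain integers. The sample values are illustrative only.

```python
# Assumed layout (not stated in the article): CPUID leaf 0x8000001F,
# EAX bit 0 = SME supported, EBX bits 5:0 = position of the encryption bit.
SME_CPUID_LEAF = 0x8000001F
SME_SUPPORTED_BIT = 0
CBIT_POSITION_FIELD = 0x3F


def sme_supported(eax: int) -> bool:
    """True if the assumed SME feature bit is set in the CPUID EAX value."""
    return bool((eax >> SME_SUPPORTED_BIT) & 1)


def encryption_mask(ebx: int) -> int:
    """Build the page-table encryption mask from the reported bit position."""
    return 1 << (ebx & CBIT_POSITION_FIELD)


def mark_encrypted(pte: int, mask: int) -> int:
    """Return the page table entry with the encryption mask set."""
    return pte | mask


def is_encrypted(pte: int, mask: int) -> bool:
    """True if the entry will be accessed as encrypted memory."""
    return bool(pte & mask)


# Illustrative register values, as if read from the assumed CPUID leaf.
eax, ebx = 0x1, 47
if sme_supported(eax):
    mask = encryption_mask(ebx)
    pte = 0x1000 | 0x3            # hypothetical entry: frame 0x1000, present + writable
    encrypted_pte = mark_encrypted(pte, mask)
    print(hex(mask), hex(encrypted_pte), is_encrypted(encrypted_pte, mask))
```

Enabling the feature through the SYSCFG MSR is a privileged operation performed by firmware or the operating system and is deliberately left out of this sketch.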

The Secure Encrypted Virtualization (SEV) feature allows the memory contents of a virtual machine (VM) to be transparently encrypted with a key unique to the guest VM. The memory controller contains a high-performance encryption engine which can be programmed with multiple keys for use by different VMs in the system. The programming and management of these keys is handled by the AMD Secure Processor firmware which exposes an API for these tasks.[80]

Connectivity


Incorporating much of the southbridge into the SoC, the Zen CPU includes SATA, USB, and PCI Express NVMe links.[81][82] This can be augmented by available Socket AM4 chipsets which add connectivity options including additional SATA and USB connections, and support for AMD's Crossfire and Nvidia's SLI.[83]

AMD, in announcing its Radeon Instinct line, argued that the upcoming Zen-based Naples server CPU would be particularly suited for building deep learning systems.[84][85] The 128[86] PCIe lanes per Naples CPU allow eight Instinct cards to connect at PCIe x16 to a single CPU. This compares favorably to the Intel Xeon line, with only 40[citation needed] PCIe lanes.

Features


CPUs


APUs


APU features table

Products


The Zen architecture is used in the current-generation desktop Ryzen CPUs. It is also in Epyc server processors (successor of Opteron processors), and APUs.[75][unreliable source][87][88]

The first desktop processors without graphics processing units (codenamed "Summit Ridge") were initially expected to start selling at the end of 2016, according to an AMD roadmap; with the first mobile and desktop processors of the AMD Accelerated Processing Unit type (codenamed "Raven Ridge") following in late 2017.[89] AMD officially delayed Zen until Q1 of 2017. In August 2016, an early demonstration of the architecture showed an 8-core/16-thread engineering sample CPU at 3.0 GHz.[10]

In December 2016, AMD officially announced the desktop CPU line under the Ryzen brand for release in Q1 2017. It also confirmed Server processors would be released in Q2 2017, and mobile APUs in H2 2017.[90]

On March 2, 2017, AMD officially launched the first Zen architecture-based octacore Ryzen desktop CPUs. The final clock speeds and TDPs for the 3 CPUs released in Q1 of 2017 demonstrated significant performance-per-watt benefits over the previous K15h (Piledriver) architecture.[91][92] The octacore Ryzen desktop CPUs demonstrated performance-per-watt comparable to Intel's Broadwell octacore CPUs.[93][94]

In March 2017, AMD also demonstrated an engineering sample of a server CPU based on the Zen architecture. The CPU (codenamed "Naples") was configured as a dual-socket server platform with each CPU having 32 cores/64 threads.[4][10]

Desktop processors


Common features of Ryzen 1000 desktop CPUs:

  • Socket: AM4.
  • All the CPUs support DDR4-2666 in dual-channel mode.
  • All the CPUs support 24 PCIe 3.0 lanes. 4 of the lanes are reserved as link to the chipset.
  • No integrated graphics.
  • L1 cache: 96 KB (32 KB data + 64 KB instruction) per core.
  • L2 cache: 512 KB per core.
  • Node/fabrication process: GlobalFoundries 14 LP.
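| Branding | Model | Cores (threads) | Base (GHz) | PBO 1–2 (≥3) (GHz) | XFR[95] 1–2 (GHz) | L3 cache (total) | TDP | Core config[i] | Release date | Launch price[a] |
|---|---|---|---|---|---|---|---|---|---|---|
| Ryzen 7 | 1800X[96] | 8 (16) | 3.6 | 4.0 (3.7) | 4.1 | 16 MB | 95 W | 2 × 4 | March 2, 2017 | US $499 |
| Ryzen 7 | PRO 1700X | 8 (16) | 3.4 | 3.8 (3.5) | 3.9 | 16 MB | 95 W | 2 × 4 | June 29, 2017 | OEM |
| Ryzen 7 | 1700X[96] | 8 (16) | 3.4 | 3.8 (3.5) | 3.9 | 16 MB | 95 W | 2 × 4 | March 2, 2017 | US $399 |
| Ryzen 7 | PRO 1700 | 8 (16) | 3.0 | 3.7 (3.2) | 3.75 | 16 MB | 65 W | 2 × 4 | June 29, 2017 | OEM |
| Ryzen 7 | 1700[96] | 8 (16) | 3.0 | 3.7 (3.2) | 3.75 | 16 MB | 65 W | 2 × 4 | March 2, 2017 | US $329 |
| Ryzen 5 | 1600X[97] | 6 (12) | 3.6 | 4.0 (3.7) | 4.1 | 16 MB | 95 W | 2 × 3 | April 11, 2017 | US $249 |
| Ryzen 5 | PRO 1600 | 6 (12) | 3.2 | 3.6 (3.4) | 3.7 | 16 MB | 65 W | 2 × 3 | June 29, 2017 | OEM |
| Ryzen 5 | 1600[97] | 6 (12) | 3.2 | 3.6 (3.4) | 3.7 | 16 MB | 65 W | 2 × 3 | April 11, 2017 | US $219 |
| Ryzen 5 | 1500X[97] | 4 (8) | 3.5 | 3.7 (3.6) | 3.9 | 16 MB | 65 W | 2 × 2 | April 11, 2017 | US $189 |
| Ryzen 5 | PRO 1500 | 4 (8) | 3.5 | 3.7 (3.6) | 3.9 | 16 MB | 65 W | 2 × 2 | June 29, 2017 | OEM |
| Ryzen 5 | 1400[97] | 4 (8) | 3.2 | 3.4 (3.4) | 3.45 | 8 MB | 65 W | 2 × 2 | April 11, 2017 | US $169 |
| Ryzen 3 | 1300X[98] | 4 (4) | 3.5 | 3.7 (3.5) | 3.9 | 8 MB | 65 W | 2 × 2 | July 27, 2017 | US $129 |
| Ryzen 3 | PRO 1300 | 4 (4) | 3.5 | 3.7 (3.5) | 3.9 | 8 MB | 65 W | 2 × 2 | June 29, 2017 | OEM |
| Ryzen 3 | PRO 1200 | 4 (4) | 3.1 | 3.4 (3.1) | 3.45 | 8 MB | 65 W | 2 × 2 | June 29, 2017 | OEM |
| Ryzen 3 | 1200[98] | 4 (4) | 3.1 | 3.4 (3.1) | 3.45 | 8 MB | 65 W | 2 × 2 | July 27, 2017 | US $109 |

  1. ^ Core Complexes (CCX) × cores per CCX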


Common features of Ryzen 1000 HEDT CPUs:

  • Socket: TR4.
  • All the CPUs support DDR4-2666 in quad-channel mode.
  • All the CPUs support 64 PCIe 3.0 lanes. 4 of the lanes are reserved as link to the chipset.
  • No integrated graphics.
  • L1 cache: 96 KB (32 KB data + 64 KB instruction) per core.
  • L2 cache: 512 KB per core.
  • Node/fabrication process: GlobalFoundries 14LP.
| Branding | Model | Cores (threads) | Base (GHz) | PBO 1–4 (≥5) (GHz) | XFR[95] 1–2 (GHz) | L3 cache (total) | TDP | Chiplets | Core config[i] | Release date | Launch price[a] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Ryzen Threadripper | 1950X[99] | 16 (32) | 3.4 | 4.0 (3.7) | 4.2 | 32 MB | 180 W | 2 × CCD[ii] | 4 × 4 | August 10, 2017 | US $999 |
| Ryzen Threadripper | 1920X[99] | 12 (24) | 3.5 | 4.0 (3.7) | 4.2 | 32 MB | 180 W | 2 × CCD[ii] | 4 × 3 | August 10, 2017 | US $799 |
| Ryzen Threadripper | 1900X[99] | 8 (16) | 3.8 | 4.0 (3.9) | 4.2 | 16 MB | 180 W | 2 × CCD[ii] | 2 × 4 | August 31, 2017 | US $549 |

  1. ^ Core Complexes (CCX) × cores per CCX
  2. ^ Processor package actually contains two additional inactive dies to provide structural support to the integrated heat spreader.
Ryzen 5 1600 CPU on a motherboard
Threadripper 1950X TR4 in socket

Desktop APUs


Ryzen APUs are identified by either the G or GE suffix in their name.

Die shot of an AMD 2200G APU
Model Release date
& price
Fab Thermal Solution CPU GPU Socket PCIe lanes DDR4
memory
support
TDP
(W)
Cores
(threads)
Clock rate (GHz) Cache Model Config[i] Clock
(GHz)
Processing
power
(GFLOPS)[ii]
Base Boost L1 L2 L3
Athlon 200GE[100] September 6, 2018
US $55
GloFo
14LP
AMD 65W thermal solution 2 (4) 3.2 64 KB inst.
32 KB data
per core
512 KB
per core
4 MB Vega 3 192:12:4
3 CU
1.0 384 AM4 16 (8+4+4) 2667
dual-channel
35
Athlon Pro 200GE[101] September 6, 2018
OEM
OEM
Athlon 220GE[102] December 21, 2018
US $65
AMD 65W thermal solution 3.4
Athlon 240GE[103] December 21, 2018
US $75
3.5
Athlon 3000G[104] November 19, 2019
US $49
1.1 424.4
Athlon 300GE[105] July 7, 2019
OEM
OEM 3.4
Athlon Silver 3050GE[106] July 21, 2020
OEM
Ryzen 3 Pro 2100GE[107] c. 2019

OEM

3.2 ? ? 2933
dual-channel
Ryzen 3 2200GE[108] April 19, 2018
OEM
4 (4) 3.2 3.6 Vega 8 512:32:16
8 CU
1126
Ryzen 3 Pro 2200GE[109] May 10, 2018
OEM
Ryzen 3 2200G February 12, 2018
US $99
Wraith Stealth 3.5 3.7 45–
65
Ryzen 3 Pro 2200G[110] May 10, 2018
OEM
OEM
Ryzen 5 2400GE[111] April 19, 2018
OEM
4 (8) 3.2 3.8 RX Vega 11 704:44:16
11 CU
1.25 1760 35
Ryzen 5 Pro 2400GE[112] May 10, 2018
OEM
Ryzen 5 2400G[113] February 12, 2018[114][115]
US $169
Wraith Stealth 3.6 3.9 45–
65
Ryzen 5 Pro 2400G[116] May 10, 2018
OEM
OEM
  1. ^ Unified Shaders : Texture Mapping Units : Render Output Units and Compute Units (CU)
  2. ^ Single-precision performance is calculated from the base (or boost) core clock speed based on a FMA operation.

Mobile APUs

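Common to all models: GloFo 14LP fab; L1 cache 64 KB inst. + 32 KB data per core; L2 cache 512 KB per core; L3 cache 4 MB; socket FP5; 12 (8+4) PCIe lanes.

| Model | Release date | Cores (threads) | Base (GHz) | Boost (GHz) | GPU model | GPU config[i] | GPU clock (MHz) | Processing power (GFLOPS)[ii] | Memory support | TDP |
|---|---|---|---|---|---|---|---|---|---|---|
| Athlon Pro 200U | 2019 | 2 (4) | 2.3 | 3.2 | Radeon Vega 3 | 192:12:4, 3 CU | 1000 | 384 | DDR4-2400 dual-channel | 12–25 W |
| Athlon 300U | Jan 6, 2019 | 2 (4) | 2.4 | 3.3 | Radeon Vega 3 | 192:12:4, 3 CU | 1000 | 384 | DDR4-2400 dual-channel | 12–25 W |
| Ryzen 3 2200U | Jan 8, 2018 | 2 (4) | 2.5 | 3.4 | Radeon Vega 3 | 192:12:4, 3 CU | 1100 | 422.4 | DDR4-2400 dual-channel | 12–25 W |
| Ryzen 3 3200U | Jan 6, 2019 | 2 (4) | 2.6 | 3.5 | Radeon Vega 3 | 192:12:4, 3 CU | 1200 | 460.8 | DDR4-2400 dual-channel | 12–25 W |
| Ryzen 3 2300U | Jan 8, 2018 | 4 (4) | 2.0 | 3.4 | Radeon Vega 6 | 384:24:8, 6 CU | 1100 | 844.8 | DDR4-2400 dual-channel | 12–25 W |
| Ryzen 3 Pro 2300U | May 15, 2018 | 4 (4) | 2.0 | 3.4 | Radeon Vega 6 | 384:24:8, 6 CU | 1100 | 844.8 | DDR4-2400 dual-channel | 12–25 W |
| Ryzen 5 2500U | Oct 26, 2017 | 4 (8) | 2.0 | 3.6 | Radeon Vega 8 | 512:32:16, 8 CU | 1100 | 1126.4 | DDR4-2400 dual-channel | 12–25 W |
| Ryzen 5 Pro 2500U | May 15, 2018 | 4 (8) | 2.0 | 3.6 | Radeon Vega 8 | 512:32:16, 8 CU | 1100 | 1126.4 | DDR4-2400 dual-channel | 12–25 W |
| Ryzen 5 2600H | Sep 10, 2018 | 4 (8) | 3.2 | 3.6 | Radeon Vega 8 | 512:32:16, 8 CU | 1100 | 1126.4 | DDR4-3200 dual-channel | 35–54 W |
| Ryzen 7 2700U | Oct 26, 2017 | 4 (8) | 2.2 | 3.8 | Radeon RX Vega 10 | 640:40:16, 10 CU | 1300 | 1664 | DDR4-2400 dual-channel | 12–25 W |
| Ryzen 7 Pro 2700U | May 15, 2018 | 4 (8) | 2.2 | 3.8 | Radeon Vega 10 | 640:40:16, 10 CU | 1300 | 1664 | DDR4-2400 dual-channel | 12–25 W |
| Ryzen 7 2800H | Sep 10, 2018 | 4 (8) | 3.3 | 3.8 | Radeon RX Vega 11 | 704:44:16, 11 CU | 1300 | 1830.4 | DDR4-3200 dual-channel | 35–54 W |

  1. ^ Unified shaders : Texture mapping units : Render output units and Compute units (CU)
  2. ^ Single precision performance is calculated from the base (or boost) core clock speed based on a FMA operation.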
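The footnote on processing power can be made concrete with a small sketch. It assumes the usual convention that one fused multiply-add counts as two floating-point operations per shader per clock, which reproduces the GFLOPS figures in the table above; the function names are hypothetical.

```python
def shader_count(gpu_config: str) -> int:
    """Extract unified shaders from a 'shaders:TMUs:ROPs' config string."""
    return int(gpu_config.split(":")[0])


def fp32_gflops(shaders: int, clock_mhz: int) -> float:
    """Single-precision GFLOPS, counting one FMA as two operations per clock."""
    return shaders * 2 * clock_mhz / 1000


# Values taken from the mobile APU table: (GPU config, GPU clock in MHz).
for name, config, clock in [
    ("Radeon Vega 3", "192:12:4", 1000),
    ("Radeon Vega 8", "512:32:16", 1100),
    ("Radeon RX Vega 11", "704:44:16", 1300),
]:
    print(name, fp32_gflops(shader_count(config), clock))
```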

Ultra-mobile APUs


Dalí

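Common to all models: 14 nm fab; L1 cache 64 KB inst. + 32 KB data per core; L2 cache 512 KB per core; L3 cache 4 MB; GPU model Radeon Graphics (Vega); socket FP5; 12 (8+4) PCIe lanes; DDR4-2400 dual-channel memory.

| Model | Release date | Cores (threads) | Base (GHz) | Boost (GHz) | GPU config[a] | GPU clock (GHz) | Processing power (GFLOPS)[b] | TDP | Part number |
|---|---|---|---|---|---|---|---|---|---|
| AMD 3020e | Jan 6, 2020 | 2 (2) | 1.2 | 2.6 | 192:12:4, 3 CU | 1.0 | 384 | 6 W | YM3020C7T2OFG |
| Athlon PRO 3045B | Q1 2021 | 2 (2) | 2.3 | 3.2 | 128:8:4, 2 CU | 1.1 | 281.6 | 15 W | YM3045C4T2OFG |
| Athlon Silver 3050U | Jan 6, 2020 | 2 (2) | 2.3 | 3.2 | 128:8:4, 2 CU | 1.1 | 281.6 | 15 W | YM3050C4T2OFG |
| Athlon Silver 3050C | Sep 22, 2020 | 2 (2) | 2.3 | 3.2 | 128:8:4, 2 CU | 1.1 | 281.6 | 15 W | YM305CC4T2OFG |
| Athlon Silver 3050e | Jan 6, 2020 | 2 (4) | 1.4 | 2.8 | 192:12:4, 3 CU[117] | 1.0 | 384 | 6 W | YM3050C7T2OFG |
| Athlon PRO 3145B | Q1 2021 | 2 (4) | 2.4 | 3.3 | 192:12:4, 3 CU | 1.0 | 384 | 15 W | YM3145C4T2OFG |
| Athlon Gold 3150U | Jan 6, 2020 | 2 (4) | 2.4 | 3.3 | 192:12:4, 3 CU | 1.0 | 384 | 15 W | YM3150C4T2OFG |
| Athlon Gold 3150C | Sep 22, 2020 | 2 (4) | 2.4 | 3.3 | 192:12:4, 3 CU | 1.0 | 384 | 15 W | YM315CC4T2OFG |
| Ryzen 3 3250U | Jan 6, 2020 | 2 (4) | 2.6 | 3.5 | 192:12:4, 3 CU | 1.2 | 460.8 | 15 W | YM3250C4T2OFG |
| Ryzen 3 3250C | Sep 22, 2020 | 2 (4) | 2.6 | 3.5 | 192:12:4, 3 CU | 1.2 | 460.8 | 15 W | YM325CC4T2OFG |

  1. ^ Unified shaders : Texture mapping units : Render output units and Compute units (CU)
  2. ^ Single precision performance is calculated from the base (or boost) core clock speed based on a FMA operation.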

Pollock

Common to all models: 14 nm fab; 2 (4) cores (threads); 1.2 GHz base and 2.3 GHz boost clock; L1 cache 64 KB inst. + 32 KB data per core; L2 cache 512 KB per core; L3 cache 4 MB; GPU Radeon Graphics (Vega), 192:12:4[a], 3 CU, at 0.6 GHz for 230.4 GFLOPS[b]; socket FT5; 12 (8+4) PCIe lanes; DDR4-1600 single-channel memory; 6 W TDP.

| Model | Release date | Part number |
|---|---|---|
| AMD 3015e | Jul 6, 2020 | AM3015BRP2OFJ |
| AMD 3015Ce | Apr 29, 2021 | AM301CBRP2OFJ |

  1. ^ Unified shaders : Texture mapping units : Render output units and Compute units (CU)
  2. ^ Single precision performance is calculated from the base (or boost) core clock speed based on a FMA operation.

Embedded processors


V1000


In February 2018, AMD announced the V1000 series of embedded Zen+Vega APUs with four SKUs,[118] followed by three more SKUs in December of that year.

Model Release
date
Fab CPU GPU Memory
support
TDP Junction
temp.
range

(°C)
Cores
(threads)
Clock rate (GHz) Cache Model Config[i] Clock
(GHz)
Processing
power
(GFLOPS)[ii]
Base Boost L1 L2 L3
V1202B February 2018 GloFo
14LP
2 (4) 2.3 3.2 64 KB inst.
32 KB data
per core
512 KB
per core
4 MB Vega 3 192:12:16
3 CU
1.0 384 DDR4-2400
dual-channel
12–25 W 0–105
V1404I December 2018 4 (8) 2.0 3.6 Vega 8 512:32:16
8 CU
1.1 1126.4 -40–105
V1500B 2.2 0–105
V1605B February 2018 2.0 3.6 Vega 8 512:32:16
8 CU
1.1 1126.4
V1756B 3.25 DDR4-3200
dual-channel
35–54 W
V1780B December 2018 3.35
V1807B February 2018 3.8 Vega 11 704:44:16
11 CU
1.3 1830.4
  1. ^ Unified Shaders : Texture Mapping Units : Render Output Units and Compute Units (CU)
  2. ^ Single-precision performance is calculated from the base (or boost) core clock speed based on a FMA operation.

R1000


In 2019, AMD announced the R1000 series of embedded Zen+Vega APUs.

Common to all models: GloFo 14LP fab; L1 cache 64 KB inst. + 32 KB data per core; L2 cache 512 KB per core; L3 cache 4 MB; GPU model Vega 3, 192:12:4[i], 3 CU.

| Model | Release date | Cores (threads) | Base (GHz) | Boost (GHz) | GPU clock (GHz) | Processing power (GFLOPS)[ii] | Memory support | TDP |
|---|---|---|---|---|---|---|---|---|
| R1102G | February 25, 2020 | 2 (2) | 1.2 | 2.6 | 1.0 | 384 | DDR4-2400 single-channel | 6 W |
| R1305G | February 25, 2020 | 2 (4) | 1.5 | 2.8 | 1.0 | 384 | DDR4-2400 dual-channel | 8–10 W |
| R1505G | April 16, 2019 | 2 (4) | 2.4 | 3.3 | 1.0 | 384 | DDR4-2400 dual-channel | 12–25 W |
| R1606G | April 16, 2019 | 2 (4) | 2.6 | 3.5 | 1.2 | 460.8 | DDR4-2400 dual-channel | 12–25 W |

  1. ^ Unified Shaders : Texture Mapping Units : Render Output Units and Compute Units (CU)
  2. ^ Single-precision performance is calculated from the base (or boost) core clock speed based on a FMA operation.

Server processors

Epyc

AMD announced in March 2017 that it would release a server platform based on Zen, codenamed Naples, in the second quarter of the year. The platform includes 1- and 2-socket systems. The CPUs in multi-processor configurations communicate via AMD's Infinity Fabric.[119] Each chip supports eight channels of memory and 128 PCIe 3.0 lanes, of which 64 lanes are used for CPU-to-CPU communication through Infinity Fabric when installed in a dual-processor configuration.[120] AMD officially revealed Naples under the brand name Epyc in May 2017.[121]

On June 20, 2017, AMD officially released the Epyc 7000 series CPUs at a launch event in Austin, Texas.[122]

Common features:

  • SP3 socket
  • Zen microarchitecture
  • GloFo 14 nm process
  • MCM with four System-on-a-chip (SOC) dies, two core complexes (CCX) per SOC die[123]
  • Eight-channel DDR4-2666 (the 7251 model is limited to DDR4-2400)
  • 128 PCIe 3.0 lanes per socket, 64 of which are used for Infinity Fabric inter-processor links in 2P platforms
  • 7xx1P series models are limited to uniprocessor operation (1P, single-socket)
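The lane accounting in the list above can be sketched as simple arithmetic. The constants come from the text; the function names are hypothetical, and the x16 slot count is only an upper bound that ignores lanes a real platform reserves for other devices.

```python
LANES_PER_SOCKET = 128    # from the text: 128 PCIe 3.0 lanes per socket
INTER_SOCKET_LANES = 64   # from the text: lanes each socket gives to Infinity Fabric in 2P


def io_lanes(sockets: int) -> int:
    """Lanes left for I/O devices in a 1P or 2P platform."""
    if sockets == 1:
        return LANES_PER_SOCKET
    if sockets == 2:
        return sockets * (LANES_PER_SOCKET - INTER_SOCKET_LANES)
    raise ValueError("Epyc 7001 platforms are 1P or 2P")


def max_x16_devices(lanes: int) -> int:
    """Upper bound on devices attached at PCIe x16."""
    return lanes // 16


for sockets in (1, 2):
    lanes = io_lanes(sockets)
    print(sockets, "socket(s):", lanes, "I/O lanes,", max_x16_devices(lanes), "x16 devices")
```

Both configurations end up with 128 lanes for I/O, because the second socket's extra lanes are offset by the 64 lanes each socket spends on the inter-processor link.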
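All models use the SP3 socket, four chiplets[123], and 512 KB of L2 cache per core.

| Model | Cores (threads) | Core config[i] | Base (GHz) | Boost (GHz) | L3 per CCX | Total cache | Scaling | TDP (W) | Release date | Release price | Embedded options[ii] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7251[124][125] | 8 (16) | 8 × 1 | 2.1 | 2.9 | 4 MB | 36 MB | 2P | 120 | Jun 2017[126] | $475 | 7251 |
| 7261[124][127] | 8 (16) | 8 × 1 | 2.5 | 2.9 | 8 MB | 68 MB | 2P | 155/170 | Jun 2018[128] | $570 | 7261 |
| 7281[124][125] | 16 (32) | 8 × 2 | 2.1 | 2.7 | 4 MB | 40 MB | 2P | 155/170 | Jun 2017[126] | $650 | 7281 |
| 7301[124][125] | 16 (32) | 8 × 2 | 2.2 | 2.7 | 8 MB | 72 MB | 2P | 155/170 | Jun 2017[126] | $800 | 7301 |
| 7351(P)[124][125] | 16 (32) | 8 × 2 | 2.4 | 2.9 | 8 MB | 72 MB | 2P (1P) | 155/170 | Jun 2017[126] | $1100 ($750) | 7351(735P) |
| 7371[124][129] | 16 (32) | 8 × 2 | 3.1 | 3.8 | 8 MB | 72 MB | 2P | 200 | Nov 2018[130] | $1550 | 7371 |
| 7401(P)[124][125] | 24 (48) | 8 × 3 | 2.0 | 3.0 | 8 MB | 76 MB | 2P (1P) | 155/170 | Jun 2017[126] | $1850 ($1075) | 7401(740P) |
| 7451[124][125] | 24 (48) | 8 × 3 | 2.3 | 3.2 | 8 MB | 76 MB | 2P | 180 | Jun 2017[126] | $2400 | 7451 |
| 7501[124][125] | 32 (64) | 8 × 4 | 2.0 | 3.0 | 8 MB | 80 MB | 2P | 155/170 | Jun 2017[126] | $3400 | 7501 |
| 7551(P)[124][125] | 32 (64) | 8 × 4 | 2.0 | 3.0 | 8 MB | 80 MB | 2P (1P) | 180 | Jun 2017[126] | $3400 ($2100) | 7551(755P) |
| 7571[131][132] | 32 (64) | 8 × 4 | 2.2 | 3.0 | 8 MB | 80 MB | 2P | 200 | Nov 2018 | OEM/AWS | -- |
| 7601[124][125] | 32 (64) | 8 × 4 | 2.2 | 3.2 | 8 MB | 80 MB | 2P | 180 | Jun 2017[126] | $4200 | 7601 |

  1. ^ Core Complexes (CCX) × cores per CCX
  2. ^ Epyc Embedded 7001 series models have identical specifications as the respective Epyc 7001 series.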
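The "Total" cache figures in the table appear to be the sum of all L2 and L3 cache on the package. That reading is an inference from the numbers, not something the table states, and the function below is a hypothetical sketch of it. It uses the eight CCXs per package implied by the common features (four SoC dies with two CCXs each).

```python
CCX_PER_PACKAGE = 8     # four SoC dies × two CCXs per die, per the common features
L2_PER_CORE_KB = 512    # per the table


def total_cache_mb(cores: int, l3_per_ccx_mb: int) -> float:
    """Assumed meaning of 'Total': combined L2 plus combined L3, in MB."""
    l2_total_mb = cores * L2_PER_CORE_KB / 1024
    l3_total_mb = CCX_PER_PACKAGE * l3_per_ccx_mb
    return l2_total_mb + l3_total_mb


# (model, cores, L3 per CCX in MB) taken from the table.
for model, cores, l3 in [("7251", 8, 4), ("7261", 8, 8), ("7401", 24, 8), ("7601", 32, 8)]:
    print(model, total_cache_mb(cores, l3), "MB")
```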

Embedded server processors


In February 2018, AMD also announced the EPYC 3000 series of embedded Zen CPUs.[133]

Common features of EPYC Embedded 3000 series CPUs:

  • Socket: SP4 (31xx and 32xx models use SP4r2 package).
  • All the CPUs support ECC DDR4-2666 in dual-channel mode (3201 supports only DDR4-2133), while 33xx and 34xx models support quad-channel mode.
  • L1 cache: 96 KB (32 KB data + 64 KB instruction) per core.
  • L2 cache: 512 KB per core.
  • All the CPUs support 32 PCIe 3.0 lanes per CCD (max 64 lanes).
  • Fabrication process: GlobalFoundries 14 nm.
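| Model | Cores (threads) | Base (GHz) | All-core boost (GHz) | Max boost (GHz) | L3 cache (total) | TDP | Chiplets | Core config[i] | Release date |
|---|---|---|---|---|---|---|---|---|---|
| 3101[134] | 4 (4) | 2.1 | 2.9 | 2.9 | 8 MB | 35 W | 1 × CCD | 1 × 4 | Feb 2018 |
| 3151[134] | 4 (8) | 2.7 | 2.9 | 2.9 | 16 MB | 45 W | 1 × CCD | 2 × 2 | Feb 2018 |
| 3201[134] | 8 (8) | 1.5 | 3.1 | 3.1 | 16 MB | 30 W | 1 × CCD | 2 × 4 | Feb 2018 |
| 3251[134] | 8 (16) | 2.5 | 3.1 | 3.1 | 16 MB | 55 W | 1 × CCD | 2 × 4 | Feb 2018 |
| 3255[135] | 8 (16) | 2.5 | 3.1 | 3.1 | 16 MB | 25–55 W | 1 × CCD | 2 × 4 | Dec 2018 |
| 3301[134] | 12 (12) | 2.0 | 2.15 | 3.0 | 32 MB | 65 W | 2 × CCD | 4 × 3 | Feb 2018 |
| 3351[134] | 12 (24) | 1.9 | 2.75 | 3.0 | 32 MB | 60–80 W | 2 × CCD | 4 × 3 | Feb 2018 |
| 3401[134] | 16 (16) | 1.85 | 2.25 | 3.0 | 32 MB | 85 W | 2 × CCD | 4 × 4 | Feb 2018 |
| 3451[134] | 16 (32) | 2.15 | 2.45 | 3.0 | 32 MB | 80–100 W | 2 × CCD | 4 × 4 | Feb 2018 |

  1. ^ Core Complexes (CCX) × cores per CCX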

from Grokipedia
Zen (first generation) is the inaugural iteration of Advanced Micro Devices' (AMD) Zen microarchitecture family, a complete ground-up redesign of the company's x86 CPU core that was launched on March 2, 2017, with the debut of the Ryzen 1000-series desktop processors. Fabricated using GlobalFoundries' 14 nm FinFET process technology, it powers AMD's Ryzen consumer CPUs, Threadripper high-end desktop processors, and EPYC server chips, offering up to 8 cores and 16 threads in initial consumer models while enabling scalable multi-socket configurations in data centers. The architecture delivers a 40% improvement in instructions per clock (IPC) over the preceding Excavator microarchitecture, emphasizing balanced single-threaded performance, power efficiency, and multi-core scalability through features such as simultaneous multithreading (SMT).

This redesign shifted AMD away from the modular Bulldozer-era cores toward a more conventional, high-IPC engine with a 19-stage pipeline and support for AVX2 instructions. Key components include a 4-wide dual-pump decoder, a micro-op cache holding up to 2K ops, and six execution ports feeding four integer units and two 128-bit floating-point units per core. The cache subsystem features a 64 KB L1 instruction cache, a 32 KB L1 data cache, and a 512 KB private L2 cache per core, augmented by 8 MB of shared L3 cache per four-core complex (CCX) that functions as a victim cache to minimize latency. Zen also incorporates advanced branch prediction using a perceptron-based predictor that is more accurate than prior designs, alongside support for DDR4-2666 memory and PCIe 3.0 lanes, which collectively restored AMD's competitiveness against Intel's offerings in gaming and enterprise workloads.

History and development

Announcement and planning

In 2012, AMD initiated a major overhaul of its x86 architecture in response to the performance shortcomings of the Bulldozer family, which had failed to compete effectively with Intel's offerings due to its shared module design and lower instructions per clock efficiency. To lead this effort, AMD rehired veteran architect Jim Keller as Corporate Vice President and Chief Architect of Microprocessor Cores, tasking him with developing a clean-sheet next-generation core codenamed Zen that emphasized independent core designs, higher IPC, and broad applicability across consumer and server segments.

AMD first publicly announced Zen on May 6, 2015, during its Financial Analyst Day event, revealing it as a new x86 core aimed at delivering up to 40% IPC uplift over prior architectures like Excavator, with support for simultaneous multithreading and a new cache system. The announcement outlined initial product codenames, including Summit Ridge for high-end desktop processors and Naples for mainstream servers, both targeting compatibility with DDR4 memory and the AM4 socket ecosystem. In May 2016, at Computex, AMD provided the first live public demonstration of a Zen-based AM4 desktop processor, refining the roadmap and confirming high-volume production readiness for 2016 launches. Subsequent updates in August 2016, at an event held alongside the Intel Developer Forum, detailed the architecture's validated 40% IPC improvement over Excavator, exceeding initial targets through enhancements in branch prediction, execution units, and memory access.

The strategic planning for Zen centered on regaining market share from Intel, particularly by prioritizing server dominance with Naples' up to 32 cores and 64 threads for datacenter workloads, alongside consumer revitalization through Summit Ridge's 8-core configurations for gaming and content creation. This dual-focus approach aimed to restore AMD's competitiveness in both enterprise and client markets, where it had lost ground to Intel's Core and Xeon lines, by emphasizing balanced performance-per-watt and ecosystem openness via the AM4 platform.

To support these goals, AMD forged a key manufacturing partnership with GlobalFoundries, achieving silicon validation on the 14 nm FinFET (14LPP) process in November 2015, which enabled taped-out Zen prototypes and paved the way for efficient, high-volume production starting in 2016.

Engineering milestones

The development of the first-generation Zen microarchitecture began with the recruitment of key engineering talent, notably Jim Keller, who rejoined AMD in 2012 as chief architect of microprocessor cores to lead the Zen core design effort. Keller, previously instrumental in AMD's K8 architecture and Apple's A-series processors, emphasized a modular approach to processor design, drawing inspiration from tiled and multi-chip module concepts to enable scalability and yield improvements, though the initial Zen implementation remained a monolithic die. This strategic hire marked a pivotal shift from prior architectures like Bulldozer, focusing on high instructions per clock (IPC) gains through a clean-slate design. Keller departed AMD in September 2015, but the Zen project continued to meet its milestones under new leadership.

A major engineering milestone occurred with the prototype tape-out in late 2015, when AMD successfully fabricated initial silicon using GlobalFoundries' 14 nm FinFET process. Validation of these prototypes confirmed stable operation, with early tests achieving clock speeds up to 4 GHz under controlled conditions, meeting internal performance targets for the architecture's debut. This tape-out was a critical validation step, demonstrating the feasibility of the 14 nm node for high-performance x86 cores after years of process co-optimization with GlobalFoundries.

In 2016, further silicon validation of Zen prototypes confirmed a 40% IPC uplift over the preceding Excavator cores through enhanced branch prediction and wider dispatch capabilities. A key integration milestone was the incorporation of simultaneous multithreading (SMT), enabling two threads per core to improve throughput on parallel workloads without compromising single-threaded efficiency. These validations confirmed the core's readiness for production, paving the way for the Summit Ridge CPUs.

Engineering teams overcame notable challenges in branch prediction, implementing a perceptron-based predictor that improved accuracy on complex control flows while keeping misprediction penalties to around 15-19 cycles, a marked improvement over prior designs. Finalization of the cache hierarchy also addressed latency concerns, settling on a per-core configuration of 32 KB L1 data cache, 64 KB L1 instruction cache, and 512 KB unified L2 cache to balance hit rates and power efficiency in a monolithic layout.

Launch and initial reception

The first-generation Zen-based processors marked AMD's return to competitive high-performance computing with the launch of its consumer Ryzen lineup on March 2, 2017. The initial offerings included the Summit Ridge family, featuring eight-core models like the Ryzen 7 1800X, 1700X, and 1700, aimed at desktop enthusiasts and creators. Later that year, AMD expanded to the server market with the EPYC (Naples) processors, officially launched on June 20, 2017, following an announcement of the release date at Computex in late May. These 7000-series EPYC chips targeted data center workloads with up to 32 cores per socket, positioning AMD against Intel's Xeon dominance.

Initial availability focused on the flagship Ryzen 7 1800X, priced at $499, which provided eight cores and 16 threads at a base clock of 3.6 GHz and a boost clock of up to 4.0 GHz. This model significantly outperformed Intel's Broadwell-E Core i7-6900K in multi-threaded tasks such as video encoding, where it achieved over 50% faster completion times in benchmarks on a 4K source file. AnandTech's testing similarly highlighted substantial multi-core gains, with the 1800X delivering up to 52% better rendering performance than the i7-6900K, underscoring Zen's strength in parallel workloads.

Reception was largely positive, with reviewers praising Ryzen's value and multi-core performance, which rivaled or exceeded Intel's high-end offerings at half the price. Some publications gave the 1800X an Editors' Choice award for its productivity performance, describing it as a game-changer for content creators. However, criticisms emerged regarding initial instability on AM4 motherboards, including erratic Precision Boost behavior and elevated reported temperatures under load, which AMD addressed through rapid firmware updates in the weeks following launch. Single-threaded performance also lagged behind Intel's contemporary chips by about 5-10%, affecting lightly threaded applications and some games, though it still matched or exceeded the older Broadwell-E in IPC.

The launches drove immediate market momentum for AMD and strengthened investor confidence in Zen's viability. In the server segment, EPYC saw quick adoption by major OEMs, including Dell EMC, which introduced EPYC-based servers, contributing to AMD's server CPU market share climbing from near zero to over 2% by year-end and signaling a shift away from Intel exclusivity in enterprise deployments.

Design and architecture

Core microarchitecture

The Zen core features an out-of-order execution engine that decodes up to four x86 instructions per cycle into micro-operations, with a dispatch width of six micro-operations per cycle to the execution units. This design provides a balanced allocation for integer and floating-point workloads, including four arithmetic logic units (ALUs) and two address generation units (AGUs) for memory operations, alongside four floating-point execution pipes (two for addition and two for multiplication).

Branch prediction in the Zen core uses a perceptron-based predictor augmented with a loop predictor and an indirect target array, storing two branches per branch target buffer (BTB) entry and supporting high accuracy through large L1 and L2 BTBs along with a 32-entry return stack buffer. This approach significantly improves prediction accuracy over predecessor architectures like Excavator, contributing to the overall 52% increase in instructions per clock (IPC) when including simultaneous multithreading (SMT).

The core layout organizes four cores into a Core Complex (CCX) that shares an 8 MB, 16-way associative L3 cache, which is mostly exclusive with respect to the L2 caches. This CCX structure is part of a modular design, where multiple CCXs can be interconnected via Infinity Fabric to scale up to eight CCXs for higher core counts in multi-chip modules.

SMT is implemented as full 2-way multithreading per core, with competitive sharing of resources such as caches, decode units, schedulers, and execution pipelines between threads. Front-end queues operate in a round-robin fashion with priority overrides for the higher-priority thread, allowing full resource utilization in single-threaded mode while enabling effective scaling to 16 threads across eight cores.

Manufacturing and fabrication

The first-generation Zen processors were manufactured using GlobalFoundries' 14 nm FinFET process technology, specifically the 14LPP (low-power plus) variant, which provided a significant density and efficiency improvement over AMD's prior 28 nm nodes. This process enabled the fabrication of the monolithic 8-core "Zeppelin" die, which integrates 4.8 billion transistors across an area of approximately 213 mm². The partnership with GlobalFoundries, announced in 2015, marked a key milestone in bringing Zen to production, with silicon validation achieved ahead of the 2017 launch.

For high-core-count variants such as EPYC server processors and Threadripper high-end desktop CPUs, AMD adopted a multi-chip module (MCM) design to enhance scalability and manufacturing yields. Rather than fabricating one large die, this approach places several Zeppelin dies in a single package, each containing two four-core CCXs together with its own memory controllers, PCIe lanes, and system interfaces, interconnected via AMD's Infinity Fabric protocol for high-bandwidth, low-latency communication. The strategy allowed for better yield management: a defect affects only one small die rather than an entire large die, which was particularly beneficial for scaling to 16, 24, or 32 cores in EPYC configurations. Early production faced yield challenges on the 14 nm node, contributing to launch delays, but these were largely resolved by mid-2017 through process optimizations at GlobalFoundries, paving the way for the release of 16-core Threadripper and EPYC models.

In 2017, AMD introduced minor silicon revisions to the Zeppelin design, focusing on optimizations that supported higher clock speeds without altering the core process node; a full node transition to 12 nm occurred only with the subsequent Zen+ refresh. These fabrication choices contributed to improved power efficiency in multi-core setups, though detailed thermal impacts are addressed elsewhere.

Pipeline and execution units

The Zen core employs a 19-stage integer pipeline designed to deliver high instructions per clock (IPC) through balanced throughput across stages. Its main phases include fetch (four stages), decode (four stages), dispatch (one stage), execute (four stages), and retire (one stage). This structure allows the front end to fetch and decode up to four x86 instructions per cycle, while the dispatch stage allocates up to six micro-operations (μops) to execution resources, enabling efficient out-of-order execution.

The execution units in the Zen core include four integer arithmetic logic units (ALUs) for arithmetic and logical operations, alongside two address generation units (AGUs) that compute addresses for loads and stores. The floating-point (FP) execution resources consist of four 128-bit pipes capable of AVX2 vector operations; each pipe processes 128-bit data internally, so a 256-bit operation is split into two 128-bit halves. The load/store unit complements these by sustaining two 128-bit loads and one 128-bit store per cycle, contributing to the core's sustained memory throughput.

To reduce front-end bottlenecks, Zen incorporates a 2K-entry micro-op (μop) cache that stores decoded instructions, bypassing the decode stages for frequently executed code paths such as loops. This cache can deliver up to 6.75 μops per cycle on average, with a peak bandwidth approaching seven μops, significantly improving IPC in compute-bound workloads by minimizing decode latency.

Key latencies in the pipeline include branch resolution occurring in eight cycles from fetch, allowing for relatively quick recovery from mispredictions compared to prior AMD architectures. Floating-point addition has a latency of three cycles for register-to-register operations like ADDSS or ADDPS, while multiplication takes four cycles for instructions such as MULSS or MULPS. These timings reflect the pipeline's optimization for balanced scalar and vector performance without excessive depth in FP execution.
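The effect of executing 256-bit instructions on 128-bit pipes can be illustrated with a small calculation. The following is a minimal sketch, not a model of the real scheduler: the function names are hypothetical, and it assumes that every vector instruction is simply cut into 128-bit pieces and that the two addition pipes mentioned above are the only pipes eligible for the operation.

```python
import math

PIPE_WIDTH_BITS = 128  # internal width of each FP pipe, per the text above


def uops_per_vector_op(vector_bits: int, pipe_bits: int = PIPE_WIDTH_BITS) -> int:
    """Number of pipe-width pieces one vector instruction is split into."""
    return max(1, math.ceil(vector_bits / pipe_bits))


def vector_ops_per_cycle(vector_bits: int, pipes: int) -> float:
    """Peak instructions per cycle when `pipes` pipes can execute the operation."""
    return pipes / uops_per_vector_op(vector_bits)


if __name__ == "__main__":
    # Assumption for illustration: two pipes are eligible (e.g. the two add pipes).
    for width in (128, 256):
        print(width, uops_per_vector_op(width), vector_ops_per_cycle(width, pipes=2))
```

Under these assumptions, a 256-bit operation occupies twice as many pipe slots as a 128-bit one, so the peak instruction rate for 256-bit code is half that of 128-bit code on the same pipes.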

Improvements over predecessors

Performance gains

The Zen microarchitecture delivered a substantial 52% average increase in instructions per clock (IPC) compared to the preceding Excavator architecture, as measured across SPEC CPU2006 integer (SPECint) and floating-point (SPECfp) benchmarks. This uplift stemmed from key enhancements, including a wider out-of-order execution engine with 10 execution ports (up from four in Excavator), enabling greater instruction-level parallelism, and an advanced branch predictor that achieved over 90% accuracy in typical workloads, minimizing pipeline disruptions. In SPECint_base2006, single-socket configurations saw a 64% gain at 3.4 GHz, reflecting Zen's improvements in integer workloads.

In practical benchmarks, these IPC gains translated to significant throughput improvements. For instance, the 8-core Ryzen 7 1800X achieved a Cinebench R15 multi-threaded score of 1624 points, approximately 2.4 times the 665 points of the 8-core FX-8350, demonstrating Zen's superior multi-threaded scaling in rendering workloads. Single-threaded performance also advanced, with Zen scoring 58% higher than Excavator at an identical 3.4 GHz clock in Cinebench R15, and 76% ahead of the Piledriver-based FX-8350 in similar tests. Zen particularly excelled in integer-heavy tasks such as code compilation, where workloads like GCC in SPECint ran up to 60% faster thanks to enhanced integer execution resources and larger, faster caches.

Multi-core scaling remained strong up to 16 cores, enabled by the NUMA-aware Infinity Fabric interconnect running at approximately 10.6 GT/s, which limited inter-core latency in multi-die configurations like the first-generation Threadripper processors. In multi-threaded applications such as Cinebench R15, Zen scaled nearly linearly from 8 to 16 cores, with the 16-core Threadripper 1950X delivering approximately double the score of its 8-core counterpart without significant bandwidth bottlenecks. However, Zen initially underperformed in AVX-heavy floating-point tasks, as its two 128-bit FMA units required splitting 256-bit AVX2 instructions, leading to lower throughput compared to contemporary architectures with native 256-bit support.
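The ratios quoted in this section follow from simple arithmetic. As a rough sketch (the helper names are hypothetical, and the model treats performance as proportional to IPC multiplied by clock frequency, which ignores memory and scaling effects), the figures can be reproduced as follows:

```python
def speedup(new_score: float, old_score: float) -> float:
    """Ratio of two benchmark scores (higher is better)."""
    return new_score / old_score


def relative_performance(ipc_ratio: float, clock_ratio: float = 1.0) -> float:
    """First-order model: performance scales with IPC times clock frequency."""
    return ipc_ratio * clock_ratio


if __name__ == "__main__":
    # Cinebench R15 multi-threaded scores quoted above.
    print(round(speedup(1624, 665), 2))               # 1800X vs FX-8350
    # A 52% IPC uplift at identical clocks.
    print(round(relative_performance(1.52, 1.0), 2))
```

The first-order model is only a sanity check: at equal clocks a 52% IPC uplift gives 1.52 times the per-core performance, while the larger 2.4x multi-threaded gap also reflects SMT and differences in core design.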

Power and thermal efficiency

The first-generation Zen microarchitecture marked a significant advancement in power efficiency for AMD processors, primarily through its adoption of GlobalFoundries' 14 nm FinFET process node, a substantial shrink from the 28 nm bulk CMOS process used by its immediate predecessors. This transition alone contributed to approximately 70% better performance per watt in key workloads, enabling higher clock speeds and core counts without proportional increases in power draw. Desktop implementations of Zen, such as the Ryzen 1000 series, operated within a TDP envelope of 65 W to 95 W for mainstream models like the Ryzen 5 1600 and Ryzen 7 1800X, while high-end Threadripper variants extended to 180 W to support up to 16 cores. In multi-threaded scenarios, Zen delivered roughly 1.5 times the performance per watt of the Excavator cores in the prior generation, driven by a 52% uplift in instructions per clock (IPC) alongside refined power-management techniques.

Idle power consumption was notably low for the era, with engineering samples of 8-core Zen dies idling at around 5 W package power, reflecting effective power gating and low-leakage transistor designs that minimized static power dissipation even at scale. This efficiency extended to boost scenarios, where an 8-core Zen processor could sustain 4 GHz all-core operation at approximately 88 W, facilitated by Precision Boost technology's dynamic voltage and frequency scaling (DVFS), which adjusted supply voltage in real time based on operating conditions. Thermal management in Zen also benefited from configurable TDP (cTDP) options, particularly in mobile variants, which let system designers balance sustained performance against heat output in thin-and-light designs.

The introduction of Infinity Fabric further optimized power usage by providing a scalable, low-latency interconnect for inter-die communication in multi-die configurations, reducing energy overhead compared with traditional buses and contributing to overall system-level efficiency gains of 20-30% over 28 nm predecessors in integrated scenarios. These features collectively positioned Zen as a competitive alternative to contemporary Intel architectures in terms of thermal headroom and energy proportionality.

Memory subsystem advancements

The first-generation Zen architecture marked a significant upgrade in memory support by integrating a dual-channel DDR4 memory controller capable of operating at speeds up to 2666 MT/s. This configuration provided a peak bandwidth of approximately 42.7 GB/s, roughly 25% more than the 34 GB/s or so delivered by dual-channel DDR3-2133 on earlier AMD platforms. The shift to DDR4 enabled higher capacity configurations, with support for up to 128 GB on desktop platforms and greater scalability in server implementations.

Zen employed a multi-level cache hierarchy designed for balanced performance and efficiency, with a 32 KB 8-way set-associative L1 data cache per core offering low-latency access to frequently used data. Each core also featured a private 512 KB 8-way L2 cache that is inclusive of the L1. At the complex level, an 8 MB L3 cache shared by the four cores of a core complex (CCX) used 16-way set associativity and functioned primarily as a victim cache, holding lines evicted from the L2 caches while remaining mostly exclusive with respect to them. This design doubled the L1 and L2 bandwidth and quintupled the L3 bandwidth compared to prior AMD cores.

Access latencies were optimized for the hierarchy, with the L1 data cache achieving approximately 4 cycles for hits. L3 hits within the same CCX took roughly 35-40 cycles, aided by the victim cache mechanism, which improved hit rates by retaining useful evicted data. Complementary improvements included enhanced hardware prefetchers for the L1 and L2 caches, which anticipated data streams and boosted overall bandwidth utilization by about 20%, reducing stalls in memory-intensive workloads.

In server-oriented variants, such as those powering the EPYC processor family, Zen incorporated error-correcting code (ECC) support for DDR4 memory, providing single-error correction and double-error detection to enhance data integrity and reliability in mission-critical environments. This feature was not officially validated on consumer desktop platforms but aligned with enterprise demands for robust memory subsystems.
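The bandwidth figures in this section can be checked with the usual peak-bandwidth formula: channels multiplied by bus width in bytes multiplied by transfer rate. The sketch below assumes a standard 64-bit (8-byte) data bus per channel; the function name is illustrative only.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float,
                       bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    Assumes each channel has a 64-bit (8-byte) data bus, ECC bits excluded.
    """
    return channels * bus_bytes * mega_transfers_per_s / 1000.0


if __name__ == "__main__":
    ddr4 = peak_bandwidth_gbs(2, 2666)   # dual-channel DDR4-2666
    ddr3 = peak_bandwidth_gbs(2, 2133)   # dual-channel DDR3-2133
    print(f"DDR4-2666 x2: {ddr4:.1f} GB/s")
    print(f"DDR3-2133 x2: {ddr3:.1f} GB/s")
    print(f"uplift: {100 * (ddr4 / ddr3 - 1):.0f}%")
```

The same function applies to the quad-channel Threadripper and eight-channel EPYC platforms by changing the channel count. These are theoretical peaks; sustained bandwidth in practice is lower.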

Security and virtualization enhancements

The first-generation Zen microarchitecture introduced significant advancements in security and virtualization, particularly through hardware-based memory encryption and enhanced virtualization support tailored for server and multi-tenant environments. A key feature is Secure Memory Encryption (SME), which provides system-wide memory protection by encrypting data in DRAM using a single AES-128 key generated randomly by the integrated AMD Secure Processor during boot. This encryption occurs transparently in the memory controller, defending against physical attacks such as cold boot or memory scraping without requiring application changes, and is enabled via BIOS settings.

Building on SME, Secure Encrypted Virtualization (SEV) extends protection to virtualized workloads by assigning a unique ephemeral encryption key to each virtual machine (VM), isolating it from the hypervisor and other VMs even if the host system is compromised. SEV allows guests to mark specific memory pages for encryption, with key management handled by the AMD Secure Processor, thereby enabling confidential computing in cloud scenarios.

Zen incorporates AMD-V (Secure Virtual Machine) technology for hardware-assisted virtualization, including nested paging via Rapid Virtualization Indexing (RVI), which accelerates guest-to-host address translations by walking guest and nested page tables in hardware, reducing overhead compared to software-emulated paging. This setup supports up to 255 concurrent VMs through an 8-bit Address Space Identifier (ASID) mechanism, allowing efficient tagging of translation contexts without frequent flushes. Additionally, Zen implements the FSGSBASE instruction set extensions, which enable efficient user-mode access to the FS and GS segment bases for thread-local storage, reducing privilege-level transitions and improving performance in multi-threaded virtualized applications.

Following the disclosure of the Spectre and Meltdown vulnerabilities in early 2018, AMD delivered post-launch firmware updates (microcode revisions distributed through AGESA releases) for Zen processors to implement initial mitigations, including indirect branch restricted speculation controls, to curb speculative-execution attacks, though full protection relies on coordinated operating-system and hypervisor patches. These enhancements collectively improve virtualization efficiency; notably, Zen's expanded translation lookaside buffers (TLBs), with a 64-entry L1 data TLB and a 1,536-entry L2 data TLB, enable approximately 2x faster VM context switches relative to predecessors by reducing TLB misses and page walk latency during guest switches.

Key features

Multi-threading and core scaling

The Zen microarchitecture employs simultaneous multithreading (SMT) to support two hardware threads per core, allowing each core to process instructions from both threads concurrently while maintaining fair scheduling to balance resource allocation and prevent thread starvation. This approach enhances throughput in workloads with thread-level parallelism by better utilizing execution units during stalls such as cache misses, resulting in an approximate 1.3x to 1.5x speedup for threaded applications compared to single-threaded operation on the same core.

To minimize latency in multi-core environments, Zen groups four cores into a Core Complex (CCX), where they share an 8 MB victim L3 cache configured as a 16-way associative structure, providing uniform low-latency access (around 35-40 cycles) for intra-CCX data sharing and coherence. CCXs are linked via AMD's Infinity Fabric interconnect, a scalable network that enables communication between complexes with low latencies, supporting efficient data transfer without significant bottlenecks in balanced workloads.

In multi-socket and high-core-count configurations, Zen scales to up to 16 cores in first-generation Threadripper processors through a multi-die design comprising two active dies, each hosting two CCXs for a total of 16 MB L3 per die. For systems exceeding eight cores, non-uniform memory access (NUMA) domains are used to partition memory controllers and optimize locality, reducing remote access penalties in NUMA-aware software while preserving overall scalability. This structure delivers performance that grows nearly linearly with core count in multi-threaded rendering tasks, thanks to effective load balancing and fabric bandwidth.
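The scaling rules described above (four cores per CCX, two CCXs per die, two threads per core, 8 MB of L3 per CCX) can be turned into a small calculator. This is an illustrative sketch with hypothetical names; it assumes fully enabled dies and does not model parts with disabled cores or reduced cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZenPackage:
    """Topology of a first-generation Zen package, assuming fully enabled dies."""
    dies: int
    ccx_per_die: int = 2
    cores_per_ccx: int = 4
    threads_per_core: int = 2   # SMT
    l3_mb_per_ccx: int = 8

    @property
    def cores(self) -> int:
        return self.dies * self.ccx_per_die * self.cores_per_ccx

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def l3_mb(self) -> int:
        return self.dies * self.ccx_per_die * self.l3_mb_per_ccx


if __name__ == "__main__":
    for name, dies in (("8-core desktop", 1), ("16-core Threadripper", 2),
                       ("32-core server", 4)):
        p = ZenPackage(dies=dies)
        print(f"{name}: {p.cores} cores, {p.threads} threads, {p.l3_mb} MB L3")
```

With one, two, and four active dies this gives 8, 16, and 32 cores respectively, matching the desktop, Threadripper, and server core counts cited in this article.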

Integrated components in APUs

The first-generation Zen APUs, codenamed Raven Ridge, integrate Radeon Vega graphics processing units (GPUs) directly on the die, marking a significant advancement in unified system-on-chip design. These GPUs consist of 8 to 11 compute units (CUs), each equipped with 64 stream processors, operating at clock speeds up to about 1.3 GHz and delivering peak theoretical performance of around 1.76 TFLOPS in higher-end configurations. The architecture supports DirectX 12, asynchronous compute for parallel task execution, and features like the high-bandwidth cache controller (HBCC) for improved memory efficiency in graphics workloads. This integration enables entry-level gaming without a discrete GPU, targeting budget-conscious systems.

I/O capabilities in Raven Ridge are handled by on-die controllers, which include support for USB 3.1 Gen1 ports and HDMI 2.0b outputs, allowing connectivity for peripherals and displays up to 4K at 60 Hz. The GPU shares system DDR4 memory, with BIOS-configurable allocation of up to 2 GB dedicated to graphics, while maintaining compatibility with dual-channel DDR4-2933 configurations. This shared-memory model serves both CPU and GPU tasks from the same bandwidth, so it relies on fast system RAM to mitigate bottlenecks.

Mobile variants of Raven Ridge are designed for a 15 W thermal design power (TDP) envelope, allowing dynamic power balancing between the Zen CPU cores and the Vega GPU to prioritize either compute or graphics demands based on workload. For instance, the desktop-oriented Ryzen 5 2400G employs 11 CUs within a 65 W TDP, while mobile implementations like the Ryzen 7 2700U scale down to 10 CUs with the same architecture. These APUs provide roughly twice the graphics performance of the prior Bristol Ridge generation's Radeon R7 iGPUs, enabling playable frame rates in modern titles and reducing reliance on external cards, thereby freeing PCIe lanes for storage or other expansions.
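The 1.76 TFLOPS figure can be reproduced from the compute-unit count. The sketch below rests on two assumptions that are not stated in the text above: each stream processor retires one fused multiply-add (counted as 2 FLOPs) per clock, and the 11-CU part runs its GPU at 1.25 GHz. The function name is illustrative only.

```python
def peak_fp32_tflops(compute_units: int, clock_ghz: float,
                     sp_per_cu: int = 64, flops_per_sp_per_clock: int = 2) -> float:
    """Theoretical peak single-precision throughput in TFLOPS.

    Assumes one fused multiply-add (2 FLOPs) per stream processor per clock.
    """
    gflops = compute_units * sp_per_cu * flops_per_sp_per_clock * clock_ghz
    return gflops / 1000.0


if __name__ == "__main__":
    # 11 CUs at an assumed 1.25 GHz GPU clock.
    print(f"{peak_fp32_tflops(11, 1.25):.2f} TFLOPS")
```

This is a theoretical ceiling; real graphics workloads on an APU are usually limited by shared DDR4 bandwidth well before they reach it.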

Instruction set extensions

The first-generation Zen microarchitecture, implemented in AMD Family 17h processors (models 00h-0Fh), builds upon the baseline x86-64 instruction set architecture (ISA), providing full compatibility with the AMD64 standard (which Intel implements as Intel 64, formerly EM64T). This includes support for long mode (LM), physical address extensions (PAE), page size extensions (PSE), and a 48-bit virtual and physical address space, enabling up to 256 terabytes of addressable memory. Essential foundational features such as the x87 floating-point unit (FPU), MMX, and extensions like MTRR and PAT are fully implemented, ensuring backward compatibility with prior AMD and x86 architectures.

Zen cores support Streaming SIMD Extensions (SSE) up to SSE4.2, including SSE4A (an AMD-specific variant), along with Supplemental SSE3 (SSSE3). For vector processing, the architecture incorporates Advanced Vector Extensions (AVX) and AVX2 with 256-bit wide operations, enabling efficient handling of floating-point and integer workloads in applications like multimedia and scientific computing. Fused Multiply-Add (FMA3) instructions are supported alongside AVX2, allowing three-operand fused operations for improved precision and throughput in numerical computations. Notably, AVX-512 (512-bit vectors) is not supported in this generation, limiting peak vector throughput compared to later architectures. Additionally, half-precision floating-point conversion (F16C) enhances compatibility with reduced-precision formats.

Other extensions in Zen include Bit Manipulation Instructions 2 (BMI2) for advanced bit-level operations such as parallel bit deposit/extract (PDEP/PEXT) and flagless shifts and rotates, which benefit algorithms in areas such as compression. The Secure Hash Algorithm (SHA) extensions provide hardware acceleration for SHA-1 and SHA-256 hashing, reducing software overhead in cryptographic tasks like digital signatures and verification. These SHA instructions cover a targeted set of common hash functions, complementing broader security features.

For virtualization, Zen implements Secure Virtual Machine (SVM), AMD's virtualization technology, with the Advanced Virtual Interrupt Controller (AVIC) extension, which accelerates virtual interrupt delivery and reduces hypervisor involvement in interrupt handling. This enhances performance in virtualized environments such as server consolidation. SVM support is reported via specific CPUID leaves and enabled through MSRs.
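On a Linux system, the extensions listed in this section can be checked against the `flags` line of `/proc/cpuinfo`. The following is a minimal sketch that assumes the flag spellings used by the Linux kernel (for example `sha_ni` for the SHA extensions and `svm` for AMD-V); the set of expected flags and the helper names are chosen here for illustration and are not an official list.

```python
import os

# Linux /proc/cpuinfo flag names for extensions described above (illustrative set).
ZEN1_EXPECTED = {"sse4_2", "sse4a", "ssse3", "avx", "avx2", "fma",
                 "f16c", "bmi2", "sha_ni", "svm"}


def parse_flags(cpuinfo_text: str) -> set:
    """Return the feature flags of the first CPU listed in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(flags: set, expected: set = ZEN1_EXPECTED) -> set:
    """Expected features that are absent from the reported flags."""
    return set(expected) - set(flags)


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path, encoding="utf-8", errors="replace") as fh:
            flags = parse_flags(fh.read())
        print("missing:", sorted(missing_features(flags)) or "none")
        print("avx512f present:", "avx512f" in flags)
```

On a first-generation Zen part, the `avx512f` flag would be expected to be absent, consistent with the lack of AVX-512 support noted above.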

Product families

Desktop processors

The first-generation Zen-based desktop processors were introduced under the Summit Ridge codename as the Ryzen 1000 series, marking AMD's return to the high-performance desktop market with a monolithic die design fabricated on a 14 nm process. The lineup included models ranging from quad-core to octa-core configurations, with simultaneous multithreading (SMT) enabled on Ryzen 5 and Ryzen 7 models to double thread counts. For instance, the flagship Ryzen 7 1800X featured 8 cores and 16 threads, a base clock of 3.6 GHz, a boost clock up to 4.0 GHz, a 95 W TDP, and a launch price of $499. These processors supported dual-channel DDR4-2666, and eight-core models carried 16 MB of L3 cache in total (8 MB per CCX).

Desktop APU variants were introduced in the Ryzen 2000G series under the Raven Ridge codename, launched in February 2018, integrating Zen CPU cores with Radeon Vega graphics on a single die for systems without discrete GPUs. These quad-core models supported dual-channel DDR4-2933 memory and the AM4 socket. The Ryzen 5 2400G offered 4 cores and 8 threads, a base clock of 3.6 GHz boosting to 3.9 GHz, a 65 W TDP, and Vega 11 graphics with 11 compute units. The entry-level Ryzen 3 2200G provided 4 cores and 4 threads (SMT disabled), a 3.5 GHz base and 3.7 GHz boost clock, also at 65 W, with Vega 8 graphics (8 compute units). The two models targeted budget gaming at launch prices of $169 and $99, respectively.

A minor revision known as Pinnacle Ridge followed in 2018, refreshing the architecture with optimizations on a 12 nm process while retaining the core design and feature set of Summit Ridge; this became the Ryzen 2000 series. The series maintained compatibility with existing AM4 motherboards via BIOS updates and focused on higher clock speeds for improved single-threaded performance. The Ryzen 7 2700X, for example, offered the same 8 cores and 16 threads but with a base clock of 3.7 GHz and a boost of up to 4.3 GHz, at a 105 W TDP and a $329 launch price. All Summit Ridge and Pinnacle Ridge desktop processors used the AM4 socket, providing 24 PCIe 3.0 lanes from the CPU (typically configured as x20 for graphics and storage plus x4 reserved for the chipset), and had no integrated GPU, so they were paired with discrete graphics cards.

High-end variants in the first-generation lineup included the Threadripper 1000 series under the Whitehaven codename, targeted at enthusiast and workstation users with up to 16 cores on the TR4 socket using a multi-die configuration. The top model, the Threadripper 1950X, delivered 16 cores and 32 threads, supporting quad-channel DDR4-2666 memory and 64 PCIe 3.0 lanes, with a 180 W TDP.

Mobile and ultra-mobile APUs

The first-generation Zen mobile APUs, introduced under the Ryzen brand, targeted laptops and ultrathin devices with a focus on balancing performance, power efficiency, and integrated graphics for consumer applications. These processors integrated Zen CPU cores with Radeon Vega graphics on a 14 nm process node, supporting dual-channel DDR4-2400 memory and PCIe 3.0 interfaces, and were deployed in soldered BGA packages on mobile platforms that often included USB-C connectivity for peripherals and charging.

Raven Ridge formed the core of the initial mobile APU lineup in the Ryzen 2000U series, launched in late 2017 for 15 W designs. These quad-core configurations delivered competitive multi-threaded performance against contemporary Intel offerings, with simultaneous multithreading (SMT) enabling up to eight threads. For instance, the Ryzen 5 2500U featured four cores clocked from a 2.0 GHz base to a 3.6 GHz boost, paired with a Vega 8 GPU offering eight compute units for light gaming. Similarly, the higher-end Ryzen 7 2700U boosted to 3.8 GHz with a Vega 10 GPU (ten compute units), providing enhanced graphical capabilities within the same 15 W TDP envelope. These APUs emphasized thermal efficiency for fanless or low-noise chassis, with 4 MB of shared L3 cache supporting efficient core-to-core communication.

A minor extension appeared in the Ryzen 3000U series under the Picasso codename, which refined the design on a 12 nm process for slightly better efficiency, though still rooted in Zen design principles. These 15 W parts came in dual- or quad-core options with improved graphics, such as Vega 11 in higher SKUs, targeting mainstream ultrathins, with up to 25 W configurable TDP in some implementations. The Ryzen 3 3200U, for example, used two cores (four threads) at a 2.6 GHz base and 3.5 GHz boost, integrated with Vega 3 graphics for basic tasks, while quad-core models like the Ryzen 5 3500U reached a 3.7 GHz boost with Vega 8. This lineup prioritized battery life and integration with display outputs, including support for external monitors.

For ultra-mobile segments like thin-and-light convertibles and tablets, AMD developed low-power variants: the Dali family, derived from Picasso designs for entry-level efficiency, and Pollock, based on Raven Ridge for sub-10 W operation. Dali processors, such as the Ryzen 3 3250U, offered two Zen cores with SMT (up to four threads) at a 2.6 GHz base and 3.5 GHz boost, alongside Vega 3 graphics in a 15 W package, enabling all-day usage in compact form factors. Pollock targeted even lower TDPs of around 6 W, as seen in the 3015e with two cores and four threads running from 1.2 GHz to a 2.3 GHz boost with basic graphics, suiting passively cooled ultra-thins with minimal thermal overhead. Both families retained DDR4-2400 support and focused on low idle power for extended standby, distinguishing them from higher-wattage mobile counterparts.
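| APU model | Cores/Threads | Base/Boost clock (GHz) | TDP (W) | iGPU |
|---|---|---|---|---|
| Ryzen 5 2500U (Raven Ridge) | 4/8 | 2.0/3.6 | 15 | Vega 8 |
| Ryzen 7 2700U (Raven Ridge) | 4/8 | 2.2/3.8 | 15 | Vega 10 |
| Ryzen 3 3200U (Picasso) | 2/4 | 2.6/3.5 | 15 | Vega 3 |
| Ryzen 3 3250U (Dali) | 2/4 | 2.6/3.5 | 15 | Vega 3 |
| 3015e (Pollock) | 2/4 | 1.2/2.3 | 6 | Vega 3 |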
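
As a rough illustration of what the dual-channel DDR4-2400 memory support mentioned for these mobile APUs means in practice, the theoretical peak bandwidth can be estimated from the channel count and transfer rate. The sketch below assumes the standard 64-bit DDR4 channel width; the function name `peak_memory_bandwidth_gbs` is hypothetical and chosen only for this example. The result is an upper bound, and sustained bandwidth in real systems is lower.

```python
def peak_memory_bandwidth_gbs(channels: int,
                              transfer_rate_mts: int,
                              bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    channels          -- number of independent memory channels
    transfer_rate_mts -- transfers per second in millions (2400 for DDR4-2400)
    bus_width_bits    -- data width of one channel (assumed 64 bits for DDR4)
    """
    bytes_per_transfer = bus_width_bits / 8
    return channels * transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Dual-channel DDR4-2400, as listed for the mobile APUs.
    print(f"{peak_memory_bandwidth_gbs(2, 2400):.1f} GB/s")  # 38.4 GB/s
```

Because the integrated GPU shares this bandwidth with the CPU cores, a single-channel configuration (half the figure above) tends to limit graphics performance on such APUs.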

Embedded and server processors

The first-generation Zen architecture powered AMD's entry into the server market through the EPYC 7001 series, codenamed Naples, which launched in June 2017. These processors featured a highly scalable system-on-chip (SoC) design supporting up to 32 cores and 64 threads via simultaneous multithreading (SMT), using the SP3 socket with a 4094-pin LGA configuration. Thermal design power (TDP) reached up to 225 W across models, with base clock frequencies ranging from 2.0 GHz to 2.5 GHz depending on core count and configuration.

Key to their data center suitability, EPYC 7001 processors integrated eight channels of DDR4 memory support, enabling up to 2 TB of capacity per socket with error-correcting code (ECC) for reliability in mission-critical environments. They also provided 128 lanes of PCIe 3.0 for expansive I/O connectivity, eliminating the need for a separate chipset in single-socket configurations. Security features included Secure Memory Encryption (SME) for protecting system memory and Secure Encrypted Virtualization (SEV) to safeguard guest data from hypervisor or host OS access, establishing a foundation for encrypted workloads.

A representative high-end model, the EPYC 7551, offered 32 cores at a 2.0 GHz base frequency (boosting to 3.0 GHz), 64 MB of L3 cache, and a 180 W TDP, with a launch price of $4,609, targeting high-density enterprise servers. These processors emphasized scalability for two-socket systems and did not overlap with consumer-focused designs.

For embedded applications, Zen 1 appeared in the Ryzen Embedded V1000 series, based on the Raven Ridge die and targeted at industrial appliances, networking equipment, and automation systems. These APUs supported up to 4 cores and 8 threads, with configurable TDPs from 12 W to 54 W to suit low-power, always-on scenarios, and integrated Radeon Vega graphics for multimedia processing at up to 3.6 TFLOPS.

Complementing this, the Ryzen Embedded R1000 series extended Zen 1 to compact, graphics-capable embedded roles in thin clients and control systems, featuring dual-core, four-thread configurations at base frequencies of around 1.2–1.5 GHz and TDPs of 12–25 W. Both V1000 and R1000 variants supported dual-channel DDR4 with ECC options and up to 16 lanes of PCIe 3.0, prioritizing reliability and long-term availability over high core counts. Embedded EPYC 7001 variants, such as the 8-core models, adapted the server architecture for rugged, extended-lifecycle deployments in networking and storage appliances, inheriting ECC memory and SEV for secure operations.
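
To put the PCIe 3.0 lane counts quoted above into perspective, the following minimal sketch estimates the theoretical one-directional bandwidth for a given number of lanes. It relies on the PCIe 3.0 signalling rate of 8 GT/s per lane with 128b/130b encoding; the function name `pcie3_bandwidth_gbs` is hypothetical, and the figures ignore protocol overhead beyond line encoding, so real throughput is lower.

```python
PCIE3_GIGATRANSFERS_PER_S = 8.0      # raw signalling rate per lane (GT/s)
PCIE3_ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie3_bandwidth_gbs(lanes: int) -> float:
    """Theoretical PCIe 3.0 bandwidth per direction in GB/s (1 GB = 1e9 bytes)."""
    gigabits_per_s = lanes * PCIE3_GIGATRANSFERS_PER_S * PCIE3_ENCODING_EFFICIENCY
    return gigabits_per_s / 8  # 8 bits per byte


if __name__ == "__main__":
    # 128 lanes (EPYC 7001) versus 16 lanes (Ryzen Embedded V1000/R1000).
    print(f"{pcie3_bandwidth_gbs(128):.1f} GB/s")  # 126.0 GB/s
    print(f"{pcie3_bandwidth_gbs(16):.1f} GB/s")   # 15.8 GB/s
```

The roughly eightfold gap between the two results reflects the different roles of the parts: the server SoC is expected to feed many NVMe drives, network adapters, and accelerators directly, while the embedded APUs typically attach only a few peripherals.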
