3DNow!
from Wikipedia
3DNow!
Design firm: Advanced Micro Devices
Introduced: 1998
Type: instruction set architecture

3DNow! is a deprecated extension to the x86 instruction set developed by Advanced Micro Devices (AMD). It adds single instruction, multiple data (SIMD) instructions to the base x86 instruction set, enabling vector processing of floating-point data held in vector registers. This improves the performance of many graphics-intensive applications. The first microprocessor to implement 3DNow! was the AMD K6-2, introduced in 1998. In appropriate applications, the extension raised speed by about 2–4 times.[1]

However, the instruction set never gained much popularity, and AMD announced in August 2010 that support for 3DNow! would be dropped in future AMD processors, except for two instructions, PREFETCH and PREFETCHW.[2] These two instructions are also available in Bay-Trail Intel processors.[3]

History

3DNow! was developed at a time when 3D graphics were becoming mainstream in PC multimedia and games. Realtime display of 3D graphics depended heavily on the host CPU's floating-point unit (FPU) to perform floating-point calculations, a task in which AMD's K6 processor was easily outperformed by its competitor, the Intel Pentium II.

As an enhancement to the MMX instruction set, the 3DNow! instruction-set augmented the MMX SIMD registers to support common arithmetic operations (add/subtract/multiply) on single-precision (32-bit) floating-point data. Software written to use AMD's 3DNow! instead of the slower x87 FPU could execute up to four times faster, depending on the instruction mix.

Versions

3DNow!

The first implementation of 3DNow! technology contains 21 new instructions that support SIMD floating-point operations. The 3DNow! data format is packed, single-precision, floating-point. The 3DNow! instruction set also includes operations for SIMD integer operations, data prefetch, and faster MMX-to-floating-point switching. Later, Intel would add similar (but incompatible) instructions to the Pentium III, known as SSE (Streaming SIMD Extensions).

3DNow! floating-point instructions are the following:

  • PI2FD – Packed 32-bit integer to floating-point conversion
  • PF2ID – Packed floating-point to 32-bit integer conversion
  • PFCMPGE – Packed floating-point comparison, greater or equal
  • PFCMPGT – Packed floating-point comparison, greater
  • PFCMPEQ – Packed floating-point comparison, equal
  • PFACC – Packed floating-point accumulate
  • PFADD – Packed floating-point addition
  • PFSUB – Packed floating-point subtraction
  • PFSUBR – Packed floating-point reverse subtraction
  • PFMIN – Packed floating-point minimum
  • PFMAX – Packed floating-point maximum
  • PFMUL – Packed floating-point multiplication
  • PFRCP – Packed floating-point reciprocal approximation
  • PFRSQRT – Packed floating-point reciprocal square root approximation
  • PFRCPIT1 – Packed floating-point reciprocal, first iteration step
  • PFRSQIT1 – Packed floating-point reciprocal square root, first iteration step
  • PFRCPIT2 – Packed floating-point reciprocal/reciprocal square root, second iteration step
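
The reciprocal instructions above are meant to be used as a sequence: PFRCP yields a low-precision estimate, and the iteration-step instructions refine it. The following is a minimal scalar sketch of that idea in C, using a plain Newton-Raphson step. It is not a bit-exact model of the hardware, and the function names are hypothetical.

```c
/* Scalar analogue of reciprocal refinement. This only illustrates the
 * "estimate, then iterate" pattern; it does not reproduce the exact
 * arithmetic of PFRCP, PFRCPIT1 or PFRCPIT2. */

/* One Newton-Raphson step for 1/b, given a current estimate x. */
static float refine_reciprocal(float b, float x)
{
    return x * (2.0f - b * x);
}

/* Absolute error of an estimate x against the true reciprocal of b. */
static float reciprocal_error(float b, float x)
{
    float d = x - 1.0f / b;
    return d < 0.0f ? -d : d;
}
```

Starting from a coarse estimate such as 0.24 for 1/4, a single step already lands much closer to 0.25, which is why a cheap table-based estimate followed by a short refinement can replace a full division.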

3DNow! integer instructions are the following:

  • PAVGUSB – Packed 8-bit unsigned integer averaging
  • PMULHRW – Packed 16-bit integer multiply with rounding
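
As a rough illustration of what "averaging" and "multiply with rounding" mean per element, the sketch below models a single lane of each instruction in portable C. The helper names are hypothetical, and the real instructions apply the same operation to every lane of a 64-bit MMX register at once.

```c
#include <stdint.h>

/* One 8-bit lane of PAVGUSB: unsigned average, rounded up on ties. */
static uint8_t pavgusb_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* One 16-bit lane of PMULHRW: signed 16x16 multiply, add 0x8000 to round,
 * then keep the high 16 bits of the 32-bit result.
 * Note: right-shifting a negative value is implementation-defined in C;
 * this sketch assumes the usual arithmetic shift. */
static int16_t pmulhrw_lane(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)((product + 0x8000) >> 16);
}
```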

3DNow! performance-enhancement instructions are the following:

  • FEMMS – Faster entry/exit of the MMX or floating-point state
  • PREFETCH/PREFETCHW – Prefetch at least a 32-byte line into L1 data cache (these are the only non-deprecated instructions)

3DNow! extensions

There is little or no evidence that the second version of 3DNow! was ever officially given its own trade name. This has led to some confusion in documentation that refers to this new instruction set. The most common terms are Extended 3DNow!, Enhanced 3DNow! and 3DNow!+. The phrase "Enhanced 3DNow!" can be found in a few locations on the AMD website but the capitalization of "Enhanced" appears to be either purely grammatical or used for emphasis on processors that may or may not have these extensions (the most notable of which references a benchmark page for the K6-III-P that does not have these extensions).[4][5]

This extension to the 3DNow! instruction set was introduced with the first-generation Athlon processors. The Athlon added five new 3DNow! instructions and 19 new MMX instructions. Later, the K6-2+ and K6-III+ (both targeted at the mobile market) included the five new 3DNow! instructions, leaving out the 19 new MMX instructions. The new 3DNow! instructions were added to boost DSP. The new MMX instructions were added to boost streaming media.

The 19 new MMX instructions are a subset of Intel's SSE instruction set. In AMD technical manuals, AMD segregates these instructions apart from the 3DNow! extensions.[4] In AMD customer product literature, however, this segregation is less clear where the benefits of all 24 new instructions are credited to enhanced 3DNow! technology.[6] This has led programmers to come up with their own name for the 19 new MMX instructions. The most common appears to be Integer SSE (ISSE).[7] SSEMMX and MMX2 are also found in video filter documentation from the public domain sector. ISSE could also refer to Internet SSE, an early name for SSE.

3DNow! extension DSP instructions are the following:

  • PF2IW – Packed floating-point to integer word conversion with sign extend
  • PI2FW – Packed integer word to floating-point conversion
  • PFNACC – Packed floating-point negative accumulate
  • PFPNACC – Packed floating-point mixed positive-negative accumulate
  • PSWAPD – Packed swap doubleword

MMX extension instructions (Integer SSE) are the following:

  • MASKMOVQ – Streaming (cache bypass) store using byte mask
  • MOVNTQ – Streaming (cache bypass) store
  • PAVGB – Packed average of unsigned byte
  • PAVGW – Packed average of unsigned word
  • PMAXSW – Packed maximum signed word
  • PMAXUB – Packed maximum unsigned byte
  • PMINSW – Packed minimum signed word
  • PMINUB – Packed minimum unsigned byte
  • PMULHUW – Packed multiply high unsigned word
  • PSADBW – Packed sum of absolute byte differences
  • PSHUFW – Packed shuffle word
  • PEXTRW – Extract word into integer register
  • PINSRW – Insert word from integer register
  • PMOVMSKB – Move byte mask to integer register
  • PREFETCHNTA – Prefetch using the NTA reference
  • PREFETCHT0 – Prefetch using the T0 reference
  • PREFETCHT1 – Prefetch using the T1 reference
  • PREFETCHT2 – Prefetch using the T2 reference
  • SFENCE – Store fence
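
To make one of these entries concrete, the sketch below models PSADBW in portable C: it sums the absolute differences of eight unsigned byte pairs. The function name is hypothetical, and the real instruction operates on two 64-bit MMX operands rather than arrays.

```c
#include <stdint.h>

/* Scalar model of PSADBW on 64-bit operands: sum of |a[i] - b[i]| over
 * eight unsigned bytes. The maximum result is 8 * 255 = 2040, so it
 * fits comfortably in 16 bits. */
static uint16_t psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    uint16_t sum = 0;
    for (int i = 0; i < 8; i++) {
        int diff = (int)a[i] - (int)b[i];
        sum = (uint16_t)(sum + (diff < 0 ? -diff : diff));
    }
    return sum;
}
```

Block-matching code in video encoders typically calls an operation like this once per row of pixels and accumulates the results to score how well two blocks match.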

3DNow! Professional

3DNow! Professional is a trade name used to indicate processors that combine 3DNow! technology with a complete SSE instruction set (such as SSE, SSE2 or SSE3).[8] The Athlon XP was the first processor to carry the 3DNow! Professional trade name, and was the first product in the Athlon family to support the complete SSE instruction set (for a total of 21 original 3DNow! instructions, five 3DNow! extension DSP instructions, 19 MMX extension instructions, and 52 additional SSE instructions for complete SSE compatibility).[9]

3DNow! and the Geode GX/LX

The Geode GX and Geode LX added two new 3DNow! instructions which are absent in all other processors.

3DNow! "professional" instructions unique to the Geode GX/LX are the following:

  • PFRSQRTV – Reciprocal square root approximation for a pair of 32-bit floats
  • PFRCPV – Reciprocal approximation for a pair of 32-bit floats

Advantages and disadvantages

One advantage of 3DNow! is that it is possible to add or multiply the two numbers that are stored in the same register. With SSE, each number can only be combined with a number in the same position in another register. This capability, known as horizontal in Intel terminology, was the major addition to the SSE3 instruction set.
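
The same-register addition described here is what PFACC provides. The following is a minimal C sketch of its behaviour, assuming a two-float struct as a stand-in for a 64-bit MMX register; the type and function names are hypothetical.

```c
/* Stand-in for one 64-bit MMX register holding two packed floats. */
typedef struct {
    float lo;  /* bits 31..0  */
    float hi;  /* bits 63..32 */
} pf2;

/* Model of PFACC dst, src: the low result is the sum of the two
 * elements of dst, the high result is the sum of the two elements
 * of src. */
static pf2 pfacc_model(pf2 dst, pf2 src)
{
    pf2 r;
    r.lo = dst.lo + dst.hi;
    r.hi = src.lo + src.hi;
    return r;
}
```

After an element-wise multiply of two vectors, one such accumulate collapses the partial products into a sum, which is the final step of a dot product.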

A disadvantage of 3DNow! is that 3DNow! instructions and MMX instructions share the same register file, whereas SSE adds 8 new independent registers (XMM0–XMM7).

Because MMX/3DNow! registers are shared with the standard x87 FPU, 3DNow! instructions and x87 instructions cannot be executed simultaneously. However, because the registers are aliased to the x87 FPU, the 3DNow! and MMX register states can be saved and restored by the traditional x87 F(N)SAVE and F(N)RSTOR instructions. This arrangement allowed operating systems to support 3DNow! with no explicit modifications, whereas SSE required explicit operating system support to properly save and restore the new XMM registers (via the added FXSAVE and FXRSTOR instructions).

The FX* instructions from SSE provide a functional superset of the older x87 save and restore instructions. They can save not only SSE register states but also the x87 register states (hence are applicable also for MMX and 3DNow! operations where supported).

On AMD Athlon XP and K8-based cores (i.e. Athlon 64), assembly programmers have noted that it is possible to combine 3DNow! and SSE instructions to reduce register pressure, but in practice it is difficult to improve performance due to the instructions executing on shared functional units.[10]

Processors supporting 3DNow!

  • All AMD processors from the K6-2 onward in the K6, Athlon, Athlon 64 and Phenom architecture families.
    • Not supported in Bulldozer, Bobcat and Zen architecture processors and their derivatives.
    • The last AMD APU supporting 3DNow! is the A8-3870K, which is based on the Llano architecture. Llano is also the only APU generation with 3DNow! instructions, as Bobcat and later architectures exclude support for it.
  • National Semiconductor Geode GX2, later AMD Geode.
  • VIA C3 (also known as Cyrix III) "Samuel", "Samuel 2", "Ezra", and "Eden ESP" cores.
  • IDT WinChip 2, 3

from Grokipedia
3DNow! is a single instruction, multiple data (SIMD) extension to the x86 instruction set architecture developed by Advanced Micro Devices (AMD), designed to accelerate floating-point operations for multimedia applications including 3D graphics, video encoding, and audio processing. It builds upon Intel's MMX integer SIMD instructions by adding support for packed single-precision floating-point arithmetic within the same 64-bit MMX registers, enabling two packed floating-point operations per instruction without requiring separate vector registers. The extension consists of 21 instructions covering operations such as addition, multiplication, reciprocal approximation, and reciprocal square root approximation, together with FEMMS for fast state transitions and PREFETCH for data caching, all compatible with existing x86 software and operating systems.

AMD introduced 3DNow! in February 1998 as a strategic response to the limitations of Intel's MMX extension, which lacked the floating-point support critical for emerging 3D graphics workloads in the late-1990s PC market. It first appeared in the AMD K6-2, launched worldwide on May 28, 1998, fabricated on a 0.25 μm process with 9.3 million transistors. This timing positioned 3DNow! ahead of Intel's SSE (Streaming SIMD Extensions), which debuted in 1999 with 128-bit vectors but required new XMM registers, creating developer fragmentation between the competing sets. Subsequent processors, including the K6-III and Athlon series, incorporated 3DNow!, driving adoption in games and applications, where it delivered performance improvements in floating-point-intensive tasks compared to MMX-only systems.

Technically, 3DNow! operates on two 32-bit single-precision values packed into each 64-bit MMX register, supporting SIMD parallelism across arithmetic, comparison, and conversion operations while reusing the floating-point unit's registers for compatibility. Key instructions include PFADD (packed floating-point add), PFMAX (packed maximum), and PFRSQRT (packed reciprocal square root approximation), which reduce the overhead of scalar floating-point computations in graphics pipelines. The FEMMS instruction optimizes transitions between MMX/3DNow! code and x87 floating-point code by avoiding full state saves, improving efficiency in mixed workloads. Detection occurs via the CPUID instruction, with feature bit 31 in EDX of extended function 0x80000001 indicating support, ensuring software portability across processors.

In 1999, AMD introduced Enhanced 3DNow! in the Athlon processor, adding five instructions focused on digital signal processing (DSP): PF2IW (packed float-to-integer word conversion), PFNACC (negative accumulate), PFPNACC (mixed positive-negative accumulate), PI2FW (packed integer word-to-float conversion), and PSWAPD (packed swap doubleword). These enhancements targeted audio/video codecs and are detected via bit 30 of the same EDX register. Later, 3DNow! Professional marketed broader SIMD support including SSE compatibility in processors like the Athlon XP. As Intel's SSE and later AVX extensions gained dominance, AMD shifted focus to compatibility with SSE, leading to the deprecation of 3DNow!, announced in August 2010, with no support in Bulldozer-family and later CPUs. support was phased out in version 5.17 (2022). By 2024, major compilers had removed 3DNow! support, marking the end of its active use, though legacy code remains executable on older hardware.

Overview

Definition and Purpose

3DNow! is a proprietary single instruction, multiple data (SIMD) instruction set extension to the x86 architecture developed by Advanced Micro Devices (AMD). It enables vector processing of floating-point operations by packing two 32-bit single-precision floating-point values into each 64-bit MMX register, allowing parallel computations on multiple data elements within a single instruction. Introduced to compete with Intel's MMX technology, which supported only integer operations, and with the subsequent SSE extension for floating-point SIMD, 3DNow! extended the existing MMX register set to handle floating-point tasks without requiring additional registers.

The primary purpose of 3DNow! is to accelerate performance in floating-point-intensive applications, such as 3D graphics, video decoding, and scientific simulations, by performing parallel floating-point operations directly on the CPU without the need for dedicated vector processing units. The extension targets improvements in multimedia processing, including faster frame rates in high-resolution 3D scenes, improved physical modeling for realistic environments, sharper 3D imaging, smoother video playback, and higher-quality audio reproduction. By exposing SIMD parallelism, it lets developers optimize code for the vector-based computations common in graphics pipelines.

In the late 1990s, the x86 architecture's floating-point unit (FPU) was limited to scalar operations, processing one floating-point value at a time, while MMX provided SIMD capabilities solely for integers. Both were inadequate for the floating-point-heavy workloads of emerging 3D graphics applications. These limitations hindered real-time performance on consumer PCs, where a single floating-point execution unit struggled with the parallel demands of 3D transformations and lighting calculations. 3DNow! addressed this by integrating SIMD floating-point support into the x86 core, enabling more efficient processing for gaming and multimedia. A representative application is consumer PC gaming, where 3DNow! targeted real-time 3D graphics: titles that incorporated optimizations for the extension gained faster vertex processing and rendering on supporting processors.

Key Technical Features

3DNow! uses the existing eight 64-bit MMX registers (MM0 through MM7) to enable single instruction, multiple data (SIMD) parallelism for floating-point operations, with each register capable of storing two packed 32-bit single-precision floating-point values. This architecture reuses the MMX register file without requiring additional registers, allowing straightforward integration into x86 processors for tasks such as 3D graphics acceleration. The data format follows the IEEE 754 single-precision layout, packing two such values into a single 64-bit register. AMD's documentation describes the implementation as not fully IEEE 754 compliant: denormalized numbers are not supported, and results that underflow are clamped to zero.

The core of 3DNow! consists of 21 new instructions, 17 of which are floating-point instructions that perform parallel operations on the packed values within the MMX registers, including arithmetic, comparison, and conversion functions. Representative instructions include PFADD for packed floating-point addition, PFMAX for packed maximum, and PFMUL for packed multiplication, which operate element-wise on the two single-precision values per register. The conversion instructions PF2ID (packed floating-point to signed 32-bit integer) and PI2FD (packed 32-bit integer to floating-point) move data between the integer and floating-point domains without a round trip through memory.

Distinctive among SIMD extensions at the time, 3DNow! incorporates horizontal operations like PFACC (packed floating-point accumulate), which adds the two elements within the same register to produce partial sums efficiently for algorithms such as dot products. It also includes two prefetch instructions (PREFETCH for data that will be read and PREFETCHW for data that will be written) to reduce cache misses in data-intensive workloads.

Because MMX and 3DNow! share physical registers with the x87 floating-point unit (FPU), 3DNow! instructions are encoded with the two-byte opcode prefix 0x0F 0x0F, distinguishing them from x87 opcodes and avoiding conflicts during mixed-mode execution. The FEMMS instruction provides a low-overhead way to reset the FPU tag word after 3DNow! usage, in contrast with the slower EMMS instruction defined by MMX.

History

Development and Introduction

In the mid-1990s, AMD initiated research and development efforts within its California Microprocessor Division to enhance the floating-point capabilities of its K6 processor family, whose floating-point unit (FPU) performed poorly compared to that of Intel's Pentium II. These weaknesses were particularly evident in emerging 3D graphics applications, where the K6's single FPU struggled with intensive computations, prompting AMD to design a SIMD extension as a proactive counter to Intel's anticipated Katmai processor featuring early Streaming SIMD Extensions (SSE). The project, internally targeted for completion in the second half of 1997, aimed to integrate these enhancements directly into the processor core without relying on costly external co-processors. AMD also collaborated with Cyrix and IDT to adopt 3DNow! as a unified standard, enabling support in their processors, such as the IDT WinChip 2, shortly after.

The development was led by a multimedia-focused engineering team at AMD, including key contributors Stuart Oberman, Fred Weber, Norbert Juffa, and Greg Favor, who specialized in creating efficient, low-cost SIMD solutions tailored for both desktop personal computers and emerging embedded applications. This team worked closely with independent software vendors (ISVs) to define the instruction set, ensuring compatibility with existing x86 architectures while prioritizing affordability and broad market applicability for consumer-grade systems. Their work extended the integer-only SIMD model of Intel's MMX instructions to floating-point operations to better support multimedia workloads.

The primary technical motivation for 3DNow! was the need to accelerate the floating-point operations essential to 3D graphics pipelines, such as geometry transformations and lighting calculations, which were bottlenecks in the K6's design because it relied on a single FPU for all such tasks. By reusing MMX registers for packed 32-bit floating-point data, the extension enabled parallel processing of multiple data elements, addressing the growing demands of 3D graphics without the expense of dedicated hardware accelerators. The design targeted up to four floating-point operations per cycle to overcome the K6's scalar limitations.

Pre-announcement milestones included integration testing of 3DNow! instructions with K6-2 processor prototypes throughout 1997, focusing on validation within the 0.25 μm fabrication process and the dual-pipeline execution units. These efforts culminated in silicon validation by early 1998, with the technology designed to provide 2–4x speedups in representative 3D graphics workloads, such as setup and physics simulations, positioning the K6-2 as a competitive alternative in the value-oriented PC segment.

Announcement and Initial Adoption

AMD announced 3DNow! technology alongside the launch of its K6-2 processor on May 28, 1998, introducing single instruction, multiple data (SIMD) instructions aimed at accelerating 3D graphics and multimedia processing on x86 processors. The K6-2, fabricated on a 0.25-micron process with 9.3 million transistors, was positioned as a cost-effective alternative to Intel's Pentium II, with initial pricing starting at $185 for the 266 MHz model to target mainstream personal computers.

Early adoption was helped by compatibility with existing Super Socket 7 motherboards, including models from ASUS such as the P5A and from MSI such as the MS-6163, which supported the K6-2's 100 MHz front-side bus and enabled upgrades in value-oriented systems without requiring new hardware platforms. Software support quickly followed, with Microsoft incorporating 3DNow! optimizations into DirectX 6.1, released in early 1999, to improve Direct3D rendering performance. Additionally, 3Dfx provided Glide API wrappers optimized for 3DNow! in drivers for Voodoo graphics cards, enhancing compatibility for 3D-accelerated games.

The launch significantly boosted AMD's market position, with K6-2 shipments exceeding 8.5 million units in under seven months and driving substantial revenue growth through increased penetration in the sub-$1,000 PC segment, where AMD captured around 37% share by late 1998. In the gaming sector, endorsements from developers like id Software accelerated uptake; id collaborated with AMD on 3DNow!-optimized drivers, yielding significant performance gains (up to nearly double in some benchmarks) compared to non-optimized versions on compatible hardware.

Despite these advances, initial challenges arose from limited developer tools and SDKs, resulting in inconsistent optimization across early titles. For instance, one 1999 title included 3DNow! support but exhibited variable performance gains depending on implementation, highlighting the need for more mature tooling. By 1999, however, these hurdles began to ease as broader industry adoption improved software maturity.

Versions and Extensions

Original 3DNow!

The original 3DNow! instruction set debuted with the AMD K6-2 microprocessor in 1998, adding 21 new instructions, most of them single instruction, multiple data (SIMD) floating-point operations, to the existing MMX foundation. These instructions targeted 3D graphics, audio, and video processing by enabling packed single-precision floating-point operations on two 32-bit values per 64-bit register. Key instructions in this baseline set included conversions such as PF2ID (packed floating-point to 32-bit integer with truncation) and PI2FD (packed 32-bit integer to floating-point), alongside arithmetic operations like PFACC (packed floating-point accumulate).

The full set comprised: FEMMS (fast enter/leave MMX state), PAVGUSB (packed average unsigned bytes), PF2ID, PFACC, PFADD (packed floating-point add), PFCMPEQ (packed floating-point compare equal), PFCMPGE (packed floating-point compare greater or equal), PFCMPGT (packed floating-point compare greater than), PFMAX (packed floating-point maximum), PFMIN (packed floating-point minimum), PFMUL (packed floating-point multiply), PFRCP (packed floating-point reciprocal), PFRCPIT1 (packed floating-point reciprocal iterative 1), PFRCPIT2 (packed floating-point reciprocal iterative 2), PFRSQIT1 (packed floating-point reciprocal square root iterative 1), PFRSQRT (packed floating-point reciprocal square root), PFSUB (packed floating-point subtract), PFSUBR (packed floating-point subtract reverse), PI2FD, PMULHRW (packed multiply high round and scale word), and PREFETCH/PREFETCHW (prefetch and prefetch with intent to write). These operations use the same 64-bit MMX registers (MM0 through MM7), which are aliased onto the x87 floating-point registers, so no new architectural state is introduced.

Software detection of original 3DNow! support relies on the CPUID instruction with extended function 0x8000_0001, where bit 31 (the most significant bit) of the EDX register is set if the feature is present. This mechanism allows applications to query hardware capabilities dynamically. 3DNow! requires underlying MMX support, as it builds directly upon MMX, but no operating system modifications are necessary, and there is no performance penalty for switching between MMX and 3DNow! instructions because they share one register file. Applications need to check CPUID explicitly before enabling 3DNow!-optimized code paths.

3DNow! Extensions

The 3DNow! Extensions were introduced by AMD alongside the first-generation Athlon processors in June 1999, enhancing the original 3DNow! instruction set with additional capabilities for multimedia processing. These extensions added a total of 24 new instructions to the existing 3DNow! and MMX sets, comprising 5 specialized digital signal processing (DSP) instructions that operate on the 64-bit MMX registers used by 3DNow! and 19 instructions compatible with Intel's MMX extensions (the integer subset of SSE). This update aimed to accelerate integer-based multimedia workloads on Athlon processors.

The 5 DSP instructions focus on efficient packed floating-point to integer conversions and accumulate operations, enabling faster processing for tasks such as audio filtering and video effects. Representative examples include PFNACC (packed single-precision floating-point negative accumulate), which subtracts within pairs of 32-bit floats to support DSP algorithms like finite impulse response (FIR) filters, and PI2FW (packed integer word to floating-point), which converts packed 16-bit integer words to 32-bit floats.

The 19 MMX-compatible instructions emphasize integer operations for video and image processing. For instance, PSADBW (packed sum of absolute differences byte-wise) computes the sum of absolute differences between two 8-byte vectors, a key primitive for motion estimation in MPEG video encoding, while PMULHUW (packed multiply high unsigned word) multiplies pairs of 16-bit unsigned integers and stores the high 16 bits of each result, which helps with fixed-point arithmetic in graphics scaling.

These extensions improved performance in integer-heavy multimedia applications, such as video decoding and audio codec processing, by providing instructions that bridged the gap between the floating-point focus of the original 3DNow! and Intel's SSE integer instructions, with benchmarks showing up to 2x speedup in video encoding tasks on Athlon compared to K6-2 processors. Support for 3DNow! Extensions is detected via the CPUID instruction: executing function 0x80000001 returns feature flags in EDX, where bit 30 indicates presence of the 3DNow! DSP extensions, and bit 22 signals the MMX extensions (standard MMX is bit 23 in the basic CPUID function).
Category | Number of Instructions | Examples | Primary Use Cases
3DNow! DSP | 5 | PFNACC, PI2FW, PF2IW, PFPNACC, PSWAPD | Audio filtering, floating-point DSP conversions
MMX Extensions | 19 | PSADBW, PMULHUW, PAVGB, MASKMOVQ, PREFETCHNTA | Video encoding (e.g., MPEG), non-temporal stores, prefetching for cache optimization
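
The CPUID bit positions given in this section can be decoded with a few shifts. The sketch below is a minimal C example that takes the EDX value returned by extended function 0x80000001 as a plain integer, so it runs on any machine. The struct and function names are hypothetical, and obtaining the EDX value from a real CPU is left to the compiler's CPUID facilities.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature flags taken from EDX of CPUID extended function 0x80000001. */
typedef struct {
    bool has_3dnow;      /* bit 31: original 3DNow!       */
    bool has_3dnow_ext;  /* bit 30: 3DNow! DSP extensions */
    bool has_mmx_ext;    /* bit 22: AMD MMX extensions    */
} amd_simd_flags;

/* Decode the three AMD-specific SIMD feature bits from an EDX value. */
static amd_simd_flags decode_ext_edx(uint32_t edx)
{
    amd_simd_flags f;
    f.has_3dnow     = ((edx >> 31) & 1u) != 0;
    f.has_3dnow_ext = ((edx >> 30) & 1u) != 0;
    f.has_mmx_ext   = ((edx >> 22) & 1u) != 0;
    return f;
}
```

Keeping the decoding separate from the CPUID call makes the logic easy to test with recorded register values from different processors.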

3DNow! Professional

3DNow! Professional is an enhanced version of AMD's SIMD instruction set that integrates the original 3DNow! technology with Intel's Streaming SIMD Extensions (SSE), providing full support for SSE instructions without including SSE2. It was introduced on October 9, 2001, alongside the Athlon XP processors based on the Palomino core, which merged all prior 3DNow! features into this unified extension for improved multimedia processing. This upgrade aimed to bridge compatibility gaps between AMD and Intel architectures, allowing developers to use a broader set of vector operations in software.

A core feature of 3DNow! Professional is the adoption of the 128-bit XMM registers from SSE, enabling packed single-precision floating-point operations across four elements per register, while retaining the horizontal operations of 3DNow!, such as horizontal addition, for efficient vector reductions not natively available in SSE. This combination added approximately 52 new instructions overall, enhancing performance in 3D graphics, video encoding, and scientific simulations by supporting algorithms that mix integer and floating-point SIMD tasks. For instance, applications like Adobe Premiere benefited from unified code paths that could use these extensions for faster photo, video, and audio editing without separate AMD- or Intel-specific optimizations.

The primary benefit of 3DNow! Professional lies in its compatibility with SSE-based software on AMD hardware, which reduced developer fragmentation by allowing a single codebase to run efficiently across platforms without extensive branching for processor-specific instructions. This compatibility extended to operating systems supporting SSE, enabling up to 25% performance gains in real-world workloads on Palomino-based systems. Software detection of 3DNow! Professional typically involves querying CPUID function 0x00000001, where bit 25 in the EDX register indicates SSE support, combined with extended function 0x80000001, where bit 31 in EDX confirms 3DNow! presence; processors reporting both flags with the AuthenticAMD vendor string qualify as supporting the full Professional feature set.
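
The three-part check described above (vendor string, SSE flag, 3DNow! flag) can be sketched as a single predicate. The C example below takes the vendor string and the two EDX values as plain inputs rather than executing CPUID, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns true when the inputs match the 3DNow! Professional criteria:
 *   vendor  - 12-character CPUID vendor string
 *   std_edx - EDX from CPUID function 0x00000001 (bit 25 = SSE)
 *   ext_edx - EDX from CPUID function 0x80000001 (bit 31 = 3DNow!) */
static bool has_3dnow_professional(const char *vendor,
                                   uint32_t std_edx,
                                   uint32_t ext_edx)
{
    bool is_amd    = strcmp(vendor, "AuthenticAMD") == 0;
    bool has_sse   = ((std_edx >> 25) & 1u) != 0;
    bool has_3dnow = ((ext_edx >> 31) & 1u) != 0;
    return is_amd && has_sse && has_3dnow;
}
```

A caller would normally run this once at startup and cache the result before selecting an optimized code path.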

Support in Geode Processors

The Geode processors, designed for embedded and low-power applications, integrated 3DNow! technology as a core component of their x86-compatible design to enhance multimedia performance in resource-constrained environments. The GX series featured the original 3DNow! instructions along with extensions, tailored for thin clients and set-top boxes, while emphasizing power efficiency through a fully pipelined floating-point unit. Similarly, the LX series, launched in 2007, retained full support for 3DNow! Professional, which merged 3DNow! with SSE instructions, enabling advanced SIMD operations for video decoding in low-wattage systems. These processors operated at reduced clock speeds to prioritize energy efficiency, with the GX reaching up to 500 MHz and the LX topping out at 600 MHz for the LX900 variant, though common models like the LX800 ran at 500 MHz and the LX700 at 433 MHz.

This adaptation proved particularly suitable for battery-powered devices, such as the OLPC XO-1 laptop, which used the LX-700 to deliver educational capabilities with hardware-accelerated video playback and basic 3D graphics in power-sensitive scenarios. The Geode line's 3DNow! implementation also included two unique instructions absent in other processors, PFRCPV (packed floating-point reciprocal approximation) and PFRSQRTV (packed floating-point reciprocal square root approximation), further enhancing efficiency for embedded tasks like MPEG video decoding in set-top boxes.

In practical applications, 3DNow! support in Geode processors facilitated video playback and processing in embedded systems, including industrial control panels and gaming terminals, where low power draw (down to 0.9 W for the LX800) combined with integrated accelerators ensured reliable performance. These optimizations made the series a staple for thin clients and control systems throughout the late 2000s, supporting x86 software ecosystems while minimizing thermal output. Support for 3DNow! in the Geode lineup effectively ended around 2010, coinciding with AMD's announcement of its deprecation in future architectures and the lack of a direct successor, as focus shifted to newer embedded solutions like the G-Series APUs.

Implementation

Supported Processors

3DNow! was first implemented in hardware with the K6-2 processor, introduced in May 1998 as part of AMD's Super 7 platform. This marked the debut of the SIMD extension set, enabling enhanced 3D graphics performance on x86 systems. Subsequent AMD processors in the K6 family, such as the K6-III released in 1999, also included full 3DNow! support.

AMD extended 3DNow! compatibility across its mainstream x86 lineup through the early 2000s. The Athlon processors, starting with the original Athlon in 1999 and continuing to the Athlon 64 models produced from 2003 to 2009, incorporated 3DNow! alongside MMX and later SSE instructions. Budget-oriented lines like Duron (introduced 2000) and Sempron (introduced 2004) similarly featured the full instruction set, as did mobile variants. Support persisted into AMD's accelerated processing units (APUs), with the Llano-based A8-3870K desktop APU in 2011 being the final model to include 3DNow!. However, the Bulldozer and Bobcat architectures, both introduced in 2011, and the Zen architectures from 2017 onward exclude the 3DNow! hardware implementation.
Processor Family | Introduction Year | Key Models with 3DNow! Support
K6 | 1998 | K6-2 (up to 550 MHz), K6-III (up to 550 MHz)
Athlon | 1999–2005 | Athlon (up to 2.2 GHz), Athlon XP (up to 2.33 GHz)
Athlon 64 | 2003–2009 | Athlon 64 (up to 3.2 GHz), dual-core variants (up to 3.2 GHz)
Duron and Sempron | 2000–2006 | Duron (up to 1.8 GHz), Sempron (up to 2.3 GHz)
Llano APU | 2011 | A8-3870K (up to 3.0 GHz)
Third-party x86 processors licensed to implement 3DNow! appeared shortly after its debut, broadening adoption in low-cost systems. IDT's WinChip 2 and WinChip 3 series, introduced in 1998 and 2000 respectively, were among the earliest non-AMD implementations, targeting Socket 7 motherboards with integrated 3DNow! for multimedia acceleration. VIA Technologies' C3 family, spanning 1999 to 2005, included 3DNow! in cores like Samuel (1999, also branded as Cyrix III), Samuel 2, and Ezra, providing compatibility in embedded and budget desktop applications. Intel offered partial support limited to the PREFETCH and PREFETCHW instructions in its Bay Trail Atom processors released in 2013, without the full SIMD floating-point capabilities.

AMD licensed 3DNow! technology to other x86 vendors (one of which was acquired by VIA in 1999), enabling them to integrate the extensions into their designs for competitive x86 compatibility. In August 2010, AMD announced the deprecation of full 3DNow! support in future processor designs, retaining only the PREFETCH and PREFETCHW instructions for data prefetching.

Software and Compiler Support

The GNU Compiler Collection (GCC) provided support for 3DNow! instructions starting with version 2.95 in 1999, enabling developers to generate code using the -m3dnow flag for targeting processors with MMX and 3DNow! extensions. This allowed inline assembly blocks via asm directives to incorporate 3DNow! operations directly in C/C++ code, though automatic code generation was limited and often required manual intervention for optimal use. Microsoft Visual C++ version 6.0 with Service Pack 5, released in 1999, introduced intrinsics for 3DNow! through the Processor Pack update, providing functions such as _m_pfmul for packed floating-point multiplication without relying solely on inline assembly. These intrinsics operated on MMX registers and improved portability over pure assembly while maintaining compatibility with Windows environments.

Key libraries integrated 3DNow! optimizations for graphics and multimedia applications. The DirectX 7 runtime, introduced in 1999, included runtime detection of 3DNow! via CPUID queries, allowing applications to dynamically enable SIMD-accelerated rendering paths in Direct3D for enhanced floating-point performance on supported hardware. Mesa 3D, an open-source implementation of OpenGL, supported 3DNow! through its configure script option --enable-3dnow, which activated assembly optimizations for vertex transformations and rasterization to accelerate software rendering. Similarly, FFmpeg incorporated hand-written assembly routines optimized for 3DNow! in several of its video codecs, configurable via --enable-amd3dnow during compilation, to speed up decoding and encoding tasks involving packed floating-point arithmetic.

Operating system compatibility relied on standard CPU feature detection mechanisms, as 3DNow! required no kernel modifications. Applications used CPUID function 80000001h and checked bit 31 of EDX for 3DNow! presence, enabling integration without OS-level changes. Linux kernel version 2.2, released in January 1999, supported 3DNow!-accelerated modules and drivers through user-space detection and assembly-optimized libraries in environments like XFree86.

Developers faced significant challenges in adopting 3DNow!, as early compilers like GCC and MSVC primarily supported manual assembly or basic intrinsics, limiting automatic optimization of complex loops. This often necessitated hand-coding SIMD paths, increasing development time and error risk until the Intel C++ Compiler (ICC) version 8.0 in 2003 introduced advanced auto-vectorization capable of generating 3DNow! instructions from scalar C/C++ code with options like -xK for K6 targets. Such tools reduced reliance on explicit assembly, though compatibility with non-AMD processors remained a concern because the extension was AMD-specific.
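The CPUID check described above (function 80000001h, EDX bit 31) can be sketched in C roughly as follows. This is a minimal illustration, not code from any of the libraries named here: it assumes a GCC or Clang toolchain that ships `<cpuid.h>`, and the helper names `edx_has_3dnow`, `edx_has_3dnow_ext`, and `cpu_has_3dnow` are hypothetical. The extended 3DNow! flag in EDX bit 30 is included for completeness.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h> /* GCC/Clang helper header; assumed to be available */
#endif

/* Feature bits in EDX of CPUID function 80000001h. */
#define EDX_3DNOW     (1u << 31) /* base 3DNow! */
#define EDX_3DNOW_EXT (1u << 30) /* extended 3DNow! */

/* Pure helpers (hypothetical names): decode an EDX value already read. */
static bool edx_has_3dnow(uint32_t edx)     { return (edx & EDX_3DNOW) != 0; }
static bool edx_has_3dnow_ext(uint32_t edx) { return (edx & EDX_3DNOW_EXT) != 0; }

/* Query the running CPU. Returns false on non-x86 builds or when the
 * extended CPUID function is not implemented. */
static bool cpu_has_3dnow(void) {
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_has_3dnow(edx);
#else
    return false;
#endif
}
```

An application would typically call `cpu_has_3dnow()` once at startup and cache the result; on processors without 3DNow! it is expected to return false.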

Advantages and Limitations

Advantages

3DNow! offered notable performance improvements in floating-point intensive workloads, especially those central to 3D graphics processing. Its SIMD architecture allowed simultaneous operations on two 32-bit floating-point values per instruction, delivering speedups of 2 to 4 times in tasks such as matrix multiplications used for 3D transformations and rendering. This efficiency stemmed from dual pipelined execution units on processors like the K6-2, enabling up to four floating-point operations per clock cycle with low latency. In benchmarks, 3DNow!-enabled systems achieved peak floating-point throughput of 1800 MFLOPS at 450 MHz, roughly four times the 450 MFLOPS of comparable processors without equivalent extensions. Additionally, instructions like PFACC supported horizontal additions within a single register, providing a computational edge over early SSE implementations that lacked direct equivalents and required extra shuffles for similar intra-register operations.

A major strength of 3DNow! was its backward compatibility with established x86 architectures, minimizing adoption barriers for developers and users. By aliasing its registers to the existing MMX and FPU register file, it avoided the need for new hardware state management or operating system kernel changes during context switches. This design ensured integration with legacy MMX integer code and scalar floating-point routines, allowing applications to mix in 3DNow! instructions without rewriting core logic or risking incompatibility. Features like FEMMS further optimized transitions between MMX and floating-point states, reducing overhead in hybrid workloads.

3DNow! also enhanced cost-effectiveness in consumer computing, particularly for entry-level gaming configurations. On budget-oriented processors, it paired effectively with inexpensive 3D accelerators such as the Voodoo series, enabling capable gaming setups that outperformed pricier alternatives in graphics-heavy titles. For instance, a 3DNow!-equipped system with a single accelerator card delivered frame rates up to 68 FPS at 640x480 resolution, surpassing comparable setups by 10 to 15% in 3D benchmarks while keeping overall build costs low through AMD's competitive pricing.
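The two-lane arithmetic and the peak-throughput figure quoted in this section can be made concrete with a small portable C model. This is only a sketch: it emulates the lane behaviour in plain C rather than issuing real 3DNow! instructions, and the type `pf2` and the function names are hypothetical. In the 3DNow! instruction set, PFADD adds lane by lane (vertically), while PFACC is the instruction that sums the two lanes within each operand (horizontally).

```c
/* Model of one 64-bit MMX register holding two packed single-precision
 * floats. The type and function names are illustrative only. */
typedef struct { float lo, hi; } pf2;

/* PFADD: vertical add, each lane of dst plus the same lane of src. */
static pf2 pfadd(pf2 dst, pf2 src) {
    pf2 r = { dst.lo + src.lo, dst.hi + src.hi };
    return r;
}

/* PFACC: horizontal accumulate. The low result lane is the sum of both
 * dst lanes; the high result lane is the sum of both src lanes. */
static pf2 pfacc(pf2 dst, pf2 src) {
    pf2 r = { dst.lo + dst.hi, src.lo + src.hi };
    return r;
}

/* Peak MFLOPS = execution pipes * lanes per instruction * clock in MHz. */
static unsigned peak_mflops(unsigned pipes, unsigned lanes, unsigned mhz) {
    return pipes * lanes * mhz;
}
```

With two execution pipes and two lanes per instruction, `peak_mflops(2, 2, 450)` gives the 1800 MFLOPS figure cited above, while a single scalar operation per cycle at the same clock gives 450.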

Disadvantages and Deprecation

One key architectural limitation of 3DNow! stems from its use of the existing MMX register file, which is aliased to the x87 floating-point unit (FPU) registers. This sharing prevents the simultaneous execution of 3DNow! vector instructions and x87 scalar floating-point operations, as the registers cannot be accessed concurrently without state transitions that incur overhead. Additionally, 3DNow! operates on 64-bit registers, allowing only two single-precision floating-point values per operation, which lags behind the 128-bit registers introduced by Intel's Streaming SIMD Extensions (SSE) that support four values. This narrower width reduced its efficiency for parallel computations compared to SSE.

The instruction set's inflexibility further compounded these issues. It lacked advanced features such as gather and scatter operations for non-contiguous memory access, as well as robust support for vertical (element-wise) packing in complex data layouts. These omissions made 3DNow! progressively less adaptable to evolving workloads requiring irregular data movements.

In August 2010, AMD officially announced the deprecation of 3DNow!, stating that it would not be supported in certain upcoming processors to align with industry standards. The FX-series processors based on the Bulldozer architecture, released in 2011, omitted 3DNow! support entirely, marking a shift away from AMD-specific extensions. Full hardware support for 3DNow! ended with the Llano APUs in 2011, exemplified by the A8-3870K processor as the last model to include it. The primary reasons for this discontinuation were the dominance of SSE and subsequent AVX standards, which offered broader cross-vendor compatibility between Intel and AMD processors, reducing the need for proprietary instructions. Only the PREFETCH and PREFETCHW instructions were retained for legacy prefetching purposes in future designs.
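The register-width difference described in this section translates directly into instruction counts. As a rough sketch, the hypothetical helper below computes how many packed operations are needed to process a given number of single-precision values at a given register width, ignoring loads, stores, and scheduling effects.

```c
#include <stddef.h>

/* Number of packed operations needed to touch n_floats 32-bit values
 * once, given the register width in bits. Illustrative helper only. */
static size_t packed_ops_needed(size_t n_floats, size_t reg_bits) {
    size_t lanes = reg_bits / 32;           /* 64 bits -> 2, 128 bits -> 4 */
    return (n_floats + lanes - 1) / lanes;  /* round up for a partial tail */
}
```

For 1000 values, this gives 500 operations at 64 bits (the 3DNow! case) and 250 at 128 bits (the SSE case).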

Legacy and Impact

Industry Influence

The introduction of 3DNow! by AMD in 1998 exerted significant competitive pressure on Intel, accelerating the rollout of Streaming SIMD Extensions (SSE) with the Pentium III processor in February 1999 as a direct countermeasure. This rivalry between the two extensions, with 3DNow! using the MMX registers for floating-point SIMD and SSE introducing dedicated 128-bit XMM registers, ultimately standardized SIMD functionality across the x86 ecosystem and compelled both vendors to prioritize enhancements for 3D graphics and multimedia workloads.

The design principles of 3DNow! left a lasting imprint on AMD's subsequent processor architectures, notably in the processor family launched in 2003, which was engineered to handle both 3DNow! instructions and Intel's SSE for dual 32-bit SIMD operations. This integration reflected AMD's strategy of maintaining compatibility with its proprietary extensions while adopting industry-standard SIMD features, ensuring support for evolving software demands in server and other environments.

3DNow! also catalyzed advancements in software ecosystems by necessitating runtime CPU feature detection, typically via the CPUID instruction (e.g., checking bit 31 in EDX for 3DNow! support), which allowed applications to dynamically select optimized paths for faster floating-point work in tasks like 3D transformations and lighting. This practice became foundational in graphics APIs and multimedia libraries, enabling cross-processor optimization and benefiting multi-vendor development by mitigating compatibility issues.

On the market front, 3DNow!'s performance boosts in affordable processors like the K6-2 series helped AMD expand its foothold in the desktop CPU segment, achieving approximately 16.6% market share in Q1 2003 according to Mercury Research data. This growth underpinned a surge in budget 3D gaming PCs during the late 1990s and early 2000s, as value-oriented systems using 3DNow! delivered competitive frame rates in titles optimized for SIMD, widening access to 3D graphics.
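The runtime path selection described in this section is commonly built around a function pointer chosen once from the detected CPU features. The sketch below is an assumption-laden illustration rather than code from any particular graphics API: all names are hypothetical, and `scale_pairs` is a plain C stand-in for what would have been a hand-written 3DNow! routine processing two floats per step.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*scale_fn)(float *v, size_t n, float k);

/* Baseline path: one element per iteration. */
static void scale_scalar(float *v, size_t n, float k) {
    for (size_t i = 0; i < n; ++i)
        v[i] *= k;
}

/* Stand-in for an optimized path: two elements per iteration, mirroring
 * the two lanes of a 64-bit packed multiply, plus a scalar tail. */
static void scale_pairs(float *v, size_t n, float k) {
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        v[i] *= k;
        v[i + 1] *= k;
    }
    if (i < n)
        v[i] *= k;
}

/* Pick the implementation once, based on a feature flag that the caller
 * obtained from its own CPU detection. */
static scale_fn select_scale(bool has_3dnow) {
    return has_3dnow ? scale_pairs : scale_scalar;
}
```

Both paths produce identical results, so the caller can invoke the returned pointer without knowing which one was selected.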

Current Relevance

As of 2025, 3DNow! instructions are absent from all modern AMD and Intel processors, having been discontinued by AMD starting with the Bulldozer architecture in 2011 and never natively supported by Intel CPUs. Subsequent architectures, including the Zen series introduced in 2017, exclude 3DNow! entirely, rendering it obsolete for contemporary hardware. However, emulation remains available in some virtual machines and emulators, which support 3DNow! alongside MMX and SSE extensions to enable execution of legacy software on unsupported hosts, particularly for older operating systems.

Legacy software continues to incorporate 3DNow! instructions, preserving their presence in binaries from the early 2000s, such as certain Windows XP-era games optimized for 3DNow!-capable processors. These applications can run on modern systems only through emulation, as native execution triggers invalid-opcode faults on post-2011 hardware. Compiler support has waned, with 3DNow! intrinsics and options being removed from compiler toolchains in 2024 due to lack of hardware backing, though assembly-level usage persists for niche legacy development.

Certain 3DNow!-derived instructions, notably PREFETCH and PREFETCHW for cache line prefetching, retain utility as non-deprecated hints for reducing memory access latency and can be compiled by modern GCC with dedicated support. These are distinct from the full SIMD set and appear in optimized code for data-intensive workloads, independent of the broader 3DNow! deprecation.

No revival of 3DNow! is anticipated, as it has been fully supplanted by newer SIMD extensions such as SSE and AVX, which offer greater parallelism and are integral to current baselines such as x86-64-v3 (which requires AVX2). The instruction set's removal from kernel configurations, as in Linux 5.17, and from toolchains underscores its terminal status in production environments.
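The surviving prefetch hints are usually reached from C through a compiler builtin rather than by naming the instruction. The sketch below uses `__builtin_prefetch`, which GCC and Clang provide; whether a write hint is actually lowered to PREFETCHW depends on the target and options (for example GCC's `-mprfchw`), so treat that mapping as an assumption. The function name and the prefetch distance are illustrative.

```c
#include <stddef.h>

/* Increment every element, hinting that upcoming cache lines will be
 * written. The hint never changes results, only (possibly) timing. */
static void increment_all(int *data, size_t n) {
    const size_t ahead = 16; /* illustrative distance in elements */
    for (size_t i = 0; i < n; ++i) {
        if (i + ahead < n)
            __builtin_prefetch(&data[i + ahead], 1 /* write */, 3 /* keep in cache */);
        data[i] += 1;
    }
}
```

Because a prefetch is only a hint, removing the `__builtin_prefetch` call leaves the function's output unchanged, which makes it easy to benchmark both variants.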
