Recent from talks
Nothing was collected or created yet.
3DNow!
View on Wikipedia| Design firm | Advanced Micro Devices |
|---|---|
| Introduced | 1998 |
| Type | instruction set architecture |
3DNow! is a deprecated extension to the x86 instruction set developed by Advanced Micro Devices (AMD). It adds single instruction multiple data (SIMD) instructions to the base x86 instruction set, enabling it to perform vector processing of floating-point vector operations using vector registers. This improvement enhances the performance of many graphics-intensive applications. The first microprocessor to implement 3DNow! was the AMD K6-2, introduced in 1998. In appropriate applications, this enhancement raised the speed by about 2–4 times.[1]
However, the instruction set never gained much popularity, and AMD announced in August 2010 that support for 3DNow! would be dropped in future AMD processors, except for two instructions, PREFETCH and PREFETCHW.[2] These two instructions are also available in Bay-Trail Intel processors.[3]
History
[edit]3DNow! was developed at a time when 3D graphics were becoming mainstream in PC multimedia and games. Realtime display of 3D graphics depended heavily on the host CPU's floating-point unit (FPU) to perform floating-point calculations, a task in which AMD's K6 processor was easily outperformed by its competitor, the Intel Pentium II.
As an enhancement to the MMX instruction set, the 3DNow! instruction-set augmented the MMX SIMD registers to support common arithmetic operations (add/subtract/multiply) on single-precision (32-bit) floating-point data. Software written to use AMD's 3DNow! instead of the slower x87 FPU could execute up to four times faster, depending on the instruction mix.
Versions
[edit]3DNow!
[edit]The first implementation of 3DNow! technology contains 21 new instructions that support SIMD floating-point operations. The 3DNow! data format is packed, single-precision, floating-point. The 3DNow! instruction set also includes operations for SIMD integer operations, data prefetch, and faster MMX-to-floating-point switching. Later, Intel would add similar (but incompatible) instructions to the Pentium III, known as SSE (Streaming SIMD Extensions).
3DNow! floating-point instructions are the following:
PI2FD– Packed 32-bit integer to floating-point conversionPF2ID– Packed floating-point to 32-bit integer conversionPFCMPGE– Packed floating-point comparison, greater or equalPFCMPGT– Packed floating-point comparison, greaterPFCMPEQ– Packed floating-point comparison, equalPFACC– Packed floating-point accumulatePFADD– Packed floating-point additionPFSUB– Packed floating-point subtractionPFSUBR– Packed floating-point reverse subtractionPFMIN– Packed floating-point minimumPFMAX– Packed floating-point maximumPFMUL– Packed floating-point multiplicationPFRCP– Packed floating-point reciprocal approximationPFRSQRT– Packed floating-point reciprocal square root approximationPFRCPIT1– Packed floating-point reciprocal, first iteration stepPFRSQIT1– Packed floating-point reciprocal square root, first iteration stepPFRCPIT2– Packed floating-point reciprocal/reciprocal square root, second iteration step
3DNow! integer instructions are the following:
PAVGUSB– Packed 8-bit unsigned integer averagingPMULHRW– Packed 16-bit integer multiply with rounding
3DNow! performance-enhancement instructions are the following:
FEMMS– Faster entry/exit of the MMX or floating-point statePREFETCH/PREFETCHW– Prefetch at least a 32-byte line into L1 data cache (this is the only non-deprecated instruction)
3DNow! extensions
[edit]There is little or no evidence that the second version of 3DNow! was ever officially given its own trade name. This has led to some confusion in documentation that refers to this new instruction set. The most common terms are Extended 3DNow!, Enhanced 3DNow! and 3DNow!+. The phrase "Enhanced 3DNow!" can be found in a few locations on the AMD website but the capitalization of "Enhanced" appears to be either purely grammatical or used for emphasis on processors that may or may not have these extensions (the most notable of which references a benchmark page for the K6-III-P that does not have these extensions).[4][5]
This extension to the 3DNow! instruction set was introduced with the first-generation Athlon processors. The Athlon added five new 3DNow! instructions and 19 new MMX instructions. Later, the K6-2+ and K6-III+ (both targeted at the mobile market) included the five new 3DNow! instructions, leaving out the 19 new MMX instructions. The new 3DNow! instructions were added to boost DSP. The new MMX instructions were added to boost streaming media.
The 19 new MMX instructions are a subset of Intel's SSE instruction set. In AMD technical manuals, AMD segregates these instructions apart from the 3DNow! extensions.[4] In AMD customer product literature, however, this segregation is less clear where the benefits of all 24 new instructions are credited to enhanced 3DNow! technology.[6] This has led programmers to come up with their own name for the 19 new MMX instructions. The most common appears to be Integer SSE (ISSE).[7] SSEMMX and MMX2 are also found in video filter documentation from the public domain sector. ISSE could also refer to Internet SSE, an early name for SSE.
3DNow! extension DSP instructions are the following:
PF2IW– Packed floating-point to integer word conversion with sign extendPI2FW– Packed integer word to floating-point conversionPFNACC– Packed floating-point negative accumulatePFPNACC– Packed floating-point mixed positive-negative accumulatePSWAPD– Packed swap doubleword
MMX extension instructions (Integer SSE) are the following:
MASKMOVQ– Streaming (cache bypass) store using byte maskMOVNTQ– Streaming (cache bypass) storePAVGB– Packed average of unsigned bytePAVGW– Packed average of unsigned wordPMAXSW– Packed maximum signed wordPMAXUB– Packed maximum unsigned bytePMINSW– Packed minimum signed wordPMINUB– Packed minimum unsigned bytePMULHUW– Packed multiply high unsigned wordPSADBW– Packed sum of absolute byte differencesPSHUFW– Packed shuffle wordPEXTRW– Extract word into integer registerPINSRW– Insert word from integer registerPMOVMSKB– Move byte mask to integer registerPREFETCHNTA– Prefetch using the NTA referencePREFETCHT0– Prefetch using the T0 referencePREFETCHT1– Prefetch using the T1 referencePREFETCHT2– Prefetch using the T2 referenceSFENCE– Store fence
3DNow! Professional
[edit]3DNow! Professional is a trade name used to indicate processors that combine 3DNow! technology with a complete SSE instructions set (such as SSE, SSE2 or SSE3).[8] The Athlon XP was the first processor to carry the 3DNow! Professional trade name, and was the first product in the Athlon family to support the complete SSE instruction set (for the total of: 21 original 3DNow! instructions; five 3DNow! extension DSP instructions; 19 MMX extension instructions; and 52 additional SSE instructions for complete SSE compatibility).[9]
3DNow! and the Geode GX/LX
[edit]The Geode GX and Geode LX added two new 3DNow! instructions which is absent in all other processors.
3DNow! "professional" instructions unique to the Geode GX/LX are the following:
PFRSQRTV– Reciprocal square root approximation for a pair of 32-bit floatsPFRCPV– Reciprocal approximation for a pair of 32-bit floats
Advantages and disadvantages
[edit]One advantage of 3DNow! is that it is possible to add or multiply the two numbers that are stored in the same register. With SSE, each number can only be combined with a number in the same position in another register. This capability, known as horizontal in Intel terminology, was the major addition to the SSE3 instruction set.
A disadvantage with 3DNow! is that 3DNow! instructions and MMX instructions share the same register-file, whereas SSE adds 8 new independent registers (XMM0–XMM7).
Because MMX/3DNow! registers are shared by the standard x87 FPU, 3DNow! instructions and x87 instructions cannot be executed simultaneously. However, because it is aliased to the x87 FPU, the 3DNow! and MMX register states can be saved and restored by the traditional x87 F(N)SAVE and F(N)RSTOR instructions. This arrangement allowed operating systems to support 3DNow! with no explicit modifications, whereas SSE registers required explicit operating system support to properly save and restore the new XMM registers (via the added FXSAVE and FXRSTOR instructions.)
The FX* instructions from SSE provide a functional superset of the older x87 save and restore instructions. They can save not only SSE register states but also the x87 register states (hence are applicable also for MMX and 3DNow! operations where supported).
On AMD Athlon XP and K8-based cores (i.e. Athlon 64), assembly programmers have noted that it is possible to combine 3DNow! and SSE instructions to reduce register pressure, but in practice it is difficult to improve performance due to the instructions executing on shared functional units.[10]
Processors supporting 3DNow!
[edit]References
[edit]- ^ "Effectively Utilizing 3DNow in Linux". Linux Journal. December 1, 1999. Archived from the original on 2011-06-07. Retrieved 2010-10-03.
- ^ "3DNow Instructions are Being Deprecated | AMD Developer Central". Blogs.amd.com. 2010-08-18. Archived from the original on 2010-10-24. Retrieved 2010-10-03.
- ^ "IntelE38xx - MinnowBoard Wiki". Archived from the original on 11 February 2017. Retrieved 13 February 2017.
- ^ a b "AMD Extensions to the 3DNow and MMX Instruction Sets Manual" (PDF). Advanced Micro Devices, Inc. March 2000. Archived (PDF) from the original on 2008-05-17. Retrieved 2008-06-07.
- ^ "Mobile AMD-K6-III-P Processor-Based Notebook: Ziff-Davis CPUmark 99". Archived from the original on 2008-07-24. Retrieved 2008-06-07.
Incorrect title on page: Mobile AMD-K6-III+ and Mobile AMD-K6-2+ Processors with Enchanced [sic] 3DNow! Technology
- ^ "AMD Athlon Processor Product Brief". Advanced Micro Devices, Inc. Archived from the original on 2008-02-25. Retrieved 2008-06-08.
- ^ "ISSE". AviSynth. Archived from the original on 2017-07-02. Retrieved 2017-07-19.
- ^ "Explaining the new 3DNow! Professional Technology". Advanced Micro Devices, Inc. Archived from the original on 2009-01-21. Retrieved 2008-06-08.
- ^ "AMD Athlon XP Architectural Features". Advanced Micro Devices, Inc. Archived from the original on 2008-02-25. Retrieved 2008-06-08.
- ^ Larry Lewis (9 July 2003). "3DNow+ vs SSE on Athlon XP". Newsgroup: comp.sys.ibm.pc.hardware.chips. Usenet: ad82cd69.0307090931.25391323@posting.google.com. Archived from the original on 2012-10-03. Retrieved 4 January 2023 – via Google Groups.
Further reading
[edit]- Case, Brian (1 June 1998). "3DNow Boosts Non-Intel 3D Performance". Microprocessor Report.
- Oberman, S.; Favor, G.; Weber, F. (March 1999). "AMD 3DNow technology: architecture and implementations". IEEE Micro.
External links
[edit]- 3DNow Technology Partners, archived from the original (removed from AMD's website in early 2001)
- AMD 3DNow Instruction Porting Guide (PDF), archived from the original (removed from AMD's website in 2014)
- 3DNow Technology Manual
- AMD Extensions to the 3DNow and MMX Instruction Sets Manual
- AMD Geode LX Processors Data Book
- AMD 3DNow! SDK March 1999, archived
3DNow!
View on GrokipediaOverview
Definition and Purpose
3DNow! is a proprietary single instruction, multiple data (SIMD) instruction set extension to the x86 architecture developed by Advanced Micro Devices (AMD). It enables vector processing of floating-point operations by packing two 32-bit single-precision floating-point values into each 64-bit MMX register, allowing parallel computations on multiple data elements within a single instruction.[1] Introduced to compete with Intel's MMX technology, which supported only integer operations, and the subsequent SSE extension for floating-point SIMD, 3DNow! extended the existing MMX register set to handle floating-point tasks without requiring additional hardware.[7] The primary purpose of 3DNow! is to accelerate performance in floating-point-intensive applications, such as 3D rendering, video decoding, and scientific simulations, by performing parallel floating-point operations directly on the CPU without the need for dedicated vector processing units. This extension targets enhancements in multimedia processing, including faster frame rates in high-resolution 3D scenes, improved physical modeling for realistic environments, sharper imaging, smoother video playback, and higher-quality audio reproduction.[1] By leveraging SIMD parallelism, it enables developers to optimize code for efficient handling of vector-based computations common in graphics pipelines.[7] In the late 1990s, the x86 architecture's x87 floating-point unit (FPU) was limited to scalar operations, processing one floating-point value at a time, while MMX provided SIMD capabilities solely for integers, making both inadequate for the floating-point-heavy workloads in emerging 3D graphics applications like Direct3D and OpenGL.[1] These limitations hindered real-time performance on consumer PCs, where single floating-point execution units struggled with the parallel demands of 3D transformations and lighting calculations. 3DNow! addressed this by integrating SIMD floating-point support into the x86 core, enabling more efficient processing for gaming and multimedia without the overhead of context switching between integer and floating-point modes.[1] A representative example of its application is in consumer PC gaming, where 3DNow! targeted real-time 3D graphics to improve frame rates in titles like Quake III Arena, which incorporated optimizations for the extension to enhance vertex processing and rendering efficiency on AMD processors.[1]Key Technical Features
3DNow! leverages the existing eight 64-bit MMX registers (MM0 through MM7) to enable single instruction, multiple data (SIMD) parallelism for floating-point operations, with each register capable of storing two packed 32-bit single-precision floating-point values.[1][2] This architecture reuses the MMX register file without requiring additional hardware, allowing seamless integration into x86 processors for tasks such as 3D graphics acceleration.[1] The data format adheres to the IEEE 754 standard for single-precision floating-point numbers, packing two such values into a single 64-bit register while supporting denormalized numbers and gradual underflow to maintain numerical accuracy in computations.[2][8] The core of 3DNow! consists of 21 new SIMD instructions, 19 of which are floating-point instructions that perform parallel operations on the packed values within the MMX registers, including essential arithmetic and comparison functions.[2] Representative instructions include PFADD for packed floating-point addition, PFMAX for packed maximum, and PFMUL for packed multiplication, which operate element-wise on the two single-precision values per register.[1][2] Additionally, the set provides conversion instructions such as PF2ID (packed floating-point to signed 32-bit integer) and PI2FD (packed 32-bit integer to floating-point), facilitating data interchange between integer and floating-point domains without unloading to memory.[1][8] Distinctive among SIMD extensions at the time, 3DNow! incorporates horizontal operations like PFACC (packed floating-point accumulate), which adds the two elements within the same register to produce partial sums efficiently for algorithms such as dot products.[1][2] It also includes prefetch instructions—PREFETCH for read-only data prefetching into the cache and PREFETCHW for write-allocated prefetching—to optimize memory access patterns by reducing cache misses in data-intensive workloads.[1][2] To ensure backward compatibility with the x87 floating-point unit (FPU), which shares the same physical registers as MMX, 3DNow! instructions are encoded with the two-byte prefix 0x0F 0x0F, distinguishing them from x87 opcodes and avoiding conflicts during mixed-mode execution.[8][1] The FEMMS instruction provides a low-overhead method to reset the FPU tag word after 3DNow! usage, contrasting with the more comprehensive EMMS required for pure MMX operations.[1][2]History
Development and Introduction
In the mid-1990s, Advanced Micro Devices (AMD) initiated research and development efforts within its California Microprocessor Division to enhance the floating-point capabilities of its K6 processor family, which suffered from performance limitations in the floating-point unit (FPU) compared to Intel's Pentium II.[2][1] These weaknesses were particularly evident in emerging 3D graphics applications, where the K6's single FPU struggled with intensive computations, prompting AMD to design a SIMD extension as a proactive counter to Intel's anticipated Katmai processor featuring early Streaming SIMD Extensions (SSE).[9] The project, internally targeted for completion in the second half of 1997, aimed to integrate these enhancements directly into the processor core without relying on costly external co-processors. AMD also collaborated with Cyrix and IDT to adopt 3DNow! as a unified standard, enabling support in their WinChip 2 and MII processors shortly after.[2][10] The development was led by a multimedia-focused engineering team at AMD, including key contributors Stuart Oberman, Fred Weber, Norbert Juffa, and Greg Favor, who specialized in creating efficient, low-cost SIMD solutions tailored for both desktop personal computers and emerging embedded applications.[2] This team collaborated closely with independent software vendors (ISVs) to define the instruction set, ensuring compatibility with existing x86 architectures while prioritizing affordability and broad market applicability for consumer-grade systems.[2][1] Their work built upon the integer-only limitations of Intel's MMX instructions, extending SIMD paradigms to floating-point operations to better support multimedia workloads.[2] The primary technical motivation for 3DNow! stemmed from the need to accelerate floating-point SIMD operations essential for 3D graphics pipelines, such as transformations and lighting calculations, which were bottlenecks in the K6's design due to its reliance on a single FPU for all such tasks.[1] By reusing MMX registers for packed 32-bit floating-point data, the extension enabled parallel processing of multiple data elements, addressing the growing demands of 3D rendering without the expense of dedicated hardware accelerators.[2] This approach was intended to deliver substantial performance gains in graphics-intensive scenarios, targeting up to four floating-point operations per cycle to overcome the K6's inherent scalar limitations.[2] Pre-announcement milestones included rigorous integration testing of 3DNow! instructions with K6-2 processor prototypes throughout 1997, focusing on validation within the 0.25μm CMOS fabrication process and dual-pipeline execution units to ensure seamless compatibility.[2] These efforts culminated in silicon validation by early 1998, with the technology designed to provide 2-4x speedups in representative 3D graphics workloads, such as geometry setup and physics simulations, thereby positioning the K6-2 as a competitive alternative in the value-oriented PC segment.[1]Announcement and Initial Adoption
AMD announced 3DNow! technology alongside the launch of its K6-2 processor on May 28, 1998, introducing single-instruction multiple-data (SIMD) instructions aimed at accelerating 3D graphics and multimedia processing on x86 processors. The K6-2, fabricated on a 0.25-micron process with 9.3 million transistors, was positioned as a cost-effective alternative to Intel's Pentium II, with initial pricing starting at $185 for the 266 MHz model to target mainstream personal computers. Early adoption was facilitated by compatibility with existing Super Socket 7 motherboards, including models from ASUS such as the P5A and from MSI such as the MS-6163, which supported the K6-2's 100 MHz front-side bus and enabled upgrades in value-oriented systems without requiring new hardware platforms. Software support quickly followed, with Microsoft incorporating 3DNow! optimizations into DirectX 6.1, released in early 1999, to leverage the instructions for improved Direct3D rendering performance.[11] Additionally, 3Dfx provided Glide API wrappers optimized for 3DNow! in drivers for Voodoo graphics cards, enhancing compatibility for 3D-accelerated games.[12] The launch significantly boosted AMD's market position, with K6-2 shipments exceeding 8.5 million units in under seven months and driving substantial revenue growth through increased penetration in the sub-$1,000 PC segment, where AMD captured around 37% share by late 1998.[13][14] In the gaming sector, endorsements from developers like id Software accelerated uptake; id collaborated with 3Dfx on 3DNow!-optimized drivers for Quake II, yielding significant performance gains—up to nearly double in some benchmarks—compared to non-optimized versions on compatible hardware.[15] Despite these advances, initial challenges arose from limited developer tools and SDKs, resulting in inconsistent optimization across early titles; for instance, Unreal Tournament (1999) included 3DNow! support but exhibited variable performance gains depending on implementation, highlighting the need for more mature compiler and library ecosystems.[16] By 1999, however, these hurdles began to ease as broader industry adoption improved software maturity.[17]Versions and Extensions
Original 3DNow!
The original 3DNow! instruction set debuted with the AMD K6-2 microprocessor in 1998, adding 21 new single instruction, multiple data (SIMD) floating-point instructions to the existing MMX foundation.[1][2] These instructions targeted enhancements in 3D graphics, audio, and video processing by enabling packed single-precision floating-point operations on two 32-bit values per 64-bit register.[1][10] Key instructions in this baseline set included conversions such as PF2ID (packed floating-point to 32-bit integer with truncation) and PI2FD (packed 32-bit integer to floating-point), alongside basic arithmetic operations like PFACC (packed floating-point accumulate).[1] The full set comprised: FEMMS (fast enter/leave MMX state), PAVGUSB (packed average unsigned bytes), PF2ID, PFACC, PFADD (packed floating-point add), PFCMPEQ (packed floating-point compare equal), PFCMPGE (packed floating-point compare greater or equal), PFCMPGT (packed floating-point compare greater than), PFMAX (packed floating-point maximum), PFMIN (packed floating-point minimum), PFMUL (packed floating-point multiply), PFRCP (packed floating-point reciprocal), PFRCPIT1 (packed floating-point reciprocal iterative 1), PFRCPIT2 (packed floating-point reciprocal iterative 2), PFRSQIT1 (packed floating-point reciprocal square root iterative 1), PFRSQRT (packed floating-point reciprocal square root), PFSUB (packed floating-point subtract), PFSUBR (packed floating-point subtract reverse), PI2FD, PMULHRW (packed multiply high round and scale word), and PREFETCH/PREFETCHW (prefetch and prefetch with intent to write).[1] These operations utilized the same 64-bit MMX registers (MM0 through MM7) without requiring changes to the x87 floating-point stack, ensuring seamless integration.[1][2] Software detection of original 3DNow! support relied on the CPUID instruction with extended function 0x8000_0001, where bit 31 (the most significant bit) of the EDX register is set if the feature is present.[1] This mechanism allowed applications to query hardware capabilities dynamically. Compatibility mandated underlying MMX support, as 3DNow! built directly upon it, but no operating system modifications were necessary, and there was no performance penalty for switching between MMX and 3DNow! states due to the shared register file.[1][2] Applications needed to explicitly check CPUID to enable 3DNow!-optimized code paths.[1]3DNow! Extensions
The 3DNow! Extensions were introduced by AMD alongside the first-generation Athlon processors in June 1999, enhancing the original 3DNow! instruction set with additional capabilities for multimedia processing.[4] These extensions added a total of 24 new instructions to the existing 3DNow! and MMX sets, comprising 5 specialized digital signal processing (DSP) instructions that operate on 3DNow!'s 64-bit MMX registers and 19 instructions compatible with Intel's emerging MMX extensions (later incorporated into SSE).[4] This update aimed to accelerate integer-based multimedia workloads on Athlon processors, which featured a 128-bit wide SIMD pipeline for improved throughput.[4] The 5 DSP instructions focus on efficient packed floating-point to integer conversions and accumulate operations, enabling faster processing for tasks such as audio filtering and video effects. Representative examples include PFNACC (packed single-precision floating-point negative accumulate), which performs subtraction and accumulation on pairs of 32-bit floats to support DSP algorithms like finite impulse response (FIR) filters, and PI2FW (packed integer to floating-point word), which converts 32-bit integers to 32-bit floats for seamless data transitions in multimedia pipelines.[4] The 19 MMX-compatible instructions, drawn from AMD's Enhanced Multimedia Extensions (EMMX), emphasize integer operations for video and image processing; for instance, PSADBW (packed sum of absolute differences byte-wise) computes the sum of absolute differences between two 8-byte vectors, a key primitive for motion estimation in MPEG video decoding, while PMULHUW (packed multiply high unsigned word) multiplies pairs of 16-bit unsigned integers and stores the high 16 bits of each result, aiding in fixed-point arithmetic for graphics scaling.[4] These extensions improved performance in integer-heavy multimedia applications, such as MPEG-2 decoding and audio codec processing, by providing hardware acceleration that bridged the gap between the original 3DNow!'s floating-point focus and Intel's forthcoming SSE integer instructions, with benchmarks showing up to 2x speedup in video encoding tasks on Athlon compared to K6-2 processors.[4][18] Support for 3DNow! Extensions is detected via the CPUID instruction: executing function 0x80000001 returns feature flags in EDX, where bit 30 indicates presence of the 3DNow! DSP extensions, and bit 22 signals the MMX extensions (though standard MMX is bit 23 in the basic CPUID).[4][18]| Category | Number of Instructions | Examples | Primary Use Cases |
|---|---|---|---|
| 3DNow! DSP | 5 | PFNACC, PI2FW, PF2IW, PFPNACC, PSWAPD | Audio filtering, floating-point DSP conversions |
| MMX Extensions | 19 | PSADBW, PMULHUW, PAVGB, MASKMOVQ, PREFETCHNTA | Video encoding (e.g., MPEG motion estimation), non-temporal stores, prefetching for cache optimization |
3DNow! Professional
3DNow! Professional is an enhanced version of AMD's SIMD instruction set that integrates the original 3DNow! technology with Intel's Streaming SIMD Extensions (SSE), providing full support for SSE instructions without including SSE2.[19] It was introduced on October 9, 2001, alongside the Athlon XP processors based on the Palomino core, which merged all prior 3DNow! features into this unified extension for improved multimedia processing. This upgrade aimed to bridge compatibility gaps between AMD and Intel architectures, allowing developers to leverage a broader set of vector operations in professional software. A core feature of 3DNow! Professional is the adoption of 128-bit XMM registers from SSE, enabling packed single-precision floating-point operations across four elements per register, while retaining 3DNow!'s unique horizontal operations such as horizontal addition and multiplication for efficient vector reductions not natively available in SSE.[19] This combination added approximately 52 new instructions overall, enhancing performance in 3D graphics, video encoding, and scientific simulations by supporting advanced algorithms that mix integer and floating-point SIMD tasks seamlessly.[20] For instance, applications like Adobe Premiere benefited from unified code paths that could utilize these extensions for faster photo, video, and audio editing without separate AMD- or Intel-specific optimizations.[20] The primary benefit of 3DNow! Professional lies in its backward compatibility with SSE-based software on AMD hardware, which reduced developer fragmentation by allowing a single codebase to run efficiently across platforms without extensive branching for processor-specific instructions.[19] This compatibility extended to operating systems supporting SSE, enabling up to 25% performance gains in real-world multimedia workloads on Palomino-based systems.[20] Software detection of 3DNow! Professional typically involves querying CPUID function 0x00000001, where bit 25 in the EDX register indicates SSE support, combined with extended function 0x80000001, where bit 31 in EDX confirms 3DNow! presence; processors reporting both flags on AuthenticAMD vendor strings qualify as supporting the full Professional feature set.[19]Support in Geode Processors
The AMD Geode processors, designed for embedded and low-power applications, integrated 3DNow! technology as a core component of their x86-compatible architecture to enhance multimedia and graphics performance in resource-constrained environments. The Geode GX series, introduced in 2003, featured the original 3DNow! instructions along with extensions, tailored for thin clients and set-top boxes, while emphasizing power efficiency through a fully pipelined floating-point unit compliant with IEEE 754 standards.[21] Similarly, the Geode LX series, launched in 2007, retained full support for 3DNow! Professional, which merged 3DNow! with SSE instructions, enabling seamless execution of advanced SIMD operations for video decoding and 3D rendering in low-wattage systems.[22][23] These processors operated at reduced clock speeds to prioritize energy efficiency, with the Geode GX reaching up to 500 MHz and the Geode LX topping out at 600 MHz for the LX900 variant, though common models like the LX800 ran at 500 MHz and the LX700 at 433 MHz, all while maintaining compatibility with 3DNow! Professional for optimized floating-point computations.[24] This adaptation proved particularly suitable for battery-powered devices, such as the OLPC XO-1 laptop, which utilized the Geode LX-700 to deliver educational computing capabilities with hardware-accelerated video playback and basic 3D graphics in power-sensitive scenarios.[24] The Geode line's 3DNow! implementation also included two unique instructions absent in other AMD processors, PFRCPV (packed floating-point reciprocal approximation) and PFRSQRTV (packed floating-point reciprocal square root approximation), further enhancing efficiency for embedded multimedia tasks like MPEG video decoding in set-top boxes.[25] In practical applications, 3DNow! support in Geode processors facilitated video playback and processing in embedded systems, including industrial control panels and gaming terminals such as casino machines, where low power draw—down to 0.9 W for the LX800—combined with integrated graphics accelerators ensured reliable performance without active cooling.[26][27] These optimizations made the Geode series a staple for thin clients and control systems throughout the late 2000s, supporting x86 software ecosystems while minimizing thermal output.[22] Support for 3DNow! in the Geode lineup effectively ended around 2010, coinciding with AMD's announcement of its deprecation in future architectures and the lack of a direct successor, shifting focus to newer embedded solutions like the G-Series APUs.[28][25]Implementation
Supported Processors
3DNow! was first implemented in hardware with the AMD K6-2 processor, introduced in May 1998 as part of AMD's Super Socket 7 platform.[29] This marked the debut of the SIMD extension set, enabling enhanced 3D graphics and multimedia performance on x86 systems. Subsequent AMD processors in the K6 family, such as the K6-III released later in 1998, also included full 3DNow! support.[1] AMD extended 3DNow! compatibility across its mainstream x86 lineup through the early 2000s. The Athlon processors, starting with the original Athlon in 1999 and continuing to the Athlon 64 models produced from 2003 to 2009, incorporated 3DNow! alongside MMX and later SSE instructions.[1] Budget-oriented lines like Duron (introduced 2000) and Sempron (2002) similarly featured the full instruction set, as did mobile variants. Support persisted into AMD's accelerated processing units (APUs), with the Llano-based A8-3870K desktop APU in 2011 being the final model to include 3DNow!.[7] However, beginning with the Bulldozer architecture in 2011, Bobcat in 2011, and Zen architectures from 2017 onward, AMD processors excluded 3DNow! hardware implementation.[30]| Processor Family | Introduction Year | Key Models with 3DNow! Support |
|---|---|---|
| AMD K6-2/K6-III | 1998 | K6-2 (up to 550 MHz), K6-III (up to 550 MHz) |
| AMD Athlon/Athlon XP | 1999–2005 | Athlon (up to 2.2 GHz), Athlon XP (up to 2.33 GHz) |
| AMD Athlon 64 | 2003–2009 | Athlon 64 (up to 3.2 GHz), Athlon 64 X2 (dual-core up to 3.2 GHz) |
| AMD Duron/Sempron | 2000–2006 | Duron (up to 1.8 GHz), Sempron (up to 2.3 GHz) |
| AMD APUs (Llano) | 2011 | A8-3870K (up to 3.0 GHz) |
