Floating-point unit
from Wikipedia
Collection of the x87 family of math coprocessors by Intel

A floating-point unit (FPU), numeric processing unit (NPU),[1] colloquially math coprocessor, is a part of a computer system specially designed to carry out operations on floating-point numbers.[2] Typical operations are addition, subtraction, multiplication, division, and square root. Modern designs generally include a fused multiply-add instruction, which was found to be very common in real-world code. Some FPUs can also perform various transcendental functions such as exponential or trigonometric calculations, but the accuracy can be low,[3][4] so some systems prefer to compute these functions in software.
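The advantage of a fused multiply-add, a single rounding instead of two, can be illustrated in software. The sketch below is illustrative only: it uses Python's exact rational arithmetic as a stand-in for a hardware FMA instruction, not any real FPU's implementation.

```python
from fractions import Fraction

def fma_reference(a, b, c):
    """Compute a*b + c exactly, then round once to double precision,
    mimicking the single rounding of a hardware fused multiply-add."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# a*b is exactly 1 - 2**-60, which rounds to 1.0 in double precision,
# so rounding the product separately destroys the answer entirely.
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
separate = (a * b) + c          # two roundings: gives 0.0
fused = fma_reference(a, b, c)  # one rounding: gives -2**-60
```

The separately rounded result is 0.0, while the fused result preserves the tiny true value, which is why FMA matters for algorithms such as dot products and polynomial evaluation.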

Floating-point operations were originally handled in software on early computers. Over time, manufacturers began to provide standardized floating-point libraries as part of their software collections. Some machines, particularly those dedicated to scientific processing, included specialized hardware to perform these tasks with much greater speed. The introduction of microcode in the 1960s allowed these instructions to be included in the system's instruction set architecture (ISA). Normally they would be decoded by the microcode into a series of operations similar to the library routines, but on machines with an FPU they were instead routed to that unit, which performed them much faster. This allowed floating-point instructions to become universal while the floating-point hardware remained optional; on the PDP-11, for instance, the floating-point processor could be added at any time using plug-in expansion cards.

The introduction of the microprocessor in the 1970s led to an evolution similar to that of the earlier mainframes and minicomputers. Early microcomputer systems performed floating point in software, typically in a vendor-specific library included in ROM. Dedicated single-chip FPUs began to appear late in the decade, but they remained rare in real-world systems until the mid-1980s, and using them required software to be rewritten to call them. As they became more common, the software libraries were modified to work like the microcode of earlier machines, performing the instructions on the main CPU if needed but offloading them to the FPU if one was present. By the late 1980s, semiconductor manufacturing had improved to the point where it became possible to include an FPU on the same chip as the main CPU, resulting in designs like the i486 and 68040. These designs were known as "integrated FPUs", and from the mid-1990s FPUs were a standard feature of most CPU designs, except those designed as low-cost embedded processors.

In modern designs, a single CPU will typically include several arithmetic logic units (ALUs) and several FPUs, reading many instructions at the same time and routing them to the various units for parallel execution. By the 2000s, even embedded processors generally included an FPU as well.

History

In 1954, the IBM 704 had floating-point arithmetic as a standard feature, one of its major improvements over its predecessor the IBM 701. This was carried forward to its successors the 709, 7090, and 7094.

In 1963, Digital announced the PDP-6, which had floating point as a standard feature.[5]

In 1963, the GE-235 featured an "Auxiliary Arithmetic Unit" for floating point and double-precision calculations.[6]

Historically, some systems implemented floating point with a coprocessor rather than as an integrated unit; the pattern persists in a different form today, as GPUs (coprocessors that are not always built into the CPU) have FPUs as a rule, although the first generations of GPUs did not. The coprocessor could be a single integrated circuit, an entire circuit board, or a cabinet. Where floating-point calculation hardware has not been provided, floating-point calculations are done in software, which takes more processor time but avoids the cost of the extra hardware. For a particular computer architecture, the floating-point unit instructions may be emulated by a library of software functions; this may permit the same object code to run on systems with or without floating-point hardware. Emulation can be implemented on any of several levels: in the CPU as microcode, as an operating system function, or in user-space code. When only integer functionality is available, the CORDIC methods are most commonly used for transcendental function evaluation.[citation needed]
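As a sketch of how CORDIC evaluates trigonometric functions with only shifts, adds, and a small table of constants, the following uses Python floats rather than the fixed-point integers a real integer-only implementation would use, purely for illustration:

```python
import math

# Table of rotation angles atan(2**-i) and the combined CORDIC gain K.
N = 32
ANGLES = [math.atan(2.0**-i) for i in range(N)]
K = 1.0
for i in range(N):
    K /= math.sqrt(1.0 + 2.0**(-2 * i))

def cordic_sin_cos(theta):
    """Rotation-mode CORDIC: returns (cos(theta), sin(theta)).
    Valid for |theta| <= ~1.74 rad without argument reduction."""
    x, y, z = 1.0, 0.0, theta
    for i in range(N):
        d = 1.0 if z >= 0.0 else -1.0        # rotate toward z = 0
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * ANGLES[i]
    return x * K, y * K                       # undo the accumulated gain
```

Each iteration multiplies by 2**-i, which in fixed-point hardware is just a right shift, so the loop needs no multiplier at all; that is precisely why CORDIC suited early gate-limited coprocessors.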

In most modern computer architectures, there is some division of floating-point operations from integer operations. This division varies significantly by architecture; some have dedicated floating-point registers, while some, like Intel x86, go as far as independent clocking schemes.[7]

CORDIC routines have been implemented in Intel x87 coprocessors (8087,[8][9][10][11][12] 80287,[12][13] 80387[12][13]) up to the 80486[8] microprocessor series, as well as in the Motorola 68881[8][9] and 68882 for some kinds of floating-point instructions, mainly as a way to reduce the gate counts (and complexity) of the FPU subsystem.

Floating-point operations are often pipelined. In earlier superscalar architectures without general out-of-order execution, floating-point operations were sometimes pipelined separately from integer operations.

AMD's modular Bulldozer microarchitecture uses a special FPU named FlexFPU, which uses simultaneous multithreading. Each physical integer core (two per module) is single-threaded, in contrast with Intel's Hyper-Threading, where two virtual simultaneous threads share the resources of a single physical core.[14][15]

Floating-point library

Some floating-point hardware only supports the simplest operations: addition, subtraction, and multiplication. But even the most complex floating-point hardware has a finite number of operations it can support – for example, no FPUs directly support arbitrary-precision arithmetic.

When a CPU is executing a program that calls for a floating-point operation that is not directly supported by the hardware, the CPU uses a series of simpler floating-point operations. In systems without any floating-point hardware, the CPU emulates it using a series of simpler fixed-point arithmetic operations that run on the integer arithmetic logic unit.

The software that implements the necessary sequences of operations to emulate floating-point arithmetic is often packaged in a floating-point library.
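For example, a library routine can synthesize a square root that the hardware lacks from the operations it does have. This hypothetical sketch (Python standing in for such a library routine) uses Newton iteration on the reciprocal square root, which needs only add, subtract, and multiply:

```python
def soft_sqrt(x, iterations=10):
    """Approximate sqrt(x) via Newton iteration toward 1/sqrt(x),
    using only add/subtract/multiply -- no division or hardware sqrt.
    The fixed initial guess y = 1.0 converges for 0 < x < 3; a real
    library would seed the guess from the exponent field instead."""
    y = 1.0
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)  # Newton step for f(y) = 1/y**2 - x
    return x * y                          # x * (1/sqrt(x)) = sqrt(x)
```

The iteration converges quadratically, so a handful of steps from a decent seed reaches full double precision; the same recurrence (with a bit-trick seed) is widely used in software math libraries and early 3D engines.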

Integrated FPUs

In some cases, FPUs may be specialized, and divided between simpler floating-point operations (mainly addition and multiplication) and more complicated operations, like division. In some cases, only the simple operations may be implemented in hardware or microcode, while the more complex operations are implemented as software.

In some current architectures, the FPU functionality is combined with SIMD units to perform SIMD computation; an example of this is the augmentation of the x87 instruction set with the SSE instruction set in the x86-64 architecture used in newer Intel and AMD processors.

Add-on FPUs

Several models of the PDP-11, such as the PDP-11/45,[16] PDP-11/34a,[17]: 184–185  PDP-11/44,[17]: 195, 211  and PDP-11/70,[17]: 277, 286–287  supported an add-on floating-point unit to support floating-point instructions. The PDP-11/60,[17]: 261  MicroPDP-11/23[18] and several VAX models[19][20] could execute floating-point instructions without an add-on FPU (the MicroPDP-11/23 required an add-on microcode option),[18] and offered add-on accelerators to further speed the execution of those instructions.

In the 1980s, it was common in IBM PC/compatible microcomputers for the FPU to be entirely separate from the CPU, and typically sold as an optional add-on. It would only be purchased if needed to speed up or enable math-intensive programs.

The IBM PC, XT, and most compatibles based on the 8088 or 8086 had a socket for the optional 8087 coprocessor. The AT and 80286-based systems were generally socketed for the 80287, and 80386/80386SX-based machines for the 80387 and 80387SX respectively, although early ones were socketed for the 80287, since the 80387 did not exist yet. Other companies, including Cyrix and Weitek, manufactured coprocessors for the Intel x86 series. Acorn Computers opted for the WE32206 to offer single, double and extended precision[21] to its ARM-powered Archimedes range, introducing a gate array to interface the ARM2 processor with the WE32206 to support the additional ARM floating-point instructions.[22] Acorn later offered the FPA10 coprocessor, developed by ARM, for various machines fitted with the ARM3 processor.[23]

Coprocessors were available for the Motorola 68000 family, the 68881 and 68882. These were common in Motorola 68020/68030-based workstations, like the Sun-3 series. They were also commonly added to higher-end models of Apple Macintosh and Commodore Amiga series, but unlike IBM PC-compatible systems, sockets for adding the coprocessor were not as common in lower-end systems.

There are also add-on FPU coprocessor units for microcontroller units (MCUs/μCs)/single-board computer (SBCs), which serve to provide floating-point arithmetic capability. These add-on FPUs are host-processor-independent, possess their own programming requirements (operations, instruction sets, etc.) and are often provided with their own integrated development environments (IDEs).

from Grokipedia
A floating-point unit (FPU) is a specialized hardware component within a computer's central processing unit (CPU) designed to perform arithmetic operations on floating-point numbers, which represent real numbers using a format that includes a sign, exponent, and mantissa to handle a wide range of values and precisions. These units execute instructions for addition, subtraction, multiplication, division, square root, and other operations compliant with standards such as IEEE 754, ensuring consistent representation and computation of binary and decimal floating-point formats across systems. FPUs enable efficient processing of fractional and very large or small numbers, which are essential for tasks beyond simple arithmetic.

Historically, FPUs originated as separate coprocessors to offload floating-point calculations from the main CPU, with early examples including the Intel 8087, introduced in 1980 for the 8086 processor, addressing the lack of built-in floating-point support in initial Intel architectures. By the mid-1980s, the IEEE 754 standard formalized floating-point arithmetic, promoting portability and accuracy in implementations and influencing designs like the Motorola 68881. Integration of FPUs into the CPU core began with processors such as the Intel 80486 in 1989, reducing latency and improving overall system performance by eliminating the need for external chips.

In contemporary computer architectures, FPUs are fully integrated and often enhanced with extensions for vector and SIMD (single instruction, multiple data) processing, allowing parallel operations on multiple data elements to accelerate workloads like matrix computations. For instance, modern x86 processors from Intel and AMD incorporate FPUs supporting single-precision (32-bit) and double-precision (64-bit) formats, with additional half-precision (16-bit) support for machine learning applications. These units contribute significantly to computational performance metrics such as floating-point operations per second (FLOPS), which measure a system's capacity for such calculations in high-performance computing environments.

FPUs play a critical role in fields requiring precise numerical simulation, including scientific research, engineering design, and graphics rendering, where integer units alone cannot adequately represent continuous values. Advances in FPU design continue to focus on energy efficiency, multi-precision support, and integration with accelerators like GPUs, addressing demands from emerging technologies such as artificial intelligence and data analytics.

Fundamentals

Definition and Purpose

A floating-point unit (FPU) is a dedicated hardware component within a computer processor, designed specifically to perform arithmetic operations on floating-point numbers, which are distinct from the integer arithmetic handled by the general-purpose central processing unit (CPU). Unlike integer units that process whole numbers with fixed precision, an FPU manages representations of real numbers using a significand (mantissa) and an exponent, enabling the handling of fractional values and a wide dynamic range. This specialization allows the FPU to execute operations such as addition, subtraction, multiplication, and division on floating-point data formats, often adhering to standards like IEEE 754 for consistency across systems.

The primary purpose of an FPU is to accelerate complex numerical computations required in domains such as scientific simulation, engineering analysis, and graphical rendering, where general-purpose CPUs would be inefficient due to the overhead of emulating floating-point operations in software. By providing dedicated circuitry, the FPU performs these operations at significantly higher speeds, often several times faster than software-based alternatives on early systems, reducing computational latency for applications involving non-integer mathematics. This efficiency is crucial for tasks like modeling physical phenomena or 3D graphics, where rapid iteration over large datasets is essential.

FPUs emerged to address the inherent limitations of the fixed-point arithmetic prevalent in early computers, which struggled to represent real numbers with varying magnitudes due to its rigid scaling and susceptibility to overflow or underflow in scenarios involving very large or small values. Fixed-point systems, common in the mid-20th century, allocated a fixed number of bits for the integer and fractional parts, leading to precision loss when scaling to accommodate diverse numerical ranges; early machines required manual adjustments for different problem scales. The introduction of floating-point hardware overcame these constraints by dynamically adjusting the position of the binary point via the exponent, facilitating more natural representations of scientific data.

Key benefits of FPUs include enhanced precision and range for non-integer computations, minimizing the overflow and underflow errors that plagued fixed-point approaches, while also delivering substantial speed improvements through parallelized hardware execution. These advantages enable reliable handling of approximations to real numbers in high-impact applications, ensuring computational accuracy without excessive resource demands.

Basic Operations and Representation

The IEEE 754 standard defines the predominant format for binary floating-point representation in modern computing, specifying interchange and arithmetic formats for binary floating-point numbers. This standard outlines three common precisions: single (32 bits), double (64 bits), and half (16 bits). In all formats, the value is encoded with a 1-bit sign field (s), an exponent field (e), and a mantissa (significand) field (f), where the normalized value is represented as (-1)^s × (1 + f / 2^p) × 2^(e - bias). Here, p is the precision of the mantissa (23 bits for single, 52 for double, 10 for half), and the bias is 127 for single precision, 1023 for double, and 15 for half. For single precision, the structure allocates 1 bit for the sign, 8 bits for the biased exponent, and 23 bits for the mantissa; double precision uses 1 sign bit, 11 exponent bits, and 52 mantissa bits; half precision employs 1 sign bit, 5 exponent bits, and 10 mantissa bits.

Floating-point units (FPUs) execute the core arithmetic operations (addition, subtraction, multiplication, and division) using dedicated hardware pipelines that handle these representations efficiently. For addition and subtraction, the operands' exponents are aligned by shifting the mantissa of the number with the smaller exponent to match the larger one, after which the mantissas are added or subtracted, followed by normalization (shifting to restore the leading 1) and rounding to fit the target precision. Multiplication involves multiplying the mantissas (including the implicit leading 1), adding the exponents (adjusted for the bias), normalizing the result, and applying rounding. Division follows a similar process: the mantissas are divided, the exponents are subtracted (with bias adjustment), and the result is normalized and rounded. The standard mandates support for five rounding modes, including round-to-nearest (ties to even, the default), round toward positive or negative infinity, and round toward zero, to minimize representation errors during these operations.

FPUs implement these operations via specialized arithmetic logic units (ALUs) and multi-stage pipelines, often with separate units for addition/subtraction and multiplication/division to enable concurrent execution and reduce latency. Special values in IEEE 754 handle edge cases and errors gracefully, enhancing reliability in computations. Infinity (±∞) is represented by an all-1s exponent field with a zero mantissa, arising from overflow or division by zero, and propagates through operations (e.g., ∞ + finite = ∞). Not a Number (NaN) uses an all-1s exponent with a non-zero mantissa, signaling invalid operations like 0/0 or √(-1), and propagates (NaN + anything = NaN) to isolate errors without crashing the system. Denormal (subnormal) numbers occur with a zero exponent and non-zero mantissa, providing gradual underflow for values smaller than the smallest normal number, thus extending the representable range near zero at the cost of reduced precision. These mechanisms allow FPUs to detect and manage exceptional conditions during pipeline execution, ensuring robust error handling in hardware.
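The field layout above can be inspected directly. This illustrative Python snippet (a demonstration of the encoding, not part of any FPU) unpacks a double into its three fields and recomposes the value from them:

```python
import struct

def decode_double(x):
    """Unpack an IEEE 754 double into (sign, biased exponent, mantissa field)."""
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    return sign, exp, frac

def value(sign, exp, frac):
    """Recompose the normalized value (-1)^s * (1 + f/2^52) * 2^(e - 1023)."""
    assert 0 < exp < 0x7FF  # normal numbers only in this sketch
    return (-1)**sign * (1 + frac / 2**52) * 2.0**(exp - 1023)
```

For example, -6.5 decodes to sign 1, biased exponent 1025 (true exponent 2), and a fraction field encoding 1.625, since -6.5 = -1.625 × 2^2.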

Historical Development

Early Implementations

The earliest hardware implementations of floating-point units (FPUs) emerged in the mid-20th century, primarily driven by the need for precise numerical computation in scientific and engineering applications. The IBM 704, introduced in 1954, was the first mass-produced computer with built-in floating-point instructions, marking a significant advancement over prior systems that relied on software emulation for such operations. This machine utilized 36-bit words to represent floating-point numbers, consisting of a sign bit, an 8-bit exponent, and a 27-bit mantissa in a sign-magnitude format, enabling hardware acceleration of the additions, subtractions, multiplications, and divisions essential for simulations in physics and aerodynamics. The 704's design, employing vacuum-tube technology, achieved up to 12,000 floating-point additions per second, facilitating early computational tasks like nuclear research modeling at institutions such as Los Alamos National Laboratory.

By the 1960s, supercomputing demands pushed FPU designs toward greater parallelism and separation from core integer processing. The CDC 6600, unveiled in 1964 and designed by Seymour Cray, introduced a dedicated floating-point subsystem as part of its innovative architecture, achieving peak performance of three million floating-point operations per second (3 MFLOPS). This system featured ten independent functional units, including separate ones for floating-point addition/subtraction (executing in 400 nanoseconds), multiplication (1,000 nanoseconds per unit, with two units), and division (2,900 nanoseconds), all operating on 60-bit words with a 48-bit one's-complement mantissa and an 11-bit biased exponent to support high-precision scientific calculation. The transistor-based construction of the 6600 addressed some reliability issues of vacuum tubes while enabling pipelined execution, though it required distinct instruction formats for floating-point operations to manage resource conflicts via a central scoreboard mechanism.

The 1970s saw efforts to integrate floating-point capabilities more seamlessly into processor architectures, exemplified by the Burroughs B5700 in 1973. This system adopted a stack-based architecture in which floating-point arithmetic was inherently integrated without dedicated coprocessors, treating integers as floating-point numbers with zero exponents to unify data handling. Single-precision numbers used 48-bit words (1-bit sign, 8-bit exponent, 39-bit mantissa), with hardware tagging for type identification, while double precision spanned two words, with hardware operators like the Single Add unit automatically managing precision conversions and performing arithmetic directly on the operand stack. Optimized for high-level languages like ALGOL, the B5700's approach reduced overhead in simulations by embedding floating-point support within its descriptor-based architecture, though it maintained separate syllabled instructions for arithmetic to align with the stack paradigm.

A pivotal advancement in early FPU evolution came with the Cray-1 supercomputer in 1976, which incorporated vectorized floating-point hardware to accelerate large-scale numerical workloads. This machine featured three dedicated floating-point functional units (add, 6 clock cycles; multiply, 7 clock cycles; reciprocal approximation, 14 clock cycles) shared between scalar and vector modes, operating on 64-bit words with a 49-bit fraction and 15-bit biased exponent in signed-magnitude format. Vector processing allowed chaining of operations across eight 64-element registers, enabling up to 160 MFLOPS for applications in computational fluid dynamics and seismic analysis, with a 12.5-nanosecond clock period enhancing throughput for physics-based simulations. The Cray-1's integrated-circuit technology built on the transistor era, prioritizing pipelined vector add-multiply chains for high-speed calculation while using distinct opcodes to differentiate vector from scalar floating-point instructions.

Early FPU designs faced substantial challenges during the transition from vacuum-tube to transistor technology, particularly in balancing computational precision with hardware reliability for scientific tasks such as large-scale simulation. Vacuum-tube systems like the IBM 704 suffered from frequent failures and heat generation, necessitating bulky cooling and limiting scalability, while transistor adoption in machines like the CDC 6600 demanded novel circuit designs to handle floating-point normalization and rounding without excessive latency. These systems prioritized floating point for domain-specific needs, often at the expense of general-purpose integer compatibility, requiring programmers to manage separate instruction streams that complicated programming for mixed workloads. Despite their innovations, early FPUs exhibited key limitations, including exorbitant costs (such as the Cray-1's approximately $8.8 million price tag) that restricted adoption to government-funded research facilities, alongside high power consumption from dense transistor arrays that demanded specialized infrastructure. Incompatibility with integer units further compounded the issues, as segregated instruction sets for floating-point operations led to inefficient context switching and non-uniform addressing, hindering seamless integration in broader computing environments until later standardization efforts.

Integration and Standardization

The integration of floating-point units (FPUs) into general-purpose central processing units (CPUs) accelerated in the 1980s, marking a shift from standalone coprocessors to on-chip components that enhanced computational efficiency for scientific and engineering applications. A key milestone was the introduction of the Intel 8087 in 1980, the first x86 FPU, designed to complement the 8086 processor by offloading complex arithmetic operations. This coprocessor supported seven data types, including single- and double-precision floating-point numbers, and delivered approximately 100 times faster math computation compared to software-based methods on an 8086 system without it. By the late 1980s, advancements in semiconductor fabrication enabled full on-chip integration, exemplified by the Intel 80486 released in 1989. The 80486DX variant incorporated the functionality of the previous 387 math coprocessor directly onto the die, eliminating communication delays between separate chips and supporting the complete 387 instruction set with enhanced error reporting for compatibility with operating systems such as UNIX. This design achieved RISC-like performance, with frequent instructions executing in one clock cycle, and operated at speeds up to 33 MHz.

Parallel to these developments, the IEEE 754-1985 standard formalized binary floating-point arithmetic, specifying formats such as 32-bit single precision (24-bit significand) and 64-bit double precision (53-bit significand), along with operations like addition, multiplication, division, and square root, all rounded to nearest or other modes while handling exceptions like overflow and underflow. This standard profoundly influenced FPU designs by promoting portability and precision across hardware implementations. For instance, the Motorola 68881 coprocessor, introduced for the 68000 family, fully implemented the standard's formats and operations, enabling consistent floating-point behavior in systems like the Sun-3 and Apple Macintosh. Similarly, SPARC architectures adhered to IEEE 754 requirements from their inception, with FPUs supporting single- and double-precision arithmetic, special values like NaN and infinities, and exception trapping in processors such as the Cypress CY7C601.

The rise of reduced instruction set computing (RISC) architectures further propelled FPU evolution, with designs incorporating dedicated floating-point support to match the simplicity and speed of RISC pipelines. The MIPS R2000, announced in 1985, exemplified this trend by pairing a 32-bit RISC core with an external R2010 FPU compliant with IEEE 754, targeting workstations and embedded systems. By 1991, the PowerPC architecture, developed through the Apple-IBM-Motorola alliance, achieved full on-chip FPU integration in its first implementation, the PowerPC 601 released in 1993, featuring 32 64-bit floating-point registers and a multiply-add array for operations like addition, subtraction, and fused multiply-add. This processor executed up to three instructions per cycle across its fixed-point, floating-point, and branch units, supporting speeds up to 100 MHz.

These shifts from add-on to integrated FPUs were driven by Moore's law, which observed that transistor counts on integrated circuits doubled approximately every two years, allowing denser designs that reduced latency, power consumption, and cost while fitting complex FPU logic on-chip without sacrificing performance. Accompanying this was the introduction of fused multiply-add (FMA) operations, first implemented in hardware on the POWER1 (RS/6000) processor in 1990, which computed a × b + c with a single rounding step for improved accuracy and efficiency in numerical algorithms. The widespread adoption of integrated FPUs enabled floating-point computation in mainstream personal computing, transforming applications from spreadsheets to simulations. Benchmarks from the era demonstrated 10-100x speedups over software emulation; for example, the 8087 provided up to 100x gains for math-intensive tasks, while later integrated designs like the 80486 further amplified this by minimizing inter-component overhead.

Software Alternatives

Emulation Techniques

Emulation techniques enable the simulation of floating-point unit (FPU) functionality entirely in software, allowing execution of floating-point operations on processors lacking dedicated hardware support. This approach is particularly valuable in environments where hardware FPUs are absent or disabled, such as early RISC designs or resource-constrained microcontrollers.

Instruction emulation typically involves operating system (OS) or runtime trap handlers that intercept floating-point instructions and translate them into sequences of integer arithmetic operations. For instance, in x87-compatible systems without a coprocessor, the OS emulates instructions by maintaining a software representation of the FPU state, including registers and status flags, and executing equivalent integer-based computations. Similarly, early ARM processors without VFP units relied on software traps to simulate floating-point instructions via library calls or inline code, while MIPS systems used coprocessor exception handlers to invoke emulation routines for absent hardware.

At the algorithmic level, software floating-point operations mimic hardware behavior using integer primitives to handle IEEE 754 formats, which consist of sign, exponent, and mantissa components. For addition, the process begins by unpacking the operands into their components; the exponents are compared, and the mantissa of the number with the smaller exponent is shifted right by the difference to align the radix points, using integer shift operations for efficiency. The aligned mantissas are then added or subtracted as multi-precision integers, often requiring multiple 32-bit or 64-bit words to represent the full precision without overflow, followed by normalization (shifting to adjust leading zeros or ones) and rounding to fit the target format. This method ensures compliance with IEEE 754 rounding modes and exception handling, such as overflow or underflow, through conditional checks on the results. The Berkeley SoftFloat library exemplifies this approach, implementing all required operations in portable C code that leverages 64-bit integers for mantissa arithmetic when available.

Historically, emulation has been prevalent in embedded and cost-sensitive devices where adding an FPU would increase die area and power consumption. In early RISC architectures like ARM and MIPS, software emulation was the default for floating-point support until hardware units became standard in the 1990s. The SoftFloat library, originally developed in the early 1990s and refined through multiple releases, has been widely adopted for such systems, including recent RISC-V implementations lacking FPU extensions; for example, the RVfplib builds on SoftFloat principles to provide compact emulation with a low code footprint for IoT and embedded applications.

Performance trade-offs of emulation are significant: software implementations are typically 10 to 100 times slower than hardware FPUs for basic operations like addition, owing to the overhead of executing many integer instructions per floating-point operation and the lack of parallel pipelines. However, emulation offers portability across architectures and allows precise control over IEEE 754 compliance without hardware dependencies. To mitigate slowdowns for more complex functions, emulation libraries employ precomputed table lookups combined with polynomial approximations, reducing computational steps while maintaining accuracy.

In modern contexts, emulation remains relevant through just-in-time (JIT) compilation in virtual machines, where runtimes dynamically generate or interpret floating-point code for platforms with varying FPU support. For example, the Java virtual machine (JVM) can emulate floating-point bytecodes in software during interpretation phases or on non-FPU hosts, though JIT optimization prefers native hardware instructions when available to minimize overhead. This dynamic approach ensures compatibility in heterogeneous environments such as cloud and edge computing.
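The align-add-normalize sequence described above can be sketched at the bit level. The toy Python routine below handles positive normal doubles only, truncates instead of rounding, and ignores special values, so it is a skeleton of the idea rather than anything resembling a compliant library such as SoftFloat:

```python
import struct

def to_bits(x):
    """Reinterpret a double as its 64-bit integer encoding."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def from_bits(b):
    """Reinterpret a 64-bit integer encoding as a double."""
    return struct.unpack('<d', struct.pack('<Q', b))[0]

def soft_add(a, b):
    """Add two positive normal doubles using only integer operations."""
    xa, xb = to_bits(a), to_bits(b)
    # Biased exponent and mantissa with the implicit leading 1 made explicit.
    ea, fa = xa >> 52, (xa & ((1 << 52) - 1)) | (1 << 52)
    eb, fb = xb >> 52, (xb & ((1 << 52) - 1)) | (1 << 52)
    if ea < eb:                      # keep the larger exponent in (ea, fa)
        ea, eb, fa, fb = eb, ea, fb, fa
    fb >>= ea - eb                   # align the smaller mantissa (truncating)
    f = fa + fb                      # integer mantissa addition
    if f >> 53:                      # carry out of the 53-bit field
        f >>= 1                      # renormalize
        ea += 1
    return from_bits((ea << 52) | (f & ((1 << 52) - 1)))
```

For operands whose low bits are not shifted away, the result is exact, e.g. soft_add(1.5, 2.25) yields 3.75; a real emulator would additionally keep guard, round, and sticky bits so the shifted-out bits feed a correct final rounding.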

Floating-Point Libraries

Floating-point libraries offer software-based implementations of floating-point arithmetic, enabling portability across hardware platforms, support for extended precisions, and consistent behavior where hardware FPUs vary or are absent. These libraries abstract low-level operations, allowing developers to perform computations without direct reliance on processor-specific instructions, while often wrapping hardware capabilities when available for efficiency. Prominent examples include the GNU MPFR library, a portable C implementation for arbitrary-precision binary floating-point computations with guaranteed correct rounding in all rounding modes defined by the IEEE 754 standard. Built on the GNU Multiple Precision (GMP) library for underlying integer arithmetic, MPFR supports precisions from a few bits to thousands, making it suitable for applications requiring high accuracy beyond standard double precision. Another cornerstone is the Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK), which provide standardized routines for vector and matrix operations fundamentally based on floating-point arithmetic, serving as building blocks for numerical algorithms in scientific and engineering software. These libraries are typically designed as portable C or C++ codebases that either invoke hardware floating-point units or emulate operations using integer arithmetic for broader compatibility. A key example is fdlibm (Freely Distributable LIBM), a public-domain C library delivering correctly rounded mathematical functions like sine, cosine, and logarithms for double-precision floating-point systems, originally developed at to ensure high fidelity across diverse architectures. In practice, floating-point libraries promote cross-platform consistency and IEEE 754 compliance in high-level environments. 
For instance, Python's math module interfaces with the system's C math library—often fdlibm-derived—to deliver reliable floating-point functions without assuming specific hardware support. Likewise, Java's StrictMath class employs fdlibm-based implementations for transcendental and other math functions, guaranteeing identical results regardless of the underlying platform's FPU. The development of these libraries evolved from supercomputing needs of the late 1970s, with the initial BLAS routines optimized for vector architectures to accelerate floating-point-intensive tasks like matrix multiplication. Subsequent advancements, such as LAPACK in the 1990s, built upon BLAS to incorporate block-based algorithms for cache efficiency, while contemporary BLAS implementations extend this lineage with multi-threading and architecture-specific tuning for multi-core processors, achieving near-peak floating-point performance in modern HPC environments. Although slower than native hardware for elementary operations due to software overhead, these libraries remain indispensable for scenarios demanding extended precision, such as the quadruple (128-bit) formats supported by MPFR, where hardware support is limited or nonexistent.
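The argument-reduction-plus-polynomial strategy used by libraries like fdlibm can be sketched as follows. The coefficients here are plain truncated Taylor terms rather than fdlibm's tuned minimax coefficients, and the function is meant only for an already-reduced argument in roughly [-π/4, π/4], where it agrees with the system sin to about 1e-10.

```c
#include <assert.h>
#include <math.h>

/* Sketch of a libm-style kernel: evaluate sin(x) on a reduced interval as
 * x times an even polynomial in z = x^2, using Horner's scheme.
 * sin(x) ≈ x * (1 - z/3! + z^2/5! - z^3/7! + z^4/9!) */
double poly_sin(double x) {
    double z = x * x;
    return x * (1.0 + z * (-1.0/6
                  + z * ( 1.0/120
                  + z * (-1.0/5040
                  + z * ( 1.0/362880)))));
}
```

A full library routine wraps such a kernel with range reduction (folding any argument into the small interval via multiples of π/2) and with carefully chosen coefficients that minimize the maximum error rather than matching Taylor's series.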

Hardware Implementations

Integrated FPUs

Integrated floating-point units (FPUs) are hardware components fabricated directly on the same die as the central processing unit (CPU), enabling seamless execution of floating-point operations alongside integer computations. This on-chip integration allows FPUs to share pipelines with integer arithmetic logic units (ALUs), minimizing data transfer delays and optimizing overall processor throughput. In architectures like x86, the FPU leverages extensions such as Streaming SIMD Extensions (SSE) with 128-bit XMM registers and Advanced Vector Extensions (AVX) with 256-bit YMM registers to handle both scalar and packed floating-point data efficiently. Similarly, ARM processors incorporate NEON as an integrated SIMD extension that supports floating-point operations within the core's execution pipeline. A prominent example of early integrated FPU design is Intel's 80486DX processor, introduced in 1989, which combined the FPU with the integer unit on a single chip. In contemporary implementations, Intel's Core series processors maintain this integrated approach, evolving to support advanced vector operations. AMD's Zen architecture has advanced through successive generations to Zen 5 (as of 2024), adding support for AVX-512 instructions, with Zen 5 providing a native 512-bit-wide FPU for enhanced vector processing. These designs typically include separate register files for floating-point operations, ranging from 8 registers in the legacy x87 stack to 32 vector registers in modern SIMD extensions, allowing independent management of FP data without interfering with general-purpose registers. The benefits of integrated FPUs include minimal latency overhead for data movement between integer and floating-point domains, as operations occur within the unified CPU pipeline, and improved power efficiency due to reduced interconnect and shared clock domains. This integration also enables unified instruction fetching and decoding, streamlining execution for mixed workloads that combine scalar and packed vector operations.
Regarding edge cases, integrated FPUs handle denormalized numbers—subnormal values near zero—through gradual underflow mechanisms or by flushing them to zero, configurable via control registers, while exceptions like overflow, underflow, and invalid operations are managed using status flags that can trigger software interrupts if unmasked. In terms of performance, modern integrated FPUs deliver substantial throughput; for example, the 2017 Intel Core i7-8700K achieves approximately 72 GFLOPS in single-precision floating-point operations under vectorized workloads in benchmarks. This capability supports demanding applications in scientific computing, where the tight integration ensures high throughput without external hardware dependencies.
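The gradual-underflow behavior described above can be observed directly from portable C, assuming IEEE 754 semantics with the default (non-flush-to-zero) configuration: halving the smallest normal double yields a subnormal value rather than zero.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Under IEEE 754 gradual underflow, a result below the normal range is
 * represented as a subnormal (reduced precision) instead of being rounded
 * to zero. An FPU configured to flush-to-zero would return 0.0 here. */
int is_gradual_underflow(void) {
    double tiny = DBL_MIN / 2.0;              /* below smallest normal     */
    return fpclassify(tiny) == FP_SUBNORMAL   /* classified as subnormal   */
        && tiny > 0.0;                        /* magnitude preserved       */
}
```

On hardware where flush-to-zero has been enabled via the FPU control register (e.g., the FTZ/DAZ bits in x86's MXCSR), the same expression would instead produce exactly zero, which is the trade-off the text describes.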

Add-on FPUs

Add-on floating-point units (FPUs) are discrete hardware components designed as separate chips that interface with a host processor to handle floating-point arithmetic, featuring their own dedicated instruction decoders and execution pipelines to offload complex numerical computations. These units typically support multiple data formats, including single- and double-precision floating-point numbers, integers, and packed binary-coded decimals, while adhering to standards like IEEE 754 for compatibility. A seminal example is the Intel 8087, introduced in 1980 as a coprocessor for the 8086 microprocessor, which includes an independent microprogrammed execution unit to interpret and execute over 60 floating-point instructions, such as addition, multiplication, and transcendental functions. The 80287, an evolution for the 80286 processor, similarly employs a separate package with its own status, control, and data registers, enabling seamless extension of the host CPU's capabilities without altering the core architecture. Connection to the host occurs via a shared system bus, where the FPU monitors the instruction stream for special coprocessor prefixes, such as the x87 escape (ESC) opcodes, to seize control and perform operations asynchronously. This interface relies on minimal direct wiring—typically a handful of control signals for synchronization, like queue status lines to align instruction prefetching between the CPU and FPU—allowing the host to continue processing while the add-on handles floating-point tasks. For instance, Weitek's FPUs, such as those in the 1167 series, connected to workstations through a coprocessor bus, integrating with the host to accelerate vectorized floating-point workloads in scientific computing environments. In historical contexts, add-on FPUs were prevalent in personal computers, where systems based on the 80386 or 80486SX often required optional math coprocessors to enable efficient floating-point performance for applications in simulations and early rendering.
These units, such as Cyrix's FasMath 83S87, provided pin-compatible upgrades to Intel's designs. In modern embedded systems, FPGA-based add-on FPUs have emerged for niche precision applications, implementing customizable single-precision floating-point pipelines as coprocessors to MIPS and similar cores, enhancing algorithmic flexibility in embedded designs without a full hardware redesign. For example, floating-point accelerators on FPGAs serve as modular extensions in biometric recognition systems, balancing area efficiency and throughput for real-time embedded deployments. Despite their advantages, add-on FPUs introduce challenges in system integration, particularly synchronization: the host CPU must insert explicit WAIT instructions to ensure coprocessor completion before dependent operations, as seen in 80287 systems to handle memory write ordering. This leads to higher latency, often imposing 10-20 clock cycles of wait states due to bus contention and asynchronous execution, which can degrade overall performance in latency-sensitive workloads. Additionally, these external chips require separate power delivery and generate additional heat, complicating thermal management in compact designs. By the 2000s, add-on FPUs had largely been phased out of mainstream computing as integration into single-chip processors became standard, beginning with the Intel 80486DX in 1989, which embedded an FPU to eliminate interface overheads and reduce costs. However, in high-performance computing environments, modular FPU-like accelerators have seen a revival through FPGA add-ons, enabling targeted upgrades for specialized numerical tasks in scalable clusters without overhauling the entire system architecture.

Modern Advancements

Vector and SIMD Extensions

Vector and SIMD extensions enhance floating-point units (FPUs) by enabling single instruction, multiple data (SIMD) processing, where a single operation is applied simultaneously to multiple floating-point elements packed into wide registers. This parallelism is particularly effective for data-parallel workloads, allowing computations on arrays of single-precision or double-precision values without scalar bottlenecks. For instance, Intel's Streaming SIMD Extensions (SSE), introduced in 1999 with the Pentium III processor, added 128-bit XMM registers capable of holding four single-precision (FP32) floating-point numbers, enabling packed operations like addition and multiplication on these elements to achieve up to a 2x improvement in floating-point performance over scalar instructions. Similarly, ARM's Advanced SIMD (NEON) extension supports packed single-precision floating-point operations on 128-bit vectors, treating registers as multiple data lanes for efficient parallel execution. Key advancements in these extensions include wider vector capabilities to further exploit data-level parallelism. Intel's AVX-512, launched in 2017 with Skylake-based Xeon processors, expands to 512-bit ZMM registers, accommodating 16 FP32 elements per vector and introducing dedicated mask registers for conditional operations, which allows selective execution on vector lanes without branching overhead. On the ARM side, the Scalable Vector Extension (SVE), introduced in the Armv8-A architecture, supports vector lengths from 128 to 2048 bits in multiples of 128, enabling up to 64 FP32 elements in the widest configuration while maintaining binary compatibility across implementations. These extensions build on core FPU functionality by incorporating operations such as vector addition (e.g., VADD in NEON) and multiplication (e.g., VMUL for floating-point), as well as fused multiply-accumulate (FMA) for higher precision in chained computations.
Masking enables conditional execution by applying a predicate vector to zero out inactive lanes, while gather and scatter instructions facilitate non-contiguous memory access, loading or storing scattered floating-point elements directly into vectors. To support these parallel operations, FPUs in modern processors adapt with wider datapaths and expanded register files. AVX-512, for example, doubles the register width from AVX2's 256 bits, requiring enhanced execution pipelines capable of processing 512-bit vectors in a single cycle to avoid serialization, alongside a larger set of 32 ZMM registers to sustain throughput. ARM SVE similarly demands scalable register files (Z0-Z31) that can dynamically adjust to the implementation's vector length, ensuring efficient handling of wide floating-point parallelism without fixed-width limitations. These adaptations minimize latency in vector floating-point pipelines, enabling near-linear performance scaling with vector width—for instance, doubling from 128 to 256 bits can roughly double throughput for fully vectorizable workloads. Such extensions find widespread application in graphics and AI. In graphics APIs like Direct3D, SIMD accelerates vector transformations and computations, with libraries such as DirectXMath leveraging SSE/AVX intrinsics for packed FP32 operations on vertex data, improving rendering performance by processing multiple pixels or vertices in parallel. For AI training, particularly matrix multiplications in neural networks, wide SIMD vectors enable batched floating-point operations, where performance scales approximately linearly with vector width; AVX-512, for example, can deliver up to 16x the scalar FP32 throughput for dense GEMM (general matrix multiply) kernels, significantly boosting training efficiency on CPU-based systems.
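The packed-lane model behind instructions like SSE's ADDPS can be sketched portably using the GCC/Clang vector extension, which avoids tying the example to one instruction set: the compiler lowers the lane-wise add to whatever SIMD hardware the target provides (SSE/AVX on x86, NEON on ARM), or to scalar code otherwise.

```c
#include <assert.h>

/* Four packed FP32 lanes in one 128-bit value, in the style of an XMM
 * register (GCC/Clang vector extension; not standard ISO C). */
typedef float v4sf __attribute__((vector_size(16)));

/* One operation, four results: element-wise addition across all lanes,
 * as a single ADDPS-style instruction would perform it. */
v4sf packed_add(v4sf a, v4sf b) {
    return a + b;
}

/* Small self-check exercising every lane. */
int demo_packed_add(void) {
    v4sf a = {1.0f, 2.0f, 3.0f, 4.0f};
    v4sf b = {10.0f, 20.0f, 30.0f, 40.0f};
    v4sf c = packed_add(a, b);
    return c[0] == 11.0f && c[1] == 22.0f
        && c[2] == 33.0f && c[3] == 44.0f;
}
```

Production code targeting a specific ISA would instead use intrinsics such as `_mm_add_ps` (SSE) or `vaddq_f32` (NEON); the vector-extension form above expresses the same four-lane data parallelism in a compiler-neutral way.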

Specialized and High-Performance FPUs

Specialized floating-point units (FPUs) designed for graphics processing units (GPUs) are optimized for high-throughput workloads in AI and rendering. In NVIDIA's GPU architectures, CUDA cores handle general-purpose floating-point operations, while dedicated Tensor Cores accelerate matrix multiplications using reduced-precision formats such as FP16 and FP8, enabling mixed-precision computing for AI training and inference. Similarly, AMD's RDNA 3 architecture incorporates matrix cores that support Wave Matrix Multiply-Accumulate (WMMA) operations for AI acceleration, with enhancements in ray tracing hardware to improve traversal and intersection-testing efficiency. In high-performance computing (HPC), custom FPUs address domain-specific demands for precision and scale. The IBM Power10 processor, introduced in 2021, features advanced floating-point capabilities including 256-bit vector SIMD units and quad-precision support, facilitating high-fidelity simulations in scientific computing. Google's Tensor Processing Units (TPUs) prioritize low-precision formats like bfloat16 and INT8 for AI acceleration, optimizing energy efficiency in large-scale deployments. Key features in these specialized FPUs include reduced-precision modes that boost computational throughput while managing accuracy. For instance, bfloat16 maintains the exponent range of FP32 with a shorter mantissa, allowing faster operations in AI models without excessive loss of accuracy. In radiation-hardened environments for space applications, FPUs incorporate error-correcting codes to detect and mitigate single-event upsets from cosmic rays, ensuring reliability in orbital missions. Performance in these units often reaches the teraflops (TFLOPS) scale, balancing speed against the accuracy trade-offs inherent to lower precisions. The NVIDIA A100, for example, delivers 19.5 TFLOPS of FP64 via its Tensor Cores, enabling demanding HPC tasks. Low-precision modes like FP8 can yield 10-20x higher throughput at the cost of potential rounding errors in sensitive computations.
These trade-offs are critical in approximate-computing scenarios, where reduced accuracy is acceptable in exchange for gains in efficiency. As of 2025, emerging trends in specialized FPUs draw from neuromorphic and quantum-inspired designs to further these approximate-computing paradigms. Neuromorphic hardware, such as Intel's Loihi chips, emulates spiking neural networks with event-driven, integer-based approximations, reducing power consumption for edge AI.
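The bfloat16 format's relationship to FP32—the same 8-bit exponent, with the mantissa cut from 23 bits to 7—makes conversion nearly trivial: up to rounding, it is just taking the top 16 bits of the FP32 encoding. This sketch truncates for simplicity, whereas real hardware typically rounds to nearest-even.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* FP32 -> bfloat16: keep the sign, the full 8-bit exponent, and the top
 * 7 mantissa bits (truncation; real converters usually round). */
uint16_t f32_to_bf16(float f) {
    uint32_t u;
    memcpy(&u, &f, 4);
    return (uint16_t)(u >> 16);
}

/* bfloat16 -> FP32: widen losslessly by zero-filling the low mantissa bits. */
float bf16_to_f32(uint16_t h) {
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, 4);
    return f;
}
```

Because the exponent field is untouched, huge FP32 magnitudes survive the round trip with only a relative error below 2^-7, which is exactly the property that lets bfloat16 keep FP32's dynamic range for AI workloads while sacrificing precision (for example, 1 + 2^-8 collapses to 1.0 in bfloat16).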
