Arithmetic logic unit
In computing, an arithmetic logic unit (ALU) is a combinational digital circuit that performs arithmetic and bitwise operations on integer binary numbers.[1][2] This is in contrast to a floating-point unit (FPU), which operates on floating-point numbers. It is a fundamental building block of many types of computing circuits, including the central processing unit (CPU) of computers, FPUs, and graphics processing units (GPUs).[3]
The inputs to an ALU are the data to be operated on, called operands, and a code indicating the operation to be performed (opcode); the ALU's output is the result of the performed operation. In many designs, the ALU also has status inputs or outputs, or both, which convey information about a previous operation or the current operation, respectively, between the ALU and external status registers.
Signals
An ALU has a variety of input and output nets, which are the electrical conductors used to convey digital signals between the ALU and external circuitry. When an ALU is operating, external circuits apply signals to the ALU inputs and, in response, the ALU produces and conveys signals to external circuitry via its outputs.
Data
A basic ALU has three parallel data buses consisting of two input operands (A and B) and a result output (Y). Each data bus is a group of signals that conveys one binary integer number. Typically, the A, B and Y bus widths (the number of signals comprising each bus) are identical and match the native word size of the external circuitry (e.g., the encapsulating CPU or other processor).
Opcode
The opcode input is a parallel bus that conveys to the ALU an operation selection code, which is an enumerated value that specifies the desired arithmetic or logic operation to be performed by the ALU. The opcode size (its bus width) determines the maximum number of distinct operations the ALU can perform; for example, a four-bit opcode can specify up to sixteen different ALU operations. Generally, an ALU opcode is not the same as a machine language instruction, though in some cases it may be directly encoded as a bit field within such instructions.
Status
Outputs
The status outputs are various individual signals that convey supplemental information about the result of the current ALU operation. General-purpose ALUs commonly have status signals such as the following (modeled in the C sketch after this list):
- Carry-out, which conveys the carry resulting from an addition operation, the borrow resulting from a subtraction operation, or the overflow bit resulting from a binary shift operation.
- Zero, which indicates all bits of Y are logic zero.
- Negative, which indicates the result of an arithmetic operation is negative.
- Overflow, which indicates the result of an arithmetic operation has exceeded the numeric range of Y.
- Parity, which indicates whether an even or odd number of bits in Y are logic one.
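As a concrete illustration, these status bits can be modeled in software. The following C sketch is a minimal model rather than any particular hardware design; the names alu_flags and alu_add8 are invented for this example, and the parity convention shown (flag set when Y holds an even number of 1 bits, as on x86) is only one of several in use.

    #include <stdint.h>
    #include <stdbool.h>

    struct alu_flags {
        bool carry;     /* carry out of the most significant bit */
        bool zero;      /* all bits of Y are logic zero */
        bool negative;  /* MS bit of Y (the two's complement sign) */
        bool overflow;  /* signed result exceeded the range of Y */
        bool parity;    /* Y holds an even number of 1 bits */
    };

    /* 8-bit addition with status outputs, mirroring the list above. */
    uint8_t alu_add8(uint8_t a, uint8_t b, struct alu_flags *f)
    {
        uint16_t wide = (uint16_t)a + (uint16_t)b;   /* 9-bit sum */
        uint8_t  y = (uint8_t)wide;
        uint8_t  p = y;

        p ^= p >> 4; p ^= p >> 2; p ^= p >> 1;       /* fold bits to get parity */
        f->carry    = (wide >> 8) & 1;               /* carry out of bit 7 */
        f->zero     = (y == 0);
        f->negative = (y >> 7) & 1;
        /* signed overflow: the operands share a sign that the result lacks */
        f->overflow = ((~(a ^ b) & (a ^ y)) >> 7) & 1;
        f->parity   = ((p & 1) == 0);
        return y;
    }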
Inputs
The status inputs allow additional information to be made available to the ALU when performing an operation. Typically, this is a single "carry-in" bit that is the stored carry-out from a previous ALU operation.
Circuit operation
An ALU is a combinational logic circuit, meaning that its outputs will change asynchronously in response to input changes. In normal operation, stable signals are applied to all of the ALU inputs and, when enough time (known as the "propagation delay") has passed for the signals to propagate through the ALU circuitry, the result of the ALU operation appears at the ALU outputs. The external circuitry connected to the ALU is responsible for ensuring the stability of ALU input signals throughout the operation, and for allowing sufficient time for the signals to propagate through the ALU circuitry before sampling the ALU outputs.
In general, external circuitry controls an ALU by applying signals to the ALU inputs. Typically, the external circuitry employs sequential logic to generate the signals that control ALU operation. The external sequential logic is paced by a clock signal of sufficiently low frequency to ensure enough time for the ALU outputs to settle under worst-case conditions (i.e., conditions resulting in the maximum possible propagation delay).
For example, a CPU starts an addition operation by routing the operands from their sources (typically processor registers) to the ALU's operand inputs, while simultaneously applying a value to the ALU's opcode input that configures it to perform an addition operation. At the same time, the CPU enables the destination register to store the ALU output (the resulting sum from the addition operation) upon operation completion. The ALU's input signals, which are held stable until the next clock, are allowed to propagate through the ALU and to the destination register while the CPU waits for the next clock. When the next clock arrives, the destination register stores the ALU result and, since the ALU operation has completed, the ALU inputs may be set up for the next ALU operation.
Functions
A number of basic arithmetic and bitwise logic functions are commonly supported by ALUs. Basic, general purpose ALUs typically include these operations in their repertoires:[1][2][4]
Arithmetic operations
- Add: A and B are summed and the sum appears at Y and carry-out.
- Add with carry: A, B and carry-in are summed and the sum appears at Y and carry-out (modeled, along with subtract with borrow, in the sketch after this list).
- Subtract: B is subtracted from A (or vice versa) and the difference appears at Y and carry-out. For this function, carry-out is effectively a "borrow" indicator. This operation may also be used to compare the magnitudes of A and B; in such cases the Y output may be ignored by the processor, which is only interested in the status bits (particularly zero and negative) that result from the operation.
- Subtract with borrow: B is subtracted from A (or vice versa) with borrow (carry-in) and the difference appears at Y and carry-out (borrow out).
- Two's complement: The negative of A (or B) appears at Y in two's complement form.
- Increment: A (or B) is increased by one and the resulting value appears at Y.
- Decrement: A (or B) is decreased by one and the resulting value appears at Y.
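The carry-linked operations above can be sketched in C as follows. This is a software model with invented function names, and it assumes one common borrow convention (borrow flag set to 1 when a borrow occurs); some architectures instead use an inverted carry flag for subtraction.

    #include <stdint.h>
    #include <stdbool.h>

    /* Add with carry: A, B and carry-in are summed; carry-out is bit 8. */
    uint8_t add_with_carry(uint8_t a, uint8_t b, bool carry_in, bool *carry_out)
    {
        uint16_t sum = (uint16_t)a + (uint16_t)b + (carry_in ? 1 : 0);
        *carry_out = (sum >> 8) & 1;
        return (uint8_t)sum;
    }

    /* Subtract with borrow: borrow-out is 1 when A < B + borrow-in. */
    uint8_t subtract_with_borrow(uint8_t a, uint8_t b, bool borrow_in, bool *borrow_out)
    {
        uint16_t diff = (uint16_t)((uint16_t)a - (uint16_t)b - (borrow_in ? 1 : 0));
        *borrow_out = (diff >> 8) & 1;
        return (uint8_t)diff;
    }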
Bitwise logical operations
- AND: the bitwise AND of A and B appears at Y. AND may also be used to TEST bits; in this case, the result is not stored, and only the status bits (particularly zero and negative) are recorded (see the sketch after this list).
- OR: the bitwise OR of A and B appears at Y.
- Exclusive-OR: the bitwise XOR of A and B appears at Y.
- Ones' complement: all bits of A (or B) are inverted and appear at Y.
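A minimal C model of the TEST usage just described; the function name test8 is invented, and real instruction sets differ in exactly which status bits such an operation updates.

    #include <stdint.h>
    #include <stdbool.h>

    /* AND the operands, record status bits, and discard the result. */
    void test8(uint8_t a, uint8_t b, bool *zero, bool *negative)
    {
        uint8_t y = a & b;      /* result is not written back anywhere */
        *zero     = (y == 0);
        *negative = (y >> 7) & 1;
    }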
Bit shift operations
ALU shift operations cause operand A (or B) to shift left or right (depending on the opcode) and the shifted operand appears at Y. Simple ALUs typically can shift the operand by only one bit position, whereas more complex ALUs employ barrel shifters that allow them to shift the operand by an arbitrary number of bits in one operation. In all single-bit shift operations, the bit shifted out of the operand appears on carry-out; the value of the bit shifted into the operand depends on the type of shift (each single-bit case is modeled in the C sketch after the following list).
- Arithmetic shift: the operand is treated as a two's complement integer, meaning that the most significant bit is a "sign" bit and is preserved.
- Logical shift: a logic zero is shifted into the operand. This is used to shift unsigned integers.
- Rotate: the operand is treated as a circular buffer of bits in which its least and most significant bits are effectively adjacent.
- Rotate through carry: the carry bit and operand are collectively treated as a circular buffer of bits.
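The single-bit cases above can be modeled in C as follows, using 8-bit operands and invented function names. In each function, the bit shifted out of the operand is returned through *carry_out, mirroring the ALU's carry-out signal.

    #include <stdint.h>
    #include <stdbool.h>

    uint8_t shift_left_logical(uint8_t v, bool *carry_out)
    {
        *carry_out = (v >> 7) & 1;                /* MS bit falls out */
        return (uint8_t)(v << 1);                 /* zero enters the LS bit */
    }

    uint8_t shift_right_logical(uint8_t v, bool *carry_out)
    {
        *carry_out = v & 1;                       /* LS bit falls out */
        return v >> 1;                            /* zero enters the MS bit */
    }

    uint8_t shift_right_arithmetic(uint8_t v, bool *carry_out)
    {
        *carry_out = v & 1;
        return (uint8_t)((v >> 1) | (v & 0x80)); /* sign bit is preserved */
    }

    uint8_t rotate_left(uint8_t v, bool *carry_out)
    {
        *carry_out = (v >> 7) & 1;
        return (uint8_t)((v << 1) | (v >> 7));    /* MS bit wraps to LS bit */
    }

    uint8_t rotate_left_through_carry(uint8_t v, bool carry_in, bool *carry_out)
    {
        *carry_out = (v >> 7) & 1;
        return (uint8_t)((v << 1) | (carry_in ? 1 : 0)); /* old carry enters LS bit */
    }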
Other operations
- Pass through: all bits of A (or B) appear unmodified at Y. This operation is typically used to determine the parity of the operand or whether it is zero or negative, or to copy the operand to a processor register.
Applications
Status usage
Upon completion of each ALU operation, the ALU's status output signals are usually stored in external registers to make them available for future ALU operations (e.g., to implement multiple-precision arithmetic) and for controlling conditional branching. The bit registers that store the status output signals are often collectively treated as a single, multi-bit register, which is referred to as the "status register" or "condition code register".
Depending on the ALU operation being performed, some status register bits may be changed and others may be left unmodified. For example, in bitwise logical operations such as AND and OR, the carry status bit is typically not modified as it is not relevant to such operations.
In CPUs, the stored carry-out signal is usually connected to the ALU's carry-in net. This facilitates efficient propagation of carries (which may represent addition carries, subtraction borrows, or shift overflows) when performing multiple-precision operations, as it eliminates the need for software management of carry propagation via conditional branching based on the carry status bit.
Operand and result data paths
The sources of ALU operands and destinations of ALU results depend on the architecture of the encapsulating processor and the operation being performed. Processor architectures vary widely, but in general-purpose CPUs, the ALU typically operates in conjunction with a register file (array of processor registers) or accumulator register, which the ALU frequently uses as both a source of operands and a destination for results. To accommodate other operand sources, multiplexers are commonly used to select either the register file or alternative ALU operand sources as required by each machine instruction.
For example, a typical architecture employs a register file with two read ports, which allows the values stored in any two registers (or the same register) to be ALU operands. Alternatively, it allows either ALU operand to be sourced from an immediate operand (a constant value which is directly encoded in the machine instruction[5]) or from memory. The ALU result may be written to any register in the register file or to memory.
Multiple-precision arithmetic
In integer arithmetic computations, multiple-precision arithmetic is an algorithm that operates on integers that are larger than the ALU word size. To do this, the algorithm treats each integer as an ordered collection of ALU-size fragments, arranged from most-significant (MS) to least-significant (LS) or vice versa. For example, in the case of an 8-bit ALU, the 24-bit integer 0x123456 would be treated as a collection of three 8-bit fragments: 0x12 (MS), 0x34, and 0x56 (LS). Since the size of a fragment exactly matches the ALU word size, the ALU can directly operate on this "piece" of operand.
The algorithm uses the ALU to directly operate on particular operand fragments and thus generate a corresponding fragment (a "partial") of the multi-precision result. Each partial, when generated, is written to an associated region of storage that has been designated for the multiple-precision result. This process is repeated for all operand fragments so as to generate a complete collection of partials, which is the result of the multiple-precision operation.
In arithmetic operations (e.g., addition, subtraction), the algorithm starts by invoking an ALU operation on the operands' LS fragments, thereby producing both an LS partial and a carry-out bit. The algorithm writes the partial to designated storage, whereas the processor's state machine typically stores the carry-out bit to an ALU status register. The algorithm then advances to the next fragment of each operand's collection and invokes an ALU operation on these fragments along with the stored carry bit from the previous ALU operation, thus producing another (more significant) partial and a carry-out bit. As before, the carry bit is stored to the status register and the partial is written to designated storage. This process repeats until all operand fragments have been processed, resulting in a complete collection of partials in storage, which comprise the multi-precision arithmetic result.
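This procedure maps directly onto code. The following C sketch adds two multi-fragment integers with an 8-bit add-with-carry step; the function name multiword_add and the LS-first fragment layout are choices made for this example.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Each uint8_t element is one ALU-size fragment, stored LS fragment first. */
    void multiword_add(const uint8_t *a, const uint8_t *b, uint8_t *y, size_t n)
    {
        bool carry = false;                     /* the stored carry status bit */
        for (size_t i = 0; i < n; i++) {        /* LS fragment first */
            uint16_t partial = (uint16_t)a[i] + (uint16_t)b[i] + (carry ? 1 : 0);
            y[i]  = (uint8_t)partial;           /* write the partial to storage */
            carry = (partial >> 8) & 1;         /* carry-out feeds the next step */
        }
    }

With this layout, the earlier 24-bit example 0x123456 would be stored as the fragment array {0x56, 0x34, 0x12}.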
In multiple-precision shift operations, the order of operand fragment processing depends on the shift direction. In left-shift operations, fragments are processed LS first because the LS bit of each partial—which is conveyed via the stored carry bit—must be obtained from the MS bit of the previously left-shifted, less-significant operand. Conversely, operands are processed MS first in right-shift operations because the MS bit of each partial must be obtained from the LS bit of the previously right-shifted, more-significant operand.
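As an illustration of the left-shift case, the following C sketch (invented function name) shifts a multi-fragment value left by one bit, processing LS fragments first and passing each fragment's MS bit to the next fragment through a carry variable, in effect a chain of rotate-through-carry steps.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stddef.h>

    void multiword_shift_left(uint8_t *v, size_t n)
    {
        bool carry = false;                     /* zero enters the overall LS bit */
        for (size_t i = 0; i < n; i++) {        /* LS fragment first for a left shift */
            bool carry_out = (v[i] >> 7) & 1;   /* MS bit of this fragment */
            v[i]  = (uint8_t)((v[i] << 1) | (carry ? 1 : 0));
            carry = carry_out;                  /* becomes the next fragment's LS bit */
        }
    }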
In bitwise logical operations (e.g., logical AND, logical OR), the operand fragments may be processed in any arbitrary order because each partial depends only on the corresponding operand fragments (the stored carry bit from the previous ALU operation is ignored).
Binary fixed-point addition and subtraction
Binary fixed-point values are represented by integers. Consequently, for any particular fixed-point scale factor (or implied radix point position), an ALU can directly add or subtract two fixed-point operands and produce a fixed-point result. This capability is commonly used in both fixed-point and floating-point addition and subtraction.
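For example, with an implied scale factor of 1/256 (radix point between bits 7 and 8 of a 16-bit word), 1.5 is represented as 0x0180 and 2.25 as 0x0240; an ordinary integer addition yields 0x0180 + 0x0240 = 0x03C0, which is 960/256 = 3.75 at the same scale.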
In floating-point addition and subtraction, the significand of the smaller operand is right-shifted so that its fixed-point scale factor matches that of the larger operand. The ALU then adds or subtracts the aligned significands to produce a result significand. Together with other operand elements, the result significand is normalized and rounded to produce the floating-point result.
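A simplified C sketch of this alignment step, assuming bare integer significands with base-2 exponents; real floating-point formats such as IEEE 754 add hidden bits, rounding, and renormalization, all omitted here.

    #include <stdint.h>

    /* Align the smaller operand's significand, then add. The function name
       and calling convention are invented for this example. */
    uint32_t add_aligned_significands(uint32_t sig_a, int exp_a,
                                      uint32_t sig_b, int exp_b, int *exp_y)
    {
        if (exp_a < exp_b) {                    /* make A the larger-exponent operand */
            uint32_t ts = sig_a; sig_a = sig_b; sig_b = ts;
            int te = exp_a;  exp_a = exp_b;  exp_b = te;
        }
        int shift = exp_a - exp_b;
        sig_b = (shift >= 32) ? 0 : (sig_b >> shift); /* right-shift smaller operand */
        *exp_y = exp_a;                         /* result keeps the larger exponent */
        return sig_a + sig_b;                   /* ordinary fixed-point addition */
    }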
Complex operations
Although it is possible to design ALUs that can perform complex functions, this is usually impractical due to the resulting increases in circuit complexity, power consumption, propagation delay, cost and size. Consequently, ALUs are typically limited to simple functions that can be executed at very high speeds (i.e., very short propagation delays), with more complex functions being the responsibility of software or external circuitry. For example:
- In simple cases in which a CPU contains a single ALU, the CPU typically implements a complex operation by orchestrating a sequence of ALU operations according to a software algorithm (see the sketch after this list).
- More specialized architectures may use multiple ALUs to accelerate complex operations. In such systems, the ALUs are often pipelined, with intermediate results passing through ALUs arranged like a factory production line. Performance is greatly improved over that of a single ALU because all of the ALUs operate concurrently and software overhead is significantly reduced.
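For example, multiplication can be composed from the simple operations an ALU does provide. The following C sketch shows a shift-and-add multiply of two 8-bit values using only add, shift, and bit-test steps; it illustrates the principle and is not any particular CPU's microcode.

    #include <stdint.h>

    uint16_t mul8_shift_add(uint8_t a, uint8_t b)
    {
        uint16_t product = 0;
        uint16_t addend  = a;            /* widened so left shifts don't overflow */
        for (int i = 0; i < 8; i++) {
            if (b & 1)                   /* test the LS bit of the multiplier */
                product += addend;       /* ALU add */
            addend <<= 1;                /* ALU left shift */
            b >>= 1;                     /* ALU right shift */
        }
        return product;
    }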
Graphics processing units
Graphics processing units (GPUs) often contain hundreds or thousands of ALUs which can operate concurrently. Depending on the application and GPU architecture, the ALUs may be used to simultaneously process unrelated data or to operate in parallel on related data. An example of the latter is graphics rendering, in which multiple ALUs perform the same operation in parallel on a group of pixels, with each ALU operating on a pixel within a scene.[6]
Implementation
An ALU is usually implemented either as a stand-alone integrated circuit (IC), such as the 74181, or as part of a more complex IC. In the latter case, an ALU is typically instantiated by synthesizing it from a description written in VHDL, Verilog or some other hardware description language. For example, the following VHDL code describes a very simple 8-bit ALU:
library ieee;
use ieee.std_logic_1164.all; -- provides the std_logic base type and 'X'
use ieee.numeric_std.all;    -- provides the signed and unsigned types

entity alu is
  port ( -- the ALU connections to external circuitry:
    A  : in  signed(7 downto 0);   -- operand A
    B  : in  signed(7 downto 0);   -- operand B
    OP : in  unsigned(2 downto 0); -- opcode
    Y  : out signed(7 downto 0));  -- operation result
end alu;

architecture behavioral of alu is
begin
  alu_process : process (A, B, OP) -- combinational: sensitive to all inputs
  begin
    case OP is -- decode the opcode and perform the operation:
      when "000"  => Y <= A + B;   -- add
      when "001"  => Y <= A - B;   -- subtract
      when "010"  => Y <= A - 1;   -- decrement
      when "011"  => Y <= A + 1;   -- increment
      when "100"  => Y <= not A;   -- 1's complement
      when "101"  => Y <= A and B; -- bitwise AND
      when "110"  => Y <= A or B;  -- bitwise OR
      when "111"  => Y <= A xor B; -- bitwise XOR
      when others => Y <= (others => 'X'); -- unreachable for valid opcodes
    end case;
  end process alu_process;
end behavioral;
History
Mathematician John von Neumann proposed the ALU concept in 1945 in a report on the foundations for a new computer called the EDVAC.[7]
The cost, size, and power consumption of electronic circuitry were relatively high throughout the infancy of the Information Age. Consequently, all early computers had a serial ALU that operated on one data bit at a time, although they often presented a wider word size to programmers. The first computer to have multiple parallel discrete single-bit ALU circuits was the 1951 Whirlwind I, which employed sixteen such "math units" to enable it to operate on 16-bit words.
In 1967, Fairchild introduced the first ALU-like device implemented as an integrated circuit, the Fairchild 3800, consisting of an eight-bit arithmetic unit with accumulator. It supported only addition and subtraction, with no logic functions.[8]
Full integrated-circuit ALUs soon emerged, including four-bit ALUs such as the Am2901 and 74181. These devices were typically "bit slice" capable, meaning they had "carry look ahead" signals that facilitated the use of multiple interconnected ALU chips to create an ALU with a wider word size. These devices quickly became popular and were widely used in bit-slice minicomputers.
Microprocessors began to appear in the early 1970s. Even though transistors had become smaller, there was sometimes insufficient die space for a full-word-width ALU and, as a result, some early microprocessors employed a narrow ALU that required multiple cycles per machine language instruction. An example of this is the popular Zilog Z80, which performed eight-bit additions with a four-bit ALU.[9] Over time, transistor geometries shrank further, following Moore's law, and it became feasible to build wider ALUs on microprocessors.
Modern integrated circuit (IC) transistors are orders of magnitude smaller than those of the early microprocessors, making it possible to fit highly complex ALUs on ICs. Many modern ALUs have wide word widths and architectural enhancements such as barrel shifters and binary multipliers that allow them to perform, in a single clock cycle, operations that would have required multiple operations on earlier ALUs.
ALUs can be realized as mechanical, electro-mechanical or electronic circuits[10] and, in recent years, research into biological ALUs has been carried out[11][12] (e.g., actin-based).[13]
References
[edit]- ^ a b Atul P. Godse; Deepali A. Godse (2009). "3". Digital Logic Design. Technical Publications. pp. 9–3. ISBN 978-81-8431-738-1.[permanent dead link]
- ^ a b Atul P. Godse; Deepali A. Godse (2009). "Appendix". Digital Logic Circuits. Technical Publications. pp. C–1. ISBN 978-81-8431-650-6.[permanent dead link]
- ^ "1. An Introduction to Computer Architecture - Designing Embedded Hardware, 2nd Edition [Book]". www.oreilly.com. Retrieved 2020-09-03.
- ^ Horowitz, Paul; Winfield Hill (1989). "14.1.1". The Art of Electronics (2nd ed.). Cambridge University Press. pp. 990–. ISBN 978-0-521-37095-0.
- ^ Barry, Peter; Crowley, Patrick (2012). Modern Embedded Computing. ISBN 978-0-12-391490-3.
- ^ Smith, Ryan. "Background: How GPUs Work". AnandTech. Archived from the original on February 28, 2014. Retrieved 14 January 2025.
- ^ Philip Levis (November 8, 2004). "Jonathan von Neumann and EDVAC" (PDF). cs.berkeley.edu. pp. 1, 3. Archived from the original (PDF) on September 23, 2015. Retrieved January 20, 2015.
- ^ Shirriff, Ken. "Inside the 74181 ALU chip: die photos and reverse engineering". Ken Shirriff's blog. Retrieved 7 May 2024.
- ^ Shirriff, Ken. "The Z-80 has a 4-bit ALU. Here's how it works." 2013, righto.com
- ^ Reif, John H. (2009), "Mechanical Computing: The Computational Complexity of Physical Devices", in Meyers, Robert A. (ed.), Encyclopedia of Complexity and Systems Science, New York, NY: Springer, pp. 5466–5482, doi:10.1007/978-0-387-30440-3_325, ISBN 978-0-387-30440-3, retrieved 2020-09-03
- ^ Lin, Chun-Liang; Kuo, Ting-Yu; Li, Wei-Xian (2018-08-14). "Synthesis of control unit for future biocomputer". Journal of Biological Engineering. 12 (1): 14. doi:10.1186/s13036-018-0109-4. ISSN 1754-1611. PMC 6092829. PMID 30127848.
- ^ Gerd Hg Moe-Behrens. "The biological microprocessor, or how to build a computer with biological parts".
- ^ Das, Biplab; Paul, Avijit Kumar; De, Debashis (2019-08-16). "An unconventional Arithmetic Logic Unit design and computing in Actin Quantum Cellular Automata". Microsystem Technologies. 28 (3): 809–822. doi:10.1007/s00542-019-04590-1. ISSN 1432-1858. S2CID 202099203.
Further reading
- Hwang, Enoch (2006). Digital Logic and Microprocessor Design with VHDL. Thomson. ISBN 0-534-46593-5.
- Stallings, William (2006). Computer Organization & Architecture: Designing for Performance (7th ed.). Pearson Prentice Hall. ISBN 0-13-185644-8.
Arithmetic logic unit
Overview
Definition and Purpose
The arithmetic logic unit (ALU) is a combinational digital circuit designed to perform a variety of arithmetic and logical operations on binary inputs. It processes pairs of operands to execute functions such as addition, subtraction, bitwise AND, OR, and XOR, producing corresponding binary outputs without relying on sequential storage elements.[12] This design allows the ALU to respond directly to input changes, without waiting for a clock edge, making it a fundamental building block for data manipulation in digital systems.[13]
In central processing units (CPUs), the ALU serves as the primary execution core for arithmetic and logical instructions, handling the computational tasks essential to program execution. It enables the processor to perform basic data operations required by software, such as calculating sums or comparing values, thereby supporting the overall functionality of the computer system.[14] As a critical component of the CPU, the ALU integrates with registers to fetch operands and store results, facilitating efficient instruction processing.[7]
The ALU occupies a central role in the von Neumann architecture, where the CPU is divided into distinct units: the ALU for computation, the control unit for instruction decoding and sequencing, and memory for storing both programs and data. This separation allows the ALU to focus solely on operand processing, receiving inputs from registers or memory via the control unit's orchestration, while outputs are routed back for further use or storage.[4] Unlike the control unit, which manages flow without direct computation, or memory, which provides passive storage, the ALU actively transforms binary data to enable algorithmic execution.[15]
A textual representation of a basic ALU block diagram illustrates two operand inputs (A and B, typically n-bit wide), a function select input (a multi-bit control signal to choose the operation), and primary outputs consisting of the result (Y, n-bit) plus auxiliary status signals such as carry-out or a zero flag. This configuration positions the ALU as an interface between data sources in the CPU, ensuring operations align with instruction requirements while generating flags for conditional control.
Role in Computer Architecture
The arithmetic logic unit (ALU) is a core component of the central processing unit (CPU) datapath, where it receives input operands from the data read ports of the register file and delivers computation results back to the register file for storage.[16] This integration enables efficient data flow within the processor, allowing operands to be fetched from general-purpose registers, processed by the ALU, and written back in a single cycle for basic operations.[17]
The ALU interacts closely with the control unit, which decodes fetched instructions and generates control signals to route the appropriate opcode to the ALU, thereby selecting the specific operation to perform.[18] These signals direct the ALU's function selection mechanism, ensuring that the unit executes only the arithmetic or logical task mandated by the current instruction without unnecessary overhead.[19]
In the instruction cycle, the ALU plays a pivotal role during the execution stage, where it computes results for arithmetic and logical instructions using the decoded operands from prior stages.[20] This stage involves the ALU applying operations such as addition or bitwise AND directly on register-sourced data, contributing to the overall throughput of instruction processing in the CPU.[21]
The scope of the ALU differs between RISC and CISC architectures; in RISC designs, it focuses on straightforward register-to-register operations to simplify hardware and enable pipelining, while in CISC, it accommodates more intricate instructions that can reference memory alongside registers. This distinction influences processor efficiency, with RISC emphasizing ALU simplicity for faster execution cycles.[22]
Signals and Interfaces
Data Inputs and Outputs
The arithmetic logic unit (ALU) receives two primary data inputs known as operands, typically denoted as A and B, each consisting of n bits representing binary integers.[23][24] These operands are loaded from processor registers or memory into the ALU's input ports, often via multiplexers that select the appropriate data paths based on the instruction being executed.[23] The ALU processes these inputs to produce an output result, which is generally an n-bit value matching the operand width, though certain operations may extend it to n+1 bits to accommodate carry or overflow bits.[24]
Data bus widths in ALUs vary by processor architecture, commonly implemented as 8-bit, 16-bit, 32-bit, or 64-bit to align with the system's word size.[23] Wider bus widths enable handling of larger numerical ranges and greater precision in computations, thereby increasing the ALU's processing capacity for complex applications, but they also demand more transistor resources and can introduce propagation delays in carry chains without optimized designs like carry-lookahead adders.[24] For instance, a 64-bit ALU supports operands up to approximately 1.8 × 10^19 in unsigned magnitude, significantly expanding the scope of addressable memory and data manipulation compared to an 8-bit variant limited to 255.[23]
ALUs handle both signed and unsigned data representations, with the same hardware circuitry often supporting both through interpretive conventions rather than distinct paths.[23] Unsigned operands treat all bits as magnitude, while signed ones use two's complement encoding, where the most significant bit indicates sign (0 for positive/zero, 1 for negative), allowing seamless extension for arithmetic without altering the core logic gates.[24]
In data flow, for example, multiplexers route register values (e.g., from a general-purpose register file) to the A and B inputs, ensuring operands are properly aligned and zero- or sign-extended if necessary before ALU processing.[23] This setup permits opcode-driven selection of operations on the incoming data, integrating with broader datapath control.[24]
Control Signals
The opcode functions as a multi-bit control input to the arithmetic logic unit (ALU), typically 3 to 4 bits wide, enabling the selection of specific operations from a set of 8 to 16 functions, such as addition (ADD), logical AND (AND), and shift right (SHR).[25][26] This binary code is fed into the ALU's function selection mechanism, where a decoder interprets it to route the appropriate arithmetic or logical circuitry for execution on the input operands.[16]
Enable signals complement the opcode by gating the ALU's activity, activating processing only when asserted to prevent unnecessary computations and ensure proper timing in synchronous designs.[24] In clocked architectures, these signals synchronize ALU operations with the system clock, latching inputs and outputs at rising or falling edges to maintain data integrity across pipeline stages.[27] Without an enable signal, the ALU may default to a pass-through or hold state, conserving power in idle cycles.[28]
A representative example of opcode decoding appears in single-cycle processor designs, where the control signals derive from the instruction's primary opcode and, for register operations, the function code subfield. The following truth table illustrates a simplified 3-bit opcode mapping for common ALU functions in a MIPS-like architecture (a C model of the same decoding follows the table):

| Opcode | Operation |
|---|---|
| 000 | ADD |
| 001 | SUBTRACT |
| 010 | AND |
| 011 | OR |
| 100 | SLT (set on less than) |
| 101 | NOR |
| 110 | SHIFT LEFT |
| 111 | SHIFT RIGHT |
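A C model of this decoding might look as follows; the function name alu_execute is invented, and the SLT and shift semantics follow the MIPS-like conventions the table assumes.

    #include <stdint.h>

    uint32_t alu_execute(uint8_t opcode, uint32_t a, uint32_t b)
    {
        switch (opcode & 0x7) {                      /* 3-bit opcode */
        case 0x0: return a + b;                      /* ADD */
        case 0x1: return a - b;                      /* SUBTRACT */
        case 0x2: return a & b;                      /* AND */
        case 0x3: return a | b;                      /* OR */
        case 0x4: return (uint32_t)((int32_t)a < (int32_t)b); /* SLT */
        case 0x5: return ~(a | b);                   /* NOR */
        case 0x6: return a << (b & 31);              /* SHIFT LEFT */
        default:  return a >> (b & 31);              /* SHIFT RIGHT (logical) */
        }
    }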
