R4000
The R4000 is a microprocessor developed by MIPS Computer Systems that implements the MIPS III instruction set architecture (ISA). Officially announced on 1 October 1991, it was one of the first 64-bit microprocessors and the first MIPS III implementation. In the early 1990s, when RISC microprocessors were expected to replace CISC microprocessors such as the Intel i486, the R4000 was selected to be the microprocessor of the Advanced Computing Environment (ACE), an industry standard that intended to define a common RISC platform. ACE ultimately failed for a number of reasons, but the R4000 found success in the workstation and server markets.
Models
There are three configurations of the R4000: the R4000PC, an entry-level model with no support for a secondary cache; the R4000SC, a model with secondary cache but no multiprocessor capability; and the R4000MC, a model with secondary cache and support for the cache coherency protocols required by multiprocessor systems.
Description
The R4000 is a scalar superpipelined microprocessor with an eight-stage integer pipeline. During the first stage (IF), a virtual address for an instruction is generated and the instruction translation lookaside buffer (TLB) begins the translation of the address to a physical address. In the second stage (IS), translation is completed and the instruction is fetched from an internal 8 KB instruction cache. The instruction cache is direct-mapped and virtually indexed, physically tagged. It has a 16- or 32-byte line size. Architecturally, it could be expanded to 32 KB.
During the third stage (RF), the instruction is decoded and the register file is read. MIPS III defines two register files, one for the integer unit and the other for floating-point. Each register file is 64 bits wide and contains 32 entries. The integer register file has two read ports and one write port, while the floating-point register file has two read ports and two write ports. Execution begins at stage four (EX) for both integer and floating-point instructions, and results are written back to the register files on completion in stage eight (WB). Results may be bypassed where possible.
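As an illustration of the eight-stage flow, the following minimal sketch prints which stage each instruction occupies in each cycle. It assumes an ideal pipeline with no stalls or bypass effects, and it uses the names DF, DS and TC for stages five to seven; the function name `pipeline_diagram` is made up for this example.

```python
# Stage names for the eight-stage R4000 integer pipeline.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def pipeline_diagram(n_instructions):
    """Return one row per instruction; row[c] is the stage held in cycle c.

    Assumes one instruction enters the pipeline per cycle and nothing stalls.
    """
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i reaches stage s in cycle i + s
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(pipeline_diagram(4)):
        print(f"i{i}: " + " ".join(f"{cell:>2}" for cell in row))
```

Under these assumptions each instruction takes eight cycles from fetch to write-back, yet one instruction completes every cycle once the pipeline is full.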
Integer execution
The R4000 has an arithmetic logic unit (ALU), a shifter, a multiplier and divider, and a load aligner for executing integer instructions. The ALU consists of a 64-bit carry-select adder and a logic unit and is pipelined. The shifter is a 32-bit barrel shifter. It performs 64-bit shifts in two cycles, stalling the pipeline as a result. This design was chosen to save die area. The multiplier and divider are not pipelined and have significant latencies: multiplies have a 10- or 20-cycle latency for 32-bit or 64-bit integers, respectively, whereas divides have a 69- or 133-cycle latency for 32-bit or 64-bit integers, respectively. Most instructions have a single-cycle latency. The ALU adder is also used for calculating virtual addresses for loads, stores and branches.
Load and store instructions are executed by the integer pipeline, and access the on-chip 8 KB data cache.
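The primary caches are described as 8 KB, direct-mapped, virtually indexed and physically tagged, with 16- or 32-byte lines. The sketch below shows how an address could be split under that description. The function name `split_address` is hypothetical, and taking the tag as the physical address above a 4 KB page offset is a simplifying assumption, not the documented tag layout.

```python
CACHE_BYTES = 8 * 1024  # primary cache capacity given in the text
PAGE_BITS = 12          # assumption: 4 KB pages, so the tag starts at bit 12


def split_address(vaddr, paddr, line_bytes):
    """Return (index, offset, tag) for a direct-mapped, virtually indexed,
    physically tagged cache. Illustrative only."""
    if line_bytes not in (16, 32):
        raise ValueError("line size must be 16 or 32 bytes")
    n_lines = CACHE_BYTES // line_bytes        # 512 or 256 lines
    offset = vaddr % line_bytes                # byte within the line
    index = (vaddr // line_bytes) % n_lines    # taken from the virtual address
    tag = paddr >> PAGE_BITS                   # taken from the physical address
    return index, offset, tag


if __name__ == "__main__":
    print(split_address(0x2010, 0x12345010, 16))
    print(split_address(0x2010, 0x12345010, 32))
```

Because the cache is direct-mapped, two addresses that produce the same index but different tags evict each other.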
Floating-point execution
The R4000 has an on-die IEEE 754-1985-compliant floating-point unit (FPU), referred to as the R4010. The FPU is a coprocessor designated CP1[1] (the MIPS ISA defined four coprocessors, designated CP0 to CP3). The FPU can operate in two modes, 32-bit or 64-bit, selected by the FR bit in the CPU status register. In 32-bit mode, the 32 floating-point registers become 32 bits wide when used to hold single-precision floating-point numbers. When used to hold double-precision numbers, there are 16 floating-point registers (the registers are paired).
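The register pairing in 32-bit mode can be sketched as follows. This is a simplified model rather than a description of the hardware: the helper names are invented, and placing the low-order word in the even-numbered register is an assumption of this sketch.

```python
import struct


def store_double_fr0(regs, n, value):
    """Store a double into the even/odd pair (n, n+1) of 32-bit registers."""
    if n % 2:
        raise ValueError("a double must name an even register in 32-bit mode")
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    regs[n] = bits & 0xFFFFFFFF   # assumption: even register holds the low word
    regs[n + 1] = bits >> 32      # odd register holds the high word


def load_double_fr0(regs, n):
    """Reassemble a double from the even/odd pair (n, n+1)."""
    if n % 2:
        raise ValueError("a double must name an even register in 32-bit mode")
    bits = (regs[n + 1] << 32) | regs[n]
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


if __name__ == "__main__":
    regs = [0] * 32                  # 32 registers, each 32 bits wide
    store_double_fr0(regs, 2, 1.5)
    print(hex(regs[2]), hex(regs[3]), load_double_fr0(regs, 2))
    print(len(range(0, 32, 2)))      # number of usable double-precision pairs
```

Only the even register numbers can name a double in this model, which is why 32 registers yield 16 double-precision values.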
The FPU can operate in parallel with the ALU unless there is a data or resource dependency, which causes it to stall. It contains three sub-units: an adder, a multiplier and a divider. The multiplier and divider can execute an instruction in parallel with the adder, but they use the adder in their final stages of execution, which limits how far execution can overlap. Under certain conditions, the FPU can therefore have up to three instructions executing at once, one in each unit. The FPU is capable of retiring one instruction per cycle.
The adder and multiplier are pipelined. The multiplier has a four-stage multiplier pipeline. It is clocked at twice the clock frequency of the microprocessor for adequate performance and uses dynamic logic to achieve the high clock frequency. Division has a 23- or 36-cycle latency for single- or double-precision operations and square root has a 54- or 112-cycle latency. Division and square root use the SRT algorithm.
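To give a feel for these latencies, the sketch below adds up the latency of a chain of dependent floating-point operations. It is a rough estimate only. The division and square-root figures are the ones quoted above; the add (4 cycles) and multiply (7 or 8 cycles) figures are the ones this document gives in its discussion of the floating-point execution unit. The function names and the 100 MHz clock in the example are assumptions.

```python
# Latency in cycles, keyed by (operation, precision): "s" single, "d" double.
LATENCY = {
    ("add", "s"): 4, ("add", "d"): 4,
    ("mul", "s"): 7, ("mul", "d"): 8,
    ("div", "s"): 23, ("div", "d"): 36,
    ("sqrt", "s"): 54, ("sqrt", "d"): 112,
}


def dependent_chain_cycles(ops):
    """Sum latencies for operations that each wait on the previous result."""
    return sum(LATENCY[op] for op in ops)


def cycles_to_ns(cycles, clock_mhz):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1000 / clock_mhz


if __name__ == "__main__":
    # Critical path of sqrt(a*a + b*b) in double precision: mul -> add -> sqrt.
    chain = [("mul", "d"), ("add", "d"), ("sqrt", "d")]
    cycles = dependent_chain_cycles(chain)
    print(cycles, cycles_to_ns(cycles, 100))
```

The estimate ignores overlap between independent operations and any stalls, so it is a lower bound on the dependent path rather than a timing prediction.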
Memory management
The memory management unit (MMU) uses a 48-entry translation lookaside buffer to translate virtual addresses. The R4000 uses a 64-bit virtual address, but only implements 40 of the 64 bits, allowing 1 TB of virtual memory; the remaining bits are checked to ensure that they contain zero. The R4000 uses a 36-bit physical address, thus is able to address 64 GB of physical memory.
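The address widths and the TLB lookup can be illustrated with a small model. It is a sketch under stated assumptions, not the hardware algorithm: pages are fixed at 4 KB, each entry maps an even/odd pair of pages and is matched on an address space identifier (ASID) or a global bit, and the names `TlbEntry` and `translate` are invented for the example.

```python
from dataclasses import dataclass

VA_BITS = 40       # implemented virtual address bits (2**40 bytes = 1 TB)
PA_BITS = 36       # physical address bits (2**36 bytes = 64 GB)
PAGE_BITS = 12     # assumption: fixed 4 KB pages
TLB_ENTRIES = 48


@dataclass
class TlbEntry:
    vpn2: int        # virtual page number of the pair (VPN with low bit dropped)
    asid: int        # address space identifier
    is_global: bool  # global entries match any ASID
    pfn: tuple       # (even_page_frame, odd_page_frame)
    valid: tuple     # (even_valid, odd_valid)


def translate(tlb, vaddr, asid):
    """Translate vaddr to a physical address or raise on a miss."""
    if vaddr >> VA_BITS:
        raise ValueError("unimplemented virtual address bits are not zero")
    vpn = vaddr >> PAGE_BITS
    vpn2, odd = vpn >> 1, vpn & 1
    for entry in tlb[:TLB_ENTRIES]:
        if entry.vpn2 == vpn2 and (entry.is_global or entry.asid == asid):
            if not entry.valid[odd]:
                raise LookupError("matching entry is invalid")
            page_offset = vaddr & ((1 << PAGE_BITS) - 1)
            return (entry.pfn[odd] << PAGE_BITS) | page_offset
    raise LookupError("TLB miss: software must refill the entry")


if __name__ == "__main__":
    tlb = [TlbEntry(0x10, 5, False, (0x100, 0x200), (True, True))]
    print(hex(translate(tlb, 0x21ABC, 5)))
    print(2 ** VA_BITS == 1024 ** 4, 2 ** PA_BITS == 64 * 1024 ** 3)
```

On a miss the model simply raises an exception, standing in for the trap that hands control to the operating system's refill handler.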
Secondary cache
The R4000 (SC and MC configurations only) supports an external secondary cache with a capacity of 128 KB to 4 MB. The cache is accessed via a dedicated 128-bit data bus. The secondary cache can be configured either as a unified cache or as a split instruction and data cache. In the latter configuration, each cache can have a capacity of 128 KB to 2 MB.[2] The secondary cache is physically indexed, physically tagged and has a programmable line size of 128, 256, 512 or 1,024 bytes. The cache controller is on-die. The cache is built from standard static random access memory (SRAM). The data and tag buses are ECC-protected.
System bus
The R4000 uses a 64-bit system bus called the SysAD bus. The SysAD bus is a multiplexed address and data bus; that is, it uses the same set of wires to transfer data and addresses. While this reduces bandwidth, it is also less expensive than providing a separate address bus, which requires more pins and increases the complexity of the system. The SysAD bus can be configured to operate at half, a third or a quarter of the internal clock frequency. The SysAD bus generates its clock signal by dividing the operating frequency.
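The relationship between the internal clock, the bus divisor and the peak transfer rate is simple arithmetic, sketched below. The function names are made up, the clock is passed as a whole number of MHz, and the "peak" figure assumes that every bus cycle carries 8 bytes of data, which a multiplexed bus cannot sustain because some cycles carry addresses.

```python
from fractions import Fraction

BUS_BYTES = 8  # the SysAD bus is 64 bits wide


def sysad_clock_mhz(internal_mhz, divisor):
    """Bus clock for an integer internal clock (MHz) and a divisor of 2, 3 or 4."""
    if divisor not in (2, 3, 4):
        raise ValueError("SysAD divisor must be 2, 3 or 4")
    return Fraction(internal_mhz, divisor)


def peak_bandwidth_mb_s(internal_mhz, divisor):
    """Upper bound on transfer rate in MB/s if every cycle moved data."""
    return sysad_clock_mhz(internal_mhz, divisor) * BUS_BYTES


if __name__ == "__main__":
    for divisor in (2, 3, 4):
        clock = sysad_clock_mhz(100, divisor)
        rate = peak_bandwidth_mb_s(100, divisor)
        print(divisor, float(clock), float(rate))
```

For a 100 MHz internal clock and a divisor of two, this gives a 50 MHz bus and a 400 MB/s upper bound.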
Transistor count, die dimensions and process details
The R4000 contains 1.2 million transistors.[3] It was designed for a 1.0 μm two-layer metal complementary metal–oxide–semiconductor (CMOS) process. As MIPS was a fabless company, the R4000 was fabricated by partners in their own processes, which had a 0.8 μm minimum feature size.[4]
Clocking
The R4000 generates the various clock signals from a master clock signal generated externally. For the operating frequency, the R4000 multiplies the master clock signal by two by use of an on-die phase-locked loop (PLL).
Packaging
The R4000PC is packaged in a 179-pin ceramic pin grid array (CPGA). The R4000SC and R4000MC are packaged in a 447-pin ceramic staggered pin grid array (SPGA). The pin-out of the R4000MC differs from that of the R4000SC: some pins that are unused on the R4000SC carry the cache coherency signals on the R4000MC. The pin-out of the R4000PC is similar to that of the PGA-packaged R4200 and R4600 microprocessors. This characteristic enables a properly designed system to use any of the three microprocessors.
R4400
The R4400 is a further development of the R4000. It was announced in early November 1992. Samples of the microprocessor had been shipped to selected customers before then, with general availability in January 1993. The R4400 operates at clock frequencies of 100, 133, 150, 200, and 250 MHz. The only major improvement from the R4000 is larger primary caches, which were doubled in capacity to 16 KB each from 8 KB each. It contained 2.3 million transistors.
The R4400 was licensed by Integrated Device Technology (IDT), LSI Logic, NEC, Performance Semiconductor, Siemens AG and Toshiba. IDT, NEC, Siemens and Toshiba fabricated and marketed the microprocessor. LSI Logic used the R4400 in custom products. Performance Semiconductor sold their logic division to Cypress Semiconductor where the MIPS microprocessor products were discontinued.
NEC marketed their version as the VR4400. The first version, a 150 MHz part, was announced in November 1992. Early versions were fabricated in a 0.6 μm process.[5] In mid-1995, a 250 MHz part began sampling. It was fabricated in a 0.35 μm four-layer-metal process.[6] NEC also produced the MR4401, a ceramic multi-chip module (MCM) that contained a VR4400SC with ten 1 Mbit SRAM chips that implemented a 1 MB secondary cache. The MCM was pin-compatible with the R4x00PC. The first version, a 150 MHz part, was announced in 1994. In 1995, a 200 MHz part was announced.
Toshiba marketed their version as the TC86R4400. A 200 MHz part containing 2.3 million transistors and measuring 134 mm² fabricated in a 0.3 μm process was introduced in mid-1994. The R4400PC was priced at $1,600, the R4400SC at $1,950, and the R4400MC at $2,150 in quantities of 10,000.[7]
Usage
The R4400 is used by:
- Carrera Computers in their Windows NT personal computers and workstations[8]
- Concurrent Computer Corporation in their real-time multiprocessor Maxion systems[9]
- DeskStation Technology in their Windows NT personal computers and DeskStation Tyne workstation[10]
- Digital Equipment Corporation in their DECstation 5000/260 workstation and server
- NEC Corporation in their RISCstation workstations, RISCserver servers, and Cenju-3 supercomputer[11]
- NeTPower in their Windows NT workstations and servers
- Pyramid Technology in their Nile Series servers, which used the R4400MC[12]
- Siemens Nixdorf Informationssysteme (SNI) in their RM-series UNIX servers and SR2000 mainframe
- Silicon Graphics in their Onyx, Indigo, Indigo2, and Indy workstations; and in their Challenge server
- Tandem Computers in their NonStop Himalaya fault-tolerant servers
Chipsets
The R4000 and R4400 microprocessors were interfaced to the system by custom ASICs or by commercially available chipsets. System vendors such as SGI developed their own ASICs for their systems. Commercial chipsets were developed, fabricated and marketed by companies such as Toshiba, whose Tiger Shark chipset provided an i486-compatible bus.[13]
Notes
- MIPS R4000 Microprocessor User's Manual, Second Edition, p. 152
- Heinrich, "MIPS R4000 Microprocessor User's Manual", p. 248
- Mirapuri, "The Mips R4000 Processor", p. 10
- Mirapuri, "The Mips R4000 Processor", p. 21
- "NEC VR4400 RISC has 2m Transistors". Unigram/X. 30 November 1992. p. 4. Retrieved 6 October 2025.
- "NEC Ready with 250MHz Version of the 64-Bit MIPS R4400 RISC". Unigram/X. 5 June 1995. p. 3. Retrieved 6 October 2025.
- "Toshiba Has 200MHz MIPS R4400". Unigram/X. 30 May 1994. p. 3. Retrieved 6 October 2025.
- "Tangent and Carrera Mimic Indigo at Half the Price". Unigram/X. 12 July 1993. p. 3. Retrieved 6 October 2025.
- "Concurrent Debuts New Multiprocessor Systems with New Bus Architecture". Unigram/X. 18 October 1993. p. 4. Retrieved 6 October 2025.
- "DeskStation Shows ARCStation-1 R-Series Personal Computer for NT". Unigram/X. 30 November 1992. p. 3. Retrieved 6 October 2025.
- Byrnes, Anita (17 May 1993). "NEC Goes after the Business Market with New Unix Workstations". Unigram/X. p. 6. Retrieved 6 October 2025.
- "Pyramid Technology Aims to Crash the Mainframe with Nile Series". Unigram/X. 11 October 1993. Retrieved 6 October 2025.
- "Toshiba Samples 80486-Bus Chip Set For R-Series".
References
- Heinrich, Joe. MIPS R4000 Microprocessor User's Manual, Second Edition.
- Sunil Mirapuri, Michael Woodacre, Nader Vasseghi, "The Mips R4000 Processor," IEEE Micro, vol. 12. no. 2, pp. 10–22, March/April 1992
Development and History
Background and Design Goals
The MIPS R4000 represented a pivotal evolution in the company's R-series processors, transitioning from the 32-bit R3000, introduced in 1988, to a 64-bit superpipelined implementation of the MIPS instruction set architecture (ISA). This shift was motivated by the need to address growing demands for larger memory addressing and enhanced computational capabilities in high-performance computing environments, building directly on the R3000's pipelined design while extending register widths, data paths, and virtual address spaces to 64 bits. The development began in the late 1980s, with MIPS Computer Systems deciding by late 1988 to pursue a true 64-bit CPU as its next-generation processor.[4][5]
Central to the R4000's design goals was achieving high performance, targeting approximately 100 SPECmarks by 1993 through a superpipelined architecture that aimed for an execution rate of about one instruction per cycle via an 8-stage pipeline, alongside support for clock speeds up to 100 MHz. The processor was engineered to provide full 64-bit operations across registers, the arithmetic logic unit (ALU), floating-point unit (FPU), and system bus, enabling efficient handling of large datasets and extended addressing while maintaining complete backward compatibility with 32-bit MIPS software from earlier R-series models like the R2000, R3000, and R6000. This compatibility ensured seamless migration for existing applications in user mode, a key consideration for adoption in UNIX workstations and embedded systems.[6][4]
The R4000's architecture drew heavily from foundational RISC principles, including load-store design, fixed-length instructions, and reliance on optimizing compilers, which had been advanced through academic and industry efforts in the late 1970s and early 1980s. Key influences included MIPS co-founder John Hennessy's work on simplified instruction sets at Stanford, alongside competitive pressures from rival RISC architectures such as Sun Microsystems' SPARC and Hewlett-Packard's PA-RISC, which were vying for dominance in the workstation market during the late 1980s. The core design team at MIPS Computer Systems, including engineers like Sunil Mirapuri, Michael Woodacre, and Nader Vasseghi, integrated these elements to create a highly integrated single-chip solution with on-board caches and memory management unit.[4][7]
Release and Initial Adoption
The MIPS R4000 microprocessor was officially announced by MIPS Computer Systems on October 1, 1991, marking it as one of the first commercially available 64-bit RISC processors and the inaugural implementation of the MIPS III instruction set architecture.[8] This launch came as part of the Advanced Computing Environment (ACE) initiative, a consortium involving partners like Compaq, Digital Equipment Corporation (DEC), and Microsoft, aimed at standardizing hardware and software for high-performance computing.[8] First shipments of the chip began in early 1992, enabling initial system integrations shortly thereafter.[9]
Initial specifications featured an external master clock of 50 MHz, doubling to a 100 MHz internal pipeline clock for enhanced instruction throughput, with power consumption under 2 watts at this frequency.[2][10] However, early production encountered challenges, including high costs exceeding $300 per chip and yield limitations that necessitated fabrication across multiple partners such as Integrated Device Technology (IDT), Performance Semiconductor, LSI Logic, Siemens-Nixdorf, and NEC to meet demand.[11][8] These factors positioned the R4000 as a premium component, primarily for professional and enterprise applications rather than consumer markets.
Adoption accelerated through key partnerships, notably with Silicon Graphics (SGI), which integrated the R4000 into its IRIS Crimson workstation released in January 1992, the first 64-bit SGI system, and extended its use to early supercomputing via the Challenge server series for scalable multiprocessing.[9][12] These deployments highlighted the R4000's suitability for graphics-intensive and scientific workloads, despite initial hurdles, and helped establish MIPS architecture in high-end computing niches during the early 1990s.[8]
Core Architecture
Integer Execution Unit
The integer execution unit of the MIPS R4000 employs a superpipelined design with eight stages to enable higher clock frequencies while maintaining compatibility with the MIPS instruction set architecture. These stages consist of IF (first half of instruction fetch, including PC selection and primary cache access initiation), IS (second half of instruction fetch), RF (register file access and decode), EX (execution of ALU operations and branch condition evaluation), DF (first half of data memory access), DS (second half of data memory access and result selection), TC (tag check for cache hit/miss determination), and WB (writeback to the register file). This structure extends the classic five-stage pipeline by subdividing fetch, decode, execute, memory, and writeback phases, allowing the processor to sustain one instruction completion per cycle in steady state despite the deeper pipeline.[13][14]
The R4000 operates as a scalar processor, issuing and completing one integer instruction per cycle in order, without superscalar dual-issue capabilities; out-of-order completion is likewise not supported, as results are committed in program order during the WB stage. The unit includes a 64-bit integer arithmetic logic unit (ALU) capable of performing operations such as addition, subtraction, bitwise logic (AND, OR, XOR, NOR), comparisons, and shifts on 64-bit operands, with dedicated hardware for multiplication (using a 2-bit Booth recoding algorithm, yielding 10-cycle latency for 32-bit results and 20 cycles for 64-bit) and division (1-bit per iteration, with 32 cycles for 32-bit quotients and 64 cycles for 64-bit). Representative examples include 64-bit add (DADD) for basic arithmetic and doubleword shift left logical variable (DSLLV) for variable shifts of up to 63 bits, emphasizing the unit's support for both 32-bit legacy and native 64-bit computations under the MIPS III ISA.[15][16]
The load/store unit is tightly integrated into the pipeline, handling address generation in the EX stage and memory operations across DF, DS, and TC stages to support 64-bit aligned and unaligned accesses; loads produce results available for forwarding from the DS/TC boundary, while stores commit data during TC upon cache hit confirmation. This design accommodates the 64-bit virtual addressing and data paths, enabling efficient integer load (e.g., LD for 64-bit) and store (SD) instructions without separate address generation hardware beyond the ALU. Branch handling relies on a static delayed branch mechanism rather than dynamic prediction, requiring a one-instruction delay slot that is always executed; untaken branches incur no additional penalty beyond the slot, but taken branches resolved in EX may stall the pipeline by up to three cycles if the delay slot cannot mitigate the fetch disruption, with no branch history table employed.[17][4]
Floating-Point Execution Unit
The R4000 microprocessor incorporates a dedicated Floating-Point Unit (FPU) as Coprocessor 1 (CP1), tightly integrated on-chip to extend the CPU's instruction set for floating-point arithmetic operations. This FPU employs an eight-stage superpipelined architecture that aligns with the integer pipeline, so that FP instructions can execute in overlap with integer instructions. The design facilitates seamless interaction between the CPU and FPU, with FP instructions mapped to coprocessor formats that allow the CPU to stall if the FPU is busy, ensuring data dependencies are resolved before proceeding.[16][4]
The FPU fully complies with the IEEE 754-1985 standard, providing accurate representation and computation for floating-point numbers in single-precision (32-bit) and double-precision (64-bit) formats. It features specialized functional units for addition/subtraction, multiplication, and division, with the pipeline including stages such as Unpack (U), Shift (S), Mantissa Add (A), Multiplier (M/N), Rounding (R), and Exception Test (E), which instructions traverse in operation-specific sequences. For instance, addition and subtraction operations typically complete in 4 cycles for both single and double precision, while multiplication requires 7 cycles for single precision and 8 cycles for double precision; division latencies are longer at 23 cycles (single) and 36 cycles (double), reflecting the iterative nature of these computations. These latencies allow for efficient pipelining, where subsequent instructions can initiate after the repeat rate (e.g., 3 cycles for add, 4 for multiply), minimizing stalls in well-scheduled code.[16][4][18]
Exception handling in the FPU adheres to IEEE 754 recommendations, detecting and signaling conditions such as underflow, overflow, and denormalized operands during execution. These exceptions are recorded in the Floating-Point Control/Status Register (FCR31), which includes fields for Cause, Enable, and Flags to prioritize and manage traps; for example, overflow and underflow take precedence over inexact results. Upon detection, the FPU signals a trap to the CPU, which uses the Exception Program Counter (EPC) in Coprocessor 0 to invoke software handlers, allowing precise recovery or emulation as needed; denormalized numbers are handled with gradual underflow to maintain accuracy without immediate trapping unless enabled. This mechanism ensures robust operation in numerical applications.[16][4]
Memory Management Unit
The MIPS R4000 Memory Management Unit (MMU) implements a 64-bit virtual addressing scheme with backward compatibility to 32-bit addressing, enabling a vast address space divided into distinct segments that vary by mode. In 32-bit compatibility mode, segments include KUSEG for user-mode access (0 to 2^{31} or 2 GB), KSEG0 and KSEG1 for kernel-mode direct physical mapping (cached and uncached, respectively, each 2^{29} or 512 MB), and KSEG2/KSSEG for kernel-mode and supervisor TLB-mapped regions (each 512 MB). In native 64-bit mode, extended segments provide larger spaces, such as XKUSEG for user-mode (up to 2^{40} or 1 TB), and XKSEG for kernel-mode TLB-mapped regions (up to 1 TB). This design supports both 32-bit and 64-bit operations, with virtual addresses translated to 36-bit physical addresses in 64-bit mode, allowing for efficient handling of large memory configurations in multiprocessor systems.[16][4]
At the core of the MMU is a fully associative 48-entry Translation Lookaside Buffer (TLB) with 64-bit entries, each mapping two consecutive pages (odd/even pairs) for a total of 96 effective pages and supporting software expansion through refill mechanisms. The TLB entries include fields for virtual page number (VPN), physical frame number (PFN), address space identifier (ASID) for process isolation, page mask for variable sizing, and attribute bits such as valid (V), dirty (D) for writability, and global for shared mappings. Page sizes range from 4 KB to 16 MB in powers-of-four increments (4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB, 16 MB), facilitated by multi-level page tables managed in software to accommodate hierarchical translation structures during misses.[16][15]
Protection is enforced through user/kernel mode separation via the Status register's KSU (Kernel/Supervisor/User) bits, restricting access to segments accordingly, alongside per-page permissions controlled by TLB attributes: the valid bit governs read and execute access (with instruction fetches treated as reads), while the dirty bit enables write permissions. Violations trigger privilege or access exceptions, ensuring secure isolation between user and kernel spaces. TLB misses are handled by generating specific exceptions, such as TLB Refill for fast handler intervention or general TLB Miss/Invalid for full traps, which invoke software routines to probe multi-level page tables using Context or XContext registers, load the appropriate translation into the TLB via instructions like TLBWI (write indexed) or TLBWR (write random), and restart the faulting instruction from the Exception Program Counter (EPC). This software-managed approach allows flexible OS policies while minimizing hardware overhead.[16][19]
Integer load and store instructions interact with the MMU by generating virtual addresses that undergo TLB lookup for translation and protection checks prior to cache access.[16]
Cache and System Interface
Secondary Cache Design
The R4000 microprocessor incorporates an on-chip primary cache system based on a Harvard architecture, featuring a separate 8 KB instruction cache (I-cache) and an 8 KB data cache (D-cache). Both caches are direct-mapped with virtual indexing and physical tagging, with software-programmable line sizes of 16 or 32 bytes to optimize access latency within the processor's pipeline. The I-cache supports up to two accesses per clock cycle for instruction fetches, while the D-cache handles load and store operations through dedicated 64-bit data paths.[16]
The external secondary cache interface enables support for off-chip cache implementations ranging from 128 KB to 4 MB in joint (unified instruction/data) mode or up to 2 MB in split (separate instruction/data) mode, configured as direct-mapped. This secondary cache is physically indexed and tagged, with line sizes programmable from 16 to 128 bytes (4 to 32 words), and requires an external tag RAM array that is 25 bits wide per entry, including valid, dirty, and error-checking fields with 7-bit ECC for single-error correction and double-error detection. Cache indexing relies on physical addresses generated by the memory management unit to ensure consistency across virtual-to-physical translations. The tag RAM is managed via processor registers such as TagLo and TagHi, allowing software control over cache operations like index invalidation and tag loading.[16]
Cache coherency in the R4000 follows a MESI-like protocol extended for multiprocessor environments, with five states per cache line: Invalid, Shared, Clean Exclusive, Dirty Exclusive, and Dirty Shared. This protocol maintains consistency between the primary and secondary caches, as well as across multiple processors, using interventions such as write invalidates for shared data and updates for modified lines. In multiprocessor setups, coherency attributes like Uncached, Noncoherent, Sharable, Update, and Exclusive are applied to memory regions to enforce appropriate synchronization.[16]
The secondary cache employs a write-back policy, where modified (dirty) lines are retained until eviction, at which point they are written back to main memory. To handle write-back efficiently, the R4000 includes a victim buffer that temporarily stores dirty lines displaced from the cache, reducing bus traffic and supporting burst transfers of 4- or 8-word blocks during write cycles. This design minimizes latency for write operations while ensuring data integrity in both uniprocessor and multiprocessor configurations.[16]
System Bus Specifications
The R4000 microprocessor employs a 64-bit multiplexed address and data bus known as the SysAD bus, which serves as the primary interface for memory and I/O interactions in MIPS-based systems. This bidirectional bus combines address and data signals on the same 64 wires, enabling efficient transfers by first asserting the address during the initial cycle and then multiplexing data on subsequent cycles. Operating within a 36-bit physical address space (supporting up to 64 GB), the SysAD bus facilitates both big-endian and little-endian byte ordering and includes an 8-bit check bus (SysADC) for error detection on the main lines, along with a 9-bit command bus (SysCmd) protected by a parity bit (SysCmdP) to specify transaction types.[16]
Bus arbitration is handled externally via a centralized controller, accommodating both single-processor and multiprocessor configurations with built-in support for cache coherency through snoop protocols. The processor asserts a request signal (Req) to gain bus mastery, which is granted by the external arbiter via the Gnt signal; alternatively, external agents can request control using the low-active ExtRqst* input, with the processor responding via the low-active Release* output to relinquish the bus for one cycle. This mechanism ensures orderly access in shared environments, with the system clock (SysClk) driving all transactions on its rising edge and running at half the processor clock frequency for synchronization. Electrical specifications align with TTL-compatible levels, requiring a stable +5V supply (VccOk above 4.75V for over 100 ms) and supporting peak data rates of 400 Mbytes/second at 50 MHz, with timings controlled by delays of 0.5T, 0.75T, or T relative to the master clock period.[16]
Transaction types on the SysAD bus include single-beat reads and writes for individual 64-bit doublewords, as well as burst modes optimized for cache line fills, transferring four consecutive 64-bit words (256 bits total) in a single operation to minimize latency. Additional commands cover invalidates and updates for cache maintenance, snoops for coherence in multiprocessor setups, and specialized loads/stores (e.g., LDL/SDR for partial doubleword handling across boundaries). The ValidIn* and ValidOut* signals (low-active) indicate valid data from external agents or the processor, respectively, while the IvdErr* signal reports invalid errors during transactions. Secondary cache tags are interfaced through this bus for coherence checks.[16]
The following table summarizes key SysAD bus signals:
| Signal | Type | Description |
|---|---|---|
| SysAD[63:0] | Bidirectional | 64-bit multiplexed address/data lines. |
| SysADC[7:0] | Bidirectional | 8-bit check bits for SysAD error detection. |
| SysCmd[8:0] | Bidirectional | 9-bit bus encoding transaction commands and data identifiers. |
| SysCmdP | Bidirectional | Parity bit for SysCmd integrity. |
| SysClk | Input | System clock for latching and sampling (half processor clock rate). |
| Req | Output | Processor bus request assertion. |
| Gnt | Input | External grant for bus access. |
| ExtRqst* | Input | Low-active external bus request (U2 timing). |
| Release* | Output | Low-active bus release to external agent. |
| ValidIn* | Input | Low-active valid data indicator from external source. |
| ValidOut* | Output | Low-active valid data indicator from processor. |
| IvdErr* | Bidirectional | Invalid error reporting during bus operations. |
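Because addresses and data share the SysAD wires, part of every transaction is spent on the address cycle. The sketch below estimates the resulting efficiency for single-beat and four-beat burst reads. It is a simplified model: it counts one address cycle plus one cycle per 64-bit data beat, and the `wait_cycles` parameter is a hypothetical stand-in for memory latency and bus turnaround, which the real protocol adds on top.

```python
BEAT_BYTES = 8  # one 64-bit doubleword per data cycle


def read_cycles(data_beats=4, wait_cycles=0):
    """Bus cycles for one read: address cycle, optional waits, data beats."""
    return 1 + wait_cycles + data_beats


def effective_mb_s(bus_mhz, data_beats=4, wait_cycles=0):
    """Data actually delivered per second (MB/s) for back-to-back reads."""
    bytes_moved = data_beats * BEAT_BYTES
    return bytes_moved * bus_mhz / read_cycles(data_beats, wait_cycles)


if __name__ == "__main__":
    print(read_cycles(), effective_mb_s(50))                 # four-beat burst
    print(read_cycles(1), effective_mb_s(50, data_beats=1))  # single beat
```

In this model a four-beat burst at 50 MHz delivers 32 bytes in five cycles, so bursts make much better use of the multiplexed bus than single-beat transfers, which spend half their cycles on the address.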
Physical Implementation
Process Technology and Die Details
The R4000 microprocessor was initially fabricated using a 1.0 μm complementary metal-oxide-semiconductor (CMOS) process with two layers of metal interconnection.[20] As a fabless design from MIPS Technologies, production was handled by licensed partners including Toshiba, NEC, LSI Logic, and Integrated Device Technology (IDT), who adapted the core to their specific fabrication capabilities.[3]
The initial R4000 die measured 213 mm² and incorporated 1.35 million transistors, enabling its superpipelined architecture within the constraints of early 1990s semiconductor manufacturing.[3][21] Subsequent iterations, such as the 100 MHz clocked version, reduced the die size to 165 mm² while maintaining approximately 1.3 million transistors, reflecting optimizations in layout density.[3]
Power dissipation for the R4000 varied with operating frequency and supply voltage, typically scaling linearly with clock speed under constant voltage conditions; related variants like the R4400 consumed around 20 W at 150 MHz and 5 V.[22] Initial models operated at 5 V, contributing to thermal management challenges in high-performance systems.
Clocking Mechanisms
The R4000 uses an on-chip phase-locked loop (PLL) to generate its internal processor clock (PClock) by multiplying the external master clock (MasterClock) frequency by two, so the pipeline runs at twice the MasterClock rate. The PLL synchronizes the key internal clocks, including PClock, the system clock (SClock), the transmit clock (TClock), and the receive clock (RClock), to the MasterClock input. It relies on dedicated passive components (resistors and capacitors) connected to the PLLCap pins for stable operation and for lock acquisition during reset. Quiet power supplies (VccP and VssP) reduce noise on the PLL, supporting reliable 2x frequency multiplication with phases aligned across the chip.
Clock distribution emphasizes phase alignment with the eight-stage pipeline, with signals routed to minimize skew between MasterClock and the internal clocks. This is achieved through SyncIn/SyncOut mechanisms that adjust timing delays (e.g., 0.5T, 0.75T, or 1T, where T is the MasterClock period) and a Δi/Δt control system that suppresses ground bounce during high-frequency transitions. SClock is derived by dividing PClock by a configurable ratio of 2, 3, or 4, selected via boot-mode settings or the Config register; it drives the external system interface while remaining synchronized with the pipeline stages.
Initial production parts supported external MasterClock frequencies from 25 to 50 MHz, yielding internal PClock rates of 50 to 100 MHz. For stable PLL locking and operation, MasterClock must have a near-50% duty cycle and low jitter, within vendor-specified tolerances (typically under 500 ps peak-to-peak), to prevent synchronization failures or performance degradation.
Packaging Options
The R4000 was offered in distinct packages to support a range of system designs, from high-end workstations requiring full bus and cache interfaces to cost-optimized embedded or desktop applications. Models with the complete system interface, the R4000SC and R4000MC, used a 447-pin ceramic staggered pin grid array (SPGA) or land grid array/pin grid array (LGA/PGA), providing connections for the 64-bit SysAD bus, the 128-bit secondary cache data path, and additional control signals.[15][4] The R4000PC, aimed at cost-reduced systems with limited cache requirements, used a 179-pin ceramic pin grid array (CPGA) that omitted the secondary cache pins while preserving the essential system interface signals.[4][23]
Thermal management in these packages emphasized heat dissipation because of the processor's superpipelined design and its power needs at external clock frequencies up to 50 MHz; the PGA variants required a dedicated heatsink.[4] Power and ground pins were distributed across the package for reliability, with multiple Vcc and Vss pins (e.g., A2 and A4 for Vcc, A3 and A6 for Vss in the 179-pin package) alongside isolated VccP and VssP pins (e.g., K17 and K16) that supply the PLL to reduce noise and keep clocking stable.[23]
The original R4000 series used PGA and CPGA formats. Later derivatives such as the R5000 introduced ball grid array (BGA) packaging, such as a 272-pin BGA, for higher pin density and integration, though BGA saw limited adoption within the core R4000 lineup.[24]
Variants and Derivatives
Primary R4000 Models
The primary R4000 models are the R4000PC, R4000SC, and R4000MC, each aimed at different performance and system integration needs while sharing the same 64-bit MIPS III core with integrated integer and floating-point units. MIPS Computer Systems introduced these models in 1991, with initial sampling in late 1991 and volume production beginning in early 1992; they remained available through 1995 before being largely supplanted by enhanced derivatives. R4000 family production contributed to MIPS exceeding 1 million chip shipments overall by 1993, though figures for individual models are not detailed in contemporary reports.[20][25]
The R4000PC targets cost-sensitive uniprocessor systems such as desktops and embedded controllers. It incorporates separate 8 KB on-chip instruction and data caches and has no interface for an external secondary cache. It operates at external clock speeds of 40–50 MHz (internal pipeline up to 100 MHz via clock doubling) and uses a 179-pin PGA package with a 64-bit multiplexed system bus interface providing up to 400 MB/s of peak bandwidth at 50 MHz. This model has six dedicated interrupt pins and lacks cache coherency logic, prioritizing simplicity and low power over high-end scalability.[4][26][16]
The R4000SC serves high-performance uniprocessor applications such as workstations and servers. It has the same 8 KB primary caches plus a dedicated 128-bit external secondary cache interface supporting 128 KB to 4 MB of off-chip cache with line sizes of 4 to 32 words. It runs at 40–50 MHz external clocks (internal up to 100 MHz) and uses a 447-pin LGA or PGA package that includes a 25-bit tag bus for secondary cache management and a single interrupt pin, but it has no multiprocessing support. This configuration offers higher memory bandwidth and hit rates than the R4000PC, at the cost of increased system complexity.[4][2][26]
The R4000MC extends the R4000SC design for multiprocessor environments, adding snoop logic and cache coherency protocols that handle invalidate, update, and acknowledgment signals across multiple processors. It retains the 8 KB primary caches and the secondary cache interface (128 KB–4 MB), operates at the same 40–50 MHz external speeds, and uses the 447-pin LGA/PGA package with additional pins for non-maskable interrupts and multiprocessor bus arbitration. This variant supports master/checker modes and is suited to symmetric multiprocessing systems, providing protocol-level coherence without requiring external controllers.[4][15][26]
| Model | Primary Caches | Secondary Cache Interface | Clock Speeds (External/Internal) | Package | Key Differentiator |
|---|---|---|---|---|---|
| R4000PC | 8 KB I + 8 KB D | None | 40–50 / 80–100 MHz | 179-pin PGA | Cost-optimized uniprocessor |
| R4000SC | 8 KB I + 8 KB D | 128-bit (128 KB–4 MB) | 40–50 / 80–100 MHz | 447-pin LGA/PGA | High-performance uniprocessor |
| R4000MC | 8 KB I + 8 KB D | 128-bit (128 KB–4 MB) | 40–50 / 80–100 MHz | 447-pin LGA/PGA | Multiprocessor coherency |
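
The 400 MB/s figure quoted for the 64-bit system bus at 50 MHz follows from simple arithmetic. The sketch below reproduces it under two assumptions not stated in the text: one transfer per bus clock, and 1 MB taken as 10^6 bytes. The function name is hypothetical.

```python
def peak_bandwidth_mb_per_s(bus_width_bits: int, bus_clock_mhz: float) -> float:
    """Peak bus bandwidth in MB/s, assuming one transfer per bus clock."""
    bytes_per_transfer = bus_width_bits / 8
    # MHz is 10^6 cycles/s, so bytes * MHz gives 10^6 bytes/s directly.
    return bytes_per_transfer * bus_clock_mhz


if __name__ == "__main__":
    for external_mhz in (40.0, 50.0):
        print(external_mhz, peak_bandwidth_mb_per_s(64, external_mhz))
```

This is a peak figure. Because the bus is multiplexed, address and data share the same lines, so sustained throughput is lower than the value computed here.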
