Microcode

from Wikipedia

In processor design, microcode serves as an intermediary layer situated between the central processing unit (CPU) hardware and the programmer-visible instruction set architecture of a computer.[1] It consists of a set of hardware-level instructions that implement the higher-level machine code instructions or control internal finite-state machine sequencing in many digital processing components. While microcode is utilized in Intel and AMD general-purpose CPUs in contemporary desktops and laptops, it functions only as a fallback path for scenarios that the faster hardwired control unit is unable to manage.[2]

Housed in special high-speed memory, microcode translates machine instructions, state machine data, or other input into sequences of detailed circuit-level operations. It separates the machine instructions from the underlying electronics, thereby enabling greater flexibility in designing and altering instructions. Moreover, it facilitates the construction of complex multi-step instructions, while simultaneously reducing the complexity of computer circuits. The act of writing microcode is often referred to as microprogramming, and the microcode in a specific processor implementation is sometimes termed a microprogram.

Through extensive microprogramming, microarchitectures of smaller scale and simplicity can emulate more robust architectures with wider word lengths, additional execution units, and so forth. This approach provides a relatively straightforward method of ensuring software compatibility between different products within a processor family.

Overview

Instruction sets

At the hardware level, processors contain a number of separate areas of circuitry, or "units", that perform different tasks. Commonly found units include the arithmetic logic unit (ALU) which performs instructions such as addition or comparing two numbers, circuits for reading and writing data to external memory, and small areas of onboard memory to store these values while they are being processed. In most designs, additional high-performance memory, the register file, is used to store temporary values, not just those needed by the current instruction.[3]

To properly perform an instruction, the various circuits have to be activated in order. For instance, it is not possible to add two numbers if they have not yet been loaded from memory. In RISC designs, the proper ordering of these instructions is largely up to the programmer, or at least to the compiler of the programming language they are using. So to add two numbers in memory and store the result in memory, for instance, the compiler may output instructions to load one of the values into one register, the second into another, perform the addition function in the ALU, putting the result into a register, and then store that register into memory.[3]

As the sequence of instructions needed to complete this higher-level concept, "add these two numbers in memory", may require multiple instructions, this can represent a performance bottleneck if those instructions are stored in main memory. Reading those instructions one by one takes time that could be used to read and write the actual data. For this reason, it is common for non-RISC designs to have many different instructions that differ largely on where they store data. For instance, the MOS 6502 has eight variations of the addition instruction, ADC, which differ only in where they look to find the two operands.[4]

Using the variation of the instruction, or "opcode", that most closely matches the ultimate operation can reduce the number of instructions to one, saving memory used by the program code and improving performance by leaving the data bus open for other operations. Internally, however, these instructions are not separate operations, but sequences of the operations the units actually perform. Converting a single instruction read from memory into the sequence of internal actions is the duty of the control unit, another unit within the processor.[5]
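The control unit's job of expanding one fetched opcode into a sequence of internal actions can be pictured as a lookup table. The sketch below is purely illustrative (the opcode names and micro-operation names are invented, not those of any real processor):

```python
# Hypothetical sketch: a control unit modeled as a table mapping each opcode
# to the sequence of internal micro-operations it triggers. All names here
# are illustrative only.
MICRO_SEQUENCES = {
    # "add memory operand to accumulator" expands into several internal steps
    "ADC_ABSOLUTE": [
        "fetch_operand_address",   # read the address bytes after the opcode
        "read_memory_operand",     # load the operand from that address
        "alu_add_with_carry",      # perform the addition in the ALU
        "write_accumulator",       # store the result in the accumulator
    ],
    # the immediate variant skips the memory read entirely
    "ADC_IMMEDIATE": [
        "fetch_immediate_operand",
        "alu_add_with_carry",
        "write_accumulator",
    ],
}

def decode(opcode: str) -> list[str]:
    """Return the internal steps the control unit sequences for an opcode."""
    return MICRO_SEQUENCES[opcode]

print(decode("ADC_IMMEDIATE"))
```

Note how the two variants of the "same" addition differ only in how the operand is located, mirroring the 6502's eight ADC variations described above.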

Microcode

The basic idea behind microcode is to replace the custom hardware logic implementing the instruction sequencing with a series of simple instructions run in a "microcode engine" in the processor. Whereas a custom logic system might have a series of diodes and gates that output a series of voltages on various control lines, the microcode engine is connected to these lines instead, and these are turned on and off as the engine reads the microcode instructions in sequence. The microcode instructions are often bit encoded to those lines, for instance, if bit 8 is true, that might mean that the ALU should be paused awaiting data. In this respect microcode is somewhat similar to the paper rolls in a player piano, where the holes represent which key should be pressed.
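The bit-to-control-line mapping can be sketched in a few lines. In this hypothetical example (the line names and bit assignments are invented), each microcode word directly asserts whichever control lines its set bits correspond to:

```python
# Minimal sketch of a microcode engine: each microinstruction is a bit field
# in which individual bits drive named control lines directly. The line
# assignments below are invented for illustration.
CONTROL_LINES = {
    0: "latch_register_A",
    1: "latch_register_B",
    2: "alu_add",
    3: "write_result",
    8: "alu_pause",        # e.g. bit 8 true: the ALU pauses awaiting data
}

def active_lines(microword: int) -> list[str]:
    """Decode one microcode word into the control lines it asserts."""
    return [name for bit, name in CONTROL_LINES.items() if microword >> bit & 1]

# Running a microprogram is just reading words in sequence, like the holes
# passing over the tracker bar of a player piano.
microprogram = [0b0000_0011,   # latch both operand registers
                0b0000_0100,   # perform the addition
                0b0000_1000]   # write the result
for word in microprogram:
    print(active_lines(word))
```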

The distinction between custom logic and microcode may seem small: one uses a pattern of diodes and gates to decode the instruction and produce a sequence of signals, whereas the other encodes the signals as microinstructions that are read in sequence to produce the same results. The critical difference is that in a custom logic design, changes to the individual steps require the hardware to be redesigned; with microcode, all that changes is the code stored in the memory containing the microcode. This makes it much easier to fix problems in a microcode system. It also means that there is no effective limit to the complexity of the instructions; they are limited only by the amount of memory one is willing to use.

The lowest layer in a computer's software stack is traditionally raw machine code instructions for the processor. In microcoded processors, fetching and decoding those instructions, and executing them, may be done by microcode. To avoid confusion, each microprogram-related element is differentiated by the micro prefix: microinstruction, microassembler, microprogrammer, etc.[6]

Complex digital processors may also employ more than one (possibly microcode-based) control unit in order to delegate sub-tasks that must be performed essentially asynchronously in parallel. For example, the VAX 9000 has a hardwired IBox unit to fetch and decode instructions, which it hands to a microcoded EBox unit to be executed,[7] and the VAX 8800 has both a microcoded IBox and a microcoded EBox.[8]

A high-level programmer, or even an assembly language programmer, does not normally see or change microcode. Unlike machine code, which often retains some backward compatibility among different processors in a family, microcode only runs on the exact electronic circuitry for which it is designed, as it constitutes an inherent part of the particular processor design itself.

Design

Engineers normally write the microcode during the design phase of a processor, storing it in a read-only memory (ROM) or programmable logic array (PLA)[9] structure, or in a combination of both.[10] However, machines also exist that have some or all microcode stored in static random-access memory (SRAM) or flash memory. This is traditionally denoted as writable control store in the context of computers, which can be either read-only or read–write memory. In the latter case, the CPU initialization process loads microcode into the control store from another storage medium, with the possibility of altering the microcode to correct bugs in the instruction set, or to implement new machine instructions.

Microprograms

Microprograms consist of series of microinstructions, which control the CPU at a very fundamental level of hardware circuitry. For example, a single typical horizontal microinstruction might specify the following operations simultaneously:

  • Connect register 1 to the A side of the ALU
  • Connect register 7 to the B side of the ALU
  • Set the ALU to perform two's-complement addition
  • Set the ALU's carry input to zero
  • Connect the ALU output to register 8
  • Update the condition codes from the ALU status flags (negative, zero, overflow, and carry)
  • Microjump to a given μPC address for the next microinstruction
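The operations above become fields packed side by side into one wide word. The field names and widths in this sketch are invented for illustration, but the packing scheme is the essence of a horizontal microinstruction:

```python
# Hypothetical packing of the fields listed above into one wide horizontal
# microinstruction. Field names and bit widths are invented for illustration.
FIELDS = [             # (name, bit width)
    ("a_bus_reg", 4),  # register driven onto the ALU's A side
    ("b_bus_reg", 4),  # register driven onto the ALU's B side
    ("alu_op", 3),     # operation selector (e.g. two's-complement add)
    ("carry_in", 1),   # forced carry input
    ("dest_reg", 4),   # register that latches the ALU output
    ("set_cc", 1),     # update condition codes from the ALU status flags
    ("next_upc", 12),  # microjump target for the next microinstruction
]

def pack(values: dict[str, int]) -> int:
    """Assemble field values into a single microinstruction word."""
    word, shift = 0, 0
    for name, width in FIELDS:
        word |= (values[name] & ((1 << width) - 1)) << shift
        shift += width
    return word

# One word encodes every one of the simultaneous operations listed above.
word = pack({"a_bus_reg": 1, "b_bus_reg": 7, "alu_op": 0b001,
             "carry_in": 0, "dest_reg": 8, "set_cc": 1, "next_upc": 0x040})
print(f"{word:08x}")
```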

To control all of a processor's features simultaneously in one cycle, the microinstruction is often wider than 50 bits; e.g., 128 bits on a 360/85 with an emulator feature. Microprograms are carefully designed and optimized for the fastest possible execution, as a slow microprogram would result in a slow machine instruction and degraded performance for application programs that use such instructions.

Justification

Microcode was originally developed as a simpler method of developing the control logic for a computer. Initially, CPU instruction sets were hardwired. Each step needed to fetch, decode, and execute the machine instructions (including any operand address calculations, reads, and writes) was controlled directly by combinational logic and rather minimal sequential state machine circuitry. While such hard-wired processors were very efficient, the need for powerful instruction sets with multi-step addressing and complex operations (see below) made them difficult to design and debug; highly encoded and varied-length instructions can contribute to this as well, especially when very irregular encodings are used.

Microcode simplified the job by allowing much of the processor's behaviour and programming model to be defined via microprogram routines rather than by dedicated circuitry. Even late in the design process, microcode could easily be changed, whereas hard-wired CPU designs were very cumbersome to change. Thus, this greatly facilitated CPU design.

From the 1940s to the late 1970s, a large portion of programming was done in assembly language; higher-level instructions mean greater programmer productivity, so an important advantage of microcode was the relative ease with which powerful machine instructions can be defined. The ultimate extension of this is "Directly Executable High Level Language" designs, in which each statement of a high-level language such as PL/I is entirely and directly executed by microcode, without compilation. The IBM Future Systems project and Data General Fountainhead Processor are examples of this. During the 1970s, CPU speeds grew more quickly than memory speeds and numerous techniques such as memory block transfer, memory pre-fetch and multi-level caches were used to alleviate this. High-level machine instructions, made possible by microcode, helped further, as fewer, more complex machine instructions require less memory bandwidth. For example, an operation on a character string can be done as a single machine instruction, thus avoiding multiple instruction fetches.

Architectures with instruction sets implemented by complex microprograms included the IBM System/360 and Digital Equipment Corporation VAX. The approach of increasingly complex microcode-implemented instruction sets was later called complex instruction set computer (CISC). An alternate approach, used in many microprocessors, is to use one or more programmable logic array (PLA) or read-only memory (ROM) (instead of combinational logic) mainly for instruction decoding, and let a simple state machine (without much, or any, microcode) do most of the sequencing. The MOS Technology 6502 is an example of a microprocessor using a PLA for instruction decode and sequencing. The PLA is visible in photomicrographs of the chip,[11] and its operation can be seen in the transistor-level simulation.

Microprogramming is still used in modern CPU designs. In some cases, after the microcode is debugged in simulation, logic functions are substituted for the control store.[citation needed] Logic functions are often faster and less expensive than the equivalent microprogram memory.

Benefits

A processor's microprograms operate on a more primitive, totally different, and much more hardware-oriented architecture than the assembly instructions visible to normal programmers. In coordination with the hardware, the microcode implements the programmer-visible architecture. The underlying hardware need not have a fixed relationship to the visible architecture. This makes it easier to implement a given instruction set architecture on a wide variety of underlying hardware micro-architectures.

The IBM System/360 has a 32-bit architecture with 16 general-purpose registers, but most System/360 implementations use hardware implementing a much simpler underlying microarchitecture. For example, the System/360 Model 30 has 8-bit data paths to the arithmetic logic unit (ALU) and main memory and implements the general-purpose registers in a special unit of higher-speed core memory, while the System/360 Model 40 has 8-bit data paths to the ALU and 16-bit data paths to main memory, likewise implementing the general-purpose registers in a special unit of higher-speed core memory. The Model 50 has full 32-bit data paths and implements the general-purpose registers in a special unit of higher-speed core memory.[12] The Model 65 through the Model 195 have larger data paths and implement the general-purpose registers in faster transistor circuits.[citation needed] In this way, microprogramming enabled IBM to design many System/360 models with substantially different hardware, spanning a wide range of cost and performance, while making them all architecturally compatible. This dramatically reduced the number of unique system software programs that had to be written for each model.

The Digital Equipment Corporation PDP-11 is a 16-bit architecture with eight general-purpose registers. It was introduced in 1970 and the basic architecture remained unchanged through the 1990s. Only the original machine in the series was not microcoded. From 1972 to 1976, the width of the PDP-11's underlying microprogram varied from 22 to 64 bits, and the length of the microprogram varied from 256 to 1,024 words, with longer microprograms generally corresponding to narrower widths.[13] The variety of microprogram widths implies that there were at least seven different implementations of the PDP-11 in just four years.

A similar approach was used by Digital Equipment Corporation in their VAX family of computers. As a result, different VAX processors use different microarchitectures, yet the programmer-visible architecture does not change.

Microprogramming also reduces the cost of field changes to correct defects (bugs) in the processor; a bug can often be fixed by replacing a portion of the microprogram rather than by changes being made to hardware logic and wiring.

History

Early examples

The ACE computer, designed by Alan Turing in 1946, used microprogramming.[14]

In 1947, the design of the MIT Whirlwind introduced the concept of a control store as a way to simplify computer design and move beyond ad hoc methods. The control store is a diode matrix: a two-dimensional lattice, where one dimension accepts "control time pulses" from the CPU's internal clock, and the other connects to control signals on gates and other circuits. A "pulse distributor" takes the pulses generated by the CPU clock and breaks them up into eight separate time pulses, each of which activates a different row of the lattice. When the row is activated, it activates the control signals connected to it.[15]

In 1951, Maurice Wilkes[16] enhanced this concept by adding conditional execution, a concept akin to a conditional in computer software. His initial implementation consisted of a pair of matrices: the first one generated signals in the manner of the Whirlwind control store, while the second matrix selected which row of signals (the microprogram instruction word, so to speak) to invoke on the next cycle. Conditionals were implemented by providing a way that a single line in the control store could choose from alternatives in the second matrix. This made the control signals conditional on the detected internal signal. Wilkes coined the term microprogramming to describe this feature and distinguish it from a simple control store.

The 360

Microcode remained relatively rare in computer design as the cost of the ROM needed to store the code was not significantly different from the cost of custom control logic. This changed through the early 1960s with the introduction of mass-produced core memory and core rope, which was far less expensive than dedicated logic based on diode arrays or similar solutions. The first to take real advantage of this was IBM in their 1964 System/360 series. This allowed the machines to have a very complex instruction set, including operations that matched high-level language constructs like formatting binary values as decimal strings, encoding the complex series of internal steps needed for this task in low cost memory.[17]

But the real value in the 360 line was that one could build a series of machines that were completely different internally yet ran the same ISA. For a low-end machine, one might use an 8-bit ALU that requires multiple cycles to complete a single 32-bit addition, while a higher-end machine might have a full 32-bit ALU that performs the same addition in a single cycle. These differences could be implemented in control logic, but the cost of implementing a completely different decoder for each machine would be prohibitive. Using microcode meant that all that changed was the code in the ROM. For instance, one machine might include a floating-point unit, so its microcode for multiplying two numbers might be only a few lines long, whereas on the same machine without the FPU this would be a program that performed the same multiplication using multiple additions; all that changed was the ROM.[17]

The outcome of this design was that customers could use a low-end model of the family to develop their software, knowing that if more performance was ever needed, they could move to a faster version and nothing else would change. This lowered the barrier to entry and the 360 was a runaway success. By the end of the decade, the use of microcode was de rigueur across the mainframe industry.

Moving up the line

The microcode (and "nanocode") of the Motorola 68000 is stored in the two large square blocks at the upper right of the die and controlled by circuitry to their right. It takes up a significant amount of the total chip surface.

Early minicomputers were far too simple to require microcode, and were more similar to earlier mainframes in terms of their instruction sets and the way they were decoded. But it was not long before their designers began using more powerful integrated circuits that allowed for more complex ISAs. By the mid-1970s, most new minicomputers and superminicomputers were using microcode as well, such as most models of the PDP-11 and, most notably, most models of the VAX, which included high-level instructions not unlike those found in the 360.[18]

The same basic evolution occurred with microprocessors as well. Early designs were extremely simple, and even the more powerful 8-bit designs of the mid-1970s like the Zilog Z80 had instruction sets that were simple enough to be implemented in dedicated logic. By this time, the control logic could be patterned into the same die as the CPU, making the difference in cost between ROM and logic less of an issue. However, it was not long before these companies were also facing the problem of introducing higher-performance designs but still wanting to offer backward compatibility. Among early examples of microcode in micros was the Intel 8086.[5]

Among the ultimate implementations of microcode in microprocessors is the Motorola 68000. It offered a highly orthogonal instruction set with a wide variety of addressing modes, all implemented in microcode. This did not come without cost: according to early articles, about 20% of the chip's surface area (and thus cost) is the microcode system,[19] and a portion[citation needed] of the chip's 68,000 transistors were part of the microcode system.

RISC enters

While companies continued to compete on the complexity of their instruction sets, and the use of microcode to implement these was unquestioned, in the mid-1970s an internal project at IBM was raising serious questions about the entire concept. As part of a project to develop a high-performance all-digital telephone switch, a team led by John Cocke began examining huge volumes of performance data from their customers' System/360 (and System/370) programs. This led them to notice a curious pattern: when the ISA presented multiple versions of an instruction, the compiler almost always used the simplest one, instead of the one most directly representing the code. They learned that this was because those instructions were always implemented in hardware, and thus ran the fastest. Using the other instruction might offer higher performance on some machines, but there was no way to know what machine they were running on. This defeated the purpose of using microcode in the first place, which was to hide these distinctions.[20]

The team came to a radical conclusion: "Imposing microcode between a computer and its users imposes an expensive overhead in performing the most frequently executed instructions."[20]

The result of this discovery was what is today known as the RISC concept. The complex microcode engine and its associated ROM are reduced or eliminated entirely, and those circuits are instead dedicated to things like additional registers or a wider ALU, which increases the performance of every program. When complex sequences of instructions are needed, this is left to the compiler, which is the entire purpose of using a compiler in the first place. The basic concept was soon picked up by university researchers in California, where simulations suggested such designs would trivially outperform even the fastest conventional designs. It was one such project, at the University of California, Berkeley, that introduced the term RISC.

The industry responded to the concept of RISC with both confusion and hostility, including a famous dismissive article by the VAX team at Digital.[21] A major point of contention was that implementing the instructions outside of the processor meant it would spend much more time reading those instructions from memory, thereby slowing overall performance no matter how fast the CPU itself ran.[21] Proponents pointed out that simulations clearly showed the number of instructions was not much greater, especially when considering compiled code.[20]

The debate raged until the first commercial RISC designs emerged in the second half of the 1980s, which easily outperformed the most complex designs from other companies. By the late 1980s it was over; even DEC was abandoning microcode for their DEC Alpha designs, and CISC processors switched to using hardwired circuitry, rather than microcode, to perform many functions. For example, the Intel 80486 uses hardwired circuitry to fetch and decode instructions, using microcode only to execute instructions; register-register move and arithmetic instructions required only one microinstruction, allowing them to be completed in one clock cycle.[22] The Pentium Pro's fetch and decode hardware fetches instructions and decodes them into series of micro-operations that are passed on to the execution unit, which schedules and executes the micro-operations, possibly doing so out-of-order. Complex instructions are implemented by microcode that consists of predefined sequences of micro-operations.[23]

Some processor designs use machine code that runs in a special mode, with special instructions, available only in that mode, that have access to processor-dependent hardware, to implement some low-level features of the instruction set. The DEC Alpha, a pure RISC design, used PALcode to implement features such as translation lookaside buffer (TLB) miss handling and interrupt handling,[24] as well as providing, for Alpha-based systems running OpenVMS, instructions requiring interlocked memory access that are similar to instructions provided by the VAX architecture.[24] CMOS IBM System/390 CPUs, starting with the G4 processor, and z/Architecture CPUs use millicode to implement some instructions.[25]

Examples

  • The Analytical Engine envisioned by Charles Babbage uses pegs inserted into rotating drums to store its internal procedures.
  • The EMIDEC 1100[26] reputedly uses a hard-wired control store consisting of wires threaded through ferrite cores, known as "the laces".
  • Most models of the IBM System/360 series are microprogrammed:
    • The Model 25 is unique among System/360 models in using the top 16 K bytes of core storage to hold the control storage for the microprogram. The 2025 uses a 16-bit microarchitecture with seven control words (or microinstructions). After system maintenance or when changing operating mode, the microcode is loaded from the card reader, tape, or other device.[27] The IBM 1410 emulation for this model is loaded this way.
    • The Model 30 uses an 8-bit microarchitecture with only a few hardware registers; everything that the programmer sees is emulated by the microprogram. The microcode for this model is also held on special punched cards, which are stored inside the machine in a dedicated reader per card, called "CROS" units (Capacitor Read-Only Storage).[28]: 2–5  Another CROS unit is added for machines ordered with 1401/1440/1460 emulation[28]: 4–29  and for machines ordered with 1620 emulation.[28]: 4–75 
    • The Model 40 uses 56-bit control words. The 2040 box implements both the System/360 main processor and the multiplex channel (the I/O processor). This model uses TROS dedicated readers similar to CROS units, but with an inductive pickup (Transformer Read-only Store).
    • The Model 50 has two internal datapaths which operate in parallel: a 32-bit datapath used for arithmetic operations, and an 8-bit datapath used in some logical operations. The control store uses 90-bit microinstructions.
    • The Model 85 has separate instruction fetch (I-unit) and execution (E-unit) to provide high performance. The I-unit is hardware controlled. The E-unit is microprogrammed; the control words are 108 bits wide on a basic 360/85 and wider if an emulator feature is installed.
  • The NCR 315 is microprogrammed with hand wired ferrite cores (a ROM) pulsed by a sequencer with conditional execution. Wires routed through the cores are enabled for various data and logic elements in the processor.
  • The Digital Equipment Corporation PDP-9 processor, KL10 and KS10 PDP-10 processors, and PDP-11 processors with the exception of the PDP-11/20, are microprogrammed.[29]
  • Most Data General Eclipse minicomputers are microprogrammed. The task of writing microcode for the Eclipse MV/8000 is detailed in the Pulitzer Prize-winning book titled The Soul of a New Machine.
  • Many systems from Burroughs are microprogrammed:
    • The B700 "microprocessor" executes application-level opcodes using sequences of 16-bit microinstructions stored in main memory; each of these is either a register-load operation or mapped to a single 56-bit "nanocode" instruction stored in read-only memory. This allows comparatively simple hardware to act either as a mainframe peripheral controller or to be packaged as a standalone computer.
    • The B1700 is implemented with radically different hardware, including bit-addressable main memory, but has a similar multi-layer organisation. The operating system preloads the interpreter for whatever language is required. These interpreters present different virtual machines for COBOL, Fortran, etc.

Implementation

Each microinstruction in a microprogram provides the bits that control the functional elements that internally compose a CPU. The advantage over a hard-wired CPU is that internal CPU control becomes a specialized form of a computer program. Microcode thus transforms a complex electronic design challenge (the control of a CPU) into a less complex programming challenge. To take advantage of this, a CPU is divided into several parts:

  • An I-unit may decode instructions in hardware and determine the microcode address for processing the instruction in parallel with the E-unit.
  • A microsequencer picks the next word of the control store. A sequencer is mostly a counter, but usually also has some way to jump to a different part of the control store depending on some data, usually data from the instruction register and always some part of the control store. The simplest sequencer is just a register loaded from a few bits of the control store.
  • A register set is a fast memory containing the data of the central processing unit. It may include registers visible to application programs, such as general-purpose registers and the program counter, and may also include other registers that are not easily accessible to the application programmer. Often the register set is a triple-ported register file; that is, two registers can be read, and a third written at the same time.
  • An arithmetic and logic unit performs calculations, usually addition, logical negation, a right shift, and logical AND. It often performs other functions, as well.

There may also be a memory address register and a memory data register, used to access the main computer storage. Together, these elements form an "execution unit". Most modern CPUs have several execution units. Even simple computers usually have one unit to read and write memory, and another to execute user code. These elements could often be brought together as a single chip. This chip comes in a fixed width that would form a "slice" through the execution unit. These are known as "bit slice" chips. The AMD Am2900 family is one of the best known examples of bit slice elements.[40] The parts of the execution units and the whole execution units are interconnected by a bundle of wires called a bus.
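The microsequencer described above can be sketched in miniature. In this hypothetical model (the control-store format is invented), the sequencer is mostly a counter, plus a way to jump elsewhere in the control store depending on a status flag:

```python
# Sketch of a minimal microsequencer: a counter with unconditional and
# conditional microjumps. The control-store entry format (sequencing op,
# jump target) is invented for illustration.
def run(control_store, flags, start=0, max_steps=100):
    """Step through the control store, returning the trace of addresses visited."""
    upc, trace = start, []          # upc: the microprogram counter
    for _ in range(max_steps):
        if upc >= len(control_store):
            break
        trace.append(upc)
        op, target = control_store[upc]
        if op == "next":                  # plain counter behaviour: add 1
            upc += 1
        elif op == "jump":                # unconditional microjump
            upc = target
        elif op == "jump_if_zero":        # conditional on a status flag
            upc = target if flags["zero"] else upc + 1
        elif op == "halt":
            break
    return trace

store = [("next", None), ("jump_if_zero", 3), ("halt", None), ("jump", 2)]
print(run(store, {"zero": True}))   # takes the conditional branch
print(run(store, {"zero": False}))  # falls through to the halt
```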

Programmers develop microprograms, using basic software tools. A microassembler allows a programmer to define the table of bits symbolically. Because of its close relationship to the underlying architecture, "microcode has several properties that make it difficult to generate using a compiler."[1] A simulator program is intended to execute the bits in the same way as the electronics, and allows much more freedom to debug the microprogram. After the microprogram is finalized, and extensively tested, it is sometimes used as the input to a computer program that constructs logic to produce the same data.[citation needed] This program is similar to those used to optimize a programmable logic array. Even without fully optimal logic, heuristically optimized logic can vastly reduce the number of transistors from the number needed for a read-only memory (ROM) control store. This reduces the cost to produce, and the electricity used by, a CPU.

Microcode can be characterized as horizontal or vertical, referring primarily to whether each microinstruction controls CPU elements with little or no decoding (horizontal microcode)[a] or requires extensive decoding by combinatorial logic before doing so (vertical microcode). Consequently, each horizontal microinstruction is wider (contains more bits) and occupies more storage space than a vertical microinstruction.
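The horizontal/vertical distinction can be made concrete with a toy contrast (all encodings below are invented): horizontally, each bit of the word is a control line; vertically, a narrow field must pass through decoding logic first.

```python
# Contrast sketch with invented encodings: the same control lines driven
# horizontally (one bit per line, little or no decoding) versus vertically
# (a compact field expanded by a decoder before reaching the lines).
LINES = ["latch_a", "latch_b", "alu_add"]

def horizontal(word: int) -> list[str]:
    # each bit of the microinstruction maps straight to a control line
    return [line for i, line in enumerate(LINES) if word >> i & 1]

VERTICAL_DECODER = {       # the combinatorial decode table
    0: [],                 # no-op
    1: ["latch_a", "latch_b"],
    2: ["alu_add"],
}

def vertical(opfield: int) -> list[str]:
    # a narrow encoded field selects a pattern; decode logic supplies the lines
    return VERTICAL_DECODER[opfield]

# Same effect, different encodings: 3 bits horizontally vs a 2-bit field
# vertically. Horizontal words are wider but need no decoding.
assert horizontal(0b011) == vertical(1)
```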

Horizontal microcode

"Horizontal microcode has several discrete micro-operations that are combined in a single microinstruction for simultaneous operation."[1] Horizontal microcode is typically contained in a fairly wide control store; it is not uncommon for each word to be 108 bits or more. On each tick of a sequencer clock a microcode word is read, decoded, and used to control the functional elements that make up the CPU.

In a typical implementation a horizontal microprogram word comprises fairly tightly defined groups of bits. For example, one simple arrangement might be:

Register source A | Register source B | Destination register | Arithmetic and logic unit operation | Type of jump | Jump address

For this type of micromachine to implement a JUMP instruction with the address following the opcode, the microcode might require two clock ticks. The engineer designing it would write microassembler source code looking something like this:

   # Any line starting with a number-sign is a comment
   # This is just a label, the ordinary way assemblers symbolically represent a 
   # memory address.
 InstructionJUMP:
       # To prepare for the next instruction, the instruction-decode microcode has already
       # moved the program counter to the memory address register. This instruction fetches
       # the target address of the jump instruction from the memory word following the
       # jump opcode, by copying from the memory data register to the memory address register.
       # This gives the memory system two clock ticks to fetch the next 
       # instruction to the memory data register for use by the instruction decode.
       # The sequencer instruction "next" means just add 1 to the control word address.
    MDR, NONE, MAR, COPY, NEXT, NONE
       # This places the address of the next instruction into the PC.
       # This gives the memory system a clock tick to finish the fetch started on the
       # previous microinstruction.
       # The sequencer instruction is to jump to the start of the instruction decode.
    MAR, 1, PC, ADD, JMP, InstructionDecode
       # The instruction decode is not shown, because it is usually a mess, very particular
       # to the exact processor being emulated. Even this example is simplified.
       # Many CPUs have several ways to calculate the address, rather than just fetching
       # it from the word following the op-code. Therefore, rather than just one
       # jump instruction, those CPUs have a family of related jump instructions.

For each tick it is common to find that only some portions of the CPU are used, with the remaining groups of bits in the microinstruction being no-ops. With careful design of hardware and microcode, this property can be exploited to parallelise operations that use different areas of the CPU; for example, in the case above, the ALU is not required during the first tick, so it could potentially be used to complete an earlier arithmetic instruction.
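As a rough sketch of the two-tick JUMP sequence above, the following Python models one sequencer tick as a routing of sources through the ALU to a destination register. Register behavior and memory timing are simplified assumptions, not a faithful model of any real micromachine:

```python
# Toy horizontal-microcode machine executing the two-tick JUMP sequence
# from the example above. Register names follow the text (MDR, MAR, PC);
# memory behavior and clocking are deliberately simplified.

def step(regs, src_a, src_b, dest, alu_op):
    """One sequencer tick: route the two sources through the ALU to dest."""
    a = regs.get(src_a, 0)
    b = 1 if src_b == "ONE" else regs.get(src_b, 0)
    if alu_op == "COPY":
        regs[dest] = a
    elif alu_op == "ADD":
        regs[dest] = a + b

def microcode_jump(regs):
    # Tick 1: MDR -> MAR (the jump target address becomes the fetch address)
    step(regs, "MDR", "NONE", "MAR", "COPY")
    # Tick 2: MAR + 1 -> PC (PC now points past the instruction being fetched)
    step(regs, "MAR", "ONE", "PC", "ADD")

regs = {"PC": 100, "MAR": 100, "MDR": 200}  # MDR holds the fetched jump target
microcode_jump(regs)
```

Note that the ALU sits idle during the first tick, which is exactly the slack the text describes exploiting for an overlapping arithmetic operation.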

Vertical microcode


In vertical microcode, each microinstruction is significantly encoded, that is, the bit fields generally pass through intermediate combinatory logic that, in turn, generates the control and sequencing signals for internal CPU elements (ALU, registers, etc.). This is in contrast with horizontal microcode, in which the bit fields either directly produce the control and sequencing signals or are only minimally encoded. Consequently, vertical microcode requires smaller instruction lengths and less storage, but requires more time to decode, resulting in a slower CPU clock.[41]

Some vertical microcode is just the assembly language of a simple conventional computer that is emulating a more complex computer. Some processors, such as DEC Alpha processors and the CMOS microprocessors on later IBM mainframes System/390 and z/Architecture, use machine code, running in a special mode that gives it access to special instructions, special registers, and other hardware resources unavailable to regular machine code, to implement some instructions and other functions,[42][43] such as page table walks on Alpha processors.[44] This is called PALcode on Alpha processors and millicode on IBM mainframe processors.

Another form of vertical microcode has two fields:

Field select | Field value

The field select selects which part of the CPU will be controlled by this word of the control store. The field value controls that part of the CPU. With this type of microcode, a designer explicitly chooses to make a slower CPU to save money by reducing the unused bits in the control store; however, the reduced complexity may increase the CPU's clock frequency, which lessens the effect of an increased number of cycles per instruction.
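A minimal sketch of this two-field format, assuming an invented set of four controllable CPU elements, might look like the following; note that several vertical words are needed where one horizontal word could drive everything at once:

```python
# Sketch of the two-field vertical microcode format described above:
# a "field select" chooses which CPU element the "field value" drives.
# The field numbering and the set of controllable elements are invented.

def decode(field_select, field_value):
    """Expand one vertical microinstruction into full control signals."""
    signals = {"alu_op": 0, "reg_select": 0, "mem_ctrl": 0, "seq_ctrl": 0}
    target = {0: "alu_op", 1: "reg_select",
              2: "mem_ctrl", 3: "seq_ctrl"}[field_select]
    signals[target] = field_value  # only one element is driven per word
    return signals

# Three vertical words, consumed over three cycles, to do what a single
# wide horizontal word could express in one:
program = [(0, 2), (1, 5), (3, 1)]
expanded = [decode(fs, fv) for fs, fv in program]
```

The decoder here is the "intermediate combinatory logic" of the text: it trades a smaller control store for an extra interpretation step on every cycle.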

As transistors grew cheaper, horizontal microcode came to dominate the design of CPUs using microcode, with vertical microcode being used less often.

When both vertical and horizontal microcode are used, the horizontal microcode may be referred to as nanocode or picocode.[45]

Writable control store


A few computers were built using writable microcode. In this design, rather than storing the microcode in ROM or hard-wired logic, the microcode is stored in a RAM called a writable control store or WCS. Such a computer is sometimes called a writable instruction set computer (WISC).[46]

Many experimental prototype computers use writable control stores; there are also commercial machines that use writable microcode, such as the Burroughs Small Systems, early Xerox workstations, the DEC VAX 8800 (Nautilus) family, the Symbolics L- and G-machines, a number of IBM System/360 and System/370 implementations, some DEC PDP-10 machines,[47] and the Data General Eclipse MV/8000.[48]

The IBM System/370 includes a facility called Initial-Microprogram Load (IML or IMPL)[49] that can be invoked from the console, as part of power-on reset (POR) or from another processor in a tightly coupled multiprocessor complex.

Some commercial machines, for example IBM 360/85,[50][51] have both a read-only storage and a writable control store for microcode.

WCS offers several advantages including the ease of patching the microprogram and, for certain hardware generations, faster access than ROMs can provide. User-programmable WCS allows the user to optimize the machine for specific purposes.

Starting with the Pentium Pro in 1995, several x86 CPUs have writable microcode.[52][53] This, for example, has allowed bugs in the Intel Core 2 and Intel Xeon microcodes to be fixed by patching their microprograms, rather than requiring the entire chips to be replaced. A second prominent example is the set of microcode patches that Intel offered for some of its processor architectures up to ten years old, in a bid to counter the security vulnerabilities discovered in its designs, Spectre and Meltdown, which went public at the start of 2018.[54][55] A microcode update can be installed by Linux,[56] FreeBSD,[57] Microsoft Windows,[58] or the motherboard BIOS.[59]

Some machines offer user-programmable writable control stores as an option, including the HP 2100, DEC PDP-11/60, TI-990/12,[60][61] and Varian Data Machines V-70 series minicomputers. WCS options extended down to microprocessors too. The DEC LSI-11 has an option to allow programming of the internal 8-bit micromachine to create application-specific extensions to the instruction set.[62]

Some peripheral devices and adapters have writable microcode, which is usually loaded by an operating system device driver. Such microcode is loaded into the device's SRAM or DRAM, for example, the GDDR SDRAM of a video card.

Comparison to VLIW and RISC


The design trend toward heavily microcoded processors with complex instructions began in the early 1960s and continued until roughly the mid-1980s. At that point the RISC design philosophy started becoming more prominent.

A CPU that uses microcode generally takes several clock cycles to execute a single instruction, one clock cycle for each step in the microprogram for that instruction. Some CISC processors include instructions that can take a very long time to execute. Such variations interfere with both interrupt latency and, what is far more important in modern systems, pipelining.

When designing a new processor, a hardwired control RISC has the following advantages over microcoded CISC:

  • Programming has largely moved away from assembly level, so it's no longer worthwhile to provide complex instructions for productivity reasons.
  • Simpler instruction sets allow direct execution by hardware, avoiding the performance penalty of microcoded execution.
  • Analysis shows complex instructions are rarely used, hence the machine resources devoted to them are largely wasted.
  • The machine resources devoted to rarely used complex instructions are better used for expediting performance of simpler, commonly used instructions.
  • Complex microcoded instructions may require many clock cycles that vary, and are difficult to pipeline for increased performance.

There are counterpoints as well:

  • The complex instructions in heavily microcoded implementations may not take much extra machine resources, except for microcode space. For example, the same ALU is often used to calculate an effective address and to compute the result from the operands, e.g., the original Z80, 8086, and others.
  • The simpler non-RISC instructions (i.e., those involving direct memory operands) are frequently used by modern compilers. Even immediate-to-stack (i.e., memory result) arithmetic operations are commonly employed. Although such memory operations, often with varying length encodings, are more difficult to pipeline, it is still fully feasible to do so, as clearly exemplified by the i486, AMD K5, Cyrix 6x86, Motorola 68040, etc.
  • Non-RISC instructions inherently perform more work per instruction (on average), and are also normally highly encoded, so they enable smaller overall size of the same program, and thus better use of limited cache memories.

Many RISC and VLIW processors are designed to execute every instruction (as long as it is in the cache) in a single cycle. This is very similar to the way CPUs with microcode execute one microinstruction per cycle. VLIW processors have instructions that behave similarly to very wide horizontal microcode, although typically without such fine-grained control over the hardware as provided by microcode. RISC instructions are sometimes similar to the narrow vertical microcode.

Micro-operations


Modern CISC implementations, such as the x86 family starting with the NexGen Nx586, Intel Pentium Pro, and AMD K5, decode instructions into dynamically buffered micro-operations with an instruction encoding similar to RISC or traditional microcode. A hardwired instruction decode unit directly emits microoperations for common x86 instructions, but falls back to a more traditional microcode ROM containing microoperations for more complex or rarely used instructions.[2]

For example, an x86 might look up microoperations from microcode to handle complex multistep operations such as loop or string instructions, floating-point unit transcendental functions or unusual values such as denormal numbers, and special-purpose instructions such as CPUID.
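The split between the hardwired fast path and the microcode ROM fallback can be sketched as a table lookup. The instruction names and micro-operation spellings below are illustrative, not real x86 encodings:

```python
# Sketch of the hybrid decode scheme described above: common instructions
# map directly to micro-ops via a hardwired table, while complex or rare
# ones fall back to a microcode "ROM" holding longer sequences.
# Instruction names and micro-op names are invented for illustration.

HARDWIRED = {            # fast path: one or two micro-ops, no ROM lookup
    "ADD r, r": ["alu_add"],
    "MOV r, m": ["load"],
}

MICROCODE_ROM = {        # slow path: multi-step microcoded sequences
    "REP MOVS": ["load", "store", "dec_count", "branch_if_nonzero"],
    "CPUID":    ["save_regs", "read_id", "write_regs"],
}

def decode(instr):
    """Emit micro-ops for one instruction, preferring the hardwired path."""
    if instr in HARDWIRED:
        return HARDWIRED[instr]
    return MICROCODE_ROM[instr]  # fall back to the microcode ROM
```

In a real front end the fallback also stalls the fast decoders while the microcode sequencer streams out the stored micro-ops, which is why microcoded instructions are comparatively slow.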

Alternate meanings of "microcode"


PDP-8


The PDP-8 is a family of 12-bit minicomputers launched by Digital Equipment Corporation in 1965. The OPR (OPeRate) instruction was said to be "microcoded." This did not mean what the word means today; rather, it meant that each bit of the instruction word specifies a certain action, and the programmer could achieve several actions in a single instruction cycle by setting multiple bits. Examples of these actions are: clear accumulator, complement accumulator, rotate right, rotate right twice, and byte swap.

Embedded firmware


Some hardware vendors, notably IBM and Lenovo, use the term microcode interchangeably with embedded firmware. In this context, all code within a device is termed microcode, whether it is microcode or machine code. For instance, updates to a hard disk drive's microcode may encompass updates to both its microcode and firmware.[63] Embedded firmware has been popular in application-specific processors such as network processors, digital signal processors, channel controllers, disk controllers, network interface controllers, flash memory controllers, graphics processing units, and in other hardware.

from Grokipedia
Microcode is a low-level programming layer within a central processing unit (CPU) that implements the machine instructions of the processor's instruction set architecture (ISA) by translating them into a sequence of simpler microinstructions executed by the hardware. These microinstructions, stored in a dedicated control store, typically implemented as read-only memory (ROM), random-access memory (RAM), or a writable control store, generate the precise control signals needed to orchestrate operations like fetching operands, performing arithmetic, and storing results. This approach allows complex ISAs to be realized through a relatively simple microengine, a basic state machine that sequences the microinstructions, enabling flexibility in design and implementation without altering the underlying hardware circuitry. Invented by Maurice Wilkes at the University of Cambridge in the late 1940s and first described in a 1951 paper, microcode emerged as a solution to the challenges of designing control logic for increasingly complex computers, inspired by earlier diode-matrix techniques like those in the MIT Whirlwind. The concept was practically demonstrated in the EDSAC 2 computer, operational in 1958, which used a 32x32 matrix for its control store to implement variable-length instructions. Microprogramming gained prominence in the 1960s with the IBM System/360 mainframe series, where most models (except the high-end Models 75 and 91) employed it to ensure binary compatibility across a diverse range of processor implementations, using control stores ranging from 2.75K to 4K microinstructions. In modern processors, microcode continues to play a vital role, particularly in complex instruction set computing (CISC) architectures like x86, where it handles intricate or infrequently used instructions that would be inefficient to hardwire directly. It facilitates post-manufacturing updates to address errata, such as security vulnerabilities or bugs, loaded via system firmware or the operating system during boot, as seen in Intel Pentium Pro processors and their successors.
While reduced instruction set computing (RISC) designs largely favor hardwired control for speed and density, microcode's advantages in modularity, ease of updating, and flexibility have sustained its use, often in hybrid forms that blend it with hardware decoding. This evolution reflects a balance between performance demands and the need for adaptable, maintainable processor designs.

Introduction

Definition and Purpose

Microcode consists of sequences of microinstructions stored in a control memory, such as a dedicated control store, that direct the processor's datapath and control unit by specifying the low-level operations needed to execute higher-level machine instructions. The primary purpose of microcode is to simplify CPU design by decomposing complex machine instructions into primitive micro-operations, which allows for the implementation of intricate instruction set architectures (ISAs) using relatively straightforward hardware. This approach reduces the need for extensive custom logic circuitry, lowers development costs, and facilitates compatibility across processor variants within a family. Microcode functions as an interpreter for the ISA, translating programmer-visible instructions into hardware-specific controls while concealing underlying implementation details from software developers. In this role, it provides a flexible layer that enables efficient execution without exposing the complexities of the processor's internal mechanisms. Microcode emerged in the 1950s as a response to the limitations of early electronic computers using vacuum tubes, where hardwired control logic struggled with increasing complexity due to extensive wiring and reliability issues. It gained further prominence in the 1960s with transistor-based systems. Proposed by Maurice Wilkes in 1951, it offered a systematic method to manage these challenges by replacing intricate wiring with stored programs for control sequencing.

Relation to Instruction Sets

The instruction set architecture (ISA) serves as the visible interface to programmers, defining the set of machine instructions that a processor can execute, while microcode operates beneath this layer to translate those instructions into the detailed control signals required by the hardware. Microcode achieves this by decoding each ISA instruction and sequencing a series of lower-level operations that manipulate the processor's datapath and control units, effectively implementing the ISA's semantics without exposing these details to software developers. A key advantage of microcode lies in its ability to enable emulation, allowing hardware designed for one ISA to support instructions from another, which is particularly useful for maintaining backward compatibility in evolving processor families. For instance, the IBM System/360 used microcode to emulate the older IBM 1401 architecture, permitting legacy software to run on newer hardware without modification. This emulation capability arises because microcode can be modified or extended to interpret foreign instructions, bridging architectural differences at the control level. In complex instruction set computing (CISC) architectures, microcode plays a central role by decomposing intricate, variable-length instructions into simpler sequences, accommodating the wide variety of operations that directly access memory or perform multi-step computations. In contrast, reduced instruction set computing (RISC) architectures typically rely on hardwired control for their simpler, fixed-length instructions, minimizing the need for microcode and enabling more direct mapping to hardware execution paths. This distinction highlights microcode's flexibility in handling CISC complexity, where it interprets instructions as they are fetched, versus RISC's emphasis on streamlined hardware decoding.
The typical flow begins with the fetch of a machine instruction from memory, which is then decoded to select and initiate the corresponding microprogram—a sequence of microinstructions stored in control memory that generates the necessary signals for execution. This culminates in micro-operations, the atomic hardware steps that carry out the instruction's intent.
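This fetch, decode, and dispatch flow can be sketched as follows, with invented opcodes, instruction encoding, and control-store addresses:

```python
# Sketch of the fetch -> decode -> dispatch flow described above.
# The memory layout, opcode values, instruction format, and
# control-store entry addresses are all invented for illustration.

DISPATCH = {0x01: 0x40,   # ADD  -> microprogram entry at control store 0x40
            0x02: 0x58,   # JUMP -> microprogram entry at 0x58
            0x03: 0x70}   # LOAD -> microprogram entry at 0x70

def fetch_decode_dispatch(memory, pc):
    """Fetch one machine word, extract its opcode, and return
    (next_pc, control-store address of the matching microprogram)."""
    instruction = memory[pc]             # fetch from main memory
    opcode = (instruction >> 8) & 0xFF   # decode: high byte is the opcode
    return pc + 1, DISPATCH[opcode]      # dispatch into the control store

memory = {100: 0x0107}                   # an ADD instruction (opcode 0x01)
next_pc, entry = fetch_decode_dispatch(memory, 100)
```

The dispatch table plays the role of the decode logic: it converts the programmer-visible opcode into a starting address inside control memory, after which the micro-operations take over.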

Fundamentals

Microinstructions and Microprograms

A microinstruction serves as a low-level command that specifies the control signals necessary to execute a single basic operation within the processor's hardware, typically corresponding to one clock cycle. It directly activates components such as the arithmetic-logic unit (ALU), registers, and data buses by setting appropriate control lines, thereby implementing the finer-grained steps required to carry out a higher-level machine instruction. This approach allows for a modular breakdown of complex instructions into manageable hardware actions. The structure of a microinstruction generally consists of multiple fields that encode the desired operations and sequencing. Common fields include an opcode or function selector for the ALU (e.g., add, subtract, or pass-through), source and destination selectors for operands (e.g., specifying registers like A or B, or memory buffers), control bits for register reads/writes and memory access, and a next-address control field for determining the subsequent microinstruction. For instance, in illustrative architectures like the MIC-1, fields such as ALU operation (e.g., A+B or A AND B), condition codes for branching (e.g., branch if zero), and address fields for jumps provide precise control over datapath elements. These fields are typically packed into a fixed-width word, such as 32 bits, stored in a control memory. A microprogram is an ordered sequence of these microinstructions, residing in read-only memory (ROM) or random-access memory (RAM) within the control store, and invoked to execute a specific machine instruction. Upon decoding a machine instruction's opcode, the processor dispatches to the starting address of the corresponding microprogram, which then runs step-by-step to perform the required micro-operations, such as fetching operands or updating the program counter. Microprograms support conditional branching to handle variations like overflow or zero results, enabling flexible implementation of instruction behaviors.
Sequencing within a microprogram is managed by a microprogram counter (MPC) that holds the address of the current microinstruction, incremented sequentially by default or altered via dispatch logic for branches and jumps. This logic interprets the next-address field in each microinstruction, potentially using condition flags (e.g., zero or negative) to select the path, and supports subroutine calls by saving return addresses for nested execution. Dispatch tables, indexed by the machine instruction opcode, facilitate efficient entry into the appropriate microprogram routine. For a simple ADD instruction, such as adding a memory value to a register, a representative microprogram might proceed as follows in pseudocode:

1. MAR ← PC; Read memory; PC ← PC + 1   // Fetch effective address
2. MDR ← Memory[MAR]; A ← MDR           // Load operand into A
3. B ← Register[R1]                     // Load register operand into B
4. ALU ← A + B; Set flags               // Perform addition, update condition codes
5. Register[R1] ← ALU                   // Store result back to register
6. Next instruction                     // Return to fetch cycle

This sequence fetches operands, executes the addition via the ALU, stores the result, and branches to the next machine instruction, with each step corresponding to one microinstruction.
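A toy micro-sequencer for this six-step microprogram can be written with one small Python function per clock cycle. Register and memory behavior are simplified assumptions, and the encoding of each step as a closure stands in for a real control word:

```python
# Toy micro-sequencer running the six-step ADD microprogram listed above.
# Each list entry models one microinstruction, executed on one "cycle".
# Register names (MAR, MDR, A, B, ALU, R1, PC) follow the pseudocode.

def run_add(memory, regs):
    """Execute ADD: add the memory word addressed by PC into register R1."""
    microprogram = [
        lambda: regs.update(MAR=regs["PC"], PC=regs["PC"] + 1),  # 1. fetch addr
        lambda: regs.update(MDR=memory[regs["MAR"]],
                            A=memory[regs["MAR"]]),              # 2. MDR, A
        lambda: regs.update(B=regs["R1"]),                       # 3. B <- R1
        lambda: regs.update(ALU=regs["A"] + regs["B"],
                            Z=(regs["A"] + regs["B"] == 0)),     # 4. add, flags
        lambda: regs.update(R1=regs["ALU"]),                     # 5. writeback
        lambda: None,                                            # 6. next instr
    ]
    for micro in microprogram:   # the MPC: step through in order
        micro()

regs = {"PC": 10, "R1": 5}
memory = {10: 7}
run_add(memory, regs)            # afterwards R1 holds 5 + 7
```

A real sequencer would also consult the next-address field of each word rather than stepping linearly, which is how conditional branches within the microprogram are realized.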

Micro-operations

Micro-operations, often denoted as μops, represent the fundamental, atomic hardware actions executed by a processor's datapath, encompassing tasks such as register-to-register data transfers, arithmetic logic unit (ALU) computations, or memory read/write accesses, each typically confined to a single clock cycle. Microcode orchestrates these μops by employing sequences of microinstructions stored in a control store, where each microinstruction asserts targeted control signals to trigger one or more concurrent μops within the processor's execution hardware. To manage data dependencies and hazards, μops are arranged in chains that enforce sequential execution where necessary; for example, an ADD operation might sequence as follows: load the first operand from a source register to an ALU input, load the second operand similarly, execute the addition, and transfer the result to the destination register, with pipelining allowing overlap across cycles for efficiency. In modern processors, μops function as the core unit for out-of-order execution and dynamic scheduling, enabling the hardware to reorder independent μops for optimal throughput while the dependency chains maintain architectural correctness. A primary role of μops lies in instruction decomposition, where complex macroinstructions are translated into multiple finer-grained μops; notably, in x86 architectures, a single intricate instruction can expand into 10 or more μops generated via microcode to facilitate detailed control over execution.

Design and Implementation

Horizontal Microcode

Horizontal microcode refers to a format of microprogrammed control in which microinstructions are wide (typically exceeding 100 bits) and each bit or small field maps directly to individual hardware control signals, such as multiplexer selects, register enables, or ALU operation codes, without requiring an intermediate decoding stage. This one-to-one correspondence allows the microinstruction to explicitly specify all active control lines for parallel operations within the datapath during a single clock cycle. The primary advantages of horizontal microcode stem from its direct control mechanism, enabling high-speed execution by eliminating decode overhead and facilitating inherent parallelism, as multiple independent operations can be initiated simultaneously across functional units. This approach minimizes latency in instruction processing, making it suitable for systems demanding rapid throughput. However, horizontal microcode presents significant disadvantages, including the need for extensive control signal wiring, which increases hardware complexity, board space requirements, and potential for signal propagation delays due to long interconnects. Its lack of encoding also reduces flexibility, as modifications to the control logic often necessitate hardware revisions rather than simple updates. In terms of encoding, horizontal microinstructions typically consist of numerous bit fields, each dedicated to a specific functional unit or control aspect; for instance, individual bits might enable particular registers or select ALU functions, while fields for branching include condition codes and target addresses within the same instruction. A representative example is the IBM System/360 Model 50, which utilized 90-bit horizontal microinstructions divided into 28 fields to directly govern datapath controls, such as register transfers and arithmetic operations, across its execution pipeline.
This design allowed the Model 50 to implement complex instructions efficiently through explicit, unencoded signal assertions. In contrast to vertical microcode, which employs more compact, encoded formats requiring decoding, horizontal microcode prioritizes immediacy over storage efficiency.

Vertical Microcode

Vertical microcode refers to an encoded format of microinstructions that employs narrower word lengths, typically in the range of 20 to 50 bits, where individual fields such as opcodes represent multiple control signals that must be decoded before execution. This approach contrasts with direct signal mapping by grouping related control actions into compact fields, allowing the microinstruction to specify high-level operations like ALU functions or register selections through symbolic or numeric codes rather than explicit bits for each signal. The decoding generates the full set of control signals, often resembling a horizontal microcode output internally, which enables efficient storage but introduces an additional hardware layer for interpretation. Vertical microcode can vary in complexity based on the number of decoding stages. Simple vertical microcode involves a single level of decoding, where fields are expanded directly by dedicated decoders to produce control signals in one step, suitable for straightforward operations. In contrast, multi-level vertical microcode uses cascaded decoders, where initial fields select sub-opcodes that undergo further decoding, achieving greater compression at the cost of increased latency. Encoding details typically include dedicated fields for operation type (e.g., a 3-4 bit opcode field selecting ALU mode or memory access), operand routing (e.g., source/destination register selectors), and next-address control, with the decoder hardware translating these into the broader set of signals. The primary trade-offs of vertical microcode center on storage efficiency versus execution overhead. It achieves a smaller footprint for the control store, significantly reducing ROM size compared to unencoded formats, making it advantageous for space-constrained designs and facilitating easier modification of microprograms during development or updates.
However, the required decode cycles add latency, as the signals are not immediately available, potentially slowing overall processor throughput in time-critical paths. An illustrative example is the Intel 8086 processor, where 21-bit vertical microinstructions are stored in a 512-entry ROM and decoded by on-chip logic to generate horizontal control signals for the datapath, including fields for ALU operations, register updates, and bus controls. This format allowed the 8086 to implement its complex x86 instruction set with a compact control store while relying on decoding to handle the variety of operand routings and execution sequences.

Writable Control Store

Writable control store (WCS) refers to the implementation of a processor's control store using modifiable memory technologies, such as random-access memory (RAM) or programmable read-only memory (PROM), rather than fixed read-only memory (ROM), which permits the dynamic loading, updating, and customization of microprograms during operation or maintenance. This approach allows microcode to be altered post-manufacture, providing flexibility in processor behavior without requiring hardware redesign. The concept originated in the 1960s, with Ascher Opler coining the term "firmware" in a 1967 Datamation article to describe the contents of such a writable control store, which could be reloaded to specialize a machine for particular applications. IBM pioneered practical WCS implementations in its mainframe systems, beginning with the System/360 Model 30 in the mid-1960s, where modifiable control cards (CCROS) enabled field engineering modifications to microcode. This evolved in the System/370 series, such as the Model 145, which featured up to 16K words of 32-bit read-write control storage for patches and diagnostics, with updates distributed on 8-inch floppy diskettes starting in 1971. In the System/370 Model 165, WCS supplemented read-only storage to accommodate new instructions, emulator microcode, and corrective patches for CPU defects. These microcode update mechanisms in mainframes represented a foundational advancement, influencing the development of firmware systems like BIOS and UEFI in personal computers, where similar patches are loaded into processor memory during system initialization. Key techniques for WCS include storing microinstructions in magnetic core memory or early RAM, enabling reprogramming via dedicated software loaders or hardware interfaces, and employing diagnostic modes to selectively patch specific sections of the control store without overwriting the entire program.
For handling larger microprograms that exceed the addressable space of a single control store bank, bank switching can be used to swap segments of microcode into active memory as needed. Vertical microcode formats are often paired with WCS because their field-encoded structure facilitates easier editing and reloading compared to more densely packed horizontal formats. Applications of WCS primarily focus on enhancing processor adaptability, such as emulating legacy instructions, like the IBM 1401 compatibility mode on the System/360 Model 30, or adding support for new architectural features without silicon changes. In the System/370 series, it was instrumental for bug fixes, allowing engineers to correct hardware flaws through microcode revisions that improved reliability and extended machine longevity. Despite its benefits, WCS presents challenges, including performance overhead from the higher access latency of RAM relative to ROM, which can slightly slow microinstruction fetches in time-critical paths. Additionally, the writable nature introduces risks, as unauthorized access to the control store could enable tampering with core processor logic, necessitating robust protection mechanisms like restricted access modes.
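The load-then-patch workflow of a writable control store can be sketched as follows; the store size, word values, and patch format are invented for illustration:

```python
# Sketch of a writable control store: the microprogram lives in RAM, so
# individual words can be loaded at reset and patched in the field
# instead of replacing the chip. All sizes and word values are invented.

class WritableControlStore:
    def __init__(self, size):
        self.words = [0] * size           # RAM-backed control store

    def load(self, image):
        """Initial microprogram load (IML), e.g. at power-on reset."""
        self.words[:len(image)] = image

    def patch(self, address, word):
        """Fix one microinstruction in place, as a microcode update would."""
        self.words[address] = word

wcs = WritableControlStore(512)
wcs.load([0x1044, 0x1861, 0x0000])        # load a (made-up) microprogram
wcs.patch(2, 0x2042)                      # correct one buggy control word
```

Selective patching, rather than rewriting the whole image, mirrors the diagnostic modes described above and keeps the update window, and hence the exposure to corruption, small.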

Historical Development

Early Examples

The concept of microprogramming was first proposed by Maurice Wilkes in 1951, in his paper "The Best Way to Design an Automatic Calculating Machine," where he described it as a method for implementing a stored-program control unit to simplify the design of the processor by treating control signals as a form of programming. This approach envisioned breaking down instructions into sequences of elementary control actions, allowing the control unit to be programmed rather than hardwired, thereby reducing design complexity and enabling easier modifications. One of the earliest hardware realizations of microprogramming was in the EDSAC 2 computer, which became operational in early 1958 at the University of Cambridge under Wilkes' direction. The EDSAC 2 used a magnetic core memory as its control store, consisting of a 32-by-32 core matrix that held 1,024 microinstructions, including a 128-step order decoder to interpret instructions. This implementation targeted micro-operations such as register transfers and arithmetic unit activations, demonstrating the feasibility of programmable control sequences in a practical machine. The key innovation of these early efforts was the separation of control logic into modifiable sequences stored in fast memory, which minimized the need for intricate hardwired circuitry and allowed for more flexible processor designs. For instance, in the EDSAC 2, the microprogram handled conditional branching and subroutine calls internally, proving that microprogramming could efficiently orchestrate complex operations without extensive recabling. A notable early commercial application appeared in the Burroughs B5000, introduced in 1961, which employed microprogrammed control to support its stack-based architecture optimized for high-level languages like ALGOL 60. The B5000's microcode facilitated efficient handling of stack operations and tagged memory, marking one of the first widespread uses of microprogramming in a production system.
Early microcode implementations like these featured short routines; for example, the EDSAC 2's order decoder required only 128 microinstructions to process a machine instruction.
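Wilkes' idea of replacing hardwired control with stored sequences of elementary actions can be sketched in a few lines of Python. The opcodes and register-transfer notation here are illustrative inventions, not EDSAC 2's actual encoding:

```python
# A tiny microprogrammed control unit: each machine opcode maps to a
# short microprogram, a list of elementary micro-operations held in a
# control store and stepped through by an "order decoder".

CONTROL_STORE = {
    # machine opcode -> microprogram (register-transfer steps)
    "LOAD":  ["MAR<-IR.addr", "MDR<-MEM[MAR]", "ACC<-MDR"],
    "ADD":   ["MAR<-IR.addr", "MDR<-MEM[MAR]", "ACC<-ACC+MDR"],
    "STORE": ["MAR<-IR.addr", "MEM[MAR]<-ACC"],
}

def execute(opcode):
    """Act as the order decoder: step through the opcode's microprogram."""
    steps = CONTROL_STORE[opcode]
    for micro_op in steps:
        print(micro_op)  # in hardware, each step asserts control signals
    return steps

execute("ADD")
```

Changing what an instruction does is then an edit to the table, not a rewiring of the control circuitry, which is the flexibility Wilkes was after.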

IBM System/360 era

The introduction of microcode in most models of the IBM System/360 family, announced in 1964, marked a pivotal advancement in computer architecture by enabling a single, unified instruction set architecture (ISA) across a wide range of previously incompatible hardware models, from the low-end Model 30 to the high-performance Model 91. High-end models such as the 44, 75, 91, 95, and 195 were instead implemented with hardwired logic. This compatibility was achieved primarily through microprogramming in the majority of models, which allowed diverse processors to execute the same machine instructions despite significant variations in underlying hardware capabilities and performance levels spanning a factor of 50.

Implementation in these microprogrammed System/360 models relied on vertical microcode stored in a read-only control store, typically consisting of thousands of words encoding the control signals for instruction execution. This approach facilitated model-specific optimizations while maintaining strict binary compatibility, and it supported enhanced diagnostics by permitting post-manufacture modifications to the microcode for error corrections and feature updates. Vertical microcode's encoded format minimized control store size compared to horizontal alternatives, balancing density and flexibility in the resource-constrained environment of 1960s mainframes.

The use of microcode provided essential backward compatibility, allowing software developed for one model to run unchanged on others and promoting portability across the family. A representative example is the handling of floating-point instructions, which could be emulated via microcode sequences on integer-only hardware in entry-level models like the Model 30, where dedicated floating-point units were optional; this ensured full ISA compliance without requiring uniform hardware across all variants. Later models incorporated writable control stores for model-specific extensions, further enhancing adaptability.
The System/360's commercial triumph, with over 1,000 units ordered in the first month and sustained demand that dominated the industry for decades, popularized microprogramming as a standard technique in mainframe design, influencing subsequent generations of compatible systems.
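The field-encoded nature of vertical microcode can be illustrated with a hypothetical 16-bit microinstruction layout. The field widths and meanings below are invented for illustration; IBM's actual formats varied by model:

```python
# Hypothetical vertical microinstruction: small encoded fields select
# operations, so a decoder must expand each word into control signals.
# This is denser than horizontal microcode (one bit per control line)
# at the cost of an extra decoding step.

def decode(word):
    """Expand one 16-bit vertical microinstruction into control fields."""
    return {
        "alu_op":  (word >> 12) & 0xF,  # bits 15-12: ALU operation
        "src_reg": (word >> 8) & 0xF,   # bits 11-8:  source register
        "dst_reg": (word >> 4) & 0xF,   # bits 7-4:   destination register
        "next":    word & 0xF,          # bits 3-0:   next-address control
    }

print(decode(0x4A31))
# {'alu_op': 4, 'src_reg': 10, 'dst_reg': 3, 'next': 1}
```

Four 4-bit fields fit in 16 bits, whereas a horizontal format asserting each control line directly could need several times that width per microinstruction.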

Transition to RISC and beyond

The emergence of Reduced Instruction Set Computing (RISC) architectures in the 1980s marked a significant shift in processor design, substantially reducing the reliance on microcode compared to prevailing Complex Instruction Set Computing (CISC) systems. Projects like MIPS, initiated at Stanford University in 1981, emphasized simple, uniform instructions that could be executed in a single clock cycle using hardwired control logic, eliminating the need for microcode to decode and sequence complex operations. This approach leveraged advancements in very-large-scale integration (VLSI), where the falling cost of transistors made direct hardware implementation more efficient than microprogrammed control stores, enabling deeper pipelining and optimizations for performance gains. However, microcode persisted in some RISC designs for handling traps and exceptions, where irregular control flows required flexible sequencing beyond standard hardwired paths.

As CISC architectures like x86 faced performance bottlenecks from intricate instructions, hybrid designs emerged in the 1990s, integrating RISC-like execution cores with microcode to maintain backward compatibility. In these processors, complex x86 instructions were decoded into simpler micro-operations (μops) executed on an internal RISC-style core, while microcode managed emulation of legacy CISC behaviors that could not be efficiently hardwired. This duality allowed x86 systems to adopt RISC principles, such as superscalar execution and out-of-order processing, without abandoning the established instruction set, bridging the gap between simplicity and compatibility. A pivotal example was the AMD K5 processor, released in 1996, which featured a superscalar RISC core paired with an x86 decoder that translated instructions into internal operations, using microcode for intricate emulation tasks to achieve full x86 compliance.
While pure RISC processors, such as those based on MIPS or ARM, largely phased out microcode by favoring hardwired implementations for their streamlined ISAs, microcode remained indispensable in x86 evolutions for ensuring compatibility with decades of CISC software. The transition underscored microcode's role as a flexible layer for legacy support in hybrid systems, contrasting with RISC's emphasis on hardware simplicity. By the 2000s, microcode updates had become a standard practice for Intel and AMD x86 processors, enabling post-silicon fixes for errata such as functional flaws and security vulnerabilities without hardware redesigns, as evidenced by early analyses of update mechanisms dating to 2000.

Advantages and comparisons

Benefits of microcode

Microcode provides significant design flexibility in processor architecture by allowing modifications to instruction implementation after fabrication, thereby avoiding the high costs and delays associated with hardware redesigns. For instance, bugs, optimizations, or new features can be addressed through microcode updates distributed via firmware or operating system updates, enabling manufacturers to extend product lifecycles without recalling or replacing physical chips. This flexibility also facilitates instruction set architecture (ISA) evolution and backward compatibility, as microcode can emulate legacy instructions or introduce extensions without altering the underlying hardware. In the IBM System/360 family, microcoding enabled a unified ISA across diverse models varying in cost and performance, supporting compatibility with prior systems through emulation and allowing seamless upgrades for customers.

From a cost perspective, microcode simplifies the hardware, particularly in complex CISC designs where implementing numerous variable-length instructions directly in hardwired logic would require extensive and expensive circuitry. By offloading instruction decoding and sequencing to microcode, designers reduce the complexity of the control unit, leading to smaller, more manageable hardware implementations. Although microcode introduces some overhead from the additional cycles needed for microinstruction fetch and execution, it offloads intricate control logic from hardwired paths, allowing the core hardware to focus on high-speed data operations and potentially improving overall design efficiency in multifaceted processors. In the case of the System/360, the adoption of microcode standardized control mechanisms across the product line, which streamlined development efforts and reduced the time required to bring multiple compatible models to market.

Comparison to VLIW and RISC

Microcode architectures, typically associated with complex instruction set computing (CISC) designs, introduce an intermediary layer that translates high-level instructions into simpler micro-operations, enabling the handling of intricate operations that would otherwise require extensive hardwired logic. This approach incurs additional decode overhead, as the processor must fetch and execute sequences of micro-instructions, potentially increasing cycle times compared to direct hardware execution. In contrast, reduced instruction set computing (RISC) architectures eliminate this layer by design, employing a streamlined instruction set where each instruction maps directly to basic hardware operations, allowing for faster decoding and execution of simple instructions without the indirection of microcode. For instance, RISC processors achieve lower latency on common operations by avoiding the microprogram sequencing that microcode necessitates, though they may require more instructions overall to accomplish complex tasks.

When compared to very long instruction word (VLIW) architectures, microcode differs fundamentally in how it manages parallelism and scheduling. Microcode hides the scheduling of operations from the programmer by storing predefined sequences in the control store, where the microcontrol unit sequentially dispatches micro-instructions without explicit intervention for parallelism. VLIW, however, shifts this responsibility to the compiler, which explicitly packs multiple independent operations into a single long instruction word for parallel execution, exposing the parallelism directly to the hardware and eliminating the need for microprogram sequencing. This compiler-driven approach in VLIW avoids the sequential fetch overhead of microcode but demands precise static scheduling, often inserting no-operation (NOP) instructions to resolve dependencies, whereas microcode's fixed sequences provide more abstraction at the cost of flexibility in dynamic environments.
The trade-offs between these approaches highlight microcode's strength in maintaining backward compatibility, particularly in legacy-heavy ecosystems like x86, where it allows incremental enhancements to complex instructions without disrupting existing software binaries. RISC and VLIW designs, by prioritizing speed through simplified or explicitly parallel instructions, excel in greenfield applications but often necessitate full ISA redesigns or recompilation for evolution, limiting their adaptability to entrenched codebases. A notable example is Intel's micro-operation (μop) cache, which stores decoded μops for frequent instructions, bypassing the traditional microcode decoder to enable RISC-like direct execution and reducing front-end latency. Ultimately, microcode enables CISC hardware to emulate RISC by breaking down instructions into efficient μop sequences, while VLIW's explicit parallelism requires an ISA overhaul to leverage without such emulation layers.
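The contrast in who schedules the work can be sketched as follows, using a hypothetical three-slot VLIW machine with invented mnemonics:

```python
# Three operations: the add and the load are independent, but the
# branch depends on both completing first.

vliw_program = [
    # (ALU slot,       load/store slot,  branch slot)
    ("add r1,r2,r3",   "load r4,[r5]",   "nop"),      # compiler packs two
    ("nop",            "nop",            "jmp loop"), # pads with NOPs
]

microcode_stream = [
    "add r1,r2,r3",  # the microsequencer issues one micro-operation
    "load r4,[r5]",  # per step; any overlap between them is hidden
    "jmp loop",      # inside the control unit, invisible to software
]

print(len(vliw_program), "VLIW words vs",
      len(microcode_stream), "microinstructions")
```

The same three operations appear in both; VLIW exposes the schedule (including the padding NOPs) in the instruction stream, while the microcoded machine keeps ordering decisions inside the control store.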

Modern applications

Processor examples

The IBM System/370, introduced in 1970, utilized writable control store in models like the Model 145 to implement vertical microcode, enabling flexible emulation of other architectures and supporting virtualization through the Virtual Machine Facility/370 (VM/370). This approach allowed customers to load custom microcode into the processor's control storage, facilitating efficient resource sharing among multiple virtual machines without hardware modifications.

The Intel 8086 microprocessor, released in 1978, employed vertical microcode to handle its 8- and 16-bit instructions, with the microcode engine decoding opcodes into sequences of simpler operations stored in a 512-entry ROM. This design balanced complexity and efficiency in a compact die, using a more encoded format to minimize storage while supporting variable-length instructions.

In modern x86 and x86-64 processors, microcode plays a key role in translating complex CISC instructions into simpler RISC-like micro-operations (μops) for execution on the internal pipeline, with updates delivered through firmware or operating system updates to address bugs and enhance compatibility. This translation layer allows the retention of the legacy x86 instruction set while leveraging RISC-inspired hardware for performance. ARM processors, adhering to RISC principles with fixed-length instructions, generally avoid extensive microcode in favor of direct hardware decoding. Modern x86 processors load microcode patches at boot time via firmware or the operating system to mitigate vulnerabilities, including the 2018 Meltdown exploit, which affected Intel processors by enabling unauthorized kernel memory access through flaws addressed in microcode revisions. For instance, the x86 REP MOVS instruction, used for block memory transfers, is decomposed by microcode into approximately 20 μops on processors like Intel's Nehalem, involving loops for repetition, address increments, and data movement to execute efficiently on the out-of-order core.
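The REP MOVS decomposition can be sketched as a loop of simple steps. This is illustrative only; actual μop sequences are proprietary and include optimizations such as fast-string moves:

```python
# Rough model of the microcoded loop behind REP MOVSB: each iteration
# corresponds to a handful of simple micro-operations that the
# out-of-order core can execute like any other uops.

def rep_movs(mem, src, dst, count):
    """Copy `count` bytes within mem, one byte per loop iteration."""
    while count > 0:         # uop: test the count, exit loop at zero
        mem[dst] = mem[src]  # uops: load from source, store to destination
        src += 1             # uop: advance the source index (RSI)
        dst += 1             # uop: advance the destination index (RDI)
        count -= 1           # uop: decrement the count (RCX)
    return mem

mem = list(b"abcXXX")
print(bytes(rep_movs(mem, src=0, dst=3, count=3)))  # b'abcabc'
```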

Recent developments and challenges

In the multi-core era, microcode has become essential for managing heterogeneous processor environments, where processors integrate cores with diverse capabilities to balance performance and efficiency. For instance, Intel's Alder Lake processors (12th generation, released 2021) initially supported AVX-512 vector extensions on high-performance (P) cores but not on efficiency (E) cores by hardware design; subsequent microcode updates in 2022 disabled AVX-512 on P-cores to ensure consistent behavior in hybrid workloads and prevent compatibility issues.

Security vulnerabilities disclosed in 2018, such as Spectre, exposed weaknesses in micro-op scheduling and speculative execution within x86 processors, enabling side-channel attacks that leak data across security boundaries. Spectre variants exploit the CPU's predictive mechanisms to execute unauthorized micro-operations, potentially revealing sensitive information from kernel memory. To counter these, Intel and AMD released microcode updates that alter branch prediction behavior and scheduling logic, reducing the attack surface without requiring full hardware redesigns; these patches have been widely deployed to mitigate impacts on systems running vulnerable software.

Microcode updates are typically delivered via operating system integrations or firmware updates, facilitating remote corrections for processor errata. Since the 2010s, operating systems such as Linux have incorporated microcode patches into the boot process, applying them dynamically during startup to address stability and security issues without user intervention or firmware flashing. This mechanism relies on the writable control stores of modern CPUs, enabling volatile updates that persist only until power-off, thus minimizing risks from persistent modifications.

Contemporary challenges in x86 microcode design stem from the rising micro-operation (μop) count per instruction, with complex x86 opcodes often decoding into 4-5 μops, which amplifies power dissipation and thermal constraints in dense multi-core dies.
This complexity, while enabling backward compatibility, strains decoder throughput and increases energy overhead, prompting innovations like larger μop caches to bypass repeated decoding. In contrast, the RISC-V architecture treats microcode as optional, allowing implementers to adopt microprogrammed control units for extensible custom instructions, fostering more efficient designs in embedded and server applications.

Emerging applications highlight microcode's adaptability, particularly in specialized hardware. Research into AI accelerators demonstrates microprogrammable control units that sequence operations via microcode, offering reconfiguration for evolving architectures without silicon changes. Additionally, as of November 2025, AMD's Zen 5 processors (released 2024) employ microcode updates delivered via firmware to resolve flaws in the RDSEED instruction (AMD-SB-7055, disclosed October 2025), which can produce non-random values affecting cryptographic operations, underscoring microcode's role in maintaining correct execution in modern processors. Potential extensions include integrating quantum-resistant instructions into microcode layers to accelerate post-quantum cryptographic primitives, aligning with NIST's 2024 standards for lattice-based algorithms.
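On x86 Linux systems, the kernel reports the currently loaded microcode revision per CPU in /proc/cpuinfo, so the effect of an OS- or firmware-delivered update can be verified with a few lines of Python:

```python
# Read the loaded microcode revision from the kernel's per-CPU report.
# The "microcode" field appears in /proc/cpuinfo on x86 Linux systems.

def microcode_revision(path="/proc/cpuinfo"):
    """Return the first reported microcode revision, or None if absent."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("microcode"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass  # file missing (non-Linux or restricted environment)
    return None

print(microcode_revision())  # a hex string such as '0x2f' on x86 Linux
```

Comparing this value before and after applying an update (or across reboots) confirms that the volatile patch was loaded, since the revision reverts at power-off unless the update is reapplied.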
