Field-programmable gate array

A field-programmable gate array (FPGA) is a type of configurable integrated circuit that can be repeatedly programmed after manufacturing. FPGAs are a subset of logic devices referred to as programmable logic devices (PLDs). They consist of an array of programmable logic blocks that can be configured "in the field" to interconnect with other logic blocks to perform various digital functions. FPGAs are often used in low-volume production of custom-made products, and in research and development, where the higher cost of individual FPGAs is not as important and where creating and manufacturing a custom circuit would not be feasible. Other applications for FPGAs include the telecommunications, automotive, aerospace, and industrial sectors, which benefit from their flexibility, high signal processing speed, and parallel processing abilities.
An FPGA configuration is generally written using a hardware description language (HDL) such as VHDL or Verilog, similar to the languages used for application-specific integrated circuits (ASICs). Circuit diagrams (schematics) were formerly used to specify the configuration.
The logic blocks of an FPGA can be configured to perform complex combinational functions, or act as simple logic gates like AND and XOR. In most FPGAs, logic blocks also include memory elements, which may be simple flip-flops or more sophisticated blocks of memory.[1] Many FPGAs can be reprogrammed to implement different logic functions, enabling flexible reconfigurable computing of the kind otherwise performed in software.
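As a minimal illustration of both points (a sketch with hypothetical module and signal names, not taken from any cited source), a few lines of Verilog describe a combinational function, which synthesis tools typically map onto LUTs, and a registered output implemented by a logic block's flip-flop:

    // Hedged sketch: a small 4-input function plus a registered copy of it.
    // Synthesis would typically map f onto a single 4-input LUT and q onto
    // a logic-block flip-flop.
    module lut_ff_example (
        input  wire clk,
        input  wire a, b, c, d,
        output wire f,
        output reg  q
    );
        assign f = (a & b) | (c ^ d);   // combinational logic -> LUT

        always @(posedge clk)
            q <= f;                     // sequential logic -> flip-flop
    endmodule

Reprogramming the device with a different bitstream replaces both the truth table and the register behavior without any hardware change.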
FPGAs also have a role in embedded system development due to their capability to start system software development simultaneously with hardware, enable system performance simulations at a very early phase of the development, and allow various system trials and design iterations before finalizing the system architecture.[2]
FPGAs are also commonly used during the development of ASICs to speed up the simulation process.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs both had the option of being programmed in batches in a factory or in the field (field-programmable).[3]
Altera was founded in 1983 and delivered the industry's first reprogrammable logic device in 1984 – the EP300 – which featured a quartz window in the package that allowed users to shine an ultra-violet lamp on the die to erase the EPROM cells that held the device configuration.[4]
Xilinx produced the first commercially viable field-programmable gate array in 1985[3] – the XC2064.[5] The XC2064 had programmable gates and programmable interconnects between gates, the beginnings of a new technology and market.[6] The XC2064 had 64 configurable logic blocks (CLBs), with two three-input lookup tables (LUTs).[7]
In 1987, the Naval Surface Warfare Center funded an experiment proposed by Steve Casselman to develop a computer that would implement 600,000 reprogrammable gates. Casselman was successful and a patent related to the system was issued in 1992.[3]
Altera and Xilinx continued unchallenged and quickly grew from 1985 to the mid-1990s when competitors sprouted up, eroding a significant portion of their market share. By 1993, Actel (later Microsemi, now Microchip) was serving about 18 percent of the market.[6]
The 1990s were a period of rapid growth for FPGAs, both in circuit sophistication and the volume of production. In the early 1990s, FPGAs were primarily used in telecommunications and networking. By the end of the decade, FPGAs found their way into consumer, automotive, and industrial applications.[8]
By 2013, Altera (31 percent), Xilinx (36 percent) and Actel (10 percent) together represented approximately 77 percent of the FPGA market.[9]
Companies like Microsoft have started to use FPGAs to accelerate high-performance, computationally intensive systems (like the data centers that operate their Bing search engine), due to the performance per watt advantage FPGAs deliver.[10] Microsoft began using FPGAs to accelerate Bing in 2014, and in 2018 began deploying FPGAs across other data center workloads for their Azure cloud computing platform.[11]
Since 2019, modern generations of FPGAs have integrated other architectures, such as AI engines, to target workloads in the artificial intelligence domain.[12]
Growth
The following timelines indicate progress in different aspects of FPGA design.
Gates
- 1987: 9,000 gates, Xilinx[6]
- 1992: 600,000, Naval Surface Warfare Center[3]
- Early 2000s: millions[8]
- 2013: 50 million, Xilinx[13]
Market size
- 1985: First commercial FPGA: Xilinx XC2064[5][6]
- 1987: $14 million[6]
- c. 1993: >$385 million[6][failed verification]
- 2005: $1.9 billion[14]
- 2010 estimates: $2.75 billion[14]
- 2013: $5.4 billion[15]
- 2020 estimate: $9.8 billion[15]
- 2030 estimate: $23.34 billion[16]
Design starts
A design start is a new custom design for implementation on an FPGA.
Design
Contemporary FPGAs have ample logic gates and RAM blocks to implement complex digital computations. FPGAs can be used to implement any logical function that an ASIC can perform. The ability to update the functionality after shipping, partial re-configuration of a portion of the design[19] and the low non-recurring engineering costs relative to an ASIC design (notwithstanding the generally higher unit cost), offer advantages for many applications.[1]
As FPGA designs employ very fast I/O rates and bidirectional data buses, it becomes a challenge to verify correct timing of valid data within setup and hold time windows.[20] Floor planning helps allocate resources within FPGAs to meet these timing constraints.
Some FPGAs have analog features in addition to digital functions. The most common analog feature is a programmable slew rate on each output pin. This allows the user to set low rates on lightly loaded pins that would otherwise ring or couple unacceptably, and to set higher rates on heavily loaded high-speed channels that would otherwise run too slowly.[21][22] Also common are quartz-crystal oscillator driver circuitry, on-chip RC oscillators, and phase-locked loops with embedded voltage-controlled oscillators used for clock generation and management as well as for high-speed serializer-deserializer (SERDES) transmit clocks and receiver clock recovery. Fairly common are differential comparators on input pins designed to be connected to differential signaling channels. A few mixed signal FPGAs have integrated peripheral analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) with analog signal conditioning blocks, allowing them to operate as a system on a chip (SoC).[23] Such devices blur the line between an FPGA, which carries digital ones and zeros on its internal programmable interconnect fabric, and field-programmable analog array (FPAA), which carries analog values on its internal programmable interconnect fabric.
Logic blocks
The most common FPGA architecture consists of an array of logic blocks called configurable logic blocks (CLBs) or logic array blocks (LABs) (depending on vendor), I/O pads, and routing channels.[1] Generally, all the routing channels have the same width (number of signals). Multiple I/O pads may fit into the height of one row or the width of one column in the array.
"An application circuit must be mapped into an FPGA with adequate resources. While the number of logic blocks and I/Os required is easily determined from the design, the number of routing channels needed may vary considerably even among designs with the same amount of logic. For example, a crossbar switch requires much more routing than a systolic array with the same gate count. Since unused routing channels increase the cost (and decrease the performance) of the FPGA without providing any benefit, FPGA manufacturers try to provide just enough channels so that most designs that will fit in terms of lookup tables (LUTs) and I/Os can be routed. This is determined by estimates such as those derived from Rent's rule or by experiments with existing designs."[24]
In general, a logic block consists of a few logical cells. A typical cell consists of a 4-input LUT, a full adder (FA) and a D-type flip-flop. The LUT might be split into two 3-input LUTs. In normal mode those are combined into a 4-input LUT through the first multiplexer (mux). In arithmetic mode, their outputs are fed to the adder. The selection of mode is programmed into the second mux. The output can be either synchronous or asynchronous, depending on the programming of the third mux. In practice, the entire adder or parts of it are stored as functions into the LUTs in order to save space.[25][26][27]
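The structure just described can be captured in a short behavioral sketch (a deliberately simplified model with hypothetical port names, not any vendor's actual cell):

    // Simplified behavioral sketch of the logic cell described above: a
    // 4-input LUT, one carry-chain stage, and a D flip-flop, with the mode
    // and output-register choices modeled as configuration inputs.
    module logic_cell_sketch (
        input  wire        clk,
        input  wire [3:0]  in,         // LUT inputs
        input  wire        carry_in,
        input  wire [15:0] lut_init,   // configuration: LUT truth table
        input  wire        arith_mode, // configuration: "second mux"
        input  wire        sync_out,   // configuration: "third mux"
        output wire        carry_out,
        output wire        out
    );
        wire lut_out = lut_init[in];                   // 4-input LUT lookup
        wire sum     = lut_out ^ carry_in;             // arithmetic-mode sum
        assign carry_out = lut_out ? carry_in : in[0]; // simplified carry select

        wire comb = arith_mode ? sum : lut_out;        // mode selection

        reg q;
        always @(posedge clk)
            q <= comb;

        assign out = sync_out ? q : comb;  // synchronous or asynchronous output
    endmodule

In a real device the configuration inputs are fixed bits loaded from the bitstream rather than live signals, and the carry logic is more elaborate than this selector.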
Hard blocks
Modern FPGA families expand upon the above capabilities to include higher-level functionality fixed in silicon. Having these common functions embedded in the circuit reduces the area required and gives those functions increased performance compared to building them from logical primitives. Examples of these include multipliers, generic DSP blocks, embedded processors, high-speed I/O logic and embedded memories.
Higher-end FPGAs can contain high-speed multi-gigabit transceivers and hard IP cores such as processor cores, Ethernet medium access control units, PCI or PCI Express controllers, and external memory controllers. These cores exist alongside the programmable fabric, but they are built out of transistors instead of LUTs so they have ASIC-level performance and power consumption without consuming a significant amount of fabric resources, leaving more of the fabric free for the application-specific logic. The multi-gigabit transceivers also contain high-performance signal conditioning circuitry along with high-speed serializers and deserializers, components that cannot be built out of LUTs. Higher-level physical layer (PHY) functionality such as line coding may or may not be implemented alongside the serializers and deserializers in hard logic, depending on the FPGA.
Soft core
An alternate approach to using hard macro processors is to make use of soft processor IP cores that are implemented within the FPGA logic. Nios II, MicroBlaze and Mico32 are examples of popular softcore processors. Many modern FPGAs are programmed at run time, which has led to the idea of reconfigurable computing or reconfigurable systems – CPUs that reconfigure themselves to suit the task at hand. Additionally, new non-FPGA architectures are beginning to emerge. Software-configurable microprocessors such as the Stretch S5000 adopt a hybrid approach by providing an array of processor cores and FPGA-like programmable cores on the same chip.
Integration
In 2012 the coarse-grained architectural approach was taken a step further by combining the logic blocks and interconnects of traditional FPGAs with embedded microprocessors and related peripherals to form a complete system on a programmable chip. Examples of such hybrid technologies can be found in the Xilinx Zynq-7000 all programmable SoC,[28] which includes a 1.0 GHz dual-core ARM Cortex-A9 MPCore processor embedded within the FPGA's logic fabric,[29] or in the Altera Arria V FPGA, which includes an 800 MHz dual-core ARM Cortex-A9 MPCore. The Atmel FPSLIC is another such device, which uses an AVR processor in combination with Atmel's programmable logic architecture. The Microsemi SmartFusion devices incorporate an ARM Cortex-M3 hard processor core (with up to 512 kB of flash and 64 kB of RAM) and analog peripherals such as multi-channel analog-to-digital converters and digital-to-analog converters in their flash memory-based FPGA fabric.[citation needed]
Clocking
Most of the logic inside of an FPGA is synchronous circuitry that requires a clock signal. FPGAs contain dedicated global and regional routing networks for clock and reset, typically implemented as an H tree, so they can be delivered with minimal skew. FPGAs may contain analog phase-locked loop or delay-locked loop components to synthesize new clock frequencies and manage jitter. Complex designs can use multiple clocks with different frequency and phase relationships, each forming separate clock domains. These clock signals can be generated locally by an oscillator or they can be recovered from a data stream. Care must be taken when building clock domain crossing circuitry to avoid metastability. Some FPGAs contain dual port RAM blocks that are capable of working with different clocks, aiding in the construction of FIFOs and dual port buffers that bridge clock domains.
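For a single-bit control signal, a common guard against metastability is a two-stage synchronizer in the destination clock domain. A generic sketch (hypothetical names, not tied to any vendor):

    // Two-flop synchronizer: a standard pattern for passing a single-bit
    // signal into another clock domain, reducing metastability risk.
    module sync_2ff (
        input  wire dst_clk,   // destination-domain clock
        input  wire async_in,  // signal arriving from another clock domain
        output wire sync_out
    );
        reg stage1, stage2;
        always @(posedge dst_clk) begin
            stage1 <= async_in;  // may go metastable; given a cycle to settle
            stage2 <= stage1;    // stable, synchronized output
        end
        assign sync_out = stage2;
    endmodule

Multi-bit buses are instead passed through the dual-clock FIFOs mentioned above, since bit-by-bit synchronization cannot keep a bus coherent.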
3D architectures
To shrink the size and power consumption of FPGAs, vendors such as Tabula and Xilinx have introduced 3D or stacked architectures.[30][31] Following the introduction of its 28 nm 7-series FPGAs, Xilinx said that several of the highest-density parts in those FPGA product lines will be constructed using multiple dies in one package, employing technology developed for 3D construction and stacked-die assemblies.
Xilinx's approach places several (three or four) active FPGA dies side by side on a silicon interposer – a single piece of silicon that carries passive interconnect.[31][32] The multi-die construction also allows different parts of the FPGA to be created with different process technologies, as the process requirements are different between the FPGA fabric itself and the very high speed 28 Gbit/s serial transceivers. An FPGA built in this way is called a heterogeneous FPGA.[33]
Altera's heterogeneous approach involves using a single monolithic FPGA die and connecting other dies and technologies to the FPGA using Intel's embedded multi-die interconnect bridge (EMIB) technology.[34]
Programming
To define the behavior of the FPGA, the user provides a design in a hardware description language (HDL) or as a schematic design. The HDL form is more suited to working with large structures because it is possible to specify high-level functional behavior rather than drawing every piece by hand. However, schematic entry can allow for easier visualization of a design and its component modules.
Using an electronic design automation tool, a technology-mapped netlist is generated. The netlist can then be fit to the actual FPGA architecture using a process called place and route, usually performed by the FPGA company's proprietary place-and-route software. The user will validate the results using timing analysis, simulation, and other verification and validation techniques. Once the design and validation process is complete, the binary file generated, typically using the FPGA vendor's proprietary software, is used to (re-)configure the FPGA. This file is transferred to the FPGA via a serial interface (JTAG) or to an external memory device such as an EEPROM.
The most common HDLs are VHDL and Verilog. National Instruments' LabVIEW graphical programming language (sometimes referred to as G) has an FPGA add-in module available to target and program FPGA hardware. Verilog was created to simplify the process and make HDL more robust and flexible; unlike VHDL, it has a C-like syntax.[35][self-published source?]
To simplify the design of complex systems in FPGAs, there exist libraries of predefined complex functions and circuits that have been tested and optimized to speed up the design process. These predefined circuits are commonly called intellectual property (IP) cores, and are available from FPGA vendors and third-party IP suppliers. They are rarely free, and typically released under proprietary licenses. Other predefined circuits are available from developer communities such as OpenCores (typically released under free and open source licenses such as the GPL, BSD or similar license). Such designs are known as open-source hardware.
In a typical design flow, an FPGA application developer will simulate the design at multiple stages throughout the design process. Initially the RTL description in VHDL or Verilog is simulated by creating test benches to simulate the system and observe results. Then, after the synthesis engine has mapped the design to a netlist, the netlist is translated to a gate-level description where simulation is repeated to confirm the synthesis proceeded without errors. Finally, the design is laid out in the FPGA at which point propagation delay values can be back-annotated onto the netlist, and the simulation can be run again with these values.
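As an illustration of the first stage, a minimal Verilog test bench (hypothetical names, paired with the lut_ff_example module sketched earlier) drives stimulus into a device under test and prints results for inspection:

    // Minimal RTL-simulation test bench sketch: applies all input
    // combinations to the earlier lut_ff_example module and logs outputs.
    `timescale 1ns/1ps
    module tb_lut_ff_example;
        reg  clk = 0, a, b, c, d;
        wire f, q;

        lut_ff_example dut (.clk(clk), .a(a), .b(b), .c(c), .d(d), .f(f), .q(q));

        always #5 clk = ~clk;   // 10 ns period -> 100 MHz clock

        initial begin
            {a, b, c, d} = 4'b0000;
            repeat (16) begin
                @(negedge clk);
                {a, b, c, d} = {a, b, c, d} + 1;  // walk all 16 combinations
                $display("in=%b f=%b q=%b", {a, b, c, d}, f, q);
            end
            $finish;
        end
    endmodule

The same test bench can later be re-run against the gate-level and back-annotated netlists to confirm each stage of the flow.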
More recently, OpenCL (Open Computing Language) is being used by programmers to take advantage of the performance and power efficiencies that FPGAs provide. OpenCL allows programmers to develop code in the C programming language.[36] For further information, see high-level synthesis and C to HDL.
Most FPGAs rely on an SRAM-based approach to be programmed. These FPGAs are in-system programmable and re-programmable, but require external boot devices. For example, flash memory or EEPROM devices may load contents into internal SRAM that controls routing and logic. The SRAM approach is based on CMOS.
Rarer alternatives to the SRAM approach include:
- Fuse: one-time programmable. Bipolar. Obsolete.
- Antifuse: one-time programmable. CMOS. Examples: Actel SX and Axcelerator families; Quicklogic Eclipse II family.[37]
- PROM: programmable read-only memory technology. One-time programmable because of plastic packaging.[clarification needed] Obsolete.
- EPROM: erasable programmable read-only memory technology. One-time programmable but with window, can be erased with ultraviolet (UV) light. CMOS. Obsolete.
- EEPROM: electrically erasable programmable read-only memory technology. Can be erased, even in plastic packages. Some but not all EEPROM devices can be in-system programmed. CMOS.
- Flash: flash-erase EPROM technology. Can be erased, even in plastic packages. Some but not all flash devices can be in-system programmed. Usually, a flash cell is smaller than an equivalent EEPROM cell and is, therefore, less expensive to manufacture. CMOS. Example: Actel ProASIC family.[37]
Manufacturers
In 2016, long-time industry rivals Xilinx (now part of AMD) and Altera (now part of Intel) were the FPGA market leaders.[38] At that time, they controlled nearly 90 percent of the market.
Both Xilinx and Altera provide proprietary electronic design automation software for Windows and Linux (ISE/Vivado and Quartus) which enables engineers to design, analyze, simulate, and synthesize (compile) their designs.[39][40]
In March 2010, Tabula announced their FPGA technology, which uses time-multiplexed logic and interconnect, claiming potential cost savings for high-density applications.[41] On March 24, 2015, Tabula officially shut down.[42]
On June 1, 2015, Intel announced it would acquire Altera for approximately US$16.7 billion and completed the acquisition on December 30, 2015.[43]
On October 27, 2020, AMD announced it would acquire Xilinx[44] and completed the acquisition valued at about US$50 billion in February 2022.[45]
In February 2024 Altera became independent of Intel again.[46]
Other manufacturers include:
- Achronix, manufacturing SRAM-based FPGAs with 1.5 GHz fabric speed[47]
- Altium, which provides a system-on-FPGA hardware-software design environment[48]
- Cologne Chip, German government-backed designer and producer of FPGAs[49]
- Efinix offers small to medium-sized FPGAs. They combine logic and routing interconnects into a configurable XLR cell.[citation needed]
- GOWIN Semiconductors, manufacturing small and medium-sized SRAM and flash-based FPGAs. They also offer pin-compatible replacements for a few Xilinx, Altera and Lattice products.[citation needed]
- Lattice Semiconductor manufactures low-power SRAM-based FPGAs featuring integrated configuration flash, instant-on and live reconfiguration
- SiliconBlue Technologies provides extremely low-power SRAM-based FPGAs with optional integrated nonvolatile configuration memory; acquired by Lattice in 2011
- Microchip:
- Microsemi (previously Actel), producing antifuse, flash-based, mixed-signal FPGAs; acquired by Microchip in 2018
- Atmel, a second source of some Altera-compatible devices; also FPSLIC[clarification needed] mentioned above;[50] acquired by Microchip in 2016
- QuickLogic manufactures ultra-low-power sensor hubs and extremely low-power, low-density SRAM-based FPGAs, with display bridges accepting MIPI and RGB inputs and providing MIPI, RGB, and LVDS outputs[51]
Applications
An FPGA can be used to solve any problem which is computable. FPGAs can be used to implement a soft microprocessor, such as the Xilinx MicroBlaze or Altera Nios II. Their advantage lies in being significantly faster for some applications, because of their parallel nature and the efficiency with which their gates can be allocated to a given process.[52]
FPGAs were originally introduced as competitors to complex programmable logic devices (CPLDs) to implement glue logic for printed circuit boards. As their size, capabilities, and speed increased, FPGAs took over additional functions to the point where some are now marketed as full systems on chips (SoCs). Particularly with the introduction of dedicated multipliers into FPGA architectures in the late 1990s, applications that had traditionally been the sole reserve of digital signal processors (DSPs) began to use FPGAs instead.[53][54]
The evolution of FPGAs has motivated an increase in the use of these devices, whose architecture allows the development of hardware solutions optimized for complex tasks, such as 3D MRI image segmentation, 3D discrete wavelet transform, tomographic image reconstruction, or PET/MRI systems.[55][56] The developed solutions can perform intensive computation tasks with parallel processing, are dynamically reprogrammable, and have a low cost, all while meeting the hard real-time requirements associated with medical imaging.
Another trend in the use of FPGAs is hardware acceleration, where one can use the FPGA to accelerate certain parts of an algorithm and share part of the computation between the FPGA and a general-purpose processor. The search engine Bing is noted for adopting FPGA acceleration for its search algorithm in 2014.[57] As of 2018, FPGAs are seeing increased use as AI accelerators including Microsoft's Project Catapult[11] and for accelerating artificial neural networks for machine learning applications.
Originally,[when?] FPGAs were reserved for specific vertical applications where the volume of production is small. For these low-volume applications, the premium that companies pay in hardware cost per unit for a programmable chip is more affordable than the development resources spent on creating an ASIC. Often a custom-made chip would be cheaper if made in larger quantities, but FPGAs may be chosen to quickly bring a product to market. By 2017, new cost and performance dynamics broadened the range of viable applications.[citation needed]
Other uses for FPGAs include:
- Space (with radiation hardening[58])
- Hardware security modules[59]
- High-speed financial transactions[60][61]
- Retrocomputing (e.g. the MARS and MiSTer FPGA projects)[62]
- Large-scale integrated digital differential analyzers, a form of analog computer based on digital computing elements[63]
Usage by United States military
FPGAs play a crucial role in modern military communications, especially in systems like the Joint Tactical Radio System (JTRS) and in devices from companies such as Thales and Harris Corporation. Their flexibility and programmability make them ideal for military communications, offering customizable and secure signal processing. In the JTRS, used by the US military, FPGAs provide adaptability and real-time processing, crucial for meeting various communication standards and encryption methods.[64]
Security
Concerning hardware security, FPGAs have both advantages and disadvantages as compared to ASICs or secure microprocessors. FPGAs' flexibility makes malicious modifications during fabrication a lower risk.[65] Previously, for many FPGAs, the design bitstream was exposed while the FPGA loads it from external memory, typically during powerup. All major FPGA vendors now offer a spectrum of security solutions to designers such as bitstream encryption and authentication. For example, Altera and Xilinx offer AES encryption (up to 256-bit) for bitstreams stored in an external flash memory. Physical unclonable functions (PUFs) are integrated circuits that have their own unique signatures and can be used to secure FPGAs while taking up very little hardware space.[66]
FPGAs that store their configuration internally in nonvolatile flash memory, such as Microsemi's ProAsic 3 or Lattice's XP2 programmable devices, do not expose the bitstream and do not need encryption. Customers wanting a higher guarantee of tamper resistance can use write-once, antifuse FPGAs from vendors such as Microsemi.
With its Stratix 10 FPGAs and SoCs, Altera introduced a Secure Device Manager and physical unclonable functions to provide high levels of protection against physical attacks.[67]
In 2012 researchers Sergei Skorobogatov and Christopher Woods demonstrated that some FPGAs can be vulnerable to hostile intent. They discovered a critical backdoor vulnerability had been manufactured in silicon as part of the Actel/Microsemi ProAsic 3 making it vulnerable on many levels such as reprogramming crypto and access keys, accessing unencrypted bitstream, modifying low-level silicon features, and extracting configuration data.[68]
In 2020 a critical vulnerability (named Starbleed) was discovered in all Xilinx 7-series FPGAs that rendered bitstream encryption useless. There is no workaround, and Xilinx did not produce a hardware revision. UltraScale and later devices, already on the market at the time, were not affected.[citation needed]
Similar technologies
Historically, FPGAs have been slower, less energy efficient and generally achieved less functionality than their fixed ASIC counterparts. A study from 2006 showed that designs implemented on FPGAs need on average 40 times as much area, draw 12 times as much dynamic power, and run at one third the speed of corresponding ASIC implementations.[69]
Advantages of FPGAs include the ability to reprogram equipment in the field to fix bugs or make other improvements. Some FPGAs have the capability of partial re-configuration that lets one portion of the device be re-programmed while other portions continue running.[70][71] Other advantages may include shorter time to market and lower non-recurring engineering costs. Vendors can also take a middle road via FPGA prototyping: developing their prototype hardware on FPGAs, but manufacturing their final version as an ASIC after the design has been committed. This is often also the case with new processor designs.[72]
The primary differences between CPLDs and FPGAs are architectural. A CPLD has a comparatively restrictive structure consisting of one or more programmable sum-of-products logic arrays feeding a relatively small number of clocked registers. As a result, CPLDs are less flexible but have the advantage of more predictable propagation delay. FPGA architectures, on the other hand, are dominated by interconnect. This makes them far more flexible but also far more complex to design for, or at least requiring more complex electronic design automation (EDA) software. Another distinction between FPGAs and CPLDs is one of size, as FPGAs are usually much larger in terms of resources than CPLDs. Typically only FPGAs contain more complex embedded functions such as adders, multipliers, memory, and serializer/deserializers. Another common distinction is that CPLDs contain embedded flash memory to store their configuration, while FPGAs typically store their configuration in SRAM and require external non-volatile memory to initialize it on powerup. When a design requires simple instant-on, CPLDs are generally preferred. Sometimes both CPLDs and FPGAs are used in a single system design. In those designs, CPLDs generally perform glue logic functions.[73]
See also
- FPGA Mezzanine Card
- CRUVI, an FPGA daughter-card standard of the Standardization Group for Embedded Technologies e.V. (SGET)
- List of HDL simulators
References
- ^ a b c "FPGA Architecture for the Challenge". toronto.edu. University of Toronto.
- ^ Simpson, P. A. (2015). FPGA Design, Best Practices for Team Based Reuse, 2nd edition. Switzerland: Springer International Publishing AG. p. 16. ISBN 978-3-319-17924-7.
- ^ a b c d "History of FPGAs". Archived from the original on April 12, 2007. Retrieved 2013-07-11.
- ^ Ron Wilson (21 April 2015). "In the Beginning". altera.com. Archived from the original on 2015-04-21.
- ^ a b "XCELL issue 32" (PDF). Xilinx. Archived (PDF) from the original on 2011-01-07.
- ^ a b c d e f Funding Universe. "Xilinx, Inc." Retrieved January 15, 2009.
- ^ Clive Maxfield, Programmable Logic DesignLine, "Xilinx unveil revolutionary 65nm FPGA architecture: the Virtex-5 family". Archived 2009-12-25 at the Wayback Machine. May 15, 2006. Retrieved February 5, 2009.
- ^ a b Maxfield, Clive (2004). The Design Warrior's Guide to FPGAs: Devices, Tools and Flows. Elsevier. p. 4. ISBN 978-0-7506-7604-5.
- ^ "Top FPGA Companies For 2013". sourcetech411.com. 2013-04-28. Archived from the original on 2015-07-09. Retrieved 2015-07-08.
- ^ "Microsoft Supercharges Bing Search With Programmable Chips". WIRED. 16 June 2014.
- ^ a b "Project Catapult". Microsoft Research. July 2018.
- ^ Gaide, Brian; Gaitonde, Dinesh; Ravishankar, Chirag; Bauer, Trevor (2019-02-20). "Xilinx Adaptive Compute Acceleration Platform: Versal Architecture". Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM. pp. 84–93. doi:10.1145/3289602.3293906. ISBN 978-1-4503-6137-8.
- ^ Maxfield, Max. "Xilinx UltraScale FPGA Offers 50 Million Equivalent ASIC Gates". www.eetimes.com. EE Times.
- ^ a b Dylan McGrath, EE Times, "FPGA Market to Pass $2.7 Billion by '10, In-Stat Says". May 24, 2006. Retrieved February 5, 2009.
- ^ a b "Global FPGA Market Analysis And Segment Forecasts To 2020 – FPGA Industry, Outlook, Size, Application, Product, Share, Growth Prospects, Key Opportunities, Dynamics, Trends, Analysis, FPGA Report – Grand View Research Inc". grandviewresearch.com.
- ^ "Field Programmable Gate Array Market To Reach $23.34Bn By 2030". www.grandviewresearch.com. Retrieved 2024-04-25.
- ^ Dylan McGrath, EE Times, "Gartner Dataquest Analyst Gives ASIC, FPGA Markets Clean Bill of Health". June 13, 2005. Retrieved February 5, 2009.
- ^ "Virtex-4 Family Overview" (PDF). xilinx.com. Archived (PDF) from the original on 2007-11-22. Retrieved 14 April 2018.
- ^ Wisniewski, Remigiusz (2009). Synthesis of compositional microprogram control units for programmable devices. Zielona Góra: University of Zielona Góra. p. 153. ISBN 978-83-7481-293-1.[permanent dead link]
- ^ Oklobdzija, Vojin G. (2017). Digital Design and Fabrication. CRC Press. ISBN 9780849386046.
- ^ "FPGA Signal Integrity tutorial". altium.com. Archived from the original on 2016-03-07. Retrieved 2010-06-15.
- ^ NASA: FPGA drive strength Archived 2010-12-05 at the Wayback Machine
- ^ Mike Thompson (2007-07-02). "Mixed-signal FPGAs provide GREEN POWER". Design & Reuse.
- ^ M.b, Swami; V.p, Pawar (2014-07-31). "VLSI DESIGN: A NEW APPROACH". Journal of Intelligence Systems. 4 (1): 60–63. ISSN 2229-7057.
- ^ 2. CycloneII Architecture Archived 2010-12-14 at the Wayback Machine. Altera. February 2007
- ^ "Documentation: Stratix IV Devices" (PDF). Altera.com. 2008-06-11. Archived from the original (PDF) on 2011-09-26. Retrieved 2013-05-01.
- ^ Virtex-4 FPGA User Guide (December 1st, 2008). Xilinx, Inc.
- ^ "Xilinx Inc, Form 8-K, Current Report, Filing Date Oct 19, 2011". secdatabase.com. Retrieved May 6, 2018.
- ^ "Xilinx Inc, Form 10-K, Annual Report, Filing Date May 31, 2011". secdatabase.com. Retrieved May 6, 2018.
- ^ Dean Takahashi, VentureBeat. "Intel connection helped chip startup Tabula raise $108M." May 2, 2011. Retrieved May 13, 2011.
- ^ a b Lawrence Latif, The Inquirer. "FPGA manufacturer claims to beat Moore's Law." October 27, 2010. Retrieved May 12, 2011.
- ^ EDN Europe. "Xilinx adopts stacked-die 3D packaging Archived 2011-02-19 at the Wayback Machine." November 1, 2010. Retrieved May 12, 2011.
- ^ Saban, Kirk (December 11, 2012). "Xilinx Stacked Silicon Interconnect Technology Delivers Breakthrough FPGA Capacity, Bandwidth, and Power Efficiency" (PDF). xilinx.com. Archived (PDF) from the original on 2010-11-05. Retrieved 2018-11-30.
- ^ "Intel Custom Foundry EMIB". Intel. Archived from the original on 2015-07-13. Retrieved 2015-07-13.
- ^ "Battle Over the FPGA: VHDL vs Verilog! Who is the True Champ?". digilentinc.com. Archived from the original on 2020-12-26. Retrieved 2020-12-16.
- ^ "Why use OpenCL on FPGAs?". StreamComputing. 2014-09-16. Archived from the original on 2017-01-01. Retrieved 2015-07-17.
- ^ a b "All about FPGAs". 21 March 2006.
- ^ Dillien, Paul (March 6, 2017). "And the Winner of Best FPGA of 2016 is..." EETimes. Archived from the original on January 5, 2019. Retrieved September 7, 2017.
- ^ "Xilinx ISE Design Suite". www.xilinx.com. Retrieved 2018-12-01.
- ^ "FPGA Design Software - Intel Quartus Prime". Intel. Retrieved 2018-12-01.
- ^ "Tabula's Time Machine — Micro Processor Report" (PDF). Archived from the original (PDF) on 2011-04-10.
- ^ "Tabula to shut down; 120 jobs lost at fabless chip company". Silicon Valley Business Journal.
- ^ "Intel to buy Altera for $16.7 billion in its biggest deal ever". Reuters. June 2015.
- ^ "AMD to Acquire Xilinx, Creating the Industry's High Performance Computing Leader". October 2020.
- ^ "AMD closes record chip industry deal with estimated $50 billion purchase of Xilinx". Reuters. February 2022.
- ^ "Intel Launches Altera, Its New Standalone FPGA Company". Intel (Press release). Retrieved 2024-02-29.
- ^ "Achronix to Use Intel's 22nm Manufacturing". Intel Newsroom (Press release). 2010-11-01. Archived from the original on 2015-09-30. Retrieved 2018-12-01.[better source needed]
- ^ Maxfield, Clive (16 June 2004). The Design Warrior's Guide to FPGAs. Elsevier Science. ISBN 9780080477138.
- ^ "About the company – Cologne Chip". Retrieved 2024-02-27.[better source needed]
- ^ "Top FPGA Companies For 2013". SourceTech411. 2013-04-28. Archived from the original on 2018-08-24. Retrieved 2018-12-01.
- ^ "QuickLogic — Customizable Semiconductor Solutions for Mobile Devices". www.quicklogic.com. QuickLogic Corporation. Retrieved 2018-10-07.[better source needed]
- ^ "Xilinx Inc, Form 8-K, Current Report, Filing Date Apr 26, 2006". secdatabase.com. Retrieved May 6, 2018.
- ^ "Publications and Presentations". bdti.com. Archived from the original on 2010-08-21. Retrieved 2018-11-02.
- ^ LaPedus, Mark (5 February 2007). "Xilinx aims 65-nm FPGAs at DSP applications". EETimes.
- ^ Alcaín, Eduardo; Fernández, Pedro R.; Nieto, Rubén; Montemayor, Antonio S.; Vilas, Jaime; Galiana-Bordera, Adrian; Martinez-Girones, Pedro Miguel; Prieto-de-la-Lastra, Carmen; Rodriguez-Vila, Borja; Bonet, Marina; Rodriguez-Sanchez, Cristina (2021-12-15). "Hardware Architectures for Real-Time Medical Imaging". Electronics. 10 (24): 3118. doi:10.3390/electronics10243118. ISSN 2079-9292.
- ^ Nagornov, Nikolay N.; Lyakhov, Pavel A.; Valueva, Maria V.; Bergerman, Maxim V. (2022). "RNS-Based FPGA Accelerators for High-Quality 3D Medical Image Wavelet Processing Using Scaled Filter Coefficients". IEEE Access. 10: 19215–19231. Bibcode:2022IEEEA..1019215N. doi:10.1109/ACCESS.2022.3151361. ISSN 2169-3536. S2CID 246895876.
- ^ Morgan, Timothy Pricket (2014-09-03). "How Microsoft Is Using FPGAs To Speed Up Bing Search". Enterprise Tech. Retrieved 2018-09-18.[permanent dead link]
- ^ "FPGA development devices for radiation-hardened space applications introduced by Microsemi". www.militaryaerospace.com. 2016-06-03. Retrieved 2018-11-02.
- ^ "CrypTech: Building Transparency into Cryptography t" (PDF). Archived (PDF) from the original on 2016-08-07.
- ^ Mann, Tobias (2023-03-08). "While Intel XPUs are delayed, here's some more FPGAs to tide you over". The Register.
- ^ Leber, Christian; Geib, Benjamin; Litz, Heiner (September 2011). High Frequency Trading Acceleration Using FPGAs. International Conference on Field Programmable Logic and Applications. IEEE. doi:10.1109/FPL.2011.64.
- ^ "The DIY MiSTer Handheld". 16 December 2024.
- ^ "DDA on FPGA – A modern Analog Computer".
- ^ "Software-defined radio and JTRS". Military Aerospace. 2004-12-01. Retrieved 2024-01-17.
- ^ Huffmire, Ted; Brotherton, Brett; Sherwood, Timothy; Kastner, Ryan; Levin, Timothy; Nguyen, Thuy D.; Irvine, Cynthia (2008). "Managing Security in FPGA-Based Embedded Systems". IEEE Design & Test of Computers. 25 (6): 590–598. Bibcode:2008IDTC...25..590H. doi:10.1109/MDT.2008.166. hdl:10945/7159. S2CID 115840.
- ^ Babaei, Armin; Schiele, Gregor; Zohner, Michael (2022-07-26). "Reconfigurable Security Architecture (RESA) Based on PUF for FPGA-Based IoT Devices". Sensors. 22 (15): 5577. Bibcode:2022Senso..22.5577B. doi:10.3390/s22155577. ISSN 1424-8220. PMC 9331300. PMID 35898079.
- ^ "EETimes on PUF: Security features for non-security experts – Intrinsic ID". Intrinsic ID. 2015-06-09. Archived from the original on 2015-07-13. Retrieved 2015-07-12.
- ^ Skorobogatov, Sergei; Woods, Christopher (2012). "Breakthrough Silicon Scanning Discovers Backdoor in Military Chip". Cryptographic Hardware and Embedded Systems – CHES 2012. Lecture Notes in Computer Science. Vol. 7428. pp. 23–40. doi:10.1007/978-3-642-33027-8_2. ISBN 978-3-642-33026-1.
- ^ Kuon, Ian; Rose, Jonathan (2006). "Measuring the gap between FPGAs and ASICs" (PDF). Proceedings of the international symposium on Field programmable gate arrays – FPGA'06. New York, NY: ACM. pp. 21–30. doi:10.1145/1117201.1117205. ISBN 1-59593-292-5. Archived from the original (PDF) on 2010-06-22. Retrieved 2017-10-25.
- ^ "AN 818: Static Update Partial Reconfiguration Tutorial: for Intel Stratix 10 GX FPGA Development Board". www.intel.com. Retrieved 2018-12-01.
- ^ "Can FPGAs dynamically modify their logic?". Electrical Engineering Stack Exchange. Retrieved 2018-12-01.
- ^ Cutress, Ian (August 27, 2019). "Xilinx Announces World Largest FPGA: Virtex Ultrascale+ VU19P with 9m Cells". AnandTech. Archived from the original on August 27, 2019.
- ^ "CPLD vs FPGA: Differences between them and which one to use? – Numato Lab Help Center". numato.com. 2017-11-29.
Further reading
- Sadrozinski, Hartmut F.-W.; Wu, Jinyuan (2010). Applications of Field-Programmable Gate Arrays in Scientific Research. Taylor & Francis. ISBN 978-1-4398-4133-4.
- Wirth, Niklaus (1995). Digital Circuit Design An Introduction Textbook. Springer. ISBN 978-3-540-58577-0.
- Mitra, Jubin (2018). "An FPGA-Based Phase Measurement System". IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 26 (1). IEEE: 133–142. Bibcode:2018ITVL...26..133M. doi:10.1109/TVLSI.2017.2758807. S2CID 4920719.
- Mencer, Oskar et al. (2020). "The history, status, and future of FPGAs". Communications of the ACM. ACM. Vol. 63, No. 10. doi:10.1145/3410669
Field-programmable gate array
History
Invention and Early Development
The concept of field-programmable gate arrays (FPGAs) emerged from earlier programmable logic devices (PLDs) developed in the late 1970s, such as programmable array logic (PAL) and field-programmable logic arrays (FPLA), which utilized PROM-based fusible links for custom logic implementation.[6] These devices, pioneered by Monolithic Memories Inc. (MMI), offered a step beyond fixed TTL logic by allowing users to program AND/OR arrays for prototyping, but they were limited to simple combinational functions without extensive interconnectivity.[7] In the 1970s, during the burgeoning very-large-scale integration (VLSI) era, engineers sought alternatives to costly custom integrated circuits (ICs), as the shift from small-scale to high-density chips increased design complexity and non-recurring engineering expenses for application-specific integrated circuits (ASICs).[8]

Ross Freeman, an engineer at Zilog, conceived the idea of a reprogrammable logic array in the mid-1970s, filing initial patent applications for a device with configurable gates and interconnects that could be field-programmed multiple times without fabrication.[9] Freeman, along with Bernard Vonderschmitt and James Barnett, founded Xilinx in February 1984 to commercialize this vision, aiming to bridge the gap between rapid prototyping and production hardware amid the VLSI boom.[10] Their breakthrough culminated in the invention of the first FPGA in 1984, patented as a configurable electrical circuit with variably interconnected logic elements controlled by memory cells.[11] Xilinx released the XC2064, the world's first commercial FPGA, in November 1985, featuring 64 configurable logic blocks (CLBs) equivalent to approximately 1,000 to 1,500 gates and fabricated in a 1.2-micron CMOS process.[12][13] This device allowed users to program logic functions and routing in the field using electrical signals, reducing dependency on mask-programmed ASICs.[12]

Early FPGAs like the XC2064 faced significant challenges, including high unit costs (often 10 times that of equivalent ASICs) and limited gate counts that restricted them to small-scale applications, making adoption slow outside niche prototyping. By the early 1990s, FPGAs began gaining traction in telecommunications for flexible signal processing and networking equipment, where reprogrammability supported evolving standards without full redesigns.[14] This initial market penetration marked a pivotal shift from custom IC dominance, enabling faster time-to-market despite ongoing cost and density limitations.[8]

Technological Evolution and Market Growth
The technological evolution of field-programmable gate arrays (FPGAs) has been marked by exponential increases in logic density, driven by semiconductor process advancements and architectural refinements. In the 1980s, early commercial FPGAs, such as Xilinx's XC2064 introduced in 1985, offered densities equivalent to thousands of logic gates, limited by 1.2 μm process technology and basic configurable logic blocks. By the late 1990s and early 2000s, densities surged into the millions of system gates; for instance, the Xilinx Virtex-E family, released in 1999, scaled up to 4 million system gates using a 0.18 μm process, while the Virtex-II series in 2001 reached up to 10 million system gates on a 150 nm node. This growth continued through the 2010s and into the 2020s, with modern FPGAs leveraging sub-10 nm processes, such as 7 nm in AMD's Versal Premium series announced in 2020, enabling densities exceeding billions of transistors and supporting complex applications like AI acceleration.[15][13]

Key innovations have paralleled these density gains, enhancing reprogrammability and performance. The widespread adoption of SRAM-based configuration in the 1990s, exemplified by Xilinx's XC4000 family launched in 1990, allowed for volatile but fast in-system reconfiguration, replacing earlier PROM and antifuse technologies and enabling iterative design prototyping. In the early 2000s, integration of specialized blocks further advanced capabilities: Xilinx's Virtex-II Pro in 2002 introduced dedicated DSP slices for efficient signal processing, while block RAM (BRAM) modules, first embedded in the original Virtex family in 1998, provided on-chip memory up to several megabits to reduce external dependencies. Entering the 2020s, 3D stacking and chiplet-based designs emerged as pivotal developments; AMD's Stacked Silicon Interconnect (SSI) technology, refined in the Virtex UltraScale+ series around 2016 and expanded in Versal adaptive compute acceleration platforms (ACAPs) by 2020, enables modular multi-die integration for higher bandwidth and scalability, akin to chiplet architectures in high-performance computing. Following the 2022 acquisition, AMD continued advancing FPGA technology, releasing the Versal AI Edge Gen 2 in 2024 on a 5 nm process, enhancing AI inference capabilities at the edge.[16][17][18][19]

Market growth has reflected these technological strides, transforming FPGAs from niche prototyping tools to essential components in diverse industries. The global FPGA market reached approximately $1 billion by 2000, fueled by adoption in telecommunications and defense for rapid ASIC emulation, where FPGAs' reprogrammability significantly lowered non-recurring engineering (NRE) costs compared to custom silicon development, which could exceed millions per project. By 2020, the market had expanded to nearly $10 billion, driven by demand in data centers, automotive, and 5G infrastructure, with projections estimating $9.9 billion for that year. As of 2025, the global FPGA market is estimated at around $11 billion, continuing growth driven by AI and adaptive computing demands.[20] A key enabler has been the reduced NRE barrier, allowing startups and enterprises to prototype complex systems on FPGAs before committing to ASIC production, thereby accelerating time-to-market.

Industry shifts in the 2010s and 2020s underscore FPGA maturation, with consolidation among leaders and democratization via open-source ecosystems.
Intel's $16.7 billion acquisition of Altera in 2015 integrated FPGA expertise into its CPU portfolio, enhancing hybrid CPU-FPGA offerings for datacenter acceleration. Similarly, AMD's $35 billion all-stock acquisition of Xilinx, completed in February 2022, combined FPGA leadership with x86 and GPU technologies to target AI and edge computing markets. Concurrently, the rise of open-source tools in the 2010s, notably the Yosys Open SYnthesis Suite launched in 2011, has lowered entry barriers by providing free alternatives to proprietary flows, supporting synthesis for various FPGA architectures and fostering innovation in academic and hobbyist communities.[21][22][23]

Fundamentals
Definition and Basic Principles
A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or designer after manufacturing to implement custom digital logic functions through an array of programmable logic blocks interconnected by programmable routing resources.[24][1] This post-fabrication configurability distinguishes FPGAs from mask-programmed devices like application-specific integrated circuits (ASICs), enabling users to adapt the hardware for specific applications without requiring new silicon fabrication.[24][1]

The core operating principle of an FPGA relies on reconfigurability via configuration memory, typically implemented using static random-access memory (SRAM) cells that store configuration bits to control the behavior of logic elements and interconnects.[1][25] These bits program multiplexers and other elements to route signals and define logic operations, allowing the FPGA to emulate diverse digital circuits from simple gates to complex systems.[1][26]

Central to FPGA logic implementation are lookup tables (LUTs), small memory arrays that realize any combinational logic function by storing precomputed output values for all possible input combinations.[26][27] For instance, a 4-input LUT operates as a 16-bit read-only memory (ROM), where the inputs serve as address lines to select the appropriate output bit, enabling the emulation of any Boolean function of four variables without dedicated gate structures.[28][29] LUTs are paired with flip-flops in configurable logic blocks to support both combinational and sequential logic, providing the foundational building blocks for user-defined designs.[1][26]

A key advanced concept in FPGA operation is partial reconfiguration, which permits dynamic modification of specific logic regions during runtime without interrupting or resetting the entire device.[30][31] This feature leverages the modular architecture to swap functionality in targeted areas, supporting applications requiring adaptability such as real-time system updates.[30]

In terms of operational flow, an FPGA initializes upon power-on by loading a configuration bitstream from external non-volatile memory into its SRAM-based configuration cells, thereby instantiating the desired hardware behavior.[1][25] The bitstream is derived from user-specified hardware descriptions authored in hardware description languages (HDLs) like Verilog or VHDL, which undergo synthesis, placement, and routing in electronic design automation (EDA) tools to generate the final configuration file.[1][32]
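The LUT-as-ROM idea can be modeled directly in Verilog (a generic sketch, not a vendor primitive; the INIT parameter and module name are illustrative):

    // Hedged sketch: a 4-input LUT modeled as a 16-entry truth table.
    // INIT holds the precomputed output for every input combination, just
    // as the configuration bitstream would program it.
    module lut4_model #(
        parameter [15:0] INIT = 16'h8000  // example: 4-input AND
    ) (
        input  wire [3:0] addr,  // the four logic inputs act as an address
        output wire       out
    );
        assign out = INIT[addr];  // read the stored truth-table bit
    endmodule

With INIT = 16'h8000 only bit 15 is set, so the output is 1 exactly when all four inputs are 1; changing INIT reprograms the cell to any other 4-input function.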
Comparison to Fixed Hardware

Field-programmable gate arrays (FPGAs) differ significantly from application-specific integrated circuits (ASICs) in development timelines and costs. FPGAs enable a shorter time-to-market, often achievable in months through reconfiguration without fabrication, in contrast to ASICs, which typically require 12 to 24 months for design, verification, and manufacturing.[33] Additionally, FPGAs incur no non-recurring engineering (NRE) costs, avoiding the multimillion-dollar expenses associated with ASIC mask sets and prototyping, making them ideal for risk-averse projects.[34] However, ASICs offer superior unit economics at high volumes due to their fixed, optimized structure, while FPGAs carry higher per-unit costs from programmable overhead.[35]

In terms of performance and efficiency, ASICs generally outperform FPGAs by a factor of about 2 to 4 times in clock frequency, stemming from the routing and logic overhead in programmable fabrics that reduces clock speeds and increases latency.[36] This gap arises because FPGAs must accommodate general interconnects, whereas ASICs employ direct, customized wiring for specific functions. Power consumption follows a similar trend, with ASICs achieving higher efficiency through tailored transistors and minimal leakage, though the disparity has narrowed in modern process nodes (e.g., 7 nm and below) as FPGAs incorporate advanced FinFETs and specialized blocks to approach ASIC-like density. As of 2025, continued advancements in FPGA technology, including sub-5 nm process nodes and optimized architectures, have further narrowed this gap in many applications.[35][37]

Compared to microprocessors and microcontrollers, FPGAs excel in parallel hardware acceleration for compute-intensive tasks such as digital signal processing (DSP), where sequential instruction execution on CPUs limits throughput. For instance, FPGAs can implement custom arithmetic logic units (ALUs) tailored to specific algorithms, processing multiple data streams concurrently without the overhead of general-purpose instruction sets, achieving orders-of-magnitude speedups over software implementations on microcontrollers.[38] This parallelism suits applications requiring real-time filtering or transforms, offloading the host processor to enhance overall system responsiveness.[39]

FPGAs also provide advantages over graphics processing units (GPUs) in scenarios demanding low-latency, fixed-function acceleration, such as 5G baseband processing. In low-density parity-check (LDPC) decoding for 5G, FPGA implementations deliver latencies as low as 61.65 μs, outperforming GPU equivalents at 87 μs, due to deterministic hardware pipelines and fine-grained control over data flow.[40] However, FPGAs are less inherently suited for floating-point-intensive workloads like certain AI inferences without embedded hard IP blocks for multipliers and accumulators, where GPUs leverage massive parallel cores optimized for such operations.[41]

Key decision factors for selecting FPGAs over fixed hardware revolve around production volume and flexibility needs.
High-volume manufacturing favors ASICs for cost amortization, while low-volume runs, prototyping, or evolving standards benefit from FPGAs' reprogrammability and zero NRE.[42] Hybrid solutions, such as system-on-chip (SoC) FPGAs like Xilinx's Zynq UltraScale+ MPSoC, integrate hard processor systems with programmable logic to blend the parallelism of FPGAs with the software ecosystem of microprocessors, offering a balanced alternative for embedded applications.[43]
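The volume trade-off can be made explicit with a simple break-even estimate (illustrative symbols, not drawn from the cited sources). With an ASIC non-recurring cost NRE and per-unit costs c_FPGA > c_ASIC, the ASIC becomes cheaper beyond

    V^* = \frac{\mathrm{NRE}_{\mathrm{ASIC}}}{c_{\mathrm{FPGA}} - c_{\mathrm{ASIC}}}

units: below V^* the FPGA's absence of NRE dominates total cost, matching the low-volume guidance above; above it, the ASIC's lower unit cost amortizes the up-front investment.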
Architecture

Logic and Programmable Blocks
The core of an FPGA's reconfigurable logic fabric consists of configurable logic blocks (CLBs), which serve as the fundamental units for implementing combinational and sequential digital circuits. Each CLB typically integrates multiple lookup tables (LUTs) for function generation, flip-flops for storage, and internal multiplexers for signal routing within the block, enabling flexible mapping of user-defined logic.[44][45] In architectures like those from AMD (formerly Xilinx), a CLB is subdivided into slices, with each slice containing four 6-input LUTs and eight flip-flops, allowing the block to support a variety of modes including combinational logic via LUTs, sequential logic through flip-flop registration, and arithmetic operations using dedicated carry chains. A 6-input LUT can realize any Boolean function of six inputs by storing its 64-entry truth table in memory, while the flip-flops provide synchronous storage with options for clock enable and reset. Internal multiplexers, such as 7-input and 8-input variants, facilitate mode selection and output combining within the slice.[44]

In contrast, Intel's FPGAs employ adaptive logic modules (ALMs) as the basic elements, grouped into logic array blocks (LABs); each ALM features an 8-input fracturable LUT paired with four registers and two dedicated adders, capable of implementing select 7-input functions, all 6-input functions, or two independent smaller LUTs (e.g., 4-input each) to optimize density.[45]

Function generation in these blocks relies on LUTs as versatile truth table implementations, where the LUT's SRAM configuration defines the output for each input combination, enabling rapid synthesis of arbitrary logic without custom wiring. For arithmetic functions, dedicated carry logic enhances efficiency; in AMD designs, a 4-bit ripple-carry chain per slice uses multiplexers (MUXCY) and exclusive-OR gates to propagate carries, with chains extending across multiple CLBs for wider operations like adders or counters. Intel ALMs similarly incorporate embedded adders within the fracturable LUT structure to support fast arithmetic without additional resources.[44][45]

Modern FPGAs achieve high logic density through scaling these blocks, with devices featuring over 1 million LUTs or equivalent elements; for instance, AMD's Versal Premium Gen 2 series offers up to 3.27 million system logic cells, while Intel's Stratix 10 reaches 933,120 ALMs. Equivalent gate count is a rough, vendor-specific metric; a 6-input LUT is often estimated at 20-30 equivalent gates, so 1 million LUTs approximate 20-30 million gates.[46][47]
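As a sketch of how such arithmetic reaches the carry chains (hypothetical module name; the mapping is tool-dependent), a plain behavioral adder is usually all that is required:

    // Hedged sketch: a 16-bit adder written behaviorally. FPGA synthesis
    // tools typically infer the dedicated carry-chain logic described
    // above, rather than building the adder from LUT networks alone.
    module carry_adder16 (
        input  wire [15:0] a, b,
        input  wire        cin,
        output wire [15:0] sum,
        output wire        cout
    );
        assign {cout, sum} = a + b + cin;  // maps to LUTs plus carry chain
    endmodule

The designer writes ordinary addition; the carry chain is an implementation detail selected during technology mapping.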
Interconnect and Routing Resources

The interconnect and routing resources in a field-programmable gate array (FPGA) form a programmable wiring network that connects configurable logic blocks, enabling flexible signal paths across the device. This network typically consists of horizontal and vertical routing channels surrounding an array of logic blocks, with wires segmented into various lengths to balance routability, area, and delay. Short segments facilitate local connections, while longer segments support global routing with reduced switch overhead. In island-style architectures, common in commercial FPGAs, this structure occupies 80-90% of the total chip area, underscoring its dominance in resource allocation.[48]

The routing hierarchy relies on connection blocks and switch boxes to interface logic blocks with the channel wires. Connection blocks provide access from logic block pins to the routing channels, with flexibility Fc defined as the fraction of channel tracks accessible per pin (e.g., Fc = 0.5 allows connection to half the tracks). Switch boxes, located at channel intersections, enable turns and continuations between horizontal and vertical wires, characterized by flexibility Fs, the number of outgoing connections per incoming wire (e.g., Fs = 3). Segmented wires in the channels include short (spanning one logic block), medium (two to four blocks), and long lines (spanning many blocks for low-skew global signals), allowing efficient path formation while minimizing switch usage for distant connections.[48][49]

Switch matrices within these blocks are implemented using multiplexers controlled by configuration bits, such as 10:1 or 20:1 multiplexers at intersections to select signal paths. Pass-transistor switches, often NMOS-based with transmission gates, offer compact area but suffer from resistance degradation over multiple hops, impacting signal integrity. Buffer-based alternatives, employing tri-state inverters or full CMOS buffers, maintain drive strength for longer wires but increase area and power; modern FPGAs blend both, with buffers driving longer segments to optimize performance.[48][50][51]

Routing challenges arise from limited resources, particularly congestion where multiple nets compete for tracks, potentially leading to unroutable designs. Place-and-route tools address this through iterative algorithms like rip-up and retry, where existing routes are torn up in congested areas and rerouted with penalty costs on overuse to promote balanced channel utilization. Channel width, defined as the number of tracks per channel (typically 100-200 in modern devices, though varying by architecture), must be sufficient to accommodate all nets without overflow; insufficient width increases critical path delays by forcing detours.[49][52][53]

Performance is significantly influenced by routing, with delays often comprising 50-70% of the critical path due to wire capacitance and resistance, far exceeding logic block contributions. This dominance stems from the programmable nature of interconnects, which introduce extra parasitics compared to fixed ASICs. Wire delay can be approximated using the Elmore model as roughly (1/2) r c L^2 for a wire of length L, where r and c are the resistance and capacitance per unit length, highlighting the quadratic growth of delay with unbuffered path length and the need for segmentation and buffering to mitigate long-route penalties.[51][54][55]
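A brief derivation of that quadratic scaling (a standard textbook argument, not taken from the cited sources): splitting a uniform wire of length L into n equal RC segments and summing each segment's resistance times its total downstream capacitance gives

    t_d = \sum_{i=1}^{n} \frac{rL}{n} \cdot (n-i+1)\,\frac{cL}{n}
        = \frac{r c L^2}{n^2} \cdot \frac{n(n+1)}{2}
        \;\longrightarrow\; \tfrac{1}{2}\, r c L^2 \quad (n \to \infty),

which is why long unsegmented routes are penalized so heavily, and why inserting buffers along segmented wires restores roughly linear delay growth.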
Input/Output and Clocking Systems
Input/output blocks (IOBs) in FPGAs serve as programmable interfaces that manage bidirectional data flow between external pins and the internal logic fabric, supporting a wide range of electrical standards to ensure compatibility with diverse systems.[56] These blocks typically accommodate differential signaling protocols such as LVDS for high-speed data transmission, as well as PCIe interfaces up to Generation 5, enabling data rates of 32 GT/s per lane in modern implementations.[57] Additionally, IOBs offer configurable options including weak pull-up or pull-down resistors to stabilize unconnected inputs and programmable slew-rate control on outputs to optimize signal integrity and reduce electromagnetic interference.[58][59] For high-speed applications, integrated serializer/deserializer (SerDes) transceivers associated with the I/O subsystem operate at rates up to 28 Gbps, facilitating protocols like 100G Ethernet.[60]
Clocking resources in FPGAs include dedicated global clock networks designed to distribute timing signals across the device with minimal variation, typically supporting 32 or more dedicated clock lines to handle multiple independent domains.[61] These networks achieve low skew, often below 100 ps peak-to-peak, ensuring synchronized operation of logic elements over large die areas.[62] Phase-locked loops (PLLs) and digital clock managers (DCMs), the latter evolved into mixed-mode clock managers (MMCMs) in advanced architectures, provide frequency synthesis, for example multiplying a 100 MHz input clock to 500 MHz through programmable multiplication and division factors while allowing phase adjustments for alignment.[63][64]
Clock management systems employ dedicated routing paths to propagate clocks with low jitter, typically under 1 ps RMS for critical paths, minimizing timing uncertainties in high-performance designs.[65] Dynamic phase shifting within PLLs or MMCMs enables real-time adjustments to clock edges, which is essential for interfacing with DDR memory, where data strobe (DQS) signals must align precisely with data (DQ) lines to capture information correctly.[66]
In integration examples, multi-gigabit transceivers (MGTs) incorporate embedded equalization techniques, such as adaptive continuous-time linear equalizers, to compensate for signal degradation over long traces or backplanes at multi-Gbps speeds.[67] Modern FPGAs often provide over 1,000 user I/O pins, allowing extensive external connectivity in applications requiring high pin counts.[68]
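The frequency-synthesis relation behind the clock managers described above is f_out = f_in × M / (D × O). A minimal sketch, assuming AMD's MMCME2_BASE primitive as documented for 7-series devices (port and parameter names follow that vendor library; legal VCO ranges vary by device and speed grade), multiplying a 100 MHz reference to 500 MHz:

// Sketch only: 100 MHz x 10 / (1 x 2) = 500 MHz, with the VCO at 1000 MHz.
// Assumes the MMCME2_BASE primitive from AMD's 7-series libraries.
module clk_synth (
    input  wire clk100,   // 100 MHz reference clock
    input  wire rst,
    output wire clk500,   // synthesized 500 MHz clock
    output wire locked
);
    wire clkfb;           // internal feedback loop

    MMCME2_BASE #(
        .CLKIN1_PERIOD   (10.0),  // 100 MHz input (period in ns)
        .CLKFBOUT_MULT_F (10.0),  // M = 10 -> 1000 MHz VCO
        .DIVCLK_DIVIDE   (1),     // D = 1
        .CLKOUT0_DIVIDE_F(2.0)    // O = 2 -> 500 MHz output
    ) mmcm_i (
        .CLKIN1  (clk100),
        .CLKFBIN (clkfb),
        .CLKFBOUT(clkfb),
        .CLKOUT0 (clk500),
        .RST     (rst),
        .PWRDWN  (1'b0),
        .LOCKED  (locked)
    );
endmodule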
Embedded Hard IP Blocks
Embedded hard IP blocks in field-programmable gate arrays (FPGAs) are fixed-function hardware macros fabricated directly into the silicon die to accelerate common operations with better performance, power efficiency, and resource utilization than equivalent functionality implemented in programmable logic. These blocks include dedicated memory arrays, digital signal processing units, and interface controllers, enabling FPGAs to handle data-intensive tasks like buffering, arithmetic computation, and high-speed communication without consuming configurable resources. By integrating these specialized circuits, FPGA designers can achieve higher throughput in applications such as signal processing, networking, and embedded systems, while the surrounding programmable fabric provides customization around the fixed elements.
Block RAM (BRAM) consists of dual-port static random-access memory (SRAM) arrays optimized for on-chip data storage and buffering. Each BRAM block typically provides 36 Kb of capacity, configurable as a single 36 Kb unit or two independent 18 Kb units, with two independent read/write ports supporting simultaneous access from different clock domains. These blocks support true dual-port operation, where both ports can perform read or write actions concurrently, and simple dual-port modes for asymmetric read/write configurations; they are also programmable as first-in-first-out (FIFO) buffers with built-in FIFO logic for queue management in data pipelines. In high-end devices, such as AMD's Virtex UltraScale+ FPGAs, the aggregate BRAM capacity can reach approximately 75 Mb, enabling efficient handling of large datasets in applications like image processing or machine learning inference without external memory access.[69][70][71]
Digital signal processing (DSP) slices are dedicated arithmetic units designed for high-speed multiply-accumulate (MAC) operations and other numerical computations prevalent in filtering, convolution, and transform algorithms. Each DSP slice features a 25x18-bit two's complement multiplier, a 48-bit post-adder/accumulator, an optional 18-bit pre-adder for input conditioning, and configurable pipeline registers supporting clock rates up to 550 MHz. These elements enable efficient implementation of MAC functions: the pre-adder sums inputs before multiplication to halve the slice count in symmetric filters, and the pipeline stages minimize the critical path while maximizing throughput. Overall computational capacity can be estimated as operations per second = clock rate × number of slices × effective parallelism per slice; for instance, in AMD's Kintex UltraScale FPGAs with over 2,000 slices operating at 500 MHz and supporting two multiplies per cycle, this yields a peak on the order of 2×10^12 fixed-point operations per second (2 tera-OPS) for compute-intensive workloads.[72][73]
Beyond memory and arithmetic blocks, FPGAs incorporate other specialized hard IP for interfacing and processing, such as Ethernet media access controllers (MACs), PCI Express (PCIe) endpoints, and embedded processor cores in system-on-chip (SoC) variants. Ethernet MACs provide hardened support for standards from 10/100/1000 Mbps up to 100 Gbps, including frame processing and checksum offload to reduce logic overhead in networking applications; for example, AMD's Zynq UltraScale+ devices integrate 100G Ethernet blocks compliant with IEEE 802.3. PCIe endpoints handle high-bandwidth data transfer with integrated PHY, data link, and transaction layers, supporting Gen3 (8 GT/s) or Gen4 (16 GT/s) rates, as seen in Intel's Stratix 10 FPGAs with up to 16 lanes per block. In SoC-FPGAs, hard processor systems (HPS) embed ARM Cortex cores for software-defined control; AMD's Zynq-7000 series features dual Cortex-A9 cores at up to 1 GHz with NEON SIMD extensions, while Intel's Stratix 10 SX includes a quad-core Cortex-A53 at 1.5 GHz for hybrid CPU-FPGA acceleration.[43]
The primary trade-off of embedded hard IP blocks is their fixed architecture, which delivers up to 10 times higher logic density and improved power efficiency compared to soft IP synthesized from configurable logic, at the cost of reduced reconfigurability for non-standard functions. For instance, in AMD's UltraScale architecture, hard DSP slices achieve 2-3x better performance per watt than equivalent soft multipliers due to optimized silicon layout, while in Intel's Stratix 10, the integrated PCIe hard IP reduces resource utilization by over 50% versus soft cores, though customization is limited to parameterizable features such as lane width. This balance makes hard blocks essential for performance-critical paths in production designs, with the programmable logic providing adaptability around them.[74]
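As an illustration of how a design targets the DSP slices described above from RTL, the following Verilog sketch expresses a pipelined multiply-accumulate with a pre-adder, the pattern synthesis tools commonly map onto a single slice; module and signal names are hypothetical, and the 18-bit default width is chosen only to suit the multiplier dimensions mentioned above:

// Sketch: pre-adder -> multiplier -> accumulator, mirroring a DSP slice.
// Each pipeline stage corresponds to a register stage inside the slice.
module mac_preadd #(
    parameter W = 18
)(
    input                    clk,
    input                    rst,   // synchronous clear
    input  signed [W-1:0]    a,
    input  signed [W-1:0]    b,     // pre-added with a (e.g., symmetric filter taps)
    input  signed [W-1:0]    c,     // coefficient / multiplicand
    output reg signed [47:0] acc    // 48 bits, matching the post-adder width
);
    reg signed [W:0]   preadd;      // stage 1: a + b (one extra bit)
    reg signed [2*W:0] product;     // stage 2: (a + b) * c

    always @(posedge clk) begin
        if (rst) begin
            preadd  <= 0;
            product <= 0;
            acc     <= 0;
        end else begin
            preadd  <= a + b;        // maps to the slice pre-adder
            product <= preadd * c;   // maps to the dedicated multiplier
            acc     <= acc + product;// maps to the 48-bit post-adder/accumulator
        end
    end
endmodule

Because the pre-adder lets one multiplier serve two symmetric coefficients, a symmetric FIR filter written this way needs roughly half as many slices as a naive per-tap implementation.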
Advanced Architectural Features
Modern field-programmable gate arrays (FPGAs) have evolved to incorporate system-on-chip (SoC) integrations that combine programmable logic fabric with embedded processors and peripherals, enabling heterogeneous computing platforms capable of handling diverse workloads efficiently. For instance, AMD's Zynq UltraScale+ MPSoC family integrates a quad-core ARM Cortex-A53 application processing unit, a dual-core ARM Cortex-R5F real-time processing unit, and a Mali-400 MP2 graphics processing unit (GPU) alongside the FPGA fabric, facilitating seamless coordination between software-defined processing and hardware acceleration for applications like embedded vision and automotive systems.[75] These SoC-FPGAs support heterogeneous architectures in which CPUs, GPUs, and FPGAs operate in tandem, optimizing power efficiency and performance by assigning tasks to the most suitable compute element, as seen in platforms that leverage FPGA reconfigurability for big data analytics and signal processing.[76][77]
Advancements in three-dimensional (3D) architectures further enhance FPGA capabilities by stacking silicon dies to increase density and reduce interconnect delays. Through-silicon vias (TSVs) serve as vertical interconnects in these stacked structures, enabling direct inter-layer communication that minimizes signal propagation latency compared to traditional two-dimensional routing.[78] AMD's Stacked Silicon Interconnect (SSI) technology, for example, allows multiple FPGA dies to be integrated with lower latency and power consumption, supporting high-bandwidth memory (HBM) stacks in devices like the Virtex UltraScale+ series.[79] Monolithic 3D integrated circuits (ICs) and hybrid stacking approaches, such as those explored in research prototypes, can achieve up to 50% latency reductions in critical paths by shortening wire lengths, while also improving overall throughput for compute-intensive tasks.[80] Intel's Stratix 10 FPGAs, meanwhile, integrate support for 3D XPoint memory via high-speed interfaces like PCIe 4.0, allowing FPGAs to leverage persistent, low-latency storage in accelerated systems without full die stacking.[81]
Emerging trends in FPGA design emphasize chiplet-based architectures and adaptive computing tailored for artificial intelligence (AI). AMD's Versal AI Edge series, introduced in 2023, employs modular tiles including AI Engine tiles for scalar, vector, and tensor processing, enabling dynamic reconfiguration to optimize inference workloads in edge devices like autonomous vehicles and industrial automation.[82] These chiplet designs break monolithic structures into specialized interconnect, compute, and I/O tiles, improving yield, scalability, and performance; for example, next-generation Versal FPGAs like the VP1902 achieve up to 18.5 million system logic cells, more than doubling the density of prior monolithic implementations. In adaptive AI computing, FPGA fabrics incorporate dynamic tensor units, such as systolic-array-based "Tensor Slices", which replace portions of programmable logic to accelerate deep learning operations like convolutions, offering flexibility for evolving neural network architectures without full redesigns.
As of 2024, AMD's Versal Gen 2 series, including Premium Gen 2 devices with up to 3.27 million system logic cells and support for PCIe 6.0 and CXL 3.1, further advances chiplet integration and performance.[83][84][85]
Looking toward future directions, FPGA architectures are exploring optical interconnects and quantum-inspired reconfigurability to address bandwidth and computational limits in exascale systems. Photonic integration promises to replace electrical interconnects with light-based links, reducing power dissipation and enabling terabit-per-second data rates for AI and high-performance computing, as demonstrated in prototypes combining silicon photonics with FPGA controllers.[86] Quantum-inspired approaches, meanwhile, leverage FPGA reconfigurability to emulate quantum hardware behaviors, such as dynamic partial reconfiguration for simulating qubit operations or error correction, paving the way for hybrid classical-quantum accelerators in scalable platforms. These innovations, still in early research phases, aim to extend FPGA versatility into domains requiring ultra-low latency and probabilistic computing paradigms.[87]
Configuration and Programming
Configuration Memory Technologies
The configuration memory in field-programmable gate arrays (FPGAs) stores the bitstream that programs the device's logic, routing, and other resources, determining its functionality after fabrication. Different memory technologies offer trade-offs in volatility, reconfiguration speed, power efficiency, endurance, and environmental resilience, influencing their adoption in applications ranging from high-performance computing to space systems. SRAM-based memories dominate due to their reprogrammability, while non-volatile options like antifuse and Flash prioritize reliability and low power, and emerging types like FRAM and MRAM address limitations in endurance and harsh conditions.[88]
SRAM-based configuration memory is volatile and widely used in over 60% of FPGAs as of 2024, particularly in high-density devices from AMD (Xilinx) and Intel. Upon power-off or reset, the memory loses its contents, requiring reloading of the bitstream from external non-volatile storage such as Flash or EEPROM during initialization, which typically takes milliseconds (e.g., over 200 ms for a Xilinx Spartan-3 XC3S200). This technology enables rapid in-system reconfiguration in tens of milliseconds but consumes more power due to the need for external boot devices, and it clears automatically on power-on reset, making it suitable for prototyping and applications tolerant of startup delays.[89][88][90]
Antifuse-based memory is non-volatile and one-time programmable (OTP), forming permanent connections via metal-oxide breakdown during programming, which provides inherent design security and eliminates the need for external configuration storage. Employed in Microchip's (formerly Actel) ProASIC and RTG4 series for radiation-hardened space applications, it achieves near-instant power-up times of about 60 µs and offers high reliability, with no reconfiguration capability post-programming. This technology excels in fixed-function, high-security environments like aerospace but lacks flexibility for iterative designs due to its OTP nature.[88][91][92]
Flash and EEPROM-based memories are non-volatile with multi-time programmability, supporting 100 to 10,000 erase/write cycles depending on the implementation, and integrate configuration storage directly on-chip for simplified designs and low power. Lattice Semiconductor's iCE40 and MachXO2 families use embedded Flash for low-power embedded systems, enabling reconfiguration in microseconds (around 50 µs) and internal booting without external memory. Microchip's ProASIC3 series leverages Flash for space-grade FPGAs, consuming roughly one-third the power of SRAM equivalents while providing reprogrammability and radiation tolerance of 25 to 30 krad(Si). These are favored in battery-powered or size-constrained applications requiring occasional updates.[93][91][94]
Emerging non-volatile technologies like FRAM (ferroelectric RAM) and MRAM (magnetoresistive RAM) aim to combine instant-on capability, high endurance, and robustness for demanding environments. FRAM offers low-power operation (similar to SRAM but non-volatile) and high radiation hardness, with densities up to 2 Mb suitable for booting space-grade FPGAs and processors, making it attractive for low-earth-orbit missions where SEU immunity and minimal power draw are critical.
MRAM, using magnetic tunnel junctions, provides superior endurance (over 10^15 cycles in some variants), faster configuration (e.g., x8 widths at 160 MHz), and resilience to extreme temperatures and radiation; it is used for configuration in Lattice's Certus-NX and Avant FPGAs through a partnership with Everspin. These technologies trade higher initial cost for overcoming Flash's endurance limits and SRAM's volatility, targeting edge AI, automotive, and aerospace sectors.[95][96][97][98]
Programming Process and Tools
The programming process for an FPGA begins with synthesis of a hardware description language (HDL) design into a gate-level netlist, followed by place-and-route implementation to map the logic onto the device's resources, culminating in the generation of a bitstream file that encodes the configuration data.[99][100] The bitstream is then downloaded to the FPGA, typically via interfaces such as JTAG for debugging and initial programming or SPI for high-speed configuration from external flash memory. JTAG download speeds can reach 25 Mbps depending on the cable and device, while SPI modes, particularly quad-SPI, enable rates up to approximately 100 MB/s in modern devices like Intel Stratix 10 FPGAs.[101][102][103]
Partial reconfiguration allows dynamic updates to specific regions of the FPGA fabric without halting the entire device, enabling efficient resource reuse in applications requiring adaptability. Reconfiguration overhead scales with the modified area: swapping roughly 10% of the fabric takes on the order of milliseconds to seconds depending on the partial bitstream size and interface speed; for example, a partial bitstream of a few megabytes loaded at 100 MB/s completes in tens of milliseconds.[104][105] The process involves loading partial bitstreams through the internal configuration access port (ICAP) or external interfaces, with tools managing region isolation to prevent glitches during updates.[30]
Vendor-specific tools streamline this workflow, integrating synthesis, implementation, simulation, and bitstream generation. AMD's Vivado Design Suite performs HDL synthesis to produce optimized netlists, carries out placement and routing for timing closure, and supports behavioral, post-synthesis, and post-implementation simulation to verify functionality before programming.[99][106] Similarly, Intel's Quartus Prime software compiles designs through synthesis and fitting stages, generating bitstreams while integrating with ModelSim for comprehensive simulation, including waveform viewing and testbench modification during the design flow.[107][108]
The open-source ecosystem has grown significantly since 2015, providing alternatives to proprietary tools for greater accessibility and customization. Tools like nextpnr serve as a timing-driven place-and-route engine, supporting devices such as Lattice iCE40, ECP5, and experimental architectures when paired with Yosys for synthesis, enabling full bitstream generation without vendor lock-in.[109] The SymbiFlow project, initiated around 2018 as part of broader efforts to create a fully open toolchain, extends this by targeting commercial FPGAs like Xilinx 7-series through data-driven flows for synthesis, placement, and routing.[110][111][112]
FPGA boot modes determine how the bitstream is loaded into SRAM-based configuration memory at power-up. Master serial mode (mode pins 000) has the FPGA generate the configuration clock (CCLK) and read data from an external PROM at 1-bit width, while slave serial mode (111) relies on an external clock source and allows daisy-chaining of multiple devices. Parallel flash mode, or master BPI (010), interfaces with NOR flash at 8- or 16-bit widths for faster loading, with the FPGA driving addresses and reading data synchronously or asynchronously. In processor-driven modes such as slave SelectMAP (110), common in SoC FPGAs with embedded ARM cores, an external processor supplies data over an 8-, 16-, or 32-bit bus, allowing software-controlled configuration and integration with system boot processes.
Design Entry and Synthesis Methods
Design entry for field-programmable gate arrays (FPGAs) primarily involves hardware description languages (HDLs) such as Verilog, SystemVerilog, and VHDL, which allow designers to specify behavior at the register-transfer level (RTL) or behavioral level.[113][114] These languages enable the description of digital circuits through structural, dataflow, or behavioral constructs, facilitating simulation and synthesis into FPGA fabric.[115]
High-level synthesis (HLS) provides an alternative entry method by converting higher-level languages like C, C++, or Python into RTL code suitable for FPGAs. Tools such as Vitis HLS from AMD automate this process, transforming algorithmic descriptions, such as loops, into pipelined hardware accelerators to improve throughput.[116] For instance, pragmas like #pragma HLS PIPELINE can schedule loop iterations to achieve an initiation interval of 1 cycle, enabling concurrent execution on FPGA resources.[116]
The synthesis process begins with logic optimization, which applies transformations such as constant propagation to eliminate redundant logic by substituting constant values through the design, and retiming to reposition registers for better timing balance.[117][118] Following optimization, technology mapping decomposes the logic into lookup tables (LUTs) and flip-flops, inferring sequential elements from HDL constructs like always blocks in Verilog.[119] This step targets the FPGA's programmable logic blocks, ensuring the netlist aligns with device architecture.[120]
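For example, a clocked always block infers flip-flops while a continuous assignment maps to LUTs; a minimal, hypothetical Verilog fragment illustrating both:

// Hypothetical illustration of technology mapping from HDL constructs.
module map_demo (
    input            clk,
    input      [3:0] a, b,
    output reg [3:0] q
);
    wire [3:0] f = (a & b) ^ (a | b);  // combinational expression -> LUTs

    always @(posedge clk)              // clocked always block -> flip-flops
        q <= f;
endmodule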
Optimization techniques during synthesis balance area and speed trade-offs, often through pipelining, which inserts registers to divide critical paths and potentially double the achievable clock frequency at the cost of increased resource usage.[121] Formal verification, including equivalence checking, confirms that the synthesized netlist behaves identically to the RTL source, detecting discrepancies from optimization or mapping errors.[122] These methods ensure functional correctness without exhaustive simulation.[123]
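A sketch of that pipelining trade-off in Verilog (module and signal names hypothetical): the second version adds a register stage that splits the multiply-add path, raising the achievable clock frequency at the cost of one extra cycle of latency and additional flip-flops.

// Unpipelined: one long combinational path from the inputs to y.
module muladd_flat (input clk, input [15:0] a, b, c, output reg [31:0] y);
    always @(posedge clk)
        y <= a * b + c;            // multiply and add in a single cycle
endmodule

// Pipelined: register p splits the critical path into two shorter stages.
module muladd_pipe (input clk, input [15:0] a, b, c, output reg [31:0] y);
    reg [31:0] p;                  // stage-1 product register
    reg [15:0] c_d;                // delay c to stay aligned with p
    always @(posedge clk) begin
        p   <= a * b;              // stage 1: multiply
        c_d <= c;
        y   <= p + c_d;            // stage 2: add
    end
endmodule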
Soft cores, such as the MicroBlaze RISC processor from AMD, are configurable intellectual property (IP) blocks implemented entirely in FPGA fabric using synthesis tools.[124] Resource utilization for these cores varies by configuration; for example, a basic MicroBlaze microcontroller variant on a Kintex UltraScale+ device consumes approximately 2,228 LUTs and achieves 399 MHz, while an application-optimized version uses 8,020 LUTs at 281 MHz.[125] Utilization is typically calculated as the percentage of available resources employed, given by the formula:
utilization (%) = (resources used ÷ resources available) × 100
This metric helps assess fit within the target FPGA.[125]
