Network processor
from Wikipedia
Image: Intel FWIXP422BB network processor.

A network processor is an integrated circuit which has a feature set specifically targeted at the networking application domain.

Network processors are typically software-programmable devices with generic characteristics similar to the general-purpose central processing units commonly used in many different types of equipment and products.

History of development

In modern telecommunications networks, information (voice, video, data) is transferred as packet data (termed packet switching) which is in contrast to older telecommunications networks that carried information as analog signals such as in the public switched telephone network (PSTN) or analog TV/Radio networks. The processing of these packets has resulted in the creation of integrated circuits (IC) that are optimised to deal with this form of packet data. Network processors have specific features or architectures that are provided to enhance and optimise packet processing within these networks.

Network processors have evolved into ICs with specific functions. This evolution has resulted in more complex and more flexible ICs being created. The newer circuits are programmable, allowing a single hardware IC design to undertake a number of different functions when the appropriate software is installed.

Network processors are used in the manufacture of many different types of network equipment, such as routers, switches, firewalls, session border controllers, and intrusion detection and prevention devices.

Reconfigurable Match-Tables

Reconfigurable Match-Tables[1][2] were introduced in 2013 to allow switches to operate at high speeds while remaining flexible with respect to the network protocols running on them and the processing applied to them. P4[3] is used to program the chips. Barefoot Networks was founded around these processors and was acquired by Intel in 2019.

RMT pipeline description

An RMT pipeline relies on three main stages: the programmable parser,[2] the Match-Action tables, and the programmable deparser. The parser reads the packet in chunks and processes them to determine which protocols the packet carries (Ethernet, VLAN, IPv4, ...) and extracts certain fields into the Packet Header Vector (PHV). Certain fields in the PHV may be reserved for special uses, such as flags for present headers or the total packet length. Both the set of recognised protocols and the fields to extract are typically programmable. The Match-Action tables are a series of units that read an input PHV and match certain fields in it using a crossbar and CAM memory; the result is a wide instruction that operates on one or more fields of the PHV, together with data to support that instruction. The output PHV is then sent to the next Match-Action stage or to the deparser. The deparser takes in the PHV as well as the original packet and its metadata (to fill in bits that were not extracted into the PHV) and outputs the modified packet as chunks. Like the parser, the deparser is typically programmable and may reuse some of the parser's configuration.
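
To make the flow concrete, the following C sketch models a single Match-Action stage operating on a simplified Packet Header Vector. The field layout, table contents, and action set are illustrative assumptions, not the layout of any particular chip; a real RMT stage would perform the lookup and rewrite in hardware.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified Packet Header Vector: a real PHV holds many more fields. */
typedef struct {
    uint8_t  valid_headers;   /* bitmap of headers found by the parser */
    uint32_t ipv4_dst;        /* extracted destination address         */
    uint8_t  ipv4_ttl;        /* extracted TTL                         */
    uint16_t egress_port;     /* filled in by a match-action stage     */
} phv_t;

#define HDR_IPV4 0x01

/* One match-action entry: a key to compare (here, exact match on the
 * destination address) and an action with its data (the output port). */
typedef struct {
    uint32_t key_ipv4_dst;
    uint16_t action_port;
} ma_entry_t;

/* A tiny exact-match table standing in for the CAM of one stage. */
static const ma_entry_t table[] = {
    { 0x0A000001u /* 10.0.0.1 */, 1 },
    { 0x0A000002u /* 10.0.0.2 */, 2 },
};

/* One match-action stage: look the PHV up, then apply the action
 * (set the egress port, decrement TTL). */
static void match_action_stage(phv_t *phv) {
    if (!(phv->valid_headers & HDR_IPV4))
        return;                       /* header absent: stage is a no-op */
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].key_ipv4_dst == phv->ipv4_dst) {
            phv->egress_port = table[i].action_port;
            phv->ipv4_ttl--;          /* action data drives the rewrite */
            return;
        }
    }
    phv->egress_port = 0;             /* miss: send to a default port */
}

int main(void) {
    /* The parser would fill this PHV from raw packet bytes. */
    phv_t phv = { HDR_IPV4, 0x0A000002u, 64, 0 };
    match_action_stage(&phv);
    printf("egress port %u, ttl %u\n",
           (unsigned)phv.egress_port, (unsigned)phv.ipv4_ttl);
    return 0;
}
```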

FlexNIC[4] attempts to apply this model to network interface controllers, allowing servers to send and receive packets at high speed while maintaining protocol flexibility and without increasing CPU overhead.

Generic functions

In the generic role as a packet processor, a number of optimised features or functions are typically present in a network processor, which include:

  • Pattern matching – the ability to find specific patterns of bits or bytes within packets in a packet stream.
  • Key lookup – the ability to quickly undertake a database lookup using a key (typically an address in a packet) to find a result, typically routing information (a lookup sketch follows this list).
  • Computation
  • Data bitfield manipulation – the ability to change certain data fields contained in the packet as it is being processed.
  • Queue management – as packets are received, processed and scheduled to be sent onwards, they are stored in queues.
  • Control processing – the micro operations of processing a packet are controlled at a macro level which involves communication and orchestration with other nodes in a system.
  • Quick allocation and re-circulation of packet buffers.
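
As an illustration of the key-lookup function, the C sketch below performs a longest-prefix match of an IPv4 destination address against a small routing table. The routes and the linear search are illustrative only; real network processors perform this lookup in dedicated search engines using tries, hash units, or TCAMs.

```c
#include <stdint.h>
#include <stdio.h>

/* One route: network prefix, prefix length, and the lookup result
 * (here just a next-hop identifier). */
typedef struct {
    uint32_t prefix;
    uint8_t  len;
    int      next_hop;
} route_t;

static const route_t routes[] = {
    { 0x0A000000u,  8, 1 },  /* 10.0.0.0/8  -> next hop 1 */
    { 0x0A010000u, 16, 2 },  /* 10.1.0.0/16 -> next hop 2 */
    { 0x00000000u,  0, 0 },  /* 0.0.0.0/0 default route   */
};

/* Longest-prefix match: return the next hop of the most specific
 * route covering dst. */
static int lpm_lookup(uint32_t dst) {
    int best_hop = -1, best_len = -1;
    for (size_t i = 0; i < sizeof routes / sizeof routes[0]; i++) {
        uint32_t mask = routes[i].len ? ~0u << (32 - routes[i].len) : 0;
        if ((dst & mask) == routes[i].prefix && routes[i].len > best_len) {
            best_len = routes[i].len;
            best_hop = routes[i].next_hop;
        }
    }
    return best_hop;
}

int main(void) {
    printf("10.1.2.3 -> next hop %d\n", lpm_lookup(0x0A010203u)); /* 2 */
    printf("10.9.9.9 -> next hop %d\n", lpm_lookup(0x0A090909u)); /* 1 */
    return 0;
}
```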

Architectural paradigms

In order to deal with high data-rates, several architectural paradigms are commonly used:

  • Pipeline of processors – each stage of the pipeline consists of a processor performing one of the functions listed above (a minimal software analogue is sketched after this list).
  • Parallel processing with multiple processors, often including multithreading.
  • Specialized microcoded engines to more efficiently accomplish the tasks at hand.
  • With the advent of multicore architectures, network processors can be used for higher layer (L4-L7) processing.
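
The following C sketch imitates the pipeline-of-processors paradigm in software: each stage is a function performing one step (parse, lookup, rewrite), and packets stream through the stages in order. The stage names and packet structure are invented for illustration; in a real device each stage is a separate processor, so at any instant every stage holds a different packet.

```c
#include <stdio.h>

/* Minimal per-packet state threaded through the pipeline. */
typedef struct {
    int id;
    int parsed;
    int next_hop;
    int ttl;
} pkt_t;

/* Each pipeline stage performs one of the generic functions above. */
static void stage_parse(pkt_t *p)   { p->parsed = 1; }
static void stage_lookup(pkt_t *p)  { p->next_hop = p->id % 4; } /* stand-in for key lookup */
static void stage_rewrite(pkt_t *p) { p->ttl--; }                /* bitfield manipulation   */

typedef void (*stage_fn)(pkt_t *);
static const stage_fn pipeline[] = { stage_parse, stage_lookup, stage_rewrite };
#define NSTAGES (sizeof pipeline / sizeof pipeline[0])

int main(void) {
    pkt_t pkts[3] = { {1, 0, 0, 64}, {2, 0, 0, 64}, {3, 0, 0, 64} };
    /* In hardware the stages run concurrently on different packets;
     * here we simply apply the stages in order to each packet. */
    for (int i = 0; i < 3; i++) {
        for (size_t s = 0; s < NSTAGES; s++)
            pipeline[s](&pkts[i]);
        printf("packet %d -> next hop %d, ttl %d\n",
               pkts[i].id, pkts[i].next_hop, pkts[i].ttl);
    }
    return 0;
}
```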

Additionally, traffic management, which is a critical element in L2-L3 network processing and used to be executed by a variety of co-processors, has become an integral part of the network processor architecture, and a substantial part of its silicon area ("real estate") is devoted to the integrated traffic manager.[5] Modern network processors are also equipped with low-latency, high-throughput on-chip interconnection networks optimized for the exchange of small messages (a few data words) among cores. Such networks can be used as an efficient inter-core communication facility alongside the standard use of shared memory.[6]

Applications

Using the generic functions of the network processor, a software program implements an application that the network processor executes, resulting in the piece of physical equipment performing a task or providing a service. Some of the application types typically implemented as software running on network processors are:[7]

  • Packet or frame discrimination and forwarding, that is, the basic operation of a router or switch.
  • Quality of service (QoS) enforcement – identifying different types or classes of packets and providing preferential treatment for some types or classes of packet at the expense of other types or classes of packet.
  • Access Control functions – determining whether a specific packet or stream of packets should be allowed to traverse the piece of network equipment.
  • Encryption of data streams – built-in hardware-based encryption engines allow individual data flows to be encrypted by the processor.
  • TCP offload processing

from Grokipedia
A network processor (NPU) is a specialized programmable integrated circuit designed to perform high-speed packet processing tasks in networking equipment, such as routers, switches, and firewalls, enabling efficient handling of traffic at wire speeds ranging from gigabits to tens of gigabits per second. These devices are optimized for operations like header parsing, classification, forwarding, and traffic management, distinguishing them from general-purpose processors by their focus on parallel processing and hardware acceleration for network-specific functions. Unlike fixed-function ASICs, NPUs offer software programmability to adapt to evolving protocols and standards, balancing performance with flexibility.

Key characteristics of network processors include a multi-core architecture with multiple packet engines or processing elements that support multithreading to handle concurrent packet flows without blocking, often augmented by coprocessors for tasks like hash lookups, checksums, and encryption. They typically feature a RISC-based instruction set extended with networking primitives, such as checksum and CRC computation, and integrate components like buffer managers, schedulers, and search engines to meet memory-bandwidth demands of up to 500 Gb/s for 10 Gbps throughput. Programming models emphasize C/C++ compatibility and microcode for core routines, with tools like simulators (e.g., NEPSIM) aiding development, though challenges persist in achieving low-latency branch prediction and in standardizing benchmarks for networking workloads.

Network processors emerged in the late 1990s and early 2000s amid explosive Internet growth, evolving from CPU-based designs and fixed-function ASICs to address the need for adaptable, cost-effective solutions in packet-switched networks. Early commercial examples include Intel's IXP series (e.g., the IXP1200 with six microengines) and IBM's PowerNP (with up to eight multithreaded processing elements supporting 10 Gbps), which demonstrated the viability of parallelism for data-plane tasks while control-plane functions remained on general-purpose CPUs. By the mid-2000s, the market had matured with contributions from companies like EZchip, Agere, and Freescale, driven by demands for quality of service (QoS) and multiprotocol label switching (MPLS).

In modern applications, network processors underpin core, edge, and access network functions, including firewalling and intrusion detection in data-center and cloud environments, with the market projected to grow significantly due to surging data traffic. Recent advancements have integrated NPUs into Smart Network Interface Cards (SmartNICs), which combine domain-specific accelerators (e.g., for AI inference or NVMe-oF storage) with programmable cores, enabling offloads like firewalls at over 100 Gbps and reducing host CPU overhead by up to 10x in security tasks. These evolutions, seen in products from NVIDIA (e.g., BlueField-2) and from hyperscalers such as Amazon, extend traditional NPU capabilities to emerging areas such as in-network computing, providing scalability for multi-cloud and edge infrastructures.

Overview

Definition and Purpose

A network processor is a specialized integrated circuit designed as a programmable packet-processing engine optimized for high-speed networking tasks, particularly the efficient handling of network data packets. It focuses on operations such as header parsing, classification, bit-field manipulation, table lookups, packet modification, and data movement, supporting data rates from 1.2 Gbps to 40 Gbps or more. Distinct from general-purpose CPUs, which prioritize sequential computation and include features like floating-point units and memory-management units, network processors employ stripped-down architectures with multithreading and packet engines tailored for parallel processing of independent packets.

The core purpose of a network processor is to enable wire-speed processing of network packets in the upper layers of the protocol stack (layers 3 and above), including classification, modification, queueing, policy enforcement, and firewalling, thereby meeting escalating bandwidth demands without performance bottlenecks. By manipulating protocol data units at rates up to gigabits per second (such as 2.4 Gbps for OC-48 interfaces), these processors ensure that network equipment like routers and switches can forward traffic efficiently while minimizing latency.

Network processors evolved from fixed-function application-specific integrated circuits (ASICs), which offered high performance but lacked flexibility and required lengthy redesigns for protocol changes, to programmable variants that provide shorter design cycles, field-upgradability, and adaptability while maintaining efficiency. In operational context, they reside in the data plane of network devices, positioned between network interfaces and switch fabrics to handle fast-path packet processing, thereby offloading resource-intensive tasks from control-plane general-purpose processors and allowing the latter to focus on slower, configuration-oriented functions. Emerging in the late 1990s, network processors addressed the growing need for programmable solutions in high-speed networking.

Key Characteristics

Network processors are distinguished by their high degree of parallelism, typically achieved through multi-core or multi-threaded architectures that enable simultaneous processing of multiple data streams to sustain gigabit-per-second throughput. These designs often incorporate tens to hundreds of simple RISC-based processing elements, allowing efficient handling of packet-intensive workloads without the overhead of complex general-purpose instruction sets. For instance, early commercial implementations like the IXP1200 featured six multithreaded cores capable of up to 21.6 million packets per second (Mpps), demonstrating the parallelism needed for line-rate processing in high-speed networks.

A core feature is the integration of specialized hardware accelerators tailored for common networking tasks, such as content-addressable memory (CAM) for rapid prefix lookups in routing tables, cyclic redundancy check (CRC) computation, cryptographic operations for secure packet handling, and pattern matching for deep packet inspection. These accelerators offload repetitive, compute-intensive functions from the programmable cores, enhancing overall efficiency while maintaining flexibility. Programmability is another hallmark, facilitated by domain-specific languages like P4 or vendor-specific microcode environments, which allow network operators to update protocols and implement custom functions in software without requiring full hardware redesigns. This balance of hardware acceleration and software configurability enables rapid adaptation to evolving standards and architectures such as SDN.

Power efficiency and thermal management are critical for deployment in dense, always-on networking equipment, so network processors employ techniques like clock gating to reduce dynamic power consumption by up to 30% and utilize on-chip SRAM to minimize off-chip memory accesses. These designs support line-rate processing at speeds up to 60 Gbps or more, with processing latencies often in the sub-microsecond range to meet real-time requirements. In comparison to other hardware, network processors excel in I/O-bound tasks like packet classification and forwarding, outperforming general-purpose processors (GPPs) in high-bandwidth scenarios where GPPs suffer from interconnection bottlenecks, as summarized in the following table.
Characteristic | Network Processors (NPs) | General-Purpose Processors (GPPs) | Graphics Processing Units (GPUs)
Primary Strength | Parallel I/O-bound packet processing at wire speed (e.g., >10 Gbps, millions of PPS) | Versatile compute for sequential or mixed workloads | Massive parallelism for compute-bound tasks (e.g., graphics, AI training)
Architecture Focus | Multi-core RISC with accelerators (CAM, crypto); low-latency interconnects | Single/multi-core with large caches; general instruction sets | Thousands of SIMD cores optimized for floating-point ops
Suitability | Real-time networking (routing, switching); scalable for Layers 2-7 | Software-defined tasks; falls short in high-density I/O | Not ideal for low-latency, variable-size packet streams
Efficiency Tradeoff | High throughput per watt for sustained loads; programmable flexibility | Higher power for I/O-heavy workloads; easier general programming | Energy-intensive for non-parallel workloads; poor for irregular data
This table highlights how NPs prioritize networking-specific optimizations, bridging the gap between rigid ASICs and flexible but slower GPPs.

Historical Development

Early Innovations

The rapid expansion of the Internet in the 1990s, driven by increasing demand for high-speed data transmission and diverse protocols, exposed significant limitations in traditional application-specific integrated circuits (ASICs) for network equipment. ASICs, while efficient for fixed functions, lacked the flexibility to adapt to evolving standards and variable traffic patterns, leading to high development costs, long design cycles, and risks of obsolescence as protocols changed.

A pivotal innovation emerged with IBM's PowerNP family of network processors, developed at IBM starting in 1998, marking one of the earliest programmable solutions for packet processing. The PowerNP utilized multiple reduced instruction set computing (RISC)-based protocol processors (up to 16 in its architecture) alongside a PowerPC control core, enabling parallel handling of incoming packets through a multi-stage pipeline that distributed tasks like classification, modification, and forwarding across cores. To overcome the inflexibility of fixed-function hardware, early designs incorporated reconfigurable match tables via specialized co-processors, such as IBM's Tree Search Engine, which supported fast lookups for routing and classification using tree-based structures stored in on-chip memory. These innovations drew from parallel-computing paradigms, adapting multi-core coordination and thread-level parallelism to process streaming packet data at line rates without bottlenecks. The focus was on achieving wire-speed processing for emerging Ethernet standards, ensuring low-latency handling of traffic flows.

Addressing scalability challenges, network processors enabled transitions from legacy 10 Mbps links to OC-48 rates (2.5 Gbps), supporting full-duplex Packet over SONET while maintaining performance through hardware-software synergy. This shift allowed equipment makers to upgrade functionality via software, reducing reliance on costly ASIC redesigns amid surging bandwidth needs.

Commercialization and Key Milestones

The commercialization of network processors began in the late 1990s and early 2000s, driven by the need for programmable, high-speed packet processing in Internet infrastructure. Intel played a pivotal role with the launch of its Internet Exchange Architecture (IXA)-based IXP series in February 2000, starting with the IXP1200, which featured six programmable microengines and an integrated StrongARM core to handle wire-speed processing for emerging networks. This was followed by the IXP2800 in 2002, which incorporated 16 multi-threaded microengines operating at up to 1.4 GHz, enabling support for 10 Gbps Ethernet line rates and marking a significant step toward widespread adoption in edge and core routers.

Concurrent developments from other vendors accelerated market growth. Agere Systems introduced its second-generation PayloadPlus family in July 2002, including the APP540 processor, designed for deep-packet inspection and classification at up to 5 Gbps, targeting applications in optical and wireless networks. Cavium Networks launched the Octeon series in 2004, featuring multi-core MIPS64 architectures with integrated hardware accelerators for security functions like encryption and firewalling, which became popular in unified threat management appliances and service provider equipment. IBM contributed subsequent iterations of its PowerNP line, such as the NP4GS3 introduced around 2003, which emphasized scalable pico-engines for 4 Gbps forwarding in multiservice switches.

Key milestones in the 2010s reflected the evolution toward higher bandwidths and consolidation in the industry. By the mid-2010s, network processors supported the 40 Gbps and 100 Gbps Ethernet standards ratified by IEEE 802.3ba in 2010, with chips like EZchip's NP-5, announced in 2012, enabling full-duplex 100 Gbps packet processing and traffic management on a single die. Industry consolidation accelerated, with Intel discontinuing its IXP line around 2010 and Mellanox acquiring EZchip in 2016 to bolster 100G+ capabilities. Acquisition trends reshaped the landscape, exemplified by Marvell's $6 billion purchase of Cavium in July 2018, which combined Marvell's Ethernet expertise with Cavium's multi-core processors to strengthen offerings for network infrastructure. A transformative advancement came in 2015 with the introduction of the P4 language by the P4 Language Consortium, enabling protocol-independent packet processing on programmable hardware and fostering innovation in software-defined networking.

By 2025, network processors have increasingly integrated AI capabilities for adaptive traffic management in 5G and emerging 6G networks. Solutions like NVIDIA's Spectrum-X platform, enhanced in 2024, leverage AI-driven congestion control and RoCE adaptive routing to optimize AI workloads in hyperscale data centers. Collaborations such as NVIDIA and Nokia's AI-native RAN solutions, announced in 2024, incorporate accelerated computing for edge processing, supporting real-time AI inference and massive connectivity at the network edge.

Architectural Design

Core Components

Network processors typically feature multi-core processing engines designed for parallel execution of packet-related tasks, often employing reduced instruction set computing (RISC) architectures to achieve high throughput in data-plane operations. These engines, such as the microengines in Intel's IXP series, consist of multiple small RISC cores (ranging from 6 in the IXP1200 to 16 in later models like the IXP2800) that operate independently to handle tasks like header parsing and modification concurrently.

Specialized hardware units augment the core engines by offloading common network operations for deterministic performance. Search engines, frequently implemented using ternary content-addressable memory (TCAM) or static RAM (SRAM), enable rapid lookups in routing tables and access control lists by performing parallel comparisons across large datasets in a single clock cycle. Crypto accelerators handle encryption and decryption tasks, such as IPsec processing, using dedicated silicon to meet wire-speed requirements without burdening the general-purpose cores. Direct memory access (DMA) controllers facilitate efficient data transfers between the processor and external memory or I/O devices, minimizing CPU intervention and supporting burst-mode operations for high-volume packet flows.

The memory hierarchy in network processors balances speed and capacity to support low-latency packet handling. On-chip buffers, typically SRAM-based and sized from several KB to MB, provide fast access for temporary storage of packet descriptors and small queues, reducing contention and enabling microsecond-level processing. For larger data structures like forwarding tables, external DRAM interfaces connect to off-chip memory, offering gigabytes of capacity while maintaining bandwidths up to hundreds of GB/s through wide buses and prefetch mechanisms.

Input/output (I/O) interfaces ensure seamless integration with network fabrics and host systems. SerDes lanes support high-speed serial links up to 400 Gb/s per port, enabling direct connectivity to optical or electrical media. Peripheral Component Interconnect Express (PCIe) endpoints allow attachment to host CPUs for offload, with Gen4 or higher support for low-latency data exchange. Integrated Ethernet media access controllers (MACs) handle layer-1/2 framing, including CRC computation and pause frame support, across multiple ports to interface with switches or routers.

A typical block diagram of a network processor illustrates a pipelined structure divided into ingress and egress stages, connected via a central switch fabric. Incoming packets enter through I/O interfaces into the ingress stage, where microengines and specialized units perform classification and modification before enqueueing to buffers; the traffic manager then schedules transmission through the fabric. In the egress stage, dequeued packets undergo final alterations, such as encapsulation, before serialization and output via MACs and SerDes, ensuring end-to-end forwarding at line rates exceeding 100 Gb/s.

Processing Paradigms

Network processors employ various processing paradigms to balance high throughput, low latency, and flexibility in handling packet streams. These paradigms organize hardware resources and software models to optimize for the irregular, latency-sensitive workloads of networking, such as header parsing and forwarding decisions. Central to these designs is the interplay between parallelism and pipelining, enabling processors to sustain wire-speed performance at speeds exceeding 100 Gbps.

Pipelined architectures divide packet processing into sequential stages, allowing multiple packets to be handled concurrently with deterministic latency. Typical stages include parsing to extract header fields, classification to perform lookups and matching, forwarding to determine output ports, and modification to alter packet contents or update state. This staged approach, often implemented in specialized hardware units, ensures predictable processing times by overlapping operations, with each stage completing in a fixed number of clock cycles regardless of packet variability. For instance, reconfigurable match-action pipelines in modern designs process packets in a linear flow, minimizing stalls and achieving sub-microsecond latencies suitable for edge routing.

In contrast to multi-core designs that emphasize independent parallelism across cores, multi-threading paradigms in network processors focus on latency hiding through rapid context switching within shared resources. Multi-core architectures distribute workloads across multiple processing elements for scalability in diverse tasks, but they can introduce variability from cache contention and synchronization overheads. Multi-threading, however, interleaves multiple packet contexts on fewer engines to mask memory-access delays, common in designs with microengines supporting 8-16 threads per unit. Intel's IXP series exemplifies this by using hardware-supported multi-threading on microengines, where thread switching occurs every cycle to overlap computation with data fetches, sustaining throughput without deep pipelines. This approach excels in memory-bound operations like table lookups, providing more consistent performance than pure multi-core setups for bursty traffic.

Programmable data planes represent a shift from fixed-function ASICs, enabling runtime reconfiguration of processing logic via domain-specific languages like P4. In fixed ASICs, hardware is optimized for predefined protocols with hardcoded stages, limiting adaptability to new standards or custom functions. P4-based paradigms, however, allow definition of custom parsers, match-action tables, and deparsers, where tables support exact, longest-prefix, or ternary matching on arbitrary fields to apply actions like forwarding or encapsulation. This contrasts with ASIC rigidity by compiling programs to hardware targets, supporting protocol-independent processing and rapid deployment of features like in-network telemetry. Widely adopted in switches and smart NICs, P4 enhances flexibility without sacrificing line-rate performance, as seen in implementations achieving 100 Gbps+ with minimal overhead.

Hybrid paradigms integrate scalar processing for control-intensive tasks with SIMD (single instruction, multiple data) units for bulk parallel operations, optimizing resource use in heterogeneous workloads. Scalar cores handle irregular logic like exception management or state updates, while SIMD extensions process multiple packet fields or search keys simultaneously, accelerating tasks such as checksum computations.
This combination leverages scalar flexibility for low-volume control flows and SIMD efficiency for data-parallel phases like header validation across streams. In network processors, hybrid designs like those in stream-oriented architectures apply SIMD to vectorized packet batches, reducing cycles for repetitive operations while scalar paths manage branching. Such paradigms improve overall efficiency, with reported gains of 20-50% in throughput for mixed workloads compared to uniform scalar or SIMD-only models.
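
A rough C sketch of the latency-hiding idea follows: several packet contexts share one engine, and whenever a context stalls on a (simulated) memory lookup, the engine switches to another ready context instead of idling. The cycle counts, thread count, and work units are arbitrary illustration values, not the parameters of any real microengine.

```c
#include <stdio.h>

#define NTHREADS   4    /* packet contexts sharing one engine        */
#define MEM_DELAY  6    /* cycles a table lookup keeps a thread busy */
#define WORK_UNITS 3    /* lookups each packet needs before done     */

typedef struct {
    int stall_until;   /* cycle at which the pending lookup returns */
    int remaining;     /* lookups still to issue                    */
} ctx_t;

int main(void) {
    ctx_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        t[i] = (ctx_t){ 0, WORK_UNITS };

    int cycle = 0, done = 0;
    /* Each cycle, run the first thread not waiting on memory:
     * issuing its next lookup takes one cycle, then it stalls. */
    while (done < NTHREADS) {
        int ran = 0;
        for (int i = 0; i < NTHREADS && !ran; i++) {
            if (t[i].remaining > 0 && t[i].stall_until <= cycle) {
                t[i].stall_until = cycle + MEM_DELAY; /* lookup in flight */
                if (--t[i].remaining == 0) {
                    done++;
                    printf("thread %d issued its last lookup at cycle %d\n",
                           i, cycle);
                }
                ran = 1;  /* one instruction slot per cycle */
            }
        }
        cycle++;
    }
    printf("total: %d cycles; fully sequential would need about %d\n",
           cycle, NTHREADS * WORK_UNITS * MEM_DELAY);
    return 0;
}
```

Running this shows the interleaved schedule finishing in a fraction of the cycles the sequential estimate predicts, which is exactly the benefit multi-threaded microengines extract from memory-bound packet workloads.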

Core Functions

Packet Processing Pipeline

The packet processing pipeline in a network processor is the core sequence of operations applied to individual packets as they traverse from input to output ports, enabling high-speed forwarding while applying protocol-specific logic. The pipeline is typically implemented as a series of hardware stages optimized for parallelism and low latency, allowing the processor to sustain wire-speed processing for links of 100 Gbps or higher.

In the ingress stage, incoming packets are parsed to extract key headers from protocols such as Ethernet, IP, and TCP, along with associated metadata like packet length and checksums. This involves splitting the header from the payload, often using fixed or programmable parsers that scan bit fields at line rate; for instance, in designs supporting MPLS-TP, up to 256 bits of header may be extracted per cycle on a 512-bit bus operating at 295 MHz. The extracted metadata is then attached to the packet descriptor for downstream use, while the payload is buffered in FIFO queues or external memory to minimize latency.

Classification follows ingress parsing: the processor performs lookups in forwarding tables to determine the packet's handling rules, employing algorithms such as hashing for approximate matching or exact-match CAM for precision. Hash-based schemes, for example, map multi-field keys (e.g., 60-bit PBB-TE identifiers) to table entries, supporting multiple matches resolved by priority or availability; this often integrates QoS classification to identify traffic classes. Parallel mini-pipelines may process stacked headers, handling protocols with variable stacking depths.

Forwarding and modification occur next, where the classification results guide header updates, such as TTL decrements, label swaps in MPLS, or encapsulation/decapsulation for tunneling protocols, alongside QoS marking via DSCP values or VLAN tags. These operations leverage weakly programmable engines that execute microcode for flexibility, with parallel units handling up to two 48-bit modifications simultaneously; output port selection uses bitmaps (e.g., 32-bit) derived from lookup results. Architectural accelerators, such as content-addressable memories, support these steps by speeding up table accesses.

The egress stage finalizes processing by reassembling the modified header with the payload, serializing the packet, and queuing it for transmission on the appropriate output port, often adjusting for length changes from modifications (e.g., via offset calculations). This ensures alignment with MAC-layer requirements, such as Ethernet framing at 100 Gbps, using high-bandwidth buses for output. Error handling throughout the pipeline includes mechanisms to drop invalid packets, such as those failing checksums or lacking valid lookup entries, and rate limiting to prevent congestion from malformed traffic; for example, a full lookup table triggers an exception that halts forwarding for the affected packet.

Network processor performance in the packet processing pipeline is commonly characterized by the throughput equation, in packets per second:

    Throughput = (Clock Speed × Parallel Units) / Cycles per Packet

This formula accounts for the pipelined nature of the design: clock speed (in Hz) and the number of parallel units (e.g., processing engines) boost capacity, offset by the average number of cycles required per packet (typically 50–200 for complex forwarding).
For instance, to achieve 100 Gbps throughput with 64-byte minimum-sized Ethernet packets (requiring approximately 149 million packets per second, including frame overhead such as preamble and inter-frame gap), a design with a 1 GHz clock, 10 parallel units, and 50 cycles per packet yields (1 × 10^9 × 10) / 50 = 200 million packets per second, comfortably above the target.
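
The worked example above can be checked with a few lines of C. The helper function below simply evaluates the throughput formula, and the constants mirror the example; all names and values are illustrative.

```c
#include <stdio.h>

/* Packets per second a pipeline can sustain:
 * throughput = clock_hz * parallel_units / cycles_per_packet */
static double pipeline_pps(double clock_hz, int units, int cycles_per_pkt) {
    return clock_hz * units / cycles_per_pkt;
}

int main(void) {
    /* A 64-byte frame occupies 64 + 20 bytes on the wire
     * (preamble + inter-frame gap), so 100 Gbps needs ~148.8 Mpps. */
    double required = 100e9 / ((64 + 20) * 8);
    double capacity = pipeline_pps(1e9, 10, 50); /* 200 Mpps */
    printf("required: %.1f Mpps, capacity: %.1f Mpps\n",
           required / 1e6, capacity / 1e6);
    return 0;
}
```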

Traffic Management

Traffic management in network processors involves mechanisms to coordinate multiple packets and flows, ensuring efficient and fair allocation of network resources such as buffers and bandwidth. These functions operate at the flow and aggregate levels, complementing per-packet processing by managing queues to prevent overload and maintain quality of service (QoS) for diverse traffic types. By implementing queuing, scheduling, congestion avoidance, and rate control, network processors enable routers and switches to handle high-speed data streams while minimizing delays and losses.

Queuing mechanisms buffer incoming packets to absorb bursts and avoid immediate drops during congestion. First-in-first-out (FIFO) queuing serves packets in arrival order, providing simplicity but risking unfairness, as large packets can delay smaller ones. Priority queuing assigns packets to separate queues based on precedence levels, serving higher-priority queues first to meet latency-sensitive needs like voice traffic. Weighted fair queuing (WFQ) extends this by allocating bandwidth proportionally to weights assigned to flows, ensuring isolated guarantees even under overload; for instance, a flow with weight 2 receives twice the service of one with weight 1.

Scheduling algorithms determine the order in which queued packets are transmitted to enforce QoS policies. Deficit round-robin (DRR) cycles through queues, granting each a quantum of service while tracking deficits to handle variable packet sizes fairly, approximating ideal fair queuing with low complexity suitable for hardware implementation in network processors. Strict-priority scheduling dequeues from the highest-priority non-empty queue, which is ideal for real-time applications but prone to starving lower priorities unless combined with rate-limiting methods. These algorithms enable bandwidth partitioning and low-latency delivery for prioritized flows.

Congestion-control techniques in network processors proactively signal overload to avoid buffer overflow. Random early detection (RED) monitors average queue length and drops packets with a probability that increases as the queue fills, prompting TCP senders to reduce rates before full congestion; weighted RED (WRED) varies drop probabilities by priority or class for differentiated treatment. Explicit congestion notification (ECN) marking sets bits in packet headers instead of dropping, allowing receivers to inform senders of congestion without loss, preserving throughput in ECN-capable networks.

Shaping and policing regulate traffic rates using token-bucket algorithms. Policing discards or marks excess packets exceeding a committed rate, enforcing ingress limits, while shaping delays packets to smooth bursts, fitting them within link capacity via a bucket that accumulates tokens at the allowed rate (up to a burst size) and releases packets only when tokens are available. In network processors, these mechanisms enable per-flow rate enforcement, preventing downstream congestion.

Effective traffic management reduces jitter by smoothing flow variations through scheduling and shaping, ensuring consistent inter-packet intervals for applications like video streaming. It also provides bandwidth guarantees via weighted allocation, isolating flows from interference. A key metric is queue delay, modeled as

    Delay = Queue Length / Service Rate

where the service rate is the packet-processing capacity, highlighting how buffer buildup increases latency under load.
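
As a concrete illustration of the shaping/policing mechanism described above, here is a minimal token-bucket policer in C. The rate, burst size, and packet trace are invented example values; a hardware implementation would use fixed-point counters updated per clock rather than floating-point timestamps.

```c
#include <stdio.h>

/* Token bucket: tokens accrue at `rate` bytes/sec up to `burst` bytes.
 * A packet conforms (is forwarded) only if enough tokens are available;
 * a policer drops non-conforming packets, a shaper would delay them. */
typedef struct {
    double tokens;     /* current token count, in bytes */
    double rate;       /* fill rate, bytes per second   */
    double burst;      /* bucket depth, bytes           */
    double last_time;  /* timestamp of the last update  */
} tbucket_t;

static int tb_conforms(tbucket_t *tb, double now, int pkt_bytes) {
    tb->tokens += (now - tb->last_time) * tb->rate;  /* refill */
    if (tb->tokens > tb->burst)
        tb->tokens = tb->burst;                      /* cap at burst */
    tb->last_time = now;
    if (tb->tokens >= pkt_bytes) {
        tb->tokens -= pkt_bytes;                     /* spend tokens */
        return 1;                                    /* forward      */
    }
    return 0;                                        /* police: drop */
}

int main(void) {
    /* 1 MB/s committed rate with a 3000-byte burst allowance. */
    tbucket_t tb = { 3000.0, 1e6, 3000.0, 0.0 };
    double times[] = { 0.000, 0.001, 0.002, 0.002, 0.010 };
    int    sizes[] = { 1500,  1500,  1500,  1500,  1500  };
    for (int i = 0; i < 5; i++)
        printf("t=%.3fs %dB -> %s\n", times[i], sizes[i],
               tb_conforms(&tb, times[i], sizes[i]) ? "forward" : "drop");
    return 0;
}
```

In the trace, the fourth back-to-back packet exceeds the burst allowance and is dropped, while the bucket refills in time for the final packet, which is the policing behaviour the text describes.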

Applications

Traditional Networking Equipment

Network processors serve as the core forwarding engines in high-performance routers and switches, enabling line-rate packet processing for massive-scale traffic in service-provider and enterprise networks. In Cisco's ASR 1000 Series Aggregation Services Routers, embedded services processors (ESPs) based on QuantumFlow Processor technology handle data-plane tasks such as routing, QoS, and security features at speeds up to 200 Gbps per module. Similarly, Juniper Networks' MX Series Universal Routing Platforms utilize the proprietary Trio chipset, a multi-core network processor architecture that supports up to 4.8 Tbps of system capacity in compact form factors like the MX304, facilitating advanced IPv4/IPv6 forwarding and MPLS processing.

Broadcom's Jericho family of network processors powers merchant-silicon designs in routers and switches from vendors like Arista, delivering scalable throughput for BGP peering and edge routing. For instance, one Jericho-family processor integrates a programmable packet pipeline to achieve 28.8 Tbps in a single device, supporting high port density for 100GbE and 400GbE interfaces in carrier environments. These processors enable flexible table lookups and protocol handling, essential for core Internet infrastructure where traffic volumes exceed petabits per second daily.

In load balancers, network processors accelerate session persistence and SSL offload to distribute application traffic efficiently across server pools while minimizing latency. Modern F5 BIG-IP appliances integrate advanced security processors for cryptographic acceleration, offloading SSL/TLS termination from server CPUs to sustain high-throughput connections in virtualized deployments. Citrix NetScaler (now Citrix ADC) appliances integrate custom packet engines for load balancing, supporting persistence methods such as SSL session and cookie-insertion persistence to maintain stateful connections at multi-gigabit rates. This offload capability enhances server efficiency by reducing CPU utilization for cryptographic tasks, enabling seamless scaling in virtualized data centers.

Firewalls leverage network processors for real-time intrusion detection and stateful inspection, ensuring policies are enforced without compromising throughput. In Cisco's Secure Firewall series, dedicated processors handle packet inspection and connection tracking, maintaining line-rate performance for threat mitigation in enterprise perimeters. Palo Alto Networks' next-generation firewalls use custom single-pass parallel-processing ASICs to inspect traffic contextually at up to 1.5 Tbps in high-end models while correlating sessions for threat prevention. These capabilities allow firewalls to track connection states in hardware, blocking malicious packets based on behavioral patterns without introducing bottlenecks.

A prominent example in data-center switches is Intel's Tofino 2 network processor, which delivers 12.8 Tbps of programmable Ethernet switching capacity with P4 programmability for custom packet parsing and forwarding. Deployed in platforms like Edgecore's DCS810 and Asterfusion's X732Q-T, it supports 32x 400GbE ports, enabling hyperscale operators to implement telemetry and congestion control at wire speed. As of 2025, the global network processor market underpins a significant portion of carrier-grade equipment, valued at approximately $8 billion and projected to grow further, driven by demand for high-speed routing and switching in 5G and cloud infrastructures.

Emerging Uses

Network processors are increasingly vital in 5G and emerging 6G base stations, where they enable real-time beamforming and network slicing to support low-latency services such as ultra-reliable low-latency communications (URLLC). In 5G architectures, programmable data planes using P4-enabled smart NICs or network processing units (NPUs) facilitate adaptive beamforming by computing beam angles based on user equipment (UE) location reports, achieving low error rates for UE speeds under 90 km/h and control cycles below 100 ms. These processors handle the computational demands of massive multiple-input multiple-output (MIMO) systems, optimizing resource allocation in time-division duplex (TDD) setups. For network slicing, multi-class queueing models of the NPU in sliced base stations manage co-deployed tenants for machine-type communications (MTC) and human-type communications (HTC), providing scalable throughput and latency bounds validated against simulators. In 6G contexts, such processors extend slicing to AI-native frameworks, isolating logical networks for diverse services like enhanced mobile broadband (eMBB) and massive IoT while maintaining isolation and quality-of-service (QoS) guarantees.

In edge computing environments, network processors power IoT gateways by performing local analytics and enhancing security at the network periphery, reducing reliance on centralized cloud resources. Arm-based solutions like the NXP Layerscape LX2160A integrate multiple Cortex-A72 cores with data path acceleration architecture 2 (DPAA2) to enable machine-learning inferencing and NFV optimization in gateways, supporting multi-gigabit routing and flexible I/O for SD-WAN deployments. These processors facilitate real-time processing of sensor data, with hardware acceleration for virtual forwarding and traffic management, allowing edge nodes to consolidate workloads using Intel Xeon or Core processors alongside neural network processors (NNPs) for AI-driven analytics. Security features, including trusted platform modules (TPM 2.0), secure boot, and cryptographic acceleration, protect against vulnerabilities in distributed IoT setups, enabling vulnerability scanning and risk assessment at the edge.

Network processors also underpin AI and machine learning (ML) networking through smart NICs that perform in-network computation, alleviating GPU bottlenecks in data centers. Heterogeneous systems combining smart NICs, GPUs, and CPUs offload data prefetching, buffering, and scheduling to the NIC, achieving 1.6× higher training throughput for large models with up to 1 trillion parameters while using fewer GPUs (such as 16 nodes instead of 320). Frameworks like ML-NIC deploy models directly on the NIC's data plane using Micro-C programming on architectures like the Netronome NFP4000, reducing latency by at least 6× and boosting throughput by 16× compared to CPU-based approaches, with CPU utilization dropping by 6.65%. This in-network processing supports tasks like traffic classification, conserving resources for core ML workloads and enabling scalable AI pipelines in high-bandwidth environments. Recent 2025 advancements, such as Broadcom's Tomahawk 6 switch chip delivering 102.4 Tbps, further enhance NPU integration in AI-driven data centers.

Programmable network processors advance software-defined networking (SDN) and network functions virtualization (NFV) by hosting virtual network functions (VNFs) on smart NICs, improving efficiency in dynamic infrastructures.
In NFV deployments, these processors offload virtual switching from server CPUs, bypassing hypervisors to cut latency and reclaim over 50% of cores (for instance, freeing 12 of 24) while processing packets 20× faster than software alone. P4-programmable pipelines in smart NICs, such as the NVIDIA BlueField-2 with A72 cores and 200 Gbps throughput, support SDN offloads like firewalls, intrusion detection and prevention systems (IDS/IPS), and load balancing, achieving near line-rate performance and efficiency gains of up to 1200% in some tools. For UPF offload, they handle GTP-U tunneling and QoS, increasing users per server by 7×, with tunneling throughput rising 60× for small packets.

Looking ahead, network processors are integrating quantum-safe cryptography to counter threats from quantum computers, alongside support for terabit speeds by 2030. Implementations on vehicle and general network processors, such as the NXP S32G, incorporate post-quantum algorithms like ML-DSA for secure boot and over-the-air updates, ensuring resilience during the NIST standardization transition. Nokia's FP and FPcx processors enable quantum-safe IP networking via ANYsec encryption, providing line-rate protection without disrupting operations. For high-speed trends, the FP5 processor supports 1.6 Tbps clear-channel Ethernet using 112G SerDes, paving the way for denser, power-efficient terabit fabrics in AI-driven data centers by the decade's end. Fortinet's NP7 processors in firewalls further embed post-quantum support, aligning with mandates for quantum-safe transitions by 2030.
