Clos network
from Wikipedia

In the field of telecommunications, a Clos network is a kind of multistage circuit-switching network that represents a theoretical idealization of practical, multistage switching systems. It was invented by Edson Erwin[1] in 1938 and first formalized by the American [2] engineer Charles Clos[3] in 1952.

By adding stages, a Clos network reduces the number of crosspoints required to compose a large crossbar switch. A Clos network topology (diagrammed below) is parameterized by three integers n, m, and r: n represents the number of sources which feed into each of r ingress stage crossbar switches; each ingress stage crossbar switch has m outlets; and there are m middle stage crossbar switches.

Circuit switching arranges a dedicated communications path for a connection between endpoints for the duration of the connection. This sacrifices total bandwidth available if the dedicated connections are poorly utilized, but makes the connection and bandwidth more predictable, and only introduces control overhead when the connections are initiated, rather than with every packet handled, as in modern packet-switched networks.

When the Clos network was first devised, the number of crosspoints was a good approximation of the total cost of the switching system. While this was important for electromechanical crossbars, it became less relevant with the advent of VLSI, wherein the interconnects could be implemented either directly in silicon, or within a relatively small cluster of boards. Upon the advent of complex data centers, with huge interconnect structures, each based on optical fiber links, Clos networks regained importance.[4] A subtype of Clos network, the Beneš network, has also found recent application in machine learning.[5]

Topology

Clos networks have three stages: the ingress stage, the middle stage, and the egress stage. Each stage is made up of a number of crossbar switches (see diagram below), often just called crossbars. The network implements an r-way perfect shuffle between stages. Each call entering an ingress crossbar switch can be routed through any of the available middle stage crossbar switches, to the relevant egress crossbar switch. A middle stage crossbar is available for a particular new call if both the link connecting the ingress switch to the middle stage switch, and the link connecting the middle stage switch to the egress switch, are free.

Clos networks are defined by three integers n, m, and r. n represents the number of sources which feed into each of r ingress stage crossbar switches. Each ingress stage crossbar switch has m outlets, and there are m middle stage crossbar switches. There is exactly one connection between each ingress stage switch and each middle stage switch. There are r egress stage switches, each with m inputs and n outputs. Each middle stage switch is connected exactly once to each egress stage switch. Thus, the ingress stage has r switches, each of which has n inputs and m outputs. The middle stage has m switches, each of which has r inputs and r outputs. The egress stage has r switches, each of which has m inputs and n outputs.
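To make the parameterization concrete, here is a minimal Python sketch (illustrative only; the class name and helpers are invented for this example) that counts switches and crosspoints for a given (n, m, r), using Clos's own 1953 example of N = 36:

```python
# Illustrative sketch: switch and crosspoint counts for a three-stage
# Clos network parameterized by (n, m, r). Names are invented here.
from dataclasses import dataclass

@dataclass(frozen=True)
class ClosParams:
    n: int  # inputs per ingress switch (= outputs per egress switch)
    m: int  # middle-stage switches (= outlets per ingress switch)
    r: int  # ingress switches (= egress switches)

    def crosspoints(self):
        ingress = self.r * (self.n * self.m)  # r switches of size n x m
        middle = self.m * (self.r * self.r)   # m switches of size r x r
        egress = self.r * (self.m * self.n)   # r switches of size m x n
        return ingress + middle + egress

    def total_ports(self):
        return self.n * self.r  # N inputs (and N outputs)

p = ClosParams(n=6, m=11, r=6)           # strict-sense: m = 2n - 1 = 11
print(p.total_ports(), p.crosspoints())  # 36 1188, vs 36*36 = 1296 for one crossbar
```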

Blocking characteristics

The relative values of m and n define the blocking characteristics of the Clos network.

Strict-sense nonblocking Clos networks (m ≥ 2n−1): the original 1953 Clos result

If m ≥ 2n−1, the Clos network is strict-sense nonblocking, meaning that an unused input on an ingress switch can always be connected to an unused output on an egress switch, without having to re-arrange existing calls. This is the result which formed the basis of Clos's classic 1953 paper. Assume that there is a free terminal on the input of an ingress switch, and this has to be connected to a free terminal on a particular egress switch. In the worst case, n−1 other calls are active on the ingress switch in question, and n−1 other calls are active on the egress switch in question. Assume, also in the worst case, that each of these calls passes through a different middle-stage switch. Hence in the worst case, 2n−2 of the middle stage switches are unable to carry the new call. Therefore, to ensure strict-sense nonblocking operation, another middle stage switch is required, making a total of 2n−1.

The diagram below shows the worst case: the already established calls (blue and red) pass through different middle-stage switches, so another middle-stage switch is necessary to establish a call between the green input and output.
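The counting argument can be checked mechanically. The toy function below (an illustrative sketch, not part of the original article) builds the adversarial state, with n−1 busy middle switches on the ingress side and n−1 distinct busy ones on the egress side, and reports whether a free middle switch remains:

```python
# Illustrative sketch of the strict-sense worst case: n-1 calls on the
# ingress switch and n-1 calls on the egress switch each occupy a
# distinct middle-stage switch; a new call needs one more.
def free_middle_exists(n, m):
    ingress_busy = set(range(n - 1))              # middles used by ingress calls
    egress_busy = set(range(n - 1, 2 * (n - 1)))  # disjoint set used by egress calls
    blocked = {s for s in ingress_busy | egress_busy if s < m}
    return len(blocked) < m                       # any middle switch left unblocked?

n = 4
print(free_middle_exists(n, 2 * n - 2))  # False: m = 2n-2 can block
print(free_middle_exists(n, 2 * n - 1))  # True:  m = 2n-1 always leaves a path
```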

Rearrangeably nonblocking Clos networks (m ≥ n)

If m ≥ n, the Clos network is rearrangeably nonblocking, meaning that an unused input on an ingress switch can always be connected to an unused output on an egress switch, but for this to take place, existing calls may have to be rearranged by assigning them to different centre stage switches in the Clos network.[6] To prove this, it is sufficient to consider m = n, with the Clos network fully utilised; that is, r×n calls in progress. The proof shows how any permutation of these r×n input terminals onto r×n output terminals may be broken down into smaller permutations which may each be implemented by the individual crossbar switches in a Clos network with m = n.

The proof uses Hall's marriage theorem[7] which is given this name because it is often explained as follows. Suppose there are r boys and r girls. The theorem states that if every subset of k boys (for each k such that 0 ≤ k ≤ r) between them know k or more girls, then each boy can be paired off with a girl that he knows. It is obvious that this is a necessary condition for pairing to take place; what is surprising is that it is sufficient.

In the context of a Clos network, each boy represents an ingress switch, and each girl represents an egress switch. A boy is said to know a girl if the corresponding ingress and egress switches carry the same call. Each set of k boys must know at least k girls because k ingress switches are carrying k×n calls and these cannot be carried by less than k egress switches. Hence each ingress switch can be paired off with an egress switch that carries the same call, via a one-to-one mapping. These r calls can be carried by one middle-stage switch. If this middle-stage switch is now removed from the Clos network, m is reduced by 1, and we are left with a smaller Clos network. The process then repeats itself until m = 1, and every call is assigned to a middle-stage switch.
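The repeated-matching argument translates directly into code. The sketch below is illustrative only: it swaps Hall's existence argument for a simple augmenting-path bipartite matching, and decomposes a call matrix with row and column sums n into n perfect matchings, one per middle-stage switch:

```python
# Illustrative sketch: decompose an r x r call matrix (row/col sums = n)
# into n perfect matchings, each carried by one middle-stage switch.
def perfect_matching(calls):
    r = len(calls)
    match = [-1] * r  # match[egress] = ingress

    def augment(i, seen):
        for e in range(r):
            if calls[i][e] > 0 and e not in seen:
                seen.add(e)
                if match[e] == -1 or augment(match[e], seen):
                    match[e] = i
                    return True
        return False

    for i in range(r):
        assert augment(i, set())  # guaranteed by Hall's theorem when sums = n
    return match

def decompose(calls, n):
    assignments = []  # assignments[s][egress] = ingress routed via middle switch s
    for _ in range(n):
        match = perfect_matching(calls)
        for e, i in enumerate(match):
            calls[i][e] -= 1  # remove the matched call, shrinking the network
        assignments.append(match)
    return assignments

calls = [[2, 0], [0, 2]]    # r = 2 ingress/egress switches, n = 2 calls each
print(decompose(calls, 2))  # each middle switch carries one permutation
```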

Blocking probabilities: the Lee and Jacobaeus approximations

Real telephone switching systems are rarely strict-sense nonblocking for reasons of cost, and they have a small probability of blocking, which may be evaluated by the Lee or Jacobaeus approximations,[8] assuming no rearrangements of existing calls. Here, the potential number of other active calls on each ingress or egress switch is u = n−1.

In the Lee approximation, it is assumed that each internal link between stages is already occupied by a call with a certain probability p, and that this is completely independent between different links. This overestimates the blocking probability, particularly for small r. The probability that a given internal link is busy is p = uq/m, where q is the probability that an ingress or egress link is busy. Conversely, the probability that a link is free is 1−p. The probability that the path connecting an ingress switch to an egress switch via a particular middle stage switch is free is the probability that both links are free, (1−p)². Hence the probability of it being unavailable is 1−(1−p)² = 2p−p². The probability of blocking, or the probability that no such path is free, is then [1−(1−p)²]^m.
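As a quick numeric check, the following sketch (illustrative, with hypothetical example values) evaluates the Lee approximation for a given occupancy:

```python
# Illustrative sketch: Lee approximation for three-stage Clos blocking.
def lee_blocking(n, m, q):
    """q: probability an ingress/egress link is busy; u = n-1 other calls."""
    u = n - 1
    p = u * q / m                    # probability an internal link is busy
    return (1 - (1 - p) ** 2) ** m   # all m two-link paths unavailable

# Hypothetical example: n = 8 inlets per switch, m = 10 middle switches,
# ingress/egress links 70% occupied.
print(f"{lee_blocking(n=8, m=10, q=0.7):.4f}")
```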

The Jacobaeus approximation is more accurate, and to see how it is derived, assume that some particular mapping of calls entering the Clos network (input calls) onto middle stage switches already exists. This reflects the fact that only the relative configurations of ingress and egress switches are of relevance. There are i input calls entering via the same ingress switch as the free input terminal to be connected, and there are j calls leaving the Clos network (output calls) via the same egress switch as the free output terminal to be connected. Hence 0 ≤ i ≤ u, and 0 ≤ j ≤ u.

Let A be the number of ways of assigning the j output calls to the m middle stage switches. Let B be the number of these assignments which result in blocking. This is the number of cases in which the remaining m−j middle stage switches coincide with m−j of the i input calls, which is the number of subsets containing m−j of these calls. Then the probability of blocking is:

β(i, j) = B/A = C(i, m−j) / C(m, j)

where C(a, b) denotes the binomial coefficient. If fi is the probability that i other calls are already active on the ingress switch, and gj is the probability that j other calls are already active on the egress switch, the overall blocking probability is:

PB = Σi Σj fi gj β(i, j), with both sums running from 0 to u.

This may be evaluated with fi and gj each being denoted by a binomial distribution. After considerable algebraic manipulation, this may be written as:

PB = (u!)² q^m (2−q)^(2u−m) / (m! (2u−m)!)
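A direct evaluation of the double sum is straightforward; the sketch below (illustrative, assuming binomially distributed occupancy with link-busy probability q) mirrors the derivation above:

```python
# Illustrative sketch: Jacobaeus approximation evaluated as the double
# sum over ingress occupancy i and egress occupancy j, binomial weights.
from math import comb

def jacobaeus_blocking(n, m, q):
    u = n - 1
    total = 0.0
    for i in range(u + 1):
        f_i = comb(u, i) * q**i * (1 - q) ** (u - i)
        for j in range(u + 1):
            g_j = comb(u, j) * q**j * (1 - q) ** (u - j)
            # beta: the m-j middles not carrying output calls all coincide
            # with input calls (comb returns 0 when m-j > i, i.e. no blocking)
            beta = comb(i, m - j) / comb(m, j)
            total += f_i * g_j * beta
    return total

print(f"{jacobaeus_blocking(n=8, m=10, q=0.7):.4f}")
```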

Clos networks with more than three stages

Clos networks may also be generalised to any odd number of stages. By replacing each centre stage crossbar switch with a 3-stage Clos network, Clos networks of five stages may be constructed. By applying the same process repeatedly, 7, 9, 11,... stages are possible.

Beneš network (m = n = 2)

A rearrangeably nonblocking network of this type with m = n = 2 is generally called a Beneš network, even though it was discussed and analyzed by others[who?] before Václav E. Beneš. The number of inputs and outputs is N = r×n = 2r. Such networks have 2 log2N − 1 stages, each containing N/2 2×2 crossbar switches, and use a total of N log2N − N/2 2×2 crossbar switches. For example, an 8×8 Beneš network (i.e. with N = 8) is shown below; it has 2 log28 − 1 = 5 stages, each containing N/2 = 4 2×2 crossbar switches, and it uses a total of N log2N − N/2 = 20 2×2 crossbar switches. The central three stages consist of two smaller 4×4 Beneš networks, while in the center stage, each 2×2 crossbar switch may itself be regarded as a 2×2 Beneš network. This example therefore highlights the recursive construction of this type of network, with one of the two constituent 4×4 Beneš networks highlighted. The color of the lines between the 2×2 blocks is chosen to emphasize the odd-even recursive decomposition of the inputs, with the odd-numbered inputs going to one sub-block and the even-numbered inputs going to the other sub-block.
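The stage and switch counts follow directly from the formulas above; this small sketch (illustrative only) tabulates them for a few sizes:

```python
# Illustrative sketch: stage and 2x2-switch counts for an N x N Benes network.
from math import log2

def benes_size(N):
    stages = int(2 * log2(N) - 1)
    per_stage = N // 2
    total = int(N * log2(N) - N // 2)
    return stages, per_stage, total

for N in (8, 16, 64):
    print(N, benes_size(N))  # 8 -> (5, 4, 20), matching the example above
```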

from Grokipedia
A Clos network is a multistage interconnection network that enables nonblocking connectivity between a large number of inputs and outputs using a series of smaller crossbar switches arranged in multiple stages, originally designed to reduce the total number of crosspoints in switching systems. The architecture was invented by Edson Erwin in 1938 and formalized by Charles Clos, a researcher at Bell Laboratories, in his seminal 1953 paper "A Study of Non-Blocking Switching Networks," published in the Bell System Technical Journal. Under specified conditions, the architecture ensures that any input can connect to any output without interference, making it highly efficient for circuit-switched environments.

The core structure of a Clos network typically consists of three stages: an ingress stage of input switches, a middle stage of interconnecting switches, and an egress stage of output switches. In a standard symmetric configuration for N × N connectivity, the ingress and egress stages each comprise m switches with n ports (N = m × n), while the middle stage has r switches, each with m × m crosspoints; full-mesh connections link every ingress switch to every middle switch and every middle switch to every egress switch. To achieve strict nonblocking behavior, allowing any unused input to connect to any unused output without reconfiguration, r must be at least 2n − 1, as proven by Clos's theorem; for rearrangeably nonblocking networks, where existing paths may be rearranged to free connections, r ≥ n suffices. This scales efficiently, since adding stages or switches increases capacity without proportional growth in complexity.

Key advantages of Clos networks include fault tolerance through redundant paths and cost-effectiveness compared to a single large crossbar switch, which would require N² crosspoints versus the Clos network's approximately N²/n for large n. In the original telephony context, these properties minimized hardware costs and improved reliability for handling voice traffic. The architecture supports both circuit and packet switching, with nonblocking guarantees reducing latency and congestion in high-demand scenarios.

In modern applications, Clos networks have been adapted for data center fabrics, particularly in the leaf-spine topology, a folded variant of the three-stage design, where leaf switches connect to servers or endpoints and spine switches handle inter-leaf routing to support massive east-west traffic in cloud and hyperscale environments. This evolution, prominent since the 2010s, enables horizontal scaling by adding leaf or spine layers, often up to five or seven stages for global infrastructures, and integrates with protocols like Ethernet and VXLAN for network virtualization. Hyperscale operators such as Google and Meta deploy Clos-based designs for their predictability and performance in AI workloads and cloud services.

History and Background

Invention and Original Purpose

The concept of the Clos network was invented by Edson Erwin in 1938 and patented in 1941 (US Patent 2,244,004). Charles Clos, an engineer at Bell Telephone Laboratories, developed the Clos network in the early 1950s to address the challenges of building scalable and cost-effective telephone exchanges amid the rapid expansion of telephony services following World War II. During this period, the Bell System experienced significant growth in subscriber demand, with millions of new telephone lines installed annually, necessitating larger switching systems capable of handling increased call volumes without proportional cost increases.

In his seminal 1953 paper, "A Study of Non-Blocking Switching Networks," published in the Bell System Technical Journal, Clos outlined the motivation to minimize the number of crosspoints (the electromechanical contact points essential for routing calls) while ensuring nonblocking connectivity in telephone switching arrays. Single-stage crossbar switches, the prevailing technology at the time, suffered from high costs due to their requirement of approximately N² crosspoints for N inputs and outputs, making them impractical for large-scale urban exchanges serving thousands of lines.

The core innovation of the Clos network was a multi-stage architecture composed of smaller crossbar switches interconnected across input, middle, and output stages to interconnect inputs and outputs more efficiently. Clos introduced notation where n represents the number of inputs (or outputs) per switch in the input and output stages, and m denotes the number of middle-stage switches, allowing for a total of N = n × k connections (with k being the number of input/output stage switches) while drastically reducing the overall crosspoint count. For instance, a three-stage network for N = 36 with n = 6 and m = 11 middle switches required only 1,188 crosspoints, compared to 1,296 for a single-stage equivalent. This design was specifically tailored for circuit-switched telephone systems, enabling reliable path establishment from any idle inlet to any idle outlet irrespective of existing connections.

Development and Key Milestones

In the 1960s and 1970s, Clos networks transitioned from analog applications to digital switching systems, integrating with time-division multiplexing (TDM) techniques and early stored-program control architectures to handle digitized voice traffic more efficiently. This shift was driven by advances in pulse-code modulation (PCM) and the need for scalable digital exchanges.

During the 1980s and 1990s, Clos networks gained prominence in asynchronous transfer mode (ATM) switching fabrics, where their multistage design supported high-speed packetized data for emerging broadband services. Major telecommunications vendors implemented Clos-based ATM switches to meet the demands of integrated services digital network (ISDN) extensions, enabling nonblocking connections for variable-rate traffic. A key theoretical advancement in this era involved adapting Clos structures for optical implementations using wavelength-division multiplexing (WDM), first explored in research prototypes around 2000 to leverage fiber-optic capacities for terabit-scale routing.

From the 2000s onward, Clos networks experienced a revival in packet-switched environments, particularly within data center infrastructures, where their scalability addressed the explosion of Ethernet-based traffic. In the 2010s, hyperscale operators adopted Clos-derived leaf-spine topologies for nonblocking Ethernet fabrics; for instance, Cisco's Nexus series and Arista's EOS platforms deployed multi-tier Clos designs supporting up to hundreds of thousands of ports with low latency. As of 2025, Clos networks continue to evolve through integration with software-defined networking (SDN) controllers and AI-driven optimization algorithms, enhancing dynamic path selection and load balancing in AI training clusters and cloud deployments. These advancements, often realized in optical Clos variants, enable real-time traffic engineering in environments handling exabyte-scale data flows for AI workloads.

Fundamental Topology

Three-Stage Architecture

The three-stage Clos network is a multistage switching architecture designed to connect N inputs to N outputs, where N = n², using smaller crossbar switches arranged in input, middle, and output stages. The input stage consists of n switches, each of size n × m, providing n inputs and m outputs per switch. The middle stage comprises m switches, each of size n × n. The output stage includes n switches, each of size m × n, with m inputs and n outputs per switch. Interconnections between stages form full bipartite graphs: each of the n input-stage switches connects to all m middle-stage switches via dedicated links, and similarly each middle-stage switch connects to all n output-stage switches.

This arrangement enables signal flow from any input through a selected path: a connection is established by activating a crosspoint in the appropriate input switch to route to a middle switch, then from that middle switch to the target output switch, and finally to the desired output port. The permutation-based connections ensure multiple alternate paths exist between stages, facilitating connectivity from any input to any output under suitable conditions.

The total number of crosspoints in the network is 3mn², accounting for n × (n × m) in the input stage, m × (n × n) in the middle stage, and n × (m × n) in the output stage. This yields a complexity of O(N^(3/2)), a significant scaling advantage over the O(N²) required for a monolithic crossbar switch of size N × N, as the Clos design distributes the switching across smaller, more manageable components.

For example, consider a Clos network with n = 4 and m = 5, supporting N = 16 ports. There are 4 input switches (each 4 × 5), 5 middle switches (each 4 × 4), and 4 output switches (each 5 × 4), for a total of 240 crosspoints. A simple routing path might connect input port 1 (on the first input switch) to output port 3 (on the second output switch) by selecting the third middle switch: activate the crosspoint from input port 1 to the third middle output in the first input switch, then from the first middle input to the second output switch in the third middle switch, and finally from the second input to output port 3 in the second output switch. To achieve strict-sense nonblocking operation in such a configuration, m must be at least 2n − 1.
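The example path can be expressed as explicit hops. The sketch below is illustrative (0-based indices, and middle-switch availability is simplified to a set of fully busy middle switches rather than per-link state):

```python
# Illustrative sketch: pick a path through a three-stage Clos network.
# Ports and switches are 0-based; busy_middle is a simplified model of
# middle switches whose relevant links are occupied.
def route(n, m, in_port, out_port, busy_middle=frozenset()):
    ingress = in_port // n   # which input-stage switch owns this port
    egress = out_port // n   # which output-stage switch owns this port
    for mid in range(m):
        if mid not in busy_middle:  # both interstage links must be free
            return [("ingress", ingress, "to_middle", mid),
                    ("middle", mid, "to_egress", egress),
                    ("egress", egress, "to_port", out_port % n)]
    return None  # blocked

# n = 4, m = 5: route input port 0 to output port 6 (second egress switch).
for hop in route(4, 5, 0, 6):
    print(hop)
```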

Parameters and Scaling

In the symmetric three-stage Clos network, the primary parameters are n, the number of endpoints attached to each ingress or egress switch, and m, the number of switches in the middle stage. There are n ingress switches and n egress switches, yielding a total of N = n² endpoints or ports. This parameterization assumes full connectivity between stages, with each ingress switch linking to all m middle switches via dedicated links, and similarly for the egress stage.

The total number of crosspoints k follows directly from the switch sizes across stages: the n ingress switches each require n × m crosspoints, the m middle switches each require n × n crosspoints, and the n egress switches each require m × n crosspoints, giving k = 3n²m. Compared to a monolithic crossbar switch needing N² = n⁴ crosspoints, the Clos design offers significant savings for large N. As N scales with increasing n, crosspoint efficiency improves asymptotically; for instance, with m on the order of n to maintain low blocking, k ≈ 3n³, reducing the relative complexity to O(1/n) of the crossbar's n⁴.

A key trade-off arises in selecting m: larger values decrease blocking probability by providing more parallel paths but raise cost through additional crosspoints and hardware. For N = 256 (n = 16), setting m = 17 yields k = 3 × 256 × 17 = 13,056 crosspoints, versus 65,536 for an equivalent crossbar, a reduction by a factor of about 5. In contemporary deployments, the parameter n is adapted to reflect switch radix, the aggregate port count enabling high bandwidth to the middle stage, which supports scalable fabrics using devices with 32-128 ports or more.
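The asymptotic saving is easy to tabulate. This sketch (illustrative, using the rearrangeable choice m = n) compares k = 3n²m against the n⁴ crossbar cost:

```python
# Illustrative sketch: Clos crosspoints (k = 3*n^2*m, here m = n)
# versus a monolithic N x N crossbar (N = n^2, so N^2 = n^4).
for n in (4, 8, 16, 32):
    N = n * n
    clos = 3 * n**2 * n  # m = n middle switches
    crossbar = N**2
    print(f"n={n:2d}  N={N:4d}  clos={clos:8d}  crossbar={crossbar:10d}  "
          f"ratio={crossbar / clos:.1f}")
```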

Nonblocking Conditions

Strict-Sense Nonblocking

Strict-sense nonblocking refers to the property of a Clos network in which a connection can always be established between any idle input and any idle output without disrupting existing connections or requiring rearrangements, irrespective of the current traffic pattern. This ensures the network supports full connectivity under all possible occupancy conditions, making it ideal for deterministic performance guarantees.

In a three-stage Clos network with ingress and egress stages each comprising m switches of size n × r (N = m × n) and r middle-stage switches each of size m × m, the condition for strict-sense nonblocking is r ≥ 2n − 1. This theorem, established by Charles Clos in 1953, minimizes the number of crosspoints while preventing blocking. The minimum value arises from the need to accommodate the worst-case scenario without conflicts.

The proof relies on a counting (pigeonhole) argument applied to middle-stage switch usage. Consider establishing a new connection from an input switch to an output switch; in the adversarial case, n − 1 inputs on the source switch and n − 1 outputs on the destination switch are already connected, potentially occupying up to 2n − 2 distinct middle switches. With r = 2n − 1, at least one middle switch remains available for the new path, avoiding overlap.

This result originated in the context of circuit-switched telephone systems, where Clos aimed to design efficient crossbar alternatives for handling simultaneous voice calls with 100% throughput assurance. The architecture reduced crosspoint requirements compared to single-stage networks, enabling scalable deployment in early electronic switching exchanges. The strict nonblocking condition gives r = 2n − 1 as the minimum number of middle-stage switches in a balanced three-stage Clos network.

Rearrangeably Nonblocking

In a rearrangeably nonblocking Clos network, any permutation of connections between idle inputs and idle outputs can be established, potentially by rearranging the paths of some existing connections, as long as the number of middle-stage switches r satisfies r ≥ n, where n is the number of ports per ingress or egress switch. This property ensures that the network supports full connectivity for any valid request, albeit with possible disruptions to ongoing paths that must be rerouted transparently.

The theoretical basis for rearrangeability in three-stage Clos networks is the Slepian-Duguid theorem, which demonstrates that under the condition r ≥ n, a complete matching exists by applying Hall's marriage theorem to model connectivity between ingress and egress switches, treating middle-stage switches as intermediaries for distinct path assignments. Hall's theorem guarantees a system of distinct representatives for subsets of inputs and outputs, ensuring no subset of ingress switches requires more middle-stage links than available, thus allowing any permutation to be realized after rearrangement. The minimum is thus r = n, as derived from the theorem's application to the network's staged structure.

To implement rearrangements, a centralized controller typically computes new path assignments by iteratively solving bipartite matching problems across the stages, often using algorithms like Hopcroft-Karp to find augmenting paths that resolve conflicts efficiently. For instance, in a Clos network with n = 4 and r = 4, suppose existing connections route inputs from ingress switch 1 to egress switch 2 via middle switch 3, and from ingress switch 2 to egress switch 1 via middle switch 4, blocking a new request from ingress 1 to egress 1 that conflicts on the shared ingress and egress links. The controller can resolve this by swapping the middle-stage assignments, rerouting the first connection via middle switch 4 and the second via middle switch 3, freeing a path for the new connection while preserving all prior endpoints; a link-level sketch of such a rearrangement appears below.

Compared to strict-sense nonblocking Clos networks, which require r ≥ 2n − 1 to avoid any rearrangements and thus provide about twice as many middle-stage switches (and roughly twice the crosspoints in the middle stage), the rearrangeable variant halves this middle-stage complexity at the cost of control overhead for dynamic path recomputation.
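The mechanics are easiest to see at link granularity in the smallest rearrangeable case. The sketch below is illustrative (n = r = 2 with 0-based labels, not the n = 4 example above): a blocked request is admitted after consolidating the existing calls onto one middle switch:

```python
# Illustrative sketch of rearrangement at link granularity. A connection
# (ingress, egress) assigned to middle switch mid occupies the links
# ingress->mid and mid->egress. Here n = r = 2 (rearrangeably nonblocking).
def free_middles(ingress, egress, active, r):
    return [mid for mid in range(r)
            if all(not (i == ingress and a == mid) and
                   not (e == egress and a == mid)
                   for (i, e), a in active.items())]

active = {(0, 1): 0, (1, 0): 1}       # two calls on different middle switches
print(free_middles(0, 0, active, 2))  # [] -- new call (0, 0) is blocked

active[(1, 0)] = 0                    # rearrange: consolidate onto middle 0
print(free_middles(0, 0, active, 2))  # [1] -- middle 1 can carry the new call
```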

Blocking

Probability Approximations

In Clos networks that are underprovisioned (i.e., with fewer middle-stage switches than required for nonblocking operation), exact computation of blocking probabilities is complex due to the combinatorial growth of possible connection states. Approximate methods provide practical estimates under assumptions of random, uniform traffic.

One seminal approach is the Lee approximation, introduced by C. Y. Lee in 1955 for analyzing multistage switching networks. For a three-stage Clos network, the approximation assumes that the m middle switches are independent, with each interstage link occupied with probability p = a/m, where a is the offered load in Erlangs. The probability that a specific path through a middle switch is available is (1 − p)², so the blocking probability P_b for a random connection attempt under uniform offered load ρ (in Erlangs per inlet) is

P_b ≈ [1 − (1 − p)²]^m,

where p ≈ ρ/n for low loads in a symmetric n × n × n Clos network with m middle switches (adjusted for exact link utilization). This captures the probability that all m potential paths are blocked.

A more refined method is the Jacobaeus approximation, from Carl Jacobaeus's 1950 work on congestion in link systems. It accounts for dependencies by considering the number of busy inputs i and outputs j on the relevant ingress and egress switches (0 ≤ i, j ≤ n − 1). The conditional blocking probability is

β_ij = Σ_{k=max(0, i+j−m)}^{i} [C(m, k) C(i, k) C(j, i+j−k)] / C(m, i+j−k),

where C(a, b) denotes the binomial coefficient, though a simplified form is often used based on the probability that more middle switches are required than are available. The overall blocking probability is the expectation over binomial distributions for i and j:

P_B = Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} f_i g_j β_ij,

where f_i = C(n−1, i) λ^i (1−λ)^(n−1−i) with λ = ρ/n, and similarly for g_j. This captures correlations better than Lee's independence assumption.

Both approximations rely on key assumptions: random routing of connection requests, uniform traffic distribution across inlets and outlets, and modeling of switch crosspoints as independent loss systems governed by the Erlang-B formula for fixed capacity, B(k, a) = (a^k / k!) / Σ_{i=0}^{k} (a^i / i!). These methods assume Poisson arrivals and exponential holding times, leading to binomial distributions for path availabilities.

For illustration, consider a Clos network with n = 8, m = 8 (underprovisioned relative to the nonblocking threshold of 2n − 1 = 15) and offered load ρ = 0.8 Erlangs per inlet. Using the Lee approximation with p ≈ 2 × (0.8/8) = 0.2 (approximating both links), P_b ≈ [1 − (1 − 0.2)²]^8 = 0.36^8 ≈ 2.8 × 10⁻⁴, indicating very low blocking for this load.

Despite their historical influence, these approximations have limitations: they overestimate blocking under bursty or non-uniform patterns, as real-world loads violate the independence assumptions, and they ignore routing algorithms beyond random path selection. Modern analyses often favor simulations or exact Markov models for high-precision needs in large-scale networks.

Factors Influencing Blocking

In Clos networks, traffic patterns significantly impact blocking behavior. Uniform traffic, where connections are evenly distributed across inputs and outputs, typically results in lower blocking probabilities than nonuniform patterns such as hot-spot traffic, in which a disproportionate volume concentrates on specific outputs, leading to congestion at middle-stage switches. Bursty traffic, characterized by intermittent high-intensity bursts followed by idle periods, exacerbates blocking even in overprovisioned networks by creating temporary overloads that overwhelm buffering or scheduling mechanisms, reducing overall throughput under real-world workloads. The symmetric structure of Clos topologies can amplify this effect because the multiplicity of identical-length paths synchronizes traffic fluctuations and increases contention at shared links.

Routing algorithms play a crucial role in mitigating blocking by influencing path selection and load distribution. Fixed or deterministic routing, which assigns predefined paths without considering current network state, can lead to higher blocking under nonuniform traffic because it fails to balance loads across available middle-stage links. In contrast, random routing distributes connections probabilistically, offering better average utilization but potentially causing hotspots if randomness aligns poorly with traffic demands. Adaptive routing, which dynamically adjusts paths based on congestion feedback, reduces blocking more effectively by rerouting around overloaded links, achieving near-nonblocking behavior in high-radix folded-Clos topologies even with faults or imbalances. For packet-switched Clos networks, techniques like deflection routing, where packets are rerouted to alternative paths upon encountering congestion, further minimize blocking in bufferless or low-buffer designs, though they are more commonly applied in specialized interconnects than in general-purpose fabrics.

Fault tolerance directly affects effective blocking rates in operational Clos networks. A single switch failure in any stage can elevate blocking by reducing path diversity, potentially degrading the network from rearrangeably nonblocking to partially blocking states, as lost links concentrate traffic on surviving paths. Redundancy strategies, such as deploying extra switches per stage or using multi-path routing protocols, enhance resilience; for instance, adding one redundant module per stage allows the network to tolerate isolated failures without reconfiguration, maintaining low blocking under uniform loads. Engineered designs like F10 demonstrate that proactive path recomputation upon failure can limit loss to under 0.1% for brief outages, trading minimal latency for sustained throughput.

Oversubscription ratios represent a practical trade-off in Clos network deployment, particularly in cost-sensitive data centers. A common 3:1 oversubscription, in which aggregate leaf-to-spine bandwidth is one-third of server-to-leaf capacity, intentionally introduces potential blocking to reduce hardware costs, since full nonblocking would require excessive spine ports. This ratio balances performance and economics, with blocking remaining acceptable under typical workloads below 50% utilization, though it amplifies issues from bursty or hot-spot traffic.

To evaluate these factors without relying on approximations like the Lee or Jacobaeus models, simulation tools employing Monte Carlo methods provide empirical blocking estimates by generating numerous random connection scenarios and computing outcomes directly; a minimal sketch follows below. These approaches are particularly useful for complex traffic patterns or fault scenarios, offering high-fidelity insights into real-world performance without analytical simplifications. In modern Clos fabrics, advanced analyses also incorporate fluid-flow models or machine-learning techniques to predict blocking under bursty AI workloads.
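Here is a minimal Monte Carlo sketch, assuming first-fit routing of randomly offered calls and ignoring per-port capacity limits (both simplifications; parameters are hypothetical):

```python
# Illustrative Monte Carlo sketch: estimate blocking for a symmetric
# three-stage Clos (r ingress/egress switches, n ports each, m middles).
# Random calls occupy interstage links; a new random call is blocked if
# no middle switch has both of its links free.
import random

def estimate_blocking(n, m, r, load, trials=5000, rng=random.Random(1)):
    blocked = 0
    for _ in range(trials):
        up = [[False] * m for _ in range(r)]    # ingress->middle links busy
        down = [[False] * m for _ in range(r)]  # middle->egress links busy
        for _ in range(int(load * r * n)):      # offer background calls
            i, e = rng.randrange(r), rng.randrange(r)
            for mid in range(m):                # first-fit path selection
                if not up[i][mid] and not down[e][mid]:
                    up[i][mid] = down[e][mid] = True
                    break
        i, e = rng.randrange(r), rng.randrange(r)
        if all(up[i][mid] or down[e][mid] for mid in range(m)):
            blocked += 1
    return blocked / trials

print(estimate_blocking(n=8, m=8, r=8, load=0.8))
```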

Advanced Variants

Multi-Stage Extensions

The Clos network generalizes to multi-stage architectures beyond the three-stage base case through a recursive construction, where the middle-stage switches of a lower-stage network are replaced by smaller Clos subnetworks of appropriate size, alternating between smaller and larger switch dimensions across stages. This approach allows for scalable designs with an odd number of stages k = 2l + 1, where l represents the recursion depth, enabling larger port counts while maintaining the potential for nonblocking operation. For instance, a five-stage Clos network is formed by substituting the middle stage of a three-stage Clos with another three-stage Clos subnetwork.

In a symmetric k-stage Clos network with edge switch radix n, the total number of ports N scales as N = n^((k+1)/2) under optimal parameterization for balanced stages, though practical implementations adjust parameters for specific N. The nonblocking condition extends the three-stage case, requiring the number of middle-stage switches m to satisfy m ≥ (k − 1)(n − 1) + 1 for strict-sense nonblocking, ensuring paths can always be established without rearrangement regardless of existing connections. This condition arises from recursive application of the three-stage argument to the bipartite matching graphs at each stage.

A representative example is a five-stage Clos network supporting N = 1024 ports with n = 16, which requires approximately 154,176 crosspoints compared to 193,536 crosspoints for an equivalent three-stage Clos network under similar nonblocking constraints, demonstrating reduced hardware complexity for large-scale systems. The recursive structure also lowers the overall crosspoint density relative to a single-stage crossbar (N² = 1,048,576 crosspoints), though the path diameter increases to five hops from three.

Multi-stage extensions introduce challenges such as heightened control complexity, due to the need for coordinated routing across more levels, and increased latency from longer paths, often mitigated by self-routing algorithms that deterministically select paths based on destination addresses without central control. In optical implementations, wavelength-division multiplexing (WDM) integrates with multi-stage Clos topologies to achieve terabit-scale switching capacities; for example, hybrid electro-optical designs combine electronic edge stages with all-optical WDM middle stages to support aggregate throughputs exceeding 1 Tbps while preserving nonblocking properties.
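The recursion can be expressed as a simple cost model. The sketch below is illustrative only: it uses the rearrangeable choice m = n at every level, so its totals differ from the strict-sense figures quoted above, but it shows how crosspoints fall as stages are added:

```python
# Illustrative sketch: crosspoints of a recursively built, rearrangeable
# (m = n) Clos network with an odd number of stages, replacing each
# middle switch by a smaller Clos subnetwork until crossbars remain.
def clos_crosspoints(N, n, stages):
    if stages == 1 or N <= n:
        return N * N                  # a plain crossbar
    r = N // n                        # ingress/egress switches per side
    edge = 2 * r * (n * n)            # ingress + egress stages, m = n
    middle = n * clos_crosspoints(r, n, stages - 2)  # n subnetworks of size r
    return edge + middle

for stages in (1, 3, 5):
    print(stages, clos_crosspoints(1024, 16, stages))
```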

Beneš Networks

The Beneš network is a rearrangeably nonblocking multistage interconnection network designed to connect 2^n inputs to 2^n outputs using 2×2 switching elements, ensuring that any permutation can be realized through reconfiguration of the switches. Introduced by V. E. Beneš in 1964, it achieves optimality in terms of the number of stages, requiring exactly 2n − 1 stages for n = log₂N, where N is the number of ports, which is the minimal depth for rearrangeable networks of this form. This recursive structure consists of two back-to-back n-stage butterfly networks sharing a central stage, allowing efficient permutation routing via algorithms that decompose the connection pattern into sub-permutations.

In relation to Clos networks, the Beneš network represents a specialized case within the broader family of multistage interconnection networks, particularly as a power-of-two variant of the three-stage Clos architecture in which all crosspoint switches are 2×2 and the middle stage is expanded recursively to achieve rearrangeably nonblocking behavior for permutations. Unlike the general Clos network, which uses larger k×k switches in the middle stage to meet nonblocking conditions (e.g., m ≥ n for rearrangeability), the Beneš design leverages binary switches exclusively, resulting in a more uniform but deeper topology with 2n − 1 stages instead of three. This makes it a subtype of Clos networks tailored for binary permutations, with the recursive construction enabling scalability for large N while maintaining logarithmic depth.

The key advantage of Beneš networks lies in their rearrangeably nonblocking property, where any conflict in an initial connection can be resolved by rearranging existing paths without disrupting the overall permutation, as proven through inductive construction on smaller subnetworks. Routing in Beneš networks typically employs the looping algorithm or its variants, which iteratively set switches in forward and backward passes to avoid cycles and ensure conflict-free paths; for example, in an 8×8 network (n = 3), the central stage handles 4×4 permutations after resolving the input and output butterflies. This efficiency has made Beneš networks influential in optical switching and parallel computing, though they require centralized control for rearrangement, in contrast with self-routing delta networks.

Modern extensions, such as fault-tolerant Beneš variants, enhance reliability by adding redundancy while preserving the core recursive structure, demonstrating improvements of up to 20% in simulations for N = 64 without performance degradation. Overall, Beneš networks provide a foundational model for scalable, permutation-capable interconnects, bridging classical switching principles with contemporary data center and on-chip fabrics.
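The heart of the looping algorithm is the first-level partition: paired inputs and paired outputs must be sent through different halves of the network. A minimal sketch of that step follows (illustrative; the recursion into the two halves is omitted):

```python
# Illustrative sketch: the first level of the looping algorithm for a
# Benes network. Inputs i and i^1 share an input switch; outputs o and
# o^1 share an output switch; paired terminals must use different halves
# (0 = upper subnetwork, 1 = lower). 2-coloring the constraint graph
# always succeeds because its cycles alternate the two pair types and
# therefore have even length.
def benes_partition(perm):
    n = len(perm)
    inv = [0] * n
    for i, o in enumerate(perm):
        inv[o] = i
    half = [None] * n
    for start in range(n):
        if half[start] is not None:
            continue
        half[start] = 0
        stack = [start]
        while stack:
            i = stack.pop()
            for j in (i ^ 1, inv[perm[i] ^ 1]):  # input partner, output partner
                if half[j] is None:
                    half[j] = 1 - half[i]
                    stack.append(j)
    return half  # recurse on each half to set the inner stages

perm = [7, 6, 5, 4, 3, 2, 1, 0]  # an 8x8 reversal permutation
print(benes_partition(perm))
```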

Modern Applications

Telecommunications Switching

Clos networks have played a pivotal role in telecommunications switching since their inception, initially serving as the foundation for circuit-switched systems in electromechanical telephone exchanges. Developed in the mid-1950s for space-division switching, they enabled nonblocking connections for voice paths in large-scale exchanges, ensuring reliable call completion without reconfiguration under full load. This design minimized blocking in high-traffic environments while maintaining dedicated paths for electrical current transfer.

In packet-switched telecommunications, Clos networks transitioned to asynchronous transfer mode (ATM) fabrics during the 1990s, forming the core of high-capacity routers and switches. Widely proposed for scalable fast-packet and ATM implementations, these multistage topologies used nonblocking modules to route cells efficiently, offering multiple paths between inputs and outputs to handle bursty data traffic in core networks. In the two-sided Clos configuration, they ensured m independent paths per connection, reducing contention in broadband ISDN deployments. Evolving further, Clos architectures underpin IP/MPLS routers in modern mobile backhaul, where they facilitate high-throughput aggregation from radio access networks to the core, supporting unified MPLS for low-latency network slicing and scalability.

Optical telecommunications leverage Clos networks in reconfigurable optical add-drop multiplexers (ROADMs) for wavelength routing, enabling dynamic management of dense wavelength-division multiplexing (DWDM) signals across fiber links. Next-generation Clos-based ROADM designs scale to large node degrees with reduced cost and power consumption compared to traditional architectures, providing nonblocking route assignment under practical constraints. These structures integrate multiple optical switching elements, such as wavelength selective switches, to minimize blocking while supporting mega-data-center interconnects and long-haul transport. In high-degree nodes, Clos optical cross-connects (OXCs) optimize functionality by distributing switching across stages, addressing scalability challenges in photonic-layer networks.

Performance in telecommunications Clos networks emphasizes low latency and high throughput, critical for real-time services. Typical implementations achieve latencies under 1 ms due to fixed hop counts in multistage designs, ensuring predictable delays for voice and packet flows. Throughput scales to high aggregates in core routers, enabled by parallel paths and nonblocking properties that sustain high utilization under uniform traffic.

Data Center Fabrics

In modern data centers, Clos networks have been adapted into spine-leaf topologies, forming a two-stage architecture where leaf switches connect directly to servers and endpoints, while spine switches provide full-mesh interconnections between all leaves to ensure nonblocking connectivity. This design supports oversubscription ratios such as 1:1 for fully nonblocking performance or 3:1 to balance cost and capacity, allowing efficient traffic distribution without hotspots. By leveraging commodity Ethernet switches, these fabrics scale horizontally by adding more spines or leaves, enabling support for clusters exceeding 100,000 servers while maintaining consistent low latency across the network.

Hyperscalers like Google and Meta (formerly Facebook) have implemented Clos-based fabrics to handle massive-scale workloads, with Google's Jupiter network employing a multi-stage Clos topology for intra-data center connectivity and Meta's F16 using a folded-Clos design optimized for high-throughput applications. As of 2025, these implementations increasingly incorporate 400G and 800G ports to meet bandwidth demands from cloud-native and AI-driven services, with ports supporting QSFP-DD and OSFP form factors for dense, high-speed uplinks. Software-defined networking (SDN) controllers, such as those from Arista and other vendors, enable traffic engineering and load balancing over these fabrics, facilitating ECMP (Equal-Cost Multi-Path) for even traffic spreading and adaptive path selection.

The primary benefits of Clos-based data center fabrics include flat, predictable latency profiles, typically under 1 ms for east-west traffic, and seamless scaling without requiring proprietary hardware, making them ideal for cloud environments. These topologies also enhance fault tolerance, as traffic can reroute around failed links via multiple paths, ensuring high availability for mission-critical applications. However, challenges arise from the dense deployment of high-speed switches in racks, leading to elevated power consumption and heat generation, particularly in AI-optimized variants tailored for training clusters. For instance, rail-optimized Clos derivatives, which prioritize GPU-to-GPU bandwidth over general-purpose connectivity, demand advanced cooling solutions to manage thermal loads from 800G interconnects in large-scale ML setups. For example, Arista's 7050 series switches, deployed in leaf-spine Clos configurations, deliver nonblocking throughput at up to 51.2 Tbps per system, but require careful power budgeting to mitigate heat in hyperscale environments.
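The oversubscription ratio mentioned above is a simple capacity quotient. A minimal sketch, using hypothetical port counts:

```python
# Illustrative sketch: leaf-switch oversubscription ratio in a
# spine-leaf (folded-Clos) fabric. Port counts are hypothetical.
def oversubscription(server_ports, server_gbps, uplink_ports, uplink_gbps):
    downlink = server_ports * server_gbps  # server-facing capacity
    uplink = uplink_ports * uplink_gbps    # spine-facing capacity
    return downlink / uplink

# 48 x 25G server ports vs 8 x 100G uplinks -> 1200/800 = 1.5:1
print(f"{oversubscription(48, 25, 8, 100):.2f}:1")
```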
