Summit (supercomputer)

from Wikipedia
Summit
Sponsors: United States Department of Energy
Operator: IBM
Architecture: 9,216 POWER9 22-core CPUs; 27,648 Nvidia Tesla V100 GPUs[1]
Power: 13 MW[2]
Operating system: Red Hat Enterprise Linux (RHEL)[3][4]
Storage: 250 PB
Speed: 200 petaFLOPS (peak)
Ranking: TOP500: 7 (1H2024)
Purpose: Scientific research
Website: www.olcf.ornl.gov/olcf-resources/compute-systems/summit/

[Photos: Summit components; POWER9 wafer with TOP500 certificates for Summit and Sierra]

Summit, or OLCF-4, was a supercomputer developed by IBM for use at the Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory in the United States. It held the number 1 position on the TOP500 list from June 2018 to June 2020.[5][6] As of June 2024, its LINPACK benchmark stood at 148.6 petaFLOPS.[7] Summit was decommissioned on November 15, 2024.[8]

As of November 2019, the supercomputer ranked as the fifth most energy-efficient in the world, with a measured power efficiency of 14.668 gigaFLOPS/watt.[9] Summit was the first supercomputer to reach exaflop (a quintillion operations per second) speed on a non-standard metric, achieving 1.88 exaflops during a genomic analysis, and was expected to reach 3.3 exaflops using mixed-precision calculations.[10]

History

The United States Department of Energy awarded a $325 million contract in November 2014 to IBM, Nvidia and Mellanox. The effort resulted in the construction of Summit and Sierra. Summit was tasked with civilian scientific research and located at the Oak Ridge National Laboratory in Tennessee. Sierra was designed for nuclear weapons simulations and is located at the Lawrence Livermore National Laboratory in California.[11]

Summit was estimated to cover 5,600 square feet (520 m2)[12] and require 219 kilometres (136 mi) of cabling,[13] and was designed to be used for research in diverse fields such as cosmology, medicine, and climatology.[14]

In 2015, the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) program added a third supercomputer, Aurora, planned for installation at Argonne National Laboratory.[15] By 2018, Aurora had been re-engineered as an exascale computing project with completion anticipated in 2021, with Frontier and El Capitan to follow shortly thereafter.[16] Aurora was completed in late 2022.[17]

Uses

The Summit supercomputer was built for research in energy, artificial intelligence, human health, and other areas.[18] It was used in earthquake simulation, extreme weather simulation, materials science, genomics, and predicting the lifetime of neutrinos.[19]

Design

Each of its 4,608 nodes consists of two IBM POWER9 CPUs and six Nvidia Tesla V100 GPUs,[20] with over 600 GB of coherent memory (96 GB HBM2 plus 512 GB DDR4) addressable by all CPUs and GPUs, plus 800 GB of non-volatile RAM that can be used as a burst buffer or as extended memory.[21] The POWER9 CPUs and Nvidia Volta GPUs are connected using Nvidia's high-speed NVLink, which allows a heterogeneous computing model.[22]
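
As a quick, illustrative cross-check of the figures above (a sketch using only the numbers quoted in this article, not an authoritative specification):

```python
# Illustrative cross-check of Summit's per-node memory figures.
GPUS_PER_NODE = 6
HBM2_PER_GPU_GB = 16      # each Tesla V100 carries 16 GB of HBM2
DDR4_PER_NODE_GB = 512    # DDR4 system memory per node
NODES = 4_608

hbm2_per_node_gb = GPUS_PER_NODE * HBM2_PER_GPU_GB          # 96 GB
coherent_per_node_gb = hbm2_per_node_gb + DDR4_PER_NODE_GB  # 608 GB -> "over 600 GB"

print(f"HBM2 per node:            {hbm2_per_node_gb} GB")
print(f"Coherent memory per node: {coherent_per_node_gb} GB")
print(f"System-wide DDR4:         {NODES * DDR4_PER_NODE_GB / 1e6:.2f} PB")
```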

To provide a high rate of data throughput, the nodes are connected in a non-blocking fat-tree topology using a dual-rail Mellanox EDR InfiniBand interconnect for both storage and inter-process communications traffic, which delivers both 200 Gbit/s bandwidth between nodes and in-network computing acceleration for communications frameworks such as MPI and SHMEM/PGAS.

The storage for Summit[23] has a fast in-system layer and a center-wide parallel file system layer. The in-system layer is optimized for speed, with SSDs on each node, while the center-wide parallel file system provides easy access to data stored on hard drives. The two layers work together seamlessly, so users do not have to differentiate their storage needs. The center-wide parallel file system is GPFS (IBM Storage Scale) and provides 250 PB of storage. The cluster delivers 2.5 TB/s of peak single-stream read throughput and 1 TB/s of throughput on 1 MB files. Summit was also one of the first supercomputers to require extremely fast metadata performance to support AI/ML workloads, exemplified by the 2.6 million 32 KB file creates per second it delivers.
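
A short back-of-envelope sketch of what those storage figures imply (assumes the quoted peak rates hold; illustrative only):

```python
# Back-of-envelope arithmetic on the storage figures quoted above.
CAPACITY_PB = 250
PEAK_READ_TB_S = 2.5        # peak sequential read throughput, TB/s
CREATES_PER_S = 2.6e6       # 32 KB file creates per second
FILE_SIZE_BYTES = 32e3

hours_to_read_everything = CAPACITY_PB * 1_000 / PEAK_READ_TB_S / 3_600
metadata_bandwidth_tb_s = CREATES_PER_S * FILE_SIZE_BYTES / 1e12

print(f"Reading all 250 PB at peak: {hours_to_read_everything:.1f} hours")  # ~27.8 h
print(f"Implied small-file ingest:  {metadata_bandwidth_tb_s:.3f} TB/s")    # ~0.083 TB/s
```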

from Grokipedia
Summit is a high-performance computing system developed by IBM and deployed at the Oak Ridge Leadership Computing Facility of Oak Ridge National Laboratory in 2018, featuring 4,608 compute nodes each with two 22-core IBM POWER9 processors and six NVIDIA Tesla V100 GPUs.[1][2] The system delivers a peak theoretical performance of 200 petaFLOPS, enabling advanced data analytics and artificial intelligence workloads alongside traditional simulations.[3][4] Summit achieved the top ranking on the TOP500 list of the world's most powerful supercomputers in June 2018 with an initial High-Performance Linpack benchmark of 122.3 petaFLOPS, later optimized to 148.6 petaFLOPS, maintaining the position through multiple semiannual rankings until surpassed by Japan's Fugaku in June 2020.[3][5] Its architecture emphasized GPU acceleration and high memory bandwidth, with over 10 petabytes of total system memory, supporting large-scale data movement critical for machine learning and scientific discovery.[2][6] The supercomputer has driven notable advancements in fields such as protein structure prediction, climate modeling, and COVID-19 research, processing vast datasets to accelerate empirical insights in biology and physics.[7][8] Extended operations beyond its initial lifecycle have yielded continued scientific outputs, underscoring its role as a bridge to exascale computing.[9]

Development and Procurement

Origins and Funding

The U.S. Department of Energy (DOE) initiated planning for next-generation supercomputers in the early 2010s to sustain American leadership in high-performance computing amid intensifying international rivalry, particularly after China's Tianhe-2 claimed the top position on the TOP500 list in November 2013. At Oak Ridge National Laboratory (ORNL), the existing Titan system, operational since 2012, required replacement to address escalating computational demands in scientific simulation and data analysis. In early 2014, DOE formalized the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) program to coordinate procurement across its national laboratories, optimize resource allocation, and accelerate development of pre-exascale systems capable of supporting advanced research in energy, materials, and national security.[10]

On November 14, 2014, DOE awarded IBM a $325 million contract—shared between Summit for ORNL and a companion system, Sierra, for Lawrence Livermore National Laboratory—to design and build machines incorporating hybrid architectures as stepping stones toward exascale computing.[11][12] This selection followed a competitive request for proposals under CORAL, prioritizing vendors able to deliver scalable performance exceeding 100 petaflops while integrating emerging technologies for broader applicability in DOE missions.[13]

Funding for Summit derived primarily from DOE's Office of Science budget, appropriated through congressional allocations of federal taxpayer dollars to maintain U.S. computational infrastructure for open science and classified applications.[14] The investment reflected strategic priorities to counter foreign advances in supercomputing, which had implications for technological edge in fields like nuclear simulation and climate modeling, without relying on classified export-restricted hardware.[15]

IBM Partnership and Construction

IBM led the design and construction of Summit in collaboration with NVIDIA for GPU acceleration and Mellanox Technologies for networking, as announced by Oak Ridge National Laboratory on November 14, 2014. The system utilized IBM's Power System AC922 architecture, with each of the 4,608 compute nodes equipped with two 22-core POWER9 processors and six NVIDIA Tesla V100 GPUs, interconnected intra-node via NVIDIA's NVLink 2.0 at 50 GB/s between CPUs and GPUs.[16][1][2]

Assembly of the nodes occurred primarily at IBM facilities, with initial deliveries enabling installation phases at the Oak Ridge Leadership Computing Facility to commence in August 2017. The inter-node fabric employed a non-blocking Mellanox EDR InfiniBand fat-tree topology, providing two 100 Gb/s rails per node to support massive parallelism without bottlenecks.[17][2][18]

Key engineering milestones included validating hybrid CPU-GPU scaling during phased rollouts, addressing the integration demands of over 27,000 GPUs and 9,000 CPUs through rigorous testing of NVLink coherence and InfiniBand latency under full load. The process achieved initial operational readiness by June 8, 2018, marking the transition from construction to acceptance verification by ORNL and IBM teams.[1][19]

Technical Design

Hardware Architecture

Summit employs a hybrid CPU-GPU architecture comprising 4,608 compute nodes interconnected via a dual-rail Mellanox EDR InfiniBand network operating at 100 Gb/s per rail (200 Gb/s per node).[2][20] Each node integrates two IBM POWER9 processors and six NVIDIA Tesla V100 GPUs, enabling high-performance computing workloads through tight integration of CPU and accelerator resources.[2][21]

The POWER9 CPUs in each node feature 22 cores clocked at up to 3.07 GHz, providing 44 CPU cores total per node with simultaneous multithreading (SMT) of up to four hardware threads per core.[19] These processors are optimized for data-intensive tasks, with each node allocating 512 GB of DDR4 memory accessible coherently across CPUs and GPUs.[2]

The NVIDIA V100 GPUs, each equipped with 5,120 CUDA cores, 640 Tensor cores, and 16 GB of HBM2 memory with up to 900 GB/s of memory bandwidth, connect to the POWER9 CPUs via NVLink 2.0 interconnects for low-latency, coherent data transfer.[2][19] This configuration yields a theoretical peak performance of 200 petaFLOPS in double-precision (FP64) arithmetic across the system.[22]
Component                 | Specification per Node                         | System Total
CPUs                      | 2 × IBM POWER9 (22 cores each, up to 3.07 GHz) | 9,216 CPUs (202,752 cores)
GPUs                      | 6 × NVIDIA Tesla V100 (5,120 CUDA cores each)  | 27,648 GPUs
Memory                    | 512 GB DDR4 + 96 GB HBM2 (coherent)            | ~2.8 PB aggregate
Interconnect (intra-node) | NVLink 2.0                                     | N/A
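
Dividing the quoted 200 petaFLOPS peak across the GPU count gives a per-GPU figure consistent with the V100's nominal FP64 peak of roughly 7 to 7.8 teraFLOPS (an illustrative sanity check that ignores the CPUs' small contribution):

```python
# Sanity check: FP64 throughput per GPU implied by the system peak.
PEAK_PFLOPS = 200
GPUS = 27_648
NODES = 4_608

print(f"Implied FP64 per GPU:  {PEAK_PFLOPS * 1e3 / GPUS:.1f} TFLOPS")   # ~7.2
print(f"Implied peak per node: {PEAK_PFLOPS * 1e3 / NODES:.1f} TFLOPS")  # ~43.4
```
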
Summit's storage subsystem utilizes the IBM Spectrum Scale parallel file system, known as Alpine, with a capacity exceeding 250 PB and sustained bandwidth of 2.5 TB/s to support large-scale data movement for compute nodes.[2][23] This hardware foundation emphasizes GPU acceleration while maintaining balanced CPU capabilities for hybrid workloads.[3]

Software Stack and Programming Models

Summit employs Red Hat Enterprise Linux (RHEL 7) tailored for the ppc64le architecture, serving as the foundational operating system to ensure compatibility with IBM POWER9 processors and NVIDIA GPUs in a high-performance computing environment.[24][25] Resource management and job scheduling rely on IBM Spectrum Load Sharing Facility (LSF), which processes batch submissions and allocates compute nodes, with the jsrun launcher handling parallel task distribution across CPUs and GPUs.[24][25]

The system supports core programming paradigms for scalable parallelism, including Message Passing Interface (MPI) via IBM Spectrum MPI for distributed-memory communication across nodes and OpenMP for shared-memory threading within nodes.[24] GPU-accelerated workloads leverage CUDA as the primary model for NVIDIA V100 utilization, supplemented by directive-based approaches like OpenACC and OpenMP offload directives, as well as the Kokkos library for performance portability across heterogeneous architectures.[24][2]

Available compilers include IBM XL (default version 16.1.1-7), GCC (6.4.0), and PGI (19.9), with environment modules via Lmod managing dependencies such as CUDA 10.1.243, HDF5 1.10.4 for data handling, and PAPI 5.7.0 for performance monitoring.[24][2] This modular stack, including the NVIDIA CUDA toolkit, enables hybrid CPU-GPU models that optimize data locality and computation overlap, supporting efficient execution of compute-intensive tasks without vendor lock-in where portability tools are applied.[2]
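
As a hedged sketch of the one-rank-per-GPU pattern this stack encourages (the jsrun flags and the mpi4py/CuPy usage here are illustrative assumptions, not commands taken from OLCF documentation):

```python
# Sketch: mapping MPI ranks to GPUs, as commonly done on Summit-class systems.
# Hypothetical launch (flags illustrative): jsrun -n 6 -a 1 -g 1 python rank_per_gpu.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

GPUS_PER_NODE = 6                  # six V100s per Summit node
local_gpu = rank % GPUS_PER_NODE   # naive mapping; production codes query the launcher

# A CUDA-aware stack would now bind the device, e.g. (assumption, requires CuPy):
#   import cupy; cupy.cuda.Device(local_gpu).use()
print(f"rank {rank} of {comm.Get_size()} -> GPU {local_gpu}")
```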

Power Consumption and Cooling Systems

The Summit supercomputer has a peak power consumption rating of 13 megawatts (MW), sufficient to power approximately 10,000 average U.S. households, though measured operational draw has been recorded at around 10.1 MW under load.[2][26] This high demand stems from its dense node architecture, featuring 4,608 compute nodes each equipped with two IBM POWER9 CPUs and six NVIDIA V100 GPUs, which generate substantial thermal output during intensive workloads.[22]

To manage heat dissipation, Summit employs a warm-water liquid cooling system integrated into the IBM AC922 node design, circulating over 4,000 gallons of water per minute to remove up to 13 MW of heat.[27] This approach, which operates with inlet water temperatures around 70°F (21°C) and relies primarily on cooling towers rather than energy-intensive chillers, marked a shift from the air-cooled predecessor Titan and necessitated upgrades to the Oak Ridge Leadership Computing Facility (OLCF) infrastructure, including enhanced plumbing and heat rejection capacity.[28][29] The liquid cooling directly addresses the thermal challenges of GPU-dense packing, enabling sustained high-performance operation by maintaining component temperatures below critical thresholds without relying on less efficient air-based methods.[30]

In terms of efficiency, Summit achieved a measured power efficiency of 14.668 gigaflops per watt on the Green500 list, outperforming prior systems like Titan (which scored around 2.1 gigaflops per watt) through advancements in GPU architecture and integrated cooling that reduce energy overhead for heat management.[22] This metric reflects the causal trade-off in supercomputing design: while absolute power draw scales with performance ambitions—here enabling over 200 petaflops peak—the per-watt gains from POWER9 processors and Volta GPUs, combined with direct liquid cooling, mitigate some inefficiencies inherent to scaling compute density.[31][23]
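
The quoted power and efficiency numbers are mutually consistent, as a short illustrative check shows (the average-household figure assumes roughly 1.2 kW of continuous draw, an assumption not taken from the source):

```python
# Cross-check of Summit's Green500 efficiency against its LINPACK figures.
RMAX_GFLOPS = 148.6e6           # 148.6 petaFLOPS expressed in gigaFLOPS
EFFICIENCY_GF_PER_W = 14.668    # measured Green500 efficiency

implied_power_mw = RMAX_GFLOPS / EFFICIENCY_GF_PER_W / 1e6
print(f"Implied HPL power draw: {implied_power_mw:.1f} MW")  # ~10.1 MW, as quoted

HOUSEHOLD_KW = 1.2              # assumed average U.S. household draw
print(f"13 MW peak ~= {13_000 / HOUSEHOLD_KW:,.0f} households")  # ~10,800
```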

Deployment and Performance

Initial Deployment and Testing

The Summit supercomputer underwent final installation phases at Oak Ridge National Laboratory (ORNL) in early 2018, following initial construction that began in August 2017, marking the rollout of its 4,608 compute nodes equipped with IBM POWER9 CPUs and NVIDIA V100 GPUs.[17][32] Initial testing prioritized system stability, scalability across the full node count, and integration of the novel hardware architecture with the software stack, including evaluations of interconnect reliability and high-bandwidth memory access.[33] These phases incorporated both synthetic benchmarks and early application runs to identify and resolve potential bottlenecks in data movement and parallel execution.[19]

The U.S. Department of Energy (DOE) conducted rigorous acceptance testing to confirm adherence to design specifications, encompassing High-Performance Linpack (HPL) benchmarks and custom tests for reliability under sustained loads, culminating in formal system acceptance and public launch on June 8, 2018.[14][34] This validation ensured the system's peak performance exceeded 200 petaflops for traditional workloads while demonstrating enhanced capabilities for data-intensive tasks, with no major deviations from projected efficiency metrics during the process.[32]

As the successor to the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), early operations focused on transitioning users through porting applications to leverage Summit's six GPUs per node and NVLink interconnects, including code optimization workshops and readiness programs initiated in 2018.[2] Limited early access was granted to select projects in late 2018 for validation and adaptation, enabling OLCF researchers to migrate workflows from Titan's CPU-GPU hybrid design to Summit's accelerated paradigm while maintaining continuity in the center-wide shared file systems.[35][36] This preparatory phase preceded broader allocations via the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program in January 2019.[2]

TOP500 Rankings and Benchmarks

Summit first appeared on the TOP500 list in June 2018, achieving the number one position with an Rmax performance of 122.3 petaflops on the LINPACK benchmark, surpassing China's Sunway TaihuLight at 93 petaflops.[3] It retained the top ranking through the November 2019 list, before being displaced by Japan's Fugaku supercomputer in June 2020, which recorded 442 petaflops.[37] By June 2024, Summit had fallen to the ninth position on the TOP500 list, with an updated Rmax of 148.6 petaflops, amid the rise of exascale systems like Frontier at 1.194 exaflops and Aurora at 1.012 exaflops.[38][39] This decline reflects the rapid advancement in high-performance computing, particularly with AMD-based architectures in newer U.S. systems outperforming Summit's IBM POWER9 CPU and NVIDIA V100 GPU hybrid design on traditional double-precision LINPACK tasks.[40]

In addition to standard TOP500 metrics, Summit demonstrated superior capabilities in mixed-precision benchmarks suited for AI workloads. On the HPL-AI benchmark introduced in 2019, it achieved 445 petaflops by leveraging lower-precision arithmetic (FP16 and FP32), tripling its effective performance over double-precision LINPACK and highlighting the efficiency of its GPU-heavy architecture for hybrid CPU-GPU computations in machine learning applications.[41][42] This result underscored Summit's strengths in tasks requiring tensor operations, though subsequent systems like Frontier have since surpassed it in both traditional and AI-oriented benchmarks.[43]
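
The "tripling" claim follows directly from the two quoted benchmark figures (illustrative arithmetic):

```python
# Mixed-precision (HPL-AI) speedup over double-precision LINPACK.
HPL_AI_PFLOPS = 445
HPL_FP64_PFLOPS = 148.6
print(f"Speedup: {HPL_AI_PFLOPS / HPL_FP64_PFLOPS:.2f}x")  # ~2.99x
```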

Applications and Scientific Uses

Core Research Domains

Summit's allocations for open science were primarily directed toward computationally intensive projects addressing Department of Energy (DOE) priorities in energy security, materials innovation, environmental modeling, and biological systems. Through the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program, researchers received large-scale access to tackle grand challenges, including high-fidelity simulations of fusion energy plasmas, advanced materials design for batteries and catalysts, and accelerated drug discovery pipelines via AI-enhanced molecular dynamics.[14][44][45]

Key domains encompassed climate and weather modeling for extreme event prediction, materials science for sustainable energy technologies, and astrophysics for cosmic structure simulations, with INCITE awards distributing millions of node-hours annually to projects in these areas. For instance, efforts focused on kilometer-scale Earth system models to improve climate projections and nuclear physics simulations supporting fusion reactor viability. Astrophysics allocations supported computations of atomic nuclei and neutrino lifetimes, aligning with DOE's emphasis on fundamental physics underpinning energy applications.[44][45][46]

Access was managed via peer-reviewed INCITE proposals, open to U.S. and international researchers but evaluated for alignment with DOE missions, resulting in a predominance of U.S.-led teams from national labs, universities, and industry partners. Limited international collaboration reflected national security considerations for DOE-funded facilities, though collaborative elements appeared in joint projects with allied institutions. This policy ensured prioritized allocation to domestic grand challenges while maintaining open science principles.[47][2][48]

Notable Projects and Breakthroughs

In 2020, Summit facilitated rapid virtual screening for COVID-19 therapeutics by simulating molecular dynamics of SARS-CoV-2 proteins, including docking over 8,000 existing drug compounds against the virus's main protease to identify potential inhibitors.[49] This effort, conducted by Oak Ridge National Laboratory researchers using tools like NAMD for spike protein simulations, accelerated the identification of binding candidates within days, contributing to early pandemic response pipelines.[50] The platform's capabilities also supported a supercomputer-driven ensemble docking pipeline with enhanced sampling molecular dynamics, screening billions of compounds via collaborations involving Summit and other resources.[51][52] These simulations earned a 2020 ACM Gordon Bell Special Prize for high-performance computing-based COVID-19 research, validating Summit's role in data-intensive drug discovery.[53]

In fusion energy research, Summit enabled Type One Energy to optimize stellarator designs in 2025, producing a refined concept for a practical fusion power plant through high-fidelity plasma and coil simulations.[54] The work focused on modular high-field stellarators, leveraging Summit's computational power to iterate on magnetic confinement configurations that enhance plasma stability and energy output, marking a step toward commercially viable steady-state fusion.[55] This optimization built on prior physics baselines but incorporated Summit's scale to test integrated plant-level parameters, demonstrating feasibility for pilot-scale devices.[56]

Summit contributed to debates surrounding Google's 2019 quantum supremacy claim with its Sycamore processor by performing classical tensor network simulations of large-scale quantum circuits, achieving milestones in emulating quantum tasks that underscored classical supercomputing's competitive edge for certain random circuit sampling problems.[57] These benchmarks, run in parallel to Google's reported 200-second quantum runtime versus an estimated 10,000-year classical equivalent, fueled skepticism; subsequent analyses showed optimized classical methods, including those scalable on systems like Summit, could replicate or approximate the task in hours or days, questioning the supremacy threshold.[58][59] While not directly refuting Google's hardware demonstration, Summit's results highlighted limitations in supremacy definitions, emphasizing verifiable classical alternatives over unsubstantiated exponential gaps.[60]

Impact and Achievements

Scientific Advancements

The Summit supercomputer facilitated breakthroughs in understanding DNA repair mechanisms by enabling high-fidelity molecular simulations of nucleotide excision repair (NER), a pathway that surgically excises damaged DNA strands, revealing atomic-level details of protein-DNA interactions previously intractable on smaller systems.[61] These simulations, published in 2025, identified key conformational changes in repair proteins, providing causal insights into how NER prevents mutagenesis, with implications for therapies targeting NER-deficient cancers like xeroderma pigmentosum variants.[61]

In molecular dynamics, Summit achieved the first large-scale simulations capturing the dominant aqueous conformation of cyclosporine A, an immunosuppressive drug, allowing researchers to model its binding dynamics at resolutions that reduced computational timelines from weeks to hours for systems exceeding millions of atoms.[9] Similarly, Deep Potential Molecular Dynamics (DPMD) on Summit produced record-scale quantum-accurate simulations of biomolecular systems, enabling predictions of protein folding pathways that aligned with experimental data and accelerated virtual screening for drug candidates.[62] These efforts supported peer-reviewed validations of ensemble docking pipelines for inhibitor discovery, quantifying improvements in binding affinity predictions through enhanced sampling of conformational ensembles.[51]

Petascale simulations on Summit elucidated causal mechanisms in turbulent flows, including the first detailed modeling of oceanic turbulence under realistic geophysical conditions, which quantified heat dispersion rates in seawater with resolutions capturing sub-kilometer eddies and reducing simulation times for global-scale models from months to days.[63] In fusion research, turbulence-resolved simulations at unprecedented scales optimized stellarator coil geometries, revealing instability mitigation strategies that improved plasma confinement efficiency by factors validated against experimental tokamak data.[54] For atmospheric science, Summit enabled the highest-resolution global weather simulations to date, modeling the entire Earth's atmosphere at 3-kilometer grids and incorporating machine learning to identify emergent extreme weather patterns, such as convective storm triggers, with accuracy improvements over prior petaflop-era models confirmed via benchmark comparisons to observational datasets.[64] These computations, leveraging GPU-accelerated Weather Research and Forecasting (WRF) models, demonstrated scalable performance that halved forecast iteration times for ensemble predictions, fostering peer-reviewed advancements in sub-seasonal predictability.[65]

Geopolitical and National Security Implications

The deployment of Summit in June 2018 enabled the United States to surpass China's Sunway TaihuLight and reclaim the top position on the TOP500 list of the world's fastest supercomputers, ending a five-year period of Chinese dominance that began in 2013.[66][67] This milestone was presented by Department of Energy officials as a strategic assertion of U.S. leadership in high-performance computing amid intensifying technological rivalry with China, where state-driven investments had propelled rapid advances in systems like Sunway.[68][69]

Summit's capabilities underscored the dual-use nature of advanced computing, with the Department of Energy emphasizing its potential to support national security through simulations essential for defense-related research, including those countering adversarial developments in weapons design and materials science.[1] U.S. national laboratories, including Oak Ridge where Summit operated, have historically leveraged such systems for stockpile stewardship and other classified simulations that maintain nuclear deterrence without physical testing, thereby enhancing strategic stability against peer competitors.[3] This aligns with broader U.S. policy efforts to restrict technology transfers, as evidenced by export controls on high-performance computing hardware imposed since 2015 to prevent Chinese military entities from acquiring systems capable of similar dual-use applications.[70]

Critics of U.S. HPC strategy, however, have pointed to Summit's $200 million federal investment as indicative of over-dependence on government funding, contrasting with private sector priorities focused on profitable cloud and AI deployments rather than raw exascale performance for national needs.[71] This reliance risks eroding competitive edges if public-private synergies falter, particularly as China's domestically developed systems evade some restrictions and close performance gaps in specialized benchmarks.[71]

Criticisms and Challenges

Economic and Efficiency Critiques

The Summit supercomputer was procured and deployed at a capital cost of approximately $200 million as part of the U.S. Department of Energy's CORAL program, involving collaboration between IBM, NVIDIA, and Oak Ridge National Laboratory.[32][72] Ongoing operational expenses for leadership-class systems like Summit, encompassing maintenance, staffing, and infrastructure, are estimated to exceed $10 million annually based on benchmarks from comparable DOE facilities, though exact figures for Summit remain classified or unpublished in public budgets.[73][74]

Economic critiques highlight potential suboptimal returns on this investment when weighed against alternatives such as distributed cloud computing, where pay-as-you-go models avoid massive upfront outlays and enable elastic scaling for workloads not requiring ultra-low-latency interconnects.[75][76] Supercomputers like Summit excel in tightly coupled simulations demanding rapid data sharing across thousands of nodes, but for loosely coupled or embarrassingly parallel tasks, commercial clouds can deliver similar throughput at fractional hardware ownership costs, raising questions about the fiscal justification for bespoke national assets over commoditized infrastructure.[75]

Efficiency analyses reveal that Summit's headline 200 petaFLOPS peak performance masks practical underutilization in heterogeneous workloads, stemming from the programming challenges of its POWER9 CPU-NVIDIA V100 GPU architecture, which demands specialized optimization to avoid idle components.[77] For example, quantum circuit simulations on Summit have demonstrated CPU underutilization when GPU-centric codes fail to balance hybrid execution, effectively stranding processing capacity and diminishing performance-per-dollar metrics.[77] Such complexities contribute to debates over whether the marginal gains in raw FLOPS justify the specialized development effort compared to more accessible, general-purpose distributed systems.

Federal allocation of funds to mega-scale supercomputers like Summit incurs opportunity costs, potentially diverting resources from distributed grants for basic research or subsidies fostering private-sector HPC adoption, where broader economic multipliers could amplify innovation diffusion beyond siloed national labs.[78] While proponents cite intangible national security benefits, skeptics in policy analyses contend that equivalent budgets applied to ecosystem-wide incentives—such as cloud credits for academia—might yield higher aggregate ROI by accelerating commercialization and reducing taxpayer exposure to depreciating hardware.[79][78]

Environmental and Operational Drawbacks

The Summit supercomputer consumes roughly 13 megawatts of power at peak (some reports cite figures as high as 15 MW), equivalent to the electricity demand of 9,000 to 18,000 households, depending on usage patterns and time of day.[80] This substantial draw, while enabling petascale simulations, contributes to a significant carbon footprint, as high-performance computing systems like Summit rely on grid electricity that often includes fossil fuel sources, amplifying emissions in proportion to operational hours.[81]

Although Summit achieved a power efficiency of 14.668 gigaFLOPS per watt—ranking fifth globally on the Green500 list in November 2019—its absolute energy use exceeds that of its predecessor Titan by about 44%, rising from 9 MW to 13 MW, despite delivering roughly eight times the computational performance.[2][82] This reflects inherent trade-offs in high-performance computing, where scaling compute capability demands commensurate energy inputs, bounded by thermodynamic limits rather than offset by relative efficiency gains alone.[83]

Operationally, Summit faced recurrent hardware reliability issues, including node failures that terminated running jobs and required interventions to prevent widespread disruptions.[19] In 2019, specific node malfunctions affected user codes, prompting root-cause analyses with IBM to identify and mitigate underlying defects.[84] GPU-related problems were prevalent, with double-bit errors captured via NVIDIA XID records and firmware logs, alongside simultaneous multi-GPU failures that complicated recovery.[85] Software failures dominated overall downtime, proving difficult to reproduce and often necessitating manual reboots or reallocations, which extended maintenance cycles.[86]

These challenges contributed to an operational lifecycle prolonged beyond initial plans through rigorous upkeep, with Summit active until its decommissioning on November 15, 2024, rather than an earlier retirement.[87][88] Such reliability demands underscore that extreme performance in dense, GPU-heavy architectures inherently increases failure rates and sustainment burdens, without viable shortcuts that evade physical scaling laws.
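
The Titan comparison above can be checked against the quoted figures (Titan's roughly 27 petaFLOPS peak is an assumption drawn from its public TOP500 entry, not from this article):

```python
# Checking the Titan-to-Summit power and performance ratios quoted above.
TITAN_MW, SUMMIT_MW = 9, 13
TITAN_PEAK_PF, SUMMIT_PEAK_PF = 27, 200   # Titan peak is an assumed figure

print(f"Power increase:  {(SUMMIT_MW / TITAN_MW - 1) * 100:.0f}%")  # ~44%
print(f"Peak perf ratio: {SUMMIT_PEAK_PF / TITAN_PEAK_PF:.1f}x")    # ~7.4x ("roughly eight times")
```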

Specific Controversies

In October 2019, Google's claim of achieving "quantum supremacy" with its Sycamore processor sparked debate when the company asserted that simulating the quantum circuit on Summit would require approximately 10,000 years, positioning the quantum task as intractable for classical supercomputers. IBM, a key collaborator in Summit's development, rebutted this by demonstrating that optimized classical algorithms could simulate the same random quantum circuit on Summit in about 2.5 days by leveraging the system's 250 PB file system as secondary storage for the quantum state, challenging the supremacy threshold and highlighting potential overestimations in Google's classical simulation benchmarks.[89][90] This exchange fueled broader skepticism regarding the practical advantages of near-term quantum devices, as subsequent analyses, including those replicating Sycamore's output on classical hardware, suggested that specialized optimizations could erode claimed quantum speedups without invoking full error-corrected quantum capabilities.[58]

Summit's ascent to the top of the TOP500 list in June 2018, displacing China's Sunway TaihuLight after five years of Chinese dominance, intensified perceptions of supercomputing as a proxy in U.S.-China technological rivalry, with U.S. officials framing it as a national achievement amid escalating trade tensions. Critics, including Chinese state media and analysts, contended that TOP500 rankings—based on the High-Performance LINPACK benchmark—politicize comparisons by favoring general-purpose GPU architectures like Summit's NVIDIA V100s over China's domestically engineered many-core processors, which excel in benchmark-specific tasks but face export restrictions on advanced components, potentially understating China's indigenous progress in scalable, specialized computing.[66][91] Such debates underscore accusations of benchmark nationalism, where architectural trade-offs for versatility versus peak flops are overlooked in favor of geopolitical scoring, though proponents argue LINPACK's standardization ensures verifiable, apples-to-apples evaluations across diverse systems.[92]

Access to Summit, primarily allocated through the U.S. Department of Energy's competitive peer-review process for Office of Science missions in areas like climate modeling and materials science, has drawn criticism for prioritizing government-directed research over immediate commercialization or broader academic-industry collaboration, despite its $200 million taxpayer-funded construction. Advocates for expanded access argue that restricting cycles largely to DOE users—over 90% for scientific allocations—limits spillover to private-sector innovation, such as drug discovery or AI training, amid calls from industry groups for hybrid public-private models to accelerate economic returns; however, DOE officials maintain that mission-aligned safeguards prevent misuse while enabling targeted breakthroughs with national security implications.[45][3] This tension reflects ongoing discussions in congressional hearings on balancing exascale investments' strategic focus against demands for democratized compute resources to counter global competitors.[93]

Retirement and Legacy

Decommissioning Process

In March 2024, the Oak Ridge Leadership Computing Facility (OLCF) announced an extension of Summit's operational life through a program called SummitPLUS, enabling the system to support over 100 additional research projects into late 2024 before final shutdown.[87] This extension followed an initial plan outlined in September 2023 for allocations from January to October 2024, reflecting Summit's continued utility despite its aging architecture.[94] The decision prioritized computationally ready workloads to maximize productivity during the wind-down phase.[95]

Summit's decommissioning was scheduled for November 15, 2024, after approximately six years of production service since its deployment in 2018, during which it delivered over 200 million node hours to global researchers.[88] The primary drivers included technological obsolescence relative to successor systems like Frontier, which achieved exascale performance nearly an order of magnitude beyond Summit's peak capabilities, and facility constraints at Oak Ridge National Laboratory requiring space reallocation for expansions and upgrades.[96] Operational logistics involved a phased transition, including a virtual farewell event to acknowledge contributions while the system ran at full capacity until the cutoff.[97]

Prior to shutdown, user data and workflows underwent archival to OLCF storage systems such as the High Performance Storage System (HPSS), with active migration of projects to Frontier for continued execution on ORNL's exascale infrastructure.[88] This process ensured minimal disruption, as Frontier's architecture supported porting of Summit-compatible codes, though some adaptations were required for its AMD-based nodes versus Summit's IBM POWER9 setup.[98] Post-deactivation, hardware components were slated for secure disposal or potential repurposing, aligning with Department of Energy protocols for sensitive equipment.[99]
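
Taking the quoted 200 million node hours at face value, a rough utilization estimate over six years of service (illustrative; assumes all 4,608 nodes were available throughout):

```python
# Rough utilization implied by "over 200 million node hours" across ~6 years.
NODES = 4_608
YEARS = 6
HOURS_PER_YEAR = 8_766          # calendar average, including leap years

available_mh = NODES * YEARS * HOURS_PER_YEAR / 1e6   # ~242 M node hours
print(f"Available node hours: {available_mh:.0f} M")
print(f"Implied utilization:  {200 / available_mh * 100:.0f}%")  # ~83%
```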

Post-Retirement Reuse and Influence

Summit exerted significant influence on subsequent high-performance computing (HPC) architectures, particularly through its demonstration of hybrid CPU-GPU designs that integrated IBM POWER9 processors with NVIDIA Volta GPUs, enabling balanced compute and data-intensive workloads.[2] This approach informed the development of the Frontier supercomputer at Oak Ridge National Laboratory (ORNL), which adopted a similar heterogeneous node structure—albeit with AMD Epyc CPUs and Instinct MI250X GPUs—while scaling to exascale performance levels exceeding 1 exaFLOPS. Applications ported and optimized on Summit during its operational phase facilitated smoother transitions to Frontier, including benchmarks in mixed-precision computing that highlighted scalability improvements from petaflop to exaflop regimes.[100]

Following its decommissioning on November 15, 2024, Summit's components faced limited prospects for repurposing due to rapid obsolescence in HPC hardware and security protocols at national laboratories. ORNL dismantled and shredded over 32,000 disk drives from Summit's associated Alpine storage system, totaling 250 petabytes, to mitigate data security risks rather than enable resale or redistribution.[101] Precedent from ORNL's prior Titan supercomputer indicated that only select memory modules held resale value, while custom CPUs and GPUs lacked viable markets, suggesting Summit's $200 million infrastructure—comprising 4,608 nodes—was largely scrapped without transfer to private entities or international allies.[102]

Summit's legacy endures through its contributions to the U.S. HPC ecosystem, fostering advancements in simulation-driven research that persist in peer-reviewed outputs beyond retirement. For instance, molecular simulations performed on Summit informed 2025 publications on nucleotide excision repair pathways in DNA damage response, underscoring the system's role in enabling high-fidelity modeling of biological processes.[103] Over its lifespan, Summit supported more than 100 research projects in its final extended year, yielding datasets and methodologies that continue to underpin scientific breakthroughs in fields like astrophysics and materials science, even as newer systems like Frontier build on its foundational hybrid paradigm.[104]
