Supercomputer operating system
A supercomputer operating system is an operating system intended for supercomputers. Since the end of the 20th century, supercomputer operating systems have undergone major transformations, as fundamental changes have occurred in supercomputer architecture.[1] While early operating systems were custom tailored to each supercomputer to gain speed, the trend has been moving away from in-house operating systems and toward some form of Linux,[2] which as of November 2017 ran on every supercomputer on the TOP500 list. As of 2021, the top 10 computers ran Red Hat Enterprise Linux (RHEL) or a variant of it, or another Linux distribution such as Ubuntu.
Given that modern massively parallel supercomputers typically separate computations from other services by using multiple types of nodes, they usually run different operating systems on different nodes, e.g., using a small and efficient lightweight kernel such as Compute Node Kernel (CNK) or Compute Node Linux (CNL) on compute nodes, but a larger system such as a Linux distribution on server and input/output (I/O) nodes.[3][4]
While in a traditional multi-user computer system job scheduling is in effect a tasking problem for processing and peripheral resources, in a massively parallel system, the job management system needs to manage the allocation of both computational and communication resources, as well as gracefully dealing with inevitable hardware failures when tens of thousands of processors are present.[5]
Although most modern supercomputers use the Linux operating system,[6] each manufacturer has made its own specific changes to the Linux distribution they use, and no industry standard exists, partly because the differences in hardware architectures require changes to optimize the operating system to each hardware design.[1][7]

Context and overview
In the early days of supercomputing, the basic architectural concepts were evolving rapidly, and system software had to follow hardware innovations that usually took rapid turns.[1] In the early systems, operating systems were custom tailored to each supercomputer to gain speed, yet in the rush to develop them, serious software quality challenges surfaced and in many cases the cost and complexity of system software development became as much an issue as that of hardware.[1]

In the 1980s, the cost of software development at Cray came to equal what was spent on hardware, and that trend was partly responsible for the move away from in-house operating systems toward the adaptation of generic software.[2] The first wave of operating system changes came in the mid-1980s, as vendor-specific operating systems were abandoned in favor of Unix. Despite early skepticism, this transition proved successful.[1][2]
By the early 1990s, major changes were occurring in supercomputing system software.[1] By this time, the growing use of Unix had begun to change the way system software was viewed. The use of a high-level language (C) to implement the operating system and the reliance on standardized interfaces were in contrast to the assembly-language-oriented approaches of the past.[1] As hardware vendors adapted Unix to their systems, new and useful features were added to Unix, e.g., fast file systems and tunable process schedulers.[1] However, all the companies that adapted Unix made unique changes to it, rather than collaborating on an industry standard to create "Unix for supercomputers". This was partly because differences in their architectures required these changes to optimize Unix to each architecture.[1]
As general purpose operating systems became stable, supercomputers began to borrow and adapt critical system code from them, and relied on the rich set of secondary functions that came with them.[1] At the same time, however, the size of the code for general purpose operating systems was growing rapidly; by the time Unix-based code had grown to 500,000 lines, its maintenance and use were a challenge.[1] This resulted in a move toward microkernels, which used a minimal set of operating system functions. Systems such as Mach at Carnegie Mellon University and ChorusOS at INRIA were examples of early microkernels.[1]
The separation of the operating system into separate components became necessary as supercomputers developed different types of nodes, e.g., compute nodes versus I/O nodes. Thus modern supercomputers usually run different operating systems on different nodes, e.g., using a small and efficient lightweight kernel such as CNK or CNL on compute nodes, but a larger system such as a Linux-derivative on server and I/O nodes.[3][4]
Early systems
The CDC 6600, generally considered the first supercomputer in the world, ran the Chippewa Operating System, which was then deployed on various other CDC 6000 series computers.[9] Chippewa was a rather simple job-control-oriented system derived from the earlier CDC 3000, but it influenced the later KRONOS and SCOPE systems.[9][10]
The first Cray-1 was delivered to the Los Alamos Lab with no operating system, or any other software.[11] Los Alamos developed the application software for it, as well as the operating system.[11] The main timesharing system for the Cray-1, the Cray Time Sharing System (CTSS), was then developed at the Livermore Labs as a direct descendant of the Livermore Time Sharing System (LTSS), developed for the CDC 6600 twenty years earlier.[11]
In developing supercomputers, rising software costs soon became dominant, as evidenced by the 1980s cost of software development at Cray growing to equal its cost for hardware.[2] That trend was partly responsible for the move away from the in-house Cray Operating System to the Unix-based UNICOS.[2] In 1985, the Cray-2 was the first system to ship with the UNICOS operating system.[12]
Around the same time, the EOS operating system was developed by ETA Systems for use in their ETA10 supercomputers.[13] Written in Cybil, a Pascal-like language from Control Data Corporation, EOS highlighted the difficulty of developing stable operating systems for supercomputers, and eventually a Unix-like system was offered on the same machine.[13][14] The lessons learned from developing ETA system software included the high level of risk associated with developing a new supercomputer operating system, and the advantages of using Unix with its large extant base of system software libraries.[13]
By the middle of the 1990s, despite the extant investment in older operating systems, the trend was toward the use of Unix-based systems, which also facilitated the use of interactive graphical user interfaces (GUIs) for scientific computing across multiple platforms.[15] The move toward a commodity OS had opponents, who cited the fast pace and focus of Linux development as a major obstacle to adoption.[16] As one author wrote, "Linux will likely catch up, but we have large-scale systems now." Nevertheless, that trend continued to gain momentum and by 2005, virtually all supercomputers used some Unix-like OS.[17] These variants of Unix included IBM AIX, the open source Linux system, and other adaptations such as UNICOS from Cray.[17] By the end of the 20th century, Linux was estimated to command the highest share of the supercomputing pie.[1][18]
Modern approaches
The IBM Blue Gene supercomputer uses the CNK operating system on the compute nodes, but uses a modified Linux-based kernel called I/O Node Kernel (INK) on the I/O nodes.[3][19] CNK is a lightweight kernel that runs on each node and supports a single application running for a single user on that node. For the sake of efficient operation, the design of CNK was kept simple and minimal, with physical memory being statically mapped and the CNK neither needing nor providing scheduling or context switching.[3] CNK does not even implement file I/O on the compute node, but delegates that to dedicated I/O nodes.[19] However, given that on the Blue Gene multiple compute nodes share a single I/O node, the I/O node operating system does require multi-tasking, hence the selection of the Linux-based operating system.[3][19]
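The delegation of file I/O to dedicated nodes can be made concrete with a small sketch. The following Python fragment is illustrative only: the length-prefixed JSON wire format and the function names are assumptions for demonstration, not the actual CNK/CIOD protocol. It shows the function-shipping idea, in which the compute node packs each file operation into a message and a daemon on the multi-tasking I/O node performs it against the real file system.

```python
# Sketch of I/O "function shipping": a compute node with no local file
# system forwards each file operation to a daemon on an I/O node.
# The wire format here (length-prefixed JSON) is purely illustrative.
import json
import socket
import struct

def ship(sock, op, **args):
    """Compute-node side: pack a file operation and ship it to the I/O node."""
    payload = json.dumps({"op": op, "args": args}).encode()
    sock.sendall(struct.pack("!I", len(payload)) + payload)  # framed request
    size, = struct.unpack("!I", sock.recv(4))
    return json.loads(sock.recv(size))  # single recv keeps the sketch short

def io_daemon(listen_addr):
    """I/O-node side: execute forwarded operations on the real file system."""
    with socket.create_server(listen_addr) as srv:
        conn, _ = srv.accept()
        with conn:
            while True:
                header = conn.recv(4)
                if not header:          # compute node closed the connection
                    break
                size, = struct.unpack("!I", header)
                request = json.loads(conn.recv(size))
                if request["op"] == "write":
                    with open(request["args"]["path"], "a") as f:
                        result = {"written": f.write(request["args"]["data"])}
                else:
                    result = {"error": "unsupported operation"}
                reply = json.dumps(result).encode()
                conn.sendall(struct.pack("!I", len(reply)) + reply)
```

Because many compute nodes funnel their requests through one such daemon, the I/O-node operating system must multi-task, which is exactly why Blue Gene pairs the minimal CNK on compute nodes with Linux on I/O nodes.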
While in traditional multi-user computer systems and early supercomputers job scheduling was in effect a task-scheduling problem for processing and peripheral resources, in a massively parallel system the job management system needs to manage the allocation of both computational and communication resources.[5] Task scheduling, and the operating system itself, must be tuned to each configuration of a supercomputer. A typical parallel job scheduler has a master scheduler that instructs some number of slave schedulers to launch, monitor, and control parallel jobs, and periodically receives reports from them about the status of job progress.[5]
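A minimal sketch of that master/slave structure follows; the names, and the use of threads as stand-ins for slave schedulers running on separate nodes, are assumptions for illustration, not the design of any production job manager.

```python
# Sketch of a parallel job manager: a master scheduler dispatches jobs to
# slave schedulers (one per "node", modeled here as threads) and collects
# their status reports. All names are illustrative.
import queue
import threading

def slave_scheduler(node_id, jobs, reports):
    """Launch and monitor jobs on one node, reporting progress to the master."""
    while True:
        job = jobs.get()
        if job is None:                       # shutdown sentinel from master
            break
        reports.put((node_id, "started"))
        job()                                 # run the job body
        reports.put((node_id, "finished"))

def master_scheduler(job_list, n_nodes):
    jobs, reports = queue.Queue(), queue.Queue()
    slaves = [threading.Thread(target=slave_scheduler, args=(i, jobs, reports))
              for i in range(n_nodes)]
    for s in slaves:
        s.start()
    for job in job_list:                      # hand work to the slaves
        jobs.put(job)
    finished = 0
    while finished < len(job_list):           # collect periodic status reports
        node_id, state = reports.get()
        if state == "finished":
            finished += 1
    for _ in slaves:                          # tell every slave to stop
        jobs.put(None)
    for s in slaves:
        s.join()

master_scheduler([lambda: None] * 16, n_nodes=4)
```

A real job manager layers resource matching, failure detection, and the requeueing of work from failed nodes onto this skeleton, which is where the graceful handling of inevitable hardware failures mentioned earlier comes in.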
Some, but not all, supercomputer schedulers attempt to maintain locality of job execution. The PBS Pro scheduler used on the Cray XT3 and Cray XT4 systems does not attempt to optimize locality on their three-dimensional torus interconnect, but simply uses the first available processor.[20] On the other hand, IBM's scheduler on the Blue Gene supercomputers aims to exploit locality and minimize network contention by assigning tasks from the same application to one or more midplanes of an 8×8×8 node group.[20] The Slurm Workload Manager uses a best-fit algorithm and performs Hilbert curve scheduling to optimize the locality of task assignments.[20] Several modern supercomputers such as the Tianhe-2 use Slurm, which arbitrates contention for resources across the system. Slurm is open source, Linux-based, very scalable, and can manage thousands of nodes in a computer cluster with a sustained throughput of over 100,000 jobs per hour.[21][22]
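The locality idea behind Hilbert curve scheduling fits in a few lines. In the sketch below, a 2D grid stands in for a real torus, the function names are illustrative, and Slurm's actual implementation differs; the point is that ordering nodes by Hilbert index makes consecutive indices physically adjacent, so a job given a contiguous run of indices receives a compact set of nodes.

```python
def xy2d(n, x, y):
    """Map coordinates (x, y) in an n-by-n grid (n a power of two) to a
    1-D Hilbert-curve index; nearby indices are nearby in the grid."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate the quadrant as needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def allocate(free_nodes, job_size, n=8):
    """Give a job free nodes that are consecutive along the Hilbert curve
    (a stand-in for Slurm's best-fit search over curve-ordered nodes)."""
    ordered = sorted(free_nodes, key=lambda p: xy2d(n, *p))
    return ordered[:job_size]

free = [(x, y) for x in range(8) for y in range(8)]
print(allocate(free, 4))   # four physically clustered nodes
```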
References
[edit]- ^ a b c d e f g h i j k l m Encyclopedia of Parallel Computing by David Padua 2011 ISBN 0-387-09765-1 pages 426–429.
- ^ a b c d e Knowing machines: essays on technical change by Donald MacKenzie 1998 ISBN 0-262-63188-1 pages 149–151.
- ^ a b c d e Euro-Par 2004 Parallel Processing: 10th International Euro-Par Conference 2004, by Marco Danelutto, Marco Vanneschi and Domenico Laforenza ISBN 3-540-22924-8 page 835.
- ^ a b An Evaluation of the Oak Ridge National Laboratory Cray XT3 by Sadaf R. Alam, et al., International Journal of High Performance Computing Applications, February 2008, vol. 22, no. 1, pp. 52–80.
- ^ a b c Open Job Management Architecture for the Blue Gene/L Supercomputer by Yariv Aridor et al in Job scheduling strategies for parallel processing by Dror G. Feitelson 2005 ISBN 978-3-540-31024-2 pages 95–101.
- ^ Vaughn-Nichols, Steven J. (June 18, 2013). "Linux continues to rule supercomputers". ZDNet. Retrieved June 20, 2013.
- ^ "Top500 OS chart". Top500.org. Archived from the original on 2012-03-05. Retrieved 2010-10-31.
- ^ Targeting the computer: government support and international competition by Kenneth Flamm 1987 ISBN 0-8157-2851-4 page 82 [1]
- ^ a b The computer revolution in Canada by John N. Vardalas 2001 ISBN 0-262-22064-4 page 258.
- ^ Design of a computer: the Control Data 6600 by James E. Thornton, Scott, Foresman Press 1970 page 163.
- ^ a b c Targeting the computer: government support and international competition by Kenneth Flamm 1987 ISBN 0-8157-2851-4 pages 81–83.
- ^ Lester T. Davis, The balance of power, a brief history of Cray Research hardware architectures in "High performance computing: technology, methods, and applications" by J. J. Dongarra 1995 ISBN 0-444-82163-5 page 126 [2].
- ^ a b c Lloyd M. Thorndyke, The Demise of the ETA Systems in "Frontiers of Supercomputing II" by Karyn R. Ames, Alan Brenner 1994 ISBN 0-520-08401-2 pages 489–497.
- ^ Past, present, parallel: a survey of available parallel computer systems by Arthur Trew 1991 ISBN 3-540-19664-1 page 326.
- ^ Frontiers of Supercomputing II by Karyn R. Ames, Alan Brenner 1994 ISBN 0-520-08401-2 page 356.
- ^ Brightwell, Ron; Riesen, Rolf; Maccabe, Arthur. "On the Appropriateness of Commodity Operating Systems for Large-Scale, Balanced Computing Systems" (PDF). Retrieved January 29, 2013.
- ^ a b Getting up to speed: the future of supercomputing by Susan L. Graham, Marc Snir, Cynthia A. Patterson, National Research Council 2005 ISBN 0-309-09502-6 page 136.
- ^ Forbes magazine, March 15, 2005: "Linux Rules Supercomputers".
- ^ a b c Euro-Par 2006 Parallel Processing: 12th International Euro-Par Conference, 2006, by Wolfgang E. Nagel, Wolfgang V. Walter and Wolfgang Lehner ISBN 3-540-37783-2.
- ^ a b c Job Scheduling Strategies for Parallel Processing: by Eitan Frachtenberg and Uwe Schwiegelshohn 2010 ISBN 3-642-04632-0 pages 138–144.
- ^ SLURM at SchedMD
- ^ Jette, M. and M. Grondona, SLURM: Simple Linux Utility for Resource Management in the Proceedings of ClusterWorld Conference, San Jose, California, June 2003 [3]
Overview and Fundamentals
Definition and Core Functions
A supercomputer operating system is a specialized software layer designed to orchestrate hardware resources in massively parallel computing environments, enabling the execution of computationally intensive tasks such as scientific simulations and data analysis. Unlike general-purpose systems, it prioritizes maximal computational throughput by minimizing overhead and ensuring efficient coordination across thousands of processing elements. These operating systems typically employ a lightweight kernel architecture to support the unique demands of high-performance computing (HPC), focusing on simplicity and reliability to achieve sustained peak performance.[8][9]

Core functions include advanced process scheduling tailored for parallel workloads, where non-preemptive mechanisms assign fixed affinities to cores, reducing context-switching overhead and ensuring low-latency execution across distributed nodes. Memory allocation is handled through static partitioning and large-page mechanisms, avoiding demand paging to prevent interference and enable efficient distribution over interconnected nodes. Input/output (I/O) optimization is critical, often achieved by offloading operations to dedicated nodes or utilizing parallel file systems that deliver high-bandwidth data transfer rates, such as terabytes per second, to mitigate bottlenecks in large-scale simulations.[9][10]

These systems provide essential support for scientific programming models like the Message Passing Interface (MPI), facilitating efficient inter-process communication in distributed-memory architectures through low-overhead messaging. Kernel modifications, such as streamlined interrupt handling and reduced system noise, enable deterministic performance by minimizing jitter and variability, which is vital for reproducible results in long-running computations. Abstraction layers are incorporated to manage heterogeneous hardware, including CPUs, GPUs, and accelerators, allowing seamless resource utilization without compromising scalability. Originating from mainframe operating systems, supercomputer OS designs have evolved to address HPC-specific challenges like massive parallelism.[11][9][10][8]
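As a small illustration of the fixed core-affinity idea described above, the sketch below pins one worker process to each core so the kernel never migrates it. The rank-to-core mapping is an assumption for demonstration, and os.sched_setaffinity is Linux-specific; real HPC kernels enforce this placement in the kernel itself rather than from user space.

```python
# Sketch: one worker per core, each pinned to its own core, mimicking the
# fixed-affinity, no-migration placement a lightweight kernel enforces.
import multiprocessing
import os

def worker(rank):
    os.sched_setaffinity(0, {rank})   # pin this process to core `rank`
    # ... the rank's compute kernel (e.g., an MPI rank body) would run here,
    # undisturbed by migration or contention-induced context switches ...
    return rank, sorted(os.sched_getaffinity(0))

if __name__ == "__main__":
    cores = os.cpu_count()
    with multiprocessing.Pool(cores) as pool:
        print(pool.map(worker, range(cores)))  # each rank reports its core
```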
Distinctions from General-Purpose OS
Supercomputer operating systems (OS) are engineered with a primary emphasis on maximizing computational throughput and scalability for high-performance computing (HPC) workloads, in stark contrast to general-purpose OS like Linux distributions for desktops or Windows, which balance user interactivity, multitasking, and peripheral support. These HPC OS often employ stripped-down kernels to minimize system overhead, eliminating features such as graphical user interfaces (GUIs) and unnecessary drivers that could introduce latency or resource contention in batch-oriented environments. For instance, lightweight kernels like those in the Cougar OS demonstrate superior performance in message-passing benchmarks, achieving up to 310 MB/s bandwidth compared to 45 MB/s on standard Linux using TCP/IP, by dedicating nearly all CPU cycles to applications rather than OS services.[12] Similar improvements in efficiency are observed with other lightweight kernels such as Kitten. This focus on deterministic, low-variability execution, often below 1% jitter, enables efficient scaling to thousands of nodes, prioritizing sustained floating-point operations over responsive user interfaces.

In terms of hardware support, supercomputer OS are tailored for specialized architectures that general-purpose OS rarely accommodate, such as non-uniform memory access (NUMA) topologies and high-speed interconnects like InfiniBand, which demand custom drivers and optimized memory management to handle massive parallelism without the abstractions suited for commodity hardware. General-purpose OS, designed for uniform memory access (UMA) in personal devices, incur significant penalties on NUMA systems due to poor page placement, potentially degrading execution time by up to 29% without specialized policies. Supercomputer OS integrate direct support for these, including passthrough I/O for low-latency network communication on platforms like the Cray XT4, ensuring efficient data movement across nodes without the overhead of emulated hardware layers found in desktop environments.

Security and isolation in supercomputer OS favor lightweight virtualization techniques to enforce job boundaries in multi-user, shared-resource settings, differing from the heavyweight hypervisors (e.g., VMware or Hyper-V) in general-purpose OS that provide broad virtualization but at a cost to HPC performance. On compute nodes, mechanisms like the Compute Node Kernel (CNK) or Hafnium-based partitions offer memory isolation for individual jobs with minimal overhead, often 5% or less, using hardware-assisted features like Intel VT, while avoiding full virtual machine (VM) stacks that could disrupt tightly coupled simulations. This approach supports container-like isolation via tools such as Singularity, tailored for HPC reproducibility, contrasting with the general OS reliance on resource-intensive VMs for similar containment.

Key trade-offs in supercomputer OS include reduced multitasking capabilities to emphasize batch processing, where jobs are queued and executed via schedulers like SLURM, optimizing for long-running scientific computations over interactive sessions. Unlike general-purpose OS that support concurrent user tasks and preemptive scheduling, HPC kernels like Kitten disable multitasking on compute nodes to eliminate context-switching overhead, focusing instead on single-job dominance per node.
Additionally, optimized drivers for interconnects such as InfiniBand enable remote direct memory access (RDMA) with sub-microsecond latencies, a necessity for exascale systems, building on but enhancing the RDMA support available in standard OS kernels, which are primarily designed for Ethernet-based networking.[13]
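The jitter figures quoted above are typically obtained with fixed-work-quantum style probes. The sketch below is only a rough stand-in (the parameters are arbitrary, and an interpreted language adds noise of its own that the real C benchmarks avoid): it times many repetitions of identical work, so any spread beyond the best-case time is interference from the OS.

```python
# Rough fixed-work-quantum (FWQ) probe: identical work is timed repeatedly;
# on a quiet lightweight kernel the spread is tiny, while daemons, interrupts,
# and scheduler activity on a full OS show up as slow outliers.
import time

def fwq(reps=1000, work=100_000):
    samples = []
    for _ in range(reps):
        start = time.perf_counter_ns()
        total = 0
        for i in range(work):          # the fixed quantum of pure computation
            total += i
        samples.append(time.perf_counter_ns() - start)
    best = min(samples)
    jitter = (max(samples) - best) / best   # relative spread = OS noise
    return best, jitter

print(fwq())
```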
Historical Development
Early Systems (1950s–1970s)
The earliest supercomputer operating systems emerged in the 1950s amid the transition from vacuum-tube-based machines to more reliable transistorized designs, focusing primarily on basic input/output management and error recovery from frequent hardware failures. Early machines such as MIT's Whirlwind I in 1951 utilized paper tape loaders to automate program loading and reduce manual intervention, encoding instructions in 5-bit format with sprocket holes for sequential batch execution. These rudimentary systems addressed the unreliability of vacuum tubes, which were prone to overheating and burnout, by incorporating simple monitors that coordinated tape handling and basic diagnostics to resume operations after failures.[14][15]

By the 1960s, operating systems for pioneering supercomputers like the CDC 6600 emphasized batch processing optimized for single-processor scientific workloads, leveraging peripheral processors to offload I/O and allow the central unit to focus on compute-intensive tasks such as floating-point operations. The CDC 6600's SCOPE (Supervisory Control Of Program Execution), introduced in 1964, managed job scheduling with time limits specified in octal seconds and terminated exceeding jobs while preserving output, enabling efficient handling of serial computations in environments like university computing centers. Similarly, IBM's System/360, launched in 1964, adapted OS/360 for scientific computing by supporting unified batch processing across a range of models, eliminating the need for separate scientific hardware and introducing Job Control Language (JCL) to script resource requests and sequential job execution. NOS, an evolution for the CDC 6000 series in the 1970s, enhanced multi-user batch capabilities with improved task scheduling via peripheral processors, further streamlining magnetic tape I/O for data-heavy simulations.[16][17]

In the 1970s, systems like the ILLIAC IV introduced rudimentary multiprocessing to handle parallel array processing, marking a shift toward modularity influenced by emerging minicomputer clusters. The ILLIAC IV's operating system, built on a Burroughs B6500 control unit, distributed functions across independent ALGOL modules for resource management, including disk allocation and I/O via job partners that handled interrupts and error recovery for its 64 processing elements configured in arrays. Challenges included the lack of hardware protection, leading to 1-second swapping inefficiencies and a preference for batch-mode jobs under 5 minutes, with error recovery relying on checkpointing and section comparisons to isolate faults in over 6 million components. This era's minicomputer clusters, such as those running Unix on PDP-11 systems, promoted OS modularity through portable, hierarchical designs that influenced supercomputer software by enabling scalable resource sharing and fault-tolerant structures.[18][19][20]
Specialized OS in the Vector Era (1980s–1990s)
The vector era of supercomputing, spanning the 1980s and 1990s, saw the development of specialized operating systems tailored to exploit the architectural innovations of vector processors, which emphasized high-throughput computation through long pipelines and single-instruction, multiple-data (SIMD) paradigms. These OS designs prioritized efficient resource allocation for vector operations, moving beyond the batch-oriented systems of earlier decades to support interactive multitasking and multiprocessor coordination. Key examples include Cray Research's operating systems, which evolved from the Cray Operating System (COS) introduced with the Cray-1 in 1976 to UNICOS in the mid-1980s, providing Unix-like compatibility while optimizing for vector pipelines across systems like the Cray-2 and Y-MP.[21][22] Similarly, Fujitsu's VP series, launched in 1982 with models like the VP-100 and VP-200, utilized the proprietary MSP/EX operating system for enhanced throughput and expansibility, alongside the UNIX-based UXP/M with a Vector Processor Option (VPO) to enable vector-specific execution in batch and interactive modes.[23]

Innovations in these OS focused on seamless integration with vector hardware, including runtime support for SIMD instructions through advanced compilers that automated vectorization of loops and conditional statements. For instance, Fujitsu's FORTRAN77 EX/VP compiler in UXP/M utilized up to seven vector pipelines with parallel scheduling to maximize efficiency on VP systems achieving peak performances of approximately 0.5 GFLOPS per processor.[23] UNICOS extended this with microtasking capabilities for fine-grained parallelism on Cray Y-MP systems, incorporating dynamic load balancing to distribute workloads across multiple processors and mitigate imbalances in vector unit utilization.[24] Network file system adaptations, building on standard NFS protocols introduced in 1984, were customized for high-performance computing; vendor OS like UNICOS and UXP/M integrated high-speed I/O subsystems and vector-friendly file access to handle large-scale data transfers without bottlenecking pipeline operations.[25] These features emphasized conceptual scalability over exhaustive benchmarks, enabling applications in scientific simulations to leverage vector units without manual reconfiguration.

Significant events shaped OS development during this period. The establishment of the National Science Foundation's supercomputer centers in 1985, including the National Center for Supercomputing Applications at the University of Illinois, the Cornell Theory Center, the John von Neumann Center at Princeton, and the San Diego Supercomputer Center, provided widespread access to vector systems and spurred collaborative software efforts, including explorations of Unix-based environments for portability and training.[26] A fifth center, the Pittsburgh Supercomputing Center, followed in 1986, further promoting standardized interfaces for vector OS. The introduction of the TOP500 list in 1993 began tracking global supercomputer performance biannually, highlighting the dominance of vector architectures and indirectly driving OS portability by showcasing systems with Unix derivatives that facilitated code migration across vendors.[27]

Challenges in OS design centered on managing complex memory hierarchies in multiprocessor vector systems.
The Cray Y-MP, released in 1988 with configurations supporting up to eight processors at 6 ns cycle times and 32 megawords of central memory, required UNICOS to handle shared-memory access contention and vector data staging, where inefficiencies in inter-processor communication could degrade sustained performance below 2 GFLOPS.[28] These systems addressed such issues through advanced paging and solid-state storage integration, but the need for fault-tolerant resource scheduling underscored the era's push toward robust, vendor-specific kernels optimized for vector parallelism.[29]
Design Principles and Challenges
Scalability for Parallel Processing
Supercomputer operating systems achieve scalability for parallel processing through distributed kernel architectures that deploy a lightweight kernel instance per compute node, minimizing interference and enabling efficient resource utilization across thousands of nodes. This design, often exemplified by systems like the Kitten kernel, avoids monolithic structures by assigning one kernel per node to handle local tasks such as device initialization, process scheduling, and memory management, while external coordinators manage global synchronization via high-speed interconnects.[30] Such multikernel approaches treat the system as a network of independent cores communicating through message passing, recasting traditional OS functions to leverage distributed-systems principles for better performance on multicore hardware.[31] These kernels support the Single Program, Multiple Data (SPMD) model, where a single executable runs across multiple nodes with data partitioned accordingly, facilitated by OS-level process launching and communication primitives that ensure coordinated execution without centralized bottlenecks.[32]

Key techniques for enhancing parallelism include implementations of the Partitioned Global Address Space (PGAS) model, which provides a globally shared address space while maintaining local memory coherence per node to support scalable data access in distributed environments. PGAS integrations in supercomputer OSes, often backed by hardware extensions like FPGA-based communication engines, enable low-overhead remote memory operations, achieving latencies under 2 µs for fine-grained accesses and throughputs exceeding 300 MB/s for cache-line writes.[33] Thread management is handled efficiently via OpenMP runtimes, such as lightweight user-level threading libraries that optimize nested parallelism and affinity binding, delivering up to 2.5x performance gains on multi-core nodes while preserving flat-parallelism efficiency.[34]

Scalability is further analyzed using Amdahl's law applied to OS overhead, where the speedup is given by S(N) = 1 / ((1 − P) + P/N), with P as the parallelizable fraction of the workload and N as the number of processors; this highlights how even small serial OS components, like context switching costing ~10⁴ cycles, limit efficiency to below 20% on million-core systems if not minimized.[35]

To integrate with hardware topologies, supercomputer OSes adapt to fat-tree networks, which provide non-blocking, scalable interconnects with increasing bandwidth toward the root to prevent bottlenecks in collective operations. These adaptations involve optimized network stacks and drivers that route traffic hierarchically across core, aggregation, and edge switches, ensuring low-latency communication for all-to-all patterns common in parallel workloads.[36] Such designs enable near-linear scaling in benchmarks like the High-Performance Linpack (HPL), where implementations on GPU-accelerated clusters achieve over 90% weak-scaling efficiency, escalating from hundreds of TFLOPS on single nodes to tens of PFLOPS across 128 nodes through OS-managed process binding and communication hiding.[37]
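To make that bound concrete, the short computation below (values chosen for illustration, not measurements) shows how a fixed 0.1% serial OS component caps speedup near 1000x no matter how many processors are added:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: S(N) = 1 / ((1 - P) + P / N)."""
    return 1.0 / ((1.0 - p) + p / n)

# With P = 0.999, i.e. a 0.1% serial OS component:
for n in (1_000, 100_000, 1_000_000):
    s = amdahl_speedup(0.999, n)
    print(f"N = {n:>9,}  speedup = {s:8.1f}  efficiency = {s / n:.2%}")
# Speedup saturates near 1/(1 - P) = 1000, so per-core efficiency collapses
# as N grows, which is why lightweight kernels shrink the serial fraction.
```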