K computer
| Active | June 2011 – August 2019 |
|---|---|
| Sponsors | MEXT |
| Operators | Fujitsu |
| Location | Riken Advanced Institute for Computational Science |
| Architecture | 88,128 SPARC64 VIIIfx processors, Tofu interconnect |
| Power | 12.6 MW |
| Operating system | Linux[1][2] |
| Speed | 10.51 petaflops (Rmax) |
| Ranking | TOP500: 18th, as of November 2018[3] |

The K computer – named for the Japanese word/numeral "kei" (京), meaning 10 quadrillion (10^16)[4][Note 1] – was a supercomputer manufactured by Fujitsu, installed at the Riken Advanced Institute for Computational Science campus in Kobe, Hyōgo Prefecture, Japan.[4][5][6] The K computer was based on a distributed memory architecture with over 80,000 compute nodes.[7] It was used for a variety of applications, including climate research, disaster prevention and medical research.[6] The K computer's operating system was based on the Linux kernel, with additional drivers designed to make use of the computer's hardware.[8]
In June 2011, TOP500 ranked K the world's fastest supercomputer, with a computation speed of over 8 petaflops, and in November 2011, K became the first computer to top 10 petaflops.[9][10] It had originally been slated for completion in June 2012.[10] In June 2012, K was superseded as the world's fastest supercomputer by the American IBM Sequoia.[11]
As of November 2018, the K computer held third place for the HPCG benchmark. It had held first place until June 2018, when it was superseded by Summit and Sierra.[12][13]
The K supercomputer was decommissioned on 30 August 2019.[14] In Japan, the K computer was succeeded by the Fugaku supercomputer in 2020, which took the top spot on the June 2020 TOP500 list and was at that time nearly three times faster than the second most powerful supercomputer.[15]
Performance
On 20 June 2011, the TOP500 Project Committee announced that K had set a LINPACK record with a performance of 8.162 petaflops, making it the fastest supercomputer in the world at the time;[4][6][9] it achieved this performance with a computing efficiency ratio of 93.0%. The previous record holder was the Chinese National University of Defense Technology's Tianhe-1A, which performed at 2.507 petaflops.[5] The TOP500 list is revised semiannually, and the rankings change frequently, indicating the speed at which computing power is increasing.[4] In November 2011, Riken reported that K had become the first supercomputer to exceed 10 petaflops, achieving a LINPACK performance of 10.51 quadrillion computations per second with a computing efficiency ratio of 93.2%.[10] K received top ranking in all four performance benchmarks at the 2011 HPC Challenge Awards.[16]
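The efficiency ratios quoted here are simply sustained LINPACK performance (Rmax) divided by theoretical peak (Rpeak). A minimal check in Python, using the Rpeak values reported on the corresponding TOP500 lists (8.774 and 11.28 petaflops, cited later in this article), reproduces both figures:

```python
# TOP500 "computing efficiency" is sustained Rmax over theoretical Rpeak.
def linpack_efficiency(rmax_pflops: float, rpeak_pflops: float) -> float:
    """Return sustained/peak LINPACK efficiency as a percentage."""
    return 100.0 * rmax_pflops / rpeak_pflops

print(f"{linpack_efficiency(8.162, 8.774):.1f}%")   # June 2011 -> 93.0%
print(f"{linpack_efficiency(10.51, 11.28):.1f}%")   # November 2011 -> 93.2%
```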
On 18 June 2012, the TOP500 Project Committee announced that the California-based IBM Sequoia supercomputer had replaced K as the world's fastest supercomputer, with a LINPACK performance of 16.325 petaflops. Sequoia was 55% faster than K and used 123% more processor cores, while also being 150% more energy efficient.[11]
The K computer ranked first on the TOP500 list in June 2011 and slid to lower positions over time, standing eighteenth as of November 2018.[12]
As of November 2018, the K computer held third place on the HPCG benchmark, a test proposed by Jack Dongarra, with a result of 0.6027 HPCG petaflops.[17]
Specifications
Node architecture
The K computer comprised 88,128 2.0 GHz eight-core SPARC64 VIIIfx processors contained in 864 cabinets, for a total of 705,024 cores,[1][18] manufactured by Fujitsu with 45 nm CMOS technology.[19] Each cabinet contained 96 computing nodes, in addition to six I/O nodes. Each computing node contained a single processor and 16 GB of memory. The computer's water cooling system was designed to minimize failure rate and power consumption.[20]
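The cabinet, node, and core counts above are mutually consistent, as a short arithmetic check (Python, using only values from the text) confirms:

```python
# Arithmetic check of the figures above (all values from the text).
cabinets = 864
nodes_per_cabinet = 96 + 6        # 96 compute nodes plus 6 I/O nodes
cores_per_processor = 8           # one SPARC64 VIIIfx per node

processors = cabinets * nodes_per_cabinet          # 88,128 processors
total_cores = processors * cores_per_processor     # 705,024 cores
print(processors, total_cores)                     # 88128 705024
```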
Network
The nodes were connected by Fujitsu's proprietary torus fusion (Tofu) interconnect.[20][21][22][23]
File system
The system adopted a two-level local/global file system with parallel/distributed functions, and provided users with an automatic staging function for moving files between global and local file systems. Fujitsu developed an optimized parallel file system based on Lustre, called the Fujitsu Exabyte File System (FEFS), which is scalable to several hundred petabytes.[20][24]
Power consumption
Although the K computer reported the highest total power consumption (9.89 MW – the equivalent of almost 10,000 suburban homes) on the June 2011 TOP500 list, it was relatively efficient, achieving 824.6 GFLOPS/kW. This was 29.8% more efficient than China's NUDT TH MPP (ranked #2 in 2011), and 225.8% more efficient than Oak Ridge's Jaguar (Cray XT5-HE, ranked #3 in 2011). However, K's power efficiency still fell far short of the 2,097.2 GFLOPS/kW supercomputer record set by IBM's NNSA/SC Blue Gene/Q Prototype 2. For comparison, the average power consumption of a top-10 system in 2011 was 4.3 MW, and the average efficiency was 463.7 GFLOPS/kW.[9]
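As a rough check (Python), dividing the sustained LINPACK figure by the reported power draw reproduces the quoted efficiency; the small discrepancy comes from using the rounded 9.89 MW value here:

```python
# Power efficiency as sustained GFLOPS per kW of power draw.
rmax_gflops = 8.162e6    # 8.162 petaflops, in gigaflops
power_kw = 9.89e3        # 9.89 MW, in kW

print(round(rmax_gflops / power_kw, 1))  # ~825.3 GFLOPS/kW
# The published 824.6 presumably reflects the unrounded power measurement.
```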
According to TOP500 compiler Jack Dongarra, professor of electrical engineering and computer science at the University of Tennessee, the K computer's performance equaled "one million linked desktop computers".[5] The computer's annual running costs were estimated at US$10 million.[5]
K Computer Mae rapid transit station
On 1 July 2011, Kobe's Port Island Line rapid transit system renamed one of its stations from "Port Island Minami" to "K Computer Mae" (meaning "In front of K Computer"), reflecting its proximity to the computer.[25] In June 2021, after the decommissioning of the K computer, the station was renamed Keisan Kagaku Center Station.[26]
Notes
[edit]- ^ See Japanese numbers
References
- ^ a b K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect
- ^ Moroo, Jun; et al. (2012). "Operating System for the K computer" (PDF). Fujitsu Sci. Tech. J. 48 (3): 295–301. Archived from the original (PDF) on 28 December 2013. Retrieved 23 May 2013.
- ^ "TOP500 List - November 2018". www.top500.org. November 2018. Archived from the original on 13 November 2018. Retrieved 16 November 2018.
- ^ a b c d "Japanese 'K' Computer Is Ranked Most Powerful". The New York Times. 20 June 2011. Retrieved 20 June 2011.
- ^ a b c d "Japanese supercomputer 'K' is world's fastest". The Telegraph. 20 June 2011. Retrieved 20 June 2011.
- ^ a b c "Supercomputer "K computer" Takes First Place in World". Fujitsu. Retrieved 20 June 2011.
- ^ Yokokawa, Mitsuo; Shoji, Fumiyoshi; Uno, Atsuya; Kurokawa, Motoyoshi; Watanabe, Tadashi (1–3 August 2011). The K computer: Japanese next-generation supercomputer development project. IEEE/ACM International Symposium on Low Power Electronics and Design. IEEE. pp. 371–372. doi:10.1109/ISLPED.2011.5993668.
- ^ Moroo; et al. (2012). "Operating System for the K computer" (PDF). Fujitsu. Retrieved 18 June 2013.
- ^ a b c June 2011 TOP500 Supercomputer Sites
- ^ a b c "K computer" Achieves Goal of 10 Petaflops". Fujitsu. 2 November 2011. Retrieved 10 November. 2011.
- ^ a b Kottoor, Naveena (18 June 2012). "IBM supercomputer overtakes Fujitsu as world's fastest". BBC.
- ^ a b "TOP500 - K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect". Retrieved 15 November 2017.
- ^ "HPCG - November 2018 | TOP500 Supercomputer Sites". www.top500.org. Retrieved 16 November 2018.
- ^ "Japan pulls plug on K, once the world's fastest supercomputer, after seven-year run". www.japantimes.co.jp. 16 August 2019. Retrieved 30 August 2019.
- ^ "Japan's Fugaku gains title as world's fastest supercomputer". www.riken.jp. 23 June 2020. Retrieved 13 April 2024.
- ^ ""K computer" No. 1 in Four Benchmarks at HPC Challenge Awards". Riken. 17 November 2011. Retrieved 17 November 2011.
- ^ "June 2017 HPCG Results". HPCG Benchmark. June 2017. Archived from the original on 30 September 2017. Retrieved 29 September 2017.
- ^ ""SPARC64™ VIIIfx": A Fast, Reliable, Low-power CPU". Fujitsu Global. Retrieved 24 February 2013.
- ^ Takumi Maruyama (25 August 2009). SPARC64(TM) VIIIfx: Fujitsu's New Generation Octo Core Processor for PETA Scale computing (PDF). Proceedings of Hot Chips 21. IEEE Computer Society. Retrieved 24 February 2013.
- ^ a b c "Riken Advanced Institute for Computational Science" (PDF). Riken. Archived from the original (PDF) on 27 July 2011. Retrieved 20 June 2011.
- ^ "Programming on K computer" (PDF). Fujitsu. Retrieved 24 June 2011.
- ^ "Open MPI powers 8 petaflops". Cisco Systems. Archived from the original on 28 June 2011. Retrieved 24 June 2011.
- ^ Yuichiro Ajima; et al. (2009). "Tofu: A 6D Mesh/Torus Interconnect for Exascale Computers". Computer. 42 (11). IEEE Computer Society: 36–40. Bibcode:2009Compr..42k..36A. doi:10.1109/MC.2009.370. S2CID 2049404.
- ^ "An Overview of Fujitsu's Lustre Based File System" (PDF). Fujitsu. Retrieved 24 June 2011.
- ^ "Japan's K Supercomputer". Trends in Japan. January 2012. Retrieved 6 June 2012.
- ^ ""K Computer Mae" on the Port Liner will change its station name in June 2021 - [WTM] Railway & Travel News". 17 November 2020.
External links
- Riken Advanced Institute for Computational Science
- Riken Next-Generation Supercomputer R&D Center
- K computer: Fujitsu Global
- Fujitsu Scientific & Technical Journal, July 2012 (Vol. 48, No. 3): The K computer
- Special Interview: Taking on the Challenge of a 10-Petaflop Computer, Riken News, No. 298, April 2006.
- June 2017 Top 500
History and Development
Origins and Funding
In 2006, the Japanese government, through the Ministry of Education, Culture, Sports, Science and Technology (MEXT), announced the Next-Generation Supercomputing Project as part of the broader High Performance Computing Infrastructure (HPCI) initiative. This effort was designated a key technology of national importance to bolster Japan's competitiveness in computational science and address pressing global challenges requiring advanced simulation capabilities. The project focused on creating a petascale supercomputer to support research in areas such as climate modeling, drug discovery, and disaster prevention, aiming to enable breakthroughs that would position Japan at the forefront of high-performance computing innovation.[9][7]

Originally planned as a consortium involving NEC, Hitachi, and Fujitsu to develop a hybrid vector-scalar system, the project faced setbacks when NEC and Hitachi withdrew in early 2009 due to economic difficulties. Fujitsu was subsequently selected as the sole lead developer on May 14, 2009, shifting the design to a fully scalar architecture.[10]

The total development cost for the project was approximately 112 billion yen, funded primarily by the national government to ensure shared access for researchers across academia and industry. This investment reflected the strategic priority placed on supercomputing for advancing scientific discovery and economic growth, with the system intended for operation at RIKEN's facilities in Kobe. Annual operating costs were estimated at around US$10 million, covering maintenance, power, and support to sustain long-term utilization.[11][12]

RIKEN was appointed as the primary operator and coordinator, leveraging its expertise in computational research, while the partnership with Fujitsu combined RIKEN's scientific oversight with Fujitsu's extensive experience in supercomputer design, involving over 1,000 engineers and researchers in the joint effort. The collaboration emphasized indigenous technology development to reduce reliance on foreign systems and foster domestic HPC capabilities.[9][10]
Design and Construction

The development of the K computer began with conceptual design in 2006, as part of a joint effort between RIKEN and Fujitsu to create a next-generation supercomputer for high-performance computing in Japan.[2] Full-scale development followed shortly thereafter, focusing on integrating advanced hardware components tailored for massive parallelism. The first eight racks were shipped to RIKEN's Advanced Institute for Computational Science (AICS) facility in Kobe on September 28, 2010, enabling partial operations for initial testing and validation.[10]

Key design choices centered on the adoption of the SPARC64 VIIIfx processor, a customized version of the SPARC64 architecture optimized for high-performance computing through enhancements in vector processing and power efficiency.[1] The system was engineered to comprise 864 racks, each containing 96 compute nodes, for a total of 82,944 compute nodes, providing the distributed memory architecture necessary for petascale simulations.[1] This scale was selected to achieve target performance levels while maintaining interconnect efficiency.

Construction milestones included the progressive installation of all 864 racks over approximately 11 months, culminating in full system assembly by August 2011 at the AICS facility in Kobe.[10] Central to this was the integration of the Tofu interconnect, a six-dimensional mesh/torus network that ensured low-latency communication and scalability across the entire node array.[13]

Addressing challenges in an earthquake-prone region like Kobe, the AICS facility incorporated seismic-resistant structures and soil liquefaction countermeasures to safeguard the system's main functions during seismic events.[14] Simultaneously, designers tackled scalability to 10 petaflops by implementing an advanced water-cooling system that managed heat dissipation and power demands, reducing CPU temperatures to enhance overall efficiency.[9] These measures allowed the K computer to operate reliably in a high-risk environment while meeting ambitious performance goals.[15]
Technical Specifications

Processor and Node Architecture
The K computer's compute nodes each incorporated a single SPARC64 VIIIfx processor, a custom eight-core scalar CPU developed by Fujitsu specifically for high-performance computing applications.[16] Operating at 2.0 GHz, the processor delivered a peak performance of 128 GFLOPS (16 GFLOPS per core) through fused multiply-add (FMA) operations, with the processor supporting 16 GB of DDR3 SDRAM memory per node for balanced compute and data handling.[16] The overall system scaled to 88,128 such processors, encompassing 705,024 cores distributed across 82,944 compute nodes and 5,184 I/O nodes, enabling massive parallel processing for scientific simulations.[17]

Architecturally, the SPARC64 VIIIfx was fabricated on a 45 nm silicon-on-insulator (SOI) process, integrating the memory controller directly on-chip to minimize latency and power overhead while maximizing bandwidth to the DDR3 interface.[18] Key features included dual 64-bit SIMD vector pipelines per core, enabling 128-bit wide floating-point operations via the High Performance Computing Arithmetic and Control Extension (HPC-ACE) instruction set, which extended the SPARC V9 ISA for vectorized workloads common in HPC.[16] Additionally, the processor incorporated integer multiply-accumulate (MAC) instructions in the HPC-ACE extensions, facilitating efficient accumulation in integer-based algorithms for fields like climate modeling and fluid dynamics.[19]

At the node level, four compute nodes were mounted on each system board, with 24 system boards accommodated per compute rack alongside six I/O system boards, resulting in 96 compute nodes per rack across the system's 864 racks.[1] This dense, water-cooled organization optimized space and thermal management, with each node interconnected via the Tofu network for system-wide coordination.[1]
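A back-of-envelope sketch (Python, using only figures quoted above) shows how the per-core and per-chip peaks follow from the clock rate; the 2 × 2 × 2 factorization of flops per cycle is an assumed reading of the dual-SIMD-pipeline description, not a figure from Fujitsu's documentation:

```python
# Peak-FLOPS arithmetic for the SPARC64 VIIIfx, from the figures above.
clock_ghz = 2.0
flops_per_cycle = 2 * 2 * 2   # assumed: 2 pipelines x 2 SIMD lanes x 2 flops/FMA
cores_per_chip = 8

per_core_gflops = clock_ghz * flops_per_cycle       # 16 GFLOPS per core
per_chip_gflops = per_core_gflops * cores_per_chip  # 128 GFLOPS per chip
print(per_core_gflops, per_chip_gflops)             # 16.0 128.0
```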
Interconnect and Network

The K computer's interconnect, known as Tofu (Torus Fusion), was a proprietary high-performance network developed by Fujitsu to enable efficient communication among its compute nodes. It utilized a six-dimensional (6D) mesh/torus topology, structured as a Cartesian product of a 3D torus in the xyz dimensions and a 3D mesh/torus in the abc dimensions, with the abc dimensions fixed at sizes 2 × 3 × 2 to align with physical hardware constraints and promote scalability. This design provided direct node-to-node links without intermediate switches, ensuring low-latency data transfer and inherent fault tolerance through multiple routing paths that could bypass defective components.[20]

Each compute node featured a Tofu interface with 10 bidirectional links, delivering a peak bandwidth of 10 GB/s per link (5 GB/s in each direction), for an aggregate off-chip bandwidth of 100 GB/s per node. The network supported the full scale of 88,128 nodes, allowing seamless parallel processing across the system while maintaining high bisection bandwidth for balanced communication in distributed workloads. Groups of 12 nodes sharing identical xyz coordinates were interconnected via the abc axes in a mesh/torus fashion, overlaying up to twelve independent 3D tori for optimized local exchanges, while inter-group connections extended the topology globally.[13][21]

Key features included built-in fault detection and isolation, whereby the system could dynamically reroute traffic around failed nodes (such as by removing a minimal set of four nodes if one failed) without significant performance degradation, supporting reliable operation in large-scale environments. The hierarchical embedding of lower-dimensional tori within the 6D structure further enhanced flexibility, enabling users to allocate virtual 3D torus subnetworks for jobs regardless of physical node placement. This fault-tolerant, switchless architecture contrasted with traditional switched fabrics by reducing points of failure and simplifying maintenance.[13][21]

The Tofu interconnect's design rationale prioritized scalability toward exascale computing, high-bandwidth efficiency to support data-intensive simulations, and low-latency communication to minimize synchronization overhead in parallel applications. By embedding 3D torus properties within each cubic fragment of the 6D network, it achieved superior embeddability and routing efficiency compared to lower-dimensional alternatives, making it well suited to grand-challenge problems requiring massive inter-node coordination. These attributes contributed to the K computer's ability to sustain over 10 petaflops in real-world scientific computations.[21]
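A minimal toy model (Python, illustrative only and not Fujitsu's routing code) treats a node address as a 6-tuple and enumerates its one-hop neighbors. With the abc dimensions fixed at 2 × 3 × 2, the +1 and -1 steps coincide along the length-2 axes, so the distinct-neighbor count comes out to the ten links per node cited above; the xyz sizes below are arbitrary assumptions for illustration:

```python
# Toy model of 6D mesh/torus addressing: a node is (x, y, z, a, b, c), and
# stepping +/-1 along an axis with wraparound yields its one-hop neighbors.
def neighbors(coord, dims):
    """Return the set of distinct nodes one hop away along the six axes."""
    result = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            result.add(tuple(n))
    return result

dims = (17, 18, 17, 2, 3, 2)   # xyz sizes illustrative; abc fixed at 2x3x2
print(len(neighbors((0, 0, 0, 0, 0, 0), dims)))  # 10 distinct links per node
```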
Storage and File System

The K computer's storage infrastructure was built around the Fujitsu Exabyte File System (FEFS), a high-performance parallel file system based on Lustre, tailored to manage the enormous data volumes produced by petascale simulations. FEFS employed a two-layer architecture consisting of a local file system for temporary, high-speed access and a global file system for large-scale, shared storage, with an initial capacity of several tens of petabytes scalable to the 100-petabyte class. This design allowed for efficient handling of datasets exceeding hundreds of terabytes, supporting the demands of scientific computing workloads.[22][23]

The storage hardware comprised thousands of Object Storage Server (OSS) nodes, including over 2,400 for the local file system and over 80 for the global file system, integrated with Fujitsu ETERNUS disk arrays configured in RAID5 for speed and RAID6 for capacity and redundancy. These OSS nodes delivered an aggregate bandwidth exceeding 1 TB/s, with measured read throughput reaching 1.31 TB/s on 80% of the system using the IOR benchmark, ensuring sustained high-performance I/O for parallel applications. The system incorporated six OSS per storage rack to distribute load and maintain scalability, connected via the Tofu interconnect for low-latency data transfer.[23][22]

Dedicated I/O nodes, functioning as OSS, handled data movement between the compute nodes and storage layers, minimizing contention and enabling asynchronous transfers through the Tofu network. This setup supported up to 20,000 OSS and 20,000 object storage targets (OSTs), allowing dynamic expansion without downtime. Integration with the job scheduler facilitated automatic file staging, where input data was transferred to local storage prior to job execution and output results were archived to the global system post-completion, optimizing overall workflow efficiency (see the sketch below).[22][24]

FEFS emphasized high-throughput access for large simulation datasets via Lustre extensions, including MPI-IO optimizations, file striping across up to 20,000 OSTs, and a 512 KB block size tuned for the system's interconnect. Reliability was enhanced through hardware-level redundancy, such as duplicated components and failover mechanisms, alongside software features like continuous journaling and automatic recovery to prevent data loss during intensive operations. These capabilities ensured robust performance in reliability-critical environments, with minimal downtime even under full-scale usage.[23][22]
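A hedged sketch of the stage-in / compute / stage-out pattern described above, in Python. The paths, file names, and the simulate executable are hypothetical placeholders; on the real system the scheduler performed this staging automatically rather than exposing it as user code:

```python
# Illustrative stage-in / run / stage-out workflow on a two-level file
# system. All paths and the "./simulate" binary are hypothetical.
import shutil
import subprocess

global_in = "/gfs/project/input.dat"    # shared, high-capacity global layer
local_dir = "/lfs/job1234"              # fast, job-local scratch layer

shutil.copy(global_in, f"{local_dir}/input.dat")           # stage-in
subprocess.run(["./simulate", f"{local_dir}/input.dat"],   # compute phase,
               check=True)                                 # I/O on local FS
shutil.copy(f"{local_dir}/output.dat",                     # stage-out back
            "/gfs/project/output.dat")                     # to the global FS
```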
Power Consumption and Efficiency

The K computer required a total power consumption of 12.66 MW at full load, encompassing both IT equipment and supporting infrastructure.[4] This high demand was managed through a dedicated power supply system, including cogeneration facilities and commercial grid connections, to ensure stable operation for sustained computational tasks.[15]

Cooling demands were addressed with a water-cooling system for critical components like CPUs, interconnect chips, and power supplies, supplemented by air conditioning, achieving a power usage effectiveness (PUE) of 1.34 during LINPACK testing.[15] This hierarchical cooling design distributed cold water at 15 ± 1°C to node-level components while using underfloor air distribution at the facility level, with high-efficiency fans contributing to overall energy savings compared to traditional air-only systems.[15] The setup supported dense packing of up to 96 compute nodes per rack, minimizing thermal hotspots and enabling reliable performance.[1]

Energy efficiency reached 824.6 GFLOPS/kW on the LINPACK benchmark in its June 2011 configuration, reflecting optimized hardware and cooling integration. This metric was bolstered by the low-power SPARC64 VIIIfx processors, each consuming 58 W while delivering 128 GFLOPS peak performance through techniques like clock gating and low-leakage transistors.[16] The full system later improved to approximately 830 GFLOPS/kW, highlighting the design's focus on balancing high throughput with reduced energy use.[4]

Environmental resilience was incorporated via seismic isolation using 49 laminated-rubber dampers, allowing the facility to withstand accelerations up to 200 Gal, equivalent to Japan Meteorological Agency intensity levels 5 (no damage) and upper 6 (minor damage), while optimizing power distribution for uninterrupted operations during potential disruptions.[15]
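Two quick calculations (Python) make these efficiency figures concrete; treating the 12.66 MW value as total facility power is an assumption consistent with the stated PUE definition, not a figure broken out in the sources:

```python
# Chip-level efficiency from the quoted figures: 128 GFLOPS at 58 W.
chip_gflops, chip_watts = 128.0, 58.0
print(round(chip_gflops / chip_watts, 2))  # ~2.21 GFLOPS per watt

# PUE = total facility power / IT equipment power. With PUE 1.34 and
# 12.66 MW total (assumed to be the facility-wide figure), the implied
# IT load is ~9.45 MW; the remainder went to cooling and power delivery.
total_mw, pue = 12.66, 1.34
print(round(total_mw / pue, 2))            # ~9.45 MW of IT load
```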
Performance and Benchmarks

TOP500 Rankings
The K computer achieved its first TOP500 ranking in June 2011, securing the number one position with an Rmax performance of 8.162 petaFLOPS on the LINPACK benchmark, calculated using 548,352 processor cores.[4] This result demonstrated 93.0% efficiency relative to its Rpeak of 8.774 petaFLOPS, surpassing China's Tianhe-1A system that had held the top spot.[25] The system's partial deployment at this stage highlighted the effectiveness of its SPARC64 VIIIfx processors and Tofu interconnect in delivering high sustained performance.[4]

By November 2011, following full deployment with 705,024 cores, the K computer retained the top ranking and became the first supercomputer to exceed 10 petaFLOPS in Rmax, recording 10.51 petaFLOPS against an Rpeak of 11.28 petaFLOPS.[26] This milestone underscored its dominance in the petaFLOPS era and maintained Japan's lead in supercomputing capability.[4]

The K computer held the number one position for two consecutive TOP500 lists before being overtaken by IBM's Sequoia in June 2012, dropping to number two with an unchanged Rmax of 10.51 petaFLOPS.[27] Over the subsequent years, it gradually declined in the rankings as faster systems emerged: number three in November 2012, number four from June 2013 to November 2015, number five in June 2016, number seven in November 2016, number eight in June 2017, number ten in November 2017, number sixteen in June 2018, and number eighteen in November 2018.[4] By June 2019, it had fallen to number twenty, reflecting the rapid advancement in global supercomputing performance while its own LINPACK score remained stable at 10.51 petaFLOPS Rmax.[4]

| Date | Rank | Rmax (petaFLOPS) | Cores |
|---|---|---|---|
| June 2011 | 1 | 8.162 | 548,352 |
| November 2011 | 1 | 10.51 | 705,024 |
| June 2012 | 2 | 10.51 | 705,024 |
| November 2018 | 18 | 10.51 | 705,024 |