from Wikipedia
Itanium
General information
Launched: June 2001[a]
Discontinued: January 30, 2020[1]
Marketed by: Intel
Designed by: Intel, Hewlett-Packard
Common manufacturer: Intel
Performance
Max. CPU clock rate: 733 MHz to 2.66 GHz
FSB speeds: 266 MT/s to 667 MT/s
QPI speeds: 4.8 GT/s to 6.4 GT/s
Data width: 64 bits
Address width: 64 bits
Virtual address width: 64 bits
Cache
L1 cache: up to 32 KB per core (data), up to 32 KB per core (instructions)
L2 cache: up to 256 KB per core (data), up to 1 MB per core (instructions)
L3 cache: up to 32 MB
L4 cache: 32 MB (Hondo only)
Architecture and classification
Application: high-end/mission-critical servers, high-performance computing, high-end workstations
Technology node: 180 nm to 32 nm
Microarchitecture: P7
Instruction set: IA-64
Physical specifications
Cores: 1, 2, 4 or 8
Memory (RAM): up to 1.5 TB, up to DDR3 with ECC support
Products, models, variants
Core names: Merced, McKinley, Madison 3M/6M/9M, Deerfield (Madison LV), Hondo[b], Fanwood (Madison DP), Montecito, Montvale, Tukwila, Poulson, Kittson
Models: Itanium, Itanium 2, Itanium 9000, 9100, 9300, 9500 and 9700 series
Support status: Unsupported

Itanium (/aɪˈteɪniəm/; eye-TAY-nee-əm) is a discontinued family of 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). The Itanium architecture originated at Hewlett-Packard (HP) and was later jointly developed by HP and Intel. Launched in June 2001, the processors were initially marketed by Intel for enterprise servers and high-performance computing systems. In the concept phase, engineers said "we could run circles around PowerPC...we could kill the x86". Early predictions were that IA-64 would expand to lower-end servers, supplanting Xeon, and eventually penetrate personal computers, supplanting reduced instruction set computing (RISC) and complex instruction set computing (CISC) architectures for all general-purpose applications.

When first released in 2001 after a decade of development, Itanium's performance was disappointing compared to better-established RISC and CISC processors. Emulation to run existing x86 applications and operating systems was particularly poor. Itanium-based systems were produced by HP and its successor Hewlett Packard Enterprise (HPE) as the Integrity Servers line, and by several other manufacturers. In 2008, Itanium was the fourth-most deployed microprocessor architecture for enterprise-class systems, behind x86-64, Power ISA, and SPARC.[6][needs update]

In February 2017, Intel released the final generation, Kittson, to test customers, and in May began shipping in volume.[7][8] It was only used in mission-critical servers from HPE.

In 2019, Intel announced that new orders for Itanium would be accepted until January 30, 2020, and shipments would cease by July 29, 2021.[1] This took place on schedule.[9]

Itanium never sold well outside enterprise servers and high-performance computing systems, and the architecture was ultimately supplanted by competitor AMD's x86-64 (also called AMD64) architecture. x86-64 is a compatible extension of the 32-bit x86 architecture, implemented by, for example, Intel's own Xeon line and AMD's Opteron line. By 2009, most servers were being shipped with x86-64 processors, and they dominate the low-cost desktop and laptop markets which were not initially targeted by Itanium.[10] In an article titled "Intel's Itanium is finally dead: The Itanic sunken by the x86 juggernaut", TechSpot declared that "Itanium's promise ended up sunken by a lack of legacy 32-bit support and difficulties in working with the architecture for writing and maintaining software", while the dream of a single dominant ISA would instead be realized by the AMD64 extensions.[11]

History


Development: 1989–2001


Inception: 1989–1994


In 1989, HP began researching an architecture that would overcome the expected limits of reduced instruction set computer (RISC) architectures, which stemmed from the sharply increasing complexity needed to execute multiple instructions per cycle while performing dynamic dependency checking and precise exception handling.[c] HP hired Bob Rau of Cydrome and Josh Fisher of Multiflow, the pioneers of very long instruction word (VLIW) computing. One VLIW instruction word can contain several independent instructions, which can be executed in parallel without having to evaluate them for independence. A compiler must attempt to find valid combinations of instructions that can be executed at the same time, effectively performing the instruction scheduling that conventional superscalar processors must do in hardware at runtime.

HP researchers modified classic VLIW into a new type of architecture, later named Explicitly Parallel Instruction Computing (EPIC), which differs in three ways: template bits indicate which instructions are independent within and between bundles of three instructions, enabling the explicitly parallel execution of multiple bundles and allowing the processor's issue width to grow without recompilation; predication of instructions reduces the need for branches; and full interlocking eliminates delay slots. In EPIC, the assignment of execution units to instructions and the timing of their issue can be decided by hardware, unlike in classic VLIW. HP intended to use these features in PA-WideWord, the planned successor to its PA-RISC ISA. EPIC was intended to provide the best balance between the efficient use of silicon area and electricity, and general-purpose flexibility.[13][14] In 1993 HP held an internal competition to design the best (simulated) microarchitectures of a RISC and an EPIC type, led by Jerry Huck and Rajiv Gupta respectively. The EPIC team won, with over double the simulated performance of the RISC competitor.[15]
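The compile-time dependency analysis described above can be sketched in miniature. The following is an illustrative model, not HP's or Intel's actual compiler: it splits a straight-line sequence of register-to-register instructions into independent groups, whose boundaries an EPIC compiler would encode via the template bits (IA-64 calls such a boundary a "stop"). The hazard rule used here is deliberately simplified; real IA-64 grouping rules are more permissive (for example, write-after-read within a group is legal).

```python
# Illustrative sketch of EPIC-style compile-time grouping: mark where
# independent instruction groups end, so the hardware never performs
# dependency checking at runtime. Each instruction is (dest, srcs).
# A new group starts whenever an instruction reads or overwrites a
# register already written earlier in the current group (RAW/WAW).

def split_into_groups(instructions):
    groups, current, written = [], [], set()
    for dest, srcs in instructions:
        if dest in written or any(s in written for s in srcs):
            groups.append(current)           # a "stop" would go here
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        groups.append(current)
    return groups

# r1 = r2+r3 and r4 = r5+r6 are independent and share a group;
# r7 = r1+r4 depends on both results, so it starts a new group.
prog = [("r1", ("r2", "r3")),
        ("r4", ("r5", "r6")),
        ("r7", ("r1", "r4"))]
print(split_into_groups(prog))   # -> two groups: [[i1, i2], [i3]]
```

A superscalar RISC must discover the same grouping with dedicated hardware on every execution; EPIC's bet was that doing it once, at compile time, would free silicon for more execution units.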

At the same time, Intel was also looking for ways to make better ISAs. In 1989 Intel had launched the i860, which it marketed for workstations, servers, and the iPSC and Paragon supercomputers. It differed from other RISCs in being able to switch between a normal single-instruction-per-cycle mode and a mode where pairs of instructions are explicitly defined as parallel, so that they execute in the same cycle without any dependency checking. Another distinguishing feature was its instructions for an exposed floating-point pipeline, which enabled triple the throughput of the conventional floating-point instructions. Both of these features went largely unused because compilers did not support them, a problem that later challenged Itanium too. Without them, the i860's parallelism (and thus performance) was no better than that of other RISCs, so it failed in the market. Itanium would adopt a more flexible form of explicit parallelism than the i860 had.[16]

In November 1993 HP approached Intel, seeking collaboration on an innovative future architecture.[17][19] At the time Intel was looking to extend x86 to 64 bits in a processor codenamed P7, which they found challenging.[20] Later Intel claimed that four different design teams had explored 64-bit extensions, but each of them concluded that it was not economically feasible.[21] At the meeting with HP, Intel's engineers were impressed when Jerry Huck and Rajiv Gupta presented the PA-WideWord architecture they had designed to replace PA-RISC. "When we saw WideWord, we saw a lot of things we had only been looking at doing, already in their full glory", said Intel's John Crawford, who in 1994 became the chief architect of Merced, and who had earlier argued against extending the x86 with P7. HP's Gupta recalled: "I looked Albert Yu [Intel's general manager for microprocessors] in the eyes and showed him we could run circles around PowerPC, that we could kill PowerPC, that we could kill the x86."[22] Soon Intel and HP started conducting in-depth technical discussions at an HP office, where each side had six[25] engineers who exchanged and discussed both companies' confidential architectural research. They then decided to use not only PA-WideWord, but also the more experimental HP Labs PlayDoh as the source of their joint future architecture.[12][26] Convinced of the superiority of the new project, in 1994 Intel canceled their existing plans for P7.

In June 1994 Intel and HP announced their joint effort to make a new ISA that would adopt ideas of Wide Word and VLIW. Yu declared: "If I were competitors, I'd be really worried. If you think you have a future, you don't."[22] On P7's future, Intel said the alliance would impact it, but "it is not clear" whether it would "fully encompass the new architecture".[27][28] Later the same month, Intel said that some of the first features of the new architecture would start appearing on Intel chips as early as the P7, but the full version would appear sometime later.[29] In August 1994 EE Times reported that Intel told investors that P7 was being re-evaluated and possibly canceled in favor of the HP processor. Intel immediately issued a clarification, saying that P7 is still being defined, and that HP may contribute to its architecture. Later it was confirmed that the P7 codename had indeed passed to the HP-Intel processor. By early 1996 Intel revealed its new codename, Merced.[30][31]

HP believed that it was no longer cost-effective for individual enterprise systems companies such as itself to develop proprietary microprocessors, so it partnered with Intel in 1994 to develop the IA-64 architecture, derived from EPIC. Intel was willing to undertake the very large development effort on IA-64 in the expectation that the resulting microprocessor would be used by the majority of enterprise systems manufacturers. HP and Intel initiated a large joint development effort with a goal of delivering the first product, Merced, in 1998.[14]

Design and delays: 1994–2001


Merced was designed by a team of 500, which Intel later admitted was too inexperienced, with many recent college graduates. Crawford (Intel) was the chief architect, while Huck (HP) held the second position. Early in development, HP and Intel disagreed when Intel wanted more dedicated hardware for more floating-point instructions; HP prevailed after the discovery of a floating-point hardware bug in Intel's Pentium. When Merced was floorplanned for the first time in mid-1996, it turned out to be far too large; "this was a lot worse than anything I'd seen before", said Crawford. The designers had to reduce the complexity (and thus performance) of subsystems, including the x86 unit, and cut the L2 cache to 96 KB.[d] Eventually it was agreed that the size target could only be reached by using the 180 nm process instead of the intended 250 nm. Later, problems emerged with attempts to speed up the critical paths without disturbing the other circuits' speed. Merced was taped out on 4 July 1999, and in August Intel produced the first complete test chip.[22]

The expectations for Merced waned over time as delays and performance deficiencies emerged, shifting the focus and onus for success onto the HP-led second Itanium design, codenamed McKinley. In July 1997 the switch to the 180 nm process delayed Merced into the second half of 1999.[32] Shortly before the reveal of EPIC at the Microprocessor Forum in October 1997, a Microprocessor Report analyst said that Itanium would "not show the competitive performance until 2001. It will take the second version of the chip for the performance to get shown".[33] At the Forum, Intel's Fred Pollack originated the "wait for McKinley" mantra when he said that it would double Merced's performance and would "knock your socks off",[34][35] while using the same 180 nm process as Merced.[36] Pollack also said that Merced's x86 performance would be lower than that of the fastest x86 processors, and that x86 would "continue to grow at its historical rates".[34] Intel said that IA-64 would not have much presence in the consumer market for 5 to 10 years.[37]

Later it was reported that HP's motivation when starting to design McKinley in 1996 was to have more control over the project so as to avoid the issues affecting Merced's performance and schedule.[38][39] The design team finalized McKinley's project goals in 1997.[40] In late May 1998 Merced was delayed to mid-2000, and by August 1998 analysts were questioning its commercial viability, given that McKinley would arrive shortly after with double the performance, as delays were causing Merced to turn into simply a development vehicle for the Itanium ecosystem. The "wait for McKinley" narrative was becoming prevalent.[41] The same day it was reported that due to the delays, HP would extend its line of PA-RISC PA-8000 series processors from PA-8500 to as far as PA-8900.[42] In October 1998 HP announced its plans for four more generations of PA-RISC processors, with PA-8900 set to reach 1.2 GHz in 2003.[43]

By March 1999 some analysts expected Merced to ship in volume only in 2001, but the volume was widely expected to be low as most customers would wait for McKinley.[38] In May 1999, two months before Merced's tape-out, an analyst said that failure to tape out before July would result in another delay.[44] In July 1999, upon reports that the first silicon would be made in late August, analysts predicted a delay to late 2000, and came to agree that Merced would be used chiefly for debugging and testing the IA-64 software. Linley Gwennap of MPR said of Merced that "at this point, everyone is expecting it's going to be late and slow, and the real advance is going to come from McKinley. What this does is puts a lot more pressure on McKinley and for that team to deliver".[45] By then, Intel had revealed that Merced would be initially priced at $5000.[46] In August 1999 HP advised some of its customers to skip Merced and wait for McKinley.[47] By July 2000 HP told the press that the first Itanium systems would be for niche uses, and that "You're not going to put this stuff near your data center for several years"; HP expected its Itanium systems to outsell the PA-RISC systems only in 2005.[48] The same July, Intel announced another delay, due to a stepping change to fix bugs: only "pilot systems" would ship that year, while general availability was pushed to the "first half of 2001". Server makers had largely forgone spending on R&D for Merced-based systems, instead using motherboards or whole servers of Intel's design. To foster a wide ecosystem, by mid-2000 Intel had provided 15,000 Itaniums in 5,000 systems to software developers and hardware designers.[49] In March 2001 Intel said Itanium systems would begin shipping to customers in the second quarter, followed by a broader deployment in the second half of the year. By then even Intel publicly acknowledged that many customers would wait for McKinley.[50]

[Chart: Itanium server sales forecast history[51][52]]

Expectations


During development, Intel, HP, and industry analysts predicted that IA-64 would dominate first in 64-bit servers and workstations, then expand to lower-end servers, supplanting Xeon, and finally penetrate personal computers, eventually supplanting RISC and complex instruction set computing (CISC) architectures for all general-purpose applications, though not replacing x86 "for the foreseeable future" according to Intel.[53][15][54][55][56][57] In 1997–1998, Intel CEO Andy Grove predicted that Itanium would not come to desktop computers for four or five years after launch, and said "I don't see Merced appearing on a mainstream desktop inside of a decade".[58][15] In contrast, Itanium was expected to capture 70% of the 64-bit server market in 2002.[59] Already in 1998, Itanium's focus on the high end of the computer market was criticized for making it vulnerable to challengers expanding from lower-end market segments, but many people in the computer industry feared voicing doubts about Itanium for fear of Intel's retaliation.[15] Compaq and Silicon Graphics decided to abandon further development of the Alpha and MIPS architectures respectively, in favor of migrating to IA-64.[60]

Several groups ported operating systems to the architecture, including Microsoft Windows, OpenVMS, Linux, HP-UX, Solaris,[61][62][63] Tru64 UNIX,[60] and Monterey/64.[64] The latter three were canceled before reaching the market. By 1997, it was apparent that the IA-64 architecture and the compiler were much more difficult to implement than originally thought, and the delivery timeframe of Merced began slipping.[45]

Intel announced the official name of the processor, Itanium, on October 4, 1999.[65] Within hours, the name Itanic had been coined on a Usenet newsgroup, a reference to the RMS Titanic, the "unsinkable" ocean liner that sank on her maiden voyage in 1912.[66] "Itanic" was then used often by The Register,[67] and others,[68][69][70] to imply that the multibillion-dollar investment in Itanium—and the early hype associated with it—would be followed by its relatively quick demise.

Itanium (Merced): 2001

Itanium (Merced)
Itanium processor
General information
Launched: 29 May – June 2001
Discontinued: 10 April 2003[71]
Common manufacturer: Intel
Performance
Max. CPU clock rate: 733 to 800 MHz
FSB speeds: 266 MT/s
Cache
L2 cache: 96 KB
L3 cache: 2 or 4 MB
Physical specifications
Cores: 1

After sampling 40,000 chips to partners, Intel launched Itanium on May 29, 2001, with the first OEM systems from HP, IBM and Dell shipping to customers in June.[72][73] By then Itanium's performance was not superior to that of competing RISC and CISC processors.[74] Itanium competed at the low end (primarily four-CPU and smaller systems) with servers based on x86 processors, and at the high end with IBM POWER and Sun Microsystems SPARC processors. Intel repositioned Itanium to focus on the high-end business and HPC computing markets, attempting to duplicate x86's successful "horizontal" market (i.e., single architecture, multiple systems vendors). The success of this initial processor version was limited to replacing PA-RISC in HP systems, Alpha in Compaq systems and MIPS in SGI systems, though IBM also delivered a supercomputer based on this processor.[75] POWER and SPARC remained strong, while the 32-bit x86 architecture continued to grow into the enterprise space, building on the economies of scale fueled by its enormous installed base.

Only a few thousand systems using the original Merced Itanium processor were sold, due to relatively poor performance, high cost and limited software availability.[76] Recognizing that the lack of software could be a serious problem for the future, Intel made thousands of these early systems available to independent software vendors (ISVs) to stimulate development. HP and Intel brought the next-generation Itanium 2 processor to the market a year later. A few of Merced's microarchitectural features would be carried over to all subsequent Itanium designs, including the 16+16 KB L1 cache size and the 6-wide (two-bundle) instruction decoding.

Itanium 2 (McKinley and Madison): 2002–2006

Itanium 2 (McKinley and Madison)
Itanium 2 processor
General information
Launched: 8 July 2002
Discontinued: 16 November 2007[80]
Designed by: HP and Intel
Product code: McKinley, Madison, Deerfield, Madison 9M, Fanwood
Performance
Max. CPU clock rate: 900 to 1667 MHz
FSB speeds: 400 to 667 MT/s
Cache
L2 cache: 256 KB
L3 cache: 1.5–9 MB
Architecture and classification
Technology node: 180 nm to 130 nm
Physical specifications
Cores: 1

The Itanium 2 processor was released in July 2002, and was marketed for enterprise servers rather than for the whole gamut of high-end computing. The first Itanium 2, code-named McKinley, was jointly developed by HP and Intel, led by the HP team at Fort Collins, Colorado, and taped out in December 2000. It relieved many of the performance problems of the original Itanium processor, which were mostly caused by an inefficient memory subsystem, by approximately halving the latency and doubling the fill bandwidth of each of the three levels of cache, while expanding the L2 cache from 96 to 256 KB. Floating-point data is excluded from the L1 cache, because the L2 cache's higher bandwidth is more beneficial to typical floating-point applications than low latency. The L3 cache was integrated on-chip rather than on a separate die, tripling in associativity and doubling in bus width. McKinley also greatly increased the number of possible instruction combinations in a VLIW bundle and reached 25% higher frequency, despite having only eight pipeline stages versus Merced's ten.[81][40]

McKinley contained 221 million transistors (of which 25 million were for logic and 181 million for L3 cache), measured 19.5 mm by 21.6 mm (421 mm²), and was fabricated in a 180 nm bulk CMOS process with six layers of aluminium metallization.[82][83][84] In May 2003 it was disclosed that some McKinley processors could suffer from a critical-path erratum leading to system crashes, which could be avoided by lowering the processor frequency to 800 MHz.[85]

In 2003, AMD released the Opteron CPU, which implements its own 64-bit architecture called AMD64. The Opteron gained rapid acceptance in the enterprise server space because it provided an easy upgrade from x86. Under the influence of Microsoft, Intel responded by implementing AMD's x86-64 instruction set architecture instead of IA-64 in its Xeon microprocessors in 2004, resulting in a new industry-wide de facto standard.[60]

In 2003, Intel released a new Itanium 2 family member, codenamed Madison, initially with frequencies up to 1.5 GHz and 6 MB of L3 cache. The Madison 9M chip released in November 2004 had 9 MB of L3 cache and frequencies up to 1.6 GHz, reaching 1.67 GHz in July 2005. Both chips used a 130 nm process and were the basis of all new Itanium processors until Montecito was released in July 2006, with Deerfield being a low-wattage Madison, and Fanwood a version of Madison 9M for lower-end servers with one or two CPU sockets.

In November 2005, the major Itanium server manufacturers joined with Intel and a number of software vendors to form the Itanium Solutions Alliance, to promote the architecture and accelerate the software porting effort.[86] The Alliance announced that its members would invest $10 billion in Itanium solutions by the end of the decade.[87]

Itanium 2 9000 and Itanium 9100: 2006 and 2007

9000 and 9100 series
Intel Itanium 2 9000 (heat spreader removed)
General information
Launched: 18 July 2006
Discontinued: 26 August 2011[88]
Product code: Montecito, Montvale
Performance
Max. CPU clock rate: 1.4 GHz to 1.67 GHz
FSB speeds: 400 to 667 MT/s
Cache
L2 cache: 256 KB (D) + 1 MB (I)
L3 cache: 6–24 MB
Architecture and classification
Technology node: 90 nm
Physical specifications
Cores: 1 or 2

In early 2003, due to the success of IBM's dual-core POWER4, Intel announced that the first 90 nm Itanium processor, codenamed Montecito, would be delayed to 2005 so that it could be changed into a dual-core design, thus merging it with the Chivano project.[89][90] In September 2004 Intel demonstrated a working Montecito system, and claimed that the inclusion of hyper-threading increased Montecito's performance by 10–20% and that its frequency could reach 2 GHz.[91][92] After a delay to "mid-2006" and a reduction of the frequency to 1.6 GHz,[93] on July 18 Intel delivered Montecito (marketed as the Itanium 2 9000 series), a dual-core processor with switch-on-event multithreading and split 256 KB + 1 MB L2 caches that roughly doubled the performance and decreased the energy consumption by about 20 percent.[94] At a 596 mm² die size and 1.72 billion transistors, it was the largest microprocessor at the time. It was supposed to feature Foxton Technology, a very sophisticated frequency regulator, which failed to pass validation and was thus not enabled for customers.

Intel released the Itanium 9100 series, codenamed Montvale, in November 2007, retiring the "Itanium 2" brand.[95] Originally intended to use the 65 nm process,[96] it was changed into a fixed version of Montecito, enabling demand-based switching (like EIST) and up to a 667 MT/s front-side bus, both originally intended for Montecito, plus core-level lockstep.[91] Montecito and Montvale were the last Itanium processors in whose design Hewlett-Packard's engineering team at Fort Collins had a key role, as the team was subsequently transferred to Intel's ownership.[97]

Itanium 9300 (Tukwila): 2010

9300 series
General information
Launched: 8 February 2010
Discontinued: 2nd quarter of 2014
Performance
Max. CPU clock rate: 1.33 to 1.73 GHz
Cache
L2 cache: 256 KB (D) + 512 KB (I)
L3 cache: 10–24 MB
Architecture and classification
Technology node: 65 nm
Physical specifications
Cores: 2 or 4

9500 and 9700 series
General information
Launched: 8 November 2012
Discontinued: 30 January 2020[98]
Product code: Poulson, Kittson
Performance
Max. CPU clock rate: 1.73 to 2.67 GHz
Cache
L2 cache: 256 KB (D) + 512 KB (I)
L3 cache: 20–32 MB
Architecture and classification
Technology node: 32 nm
Physical specifications
Cores: 4 or 8

[Photos: Intel Itanium 9300 CPU in socket Intel LGA 1248, with and without cap]

The original code name for the first Itanium with more than two cores was Tanglewood, but it was changed to Tukwila in late 2003 due to trademark issues.[99][100] Intel discussed a "middle-of-the-decade Itanium" to succeed Montecito, achieving ten times the performance of Madison.[101][90] It was being designed by the famed DEC Alpha team and was expected to have eight new multithreading-focused cores. Intel claimed "a lot more than two" cores and more than seven times the performance of Madison.[102][103][104] In early 2004 Intel told of "plans to achieve up to double the performance over the Intel Xeon processor family at platform cost parity by 2007".[105] By early 2005 Tukwila was redefined, now having fewer cores but focusing on single-threaded performance and multiprocessor scalability.[106]

In March 2005, Intel disclosed some details of Tukwila, the next Itanium processor after Montvale, to be released in 2007. Tukwila would have four processor cores and would replace the Itanium bus with a new Common System Interface, which would also be used by a new Xeon processor.[107] Tukwila was to have a "common platform architecture" with a Xeon codenamed Whitefield,[96] which was canceled in October 2005,[108] when Intel revised Tukwila's delivery date to late 2008.[109] In May 2009, the schedule for Tukwila was revised again, with the release to OEMs planned for the first quarter of 2010.[110] The Itanium 9300 series processor, codenamed Tukwila, was released on February 8, 2010, with greater performance and memory capacity.[111]

The device uses a 65 nm process, includes two to four cores, up to 24 MB on-die caches, Hyper-Threading technology and integrated memory controllers. It implements double-device data correction, which helps to fix memory errors. Tukwila also implements Intel QuickPath Interconnect (QPI) to replace the Itanium bus-based architecture. It has a peak interprocessor bandwidth of 96 GB/s and a peak memory bandwidth of 34 GB/s. With QuickPath, the processor has integrated memory controllers and interfaces the memory directly, using QPI interfaces to directly connect to other processors and I/O hubs. QuickPath is also used on Intel x86-64 processors using the Nehalem microarchitecture, which possibly enabled Tukwila and Nehalem to use the same chipsets.[112] Tukwila incorporates two memory controllers, each of which has two links to Scalable Memory Buffers, which in turn support multiple DDR3 DIMMs,[113] much like the Nehalem-based Xeon processor code-named Beckton.[114]
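The quoted 96 GB/s peak interprocessor figure can be reconciled with the 4.8 GT/s QPI speed by simple arithmetic. This back-of-envelope check assumes (as a labeled assumption, not stated in the text above) that Tukwila provides four full-width and two half-width QPI links, with a full-width link carrying 2 bytes per direction per transfer:

```python
# Sanity check of Tukwila's peak QPI bandwidth (assumption: 4 full-width
# + 2 half-width links at 4.8 GT/s; full-width QPI moves 2 bytes per
# direction per transfer, and bandwidth is counted in both directions).
gt_per_s = 4.8e9                       # transfers per second per link
full_link = gt_per_s * 2 * 2           # 2 bytes/transfer x 2 directions = 19.2 GB/s
half_link = full_link / 2              # 9.6 GB/s
peak = 4 * full_link + 2 * half_link   # aggregate across all links
print(peak / 1e9)                      # -> 96.0 (GB/s)
```

The result matches the 96 GB/s figure in the text, which suggests the quoted number is a bidirectional aggregate across all links rather than the throughput of any single link.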

HP vs. Oracle

[edit]

During the 2012 Hewlett-Packard Co. v. Oracle Corp. support lawsuit, court documents unsealed by a Santa Clara County Court judge revealed that in 2008, Hewlett-Packard had paid Intel around $440 million to keep producing and updating Itanium microprocessors from 2009 to 2014. In 2010, the two companies signed another $250 million deal, which obliged Intel to continue making Itanium CPUs for HP's machines until 2017. Under the terms of the agreements, HP had to pay for the chips it got from Intel, while Intel would launch the Tukwila, Poulson, Kittson, and Kittson+ chips in a bid to gradually boost the platform's performance.[115][116]

Itanium 9500 (Poulson): 2012


Intel first mentioned Poulson on March 1, 2005, at the Spring IDF.[117] In June 2007 Intel said that Poulson would use a 32 nm process technology, skipping the 45 nm process.[118] This was necessary for catching up after Itanium's delays left it at 90 nm competing against 65 nm and 45 nm processors.

At ISSCC 2011, Intel presented a paper called "A 32nm 3.1 Billion Transistor 12-Wide-Issue Itanium Processor for Mission Critical Servers."[119][120] Analyst David Kanter speculated that Poulson would use a new microarchitecture, with a more advanced form of multithreading that uses up to two threads, to improve performance for single threaded and multithreaded workloads.[121] Some information was also released at the Hot Chips conference.[122][123]

The information presented included multithreading improvements, resiliency improvements (Intel Instruction Replay RAS) and a few new instructions (thread priority, an integer instruction, cache prefetching, and data access hints).

Poulson was released on November 8, 2012, as the Itanium 9500 series processor. It is the follow-on processor to Tukwila. It features eight cores and has a 12-wide issue architecture, multithreading enhancements, and new instructions to take advantage of parallelism, especially in virtualization.[112][124][125] Poulson's 32 MB L3 cache is shared by all cores, not divided as previously. The L2 cache is 512 KB (instruction) plus 256 KB (data) per core, 6 MB in total.[119] The die size is 544 mm², less than that of its predecessor Tukwila (698.75 mm²).[126][127]
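The per-core and total L2 figures above are consistent with each other, as a quick check shows:

```python
# Cross-check of the Poulson L2 cache figures: 512 KB instruction +
# 256 KB data per core, over 8 cores, should equal the quoted 6 MB total.
cores = 8
l2_per_core_kb = 512 + 256             # I-side + D-side, in KB
total_l2_mb = cores * l2_per_core_kb / 1024
print(total_l2_mb)                     # -> 6.0 (MB)
```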

Intel's Product Change Notification (PCN) 111456-01 listed four models of Itanium 9500 series CPUs, which were later removed in a revised document.[128] The parts were later listed in Intel's Material Declaration Data Sheets (MDDS) database.[129] Intel later posted the Itanium 9500 reference manual.[130]

The models are the following:[128][131]

Processor number  Frequency  Cache
9520              1.73 GHz   20 MB
9540              2.13 GHz   24 MB
9550              2.40 GHz   32 MB
9560              2.53 GHz   32 MB

Itanium 9700 (Kittson): 2017


Intel had committed to at least one more generation after Poulson, first mentioning Kittson on 14 June 2007.[118] Kittson was supposed to be on a 22 nm process and use the same LGA2011 socket and platform as Xeons.[132][133][134] On 31 January 2013 Intel issued an update to their plans for Kittson: it would have the same LGA1248 socket and 32 nm process as Poulson, thus effectively halting any further development of Itanium processors.[135]

In April 2015, Intel, although it had not yet confirmed formal specifications, did confirm that it continued to work on the project.[136] Meanwhile, the aggressively multicore Xeon E7 platform displaced Itanium-based solutions in the Intel roadmap.[137] Even Hewlett-Packard, the main proponent and customer for Itanium, began selling x86-based Superdome and NonStop servers, and started to treat the Itanium-based versions as legacy products.[138][139]

Intel officially launched the Itanium 9700 series processor family on May 11, 2017.[140][8] Kittson has no microarchitectural improvements over Poulson; despite nominally having a different stepping, it is functionally identical to the 9500 series, even having exactly the same bugs, the only difference being the 133 MHz higher frequency of the 9760 and 9750 over the 9560 and 9550 respectively.[141][142]

Intel announced that the 9700 series would be the last Itanium chips produced.[7][8]

The models are:[143]

Processor number Cores Threads Frequency Cache
9720 4 8 1.73 GHz 20 MB
9740 8 16 2.13 GHz 24 MB
9750 4 8 2.53 GHz 32 MB
9760 8 16 2.66 GHz 32 MB

Market share


Compared to its Xeon family of server processors, Itanium was never a high-volume product for Intel. Intel does not release production numbers, but one industry analyst estimated that the production rate was 200,000 processors per year in 2007.[144]

According to Gartner Inc., the total number of Itanium servers (not processors) sold by all vendors in 2007 was about 55,000 (it is unclear whether clustered servers were counted as one server or several). This compares with 417,000 RISC servers (across all RISC vendors) and 8.4 million x86 servers. IDC reports that a total of 184,000 Itanium-based systems were sold from 2001 through 2007. For the combined POWER/SPARC/Itanium systems market, IDC reports that POWER captured 42% of revenue and SPARC 32%, while Itanium-based system revenue reached 26% in the second quarter of 2008.[145] According to an IDC analyst, in 2007 HP accounted for perhaps 80% of Itanium systems revenue.[94] According to Gartner, in 2008 HP accounted for 95% of Itanium sales.[146] HP's Itanium system sales ran at an annual rate of $4.4 billion at the end of 2008 and declined to $3.5 billion by the end of 2009,[10] compared to a 35% decline in UNIX system revenue for Sun and an 11% drop for IBM, while x86-64 server revenue increased 14% over the same period.

In December 2012, IDC released a research report stating that Itanium server shipments would remain flat through 2016, with annual shipment of 26,000 systems (a decline of over 50% compared to shipments in 2008).[147]

Hardware support


Systems

Server manufacturers' Itanium products
Company | From | To | Last product | CPUs
HP/HPE | 2001 | 2021 | Integrity | 1–256
Compaq | 2001 | 2002 | ProLiant 590 | 1–4
IBM | 2001 | 2005 | System x455 | 1–16
Dell | 2001 | | PowerEdge 7250 | 1–4
Hitachi | 2001 | 2008 | BladeSymphony 1000 | 1–8
Unisys | 2002 | 2009 | ES7000/one | 1–32
SGI | 2001 | 2011 | Altix 4000 | 1–2048
Fujitsu | 2005 | | PRIMEQUEST | 1–32
Bull | 2002 | pre-2015 | NovaScale 9410 | 1–32
NEC | 2002 | 2012 | nx7700i | 1–256
Inspur | 2010 | pre-2015 | TS10000 | 2–1024
Huawei | 2012 | pre-2015 | ? | ?

By 2006, HP manufactured at least 80% of all Itanium systems, and sold 7,200 in the first quarter of 2006.[148] The bulk of systems sold were enterprise servers and machines for large-scale technical computing, with an average selling price per system in excess of US$200,000. A typical system used eight or more Itanium processors.

By 2012, only a few manufacturers offered Itanium systems, including HP, Bull, NEC, Inspur and Huawei. In addition, Intel offered a chassis that could be used by system integrators to build Itanium systems.[149]

By 2015, only HP supplied Itanium-based systems.[136] When HP split in late 2015, Itanium systems (branded as Integrity)[150] were handled by Hewlett Packard Enterprise (HPE), with a major update in 2017 (Integrity i6, and HP-UX 11i v3 Update 16). HPE also supports a few other operating systems, including Windows up to Server 2008 R2, Linux, OpenVMS and NonStop. Itanium is not affected by Spectre or Meltdown.[151]

Chipsets


Prior to the 9300-series (Tukwila), chipsets were needed to connect the processor to main memory and I/O devices, as the front-side bus to the chipset was the processor's sole operational connection.[e] Two generations of buses existed: the original Itanium processor system bus (a.k.a. Merced bus) had a 64-bit data width and a 133 MHz clock with DDR signaling (266 MT/s); it was soon superseded by the 128-bit, 200 MHz DDR (400 MT/s) Itanium 2 processor system bus (a.k.a. McKinley bus), which later reached 533 and 667 MT/s. Up to four CPUs could share a single bus, but prior to the 9000-series, bus speeds above 400 MT/s were limited to two processors per bus.[152][153] As no Itanium chipset could connect more than four sockets, high-end servers needed multiple interconnected chipsets.
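As a sanity check on the figures above, peak front-side bus bandwidth is simply data width times transfer rate. An illustrative Python calculation (not from the source):

```python
def bus_bandwidth_gbs(width_bits: int, transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s for a parallel bus: data width (bits) / 8 * transfer rate."""
    return width_bits / 8 * transfers_per_s / 1e9

# Merced bus: 64-bit data path at 266 MT/s (133 MHz, double-pumped)
print(round(bus_bandwidth_gbs(64, 266e6), 1))    # ~2.1 GB/s
# McKinley bus: 128-bit data path at 400 MT/s
print(round(bus_bandwidth_gbs(128, 400e6), 1))   # 6.4 GB/s
```

The same arithmetic explains why the later 533 and 667 MT/s speed grades raised peak bandwidth to roughly 8.5 and 10.7 GB/s on the 128-bit bus.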

The "Tukwila" Itanium processor model had been designed to share a common chipset with the Intel Xeon processor EX (Intel's Xeon processor designed for four processor and larger servers). The goal was to streamline system development and reduce costs for server OEMs, many of which develop both Itanium- and Xeon-based servers. However, in 2013, this goal was pushed back to be "evaluated for future implementation opportunities".[154]

Before on-chip memory controllers and QPI, enterprise server manufacturers differentiated their systems by designing and developing chipsets that interface the processor to memory, interconnects, and peripheral controllers. "Enterprise server" referred to the then-lucrative market segment of high-end servers with high reliability, availability and serviceability, typically with 16 or more processor sockets; such systems justified their pricing with a custom system-level architecture built around vendor-designed chipsets, offering capabilities far beyond what two-socket "commodity servers" could. Developing a chipset costs tens of millions of dollars, so doing one represented a major commitment to the use of Itanium.

Neither Intel nor IBM developed Itanium 2 chipsets supporting newer technologies such as DDR2 or PCI Express.[155] Before "Tukwila" moved away from the FSB, chipsets supporting such technologies were instead manufactured by the Itanium server vendors themselves, such as HP, Fujitsu, SGI, NEC, and Hitachi.

Intel


The first generation of Itanium received no vendor-specific chipsets, only Intel's 460GX, consisting of ten distinct chips. It supported up to four CPUs and 64 GB of memory at 4.2 GB/s, twice the system bus's bandwidth; addresses and data were handled by two different chips. The 460GX provided an AGP 4× graphics bus, two 64-bit 66 MHz PCI buses, and a 33 MHz bus configurable as dual 32-bit or a single 64-bit PCI bus.[156]

There were many custom chipset designs for Itanium 2, but many smaller vendors chose Intel's E8870 chipset. It supports 128 GB of DDR SDRAM at 6.4 GB/s. It was originally designed for Rambus RDRAM serial memory, but when RDRAM failed in the market, Intel added four DDR SDRAM-to-RDRAM converter chips to the chipset.[157] When Intel had previously made such a converter for the Pentium III chipsets 820 and 840, it drastically cut performance.[158][159] The E8870 provides eight 133 MHz PCI-X buses (4.2 GB/s total because of bottlenecks) and an ICH4 hub with six USB 2.0 ports. Two E8870s can be linked by two E8870SP Scalability Port Switches, each containing a 1 MB (~200,000 cache lines) snoop filter, to create an 8-socket system with double the memory and PCI-X capacity, but still only one ICH4. Further expansion to 16 sockets was planned.[160][161][162] In 2004 Intel revealed plans for its next Itanium chipset, codenamed Bayshore, to support PCIe and DDR2 memory, but canceled it the same year.[163][155]

Hewlett-Packard


HP designed four different chipsets for Itanium 2: zx1, sx1000, zx2 and sx2000. All support four sockets per chipset, but sx1000 and sx2000 allow up to 16 chipsets to be interconnected, creating up to a 64-socket system. Developed alongside Itanium 2 itself, booting the first Itanium 2 samples in February 2001,[164] zx1 became the first available Itanium 2 chipset and, in 2004, also the first to support a 533 MT/s FSB. In its basic two-chip version it directly provides four channels of DDR-266 memory, giving 8.5 GB/s of bandwidth and 32 GB of capacity across 12 DIMM slots.[165] In versions with memory expander boards, memory bandwidth reaches 12.8 GB/s; maximum capacity was 96 GB with the initial two-board, 48-DIMM expanders and up to 128 GB with the later single-board, 32-DIMM expander. The expanders increase memory latency by 25 ns over the base 80 ns. Eight independent links went to PCI-X and other peripheral devices (e.g. AGP in workstations), totaling 4 GB/s.[166][167]

HP's first high-end Itanium chipset was sx1000, launched in mid-2003 with the Integrity Superdome flagship server. It has two independent front-side buses, each bus supporting two sockets, giving 12.8 GB/s of combined bandwidth from the processors to the chipset. It has four links to data-only memory buffers and supports 64 GB of HP-designed 125 MHz memory at 16 GB/s. The above components form a system board called a cell. Two cells can be directly connected together to create an 8-socket glueless system. To connect four cells together, a pair of 8-ported crossbar switches is needed (adding 64 ns to inter-cell memory accesses), while four such pairs of crossbar switches are needed for the top-end system of 16 cells (64 sockets), giving 32 GB/s of bisection bandwidth. Cells maintain cache coherence through in-memory directories, which causes the minimum memory latency to be 241 ns. The latency to the most remote (NUMA) memory is 463 ns. The per-cell bandwidth to the I/O subsystems is 2 GB/s, despite the presence of 8 GB/s worth of PCI-X buses in each I/O subsystem.[168][169][170]

HP launched sx2000 in March 2006 to succeed sx1000. Its two FSBs operate at 533 MT/s. It supports up to 128 GB of memory at 17 GB/s. The memory is of HP's custom design, using the DDR2 protocol, but twice as tall as the standard modules and with redundant address and control signal contacts. For the inter-chipset communication, 25.5 GB/s is available on each sx2000 through its three serial links that can connect to a set of three independent crossbars, which connect to other cells or up to 3 other sets of 3 crossbars. The multi-cell configurations are the same as with sx1000, except the parallelism of the sets of crossbars has been increased from 2 to 3. The maximum configuration of 64 sockets has 72 GB/s of sustainable bisection bandwidth. The chipset's connection to its I/O module is now serial with an 8.5 GB/s peak and 5.5 GB/s sustained bandwidth, the I/O module having either 12 PCI-X buses at up to 266 MHz, or 6 PCI-X buses and 6 PCIe 1.1 ×8 slots. It is the last chipset to support HP's PA-RISC processors (PA-8900).[171]

HP launched the first zx2-based servers in September 2006. zx2 can operate the FSB at 667 MT/s with two CPUs or 533 MT/s with four CPUs. It connects to the DDR2 memory either directly, supporting 32 GB at up to 14.2 GB/s, or through expander boards, supporting up to 384 GB at 17 GB/s. The minimum open-page latency is 60 to 78 ns. 9.8 GB/s are available through eight independent links to the I/O adapters, which can include PCIe ×8 or 266 MHz PCI-X.[172][173]

Others


In May 2003, IBM launched the XA-64 chipset for Itanium 2. It used many of the same technologies as the first two generations of XA-32 chipsets for Xeon, but by the time of the third-generation XA-32, IBM had decided to discontinue its Itanium products. XA-64 supported 56 GB of DDR SDRAM in 28 slots at 6.4 GB/s, though due to bottlenecks only 3.2 GB/s could go to the CPU and another 2 GB/s to devices, for a 5.2 GB/s total. The CPU's memory bottleneck was mitigated by an off-chip 64 MB DRAM L4 cache, which also worked as a snoop filter in multi-chipset systems. The combined bandwidth of the four PCI-X buses and other I/O is bottlenecked to 2 GB/s per chipset. Two or four chipsets can be connected to make an 8- or 16-socket system.[174]

SGI's Altix supercomputers and servers used the SHUB (Super-Hub) chipset, which supports two Itanium 2 sockets. The initial version used DDR memory through four buses for up to 12.8 GB/s bandwidth, and up to 32 GB of capacity across 16 slots. A 2.4 GB/s XIO channel connected to a module with up to six 64-bit 133 MHz PCI-X buses. SHUBs can be interconnected by the dual 6.4 GB/s NUMAlink4 link planes to create a 512-socket cache-coherent single-image system. A cache for the in-memory coherence directory saves memory bandwidth and reduces latency. The latency to the local memory is 132 ns, and each crossing of a NUMAlink4 router adds 50 ns. I/O modules with four 133 MHz PCI-X buses can connect directly to the NUMAlink4 network.[175][176][177][178] SGI's second-generation SHUB 2.0 chipset supported up to 48 GB of DDR2 memory, 667 MT/s FSB, and could connect to I/O modules providing PCI Express.[179][180] It supports only four local threads, so when having two dual-core CPUs per chipset, Hyper-Threading must be disabled.[181]
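The SHUB latency figures above imply a simple additive model: 132 ns to local memory plus 50 ns per NUMAlink4 router crossed. A small illustrative sketch (assuming latency grows only with router crossings, which ignores link and contention effects):

```python
def shub_latency_ns(router_hops: int) -> int:
    """Approximate memory latency on an Altix: 132 ns local + 50 ns per NUMAlink4 router."""
    return 132 + 50 * router_hops

# Latency for 0..3 router crossings
print([shub_latency_ns(h) for h in range(4)])  # [132, 182, 232, 282]
```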

Software support


Unix

  • HP-UX 11 (supported until 2025)

BSD

  • NetBSD (a tier II port[182] that "is a work-in-progress effort to port NetBSD to the Itanium family of processors. Currently no formal release is available."[183])
  • FreeBSD (unsupported since 31 October 2018)

Linux


The Trillian Project was an effort by an industry consortium to port the Linux kernel to the Itanium processor. The project started in May 1999 with the goal of releasing the distribution in time for the initial release of Itanium, then scheduled for early 2000.[184] By the end of 1999, the project included Caldera Systems, CERN, Cygnus Solutions, Hewlett-Packard, IBM, Intel, Red Hat, SGI, SuSE, TurboLinux and VA Linux Systems.[185] The project released the resulting code in February 2000.[184] The code then became part of the mainline Linux kernel more than a year before the release of the first Itanium processor. The Trillian project was able to do this for two reasons:

  • the free and open source GCC compiler had already been enhanced to support the Itanium architecture.
  • a free and open source simulator had been developed to simulate an Itanium processor on an existing computer.[186]

After the successful completion of Project Trillian, the resulting Linux kernel was used by all of the manufacturers of Itanium systems (HP, IBM, Dell, SGI, Fujitsu, Unisys, Hitachi, and Groupe Bull). With the notable exception of HP, Linux is either the primary OS or the only OS the manufacturer supports for Itanium. Ongoing free and open source software support for Linux on Itanium subsequently coalesced at Gelato.

Distribution support


In 2005, Fedora Linux started adding support for Itanium[187] and Novell added support for SUSE Linux.[188] In 2007, CentOS added support for Itanium in a new release.[189]

  • Debian (official support was dropped in Debian 8; unofficial support available through Debian Ports until June 2024[190])
  • EPIC Slack - an unofficial port of Slackware - has specifically supported IA-64 (and hence Itanium) since its release in May 2024.[191]
  • Gentoo Linux[192] (releases before August 2024)[193]
  • Red Hat Enterprise Linux (unsupported since RHEL 6; RHEL 5 supported Itanium until 2017, and other platforms until November 30, 2020)
  • SUSE Linux 11 (Itanium supported until 2019; on other platforms SUSE 11 was supported until 2022)
  • T2 SDE supports Itanium in its IA-64 port.[194]

Deprecation


In 2009, Red Hat dropped Itanium support in Enterprise Linux 6.[195] Ubuntu 10.10 dropped support for Itanium.[196] In 2021, Linus Torvalds marked the Itanium code as orphaned. Torvalds said: "HPE no longer accepts orders for new Itanium hardware, and Intel stopped accepting orders a year ago. While intel [sic] is still officially shipping chips until July 29, 2021, it's unlikely that any such orders actually exist. It's dead, Jim."[197][198]

Support for Itanium was removed in Linux 6.7[199][200] and has since been maintained out of tree.[201][202]

Microsoft Windows


OpenVMS


In 2001, Compaq announced that OpenVMS would be ported to the Itanium architecture.[203] This led to the creation of the V8.x releases of OpenVMS, which support both Itanium-based HPE Integrity Servers and DEC Alpha hardware.[204] Since the Itanium porting effort began, ownership of OpenVMS transferred from Compaq to HP in 2001, and then to VMS Software Inc. (VSI) in 2014.[205] Noteworthy releases include:

  • V8.0 (2003) - First pre-production release of OpenVMS on Itanium available outside HP.[204]
  • V8.2 (2005) - First production-grade release of OpenVMS on Itanium.[204]
  • V8.4 (2010) - Final release of OpenVMS supported by HP. Support ended on December 31, 2020.[206]
  • V8.4-2L3 (2021) - Final release of OpenVMS on Itanium supported by VSI. Support ends on December 31, 2035.[207]

Support for Itanium has been dropped in the V9.x releases of OpenVMS, which run on x86-64 only.[207]

NonStop OS


NonStop OS was ported from MIPS-based hardware to Itanium in 2005.[208] NonStop OS was later ported to x86-64 in 2015. Sales of Itanium-based NonStop hardware ended in 2020, with support ending in 2025.[209][210]

Compiler


GNU Compiler Collection deprecated support for IA-64 in GCC 10, after Intel announced the planned phase-out of this ISA.[211] LLVM (Clang) dropped Itanium support in version 2.6.[212]

Virtualization and emulation


HP sells a virtualization technology for Itanium called Integrity Virtual Machines.

Emulation is a technique that allows a computer to execute binary code compiled for a different type of computer. Before IBM's acquisition of QuickTransit in 2009, application binaries for IRIX/MIPS and Solaris/SPARC could run on Linux/Itanium via a type of emulation called "dynamic binary translation". Similarly, HP implemented a method to execute PA-RISC/HP-UX binaries on Itanium/HP-UX via emulation, to simplify the migration of its PA-RISC customers to the radically different Itanium instruction set. Itanium processors can also run the mainframe environment GCOS from Groupe Bull and several x86 operating systems via instruction set simulators.
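The core idea behind dynamic binary translation can be sketched in a few lines: guest instructions are translated once into host-executable form and cached, so repeated execution pays the translation cost only on first encounter. The guest "instruction set" below is invented purely for illustration; real translators such as QuickTransit operate on actual machine code and are vastly more sophisticated.

```python
# Toy dynamic binary translator: each guest instruction is translated into a
# host-executable Python closure and cached by program counter, so a hot loop
# re-runs the cached host code instead of re-decoding the guest instruction.

def translate(instr):
    """Translate one guest instruction (a tuple) into a host closure."""
    op, *args = instr
    if op == "li":              # load immediate: ("li", reg, value)
        r, v = args
        return lambda regs: regs.__setitem__(r, v)
    if op == "add":             # add: ("add", dst, src1, src2)
        d, s1, s2 = args
        return lambda regs: regs.__setitem__(d, regs[s1] + regs[s2])
    raise ValueError(f"unknown guest opcode: {op}")

translation_cache = {}

def run(program, regs):
    for pc, instr in enumerate(program):
        if pc not in translation_cache:      # translate on first visit only
            translation_cache[pc] = translate(instr)
        translation_cache[pc](regs)          # execute cached host closure
    return regs

regs = run([("li", "r1", 2), ("li", "r2", 3), ("add", "r3", "r1", "r2")], {})
print(regs["r3"])  # 5
```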

Competition

Area chart showing the representation of different families of microprocessors in the TOP500 ranking list of supercomputers (1993–2019)

Itanium was aimed at the enterprise server and high-performance computing (HPC) markets. Other enterprise- and HPC-focused processor lines include Oracle's and Fujitsu's SPARC processors and IBM's Power microprocessors. Measured by quantity sold, Itanium's most serious competition came from x86-64 processors including Intel's own Xeon line and AMD's Opteron line. Since 2009, most servers were being shipped with x86-64 processors.[10]

In 2005, Itanium systems accounted for about 14% of HPC systems revenue, but the percentage declined as the industry shifted to x86-64 clusters for this application.[213]

An October 2008 Gartner report on the Tukwila processor stated that "...the future roadmap for Itanium looks as strong as that of any RISC peer like Power or SPARC."[214]

Supercomputers and high-performance computing


An Itanium-based computer first appeared on the list of the TOP500 supercomputers in November 2001.[75] The best position ever achieved by an Itanium 2 based system in the list was No. 2, achieved in June 2004, when Thunder (Lawrence Livermore National Laboratory) entered the list with an Rmax of 19.94 Teraflops. In November 2004, Columbia entered the list at No. 2 with 51.8 Teraflops, and there was at least one Itanium-based computer in the top 10 from then until June 2007. The peak number of Itanium-based machines on the list occurred in the November 2004 list, at 84 systems (16.8%); by June 2012, this had dropped to one system (0.2%),[215] and no Itanium system remained on the list in November 2012.

Processors


Released processors

Itanium 2 mx2 'Hondo' (top)
Itanium 2 mx2 'Hondo' (bottom)

The Itanium processors show a progression in capability. Merced was a proof of concept. McKinley dramatically improved the memory hierarchy and allowed Itanium to become reasonably competitive. Madison, with the shift to a 130 nm process, allowed for enough cache space to overcome the major performance bottlenecks. Montecito, with a 90 nm process, allowed for a dual-core implementation and a major improvement in performance per watt. Montvale added three new features: core-level lockstep, demand-based switching and front-side bus frequency of up to 667 MHz.

Codename | Process | Released | Clock | L2 cache/core | L3 cache/processor | Bus | Dies/device | Cores/die | TDP/device | Comments
Itanium
Merced | 180 nm | 2001-05-29 | 733 MHz | 96 KB | 2 or 4 MB (off-die) | 266 MHz | 1 | 1 | 116 W |
Merced | 180 nm | 2001-05-29 | 800 MHz | 96 KB | 2 or 4 MB (off-die) | 266 MHz | 1 | 1 | 130 W |
Itanium 2
McKinley | 180 nm | 2002-07-08 | 900 MHz | 256 KB | 1.5 MB | 400 MHz | 1 | 1 | 90 W | HW branchlong
McKinley | 180 nm | 2002-07-08 | 1 GHz | 256 KB | 1.5 or 3 MB | 400 MHz | 1 | 1 | 100 W |
Madison | 130 nm | 2003-06-30 | 1.3 GHz | 256 KB | 3 MB | 400 MHz | 1 | 1 | 97 W |
Madison | 130 nm | 2003-06-30 | 1.4 GHz | 256 KB | 4 MB | 400 MHz | 1 | 1 | 91 W |
Madison | 130 nm | 2003-06-30 | 1.5 GHz | 256 KB | 6 MB | 400 MHz | 1 | 1 | 107 W |
Madison | 130 nm | 2003-09-08 | 1.4 GHz | 256 KB | 1.5 MB | 400 MHz | 1 | 1 | 91 W |
Madison | 130 nm | 2004-04-13 | 1.4 GHz | 256 KB | 3 MB | 400 MHz | 1 | 1 | |
Madison | 130 nm | 2004-04-13 | 1.6 GHz | 256 KB | 3 MB | 400 MHz | 1 | 1 | 99 W |
Deerfield | 130 nm | 2003-09-08 | 1.0 GHz | 256 KB | 1.5 MB | 400 MHz | 1 | 1 | 55 W | Low voltage
Hondo[216] | 130 nm | 2004-06 | 1.1 GHz | 256 KB | 4 MB | 400 MHz | 2 | 1 | 170 W | Not a product of Intel, but of HP; 32 MB L4
Fanwood | 130 nm | 2004-11-08 | 1.3 GHz | 256 KB | 3 MB | 400 MHz | 1 | 1 | 62 W | Low voltage
Fanwood | 130 nm | 2004-11-08 | 1.6 GHz | 256 KB | 3 MB | 533 MHz | 1 | 1 | 99 W |
Madison 9M | 130 nm | 2004-11-08 | 1.5 GHz | 256 KB | 4 MB | 400 MHz | 1 | 1 | 122 W |
Madison 9M | 130 nm | 2004-11-08 | 1.6 GHz | 256 KB | 6 or 9 MB | 400 MHz | 1 | 1 | 122 W |
Madison 9M | 130 nm | 2005-07-05 | 1.67 GHz | 256 KB | 6 or 9 MB | 667 MHz | 1 | 1 | |
Itanium 2 9000 series
Montecito | 90 nm | 2006-07-18 | 1.4–1.6 GHz | 256 KB (D) + 1 MB (I) | 6–24 MB | 400–533 MHz | 1 | 2 | 75–104 W | Virtualization, multithreading, no HW IA-32
Itanium 9100 series
Montvale | 90 nm | 2007-10-31 | 1.42–1.66 GHz | 256 KB (D) + 1 MB (I) | 8–24 MB | 400–667 MHz | 1 | 1–2 | 75–104 W | Core-level lockstep, demand-based switching
Itanium 9300 series
Tukwila | 65 nm | 2010-02-08 | 1.33–1.73 GHz | 256 KB (D) + 512 KB (I) | 10–24 MB | QPI, 4.8 GT/s | 1 | 2–4 | 130–185 W | New point-to-point interconnect (QPI) replacing the FSB; Turbo Boost
Itanium 9500 series
Poulson | 32 nm | 2012-11-08[217] | 1.73–2.53 GHz | 256 KB (D) + 512 KB (I) | 20–32 MB | QPI, 6.4 GT/s | 1 | 4–8 | 130–170 W | Doubled issue width (6 to 12 instructions per cycle); Instruction Replay; dual-domain hyperthreading[218][124][219]
Itanium 9700 series
Kittson | 32 nm | 2017-05-11[8] | 1.73–2.66 GHz | 256 KB (D) + 512 KB (I) | 20–32 MB | QPI, 6.4 GT/s | 1 | 4–8 | 130–170 W | No architectural improvements over Poulson; 5% higher clock for the top model
List of Intel Itanium processors

Market reception


High-end server market

HP zx6000 system board with dual Itanium 2 processors
Itanium 2 in 2003

When first released in 2001, Itanium's performance was disappointing compared to better-established RISC and CISC processors.[56][57] Emulation to run existing x86 applications and operating systems was particularly poor, with one benchmark in 2001 reporting that it was equivalent at best to a 100 MHz Pentium in this mode (1.1 GHz Pentiums were on the market at that time).[220] Itanium failed to make significant inroads against IA-32 or RISC, and suffered further following the arrival of x86-64 systems which offered greater compatibility with older x86 applications.

In a 2009 article on the history of the processor — "How the Itanium Killed the Computer Industry" — journalist John C. Dvorak reported "This continues to be one of the great fiascos of the last 50 years".[221] Tech columnist Ashlee Vance commented that the delays and underperformance "turned the product into a joke in the chip industry".[146] In an interview, Donald Knuth said "The Itanium approach...was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write."[222]

Both Red Hat and Microsoft announced plans to drop Itanium support in their operating systems due to lack of market interest;[223][224] however, other Linux distributions such as Gentoo and Debian remain available for Itanium. On March 22, 2011, Oracle Corporation announced that it would no longer develop new products for HP-UX on Itanium, although it would continue to provide support for existing products.[225] Following this announcement, HP sued Oracle for breach of contract, arguing that Oracle had violated conditions imposed during settlement over Oracle's hiring of former HP CEO Mark Hurd as its co-CEO, requiring the vendor to support Itanium on its software "until such time as HP discontinues the sales of its Itanium-based servers",[226] and that the breach had harmed its business. In 2012, a court ruled in favor of HP, and ordered Oracle to resume its support for Itanium. In June 2016, Hewlett Packard Enterprise (the corporate successor to HP's server business) was awarded $3 billion in damages from the lawsuit.[227][228] Oracle unsuccessfully appealed the decision to the California Court of Appeal in 2021.[229]

A former Intel official reported that the Itanium business had become profitable for Intel in late 2009.[230] By 2009, the chip was almost entirely deployed on servers made by HP, which had over 95% of the Itanium server market share,[146] making the main operating system for Itanium HP-UX. On March 22, 2011, Intel reaffirmed its commitment to Itanium with multiple generations of chips in development and on schedule.[231]

Other markets

HP zx6000, an Itanium 2-based Unix workstation

Although Itanium did attain limited success in the niche market of high-end computing, Intel had originally hoped it would find broader acceptance as a replacement for the original x86 architecture.[232]

AMD chose a different direction, designing the less radical x86-64, a 64-bit extension to the existing x86 architecture, which Microsoft then supported, forcing Intel to introduce the same extensions in its own x86-based processors.[233] These designs can run existing 32-bit applications at native hardware speed, while offering support for 64-bit memory addressing and other enhancements to new applications.[146] This architecture has now become the predominant 64-bit architecture in the desktop and portable market. Although some Itanium-based workstations were initially introduced by companies such as SGI, they are no longer available.

See also


Notes


References

The Itanium is a family of 64-bit microprocessors implementing the IA-64 instruction set architecture (ISA), jointly developed by Hewlett-Packard (HP) and Intel using the Explicitly Parallel Instruction Computing (EPIC) paradigm to enable high instruction-level parallelism (ILP) through compiler-driven optimizations. Designed primarily for high-end servers and workstations, it emphasizes features such as explicit instruction bundling (three instructions per 128-bit bundle), predication to minimize branch penalties, data and control speculation to hide memory latencies, and a large register file including 128 general-purpose registers and 128 floating-point registers. Launched on May 29, 2001, with the initial Merced processor, Itanium aimed to address limitations of existing architectures like x86 by providing superior scalability for enterprise and technical computing workloads. The architecture's origins trace back to a 1994 partnership between HP and Intel, following HP's initial research into VLIW-based designs in the late 1980s, with the goal of creating a next-generation 64-bit platform for mission-critical applications. Subsequent generations, including Itanium 2 (introduced in 2002) and later models like Tukwila (2010) and Poulson (2012), incorporated multi-core designs, multithreading, larger caches, and process shrinks down to 32 nm, achieving up to 2x performance gains per generation while maintaining backward compatibility with existing software. Despite early promise (powering supercomputers and accumulating over $8.7 billion in revenue by 2007), Itanium faced challenges from entrenched x86 ecosystems, limited software adoption, and competition from AMD's x86-64 Opteron line, leading to a niche market position centered on HP's Integrity servers.
Intel announced the discontinuation of Itanium in January 2019 via Product Change Notification 116733-00, citing shifts in market demand toward x86-based products, with final orders accepted until January 30, 2020, and last shipments on July 29, 2021, for the Kittson series (Itanium 9700). Although support for existing systems continued through partners like HPE until December 31, 2025, the architecture's end marked the conclusion of a bold but ultimately unsuccessful attempt to redefine enterprise computing, one that nonetheless influenced later work on parallelism and compiler technology.

Architecture

EPIC paradigm

The Explicitly Parallel Instruction Computing (EPIC) paradigm, foundational to the Itanium architecture, shifts the responsibility of identifying instruction-level parallelism (ILP) from hardware to the compiler, allowing software to explicitly annotate independent instructions for concurrent execution and thereby minimizing runtime dependency-resolution overhead. This design philosophy contrasts with conventional RISC and CISC approaches, which rely on complex hardware schedulers to detect parallelism dynamically, often at the cost of increased power consumption and design complexity. EPIC's origins trace back to advancements in VLIW research, notably the Multiflow TRACE project led by Josh Fisher in the 1980s, which pioneered trace scheduling, a compiler technique for grouping instructions along likely execution paths to exploit ILP statically. This work influenced Hewlett-Packard's subsequent efforts, culminating in a 1993 collaboration with Intel to develop the IA-64 architecture, publicly announced in 1994 as a new computing paradigm to address limitations in scaling ILP for future processors. Central to EPIC are instruction bundles, fixed 128-bit units comprising three 41-bit instructions and a 5-bit template; the template encodes instruction types (e.g., integer, memory, or branch) and stop bits to delineate dependency boundaries, enabling the hardware to fetch and issue bundles atomically for parallel execution across multiple functional units. Complementing this, predicated execution employs 64 dedicated predicate registers to guard instructions conditionally, transforming branches into predicated operations that execute both paths and nullify the incorrect one, thus reducing control hazards and enabling over 50% branch elimination in typical code.
Proponents of EPIC asserted it could achieve superior instruction throughput by harnessing compiler sophistication for parallelism exposure, while permitting simpler hardware devoid of advanced decoders or reorder buffers, potentially supporting wider issue widths and larger register files for sustained performance in parallel workloads. For instance, by explicitly marking independent operations within bundles, EPIC avoids the latency of hardware ILP extraction, claiming up to 40% fewer branch mispredictions through predication alone. Unlike pure VLIW architectures, where the compiler bears full responsibility for scheduling, including handling variable latencies via no-ops or delay slots, EPIC introduces hardware-assisted mechanisms for greater robustness, such as branch hints, predicate manipulation, and stop bits to enforce dependencies without stalling the pipeline. These features enhance binary portability across processor generations and mitigate VLIW's sensitivity to latency inaccuracies, though they still demand high-quality optimization tools for optimal efficacy.
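The if-conversion enabled by predication can be sketched at a high level. The Python below is illustrative only: real IA-64 predication happens at the instruction level, with the hardware nullifying instructions whose guarding predicate is false, but the transformation of a branch into two complementary-guarded arms is the same idea.

```python
def branchy_abs(x):
    if x < 0:           # conditional branch: the hardware may mispredict this
        return -x
    return x

def predicated_abs(x):
    # A compare sets a pair of complementary predicates (as cmp does on IA-64);
    # each arm of the former branch executes guarded by one of them, and the
    # disabled arm's result is simply discarded -- no branch, no misprediction.
    p, not_p = (x < 0), (x >= 0)
    neg = -x            # (p)  result = -x   -- nullified when p is false
    pos = x             # (!p) result = x    -- nullified when not_p is false
    return neg if p else pos

assert all(branchy_abs(v) == predicated_abs(v) for v in (-3, 0, 7))
```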

Instruction set and registers

The IA-64 instruction set architecture (ISA), which underpins the Itanium processor family, is a 64-bit load-store architecture that separates memory operations from computation, enabling efficient pipelining and parallelism. It features a large register file to minimize memory accesses, including 128 general-purpose integer registers (GR0 through GR127), each 64 bits wide, used for addressing, integer arithmetic, and logical operations; 128 floating-point registers (FR0 through FR127), each 82 bits wide to accommodate an extended-precision format; 64 one-bit predicate registers (PR0 through PR63) for conditional execution; and 8 branch registers (BR0 through BR7), each 64 bits, dedicated to holding target addresses for branches and calls. Instructions employ a three-operand format, where most operations specify two source operands and one destination, promoting explicit data flow without the implicit register reuse common in two-operand ISAs. To indicate parallelism, instructions are grouped into 128-bit bundles consisting of three 41-bit instructions and a 5-bit template that encodes dependencies and execution-unit assignments, ensuring alignment on 16-byte boundaries for atomic fetch. Addressing modes include PC-relative displacement for branches, whose targets are calculated as the instruction pointer plus an offset, and register-indirect addressing through a base register (with optional post-increment) for memory access. Data is little-endian by default (a big-endian mode is selectable), and natural alignment is required for loads and stores to avoid exceptions.
Key instruction categories encompass integer operations like add for basic arithmetic and shifts (shl, shr) for bit manipulation; floating-point instructions including fused multiply-add (fma), which computes a×b+c in a single rounded operation to enhance precision and performance in numerical computations; memory instructions such as speculative loads (ld.s) that defer exceptions until use, paired with checks (chk.s) for control speculation; and control-flow instructions like predicated branches (br) that execute conditionally based on predicate registers, reducing branch misprediction penalties, alongside call instructions (br.call) that link through branch registers for subroutine invocation. These categories support a wide range of computations while integrating predication across nearly all instructions to enable if-conversion during compilation. Distinctive features of the ISA include register rotation, facilitated by register rename base (RRB) fields in the current frame marker, which cyclically rename registers within the rotating subset to support software pipelining of loop iterations without stack spills or explicit copy maintenance, ideal for vectorizable loops. Additionally, Not-a-Thing (NaT) values—special 65th-bit indicators on general registers—allow deferred exception handling during speculative execution; a load that faults sets a NaT bit instead of trapping immediately, enabling the computation to proceed until a consuming instruction like an add triggers resolution via a NaT check. These mechanisms, combined with bundle-based explicit parallelism, form the core of IA-64's approach to explicitly parallel instruction computing.
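The speculative-load mechanism described above can be modeled in a few lines of Python; the `SpecValue`, `ld_s`, `add`, and `chk_s` names here are illustrative stand-ins for the ISA concepts, not Intel's definitions:

```python
# Sketch: modeling IA-64-style control speculation (ld.s / chk.s) with
# a NaT ("Not a Thing") flag instead of an immediate fault.

class SpecValue:
    """A register value plus its NaT bit."""
    def __init__(self, value=0, nat=False):
        self.value, self.nat = value, nat

def ld_s(memory, addr):
    # Speculative load: a bad address sets NaT instead of trapping.
    if addr in memory:
        return SpecValue(memory[addr])
    return SpecValue(nat=True)

def add(a, b):
    # NaT propagates silently through consuming instructions.
    return SpecValue(a.value + b.value, a.nat or b.nat)

def chk_s(v, recovery):
    # Check: if a NaT reached the use point, run the recovery code.
    return recovery() if v.nat else v.value

memory = {0x100: 7}
ok = add(ld_s(memory, 0x100), SpecValue(1))
bad = add(ld_s(memory, 0x999), SpecValue(1))
print(chk_s(ok, lambda: -1))   # 8: speculation succeeded
print(chk_s(bad, lambda: -1))  # -1: recovery path taken
```

This illustrates why the compiler can hoist loads above branches: the faulting case is deferred until the value is actually consumed and checked.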
Register Type      Designation   Count   Width     Primary Use
General-Purpose    GR0–GR127     128     64 bits   Integer arithmetic, addressing
Floating-Point     FR0–FR127     128     82 bits   Floating-point operations
Predicate          PR0–PR63      64      1 bit     Conditional execution
Branch             BR0–BR7       8       64 bits   Branch targets and calls
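The bundle layout described above can be illustrated with a small Python encoder/decoder, assuming the 5-bit template occupies the low bits of the 128-bit bundle and the three 41-bit slots follow it:

```python
# Sketch: packing and unpacking a 128-bit IA-64-style bundle — a 5-bit
# template in the low bits followed by three 41-bit instruction slots.

SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1

def pack_bundle(template, slot0, slot1, slot2):
    assert 0 <= template < 32
    assert all(0 <= s <= SLOT_MASK for s in (slot0, slot1, slot2))
    return (template
            | slot0 << 5
            | slot1 << (5 + SLOT_BITS)
            | slot2 << (5 + 2 * SLOT_BITS))

def unpack_bundle(bundle):
    template = bundle & 0b11111
    slots = [(bundle >> (5 + i * SLOT_BITS)) & SLOT_MASK
             for i in range(3)]
    return template, slots

# Round-trip check: 5 + 3*41 = 128 bits exactly.
b = pack_bundle(0x10, 0x123, 0x456, 0x789)
assert b.bit_length() <= 128
assert unpack_bundle(b) == (0x10, [0x123, 0x456, 0x789])
```

Note how the arithmetic works out: 5 template bits plus three 41-bit slots is exactly 128 bits, which is why bundles align naturally on 16-byte boundaries.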

Microarchitecture features

The Itanium processors employ a deep pipeline optimized for the EPIC instruction set, with early designs featuring a 10-stage pipeline to accommodate wide instruction issue and high clock frequencies. Subsequent generations refined this to an 8-stage pipeline, decoupling the front-end fetch and back-end execution stages with an instruction buffer to sustain throughput of up to six instructions per cycle. These pipelines support speculation for loads and hardware mechanisms for memory disambiguation, including an Advanced Load Address Table (ALAT) to track speculative memory accesses and resolve dependencies dynamically. Execution units are designed for parallel operation, with configurations including multiple integer arithmetic logic units (ALUs), floating-point multiply-accumulate units (FMACs), and dedicated branch units per core. Baseline implementations provide four integer ALUs, two floating-point units, and three branch units, enabling peak rates of several gigaflops in floating-point operations; later evolutions expand to six ALUs, four FP units, and additional units for enhanced vector processing. Branch handling emphasizes compiler-provided static hints over dynamic hardware prediction to reduce complexity and power overhead, though limited hardware support for IP-relative branches achieves zero-cycle penalties on correct predictions. The memory subsystem features a split on-chip L1 cache with separate 16 KB instruction and 16 KB data caches, both four-way set-associative and write-through for low-latency access. A unified on-chip L2 cache follows, typically 256 KB and eight-way associative, followed by an off-chip L3 cache of several megabytes that operates at core speed for high bandwidth. This hierarchy incorporates non-blocking designs and dynamic prefetching to tolerate latency in speculative workloads, with event monitoring for misses and references to aid performance tuning.
Interconnects evolve from a shared front-side bus in initial models to point-to-point links in later generations, such as the QuickPath Interconnect (QPI), providing scalable transfer rates up to 6.4 GT/s per link for multi-socket systems. These support non-uniform memory access (NUMA) configurations, allowing up to four processors in glueless setups with coherent caching and up to 16 outstanding requests to maintain scalability in enterprise environments. Power and thermal efficiency improvements include extensive error-correcting code (ECC) coverage across caches and buses, along with voltage scaling introduced in later designs to reduce consumption—such as low-voltage variants operating at half the power of standard models while preserving performance. These features enable reliable operation in high-density servers, with mechanisms like phase shedding in voltage regulators to optimize idle power.
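As a back-of-envelope check on these interconnect figures, the sketch below converts link transfer rates to bandwidth, assuming QPI's 16 data bits per direction per link and an 8-byte-wide front-side bus (both assumptions stated here, not taken from the text above):

```python
# Rough bandwidth arithmetic: GT/s (transfers per second) times payload
# bytes per transfer gives GB/s. QPI carries 16 data bits (2 bytes) per
# direction per link; a classic front-side bus is 64 bits (8 bytes) wide.

def qpi_bandwidth_gbs(gt_per_s, data_bits=16):
    per_direction = gt_per_s * data_bits / 8      # GB/s one way
    return per_direction, 2 * per_direction       # (one way, both ways)

def fsb_bandwidth_gbs(mt_per_s, bus_bytes=8):
    return mt_per_s * bus_bytes / 1000            # GB/s, shared both ways

print(qpi_bandwidth_gbs(6.4))   # (12.8, 25.6)
print(fsb_bandwidth_gbs(667))   # 5.336
```

Under these assumptions a 6.4 GT/s QPI link moves roughly 25.6 GB/s bidirectionally, several times the shared bandwidth of the fastest 667 MT/s front-side bus, which is why the switch to point-to-point links mattered for multi-socket scaling.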

Development history

Inception and partnerships (1989–1994)

In the late 1980s, Hewlett-Packard (HP) initiated internal research to develop a successor to its PA-RISC architecture, driven by the need for a more scalable 64-bit design to address limitations in handling increasingly complex enterprise workloads. This effort, which began as a secret project in December 1988 under director Dick Lampman, focused on exploring advanced instruction-level parallelism to boost performance in high-end computing. Meanwhile, Intel, having learned from the commercial failure of its complex iAPX 432 microprocessor in the early 1980s—which aimed for high-level language support but suffered from performance issues and high costs—was seeking opportunities to evolve beyond incremental x86 improvements toward a revolutionary enterprise-focused architecture. By late 1993, HP, facing the economic challenges of developing and manufacturing advanced processors independently, approached Intel to propose a collaboration on a 64-bit VLIW-based design, marking the start of formal joint efforts. This led to the official announcement on June 8, 1994, of a 50-50 partnership between HP and Intel to co-develop a new processor family codenamed Merced, intended to power future workstations and servers with superior performance and scalability. Key figures included HP's technical lead Jerry Huck, who advocated for the alliance, and Intel's senior vice president Albert Yu, who emphasized merging the companies' expertise to advance microprocessor design. Intel's John Crawford was appointed to lead the joint design team, drawing on HP's research into explicit parallelism concepts. The partnership's primary goals were to create a clean-slate 64-bit architecture for enterprise computing that would supplant both x86 and existing RISC designs, targeting high-performance servers capable of using explicit instruction-level parallelism without relying on hardware speculation. Initial visions outlined a processor aiming for clock speeds around 800 MHz and a 6-wide issue capability, though emphasis was placed on long-term innovation over precise near-term metrics, with production slated for the late 1990s.
To ensure market adoption, the design incorporated plans for backward compatibility with x86 software through emulation, addressing early debates about forgoing native support in favor of a forward-looking ISA. This anticipated challenge highlighted tensions between innovation and legacy preservation, as HP and Intel sought to protect customer investments while pushing for a clean break with older architectures.

Design phase and delays (1994–2001)

The design of the Itanium processor, codenamed Merced during its development, evolved through a collaboration between Hewlett-Packard (HP) and Intel, shifting from HP's initial vision of a radically wide very long instruction word (VLIW) architecture toward a more practical explicitly parallel instruction computing (EPIC) approach. Early HP concepts explored issue widths as wide as 64 instructions to maximize instruction-level parallelism (ILP), but the joint effort scaled this down to a 6-wide execution model that dispatched bundles of three 41-bit instructions within 128-bit words, balancing hardware complexity with compiler-driven scheduling. This paradigm incorporated semantics from HP's research designs, such as rotating register files for loop unrolling and predication to reduce branch penalties, enabling compilers to explicitly mark parallel operations without hardware speculation overhead. Development encountered substantial delays from 1994 to 2001, primarily due to process technology transitions from an initial 0.25 μm node to the more advanced 0.18 μm process to fit the chip's growing transistor count and performance needs, as well as the inherent complexity of verifying a design with over 25 million transistors. Verification efforts were hampered by the architecture's novelty, requiring extensive simulation to ensure EPIC bundle execution and branch prediction accuracy across billions of potential state combinations. Compiler development further contributed to lags, as optimizing for EPIC's reliance on static scheduling demanded new tools to identify ILP in legacy codebases, with early versions struggling to achieve expected parallelism without manual intervention. Key milestones highlighted these challenges: design verification issues delayed tape-out until July 1999, with first silicon produced in August 1999 but containing bugs in the execution pipeline, necessitating additional revisions. To accelerate tool development, Intel and HP partnered with universities, including efforts at institutions like the University of Illinois on compiler optimization research and simulation frameworks.
Internal conflicts arose over resource allocation, as Intel increasingly prioritized enhancements to its dominant x86 architecture amid rising demand for Pentium processors, diverting engineering talent from the Itanium project. Debates intensified regarding x86 compatibility, culminating in the addition of a hardware emulation unit in 2000 to decode and execute legacy x86 instructions via a dedicated in-order pipeline, addressing concerns about software ecosystem migration without fully integrating x86 semantics into the core EPIC design. By 2001, the Merced design was frozen for production on the 0.18 μm process at 800 MHz, but initial performance fell short of expectations, with a SPECint_base2000 score of around 314 compared to approximately 480-550 for HP's PA-8700 and 400-550 for the Alpha 21264 at comparable clock speeds, placing Itanium behind in integer workloads while matching or slightly trailing in floating-point tasks like SPECfp_base2000. This gap stemmed partly from immature compilers unable to fully exploit the architecture's potential, underscoring the risks of the EPIC bet during the design phase.

Launch and initial expectations (2001)

The Itanium processor, codenamed Merced, officially launched on May 29, 2001, following years of development collaboration between HP and Intel. Initial shipments began in June 2001 to original equipment manufacturers (OEMs) such as HP and Dell, with the first systems including HP's rx4610 server and Dell's Precision Workstation 730. These entry-level configurations, equipped with 733 MHz or 800 MHz Itanium processors, 1 GB of RAM, and basic storage, started at prices around $7,000 to $8,000, positioning Itanium as a premium option for high-end computing. Intel and its partners marketed Itanium as a revolutionary shift in computing architecture, leveraging the explicitly parallel instruction computing (EPIC) paradigm to unlock massive parallelism and deliver up to 10 times the performance of contemporary processors through better exploitation of instruction-level parallelism. Intel CEO Craig Barrett emphasized the processor's potential to power enterprise workloads, including web servers and databases, heralding it as the foundation for a new generation of scalable systems. Expectations were high, with projections that Itanium would dominate high-end servers and workstations, supported by endorsements from major vendors, including Silicon Graphics Inc. (SGI), which announced their own Itanium-based platforms. However, early performance benchmarks revealed significant shortfalls, primarily due to immature compilers that struggled to generate optimized EPIC code, limiting the architecture's promised parallelism. On the SPEC CPU2000 suite, the 800 MHz Itanium in the Dell Precision Workstation 730 achieved a SPECint_base2000 score of 314, lagging behind competitors like the 1.5 GHz Pentium 4's score of 526 and even some RISC processors such as the UltraSPARC III. Floating-point performance fared better at around 645 on SPECfp_base2000, but overall integer workloads underperformed expectations, with analysts noting Itanium's effective throughput as roughly equivalent to or below mid-range x86 systems despite its higher price point.
These results stemmed from the need for specialized software tuning, as legacy x86 emulation added overhead for non-native applications. Initial adoption was confined to niche high-end server markets, with limited deployments in enterprise environments for tasks like database processing and scientific computing. Partnerships with SGI and other vendors expanded system offerings, but volume sales remained low as customers awaited software ecosystem maturity; HP positioned early systems more as development platforms than production-ready hardware. By late 2001, fewer than a dozen Itanium-based models were available from major vendors, reflecting cautious uptake amid economic uncertainty following the dot-com bust. Media coverage and analyst opinions lauded Itanium's architectural innovation as a bold departure from x86 dominance, praising its 64-bit addressing and potential for future scalability in mission-critical applications. However, criticism quickly mounted over the immature technology and performance, with outlets describing the launch as overhyped and the results as "disappointing" relative to established RISC alternatives. Illuminata analyst Jonathan Eunice called it a "development environment" rather than a mature product, highlighting the risks of betting on unproven paradigms in a rapidly evolving market.

Processor generations

Merced generation (2001)

The Merced generation marked the debut of the Itanium processor family in 2001, featuring a single-core design targeted at high-end servers and workstations. Fabricated using a 0.18 μm process technology, the processor contained approximately 25 million transistors. It operated at a base clock speed of 800 MHz for standard models, with a low-voltage variant clocked at 733 MHz to support power-sensitive applications. These processors integrated with the 460GX chipset, enabling connectivity for memory and I/O in early Itanium-based systems. Merced's microarchitecture emphasized the EPIC paradigm through a 10-stage pipeline capable of issuing up to 6 instructions per cycle in bundles of three, supported by four arithmetic logic units (ALUs) and two floating-point units. While the theoretical peak performance promised high throughput for explicitly parallel code, real-world instructions per cycle (IPC) typically ranged from 1 to 2, limited by the rigidity of instruction bundling and early compiler inefficiencies in exploiting parallelism. The design supported 4-way symmetric multiprocessing (SMP), allowing configurations of up to four processors without additional glue logic for basic coherence. Key limitations included poor handling of legacy x86 code, which executed at significantly reduced performance—often 50% or slower compared to native x86 hardware of the era. Additionally, the processor exhibited high power consumption with a thermal design power (TDP) of 130 W, contributing to thermal management challenges in densely packed systems. Deployment was limited, appearing in initial products such as early Compaq servers, but overall volumes remained low at around 10,000 units shipped in 2001 due to software ecosystem immaturity and performance shortfalls relative to expectations.

McKinley and Madison generations (2002–2006)

The McKinley processor, introduced in 2002 as the inaugural member of the Itanium 2 family, marked a substantial advance over the initial Merced generation by enhancing execution efficiency and resource utilization within the EPIC architecture. Fabricated using a 0.18 μm process, it operated at clock speeds of 900 MHz to 1 GHz and incorporated up to 3 MB of integrated L3 cache, enabling up to twice the overall system performance compared to Merced-based platforms. Floating-point capabilities saw particularly strong gains, with SPECfp_base2000 scores reaching approximately 1,356 at 1 GHz—roughly four times higher than Merced's typical results—due to doubled execution resources and a refined pipeline design that better exploited EPIC parallelism. Key architectural advancements in McKinley included improved branch prediction mechanisms, which reduced misprediction penalties through advanced prediction tables and recovery paths, and an enhanced hardware x86 compatibility assist unit for seamless legacy binary execution. These features doubled the number of integer and load/store units relative to Merced, allowing for greater instruction-level parallelism while maintaining compatibility with the evolving Itanium instruction set. The processor powered early enterprise systems, demonstrating viability in high-end computing environments despite ongoing reliance on sophisticated compilers for optimal EPIC bundle scheduling. The Madison series, debuting in 2003, refined McKinley's design through a 0.13 μm process shrink, enabling higher clock speeds up to 1.6 GHz and larger cache configurations of 6 MB or 9 MB L3. With around 410 million transistors, Madison variants delivered up to 50% greater frequency and doubled L3 cache bandwidth over McKinley, yielding SPECfp_base2000 rates exceeding 2,100 in single-processor configurations.
These improvements extended to multi-processor scalability, where SPECfp_rate2000 scores surpassed 40 in small cluster setups, underscoring enhanced floating-point throughput for scientific and database workloads. Madison processors integrated into platforms like HP Integrity servers and SGI Altix systems, supporting demanding applications in enterprise and high-performance computing. In 2006, the Madison lineage culminated with the Montecito variant, Intel's first dual-core Itanium 2 processor, fabricated at 90 nm and featuring two Madison-derived cores sharing up to 12 MB of L3 cache per socket at 1.6 GHz. Operating at a thermal design power of up to 104 W per socket, Montecito emphasized reliability with advanced error correction and aimed to double per-socket performance through core replication, though power scaling remained challenging in multi-socket configurations exceeding four sockets due to interconnect bottlenecks and elevated thermal demands. Despite these hurdles, the generation's compiler dependency persisted, requiring optimized code generation to fully leverage its parallel execution potential.

High-end evolutions (2006–2017)

The high-end evolutions of the Itanium architecture from 2006 to 2017 built upon the dual-core Montecito generation by introducing enhanced multi-core designs, process node shrinks, and advanced interconnects tailored for high-performance computing (HPC) and mission-critical enterprise applications. These developments emphasized scalability and reliability, availability, and serviceability (RAS) features, such as improved error correction and virtualization support, to address demanding workloads in enterprise and scientific computing. The Itanium 2 9100 series, codenamed Montvale and released in 2007, marked the transition to dual-core configurations on a 90 nm process, operating at up to 1.66 GHz with front-side bus speeds reaching 667 MT/s. It featured up to 24 MB of shared L3 cache and split L2 caches per core (1 MB instruction and 256 KB data), enabling better thread handling through demand-based switching and whole-core lockstep for fault tolerance. These processors supported up to four sockets in NUMA configurations, delivering approximately 19% performance uplift over prior models while maintaining compatibility with existing Itanium platforms for enterprise servers. In 2010, the Itanium 9300 series, known as Tukwila, advanced to a quad-core design on a 65 nm process with frequencies up to 1.86 GHz and a thermal design power (TDP) of 130 W. It introduced the first integrated on-die L3 cache of 30 MB shared across cores, along with multithreading for eight threads per processor, resulting in more than double the performance of Montvale in memory-intensive tasks. A key innovation was the adoption of the QuickPath Interconnect (QPI) at 4.8 GT/s, replacing the front-side bus to boost inter-processor bandwidth by up to 800% and support up to four sockets with enhanced DDR3 memory capacity reaching 2 TB per processor. These features significantly improved RAS through advanced ECC and second-generation virtualization, targeting scalable mission-critical systems. The Itanium 9500 series, codenamed Poulson and launched in 2012, represented a major leap with an eight-core architecture on a 32 nm process, clock speeds up to 2.53 GHz, and a TDP of up to 170 W.
It integrated 54 MB of on-die cache (including 32 MB L3 and approximately 6 MB total L2 across cores, with 512 KB instruction and 256 KB data per core), enabling up to 16 threads and 12 instructions retired per cycle for superior multi-threaded efficiency. Architectural enhancements included optimized integer multiply units and a wider execution pipeline, providing up to 1.9x the performance of Tukwila in HPC benchmarks while supporting up to eight sockets via QPI at 6.4 GT/s. With 3.1 billion transistors, Poulson focused on power efficiency and scalability for large-scale enterprise and technical computing environments. The final evolution, the Itanium 9700 series codenamed Kittson in 2017, retained the eight-core layout on 32 nm but refined frequencies to up to 2.66 GHz with a TDP of 130-170 W and 32 MB of L3 cache. It incorporated dual on-die memory controllers supporting scalable memory interfaces for up to 2 TB of DDR3 per socket and up to eight-socket configurations via QPI for HPC clusters. Kittson emphasized compatibility with prior generations, delivering incremental performance gains in mission-critical applications while prioritizing reliability features like advanced RAS extensions. Overall, these evolutions sustained Itanium's niche in high-end servers, with annual system shipments stabilizing around 26,000 units through 2016 amid a focus on specialized, resilient computing.

End of production and support (post-2017)

Intel ceased development of new Itanium designs after 2017, citing the dominance of the x86 architecture in server markets as the primary reason for shifting resources away from the IA-64 platform. The final processor, codenamed Kittson (Itanium 9700 series), saw its last shipments in July 2021, marking the end of hardware production. A significant legal challenge arose from the HP-Oracle dispute, stemming from Oracle's decision to withdraw support for its software on Itanium-based HP servers despite a prior agreement. The dispute originated in a September 2010 settlement of an unrelated case involving former HP CEO Mark Hurd, which obligated Oracle to maintain compatibility with HP's Itanium systems. Oracle's subsequent announcement to end support in 2011 prompted HP to sue for breach of contract, leading to a 2012 ruling in HP's favor that enforced continued support and awarded damages, though the ecosystem suffered from reduced vendor confidence. Software support has progressively wound down, reflecting the hardware's obsolescence. The Linux kernel removed IA-64 architecture support in version 6.7, released in late 2023, after it had been marked as orphaned earlier that year due to lack of maintainers and minimal usage. HP-UX 11i v3, the primary operating system for Itanium servers, reached end of standard support on December 31, 2025, with Hewlett Packard Enterprise (HPE) ceasing all maintenance thereafter. However, the GNU Compiler Collection (GCC) temporarily reversed its deprecation of Itanium support in version 15, released in 2025, to aid legacy code maintenance amid ongoing discussions of full removal in future releases. As of November 2025, support remains in GCC 15 but is planned for removal in GCC 16. HPE has accelerated its transition to x86-based architectures for enterprise servers, fully phasing out Itanium-based systems by the end of 2025 to align with broader industry trends toward more efficient, scalable platforms.
Despite this, a small number of legacy Itanium installations persist in specialized sectors where custom applications and high-reliability requirements delay full migrations. As of 2025, no active Itanium production or new hardware shipments occur, but emulation tools continue to enable software testing and gradual transitions to modern architectures for remaining users.

Hardware support

Server systems and platforms

The initial Itanium server platforms emerged in 2001 with Hewlett-Packard's Integrity rx series, designed as rackmount systems for enterprise workloads. The rx4610 model supported 2 to 4 first-generation Merced Itanium processors operating at 733 MHz or 800 MHz with 2 MB or 4 MB L3 cache, respectively, in a 7U chassis weighing up to 68 kg. It featured up to 64 GB of PC100 SDRAM across 64 slots, 10 PCI slots (including 8 hot-plug 64-bit/66 MHz slots), and high-availability elements such as N+1 redundant 800W hot-swap power supplies and 6 hot-swap fans for cooling. Storage options included 2 hot-plug 36 GB drives, with system bandwidth reaching 4.2 GB/s for memory and 2.1 GB/s for I/O. These systems targeted mission-critical applications under HP-UX 11i v1.5 and other supported operating systems. Mid-era platforms expanded scalability and density, exemplified by the HP Integrity rx7640, introduced around 2005 with Itanium 2 processors such as the single-core Madison generation at up to 1.6 GHz or later dual-core models. This 7U rackmount server accommodated up to 8 processors (up to 16 cores in dual-core configurations), 256 GB of DDR2 memory, and configurations with 2 cell boards, 10 GB base memory, and 5 PCI cards for enhanced I/O. It included 2 internal hard drives, a DVD drive, and redundant bulk power supplies, emphasizing reliability for database and transaction-processing tasks. Blade-based designs also proliferated, with the HP Integrity BL60p serving as an early Itanium 2 blade for the BladeSystem p-Class enclosure, supporting 2 processors, 2 hot-swappable U320 SCSI drives, and integration with virtualized environments hosting up to 256 virtual machines. The HPE Integrity Superdome 2, launched later in the decade, introduced a modular blade architecture using eight-core Itanium processors in CB900s blades, scaling to 16 blades per enclosure for up to 32 sockets and 8 TB of memory in a unified fabric. Later platforms focused on high-performance computing and extreme scalability, particularly through SGI's offerings.
The SGI Altix 4700 series utilized McKinley and subsequent Itanium 2 processors in NUMA configurations, supporting large core counts in bandwidth-optimized systems for scientific simulations, as evidenced by Itanium-based systems ranking on supercomputing lists. The HPE Superdome 2 evolved to support Kittson-generation Itanium processors (planned for a mid-2017 refresh), maintaining up to 8 sockets in compact configurations while preserving compatibility with Poulson-era systems; following the Kittson refresh, no further Itanium platforms were developed due to the architecture's discontinuation. Form factors spanned rackmount (e.g., the rx series), blades (e.g., Superdome 2 CB900s), and specialized systems for high-performance computing. Hewlett-Packard (later HPE) dominated the Itanium server market, capturing approximately 90% share by 2009, with vendors such as SGI and Fujitsu (via its PrimeQuest series) each holding around 1%. Fujitsu's PrimeQuest servers supported up to 32 Itanium sockets for mission-critical Unix environments. Scalability relied on NUMA architectures, enabling single systems of up to 64 processors in earlier Superdome configurations or 32 sockets in Superdome 2, with larger clusters of over 1,000 processors possible in SGI Altix systems for HPC; dense configurations featured redundant power (e.g., 4x 800W supplies) and hot-swap cooling (e.g., multiple fans per enclosure) to manage thermal loads in multi-socket setups.

Chipset integrations

The Itanium processor family relied on a variety of chipsets from Intel and key partners to handle system connectivity, memory control, and I/O expansion, evolving from early designs supporting basic PCI interfaces to more advanced configurations enabling large-scale multiprocessor systems. Intel's initial 460GX chipset, introduced alongside the Merced-generation Itanium in 2001, served as the primary northbridge solution, supporting up to four processors with standard 100 MHz SDRAM memory, PCI slots for expansion, and AGP 4x for graphics in workstation configurations. This chipset isolated the memory and I/O subsystems from the processor bus, allowing customization for servers or workstations while providing bi-directional 1.3 GB/s bandwidth to memory. For the McKinley and subsequent Itanium 2 generations starting in 2002, Intel developed the E8870 chipset, which enhanced scalability for up to 16 processors in shared-memory modules and supported 128 GB of DDR SDRAM at 6.4 GB/s bandwidth. The E8870 included scalable node controllers for multiprocessor interconnects, though early implementations used front-side bus architectures rather than later serial links. Later iterations, particularly for the Tukwila-generation Itanium 9300 in 2010, integrated Intel's QuickPath Interconnect (QPI) with up to six links per processor for cache-coherent multiprocessing, enabling bandwidths of 96 GB/s for interprocessor communication and improved I/O handling shared with Xeon platforms. Hewlett-Packard (later HPE) contributed significantly to Itanium chipset designs, particularly for its Integrity server line. The sx2000 chipset, launched in 2005 for Madison-generation Itanium 2 systems like the Superdome, supported up to 64 processors in NUMA configurations with redundant cell boards and double-chip-spare memory for fault tolerance. HP also integrated TACHYON-based Fibre Channel controllers, such as the A6795A adapter, providing 2 Gbps connectivity for storage area networks in Itanium environments running HP-UX.
For NUMA scaling, HP systems incorporated PCI bridges to manage I/O expansion across nodes, ensuring low-latency access in multi-socket setups. Other vendors developed specialized chipsets for Itanium-based platforms. Silicon Graphics (SGI) utilized the SN1 router ASIC in its Altix series, connecting up to 512 Itanium 2 processors via the proprietary NUMAlink interconnect, which provided directory-based cache coherence for clusters with 2.4 GB/s per port bandwidth. Fujitsu's PRIMEQUEST servers employed a custom-developed chipset with a proprietary crossbar interconnect, optimized for mission-critical mainframe-class workloads on Itanium 2 processors, supporting up to 32 sockets, 512 GB of memory, and enhanced reliability features like error-correcting code extensions. Over time, Itanium chipsets evolved from PCI-X support in early generations (offering up to 1 GB/s per slot) to PCIe 3.0 in the Poulson-generation Itanium 9500 by 2012, delivering 8 GT/s per lane for high-bandwidth I/O. Memory support progressed to DDR3-1600 in later designs, achieving up to 51.2 GB/s bandwidth per socket, while QPI facilitated cache-coherent NUMA across scales from 2 to 8 sockets. Itanium chipsets lacked integrated graphics processing, focusing exclusively on server applications and requiring external host bus adapters (HBAs) for storage and networking tasks like Fibre Channel or Ethernet.
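The DDR3-1600 figure above can be checked with simple arithmetic: each 64-bit channel moves 8 bytes per transfer, so four channels per socket (an assumption in this sketch, not stated in the text) yields the quoted 51.2 GB/s:

```python
# Back-of-envelope DDR bandwidth: MT/s times bytes per transfer per
# channel gives GB/s per channel; multiply by the channel count.

def ddr_channel_gbs(mt_per_s, channel_bytes=8):
    # 8 bytes = one transfer on a standard 64-bit DDR channel.
    return mt_per_s * channel_bytes / 1000

per_channel = ddr_channel_gbs(1600)     # DDR3-1600: 12.8 GB/s/channel
print(per_channel, 4 * per_channel)     # 12.8 51.2
```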

Software support

Operating systems

The Itanium architecture received native support from several Unix variants, reflecting its initial positioning in enterprise computing. HP-UX 11i provided full support starting in 2001, enabling deployment on HP Integrity servers, with ongoing updates through version 11i v3 until its scheduled end-of-support in December 2025, with Mature Support available until December 31, 2028. IBM ported AIX 5L to Itanium as part of Project Monterey, with initial availability in 2003, though adoption remained limited and support was discontinued in 2016. Another Unix variant gained native Itanium support from 2005, targeting high-end servers, but this was terminated in 2012 following Oracle's broader withdrawal from the platform. Linux kernel support for Itanium (ia64) began with version 2.4 in 2001, allowing early adoption in research and enterprise environments. Major distributions followed suit, including Red Hat Enterprise Linux 5, which maintained support through extended lifecycle phases until November 2020 for legacy users. Debian also provided ia64 packages, though with decreasing maintenance; kernel-level deprecation occurred in version 6.7 in 2023, while user-space compatibility persists for existing installations. Microsoft offered native Itanium support in Windows Server 2003 and 2008, optimized for database and enterprise workloads, with mainstream support ending in 2010 and extended support concluding in 2018 for the 2008 edition. No subsequent versions, including Windows Server 2012 or later, supported Itanium natively, marking the end of Microsoft's investment in the architecture. Proprietary operating systems further extended Itanium's ecosystem in specialized markets. OpenVMS, ported by HP in 2003, ran on Integrity servers for mission-critical applications, with HP providing support until 2020, after which VSI OpenVMS continued maintenance, with VSI providing support until December 31, 2035. HP NonStop OS, originally from Tandem Computers, utilized Itanium for fault-tolerant transaction processing in financial and telecommunications sectors, with support ending December 31, 2025.
Tru64 UNIX, inherited from DEC via Compaq, was slated for an Itanium port in the early 2000s, but the effort saw minimal adoption and was abandoned by the mid-2000s in favor of other platforms. Porting operating systems to Itanium presented significant challenges due to the architecture's lack of binary compatibility with x86, necessitating full recompilation of applications and libraries rather than direct migration of existing binaries. This required substantial developer effort to optimize for Itanium's Explicitly Parallel Instruction Computing (EPIC) model, often leading to optimizations specific to the hardware.

Compilers and development tools

Intel developed the primary compilers for the Itanium architecture, starting with the ECC (Itanium C Compiler) released in 2001 for early development and testing on platforms like Windows and Linux. The ICC (Intel C++ Compiler) for IA-64 followed, incorporating advanced features such as bundle scheduling, which groups instructions into fixed 128-bit bundles for explicitly parallel execution, and predication, which enables conditional execution without branches using 64 predicate registers. These compilers targeted C, C++, and Fortran, emphasizing the Explicitly Parallel Instruction Computing (EPIC) model to exploit instruction-level parallelism. Hewlett-Packard provided the aC++ compiler optimized for HP-UX on Itanium, focusing on EPIC-specific enhancements like control and data speculation to execute instructions ahead of dependency resolution, using speculative loads and NaT (Not a Thing) tokens for deferred exception handling. It also supported register rotation through software pipelining, which overlaps loop iterations using rotating registers and predicates to minimize code expansion and improve loop performance. Open-source support emerged with the GNU Compiler Collection (GCC) version 3.0 in 2001, introducing an ia64 backend for C and C++ compilation. The Open64 compiler, derived from SGI's MIPS tools and released as open source, further advanced optimization research. In 2024, the ia64 backend was un-deprecated for GCC 15, maintaining limited compatibility despite broader platform decline. Key optimization techniques in these toolchains included software pipelining via modulo scheduling to overlap loop operations and if-conversion to transform branches into predicated code, reducing misprediction penalties on Itanium's deep pipeline. Tools like Intel VTune Profiler aided development by analyzing parallelism, cache misses, and branch behavior on Itanium systems, enabling targeted tuning of EPIC code. Developing for Itanium posed challenges due to the EPIC paradigm's reliance on compiler intelligence, with builds of large applications sometimes taking hours because of complex global scheduling and profile-guided passes.
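The if-conversion technique described above can be modeled in a few lines; the following Python sketch (a conceptual illustration, not compiler output) contrasts a branchy absolute-difference routine with a predicated version, where both arms are issued and a pair of complementary predicates decides which result survives:

```python
def branchy_abs_diff(a: int, b: int) -> int:
    # Compiled naively, this conditional becomes a branch the
    # hardware must predict.
    if a > b:
        return a - b
    return b - a

def predicated_abs_diff(a: int, b: int) -> int:
    # A compare writes a pair of complementary predicate "registers",
    # as Itanium's cmp instructions do.
    p = a > b
    not_p = not p
    r = 0
    # Both arms are issued; each instruction takes effect only under
    # its predicate, so no branch (and no misprediction) is needed.
    if p:
        r = a - b      # models: (p)  sub r = a, b
    if not_p:
        r = b - a      # models: (!p) sub r = b, a
    return r
```

Both functions compute the same value for all inputs; the point of the transform is that the predicated form has straight-line control flow the scheduler can pack into bundles.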
To bridge legacy software, Intel's IA-32 Execution Layer (IA-32 EL) served as a dynamic binary translator, converting x86 instructions to native IA-64 code at runtime for compatibility without full recompilation. Today, active development of Itanium compilers is limited, with Intel and HP focusing on maintenance rather than new features; archived versions remain available through developer portals for legacy support. Migration guides from vendors like HPE recommend porting code to x86-64 architectures, leveraging automated tools for syntax and optimization adjustments to ease the transition from Itanium environments.
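The translation-caching idea at the heart of a dynamic binary translator such as IA-32 EL can be sketched with a toy guest ISA (the opcodes and register names below are invented for illustration and bear no relation to real x86 or IA-64 encodings):

```python
# Toy dynamic binary translator: "guest" instructions are tuples, and
# translation produces Python closures ("host code") cached per program
# counter, so repeated execution skips the decode/translate step -- the
# essence of a translation cache.
GUEST_PROGRAM = [
    ("movi", "a", 5),        # a = 5
    ("movi", "b", 7),        # b = 7
    ("add",  "a", "b"),      # a = a + b
]

def translate(instr):
    """Phase 1: template-based translation of one guest instruction."""
    op = instr[0]
    if op == "movi":
        _, reg, imm = instr
        return lambda regs: regs.__setitem__(reg, imm)
    if op == "add":
        _, dst, src = instr
        return lambda regs: regs.__setitem__(dst, regs[dst] + regs[src])
    raise ValueError(f"unknown opcode {op!r}")

def run(program):
    cache = {}                  # translation cache: pc -> host closure
    regs = {}
    for pc, instr in enumerate(program):
        if pc not in cache:
            cache[pc] = translate(instr)
        cache[pc](regs)         # execute the translated host code
    return regs

print(run(GUEST_PROGRAM))  # {'a': 12, 'b': 7}
```

A real translator like IA-32 EL adds the second phase the text describes: profiling counters on cached blocks trigger re-translation of hot regions with heavier optimization.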

Virtualization and emulation

Hewlett-Packard (HP, later HPE) developed nPars, a hardware partitioning technology for servers based on Itanium processors, enabling the division of a single physical system into multiple independent partitions for improved resource isolation and management. Complementing this, HP Integrity Virtual Machines (Integrity VM), introduced in 2005, provided soft partitioning and full virtualization support on Itanium hardware, allowing multiple virtual machines to run concurrently on a single physical host while leveraging Intel's Virtualization Technology for Itanium (VT-i). These solutions facilitated efficient workload consolidation and resource allocation in enterprise settings, with Integrity VM requiring HP-UX 11i v2 (May 2005 update) or later as the host and guest operating system. For development purposes, HP created the Ski simulator, an open-source instruction set emulator designed to execute and debug Itanium code on non-Itanium platforms such as x86 systems running Linux. Ski simulates the Itanium architecture as specified in Intel's Itanium 2 manuals, supporting user-level programs and basic operating system development without full platform emulation, and it includes features like a loader for executables and commands for inspecting machine state. This tool proved valuable for early software porting and testing before native hardware availability, though it lacks networking or peripheral simulation. To address compatibility with existing software ecosystems, Intel implemented the IA-32 Execution Layer (IA-32 EL), a dynamic binary translation system that enabled 32-bit x86 applications to run on Itanium processors with minimal overhead. IA-32 EL employed a two-phase approach of initial template-based translation followed by runtime optimization, bypassing the slower hardware x86 emulation mode in early Itanium models and achieving performance comparable to equivalently clocked x86 processors, typically incurring 10-20% overhead on Itanium 2 systems.
This layer was integrated into operating systems such as Windows and Linux for Itanium, supporting seamless execution of legacy x86 binaries in enterprise applications. During the early 2000s transition from Compaq's Alpha architecture to Itanium following HP's acquisition of Compaq, efforts focused on source-level migration rather than direct binary emulation, as no widespread runtime emulator like FX!32 (originally for running x86 on Alpha) was developed for Alpha binaries on Itanium; instead, HP provided porting guides and recompilation frameworks for OpenVMS and Tru64 UNIX applications to ensure compatibility. For reverse emulation, running Itanium (IA-64) code on non-Itanium architectures, support has remained limited: QEMU's IA-64 efforts were experimental and never became part of the mainline project, leaving such emulation suitable mainly for legacy application testing and preservation. Interpreted emulation of IA-64 is in any case slow, with performance often compared to a 1990s-era processor, making it unsuitable for production workloads but potentially useful for development or archival purposes. Microsoft did not provide emulation for IA-64 binaries on x86-64 Windows via WOW64 or Application Virtualization (App-V), limiting support to source recompilation or containerized migration paths after Windows support for IA-64 ended with Windows Server 2008 R2. Migration from Itanium systems often involved HP's Systems Insight Manager (SIM), a centralized tool that monitored hardware inventory and facilitated configuration management and phased transitions to x86 platforms, including automated reporting for decommissioning planning. HP also offered recompilation suites and porting kits to automate source code adaptation from Itanium to x86 equivalents, reducing manual effort in enterprise migrations.
In the 2020s, financial institutions decommissioned Itanium-based infrastructure as part of broader modernization efforts, with case studies highlighting migrations to x86 cloud environments to cut costs and extend support lifecycles, though published banking examples describe general platform shifts rather than Itanium-specific tooling. Emulation generally imposed substantial performance penalties: dynamic binary translation of x86 code on Itanium showed around 50% instructions-per-cycle (IPC) loss in unoptimized scenarios compared to native execution, though optimized layers like IA-32 EL mitigated this to near parity. In hybrid environments, such as HPE's Superdome series, transitions from Itanium-based models (e.g., Superdome 2) to x86 variants (e.g., Superdome X and Flex) relied on virtualization for interim compatibility, allowing gradual workload shifts without full emulation overhead. For source-based migrations, existing toolchains eased recompilation for x86 targets, streamlining the process alongside emulation where binaries remained irreplaceable.
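At its core, a functional simulator such as Ski runs a fetch-decode-execute loop; the toy Python interpreter below mimics Itanium's three-slot instruction-bundle format with symbolic slots (the opcodes are invented, and no real IA-64 templates or encodings are modeled):

```python
# Minimal fetch-decode-execute loop over 3-slot "bundles", echoing
# Itanium's 128-bit bundles of three instructions. Opcodes are symbolic
# stand-ins, not IA-64 instructions.
def step(regs, slot):
    """Decode and execute one instruction slot against a register file."""
    op, *args = slot
    if op == "mov":
        regs[args[0]] = args[1]
    elif op == "add":
        regs[args[0]] = regs[args[1]] + regs[args[2]]
    elif op == "nop":
        pass
    else:
        raise ValueError(f"unknown opcode {op!r}")

def run_bundles(bundles):
    regs = {}
    for bundle in bundles:       # fetch one bundle at a time
        assert len(bundle) == 3  # every bundle holds exactly three slots
        for slot in bundle:      # decode/execute each slot in order
            step(regs, slot)
    return regs

prog = [
    (("mov", "r1", 2), ("mov", "r2", 3), ("nop",)),
    (("add", "r3", "r1", "r2"), ("nop",), ("nop",)),
]
print(run_bundles(prog))  # {'r1': 2, 'r2': 3, 'r3': 5}
```

A real simulator adds an executable loader, system-call handling, and machine-state inspection on top of exactly this loop, which is why interpretation is flexible but slow compared to translation.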

Market and reception

Adoption in high-end servers

Itanium's adoption in high-end servers reached its zenith between 2003 and 2008, a period marked by growing deployment in enterprise environments seeking robust 64-bit processing for demanding workloads. During this time, Hewlett-Packard (HP), later Hewlett Packard Enterprise (HPE), dominated the market, accounting for approximately 80% of Itanium system sales in 2007 and rising to 95% by 2008. This dominance was driven by HP's Integrity server line, which integrated Itanium processors with the HP-UX and NonStop operating systems tailored for high-reliability computing. The architecture's focus on explicit parallelism appealed to sectors requiring fault-tolerant operations, contributing to Intel's server revenue growth amid the broader shift to 64-bit systems. Key markets for Itanium included mission-critical applications in finance, telecommunications, and government, where downtime costs were prohibitive. In finance and telecom, HP NonStop systems powered ATM networks and stock exchanges, leveraging Itanium's scalability for real-time operations. Government agencies adopted Itanium for high-performance simulations; notably, NASA's Columbia supercomputer, equipped with 10,240 Itanium 2 processors, achieved 51.87 teraflops in 2004, enabling advanced modeling and earning recognition as one of the world's fastest systems at the time. These deployments underscored Itanium's role in environments prioritizing availability over raw speed. Market metrics highlight Itanium's niche but influential presence. In the third quarter of 2005 alone, Gartner reported 7,845 Itanium servers sold, contributing to growing cumulative shipments worldwide by mid-year according to IDC. The RISC/Itanium Unix server segment, where Itanium played a prominent role, generated $15.4 billion in revenue for 2005, representing a stable portion of the overall $51.7 billion global server market. Over its lifetime, Itanium shipments totaled hundreds of thousands of processors, with annual production estimates reaching 200,000 units by 2007. Success factors centered on reliability and scalability suited to enterprise databases and clustered systems.
HP NonStop platforms delivered 99.9999% uptime (six nines), categorized as Availability Level 4 (AL4) by IDC, enabling continuous operation for transaction-heavy applications without single points of failure. For database workloads, Oracle Database scaled effectively on Itanium systems, supporting multi-terabyte environments in clustered configurations up to 64 processors per node, as certified by Oracle for versions through 12c. Signs of decline emerged by 2010, as Itanium's share dwindled below 1% of overall server revenues amid the rise of cost-efficient x86 multicore processors from Intel's Xeon line. The RISC/Itanium segment's revenue, for example, reached $2.4 billion in Q3 2011, a 3.5% year-over-year increase, even as shipments dropped sharply. Despite this, legacy installations persisted; late in the platform's life, HPE reported supporting around 50,000 active Itanium systems, primarily in entrenched mission-critical sites reluctant to migrate due to compatibility and reliability concerns.
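The "six nines" availability figure translates directly into an annual downtime budget; a quick calculation (using 365.25-day years):

```python
# Permitted downtime per year at a given availability level.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 s

def max_downtime_seconds(availability: float) -> float:
    """Seconds of downtime allowed per year at the given availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR

for nines, a in [(3, 0.999), (5, 0.99999), (6, 0.999999)]:
    print(f"{nines} nines: {max_downtime_seconds(a):10.1f} s/year")
```

Six nines works out to roughly half a minute of downtime per year, which is why the figure is reserved for fault-tolerant designs with no single point of failure.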

Competition and decline

The Itanium architecture faced significant competition from x86 processors, particularly Intel's own Xeon line and AMD's Opteron chips introduced in 2003, which provided 64-bit extensions while maintaining backward compatibility with existing x86 software ecosystems. Unlike Itanium, which required recompilation for its Explicitly Parallel Instruction Computing (EPIC) model and lacked fast native execution of legacy x86 binaries, Opteron ran 32-bit x86 applications at full speed, eroding Itanium's appeal in enterprise servers. RISC-based architectures were also rivals, including IBM's Power series, Sun Microsystems' SPARC, and Hewlett-Packard's own PA-RISC processors, which dominated high-end computing through the early 2000s before many transitioned or declined. Emerging 64-bit ARM processors from 2012 further intensified pressure by offering power-efficient alternatives for servers, though Itanium's decline predated widespread ARM adoption in that segment. Itanium's market contraction accelerated as x86 achieved performance parity by around 2005, driven by advancements like 64-bit extensions and multicore designs that boosted throughput without the architectural overhaul Itanium required. The Itanium ecosystem lagged behind, as optimizing for its EPIC paradigm proved complex and time-intensive compared to the mature x86 toolchain, limiting software availability and developer adoption. Intel's strategic pivot toward enhancing x86 after 2006, including aggressive process shrinks and feature additions, further marginalized Itanium by prioritizing the higher-volume architecture. Economic pressures exacerbated the decline: Itanium's development incurred costs exceeding $1 billion due to its novel design and limited production scale, contrasting sharply with x86's vastly larger annual shipment volumes.
Low sales volumes hindered cost recovery, while the 2008 financial recession prompted vendors to consolidate around commoditized x86 platforms, accelerating exits from proprietary RISC and Itanium investments. Key events underscored the contraction: Oracle, following its 2010 acquisition of Sun Microsystems, announced in 2011 the cessation of software development for Itanium, citing Intel's waning commitment and low market traction while promoting its own SPARC-based alternatives. Intel formally ended Itanium development and production in 2019, accepting last orders until January 2020 and ceasing shipments by July 2021. HPE announced that standard support for Itanium-based systems would end on December 31, 2025. By 2025, Itanium persists in niche legacy roles within specialized high-end servers but holds less than 0.1% of the overall server market, serving as a cautionary example of ecosystem lock-in in an x86- and ARM-dominated landscape.

Legacy in computing

Despite its commercial challenges, Itanium left a lasting mark on computer architecture through its Explicitly Parallel Instruction Computing (EPIC) paradigm, which emphasized compiler-driven parallelism and features like predication to reduce branch mispredictions. Predication, where instructions are conditionally executed based on predicates rather than branches, influenced subsequent hardware-software co-designs by enabling more efficient exploitation of instruction-level parallelism (ILP) in superscalar processors. Similarly, Itanium's speculation mechanisms, including advanced load speculation and checkpointing, informed speculative execution models in modern out-of-order engines, allowing compilers to safely reorder loads ahead of dependent operations. In compiler design, Itanium spurred innovations that extended well beyond its commercial lifespan. Its reliance on sophisticated compiler optimization for ILP extraction advanced techniques now integral to open-source compilers like GCC and LLVM, particularly in instruction scheduling, software pipelining, and ABI design. The Itanium C++ ABI, for instance, became the standard for non-Windows platforms in GCC and Clang/LLVM, facilitating consistent binary interfaces across diverse architectures. These advancements highlighted the potential of compiler-hardware synergy, influencing how modern toolchains handle vectorization and auto-parallelization. Itanium played a pivotal role in high-performance computing (HPC) during the early 2000s, powering scalable shared-memory systems that advanced supercomputing capabilities. SGI's Altix series, built on Itanium 2 processors, enabled massive single-system-image configurations, with NASA's Columbia supercomputer (comprising 10,240 Itanium 2 processors) achieving 51.87 teraflops and ranking among the world's fastest systems in 2004. At its peak, Itanium-based systems accounted for over 80 entries on the TOP500 list in late 2004, representing about 16% of the total and demonstrating viability for large-scale scientific simulations in fields like astrophysics and climate modeling.
This era underscored Itanium's strength in NUMA-aware environments, where its 64-bit addressing and cache coherence supported memory scalability into the terabytes. Beyond HPC, Itanium found niche applications in mission-critical domains requiring high reliability and scalability. In healthcare, GE Healthcare deployed Itanium-based HP NonStop platforms for its Centricity Enterprise systems, leveraging the architecture's fault-tolerant features for processing and archiving large volumes of diagnostic data from scanners. In defense simulations, Itanium powered high-end workstations and clusters for complex modeling, though deployments such as those at national laboratories emphasized its role in secure, high-throughput computations. The migration tools and utilities developed during Itanium's lifecycle have since informed standard practice for transitioning legacy workloads to x86 or ARM platforms, incorporating lessons in emulation and recompilation. From a 2025 vantage, Itanium serves as a cautionary tale for emerging open ISAs like RISC-V, illustrating the perils of betting on novel paradigms without broad ecosystem buy-in, much as RISC-V navigates fragmentation risks in extensions and commercialization. Academically, it remains a key case study in ILP and compiler optimization, with EPIC principles referenced in research on parallelism extraction for multicore and GPU architectures. Its intellectual legacy endures through over 200 patents related to EPIC innovations, many still cited in ongoing work on hardware-software partitioning.
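The Itanium C++ ABI's influence is easy to observe in symbol names produced by GCC and Clang; the toy mangler below handles only plain free functions with a few builtin parameter types (the real scheme also covers namespaces, templates, substitutions, and much more):

```python
# Itanium C++ ABI mangling for a simple free function:
#   _Z <name-length><name> <one code per parameter type>
# e.g. void foo(int) mangles to _Z3fooi, and an empty parameter
# list is encoded as 'v' (void). Only a few builtin type codes are
# implemented here.
TYPE_CODES = {"void": "v", "int": "i", "char": "c", "double": "d", "bool": "b"}

def mangle(name: str, params: list[str]) -> str:
    """Mangle a free function per the Itanium C++ ABI (tiny subset)."""
    codes = "".join(TYPE_CODES[p] for p in (params or ["void"]))
    return f"_Z{len(name)}{name}{codes}"

print(mangle("foo", ["int"]))          # _Z3fooi
print(mangle("swap", ["int", "int"]))  # _Z4swapii
```

Feeding these strings to `c++filt` recovers `foo(int)` and `swap(int, int)`, which is precisely the cross-toolchain interoperability the ABI standardized.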

References

  1. https://en.wikichip.org/wiki/intel/microarchitectures/merced
  2. https://en.wikichip.org/wiki/intel/microarchitectures/montvale