Lion Cove

Lion Cove is a 64-bit x86 CPU core architecture designed by Intel. The Lion Cove core is featured in Core Ultra Series 2 Arrow Lake and Lunar Lake processors.

Lion Cove is a performance core (P-core) architecture aimed at providing high computing performance, with wider integer and vector execution units, wider fetch, and higher core frequencies than Intel's density-optimized E-core architectures. Intel claims a 14% increase in instructions per cycle (IPC) for the Lion Cove P-core over Redwood Cove. Intel approached the Lion Cove design process with the intention to "remove any transistor from the design that doesn't directly contribute to productivity", stripping down the core design to focus on single-threaded performance and core area efficiency. Ori Lempel served as Senior Principal Engineer for the Lion Cove P-core design.
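As a rough illustration of what an IPC claim implies, single-thread throughput can be approximated as IPC multiplied by clock frequency. The sketch below applies the 14% figure quoted above under that simplified model; the function name and the 95% clock ratio are hypothetical and chosen only for illustration, not taken from any Intel data.

```python
def relative_speedup(ipc_gain: float, freq_ratio: float = 1.0) -> float:
    """Speedup of a new core over an old one under the simplified model
    performance = IPC * frequency.

    ipc_gain   -- fractional IPC improvement (0.14 means +14%)
    freq_ratio -- new clock divided by old clock (1.0 means equal clocks)
    """
    return (1.0 + ipc_gain) * freq_ratio


# IPC uplift over Redwood Cove as claimed by Intel (from the text).
LION_COVE_IPC_GAIN = 0.14

# At equal clocks, the speedup is just the IPC gain.
iso_clock = relative_speedup(LION_COVE_IPC_GAIN)

# Hypothetical scenario: a 5% lower clock partly offsets the IPC gain.
lower_clock = relative_speedup(LION_COVE_IPC_GAIN, freq_ratio=0.95)

print(f"iso-clock speedup:    {iso_clock:.3f}x")
print(f"speedup at 95% clock: {lower_clock:.3f}x")
```

Real workloads rarely scale this cleanly, since memory latency and other stalls do not shrink with IPC, so the model is best read as an upper-level sanity check.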

The front end of the Lion Cove core, which fetches, decodes and issues instructions, has been made wider and deeper. Instructions from the Instruction Queue are decoded eight-way, up from six-way decode in Redwood Cove. Likewise, Lion Cove's out-of-order engine uses an eight-way allocation/rename queue, up from Redwood Cove's six-way queue. The out-of-order engine splits renaming and scheduling into dedicated integer and vector domains, which allows Intel to modify each domain independently in future designs without redesigning the entire out-of-order engine. Each domain has its own access to the micro-op queue. The larger micro-op cache and longer micro-op queue improve efficiency: the more micro-ops are served from the cache, the less often the decode logic has to be powered up again.
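To make the effect of decode width concrete, the following minimal sketch computes the best-case number of cycles needed to decode a block of instructions at six-way and eight-way width. It assumes every cycle is fully utilized, which real code rarely achieves; the function name and the 48-instruction block are hypothetical.

```python
import math


def decode_cycles(instructions: int, width: int) -> int:
    """Best-case cycles to decode `instructions` at `width` per cycle.

    Assumes the decoder is never starved or stalled, so this is a lower
    bound rather than a prediction of real behavior.
    """
    return math.ceil(instructions / width)


block = 48  # hypothetical block of instructions
print("six-way decode:  ", decode_cycles(block, 6), "cycles")
print("eight-way decode:", decode_cycles(block, 8), "cycles")
```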

Branch prediction has been strengthened in Lion Cove, with the core's prediction block being 8 times wider than Redwood Cove's. A core's branch predictor tries to predict which path execution will take at a branch, where the code paths diverge. Lion Cove's L0 Branch Target Buffer (BTB) has been doubled to 256 entries so that it can store more target addresses of taken branches, which helps predict the next branch and reduces the number of misses.
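The benefit of a larger BTB can be sketched with a toy model. The code below is a simplified direct-mapped table, not Intel's actual design (real BTBs are typically set-associative and tagged differently); the class, the indexing scheme, and the 200-branch workload are all assumptions made for illustration.

```python
class BranchTargetBuffer:
    """Toy direct-mapped BTB mapping a branch address to its last target."""

    def __init__(self, entries: int):
        self.entries = entries
        # Each slot holds (branch_pc, target) or None when empty.
        self.table = [None] * entries

    def _index(self, pc: int) -> int:
        # Drop the two low bits (assumed 4-byte spacing), then wrap.
        return (pc >> 2) % self.entries

    def lookup(self, pc: int):
        slot = self.table[self._index(pc)]
        if slot is not None and slot[0] == pc:
            return slot[1]
        return None  # miss: no target known for this branch

    def update(self, pc: int, target: int) -> None:
        self.table[self._index(pc)] = (pc, target)


def count_misses(btb: BranchTargetBuffer, branches, passes: int = 10) -> int:
    """Replay a list of (pc, target) taken branches and count BTB misses."""
    misses = 0
    for _ in range(passes):
        for pc, target in branches:
            if btb.lookup(pc) != target:
                misses += 1
                btb.update(pc, target)
    return misses


# Hypothetical workload: 200 taken branches executed repeatedly in a loop.
branches = [(0x1000 + 4 * i, 0x8000 + 16 * i) for i in range(200)]

for size in (128, 256):
    print(size, "entries:", count_misses(BranchTargetBuffer(size), branches), "misses")
```

In this toy workload the 256-entry table holds every branch after the first pass, while the 128-entry table keeps evicting entries that are needed again, which is the kind of miss the larger buffer is meant to reduce.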

Lion Cove increases the number of integer Arithmetic Logic Units (ALUs) to six. Redwood Cove contained five ALUs that used a 256-bit wide pipe. The number of integer multiply units has risen from one to three, which means that the core can execute more than one integer multiply operation per cycle.

Intel's vector engine design in Lion Cove now more closely resembles that used by AMD since Zen, with four pipes for floating-point and vector execution. Two of those pipes deal with floating-point multiplications and multiply-adds, while the other two pipes handle floating-point additions. The number of floating-point dividers has increased from one to two, with improved throughput. For handling sort-vector instructions, the vector engine contains four SIMD ALUs, up from three in Redwood Cove.

Lion Cove supports AVX-512 instructions, but the support is disabled in heterogeneous processor generations such as Arrow Lake and Lunar Lake. This is no different from Golden Cove, Raptor Cove and Redwood Cove, which had their AVX-512 support disabled in all heterogeneous non-server products.
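Because AVX-512 may be present in the core but not exposed by the product, software should check at run time rather than infer support from the core name. The following minimal sketch, assuming a Linux system, looks for the `avx512f` flag that the kernel reports in `/proc/cpuinfo` on x86; the helper names are hypothetical, and other operating systems need a different mechanism such as querying CPUID directly.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (for example, a non-x86 system)


def has_avx512f(cpuinfo_text: str) -> bool:
    """True if the AVX-512 Foundation flag is exposed to the OS."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("AVX-512F exposed:", has_avx512f(path.read_text()))
    else:
        print("No /proc/cpuinfo here; this sketch only covers Linux.")
```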

Lion Cove introduces an expanded cache hierarchy with four caching tiers rather than three. With select Broadwell SKUs in 2015, Intel added a 128 MB eDRAM that acted like a fourth-level cache. However, this eDRAM was not a traditional cache: it sat on a separate die and served as a slower shared memory between the CPU cores and graphics, intended to reduce memory access requests. Broadwell's L3 cache had one third of the latency (measured in cycles) and over triple the bandwidth of its eDRAM. The last time Intel added a new level of traditional cache was in 2003, with the L3 cache on the Pentium 4 Extreme Edition.
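One way to reason about an extra cache tier is the expected access latency across the hierarchy. The sketch below is a simplified model in which each level has a hit rate and a total hit latency; every number in it is a hypothetical placeholder, not a Lion Cove or Redwood Cove measurement, and the function name is an assumption.

```python
def average_access_latency(levels, memory_latency: float) -> float:
    """Expected latency in cycles for a multi-level cache hierarchy.

    levels         -- list of (hit_rate, latency_cycles), nearest level first;
                      each hit rate applies only to accesses that missed
                      every closer level
    memory_latency -- cycles for an access that misses all cache levels
    """
    expected = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for hit_rate, latency in levels:
        expected += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    return expected + reach * memory_latency


# Hypothetical hit rates and latencies, for illustration only.
three_tier = [(0.90, 5), (0.80, 16), (0.70, 70)]
four_tier = [(0.80, 4), (0.60, 9), (0.80, 17), (0.70, 80)]

print("three tiers:", round(average_access_latency(three_tier, 300), 2), "cycles")
print("four tiers: ", round(average_access_latency(four_tier, 300), 2), "cycles")
```

Whether an added tier helps depends entirely on the real hit rates and latencies, so the placeholder values above should be replaced with measured figures before drawing any conclusion.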