Mali (processor)
from Wikipedia
Mali
Image: ARM Cortex A57 A53 big.LITTLE SoC with a Mali-T624 GPU
Release date: 2005
Architecture:
  • Utgard
  • Midgard
  • Bifrost
  • Valhall
Models: See Variants
Cores: 1 to 32 cores
Fabrication process: 4 to 40 nm
API support
Direct3D: 9 to 12
OpenCL: 1.1 to 3.0
OpenGL: 2.0 to 3.0
Vulkan: 1.0 to 1.3

The Mali and Immortalis series of graphics processing units (GPUs) and multimedia processors are semiconductor intellectual property cores produced by Arm Holdings for licensing in various ASIC designs by Arm partners.

Mali GPUs were developed by Falanx Microsystems A/S, which was a spin-off of a research project from the Norwegian University of Science and Technology.[1] Arm Holdings acquired Falanx Microsystems A/S on June 23, 2006 and renamed the company to Arm Norway.[2]

The GPU was originally named Malaik, but the team shortened the name to Mali, Serbo-Croatian for "small", which was thought to be fitting for a mobile GPU.[3]

On June 28, 2022, Arm announced their Immortalis series of GPUs with hardware-based Ray Tracing support.[4]

Graphics processors

Utgard

In 2005, Falanx announced their Utgard GPU Architecture, the Mali-200 GPU.[5] Arm followed up with the Mali-300, Mali-400, Mali-450, and Mali-470. Utgard was a non-unified GPU (discrete pixel and vertex shaders).[1]

Comparison of Mali Utgard graphics processing units
Model | Launch date | Type | EUs/Shader core count | Core clock rate (MHz) | L2 cache size | Fillrate (M△/s) | Fillrate (GT/s (GP/s)) | GFLOPS (per core) | OpenGL ES
Mali-55/110 2005 Fixed function pipeline[6] 1 2.8 0.1 ? 1.1
Mali-200 2007[7] Programmable pipeline[6] 1 5 ? 0.2 2.0
Mali-300 2010[8] 1 500 8 KiB 55 0.5 5
Mali-400 MP 2008 1–4 200–600 8–256 KiB 55 0.5 1.2–5.4
Mali-450 MP 2012 1–8 300–750 8–512 KiB 142 2.6 4.5–11.9
Mali-470 MP 2015 1–4 250–650 8–256 KiB 71 0.65 8–20.8

Midgard

1st generation

On November 10, 2010, Arm announced their Midgard 1st gen GPU Architecture, including the Mali-T604 and later the Mali-T658 GPU in 2011.[9][10][11][12] Midgard uses a Hierarchical Tiling system.[1]

2nd generation

On August 6, 2012, Arm announced their Midgard 2nd gen GPU Architecture, including the Mali-T678 GPU.[13] Midgard 2nd gen introduced Forward Pixel Kill.[1][14]

3rd generation

On October 29, 2013, Arm announced their Midgard 3rd gen GPU Architecture, including the Mali-T760 GPU.[15][1][16][17][18]

4th generation

On October 27, 2014, Arm announced their Midgard 4th gen GPU Architecture, including the Mali-T860, Mali-T830, Mali-T820. Their flagship Mali-T880 GPU was announced on February 3, 2015. New microarchitectural features include:[19]

  • Up to 16 cores for the Mali-T880, with 256KB – 2MB L2 cache

Bifrost

1st generation

On May 27, 2016, Arm announced their Bifrost GPU Architecture, including the Mali-G71 GPU. New microarchitectural features include:[20][21]

  • Unified shaders with quad vectorization
  • Scalar ISA
  • Clauses execution
  • Full cache coherency
  • Up to 32 cores for the Mali-G71, with 128KB – 2MB L2 cache
  • Arm claims the Mali-G71 has 40% more performance density and 20% better energy efficiency than the Mali-T880

2nd generation

On May 29, 2017, Arm announced their Bifrost 2nd gen GPU Architecture, including the Mali-G72 GPU. New microarchitectural features include:[22][23]

  • Arithmetic optimizations and increased caches
  • Up to 32 cores for the Mali-G72, with 128KB – 2MB L2 cache
  • Arm claims the Mali-G72 has 20% more performance density and 25% better energy efficiency than the Mali-G71

3rd generation

On May 31, 2018, Arm announced their Bifrost 3rd gen GPU Architecture, including the Mali-G76 GPU. New microarchitectural features include:[24][25]

  • 8 execution lanes per engine (up from 4). Doubled pixel and texel throughput
  • Up to 20 cores for the Mali-G76, with 512KB – 4MB L2 cache
  • Arm claims the Mali-G76 has 30% more performance density and 30% better energy efficiency than the Mali-G72

Valhall

1st generation

On May 27, 2019, Arm announced their Valhall GPU Architecture, including the Mali-G77 GPU, and in October Mali-G57 GPUs. New microarchitectural features include:[26][27][28]

  • New superscalar engine
  • Simplified scalar ISA
  • New dynamic scheduling
  • Up to 16 cores for the Mali-G77, with 512KB – 2MB L2 cache
  • Arm claims the Mali-G77 has 30% more performance density and 30% better energy efficiency than the Mali-G76

2nd generation

On May 26, 2020, Arm announced their Valhall 2nd Gen GPU Architecture, including the Mali-G78. New microarchitectural features include:[29][30][31]

  • Asynchronous clock domains
  • New FMA units and increased tiler throughput
  • Up to 24 cores for the Mali-G78, with 512KB – 2MB L2 cache
  • Arm Frame Buffer Compression (AFBC)
  • Arm claims the Mali-G78 has 15% more performance density and 10% better energy efficiency than the Mali-G77

3rd generation

On May 25, 2021, Arm announced their Valhall 3rd Gen GPU Architecture (as part of TCS21), including the Mali-G710, Mali-G510, and Mali-G310 GPUs. New microarchitectural features include:[32][33][34]

  • Larger shader cores (2x compared to Valhall 2nd Gen)
  • New GPU frontend, Command Stream Frontend (CSF) replaces the Job Manager
  • Up to 16 cores for the Mali-G710, with 512KB – 2MB L2 cache
  • Arm claims the Mali-G710 has 20% more performance density and 20% better energy efficiency than the Mali-G78
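
The generation-over-generation "performance density" claims listed from the Mali-G71 through the Mali-G710 can be chained to get a rough cumulative factor. The sketch below is only an illustration: it assumes the claims are directly comparable and multiply cleanly, which vendor figures measured under different conditions may not do. The names `CLAIMS` and `cumulative_density` are hypothetical.

```python
# Performance-density claims quoted in the generation lists above.
# Each entry: (newer GPU, older GPU, claimed fractional gain).
CLAIMS = [
    ("Mali-G71", "Mali-T880", 0.40),
    ("Mali-G72", "Mali-G71", 0.20),
    ("Mali-G76", "Mali-G72", 0.30),
    ("Mali-G77", "Mali-G76", 0.30),
    ("Mali-G78", "Mali-G77", 0.15),
    ("Mali-G710", "Mali-G78", 0.20),
]


def cumulative_density(claims, baseline="Mali-T880"):
    """Chain the claims into a factor relative to `baseline` for each GPU.

    Assumes (for illustration only) that the claims compound multiplicatively.
    """
    factors = {baseline: 1.0}
    for newer, older, gain in claims:
        factors[newer] = factors[older] * (1.0 + gain)
    return factors


if __name__ == "__main__":
    for gpu, factor in cumulative_density(CLAIMS).items():
        print(f"{gpu:10s} {factor:5.2f}x")
```

Under these assumptions the Mali-G710 comes out at roughly 3.9 times the performance density of the Mali-T880.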

4th generation

On June 28, 2022, Arm announced their Valhall 4th Gen GPU Architecture (as part of TCS22), including the Immortalis-G715, Mali-G715, and Mali-G615 GPUs. New microarchitectural features include:[4][35]

  • Ray Tracing support (hardware-based)
  • Variable Rate Shading[36]
  • New Execution Engine, with a doubled FMA block, Matrix Multiply instruction support, and PPA improvements
  • Arm Fixed Rate Compression (AFRC)
  • Arm claims the Immortalis-G715 has 15% more performance and 15% better energy efficiency than the Mali-G710[37]

5th generation

On May 29, 2023, Arm announced their 5th Gen Arm GPU Architecture (as part of TCS23), including the Immortalis-G720, Mali-G720 and Mali-G620 GPUs.[38][39][40] New microarchitectural features include:[41]

  • Deferred vertex shading (DVS) pipeline
  • Arm claims the Immortalis-G720 has 15% more performance and uses up to 40% less memory bandwidth than the Immortalis-G715

Technical details

Like other embedded IP cores for 3D rendering acceleration, the Mali GPU does not include display controllers driving monitors, in contrast to common desktop video cards. Instead, the Mali ARM core is a pure 3D engine that renders graphics into memory and passes the rendered image over to another core to handle display.

ARM does, however, license display controller SIP cores independently of the Mali 3D accelerator SIP block, e.g. Mali DP500, DP550 and DP650.[42]

ARM also supplies tools to help in authoring OpenGL ES shaders named Mali GPU Shader Development Studio and Mali GPU User Interface Engine.

Display controllers such as the ARM HDLCD display controller are available separately.[43]

Variants

The Mali cores grew out of the cores previously produced by Falanx and currently comprise:[44]

Model | Microarchitecture | Type | Launch date | EUs/Shader core count | Shading units (per core) | Total shaders | Fab (nm) | Die size (mm2) | Core clock rate (MHz) | L2 cache size (KiB) | Fillrate (M△/s) | Fillrate (GT/s (GP/s)) | GFLOPS (per core) | GFLOPS (total) | Vulkan | OpenGL ES | OpenCL
Mali-T604[45] Midgard 1st gen Unified shader model +

SIMD ISA

Nov 2010[46] 1–4 8 8–32 32
28
? 533 32–256 133 0.6 @ 600 MHz 9.6 @ 600 MHz 9.6–38.4 @ 600 MHz 3.1 1.1 Full Profile
Mali-T658[45] Nov 2011[47] 1–8 16 16–128 ? ? ? 19.2 @ 600 MHz 19.2–153.6 @ 600 MHz
Mali-T622 Midgard 2nd gen Jun 2013[48] 1–2 4 4–8 32
28
? 533 ? ? 4.8 @ 600 MHz 4.8–9.6 @ 600 MHz
Mali-T624 Aug 2012 1–4 8 8–32 ? 533–600 ? ? 9.6 @ 600 MHz 9.6–38.4 @ 600 MHz
Mali-T628 1–8 16 16–128 ? 533–695 ? ? 19.2 @ 600 MHz 19.2–153.6 @ 600 MHz
Mali-T678[49] 1–8 28 ? ? ? ?
Mali-T720 Midgard 3rd gen Oct 2013 1–8 10 10–80 28
14
10
? 400–700 600 (MP8@
600 MHz)
0.6 @ 600 MHz 12 @ 600 MHz 12–96 @ 600 MHz
Mali-T760 1–16 14 14–224 28
20
14
1.75 mm2 per shader core at 14 nm[50] 600–772 256–2048[51] 1300 16.8 @ 600 MHz 16.8–268.8 @ 600 MHz 1.0[52] 3.2[53] 1.2 Full Profile
Mali-T820 Midgard 4th gen Q4 2015 1–4 8 8–32 28 ? 600 32–256[51] 400 9.6 @ 600 MHz 9.6–38.4 @ 600 MHz
Mali-T830 16 16–64 28
16
14
? 600–950 400 19.2 @ 600 MHz 19.2–76.8 @ 600 MHz
Mali-T860 1–16 14 14–224 ? 350–700 256–2048[51] 1300 16.8 @ 600 MHz 16.8–268.8 @ 600 MHz
Mali-T880 Q2 2016 1–16 21 21–351 20
16
14
? 650–1000 1700 25.2 @ 600 MHz 25.2–403.2 @ 600 MHz
Mali-G31 Bifrost 1st gen Unified shader model + Unified memory +

scalar, clause-based ISA

Q1 2018 1–6[54] 4 or 8 4–48 28
12
? 650 32–512 0.5 @ 1000 MHz 8–16 @ 1000 MHz 48–576 @ 1000 MHz 2.0 Full Profile
Mali-G51[55] Q4 2016 1–6[56] 8 or 12 8–72 28
16
14
12
10
? 1000 16–24 @ 1000 MHz 16–144 @ 1000 MHz
Mali-G71 Q2 2016 1–32 12 12–384 16
14
10
? 546–1037 128–2048 1850 1 @ 1000 MHz 24 @ 1000 MHz 24–768 @ 1000 MHz
Mali-G52 Bifrost 2nd gen Q1 2018 1–6 16 or 24 16–144 16
12
8
7
? 850 32-512 2 @ 1000 MHz 32–48 @ 1000 MHz 32–288 @ 1000 MHz 2.1 Full Profile
Mali-G72 Q2 2017 1–32 12 12–384 16
12
10
1.36 mm2 per shader core at 10 nm[57] 572–1050 128–2048 1 @ 1000 MHz 24 @ 1000 MHz 24–768 @ 1000 MHz 2.0 Full Profile
Mali-G76 Bifrost 3rd gen Q2 2018 4–20 24 96–480 12
8
7
? 600–800 512–4096 ? 2 @ 1000 MHz 2 @ 1000 MHz 48 @ 1000 MHz 192–960 @ 1000 MHz 1.1 2.1 Full Profile
Mali-G57 Valhall 1st gen Superscalar engine + Unified memory +

simplified scalar ISA

Q2 2019 1–6 32 32–192 12
7
6
? 950[58] 64–512 ? 4 @ 1000 MHz 64 @ 1000 MHz 64–384 @ 1000 MHz
Mali-G77 7–16 224–512 7
6
? 695–850 512–2048 ? 448–1024 @ 1000 MHz
Mali-G68 Valhall 2nd gen Q2 2020 1–6 32–192 6
5
3
64–384 @ 1000 MHz 1.2 3.0 Full Profile
Mali-G78 7–24 224–768 5 759-848 448–1536 @ 1000 MHz
Mali-G310 Valhall 3rd gen Q2 2021 1 16 or 32 or 64 16–64 6
5
4
256–1024 2, 4 or 8 @ 1000 MHz 2 or 4 @ 1000 MHz 32–128 @ 1000 MHz
Mali-G510 2–6 48 or 64 96–384 4 or 8 @ 1000 MHz 4 @ 1000 MHz 96–128 @ 1000 MHz 192–768 @ 1000 MHz
Mali-G610 1–6 64 64–384 512–2048 8 @ 1000 MHz 128 @ 1000 MHz 128–768 @ 1000 MHz
Mali-G710 7–16 448–1024 650,850
900
2648 896–2048 @ 1000 MHz
Mali-G615 Valhall 4th gen Q2 2022 1–6 128 128–768 4 256 @ 1000 MHz 256–1536 @ 1000 MHz 1.3[59]
Mali-G715 7–9 896–1152 1792–2304 @ 1000 MHz
Immortalis-G715 10–16 1280–2048 2560–4096 @ 1000 MHz
Mali-G620 5th Gen[60] Deferred Vertex Shading (DVS) Q2 2023 1–5 128–640 256–1024 256–1280 @ 1000 MHz
Mali-G720 6–9 768–1152 512–2048 1536–2304 @ 1000 MHz
Immortalis-G720 Q4 2023 10–16 1280–2048 2560–4096 @ 1000 MHz
Mali-G625 Q2 2024 1–5 128–640 4
3
256–1024 256–1280 @ 1000 MHz
Mali-G725 6–9 768–1152 512–4096 1536–2304 @ 1000 MHz
Immortalis-G925 10–24 1280–3072 2560–6144 @ 1000 MHz
Mali G1-Pro Q3 2025 1–5 128–640 3 512–2048 256–1280 @ 1000 MHz 1.4
Mali G1-Premium 6–9 768–1152 512–4096 1536–2304 @ 1000 MHz
Mali G1-Ultra 10–24 1280–3072 2560–6144 @ 1000 MHz

Some Mali implementations support L2 cache coherency with the CPU; it is unclear whether this is a property of the microarchitecture or of individual chips.[61][62]

Adaptive Scalable Texture Compression (ASTC) is supported by Mali-T620, T720/T760, T820/T830/T860/T880[63] and Mali-G series.
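
The GFLOPS columns in the table above appear to follow a simple rule: each shading unit contributes one fused multiply-add (counted as 2 FLOPs) per clock. This is inferred from the table's own figures rather than stated by the source, so the sketch below treats it as an assumption; `fp32_gflops` is a hypothetical helper name.

```python
def fp32_gflops(shading_units_per_core, clock_mhz, cores=1,
                flops_per_unit_per_clock=2):
    """Theoretical FP32 GFLOPS.

    Assumes one FMA (2 FLOPs) per shading unit per clock, which is
    inferred from the table figures and not confirmed by the source.
    """
    ghz = clock_mhz / 1000.0
    return shading_units_per_core * flops_per_unit_per_clock * ghz * cores


if __name__ == "__main__":
    # Mali-T604: 8 shading units per core at 600 MHz, 1 to 4 cores
    print(fp32_gflops(8, 600), fp32_gflops(8, 600, cores=4))
    # Mali-G71: 12 shading units per core at 1000 MHz, 1 to 32 cores
    print(fp32_gflops(12, 1000), fp32_gflops(12, 1000, cores=32))
    # Mali-G615: 128 shading units per core at 1000 MHz, 1 to 6 cores
    print(fp32_gflops(128, 1000), fp32_gflops(128, 1000, cores=6))
```

These inputs reproduce the table's per-core and total ranges for the three models (9.6–38.4, 24–768 and 256–1536 GFLOPS), up to floating-point rounding.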

Implementations

The Mali GPU variants can be found in the following systems on chips (SoCs):

Video processors

Mali Video is the name given to ARM Holdings' dedicated video decoding and video encoding ASIC. There are multiple versions implementing a number of video codecs, such as HEVC, VP9, H.264 and VP8. As with all ARM products, the Mali video processor is a semiconductor intellectual property core licensed to third parties for inclusion in their chips. Real time encode-decode capability is central to videotelephony. An interface to ARM's TrustZone technology is also built-in to enable digital rights management of copyrighted material.

Mali-V500

The first version of a Mali Video processor was the V500, released in 2013 with the Mali-T622 GPU.[116] The V500 is a multicore design, sporting 1–8 cores, with support for H.264 and a protected video path using ARM TrustZone. The 8 core version is sufficient for 4K video decode at 120 frames per second (fps). The V500 can encode VP8 and H.264, and decode H.264, H.263, MPEG4, MPEG2, VC-1/WMV, Real, VP8.

Mali-V550

Released with the Mali-T800 GPU, ARM V550 video processors added both encode and decode HEVC support, 10-bit color depth, and technologies to further reduce power consumption.[117] The V550 also included technology improvements to better handle latency and save bandwidth.[118] Again built around a scalable number of cores (1–8), the V550 could support from 1080p60 (1 core) to 4K120 (8 cores). The V550 supported HEVC Main, H.264, VP8 and JPEG encode, and HEVC Main 10, HEVC Main, H.264, H.263, MPEG4, MPEG2, VC-1/WMV, Real, VP8 and JPEG decode.
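
The 1080p60-per-core and 4K120-at-8-cores figures are consistent with throughput scaling linearly with core count. The following sketch checks the arithmetic, assuming "4K" means 3840×2160 (UHD) and "1080p" means 1920×1080; neither resolution is spelled out in the text, and `pixel_rate` is a hypothetical helper.

```python
def pixel_rate(width, height, fps):
    """Pixels per second for a given frame size and frame rate."""
    return width * height * fps


# Assumed frame sizes: 1080p = 1920x1080, 4K = 3840x2160 (UHD).
FULL_HD_60 = pixel_rate(1920, 1080, 60)    # one core
UHD_4K_120 = pixel_rate(3840, 2160, 120)   # eight cores

if __name__ == "__main__":
    print(FULL_HD_60, UHD_4K_120, UHD_4K_120 / FULL_HD_60)
```

With these assumptions, 4K120 requires exactly eight times the pixel rate of 1080p60, matching the step from one core to eight.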

Mali-V61

The Mali V61 video processor (formerly named Egil) was released with the Mali Bifrost GPU in 2016.[119][120] V61 has been designed to improve video encoding, in particular HEVC and VP9, and to allow for encoding either a single or multiple streams simultaneously.[121] The design continues the 1–8 variable core number design, with a single core supporting 1080p60 while 8 cores can drive 4Kp120. It can decode and encode VP9 10-bit, VP9 8-bit, HEVC Main 10, HEVC Main, H.264, VP8, JPEG and decode only MPEG4, MPEG2, VC-1/WMV, Real, H.263.[122]

Mali-V52

The Mali V52 video processor was released with the Mali G52 and G31 GPUs in March 2018.[123] The processor is intended to support 4K (including HDR) video on mainstream devices.[124]

The platform is scalable from 1 to 4 cores and doubles the decode performance relative to V61. It also adds High 10 H.264 encode (Level 5.0) and decode (Level 5.1) capabilities, as well as AVS Part 2 (Jizhun) and Part 16 (AVS+, Guangdian) decode capability for YUV420.[125]

Mali-V76

The Mali V76 video processor was released with the Mali G76 GPU and Cortex-A76 CPU in 2018.[126] The V76 was designed to improve video encoding and decoding performance. The design continues the variable core count approach with 2–8 cores; 8 cores are capable of 8Kp60 decoding and 8Kp30 encoding. Arm claims it improves HEVC encode quality by 25% relative to the Mali-V61 at launch. The AV1 codec is not supported.

Mali-V77

The Mali V77 video processor was released with the Mali G77 GPU and Cortex-A77 CPU in 2019.

Comparison

Display processors

Mali-D71

The Mali-D71 added Arm Framebuffer Compression (AFBC) 1.2 encoder, support for ARM CoreLink MMU-600 and Assertive Display 5. Assertive Display 5 has support for HDR10 and hybrid log–gamma (HLG).

Mali-D77

The Mali-D77 added features including asynchronous timewarp (ATW), lens distortion correction (LDC), and chromatic aberration correction (CAC). The Mali-D77 is also capable of 3K (2880x1440) @ 120 Hz and 4K @ 90 Hz.[131]

Image signal processors

Mali-C71

On April 25, 2017 the Mali-C71 was announced, ARM's first image signal processor (ISP).[143][144][145]

Mali-C52 and Mali-C32

On January 3, 2019 the Mali-C52 and C32 were announced, aimed at everyday devices including drones, smart home assistants, and security and internet protocol (IP) cameras.[146]

Mali-C71AE

On September 29, 2020 the Mali-C71AE image signal processor was introduced, alongside the Cortex-A78AE CPU and Mali-G78AE GPU.[147] It supports up to 4 real-time cameras or up to 16 virtual cameras with a maximum resolution of 4096 x 4096 each.[148]

Mali-C55

On June 8, 2022 the Mali-C55 ISP was introduced as the successor to the C52.[149][150] It is the smallest and most configurable image signal processor from Arm, and supports up to 8 cameras with a maximum resolution of 48 megapixels each. Arm claims improved tone mapping and spatial noise reduction compared to the C52. Multiple C55 ISPs can be combined to support resolutions higher than 48 megapixels.

Comparison

The Lima, Panfrost and Panthor FOSS drivers

On January 21, 2012, Phoronix reported that Luc Verhaegen was driving a reverse-engineering attempt aimed at the Mali series of GPUs, specifically the Mali 200 and Mali 400 versions. The project was known as Lima and targeted support for OpenGL ES 2.0.[152] The reverse-engineering project was presented at FOSDEM, February 4, 2012,[153][154] followed by the opening of a website[155] demonstrating some renders. On February 2, 2013, Verhaegen demonstrated Quake III Arena in timedemo mode, running on top of the Lima driver.[156] In May 2018, a Lima developer posted the driver for inclusion in the Linux kernel.[157] In May 2019, the Lima driver became part of the mainline Linux kernel.[158] The Mesa userspace counterpart was merged at the same time. It currently supports OpenGL ES 1.1, 2.0 and parts of Desktop OpenGL 2.1, and the fallback emulation in MESA provides full support for graphical desktop environments.[159]

Panfrost is a reverse-engineered driver effort for Mali Txxx (Midgard) and Gxx (Bifrost) GPUs. A talk introducing Panfrost was presented at X.Org Developer's Conference 2018.[160] As of May 2019, the Panfrost driver is part of the mainline Linux kernel[161] and Mesa. Panfrost supports OpenGL ES 2.0, 3.0 and 3.1, as well as OpenGL 3.1.[162]

Collabora later developed[163] the Panthor driver for the G310, G510 and G710 GPUs.
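
The driver coverage described in this section can be summarised as a small lookup. The sketch below encodes only what the text and the Variants table state (Lima for the Mali 200 and 400, Panfrost for Midgard T-series and Bifrost G-series models, Panthor for the G310, G510 and G710) and returns `None` for anything else; actual driver support may be broader. The function name `foss_driver_for` and the model sets are hypothetical, not part of any driver's API.

```python
import re

# Coverage as described in the text above; real driver support may differ.
LIMA_MODELS = {"200", "400"}
PANTHOR_MODELS = {"G310", "G510", "G710"}
# Bifrost models as listed in the Variants table.
BIFROST_MODELS = {"G31", "G51", "G52", "G71", "G72", "G76"}


def foss_driver_for(model):
    """Map a model name such as 'Mali-T860' to 'lima', 'panfrost',
    'panthor', or None when the text does not cover the model."""
    m = re.fullmatch(r"(?:mali[-\s]?)?([tg]?)(\d{2,3})", model.strip().lower())
    if m is None:
        return None
    prefix, digits = m.groups()
    key = (prefix + digits).upper()
    if key in LIMA_MODELS:
        return "lima"
    if key in PANTHOR_MODELS:
        return "panthor"
    if prefix == "t" or key in BIFROST_MODELS:
        return "panfrost"  # Midgard (Txxx) or Bifrost (Gxx)
    return None


if __name__ == "__main__":
    for name in ("Mali-400", "Mali-T860", "Mali-G76", "Mali-G710", "Mali-G77"):
        print(name, "->", foss_driver_for(name))
```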

See also

  • Adreno – GPU developed by Qualcomm (formerly AMD, then Freescale)
  • Atom family of SoCs – with Intel graphics core, not licensed to third parties
  • AMD mobile APUs – with AMD graphics core, not licensed to third parties
  • PowerVR – by Imagination Technologies
  • Tegra – family of SoCs by Nvidia with the graphics core available as a SIP block to third parties
  • VideoCore – family of SoCs by Broadcom with the graphics core available as a SIP block to third parties
  • Vivante – available as SIP block to third parties
  • Imageon – old AMD mobile GPU
  • RDNA – by AMD, licensed to Samsung for use as GPUs in Exynos SoCs under Xclipse name

from Grokipedia
The Mali series consists of graphics processing units (GPUs) and multimedia processors developed by Arm as licensable semiconductor intellectual property (IP) cores, primarily targeted at low-power mobile and embedded devices such as smartphones, tablets, smart TVs, automotive systems, and IoT applications. Introduced in 2006 following Arm's acquisition of Falanx Microsystems, the family has evolved through multiple architectures: Utgard for basic graphics acceleration, Midgard with unified shaders, Bifrost for improved efficiency in gaming and compute tasks, Valhall for enhanced scalability and AI support, and the current 5th Generation architecture, for which Arm claims up to 15% higher graphics performance along with efficiency gains in machine learning.

Key features across recent models include variable rate shading, deferred vertex shading, and AI/ML acceleration, with premium variants under the Immortalis branding introducing hardware-based ray tracing (RTUv2) for mobile gaming and advanced rendering. Notable implementations include the Mali-G77, the flagship Immortalis-G720, and the recent Mali G1-Ultra, which is said to double ray tracing performance. Mali GPUs ship in devices from vendors such as Samsung and MediaTek and in automotive ADAS systems.

Overview

History

ARM acquired Falanx Microsystems in June 2006, integrating its graphics technology to address the growing demand for advanced mobile graphics processing, with the initial Mali-55 GPU marking the start of the Mali family. This move positioned ARM to provide dedicated GPU IP for power-constrained mobile devices, building on Falanx's research from the Norwegian University of Science and Technology. The Mali-55 was followed by the announcement of the Mali-200 in early 2007, which became the first commercially licensed Mali IP in 2008, achieving OpenGL ES 2.0 conformance and enabling higher-resolution graphics in early smartphones and tablets.

Key milestones included a shift toward open-source driver development, with the community-driven Lima project releasing initial open-source code for Mali-200 and Mali-400 GPUs to foster broader ecosystem adoption. Architectural evolution accelerated with the introduction of the Midgard architecture, which moved from non-unified vertex and fragment processors to unified shaders for improved flexibility and efficiency in handling diverse workloads such as OpenGL ES 3.0. This period also saw ARM's IP licensing model gain traction, with major SoC vendors, including Allwinner, integrating Mali GPUs into their platforms for cost-effective, high-performance graphics in consumer devices.

Subsequent advancements focused on enhancing realism and AI capabilities, with ray tracing first introduced in the Immortalis series in 2022 and further advanced as part of the fifth-generation GPU architecture announced in May 2023, exemplified by the Immortalis-G720 for mobile gaming. In September 2025, ARM released the Mali G1-Ultra, built on the fifth-generation architecture, which enhances AI processing and doubles ray-tracing performance in mobile SoCs, along with a new branding scheme that drops the Immortalis and Cortex names in favor of G1 for GPUs and C1 for CPUs.

The evolution toward fully open-source drivers gained momentum starting in 2017 with community-driven efforts like Panfrost, improving compatibility for Mali hardware.

Key features and licensing

The Mali processors feature a highly scalable design, allowing configurations from single-core setups for low-power embedded applications to multi-core clusters with up to 24 cores, such as in the Mali-G78AE, to address diverse needs in mobile devices, automotive systems, and high-end computing. This modularity enables licensees to tailor performance and area trade-offs without redesigning core architectures, supporting applications from IoT sensors to flagship smartphones.

Mali GPUs provide broad API compatibility, including OpenGL ES up to version 3.2, Vulkan from 1.0 to 1.3, and OpenCL 1.2 for compute tasks. Direct3D support is achieved through translation layers in environments like Windows on ARM, facilitating cross-platform development.

Power efficiency is a cornerstone of Mali's architecture, optimized for battery-constrained devices through tile-based deferred rendering, which divides the screen into small tiles (typically 16x16 pixels) processed on-chip to minimize external memory bandwidth and reduce power draw by up to 50% compared to immediate-mode rendering. Additional features include dynamic voltage and frequency scaling (DVFS) for adaptive power management based on workload demands, and the ability to disable inactive circuit blocks, further enhancing energy efficiency.

Licensing for Mali IP follows ARM's standard model, where the company provides synthesizable register-transfer level (RTL) designs for custom integration or pre-placed GDSII hard macros for faster implementation, accompanied by upfront fees and royalties calculated per shipped device. This structure allows partners to incorporate Mali into their SoCs while ARM handles ongoing optimizations. ARM collaborates with leading foundries like TSMC to enable fabrication on advanced nodes, including 3nm and 2nm processes targeted for production in 2025.

Mali processors are deeply integrated into major ecosystems, powering graphics and compute in billions of Android devices, Linux-based systems via open-source Mesa drivers, and Windows on ARM platforms for mixed-reality and productivity applications. Later generations, such as the Immortalis series, incorporate hardware-accelerated ray tracing units (RTUv2) for realistic lighting effects and dedicated AI engines for on-device inference, delivering up to 2x improvements in ray tracing throughput and ML performance.
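
The tile-based approach described above can be illustrated with a small binning sketch: the screen is divided into 16x16-pixel tiles, and each triangle is assigned to the tiles its bounding box overlaps. This is a simplified illustration, not a description of Mali's actual tiler; the conservative bounding-box test and the names `tile_grid` and `bin_triangle` are assumptions made for the example.

```python
import math

TILE = 16  # tile edge in pixels, per the "typically 16x16" figure above


def tile_grid(width, height, tile=TILE):
    """Number of tiles needed to cover a width x height screen."""
    return math.ceil(width / tile), math.ceil(height / tile)


def bin_triangle(vertices, width, height, tile=TILE):
    """Return the (tx, ty) tiles overlapped by the triangle's bounding box.

    Bounding-box binning is conservative: it may list tiles the triangle
    does not actually touch. Real hardware tilers are more refined.
    """
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    # Clamp the bounding box to the screen.
    x0 = max(0, math.floor(min(xs)))
    x1 = min(width - 1, math.floor(max(xs)))
    y0 = max(0, math.floor(min(ys)))
    y1 = min(height - 1, math.floor(max(ys)))
    if x0 > x1 or y0 > y1:
        return []  # entirely off-screen
    return [(tx, ty)
            for ty in range(y0 // tile, y1 // tile + 1)
            for tx in range(x0 // tile, x1 // tile + 1)]


if __name__ == "__main__":
    print(tile_grid(1920, 1080))
    print(bin_triangle([(0, 0), (20, 5), (5, 20)], 1920, 1080))
```

Once triangles are binned, each tile can be shaded entirely in on-chip memory and written out once, which is where the bandwidth saving comes from.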

Graphics processors

Utgard architecture

The Utgard architecture represents the inaugural generation of Arm's Mali GPU family, featuring dedicated vertex and fragment processors without unified shaders. This pre-unified approach separated geometry processing in the vertex stage from pixel shading in the fragment stage, enabling efficient handling of 2D and 3D pipelines. A core innovation was its tile-based deferred rendering technique, which divides the screen into small tiles (typically 16x16 pixels) processed in on-chip memory buffers, significantly reducing external DDR memory bandwidth demands and enhancing power efficiency for battery-constrained mobile devices. This architecture supported up to 4x multi-sample anti-aliasing (MSAA) directly in hardware, further optimizing rendering quality while minimizing overdraw.

The Utgard lineup began with the Mali-55 in 2007 as a proof-of-concept for low-cost, fixed-function graphics, featuring a processor for rasterization compliant with OpenGL ES 1.1 and relying on CPU software for geometry tasks, with no programmable shaders. This was followed by the programmable Mali-200 and Mali-300 in 2007-2008, which introduced vertex shader support for OpenGL ES 2.0 alongside fragment processing, targeting enhanced mobile UIs and basic 3D games. The Mali-400 series, launched in 2008, expanded to multi-processor (MP) configurations with 1-4 fragment cores for scalable performance, while the Mali-450 in 2012 doubled scalability to 1-8 cores, all maintaining the vertex setup shared across the family. These models were produced through 2012, with core counts configurable to balance area, power, and throughput in system-on-chip (SoC) integrations. Performance scaled with core count and process node, with the Mali-400 MP4 achieving up to 55 million triangles per second (Mtri/s) and 2.0 gigapixels per second (Gpix/s) fill rate at 500 MHz on a 28nm high-performance mobile (HPM) process, sufficient for the display resolutions of early smartphones.

Targeted at feature phones and entry-level smartphones, these GPUs operated at clock speeds from 210-500 MHz, prioritizing low power over raw compute, with implementations as small as 1.4 mm² die area on 90nm for the Mali-55. Key innovations included hardware acceleration for both 2D (via OpenVG 1.1) and 3D graphics, full-scene anti-aliasing up to 16x, high dynamic range (HDR) rendering, and transaction elimination to further cut memory traffic by up to 75% in tiled operations. The Mali-400, for instance, powered the Samsung Galaxy S II smartphone in 2011, enabling OpenGL ES 2.0-compliant 3D games and UI effects in a widely adopted Android device. Despite these advances, the Utgard architecture's graphics-only focus and absence of compute shader support limited its applicability to general-purpose parallel processing, rendering it unsuitable for emerging workloads beyond rendering. This design was eventually superseded by the Midgard architecture to address demands for unified shaders and broader compatibility.
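
Transaction elimination, mentioned above, avoids writing a tile back to external memory when its contents are unchanged from the previous frame. A minimal sketch of the idea follows, assuming a per-tile signature compared between frames. Using CRC32 as the signature, and the `TileWriter` class itself, are assumptions made for the example, not details taken from the source.

```python
import zlib


class TileWriter:
    """Toy model of per-tile write-back with signature comparison."""

    def __init__(self):
        self.signatures = {}  # (tx, ty) -> signature of the tile last written
        self.memory = {}      # stand-in for the external framebuffer
        self.writes = 0
        self.skipped = 0

    def flush_tile(self, tile_xy, pixels):
        """Write `pixels` (bytes) for a tile unless its signature is unchanged.

        Returns True if the tile was written, False if the write was skipped.
        Note: in this sketch a signature collision would wrongly skip a write.
        """
        signature = zlib.crc32(pixels)
        if self.signatures.get(tile_xy) == signature:
            self.skipped += 1
            return False
        self.signatures[tile_xy] = signature
        self.memory[tile_xy] = pixels
        self.writes += 1
        return True


if __name__ == "__main__":
    writer = TileWriter()
    black = bytes(16 * 16 * 4)           # 16x16 RGBA tile, all zero
    changed = b"\xff" + black[1:]        # same tile with one byte changed
    for frame in (black, black, changed):
        writer.flush_tile((0, 0), frame)
    print(writer.writes, writer.skipped)
```

Static screen regions, such as an unchanged UI background, are the case this kind of check is meant to exploit.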

Midgard architecture

The Midgard architecture represents ARM's first-generation unified shader design for Mali GPUs, introduced in 2012 as a significant advancement over the prior non-unified approach by enabling programmable shaders for vertex, fragment, and compute processing within a single core type. This shift to unified shaders allowed for flexible workload handling, with each core featuring multiple arithmetic pipelines (typically two, increasing to three in later models like the Mali-T880) that process instructions via SIMD vectorization on 128-bit quad-word registers, supporting 4 FP32, 8 FP16, or 16 int8 operations per pipeline per clock. Branch handling was improved through a massively multi-threaded execution model, where hundreds of independent threads per core mask latency from divergent paths by rapidly switching contexts, avoiding the penalties of lockstep execution in more rigid SIMD designs. Building on Utgard's tile-based rendering foundation, Midgard retained deferred lighting and on-chip tile buffers for power efficiency while adding full programmability to support emerging APIs.

Midgard evolved across four generations, scaling in core count and efficiency to meet diverse device needs. The first generation, launched with the Mali-T604 in 2012, supported up to four cores and marked the architecture's debut, powering early high-end tablets in 2012. The second generation (2013) included models such as the Mali-T622, T624, and T628, offering up to eight cores with enhanced power management for mid-range devices. Third-generation variants like the Mali-T720 and T760 (announced in 2013 but shipping around 2015) pushed scalability to 16 cores, with the T760 delivering 400% better energy efficiency than the T604 through optimized pipelines and larger L2 caches (up to 2 MB). The fourth generation (2016), comprising the Mali-T820, T830, T860, and T880, further refined this with up to 16 cores and support for more complex rendering. Multi-core Midgard configurations such as the T760 MP8 shipped in smartphones.

Performance scaled with configuration, culminating in the Mali-T880 multi-processor variants, which could achieve over 100 GFLOPS of FP32 compute in 12-core setups at typical mobile clocks around 800 MHz, driven by three arithmetic pipelines per core enabling up to 12 FP32 operations per core per cycle. Midgard GPUs supported OpenGL ES 3.1 and Vulkan 1.0, alongside OpenCL 1.2 for compute tasks, enabling features like multi-sample anti-aliasing up to 16x and adaptive scalable texture compression (ASTC) for bandwidth reduction.

Key features emphasized adaptive scalability, allowing integrators to configure 1 to 16 cores per design to balance power and performance across low- to high-end SoCs. A dedicated job manager handled task distribution, pipelining vertex, tiling, and fragment jobs to optimize throughput while minimizing host CPU overhead. Additional efficiencies included transaction elimination (reducing memory writes by 16x in 16x16 pixel blocks) and Arm Frame Buffer Compression (AFBC) for on-chip storage, contributing to overall system energy savings in tile-based rendering. Midgard laid the groundwork for GPU compute workloads by unifying shader types and exposing general-purpose memory access, but its thread-level execution model, which lacked wave-level SIMD primitives, limited occupancy and efficiency in highly divergent or bandwidth-bound kernels compared to later architectures.
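
The lane counts and the Mali-T880 throughput figure quoted in this section follow from simple arithmetic on the 128-bit register width. The sketch below reproduces them, assuming each "operation" is counted once (not as two FLOPs for an FMA), since the text does not say how operations are counted. The helper names `lanes` and `peak_gops` are hypothetical.

```python
REGISTER_BITS = 128  # Midgard quad-word register width, per the text


def lanes(element_bits, register_bits=REGISTER_BITS):
    """How many elements of a given width fit in one SIMD register."""
    return register_bits // element_bits


def peak_gops(cores, ops_per_core_per_cycle, clock_mhz):
    """Peak giga-operations per second.

    Counts each operation once; whether the source counts an FMA as one
    or two operations is not stated, so this is an assumption.
    """
    return cores * ops_per_core_per_cycle * clock_mhz / 1000.0


if __name__ == "__main__":
    print(lanes(32), lanes(16), lanes(8))   # FP32, FP16, int8 lanes
    # Mali-T880: three pipelines x 4 FP32 lanes = 12 ops per core per cycle
    ops_per_core = 3 * lanes(32)
    print(peak_gops(cores=12, ops_per_core_per_cycle=ops_per_core, clock_mhz=800))
```

With 12 cores at 800 MHz this gives about 115 giga-operations per second, consistent with the "over 100 GFLOPS" figure above.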

Bifrost architecture

The Bifrost architecture is the second-generation unified shader design in the Mali GPU family. It succeeds Midgard, improving execution efficiency and power management while keeping a tile-based deferred rendering approach. Programmable shader cores handle vertex, fragment, and compute workloads through a single unified pipeline; each core comprises multiple execution engines (EEs) that support dual issue, sending two instructions per cycle to improve throughput. The arithmetic pipelines use warp-based vectorization, operating on 4-wide warps for scalar 32-bit operations and scaling to 16-wide SIMD execution for lower-precision formats such as INT8, which allows efficient processing of diverse workloads including texture sampling and compute tasks.
Bifrost GPUs are organized into generations: the first-generation Mali-G71 (announced in 2016), the second-generation Mali-G72 (2017), and the third-generation Mali-G76 (2018), alongside mainstream variants such as the Mali-G52 (2018). Multi-processor configurations of the Mali-G76 scale up to 20 cores. Key improvements include refined texture sampling units that deliver one bilinear-filtered sample per clock in small cores (two in larger cores) and depth texture sampling reduced to a single cycle, improving rendering efficiency for complex scenes. The architecture also supports mixed-precision operations (INT8, INT16, FP16), allowing compute resources to be scaled dynamically to minimize energy use across varying workloads.
Performance in Bifrost scales with core count and clock speed. For instance, the Mali-G52 MC2 achieves ~80–100 GFLOPS of theoretical FP32 performance, while the Mali-G76 offers up to 46% greater graphics processing power than its predecessor and 178% improved energy efficiency, making it suitable for high-end mobile applications.
Bifrost provides full support for Vulkan 1.1 and OpenCL 2.0, enabling advanced graphics rendering and general-purpose computing on GPUs. L2 cache enhancements feature a unified logical cache (configurable from 128 KB to 4 MB across implementations) that reduces partial line writes to memory, improving bandwidth efficiency, particularly with LPDDR4 interfaces. The design is also prepared for machine learning applications through efficient compute shaders and INT8 dot product support in later models such as the G76, boosting ML inference performance by up to 17% over prior generations through optimizations in thread occupancy and register file size (up to 64 32-bit registers per thread). The Mali-G71 appeared in the earliest Bifrost-based devices, while the G76 powers the Huawei Kirin 980 SoC in smartphones such as the Huawei Mate 20.
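The relationship between core count, clock speed, and theoretical FP32 throughput mentioned above can be illustrated with a short sketch. The lane count and clock used here are assumptions chosen for illustration, not figures from this article; real values depend on the specific implementation and SoC.

```python
def peak_fp32_gflops(cores: int, fma_lanes_per_core: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in GFLOPS.

    Each fused multiply-add (FMA) lane is counted as 2 FLOPs per cycle.
    """
    return cores * fma_lanes_per_core * 2 * clock_mhz / 1000.0


# Hypothetical 2-core configuration: 24 FP32 FMA lanes per core at 850 MHz.
# Both numbers are illustrative assumptions, not vendor specifications.
gflops = peak_fp32_gflops(cores=2, fma_lanes_per_core=24, clock_mhz=850)
print(f"{gflops:.1f} GFLOPS")  # prints "81.6 GFLOPS"
```

Under these assumed parameters the result falls inside the ~80–100 GFLOPS range quoted for a two-core part, and doubling either the core count or the clock doubles the theoretical figure. Sustained performance is typically lower because of memory bandwidth and thermal limits.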

Valhall architecture

The Valhall architecture is Arm's fourth-generation GPU design for the Mali series, succeeding Bifrost and emphasizing efficiency through compression techniques and a scalable design. It builds on Bifrost's dual-issue execution model by incorporating shader core compression that achieves up to 2x performance density per unit of silicon area compared to prior generations, fitting more compute resources into the same die space. Larger register files support up to 64 32-bit registers per thread, with full occupancy at 32 registers, to accommodate complex shaders without sacrificing parallelism. Valhall also introduces native support for mesh shading via extensions, allowing developers to generate and cull geometry more efficiently on the GPU.
Valhall evolved across four generations. The first, in 2019, featured the entry-level Mali-G57 and premium Mali-G77, which prioritized power efficiency for mobile devices. The second arrived in 2020 with the Mali-G68 for mainstream applications and the high-end Mali-G78, scalable to 24 cores for demanding workloads. The third generation, announced in 2021, included the ultra-low-power Mali-G310 and mid-range Mali-G610, targeting broader deployment in wearables and IoT. The Mali-G610 MC4 provides approximately 4-5x higher GPU performance than the Mali-G52 MC2 in benchmarks like AnTuTu, attributed to the newer architecture, increased core count, and efficiency improvements, making it suitable for mid-range gaming and multitasking. The fourth and final generation, announced in 2022, delivered the Mali-G715 for general use and the Immortalis-G715 variant with dedicated ray tracing hardware, supporting up to 12 cores in premium configurations. Performance highlights include the Mali-G78 MP24 configuration reaching up to 1.3 TFLOPS of FP32 throughput, underscoring Valhall's suitability for console-quality mobile gaming.
The architecture conforms to Vulkan 1.2 and 1.3, with Immortalis models adding hardware-accelerated ray tracing for realistic lighting and shadows in supported titles. Notable enhancements include improved asynchronous compute, which lets graphics rendering and compute tasks run simultaneously to maximize utilization and reduce latency. Integrated AI tensor accelerators further improve machine learning inference, delivering up to 60% higher ML performance density in initial implementations. Valhall powers SoCs such as the Google Tensor in Pixel devices (Mali-G78) and the Dimensity 9200 (Immortalis-G715), driving immersive experiences in smartphones. As the concluding major iteration before Arm's shift to the fifth-generation GPU architecture, Valhall established compression-driven efficiency as a cornerstone for energy-constrained embedded graphics.

Fifth-generation GPU

The fifth-generation GPU architecture, introduced by Arm in May 2023, is a new design intended to improve rendering, AI workloads, and power efficiency in mobile devices. It offers improved core scaling, supporting configurations from fewer than five cores in entry-level variants to over ten cores in flagship models, with later iterations extending up to 24 cores. The architecture delivers an average 15% peak performance increase and 15% better energy efficiency compared to the prior generation, alongside a 20% uplift in frame rates for complex scenes, and is optimized for advanced 3 nm process nodes to accelerate system-on-chip integration.
Key models based on this architecture include the Immortalis-G720, a 2023 ray-tracing flagship scalable to ten or more cores for high-end smartphones; the Mali-G720 and Mali-G620, mid-range options from 2023 with six to nine cores and up to five cores, respectively, which omit mandatory ray tracing for cost efficiency; the Mali-G725, a 2024 premium variant with six to nine cores emphasizing gaming and AI; and the Mali G1-Ultra, a 2025 flagship with enhancements for AI processing and ray tracing, scaling from ten to 24 cores. As the successor to Valhall, the architecture builds on prior compression techniques while introducing Deferred Vertex Shading (DVS) to handle increased scene complexity more effectively. Performance highlights include up to approximately 2.6 TFLOPS in configurations like the Immortalis-G720 MC12, enabling sustained frame rates in demanding applications. The architecture provides full support for Vulkan 1.3 and enhanced OpenCL implementations, including versions 1.2, 2.1, and 3.0 full profile, to facilitate tasks such as 3D scene reconstruction with up to 25% better efficiency in select workloads.
Central features include expanded hardware ray tracing via a power-gatable ray-tracing unit, whose second-generation version (RTUv2) doubles ray tracing performance in the G1-Ultra for more realistic lighting and reflections, and intelligent workload balancing that reduces CPU load by up to 40% while cutting memory bandwidth usage for lower power consumption. These GPUs power 2025 SoCs, such as those built on Arm's Lumex Compute Subsystem platform, which integrates them with AI-optimized CPU clusters for on-device experiences such as gaming. The architecture also benefits from open-source kernel drivers, with the Panthor DRM driver providing upstream support for fifth-generation models.

Technical details

The Mali GPU architectures use tile-based deferred rendering (TBDR) as a core mechanism to minimize memory bandwidth consumption, which suits power-constrained mobile devices. In TBDR, the rendering pipeline begins with a geometry phase that bins incoming primitives against the screen, which is divided into fixed-size tiles, typically 16×16 pixels in Mali implementations. This binning creates a compact per-tile list that identifies only the primitives relevant to each tile and discards those that do not overlap it, avoiding unnecessary fragment processing across the full framebuffer.
The subsequent fragment processing phase renders each tile entirely within on-chip tile memory, which buffers color, depth, and stencil data locally. Primitives are rasterized, shaded, and blended per tile, with visibility tests (such as early-Z rejection) and overdraw resolution handled without external memory accesses. Only the finalized tile buffer is written back to system memory, once, using techniques like Arm Frame Buffer Compression (AFBC) for further lossless reduction in transfer size. This on-chip rendering eliminates the repeated reads and writes associated with overdraw in traditional immediate-mode rendering.
Bandwidth savings in TBDR arise primarily from handling overdraw locally, and can be approximated as follows. In immediate-mode rendering, required bandwidth scales with the total fragment count: $B_{\text{IMR}} \approx S \times O \times (R + W)$, where $S$ is the screen area in pixels, $O$ is the overdraw factor (average fragments per pixel), $R$ is the bandwidth for reads (e.g., depth/color fetches), and $W$ is the bandwidth for writes. In TBDR, shading occurs per tile, reducing this to $B_{\text{TBDR}} \approx (S / T) \times W_{\text{tile}} + B_{\text{bin}}$, where $T$ is the tile area in pixels (e.g., 256 for 16×16), $W_{\text{tile}}$ is the final write cost of one tile, and $B_{\text{bin}}$ is the binning overhead. With $W_{\text{tile}} \approx T \times W$, the difference is $B_{\text{IMR}} - B_{\text{TBDR}} \approx S \times [O \times (R + W) - W] - B_{\text{bin}}$. Neglecting the binning cost and conservatively charging TBDR one full read-plus-write pass per pixel gives the simpler estimate $\Delta B \approx S \times (O - 1) \times (R + W)$. Smaller tiles refine this by limiting intra-tile overdraw but increase binning granularity. Quantitative impact includes savings of up to 4 GB/s for deferred rendering workloads at 60 FPS, which illustrates TBDR's role in bandwidth efficiency.
Shader core designs across Mali architectures have evolved from vector-oriented processing to hybrid scalar-vector models, improving flexibility for both graphics and compute workloads. The early Utgard and Midgard generations relied on 4-wide vector (vec4) SIMD execution, where instructions processed four components in parallel; this aligned well with graphics shaders but handled divergence in general-purpose code poorly. Midgard unified vertex and fragment shaders into scalable cores with dual-issue pipelines for improved throughput. Bifrost shifted to a scalar ISA with quad-parallel execution, running four independent scalar threads in lockstep per pipeline stage, which raises utilization and eases compilation compared to Midgard's vector constraints. Valhall builds on this scalar foundation, adding vector processing capabilities for compute tasks while keeping scalar efficiency for graphics. Register files expanded significantly, reaching 128 KB per core to support higher thread counts and complex programs without spilling to memory. This evolution prioritizes balanced performance across diverse workloads.
The memory hierarchy in Mali GPUs balances low latency and bandwidth through tiered caching, integrated with system-level coherence. Shader cores include private L1 instruction and data caches (typically 16-32 KB combined) alongside texture caches for filtering operations, enabling fast local access during execution. A unified L2 cache, shared across cores and scalable to 64-128 KB per core in architectures like Bifrost and Valhall, aggregates traffic and applies compression for framebuffer data. System coherence with CPU cores is managed at the L2 boundary via protocols such as ACE (AXI Coherency Extensions), ensuring GPU writes are visible to CPUs and vice versa without involving per-core L1 caches in snoop traffic; this reduces overhead while maintaining data consistency in heterogeneous SoCs. L1 caches operate non-coherently internally, relying on L2 for inter-core and system coherence.
Power management in Mali GPUs emphasizes efficiency through dynamic voltage and frequency scaling (DVFS), which adjusts core clocks and voltages to workload demand, often via platform governors that profile utilization. Idle states power down unused shader cores or the entire GPU during quiescence, minimizing leakage. Efficiency metrics such as GFLOPS/Watt improve across generations: Bifrost achieves roughly 2× better power efficiency than Midgard in fragment-heavy workloads due to reduced overdraw and scalar optimizations, enabling sustained performance within thermal limits. Governors typically scale frequency roughly in proportion to utilization; because dynamic power grows with frequency and with the square of voltage, lowering both reduces power faster than it reduces performance, which favors energy savings in bursty mobile scenarios.
Compute and AI capabilities in Mali GPUs use an OpenCL-based execution model, where kernels define parallel work-items grouped into work-groups, dispatched via NDRanges to shader cores for processing. Each work-item executes as an independent thread with its own register state, scheduled in warps to maximize occupancy; barriers and atomics provide synchronization within work-groups. Later generations, including Valhall and the fifth-generation architecture, extend this with tensor operations such as low-precision matrix multiply-accumulate (e.g., FP16/INT8) directly in the arithmetic pipelines, accelerating AI inference without dedicated tensor cores by fusing operations for neural network layers. This model supports scalable compute throughput, with AI kernels benefiting from vectorized tensor operations.
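The TBDR bandwidth approximation described earlier in this section can be made concrete with a small numerical sketch. The resolution, overdraw factor, and per-pixel byte costs below are assumptions picked for illustration, and the tile write cost is taken as tile area times the per-pixel write cost; none of these values come from Arm documentation.

```python
def imr_bandwidth(pixels, overdraw, read_bytes, write_bytes, fps):
    """B_IMR ~ S * O * (R + W), scaled to bytes per second."""
    return pixels * overdraw * (read_bytes + write_bytes) * fps


def tbdr_bandwidth(pixels, tile_pixels, write_bytes, fps, bin_bytes_per_frame=0):
    """B_TBDR ~ (S / T) * W_tile + B_bin, assuming W_tile = T * W."""
    tiles = pixels / tile_pixels
    w_tile = tile_pixels * write_bytes
    return (tiles * w_tile + bin_bytes_per_frame) * fps


# Illustrative assumptions: 1080p, overdraw of 3, 4-byte reads and writes, 60 FPS,
# 16x16 tiles, and binning overhead ignored.
S, O, R, W, FPS = 1920 * 1080, 3, 4, 4, 60
imr = imr_bandwidth(S, O, R, W, FPS)
tbdr = tbdr_bandwidth(S, tile_pixels=16 * 16, write_bytes=W, fps=FPS)
simple_estimate = S * (O - 1) * (R + W) * FPS  # Delta B ~ S * (O - 1) * (R + W)

print(f"IMR:             {imr / 1e9:.2f} GB/s")              # 2.99 GB/s
print(f"TBDR:            {tbdr / 1e9:.2f} GB/s")             # 0.50 GB/s
print(f"Saving (exact):  {(imr - tbdr) / 1e9:.2f} GB/s")     # 2.49 GB/s
print(f"Saving (simple): {simple_estimate / 1e9:.2f} GB/s")  # 1.99 GB/s
```

The simple estimate is lower than the exact difference because it charges the tiled renderer a read as well as a write per pixel, so it acts as a conservative bound. A non-zero `bin_bytes_per_frame` would narrow the gap further in geometry-heavy scenes.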

Implementations

Mali graphics processors are integrated into system-on-chip (SoC) designs by major vendors, providing graphics acceleration in mobile, embedded, and emerging computing platforms. Samsung's Exynos series frequently incorporates Mali GPUs, with the Exynos 9820 featuring a Mali-G76 MP12 configuration for enhanced gaming performance in flagship devices. MediaTek's Dimensity lineup also widely adopts Mali technology, as seen in the Dimensity 9200 SoC with an Immortalis-G715 MC11 GPU, which supports ray tracing and high-frame-rate rendering for premium smartphones. Google's Tensor SoC in the Pixel series uses a Mali-G78 MP20 GPU, achieving strong graphics benchmark results and reliable performance for everyday mobile tasks and light gaming.
Notable device integrations show Mali's range across consumer products. The Galaxy S10, powered by the Exynos 9820, uses the Mali-G76 MP12 to drive its 6.1-inch display. The OnePlus Nord 3 employs the Dimensity 9000 with a Mali-G710 MC10 GPU, balancing efficiency and power for mid-range gaming and multitasking on its 6.74-inch Fluid AMOLED screen. In the embedded space, Allwinner's A-series SoCs, such as the A33, integrate Mali-400 MP2 GPUs for cost-effective tablets, supporting basic OpenGL ES 2.0 acceleration in budget Android devices. MediaTek's Helio G91 SoC incorporates the Mali-G52 MC2 GPU, delivering theoretical FP32 performance of ~80–100 GFLOPS for entry-level smartphones and tablets. In automotive applications, earlier Renesas R-Car generations, like the R-Car E2, incorporated Mali GPUs for infotainment systems, enabling smooth UI rendering and video playback on vehicle displays. For emerging workloads, the G1-Ultra GPU introduced in 2025 appears in SoCs like the Dimensity 9500, targeting AI-enhanced graphics in flagships such as the Vivo X300 Pro, with up to 33% improved GPU performance over prior generations.
Vendor/SoC | Mali GPU Variant | Notable Devices/Use Cases | Key Performance Context
Samsung Exynos 9820 | Mali-G76 MP12 | Galaxy S10 | Up to 40% graphics performance uplift for gaming
Dimensity 9200 | Immortalis-G715 MC11 | Vivo X90 Pro, Find X6 | 32% boost in Manhattan 3.0 benchmark scores
Google Tensor (Pixel) | Mali-G78 MP20 | Pixel 6 | Strong performance in graphics benchmarks
Dimensity 9000 (OnePlus Nord 3) | Mali-G710 MC10 | OnePlus Nord 3 | Efficient for mid-range emulation and multitasking
Allwinner A33 | Mali-400 MP2 | Budget Android tablets | Basic 1080p UI and video support
Helio G91 | Mali-G52 MC2 | Entry-level smartphones | Theoretical FP32 performance ~80–100 GFLOPS
Dimensity 9500 | Mali G1-Ultra MP12 | Vivo X300 Pro (2025) | 119% ray tracing improvement for AI workloads

Video processors

Mali-V500

The Mali-V500 is Arm's inaugural dedicated video processor, announced in 2013 and made available for integration into system-on-chips (SoCs) starting in mid-2014. Designed for mainstream mobile and embedded devices, it supports H.264 (up to High Profile level 4.1) and VP8 for encode and decode, and H.263, MPEG-4 ASP, MPEG-2, VC-1/WMV, and RealVideo for decoding, covering standard-definition and high-definition content. A single-core configuration delivers up to 1080p at 60 frames per second (fps) for both encode and decode, scaling to 4K@120fps with eight cores, with latency under 10 ms at 1080p30.
The architecture is a scalable fixed-function design, configurable from one to eight cores to balance performance and power, with each core operating at up to 600 MHz via an AMBA bus interface (AXI or a Lite variant). The design emphasizes energy efficiency, reducing overall system bandwidth by over 50% through integration with Arm Frame Buffer Compression (AFBC), which enables lossless frame storage and minimizes memory access during video processing. The processor includes a memory management unit (MMU) for virtual addressing and supports TrustZone for secure content handling, ensuring protected video paths in multi-tenant environments. It is optimized for low-cost dynamic random-access memory (DRAM) types, further lowering power draw in entry-level SoCs.
Multi-core variants handle 1 to 4 simultaneous streams, supporting multi-view or multi-party video use cases without excessive power overhead. The Mali-V500 integrates directly with Arm's Mali GPU lineup, such as the Midgard-based Mali-T622 and Mali-T720, allowing shared resources for compositing and post-processing in unified pipelines. While effective for basic high-definition processing, the Mali-V500 is limited to legacy formats and lacks support for newer codecs such as HEVC, positioning it as a foundational solution for cost-sensitive designs.
It precedes the more advanced V-series processors and established Arm's approach to dedicated video processing.
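The quoted single-core and eight-core figures are consistent with throughput that scales linearly with pixel rate, which the following sketch illustrates. Linear scaling is an assumption made here for illustration; real configurations are also bounded by memory bandwidth and codec complexity, and the helper names are hypothetical.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels processed per second for a given resolution and frame rate."""
    return width * height * fps


# Per-core capability taken from the text above: 1080p at 60 fps.
PER_CORE_RATE = pixel_rate(1920, 1080, 60)


def cores_needed(width: int, height: int, fps: int, max_cores: int = 8) -> int:
    """Smallest core count covering the target, assuming linear scaling."""
    target = pixel_rate(width, height, fps)
    cores = -(-target // PER_CORE_RATE)  # integer ceiling division
    if cores > max_cores:
        raise ValueError("target exceeds the largest configuration")
    return cores


print(cores_needed(1920, 1080, 60))   # 1
print(cores_needed(3840, 2160, 60))   # 4
print(cores_needed(3840, 2160, 120))  # 8
```

4K at 120 fps carries exactly eight times the pixel rate of 1080p at 60 fps, which matches the eight-core maximum stated for this processor.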

Mali-V550

The Mali-V550 is a scalable video processor IP core developed by Arm, introduced in October 2014 as part of the company's Mali multimedia suite, with a primary focus on hardware-accelerated HEVC (H.265) encoding for efficient high-resolution video capture in mobile and embedded devices. It is the first video processor to integrate HEVC encoding and decoding in a single core, supporting up to 1080p60 HEVC encode/decode on one core and scaling to 4K@120fps with an eight-core configuration, which makes it suitable for premium smartphones and set-top boxes requiring 4K video output. As an evolution of the Mali-V500, the V550 adds HEVC hardware while maintaining backward compatibility for multi-standard video processing.
The architecture centers on a multi-core hardware encode pipeline, configurable from one to eight cores, which handles the encoding stages, including rate control, and is optimized for HEVC Main Profile at 8- and 10-bit depths. The design supports time-multiplexed multi-stream encoding, allowing up to eight simultaneous 720p streams or mixed resolutions with different codecs such as H.264 and HEVC, reducing the need for multiple dedicated engines in SoC designs. Arm Frame Buffer Compression (AFBC) reduces memory bandwidth by up to 60% during encoding, improving power efficiency without quality loss, particularly for external display scenarios like wireless streaming.
Key specifications include a low-latency mode that hides memory access delays to prevent frame drops, suited to real-time applications such as video calls. The processor has been integrated into SoCs like the Amlogic S912, an octa-core Cortex-A53 design used in 4K TV boxes, where it enables hardware HEVC encoding for efficient media processing.
Compared to software-based encoding on general-purpose CPUs, the Mali-V550 is significantly more efficient, with up to 50% lower power consumption for equivalent bitrates, because compute-intensive tasks are offloaded to dedicated hardware. This extends battery life in mobile devices while supporting higher-quality output.

Mali-V61

The Mali-V61 is a versatile video processor developed by Arm, announced on October 31, 2016, and designed for integration into mainstream mobile and embedded systems starting in 2017. It combines support for H.265 (HEVC) Main10 Profile decoding and encoding with VP9 Profile 2 decoding, handling both 8-bit and 10-bit color depths for multi-format video processing up to 4K UHD resolution at 60 frames per second. This unified approach enables efficient handling of high-definition content for applications like streaming and video conferencing, while maintaining backward compatibility with earlier formats such as H.264. Building on the encode-focused Mali-V550, the V61 adds VP9 decode capabilities to support emerging web video standards.
Its architecture employs a unified pipeline that processes both decoding and encoding tasks, allowing flexible resource allocation across up to 16 simultaneous decode streams or 8 encode streams. This design optimizes throughput for scenarios involving multiple video feeds, such as live broadcasting or multi-view playback, while using Arm Frame Buffer Compression (AFBC) v1.2 to reduce memory bandwidth. The processor scales from 1 to 8 cores, with configurations ranging from single-core 1080p@60fps operation to multi-core 4K@120fps decoding.
Key specifications include native support for enhanced dynamic range in 4K content, ensuring compatibility with high-fidelity displays without additional processing overhead. Its power-efficient architecture, optimized for 28 nm and more advanced nodes, minimizes energy consumption in battery-constrained environments, making it suitable for IoT applications requiring scalable video handling at resolutions up to 4K.

Mali-V52

The Mali-V52 is a video processing unit (VPU) developed by Arm and announced in March 2018 as part of the company's Mali Multimedia Suite targeting mainstream devices. It is a decode-centric IP core with H.264 encoding support, optimized for efficient hardware decoding and encoding of high-resolution video streams. Supported codecs include H.265/HEVC (up to 10-bit) and VP9 for decode, and H.264/AVC (High 10 Profile, Levels 5.0/5.1) for encode and decode. The design enables smooth playback of 4K content at 60 frames per second in single-core configurations for decode, scaling to 4K at 120 fps decode or 4K at 60 fps encode with up to four cores, making it suitable for premium video experiences in resource-constrained environments.
Architecturally, the Mali-V52 features a streamlined core that achieves efficiency gains through refinements that double decoding throughput compared to the prior Mali-V61 while reducing silicon area by 38%. The core is scalable from one to four instances, giving SoC designers flexibility for varying performance needs, and incorporates optimizations for YUV420 color handling to minimize overhead. Like the Mali-V61, it prioritizes efficiency, with targeted improvements for mid-range scalability.
The compact design enables low bandwidth utilization and power consumption suited to battery-powered devices. For instance, a single core can handle 4K at 30 fps encode or 60 fps decode, or 1080p at 120 fps, supporting multi-stream scenarios in mainstream applications without excessive memory demands. This focus on area and power efficiency, achieved through refined heuristics and reduced external memory accesses, positions the Mali-V52 as a cost-effective option for SoC designers adding advanced video capabilities to mid-tier hardware.
In practical use, the Mali-V52 suits streaming services on budget smartphones and entry-level tablets, where it provides high-quality 4K video playback for streaming apps while conserving system resources for other tasks. Its deployment in mainstream SoCs supports HDR content decoding, improving visual fidelity in affordable devices without compromising thermal or energy budgets.

Mali-V76

The Mali-V76 is a video processing unit (VPU) from Arm, announced on May 31, 2018, as part of a premium IP suite targeting high-end mobile devices, set-top boxes, and other products requiring advanced video processing. It builds on prior generations by doubling decode performance while reducing silicon area by up to 40% for equivalent tasks, enabling efficient handling of ultra-high-definition content in power-constrained environments. The processor supports H.265 (HEVC) for both decoding and encoding, as well as VP9 decoding, with support for 10-bit content.
The architecture employs a scalable multi-core design configurable from 2 to 8 cores, allowing SoC designers to optimize for varying performance needs and power budgets. Its decode and encode engines incorporate optimizations for high-resolution video, including support for high dynamic range formats such as hybrid log-gamma (HLG), which improve color accuracy and contrast on displays. The unit also supports multi-view video applications through simultaneous stream processing, for example video walls or multi-screen setups. Operating at frequencies up to 800 MHz, it delivers significant efficiency gains over predecessors like the Mali-V61, with reduced power consumption for sustained high-frame-rate operation.
Key specifications include 8K decoding at up to 60 frames per second in a single stream or four 4K streams at 60 fps, alongside support for up to 16 concurrent full HD (1080p) streams. Encoding performance includes H.265 at up to 8K at 30 fps or 4K at 120 fps, suitable for premium smartphones and similar applications. These features position the V76 for emerging 8K ecosystems, with implementations appearing in high-end SoCs for devices such as smart TVs and tablets. Overall, it advances video processor efficiency, enabling broader adoption of 8K content without compromising battery life or thermal limits.

Comparison of video processors

The Mali video processors show a clear progression in capabilities, beginning with the V500's focus on efficient H.264 and VP8 processing for HD content and advancing to the V76's support for high-resolution, multi-format decoding and encoding in premium mobile devices. This evolution reflects Arm's emphasis on scaling performance for diverse SoC requirements while maintaining low power consumption suitable for battery-powered systems. Key advancements include expanded codec support, higher resolutions, and optimized multi-stream handling for features like simultaneous video playback and recording.
Model | Decode Formats | Encode Formats
V500 | H.264, VP8, H.263, MPEG-4 ASP, MPEG-2, VC-1/WMV, RealVideo | H.264, VP8
V550 | H.264, HEVC | H.264, HEVC
V61 | H.264, HEVC, VP9 | H.264, HEVC
V52 | H.264, HEVC, VP9 | H.264
V76 | H.264, HEVC, VP9 | H.264, HEVC
Resolution and frame rate capabilities have advanced significantly across generations. The V500 supports up to 1080p at 60 fps for both decode and encode on a single core, scaling to 4K at 120 fps with eight cores, targeting mid-range devices of the mid-2010s. The V550 maintained similar scaling but added HEVC for 4K at 120 fps decode/encode on eight cores. The V61 extended this to 4K at 120 fps decode with 10-bit and VP9 support, suitable for immersive VR and streaming. The V52, optimized for mainstream SoCs, achieves 4K at 60 fps decode and 60 fps encode across 1-4 cores, doubling decode performance relative to the V61 while reducing area. The V76 marked a leap to 8K at 60 fps decode or four concurrent 4K streams at 60 fps on eight cores, with 8K at 30 fps encode, addressing emerging ultra-HD demands.
Power efficiency has improved consistently, with each generation reducing area and increasing throughput to extend battery life in mobile SoCs. For instance, the V52 achieves double the decode performance of the V61 in 38% less area, improving performance per watt for 4K workloads. The V76 further optimizes for 8K processing, delivering up to 4K at 120 fps decode on just four cores compared to eight in prior models, reflecting architectural refinements that lower power draw by approximately 20-30% for equivalent tasks. Later iterations continue this trajectory, prioritizing energy-efficient multi-stream operation for always-on video features in high-end devices, though specific efficiency metrics vary by process node and configuration.
Integration factors across Mali-V models include scalable core counts (1-8) for multi-stream support, enabling simultaneous handling of multiple video pipelines, such as one 4K decode and two encodes in higher-end variants like the V76.
All models are compatible with Android's MediaCodec API, which provides hardware-accelerated video to Android media frameworks, and they interoperate with TrustZone for secure content paths. This design allows SoC vendors to configure stream counts based on bandwidth needs, with models from the V61 onward supporting up to 10-bit color depths for HDR workflows. When selecting a video processor, SoC designers should consider the target device class and feature set: the V52 suits mainstream devices requiring cost-effective 4K60 decode and encode without excessive area, such as smartphones balancing power and performance. In contrast, the V76 is preferable for SoCs demanding 8K60 multi-stream capabilities, such as advanced video walls or premium streaming, where its efficiency gains justify the integration complexity.

Display processors

Mali-D71

The Mali-D71 is a display processor developed by Arm, announced on November 1, 2017, as the first implementation of the company's Komeda architecture for advanced mobile and embedded display handling. Designed primarily for high-resolution output in power-constrained environments, it can drive up to two independent displays simultaneously, with support for 4K (3840×2160) resolution at 60 frames per second per display in dual mode or a single 4K display at up to 120 Hz for latency-sensitive applications like virtual reality.
The core architecture revolves around a modular compositor with two configurable pipelines. These can be allocated either to dual-display operation, where each pipeline drives a separate output, or to single-display mode, where their resources are combined to handle greater complexity, such as up to eight simultaneous Android composition layers. The pipelines incorporate stages for layer blending, scaling, rotation, and post-processing, integrated with Arm Framebuffer Compression (AFBC) 1.2 to optimize memory bandwidth and reduce power consumption. The processor pairs with the CoreLink MMU-600 for efficient 4KB-paged address translation, ensuring real-time performance in scenarios requiring low latency.
Key specifications emphasize compatibility with major display interfaces, including MIPI DSI for mobile panels and external display connections, making it suitable for smartphones, tablets, and VR headsets. Power efficiency is a hallmark: by offloading composition tasks from the GPU, the Mali-D71 achieves up to 30% overall system power savings in complex UI scenarios compared to GPU-based rendering. It supports HDR output natively through integration with Assertive Display 5, which handles HDR content conversion and enhancement even on standard dynamic range (SDR) displays. Additional features include color management for accurate color reproduction and dithering to minimize banding artifacts in gradients.
The Mali-D71 complements Arm's Mali GPU families by managing the final display pipeline stages, such as multi-layer blending and output formatting, freeing GPU resources for rendering and improving overall system responsiveness in multi-window environments.
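The two operating modes described above carry the same total pixel rate, which a short calculation makes visible. The sketch assumes uncompressed 32-bit pixels and a single full-screen layer per display; actual memory traffic depends on layer count and on AFBC compression, so the byte figures are illustrative only.

```python
def scanout_rate(width: int, height: int, refresh_hz: int, bytes_per_pixel: int = 4) -> int:
    """Uncompressed scan-out traffic in bytes per second for one full-screen layer."""
    return width * height * refresh_hz * bytes_per_pixel


# Dual-display mode: two 4K panels at 60 Hz, one pipeline each.
dual_4k60 = 2 * scanout_rate(3840, 2160, 60)
# Single-display mode: one 4K panel at 120 Hz using both pipelines.
single_4k120 = scanout_rate(3840, 2160, 120)

print(dual_4k60 == single_4k120)          # True
print(f"{single_4k120 / 1e9:.2f} GB/s")   # 3.98 GB/s
```

Under these assumptions, combining both pipelines on one display redirects the same aggregate throughput rather than adding to it, which fits the description of the pipelines as a shared, reconfigurable resource.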

Mali-D51

The Mali-D51 is a mainstream display processor developed by Arm, announced on March 6, 2018, and based on the Komeda architecture. It drives displays at 60 Hz with up to eight composition layers, bringing premium features such as HDR support via Assertive Display 5 to mid-range devices. Compared to the previous Mali-DP650, it offers 30% system power savings along with further efficiency improvements, allowing it to handle complex UIs while maintaining low power consumption.

Mali-D77

The Arm Mali-D77 is a premium display processing unit (DPU) introduced in May 2019, designed primarily to improve virtual reality (VR) experiences in head-mounted displays (HMDs) and premium mobile devices by handling high-resolution, low-latency composition and offloading rendering work from the GPU. It builds on the Komeda architecture of prior models, supporting up to four stereo VR layers with optimizations for resolutions such as 3K at 120 frames per second (fps) or 4K at 90 fps, which helps reduce motion sickness through smoother frame delivery. This is an evolution of the Mali-D71's capability for dual 4K displays at 60 Hz or a single 4K display at 120 Hz, adding dedicated VR acceleration to improve overall system efficiency.
Architecturally, the Mali-D77 features fixed-function hardware blocks that perform VR-specific tasks: Asynchronous Timewarp (ATW), which reprojects frames to maintain high refresh rates despite GPU bottlenecks; Lens Distortion Correction (LDC), which compensates for optical distortions in HMDs; and Chromatic Aberration Correction (CAC), which reduces color fringing. These blocks, integrated into the Komeda compositor, allow multi-layer composition with high dynamic range (HDR) support on 4K displays, enabling pixel densities exceeding 1000 pixels per inch (ppi) in collaboration with display driver vendors. The design also achieves up to 40% savings in system bandwidth and 12% in power consumption for VR workloads by offloading compute-intensive operations from the GPU.
Key specifications emphasize scalability for untethered VR devices, supporting transitions from HMDs to standard premium mobile screens while preserving image quality. When paired with Arm's MMU-600 memory management unit and Assertive Display 5 engine, it handles high-frame-rate content without compromising battery life or thermal performance.
The Mali-D77's focus on VR acceleration positions it as a foundational IP for immersive applications, prioritizing low-latency rendering over general-purpose display tasks.

Image signal processors

Mali-C71

The Mali-C71 is Arm's inaugural image signal processor (ISP), announced on April 25, 2017, and designed specifically for advanced driver-assistance systems (ADAS) in automotive applications. It addresses challenges in processing images from multiple cameras under varying lighting and weather conditions, enabling features like 360-degree surround views and producing output for both human display and computer vision pipelines. Built following Arm's acquisition of Apical, the processor integrates over 300 dedicated fault detection circuits to support high-reliability standards, marking a shift toward integrated solutions for smart vehicles.

Architecturally, the Mali-C71 employs a multi-input design capable of handling up to four real-time camera streams or sixteen streams read from memory, allowing simultaneous processing from diverse sensor types, including monochrome sensors and sensors with flexible color filter arrays (CFAs). It features a modular, block-based design that includes advanced noise reduction modules, such as 2D spatial filtering and per-exposure temporal profiling, along with chromatic aberration correction and high dynamic range (HDR) fusion that merges exposures covering up to 24 stops of dynamic range. This enables ultra-wide dynamic range imaging, far exceeding typical smartphone ISPs, to capture details in extreme contrasts such as direct sunlight and shadows. The processor outputs processed data in formats suitable for display or further analysis, with optimizations for low latency and reversible transforms that preserve raw data integrity for computer vision tasks.

Key specifications include a throughput of 1.2 gigapixels per second, supporting resolutions adequate for automotive cameras such as full HD at high frame rates, while prioritizing efficiency in power-constrained embedded systems. It processes raw data through debayering and subsequent stages, with built-in support for region-of-interest cropping and planar histograms to accelerate downstream algorithms. The Mali-C71 has been integrated into automotive system-on-chips (SoCs), distinguishing it as a foundational technology for evolving autonomous driving capabilities.
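
To put the 24-stop figure in perspective, the short sketch below converts photographic stops into a contrast ratio and into decibels. It assumes the common conventions that one stop is a doubling of light and that sensor dynamic range in decibels is 20·log10 of the ratio; the function names are illustrative and not part of any Arm tool.

```python
import math


def stops_to_contrast_ratio(stops: float) -> float:
    """One stop doubles the light, so N stops span a ratio of 2**N to 1."""
    return 2.0 ** stops


def stops_to_decibels(stops: float) -> float:
    """Assumed convention: dynamic range in dB = 20 * log10(contrast ratio)."""
    return 20.0 * math.log10(stops_to_contrast_ratio(stops))


if __name__ == "__main__":
    ratio = stops_to_contrast_ratio(24)
    print(f"24 stops is about {ratio:,.0f}:1, or {stops_to_decibels(24):.1f} dB")
```

Under these assumptions, 24 stops corresponds to a brightest-to-darkest ratio of roughly 16.8 million to one.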

Mali-C52 and Mali-C32

The Mali-C52 and Mali-C32 image signal processors (ISPs) were announced on January 3, 2019, as mid-range and entry-level solutions for embedded vision applications such as cameras, drones, and smart home devices. The Mali-C52 targets balanced camera systems with support for up to four independent camera inputs at a maximum resolution of 4608 × 3456 pixels (approximately 16 megapixels per input), enabling real-time processing of 4K video at 60 frames per second. In contrast, the Mali-C32 is area-optimized for low-power, cost-sensitive entry-level devices, maintaining similar input capabilities in a more compact implementation suitable for basic 16-megapixel imaging. Both provide a complete solution including hardware IP, software drivers, 3A libraries for auto-exposure, auto-white balance, and auto-focus, along with calibration and tuning tools.

These ISPs employ a scalable, block-based design with multi-context support that applies over 25 processing steps per pixel to raw sensor data in RGGB or RGBIr formats and supports multi-channel output. The Mali-C52 offers configurable modes optimized for either superior image quality or reduced area, with a peak throughput of 600 megapixels per second to handle demanding real-time workloads. The Mali-C32 prioritizes efficiency in the same pipeline, delivering comparable performance in a smaller footprint for resource-constrained systems.

Key features focus on essential image enhancement for both human and computer vision. These include basic high dynamic range (HDR) processing via Arm's Iridix technology, which applies local tone mapping to preserve details in shadows and highlights without overexposure. Additional capabilities include advanced noise reduction to minimize artifacts in low-light conditions and color management for accurate color reproduction, alongside lens correction through geometric compensation integrated into the processing flow. These elements enable high-quality output for applications that require reliable imaging without the advanced capabilities of higher-end models like the Mali-C71.
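
The resolution and throughput figures above can be cross-checked with simple arithmetic. The sketch below is only a back-of-the-envelope check: it assumes "4K" means 3840 × 2160 (UHD) and ignores blanking intervals and any per-frame overhead, and the helper names are made up for illustration.

```python
def megapixels(width: int, height: int) -> float:
    """Pixels in one frame, in millions."""
    return width * height / 1e6


def pixel_rate_mpps(width: int, height: int, fps: float) -> float:
    """Sustained pixel rate in megapixels per second for one stream."""
    return width * height * fps / 1e6


if __name__ == "__main__":
    # Maximum frame size quoted for the Mali-C52 and Mali-C32.
    print(f"4608 x 3456 = {megapixels(4608, 3456):.2f} MP")
    # Assumed UHD interpretation of "4K" at 60 frames per second.
    rate = pixel_rate_mpps(3840, 2160, 60)
    print(f"4K at 60 fps needs about {rate:.1f} MP/s of the 600 MP/s peak")
```

Under these assumptions a single 4K stream at 60 fps needs just under 500 megapixels per second, which fits within the quoted 600 megapixels per second peak.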

Mali-C71AE

The Mali-C71AE is an image signal processor (ISP) developed by Arm for automotive and industrial applications, particularly advanced driver-assistance systems (ADAS). Announced in September 2020, it builds on the architecture of the Mali-C71 and adds enhancements for functional safety and reliability in harsh environments. It processes multiple camera streams to enable features like surround-view systems and night-vision enhancement, delivering up to 1.2 gigapixels per second of throughput.

Designed with automotive-grade ruggedization, the Mali-C71AE operates reliably in the extreme conditions typical of vehicle and industrial settings, emphasizing fault tolerance and diagnostic coverage. It meets ISO 26262 ASIL B for random hardware faults and ASIL D for systematic failures, alongside IEC 61508 SIL 3, through over 400 built-in fault-detection circuits, cyclic redundancy checks (CRC), and built-in self-test (BIST) mechanisms. The architecture includes dedicated pipelines for simultaneous human-visible output (for displays) and computer-vision processing (for ADAS), supporting up to four real-time camera inputs at resolutions up to 4096 × 2560 pixels or 16 virtual streams from memory. This multi-camera capability handles diverse sensor types, such as RGGB, RCCC, and RGBIr, with 4:1 high dynamic range (HDR) exposure fusion for twice the dynamic range of a single-exposure sensor.

Key features focus on enhancing image quality and safety for ADAS applications, including advanced 2D noise reduction via Sinter technology, chromatic aberration correction, and per-exposure noise profiling for low-light conditions. The ISP enables multi-camera stitching for 360-degree views and region-of-interest cropping, while tagging suspect pixels and providing reversible transforms to maintain data integrity for downstream AI processing. It integrates with Arm's Automotive Enhanced (AE) ecosystem, such as the Cortex-A78AE CPU and Mali-G78AE GPU, and has been adopted in system-on-chips (SoCs) for production monitoring, quality control, and all-around vehicle awareness.
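
The cyclic redundancy checks mentioned above work on a general principle: a checksum stored alongside the data exposes corruption when it is recomputed later. The sketch below illustrates that principle in software with Python's `zlib.crc32`. It is an assumption-laden illustration only: the Mali-C71AE's actual CRC polynomial, coverage, and placement are not described in this article, and the function names are hypothetical.

```python
import zlib


def frame_checksum(pixels: bytes) -> int:
    """CRC-32 of a buffer (zlib's polynomial, chosen here only for illustration)."""
    return zlib.crc32(pixels)


def is_intact(pixels: bytes, expected_crc: int) -> bool:
    """True if the buffer still matches the checksum recorded earlier."""
    return zlib.crc32(pixels) == expected_crc


if __name__ == "__main__":
    frame = bytes(range(256)) * 4          # stand-in for a line of pixel data
    reference = frame_checksum(frame)

    corrupted = bytearray(frame)
    corrupted[100] ^= 0x01                 # simulate a single-bit fault

    print("original intact: ", is_intact(frame, reference))
    print("corrupted intact:", is_intact(bytes(corrupted), reference))
```

A CRC of this kind always detects a single flipped bit, which is why such checks suit diagnostic coverage of data paths.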

Mali-C55

The Mali-C55 is an image signal processor (ISP) developed by Arm and released on June 8, 2022, designed specifically for efficient image processing in IoT and embedded vision systems. It supports up to eight simultaneous camera inputs, enabling multi-sensor setups for applications such as smart cameras and drones, and handles resolutions up to 8K with a maximum image size of 48 megapixels. The processor emphasizes high dynamic range (HDR) capabilities for cameras, including 2:1 HDR stitching, digital overlap (DOL), and dual-pixel HDR to capture details across varying lighting conditions.

Architecturally, the Mali-C55 features a compact, configurable design optimized for low power consumption and minimal area, achieving half the footprint of its predecessor, the Mali-C52, which makes it suitable for battery-powered embedded devices. It delivers a throughput of up to 1.2 gigapixels per second and accepts 14-bit RAW input for high-fidelity processing. Key enhancements include multi-exposure fusion via HDR sensor support, advanced noise reduction with Temper temporal and Sinter 2.6 spatial algorithms (reducing memory bandwidth by up to 50% compared to prior generations), and improved Iridix local tone mapping for natural image rendering in challenging environments.

The Mali-C55 is widely adopted in smart home devices, such as security cameras and hubs, where it enables real-time image enhancement for endpoint vision tasks. It integrates edge AI processing through a dedicated output pipe to accelerators, allowing on-device inference without cloud dependency. This combination of efficiency and configurability positions the Mali-C55 as a mid-range complement to the Mali-C52, targeting cost-sensitive IoT deployments.
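
The idea behind merging two exposures into one HDR frame can be sketched in a few lines. The example below is a deliberately simplified illustration, not the Mali-C55's actual algorithm: it assumes 12-bit sensor values, a known exposure ratio between the long and short captures, and a hard switch at the saturation level where real pipelines typically blend smoothly.

```python
def fuse_two_exposures(long_exp, short_exp, exposure_ratio, saturation=4095):
    """Merge a long and a short exposure into one linear HDR signal.

    Unsaturated long-exposure pixels are kept because they have less noise.
    Where the long exposure has clipped, the short exposure is scaled by the
    exposure ratio to land on the same linear scale.
    """
    fused = []
    for long_px, short_px in zip(long_exp, short_exp):
        if long_px < saturation:
            fused.append(float(long_px))
        else:
            fused.append(float(short_px) * exposure_ratio)
    return fused


if __name__ == "__main__":
    long_frame = [100, 2000, 4095, 4095]    # last two pixels are clipped
    short_frame = [6, 125, 300, 1000]       # captured with 1/16 the exposure
    print(fuse_two_exposures(long_frame, short_frame, exposure_ratio=16))
```

The fused values can exceed the original 12-bit range, which is why HDR pipelines usually follow fusion with companding or tone mapping.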

Comparison of image signal processors

The Arm Mali image signal processors (ISPs) have evolved to address diverse applications, with throughput ranging from 0.6 gigapixels per second (GP/s) in entry-level models to 1.2 GP/s in advanced configurations. Entry-level models like the Mali-C32 prioritize low-power operation for cost-sensitive IoT devices, while later variants such as the Mali-C55 and Mali-C71AE add broader multi-camera support and enhanced HDR handling for more demanding consumer and automotive scenarios. This progression reflects a shift toward higher efficiency and integration with machine learning pipelines, particularly after 2020, when ISPs began providing direct feeds to AI accelerators for real-time tasks.
Model | Throughput (GP/s) | Max inputs/streams | Primary use cases
Mali-C32 | 0.6 | Up to 4 independent camera sources | Low-power IoT, entry-level embedded vision
Mali-C52 | 0.6 | Up to 4 independent camera sources, dual outputs | Consumer cameras, drones, action cams with HDR needs
Mali-C55 | 1.2 | 8 separate inputs | Battery-powered IoT, smart cameras, edge ML integration
Mali-C71AE | 1.2 | 4 real-time inputs or 16 streams from memory | Automotive ADAS, industrial multi-camera systems
Power efficiency varies by model and application. The Mali-C32 is optimized for minimal area and energy in resource-constrained IoT environments and occupies a smaller footprint than the higher-throughput variants. The Mali-C55 balances high performance with low power for battery-operated devices, supporting up to 8K resolutions without excessive drain, while the Mali-C71AE targets automotive and industrial use cases where functional safety and sustained operation under varying conditions (e.g., weather, lighting) demand robust, efficient processing. Consumer-oriented models like the Mali-C52 emphasize image quality over extreme power savings for portable devices such as drones.

All Mali-C models support RAW formats from 10 to 14 bits per channel, enabling flexible sensor integration, alongside HDR variants such as 4:1 stitching for enhanced dynamic range in challenging lighting. Outputs include RGB and RAW, with companded bit depths up to 12 bits for display and further processing.

Post-2020 developments, including the Mali-C55, have trended toward greater AI integration by providing downscaled outputs optimized for accelerators, improving on-device inference for vision tasks in IoT and automotive systems. This evolution supports seamless handoff from human vision pipelines to AI-driven analysis, reducing latency.
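
As a rough illustration of how the comparison figures might be used, the sketch below encodes the table's throughput and real-time input counts and filters the models that could serve a given camera setup. It rests on simplifying assumptions that are not from Arm documentation: the aggregate pixel rate is treated as the only constraint besides input count, and blanking, memory bandwidth, and HDR multi-exposure overhead are ignored. The `IspSpec` type and `candidates` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IspSpec:
    name: str
    throughput_gpps: float   # gigapixels per second, from the comparison table
    max_live_inputs: int     # real-time camera inputs, from the comparison table


ISPS = [
    IspSpec("Mali-C32", 0.6, 4),
    IspSpec("Mali-C52", 0.6, 4),
    IspSpec("Mali-C55", 1.2, 8),
    IspSpec("Mali-C71AE", 1.2, 4),
]


def candidates(cameras: int, width: int, height: int, fps: float) -> list:
    """Names of ISPs whose input count and throughput cover the requested setup."""
    required_gpps = cameras * width * height * fps / 1e9
    return [
        isp.name
        for isp in ISPS
        if isp.max_live_inputs >= cameras and isp.throughput_gpps >= required_gpps
    ]


if __name__ == "__main__":
    print(candidates(4, 1920, 1080, 30))   # four full HD cameras at 30 fps
    print(candidates(6, 1920, 1080, 60))   # six full HD cameras at 60 fps
```

With these numbers, four full HD cameras at 30 fps fit every listed model, while six full HD cameras at 60 fps exceed both the input count and the throughput of all but the Mali-C55.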

Open-source drivers

Lima

Lima is an open-source, reverse-engineered graphics driver for Arm's Mali Utgard architecture GPUs, including the Mali-400 and Mali-450 series. Developed as a community effort within the Mesa 3D graphics library, it uses the Gallium3D driver framework to support these embedded GPUs. The project was initiated by Luc Verhaegen in 2012 and upstreamed into Mesa 19.1 in 2019, marking a significant milestone for open-source Mali compatibility.

The driver enables 3D acceleration through reverse engineering of the proprietary hardware, replacing binary blobs with verifiable source code. It supports OpenGL ES 2.0 with a 97% pass rate on Khronos conformance tests, alongside partial implementations of OpenGL 2.1 and OpenGL ES 1.1. These features target basic 2D and 3D rendering workloads suited to the non-unified shader model of Utgard GPUs, which uses separate vertex and pixel processors.

In Linux environments, Lima has reached a mature state for 2D and 3D operations and works with display drivers such as sun4i-drm for Allwinner SoCs, as well as with the display drivers for Rockchip platforms. It is commonly deployed on single-board computers such as Olimex boards and Armbian-supported devices with Allwinner A10/A20 or H3 processors, offering an open alternative to proprietary drivers in Raspberry Pi-like ecosystems. Development now emphasizes bug fixes and broader application compatibility rather than major new features.

Due to the hardware constraints of the Utgard architecture, Lima does not support compute shaders, OpenGL ES 3.x or higher, desktop OpenGL 3.x, Vulkan, or OpenCL, which limits it to legacy APIs. Fragment shaders are restricted to FP16 precision, in line with the GPU's original design for mobile and embedded use cases.

Panfrost

Panfrost is an open-source driver developed for Arm Mali GPUs based on the Midgard and Bifrost microarchitectures, including the T600 series and the G31 through G76 models. Initiated in 2018 as a reverse-engineered implementation built on the Gallium3D framework within the Mesa 3D graphics library, it aims to deliver conformant support for modern APIs without relying on proprietary binaries. By 2022, Panfrost provided full support for OpenGL ES 3.1 on these architectures, enabling robust 3D rendering and compatibility with applications targeting embedded systems.

Key features of Panfrost include support for unified shaders, which allow flexible execution of vertex, fragment, and compute workloads on the same hardware units, along with compute shader capabilities for general-purpose GPU computing tasks. These elements enable efficient handling of complex shaders and parallel processing, essential for games and graphical applications. The driver has been integrated into Mesa versions 20 and later, which has led to widespread adoption in open-source distributions and hardware acceleration for desktop environments on compatible devices.

Panfrost is considered production-ready for both Android and Linux environments, powering smooth graphics performance in real-world scenarios such as video playback, UI rendering, and light gaming. For instance, it delivers reliable acceleration on the RK3399 system-on-chip, which integrates a Mali-T860 GPU, enabling Wayland and application rendering without proprietary drivers.

Development of Panfrost began as a community-led effort, with an initial focus on reverse-engineering binaries and kernel interfaces. Following Arm's official endorsement, the company began contributing code and documentation, accelerating progress toward API conformance and performance optimizations while maintaining the project's open-source ethos. As the successor to the Lima driver, Panfrost extends open-source support to architectures with unified shaders.

Panthor

Panthor is an open-source kernel driver developed for Arm Mali GPUs that use the Command Stream Frontend (CSF) architecture. Coverage begins with third-generation Valhall models such as the Mali-G610 and extends to other third-generation Valhall GPUs like the Mali-G310, G510, and G710, as well as fifth-generation architectures including the Immortalis series (such as the G720) and the Mali-G1 series. Development on Panthor was publicly announced in late 2023 by engineers at Collabora, with initial patches focusing on upstream integration into the Linux kernel's Direct Rendering Manager (DRM) subsystem. It builds upon the userspace components of the Panfrost driver to provide a unified model for modern Mali hardware.

Key features of Panthor include support for advanced graphics capabilities such as ray tracing on Immortalis GPUs and asynchronous compute operations, enabling efficient parallel workload execution on supported hardware. The driver is designed to match the functionality of Arm's own open-sourced kernel components for CSF-based GPUs, ensuring compatibility with upstream blobs while promoting adoption of a fully open-source stack. In conjunction with the Mesa userspace libraries, particularly the PanVK driver, Panthor supports modern Vulkan features such as dynamic rendering and enhanced synchronization. As of 2025, this combination is conformant to Vulkan 1.2 on the Mali-G610, with Vulkan 1.3 and 1.4 support implemented and nearing full conformance.

Panthor was merged into the Linux kernel as part of version 6.10, released in July 2024, initially targeting third-generation Valhall GPUs and select devices with compatible hardware, such as devices in the Pixel series after the Pixel 6 that feature the Mali-G715. Subsequent enhancements in Linux 6.18 expand support to additional Valhall GPUs such as the Mali-G310, G510, and G710. Further support for fifth-generation and Immortalis GPUs, including the Mali-G1 series, was added in late 2025 kernel versions.

Advancements in Panthor emphasize enhanced power management, with future iterations incorporating standalone Dynamic Voltage and Frequency Scaling (DVFS) for CSF-based GPUs to optimize energy efficiency under varying workloads. It also supports AI workloads via compute shaders on Arm's shader cores, enabling inference and other parallel processing tasks on Mali hardware without proprietary dependencies.
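
On a Linux system, one way to see which of these kernel drivers is bound to the GPU is to read the `DRIVER=` line from the device's sysfs `uevent` file. The sketch below is a minimal, hedged example: it assumes the GPU exposes a DRM render node under `/sys/class/drm/renderD*`, and the mapping from driver name to architecture simply restates the descriptions in this section. The helper names are illustrative.

```python
import glob

# Driver-to-hardware mapping as described in this section (illustrative wording).
DRIVER_TARGETS = {
    "lima": "Utgard (for example Mali-400 and Mali-450)",
    "panfrost": "Midgard and Bifrost",
    "panthor": "CSF-based Valhall and newer",
}


def parse_driver(uevent_text: str):
    """Return the value of the DRIVER= line in a sysfs uevent file, or None."""
    for line in uevent_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "DRIVER":
            return value.strip()
    return None


def describe_gpu_driver(uevent_text: str) -> str:
    """Summarize which open-source Mali driver, if any, a uevent file names."""
    driver = parse_driver(uevent_text)
    if driver is None:
        return "no driver bound"
    target = DRIVER_TARGETS.get(driver)
    if target is None:
        return f"{driver}: not one of the open-source Mali kernel drivers"
    return f"{driver}: open-source driver for {target}"


def scan_sysfs(pattern: str = "/sys/class/drm/renderD*/device/uevent") -> dict:
    """Assumed sysfs layout; returns an empty dict where no render node exists."""
    results = {}
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8") as handle:
            results[path] = describe_gpu_driver(handle.read())
    return results
```

Calling `scan_sysfs()` on a board such as an RK3399 system running the upstream stack would be expected to report `panfrost`, though the exact sysfs contents depend on the kernel configuration.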
