Shader

In computer graphics, a shader is a programmable operation which is applied to data as it moves through the rendering pipeline.[1][2] Shaders can act on data such as vertices and primitives — to generate or morph geometry — and fragments — to calculate the values in a rendered image.[2]
Shaders can execute a wide variety of operations and can run on different types of hardware. In modern real-time computer graphics, shaders are run on graphics processing units (GPUs) — dedicated hardware which provides highly parallel execution of programs. As rendering an image is embarrassingly parallel, fragment (pixel) shaders scale well on SIMD hardware. Historically, the drive for faster rendering has produced highly parallel processors which can in turn be used for other SIMD-amenable algorithms.[3] Such shaders executing in a compute pipeline are commonly called compute shaders.
History
The first known public use of the term "shader" was by Pixar, in version 3.0 of their RenderMan Interface Specification, originally published in May 1988.[4]
As graphics processing units evolved, major graphics software libraries such as OpenGL and Direct3D began to support shaders. The first shader-capable GPUs only supported pixel shading, but vertex shaders were quickly introduced once developers realized the power of shaders. The first video card with a programmable pixel shader was the Nvidia GeForce 3 (NV20), released in 2001.[5] Geometry shaders were introduced with Direct3D 10 and OpenGL 3.2. Eventually, graphics hardware evolved toward a unified shader model.
Graphics shaders
The traditional use of shaders is to operate on data in the graphics pipeline to control the rendering of an image. Graphics shaders can be classified according to their position in the pipeline, the data being manipulated, and the graphics API being used.
Fragment shaders
Fragment shaders, also known as pixel shaders, compute color and other attributes of each "fragment": a unit of rendering work affecting at most a single output pixel. The simplest kinds of pixel shaders output one screen pixel as a color value; more complex shaders with multiple inputs/outputs are also possible.[6] Pixel shaders range from simply always outputting the same color, to applying a lighting value, to doing bump mapping, shadows, specular highlights, translucency and other phenomena. They can alter the depth of the fragment (for Z-buffering), or output more than one color if multiple render targets are active.

In 3D graphics, a pixel shader alone cannot produce some kinds of complex effects because it operates only on a single fragment, without knowledge of a scene's geometry (i.e. vertex data). However, pixel shaders do have knowledge of the screen coordinate being drawn, and can sample the screen and nearby pixels if the contents of the entire screen are passed as a texture to the shader. This technique can enable a wide variety of two-dimensional postprocessing effects such as blur, or edge detection/enhancement for cartoon/cel shaders.

Pixel shaders may also be applied in intermediate stages to any two-dimensional images—sprites or textures—in the pipeline, whereas vertex shaders always require a 3D scene. For instance, a pixel shader is the only kind of shader that can act as a postprocessor or filter for a video stream after it has been rasterized.
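A minimal GLSL sketch of such a postprocessing fragment shader follows; it samples a full-screen texture and converts it to grayscale. The uniform and variable names (such as screenTexture) are illustrative, not a fixed convention.

```glsl
#version 330 core
in vec2 texCoord;                     // interpolated from the vertex stage
out vec4 fragColor;                   // color written to the render target

uniform sampler2D screenTexture;      // the rendered frame, bound by the application

void main() {
    vec3 color = texture(screenTexture, texCoord).rgb;
    float gray = dot(color, vec3(0.299, 0.587, 0.114));  // luminance weights
    fragColor = vec4(vec3(gray), 1.0);
}
```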
Vertex shaders
Vertex shaders are run once for each 3D vertex given to the graphics processor. The purpose is to transform each vertex's 3D position in virtual space to the 2D coordinate at which it appears on the screen (as well as a depth value for the Z-buffer).[7] Vertex shaders can manipulate properties such as position, color and texture coordinates, but cannot create new vertices. The output of the vertex shader goes to the next stage in the pipeline, which is either a geometry shader if present, or the rasterizer. Vertex shaders can enable powerful control over the details of position, movement, lighting, and color in any scene involving 3D models.
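As a hedged sketch, a typical GLSL vertex shader performing the transformation described above might look like the following; the attribute layout and uniform names are assumptions and would be set by the application.

```glsl
#version 330 core
layout(location = 0) in vec3 position;   // per-vertex attribute from the vertex buffer
layout(location = 1) in vec2 uv;

uniform mat4 model;        // object-to-world transform
uniform mat4 view;         // world-to-camera transform
uniform mat4 projection;   // camera-to-clip transform

out vec2 texCoord;         // passed on for interpolation across the primitive

void main() {
    texCoord = uv;
    // Clip-space position; the depth value for the Z-buffer comes from z/w.
    gl_Position = projection * view * model * vec4(position, 1.0);
}
```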
Geometry shaders
Geometry shaders were introduced in Direct3D 10 and OpenGL 3.2; they were previously available in OpenGL 2.0+ through extensions.[8] This type of shader can generate new graphics primitives, such as points, lines, and triangles, from those primitives that were sent to the beginning of the graphics pipeline.[9]
Geometry shader programs are executed after vertex shaders. They take as input a whole primitive, possibly with adjacency information. For example, when operating on triangles, the three vertices are the geometry shader's input. The shader can then emit zero or more primitives, which are rasterized and their fragments ultimately passed to a pixel shader.
Typical uses of a geometry shader include point sprite generation, geometry tessellation, shadow volume extrusion, and single-pass rendering to a cube map. A typical real-world example of the benefits of geometry shaders would be automatic mesh complexity modification: a series of line strips representing control points for a curve is passed to the geometry shader and, depending on the complexity required, the shader can automatically generate extra lines, each of which provides a better approximation of the curve.
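A rough GLSL sketch of that refinement idea: each input line segment is subdivided into several shorter segments. The displacement function used here is a placeholder, not a real curve evaluation.

```glsl
#version 330 core
layout(lines) in;                              // one input line segment per invocation
layout(line_strip, max_vertices = 9) out;      // up to 8 sub-segments out

void main() {
    vec4 a = gl_in[0].gl_Position;
    vec4 b = gl_in[1].gl_Position;
    for (int i = 0; i <= 8; ++i) {
        float t = float(i) / 8.0;
        vec4 p = mix(a, b, t);                 // point along the original segment
        p.y += 0.05 * sin(t * 3.14159);        // toy displacement standing in for a curve
        gl_Position = p;
        EmitVertex();
    }
    EndPrimitive();
}
```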
Tessellation shaders
As of OpenGL 4.0 and Direct3D 11, a new shader class called a tessellation shader has been added. It adds two new shader stages to the traditional model: tessellation control shaders (also known as hull shaders) and tessellation evaluation shaders (also known as domain shaders), which together allow simpler meshes to be subdivided into finer meshes at run-time according to a mathematical function. The function can be related to a variety of variables, most notably the distance from the viewing camera, to allow active level-of-detail scaling. This allows objects close to the camera to have fine detail, while those farther away can have coarser meshes yet seem comparable in quality. It can also drastically reduce the required mesh bandwidth by allowing meshes to be refined once inside the shader units instead of downsampling very complex ones from memory. Some algorithms can upsample any arbitrary mesh, while others allow for "hinting" in meshes to dictate the most characteristic vertices and edges.
Primitive and Mesh shaders
Circa 2017, the AMD Vega microarchitecture added support for a new shader stage—primitive shaders—somewhat akin to compute shaders with access to the data necessary to process geometry.[10][11]
Nvidia introduced mesh and task shaders with its Turing microarchitecture in 2018; these are also modelled after compute shaders.[12][13] Nvidia Turing was the world's first GPU microarchitecture to support mesh shading through the DirectX 12 Ultimate API, several months before the Ampere RTX 30 series was released.[14]
In 2020, AMD and Nvidia released the RDNA 2 and Ampere microarchitectures, which both support mesh shading through DirectX 12 Ultimate.[15] Mesh shaders allow the GPU to handle more complex algorithms, offloading more work from the CPU to the GPU, and in algorithm-intensive rendering can increase the frame rate or the number of triangles in a scene by an order of magnitude.[16] Intel announced that Intel Arc Alchemist GPUs shipping in Q1 2022 would support mesh shaders.[17]
Ray tracing shaders
Ray tracing shaders are supported by Microsoft via DirectX Raytracing, by the Khronos Group via Vulkan, GLSL, and SPIR-V,[18] and by Apple via Metal. NVIDIA and AMD refer to the hardware units that run ray tracing shaders as "ray tracing cores". Unlike a unified shader, a ray tracing core can contain multiple ALUs.[19]
Compute shaders
Compute shaders are not limited to graphics applications; they use the same execution resources for GPGPU. They may be used in graphics pipelines, e.g. for additional stages in animation or lighting algorithms (such as tiled forward rendering). Some rendering APIs allow compute shaders to easily share data resources with the graphics pipeline.
Tensor shaders
Tensor shaders may be integrated in NPUs or GPUs. Tensor shaders are supported by Microsoft via DirectML, by the Khronos Group via OpenVX, by Apple via Core ML, by Google via TensorFlow, and by the Linux Foundation via ONNX.[20] NVIDIA and AMD refer to the hardware units that run tensor shaders as "tensor cores". Unlike a unified shader, a tensor core can contain multiple ALUs.[21]
Programming
Several programming languages exist specifically for writing shaders, and the choice of language can depend on the target environment. The shading language for OpenGL is GLSL, and Direct3D uses HLSL. The Metal framework, used by Apple devices, has its own shading language called Metal Shading Language.
Increasingly in modern graphics APIs, shaders are compiled into SPIR-V, an intermediate language, before they are distributed to the end user. This standard allows more flexible choice of shading language, regardless of target platform.[22] First supported by Vulkan and OpenGL, SPIR-V is also being adopted by Direct3D.[23]
GUI shader editors
Modern video game development platforms such as Unity, Unreal Engine and Godot increasingly include node-based editors that can create shaders without the need for written code; the user is instead presented with a directed graph of connected nodes that route various textures, maps, and mathematical functions into output values such as the diffuse color, the specular color and intensity, roughness/metalness, height, normal, and so on. The graph is then compiled into a shader.
References
- ^ "Vulkan® 1.4.323 - A Specification". Khronos Group. Retrieved July 28, 2025.
- ^ a b "The OpenGL® Graphics System: A Specification" (PDF). Khronos Group. p. 87. Retrieved July 28, 2025.
- ^ "Encyclopedia Britannica: graphics processing unit". Encyclopedia Britannica. Retrieved July 28, 2025.
- ^ "The RenderMan Interface Specification". Archived from the original on June 25, 2018.
- ^ Lilly, Paul (May 19, 2009). "From Voodoo to GeForce: The Awesome History of 3D Graphics". PC Gamer – via www.pcgamer.com.
- ^ "GLSL Tutorial – Fragment Shader". June 9, 2011.
- ^ "GLSL Tutorial – Vertex Shader". June 9, 2011.
- ^ Geometry Shader - OpenGL. Retrieved on December 21, 2011.
- ^ "Pipeline Stages (Direct3D 10) (Windows)". msdn.microsoft.com. January 6, 2021.
- ^ "Radeon RX Vega Revealed: AMD promises 4K gaming performance for $499 - Trusted Reviews". TrustedReviews. July 31, 2017.
- ^ "The curtain comes up on AMD's Vega architecture". January 5, 2017.
- ^ "NVIDIA Turing Architecture In-Depth". September 14, 2018.
- ^ "Introduction to Turing Mesh Shaders". September 17, 2018.
- ^ "DirectX 12 Ultimate Game Ready Driver Released; Also Includes Support for 9 New G-SYNC Compatible Gaming Monitors".
- ^ "Announcing DirectX 12 Ultimate". DirectX Developer Blog. March 19, 2020. Retrieved May 25, 2021.
- ^ "Realistic Lighting in Justice with Mesh Shading". NVIDIA Developer Blog. May 21, 2021. Retrieved May 25, 2021.
- ^ Smith, Ryan. "Intel Architecture Day 2021: A Sneak Peek At The Xe-HPG GPU Architecture". www.anandtech.com. Archived from the original on August 19, 2021.
- ^ "Vulkan Ray Tracing Final Specification Release". Blog. Khronos Group. November 23, 2020. Retrieved 2021-02-22.
- ^ "RTSL: a Ray Tracing Shading Language" (PDF). Archived from the original (PDF) on October 12, 2011.
- ^ "NNAPI Migration Guide | Android NDK". Android Developers. Retrieved August 1, 2024.
- ^ "Review on GPU Architecture" (PDF). Archived from the original (PDF) on September 6, 2024.
- ^ Kessenich, John (2015). "An Introduction to SPIR-V" (PDF). Khronos Group.
- ^ AleksandarK (September 20, 2024). "Microsoft DirectX 12 Shifts to SPIR-V as Default Interchange Format". TechPowerUp. Retrieved September 8, 2025.
Further reading
- Upstill, Steve (1990). The RenderMan Companion: A Programmer's Guide to Realistic Computer Graphics. Addison-Wesley. ISBN 0-201-50868-0.
- Ebert, David S; Musgrave, F. Kenton; Peachey, Darwyn; Perlin, Ken; Worley, Steven (1994). Texturing and modeling: a procedural approach. AP Professional. ISBN 0-12-228730-4.
- Fernando, Randima; Kilgard, Mark (2003). The Cg Tutorial: The Definitive Guide to Programmable Real-Time Graphics. Addison-Wesley Professional. ISBN 0-321-19496-9.
- Rost, Randi J (2004). OpenGL Shading Language. Addison-Wesley Professional. ISBN 0-321-19789-5.
External links
- OpenGL geometry shader extension
- Riemer's DirectX & HLSL Tutorial: HLSL Tutorial using DirectX with much sample code
- Pipeline Stages (Direct3D 10)
Fundamentals
Definition
A shader is a compact program executed on the graphics processing unit (GPU) to perform specialized computations on graphics data as part of the rendering process.[15] These programs enable flexible processing of visual elements, distinguishing them from earlier hardware-limited approaches by allowing custom logic to be applied directly on the GPU hardware.[16] The primary purposes of shaders include transforming vertex positions in 3D space, determining pixel colors and shading effects, generating additional geometry, and supporting general-purpose computations beyond traditional rendering.[15][17] Key characteristics encompass their implementation in high-level shading languages, parallel execution across the GPU's numerous cores to handle massive workloads efficiently, and a design that is stateless—meaning no persistent state is maintained between invocations—and intended to produce deterministic results for consistent rendering, though practical variations may occur due to floating-point arithmetic.[18]

In contrast to fixed-function pipeline stages, which rely on predefined hardware operations for tasks like transformation and lighting, programmable shaders provide developer-defined behavior at these stages, enhancing versatility in graphics rendering.[15] The basic execution model involves shaders processing individual graphics primitives, such as vertices or fragments, using inputs like per-primitive attributes (e.g., position or texture coordinates) and uniforms (constant parameters shared across invocations), with outputs directed to render targets like buffers or the framebuffer.[2]

Graphics Pipeline Integration
The graphics rendering pipeline in modern GPU architectures processes 3D scene data through a sequence of fixed-function and programmable stages to produce a final 2D image on the screen. Key stages include the input assembler, which assembles vertices from buffers; vertex processing, where positions and attributes are transformed; primitive assembly, which forms geometric primitives like triangles; rasterization, which converts primitives into fragments; fragment processing, where pixel colors are computed; and the output merger, which blends results into render targets.[19]

Shaders integrate into the pipeline's programmable stages—such as vertex and fragment processing—replacing earlier fixed-function hardware units that handled rigid operations like basic transformations and lighting. This shift to programmability, which began with DirectX 8 in 2000 and OpenGL extensions in 2001, was advanced in APIs like Direct3D 10 (2006) and OpenGL 2.0 (2004), allowing developers to implement custom algorithms for effects including physically based lighting, dynamic texturing, and advanced shading models, enhancing visual realism and artistic control.[20][21]

Data flows sequentially through the pipeline with inputs comprising vertex attributes (e.g., positions, normals) from index and vertex buffers, uniform variables for global parameters like matrices, and textures accessed via samplers. Inter-stage communication occurs through varying qualifiers, where outputs from the vertex stage—such as transformed positions in clip space and interpolated attributes—are passed to the fragment stage after rasterization. Final outputs include fragment colors and depth values directed to framebuffers or other render targets in the output merger.[22]

In graphics APIs, pipeline configuration involves compiling shader code into modules and binding them to specific stages via pipeline state objects (PSOs) or equivalent structures, such as Direct3D 12's ID3D12PipelineState or Vulkan's VkPipeline. Resource management requires allocating and binding buffers for vertex data, constant buffers for uniforms, and descriptor sets or tables for textures and samplers, ensuring shaders can access them during execution without runtime overhead.[16][22]

This shader-driven architecture provides key benefits, including the ability to realize complex effects like procedural geometry generation within vertex stages and real-time ray tracing extensions through dedicated shader invocations, far surpassing the limitations of fixed-function pipelines in supporting diverse rendering techniques.[16]
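A minimal sketch of the inter-stage interface described above, assuming an OpenGL-style GLSL pipeline: the vertex stage writes an out ("varying") variable that the fragment stage receives, after rasterization and interpolation, as a matching in variable. The attribute locations and the mvp uniform are illustrative assumptions.

```glsl
// Vertex stage: declare an output for the fragment stage.
#version 330 core
layout(location = 0) in vec3 position;
layout(location = 1) in vec3 color;
uniform mat4 mvp;            // combined model-view-projection matrix
out vec3 vColor;             // interpolated across the primitive

void main() {
    vColor = color;
    gl_Position = mvp * vec4(position, 1.0);
}
```

```glsl
// Fragment stage: the matching input receives the interpolated value.
#version 330 core
in vec3 vColor;
out vec4 fragColor;

void main() {
    fragColor = vec4(vColor, 1.0);
}
```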
History
Origins and Early Concepts
The origins of shading in computer graphics trace back to the 1970s, when researchers developed mathematical models to simulate surface illumination and enhance the realism of rendered images. Gouraud shading, introduced in 1971, represented an early approach to smooth shading by interpolating colors across polygon vertices, thereby avoiding the faceted appearance of flat shading while computing lighting only at vertices for efficiency. This technique prioritized computational feasibility on early hardware, laying groundwork for interpolated shading methods. Shortly after, in 1975, Bui Tuong Phong proposed a more sophisticated reflection model that separated illumination into ambient, diffuse, and specular components, enabling per-pixel normal interpolation to capture highlights and surface details more accurately than vertex-based methods.[23] These models established shading as a core concept for local illumination, influencing subsequent hardware and software developments despite their initial software-only implementation.

In the 1980s, advancements in rendering architectures began to integrate shading concepts into production pipelines, particularly for film and animation. At Lucasfilm (later Pixar), the REYES ("Renders Everything You Ever Saw") architecture, developed by Edwin Catmull, Loren Carpenter, and Robert L. Cook, introduced micropolygon-based rendering around 1980, where complex surfaces were subdivided into tiny polygons shaded individually to support displacement mapping and anti-aliasing.[24] This system optimized shading for high-quality output by processing micropolygons in screen space, serving as a precursor to parallel graphics processing and influencing hardware design. Building on REYES, Pixar released RenderMan in 1988, a commercial rendering system that included the first shading language for procedural surface effects, allowing artists to define custom illumination models beyond fixed mathematics.[25] These innovations met the growing demand for photorealistic visuals in films like Toy Story (1995), but remained software-bound, highlighting the need for hardware acceleration.

The 1990s saw the shift to dedicated graphics hardware with fixed-function pipelines, which hardcoded shading stages for lighting, texturing, and transformation to accelerate real-time rendering in games and simulations. Cards like the 3Dfx Voodoo (1996) implemented multi-texturing and basic lighting in fixed stages, enabling rasterization without CPU intervention, though limited to predefined operations like Gouraud-style interpolation.[26] Similarly, NVIDIA's RIVA 128 (1997) featured a configurable pipeline with four texture units for effects such as environment mapping, but shading remained non-programmable, relying on driver-set parameters for custom looks.[26] These systems dominated consumer graphics, processing billions of pixels per second, yet their rigidity constrained complex effects, prompting multi-pass techniques to approximate advanced shading.
Rising demands for custom effects in mid-1990s films—such as procedural textures in RenderMan for Jurassic Park (1993)—and video games—like dynamic lighting in Quake (1996)—exposed limitations of fixed hardware, driving early experiments in low-level GPU programming via assembly-like instructions to tweak combiners and registers.[27] This pressure culminated in a key milestone with NVIDIA's GeForce 256 in 1999, the first consumer GPU to integrate dedicated transform and lighting (T&L) engines, offloading vertex shading from the CPU and hinting at future programmability through hardware-accelerated fixed stages.[28]

Evolution in Graphics APIs
The evolution of shader programmability in graphics APIs began in 2000 with Microsoft's DirectX 8, which introduced vertex and pixel shaders. Hardware support followed in 2001 with GPUs such as NVIDIA's GeForce 3 and ATI's Radeon 8500, enabling programming in assembly language and marking a shift from fixed-function pipelines to basic programmable stages. This enabled developers to customize transformations and per-pixel operations beyond rigid hardware limitations, laying the groundwork for more expressive rendering techniques.

Programmable shaders initially used low-level assembly-like instructions via extensions like ARB_vertex_program (2001) and ARB_fragment_program (2002). High-level shading languages emerged from 2002 to 2004, with Microsoft's High-Level Shading Language (HLSL) debuting with DirectX 9 in 2002, supporting shader model 2.0 for improved precision and branching, and the OpenGL Shading Language (GLSL) introduced alongside OpenGL 2.0 in 2004 for more accessible vertex and fragment programming. OpenGL 2.0 in 2004 and DirectX 9's shader model 3.0 further standardized these capabilities, allowing longer programs and dynamic branching for complex effects like procedural textures.

The mid-2000s saw expanded shader stages, as DirectX 10 in 2006 added geometry shaders to process primitives after vertex shading, enabling amplification and simplification of geometry on the GPU. DirectX 11 in 2009 introduced tessellation shaders for adaptive subdivision and compute shaders for general-purpose GPU computing, with OpenGL 3.2 in 2009 adding geometry shaders, OpenGL 4.0 in 2010 adding tessellation shaders, and OpenGL 4.3 in 2012 introducing compute shaders to align cross-platform development.

In the 2010s, modern APIs focused on efficiency and low-level control, with Apple's Metal API released in 2014 emphasizing streamlined shader pipelines for iOS and macOS devices to reduce overhead in draw calls. Vulkan, launched in 2016 by the Khronos Group, extended this with explicit resource management and SPIR-V as an intermediate representation for portable shaders across APIs. Microsoft's DirectX 12, introduced in 2015, built on these principles with enhanced command list handling for shaders, paving the way for advanced features like mesh shaders in later updates.

By 2018, real-time ray tracing gained traction through extensions like DirectX Raytracing (DXR) in DirectX 12 and Vulkan's ray tracing extension (VK_KHR_ray_tracing_pipeline), integrating specialized shaders for ray generation, intersection, and shading to simulate light interactions more accurately. Mesh shaders arrived in DirectX 12 Ultimate in 2020, replacing geometry and tessellation stages with a unified task/mesh pipeline for scalable geometry processing, followed by Vulkan (via the VK_EXT_mesh_shader extension) in 2022 for broader adoption.

The SPIR-V format, adopted from 2016, has facilitated cross-API shader portability by compiling high-level code to a binary intermediate language. As of 2025, no major new core shader types have been introduced in primary APIs, though integration of AI-accelerated shading—using neural networks for denoising and upscaling in ray-traced pipelines—has proliferated, as seen in NVIDIA's DLSS and AMD's FSR implementations.

Graphics Shaders
Vertex Shaders
Vertex shaders operate on individual vertices early in the graphics pipeline, transforming their positions from model space to clip space through matrix multiplications, typically involving the model, view, and projection matrices to prepare geometry for rasterization.[29] This stage performs per-vertex computations such as coordinate transformations, normal vector adjustments, and texture coordinate generation, ensuring that subsequent pipeline stages receive properly oriented vertex data.[30]

The inputs to a vertex shader include per-vertex attributes supplied from vertex buffers, such as position, normal vectors, and UV texture coordinates, along with uniform variables like transformation matrices and lighting parameters that remain constant across vertices in a draw call.[31] Texture samplers can be accessed in vertex shaders for operations like displacement mapping, though this is rarely utilized due to hardware limitations in earlier shader models and the efficiency of handling such computations in later stages.[32]

Outputs from the vertex shader consist of the transformed vertex position, written to a built-in variable such as gl_Position in GLSL or SV_Position in HLSL, which defines the clip-space coordinates for primitive assembly.[33] Additionally, varying outputs—such as interpolated normals, colors, or texture coordinates—are passed to the next pipeline stage for interpolation across primitives, enabling smooth shading effects without per-vertex redundancy.[34]
Vertex shaders execute once per input vertex, immediately following the input assembler and preceding primitive assembly in the rendering pipeline, which allows for efficient parallel processing on the GPU as each invocation is independent.[35] This model ensures a one-to-one mapping between input and output vertices, preserving the topology of the input geometry.[2]
Common applications of vertex shaders extend beyond basic transformations to include skeletal skinning, where vertex positions are blended using bone influence matrices to animate character meshes in real-time.[36] Procedural deformations, such as wind-driven animations for vegetation, leverage vertex shaders to apply dynamic offsets based on time or noise functions, simulating natural motion without CPU intervention.[37] Billboard effects, used for particles or distant objects, orient vertices to always face the camera by replacing the model matrix with a view-dependent transformation in the shader.[38]
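As an illustration of the skinning use case, a hedged GLSL vertex-shader sketch follows; the attribute layout, the bone-array size, and the uniform names are assumptions rather than a fixed convention, and the bone matrices are assumed to be uploaded by the application each frame.

```glsl
#version 330 core
layout(location = 0) in vec3  position;
layout(location = 1) in uvec4 boneIndices;   // up to four influencing bones per vertex
layout(location = 2) in vec4  boneWeights;   // blend weights, assumed to sum to 1

uniform mat4 bones[64];          // per-bone skinning matrices
uniform mat4 viewProjection;

void main() {
    // Blend the bone matrices by their weights (linear blend skinning).
    mat4 skin = boneWeights.x * bones[boneIndices.x]
              + boneWeights.y * bones[boneIndices.y]
              + boneWeights.z * bones[boneIndices.z]
              + boneWeights.w * bones[boneIndices.w];
    gl_Position = viewProjection * skin * vec4(position, 1.0);
}
```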
Fragment Shaders
Fragment shaders, also known as pixel shaders in some APIs, are a programmable stage in the graphics pipeline responsible for processing each fragment generated by rasterization to determine its final color, depth, and other attributes. These shaders execute after the rasterization stage, where primitives are converted into fragments representing potential pixels, and before the per-fragment operations like depth testing and blending. The primary function is to compute the appearance of each fragment based on interpolated data from earlier stages, enabling per-fragment shading effects that contribute to realistic rendering.[33]

Inputs to a fragment shader include interpolated varying variables from the vertex shader, such as texture coordinates, surface normals, and positions, which are automatically interpolated across the primitive by the rasterizer. Additionally, uniforms provide constant data like light positions, material properties, and transformation matrices, while texture samplers allow access to bound textures for sampling color or other data at specific coordinates. These inputs enable the shader to perform computations tailored to each fragment's position and attributes without accessing vertex-level data directly.[33][39]

The outputs of a fragment shader typically include one or more color values written to the framebuffer, often via built-in variables like gl_FragColor in GLSL or explicit output locations in modern shading languages. Optionally, shaders can modify the fragment's depth value using gl_FragDepth for custom depth computations or discard fragments entirely to simulate effects like alpha testing. Stencil values can also be altered if enabled, though this is less common. These outputs are then subjected to fixed-function tests and blending before final pixel composition.[33]
Common applications of fragment shaders include texture mapping, where interpolated UV coordinates are used to sample from textures and combine them with base colors, and lighting calculations to simulate illumination per fragment. For instance, the Phong reflection model computes intensity as the sum of ambient, diffuse, and specular components:
\[ I = k_a i_a + k_d i_d\,(\hat{N} \cdot \hat{L}) + k_s i_s\,(\hat{R} \cdot \hat{V})^{\alpha} \]

where \( i_a \), \( i_d \), and \( i_s \) are the ambient, diffuse, and specular light intensities; \( k_a \), \( k_d \), and \( k_s \) are the material coefficients; \( \hat{N} \) is the surface normal; \( \hat{L} \) is the light direction; \( \hat{R} \) is the reflection vector; \( \hat{V} \) is the view direction; and \( \alpha \) is the shininess exponent. This model, originally proposed by Bui Tuong Phong, is widely implemented in fragment shaders for efficient per-pixel lighting.[40] Other uses encompass fog effects, achieved by blending fragment colors toward a fog color based on depth or distance, and contributions to anti-aliasing through techniques like multisample anti-aliasing (MSAA) integration or post-processing filters that smooth edges by averaging samples.
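A hedged GLSL sketch of the Phong model above in a fragment shader; the interpolated vectors are assumed to be supplied in view space and the uniform names are illustrative.

```glsl
#version 330 core
in vec3 vNormal;        // interpolated surface normal
in vec3 vLightDir;      // direction from the surface toward the light
in vec3 vViewDir;       // direction from the surface toward the viewer
out vec4 fragColor;

uniform vec3 lightAmbient, lightDiffuse, lightSpecular;   // i_a, i_d, i_s
uniform vec3 matAmbient,  matDiffuse,  matSpecular;       // k_a, k_d, k_s
uniform float shininess;                                  // alpha

void main() {
    vec3 N = normalize(vNormal);
    vec3 L = normalize(vLightDir);
    vec3 V = normalize(vViewDir);
    vec3 R = reflect(-L, N);                 // reflection of the light direction about N

    float diff = max(dot(N, L), 0.0);
    float spec = pow(max(dot(R, V), 0.0), shininess);

    vec3 color = matAmbient  * lightAmbient
               + matDiffuse  * lightDiffuse  * diff
               + matSpecular * lightSpecular * spec;
    fragColor = vec4(color, 1.0);
}
```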
Fragment shaders are executed once per fragment in a highly parallel manner across the GPU, making them performance-critical due to their impact on fill rate—the number of fragments processed per second. Modern GPUs optimize this by executing shaders on streaming multiprocessors or compute units, with early rejection via depth or stencil tests to avoid unnecessary computations. Complex shaders can become bottlenecks in scenes with high overdraw, emphasizing the need for efficient code to maintain frame rates.[41]
Geometry Shaders
Geometry shaders represent an optional programmable stage in the graphics rendering pipeline, positioned immediately after the vertex shader and prior to the rasterization stage. This stage enables developers to process entire input primitives—such as points, lines, or triangles—allowing for the generation of new primitives or the modification of existing ones directly on the GPU. Introduced in Direct3D 10 with Shader Model 4.0 in November 2006, geometry shaders marked a significant advancement in GPU programmability by extending beyond per-vertex operations to per-primitive processing.[42] OpenGL incorporated geometry shaders in version 3.2, released in August 2009, aligning with the core profile to support modern hardware features.[43] The primary function of a geometry shader is to receive a complete primitive from the vertex shader output, including its topology (e.g., GL_TRIANGLES or GL_POINTS in OpenGL) and associated per-vertex attributes such as positions, normals, and texture coordinates. Unlike vertex shaders, which handle individual vertices independently, geometry shaders operate on the full set of vertices defining the primitive, providing access to inter-vertex relationships for more sophisticated manipulations. This enables tasks like transforming the primitive's shape or topology while preserving or augmenting vertex data. In the OpenGL Shading Language (GLSL), inputs are accessed via built-in arrays like gl_in, which holds the vertex data for the current primitive. Similarly, in High-Level Shading Language (HLSL) for Direct3D, the shader receives the primitive's vertices through input semantics defined in the shader signature.[44] Outputs from geometry shaders are generated dynamically by emitting new vertices and completing primitives, subject to hardware-imposed limits on amplification. In GLSL, developers use the EmitVertex() function to append a vertex (with current output variable values) to the ongoing primitive, followed by EndPrimitive() to finalize and emit the primitive to subsequent pipeline stages. This process allows for variable output topologies, such as converting a point primitive into a triangle strip for billboard rendering. In HLSL, equivalent functionality is achieved through [maxvertexcount(N)] declarations, where N specifies the maximum vertices per invocation, capped by hardware constraints like 1024 scalar components per primitive in Direct3D 10-era implementations—translating to an effective amplification factor of up to approximately 32 times for typical vertex formats (e.g., position and color).[44] Beyond scalar limits, outputs must adhere to supported topologies like point lists, line strips, or triangle strips, ensuring compatibility with rasterization.[45] Common applications of geometry shaders leverage their primitive-level control for efficient geometry generation and optimization. For instance, point primitives can be extruded into billboard quads to render particle effects or impostors, where a single input point expands into four vertices forming a textured square always facing the camera. Fur or hair simulation often employs geometry shaders to generate strand-like line strips from base mesh edges, creating dense fibrous surfaces without excessive CPU-side geometry preparation. Shadow volume creation benefits from on-the-fly extrusion of silhouette edges into volume primitives, streamlining real-time lighting computations in deferred rendering pipelines. 
Additionally, primitive culling can be implemented by conditionally discarding or simplifying input primitives based on visibility criteria, such as frustum or occlusion tests, reducing downstream workload. These uses highlight geometry shaders' role in balancing performance and visual fidelity in real-time graphics.[46][47][45]
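For the point-to-billboard expansion mentioned above, a hedged GLSL geometry-shader sketch follows; it assumes the vertex stage passes positions through in view space, and that the projection matrix and halfSize are application-supplied uniforms.

```glsl
#version 330 core
layout(points) in;                              // one input point per invocation
layout(triangle_strip, max_vertices = 4) out;   // expand it into a quad

uniform mat4  projection;   // view-to-clip transform
uniform float halfSize;     // half the billboard's extent in view space

out vec2 texCoord;

void main() {
    vec4 center = gl_in[0].gl_Position;         // view-space point written by the vertex stage
    const vec2 offsets[4] = vec2[4](vec2(-1.0, -1.0), vec2(1.0, -1.0),
                                    vec2(-1.0,  1.0), vec2(1.0,  1.0));
    for (int i = 0; i < 4; ++i) {
        vec4 corner = center + vec4(offsets[i] * halfSize, 0.0, 0.0);
        gl_Position = projection * corner;
        texCoord = offsets[i] * 0.5 + 0.5;      // map corners to [0,1] texture space
        EmitVertex();
    }
    EndPrimitive();
}
```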
Tessellation Shaders
Tessellation shaders enable adaptive subdivision of coarse geometry patches into finer meshes on the GPU, facilitating detailed surface rendering without excessive vertex data in memory. In the graphics pipeline, they operate after the vertex shader stage and consist of two programmable components: the hull shader (also known as the tessellation control shader in OpenGL) and the domain shader (tessellation evaluation shader). The hull shader processes input control points from patches, such as Bézier curves or surfaces, to generate output control points and compute tessellation factors that dictate subdivision density. These factors include edge levels for patch boundaries and inside levels for interior subdivision, typically ranging from 1 to 64, allowing control over the level of detail (LOD) based on factors like viewer distance to optimize performance.[48][49][50]

The fixed-function hardware tessellator then uses these factors to generate a denser grid of vertices from the patch topology, evaluating parametric coordinates (e.g., u-v parameters for Bézier patches) without programmable intervention. The domain shader subsequently receives these generated vertices, along with the original control points and tessellation factors, to displace or position them in world space, often applying height or displacement maps for realistic surface variations. This produces a stream of dense vertices that feeds into subsequent pipeline stages, such as the geometry shader or rasterizer, enabling techniques like displacement mapping for enhanced detail. Introduced in DirectX 11 in 2009 and OpenGL 4.0 in 2010, tessellation shaders integrate post-vertex processing to dynamically adjust geometry complexity, reducing CPU-side vertex generation while leveraging GPU parallelism.[48][51][49]

Common applications include terrain rendering, where tessellation factors vary with distance to create seamless LOD transitions across landscapes; character skinning, which uses subdivision for smooth, wrinkle-free deformations; and approximation of subdivision surfaces like Catmull-Clark, where low-order Bézier patches represent higher-order geometry for efficient rendering of complex models. These uses exploit the hardware tessellator's efficiency in evaluating subdivision patterns, allowing real-time adaptation to viewing conditions without precomputing all possible detail levels.[52][53][50]
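A hedged GLSL sketch of a tessellation control (hull) stage implementing the distance-based level-of-detail idea described above; the LOD heuristic and the cameraPos uniform are illustrative assumptions, and a matching evaluation (domain) stage would then position the generated vertices.

```glsl
#version 400 core
layout(vertices = 3) out;    // pass a triangle patch through unchanged

uniform vec3 cameraPos;      // camera position in the same space as the control points

void main() {
    // Copy this invocation's control point to the output patch.
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;

    // Let one invocation choose tessellation levels from camera distance.
    if (gl_InvocationID == 0) {
        float d = distance(cameraPos, gl_in[0].gl_Position.xyz);
        float level = clamp(64.0 / max(d, 1.0), 1.0, 64.0);   // coarse LOD heuristic
        gl_TessLevelOuter[0] = level;
        gl_TessLevelOuter[1] = level;
        gl_TessLevelOuter[2] = level;
        gl_TessLevelInner[0] = level;
    }
}
```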
Mesh Shaders
Mesh shaders represent a significant evolution in graphics pipeline stages, combining and replacing the traditional vertex, geometry, and tessellation shaders with a more flexible, compute-like model for geometry processing. Introduced as part of DirectX 12 Ultimate, they enable developers to generate variable numbers of vertices and primitives directly within the shader, bypassing fixed-function topology constraints and reducing overhead from multiple pipeline stages. This approach leverages workgroups to process meshlets—small, efficient units of geometry—allowing for dynamic culling, amplification, and generation of mesh data on the GPU.[54]

The primary function of mesh shaders involves two complementary stages: the task shader (also known as the amplification shader) and the mesh shader itself. The task shader operates on input workgroups, performing culling or amplification to determine the number of meshlets needed, outputting a variable count of child workgroups to invoke mesh shaders. The mesh shader then executes within these workgroups, generating vertices, indices, and primitive data for each meshlet, which are directly fed into the rasterizer without intermediate fixed-function processing. This task-based processing model allows for coarse-grained decisions at the task stage and fine-grained vertex/primitive assembly at the mesh stage, streamlining geometry workloads.[55][54]

Inputs to mesh shaders typically originate from draw calls that specify groups of meshlets, along with per-meshlet attributes, uniform buffers, and resources such as textures or buffers for procedural generation. These inputs are processed cooperatively within a workgroup, similar to compute shaders, enabling shared memory access and thread synchronization for efficient data handling. Outputs from a single mesh shader invocation include a variable number of vertices (up to 256) and primitives (up to 512 per meshlet), defined in one of three modes: points, lines, or triangles, which replaces the rigid topology of traditional pipelines and supports dynamic mesh topologies without additional memory writes.[55]

Common applications of mesh shaders include efficient level-of-detail (LOD) management, where task shaders can cull distant or occluded meshlets before mesh generation; procedural mesh creation for complex scenes like terrain or foliage; and building ray-tracing acceleration structures by generating custom geometry on-the-fly. By consolidating multiple shader stages into these programmable units, mesh shaders reduce pipeline bubbles—idle periods between stages—and improve GPU utilization, particularly for high-vertex-count models, leading to performance gains in scenarios with variable geometry complexity.[54][56]

Mesh shaders were first introduced in DirectX 12 Ultimate in March 2020, with initial hardware support on NVIDIA's Turing architecture (RTX 20-series and later), though broader adoption accelerated with the RTX 30-series and subsequent generations. In Vulkan, support arrived via the VK_EXT_mesh_shader extension in 2022, enabling cross-platform implementation on compatible hardware from NVIDIA, AMD (RDNA 2 and later), and Intel. This replacement of legacy stages has been adopted in modern engines for rasterization pipelines, offering up to 2x performance improvements in geometry-heavy workloads by minimizing draw call overhead and enabling better parallelism.[54][13][56]
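A minimal sketch using the GLSL mesh-shader extension (GL_EXT_mesh_shader), to the extent of the author's reading of that extension: it emits a single hard-coded triangle, which only illustrates the output-declaration and SetMeshOutputsEXT pattern, not a realistic meshlet pipeline.

```glsl
#version 450
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 1) in;                              // one thread per workgroup
layout(triangles, max_vertices = 3, max_primitives = 1) out;

void main() {
    // Declare how many vertices and primitives this workgroup emits.
    SetMeshOutputsEXT(3, 1);

    gl_MeshVerticesEXT[0].gl_Position = vec4(-0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesEXT[1].gl_Position = vec4( 0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesEXT[2].gl_Position = vec4( 0.0,  0.5, 0.0, 1.0);

    gl_PrimitiveTriangleIndicesEXT[0] = uvec3(0, 1, 2);   // index the triangle
}
```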
Ray-Tracing Shaders
Ray-tracing shaders represent a specialized class of programmable shaders designed to simulate light paths in real-time rendering by tracing rays through a scene. Introduced in major graphics APIs such as DirectX Raytracing (DXR) in 2018 and Vulkan Ray Tracing extensions in 2020, these shaders enable developers to implement physically accurate effects by querying ray intersections against scene geometry. Unlike traditional rasterization-based shading, ray-tracing shaders operate on ray queries, allowing for dynamic computation of light interactions without relying on fixed screen-space sampling. They are typically integrated into hybrid rendering pipelines that combine rasterization for primary visibility with ray tracing for secondary effects, often employing denoising techniques to achieve interactive frame rates on hardware-accelerated GPUs.[57][58]

The primary shader types in ray-tracing pipelines include ray-generation, miss, closest-hit, any-hit, and callable shaders, each serving distinct roles in processing ray queries. The ray-generation shader acts as the entry point, dispatched in a grid similar to compute shaders, where it defines ray origins and directions based on screen pixels or other sources and initiates tracing via API calls like TraceRayInline or TraceRay. Miss shaders execute when a ray does not intersect any geometry, commonly used to sample environment maps or compute background contributions for effects like sky lighting. Closest-hit shaders run upon detecting the nearest intersection, retrieving hit attributes such as barycentric coordinates, world position, surface normal, and material properties to perform shading calculations, such as diffuse or specular responses. Any-hit shaders handle potential intersections for non-opaque surfaces, evaluating transparency or alpha testing to accept or reject hits, often in scenarios involving blended materials. Callable shaders provide a mechanism for indirect invocation from other shaders via the CallShader intrinsic, enabling modular reuse of code for complex procedural evaluations without full ray tracing.[59][60][61]

These shaders receive inputs including ray origins and directions, acceleration structures for efficient traversal—such as bottom-level structures (BLAS) for individual meshes and top-level structures (TLAS) for instanced scenes using bounding volume hierarchies (BVH)—and scene data like textures or materials bound via descriptor heaps. Outputs consist of hit attributes passed back through the shader pipeline and shading results written to a ray-tracing output buffer, which may include color payloads or visibility flags for further processing. Common applications encompass global illumination to simulate indirect lighting bounces, realistic reflections on glossy surfaces, and soft shadows by tracing occlusion rays, frequently in hybrid setups where rasterized geometry provides base lighting and ray tracing enhances details like caustics or ambient occlusion. Performance optimizations, such as denoising passes on noisy ray-traced samples, are essential for real-time viability in these uses.[60][62][63]

Execution of ray-tracing shaders occurs through API dispatch commands, such as DispatchRays in DXR or traceRaysKHR in Vulkan, which launch the ray-generation shader grid and traverse the acceleration structure hardware-accelerated by dedicated RT cores on NVIDIA RTX GPUs introduced in 2018.
Intersection tests are offloaded to these cores for bounding volume and triangle checks, while shading remains on general-purpose streaming multiprocessors. Recursion for multiple bounces is managed via a hardware stack, limiting depth to prevent excessive computation, with payloads propagated between shader invocations to accumulate lighting contributions across the ray path. This model supports scalable parallelism, where thousands of rays are processed concurrently to render complex scenes at interactive rates.[60][61][58]
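A hedged sketch of a ray-generation shader in GLSL using the GL_EXT_ray_tracing extension (as used with Vulkan); the binding numbers, the simple camera setup, and the payload layout are illustrative assumptions, and the miss and closest-hit shaders that fill the payload are omitted.

```glsl
#version 460
#extension GL_EXT_ray_tracing : require

layout(binding = 0, set = 0) uniform accelerationStructureEXT topLevelAS;
layout(binding = 1, set = 0, rgba8) uniform image2D outputImage;

layout(location = 0) rayPayloadEXT vec3 hitColor;   // filled by hit/miss shaders

void main() {
    // Normalized pixel coordinate for this launch thread.
    const vec2 pixelCenter = vec2(gl_LaunchIDEXT.xy) + vec2(0.5);
    const vec2 uv = pixelCenter / vec2(gl_LaunchSizeEXT.xy);

    // Toy orthographic camera looking down -Z (placeholder for a real camera).
    vec3 origin    = vec3(uv * 2.0 - 1.0, 1.0);
    vec3 direction = vec3(0.0, 0.0, -1.0);

    hitColor = vec3(0.0);
    traceRayEXT(topLevelAS,
                gl_RayFlagsOpaqueEXT,   // ray flags
                0xFF,                   // cull mask
                0, 0, 0,                // SBT record offset, stride, miss index
                origin, 0.001, direction, 100.0,
                0);                     // payload location

    imageStore(outputImage, ivec2(gl_LaunchIDEXT.xy), vec4(hitColor, 1.0));
}
```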
Compute Shaders
Core Functionality
Compute shaders enable general-purpose computing on graphics processing units (GPUs) by decoupling from the traditional rendering pipeline, allowing developers to perform arbitrary parallel computations. Unlike graphics shaders tied to fixed pipeline stages, compute shaders operate as standalone kernels dispatched across a grid of threads organized into workgroups, supporting one-, two-, or three-dimensional layouts for scalable parallelism. This flexibility permits execution on thousands of GPU threads simultaneously, leveraging the massive parallelism inherent in modern GPUs without the constraints of vertex processing or fragment rasterization. Compute shaders were first introduced in DirectX 11 in 2009 by Microsoft, expanding GPU capabilities beyond graphics to general-purpose tasks.[17][64]

In terms of execution, a compute shader is invoked via API calls such as glDispatchCompute in OpenGL or Dispatch in DirectX, specifying the number of workgroups along each dimension of the grid. Each workgroup consists of multiple threads that execute the shader code cooperatively, enabling efficient handling of data-parallel workloads. The shader code, written in languages like GLSL or HLSL, defines the computation logic, with built-in variables like gl_GlobalInvocationID providing unique identifiers for each thread to access input data and determine output positions. This model supports highly scalable operations, where the GPU scheduler distributes threads across available cores, achieving performance gains for tasks involving large datasets.[65][17]
Compute shaders access inputs through shader storage buffer objects (SSBOs), image textures, and uniform buffers, which provide read/write access to large data structures on the GPU. Within a workgroup, threads can share data via declared shared memory variables, facilitating intra-group communication and reducing global memory traffic. Outputs are written back to SSBOs or images, allowing results to be used in subsequent computations or transferred to the CPU. For synchronization, functions like memoryBarrierShared() in GLSL ensure that shared memory writes are visible to other threads before proceeding, preventing race conditions in cooperative algorithms. These mechanisms enable atomic operations and barriers to coordinate thread execution within workgroups.[65][66]
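A hedged GLSL compute-shader sketch of the particle-update pattern, using SSBOs and gl_GlobalInvocationID as described above; the buffer bindings, workgroup size, and dt uniform are illustrative assumptions.

```glsl
#version 430
layout(local_size_x = 256) in;   // 256 threads per workgroup

// Positions and velocities live in shader storage buffers (SSBOs).
layout(std430, binding = 0) buffer Positions  { vec4 pos[]; };
layout(std430, binding = 1) buffer Velocities { vec4 vel[]; };

uniform float dt;   // timestep, set by the application

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(pos.length())) return;      // guard against overdispatch
    pos[i].xyz += vel[i].xyz * dt;            // simple Euler integration per particle
}
```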
Common applications of compute shaders in general-purpose GPU (GPGPU) computing include particle simulations, where threads update positions and velocities for thousands of particles in parallel; physics computations such as N-body simulations modeling gravitational interactions via the force equation
\[ F = \frac{G m_1 m_2}{r^2} \]

where \( G \) is the gravitational constant, \( m_1 \) and \( m_2 \) are the masses, and \( r \) is the distance between the bodies; image processing tasks like convolutions for filters such as blurring or edge detection; and fast Fourier transforms (FFTs) for signal analysis. These uses exploit the GPU's parallel architecture to accelerate simulations that would be computationally intensive on CPUs, often achieving real-time performance for complex datasets.[67][68][69]
Tensor and Specialized Shaders
Tensor and specialized shaders represent an evolution of compute shaders tailored for accelerating tensor operations, particularly in machine learning workloads. These shaders execute optimized kernels that perform matrix and tensor mathematics, exploiting specialized hardware units such as tensor cores to achieve high throughput via single instruction, multiple data (SIMD) processing. Introduced with NVIDIA's Volta architecture in 2017, Tensor Cores enable mixed-precision computations that significantly boost performance for deep learning tasks compared to general-purpose CUDA cores. Similarly, AMD's CDNA architecture incorporates Matrix Cores to deliver comparable acceleration for AI and high-performance computing applications.[70][71]

The primary inputs to these shaders consist of tensor buffers in low-precision formats like FP16 or INT8, along with neural network weights and biases, which are loaded into GPU memory for efficient access. These formats reduce memory bandwidth demands while maintaining sufficient numerical accuracy for many AI models. Outputs are typically transformed tensors, such as feature activations following convolutional or fully connected layers, which can be passed to subsequent shader invocations or used in broader pipeline stages. This design facilitates seamless integration into end-to-end machine learning workflows, where data flows through multiple tensor operations without host intervention.[72][73]

In practice, tensor and specialized shaders are commonly deployed for machine learning inference and training, with a focus on general matrix multiply (GEMM) operations of the form \( C = A B \) for matrices \( A \) and \( B \). NVIDIA Tensor Cores, for instance, execute these as fused multiply-accumulate instructions on 4x4 FP16 matrices, delivering up to 125 TFLOPS of throughput on Volta-based GPUs like the Tesla V100. AMD Matrix Cores support analogous operations through matrix fused multiply-add (MFMA) instructions, optimized for wavefront-level parallelism in CDNA GPUs such as the Instinct MI series, enabling scalable performance for large-scale AI training. These hardware accelerations are pivotal for reducing training times in models like transformers, where GEMM dominates computational cost.[72][71][73]

Execution occurs by dispatching these shaders as compute kernels, incorporating tensor-specific intrinsics to directly target the underlying hardware. In NVIDIA's CUDA ecosystem, the Warp Matrix Multiply-Accumulate (WMMA) API provides programmatic access to Tensor Cores within compute shaders, allowing developers to fragment larger matrices into warp-synchronous operations. For GPU-agnostic environments, APIs like Vulkan expose cooperative matrix extensions (e.g., VK_KHR_cooperative_matrix) that enable tensor intrinsics in SPIR-V shaders, supporting cross-vendor hardware without low-level vendor specifics. Microsoft's DirectML further abstracts this by compiling high-level ML operators into DirectX 12 compute shaders, leveraging tensor cores on compatible GPUs for operator execution. Integration with frameworks such as TensorFlow and PyTorch occurs through backends like cuDNN (for NVIDIA) or ROCm (for AMD), which automatically dispatch these optimized shaders during graph execution, often with automatic mixed-precision to invoke tensor hardware transparently.[74][75][76]
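For illustration, the GEMM operation itself can be written as a plain GLSL compute shader. This naive sketch does not use tensor-core intrinsics or cooperative-matrix extensions; it only shows the computation those units accelerate, and the matrix dimensions, layout, and bindings are assumptions.

```glsl
#version 430
layout(local_size_x = 16, local_size_y = 16) in;

// Row-major matrices stored in SSBOs.
layout(std430, binding = 0) readonly  buffer MatA { float a[]; };
layout(std430, binding = 1) readonly  buffer MatB { float b[]; };
layout(std430, binding = 2) writeonly buffer MatC { float c[]; };

uniform uint M, N, K;   // A is MxK, B is KxN, C is MxN

void main() {
    uint row = gl_GlobalInvocationID.y;
    uint col = gl_GlobalInvocationID.x;
    if (row >= M || col >= N) return;

    float acc = 0.0;
    for (uint k = 0u; k < K; ++k)
        acc += a[row * K + k] * b[k * N + col];   // dot product of row of A with column of B
    c[row * N + col] = acc;
}
```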
Programming Shaders
Languages and Syntax
Shader programming languages are high-level, C-like constructs designed for GPU execution, enabling developers to write code for graphics and compute pipelines. These languages share foundational syntax elements but differ in API integration, type systems, and extensions tailored to specific hardware ecosystems. Major languages include GLSL for OpenGL and Vulkan, HLSL for DirectX, and MSL for Metal, with emerging standards like WGSL for WebGPU addressing cross-platform needs.[10][11][77][78]

The OpenGL Shading Language (GLSL) is a C-like language used with the OpenGL and Vulkan APIs, featuring versioned specifications up to 4.60 for desktop and 3.20 for embedded systems. It includes built-in types such as vec4 for 4-component vectors and mat4 for 4x4 matrices, facilitating vectorized operations essential for graphics transformations. GLSL supports extensions like GL_ARB_compute_shader for compute shaders since version 4.30 and GLSL_EXT_ray_tracing for ray-tracing capabilities with Vulkan extensions, allowing shaders to interface with advanced rendering pipelines.[10]
High-Level Shading Language (HLSL), developed by Microsoft for DirectX, adopts a syntax similar to C++ and is used to program shaders across the Direct3D pipeline. It incorporates an older effect framework via .fx files for encapsulating multiple techniques and passes, though modern usage favors standalone shader objects. HLSL provides intrinsics for DirectX Raytracing (DXR), such as RayTracingAccelerationStructure, enabling ray-tracing shaders with functions like TraceRay. Shaders written in HLSL can be cross-compiled to SPIR-V intermediate representation using the DirectX Shader Compiler (DXC) for Vulkan compatibility.[11][79]
Metal Shading Language (MSL), Apple's shading language for the Metal API, is based on a subset of C++11 and integrates seamlessly with Swift and Objective-C environments on iOS, macOS, and visionOS. It emphasizes strong static typing for type safety and performance, with features like constexpr for compile-time evaluation and automatic SIMD vectorization. MSL shaders declare inputs and outputs using attributes such as [[stage_in]] for vertex inputs and [[color(0)]] for fragment outputs, ensuring explicit resource binding.[77][77][77]
Across these languages, common syntax elements promote portability and efficiency on parallel GPU architectures. Inputs and outputs are declared with qualifiers like in and out in GLSL, or input and output semantics in HLSL, defining data flow between shader stages. Vector types, such as float3 in HLSL or vec3 in GLSL, support swizzling (e.g., pos.xyz) and component-wise operations for spatial computations. Control flow structures include if, for, and while statements, but implementations warn against branch divergence in SIMD execution to avoid performance penalties on GPU warps. Precision qualifiers, particularly in GLSL for OpenGL ES (e.g., highp, mediump, lowp), allow optimization for mobile hardware by specifying floating-point accuracy.[80][10][80][10]
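A short GLSL ES fragment-shader sketch pulling these elements together: in/out qualifiers, vector swizzling, and an ES precision qualifier. The variable names are arbitrary.

```glsl
#version 300 es
precision mediump float;            // ES precision qualifier for floats

in vec3 vNormal;                    // interpolated input from the vertex stage
in vec3 vLightDir;
out vec4 fragColor;                 // output written to the framebuffer

void main() {
    vec3 n = normalize(vNormal);
    vec3 l = normalize(vLightDir);
    float ndotl = max(dot(n, l), 0.0);
    vec3 base = vec3(0.8, 0.5, 0.2);
    fragColor = vec4(base.rgb * ndotl, 1.0);   // .rgb swizzle selects three components
}
```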
The Cg (C for Graphics) language, developed by NVIDIA and first released in 2002, was an early high-level shading language modeled on ANSI C with extensions for graphics. It supported profiles for various APIs but was deprecated in 2012, with NVIDIA recommending migration to GLSL or HLSL for ongoing development.[81]
An emerging language is WGSL (WebGPU Shading Language), first published in 2021 as part of the WebGPU standard by the W3C and now at Candidate Recommendation status as of July 2024, designed for secure, portable shader execution in web browsers. WGSL features Rust-inspired syntax with explicit types, structured bindings (e.g., @group(0) @binding(0) var<uniform> u : Uniform;), and no preprocessor directives, prioritizing safety and validation over legacy C-style flexibility.[78][78]