High-Level Shader Language
The High-Level Shader Language[1] or High-Level Shading Language[2] (HLSL) is a proprietary shading language developed by Microsoft for the Direct3D 9 API to augment the shader assembly language, and went on to become the required shading language for the unified shader model of Direct3D 10 and higher.
HLSL is analogous to the GLSL shading language used with the OpenGL standard. It is very similar to the Nvidia Cg shading language, as it was developed alongside it. Early versions of the two languages were considered identical, only marketed differently.[3] HLSL shaders can enable profound speed and detail increases as well as many special effects in both 2D and 3D computer graphics.[citation needed]
HLSL programs come in six forms: pixel shaders (fragment in GLSL), vertex shaders, geometry shaders, compute shaders, tessellation shaders (Hull and Domain shaders), and ray tracing shaders (Ray Generation Shaders, Intersection Shaders, Any Hit/Closest Hit/Miss Shaders). A vertex shader is executed for each vertex that is submitted by the application, and is primarily responsible for transforming the vertex from object space to view space, generating texture coordinates, and calculating lighting coefficients such as the vertex's normal, tangent, and bitangent vectors. When a group of vertices (normally 3, to form a triangle) come through the vertex shader, their output position is interpolated to form pixels within its area; this process is known as rasterization.
Optionally, an application using a Direct3D 10/11/12 interface and Direct3D 10/11/12 hardware may also specify a geometry shader. This shader takes as its input some vertices of a primitive (triangle/line/point) and uses this data to generate/degenerate (or tessellate) additional primitives or to change the type of primitives, which are each then sent to the rasterizer.
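The vertex-shader stage described above can be sketched as a minimal Direct3D 10+ style HLSL program; names such as WorldViewProj, VSInput, and VSOutput are illustrative assumptions, not fixed API identifiers:

```hlsl
// Constant buffer supplied by the application (illustrative layout).
cbuffer PerObject : register(b0)
{
    float4x4 WorldViewProj; // combined object-to-clip-space transform
};

struct VSInput
{
    float3 position : POSITION;  // object-space position from the vertex buffer
    float2 uv       : TEXCOORD0; // texture coordinates
};

struct VSOutput
{
    float4 position : SV_POSITION; // clip-space position for the rasterizer
    float2 uv       : TEXCOORD0;   // interpolated across the triangle
};

VSOutput VSMain(VSInput input)
{
    VSOutput output;
    // Column-vector convention; with row-vector matrices the
    // arguments to mul() would be swapped.
    output.position = mul(WorldViewProj, float4(input.position, 1.0f));
    output.uv = input.uv;
    return output;
}
```

The rasterizer then interpolates VSOutput across each triangle, invoking the pixel shader once per covered pixel.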
D3D11.3 and D3D12 introduced Shader Model 5.1[4] and later 6.0.[5]
Shader model comparison
GPUs listed are the hardware that first supported the given specifications. Manufacturers generally support all lower shader models through drivers. Note that games may claim to require a certain DirectX version, but don't necessarily require a GPU conforming to the full specification of that version, as developers can use a higher DirectX API version to target lower-specification Direct3D hardware; for instance, DirectX 9 exposes features of DirectX 7-level hardware that DirectX 7 itself did not, targeting its fixed-function T&L pipeline.
Pixel shader comparison
| Pixel shader version | 1.0 | 1.1 | 1.2, 1.3[6] | 1.4[6] | 2.0[6][7] | 2.0a[6][7][8] | 2.0b[6][7][9] | 3.0[6][10] | 4.0[11], 4.1[12], 5.0[13] |
|---|---|---|---|---|---|---|---|---|---|
| Dependent texture limit | 4 | 4 | 4 | 6 | 8 | Unlimited | 8 | Unlimited | Unlimited |
| Texture instruction limit | 4 | 4 | 4 | 6 * 2 | 32 | Unlimited | Unlimited | Unlimited | Unlimited |
| Arithmetic instruction limit | 8 | 8 | 8 | 8 * 2 | 64 | Unlimited | Unlimited | Unlimited | Unlimited |
| Position register | No | No | No | No | No | No | No | Yes | Yes |
| Instruction slots | 8 | 8 + 4 | 8 + 4 | (8 + 6) * 2 | 64 + 32 | 512 | 512 | ≥ 512 | ≥ 65536 |
| Executed instructions | 8 | 8 + 4 | 8 + 4 | (8 + 6) * 2 | 64 + 32 | 512 | 512 | 65536 | Unlimited |
| Texture indirections | 4 | 4 | 4 | 4 | 4 | Unlimited | 4 | Unlimited | Unlimited |
| Interpolated registers | 2 + 4 | 2 + 4 | 2 + 4 | 2 + 6 | 2 + 8 | 2 + 8 | 2 + 8 | 10 | 32 |
| Instruction predication | No | No | No | No | No | Yes | No | Yes | No |
| Index input registers | No | No | No | No | No | No | No | Yes | Yes |
| Temp registers | 2 | 2 + 4 | 3 + 4 | 6 | 12 to 32 | 22 | 32 | 32 | 4096 |
| Constant registers | 8 | 8 | 8 | 8 | 32 | 32 | 32 | 224 | 16×4096 |
| Arbitrary swizzling | No | No | No | No | No | Yes | No | Yes | Yes |
| Gradient instructions | No | No | No | No | No | Yes | No | Yes | Yes |
| Loop count register | No | No | No | No | No | No | No | Yes | Yes |
| Face register (2-sided lighting) | No | No | No | No | No | No | Yes | Yes | Yes |
| Dynamic flow control | No | No | No | No | No | No | No | Yes (24) | Yes (64) |
| Bitwise Operators | No | No | No | No | No | No | No | No | Yes |
| Native Integers | No | No | No | No | No | No | No | No | Yes |
- PS 1.0 — Unreleased 3dfx Rampage, DirectX 8
- PS 1.1 — GeForce 3, DirectX 8
- PS 1.2 — 3Dlabs Wildcat VP, DirectX 8.1
- PS 1.3 — GeForce 4 Ti, DirectX 8.1
- PS 1.4 — Radeon 8500–9250, Matrox Parhelia, DirectX 8.1
- Shader Model 2.0 — Radeon 9500–9800/X300–X600, DirectX 9
- Shader Model 2.0a — GeForce FX/PCX-optimized model, DirectX 9.0a
- Shader Model 2.0b — Radeon X700–X850 shader model, DirectX 9.0b
- Shader Model 3.0 — Radeon X1000 and GeForce 6, DirectX 9.0c
- Shader Model 4.0 — Radeon HD 2000 and GeForce 8, DirectX 10
- Shader Model 4.1 — Radeon HD 3000 and GeForce 200, DirectX 10.1
- Shader Model 5.0 — Radeon HD 5000 and GeForce 400, DirectX 11
- Shader Model 5.1 — GCN 1+, Fermi+, DirectX 12 (11_0+) with WDDM 2.0
- Shader Model 6.0 — GCN 1+, Kepler+, DirectX 12 (11_0+) with WDDM 2.1
- Shader Model 6.1 — GCN 1+, Kepler+, DirectX 12 (11_0+) with WDDM 2.3
- Shader Model 6.2 — GCN 1+, Kepler+, DirectX 12 (11_0+) with WDDM 2.4
- Shader Model 6.3 — GCN 1+, Kepler+, DirectX 12 (11_0+) with WDDM 2.5
- Shader Model 6.4 — GCN 1+, Kepler+, Skylake+, DirectX 12 (11_0+) with WDDM 2.6
- Shader Model 6.5 — GCN 1+, Kepler+, Skylake+, DirectX 12 (11_0+) with WDDM 2.7
- Shader Model 6.6 — GCN 4+, Maxwell+, DirectX 12 (11_0+) with WDDM 3.0
- Shader Model 6.7 — GCN 4+, Maxwell+, DirectX 12 (12_0+) with WDDM 3.1
- Shader Model 6.8 — RDNA 1+, Maxwell 2+, DirectX 12 (12_0+) with WDDM 3.1 / 3.2 with Agility SDK
"64 + 32" for executed instructions means "64 arithmetic instructions and 32 texture instructions."
Vertex shader comparison
| Vertex shader version | 1.0 | 1.1[14] | 2.0[7][14][8] | 2.0a[7][14][8] | 3.0[10][14] | 4.0[11], 4.1[12], 5.0[13] |
|---|---|---|---|---|---|---|
| # of instruction slots | 128 | 128 | 256 | 256 | ≥ 512 | ≥ 65536 |
| Max # of instructions executed | 128 | 128 | 1024 | 65536 | 65536 | Unlimited |
| Instruction predication | No | No | No | Yes | Yes | Yes |
| Temp registers | 12 | 12 | 12 | 16 | 32 | 4096 |
| # constant registers | ≥ 96 | ≥ 96 | ≥ 256 | 256 | ≥ 256 | 16×4096 |
| Address register | No | Yes | Yes | Yes | Yes | Yes |
| Static flow control | No | No | Yes | Yes | Yes | Yes |
| Dynamic flow control | No | No | No | Yes | Yes | Yes |
| Dynamic flow control depth | — | — | — | 24 | 24 | 64 |
| Vertex texture fetch | No | No | No | No | Yes | Yes |
| # of texture samplers | — | — | — | — | 4 | 128 |
| Geometry instancing support | No | No | No | No | Yes | Yes |
| Bitwise operators | No | No | No | No | No | Yes |
| Native integers | No | No | No | No | No | Yes |
Footnotes
- ^ "Writing HLSL Shaders in Direct3D 9". Microsoft Docs. Retrieved February 22, 2021.
- ^ "High-level shader language (HLSL)". Microsoft Docs. Retrieved February 22, 2021.
- ^ "Fusion Industries :: Cg and HLSL FAQ ::". August 24, 2012. Archived from the original on August 24, 2012.
- ^ "Shader Model 5.1 Objects". Microsoft Docs. Retrieved February 22, 2021.
- ^ "HLSL Shader Model 6.0". Microsoft Docs. Retrieved February 22, 2021.
- ^ a b c d e f "Pixel Shader Differences". Microsoft Docs. August 19, 2020. Retrieved February 22, 2021.
- ^ a b c d e Peeper, Craig; Mitchell, Jason L. (July 2003). "Introduction to the DirectX 9 High-Level Shader Language". Microsoft Docs. Retrieved February 22, 2021.
- ^ a b c Shimpi, Anand Lal. "NVIDIA Introduces GeForce FX (NV30)". AnandTech. Archived from the original on June 10, 2013. Retrieved February 22, 2021.
- ^ Wilson, Derek. "ATI Radeon X800 Pro and XT Platinum Edition: R420 Arrives". AnandTech. Archived from the original on September 28, 2012. Retrieved February 22, 2021.
- ^ a b Shader Model 3.0, Ashu Rege, NVIDIA Developer Technology Group, 2004.
- ^ a b The Direct3D 10 System, David Blythe, Microsoft Corporation, 2006.
- ^ a b "Registers - ps_4_1". Microsoft Docs. August 23, 2019. Retrieved February 22, 2021.
- ^ a b "Registers - ps_5_0". Microsoft Docs. August 23, 2019. Retrieved February 22, 2021.
- ^ a b c d "Vertex Shader Differences". Microsoft Docs. August 19, 2020. Retrieved February 22, 2021.
External links
- Programming guide for HLSL at Microsoft Docs
- Introduction to the DirectX 9 High Level Shading Language, (ATI) AMD developer central
- Riemer's HLSL Introduction & Tutorial (includes sample code) Archived November 19, 2008, at the Wayback Machine
- HLSL Introduction
- DirectX Intermediate Language (DXIL) specification
HLSL provides graphics-oriented data types (such as float4 for RGBA colors) and built-in libraries for mathematical and graphics operations, which streamline development while optimizing for GPU execution models like SIMT (Single Instruction, Multiple Threads).[2] Beyond core DirectX usage, HLSL has been adapted for cross-platform scenarios, such as compilation to SPIR-V for Vulkan via DXC, and integration into engines like Unity for Windows targets.[4][5] Its emphasis on readability and performance has positioned it as a foundational tool in game development, real-time visualization, and general-purpose computing on GPUs (GPGPU), though alternatives like DirectML are recommended for certain machine learning tasks.[1]
Overview
Definition and Purpose
High-Level Shader Language (HLSL) is a programming language developed by Microsoft, serving as the primary tool for authoring shaders within the DirectX graphics ecosystem.[1][2] Modeled after C and C++, HLSL provides a high-level, simplified interface for GPU programming, allowing developers to express complex graphics algorithms without delving into hardware-specific details.[1][2] The core purpose of HLSL is to facilitate the creation of programmable shaders that handle rendering, compute tasks, and visual effects in 3D graphics applications.[3] It enables precise control over the graphics pipeline, from transforming vertices to coloring pixels and performing general-purpose computations on the GPU, thereby supporting advanced features like realistic lighting, procedural textures, and physics simulations.[1][2] In HLSL, shaders function as compact programs executed in parallel across the GPU's processing units, optimizing for high-throughput tasks in real-time rendering.[1] The language was introduced to replace the cumbersome low-level assembly code required for shader programming in prior DirectX iterations, streamlining development and improving portability across compatible hardware.[1][2]
Relation to DirectX API
The High-Level Shader Language (HLSL) is the shading language of Direct3D, the graphics API component of Microsoft's DirectX multimedia framework, and was introduced starting with DirectX 9 to enable programmable shaders in the 3D graphics pipeline.[1] Prior to this, Direct3D relied on fixed-function pipelines, but HLSL allowed developers to write custom code for key rendering stages, marking a shift toward greater flexibility in graphics programming. This integration positions HLSL as the standard language for authoring shaders within Direct3D applications across Windows platforms. Shaders authored in HLSL are compiled into bytecode or low-level assembly representations, which are then loaded and executed on the GPU through Direct3D device contexts and pipeline objects. Compilation typically occurs using the Direct3D shader compiler (via functions like D3DCompile or D3DCompileFromFile), producing optimized binary code tailored to specific shader models and hardware feature levels.[6][7] This process ensures compatibility with Direct3D's runtime, where shaders are bound to pipeline stages via API calls such as CreateVertexShader or SetPixelShader, facilitating efficient GPU execution without direct hardware access. HLSL supports programmability across multiple stages of the Direct3D graphics pipeline, including vertex processing for transforming geometry, pixel shading for per-fragment color and texture computations, and rasterization coordination between these stages. Subsequent Direct3D versions expanded this to include geometry shaders for primitive generation and manipulation (introduced in Direct3D 10) and compute shaders for general-purpose parallel processing (added in Direct3D 11).[3][8] These stages allow developers to intercept and customize the flow of data through the pipeline, from input vertices to final output pixels.
By providing a high-level, C-like syntax for these programmable elements, HLSL enables full control over the 3D rendering pipeline, supporting advanced visual effects such as dynamic lighting, shadow mapping, and procedural texture generation that would be impractical with fixed-function hardware.[1] This programmability has been foundational to modern real-time graphics in games and simulations, leveraging Direct3D's hardware abstraction to achieve high performance across diverse GPU architectures.
History and Development
Introduction in DirectX 9
The High-Level Shader Language (HLSL) was first publicly released on December 20, 2002, as part of the DirectX 9.0 SDK, marking a significant advancement in Microsoft's graphics programming ecosystem.[9] This launch followed an initial beta phase of DirectX 9 earlier in 2002, with Microsoft announcing broadened availability and highlighting HLSL's role in January 2003.[10] HLSL was further showcased at the Game Developers Conference (GDC) in March 2003, where demonstrations emphasized its potential for simplifying complex graphics development.[11] The primary motivation for introducing HLSL stemmed from the limitations of low-level shader assembly languages used in previous DirectX versions, such as DirectX 8, which required developers to write hardware-specific code that was error-prone and difficult to maintain.[10] By providing a higher-level abstraction, HLSL aimed to make GPU programming more accessible, allowing developers to focus on creative aspects of 3D graphics rather than intricate register management and instruction sets.[1] This shift was intended to broaden adoption among game developers and content creators, optimizing performance across diverse DirectX-compliant hardware without deep hardware knowledge.[10] At its debut, HLSL featured a C-like syntax designed to abstract away hardware-specific details, enabling straightforward compilation to assembly code for various GPU architectures.[1] It initially supported vertex and pixel shaders targeting Shader Models 1.1 through 2.0, with support for 3.0 added in the DirectX 9.0c update in 2004, allowing integration with DirectX 9's programmable pipeline for tasks like transforming vertices and computing pixel colors.[12][13] These capabilities were bundled with the D3DX library's compiler, facilitating seamless use within tools like Visual Studio.[10] HLSL's introduction revolutionized real-time graphics by empowering developers to implement sophisticated effects using high-level code, such as bump 
mapping for surface detailing and per-pixel lighting for realistic illumination.[14] This accessibility contributed to enhanced visual fidelity in games and applications, supporting over 2,500 titles that leveraged DirectX 9's features for immersive experiences.[14] By democratizing advanced shading techniques, HLSL laid foundational groundwork for subsequent innovations in programmable rendering.[1]
Evolution Across DirectX Versions
The evolution of HLSL began to accelerate with the release of DirectX 10 in 2006, which introduced Shader Model 4.0 and marked a significant unification of shader programming. This model adopted a common-shader core, allowing vertex, pixel, and the newly added geometry shaders to share a consistent instruction set and resource access model, thereby simplifying development across pipeline stages. Geometry shaders enabled procedural geometry generation on the GPU, while stream output functionality permitted direct writing of vertex data back to buffers without rasterization, enhancing flexibility for advanced rendering techniques. These advancements were tightly integrated with HLSL, requiring all Direct3D 10 shaders to be authored in the language targeting this model. DirectX 11, launched in 2009, further expanded HLSL capabilities through Shader Model 5.0, introducing tessellation shaders to support dynamic mesh refinement for improved surface detail in real-time applications. This version also enhanced resource binding mechanisms, including support for shader resource views (SRVs), unordered access views (UAVs), and constant buffer improvements, which streamlined data access and reduced state management overhead. Additionally, compute shaders were added, enabling general-purpose GPU (GPGPU) computing within the HLSL framework by allowing shaders to operate on unstructured data grids independent of the graphics pipeline, thus broadening HLSL's utility beyond traditional rendering. With DirectX 12's introduction in 2015, HLSL entered Shader Model 6.0 and subsequent iterations, reaching version 6.9 by 2025, to accommodate advanced GPU features and low-level API control. 
Key enhancements included wave intrinsics for SIMD-wide operations, optimizing intra-thread communication in pixel and compute shaders; mesh shaders for more efficient triangle assembly and culling; and raytracing extensions via the DirectX Raytracing (DXR) API, which integrated hardware-accelerated ray tracing into HLSL pipelines. In 2025, Shader Model 6.9 introduced cooperative vector operations, native support for long vectors up to length 1024, and shader execution reordering, improving performance for ray tracing and compute workloads on modern hardware.[15] Support for SPIR-V interchange was progressively added, culminating in its adoption as the standard intermediate format in 2024 to facilitate cross-API compatibility and portability across Vulkan and other ecosystems. Notable milestones in HLSL's development include the open-sourcing of the DirectX Shader Compiler (DXC) in 2017, which replaced the legacy FXC compiler and enabled community contributions for improved optimization and cross-platform support. In September 2024, Microsoft announced the full adoption of SPIR-V as DirectX's interchange format starting with Shader Model 7.0, replacing DXIL to promote broader interoperability with open standards and reduce vendor lock-in.[16] These evolutions have positioned HLSL to fully leverage modern GPU architectures, such as NVIDIA's RTX series with dedicated ray-tracing and tensor cores and AMD's RDNA architectures featuring enhanced compute units and raytracing accelerators, enabling high-fidelity real-time rendering and compute workloads on contemporary hardware.
Syntax and Features
Core Syntax Elements
High-Level Shader Language (HLSL) is modeled after the syntax of C and C++, incorporating familiar constructs such as variable declarations, function definitions, and expressions while adding graphics-specific features for shader programming. This design facilitates readability and portability for developers accustomed to C-family languages, with HLSL code structured around global functions, statements, and preprocessor directives that compile to GPU instructions via the DirectX runtime.[2] Shader entry points in HLSL are typically defined as functions named main, which serve as the starting point for execution in a specific shader stage, such as vertex or pixel processing. For instance, a basic vertex shader entry point might appear as float4 main(float4 pos : POSITION) : SV_POSITION, where input and output parameters are annotated with semantics to bind them to pipeline inputs and outputs. These semantic annotations, such as POSITION for vertex positions or TEXCOORD for texture coordinates, ensure proper data flow between shader stages and fixed-function pipeline components, a requirement introduced to replace explicit register assignments in earlier assembly-based shading.[17][2]
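A matching pixel-shader entry point illustrates the same semantic annotations on the output side (a sketch; the struct layout and the gradient output are arbitrary choices):

```hlsl
struct PSInput
{
    float4 position : SV_POSITION; // interpolated clip-space position
    float2 uv       : TEXCOORD0;   // interpolated texture coordinates
};

// SV_Target binds the return value to render target 0.
float4 PSMain(PSInput input) : SV_Target
{
    // Visualize the interpolated UVs as a red/green gradient.
    return float4(input.uv, 0.0f, 1.0f);
}
```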
In effects files (typically with .fx extension), HLSL code is organized into technique blocks that encapsulate rendering strategies, each containing one or more pass blocks to define sequential rendering operations with associated state settings like blending or depth testing. Techniques allow for multiple implementations tailored to hardware capabilities, while passes handle the granular application of shaders and states during rendering loops, though this framework is considered legacy in DirectX 11 and later, superseded by explicit state management.[18][1]
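The technique/pass organization can be sketched in legacy Direct3D 10 effects syntax; VSMain and PSMain are assumed to be defined elsewhere in the same .fx file:

```hlsl
// Legacy effects-framework syntax (.fx), deprecated in Direct3D 11+.
technique10 Render
{
    pass P0
    {
        // Each pass binds compiled shaders and (optionally) pipeline state.
        SetVertexShader(CompileShader(vs_4_0, VSMain()));
        SetGeometryShader(NULL);
        SetPixelShader(CompileShader(ps_4_0, PSMain()));
    }
}
```

In modern Direct3D the same binding is done explicitly through API calls rather than declaratively in the shader file.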
HLSL employs #pragma directives as preprocessor instructions to influence compilation behavior, such as #pragma pack_matrix(row_major) to specify matrix storage layout or directives for optimizing code generation. These pragmas provide hints to the compiler without altering core semantics, and unrecognized ones issue warnings but do not halt compilation; compilation targets, like shader models, are primarily set via command-line flags during the build process rather than pragmas.[19][20][21]
Control flow in HLSL mirrors C standards, supporting conditional statements like if-else for runtime branching and loops such as for, while, and do-while for iteration, enabling dynamic execution paths based on scalar or vector conditions. However, early shader models impose restrictions: for example, Shader Model 2.0 lacks support for dynamic loops and branching in pixel shaders, limiting control to static, compile-time unrolling to ensure predictable performance on older hardware, with subsequent models progressively relaxing these constraints for more flexible code.[22][2][23]
Matrix operations in HLSL default to column-major ordering for uniform parameters, where matrices are stored with columns aligned in memory, but this can be overridden to row-major via pragmas or type modifiers to match application-side data layouts. The mul() intrinsic handles matrix-vector and matrix-matrix multiplications, treating the second argument as a column vector if it is a vector, or the first as a row vector. For standard column-vector transformations, use mul(matrix, vector), which contrasts with row-vector conventions in some other shading languages and requires careful argument ordering to avoid transposition errors.[24][20][2]
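The argument-order sensitivity of mul() can be made concrete with two small helpers; WorldViewProj is an illustrative name, and which form is correct depends on how the application packs its matrices:

```hlsl
cbuffer PerObject : register(b0)
{
    float4x4 WorldViewProj;
};

float4 TransformColumnVector(float3 p)
{
    // Second argument is a vector: treated as a column vector, result = M * v.
    return mul(WorldViewProj, float4(p, 1.0f));
}

float4 TransformRowVector(float3 p)
{
    // First argument is a vector: treated as a row vector, result = v * M.
    // Equivalent to the column-vector form applied to the transposed matrix.
    return mul(float4(p, 1.0f), WorldViewProj);
}
```

Mixing up the two forms is a common porting bug, since it silently applies the transpose of the intended transform.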
In 2024, HLSL 202x was introduced, refining syntax with features like conforming literals for better C/C++ compatibility and deprecating the legacy effects framework syntax to encourage explicit state management.[25]
Data Types and Built-in Functions
HLSL supports a range of scalar data types for basic variable declarations, including float for 32-bit floating-point values, int for 32-bit signed integers, uint for 32-bit unsigned integers, bool for true/false values, and half for 16-bit floating-point values to enable precision control in performance-sensitive scenarios.[26][27] The half type, while offering reduced precision, is mapped to float in Direct3D 10 shader targets for compatibility, though it supports lower-precision operations on capable hardware.[26]
Vector and matrix types build on scalars to facilitate graphics computations, with vectors like float3 or float4 holding 1 to 4 components of the same scalar type, and matrices like float4x4 arranging up to 16 components in a 1x1 to 4x4 grid.[28][29] These can be constructed using initializer lists, such as float3 pos = float3(1.0f, 2.0f, 3.0f); or float4x4 mat = {1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 1.0f};.[28][29] Vectors support swizzling for component access and rearrangement, as in pos.xyz to select the first three components or pos.zyx to reorder them.[28][27]
Textures and samplers handle image data access, with modern HLSL using Texture2D objects (and variants like Texture3D or TextureCube) declared as Texture2D tex : register(t0); to represent 2D image resources returning up to four components.[30] These pair with SamplerState objects, such as SamplerState samp : register(s0);, which define filtering and addressing modes like linear interpolation or wrap clamping.[31] Sampling occurs via methods like tex.Sample(samp, uv);, which performs bilinear texture lookup at coordinates uv using the sampler's settings, available in pixel shaders from model ps_4_0 onward; legacy Direct3D 9 used sampler2D types with tex2D(sampler, uv).[32][31]
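The texture/sampler pairing described above fits in a minimal pixel shader (register assignments and names are illustrative):

```hlsl
Texture2D    diffuseTex : register(t0); // 2D image resource
SamplerState linearSamp : register(s0); // filtering and addressing state

float4 PSMain(float4 pos : SV_POSITION,
              float2 uv  : TEXCOORD0) : SV_Target
{
    // Filtered lookup at uv, using the sampler's filter and wrap modes.
    return diffuseTex.Sample(linearSamp, uv);
}
```

Because the texture and sampler are separate objects, one SamplerState can be reused across many textures, unlike GLSL's combined sampler types.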
HLSL provides intrinsic functions for common operations, categorized into mathematical, texture sampling, and graphics-specific utilities. Mathematical intrinsics include sin(x) and cos(x) for trigonometric computations on scalars or vectors, dot(a, b) for the dot product of two vectors returning a scalar, and cross(a, b) for the cross product of two float3 vectors yielding another float3.[33] Texture sampling intrinsics evolved from legacy functions like tex2D(s, t) in Direct3D 9, which samples a 2D texture at coordinates t using sampler s, to modern object methods such as Sample on Texture2D for filtered lookups.[33][32] Graphics-specific intrinsics encompass lerp(x, y, a) for linear interpolation between x and y by amount a (e.g., blending colors), and saturate(x) to clamp a value to the [0,1] range, useful for color normalization.[33][34][35]
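Several of these intrinsics combine naturally in a simple diffuse-lighting helper (a hedged sketch, not a library function; the 0.1 ambient factor is an arbitrary choice):

```hlsl
float4 ShadeLambert(float3 normal, float3 lightDir,
                    float4 baseColor, float4 lightColor)
{
    // saturate clamps the cosine term to [0, 1], discarding back-facing light.
    float nDotL = saturate(dot(normalize(normal), normalize(lightDir)));
    // lerp blends from an ambient-darkened color to the fully lit color.
    return lerp(0.1f * baseColor, baseColor * lightColor, nDotL);
}
```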
Precision control extends to attributes and advanced intrinsics, with the [unroll] attribute applied to loops like [unroll] for(int i = 0; i < N; i++) { ... } to hint the compiler for full or partial unrolling (optionally [unroll(n)] for n iterations), optimizing performance by eliminating loop overhead at the cost of code size.[2] In Shader Model 6.0 and later, wave intrinsics enable SIMD operations across lanes in a wave (a group of threads, typically 32 or 64), including WaveActiveSum(x) to sum x across active lanes and broadcast the result, WaveReadLaneAt(value, index) to access a value from a specific lane, and WaveActiveAllTrue(cond) to check if a condition holds in all active lanes, facilitating efficient parallel reductions and synchronization without barriers.[36][2] These support types like half, float, int, and vectors thereof on wave-aware hardware.[36]
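The wave intrinsics above enable reductions without group-shared memory or barriers; a sketch of a per-wave partial sum (requires Shader Model 6.0; buffer names and the thread-group size are illustrative):

```hlsl
StructuredBuffer<float>   values      : register(t0);
RWStructuredBuffer<float> partialSums : register(u0);

[numthreads(64, 1, 1)]
void CSMain(uint3 tid : SV_DispatchThreadID)
{
    float v = values[tid.x];

    // Sum v across all active lanes of this wave; every lane
    // receives the same result, with no barrier required.
    float waveTotal = WaveActiveSum(v);

    // One lane per wave writes the wave's partial sum.
    uint waveIndex = tid.x / WaveGetLaneCount();
    if (WaveIsFirstLane())
        partialSums[waveIndex] = waveTotal;
}
```

A second dispatch (or a group-shared reduction) would then combine the per-wave partial sums into a final total.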
Shader Types and Models
Programmable Shader Stages
The programmable shader stages in HLSL form the core of the Direct3D graphics pipeline, allowing developers to customize vertex processing, primitive generation, tessellation, pixel shading, and general-purpose computation on the GPU. These stages execute HLSL code compiled into bytecode, with inputs and outputs bound via semantics to ensure seamless data flow between pipeline components. Each stage operates on specific data types, such as vertices or pixels, and contributes to rendering or compute tasks by transforming or generating graphical elements. Vertex shaders process individual vertices from input assemblies, performing transformations like world-space positioning, skinning, morphing, and per-vertex lighting to prepare data for subsequent pipeline stages. They receive vertex attributes (e.g., position, normals) from vertex buffers and output transformed attributes, including at minimum a clip-space position, which is passed to the rasterizer for primitive assembly. Conventionally, the entry point for a vertex shader is named VSMain, with a signature that uses input/output semantics like POSITION for coordinates.[37] Pixel shaders, also known as fragment shaders, operate on interpolated data from the rasterizer to compute per-pixel properties such as color, incorporating techniques like lighting, texturing, and post-processing effects. They take inputs including varying attributes from vertices (e.g., texture coordinates, colors) and system values like position or primitive ID, producing outputs such as render target colors and optional depth values, while supporting pixel discard for alpha testing. The typical entry point is PSMain, featuring semantics like COLOR for outputs and TEXCOORD for texture sampling.[38] Geometry shaders process entire primitives—such as triangles, lines, or points—after the vertex stage, enabling the generation or modification of geometry for applications like particle systems, shadow volumes, or fur rendering. 
They consume assembled primitives along with adjacent vertex data and can emit multiple vertices into streams (e.g., triangle strips or point lists), with outputs directed to the rasterizer or stream output buffers; the maximum output vertices must be declared statically. Entry points are often named GSMain, utilizing input assembler semantics like SV_PrimitiveID and output stream objects for emission.[39] Tessellation in HLSL involves two cooperative stages: hull shaders and domain shaders, which enable adaptive subdivision of low-order surfaces into detailed geometry for smoother rendering without high-polygon models. Hull shaders receive control points defining patches (up to 32 points) and operate in two phases—a control-point phase that outputs transformed patch points and a patch-constant phase that computes tessellation factors and constants for edge and interior subdivision—culling invalid patches based on factors. Domain shaders then evaluate positions for tessellated points using these factors, control points, and UV coordinates from the fixed-function tessellator, applying displacement or shading. Hull entry points are typically HSMain with phases separated by [patchconstantfunc] attributes, while domain shaders use DSMain signatures incorporating SV_DomainLocation semantics.[40][41] Compute shaders provide a flexible, general-purpose computing model on the GPU, decoupled from the rendering pipeline, for tasks like physics simulations, image processing, or data-parallel algorithms. They dispatch threads in groups over structured buffers or textures, supporting shared memory, synchronization, and atomic operations for efficient parallelism, with inputs from resource views and outputs written back to buffers. The entry point is conventionally CSMain, invoked via dispatch calls specifying thread group dimensions.[42]
Shader Model Progression and Capabilities
The progression of HLSL shader models reflects the increasing complexity and flexibility of GPU programmable pipelines, starting from basic emulation of fixed-function hardware in early DirectX versions to sophisticated compute and rendering paradigms in modern ones. Shader Model 1.1 (SM 1.1), tied to DirectX 8 and supported in HLSL from DirectX 9 for compatibility, introduced rudimentary vertex and pixel shaders capable of emulating fixed-function transformations but with severe restrictions, such as no texture sampling in vertex shaders and limited arithmetic operations.[43] Shader Model 2.0 (SM 2.0) and 3.0 (SM 3.0), both under DirectX 9, significantly expanded capabilities; SM 2.0 added support for multiple render targets and basic flow control, while SM 3.0 enhanced this with dynamic branching, loops, and vertex texture sampling, enabling more complex effects like procedural geometry.[44] Shader Model 4.0 (SM 4.0), introduced with DirectX 10, marked a unification of shader stages with a common instruction set, eliminating separate vertex and pixel models in favor of a single profile (e.g., vs_4_0, ps_4_0) and removing legacy limits on instructions and registers, though bounded by hardware.[45] This model added geometry shaders for primitive generation and required Windows Vista or later. 
Shader Model 5.0 (SM 5.0), aligned with DirectX 11, further extended the pipeline with hull and domain shaders for tessellation, compute shaders for general-purpose computing, and support for unstructured addressable resources like byte-address buffers.[46] In Shader Model 6.x series for DirectX 12, capabilities advanced to include wave intrinsics for SIMD operations in SM 6.0, ray tracing shaders in SM 6.3 (using lib_6_3 profile for ray generation, closest-hit, and miss shaders), mesh and amplification shaders in SM 6.5 for GPU-driven geometry processing, and additional optimizations like variable rate shading in SM 6.1+ via per-draw attributes.[36][47][48] SM 6.8 builds on these with work graphs for dynamic task distribution and extended matrix operations, enhancing scalability for complex scenes.[49] Key resource limits evolved dramatically across models, transitioning from rigid caps in early versions to near-unlimited allocations in later ones, constrained only by hardware. The following table summarizes representative limits for instruction slots, constant registers, and texture samplers, drawn from DirectX 9 capabilities (SM 2.0–3.0) and DirectX 11+ (SM 5.0+); actual values vary by device caps queried via D3DCAPS9 or ID3D11Device::CheckFeatureSupport.[50][51][52]
| Shader Model | Instruction Slots (VS/PS example) | Constant Registers (VS example) | Texture Samplers (PS example) |
|---|---|---|---|
| SM 2.0 | 256 / 96 | ≥256 | ≤16 |
| SM 3.0 | ≥512 / ≥512 | ≥256 | ≤16 |
| SM 5.0+ | ≥65,535 / ≥65,535 | 16 × 4,096 vectors | ≤128 |
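The wave intrinsics introduced in SM 6.0 (mentioned above) can be sketched in a compute shader; the buffer names and register bindings here are illustrative, but `WaveActiveSum`, `WaveIsFirstLane`, and `WaveGetLaneCount` are the actual SM 6.0 intrinsics:

```hlsl
// Sketch of an SM 6.0 wave intrinsic (compiled with a cs_6_0 profile).
// WaveActiveSum reduces a value across all active lanes of the wave
// without using group shared memory or barriers.
StructuredBuffer<float>   g_input       : register(t0); // illustrative binding
RWStructuredBuffer<float> g_partialSums : register(u0); // illustrative binding

[numthreads(64, 1, 1)]
void CSMain(uint3 tid : SV_DispatchThreadID)
{
    float v = g_input[tid.x];
    float waveTotal = WaveActiveSum(v);  // SIMD-wide reduction across the wave
    if (WaveIsFirstLane())               // one write per wave
        g_partialSums[tid.x / WaveGetLaneCount()] = waveTotal;
}
```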
Comparisons
With GLSL
High-Level Shader Language (HLSL) is the shading language developed by Microsoft specifically for use with DirectX and Direct3D APIs, whereas OpenGL Shading Language (GLSL) serves as the primary shading language for OpenGL and Vulkan graphics APIs, enabling cross-platform rendering on diverse hardware.[13] This fundamental API binding influences their design, with HLSL optimized for Windows-based DirectX ecosystems and GLSL emphasizing vendor-neutral standards maintained by the Khronos Group. Syntax differences between HLSL and GLSL are prominent in matrix handling and parameter passing. HLSL stores matrices in row-major order by default and uses the mul() intrinsic function for matrix-vector multiplication, such as float4 pos = mul(worldViewProj, float4(input.pos, 1.0));. In contrast, GLSL employs column-major storage and uses the * operator for multiplication, as in vec4 pos = worldViewProj * vec4(input.pos, 1.0);.[24] For input/output parameters, HLSL relies on semantics to bind variables to pipeline stages, e.g., float4 pos : SV_POSITION;, while GLSL uses qualifiers such as in and out together with built-in variables like gl_Position.[54][55] These conventions ensure type safety and pipeline connectivity but require careful mapping during porting.[56]
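The conventions above can be seen together in a minimal vertex shader; the cbuffer, struct, and entry-point names are illustrative:

```hlsl
// Minimal HLSL vertex shader showing semantics and mul()-based matrix math.
cbuffer PerObject : register(b0)   // illustrative binding
{
    float4x4 worldViewProj;        // row-major by default in HLSL
};

struct VSInput  { float3 pos : POSITION;    float2 uv : TEXCOORD0; };
struct VSOutput { float4 pos : SV_POSITION; float2 uv : TEXCOORD0; };

VSOutput VSMain(VSInput input)
{
    VSOutput o;
    // GLSL would write this as: gl_Position = worldViewProj * vec4(pos, 1.0);
    o.pos = mul(worldViewProj, float4(input.pos, 1.0));
    o.uv  = input.uv;
    return o;
}
```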
Both languages offer feature parity in core constructs like vector types and swizzling operations—e.g., HLSL's float4 pos; pos.xy mirrors GLSL's vec4 pos; pos.xy—facilitating similar algorithmic expressiveness for shading computations. However, HLSL includes a legacy effects framework for encapsulating shaders, techniques, and passes in .fx files, which has been deprecated in favor of modular .hlsl files since Direct3D 11.[13][21] GLSL, conversely, gained compute shaders via the ARB_compute_shader extension (core since OpenGL 4.3), allowing general-purpose GPU computing without equivalent legacy abstractions.[57] Built-in functions also diverge; for instance, legacy HLSL's tex2D(sampler, uv) for 2D texture sampling corresponds to GLSL's texture(sampler2D, uv), and since Shader Model 4.0 HLSL separates textures and samplers into distinct objects, unlike GLSL's combined sampler.[55]
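The separation of texture and sampler objects can be sketched as follows; the resource names and register bindings are illustrative:

```hlsl
// Modern (Shader Model 4.0+) HLSL keeps the texture and the sampler state as
// distinct objects, where legacy tex2D and GLSL's sampler2D combine them.
Texture2D    g_albedo  : register(t0); // illustrative binding
SamplerState g_sampler : register(s0); // illustrative binding

float4 PSMain(float4 pos : SV_POSITION, float2 uv : TEXCOORD0) : SV_Target
{
    // GLSL equivalent: texture(albedoSampler, uv) with a combined sampler2D.
    return g_albedo.Sample(g_sampler, uv);
}
```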
Porting shaders between GLSL and HLSL often involves automated tools and manual adjustments. Microsoft provides a GLSL-to-HLSL reference translator for Universal Windows Platform (UWP) applications, aiding conversion of OpenGL ES 2.0 shaders to Direct3D 11 equivalents by mapping variables, intrinsics, and qualifiers.[55] Key challenges include reconciling built-in differences, such as replacing GLSL's gl_FragCoord with HLSL's SV_POSITION in fragment shaders, and ensuring matrix transpose handling to maintain visual fidelity.[56]
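The gl_FragCoord mapping mentioned above can be sketched in a pixel shader; the checker-pattern body is a hypothetical example, but the SV_Position input semantic is the standard HLSL counterpart:

```hlsl
// Where a GLSL fragment shader reads the built-in gl_FragCoord, the HLSL
// pixel shader declares an input with the SV_Position semantic, which
// carries the same window-space pixel coordinates.
float4 PSMain(float4 fragCoord : SV_Position) : SV_Target
{
    // Illustrative use: an 8x8-pixel screen-space checker pattern.
    float checker = fmod(floor(fragCoord.x / 8) + floor(fragCoord.y / 8), 2);
    return float4(checker.xxx, 1.0);
}
```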
In terms of advantages, HLSL benefits from tight integration with DirectX, offering seamless compilation via the DirectX Shader Compiler (DXC) and native support for Windows-specific features like raytracing extensions. GLSL, however, excels in portability, compiling across OpenGL, Vulkan, and even WebGL environments without platform-specific recompilation, making it preferable for multi-API applications.[4]