| LLVM | |
|---|---|
| Original authors | Chris Lattner, Vikram Adve |
| Developer | LLVM Developer Group |
| Initial release | 2003 |
| Stable release | 21.1.4[2] |
| Repository | |
| Written in | C++ |
| Operating system | Cross-platform |
| Type | Compiler |
| License | Apache License 2.0 with LLVM Exceptions (v9.0.0 or later);[3] legacy license:[4] UIUC (BSD-style) |
| Website | www |
LLVM is a set of compiler and toolchain technologies[5] that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes.[6] The name LLVM originally stood for Low Level Virtual Machine. However, the project has since expanded, and the name is no longer an acronym but an orphan initialism.[7]
LLVM is written in C++ and is designed for compile-time, link-time, and runtime optimization. Originally implemented for C and C++, the language-agnostic design of LLVM has since spawned a wide variety of frontends: languages with compilers that use LLVM (or which do not directly use LLVM but can generate compiled programs as LLVM IR) include ActionScript, Ada, C# for .NET,[8][9][10] Common Lisp,[11] PicoLisp, Crystal, CUDA, D,[12] Delphi,[13] Dylan, Forth,[14] Fortran,[15] FreeBASIC, Free Pascal, Halide, Haskell, Idris,[16] Jai (only for optimized release builds), Java bytecode, Julia, Kotlin, LabVIEW's G language,[17][18] Objective-C, OpenCL,[19] PostgreSQL's SQL and PLpgSQL,[20] Ruby,[21] Rust,[22] Scala,[23][24] Standard ML,[25] Swift, Xojo, and Zig.
History
The LLVM project started in 2000 at the University of Illinois at Urbana–Champaign, under the direction of Vikram Adve and Chris Lattner. LLVM was originally developed as a research infrastructure to investigate dynamic compilation techniques for static and dynamic programming languages. LLVM was released under the University of Illinois/NCSA Open Source License,[3] a permissive free software license. In 2005, Apple Inc. hired Lattner and formed a team to work on the LLVM system for various uses within Apple's development systems.[26] LLVM has been an integral part of Apple's Xcode development tools for macOS and iOS since Xcode 4 in 2011.[27]
In 2006, Lattner started working on a new project named Clang. The combination of the Clang frontend and LLVM backend is named Clang/LLVM or simply Clang.
The name LLVM was originally an initialism for Low Level Virtual Machine. However, the LLVM project evolved into an umbrella project that has little relationship to what most current developers think of as a virtual machine. This made the initialism "confusing" and "inappropriate", and since 2011 LLVM has been "officially no longer an acronym",[28] but a brand that applies to the LLVM umbrella project.[29] The project encompasses the LLVM intermediate representation (IR), the LLVM debugger, the LLVM implementation of the C++ Standard Library (with full support of C++11 and C++14[30]), etc. LLVM is administered by the LLVM Foundation. Compiler engineer Tanya Lattner became its president in 2014[31] and was still in that post as of August 2024[update].[32]
"For designing and implementing LLVM", the Association for Computing Machinery presented Vikram Adve, Chris Lattner, and Evan Cheng with the 2012 ACM Software System Award.[33]
The project was originally available under the UIUC license. With the v9.0.0 release in 2019,[34] LLVM was relicensed to the Apache License 2.0 with LLVM Exceptions.[3] As of November 2022[update], about 400 contributions had not been relicensed.[35][36]
Features
LLVM can provide the middle layers of a complete compiler system, taking intermediate representation (IR) code from a compiler and emitting an optimized IR. This new IR can then be converted and linked into machine-dependent assembly language code for a target platform. LLVM can accept IR from the GNU Compiler Collection (GCC) toolchain, allowing it to be used with the wide array of extant compiler frontends written for that project. LLVM itself can be built with GCC version 7.5 or later.[37]
LLVM can also generate relocatable machine code at compile-time or link-time or even binary machine code at runtime.
LLVM supports a language-independent instruction set and type system.[6] Each instruction is in static single assignment form (SSA), meaning that each variable (called a typed register) is assigned once and then frozen. This helps simplify the analysis of dependencies among variables. LLVM allows code to be compiled statically, as it is under the traditional GCC system, or left for late-compiling from the IR to machine code via just-in-time compilation (JIT), similar to Java. The type system consists of basic types such as integer or floating-point numbers and five derived types: pointers, arrays, vectors, structures, and functions. A type construct in a concrete language can be represented by combining these basic types in LLVM. For example, a class in C++ can be represented by a mix of structures, functions and arrays of function pointers.
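The SSA discipline and the derived types described above can be sketched directly in LLVM IR. The following is an illustrative example only; the %struct.Point type and the function name are invented for this sketch, not taken from LLVM's documentation:

```llvm
; A C-family aggregate such as  struct Point { int x; int y; }
; maps onto an LLVM structure type, and a method becomes an ordinary
; function taking the object pointer ("this") as its first argument.
%struct.Point = type { i32, i32 }

define i32 @Point_sum(ptr %this) {
entry:
  %xp = getelementptr %struct.Point, ptr %this, i32 0, i32 0
  %x = load i32, ptr %xp        ; each %name is assigned exactly once (SSA)
  %yp = getelementptr %struct.Point, ptr %this, i32 0, i32 1
  %y = load i32, ptr %yp
  %sum = add i32 %x, %y
  ret i32 %sum
}
```

A virtual method table, as used for a C++ class, would be modeled the same way: an array of function pointers stored in a structure field, combining the derived types listed above.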
The LLVM JIT compiler can optimize unneeded static branches out of a program at runtime, and thus is useful for partial evaluation in cases where a program has many options, most of which can easily be determined unneeded in a specific environment. This feature is used in the OpenGL pipeline of Mac OS X Leopard (v10.5) to provide support for missing hardware features.[38]
Graphics code within the OpenGL stack can be left in intermediate representation and then compiled when run on the target machine. On systems with high-end graphics processing units (GPUs), the resulting code remains quite thin, passing the instructions on to the GPU with minimal changes. On systems with low-end GPUs, LLVM will compile optional procedures that run on the local central processing unit (CPU) that emulate instructions that the GPU cannot run internally. LLVM improved performance on low-end machines using Intel GMA chipsets. A similar system was developed under the Gallium3D LLVMpipe, and incorporated into the GNOME shell to allow it to run without a proper 3D hardware driver loaded.[39]
In 2011, programs compiled by GCC outperformed those from LLVM by 10% on average.[40][41] In 2013, Phoronix reported that LLVM had caught up with GCC, compiling binaries of approximately equal performance.[42]
Components
LLVM has become an umbrella project containing multiple components.
Frontends
LLVM was originally written to be a replacement for the extant code generator in the GCC stack,[43] and many of the GCC frontends were modified to work with it, resulting in the now-defunct LLVM-GCC suite. The modifications generally involved a GIMPLE-to-LLVM IR step so that LLVM optimizers and codegen could be used instead of GCC's GIMPLE system. Apple was a significant user of LLVM-GCC through Xcode 4.x (2013).[44][45] This use of the GCC frontend was considered a temporary measure, and it became mostly obsolete with the advent of LLVM/Clang's more modern, modular codebase and faster compilation.
LLVM supports the compilation of Ada, C, C++, D, Delphi, Fortran, Haskell, Julia, Objective-C, Rust, and Swift using various frontends.
Widespread interest in LLVM has led to several efforts to develop new frontends for many languages. The one that has received the most attention is Clang, a newer compiler supporting C, C++, and Objective-C. Primarily supported by Apple, Clang is aimed at replacing the C/Objective-C compiler in the GCC system with a system that is more easily integrated with integrated development environments (IDEs) and has wider support for multithreading. Support for OpenMP directives has been included in Clang since release 3.8.[46]
The Utrecht Haskell Compiler can generate code for LLVM. While the generator was in the early stages of development, it was in many cases more efficient than the C code generator.[47] The Glasgow Haskell Compiler (GHC) has an LLVM backend that achieves a 30% speed-up of compiled code relative to compiling via GHC's native code generator or via C code generation, while missing only one of the many optimization techniques implemented by GHC.[48]
Many other components are in various stages of development, including, but not limited to, the Rust compiler, a Java bytecode frontend, a Common Intermediate Language (CIL) frontend, the MacRuby implementation of Ruby 1.9, various frontends for Standard ML, and a new graph coloring register allocator.[citation needed]
Intermediate representation
The core of LLVM is the intermediate representation (IR), a low-level programming language similar to assembly. IR is a strongly typed reduced instruction set computer (RISC) instruction set which abstracts away most details of the target. For example, the calling convention is abstracted through call and ret instructions with explicit arguments. Also, instead of a fixed set of registers, IR uses an infinite set of temporaries of the form %0, %1, etc. LLVM supports three equivalent forms of IR: a human-readable assembly format,[49] an in-memory format suitable for frontends, and a dense bitcode format for serializing. A simple "Hello, world!" program in the human-readable IR format:
@.str = internal constant [14 x i8] c"Hello, world\0A\00"
declare i32 @printf(ptr, ...)
define i32 @main(i32 %argc, ptr %argv) nounwind {
entry:
    %tmp1 = getelementptr [14 x i8], ptr @.str, i32 0, i32 0
    %tmp2 = call i32 (ptr, ...) @printf(ptr %tmp1) nounwind
    ret i32 0
}
The many different conventions used and features provided by different targets mean that LLVM cannot truly produce a target-independent IR and retarget it without breaking some established rules. Examples of target dependence beyond what is explicitly mentioned in the documentation can be found in a 2011 proposal for "wordcode", a fully target-independent variant of LLVM IR intended for online distribution.[50] A more practical example is PNaCl.[51]
The LLVM project also introduces another type of intermediate representation named MLIR[52] which helps build reusable and extensible compiler infrastructure by employing a plugin architecture named Dialect.[53] It enables the use of higher-level information on the program structure in the process of optimization including polyhedral compilation.
Backends
At version 16, LLVM supports many instruction sets, including IA-32, x86-64, ARM, Qualcomm Hexagon, LoongArch, M68K, MIPS, NVIDIA Parallel Thread Execution (PTX, also named NVPTX in LLVM documentation), PowerPC, AMD TeraScale,[54] most recent AMD GPUs (also named AMDGPU in LLVM documentation),[55] SPARC, z/Architecture (also named SystemZ in LLVM documentation), and XCore.
Some features are not available on some platforms. Most features are present for IA-32, x86-64, z/Architecture, ARM, and PowerPC.[56] RISC-V is supported as of version 7.
In the past, LLVM also supported other backends, fully or partially, including C backend, Cell SPU, mblaze (MicroBlaze),[57] AMD R600, DEC/Compaq Alpha (Alpha AXP)[58] and Nios2,[59] but that hardware is mostly obsolete, and LLVM developers decided the support and maintenance costs were no longer justified.[citation needed]
LLVM also supports WebAssembly as a target, enabling compiled programs to execute in WebAssembly-enabled environments such as Google Chrome / Chromium, Firefox, Microsoft Edge, Apple Safari or WAVM. LLVM-compliant WebAssembly compilers typically support mostly unmodified source code written in C, C++, D, Rust, Nim, Kotlin and several other languages.
The LLVM machine code (MC) subproject is LLVM's framework for translating machine instructions between textual forms and machine code. Formerly, LLVM relied on the system assembler, or one provided by a toolchain, to translate assembly into machine code. LLVM MC's integrated assembler supports most LLVM targets, including IA-32, x86-64, ARM, and ARM64. For some targets, including the various MIPS instruction sets, integrated assembly support is usable but still in the beta stage.[citation needed]
Linker
The lld subproject is an attempt to develop a built-in, platform-independent linker for LLVM.[60] lld aims to remove dependence on a third-party linker. As of May 2017[update], lld supports ELF, PE/COFF, Mach-O, and WebAssembly[61] in descending order of completeness. lld is faster than both flavors of GNU ld.[citation needed]
Unlike the GNU linkers, lld has built-in support for link-time optimization (LTO). This allows for faster code generation as it bypasses the use of a linker plugin, but on the other hand prohibits interoperability with other flavors of LTO.[62]
C++ Standard Library
The LLVM project includes an implementation of the C++ Standard Library named libc++, originally dual-licensed under the MIT License and the UIUC license.[63] Since v9.0.0, it has been relicensed to the Apache License 2.0 with LLVM Exceptions.[3]
Polly
Polly implements a suite of cache-locality optimizations as well as auto-parallelism and vectorization using a polyhedral model.[64]
Debugger
The LLVM project includes LLDB, a debugger built on libraries provided by LLVM and Clang.
C Standard Library
llvm-libc is an incomplete, upcoming, ABI-independent C standard library designed by and for the LLVM project.[65]
Derivatives
Due to its permissive license, many vendors release their own tuned forks of LLVM. This is officially recognized by LLVM's documentation, which advises against using version numbers in feature checks for this reason.[66] Some of the vendors include:
- AMD's AMD Optimizing C/C++ Compiler is based on LLVM, Clang, and Flang.
- Apple maintains an open-source fork for Xcode.[67]
- Arm provides a number of LLVM-based toolchains, including Arm Compiler for Embedded, targeting bare-metal development, and Arm Compiler for Linux, targeting the high-performance computing market.
- Flang, an LLVM-based Fortran compiler, in development as of 2022[update]
- IBM is adopting LLVM in its C/C++ and Fortran compilers.[68]
- Intel has adopted LLVM for their next generation Intel C++ Compiler.[69]
- The Los Alamos National Laboratory has a parallel-computing fork of LLVM 8 named "Kitsune".[70]
- Nvidia uses LLVM in the implementation of its NVVM CUDA Compiler.[71] The NVVM compiler is distinct from the "NVPTX" backend mentioned in the Backends section, although both generate PTX code for Nvidia GPUs.
- Since 2013, Sony has been using LLVM's primary front-end Clang compiler in the software development kit (SDK) of its PlayStation 4 console.[72]
See also
- Common Intermediate Language
- HHVM
- C--
- Amsterdam Compiler Kit (ACK)
- Optimizing compiler
- LLDB (debugger)
- GNU lightning
- GNU Compiler Collection (GCC)
- Pure
- OpenCL
- ROCm
- Emscripten
- TenDRA Distribution Format
- Architecture Neutral Distribution Format (ANDF)
- Comparison of application virtualization software
- SPIR-V
- University of Illinois at Urbana–Champaign discoveries & innovations
Literature
- Chris Lattner, "LLVM", Chapter 11 of The Architecture of Open Source Applications, ISBN 978-1257638017, released 2012 under CC BY 3.0 (open access).[73]
- LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation, a published paper by Chris Lattner, Vikram Adve
References
- ^ "LLVM Logo". The LLVM Compiler Infrastructure Project.
- ^ "LLVM 21.1.4". October 21, 2025. Retrieved October 21, 2025.
- ^ a b c d "LICENSE.TXT". llvm.org. Retrieved September 24, 2019.
- ^ "LLVM Developer Policy — LLVM 20.0.0git documentation". llvm.org. Retrieved November 9, 2024.
- ^ "The LLVM Compiler Infrastructure Project". Retrieved March 11, 2016.
- ^ a b "LLVM Language Reference Manual". Retrieved June 9, 2019.
- ^ "The LLVM Compiler Infrastructure Project". llvm.org. Archived from the original on December 29, 2024. Retrieved January 13, 2025.
- ^ "Announcing LLILC - A new LLVM-based Compiler for .NET". dotnetfoundation.org. Archived from the original on December 12, 2021. Retrieved September 12, 2020.
- ^ "Mono LLVM". Retrieved March 10, 2013.
- ^ Lattner, Chris (2011). "LLVM". In Brown, Amy; Wilson, Greg (eds.). The Architecture of Open Source Applications.
- ^ "Clasp". Clasp Developers. Retrieved December 2, 2024.
- ^ "LDC". D Wiki. Retrieved December 2, 2024.
- ^ "LLVM-based Delphi Compilers". Embarcadero. Retrieved November 26, 2024.
- ^ "MovForth". GitHub. November 28, 2021.
- ^ "The Flang Compiler". LLVM Project. Retrieved December 2, 2024.
- ^ "Rapid". Rapid. Retrieved November 22, 2024.
- ^ William Wong (May 23, 2017). "What's the Difference Between LabVIEW 2017 and LabVIEW NXG?". Electronic Design.
- ^ "NI LabVIEW Compiler: Under the Hood".
- ^ Larabel, Michael (April 11, 2018). "Khronos Officially Announces Its LLVM/SPIR-V Translator". Phoronix.com.
- ^ "32.1. What is JIT compilation?". PostgreSQL Documentation. November 12, 2020. Retrieved January 25, 2021.
- ^ "Features". RubyMotion. Scratchwork Development LLC. Retrieved June 17, 2017.
RubyMotion transforms the Ruby source code of your project into ... machine code using a[n] ... ahead-of-time (AOT) compiler, based on LLVM.
- ^ "Code Generation - Guide to Rustc Development". rust-lang.org. Retrieved January 4, 2023.
- ^ Reedy, Geoff (September 24, 2012). "Compiling Scala to LLVM". St. Louis, Missouri, United States. Retrieved February 19, 2013.
- ^ "Scala Native". Retrieved November 26, 2023.
- ^ "LLVMCodegen". MLton. Retrieved November 26, 2024.
- ^ Adam Treat (February 19, 2005), mkspecs and patches for LLVM compile of Qt4, archived from the original on October 4, 2011, retrieved January 27, 2012
- ^ "Developer Tools Overview". Apple Developer. Apple. Archived from the original on April 23, 2011.
- ^ Lattner, Chris (December 21, 2011). "The name of LLVM". llvm-dev (Mailing list). Retrieved March 2, 2016.
'LLVM' is officially no longer an acronym. The acronym it once expanded too was confusing, and inappropriate almost from day 1. :) As LLVM has grown to encompass other subprojects, it became even less useful and meaningless.
- ^ Lattner, Chris (June 1, 2011). "LLVM". In Brown, Amy; Wilson, Greg (eds.). The architecture of open source applications. Lulu.com. ISBN 978-1257638017.
The name 'LLVM' was once an acronym, but is now just a brand for the umbrella project.
- ^ ""libc++" C++ Standard Library".
- ^ Lattner, Chris (April 3, 2014). "The LLVM Foundation". LLVM Project Blog.
- ^ "Board of Directors". LLVM Foundation. Retrieved September 18, 2025.
- ^ "ACM Software System Award". ACM.
- ^ Wennborg, Hans (September 19, 2019). "[llvm-announce] LLVM 9.0.0 Release".
- ^ "Relicensing Long Tail". foundation.llvm.org. November 11, 2022. Archived from the original on May 13, 2024. Retrieved April 1, 2022.
- ^ "LLVM relicensing - long tail". LLVM Project. Retrieved November 27, 2022 – via Google Docs.
- ^ "⚙ D156286 [docs] Bump minimum GCC version to 7.5". reviews.llvm.org. Retrieved July 28, 2023.
- ^ Lattner, Chris (August 15, 2006). "A cool use of LLVM at Apple: the OpenGL stack". llvm-dev (Mailing list). Retrieved March 1, 2016.
- ^ Michael Larabel, "GNOME Shell Works Without GPU Driver Support", phoronix, November 6, 2011
- ^ Makarov, V. "SPEC2000: Comparison of LLVM-2.9 and GCC4.6.1 on x86". Retrieved October 3, 2011.
- ^ Makarov, V. "SPEC2000: Comparison of LLVM-2.9 and GCC4.6.1 on x86_64". Retrieved October 3, 2011.
- ^ Larabel, Michael (December 27, 2012). "LLVM/Clang 3.2 Compiler Competing With GCC". Retrieved March 31, 2013.
- ^ Lattner, Chris; Adve, Vikram (May 2003). Architecture For a Next-Generation GCC. First Annual GCC Developers' Summit. Retrieved September 6, 2009.
- ^ "LLVM Compiler Overview". developer.apple.com.
- ^ "Xcode 5 Release Notes". Apple Inc.
- ^ "Clang 3.8 Release Notes". Retrieved August 24, 2016.
- ^ "Compiling Haskell To LLVM". Retrieved February 22, 2009.
- ^ "LLVM Project Blog: The Glasgow Haskell Compiler and LLVM". May 17, 2010. Retrieved August 13, 2010.
- ^ "LLVM Language Reference Manual". LLVM.org. January 10, 2023.
- ^ Kang, Jin-Gu. "Wordcode: more target independent LLVM bitcode" (PDF). Retrieved December 1, 2019.
- ^ "PNaCl: Portable Native Client Executables" (PDF). Archived from the original (PDF) on 2 May 2012. Retrieved 25 April 2012.
- ^ "MLIR". mlir.llvm.org. Retrieved June 7, 2022.
- ^ "Dialects - MLIR". mlir.llvm.org. Retrieved June 7, 2022.
- ^ Stellard, Tom (March 26, 2012). "[LLVMdev] RFC: R600, a new backend for AMD GPUs". llvm-dev (Mailing list).
- ^ "User Guide for AMDGPU Backend — LLVM 15.0.0git documentation".
- ^ Target-specific Implementation Notes: Target Feature Matrix // The LLVM Target-Independent Code Generator, LLVM site.
- ^ "Remove the mblaze backend from llvm". GitHub. July 25, 2013. Retrieved January 26, 2020.
- ^ "Remove the Alpha backend". GitHub. October 27, 2011. Retrieved January 26, 2020.
- ^ "[Nios2] Remove Nios2 backend". GitHub. January 15, 2019. Retrieved January 26, 2020.
- ^ "lld - The LLVM Linker". The LLVM Project. Retrieved May 10, 2017.
- ^ "WebAssembly lld port".
- ^ "42446 – lld can't handle gcc LTO files". bugs.llvm.org.
- ^ ""libc++" C++ Standard Library".
- ^ "Polly - Polyhedral optimizations for LLVM".
- ^ "llvm-libc: An ISO C-conformant Standard Library — libc 15.0.0git documentation". libc.llvm.org. Retrieved July 18, 2022.
- ^ "Clang Language Extensions". Clang 12 documentation.
Note that marketing version numbers should not be used to check for language features, as different vendors use different numbering schemes. Instead, use the Feature Checking Macros.
- ^ "apple/llvm-project". Apple. September 5, 2020.
- ^ "IBM C/C++ and Fortran compilers to adopt LLVM open source infrastructure". July 29, 2022.
- ^ "Intel C/C++ compilers complete adoption of LLVM". Intel. Retrieved August 17, 2021.
- ^ "lanl/kitsune". Los Alamos National Laboratory. February 27, 2020.
- ^ "NVVM IR Specification 1.5".
The current NVVM IR is based on LLVM 5.0
- ^ Developer Toolchain for ps4 (PDF), retrieved February 24, 2015
- ^ Lattner, Chris (March 15, 2012). "Chapter 11". The Architecture of Open Source Applications. Amy Brown, Greg Wilson. ISBN 978-1257638017.
External links
History
Origins and Founding
LLVM originated as a research project in December 2000 at the University of Illinois at Urbana–Champaign (UIUC), spearheaded by graduate student Chris Lattner under the supervision of Professor Vikram Adve within the IMPACT research group, a prominent center for parallel computing and compiler innovations at the institution.[7][8] The project emerged from Lattner's master's thesis work, aiming to create a modular compiler infrastructure that could support advanced program analysis and optimization techniques beyond the capabilities of contemporary tools.[9]
The primary motivation behind LLVM's development was to overcome key limitations in existing compilers, such as the GNU Compiler Collection (GCC), which were largely monolithic and lacked mechanisms for persistent, language-independent program representations suitable for lifelong analysis and transformation across compile-time, link-time, runtime, and even idle periods.[10][7] Traditional compilers like GCC provided little support for retaining detailed program information after compilation, hindering research in areas like whole-program optimization, user-directed profiling, and transparent instrumentation for arbitrary applications. By designing LLVM as a collection of reusable libraries with well-defined interfaces, Adve and Lattner sought to enable more flexible experimentation in compiler research, facilitating code reuse across different languages and optimization stages without the constraints of rigid, special-purpose tools.[10]
Initial development of LLVM was supported by funding from the National Science Foundation (NSF) through the Next Generation Software program, including grants EIA-0093426 (an NSF CAREER award to Adve) and EIA-0103756, which backed the foundational work on multi-stage optimization infrastructures.[11] This sponsorship aligned with broader efforts in the IMPACT group to advance compiler technologies for parallel and embedded systems.
The project's first public release, LLVM 1.0, occurred on October 24, 2003, introducing a stable C frontend, a beta C++ frontend, and backends for x86 and SPARC V9 architectures, with support for both static compilation and just-in-time (JIT) code generation.[4] From its inception, LLVM was distributed as open-source software under a permissive license, allowing immediate adoption by researchers and developers for building custom compilers and analysis tools.[7]
Key Milestones and Releases
The LLVM project marked its initial public release with version 1.0 in October 2003, establishing a foundation for modular compiler research focused on lifelong code optimization.[12] LLVM 2.0, released on May 23, 2007, introduced substantial enhancements to optimization capabilities, including a complete rewrite of the pass manager for greater extensibility, improved loop strength reduction with expression sinking, and advanced scalar replacement for unions and vectors.[13]
Apple's involvement began in 2005, when project creator Chris Lattner joined the company, accelerating LLVM's practical adoption in production environments; this culminated in the start of Clang frontend development in July 2007 to provide a C/C++/Objective-C parser integrated with LLVM's backend.[14] In December 2011, LLVM 3.0 brought key advancements in just-in-time compilation through the introduction of MCJIT, an in-memory object file emitter leveraging the MC framework for improved code generation and dynamic linking support.[15] The LLVM Foundation was established in 2014 as a nonprofit organization dedicated to advancing compiler education and project sustainability through events, grants, and community support.[16] Building on MCJIT, the ORC (On-Request Compilation) JIT APIs landed in LLVM's mainline in January 2015 (with LLVM 3.7), offering a more flexible and performant layer for runtime code generation by enabling modular compilation and lazy symbol resolution.[17]
A major licensing shift occurred in 2019, when LLVM adopted the Apache License 2.0 with LLVM Exceptions beginning with the 9.0.0 release, replacing the prior University of Illinois/NCSA Open Source License to broaden compatibility and patent protections while maintaining open-source principles.[18][19] In 2019, MLIR (Multi-Level Intermediate Representation) was contributed to the LLVM project, enabling dialect-based representations for domain-specific optimizations and facilitating compiler infrastructure reuse across hardware targets.[20] LLVM 10.0.0 followed in March 2020, featuring the addition of the freeze instruction for undefined-behavior handling, the Attributor framework for interprocedural optimizations, and matrix math intrinsics to support emerging AI and HPC workloads.[21]
By LLVM 18.1.0 in March 2024, GPU support saw significant enhancements, including initial targeting for AMD's GFX12/RDNA4 architecture, improved NVPTX backend for NVIDIA CUDA, and expanded offloading capabilities for heterogeneous computing in tools like Clang.[22]
LLVM 19.1.0, released in September 2024, advanced RISC-V support with full Zabha extension and Ztso ratification, alongside AArch64 enhancements for new Cortex processors and improved WebAssembly symbol handling in tools.[23]
LLVM 20.1.0, released in March 2025, promoted the SPIR-V backend to official status for OpenCL and Vulkan, introduced the IRNormalizer pass for module standardization, and added support for Armv9.6-A and new RISC-V extensions.[24]
LLVM 21.1.0, released in August 2025, further expanded AArch64 with execute-only memory features, enhanced RISC-V for Qualcomm uC extensions, and removed legacy IR elements like recursive types to modernize the infrastructure.[25]
Evolution and Institutional Support
LLVM's community has grown substantially since its early years, evolving from a small research team of approximately 10 developers in 2003 to over 2,000 active contributors by 2025.[26] This expansion reflects the project's increasing adoption across industry and academia, with significant contributions from major organizations including Apple, which has driven much of the Clang frontend development; Google, focusing on optimizations for Chrome and Android; Intel, enhancing x86 backend support; and NVIDIA, advancing GPU code generation capabilities.[1] The total number of unique authors committing code reached a record 2,138 in 2024 alone, underscoring the vibrant and collaborative nature of the ecosystem.[26]
Funding for LLVM has come from diverse sources, supporting its development and sustainability. Early work at the University of Illinois at Urbana–Champaign (UIUC) was backed by grants from the National Science Foundation (NSF), enabling foundational research into lifelong program optimization.[27] Corporate sponsorships from tech giants like Apple, Intel, and Google have provided ongoing resources through the LLVM Foundation, which manages donations and facilitates community initiatives. Additionally, specialized research efforts, such as those at UIUC's centers exploring compiler technologies, have further bolstered institutional support.[16]
The annual LLVM Developers' Meetings, starting in 2007, have been instrumental in unifying the community and standardizing development processes.[28] These events, now held multiple times a year across regions like the US, Europe, and Asia, bring together hundreds of developers for technical talks, birds-of-a-feather sessions, and planning discussions, fostering innovation and resolving key challenges in compiler infrastructure.[29]
By 2012, LLVM had achieved widespread integration into major Linux distributions, including Ubuntu 12.04 and Fedora, where it became readily available via package managers for building and optimizing software.[30] This accessibility accelerated its adoption among open-source projects and system tools. Furthermore, since 2019, with the release of Android NDK r19, LLVM-based tools like Clang and LLD have served as the default toolchain for native development, enabling efficient cross-compilation for Android's diverse architectures.[31]
Overview and Design Principles
Core Objectives and Philosophy
LLVM's core objectives center on providing a robust infrastructure for lifelong program analysis and transformation, enabling optimizations across compile-time, link-time, run-time, and offline stages in a transparent and language-independent manner.[32] This framework aims to support both static and dynamic compilation for arbitrary programming languages, fostering reusable components that can be applied in diverse environments without imposing runtime dependencies.[1] The design philosophy emphasizes creating a system that combines modularity, whole-program optimization, and profile-guided transformations, addressing limitations in traditional compilers by preserving intermediate representations for extended use.[32]
A foundational principle of LLVM is its modular architecture, which decomposes the compilation process into interchangeable libraries for frontends, optimizers, and backends, promoting reusability across different tools and projects.[32] This modularity allows developers to leverage the same optimization passes for multiple languages, reducing redundancy and enabling rapid experimentation in compiler research.[1] Language independence is achieved through a low-level intermediate representation (IR) that abstracts away high-level language specifics, making LLVM suitable for compiling diverse languages such as C++, Rust, and even scripting languages like Python in production settings.[32]
Key tenets include the use of a type-safe IR, which incorporates a language-independent type system to support type-safe operations and facilitate advanced analyses, including those for verifying memory access safety; empirical evidence shows that a significant portion of memory accesses in benchmarks like SPECINT 2000 can be verified as type-safe.[32] LLVM supports aggressive optimization through interprocedural passes that operate on the preserved IR, enabling techniques like link-time optimization that are more efficient than those in monolithic systems.[32] Additionally, built-in support for just-in-time (JIT) compilation via an execution engine allows for dynamic code generation at runtime, which is particularly valuable for applications requiring on-the-fly compilation, such as virtual machines and embedded systems.[1]
Compared to monolithic compilers like GCC, LLVM offers advantages in easier testing, retargeting to new architectures, and performing static analyses due to its component-based design, which avoids the tight coupling found in traditional systems and results in faster whole-program optimizations; for instance, optimization time for benchmarks like 164.gzip fell from over 3 seconds in GCC to mere milliseconds.[32] This makes LLVM an ideal target for compiler researchers experimenting with novel analyses, tool developers building static analyzers or debuggers, and production environments such as embedded systems, where modularity aids in customizing toolchains for specific hardware constraints.[1]
Modular Architecture
LLVM's modular architecture is organized around a three-stage compiler pipeline that separates concerns to enhance reusability and maintainability. The frontend stage parses source code in a given programming language and translates it into LLVM Intermediate Representation (IR), a platform-independent form that captures the program's semantics. The IR then flows to the middle-end stage, where target-independent optimizations such as constant folding and dead code elimination are applied to improve performance. Finally, the backend stage takes the optimized IR and generates machine-specific code, including assembly or object files, tailored to the intended architecture such as x86 or ARM. This separation allows independent development of each stage, enabling LLVM to support diverse languages and targets efficiently.[33][34]

Central to the middle-end's modularity is the pass manager system, which orchestrates a sequence of transformation passes that operate on the IR. Each pass performs a specific analysis or optimization, such as constant propagation or loop vectorization, and the pass manager composes these into pipelines, scheduling them to minimize redundant computation and to ensure that dependencies are resolved. The new pass manager, introduced to replace the legacy version, uses a concept-based approach with analysis managers that track which analyses remain valid after each pass, allowing more efficient and flexible composition of transformations. This infrastructure enables developers to chain passes modularly, fostering extensibility while maintaining the pipeline's integrity.[35][33]

LLVM supports multiple compilation modes to accommodate different use cases, including static (ahead-of-time) compilation for producing optimized executables, dynamic compilation for runtime linking of libraries, and just-in-time (JIT) compilation for on-the-fly code generation in interpreters or virtual machines. In static mode, the full pipeline generates persistent machine code; dynamic mode leverages runtime components for shared libraries; and JIT mode uses the ExecutionEngine to emit and execute code immediately, balancing compilation speed against optimization depth. These modes are unified through the IR, allowing the same core infrastructure to serve both offline and online compilation scenarios.[36][34]

The architecture's extensibility is facilitated through plugins and APIs that allow integration of custom components without modifying the core system. Developers can implement new passes as dynamic libraries loaded via the plugin interface, registering them with the pass manager for inclusion in pipelines. APIs for frontends, backends, and passes provide hooks for extending functionality, such as adding support for novel optimizations or targets, making LLVM suitable for research prototypes and production compilers alike. This design has enabled widespread adoption, with contributions from academia and industry enhancing its capabilities over time.[35][37]

Subprojects
The LLVM project consists of several primary subprojects, each serving specific roles in compiler infrastructure, tooling, and runtime support.[1]
- LLVM Core: Provides a source- and target-independent optimizer and code generation support for many architectures, built around the LLVM Intermediate Representation (IR).
- Clang: A compiler for C, C++, Objective-C, and Objective-C++, focused on fast compilation, excellent diagnostics, and tools like the Clang Static Analyzer and clang-tidy for bug detection.
- LLDB: A high-performance native debugger that leverages Clang ASTs, LLVM JIT, and disassembler for efficient debugging.
- compiler-rt: Supplies low-level builtins, runtime libraries, and sanitizers (e.g., AddressSanitizer, ThreadSanitizer) for dynamic testing.
- libc++ and libc++abi: A standards-conformant, high-performance C++ Standard Library and ABI implementation with full C++11/C++14 support.
- libc: A high-performance, standards-conformant C Standard Library integrated with LLVM.
- MLIR: A reusable, extensible compiler infrastructure for addressing fragmentation, heterogeneous hardware, and domain-specific compilers.
- Flang: A Fortran frontend for compiling Fortran code.
- LLD: A fast, drop-in replacement linker.
- BOLT: A post-link optimizer that improves performance via profile-guided code layout.
- polly: Implements cache-locality optimizations, auto-parallelism, and vectorization using a polyhedral model.
- libclc: Implements the OpenCL standard library.
- klee: A symbolic execution tool ("symbolic virtual machine") for bug finding and property proving.
- OpenMP: Provides an OpenMP runtime for use with Clang's OpenMP implementation.
Intermediate Representation
Structure of LLVM IR
LLVM Intermediate Representation (IR) is a low-level, platform-independent language that serves as the core data structure for the LLVM compiler infrastructure, designed in Static Single Assignment (SSA) form to facilitate optimizations.[38] In SSA, each variable is assigned exactly once, with uses referencing that single definition, enabling efficient analysis and transformation; values are represented as either named registers (e.g., %x) or unnamed temporaries (e.g., %0).[39] LLVM IR supports two primary formats: a human-readable textual assembly language, which resembles a low-level programming language with syntax for declarations and operations, and a binary bitcode format for compact serialization and storage, both of which are equivalent in expressiveness.[40]
The structure of LLVM IR is hierarchical, beginning with a module as the top-level container that encompasses all code and data for a translation unit.[41] A module includes global variables (e.g., @global_var = global i32 42), functions, metadata nodes, and attributes, along with optional specifications like the target data layout (e.g., target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128") and triple (e.g., target triple = "x86_64-unknown-linux-gnu").[41] Functions within a module are declared or defined with a signature specifying the return type, parameter types, and optional variadic arguments (e.g., define i32 @foo(i32 %a, i32 %b) { ... }), and they may carry attributes for optimization guidance.[42] Each function body consists of one or more basic blocks, which are linear sequences of instructions that start after a label (e.g., entry:) and end with a terminator instruction like ret or br, ensuring control flow is explicit and block boundaries align with potential jumps.[43] Instructions form the atomic operations within basic blocks, including arithmetic (e.g., %sum = add i32 %a, %b), memory access (e.g., %val = load i32, ptr %ptr or store i32 %val, ptr %ptr), control flow (e.g., call void @bar()), and conversions, all typed and in SSA form where the result is a new value.[44]
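The hierarchy described above can be seen in a minimal, self-contained module; the function, global, and value names here are illustrative rather than drawn from any particular source program:

```llvm
; Module-level entities: target triple and a global variable.
target triple = "x86_64-unknown-linux-gnu"

@counter = global i32 0

; A function definition; its body is a single basic block.
define i32 @add_and_count(i32 %a, i32 %b) {
entry:
  ; Arithmetic in SSA form: %sum is defined exactly once.
  %sum = add i32 %a, %b
  ; Memory access: load, increment, and store the global counter.
  %old = load i32, ptr @counter
  %new = add i32 %old, 1
  store i32 %new, ptr @counter
  ; Every basic block ends with a terminator such as ret.
  ret i32 %sum
}
```

Such a listing can be round-tripped between the textual form shown here and the equivalent binary bitcode without loss of information.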
The type system of LLVM IR is strongly typed and supports a range of primitives, derived types, and aggregates to represent data at a low level.[45] Primitive types include integers of arbitrary bit width (e.g., i1 for booleans, i32 for 32-bit integers; integer types carry no signedness, which is instead expressed by operations such as udiv and sdiv), floating-point types (e.g., float, double), and void for functions without return values.[46] Pointers are denoted as ptr (opaque by default since LLVM 15 and the only supported form since LLVM 17) with optional address spaces for memory regions (e.g., ptr addrspace(1) for global memory), allowing representation of addresses without specifying pointee types in modern IR.[47] Aggregate types include structs, defined as packed or unpacked collections of types (e.g., { i32, float } for a literal struct or %MyStruct = type { i32, float } for an identified one, which can be recursive or opaque), as well as fixed-size arrays (e.g., [10 x i32]) and vectors (e.g., <4 x float> for SIMD operations).[48] These types enable modeling of complex data structures while maintaining type safety throughout the IR.
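A short sketch of how struct, array, and vector types appear in practice; the struct, global, and function names are hypothetical:

```llvm
; An identified struct type; the ptr member allows recursion.
%Node = type { i32, ptr }

; Aggregate globals: a fixed-size array and a SIMD vector.
@table = global [4 x i32] [i32 1, i32 2, i32 3, i32 4]
@lanes = global <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>

; getelementptr computes an address from the type layout without
; touching memory: here, the address of the i32 field of a %Node.
define ptr @value_field(ptr %n) {
entry:
  %p = getelementptr %Node, ptr %n, i32 0, i32 0
  ret ptr %p
}
```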
LLVM IR incorporates metadata and attributes as annotations that provide supplementary information without affecting core computation, primarily for optimization and debugging.[49] Metadata nodes are distinct entities (e.g., !0 = !{ i32 42, !"debug info" }) referenced via ! attachments to instructions or globals (e.g., load i32, ptr %p, !dbg !0), often used for source-level details like line numbers.[49] Attributes, grouped into sets (e.g., #0 = { nounwind }), annotate functions, parameters, or callsites with hints such as nounwind (the function never unwinds the stack), readonly (the function does not write memory), or align 16 (for memory alignment), allowing passes to apply targeted transformations like inlining or dead code elimination. As of LLVM 21, the nocapture attribute has been replaced by the captures attribute (e.g., captures(none)), and inspecting uses of ConstantData is no longer permitted.[50][25]
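Both annotation kinds can be sketched on a single function; the metadata kind !note and all names here are invented for illustration:

```llvm
; The #0 suffix attaches an attribute group to the function.
define i32 @square(i32 %x) #0 {
entry:
  ; An instruction-level metadata attachment of kind !note.
  %r = mul i32 %x, %x, !note !1
  ret i32 %r
}

; Attribute groups collect hints shared by several functions.
attributes #0 = { nounwind }

; Metadata nodes may mix strings, integers, and other nodes.
!1 = !{!"illustrative annotation", i32 42}
```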
Semantics and Operations
LLVM IR employs a flat address space model, where all memory accesses occur through pointers in a single linear address space (default address space 0), with target-specific semantics possible in non-zero address spaces.[38] This model assumes no inherent types for memory locations, relying instead on metadata and attributes for type-based alias analysis (TBAA) to enable optimizations while preventing invalid aliasing assumptions.[38] Strict aliasing is enforced through attributes such as noalias on function parameters and return values, which guarantee that memory locations accessed via one pointer do not overlap with those accessed via unrelated pointers, allowing aggressive optimizations without introducing undefined behavior.[38] Undefined behavior arises from actions like dereferencing null or misaligned pointers, accessing memory outside an object's lifetime, or violating pointer attributes (e.g., nonnull or dereferenceable), ensuring that the IR remains a sound foundation for compiler transformations.[38] For concurrent operations, LLVM adopts a memory model based on a happens-before partial order, where atomic instructions provide synchronization points to establish visibility and ordering guarantees across threads.[51]
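The effect of noalias can be illustrated with a two-parameter function; under the attribute's guarantee the optimizer may assume the stores never overlap (the function name is illustrative):

```llvm
; noalias promises that memory reached through %a is not reached
; through %b, so passes may reorder or cache these accesses freely.
define void @init_pair(ptr noalias %a, ptr noalias %b) {
entry:
  store i32 1, ptr %a
  store i32 2, ptr %b
  ret void
}
```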
Control flow in LLVM IR is structured around basic blocks, which are sequences of instructions ending in a terminator that dictates the next block, forming an explicit control-flow graph over the SSA values.[38] Unconditional and conditional branches use the br instruction to transfer control to labeled basic blocks, while function calls employ the call instruction, which continues execution immediately after the call unless specified otherwise, supporting various calling conventions like ccc (C) or fastcc.[38] Phi nodes, placed at the start of basic blocks, select values based on the incoming predecessor edge, enabling SSA form to merge control paths without explicit variables.[38] Exception handling integrates via the invoke instruction, which calls a function but specifies an unwind destination (a landing pad) in case of an exception; upon unwinding, control transfers to the landingpad instruction in that block, which processes the exception using clauses like catch for type matching or filter for exclusion, potentially resuming unwinding with resume if unhandled.[52] This mechanism ensures precise exception propagation while maintaining SSA properties through the personality function defined per module.[52]
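A small example of SSA control-flow merging; the phi node in the join block picks the value matching the edge that was taken (the function is hypothetical):

```llvm
define i32 @abs(i32 %x) {
entry:
  %isneg = icmp slt i32 %x, 0
  ; Conditional branch: two possible successor blocks.
  br i1 %isneg, label %negate, label %join
negate:
  %neg = sub i32 0, %x
  br label %join
join:
  ; The phi selects %neg if control came from %negate,
  ; or the original %x if it came straight from %entry.
  %res = phi i32 [ %neg, %negate ], [ %x, %entry ]
  ret i32 %res
}
```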
Intrinsic functions in LLVM IR provide a mechanism for low-level, platform-specific operations that cannot be expressed through standard instructions, always declared as external and invoked via call or invoke.[38] Examples include memory manipulation intrinsics like llvm.memcpy, which copies a specified number of bytes between pointers with optional volatile semantics to preserve side effects, and llvm.memmove for overlapping regions.[38] Atomic operations, such as atomicrmw for read-modify-write operations (e.g., add or xchg) and cmpxchg for compare-and-swap, are expressed as dedicated instructions rather than intrinsics; they support concurrency with memory ordering constraints like monotonic for weak consistency or seq_cst for sequential consistency, ensuring thread-safe access without races leading to undefined behavior.[51] These operations map to hardware instructions or library calls during code generation, bridging high-level semantics to target-specific behaviors.[38]
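The difference between intrinsic calls and atomic instructions can be sketched as follows (function names are illustrative):

```llvm
; Intrinsics are declared like external functions and invoked via call.
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)

define void @copy16(ptr %dst, ptr %src) {
entry:
  ; Copy 16 bytes; the trailing i1 is the volatile flag.
  call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %src, i64 16, i1 false)
  ret void
}

define i32 @fetch_add(ptr %p) {
entry:
  ; Atomic read-modify-write with sequentially consistent ordering;
  ; yields the value held at %p before the addition.
  %old = atomicrmw add ptr %p, i32 1 seq_cst
  ret i32 %old
}
```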
Verification passes in LLVM ensure IR well-formedness by checking syntactic and semantic rules, including type safety, operand validity, and structural integrity, with the verifier automatically invoked on module loading or pass execution.[38] Dominance rules require that every use of a value is reachable only after its defining instruction in the control flow graph, verified through dominator tree analysis to prevent invalid optimizations.[38] Reachability is enforced by confirming all basic blocks are accessible from the function entry, eliminating dead code and ensuring the SSA graph's coherence without unreachable terminators or phi nodes with undefined predecessors.[38] These checks, including validation of terminator instructions and exception handling constructs, maintain the IR's reliability across compiler pipelines.[38]
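The dominance rule can be made concrete with a deliberately malformed function; a sketch such as the following would be rejected by the verifier because a value is used at a point its definition does not dominate (names are illustrative):

```llvm
; Invalid IR: %v is used in %entry, but its defining instruction
; sits in %later, which does not dominate %entry. The verifier
; rejects this module rather than let passes miscompile it.
define i32 @broken(i1 %c) {
entry:
  %u = add i32 %v, 1        ; use precedes any dominating definition
  br label %later
later:
  %v = add i32 0, 1
  ret i32 %u
}
```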
Compiler Pipeline Components
Frontends
In the LLVM compiler infrastructure, frontends serve as language-specific translators that convert high-level source code into LLVM Intermediate Representation (IR). These components are responsible for lexical analysis to tokenize input, parsing to construct an abstract syntax tree (AST), semantic analysis to verify type correctness and resolve symbols, and finally emitting IR through AST-to-IR conversion.[53] This modular design allows LLVM to support diverse programming languages by isolating language-specific logic from the target-independent optimization and code generation stages.[53]

The primary example of an LLVM frontend is Clang, which targets C, C++, Objective-C, and Objective-C++. Development of Clang began in 2007 at Apple to address limitations in existing compilers, such as poor diagnostics and licensing issues, with initial C language support achieved by 2009 and production-quality C++ support by 2012.[54][55] Clang performs production-quality compilation for these languages, leveraging LLVM's IR for subsequent processing.[54] Full support for the C++11 standard, including features such as lambda expressions, rvalue references, and auto type deduction, was realized in Clang 3.1, released in 2012.[55]

Other notable LLVM frontends include rustc, the compiler for the Rust programming language, which has used LLVM as its primary backend since the language's early development to generate efficient, safe systems code.[56] Apple's Swift compiler, introduced in 2014, also employs an LLVM frontend to translate Swift's syntax, which emphasizes safety and performance, into optimized IR, enabling interoperability with C and C++ ecosystems.[57] Similarly, the Julia language, initiated in 2012, utilizes LLVM in its just-in-time compiler to turn dynamic, high-level code for numerical and scientific computing into native machine code via IR.[58] Another significant frontend is Flang, the Fortran compiler, which was accepted into the LLVM project in 2019 and received major updates in 2024-2025 for Fortran 2023 standard support, enhancing scientific computing capabilities.[59]

Designing LLVM frontends presents challenges, particularly in accommodating language-specific features that demand intricate semantic processing. For instance, C++ templates require complex mechanisms for instantiation, two-phase name lookup, and dependent type resolution, which must be faithfully mapped to LLVM IR without introducing inefficiencies or errors during AST traversal.[60] These aspects necessitate robust error recovery and diagnostics to ensure compatibility with LLVM's type-safe IR semantics.[60]

Optimizer and Middle-End
The LLVM middle-end, often referred to as the optimizer, processes the LLVM Intermediate Representation (IR) generated by frontends, applying a series of analysis and transformation passes that improve code quality (for example, reducing execution time, memory usage, or binary size) while preserving semantics.[61] This stage operates on portable IR, enabling optimizations independent of target architectures.[62] The infrastructure is built around the PassManager, which orchestrates these passes in a modular, extensible manner.[61]

LLVM employs two pass manager implementations, with the New Pass Manager serving as the default for the middle-end optimization pipeline since LLVM 13, replacing the legacy PassManager for this stage.[61] The New Pass Manager organizes passes by IR hierarchy (module, call-graph strongly connected component (CGSCC), function, and loop levels) and exposes parallelization opportunities, such as running independent function passes concurrently.[61] It facilitates scalar optimizations (e.g., instruction simplification), vector optimizations (e.g., loop vectorization), and loop-specific transformations through dedicated managers and adaptors, allowing developers to customize pipelines via the PassBuilder API.[61]

Key transformation passes include dead code elimination, which removes unreachable or unused instructions to shrink code size; function inlining, which integrates callee bodies into callers to eliminate call overhead and enable further optimizations; constant propagation, which substitutes variables with known constant values to simplify expressions; and loop-invariant code motion, which hoists computations out of loops when they do not depend on the iteration.[63] For instance, aggressive dead code elimination (ADCE) can eliminate thousands of instructions in benchmarks like SPECint 2000, demonstrating significant impact on code bloat.[62]

Supporting these transformations are analysis passes that provide essential data without modifying the IR. Alias analysis disambiguates memory references to enable precise optimizations like global common subexpression elimination, using modular implementations such as basic alias analysis or scalar evolution analysis.[63] Control dependence analysis, often via dominator trees, identifies reachable code paths to inform transformations like dead code removal.[63] Profile-guided optimization (PGO) incorporates runtime execution profiles to guide decisions such as hot-cold code splitting or branch probability estimation, improving performance by up to 10-20% in profile-heavy workloads.[63]

For whole-program optimization, LLVM integrates link-time optimization (LTO) through ThinLTO, a scalable variant introduced in LLVM 3.9 that performs cross-module analysis without fully merging IR modules.[64] ThinLTO compiles modules to bitcode with summary indices, merges the summaries at link time to import high-value functions (e.g., for inlining), and applies middle-end passes in parallel per module, reducing link times while enabling interprocedural optimizations like devirtualization.[64] This approach supports incremental builds with caching, making it suitable for large projects.[65]

Backends and Code Generation
The LLVM backend is responsible for transforming the optimized intermediate representation (IR) from the middle-end into target-specific machine code, enabling execution on diverse hardware platforms. This process involves lowering abstract instructions into concrete machine instructions while respecting architectural constraints such as register sets, instruction formats, and memory models. The backend operates in a modular fashion, separating target-independent phases, common across all architectures, from target-specific customizations, which facilitates maintenance and extension.[36]

Central to target-independent code generation is the SelectionDAG (Directed Acyclic Graph) framework, which models computations as graphs for efficient transformation. Instruction selection employs TableGen, a domain-specific language that defines target descriptions in .td files, automatically generating C++ code for pattern matching and for lowering IR operations to machine instructions. For instance, complex operations like floating-point multiply-add are matched via predefined patterns, minimizing manual implementation and enhancing portability. Register allocation follows, mapping an unbounded set of virtual registers to a finite physical set using algorithms such as the greedy allocator or Partitioned Boolean Quadratic Programming (PBQP), with spill code inserted when demand exceeds the register file. Instruction scheduling then reorders the resulting machine instructions to optimize for latency, throughput, or resource usage, often using list scheduling on the DAG before converting to linear instruction sequences. These phases collectively produce assembly or object code via the Machine Code (MC) layer, which handles emission in formats like ELF or Mach-O.[36][66]
LLVM's backend design emphasizes portability, supporting over 20 architectures including x86, ARM, AArch64, PowerPC, MIPS, RISC-V, and AMDGPU, among others. WebAssembly support was integrated in 2015, enabling compilation to the WebAssembly binary format for web and embedded environments. This breadth is achieved through TableGen-driven descriptions that abstract hardware differences, allowing new targets to be added with minimal core modifications. The optimized IR from the middle-end serves as input, ensuring that backend transformations build on cross-target improvements without reintroducing architecture-specific biases.[67][68]
For just-in-time (JIT) compilation, LLVM provides MCJIT, introduced in 2013 as a memory-safe execution engine that dynamically loads and links machine code modules using the MC layer for object file handling. Building on this, the ORC (On-Request Compilation) JIT infrastructure, launched in 2015, offers a more flexible API for layered compilation, supporting lazy materialization and runtime code patching. Enhancements in the 2020s have extended ORC to hybrid ahead-of-time (AOT) and JIT scenarios, improving performance in dynamic language runtimes and embedded systems by enabling efficient object linking and relocation.[69][70]
Debugging support in the backend generates DWARF (Debugging With Attributed Record Formats) metadata alongside machine code, embedding source-level information such as line numbers, variable locations, and call frames into object files. This format, standardized across architectures, allows tools like GDB to reconstruct program state during execution, with LLVM's MC layer ensuring consistent emission even for JIT-generated code.[71][72]
Tools and Libraries
Linker and Runtime Support
LLVM's linker infrastructure is primarily embodied by LLD, a high-performance linker designed as a drop-in replacement for traditional system linkers such as GNU ld and gold. Introduced in 2016, LLD supports multiple object file formats including ELF, Mach-O, PE/COFF, and WebAssembly, enabling efficient production of executables across diverse platforms. Its architecture emphasizes speed through parallel processing and incremental linking capabilities, achieving over twice the performance of GNU gold on multicore systems for large-scale builds. LLD's modular design allows it to handle complex linker scripts while maintaining a compact codebase of approximately 21,000 lines of C++ in its early implementations.[73][74]

Complementing LLD, the LLVMgold plugin integrates LLVM's link-time optimization (LTO) capabilities into the GNU gold linker. It implements the gold plugin interface atop LLVM's libLTO library, allowing GCC users to leverage LLVM-based optimizations during the linking phase without switching compilers, and it interoperates with tools like ar and nm, enabling whole-program analysis and optimization for projects built with GCC. LLVMgold has been a key enabler for hybrid workflows in which LLVM enhancements augment existing GNU toolchains.[75]

On the runtime side, LLVM provides essential libraries to support program execution, particularly for error detection, debugging, and performance analysis. libunwind implements a lightweight stack unwinding mechanism critical for C++ exception handling, adhering to the Itanium ABI and supporting platforms like x86-64, ARM, and AArch64. This library enables efficient traversal of call frames during exception propagation, integrating with LLVM's exception handling model, which uses landing pads and invoke instructions in the IR.
For profiling, the profile runtime library in compiler-rt delivers runtime support for profile-guided optimization (PGO), collecting instrumentation data such as branch frequencies and function counters to inform subsequent compilation passes. It serializes profiles in formats compatible with LLVM's IR-level PGO infrastructure, aiding adjustments for better code quality.[52]

Sanitizer runtimes form another cornerstone, with AddressSanitizer (ASan) introduced in 2012 as a fast memory error detector comprising compiler instrumentation and a runtime library. ASan employs shadow memory to track addressable regions, detecting issues like buffer overflows and use-after-free errors with modest overhead, typically a 2x runtime slowdown and a 2-3x increase in memory usage on supported architectures. Other sanitizers, such as ThreadSanitizer for race detection and MemorySanitizer for uninitialized reads, rely on analogous runtime components built into LLVM's compiler-rt project. These runtimes are dynamically linked or statically incorporated, ensuring portability across Unix-like systems and Windows.[76][77]

In just-in-time (JIT) compilation scenarios, LLVM's runtime support extends to integration with dynamic loaders via components such as the ORC JIT infrastructure and JITLink. This allows generated code to resolve symbols and relocations at runtime, mimicking the behavior of system dynamic linkers (e.g., ld.so on Linux) for loading modules on demand. JITLink, in particular, handles object file loading and patching in memory, supporting formats like ELF and Mach-O to enable seamless execution in environments such as interpreters or embedded systems.[78]

Standard Libraries
LLVM's standard libraries provide implementations of key runtime components required by C and C++ programs compiled with Clang and other compatible frontends. These libraries emphasize modularity, performance, and permissive open-source licensing, enabling their use in environments ranging from embedded systems to high-performance computing.

libc++ is LLVM's modular implementation of the C++ standard library, initially released in 2011 as a high-performance alternative to existing options. It targets C++11 and later standards, prioritizing correctness as defined by the ISO specifications, fast execution, minimal memory usage, and rapid compile times. Designed for portability across platforms including macOS, Linux, Windows, FreeBSD, Android, and embedded targets, libc++ factors out OS- and CPU-specific code to facilitate cross-compilation and maintenance. By LLVM 16 in 2023, it achieved full support for the C++20 standard, including features like the spaceship operator, coroutines, and modules, with ongoing enhancements for C++23 and C++26. Its modular architecture allows selective inclusion of components, reducing binary size in constrained environments, and it includes extensive unit tests to ensure conformance.

libc++abi serves as the application binary interface (ABI) layer for libc++, implementing low-level support for C++ features such as exceptions and runtime type information (RTTI). It provides implementations for the Itanium ABI, widely used on x86 and other architectures, and the ARM EABI, ensuring compatibility with diverse hardware targets. Key functionalities include exception handling mechanisms that enable exceptions to propagate across dynamic-library boundaries (for example, defining destructors for standard classes like std::exception so that unique type_info instances are maintained) and RTTI support for type identification across module boundaries. Developed as a portable sublayer, libc++abi is ABI-compatible with existing implementations on platforms like macOS and was historically dual-licensed under the MIT and University of Illinois/NCSA Open Source Licenses, promoting broad adoption without restrictive terms.
compiler-rt (the compiler runtime library) is LLVM's library for implementing intrinsic functions and low-level runtime support, replacing parts of traditional libraries like libgcc. It provides optimized implementations for mathematical operations (e.g., floating-point conversions like __floatundidf), atomic operations for thread-safe concurrency, and sanitizer runtimes for debugging tools such as AddressSanitizer and ThreadSanitizer. For instance, it handles builtins like __builtin_trap for generating traps in optimized code paths. Written in C and assembly for performance, compiler-rt supports multiple architectures including x86-64, ARM, PowerPC, and SPARC, and operating systems like Linux, Windows, and Darwin. Its design focuses on replacing vendor-specific runtimes with a unified, high-performance alternative, and it was likewise historically dual-licensed under MIT and UIUC terms.
In comparison to GNU's libstdc++, which is tightly integrated with GCC and licensed under the GPLv3 with the GCC Runtime Library Exception, libc++ offers greater modularity through its factored design and a more permissive Apache License 2.0 with LLVM Exceptions, avoiding copyleft obligations that can complicate proprietary or mixed-license projects. While libstdc++ excels in certain areas such as I/O performance on Linux, libc++ often provides superior speed in string handling via its short-string optimization and broader portability across non-GCC compilers, making it the default for Clang on Apple platforms and increasingly in Android and other ecosystems.
