Unity build
In software engineering, a unity build (also known as unified build, jumbo build or blob build) is a method used in C and C++ software development to speed up the compilation of projects by combining multiple translation units into a single one, usually achieved by using include directives to bundle multiple source files into one larger file.
Implementation
If two different translation units FileA.cpp:
#include "Header.hpp"
// content of source file A ...
and FileB.cpp:
#include "Header.hpp"
// content of source file B ...
in the same project both include the header Header.hpp, that header will be processed twice by the compiler chain, once for each build task. If the two translation units are merged into a single source file JumboFile.cpp:
#include "FileA.cpp"
#include "FileB.cpp"
then Header.hpp will be processed only once (thanks to include guards) when compiling JumboFile.cpp.[1]
Effects
The main benefit of unity builds is a reduction of duplicated effort in parsing and compiling the content of headers that are included in more than one source file. The content of headers usually accounts for the majority of code in a source file after preprocessing. Unity builds also mitigate the overhead caused by a large number of small source files by reducing the number of object files created and processed by the compilation chain, and they allow interprocedural analysis and optimisation across the files that form the unity build task (similar to the effects of link-time optimisation). They also make it easier to detect violations of the One Definition Rule: if a symbol is defined twice in different source files in the same unity build, the compiler can identify the redefinition and emit a warning or error.
One drawback of unity builds is a larger memory footprint due to larger translation units. Larger translation units can also hurt parallel builds, since a small number of large compile jobs is generally harder or impossible to schedule so that all available parallel computing resources are saturated effectively. Unity builds can also negate part of the benefit of incremental builds, which rely on rebuilding as little code as possible, i.e. only the translation units affected by changes since the last build. These disadvantages can be offset by a large speedup of the linker stage, which no longer has to load and eliminate duplicate template codegen blocks from each translation unit; linker-stage memory use also decreases.
Unity builds can also have potentially dangerous effects on the semantics of programs. Some valid C++ constructs that rely on internal linkage may fail under a unity build, for instance clashes of static symbols, or of symbols defined in anonymous namespaces with the same identifier in different files. If different C++ files define different functions with the same name, the compiler may unexpectedly resolve the overloading by selecting the wrong function, in a way that was not possible when the software was designed with the files as separate translation units. Another adverse effect is the possible leakage of macro definitions across different source files.[2]
Build system support
Some build systems provide built-in support for automated unity builds, including Visual Studio,[3] Meson,[4] CMake,[5] and xmake.
References
- ^ Kubota et al. (2019)
- ^ Kirilov, Viktor (7 July 2018). "A guide to unity builds". Archived from the original on 2020-11-12.
- ^ Olga Arkhipova (2 July 2018). "Support for Unity (Jumbo) Files in Visual Studio 2017 15.8 (Experimental)". Microsoft.
- ^ "Unity builds".
- ^ "UNITY_BUILD - CMake 3.17.0 Documentation".
- Kubota, Takafumi; Suzuki, Yusuke; Kono, Kenji (2019). To unify or not to unify: a case study on unified builds (in WebKit). Proceedings of the 28th International Conference on Compiler Construction. doi:10.1145/3302516.3307347.
Unity build
A unity build combines multiple C or C++ source files, using #include to concatenate them into one file before compilation.[1][2] This approach contrasts with traditional builds that compile each source file separately into object files and then link them, aiming primarily to accelerate the overall build process by reducing redundant operations such as header parsing and template instantiation across files.[3]
Unity builds offer significant advantages for full project compilations, particularly in large codebases, where they can achieve speedups of up to 9-12 times compared to standard builds; for instance, in the Inkscape project, a unity build reduced compilation time from 34 minutes to under 3 minutes using CMake.[4] By treating the entire codebase as one unit, compilers can perform more aggressive interprocedural optimizations, eliminate cross-file symbol resolution during linking, and parse shared headers only once, leading to faster preprocessing and potentially better code generation.[2] However, these benefits come with trade-offs: incremental builds become slower, since any single-file change requires recompiling the entire unity file; memory usage increases; and parallel compilation of individual units is prevented.[3] Additionally, unity builds may introduce compilation errors or unexpected behavior due to issues like one definition rule (ODR) violations, symbol clashes from static variables or anonymous namespaces becoming globally visible, and stricter dependency ordering requirements among included files.[2][1]
The technique has been employed in high-profile projects for over a decade, including by Ubisoft since around 2004 for game development, as well as in WebKit and Unreal Engine to manage complex codebases efficiently.[3] Modern build systems like CMake provide native support through properties such as UNITY_BUILD, allowing configurable batching of source files into multiple unity units for partial parallelization while mitigating some drawbacks.[1] Despite these advancements, unity builds require careful code adjustments—such as avoiding internal linkage or ensuring include guards—to avoid pitfalls, making them most suitable for release builds or scenarios where full recompilation is frequent.[3]
Overview
Definition
A unity build, also known as a unified build, jumbo build, or single translation unit build, is a compilation technique used in C and C++ software development where multiple source files (typically .c or .cpp files) are concatenated or included into one or a few large files to form a single translation unit before compilation.[5][1] This approach contrasts with the traditional method of compiling each source file as a separate translation unit, which processes files independently.[5] In C and C++, a translation unit represents the fundamental unit of compilation, comprising a single source file after preprocessing, including all its header files and excluding comments, as specified in the ISO C++ standard (ISO/IEC 14882). The basic mechanism of a unity build involves the compiler treating this combined unit as a cohesive whole, which minimizes repetitive parsing of shared headers across files and facilitates whole-program optimizations that span multiple original source files.[5] A simple way to implement this is by creating a dedicated unity source file that uses #include directives to incorporate other source files, such as #include "file1.cpp" followed by #include "file2.cpp", which the compiler then processes as one entity.[5] This technique has seen adoption in large-scale projects, including game engines, to streamline builds.[6]
History
The concept of unity builds, a technique for combining multiple source files into a single translation unit to accelerate compilation in C and C++ projects, was employed in proprietary game development as early as around 2004 by Ubisoft.[3] It first gained notable attention in developer discussions around 2007. An early detailed exploration appeared in a blog post by OJ Reeves, who described unity builds as a method to dramatically reduce build times for developers working on large codebases, particularly in environments with frequent modifications.[7] This approach was initially popularized in the context of Microsoft Visual Studio, where it was referred to simply as "unity builds" to denote the consolidation of project files for faster release compilations.[8] During the 2010s, unity builds saw increased adoption in game development, driven by the demands of massive codebases in engines and tools. The technique was prominently featured in Casey Muratori's Handmade Hero project, launched in 2014, which demonstrated a single-file unity build structure to streamline compilation while building a complete game from scratch.[9] In parallel, open-source communities began using alternative terminology like "jumbo builds" to avoid confusion with the Unity game engine, as noted in Chromium's documentation, which adopted the term for its large-scale C++ codebase. 
Build systems also integrated support during this period; for instance, Meson, which emerged around 2013, added built-in unity build capabilities by 2017 to handle dependencies efficiently in incremental scenarios.[10][11] By 2018, benchmarks on projects like Chromium highlighted the technique's impact, with Clang's JumboSupport feature reporting significant reductions in header processing time—often 80-95% of compilation overhead—yielding up to 2x faster builds compared to traditional methods, albeit with a minor increase in binary size.[12] Adoption continued to grow, exemplified by Firefox's gradual integration of unity builds into its build system in 2023 to optimize compilation for its expansive codebase.[13] As of 2025, unity builds are widely employed in CI/CD pipelines for release configurations in large-scale software projects, though their use remains selective due to advancements in compiler optimizations and parallel processing that mitigate some traditional build bottlenecks.[1][14]
Implementation
Manual Methods
Manual methods for creating unity builds involve directly editing source code to combine multiple translation units into a single file, suitable for small to medium-sized projects where build system automation is not desired. A translation unit consists of a source file and all its included headers, processed by the compiler as an independent unit. This approach requires careful ordering of includes to respect dependencies and avoid compilation errors. The process begins by creating a new source file, typically named something like unity.cpp or all.cpp. In this file, use preprocessor directives to include all existing .cpp files in an order that satisfies their dependencies, ensuring that any file included earlier does not rely on symbols defined later. This dependency order prevents issues arising from circular includes, where two files mutually depend on each other. Once created, compile only this single unity file instead of the individual sources, removing the originals from the build invocation to prevent duplicate compilation.[15]
Handling dependencies is crucial for successful unity builds. Headers must employ include guards—preprocessor macros that prevent multiple inclusions of the same file—using patterns like #ifndef HEADER_NAME_H #define HEADER_NAME_H ... #endif. This avoids redefinition errors when files are concatenated. Additionally, forward declarations in headers or the unity file can reduce unnecessary inclusions; for instance, declaring class MyClass; allows using pointers or references without full class definitions, minimizing recompilation triggers.[16]
To maximize benefits, specific compiler flags can enhance optimizations within the unified translation unit. In Visual Studio, the /Gy option enables function-level linking, packaging functions into COMDAT sections for the linker to discard unused ones and order them efficiently, which is particularly effective in large unity files. For GCC or Clang, the -fwhole-program flag assumes the compilation unit represents the entire program, allowing the compiler to inline functions across what were originally separate files and apply aggressive interprocedural optimizations.[17][18]
A simple example of a unity build file for a basic project with a main function and supporting modules is shown below:
// unity.cpp
#include "math.cpp" // Math operations with class definition
#include "utils.cpp" // Utility functions depending on math's declarations
#include "main.cpp" // Contains the main() function and basic setup
math.cpp might define:
// math.cpp
#include "math.h" // Header with include guards and function prototypes
class Calculator {
public:
    int add(int a, int b) { return a + b; }
};
main.cpp (included after):
// main.cpp
#include <iostream>
int main() {
    Calculator calc;
    std::cout << "Result: " << calc.add(2, 3) << std::endl;
    return 0;
}
Because main.cpp is included after math.cpp, the Calculator class is already visible to it without a separate include of math.h.[15]
Common pitfalls in manual unity builds include duplicate symbol errors, where functions or variables with the same name across included files violate the One Definition Rule. This often occurs with static functions or global variables intended for internal use. Anonymous namespaces (namespace { ... }) limit visibility to the translation unit without external linkage, but note that files merged into the same unity unit share a single unnamed namespace, so identically named symbols can still collide there and may need renaming or per-file named namespaces. Similarly, over-reliance on the static keyword may need adjustment, as static definitions become effectively global within the unity file, potentially exposing unintended symbols.[19][8]
Automated Methods
Automated methods for unity builds employ scripts and tools to dynamically assemble multiple source files into a single translation unit, facilitating repeatability and scalability in larger C++ projects where manual approaches become unwieldy. These techniques automate file collection, concatenation, and compilation while addressing challenges like dependency tracking and partial rebuilds. By leveraging scripting languages such as Bash or Python, developers can generate unity files on demand, integrating globbing or list-based selection to include relevant sources without hardcoding paths.[20]
Scripting basics center on concatenating source files into one unit for compilation. In Bash, this can be achieved by globbing .cpp files and using the cat command to merge their contents, followed by a compiler invocation. Such scripts are lightweight and suitable for simple automation in Unix-like environments. For instance, a basic Bash script might perform the concatenation and build in sequence, ensuring the unity file is created fresh each time.[20]
#!/bin/bash
# Remove existing unity file if present
rm -f unity.cpp
# Concatenate all .cpp files
cat *.cpp > unity.cpp
# Compile the unity file
g++ -std=c++17 unity.cpp -o program
Build scripts can use glob patterns (e.g., src/*.cpp) to dynamically assemble sources, while specialized tools analyze project dependencies to form stable unity files, selecting external headers for precompilation to minimize redundant work. The cotire tool (archived since April 2025) exemplifies this by automatically deriving unity sources from project file lists and compile definitions, generating a consolidated file without source modifications.[21]
Note that with C++20 modules, unity builds via #include concatenation may not apply directly to module interfaces, necessitating tool adjustments or alternative bundling strategies.[22]
Hybrid incremental approaches mitigate the full-rebuild drawback of unity builds by re-including only changed files, using file timestamps or content hashes for detection. Timestamps compare modification times to decide if a source needs concatenation, while hashes (e.g., MD5 of file contents) offer robustness against clock skews in distributed teams. A script might scan sources, hash or timestamp-check each against a cache, and append only updates to the unity file, enabling faster iteration on modified modules while retaining unity benefits for unchanged code. Excluding frequently edited files from the unity set during development further supports this hybrid model.[23]
A simple Python script can implement this generation and compilation process, offering cross-platform flexibility and easy extension for hashing or exclusions. The following example globs .cpp files, writes them to unity.cpp, and compiles:
import glob
import subprocess
import fnmatch

# Glob patterns for files to exclude (e.g., tests or third-party)
exclusions = ['test_*.cpp', 'third_party/*.cpp']

# Glob .cpp files, filtering exclusions
all_sources = glob.glob('*.cpp')
sources = [src for src in all_sources
           if not any(fnmatch.fnmatch(src, ex) for ex in exclusions)]

# Concatenate to unity.cpp
with open('unity.cpp', 'w') as unity_file:
    for source in sources:
        with open(source, 'r') as src_file:
            unity_file.write(src_file.read() + '\n\n')

# Compile
subprocess.run(['g++', '-std=c++17', 'unity.cpp', '-o', 'program'])
Effects
Advantages
Unity builds offer substantial performance benefits in C++ compilation, primarily by streamlining the build process for large-scale projects. The core advantage lies in drastically reducing compilation times through the elimination of redundant header parsing and preprocessing. In conventional separate compilation, each source file triggers repeated parsing of shared headers, incurring significant overhead from disk I/O and CPU cycles. By merging multiple source files into a single translation unit using #include directives, the compiler processes headers only once, leading to faster full builds—especially beneficial for projects with extensive template usage or complex include graphs.[24][10]
Quantitative benchmarks on real-world C++ codebases illustrate these gains, often achieving speedups of 2x or more for complete compilations. For example, in builds involving changes to multiple files, such as modifying 101 files in the iscool-core library (over 1,000 source files), unity builds reduced debug compilation time from 154 seconds to 70 seconds, while release builds saw similar proportional improvements.[25] Such accelerations are particularly pronounced in template-heavy code, where redundant instantiations are minimized, making unity builds ideal for release configurations, continuous integration pipelines, and game development projects spanning millions of lines of code.[24]
Linking phases also benefit, as fewer object files are produced—typically one or a handful instead of hundreds—lowering the linker's workload and enabling quicker resolution of symbols. This is especially impactful for massive codebases, where traditional linking can become a bottleneck.[10][1]
Additionally, unity builds enhance optimization opportunities by presenting the compiler with a holistic view of the codebase. This allows for superior interprocedural analyses, such as cross-file function inlining that would otherwise be infeasible, and more effective dead code elimination across the entire program. For instance, functions defined in one original source file can be inlined into callers from another, reducing runtime overhead in ways not possible with isolated translation units.[10][26]
Disadvantages
Unity builds, while accelerating full compilations, introduce significant drawbacks in development workflows, particularly for iterative changes. A primary limitation is the slowdown in incremental builds: modifying a single source file within a unity unit necessitates recompiling the entire aggregated translation unit, often negating the time savings for developers who frequently edit code during active work. This can extend recompilation times from seconds to minutes in large projects, as the compiler must reprocess all included files despite only one being altered.[27][28] Larger translation units also elevate peak memory usage during compilation, as the compiler holds the entire unit in RAM simultaneously, potentially straining systems with limited resources. For substantial codebases, this can result in higher RAM demands compared to compiling files separately, leading to swapping or failures on machines with insufficient memory and thereby prolonging overall build times on low-end hardware.[28] Semantic alterations pose another challenge, as unity builds remove traditional translation unit boundaries, changing program behavior in ways that require code modifications. Additionally, static variables, inline functions, or internal linkage symbols with identical names across files can trigger multiple definition errors or One Definition Rule (ODR) violations, necessitating adjustments such as renaming symbols or using unique namespaces to resolve conflicts. 
Preprocessor directives and namespace usages in source files can similarly clash, further complicating integration.[1][28][27] Debugging becomes more arduous with unity builds, as error messages and stack traces reference the monolithic unit rather than individual files, making it harder to isolate issues within the broader codebase without additional tools or separate compilation for diagnosis.[8] Finally, compatibility issues arise with third-party libraries, many of which rely on isolated translation units and may not compile correctly in a unity context due to symbol collisions or ODR problems, often requiring library-specific workarounds or exclusion from unity aggregation. Although unity builds can halve full build times in clean compilations, these trade-offs frequently limit their adoption in mixed or library-heavy projects.[28]
Build System Support
CMake Integration
CMake provides native support for unity builds starting with version 3.16, released in 2019, through the UNITY_BUILD target property, which enables the automatic combination of multiple source files into batches for compilation.[1] When enabled, CMake generates intermediate unity source files that include the original sources, reducing the number of separate compilations while preserving the semantics of individual file builds.[1] This feature is available for C and CXX languages from version 3.16, with extensions to CUDA (3.31), OBJC (3.29), and OBJCXX (3.29).[1]
To enable unity builds for a specific target, set the UNITY_BUILD property to ON using set_target_properties after calling add_library or add_executable. For example:
add_library(my_library source1.cpp source2.cpp source3.cpp)
set_target_properties(my_library PROPERTIES UNITY_BUILD ON)
Alternatively, unity builds can be enabled globally with the variable CMAKE_UNITY_BUILD set to ON via the CMake command line (e.g., cmake -DCMAKE_UNITY_BUILD=ON ..), which initializes the property on newly created targets; however, projects should avoid hardcoding this variable and instead rely on developer overrides for machine-specific optimization.[29]
Unity build grouping is controlled by the UNITY_BUILD_MODE property, introduced in CMake 3.18, with two primary modes: BATCH (default, where CMake automatically partitions sources) and GROUP (explicit partitioning via the UNITY_GROUP source property).[30] In BATCH mode, the UNITY_BUILD_BATCH_SIZE property limits the number of sources per unity file (default 8), allowing customization for large projects by increasing the size to balance memory usage and parallelism—e.g., set_target_properties(my_library PROPERTIES UNITY_BUILD_BATCH_SIZE 16).[31] In GROUP mode, sources are assigned to buckets using set_source_files_properties with the UNITY_GROUP property, such as:
set_source_files_properties(source1.cpp source2.cpp PROPERTIES UNITY_GROUP "group1")
set_source_files_properties(source3.cpp PROPERTIES UNITY_GROUP "group2")
set_target_properties(my_library PROPERTIES UNITY_BUILD_MODE GROUP)
Alternatively, BATCH mode with a higher batch size creates multiple unity files, distributing the workload across cores without exceeding compiler limits.[31]
Conditional unity builds can integrate with generator expressions for environment-specific enabling, such as set_target_properties(my_library PROPERTIES UNITY_BUILD $<BOOL:${ENABLE_UNITY}>) where ENABLE_UNITY is a cache variable.[32] Hybrid builds, mixing unity and individual compilations, are supported by setting the SKIP_UNITY_BUILD_INCLUSION source property to ON for specific files via set_source_files_properties(file.cpp PROPERTIES SKIP_UNITY_BUILD_INCLUSION ON), ensuring they compile separately even when the target uses unity builds.[33] Developers can test configurations by toggling CMAKE_UNITY_BUILD in the cache or per-target properties, regenerating the build directory to observe effects without altering project code.[29]
Prior to CMake 3.16, unity builds required manual implementation using custom commands to generate unity source files and set_source_files_properties to mark original sources as HEADER_FILE_ONLY ON, excluding them from direct compilation while including them via #include in the generated unity file. This approach, often scripted in CMakeLists.txt, allowed batching but lacked automation, leading to the adoption of third-party modules like cotire for streamlined setup via its cotire() function.[21] With native support, such manual methods are now deprecated for new projects.[1]
Meson and Other Systems
Meson provides built-in support for unity builds, enabling developers to accelerate compilation by combining multiple source files into fewer units without requiring code modifications.[10] To activate this feature, set unity: true within a target's declaration in the meson.build file, such as executable('myapp', 'main.c', unity: true), or use the --unity command-line option during project configuration.[10] Upon enabling, Meson automatically generates unity source files—typically by creating intermediate header files that include the original sources (e.g., #include "src1.c")—and compiles these concatenated units, with a default grouping of up to four files per unit adjustable via the unity_size option.[10]
Visual Studio offers native support for unity builds, referred to as jumbo builds, introduced experimentally in version 2017 update 15.8 and stabilized in subsequent releases.[5] This is configured via project settings in the IDE: navigate to Configuration Properties > C/C++ > Advanced > Enable Unity (JUMBO) Build and set it to "Yes (/u)", which merges multiple C++ source files into one or more larger files before compilation.[34] For enhanced performance, unity builds integrate with precompiled headers using the /Yu compiler flag, allowing shared headers to be precompiled and reused across the unified compilation units.
Premake facilitates unity builds primarily for Visual Studio generators through the enableunitybuild "On" directive in its Lua-based scripts, automatically applying the jumbo build property to project configurations.[35] For broader or custom implementations across other backends like GNU Make or Ninja, developers can leverage Premake's Lua scripting to create custom targets: for instance, generate a single unity.cpp file by concatenating sources via a premake action or post-build command, then include it in the build while excluding originals.[36]
The GN build system, employed in large-scale projects like Chromium, implements unity builds via "jumbo" targets to merge numerous source files and reduce compilation overhead.[27] Configuration involves importing //build/config/jumbo.gni in the BUILD.gn file and replacing standard targets (e.g., source_set) with equivalents like jumbo_source_set("name", { sources = [ ... ] }), which uses a Python script to concatenate up to 200 files per unit by default.[27] Exclusions for files causing symbol clashes are handled with jumbo_excluded_sources.[27]
In comparison to CMake's extensive customization options for unity builds, Meson's approach prioritizes simplicity, requiring only a single boolean flag in build files for activation without additional scripting.[37] GN's adoption in Chromium exemplifies its utility for massive codebases, where jumbo builds have contributed to measurable reductions in full compilation times.[27]
Older build systems such as GNU Make offer no native unity build functionality, instead relying on custom Makefile rules to manually concatenate sources (e.g., via cat commands) into a single file for compilation. Similarly, legacy versions of Apache Ant lack built-in support for unity-style merging in Java or C++ projects, necessitating custom Ant tasks or external scripts as workarounds.