GNU Compiler Collection
| GNU Compiler Collection | |
|---|---|
| Screenshot of GCC 10.2 compiling its own source code | |
| Original author | Richard Stallman |
| Developer | GNU Project |
| Initial release | March 22, 1987[1] |
| Stable release | 15.2[2] |
| Repository | |
| Written in | C, C++[3] |
| Operating system | Cross-platform |
| Platform | GNU and many others |
| Size | ~15 million LOC[4] |
| Available in | English |
| Type | Compiler |
| License | GPLv3+ with GCC Runtime Library Exception[5] |
| Website | gcc |
The GNU Compiler Collection (GCC) is a collection of compilers from the GNU Project that support various programming languages, hardware architectures, and operating systems. The Free Software Foundation (FSF) distributes GCC as free software under the GNU General Public License (GNU GPL). GCC is a key component of the GNU toolchain which is used for most projects related to GNU and the Linux kernel. With roughly 15 million lines of code in 2019, GCC is one of the largest free programs in existence.[4] It has played an important role in the growth of free software, as both a tool and an example.
When it was first released in 1987 by Richard Stallman, GCC 1.0 was named the GNU C Compiler since it only handled the C programming language.[1] It was extended to compile C++ in December of that year. Front ends were later developed for Objective-C, Objective-C++, Fortran, Ada, Go, D, Modula-2, Rust and COBOL among others.[6] The OpenMP and OpenACC specifications are also supported in the C and C++ compilers.[7][8]
As well as being the official compiler of the GNU operating system, GCC has been adopted as the standard compiler by many other modern Unix-like computer operating systems, including most Linux distributions. Most BSD family operating systems also switched to GCC shortly after its release, although since then, FreeBSD and Apple macOS have moved to the Clang compiler,[9] largely due to licensing reasons.[10][11][12] GCC can also compile code for Windows, Android, iOS, Solaris, HP-UX, AIX, and MS-DOS compatible operating systems.[13]
GCC has been ported to more platforms and instruction set architectures than any other compiler, and is widely deployed as a tool in the development of both free and proprietary software. GCC is also available for many embedded systems, including ARM-based and Power ISA-based chips.
History
In late 1983, in an effort to bootstrap the GNU operating system, Richard Stallman asked Andrew S. Tanenbaum, the author of the Amsterdam Compiler Kit (also known as the Free University Compiler Kit), for permission to use that software for GNU. When Tanenbaum advised him that the compiler was not free, and that only the university was free, Stallman decided to work on a different compiler.[14] His initial plan was to rewrite an existing compiler from Lawrence Livermore National Laboratory from Pastel to C with some help from Len Tower and others.[15][16] Stallman wrote a new C front end for the Livermore compiler, but then realized that it required megabytes of stack space, an impossibility on a 68000 Unix system with only 64 KB, and concluded he would have to write a new compiler from scratch.[15] None of the Pastel compiler code ended up in GCC, though Stallman did use the C front end he had written.[15][17]
GCC was first released March 22, 1987, available by FTP from MIT.[18] Stallman was listed as the author but cited others for their contributions, including Tower for "parts of the parser, RTL generator, RTL definitions, and of the Vax machine description", Jack Davidson and Christopher W. Fraser for the idea of using RTL as an intermediate language, and Paul Rubin for writing most of the preprocessor.[19] Described as the "first free software hit" by Peter H. Salus, the GNU compiler arrived just at the time when Sun Microsystems was unbundling its development tools from its operating system, selling them separately at a higher combined price than the previous bundle, which led many of Sun's users to buy or download GCC instead of the vendor's tools.[20] While Stallman considered GNU Emacs as his main project, by 1990 GCC supported thirteen computer architectures, was outperforming several vendor compilers, and was used commercially by several companies.[21]
EGCS fork
As GCC was licensed under the GPL, programmers wanting to work in other directions, particularly those writing interfaces for languages other than C, were free to develop their own fork of the compiler, provided they met the GPL's terms, including its requirements to distribute source code. Multiple forks proved inefficient and unwieldy, however, and the difficulty in getting work accepted by the official GCC project was greatly frustrating for many, as the project favored stability over new features.[22] The FSF kept such close control on what was added to the official version of GCC 2.x (developed since 1992) that GCC was used as one example of the "cathedral" development model in Eric S. Raymond's essay The Cathedral and the Bazaar.
In 1997, a group of developers formed the Experimental/Enhanced GNU Compiler System[citation needed] (EGCS) to merge several experimental forks into a single project.[22][17] The basis of the merger was a development snapshot of GCC (taken around the 2.7.2 release and later followed up to the 2.8.1 release). Mergers included g77 (Fortran), PGCC (P5 Pentium-optimized GCC),[17] many C++ improvements, and many new architectures and operating system variants.[23]
While both projects followed each other's changes closely, EGCS development proved considerably more vigorous, so much so that the FSF officially halted development on their GCC 2.x compiler, blessed EGCS as the official version of GCC, and appointed the EGCS project as the GCC maintainers in April 1999. With the release of GCC 2.95 in July 1999 the two projects were once again united.[24][17] GCC has since been maintained by a varied group of programmers from around the world under the direction of a steering committee.[25]
GCC 3 (2002) removed a front-end for CHILL due to a lack of maintenance.[26]
Before version 4.0 the Fortran front end was g77, which only supported FORTRAN 77, but later was dropped in favor of the new GNU Fortran front end that supports Fortran 95 and large parts of Fortran 2003 and Fortran 2008 as well.[27][28]
As of version 4.8, GCC is implemented in C++.[29]
Support for Cilk Plus existed from GCC 5 to GCC 7.[30][31]
GCC has been ported to a wide variety of instruction set architectures, and is widely deployed as a tool in the development of both free and proprietary software. GCC is also available for many embedded systems, including Symbian (called gcce),[32] ARM-based, and Power ISA-based chips.[33] The compiler can target a wide variety of platforms, including video game consoles such as the PlayStation 2,[34] Cell SPE of PlayStation 3,[35] and Dreamcast.[36] It has been ported to "more than 60 platforms".[37]
Supported languages
As of the 15.1 release, GCC includes front ends for C (gcc), C++ (g++), Objective-C, Objective-C++, Fortran (gfortran), Ada (GNAT), Go (gccgo), D (gdc, since 9.1),[38][39] Modula-2 (gm2, since 13.1),[40][41] Rust (gccrs, since 15.1) and COBOL (gcobol, since 15.1) programming languages,[42] with the OpenMP and OpenACC parallel language extensions being supported since GCC 5.1.[8][43] Versions prior to GCC 7 also supported Java (gcj), allowing compilation of Java to native machine code.[44]
Third-party front ends exist for many languages, such as ALGOL 68 (ga68),[45] Pascal (gpc), Mercury, Modula-3, VHDL (GHDL) and PL/I.[42] A few experimental branches exist to support additional languages, such as the GCC UPC compiler for Unified Parallel C.[46][47][better source needed]
Regarding language version support for C++, since GCC 11.1 the default target is gnu++17, a superset of C++17, and for C, since GCC 15 the default target is gnu23, a superset of C23, with strict standard support also available. GCC also provides experimental support for C2Y, C++20, C++23, and C++26.[48]
Design

GCC's external interface follows Unix conventions. Users invoke a language-specific driver program (gcc for C, g++ for C++, etc.), which interprets command arguments, calls the actual compiler, runs the assembler on the output, and then optionally runs the linker to produce a complete executable binary.
Each of the language compilers is a separate program that reads source code and outputs machine code. All have a common internal structure. A per-language front end parses the source code in that language and produces an abstract syntax tree ("tree" for short).
These are, if necessary, converted to the middle end's input representation, called GENERIC form; the middle end then gradually transforms the program towards its final form. Compiler optimizations and static code analysis techniques (such as FORTIFY_SOURCE,[49] a compiler directive that attempts to discover some buffer overflows) are applied to the code. These work on multiple representations, mostly the architecture-independent GIMPLE representation and the architecture-dependent RTL representation. Finally, machine code is produced using architecture-specific pattern matching originally based on an algorithm of Jack Davidson and Chris Fraser.
GCC was written primarily in C except for parts of the Ada front end. The distribution includes the standard libraries for Ada and C++ whose code is mostly written in those languages.[50][needs update] On some platforms, the distribution also includes a low-level runtime library, libgcc, written in a combination of machine-independent C and processor-specific machine code, designed primarily to handle arithmetic operations that the target processor cannot perform directly.[51]
GCC uses many additional tools in its build, many of which are installed by default by many Unix and Linux distributions (but which, normally, aren't present in Windows installations), including Perl,[further explanation needed] Flex, Bison, and other common tools. In addition, it currently requires three additional libraries to be present in order to build: GMP, MPC, and MPFR.[52]
In May 2010, the GCC steering committee decided to allow use of a C++ compiler to compile GCC.[53] The compiler was intended to be written mostly in C plus a subset of features from C++. In particular, this was decided so that GCC's developers could use the destructors and generics features of C++.[54]
In August 2012, the GCC steering committee announced that GCC now uses C++ as its implementation language.[55] This means that to build GCC from sources, a C++ compiler that understands the ISO/IEC C++03 standard is required.
On May 18, 2020, GCC moved from the ISO/IEC C++03 standard to the ISO/IEC C++11 standard as the requirement for compiling (bootstrapping) the compiler itself; by default, however, the compiler accepts later versions of C++.[56]
Front ends
Each front end uses a parser to produce the abstract syntax tree of a given source file. Due to the syntax tree abstraction, source files of any of the different supported languages can be processed by the same back end. GCC started out using LALR parsers generated with Bison, but gradually switched to hand-written recursive-descent parsers for C++ in 2004,[57] and for C and Objective-C in 2006.[58] As of 2021 all front ends use hand-written recursive-descent parsers.
Until GCC 4.0, the tree representation of the program was not fully independent of the processor being targeted. The meaning of a tree was somewhat different for different language front ends, and front ends could provide their own tree codes. This was simplified with the introduction of GENERIC and GIMPLE, two new forms of language-independent trees that were introduced with the advent of GCC 4.0. GENERIC is more complex, based on the GCC 3.x Java front end's intermediate representation. GIMPLE is a simplified GENERIC, in which various constructs are lowered to multiple GIMPLE instructions. The C, C++, and Java front ends produce GENERIC directly in the front end. Other front ends instead have different intermediate representations after parsing and convert these to GENERIC.
In either case, the so-called "gimplifier" then converts this more complex form into the simpler SSA-based GIMPLE form that is the common language for a large number of language- and architecture-independent global (function scope) optimizations.
GENERIC and GIMPLE
GENERIC is an intermediate representation language used as a "middle end" while compiling source code into executable binaries. A subset, called GIMPLE, is targeted by all the front ends of GCC.
The middle stage of GCC does all of the code analysis and optimization, working independently of both the compiled language and the target architecture, starting from the GENERIC[59] representation and expanding it to register transfer language (RTL). The GENERIC representation contains only the subset of the imperative programming constructs optimized by the middle end.
In transforming the source code to GIMPLE,[60] complex expressions are split into a three-address code using temporary variables. This representation was inspired by the SIMPLE representation proposed in the McCAT compiler[61] by Laurie J. Hendren[62] for simplifying the analysis and optimization of imperative programs.
Optimization
Optimization can occur during any phase of compilation; however, the bulk of optimizations are performed after the syntax and semantic analysis of the front end and before the code generation of the back end; thus a common, though somewhat self-contradictory, name for this part of the compiler is the "middle end."
The exact set of GCC optimizations varies from release to release as it develops, but includes the standard algorithms, such as loop optimization, jump threading, common subexpression elimination, instruction scheduling, and so forth. The RTL optimizations are of less importance with the addition of global SSA-based optimizations on GIMPLE trees,[63] as RTL optimizations have a much more limited scope, and have less high-level information.
Optimizations performed at this level include dead-code elimination, partial-redundancy elimination, global value numbering, sparse conditional constant propagation, and scalar replacement of aggregates. Array dependence based optimizations such as automatic vectorization and automatic parallelization are also performed. Profile-guided optimization is also possible.[64]
C++ Standard Library (libstdc++)
The GCC project includes an implementation of the C++ Standard Library called libstdc++,[65] licensed under the GPLv3 License with an exception to link non-GPL applications when sources are built with GCC.[66]
Other features
Some features of GCC include:
- Link-time optimization
- Link-time optimization optimizes across object file boundaries to directly improve the linked binary. Link-time optimization relies on an intermediate file containing the serialization of some Gimple representation included in the object file.[citation needed] The file is generated alongside the object file during source compilation. Each source compilation generates a separate object file and link-time helper file. When the object files are linked, the compiler is executed again and uses the helper files to optimize code across the separately compiled object files.
- Plugins
- Plugins extend the GCC compiler directly.[67] Plugins allow a stock compiler to be tailored to specific needs by external code loaded as plugins. For example, plugins can add, replace, or even remove middle-end passes operating on Gimple representations.[68] Several GCC plugins have already been published, notably:
- The support of plugins was once a contentious issue in 2007.[70]
- C++ transactional memory
- The C++ language has an active proposal for transactional memory. It can be enabled in GCC 6 and newer when compiling with -fgnu-tm.[7][71]
- Unicode identifiers
- Although the C++ language requires support for non-ASCII Unicode characters in identifiers, the feature has only been supported since GCC 10. As with the existing handling of string literals, the source file is assumed to be encoded in UTF-8. The feature is optional in C, but has been made available too since this change.[72][73]
- C extensions
- GNU C extends the C programming language with several non-standard features, including nested functions.[74]
Architectures
The primary supported (and best tested) processor families are 64- and 32-bit ARM, 64- and 32-bit x86 (x86_64 and x86), and 64-bit PowerPC and SPARC.[75]
GCC target processor families as of version 11.1 include:[76]
Lesser-known target processors supported in the standard release have included:
Additional processors have been supported by GCC versions maintained separately from the FSF version:
- Cortus APS3
- ARC
- AVR32
- C166 and C167
- D10V
- EISC
- eSi-RISC
- Hexagon[77]
- LatticeMico32
- LatticeMico8
- MeP
- MicroBlaze
- Motorola 6809
- MSP430
- NEC SX architecture[78]
- Nios II and Nios
- OpenRISC
- PDP-10
- PIC24/dsPIC
- PIC32
- Propeller
- Saturn (HP48XGCC)
- System/370
- TIGCC (m68k variant)
- TMS9900
- TriCore
- Z8000
- ZPU
The GCJ Java compiler can target either a native machine language architecture or the Java virtual machine's Java bytecode.[79] When retargeting GCC to a new platform, bootstrapping is often used. Motorola 68000, Zilog Z80, and other processors are also targeted in the GCC versions developed for various Texas Instruments, Hewlett Packard, Sharp, and Casio programmable graphing calculators.[80]
License
GCC is licensed under the GNU General Public License version 3.[81] The GCC runtime exception permits compilation of proprietary programs (in addition to free software) with GCC headers and runtime libraries. This does not impact the license terms of GCC source code.[82]
However, this exception is limited. For example, when GPL-incompatible software (such as a proprietary plugin) is used together with GCC during the compilation process, following the GPL becomes mandatory for all propagated object code that GCC generated, as that code is derived from the GPL-licensed libraries.[83]
References
- ^ a b "GCC Releases". GNU Project. Archived from the original on June 4, 2023. Retrieved July 24, 2020.
- ^ Richard Biener (August 8, 2025). "GCC 15.2 Released". Retrieved August 8, 2025.
- ^ "GCC Coding Conventions - GNU Project". gcc.gnu.org. Archived from the original on May 28, 2023. Retrieved February 7, 2022.
- ^ a b Víctor Rodríguez (October 1, 2019). "Cutting Edge Toolchain (Latest Features in GCC/GLIBC)". youtube.com. Linux Foundation. Archived from the original on November 7, 2021. Retrieved January 19, 2021.
- ^ "GCC Runtime Library Exception". Archived from the original on March 31, 2023. Retrieved July 24, 2020.
- ^ "Programming Languages Supported by GCC". GNU Project. Archived from the original on January 18, 2023. Retrieved June 23, 2014.
- ^ a b "GCC 6 Release Series — Changes, New Features, and Fixes - GNU Project". gcc.gnu.org. Archived from the original on September 22, 2016. Retrieved September 19, 2016.
- ^ a b "OpenACC - GCC Wiki". gcc.gnu.org. Archived from the original on April 1, 2015. Retrieved September 19, 2016.
- ^ "The LLVM Compiler Infrastructure Project". llvm.org. Archived from the original on January 18, 2023. Retrieved September 24, 2021.
- ^ "Apple's GPLv3 purge". meta.ath0.com. February 5, 2012. Archived from the original on January 18, 2023. Retrieved January 12, 2021.
- ^ Linnemann, Reid (June 20, 2012). "Why Clang". Archived from the original on January 18, 2023. Retrieved January 12, 2021.
- ^ "August 29, 2007: FreeBSD Foundation Newsletter, August 29, 2007". October 11, 2007. Archived from the original on October 11, 2007. Retrieved January 12, 2021.
- ^ "Installing GCC: Binaries - GNU Project - Free Software Foundation (FSF)". gcc.gnu.org. Archived from the original on January 5, 2021. Retrieved January 12, 2021.
- ^ von Hagen, William (2006). The Definitive Guide to GCC. Definitive Guides (2nd ed.). Apress. p. XXVII. ISBN 978-1-4302-0219-6. Archived from the original on April 5, 2024. Retrieved September 25, 2020.
So he wrote to VUCK's author asking if GNU could use it. Evidently, VUCK's developer was uncooperative, responding that the university was free but that the compiler was not.
- ^ a b c Stallman, Richard (September 20, 2011). "About the GNU Project". The GNU Project. Archived from the original on August 9, 2019. Retrieved October 9, 2011.
- ^ Puzo, Jerome E., ed. (February 1986). "Gnu's Zoo". GNU's Bulletin. 1 (1). Free Software Foundation. Archived from the original on June 23, 2015. Retrieved August 11, 2007.
- ^ a b c d von Hagen, William (2006). The Definitive Guide to GCC. Definitive Guides (2nd ed.). Apress. p. XXVII. ISBN 978-1-4302-0219-6. Archived from the original on April 5, 2024. Retrieved September 25, 2020.
- ^ Richard M. Stallman (forwarded by Leonard H. Tower Jr.) (March 22, 1987). "GNU C compiler beta test release". Newsgroup: comp.lang.c. Archived from the original on June 2, 2013. Retrieved October 9, 2011.
- ^ Stallman, Richard M. (June 22, 2001) [First published 1988], "Contributors to GNU CC", Using and Porting the GNU Compiler Collection (GCC), Free Software Foundation, Inc., p. 7, archived from the original on January 18, 2023, retrieved June 18, 2015.
- ^ Salus, Peter H. (2005). "Chapter 10. SUN and gcc". The Daemon, the Gnu and the Penguin. Groklaw. Archived from the original on June 20, 2022. Retrieved September 14, 2015.
- ^ Garfinkel, Simson L. (August 6, 1990). "Get ready for GNU software". Computerworld. p. 102.
- ^ a b Henkel-Wallace, David (August 15, 1997), A new compiler project to merge the existing GCC forks, archived from the original on January 18, 2023, retrieved May 25, 2012.
- ^ "The Short History of GCC development". www.softpanorama.org. Archived from the original on November 9, 2022. Retrieved January 24, 2021.
- ^ "History - GCC Wiki". gcc.gnu.org. Archived from the original on January 18, 2023. Retrieved September 28, 2020.
- ^ "GCC steering committee - GNU Project". gcc.gnu.org. Archived from the original on January 18, 2023. Retrieved July 25, 2016.
- ^ "[PATCH] Remove chill". gcc.gnu.org. Archived from the original on October 20, 2016. Retrieved July 29, 2010.
- ^ "Chart of Fortran 2003 Features supported by GNU Fortran". GNU. Archived from the original on January 18, 2023. Retrieved June 25, 2009.
- ^ "Chart of Fortran 2008 Features supported by GNU Fortran". GNU. Archived from the original on January 18, 2023. Retrieved June 25, 2009.
- ^ "GCC 4.8 Release Series — Changes, New Features, and Fixes - GNU Project". gcc.gnu.org. Archived from the original on December 8, 2015. Retrieved February 17, 2015.
- ^ "GCC 5 Release Series — Changes, New Features, and Fixes". gcc.gnu.org. Archived from the original on January 18, 2023. Retrieved January 13, 2022.
- ^ "GCC 8 Release Series — Changes, New Features, and Fixes". gcc.gnu.org. Archived from the original on November 29, 2018. Retrieved January 13, 2022.
- ^ "Symbian GCC Improvement Project". Archived from the original on August 1, 2014. Retrieved November 8, 2007.
- ^ "Linux Board Support Packages". Archived from the original on June 7, 2011. Retrieved January 24, 2021.
- ^ "setting up gcc as a cross-compiler". ps2stuff. June 8, 2002. Archived from the original on December 11, 2008. Retrieved December 12, 2008.
- ^ "CompileFarm - GCC Wiki". gcc.gnu.org. Archived from the original on January 18, 2023. Retrieved September 19, 2016.
- ^ "sh4 g++ guide". Archived from the original on December 20, 2002. Retrieved December 12, 2008.
- ^ "Linux Information Project". LINFO. Archived from the original on January 3, 2023. Retrieved April 27, 2010.
The GCC has been ported to (i.e., modified to run on) more than 60 platforms, which is more than for any other compiler.
- ^ "GCC 9 Release Series — Changes, New Features, and Fixes - GNU Project". Archived from the original on February 19, 2022. Retrieved May 7, 2019.
- ^ "The D Language Front-End Finally Merged Into GCC 9 - Phoronix". phoronix.com. Archived from the original on May 17, 2022. Retrieved January 19, 2021.
- ^ "GCC 13 Release Series — Changes, New Features, and Fixes - GNU Project". Archived from the original on May 26, 2023. Retrieved June 23, 2023.
- ^ Proven, Liam (December 16, 2022). "GCC 13 to support Modula-2: Follow-up to Pascal lives on in FOSS form". Archived from the original on December 19, 2022. Retrieved December 19, 2022.
- ^ a b "GCC Front Ends". gnu.org. Archived from the original on January 18, 2023. Retrieved November 25, 2011.
- ^ "GCC 5 Release Series — Changes, New Features, and Fixes - GNU Project". gcc.gnu.org. Archived from the original on January 18, 2023. Retrieved April 23, 2015.
- ^ "GCC 7 Release Series". gnu.org. Archived from the original on September 2, 2020. Retrieved March 20, 2018.
- ^ E. Marchesi, Jose. "GCC Wiki: Algol 68 Front-End". gcc.gnu.org.
- ^ "GCC UPC (GCC Unified Parallel C)". Intrepid Technology, Inc. February 20, 2006. Archived from the original on February 11, 2010. Retrieved March 11, 2009.
- ^ Spengler, Brad (January 12, 2021). "Open Source Security, Inc. Announces Funding of GCC Front-End for Rust". Archived from the original on April 25, 2021.
- ^ "C++ Standards Support in GCC". Archived from the original on April 20, 2022. Retrieved May 17, 2021.
- ^ "Security Features: Compile Time Buffer Checks (FORTIFY_SOURCE)". fedoraproject.org. Archived from the original on January 7, 2007. Retrieved March 11, 2009.
- ^ "languages used to make GCC". Archived from the original on May 27, 2008. Retrieved September 14, 2008.
- ^ "GCC Internals". GCC.org. Archived from the original on January 18, 2023. Retrieved March 1, 2010.
- ^ "Prerequisites for GCC - GNU Project". gcc.gnu.org. Archived from the original on January 18, 2023. Retrieved September 5, 2021.
- ^ "GCC allows C++ – to some degree". The H. June 1, 2010. Archived from the original on September 26, 2022. Retrieved June 9, 2010.
- ^ "Re: Efforts to attract more users?". lists.gnu.org. Archived from the original on January 18, 2023. Retrieved September 24, 2021.
- ^ "GCC 4.8 Release Series: Changes, New Features, and Fixes". Archived from the original on December 8, 2015. Retrieved October 4, 2013.
- ^ "bootstrap: Update requirement to C++11". GitHub. Archived from the original on September 29, 2022. Retrieved May 18, 2020.
- ^ "GCC 3.4 Release Series — Changes, New Features, and Fixes - GNU Project". gcc.gnu.org. Archived from the original on January 18, 2023. Retrieved July 25, 2016.
- ^ "GCC 4.1 Release Series — Changes, New Features, and Fixes - GNU Project". gcc.gnu.org. Archived from the original on January 18, 2023. Retrieved July 25, 2016.
- ^ "GENERIC (GNU Compiler Collection (GCC) Internals)". gcc.gnu.org. Archived from the original on January 18, 2023. Retrieved July 25, 2016.
- ^ "GIMPLE (GNU Compiler Collection (GCC) Internals)". gcc.gnu.org. Archived from the original on January 18, 2023. Retrieved July 25, 2016.
- ^ "McCAT". Archived from the original on August 12, 2004. Retrieved September 14, 2017.
- ^ "Laurie Hendren's Home Page". www.sable.mcgill.ca. Archived from the original on September 27, 2022. Retrieved July 20, 2009.
- ^ Novillo, Diego (December 2004). "From Source to Binary: The Inner Workings of GCC". Red Hat Magazine. Archived from the original on April 1, 2009.
- ^ "Installing GCC: Building - GNU Project". gcc.gnu.org. Archived from the original on August 22, 2023. Retrieved July 25, 2016.
- ^ "The GNU C++ Library". GNU Project. Archived from the original on December 25, 2022. Retrieved February 21, 2021.
- ^ "License". GNU Project. Archived from the original on January 18, 2023. Retrieved February 21, 2021.
- ^ "Plugins". GCC online documentation. Archived from the original on April 30, 2013. Retrieved July 8, 2013.
- ^ Starynkevitch, Basile. "GCC plugins thru the MELT example" (PDF). Archived (PDF) from the original on April 13, 2014. Retrieved April 10, 2014.
- ^ "About GCC MELT". Archived from the original on July 4, 2013. Retrieved July 8, 2013.
- ^ Corbet, Jonathan (November 19, 2007). "GCC unplugged [LWN.net]". lwn.net. Archived from the original on November 9, 2020. Retrieved March 28, 2021.
- ^ "TransactionalMemory - GCC Wiki". gcc.gnu.org. Archived from the original on August 19, 2016. Retrieved September 19, 2016.
- ^ "Lewis Hyatt - [PATCH] wwwdocs: Document support for extended identifiers added to GCC". gcc.gnu.org. Archived from the original on March 27, 2020. Retrieved March 27, 2020.
- ^ "Recommendations for extended identifier characters for C and C++". www.open-std.org. Archived from the original on September 30, 2020. Retrieved March 27, 2020.
- ^ "C Extensions (Using the GNU Compiler Collection (GCC))". gcc.gnu.org. Archived from the original on January 12, 2022. Retrieved January 12, 2022.
- ^ "GCC 12 Release Criteria". gcc.gnu.org. October 26, 2022. Archived from the original on January 27, 2023. Retrieved January 27, 2023.
- ^ "Option Summary (Using the GNU Compiler Collection (GCC))". gcc.gnu.org. Archived from the original on January 18, 2023. Retrieved August 21, 2020.
- ^ "Hexagon Project Wiki". Archived from the original on March 23, 2012. Retrieved May 19, 2011.
- ^ "Google Code Archive - Long-term storage for Google Code Project Hosting". code.google.com. Archived from the original on September 25, 2022. Retrieved September 24, 2021.
- ^ "The GNU Compiler for the Java Programming Language". Archived from the original on May 9, 2007. Retrieved April 22, 2010.
- ^ graphing calculators#programming
- ^ "Using the GNU Compiler Collection". gnu.org. Archived from the original on November 16, 2023. Retrieved November 5, 2019.
- ^ "GCC Runtime Exception". FSF. Archived from the original on April 16, 2014. Retrieved April 10, 2014.
- ^ "GCC Runtime Library Exception Rationale and FAQ - GNU Project - Free Software Foundation". www.gnu.org. Free Software Foundation. Section "How the Exception Works". Archived from the original on July 25, 2025. Retrieved July 31, 2025.
How the Exception Works (...) You have permission to propagate a work of Target Code formed by combining the Runtime Library with Independent Modules, even if such propagation would otherwise violate the terms of GPLv3, provided that all Target Code was generated by Eligible Compilation Processes. You may then convey such a combination under terms of your choice, consistent with the licensing of the Independent Modules. (...) However, if you used GCC in conjunction with GPL-incompatible software during the process of transforming high-level code to low-level code, that would not be an Eligible Compilation Process. This would happen if, for example, you used GCC with a proprietary plugin. (...) As long as you use an Eligible Compilation Process, then you have permission to take the Target Code that GCC generates and propagate it "under terms of your choice." If you did use GPL-incompatible software in conjunction with GCC during the Compilation Process, you would not be able to take advantage of this permission. Since all of the object code that GCC generates is derived from these GPLed libraries, that means you would be required to follow the terms of the GPL when propagating any of that object code. You could not use GCC to develop your own GPL-incompatible software.
Further reading
- Using the GNU Compiler Collection (GCC), Free Software Foundation, 2008.
- GNU Compiler Collection (GCC) Internals, Free Software Foundation, 2008.
- An Introduction to GCC, Network Theory Ltd., 2004 (Revised August 2005). ISBN 0-9541617-9-3.
- Arthur Griffith, GCC: The Complete Reference. McGraw Hill / Osborne, 2002. ISBN 0-07-222405-3.
External links
Other
- Collection of GCC 4.0.2 architecture and internals documents at I.I.T. Bombay
- Kerner, Sean Michael (March 2, 2006). "New GCC Heavy on Optimization". internetnews.com
- Kerner, Sean Michael (April 22, 2005). "Open Source GCC 4.0: Older, Faster". internetnews.com. Archived from the original on September 17, 2006. Retrieved October 21, 2006
- From Source to Binary: The Inner Workings of GCC, by Diego Novillo, Red Hat Magazine, December 2004
- A 2003 paper on GENERIC and GIMPLE
- Marketing Cygnus Support, an essay covering GCC development for the 1990s, with 30 monthly reports in the "Inside Cygnus Engineering" section near the end
- EGCS 1.0 announcement
- EGCS 1.0 features list
- Fear of Forking, an essay by Rick Moen recording seven well-known forks, including the GCC/EGCS one
GNU Compiler Collection
History
Origins and Initial Development (1980s)
The GNU Project, aimed at creating a complete free Unix-compatible operating system, was announced by Richard Stallman on September 27, 1983, via a Usenet posting that outlined the need for user-modifiable tools to replace proprietary software prevalent in computing at the time.[8][9] Among the planned components was a compiler, as existing C compilers—such as those from Bell Labs for PDP-11 systems or DEC for VAX machines—were not freely redistributable or modifiable, restricting collaborative development and user freedoms in an era dominated by licensed binaries from vendors like DEC, Sun Microsystems, and AT&T.[8] Stallman emphasized that GNU would prioritize software licenses allowing unlimited copying and modification, addressing the absence of free alternatives for essential tools like compilers amid the growing reliance on C for Unix-like systems.[8] The GNU C Compiler (GCC), originally developed solely for the C language, emerged as a foundational GNU tool to enable compilation of the project's other components, with work beginning in 1986 under Stallman's leadership using resources like an MIT-provided VAX 11/750.[10] The first public beta release, version 0.9, was distributed on March 22, 1987, supporting C compilation targeted at DEC VAX systems and Sun Microsystems' 68k-based workstations (Sun-1 through Sun-3) running BSD-derived Unix variants.[11][12] This release marked GCC's portability focus, generating assembly code adaptable to these platforms' architectures, though it lacked advanced optimizations present in proprietary counterparts.[11] Initial bootstrapping posed significant hurdles, as GCC's C-written source required an extant C compiler for initial builds; developers thus depended on vendor-supplied proprietary compilers—such as DEC's VAX C or Sun's own tools—to cross-compile GCC, verifying its output before achieving self-hosting capability where later versions could compile themselves.[10] This reliance highlighted the project's early 
vulnerability to non-free tools, yet successful bootstraps on VAX and Sun hardware validated GCC's design for incremental self-sufficiency, paving the way for broader GNU toolchain integration by late 1987.[10]
Expansion and Challenges (1990s)
During the 1990s, GCC expanded beyond its initial C focus to support additional languages, with significant advancements in C++ through the g++ front end, originally developed by Michael Tiemann and maintained by figures like Jason Merrill, enabling more robust native-code compilation for object-oriented features. Fortran support was introduced via the g77 front end, maintained by Craig Burley, which integrated Fortran 77 compatibility and laid groundwork for handling legacy scientific codebases, reflecting growing demands from high-performance computing communities. These additions broadened GCC's utility, allowing it to compile diverse codebases while sharing a common back end for optimizations across targets.[4] A major challenge emerged with the formation of the Experimental/Enhanced GNU Compiler System (EGCS) fork on August 15, 1997, initiated by developers frustrated with the Free Software Foundation's (FSF) stewardship of GCC, which emphasized excessive stability and conservative release cycles at the expense of innovation. Critics, including Linux kernel hackers and Fortran maintainers, argued that FSF's single-gatekeeper model stifled rapid feature integration and architectural improvements needed for emerging platforms, leading to parallel development snapshots and community divergence. Cygnus Solutions, a commercial entity providing engineering support, played a pivotal role by hosting the EGCS mailing lists, contributing ports to over 175 host/target combinations, and sustaining development through paid expertise without relying on FSF monopoly, thus enabling faster experimentation.[4] The fork highlighted governance tensions but spurred progress, as EGCS incorporated enhancements like better C++ parsing and Fortran integration more aggressively. 
After negotiations, EGCS merged back into GCC in April 1999, with the FSF appointing the EGCS team as official maintainers; this reunion prompted the renaming to GNU Compiler Collection to reflect its multi-language scope, culminating in the GCC 2.95 release in July 1999, which unified the codebase and resolved the schism while underscoring the need for balanced community-driven evolution over rigid central control.[4]
Maturation and Recent Advances (2000s–Present)
GCC continued its evolution in the 2000s as a robust multi-language compiler suite, with major releases emphasizing enhanced optimizations, standard compliance, and portability. The GCC 4.0 series, first released on April 20, 2005, marked a pivotal advancement by introducing tree-level Static Single Assignment (SSA) form, enabling more sophisticated interprocedural optimizations and improved code generation across languages like C and C++.[13][14] This release also bolstered C++ support, aligning closer with the ISO/IEC 14882 standard through better template handling and exception mechanisms.[14] Subsequent versions in the decade, such as GCC 4.1 through 4.8, refined these capabilities with annual iterations focused on stability, vectorization for multicore processors, and initial accommodations for 64-bit architectures and SIMD instructions.[13] From the 2010s onward, GCC maintained a cadence of yearly major releases, adapting to modern hardware through extended instruction set support (e.g., AVX, ARMv8) and runtime libraries like libgomp for parallel computing.[13] The project emphasized regression testing and maintainer-driven feature freezes to ensure reliability for enterprise and embedded deployments. 
By the GCC 11–15 series (spanning 2021–2025), enhancements included refined middle-end transformations for energy-efficient code on heterogeneous systems and broader OpenMP 5.x conformance for directive-based parallelism.[15] The GCC 15.1 release on April 25, 2025, exemplified ongoing maturation by integrating a new COBOL front-end (gcobol), limited to 64-bit x86-64 and AArch64 targets due to complexity in handling legacy fixed-point arithmetic and procedural dialects.[16][15] It also featured Rust front-end refinements via the gccrs project, improving borrow checker integration and codegen for safe concurrency, alongside vectorization boosts for large-scale data processing.[17] Initial work-in-progress patches for an Algol-68 front-end were submitted in January 2025, aiming to revive the language's parallel modes and strong typing within GCC's infrastructure, though full integration remains pending upstream review.[18]

As of October 2025, GCC 16 development was scheduled to enter stage 3 on November 17, 2025, restricting changes to bug fixes, new target ports (e.g., emerging RISC-V extensions), and performance regressions to preserve stability ahead of the anticipated 2026 release.[19] This phased approach underscores GCC's commitment to empirical validation through extensive bootstrap testing and community-driven ports, ensuring compatibility with evolving hardware like AI accelerators without compromising backward compatibility.[20]
Supported Languages
Primary Languages and Front Ends
The GNU Compiler Collection's primary front ends target C, C++, Fortran, Ada, Go, and Objective-C, each leveraging GCC's shared middle-end and back-end infrastructure for optimization and code generation across diverse architectures. C and C++ remain the foundational languages, with the gcc driver handling C compilation since Richard Stallman's GCC 1.0 release in May 1987 (following the March 1987 beta), establishing it as a portable alternative to proprietary compilers.[2] The C++ front end, invoked via g++, originated as an extension and achieved early integration by GCC 2.0 in 1992, prioritizing standards conformance to facilitate widespread adoption in systems programming.[21] These front ends preprocess source code, parse it into abstract syntax trees, and feed intermediate representations into GCC's optimization passes, ultimately invoking Binutils tools like as for assembly and ld for linking to produce executables.[1]
GCC provides full support for the ISO C11 and C17 standards via gcc, with substantial implementation of C23 features including attributes and bit-precise integers as of GCC 15.1 released in April 2025.[22] For C++, g++ offers complete conformance to C++11, C++14, and C++17; near-complete support for C++20 (including concepts and coroutines); and partial, still experimental implementation of C++23 language and library features such as std::expected, alongside early work on the C++26 draft that is expected to be finalized in 2026.[23] This evolution reflects iterative improvements driven by ISO committee feedback and community testing, ensuring GCC's role as a de facto reference for standards validation despite occasional divergences for performance reasons.[23]
The Fortran front end, gfortran, entered GCC with version 4.0 in April 2005, superseding the legacy g77 to deliver modern conformance to Fortran 2003, 2008, and 2018 standards, including parallel constructs like DO CONCURRENT and team-based coarrays.[24] As of GCC 15.1, gfortran experimentally supports select Fortran 2023 features, such as enhanced interoperability with C, positioning it as a robust choice for scientific computing workloads integrated with libraries like LAPACK.[24]
Ada compilation occurs through the GNAT front end, which grew out of work sponsored by the Ada Joint Program Office and has been part of the main GCC tree since version 3.1 in 2002, providing full Ada 95 and Ada 2012 compliance alongside partial Ada 2022 support for contracts and expression functions in GCC 13 and later.[25] GNAT emphasizes static verification and safety-critical reliability, generating code that links seamlessly with C via foreign function interfaces.
For Go, the gccgo front end, introduced in GCC 4.6 in 2011, parses Go 1-compatible syntax and utilizes GCC's back end for optimization, an area where it can outperform the reference gc compiler, though it lags in adopting the latest Go module features.[26] Objective-C support, available since GCC 1.x in the early 1990s, enables compilation of Objective-C and Objective-C++ via gcc or g++, selected by the .m and .mm source file extensions or the -x objective-c option, targeting GNU Objective-C runtimes for non-Apple platforms like GNUstep, with dialect options aligning to Objective-C 2.0 constructs such as fast enumeration.[27] These front ends collectively underscore GCC's maturity in handling production-grade code for embedded, desktop, and high-performance systems.[1]
Extensions, Experimental, and Recent Additions
The GNU Compiler Collection includes experimental front ends for languages beyond its core offerings, such as D, which was integrated as a full front end starting with GCC 9 in 2019 but retains limitations in optimization and standard compliance compared to dedicated compilers like DMD.[28] Similarly, the Rust front end, known as gccrs, provides partial support as an alternative to rustc, with significant updates merged for GCC 15 in 2025 enabling compilation of substantial Rust codebases, though it lacks full feature parity and is actively developed toward upstream integration, including efforts to compile the Linux kernel.[29][30]

In April 2025, GCC 15.1 introduced a COBOL front end, marking the first native integration of this legacy language into the collection, developed by Symas' COBOLworx team with over 134,000 lines of code; however, support is restricted to 64-bit x86-64 and AArch64 targets and aims for partial COBOL 2023 compliance, excluding advanced I/O enhancements available only through proprietary extensions.[1][31][32] This addition reflects community-driven revival of niche languages for modernization of legacy systems, though practical use requires awareness of its incomplete runtime and platform limitations.[33]

Ongoing experimental efforts include an Algol-68 front end, proposed in January 2025 by an Oracle engineer with initial patches covering core syntax and semantics of the 1968 language standard; despite updates through October 2025, it has not been merged into mainline GCC due to steering committee decisions prioritizing stability, remaining available via external patches for niche historical or educational compilation.[34][35] Previously, the Java front end (GCJ) served as an experimental native compiler but was removed from the source tree in 2016, ahead of the GCC 7 release, owing to stalled development and lack of maintenance.[36]

Within established languages like C++, recent standards introduce experimental or incomplete features; for instance, C++20 modules, while usable for many projects in GCC 15 as of 2025, face ongoing criticisms for internal compiler errors, build system incompatibilities, and incomplete standard library integration, hindering widespread adoption despite header unit support.[23][37] These extensions underscore GCC's modular architecture enabling community contributions, but users must verify feature maturity via release notes, as incomplete implementations can lead to unreliable code generation or portability issues.[13]
Technical Architecture
Front-End Parsing and Language-Specific Processing
The GNU Compiler Collection (GCC) utilizes modular front-ends tailored to individual programming languages, enabling the parsing of source code into structured internal representations such as abstract syntax trees (ASTs) or equivalent tree structures specific to each language's syntax and semantics.[38][39] Each front-end is invoked once per compilation unit through hooks like lang_hooks.parse_file, performing lexical analysis to tokenize input, syntactic parsing to build the tree hierarchy, and initial semantic checks to enforce language rules such as type compatibility and scope resolution.[38] This language-specific processing ensures accurate validation of constructs unique to the source language, including dialects and extensions, before passing validated declarations and definitions onward.[38]
In the C and C++ front-ends, preprocessing occurs via the integrated cpp module, which expands macros, resolves include directives, and applies conditional compilation as defined in standards like ISO C99 or C11, prior to tokenization and parsing of the refined input stream.[40] The C parser, transitioned from a Bison-generated implementation to a hand-written recursive descent parser in GCC 4.1 released in 2006, constructs parse trees while accommodating GNU extensions such as nested functions, case ranges in switch statements, and attributes for function properties.[41] Similarly, the C++ front-end (cp) employs a custom parser to handle object-oriented features, templates, and exceptions, validating compliance with ISO C++ standards alongside GNU-specific enhancements like __attribute__ directives.[38]
Front-ends for other languages, such as Fortran (gfortran), Ada, or Go, implement analogous parsing pipelines adapted to their grammars; for instance, some rely on tools like GNU Bison for generating parsers during build, as required for components like the COBOL front-end added in GCC 15 in 2025.[42] These parsers prioritize fidelity to language semantics, including array handling in Fortran or package modules in Go, without incorporating cross-language optimizations.[38] Semantic phases during parsing detect errors like undeclared identifiers or mismatched types, ensuring the resulting tree captures the program's intended structure accurately.[38] This separation maintains GCC's extensibility, allowing community-contributed front-ends for experimental languages to interface via standardized hooks while preserving language purity.[39]
Middle-End Intermediate Representations and Transformations
The middle-end of the GNU Compiler Collection (GCC) utilizes intermediate representations (IRs) to enable language-independent optimizations, decoupling front-end parsing from back-end code generation. The foundational high-level IR is GENERIC, a tree-based structure produced by front-ends to represent program semantics in a manner abstracted from specific source languages.[43] GENERIC trees encode expressions, statements, types, and control flow using a hierarchical node system, facilitating initial semantic checks and basic transformations while preserving essential program structure.

From GENERIC, the compiler generates GIMPLE, a simplified, tuple-oriented IR restricted to three-address forms with at most three operands per statement, which canonicalizes complex expressions into sequences of basic operations.[44] This form supports precise data-flow and alias analyses by eliminating side effects in expressions and introducing temporaries as needed. GIMPLE's tree foundation allows for recursive traversal and manipulation, underpinning optimizations like constant folding and common subexpression elimination.[45]

GCC further refines GIMPLE into GIMPLE SSA (Static Single Assignment), where each variable assignment occurs exactly once, creating explicit versions (e.g., x_1, x_2) to track definitions and uses across basic blocks.[43] Introduced via the Tree SSA framework developed between 2003 and 2004, this representation enhances optimization precision by enabling efficient computation of dominators, phi functions for merging values, and sparse conditional constant propagation.[46] The SSA form's explicit flow dependencies reduce the need for iterative fixpoint analyses, accelerating passes on large functions.
Optimization passes in the middle-end primarily target GIMPLE SSA, performing transformations such as function inlining to reduce call overhead and expose cross-function redundancies, dead code elimination via removal of unreachable blocks and unused computations, and loop optimizations including induction variable analysis, invariant hoisting, and unrolling for improved cache locality.[43] These passes iterate over the control flow graph derived from GIMPLE trees, applying peephole-like rewrites and global analyses to minimize execution time and code size, with effects verifiable through flags like -fdump-tree-optimized.[44]
The shift to tree-based IRs, culminating in GENERIC and GIMPLE around the early 2000s, marked an evolution from GCC's prior reliance on lower-level RTL for all optimizations, enabling higher-level, context-sensitive analyses that better exploit modern hardware features across languages. This design supports modular pass scheduling, where optimizations are organized into phases (e.g., early inlining before vectorization), ensuring incremental improvements without full recompilation.[43]
Back-End Code Generation and Optimization
The GCC back-end generates architecture-specific machine code from the Register Transfer Language (RTL), a low-level intermediate representation that expresses computations as transfers between registers, memory, and constants, closely mirroring assembly-level operations. RTL instruction chains, organized into basic blocks, are produced by expanding the middle-end representation into sequences of set, call, jump, and other primitives, enabling subsequent target-dependent transformations.[47] This representation supports both machine-independent RTL passes, such as common subexpression elimination, and architecture-specific code generation via pattern matching against instruction descriptions in .md files.[48]
Instruction selection occurs through the gen_* tools, which compile machine descriptions into efficient dispatch tables for matching RTL patterns to target instructions, often incorporating constraints on operands and registers. Following selection, the back-end performs optimizations like RTL combine for peephole-style pattern replacement and simplification, reload for resolving constraint violations post-scheduling, and instruction scheduling via list or region schedulers to minimize pipeline stalls and fill delay slots on architectures like MIPS or SPARC. Register allocation employs graph coloring algorithms, with heuristics for spilling and live range splitting, tailored to the target's register file size and calling conventions defined in target macros.[48][47]
Target-specific optimizations extend to vector code generation, where RTL vector operations are lowered to SIMD instructions, such as x86 SSE/AVX extensions or ARM NEON/SVE, respecting architecture vector lengths and alignment requirements during expansion and scheduling. Profile-guided optimization (PGO), enabled via -fprofile-generate and -fprofile-use, propagates execution frequencies into RTL passes to bias decisions like basic block reordering for locality, branch target prediction, and function partitioning into hot/cold sections, with performance gains on the order of 10-20% reported for profiled workloads.[49] In GCC 15, back-end enhancements include refined AArch64 scheduling for SVE vectorization and broader improvements in code quality across targets, contributing to SPEC benchmark uplifts of over 11% in floating-point rates through better instruction selection and emission.[15][50]
Associated Libraries and Runtimes
GCC includes libgcc, a low-level runtime library distributed as libgcc.a (static) or libgcc_s.so.1 (shared on supported platforms), which supplies routines automatically invoked by compiler-generated code for hardware-unsupported operations such as integer division and multiplication on certain architectures, stack unwinding for exception handling, and synchronization primitives like atomic operations.[51] This library is distinct from the system C standard library (e.g., glibc or musl), focusing solely on compiler-specific runtime dependencies rather than general-purpose standard functions.[3]
For C++ compilation via g++, GCC provides libstdc++, the GNU implementation of the ISO/IEC 14882 C++ standard library, covering clauses 17 through 33 (including containers, algorithms, iterators, and I/O streams) along with annexes for compatibility and numerics. libstdc++ incorporates extensions from technical reports such as TR1 (e.g., unordered containers and regular expressions, later standardized in C++11) and supports ongoing C++ standards like C++23 features via headers introduced under P1642. It maintains ABI compatibility policies, with stable interfaces since GCC 3.4 (2004) and subsequent policy-defined epochs to minimize binary breakage across compiler versions.
Additional language-specific runtimes bundled with GCC include libgfortran for Fortran intrinsic procedures and array handling, libgo for Go concurrency primitives like goroutines, and libobjc for Objective-C runtime support, all integrated during the GCC build process to enable self-contained compilation targets without external dependencies for core language features.[52] These libraries ensure portability across GCC-supported architectures by providing architecture-agnostic abstractions over target-specific implementations, such as multilib variants for different ABI models (e.g., 32-bit vs. 64-bit).
GCC supports the creation of freestanding programs that avoid dependencies on standard startup files and runtime libraries through the -nostdlib option. This flag disables automatic linking of startup files (such as crt*.o) and default libraries (including libc and libgcc), enabling minimal executables that do not rely on the standard C runtime environment. Such programs are useful for operating system kernels, bootloaders, or embedded systems.[3]
On Linux x86_64, a minimal program that exits immediately without invoking libc can be implemented as follows:
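void _start(void)
{
    /* exit(0) through the raw Linux x86-64 syscall interface:
       rax = 60 (exit), rdi = exit status */
    asm volatile(
        "mov $60, %%rax\n\t"
        "xor %%rdi, %%rdi\n\t"
        "syscall"
        ::: "rax", "rdi", "rcx", "r11", "memory"   /* syscall also overwrites rcx and r11 */
    );
    __builtin_unreachable();   /* the exit syscall does not return */
}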
gcc -nostdlib minimal.c -o minimal
./minimal terminates with exit code 0.
A similar example that outputs "Hello, world!\n" using the write syscall before exiting:
void _start(void)
{
    /* static storage keeps the string off the stack, so no stack-protector
       code (which would need libc's __stack_chk_fail) is generated */
    static const char msg[] = "Hello, world!\n";
    asm volatile(
        "mov $1, %%rax\n\t"       /* syscall 1: write */
        "mov $1, %%rdi\n\t"       /* fd 1: standard output */
        "lea %0, %%rsi\n\t"       /* buffer address; msg is operand 0 */
        "mov $14, %%rdx\n\t"      /* byte count, excluding the trailing NUL */
        "syscall\n\t"
        "mov $60, %%rax\n\t"      /* syscall 60: exit */
        "xor %%rdi, %%rdi\n\t"    /* exit status 0 */
        "syscall"
        :
        : "m"(msg)
        : "rax", "rdi", "rsi", "rdx", "rcx", "r11", "memory"
    );
    __builtin_unreachable();      /* the exit syscall does not return */
}
gcc -nostdlib hello.c -o hello
Target Architectures and Platforms
Historical and Core Supported Architectures
The GNU Compiler Collection (GCC) originated in 1987 with support for the VAX and Motorola 68000-series (including the 68020) architectures, reflecting its roots in Unix-like systems prevalent on those platforms.[4] The initial release, version 0.9 on March 22, 1987, targeted DEC VAX minicomputers and Sun Microsystems' 68k-based workstations (Sun-1 through Sun-3), enabling compilation of C code without proprietary tools.[4] By version 1.0 in May 1987, these ports were refined for CISC machines like VAX and m68k, prioritizing portability across early Unix environments.[4]

Expansion accelerated in the late 1980s and 1990s, incorporating RISC architectures such as SPARC (first ported in 1988) and MIPS, alongside x86 variants like the Intel 80386 (supported from GCC 1.27 in 1988).[4] By 1990, GCC encompassed thirteen distinct architectures, driven by community contributions and commercial efforts like Cygnus Support, which by the mid-1990s enabled over a dozen target backends and dozens of host-target combinations.[4] This modular backend design, using machine descriptions to generate code for diverse instruction sets, facilitated ports to embedded systems (e.g., MIPS for network routers) and high-performance computing targets like PowerPC.[53]

Core modern architectures maintained in GCC include x86 and x86-64 (IA-32 and AMD64), ARM (32- and 64-bit variants), RISC-V, and PowerPC, which receive regular updates and testing for mainstream operating systems and embedded applications.[53] These form the backbone for Linux distributions, Android, and server workloads, with configurable backends supporting over 50 primary architectures and hundreds of variants through target triples (e.g., specifying CPU models and ABIs).[53] Declining architectures like IA-64 (Itanium), once prominent in enterprise servers, were marked obsolete starting with GCC 14 in 2024, reflecting reduced hardware adoption despite maintained compatibility in prior releases.[53] GCC's breadth underscores its role in cross-compilation, with ongoing community ports ensuring viability for niche embedded targets like AVR and Blackfin.[53]
Porting Process and Community Contributions
The porting of GCC to a new target architecture centers on implementing a backend that translates the middle-end's Register Transfer Language (RTL) intermediate representation into target-specific assembly instructions, while the frontend and optimization passes remain largely shared and architecture-agnostic. Developers define the target through configuration files (e.g., config.gcc), machine description files (.md) specifying instruction patterns, predicates, and constraints, and C header files (e.g., machine.h) for hooks like register allocation and calling conventions. This modular backend design minimizes changes to GCC's core, requiring coordination with related tools like GNU Binutils for assembler support, often ported first to handle generated assembly.[48][54]
A notable example is the RISC-V instruction set architecture, which received full upstream support in GCC 10, released on May 6, 2020, enabling comprehensive compilation for its base and extension instructions after iterative community refinements from earlier partial integrations. Efforts for emerging ISAs, such as custom RISC designs or extensions like vector processing, follow similar multi-stage bootstrapping: validating basic code generation, optimizing for performance, and integrating via patches reviewed by GCC maintainers. These ports typically involve initial self-hosting on simulators or existing hardware before full validation.
Contributions to GCC ports arise from a decentralized ecosystem of volunteers and corporations, including Red Hat engineers who maintain targets for Linux distributions on architectures like ARM and PowerPC, and Intel developers focusing on x86 enhancements and experimental ports. This distributed model, coordinated through the GCC mailing lists and copyright assignments to the Free Software Foundation, avoids centralized monopoly by relying on merit-based patch acceptance rather than proprietary control, with corporate involvement tied to self-interest in compatible ecosystems rather than overarching governance.[55]
Licensing and Legal Framework
GPL Licensing and Copyleft Principles
The GNU Compiler Collection (GCC) is licensed under the GNU General Public License version 3 (GPLv3) or any later version, a shift implemented with the release of GCC 4.2.2 in October 2007 following the final GPLv3 text on June 29, 2007.[56][57] This copyleft license mandates that users who modify and distribute GCC or derivative works must provide the corresponding source code under the same terms, preserving the four essential freedoms: to run the program, study and modify it, redistribute copies, and distribute modified versions.[57] The "viral" aspect of copyleft enforces openness by treating combined works as derivatives, thereby preventing enclosure of modifications behind proprietary restrictions and ensuring perpetual access to improvements for the community.[57]

GCC was previously distributed under GPLv2, but the upgrade to GPLv3 addressed evolving threats to software freedom, such as hardware restrictions on modified software (tivoization) and patent risks, while maintaining compatibility with prior versions via explicit clauses.[57] Certain components, like runtime libraries (e.g., libgcc and libstdc++), incorporate a specific GCC Runtime Library Exception, permitting the compilation and distribution of non-GPL programs, including proprietary ones, without requiring those outputs to adopt GPLv3 terms, provided the exception's conditions are met.[58] This exception, currently at version 3.1, evolved from earlier informal permissions under GPLv2, reflecting a pragmatic balance to facilitate GCC's role as a toolchain without unduly restricting compiled binaries.[59]

Enforcement of GPLv3 for GCC falls under the Free Software Foundation's (FSF) community-oriented approach, prioritizing education and compliance over litigation, with reports of violations handled through requests for source code disclosure rather than immediate suits.[60] The FSF holds copyrights on substantial portions of GCC, enabling coordinated defense of terms, though empirical cases specific to GCC modifications remain limited in public record, underscoring the license's deterrent effect through transparency requirements.[61] These principles have sustained GCC's development as a communal resource, as modifications distributed without source, such as in embedded or forked distributions, violate the license, compelling eventual openness or reversion to upstream.[62]
Dual-Licensing Options and Compatibility Issues
GCC incorporates the GCC Runtime Library Exception to the GPLv3, specifically version 3.1 dated March 31, 2009, which applies to key runtime libraries such as libgcc, libstdc++, libgfortran, libgomp, libdecnumber, and libgcov.[59] This exception permits combining these libraries with "Independent Modules" (code that is not derived from or based on the runtime library) when the target code is produced by an "Eligible Compilation Process" (one that uses GCC alone or together with other GPL-compatible software), allowing the resulting target code to be conveyed under terms chosen by the developer, including proprietary licenses.[59] As a result, proprietary applications compiled with GCC can link against these libraries without triggering full GPL copyleft obligations on the entire program, provided no proprietary plugins or incompatible elements are used in the compilation itself.[56] This mechanism addresses usability concerns by enabling widespread adoption in commercial software development, where strict GPL enforcement would otherwise prohibit integration.[59]
While GCC's core codebase remains under GPLv3, following the project's transition from GPLv2 with the release of GCC 4.2.2 in 2007, the runtime exception functions as a targeted compatibility layer rather than formal dual-licensing, which would offer explicit alternatives like a proprietary option.[59] No such dual-licensing scheme exists for GCC itself, distinguishing it from projects that provide both copyleft and permissive variants to licensees.[56] The exception has ensured that binaries produced by standard GCC usage remain unencumbered by GPL requirements, permitting distribution under any terms without relicensing the output as free software.[63]
Compatibility challenges have arisen from GPLv3's anti-tivoization provisions in Section 6, which require distributors of "User Products" (consumer devices, including many embedded systems) to provide the necessary installation information, such as signing keys or firmware update mechanisms, so that users can run modified versions of the included GPL-licensed software.[64] Tivoization refers to hardware restrictions that verify software signatures to block unauthorized modifications, a practice the FSF views as undermining user freedoms even though the source code remains available, as it was under GPLv2.[64] Embedded vendors have criticized this requirement, arguing that it compromises device security, intellectual property protection, and reliability by potentially exposing systems to unverified code alterations.[65] The FSF counters that such measures preserve the essential right to modify and reinstall software, rejecting hardware-imposed limitations as contrary to free software principles.[64]
GCC's GPLv3 adoption amplified these tensions for vendors compiling GPL components into locked-down firmware, though the runtime exception mitigates direct impacts on proprietary binaries.[66] No significant forks of GCC have emerged solely from licensing disputes, unlike responses to the GPLv3 shift in other projects; instead, some systems such as FreeBSD retained older GPLv2 versions (e.g., GCC 4.2.1 from 2007) to avoid compatibility hurdles.[67] This contrasts with more permissive frameworks like LLVM, which under the Apache License 2.0 avoids copyleft and tivoization constraints, making proprietary extensions possible without special exceptions.[68] The FSF maintains that GCC's structure upholds copyleft integrity while pragmatically supporting diverse use cases through the exception, without diluting freedoms for derivative works.[59]
Adoption and Impact
Role in Operating Systems and Embedded Systems
GCC has been the primary compiler for building the Linux kernel since its initial development in 1991, when Linus Torvalds used GCC version 1.x to compile early versions on Minix-derived systems. The kernel's Makefile invokes GCC by default and relies on its language extensions, inline assembly support, and optimization flags to generate performant code across architectures like x86, ARM, and RISC-V. This foundational role extends to the kernel's use in diverse environments, including servers, desktops, and embedded devices, where GCC's stable output ensures bootable and reliable binaries. As of the kernel 6.x releases in 2023–2025, GCC 5.1 remains the minimum supported version, with newer GCC releases tested via the kernel's build bot infrastructure.
In Linux distributions, GCC forms the core of the development toolchain, compiling the kernel, system libraries like glibc, and the bulk of user-space applications distributed through package managers such as APT in Debian-based systems or DNF in Fedora. It powers the build processes for over 90% of open-source packages in major repositories, as evidenced by dependency graphs in distro-specific build farms, ensuring interoperability within the GNU ecosystem. For BSD variants like FreeBSD and NetBSD, GCC was historically the default compiler through the 2000s but has been largely supplanted by Clang since around 2010 because of GPLv3 licensing incompatibilities with BSD's permissive model, though GCC remains available for legacy or specific portability needs.
GCC's cross-compilation capabilities underpin its dominance in embedded systems and IoT, where it generates code for targets like ARM Cortex-M, AVR, and MIPS processors used in microcontrollers and sensors. Toolchains such as those from the Embedded GNU Project or vendor-specific variants (e.g., for Raspberry Pi or ESP32) use GCC's backend for low-level optimizations, enabling efficient firmware for battery-constrained devices; empirical benchmarks show GCC producing binaries with comparable or superior code density to alternatives in resource-limited scenarios. This extends to mobile ecosystems, where early Android Native Development Kit (NDK) versions from 2009–2016 defaulted to GCC for compiling C/C++ libraries, supporting app portability before the 2017 shift to Clang for better security features like address sanitization. Additionally, projects like MinGW-w64 use GCC to cross-compile native Windows executables from Linux hosts, facilitating hybrid development workflows. Before 2012, macOS releases shipped GCC (up to version 4.2) in Xcode for native application builds, prior to Apple's adoption of Clang to avoid GPL constraints.[69][70]
Influence on Software Development Practices
GCC's prompt implementation of ISO C and C++ standards has shaped developer practices by enabling early adoption of standardized features, reducing reliance on vendor-specific extensions. Beginning with partial support for C99 features such as inline functions and designated initializers in GCC 3.x releases around 2001–2003, the compiler provided a free reference for testing compliance against the 1999 standard finalized by ISO/IEC.[71] Substantially complete conformance arrived with GCC 4.5 in 2010, including options like -std=c99 for strict mode, which encouraged portable coding habits over the dialect-specific workarounds prevalent in proprietary compilers of the era.[22] This progression pushed the software industry toward prioritizing standards-compliant code, as developers could compile and optimize against GCC's open implementation without licensing barriers.
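The two C99 features named above, inline functions and designated initializers, can be illustrated with a short sketch. The type and function names (rect, make_rect, rect_area, sparse_sum) are hypothetical and chosen only for this example; the build line in the comment is one plausible invocation, not a prescribed one.

```c
/* Assumed build line for this sketch: gcc -std=c99 -pedantic -Wall -c c99_demo.c */
#include <assert.h>
#include <stddef.h>

/* Hypothetical type used only for this illustration. */
struct rect {
    int x, y;
    int width, height;
};

/* C99 designated initializer for a struct: members are named explicitly,
 * and the members left out (x and y) are zero-initialized. */
static inline struct rect make_rect(int width, int height)
{
    struct rect r = { .width = width, .height = height };
    return r;
}

/* C99 inline function; "static inline" keeps the definition local to this
 * translation unit, so no separate external definition is required. */
static inline int rect_area(const struct rect *r)
{
    assert(r != NULL);
    return r->width * r->height;
}

/* C99 designated initializer for an array: only indices 1 and 3 are set,
 * the remaining elements are zero. The loop variable declared inside the
 * for statement is also a C99 feature. */
static inline int sparse_sum(void)
{
    int v[5] = { [1] = 10, [3] = 30 };
    int total = 0;
    for (int i = 0; i < 5; i++)
        total += v[i];
    return total;
}
```

Compiling the same file with -std=c90 -pedantic should make GCC diagnose these constructs, which is one way to see what the strict mode options mentioned above are for.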
In C++, GCC's aggressive support for draft and ratified standards has similarly accelerated the shift to modern idioms. For instance, by GCC 4.7 in 2012, core C++11 features like auto, lambdas, and nullptr were available, shortly after the standard's 2011 publication, allowing practitioners to integrate concurrency and generic programming earlier than with many commercial alternatives.[23] By GCC 5 (2015), C++14 enhancements such as variable templates were available, and ongoing experimental support for C++23 in GCC 13 (2023) and later continues this pattern, promoting practices like range-based algorithms and modules for modular, maintainable codebases.[23] This leadership in conformance has also shaped tooling ecosystems, as libraries and frameworks standardized their interfaces on the assumption that GCC would be available, embedding standards-centric development into open-source workflows.
As the cornerstone of the GNU toolchain, GCC facilitated the free software movement by enabling bootstrapping—self-compilation from source code—which democratized software creation and distribution. Released initially in 1987, GCC allowed developers to build entire systems, including the GNU operating system components and Linux kernel starting from 1991, without proprietary compilers, thus removing a key barrier to collaborative, source-available projects.[42] This capability fostered merit-based contribution models, where code quality determined integration rather than institutional affiliation, exemplified by GCC's own build process requiring a prior compiler but yielding a verified, optimized successor.[42] Over 35 years, this open bootstrapping paradigm has sustained GCC's evolution through community patches and testing, outperforming closed-source compilers in adaptability and feature longevity, as evidenced by its continued dominance in compiling over 15 million lines of production codebases.[1]
Comparisons with Alternatives
Architectural Differences with LLVM/Clang
GCC employs a monolithic architecture in which frontends, optimization passes, and backends are tightly integrated within a single framework, forming a unified pipeline tailored to the entire compilation process. In contrast, LLVM adopts a modular design composed of reusable libraries with well-defined interfaces, enabling independent development and reuse of components such as the optimizer and code generators across diverse tools and languages.[72] This modularity supports applications beyond traditional ahead-of-time compilation, such as just-in-time compilation, while GCC's structure emphasizes a cohesive, self-contained system evolved from its origins in the late 1980s.[73]
GCC's intermediate representations form a sequential pipeline: language frontends generate GENERIC trees, which are lowered to GIMPLE (a structured, three-address form used for high-level optimizations) and subsequently to Register Transfer Language (RTL) for target-specific code generation and low-level transformations.[44] LLVM, however, centers on a single, typed intermediate representation (LLVM IR) that is based on static single assignment (SSA) form and designed for platform independence, allowing optimizations to operate uniformly before backend-specific lowering.[74] This unified IR promotes reusability across frontends and backends, differing from GCC's multi-stage tree-to-RTL progression, which embeds more language- and target-specific details earlier in the process.
GCC implements separate frontends for each supported language, with parsing and semantic analysis customized per language (e.g., distinct parsers for C, Fortran, or Ada), leading to independent evolution of these components. Clang, as LLVM's frontend for the C family (C, C++, Objective-C), uses a single, unified parser that handles these languages cohesively, sharing a common abstract syntax tree and diagnostics infrastructure.[75] Historically, GCC predates LLVM: its foundational development began in 1987, whereas the LLVM project originated in December 2000 at the University of Illinois under Chris Lattner and later gained momentum through industry adoption by entities like Apple starting in 2005.[72][76]
Performance Benchmarks and Trade-offs
GCC's generated code exhibits competitive runtime performance against LLVM/Clang, with recent benchmarks on x86_64 architectures showing GCC 15 producing binaries that are marginally faster in aggregate across diverse workloads, often by 1-4% in geometric means for CPU-intensive tests on AMD Zen 5 processors.[77] These advantages stem from GCC's refined ahead-of-time (AOT) optimization passes, which excel in scalar and vector code generation for established targets like x86, where historical maturity allows deeper loop unrolling and inlining heuristics compared to LLVM's intermediate representation-focused approach.[78] In contrast, Clang 20 demonstrates strengths in modular optimization pipelines that yield faster compilation times (typically 20-50% quicker for large C++ codebases, owing to its AST-based parsing efficiency), though this comes at the expense of occasionally less aggressive runtime optimizations in non-vectorized paths.[79]
Specific language niches highlight further trade-offs: GCC's gfortran frontend delivers superior Fortran code quality, with benchmarks indicating that LLVM Flang trails by approximately 23% in geometric mean runtime across standard suites, attributable to GCC's decades-tuned array handling and DO-loop optimizations honed for scientific computing.[80] The LLVM toolchain, while advancing in diagnostics and incremental builds, has historically underperformed in Fortran because Flang is a newer implementation, leading developers in high-performance computing to favor GCC for reliability despite longer compile phases.[81]
Broader trade-offs arise in multi-architecture support, where GCC's monolithic backend sustains optimizations for over 20 primary targets, including legacy and embedded systems like MIPS and PowerPC, enabling consistent code quality across ports without LLVM's occasional gaps in niche vector intrinsics or ABI fidelity.[82] LLVM's modular design facilitates rapid backend extensions and JIT scenarios but incurs overhead in AOT scenarios for less common architectures, where GCC's integrated passes reduce binary size by 5-10% in cross-compilation tests via tighter register allocation.[73] Developers must weigh these factors against Clang's lower memory footprint during builds, which scales better for massive projects but may require vendor-specific flags to match GCC's default robustness in Fortran or multi-architecture AOT compilation.[83]
Criticisms and Debates
Technical Shortcomings and Optimization Critiques
GCC's compilation process is generally slower than that of LLVM/Clang, with benchmarks showing Clang achieving up to 2-3 times faster build times for large C/C++ projects like the Linux kernel or Firefox, attributed to its modular design and efficient parsing.[84] This disparity arises from GCC's integrated frontend-backend structure, which processes intermediate representations more sequentially, leading to higher memory usage and CPU overhead during optimization passes like link-time optimization (LTO), where GCC's full LTO contrasts with Clang's lighter-weight ThinLTO mode.[85] Recent versions, such as GCC 14, have incorporated profile-guided optimizations and better parallelization, mitigating some delays but not fully closing the gap in empirical tests on x86_64 architectures.[86]
In code generation, GCC has historically produced suboptimal output in niche scenarios, such as inefficient vectorization or scalar replacements, resulting in runtime performance deficits of 10-20% compared to Intel's ICC on Intel-specific workloads like numerical simulations.[87] An empirical study of optimization bugs identified frequent issues in GCC's value range propagation and instruction combining passes, which can lead to incorrect or inefficient transformations, though these affect a small fraction of codebases and are addressed via bug fixes rather than reflecting systemic flaws.[88] Runtime benchmarks indicate that GCC remains competitive overall, often matching or exceeding Clang in integer-heavy tasks, but its monolithic codebase complicates targeted enhancements, slowing responses to architecture-specific tuning.[77]
GCC's diagnostic tools, including static analysis via the -fanalyzer flag introduced in GCC 10, have improved in GCC 14 with enhanced leak detection, interprocedural analysis, and clearer warning messages, enabling better identification of memory errors and undefined behavior.[89][90] However, they lag behind Clang's static analyzer in precision for certain leak patterns and path-sensitive checks, partly because GCC's historically verbose output can obscure critical issues amid noise.[90] Ongoing refinements, such as refined state tracking in GCC 14, suggest that this gap is not inherent to GCC's design, though the compiler's complexity demands extensive validation to avoid introducing regressions in analysis accuracy.[86]
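As a rough illustration of the kind of path-sensitive leak check discussed above, the following sketch shows a function with an early-exit path that must release its allocation. The function name dup_upper and the build line in the comment are assumptions made for this example. The code is written correctly; the comment marks the line whose omission would create the sort of path-dependent leak that -fanalyzer is intended to report.

```c
/* Assumed build line for this sketch: gcc -fanalyzer -Wall -c dup_upper.c */
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated upper-case copy of src.
 * Returns NULL if src is NULL, if allocation fails, or if src contains a
 * decimal digit. The caller owns the result and must free() it. */
char *dup_upper(const char *src)
{
    if (src == NULL)
        return NULL;

    size_t len = strlen(src);
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        if (isdigit(c)) {
            /* Early exit: without this free(), buf would leak on this
             * path only, which is the pattern a path-sensitive analyzer
             * is meant to catch. */
            free(buf);
            return NULL;
        }
        buf[i] = (char)toupper(c);
    }
    buf[len] = '\0';

    /* Sanity check: the copy has the same length as the input. */
    assert(strlen(buf) == len);
    return buf;
}
```

Removing the free() call on the early-exit path and recompiling with -fanalyzer is a simple way to observe how the analyzer reports such leaks; the exact wording and location of the diagnostic vary between GCC versions.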
