C (programming language)
| C | |
|---|---|
Logotype used on the cover of the first edition of The C Programming Language[1] | |
| Paradigm | Multi-paradigm: imperative (procedural), structured |
| Designed by | Dennis Ritchie |
| Developer | ANSI X3J11 (ANSI C); ISO/IEC JTC 1 (Joint Technical Committee 1) / SC 22 (Subcommittee 22) / WG 14 (Working Group 14) (ISO C) |
| First appeared | 1972[a] |
| Stable release | C23 / October 31, 2024 |
| Preview release | C2y (N3220) / February 21, 2024[5] |
| Typing discipline | Static, weak, manifest, nominal |
| OS | Cross-platform |
| Filename extensions | .c, .h |
| Major implementations | pcc, GCC, Clang, Intel C, C++Builder, Microsoft Visual C++, Watcom C |
| Dialects | Cyclone, Unified Parallel C, Split-C, Cilk, C* |
| Influenced by | B (BCPL, CPL), ALGOL 68,[b] PL/I, Fortran |
| Influenced | Numerous: AMPL, AWK, csh, C++, C--, C#, Objective-C, D, Go, Java, JavaScript, JS++, Julia, Limbo, LPC, Perl, PHP, Pike, Processing, Python, Rust, V (Vlang), Vala, Verilog (HDL),[8] Nim, Zig |
C[c] is a general-purpose programming language. It was created in the 1970s by Dennis Ritchie and remains widely used and influential. By design, C gives the programmer relatively direct access to the features of the typical CPU architecture, customized for the target instruction set. It has been and continues to be used to implement operating systems (especially kernels[10]), device drivers, and protocol stacks, but its use in application software has been decreasing.[11] C is used on computers that range from the largest supercomputers to the smallest microcontrollers and embedded systems.
A successor to the programming language B, C was originally developed at Bell Labs by Ritchie between 1972 and 1973 to construct utilities running on Unix. It was applied to re-implementing the kernel of the Unix operating system.[12] During the 1980s, C gradually gained popularity. It has become one of the most widely used programming languages,[13][14] with C compilers available for practically all modern computer architectures and operating systems. The book The C Programming Language, co-authored by the original language designer, served for many years as the de facto standard for the language.[15][1] C has been standardized since 1989 by the American National Standards Institute (ANSI) and, subsequently, jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).
C is an imperative procedural language, supporting structured programming, lexical variable scope, and recursion, with a static type system. It was designed to be compiled to provide low-level access to memory and language constructs that map efficiently to machine instructions, all with minimal runtime support. Despite its low-level capabilities, the language was designed to encourage cross-platform programming. A standards-compliant C program written with portability in mind can be compiled for a wide variety of computer platforms and operating systems with few changes to its source code.
Although neither C nor its standard library provides some popular features found in other languages, it is flexible enough to support them. For example, object orientation and garbage collection are provided by the external libraries GLib Object System and Boehm garbage collector, respectively.
Since 2000, C has consistently ranked among the top four languages in the TIOBE index, a measure of the popularity of programming languages.[16]
Characteristics
The C language exhibits the following characteristics:
- Free-form source code
- Semicolons terminate statements
- Curly braces group statements into blocks
- Executable code is contained in functions; no script-like syntax
- Parameters are passed by value; pass by-reference is achieved by passing a pointer to a value
- Relatively small number of keywords
- Control flow constructs, including if, for, do, while, and switch
- Arithmetic, bitwise, and logic operators, including +, +=, ++, &, ||
- Multiple assignments may be performed in a single statement
- User-defined identifiers are not distinguished from keywords by a marker such as a sigil
- A variable declared inside a block is accessible only in that block and only below the declaration
- A function return value can be ignored
- A function cannot be nested inside a function, but some translators support this
- Run-time polymorphism may be achieved using function pointers
- Supports recursion
- Data typing is static, but weakly enforced; all variables have a type, but implicit conversion between primitive types weakens the separation of the different types
- User-defined data types allow for aliasing a data type specifier
- Syntax for array definition and access is via square bracket notation, for example month[11]. Indexing is defined in terms of pointer arithmetic. Whole arrays cannot be copied or compared without custom or library code
- User-defined structure types allow related data elements to be passed and copied as a unit, although two structures cannot be compared without custom code to compare each field
- User-defined union types support overlapping members, allowing multiple data types to share the same memory location
- User-defined enumeration types support aliasing integer values
- Lacks a string type but has syntax for null-terminated strings with associated handling in its standard library
- Supports low-level access to computer memory via pointers
- Supports procedure-like construct as a function returning void
- Supports dynamic memory via standard library functions
- Includes the C preprocessor to perform macro definition, source code file inclusion, and conditional compilation
- Supports modularity in that files are processed separately, with visibility control via static and extern attributes
- Minimized functionality in the core language, while relatively complex functionality such as I/O, string manipulation, and mathematical functions is supported via standard library functions
- Resulting compiled code has relatively straightforward needs on the underlying platform, making it desirable for operating and embedded systems
"Hello, world" example
The "Hello, World!" program example that appeared in the first edition of K&R has become the model for an introductory program in most programming textbooks. The program prints "hello, world" to the standard output.
The original version was:[17]
main()
{
    printf("hello, world\n");
}
A more modern version is:[d]
#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
}
The first line is a preprocessor directive, indicated by #include, which causes the preprocessor to replace that line of code with the text of the stdio.h header file, which contains declarations for input and output functions including printf. The angle brackets around stdio.h indicate that the header file can be located using a search strategy that selects header files provided with the compiler over files with the same name that may be found in project-specific directories.
The next code line declares the entry point function main. The run-time environment calls this function to begin program execution. The type specifier int indicates that the function returns an integer value. The void parameter list indicates that the function consumes no arguments. The run-time environment actually passes two arguments (typed int and char *[]), but this implementation ignores them. The ISO C standard (section 5.1.2.2.1) requires main to be declared either with a void parameter list or with these two parameters, a special treatment not afforded to other functions.
The opening curly brace indicates the beginning of the code that defines the function.
The next line of code calls (diverts execution to) the C standard library function printf with the address of the first character of a null-terminated string specified as a string literal. The text \n is an escape sequence that denotes the newline character which when output in a terminal results in moving the cursor to the beginning of the next line. Even though printf returns an int value, it is silently discarded. The semicolon ; terminates the call statement.
The closing curly brace indicates the end of the main function. Prior to C99, an explicit return 0; statement was required at the end of the main function, but since C99, the main function (as the initial function call) implicitly returns 0 upon reaching its final closing curly brace.[e]
History
Early developments
| Year | Informal name | Official standard |
|---|---|---|
| 1972 | first release | — |
| 1978 | K&R C | — |
| 1989, 1990 | ANSI C, C89, ISO C, C90 | ANSI X3.159-1989, ISO/IEC 9899:1990 |
| 1999 | C99, C9X | ISO/IEC 9899:1999 |
| 2011 | C11, C1X | ISO/IEC 9899:2011 |
| 2018 | C17, C18 | ISO/IEC 9899:2018 |
| 2024 | C23, C2X | ISO/IEC 9899:2024 |
| TBA | C2Y | |
The origin of C is closely tied to the development of the Unix operating system, originally implemented in assembly language on a PDP-7 by Dennis Ritchie and Ken Thompson, incorporating several ideas from colleagues. Eventually, they decided to port the operating system to a PDP-11. The original PDP-11 version of Unix was also developed in assembly language.[12]
B
Thompson wanted a programming language for developing utilities for the new platform. He first tried writing a Fortran compiler, but he soon gave up the idea and instead created a cut-down version of the recently developed systems programming language called BCPL. The official description of BCPL was not available at the time,[19] and Thompson modified the syntax to be less 'wordy' and similar to a simplified ALGOL known as SMALGOL.[20] He called the result B,[12] describing it as "BCPL semantics with a lot of SMALGOL syntax".[20] Like BCPL, B had a bootstrapping compiler to facilitate porting to new machines.[20] Ultimately, few utilities were written in B because it was too slow and could not take advantage of PDP-11 features such as byte addressability.
Whereas BCPL used // to mark a comment running to the end of the line, B adopted /* comment */ as the comment delimiter, more akin to PL/I, which allows comments to appear in the middle of lines. (BCPL's comment style would be reintroduced in C++.)[12]
New B and first C release
In 1971 Ritchie started to improve B, to use the features of the more-powerful PDP-11. A significant addition was a character data type. He called this New B (NB).[20] Thompson started to use NB to write the Unix kernel, and his requirements shaped the direction of the language development.[20][21]
Through to 1972, richer types were added to the NB language. NB had arrays of int and char, and to these types were added pointers, the ability to generate pointers to other types, arrays of all types, and types to be returned from functions. Arrays within expressions were effectively treated as pointers. A new compiler was written, and the language was renamed C.[12]
The C compiler and some utilities made with it were included in Version 2 Unix, which is also known as Research Unix.[22]
Structures and Unix kernel re-write
At Version 4 Unix, released in November 1973, the Unix kernel was extensively re-implemented in C.[12] By this time, the C language had acquired some powerful features such as struct types.
The preprocessor was introduced around 1973 at the urging of Alan Snyder and also in recognition of the usefulness of the file-inclusion mechanisms available in BCPL and PL/I. Its original version provided only included files and simple string replacements: #include and #define of parameterless macros. Soon after that, it was extended, mostly by Mike Lesk and then by John Reiser, to incorporate macros with arguments and conditional compilation.[12]
Unix was one of the first operating system kernels implemented in a language other than assembly. Earlier instances include the Multics system (which was written in PL/I) and Master Control Program (MCP) for the Burroughs B5000 (which was written in ALGOL) in 1961. In and around 1977, Ritchie and Stephen C. Johnson made further changes to the language to facilitate portability of the Unix operating system. Johnson's Portable C Compiler served as the basis for several implementations of C on new platforms.[21]
K&R C
In 1978 Brian Kernighan and Dennis Ritchie published the first edition of The C Programming Language.[23] Known as K&R from the initials of its authors, the book served for many years as an informal specification of the language. The version of C that it describes is commonly referred to as "K&R C". As this was released in 1978, it is now also referred to as C78.[24] The second edition of the book[25] covers the later ANSI C standard, described below.
K&R introduced several language features:
- Standard I/O library
- long int data type
- unsigned int data type
- Compound assignment operators of the form =op (such as =-) were changed to the form op= (that is, -=) to remove the semantic ambiguity created by constructs such as i=-10, which had been interpreted as i =- 10 (decrement i by 10) instead of the possibly intended i = -10 (let i be −10).
Even after the publication of the 1989 ANSI standard, for many years K&R C was still considered the "lowest common denominator" to which C programmers restricted themselves when maximum portability was desired, since many older compilers were still in use, and because carefully written K&R C code can be legal Standard C as well.
Although later versions of C require functions to have an explicit type declaration, K&R C only requires functions that return a type other than int to be declared before use. Functions used without prior declaration were presumed to return int.
For example:
long long_function();

calling_function()
{
    long longvar;
    register intvar;
    longvar = long_function();
    if (longvar > 1)
        intvar = 0;
    else
        intvar = int_function();
    return intvar;
}
The declaration of long_function() at the top of the example is required since it returns long, not int. Function int_function can be called in the else branch even though it is not declared, since it returns int. Also, variable intvar does not need to be declared as type int since that is the default type for the register keyword.
Since function declarations did not include information about arguments, type checks were not performed, although some compilers would issue a warning if different calls to a function used different numbers or types of arguments. Tools such as Unix's lint utility were developed that (among other things) checked for consistency of function use across multiple source files.
In the years following the publication of K&R C, several features were added to the language, supported by compilers from AT&T (in particular PCC[26]) and other vendors. These included:
- void functions; functions returning no value
- Functions returning struct or union types
- Assignment for struct variables
- Enumerated types
The popularity of the language, lack of agreement on standard library interfaces, and lack of compliance with the K&R specification led to standardization efforts.[27]
ANSI C and ISO C
During the late 1970s and 1980s, versions of C were implemented for a wide variety of mainframe computers, minicomputers, and microcomputers, including the IBM PC, as its popularity increased significantly.
In 1983 the American National Standards Institute (ANSI) formed a committee, X3J11, to establish a standard specification of C. X3J11 based the C standard on the Unix implementation; however, the non-portable portion of the Unix C library was handed off to the IEEE working group 1003 to become the basis for the 1988 POSIX standard. In 1989, the C standard was ratified as ANSI X3.159-1989 "Programming Language C". This version of the language is often referred to as ANSI C, Standard C, or sometimes C89.
In 1990 the ANSI C standard (with formatting changes) was adopted by the International Organization for Standardization (ISO) as ISO/IEC 9899:1990, which is sometimes called C90. Therefore, the terms "C89" and "C90" refer to the same programming language.
ANSI, like other national standards bodies, no longer develops the C standard independently, but defers to the international C standard, maintained by the working group ISO/IEC JTC1/SC22/WG14. National adoption of an update to the international standard typically occurs within a year of ISO publication.
One of the aims of the C standardization process was to produce a superset of K&R C, incorporating many of the subsequently introduced unofficial features. The standards committee also included several additional features such as function prototypes (borrowed from C++), void pointers, support for international character sets and locales, and preprocessor enhancements. Although the syntax for parameter declarations was augmented to include the style used in C++, the K&R interface continued to be permitted, for compatibility with existing source code.
C89 is supported by current C compilers, and most modern C code is based on it. Any program written only in Standard C and without any hardware-dependent assumptions will run correctly on any platform with a conforming C implementation, within its resource limits. Without such precautions, programs may compile only on a certain platform or with a particular compiler, due, for example, to the use of non-standard libraries, such as GUI libraries, or to a reliance on compiler- or platform-specific attributes such as the exact size of data types and byte endianness.
In cases where code must be compilable by either standard-conforming or K&R C-based compilers, the __STDC__ macro can be used to split the code into Standard and K&R sections to prevent the use on a K&R C-based compiler of features available only in Standard C.
After the ANSI/ISO standardization process, the C language specification remained relatively static for several years. In 1995, Normative Amendment 1 to the 1990 C standard (ISO/IEC 9899/AMD1:1995, known informally as C95) was published, to correct some details and to add more extensive support for international character sets.[28]
C99
The C standard was further revised in the late 1990s, leading to the publication of ISO/IEC 9899:1999 in 1999, which is commonly referred to as "C99". It has since been amended three times by Technical Corrigenda.[29]
C99 introduced several new features, including inline functions, several new data types (including long long int and a complex type to represent complex numbers), variable-length arrays and flexible array members, improved support for IEEE 754 floating point, support for variadic macros (macros of variable arity), and support for one-line comments beginning with //, as in BCPL or C++. Many of these had already been implemented as extensions in several C compilers.
C99 is for the most part backward compatible with C90, but is stricter in some ways; in particular, a declaration that lacks a type specifier no longer has int implicitly assumed. A standard macro __STDC_VERSION__ is defined with value 199901L to indicate that C99 support is available. GCC, Solaris Studio, and other C compilers now support many or all of the new features of C99. The C compiler in Microsoft Visual C++, however, implements the C89 standard and those parts of C99 that are required for compatibility with C++11.[30]
In addition, the C99 standard requires support for identifiers using Unicode in the form of escaped characters (e.g. \u0040 or \U0001f431) and suggests support for raw Unicode names.
C11
Work began in 2007 on another revision of the C standard, informally called "C1X" until its official publication as ISO/IEC 9899:2011 on December 8, 2011. The C standards committee adopted guidelines to limit the adoption of new features that had not been tested by existing implementations.
The C11 standard adds numerous new features to C and the library, including type generic macros, anonymous structures, improved Unicode support, atomic operations, multi-threading, and bounds-checked functions. It also makes some portions of the existing C99 library optional, and improves compatibility with C++. The standard macro __STDC_VERSION__ is defined as 201112L to indicate that C11 support is available.
C17
C17 is an informal name for ISO/IEC 9899:2018, a standard for the C programming language published in June 2018. It introduces no new language features, only technical corrections and clarifications to defects in C11. The standard macro __STDC_VERSION__ is defined as 201710L to indicate that C17 support is available.
C23
C23 is an informal name for the current major C language standard revision and was known as "C2X" through most of its development. It builds on past releases, introducing features like new keywords, types including nullptr_t and _BitInt(N), and expansions to the standard library.[31]
C23 was published in October 2024 as ISO/IEC 9899:2024.[32] The standard macro __STDC_VERSION__ is defined as 202311L to indicate that C23 support is available.
C2Y
C2Y is an informal name for the next major C language standard revision, after C23 (C2X), that is hoped to be released later in the 2020s, hence the '2' in "C2Y". An early working draft of C2Y was released in February 2024 as N3220 by the working group ISO/IEC JTC1/SC22/WG14.[33]
Embedded C
Historically, embedded C programming has required non-standard extensions to the C language to support exotic features such as fixed-point arithmetic, multiple distinct memory banks, and basic I/O operations.
In 2008, the C Standards Committee published a technical report extending the C language[34] to address these issues by providing a common standard for all implementations to adhere to. It includes a number of features not available in normal C, such as fixed-point arithmetic, named address spaces, and basic I/O hardware addressing.
Definition
C has a formal grammar specified by the C standard.[35] Line endings are generally not significant in C; however, line boundaries do have significance during the preprocessing phase. Comments may appear either between the delimiters /* and */, or (since C99) following // until the end of the line. Comments delimited by /* and */ do not nest, and these sequences of characters are not interpreted as comment delimiters if they appear inside string or character literals.[36]
C source files contain declarations and function definitions. Function definitions, in turn, contain declarations and statements. Declarations either define new types using keywords such as struct, union, and enum, or assign types to and perhaps reserve storage for new variables, usually by writing the type followed by the variable name. Keywords such as char and int specify built-in types. Sections of code are enclosed in braces ({ and }, sometimes called "curly brackets") to limit the scope of declarations and to act as a single statement for control structures.
As an imperative language, C uses statements to specify actions. The most common statement is an expression statement, consisting of an expression to be evaluated, followed by a semicolon; as a side effect of the evaluation, functions may be called and variables assigned new values. To modify the normal sequential execution of statements, C provides several control-flow statements identified by reserved keywords. Structured programming is supported by if ... [else] conditional execution and by do ... while, while, and for iterative execution (looping). The for statement has separate initialization, testing, and reinitialization expressions, any or all of which can be omitted. break and continue can be used within the loop: break leaves the innermost enclosing loop statement, and continue skips to its reinitialization. There is also a non-structured goto statement, which branches directly to the designated label within the function. switch selects a case to be executed based on the value of an integer expression. Unlike in many other languages, control flow will fall through to the next case unless terminated by a break.
Expressions can use a variety of built-in operators and may contain function calls. The order in which arguments to functions and operands to most operators are evaluated is unspecified. The evaluations may even be interleaved. However, all side effects (including storage to variables) will occur before the next "sequence point"; sequence points include the end of each expression statement, and the entry to and return from each function call. Sequence points also occur during evaluation of expressions containing certain operators (&&, ||, ?: and the comma operator). This permits a high degree of object code optimization by the compiler, but requires C programmers to take more care to obtain reliable results than is needed for other programming languages.
Kernighan and Ritchie say in the Introduction of The C Programming Language: "C, like any other language, has its blemishes. Some of the operators have the wrong precedence; some parts of the syntax could be better."[37] The C standard did not attempt to correct many of these blemishes, because of the impact of such changes on already existing software.
Character set
The basic C source character set includes the following characters:[38]
- Lowercase and uppercase letters of the ISO basic Latin alphabet: a–z, A–Z
- Decimal digits: 0–9
- Graphic characters: ! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~
- Whitespace characters: space, horizontal tab, vertical tab, form feed, newline
The newline character indicates the end of a text line; it need not correspond to an actual single character, although for convenience C treats it as such.
The POSIX standard mandates a portable character set which adds a few characters (notably "@") to the basic C source character set. Neither standard prescribes any particular value encoding: ASCII and EBCDIC both comply with these standards, since they include at least those basic characters, even though they use different encoded values for those characters.
Additional multi-byte encoded characters may be used in string literals, but they are not entirely portable. Since C99, multi-national Unicode characters can be embedded portably within C source text by using \uXXXX or \UXXXXXXXX encoding (where X denotes a hexadecimal digit).
The basic C execution character set contains the same characters, along with representations for the null character, alert, backspace, and carriage return.[38]
Run-time support for extended character sets has increased with each revision of the C standard.
Reserved words
All versions of C have reserved words that are case sensitive. As reserved words, they cannot be used for variable names.
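C89 has 32 reserved words:
auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while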
C99 added five more reserved words: (‡ indicates an alternative spelling alias for a C23 keyword)
inline, restrict, _Bool‡, _Complex, _Imaginary
C11 added seven more reserved words:[39] (‡ indicates an alternative spelling alias for a C23 keyword)
_Alignas‡, _Alignof‡, _Atomic, _Generic, _Noreturn, _Static_assert‡, _Thread_local‡
C23 reserved fifteen more words:
alignas, alignof, bool, constexpr, false, nullptr, static_assert, thread_local, true, typeof, typeof_unqual, _BitInt, _Decimal32, _Decimal64, _Decimal128
Most of the recently reserved words begin with an underscore followed by a capital letter, because identifiers of that form were previously reserved by the C standard for use only by implementations. Since existing program source code should not have been using these identifiers, it would not be affected when C implementations started supporting these extensions to the programming language. Some standard headers do define more convenient synonyms for underscored identifiers. Some of those words were added as keywords with their conventional spelling in C23 and the corresponding macros were removed.
Prior to C89, entry was reserved as a keyword. In the second edition of their book The C Programming Language, which describes what became known as C89, Kernighan and Ritchie wrote, "The ... [keyword] entry, formerly reserved but never used, is no longer reserved." and "The stillborn entry keyword is withdrawn."[40]
Operators
C supports a rich set of operators, which are symbols used within an expression to specify the manipulations to be performed while evaluating that expression. C has operators for:
- arithmetic: +, -, *, /, %
- assignment: =
- augmented assignment: +=, -=, *=, /=, %=, &=, |=, ^=, <<=, >>=
- bitwise logic: ~, &, |, ^
- bitwise shifts: <<, >>
- Boolean logic: !, &&, ||
- conditional evaluation: ? :
- equality testing: ==, !=
- calling functions: ( )
- increment and decrement: ++, --
- member selection: ., ->
- object size: sizeof
- type: typeof, typeof_unqual (since C23)
- order relations: <, <=, >, >=
- reference and dereference: &, *, [ ]
- sequencing: ,
- subexpression grouping: ( )
- type conversion: (typename)
C uses the operator = (used in mathematics to express equality) to indicate assignment, following the precedent of Fortran and PL/I, but unlike ALGOL and its derivatives. C uses the operator == to test for equality. The similarity between the operators for assignment and equality may result in the accidental use of one in place of the other, and in many cases the mistake does not produce an error message (although some compilers produce warnings). For example, the conditional expression if (a == b + 1) might mistakenly be written as if (a = b + 1), which will be evaluated as true unless the value of a is 0 after the assignment.[41]
The C operator precedence is not always intuitive. For example, the operator == binds more tightly than (is executed prior to) the operators & (bitwise AND) and | (bitwise OR) in expressions such as x & 1 == 0, which must be written as (x & 1) == 0 if that is the coder's intent.[42]
Data types
The type system in C is static and weakly typed, which makes it similar to the type system of ALGOL descendants such as Pascal.[43] There are built-in types for integers of various sizes, both signed and unsigned, floating-point numbers, and enumerated types (enum). Integer type char is often used for single-byte characters. C99 added a Boolean data type. There are also derived types including arrays, pointers, records (struct), and unions (union).
C is often used in low-level systems programming where escapes from the type system may be necessary. The compiler attempts to ensure type correctness of most expressions, but the programmer can override the checks in various ways, either by using a type cast to explicitly convert a value from one type to another, or by using pointers or unions to reinterpret the underlying bits of a data object in some other way.
Some find C's declaration syntax unintuitive, particularly for function pointers. (Ritchie's idea was to declare identifiers in contexts resembling their use: "declaration reflects use".)[44]
C's usual arithmetic conversions allow for efficient code to be generated, but can sometimes produce unexpected results. For example, a comparison of signed and unsigned integers of equal width requires a conversion of the signed value to unsigned. This can generate unexpected results if the signed value is negative.
Pointers
C supports the use of pointers, a type of reference that records the address or location of an object or function in memory. Pointers can be dereferenced to access data stored at the address pointed to, or to invoke a pointed-to function. Pointers can be manipulated using assignment or pointer arithmetic. The run-time representation of a pointer value is typically a raw memory address (perhaps augmented by an offset-within-word field), but since a pointer's type includes the type of the thing pointed to, expressions including pointers can be type-checked at compile time. Pointer arithmetic is automatically scaled by the size of the pointed-to data type.
Pointers are used for many purposes in C. Text strings are commonly manipulated using pointers into arrays of characters. Dynamic memory allocation is performed using pointers; the result of a malloc is usually cast to the data type of the data to be stored. Many data types, such as trees, are commonly implemented as dynamically allocated struct objects linked together using pointers. Pointers to other pointers are often used in multi-dimensional arrays and arrays of struct objects. Pointers to functions (function pointers) are useful for passing functions as arguments to higher-order functions (such as qsort or bsearch), in dispatch tables, or as callbacks to event handlers.[18]
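As a sketch of passing a function pointer to a higher-order function, the following uses the standard `qsort`. The names `compare_ints` and `sort_ints` are hypothetical.

```c
#include <stdlib.h>

/* Comparison callback with the signature that qsort expects. */
static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow that x - y could cause */
}

/* Sorts n ints in ascending order by handing the callback to qsort. */
void sort_ints(int *values, size_t n) {
    qsort(values, n, sizeof values[0], compare_ints);
}
```

The function name `compare_ints` is converted to a pointer to the function when it is passed as an argument.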
A null pointer value explicitly points to no valid location. Dereferencing a null pointer value is undefined, often resulting in a segmentation fault. Null pointer values are useful for indicating special cases such as no "next" pointer in the final node of a linked list, or as an error indication from functions returning pointers. In appropriate contexts in source code, such as for assigning to a pointer variable, a null pointer constant can be written as 0 (with or without an explicit cast to a pointer type), as the NULL macro defined by several standard headers, or, since C23, as the constant nullptr. In conditional contexts, null pointer values evaluate to false, while all other pointer values evaluate to true.
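A minimal sketch of a null pointer marking the end of a linked list; the `node` type and `list_length` function are hypothetical.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;   /* null in the final node */
};

/* Counts nodes by following `next` until a null pointer is reached. */
size_t list_length(const struct node *head) {
    size_t count = 0;
    for (const struct node *p = head; p; p = p->next) {   /* `p` is false when null */
        count++;
    }
    return count;
}
```

The loop condition relies on the rule that a null pointer evaluates to false, so an empty list (a null `head`) gives a length of zero without any dereference.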
Void pointers (void *) point to objects of unspecified type, and can therefore be used as "generic" data pointers. Since the size and type of the pointed-to object is not known, void pointers cannot be dereferenced, nor is pointer arithmetic on them allowed, although they can easily be (and in many contexts implicitly are) converted to and from any other object pointer type.[18]
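One way to use void pointers generically is sketched below with a hypothetical `swap_bytes` function. Because a `void *` cannot be dereferenced, it is first converted to a pointer to `unsigned char`.

```c
#include <stddef.h>

/* Exchanges two non-overlapping objects of `size` bytes each. */
void swap_bytes(void *a, void *b, size_t size) {
    unsigned char *pa = a;   /* implicit conversion from void * */
    unsigned char *pb = b;
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

A caller passes the addresses of two objects of the same type together with their size, for example `swap_bytes(&x, &y, sizeof x)`; the conversion of the arguments to `void *` is implicit.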
Careless use of pointers is potentially dangerous. Because they are typically unchecked, a pointer variable can be made to point to any arbitrary location, which can cause undesirable effects. Although properly used pointers point to safe places, they can be made to point to unsafe places by using invalid pointer arithmetic; the objects they point to may continue to be used after deallocation (dangling pointers); they may be used without having been initialized (wild pointers); or they may be directly assigned an unsafe value using a cast, union, or through another corrupt pointer. In general, C is permissive in allowing manipulation of and conversion between pointer types, although compilers typically provide options for various levels of checking. Some other programming languages address these problems by using more restrictive reference types.
Arrays
Array types in C are traditionally of a fixed, static size specified at compile time. The C99 standard also allows a form of variable-length arrays. However, it is also possible to allocate a block of memory (of arbitrary size) at run time, using the standard library's malloc function, and treat it as an array.
Since arrays are always accessed (in effect) via pointers, array accesses are typically not checked against the underlying array size, although some compilers may provide bounds checking as an option.[45][46] Array bounds violations are therefore possible and can lead to various repercussions, including illegal memory accesses, corruption of data, buffer overruns, and run-time exceptions.
C does not have a special provision for declaring multi-dimensional arrays, but rather relies on recursion within the type system to declare arrays of arrays, which effectively accomplishes the same thing. The index values of the resulting "multi-dimensional array" can be thought of as increasing in row-major order. Multi-dimensional arrays are commonly used in numerical algorithms (mainly from applied linear algebra) to store matrices. The structure of the C array is well suited to this particular task. However, in early versions of C the bounds of the array must be known fixed values or else explicitly passed to any subroutine that requires them, and dynamically sized arrays of arrays cannot be accessed using double indexing. (A workaround for this was to allocate the array with an additional "row vector" of pointers to the columns.) C99 introduced "variable-length arrays" which address this issue.
The following example using modern C (C99 or later) shows allocation of a two-dimensional array on the heap and the use of multi-dimensional array indexing for accesses (which can use bounds-checking on many C compilers):
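#include <stdlib.h>

/* print_array is not shown; this prototype is an assumption. */
void print_array(int n, int m, float (*p)[n][m]);

int func(int n, int m) {
    /* p points to a single n-by-m array allocated on the heap */
    float (*p)[n][m] = malloc(sizeof *p);
    if (p == NULL) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            (*p)[i][j] = i + j;
        }
    }
    print_array(n, m, p);
    free(p);
    return 1;
}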
And here is a similar implementation using C99's Auto VLA feature:[f]
/* print_array is not shown; this prototype is an assumption. */
void print_array(int n, int m, float p[n][m]);

int func(int n, int m) {
    // Caution: checks should be made to ensure n * m * sizeof(float) does NOT exceed
    // limitations for auto VLAs and is within available size of stack.
    float p[n][m]; // auto VLA is held on the stack, and sized when the function is invoked
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            p[i][j] = i + j;
        }
    }
    print_array(n, m, p);
    // no need to free(p) since it will disappear when the function exits, along with the rest of the stack frame
    return 1;
}
Array–pointer interchangeability
The subscript notation x[i] (where x designates a pointer) is syntactic sugar for *(x+i).[47] Taking advantage of the compiler's knowledge of the pointer type, the address that x + i points to is not the base address (pointed to by x) incremented by i bytes, but rather is defined to be the base address incremented by i multiplied by the size of an element that x points to. Thus, x[i] designates the (i+1)th element of the array.
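The equivalence can be checked directly, as in this small sketch with a hypothetical `subscript_matches` function.

```c
/* Returns 1 when the subscript form and the pointer form agree. */
int subscript_matches(const int *x, int i) {
    return &x[i] == x + i      /* same address */
        && x[i] == *(x + i)    /* same value */
        && i[x] == x[i];       /* i[x] is *(i + x), and addition is commutative */
}
```

The last line is legal but unusual C; it follows from the definition of subscripting and is shown only to illustrate that definition.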
Furthermore, in most expression contexts (a notable exception is as operand of sizeof), an expression of array type is automatically converted to a pointer to the array's first element. This implies that an array is never copied as a whole when named as an argument to a function, but rather only the address of its first element is passed. Therefore, although function calls in C use pass-by-value semantics, arrays are in effect passed by reference.
The total size of an array A can be determined by applying sizeof to an expression of array type, as in sizeof A. The size of an element can be determined by applying sizeof to any dereferenced element of A, as in n = sizeof A[0]. Thus, the number of elements in a declared array A can be determined as sizeof A / sizeof A[0]. Note that if only a pointer to the first element is available, as is often the case in C code because of the automatic conversion described above, the information about the full type of the array and its length is lost.
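The difference between an array and the pointer it converts to can be sketched as follows. The macro `ARRAY_LENGTH` and both function names are hypothetical.

```c
#include <stddef.h>

/* Valid only when `a` is a true array, not a pointer. */
#define ARRAY_LENGTH(a) (sizeof(a) / sizeof((a)[0]))

/* Inside the scope of the declaration, the full array type is known. */
size_t declared_length(void) {
    int A[10] = {0};
    return ARRAY_LENGTH(A);
}

/* In a callee the argument is only a pointer, so sizeof gives the
   size of the pointer, not of the array it came from. */
size_t size_in_callee(const int *p) {
    return sizeof p;
}
```

For this reason, functions that receive arrays usually take the element count as a separate parameter.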
Memory management
One of the most important functions of a programming language is to provide facilities for managing memory and the objects that are stored in memory. C provides three principal ways to allocate memory for objects:[18]
- Static memory allocation: space for the object is provided in the binary at compile time; these objects have an extent (or lifetime) as long as the binary which contains them is loaded into memory.
- Automatic memory allocation: temporary objects can be stored on the stack, and this space is automatically freed and reusable after the block in which they are declared is exited.
- Dynamic memory allocation: blocks of memory of arbitrary size can be requested at run time using library functions such as malloc from a region of memory called the heap; these blocks persist until subsequently freed for reuse by calling the library function realloc or free.
These three approaches are appropriate in different situations and have various trade-offs. For example, static memory allocation has little allocation overhead, automatic allocation may involve slightly more overhead, and dynamic memory allocation can potentially have a great deal of overhead for both allocation and deallocation. The persistent nature of static objects is useful for maintaining state information across function calls, automatic allocation is easy to use but stack space is typically much more limited and transient than either static memory or heap space, and dynamic memory allocation allows convenient allocation of objects whose size is known only at run time. Most C programs make extensive use of all three.
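The three kinds of allocation can appear side by side, as in this sketch. The names `call_count`, `next_call_number` and `make_sequence` are hypothetical.

```c
#include <stdlib.h>

/* Static allocation: exists for the whole run and starts at zero. */
static int call_count;

int next_call_number(void) {
    int increment = 1;        /* automatic: released when the block exits */
    call_count += increment;  /* static state survives between calls */
    return call_count;
}

/* Dynamic allocation: the size is chosen at run time and the block
   persists until the caller passes it to free. */
int *make_sequence(size_t n) {
    int *a = malloc(n * sizeof *a);
    if (a == NULL) {
        return NULL;          /* allocation failure is reported, not fatal */
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = (int)i;
    }
    return a;
}
```

The caller of `make_sequence` owns the returned block and is responsible for calling `free` on it.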
Where possible, automatic or static allocation is usually simplest because the storage is managed by the compiler, freeing the programmer of the potentially error-prone chore of manually allocating and releasing storage. However, many data structures can change in size at run time, and since static allocations (and automatic allocations before C99) must have a fixed size at compile time, there are many situations in which dynamic allocation is necessary.[18] Prior to the C99 standard, variable-sized arrays were a common example of this. (See the article on C dynamic memory allocation for an example of dynamically allocated arrays.) Unlike automatic allocation, which can fail at run time with uncontrolled consequences, the dynamic allocation functions return an indication (in the form of a null pointer value) when the required storage cannot be allocated. (Static allocation that is too large is usually detected by the linker or loader, before the program can even begin execution.)
Unless otherwise specified, static objects contain zero or null pointer values upon program startup. Automatically and dynamically allocated objects are initialized only if an initial value is explicitly specified; otherwise they initially have indeterminate values (typically, whatever bit pattern happens to be present in the storage, which might not even represent a valid value for that type). If the program attempts to access an uninitialized value, the results are undefined. Many modern compilers try to detect and warn about this problem, but both false positives and false negatives can occur.
Heap memory allocation has to be synchronized with its actual usage in any program to be reused as much as possible. For example, if the only pointer to a heap memory allocation goes out of scope or has its value overwritten before it is deallocated explicitly, then that memory cannot be recovered for later reuse and is essentially lost to the program, a phenomenon known as a memory leak. Conversely, it is possible for memory to be freed but referenced subsequently, leading to unpredictable results. Typically, the failure symptoms appear in a portion of the program unrelated to the code that causes the error, making it difficult to diagnose the failure. Such issues are ameliorated in languages with automatic garbage collection.
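Two common defensive habits are sketched below with hypothetical helpers: keeping the old pointer until `realloc` has succeeded, so the block is not leaked, and clearing a pointer after `free`, so it does not dangle.

```c
#include <stdlib.h>

/* Grows *buf to hold new_count ints. Returns 0 on success, or -1 on
   failure, in which case *buf still points to the original block. */
int grow_ints(int **buf, size_t new_count) {
    int *tmp = realloc(*buf, new_count * sizeof **buf);
    if (tmp == NULL) {
        return -1;   /* assigning realloc's result to *buf directly would leak here */
    }
    *buf = tmp;
    return 0;
}

/* Frees the block and clears the pointer so that later use is detectable. */
void release_ints(int **buf) {
    free(*buf);
    *buf = NULL;
}
```

Neither helper prevents every error; other copies of the pointer held elsewhere in the program are not cleared by `release_ints`.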
Libraries
The C programming language uses libraries as its primary method of extension. In C, a library is a set of functions contained within a single "archive" file. Each library typically has a header file, which contains the prototypes of the functions contained within the library that may be used by a program, and declarations of special data types and macro symbols used with these functions. For a program to use a library, it must include the library's header file, and the library must be linked with the program, which in many cases requires compiler flags (e.g., -lm, shorthand for "link the math library").[18]
The most common C library is the C standard library, which is specified by the ISO and ANSI C standards and comes with every C implementation (implementations which target limited environments such as embedded systems may provide only a subset of the standard library). This library supports stream input and output, memory allocation, mathematics, character strings, and time values. Several separate standard headers (for example, stdio.h) specify the interfaces for these and other standard library facilities.
Another common set of C library functions are those used by applications specifically targeted for Unix and Unix-like systems, especially functions which provide an interface to the kernel. These functions are detailed in various standards such as POSIX and the Single UNIX Specification.
Since many programs have been written in C, there are a wide variety of other libraries available. Libraries are often written in C because C compilers generate efficient object code; programmers then create interfaces to the library so that the routines can be used from higher-level languages like Java, Perl, and Python.[18]
File handling and streams
File input and output (I/O) is not part of the C language itself but instead is handled by libraries (such as the C standard library) and their associated header files (e.g. stdio.h). File handling is generally implemented through high-level I/O which works through streams. A stream is from this perspective a data flow that is independent of devices, while a file is a concrete device. The high-level I/O is done through the association of a stream to a file. In the C standard library, a buffer (a memory area or queue) is temporarily used to store data before it is sent to the final destination. This reduces the time spent waiting for slower devices, for example a hard drive or solid-state drive. Low-level I/O functions are not part of the standard C library[clarification needed] but are generally part of "bare metal" programming (programming that is independent of any operating system such as most embedded programming). With few exceptions, implementations include low-level I/O.
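A minimal sketch of stream-based I/O using only `stdio.h`. The function `roundtrip_line` is hypothetical; it writes text to a temporary stream created with `tmpfile` and reads it back, and it assumes the environment allows temporary files to be created.

```c
#include <stdio.h>

/* Writes `text` to a temporary stream, then reads one line back into `out`.
   Returns 0 on success and -1 on any failure. */
int roundtrip_line(const char *text, char *out, size_t out_size) {
    FILE *fp = tmpfile();          /* stream associated with a temporary file */
    if (fp == NULL) {
        return -1;
    }
    if (fputs(text, fp) == EOF) {  /* the data may sit in the stream's buffer */
        fclose(fp);
        return -1;
    }
    rewind(fp);                    /* reposition before switching to reading */
    char *result = fgets(out, (int)out_size, fp);
    fclose(fp);                    /* closing also removes the temporary file */
    return result != NULL ? 0 : -1;
}
```

The same calls work on any stream, which is the device independence described above.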
Language tools
A number of tools have been developed to help C programmers find and fix statements with undefined behavior or possibly erroneous expressions, with greater rigor than that provided by the compiler.
Automated source code checking and auditing tools exist, such as Lint. A common practice is to use Lint to detect questionable code when a program is first written. Once a program passes Lint, it is then compiled using the C compiler. Also, many compilers can optionally warn about syntactically valid constructs that are likely to actually be errors. MISRA C is a proprietary set of guidelines to avoid such questionable code, developed for embedded systems.[48]
There are also compilers, libraries, and operating system level mechanisms for performing actions that are not a standard part of C, such as bounds checking for arrays, detection of buffer overflow, serialization, dynamic memory tracking, and automatic garbage collection.
Memory management checking tools like Purify or Valgrind and linking with libraries containing special versions of the memory allocation functions can help uncover run-time errors in memory usage.[49][50]
Uses
C has been widely used to implement end-user and system-level applications.[51]
Rationale for use in systems programming
C is widely used for systems programming in implementing operating systems and embedded system applications.[52] This is for several reasons:
- The C language permits platform hardware and memory to be accessed with pointers and type punning, so system-specific features (e.g. Control/Status Registers, I/O registers) can be configured and used with code written in C – it allows fullest control of the platform it is running on.
- The code generated by compilation does not demand many system features, and can be invoked from some boot code in a straightforward manner – it is simple to execute.
- The C language statements and expressions typically map well to sequences of instructions for the target processor, and consequently there is a low run-time demand on system resources – it is fast to execute.
- With its rich set of operators, the C language can use many of the features of target CPUs. Where a particular CPU has more esoteric instructions, a language variant can be constructed with perhaps intrinsic functions to exploit those instructions – it can use practically all the target CPU's features.
- The language makes it easy to overlay structures onto blocks of binary data, allowing the data to be comprehended, navigated and modified – it can write data structures, even file systems.
- The language supports a rich set of operators, including bit manipulation, for integer arithmetic and logic, and perhaps different sizes of floating point numbers – it can process appropriately structured data effectively.
- C is a fairly small language, with only a handful of statements, and without too many features that generate extensive target code – it is comprehensible.
- C has direct control over memory allocation and deallocation, which gives reasonable efficiency and predictable timing to memory-handling operations, without any concerns for sporadic stop-the-world garbage collection events – it has predictable performance.
- C permits the use and implementation of different memory allocation schemes, including a typical malloc and free; a more sophisticated mechanism with arenas; or a version for an OS kernel that may suit DMA, use within interrupt handlers, or integrated with the virtual memory system.
- Depending on the linker and environment, C code can also call libraries written in assembly language, and may be called from assembly language – it interoperates well with other lower-level code.
- C and its calling conventions and linker structures are commonly used in conjunction with other high-level languages, with calls both to C and from C supported – it interoperates well with other high-level code.
- C has a mature and broad ecosystem, including libraries, frameworks, open source compilers, debuggers and utilities, and is the de facto standard. It is likely the drivers already exist in C, or that there is a similar CPU architecture as a back-end of a C compiler, so there is reduced incentive to choose another language.
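Two of the points above, overlaying a structure on binary data and bit manipulation, can be sketched together. The `packet_header` layout and both function names are hypothetical; the sketch assumes the compiler adds no padding to a structure of four one-byte members, and it copies the bytes with `memcpy` rather than casting the pointer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: four consecutive one-byte fields. */
struct packet_header {
    uint8_t version;
    uint8_t flags;
    uint8_t length;
    uint8_t checksum;
};

/* Interprets the first bytes of a raw buffer as a header. */
struct packet_header read_header(const unsigned char *bytes) {
    struct packet_header h;
    memcpy(&h, bytes, sizeof h);
    return h;
}

/* Tests a single bit of the flags field (bit 0 is the least significant). */
int flag_is_set(uint8_t flags, unsigned bit) {
    return (flags >> bit) & 1u;
}
```

Fields wider than one byte would also need byte-order handling, which this sketch avoids by using single-byte members only.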
Games
Computer games are often built from a combination of languages. C has featured significantly, especially for those games attempting to obtain best performance from computer platforms. Examples include Doom from 1993.[53]
World Wide Web
Historically, C was sometimes used for web development using the Common Gateway Interface (CGI) as a "gateway" for information between the web application, the server, and the browser.[54] C may have been chosen over interpreted languages because of its speed, stability, and near-universal availability.[55] It is no longer common practice for web development to be done in C,[56] and many other web development languages are popular. Applications where C-based web development continues include the HTTP configuration pages on routers, IoT devices and similar, although even here some projects have parts in higher-level languages e.g. the use of Lua within OpenWRT.
Two popular web servers, Apache HTTP Server and Nginx, are written in C.[57][58][better source needed] C's close-to-the-metal approach allows for the construction of these high-performance software systems.[citation needed]
C as an intermediate language
C is sometimes used as an intermediate language by implementations of other languages. This approach may be used for portability or convenience; by using C as an intermediate language, additional machine-specific code generators are not necessary. C has some features, such as line-number preprocessor directives and optional superfluous commas at the end of initializer lists, that support compilation of generated code. However, some of C's shortcomings have prompted the development of other C-based languages specifically designed for use as intermediate languages, such as C--. Also, contemporary major compilers GCC and LLVM both feature an intermediate representation that is not C, and those compilers support front ends for many languages including C.
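The two features named above might appear in generated code roughly as follows. This is only a sketch of what a code generator could emit; the table, the function names and the source file name `example.src` are all hypothetical.

```c
/* A generator can emit a comma after every initializer, including the last. */
static const int generated_table[] = {
    1,
    2,
    3,
};

int generated_table_length(void) {
    return (int)(sizeof generated_table / sizeof generated_table[0]);
}

/* The directive makes the next line count as line 100 of "example.src",
   so diagnostics refer to the original source rather than the generated C. */
#line 100 "example.src"
int line_in_original_source(void) { return __LINE__; }
```

The trailing comma lets a generator write each element with the same output routine, and the `#line` directive lets compiler messages point back to the language being translated.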
Computationally intensive libraries
C enables programmers to create efficient implementations of algorithms and data structures, because the layer of abstraction from hardware is thin, and its overhead is low, an important criterion for computationally intensive programs. For example, the GNU Multiple Precision Arithmetic Library, the GNU Scientific Library, Mathematica, and MATLAB are completely or partially written in C. Many languages support calling library functions in C; for example, the Python-based framework NumPy uses C for the high-performance and hardware-interacting aspects.
Other languages are written in C
A consequence of C's wide availability and efficiency is that compilers, libraries and interpreters of other programming languages are often implemented in C.[59] For example, the reference implementations of Python,[60] Perl,[61] Ruby,[62] and PHP[63] are written in C.
Limitations
Ritchie himself joked about the limitations of the language that he created:[64]
the power of assembly language and the convenience of ... assembly language
— Dennis Ritchie
While C is popular, influential and hugely successful, it has drawbacks, including:
- The standard dynamic memory handling with malloc and free is prone to mistakes. Improper use can lead to memory leaks and dangling pointers.[65]
- The use of pointers and the direct manipulation of memory means corruption of memory is possible.
- There is type checking, yet it does not apply to some areas like variadic functions, and the type checking can be trivially or inadvertently circumvented. It is weakly typed, despite being statically typed.
- Since the code generated by the compiler contains few run-time checks, there is a burden on the programmer to consider all possible outcomes, to protect against buffer overruns, array bounds checking, stack overflows, and memory exhaustion, and consider race conditions, thread isolation, etc.
- The use of pointers and the run-time manipulation of these enables two ways to access the same data (aliasing), which is not always determinable at compile time. This means that some optimizations that may be available to some other languages, such as Fortran, are not possible in C. For this reason, Fortran is sometimes considered faster.[citation needed]
- Some of the standard library functions, e.g. scanf or strncat, can lead to buffer overruns.
- There is limited standardization in support for low-level variants in generated code, such as different function calling conventions and ABIs; different structure packing conventions; and different byte ordering within larger integers (including endianness). In many language implementations, some of these options may be handled with the preprocessor directive #pragma,[66][67] and some with additional keywords e.g. use __cdecl calling convention. The directive and options are not consistently supported.[68]
- String handling using the standard library is code-intensive, with explicit memory management required.
- The language does not directly support object orientation, introspection, run-time expression evaluation (like eval in JavaScript), garbage collection, etc.
- There are few guards against misuse of language features, which may enable unmaintainable code. In particular, the C preprocessor can hide troubling effects such as double evaluation and worse.[69] This capability for obfuscated code has been celebrated with competitions such as the International Obfuscated C Code Contest and the Underhanded C Contest.
- C lacks standard support for exception handling and only offers return codes for error checking. The setjmp and longjmp standard library functions have been used[70] to implement a try-catch mechanism via macros. Also, goto statements are commonly used for error handling.[citation needed]
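The use of return codes together with goto for error handling can be sketched as follows. The function `read_prefix` is hypothetical; it acquires two resources and releases them at a single cleanup label whichever step fails.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copies the first n bytes of the file at `path` into `out`.
   Returns 0 on success and -1 on any failure. */
int read_prefix(const char *path, char *out, size_t n) {
    int status = -1;          /* assume failure until every step succeeds */
    FILE *fp = NULL;
    char *scratch = NULL;

    fp = fopen(path, "rb");
    if (fp == NULL) {
        goto cleanup;
    }
    scratch = malloc(n);
    if (scratch == NULL) {
        goto cleanup;
    }
    if (fread(scratch, 1, n, fp) != n) {
        goto cleanup;
    }
    memcpy(out, scratch, n);
    status = 0;

cleanup:
    free(scratch);            /* free(NULL) does nothing */
    if (fp != NULL) {
        fclose(fp);
    }
    return status;
}
```

Because every exit passes through the same label, each resource is released exactly once; the caller learns of failure only by checking the return code.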
For some purposes, restricted styles of C have been adopted, e.g. MISRA C or CERT C, in an attempt to reduce the opportunity for glitches. Databases such as CWE attempt to count the ways that C has potential vulnerabilities, along with recommendations for mitigation.
There are tools that can mitigate some of the drawbacks. Contemporary C compilers include checks which may generate warnings to help identify many potential bugs.
Related languages
Many languages developed after C were influenced by and borrowed aspects of C, including C++, C#, C shell, D, Go, Java, JavaScript, Julia, Limbo, LPC, Objective-C, Perl, PHP, Python, Ruby, Rust, Swift, Verilog and SystemVerilog.[8][71] Some claim that the most pervasive influence has been syntactical – that these languages combine the statement and expression syntax of C with type systems, data models and large-scale program structures that differ from those of C, sometimes radically.
Several C or near-C interpreters exist, including Ch and CINT, which can also be used for scripting.
When object-oriented programming languages became popular, C++ and Objective-C were two different extensions of C that provided object-oriented capabilities. Both languages were originally implemented as source-to-source compilers; source code was translated into C, and then compiled with a C compiler.[72]
The C++ programming language (originally named "C with Classes") was devised by Bjarne Stroustrup as an approach to providing object-oriented functionality with a C-like syntax.[73] C++ adds greater typing strength, scoping, and other tools useful in object-oriented programming, and permits generic programming via templates. Nearly a superset of C, C++ now[when?] supports most of C, with a few exceptions.
Objective-C was originally a thin layer on top of C, and remains a strict superset of C that permits object-oriented programming using a hybrid dynamic/static typing paradigm. Objective-C derives its syntax from both C and Smalltalk: syntax that involves preprocessing, expressions, function declarations, and function calls is inherited from C, while the syntax for object-oriented features was originally taken from Smalltalk.
In addition to C++ and Objective-C, Ch, Cilk, and Unified Parallel C are nearly supersets of C.
Notes
- ^ "Thompson had made a brief attempt to produce a system coded in an early version of C—before structures—in 1972, but gave up the effort."[2][3][4]
- ^ "The scheme of type composition adopted by C owes considerable debt to Algol 68, although it did not, perhaps, emerge in a form that Algol's adherents would approve of."[6][7][4]
- ^ Pronounced /ˈsiː/, like the letter 'c'.[9]
- ^ The original example code will compile on most modern compilers that are not in strict standard compliance mode, but it does not fully conform to the requirements of either C89 or C99. In fact, C99 requires that a diagnostic message be produced.
- ^ Return value 0 is typically used in this context to indicate success.[18]
- ^ Code of print_array (not shown) slightly differs also, because of the type of p, being a pointer to the 2D array in the malloc'd version, and just a 2D array in the auto VLA version.
References
- ^ a b Prinz, Peter; Crawford, Tony (December 16, 2005). C in a Nutshell. O'Reilly Media, Inc. p. 3. ISBN 978-0-596-55071-4.
- ^ Ritchie (1993a), p. 9.
- ^ Ritchie (1993b), p. 9.
- ^ a b Ritchie (2003).
- ^ "N3221 – Editor's Report, Post January 2024 Strasbourg France Meeting". ISO/IEC JTC1/SC22/WG14. Open Standards. February 21, 2024. Retrieved May 24, 2024.
- ^ Ritchie (1993a), p. 8.
- ^ Ritchie (1993b), p. 8.
- ^ a b "Verilog HDL (and C)" (PDF). The Research School of Computer Science at the Australian National University. June 3, 2010. Archived from the original (PDF) on November 6, 2013. Retrieved August 19, 2013. "1980s: Verilog first introduced; Verilog inspired by the C programming language".
- ^ "The name is based on, and pronounced like the letter C in the English alphabet". the c programming language sound. English Chinese Dictionary. Archived from the original on November 17, 2022. Retrieved November 17, 2022.
- ^ Munoz, Daniel. "After All These Years, the World is Still Powered by C Programming | Toptal". Toptal Engineering Blog. Retrieved June 15, 2024.
- ^ "C Language Drops to Lowest Popularity Rating". Developer.com. August 9, 2016. Archived from the original on August 22, 2022. Retrieved August 1, 2022.
- ^ a b c d e f g Ritchie (1993a).
- ^ "Programming Language Popularity". 2009. Archived from the original on January 16, 2009. Retrieved January 16, 2009.
- ^ "TIOBE Programming Community Index". 2009. Archived from the original on May 4, 2009. Retrieved May 6, 2009.
- ^ Ward, Terry A. (August 1983). "Annotated C / A Bibliography of the C Language". Byte. p. 268. Retrieved January 31, 2015.
- ^ "TIOBE Index for September 2024". Archived from the original on September 18, 2024. Retrieved September 20, 2024.
- ^ Kernighan & Ritchie (1978), p. 6.
- ^ a b c d e f g Klemens, Ben (2013). 21st Century C. O'Reilly Media. ISBN 978-1-4493-2714-9.
- ^ Ritchie, Dennis. "BCPL to B to C". lysator.liu.se. Archived from the original on December 12, 2019. Retrieved September 10, 2019.
- ^ a b c d e Jensen, Richard (December 9, 2020). ""A damn stupid thing to do"—the origins of C". Ars Technica. Archived from the original on March 28, 2022. Retrieved March 28, 2022.
- ^ a b Johnson, S. C.; Ritchie, D. M. (1978). "Portability of C Programs and the UNIX System". Bell System Tech. J. 57 (6): 2021–2048. CiteSeerX 10.1.1.138.35. doi:10.1002/j.1538-7305.1978.tb02141.x. ISSN 0005-8580. S2CID 17510065. (Note: The PDF is an OCR scan of the original, and contains a rendering of "IBM 370" as "IBM 310".)
- ^ McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. p. 10. 139. Archived (PDF) from the original on November 11, 2017. Retrieved February 1, 2015.
- ^ Kernighan & Ritchie (1978).
- ^ "C manual pages". FreeBSD Miscellaneous Information Manual (FreeBSD 13.0 ed.). May 30, 2011. Archived from the original on January 21, 2021. Retrieved January 15, 2021.
- ^ Kernighan & Ritchie (1988).
- ^ Stroustrup, Bjarne (2002). Sibling rivalry: C and C++ (PDF) (Report). AT&T Labs. Archived (PDF) from the original on August 24, 2014. Retrieved April 14, 2014.
- ^ "Rationale for American National Standard for Information Systems – Programming Language – C". Archived from the original on July 17, 2024. Retrieved July 17, 2024.
- ^ C Integrity. International Organization for Standardization. March 30, 1995. Archived from the original on July 25, 2018. Retrieved July 24, 2018.
- ^ "JTC1/SC22/WG14 – C". Home page. ISO/IEC. Archived from the original on February 12, 2018. Retrieved June 2, 2011.
- ^ Andrew Binstock (October 12, 2011). "Interview with Herb Sutter". Dr. Dobbs. Archived from the original on August 2, 2013. Retrieved September 7, 2013.
- ^ "ISO/IEC 9899:2024 (en) — N3220 working draft" (PDF). Retrieved July 11, 2025.
- ^ "WG14-N3132 : Revised C23 Schedule" (PDF). open-std.org. June 4, 2023. Archived (PDF) from the original on June 9, 2023.
- ^ "WG14-N3220 : Working Draft, C2y" (PDF). open-std.org. February 21, 2024. Archived (PDF) from the original on February 26, 2024.
- ^ "TR 18037: Embedded C" (PDF). open-std.org. April 4, 2006. ISO/IEC JTC1 SC22 WG14 N1169. Archived (PDF) from the original on February 25, 2021. Retrieved July 26, 2011.
- ^ Harbison, Samuel P.; Steele, Guy L. (2002). C: A Reference Manual (5th ed.). Englewood Cliffs, NJ: Prentice Hall. ISBN 978-0-13-089592-9. Contains a BNF grammar for C.
- ^ Kernighan & Ritchie (1988), p. 192.
- ^ Kernighan & Ritchie (1978), p. 3.
- ^ a b "Committee Draft ISO/IEC 9899:TC3: 5.2.1 Character sets". 2007.
- ^ "ISO/IEC 9899:201x (ISO C11) Committee Draft" (PDF). open-std.org. December 2, 2010. Archived (PDF) from the original on December 22, 2017. Retrieved September 16, 2011.
- ^ Kernighan & Ritchie (1988), pp. 192, 259.
- ^ "10 Common Programming Mistakes in C++". Cs.ucr.edu. Archived from the original on October 21, 2008. Retrieved June 26, 2009.
- ^ Schultz, Thomas (2004). C and the 8051 (3rd ed.). Otsego, MI: PageFree Publishing Inc. p. 20. ISBN 978-1-58961-237-2. Retrieved February 10, 2012.
- ^ Feuer, Alan R.; Gehani, Narain H. (March 1982). "Comparison of the Programming Languages C and Pascal". ACM Computing Surveys. 14 (1): 73–92. doi:10.1145/356869.356872. S2CID 3136859.
- ^ Kernighan & Ritchie (1988), p. 122.
- ^ For example, gcc provides _FORTIFY_SOURCE. "Security Features: Compile Time Buffer Checks (FORTIFY_SOURCE)". fedoraproject.org. Archived from the original on January 7, 2007. Retrieved August 5, 2012.
- ^ เอี่ยมสิริวงศ์, โอภาศ (2016). Programming with C. Bangkok, Thailand: SE-EDUCATION PUBLIC COMPANY LIMITED. pp. 225–230. ISBN 978-616-08-2740-4.
- ^ Raymond, Eric S. (October 11, 1996). The New Hacker's Dictionary (3rd ed.). MIT Press. p. 432. ISBN 978-0-262-68092-9. Retrieved August 5, 2012.
- ^ "Man Page for lint (freebsd Section 1)". unix.com. May 24, 2001. Retrieved July 15, 2014.
- ^ "CS107 Valgrind Memcheck". web.stanford.edu. Retrieved June 23, 2023.
- ^ Hastings, Reed; Joyce, Bob. "Purify: Fast Detection of Memory Leaks and Access Errors" (PDF). Pure Software Inc.: 9.
- ^ Munoz, Daniel. "After All These Years, the World is Still Powered by C Programming". Toptal Engineering Blog. Retrieved November 17, 2023.
- ^ Dale, Nell B.; Weems, Chip (2014). Programming and problem solving with C++ (6th ed.). Burlington, Massachusetts: Jones & Bartlett Learning. ISBN 978-1-4496-9428-9. OCLC 894992484.
- ^ "Development of Doom". DoomWiki.org. March 2, 2025. Retrieved March 2, 2025.
- ^ Dr. Dobb's Sourcebook. U.S.: Miller Freeman, Inc. November–December 1995.
- ^ "Using C for CGI Programming". linuxjournal.com. March 1, 2005. Archived from the original on February 13, 2010. Retrieved January 4, 2010.
- ^ Perkins, Luc (September 17, 2013). "Web development in C: crazy? Or crazy like a fox?". Medium. Archived from the original on October 4, 2014. Retrieved April 8, 2022.
- ^ "What programming language does NGINX use?".
- ^ "What is Apache and What Does it Do for Website Development?".
- ^ "C – the mother of all languages". ICT Academy at IITK. November 13, 2018. Archived from the original on May 31, 2021. Retrieved October 11, 2022.
- ^ "1. Extending Python with C or C++". Python 3.10.7 documentation. Archived from the original on November 5, 2012. Retrieved October 11, 2022.
- ^ Conrad, Michael (January 22, 2018). "An overview of the Perl 5 engine". Opensource.com. Archived from the original on May 26, 2022. Retrieved October 11, 2022.
- ^ "To Ruby From C and C++". Ruby Programming Language. Archived from the original on August 12, 2013. Retrieved October 11, 2022.
- ^ Para, Michael (August 3, 2022). "What is PHP? How to Write Your First PHP Program". freeCodeCamp. Archived from the original on August 4, 2022. Retrieved October 11, 2022.
- ^ Metz, Cade (October 13, 2011). "Dennis Ritchie: The Shoulders Steve Jobs Stood On". Wired. Archived from the original on April 12, 2022. Retrieved April 19, 2022.
- ^ Internet Security Research Group. "What is memory safety and why does it matter?". Prossimo. Retrieved March 3, 2025.
- ^ corob-msft (March 31, 2022). "Pragma directives and the __pragma and _Pragma keywords". Microsoft Learn. Archived from the original on September 24, 2022. Retrieved September 24, 2022.
- ^ "Pragmas (The C Preprocessor)". GCC, the GNU Compiler Collection. Archived from the original on June 17, 2002. Retrieved September 24, 2022.
- ^ "Pragmas". Intel C++ Compiler Classic Developer Guide and Reference. Intel. Archived from the original on April 10, 2022. Retrieved April 10, 2022.
- ^ "In praise of the C preprocessor". apenwarr. August 13, 2007. Retrieved July 9, 2023.
- ^ Roberts, Eric S. (March 21, 1989). "Implementing Exceptions in C" (PDF). DEC Systems Research Center. SRC-RR-40. Archived (PDF) from the original on January 15, 2017. Retrieved January 4, 2022.
- ^ O'Regan, Gerard (September 24, 2015). Pillars of computing : a compendium of select, pivotal technology firms. Springer. ISBN 978-3-319-21464-1. OCLC 922324121.
- ^ Rauchwerger, Lawrence (2004). Languages and compilers for parallel computing : 16th international workshop, LCPC 2003, College Station, TX, USA, October 2–4, 2003 : revised papers. Springer. ISBN 978-3-540-24644-2. OCLC 57965544.
- ^ Stroustrup, Bjarne (1993). "A History of C++: 1979–1991" (PDF). Archived (PDF) from the original on February 2, 2019. Retrieved June 9, 2011.
Sources
- Kernighan, Brian W.; Ritchie, Dennis M. (1978). The C Programming Language (1st ed.). Englewood Cliffs: Prentice Hall. ISBN 978-0-13-110163-0. LCCN 77028983. OCLC 3608698. OL 4558528M. Wikidata Q63565563.
- Kernighan, Brian W.; Ritchie, Dennis M. (1988). The C Programming Language (2nd ed.). Upper Saddle River: Prentice Hall. ISBN 978-0-13-110362-7. LCCN 88005934. OCLC 254455874. OL 2030445M. Wikidata Q63413168.
- Ritchie, Dennis M. (March 1993a). Wexelblat, Richard L. (ed.). "The Development of the C Language". ACM SIGPLAN Notices. 28 (3). New York City: Association for Computing Machinery: 201–208. doi:10.1145/155360.155580. ISSN 0362-1340. Wikidata Q55869040.
- Ritchie, Dennis M. (1993b). Bergin, Thomas J.; Gibson, Richard G. (eds.). "The Development of the C Language". The Second ACM SIGPLAN Conference on History of Programming Languages (HOPL-II). New York City: Association for Computing Machinery: 201–208. doi:10.1145/154766.155580. Wikidata Q29392176.
- Ritchie, Dennis M. (2003) [1993]. The Development of the C Language. Dennis Ritchie. Wikidata Q134885774. Archived from the original on January 30, 2025 – via Bell Labs/Lucent Technologies.
Further reading
- Plauger, P.J. (1992). The Standard C Library (1 ed.). Prentice Hall. ISBN 978-0-13-131509-9. (source)
- Banahan, M.; Brady, D.; Doran, M. (1991). The C Book: Featuring the ANSI C Standard (2 ed.). Addison-Wesley. ISBN 978-0-201-54433-6. (free)
- Feuer, Alan R. (1985). The C Puzzle Book (1 ed.). Prentice Hall. ISBN 0-13-109934-5.
- Harbison, Samuel; Steele, Guy Jr. (2002). C: A Reference Manual (5 ed.). Pearson. ISBN 978-0-13-089592-9. (archive)
- King, K.N. (2008). C Programming: A Modern Approach (2 ed.). W. W. Norton. ISBN 978-0-393-97950-3. (archive)
- Griffiths, David; Griffiths, Dawn (2012). Head First C (1 ed.). O'Reilly. ISBN 978-1-4493-9991-7.
- Perry, Greg; Miller, Dean (2013). C Programming: Absolute Beginner's Guide (3 ed.). Que. ISBN 978-0-7897-5198-0.
- Deitel, Paul; Deitel, Harvey (2015). C: How to Program (8 ed.). Pearson. ISBN 978-0-13-397689-2.
- Gustedt, Jens (2019). Modern C (2 ed.). Manning. ISBN 978-1-61729-581-2. (free)
External links
- ISO C Working Group official website
- ISO/IEC 9899, publicly available official C documents, including the C99 Rationale
- "C99 with Technical corrigenda TC1, TC2, and TC3 included" (PDF). Archived (PDF) from the original on October 25, 2007. (3.61 MB)
- comp.lang.c Frequently Asked Questions
- A History of C, by Dennis Ritchie
- C Library Reference and Examples
C (programming language)
Introduction
Characteristics
C is a procedural, imperative programming language that supports structured programming through features like compound statements, loops, and conditional branching, allowing developers to organize code into functions for modular design.[7] Its design emphasizes direct control over program execution, where statements explicitly modify program state step by step.[7]
A key distinguishing feature of C is its low-level access to memory via pointers, which treat memory as a linear array of addressable cells, enabling explicit manipulation of data structures and direct interaction with system resources such as device registers.[7] This capability supports systems programming tasks, including operating system kernels and embedded applications, while maintaining a balance with higher-level abstractions. C's minimalist design is evident in the small set of 32 keywords in the original ANSI standard. The language has no built-in support for object-oriented constructs such as classes or inheritance, nor for functional features such as closures; functions can be passed around only as function pointers. This keeps the language core simple and extensible through libraries.[7]
Portability across diverse hardware platforms is achieved through C's abstract machine model, which defines a parameterized, nondeterministic execution environment assuming basic types such as characters of at least 8 bits and integers of specified minimum sizes, independent of specific processor architectures.[8] This model facilitates recompilation on different systems with minimal changes, as demonstrated by the porting of the Unix operating system to other machines in the late 1970s.[8]
C programs compile directly to efficient machine code without requiring a heavyweight runtime system, producing standalone executables that use the host operating system's services only as needed, which contributes to its high performance in resource-constrained environments.[7] Influenced by ALGOL 60's structured syntax and type system, as well as the simplicity of its predecessor B, C prioritizes performance and expressiveness in systems programming, and was originally developed for Unix at Bell Labs.[7]
Hello, World Example
The "Hello, World!" program exemplifies the minimal structure required for a functional C executable, showcasing input/output operations and program termination. This simple example outputs a greeting to the console, relying on the C standard library for basic functionality.
#include <stdio.h>
int main(void) {
    printf("Hello, World!\n");
    return 0;
}
The #include <stdio.h> directive is a preprocessor instruction that incorporates the declarations from the standard input/output header file into the program before compilation, enabling access to functions like printf for formatted output.[9] Preprocessor directives, beginning with #, are handled by the preprocessor phase, which expands macros and includes files to prepare the code for the compiler.
The int main(void) line defines the program's entry point function, where execution begins; int specifies the return type as an integer, and void indicates no input arguments are accepted. The printf("Hello, World!\n"); statement invokes the printf function from the standard library to display the specified string on standard output, with \n producing a newline.[9] The standard library supplies essential I/O operations, including those declared in stdio.h. The return 0; statement ends the main function and signals successful completion to the host environment by returning the integer value 0.
Compilation transforms the source code (typically saved as hello.c) into an executable binary through several stages: preprocessing (handling directives like #include), compilation to assembly, assembly to object code, and linking with the standard library to resolve external references such as printf. Using the GNU Compiler Collection (GCC), this is achieved with the command gcc hello.c -o hello, producing an executable named hello; execution follows via ./hello on Unix-like systems.
History
Early Developments
The C programming language originated at Bell Labs in the late 1960s and early 1970s, emerging as a tool for systems programming amid the development of the Unix operating system. Its immediate predecessor was the B language, created by Ken Thompson in 1969 for the PDP-7 minicomputer, which itself derived from Martin Richards's BCPL language of the mid-1960s. BCPL was a typeless, word-oriented language initially used at Bell Labs for the Multics project, but Thompson simplified it into B to fit the memory constraints of early Unix development on the PDP-7, which had only 8K 18-bit words available. B proved effective for writing early Unix utilities but lacked robust support for character handling and floating-point operations, limiting its suitability for the more capable PDP-11 acquired in 1970.[10]
Dennis Ritchie began extending B in 1971 to address these shortcomings, motivated by the need for a higher-level language that could enable portable systems programming beyond the inefficiencies of assembly code. This work, initially termed "New B" or NB, introduced explicit data types such as int and char to better accommodate the PDP-11's byte-addressable architecture, reducing pointer overhead and improving efficiency. By 1972, Ritchie formalized these changes, adding structures (struct) for enhanced data abstraction and having array names convert to pointers when used in expressions, marking the birth of C as a distinct language during a highly creative period that coincided with Unix's growth. These additions allowed C to support more structured and abstract programming while retaining B's simplicity and low-level access.[10]
A pivotal demonstration of C's viability came in 1973, when Ritchie and Thompson rewrote the Unix kernel in C over the summer on the PDP-11, replacing much of the prior assembly code and proving the language's reliability for operating system implementation. This rewrite was feasible only after the essentials of modern C were in place by early 1973; an earlier 1972 attempt using a version without structures had been abandoned. Initially, C lacked a formal specification and was distributed informally through typed manuscripts and internal manuals shared among Bell Labs developers, facilitating its adoption within the Unix team before broader dissemination. The core motivation throughout, creating a portable alternative to assembly, stems from Unix's evolution from resource-constrained beginnings to a more general-purpose system.[10]
Pre-Standard Era
The pre-standard era of the C programming language, spanning roughly from 1973 to 1983, was characterized by its informal specification and rapid evolution within the Bell Labs environment, building on earlier experiments with the B language. C emerged as a typed successor to the typeless B, introducing fundamental features like explicit data types (int and char), arrays, and pointers to enhance expressiveness for systems programming. By 1973, these changes had solidified, with functions gaining typed return values and parameters, marking a key transition from B's limitations. This period laid the groundwork for C's use in rewriting the Unix kernel, emphasizing efficiency on the PDP-11 minicomputer.[1]
A pivotal milestone was the 1978 publication of The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie, which provided the first comprehensive, widely accessible description of the language and effectively defined "K&R C" as the de facto standard. The book described a language without function prototypes: a function definition listed only the parameter names in its parentheses, with the parameter types declared separately before the body. It also described the implicit int rule, under which a declaration that omitted a type specifier was taken to be int and a function called without a prior declaration was assumed to return int. These conventions reflected C's origins in a resource-constrained environment but introduced subtleties in type checking and argument passing. The preprocessor, introduced around 1972–1973, was also detailed, supporting directives like #include for file inclusion and #define for macro substitution, with later enhancements for macros with arguments and conditional compilation by 1975. Early library conventions, such as Mike Lesk's portable I/O package, emerged by 1973 to facilitate input/output across hardware, forming the basis for subsequent standard libraries.[1][11]
C saw widespread adoption starting with Unix Version 6 in May 1975, which included the first distributed C compiler and much of the Unix system rewritten in C, enabling broader portability beyond assembly language implementations. This release marked C's shift from an internal tool to a language influencing academic and commercial systems, with source code licensed to universities. However, the lack of a formal standard led to significant variations across implementations; for instance, pointer arithmetic and integer sizes differed on machines like the Interdata 8/32 in 1977, causing portability issues such as misalignment or overflow in code ported from the PDP-11. Compilers for different machines, such as the PDP-11 and later the VAX, often diverged in handling implicit declarations or preprocessor extensions, compelling developers to use conditional compilation (#ifdef) for machine-specific adaptations. These challenges underscored the need for standardization but also demonstrated C's flexibility in diverse environments.[1][12]
ANSI and ISO Standardization
In 1983, the American National Standards Institute (ANSI) formed the X3J11 committee under the Accredited Standards Committee X3 on Information Processing Systems to develop a formal standard for the C programming language, building on the de facto K&R specification to address growing needs for portability amid diverse implementations.[3][13] The committee, comprising representatives from industry, academia, and users, held its first meeting in June 1983 and conducted over 20 meetings through 1988, reviewing base documents including the reference manual from the first edition of The C Programming Language and prior proposals like the /usr/group standard.[14][13]
The committee's efforts culminated in the publication of ANSI X3.159-1989, commonly known as ANSI C or C89, which was ratified by ANSI on December 14, 1989, and officially published in the spring of 1990.[3][14] This standard was subsequently adopted internationally by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) as ISO/IEC 9899:1990, or C90, following minor editorial adjustments to align with ISO formatting and procedures, without substantive changes to the language definition.[15][3] The ISO ratification process involved balloting among national bodies, achieving approval in mid-1990. The standard later received Amendment 1 (ISO/IEC 9899:1990/AMD 1:1995), which added digraphs, the <iso646.h> header of alternative operator spellings, and the <wchar.h> and <wctype.h> headers for wide and multibyte character handling; the amendment did not alter core semantics.[16][13]
Key enhancements in C89/C90 over prior practices included the introduction of function prototypes, which specify parameter types in declarations to enable compile-time type checking and conversion of arguments to the declared parameter types, reducing errors from implicit int returns and mismatched calls.[14] The void type was formalized as an incomplete type used for functions returning no value (e.g., void f(void);), for explicitly empty parameter lists (e.g., void (*ptr)(void);), and for generic pointers (e.g., void *), enhancing expressiveness without introducing new storage.[14] Additionally, the const qualifier declared objects as read-only to prevent modification and aid optimization (e.g., const int max = 100;), while volatile ensured that accesses to variables like hardware registers bypassed compiler optimizations, guaranteeing fresh reads and writes (e.g., volatile int status;).[14] These features promoted safer, more maintainable code while preserving compatibility.
The standard resolved numerous ambiguities in K&R C by precisely defining behaviors, scopes, and conversions, such as standardizing integral promotions to preserve values in mixed signed/unsigned operations and clarifying array-to-pointer decay rules.[14] It explicitly categorized certain actions as undefined behavior—such as unsequenced modifications or dereferencing null pointers—allowing compilers flexibility for optimization while requiring conformance in defined cases, thus eliminating implementation-specific interpretations that plagued earlier variants.[14][17] This precision extended to linkage and storage duration, ensuring consistent program semantics across environments.
The ANSI/ISO standardization significantly improved C's portability by providing a common reference for compilers, enabling strictly conforming code to compile and behave consistently on diverse platforms without depending on vendor-specific extensions.[3][18] Compiler conformance became measurable against the standard's requirements for translation phases, diagnostics, and library functions, fostering widespread adoption and reducing fragmentation; for instance, major vendors like Microsoft and UNIX implementers aligned their tools to C89/C90, boosting C's role in systems programming.[14][19]
Post-1999 Revisions
The first major revision after the initial ISO standardization was C99, formally known as ISO/IEC 9899:1999, which introduced several enhancements to improve expressiveness and support for modern hardware. Key additions included inline functions for better code optimization, the long long integer type of at least 64 bits, variable-length arrays (VLAs) whose size is determined at run time, and support for complex numbers via the <complex.h> header. These features aimed to address limitations in numerical computing and performance-critical applications while remaining largely backward compatible with C90.[20]
Subsequent updates continued this evolution with C11 (ISO/IEC 9899:2011), which focused on concurrency and generic programming to meet demands from parallel processing environments. Notable innovations were the _Generic keyword for type-generic expressions, the _Atomic qualifier for atomic operations, and a memory model with memory ordering specifications to support multithreading through the new <threads.h> and <stdatomic.h> headers. C11 also removed the unsafe gets() function, which had already been deprecated because of its buffer overflow risks, and introduced an optional annex (Annex K) of bounds-checked functions.[21]
C17 (ISO/IEC 9899:2018) served primarily as a maintenance release, incorporating technical corrigenda to fix defects identified in C11 without adding new language features or library extensions. It clarified ambiguities in the C11 text but left the feature set of the language and library unchanged.[22]
The most recent revision, C23 (ISO/IEC 9899:2024), published in 2024, builds on prior standards by enhancing type safety, precision, and usability for contemporary systems. It made bool, true, and false keywords, introduced nullptr as a null pointer constant, added bit-precise integers via _BitInt(N) for exact-width types, and added the constexpr specifier for objects whose values are compile-time constants. The new <stdckdint.h> header provides checked integer arithmetic that reports overflow, and other additions include typeof for naming the type of an expression and attributes for metadata. As of 2025, C23 remains the current standard.[23][5]
Ongoing development under the WG14 committee targets C2Y, the next anticipated revision, with work commencing in early 2024 and continuing into 2025. Early proposals emphasize attributes for function and type annotations, enhanced Unicode string handling via UTF-8 literals and improved character classification, and potential extensions for safer memory management, though no features have been finalized.[24][25]
Syntax and Semantics
Lexical Elements
The lexical elements of the C programming language form the fundamental units from which source code is constructed, as defined in the ISO/IEC 9899 standard. These elements include characters, tokens, comments, and directives processed through specific translation phases to ensure portability across implementations. The language's lexical structure emphasizes simplicity and efficiency, drawing from ASCII-based representations while providing mechanisms for extended character support.
The basic source character set in C consists of 96 characters: the space character, control characters for horizontal tab, vertical tab, form feed, and newline, plus 91 graphical characters including uppercase and lowercase letters (A-Z, a-z), digits (0-9), and symbols such as !, @, #, $, %, ^, &, *, (, ), -, +, =, {, }, [, ], |, \, ;, :, ', ", <, >, ,, ., and /, aligned with ISO/IEC 646:1991.[23] This set ensures compatibility with 7-bit encodings, though implementations may support multibyte characters in comments or string literals. The execution character set, used at run time, extends the source set with additional control characters like null (all bits zero), and its encoding is implementation-defined.[23]
Universal character names designate Unicode code points beyond the basic set, written as \u followed by four hexadecimal digits or \U followed by eight hexadecimal digits; they may not designate surrogate code points (U+D800 to U+DFFF) or values exceeding U+10FFFF.[23] These names can appear in identifiers, character constants, and string literals, adhering to Unicode Standard Annex #31 for identifier validity.[23]
A C source file comprises a sequence of preprocessing tokens and comments, categorized during translation into keywords, identifiers, constants, string literals, operators, punctuators, and header names.[23] Keywords are predefined, case-sensitive reserved words with fixed meanings, totaling approximately 54 in the latest standard (C23), including classics like int, if, while, return, switch, case, default, break, continue, and goto, as well as newer ones such as _Bool, _Complex, _Atomic, _Alignas, _Alignof, _Noreturn, _Static_assert, _Thread_local, bool, alignas, alignof, constexpr, nullptr, typeof, true, false, _BitInt, and type-specific _Float32, _Float64, _Float128, _Decimal32, _Decimal64, and _Decimal128.[23][26] Identifiers, used for variable and function names, are sequences starting with a letter (a-z, A-Z), underscore (_), or universal character name, followed by letters, digits (0-9), underscores, or additional universal characters; they are case-sensitive, with implementations handling at least 63 significant characters for internal identifiers and 31 for external ones, and may include the dollar sign ($) on an implementation-defined basis.[23] Identifiers beginning with double underscore (__) or underscore followed by an uppercase letter are reserved for the implementation.[23]
Constants represent fixed values: integer constants in decimal, octal (prefixed 0), hexadecimal (0x or 0X), or binary (0b or 0B, since C23) forms, with optional digit separators (a single quote ', since C23) and suffixes like U (unsigned), L (long), LL (long long), or combinations; for example, 123, 0x1A, 0b101, or 1'000U.[23] Floating-point constants appear in decimal (e.g., 1.23) or hexadecimal (0x1.2p3) notation, with exponents (e or E for decimal, p or P for hexadecimal) and optional suffixes f or F (float) and l or L (long double); implementations that support decimal floating types or the interchange and extended floating types accept additional suffixes for them, such as df, dd, and dl.[23] Character constants, enclosed in single quotes, include 'a', escape sequences like '\n' for newline or '\x1A' for a hexadecimal value, and multicharacter forms like 'AB' (an int with an implementation-defined value); prefixed forms support UTF-8 (u8), UTF-16 (u), UTF-32 (U), or wide (L) characters.[23] Enumeration constants and predefined values like true (1) and false (0) also qualify as constants.[23]
String literals are sequences of characters in double quotes, such as "hello", supporting the same prefixes as character constants (e.g., u8"hello" for UTF-8); adjacent literals are concatenated during translation, forming null-terminated arrays in the execution character set.[23] Operators include unary and binary symbols like +, -, *, /, %, =, ==, !=, <, >, &, |, ^, ~, !, &&, ||, sizeof, and ->, with digraph alternatives such as <: for [, :> for ], <% for {, %> for }, %: for #, and %:%: for ##.[23] Punctuators structure the syntax, comprising ;, ,, (, ), [, ], {, }, ., ->, and :: (the latter used since C23 to separate a prefix from an attribute name inside an attribute specifier).[23] Header names, used in inclusion directives, are enclosed in < > for system headers (e.g., <stdio.h>) or " " for local files (e.g., "myfile.h").[23]
Comments allow explanatory text, ignored by the compiler: traditional block comments /* ... */ (non-nesting, spanning multiple lines until the first */), introduced in early C, and single-line comments // ... (ending at newline), added in C99.[23] Whitespace (spaces, horizontal tabs, newlines, and form feeds) serves primarily to separate tokens, with multiple instances treated as one except in string literals. Because the lexer always forms the longest possible token, whitespace can be significant: x+++y is tokenized as x ++ + y, so a space is required to write x + ++y. Newlines terminate lines, and a backslash immediately followed by a newline enables line continuation: the pair is deleted, joining the two physical lines into one logical line.[23]
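Preprocessor directives begin with # followed by a directive name like include or define, ending at newline, and operate on preprocessing tokens before full compilation.[23] The #include directive inserts file contents at the point of the directive, searching implementation-defined locations for the <...> form and, for the "..." form, searching first in an implementation-defined manner (commonly the directory of the including file). The #define directive defines object-like and function-like macros, with support for variadic arguments through __VA_ARGS__, stringification (# operator), and token pasting (## operator); predefined macros like __LINE__ (current line number), __FILE__ (source file name), and __STDC_VERSION__ (standard version, e.g., 202311L for C23) are always available.[23]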
For example:
#define PI 3.14
#define SQUARE(x) ((x)*(x))
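Other directives include #undef for macro removal, #ifdef/#ifndef/#if/#else/#elif/#endif for conditional inclusion, and #pragma for implementation-specific controls.[23] Macros expand textually, and the result is rescanned for further expansions, although a macro is not expanded again inside its own expansion. Because expansion is purely textual, parameters and the whole replacement text are conventionally parenthesized to preserve operator precedence.[23]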
The translation environment processes source code in eight phases to form executable units.[23] Phase 1 maps the characters of the physical source file to the source character set, handling multibyte characters and line endings in an implementation-defined manner.[23] Phase 2 splices continued lines by deleting each backslash that is immediately followed by a newline.[23] Phase 3 decomposes the text into preprocessing tokens and whitespace, replacing each comment with a single space.[23] Phase 4 executes preprocessing directives and expands macros, including token pasting; files named by #include are processed recursively through phases 1 to 4, and the directives are then removed.[23] Phase 5 converts the characters and escape sequences in character constants and string literals to the execution character set.[23] Phase 6 concatenates adjacent string literals.[23] Phase 7 converts preprocessing tokens into tokens, at which point keywords are distinguished from identifiers, and performs syntactic and semantic analysis to translate the result as a translation unit.[23] Phase 8 resolves external references and links translation units and libraries into the program image.[23] This phased approach ensures consistent interpretation, with implementation-defined behaviors noted for aspects like character mappings.[23]
Data Types and Declarations
C's type system categorizes data into basic types, derived types, and other constructs, ensuring precise memory allocation and operation semantics as defined by the ISO/IEC 9899 standard.[23] This system supports low-level programming while promoting portability across implementations.[22] Basic types form the foundation, with derived types built upon them to represent complex structures.[27]
Basic Types
The fundamental data types in C include integer types, floating-point types, and the void type. Integer types can be qualified with the signed or unsigned keywords to specify their representation. Signed integer types use two's complement to represent negative values on virtually all implementations (C23 requires it), ranging from -2^(n-1) to 2^(n-1) - 1 for n bits, while unsigned types represent only non-negative values from 0 to 2^n - 1, providing a larger positive range but no support for negatives. The basic integer types include char (minimum 8 bits, signedness implementation-defined), signed char (signed, minimum 8 bits), unsigned char (unsigned, minimum 8 bits), short (signed, at least 16 bits), unsigned short (unsigned, at least 16 bits), int (signed, at least 16 bits), unsigned int (unsigned, at least 16 bits), long (signed, at least 32 bits), unsigned long (unsigned, at least 32 bits), long long (signed, at least 64 bits, added in C99), and unsigned long long (unsigned, at least 64 bits).[23][27] For example, unsigned int guarantees at least 16 bits but is often 32 bits in practice.[23]
Floating-point types include float (single precision, typically 32 bits), double (double precision, typically 64 bits), and long double (extended precision, implementation-defined, with at least the range and precision of double). These conform to IEC 60559 (also known as IEEE 754) where supported, enabling representation of real numbers with varying ranges and precision.[23] The void type indicates the absence of a value and is used primarily for functions returning no value or for generic pointers like void*.[27]
The following table summarizes the core basic types, their categories, signedness, and sizes. For the integer types, the figures are the minimum widths required by the C standard; for the floating-point types, they are the sizes of the common IEC 60559 formats, since the standard specifies minimum range and precision for these types rather than a bit width:
| Type | Category | Signedness | Minimum Size (bits) |
|---|---|---|---|
| char | Integer | Implementation-defined | 8 |
| signed char | Integer | Signed | 8 |
| unsigned char | Integer | Unsigned | 8 |
| short | Integer | Signed | 16 |
| unsigned short | Integer | Unsigned | 16 |
| int | Integer | Signed | 16 |
| unsigned int | Integer | Unsigned | 16 |
| long | Integer | Signed | 32 |
| unsigned long | Integer | Unsigned | 32 |
| long long | Integer | Signed | 64 |
| unsigned long long | Integer | Unsigned | 64 |
| float | Floating-point | Signed | 32 |
| double | Floating-point | Signed | 64 |
| long double | Floating-point | Signed | 64 |
| void | Void | N/A | N/A |
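The integer minimums in the table can be checked against a particular implementation. The following is a minimal sketch, not part of the standard text; the macro `BITS_OF` and the function `meets_minimum_sizes` are hypothetical names introduced only for illustration.

```c
#include <limits.h>

/* Hypothetical helper: storage size of a type in bits (padding included). */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

/* Returns 1 if this implementation meets the integer minimums listed above. */
int meets_minimum_sizes(void) {
    return CHAR_BIT >= 8
        && BITS_OF(short) >= 16
        && BITS_OF(int) >= 16
        && BITS_OF(long) >= 32
        && BITS_OF(long long) >= 64;
}

/* The unsigned range starts at 0 and reaches at least 2^16 - 1 for unsigned int. */
int unsigned_int_range_ok(void) {
    return UINT_MAX >= 65535u;
}
```

On a conforming implementation both functions should return 1; the actual widths (for example a 32-bit int) are usually larger than the minimums.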
C99 and Later Extensions
C99 introduced additional basic types to expand functionality. Notably, _Bool is an integer type capable of representing only the values 0 (false) and 1 (true). It needs only one bit of information, but a _Bool object still occupies at least one byte, since every object in C is at least one byte in size. It is the type of the boolean values in C and promotes to int in expressions. Other extensions include _Complex and _Imaginary for complex arithmetic; imaginary types are optional, and C11 made complex types optional as well, so both depend on implementation support.[23][27]
| Type | Category | Description | Minimum Size (bits) |
|---|---|---|---|
| _Bool | Integer | Boolean type (0 or 1) | 1 |
| _Complex | Complex | Complex floating-point (optional) | N/A (based on real/imag parts) |
| _Imaginary | Imaginary | Imaginary floating-point (optional) | N/A (based on real part) |
Derived Types
Derived types extend basic types to form pointers, arrays, structures (struct), unions, and enumerations (enum). Pointers reference memory locations of a specified type, declared as type*, such as int *p; for a pointer to an integer. Arrays declare contiguous sequences, like int arr[10]; for ten integers.
Structures aggregate heterogeneous data members within curly braces, e.g., struct point { int x; int y; };, allowing access via dot notation. Unions store variant data in overlapping memory, declared similarly with union keyword, e.g., union data { int i; float f; };, where the size matches the largest member. Enumerations define named integer constants, as in enum color { RED, GREEN, BLUE };, with values starting from 0 unless specified. These derived types enable complex data modeling while adhering to the standard's memory layout rules.[23]
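As an illustration of the declarations above, the sketch below reuses the `struct point`, `union data`, and `enum color` examples from the text; the function `sum_points_x` is a hypothetical helper added only to show member access through a pointer to an array element.

```c
#include <stddef.h>

struct point { int x; int y; };     /* members stored one after another */
union data   { int i; float f; };   /* members share the same storage */
enum color   { RED, GREEN, BLUE };  /* RED == 0, GREEN == 1, BLUE == 2 */

/* Adds up the x members of an array of points. */
int sum_points_x(const struct point *pts, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += pts[i].x;          /* dot notation on each element */
    }
    return total;
}
```

Because the union members overlap, `sizeof(union data)` is at least as large as its largest member, whereas `sizeof(struct point)` is at least the sum of its members.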
Type Qualifiers
Type qualifiers modify base or derived types to impose additional semantics. The const qualifier declares objects as read-only after initialization, preventing modification, e.g., const int max = 100;. volatile indicates that an object's value may change in ways the compiler cannot see, such as in hardware registers, so accesses to it are not optimized away, e.g., volatile int status;. Introduced in C99, restrict qualifies a pointer to assert that the object it points to is accessed only through that pointer during its lifetime, which lets the compiler optimize without accounting for aliasing, e.g., int *restrict ptr;.[23] Qualifiers can be combined, as in const volatile, and can be applied at each level of a derived type.
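The three qualifiers can be seen together in a short sketch. The function names are hypothetical, and `status_reg` merely stands in for a hardware register; no real device is assumed.

```c
#include <stddef.h>

/* const: src is not modified through this pointer.
   restrict: the caller promises that dst and src do not overlap. */
void add_into(size_t n, int *restrict dst, const int *restrict src) {
    for (size_t i = 0; i < n; i++) {
        dst[i] += src[i];
    }
}

/* volatile: each read of *status_reg is actually performed, so a value
   changed from outside the program is noticed on the next poll. */
int wait_until_nonzero(volatile int *status_reg, int max_polls) {
    for (int i = 0; i < max_polls; i++) {
        if (*status_reg != 0) {
            return 1;
        }
    }
    return 0;
}
```

Calling `add_into` with overlapping arrays would break the promise made by `restrict` and is undefined behavior, so the qualifier is a contract on the caller rather than a check performed by the compiler.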
Storage Classes
Storage classes specify duration, linkage, and visibility of objects. auto provides automatic storage duration for local variables and is the default at block scope, e.g., auto int i = 0;. register hints that a variable should be kept in a CPU register for speed, e.g., register int counter;; modern compilers generally ignore the hint, but the address-of operator (&) still cannot be applied to a register variable.[28] static grants static storage duration, so a local variable retains its value across invocations, and it limits the visibility of file-scope identifiers to the defining file, e.g., static int count = 0;. extern declares an identifier with external linkage that is defined elsewhere, allowing access across translation units, e.g., extern int global_var;. These classes interact with types to control program behavior and memory persistence.[23]
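The difference between automatic and static storage duration can be sketched as follows. All names here are hypothetical and chosen only for illustration.

```c
/* File-scope static: internal linkage, visible only in this translation unit. */
static int call_count = 0;

/* Block-scope static: initialized once, keeps its value between calls. */
int next_id(void) {
    static int id = 0;
    call_count++;
    return ++id;
}

/* Automatic storage: total is created afresh on every call. */
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* Exposes the file-scope counter without giving it external linkage. */
int calls_so_far(void) {
    return call_count;
}
```

Successive calls to `next_id` return 1, 2, 3, and so on, while `sum_to` gives the same result for the same argument every time because nothing persists between calls.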
Declaration Syntax
Declarations in C follow the syntax storage-class-specifiers type-specifiers declarator-list;, where type-specifiers define the base type and declarators add indirection or naming.[23] For aggregates, structures and unions use struct or union followed by member declarations in braces; an optional tag names the type and allows it to be referenced while still incomplete, e.g., struct node { int data; struct node* next; };. Enumerations are declared with enum, an optional tag, and a list of constants, e.g., enum status { OFF, ON };. Pointers use * in declarators; a qualifier before the * applies to the pointed-to type and a qualifier after it applies to the pointer itself, e.g., const int* const p = &x; declares a constant pointer to a constant int. This syntax ensures type-safe construction of program entities.
Type Compatibility and Promotion Rules
Two types are compatible if they are the same type, with additional rules for qualified, pointer, array, function, structure, union, and enumerated types (per section 6.2.7 of ISO/IEC 9899).[23] For structures and unions declared in separate translation units, compatibility requires matching tags and corresponding members with compatible types and the same names. Integer promotion rules convert types narrower than int to int if int can represent all their values, or to unsigned int otherwise, before arithmetic operations; for instance, char and short promote to int (section 6.3.1.1).[23] These rules make arithmetic on narrow types behave consistently, although they do not prevent overflow of int itself; similarly, the default argument promotions convert float to double when an argument is passed to a variadic function or to a function without a prototype. Such mechanisms underpin C's role in efficient memory management by aligning types with hardware constraints.[23]
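Integer promotion can be observed directly. The sketch below uses hypothetical function names and assumes a typical implementation in which int can represent every unsigned char value.

```c
#include <limits.h>

/* Both operands are promoted to int before the addition, so the full
   sum is returned rather than a value wrapped to the width of char. */
int sum_promoted(unsigned char a, unsigned char b) {
    return a + b;
}

/* Converting the int result back to unsigned char reduces it
   modulo UCHAR_MAX + 1. */
unsigned char sum_narrowed(unsigned char a, unsigned char b) {
    return (unsigned char)(a + b);
}

/* sizeof reveals the promoted type: char + char has type int. */
int char_sum_has_int_size(void) {
    char c = 'a';
    return sizeof(c + c) == sizeof(int);
}
```

With 8-bit characters, `sum_promoted(200, 100)` yields 300, while `sum_narrowed(200, 100)` yields 300 modulo 256.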
Operators and Expressions
In C, an expression consists of operators and operands that compute a value, designate an object or function, or produce side effects such as modifying storage or I/O operations.[29] Expressions form the building blocks for more complex constructs, where operands can be constants, variables, or other expressions, and operators perform the specified computations.[30] The language defines several categories of operators, each handling specific types of operations on operands, typically integers or other scalar types.[31]
Arithmetic operators perform basic mathematical operations: addition (+), subtraction (-), multiplication (*), division (/), and remainder (%). All except % work on integer or floating-point operands, while % requires integer operands; results have the type produced by the usual arithmetic conversions.[29] Unary arithmetic operators include the additive inverse (-) and postfix/prefix increment (++) and decrement (--), which change their operand by 1; for example, i++ increments i after using its current value.[32]
Relational operators compare scalar values for ordering: less than (<), greater than (>), less than or equal (<=), and greater than or equal (>=), producing integer results of 1 (true) or 0 (false).[31] Equality operators (== and !=) similarly compare for equality or inequality, also yielding 0 or 1.[29]
Logical operators evaluate boolean conditions: logical AND (&&) short-circuits if the left operand is 0, logical OR (||) short-circuits if the left is nonzero, and unary negation (!) inverts a scalar to 0 or 1. Bitwise operators manipulate integer bits: AND (&), XOR (^), inclusive OR (|), unary complement (~), and left/right shifts (<< and >>), where shifts treat the left operand as the value and the right as the shift count, with right shifts implementation-defined for signed integers.[31] Assignment operators modify and assign: simple assignment (=) stores the right operand's value into the left lvalue, while compound forms like +=, -=, *=, /=, %=, <<=, >>=, &=, ^=, and |= combine arithmetic or bitwise operations with assignment.[33]
Other operators include the sizeof operator, which yields the size in bytes of its operand's type (e.g., sizeof(int)), the ternary conditional (?:), which selects between two expressions based on a condition (e.g., condition ? expr1 : expr2), and the comma operator (,), which evaluates its left operand, discards the result, then evaluates the right and yields its value.[29] Operator precedence determines parsing order in mixed expressions, with higher precedence binding tighter; for instance, multiplication precedes addition. Associativity resolves ambiguities for same-precedence operators, mostly left-to-right except for unary, ternary, and assignment operators, which are right-to-left.[31] The following table summarizes precedence levels from highest (1) to lowest (15), as defined in the C standard:
| Precedence | Operator Description | Operators | Associativity |
|---|---|---|---|
| 1 | Postfix | ++ -- () [] . -> (type){init} | Left-to-right |
| 2 | Unary | ++ -- + - ! ~ (type) * & sizeof _Alignof alignof | Right-to-left |
| 3 | Multiplicative | * / % | Left-to-right |
| 4 | Additive | + - | Left-to-right |
| 5 | Shift | << >> | Left-to-right |
| 6 | Relational | < <= > >= | Left-to-right |
| 7 | Equality | == != | Left-to-right |
| 8 | Bitwise AND | & | Left-to-right |
| 9 | Bitwise XOR | ^ | Left-to-right |
| 10 | Bitwise OR | `\|` | Left-to-right |
| 11 | Logical AND | && | Left-to-right |
| 12 | Logical OR | `\|\|` | Left-to-right |
| 13 | Conditional | ?: | Right-to-left |
| 14 | Assignment | `= += -= *= /= %= <<= >>= &= ^= \|=` | Right-to-left |
| 15 | Comma | , | Left-to-right |
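Two rows of the table are a frequent source of surprises: equality binds tighter than bitwise AND, and assignment groups right-to-left. The sketch below spells out the grouping with explicit parentheses; the function names are hypothetical.

```c
/* Equality (level 7) binds tighter than bitwise AND (level 8), so the
   unparenthesized form  flags & mask == 0  groups like this: */
int grouped_by_precedence(unsigned flags, unsigned mask) {
    return flags & (mask == 0);
}

/* What is usually intended: test whether none of the mask bits are set. */
int mask_bits_clear(unsigned flags, unsigned mask) {
    return (flags & mask) == 0;
}

/* Assignment is right-to-left: a = b = c parses as a = (b = c). */
int chained_assignment(int c) {
    int a, b;
    a = b = c;
    return a + b;
}
```

The two mask functions disagree for most inputs, which is why many style guides recommend parenthesizing bitwise operators whenever they are mixed with comparisons.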
Precedence and associativity determine how operands are grouped, not the order in which they are evaluated. The order of evaluation of subexpressions is generally unspecified; for example, in f(a) + g(b), f might execute before or after g.[34] Sequence points impose ordering guarantees: they occur after the first operand and before the second for &&, ||, and ,; after the condition and before the selected operand for ?:; at the end of full expressions (e.g., after ;) and initializers; after the arguments of a function call are evaluated and before the call itself; and at other points like the end of declarators or before library returns.[29] All side effects from evaluations before a sequence point must complete before those after it, and value computations must not overlap with modifying side effects on the same object.[34]
Side effects arise from operators like assignment, increment, or function calls that modify objects or produce I/O; for instance, i++ computes i's value then increments it as a side effect.[31] Undefined behavior results if a side effect on a scalar object is unsequenced relative to another side effect on the same object (e.g., i = i++ + 1) or relative to the value computation using that object (e.g., f(i++, i)), as the order of modification and access is indeterminate.[34] Such cases can lead to unpredictable results across compilers or optimizations, emphasizing the need for sequence points to enforce reliable ordering.[29]
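The ordering guarantees of &&, the comma operator, and the end of a full expression can be relied on to write well-defined code. The following is a sketch with hypothetical function names, not a catalogue of every sequence point.

```c
#include <stddef.h>

/* && has a sequence point after its left operand and short-circuits,
   so *p is read only after p has been compared against NULL. */
int is_positive(const int *p) {
    return p != NULL && *p > 0;
}

/* Each statement ends a full expression, so the two modifications of i
   are ordered, unlike in the undefined  i = i++ + 1 . */
int increment_twice(int i) {
    i++;
    i = i + 1;
    return i;
}

/* The comma operator evaluates its left operand, then its right one. */
int comma_sequenced(int x) {
    int y = (x += 1, x * 2);
    return y;
}
```

Passing a null pointer to `is_positive` is safe because the dereference is never reached when the left operand of && is 0.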
Control Structures
C provides a set of fundamental control structures to manage the flow of program execution, allowing conditional branching, repetition, and explicit jumps within functions. These mechanisms, defined in the ISO/IEC 9899 standard, emphasize simplicity and efficiency, enabling sequential execution unless altered by conditions or transfers.[29] Statements in C are the basic units of execution, and control structures build upon them to handle decision-making and iteration without introducing higher-level abstractions like exceptions.[29]
Conditionals
Conditional execution in C relies on selection statements: the if statement for simple branching and the switch statement for multi-way decisions based on integer values. The if statement has the syntax if (expression) statement or if (expression) statement else statement, where the controlling expression must have scalar type and is evaluated to determine whether the associated statement executes: it is treated as true if nonzero and false if zero.[29] The else clause pairs with the nearest preceding if without an else, ensuring unambiguous nesting.[29] This design preserves the traditional semantics of C, treating if bodies as blocks to avoid undefined behavior with features like compound literals.[35]
The switch statement, with syntax switch (expression) statement, requires an integer-type expression and transfers control to a matching case label or the default label if present.[29] Case labels use case constant-expression : statement, where each constant-expression is an integer constant expression whose value is unique within the switch, and at most one default : statement is allowed per switch.[29] Execution falls through from one case to the next unless terminated by a break, a deliberate feature that supports efficient multi-case handling but requires explicit control to prevent unintended continuation.[29][35] Integer promotions are applied to the controlling expression, and each case constant is converted to the promoted type before comparison.[29]
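Fall-through can be used deliberately to share code between labels, or it can happen by accident when a break is missing. The sketch below shows both; the function names are hypothetical.

```c
/* Returns 1 for lowercase vowels, 0 otherwise. The first four labels
   have no statements of their own, so control falls through to the
   code shared with 'u'. */
int is_vowel(char c) {
    switch (c) {
        case 'a':
        case 'e':
        case 'i':
        case 'o':
        case 'u':
            return 1;
        default:
            return 0;
    }
}

/* Without break, execution continues into the next case. */
int steps_from(int start) {
    int steps = 0;
    switch (start) {
        case 1: steps++;    /* falls through */
        case 2: steps++;    /* falls through */
        case 3: steps++;
                break;      /* stops here; default is not reached */
        default: steps = -1;
    }
    return steps;
}
```

Entering `steps_from` at case 1 executes three increments, at case 3 only one, and any other value reaches the default label.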
Loops
Iteration in C is achieved through three loop constructs: while, do-while, and for, each testing a scalar-type controlling expression that must evaluate to nonzero for continuation.[29] The while loop, while (expression) statement, evaluates the expression before each iteration, executing the statement only if nonzero; it terminates when the expression is zero.[29] The do-while loop, do statement while (expression);, executes the statement at least once before testing the expression afterward, repeating if nonzero.[29] This post-test design suits cases requiring initial execution regardless of the condition.[35]
The for loop offers compact syntax for initialization, testing, and updating: for (expression_opt ; expression_opt ; expression_opt) statement or, in C99 and later, for (declaration expression_opt ; expression_opt) statement, where the declaration introduces loop-scoped automatic variables.[29] The first expression_opt (or declaration) initializes, the second tests (nonzero to continue), and the third updates after each iteration; omitting parts yields equivalent while behavior, such as for(;;) for an infinite loop.[29][35] Loop bodies create block scopes, and termination follows the same zero/nonzero rule as other iterations, with variable-length arrays affecting storage if declared inside.[29]
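The three loop forms can express the same computation and differ mainly in where the test happens. The following sketch, with hypothetical function names, writes one summation two ways and shows the post-test behavior of do-while.

```c
/* Summation with a for loop and a C99 loop-scoped declaration. */
int sum_for(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* The same summation with while: initialization and update are explicit. */
int sum_while(int n) {
    int total = 0;
    int i = 1;
    while (i <= n) {        /* tested before each iteration */
        total += i;
        i++;
    }
    return total;
}

/* do-while runs its body once even if the condition is false at entry. */
int body_runs_do_while(int n) {
    int runs = 0;
    do {
        runs++;
        n--;
    } while (n > 0);
    return runs;
}
```

For a non-positive argument the for and while versions never execute their bodies, whereas the do-while body still runs once.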
Jump Statements
C includes jump statements for explicit control transfer: goto, continue, break, and return. The goto identifier ; transfers execution to the labeled statement within the same function, where labels (identifier : statement) are unique per function and do not alter normal flow.[29] A forward jump that skips a declaration leaves that object uninitialized, whereas a backward jump past a declaration causes its initializer to run again when the declaration is reached; jumping from outside the scope of a variable-length array into that scope is forbidden.[29][35]
The continue ; skips the rest of the current iteration in the nearest enclosing while, do, or for loop, proceeding to the update and test.[29] Conversely, break ; exits the nearest enclosing loop or switch, terminating its execution immediately.[29] The return expression_opt ; exits the current function, optionally returning the value of expression (converted to the function's return type); no expression is allowed for void functions.[29]
Other Statements and Rules
Expression statements, expression_opt ;, evaluate an optional expression for side effects (discarding its value), while the null statement ; performs no action, often used in empty loop bodies.[29] Compound statements, { block-item-list_opt }, group declarations and statements into blocks, introducing new scopes for local entities.[29] In C99, declarations may intermix with statements within blocks for greater flexibility.[35]
Loops terminate when their controlling expressions evaluate to zero, but infinite loops arise if conditions remain nonzero (e.g., while(1) or for(;;)), requiring manual intervention via jumps or external signals for exit.[29] C lacks built-in exception handling, instead relying on integer error codes returned by functions to propagate errors manually through control flow.[29][35]
For example, a simple conditional loop might use:
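```c
#include <stdio.h>

void demo(int x) {
    if (x > 0) {
        for (int i = 0; i < x; i++) {
            if (i % 2 == 0) continue;   // skip even values
            printf("%d\n", i);
        }
    } else {
        switch (x) {
            case -1: printf("Negative one\n"); break;  // break prevents fall-through
            default: printf("Non-positive\n");
        }
    }
}
```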
This demonstrates branching, iteration with skips, and fall-through prevention.[29]
Functions
In C, functions provide a mechanism for modularizing code by encapsulating reusable blocks of statements that perform specific tasks. A function consists of a declaration, which specifies its interface, and optionally a definition, which provides its implementation. Functions can be invoked from other parts of the program, promoting code reusability and maintainability.[36]
Function declarations, also known as prototypes when they include parameter types, inform the compiler of a function's return type, name, and parameter list for type checking during compilation. The syntax is return_type function_name(parameter_type1 param1, parameter_type2 param2, ...);, where the parameter names are optional in prototypes but required in definitions. For example, int add(int a, int b); declares a function that returns an integer and takes two integer parameters. Prototypes enable the compiler to verify argument types and promote them appropriately at call sites, reducing errors. Without a prototype, the compiler assumes default promotions, which can lead to mismatches.[37][38]
A function definition includes the declaration followed by a compound statement in curly braces containing the executable code, local variable declarations, and control structures. Local variables are declared within the function body and have automatic storage duration, allocated upon entry and deallocated upon exit. For instance:
```c
int add(int x, int y) {
    int sum = x + y; // Local variable
    return sum;
}
```
This definition computes the sum of two integers and returns it. Functions support recursion, where a function calls itself, as long as the recursion depth does not exceed stack limits, allowing solutions to problems like factorial computation or tree traversals.[39][40]
Parameters in C are passed by value, meaning the function receives copies of the arguments, so modifications to parameters do not affect the original values. For arrays, however, the parameter type decays to a pointer to the array's first element, effectively passing the array by reference without copying its contents. Thus, void process(int arr[10]) is equivalent to void process(int *arr), allowing the function to access and modify the original array elements via the pointer. This decay occurs automatically for array arguments in function calls.[41][42]
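The contrast between a copied scalar and an array parameter can be sketched as follows. The function names are hypothetical and serve only to illustrate the rule.

```c
#include <stddef.h>

/* x is a copy; the caller's variable is unaffected. */
void set_to_zero_by_value(int x) {
    x = 0;
    (void)x;    /* silence set-but-unused diagnostics */
}

/* arr is really an int *; writes go to the caller's array. */
void fill(int arr[], size_t n, int value) {
    for (size_t i = 0; i < n; i++) {
        arr[i] = value;
    }
}

/* To let a function change a scalar, pass its address explicitly. */
void set_to_zero_by_pointer(int *x) {
    *x = 0;
}
```

Because the array length is lost when the parameter becomes a pointer, the element count is passed separately as `n`.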
Variadic functions, which accept a variable number of arguments after a fixed set of parameters using the ellipsis ... in the declaration, were standardized in the ANSI C89 standard. The header <stdarg.h> provides va_list, va_start, va_arg, and va_end macros to access the variable arguments. For example, printf is declared as int printf(const char *format, ...);, where the format string guides extraction of subsequent arguments. The fixed parameters must include at least one, serving as the starting point for va_start.[43][44]
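A minimal variadic function built from the <stdarg.h> macros might look like the sketch below. The name `sum_ints` and the convention that the first argument carries the count are assumptions made for illustration; `printf` uses its format string for the same purpose.

```c
#include <stdarg.h>

/* Sums `count` int arguments. Nothing checks that the caller really
   passed that many values, or that they are ints. */
int sum_ints(int count, ...) {
    va_list args;
    int total = 0;

    va_start(args, count);              /* begin after the last fixed parameter */
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);     /* fetch the next argument as int */
    }
    va_end(args);                       /* required before returning */

    return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` reads exactly three arguments after the count; passing fewer than announced is undefined behavior.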
The C99 standard added the inline keyword as a hint to the compiler to inline the function's body at call sites for performance optimization, reducing function call overhead. An inline function definition can be used across translation units if declared appropriately, but the compiler decides whether to inline based on optimization settings. For example, inline int max(int a, int b) { return a > b ? a : b; } suggests inlining without prohibiting a separate out-of-line definition.[20]
Function pointers allow storing the address of a function for indirect invocation, declared as return_type (*pointer_name)(parameter_types);. They enable dynamic selection of functions, such as in callback mechanisms.[45]
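The declaration syntax and a callback-style use can be sketched as follows. All function names are hypothetical.

```c
int add_ints(int a, int b) { return a + b; }
int mul_ints(int a, int b) { return a * b; }

/* op is a pointer to any function taking two ints and returning int. */
int apply(int (*op)(int, int), int a, int b) {
    return op(a, b);        /* equivalent to (*op)(a, b) */
}

/* Folds an array with the given operation, starting from init. */
int reduce(const int *values, int n, int init, int (*op)(int, int)) {
    int acc = init;
    for (int i = 0; i < n; i++) {
        acc = op(acc, values[i]);
    }
    return acc;
}
```

The same `reduce` function computes a sum or a product depending on which function's address is passed, which is the mechanism behind callbacks such as the comparison argument of `qsort`.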
Functions have linkage that determines their visibility across translation units: external linkage by default, making them accessible from other files via declarations; or internal linkage with the static specifier, restricting visibility to the defining file. Static functions prevent naming conflicts and unintended external use. The main function serves as the program's entry point, with signature int main(void) for no arguments or int main(int argc, char *argv[]) for command-line arguments, returning 0 for success or a non-zero value for failure. The runtime environment calls main to start execution.[46]
Pointers, Arrays, and Memory
Pointers in C are variables that store memory addresses, enabling indirect access to objects or functions. A pointer is declared by prefixing the name with an asterisk (*), such as int *p;, which declares p as a pointer to an integer.[47] The address-of operator (&) obtains the address of an object, as in p = &var;, assigning the address of var to p.[47] Dereferencing a pointer with the unary * operator accesses the value at that address, for example *p = 42; modifies the object pointed to by p.[47] A null pointer can be written with the macro NULL from <stddef.h> or, in C23, with the keyword nullptr of type nullptr_t; a null pointer points to no object or function and must not be dereferenced, as doing so results in undefined behavior.[47][48]
Arrays in C provide a means to store a fixed number of elements of the same type contiguously in memory. An array is declared with square brackets specifying the size, such as int arr[5];, which allocates space for five integers.[47] Multi-dimensional arrays are declared with multiple bracket sets, like int matrix[3][4];, representing a 3-by-4 grid stored in row-major order, where elements are laid out contiguously from lower to higher addresses.[47] Arrays can be initialized using brace-enclosed lists, for instance int arr[3] = {1, 2, 3};, which sets the first three elements and zero-initializes any remainder if the array size exceeds the initializer count.[47] For unknown-size arrays, the initializer determines the size, as in int arr[] = {1, 2, 3};.[47]
In most expressions, an array name decays to a pointer to its first element, establishing the equivalence between arrays and pointers; thus, arr in int arr[5]; becomes &arr[0] of type int*.[47] This decay does not occur for sizeof, address-of, or certain initializations. Pointer arithmetic operates on such pointers, allowing increment (p++) or decrement (p--) to advance to the next or previous element, scaled by the size of the pointed-to type.[47] Array indexing is equivalent to pointer arithmetic: arr[i] is the same as *(arr + i), where addition yields a pointer to the i-th element.[47] Pointer arithmetic is defined only within a single array object, including the position one past its last element, which may be computed but not dereferenced; results outside that range are undefined behavior.[47]
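The equivalence between indexing and pointer arithmetic can be sketched as follows. The function names are hypothetical; the `end` pointer refers to the position one past the last element, which may be compared against but not dereferenced.

```c
#include <stddef.h>

/* Walks the array with a pointer instead of an index. */
int sum_range(const int *arr, size_t n) {
    int total = 0;
    for (const int *p = arr, *end = arr + n; p != end; p++) {
        total += *p;
    }
    return total;
}

/* arr[i] and *(arr + i) name the same element. */
int same_element(const int *arr, size_t i) {
    return &arr[i] == arr + i && arr[i] == *(arr + i);
}

/* Subtracting two pointers into the same array yields an element count. */
ptrdiff_t element_distance(const int *from, const int *to) {
    return to - from;
}
```

The difference returned by `element_distance` is measured in elements, not bytes, because pointer arithmetic is scaled by the size of the pointed-to type.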
Dynamic memory allocation allows runtime requests for variable-sized blocks from the heap, managed by functions in <stdlib.h>. The malloc(size_t size) function allocates size bytes of uninitialized memory and returns a void* pointer to it, or NULL on failure.[47] calloc(size_t nmemb, size_t size) allocates space for nmemb objects of size bytes each, initializing them to zero.[47] realloc(void *ptr, size_t size) resizes a previously allocated block at ptr to size bytes, preserving contents up to the smaller of the old and new sizes, and returns a pointer to the resized block or NULL on failure, in which case the original block remains allocated.[47] Memory is deallocated with free(void *ptr), whose argument must be a pointer returned by a prior allocation call that has not already been freed; passing NULL is permitted and does nothing, whereas freeing the same block twice or freeing a pointer that was not allocated is undefined behavior.[47] Allocated pointers are suitably aligned for any object type that fits the size.[47]
C distinguishes storage durations for memory: automatic storage (typically on the stack) for local variables, which is deallocated on scope exit; static storage for globals and static locals, retained throughout execution; and allocated storage (heap) for dynamic blocks, manually managed via allocation functions.[47] The stack grows and shrinks automatically with function calls, while the heap expands as needed but requires explicit deallocation to prevent leaks. Alignment requirements ensure objects start at addresses that are multiples of their alignment value, an implementation-defined integer greater than zero; for example, basic types like int often align to 4 bytes on 32-bit systems.[47] Structures may include padding bytes between members to satisfy the alignment of subsequent elements or the overall structure alignment, which is the maximum of its members' alignments.[47] The _Alignof or alignof (C23) operator queries an object's alignment, and allocation functions guarantee alignment sufficient for the largest basic type.[47]
```c
#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof(int) * 5); // Allocate array on heap
    if (p != NULL) {
        p[0] = 1; // Equivalent to *(p + 0) = 1
        free(p);  // Deallocate
    }
    return 0;
}
```
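Resizing a block with realloc needs some care, because assigning its result directly to the original pointer would lose the block if the call fails. The following is a sketch of one common pattern; `resize_ints` and `zeroed_ints` are hypothetical helpers, and rejecting a count of zero is a design choice made here, not a requirement of the library.

```c
#include <stdlib.h>

/* Resizes an int array to new_count elements.
   Returns 0 on success and updates *arr; on failure returns -1 and
   leaves *arr valid and unchanged, so the caller can still free it. */
int resize_ints(int **arr, size_t new_count) {
    if (new_count == 0 || new_count > (size_t)-1 / sizeof **arr) {
        return -1;                      /* reject zero and size overflow */
    }
    int *tmp = realloc(*arr, new_count * sizeof **arr);
    if (tmp == NULL) {
        return -1;                      /* original block is still allocated */
    }
    *arr = tmp;
    return 0;
}

/* calloc zero-initializes the block, unlike malloc. */
int *zeroed_ints(size_t count) {
    return calloc(count, sizeof(int));
}
```

Elements that existed before a successful resize keep their values, while any newly added elements are uninitialized until written.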
Standard Library
Core Library Features
The C standard library encompasses a set of header files that declare essential functions, types, and macros for general-purpose programming tasks, promoting portability across implementations.[47] Prominent among these are <stdlib.h> for general utilities including conversions, random number generation, and searching/sorting; <string.h> for manipulating character arrays; <time.h> for time representation and measurement; <locale.h> for locale-specific behaviors; and <errno.h> for error indication.[47] These components form the foundation for non-specialized operations, integrating seamlessly with C's function and pointer mechanisms to support efficient program development.[20]
String handling in the C library centers on null-terminated strings, defined as contiguous sequences of characters terminated by and including the first null character \0.[47] The <string.h> header declares functions operating on these strings, treating characters as unsigned char values for byte-wise processing.[47] For instance, strlen computes the number of characters before the null terminator, returning a size_t value; strcpy copies the source string, including the terminator, to a destination array; and strcmp compares two strings, yielding a negative value if the first precedes the second lexicographically, zero if equal, or positive otherwise.[47] These functions enable basic text manipulation without built-in bounds checking, emphasizing the programmer's responsibility for memory safety.[20]
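Since these functions perform no bounds checking, the caller has to compare lengths before copying. The sketch below shows one way to do that; `copy_checked` and `strings_equal` are hypothetical helpers, not library functions.

```c
#include <string.h>

/* Copies src into dst only if it fits, terminator included.
   Returns 0 on success, -1 if dst (capacity cap bytes) is too small. */
int copy_checked(char *dst, size_t cap, const char *src) {
    size_t len = strlen(src);       /* length excludes the '\0' */
    if (len + 1 > cap) {
        return -1;
    }
    strcpy(dst, src);               /* safe: length was checked above */
    return 0;
}

/* strcmp returns a negative, zero, or positive value; only the sign
   is meaningful, so equality is tested by comparing with 0. */
int strings_equal(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}
```

A six-byte buffer holds "hello" exactly, because five characters plus the terminator fill it; anything longer is rejected rather than written past the end.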
Utility functions in <stdlib.h> support diverse tasks such as data conversion, randomization, and array processing.[47] The atoi function converts the initial numeric portion of a string to an int, equivalent to using strtol with base 10 and ignoring trailing non-digits.[47] For randomness, rand generates pseudo-random integers in the range 0 to RAND_MAX (at least 32767), seeding via srand for reproducibility.[47] Sorting and searching capabilities include qsort, which arranges an array of objects in ascending order using a user-provided comparison function, and bsearch, which locates a key in a sorted array via binary search, returning a pointer to the match or NULL if absent.[47] These tools, along with types like size_t and macros such as NULL, facilitate algorithmic implementations with minimal overhead.[20]
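Both qsort and bsearch take a comparison callback that receives pointers to the elements. The following sketch sorts and searches an int array; the helper names are hypothetical.

```c
#include <stdlib.h>

/* Comparison callback for ints. Written without a - b, which could
   overflow for operands of opposite sign. */
static int compare_ints(const void *lhs, const void *rhs) {
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);
}

/* Sorts values in ascending order in place. */
void sort_ints(int *values, size_t n) {
    qsort(values, n, sizeof values[0], compare_ints);
}

/* Returns 1 if key occurs in the sorted array, 0 otherwise. */
int contains_sorted(const int *values, size_t n, int key) {
    return bsearch(&key, values, n, sizeof values[0], compare_ints) != NULL;
}
```

The array must already be sorted with the same ordering before `contains_sorted` is called, since binary search relies on it.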
Time and date support is provided by <time.h>, which defines arithmetic types for temporal representations.[47] The time_t type represents calendar times, while clock_t measures processor time in clock ticks.[47] The clock function returns an approximation of the processor time consumed by the program since invocation, measured in clock ticks, or (clock_t)-1 if the value cannot be computed; dividing the result by CLOCKS_PER_SEC gives seconds.[47] Additional structures like struct tm decompose times into components (e.g., year, month, day), supporting conversions between broken-down and calendar formats for date arithmetic.[20]
Error handling mechanisms ensure functions can signal failures portably.[47] The <errno.h> header declares errno as a modifiable lvalue of type int, initialized to zero at program startup and set to a positive error code by library functions upon detecting conditions like domain errors or range overflows.[47] Macros such as EDOM, ERANGE, and EILSEQ define standard error values.[47] The perror function maps the current errno value to an error message, optionally prefixed by a user-supplied string, aiding in diagnostic reporting.[47]
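A typical use of errno is to detect range errors from strtol, which, unlike atoi, reports them. The sketch below is one possible wrapper; `parse_long` is a hypothetical name, and treating trailing characters as an error is a choice made for this example.

```c
#include <errno.h>
#include <stdlib.h>

/* Parses a base-10 long from s. Returns 0 on success, -1 if s has no
   digits, has trailing characters, or is out of range for long. */
int parse_long(const char *s, long *out) {
    char *end;

    errno = 0;                      /* library functions never reset errno to 0 */
    long value = strtol(s, &end, 10);

    if (end == s || *end != '\0') {
        return -1;                  /* no conversion, or trailing characters */
    }
    if (errno == ERANGE) {
        return -1;                  /* result was clamped to LONG_MIN or LONG_MAX */
    }
    *out = value;
    return 0;
}
```

Clearing errno before the call matters because a leftover value from an earlier failure would otherwise be mistaken for a new error.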
Locale support in <locale.h> enables adaptation to cultural conventions for data formatting and collation.[47] The setlocale function configures categories of the program's locale (e.g., numeric formatting, collation) based on a locale specification string, returning a pointer to the current locale name or NULL on failure.[47] It influences behaviors like decimal point characters and string comparisons in functions such as strcoll.[20] The localeconv function populates a struct lconv with locale-dependent values for monetary and numeric formatting, such as thousands separators and currency symbols, returned as a pointer to the structure.[47] This framework supports internationalization by allowing locale-aware adjustments without altering core language semantics.[20]
Input/Output and File Handling
C's input/output facilities are primarily defined in the <stdio.h> header, which provides functions for handling streams, including console input/output and file operations. These facilities abstract the underlying hardware and operating system details, allowing portable data transfer between programs and external devices or files. Streams are represented by the FILE type, which encapsulates buffering, positioning, and error status for I/O operations.[49]
The language defines three standard streams: stdin for input, stdout for normal output, and stderr for error output. These are predefined FILE pointers available upon program startup, typically associated with the console or terminal. The printf and scanf families operate on these streams by default; printf formats and outputs data to stdout, while scanf reads formatted input from stdin. For instance, printf("Value: %d\n", x); outputs the integer x using the %d specifier for decimal integers. Similarly, scanf("%d", &x); parses input as an integer. These functions support a variety of format specifiers, such as %s for strings (null-terminated character arrays) and %f for floating-point numbers, enabling compact and readable I/O, although the language does not require the compiler to check that the arguments match the format string.[49]
For file handling, programs use fopen to open a file and obtain a FILE pointer, specifying a filename and mode string (e.g., "r" for read, "w" for write). The returned pointer allows subsequent operations like fread to read binary data into a buffer or fwrite to write from a buffer to the file. fclose releases the file and flushes any buffered data. Files can be opened in text mode (default on hosted environments), where newline characters (\n) are translated to platform-specific line endings, or binary mode (appended with "b", e.g., "rb"), which treats data as raw bytes without translation. This distinction ensures portability across systems with differing text representations.[50][49]
Formatted file I/O extends console functions with fprintf and fscanf, which take a FILE pointer as their first argument, before the format string. For example, fprintf(fp, "%s: %d\n", name, value); writes to file fp using string and integer specifiers, while fscanf(fp, "%s %d", str, &num); reads matching input. These functions parse or generate output according to the specifiers, handling conversions like %d for signed integers and %s for strings (on output up to the null terminator, on input up to the next whitespace or an explicit width limit).
Buffering improves I/O efficiency by temporarily storing data in memory before transfer. Streams are fully buffered (block-based), line-buffered (flushed on newline, as is typical for interactive streams like stdout), or unbuffered. The setbuf function assigns an explicit buffer or disables buffering (by passing NULL); it must be called after the stream is opened and before any other operation is performed on it. For finer control, setvbuf specifies mode (_IOFBF for full, _IOLBF for line, _IONBF for none) and buffer size.[49]
Error handling relies on stream flags for end-of-file (EOF) and errors. feof returns nonzero if the EOF indicator is set (after a read attempt beyond file end), ferror checks for I/O errors (e.g., disk full), and both query without altering state. clearerr resets both indicators, enabling reuse after recovery. These functions, along with errno for detailed errors, facilitate robust file operations. For example:
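```c
#include <stdio.h>

int main(void) {
    FILE *fp = fopen("data.txt", "r");
    if (fp == NULL) {
        perror("fopen");            // Handle open failure
        return 1;                   // fp must not be used after a failed open
    }

    char buf[100];
    size_t n;
    // fread returns the number of items read; 0 means EOF or an error
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        // Process the n bytes now in buf
    }

    if (ferror(fp)) {
        perror("fread");            // Handle read error
        clearerr(fp);
    } else if (feof(fp)) {
        // Normal end
    }

    fclose(fp);
    return 0;
}
```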
This approach detects issues without assuming success.[49]
The C11 standard introduced optional bounds-checking interfaces in Annex K, including fopen_s, which opens files with enhanced security checks (e.g., validating pointers and modes) and returns an errno_t error code instead of NULL on failure. This mitigates buffer overflows and invalid accesses, though adoption is limited to supporting implementations like Microsoft Visual C++. fopen_s updates the FILE pointer via an output parameter, differing from fopen's direct return.[49][51]
Mathematical and Utility Functions
The <math.h> header in the C standard library provides a suite of functions for common mathematical operations on floating-point numbers, enabling computations such as trigonometric, exponential, and power-related tasks. These functions primarily operate on double arguments and return double results. Because C has no function overloading, separately named variants (suffixed f for float and l for long double, as in sinf and sinl) cover the other precision levels, and the type-generic macros of <tgmath.h> (C99) select among them by argument type. Representative examples include sin(x) to compute the sine of x in radians, cos(x) for the cosine, pow(x, y) to raise x to the power y, and sqrt(x) for the non-negative square root of x.[52] All such functions follow the requirements of the ISO/IEC 9899 standard; domain and range errors are reported by setting errno, raising floating-point exception flags, or both, as indicated by the math_errhandling macro.[31]
The <math.h> header may also define symbolic constants for mathematical values, such as M_PI approximating π (approximately 3.141592653589793), though these are not mandated by the ISO C standard and remain implementation-defined; they are standardized as extensions in POSIX environments.[53] For instance, to use M_PI in a computation:
#include <math.h>
double circumference = 2.0 * M_PI * radius;
These constants facilitate precise numerical work without hardcoding approximate values, but programmers must verify availability in their target implementation.
Introduced in the C99 revision of the ISO C standard, the <complex.h> header supports complex number arithmetic through built-in types like double _Complex (aliased as double complex for convenience) and corresponding functions for operations on these types. Key functions include csqrt(z) to compute the principal square root of the complex number z and cexp(z) for the exponential of z, both returning complex results that handle real and imaginary components appropriately.[54] These facilities extend real-valued math to the complex plane, useful for signal processing and scientific simulations. An example usage is:
#include <complex.h>
double complex z = 1.0 + 2.0 * I;
double complex result = csqrt(z); // Computes sqrt(1 + 2i)
The standard library includes facilities for pseudo-random number generation to support simulations and testing. In <stdlib.h>, the rand(void) function generates a pseudo-random integer in the range 0 to RAND_MAX (at least 32767), while srand(seed) initializes the generator with a given seed for reproducibility. For double-precision uniform random numbers in [0, 1), the POSIX extension drand48(void) provides a 48-bit linear congruential generator, seeded via srand48(seed).[55] These are used with floating-point types like double to produce sequences suitable for Monte Carlo methods, though quality varies by implementation and rand is not cryptographically secure.
The <signal.h> header enables basic signal handling for asynchronous events, such as interrupts or errors. The signal(sig, handler) function installs a handler for signal sig (e.g., SIGINT for interrupt), specifying either a custom function, SIG_DFL for default action, or SIG_IGN to ignore the signal.[56] Complementing this, raise(sig) generates the signal sig synchronously within the current process, invoking the handler if set. This mechanism allows programs to respond to conditions like user interrupts gracefully, as in:
#include <signal.h>
#include <stdio.h>
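void handler(int sig) {
    // printf is acceptable here only because raise() delivers the signal
    // synchronously; handlers for asynchronous signals should avoid I/O
    printf("Signal %d received\n", sig);
}
int main(void) {
    signal(SIGINT, handler);
    raise(SIGINT); // Triggers the handler
    return 0;
}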
In the C23 standard (ISO/IEC 9899:2024), the new <stdckdint.h> header introduces checked integer arithmetic to detect overflow during operations on integer types, promoting safer numerical computations. Its type-generic macros, such as ckd_add(&r, a, b) for checked addition, compute the mathematically exact result of the operation, store it in r (wrapped to r's type if it does not fit), and return false if the stored value is exact or true if overflow occurred (a case that would otherwise be undefined behavior for signed integers). Corresponding macros exist for subtraction (ckd_sub) and multiplication (ckd_mul), and all three accept standard integer types such as int and long. This addition addresses long-standing concerns in integer-heavy code, such as in embedded systems. For example:
#include <limits.h>   // for INT_MAX
#include <stdckdint.h>
int a = INT_MAX;
int b = 1;
int result;
if (!ckd_add(&result, a, b)) {
// Use result safely
} else {
// Handle overflow
}
C23 also introduces <stdbit.h>, providing utility functions for bit manipulation, including stdc_count_ones(x) to count the number of 1 bits in an unsigned integer x (a population count), stdc_leading_zeros(x) for the number of leading zero bits, and stdc_has_single_bit(x) to test whether x is a power of two. Each operation is available as a type-generic macro and as type-specific functions with suffixes such as _uc, _us, _ui, _ul, and _ull, aiding low-level optimizations and algorithmic implementations.[57]
The <fenv.h> header, added in C99, provides access to the floating-point environment for fine-grained control over arithmetic behavior, including exception status flags (e.g., for overflow or invalid operations) and control modes like rounding direction. Functions such as fegetenv(envp) save the current environment to fenv_t *envp, while fesetenv(envp) restores it, allowing temporary modifications without affecting global state.[58] Macros like FE_DIVBYZERO and FE_TONEAREST facilitate querying and setting these modes, essential for high-precision numerical algorithms compliant with IEC 60559 (IEEE 754). These tools are used with C's floating-point types to ensure deterministic results in portable code, such as saving and restoring the environment around non-standard operations.
Implementations and Extensions
Compilers and Build Tools
The GNU Compiler Collection (GCC) is an open-source compiler suite that serves as a cornerstone for C development across diverse platforms. Developed by the GNU Project, GCC supports compilation of C code conforming to ISO standards up to C23, enabling features like improved type genericity and attributes for better code diagnostics. As of GCC 15 (2025), C23 mode is the default.[59] It offers extensive optimization options, cross-compilation capabilities for numerous architectures, and integration with GNU tools for seamless development workflows.[60]
Clang, part of the LLVM project, provides a modern alternative to GCC with emphasis on fast compilation times and detailed diagnostics. As a production-quality frontend for C, it supports the C23 standard via the -std=c23 flag, with partial implementation of its features as of Clang 20 (2025), including enhancements for Unicode handling and fixed-width integer types.[61] Clang's modular design facilitates integration with IDEs like Xcode and Visual Studio, and its intermediate representation enables advanced optimizations shared with other LLVM-based tools.[62]
Microsoft Visual C++ (MSVC), integrated into Visual Studio, is optimized for Windows environments and includes proprietary extensions for enhanced performance in Microsoft ecosystems. It conforms to the required features of C11 and C17 (including C11 additions such as _Static_assert and anonymous unions), with partial support for C23 features as of Visual Studio 2022 version 17.14, while prioritizing compatibility with Windows APIs.[63] MSVC offers link-time code generation and security mitigations, making it suitable for enterprise applications requiring robust debugging and profiling.[64]
Build systems automate the compilation process by managing dependencies and generating executables from source files. GNU Make, a standard tool in Unix-like environments, uses Makefiles to define rules for rebuilding only modified components, supporting parallel execution via the -j flag for efficient large-scale projects.[65] CMake, a cross-platform meta-build system, generates native build files (e.g., Makefiles or Visual Studio solutions) from a platform-independent CMakeLists.txt script, facilitating out-of-source builds and dependency resolution for multi-language projects.[66]
Debuggers enable runtime inspection and control of C programs. The GNU Debugger (GDB) allows setting breakpoints, examining memory, and stepping through code execution, with support for remote debugging over networks and multi-threaded applications.[67] LLDB, LLVM's debugger, offers similar capabilities with faster performance and native integration for C on macOS and iOS, including expression evaluation using Clang's parser for accurate variable inspection.[68]
Static analysis tools detect potential bugs and vulnerabilities without executing code. Coverity, developed by Synopsys (now under Black Duck), performs scalable static application security testing (SAST) on C codebases, identifying issues like buffer overflows and resource leaks through deep interprocedural analysis.[69] The Clang Static Analyzer, integrated into the Clang toolchain, uses symbolic execution to uncover defects such as null pointer dereferences and use-after-free errors in C programs, often invoked via scan-build.[70]
Dialects and Embedded Variants
C dialects and embedded variants extend or restrict the standard language to meet specific needs, such as safety in critical systems or compatibility with proprietary environments. These variations often arise in domains like automotive software, operating systems, and resource-limited devices, where standard C may introduce risks or lack necessary features.[71][72]
MISRA C provides guidelines for using C in safety-critical embedded systems, particularly in automotive and real-time applications, by defining a restricted subset that avoids undefined behaviors and promotes reliability. The latest edition, MISRA C:2023, includes 200 rules and 22 directives, totaling 222 guidelines, emphasizing decidable constructs and minimizing dynamic memory allocation to reduce faults in environments like vehicle control units. MISRA C is widely used to meet the coding-guideline expectations of functional-safety standards such as ISO 26262 for road vehicles.[73]
Compiler-specific extensions introduce non-standard features to C for platform optimization or integration. Microsoft's Visual C++ adds keywords like __declspec for attributes such as DLL export/import (dllexport, dllimport) and thread-local storage (thread), as well as __try and __except for structured exception handling. Similarly, GNU C extensions in GCC include __attribute__ for function and variable annotations, such as specifying alignment or deprecation, which allow fine-tuned optimizations and diagnostics. Code that uses either mechanism is tied to the compilers that implement it.[72][74]
The C standard distinguishes between hosted and freestanding implementations to accommodate diverse environments. In a hosted implementation, the full standard library is available, supporting general-purpose programming with features like file I/O and dynamic allocation. Freestanding implementations, common in embedded systems, require only a minimal set of headers (e.g., <float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>) and lack the broader library, suiting bare-metal or kernel development where no operating system intervenes.[75]
CERT C secure coding standards, developed by Carnegie Mellon University's Software Engineering Institute, offer rules to mitigate vulnerabilities in C programs, focusing on issues like buffer overflows and integer errors. The standard comprises over 100 rules and recommendations, prioritized by risk, and is aligned with the C11 standard while promoting practices like bounds checking and secure string handling for high-assurance software. Adoption of CERT C has been linked to reduced defect rates in critical infrastructure code.[76]
The C11 standard's Annex K provides optional bounds-checking functions such as strcpy_s and strcat_s. An implementation that supports them defines the __STDC_LIB_EXT1__ macro, and in practice they appear only in some hosted implementations; freestanding implementations, in C23 and earlier, are required to provide only the minimal header set and need not supply these functions. Embedded code therefore cannot rely on Annex K for bounded string operations.[23][77]
Historically, the ISO/IEC JTC1/SC22/WG14 committee addressed limitations of standard C for microcontrollers and digital signal processors in the early 2000s through the Embedded C technical report (ISO/IEC TR 18037), which describes extensions such as fixed-point arithmetic, named address spaces, and basic hardware I/O addressing. These extensions were not incorporated into the core language standard.
Applications
Systems and Kernel Programming
C's low-level capabilities make it particularly suited for systems and kernel programming, where direct interaction with hardware and operating system primitives is essential. The language enables developers to access hardware registers and manage system resources through pointers, which allow precise manipulation of memory addresses, and inline assembly, which embeds machine-specific instructions directly into C code for tasks like context switching or device initialization. For instance, pointers can be used to read or write to memory-mapped I/O ports, providing a portable abstraction over raw hardware addresses without the need for full assembly code.[78][79]
A seminal example of C's application in kernel development is the rewriting of the Unix kernel in 1973, which marked a pivotal shift from assembly language to a higher-level yet efficient language for operating systems. Developed by Dennis Ritchie at Bell Labs, this rewrite for the PDP-11 minicomputer demonstrated C's viability for core OS components, allowing the kernel to handle processes, file systems, and device drivers while maintaining performance comparable to assembly. This innovation influenced subsequent Unix-like systems, including Linux and BSD variants, where the kernel is predominantly written in C to leverage its balance of abstraction and control. The Linux kernel, initiated by Linus Torvalds in 1991 as a Unix-compatible system, inherits this tradition, with its core modules implemented in C for cross-architecture compatibility. Similarly, BSD kernels, such as those in FreeBSD, rely on C for their monolithic structure, enabling modular driver development and system calls.[1][80][81]
In kernel environments, memory management eschews standard library functions like malloc to avoid dependencies on user-space abstractions and potential vulnerabilities; instead, custom allocators such as slab allocators or buddy systems are implemented in C to handle kernel memory pools efficiently. These mechanisms allocate fixed-size blocks for objects like page tables or process descriptors, ensuring deterministic behavior and minimizing fragmentation in resource-constrained settings. For example, the Linux kernel uses kmalloc for small allocations and vmalloc for larger, virtually contiguous regions, all coded in C to integrate seamlessly with the hardware's memory management unit.[82]
System calls and interrupts are handled in C kernels through a combination of C functions and inline assembly for low-level entry points, allowing user-space requests to transition securely to kernel mode. Interrupts from hardware devices trigger C-based interrupt service routines (ISRs) that acknowledge the event and schedule deferred processing, while system calls invoke kernel functions via a software interrupt or dedicated instruction like syscall on x86-64. This approach ensures responsive handling of I/O events and resource requests without excessive overhead.[83][84]
The rationale for adopting C in kernel programming centers on its superior performance and portability compared to pure assembly. Assembly, while offering ultimate control, ties code to specific architectures, complicating maintenance and porting; C provides near-equivalent efficiency through optimized compilers while abstracting machine details, as evidenced by Unix's successful port to diverse hardware post-1973 rewrite. This portability enabled widespread adoption, reducing development time for new platforms without sacrificing the speed critical for kernel operations. Modern examples include the Windows NT kernel, where core components like the executive and device drivers are written primarily in C, augmented by assembly only for architecture-specific optimizations, supporting Microsoft's multi-platform strategy.[1][85]
Embedded and Real-Time Systems
C is widely adopted for programming microcontrollers in embedded systems due to its efficiency and low-level hardware access, particularly on platforms like Arduino boards and ARM Cortex-M series processors. Arduino utilizes a subset of C/C++ through its integrated development environment, allowing developers to write code that directly controls peripherals such as digital pins, analog inputs, and serial communication on microcontrollers like the ATmega328P.[86] Similarly, ARM Cortex-M microcontrollers, prevalent in industrial and consumer embedded devices, leverage standard C with vendor-specific libraries to manage CPU operations, timers, and GPIO interfaces, enabling compact firmware for applications ranging from sensors to motor controls.[87]
In real-time environments, C supports interrupt service routines (ISRs) essential for handling time-sensitive events, where the volatile keyword prevents compiler optimizations that could lead to stale data reads or writes in shared variables. This qualifier is particularly vital when ISRs interact with main program loops, ensuring that memory locations affected by hardware interrupts or asynchronous events are consistently accessed without caching assumptions.[88] Real-time operating systems such as FreeRTOS further extend C's capabilities by providing APIs for task creation, queuing, and semaphores, all implemented in portable C code that integrates seamlessly with microcontroller hardware abstraction layers.[89]
Embedded C applications often operate under severe memory constraints, favoring static allocation via global or static variables to guarantee deterministic behavior and avoid the fragmentation risks of dynamic heap allocation with functions like malloc and free. This approach aligns with the fixed resources of microcontrollers, where the sizes of the data segments and the stack region are fixed at build time, so memory use can be budgeted in advance rather than discovered as a runtime failure.[90] Cross-compilation tools facilitate deployment to specific targets; for instance, GCC variants target AVR microcontrollers used in Arduino, while Microchip's MPLAB XC8 compiler optimizes C code for PIC devices, generating efficient assembly for resource-limited hardware.[91]
Safety standards like MISRA C enforce subsets of the language to enhance reliability in critical domains, restricting error-prone constructs such as pointer arithmetic and prohibiting recursion to reduce defects in avionics software certified under DO-178C. In automotive systems, MISRA-compliant C integrates with AUTOSAR architectures, where basic software modules adhere to these guidelines for deterministic execution in ECUs handling engine control and advanced driver assistance.[92][93]
Scientific Computing and Games
C's efficiency in handling computationally intensive tasks has made it a cornerstone for scientific computing, particularly in domains requiring high-performance numerical operations. Libraries such as BLAS (Basic Linear Algebra Subprograms) provide optimized routines for vector and matrix operations, with C interfaces like CBLAS enabling seamless integration into C programs for tasks like matrix multiplication and linear systems solving. Similarly, LAPACK (Linear Algebra Package) builds on BLAS to offer advanced algorithms for eigenvalue problems and singular value decomposition, accessible via the LAPACKE C interface, which is widely used in simulations where precision and speed are paramount. These libraries are foundational in high-performance computing environments, allowing developers to leverage low-level optimizations without reinventing core mathematical primitives.[94]
For Fourier transforms and signal processing in scientific applications, the FFTW library stands out as a comprehensive C-based implementation that supports one- and multi-dimensional transforms for real and complex data, achieving near-optimal performance through sophisticated planning algorithms and SIMD exploitation. In numerical simulations, C facilitates computational fluid dynamics (CFD) codes that solve partial differential equations via finite volume or finite element methods, emphasizing tight loops for iterative solvers to minimize overhead. Physics engines like Bullet, while primarily in C++, expose a C API for integration, enabling real-time rigid body dynamics and collision detection in simulations that demand low-latency computations. These applications highlight C's role in optimizing for computational intensity, where developers employ techniques such as cache-friendly data layouts (through blocking to enhance locality) and SIMD intrinsics to vectorize operations, potentially yielding 4x to 8x speedups in array-heavy workloads.[95][96]
In financial modeling, C is employed for Monte Carlo simulations and option pricing models, capitalizing on its speed for risk assessments involving stochastic differential equations. To bridge C's performance with higher-level scripting, bindings like NumPy's C backend allow Python users to invoke optimized C routines for array operations and linear algebra, powering much of scientific Python's numerical capabilities.[97][98]
In game development, C's direct hardware control has influenced engines prioritizing performance, such as the early id Tech engines used in Doom, which relied on C and assembly for rendering and physics to achieve real-time 3D graphics on 1990s hardware. Later engines like Unreal trace their roots to C-style programming paradigms, evolving into C++ but retaining C's emphasis on efficient memory management and low-level optimizations for graphics pipelines. These uses underscore C's enduring value in domains where every cycle counts, from scientific discovery to immersive entertainment.[99]
Influence on Other Languages
C's syntax has profoundly shaped the design of subsequent programming languages, providing a familiar foundation for developers transitioning from low-level to higher-level paradigms. C++ extends C directly, retaining its core syntax while adding object-oriented features, as its creator Bjarne Stroustrup emphasized the need for compatibility with existing C codebases to ensure widespread adoption.[100] Similarly, Java adopted a C-like syntax to appeal to C and C++ programmers, with James Gosling prioritizing familiarity in control structures, declarations, and operators during its development at Sun Microsystems.[101] Go draws from the C family for its basic syntax, including curly braces and semicolon usage, to simplify learning for systems programmers while introducing modern concurrency primitives.[102] Rust also borrows heavily from C's syntax for expressions and statements, facilitating interoperability with C libraries, though it diverges in memory management to enhance safety.[103] Zig maintains a C-like syntax as a modern systems programming language designed to improve upon C, emphasizing seamless compatibility with C libraries and low-level control.[104]
Beyond direct syntactic borrowing, C is a principal source language for modern compiler infrastructures built around intermediate representations. The LLVM project, which underpins compilers like Clang for C, generates LLVM Intermediate Representation (IR) from C source code, enabling optimizations across diverse hardware platforms before final code generation.[105] Tools like LLJVM extend this capability to produce JVM bytecode from C, allowing C code to run on the Java Virtual Machine, though such approaches remain specialized due to semantic mismatches between C's low-level features and the JVM's object-oriented model.
C's role extends to implementing runtimes for higher-level languages, leveraging its efficiency and portability. The CPython interpreter, the reference implementation of Python, is written primarily in C to handle core execution, memory management, and extension modules efficiently.[106] Likewise, Matz's Ruby Interpreter (MRI), the standard Ruby implementation, is written in C, providing the foundational virtual machine for Ruby's dynamic features.
In web ecosystems, C serves as a source language for cross-platform deployment. Emscripten compiles C and C++ code to WebAssembly and JavaScript, enabling legacy C libraries to run in browsers with near-native performance.[107] This pattern continues in ongoing WebAssembly development, where Clang directly compiles C to WebAssembly modules, supporting portable execution in web environments without traditional JavaScript intermediaries.[108]
Historically, C was prevalent for Common Gateway Interface (CGI) scripting in early web servers during the 1990s, generating dynamic HTML from C programs invoked by HTTP requests, but its use declined post-2000s in favor of interpreted languages like Perl and PHP due to process overhead and security concerns.[109][110]
Limitations and Criticisms
Security and Safety Issues
One of the primary security vulnerabilities in C programs stems from buffer overflows, which occur when data exceeds the allocated buffer size due to the language's absence of automatic bounds checking for arrays and strings. This allows attackers to overwrite adjacent memory, potentially leading to arbitrary code execution or data corruption. For instance, the standard library function strcpy copies strings without verifying destination buffer capacity, enabling overflows if the source string is longer than the target.[111] A historical example is the 1988 Morris worm, which exploited a buffer overflow in the fingerd daemon on UNIX systems by sending a 536-byte string via the unchecked gets function, overwriting the stack to execute a shell command and propagate the worm across approximately 6,000 machines, or 10% of the internet at the time.[112]
Memory leaks represent another safety issue, arising when dynamically allocated memory via functions like malloc is not freed with free, causing gradual resource exhaustion in long-running programs. This can lead to denial-of-service conditions as available memory diminishes, though it is less directly exploitable than overflows. In embedded or server applications, unfreed allocations in loops exacerbate the problem, potentially crashing systems under sustained load.[113]
C's undefined behavior further compounds risks, such as signed integer overflow, where exceeding an integer's representable range (e.g., INT_MAX + 1) invokes no guaranteed outcome, allowing compilers to optimize away checks and enable exploits like unintended buffer allocations. Similarly, dereferencing a null pointer triggers undefined behavior, often resulting in program crashes but potentially enabling code injection if memory at address 0 is writable, as seen in vulnerabilities like CVE-2009-2692 in the Linux kernel.[114][115]
The language's type unsafety, characterized by weak typing and no built-in bounds enforcement, permits implicit conversions and unchecked accesses that propagate errors, such as casting pointers without validation leading to invalid memory operations. This design choice prioritizes performance over safety, making type-related bugs common in low-level code.[116]
To mitigate these issues, developers employ static analysis tools that scan source code for potential vulnerabilities without execution, enforcing standards like SEI CERT C rules to detect overflows and leaks early. Runtime tools like AddressSanitizer, integrated into compilers such as Clang, instrument code to identify buffer overflows, use-after-free errors, and other memory issues at runtime with detailed stack traces, imposing about a 2x slowdown. For concurrency-related races, C11's atomic operations in <stdatomic.h> ensure thread-safe access to shared variables, preventing data races by guaranteeing indivisible updates.[117][118][119]
Portability and Maintenance Challenges
One significant portability challenge in C arises from platform-specific differences in data representation, particularly endianness and integer sizes. Endianness determines the byte order of multi-byte data types, with big-endian systems storing the most significant byte first and little-endian systems the least significant byte first; since the C standard does not mandate a specific endianness, code that assumes one order, such as when serializing data for network transmission, may fail across platforms without explicit handling like byte-swapping functions. Similarly, the sizes of fundamental types like int and long vary: while int is typically 32 bits on modern systems, the standard permits as few as 16 bits, and long can differ between 32 and 64 bits depending on the architecture, leading to overflows or incorrect assumptions in portable code unless fixed-width types from <stdint.h> (introduced in C99), such as int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, and uint64_t, are used.[120][121]
Compiler extensions exacerbate non-portability by allowing vendor-specific features that deviate from the ISO C standard, such as GCC's __attribute__ for function attributes or Microsoft's non-standard keywords like __declspec. These extensions enable optimizations or platform-specific behaviors but result in code that compiles only on the extending compiler, complicating cross-compiler portability; for instance, using GCC's zero-length arrays as flexible array members before C99 standardization can break on other compilers.[122][123] To mitigate this, developers must avoid extensions or isolate them with conditional compilation, though widespread adoption in legacy code often hinders full portability.[124]
The C preprocessor introduces pitfalls through macro misuse and abuse, which can obscure code intent and undermine portability. Macros lack type safety and scoping, leading to unintended substitutions; for example, defining #define SQUARE(x) x * x fails for SQUARE(a++) due to multiple evaluations, causing incorrect results or undefined behavior.[125] Preprocessor abuse, such as over-reliance on #define for configuration, often results in "macro hell" with deeply nested expansions that are hard to debug and non-portable across compilers with differing preprocessing behaviors.[126] Additionally, excessive use of #ifdef for portability creates proliferating, unmaintainable code branches that grow with each new platform, as seen in early UNIX porting efforts where such directives led to fragmented implementations.[127]
C has no built-in module system, and no revision through C23 adds one; it relies instead on header files, which cause namespace pollution by exposing internal symbols and require recompilation of every dependent file when they change. Header pollution occurs when including a header inadvertently pulls in unrelated declarations, increasing coupling and build times; for example, system headers like <stdio.h> may define macros that conflict with user code, forcing workarounds like include guards or forward declarations.[128] In the absence of modules, information hiding is achieved through opaque interfaces built on incomplete struct types (e.g., struct foo; typedef struct foo *foo_t;), which hide implementations but complicate maintenance by scattering definitions across translation units.[129]
Maintenance challenges in C stem from verbose error handling and the lack of native generics, necessitating error-prone workarounds. Error handling typically involves checking return values manually, such as verifying that malloc returns non-NULL, which clutters code with repetitive if-statements and increases the risk of overlooked failures in large programs.[130] Without generics, reusable data structures like queues require void* pointers for type erasure, sacrificing type safety; for instance, a generic list might store void* elements, requiring casts that can lead to runtime errors if types mismatch, the same trade-off made by the comparison callbacks of the standard library's qsort and bsearch.[131][132] These approaches demand diligent documentation and testing to maintain correctness as the code evolves.
Best practices for addressing these issues include confining conditional compilation to small, isolated sections of platform-specific code and using abstract types to encapsulate implementations. For example, #ifdef __linux__ can select Linux-specific APIs while providing fallbacks, reducing non-portable sections when combined with tools such as Autoconf for feature detection.[133] Abstract types, via opaque pointers, promote information hiding, as in library designs where users interact only through handles without accessing internals, enhancing modularity.[129] The ISO C standards also aid portability: C99 standardized fixed-width integer types in <stdint.h> and type-generic math macros in <tgmath.h>, and C11 added _Generic selection, though adoption varies.[23]
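These practices can be combined in a short sketch. PLATFORM_NAME, platform_name, and TYPE_NAME are hypothetical names made up for this example; the sketch assumes a C11 or later compiler, and relies only on the commonly predefined macros __linux__ and _WIN32.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Keep the platform check in one place and always provide a fallback. */
#if defined(__linux__)
#define PLATFORM_NAME "linux"
#elif defined(_WIN32)
#define PLATFORM_NAME "windows"
#else
#define PLATFORM_NAME "unknown"
#endif

/* The rest of the program calls this function instead of testing
   platform macros itself. */
static const char *platform_name(void)
{
    return PLATFORM_NAME;
}

/* C11 _Generic selects an expression based on the argument's type. */
#define TYPE_NAME(x) _Generic((x), \
    int32_t: "int32_t",            \
    int64_t: "int64_t",            \
    double:  "double",             \
    default: "other")
```

With this layout, supporting a new platform means touching a single #if chain rather than adding branches throughout the code base, and int32_t has the same width wherever it is available.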
Evolution and Future Directions
The evolution of the C programming language proceeds at a measured pace, guided by the ISO/IEC JTC1/SC22/WG14 committee, which emphasizes backward compatibility to preserve the integrity of decades-old codebases across diverse systems. This committee-driven process ensures that revisions introduce minimal disruptions, with the most recent major update, C23 (ISO/IEC 9899:2024), focusing on refinements like bit-precise integers and improved attributes rather than radical overhauls. As a result, C's development cycle spans years, allowing for thorough vetting but limiting rapid adaptation to modern paradigms.[134]
Key gaps persist in C's design, notably the lack of native support for dynamic strings and standard containers, compelling developers to depend on third-party libraries, platform extensions such as those in the GNU C Library, or custom implementations for everyday data management tasks. These omissions stem from C's foundational philosophy of minimalism and portability, which avoids bloating the core language in order to preserve performance and simplicity. While additions like the C23 nullptr keyword and [[deprecated]] attribute address some usability issues, broader structural enhancements remain deferred to future standards.[135]
As of 2025, WG14 is advancing proposals for the next revision, informally termed C2y and anticipated later in the decade, with a focus on enhancing expressiveness and concurrency. Notable submissions include standard prefixed attributes for more flexible metadata (N3661), thread attributes to enable customizable thread creation with ABI-resistant extensibility (N3690), and utilities like countof for safer array handling, alongside new escape sequences for octal literals (N3353). These aim to streamline parallelism and reduce common pitfalls without compromising C's efficiency, building on C23's foundations in bit manipulation and floating-point consistency. Additionally, refinements to math functions, such as cleaned-up prototypes for frexp and scalbn (N3704), underscore ongoing efforts to modernize the standard library.[136][137]
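For context, the long-standing idiom that a countof facility is meant to improve on can be sketched as below. ARRAY_COUNT and sum_ints are hypothetical names for this illustration; the sketch deliberately avoids the proposed feature itself, whose final spelling and header are not assumed here.

```c
#include <assert.h>
#include <stddef.h>

/* Classic element-count idiom (hypothetical macro name). It is correct
   only when applied to a true array. If the argument is a pointer, for
   example an array parameter that has decayed, it silently produces a
   meaningless value instead of an error. */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Inside this function, values is a pointer, so the length must be
   passed separately by a caller that still sees the array. */
static int sum_ints(const int *values, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}
```

A caller that holds the array itself can write sum_ints(v, ARRAY_COUNT(v)). The hazard is that the same macro compiles without error when handed a pointer, which is the kind of mistake a dedicated operator could reject.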
Adoption of C23 remains uneven across compilers, reflecting the challenges of integrating new features into entrenched toolchains. GCC 13 and later, along with Clang 16 onward, support many C23 features, including the typeof keyword, although the #embed directive only arrived in later releases (GCC 15 and Clang 19); this enables experimental use in open-source projects. In contrast, MSVC provides only partial implementation, with gaps in areas like IEEE 754 decimal types and advanced enumerations, slowing enterprise uptake on Windows platforms. Community initiatives, including reference implementations in LLVM and contributions via WG14 mailing lists, are accelerating conformance, though full ecosystem maturity is projected for 2026–2027.[138]
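Because support varies, portable code often guards C23 features behind a version check. The following is a minimal sketch: HAVE_C23, NULL_PTR, and first_or are hypothetical names, and the test relies on the value 202311L that C23 assigns to __STDC_VERSION__. Compilers in a pre-release C23 mode may report a lower value and would take the fallback branch.

```c
#include <assert.h>
#include <stddef.h>

/* HAVE_C23 and NULL_PTR are hypothetical names chosen for this sketch. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#define HAVE_C23 1
#define NULL_PTR nullptr   /* C23 keyword */
#else
#define HAVE_C23 0
#define NULL_PTR NULL      /* pre-C23 fallback from <stddef.h> */
#endif

/* Returns the first element, or the fallback when the pointer is null.
   The same source compiles in both C23 and earlier language modes. */
static int first_or(const int *values, int fallback)
{
    return values != NULL_PTR ? values[0] : fallback;
}
```

Checking the language version is coarse, since a compiler may implement individual C23 features before it reports the final version number. Projects that need finer control typically probe specific features in their build system instead.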
C faces growing competition from languages like Rust, Go, and Zig in systems programming, where demands for memory safety and built-in concurrency challenge its dominance. Rust's ownership model eliminates common memory-safety vulnerabilities without garbage collection overhead, gaining traction in kernel modules and embedded firmware, while Go's goroutines simplify scalable networking, appealing to cloud-native development. Zig, an emerging general-purpose language, emphasizes manual memory management, compile-time safety features, and seamless interoperability with C code, attracting interest in performance-critical and real-time applications.[139][140] Despite this, C retains primacy in performance-critical domains like operating systems and device drivers, bolstered by its portability and optimization maturity. To sustain relevance amid increasingly complex hardware and secure coding mandates, future C revisions may explore modules for better encapsulation and contracts for runtime verification, as discussed in ongoing WG14 technical specifications.[141][142]