Pointer (computer programming)
I do consider assignment statements and pointer variables to be among computer science's "most valuable treasures." (Donald Knuth, Structured Programming with go to Statements)

Figure: A pointer a pointing to the memory address associated with a variable b, i.e., a contains the memory address 1008 of the variable b. In this diagram, the computing architecture uses the same address space and data primitive for both pointers and non-pointers; this need not be the case.
In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable depend on the underlying computer architecture.
Using pointers significantly improves performance for repetitive operations, like traversing iterable data structures (e.g. strings, lookup tables, control tables, linked lists, and tree structures). In particular, it is often much cheaper in time and space to copy and dereference pointers than it is to copy and access the data to which the pointers point.
Pointers are also used to hold the addresses of entry points for called subroutines in procedural programming and for run-time linking to dynamic link libraries (DLLs). In object-oriented programming, pointers to functions are used for binding methods, often using virtual method tables.
A pointer is a simple, more concrete implementation of the more abstract reference data type. Several languages, especially low-level languages, support some type of pointer, although some have more restrictions on their use than others. While "pointer" has been used to refer to references in general, it more properly applies to data structures whose interface explicitly allows the pointer to be manipulated (arithmetically via pointer arithmetic) as a memory address, as opposed to a magic cookie or capability which does not allow such.[citation needed] Because pointers allow both protected and unprotected access to memory addresses, there are risks associated with using them, particularly in the latter case. Primitive pointers are often stored in a format similar to an integer; however, attempting to dereference or "look up" such a pointer whose value is not a valid memory address could cause a program to crash (or contain invalid data). To alleviate this potential problem, as a matter of type safety, pointers are considered a separate type parameterized by the type of data they point to, even if the underlying representation is an integer. Other measures may also be taken (such as validation and bounds checking), to verify that the pointer variable contains a value that is both a valid memory address and within the numerical range that the processor is capable of addressing.
History
In 1955, Soviet Ukrainian computer scientist Kateryna Yushchenko created the Address programming language that made possible indirect addressing and addresses of the highest rank – analogous to pointers. This language was widely used on Soviet computers. However, it was unknown outside the Soviet Union, and Harold Lawson is usually credited with the invention, in 1964, of the pointer.[2] In 2000, Lawson was presented the Computer Pioneer Award by the IEEE "[f]or inventing the pointer variable and introducing this concept into PL/I, thus providing for the first time, the capability to flexibly treat linked lists in a general-purpose high-level language".[3] His seminal paper on the concepts appeared in the June 1967 issue of CACM entitled: PL/I List Processing. According to the Oxford English Dictionary, the word pointer first appeared in print as a stack pointer in a technical memorandum by the System Development Corporation.
Formal description
In computer science, a pointer is a kind of reference.
A data primitive (or just primitive) is any datum that can be read from or written to computer memory using one memory access (for instance, both a byte and a word are primitives).
A data aggregate (or just aggregate) is a group of primitives that are logically contiguous in memory and that are viewed collectively as one datum (for instance, an aggregate could be 3 logically contiguous bytes, the values of which represent the 3 coordinates of a point in space). When an aggregate is entirely composed of the same type of primitive, the aggregate may be called an array; in a sense, a multi-byte word primitive is an array of bytes, and some programs use words in this way.
In the context of these definitions, a byte is the smallest primitive; each memory address specifies a different byte. The memory address of the initial byte of a datum is considered the memory address (or base memory address) of the entire datum.
A memory pointer (or just pointer) is a primitive, the value of which is intended to be used as a memory address; it is said that a pointer points to a memory address. It is also said that a pointer points to a datum [in memory] when the pointer's value is the datum's memory address.
More generally, a pointer is a kind of reference, and it is said that a pointer references a datum stored somewhere in memory; to obtain that datum is to dereference the pointer. The feature that separates pointers from other kinds of reference is that a pointer's value is meant to be interpreted as a memory address, which is a rather low-level concept.
References serve as a level of indirection: A pointer's value determines which memory address (that is, which datum) is to be used in a calculation. Because indirection is a fundamental aspect of algorithms, pointers are often expressed as a fundamental data type in programming languages; in statically (or strongly) typed programming languages, the type of a pointer determines the type of the datum to which the pointer points.
Architectural roots
Pointers are a very thin abstraction on top of the addressing capabilities provided by most modern architectures. In the simplest scheme, an address, or a numeric index, is assigned to each unit of memory in the system, where the unit is typically either a byte or a word – depending on whether the architecture is byte-addressable or word-addressable – effectively transforming all of memory into a very large array. The system would then also provide an operation to retrieve the value stored in the memory unit at a given address (usually utilizing the machine's general-purpose registers).
In the usual case, a pointer is large enough to hold more addresses than there are units of memory in the system. This introduces the possibility that a program may attempt to access an address which corresponds to no unit of memory, either because not enough memory is installed (i.e. beyond the range of available memory) or the architecture does not support such addresses. The first case may, on certain platforms such as the Intel x86 architecture, be called a segmentation fault (segfault). The second case is possible in the current implementation of AMD64, where pointers are 64 bits long and addresses only extend to 48 bits. Pointers must conform to certain rules (canonical addresses), so if a non-canonical pointer is dereferenced, the processor raises a general protection fault.
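The canonical-address rule can be illustrated with a small check. This is a minimal sketch, assuming the common 48-bit form of the rule (bits 63 through 47 must all be equal); the function name is_canonical48 is hypothetical and not part of any real API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr is canonical under 48-bit virtual addressing:
   bits 63..47 must be either all zeros or all ones. */
bool is_canonical48(uint64_t addr) {
    uint64_t upper = addr >> 47;   /* the top 17 bits of the address */
    return upper == 0 || upper == 0x1FFFF;
}
```

For instance, 0x00007FFFFFFFFFFF passes the check, while 0x0000800000000000 does not, because bit 47 is set but bits 48 to 63 are clear.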
On the other hand, some systems have more units of memory than there are addresses. In this case, a more complex scheme such as memory segmentation or paging is employed to use different parts of the memory at different times. The last incarnations of the x86 architecture support up to 36 bits of physical memory addresses, which were mapped to the 32-bit linear address space through the PAE paging mechanism. Thus, only 1/16 of the possible total memory may be accessed at a time. Another example in the same computer family was the 16-bit protected mode of the 80286 processor, which, though supporting only 16 MB of physical memory, could access up to 1 GB of virtual memory, but the combination of 16-bit address and segment registers made accessing more than 64 KB in one data structure cumbersome.
In order to provide a consistent interface, some architectures provide memory-mapped I/O, which allows some addresses to refer to units of memory while others refer to device registers of other devices in the computer. There are analogous concepts such as file offsets, array indices, and remote object references that serve some of the same purposes as addresses for other types of objects.
Uses
Pointers are directly supported without restrictions in languages such as PL/I, C, C++, Pascal, FreeBASIC, and implicitly in most assembly languages. They are used mainly to construct references, which in turn are fundamental to constructing nearly all data structures, and to pass data between different parts of a program.
In functional programming languages that rely heavily on lists, data references are managed abstractly by using primitive constructs like cons and the corresponding elements car and cdr, which can be thought of as specialised pointers to the first and second components of a cons-cell. This gives rise to some of the idiomatic "flavour" of functional programming. By structuring data in such cons-lists, these languages facilitate recursive means for building and processing data—for example, by recursively accessing the head and tail elements of lists of lists; e.g. "taking the car of the cdr of the cdr". By contrast, memory management based on pointer dereferencing in some approximation of an array of memory addresses facilitates treating variables as slots into which data can be assigned imperatively.
When dealing with arrays, the critical lookup operation typically involves a stage called address calculation which involves constructing a pointer to the desired data element in the array. In other data structures, such as linked lists, pointers are used as references to explicitly tie one piece of the structure to another.
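The address calculation described above can be written out by hand. The following is a sketch only; element_address is a hypothetical helper that mirrors what a C compiler does for base[i] on an array of int.

```c
#include <assert.h>
#include <stddef.h>

/* Computes the address of element i of an int array explicitly. */
int *element_address(int *base, size_t i) {
    assert(base != NULL);
    /* Work in bytes: element i starts i * sizeof(int) bytes past the base. */
    return (int *)((char *)base + i * sizeof(int));
}
```

With int arr[4], element_address(arr, 2) yields the same address as &arr[2].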
Pointers are used to pass parameters by reference. This is useful if the programmer wants a function's modifications to a parameter to be visible to the function's caller. This is also useful for returning multiple values from a function.
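As an illustration of returning multiple values, the sketch below hands back a quotient and a remainder through pointer parameters, with the ordinary return value reporting success. The function name divide and its parameters are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Writes dividend / divisor and dividend % divisor through the two
   output pointers. Returns false, writing nothing, if the divisor is 0
   or an output pointer is null. Overflow (INT_MIN / -1) is not handled
   in this sketch. */
bool divide(int dividend, int divisor, int *quotient, int *remainder) {
    if (divisor == 0 || quotient == NULL || remainder == NULL) {
        return false;
    }
    *quotient = dividend / divisor;
    *remainder = dividend % divisor;
    return true;
}
```

A caller declares int q, r; and passes &q and &r; after divide(17, 5, &q, &r), q holds 3 and r holds 2.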
Pointers can also be used to allocate and deallocate dynamic variables and arrays in memory. Since a variable will often become redundant after it has served its purpose, it is a waste of memory to keep it, and therefore it is good practice to deallocate it (using the original pointer reference) when it is no longer needed. Failure to do so may result in a memory leak (where available free memory gradually, or in severe cases rapidly, diminishes because of an accumulation of numerous redundant memory blocks).
C pointers
In C, the basic syntax to define a pointer is:[4]
// both declarations are considered valid:
int *ptr;
int* ptr;
This declares a variable ptr that stores a pointer to an object of type int. Other types can be used in place of int; for example, bool *ptr would declare a pointer to an object of type bool.
Because the C language does not specify an implicit initialization for objects of automatic storage duration,[5] pointer variables can sometimes point to unexpected locations, causing undefined behavior. To combat this, pointers are sometimes initialized with a null pointer value, represented in C by the NULL macro;[6] in C23 and later, nullptr is also available as an alternative.[7] nullptr is type-safe and has type nullptr_t, unlike NULL, which is an implementation-defined null pointer constant that commonly expands to ((void*)0) or 0.[8]
int *ptr = NULL;
int *ptr = nullptr; // since C23
Dereferencing a null pointer produces undefined behavior,[9] which can result in unpredictable bugs and results.
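Since dereferencing a null pointer is undefined, code that may receive one usually tests the pointer first. A minimal sketch, with read_or as a hypothetical helper name:

```c
#include <stddef.h>

/* Returns the int that p points to, or fallback when p is a null pointer. */
int read_or(const int *p, int fallback) {
    return (p != NULL) ? *p : fallback;
}
```

This only guards against null pointers; it cannot detect a non-null pointer that is uninitialized or that refers to freed memory.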
After a pointer has been declared, it can be assigned an address. In C, the address of a variable can be retrieved with the & unary operator:
// Declares the pointer variable
int *ptr = NULL;
// Creates a variable
int a = 5;
// Assigns the address of a to the pointer variable
ptr = &a;
Additionally, to dereference the pointer, an asterisk (*) can be used. This allows a value to be assigned to a through ptr, even from code in which a itself is not in scope.
*ptr = 8;
If a is accessed later, its new value will be 8.
This example may be clearer if memory is examined directly.
Assume that a is located at address 0x8130 in memory and ptr at 0x8134; also assume this is a 32-bit machine such that an int is 32 bits wide. The following is what memory would contain after this code snippet is executed:
int a = 5;
int *ptr = NULL;
| Address | Contents |
|---|---|
| 0x8130 | 0x00000005 |
| 0x8134 | 0x00000000 |
(The null pointer shown here is 0x00000000.)
By assigning the address of a to ptr:
ptr = &a;
yields the following memory values:
| Address | Contents |
|---|---|
| 0x8130 | 0x00000005 |
| 0x8134 | 0x00008130 |
Then by dereferencing ptr by coding:
*ptr = 8;
the computer will take the contents of ptr (which is 0x8130), 'locate' that address, and assign 8 to that location yielding the following memory:
| Address | Contents |
|---|---|
| 0x8130 | 0x00000008 |
| 0x8134 | 0x00008130 |
Clearly, accessing a will yield the value of 8 because the previous instruction modified the contents of a by way of the pointer ptr.
Use in data structures
When setting up data structures like lists, queues and trees, it is necessary to have pointers to help manage how the structure is implemented and controlled. Typical examples of pointers are start pointers, end pointers, and stack pointers. These pointers can either be absolute (the actual physical address or a virtual address in virtual memory) or relative (an offset from an absolute start address ("base") that typically uses fewer bits than a full address, but will usually require one additional arithmetic operation to resolve).
Relative addresses are a form of manual memory segmentation, and share many of its advantages and disadvantages. A two-byte offset, containing a 16-bit, unsigned integer, can be used to provide relative addressing for up to 64 KiB (2^16 bytes) of a data structure. This can easily be extended to 128, 256 or 512 KiB if the address pointed to is forced to be aligned on a half-word, word or double-word boundary (but this requires an additional "shift left" bitwise operation, by 1, 2 or 3 bits, in order to adjust the offset by a factor of 2, 4 or 8 before its addition to the base address). Generally, though, such schemes are a lot of trouble, and for convenience to the programmer absolute addresses (and underlying that, a flat address space) are preferred.
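The scaled-offset scheme can be sketched as follows, assuming 4-byte (word) alignment so that a 16-bit offset reaches just under 256 KiB. The name resolve_word_offset is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Resolves a 16-bit offset, counted in 4-byte words, against a base address. */
void *resolve_word_offset(void *base, uint16_t offset_words) {
    assert(base != NULL);
    /* Shift left by 2 to turn a word count into a byte count (times 4). */
    size_t offset_bytes = (size_t)offset_words << 2;
    return (char *)base + offset_bytes;
}
```

The extra shift is the "one additional arithmetic operation" that relative addressing costs compared with storing a full absolute address.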
A one byte offset, such as the hexadecimal ASCII value of a character (e.g. X'29') can be used to point to an alternative integer value (or index) in an array (e.g., X'01'). In this way, characters can be very efficiently translated from 'raw data' to a usable sequential index and then to an absolute address without a lookup table.
C arrays
In C, array indexing is formally defined in terms of pointer arithmetic; that is, the language specification requires that a[i] be equivalent to *(a + i).[10] Thus in C, arrays can be thought of as pointers to consecutive areas of memory (with no gaps),[10] and the syntax for accessing arrays is identical to that which can be used to dereference pointers. For example, an array a can be declared and used in the following manner:
int a[5]; // Declares 5 contiguous integers
int *ptr = a; // Arrays can be used as pointers
ptr[0] = 1; // Pointers can be indexed with array syntax
*(a + 1) = 2; // Arrays can be dereferenced with pointer syntax
*(1 + a) = 2; // Pointer addition is commutative
2[a] = 4; // Subscript operator is commutative (perhaps unusual)
This allocates a block of five integers and names the block a, which acts as a pointer to the block. Another common use of pointers is to point to dynamically allocated memory from malloc which returns a consecutive block of memory of no less than the requested size that can be used as an array.
While most operators on arrays and pointers are equivalent, the result of the sizeof operator differs. In this example, sizeof(a) will evaluate to 5 * sizeof(int) (the size of the array), while sizeof(ptr) will evaluate to sizeof(int*), the size of the pointer itself.
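The sizeof difference matters in practice because an array passed to a function arrives as a plain pointer, so its length must travel separately. The sketch below uses a hypothetical ARRAY_LEN macro and sum function to show the usual idiom.

```c
#include <stddef.h>

/* Element count of a true array; the argument must not be a pointer,
   or the result is sizeof(pointer) / sizeof(element), which is wrong. */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Inside a function the parameter is only a pointer, so the element
   count has to be passed in explicitly. */
long sum(const int *values, size_t count) {
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

Given int a[5] = {2, 4, 3, 1, 5};, the call sum(a, ARRAY_LEN(a)) adds all five elements.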
Default values of an array can be declared like:
int a[5] = {2, 4, 3, 1, 5};
If the array a is located in memory starting at address 0x1000 on a 32-bit little-endian machine, then memory will contain the following (values are in hexadecimal, like the addresses):
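| Address | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 1000 | 2 | 0 | 0 | 0 |
| 1004 | 4 | 0 | 0 | 0 |
| 1008 | 3 | 0 | 0 | 0 |
| 100C | 1 | 0 | 0 | 0 |
| 1010 | 5 | 0 | 0 | 0 |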
Represented here are five integers: 2, 4, 3, 1, and 5. These five integers occupy 32 bits (4 bytes) each with the least-significant byte stored first (this is a little-endian CPU architecture) and are stored consecutively starting at address 0x1000.
The syntax for C with pointers is:
- a means 0x1000;
- a + 1 means 0x1004: the "+ 1" means to add the size of 1 int, which is 4 bytes;
- *a means to dereference the contents of a. Considering the contents as a memory address (0x1000), look up the value at that location (0x0002);
- a[i] means element number i, 0-based, of a, which is translated into *(a + i).
The last example is how to access the contents of a. Breaking it down:
- a + i is the memory location of the ith element of a, starting at i = 0;
- *(a + i) takes that memory address and dereferences it to access the value.
C linked list
Below is an example definition of a linked list in C.
/* the empty linked list is represented by NULL
* or some other sentinel value */
#define EMPTY_LIST NULL
struct Link {
void *data; // data of this link
struct Link *next; // next link; EMPTY_LIST if there is none
};
This pointer-recursive definition is essentially the same as the reference-recursive definition from the language Haskell:
data Link a = Nil
| Cons a (Link a)
Nil is the empty list, and Cons a (Link a) is a cons cell of type a with another link also of type a.
The definition with references, however, is type-checked and does not use potentially confusing signal values. For this reason, data structures in C are usually dealt with via wrapper functions, which are carefully checked for correctness.
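The wrapper functions mentioned above might look like the following. This is a sketch under stated assumptions: it repeats the struct Link definition so that it stands alone, and list_push, list_length and list_free are hypothetical names rather than part of any library.

```c
#include <stddef.h>
#include <stdlib.h>

#define EMPTY_LIST NULL

struct Link {
    void *data;          /* data of this link */
    struct Link *next;   /* next link; EMPTY_LIST if there is none */
};

/* Prepends a link holding data and returns the new head. Returns NULL if
   allocation fails; the original list is then left untouched. */
struct Link *list_push(struct Link *head, void *data) {
    struct Link *link = malloc(sizeof *link);
    if (link == NULL) {
        return NULL;
    }
    link->data = data;
    link->next = head;
    return link;
}

/* Counts the links by following next pointers until the sentinel. */
size_t list_length(const struct Link *head) {
    size_t count = 0;
    for (; head != EMPTY_LIST; head = head->next) {
        count++;
    }
    return count;
}

/* Frees the links only; the data pointers remain owned by the caller. */
void list_free(struct Link *head) {
    while (head != EMPTY_LIST) {
        struct Link *next = head->next;   /* read before freeing the link */
        free(head);
        head = next;
    }
}
```

Callers that go through these functions never compare against the sentinel or touch next directly, which keeps the unchecked pointer handling in one place.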
Pass-by-address using pointers
Pointers can be used to pass variables by their address, allowing their value to be changed. For example, consider the following C code:
// a copy of the int n can be changed within the function without affecting the calling code
void passByValue(int n) {
n = 12;
}
// a pointer m is passed instead. No copy of the value pointed to by m is created
void passByAddress(int *m) {
*m = 14;
}
int main(void) {
int x = 3;
// pass a copy of x's value as the argument
passByValue(x);
// the value was changed inside the function, but x is still 3 from here on
// pass x's address as the argument
passByAddress(&x);
// x was actually changed by the function and is now equal to 14 here
return 0;
}
Dynamic memory allocation
In some programs, the required amount of memory depends on what the user may enter. In such cases the programmer needs to allocate memory dynamically. This is done by allocating memory on the heap rather than on the stack, where variables usually are stored (although variables can also be stored in the CPU registers). Dynamically allocated memory can only be accessed through pointers; unlike ordinary variables, such blocks cannot be given names.
Pointers are used to store and manage the addresses of dynamically allocated blocks of memory. Such blocks are used to store data objects or arrays of objects. Most structured and object-oriented languages provide an area of memory, called the heap or free store, from which objects are dynamically allocated.
The example C code below illustrates how structure objects are dynamically allocated and referenced. The standard C library provides the function malloc() for allocating memory blocks from the heap. It takes the size of an object to allocate as a parameter and returns a pointer to a newly allocated block of memory suitable for storing the object, or it returns a null pointer if the allocation failed.
#include <stdlib.h> // malloc, free
#include <string.h> // memset, strlen, strcpy
// Parts inventory item
typedef struct {
int id; // Part number
char *name; // Part name
float cost; // Cost
} Item;
// Allocate and initialize a new Item object
Item *makeItem(const char *name) {
Item *item;
// Allocate a block of memory for a new Item object
item = (Item *)malloc(sizeof(Item));
if (!item) {
return NULL;
}
// Initialize the members of the new Item
memset(item, 0, sizeof(Item));
item->id = -1;
item->name = NULL;
item->cost = 0.0;
// Save a copy of the name in the new Item
item->name = (char *)malloc(strlen(name) + 1);
if (!item->name) {
free(item);
return NULL;
}
strcpy(item->name, name);
// Return the newly created Item object
return item;
}
The code below illustrates how memory objects are dynamically deallocated, i.e., returned to the heap or free store. The standard C library provides the function free() for deallocating a previously allocated memory block and returning it back to the heap.
// Deallocate an Item object
void destroyItem(Item *item) {
// Check for a null object pointer
if (!item) {
return;
}
// Deallocate the name string saved within the Item
if (item->name) {
free(item->name);
item->name = NULL;
}
// Deallocate the Item object itself
free(item);
}
Memory-mapped hardware
On some computing architectures, pointers can be used to directly manipulate memory or memory-mapped devices.
Assigning addresses to pointers is an invaluable tool when programming microcontrollers. Below is a simple example declaring a pointer of type int and initialising it to a hexadecimal address, in this example the constant 0x7FFF:
int *hardware_address = (int *)0x7FFF;
In the mid-1980s, using the BIOS to access the video capabilities of PCs was slow. Applications that were display-intensive typically used to access CGA video memory directly by casting the hexadecimal constant 0xB8000 to a pointer to an array of 80 unsigned 16-bit int values. Each value consisted of an ASCII code in the low byte, and a colour in the high byte. Thus, to put the letter 'A' at row 5, column 2 in bright white on blue, one would write code like the following:
#define VID ((unsigned short (*)[80])0xB8000)
void foo(void) {
VID[4][1] = 0x1F00 | 'A';
}
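Because writing to 0xB8000 only works on the original hardware, the same cell encoding can be tried against an ordinary array. The sketch below is an assumption-laden stand-in: fake_vram is a hypothetical buffer that takes the place of video memory, and put_cell is a hypothetical helper taking 1-based coordinates.

```c
#include <assert.h>
#include <stdint.h>

enum { TEXT_ROWS = 25, TEXT_COLS = 80 };

/* Ordinary array standing in for video memory; no hardware is touched. */
static uint16_t fake_vram[TEXT_ROWS][TEXT_COLS];

/* Stores ch in the low byte and the colour attribute in the high byte
   of the cell at the given 1-based row and column. */
void put_cell(uint16_t (*vid)[TEXT_COLS], int row, int col,
              char ch, uint8_t attr) {
    assert(row >= 1 && row <= TEXT_ROWS);
    assert(col >= 1 && col <= TEXT_COLS);
    vid[row - 1][col - 1] =
        (uint16_t)(((unsigned)attr << 8) | (unsigned char)ch);
}
```

Calling put_cell(fake_vram, 5, 2, 'A', 0x1F) stores the same 16-bit value, 0x1F00 | 'A', at index [4][1] that the macro-based code writes to the real buffer.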
Use in control tables
Control tables that are used to control program flow usually make extensive use of pointers. The pointers, usually embedded in a table entry, may, for instance, be used to hold the entry points to subroutines to be executed, based on certain conditions defined in the same table entry. The pointers can however be simply indexes to other separate, but associated, tables comprising an array of the actual addresses or the addresses themselves (depending upon the programming language constructs available). They can also be used to point to earlier table entries (as in loop processing) or forward to skip some table entries (as in a switch or "early" exit from a loop). For this latter purpose, the "pointer" may simply be the table entry number itself and can be transformed into an actual address by simple arithmetic.
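A small control table can be sketched in C with function pointers. Everything here is illustrative: the op codes, the handler functions and the execute driver are assumptions made for the example, not a standard design.

```c
#include <stddef.h>

typedef int (*Handler)(int);

static int increment(int x) { return x + 1; }
static int decrement(int x) { return x - 1; }
static int twice(int x)     { return x * 2; }

/* Each entry pairs a condition (the op code) with a subroutine entry point. */
static const struct {
    char op;
    Handler run;
} control_table[] = {
    { '+', increment },
    { '-', decrement },
    { '*', twice },
};

/* Applies each op code in program to value, in order.
   Op codes that are not in the table are skipped. */
int execute(const char *program, int value) {
    const size_t entries = sizeof control_table / sizeof control_table[0];
    for (; *program != '\0'; program++) {
        for (size_t i = 0; i < entries; i++) {
            if (control_table[i].op == *program) {
                value = control_table[i].run(value);   /* indirect call */
                break;
            }
        }
    }
    return value;
}
```

Adding a new operation means adding one table row; the driver loop itself does not change.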
Typed pointers and casting
In many languages, pointers have the additional restriction that the object they point to has a specific type. For example, a pointer may be declared to point to an integer; the language will then attempt to prevent the programmer from pointing it to objects which are not integers, such as floating-point numbers, eliminating some errors.
For example, in the following C code:
int *money;
char *bags;
money would be an integer pointer and bags would be a char pointer.
The following would yield a compiler warning of "assignment from incompatible pointer type" under GCC:
bags = money;
because money and bags were declared with different types.
To suppress the compiler warning, the conversion must be made explicit with a typecast:
bags = (char *)money;
which says to cast the integer pointer of money to a char pointer and assign to bags.
A 2005 draft of the C standard requires that casting a pointer derived from one type to one of another type should maintain the alignment correctness for both types (6.3.2.3 Pointers, par. 7):[11]
char *external_buffer = "abcdef";
int *internal_data;
internal_data = (int *)external_buffer;
// UNDEFINED BEHAVIOUR if "the resulting pointer is not correctly aligned"
In languages that allow pointer arithmetic, arithmetic on pointers takes into account the size of the type. For example, adding an integer number to a pointer produces another pointer that points to an address that is higher by that number times the size of the type. This allows us to easily compute the address of elements of an array of a given type, as was shown in the C arrays example above. When a pointer of one type is cast to another type of a different size, the programmer should expect that pointer arithmetic will be calculated differently. In C, for example, if the money array starts at 0x2000 and sizeof(int) is 4 bytes whereas sizeof(char) is 1 byte, then money + 1 will point to 0x2004, but bags + 1 would point to 0x2001. Other risks of casting include loss of data when "wide" data is written to "narrow" locations (e.g. bags[0] = 65537;), unexpected results when bit-shifting values, and comparison problems, especially with signed vs unsigned values.
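The effect of the pointed-to type on pointer arithmetic can be measured directly. In this sketch, int_step and char_step are hypothetical helpers that report how many bytes "+ 1" advances for each view of the same object.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes that "p + 1" advances when p is an int pointer. */
size_t int_step(int *p) {
    assert(p != NULL);
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* The same object viewed through a char pointer advances one byte at a time. */
size_t char_step(int *p) {
    assert(p != NULL);
    char *bags = (char *)p;
    return (size_t)((bags + 1) - bags);
}
```

On a platform where sizeof(int) is 4, int_step returns 4 and char_step returns 1, matching the 0x2004 and 0x2001 addresses in the text.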
Although it is impossible in general to determine at compile-time which casts are safe, some languages store run-time type information which can be used to confirm that these dangerous casts are valid at runtime. Other languages merely accept a conservative approximation of safe casts, or none at all.
Value of pointers
In C and C++, even if two pointers compare as equal, that does not mean they are equivalent. In these languages and LLVM, the rule is interpreted to mean that "just because two pointers point to the same address, does not mean they are equal in the sense that they can be used interchangeably"; the difference between such pointers is referred to as their provenance.[12] Casting to an integer type such as uintptr_t is implementation-defined, and the comparison it provides does not give any more insight into whether the two pointers are interchangeable. In addition, further conversion to bytes and arithmetic will throw off optimizers trying to keep track of the use of pointers, a problem still being elucidated in academic research.[13]
Making pointers safer
As a pointer allows a program to attempt to access an object that may not be defined, pointers can be the origin of a variety of programming errors. However, the usefulness of pointers is so great that it can be difficult to perform programming tasks without them. Consequently, many languages have created constructs designed to provide some of the useful features of pointers without some of their pitfalls, also sometimes referred to as pointer hazards. In this context, pointers that directly address memory (as used in this article) are referred to as raw pointers, by contrast with smart pointers or other variants.
One major problem with pointers is that as long as they can be directly manipulated as a number, they can be made to point to unused addresses or to data which is being used for other purposes. Many languages, including most functional programming languages and recent imperative programming languages like Java, replace pointers with a more opaque type of reference, typically referred to as simply a reference, which can only be used to refer to objects and not manipulated as numbers, preventing this type of error. Array indexing is handled as a special case.
A pointer which does not have any address assigned to it is called a wild pointer. Any attempt to use such uninitialized pointers can cause unexpected behavior, either because the initial value is not a valid address, or because using it may damage other parts of the program. The result is often a segmentation fault, storage violation or wild branch (if used as a function pointer or branch address).
In systems with explicit memory allocation, it is possible to create a dangling pointer by deallocating the memory region it points into. This type of pointer is dangerous and subtle because a deallocated memory region may contain the same data as it did before it was deallocated but may be then reallocated and overwritten by unrelated code, unknown to the earlier code. Languages with garbage collection prevent this type of error because deallocation is performed automatically when there are no more references in scope.
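In C, one common mitigation is to clear a pointer at the moment its memory is freed. The following is a minimal sketch with a hypothetical helper named release; it protects only the one variable passed in, and any other copies of the pointer still dangle.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees the block that *slot points to and clears *slot, so the caller's
   variable holds a null pointer instead of a dangling one. */
void release(char **slot) {
    assert(slot != NULL);
    free(*slot);    /* free(NULL) is a no-op, so a repeated call is harmless */
    *slot = NULL;
}
```

After char *buf = malloc(16); release(&buf);, buf compares equal to NULL, and an accidental second release(&buf) does nothing.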
Some languages, like C++, support smart pointers, which use a simple form of reference counting to help track allocation of dynamic memory in addition to acting as a reference. In the absence of reference cycles, where an object refers to itself indirectly through a sequence of smart pointers, these eliminate the possibility of dangling pointers and memory leaks. Delphi strings support reference counting natively.
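The reference-counting idea behind such smart pointers can be sketched by hand in C. This is an illustration of the technique only, not how any particular C++ smart pointer is implemented; the Shared type and the shared_new, shared_retain and shared_release names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int refs;    /* number of live owners */
    int value;   /* payload */
} Shared;

/* Allocates an object with a single owner; returns NULL on failure. */
Shared *shared_new(int value) {
    Shared *s = malloc(sizeof *s);
    if (s != NULL) {
        s->refs = 1;
        s->value = value;
    }
    return s;
}

/* Registers one more owner and returns the same pointer. */
Shared *shared_retain(Shared *s) {
    assert(s != NULL && s->refs > 0);
    s->refs++;
    return s;
}

/* Drops one owner and frees the object when the last owner lets go.
   Returns the number of owners remaining. */
int shared_release(Shared *s) {
    assert(s != NULL && s->refs > 0);
    int remaining = --s->refs;
    if (remaining == 0) {
        free(s);
    }
    return remaining;
}
```

The object is freed exactly once, when the count reaches zero, so no owner is left holding freed memory as long as every retain is paired with a release. The sketch is not thread-safe and does nothing about reference cycles.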
The Rust programming language introduces a borrow checker, pointer lifetimes, and an optimisation based around option types for null pointers to eliminate pointer bugs, without resorting to garbage collection.
Special kinds of pointers
Kinds defined by value
Null pointer
A null pointer has a value reserved for indicating that the pointer does not refer to a valid object. Null pointers are routinely used to represent conditions such as the end of a list of unknown length or the failure to perform some action; this use of null pointers can be compared to nullable types and to the Nothing value in an option type.
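Using a null pointer to report failure might look like this in C. The sketch assumes a hypothetical find_first function that searches an int array.

```c
#include <stddef.h>

/* Returns a pointer to the first element equal to key,
   or a null pointer if no element matches. */
const int *find_first(const int *values, size_t count, int key) {
    for (size_t i = 0; i < count; i++) {
        if (values[i] == key) {
            return &values[i];
        }
    }
    return NULL;   /* "not found" is signalled by the null pointer */
}
```

The caller must test the result before dereferencing it, which is the check that an option type would force at compile time.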
Dangling pointer
A dangling pointer is a pointer that does not point to a valid object and consequently may make a program crash or behave oddly. In the Pascal or C programming languages, pointers that are not specifically initialized may point to unpredictable addresses in memory.
The following example code shows a dangling pointer:
int func(void) {
char *p1 = (char *)malloc(sizeof(char)); // points to one uninitialized byte on the heap
char *p2; // dangling (uninitialized) pointer
*p1 = 'a'; // This is OK, assuming malloc() has not returned NULL.
*p2 = 'b'; // This invokes undefined behavior
free(p1); // release the heap block
return 0; // func is declared to return int
}
Here, p2 may point to anywhere in memory, so performing the assignment *p2 = 'b'; can corrupt an unknown area of memory or trigger a segmentation fault.
Wild branch
Where a pointer is used as the address of the entry point to a program or the start of a function which doesn't return anything, and is also either uninitialized or corrupted, a call or jump that is nevertheless made to this address is said to be a "wild branch". In other words, a wild branch is a call or jump through a function pointer that is wild (dangling).
The consequences are usually unpredictable and the error may present itself in several different ways depending upon whether or not the pointer is a "valid" address and whether or not there is (coincidentally) a valid instruction (opcode) at that address. The detection of a wild branch can present one of the most difficult and frustrating debugging exercises since much of the evidence may already have been destroyed beforehand or by execution of one or more inappropriate instructions at the branch location. If available, an instruction set simulator can usually not only detect a wild branch before it takes effect, but also provide a complete or partial trace of its history.
Kinds defined by structure
Autorelative pointer
An autorelative pointer is a pointer whose value is interpreted as an offset from the address of the pointer itself; thus, if a data structure has an autorelative pointer member that points to some portion of the data structure itself, then the data structure may be relocated in memory without having to update the value of the autorelative pointer.[14]
The cited patent also uses the term self-relative pointer to mean the same thing. However, the meaning of that term has been used in other ways:
- to mean an offset from the address of a structure rather than from the address of the pointer itself;[citation needed]
- to mean a pointer containing its own address, which can be useful for reconstructing in any arbitrary region of memory a collection of data structures that point to each other.[15]
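The first sense (an offset from the address of the pointer itself) can be sketched in C as follows. The Record type and the function names are hypothetical, chosen only for illustration; the point is that the structure can be copied byte for byte to a new location and the stored offset still resolves correctly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: 'name_off' holds the distance in bytes from the
   'name_off' field itself to the start of 'name'. */
typedef struct {
    ptrdiff_t name_off;
    char name[16];
} Record;

/* Store the position of 'name' relative to the address of the offset field. */
void record_init(Record *r, const char *text) {
    assert(r != NULL && text != NULL);
    strncpy(r->name, text, sizeof r->name - 1);
    r->name[sizeof r->name - 1] = '\0';
    r->name_off = (char *)r->name - (char *)&r->name_off;
}

/* Resolve the autorelative pointer: own address plus stored offset. */
const char *record_name(const Record *r) {
    return (const char *)&r->name_off + r->name_off;
}
```

After memcpy of one Record into another, record_name on the copy yields the copy's own name field, with no fix-up step.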
Based pointer
A based pointer is a pointer whose value is an offset from the value of another pointer. This can be used to store and load blocks of data, assigning the address of the beginning of the block to the base pointer.[16]
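As a rough illustration in portable C (not the compiler-specific based-pointer syntax described in the cited source), a based pointer can be modeled as an offset that is combined with a base address at the point of use. The names BasedPtr, to_based, and from_based are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A based pointer is stored as an offset; the base is supplied at use. */
typedef size_t BasedPtr;

/* Convert an address inside the block to an offset from 'base'. */
BasedPtr to_based(const unsigned char *base, const unsigned char *p) {
    assert(p >= base);
    return (BasedPtr)(p - base);
}

/* Convert an offset back to an address using whatever base is current. */
unsigned char *from_based(unsigned char *base, BasedPtr off) {
    return base + off;
}
```

Because only the offset is stored, the block can be saved, reloaded, or copied elsewhere, and the same BasedPtr value remains meaningful against the new base.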
Kinds defined by use or datatype
Multiple indirection
In some languages, a pointer can reference another pointer, requiring multiple dereference operations to get to the original value. While each level of indirection may add a performance cost, it is sometimes necessary in order to provide correct behavior for complex data structures. For example, in C it is typical to define a linked list in terms of an element that contains a pointer to the next element of the list:
typedef struct Element {
struct Element *next;
int value;
} Element;
Element *head = NULL;
This implementation uses a pointer to the first element in the list as a surrogate for the entire list. If a new value is added to the beginning of the list, head has to be changed to point to the new element. Since C arguments are always passed by value, using double indirection allows the insertion to be implemented correctly, and has the desirable side-effect of eliminating special case code to deal with insertions at the front of the list:
// Given a sorted list at *head, insert the element item at the first
// location where all earlier elements have a lesser value.
void insert(Element **head, Element *item) {
    // p points to a pointer to an element
    Element **p = head;
    while (*p && (*p)->value < item->value) {
        p = &(*p)->next;
    }
    item->next = *p;
    *p = item;
}
// Caller does this:
insert(&head, item);
In this case, if the value of item is less than that of head, the caller's head is properly updated to the address of the new item.
A basic example is the argv argument to the main function in C (and C++), which is given in the prototype as char** argv (or char* argv[]). The variable argv itself is a pointer to the first element of an array of strings (an array of pointers to character arrays), so *argv is a pointer to the 0th string (by convention the name of the program), and **argv is the 0th character of the 0th string.
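The two levels of indirection can be seen in a short sketch that walks a vector laid out like argv, relying on the terminating null pointer that the C standard guarantees at argv[argc]. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the strings in a NULL-terminated vector laid out like argv. */
size_t count_args(char **argv) {
    assert(argv != NULL);
    size_t n = 0;
    for (char **p = argv; *p != NULL; ++p) {  /* *p is one char* string */
        ++n;
    }
    return n;
}

/* Returns the first character of the first string, i.e. **argv. */
char first_char(char **argv) {
    assert(argv != NULL && *argv != NULL);
    return **argv;
}
```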
Function pointer
In some languages, a pointer can reference executable code, i.e., it can point to a function, method, or procedure. A function pointer will store the address of a function to be invoked. While this facility can be used to call functions dynamically, it is often a favorite technique of virus and other malicious software writers.
// Function with two integer parameters returning an integer value
int sum(int n1, int n2) {
    return n1 + n2;
}
int main(void) {
    int a = 3;
    int b = 5;
    // Function pointer to a function (int, int) -> int
    // and points to function sum
    int (*fp)(int, int) = &sum;
    int x = (*fp)(a, b); // Calls function sum with arguments a and b
    int y = sum(a, b); // Calls function sum with arguments a and b
}
Back pointer
In doubly linked lists or tree structures, a back pointer held on an element 'points back' to the item referring to the current element. These are useful for navigation and manipulation, at the expense of greater memory use.
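A minimal sketch of a doubly linked list node with a back pointer follows; the DNode type and function names are hypothetical. The back pointer is what lets a node be unlinked in constant time without traversing the list to find its predecessor.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DNode {
    struct DNode *next;
    struct DNode *prev;   /* back pointer to the node that refers to this one */
    int value;
} DNode;

/* Insert 'item' immediately after 'pos', keeping both directions linked. */
void dlist_insert_after(DNode *pos, DNode *item) {
    assert(pos != NULL && item != NULL);
    item->prev = pos;
    item->next = pos->next;
    if (pos->next != NULL) {
        pos->next->prev = item;
    }
    pos->next = item;
}

/* Unlink 'item' using its back pointer; no traversal is needed. */
void dlist_remove(DNode *item) {
    assert(item != NULL);
    if (item->prev != NULL) item->prev->next = item->next;
    if (item->next != NULL) item->next->prev = item->prev;
    item->next = item->prev = NULL;
}
```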
Simulation using an array index
It is possible to simulate pointer behavior using an index to a (normally one-dimensional) array.
Primarily for languages which do not support pointers explicitly but do support arrays, the array can be thought of and processed as if it were the entire memory range (within the scope of the particular array) and any index to it can be thought of as equivalent to a general-purpose register in assembly language (that points to the individual bytes but whose actual value is relative to the start of the array, not its absolute address in memory). Assuming the array is, say, a contiguous 16 megabyte character data structure, individual bytes (or a string of contiguous bytes within the array) can be directly addressed and manipulated using the name of the array with a 31 bit unsigned integer as the simulated pointer (this is quite similar to the C arrays example shown above). Pointer arithmetic can be simulated by adding or subtracting from the index, with minimal additional overhead compared to genuine pointer arithmetic.
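The idea can be sketched as a linked list whose "pointers" are plain integer indices into fixed arrays. C is used here only for consistency with the other examples; the same structure works in a language that has arrays but no pointers. The pool layout and all names are assumptions made for illustration.

```c
#include <assert.h>

#define POOL_SIZE 8
#define NIL (-1)           /* plays the role of a null pointer */

/* Parallel arrays act as "memory"; an int index acts as the pointer. */
static int pool_value[POOL_SIZE];
static int pool_next[POOL_SIZE];
static int pool_used = 0;

/* "Allocate" a cell and push it on the front of the list; returns new head. */
int push_front(int head, int value) {
    assert(pool_used < POOL_SIZE);
    int cell = pool_used++;
    pool_value[cell] = value;
    pool_next[cell] = head;
    return cell;
}

/* Follow the index chain, as one would follow next pointers. */
int list_sum(int head) {
    int sum = 0;
    for (int i = head; i != NIL; i = pool_next[i]) {
        sum += pool_value[i];
    }
    return sum;
}
```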
In languages that abstract away pointers and pointer arithmetic, such as Java, one can use iterators. For example, in C++ an iterator can overload operator++, so that advancing it resembles incrementing a pointer in C.
import std;
using std::vector;
int main() {
    // collections such as vector define an iterator
    // type, with begin() and end() methods
    vector<int> v{1, 2, 3, 4, 5};
    // 'it' is of type vector<int>::iterator
    // iterate through the collection similar to
    // pointer arithmetic
    for (auto it = v.begin(); it != v.end(); ++it) {
        std::print("{}", *it);
    }
}
It is even theoretically possible, using the above technique, together with a suitable instruction set simulator to simulate any machine code or the intermediate (byte code) of any processor/language in another language that does not support pointers at all (for example Java / JavaScript). To achieve this, the binary code can initially be loaded into contiguous bytes of the array for the simulator to "read", interpret and execute entirely within the memory containing the same array. If necessary, to completely avoid buffer overflow problems, bounds checking can usually be inserted by the compiler (or if not, hand coded in the simulator).
Support in various programming languages
Ada
Ada is a strongly typed language where all pointers are typed and only safe type conversions are permitted. All pointers are by default initialized to null, and any attempt to access data through a null pointer causes an exception to be raised. Pointers in Ada are called access types. Ada 83 did not permit arithmetic on access types (although many compiler vendors provided for it as a non-standard feature), but Ada 95 supports “safe” arithmetic on access types via the package System.Storage_Elements.
BASIC
Several old versions of BASIC for the Windows platform had support for STRPTR() to return the address of a string, and for VARPTR() to return the address of a variable. Visual Basic 5 also had support for OBJPTR() to return the address of an object interface, and for an ADDRESSOF operator to return the address of a function. The types of all of these are integers, but their values are equivalent to those held by pointer types.
Newer dialects of BASIC, such as FreeBASIC or BlitzMax, have exhaustive pointer implementations, however. In FreeBASIC, arithmetic on ANY pointers (equivalent to C's void*) is treated as though the ANY pointer had a width of one byte. As in C, ANY pointers cannot be dereferenced. Also, casting between ANY and any other type's pointers will not generate any warnings.
dim as integer f = 257
dim as any ptr g = @f
dim as integer ptr i = g
assert(*i = 257)
assert( (g + 4) = (@f + 1) )
C and C++
In C and C++ pointers are variables that store addresses and can be null. Each pointer has a type it points to, but one can freely cast between pointer types (but not between a function pointer and an object pointer). A special pointer type called the "void pointer" allows pointing to any (non-function) object, but is limited by the fact that it cannot be dereferenced directly (it must first be cast). The address itself can often be directly manipulated by casting a pointer to and from an integral type of sufficient size, though the results are implementation-defined and may indeed cause undefined behavior; while earlier C standards did not have an integral type that was guaranteed to be large enough, C99 specifies the uintptr_t typedef name defined in <stdint.h>, but an implementation need not provide it.
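A minimal sketch of the integer round trip, assuming the implementation provides the optional uintptr_t type; the function name round_trip is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round-trips an object pointer through uintptr_t. The C standard
   guarantees only that a valid void* converted to uintptr_t and back
   compares equal to the original; the integer value itself is
   implementation-defined. */
int *round_trip(int *p) {
    uintptr_t bits = (uintptr_t)(void *)p;
    return (int *)(void *)bits;
}
```

Doing arithmetic on the integer and converting the result back to a pointer is where portability ends, so the sketch deliberately leaves the value unchanged.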
C++ fully supports C pointers and C typecasting. It also supports a new group of typecasting operators to help catch some unintended dangerous casts at compile-time. Since C++11, the C++ standard library also provides smart pointers (unique_ptr, shared_ptr and weak_ptr) which can be used in some situations as a safer alternative to primitive C pointers. C++ also supports another form of reference, quite different from a pointer, called simply a reference or reference type.
Pointer arithmetic, that is, the ability to modify a pointer's target address with arithmetic operations (as well as magnitude comparisons), is restricted by the language standard to remain within the bounds of a single array object (or just after it), and will otherwise invoke undefined behavior. Adding or subtracting from a pointer moves it by a multiple of the size of its datatype. For example, adding 1 to a pointer to 4-byte integer values will increment the pointer's pointed-to byte-address by 4. This has the effect of incrementing the pointer to point at the next element in a contiguous array of integers—which is often the intended result. Pointer arithmetic cannot be performed on void pointers because the void type has no size, and thus the pointed address cannot be added to, although gcc and other compilers will perform byte arithmetic on void* as a non-standard extension, treating it as if it were char*.
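The scaling rule can be observed directly by measuring, in bytes, how far an int pointer moves when 1 is added to it. The sketch below also walks an array up to the one-past-the-end position, which the standard permits as a comparison target but not as a dereference target. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance covered by adding 1 to an int pointer. */
size_t int_stride_bytes(void) {
    int arr[2] = {0, 0};
    int *p = arr;
    int *q = p + 1;                    /* next element, not next byte */
    return (size_t)((char *)q - (char *)p);
}

/* Sums n ints by walking a pointer up to the one-past-the-end position. */
int sum_ints(const int *a, size_t n) {
    assert(a != NULL);
    int total = 0;
    for (const int *p = a; p != a + n; ++p) {
        total += *p;
    }
    return total;
}
```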
Pointer arithmetic provides the programmer with a single way of dealing with different types: adding and subtracting the number of elements required instead of the actual offset in bytes. (Pointer arithmetic with char* pointers uses byte offsets, because sizeof(char) is 1 by definition.) In particular, the C definition explicitly declares that the syntax a[n], which is the n-th element of the array a, is equivalent to *(a + n), which is the content of the element pointed to by a + n. This implies that n[a] is equivalent to a[n], and one can write, e.g., a[3] or 3[a] equally well to access the fourth element of an array a.
While powerful, pointer arithmetic can be a source of computer bugs. It tends to confuse novice programmers, forcing them into different contexts: an expression can be an ordinary arithmetic one or a pointer arithmetic one, and sometimes it is easy to mistake one for the other. In response to this, many modern high-level computer languages (for example Java) do not permit direct access to memory using addresses. Also, the safe C dialect Cyclone addresses many of the issues with pointers. See C programming language for more discussion.
The void pointer, or void*, is supported in ANSI C and C++ as a generic pointer type. A pointer to void can store the address of any object (not function),[a] and, in C, is implicitly converted to any other object pointer type on assignment, but it must be explicitly cast if dereferenced.
K&R C used char* for the "type-agnostic pointer" purpose (before ANSI C).
int x = 4;
void* p1 = &x;
int* p2 = p1; // void* implicitly converted to int*: valid C, but not C++
int a = *p2;
int b = *(int*)p1; // when dereferencing inline, there is no implicit conversion
C++ does not allow the implicit conversion of void* to other pointer types, even in assignments. This was a design decision to avoid careless and even unintended casts, though most compilers only output warnings, not errors, when encountering other casts.
int x = 4;
void* p1 = &x;
int* p2 = p1; // this fails in C++: there is no implicit conversion from void*
int* p3 = (int*)p1; // C-style cast
int* p4 = reinterpret_cast<int*>(p1); // C++ cast
In C++, there is no void& (reference to void) to complement void* (pointer to void), because references behave like aliases to the variables they point to, and there can never be a variable whose type is void.
Pointer-to-member
In C++ pointers to non-static members of a class can be defined. If a class MyClass has a member T a, then &MyClass::a is a pointer to the member a, of type T MyClass::*. This member can be an object or a function.[18] Such pointers can be used on the right-hand side of the operators .* and ->* to access the corresponding member.
struct MyStruct {
int a;
[[nodiscard]]
int f() const noexcept {
return a;
}
};
MyStruct s1{};
MyStruct* ptrS = &s1;
int MyStruct::* ptr = &MyStruct::a; // pointer to MyStruct::a
int (MyStruct::* fp)() const = &MyStruct::f; // pointer to MyStruct::f
s1.*ptr = 1;
std::println("{}", (s1.*fp)()); // prints 1
ptrS->*ptr = 2;
std::println("{}", (ptrS->*fp)()); // prints 2
Pointer declaration syntax overview
These pointer declarations cover most variants of pointer declarations. Of course it is possible to have triple pointers, but the main principles behind a triple pointer already exist in a double pointer. The naming used here is what the expression typeid(type).name() equals for each of these types when using g++ or clang.[19][20]
char a[5][5]; // array of arrays of chars
char* b[5]; // array of pointers to chars
char** c; // pointer to pointer to char ("double pointer")
char (*d)[5]; // pointer to array(s) of chars
char* e(); // function which returns a pointer to char(s)
char (*f)(); // pointer to a function which returns a char
char (*g())[5]; // function which returns pointer to an array of chars
char (*h[5])(); // an array of pointers to functions which return a char
The following declarations involving pointers-to-member are valid only in C++:
class X;
class Y;
char X::* a; // pointer-to-member to char
char X::* b[5]; // array of pointers-to-member to char
char* X::* c; // pointer-to-member to pointer to char(s)
char X::** d; // pointer to pointer-to-member to char
char (X::* e)[5]; // pointer-to-member to array(s) of chars
char X::* f(); // function which returns a pointer-to-member to char
char Y::* X::* g; // pointer-to-member (of X) to pointer-to-member (of Y) to char
char X::* X::* h; // pointer-to-member to pointer-to-member to char
char (X::* i())[5]; // function which returns pointer-to-member to an array of chars
char (X::* j)(); // pointer-to-member-function which returns a char
char (X::* k[5])(); // an array of pointers-to-member-functions which return a char
The () and [] have a higher priority than *.[21]
C#
In the C# programming language, pointers are supported by either marking blocks of code that include pointers with the unsafe keyword, or by using the System.Runtime.CompilerServices assembly provisions for pointer access.
The syntax is essentially the same as in C++, and the address pointed can be either managed or unmanaged memory. However, pointers to managed memory (any pointer to a managed object) must be declared using the fixed keyword, which prevents the garbage collector from moving the pointed object as part of memory management while the pointer is in scope, thus keeping the pointer address valid.
An exception is the IntPtr structure, a managed value type whose width matches that of a native pointer (closer in spirit to C's intptr_t than to int*); it requires neither the unsafe keyword nor the CompilerServices assembly. This type is often returned by methods in System.Runtime.InteropServices, for example:
using System;
using System.Runtime.InteropServices;
// Get 16 bytes of memory from the process's unmanaged memory
IntPtr pointer = Marshal.AllocHGlobal(16);
// Do something with the allocated memory
// Free the allocated memory
Marshal.FreeHGlobal(pointer);
The .NET framework includes many classes and methods in the System and System.Runtime.InteropServices namespaces (such as the Marshal class) which convert .NET types (for example, System.String) to and from many unmanaged types and pointers (for example, LPWSTR or void*) to allow communication with unmanaged code. Most such methods have the same security permission requirements as unmanaged code, since they can affect arbitrary places in memory.
C# allows stack-allocated arrays in safe code using System.Span.[22]
namespace Wikipedia.Examples;
using System;
public class Example
{
    static void Main(string[] args)
    {
        int num = 1024;
        unsafe
        {
            // view the int as bytes by creating a byte pointer to it
            byte* p = (byte*)&num;
            Console.Write("The 4 bytes of the integer are: ");
            for (int i = 0; i < sizeof(int); ++i)
            {
                Console.Write(" {0:X2}", *p);
                ++p;
            }
            Console.WriteLine();
        }
        // Stack-allocated arrays can be done either through pointers or Span<T>
        unsafe
        {
            int* numbersPtr = stackalloc int[5];
        }
        Span<int> numbers = stackalloc int[5];
    }
}
C#, like C and C++, also has a void* (void pointer) type, but its use is strongly discouraged.[23]
COBOL
The COBOL programming language supports pointers to variables. Primitive or group (record) data objects declared within the LINKAGE SECTION of a program are inherently pointer-based, where the only memory allocated within the program is space for the address of the data item (typically a single memory word). In program source code, these data items are used just like any other WORKING-STORAGE variable, but their contents are implicitly accessed indirectly through their LINKAGE pointers.
Memory space for each pointed-to data object is typically allocated dynamically using external CALL statements or via embedded extended language constructs such as EXEC CICS or EXEC SQL statements.
Extended versions of COBOL also provide pointer variables declared with USAGE IS POINTER clauses. The values of such pointer variables are established and modified using SET and SET ADDRESS statements.
Some extended versions of COBOL also provide PROCEDURE-POINTER variables, which are capable of storing the addresses of executable code.
PL/I
The PL/I language provides full support for pointers to all data types (including pointers to structures), recursion, multitasking, string handling, and extensive built-in functions. PL/I was quite a leap forward compared to the programming languages of its time.[citation needed] PL/I pointers are untyped, and therefore no casting is required for pointer dereferencing or assignment. The declaration syntax for a pointer is DECLARE xxx POINTER;, which declares a pointer named "xxx". Pointers are used with BASED variables. A based variable can be declared with a default locator (DECLARE xxx BASED(ppp);) or without one (DECLARE xxx BASED;), where xxx is a based variable, which may be an element variable, a structure, or an array, and ppp is the default pointer. Such a variable can be addressed without an explicit pointer reference (xxx=1;), or may be addressed with an explicit reference to the default locator (ppp), or to any other pointer (qqq->xxx=1;).
Pointer arithmetic is not part of the PL/I standard, but many compilers allow expressions of the form ptr = ptr±expression. IBM PL/I also has the builtin function PTRADD to perform the arithmetic. Pointer arithmetic is always performed in bytes.
IBM Enterprise PL/I compilers have a new form of typed pointer called a HANDLE.
D
The D programming language is a derivative of C and C++ which fully supports C pointers and C typecasting.
Eiffel
The Eiffel object-oriented language employs value and reference semantics without pointer arithmetic. Nevertheless, pointer classes are provided. They offer pointer arithmetic, typecasting, explicit memory management, interfacing with non-Eiffel software, and other features.
Fortran
Fortran-90 introduced a strongly typed pointer capability. Fortran pointers contain more than just a simple memory address. They also encapsulate the lower and upper bounds of array dimensions, strides (for example, to support arbitrary array sections), and other metadata. An association operator, => is used to associate a POINTER to a variable which has a TARGET attribute. The Fortran-90 ALLOCATE statement may also be used to associate a pointer to a block of memory. For example, the following code might be used to define and create a linked list structure:
type real_list_t
real :: sample_data(100)
type (real_list_t), pointer :: next => null ()
end type
type (real_list_t), target :: my_real_list
type (real_list_t), pointer :: real_list_temp
real_list_temp => my_real_list
do
read (1,iostat=ioerr) real_list_temp%sample_data
if (ioerr /= 0) exit
allocate (real_list_temp%next)
real_list_temp => real_list_temp%next
end do
Fortran-2003 adds support for procedure pointers. Also, as part of the C Interoperability feature, Fortran-2003 supports intrinsic functions for converting C-style pointers into Fortran pointers and back.
Go
Go has pointers. Its declaration syntax is equivalent to that of C, but written the other way around, ending with the type. Unlike C, Go has garbage collection, and disallows pointer arithmetic. Reference types, like in C++, do not exist. Some built-in types, like maps and channels, are boxed (i.e. internally they are pointers to mutable structures), and are initialized using the make function. In an approach to unified syntax between pointers and non-pointers, the arrow (->) operator has been dropped: the dot operator on a pointer refers to the field or method of the dereferenced object. This, however, only works with one level of indirection.
Java
There is no explicit representation of pointers in Java. Instead, more complex data structures like objects and arrays are implemented using references. The language does not provide any explicit pointer manipulation operators. It is still possible for code to attempt to dereference a null reference (null pointer), however, which results in a run-time exception being thrown. The space occupied by unreferenced memory objects is recovered automatically by garbage collection at run-time.[24]
Java provides the classes java.lang.ref.WeakReference and java.lang.ref.PhantomReference, which respectively implement weak references and phantom references.
Modula-2
Pointers are implemented very much as in Pascal, as are VAR parameters in procedure calls. Modula-2 is even more strongly typed than Pascal, with fewer ways to escape the type system. Some of the variants of Modula-2 (such as Modula-3) include garbage collection.
Oberon
Much as with Modula-2, pointers are available. There are still fewer ways to evade the type system and so Oberon and its variants are still safer with respect to pointers than Modula-2 or its variants. As with Modula-3, garbage collection is a part of the language specification.
Pascal
Unlike many languages that feature pointers, standard ISO Pascal only allows pointers to reference dynamically created variables that are anonymous and does not allow them to reference standard static or local variables.[25] It does not have pointer arithmetic. Pointers also must have an associated type and a pointer to one type is not compatible with a pointer to another type (e.g. a pointer to a char is not compatible with a pointer to an integer). This helps eliminate the type security issues inherent with other pointer implementations, particularly those used for PL/I or C. It also removes some risks caused by dangling pointers, but the ability to dynamically let go of referenced space by using the dispose standard procedure (which has the same effect as the free library function found in C) means that the risk of dangling pointers has not been eliminated.[26]
However, in some commercial and open source Pascal (or derivatives) compiler implementations —like Free Pascal,[27] Turbo Pascal or the Object Pascal in Embarcadero Delphi— a pointer is allowed to reference standard static or local variables and can be cast from one pointer type to another. Moreover, pointer arithmetic is unrestricted: adding or subtracting from a pointer moves it by that number of bytes in either direction, but using the Inc or Dec standard procedures with it moves the pointer by the size of the data type it is declared to point to. An untyped pointer is also provided under the name Pointer, which is compatible with other pointer types.
Perl
The Perl programming language supports pointers, although they are rarely used, in the form of the pack and unpack functions. These are intended only for simple interactions with compiled OS libraries. In all other cases, Perl uses references, which are typed and do not allow any form of pointer arithmetic. They are used to construct complex data structures.[28]
Rust
The Rust language has pointers; however, a raw pointer may only be dereferenced inside an unsafe block. Most operations on raw pointers can be found in std::ptr. Rust also provides the following smart pointers:
- std::boxed::Box: equivalent to a unique pointer[29]
- std::rc::Rc: equivalent to a reference-counted single-threaded shared pointer
- std::sync::Arc: equivalent to an atomically reference-counted thread-safe shared pointer
- std::rc::Weak: equivalent to a weak pointer[30]
A raw pointer to T (T* in C/C++) is written as *const T or *mut T in Rust. The following demonstrates raw pointers in Rust:
fn main() {
    let mut num: i32 = 42;
    let r1 = &num as *const i32;
    let r2 = &mut num as *mut i32;
    unsafe {
        println!("r1 points to: {}", *r1);
        println!("r2 points to: {}", *r2);
        *r2 = 100;
        println!("num is now: {}", num);
    }
}
See also
Notes
References
- ^ Donald Knuth (1974). "Structured Programming with go to Statements" (PDF). Computing Surveys. 6 (5): 261–301. CiteSeerX 10.1.1.103.6084. doi:10.1145/356635.356640. S2CID 207630080. Archived from the original (PDF) on August 24, 2009.
- ^ Reilly, Edwin D. (2003). Milestones in Computer Science and Information Technology. Greenwood Publishing Group. p. 204. ISBN 9781573565219. Retrieved 2018-04-13.
Harold Lawson pointer.
- ^ "IEEE Computer Society awards list". Awards.computer.org. Archived from the original on 2011-03-22. Retrieved 2018-04-13.
- ^ ISO/IEC 9899, clause 6.7.5.1, paragraph 1.
- ^ ISO/IEC 9899, clause 6.7.8, paragraph 10.
- ^ ISO/IEC 9899, clause 7.17, paragraph 3: NULL... which expands to an implementation-defined null pointer constant...
- ^ Gustedt, Jens; Meneide, JeanHeyd. "Introduce the nullptr constant". Retrieved 13 July 2025.
- ^ "Predefined null pointer constant (since C23)". cppreference.com. Retrieved 16 September 2025.
- ^ ISO/IEC 9899, clause 6.5.3.2, paragraph 4, footnote 87: If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined... Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer...
- ^ a b Plauger, P J; Brodie, Jim (1992). ANSI and ISO Standard C Programmer's Reference. Redmond, WA: Microsoft Press. pp. 108, 51. ISBN 978-1-55615-359-4.
An array type does not contain additional holes because all other types pack tightly when composed into arrays [at page 51]
- ^ WG14 N1124, C – Approved standards: ISO/IEC 9899 – Programming languages – C, 2005-05-06.
- ^ Jung, Ralf. "Pointers Are Complicated II, or: We need better language specs".
- ^ Jung, Ralf. "Pointers Are Complicated, or: What's in a Byte?".
- ^ US patent 6625718, Steiner, Robert C. (Broomfield, CO), "Pointers that are relative to their own present locations", issued 2003-09-23, assigned to Avaya Technology Corp. (Basking Ridge, NJ)
- ^ US patent 6115721, Nagy, Michael (Tampa, FL), "System and method for database save and restore using self-pointers", issued 2000-09-05, assigned to IBM (Armonk, NY)
- ^ "Based Pointers". Msdn.microsoft.com. Retrieved 2018-04-13.
- ^ "Converting between function and object pointers". CWG (195). Retrieved 2024-02-15 – via cplusplus.github.io.
- ^ "Pointers to Member Functions". C++ Super-FAQ. Standard C++ Foundation. Retrieved 2022-11-26.
- ^ "c++filt(1) - Linux man page".
- ^ "Itanium C++ ABI".
- ^ Bilting, Ulf; Skansholm, Jan. Vägen till C [The Road to C] (in Swedish) (3rd ed.). p. 169. ISBN 91-44-01468-6.
- ^ "stackalloc expression". learn.microsoft.com. Microsoft Learn. 10 July 2024.
- ^ "Unsafe code, pointer types, and function pointers". learn.microsoft.com. Microsoft Learn. 6 February 2025.
- ^ Nick Parlante, [1], Stanford Computer Science Education Library, pp. 9–10 (2000).
- ^ ISO 7185 Pascal Standard (unofficial copy), section 6.4.4 Pointer-types Archived 2017-04-24 at the Wayback Machine and subsequent.
- ^ J. Welsh, W. J. Sneeringer, and C. A. R. Hoare, "Ambiguities and Insecurities in Pascal," Software: Practice and Experience 7, pp. 685–696 (1977)
- ^ Free Pascal Language Reference guide, section 3.4 Pointers
- ^ Contact details. "// Making References (Perl References and nested data structures)". Perldoc.perl.org. Retrieved 2018-04-13.
- ^ "Box in std::boxed". doc.rust-lang.org. 4 August 2025.
- ^ "Weak in std::rc". doc.rust-lang.org. 4 August 2025.
External links
- PL/I List Processing Paper from the June, 1967 issue of CACM
- cdecl.org A tool to convert pointer declarations to plain English
- Over IQ.com A beginner-level guide describing pointers in plain English.
- Pointers and Memory Introduction to pointers – Stanford Computer Science Education Library
- Pointers in C programming Archived 2019-06-09 at the Wayback Machine A visual model for beginner C programmers
- 0pointer.de A terse list of minimum length source codes that dereference a null pointer in several different programming languages
- "The C book" – containing pointer examples in ANSI C
- Joint Technical Committee ISO/IEC JTC 1, Subcommittee SC 22, Working Group WG 14 (2007-09-08). International Standard ISO/IEC 9899 (PDF).
Committee draft.
Pointer (computer programming)
In C, for example, a pointer is declared with syntax such as int *ptr;, where the asterisk denotes a pointer to an integer, and initialized using the address-of operator &, as in ptr = &variable;.[1] Dereferencing with * accesses or modifies the pointed-to value, enabling pass-by-reference semantics in functions to avoid copying large data structures.[1] Pointers also support more complex uses, such as pointing to functions or structure members via the arrow operator ->, which streamlines access in object-oriented contexts.[1]
While powerful for systems programming and performance-critical applications, pointers require careful management to prevent issues like uninitialized references or invalid addresses, which can cause program crashes.[2] Modern languages like Rust and Go incorporate safer variants, such as references or raw pointers with ownership rules, to mitigate these risks while retaining low-level control.[4] Overall, pointers remain a cornerstone of memory management in imperative and systems-level programming.
Basics
Definition and Core Concepts
In computer programming, a pointer is a data type that stores the memory address of another value, rather than the value itself, thereby enabling indirect access and manipulation of data stored at that location.[5] This mechanism allows programs to reference and interact with data without directly embedding the data's contents, facilitating more flexible and efficient memory usage.[2] At its core, a pointer serves as an address holder, distinct from the pointee, which is the actual data or object residing at the referenced memory address; accessing the pointee requires dereferencing the pointer to retrieve or modify the target's value.[2] This distinction abstracts away the physical details of memory locations, allowing programmers to treat addresses as symbolic references for operations like linking data structures or passing large objects by reference, which optimizes performance by avoiding unnecessary data copying.[6] Pointers thus play a pivotal role in abstracting memory management, enabling indirect addressing that supports advanced programming techniques while relying on the underlying hardware's ability to resolve addresses.[7]
To illustrate conceptually, imagine a memory layout where a variable named x occupies bytes at address 0x1000, holding the integer value 42; a pointer variable p at address 0x2000 would store the value 0x1000, effectively "pointing" to x. Dereferencing p then yields 42, as if following an arrow from p to the location of x. This pointer-pointee relationship highlights how pointers enable dynamic referencing without altering the pointee's storage.[5]
Pointers operate within the memory model of the von Neumann architecture, which assumes a flat, linear address space where both program instructions and data reside in a unified sequence of addressable locations, typically bytes, each uniquely identified by a numeric address starting from zero.[8] In this model, memory is treated as a contiguous array of bytes, allowing pointers to represent any valid address as an integer offset, independent of the data type stored there, which underpins the architecture's stored-program concept.[9]
Representation and Value
In computer systems, pointers are internally represented as fixed-size integers that encode memory addresses in binary form. The size of these integers corresponds to the system's architecture; for instance, on 32-bit systems, pointers are typically 32 bits (4 bytes) long, allowing addressing of up to 4 gigabytes of memory, while on 64-bit systems, they are 64 bits (8 bytes) long to support vastly larger address spaces.[10][11] This binary encoding directly maps to the machine's word size, ensuring efficient storage and manipulation within registers and memory. The value held by a pointer represents a memory address, most commonly an absolute virtual address within the process's address space, which the memory management unit (MMU) translates to a physical address at runtime. In some architectures, pointers may instead store relative offsets from a base address, but absolute addressing predominates in flat memory models used by modern operating systems. Certain pointer values are considered invalid; for example, an all-zeroes value often denotes the null pointer, indicating no valid object or an inaccessible location in various systems, though the exact interpretation can vary by implementation.[12][13][14] Address space considerations further influence pointer representation, distinguishing between virtual and physical addressing schemes. Virtual addressing, standard in contemporary processors, allows pointers to reference a large, contiguous logical space per process, independent of physical memory layout, with the operating system handling translations via page tables. Physical addressing, rarer in user-level programming, directly encodes hardware memory locations but limits portability. 
Additionally, the endianness of the system affects how multi-byte pointers are ordered in memory: little-endian architectures (e.g., x86) store the least significant byte at the lowest address, while big-endian ones (e.g., some network protocols or PowerPC) reverse this order, impacting data serialization and cross-platform compatibility.[13][15]
Historical Development
Origins in Computing Architecture
The concept of pointers emerged from the architectural necessities of early stored-program computers in the 1940s and 1950s, where hardware mechanisms for indirect addressing and address modification addressed the challenges of accessing non-contiguous memory locations efficiently. In the von Neumann model, outlined in John von Neumann's 1945 First Draft of a Report on the EDVAC, the stored-program paradigm merged instructions and data in the same address space, requiring mechanisms to manipulate memory addresses efficiently for dynamic program execution. This foundational design motivated the development of register-based addressing, which supports such execution without excessive computational overhead.[16] Early implementations appeared in machines like the EDSAC, completed in 1949 at the University of Cambridge, which used accumulator-based architecture and relied on self-modifying code to simulate indirect addressing, allowing instructions to alter operand addresses on the fly for efficient access to non-sequential data. Although the original EDSAC lacked dedicated index registers (a feature introduced contemporaneously in the Manchester Mark 1), this approach highlighted the need for hardware support in handling variable memory references, reducing the burden of manual address bookkeeping in scientific computations.[17][18] The UNIVAC I, delivered in 1951 as the first commercial stored-program computer, extended these ideas through its accumulator design, incorporating address modification capabilities that enabled indirect-like operations for business data processing, where flexible referencing to variable-length records improved efficiency over rigid sequential access.[18] A pivotal milestone came with the Whirlwind computer at MIT, operational by late 1951, which employed address registers to facilitate real-time indirect addressing, essential for its interactive simulations and military applications. 
Whirlwind's integration of magnetic core memory further underscored the architectural drive for pointer primitives, as the random-access nature of core storage demanded quick, non-contiguous addressing without repeated full-path calculations, achieving access times of approximately 8 microseconds per word. These hardware innovations stemmed from the von Neumann architecture's core requirement for address indirection, enabling programs to treat memory locations as manipulable values and laying the groundwork for scalable computing systems.[16]
Evolution in Programming Languages
The concept of pointers in programming languages emerged as a means to manage memory indirection and parameter passing more efficiently than in earlier assembly-level approaches. In ALGOL 60, released in 1960, the introduction of call-by-name parameter passing provided a mechanism akin to reference parameters, where actual parameters were textually substituted into the procedure body upon each use, allowing modifications to the original data without explicit address manipulation.[19] This feature, while not a direct pointer type, laid groundwork for indirect referencing by simulating dynamic evaluation and side effects on caller variables.[20] Building on this, PL/I, developed by IBM and first specified in 1964, formalized explicit pointer variables as a core language feature, enabling direct manipulation of memory addresses for data structures and dynamic allocation. Harold Lawson is credited with inventing the pointer variable concept during PL/I's design, integrating it to support both scientific and business computing needs with type-safe indirection.[21] The Burroughs B5000 system, introduced in 1961, further influenced pointer-like mechanisms through its tagged architecture, where descriptors—extended words with tag bits indicating data types and bounds—served as hardware-supported pointers for high-level languages like ALGOL 60 and COBOL, promoting safer memory access and stack-based operations.[22] Pointers gained widespread adoption with the C programming language, developed by Dennis Ritchie at Bell Labs between 1971 and 1973, where explicit typed pointers became central to its design for systems programming on the PDP-11. 
Drawing from B's indirection operator but adding structure types and byte addressing, C's pointers enabled array decay to pointers and arithmetic, popularizing their use for low-level control while maintaining portability across Unix implementations.[23] By the late 1970s, languages like Pascal (1970) incorporated pointers with dereferencing via the caret symbol (^), restricting arithmetic to enhance safety for educational and general-purpose use.[24] Standardization efforts in the late 1980s solidified pointer semantics, with the ANSI X3.159-1989 standard (later ISO/IEC 9899:1990) defining C's pointer behaviors, including null pointers, conversions, and undefined behaviors for arithmetic beyond object bounds, ensuring consistent implementation across compilers.[25] In parallel, C++, evolving from "C with Classes" in 1979, introduced references in 1985 as safer aliases to objects, reducing reliance on explicit pointers for function parameters and operator overloading, while retaining pointers for dynamic allocation.[26] The 1980s also saw a paradigm shift toward abstracted references in higher-level languages; for instance, Ada's 1983 standard used access types as typed pointers with built-in null checks and no unchecked arithmetic, prioritizing safety in safety-critical systems over raw control.[27] This evolution culminated in the 1990s with C++ smart pointers, such as reference-counted classes proposed in 1992, which encapsulated raw pointers to automate memory deallocation and prevent leaks, addressing common pitfalls in manual management.[28] By then, the transition from assembly's direct addressing to higher-level abstractions like references in languages such as Modula-2 and early object-oriented designs emphasized reliability, influencing modern paradigms where explicit pointers are often confined to performance-critical code.[26]
Formal Foundations
Mathematical Description
In the denotational semantics model for ANSI C (Papaspyrou 2001), a pointer can be abstracted as a function $p : V \to A$, where $V$ denotes the value space comprising all possible data values in the system, and $A$ represents the address space consisting of unique memory locations.[29] The dereference operation is then defined as the inverse function $p^{-1} : A \to V$, which retrieves the value stored at the specified address.[29] This set-theoretic formulation separates the conceptual layers of data storage and access, treating pointers independently of specific hardware implementations.[29] Indirection levels arise naturally through function composition in this model. A single pointer maps a value to its address, while a double pointer represents the composition $p \circ p$, allowing access to addresses of addresses; dereferencing yields $p^{-1}$ applied iteratively.[29] Higher-order indirections extend this pattern, with $n$-level indirection modeled as nested compositions, ensuring that each layer preserves the bijection between valid addresses and values within the defined spaces.[29] The address space is governed by axioms ensuring unique addressing and locality. Uniqueness requires that each element in $A$ corresponds to exactly one memory location, formalized as an injection from allocated objects to addresses, preventing overlaps.[29] Locality axiomatically bounds addresses to aggregate structures, such that offsets within an object remain valid only relative to its base address.[29] Pointer equality follows directly: two pointers $p$ and $q$ are equal if and only if $\mathrm{addr}(p) = \mathrm{addr}(q)$, where $\mathrm{addr}$ extracts the underlying location from the function representation.[29] Theoretical properties of this model emphasize type safety and aliasing behavior in formal semantics. 
Type safety is enforced through semantic domains where pointer types restrict operations to compatible value spaces, preventing ill-typed dereferences via inference rules that validate compositions.[29] Pointer aliasing manifests as shared references when distinct pointers map to the same address in $A$, allowing concurrent access to the same value but requiring careful handling to avoid undefined behaviors in semantic evaluations.[30]
Hardware Architectural Roots
The foundational hardware support for pointers emerged in early mainframe processors through mechanisms like index registers and indirect addressing modes, which enabled efficient memory indirection and offset calculations. The IBM 704, introduced in 1954, was the first IBM computer to incorporate index registers, featuring three such registers that modified addresses by adding the two's complement of their contents to an instruction's base address, facilitating indexed access patterns akin to pointer arithmetic.[31] Indirect addressing, a core pointer operation, was further refined in successors like the IBM 709, where it required an additional execution cycle to fetch the effective address from memory, allowing instructions to reference locations dynamically rather than statically.[31] These features laid the groundwork for pointers by decoupling logical addresses from physical ones at the hardware level, reducing the need for manual address recalculation in assembly code. Memory management units (MMUs) extended this support in the 1960s by automating virtual-to-physical address translation, treating pointers as virtual references resolved at runtime. The Atlas computer, operational from 1962, pioneered this with Page Address Registers (PARs) that mapped 512-word virtual pages to physical core store blocks using associative matching; a virtual address's page identifier was compared against PAR contents, and on a match, the physical block address was concatenated with the intra-page offset to yield the final location.[32] If no match occurred, a page fault interrupted execution, prompting the supervisor to load the page from secondary storage (e.g., a drum) and update the PAR, thus enabling pointers to operate in a larger virtual space without direct physical addressing.[32] This hardware abstraction became standard, influencing subsequent designs like those in the Burroughs B5000 series, where descriptor-based translation similarly virtualized pointer targets. 
In modern processors, pointers interact closely with cache hierarchies and translation lookaside buffers (TLBs), influencing locality and prefetching efficiency. Pointer chasing, common in linked structures, often disrupts spatial and temporal locality, leading to high cache miss rates (e.g., up to 83% in L3 for pointer-intensive benchmarks) as sequential accesses jump irregularly across memory.[33] TLBs, which cache recent virtual-to-physical mappings, exacerbate this; frequent pointer-induced page crossings increase TLB misses, stalling translation and amplifying latency. Hardware prefetchers mitigate these effects by predicting pointer transitions, such as using a pointer cache to track and preload target objects into L2 cache, achieving up to 50% speedup in dependency-bound workloads by breaking serial chains.[33] Architectural examples illustrate these roots' evolution. The PDP-11, introduced in 1970, supported base+offset addressing in its index mode (mode 6), where an instruction fetched an offset from the subsequent word and added it to a general register (e.g., R4) to compute the effective address, enabling pointer-like array traversals without altering the base.[34] It also provided indirect addressing via deferred modes (e.g., mode 1: @Rn), where the register held the address of the operand rather than the operand itself, requiring an extra memory fetch and mirroring pointer dereferencing.[34] In contemporary x86 architectures, segmentation persists but is typically configured in a flat model, where segment registers (CS, DS, etc.) point to full 4 GB linear spaces, allowing pointers to function as simple offsets while retaining hardware support for base relocation via segment descriptors in protected mode.[35] This setup, detailed in Intel's architecture manuals, uses the Global Descriptor Table to define segment limits and bases, ensuring compatibility with legacy pointer operations while prioritizing virtual addressing via paging.[35]
Primary Uses
Data Structures and Arrays
In computer programming, pointers play a fundamental role in implementing arrays as data structures by providing a mechanism to access elements stored in contiguous memory locations. In most expressions, an array is treated as a pointer to its first element, allowing direct manipulation through pointer arithmetic. For instance, in languages like C, the name of an array decays to a pointer to its initial element, enabling efficient traversal by incrementing the pointer to step through subsequent elements without explicit indexing. This equivalence between arrays and pointers facilitates operations such as accessing the nth element via pointer addition, where the address of arr[n] is computed as arr + n, assuming arr points to the base address.
int arr[5] = {1, 2, 3, 4, 5};
int *ptr = arr; // ptr now points to arr[0]
for (int i = 0; i < 5; i++) {
    printf("%d ", *ptr); // Prints each element
    ptr++; // Advances to next element
}
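Pointers also link the nodes of dynamic structures such as linked lists, in which each node stores the address of its successor and a NULL pointer marks the end of the list:
struct Node {
    int data;
    struct Node *next;
};
struct Node *head = NULL; // Empty list
// Traversal example
struct Node *current = head;
while (current != NULL) {
    printf("%d ", current->data);
    current = current->next;
}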
Dynamic Memory Allocation
Dynamic memory allocation allows programs to request memory from the heap at runtime, with pointers serving as the mechanism to access and manage the allocated space. In C, the malloc function allocates a block of memory of a specified size in bytes and returns a pointer of type void* to the beginning of that block; in C this pointer converts implicitly to any object pointer type, whereas C++ requires an explicit cast.[38] This pointer enables the program to store and manipulate data dynamically, such as variable-sized arrays or structures, without relying on compile-time fixed sizes. Similarly, in C++, the new operator allocates memory on the heap for objects or arrays and returns a pointer to the allocated memory, automatically invoking constructors for objects. These mechanisms provide flexibility for applications needing runtime adaptability, like building dynamic data structures.[39]
Manual memory management requires explicit deallocation to prevent resource waste, using free in C to release the memory block pointed to by the provided pointer, or delete and delete[] in C++ to deallocate single objects or arrays while calling destructors.[40] Failure to deallocate, such as when the pointer to the allocated memory is lost or overwritten without calling the deallocation function, results in a memory leak, where the memory remains allocated but inaccessible for the program's duration. In contrast, languages with garbage collection, such as Java, use pointers (implemented as references) to track object reachability from active roots like the stack or global variables, automatically reclaiming memory for unreachable objects without explicit deallocation calls.[41] This approach reduces the risk of leaks from programmer error but introduces overhead from periodic collection cycles.
Pointer-based dynamic allocation can lead to fragmentation, degrading memory efficiency over time. Internal fragmentation occurs when an allocated block exceeds the requested size due to alignment requirements or allocator overhead, leaving unusable space within the block.[42] External fragmentation arises from repeated allocations and deallocations creating scattered free memory holes too small for new requests, even if total free memory suffices, often exacerbated by varying allocation sizes and lifetimes in pointer-managed heaps.[39] Allocators mitigate this through strategies like coalescing adjacent free blocks or using buddy systems, but fragmentation remains a key challenge in manual pointer-based systems.
Pass-by-Reference and Function Pointers
In languages like C that primarily use pass-by-value semantics for function parameters, modifications to arguments within a function do not affect the original variables in the calling scope, as only copies of the values are passed. To simulate pass-by-reference behavior, allowing functions to modify the caller's variables, pointers are employed by passing the address of the variable as an argument, enabling the function to dereference and alter the original data. For instance, a function to swap two integers can be implemented using pointers to their addresses, ensuring the changes persist after the function returns.[43][44] Function pointers in C provide a mechanism to store the memory address of a function, facilitating indirect invocation and dynamic dispatch without direct calls. The syntax declares a pointer with the function's return type, followed by parentheses containing an asterisk and the pointer name, then the parameter types in parentheses, such as void (*fp)(int); for a pointer to a function taking an integer and returning void. These pointers can be assigned the address of compatible functions and invoked by dereferencing, like (*fp)(42); or simply fp(42);, allowing runtime selection of executable code.[45][46]
Function pointers enable callbacks, where a higher-level function receives a pointer to a user-defined function and invokes it during execution, such as for event handling or custom processing. In graphical user interfaces or sorting routines like qsort, callbacks allow client code to specify behavior without altering the core library. Similarly, virtual method tables (vtables) are structures, often arrays of function pointers, used to implement runtime polymorphism in C by associating object types with their method implementations, akin to control tables in hardware for indirect jumps.[47][48][49]
Function pointers in C enable polymorphism by allowing structs to include tables of pointers to type-specific functions, simulating object-oriented inheritance and method overriding at runtime. For example, a base struct can define a vtable with pointers to common operations, while derived structs populate their own vtables with specialized implementations; invoking through the struct's pointer resolves to the appropriate function dynamically. This approach provides behavioral flexibility, such as selecting drawing methods for different shapes, without built-in language support for classes.[50][51]
Typing and Operations
Typed Pointers and Casting
In computer programming, particularly in languages like C, pointers are typically typed, meaning they are declared with a specific data type that indicates the type of object they reference. This type information, such as int* for a pointer to an integer or char* for a pointer to a character, allows the compiler to validate operations like dereferencing and ensures that the pointer is used in a manner consistent with the pointed-to object's layout and size. For instance, dereferencing an int* expects to read or write four bytes (on most platforms), while a char* handles one byte, preventing inadvertent data corruption from type mismatches.[52]
Untyped or generic pointers, such as void* in C, lack this specific type association and can store the address of any object type, facilitating generic programming in functions like malloc or qsort. However, void* pointers cannot be directly dereferenced; a conversion to a typed pointer is required before accessing the underlying data, as in int* p = (int*)void_ptr; (the cast is optional in C, where void* converts implicitly to other object pointer types, but mandatory in C++). This design promotes flexibility but introduces risks, as an incorrect conversion can lead to undefined behavior if the target type does not match the object's effective type.[52]
Pointer casting in C is performed using the explicit cast operator (type)expression, which converts a pointer of one type to another, including between incompatible types. For example, casting a char* to an int* reinterprets the memory address, but such conversions are implementation-defined or undefined if they violate alignment requirements or the object's effective type. The use of void* often involves implicit conversions to and from other pointer types, but explicit casts are needed for dereferencing, underscoring the language's reliance on programmer discipline for correctness.
Type safety benefits from typed pointers arise through compiler-enforced checks, where mismatched pointer types trigger compiler diagnostics, and through language rules like strict aliasing, which prohibit accessing an object through a pointer of an incompatible type to enable optimizations. Strict aliasing rules specify that an object of effective type T can only be accessed via lvalues of compatible types or character types, preventing aliasing-related undefined behavior and allowing compilers to assume non-overlapping accesses for better performance. Violations, such as dereferencing a float* on memory with an effective type of int, result in undefined behavior, highlighting the trade-off between pointer flexibility and safety.[53]
Pointer Arithmetic and Manipulation
Pointer arithmetic refers to the set of operations that can be performed on pointers to navigate memory, primarily in languages like C where pointers directly represent addresses. These operations are constrained to ensure type safety and prevent arbitrary memory access, with arithmetic scaled according to the size of the type being pointed to. For instance, incrementing a pointer to a char advances the address by 1 byte, while incrementing a pointer to an int (typically 4 bytes on many systems) advances by 4 bytes.[54] This scaling is automatic: the expression p + n, where p is a pointer of type T* and n is an integer, yields the address held in p advanced by n * sizeof(T) bytes.[54] Similarly, p - n moves the address back by n * sizeof(T) bytes.[54]
Valid operations on pointers include addition and subtraction of integers, as well as subtraction between two pointers of the same type. Adding an integer n to a pointer p yields a pointer to the element at position i + n in the same array object, assuming p points to index i.[54] Subtraction of pointers p1 - p2 (where both point into the same array) returns the difference in their indices as a ptrdiff_t value, not the raw byte difference.[54] Comparisons such as <, >, ==, and != are defined between pointers to the same array (or one past the end), allowing checks for relative positions, such as verifying array bounds.[54] These operations treat a single object as a one-element array for arithmetic purposes.[54] Multiplication and division involving pointers are not permitted, as they lack semantic meaning in this context.[54]
A key equivalence in C is the decay of arrays to pointers: an expression of array type implicitly converts to a pointer to its first element, except in specific contexts like sizeof or the unary & operator.[55] This decay enables seamless use of arrays in pointer arithmetic; for example, if arr is an array of int, the expression arr + i points to the i-th element, and &(arr[i]) equals arr + i.[55] The resulting pointer type is T* for an array of type T[N].[55] This behavior underpins array indexing as pointer arithmetic, where arr[i] is equivalent to *(arr + i).[55]
Pointer arithmetic is strictly limited to elements within the same array object to avoid undefined behavior. Operations that would point outside the array bounds, such as p - 1 when p points to the first element or p + n landing beyond one past the last element, result in undefined behavior.[54] A pointer one past the last element may be formed and compared but must not be dereferenced. Additionally, if the computed result cannot be represented in ptrdiff_t or violates integer overflow rules, the behavior is undefined.[54] These constraints, as specified in the C standard (e.g., C17 6.5.6), ensure that arithmetic remains meaningful only for array navigation.[54]
int arr[5] = {1, 2, 3, 4, 5};
int *p = arr; // arr decays to &arr[0]
p = p + 2; // Now points to arr[2], address advanced by 2 * sizeof(int)
int diff = (p + 3) - p; // diff == 3 (index difference)
if (p < arr + 5) { // Valid comparison within array
    // Bounds check passes
}
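In this example, assuming sizeof(int) == 4, each step of the pointer advances its address by 4 bytes, so immediately after p = arr the address of p + 1 lies 4 bytes beyond &arr[0].[54][55]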
Safety Mechanisms
Common Pointer Errors
One of the most prevalent pointer errors is dereferencing a null pointer, which occurs when a program attempts to access memory through a pointer holding the null value, conventionally address zero and written as NULL in C or nullptr in C++. This action typically triggers a segmentation fault because the operating system protects the null address space from reads or writes, halting execution to prevent invalid memory access. For example, in C code, if a pointer is initialized to NULL but dereferenced without checking, such as *ptr = 42;, the program typically crashes upon attempting the assignment.[56]
Dangling pointers arise when a pointer references memory that has been freed or gone out of scope, leading to use-after-free bugs where the program accesses invalid or reused memory locations.[57] The cause is often failing to update the pointer after deallocation with functions like free() in C, leaving it pointing to the now-invalid address.[58] A common scenario involves dynamically allocated memory that is released, but the pointer is later dereferenced, potentially corrupting data or executing arbitrary code if the memory is reallocated.[57]
Uninitialized pointers, sometimes termed wild pointers, point to arbitrary or garbage memory addresses because they have not been assigned a valid value upon declaration.[59] This error manifests when a pointer variable is used before initialization, such as in function calls or indirection. For instance, declaring int *p; without setting p = NULL; or allocating memory, then using *p, accesses unpredictable memory, leading to erratic behavior.
These errors commonly present as program crashes like segmentation faults, output of garbage data from unintended memory reads, or security vulnerabilities such as buffer overflows when pointer arithmetic exceeds allocated bounds.[60] Buffer overflows, in particular, can exploit pointer mishandling to overwrite adjacent memory, enabling attackers to inject malicious code or escalate privileges in vulnerable applications.[61]
Techniques for Safer Pointers
To mitigate common risks associated with raw pointers, such as memory leaks and dangling references, programming languages and tools have introduced various mechanisms for safer pointer handling. These techniques emphasize automatic resource management, runtime verification, and compile-time checks to prevent errors without sacrificing performance entirely.[62] In C++, smart pointers implement the Resource Acquisition Is Initialization (RAII) idiom to ensure automatic cleanup of dynamically allocated memory. The std::unique_ptr provides exclusive ownership of an object, automatically deleting it when the pointer goes out of scope, thus preventing leaks from forgotten deallocations.[63] Similarly, std::shared_ptr enables shared ownership through reference counting, decrementing the count and deleting the object only when the last reference is destroyed, which is particularly useful for graphs of interconnected objects. These constructs, introduced in C++11, replace raw pointers in modern codebases to enforce deterministic lifetime management.[64]
Bounds checking tools and representations address spatial memory errors like buffer overflows. AddressSanitizer (ASan), developed by Google, instruments code at compile time to detect out-of-bounds accesses, use-after-free, and other pointer-related violations at runtime with low overhead, typically under 2x slowdown in execution.[65] It has been integrated into compilers like Clang and GCC, aiding debugging in large projects such as Chromium.[66] Complementing this, fat pointers extend standard pointers with embedded metadata for bounds (e.g., start address and length), enabling hardware or software checks on every access; low-fat variants optimize space by encoding bounds in unused pointer bits on 64-bit systems, reducing overhead to about 1.1x for SPEC benchmarks while maintaining compatibility.[67][68]
Certain languages incorporate pointer safety directly into their type systems. Rust's borrow checker enforces an ownership model at compile time, where pointers (references) are borrowed immutably or mutably under strict rules: only one mutable borrow or multiple immutable borrows are allowed at a time, preventing data races and use-after-free errors without a garbage collector.[69] This static analysis rejects unsafe aliasing, as demonstrated in its effectiveness for systems programming, with adoption in projects like the Linux kernel.[70] In Java, explicit pointers are absent; instead, references are managed by automatic garbage collection, which traces reachable objects from roots (e.g., stack variables) and reclaims unreferenced memory, eliminating manual deallocation risks like leaks or invalid accesses.[71] The JVM's generational collectors, such as G1, further optimize this for low pause times in production environments.[72]
Beyond language features, best practices and tools promote disciplined pointer usage. Pointers should always be initialized to a null pointer (NULL or nullptr) or a valid address upon declaration to avoid dereferencing garbage values, a rule codified in secure coding standards. Before dereferencing, code should check the pointer against null and, where applicable, check bounds or ownership; static analysis tools that implement standards such as the SEI CERT coding standards help enforce this by flagging uninitialized or unchecked pointers before the program runs. These habits, combined with regular use of sanitizers, significantly reduce vulnerabilities in C/C++ codebases.[73]
Specialized Variants
Null and Dangling Pointers
In computer programming, null pointers represent an intentional absence of a valid memory address, serving as a sentinel value to indicate that a pointer does not refer to any object. In the C programming language, the NULL macro is defined as an implementation-defined null pointer constant, typically expanding to an integer constant expression such as 0 or (void*)0, which can be converted to any pointer type without a diagnostic message.[74] This macro, defined in headers like <stddef.h>, allows explicit signaling of uninitialized or invalid pointers, though its integer-based representation has led to type-related issues in generic code. In Python, the None object functions as the equivalent null value, a singleton instance of type NoneType that denotes the lack of a value and is commonly returned from functions to signify failure or absence without raising an exception.[75] Unlike C's NULL, None is an ordinary object rather than an address constant: a single instance exists for the lifetime of the interpreter and its value never changes. Higher-level languages often extend this concept through optional types; for instance, Haskell's Maybe type encapsulates optional values as either Just a (containing a value of type a) or Nothing (indicating absence), enabling safe handling of potentially absent values via pattern matching without null-dereference errors.[76]
Dangling pointers, in contrast, arise unintentionally when a pointer references memory that is no longer valid or allocated to the program, leading to undefined behavior upon access. These can occur on the stack when a pointer captures the address of a local variable within a function scope, but the function returns and the stack frame is deallocated, invalidating the reference due to the variable's limited lifetime.[77] For example, in C, returning a pointer to a local array from a function creates a dangling pointer, as the memory is reused for subsequent calls, potentially causing data corruption or crashes. On the heap, dangling pointers result from explicit deallocation via free() or delete without nullifying the pointer, leaving it pointing to reclaimed memory that may be reallocated for other purposes.[78] Such heap-based issues persist beyond local scopes, as the memory persists until reused, but accessing it violates program invariants and can lead to security vulnerabilities if exploited.
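A common defensive habit for the heap case is to null a pointer at the moment its target is freed. The following C sketch assumes two hypothetical helpers, `make_counter` and `release_counter`, invented for this illustration; it returns heap storage (which outlives the function, unlike a local variable) and releases it through a pointer-to-pointer so the caller's copy cannot dangle.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate one int on the heap. Heap storage survives the return,
   whereas the address of a local variable would dangle. */
int *make_counter(int initial) {
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = initial;
    }
    return p;
}

/* Free the object and null the caller's pointer in one step. */
void release_counter(int **pp) {
    assert(pp != NULL);
    free(*pp);   /* free(NULL) is defined to do nothing */
    *pp = NULL;  /* the caller's pointer no longer dangles */
}
```

Because `free(NULL)` is a no-op, calling `release_counter` twice on the same variable is harmless. The technique only protects the one pointer that is passed in; any other copies of the address still dangle after the call.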
In environments that reclaim memory by reference counting, back pointers (references from objects back to their referrers) can create cycles that prevent automatic reclamation, retaining objects in memory even though they are unreachable from any root. Pure reference-counting collectors cannot detect these cycles, because the mutual references keep every count above zero, so the memory leaks unless the collector is augmented with a tracing mechanism.[79] In manually managed code, breaking a cycle by freeing one object without clearing the back pointers that refer to it leaves those pointers dangling and exposes the program to invalid accesses. Errors from null and dangling pointers, such as segmentation faults or silent data corruption, are common pitfalls in low-level languages but can be mitigated through disciplined initialization and by clearing pointers when their targets are released.
Indirection and Structural Pointers
Indirection in pointers allows a pointer to reference another pointer, enabling layered access to data through successive dereferences. This concept, known as multiple indirection, is fundamental in languages like C, where a double pointer (e.g., char**) stores the address of a single pointer, which in turn points to a character. To access the underlying data, multiple dereference operations (*) are required, such as **ptr for a double pointer. Multiple indirection facilitates dynamic modifications to pointer values within functions and supports complex data structures like argument lists.[78][80]
A practical example of multiple indirection appears in the main function's command-line arguments in C, where argv is declared as char **argv, forming an array of pointers to strings. Each element argv[i] is a char* pointing to the i-th argument string, allowing the program to access and process variable numbers of string arguments passed at runtime. Triple pointers (e.g., char***) extend this further, often used in scenarios requiring modification of arrays of pointers, such as building dynamic multi-dimensional structures.[81][82]
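A short C sketch can make both uses of a double pointer concrete: walking an argv-style array, and letting a function change a pointer that belongs to its caller. The function names `count_args` and `find_longest` are hypothetical and exist only for this illustration; the sketch relies on the array being terminated by a null pointer, as argv is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the strings in an argv-style array that ends with NULL. */
size_t count_args(char **argv) {
    assert(argv != NULL);
    size_t n = 0;
    while (argv[n] != NULL) {
        n++;
    }
    return n;
}

/* Store the address of the longest string in *out (NULL if the array
   is empty). The char** parameter lets the function modify a char*
   that lives in the caller. */
void find_longest(char **argv, char **out) {
    assert(argv != NULL && out != NULL);
    *out = NULL;
    for (char **p = argv; *p != NULL; p++) {
        if (*out == NULL || strlen(*p) > strlen(*out)) {
            *out = *p;
        }
    }
}
```

Inside `find_longest`, `p` is a `char**`, `*p` is one string, and `**p` would be that string's first character, which mirrors the two levels of dereferencing described above.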
Autorelative pointers, also called self-relative pointers, store offsets relative to the pointer's own location rather than absolute addresses, promoting code relocatability without recompilation. This approach is particularly useful in position-independent code (PIC) and persistent memory systems, where the offset is computed as the difference between the target's address and the pointer's address, enabling seamless relocation across memory mappings. In implementations like those for byte-addressable persistent memory, self-relative pointers avoid the need for pointer rewriting during loading, reducing overhead in virtualized or distributed environments.[83][84]
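The offset computation can be sketched in C as follows. This is a minimal model, not the layout of any particular persistent-memory library: the types `rel_ptr` and `record` and the functions `rel_set` and `rel_get` are hypothetical, the sketch reserves an offset of 0 to mean "null" (so a pointer cannot refer to itself), and it keeps the pointer and its targets inside one object so that the address subtraction is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical self-relative pointer: the byte distance from the
   pointer's own address to its target. Offset 0 stands for null. */
typedef struct {
    ptrdiff_t offset;
} rel_ptr;

/* A relocatable record that contains both the pointer and its targets. */
typedef struct {
    rel_ptr current;
    int     values[4];
} record;

/* Store target as (target address - pointer's own address). */
void rel_set(rel_ptr *rp, const void *target) {
    assert(rp != NULL);
    rp->offset = (target == NULL)
        ? 0
        : (const char *)target - (const char *)rp;
}

/* Recover an absolute address by adding the offset back. */
void *rel_get(rel_ptr *rp) {
    assert(rp != NULL);
    return (rp->offset == 0) ? NULL : (char *)rp + rp->offset;
}
```

If a `record` is copied byte for byte to a new address (for example with `memcpy`), `rel_get` on the copy yields the corresponding element of the copy without any fix-up, which is the relocation property the paragraph describes.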
Based pointers operate by adding an offset to a base address stored in a register, a mechanism prevalent in segmented memory models such as those in the Intel x86 architecture. In this model, memory is divided into segments, each defined by a base register value, and pointers consist of a segment selector plus an offset, allowing efficient addressing within large address spaces while providing protection boundaries. This structure was key in early protected-mode systems, where segment registers hold the base, and offsets enable sparse allocation without contiguous physical memory.[85]
Arrays of pointers provide a flexible way to simulate multi-dimensional arrays, particularly jagged ones where rows vary in length. In C, a 2D array can be represented as an array of pointers (T**), where the first dimension is an array of pointers, each pointing to a separate 1D array for a row. This allocation strategy—first allocating the pointer array, then each row array—saves space for non-rectangular data and allows independent resizing of rows, contrasting with contiguous 2D blocks. Such structures are dynamically allocated using malloc for each level, enabling runtime adaptability in applications like matrix processing or graph representations.[86][87]
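The two-level allocation can be sketched in C as below. The helper names `jagged_alloc` and `jagged_free` are hypothetical, and the sketch assumes at least one row and a nonzero length for every row so that a null return always signals allocation failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate `rows` rows whose lengths come from row_len[]; every
   element starts at zero. Returns NULL if any allocation fails. */
int **jagged_alloc(size_t rows, const size_t *row_len) {
    assert(rows > 0 && row_len != NULL);
    int **table = malloc(rows * sizeof *table);   /* pointer array */
    if (table == NULL) {
        return NULL;
    }
    for (size_t r = 0; r < rows; r++) {
        assert(row_len[r] > 0);
        table[r] = calloc(row_len[r], sizeof **table);  /* one row */
        if (table[r] == NULL) {
            while (r > 0) {          /* undo the rows already made */
                free(table[--r]);
            }
            free(table);
            return NULL;
        }
    }
    return table;
}

/* Release each row, then the pointer array itself. */
void jagged_free(int **table, size_t rows) {
    if (table == NULL) {
        return;
    }
    for (size_t r = 0; r < rows; r++) {
        free(table[r]);
    }
    free(table);
}
```

An element is then reached as `table[r][c]`, which the compiler treats as `*(*(table + r) + c)`: one dereference to find the row and a second to find the element.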
Function and Control Pointers
Function pointers enable indirect invocation of functions by storing their memory addresses, allowing runtime selection and execution of code. In the C programming language, a function pointer is declared using syntax such as int (*fp)(int), where fp points to a function accepting an integer argument and returning an integer.[88] This construct supports callbacks, where a function is passed as an argument to another function for later invocation; for example, the qsort library function accepts a comparison function pointer int (*compar)(const void *, const void *) to customize sorting behavior dynamically.[88] Similarly, signal handling uses function pointers like void (*handler)(int) to register routines executed upon specific events, such as interrupts.[88]
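The qsort callback pattern can be sketched as follows. `qsort` and its comparison signature are part of the standard C library; the comparator names and the wrapper `sort_ints` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback in the form qsort expects: negative, zero, or
   positive depending on how *a orders relative to *b. */
int compare_ascending(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

/* Reverse order by swapping the arguments. */
int compare_descending(const void *a, const void *b) {
    return compare_ascending(b, a);
}

/* Sort with whichever comparison the caller supplies at run time. */
void sort_ints(int *values, size_t count,
               int (*compar)(const void *, const void *)) {
    assert(values != NULL && compar != NULL);
    qsort(values, count, sizeof values[0], compar);
}
```

Calling `sort_ints(v, n, compare_ascending)` or `sort_ints(v, n, compare_descending)` runs the same sorting code with different behavior, because only the address stored in `compar` changes.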
Dynamic loading extends function pointers by allowing code to be loaded and invoked at runtime without recompilation. In systems supporting shared libraries, functions from external modules can be accessed via handles returned by dlopen, with dlsym retrieving the address as a function pointer for immediate use.[89] This mechanism is essential for plugin architectures, where callbacks from loaded libraries enable extensible behavior, such as registering event handlers in graphical user interfaces or extending application functionality.[90]
Control tables, often implemented as arrays of function pointers, facilitate efficient branching in program control flow. Jump tables, for instance, optimize switch statements by mapping case values to indices in an array of pointers, enabling constant-time dispatch via an indirect jump after table lookup.[91] Compilers generate these tables for dense, consecutive case labels to reduce instruction count compared to chained conditional branches.[91] In hardware contexts, interrupt vector tables serve a similar role, storing pointers to interrupt service routines (ISRs) indexed by interrupt numbers; upon hardware interrupt, the processor loads the corresponding pointer into the program counter for immediate execution.[92] This pointer-based dispatch ensures low-latency response in embedded and operating systems.[92]
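A hand-written control table in C might look like the sketch below. It imitates, at source level, the kind of table a compiler can generate for a dense switch statement; the opcode names, the `binary_op` type, and the `dispatch` function are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*binary_op)(int, int);

int op_add(int a, int b) { return a + b; }
int op_sub(int a, int b) { return a - b; }
int op_mul(int a, int b) { return a * b; }

/* Dense, consecutive opcodes double as indices into the table. */
enum { OP_ADD, OP_SUB, OP_MUL, OP_COUNT };

static const binary_op op_table[OP_COUNT] = { op_add, op_sub, op_mul };

/* Look up the handler and call it through the pointer. An opcode
   outside the table sets *ok to 0 instead of jumping to a wild address. */
int dispatch(unsigned opcode, int a, int b, int *ok) {
    assert(ok != NULL);
    if (opcode >= (unsigned)OP_COUNT) {
        *ok = 0;
        return 0;
    }
    *ok = 1;
    return op_table[opcode](a, b);
}
```

The range check before the indexed call matters: without it, an out-of-range opcode would read a pointer from beyond the table and transfer control to an arbitrary address.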
Wild branches arise when function pointers are corrupted, leading to unpredictable control flow transfers that pose significant security risks. Attackers exploiting memory vulnerabilities, such as buffer overflows, can overwrite function pointers to redirect execution to malicious code, bypassing intended program paths.[93] This indirect branch hijacking undermines control-flow integrity, potentially enabling privilege escalation or data exfiltration; for example, altering a callback pointer in a library function could invoke unauthorized routines.[94] Dereferencing uninitialized or invalid function pointers (wild pointers) may further cause crashes or arbitrary code execution, amplifying risks in unsafe languages like C.[95]
In object-oriented programming, back pointers provide references from child objects to their owning parents, facilitating bidirectional navigation in hierarchical structures. These pointers enable efficient traversal, such as querying an object's container for context or updating parent state upon child modifications, without relying solely on forward references.[96] In verification disciplines, back pointers are formalized to ensure aliasing consistency, preventing dangling references during object lifecycle management.[96] This pattern is common in graph-based designs, like scene graphs in graphics systems, where back pointers support operations such as deletion propagation from leaves to roots.
Simulation Methods
Array Index Simulation
In languages without native pointer support, such as Fortran 77, array indices can emulate basic pointer functionality by treating a fixed-size array as a contiguous memory block, where the index serves as an offset from the base address to access elements. This simulates pointer arithmetic, enabling traversal and manipulation of data as if using a pointer to reference specific locations within the array. For instance, to implement a simple linked list, one array can store the node data while a parallel array of integer indices holds the "pointers" to each next node; a value of 0 or -1 typically denotes the end of the list. This approach substitutes direct memory addresses with safe, integer-based offsets, avoiding raw pointer operations.
The EQUIVALENCE statement in Fortran further enhances this simulation by allowing multiple variables or arrays to overlay the same storage, effectively creating aliases that mimic pointer-based reinterpretation of memory. For example, an integer variable and a real variable can be equivalenced to share the same bytes, permitting type punning similar to casting a pointer to a different type. A practical illustration involves overlaying arrays for efficient storage reuse:
      DIMENSION IARRAY(100), RARRAY(50)
      EQUIVALENCE (IARRAY(1), RARRAY(1))
Here IARRAY and RARRAY begin at the same memory location. Because a default INTEGER and a default REAL occupy the same amount of storage, the 50 elements of RARRAY overlay the first 50 elements of IARRAY, and a value stored through one name changes the memory viewed through the other. This technique was commonly used in Fortran 77 for simulating union-like structures or dynamic workspace overlay without native support for such features.[97][98]
Despite these capabilities, array index simulation has significant limitations compared to true pointers. It provides no genuine indirection beyond the array's fixed bounds, restricting dynamic allocation and requiring pre-declared maximum sizes that may lead to waste or overflow risks if exceeded. Complex structures like trees or graphs demand multiple parallel arrays for data and links, complicating management without the flexibility of pointer reassignment or polymorphism. Moreover, unlike pointers, indices cannot easily cross array boundaries or support runtime resizing without recompilation or extensions.[98][99]
A key safety benefit arises from the inherent array nature of this method: indices are subject to bounds checking in many Fortran compilers, which can trap out-of-bounds access at runtime and prevent the memory corruption or segmentation faults common with unchecked pointer arithmetic. For example, compilers like those from Sun or GNU often include options to enable subscript validation against declared array limits, enforcing safer access than raw offsets in pointer-based systems. This reduces overflow vulnerabilities, promoting more reliable code in environments without hardware-level protections.[100]
Higher-Level Abstractions
In higher-level programming paradigms, pointers are often abstracted through mechanisms that simulate their functionality while enhancing safety and usability, particularly in environments where direct memory manipulation is restricted or discouraged. These abstractions, such as references and handles, provide aliasing to objects without exposing raw address arithmetic, thereby reducing risks like null dereferences or dangling pointers.[101]
In C++, references serve as aliases to existing objects, binding directly to the referent without explicit dereferencing syntax. They are safer than raw pointers in that a reference must be bound when it is declared and cannot be reseated afterward, and a well-formed program never produces a null reference. Unlike pointers, which support arithmetic and optional null states, references enforce valid binding at compile time, promoting cleaner code in function parameters and object passing. This design choice aligns with the C++ Core Guidelines, which recommend references over pointers when ownership transfer is not required and a null value is not a meaningful option, as they avoid common errors associated with pointer reassignment or unchecked access.[101][102]
Handles and iterators further abstract pointer-like behavior by wrapping underlying memory access in safer, more generic constructs. In the C++ Standard Template Library (STL), iterators generalize pointers to enable uniform traversal and manipulation across diverse data structures like vectors and lists, supporting operations such as increment, dereference, and comparison without exposing raw addresses. For instance, a random-access iterator like std::vector<T>::iterator behaves akin to a pointer with offset arithmetic, and some implementations add bounds checking in debug builds to catch out-of-range access. Similarly, in Component Object Model (COM) programming on Windows, smart pointer classes like CComPtr encapsulate interface pointers and call AddRef and Release automatically, ensuring proper lifetime management and reducing leaks without manual intervention. These wrappers, derived from base classes that handle COM-specific querying and casting, abstract the complexity of raw IUnknown pointers while maintaining compatibility with legacy APIs.[103]
Proxy objects in languages like Python simulate pointer indirection through object references, which are opaque handles to heap-allocated instances managed by the interpreter's garbage collector, without granting programmers direct access to memory addresses or arithmetic. Every variable in Python holds a reference to an object—essentially a pointer under the hood—but this is abstracted via the data model, where assignments create new bindings rather than copies, mimicking pass-by-reference semantics for mutable types like lists. This approach avoids pointer exposure entirely, as the Python Virtual Machine (PVM) handles all dereferencing and deallocation, preventing common low-level errors while enabling dynamic behaviors like duck typing. For weak references, the weakref module provides non-owning proxies that do not increment the reference count, allowing garbage collection without cycles, further simulating controlled pointer lifetimes.[104][105]
These higher-level abstractions introduce trade-offs between enhanced safety and potential performance overhead, as layers of indirection and runtime checks can increase execution time compared to raw pointers. For example, iterator wrappers in C++ typically compile down to plain pointer operations in optimized builds but add runtime validation in debug modes, while COM smart pointers automate reference counting at the cost of an extra AddRef or Release call whenever a pointer is copied or destroyed. In managed environments like the Java Virtual Machine (JVM) and .NET Common Language Runtime (CLR), object references are opaque handles with automatic memory management, and safety features such as array bounds checking and garbage collection add runtime overhead, balancing security against efficiency in production workloads. The abstraction is deliberately incomplete inside the virtual machine itself: the runtime still uses raw pointers internally for optimization, but hides them from developers to prioritize safety.[102][103]
Language Implementations
Low-Level Languages
In low-level languages such as C and assembly, pointers provide direct manipulation of memory addresses, enabling efficient but error-prone access to data structures. In C, pointers are declared using the asterisk (*) to denote the pointer type, as in int *ptr;, which creates a pointer to an integer.[52] The address-of operator (&) obtains the memory address of a variable, allowing initialization like ptr = &variable;.[52] Dereferencing with * accesses the value at that address, such as *ptr = 42;, which modifies the original variable.[52] The C23 standard (ISO/IEC 9899:2024) introduces the nullptr keyword as a null pointer constant of type nullptr_t, which implicitly converts to any pointer type but not to integers, enhancing type safety compared to NULL.[106]
Arrays in C decay to pointers to their first element, facilitating pointer arithmetic for traversal; for example, int arr[5]; int *p = arr; treats p as pointing to arr[0], and p[2] is equivalent to *(p + 2).[55] Strings are handled as character arrays or pointers to null-terminated sequences, where char *str = "hello"; points to the first character, and the null terminator \0 marks the end.[52] However, C leaves several pointer operations undefined, including dereferencing a null pointer and using pointer arithmetic to form a pointer more than one element past the end of an array, which can lead to unpredictable program crashes or incorrect results.
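Array decay and pointer traversal can be sketched with two small C functions. The names `sum_range` and `string_length` are hypothetical; the second function reimplements what the standard `strlen` already provides, only to show the walk to the null terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Sum `count` ints by advancing a pointer from the first element
   to one past the last, the furthest position C allows. */
int sum_range(const int *first, size_t count) {
    assert(first != NULL);
    int total = 0;
    for (const int *p = first; p != first + count; p++) {
        total += *p;
    }
    return total;
}

/* Length of a null-terminated string, found by walking to '\0'. */
size_t string_length(const char *s) {
    assert(s != NULL);
    const char *end = s;
    while (*end != '\0') {
        end++;
    }
    return (size_t)(end - s);   /* pointer difference = element count */
}
```

Passing an array name such as `arr` to `sum_range` hands over a pointer to `arr[0]`, and `arr + 3` designates the sub-range that starts at the fourth element. The pointer `first + count` is only compared, never dereferenced, which keeps the loop within the defined behavior noted above.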
C++ builds on C's pointer syntax with extensions for object-oriented features, including pointer-to-member operators for accessing class members. A pointer to a data member is declared as int Class::*pm = &Class::member;, and invoked via obj.*pm for objects or ptr->*pm for pointers to objects.[107] Pointers to member functions follow similarly, e.g., void (Class::*pf)() = &Class::func;, enabling dynamic dispatch.[107] Const qualifiers enhance safety: const int *p points to a constant integer (modifiable pointer but immutable target), while int *const p is a constant pointer (immutable address but modifiable target), and const int *const p combines both.[108]
In assembly languages like x86, pointers are implemented through register-based addressing modes, where registers such as %rbp (base pointer) or %rsp (stack pointer) hold memory addresses directly. Instructions like movl (%rbp), %eax load the value at the address in %rbp into %eax, effectively dereferencing a pointer, while offsets enable array access, e.g., movl -4(%rbp), %eax for the element four bytes before the base.[109] Higher-level languages integrate assembly via inline directives; in GCC-extended C/C++, asm("mov %1, %0" : "=r"(dest) : "r"(src)) uses registers for pointer operations, with memory constraints like "m"(*ptr) allowing direct access to C pointers while preserving register states.[110] This low-level control is essential for systems programming but demands careful management to avoid segmentation faults from invalid addresses.
High-Level and Managed Languages
In high-level and managed programming languages, explicit pointers are often absent or heavily restricted to promote memory safety and abstraction from low-level memory management. These languages typically employ automatic garbage collection or ownership models to handle memory allocation and deallocation, reducing the risks associated with direct pointer manipulation such as dangling references or buffer overflows. Instead of raw pointers, they use higher-level constructs like object references or smart pointers that enforce safety invariants at compile time or at runtime. This approach aligns with modern trends toward safer systems programming, where borrow checkers or runtime checks prevent common pointer-related errors without sacrificing performance in most cases.
Java exemplifies this paradigm by eschewing explicit pointers entirely in favor of object references, which are opaque handles managed by the Java Virtual Machine (JVM). All non-primitive data types are accessed via references that point to objects on the heap, with the garbage collector automatically reclaiming memory from unreferenced objects to prevent leaks. This design ensures that developers cannot perform arithmetic on references or access raw memory addresses directly, fostering safer code. However, for interoperability with native code, the Java Native Interface (JNI) allows unsafe access to pointers in C/C++ libraries, where developers must manually manage memory to avoid crashes or security vulnerabilities.[111][112]
In dynamically typed languages like Python and Perl, references serve as automatic, implicit pointers to objects, abstracting away direct memory addressing while still allowing limited introspection. Python treats all objects as referenced values, with the id() built-in function providing a unique integer identifier (in the CPython implementation, the object's memory address) for debugging or identity checks, though it is not intended for pointer-like operations. Perl's references are scalar values that point to other data structures like arrays or hashes, enabling complex data manipulation without exposing raw addresses; they are dereferenced with dedicated operators, and the data they refer to is reclaimed automatically through reference counting. These mechanisms prioritize ease of use over low-level control, with id() or similar functions offering only a peek into underlying addresses without enabling unsafe manipulations.[113][114]
Rust introduces a hybrid model, providing safe alternatives to pointers through its ownership and borrowing system while reserving raw pointers for exceptional cases. Borrowed references (&T) and owned values (Box<T>) are checked by the compiler, whereas raw pointers (*const T and *mut T) can be dereferenced only inside unsafe blocks. Swift likewise exposes the UnsafePointer and UnsafeMutablePointer types for direct memory access, typically in performance-sensitive or C-interfacing code, where bounds checking and ownership transfers help mitigate risks. Even legacy languages like COBOL and PL/I provide limited pointer support: COBOL uses USAGE IS POINTER for data items in procedure calls or dynamic allocation, while PL/I supports pointer variables with arithmetic for based structures, though both emphasize structured programming over unrestricted pointer use. These features reflect a broader shift in high-level languages toward safer abstractions, with unsafe options as deliberate escapes for specialized requirements.[115][116][117]