Pointer (computer programming)
from Wikipedia

I do consider assignment statements and pointer variables to be among computer science's "most valuable treasures."

Donald Knuth, Structured Programming, with go to Statements[1]
A pointer a pointing to the memory address associated with a variable b, i.e., a contains the memory address 1008 of the variable b. In this diagram, the computing architecture uses the same address space and data primitive for both pointers and non-pointers; this need not be the case.

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable depend on the underlying computer architecture.

Using pointers significantly improves performance for repetitive operations, like traversing iterable data structures (e.g. strings, lookup tables, control tables, linked lists, and tree structures). In particular, it is often much cheaper in time and space to copy and dereference pointers than it is to copy and access the data to which the pointers point.

Pointers are also used to hold the addresses of entry points for called subroutines in procedural programming and for run-time linking to dynamic link libraries (DLLs). In object-oriented programming, pointers to functions are used for binding methods, often using virtual method tables.

A pointer is a simple, more concrete implementation of the more abstract reference data type. Several languages, especially low-level languages, support some type of pointer, although some have more restrictions on their use than others. While "pointer" has been used to refer to references in general, it more properly applies to data structures whose interface explicitly allows the pointer to be manipulated as a memory address (for example, through pointer arithmetic), as opposed to a magic cookie or capability which does not allow such manipulation.[citation needed] Because pointers allow both protected and unprotected access to memory addresses, there are risks associated with using them, particularly in the latter case. Primitive pointers are often stored in a format similar to an integer; however, attempting to dereference or "look up" such a pointer whose value is not a valid memory address could cause a program to crash (or to operate on invalid data). To alleviate this potential problem, as a matter of type safety, pointers are considered a separate type parameterized by the type of data they point to, even if the underlying representation is an integer. Other measures may also be taken (such as validation and bounds checking) to verify that the pointer variable contains a value that is both a valid memory address and within the numerical range that the processor is capable of addressing.

History

In 1955, Soviet Ukrainian computer scientist Kateryna Yushchenko created the Address programming language, which made possible indirect addressing and addresses of the highest rank, analogous to pointers. This language was widely used on Soviet computers. However, it was unknown outside the Soviet Union, and Harold Lawson is usually credited with the invention of the pointer, in 1964.[2] In 2000, Lawson was presented the Computer Pioneer Award by the IEEE "[f]or inventing the pointer variable and introducing this concept into PL/I, thus providing for the first time, the capability to flexibly treat linked lists in a general-purpose high-level language".[3] His seminal paper on the concepts, "PL/I List Processing", appeared in the June 1967 issue of CACM. According to the Oxford English Dictionary, the word pointer first appeared in print in the phrase stack pointer, in a technical memorandum by the System Development Corporation.

Formal description

In computer science, a pointer is a kind of reference.

A data primitive (or just primitive) is any datum that can be read from or written to computer memory using one memory access (for instance, both a byte and a word are primitives).

A data aggregate (or just aggregate) is a group of primitives that are logically contiguous in memory and that are viewed collectively as one datum (for instance, an aggregate could be 3 logically contiguous bytes, the values of which represent the 3 coordinates of a point in space). When an aggregate is entirely composed of the same type of primitive, the aggregate may be called an array; in a sense, a multi-byte word primitive is an array of bytes, and some programs use words in this way.

In the context of these definitions, a byte is the smallest primitive; each memory address specifies a different byte. The memory address of the initial byte of a datum is considered the memory address (or base memory address) of the entire datum.

A memory pointer (or just pointer) is a primitive, the value of which is intended to be used as a memory address; it is said that a pointer points to a memory address. It is also said that a pointer points to a datum [in memory] when the pointer's value is the datum's memory address.

More generally, a pointer is a kind of reference, and it is said that a pointer references a datum stored somewhere in memory; to obtain that datum is to dereference the pointer. The feature that separates pointers from other kinds of reference is that a pointer's value is meant to be interpreted as a memory address, which is a rather low-level concept.

References serve as a level of indirection: A pointer's value determines which memory address (that is, which datum) is to be used in a calculation. Because indirection is a fundamental aspect of algorithms, pointers are often expressed as a fundamental data type in programming languages; in statically (or strongly) typed programming languages, the type of a pointer determines the type of the datum to which the pointer points.

Architectural roots

Pointers are a very thin abstraction on top of the addressing capabilities provided by most modern architectures. In the simplest scheme, an address, or a numeric index, is assigned to each unit of memory in the system, where the unit is typically either a byte or a word – depending on whether the architecture is byte-addressable or word-addressable – effectively transforming all of memory into a very large array. The system would then also provide an operation to retrieve the value stored in the memory unit at a given address (usually utilizing the machine's general-purpose registers).

In the usual case, a pointer is large enough to hold more addresses than there are units of memory in the system. This introduces the possibility that a program may attempt to access an address which corresponds to no unit of memory, either because not enough memory is installed (i.e. beyond the range of available memory) or because the architecture does not support such addresses. The first case may, on certain platforms such as the Intel x86 architecture, be called a segmentation fault (segfault). The second case is possible in common implementations of AMD64, where pointers are 64 bits long but virtual addresses extend to only 48 bits. Pointers must conform to certain rules (canonical addresses), so if a non-canonical pointer is dereferenced, the processor raises a general protection fault.
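The canonical-address rule can be sketched in C. On AMD64 with 48-bit virtual addresses, an address is canonical when bits 63 through 47 are all copies of bit 47, so they are either all zeros or all ones. The helper name is_canonical48 is hypothetical. The sketch assumes 48-bit addressing: processors using 5-level paging move the boundary to 57 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, assuming 48-bit virtual addresses: an address is
 * canonical when bits 63..47 (17 bits) are all zero or all one, i.e. the
 * upper bits are a sign extension of bit 47. */
bool is_canonical48(uint64_t addr) {
    uint64_t upper = addr >> 47;            /* isolate bits 63..47 */
    return upper == 0 || upper == 0x1FFFF;  /* all zeros or all ones */
}
```

For example, 0x00007FFFFFFFFFFF and 0xFFFF800000000000 are accepted, while 0x0000800000000000 is rejected.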

On the other hand, some systems have more units of memory than there are addresses. In this case, a more complex scheme such as memory segmentation or paging is employed to use different parts of the memory at different times. Later 32-bit x86 processors support up to 36 bits of physical memory addresses, which are mapped to the 32-bit linear address space through the PAE paging mechanism. Thus, only 1/16 of the possible total memory (2^32 of 2^36 bytes) may be accessed at a time. Another example in the same computer family was the 16-bit protected mode of the 80286 processor, which, though supporting only 16 MB of physical memory, could access up to 1 GB of virtual memory, but the combination of 16-bit address and segment registers made accessing more than 64 KB in one data structure cumbersome.

In order to provide a consistent interface, some architectures provide memory-mapped I/O, which allows some addresses to refer to units of memory while others refer to device registers of other devices in the computer. There are analogous concepts such as file offsets, array indices, and remote object references that serve some of the same purposes as addresses for other types of objects.

Uses

Pointers are directly supported without restrictions in languages such as PL/I, C, C++, Pascal, FreeBASIC, and implicitly in most assembly languages. They are used mainly to construct references, which in turn are fundamental to construct nearly all data structures, and to pass data between different parts of a program.

In functional programming languages that rely heavily on lists, data references are managed abstractly by using primitive constructs like cons and the corresponding elements car and cdr, which can be thought of as specialised pointers to the first and second components of a cons-cell. This gives rise to some of the idiomatic "flavour" of functional programming. By structuring data in such cons-lists, these languages facilitate recursive means for building and processing data—for example, by recursively accessing the head and tail elements of lists of lists; e.g. "taking the car of the cdr of the cdr". By contrast, memory management based on pointer dereferencing in some approximation of an array of memory addresses facilitates treating variables as slots into which data can be assigned imperatively.

When dealing with arrays, the critical lookup operation typically involves a stage called address calculation which involves constructing a pointer to the desired data element in the array. In other data structures, such as linked lists, pointers are used as references to explicitly tie one piece of the structure to another.
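Address calculation can be sketched in C, assuming fixed-size elements stored contiguously. The address of element i is the base address plus i times the element size. The function name element_address is hypothetical. C performs this same scaling automatically when an index is applied to a typed pointer.

```c
#include <stddef.h>

/* Hypothetical helper: returns the address of element `index` in an array
 * whose first element is at `base` and whose elements are `elem_size`
 * bytes wide. Arithmetic is done on unsigned char pointers so that the
 * offset is counted in bytes. */
void *element_address(void *base, size_t index, size_t elem_size) {
    return (unsigned char *)base + index * elem_size;
}
```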

Pointers are used to pass parameters by reference. This is useful if the programmer wants a function's modifications to a parameter to be visible to the function's caller. This is also useful for returning multiple values from a function.
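Returning multiple values can be illustrated with the hypothetical function divide below. It reports both a quotient and a remainder by writing through two pointer parameters supplied by the caller. This is a sketch that assumes the divisor is non-zero.

```c
/* Hypothetical example: "returns" two results by storing them through
 * pointers owned by the caller. The divisor must be non-zero. */
void divide(int dividend, int divisor, int *quotient, int *remainder) {
    *quotient = dividend / divisor;   /* written into the caller's variable */
    *remainder = dividend % divisor;  /* second result, same mechanism */
}
```

Calling divide(17, 5, &q, &r) leaves 3 in q and 2 in r.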

Pointers can also be used to allocate and deallocate dynamic variables and arrays in memory. Since a variable will often become redundant after it has served its purpose, it is a waste of memory to keep it, and therefore it is good practice to deallocate it (using the original pointer reference) when it is no longer needed. Failure to do so may result in a memory leak (where available free memory gradually, or in severe cases rapidly, diminishes because of an accumulation of numerous redundant memory blocks).

C pointers

In C, the basic syntax to define a pointer is:[4]

// both declarations are considered valid:
int *ptr;
int* ptr;

This declares a variable ptr that stores a pointer to an object of type int. Other types can be used in place of int; for example, bool *ptr would declare a pointer to an object of type bool.

Because the C language does not specify an implicit initialization for objects of automatic storage duration,[5] pointer variables can sometimes point to unexpected locations, causing undefined behavior. To combat this, pointers are sometimes initialized with a null pointer value, represented in C by the NULL macro;[6] in C23 and later, nullptr is also available as an alternative.[7] nullptr is type-safe and has type nullptr_t, unlike NULL, which expands to an implementation-defined null pointer constant such as 0 or (void*)0.[8]

int *ptr = NULL;    // valid in all versions of C
int *ptr = nullptr; // alternative form, since C23

Dereferencing a null pointer produces undefined behavior,[9] which can result in unpredictable bugs and results.

After a pointer has been declared, it can be assigned an address. In C, the address of a variable can be retrieved with the & unary operator:

// Declares the pointer variable
int *ptr = NULL;
// Creates a variable
int a = 5;
// Assigns the address of a to the pointer variable
ptr = &a;

Additionally, to dereference the pointer, the unary asterisk (*) operator can be used. This allows a value to be assigned to a through ptr, even from code in which a itself is not in scope.

*ptr = 8;

If a is accessed later, its new value will be 8.

This example may be clearer if memory is examined directly. Assume that a is located at address 0x8130 in memory and ptr at 0x8134, and that this is a 32-bit machine on which an int is 32 bits wide. After the following code snippet is executed, memory contains:

int a = 5;
int *ptr = NULL;
Address Contents
0x8130 0x00000005
0x8134 0x00000000

(The null pointer shown here is 0x00000000.) By assigning the address of a to ptr:

ptr = &a;

yields the following memory values:

Address Contents
0x8130 0x00000005
0x8134 0x00008130

Then by dereferencing ptr by coding:

*ptr = 8;

the computer will take the contents of ptr (which is 0x8130), 'locate' that address, and assign 8 to that location yielding the following memory:

Address Contents
0x8130 0x00000008
0x8134 0x00008130

Clearly, accessing a will yield the value of 8 because the previous instruction modified the contents of a by way of the pointer ptr.

Use in data structures

When setting up data structures like lists, queues and trees, it is necessary to have pointers to help manage how the structure is implemented and controlled. Typical examples of pointers are start pointers, end pointers, and stack pointers. These pointers can either be absolute (the actual physical address or a virtual address in virtual memory) or relative (an offset from an absolute start address ("base") that typically uses fewer bits than a full address, but will usually require one additional arithmetic operation to resolve).

Relative addresses are a form of manual memory segmentation, and share many of its advantages and disadvantages. A two-byte offset, containing a 16-bit unsigned integer, can be used to provide relative addressing for up to 64 KiB (2^16 bytes) of a data structure. This can easily be extended to 128, 256 or 512 KiB if the address pointed to is forced to be aligned on a half-word, word or double-word boundary. The cost is an additional "shift left" bitwise operation (by 1, 2 or 3 bits) to scale the offset by a factor of 2, 4 or 8 before it is added to the base address. Generally, though, such schemes are cumbersome, and for the programmer's convenience absolute addresses (and, underlying them, a flat address space) are preferred.
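The word-aligned variant can be sketched in C. This assumes a scheme in which each stored offset is a 16-bit count of 4-byte words from a base address. The shift by 2 turns words into bytes, so 65536 × 4 bytes = 256 KiB are reachable. The names OFFSET_SHIFT, make_relative and resolve_relative are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed scheme: offsets count 4-byte words, so shift by 2 (x4). */
#define OFFSET_SHIFT 2

/* Resolve a 16-bit relative offset into an absolute pointer:
 * one extra shift plus one addition to the base address. */
void *resolve_relative(void *base, uint16_t offset) {
    return (unsigned char *)base + ((uint32_t)offset << OFFSET_SHIFT);
}

/* Encode `target` as a word offset from `base`. Assumes target lies at
 * or after base, is 4-byte aligned relative to it, and lies within
 * 256 KiB; none of this is checked here. */
uint16_t make_relative(void *base, void *target) {
    ptrdiff_t delta = (unsigned char *)target - (unsigned char *)base;
    return (uint16_t)(delta >> OFFSET_SHIFT);
}
```

In this sketch, element 37 of a uint32_t array lies 148 bytes from the array's base, and 148 bytes encode as the 16-bit offset 37.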

A one-byte offset, such as the hexadecimal ASCII value of a character (e.g. X'29'), can be used to point to an alternative integer value (or index) in an array (e.g. X'01'). In this way, characters can be very efficiently translated from 'raw data' to a usable sequential index and then to an absolute address without a lookup table.

C arrays

In C, array indexing is formally defined in terms of pointer arithmetic; that is, the language specification requires that a[i] be equivalent to *(a + i).[10] Thus in C, arrays can be thought of as pointers to consecutive areas of memory (with no gaps),[10] and the syntax for accessing arrays is identical to that which can be used to dereference pointers. For example, an array a can be declared and used in the following manner:

int a[5]; // Declares 5 contiguous integers
int *ptr = a; // Arrays can be used as pointers
ptr[0] = 1; // Pointers can be indexed with array syntax
*(a + 1) = 2; // Arrays can be dereferenced with pointer syntax
*(1 + a) = 2; // Pointer addition is commutative
2[a] = 4; // Subscript operator is commutative (perhaps unusual)

This allocates a block of five integers and names the block a, which acts as a pointer to the block. Another common use of pointers is to point to dynamically allocated memory from malloc, which returns a consecutive block of memory of no less than the requested size that can be used as an array.

While most operators on arrays and pointers are equivalent, the result of the sizeof operator differs. In this example, sizeof(a) will evaluate to 5 * sizeof(int) (the size of the array), while sizeof(ptr) will evaluate to sizeof(int*), the size of the pointer itself.
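This difference can be sketched in C. It is also why a common element-count idiom works only on true arrays. The macro name ARRAY_LEN is illustrative, not part of the standard library.

```c
#include <stddef.h>

/* Element count of a true array. Applied to a pointer it would give a
 * wrong answer, because sizeof would then measure the pointer itself. */
#define ARRAY_LEN(x) (sizeof(x) / sizeof((x)[0]))

int a[5];      /* five contiguous ints: sizeof(a) == 5 * sizeof(int) */
int *ptr = a;  /* points to a[0]:      sizeof(ptr) == sizeof(int *)  */
```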

An array can be given initial values when it is declared:

int a[5] = {2, 4, 3, 1, 5};

If the array is located in memory starting at address 0x1000 on a 32-bit little-endian machine, memory will contain the following (values are in hexadecimal, like the addresses; each column heading is a byte offset within the 4-byte word):

Address  +0  +1  +2  +3
1000     02  00  00  00
1004     04  00  00  00
1008     03  00  00  00
100C     01  00  00  00
1010     05  00  00  00

Represented here are five integers: 2, 4, 3, 1, and 5. These five integers occupy 32 bits (4 bytes) each with the least-significant byte stored first (this is a little-endian CPU architecture) and are stored consecutively starting at address 0x1000.

In terms of C pointer syntax:

  • a means 0x1000;
  • a + 1 means 0x1004: the "+ 1" means to add the size of 1 int, which is 4 bytes;
  • *a means to dereference the contents of a: considering the contents as a memory address (0x1000), look up the value at that location (2);
  • a[i] means element number i, 0-based, of a, which is translated into *(a + i).

The last example is how to access the contents of a. Breaking it down:

  • a + i is the memory location of the i-th element of a, starting at i = 0;
  • *(a + i) takes that memory address and dereferences it to access the value.
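The same arithmetic lets code walk an array with a pointer instead of an index. In the sketch below, each ++p advances p by the size of one int, to the next element. The function name sum_with_pointer is hypothetical.

```c
#include <stddef.h>

/* Sums n ints by advancing a pointer rather than indexing. a + n is the
 * "one past the end" address, which may be formed but not dereferenced. */
int sum_with_pointer(const int *a, size_t n) {
    int total = 0;
    for (const int *p = a; p != a + n; ++p) {
        total += *p;  /* dereference: read the element p points to */
    }
    return total;
}
```

For the array {2, 4, 3, 1, 5} above, sum_with_pointer(a, 5) visits 0x1000, 0x1004, and so on up to 0x1010, and returns 15.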

C linked list

Below is an example definition of a linked list in C.

/* the empty linked list is represented by NULL
 * or some other sentinel value */
#define EMPTY_LIST NULL

struct Link {
    void *data; // data of this link
    struct Link *next; // next link; EMPTY_LIST if there is none
};

This pointer-recursive definition is essentially the same as the reference-recursive definition from the language Haskell:

 data Link a = Nil
             | Cons a (Link a)

Nil is the empty list, and Cons a (Link a) is a cons cell of type a with another link also of type a.

The definition with references, however, is type-checked and does not use potentially confusing signal values. For this reason, data structures in C are usually dealt with via wrapper functions, which are carefully checked for correctness.
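The wrapper-function approach might look like the following minimal sketch. The list_push, list_length and list_free functions are hypothetical. They confine the NULL sentinel handling to a few checked places, so callers do not manipulate next pointers directly.

```c
#include <stddef.h>
#include <stdlib.h>

#define EMPTY_LIST NULL

struct Link {
    void *data;         /* data of this link */
    struct Link *next;  /* next link; EMPTY_LIST if there is none */
};

/* Prepends `data` to `list` and returns the new head, or NULL if the
 * allocation fails (the original list is left unchanged in that case). */
struct Link *list_push(struct Link *list, void *data) {
    struct Link *link = malloc(sizeof *link);
    if (link == NULL) {
        return NULL;
    }
    link->data = data;
    link->next = list;
    return link;
}

/* Counts links by following next pointers until the sentinel. */
size_t list_length(const struct Link *list) {
    size_t n = 0;
    for (; list != EMPTY_LIST; list = list->next) {
        n++;
    }
    return n;
}

/* Frees every link (but not the data the links refer to). */
void list_free(struct Link *list) {
    while (list != EMPTY_LIST) {
        struct Link *next = list->next;
        free(list);
        list = next;
    }
}
```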

Pass-by-address using pointers

Pointers can be used to pass variables by their address, allowing their value to be changed. For example, consider the following C code:

// a copy of the int n can be changed within the function without affecting the calling code
void passByValue(int n) {
    n = 12;
}

// a pointer m is passed instead. No copy of the value pointed to by m is created
void passByAddress(int *m) {
    *m = 14;
}

int main(void) {
    int x = 3;

    // pass a copy of x's value as the argument
    passByValue(x);
    // the value was changed inside the function, but x is still 3 from here on

    // pass x's address as the argument
    passByAddress(&x);
    // x was actually changed by the function and is now equal to 14 here

    return 0;
}

Dynamic memory allocation

In some programs, the required amount of memory depends on what the user may enter. In such cases the programmer needs to allocate memory dynamically. This is done by allocating memory on the heap rather than on the stack, where variables are usually stored (although variables can also be stored in CPU registers). Dynamically allocated memory can only be accessed through pointers; unlike ordinary variables, it cannot be given a name.

Pointers are used to store and manage the addresses of dynamically allocated blocks of memory. Such blocks are used to store data objects or arrays of objects. Most structured and object-oriented languages provide an area of memory, called the heap or free store, from which objects are dynamically allocated.

The example C code below illustrates how structure objects are dynamically allocated and referenced. The standard C library provides the function malloc() for allocating memory blocks from the heap. It takes the size of an object to allocate as a parameter and returns a pointer to a newly allocated block of memory suitable for storing the object, or it returns a null pointer if the allocation failed.

#include <stdlib.h> // malloc, free
#include <string.h> // memset, strlen, strcpy

// Parts inventory item
typedef struct {
    int id; // Part number
    char *name; // Part name
    float cost; // Cost
} Item;

// Allocate and initialize a new Item object
Item *makeItem(const char *name) {
    Item *item;

    // Allocate a block of memory for a new Item object
    item = (Item *)malloc(sizeof(Item));
    if (!item) {
        return NULL;
    }

    // Initialize the members of the new Item
    memset(item, 0, sizeof(Item));
    item->id = -1;
    item->name = NULL;
    item->cost = 0.0;

    // Save a copy of the name in the new Item 
    item->name = (char *)malloc(strlen(name) + 1);
    if (!item->name) {
        free(item);
        return NULL;
    }
    strcpy(item->name, name);

    // Return the newly created Item object
    return item;
}

The code below illustrates how memory objects are dynamically deallocated, i.e., returned to the heap or free store. The standard C library provides the function free() for deallocating a previously allocated memory block and returning it back to the heap.

// Deallocate an Item object
void destroyItem(Item *item) {
    // Check for a null object pointer
    if (!item) {
        return;
    }

    // Deallocate the name string saved within the Item
    if (item->name) {
        free(item->name);
        item->name = NULL;
    }

    // Deallocate the Item object itself
    free(item);
}

Memory-mapped hardware

On some computing architectures, pointers can be used to directly manipulate memory or memory-mapped devices.

Assigning addresses to pointers is an invaluable tool when programming microcontrollers. Below is a simple example declaring a pointer of type int and initialising it to a hexadecimal address, in this example the constant 0x7FFF:

int *hardware_address = (int *)0x7FFF;

In the mid-1980s, using the BIOS to access the video capabilities of PCs was slow. Display-intensive applications typically accessed CGA video memory directly by casting the hexadecimal constant 0xB8000 to a pointer to an array of 80 unsigned 16-bit int values. Each value consisted of an ASCII code in the low byte and a colour in the high byte. Thus, to put the letter 'A' at row 5, column 2 in bright white on blue, one would write code like the following:

#define VID ((unsigned short (*)[80])0xB8000)

void foo(void) {
    VID[4][1] = 0x1F00 | 'A';
}
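The same cell layout can be exercised on an ordinary hosted system. The sketch below stands in for the video buffer with a plain array, so no real hardware address is touched. The 80 × 25 size, the name screen and the helper put_char are assumptions for illustration. On the original hardware, the buffer would instead be the memory at 0xB8000.

```c
#include <stdint.h>

#define COLS 80
#define ROWS 25  /* assumed 80x25 text mode */

/* Simulated text buffer: an ordinary array in place of 0xB8000. */
static uint16_t screen[ROWS][COLS];

/* Stores character `ch` with colour attribute `attr` at a 0-based row
 * and column: the character in the low byte, the colour in the high byte. */
void put_char(uint16_t (*buf)[COLS], int row, int col, char ch, uint8_t attr) {
    buf[row][col] = (uint16_t)((attr << 8) | (unsigned char)ch);
}
```

Here put_char(screen, 4, 1, 'A', 0x1F) stores 0x1F41, matching 0x1F00 | 'A' in the example above.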

Use in control tables

Control tables that are used to control program flow usually make extensive use of pointers. The pointers, usually embedded in a table entry, may, for instance, be used to hold the entry points to subroutines to be executed, based on certain conditions defined in the same table entry. The pointers can however be simply indexes to other separate, but associated, tables comprising an array of the actual addresses or the addresses themselves (depending upon the programming language constructs available). They can also be used to point to earlier table entries (as in loop processing) or forward to skip some table entries (as in a switch or "early" exit from a loop). For this latter purpose, the "pointer" may simply be the table entry number itself and can be transformed into an actual address by simple arithmetic.
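A minimal control table in C pairs a condition with a function pointer to the subroutine that should run when the condition matches. The command characters, the action functions and dispatch are all hypothetical. Real control tables may hold indexes or addresses instead, as described above.

```c
#include <stddef.h>

/* Hypothetical subroutines that table entries can point to. */
static int action_add(int x)    { return x + 1; }
static int action_double(int x) { return x * 2; }
static int action_negate(int x) { return -x; }

/* Each entry holds a condition and the entry point to call for it. */
struct ControlEntry {
    char command;
    int (*action)(int);
};

static const struct ControlEntry control_table[] = {
    { '+', action_add },
    { '*', action_double },
    { '-', action_negate },
};

/* Scans the table and calls the matching subroutine through its
 * pointer; returns x unchanged if no entry matches. */
int dispatch(char command, int x) {
    size_t n = sizeof control_table / sizeof control_table[0];
    for (size_t i = 0; i < n; i++) {
        if (control_table[i].command == command) {
            return control_table[i].action(x);
        }
    }
    return x;
}
```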

Typed pointers and casting

In many languages, pointers have the additional restriction that the object they point to has a specific type. For example, a pointer may be declared to point to an integer; the language will then attempt to prevent the programmer from pointing it to objects which are not integers, such as floating-point numbers, eliminating some errors.

For example, in the following C code:

int *money;
char *bags;

money would be an integer pointer and bags would be a char pointer.

The following would yield a compiler warning of "assignment from incompatible pointer type" under GCC:

bags = money;

because money and bags were declared with different types.

To suppress the compiler warning, the conversion must be made explicit with a typecast:

bags = (char *)money;

which says to cast the integer pointer of money to a char pointer and assign to bags.

A 2005 draft of the C standard specifies that converting a pointer to one object type into a pointer to a different object type results in undefined behavior if the resulting pointer is not correctly aligned for the new type (6.3.2.3 Pointers, par. 7):[11]

char *external_buffer = "abcdef";
int *internal_data;

internal_data = (int *)external_buffer;  
// UNDEFINED BEHAVIOUR if "the resulting pointer is not correctly aligned"
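A common way to avoid the alignment problem is to copy the bytes into a properly aligned object with memcpy instead of casting the char pointer. The sketch below shows this approach. The helper name read_int is hypothetical, and the value read still depends on the machine's byte order.

```c
#include <string.h>

/* Reads an int from the bytes starting at `bytes` without assuming any
 * alignment: memcpy copies into `value`, which is correctly aligned. */
int read_int(const char *bytes) {
    int value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```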

In languages that allow pointer arithmetic, arithmetic on pointers takes into account the size of the type. For example, adding an integer number to a pointer produces another pointer that points to an address that is higher by that number times the size of the type. This allows us to easily compute the address of elements of an array of a given type, as was shown in the C arrays example above. When a pointer of one type is cast to another type of a different size, the programmer should expect that pointer arithmetic will be calculated differently. In C, for example, if the money array starts at 0x2000 and sizeof(int) is 4 bytes whereas sizeof(char) is 1 byte, then money + 1 will point to 0x2004, but bags + 1 would point to 0x2001. Other risks of casting include loss of data when "wide" data is written to "narrow" locations (e.g. bags[0] = 65537;), unexpected results when bit-shifting values, and comparison problems, especially with signed vs unsigned values.
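The different step sizes can be checked directly. The sketch below reuses the names money and bags from the example above and measures, in bytes, how far "+ 1" moves each pointer. The function names are hypothetical.

```c
#include <stddef.h>

/* Bytes between money and money + 1 for an int pointer: sizeof(int). */
ptrdiff_t int_step(int *money) {
    return (char *)(money + 1) - (char *)money;
}

/* The same address viewed through a char pointer advances by
 * sizeof(char), which is always 1. */
ptrdiff_t char_step(int *money) {
    char *bags = (char *)money;
    return (bags + 1) - bags;
}
```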

Although it is impossible in general to determine at compile-time which casts are safe, some languages store run-time type information which can be used to confirm that these dangerous casts are valid at runtime. Other languages merely accept a conservative approximation of safe casts, or none at all.

Value of pointers

In C and C++, even if two pointers compare as equal, that does not mean they are equivalent. In these languages and in LLVM, the rule is interpreted to mean that "just because two pointers point to the same address, does not mean they are equal in the sense that they can be used interchangeably"; the property that distinguishes such pointers is referred to as their provenance.[12] Casting to an integer type such as uintptr_t is implementation-defined, and the comparison it provides gives no further insight into whether the two pointers are interchangeable. In addition, further conversion to bytes and arithmetic will throw off optimizers trying to keep track of the use of pointers, a problem still being elucidated in academic research.[13]

Making pointers safer

As a pointer allows a program to attempt to access an object that may not be defined, pointers can be the origin of a variety of programming errors. However, the usefulness of pointers is so great that it can be difficult to perform programming tasks without them. Consequently, many languages have created constructs designed to provide some of the useful features of pointers without some of their pitfalls, also sometimes referred to as pointer hazards. In this context, pointers that directly address memory (as used in this article) are referred to as raw pointers, by contrast with smart pointers or other variants.

One major problem with pointers is that as long as they can be directly manipulated as a number, they can be made to point to unused addresses or to data which is being used for other purposes. Many languages, including most functional programming languages and recent imperative programming languages like Java, replace pointers with a more opaque type of reference, typically referred to as simply a reference, which can only be used to refer to objects and not manipulated as numbers, preventing this type of error. Array indexing is handled as a special case.

A pointer which does not have any address assigned to it is called a wild pointer. Any attempt to use such uninitialized pointers can cause unexpected behavior, either because the initial value is not a valid address, or because using it may damage other parts of the program. The result is often a segmentation fault, storage violation or wild branch (if used as a function pointer or branch address).

In systems with explicit memory allocation, it is possible to create a dangling pointer by deallocating the memory region it points into. This type of pointer is dangerous and subtle because a deallocated memory region may contain the same data as it did before it was deallocated but may be then reallocated and overwritten by unrelated code, unknown to the earlier code. Languages with garbage collection prevent this type of error because deallocation is performed automatically when there are no more references in scope.
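One partial mitigation in C is to set a pointer to NULL immediately after freeing it. Then a later accidental use through that variable becomes a null dereference, which is usually easier to detect than a silent use of freed memory. The macro name FREE_AND_NULL is hypothetical. It only resets the one variable passed to it: other copies of the same pointer still dangle.

```c
#include <stdlib.h>

/* Hypothetical macro: frees the block and clears the variable that
 * referred to it. Calling it twice is harmless, since free(NULL) does
 * nothing. Other copies of the pointer are NOT affected. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)
```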

Some languages, like C++, support smart pointers, some of which (such as std::shared_ptr) use a simple form of reference counting to help track allocation of dynamic memory in addition to acting as a reference. In the absence of reference cycles, where an object refers to itself indirectly through a sequence of smart pointers, these eliminate the possibility of dangling pointers and memory leaks. Delphi strings support reference counting natively.

The Rust programming language introduces a borrow checker, pointer lifetimes, and an optimisation based around option types for null pointers to eliminate pointer bugs, without resorting to garbage collection.

Special kinds of pointers

Kinds defined by value

Null pointer

A null pointer has a value reserved for indicating that the pointer does not refer to a valid object. Null pointers are routinely used to represent conditions such as the end of a list of unknown length or the failure to perform some action; this use of null pointers can be compared to nullable types and to the Nothing value in an option type.
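Both uses appear in the following sketch. A null next pointer marks the end of a list of unknown length, and a null return value reports that a search failed. The struct Node type and the find function are hypothetical.

```c
#include <stddef.h>

struct Node {
    int value;
    struct Node *next;  /* NULL marks the end of the list */
};

/* Returns the first node holding `value`, or a null pointer to report
 * that no such node exists. */
struct Node *find(struct Node *head, int value) {
    for (struct Node *n = head; n != NULL; n = n->next) {
        if (n->value == value) {
            return n;
        }
    }
    return NULL;
}
```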

Dangling pointer

A dangling pointer is a pointer that does not point to a valid object and consequently may make a program crash or behave oddly. In the Pascal or C programming languages, pointers that are not specifically initialized may point to unpredictable addresses in memory.

The following example code shows a dangling pointer:

void func(void) {
    char *p1 = (char *)malloc(sizeof(char)); // address of a heap block whose contents are undefined
    char *p2; // dangling (uninitialized) pointer
    *p1 = 'a'; // This is OK, assuming malloc() has not returned NULL.
    *p2 = 'b'; // This invokes undefined behavior
    free(p1); // release the heap block to avoid a leak
}

Here, p2 may point to anywhere in memory, so performing the assignment *p2 = 'b'; can corrupt an unknown area of memory or trigger a segmentation fault.

Wild branch

Where a pointer is used as the address of the entry point to a program or the start of a function which doesn't return anything, and is also either uninitialized or corrupted, a "wild branch" is said to have occurred if a call or jump is nevertheless made to this address. In other words, a wild branch is a call or jump through a function pointer that is wild (dangling).

The consequences are usually unpredictable and the error may present itself in several different ways depending upon whether or not the pointer is a "valid" address and whether or not there is (coincidentally) a valid instruction (opcode) at that address. The detection of a wild branch can present one of the most difficult and frustrating debugging exercises since much of the evidence may already have been destroyed beforehand or by execution of one or more inappropriate instructions at the branch location. If available, an instruction set simulator can usually not only detect a wild branch before it takes effect, but also provide a complete or partial trace of its history.

Kinds defined by structure

Autorelative pointer

An autorelative pointer is a pointer whose value is interpreted as an offset from the address of the pointer itself; thus, if a data structure has an autorelative pointer member that points to some portion of the data structure itself, then the data structure may be relocated in memory without having to update the value of the autorelative pointer.[14]

The cited patent also uses the term self-relative pointer to mean the same thing. However, the meaning of that term has been used in other ways:

  • to mean an offset from the address of a structure rather than from the address of the pointer itself;[citation needed]
  • to mean a pointer containing its own address, which can be useful for reconstructing in any arbitrary region of memory a collection of data structures that point to each other.[15]
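The autorelative idea can be sketched in C, assuming the offset is stored as a `ptrdiff_t` counted in bytes from the address of the pointer member itself. The `struct record` layout and the helpers `rel_set`, `rel_get` and `relocate` are hypothetical names for illustration. Because only the offset is stored, a byte-for-byte copy of the record to a new address still resolves to the corresponding byte inside the copy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: `rel` is an autorelative pointer, i.e. the offset
   of its target measured from the address of `rel` itself. */
struct record {
    ptrdiff_t rel;
    char payload[16];
};

/* Point rel at target, which must lie inside the same record. */
static void rel_set(struct record *r, char *target) {
    r->rel = target - (char *)&r->rel;
}

/* Resolve rel by adding the offset to rel's own address. */
static char *rel_get(struct record *r) {
    return (char *)&r->rel + r->rel;
}

/* Move a record to a new location byte for byte; no fix-up is needed. */
static void relocate(struct record *dst, const struct record *src) {
    memcpy(dst, src, sizeof *dst);
}
```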

Based pointer

A based pointer is a pointer whose value is an offset from the value of another pointer. This can be used to store and load blocks of data, assigning the address of the beginning of the block to the base pointer.[16]
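A minimal C sketch of based pointers, assuming a linked list stored inside one block of memory where each `next` field holds the byte offset of the following node from the start (base) of the block. The `struct bnode` type, the `NIL` sentinel and the `resolve` and `based_sum` functions are hypothetical. Since no absolute addresses are stored, the whole block can be copied elsewhere and traversed with the new base.

```c
#include <stddef.h>

/* Hypothetical "null" value for a based pointer. */
#define NIL ((size_t)-1)

/* A list node stored inside a block of memory. `next` is a based pointer:
   the byte offset of the next node from the start (base) of the block. */
struct bnode {
    size_t next;
    int value;
};

/* Turn a based pointer into an ordinary pointer by adding it to the base. */
static const struct bnode *resolve(const void *base, size_t offset) {
    return (const struct bnode *)((const char *)base + offset);
}

/* Sum the values of the list that starts at byte offset `head`. */
int based_sum(const void *base, size_t head) {
    int total = 0;
    for (size_t off = head; off != NIL; off = resolve(base, off)->next) {
        total += resolve(base, off)->value;
    }
    return total;
}
```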

Kinds defined by use or datatype

Multiple indirection

In some languages, a pointer can reference another pointer, requiring multiple dereference operations to get to the original value. While each level of indirection may add a performance cost, it is sometimes necessary in order to provide correct behavior for complex data structures. For example, in C it is typical to define a linked list in terms of an element that contains a pointer to the next element of the list:

typedef struct Element {
    struct Element *next;
    int value;
} Element;

Element *head = NULL;

This implementation uses a pointer to the first element in the list as a surrogate for the entire list. If a new value is added to the beginning of the list, head has to be changed to point to the new element. Since C arguments are always passed by value, using double indirection allows the insertion to be implemented correctly, and has the desirable side-effect of eliminating special case code to deal with insertions at the front of the list:

// Given a sorted list at *head, insert the element item at the first
// location where all earlier elements have lesser or equal value.
void insert(Element **head, Element *item) {
    // p points to a pointer to an element
    Element **p = head;
    while (*p && (*p)->value < item->value) {
        p = &(*p)->next;
    }
    item->next = *p;
    *p = item;
}

// Caller does this:
insert(&head, item);

In this case, if the value of item is less than that of head, the caller's head is properly updated to the address of the new item.

A basic example is the argv argument to the main function in C (and C++), which is given in the prototype as char** argv (or char* argv[]). This is because argv points to the first element of an array of pointers to strings, so *argv is a pointer to the 0th string (by convention the name of the program), and **argv is the 0th character of the 0th string.
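The double indirection of argv can be shown with two small C functions; the names `first_char_of_program_name` and `count_args` are hypothetical. The sketch relies on the C standard's guarantee that argv[argc] is a null pointer, which lets the argument array be walked like a null-terminated list.

```c
#include <stddef.h>

/* Return argv[0][0], the first character of the program name.
   **argv dereferences twice: once to reach the string, once for its char. */
char first_char_of_program_name(char **argv) {
    if (argv == NULL || *argv == NULL) {
        return '\0';
    }
    return **argv;
}

/* Count the arguments by walking the pointer array up to its null terminator. */
int count_args(char **argv) {
    int n = 0;
    while (argv[n] != NULL) {
        n++;
    }
    return n;
}
```

Called from main, `count_args(argv)` yields the same value as argc.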

Function pointer

In some languages, a pointer can reference executable code, i.e., it can point to a function, method, or procedure. A function pointer stores the address of a function to be invoked. While this facility can be used to call functions dynamically, it is also a favorite technique of writers of viruses and other malicious software.

#include <stdio.h>

// Function with two integer parameters returning an integer value
int sum(int n1, int n2) {
    return n1 + n2;
}

int main(void) {
    int a = 3;
    int b = 5;
    // fp is a pointer to a function taking (int, int) and returning int;
    // it is initialized to point to sum
    int (*fp)(int, int) = &sum;
    int x = (*fp)(a, b); // Calls sum indirectly through fp
    int y = sum(a, b);   // Calls sum directly
    printf("%d %d\n", x, y); // prints "8 8"
    return 0;
}
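A common use of function pointers is a dispatch table, in which the function to call is selected at run time by indexing an array of pointers. The sketch below assumes hypothetical operation codes 0, 1 and 2 for add, subtract and multiply, and returns 0 for an unknown code purely as an arbitrary fallback.

```c
#include <stddef.h>

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }

/* An array of const pointers to functions of type (int, int) -> int,
   indexed by a hypothetical operation code. */
static int (*const ops[])(int, int) = { add, sub, mul };

/* Select a function at run time and call it through the pointer. */
int apply(size_t op, int a, int b) {
    if (op >= sizeof ops / sizeof ops[0]) {
        return 0;   /* unknown operation: arbitrary fallback for this sketch */
    }
    return ops[op](a, b);
}
```

Calling through `ops[op](a, b)` and `(*ops[op])(a, b)` is equivalent in C; the first form is the more common spelling.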

Back pointer

In doubly linked lists or tree structures, a back pointer held in an element "points back" to the item referring to the current element. These are useful for navigation and manipulation, at the expense of greater memory use.
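The benefit of a back pointer can be seen when removing a node from a doubly linked list: because each node records its predecessor, the node can be unlinked in constant time without searching from the head. The `struct dnode` type and `dlist_unlink` function below are hypothetical names for a minimal sketch.

```c
#include <stddef.h>

/* Hypothetical doubly linked node: `prev` is the back pointer. */
struct dnode {
    int value;
    struct dnode *prev;
    struct dnode *next;
};

/* Remove n from its list in constant time. The back pointer gives direct
   access to the predecessor instead of a search from the head. */
void dlist_unlink(struct dnode **head, struct dnode *n) {
    if (n->prev != NULL) {
        n->prev->next = n->next;
    } else {
        *head = n->next;       /* n was the first node */
    }
    if (n->next != NULL) {
        n->next->prev = n->prev;
    }
    n->prev = n->next = NULL;
}
```

The cost is one extra pointer per node, and every insertion or removal must keep both directions consistent.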

Simulation using an array index

It is possible to simulate pointer behavior using an index to a (normally one-dimensional) array.

Primarily for languages which do not support pointers explicitly but do support arrays, the array can be thought of and processed as if it were the entire memory range (within the scope of the particular array) and any index to it can be thought of as equivalent to a general-purpose register in assembly language (that points to the individual bytes but whose actual value is relative to the start of the array, not its absolute address in memory). Assuming the array is, say, a contiguous 16 megabyte character data structure, individual bytes (or a string of contiguous bytes within the array) can be directly addressed and manipulated using the name of the array with a 31 bit unsigned integer as the simulated pointer (this is quite similar to the C arrays example shown above). Pointer arithmetic can be simulated by adding or subtracting from the index, with minimal additional overhead compared to genuine pointer arithmetic.
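The technique is written here in C only to keep one language throughout; the same code shape works in any language with arrays. In this hypothetical sketch, a linked list lives in two parallel arrays, an integer index plays the role of a pointer, and -1 (named `NIL`) plays the role of the null pointer. The fixed `CAPACITY` and the `push_front` and `sum_list` names are assumptions for illustration.

```c
#define CAPACITY 8
#define NIL (-1)          /* plays the role of a null pointer */

/* "Memory" for a linked list: parallel arrays in which an index acts as
   a pointer to a node. */
static int value[CAPACITY];
static int next[CAPACITY];
static int used = 0;

/* Allocate a node and link it in front of `head`; returns the index of the
   new head, or NIL if the pool is exhausted. */
int push_front(int head, int v) {
    if (used == CAPACITY) {
        return NIL;
    }
    int n = used++;
    value[n] = v;
    next[n] = head;
    return n;
}

/* Traverse by following indices, just as one would follow pointers. */
int sum_list(int head) {
    int total = 0;
    for (int i = head; i != NIL; i = next[i]) {
        total += value[i];
    }
    return total;
}
```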

Languages that abstract away pointers and pointer arithmetic, such as Java, provide iterators instead. In C++, iterators overload operators such as operator++ so that advancing an iterator mirrors incrementing a pointer in C.

import std;

using std::vector;

int main() {
    // collections such as vector define an iterator
    // type, with begin() and end() method
    vector<int> v{1, 2, 3, 4, 5};
    // 'it' is of type vector<int>::iterator
    // iterate through the collection similar to
    // pointer arithmetic
    for (auto it = v.begin(); it != v.end(); ++it) {
        std::print("{}", *it);
    }
}

It is even theoretically possible, using the above technique together with a suitable instruction set simulator, to simulate any machine code or the intermediate (byte code) of any processor/language in another language that does not support pointers at all (for example Java / JavaScript). To achieve this, the binary code can initially be loaded into contiguous bytes of the array for the simulator to "read", interpret and execute entirely within the array itself. If necessary, to completely avoid buffer overflow problems, bounds checking can usually be inserted by the compiler (or if not, hand coded in the simulator).

Support in various programming languages

Ada

Ada is a strongly typed language where all pointers are typed and only safe type conversions are permitted. All pointers are by default initialized to null, and any attempt to access data through a null pointer causes an exception to be raised. Pointers in Ada are called access types. Ada 83 did not permit arithmetic on access types (although many compiler vendors provided for it as a non-standard feature), but Ada 95 supports “safe” arithmetic on access types via the package System.Storage_Elements.

BASIC

Several old versions of BASIC for the Windows platform had support for STRPTR() to return the address of a string, and for VARPTR() to return the address of a variable. Visual Basic 5 also had support for OBJPTR() to return the address of an object interface, and for an ADDRESSOF operator to return the address of a function. The types of all of these are integers, but their values are equivalent to those held by pointer types.

Newer dialects of BASIC, such as FreeBASIC or BlitzMax, have extensive pointer implementations. In FreeBASIC, arithmetic on ANY pointers (equivalent to C's void*) is performed as though the ANY pointer pointed to single bytes. As in C, ANY pointers cannot be dereferenced. Also, casting between ANY and any other type's pointers does not generate any warnings.

dim as integer f = 257
dim as any ptr g = @f
dim as integer ptr i = g
assert(*i = 257)
assert( (g + sizeof(integer)) = (@f + 1) )

C and C++

In C and C++, pointers are variables that store addresses and can be null. Each pointer has a type it points to, but one can freely cast between pointer types (but not between a function pointer and an object pointer). A special pointer type called the "void pointer" allows pointing to any (non-function) object, but it cannot be dereferenced directly; it must first be cast to another pointer type. The address itself can often be directly manipulated by casting a pointer to and from an integral type of sufficient size, though the results are implementation-defined and may cause undefined behavior. While earlier C standards did not have an integral type that was guaranteed to be large enough, C99 specifies the uintptr_t typedef name defined in <stdint.h>, but an implementation need not provide it.
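A minimal sketch of the pointer-to-integer round trip: C99 guarantees that a valid void* converted to uintptr_t and back compares equal to the original pointer, while the integer value itself is implementation-defined. Because uintptr_t is optional, the sketch tests the UINTPTR_MAX macro, which <stdint.h> defines only when the type exists. The function name `roundtrip_ok` is hypothetical.

```c
#include <stdint.h>

#ifdef UINTPTR_MAX   /* defined only if the implementation provides uintptr_t */
/* Convert an object pointer to an integer and back; the standard guarantees
   the result compares equal, but not what the integer looks like. */
int roundtrip_ok(void *p) {
    uintptr_t bits = (uintptr_t)p;
    void *q = (void *)bits;
    return q == p;
}
#endif
```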

C++ fully supports C pointers and C typecasting. It also supports a new group of typecasting operators to help catch some unintended dangerous casts at compile-time. Since C++11, the C++ standard library also provides smart pointers (unique_ptr, shared_ptr and weak_ptr) which can be used in some situations as a safer alternative to primitive C pointers. C++ also supports another form of reference, quite different from a pointer, called simply a reference or reference type.

Pointer arithmetic, that is, the ability to modify a pointer's target address with arithmetic operations (as well as magnitude comparisons), is restricted by the language standard to remain within the bounds of a single array object (or just after it), and will otherwise invoke undefined behavior. Adding or subtracting from a pointer moves it by a multiple of the size of its datatype. For example, adding 1 to a pointer to 4-byte integer values will increment the pointer's pointed-to byte-address by 4. This has the effect of incrementing the pointer to point at the next element in a contiguous array of integers—which is often the intended result. Pointer arithmetic cannot be performed on void pointers because the void type has no size, and thus the pointed address cannot be added to, although gcc and other compilers will perform byte arithmetic on void* as a non-standard extension, treating it as if it were char*.
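The scaling rule and the "one past the end" allowance can be checked with a short C sketch; the function names are hypothetical. Converting both pointers to char* measures the distance in bytes, which for an int* equals sizeof(int). The loop stops at the one-past-the-end pointer, which may be formed and compared but must not be dereferenced.

```c
#include <stddef.h>

/* Distance in bytes between p and p + 1: sizeof(int) for an int *,
   because pointer arithmetic counts elements, not bytes. */
ptrdiff_t step_in_bytes(int *p) {
    return (char *)(p + 1) - (char *)p;
}

/* Sum an array by advancing a pointer from the first element up to the
   one-past-the-end pointer, which is compared against but never read. */
int sum_by_pointer(const int *first, size_t n) {
    int total = 0;
    for (const int *p = first, *end = first + n; p != end; ++p) {
        total += *p;
    }
    return total;
}
```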

Pointer arithmetic provides the programmer with a single way of dealing with different types: adding and subtracting the number of elements required instead of the actual offset in bytes. (Pointer arithmetic with char* pointers uses byte offsets, because sizeof(char) is 1 by definition.) In particular, the C definition explicitly declares that the syntax a[n], which is the n-th element of the array a, is equivalent to *(a + n), which is the content of the element pointed to by a + n. This implies that n[a] is equivalent to a[n], and one can write, e.g., a[3] or 3[a] equally well to access the fourth element of an array a.
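The equivalence can be demonstrated directly. In this hypothetical sketch, `nth_three_ways` reads the same element through a[n], *(a + n) and n[a], and reports through an output parameter whether all three agree.

```c
/* Three spellings of the same element access: a[n] is defined as
   *(a + n), and since addition commutes, n[a] is the same expression. */
int nth_three_ways(const int *a, int n, int *ok) {
    int x = a[n];
    int y = *(a + n);
    int z = n[a];
    *ok = (x == y) && (y == z);
    return x;
}
```

The n[a] form is legal but unidiomatic; it is shown only to make the definition visible.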

While powerful, pointer arithmetic can be a source of computer bugs. It tends to confuse novice programmers, forcing them into different contexts: an expression can be an ordinary arithmetic one or a pointer arithmetic one, and sometimes it is easy to mistake one for the other. In response to this, many modern high-level computer languages (for example Java) do not permit direct access to memory using addresses. Also, the safe C dialect Cyclone addresses many of the issues with pointers. See C programming language for more discussion.

The void pointer, or void*, is supported in ANSI C and C++ as a generic pointer type. A pointer to void can store the address of any object (not function),[a] and, in C, is implicitly converted to any other object pointer type on assignment, but it must be explicitly cast if dereferenced. K&R C used char* for the "type-agnostic pointer" purpose (before ANSI C).

int x = 4;
void* p1 = &x;
int* p2 = p1; // void* implicitly converted to int*: valid C, but not C++
int a = *p2;
int b = *(int*)p1; // when dereferencing inline, there is no implicit conversion

C++ does not allow the implicit conversion of void* to other pointer types, even in assignments. This was a design decision to avoid careless and even unintended casts, though most compilers only output warnings, not errors, when encountering other casts.

int x = 4;
void* p1 = &x;
int* p2 = p1; // this fails in C++: there is no implicit conversion from void*
int* p3 = (int*)p1; // C-style cast
int* p4 = reinterpret_cast<int*>(p1); // C++ cast

In C++, there is no void& (reference to void) to complement void* (pointer to void), because references behave like aliases to the variables they point to, and there can never be a variable whose type is void.
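A typical use of void* as a generic pointer is the C standard library function qsort, whose comparison callback receives const void* arguments so that one sorting routine can handle any element type. The callback converts the arguments back to the real element type before using them; the explicit casts shown also make the code valid C++. The names `compare_ints` and `sort_ints` are hypothetical.

```c
#include <stdlib.h>

/* qsort passes pointers to the elements as const void *; convert them back
   to const int * before reading the values. */
int compare_ints(const void *lhs, const void *rhs) {
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);   /* avoids the overflow that a - b could cause */
}

void sort_ints(int *values, size_t count) {
    qsort(values, count, sizeof *values, compare_ints);
}
```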

Pointer-to-member

In C++, pointers to non-static members of a class can be defined. If a class MyClass has a member T a, then &MyClass::a is a pointer to the member a of type T MyClass::*. This member can be an object or a function.[18] They can be used on the right-hand side of operators .* and ->* to access the corresponding member.

#include <print>

struct MyStruct {
    int a;

    [[nodiscard]]
    int f() const noexcept {
        return a;
    }
};

int main() {
    MyStruct s1{};
    MyStruct* ptrS = &s1;

    int MyStruct::* ptr = &MyStruct::a; // pointer to MyStruct::a
    int (MyStruct::* fp)() const = &MyStruct::f; // pointer to MyStruct::f

    s1.*ptr = 1;
    std::println("{}", (s1.*fp)()); // prints 1
    ptrS->*ptr = 2;
    std::println("{}", (ptrS->*fp)()); // prints 2
}

Pointer declaration syntax overview

These pointer declarations cover most variants of pointer declarations. Of course it is possible to have triple pointers, but the main principles behind a triple pointer already exist in a double pointer. The naming used here is what the expression typeid(type).name() equals for each of these types when using g++ or clang.[19][20]

char a[5][5]; // array of arrays of chars
char* b[5]; // array of pointers to chars
char** c; // pointer to pointer to char ("double pointer")
char (*d)[5]; // pointer to array(s) of chars
char* e(); // function which returns a pointer to char(s)
char (*f)(); // pointer to a function which returns a char 
char (*g())[5]; // function which returns pointer to an array of chars
char (*h[5])(); // an array of pointers to functions which return a char

The following declarations involving pointers-to-member are valid only in C++:

class X;
class Y;

char X::* a; // pointer-to-member to char
char X::* b[5]; // array of pointers-to-member to char 
char* X::* c; // pointer-to-member to pointer to char(s)
char X::** d; // pointer to pointer-to-member to char 
char (X::* e)[5]; // pointer-to-member to array(s) of chars
char X::* f(); // function which returns a pointer-to-member to char
char Y::* X::* g; // pointer-to-member of X to pointer-to-member of Y to char
char X::* X::* h; // pointer-to-member of X to pointer-to-member of X to char
char (X::* i())[5]; // function which returns pointer-to-member to an array of chars
char (X::* j)(); // pointer-to-member-function which returns a char
char (X::* k[5])(); // an array of pointers-to-member-functions which return a char

In declarations, () and [] bind more tightly than *.[21]
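The effect of this precedence can be confirmed with sizeof. In the sketch below (the variable names are hypothetical), `ptrs` is an array of five pointers because [] applies first, while the parentheses in `row` make it a single pointer to an array of five chars; incrementing `row` therefore advances by one whole row of five bytes.

```c
/* [] binds more tightly than *, so this is an array of 5 pointers to char. */
static char *ptrs[5];
/* Parentheses make * apply first: a pointer to an array of 5 chars. */
static char (*row)[5];
/* An array of arrays; its name decays to a pointer to its first row. */
static char grid[3][5];
```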

C#

In the C# programming language, pointers are supported by either marking blocks of code that include pointers with the unsafe keyword, or by using the System.Runtime.CompilerServices assembly provisions for pointer access. The syntax is essentially the same as in C++, and the address pointed can be either managed or unmanaged memory. However, pointers to managed memory (any pointer to a managed object) must be declared using the fixed keyword, which prevents the garbage collector from moving the pointed object as part of memory management while the pointer is in scope, thus keeping the pointer address valid.

An exception to this is the IntPtr structure, a platform-sized type that can hold a raw pointer or handle (comparable to void* in C) and requires neither the unsafe keyword nor the CompilerServices assembly. This type is often returned when using methods from the System.Runtime.InteropServices namespace, for example:

using System;
using System.Runtime.InteropServices;

// Get 16 bytes of memory from the process's unmanaged memory
IntPtr pointer = Marshal.AllocHGlobal(16);

// Do something with the allocated memory

// Free the allocated memory
Marshal.FreeHGlobal(pointer);

The .NET framework includes many classes and methods in the System and System.Runtime.InteropServices namespaces (such as the Marshal class) which convert .NET types (for example, System.String) to and from many unmanaged types and pointers (for example, LPWSTR or void*) to allow communication with unmanaged code. Most such methods have the same security permission requirements as unmanaged code, since they can affect arbitrary places in memory.

C# allows stack-allocated arrays in safe code using System.Span.[22]

namespace Wikipedia.Examples;

using System;

public class Example
{
    static void Main(string[] args)
    {
        int num = 1024;
        unsafe 
        {
            // convert an int into bytes by creating a byte pointer
            byte* p = (byte*)&num;
            Console.Write("The 4 bytes of the integer are: ");
            for (int i = 0; i < sizeof(int); ++i)
            {
                Console.Write(" {0:X2}", *p);
                ++p;
            }
            Console.WriteLine();
        }

        // Stack-allocated arrays can be done either through pointers or Span<T>
        unsafe
        {
            int* numbersPtr = stackalloc int[5];
        }
        Span<int> numbers = stackalloc int[5];
    }
}

C#, like C and C++, also has a void* (void pointer) type, but its use is strongly discouraged.[23]

COBOL

The COBOL programming language supports pointers to variables. Primitive or group (record) data objects declared within the LINKAGE SECTION of a program are inherently pointer-based, where the only memory allocated within the program is space for the address of the data item (typically a single memory word). In program source code, these data items are used just like any other WORKING-STORAGE variable, but their contents are implicitly accessed indirectly through their LINKAGE pointers.

Memory space for each pointed-to data object is typically allocated dynamically using external CALL statements or via embedded extended language constructs such as EXEC CICS or EXEC SQL statements.

Extended versions of COBOL also provide pointer variables declared with USAGE IS POINTER clauses. The values of such pointer variables are established and modified using SET and SET ADDRESS statements.

Some extended versions of COBOL also provide PROCEDURE-POINTER variables, which are capable of storing the addresses of executable code.

PL/I

The PL/I language provides full support for pointers to all data types (including pointers to structures), recursion, multitasking, string handling, and extensive built-in functions. PL/I was quite a leap forward compared to the programming languages of its time.[citation needed] PL/I pointers are untyped, and therefore no casting is required for pointer dereferencing or assignment. The declaration syntax for a pointer is DECLARE xxx POINTER;, which declares a pointer named "xxx". Pointers are used with BASED variables. A based variable can be declared with a default locator (DECLARE xxx BASED(ppp);) or without one (DECLARE xxx BASED;), where xxx is a based variable, which may be an element variable, a structure, or an array, and ppp is the default pointer. Such a variable can be addressed without an explicit pointer reference (xxx=1;), with an explicit reference to the default locator (ppp), or with an explicit reference to any other pointer (qqq->xxx=1;).

Pointer arithmetic is not part of the PL/I standard, but many compilers allow expressions of the form ptr = ptr±expression. IBM PL/I also has the builtin function PTRADD to perform the arithmetic. Pointer arithmetic is always performed in bytes.

IBM Enterprise PL/I compilers have a new form of typed pointer called a HANDLE.

D

The D programming language is a derivative of C and C++ which fully supports C pointers and C typecasting.

Eiffel

The Eiffel object-oriented language employs value and reference semantics without pointer arithmetic. Nevertheless, pointer classes are provided. They offer pointer arithmetic, typecasting, explicit memory management, interfacing with non-Eiffel software, and other features.

Fortran

Fortran-90 introduced a strongly typed pointer capability. Fortran pointers contain more than just a simple memory address. They also encapsulate the lower and upper bounds of array dimensions, strides (for example, to support arbitrary array sections), and other metadata. An association operator, => is used to associate a POINTER to a variable which has a TARGET attribute. The Fortran-90 ALLOCATE statement may also be used to associate a pointer to a block of memory. For example, the following code might be used to define and create a linked list structure:

type real_list_t
  real :: sample_data(100)
  type (real_list_t), pointer :: next => null ()
end type

type (real_list_t), target :: my_real_list
type (real_list_t), pointer :: real_list_temp
integer :: ioerr

real_list_temp => my_real_list
do
  read (1,iostat=ioerr) real_list_temp%sample_data
  if (ioerr /= 0) exit
  allocate (real_list_temp%next)
  real_list_temp => real_list_temp%next
end do

Fortran-2003 adds support for procedure pointers. Also, as part of the C Interoperability feature, Fortran-2003 supports intrinsic functions for converting C-style pointers into Fortran pointers and back.

Go

Go has pointers. Its declaration syntax is equivalent to that of C, but written the other way around, ending with the type. Unlike C, Go has garbage collection and disallows pointer arithmetic. Reference types, like those in C++, do not exist. Some built-in types, like maps and channels, are boxed (i.e. internally they are pointers to mutable structures), and are initialized using the make function. In an approach to unified syntax between pointers and non-pointers, the arrow (->) operator has been dropped: the dot operator on a pointer refers to the field or method of the dereferenced object. This, however, only works with one level of indirection.

Java

There is no explicit representation of pointers in Java. Instead, more complex data structures like objects and arrays are implemented using references. The language does not provide any explicit pointer manipulation operators. It is still possible for code to attempt to dereference a null reference (null pointer), however, which results in a run-time exception being thrown. The space occupied by unreferenced memory objects is recovered automatically by garbage collection at run-time.[24]

Java provides the classes java.lang.ref.WeakReference and java.lang.ref.PhantomReference, which respectively implement weak references and phantom references.

Modula-2

Pointers are implemented very much as in Pascal, as are VAR parameters in procedure calls. Modula-2 is even more strongly typed than Pascal, with fewer ways to escape the type system. Some of the variants of Modula-2 (such as Modula-3) include garbage collection.

Oberon

Much as with Modula-2, pointers are available. There are still fewer ways to evade the type system and so Oberon and its variants are still safer with respect to pointers than Modula-2 or its variants. As with Modula-3, garbage collection is a part of the language specification.

Pascal

Unlike many languages that feature pointers, standard ISO Pascal only allows pointers to reference dynamically created variables that are anonymous and does not allow them to reference standard static or local variables.[25] It does not have pointer arithmetic. Pointers also must have an associated type and a pointer to one type is not compatible with a pointer to another type (e.g. a pointer to a char is not compatible with a pointer to an integer). This helps eliminate the type security issues inherent with other pointer implementations, particularly those used for PL/I or C. It also removes some risks caused by dangling pointers, but the ability to dynamically let go of referenced space by using the dispose standard procedure (which has the same effect as the free library function found in C) means that the risk of dangling pointers has not been eliminated.[26]

However, in some commercial and open source Pascal (or derivatives) compiler implementations —like Free Pascal,[27] Turbo Pascal or the Object Pascal in Embarcadero Delphi— a pointer is allowed to reference standard static or local variables and can be cast from one pointer type to another. Moreover, pointer arithmetic is unrestricted: adding or subtracting from a pointer moves it by that number of bytes in either direction, but using the Inc or Dec standard procedures with it moves the pointer by the size of the data type it is declared to point to. An untyped pointer is also provided under the name Pointer, which is compatible with other pointer types.

Perl

The Perl programming language supports pointers, although they are rarely used, in the form of the pack and unpack functions. These are intended only for simple interactions with compiled OS libraries. In all other cases, Perl uses references, which are typed and do not allow any form of pointer arithmetic. They are used to construct complex data structures.[28]

Rust

The Rust language has raw pointers, but dereferencing them is only permitted inside unsafe blocks. Most operations on raw pointers can be found in std::ptr. For safe code, Rust provides smart pointers, including the following:

  • std::boxed::Box: equivalent to a unique pointer[29]
  • std::rc::Rc: equivalent to a reference-counted single-threaded shared pointer
  • std::sync::Arc: equivalent to an atomically reference-counted thread safe shared pointer
  • std::rc::Weak: equivalent to a weak pointer[30]

A pointer to T (T* in C/C++) is written as *const T or *mut T in Rust, depending on whether the pointee may be modified through it. The following demonstrates raw pointers in Rust:

fn main() {
    let mut num: i32 = 42;

    let r1 = &num as *const i32;
    let r2 = &mut num as *mut i32;

    unsafe {
        println!("r1 points to: {}", *r1);
        println!("r2 points to: {}", *r2);

        *r2 = 100;
        println!("num is now: {}", num);
    }
}

See also

Notes

References

from Grokipedia
In computer programming, a pointer is a variable that stores the memory address of another variable or object, allowing indirect access to the data stored at that address rather than holding the data itself. This mechanism enables programmers to manipulate memory directly, which is essential for tasks like dynamic memory allocation and the efficient implementation of data structures.

Pointers originated in early programming languages as a way to handle memory addressing more flexibly than fixed indices. In the 1960s and 1970s, they evolved through languages like BCPL and B, where they functioned as integer indices into memory arrays, using operators for indirection. Dennis Ritchie formalized pointers in the C language during its development from 1971 to 1973 at Bell Labs, introducing a typed system where pointers represent byte addresses and support arithmetic operations, such as incrementing to point to the next memory location. This design eliminated runtime overhead for address scaling and integrated seamlessly with arrays, treating array names as pointers to their first element.

In languages like C and C++, pointers are declared with syntax such as int *ptr;, where the asterisk denotes a pointer to an integer, and initialized using the address-of operator &, as in ptr = &variable;. Dereferencing with * accesses or modifies the pointed-to value, enabling pass-by-reference semantics in functions to avoid copying large data structures. Pointers also support more complex uses, such as pointing to functions or accessing structure members through a pointer via the arrow operator ->, which streamlines access in object-oriented contexts.

While powerful for systems programming and performance-critical applications, pointers require careful management to prevent issues like uninitialized references or invalid addresses, which can cause program crashes. Modern languages like Rust and Go incorporate safer variants, such as references or raw pointers with ownership rules, to mitigate these risks while retaining low-level control. Overall, pointers remain a cornerstone of imperative and systems-level programming.

Basics

Definition and Core Concepts

In computer programming, a pointer is a data type that stores the memory address of another value, rather than the value itself, thereby enabling indirect access and manipulation of data stored at that location. This mechanism allows programs to reference and interact with data without directly embedding the data's contents, facilitating more flexible and efficient memory usage.

At its core, a pointer serves as an address holder, distinct from the pointee, which is the actual variable or object residing at the referenced address; accessing the pointee requires dereferencing the pointer to retrieve or modify the target's value. This distinction abstracts away the physical details of memory locations, allowing programmers to treat addresses as symbolic references for operations like linking structures or passing large objects by reference, which improves performance by avoiding unnecessary copying. Pointers thus play a pivotal role in abstracting memory access, enabling indirect addressing that supports advanced programming techniques while relying on the underlying hardware's ability to resolve addresses.

To illustrate conceptually, imagine a memory layout where a variable named x occupies bytes starting at address 0x1000, holding the value 42; a pointer variable p at address 0x2000 would store the value 0x1000, effectively "pointing" to x. Dereferencing p then yields 42, as if following an arrow from p to the location of x. This pointer-pointee relationship highlights how pointers enable dynamic referencing without altering the pointee's storage.

Pointers operate within the memory model of the von Neumann architecture, which assumes a flat, linear address space where both program instructions and data reside in a unified sequence of addressable locations, typically bytes, each uniquely identified by a numeric address starting from zero. In this model, memory is treated as a contiguous array of bytes, allowing pointers to represent any valid location as an integer offset, independent of the data stored there, which underpins the architecture's stored-program concept.
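The x and p picture above corresponds to the following C sketch. The addresses 0x1000 and 0x2000 are only illustrative; a real run prints whatever address the variable receives. `swap_ints` shows passing by reference: the callee receives the addresses of the caller's variables and dereferences them to read and write those variables directly. Both function names are hypothetical.

```c
#include <stdio.h>

/* Exchange two ints via their addresses: the callee dereferences the
   pointers to read and write the caller's variables directly. */
void swap_ints(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Mirrors the x/p picture; the actual address differs from run to run. */
void show_pointer(void) {
    int x = 42;
    int *p = &x;                 /* p stores the address of x */
    printf("p = %p, *p = %d\n", (void *)p, *p);
}
```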

Representation and Value

In computer systems, pointers are internally represented as fixed-size integers that encode memory addresses in binary. The size of these integers corresponds to the system's architecture; for instance, on 32-bit systems, pointers are typically 32 bits (4 bytes) long, allowing addressing of up to 4 gigabytes of memory, while on 64-bit systems, they are 64 bits (8 bytes) long to support vastly larger address spaces. This binary encoding directly maps to the machine's word size, ensuring efficient storage and manipulation within registers and memory.

The value held by a pointer represents a memory location, most commonly an absolute virtual address within the process's address space, which the memory management unit (MMU) translates to a physical address at runtime. In some architectures, pointers may instead store relative offsets from a base address, but absolute addressing predominates in the flat memory models used by modern operating systems. Certain pointer values are considered invalid; for example, an all-zeroes value often denotes the null pointer, indicating no valid object or an inaccessible location in various systems, though the exact interpretation can vary by implementation.

Address space considerations further influence pointer representation, distinguishing between virtual and physical addressing schemes. Virtual addressing, standard in contemporary processors, allows pointers to reference a large, contiguous logical address space per process, independent of physical layout, with the operating system handling translations via page tables. Physical addressing, rarer in user-level programming, directly encodes hardware locations but limits portability. Additionally, the endianness of the system affects how multi-byte pointers are ordered in memory: little-endian architectures (e.g., x86) store the least significant byte at the lowest address, while big-endian ones (e.g., PowerPC, and the network byte order used by many protocols) reverse this order, impacting data serialization and cross-platform compatibility.
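Both properties can be observed from C. The sketch below computes the width of an object pointer in bits and inspects the lowest-addressed byte of a known 32-bit value through an unsigned char pointer to detect byte order. It assumes an ordinary little- or big-endian machine (not a mixed-endian one), and the function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Width of an object pointer in bits on this platform (commonly 32 or 64). */
unsigned pointer_bits(void) {
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Read the lowest-addressed byte of 0x01020304 through an unsigned char
   pointer: 0x04 indicates little-endian, 0x01 big-endian. */
int is_little_endian(void) {
    uint32_t probe = 0x01020304u;
    const unsigned char *bytes = (const unsigned char *)&probe;
    return bytes[0] == 0x04;
}
```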

Historical Development

Origins in Computing Architecture

The concept of pointers emerged from the architectural needs of early stored-program computers in the 1940s and 1950s, where hardware mechanisms for indirect addressing and address modification made it practical to access non-contiguous memory locations. In the von Neumann model, outlined in John von Neumann's 1945 First Draft of a Report on the EDVAC, the stored-program paradigm placed instructions and data in the same address space, so programs needed efficient ways to compute and manipulate addresses during execution. This requirement motivated register-based addressing schemes that avoided excessive computational overhead. Early implementations appeared in machines such as EDSAC, completed in 1949 at the University of Cambridge, which used an accumulator-based architecture and relied on self-modifying code to simulate indirect addressing, altering the address fields of instructions on the fly to reach non-sequential data. The original EDSAC lacked dedicated index registers, a feature introduced contemporaneously in the Manchester Mark 1; its workaround highlighted the need for hardware support for variable memory references, which would reduce the burden of manual address bookkeeping in scientific computation. UNIVAC I, delivered in 1951 as the first commercial computer produced in the United States, extended these ideas through its accumulator design, incorporating address-modification capabilities that enabled indirect-like operations for business data processing, where flexible references to variable-length records improved efficiency. A pivotal milestone came with the Whirlwind computer at MIT, operational by late 1951, which employed address registers to support real-time indirect addressing, essential for its interactive simulations and military applications.
Whirlwind's adoption of magnetic-core memory further underscored the architectural drive toward pointer primitives, as the random-access nature of core storage rewarded quick, non-contiguous addressing without repeated full-path calculations, with access times of approximately 8 microseconds per word. These hardware innovations stemmed from the von Neumann architecture's core requirement for address manipulation, enabling programs to treat memory locations as manipulable values and laying the groundwork for scalable systems.

Evolution in Programming Languages

The concept of pointers in programming languages emerged as a means to manage memory and parameter passing more efficiently than earlier assembly-level approaches. In ALGOL 60, released in 1960, call-by-name parameter passing provided a mechanism akin to reference parameters: actual parameters were textually substituted into the procedure body at each use, allowing a procedure to modify the caller's data without explicit address manipulation. Although not a pointer type itself, this feature laid groundwork for indirect referencing by simulating dynamic evaluation and side effects on caller variables. Building on this, PL/I, developed by IBM and first specified in 1964, made explicit pointer variables a core language feature, enabling direct manipulation of memory addresses for data structures and dynamic allocation. Harold Lawson is credited with inventing the pointer variable concept during PL/I's design, integrating it to support both scientific and business computing needs. The Burroughs B5000 system, introduced in 1961, further influenced pointer-like mechanisms through its tagged architecture, in which descriptors (words with tag bits indicating data types and bounds) served as hardware-supported pointers for high-level languages such as ALGOL and COBOL, promoting safer memory access and stack-based operation. Pointers gained widespread adoption with the C programming language, developed by Dennis Ritchie at Bell Labs between 1971 and 1973, where explicit typed pointers became central to its design for systems programming on the PDP-11. Drawing on B's indirection operator but adding structure types and byte addressing, C's pointers supported array-to-pointer decay and pointer arithmetic, popularizing their use for low-level control while keeping code portable across Unix implementations. Pascal (1970), by contrast, had already incorporated pointers dereferenced with the caret symbol (^) while disallowing pointer arithmetic, favoring safety for educational and general-purpose use.
Standardization in the late 1980s solidified pointer semantics: the ANSI X3.159-1989 standard (later ISO/IEC 9899:1990) defined C's pointer behavior, including null pointers, conversions, and undefined behavior for arithmetic beyond object bounds, ensuring consistent implementation across compilers. In parallel, C++, evolving from "C with Classes" in 1979, introduced references in 1985 as safer aliases to objects, reducing reliance on explicit pointers for function parameters while retaining pointers for dynamic allocation. The 1980s also saw a shift toward abstracted references in higher-level languages; for instance, Ada's 1983 standard used access types as typed pointers with built-in null checks and no pointer arithmetic, prioritizing safety in safety-critical systems over raw control. This evolution continued in the 1990s with C++ smart pointers, such as reference-counted pointer classes, which encapsulated raw pointers to automate deallocation and prevent leaks, addressing common pitfalls of manual memory management. By then, the move from assembly's direct addressing to higher-level abstractions, such as references in garbage-collected and object-oriented languages, emphasized reliability and shaped modern practice, in which explicit pointers are often confined to performance-critical code.

Formal Foundations

Mathematical Description

In the denotational semantics model for C (Papaspyrou 2001), a pointer can be abstracted as a function $p : V \to A$, where $V$ denotes the value space comprising all possible data values in the system and $A$ represents the address space consisting of unique locations. The dereference operation is then defined as the function $*p : A \to V$, which retrieves the value stored at the specified address. This set-theoretic formulation separates the conceptual layers of addressing and access, treating pointers independently of specific hardware implementations.

Indirection levels arise naturally through function composition in this model. A single pointer $p$ maps a value to its address, while a double pointer $pp$ represents the composition $p \circ {*p} : A \to A$, giving access to addresses of addresses; dereferencing yields $*pp : A \to V$, applied iteratively. Higher levels of indirection extend this pattern, with $n$-level indirection modeled as nested compositions, so that each layer preserves the correspondence between valid addresses and values within the defined spaces.

The address space $A$ is governed by axioms ensuring unique addressing and locality. Uniqueness requires that each element of $A$ corresponds to exactly one memory location, formalized as an injection from allocated objects to addresses, which prevents overlaps. Locality bounds addresses to aggregate structures, so that offsets within an object remain valid only relative to its base address. Pointer equality follows directly: two pointers $p$ and $q$ are equal if and only if $\mathrm{address}(p) = \mathrm{address}(q)$, where $\mathrm{address}$ extracts the underlying location from the function representation.

The theoretical properties of this model emphasize type safety and aliasing behavior in formal semantics. Type safety is enforced through semantic domains in which a pointer type $\mathrm{ptr}\ \tau$ restricts operations to compatible value spaces, preventing ill-typed dereferences via rules that validate compositions.
Pointer aliasing manifests as shared references when distinct pointers map to the same address in $A$, allowing the same value to be accessed through each of them but requiring careful handling to avoid undefined behavior in semantic evaluation.
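The model's single and double indirection, address-based equality, and aliasing have direct, if informal, counterparts in C. The sketch below is only an analogy to the formal semantics, not part of Papaspyrou's formalization; the function name is hypothetical.

```c
/* Informal C counterpart of the formal model:
   p  : a pointer to x         (*p yields x's value)
   pp : a pointer to p         (**pp reaches x through two levels)
   q  : a second pointer to x  (p == q, so p and q alias x) */
int indirection_demo(void) {
    int x = 5;
    int *p = &x;
    int **pp = &p;   /* double pointer: the address of an address */
    int *q = &x;     /* a distinct pointer variable holding the same address */

    int ok = 1;
    ok = ok && (**pp == 5);  /* two dereferences reach the value */
    ok = ok && (*pp == p);   /* one dereference yields the inner pointer */
    ok = ok && (p == q);     /* pointer equality compares addresses */

    *q = 9;                  /* a write through one alias ... */
    ok = ok && (*p == 9);    /* ... is visible through the other */
    return ok;
}
```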

Hardware Architectural Roots

The foundational hardware support for pointers emerged in early mainframe processors through mechanisms such as index registers and indirect addressing modes, which enabled efficient memory indirection and offset calculation. The IBM 704, introduced in 1954, was the first IBM computer to incorporate index registers, featuring three such registers that modified addresses by adding the two's complement of their contents to an instruction's address field, facilitating indexed access patterns akin to pointer arithmetic. Indirect addressing, a core pointer operation, was added in successors such as the IBM 709, where it required an additional execution cycle to fetch the effective address from memory, allowing instructions to reference locations dynamically rather than statically. These features laid the groundwork for pointers by separating the address an instruction names from the address it finally accesses, reducing the need for manual address recalculation in assembly code. Memory management units (MMUs) extended this support in the 1960s by automating virtual-to-physical address translation, treating pointers as virtual references resolved at runtime. The Atlas computer, operational from 1962, pioneered this with Page Address Registers (PARs) that mapped 512-word virtual pages to physical core-store blocks using associative matching: the page identifier of a virtual address was compared against the PAR contents, and on a match the physical block address was concatenated with the intra-page offset to yield the final location. If no match occurred, a page fault interrupted execution, prompting the supervisor to load the page from secondary storage (e.g., a drum) and update the PARs, thus letting pointers operate in a larger virtual space without direct physical addressing. This hardware abstraction became standard; the Burroughs B5000 series pursued a related approach, using descriptor-based translation to virtualize pointer targets.
In modern processors, pointers interact closely with cache hierarchies and translation lookaside buffers (TLBs), affecting locality and prefetching efficiency. Pointer chasing, common in linked structures, often disrupts spatial and temporal locality, leading to high cache miss rates (e.g., up to 83% in the L3 cache for pointer-intensive benchmarks) because successive accesses jump irregularly across memory. TLBs, which cache recent virtual-to-physical mappings, compound the problem: frequent pointer-induced page crossings increase TLB misses, stalling translation and adding latency. Hardware prefetchers mitigate these effects by predicting pointer transitions, for example by using a pointer cache to track target objects and preload them into the L2 cache, achieving speedups of up to 50% in dependency-bound workloads by breaking serial chains.

Architectural examples illustrate how these roots evolved. The PDP-11, introduced in 1970, supported base-plus-offset addressing in its index mode (mode 6), where an instruction fetched an offset from the following word and added it to a general register (e.g., R4) to compute the effective address, enabling pointer-like array traversal without altering the base. It also provided indirect addressing via deferred modes (e.g., mode 1: @Rn), where the register held the address of the operand rather than the operand itself, requiring an extra memory fetch that mirrors pointer dereferencing. In x86 architectures, segmentation persists but is typically configured as a flat model: in 32-bit protected mode, the segment registers (CS, DS, etc.) select descriptors covering the full 4 GB linear space, so pointers function as simple offsets while hardware support for base relocation via segment descriptors is retained. This setup, detailed in Intel's architecture manuals, uses the global descriptor table (GDT) to define segment bases and limits, preserving compatibility with legacy pointer operations while relying on paging for virtual addressing.
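The contrast between sequential access and pointer chasing can be seen in two loops that compute the same sum. This is an illustrative sketch with hypothetical names; actual miss rates and prefetcher behavior depend on the processor and on where the nodes happen to be allocated.

```c
#include <stddef.h>

/* Hypothetical node type used only for this illustration. */
struct node {
    long value;
    struct node *next;
};

/* Array traversal: the address of a[i + 1] follows from a's base address
   and i alone, so the hardware can issue upcoming loads early. */
long sum_array(const long *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Pointer chasing: the next node's address is known only after the current
   node's `next` field has been loaded, forming a serial chain of dependent
   memory accesses. */
long sum_list(const struct node *head) {
    long total = 0;
    for (const struct node *cur = head; cur != NULL; cur = cur->next)
        total += cur->value;
    return total;
}
```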

Primary Uses

Data Structures and Arrays

In computer programming, pointers play a fundamental role in implementing arrays, providing a mechanism to access elements stored in contiguous memory locations. An array is essentially treated as a pointer to its first element, allowing direct manipulation through pointer arithmetic. For instance, in languages like C, the name of an array decays to a pointer to its initial element in most expressions, enabling efficient traversal by incrementing the pointer to step through subsequent elements without explicit indexing. This equivalence between arrays and pointers makes it possible to access the nth element via pointer addition: the address of arr[n] is computed as arr + n, where arr evaluates to the base address.

```c
#include <stdio.h>

int main(void) {
    int arr[5] = {1, 2, 3, 4, 5};
    int *ptr = arr;              // arr decays to &arr[0]
    for (int i = 0; i < 5; i++) {
        printf("%d ", *ptr);     // prints the element ptr points to
        ptr++;                   // advances to the next element
    }
    printf("\n");
    return 0;
}
```

Because arrays are allocated contiguously, this approach achieves constant-time, O(1), random access to any element by offsetting from the base pointer. Pointers extend beyond arrays to enable dynamic and non-contiguous structures such as linked lists, where each node contains data and a pointer to the next node. In a singly linked list, the head pointer references the first node, and traversal proceeds by following the next pointers until reaching a null pointer, allowing nodes to be inserted and deleted flexibly without fixed-size constraints. Doubly linked lists add a pointer to the previous node, enabling bidirectional traversal, such as walking the list backward from the tail. These structures contrast with arrays by distributing elements across memory, avoiding the need to resize entire blocks during modifications.

```c
#include <stdio.h>

struct Node {
    int data;
    struct Node *next;
};

int main(void) {
    struct Node *head = NULL;  // empty list

    // Traversal: follow next pointers until the null pointer ends the list
    struct Node *current = head;
    while (current != NULL) {
        printf("%d ", current->data);
        current = current->next;
    }
    return 0;
}
```

Trees and graphs likewise use pointers to represent hierarchical or networked relationships: in binary trees, each node holds pointers to its left and right children, enabling recursive traversal algorithms such as in-order, pre-order, or post-order traversal, while graphs commonly use adjacency lists in which each vertex points to a list of neighboring vertices. Pointer-based structures offer key efficiency benefits, including dynamic sizing that accommodates varying data volumes without preallocating fixed space, unlike fixed-size arrays. Linked lists support O(1) insertion and deletion at a known position (for example, at the head, or next to a node the program already holds a pointer to), compared with O(n) for arrays, which must shift elements; this makes pointers well suited to workloads with frequent structural changes, such as queue operations. However, reaching an arbitrary element of a linked list takes O(n) time versus O(1) for an array, so pointers buy structural flexibility at the cost of random access, cache locality, and the extra space each link occupies.
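As an illustration of pointer-linked trees, the sketch below builds a binary search tree and walks it in order. The type and function names are hypothetical, and updating the tree through a pointer to the link being changed (struct tree **) is one common C idiom among several for insertion.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical binary tree node: each node points to two children. */
struct tree {
    int key;
    struct tree *left, *right;
};

/* Inserts key into a binary search tree. `link` points at the pointer that
   should refer to the new node (the root or a child field), so the caller's
   tree is updated in place. Returns 0 on allocation failure, 1 otherwise. */
int tree_insert(struct tree **link, int key) {
    while (*link != NULL)
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    struct tree *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->key = key;
    n->left = n->right = NULL;
    *link = n;
    return 1;
}

/* In-order traversal (left subtree, node, right subtree): writes the keys
   to out[] in ascending order and returns the updated count. */
size_t tree_inorder(const struct tree *t, int *out, size_t count) {
    if (t == NULL)
        return count;
    count = tree_inorder(t->left, out, count);
    out[count++] = t->key;
    return tree_inorder(t->right, out, count);
}

/* Frees every node in post-order, so children are freed before parents. */
void tree_free(struct tree *t) {
    if (t == NULL)
        return;
    tree_free(t->left);
    tree_free(t->right);
    free(t);
}
```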

Dynamic Memory Allocation

Dynamic memory allocation allows programs to request memory from the heap at runtime, with pointers serving as the mechanism to access and manage the allocated space. In C, the malloc function allocates a block of the specified size in bytes and returns a void* pointing to the beginning of that block, or a null pointer if the request cannot be satisfied; in C this void* converts implicitly to any object pointer type, whereas C++ requires an explicit cast. This pointer enables the program to store and manipulate data dynamically, such as variable-sized arrays or structures, without relying on sizes fixed at compile time. Similarly, in C++, the new operator allocates memory on the heap for objects or arrays and returns a typed pointer to the allocated memory, automatically invoking constructors for objects. These mechanisms provide flexibility for applications that must adapt at runtime, such as those building dynamic data structures.

Manual memory management requires explicit deallocation to avoid wasting resources: free releases the block pointed to by its argument in C, and delete and delete[] deallocate single objects or arrays in C++ while calling destructors. Failure to deallocate, for example when the only pointer to an allocated block is lost or overwritten before the deallocation function is called, results in a memory leak: the memory remains allocated but inaccessible for the rest of the program's execution. In contrast, languages with garbage collection, such as Java, use pointers (exposed as references) to track object reachability from active roots such as the stack or global variables, automatically reclaiming unreachable objects without explicit deallocation calls. This approach reduces the risk of leaks from programmer error but introduces overhead from periodic collection cycles. Pointer-based dynamic allocation can also lead to fragmentation, which degrades memory efficiency over time.
Internal fragmentation occurs when an allocated block is larger than the requested size because of alignment requirements or allocator overhead, leaving unusable space within the block. External fragmentation arises when repeated allocations and deallocations leave scattered free holes too small for new requests even though the total free memory would suffice, a problem often exacerbated by varying allocation sizes and lifetimes in pointer-managed heaps. Allocators mitigate this with strategies such as coalescing adjacent free blocks or using buddy systems, but fragmentation remains a key challenge in manual, pointer-based memory management.
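A minimal sketch of heap allocation in C, assuming a caller that frees what it receives: make_range and grow_range are hypothetical helpers that check malloc and realloc for a null result, guard the size computation against overflow, and leave the original block untouched when realloc fails so that it is not leaked.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates an array of n ints holding 0, 1, ..., n - 1.
   Returns NULL if n is 0, if n * sizeof(int) would overflow, or if
   allocation fails. The caller owns the block and must free() it. */
int *make_range(size_t n) {
    if (n == 0 || n > SIZE_MAX / sizeof(int))
        return NULL;
    int *a = malloc(n * sizeof(int));  /* void * converts implicitly in C */
    if (a == NULL)
        return NULL;                   /* malloc reports failure with NULL */
    for (size_t i = 0; i < n; i++)
        a[i] = (int)i;
    return a;
}

/* Grows the block from old_n to new_n ints, filling the new tail with -1.
   On failure, returns NULL and `a` remains valid and owned by the caller,
   so the caller must not overwrite its only pointer before checking. */
int *grow_range(int *a, size_t old_n, size_t new_n) {
    if (new_n == 0 || new_n < old_n || new_n > SIZE_MAX / sizeof(int))
        return NULL;
    int *bigger = realloc(a, new_n * sizeof(int));
    if (bigger == NULL)
        return NULL;
    for (size_t i = old_n; i < new_n; i++)
        bigger[i] = -1;
    return bigger;
}
```

Because grow_range returns NULL without freeing its input, a caller would write `int *tmp = grow_range(a, n, m); if (tmp) a = tmp;` rather than assigning the result straight back to a.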

Pass-by-Reference and Function Pointers

In languages such as C that use pass-by-value semantics for function parameters, modifications to parameters within a function do not affect the original variables in the calling scope, because only copies of the values are passed. To simulate pass-by-reference, allowing a function to modify the caller's variables, the caller passes the address of a variable, and the function dereferences that pointer to alter the original data. For instance, a function that swaps two integers receives pointers to them, so the changes persist after the function returns. Function pointers in C store the address of a function, enabling indirect invocation in which the function to call is chosen at runtime rather than fixed at compile time. The declaration syntax gives the function's return type, then parentheses containing an asterisk and the pointer name, then the parameter types in parentheses: void (*fp)(int); declares a pointer to a function that takes an int and returns void. Such a pointer can be assigned the address of any compatible function and invoked either by explicit dereference, (*fp)(42);, or simply as fp(42);. Function pointers enable callbacks, in which a higher-level function receives a pointer to a user-defined function and invokes it during execution, for example for event handling or custom comparison logic; arrays of function pointers likewise serve as dispatch tables. In graphical user interfaces, and in library sorting functions such as C's qsort, callbacks let client code specify behavior without altering the core library. Similarly, virtual method tables (vtables), typically arrays of function pointers, are used to implement runtime polymorphism in C++ by associating each object type with its method implementations, much like the jump tables used for indirect branches. In C, the same effect can be built by hand: structs can include tables of pointers to type-specific functions, simulating object-oriented inheritance and dynamic dispatch at runtime.
For example, a base struct can define a vtable with pointers to common operations, while derived structs populate their own vtables with specialized implementations; invoking through the struct's pointer resolves to the appropriate function dynamically. This approach provides behavioral flexibility, such as selecting drawing methods for different shapes, without built-in support for classes.
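The three uses described above (simulated pass-by-reference, a qsort callback, and a hand-built vtable) can be sketched together in C. qsort is the standard library function; everything else, including the shape types and names such as shape_area, is hypothetical and kept minimal.

```c
#include <stddef.h>
#include <stdlib.h>

/* Pass-by-reference via pointers: the callee receives addresses and
   modifies the caller's variables through them. */
void swap_ints(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* qsort callback: compares the ints the void pointers refer to.
   (x > y) - (x < y) avoids the overflow that x - y could cause. */
int compare_ints(const void *pa, const void *pb) {
    int x = *(const int *)pa;
    int y = *(const int *)pb;
    return (x > y) - (x < y);
}

void sort_ints(int *a, size_t n) {
    qsort(a, n, sizeof *a, compare_ints);  /* function pointer as callback */
}

/* Hand-written vtable: one table of function pointers per "type". */
struct shape;  /* forward declaration so the vtable can refer to it */

struct shape_vtable {
    double (*area)(const struct shape *self);
};

struct shape {
    const struct shape_vtable *vtable;  /* points at the type's table */
    double w, h;
};

static double rect_area(const struct shape *s) { return s->w * s->h; }
static double triangle_area(const struct shape *s) { return 0.5 * s->w * s->h; }

static const struct shape_vtable RECT_VTABLE = { rect_area };
static const struct shape_vtable TRIANGLE_VTABLE = { triangle_area };

struct shape make_rect(double w, double h) {
    struct shape s = { &RECT_VTABLE, w, h };
    return s;
}

struct shape make_triangle(double base, double height) {
    struct shape s = { &TRIANGLE_VTABLE, base, height };
    return s;
}

/* Dynamic dispatch: the call goes through whichever table s points to. */
double shape_area(const struct shape *s) {
    return s->vtable->area(s);
}
```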

Typing and Operations

Typed Pointers and Casting

In computer programming, particularly in languages like C, pointers are typically typed: they are declared with a specific data type that indicates the type of object they reference. This type information, such as int* for a pointer to an integer or char* for a pointer to a character, lets the compiler validate operations like dereferencing and ensures that the pointer is used consistently with the pointed-to object's layout and size. For instance, dereferencing an int* reads or writes four bytes on most platforms, while dereferencing a char* accesses a single byte, which helps prevent inadvertent data corruption from type mismatches. Untyped or generic pointers, such as void* in C, lack this association and can hold the address of any object, facilitating generic programming in functions such as malloc and qsort. However, a void* cannot be dereferenced directly; it must first be converted to a typed pointer, either implicitly by assignment in C or with an explicit cast such as int *p = (int *)void_ptr;. This design promotes flexibility but introduces risk, since accessing an object through a pointer whose type does not match the object's effective type leads to undefined behavior. Pointer casts in C use the explicit cast operator (type)expression, which converts a pointer of one type to another, including between incompatible types. For example, casting a char* to an int* makes the same address be treated as the location of an int, but the conversion is undefined if the address is not suitably aligned for int, and accessing memory through the result is undefined if it violates the object's effective type. Because conversions between void* and other object pointer types happen implicitly in C, the language relies on programmer discipline, not the compiler, to keep such conversions correct.
Typed pointers improve type safety in two ways. The compiler checks pointer types at assignment and dereference, issuing diagnostics for mismatched types, and the language's strict aliasing rules prohibit accessing an object through a pointer of an incompatible type. Those rules specify that an object whose effective type is T may be accessed only through lvalues of a type compatible with T (or a few closely related types) or through a character type, which lets compilers assume that accesses through incompatible types do not overlap and optimize accordingly. Violations, such as dereferencing a float* that points to memory whose effective type is int, result in undefined behavior, highlighting the trade-off between pointer flexibility and safety.
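The sketch below shows two well-defined ways to work with untyped memory under these rules: a generic swap that accesses any object through unsigned char * (a character type, which the aliasing rules permit), and reading a float's bit pattern with memcpy instead of dereferencing a cast pointer. It assumes float is 32 bits wide, which a static assertion checks; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float_bits assumes a 32-bit float");

/* Generic swap through void pointers: the bytes are accessed via
   unsigned char *, which may alias an object of any type. */
void swap_bytes(void *a, void *b, size_t size) {
    unsigned char *pa = a;  /* void * converts implicitly in C */
    unsigned char *pb = b;
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}

/* Returns the bit pattern of a float. Reading *(uint32_t *)&f would access
   a float object through an incompatible type, violating strict aliasing;
   copying the bytes with memcpy is well defined. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On IEEE 754 platforms, float_bits(1.0f) yields 0x3F800000.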

Pointer Arithmetic and Manipulation

Pointer arithmetic refers to the operations that can be performed on pointers to navigate memory, primarily in languages like C where pointers directly represent addresses. These operations are constrained to preserve type safety and prevent arbitrary memory access, and the arithmetic is scaled by the size of the pointed-to type. For instance, incrementing a pointer to char advances the address by 1 byte, while incrementing a pointer to int (typically 4 bytes) advances it by 4 bytes. The scaling is automatic: for a pointer p of type T* and an integer n, the expression p + n yields an address n * sizeof(T) bytes beyond p, and p - n yields one n * sizeof(T) bytes before it. Valid operations include adding or subtracting an integer and subtracting two pointers of the same type. If p points to the element at index i of an array, p + n points to the element at index i + n of the same array. Subtracting pointers, p1 - p2, where both point into the same array, returns the difference of their indices as a ptrdiff_t value, not the raw byte difference. The relational operators <, >, <=, and >= are defined for pointers into the same array (including one past its end), allowing checks of relative position such as array bounds, while == and != can compare any two pointers of compatible type. For these purposes, a single object is treated as a one-element array. Multiplication and division involving pointers are not permitted, as they have no meaning in this context. A key equivalence in C is the decay of arrays to pointers: an expression of array type implicitly converts to a pointer to its first element, except in specific contexts such as the operand of sizeof or of the unary & operator. Decay enables seamless use of arrays in pointer arithmetic; if arr is an array of int, the expression arr + i points to the i-th element, and &arr[i] equals arr + i. For an array of type T[N], the resulting pointer has type T*.
This behavior underpins array indexing as pointer arithmetic: arr[i] is defined as *(arr + i). Pointer arithmetic is strictly limited to the elements of the same array object (plus the position one past its end) to avoid undefined behavior. Operations that would point outside those bounds, such as p - 1 when p points to the first element or p + n beyond one past the last element, result in undefined behavior, as does dereferencing the one-past-the-end pointer. Likewise, if the difference between two pointers cannot be represented in ptrdiff_t, the behavior is undefined. These constraints, as specified in the C standard (e.g., C17 6.5.6), ensure that pointer arithmetic remains meaningful only for navigating within an array.

```c
#include <stddef.h>

int main(void) {
    int arr[5] = {1, 2, 3, 4, 5};
    int *p = arr;                  // arr decays to &arr[0]
    p = p + 2;                     // now points to arr[2]; address advanced by 2 * sizeof(int)
    ptrdiff_t diff = (p + 3) - p;  // diff == 3 (index difference); p + 3 is one past the end, valid to form
    if (p < arr + 5) {             // valid comparison within the same array
        // bounds check passes
    }
    (void)diff;                    // silence the unused-variable warning
    return 0;
}
```

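This example illustrates scaling and the array-pointer equivalence. Assuming sizeof(int) == 4, the address held in arr + 1 is 4 bytes beyond &arr[0], so after p = p + 2 the address in p is 8 bytes beyond the start of the array.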
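The scaling rule can also be checked directly by comparing an element distance with the corresponding byte distance. A minimal sketch; byte_distance and element_at are hypothetical helpers, and both pointers passed to byte_distance must point into, or one past the end of, the same array.

```c
#include <stddef.h>

/* Number of bytes between two pointers into the same array. Converting to
   char * makes the subtraction count single bytes instead of elements. */
ptrdiff_t byte_distance(const int *from, const int *to) {
    return (const char *)to - (const char *)from;
}

/* Indexing written as explicit pointer arithmetic: same as base[i]. */
int element_at(const int *base, size_t i) {
    return *(base + i);
}
```

For an int array arr, (arr + 3) - arr is 3, while byte_distance(arr, arr + 3) is 3 * sizeof(int).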

Safety Mechanisms

Common Pointer Errors

One of the most prevalent pointer errors is dereferencing a null pointer, that is, attempting to access memory through a pointer whose value is the null pointer (written NULL or 0 in C and nullptr in C++, and represented as address zero on most platforms). The C and C++ standards make this undefined behavior; in practice it typically triggers a segmentation fault, because operating systems leave the lowest pages of the address space unmapped, so reads or writes there halt execution. For example, if a C pointer is initialized to NULL and then dereferenced without a check, as in *ptr = 42;, the program typically crashes at that assignment. Dangling pointers arise when a pointer references memory that has been freed or whose object has gone out of scope, leading to use-after-free bugs in which the program accesses invalid or reused memory. A common cause is failing to update the pointer after deallocation with a function such as free() in C, leaving it pointing to the now-invalid address. If the memory has since been reallocated, dereferencing the stale pointer can corrupt unrelated data or, when exploited, allow arbitrary code execution. Uninitialized pointers, sometimes termed wild pointers, hold arbitrary garbage addresses because they were never assigned a valid value. This error occurs when a pointer variable is used before initialization, for instance passed to a function or dereferenced: declaring int *p; without setting p = NULL; or assigning it allocated memory, and then using *p, accesses unpredictable memory and leads to erratic behavior. These errors commonly present as crashes such as segmentation faults, as garbage output from unintended memory reads, or as security vulnerabilities such as buffer overflows when pointer arithmetic exceeds allocated bounds. Buffer overflows in particular let attackers overwrite adjacent memory through mishandled pointers, enabling them to inject malicious code or escalate privileges in vulnerable applications.
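Two defensive habits against these errors can be sketched in C: checking for a null pointer before dereferencing, and clearing a pointer immediately after freeing it. The helper names are hypothetical. Clearing only protects the one pointer variable passed in; other copies of the same address still dangle.

```c
#include <stddef.h>
#include <stdlib.h>

/* Guards against null dereference: returns *p, or fallback if p is NULL. */
int read_or_default(const int *p, int fallback) {
    return (p != NULL) ? *p : fallback;
}

/* Frees the block and clears the caller's pointer. A later accidental use
   then dereferences NULL (usually a prompt crash) instead of freed memory,
   and a second call is harmless because free(NULL) does nothing. */
void free_and_clear(int **pp) {
    free(*pp);
    *pp = NULL;
}
```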

Techniques for Safer Pointers

To mitigate the common risks of raw pointers, such as memory leaks and dangling references, programming languages and tools provide mechanisms for safer pointer handling. These techniques emphasize automatic resource management, runtime verification, and compile-time checks that prevent errors while limiting the performance cost. In C++, smart pointers apply the Resource Acquisition Is Initialization (RAII) idiom to guarantee cleanup of dynamically allocated memory. std::unique_ptr provides exclusive ownership of an object and deletes it automatically when the pointer goes out of scope, preventing leaks from forgotten deallocations. std::shared_ptr enables shared ownership through reference counting, deleting the object only when the last owning pointer is destroyed, which is useful when several parts of a program share an object (reference cycles must still be broken, for example with std::weak_ptr). These types, introduced in C++11, replace raw owning pointers in modern codebases to enforce deterministic lifetime management. Bounds-checking tools and representations address spatial memory errors such as buffer overflows. AddressSanitizer (ASan), developed by Google, instruments code at compile time to detect out-of-bounds accesses, use-after-free, and other pointer-related violations at runtime with modest overhead, typically under a 2x slowdown. It is integrated into Clang and GCC and has aided debugging in large projects such as Chromium. Complementing such tools, fat pointers extend standard pointers with embedded bounds metadata (e.g., a start address and length), enabling hardware or software checks on every access; low-fat variants save space by encoding bounds in otherwise unused pointer bits on 64-bit systems, reducing overhead to about 1.1x on SPEC benchmarks while preserving compatibility. Certain languages incorporate pointer safety directly into their type systems.
Rust's borrow checker enforces an ownership model at compile time in which references are borrowed immutably or mutably under strict rules: at any moment there may be either one mutable borrow or any number of immutable borrows, which prevents data races and use-after-free errors without a garbage collector. This static analysis rejects unsafe aliasing and has proved effective for systems programming, with adoption in projects such as the Linux kernel. Java has no explicit pointers; instead, references are managed by automatic garbage collection, which traces reachable objects from roots (e.g., stack variables) and reclaims unreferenced memory, eliminating manual-deallocation risks such as dangling references and double frees. The JVM's generational collectors, such as G1, further optimize this for low pause times in production environments. Beyond language features, best practices and tools promote disciplined pointer use. Pointers should always be initialized at declaration to a null pointer (NULL in C, nullptr in C++) or a valid address to avoid dereferencing garbage values, a rule codified in secure coding standards such as the CERT C and C++ standards. Before dereferencing, code should check for null and, where applicable, validate bounds and ownership; static analysis tools that check CERT rules can flag uninitialized or unchecked pointers during compilation. These habits, combined with regular use of sanitizers, significantly reduce vulnerabilities in C and C++ codebases.
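The fat-pointer idea can be approximated in plain C by carrying a length alongside the address and checking it on every access. This is a software sketch with hypothetical names; it is not how AddressSanitizer or low-fat pointers are implemented, since those instrument compiled code or encode bounds in the pointer itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* A software "fat pointer": the address travels together with the bounds
   of the array it points into. Names are hypothetical. */
struct int_span {
    int *base;   /* first element */
    size_t len;  /* number of valid elements */
};

struct int_span span_of(int *base, size_t len) {
    struct int_span s = { base, len };
    return s;
}

/* Bounds-checked read: returns false instead of reading out of range. */
bool span_get(struct int_span s, size_t i, int *out) {
    if (s.base == NULL || i >= s.len)
        return false;
    *out = s.base[i];
    return true;
}

/* Bounds-checked write: returns false instead of writing out of range. */
bool span_set(struct int_span s, size_t i, int value) {
    if (s.base == NULL || i >= s.len)
        return false;
    s.base[i] = value;
    return true;
}
```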

Specialized Variants

Null and Dangling Pointers

In computer programming, a null pointer represents the intentional absence of a valid memory address, serving as a sentinel value indicating that the pointer does not refer to any object. In C, the NULL macro, defined in headers such as <stddef.h>, expands to an implementation-defined null pointer constant, typically the integer constant expression 0 or the expression (void*)0, which can be converted to any pointer type without a diagnostic. It allows uninitialized or invalid pointers to be marked explicitly, though its integer-based form has led to type-related issues in generic code. In Python, the None object serves as the equivalent null value: a singleton instance of type NoneType that denotes the lack of a value and is commonly returned from functions to signify failure or absence without raising an exception. Unlike C's NULL, None is a full object, a single shared instance that in recent CPython versions is immortal and unchanging for the lifetime of the program. Higher-level languages often extend this concept with optional types; for instance, Haskell's Maybe type represents an optional value as either Just a (containing a value of type a) or Nothing (indicating absence), so possible absence is handled safely through pattern matching rather than runtime errors. Dangling pointers, in contrast, arise unintentionally when a pointer references memory that is no longer valid or allocated to the program, leading to undefined behavior on access. On the stack, they occur when a pointer captures the address of a local variable and the function then returns: its stack frame is deallocated, and the address becomes invalid because the variable's lifetime has ended. For example, in C, returning a pointer to a local variable from a function creates a dangling pointer, since that stack memory is reused by subsequent calls, potentially causing data corruption or crashes. On the heap, dangling pointers result from explicit deallocation via free() or delete without nullifying the pointer, leaving it pointing to reclaimed memory that may be reallocated for other purposes.
Such heap-based issues persist beyond local scopes, because the freed memory remains addressable until it is reused, yet accessing it violates program invariants and can lead to security vulnerabilities (use-after-free) if exploited. In memory-managed environments that use reference counting, back pointers (references from objects back to their referrers) can create cycles that prevent automatic reclamation, keeping objects in memory even though they are unreachable from the program's roots. Pure reference-counting collectors cannot detect these cycles, since mutual references keep the counts above zero, so they leak memory unless augmented with a tracing mechanism. Conversely, when a cycle is broken (e.g., by nullifying a back pointer), the objects involved may be reclaimed, and any remaining non-owning pointers to them become dangling unless they are properly managed. Errors from null and dangling pointers, such as segmentation faults and use-after-free bugs, are common pitfalls in low-level languages but can be mitigated through disciplined initialization, for example by initializing every pointer and setting it to NULL after freeing it.
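As a minimal C sketch of these practices (the helper names make_greeting and free_and_null are hypothetical), the first function returns heap memory instead of the address of a local array, and the second frees through a double pointer so that the caller's pointer is reset to NULL rather than left dangling:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of "Hello, <name>!" or NULL on failure.
   Returning a pointer to a local array here would dangle once the
   function's stack frame is popped; heap storage outlives the call. */
char *make_greeting(const char *name)
{
    if (name == NULL)               /* NULL used as "no value" */
        return NULL;
    size_t len = strlen("Hello, !") + strlen(name) + 1;
    char *buf = malloc(len);
    if (buf == NULL)                /* malloc reports failure with NULL */
        return NULL;
    snprintf(buf, len, "Hello, %s!", name);
    return buf;
}

/* Frees *pp and nulls the caller's pointer so it cannot dangle.
   Precondition: pp points at a pointer variable. */
void free_and_null(char **pp)
{
    assert(pp != NULL);
    free(*pp);                      /* free(NULL) is a no-op */
    *pp = NULL;
}
```

Calling free_and_null(&p) twice is harmless, because the second call sees NULL and free(NULL) does nothing.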

Indirection and Structural Pointers

Indirection in pointers allows a pointer to reference another pointer, enabling layered access to data through successive dereferences. This concept, known as multiple indirection, is fundamental in languages like C, where a double pointer (e.g., char **) stores the address of a single pointer, which in turn points to a character. To access the underlying value, multiple dereference operations (*) are required, such as **ptr for a double pointer. Multiple indirection lets a function modify a caller's pointer variable and supports complex structures such as argument lists. A practical example appears in the command-line arguments of C's main function, where argv is declared as char **argv, forming an array of pointers to strings. Each element argv[i] is a char * pointing to the i-th argument string, allowing the program to process a variable number of string arguments passed at run time. Triple pointers (e.g., char ***) extend this further and are often used when a function must modify an array of pointers, such as when building dynamic multi-dimensional structures.

Autorelative pointers, also called self-relative pointers, store an offset relative to the pointer's own address rather than an absolute address, promoting relocatability without recompilation. This approach is particularly useful in position-independent code (PIC) and persistent memory systems, where the offset is computed as the difference between the target's address and the pointer's address, so the link remains valid when the region is mapped at a different address. In implementations for byte-addressable persistent memory, self-relative pointers avoid the need to rewrite pointers during loading, reducing overhead in virtualized or distributed environments.

Based pointers operate by adding an offset to a base stored in a register, a mechanism prevalent in segmented memory models such as that of the Intel x86 architecture.
In this model, memory is divided into segments, each defined by a base address, and pointers consist of a segment selector plus an offset, allowing efficient addressing within large address spaces while providing protection boundaries. This structure was key in early protected-mode systems, where segment registers hold selectors that locate each segment's base in a descriptor table, and offsets allow sparse allocation without contiguous physical memory.

Arrays of pointers provide a flexible way to simulate multi-dimensional arrays, particularly ones whose rows vary in length. In C, a 2D array can be represented as an array of pointers (T **), where the first dimension is an array of pointers, each pointing to a separate 1D array for one row. This allocation strategy (first allocating the pointer array, then each row separately) saves space for non-rectangular data and allows rows to be resized independently, in contrast with a contiguous 2D block. Each level is allocated dynamically with malloc, enabling run-time adaptability in applications such as matrix processing or graph representations.
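The following C sketch, using hypothetical helpers make_jagged and free_jagged, builds such a jagged array. A triple pointer (int ***out) lets the function store the newly allocated int ** in the caller's variable, and each row is a separate allocation of its own length:

```c
#include <assert.h>
#include <stdlib.h>

/* Builds a jagged 2D array: `rows` row pointers, where row i points to
   its own zero-initialized block of lens[i] ints. The result is written
   through a triple pointer so the caller's int ** is modified.
   Returns 0 on success, -1 on allocation failure. */
int make_jagged(int ***out, const size_t *lens, size_t rows)
{
    assert(out != NULL && (lens != NULL || rows == 0));
    int **table = malloc(rows * sizeof *table);   /* level 1: row pointers */
    if (table == NULL && rows != 0)
        return -1;
    for (size_t i = 0; i < rows; i++) {
        table[i] = calloc(lens[i], sizeof **table); /* level 2: one row */
        if (table[i] == NULL && lens[i] != 0) {
            while (i-- > 0)                         /* undo earlier rows */
                free(table[i]);
            free(table);
            return -1;
        }
    }
    *out = table;
    return 0;
}

/* Releases each row first, then the array of row pointers. */
void free_jagged(int **table, size_t rows)
{
    for (size_t i = 0; i < rows; i++)
        free(table[i]);
    free(table);
}
```

For example, with lengths {1, 3, 2}, t[1] holds three elements while t[0] holds one, and each row lives in its own heap block.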

Function and Control Pointers

Function pointers enable indirect invocation of functions by storing their memory addresses, allowing code to be selected and executed at run time. In the C programming language, a function pointer is declared with syntax such as int (*fp)(int), where fp points to a function that accepts an int argument and returns an int. This construct supports callbacks, in which a function is passed as an argument to another function for later invocation; for example, the qsort library function accepts a comparison function pointer int (*compar)(const void *, const void *) to customize sorting behavior. Similarly, signal handling uses function pointers such as void (*handler)(int) to register routines that run when specific events occur, such as an interrupt signal (SIGINT).

Dynamic loading extends function pointers by allowing code to be loaded and invoked at run time without recompilation. In systems supporting shared libraries, a library is opened with dlopen, and dlsym retrieves the address of a named symbol, which can then be used as a function pointer. This mechanism is essential for plugin architectures, where callbacks from loaded libraries enable extensible behavior, such as registering event handlers in graphical user interfaces or extending application functionality.

Control tables, often implemented as arrays of function or code pointers, enable efficient branching in program control flow. Jump tables, for instance, optimize switch statements by mapping case values to indices in an array of code addresses, enabling constant-time dispatch through an indirect jump after a table lookup. Compilers generate such tables for dense, consecutive case labels to avoid long chains of conditional branches. In hardware, interrupt vector tables serve a similar role, storing pointers to interrupt service routines (ISRs) indexed by interrupt number; when a hardware interrupt occurs, the processor loads the corresponding pointer into the program counter and begins executing the handler.
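A short C sketch of the callback and table-dispatch ideas: the standard qsort function receives a comparison callback, and a hand-written table of function pointers mimics what a compiler-generated jump table does. The names compare_ints, sort_ints, run_op and the opcode numbering are hypothetical; the bounds check in run_op shows how validating an index keeps a bad value from selecting a wild pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback with the signature qsort expects. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);          /* avoids overflow of x - y */
}

void sort_ints(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    qsort(v, n, sizeof *v, compare_ints);   /* pass the function pointer */
}

/* Dispatch table: an opcode indexes an array of function pointers. */
static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

typedef int (*binop_fn)(int, int);
static const binop_fn dispatch[] = { op_add, op_sub, op_mul };

/* Stores the result and returns 0, or returns -1 for an unknown opcode. */
int run_op(size_t opcode, int a, int b, int *result)
{
    assert(result != NULL);
    if (opcode >= sizeof dispatch / sizeof dispatch[0])
        return -1;                     /* reject out-of-range index */
    *result = dispatch[opcode](a, b);  /* indirect call */
    return 0;
}
```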
This pointer-based dispatch ensures low-latency response in embedded and operating systems.

Wild branches arise when function pointers are corrupted, leading to unpredictable control-flow transfers that pose significant security risks. Attackers exploiting memory vulnerabilities such as buffer overflows can overwrite function pointers to redirect execution to malicious code, bypassing intended program paths. Such indirect-branch hijacking undermines control-flow integrity and can enable privilege escalation or data exfiltration; for example, altering a callback pointer used by a library function could invoke unauthorized routines. Calling through uninitialized or invalid function pointers (wild pointers) can also cause crashes or arbitrary code execution, amplifying the risks in unsafe languages like C.

In object-oriented design, back pointers provide references from objects to their owning container or parent, enabling bidirectional navigation in hierarchical structures. They allow efficient traversal, such as querying an object's parent for context or updating parent state when a child changes, without relying solely on forward references. In verification disciplines, back pointers are formalized with invariants that keep them consistent with the forward links, preventing dangling references during object lifecycle management. This pattern is common in graph-based designs such as scene graphs in graphics systems, where back pointers support operations such as propagating deletions or updates from leaves toward the root.
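A minimal C sketch of back pointers in a tree, assuming a hypothetical struct node with first-child/next-sibling forward links: add_child sets the forward and back pointers together so they stay consistent, depth walks back pointers to the root, and add_upward propagates a change from a node toward the root.

```c
#include <assert.h>
#include <stddef.h>

/* Tree node with a back pointer to its parent (NULL at the root). */
struct node {
    struct node *parent;
    struct node *first_child;
    struct node *next_sibling;
    int value;
};

/* Links child under parent, setting forward and back pointers together. */
void add_child(struct node *parent, struct node *child)
{
    assert(parent != NULL && child != NULL);
    child->parent = parent;                  /* back pointer */
    child->next_sibling = parent->first_child;
    parent->first_child = child;             /* forward pointer */
}

/* Follows back pointers from a node up to the root. */
size_t depth(const struct node *n)
{
    size_t d = 0;
    while (n->parent != NULL) {
        n = n->parent;
        d++;
    }
    return d;
}

/* Propagates an update upward: adds delta to n and every ancestor. */
void add_upward(struct node *n, int delta)
{
    for (; n != NULL; n = n->parent)
        n->value += delta;
}
```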

Simulation Methods

Array Index Simulation

In languages without native pointer support, such as Fortran 77, array indices can emulate basic pointer functionality by treating a fixed-size array as a contiguous memory block, where the index serves as an offset from the base address to access elements. This simulates pointer arithmetic, enabling traversal and manipulation of data as if using a pointer to reference specific locations within the array. For instance, to implement a simple linked list, one array can store the node data, with an additional array of indices acting as "pointers" to the next node; a value of 0 or -1 typically denotes the end of the list. This approach substitutes safe, integer-based offsets for direct addresses, avoiding raw pointer operations. The EQUIVALENCE statement in Fortran extends this simulation by allowing multiple variables or arrays to overlay the same storage, effectively creating aliases that mimic pointer-based reinterpretation of memory. For example, an integer variable and a real variable can be equivalenced to share the same storage, permitting type punning similar to casting a pointer to a different type. A practical illustration overlays two arrays for storage reuse:

```fortran
      DIMENSION IARRAY(100), RARRAY(50)
      EQUIVALENCE (IARRAY(1), RARRAY(1))
```

Here, IARRAY and RARRAY begin at the same storage location. Because default INTEGER and REAL values each occupy one numeric storage unit in Fortran 77, the 50-element RARRAY overlays the first half of IARRAY, and writing through one name changes the data seen through the other. This technique was commonly used in Fortran 77 to simulate union-like structures or to reuse a shared workspace, since the language had no native support for such features.

Despite these capabilities, array index simulation has significant limitations compared with true pointers. It provides no genuine dynamic allocation beyond the array's fixed bounds, requiring pre-declared maximum sizes that may waste memory or overflow if exceeded. Complex structures such as trees or graphs need multiple parallel arrays for data and links, complicating management without the flexibility of pointer reassignment or polymorphism. Moreover, unlike pointers, indices cannot easily cross array boundaries or support run-time resizing without recompilation or language extensions.

A key safety benefit comes from the array-based nature of this method: many compilers can bounds-check indices, trapping out-of-bounds accesses at run time and preventing the memory corruption or segmentation faults common with unchecked pointer arithmetic. For example, Fortran compilers such as those from Sun often include options to validate subscripts against declared array limits, enforcing safer access than raw offsets in pointer-based systems. This reduces overflow vulnerabilities and promotes more reliable code in environments without hardware-level protection.

Higher-Level Abstractions

In higher-level programming paradigms, pointers are often abstracted through mechanisms that simulate their functionality while improving safety and usability, particularly where direct memory manipulation is restricted or discouraged. These abstractions, such as references and handles, provide indirect access to objects without exposing raw addresses or pointer arithmetic, reducing risks such as null dereferences and dangling pointers. In C++, references serve as aliases to existing objects, binding directly to the referent without an address-of operator or explicit dereferencing; because they cannot be reseated after initialization and cannot legitimately be null, they are safer than raw pointers for many uses. Unlike pointers, which support arithmetic and an optional null state, references must be bound when they are initialized, which removes the need for null checks in many scenarios and yields cleaner function parameters and object passing. This design choice aligns with the C++ Core Guidelines, which recommend passing by reference rather than by pointer when a null value is not a valid argument and ownership is not being transferred.

Handles and iterators further abstract pointer-like behavior by wrapping memory access in safer, more generic constructs. In the C++ Standard Template Library (STL), iterators generalize pointers to enable uniform traversal and manipulation across data structures such as vectors and lists, supporting operations such as increment, dereference, and comparison without exposing raw addresses. For instance, a random-access iterator such as std::vector<T>::iterator supports pointer-like offset arithmetic, and some standard library implementations add bounds checking to it in debug builds to catch out-of-range access. Similarly, in Component Object Model (COM) programming on Windows, smart handles such as CComPtr encapsulate interface pointers with reference counting via AddRef and Release, ensuring proper lifetime management and reducing leaks without manual intervention.
These wrappers, derived from base classes that handle COM-specific querying and casting, hide the complexity of raw IUnknown pointers while remaining compatible with legacy APIs.

Proxy objects in languages like Python simulate pointer indirection through object references, which are opaque handles to heap-allocated instances managed by the interpreter's memory manager, without giving programmers direct access to memory addresses or arithmetic. Every variable in Python holds a reference to an object (essentially a pointer under the hood), but this is abstracted by name binding: assignment binds a name to an existing object rather than copying it, so a mutable object such as a list can be modified through any name that refers to it. This approach avoids exposing pointers entirely, as the Python Virtual Machine (PVM) handles all dereferencing and deallocation, preventing common low-level errors while still supporting dynamic behavior. For weak references, the weakref module provides non-owning references and proxies that do not increase an object's reference count, so the object can be reclaimed while still weakly referenced; this helps avoid reference cycles and gives controlled, pointer-like lifetimes.

These higher-level abstractions trade enhanced safety for potential performance overhead, as extra layers of indirection and run-time checks can increase execution time compared with raw pointers. For example, C++ wrappers usually cost little in optimized builds but may add run-time validation in debug modes, while COM smart handles automate reference counting at the cost of a small overhead for each AddRef and Release call. In managed environments such as the Java Virtual Machine (JVM) and the Common Language Runtime (CLR), object references abstract pointers as opaque handles with automatic memory management, yet features such as array bounds checking and garbage collection add run-time overhead, balancing safety against efficiency in production workloads. The abstraction is incomplete in the sense that these virtual machines still use raw pointers internally for optimization; they are merely hidden from developers to prioritize safety.
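The reference counting that CComPtr automates can be sketched in C, where there are no destructors and the retain/release calls must be made by hand. This is a hypothetical object loosely modeled on the AddRef/Release protocol, not the actual COM interfaces:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object (not the real COM API). */
struct counted {
    unsigned refs;
    int payload;
};

struct counted *counted_new(int payload)
{
    struct counted *c = malloc(sizeof *c);
    if (c != NULL) {
        c->refs = 1;            /* the creator owns the first reference */
        c->payload = payload;
    }
    return c;
}

/* A new owner takes a reference (analogous to AddRef). */
struct counted *counted_retain(struct counted *c)
{
    assert(c != NULL);
    c->refs++;
    return c;
}

/* Drops a reference (analogous to Release) and frees the object when
   the last one goes away. Returns the remaining count. */
unsigned counted_release(struct counted *c)
{
    assert(c != NULL && c->refs > 0);
    unsigned left = --c->refs;
    if (left == 0)
        free(c);
    return left;
}
```

Two such objects holding counted references to each other would never reach zero, which is exactly the cycle problem that pure reference-counting collectors cannot resolve.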

Language Implementations

Low-Level Languages

In low-level languages such as C and assembly, pointers provide direct manipulation of memory addresses, enabling efficient but error-prone access to data structures. In C, pointers are declared using an asterisk (*) to denote the pointer type, as in int *ptr;, which declares a pointer to an integer. The address-of operator (&) obtains the memory address of a variable, allowing initialization such as ptr = &variable;. Dereferencing with * accesses the value at that address, so *ptr = 42; modifies the original variable. The C23 standard (ISO/IEC 9899:2024) introduces the nullptr keyword, a null pointer constant of type nullptr_t that converts implicitly to any pointer type and, unlike the integer constant 0, is not itself an integer, improving type safety compared with NULL. Arrays in C decay to pointers to their first element in most expressions, which enables pointer arithmetic for traversal; for example, after int arr[5]; int *p = arr;, p points to arr[0], and p[2] is equivalent to *(p + 2). Strings are handled as character arrays or as pointers to null-terminated sequences: in char *str = "hello";, str points to the first character, and the null terminator '\0' marks the end (modifying a string literal through such a pointer is undefined behavior). C leaves several pointer operations undefined, including dereferencing a null pointer and computing a pointer outside the bounds of an array object (other than one past its last element), which can lead to unpredictable crashes or incorrect results.

C++ builds on C's pointer syntax with extensions for object-oriented features, including pointer-to-member operators for accessing class members. A pointer to a data member is declared as int Class::*pm = &Class::member; and used via obj.*pm for objects or ptr->*pm for pointers to objects. Pointers to member functions work similarly, e.g., void (Class::*pf)() = &Class::func;, invoked as (obj.*pf)(), enabling indirect calls to methods selected at run time. Const qualifiers add safety: const int *p points to a constant integer (the pointer can change, but the target cannot be modified through it), int *const p is a constant pointer (the address is fixed, but the target is modifiable), and const int *const p combines both.
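The C basics described above can be seen together in a small sketch; the function names are hypothetical, and the code uses only standard C. It walks a pointer through an array, writes through a pointer to modify the caller's variable, and measures a string by pointer difference:

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints by advancing a pointer from the first element to one
   past the last. `const int *` promises not to modify the elements. */
long sum_array(const int *p, size_t n)
{
    assert(p != NULL || n == 0);
    long total = 0;
    for (const int *end = p + n; p < end; p++)  /* pointer arithmetic */
        total += *p;                             /* dereference */
    return total;
}

/* Writes through a pointer to modify the caller's variable. */
void set_to(int *target, int value)
{
    assert(target != NULL);
    *target = value;
}

/* Counts characters up to the null terminator, like strlen. */
size_t my_strlen(const char *s)
{
    const char *q = s;
    while (*q != '\0')
        q++;
    return (size_t)(q - s);    /* difference of pointers into one array */
}
```

For instance, after int arr[5] = {1, 2, 3, 4, 5}; int *p = arr;, both p[2] and *(p + 2) yield 3, and sum_array(arr, 5) adds all five elements.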
In assembly languages such as x86, pointers are implemented through register-based addressing modes, in which registers such as %rbp (base pointer) or %rsp (stack pointer) hold memory addresses directly. In AT&T syntax, an instruction such as movl (%rbp), %eax loads the value at the address held in %rbp into %eax, effectively dereferencing a pointer, while displacements enable offset access; for example, movl -4(%rbp), %eax reads the 4-byte value located four bytes below the address in %rbp. Higher-level languages can integrate assembly through inline directives; in GCC-extended C/C++, asm("mov %1, %0" : "=r"(dest) : "r"(src)) passes operands through registers, and memory constraints such as "m"(*ptr) give the assembly direct access to the object a C pointer refers to. This low-level control is essential for systems programming but demands careful management to avoid segmentation faults from invalid addresses.

High-Level and Managed Languages

In high-level and managed programming languages, explicit pointers are often absent or heavily restricted to promote memory safety and abstraction from low-level memory management. These languages typically rely on automatic garbage collection or ownership models to handle allocation and deallocation, reducing the risks associated with direct pointer manipulation, such as dangling references and buffer overflows. Instead of raw pointers, they use higher-level constructs such as object references or smart pointers that enforce safety invariants at compile time or run time. This approach reflects a broader trend toward safer systems programming, in which borrow checkers or run-time checks prevent common pointer-related errors, usually at little performance cost.

Java exemplifies this paradigm by omitting explicit pointers in favor of object references, which are opaque handles managed by the Java Virtual Machine (JVM). All non-primitive values are accessed through references to objects on the heap, and the garbage collector automatically reclaims objects that are no longer reachable. Developers cannot perform arithmetic on references or access raw memory addresses, which fosters safer code. For interoperability with native code, however, the Java Native Interface (JNI) allows unsafe access to pointers in C/C++ libraries, where developers must manage memory manually to avoid crashes or security vulnerabilities.

In dynamically typed languages such as Python and Perl, references serve as automatic, implicit pointers to objects, abstracting away direct memory addressing while still allowing limited introspection. In Python, every value is an object accessed by reference, and the built-in id() function returns an integer that is unique among simultaneously existing objects (in CPython, the object's memory address); it is meant for identity checks, not for pointer-like operations.
Perl's references are scalar values that point to other data structures such as arrays or hashes, enabling complex data manipulation without exposing raw addresses; they are dereferenced with explicit operators and are reference-counted, so referenced data is freed automatically once no references remain. These mechanisms prioritize ease of use over low-level control, with id() or similar functions offering only a glimpse of underlying addresses without enabling unsafe manipulation.

Rust introduces a hybrid model, providing safe alternatives to pointers through its ownership and borrowing system while reserving raw pointers for exceptional cases. Borrowed references (&T and &mut T) and owning smart pointers such as Box<T> act as safe, compile-time-checked pointers; the borrow checker prevents data races and use-after-free errors in safe code without runtime overhead. Raw pointers (*const T and *mut T) can be created in safe code, but dereferencing them requires an explicit unsafe block, where the programmer assumes responsibility for correctness, typically when interfacing with C code or optimizing performance-critical sections. This design fills gaps in memory-safe systems programming left by older languages, enabling low-level code with pointer-like efficiency under strict safety guarantees.

Other managed languages offer similar opt-in unsafe facilities for low-level needs. Go's unsafe package provides the unsafe.Pointer type for converting between pointer types or, via uintptr, between pointers and integers, bypassing type checks; its use is discouraged and limited to specific patterns such as system calls, with the garbage collector handling most memory automatically. Swift includes UnsafePointer<T> and UnsafeMutablePointer<T> types for direct memory access, typically in performance-sensitive or C-interfacing code, where the programmer is responsible for initialization, lifetime, and bounds. Even legacy languages such as COBOL and PL/I provide limited pointer support: COBOL uses USAGE IS POINTER for data items passed in procedure calls or used with dynamic allocation, while PL/I supports pointer variables with arithmetic for BASED structures, though both favor structured data access over unrestricted pointer use. These features reflect a broader shift in high-level languages toward safer abstractions, with unsafe options as deliberate escapes for specialized requirements.
