Function (computer programming)
In computer programming, a function (also procedure, method, subroutine, routine, or subprogram) is a callable unit[1] of software logic that has a well-defined interface and behavior and can be invoked multiple times.
Callable units provide a powerful programming tool.[2] The primary purpose is to allow for the decomposition of a large and/or complicated problem into chunks that have relatively low cognitive load and to assign the chunks meaningful names (unless they are anonymous). Judicious application can reduce the cost of developing and maintaining software, while increasing its quality and reliability.[3]
Callable units are present at multiple levels of abstraction in the programming environment. For example, a programmer may write a function in source code that is compiled to machine code that implements similar semantics. There is a callable unit in the source code and an associated one in the machine code, but they are different kinds of callable units – with different implications and features.
Terminology
Some programming languages, such as COBOL and BASIC, make a distinction between functions that return a value (typically called "functions") and those that do not (typically called "subprogram", "subroutine", or "procedure"); some, such as C, C++, and Rust, only use the term "function" irrespective of whether they return a value or not; others, such as ALGOL 60 and PL/I, only use the word procedure. Some object-oriented languages, such as Java and C#, refer to functions inside classes as "methods".
History
The idea of a callable unit was initially conceived by John Mauchly and Kathleen Antonelli during their work on ENIAC and recorded in a January 1947 Harvard symposium on "Preparation of Problems for EDVAC-type Machines."[4] Maurice Wilkes, David Wheeler, and Stanley Gill are generally credited with the formal invention of this concept, which they termed a closed sub-routine,[5][6] contrasted with an open subroutine or macro.[7] However, Alan Turing had discussed subroutines in a paper of 1945 on design proposals for the NPL ACE, going so far as to invent the concept of a return address stack.[8]
The idea of a subroutine was worked out after computing machines had already existed for some time. The arithmetic and conditional jump instructions were planned ahead of time and have changed relatively little, but the special instructions used for procedure calls have changed greatly over the years. The earliest computers, such as the Manchester Baby, and some early microprocessors, such as the RCA 1802, did not have a single subroutine call instruction. Subroutines could be implemented, but they required programmers to use the call sequence—a series of instructions—at each call site.
A single subroutine was implemented in 1945 in Konrad Zuse's Z4 in the form of a tape.[9]
In 1945, Alan Turing used the terms "bury" and "unbury" as a means of calling and returning from subroutines.[10][11]
In January 1947 John Mauchly presented general notes at A Symposium of Large Scale Digital Calculating Machinery under the joint sponsorship of Harvard University and the Bureau of Ordnance, United States Navy. Here he discusses serial and parallel operation suggesting:
...the structure of the machine need not be complicated one bit. It is possible, since all the logical characteristics essential to this procedure are available, to evolve a coding instruction for placing the subroutines in the memory at places known to the machine, and in such a way that they may easily be called into use.
In other words, one can designate subroutine A as division and subroutine B as complex multiplication and subroutine C as the evaluation of a standard error of a sequence of numbers, and so on through the list of subroutines needed for a particular problem. ... All these subroutines will then be stored in the machine, and all one needs to do is make a brief reference to them by number, as they are indicated in the coding.[4]
Kay McNulty had worked closely with John Mauchly on the ENIAC team and developed an idea for subroutines for the ENIAC computer she was programming during World War II.[12] She and the other ENIAC programmers used the subroutines to help calculate missile trajectories.[12]
Goldstine and von Neumann wrote a paper dated 16 August 1948 discussing the use of subroutines.[13]
Some very early computers and microprocessors, such as the IBM 1620, the Intel 4004 and Intel 8008, and PIC microcontrollers, have a single-instruction subroutine call that uses a dedicated hardware stack to store return addresses; such hardware supports only a few levels of subroutine nesting, but can support recursive subroutines. Machines before the mid-1960s, such as the UNIVAC I, the PDP-1, and the IBM 1130, typically use a calling convention which saved the instruction counter in the first memory location of the called subroutine. This allows arbitrarily deep levels of subroutine nesting but does not support recursive subroutines. The IBM System/360 had a subroutine call instruction that placed the saved instruction counter value into a general-purpose register; with additional code, this can be used to support arbitrarily deep subroutine nesting and recursive subroutines. The Burroughs B5000[14] (1961) is one of the first computers to store subroutine return data on a stack.
The DEC PDP-6[15] (1964) is one of the first accumulator-based machines to have a subroutine call instruction that saved the return address in a stack addressed by an accumulator or index register. The later PDP-10 (1966), PDP-11 (1970) and VAX-11 (1976) lines followed suit; this feature also supports both arbitrarily deep subroutine nesting and recursive subroutines.[16]
Language support
In the very early assemblers, subroutine support was limited. Subroutines were not explicitly separated from each other or from the main program, and indeed the source code of a subroutine could be interspersed with that of other subprograms. Some assemblers would offer predefined macros to generate the call and return sequences. By the 1960s, assemblers usually had much more sophisticated support for both inline and separately assembled subroutines that could be linked together.
One of the first programming languages to support user-written subroutines and functions was FORTRAN II. The IBM FORTRAN II compiler was released in 1958. ALGOL 58 and other early programming languages also supported procedural programming.
Libraries
Even with this cumbersome approach, subroutines proved very useful. They allowed the use of the same code in many different programs. Memory was a very scarce resource on early computers, and subroutines allowed significant savings in the size of programs.
Many early computers loaded the program instructions into memory from a punched paper tape. Each subroutine could then be provided by a separate piece of tape, loaded or spliced before or after the main program (or "mainline"[17]); and the same subroutine tape could then be used by many different programs. A similar approach was used in computers that loaded program instructions from punched cards. The name subroutine library originally meant a library, in the literal sense, which kept indexed collections of tapes or decks of cards for collective use.
Return by indirect jump
To remove the need for self-modifying code, computer designers eventually provided an indirect jump instruction, whose operand, instead of being the return address itself, was the location of a variable or processor register containing the return address.
On those computers, instead of modifying the function's return jump, the calling program would store the return address in a variable so that when the function completed, it would execute an indirect jump that would direct execution to the location given by the predefined variable.
Jump to subroutine
Another advance was the jump to subroutine instruction, which combined the saving of the return address with the calling jump, thereby minimizing overhead significantly. At least five forms of this existed in the 1950s.
- Return jump: store the return address in the first word of the subroutine. Return Jump (RJ) on the UNIVAC 1103A[18] is an early example.
- Index jump: store the return address in an index register. Transfer and Set indeX (TSX) on the IBM 704[19] is an early example.
- Automatic storage into a dedicated storage location: the Store P (STP) location on the RCA 501[20] is an early example.
- Automatic storage into a designated register: The sequence and cosequence history registers (SH and CSH)[21] on the Honeywell 800 is an early example.
- Patch return address: The "return address" instruction adds one to the contents of the program counter register and patches the address portion of a branch instruction at the last word of the subroutine with the caller's return address. The "return address" instruction is followed immediately by a branch instruction to the beginning of the subroutine. The LGP-30[22] is an example.
The IBM System/360 architecture is an example of saving the return address into a designated register. The branch instructions BAL or BALR, designed for procedure calling, would save the return address in a processor register specified in the instruction, by convention register 14. To return, the subroutine had only to execute an indirect branch instruction (BR) through that register. If the subroutine needed that register for some other purpose (such as calling another subroutine), it would save the register's contents to a private memory location or a register stack.
The HP 2100 architecture is an example of saving the return address in the first word of the subroutine. The JSB instruction would save the return address in the memory location that was the target of the branch. Execution of the procedure would begin at the next memory location. In the HP 2100 assembly language, one would write, for example
...
JSB MYSUB (Calls subroutine MYSUB.)
BB ... (Will return here after MYSUB is done.)
to call a subroutine called MYSUB from the main program. The subroutine would be coded as
MYSUB NOP (Storage for MYSUB's return address.)
AA ... (Start of MYSUB's body.)
...
JMP MYSUB,I (Returns to the calling program.)
The JSB instruction placed the address of the NEXT instruction (namely, BB) into the location specified as its operand (namely, MYSUB), and then branched to the NEXT location after that (namely, AA = MYSUB + 1). The subroutine could then return to the main program by executing the indirect jump JMP MYSUB, I which branched to the location stored at location MYSUB. Compilers for Fortran and other languages could easily make use of these instructions when available. This approach supported multiple levels of calls; however, since the return address, parameters, and return values of a subroutine were assigned fixed memory locations, it did not allow for recursive calls.
Incidentally, a similar method was used by Lotus 1-2-3, in the early 1980s, to discover the recalculation dependencies in a spreadsheet. Namely, a location was reserved in each cell to store the return address. Since circular references are not allowed for natural recalculation order, this allows a tree walk without reserving space for a stack in memory, which was very limited on small computers such as the IBM PC.
Call stack
Most modern implementations of a function call use a call stack, a special case of the stack data structure, to implement function calls and returns. Each procedure call creates a new entry, called a stack frame, at the top of the stack; when the procedure returns, its stack frame is deleted from the stack, and its space may be used for other procedure calls. Each stack frame contains the private data of the corresponding call, which typically includes the procedure's parameters and internal variables, and the return address.
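As a rough, architecture-independent illustration (a sketch with invented names, not taken from the article), the following C++ program prints the address of a local variable at several nesting depths; each activation gets its own stack frame, so the addresses differ and typically move in a single direction as the stack grows:

#include <cstdio>

// Each call to depth() gets its own stack frame, so the local variable
// 'local' lives at a different address in every activation.
void depth(int level) {
    int local = level;
    std::printf("level %d: local variable at %p\n", level, (void*)&local);
    if (level < 3) {
        depth(level + 1);   // pushes another frame onto the call stack
    }                       // returning from here pops this frame
}

int main() {
    depth(0);
    return 0;
}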
The call sequence can be implemented by a sequence of ordinary instructions (an approach still used in reduced instruction set computing (RISC) and very long instruction word (VLIW) architectures), but many traditional machines designed since the late 1960s have included special instructions for that purpose.
The call stack is usually implemented as a contiguous area of memory. It is an arbitrary design choice whether the bottom of the stack is the lowest or highest address within this area, so that the stack may grow forwards or backwards in memory; however, many architectures chose the latter.[citation needed]
Some designs, notably some Forth implementations, used two separate stacks, one mainly for control information (like return addresses and loop counters) and the other for data. The former was, or worked like, a call stack and was only indirectly accessible to the programmer through other language constructs while the latter was more directly accessible.
When stack-based procedure calls were first introduced, an important motivation was to save precious memory.[23][page needed] With this scheme, the compiler does not have to reserve separate space in memory for the private data (parameters, return address, and local variables) of each procedure. At any moment, the stack contains only the private data of the calls that are currently active (namely, which have been called but haven't returned yet). Because of the ways in which programs were usually assembled from libraries, it was (and still is) not uncommon to find programs that include thousands of functions, of which only a handful are active at any given moment.[citation needed] For such programs, the call stack mechanism could save significant amounts of memory. Indeed, the call stack mechanism can be viewed as the earliest and simplest method for automatic memory management.
However, another advantage of the call stack method is that it allows recursive function calls, since each nested call to the same procedure gets a separate instance of its private data.
In a multi-threaded environment, there is generally more than one stack.[24] An environment that fully supports coroutines or lazy evaluation may use data structures other than stacks to store their activation records.
Delayed stacking
One disadvantage of the call stack mechanism is the increased cost of a procedure call and its matching return.[clarification needed] The extra cost includes incrementing and decrementing the stack pointer (and, in some architectures, checking for stack overflow), and accessing the local variables and parameters by frame-relative addresses, instead of absolute addresses. The cost may be realized in increased execution time, or increased processor complexity, or both.
This overhead is most obvious and objectionable in leaf procedures or leaf functions, which return without making any procedure calls themselves.[25][26] To reduce that overhead, many modern compilers try to delay the use of a call stack until it is really needed.[citation needed] For example, the call of a procedure P may store the return address and parameters of the called procedure in certain processor registers, and transfer control to the procedure's body by a simple jump. If the procedure P returns without making any other call, the call stack is not used at all. If P needs to call another procedure Q, it will then use the call stack to save the contents of any registers (such as the return address) that will be needed after Q returns.
Features
In general, a callable unit is a list of instructions that, starting at the first instruction, executes sequentially except as directed via its internal logic. It can be invoked (called) many times during the execution of a program. Execution continues at the next instruction after the call instruction when it returns control.
Implementations
The features of implementations of callable units evolved over time and vary by context. This section describes features of the various common implementations.
General characteristics
Most modern programming languages provide features to define and call functions, including syntax to:
- Delimit the implementation of a function from the rest of the program
- Assign an identifier (name) to a function
- Define formal parameters with a name and data type for each
- Assign a data type to the return value, if any
- Specify a return value in the function body
- Call a function
- Provide actual parameters that correspond to a called function's formal parameters
- Return control to the caller at the point of call
- Consume the return value in the caller
- Dispose of the values returned by a call
- Provide a private naming scope for variables
- Identify variables outside the function that are accessible within it
- Propagate an exceptional condition out of a function and handle it in the calling context
- Package functions into a container such as module, library, object, or class
Naming
Some languages, such as Pascal, Fortran, Ada and many dialects of BASIC, use a different name for a callable unit that returns a value (function or subprogram) vs. one that does not (subroutine or procedure).
Other languages, such as C, C++, C# and Lisp, use only one name for a callable unit, function. The C-family languages use the keyword void to indicate no return value.
Call syntax
If declared to return a value, a call can be embedded in an expression in order to consume the return value. For example, a square root callable unit might be called like y = sqrt(x).
A callable unit that does not return a value is called as a stand-alone statement like print("hello"). This syntax can also be used for a callable unit that returns a value, but the return value will be ignored.
Some older languages require a keyword for calls that do not consume a return value, like CALL print("hello").
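A minimal C++ sketch of both call forms, with std::sqrt and std::printf standing in for the hypothetical sqrt and print used above:

#include <cmath>
#include <cstdio>

int main() {
    double x = 2.0;
    double y = std::sqrt(x);   // value-returning call embedded in an expression
    std::printf("%f\n", y);    // stand-alone call; printf's return value (a character count) is ignored
    return 0;
}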
Parameters
Most implementations, especially in modern languages, support parameters which the callable declares as formal parameters. A caller passes actual parameters, a.k.a. arguments, to match. Different programming languages provide different conventions for passing arguments.
| Convention | Description | Used in |
|---|---|---|
| by value | A copy of the argument is passed | Default in most Algol-like languages after Algol 60, such as Pascal, Delphi, Simula, CPL, PL/M, Modula, Oberon, Ada, and many others including C, C++ and Java |
| by reference | A reference to the argument is passed; typically its address | Selectable in most Algol-like languages after Algol 60, such as Algol 68, Pascal, Delphi, Simula, CPL, PL/M, Modula, Oberon, Ada, and many others including C++, Fortran, PL/I |
| by result | The value computed during the call is copied to the argument on return | Ada OUT parameters |
| by value-result | A copy of the argument is passed in and the value computed during the call is copied to the argument on return | Algol, Swift in-out parameters |
| by name | Like a macro – replace the parameters with the unevaluated argument expressions, then evaluate the argument in the context of the caller every time that the callable uses the parameter | Algol, Scala |
| by constant value | Like by-value except that the parameter is treated as a constant | PL/I NONASSIGNABLE parameters, Ada IN parameters |
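A rough C++ sketch of three of these conventions follows; C++ has no native by-name or by value-result passing, so the last is only simulated with an extra reference parameter, and the function names are invented for the example:

#include <iostream>

// By value: the function works on a copy; the caller's variable is unchanged.
void byValue(int n) { n += 1; }

// By reference: the parameter aliases the argument; the change is visible to the caller.
void byReference(int& n) { n += 1; }

// Rough simulation of by value-result: copy the argument in, compute,
// and copy the result back out when the call completes.
void byValueResult(int n, int& out) {
    n += 1;
    out = n;
}

int main() {
    int a = 5, b = 5, c = 5;
    byValue(a);
    byReference(b);
    byValueResult(c, c);
    std::cout << a << " " << b << " " << c << "\n";  // prints: 5 6 6
    return 0;
}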
Return value
In some languages, such as BASIC, a callable has different syntax (i.e. keyword) for a callable that returns a value vs. one that does not.
In other languages, the syntax is the same regardless.
In some of these languages an extra keyword is used to declare no return value; for example void in C, C++ and C#.
In some languages, such as Python, the difference is whether the body contains a return statement with a value, and a particular callable may return with or without a value based on control flow.
Side effects
In many contexts, a callable may have side effect behavior such as modifying passed or global data, reading from or writing to a peripheral device, accessing a file, halting the program or the machine, or temporarily pausing program execution.
Side effects are considered undesirable by Robert C. Martin, who is known for promoting design principles. Martin argues that side effects can result in temporal coupling or order dependencies.[27]
In strictly functional programming languages such as Haskell, a function can have no side effects, which means it cannot change the state of the program. Functions always return the same result for the same input. Such languages typically only support functions that return a value, since there is no value in a function that has neither return value nor side effect.
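A small C++ illustration of the distinction, with invented names (Haskell enforces this separation through its type system; most imperative languages do not):

#include <iostream>

int callCount = 0;  // global state

// Pure: the result depends only on the arguments and nothing else is touched.
int add(int a, int b) {
    return a + b;
}

// Impure: the same arguments always produce the same sum, but each call also
// mutates a global variable and writes to the console, which are both side effects.
int addAndLog(int a, int b) {
    ++callCount;
    std::cout << "call #" << callCount << "\n";
    return a + b;
}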
Local variables
Most contexts support local variables – memory owned by a callable to hold intermediate values. These variables are typically stored in the call's activation record on the call stack along with other information such as the return address.
Nested call – recursion
If supported by the language, a callable may call itself, causing its execution to suspend while another nested execution of the same callable executes. Recursion is a useful means to simplify some complex algorithms and break down complex problems. Recursive languages provide a new copy of local variables on each call. If the programmer desires the recursive callable to use the same variables instead of using locals, they typically declare them in a shared context such as static or global.
Languages going back to ALGOL, PL/I and C, as well as modern languages, almost invariably use a call stack, usually supported by the instruction set, to provide an activation record for each call. That way, a nested call can modify its local variables without affecting any of the suspended calls' variables.
Recursion allows direct implementation of functionality defined by mathematical induction and recursive divide and conquer algorithms. Here is an example of a recursive function in C to find Fibonacci numbers:
int fibonacci(unsigned int n) {
if (n <= 1) {
return n;
}
return fibonacci(n - 1) + fibonacci(n - 2);
}
Early languages like Fortran did not initially support recursion because only one set of variables and one return address was allocated for each callable.[28] Early computer instruction sets made storing return addresses and variables on a stack difficult. Machines with index registers or general-purpose registers, e.g., CDC 6000 series, PDP-6, GE 635, System/360, UNIVAC 1100 series, could use one of those registers as a stack pointer.
Nested scope
Some languages, e.g., Ada, Pascal, PL/I, Python, support declaring and defining a function inside another function body, such that the name of the inner function is visible only within the body of the outer.
A simple example in Pascal:
function E(x: real): real;
function F(y: real): real;
begin
F := x + y
end;
begin
E := F(3) + F(4)
end;
The function F is nested within E. Note that E's parameter x is also visible in F (as F is a part of E) while both x and y are invisible outside E and F respectively.
Reentrancy
If a callable can be executed properly even when another execution of the same callable is already in progress, that callable is said to be reentrant. A reentrant callable is also useful in multi-threaded situations since multiple threads can call the same callable without fear of interfering with each other. In the IBM CICS transaction processing system, quasi-reentrant was a slightly less restrictive, but similar, requirement for application programs that were shared by many threads.
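A C++ sketch of the difference, using an invented month-name helper (error handling omitted):

#include <cstddef>
#include <cstring>

// Not reentrant: the result lives in static storage shared by every call,
// so a nested or concurrent call overwrites the first caller's result.
const char* monthNameUnsafe(int m) {
    static char buf[16];
    std::strncpy(buf, (m == 1) ? "January" : "other", sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return buf;
}

// Reentrant: the caller supplies the storage, so simultaneous calls
// working on different buffers cannot interfere with each other.
void monthNameSafe(int m, char* out, std::size_t len) {
    std::strncpy(out, (m == 1) ? "January" : "other", len - 1);
    out[len - 1] = '\0';
}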
Overloading
Some languages support overloading – allowing multiple callables with the same name in the same scope, but operating on different types of input. Consider the square root function applied to real number, complex number and matrix input. The algorithm for each type of input is different, and the return value may have a different type. By writing three separate callables with the same name, i.e. sqrt, the resulting code may be easier to write and to maintain since each one has a name that is relatively easy to understand and to remember, instead of giving longer and more complicated names like sqrt_real, sqrt_complex, sqrt_matrix.
Overloading is supported in many languages that support strong typing. Often the compiler selects the overload to call based on the type of the input arguments or it fails if the input arguments do not select an overload. Older and weakly-typed languages generally do not support overloading.
Here is an example of overloading in C++, two functions area that accept different types:
#include <numbers>  // for std::numbers::pi (C++20)

// returns the area of a rectangle defined by height and width
double area(double height, double width) {
    return height * width;
}
// returns the area of a circle defined by radius
double area(double radius) {
return radius * radius * std::numbers::pi;
}
int main() {
double rectangleArea = area(3, 4);
double circleArea = area(5);
}
PL/I has the GENERIC attribute to define a generic name for a set of entry references called with different types of arguments. Example:
DECLARE gen_name GENERIC(
name WHEN(FIXED BINARY),
flame WHEN(FLOAT),
pathname OTHERWISE);
Multiple argument definitions may be specified for each entry. A call to "gen_name" will result in a call to "name" when the argument is FIXED BINARY, "flame" when FLOAT, etc. If the argument matches none of the choices, "pathname" will be called.
Closure
A closure is a callable plus values of some of its variables captured from the environment in which it was created. Closures were a notable feature of the Lisp programming language, introduced by John McCarthy. Depending on the implementation, closures can serve as a mechanism for side-effects.
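Closures appear in many languages beyond Lisp; as one illustrative sketch, a C++ lambda can capture a value from the scope in which it is created (names invented for the example):

#include <functional>
#include <iostream>

// Returns a closure: the lambda captures 'step' by value from the
// environment in which it was created.
std::function<int(int)> makeAdder(int step) {
    return [step](int x) { return x + step; };
}

int main() {
    auto addTwo = makeAdder(2);
    auto addTen = makeAdder(10);
    std::cout << addTwo(5) << " " << addTen(5) << "\n";  // prints: 7 15
    return 0;
}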
Exception reporting
Besides its happy path behavior, a callable may need to inform the caller about an exceptional condition that occurred during its execution.
Most modern languages support exceptions, which allow for exceptional control flow that pops the call stack until an exception handler is found to handle the condition.
Languages that do not support exceptions can use the return value to indicate success or failure of a call. Another approach is to use a well-known location like a global variable for success indication. A callable writes the value and the caller reads it after a call.
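A sketch of both conventions in C++; the status code values and names are invented for the example. The return value signals success or failure, and a shared variable records the detail:

#include <iostream>

int lastError = 0;  // well-known location that callers can inspect after a call

// Returns true on success and false on failure, recording a numeric
// reason for the failure in the shared lastError variable.
bool divideChecked(int a, int b, int& result) {
    if (b == 0) {
        lastError = 1;  // hypothetical code meaning "division by zero"
        return false;
    }
    lastError = 0;
    result = a / b;
    return true;
}

int main() {
    int q = 0;
    if (!divideChecked(10, 0, q)) {
        std::cout << "call failed, error code " << lastError << "\n";
    }
    return 0;
}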
In the IBM System/360, where a return code was expected from a subroutine, the return value was often designed to be a multiple of 4—so that it could be used as a direct branch table index into a branch table often located immediately after the call instruction to avoid extra conditional tests, further improving efficiency. In the System/360 assembly language, one would write, for example:
BAL 14, SUBRTN01 go to a subroutine, storing return address in R14
B TABLE(15) use returned value in reg 15 to index the branch table,
* branching to the appropriate branch instr.
TABLE B OK return code =00 GOOD }
B BAD return code =04 Invalid input } Branch table
B ERROR return code =08 Unexpected condition }
Call overhead
A call has runtime overhead, which may include but is not limited to:
- Allocating and reclaiming call stack storage
- Saving and restoring processor registers
- Copying input variables
- Copying values after the call into the caller's context
- Automatic testing of the return code
- Handling of exceptions
- Dispatching such as for a virtual method in an object-oriented language
Various techniques are employed to minimize the runtime cost of calls.
Compiler optimization
Some optimizations for minimizing call overhead may seem straightforward, but cannot be used if the callable has side effects. For example, in the expression (f(x)-1)/(f(x)+1), the function f cannot be called only once with its value used two times since the two calls may return different results. Moreover, in the few languages which define the order of evaluation of the division operator's operands, the value of x must be fetched again before the second call, since the first call may have changed it. Determining whether a callable has a side effect is difficult – indeed, undecidable by virtue of Rice's theorem. So, while this optimization is safe in a purely functional programming language, a compiler for a language not limited to functional programming typically assumes the worst case, that every callable may have side effects.
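The following C++ sketch, with an invented function f, shows why a compiler cannot assume the two calls are interchangeable:

#include <iostream>

int counter = 0;

// f has a side effect: every call increments a global, so two calls
// with the same argument can return different values.
int f(int x) {
    ++counter;
    return x + counter;
}

int main() {
    int x = 10;
    // A compiler cannot safely rewrite this as "call f(x) once and reuse
    // the result": the first call changes 'counter', so the second call
    // would have returned a different value.
    std::cout << (f(x) - 1) / (f(x) + 1) << "\n";
    return 0;
}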
Inlining
Inlining eliminates calls for particular callables. The compiler replaces each call with the compiled code of the callable. Not only does this avoid the call overhead, but it also allows the compiler to optimize code of the caller more effectively by taking into account the context and arguments at that call. Inlining, however, usually increases the compiled code size, except when the callable is called only once or its body is very short, such as one line.
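A minimal C++ sketch; whether a call is actually inlined is the compiler's decision (the inline keyword is only a hint), and the function names are invented:

// A short function that is a typical candidate for inlining.
inline int square(int x) { return x * x; }

int sumOfSquares(int a, int b) {
    // After inlining, the compiler may translate this as if it were written
    //   return a * a + b * b;
    // eliminating both calls and their overhead.
    return square(a) + square(b);
}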
Sharing
Callables can be defined within a program, or separately in a library that can be used by multiple programs.
Inter-operability
A compiler translates call and return statements into machine instructions according to a well-defined calling convention. For code compiled by the same or a compatible compiler, functions can be compiled separately from the programs that call them. The instruction sequences corresponding to call and return statements are called the procedure's prologue and epilogue.
Built-in functions
A built-in function, or builtin function, or intrinsic function, is a function for which the compiler generates code at compile time or that it provides in a way different from ordinary functions.[29] A built-in function does not need to be defined like other functions since it is built in to the programming language.[30]
Programming
Trade-offs
Advantages
Advantages of breaking a program into functions include:
- Decomposing a complex programming task into simpler steps: this is one of the two main tools of structured programming, along with data structures
- Reducing duplicate code within a program
- Enabling reuse of code across multiple programs
- Dividing a large programming task among various programmers or various stages of a project
- Hiding implementation details from users of the function
- Improving readability of code by replacing a block of code with a function call where a descriptive function name serves to describe the block of code. This makes the calling code concise and readable even if the function is not meant to be reused.
- Improving traceability (i.e. most languages offer ways to obtain the call trace which includes the names of the involved functions and perhaps even more information such as file names and line numbers); by not decomposing the code into functions, debugging would be severely impaired
Disadvantages
Compared to using in-line code, invoking a function imposes some computational overhead in the call mechanism.[citation needed]
A function typically requires standard housekeeping code – both at the entry to, and exit from, the function (function prologue and epilogue – usually saving general purpose registers and return address as a minimum).
Conventions
Many programming conventions have been developed regarding callables.
With respect to naming, many developers name a callable with a phrase starting with a verb when it does a certain task, with an adjective when it makes an inquiry, and with a noun when it is used to substitute variables.
Some programmers suggest that a callable should perform exactly one task, and if it performs more than one task, it should be split up into multiple callables. They argue that callables are key components in software maintenance, and their roles in the program must remain distinct.
Proponents of modular programming advocate that each callable should have minimal dependency on the rest of the codebase. For example, the use of global variables is generally deemed unwise, because it adds coupling between all callables that use the global variables. If such coupling is not necessary, they advise to refactor callables to accept passed parameters instead.
Examples
[edit]Early BASIC
Early BASIC variants require each line to have a unique number (line number) that orders the lines for execution, provide no separation of the code that is callable, no mechanism for passing arguments or returning a value, and make all variables global. They provide the command GOSUB, where sub is short for sub procedure, subprocedure or subroutine. Control jumps to the specified line number and, on return, continues at the line after the GOSUB.
10 REM A BASIC PROGRAM
20 GOSUB 100
30 GOTO 20
100 INPUT "GIVE ME A NUMBER"; N
110 PRINT "THE SQUARE ROOT OF"; N;
120 PRINT "IS"; SQRT(N)
130 RETURN
This code repeatedly asks the user to enter a number and reports the square root of the value. Lines 100-130 are the callable.
Small Basic
In Microsoft Small Basic, targeted to the student first learning how to program in a text-based language, a callable unit is called a subroutine.
The Sub keyword denotes the start of a subroutine and is followed by a name identifier. Subsequent lines are the body, which ends with the EndSub keyword.[31]
Sub SayHello
TextWindow.WriteLine("Hello!")
EndSub
This can be called as SayHello().[32]
Visual Basic
In later versions of Visual Basic (VB), including the latest product line and VB6, the term procedure is used for the callable unit concept. The keyword Sub declares a procedure that returns no value, and Function declares one that returns a value. When used in the context of a class, a procedure is a method.[33]
Each parameter has a data type that can be specified, but if not, defaults to Object for later versions based on .NET and Variant for VB6.[34]
VB supports parameter passing conventions by value and by reference via the keywords ByVal and ByRef, respectively.
Unless ByRef is specified, an argument is passed ByVal. Therefore, ByVal is rarely explicitly specified.
For a simple type like a number these conventions are relatively clear. Passing ByRef allows the procedure to modify the passed variable whereas passing ByVal does not. For an object, semantics can confuse programmers since an object is always treated as a reference. Passing an object ByVal copies the reference, not the state of the object. The called procedure can modify the state of the object via its methods yet cannot modify the object reference of the actual parameter.
Sub DoSomething()
' Some Code Here
End Sub
This does not return a value and has to be called stand-alone, like DoSomething
Function GiveMeFive() As Integer
GiveMeFive = 5
End Function
This returns the value 5, and a call can be part of an expression like y = x + GiveMeFive()
Sub AddTwo(ByRef intValue As Integer)
intValue = intValue + 2
End Sub
This has a side-effect – it modifies the variable passed by reference and could be called for variable v like AddTwo(v). Given that v is 5 before the call, it will be 7 after.
C and C++
In C and C++, a callable unit is called a function. A function definition starts with the name of the type of value that it returns or void to indicate that it does not return a value. This is followed by the function name, formal arguments in parentheses, and body lines in braces.
In C++, a function declared in a class (as non-static) is called a member function or method. A function outside of a class can be called a free function to distinguish it from a member function.[35]
void doSomething() {
// some code
}
This function does not return a value and is always called stand-alone, like doSomething()
int returnFive() {
return 5;
}
This function returns the integer value 5. The call can be stand-alone or in an expression like y = x + returnFive()
void addTwo(int* pi) {
*pi += 2;
}
This function has a side-effect – it adds 2 to the value whose address is passed. It could be called for variable v as addTwo(&v) where the ampersand (&) tells the compiler to pass the address of a variable. Given that v is 5 before the call, it will be 7 after.
void addTwo(int& i) {
i += 2;
}
This function requires C++ – would not compile as C. It has the same behavior as the preceding example but passes the actual parameter by reference rather than passing its address. A call such as addTwo(v) does not include an ampersand since the compiler handles passing by reference without syntax in the call.
PL/I
In PL/I a called procedure may be passed a descriptor providing information about the argument, such as string lengths and array bounds. This allows the procedure to be more general and eliminates the need for the programmer to pass such information. By default PL/I passes arguments by reference. A (trivial) function to change the sign of each element of a two-dimensional array might look like:
change_sign: procedure(array);
declare array(*,*) float;
array = -array;
end change_sign;
This could be called with various arrays as follows:
/* first array bounds from -5 to +10 and 3 to 9 */
declare array1 (-5:10, 3:9) float;
/* second array bounds from 1 to 16 and 1 to 16 */
declare array2 (16,16) float;
call change_sign(array1);
call change_sign(array2);
Python
In Python, the keyword def denotes the start of a function definition. The statements of the function body follow, indented, on subsequent lines and end before the next line indented the same as the def statement, or at the end of the file.[36]
def format_greeting(name: str) -> str:
    return f"Welcome, {name}!"

def greet_martin() -> None:
    print(format_greeting("Martin"))
The first function returns greeting text that includes the name passed by the caller. The second function calls the first and is called like greet_martin() to write "Welcome, Martin!" to the console.
Prolog
In the procedural interpretation of logic programs, logical implications behave as goal-reduction procedures. A rule (or clause) of the form:
A :- B
which has the logical reading:
A if B
behaves as a procedure that reduces goals that unify with A to subgoals that are instances of B.
Consider, for example, the Prolog program:
mother_child(elizabeth, charles).
father_child(charles, william).
father_child(charles, harry).
parent_child(X, Y) :- mother_child(X, Y).
parent_child(X, Y) :- father_child(X, Y).
Notice that the motherhood function, X = mother(Y) is represented by a relation, as in a relational database. However, relations in Prolog function as callable units.
For example, the procedure call ?- parent_child(X, charles) produces the output X = elizabeth. But the same procedure can be called with other input-output patterns. For example:
?- parent_child(elizabeth, Y).
Y = charles.
?- parent_child(X, Y).
X = elizabeth,
Y = charles.
X = charles,
Y = harry.
X = charles,
Y = william.
?- parent_child(william, harry).
no.
?- parent_child(elizabeth, charles).
yes.
See also
- Asynchronous procedure call – function that is called after its parameters are set by other activities
- Command–query separation – IT architecture separating actions and reads (CQS)
- Compound operation – Basic programming language construct
- Coroutine – Functions whose execution you can pause
- Evaluation strategy – Programming language evaluation rules
- Event (computing) – Computing state associated with a point in time
- Function (mathematics) – Association of one output to each input
- Functional programming – Programming paradigm based on applying and composing functions
- Fused operation – Basic programming language construct
- Generator (computer programming) – Routine that generates a sequence of values
- Intrinsic function – Function whose implementation is handled specially by the compiler
- Lambda function (computer programming) – Function definition that is not bound to an identifier
- Logic programming – Programming paradigm based on formal logic
- Modular programming – Organizing code into modules
- Operator overloading – Feature of some programming languages
- Parameter (computer programming)
- Protected procedure
- Transclusion – Including one data set inside another automatically
References
[edit]- ^ "Terminology Glossary". nist.gov. NIST. Retrieved 9 February 2024.
Callable unit: (Of a software program or logical design) Function, method, operation, subroutine, procedure, or analogous structural unit that appears within a module.
- ^ Donald E. Knuth (1997). The Art of Computer Programming, Volume I: Fundamental Algorithms. Addison-Wesley. ISBN 0-201-89683-4.
- ^ O.-J. Dahl; E. W. Dijkstra; C. A. R. Hoare (1972). Structured Programming. Academic Press. ISBN 0-12-200550-3.
- ^ a b Mauchly, J.W. (1982). "Preparation of Problems for EDVAC-Type Machines". In Randell, Brian (ed.). The Origins of Digital Computers. Springer. pp. 393–397. doi:10.1007/978-3-642-61812-3_31. ISBN 978-3-642-61814-7.
- ^ Wheeler, D. J. (1952). "The use of sub-routines in programmes" (PDF). Proceedings of the 1952 ACM national meeting (Pittsburgh) on - ACM '52. p. 235. doi:10.1145/609784.609816.
- ^ Wilkes, M. V.; Wheeler, D. J.; Gill, S. (1951). Preparation of Programs for an Electronic Digital Computer. Addison-Wesley.
- ^ Dainith, John (2004). ""open subroutine." A Dictionary of Computing". Encyclopedia.com. Retrieved 14 January 2013.
- ^ Turing, Alan M. (1945), Report by Dr. A.M. Turing on proposals for the development of an Automatic Computing Engine (ACE): Submitted to the Executive Committee of the NPL in February 1946 reprinted in Copeland, B. J., ed. (2005). Alan Turing's Automatic Computing Engine. Oxford: Oxford University Press. p. 383. ISBN 0-19-856593-3.
- ^ Speiser, Ambros Paul (2002). "Konrad Zuse's Z4: Architecture, Programming and Modifications at ETH Zurich". In Rojas, Rául; Hashagen, Ulf [in German] (eds.). The first Computers: History and Architectures. MIT. pp. 263–276. ISBN 978-0-262-18197-6.
- ^ Turing, Alan Mathison (19 March 1946) [1945], Proposals for Development in the Mathematics Division of an Automatic Computing Engine (ACE) (NB. Presented on 1946-03-19 before the Executive Committee of the National Physical Laboratory (Great Britain).)
- ^ Carpenter, Brian Edward; Doran, Robert William (1 January 1977) [October 1975]. "The other Turing machine". The Computer Journal. 20 (3): 269–279. doi:10.1093/comjnl/20.3.269. (11 pages)
- ^ a b Isaacson, Walter (18 September 2014). "Walter Isaacson on the Women of ENIAC". Fortune. Archived from the original on 12 December 2018. Retrieved 14 December 2018.
- ^ Herman H. Goldstine; John von Neumann (1947). "Volume 3, 12.0 COMBINING ROUTINES" (PDF). Report on the Mathematical and Logical aspects of an Electronic Computing Instrument - Part II, Volume I-3, Planning and Coding of Problems for an Electronic Computing Instrument (PDF) (Technical report). Institute for Advanced Study.
- ^ The Operational Characteristics of the Processors for the Burroughs B5000 (PDF). Revision A. Burroughs Corporation. 1963. 5000-21005. Retrieved 8 February 2024.
- ^ "Push-Down Instructions" (PDF). Programmed Data Processor 6 - Handbook (PDF). p. 37. Retrieved 8 February 2024.
- ^ Guy Lewis Steele Jr. AI Memo 443: "Debunking the 'Expensive Procedure Call' Myth; or, Procedure Call Implementations Considered Harmful". Section "C. Why Procedure Calls Have a Bad Reputation".
- ^ Frank, Thomas S. (1983). Introduction to the PDP-11 and Its Assembly Language. Prentice-Hall software series. Prentice-Hall. p. 195. ISBN 9780134917047. Retrieved 6 July 2016.
We could supply our assembling clerk with copies of the source code for all of our useful subroutines and then when presenting him with a mainline program for assembly, tell him which subroutines will be called in the mainline [...]
- ^ The UNIVAC Scientific General Purpose Computer System (Model 1103 A) - Preliminary Information (PDF). Remington Rand Engineering Research Associates Division. 1 December 1955. p. 1-10. Retrieved 8 October 2025.
- ^ "Control Operations" (PDF). 704 electronic data-processing machine - manual of operation (PDF). IBM. 1955. p. 25. 24-6661-2. Retrieved 8 October 2025.
- ^ "Automatic Storage of Contents of P Register (STP)" (PDF). RCA 501 Electronic Data processing System - Programmers' Reference Manual (PDF). RCA Electronic Data Processing Division. 1958. p. 18. P501-2. Retrieved 8 October 2025.
- ^ "History Registers" (PDF). Honeywell 800 Transistorized Data processing System - Programmers' Reference Manual (PDF). Honeywell. 1960. p. 55. DSI-31. Retrieved 8 October 2025.
- ^ Royal Precision Electronic Computer LGP - 30 Programming Manual. Port Chester, New York: Royal Precision. April 1957. Retrieved 19 November 2023.
- ^ Tanenbaum, Andrew S.; Austin, Todd (2013). Structured computer organization. Always learning (6th ed.). Boston, Mass. Munich: Pearson. ISBN 978-0-13-291652-3.
- ^ Buttlar, Dick; Farrell, Jacqueline; Nichols, Bradford (1996). PThreads Programming: A POSIX Standard for Better Multiprocessing. "O'Reilly Media, Inc.". pp. 2–5. ISBN 978-1-4493-6475-5. OCLC 1036778036.
- ^ "ARM Information Center". Infocenter.arm.com. Retrieved 29 September 2013.
- ^ "x64 stack usage". Microsoft Learn. Microsoft. Function types. Retrieved 29 October 2025.
- ^ Martin, Robert C. (1 August 2008). Clean Code: A Handbook of Agile Software Craftsmanship (1 ed.). Pearson. ISBN 9780132350884. Retrieved 19 May 2024.
- ^ Verhoeff, Tom (2018). "A Master Class on Recursion". In Böckenhauer, Hans-Joachim; Komm, Dennis; Unger, Walter (eds.). Adventures Between Lower Bounds and Higher Altitudes: Essays Dedicated to Juraj Hromkovič on the Occasion of His 60th Birthday. Springer. p. 616. ISBN 978-3-319-98355-4. OCLC 1050567095.
- ^ "Built-in functions". ibm.com. 9 March 2017. Retrieved 25 December 2023.
- ^ Study Material Python. April 2023. p. 87. Retrieved 25 December 2023.
- ^ "Small Basic". Small Basic. Retrieved 8 February 2024.
- ^ "Small Basic Getting Started Guide: Chapter 9: Subroutines". Microsoft. 17 January 2024.
- ^ "Procedures in Visual Basic". Microsoft Learn. 15 September 2021. Retrieved 8 February 2024.
- ^ "Dim statement (Visual Basic)". Microsoft Learn. 15 September 2021. Retrieved 8 February 2024.
- ^ "what is meant by a free function".
- ^ "4. More Control Flow Tools — Python 3.9.7 documentation".
Function (computer programming)
Terminology
Definitions and Core Concepts
In computer programming, a function is a named sequence of statements that encapsulates a specific computation, accepting input parameters to process data and optionally producing a return value as output. This design facilitates modularity by allowing developers to define reusable blocks of code that can be invoked multiple times with different inputs, reducing redundancy and improving program structure.[4]
The concept draws directly from mathematics, where a function represents a mapping from one set of inputs to corresponding outputs, ensuring a unique result for each valid input. In programming, this etymological root underscores the emphasis on predictable computation, though adapted to handle imperative instructions and side effects in practical software.[5]
A key distinction exists between functions and related constructs like procedures or subroutines: functions are designed to compute and return a value to the caller, enabling integration into expressions, whereas procedures primarily execute actions without requiring a return, focusing on tasks such as updating state or performing I/O operations. This separation promotes clearer code organization, with functions handling value-producing logic and procedures managing procedural flows.[6]
Central to functions are core concepts of abstraction and encapsulation. Abstraction allows complex internal algorithms to be represented by a simple interface, hiding implementation details from the user while exposing only necessary inputs and outputs. Encapsulation further ensures that a function's internal variables and operations remain isolated, minimizing unintended interactions and supporting maintainable, scalable software design. The term "function" entered computer science from its mathematical origins in the 1950s, evolving to suit imperative contexts in early high-level languages.[7]
Variations Across Languages
In computer programming, the core concept of a function manifests with varying terminology and subtle semantic adaptations across languages and paradigms, reflecting differences in design philosophy and usage. Common synonyms include "subroutine," which denotes a reusable block of code invoked by a call; "procedure," often implying a sequence of statements without a return value; and "method," particularly in object-oriented programming where it is bound to a class or object instance to encapsulate behavior.[8][9][10]
These variations extend to paradigm-specific interpretations. In functional programming languages, functions are treated as first-class citizens, meaning they can be assigned to variables, passed as arguments to other functions, or returned as values, emphasizing immutability and composition over state changes. In contrast, imperative languages often prioritize functions that produce side effects, such as modifying global state or input/output operations, to drive program execution through sequential commands. Declarative languages like Prolog diverge further by using "predicates" instead of functions; these are relations that evaluate to true or false based on logical unification, without explicit computation or return values, aligning with rule-based inference rather than procedural flow.[11]
Specific languages illustrate these nuances in nomenclature and syntax. The C language employs the term "function" for standalone blocks of code that may return values, drawing directly from mathematical analogies to promote modularity in systems programming.[12] Python distinguishes named functions, defined with the def keyword for reusable, documented code blocks, from anonymous functions created via lambda expressions for concise, inline operations without formal names.[13] Java, adhering strictly to object-oriented principles, uses "method" exclusively for code defined within classes, where instance methods operate on object state and static methods belong to the class itself, ensuring all functionality is encapsulated.[14]
A distinctive adaptation is the higher-order function, which accepts other functions as arguments or returns them as results, enabling abstraction and metaprogramming; this concept originated in Lisp, developed by John McCarthy in 1958, where functions like mapcar apply a given function to elements of a list.[15]
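As an illustration in C++ rather than Lisp, std::transform is a higher-order operation: it takes another function (here a lambda) as an argument and applies it to every element, much like mapcar. A minimal sketch:

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> xs{1, 2, 3};
    // The lambda passed as the last argument is applied to every element.
    std::transform(xs.begin(), xs.end(), xs.begin(),
                   [](int x) { return x * x; });
    for (int x : xs) {
        std::cout << x << " ";  // prints: 1 4 9
    }
    std::cout << "\n";
    return 0;
}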
History
Early Developments
The concept of functions in computing traces its mathematical roots to the formalization of computable functions in the 1930s. Alonzo Church developed lambda calculus as a system for expressing functions through abstraction and application, introducing it in his 1932 paper on logical foundations and further refining the untyped version in 1936 to define effective calculability.[16][17] Independently, Alan Turing's 1936 paper on computable numbers proposed Turing machines as a model for mechanical computation, demonstrating that any computable function could be realized through a finite set of states and symbols on an infinite tape.[18] These theoretical frameworks established functions as central to what could be algorithmically computed, influencing later programming paradigms without direct implementation in hardware at the time.
Early hardware implementations began to realize function-like reuse through subroutines in the mid-1940s. The von Neumann architecture, outlined in the 1945 First Draft of a Report on the EDVAC, described a stored-program computer where instructions and data shared memory, enabling subroutines as reusable sequences of operations controlled by logical components like central arithmetic units.[19] Similarly, the ENIAC, completed in 1945, supported subroutine calls via jump instructions that allowed non-sequential execution, such as transferring control to a predefined block of operations and returning afterward; this was evident in programs using indirect addressing for jumps stored in function tables.[20]
Conceptual advancements in software emerged with Konrad Zuse's Plankalkül in the 1940s, which envisioned reusable code blocks known as "Rechenpläne" or subroutines, complete with input/output specifications and callable structures to promote modularity in algorithmic descriptions.[21] Practical implementation followed in 1949 with the EDSAC, where subroutines were stored as short programs for common tasks, facilitating code reuse and efficiency on the machine's limited mercury delay-line memory.[22] A pivotal development was Maurice Wilkes' creation of the first subroutine library for EDSAC that year, which provided pre-coded routines on punched tapes to eliminate duplication of frequently used code segments, thereby conserving precious memory and accelerating program development.[22]
Key Milestones in Language Design
In the 1950s, the development of high-level programming languages marked significant progress in supporting user-defined functions. FORTRAN, introduced in 1957 by John Backus and his team at IBM, pioneered the concept of user-defined functions capable of returning values, allowing programmers to encapsulate reusable computations for scientific and numerical tasks.[23] This innovation reduced the need for manual assembly coding and emphasized procedural abstraction in early computing environments.
ALGOL 60, formalized in 1960 following design work begun in 1958, advanced function design by introducing call-by-value and call-by-name parameter passing mechanisms, as well as support for recursion, enabling more flexible and expressive subroutine invocations.[24] These features, detailed in the language's revised report, influenced subsequent languages by promoting structured control flow and dynamic behavior in functions. ALGOL's contributions extended to block structure and lexical scoping, which delimited variable visibility within function bodies and nested scopes, a paradigm adopted in Pascal, released in 1970 by Niklaus Wirth.[25] Pascal's revised report explicitly built on ALGOL's scoping rules to enforce modularity and readability in procedure definitions.[26]
During the 1960s and 1970s, Lisp, originating in 1958 and elaborated in John McCarthy's 1960 paper, popularized first-class functions—treating functions as data that could be passed as arguments, returned, or stored in variables—laying groundwork for functional programming paradigms.[27] Meanwhile, C, developed by Dennis Ritchie in 1972 at Bell Labs, standardized function declarations with pointer parameters, facilitating efficient memory manipulation and interoperability in system programming.[28] Simula 67, released in 1967 by Ole-Johan Dahl and Kristen Nygaard, served as a precursor to object-oriented programming by integrating functions as methods within class-like structures, enabling simulation of discrete events through encapsulated behaviors.[29]
In the 1980s, Ada, standardized in 1983 under the U.S. Department of Defense, introduced packages for modularizing functions and related data, promoting abstraction and reusability in large-scale systems.[30] The ANSI C standard (X3.159-1989), ratified in 1989, formalized the C standard library's function set, ensuring portability of common utilities like input/output and string handling across implementations.[31] Additionally, Occam, designed by David May in 1983 at Inmos, incorporated early support for parallel function calls via concurrent processes and channel-based communication, addressing multiprocessing in hardware like transputers.[32]
Fundamental Features
Parameters and Arguments
In computer programming, parameters are the formal variables declared in a function's definition, serving as placeholders for the inputs the function is designed to process.[33] Arguments, in contrast, are the actual values or expressions supplied to the function during a call, which are bound to the corresponding parameters to enable execution.[33] This binding process typically occurs at runtime, associating each argument with its parameter based on position or name, though static analysis in compiled languages may enable compile-time optimizations for efficiency.
Functions receive arguments through various passing modes that dictate how data is transferred and manipulated. Pass-by-value, common in languages like C, creates a copy of the argument's value for the parameter, isolating the function's operations from the original data.[34] Pass-by-reference, supported in C++ via reference types (denoted by &), passes an alias to the argument, allowing the function to modify the original variable without copying.[35] Pass-by-name, introduced in ALGOL 60, treats the argument as an unevaluated expression that is reinterpreted each time the parameter is accessed within the function, potentially leading to multiple evaluations.[36]
Many languages provide mechanisms for optional or flexible inputs, such as default parameters, which assign predefined values to parameters if no argument is supplied. In Python, default argument values are specified directly in the function signature and are evaluated once at definition time.[13] Python also supports keyword arguments, allowing arguments to be passed by name for clarity and order independence, a feature present since Python 1.0 in 1994 with later enhancements like keyword-only parameters via PEP 3102 in Python 3.0 (2008).[37][38]
To handle varying numbers of arguments, variadic functions permit an indefinite count of inputs beyond fixed parameters. In C, variadic functions use an ellipsis (...) in the declaration, as exemplified by printf, where the first fixed argument often describes the types and number of subsequent variable arguments. In functional programming, currying transforms a multi-argument function into a chain of single-argument functions through partial application, a technique rooted in Moses Schönfinkel's 1924 work on combinatory logic and natively implemented in Haskell, where all functions are curried by default.[39]
Modern languages extend parameter handling with advanced binding patterns like destructuring, which unpacks elements from arrays or objects directly into parameters. JavaScript introduced this in ECMAScript 2015 (ES6), enabling concise extraction of properties, such as function({x, y} = obj), to bind named variables from an object argument.[40] These input mechanisms complement output features like return values, facilitating modular code design.
Return Values and Side Effects
Return Values and Side Effects
Functions produce outputs primarily through return mechanisms, which allow them to pass computed values back to the caller upon completion. In languages with static typing, such as C, the return type is explicitly declared in the function signature, enabling compile-time verification of type compatibility; for instance, a function might be defined as int add(int a, int b) to return an integer sum.[41] Dynamically typed languages like Python permit functions to return any object type without prior declaration, with the return statement evaluating an expression or implicitly yielding None if omitted, reflecting runtime type determination.[42] Some languages extend this to multiple return values, as in Go, where a function signature like func divide(x, y float64) (float64, error) can specify several result types, commonly pairing a computed result with an error indicator, unpacked via tuple-like assignment at the call site.[43] Procedures, or functions declared with a void return type in languages like C, perform actions without producing a value, relying instead on other means to convey results, such as modifying inputs passed by reference.[44]
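As an illustration, Python can approximate the Go-style pairing of a result with an error indicator by returning a tuple, and a function without a return statement implicitly yields None (the divide and log_message names below are hypothetical):

def divide(x, y):
    # Returns a (result, error) pair, loosely mirroring Go's multiple return values.
    if y == 0:
        return None, "division by zero"
    return x / y, None

result, err = divide(10, 2)    # tuple unpacking at the call site

def log_message(msg):
    print(msg)                 # no return statement: the function yields None implicitly

value = log_message("done")    # value is None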
Beyond explicit returns, functions can influence program state through side effects, defined as any observable modification or interaction outside the function's local scope, including alterations to global variables, input/output operations, or database writes.[45] These effects contrast with pure functions, which depend solely on inputs and produce consistent outputs without altering external state, facilitating composability, testing, and optimization; Haskell enforces this purity via its type system, where non-IO functions are inherently side-effect free, as their types preclude mutable operations.[46] Impure functions, prevalent in imperative languages, blend returns with side effects for practical tasks like logging or state updates, though excessive reliance on them can complicate reasoning about program behavior.
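A short Python sketch (hypothetical names) contrasts an impure function that has a side effect on module-level state with a pure counterpart:

counter = 0

def impure_increment(step):
    # Side effect: mutates module-level state, so repeated calls with the
    # same argument return different results.
    global counter
    counter += step
    return counter

def pure_add(total, step):
    # Pure: the result depends only on the arguments and nothing outside is modified.
    return total + step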
Error conditions often integrate with return mechanisms to signal failures without halting execution. In C, functions frequently return special values like -1 to denote errors, paired with the global errno variable for detailed diagnostics, as seen in system calls where success yields non-negative results and failure sets errno accordingly.[47] Languages like Java favor exceptions for error propagation: the throw statement raises an exception object, which unwinds the call stack until caught by a matching try-catch block, allowing structured handling while preserving the return pathway for normal results.[48] Modern extensions address asynchronous and iterative returns. In JavaScript, the ES6 specification of 2015 standardized Promise objects, which asynchronous functions return and which resolve to values upon completion of non-blocking operations like network requests, enabling chained handling via .then() without callback nesting.[49] Python's generators, introduced via PEP 255 in Python 2.2 (2001), use the yield keyword to produce a sequence of values on-demand from an iterator, suspending execution between yields while preserving internal state, ideal for memory-efficient processing of large datasets.[50] These mechanisms complement traditional returns by supporting deferred or streamed outputs in concurrent or data-intensive contexts.
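For example, a Python generator produces values on demand while preserving its internal state between yields (the squares function below is illustrative):

def squares(limit):
    # Generator: execution suspends at each 'yield' and resumes on the next request,
    # so values are produced on demand instead of building a full list in memory.
    n = 0
    while n < limit:
        yield n * n
        n += 1

for value in squares(5):
    print(value)   # prints 0, 1, 4, 9, 16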
Implementation Mechanisms
Calling and Control Flow
In computer programming, functions are invoked using specific syntax that varies slightly across languages but generally follows a direct call pattern where the function name is followed by parentheses enclosing arguments, such as func(arg1, arg2) in C. This syntax evaluates the function designator, converts it to a pointer to the function if necessary, passes the arguments after any required promotions or conversions, and executes the function body until it returns.[51] Indirect calls, common in low-level languages like C, employ function pointers to invoke functions dynamically; for instance, a pointer int (*fp)(int) can be dereferenced as (*fp)(5) to call the referenced function, enabling runtime selection of callable code such as in callbacks or sorted tables.[51] Tail calls occur when a function performs another function call as its final action before returning, allowing the caller to reuse its stack frame without additional allocation, a mechanism supported in languages like Scheme and Lua to prevent stack overflow in recursive scenarios.[52]
Control transfer during a function call typically involves pushing the return address—the address of the instruction immediately following the call—onto the call stack and jumping to the function's entry point, as seen in assembly instructions like x86's call which automates this push-and-jump sequence. Upon completion, the function returns by popping the return address from the stack and jumping to it, restoring execution flow to the caller, a model formalized in modern processors to support nested invocations.[53] In early computers lacking dedicated stack hardware, such as the PDP-8 from the 1960s, returns often used indirect jumps: the caller stored the return address in the subroutine's first word, and the subroutine concluded with an indirect jump through that location to resume the caller, accommodating limited instruction sets without explicit stack operations.[54] The Jump to Subroutine (JSR) instruction in 6502 assembly, used in systems like the Commodore 64, pushes the address of the last byte of the JSR instruction onto the stack before branching to the subroutine; the Return from Subroutine (RTS) instruction then pulls this address and increments it by one to resume execution after the JSR, ensuring precise return.[55]
Integration with libraries often relies on dynamic linking, where functions from separate modules are resolved at runtime rather than compile time, facilitating code sharing and modularity. In Microsoft Windows, dynamic-link libraries (DLLs) were introduced with Windows 1.0 in 1985, with significant enhancements in Windows 3.0 in 1990, allowing applications to call functions across modules via imports resolved by the loader, which patches indirect jumps or stubs to the actual addresses in loaded DLLs.[56] This contrasts with traditional calls by deferring address binding, using mechanisms like procedure linkage tables (PLTs) for indirect jumps to shared library entries, reducing executable size but introducing minor runtime overhead.[57]
As a non-preemptive alternative to standard function calls, coroutines enable cooperative multitasking where execution can yield control explicitly without full context switches, implemented in Lua since version 5.0 in 2003. In Lua, coroutines are created via coroutine.create(f), started with coroutine.resume(co), and suspended using coroutine.yield(), allowing the programmer to pause and resume flows manually, ideal for simulations or asynchronous I/O without OS-level preemption.[58] This model treats coroutines as lightweight threads sharing the same stack space, yielding only on explicit calls rather than timers, providing finer control over invocation sequencing than traditional recursive or iterative calls.[59]
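The cooperative pattern can be sketched in Python using generators as lightweight coroutines; the producer, consumer, and round-robin scheduler below are illustrative and are not part of Lua's API:

def producer():
    # Cooperative task: suspends itself at each 'yield' and is resumed by the scheduler.
    for i in range(3):
        print("producing", i)
        yield

def consumer():
    for i in range(3):
        print("consuming", i)
        yield

# A trivial round-robin scheduler: resume each task in turn until all are exhausted.
tasks = [producer(), consumer()]
while tasks:
    task = tasks.pop(0)
    try:
        next(task)           # resume the task until its next yield
        tasks.append(task)
    except StopIteration:
        pass                 # the task finished; do not reschedule it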
Scope and Variable Lifetime
Local variables, declared within the body of a function, have their visibility limited to that function and an automatic lifetime tied to the duration of the function's execution; they are typically allocated on the call stack for efficiency, with storage allocated upon entry and deallocated upon exit.[60] This automatic allocation ensures that local variables do not persist beyond the function's scope, preventing unintended interference between calls.[61] In languages like C, static local variables deviate from this model: declared with the static keyword, they retain their initialized value across successive function calls while maintaining block scope, with their lifetime spanning the entire program execution and storage in a fixed data segment.[62]
Scope rules determine how variables are resolved and accessed within functions, distinguishing between lexical (static) and dynamic approaches. Lexical scoping, prevalent in modern languages such as Python, binds variables based on their position in the source code's textual structure, allowing nested functions to access variables from enclosing outer scopes without runtime lookup along the call chain.[63] For instance, in Python, an inner function can read variables from its outer function's scope, following the LEGB resolution order (Local, Enclosing, Global, Built-in), though modifications require the nonlocal keyword to avoid creating a new local variable.[63] Conversely, dynamic scoping, employed in early Lisp dialects, resolves free variables by searching the runtime call stack upward from the point of use, potentially leading to bindings from the calling context rather than the defining one.[64]
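For illustration, a short Python example of lexical scoping and the nonlocal keyword (variable names are arbitrary):

message = "global"          # Global scope (the G in LEGB)

def outer():
    message = "enclosing"   # Enclosing scope (the E in LEGB)

    def inner():
        nonlocal message    # without 'nonlocal', assignment would create a new local
        message = "changed by inner"

    inner()
    return message          # "changed by inner": the enclosing binding was updated

print(outer())
print(message)              # still "global": the global binding was never touched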
Variable lifetime in functions is managed through activation records, data structures pushed onto the call stack during function invocation to hold local variables, parameters, and control information like return addresses.[65] These records ensure that local variables are accessible only during the function's activation and are automatically reclaimed upon return, aligning lifetime with scope.[66] However, when local variables "escape" their activation record—such as when captured in a closure returned from the function—they must be allocated on the heap to outlive the stack frame, as the compiler detects this via escape analysis to prevent dangling references.[67]
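A minimal Python sketch of a local variable escaping its activation record by being captured in a returned closure (make_counter is an illustrative name):

def make_counter():
    count = 0               # local to this activation of make_counter

    def increment():
        nonlocal count
        count += 1
        return count

    # 'count' is captured by the returned closure, so it must outlive
    # make_counter's stack frame rather than being reclaimed on return.
    return increment

tick = make_counter()
print(tick())   # 1
print(tick())   # 2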
Certain architectures from the 1970s, such as the Burroughs B6700, employed stack-based mechanisms to optimize parameter evaluation in functions, using structured activation records and control words to handle parameters efficiently without immediate full stacking upon call setup.[68] In contemporary languages, refinements address broader scoping needs: JavaScript's ES modules, standardized in 2015, introduce module-level scope to encapsulate variables across function boundaries, preventing global pollution.[69] Similarly, ES6 additions like let and const provide block scoping within functions, limiting variable visibility to the nearest enclosing block rather than the entire function, enhancing modularity and reducing errors from variable hoisting.[70]
As functions are called, these scope and lifetime mechanisms integrate with the call stack: each activation record links to the prior frame so that nested accesses can be resolved without persistent storage overhead.[65]
Advanced Capabilities
Recursion and Reentrancy
Recursion allows a function to invoke itself, either directly or indirectly, to solve problems by breaking them down into smaller instances of the same problem. In direct recursion, a function calls itself explicitly within its body, while indirect recursion occurs through mutual calls between two or more functions, forming a cycle that eventually terminates.[71] A classic example of direct recursion is computing the factorial of a number, where the function calls itself with a decremented argument until reaching zero:

def factorial(n):
    if n == 0:  # Base case
        return 1
    else:
        return n * factorial(n - 1)  # Recursive call
The base case, n == 0, is crucial to halt the recursion and avoid infinite execution. Without it, the call stack grows indefinitely, potentially causing a stack overflow error when the maximum stack depth is exceeded.[71] For instance, in languages like Python, the default recursion limit is 1000 frames, beyond which a RecursionError is raised to prevent crashes; this can be queried via sys.getrecursionlimit() and adjusted if needed, though increasing it risks stack overflow in resource-constrained environments.[72][73]
Tail recursion optimization addresses the stack growth issue in recursive calls where the recursive invocation is the last operation in the function, allowing compilers or interpreters to reuse the current stack frame instead of allocating a new one. This technique, which enables constant stack space for potentially deep recursion, was a foundational feature in the design of Scheme, as detailed in Sussman and Steele's 1975 paper introducing the language as an interpreter for extended lambda calculus.[74] In Scheme, proper tail-recursive implementations support an unbounded number of active tail calls, a requirement formalized in later reports like R5RS.[75]
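Because CPython does not perform tail-call optimization, a tail-recursive definition is often rewritten by hand as a loop; the sketch below (illustrative function names) shows the tail-recursive form and the equivalent iteration that a tail-call-optimizing implementation would effectively produce:

def factorial_tail(n, acc=1):
    # Tail-recursive form: the recursive call is the last operation performed.
    if n == 0:
        return acc
    return factorial_tail(n - 1, acc * n)

def factorial_iter(n):
    # Equivalent loop: reuses one frame instead of growing the call stack,
    # which is what tail-call optimization achieves automatically.
    acc = 1
    while n > 0:
        acc *= n
        n -= 1
    return acc

print(factorial_tail(10) == factorial_iter(10))   # True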
Reentrancy is the capability of a function to be safely interrupted and re-invoked, either by the same thread or another, before the initial execution completes, without corrupting its state or data.[76] Reentrant functions must avoid reliance on global or static mutable variables, instead using only local variables and parameters; pure functions, which produce the same output for the same inputs without side effects or observable interactions with external state, are inherently reentrant due to their deterministic and isolated nature.[77] Thread safety enhances reentrancy in multi-threaded contexts by protecting against concurrent access to shared resources, often through locks, but reentrancy alone does not guarantee thread safety if global state is involved.
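A small Python sketch (illustrative names) contrasts a non-reentrant function that depends on shared mutable state with a reentrant one that uses only its parameters and locals:

history = []

def non_reentrant_format(value):
    # Relies on shared mutable state: interleaved or interrupted calls can
    # observe and corrupt each other's data.
    history.append(value)
    return f"#{len(history)}: {value}"

def reentrant_format(value, count):
    # Uses only its parameters and local values, so overlapping invocations are safe.
    return f"#{count}: {value}"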
In operating systems, reentrancy extends to kernel design, where Unix kernels from the 1970s onward were built to be reentrant, supporting multiple processes and interrupts without interference, as implemented by Thompson and Ritchie to enable efficient time-sharing on limited hardware.[78][79] This design principle, evident in the kernel's handling of system calls and interrupts, allowed safe nested or concurrent executions while maintaining process isolation.
Modern asynchronous programming introduces async recursion, where recursive calls are non-blocking and managed by an event loop, preventing stack overflow in long-running computations. In Node.js, for example, async functions can recursively invoke themselves using promises or await, offloading work to the event loop for I/O operations, thus supporting deep recursion in single-threaded environments without exhausting the call stack.[80][81]
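A rough Python analogue using asyncio (the countdown function is hypothetical) shows the pattern of awaiting a non-blocking operation between recursive steps; note that directly awaiting a nested coroutine in CPython still nests frames, so this illustrates cooperative yielding to the event loop rather than unlimited recursion depth:

import asyncio

async def countdown(n):
    # Each await hands control back to the event loop, allowing other tasks
    # to run between recursive steps.
    if n == 0:
        return
    print(n)
    await asyncio.sleep(0)      # stand-in for a non-blocking I/O operation
    await countdown(n - 1)

asyncio.run(countdown(3))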
Overloading and Polymorphism
In computer programming, function overloading allows multiple functions to share the same name while differing in the number or types of parameters they accept, enabling compile-time resolution based on the arguments provided. This mechanism, also known as ad-hoc polymorphism, permits developers to define behaviors specific to different input signatures without using distinct names, improving code readability and expressiveness. For instance, in C++, a function named add can be overloaded to handle both integer and floating-point types: int add(int a, int b) and float add(float a, float b), where the compiler selects the appropriate version during overload resolution.[82][83]
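Python does not support C++-style overloading on parameter types, but functools.singledispatch offers a comparable form of ad-hoc polymorphism by dispatching on the type of the first argument; the summarize function below is illustrative:

from functools import singledispatch

@singledispatch
def summarize(value):
    # Fallback implementation used when no registered type matches.
    return f"some value: {value!r}"

@summarize.register
def _(value: int):
    return f"an integer: {value}"

@summarize.register
def _(value: str):
    return f"a string of length {len(value)}"

print(summarize(42))        # dispatches on the runtime type of the argument
print(summarize("hello"))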
Polymorphism extends this concept by allowing functions or methods to exhibit different behaviors at runtime depending on the actual object type, often through subtype polymorphism in object-oriented languages. In Java, this is achieved via method overriding, where a subclass redefines a method from its superclass, and the Java Virtual Machine dispatches to the correct implementation based on the object's runtime type rather than the reference type. For example, a Shape class might define a draw() method overridden in subclasses like Circle and Rectangle, enabling polymorphic calls like shape.draw() to invoke the appropriate version dynamically.[84] Ad-hoc polymorphism can also occur through template specialization in C++, where a generic template function is customized for specific types, blending compile-time decisions with type-specific logic.
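The Shape example can be rendered in Python as well (class names follow the description above and are otherwise hypothetical), with the executed method selected by the object's runtime type:

class Shape:
    def draw(self):
        raise NotImplementedError

class Circle(Shape):
    def draw(self):               # overrides Shape.draw
        return "drawing a circle"

class Rectangle(Shape):
    def draw(self):
        return "drawing a rectangle"

shapes = [Circle(), Rectangle()]
for shape in shapes:
    print(shape.draw())           # implementation chosen by the object's runtime type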
Operator overloading builds on these ideas by redefining the behavior of built-in operators as functions, tailored to user-defined types. In Python, this is implemented via special methods such as __add__ for the + operator, allowing classes to specify custom addition logic; for instance, a Vector class can overload __add__ to perform vector summation element-wise.[85] A distinctive approach appears in Common Lisp's Common Lisp Object System (CLOS), introduced in the 1980s, where generic functions defined with defgeneric enable multi-method dispatch based on all argument types, supporting flexible polymorphism beyond single-argument inheritance hierarchies.[86]
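A minimal Python sketch of the Vector example described above:

class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # Called for 'self + other': element-wise vector addition.
        return Vector(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"

print(Vector(1, 2) + Vector(3, 4))   # Vector(4, 6)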
Modern languages further refine polymorphism through parametric mechanisms and implicit typing. Rust's generics provide parametric polymorphism, allowing functions to operate uniformly on multiple types constrained by traits, as in a generic max function: fn max<T: PartialOrd>(a: T, b: T) -> T, which compiles to type-specific code without runtime overhead since Rust 1.0 in 2015.[87] In Python, duck typing offers an implicit form of polymorphism by relying on object interfaces rather than explicit type declarations; if an object supports the required methods (e.g., __len__ for length), it can be used interchangeably, avoiding isinstance checks in favor of "if it walks like a duck and quacks like a duck" behavior.[88]
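A brief Python sketch of duck typing (the Playlist class is hypothetical): any object implementing __len__ can be passed to code that calls len(), without explicit type checks:

class Playlist:
    def __init__(self, songs):
        self.songs = songs

    def __len__(self):
        return len(self.songs)

def is_empty(container):
    # Duck typing: any object supporting len() is acceptable; no isinstance check.
    return len(container) == 0

print(is_empty([]))                      # True, a built-in list
print(is_empty(Playlist(["a", "b"])))    # False, a user-defined type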
Performance and Optimization
Overhead and Inlining
Function calls introduce runtime overhead primarily due to the mechanics of transferring control and managing stack frames, including pushing and popping registers and parameters onto the stack, which typically consumes 5-20 CPU cycles on modern x86-64 processors for a simple direct call assuming correct branch prediction.[89] On ARM Cortex-A series processors, similar overhead applies, with the BL (branch with link) instruction for calls incurring 1-2 cycles for the instruction itself, but full frame setup adding up to 10-15 cycles depending on parameter passing.[90] Memory costs arise from stack frame allocation, often requiring 16-128 bytes per call for local variables, return addresses, and saved registers, which can lead to cache misses if frames are large or frequent.[91]

To measure this overhead, profiling tools such as gprof in the GNU Compiler Collection (GCC) are commonly used; gprof instruments code with mcount calls to build a call graph tracking invocation counts and estimates time spent in callees, while sampling the program counter at intervals (e.g., 100 Hz) to attribute execution time to functions, revealing per-call costs indirectly through self and children times.[92] Factors influencing measured overhead include parameter passing mechanisms: copying small values by value adds cycles for memory operations (e.g., 1-3 cycles per parameter on x86), whereas passing pointers reduces this to near-zero copying but risks aliasing issues.[89]

Inlining addresses this overhead by having the compiler replace the function call site with the function's body, eliminating the need for control transfer, parameter setup, and return sequencing, thereby reducing execution time for frequently called small functions.[93] Compilers apply heuristics to decide inlining, such as targeting functions under 10-50 instructions in size or those on hot execution paths identified via profile-guided optimization (PGO), balancing overhead savings against increased code size that could degrade instruction cache performance.[93] In just-in-time (JIT) compilers like the Java Virtual Machine (JVM), introduced in the mid-1990s, dynamic inlining further reduces call overhead by profiling runtime behavior to inline monomorphic or low-polymorphism virtual calls, achieving geometric mean speedups of 3-9% in benchmarks like SPECjvm98 through techniques such as polymorphic inline caches.[94] The impact of branch prediction on function call overhead remains underexplored, particularly for indirect calls on modern hardware such as ARMv9 processors, where mispredictions can add 10-20 penalty cycles, amplifying costs in polymorphic scenarios despite advanced predictors.[95]
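Call overhead in an interpreted implementation can be observed directly with Python's timeit module; the add function and iteration count below are arbitrary, absolute timings vary by machine, and the cycle figures above refer to compiled native code rather than CPython:

import timeit

def add(a, b):
    return a + b

x, y = 1, 2
# Same arithmetic with and without a function call; the difference approximates
# per-call overhead in this interpreter.
call_time = timeit.timeit("add(x, y)", globals=globals(), number=1_000_000)
inline_time = timeit.timeit("x + y", globals=globals(), number=1_000_000)
print(f"with call: {call_time:.3f}s  inlined: {inline_time:.3f}s")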
Compiler Techniques
Compilers employ advanced techniques to optimize function usage by analyzing and transforming code at various stages, enabling more efficient execution without altering program semantics. These strategies build upon foundational optimizations like inlining, extending to interprocedural and whole-program analyses that consider function interactions across modules. Such methods are crucial for reducing code size, improving instruction-level parallelism, and adapting to modern hardware architectures.

Dead code elimination identifies and removes portions of functions that are never executed, such as unreachable branches or unused local variables, thereby reducing binary size and cache pressure. This technique often integrates with constant propagation, where compilers infer constant values across function calls by analyzing call sites and propagating them backward, allowing further simplifications like eliminating conditional branches. For instance, if a function parameter is always a compile-time constant, the compiler can specialize the function body accordingly. These optimizations are standard in modern compilers, with GCC implementing aggressive dead code removal since version 4.0 in 2005.

Loop unrolling and fusion target functions involving iterative or recursive patterns, expanding loops to reduce overhead from loop control instructions and enabling subsequent optimizations like instruction scheduling. In recursive functions, compilers may apply tail recursion elimination to convert recursion into iteration, unrolling the resulting loops for better performance on processors with limited branch prediction resources. Loop fusion merges adjacent loops within or across functions to minimize memory accesses, particularly beneficial in numerical computations. These techniques rely on interprocedural analysis (IPA), which examines function call graphs to perform optimizations spanning multiple functions, such as common subexpression elimination across call boundaries. IPA has been a core feature in GCC since version 4.1 in 2006, enabling whole-program optimizations that can achieve up to 20% speedup in compute-intensive applications.

Link-time optimization (LTO) extends these analyses to the linking phase, allowing cross-module inlining and dead code elimination across compilation units that were previously opaque. In GCC, the -flto flag, introduced in version 4.5 around 2010, devirtualizes calls and propagates constants inter-module, resulting in binaries that are often 10-15% smaller and faster. LLVM's LTO variant integrates seamlessly with its intermediate representation, supporting thin LTO for faster builds while preserving optimization quality. This approach is particularly effective for large codebases with many small functions, as it uncovers optimization opportunities hidden by separate compilation.

Profile-guided optimization (PGO) in LLVM uses runtime profiles to identify "hot" functions—those executed frequently—and prioritizes aggressive optimizations like loop unrolling or IPA on them, improving overall performance by 10-20% in profiled workloads. By instrumenting code to collect execution counts, PGO informs decisions such as function reordering for better cache locality, a capability enhanced in LLVM since its integration in the early 2010s.

Vectorization transforms scalar operations in functions into SIMD instructions, exploiting parallelism in functional code like map-reduce patterns common in array-processing functions.
Modern compilers like Intel's ICC (now part of oneAPI) automatically vectorize loops within functions using auto-vectorization, achieving speedups of 2-8x on vector units since the 2010s, provided data dependencies allow. This is especially relevant for functional programming paradigms where pure functions facilitate safe parallelization.

Emerging AI-assisted optimizations, such as MLGO in LLVM, leverage machine learning to guide decisions in areas like inlining and instruction selection for functions, outperforming traditional heuristics in complex scenarios. Introduced in LLVM in 2021, MLGO trains models on optimization traces to predict beneficial transformations, yielding up to 5% performance gains in benchmarks as of 2025. These tools represent a shift toward data-driven compilation, particularly for optimizing recursive or polymorphic functions.

Programming Considerations
Conventions and Best Practices
In computer programming, naming conventions for functions emphasize descriptiveness and consistency to enhance code readability and maintainability. In Java, method names should be verbs written in mixed case, starting with a lowercase letter and capitalizing the first letter of each subsequent word, such as runFast().[96] Similarly, Python's PEP 8 recommends function names in lowercase separated by underscores (snake_case), like calculate_total(), to align with the language's overall style for clarity.[97] These conventions stem from the broader single responsibility principle, which dictates that a function should perform one specific task, reducing complexity and making changes easier without affecting unrelated code.[98]
Documentation practices for functions typically involve structured comments that describe purpose, parameters, return values, and exceptions. The Javadoc standard, used in Java, employs block comments starting with /** to generate API documentation, including tags like @param for inputs and @return for outputs, ensuring self-documenting code.[99] For parameter validation, assertions serve as a tool for internal checks during development, verifying preconditions like non-null inputs, but they should not replace explicit runtime checks in public interfaces to avoid silent failures in production.
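A Python docstring serves a role similar to Javadoc tags; the interest function below is a hypothetical example combining structured documentation, explicit runtime validation, and an assertion used as an internal development check:

def interest(principal, rate, years):
    """Return compound interest earned on a deposit.

    Args:
        principal: Initial amount deposited.
        rate: Annual interest rate as a fraction (e.g. 0.05 for 5%).
        years: Number of whole years the deposit is held.

    Returns:
        The interest earned, excluding the principal.

    Raises:
        ValueError: If any argument is negative.
    """
    # Explicit runtime check for callers of the public interface.
    if principal < 0 or rate < 0 or years < 0:
        raise ValueError("arguments must be non-negative")
    # Assertion as an internal development-time check, not a substitute for validation.
    assert isinstance(years, int), "internal check: years should be an integer"
    return principal * ((1 + rate) ** years - 1)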
Error handling in functions favors fail-fast approaches, where invalid conditions trigger immediate exceptions rather than propagating error codes, allowing early detection and cleaner control flow. In Java, this principle recommends throwing runtime exceptions for programming errors, as opposed to checked exceptions or integer return codes, to prevent masking bugs and ensure robust systems.[100] For reentrant functions, which can be interrupted and resumed without issues, idempotency is a key practice: designing the function to produce the same output for the same input regardless of repeated calls, minimizing side effects and supporting safe concurrency.[101]
The Unix philosophy, articulated as early as the 1970s, advocates for small, focused functions that handle text streams and compose via pipes, promoting modularity and reusability in system design. This approach, exemplified in tools like grep and sort, encourages functions to do one thing well and interface simply, facilitating powerful pipelines without tight coupling.[102]
In modern languages like TypeScript, accessibility conventions control function visibility using modifiers such as public (default, accessible everywhere), private (limited to the class), and protected (accessible within the class and subclasses), introduced in version 1.3 in 2014 to enforce encapsulation and prevent unintended external access.[103]
Trade-offs in Design
Functions in computer programming offer significant advantages in promoting modularity, which facilitates code reuse by allowing developers to define reusable logic once and invoke it across multiple parts of a program, thereby reducing duplication and enhancing maintainability.[104] This modular approach also simplifies testing and debugging through isolation, as individual functions can be unit-tested independently without affecting the broader system, enabling black-box reuse and multi-person development.[104] Furthermore, functions contribute to scalability in large systems by breaking down complex programs into manageable components, supporting easier maintenance and extension as the codebase grows.[105]

Despite these benefits, functions introduce several disadvantages that can impact program efficiency and development. Frequent small function calls incur runtime overhead due to the costs of parameter passing, stack management, and context switching, which can degrade performance in performance-critical applications.[106] Deep recursion, a common use of functions, complicates debugging by producing lengthy stack traces that obscure error locations and risk stack overflow if not managed carefully.[107] Additionally, the abstraction provided by functions can hide underlying bugs, making it harder to trace issues across modular boundaries and potentially leading to subtle errors in integration.[108]

Key trade-offs in function design revolve around granularity, where an excess of small functions promotes readability and reuse but increases call overhead and cognitive complexity for developers navigating numerous abstractions, contrasted with monolithic functions that minimize overhead at the expense of maintainability.[104] Pure functions, which avoid side effects, enhance parallelism by enabling independent evaluation of arguments without altering results, facilitating scalable concurrent execution in functional languages.[109] However, this purity limits direct I/O operations, as such actions introduce non-determinism that undermines referential transparency and requires special handling like monads or external interfaces.[110]

In embedded systems, these trade-offs are particularly acute; function calls add unacceptable overhead in real-time environments like AVR microcontrollers, where tight timing constraints demand inlining or avoidance to prevent latency violations, prioritizing raw execution speed over modularity.[111] Similarly, in serverless computing paradigms such as Functions as a Service (FaaS), introduced by AWS Lambda in 2014, the benefits of on-demand scalability come at the cost of cold starts—initialization latencies that can reach hundreds of milliseconds when functions scale from zero, impacting response times for sporadic workloads.[112]

Examples in Practice
Procedural and Imperative Languages
In procedural and imperative programming languages, functions are typically defined to perform specific tasks with explicit control flow, often involving side effects such as modifying global state or variables passed by reference. These languages emphasize step-by-step execution, where functions encapsulate reusable code blocks that can be invoked sequentially. Classic examples include C, BASIC variants, and PL/I, each with distinct syntax for defining and calling functions to support imperative paradigms. In the C programming language, functions are declared with a return type, name, parameter list in parentheses, and a body enclosed in braces. For instance, a simple function to add two integers is defined as int add(int a, int b) { return a + b; }, and invoked as int result = add(1, 2);.[113] Parameters in C are passed by value, meaning copies of arguments are made, so modifications inside the function do not affect the caller's variables unless pointers are used for pass-by-reference semantics; for example, void swap(int *x, int *y) { int temp = *x; *x = *y; *y = temp; } called with swap(&a, &b); exchanges the values of a and b.[113] The C23 standard (ISO/IEC 9899:2024, published October 31, 2024) introduces the constexpr specifier for objects, enabling compile-time evaluation of their initializers when they are constant expressions.
BASIC, originating in the 1960s at Dartmouth College, initially used imperative constructs like GOSUB to jump to a subroutine label and RETURN to resume execution, as seen in early implementations such as 10 GOSUB 100 followed by subroutine code ending in RETURN, which supported basic modularization without formal parameters.[114] Modern variants like Microsoft Small Basic employ structured syntax with Sub for subroutine definition and EndSub to close it, for example: Sub HelloWorld TextWindow.WriteLine("Hello, World!") EndSub called via HelloWorld(), promoting cleaner control flow over unstructured jumps.[115] In Visual Basic, functions support explicit parameter passing modes: ByVal passes a copy of the argument (default for value types), while ByRef allows direct modification of the original variable, as in Function Triple(ByRef x As Integer) As Integer x = x * 3: Triple = x End Function where calling Triple(myVar) alters myVar if passed ByRef.[116]
PL/I, designed in the 1960s for systems programming, features robust procedure syntax with the PROCEDURE statement, supporting multiple entry points via the ENTRY declaration for selective invocation within the same procedure block. For example, random: procedure; declare seed fixed; /* main entry */ if seed = 0 then seed = 12345; return (seed * 1103515245); setrands: entry; seed = param; return; end random; allows calls to either the primary random or secondary setrands entry to initialize or generate pseudo-random numbers.[117] Uniquely, PL/I's multitasking facilities enable concurrent calls to the same procedure from multiple tasks in a multithreaded environment, managed through built-in functions like WAIT and RELEASE to synchronize access and prevent race conditions in shared state.[118]
Functional and Modern Languages
In functional programming languages, functions are treated as first-class citizens, enabling higher-order functions that accept or return other functions, and anonymous functions for concise expressions. Python supports this paradigm through lambda expressions, which define anonymous functions inline, often used with built-in higher-order functions like map() and filter(). For instance, map(lambda x: x**2, [1, 2, 3]) applies the squaring function to each element of the list, producing [1, 4, 9]. Similarly, filter(lambda x: x > 0, [-1, 0, 1]) selects positive elements, yielding [1]. These constructs promote declarative code by focusing on what to compute rather than how, aligning with functional principles.
Python also provides list comprehensions as a functional alternative to explicit loops and higher-order applications, offering succinct syntax for transforming iterables. The expression [x**2 for x in [1, 2, 3] if x > 1] filters and maps in one line, equivalent to list(map(lambda x: x**2, filter(lambda x: x > 1, [1, 2, 3]))), resulting in [4, 9]. This approach enhances readability and performance by avoiding intermediate lists in generator forms like (x**2 for x in [1, 2, 3] if x > 1). For asynchronous programming, Python introduced async def in version 3.5 (released 2015), allowing coroutine functions that use await for non-blocking operations, such as async def fetch_data(): data = await api_call(); return data.[119][120]
In logic-based functional languages like Prolog, predicates serve as functions, defined through clauses that unify inputs and outputs declaratively. The standard append/3 predicate, for example, concatenates lists: append([1,2], [3], [1,2,3]) succeeds by unifying the third argument with the result of joining the first two.[121] Prolog's backtracking mechanism enables non-deterministic function calls, automatically retrying alternative clauses upon failure to explore multiple solutions. For append/3, querying append(X, Y, [1,2,3]) generates bindings like X=[1], Y=[2,3] and then X=[1,2], Y=[3], demonstrating exhaustive search without explicit loops.[122]
Modern languages extend these ideas with concise syntax for anonymous and asynchronous functions. JavaScript's ES6 (2015) introduced arrow functions (=>) as a compact form of anonymous functions, such as const square = x => x * x;, which lexically bind this unlike traditional functions.[123] ES2017 added async/await for handling promises asynchronously: async function fetchData() { const response = await fetch('/api'); return response.json(); }, simplifying callback-heavy code.[81] JavaScript closures, where functions capture their lexical scope, enable patterns like private variables: function counter() { let count = 0; return () => ++count; } const increment = counter(); increment(); // 1.[124] This captures the environment for stateful yet functional behavior.
Haskell emphasizes pure functions, which always produce the same output for the same input without side effects, using monads to encapsulate impurity like I/O. A pure function like add x y = x + y can be composed freely, but side effects are managed via the IO monad: main = do putStrLn "Hello"; return (), where do notation sequences actions without altering purity. Monads like IO ensure referential transparency by treating effects as values in a computational context, allowing safe composition.[125]
Rust, since its 1.0 stable release in 2015, supports closures as anonymous functions that capture variables by reference, mutable reference, or ownership, integrating functional traits like Fn, FnMut, and FnOnce. For example, let add_one = |x: i32| x + 1; vec.iter().map(add_one).collect::<Vec<_>>(); uses a closure with map for transformation, ensuring memory safety through borrow checking.[126] This enables higher-order iteration without runtime overhead from dynamic dispatch when types are known.[126]