Recent from talks
Nothing was collected or created yet.
Tail call
View on WikipediaIn computer science, a tail call is a subroutine call performed as the final action of a procedure.[1] If the target of a tail is the same subroutine, the subroutine is said to be tail recursive, which is a special case of direct recursion. Tail recursion (or tail-end recursion) is particularly useful, and is often easy to optimize in implementations.
Tail calls can be implemented without adding a new stack frame to the call stack. Most of the frame of the current procedure is no longer needed, and can be replaced by the frame of the tail call, modified as appropriate (similar to overlay for processes, but for function calls). The program can then jump to the called subroutine. Producing such code instead of a standard call sequence is called tail-call elimination or tail-call optimization. Tail-call elimination allows procedure calls in tail position to be implemented as efficiently as goto statements, thus allowing efficient structured programming. In the words of Guy L. Steele, "in general, procedure calls may be usefully thought of as GOTO statements which also pass parameters, and can be uniformly coded as [machine code] JUMP instructions."[2]
Not all programming languages require tail-call elimination. However, in functional programming languages, tail-call elimination is often guaranteed by the language standard, allowing tail recursion to use a similar amount of memory as an equivalent loop. The special case of tail-recursive calls, when a function calls itself, may be more amenable to call elimination than general tail calls. When the language semantics do not explicitly support general tail calls, a compiler can often still optimize sibling calls, or tail calls to functions which take and return the same types as the caller.[3]
Description
[edit]When a function is called, the computer must "remember" the place it was called from, the return address, so that it can return to that location with the result once the call is complete. Typically, this information is saved on the call stack, a list of return locations in the order that the call locations were reached. In addition, compilers allocate memory for local variables of the called function and push register content (if any and/or relevant) onto the stack. Typically, it is done by allocating a stack frame including saved registers, space allocated for non-register local variables, return address and call parameters (unless they are passed in registers). For tail calls, there is no need to remember the caller or preserve content of registers – instead, tail-call elimination avoids allocation of new stack frames and makes only the minimum necessary changes to the existing stack frame before passing it on, and the tail-called function will return directly to the original caller.[4] This, however, leads to complete loss of the caller's stack frame, which is sometimes considered as a hindrance in debugging. The tail call doesn't have to appear lexically after all other statements in the source code; it is only important that the calling function return immediately after the tail call, returning the tail call's result if any, since the calling function is bypassed when the optimization is performed.
For non-recursive function calls, this is usually an optimization that saves only a little time and space, since there are not that many different functions available to call. When dealing with recursive or mutually recursive functions where recursion happens through tail calls, however, the stack space and the number of returns saved can grow to be very significant, since a function can call itself, directly or indirectly, creating a new call stack frame each time. Tail-call elimination often reduces asymptotic stack space requirements from linear, or O(n), to constant, or O(1). Tail-call elimination is thus required by the standard definitions of some programming languages, such as Scheme, and languages in the ML family among others.[5][6] The Scheme language definition formalizes the intuitive notion of tail position exactly, by specifying which syntactic forms allow having results in tail context.[7] Implementations allowing an unlimited number of tail calls to be active at the same moment, thanks to tail-call elimination, can also be called 'properly tail recursive'.[5]
Besides space and execution efficiency, tail-call elimination is important in the functional programming idiom known as continuation-passing style (CPS), which would otherwise quickly run out of stack space.
Syntactic form
[edit]A tail call can be located just before the syntactical end of a function:
function foo(data) {
a(data);
return b(data);
}
Here, both a(data) and b(data) are calls, but b is the last thing the procedure executes before returning and is thus in tail position. However, not all tail calls are necessarily located at the syntactical end of a subroutine:
function bar(data) {
if (a(data)) {
return b(data);
}
return c(data);
}
Here, both calls to b and c are in tail position. This is because each of them lies in the end of if-branch respectively, even though the first one is not syntactically at the end of bar's body.
In this code:
function foo1(data) {
return a(data) + 1;
}
function foo2(data) {
var ret = a(data);
return ret;
}
function foo3(data) {
var ret = a(data);
return (ret == 0) ? 1 : ret;
}
the call to a(data) is in tail position in foo2, but it is not in tail position either in foo1 or in foo3, because control must return to the caller to allow it to inspect or modify the return value before returning it.
Example programs
[edit]The following program is an example in Scheme:[8]
;; factorial : number -> number
;; to calculate the product of all positive
;; integers less than or equal to n.
(define (factorial n)
(if (= n 0)
1
(* n (factorial (- n 1)))))
This is not written in a tail-recursive style, because the multiplication function ("*") is in the tail position. This can be compared to:
;; factorial : number -> number
;; to calculate the product of all positive
;; integers less than or equal to n.
(define (factorial n)
(fact-iter 1 n))
(define (fact-iter product n)
(if (= n 0)
product
(fact-iter (* product n)
(- n 1))))
This program assumes applicative-order evaluation. The inner procedure fact-iter calls itself last in the control flow. This allows an interpreter or compiler to reorganize the execution which would ordinarily look like this:[8]
call factorial (4)
call fact-iter (1 4)
call fact-iter (4 3)
call fact-iter (12 2)
call fact-iter (24 1)
return 24
return 24
return 24
return 24
return 24
into the more efficient variant, in terms of both space and time:
call factorial (4) call fact-iter (1 4) replace arguments with (4 3) replace arguments with (12 2) replace arguments with (24 1) return 24 return 24
This reorganization saves space because no state except for the calling function's address needs to be saved, either on the stack or on the heap, and the call stack frame for fact-iter is reused for the intermediate results storage. This also means that the programmer need not worry about running out of stack or heap space for extremely deep recursions. In typical implementations, the tail-recursive variant will be substantially faster than the other variant, but only by a constant factor.
Some programmers working in functional languages will rewrite recursive code to be tail recursive so they can take advantage of this feature. This often requires addition of an "accumulator" argument (product in the above example) to the function.
Tail recursion modulo cons
[edit]Tail recursion modulo cons is a generalization of tail-recursion optimization introduced by David H. D. Warren[9] in the context of compilation of Prolog, seen as an explicitly set once language. It was described (though not named) by Daniel P. Friedman and David S. Wise in 1974[10] as a LISP compilation technique. As the name suggests, it applies when the only operation left to perform after a recursive call is to prepend a known value in front of the list returned from it (or to perform a constant number of simple data-constructing operations, in general). This call would thus be a tail call save for ("modulo") the said cons operation. But prefixing a value at the start of a list on exit from a recursive call is the same as appending this value at the end of the growing list on entry into the recursive call, thus building the list as a side effect, as if in an implicit accumulator parameter. The following Prolog fragment illustrates the concept:
Example code
[edit]% Prolog, tail recursive modulo cons:
partition([], _, [], []).
partition([X|Xs], Pivot, [X|Rest], Bigs) :-
X @< Pivot, !,
partition(Xs, Pivot, Rest, Bigs).
partition([X|Xs], Pivot, Smalls, [X|Rest]) :-
partition(Xs, Pivot, Smalls, Rest).
|
-- In Haskell, guarded recursion:
partition [] _ = ([],[])
partition (x:xs) p
| x < p = (x:a,b)
| otherwise = (a,x:b)
where
(a,b) = partition xs p
|
% Prolog, with explicit unifications:
% non-tail recursive translation:
partition([], _, [], []).
partition(L, Pivot, Smalls, Bigs) :- L=[X|Xs],
( X @< Pivot
-> partition(Xs,Pivot,Rest,Bigs), Smalls=[X|Rest]
; partition(Xs,Pivot,Smalls,Rest), Bigs=[X|Rest]
).
|
% Prolog, with explicit unifications:
% tail-recursive translation:
partition([], _, [], []).
partition(L, Pivot, Smalls, Bigs) :- L=[X|Xs],
( X @< Pivot
-> Smalls=[X|Rest], partition(Xs,Pivot,Rest,Bigs)
; Bigs=[X|Rest], partition(Xs,Pivot,Smalls,Rest)
).
|
Thus in tail-recursive translation such a call is transformed into first creating a new list node and setting its first field, and then making the tail call with the pointer to the node's rest field as argument, to be filled recursively. The same effect is achieved when the recursion is guarded under a lazily evaluated data constructor, which is automatically achieved in lazy programming languages like Haskell.
C example
[edit]The following fragment defines a recursive function in C that duplicates a linked list (with some equivalent Scheme and Prolog code as comments, for comparison):
typedef struct LinkedList {
void* value;
struct LinkedList* next;
} LinkedList;
LinkedList* duplicate(const LinkedList** ls) {
LinkedList* head = NULL;
if (ls) {
LinkedList* p = duplicate(ls->next);
head = (LinkedList*)malloc(sizeof(*head));
head->value = ls->value;
head->next = p;
}
return head;
}
|
;; in Scheme,
(define (duplicate ls)
(if (not (null? ls))
(cons (car ls)
(duplicate (cdr ls)))
'()))
|
%% in Prolog,
duplicate([X|Xs],R):-
duplicate(Xs,Ys),
R=[X|Ys].
duplicate([],[]).
|
In this form the function is not tail recursive, because control returns to the caller after the recursive call duplicates the rest of the input list. Even if it were to allocate the head node before duplicating the rest, it would still need to plug in the result of the recursive call into the next field after the call.[a]
So the function is almost tail recursive. Warren's method pushes the responsibility of filling the next field into the recursive call itself, which thus becomes tail call.[b] Using sentinel head node to simplify the code,
void duplicate_aux(const LinkedList* ls, LinkedList* end) {
if (ls) {
end->next = (LinkdList*)malloc(sizeof(*end));
end->next->value = ls->value;
duplicate_aux(ls->next, end->next);
} else {
end->next = NULL;
}
}
LinkedList* duplicate(const LinkedList* ls) {
LinkedList head;
duplicate_aux(ls, &head);
return head.next;
}
|
;; in Scheme,
(define (duplicate ls)
(let ((head (list 1)))
(let dup ((ls ls)
(end head))
(cond
((not (null? ls))
(set-cdr! end (list (car ls)))
(dup (cdr ls) (cdr end)))))
(cdr head)))
|
%% in Prolog,
duplicate([X|Xs],R):-
R=[X|Ys],
duplicate(Xs,Ys).
duplicate([],[]).
|
The callee now appends to the end of the growing list, rather than have the caller prepend to the beginning of the returned list. The work is now done on the way forward from the list's start, before the recursive call which then proceeds further, instead of backward from the list's end, after the recursive call has returned its result. It is thus similar to the accumulating parameter technique, turning a recursive computation into an iterative one.
Characteristically for this technique, a parent frame is created on the execution call stack, which the tail-recursive callee can reuse as its own call frame if the tail-call optimization is present.
The tail-recursive implementation can now be converted into an explicitly iterative implementation, as an accumulating loop:
LinkedList* duplicate(const LinkedList* ls) {
LinkedList head;
LinkedList* end;
end = &head;
while (ls) {
end->next = (LinkedList*)malloc(sizeof(*end));
end->next->value = ls->value;
ls = ls->next;
end = end->next;
}
end->next = NULL;
return head.next;
}
|
;; in Scheme,
(define (duplicate ls)
(let ((head (list 1)))
(do ((end head (cdr end))
(ls ls (cdr ls )))
((null? ls) (cdr head))
(set-cdr! end (list (car ls))))))
|
%% in Prolog,
%% N/A
|
History
[edit]In a paper delivered to the ACM conference in Seattle in 1977, Guy L. Steele summarized the debate over the GOTO and structured programming, and observed that procedure calls in the tail position of a procedure can be best treated as a direct transfer of control to the called procedure, typically eliminating unnecessary stack manipulation operations.[2] Since such "tail calls" are very common in Lisp, a language where procedure calls are ubiquitous, this form of optimization considerably reduces the cost of a procedure call compared to other implementations. Steele argued that poorly-implemented procedure calls had led to an artificial perception that the GOTO was cheap compared to the procedure call. Steele further argued that "in general procedure calls may be usefully thought of as GOTO statements which also pass parameters, and can be uniformly coded as [machine code] JUMP instructions", with the machine code stack manipulation instructions "considered an optimization (rather than vice versa!)".[2] Steele cited evidence that well-optimized numerical algorithms in Lisp could execute faster than code produced by then-available commercial Fortran compilers because the cost of a procedure call in Lisp was much lower. In Scheme, a Lisp dialect developed by Steele with Gerald Jay Sussman, tail-call elimination is guaranteed to be implemented in any interpreter.[11]
Implementation methods
[edit]Tail recursion is important to some high-level languages, especially functional and logic languages and members of the Lisp family. In these languages, tail recursion is the most commonly used way (and sometimes the only way available) of implementing iteration. The language specification of Scheme requires that tail calls are to be optimized so as not to grow the stack. Tail calls can be made explicitly in Perl, with a variant of the "goto" statement that takes a function name: goto &NAME;[12]
However, for language implementations which store function arguments and local variables on a call stack (which is the default implementation for many languages, at least on systems with a hardware stack, such as the x86), implementing generalized tail-call optimization (including mutual tail recursion) presents an issue: if the size of the callee's activation record is different from that of the caller, then additional cleanup or resizing of the stack frame may be required. For these cases, optimizing tail recursion remains trivial, but general tail-call optimization may be harder to implement efficiently.
For example, in the Java virtual machine (JVM), tail-recursive calls can be eliminated (as this reuses the existing call stack), but general tail calls cannot be (as this changes the call stack).[13][14] As a result, functional languages such as Scala that target the JVM can efficiently implement direct tail recursion, but not mutual tail recursion.
The GCC, LLVM/Clang, and Intel compiler suites perform tail-call optimization for C and other languages at higher optimization levels or when the -foptimize-sibling-calls option is passed.[15][16][17] Though the given language syntax may not explicitly support it, the compiler can make this optimization whenever it can determine that the return types for the caller and callee are equivalent, and that the argument types passed to both function are either the same, or require the same amount of total storage space on the call stack.[18]
Various implementation methods are available.
In assembly
[edit]Tail calls are often optimized by interpreters and compilers of functional programming and logic programming languages to more efficient forms of iteration. For example, Scheme programmers commonly express while loops as calls to procedures in tail position and rely on the Scheme compiler or interpreter to substitute the tail calls with more efficient jump instructions.[19]
For compilers generating assembly directly, tail-call elimination is easy: it suffices to replace a call opcode with a jump one, after fixing parameters on the stack. From a compiler's perspective, the first example above is initially translated into pseudo-assembly language (in fact, this is valid x86 assembly):
foo:
call B
call A
ret
Tail-call elimination replaces the last two lines with a single jump instruction:
foo:
call B
jmp A
After subroutine A completes, it will then return directly to the return address of foo, omitting the unnecessary ret statement.
Typically, the subroutines being called need to be supplied with parameters. The generated code thus needs to make sure that the call frame for A is properly set up before jumping to the tail-called subroutine. For instance, on platforms where the call stack does not just contain the return address, but also the parameters for the subroutine, the compiler may need to emit instructions to adjust the call stack. On such a platform, for the code:
function foo(data1, data2) B(data1) return A(data2)
(where data1 and data2 are parameters) a compiler might translate that as:[c]
foo:
mov reg,[sp+data1] ; fetch data1 from stack (sp) parameter into a scratch register.
push reg ; put data1 on stack where B expects it
call B ; B uses data1
pop ; remove data1 from stack
mov reg,[sp+data2] ; fetch data2 from stack (sp) parameter into a scratch register.
push reg ; put data2 on stack where A expects it
call A ; A uses data2
pop ; remove data2 from stack.
ret
A tail-call optimizer could then change the code to:
foo:
mov reg,[sp+data1] ; fetch data1 from stack (sp) parameter into a scratch register.
push reg ; put data1 on stack where B expects it
call B ; B uses data1
pop ; remove data1 from stack
mov reg,[sp+data2] ; fetch data2 from stack (sp) parameter into a scratch register.
mov [sp+data1],reg ; put data2 where A expects it
jmp A ; A uses data2 and returns immediately to caller.
This code is more efficient both in terms of execution speed and use of stack space.
Through trampolining
[edit]Since many Scheme compilers use C as an intermediate target code, the tail recursion must be encoded in C without growing the stack, even if the C compiler does not optimize tail calls. Many implementations achieve this by using a device known as a trampoline, a piece of code that repeatedly calls functions. All functions are entered via the trampoline. When a function has to tail-call another, instead of calling it directly and then returning the result, it returns the address of the function to be called and the call parameters back to the trampoline (from which it was called itself), and the trampoline takes care of calling this function next with the specified parameters. This ensures that the C stack does not grow and iteration can continue indefinitely.
It is possible to implement trampolines using higher-order functions in languages that support them, such as Groovy, Visual Basic .NET and C#.[20]
Using a trampoline for all function calls is rather more expensive than the normal C function call, so at least one Scheme compiler, Chicken, uses a technique first described by Henry Baker from an unpublished suggestion by Andrew Appel,[21] in which normal C calls are used but the stack size is checked before every call. When the stack reaches its maximum permitted size, objects on the stack are garbage-collected using the Cheney algorithm by moving all live data into a separate heap. Following this, the stack is unwound ("popped") and the program resumes from the state saved just before the garbage collection. Baker says "Appel's method avoids making a large number of small trampoline bounces by occasionally jumping off the Empire State Building."[21] The garbage collection ensures that mutual tail recursion can continue indefinitely. However, this approach requires that no C function call ever returns, since there is no guarantee that its caller's stack frame still exists; therefore, it involves a much more dramatic internal rewriting of the program code: continuation-passing style.
Relation to the while statement
[edit]Tail recursion can be related to the while statement, an explicit iteration, for instance by transforming
procedure foo(x)
if p(x)
return bar(x)
else
return foo(baz(x))
into
procedure foo(x)
while true
if p(x)
return bar(x)
else
x ← baz(x)
where x may be a tuple involving more than one variable: if so, care must be taken in implementing the assignment statement x ← baz(x) so that dependencies are respected. One may need to introduce auxiliary variables or use a swap construct.
More generally,
procedure foo(x)
if p(x)
return bar(x)
else if q(x)
return baz(x)
...
else if r(x)
return foo(qux(x))
...
else
return foo(quux(x))
can be transformed into
procedure foo(x)
while true
if p(x)
return bar(x)
else if q(x)
return baz(x)
...
else if r(x)
x ← qux(x)
...
else
x ← quux(x)
For instance, this Julia program gives a non-tail recursive definition fact of the factorial:
function factorial(n)
if n == 0
return 1
else
return n * factorial(n - 1)
end
end
Indeed, n * factorial(n - 1) wraps the call to factorial. But it can be transformed into a tail-recursive definition by adding an argument a called an accumulator.[8]
This Julia program gives a tail-recursive definition fact_iter of the factorial:
function factorial(n::Integer, a::Integer)
if n == 0:
return a
else
return factorial(n - 1, n * a)
end
end
function factorial(n::Integer)
return factorial(n, 1)
end
This Julia program gives an iterative definition fact_iter of the factorial:
function fact_iter(n::Integer, a::Integer)
while n > 0
a = n * a
n = n - 1
end
return a
end
function factorial(n::Integer)
return fact_iter(n, one(n))
end
Language support
[edit]- C++ – C and C++ both do tail-call optimization.[22]
- Clojure – Clojure has
recurspecial form.[23] - Common Lisp – Some implementations perform tail-call optimization during compilation if optimizing for speed
- Elixir – Elixir implements tail-call optimization,[24]as do all languages currently targeting the BEAM VM.
- Elm – Yes[25]
- Erlang – Yes
- F# – F# implements TCO by default where possible [26]
- Go – No support[27]
- Haskell – Yes[28]
- JavaScript – ECMAScript 6.0 compliant engines should have tail calls[29] which is now implemented on Safari/WebKit[30] but rejected by V8 and SpiderMonkey
- Kotlin – Has
tailrecmodifier for functions[31] - Lua – Tail recursion is required by the language definition[32]
- Objective-C – Compiler optimizes tail calls when -O1 (or higher) option specified, but it is easily disturbed by calls added by Automatic Reference Counting (ARC).
- OCaml – Yes
- Perl – Explicit with a variant of the "goto" statement that takes a function name:
goto &NAME;[33] - Prolog – SWI-Prolog implements tail-recursion optimization.[34]
- PureScript – Yes
- Python – Stock Python implementations do not perform tail-call optimization, though a third-party module is available to do this.[35] Language inventor Guido van Rossum contended that stack traces are altered by tail-call elimination making debugging harder, and preferred that programmers use explicit iteration instead.[36] In Python 3.14, a new interpreter was introduced that uses tail-call based dispatch of Python opcodes.[37] This resulted in overall improved performance when compared to Python 3.13.[38][39]
- R – Yes,
tailcall()function introduced in R.4.4.0[40] - Racket – Yes[41]
- Ruby – Yes, but disabled by default [42]
- Rust – tail-call optimization may be done in limited circumstances, but is not guaranteed[43]
- Scala – Tail-recursive functions are automatically optimized by the compiler. Such functions can also optionally be marked with a
@tailrecannotation, which makes it a compilation error if the function is not tail recursive[44] - Scheme – Required by the language definition[45][46]
- Swift – In some cases (as of 2014).[47]
- Tcl – Since Tcl 8.6, Tcl has a
tailcallcommand[48] - Zig – Yes[49]
See also
[edit]Notes
[edit]- ^ Like this:
if (ls) { head = (LinkedList*)malloc(sizeof(*head)); head->value = ls->value; head->next = duplicate(ls->next); }
- ^ Like this:
if (ls) { head = (LinkedList*)malloc(sizeof(*head)); head->value = ls->value; duplicate(ls->next, &(head->next)); }
- ^ The
callinstruction first pushes the current code location onto the stack and then performs an unconditional jump to the code location indicated by the label. Theretinstruction first pops a code location off the stack, then performs an unconditional jump to the retrieved code location.
References
[edit]- ^ Steven Muchnick; Muchnick and Associates (15 August 1997). Advanced Compiler Design Implementation. Morgan Kaufmann. ISBN 978-1-55860-320-2.
- ^ a b c Steele, Guy Lewis (1977). "Debunking the "expensive procedure call" myth or, procedure call implementations considered harmful or, LAMBDA: The Ultimate GOTO". Proceedings of the 1977 annual conference on - ACM '77. pp. 153–162. doi:10.1145/800179.810196. hdl:1721.1/5753. ISBN 978-1-4503-2308-6. S2CID 9807843.
- ^ "The LLVM Target-Independent Code Generator — LLVM 7 documentation". llvm.org.
- ^ "recursion - Stack memory usage for tail calls - Theoretical Computer Science". Cstheory.stackexchange.com. 2011-07-29. Retrieved 2013-03-21.
- ^ a b "Revised [6] Report on the Algorithmic Language Scheme". R6rs.org. Retrieved 2013-03-21.
- ^ "Revised [6] Report on the Algorithmic Language Scheme - Rationale". R6rs.org. Retrieved 2013-03-21.
- ^ "Revised [6] Report on the Algorithmic Language Scheme". R6rs.org. Retrieved 2013-03-21.
- ^ a b c Sussman, G. J.; Abelson, Hal (1984). Structure and Interpretation of Computer Programs. Cambridge, MA: MIT Press. ISBN 0-262-01077-1.
- ^ D. H. D. Warren, DAI Research Report 141, University of Edinburgh, 1980.
- ^ Daniel P. Friedman and David S. Wise, Technical Report TR19: Unwinding Structured Recursions into Iterations, Indiana University, Dec. 1974. PDF available here (webarchived copy here).
- ^ R5RS Sec. 3.5, Richard Kelsey; William Clinger; Jonathan Rees; et al. (August 1998). "Revised5 Report on the Algorithmic Language Scheme". Higher-Order and Symbolic Computation. 11 (1): 7–105. doi:10.1023/A:1010051815785. S2CID 14069423.
- ^ Contact details. "goto". perldoc.perl.org. Retrieved 2013-03-21.
- ^ "What is difference between tail calls and tail recursion?", Stack Overflow
- ^ "What limitations does the JVM impose on tail-call optimization", Programmers Stack Exchange
- ^ Lattner, Chris. "LLVM Language Reference Manual, section: The LLVM Target-Independent Code Generator, sub: Tail Call Optimization". The LLVM Compiler Infrastructure. The LLVM Project. Retrieved 24 June 2018.
- ^ "Using the GNU Compiler Collection (GCC): Optimize Options". gcc.gnu.org.
- ^ "foptimize-sibling-calls". software.intel.com.
- ^ "Tackling C++ Tail Calls".
- ^ Probst, Mark (20 July 2000). "proper tail recursion for gcc". GCC Project. Retrieved 10 March 2015.
- ^ Samuel Jack, Bouncing on your tail. Functional Fun. April 9, 2008.
- ^ a b Henry Baker, "CONS Should Not CONS Its Arguments, Part II: Cheney on the M.T.A." Archived 2006-03-03 at the Wayback Machine
- ^ "Which, if any, C++ compilers do tail-recursion optimization?". stackoverflow.
- ^ "(recur expr*)". clojure.org.
- ^ "Recursion". elixir-lang.github.com.
- ^ Czaplicki, Evan. "Functional Programming in Elm: Tail-Call Elimination".
- ^ "Tail Calls in F#". msdn. Microsoft. 8 July 2011.
- ^ "proposal: Go 2: add become statement to support tail calls". github.com.
- ^ "Tail recursion - HaskellWiki". wiki.haskell.org. Retrieved 2019-06-08.
- ^ Beres-Deak, Adam. "Worth watching: Douglas Crockford speaking about the good new parts of JavaScript in 2014". bdadam.com.
- ^ "ECMAScript 6 in WebKit". 13 October 2015.
- ^ "Functions: infix, vararg, tailrec - Kotlin Programming Language". Kotlin.
- ^ "Lua 5.3 Reference Manual". www.lua.org.
- ^ "goto - perldoc.perl.org". perldoc.perl.org.
- ^ "SWI-Prolog Reference Manual". www.lix.polytechnique.fr.
- ^ "baruchel/tco". GitHub. 29 March 2022.
- ^ Rossum, Guido Van (22 April 2009). "Neopythonic: Tail Recursion Elimination".
- ^ "Tail-calling interpreter". GitHub. 2024-01-08. Retrieved 2025-03-08.
- ^ "What's new in Python 3.14". Python documentation. Retrieved 2025-02-19.
- ^ "A new tail-calling interpreter for significantly better interpreter performance". GitHub. 2025-01-06. Retrieved 2025-03-08.
- ^ "What's new in R 4.4.0?". www.jumpingrivers.com. 2024-04-25. Retrieved 2024-04-28.
- ^ "The Racket Reference". docs.racket-lang.org.
- ^ "Ruby Tail Call Optimisation".
- ^ "Rust FAQ". prev.rust-lang.org.
- ^ "Scala Standard Library 2.13.0 - scala.annotation.tailrec". www.scala-lang.org. Retrieved 2019-06-20.
- ^ "Revised^5 Report on the Algorithmic Language Scheme". www.schemers.org.
- ^ "Revised [6] Report on the Algorithmic Language Scheme". www.r6rs.org.
- ^ "Does Swift implement tail call optimization?". 2014. Retrieved 13 March 2024.
- ^ "tailcall manual page - Tcl Built-In Commands". www.tcl.tk.
- ^ "Documentation - the Zig Programming Language".
Tail call
View on GrokipediaFundamentals
Definition
A tail call is a subroutine call performed as the final action of a procedure, such that the calling procedure completes all its other work before the call and directly returns the result of the called subroutine without any intervening computation.[5] This definition emphasizes that the called subroutine effectively takes over the continuation of the calling procedure, allowing the implementation to treat the call as a transfer of control rather than a nested invocation.[6] In distinction from non-tail calls, where the calling procedure performs additional operations after the subroutine returns—such as processing or combining the result with other values—a tail call requires no such post-call actions, thereby avoiding the need to preserve the caller's activation record on the call stack.[7] Non-tail calls necessitate pushing a new stack frame for the callee while retaining the caller's frame for subsequent use, which can lead to stack growth in recursive scenarios.[8] The core concept enabling this behavior is that a tail call permits the reuse or replacement of the current stack frame with the frame of the called procedure, preventing unnecessary frame allocation and supporting space-efficient execution.[9] Tail recursion represents a special case of tail calls, where the subroutine invoked is the procedure itself.[10]Importance
Tail calls play a crucial role in achieving efficiency gains in recursive computations, particularly by enabling tail call optimization (TCO), which reuses the current stack frame rather than allocating new ones for each recursive invocation. This mechanism prevents stack overflow that would otherwise occur in deep recursion, allowing tail-recursive functions to operate with constant space complexity, denoted as , regardless of recursion depth.[11] For instance, in languages supporting proper tail recursion, such as Scheme, this ensures bounded storage usage for iterative processes expressed recursively, making it reliable for portable code across implementations.[11] In functional programming paradigms, tail calls facilitate iterative-style recursion without relying on imperative loops, thereby preserving referential transparency and function purity. This approach maintains the composability of functions, as recursive definitions can be structured to avoid side effects while mimicking loop behavior through tail-recursive calls, which is essential in pure functional languages where mutation is discouraged. By treating tail calls as jumps, compilers can optimize these structures to support scalable, side-effect-free computations that align with functional principles. More broadly, tail calls enable the efficient compilation of recursive definitions into equivalent loop constructs, which is particularly vital for functional languages lacking built-in iteration primitives. This transformation allows recursion-heavy code to execute with performance comparable to imperative loops, avoiding the space and time penalties of naive recursion and supporting applications like web servers or data processing that require sustained computation without stack growth.[12] Such optimizations are standardized in languages like Scheme, ensuring that tail-recursive programs achieve asymptotic space efficiency predictably.[11]Syntax and Forms
Syntactic Recognition
In programming languages, a tail call is syntactically recognizable when a subroutine call constitutes the final action within a function body, such that the function directly returns the result of that call without any intervening computations or operations on the returned value. This pattern ensures that the call occupies a position where no further processing of its outcome is required, allowing the current activation frame to be safely discarded before invoking the callee.[11] Variations of this syntactic form include direct returns of the call as well as calls embedded within control structures like conditionals, provided the call remains the ultimate expression in all relevant paths. For example, in a conditional construct, both the then-branch and else-branch can each end with a tail call, as long as no post-call actions occur in either case. This flexibility preserves the tail property across branching logic without altering the core requirement that the call terminates the function's execution flow.[11] Common pitfalls in syntactic recognition involve calls that appear late in a function but are not truly terminal, such as those nested inside compound expressions or followed by subsequent operations like arithmetic, assignments, or sequencing. In these scenarios, the call's result must be used or modified afterward, disqualifying it from tail position and preventing optimization opportunities. Such misidentifications can lead to unnecessary stack growth during execution.[13] This syntactic analysis is crucial for enabling tail call optimization, which reuses stack frames to support efficient recursion and prevent stack overflows in functional programming paradigms.[14]Tail Position
In programming languages that support tail calls, particularly those with functional paradigms like Scheme, a function call occupies the tail position when it constitutes the final computational step of the enclosing procedure, such that the procedure's return value is precisely the value returned by that call, without any subsequent operations, side effects, or continuations pending upon its completion.[15] Semantically, this implies that the continuation associated with the tail call is identical to the continuation of the enclosing procedure itself, allowing the implementation to reuse the current activation record rather than allocating a new one, thereby preventing unbounded stack growth in recursive scenarios.[16] This definition ensures that the call directly computes the procedure's result, aligning the control flow such that no additional evaluation context is required after the call resolves.[11] From a control-flow perspective, tail position is preserved in structured constructs where the call dominates the exit path of the procedure. For instance, in conditional expressions, a call is in tail position if it appears as the ultimate expression in every branch that can lead to the procedure's return, ensuring that regardless of the branch taken, the control flow terminates directly via the call without further processing.[15] Similarly, in loop-like constructs or sequential compositions, the position qualifies as tail only if the call is the last in a sequence that exhausts all possible paths to return, with no intervening actions that would impose a distinct continuation.[11] These conditions extend the tail property inductively through the control structure, maintaining the semantic guarantee of direct result propagation. Formally, the criteria for tail position can be characterized inductively relative to the procedure's lambda expression: the body of the lambda is inherently in tail position, and within control forms like conditionals, the expressions in the consequent and alternative branches inherit tail status if the enclosing form is itself in tail position.[11] A procedure call meets these criteria only if it is a tail expression, meaning it evaluates to the procedure's overall value without wrapping in additional computational contexts.[15] While syntactic patterns often serve as indicators for identifying such positions during compilation, the underlying semantics emphasize the absence of post-call continuations as the defining feature.[16]Examples
Basic Tail Calls
A tail call occurs when a function performs a subroutine call as its final action before returning, with no additional computations or operations following the call. This positions the subroutine invocation in a tail position, where the result of the call is directly returned by the caller. Such calls enable potential optimizations by allowing the caller's activation record to be reused, avoiding unnecessary stack growth. Consider a simple non-recursive example in pseudocode, where one function invokes another as its sole concluding action:function foo(x) {
// Perform preliminary computations on x
y = process(x);
return bar(y);
}
function foo(x) {
// Perform preliminary computations on x
y = process(x);
return bar(y);
}
bar(y) constitutes a basic tail call, as it represents the last operation in foo, and foo simply propagates bar's result. This structure is common in procedural and functional programming to chain operations efficiently without intermediate stack frames.[17]
Basic tail calls can also appear in recursive contexts through patterns like the accumulator, where a parameter accumulates state across invocations while ensuring the recursive call remains in tail position. For instance, a tail-recursive maximum function swaps arguments to avoid non-tail recursion:
function max(a, b) {
if (a > b) {
return a;
} else {
return max(b, a);
}
}
function max(a, b) {
if (a > b) {
return a;
} else {
return max(b, a);
}
}
max is in tail position, directly returning its outcome. Similarly, an accumulator-based sum might pass a running total:
function sum(n, acc) {
if (n == 0) {
return acc;
} else {
return sum(n - 1, acc + n);
}
}
function sum(n, acc) {
if (n == 0) {
return acc;
} else {
return sum(n - 1, acc + n);
}
}
Tail Recursion
Tail recursion is a form of recursion where the recursive call is the final operation performed by the function, allowing the compiler or interpreter to optimize it by reusing the current stack frame rather than allocating a new one for each call.[19] This pattern is particularly useful for implementing iterative processes in functional languages without explicit loops. A common technique to achieve tail recursion is the accumulator pattern, where an additional parameter tracks the accumulated result, transforming operations that would otherwise build up stack frames into constant-space iterations.[20] For instance, the factorial function can be rewritten in tail-recursive form by passing an accumulator that multiplies the running product. Consider the tail-recursive factorial for a non-negative integer , initialized with an accumulator of 1:def fact(n, acc=1):
if n == 0:
return acc
else:
return fact(n-1, acc * n)
def fact(n, acc=1):
if n == 0:
return acc
else:
return fact(n-1, acc * n)
def sum_list(lst, acc=0):
if lst is empty:
return acc
else:
return sum_list(lst.tail, acc + lst.head)
def sum_list(lst, acc=0):
if lst is empty:
return acc
else:
return sum_list(lst.tail, acc + lst.head)
def map_list(lst, f, acc=[]):
if lst is empty:
return reverse(acc)
else:
return map_list(lst.tail, f, cons(f(lst.head), acc))
def map_list(lst, f, acc=[]):
if lst is empty:
return reverse(acc)
else:
return map_list(lst.tail, f, cons(f(lst.head), acc))
cons prepends the transformed head to the accumulator, and reverse corrects the order in the base case.[23]
The primary benefit of tail recursion is space efficiency, as it prevents stack overflow by maintaining constant stack usage regardless of recursion depth, enabling safe computation at depths of thousands or more in languages that support tail call optimization.[24]
Optimizations
Tail Call Optimization
Tail call optimization (TCO) is a compiler technique that optimizes procedure calls occurring in tail position by eliminating the need to allocate new stack frames, thereby treating the call as a direct transfer of control equivalent to a jump instruction.[14] This optimization ensures that the called procedure reuses the current stack frame of the caller, avoiding the overhead of pushing a new return address or updating the stack pointer for the invocation.[2] As articulated by Steele, procedure calls in tail position can be implemented as efficiently as goto statements, preserving the caller's context without additional storage allocation.[14] The process begins during compilation, where the optimizer identifies calls in tail position—meaning no computations follow the call in the current procedure's body—and rewrites them as unconditional jumps to the callee's entry point.[2] This transformation involves adjusting argument passing to occur directly in the existing frame, followed by a jump that restores the caller's environment register without stacking a new continuation.[2] In languages enforcing proper tail recursion, such as Scheme, this rewrite guarantees space efficiency by precluding stack growth, distinguishing it from ad hoc optimizations in stack-based implementations.[2] A primary benefit of TCO is its space savings in recursive scenarios, converting potentially linear stack usage O(n) in depth to constant O(1) space, effectively mimicking iterative loops.[2] For instance, in tail-recursive functions like factorial computation, the stack frame is reused across invocations, bounding storage regardless of recursion depth and preventing stack overflow.[14] This efficiency is formalized in operational semantics where tail calls consume no additional control stack space, enabling unbounded recursion in constant memory.[2]Modulo Cons and Variants
Tail recursion modulo cons (TRMC) is an optimization technique that extends standard tail call optimization to handle recursive calls embedded within simple constructor applications, such as prepending an element to a list using thecons operation in Lisp-like languages.[25] This allows compilers to eliminate stack growth for functions that build data structures incrementally, by recognizing that the constructor can be deferred or transformed into an accumulator-based form without altering the semantics.[26] The technique, originating from work in the Lisp community in the 1970s, transforms such calls into true tail positions, often by reversing the accumulated list at the end or using destination-passing style.[27]
A representative example in Scheme illustrates this optimization for building a list through repeated prepending:
(define (build-list n)
(if (= n 0)
'()
(cons n (build-list (- n 1)))))
(define (build-list n)
(if (= n 0)
'()
(cons n (build-list (- n 1)))))
n due to the cons wrapping the recursive call. However, a compiler supporting TRMC recognizes the tail-recursive nature modulo cons and optimizes it to avoid stack overflow, typically by accumulating elements in reverse and reversing the result post-recursion, ensuring constant stack space even for large n.[28] (Note: While this example uses Scheme syntax, the principle applies similarly in languages like OCaml with explicit annotations.)[23]
Variants of TRMC generalize the approach to other constructors beyond simple cons. For instance, tail recursion modulo append optimizes list concatenation operations, where recursive calls are followed by appending to an accumulator, enabling efficient traversal without stack accumulation by compressing nested structures during compilation.[29] In Prolog, difference lists provide a related mechanism, representing lists as pairs of head and tail variables to facilitate tail-recursive appends via unification, allowing efficient construction of lists in logic programs without explicit reversal.[30]
Implementation
Assembly and Low-Level
At the assembly level, tail call optimization involves transforming a subroutine call that is immediately followed by a return instruction into an unconditional jump (jmp in x86 assembly), thereby avoiding the allocation of a new stack frame for the callee. This transformation reuses the existing stack frame of the calling function, as the callee's return will effectively transfer control back to the original caller without needing to unwind the intermediate frame.[31][32] In x86 assembly under a register-based calling convention like System V ABI for AMD64, the process begins by preparing the arguments in registers (e.g., rdi for the first integer argument). The compiler then deallocates the current stack frame by adding the frame size to the stack pointer (add rsp, frame_size) to discard local variables and restore the stack to the state upon entry. Callee-saved registers, if modified, are restored via pop instructions. Finally, instead of a call instruction (which pushes the return address onto the stack), an unconditional jmp is emitted to transfer control to the callee's entry point. For example, consider a simplified tail call from function foo to bar:; Non-optimized
prepare_args:
mov rdi, some_value ; Load [argument](/page/Argument)
call bar ; Pushes [return address](/page/Return_address) to foo, jumps to bar
ret ; Pops [return address](/page/Return_address), returns to foo's caller
; Optimized
prepare_args:
mov rdi, some_value ; Load [argument](/page/Argument)
add rsp, 16 ; Deallocate frame (example size)
pop rbp ; Restore frame pointer if used
jmp bar ; Jump without pushing [return address](/page/Return_address)
; Non-optimized
prepare_args:
mov rdi, some_value ; Load [argument](/page/Argument)
call bar ; Pushes [return address](/page/Return_address) to foo, jumps to bar
ret ; Pops [return address](/page/Return_address), returns to foo's caller
; Optimized
prepare_args:
mov rdi, some_value ; Load [argument](/page/Argument)
add rsp, 16 ; Deallocate frame (example size)
pop rbp ; Restore frame pointer if used
jmp bar ; Jump without pushing [return address](/page/Return_address)
IL_0000: ldarg.0 // Load argument
IL_0001: tail. // Prefix for tail call
IL_0002: call SomeMethod // Call marked as tail
IL_0007: ret // Return (optimized away in branch)
IL_0000: ldarg.0 // Load argument
IL_0001: tail. // Prefix for tail call
IL_0002: call SomeMethod // Call marked as tail
IL_0007: ret // Return (optimized away in branch)
Trampolining
Trampolining is a runtime technique used to emulate tail call optimization in programming languages that lack native support for it, allowing recursive functions to execute iteratively without consuming additional stack space. The core mechanism involves functions returning thunks—closures or small function objects that encapsulate the pending computation—rather than performing direct recursive calls or returning final values. A dedicated trampoline function then enters a loop that repeatedly invokes the returned thunk until it yields a non-thunk result, effectively simulating the tail call by reusing the current stack frame. This approach transforms potentially deep recursion into a bounded loop, preventing stack overflows in environments with strict stack limits.[40] To illustrate, consider implementing a tail-recursive factorial function in JavaScript, which does not guarantee tail call optimization. The recursive function returns a thunk for further computation, and a trampoline loop processes the chain:function factorial(n, acc = 1) {
if (n <= 1) {
return acc;
}
return () => factorial(n - 1, n * acc); // Return thunk instead of calling directly
}
function trampoline(fn) {
let result = fn();
while (typeof result === 'function') {
result = result();
}
return result;
}
// Usage
console.log(trampoline(() => factorial(10000))); // Computes without [stack overflow](/page/Stack_overflow)
function factorial(n, acc = 1) {
if (n <= 1) {
return acc;
}
return () => factorial(n - 1, n * acc); // Return thunk instead of calling directly
}
function trampoline(fn) {
let result = fn();
while (typeof result === 'function') {
result = result();
}
return result;
}
// Usage
console.log(trampoline(() => factorial(10000))); // Computes without [stack overflow](/page/Stack_overflow)
Related Concepts
While Loops
Tail-recursive functions, particularly those employing accumulators, exhibit a direct equivalence to imperative while loops that utilize mutable variables for state management. In such constructions, the recursive call serves as the final operation, allowing compilers or interpreters to optimize the process by reusing the current stack frame rather than allocating new ones, effectively transforming the recursion into an iterative loop structure. This optimization ensures constant space complexity, mirroring the behavior of a while loop that updates an accumulator in place without growing the call stack.[43][44] A representative example is the computation of the sum of a list. Consider a tail-recursive implementation in a functional style, such as the following Scheme code, which uses an accumulator to build the result:(define (sum xs)
(define (sum-tail acc xs)
(if (null? xs)
acc
(sum-tail (+ (car xs) acc) (cdr xs))))
(sum-tail 0 xs))
(define (sum xs)
(define (sum-tail acc xs)
(if (null? xs)
acc
(sum-tail (+ (car xs) acc) (cdr xs))))
(sum-tail 0 xs))
function sum_while(lst):
acc = 0
current = lst
while current is not empty:
acc = acc + head(current)
current = [tail](/page/Tail)(current)
return acc
function sum_while(lst):
acc = 0
current = lst
while current is not empty:
acc = acc + head(current)
current = [tail](/page/Tail)(current)
return acc
Continuations
Tail calls can be conceptualized as invocations of a continuation that consists solely of the caller's return behavior, effectively transferring control without preserving additional context from the current frame.[47] In continuation-passing style (CPS), a programming paradigm where functions accept an additional continuation argument representing the remainder of the computation, tail calls manifest as direct applications of the callee to the current continuation, eschewing the accumulation of new continuation frames on the call stack.[48] This structure, formalized in the λ-calculus, ensures that all function calls are tail calls, facilitating efficient implementation by avoiding stack growth and enabling optimizations akin to jumps.[47] For instance, a CPS-transformed factorial function passes the accumulator and continuation such that recursive invocations reuse the existing frame, maintaining constant stack space.[47] Delimited continuations extend this model by capturing only a bounded portion of the computation up to a designated prompt, interacting with tail calls in languages like Scheme to support advanced control structures such as coroutines.[49] In Scheme implementations adhering to standards like SRFI 248, operators such asshift0 and reset0 preserve tail-call contexts within guarded expressions, allowing coroutine generators to yield control without incurring space leaks from unbounded continuation stacking.[50] This delimited approach, often realized through monadic CPS transformations, enables modular composition of control effects while upholding tail-call optimization, as empty continuations in tail positions are detectable and handled efficiently.[49] For example, coroutine implementations using make-coroutine-generator leverage these operators to interleave executions symmetrically, simulating cooperative multitasking without violating tail-call guarantees.[50]
Language Support
Native Support
Several programming languages provide native support for tail call optimization (TCO), ensuring that tail-recursive calls do not consume additional stack space and allowing potentially unbounded recursion without stack overflow. This support is often mandated by language standards or reliably implemented in their primary compilers and runtimes. In Scheme, the Revised(5) Report on the Algorithmic Language Scheme (R5RS) explicitly requires implementations to be properly tail-recursive, meaning that any procedure call in tail position must not save the caller's continuation on the stack.[51] This guarantee enables efficient recursive algorithms, such as those for list processing. For instance, implementations like Guile and Racket fully adhere to this standard, optimizing tail calls at compile time to reuse the current stack frame. Haskell supports TCO through its lazy evaluation model, where the Glasgow Haskell Compiler (GHC) optimizes strict tail calls by eliminating unnecessary stack frames when optimizations are enabled (e.g., via the -O2 flag). This is particularly effective for right-recursive functions, allowing recursion to behave like iteration without risking stack overflow, though non-tail-recursive or left-recursive calls may still build thunks that require careful management. Erlang's runtime environment natively performs last call optimization (a form of TCO) for tail-recursive function calls, converting them into jumps that avoid stack growth.[52] This is crucial for Erlang's actor-based concurrency model, where processes can recurse indefinitely on messages without exhausting process stacks. For example, a loop processing a queue can use tail recursion to handle elements efficiently. Scala provides compiler-level TCO for self-recursive tail calls in final methods or local functions, enforced via the @tailrec annotation, which verifies and optimizes the code to prevent stack overflow. Although running on the JVM limits mutual recursion optimization, this native support allows functional-style recursion in Scala's core language features. Zig offers explicit tail call support through the @call builtin with the .always_tail modifier, which guarantees that the compiler generates tail-optimized code or fails compilation if impossible.[53] This enables compile-time recursion in comptime contexts and runtime tail calls, promoting safe and efficient recursive designs in systems programming.Partial or Emulated Support
In Python 3.14, an experimental tail-call interpreter was introduced as an opt-in feature to improve CPython's internal performance by using tail calls between small C functions for opcodes, yielding 3-5% better results on the pyperformance benchmark suite when built with Clang 19 on x86-64 and AArch64 architectures.[54] This interpreter is not enabled by default and requires configuration with the--with-tail-call-interp option during build; it optimizes the interpreter itself rather than providing tail call optimization (TCO) for user-defined Python functions.[54] For handling deep recursion in standard Python, developers must rely on increasing the recursion limit via sys.setrecursionlimit(), as CPython does not implement TCO for Python code.[54]
JavaScript, as specified in ECMAScript 2015 (ES6), includes support for proper tail calls in the language standard, allowing tail-recursive functions to avoid stack growth.[55] However, implementation is partial: only Safari's JavaScriptCore engine fully supports TCO, while engines in Chrome (V8), Firefox (SpiderMonkey), and Node.js do not, due to performance and debugging implications.[56] In unsupported environments, JavaScript developers emulate TCO using techniques like trampolining, where recursive calls are transformed into iterative loops with explicit stack management, or by leveraging async generators for asynchronous recursion without stack overflow.[56]
In C++, the GNU Compiler Collection (GCC) provides partial TCO through the -foptimize-sibling-calls flag, which enables the compiler to replace eligible tail calls with jumps, reducing stack usage for sibling calls (calls to functions at the same scope level). This optimization is not guaranteed for all tail calls and requires optimization levels like -O2 or higher to activate reliably, but it applies only when the compiler can verify safety, such as in non-inline or non-virtual functions. Similarly, in Ruby (MRI implementation), experimental TCO has been available since version 1.9.2 for tail-recursive calls within the same lexical scope, but it is not enabled by default and can be activated by setting the tailcall_optimization option to true when compiling instruction sequences using RubyVM::InstructionSequence, or by building Ruby with appropriate compiler flags.[57] This support is implementation-specific and does not extend to all recursive patterns, often necessitating manual refactoring for deep recursion to avoid stack overflows.[57]
