Cycle detection
In computer science, cycle detection or cycle finding is the algorithmic problem of finding a cycle in a sequence of iterated function values.
For any function f that maps a finite set S to itself, and any initial value x0 in S, the sequence of iterated function values
- x0, x1 = f(x0), x2 = f(x1), ..., xi = f(xi−1), ...
must eventually use the same value twice: there must be some pair of distinct indices i and j such that xi = xj. Once this happens, the sequence must continue periodically, by repeating the same sequence of values from xi to xj − 1. Cycle detection is the problem of finding i and j, given f and x0.
Several algorithms are known for finding cycles quickly and with little memory. Robert W. Floyd's tortoise and hare algorithm moves two pointers at different speeds through the sequence of values until they both point to equal values. Alternatively, Brent's algorithm is based on the idea of exponential search. Both Floyd's and Brent's algorithms use only a constant number of memory cells, and take a number of function evaluations that is proportional to the distance from the start of the sequence to the first repetition. Several other algorithms trade off larger amounts of memory for fewer function evaluations.
The applications of cycle detection include testing the quality of pseudorandom number generators and cryptographic hash functions, computational number theory algorithms, detection of infinite loops in computer programs and periodic configurations in cellular automata, automated shape analysis of linked list data structures, and detection of deadlocks for transactions management in DBMS.
Example
The figure shows a function f that maps the set S = {0,1,2,3,4,5,6,7,8} to itself. If one starts from x0 = 2 and repeatedly applies f, one sees the sequence of values
- 2, 0, 6, 3, 1, 6, 3, 1, 6, 3, 1, ....
The cycle in this value sequence is 6, 3, 1.
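This iteration can be reproduced in a few lines of Python. The table below is a hypothetical completion of the figure's f: only the values reachable from x0 = 2 are given in the text, so the remaining entries are arbitrary placeholders.

```python
# Values of f known from the example sequence: f(2)=0, f(0)=6, f(6)=3,
# f(3)=1, f(1)=6.  The entries for 4, 5, 7, 8 are made-up placeholders.
f_table = {2: 0, 0: 6, 6: 3, 3: 1, 1: 6, 4: 0, 5: 3, 7: 1, 8: 0}

def iterate(f, x0, n):
    """Return the first n values x0, f(x0), f(f(x0)), ... of the sequence."""
    seq = [x0]
    for _ in range(n - 1):
        seq.append(f[seq[-1]])
    return seq

print(iterate(f_table, 2, 11))  # -> [2, 0, 6, 3, 1, 6, 3, 1, 6, 3, 1]
```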
Definitions
Let S be any finite set, f be any function from S to itself, and x0 be any element of S. For any i > 0, let xi = f(xi − 1). Let μ be the smallest index such that the value xμ reappears infinitely often within the sequence of values xi, and let λ (the loop length) be the smallest positive integer such that xμ = xλ + μ. The cycle detection problem is the task of finding λ and μ.[1]
One can view the same problem graph-theoretically, by constructing a functional graph (that is, a directed graph in which each vertex has a single outgoing edge) the vertices of which are the elements of S and the edges of which map an element to the corresponding function value, as shown in the figure. The set of vertices reachable from starting vertex x0 form a subgraph with a shape resembling the Greek letter rho (ρ): a path of length μ from x0 to a cycle of λ vertices.[2]
Practical cycle-detection algorithms do not find λ and μ exactly.[1] They usually find lower and upper bounds μl ≤ μ ≤ μh for the start of the cycle, and a more detailed search of the range must be performed if the exact value of μ is needed. Also, most algorithms do not guarantee to find λ directly, but may instead find some multiple kλ of λ with kλ < μ + λ. (Continuing the search for an additional kλ/q steps, where q is the smallest prime divisor of kλ, will either find the true λ or prove that k = 1.)
Computer representation
Except in toy examples like the above, f will not be specified as a table of values; such a table implies O(|S|) space complexity. If that much space is permissible, building a predecessor array (an associative array mapping each xi to its index i) while iterating f will detect the first repeated value the second time it is visited, at which point the stored index in the predecessor array is μ and the current index is μ + λ. Instead, a cycle detection algorithm is given a black box for generating the sequence xi, and the task is to find λ and μ using very little memory.
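As a sketch of the predecessor-array approach just described (the function and variable names here are illustrative, not from any particular source):

```python
def naive_rho(f, x0):
    """Find (lam, mu) by recording the index of each value's first visit.
    Space is O(mu + lam), in contrast to the constant-space algorithms
    discussed later in this article."""
    first_index = {}          # the "predecessor array": value -> first index
    x, i = x0, 0
    while x not in first_index:
        first_index[x] = i
        x = f(x)
        i += 1
    mu = first_index[x]       # the repeated value was first seen at index mu
    lam = i - mu              # ... and is revisited at index mu + lam
    return lam, mu

# On the example sequence 2, 0, 6, 3, 1, 6, ... this reports (3, 2).
print(naive_rho({2: 0, 0: 6, 6: 3, 3: 1, 1: 6}.get, 2))
```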
The black box might consist of an implementation of the recurrence function f, but it might also store additional internal state to make the computation more efficient. Although xi = f(xi−1) must be true in principle, this might be expensive to compute directly; the function could be defined in terms of the discrete logarithm of xi−1 or some other difficult-to-compute property which can only be practically computed in terms of additional information. In such cases, the number of black boxes required becomes a figure of merit distinguishing the algorithms.
A second reason to use one of these algorithms is that they are pointer algorithms which do no operations on elements of S other than testing for equality. An associative array implementation requires computing a hash function on the elements of S, or ordering them. But cycle detection can be applied in cases where neither of these are possible.
The classic example is Pollard's rho algorithm for integer factorization, which searches for a factor p of a given number n by looking for values xi and xi+λ which are equal modulo p without knowing p in advance. This is done by computing the greatest common divisor of the difference xi − xi+λ with a known multiple of p, namely n. If the gcd is non-trivial (neither 1 nor n), then the value is a proper factor of n, as desired.[2] If n is not prime, it must have at least one factor p ≤ √n, and by the birthday paradox, a random function f has an expected cycle length (modulo p) of √p ≤ n^(1/4).
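A minimal sketch of this idea, using Floyd's two-pointer scheme as the cycle detector and the conventional iteration f(x) = x² + c mod n; this is illustrative, not a production factoring routine:

```python
from math import gcd

def pollard_rho(n, c=1):
    """Find a nontrivial factor of n by detecting a cycle modulo an
    unknown prime factor p, via gcds of pointer differences with n."""
    if n % 2 == 0:
        return 2
    f = lambda x: (x * x + c) % n
    tortoise = hare = 2
    d = 1
    while d == 1:
        tortoise = f(tortoise)          # one step
        hare = f(f(hare))               # two steps
        d = gcd(abs(tortoise - hare), n)
    return d if d != n else None        # d == n: retry with another c

print(pollard_rho(8051))  # 8051 = 83 * 97; finds one of the factors
```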
Algorithms
If the input is given as a subroutine for calculating f, the cycle detection problem may be trivially solved using only λ + μ function applications, simply by computing the sequence of values xi and using a data structure such as a hash table to store these values and test whether each subsequent value has already been stored. However, the space complexity of this algorithm is proportional to λ + μ, unnecessarily large. Additionally, to implement this method as a pointer algorithm would require applying the equality test to each pair of values, resulting in quadratic time overall. Thus, research in this area has concentrated on two goals: using less space than this naive algorithm, and finding pointer algorithms that use fewer equality tests.
Floyd's tortoise and hare
Floyd's cycle-finding algorithm is a pointer algorithm that uses only two pointers, which move through the sequence at different speeds. It is also called the "tortoise and the hare algorithm", alluding to Aesop's fable of The Tortoise and the Hare.
The algorithm is named after Robert W. Floyd, who was credited with its invention by Donald Knuth.[3][4] However, the algorithm does not appear in Floyd's published work, and this may be a misattribution: Floyd describes algorithms for listing all simple cycles in a directed graph in a 1967 paper,[5] but this paper does not describe the cycle-finding problem in functional graphs that is the subject of this article. In fact, Knuth's 1969 statement attributing the algorithm to Floyd, without citation, is its first known appearance in print; it may thus be a folk theorem, not attributable to a single individual.[6]
The key insight in the algorithm is as follows. If there is a cycle, then, for any integers i ≥ μ and k ≥ 0, xi = xi + kλ, where λ is the length of the loop to be found, μ is the index of the first element of the cycle, and k is a nonnegative integer representing the number of times the loop is traversed. Based on this, it can then be shown that i = kλ ≥ μ for some k if and only if xi = x2i (if xi = x2i in the cycle, then there exists some k such that 2i = i + kλ, which implies that i = kλ; and if there are some i and k such that i = kλ ≥ μ, then 2i = i + kλ and x2i = xi + kλ = xi). Thus, the algorithm only needs to check for repeated values of this special form, one twice as far from the start of the sequence as the other, to find a period ν of a repetition that is a multiple of λ. Once ν is found, the algorithm retraces the sequence from its start to find the first repeated value xμ in the sequence, using the fact that λ divides ν and therefore that xμ = xμ + ν. Finally, once the value of μ is known it is trivial to find the length λ of the shortest repeating cycle, by searching for the first position μ + λ for which xμ + λ = xμ.
The algorithm thus maintains two pointers into the given sequence, one (the tortoise) at xi, and the other (the hare) at x2i. At each step of the algorithm, it increases i by one, moving the tortoise one step forward and the hare two steps forward in the sequence, and then compares the sequence values at these two pointers. The smallest value of i > 0 for which the tortoise and hare point to equal values is the desired value ν.
The following Python code shows how this idea may be implemented as an algorithm.
def floyd(f, x0) -> (int, int):
    """Floyd's cycle detection algorithm."""
    # Main phase of algorithm: finding a repetition x_i = x_2i.
    # The hare moves twice as quickly as the tortoise and
    # the distance between them increases by 1 at each step.
    # Eventually they will both be inside the cycle and then,
    # at some point, the distance between them will be
    # divisible by the period λ.
    tortoise = f(x0)  # f(x0) is the element/node next to x0.
    hare = f(f(x0))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))

    # At this point the tortoise position, ν, which is also equal
    # to the distance between hare and tortoise, is divisible by
    # the period λ. So hare moving in the cycle one step at a time,
    # and tortoise (reset to x0) moving towards the cycle, will
    # intersect at the beginning of the cycle. Because the
    # distance between them is constant at 2ν, a multiple of λ,
    # they will agree as soon as the tortoise reaches index μ.

    # Find the position μ of the first repetition.
    mu = 0
    tortoise = x0
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(hare)  # Hare and tortoise move at the same speed.
        mu += 1

    # Find the length of the shortest cycle starting from x_μ.
    # The hare moves one step at a time while the tortoise stays still.
    # lam is incremented until λ is found.
    lam = 1
    hare = f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        lam += 1

    return lam, mu
This code only accesses the sequence by storing and copying pointers, function evaluations, and equality tests; therefore, it qualifies as a pointer algorithm. The algorithm uses O(λ + μ) operations of these types, and O(1) storage space.[7]
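As a concrete check, the algorithm can be run on the example function from the figure; only the values reachable from x0 = 2 matter, and floyd here is a condensed copy of the listing above so that the snippet is self-contained.

```python
def floyd(f, x0):
    # condensed version of the listing above
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    mu, tortoise = 0, x0
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(hare)
        mu += 1
    lam, hare = 1, f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        lam += 1
    return lam, mu

f = {2: 0, 0: 6, 6: 3, 3: 1, 1: 6}.get   # the example from the figure
print(floyd(f, 2))  # -> (3, 2): the cycle 6, 3, 1 of length 3 starts at index 2
```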
Brent's algorithm
Richard P. Brent described an alternative cycle detection algorithm that, like the tortoise and hare algorithm, requires only two pointers into the sequence.[8] However, it is based on a different principle: searching for the smallest power of two 2^i that is larger than both λ and μ. For i = 0, 1, 2, ..., the algorithm compares x_(2^i−1) with each subsequent sequence value up to the next power of two, stopping when it finds a match. It has two advantages compared to the tortoise and hare algorithm: it finds the correct length λ of the cycle directly, rather than needing to search for it in a subsequent stage, and its steps involve only one evaluation of the function f rather than three.[9]
The following Python code shows how this technique works in more detail.
def brent(f, x0) -> (int, int):
    """Brent's cycle detection algorithm."""
    # Main phase: search successive powers of two.
    power = lam = 1
    tortoise = x0
    hare = f(x0)  # f(x0) is the element/node next to x0.
    # This assumes there is a cycle; otherwise this loop won't terminate.
    while tortoise != hare:
        if power == lam:  # time to start a new power of two?
            tortoise = hare
            power *= 2
            lam = 0
        hare = f(hare)
        lam += 1

    # Find the position of the first repetition of length λ.
    tortoise = hare = x0
    for i in range(lam):
        hare = f(hare)
    # The distance between the hare and tortoise is now λ.

    # Next, the hare and tortoise move at the same speed until they agree.
    mu = 0
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(hare)
        mu += 1

    return lam, mu
Like the tortoise and hare algorithm, this is a pointer algorithm that uses O(λ + μ) tests and function evaluations and O(1) storage space. It is not difficult to show that the number of function evaluations can never be higher than for Floyd's algorithm. Brent claims that, on average, his cycle finding algorithm runs around 36% more quickly than Floyd's and that it speeds up the Pollard rho algorithm by around 24%. He also performs an average case analysis for a randomized version of the algorithm in which the sequence of indices traced by the slower of the two pointers is not the powers of two themselves, but rather a randomized multiple of the powers of two. Although his main intended application was in integer factorization algorithms, Brent also discusses applications in testing pseudorandom number generators.[8]
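The claim that Brent's method never uses more evaluations of f than Floyd's can be checked empirically by instrumenting f with a call counter; condensed copies of both listings are included so the snippet is self-contained.

```python
def floyd(f, x0):
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    mu, tortoise = 0, x0
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(hare)
        mu += 1
    lam, hare = 1, f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        lam += 1
    return lam, mu

def brent(f, x0):
    power = lam = 1
    tortoise, hare = x0, f(x0)
    while tortoise != hare:
        if power == lam:
            tortoise, power, lam = hare, power * 2, 0
        hare = f(hare)
        lam += 1
    tortoise = hare = x0
    for _ in range(lam):
        hare = f(hare)
    mu = 0
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(hare)
        mu += 1
    return lam, mu

def count_evals(algo, f, x0):
    """Run a cycle-finding algorithm with a counting wrapper around f."""
    calls = [0]
    def g(x):
        calls[0] += 1
        return f(x)
    return algo(g, x0), calls[0]

f = {2: 0, 0: 6, 6: 3, 3: 1, 1: 6}.get   # the example from the figure
res_f, n_floyd = count_evals(floyd, f, 2)
res_b, n_brent = count_evals(brent, f, 2)
print(res_f, n_floyd, res_b, n_brent)  # both find (3, 2); Brent uses fewer calls
```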
Gosper's algorithm
R. W. Gosper's algorithm[10][11] finds the period λ, together with lower and upper bounds μl ≤ μ ≤ μh for the starting point of the first cycle. The difference between the lower and upper bound is of the same order as the period, i.e. μh − μl is O(λ).
The algorithm maintains an array of saved tortoise values T0, T1, T2, .... For each xi (i ≥ 1):
- Compare xi to each saved value Tj.
- If xi = Tj, a cycle has been detected, of length λ = i − m, where m is the index at which the value now held in Tj was saved.
- If no match is found, set Tk = xi, where k is the number of trailing zeros in the binary representation of i + 1, i.e. 2^k is the greatest power of 2 which divides i + 1.
If it is inconvenient to vary the number of comparisons as i increases, you may initialize all of the Tj to x0, but must then disregard any match xi = Tj that occurs while i < 2^j, before slot j has been legitimately written.
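The steps above can be sketched in Python by storing (value, index) pairs, so that λ can be read off directly from a match; the slot indexing (trailing zeros of i + 1) follows the HAKMEM/Hacker's Delight presentation, and the function name gosper is ours.

```python
def gosper(f, x0):
    """Return (lam, m): the exact cycle length and the index m at which
    the matching saved value was stored (the cycle starts at some mu <= m)."""
    T = {0: (x0, 0)}        # slot k -> (x_m, m), with m + 1 divisible by 2**k
    x, i = x0, 0
    while True:
        x = f(x)
        i += 1
        for value, m in T.values():
            if value == x:
                return i - m, m
        k = ((i + 1) & -(i + 1)).bit_length() - 1   # trailing zeros of i + 1
        T[k] = (x, i)

f = {2: 0, 0: 6, 6: 3, 3: 1, 1: 6}.get   # the example from the figure
print(gosper(f, 2))  # -> (3, 3): lam = 3, matched the value saved at index 3
```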
Advantages
The main features of Gosper's algorithm are that it is economical in space, very economical in evaluations of the generator function, and always finds the exact cycle length (never a multiple). The cost is a large number of equality comparisons. It could be roughly described as a concurrent version of Brent's algorithm. While Brent's algorithm uses a single tortoise, repositioned every time the hare passes a power of two, Gosper's algorithm uses several tortoises (several previous values are saved), which are roughly exponentially spaced. According to the note in HAKMEM item 132,[11] this algorithm will detect repetition before the third occurrence of any value; i.e., the cycle will be iterated at most twice. HAKMEM also states that it is sufficient to store ⌊log2 λ⌋ + 1 previous values; however, this only offers a saving if we know a priori that λ is significantly smaller than μ. The standard implementations[10] store ⌊log2(λ + μ)⌋ + 1 values. For example, assume the function values are 32-bit integers, so that the sequence must repeat within its first 2^32 values, i.e. λ + μ ≤ 2^32. Then Gosper's algorithm will find the cycle after fewer than μ + 2λ function evaluations, while consuming the space of 33 values (each value being a 32-bit integer).
Complexity
Upon the i-th evaluation of the generator function, the algorithm compares the generated value with O(log i) previous values; observe that i goes up to at least μ + λ and at most μ + 2λ. Therefore, the time complexity of this algorithm is O((μ + λ) log(μ + λ)). Since it stores Θ(log(μ + λ)) values, its space complexity is Θ(log(μ + λ)) words. This is under the usual transdichotomous model, assumed throughout this article, in which the size of the function values is constant. Without this assumption, each of the Θ(log(μ + λ)) stored values must occupy Θ(log(μ + λ)) bits to be distinct, so the overall space complexity is O(log²(μ + λ)).
Time–space tradeoffs
A number of authors have studied techniques for cycle detection that use more memory than Floyd's and Brent's methods, but detect cycles more quickly. In general these methods store several previously-computed sequence values, and test whether each new value equals one of the previously-computed values. In order to do so quickly, they typically use a hash table or similar data structure for storing the previously-computed values, and therefore are not pointer algorithms: in particular, they usually cannot be applied to Pollard's rho algorithm. Where these methods differ is in how they determine which values to store. Following Nivasch,[12] we survey these techniques briefly.
- Brent[8] already describes variations of his technique in which the indices of saved sequence values are powers of a number R other than two. By choosing R to be a number close to one, and storing the sequence values at indices that are near a sequence of consecutive powers of R, a cycle detection algorithm can use a number of function evaluations that is within an arbitrarily small factor of the optimum λ + μ.[13][14]
- Sedgewick, Szymanski, and Yao[15] provide a method that uses M memory cells and requires in the worst case only (λ + μ)(1 + cM^(−1/2)) function evaluations, for some constant c, which they show to be optimal. The technique involves maintaining a numerical parameter d, storing in a table only those positions in the sequence that are multiples of d, and clearing the table and doubling d whenever too many values have been stored.
- Several authors have described distinguished point methods that store function values in a table based on a criterion involving the values, rather than (as in the method of Sedgewick et al.) based on their positions. For instance, values equal to zero modulo some value d might be stored.[16][17] More simply, Nivasch[12] credits D. P. Woodruff with the suggestion of storing a random sample of previously seen values, making an appropriate random choice at each step so that the sample remains random.
- Nivasch[12] describes an algorithm that does not use a fixed amount of memory, but for which the expected amount of memory used (under the assumption that the input function is random) is logarithmic in the sequence length. An item is stored in the memory table, with this technique, when no later item has a smaller value. As Nivasch shows, the items with this technique can be maintained using a stack data structure, and each successive sequence value need be compared only to the top of the stack. The algorithm terminates when the repeated sequence element with smallest value is found. Running the same algorithm with multiple stacks, using random permutations of the values to reorder the values within each stack, allows a time–space tradeoff similar to the previous algorithms. However, even the version of this algorithm with a single stack is not a pointer algorithm, due to the comparisons needed to determine which of two values is smaller.
Any cycle detection algorithm that stores at most M values from the input sequence must perform at least (λ + μ)(1 + cM^(−1/2)) function evaluations, for some constant c, matching the bound achieved by the method of Sedgewick, Szymanski, and Yao.[18][19]
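The single-stack method of Nivasch described above can be sketched as follows (assuming the values are totally ordered; the function name is illustrative):

```python
def nivasch(f, x0):
    """Return a cycle length lam.  The stack holds (value, index) pairs with
    values increasing from bottom to top; the algorithm stops when the
    minimum value on the cycle is encountered for the second time."""
    stack = []
    x, i = x0, 0
    while True:
        while stack and stack[-1][0] > x:
            stack.pop()               # discard entries larger than the new value
        if stack and stack[-1][0] == x:
            return i - stack[-1][1]   # second visit of the cycle's minimum
        stack.append((x, i))
        x = f(x)
        i += 1

f = {2: 0, 0: 6, 6: 3, 3: 1, 1: 6}.get   # the example from the figure
print(nivasch(f, 2))  # -> 3
```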
Applications
[edit]Cycle detection has been used in many applications.
- Determining the cycle length of a pseudorandom number generator is one measure of its strength. This is the application cited by Knuth in describing Floyd's method.[3] Brent[8] describes the results of testing a linear congruential generator in this fashion; its period turned out to be significantly smaller than advertised. For more complex generators, the sequence of values in which the cycle is to be found may not represent the output of the generator, but rather its internal state.
- Several number-theoretic algorithms are based on cycle detection, including Pollard's rho algorithm for integer factorization[20] and his related kangaroo algorithm for the discrete logarithm problem.[21]
- In cryptographic applications, the ability to find two distinct values xμ−1 and xλ+μ−1 mapped by some cryptographic function ƒ to the same value xμ may indicate a weakness in ƒ. For instance, Quisquater and Delescaille[17] apply cycle detection algorithms in the search for a message and a pair of Data Encryption Standard keys that map that message to the same encrypted value; Kaliski, Rivest, and Sherman[22] also use cycle detection algorithms to attack DES. The technique may also be used to find a collision in a cryptographic hash function.[23]
- Cycle detection may be helpful as a way of discovering infinite loops in certain types of computer programs.[24]
- Periodic configurations in cellular automaton simulations may be found by applying cycle detection algorithms to the sequence of automaton states.[12]
- Shape analysis of linked list data structures is a technique for verifying the correctness of an algorithm using those structures. If a node in the list incorrectly points to an earlier node in the same list, the structure will form a cycle that can be detected by these algorithms.[25] In Common Lisp, the S-expression printer, under control of the *print-circle* variable, detects circular list structure and prints it compactly.
- Teske[14] describes applications in computational group theory: determining the structure of an Abelian group from a set of its generators. The cryptographic algorithms of Kaliski et al.[22] may also be viewed as attempting to infer the structure of an unknown group.
- Fich (1981) briefly mentions an application to computer simulation of celestial mechanics, which she attributes to William Kahan. In this application, cycle detection in the phase space of an orbital system may be used to determine whether the system is periodic to within the accuracy of the simulation.[18]
- In Mandelbrot set fractal generation, several performance techniques are used to speed up image generation. One of them, called "period checking", consists of finding cycles in the orbit of a point: if the orbit becomes periodic, the point belongs to the set and iteration can stop early. Cycle detection algorithms are used to implement this technique.
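A sketch of period checking for the quadratic iteration z → z² + c, using a Brent-style saved reference point and a numerical tolerance (the function name and the tolerance value are illustrative):

```python
def orbit_period(c, max_iter=1000, eps=1e-12):
    """Return the detected period of the orbit of 0 under z -> z*z + c,
    or None if the orbit escapes (|z| > 2) or no cycle is found."""
    z = saved = 0j
    power, lam = 1, 1
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return None          # escaped: c is outside the Mandelbrot set
        if abs(z - saved) < eps:
            return lam           # the orbit has (numerically) closed a cycle
        if power == lam:         # Brent-style: move the reference point
            saved, power, lam = z, power * 2, 0
        lam += 1

print(orbit_period(-1))  # -> 2 (the orbit 0, -1, 0, -1, ... has period 2)
print(orbit_period(1))   # -> None (the orbit 0, 1, 2, 5, ... escapes)
```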
References
[edit]- ^ a b Joux, Antoine (2009), "7. Birthday-based algorithms for functions", Algorithmic Cryptanalysis, CRC Press, p. 223, ISBN 978-1-420-07003-3.
- ^ a b Joux (2009, p. 224).
- ^ a b Knuth, Donald E. (1969), The Art of Computer Programming, vol. II: Seminumerical Algorithms, Addison-Wesley, p. 7, exercises 6 and 7
- ^ Handbook of Applied Cryptography, by Alfred J. Menezes, Paul C. van Oorschot, Scott A. Vanstone, p. 125, describes this algorithm and others
- ^ Floyd, R.W. (1967), "Nondeterministic Algorithms", J. ACM, 14 (4): 636–644, doi:10.1145/321420.321422, S2CID 1990464
- ^ The Hash Function BLAKE, by Jean-Philippe Aumasson, Willi Meier, Raphael C.-W. Phan, Luca Henzen (2015), p. 21, footnote 8
- ^ Joux (2009), Section 7.1.1, Floyd's cycle-finding algorithm, pp. 225–226.
- ^ a b c d Brent, R. P. (1980), "An improved Monte Carlo factorization algorithm" (PDF), BIT Numerical Mathematics , 20 (2): 176–184, doi:10.1007/BF01933190, S2CID 17181286.
- ^ Joux (2009), Section 7.1.2, Brent's cycle-finding algorithm, pp. 226–227.
- ^ a b Warren, Henry S. Jr. "Loop detectors of Floyd and Gosper". Hacker's Delight. Archived from the original on 2016-04-14. Retrieved 2017-02-08.
- ^ a b "Hakmem -- Flows and Iterated Functions -- Draft, Not Yet Proofed". Archived from the original on 2020-03-18. Retrieved 2024-05-02.
- ^ a b c d Nivasch, Gabriel (2004), "Cycle detection using a stack", Information Processing Letters, 90 (3): 135–140, doi:10.1016/j.ipl.2004.01.016.
- ^ Schnorr, Claus P.; Lenstra, Hendrik W. (1984), "A Monte Carlo factoring algorithm with linear storage", Mathematics of Computation, 43 (167): 289–311, doi:10.2307/2007414, hdl:1887/3815, JSTOR 2007414.
- ^ a b Teske, Edlyn (1998), "A space-efficient algorithm for group structure computation", Mathematics of Computation, 67 (224): 1637–1663, Bibcode:1998MaCom..67.1637T, doi:10.1090/S0025-5718-98-00968-5.
- ^ Sedgewick, Robert; Szymanski, Thomas G.; Yao, Andrew C.-C. (1982), "The complexity of finding cycles in periodic functions", SIAM Journal on Computing, 11 (2): 376–390, doi:10.1137/0211030.
- ^ van Oorschot, Paul C.; Wiener, Michael J. (1999), "Parallel collision search with cryptanalytic applications", Journal of Cryptology, 12 (1): 1–28, doi:10.1007/PL00003816, S2CID 5091635.
- ^ a b Quisquater, J.-J.; Delescaille, J.-P. (1990), "How easy is collision search? Application to DES", Advances in Cryptology – EUROCRYPT '89, Workshop on the Theory and Application of Cryptographic Techniques, Lecture Notes in Computer Science, vol. 434, Springer-Verlag, pp. 429–434, doi:10.1007/3-540-46885-4_43, ISBN 978-3-540-53433-4.
- ^ a b Fich, Faith Ellen (1981), "Lower bounds for the cycle detection problem", Proc. 13th ACM Symposium on Theory of Computing, Stoc '81, pp. 96–105, doi:10.1145/800076.802462, ISBN 978-1-4503-7392-0, S2CID 119742106.
- ^ Allender, Eric W.; Klawe, Maria M. (1985), "Improved lower bounds for the cycle detection problem", Theoretical Computer Science, 36 (2–3): 231–237, doi:10.1016/0304-3975(85)90044-1.
- ^ Pollard, J. M. (1975), "A Monte Carlo method for factorization", BIT, 15 (3): 331–334, doi:10.1007/BF01933667, S2CID 122775546.
- ^ Pollard, J. M. (1978), "Monte Carlo methods for index computation (mod p)", Mathematics of Computation, 32 (143), American Mathematical Society: 918–924, doi:10.2307/2006496, JSTOR 2006496, S2CID 235457090.
- ^ a b Kaliski, Burton S. Jr.; Rivest, Ronald L.; Sherman, Alan T. (1988), "Is the Data Encryption Standard a group? (Results of cycling experiments on DES)", Journal of Cryptology, 1 (1): 3–36, doi:10.1007/BF00206323, S2CID 17224075.
- ^ Joux (2009), Section 7.5, Collisions in hash functions, pp. 242–245.
- ^ Van Gelder, Allen (1987), "Efficient loop detection in Prolog using the tortoise-and-hare technique", Journal of Logic Programming, 4 (1): 23–31, doi:10.1016/0743-1066(87)90020-3.
- ^ Auguston, Mikhail; Hon, Miu Har (1997), "Assertions for Dynamic Shape Analysis of List Data Structures", AADEBUG '97, Proceedings of the Third International Workshop on Automatic Debugging, Linköping Electronic Articles in Computer and Information Science, Linköping University, pp. 37–42.
External links
- Gabriel Nivasch, The Cycle Detection Problem and the Stack Algorithm
- Tortoise and Hare, Portland Pattern Repository
- Floyd's Cycle Detection Algorithm (The Tortoise and the Hare)
- Brent's Cycle Detection Algorithm (The Teleporting Turtle)
Basic Concepts
Problem Statement and Definitions
Cycle detection is a fundamental problem in computer science and mathematics, concerned with identifying repeating patterns, or cycles, in sequences generated by repeated application of a function. Given a function f defined on some domain S and an initial value x0 in S, the sequence is defined iteratively as xi+1 = f(xi) for i ≥ 0. A cycle in this sequence manifests as a point where the iteration enters a periodic loop, meaning that after some initial transient phase, the values repeat indefinitely.[6]

Mathematically, a cycle exists if there are non-negative integers μ (the tail length, representing the number of steps before entering the cycle) and λ (the cycle length, the period of repetition) such that f^(μ+λ+k)(x0) = f^(μ+k)(x0) for all integers k ≥ 0, where f^n denotes the n-fold composition of f. This condition ensures that starting from the μ-th iterate, the sequence loops with exact period λ, and μ and λ are taken to be the smallest such values. The overall structure of the sequence often resembles the Greek letter ρ (rho), consisting of a non-repeating tail of length μ followed by a cyclic loop of length λ, a paradigm central to many cycle detection methods.[6]

The problem can be subdivided into cycle detection, which determines only the existence of a cycle (i.e., whether such μ and λ exist), and cycle finding, which additionally computes the specific values of μ and λ. Cycle detection in linked lists represents a special case where the iteration follows the next-pointer function, but the general formulation applies to arbitrary functional iterations.[7] This topic presupposes familiarity with basic concepts of functions, iteration, and sequences, without requiring prior knowledge of graph theory.[6]
Illustrative Examples
To illustrate the cycle detection problem, consider a simple cyclic linked list consisting of three nodes labeled A, B, and C, where A points to B, B points to C, and C points back to B. Starting traversal from A, the sequence of visits is A, B, C, B, C, B, C, ..., entering an infinite loop at B without ever terminating. This configuration forms a "rho" shape (ρ), with a non-repeating tail segment (A to B) leading into a repeating cycle (B to C back to B).[7]

In contrast, an acyclic linked list provides a terminating traversal. For example, nodes A, B, and C connected linearly as A → B → C → null yield the finite sequence A, B, C, ending at null without repetition or looping. This highlights the absence of a cycle, allowing traversal to complete normally.[7]

A widely recognized formulation of cycle detection in linked lists appears in programming challenges such as LeetCode problem 141, "Linked List Cycle." The problem requires determining whether a singly linked list contains a cycle: given the head of the list, return true if there is a cycle (some node reachable again via next pointers), and false otherwise. Internally, pos denotes the (0-based) index where the tail connects to form a cycle, though pos is not passed as input (pos = -1 indicates no cycle).[8] Examples include:
- head = [3,2,0,-4], pos = 1 → true (the tail connects to index 1, the node with value 2)
- head = [1,2], pos = 0 → true (the tail connects to index 0, the node with value 1)
- head = [9], pos = -1 → false (no cycle)
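The examples above can be reproduced with a minimal two-pointer check (a standard solution shape for LeetCode 141; the ListNode class here is our own minimal stand-in):

```python
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None

def has_cycle(head):
    """Return True iff the list reachable from head loops back on itself."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:          # the fast pointer lapped the slow one
            return True
    return False

# head = [3, 2, 0, -4] with pos = 1: the tail node points back to index 1.
nodes = [ListNode(v) for v in (3, 2, 0, -4)]
for a, b in zip(nodes, nodes[1:]):
    a.next = b
nodes[-1].next = nodes[1]
print(has_cycle(nodes[0]))   # -> True

single = ListNode(9)         # head = [9], pos = -1
print(has_cycle(single))     # -> False
```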
Computational Representations
Linked Lists
A singly-linked list is a fundamental data structure in computer science, composed of a sequence of nodes where each node stores a data value and a reference (pointer) to the next node in the sequence. The list originates from a head pointer referencing the first node, and conventionally, the final node's next pointer is set to null to signify the end of the list. This pointer-based organization enables dynamic memory allocation, allowing the list to grow or shrink at runtime without predefined size limits.

A cycle in a singly-linked list arises when the next pointer of any node directs back to a preceding node, thereby forming a looped structure that revisits elements indefinitely upon traversal. Such cycles often emerge unintentionally from programming errors, such as erroneous pointer reassignments during insertion, deletion, or merging operations, which disrupt the intended linear flow. Intentionally, cycles appear in circular linked lists, where the last node connects to the first, facilitating applications like circular buffers for efficient queue implementations without fixed endpoints.

The key challenge with cycles in linked lists is their propensity to cause non-terminating traversals, as algorithms expecting a null terminator enter infinite loops, potentially leading to resource exhaustion or program crashes. This issue is particularly prevalent in buggy code where pointer manipulations inadvertently create loops, underscoring the need for robust verification in list-handling routines. In the broader scope of cycle detection, a linked list cycle represents a special case of iteration over a function that maps each node to its successor.

Standard traversal of a singly-linked list proceeds by iteratively following next pointers from the head until null is encountered. A representative pseudocode snippet for this is:

current = head
while current is not null:
    process(current.data)
    current = current.next
Iterated Functions and Sequences
In the context of cycle detection, iterated functions generate sequences where each term is obtained by applying a function f to the previous term, starting from an initial state x0 and yielding xi+1 = f(xi) for i ≥ 0.[11] These sequences can be modeled as functional graphs, which are directed graphs where each node represents a possible state in the domain of f, and each node x has exactly one outgoing edge, pointing to f(x).[12] In such graphs, cycles appear as loops where a state eventually maps back to itself after a series of iterations, often within connected components that include trees feeding into the cycle.[12] When the state space is finite, the pigeonhole principle ensures that infinite iteration must produce repetitions, as there are only finitely many distinct states available.[11] Specifically, for a function f defined over a finite set of size n, the sequence generated from any starting point cannot exceed n unique values before repeating, guaranteeing the existence of a cycle in the functional graph.[11] This periodicity is fundamental to analyzing the behavior of iterative processes in finite domains, motivating efficient detection methods.
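The pigeonhole guarantee can be checked directly: for the (illustrative) map f(x) = (x² + 1) mod 255, the first repetition must occur within the first 256 terms of the sequence.

```python
def first_repeat(f, x0, n):
    """Iterate f from x0 over a state space of size n and return
    (mu, mu + lam): the indices of the first repeated value's two visits.
    By the pigeonhole principle this happens within n + 1 terms."""
    seen = {}
    x = x0
    for i in range(n + 1):
        if x in seen:
            return seen[x], i
        seen[x] = i
        x = f(x)

mu, j = first_repeat(lambda x: (x * x + 1) % 255, 3, 255)
print(mu, j - mu)   # tail length mu and cycle length lam = j - mu
```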
The typical structure of such a sequence consists of a non-repeating prefix, known as the tail or leader, of length μ (where the first μ terms are distinct), followed by a repeating cycle of length λ, starting at the μ-th term.[13] This ρ-shaped configuration, visualized as a tail leading into a loop, arises naturally in random mappings and is central to understanding where cycles begin, enabling algorithms to identify the entry point and length without exhaustive search.[13] Common state spaces for these iterations include the integers modulo m, where arithmetic operations like multiplication or addition define f, as seen in pseudorandom number generators; fixed-length bit strings, such as 32-bit or 64-bit words used in hash functions; or arbitrary finite sets like residue classes in cryptographic protocols.[12] Linked lists serve as a concrete data structure implementation for iterating such sequences in memory.[11]
Standard Algorithms
Floyd's Tortoise and Hare Algorithm
Floyd's tortoise and hare algorithm is a constant-space method for detecting cycles in functional graphs or sequences produced by repeated application of a function f, where the structure resembles the Greek letter rho (ρ), with a tail of length μ leading into a cycle of length λ. The algorithm employs two pointers starting from the initial element: the tortoise moves one step at a time by applying f once, while the hare moves twice as fast by applying f twice per iteration. If a cycle exists, the faster hare will eventually catch the slower tortoise within the cycle, confirming the presence of a loop; otherwise, the pointers traverse the entire finite sequence without meeting.[14] In particular, the algorithm serves as the standard O(1)-extra-space solution for detecting cycles in linked lists, where f advances to the next node, as required by problems such as LeetCode 141 (Linked List Cycle).[8] The algorithm can be expressed in the following pseudocode, assuming access to the starting element and the next-function f:
tortoise ← start
hare ← start
repeat
    tortoise ← f(tortoise)
    hare ← f(f(hare))
until tortoise = hare
return "cycle detected"
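A runnable Python version of this scheme, extended with the standard follow-up phases that recover μ and λ as well, can be sketched as follows (f is passed as a callable; the example mapping is the one from the article's lead):

```python
def floyd(f, x0):
    """Floyd's tortoise and hare; returns (mu, lam) for a rho-shaped sequence."""
    # Phase 1: detect a meeting point inside the cycle
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))
    # Phase 2: reset the tortoise to the start; moving both one step at a
    # time, they meet again exactly at the cycle entry, mu steps later
    mu, tortoise = 0, x0
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(hare)
        mu += 1
    # Phase 3: walk once around the cycle to measure lam
    lam, hare = 1, f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        lam += 1
    return mu, lam

example = {2: 0, 0: 6, 6: 3, 3: 1, 1: 6}        # the lead's example mapping
assert floyd(example.__getitem__, 2) == (2, 3)   # tail 2, 0; cycle 6, 3, 1
```

Note that this version assumes a cycle exists (always true for a function on a finite set); a linked-list variant would additionally check for null before each pointer advance.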
Proof of Correctness
The correctness relies on the relative speeds of the pointers in the ρ-shaped structure. Without a cycle, the sequence is finite, and the pointers reach its end without meeting. With a cycle (λ ≥ 1), the tortoise enters the cycle after μ steps. At that point, the tortoise is at cycle position 0; the hare, having traveled twice as far, is at position μ mod λ. In subsequent steps, the tortoise advances 1 position while the hare advances 2 positions, so the hare gains 1 position relative to the tortoise per iteration. After k further steps, the tortoise is at position k mod λ and the hare is at position (μ + 2k) mod λ. They meet when k ≡ μ + 2k (mod λ), which simplifies to k ≡ −μ (mod λ). Thus, the smallest positive k is λ − (μ mod λ) if μ mod λ ≠ 0, or λ otherwise, ensuring a meeting within at most λ steps after the tortoise enters the cycle. The total number of steps until meeting is therefore at most μ + λ, ensuring detection.[14]
Complexity Analysis
The time complexity is O(μ + λ), as the pointers traverse the tail once and the cycle a constant number of times (at most twice) before meeting; in the worst case, this is linear in the number of distinct values the sequence visits. The space complexity is O(1), storing only the two pointers and temporary values, independent of the size of the state space. This makes it optimal for space among deterministic cycle detection methods.[14] The algorithm is attributed to Robert W. Floyd from unpublished notes circa 1967, with the first printed description appearing in Donald E. Knuth's The Art of Computer Programming, Volume 2 (1969), and further analysis in Richard P. Brent's 1980 paper improving upon it.[15]
Brent's Cycle Detection Algorithm
Brent's cycle detection algorithm, introduced by Richard P. Brent in 1980, provides an efficient method for identifying cycles in sequences generated by iterated functions, building on the principle of detecting meetings between fast and slow traversals similar to Floyd's approach but with optimized step sizes.[16] The algorithm requires only constant space and detects the cycle length λ (the period) and the tail length μ (the distance to the cycle entry) in time proportional to μ + λ.[16] The algorithm operates in two main phases. In the first phase, it employs exponentially growing step sizes—specifically, powers of 2—to traverse the sequence and detect a repetition, thereby identifying the cycle length λ with fewer function evaluations than linear probing. A "tortoise" pointer remains stationary at checkpoints, while a "hare" pointer advances through the sequence; whenever the hare has taken a full power-of-2 number of steps since the last checkpoint, the tortoise teleports to the hare's position, the power doubles, and the step counter resets. This phase initializes with the tortoise at the starting point x₀ and the hare at f(x₀), where f is the iterating function, using variables to track the current power (starting at 1) and the number of steps taken within the current power (starting at 1, since the hare begins one step ahead). 
Repetition occurs when the hare meets the tortoise within the current power, yielding λ as the number of steps taken since the last teleport.[16] In the second phase, the tortoise restarts from x₀ while the hare restarts from x₀ advanced λ steps ahead; both then advance one step at a time until they meet, and the number of steps taken is μ, locating the cycle entry, since the two pointers remain exactly λ apart and can first coincide at the point where the trailing pointer enters the cycle.[16] The following pseudocode illustrates the algorithm, adapted from Brent's original description for general cycle detection (setting Q=2 and simplifying the random u to deterministic powering for standard use):[16]
function brent(f, x0):
    # Phase 1: Find cycle length λ
    tortoise = x0
    hare = f(x0)
    power = 1
    lam = 1
    while tortoise != hare:
        if power == lam:        # start a new power of two
            tortoise = hare     # teleport the tortoise to the hare
            power = power * 2
            lam = 0
        hare = f(hare)
        lam = lam + 1
    # lam now holds λ
    # Phase 2: Find tail length μ
    tortoise = x0
    hare = x0
    repeat lam times:           # advance the hare λ steps ahead
        hare = f(hare)
    mu = 0
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(hare)
        mu = mu + 1
    return mu, lam
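The same two phases can be written as runnable Python (f as a callable; tested here on the example mapping from the article's lead):

```python
def brent(f, x0):
    """Brent's algorithm; returns (mu, lam) for a rho-shaped sequence."""
    # Phase 1: search for lam, teleporting the tortoise at powers of two
    power = lam = 1
    tortoise, hare = x0, f(x0)
    while tortoise != hare:
        if power == lam:        # start a new power of two
            tortoise = hare
            power *= 2
            lam = 0
        hare = f(hare)
        lam += 1
    # Phase 2: the hare starts lam steps ahead of the tortoise; advancing
    # both one step at a time, they first meet at the cycle entry
    tortoise = hare = x0
    for _ in range(lam):
        hare = f(hare)
    mu = 0
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(hare)
        mu += 1
    return mu, lam

example = {2: 0, 0: 6, 6: 3, 3: 1, 1: 6}        # the lead's example mapping
assert brent(example.__getitem__, 2) == (2, 3)   # tail 2, 0; cycle 6, 3, 1
```

Like Floyd's algorithm this assumes a cycle exists; Brent's variant typically calls f fewer times in total, which matters when f is expensive (e.g., a modular squaring).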
Gosper's Cycle Detection Algorithm
Gosper's cycle detection algorithm, developed by Bill Gosper in 1972 as part of the HAKMEM project at MIT's Artificial Intelligence Laboratory, provides an efficient method for identifying cycles in sequences generated by repeated application of a function over a finite set. The algorithm targets "inexorable" processes, where reversing or altering function arguments is impractical, such as in pseudorandom number generation or self-referential list traversal in early Lisp systems. By using bitwise operations to manage a compact table of prior values, it simulates the behavior of multiple pointers without explicit storage for each, enabling detection in environments with limited resources. The algorithm initializes a table S of m entries, where 2^m bounds the maximum possible period, and stores the initial value at S[0]. It then iterates the function, maintaining a step counter i starting from 1 and the current value z. At each step, z is checked against the occupied entries S[0] to S[s−1]. If a match occurs at index k, the cycle is detected, and the cycle length and matching indices can be recovered from k and i. Otherwise, i is incremented, z is advanced, the number r of trailing zeros in the binary representation of i (the ruler function) is computed, and z is stored at S[r]. This power-of-2 indexing via bit shifts and masks advances "virtual pointers" exponentially, akin to Brent's later approach but optimized for low-level bit manipulation. The process guarantees cycle detection before any value repeats a third time.[17]
In terms of complexity, the algorithm runs in O(μ + λ) function evaluations, where μ is the tail length and λ is the cycle length, with the number of comparisons per iteration bounded by the logarithmic table size. Space usage is logarithmic in the period bound, which remains effectively constant for bounded domains typical in early computing applications. Its bitwise efficiency reduces memory accesses and suits embedded or hardware contexts, such as Lisp machines of the era, by allowing register-based simulation of pointers without dynamic allocation.
Implementation Notes
The following pseudocode illustrates the algorithm in a simplified form, assuming a function h that iterates the sequence and ctz(i) that computes the number of trailing zeros in i's binary representation:
m = ceil(log2(L))
S = array of size m // table to store selected prior values
S[0] = x0
i = 1
z = h(x0)
while i <= |G| do
s = floor(log2(i)) + 1
for k = 0 to s-1 do
if z == S[k] then
t = 1 + ((i - (1 << (k+1))) % (1 << (k+2)))
return (i, i + t) // cycle detected
end if
end for
i = i + 1
z = h(z)
r = ctz(i)
if r < m then
S[r] = z
end if
end while
return no cycle
In practice, implementations replace these power-of-2 computations with bit shifts (<<) and modulo operations on powers of 2 for speed in low-level code.
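Gosper-style detection can be sketched in runnable Python. This simplified variant keeps one saved (value, index) pair per trailing-zero class of the step index rather than Gosper's exact table layout, and recovers the exact period with one extra walk around the cycle; the helper names are ours:

```python
def trailing_zeros(n):
    """Ruler function: number of trailing zero bits of n > 0."""
    return (n & -n).bit_length() - 1

def gosper_detect(f, x0, max_steps=1_000_000):
    """Return (i, j, lam) with x_i == x_j and exact period lam, or None."""
    saved = {-1: (x0, 0)}     # slot -> (value, step index); x0 kept under slot -1
    x, i = x0, 0
    while i < max_steps:
        i += 1
        x = f(x)
        for v, j in saved.values():
            if v == x:
                # x_i == x_j, so lam divides i - j; measure lam exactly by
                # walking once more around the cycle from x
                lam, y = 1, f(x)
                while y != x:
                    y = f(y)
                    lam += 1
                return i, j, lam
        saved[trailing_zeros(i)] = (x, i)
    return None

example = {2: 0, 0: 6, 6: 3, 3: 1, 1: 6}   # the lead's example mapping
i, j, lam = gosper_detect(example.__getitem__, 2)
assert lam == 3 and (i - j) % lam == 0
```

Because indices with many trailing zeros are overwritten rarely, the table retains exponentially spaced "virtual pointers" into the past, which is what bounds its size logarithmically.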
Advanced Techniques
Time-Space Tradeoff Methods
Time-space tradeoff methods in cycle detection leverage additional memory to enable efficient identification of cycles, particularly beneficial in structures or sequences with large or complex state spaces where constant-space techniques may incur high traversal costs in worst-case scenarios. A fundamental approach employs a hash set to track visited states during traversal. The algorithm iterates through the sequence, inserting each new state into the hash set before proceeding. If a subsequent state is found to already exist in the set, a cycle is confirmed at that point. This method ensures exact detection with a time complexity of O(n), where n represents the number of steps until repetition (tail length μ plus cycle length λ), and a space complexity of O(n) in the worst case, as the set may store up to all unique states encountered. The use of hashing provides amortized constant-time insertions and lookups, making it practical for moderate-sized inputs despite the space overhead.[18] For scenarios demanding reduced memory footprint at the expense of perfect accuracy, a Bloom filter variant offers a probabilistic alternative. In this setup, visited states are represented by setting bits in a fixed-size bit array via multiple independent hash functions. Membership queries for new states involve checking the corresponding bits; all bits set indicates a probable prior visit, signaling a potential cycle, though false positives can lead to erroneous detections. The false positive probability can be controlled by adjusting the filter size and number of hash functions, typically achieving low error rates (e.g., <1%) with space usage of roughly 1.44 log₂(1/ε) bits per element for error rate ε. This variant is especially useful for very large state spaces, such as in distributed systems or graph analyses, where exact storage is infeasible. Bloom filters were originally proposed to enable space-efficient hashing with tolerable errors. 
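The Bloom-filter variant described above can be sketched in Python. The filter sizing and the SHA-256-derived bit positions below are illustrative choices, not a canonical construction; the function reports the step at which the current state first looks already visited, which can occasionally be a false positive:

```python
import hashlib

def bit_positions(x, m, k):
    """Derive k bit positions in [0, m) from a SHA-256 digest of the state."""
    digest = hashlib.sha256(repr(x).encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % m for i in range(k)]

def probable_repeat_step(f, x0, m=1024, k=3, max_steps=1_000_000):
    """Step index at which the current state appears already visited."""
    bits = bytearray(m)          # one byte per bit slot, for simplicity
    x, step = x0, 0
    while step < max_steps:
        pos = bit_positions(x, m, k)
        if all(bits[p] for p in pos):
            return step          # probable prior visit; may be a false positive
        for p in pos:
            bits[p] = 1
        x = f(x)
        step += 1
    return None

example = {2: 0, 0: 6, 6: 3, 3: 1, 1: 6}   # the lead's example mapping
step = probable_repeat_step(example.__getitem__, 2)
assert step is not None and step <= 5       # the true first repeat is at step 5
```

An exact hash-set detector is the same loop with `bits` replaced by a Python `set` of visited states; the Bloom version trades that O(n) storage for a fixed bit array plus a tunable error rate.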
Bloom filters' application to cycle detection appears in contexts like distributed garbage collection, where such sketches approximate reachable sets to identify cyclic references with bounded false positive rates.[19] Tradeoff analyses for these methods in finite state spaces of size m highlight how allocated space influences detection speed via collision probabilities akin to the birthday paradox. With k bits of space in a probabilistic structure like a tuned Bloom filter or limited hash table, collisions signaling cycles can be detected in expected time O(√m), as the probability of revisiting a stored state rises quadratically with the number of steps taken. The expected number of steps to the first collision is approximately √(πm/2), derived from the asymptotic analysis of random collisions among m possible states. This bound establishes key context for scalability: in random mappings, detection occurs much sooner than exhaustive traversal of m states. Such methods are preferable when state spaces are vast (m ≫ μ + λ), as in cryptographic iterations or simulations, where constant-space linear-time methods would require impractical traversals, though they introduce probabilistic guarantees rather than determinism. The rho-shaped structure of the sequence aids in locating the exact collision point once detected.
Extensions and Recent Developments
Cycle detection algorithms originally developed for sequences, such as Floyd's tortoise and hare and Brent's methods (collectively known as rho algorithms), have been extended to functional graphs, which are directed graphs where each node has out-degree exactly one, representing the structure of iterated functions.[20] In these settings, the algorithms traverse the unique path from a starting node until a cycle is detected, maintaining O(1) space and expected O(√n) time for graphs with n nodes. This adaptation contrasts with cycle detection in general directed graphs, where depth-first search (DFS) is the standard approach, systematically exploring branches to identify back edges in O(|V| + |E|) time. Incremental cycle detection addresses dynamic directed graphs where edges are inserted over time, requiring updates to maintain acyclicity or detect cycles after each insertion. In 2018, Bhattacharya and Kulkarni introduced an improved randomized algorithm for sparse graphs, reducing the total expected update time across m edge insertions, up to polylogarithmic factors.[21] This bound improves upon prior deterministic methods, leveraging data structures like dynamic trees to efficiently track topological orders.[21] Recent advances have further reduced complexities for incremental settings. In 2024, Chen et al. 
developed the first almost-linear time algorithms for incremental cycle detection in directed graphs, running in almost-linear total time over m updates, by reducing the problem to approximate minimum-ratio cycle finding via minimum-cost flow techniques.[22] For undirected graphs, Trabelsi's 2025 work provides randomized algorithms for girth computation (the length of the shortest cycle) and cycle detection, finding cycles of length at most roughly 2ℓ times the girth while using fast matrix multiplication for subquadratic performance in dense graphs.[23] Distributed variants extend cycle detection to large-scale graphs via message-passing models. The Rocha-Thatte algorithm, proposed in 2015, operates in a bulk-synchronous parallel framework on directed graphs, where vertices propagate sequences to neighbors until a cycle is identified by a vertex receiving its own identifier, completing in a number of iterations proportional to the longest path length plus a constant.[24] Despite these extensions, rho methods remain inefficient for sparse non-functional graphs, as they assume a single deterministic path per node and fail to handle branching out-edges, often requiring exponential exploration without the linear structure of functional graphs.[20]
Applications
Data Structure Verification
Cycle detection plays a vital role in debugging linked lists, where unintended cycles arising from pointer manipulation errors can cause infinite loops during traversal operations. Such cycles typically result from bugs like erroneous node linking in insertion, deletion, or modification routines, where a node's next pointer inadvertently points back to an earlier node instead of null or the correct successor. To identify these issues efficiently, Floyd's cycle-finding algorithm, which employs two pointers advancing at disparate speeds through the list, is widely used; it detects a cycle if the faster pointer laps the slower one, achieving this in linear time and constant space. This approach prevents program hangs and aids in isolating faulty code segments during development.[25] A common coding problem that exemplifies this application is LeetCode 141: Linked List Cycle. In this problem, given the head of a singly linked list, the goal is to return true if the list contains a cycle (i.e., some node can be reached again by following the next pointers) and false otherwise. Internally, pos denotes the index where the tail connects to form the cycle (or -1 if no cycle), though pos is not passed as a parameter. Examples include: head = [3,2,0,-4], pos = 1 → true (tail connects to index 1); head = [1,2], pos = 0 → true (tail connects to index 0); head = [9], pos = -1 → false (no cycle). The constraints specify that the number of nodes is between 0 and 10^4, node values range from -10^5 to 10^5, and pos is -1 or a valid index. The follow-up challenge requires solving the problem with O(1) extra memory, which Floyd's tortoise and hare algorithm satisfies perfectly.[8] In memory management, cycle detection is indispensable for garbage collection in systems relying on reference counting, as mutual references forming cycles evade automatic deallocation and lead to memory leaks. 
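A minimal Python sketch of the LeetCode 141 check using Floyd's two pointers follows; the `ListNode` class mirrors the problem's setup and is defined here for illustration:

```python
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None

def has_cycle(head):
    """Return True iff the list reachable from head contains a cycle."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next          # one step
        fast = fast.next.next     # two steps
        if slow is fast:          # the pointers can only collide inside a cycle
            return True
    return False                  # fast ran off the end: no cycle

# head = [3,2,0,-4] with the tail linked back to index 1 (pos = 1)
nodes = [ListNode(v) for v in (3, 2, 0, -4)]
for a, b in zip(nodes, nodes[1:]):
    a.next = b
nodes[-1].next = nodes[1]
assert has_cycle(nodes[0]) is True
assert has_cycle(ListNode(9)) is False   # single node, pos = -1
```

Identity comparison (`is`) rather than value equality is essential here, since distinct nodes may carry equal values.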
Python's garbage collector addresses such reference cycles through a dedicated cycle detection mechanism integrated into its generational collection process, which employs a depth-first search to traverse container objects and identify strongly connected components of cyclic references for reclamation. This ensures that objects involved in cycles, such as lists or dictionaries referencing each other, are properly freed without manual intervention, maintaining application performance. For tree validation, cycle detection verifies the acyclicity of structures purported to be trees, particularly those augmented with parent pointers that could inadvertently form loops. A standard technique involves depth-first search (DFS) starting from the root, marking visited nodes and using the parent reference to ignore the immediate predecessor; a cycle is flagged if a back edge to an ancestor or in-progress node is encountered. This method is essential in domains like file system hierarchies or XML DOM parsing, where cycles would disrupt recursive processing and hierarchical integrity. Cycle detection integrates into various debugging tools and language runtimes to automate integrity checks. In Python, the built-in gc module provides programmatic hooks for developers to trigger cycle detection and inspect uncollectable objects during runtime analysis. For C and C++ programs, tools like AddressSanitizer (part of LLVM) can indirectly reveal cycle-related issues through memory access violations during traversal, while runtime assertions incorporating Floyd's algorithm are common in custom debug builds. These integrations facilitate proactive verification without altering production code significantly. 
An illustrative case study involves a frequent bug in merging two singly linked lists, such as during sorted list combination; if the tail of the first list is mistakenly set to point to a node within the second list (e.g., due to off-by-one indexing in pointer advancement), it creates a cycle when the merged structure is traversed. This error manifests as an infinite loop in subsequent operations like printing or searching the list, highlighting the need for post-merge cycle checks using efficient detection algorithms to catch and correct such implementation flaws early.[8]
Cryptography and Number Theory
In cryptography and number theory, cycle detection plays a pivotal role in probabilistic algorithms for integer factorization and solving discrete logarithm problems. One seminal application is Pollard's rho algorithm, introduced in 1975, which leverages cycle detection to efficiently factor a composite integer n. The algorithm generates a pseudorandom sequence using the iteration xᵢ₊₁ = (xᵢ² + c) mod n, where c is a constant, and applies Floyd's tortoise and hare method to detect a collision in which two values satisfy xᵢ ≡ xⱼ (mod p) for some prime factor p of n. Such a collision allows computation of gcd(|xᵢ − xⱼ|, n), yielding a nontrivial factor with high probability.[26] Cycle detection is also essential for evaluating pseudorandom number generators, such as linear congruential generators (LCGs) defined by xₙ₊₁ = (a·xₙ + c) mod m. These generators produce periodic sequences, and algorithms like Floyd's tortoise and hare are employed to empirically detect the cycle length, providing a practical measure of the period without exhaustive enumeration. This assessment ensures the generator's suitability for cryptographic applications by verifying that the period approaches the theoretical maximum under proper parameter choices.[27] In elliptic curve cryptography, the rho method extends Pollard's approach to solve the elliptic curve discrete logarithm problem (ECDLP): given points P and Q = kP on an elliptic curve over a finite field, find the scalar k. Iteration proceeds in the group by partitioning the elements into classes that determine pseudorandom steps built from point additions and doublings, forming a pseudorandom walk; cycle detection via tortoise and hare identifies a collision that resolves the logarithm through linear algebra over the accumulated relations. This adaptation, rooted in Pollard's 1978 work on discrete logarithms modulo primes, is a cornerstone for attacking ECC due to its generic group applicability. 
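A compact Python sketch of Pollard's rho for factoring, with Floyd-style collision detection, is shown below. The starting value x₀ = 2 and constant c = 1 are conventional illustrative choices; practical implementations retry with a different c when the gcd degenerates to n:

```python
from math import gcd

def pollard_rho(n, x0=2, c=1):
    """Return a nontrivial factor of composite n, or None if this walk fails."""
    f = lambda v: (v * v + c) % n
    x = y = x0
    d = 1
    while d == 1:
        x = f(x)             # tortoise: one application per round
        y = f(f(y))          # hare: two applications per round
        d = gcd(abs(x - y), n)
    return d if d != n else None   # d == n means retry with a different c

factor = pollard_rho(8051)   # 8051 = 83 * 97, a classic textbook example
assert factor in (83, 97)
```

The gcd is what converts a collision modulo the unknown prime p into a visible factor: when xᵢ ≡ xⱼ (mod p) but xᵢ ≠ xⱼ (mod n), gcd(|xᵢ − xⱼ|, n) is a proper divisor.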
The practical complexity of these rho methods stems from the birthday paradox: in a space of size p (a prime factor or group order), the expected number of steps until a collision is approximately √(πp/2), yielding an overall time complexity of O(√p) arithmetic operations. Brent's cycle detection algorithm can optimize implementations by reducing constant factors in this detection phase. Historically, Pollard's rho enabled the factorization of large integers, such as those in early RSA challenges, serving as a key tool for cryptanalysis in the pre-quantum era before more advanced methods like the number field sieve dominated.[28]
Simulation and Modeling
In simulations of dynamical systems, cycle detection is essential for identifying attractors and periodic orbits, particularly in chaotic iterations where trajectories may converge to repeating states. The logistic map, defined by the iteration xₙ₊₁ = r·xₙ(1 − xₙ), serves as a canonical example, where numerical simulations reveal periodic orbits through period-doubling bifurcations as the parameter r increases. Hybrid numeric-symbolic methods compute attracting cycles by iterating near bifurcation points and solving characteristic equations derived from cyclic polynomials, enabling precise period detection even in unstable regimes. These techniques filter out transient behavior to isolate stable limit cycles, with computations feasible up to period 13 using tools like Mathematica for polynomial reduction via Gröbner bases.[29][30] Close-return methods further enhance detection in chaotic maps by monitoring state proximities during long iterations, systematically identifying unstable periodic orbits without exhaustive enumeration. For instance, downsampling sequences reduces computational load while preserving recurrence signatures, applicable to both discrete maps and continuous systems discretized for simulation. This approach has been validated on real-world data, such as laser systems exhibiting chaos, where it locates orbits missed by brute-force iteration.[31] In rule-based simulations, such as cellular automata or iterative game theory processes, cycle detection identifies steady states by tracking configuration repetitions, signaling convergence to periodic equilibria. In elementary cellular automata, evolutions from initial states (e.g., single-site or disordered) are simulated until a configuration recurs, within a state space bounded by 2^n configurations for n cells under periodic boundaries; for rule 126, cycles of length 2 or 6 emerge after short transients, analyzed via state transition graphs. 
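The configuration-tracking approach for elementary cellular automata can be sketched in Python; the 7-cell lattice and single-site seed below are illustrative choices:

```python
RULE = 126   # Wolfram rule number; bit k gives the output for neighborhood k

def step(cells):
    """One synchronous update of rule 126 with periodic boundary conditions."""
    n = len(cells)
    return tuple(
        (RULE >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    )

def find_cycle(cells):
    """Iterate until a configuration recurs; return (tail length, cycle length)."""
    seen = {}                    # configuration -> first step it appeared
    t = 0
    while cells not in seen:
        seen[cells] = t
        cells = step(cells)
        t += 1
    mu = seen[cells]             # step of the first occurrence of the repeat
    return mu, t - mu

seed = tuple(1 if i == 3 else 0 for i in range(7))   # single live cell, n = 7
mu, lam = find_cycle(seed)
assert lam >= 1      # finite state space (2^7 configurations) guarantees a cycle
```

Because configurations are stored explicitly, memory grows with the transient length; for large lattices one would substitute Floyd's or Brent's constant-space detection over the `step` function.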
Similarly, in game theory iterations like rock-paper-scissors, experimental simulations detect persistent cycles in strategy frequencies, quantifying oscillation periods to validate evolutionary predictions.[32][33] Network protocols employ cycle detection to identify loops in routing algorithms, preventing broadcast storms that amplify traffic indefinitely. Packet trace analysis detects loops by identifying replica streams—duplicate packets with matching headers (except TTL, differing by at least 2) and payloads—merging overlapping events to map cyclic paths; applied to backbone traces, it reveals loops lasting under 10 seconds, primarily from TTL deltas of 2.[34] Deadlock avoidance in operating systems integrates cycle detection via resource allocation graphs, where nodes represent processes and edges denote resource requests; a cycle indicates potential deadlock, prompting preemptive recovery. Simulations model allocation scenarios using wait-for graphs, invoking detection periodically based on resource contention frequency to maintain system liveness without excessive overhead.[35] In population modeling simulations, cycle detection terminates iterative runs upon state repetition, capturing oscillating dynamics like predator-prey cycles. Ecological time series apply periodogram analysis to detect significant periodicities, evaluating peak significance against red-noise null models to distinguish true cyclicity from stochastic variation in simulated populations. A methodology for limit cycle detection in discrete-event simulations monitors state histories for oscillations, using Fourier transforms on output traces to quantify cycle lengths and amplitudes in models like semiconductor fabs, where product mix changes induce periodic throughput fluctuations.[36][37]
References
