Optimal stopping
from Wikipedia

In mathematics, the theory of optimal stopping[1][2] or early stopping[3] is concerned with the problem of choosing a time to take a particular action, in order to maximise an expected reward or minimise an expected cost. Optimal stopping problems can be found in areas of statistics, economics, and mathematical finance (related to the pricing of American options). A key example of an optimal stopping problem is the secretary problem. Optimal stopping problems can often be written in the form of a Bellman equation, and are therefore often solved using dynamic programming.

Definition

Discrete time case

Stopping rule problems are associated with two objects:

  1. A sequence of random variables $X_1, X_2, \ldots$, whose joint distribution is assumed to be known
  2. A sequence of 'reward' functions $(y_i)_{i \ge 1}$ which depend on the observed values of the random variables in 1: $y_i = y_i(x_1, \ldots, x_i)$

Given those objects, the problem is as follows:

  • You are observing the sequence of random variables, and at each step $i$, you can choose to either stop observing or continue
  • If you stop observing at step $i$, you will receive reward $y_i$
  • You want to choose a stopping rule to maximize your expected reward (or equivalently, minimize your expected loss)
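The objects above can be encoded directly. The following Monte Carlo sketch (not from the original text) estimates the expected reward of an arbitrary stopping rule; the sequence, rule, and reward used at the end are a hypothetical toy instance (at most five Uniform(0,1) draws, stop at the first value at or above 0.8).

```python
import random

def expected_reward(sample_sequence, stopping_rule, reward, n_trials=100_000):
    """Estimate the expected reward of a stopping rule by Monte Carlo.

    sample_sequence() yields successive observations x_1, x_2, ...;
    stopping_rule(history) returns True when we should stop observing;
    reward(history) is the payoff y_i received on stopping.
    """
    total = 0.0
    for _ in range(n_trials):
        history = []
        for x in sample_sequence():
            history.append(x)
            if stopping_rule(history):
                break
        total += reward(history)
    return total / n_trials

# Toy instance: observe up to 5 Uniform(0,1) draws, stop at the first
# value >= 0.8 (otherwise we are forced to accept the last draw);
# the reward y_i is the value we stop on.
def uniforms():
    for _ in range(5):
        yield random.random()

est = expected_reward(uniforms, lambda h: h[-1] >= 0.8, lambda h: h[-1])
```

This threshold rule is deliberately arbitrary; the point is only that a stopping rule is a function of the observed history, never of future values.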

Continuous time case

Consider a gain process $G = (G_t)_{t \ge 0}$ defined on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$ and assume that $G$ is adapted to the filtration. The optimal stopping problem is to find the stopping time $\tau^*$ which maximizes the expected gain

$$V_t^T = \mathbb{E}[G_{\tau^*}] = \sup_{t \le \tau \le T} \mathbb{E}[G_\tau],$$

where $V_t^T$ is called the value function. Here $T$ can take value $\infty$.

A more specific formulation is as follows. We consider an adapted strong Markov process $X = (X_t)_{t \ge 0}$ defined on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P}_x)$, where $\mathbb{P}_x$ denotes the probability measure where the stochastic process starts at $x$. Given continuous functions $M$, $L$, and $K$, the optimal stopping problem is

$$V(x) = \sup_{0 \le \tau \le T} \mathbb{E}_x \left[ M(X_\tau) + \int_0^\tau L(X_t)\, dt + \sup_{0 \le t \le \tau} K(X_t) \right].$$
This is sometimes called the MLS (which stand for Mayer, Lagrange, and supremum, respectively) formulation.[4]

Solution methods

There are generally two approaches to solving optimal stopping problems.[4] When the underlying process (or the gain process) is described by its unconditional finite-dimensional distributions, the appropriate solution technique is the martingale approach, so called because it uses martingale theory, the most important concept being the Snell envelope. In the discrete time case, if the planning horizon is finite, the problem can also be easily solved by dynamic programming.
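For a concrete finite-horizon instance of the dynamic programming approach, consider stopping on one of $n$ i.i.d. Uniform(0,1) draws to maximize the value accepted (an illustrative example, not from the text). Since $E[\max(X, v)] = (1 + v^2)/2$ for $X \sim U(0,1)$, backward induction reduces to a scalar recursion:

```python
def continuation_values(n):
    """Backward induction for stopping on one of n i.i.d. Uniform(0,1) draws.

    values[k] is the optimal expected reward with k draws still to come:
    V_0 = 0 (no draws left), V_k = E[max(X, V_{k-1})] = (1 + V_{k-1}^2) / 2.
    The optimal rule accepts a draw exactly when it exceeds the value of
    continuing, i.e. the current V with one fewer draw remaining.
    """
    v = 0.0
    values = [v]
    for _ in range(n):
        v = (1.0 + v * v) / 2.0
        values.append(v)
    return values

vals = continuation_values(5)
# vals[1] = 0.5: with one draw left you must accept it, worth E[X] = 1/2.
```

Each step of the recursion is exactly the Bellman comparison "stop now versus continue", evaluated from the final step backwards.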

When the underlying process is determined by a family of (conditional) transition functions leading to a Markov family of transition probabilities, powerful analytical tools provided by the theory of Markov processes can often be utilized and this approach is referred to as the Markov method. The solution is usually obtained by solving the associated free-boundary problems (Stefan problems).

A jump diffusion result

Let $Y_t$ be a Lévy diffusion in $\mathbb{R}^k$ given by the SDE

$$dY_t = b(Y_t)\, dt + \sigma(Y_t)\, dB_t + \int_{\mathbb{R}^k} \gamma(Y_{t^-}, z)\, \bar{N}(dt, dz), \qquad Y_0 = y,$$

where $B$ is an $m$-dimensional Brownian motion, $\bar{N}$ is an $l$-dimensional compensated Poisson random measure, $b : \mathbb{R}^k \to \mathbb{R}^k$, $\sigma : \mathbb{R}^k \to \mathbb{R}^{k \times m}$, and $\gamma : \mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}^{k \times l}$ are given functions such that a unique solution $(Y_t)$ exists. Let $\mathcal{S} \subset \mathbb{R}^k$ be an open set (the solvency region) and

$$\tau_{\mathcal{S}} = \inf\{ t > 0 : Y_t \notin \mathcal{S} \}$$

be the bankruptcy time. The optimal stopping problem is:

$$V(y) = \sup_{\tau \le \tau_{\mathcal{S}}} J^{\tau}(y) = \sup_{\tau \le \tau_{\mathcal{S}}} \mathbb{E}_y \left[ M(Y_\tau) + \int_0^\tau L(Y_t)\, dt \right].$$
It turns out that under some regularity conditions,[5] the following verification theorem holds:

If a function $\phi : \bar{\mathcal{S}} \to \mathbb{R}$ satisfies

  • $\phi \in C(\bar{\mathcal{S}}) \cap C^1(\mathcal{S}) \cap C^2(D)$, where the continuation region is $D = \{ y \in \mathcal{S} : \phi(y) > M(y) \}$,
  • $\phi \ge M$ on $\mathcal{S}$, and
  • $\mathcal{A}\phi + L \le 0$ on $\mathcal{S} \setminus \partial D$, where $\mathcal{A}$ is the infinitesimal generator of $(Y_t)$

then $\phi(y) \ge V(y)$ for all $y \in \bar{\mathcal{S}}$. Moreover, if

  • $\mathcal{A}\phi + L = 0$ on $D$

then $\phi(y) = V(y)$ for all $y \in \bar{\mathcal{S}}$ and $\tau^* = \inf\{ t > 0 : Y_t \notin D \}$ is an optimal stopping time.

These conditions can also be written in a more compact form (the integro-variational inequality):

  • $\max\{ \mathcal{A}\phi + L,\; M - \phi \} = 0$ on $\mathcal{S} \setminus \partial D$

Examples

Coin tossing

(Example where $\mathbb{E}(y_i)$ converges)

You have a fair coin and are repeatedly tossing it. Each time, before it is tossed, you can choose to stop tossing it and get paid (in dollars, say) the average number of heads observed.

You wish to maximise the amount you get paid by choosing a stopping rule. If $X_i$ (for $i \ge 1$) forms a sequence of independent, identically distributed random variables with Bernoulli distribution

$$\mathbb{P}(X_i = 1) = \mathbb{P}(X_i = 0) = \tfrac{1}{2},$$

and if

$$y_i = \frac{1}{i} \sum_{k=1}^{i} X_k,$$

then the sequences $(X_i)_{i \ge 1}$ and $(y_i)_{i \ge 1}$ are the objects associated with this problem.
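This problem (the Chow–Robbins game) has no known closed-form value, but truncating the horizon and applying backward induction over the state (heads, tosses) yields a lower bound on the value. A sketch, with the truncation level N = 100 chosen arbitrarily for illustration:

```python
from functools import lru_cache

N = 100  # truncation horizon: force stopping after N tosses (approximation)

@lru_cache(maxsize=None)
def value(heads, tosses):
    """Approximate value of the fair-coin game in state (heads, tosses):
    the max of stopping now (payoff heads/tosses) and tossing once more."""
    stop = heads / tosses if tosses else 0.0
    if tosses >= N:
        return stop
    cont = 0.5 * (value(heads + 1, tosses + 1) + value(heads, tosses + 1))
    return max(stop, cont)

v0 = value(0, 0)  # truncated estimate; a lower bound on the true value
```

Because truncation only restricts the set of admissible stopping times, `v0` underestimates the true value of the game, and the estimate improves as `N` grows.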

House selling

(Example where $\mathbb{E}(y_i)$ does not necessarily converge)

You have a house and wish to sell it. Each day you are offered $X_n$ for your house, and pay $k$ to continue advertising it. If you sell your house on day $n$, you will earn $y_n$, where $y_n = X_n - nk$.

You wish to maximise the amount you earn by choosing a stopping rule.

In this example, the sequence $(X_i)$ is the sequence of offers for your house, and the sequence of reward functions $(y_i)$ is how much you will earn.[6]
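Under the additional illustrative assumption that offers are i.i.d. Uniform(0,1) (not stated in the text), the stationary analysis gives a reservation-price rule: accept the first offer above the threshold $a$ solving $k = \mathbb{E}[(X - a)^+] = (1-a)^2/2$, i.e. $a = 1 - \sqrt{2k}$. A sketch with a Monte Carlo check:

```python
import math
import random

def reservation_price(cost):
    """Threshold for daily offers X ~ U(0,1) with advertising cost per day:
    accept the first offer above a = 1 - sqrt(2 * cost), which solves
    cost = E[(X - a)^+] = (1 - a)^2 / 2 (the marginal value of waiting)."""
    return 1.0 - math.sqrt(2.0 * cost)

def simulate(cost, n_trials=200_000):
    """Average net earnings X_n - n*cost of the threshold rule, by Monte Carlo."""
    a, total = reservation_price(cost), 0.0
    for _ in range(n_trials):
        day = 0
        while True:
            day += 1
            offer = random.random()
            if offer >= a:
                total += offer - cost * day
                break
    return total / n_trials

a = reservation_price(0.02)   # threshold 0.8 for cost k = 0.02
avg = simulate(0.02)
```

For this model the expected net earnings of the optimal rule equal the threshold itself, so the simulation should come out near `a`.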

Secretary problem

Three cases of the secretary problem with icon height denoting desirability:
  1. Too small an exploration set selects a suboptimal candidate before the best (*) is seen.
  2. An ideal set identifies the best.
  3. If a too-large set includes the best, the last candidate is selected.

(Example where $(X_i)$ is a finite sequence)

You are observing a sequence of objects which can be ranked from best to worst. You wish to choose a stopping rule which maximises your chance of picking the best object.

Here, if $R_1, \ldots, R_n$ ($n$ is some large number) are the ranks of the objects, and $y_i$ is the chance you pick the best object if you stop intentionally rejecting objects at step $i$, then $(R_i)$ and $(y_i)$ are the sequences associated with this problem. This problem was solved in the early 1960s by several people. An elegant solution to the secretary problem and several modifications of this problem is provided by the more recent odds algorithm of optimal stopping (Bruss algorithm).
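The classical 1/e rule (reject roughly the first n/e candidates, then take the first one better than all seen so far) can be checked by simulation; a sketch with illustrative parameters:

```python
import math
import random

def secretary_success(n, r, n_trials=100_000):
    """Probability of hiring the best of n candidates when we reject the
    first r outright and then take the first candidate better than all
    previously seen (no recall; if none appears, no win is recorded)."""
    wins = 0
    for _ in range(n_trials):
        ranks = list(range(n))          # 0 = worst, n - 1 = best
        random.shuffle(ranks)
        best_seen = max(ranks[:r]) if r else -1
        chosen = None
        for x in ranks[r:]:
            if x > best_seen:
                chosen = x
                break
        if chosen == n - 1:
            wins += 1
    return wins / n_trials

n = 100
r = round(n / math.e)                   # reject the first ~37% of candidates
p = secretary_success(n, r)             # should be near 1/e ~ 0.37
```

The estimate `p` should land near 1/e, matching the asymptotic success probability of the optimal rule.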

Search theory

Economists have studied a number of optimal stopping problems similar to the 'secretary problem', and typically call this type of analysis 'search theory'. Search theory has especially focused on a worker's search for a high-wage job, or a consumer's search for a low-priced good.

Parking problem

A special example of an application of search theory is the task of optimally selecting a parking space by a driver going to the opera (theater, shopping, etc.). Approaching the destination, the driver goes down the street along which there are parking spaces; usually, only some spaces are free. The destination is clearly visible, so the remaining distance is easily assessed. The driver's task is to choose a free parking space as close to the destination as possible without turning around, so that the walking distance to the destination is as short as possible.[7]

Option trading

In the trading of options on financial markets, the holder of an American option is allowed to exercise the right to buy (or sell) the underlying asset at a predetermined price at any time before or at the expiry date. Therefore, the valuation of American options is essentially an optimal stopping problem. Consider a classical Black–Scholes set-up and let $r$ be the risk-free interest rate and $\delta$ and $\sigma$ be the dividend rate and volatility of the stock. The stock price $S$ follows geometric Brownian motion

$$dS_t = (r - \delta) S_t\, dt + \sigma S_t\, dB_t$$

under the risk-neutral measure.

When the option is perpetual, the optimal stopping problem is

$$V(x) = \sup_{\tau} \mathbb{E}_x \left[ e^{-r\tau} g(S_\tau) \right],$$

where the payoff function is $g(S) = (S - K)^+$ for a call option and $g(S) = (K - S)^+$ for a put option. The variational inequality is

$$\frac{1}{2} \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + (r - \delta) S \frac{\partial V}{\partial S} - rV \le 0$$

for all $S$ away from the exercise boundary, with equality in the continuation region. The solution is known to be[8]

  • (Perpetual call) $V(S) = (b - K)(S/b)^{\gamma}$ for $S < b$ and $V(S) = S - K$ for $S \ge b$, where $\gamma$ is the positive root of $\tfrac{1}{2}\sigma^2 \gamma(\gamma - 1) + (r - \delta)\gamma - r = 0$ and the exercise boundary is $b = \dfrac{\gamma K}{\gamma - 1}$.
  • (Perpetual put) $V(S) = (K - c)(S/c)^{\tilde{\gamma}}$ for $S > c$ and $V(S) = K - S$ for $S \le c$, where $\tilde{\gamma}$ is the negative root of the same quadratic and the exercise boundary is $c = \dfrac{\tilde{\gamma} K}{\tilde{\gamma} - 1}$.

On the other hand, when the expiry date is finite, the problem is associated with a two-dimensional free-boundary problem with no known closed-form solution. Various numerical methods can, however, be used. See Black–Scholes model#American options for various valuation methods, as well as Fugit for a discrete, tree-based calculation of the optimal time to exercise.
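One standard numerical method is backward induction on a binomial (CRR) lattice, which is itself a discrete optimal stopping computation: at every node the holder takes the larger of the exercise value and the discounted continuation value. A sketch for an American put (the parameters at the bottom are illustrative):

```python
import math

def american_put_binomial(s0, strike, r, sigma, T, steps=500):
    """Price an American put on a Cox-Ross-Rubinstein binomial tree.

    Backward induction: at each node, value = max(exercise, continuation),
    which is exactly the Bellman comparison of an optimal stopping problem.
    """
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))       # up factor
    d = 1.0 / u                               # down factor
    q = (math.exp(r * dt) - d) / (u - d)      # risk-neutral up-probability
    disc = math.exp(-r * dt)
    # terminal payoffs at expiry (j = number of up-moves)
    values = [max(strike - s0 * u**j * d**(steps - j), 0.0)
              for j in range(steps + 1)]
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            cont = disc * (q * values[j + 1] + (1 - q) * values[j])
            exercise = max(strike - s0 * u**j * d**(i - j), 0.0)
            values[j] = max(cont, exercise)
    return values[0]

# Illustrative parameters: at-the-money put, 5% rate, 20% vol, 1 year.
price = american_put_binomial(s0=100, strike=100, r=0.05, sigma=0.2, T=1.0)
```

The result must dominate the corresponding European put price (about 5.57 for these parameters), since early exercise can only add value.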

from Grokipedia
Optimal stopping is a branch of mathematics, primarily within probability theory and stochastic processes, that addresses the problem of determining the optimal time to cease observation of a sequential process and take an action, in order to maximize the expected reward or minimize the expected cost based on the information gathered up to that point. This involves formulating a stopping rule, a decision procedure adapted to the evolving flow of information, that balances the trade-off between continuing to observe for potentially better outcomes and the risk of missing an acceptable one due to costs like time or opportunity. The theory formalizes these decisions using concepts such as stopping times, which are random variables indicating when to halt, and relies on tools from martingale theory and dynamic programming to compute the value function representing the maximal expected gain.

The origins of optimal stopping trace back to early work in sequential analysis during the 1940s, notably Abraham Wald's development of the sequential probability ratio test for efficient hypothesis testing in quality control and military applications during World War II. Building on this, foundational contributions in the 1950s and 1960s by researchers like J. L. Snell, Y. S. Chow, and H. Robbins established the modern framework, emphasizing backward induction for finite-horizon problems and essential suprema for infinite ones. A seminal text, Great Expectations: The Theory of Optimal Stopping by Chow, Robbins, and Siegmund (1971), synthesized these advances and highlighted the role of the Snell envelope, a supermartingale that bounds the optimal value process. Earlier roots can be found in gambling problems from the 16th century, such as those analyzed by Gerolamo Cardano, though rigorous probabilistic treatment awaited the axiomatic foundations laid by Andrey Kolmogorov in 1933.

Classic examples illustrate the theory's core principles. In the secretary problem (or marriage problem), one must select the best candidate from a sequence of N applicants interviewed in random order, without recall; the optimal strategy is to observe and reject the first N/e ≈ 37% of candidates (where e is the base of the natural logarithm), then hire the next applicant who exceeds all prior ones, yielding a success probability that approaches 1/e ≈ 37% as N grows large. Another foundational case is the house-selling problem, where sequential offers arrive randomly, and the decision to accept balances the current bid against the expected value of waiting, often solved via threshold rules derived from the value function. These discrete-time problems extend to continuous settings, such as stopping a diffusion to maximize expected gain, using free-boundary problems where continuation and stopping regions are delineated by the infinitesimal generator of the process.

Applications of optimal stopping span diverse fields, underscoring its practical significance. In mathematical finance, it underpins the pricing of American options, where early exercise decisions maximize payoff under the Black-Scholes framework, contributing to the 1997 Nobel Memorial Prize in Economic Sciences awarded to Merton and Scholes. Economic models of job search employ it to determine reservation wages, minimizing unemployment duration while maximizing lifetime earnings. In medicine and engineering, it aids change-point detection, such as monitoring vitals or failures to trigger interventions at the optimal moment, often incorporating costs for delayed action. More broadly, the theory informs sequential decision-making, including the one-armed bandit problem and sequential auctions.

Fundamentals

Definition

Optimal stopping refers to the mathematical framework for determining the optimal time to terminate a stochastic process or sequence of observations in order to maximize an expected reward or minimize an expected cost. This decision-making paradigm arises in uncertain environments where actions must be based on evolving information, such as sequentially arriving random variables, and the goal is to select a moment that balances immediate gains against potential future improvements. Formally, the problem involves choosing a stopping time $\tau$ for an observation process $X$ to optimize an objective like $\sup_{\tau} \mathbb{E}[g(X_\tau)]$, where $g$ denotes the reward function and the supremum is taken over admissible stopping times.

Central assumptions include the sequential revelation of information about the process, the irrevocability of the stopping decision once made, and the absence of recall, meaning previously passed opportunities cannot be revisited. These constraints model real-world scenarios where decisions are final and information unfolds over time without the option to undo choices.

Key terminology in optimal stopping includes the stopping time, a random variable $\tau$ such that the event $\{\tau \le t\}$ is measurable with respect to the $\sigma$-algebra generated by the process up to time $t$, ensuring the decision to stop depends only on information available at that instant. The reservation value represents the critical threshold for the process value above which stopping becomes optimal, often characterizing simple threshold-based strategies. Additionally, the indifference curve (or boundary) delineates the region in the state space where the expected value of stopping equals that of continuing, marking the transition between continuation and stopping sets in the solution.

A non-mathematical example illustrates the concept: consider a buyer searching for the best price on an item, weighing the effort and time of further inquiries against the risk of settling for a suboptimal deal, ultimately stopping when the current offer sufficiently outweighs anticipated improvements. Such problems are analyzed in both discrete and continuous time formulations, as explored in subsequent sections.

Historical Context

The roots of optimal stopping theory trace back to early gambling problems in the 16th century, such as those analyzed by Gerolamo Cardano in De Ludo Aleae (1564), and were further developed through 18th-century advancements in probability theory. In the 1940s, wartime applications spurred connections to search theory, originating with the U.S. Navy's 1942 Antisubmarine Warfare Group under Bernard Koopman, whose 1946 synthesis in Search and Screening optimized detection efforts under uncertainty, effectively framing when to halt searches based on probabilistic success.

The 1950s marked significant development with Abraham Wald's sequential analysis from 1947, extended by Herbert Robbins' 1952 papers on adaptive experimental designs and optimal stopping rules in statistics, which generalized stopping to non-statistical contexts via dynamic programming influences from Richard Bellman. Thomas S. Ferguson's contemporaneous work further formalized pure stopping problems, as seen in his 1967 contributions linking to broader sequential frameworks.

The 1960s saw key advancements, including the formalization of the secretary problem, traced to discussions by Merrill Flood in 1950 and popularized by Martin Gardner in 1960, with rigorous solutions by Dennis V. Lindley (1961) and others, including Chow, Moriguti, Robbins, and Samuels (1964), establishing asymptotic strategies for rank-based stopping. This era's momentum carried into the 1970s-1980s transition to continuous time, driven by Albert Shiryaev's 1978 Optimal Stopping Rules and collaborative works with Goran Peskir, which integrated martingale methods and free-boundary problems for continuous-time formulations, influencing quickest detection and prediction.

Post-2000 growth integrated optimal stopping with mathematical finance, building on 1970s option-pricing models like Black-Scholes (1973) for American options as stopping problems, and extended to AI through Gaussian processes for time-series decisions (2022). In the 2020s, recent extensions address predicted priors in learning-augmented settings, as in a 2025 formulation balancing consistency and robustness against prior errors, bridging classical theory to algorithmic predictions.

Mathematical Formulations

Discrete Time Case

In the discrete time case, optimal stopping problems are formulated over a countable set of time steps, typically indexed by non-negative integers, where decisions are made sequentially based on observed data. Consider a stochastic process $(X_t)_{t=0}^{\infty}$ with state space $E \subseteq \mathbb{R}^d$, often assumed to be a time-homogeneous Markov chain for tractability, starting from an initial state $X_0 = x \in E$. The filtration $(\mathcal{F}_t)_{t \ge 0}$ is generated by the observations up to time $t$, with $\mathcal{F}_t = \sigma(X_0, \dots, X_t)$. A stopping time $\tau$ is a random variable taking values in $\{0, 1, 2, \dots, n\}$ for finite horizon $n$ or $\{0, 1, 2, \dots\} \cup \{\infty\}$ for infinite horizon, such that $\{\tau \le t\} \in \mathcal{F}_t$ for all $t$. If $\tau = 0$, the payoff is $g(X_0)$.

The objective is to choose $\tau$ to maximize the expected total reward, given by

$$\sup_{\tau} E_x\left[ \sum_{k=1}^{\tau} r_k(X_{k-1}) + g(X_\tau) \right],$$

where the sum is empty (equal to 0) if $\tau = 0$, $r_k : E \to \mathbb{R}$ denotes the running reward received at step $k$ (possibly time-dependent), and $g : E \to \mathbb{R}$ is the terminal payoff function upon stopping. Here, $E_x$ denotes expectation under the measure with $X_0 = x$, and the supremum is taken over all stopping times $\tau$. This setup captures sequential decision problems where continuing accrues a running reward (or cost, if negative) until stopping yields the terminal payoff. In many Markovian settings, the running rewards are state-dependent, $r_k(X_{k-1})$, to reflect the process dynamics.

For the finite horizon case with $n$ steps, the problem is solved via backward induction. Define the value function $V_k(x)$ as the supremum expected reward starting from state $x$ at step $k$, for $k = 0, 1, \dots, n$. At the final step $k = n$,

$$V_n(x) = g(x),$$

and recursively for $k = n-1, \dots, 0$,

$$V_k(x) = \max\left( g(x),\; r_{k+1}(x) + E_x[V_{k+1}(X_{k+1}) \mid X_k = x] \right).$$

The optimal action at step $k$ is to stop if $g(x) \ge r_{k+1}(x) + E_x[V_{k+1}(X_{k+1}) \mid X_k = x]$, yielding the optimal stopping time as the first $k$ where this inequality holds. This dynamic programming recursion satisfies the principle of optimality, computing the solution in $O(n)$ steps under standard assumptions like bounded rewards.

In the infinite horizon case, to ensure convergence and avoid unbounded rewards, a discount factor $\beta \in (0,1)$ is typically introduced, weighting future rewards. The value function $V(x)$ satisfies the Bellman equation

$$V(x) = \max\left( g(x),\; r(x) + \beta E_x[V(X_1) \mid X_0 = x] \right),$$

where the supremum is over all stopping times $\tau < \infty$ almost surely. Under contractivity of the operator induced by $\beta$ and boundedness of $g$ and $r$, successive approximations converge to the unique fixed point $V$, and an optimal stopping time exists.

Under monotonicity assumptions, such as $g$ being non-decreasing and the process $X_t$ having non-decreasing conditional expectations, the solution reduces to an optimal threshold policy. Specifically, the continuation region $\{x : V(x) > g(x)\}$ lies below a threshold $b^*$, and the optimal stopping time is $\tau^* = \inf\{t \ge 0 : X_t \ge b^*\}$, where $b^*$ solves the associated free-boundary problem. This structure simplifies computation and holds for one-dimensional diffusions discretized in time or i.i.d. observations with monotone likelihood ratios.
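The successive-approximation scheme can be sketched for an illustrative i.i.d. special case (all assumptions added here, not from the text): $g(x) = x$, no running reward, and $X_t \sim$ Uniform(0,1) independently. Then the continuation value $\beta\,E[V]$ is a constant and the optimal rule is a threshold:

```python
def value_iteration(beta=0.9, n_grid=1001, tol=1e-12):
    """Successive approximation for the discounted stopping problem
    V(x) = max(g(x), beta * E[V(X')]) with X' ~ U(0,1) i.i.d., g(x) = x,
    and no running reward.  Because draws are i.i.d., beta * E[V] is a
    constant c, and the optimal rule stops as soon as X >= c.
    Returns the converged continuation value / threshold c."""
    xs = [i / (n_grid - 1) for i in range(n_grid)]
    v = xs[:]                          # initial guess: V = g
    while True:
        m = sum(v) / n_grid            # crude quadrature for E[V(X')]
        new = [max(x, beta * m) for x in xs]
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return beta * m
        v = new

c = value_iteration()
# Analytic fixed point: c = beta*(1 + c^2)/2, i.e. c = (1 - sqrt(1 - beta^2))/beta,
# roughly 0.627 for beta = 0.9 (the grid introduces a small quadrature error).
```

Each pass applies the Bellman operator once; the discount factor makes it a contraction, so the iterates converge geometrically to the unique fixed point, matching the contractivity argument above.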

Continuous Time Case

In the continuous time case, optimal stopping problems are formulated for stochastic processes evolving over uncountable time intervals, typically Markov processes such as diffusions, observed continuously. The underlying process is $X_t$, defined on the time interval $[0, \infty)$, and the stopping decision is made via a stopping time $\tau$ adapted to the filtration $\{\mathcal{F}_t\}_{t \ge 0}$ generated by the process up to time $t$. This setup allows for decisions based on the complete history of observations, contrasting with discrete-time analogs that rely on finite-step decisions.

The goal is to select the stopping time $\tau$ that maximizes the expected discounted reward:

$$\sup_{\tau} \mathbb{E}\left[ g(X_\tau) e^{-r\tau} + \int_0^\tau f(X_s) e^{-rs}\, ds \right],$$

where $r > 0$ is the constant discount rate, $g$ is the terminal payoff function received upon stopping, and $f$ is the running payoff function accumulated continuously until $\tau$. This objective balances the immediate reward from stopping against the ongoing accumulation of discounted gains from continuation.

The solution is provided by the Snell envelope $Z_t$, which represents the conditional value of optimal future stopping:

$$Z_t = \operatorname*{ess\,sup}_{\tau \ge t} \mathbb{E}\left[ g(X_\tau) e^{-r(\tau - t)} + \int_t^\tau f(X_s) e^{-r(s - t)}\, ds \,\Big|\, \mathcal{F}_t \right].$$
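In discrete time, or after discretizing a continuous-time process, the Snell envelope is computable by the backward recursion $Z_T = g_T$, $Z_t = \max(g_t, E[Z_{t+1} \mid \mathcal{F}_t])$. A minimal sketch on a binomial lattice for a symmetric random walk (an illustrative example, not from the text):

```python
def snell_envelope(payoffs, p=0.5):
    """Snell envelope of a payoff process on a recombining binomial lattice.

    payoffs[t][j] is the reward g at time t in the node reached by j
    up-moves.  Returns Z with the same shape: Z_T = g_T and
    Z_t = max(g_t, E[Z_{t+1} | node]) -- the smallest supermartingale
    dominating the payoff.  The optimal rule stops as soon as Z_t == g_t.
    """
    T = len(payoffs) - 1
    Z = [row[:] for row in payoffs]
    for t in range(T - 1, -1, -1):
        for j in range(t + 1):
            cont = p * Z[t + 1][j + 1] + (1 - p) * Z[t + 1][j]
            Z[t][j] = max(payoffs[t][j], cont)
    return Z

# Symmetric walk S_t = (up-moves) - (down-moves) = 2j - t, payoff g = max(S, 0).
T = 3
payoffs = [[max(2 * j - t, 0) for j in range(t + 1)] for t in range(T + 1)]
Z = snell_envelope(payoffs)
```

At every node, `Z` dominates the payoff, and the first time the two coincide is an optimal stopping time, mirroring the essential-supremum characterization above.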