Stochastic programming
from Wikipedia

In the field of mathematical optimization, stochastic programming is a framework for modeling optimization problems that involve uncertainty. A stochastic program is an optimization problem in which some or all problem parameters are uncertain, but follow known probability distributions.[1][2] This framework contrasts with deterministic optimization, in which all problem parameters are assumed to be known exactly. The goal of stochastic programming is to find a decision which both optimizes some criteria chosen by the decision maker and appropriately accounts for the uncertainty of the problem parameters. Because many real-world decisions involve uncertainty, stochastic programming has found applications in areas ranging from finance to transportation to energy optimization.[3][4]

Methods


Several stochastic programming methods have been developed:

Two-stage problem definition


The basic idea of two-stage stochastic programming is that (optimal) decisions should be based on data available at the time the decisions are made and cannot depend on future observations. The two-stage formulation is widely used in stochastic programming. The general formulation of a two-stage stochastic programming problem is given by

\min_{x \in X} \; \{ g(x) = f(x) + \mathbb{E}_\xi[Q(x, \xi)] \},

where Q(x, ξ) is the optimal value of the second-stage problem

\min_{y} \; \{ q(y, \xi) \mid T(\xi)\, x + W(\xi)\, y = h(\xi) \}.

The classical two-stage linear stochastic programming problem can be formulated as

\min_{x \in \mathbb{R}^n} \; c^\top x + \mathbb{E}_\xi[Q(x, \xi)] \quad \text{subject to} \quad A x = b, \; x \geq 0,

where Q(x, ξ) is the optimal value of the second-stage problem

\min_{y \in \mathbb{R}^m} \; q(\xi)^\top y \quad \text{subject to} \quad T(\xi)\, x + W(\xi)\, y = h(\xi), \; y \geq 0.

In this formulation, x ∈ ℝ^n is the first-stage decision variable vector, y ∈ ℝ^m is the second-stage decision variable vector, and ξ = (q, T, W, h) contains the data of the second-stage problem. At the first stage we have to make a "here-and-now" decision x before the realization of the uncertain data ξ, viewed as a random vector, is known. At the second stage, after a realization of ξ becomes available, we optimize our behavior by solving an appropriate optimization problem.

At the first stage we optimize (minimize in the above formulation) the cost c^T x of the first-stage decision plus the expected cost of the (optimal) second-stage decision. We can view the second-stage problem simply as an optimization problem which describes our supposedly optimal behavior when the uncertain data ξ is revealed, or we can consider its solution as a recourse action: the term W(ξ) y compensates for a possible inconsistency of the system T(ξ) x = h(ξ), and q(ξ)^T y is the cost of this recourse action.

The considered two-stage problem is linear because the objective functions and the constraints are linear. Conceptually this is not essential and one can consider more general two-stage stochastic programs. For example, if the first-stage problem is integer, one could add integrality constraints to the first-stage problem so that the feasible set is discrete. Non-linear objectives and constraints could also be incorporated if needed.[5]

Distributional assumption


The formulation of the above two-stage problem assumes that the second-stage data ξ is modeled as a random vector with a known probability distribution. This would be justified in many situations. For example, the distribution of ξ could be inferred from historical data if one assumes that the distribution does not significantly change over the considered period of time. Also, the empirical distribution of the sample could be used as an approximation to the distribution of the future values of ξ. If one has a prior model for ξ, one could obtain a posteriori distribution by a Bayesian update.

Scenario-based approach


Discretization


To solve the two-stage stochastic problem numerically, one often needs to assume that the random vector ξ has a finite number of possible realizations, called scenarios, say ξ_1, ..., ξ_K, with respective probability masses p_1, ..., p_K. Then the expectation in the first-stage problem's objective function can be written as the summation

\mathbb{E}_\xi[Q(x, \xi)] = \sum_{k=1}^{K} p_k\, Q(x, \xi_k),

and, moreover, the two-stage problem can be formulated as one large linear programming problem (this is called the deterministic equivalent of the original problem, see section § Deterministic equivalent of a stochastic problem).

When ξ has an infinite (or very large) number of possible realizations, the standard approach is to represent this distribution by scenarios. This approach raises three questions, namely:

  1. How to construct scenarios, see § Scenario construction;
  2. How to solve the deterministic equivalent. Optimizers such as CPLEX and GLPK can solve large linear/nonlinear problems. The NEOS Server,[6] hosted at the University of Wisconsin, Madison, allows free access to many modern solvers. The structure of a deterministic equivalent is particularly amenable to decomposition methods,[7] such as Benders' decomposition or scenario decomposition;
  3. How to measure quality of the obtained solution with respect to the "true" optimum.

These questions are not independent. For example, the number of scenarios constructed will affect both the tractability of the deterministic equivalent and the quality of the obtained solutions.

Stochastic linear programming


A stochastic linear program is a specific instance of the classical two-stage stochastic program. A stochastic LP is built from a collection of multi-period linear programs (LPs), each having the same structure but somewhat different data. The two-period LP, representing the k-th scenario, may be regarded as having the following form:

\min \; f^\top x + g^\top y + h_k^\top z_k \quad \text{subject to} \quad T x + U y = r, \quad V_k y + W_k z_k = s_k, \quad x, y, z_k \geq 0.

The vectors x and y contain the first-period variables, whose values must be chosen immediately. The vector z_k contains all of the variables for subsequent periods. The constraints T x + U y = r involve only first-period variables and are the same in every scenario. The other constraints involve variables of later periods and differ in some respects from scenario to scenario, reflecting uncertainty about the future.

Note that solving the two-period LP is equivalent to assuming the k-th scenario in the second period with no uncertainty. In order to incorporate uncertainties in the second stage, one should assign probabilities to the different scenarios and solve the corresponding deterministic equivalent.

Deterministic equivalent of a stochastic problem


With a finite number of scenarios, two-stage stochastic linear programs can be modelled as large linear programming problems. This formulation is often called the deterministic equivalent linear program, or abbreviated to deterministic equivalent. (Strictly speaking a deterministic equivalent is any mathematical program that can be used to compute the optimal first-stage decision, so these will exist for continuous probability distributions as well, when one can represent the second-stage cost in some closed form.) For example, to form the deterministic equivalent to the above stochastic linear program, we assign a probability p_k to each scenario k = 1, ..., K. Then we can minimize the expected value of the objective, subject to the constraints from all scenarios:

\min \; f^\top x + g^\top y + \sum_{k=1}^{K} p_k\, h_k^\top z_k \quad \text{subject to} \quad T x + U y = r, \quad V_k y + W_k z_k = s_k, \quad x, y, z_k \geq 0, \quad k = 1, \dots, K.

We have a different vector z_k of later-period variables for each scenario k. The first-period variables x and y are the same in every scenario, however, because we must make a decision for the first period before we know which scenario will be realized. As a result, the constraints involving just x and y need only be specified once, while the remaining constraints must be given separately for each scenario.
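To make the construction concrete, the following is a minimal sketch (not from the original article; the cost, demand, and probability data are invented) of assembling and solving a tiny two-stage deterministic equivalent with scipy.optimize.linprog. A single first-stage variable x is shared across two scenarios, and each scenario carries its own recourse variable:

```python
# A minimal deterministic-equivalent sketch: first-stage x is shared by all
# scenarios; each scenario k gets its own recourse variable y_k. Data
# (costs, demands, probabilities) are invented for illustration.
import numpy as np
from scipy.optimize import linprog

c_first = 1.0                      # first-stage unit cost of x
q_rec = 1.5                        # per-unit recourse (shortage) cost
demands = np.array([80.0, 120.0])  # one demand level per scenario
probs = np.array([0.5, 0.5])       # scenario probabilities

# Decision vector: [x, y_1, y_2]; minimize c*x + sum_k p_k * q * y_k.
obj = np.concatenate(([c_first], probs * q_rec))

# Shortage constraints y_k >= d_k - x, written as -x - y_k <= -d_k.
A_ub = np.array([[-1.0, -1.0, 0.0],
                 [-1.0, 0.0, -1.0]])
b_ub = -demands

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(res.x)   # optimal (x, y_1, y_2); here x = 80, y_2 = 40
print(res.fun) # expected total cost, 110.0
```

Because the first-stage variable appears in every scenario's constraints, the solver automatically balances the first-stage cost against the probability-weighted recourse costs.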

Scenario construction


In practice it might be possible to construct scenarios by eliciting experts' opinions on the future. The number of constructed scenarios should be relatively modest so that the obtained deterministic equivalent can be solved with reasonable computational effort. It is often claimed that a solution that is optimal using only a few scenarios provides more adaptable plans than one that assumes a single scenario only. In some cases such a claim could be verified by a simulation. In theory, some measures guarantee that an obtained solution solves the original problem with reasonable accuracy. Typically in applications only the first-stage optimal solution has a practical value, since almost always a "true" realization of the random data will be different from the set of constructed (generated) scenarios.

Suppose ξ contains d independent random components, each of which has three possible realizations (for example, future realizations of each random parameter are classified as low, medium and high); then the total number of scenarios is K = 3^d. Such exponential growth of the number of scenarios makes model development using expert opinion very difficult even for reasonable size d. The situation becomes even worse if some random components of ξ have continuous distributions.

Monte Carlo sampling and sample average approximation (SAA) method


A common approach to reduce the scenario set to a manageable size is by using Monte Carlo simulation. Suppose the total number of scenarios is very large or even infinite. Suppose further that we can generate a sample ξ^1, ..., ξ^N of N realizations of the random vector ξ. Usually the sample is assumed to be independent and identically distributed (i.i.d. sample). Given a sample, the expectation function q(x) = E[Q(x, ξ)] is approximated by the sample average

\hat{q}_N(x) = \frac{1}{N} \sum_{j=1}^{N} Q(x, \xi^j),

and consequently the first-stage problem is given by

\min_{x \in \mathbb{R}^n} \; c^\top x + \frac{1}{N} \sum_{j=1}^{N} Q(x, \xi^j) \quad \text{subject to} \quad A x = b, \; x \geq 0.

This formulation is known as the sample average approximation (SAA) method. The SAA problem is a function of the considered sample and in that sense is random. For a given sample the SAA problem is of the same form as a two-stage stochastic linear programming problem with the scenarios ξ^1, ..., ξ^N, each taken with the same probability p_j = 1/N.
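As an illustration, the following sketch (with assumed cost data and an assumed lognormal demand distribution; Q is a newsvendor-style recourse cost chosen for simplicity, not taken from the article) forms the SAA objective from N = 1000 Monte Carlo draws and minimizes it:

```python
# SAA sketch: approximate E[Q(x, xi)] by a sample average over N draws,
# then minimize the resulting deterministic objective over x.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
c, p, h = 1.0, 2.0, 0.5            # order, shortage, holding costs (assumed)
xi = rng.lognormal(mean=4.6, sigma=0.3, size=1000)  # N = 1000 demand draws

def Q(x, xi):
    # Second-stage (recourse) cost for order x and realized demand xi.
    return p * np.maximum(xi - x, 0.0) + h * np.maximum(x - xi, 0.0)

def saa_objective(x):
    # First-stage cost plus the sample average of the recourse cost.
    return c * x + Q(x, xi).mean()

res = minimize_scalar(saa_objective, bounds=(0.0, 300.0), method="bounded")
print(res.x, res.fun)  # SAA-optimal order quantity and objective estimate
```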


Statistical inference


Consider the following stochastic programming problem:

\min_{x \in X} \; \{ g(x) = f(x) + \mathbb{E}[Q(x, \xi)] \}.

Here X is a nonempty closed subset of ℝ^n, ξ is a random vector whose probability distribution P is supported on a set Ξ ⊂ ℝ^d, and Q: X × Ξ → ℝ. In the framework of two-stage stochastic programming, Q(x, ξ) is given by the optimal value of the corresponding second-stage problem.

Assume that g(x) is well defined and finite valued for all x ∈ X. This implies that for every x ∈ X the value Q(x, ξ) is finite almost surely.

Suppose that we have a sample ξ^1, ..., ξ^N of N realizations of the random vector ξ. This random sample can be viewed as historical data of N observations of ξ, or it can be generated by Monte Carlo sampling techniques. Then we can formulate a corresponding sample average approximation problem

\min_{x \in X} \; \left\{ \hat{g}_N(x) = f(x) + \frac{1}{N} \sum_{j=1}^{N} Q(x, \xi^j) \right\}.

By the law of large numbers we have that, under some regularity conditions, ĝ_N(x) converges pointwise with probability 1 to g(x) as N → ∞. Moreover, under mild additional conditions the convergence is uniform. We also have E[ĝ_N(x)] = g(x), i.e., ĝ_N(x) is an unbiased estimator of g(x). Therefore, it is natural to expect that the optimal value and optimal solutions of the SAA problem converge to their counterparts of the true problem as N → ∞.

Consistency of SAA estimators


Suppose the feasible set X of the SAA problem is fixed, i.e., it is independent of the sample. Let ϑ* and S* be the optimal value and the set of optimal solutions, respectively, of the true problem and let ϑ_N and S_N be the optimal value and the set of optimal solutions, respectively, of the SAA problem.

  1. Let g: X → ℝ and ĝ_N: X → ℝ be a sequence of (deterministic) real valued functions. The following two properties are equivalent:
    • for any x̄ ∈ X and any sequence {x_N} ⊂ X converging to x̄ it follows that ĝ_N(x_N) converges to g(x̄);
    • the function g(·) is continuous on X and ĝ_N(·) converges to g(·) uniformly on any compact subset of X.
  2. If the objective of the SAA problem ĝ_N(x) converges to the true problem's objective g(x) with probability 1, as N → ∞, uniformly on the feasible set X, then ϑ_N converges to ϑ* with probability 1 as N → ∞.
  3. Suppose that there exists a compact set C ⊂ ℝ^n such that
    • the set S* of optimal solutions of the true problem is nonempty and is contained in C;
    • the function g(x) is finite valued and continuous on C;
    • the sequence of functions ĝ_N(x) converges to g(x) with probability 1, as N → ∞, uniformly in x ∈ C;
    • for N large enough the set S_N is nonempty and S_N ⊂ C with probability 1.
Then ϑ_N → ϑ* and D(S_N, S*) → 0 with probability 1 as N → ∞. Note that D(A, B) denotes the deviation of set A from set B, defined as

D(A, B) = \sup_{x \in A} \; \inf_{x' \in B} \| x - x' \|.

In some situations the feasible set of the SAA problem is estimated; the corresponding SAA problem then takes the form

\min_{x \in X_N} \hat{g}_N(x),

where X_N is a subset of ℝ^n depending on the sample and therefore is random. Nevertheless, consistency results for SAA estimators can still be derived under some additional assumptions:

  1. Suppose that there exists a compact set C ⊂ ℝ^n such that
    • the set S* of optimal solutions of the true problem is nonempty and is contained in C;
    • the function g(x) is finite valued and continuous on C;
    • the sequence of functions ĝ_N(x) converges to g(x) with probability 1, as N → ∞, uniformly in x ∈ C;
    • for N large enough the set S_N is nonempty and S_N ⊂ C with probability 1;
    • if x_N ∈ X_N and x_N converges with probability 1 to a point x, then x ∈ X;
    • for some point x ∈ S* there exists a sequence x_N ∈ X_N such that x_N → x with probability 1.
Then ϑ_N → ϑ* and D(S_N, S*) → 0 with probability 1 as N → ∞.

Asymptotics of the SAA optimal value


Suppose the sample ξ^1, ..., ξ^N is i.i.d. and fix a point x ∈ X. Then the sample average estimator ĝ_N(x) of g(x) is unbiased and has variance σ²(x)/N, where σ²(x) := Var[Q(x, ξ)] is supposed to be finite. Moreover, by the central limit theorem we have that

\sqrt{N} \left[ \hat{g}_N(x) - g(x) \right] \xrightarrow{\mathcal{D}} Y_x,

where →_D denotes convergence in distribution and Y_x has a normal distribution with mean 0 and variance σ²(x), written as N(0, σ²(x)).

In other words, ĝ_N(x) has an asymptotically normal distribution, i.e., for large N, ĝ_N(x) has approximately normal distribution with mean g(x) and variance σ²(x)/N. This leads to the following (approximate) 100(1 − α)% confidence interval for g(x):

\left[ \hat{g}_N(x) - z_{\alpha/2} \frac{\hat{\sigma}(x)}{\sqrt{N}}, \;\; \hat{g}_N(x) + z_{\alpha/2} \frac{\hat{\sigma}(x)}{\sqrt{N}} \right],

where z_{α/2} := Φ^{-1}(1 − α/2) (here Φ(·) denotes the cdf of the standard normal distribution) and

\hat{\sigma}^2(x) := \frac{1}{N - 1} \sum_{j=1}^{N} \left[ Q(x, \xi^j) - \frac{1}{N} \sum_{j'=1}^{N} Q(x, \xi^{j'}) \right]^2

is the sample variance estimate of σ²(x). That is, the error of estimation of g(x) is (stochastically) of order O(1/√N).
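The interval is straightforward to compute from a sample; a short sketch (reusing the assumed newsvendor-style recourse cost and demand distribution from the earlier SAA example) estimates the expected recourse cost at a fixed x and its 95% confidence interval:

```python
# Approximate 100(1 - alpha)% confidence interval for E[Q(x, xi)] at fixed x.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
p, h, x = 2.0, 0.5, 100.0                            # assumed cost data
xi = rng.lognormal(mean=4.6, sigma=0.3, size=2000)   # N = 2000 draws

costs = p * np.maximum(xi - x, 0.0) + h * np.maximum(x - xi, 0.0)
g_hat = costs.mean()                   # sample average estimate
sigma_hat = costs.std(ddof=1)          # sample standard deviation

alpha = 0.05
z = norm.ppf(1.0 - alpha / 2.0)        # z_{alpha/2} for a 95% interval
half_width = z * sigma_hat / np.sqrt(len(xi))
print(g_hat - half_width, g_hat + half_width)
```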

Applications and examples


Biological applications


Stochastic dynamic programming is frequently used to model animal behaviour in such fields as behavioural ecology.[8][9] Empirical tests of models of optimal foraging, life-history transitions such as fledging in birds and egg laying in parasitoid wasps have shown the value of this modelling technique in explaining the evolution of behavioural decision making. These models are typically many-staged, rather than two-staged.

Economic applications


Stochastic dynamic programming is a useful tool in understanding decision making under uncertainty. The accumulation of capital stock under uncertainty is one example; often it is used by resource economists to analyze bioeconomic problems[10] where the uncertainty enters in variables such as weather.

Example: multistage portfolio optimization


The following is an example from finance of multi-stage stochastic programming. Suppose that at time t = 0 we have initial capital W_0 to invest in n assets. Suppose further that we are allowed to rebalance our portfolio at times t = 1, ..., T−1 but without injecting additional cash into it. At each period t we make a decision about redistributing the current wealth W_t among the n assets. Let x_0 = (x_{10}, ..., x_{n0}) be the initial amounts invested in the n assets. We require that each x_{i0} is nonnegative and that the balance equation Σ_i x_{i0} = W_0 should hold.

Consider the total returns ξ_t = (ξ_{1t}, ..., ξ_{nt}) for each period t = 1, ..., T. This forms a vector-valued random process ξ_1, ..., ξ_T. At time period t = 1, we can rebalance the portfolio by specifying the amounts x_1 = (x_{11}, ..., x_{n1}) invested in the respective assets. At that time the returns in the first period have been realized, so it is reasonable to use this information in the rebalancing decision. Thus, the second-stage decisions, at time t = 1, are actually functions of the realization of the random vector ξ_1, i.e., x_1 = x_1(ξ_1). Similarly, at time t the decision x_t = (x_{1t}, ..., x_{nt}) is a function x_t = x_t(ξ_[t]) of the available information given by ξ_[t] = (ξ_1, ..., ξ_t), the history of the random process up to time t. A sequence of functions x_t = x_t(ξ_[t]), t = 0, ..., T−1, with x_0 being constant, defines an implementable policy of the decision process. It is said that such a policy is feasible if it satisfies the model constraints with probability 1, i.e., the nonnegativity constraints x_{it}(ξ_[t]) ≥ 0, i = 1, ..., n, t = 0, ..., T−1, and the balance of wealth constraints

\sum_{i=1}^{n} x_{it}(\xi_{[t]}) = W_t,

where in period t the wealth W_t is given by

W_t = \sum_{i=1}^{n} \xi_{it}\, x_{i,t-1}(\xi_{[t-1]}),

which depends on the realization of the random process and the decisions up to time t.

Suppose the objective is to maximize the expected utility of this wealth at the last period, that is, to consider the problem

\max \; \mathbb{E}[U(W_T)].

This is a multistage stochastic programming problem, where stages are numbered from t = 0 to t = T. Optimization is performed over all implementable and feasible policies. To complete the problem description one also needs to define the probability distribution of the random process ξ_1, ..., ξ_T. This can be done in various ways. For example, one can construct a particular scenario tree defining the time evolution of the process. If at every stage the random return of each asset is allowed to have two continuations, independent of other assets, then the total number of scenarios is 2^{nT}.
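Before turning to dynamic programming, note that any implementable policy can at least be evaluated by simulation. The following sketch (with an invented return distribution and invented rebalancing weights) estimates E[U(W_T)] for a fixed-mix policy x_t = weights · W_t, which depends on the history only through the realized wealth and is therefore implementable:

```python
# Monte Carlo evaluation of one implementable policy: fixed-mix rebalancing
# with log utility of terminal wealth. All data here are assumptions.
import numpy as np

rng = np.random.default_rng(2)
n, T, W0 = 3, 4, 1.0
weights = np.array([0.5, 0.3, 0.2])       # fixed rebalancing proportions

n_paths = 100_000
# Gross returns xi_{it} > 0, here i.i.d. lognormal across assets and periods.
xi = rng.lognormal(mean=0.05, sigma=0.15, size=(n_paths, T, n))

W = np.full(n_paths, W0)
for t in range(T):
    x_t = W[:, None] * weights            # amounts invested this period
    W = (xi[:, t, :] * x_t).sum(axis=1)   # W_{t+1} = sum_i xi_{i,t+1} x_{it}

print(np.mean(np.log(W)))                 # Monte Carlo estimate of E[log W_T]
```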

In order to write dynamic programming equations, consider the above multistage problem backward in time. At the last stage t = T−1, a realization ξ_[T−1] of the random process is known and x_{T−2} has been chosen. Therefore, one needs to solve the following problem:

\max_{x_{T-1}} \; \mathbb{E}\left[ U(W_T) \mid \xi_{[T-1]} \right] \quad \text{subject to} \quad W_T = \sum_{i=1}^{n} \xi_{iT}\, x_{i,T-1}, \quad \sum_{i=1}^{n} x_{i,T-1} = W_{T-1}, \quad x_{T-1} \geq 0,

where E[U(W_T) | ξ_[T−1]] denotes the conditional expectation of U(W_T) given ξ_[T−1]. The optimal value of the above problem depends on W_{T−1} and ξ_[T−1] and is denoted Q_{T−1}(W_{T−1}, ξ_[T−1]).

Similarly, at stages t = T−2, ..., 1, one should solve the problem

\max_{x_t} \; \mathbb{E}\left[ Q_{t+1}(W_{t+1}, \xi_{[t+1]}) \mid \xi_{[t]} \right] \quad \text{subject to} \quad W_{t+1} = \sum_{i=1}^{n} \xi_{i,t+1}\, x_{it}, \quad \sum_{i=1}^{n} x_{it} = W_t, \quad x_t \geq 0,

whose optimal value is denoted by Q_t(W_t, ξ_[t]). Finally, at stage t = 0, one solves the problem

\max_{x_0} \; \mathbb{E}\left[ Q_1(W_1, \xi_{[1]}) \right] \quad \text{subject to} \quad W_1 = \sum_{i=1}^{n} \xi_{i1}\, x_{i0}, \quad \sum_{i=1}^{n} x_{i0} = W_0, \quad x_0 \geq 0.

Stagewise independent random process


For a general distribution of the process ξ_t, it may be hard to solve these dynamic programming equations. The situation simplifies dramatically if the process ξ_t is stagewise independent, i.e., ξ_{t+1} is (stochastically) independent of ξ_[t] for t = 1, ..., T−1. In this case, the corresponding conditional expectations become unconditional expectations, and the function Q_t(W_t), t = 1, ..., T−1, does not depend on ξ_[t]. That is, Q_{T−1}(W_{T−1}) is the optimal value of the problem

\max_{x_{T-1}} \; \mathbb{E}\left[ U(W_T) \right] \quad \text{subject to} \quad W_T = \sum_{i=1}^{n} \xi_{iT}\, x_{i,T-1}, \quad \sum_{i=1}^{n} x_{i,T-1} = W_{T-1}, \quad x_{T-1} \geq 0,

and Q_t(W_t) is the optimal value of

\max_{x_t} \; \mathbb{E}\left[ Q_{t+1}(W_{t+1}) \right] \quad \text{subject to} \quad W_{t+1} = \sum_{i=1}^{n} \xi_{i,t+1}\, x_{it}, \quad \sum_{i=1}^{n} x_{it} = W_t, \quad x_t \geq 0,

for t = T−2, ..., 1.

Software tools


Modelling languages


All discrete stochastic programming problems can be represented with any algebraic modeling language, manually implementing explicit or implicit non-anticipativity to make sure the resulting model respects the structure of the information made available at each stage. An instance of an SP problem generated by a general modelling language tends to grow quite large (linearly in the number of scenarios), and its matrix loses the structure that is intrinsic to this class of problems, which could otherwise be exploited at solution time by specific decomposition algorithms. Extensions to modelling languages specifically designed for SP are starting to appear, see:

  • AIMMS – supports the definition of SP problems
  • EMP SP (Extended Mathematical Programming for Stochastic Programming) – a module of GAMS created to facilitate stochastic programming (includes keywords for parametric distributions, chance constraints and risk measures such as Value at risk and Expected shortfall).
  • SAMPL – a set of extensions to AMPL specifically designed to express stochastic programs (includes syntax for chance constraints, integrated chance constraints and robust optimization problems)

These tools can generate SMPS instance-level format, which conveys the structure of the problem to the solver in a non-redundant form.
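For illustration, a minimal sketch of the manual approach in a general algebraic modeling language (here Pyomo, with invented two-scenario data; the variable and constraint names are ours, not a standard) makes non-anticipativity explicit by giving each scenario its own copy of the first-stage variable and tying the copies together:

```python
# Deterministic equivalent with explicit non-anticipativity constraints:
# each scenario s gets its own copy x[s] of the first-stage decision,
# constrained to agree with the copy of the first scenario.
from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           NonNegativeReals, minimize)

scenarios = [1, 2]
prob = {1: 0.5, 2: 0.5}
demand = {1: 80.0, 2: 120.0}

m = ConcreteModel()
m.x = Var(scenarios, domain=NonNegativeReals)   # first-stage copies
m.y = Var(scenarios, domain=NonNegativeReals)   # second-stage recourse

# Expected cost: 1.0 per unit ordered, 1.5 per unit short (assumed data).
m.cost = Objective(expr=sum(prob[s] * (1.0 * m.x[s] + 1.5 * m.y[s])
                            for s in scenarios), sense=minimize)

# Recourse covers any shortfall in each scenario.
m.short = Constraint(scenarios, rule=lambda m, s: m.y[s] >= demand[s] - m.x[s])

# Explicit non-anticipativity: all scenario copies of x must agree.
m.nonant = Constraint(scenarios,
                      rule=lambda m, s: Constraint.Skip if s == scenarios[0]
                      else m.x[s] == m.x[scenarios[0]])

# m can then be solved with e.g. SolverFactory("glpk").solve(m).
```

A specialized SP extension would generate the same structure automatically and keep the problem matrix in a form that decomposition algorithms can exploit.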

from Grokipedia
Stochastic programming is a framework designed to make optimal decisions in the presence of uncertainty, where some or all problem parameters are modeled as random variables following known probability distributions. Unlike deterministic optimization, it incorporates probabilistic elements to account for real-world variability in data such as demands, prices, or yields, aiming to minimize expected costs or maximize expected profits over possible scenarios. The field originated in the early 1950s as an extension of linear programming to handle uncertain coefficients, pioneered by George B. Dantzig through concepts like decomposition for solving large-scale problems.

A fundamental distinction in stochastic programming is between two-stage and multistage models. In two-stage stochastic programs, decisions are divided into "here-and-now" choices made before uncertainty is revealed (first stage) and "wait-and-see" recourse actions taken afterward (second stage), with the objective minimizing first-stage costs plus the expected value of second-stage recourse costs. This approach approximates more complex dynamics by collapsing future uncertainties into a single revelation point, often using scenario trees or sample average approximations for computation. Multistage extensions generalize this to sequential decision-making across multiple periods, where uncertainties unfold gradually and decisions adapt at each stage, better capturing non-anticipativity constraints that prevent foresight of future outcomes. These models are typically formulated as large-scale linear or mixed-integer programs, solved via methods like Benders decomposition, stochastic decomposition, or progressive hedging.

Stochastic programming finds broad applications in industries facing uncertainty, including planning for power systems under variable demand and fuel prices, financial portfolio management amid market volatility, and supply chain management with fluctuating supplies. In process systems engineering, it optimizes designs for chemical plants accounting for raw material variability, while in transportation, it supports routing and scheduling under stochastic travel times. Recent advances emphasize risk measures beyond expectation, such as conditional value-at-risk, to address downside potential, and integration with machine learning for scenario generation from data. Despite computational challenges for high-dimensional uncertainties, the field's growth reflects its essential role in decision support for complex, real-time systems.

Introduction

Definition and Basic Concepts

Stochastic programming is a mathematical optimization paradigm designed to make optimal decisions in the presence of uncertainty, where some or all problem parameters are modeled as random variables following known probability distributions. Unlike deterministic optimization, which assumes fixed parameters, stochastic programming incorporates probabilistic elements to evaluate decisions based on expected outcomes or risk-adjusted measures, enabling robust solutions for real-world applications such as energy systems, supply chains, and financial planning.

The general formulation of a stochastic program seeks to minimize the expected value of an objective function involving uncertain parameters, expressed as \min_x \mathbb{E}[f(x, \xi)], where x represents the decision variables, ξ is a random vector with a specified distribution, and E[·] denotes the expectation operator. This expectation integrates over all possible realizations of ξ, weighted by their probabilities, to capture the average performance of the decision x. Probabilistic constraints may also be included, such as \mathbb{P}[g(x, \xi) \leq 0] \geq 1 - \alpha, ensuring that constraints hold with high probability 1 − α. These elements replace fixed parameters in deterministic models with distributional information, shifting the focus from point estimates to optimization that accounts for variability.

Central to stochastic programming are key concepts like here-and-now decisions, which are first-stage choices made before the uncertainty ξ is realized, and wait-and-see decisions, or recourse actions, implemented after observing ξ. Non-anticipativity constraints enforce that decisions at each stage depend only on information available up to that point, preventing the use of future revelations in earlier choices and ensuring feasible, implementable policies. For instance, in a two-stage setting, here-and-now decisions commit resources upfront, while wait-and-see adjustments adapt to realized outcomes.

Uncertainty in stochastic programming is modeled using discrete distributions, represented by finite scenarios each with associated probabilities, or continuous distributions approximated via sampling methods. Beyond expected values, risk measures such as conditional value-at-risk (CVaR) provide introductory tools to quantify tail risks by focusing on the expected loss exceeding a value-at-risk threshold, promoting more conservative decisions in volatile environments.

Historical Development

The origins of stochastic programming trace back to the mid-1950s, when researchers began addressing the inherent uncertainties in real-world optimization problems. George B. Dantzig, often regarded as the father of linear programming, highlighted the stochastic nature of practical applications in his seminal 1955 paper, "Linear Programming under Uncertainty," where he proposed initial approaches to incorporate probabilistic elements into linear programs. Independently, E. M. L. Beale contributed foundational work in the same year through his paper "On Minimizing a Convex Function Subject to Linear Inequalities," which explored methods for handling stochastic problems by approximating expected values and recourse actions. These efforts marked the field's emergence as a distinct extension of deterministic optimization, focusing on decision-making under uncertainty.

Key milestones in the late 1950s and 1960s further solidified the theoretical framework. Dantzig and Albert Madansky formalized the two-stage recourse problem in their 1961 paper, "On the Solution of Two-Stage Linear Programs under Uncertainty," presented at the Fourth Berkeley Symposium on Mathematical Statistics and Probability, introducing supporting hyperplanes for the recourse function to enable practical computation. Chance-constrained programming, a subclass emphasizing probabilistic feasibility, was introduced by Abraham Charnes and William W. Cooper in 1959, allowing constraints to hold with a specified probability rather than deterministically. During the 1960s and 1970s, decomposition methods gained prominence for tackling larger-scale problems; notably, the L-shaped method, an adaptation of Benders decomposition for stochastic programs, was developed by Richard Van Slyke and Roger J.-B. Wets in 1969, facilitating iterative solution of two-stage models by generating optimality and feasibility cuts.

The 1980s and 1990s saw significant growth in computational techniques and modeling sophistication, driven by advances in hardware and algorithms. Scenario trees emerged as a critical tool for representing multistage uncertainty, with early developments enabling non-anticipative decisions in dynamic settings, as explored in works like those by John R. Birge and François Louveaux in their 1997 book, which built on prior scenario-based approximations from the preceding decade. Mixed-integer stochastic programming also advanced, with theoretical and computational progress starting in the late 1990s, allowing integration of discrete decisions under uncertainty. The formation of the Stochastic Programming Society in 2012, evolving from the earlier Committee on Stochastic Programming (established in 1982) and the triennial International Conference on Stochastic Programming (initiated in 1986), fostered collaboration and dissemination of these innovations.

In recent years, up to 2025, stochastic programming has integrated with machine learning for enhanced scenario generation, such as using generative models to create realistic uncertainty representations from data, as reviewed in studies on machine learning applications to stochastic optimization. Distributionally robust extensions have also proliferated, addressing ambiguity in probability distributions to improve solution robustness, with key formulations appearing in works like those on two-stage distributionally robust stochastic programs since 2020. These trends are particularly evident in applications to energy systems planning, where multistage models optimize capacity expansion under variable wind and solar outputs, as demonstrated in recent frameworks combining stochastic operations with expansion decisions.

Problem Formulations

Two-Stage Stochastic Programs

Two-stage stochastic programs represent a core paradigm in stochastic programming, modeling decision-making under uncertainty where first-stage decisions must be made before the realization of random variables, followed by second-stage corrective actions. Introduced by George B. Dantzig in his seminal 1955 paper, this framework captures "here-and-now" choices in the first stage and "wait-and-see" adjustments in the second stage once uncertainty is revealed. Such models are widely applied in planning contexts where initial commitments cannot be easily revised without incurring additional costs.

The standard mathematical formulation of a two-stage stochastic linear program with recourse is given by

\min_{x} \; c^\top x + \mathbb{E}_\xi[Q(x, \xi)] \quad \text{subject to} \quad A x = b, \; x \geq 0,

where x denotes the first-stage decision vector, ξ the random vector representing uncertainty, c the first-stage cost coefficients, and

Q(x, \xi) = \min_y \{ q(\xi)^\top y \mid T(\xi)\, x + W(\xi)\, y = h(\xi), \; y \geq 0 \}

is the recourse function capturing the minimum cost of second-stage adjustments y for a given first-stage decision and uncertainty realization. The expectation E_ξ integrates over the distribution of ξ, reflecting the probabilistic weighting of the recourse costs. This ensures that first-stage decisions hedge against potential future scenarios while minimizing total expected costs.

Recourse in two-stage models refers to the feasibility and cost of second-stage actions to address deviations caused by uncertainty. Complete recourse assumes that for every first-stage x and every realization ξ, there exists a second-stage y satisfying the constraints, ensuring Q(x, ξ) is finite and well-defined everywhere. Relatively complete recourse relaxes this by requiring feasibility only for first-stage decisions in the original feasible set. Simple recourse, a special case, models second-stage decisions as non-negative surplus and shortage variables to balance supply-demand imbalances, often with linear costs. Economically, recourse costs q(ξ) can be interpreted as penalties for corrective measures, such as overtime production or lost sales, incentivizing first-stage decisions that minimize expected adjustments.

The random vector ξ is typically assumed to have a known probability distribution, often with finite discrete support for tractability—representing scenarios ξ_s with probabilities p_s > 0, Σ_s p_s = 1, so the expectation becomes Σ_s p_s Q(x, ξ_s). For continuous distributions, the expectation is approximated by generating a finite set of scenarios via sampling methods like Monte Carlo simulation, enabling reformulation as a large-scale deterministic equivalent. This scenario-based discretization preserves the two-stage structure while facilitating computational solution.

A representative example is inventory planning under random demand. In the first stage, a retailer decides an order quantity x ≥ 0 at unit cost c, subject to a capacity constraint x ≤ b. Demand ξ follows a discrete distribution with scenarios ξ_s (e.g., low, medium, high demand) and probabilities p_s. In the second stage, if ξ_s > x, a shortage incurs penalty p(ξ_s − x); if ξ_s < x, excess inventory costs h(x − ξ_s). The recourse function is Q(x, ξ_s) = p max(ξ_s − x, 0) + h max(x − ξ_s, 0), and the objective minimizes c x + Σ_s p_s Q(x, ξ_s). For instance, with c = 1, b = 100, p = 2, h = 0.5, and scenarios ξ_1 = 80 (p_1 = 0.4), ξ_2 = 120 (p_2 = 0.6), the specific parameters make the expected total cost constant at 128 for any x in [80, 120]; with the capacity constraint, the optimal x = 100 yields an expected total cost of 128, balancing overstock and shortage risks.
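A quick numerical check of this example (using only the data above) confirms that the expected total cost is flat at 128 across [80, 120]:

```python
# Verify: with c = 1, p = 2, h = 0.5 and demand scenarios 80 (prob 0.4)
# and 120 (prob 0.6), the expected total cost c*x + sum_s p_s * Q(x, xi_s)
# is constant at 128 for x in [80, 120].
import numpy as np

c, p, h = 1.0, 2.0, 0.5
scenarios = np.array([80.0, 120.0])
probs = np.array([0.4, 0.6])

def expected_cost(x):
    shortage = p * np.maximum(scenarios - x, 0.0)
    holding = h * np.maximum(x - scenarios, 0.0)
    return c * x + probs @ (shortage + holding)

for x in (80.0, 90.0, 100.0, 110.0, 120.0):
    print(x, expected_cost(x))   # prints 128.0 for each x in [80, 120]
```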

Multistage Stochastic Programs

Multistage stochastic programs extend the framework of stochastic optimization to sequential decision-making over multiple time periods, where uncertainty unfolds gradually and decisions are made adaptively based on partial information revelation. In this setting, decisions at each stage t are taken before the full realization of future uncertainties, leading to a dynamic structure that captures the value of information and recourse actions over time. This contrasts with simpler models by allowing for a richer representation of real-world planning problems, such as resource allocation under evolving market conditions or environmental factors.

The general formulation of a multistage stochastic program involves minimizing a nested sequence of expectations over decision variables x_t at each stage t = 1, ..., T, conditional on information available up to that point. Specifically, the objective is to solve

\min_{x_1} \mathbb{E} \left[ \sum_{t=1}^{T} c_t^\top x_t \mid \mathcal{F}_0 \right],

where the expectation is taken with respect to the filtration {F_t} representing the information structure, subject to constraints like A_t x_t = b_t(ω_t) for realizations ω_t of the random process ξ_t. Non-anticipativity constraints ensure that x_t is F_{t−1}-measurable, meaning decisions at stage t depend only on past observations and not on future outcomes, preventing the use of unattainable foresight. This recursive structure defines the value function at each stage via dynamic programming, where the cost-to-go from stage t onward is the conditional expectation of the remaining costs given the history up to t. For instance, the two-stage model emerges as a special case when T = 2, reducing to a single here-and-now decision followed by wait-and-see recourse.

To operationalize this formulation, uncertainties are typically discretized into scenario trees, which provide a discrete approximation of the underlying stochastic process. A scenario tree is constructed as a lattice with nodes representing partial histories of the random outcomes, where branches from a node denote possible evolutions at the next stage, incorporating both fan-out (divergence into multiple futures) and fan-in (convergence of paths to shared nodes) structures. Decisions are node-based, with the same action prescribed for all scenarios passing through a given node to enforce non-anticipativity; for example, the root node corresponds to the initial decision x_1, while leaf nodes terminate at stage T with associated probabilities summing to one along each full path (scenario). Construction methods, such as backward reduction or forward clustering of initial scenarios, aim to balance approximation accuracy with computational feasibility, often using distance metrics like the L_r-norm to bound errors in the tree's representation of the original distribution.

A key simplifying assumption in many multistage models is stagewise independence, where the random increments ξ_t at stage t are conditionally independent of the past history ξ_[t−1] given the information up to t−1. This Markovian property decouples the conditional expectations in the dynamic program, allowing the cost-to-go function Q_t(x_{t−1}, ξ_t) to depend only on the current state rather than the entire path, which facilitates recursive computation and reduces the dimensionality of scenario dependencies. Without this assumption, the full history must be tracked, complicating tree structures and increasing storage requirements.

Despite these advancements, multistage stochastic programs face significant computational challenges, primarily the curse of dimensionality arising from the exponential growth in the number of nodes and scenarios as the number of stages T and branching factors increase. For S scenarios per stage, the tree size scales as O(S^T), rendering exact solutions intractable for large horizons; approximation techniques must thus be employed to prune or aggregate nodes while preserving solution quality. A representative application is long-term hydropower planning, where operators must allocate water releases over multiple periods (e.g., weeks or months) amid uncertain inflows and electricity prices, balancing immediate generation against future reservoir levels—problems where even modest T = 7 stages can yield tens of thousands of nodes due to hydrological variability.
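To see the growth concretely, a tiny sketch (equiprobable, invented per-stage outcomes) enumerates the scenario paths of a stagewise-independent tree:

```python
# Illustration of O(S^T) growth: enumerate all paths of a stagewise-
# independent tree with S equiprobable branches per stage.
from itertools import product

S, T = 3, 7                       # branches per stage, number of stages
outcomes = [0.9, 1.0, 1.1]        # per-stage multiplicative outcomes (S = 3)

paths = list(product(outcomes, repeat=T))
print(len(paths))                 # S**T = 3**7 = 2187 scenarios

# Probability of each path under stagewise independence with equal weights:
prob = (1.0 / S) ** T
print(prob * len(paths))          # sums to 1.0
```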

Chance-Constrained Programs

Chance-constrained programs address optimization problems under uncertainty by requiring that constraints hold with a specified probability, rather than deterministically, to model reliability in the presence of random parameters. The standard formulation minimizes a linear objective subject to probabilistic constraints on linear inequalities involving uncertain parameters, expressed as

\min_{x} \; c^\top x \quad \text{subject to} \quad \mathbb{P}\{A(x, \xi) \leq b\} \geq 1 - \alpha,

where x is the decision vector, ξ is a random vector with known distribution, A(x, ξ) is an affine function in x and ξ, b is a deterministic vector, and α ∈ (0, 1) is the risk tolerance level.

Chance constraints can be classified as individual or joint based on how multiple constraints interact probabilistically. Individual chance constraints apply separately to each constraint, requiring P{A_i(x, ξ) ≤ b_i} ≥ 1 − α for each row i, which simplifies computation but may overestimate feasibility since violations can occur simultaneously across constraints. In contrast, joint chance constraints ensure simultaneous satisfaction of all constraints with the desired probability, P{A(x, ξ) ≤ b} ≥ 1 − α, providing a more conservative and realistic reliability guarantee but increasing computational complexity.

Programs are further categorized as static or dynamic depending on the decision structure and uncertainty resolution. Static chance-constrained programs involve here-and-now decisions x made before observing ξ, with ξ drawn from a fixed known distribution, leading to single-stage optimization. Dynamic variants extend this to sequential decisions over time, where uncertainty unfolds gradually, often incorporating feedback or recourse actions, though the core probabilistic constraint framework remains similar.

Solving general chance-constrained programs is challenging due to non-convexity arising from the probabilistic nature of the constraints, which can result in non-convex feasible regions even for linear deterministic counterparts. For arbitrary distributions, these problems are NP-hard, particularly with discrete or non-elliptical continuous distributions, complicating exact solutions. However, exact convex reformulations exist when ξ follows a normal distribution, transforming individual chance constraints into deterministic second-order cone constraints via the inverse cumulative distribution function of the standard normal. To address non-convexity, conservative approximations replace the chance constraint with a convex surrogate, such as one based on conditional value-at-risk (CVaR), which bounds the expected shortfall beyond the α-quantile and yields a tractable semidefinite program for certain distributions. This CVaR approach provides a feasible but potentially suboptimal solution, with tightness improving as α decreases.

A representative application appears in reservoir operation, where the goal is to determine release policies that minimize costs while ensuring the probability of reservoir overflow (spill) remains below a threshold, such as P{storage level > capacity} ≤ α, accounting for uncertain inflows.
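The normal-distribution reformulation is easy to check numerically. The following sketch (invented data) tests the deterministic second-order cone form of an individual chance constraint P(aᵀx ≤ b) ≥ 1 − α for a ~ N(μ, Σ) against a Monte Carlo estimate at a fixed x:

```python
# For a ~ N(mu, Sigma), P(a'x <= b) >= 1 - alpha is equivalent to
# mu'x + z_{1-alpha} * sqrt(x' Sigma x) <= b. Compare both forms at fixed x.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
mu = np.array([1.0, 2.0])
Sigma = np.array([[0.2, 0.05], [0.05, 0.1]])
x = np.array([1.0, 0.5])
b, alpha = 3.0, 0.05

lhs = mu @ x + norm.ppf(1 - alpha) * np.sqrt(x @ Sigma @ x)
print(lhs <= b)                      # deterministic (second-order cone) test

a = rng.multivariate_normal(mu, Sigma, size=200_000)
print((a @ x <= b).mean())           # should be >= 0.95 iff the test holds
```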

Solution Methods

Scenario-Based Methods

Scenario-based methods in stochastic programming approximate the continuous probability distribution of uncertain parameters, denoted as ξ, by a finite discrete set of scenarios ω = 1, ..., S, where each scenario ω is assigned a probability p_ω > 0 such that Σ_ω p_ω = 1. This discretization transforms the original stochastic program into a deterministic equivalent problem that can be solved using standard optimization techniques, enabling practical computation for problems where the uncertainty space Ξ is infinite or continuous. The choice of scenarios is guided by the need to capture the essential features of the distribution, such as moments or support points, often derived from historical data or analytical approximations.

For multistage stochastic programs, scenario-based methods employ non-anticipative scenario trees to model the evolution of uncertainty over time, ensuring that decisions at each stage depend only on information available up to that point. In a scenario tree, nodes represent decision points, and branches from a common ancestor node share the same history, enforcing non-anticipativity through constraints that identical paths lead to identical decisions. For instance, a three-period tree might branch into multiple paths at each stage, with probabilities propagating along the branches to reflect conditional distributions of ξ. This structure preserves the sequential nature of decisions under partial information revelation.

The resulting deterministic equivalent for a two-stage stochastic program has a size that grows linearly with the number of scenarios S, featuring n_1 + S n_2 variables and m_1 + S m_2 constraints, where n_t and m_t denote the numbers of stage-t variables and constraints, respectively. For a two-stage formulation min_x cᵀx + E[Q(x, ξ)], the equivalent is

\min_{x, \{y_\omega\}} \; \sum_{\omega=1}^{S} p_\omega \left( c^\top x + q_\omega^\top y_\omega \right) \quad \text{subject to} \quad A x = b, \quad T_\omega x + W y_\omega = h_\omega \;\; \text{for each } \omega,

with non-negativity. In multistage settings, the extensive form expands similarly across tree nodes, leading to exponential growth in S for deep trees, which poses computational challenges. To mitigate this, scenario reduction techniques such as fast-forward selection iteratively select a subset of scenarios by maximizing probabilistic similarity, using metrics like the Kantorovich distance to preserve distribution properties while reducing S (e.g., from thousands to dozens), thereby trading minor approximation error for significant decreases in solve time.

A representative example is production planning under demand uncertainty, where scenarios are generated to model varying demand over planning periods. In a two-stage model, first-stage decisions set production capacities, while second-stage recourse adjusts production or backorders for each ω with probability p_ω; for instance, discretizing a normal demand distribution into five equiprobable scenarios allows solving the deterministic equivalent to balance expected costs. Scenarios can be generated via sampling methods to approximate the underlying distribution.
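The following sketch implements a simplified greedy fast-forward selection (invented Gaussian scenario data; a simplified stand-in for Kantorovich-distance-based reduction, not a production implementation): at each step it keeps the scenario that most reduces the probability-weighted distance from the unselected scenarios to the selected set, then redistributes dropped probability mass to nearest kept scenarios.

```python
# Greedy fast-forward scenario reduction sketch.
import numpy as np

rng = np.random.default_rng(4)
scenarios = rng.normal(size=(1000, 3))          # 1000 scenarios, dim 3
probs = np.full(1000, 1.0 / 1000)
n_keep = 20

dist = np.linalg.norm(scenarios[:, None, :] - scenarios[None, :, :], axis=2)
selected = []
min_dist = np.full(len(scenarios), np.inf)

for _ in range(n_keep):
    # Cost of adding candidate j: sum_i p_i * min(min_dist_i, dist(i, j)).
    cost = (probs[:, None] * np.minimum(min_dist[:, None], dist)).sum(axis=0)
    cost[selected] = np.inf                     # don't re-pick
    j = int(np.argmin(cost))
    selected.append(j)
    min_dist = np.minimum(min_dist, dist[:, j])

# Redistribute each dropped scenario's probability to its nearest kept one.
nearest = np.array(selected)[np.argmin(dist[:, selected], axis=1)]
new_probs = np.zeros(len(scenarios))
np.add.at(new_probs, nearest, probs)
print(sorted(selected)[:5], new_probs[selected].sum())  # kept set, total 1.0
```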

Decomposition Techniques

Decomposition techniques in stochastic programming exploit the structure of the problem to divide large-scale formulations into smaller, more manageable subproblems, facilitating efficient computation while preserving optimality guarantees. These methods are particularly valuable for two-stage and multistage programs where the recourse function introduces nonlinearity that can be approximated through iterative cut generation or dual updates. By solving subproblems corresponding to scenarios—which represent possible realizations of the uncertainty—these algorithms generate supporting hyperplanes (cuts) to refine the first-stage decisions progressively.

A foundational approach is Benders decomposition adapted to two-stage stochastic programs, known as the L-shaped method, which separates the here-and-now decisions x from the wait-and-see recourse decisions y. The two-stage problem is formulated as minimizing cᵀx + E[Q(x, ξ)], where Q(x, ξ) = min { qᵀy : W y = h − T x, y ≥ 0 } is the recourse function for realization ξ. The algorithm iteratively solves a master problem approximating the expected recourse via linear cuts and subproblems for each scenario to generate dual information. Optimality cuts are derived from extreme points π^i of the dual feasible set {π : Wᵀπ ≤ q}, taking the form

\theta \geq \sum_k p_k (h_k - T_k x)^\top \pi_k^i,

added to bound the recourse from below. Feasibility cuts, generated when subproblems are infeasible, use extreme rays σ^r of the dual to enforce Σ_k p_k (h_k − T_k x)ᵀσ_k^r ≥ 0. The process converges to the optimal solution under finite scenarios, with the master providing lower bounds and subproblem evaluations yielding upper bounds.

Stochastic decomposition extends this framework by incorporating sampling to handle large or continuous uncertainty sets, generating sample-based cuts that approximate the expected recourse statistically. In the two-stage setting, the algorithm alternates between solving a master problem with cuts from a growing sample of scenarios and updating the sample to reduce variance in the approximation, ensuring almost sure convergence to the true optimum as the sample size increases. Variants for multistage programs build on this by sequentially sampling across stages, using forward and backward passes to propagate cuts and value functions, which bridges stochastic programming with approximate dynamic programming techniques. These multistage extensions incorporate variance reduction strategies to accelerate convergence in high-dimensional settings with endogenous uncertainties. For instance, methods in the style of stochastic dual dynamic programming generate stagewise cuts based on empirical distributions, allowing parallel subproblem solves while maintaining statistical consistency.

Progressive hedging provides an alternative dual strategy suited for scenario-based multistage programs, enforcing the non-anticipativity constraints—requiring decisions to be scenario-independent up to the information revelation—through penalty terms and aggregation. The algorithm solves scenario subproblems independently in each iteration, then updates Lagrange multipliers to penalize deviations from the average (hedged) policy, converging to a feasible non-anticipative solution under convexity assumptions. This approach is particularly effective for parallel computation, as subproblems can be distributed across processors, and it handles discrete variables via relaxations or bundling. Recent enhancements include adaptive penalty updates to balance feasibility and optimality progress.

As an illustrative example, consider a two-stage newsvendor (inventory) model where the first-stage decision x is the order quantity, and the second-stage recourse accounts for overage and underage costs under uncertain demand ξ. The objective is to minimize E[|ξ − x|] (assuming unit costs for excess and shortage), with 0 ≤ x ≤ 10 and scenarios ξ = 1, 2, 4, each with probability 1/3. The L-shaped method proceeds iteratively, solving the master problem and subproblems to generate optimality and feasibility cuts that tighten the approximation of the recourse function. The process adds cuts based on dual information from scenario subproblems, with upper bounds from subproblem costs and lower bounds from the master, converging after several iterations to the optimal solution at x = 2, where the expected recourse cost equals 1, as sketched in the code below.
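A runnable sketch of the L-shaped iteration on this toy problem (our own compact implementation; the tolerance and iteration limit are arbitrary choices) accumulates optimality cuts θ ≥ Q̄(x̂) + g(x − x̂), where g is a subgradient of the expected recourse at the current x̂:

```python
# L-shaped method on: min E|xi - x|, 0 <= x <= 10, xi in {1, 2, 4} w.p. 1/3.
import numpy as np
from scipy.optimize import linprog

xi = np.array([1.0, 2.0, 4.0])
p = np.array([1/3, 1/3, 1/3])

cuts_A, cuts_b = [], []        # rows g*x - theta <= -a encode theta >= a + g*x
x_hat, tol = 0.0, 1e-9

for it in range(50):
    Q_bar = p @ np.abs(xi - x_hat)              # expected recourse at x_hat
    g = -(p @ np.sign(xi - x_hat))              # subgradient of E|xi - x|
    a = Q_bar - g * x_hat
    cuts_A.append([g, -1.0])
    cuts_b.append(-a)

    # Master: min theta subject to all cuts, 0 <= x <= 10, theta >= 0.
    res = linprog(c=[0.0, 1.0], A_ub=np.array(cuts_A), b_ub=np.array(cuts_b),
                  bounds=[(0.0, 10.0), (0.0, None)])
    x_hat, theta = res.x
    if theta >= p @ np.abs(xi - x_hat) - tol:   # lower bound meets true cost
        break

print(x_hat, theta)   # converges to x = 2 with expected recourse cost 1
```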

Approximation and Sampling Methods

In stochastic programming, approximation methods rely on sampling techniques to estimate expectations involving random variables, enabling the solution of otherwise intractable problems. Monte Carlo sampling serves as a foundational approach, where a set of N independent and identically distributed (i.i.d.) samples ξ^k, k = 1, ..., N, drawn from the underlying distribution, approximates the expectation function E[Q(x, ξ)] by the empirical mean (1/N) Σ_k Q(x, ξ^k). This method is particularly useful for high-dimensional or complex distributions, as it requires minimal assumptions on the probability measure beyond the ability to generate samples.

The sample average approximation (SAA) method builds directly on sampling by formulating and solving a deterministic counterpart using the sample-based estimate as a proxy for the original program. In SAA, the true problem min_x E[Q(x, ξ)] is replaced by min_x (1/N) Σ_k Q(x, ξ^k), which can be solved using standard optimization solvers, with the resulting solution serving as an approximation to the optimal stochastic decision. To mitigate the high variance inherent in crude Monte Carlo estimates, importance sampling reweights the sampling distribution to focus on regions of high contribution to the expectation, reducing the variance of the estimator without introducing bias when properly adjusted. For instance, samples are drawn from a proposal distribution q(ξ) and reweighted by the likelihood ratio p(ξ)/q(ξ), where p is the target distribution, leading to more efficient approximations in problems with rare events or skewed uncertainties.

Beyond basic Monte Carlo, specialized scenario construction methods enhance approximation quality by generating structured samples that better capture distributional properties. Quasi-Monte Carlo (QMC) methods employ low-discrepancy sequences, such as Sobol or Halton points, to produce samples that are more uniformly distributed than pseudorandom ones, yielding faster convergence rates—often O((log N)^d / N) in dimension d—for smooth integrands typical in recourse functions. Latin hypercube sampling (LHS) divides the probability space into strata and samples one point from each, ensuring marginal coverage and reducing clustering, which is effective for scenario generation in multidimensional settings. Moment-matching techniques construct discrete scenarios by optimizing weights and support points to replicate specified statistical moments (e.g., mean, variance, skewness, and correlations) of the original distribution, providing a calibrated approximation suitable for multistage programs. These methods can be used as inputs to decomposition algorithms like Benders decomposition to improve subproblem evaluations.

A representative application of SAA arises in portfolio selection under random asset returns, where the objective is to maximize expected return subject to budget and risk constraints. Samples of return scenarios are generated via Monte Carlo or LHS, and the SAA problem is solved to approximate the optimal portfolio; however, smaller sample sizes introduce bias toward overly optimistic portfolios, while larger sizes reduce variance but increase computational cost, necessitating a trade-off analysis to balance estimation error and solvability.
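A small sketch of LHS-based scenario generation with scipy.stats.qmc follows (the lognormal marginals and their parameters are assumptions chosen for the example): stratified uniform samples are mapped through an inverse CDF to produce scenarios.

```python
# Latin hypercube scenario generation: stratified uniforms in [0,1)^d
# transformed to lognormal marginals via the normal inverse CDF.
import numpy as np
from scipy.stats import norm, qmc

sampler = qmc.LatinHypercube(d=2, seed=5)   # 2 uncertain parameters
u = sampler.random(n=100)                   # 100 stratified points in [0,1)^2

mu, sigma = np.array([4.6, 4.0]), np.array([0.3, 0.2])
scenarios = np.exp(mu + sigma * norm.ppf(u))
print(scenarios.mean(axis=0))               # sample means of the scenarios
```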

Theoretical Foundations

Deterministic Equivalents

In stochastic programming, a deterministic equivalent reformulates the original problem into a large-scale deterministic program that preserves the optimality conditions of the stochastic formulation, allowing it to be solved using standard optimization solvers. This approach is particularly useful for scenario-based models where uncertainty is discretized into a finite set of scenarios, each with an associated probability. The resulting equivalent is often a linear or mixed-integer program, depending on the original problem structure.

For a two-stage stochastic linear program with recourse, the deterministic equivalent takes the form of minimizing the expected total cost over scenarios. Specifically, consider a problem where first-stage decisions x are made before uncertainty is revealed, followed by second-stage recourse decisions y_ω for each scenario ω ∈ Ω. The equivalent is given by

\min_{x, \{y_\omega\}_{\omega \in \Omega}} \quad c^\top x + \sum_{\omega \in \Omega} p_\omega\, q_\omega^\top y_\omega

subject to

A x = b, \quad T_\omega x + W y_\omega = h_\omega \quad \forall \omega \in \Omega, \quad x \geq 0, \quad y_\omega \geq 0 \quad \forall \omega \in \Omega,

where p_ω is the probability of scenario ω, and the variables y_ω are scenario-indexed to reflect recourse adapted to the realized uncertainty. This formulation expands the problem size linearly with the number of scenarios |Ω|, making it tractable for moderate uncertainty but challenging for large scenario sets.

In the multistage case, the deterministic equivalent adopts an extensive form that represents the full tree of evolving uncertainty over T stages. Here, decisions at stage t are replicated across all paths up to that point, with non-anticipativity enforced either through shared variables for decisions made under the same history or via explicit linking constraints that ensure identical decisions for indistinguishable histories. The resulting problem size grows exponentially as O(S^T), where S is the number of realizations per stage, leading to a combinatorial explosion that renders direct solution impractical for more than a few stages without reduction techniques. To mitigate this, scenario aggregation methods bundle similar scenarios into representative nodes, iteratively refining the approximation while preserving key distributional properties, as developed in progressive aggregation algorithms.

A concrete example illustrates the two-stage deterministic equivalent for a simple linear program with three equally probable scenarios. Consider a farmer deciding on planting acres x_1 (wheat), x_2 (corn), and x_3 (sugar beets) on 500 acres to minimize costs while meeting uncertain yield-based feed requirements in the second stage. The first-stage objective includes planting costs: 150 x_1 + 230 x_2 + 260 x_3. For each scenario—good (ω = 1), fair (ω = 2), and bad (ω = 3) yields—the second-stage recourse minimizes the expected costs of purchases (y, to cover feed requirements if production is insufficient) and sales (w, of excess production), such as −170 w_1 − 150 w_2 − 36 w_3 − 10 w_4 + 238 y_1 + 210 y_2 per scenario (where negative coefficients reflect revenues from sales and positive ones reflect purchase costs), subject to yield-dependent constraints like y_1 − w_1 ≥ 200 − 3 x_1 (good), y_1 − w_1 ≥ 200 − 2.5 x_1 (fair), and y_1 − w_1 ≥ 200 − 2 x_1 (bad) for wheat, with analogous forms for corn (requiring 240 tons) and beets (sales constraints without purchases), plus x_1 + x_2 + x_3 ≤ 500 and non-negativity. The full equivalent integrates these into a single linear program with expected second-stage costs weighted by p_ω = 1/3, yielding an optimal expected profit of 108,390 (an objective value of −108,390 in the minimization form) with x_1 = 170, x_2 = 80, x_3 = 250.
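The following sketch assembles this deterministic equivalent for scipy.optimize.linprog (the variable ordering and matrix layout are our own; the data follow the example above, with yield multipliers 1.2/1.0/0.8 and a 6000-ton favorable-price quota for beets):

```python
# Farmer's problem as one large LP: x (3 planting vars) plus, per scenario,
# purchases (yw, yc) and sales (ww, wc, wb1, wb2). Sales enter with negative
# costs, so the optimal value is minus the expected profit.
import numpy as np
from scipy.optimize import linprog

mult = [1.2, 1.0, 0.8]                 # good / fair / bad yield multipliers
base = np.array([2.5, 3.0, 20.0])      # wheat, corn, beets tons per acre
p = 1.0 / 3.0                          # scenario probability

nv = 3 + 6 * 3                         # 3 first-stage + 6 recourse vars/scenario
c = np.zeros(nv)
c[:3] = [150.0, 230.0, 260.0]          # planting costs

A_ub, b_ub = [], []
A_ub.append([1.0, 1.0, 1.0] + [0.0] * 18); b_ub.append(500.0)   # land limit

for s, m in enumerate(mult):
    o = 3 + 6 * s                      # offsets: yw, yc, ww, wc, wb1, wb2
    c[o:o + 6] = p * np.array([238.0, 210.0, -170.0, -150.0, -36.0, -10.0])
    row = np.zeros(nv); row[0] = -m * base[0]; row[o] = -1; row[o + 2] = 1
    A_ub.append(row.tolist()); b_ub.append(-200.0)       # wheat feed >= 200 T
    row = np.zeros(nv); row[1] = -m * base[1]; row[o + 1] = -1; row[o + 3] = 1
    A_ub.append(row.tolist()); b_ub.append(-240.0)       # corn feed >= 240 T
    row = np.zeros(nv); row[2] = -m * base[2]; row[o + 4] = 1; row[o + 5] = 1
    A_ub.append(row.tolist()); b_ub.append(0.0)          # can't sell ungrown beets
    row = np.zeros(nv); row[o + 4] = 1
    A_ub.append(row.tolist()); b_ub.append(6000.0)       # beet quota price tier

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, None)] * nv)
print(res.x[:3], -res.fun)   # acres (wheat, corn, beets) and expected profit
```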

Convergence and Statistical Inference

In stochastic programming, the sample average approximation (SAA) method replaces the expected value in the objective function with an empirical mean based on a sample of size N from the underlying probability distribution. Under suitable regularity conditions, such as the feasible set being compact and the objective function being continuous with finite moments, the SAA optimal value v_N converges almost surely to the true optimal value v* as N → ∞. Similarly, the SAA optimal solution x_N converges almost surely to the true optimal solution set, provided the latter is nonempty and the sample average function converges uniformly on the feasible set. The asymptotic distribution of the SAA optimal value is characterized by a central limit theorem type result: √N (v_N − v*) converges in distribution to inf_{x ∈ S*} Y_x, where S* is the set of optimal solutions of the true problem and Y_x is the limiting normal random variable of √N [ĝ_N(x) − g(x)]; in particular, when the true problem has a unique optimal solution x*, √N (v_N − v*) is asymptotically normal with mean 0 and variance σ²(x*).