Mathematical optimization
from Wikipedia
Graph of a surface given by z = f(x, y) = −(x² + y²) + 4. The global maximum at (x, y, z) = (0, 0, 4) is indicated by a blue dot.
Nelder-Mead minimum search of Simionescu's function. Simplex vertices are ordered by their values, with 1 having the lowest (best) value.

Mathematical optimization (alternatively spelled optimisation) or mathematical programming is the selection of a best element, with regard to some criteria, from some set of available alternatives.[1][2] It is generally divided into two subfields: discrete optimization and continuous optimization. Optimization problems arise in all quantitative disciplines from computer science and engineering[3] to operations research and economics, and the development of solution methods has been of interest in mathematics for centuries.[4]

In the more general approach, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function. The generalization of optimization theory and techniques to other formulations constitutes a large area of applied mathematics.

Optimization problems


Optimization problems can be divided into two categories, depending on whether the variables are continuous or discrete: problems with discrete variables are known as discrete optimization problems, while problems with continuous variables are continuous optimization problems.

An optimization problem can be represented in the following way:

Given: a function f : A → ℝ from some set A to the real numbers
Sought: an element x0 ∈ A such that f(x0) ≤ f(x) for all x ∈ A ("minimization") or such that f(x0) ≥ f(x) for all x ∈ A ("maximization").

Such a formulation is called an optimization problem or a mathematical programming problem (a term not directly related to computer programming, but still in use for example in linear programming – see History below). Many real-world and theoretical problems may be modeled in this general framework.

Since the following is valid:

max_{x ∈ A} f(x) = −min_{x ∈ A} (−f(x)),

it suffices to solve only minimization problems. However, the opposite perspective of considering only maximization problems would be valid, too.

In physics, problems formulated with this technique may be referred to as energy minimization,[5] the value of the function f representing the energy of the system being modeled. In machine learning, the quality of a data model is continually evaluated using a cost function, where a minimum implies a set of possibly optimal parameters with an optimal (lowest) error.

Typically, A is some subset of the Euclidean space ℝⁿ, often specified by a set of constraints, equalities or inequalities that the members of A have to satisfy. The domain A of f is called the search space or the choice set, while the elements of A are called candidate solutions or feasible solutions.

The function f is variously called an objective function, criterion function, loss function, cost function (minimization),[6] utility function or fitness function (maximization), or, in certain fields, an energy function or energy functional. A feasible solution that minimizes (or maximizes) the objective function is called an optimal solution.

In mathematics, conventional optimization problems are usually stated in terms of minimization.

A local minimum x* is defined as an element for which there exists some δ > 0 such that, for all x ∈ A where ‖x − x*‖ ≤ δ,

the expression f(x*) ≤ f(x) holds;

that is to say, on some region around x* all of the function values are greater than or equal to the value at that element. Local maxima are defined similarly.

While a local minimum is at least as good as any nearby elements, a global minimum is at least as good as every feasible element. Generally, unless the objective function is convex in a minimization problem, there may be several local minima. In a convex problem, if there is a local minimum that is interior (not on the edge of the set of feasible elements), it is also the global minimum, but a nonconvex problem may have more than one local minimum not all of which need be global minima.

A large number of algorithms proposed for solving the nonconvex problems – including the majority of commercially available solvers – are not capable of making a distinction between locally optimal solutions and globally optimal solutions, and will treat the former as actual solutions to the original problem. Global optimization is the branch of applied mathematics and numerical analysis that is concerned with the development of deterministic algorithms that are capable of guaranteeing convergence in finite time to the actual optimal solution of a nonconvex problem.
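To make the practical consequence concrete, the following is a minimal multistart sketch; it assumes Python with NumPy and SciPy, neither of which is mentioned above. A local solver started from different points can return different local minima of a nonconvex function, so a common workaround is to keep the best of several runs, which improves the odds of finding the global minimum without guaranteeing it.

```python
# Minimal multistart sketch (illustrative only): local searches from several
# starting points can land in different local minima of a nonconvex function.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] ** 2 - 1) ** 2 + 0.3 * x[0]   # double well with two local minima

results = [minimize(f, x0=np.array([s])) for s in (-2.0, 0.5, 2.0)]
for r in results:
    print(r.x, r.fun)                  # starts land near x ~ -1.04 or x ~ 0.96
best = min(results, key=lambda r: r.fun)
print("best of the runs:", best.x, best.fun)
```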

Notation


Optimization problems are often expressed with special notation. Here are some examples:

Minimum and maximum value of a function


Consider the following notation:

min_{x ∈ ℝ} (x² + 1)

This denotes the minimum value of the objective function x² + 1, when choosing x from the set of real numbers ℝ. The minimum value in this case is 1, occurring at x = 0.

Similarly, the notation

max_{x ∈ ℝ} 2x

asks for the maximum value of the objective function 2x, where x may be any real number. In this case, there is no such maximum as the objective function is unbounded, so the answer is "infinity" or "undefined".
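As a small illustration of the two notations above (assuming Python with SciPy, which the article itself does not reference), a numerical solver recovers the minimum value of x² + 1; the unbounded maximization example has no finite answer for a solver to return.

```python
# Minimal numerical check of the minimization example: min over real x of x**2 + 1.
from scipy.optimize import minimize_scalar

res = minimize_scalar(lambda x: x**2 + 1)   # unconstrained scalar minimization
print(res.x, res.fun)                        # approximately 0.0 and 1.0

# The maximization example max 2x would be handled by minimizing -2x, but that
# objective is unbounded below, so no finite optimum exists to report.
```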

Optimal input arguments


Consider the following notation:

arg min_{x ∈ (−∞,−1]} (x² + 1),

or equivalently

arg min_x (x² + 1), subject to x ∈ (−∞,−1].

This represents the value (or values) of the argument x in the interval (−∞,−1] that minimizes (or minimize) the objective function x² + 1 (the actual minimum value of that function is not what the problem asks for). In this case, the answer is x = −1, since x = 0 is infeasible, that is, it does not belong to the feasible set.

Similarly,

arg max_{x ∈ [−5,5], y ∈ ℝ} x cos y,

or equivalently

arg max_{x, y} x cos y, subject to x ∈ [−5,5],

represents the {x, y} pair (or pairs) that maximizes (or maximize) the value of the objective function x cos y, with the added constraint that x lie in the interval [−5,5] (again, the actual maximum value of the expression does not matter). In this case, the solutions are the pairs of the form {5, 2kπ} and {−5, (2k + 1)π}, where k ranges over all integers.

Operators arg min and arg max are sometimes also written as argmin and argmax, and stand for argument of the minimum and argument of the maximum.
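A small sketch of the arg min example above, assuming SciPy (not part of the original text): the one-sided bound x ≤ −1 is expressed directly to the solver, and the reported minimizer, not the minimum value, is the quantity arg min asks for.

```python
# Sketch: arg min of x**2 + 1 over the interval (-inf, -1].
import numpy as np
from scipy.optimize import minimize

res = minimize(lambda x: x[0] ** 2 + 1, x0=np.array([-2.0]),
               method="L-BFGS-B", bounds=[(None, -1.0)])  # None = unbounded below
print(res.x)    # approximately [-1.0]: the constrained minimizer (the "argument")
print(res.fun)  # 2.0: the minimum value, which arg min does not ask for
```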

History


Fermat and Lagrange found calculus-based formulae for identifying optima, while Newton and Gauss proposed iterative methods for moving towards an optimum.

The term "linear programming" for certain optimization cases was due to George B. Dantzig, although much of the theory had been introduced by Leonid Kantorovich in 1939. (Programming in this context does not refer to computer programming, but comes from the use of program by the United States military to refer to proposed training and logistics schedules, which were the problems Dantzig studied at that time.) Dantzig published the Simplex algorithm in 1947, and also John von Neumann and other researchers worked on the theoretical aspects of linear programming (like the theory of duality) around the same time.[7]

Other notable researchers in mathematical optimization include the following:

Major subfields

  • Convex programming studies the case when the objective function is convex (minimization) or concave (maximization) and the constraint set is convex. This can be viewed as a particular case of nonlinear programming or as generalization of linear or convex quadratic programming.
    • Linear programming (LP), a type of convex programming, studies the case in which the objective function f is linear and the constraints are specified using only linear equalities and inequalities. Such a constraint set is called a polyhedron or a polytope if it is bounded.
    • Second-order cone programming (SOCP) is a convex program, and includes certain types of quadratic programs.
    • Semidefinite programming (SDP) is a subfield of convex optimization where the underlying variables are semidefinite matrices. It is a generalization of linear and convex quadratic programming.
    • Conic programming is a general form of convex programming. LP, SOCP and SDP can all be viewed as conic programs with the appropriate type of cone.
    • Geometric programming is a technique whereby objective and inequality constraints expressed as posynomials and equality constraints as monomials can be transformed into a convex program.
  • Integer programming studies linear programs in which some or all variables are constrained to take on integer values. This is not convex, and in general much more difficult than regular linear programming.
  • Quadratic programming allows the objective function to have quadratic terms, while the feasible set must be specified with linear equalities and inequalities. For specific forms of the quadratic term, this is a type of convex programming.
  • Fractional programming studies optimization of ratios of two nonlinear functions. The special class of concave fractional programs can be transformed to a convex optimization problem.
  • Nonlinear programming studies the general case in which the objective function or the constraints or both contain nonlinear parts. This may or may not be a convex program. In general, whether the program is convex affects the difficulty of solving it.
  • Stochastic programming studies the case in which some of the constraints or parameters depend on random variables.
  • Robust optimization is, like stochastic programming, an attempt to capture uncertainty in the data underlying the optimization problem. Robust optimization aims to find solutions that are valid under all possible realizations of the uncertainties defined by an uncertainty set.
  • Combinatorial optimization is concerned with problems where the set of feasible solutions is discrete or can be reduced to a discrete one.
  • Stochastic optimization is used with random (noisy) function measurements or random inputs in the search process.
  • Infinite-dimensional optimization studies the case when the set of feasible solutions is a subset of an infinite-dimensional space, such as a space of functions.
  • Heuristics and metaheuristics make few or no assumptions about the problem being optimized. Usually, heuristics do not guarantee that an optimal solution will be found. On the other hand, heuristics are used to find approximate solutions for many complicated optimization problems.
  • Constraint satisfaction studies the case in which the objective function f is constant (this is used in artificial intelligence, particularly in automated reasoning).
    • Constraint programming is a programming paradigm wherein relations between variables are stated in the form of constraints.
  • Disjunctive programming is used where at least one constraint must be satisfied but not all. It is of particular use in scheduling.
  • Space mapping is a concept for modeling and optimization of an engineering system to high-fidelity (fine) model accuracy exploiting a suitable physically meaningful coarse or surrogate model.

In a number of subfields, the techniques are designed primarily for optimization in dynamic contexts (that is, decision making over time):

Multi-objective optimization


Adding more than one objective to an optimization problem adds complexity. For example, to optimize a structural design, one would desire a design that is both light and rigid. When two objectives conflict, a trade-off must be created. There may be one lightest design, one stiffest design, and an infinite number of designs that are some compromise of weight and rigidity. The set of trade-off designs that improve upon one criterion at the expense of another is known as the Pareto set. The curve created plotting weight against stiffness of the best designs is known as the Pareto frontier.

A design is judged to be "Pareto optimal" (equivalently, "Pareto efficient" or in the Pareto set) if it is not dominated by any other design: If it is worse than another design in some respects and no better in any respect, then it is dominated and is not Pareto optimal.
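The dominance test just described can be written down directly. The sketch below (plain Python; the design data are made-up numbers, not from the article) filters a finite list of candidate designs, scored by two objectives to be minimized, down to its Pareto set.

```python
# Minimal Pareto-set filter for a finite list of candidate designs.
def dominates(a, b):
    """a dominates b if a is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# (weight, compliance) pairs, both to be minimized; illustrative values only.
designs = [(2.0, 5.0), (3.0, 3.0), (4.0, 2.5), (3.5, 4.0), (5.0, 2.4)]
pareto = [d for d in designs if not any(dominates(other, d) for other in designs)]
print(pareto)   # (3.5, 4.0) is dominated by (3.0, 3.0); the rest form the Pareto set
```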

The choice among "Pareto optimal" solutions to determine the "favorite solution" is delegated to the decision maker. In other words, defining the problem as multi-objective optimization signals that some information is missing: desirable objectives are given but combinations of them are not rated relative to each other. In some cases, the missing information can be derived by interactive sessions with the decision maker.

Multi-objective optimization problems have been generalized further into vector optimization problems where the (partial) ordering is no longer given by the Pareto ordering.

Multi-modal or global optimization


Optimization problems are often multi-modal; that is, they possess multiple good solutions. They could all be globally good (same cost function value) or there could be a mix of globally good and locally good solutions. Obtaining all (or at least some of) the multiple solutions is the goal of a multi-modal optimizer.

Because of their iterative approach, classical optimization techniques do not perform satisfactorily when used to obtain multiple solutions, since different starting points in multiple runs of the algorithm are not guaranteed to yield different solutions.

Common approaches to global optimization problems, where multiple local extrema may be present, include evolutionary algorithms, Bayesian optimization and simulated annealing.

Classification of critical points and extrema


Feasibility problem


The satisfiability problem, also called the feasibility problem, is just the problem of finding any feasible solution at all without regard to objective value. This can be regarded as the special case of mathematical optimization where the objective value is the same for every solution, and thus any solution is optimal.

Many optimization algorithms need to start from a feasible point. One way to obtain such a point is to relax the feasibility conditions using a slack variable; with enough slack, any starting point is feasible. Then, minimize that slack variable until the slack is null or negative.
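A minimal sketch of this slack-variable device, assuming SciPy's SLSQP solver (an assumption of the example, not something the article prescribes): to find a point satisfying g_i(x) ≤ 0, minimize an extra variable s subject to the relaxed constraints g_i(x) ≤ s; any starting point is feasible if s starts large, and a final slack at or below zero certifies feasibility of the original constraints.

```python
# Find a feasible point for g1(x) <= 0 and g2(x) <= 0 by minimizing a slack s.
import numpy as np
from scipy.optimize import minimize

g1 = lambda x: x[0] ** 2 + x[1] ** 2 - 4.0   # inside a circle of radius 2
g2 = lambda x: 1.0 - x[0]                    # x[0] >= 1

# Decision vector z = (x0, x1, s); the objective is the slack alone.
cons = [  # SLSQP expects fun(z) >= 0 for inequality constraints
    {"type": "ineq", "fun": lambda z: z[2] - g1(z[:2])},   # g1(x) <= s
    {"type": "ineq", "fun": lambda z: z[2] - g2(z[:2])},   # g2(x) <= s
]
res = minimize(lambda z: z[2], x0=np.array([0.0, 0.0, 10.0]),
               method="SLSQP", constraints=cons)
x_feasible, slack = res.x[:2], res.x[2]
print(x_feasible, slack)   # slack <= 0 means x_feasible satisfies the original constraints
```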

Existence


The extreme value theorem of Karl Weierstrass states that a continuous real-valued function on a compact set attains its maximum and minimum value. More generally, a lower semi-continuous function on a compact set attains its minimum; an upper semi-continuous function on a compact set attains its maximum.

Necessary conditions for optimality


One of Fermat's theorems states that optima of unconstrained problems are found at stationary points, where the first derivative or the gradient of the objective function is zero (see first derivative test). More generally, they may be found at critical points, where the first derivative or gradient of the objective function is zero or is undefined, or on the boundary of the choice set. An equation (or set of equations) stating that the first derivative(s) equal(s) zero at an interior optimum is called a 'first-order condition' or a set of first-order conditions.

Optima of equality-constrained problems can be found by the Lagrange multiplier method. The optima of problems with equality and/or inequality constraints can be found using the 'Karush–Kuhn–Tucker conditions'.
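As a small symbolic sketch of the Lagrange multiplier method (assuming SymPy, which the article does not mention), the stationarity conditions of the Lagrangian are solved directly for a toy equality-constrained problem.

```python
# Lagrange multipliers, symbolically: minimize x**2 + y**2 subject to x + y = 1.
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)
f = x**2 + y**2
g = x + y - 1                      # equality constraint g(x, y) = 0
L = f + lam * g                    # Lagrangian

stationarity = [sp.diff(L, v) for v in (x, y, lam)]   # grad L = 0 (includes g = 0)
print(sp.solve(stationarity, [x, y, lam], dict=True))  # x = y = 1/2, lambda = -1
```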

Sufficient conditions for optimality


While the first derivative test identifies points that might be extrema, this test does not distinguish a point that is a minimum from one that is a maximum or one that is neither. When the objective function is twice differentiable, these cases can be distinguished by checking the second derivative or the matrix of second derivatives (called the Hessian matrix) in unconstrained problems, or the matrix of second derivatives of the objective function and the constraints called the bordered Hessian in constrained problems. The conditions that distinguish maxima, or minima, from other stationary points are called 'second-order conditions' (see 'Second derivative test'). If a candidate solution satisfies the first-order conditions, then the satisfaction of the second-order conditions as well is sufficient to establish at least local optimality.

Sensitivity and continuity of optima


The envelope theorem describes how the value of an optimal solution changes when an underlying parameter changes. The process of computing this change is called comparative statics.

The maximum theorem of Claude Berge (1963) describes the continuity of an optimal solution as a function of underlying parameters.

Calculus of optimization


For unconstrained problems with twice-differentiable functions, some critical points can be found by finding the points where the gradient of the objective function is zero (that is, the stationary points). More generally, a zero subgradient certifies that a local minimum has been found for minimization problems with convex functions and other locally Lipschitz functions, which arise, for example, in the loss-function minimization of neural networks. Positive-negative momentum estimation can help avoid local minima and converge to the global minimum of the objective function.[8]

Further, critical points can be classified using the definiteness of the Hessian matrix: If the Hessian is positive definite at a critical point, then the point is a local minimum; if the Hessian matrix is negative definite, then the point is a local maximum; finally, if indefinite, then the point is some kind of saddle point.
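This classification can be checked numerically from the Hessian's eigenvalues, as in the short sketch below (assuming NumPy; the function used is a standard saddle example, not one taken from the article).

```python
# Classify a critical point by the signs of the Hessian's eigenvalues.
import numpy as np

def classify(hessian, tol=1e-9):
    eig = np.linalg.eigvalsh(hessian)             # eigenvalues of a symmetric matrix
    if np.all(eig > tol):
        return "local minimum (positive definite)"
    if np.all(eig < -tol):
        return "local maximum (negative definite)"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point (indefinite)"
    return "inconclusive (semi-definite)"

# f(x, y) = x**2 - y**2 has a critical point at the origin with Hessian diag(2, -2).
print(classify(np.diag([2.0, -2.0])))             # saddle point (indefinite)
```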

Constrained problems can often be transformed into unconstrained problems with the help of Lagrange multipliers. Lagrangian relaxation can also provide approximate solutions to difficult constrained problems.

When the objective function is a convex function, then any local minimum will also be a global minimum. There exist efficient numerical techniques for minimizing convex functions, such as interior-point methods.

Global convergence


More generally, if the objective function is not a quadratic function, then many optimization methods use other methods to ensure that some subsequence of iterations converges to an optimal solution. The first and still popular method for ensuring convergence relies on line searches, which optimize a function along one dimension. A second and increasingly popular method for ensuring convergence uses trust regions. Both line searches and trust regions are used in modern methods of non-differentiable optimization. Usually, a global optimizer is much slower than advanced local optimizers (such as BFGS), so often an efficient global optimizer can be constructed by starting the local optimizer from different starting points.
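To illustrate the line-search idea in isolation, here is a minimal gradient-descent loop with Armijo backtracking (assuming NumPy; a didactic sketch, not a production method and not the specific line searches used by solvers such as BFGS).

```python
# Gradient descent with a backtracking (Armijo) line search along the descent direction.
import numpy as np

def descent_with_line_search(f, grad, x0, iters=200, alpha0=1.0, c=1e-4, shrink=0.5):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        step = alpha0
        # Backtrack until the sufficient-decrease (Armijo) condition holds.
        while f(x - step * g) > f(x) - c * step * g.dot(g):
            step *= shrink
        x = x - step * g
    return x

f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])
print(descent_with_line_search(f, grad, [0.0, 0.0]))   # approaches (1, -2)
```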

Computational optimization techniques


To solve problems, researchers may use algorithms that terminate in a finite number of steps, or iterative methods that converge to a solution (on some specified class of problems), or heuristics that may provide approximate solutions to some problems (although their iterates need not converge).

Optimization algorithms


Iterative methods


The iterative methods used to solve problems of nonlinear programming differ according to whether they evaluate Hessians, gradients, or only function values. While evaluating Hessians (H) and gradients (G) improves the rate of convergence, for functions for which these quantities exist and vary sufficiently smoothly, such evaluations increase the computational complexity (or computational cost) of each iteration. In some cases, the computational complexity may be excessively high.

One major criterion for optimizers is the number of required function evaluations, as this is often already a large computational effort, usually much larger than the effort within the optimizer itself, which mainly has to operate over the N variables. Derivatives provide detailed information for such optimizers but are even harder to calculate; for example, approximating the gradient takes at least N+1 function evaluations. For approximations of the second derivatives (collected in the Hessian matrix), the number of function evaluations is on the order of N². Newton's method requires the second-order derivatives, so for each iteration the number of function calls is on the order of N², while for a simpler pure gradient optimizer it is only N. However, gradient optimizers usually need more iterations than Newton's algorithm. Which one is best with respect to the number of function calls depends on the problem itself.
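The evaluation counts quoted above are easy to see in code: a forward-difference approximation of the gradient of an N-variable function costs N + 1 evaluations of f, one at the base point and one per coordinate (a sketch assuming NumPy; the counting, not the snippet, is what the paragraph asserts).

```python
# Forward-difference gradient: N + 1 function evaluations for N variables.
import numpy as np

def fd_gradient(f, x, h=1e-6):
    x = np.asarray(x, dtype=float)
    fx = f(x)                                # 1 evaluation at the base point
    grad = np.empty_like(x)
    for i in range(x.size):                  # N further evaluations, one per coordinate
        xp = x.copy()
        xp[i] += h
        grad[i] = (f(xp) - fx) / h
    return grad

f = lambda x: np.sum(x ** 2)
print(fd_gradient(f, np.array([1.0, -2.0, 3.0])))   # approximately [2, -4, 6]
```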

  • Methods that evaluate Hessians (or approximate Hessians, using finite differences):
    • Newton's method
    • Sequential quadratic programming: A Newton-based method for small-medium scale constrained problems. Some versions can handle large-dimensional problems.
    • Interior point methods: This is a large class of methods for constrained optimization, some of which use only (sub)gradient information and others of which require the evaluation of Hessians.
  • Methods that evaluate gradients, or approximate gradients in some way (or even subgradients):
    • Coordinate descent methods: Algorithms which update a single coordinate in each iteration
    • Conjugate gradient methods: Iterative methods for large problems. (In theory, these methods terminate in a finite number of steps with quadratic objective functions, but this finite termination is not observed in practice on finite–precision computers.)
    • Gradient descent (alternatively, "steepest descent" or "steepest ascent"): A (slow) method of historical and theoretical interest, which has had renewed interest for finding approximate solutions of enormous problems.
    • Subgradient methods: An iterative method for large locally Lipschitz functions using generalized gradients. Following Boris T. Polyak, subgradient–projection methods are similar to conjugate–gradient methods.
    • Bundle method of descent: An iterative method for small–medium-sized problems with locally Lipschitz functions, particularly for convex minimization problems (similar to conjugate gradient methods).
    • Ellipsoid method: An iterative method for small problems with quasiconvex objective functions and of great theoretical interest, particularly in establishing the polynomial time complexity of some combinatorial optimization problems. It has similarities with Quasi-Newton methods.
    • Conditional gradient method (Frank–Wolfe) for approximate minimization of specially structured problems with linear constraints, especially with traffic networks. For general unconstrained problems, this method reduces to the gradient method, which is regarded as obsolete (for almost all problems).
    • Quasi-Newton methods: Iterative methods for medium-large problems (e.g. N<1000).
    • Simultaneous perturbation stochastic approximation (SPSA) method for stochastic optimization; uses random (efficient) gradient approximation.
  • Methods that evaluate only function values: If a problem is continuously differentiable, then gradients can be approximated using finite differences, in which case a gradient-based method can be used.

Heuristics


Besides (finitely terminating) algorithms and (convergent) iterative methods, there are heuristics. A heuristic is any algorithm which is not guaranteed (mathematically) to find the solution, but which is nevertheless useful in certain practical situations. List of some well-known heuristics:

Applications


Mechanics


Problems in rigid body dynamics (in particular articulated rigid body dynamics) often require mathematical programming techniques, since one can view rigid body dynamics as attempting to solve an ordinary differential equation on a constraint manifold;[9] the constraints are various nonlinear geometric constraints such as "these two points must always coincide", "this surface must not penetrate any other", or "this point must always lie somewhere on this curve". Also, contact forces can be computed by solving a linear complementarity problem, which can also be viewed as a QP (quadratic programming) problem.

Many design problems can also be expressed as optimization programs. This application is called design optimization. One subset is engineering optimization, and another recent and growing subset of this field is multidisciplinary design optimization, which, while useful in many problems, has in particular been applied to aerospace engineering problems.

This approach may be applied in cosmology and astrophysics.[10]

Economics and finance


Economics is closely enough linked to optimization of agents that an influential definition relatedly describes economics qua science as the "study of human behavior as a relationship between ends and scarce means" with alternative uses.[11] Modern optimization theory includes traditional optimization theory but also overlaps with game theory and the study of economic equilibria. The Journal of Economic Literature codes classify mathematical programming, optimization techniques, and related topics under JEL:C61-C63.

In microeconomics, the utility maximization problem and its dual problem, the expenditure minimization problem, are economic optimization problems. Insofar as they behave consistently, consumers are assumed to maximize their utility, while firms are usually assumed to maximize their profit. Also, agents are often modeled as being risk-averse, thereby preferring to avoid risk. Asset prices are also modeled using optimization theory, though the underlying mathematics relies on optimizing stochastic processes rather than on static optimization. International trade theory also uses optimization to explain trade patterns between nations. The optimization of portfolios is an example of multi-objective optimization in economics.

Since the 1970s, economists have modeled dynamic decisions over time using control theory.[12] For example, dynamic search models are used to study labor-market behavior.[13] A crucial distinction is between deterministic and stochastic models.[14] Macroeconomists build dynamic stochastic general equilibrium (DSGE) models that describe the dynamics of the whole economy as the result of the interdependent optimizing decisions of workers, consumers, investors, and governments.[15][16]

Electrical engineering


Some common applications of optimization techniques in electrical engineering include active filter design,[17] stray field reduction in superconducting magnetic energy storage systems, space mapping design of microwave structures,[18] handset antennas,[19][20][21] and electromagnetics-based design. Electromagnetically validated design optimization of microwave components and antennas has made extensive use of an appropriate physics-based or empirical surrogate model and space mapping methodologies since the discovery of space mapping in 1993.[22][23] Optimization techniques are also used in power-flow analysis.[24]

Civil engineering


Optimization has been widely used in civil engineering. Construction management and transportation engineering are among the main branches of civil engineering that heavily rely on optimization. The most common civil engineering problems that are solved by optimization are cut and fill of roads, life-cycle analysis of structures and infrastructures,[25] resource leveling,[26][27] water resource allocation, traffic management[28] and schedule optimization.

Operations research


Another field that uses optimization techniques extensively is operations research.[29] Operations research also uses stochastic modeling and simulation to support improved decision-making. Increasingly, operations research uses stochastic programming to model dynamic decisions that adapt to events; such problems can be solved with large-scale optimization and stochastic optimization methods.

Control engineering


Mathematical optimization is used in much modern controller design. High-level controllers such as model predictive control (MPC) or real-time optimization (RTO) employ mathematical optimization. These algorithms run online and repeatedly determine values for decision variables, such as choke openings in a process plant, by iteratively solving a mathematical optimization problem including constraints and a model of the system to be controlled.

Geophysics


Optimization techniques are regularly used in geophysical parameter estimation problems. Given a set of geophysical measurements, e.g. seismic recordings, it is common to solve for the physical properties and geometrical shapes of the underlying rocks and fluids. The majority of problems in geophysics are nonlinear with both deterministic and stochastic methods being widely used.

Molecular modeling


Nonlinear optimization methods are widely used in conformational analysis.

Computational systems biology


Optimization techniques are used in many facets of computational systems biology such as model building, optimal experimental design, metabolic engineering, and synthetic biology.[30] Linear programming has been applied to calculate the maximal possible yields of fermentation products,[30] and to infer gene regulatory networks from multiple microarray datasets[31] as well as transcriptional regulatory networks from high-throughput data.[32] Nonlinear programming has been used to analyze energy metabolism[33] and has been applied to metabolic engineering and parameter estimation in biochemical pathways.[34]

from Grokipedia
Mathematical optimization, also known as mathematical programming, is a subfield of applied mathematics dedicated to selecting the best element from a set of available alternatives by systematically finding values of decision variables that minimize or maximize a real-valued objective function while satisfying a set of constraints. These problems can be formulated as $\min f_0(x)$ subject to inequality constraints $f_i(x) \leq b_i$ and equality constraints, where $x \in \mathbb{R}^n$ represents the variables, $f_0$ is the objective function, and the $f_i$ define the constraints. The field encompasses diverse problem classes, including linear programming (where the objective and constraints are linear), nonlinear programming (with nonlinear functions), integer programming (requiring integer variables), and convex optimization (where the objective and feasible set are convex, enabling efficient global solutions). Historical development traces back to the 1700s with theories for unconstrained optimization by Fermat, Newton, and Euler, followed by Lagrange's 1797 work on equality-constrained problems; modern milestones include Dantzig's 1947 simplex method for linear programming and Karmarkar's 1984 interior-point algorithms, which spurred polynomial-time solutions for various classes. Applications of mathematical optimization span numerous domains, including engineering design (e.g., minimizing material weight in structures), economics (e.g., equilibrium models), finance (e.g., portfolio selection via Markowitz's mean-variance framework), transportation (e.g., minimizing shipping costs), machine learning (e.g., training support vector machines), and operations research (e.g., scheduling). These techniques leverage general-purpose algorithms for unconstrained problems and specialized solvers for constrained ones, often implemented in software such as JuMP for modeling complex scenarios.
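The general formulation above can be handed to a general-purpose nonlinear solver more or less verbatim. The sketch below is a minimal illustration, assuming SciPy's SLSQP method (the article names JuMP as example software; SciPy is substituted here purely for brevity).

```python
# Minimize f0(x) subject to f1(x) <= b1 with a general-purpose NLP solver.
import numpy as np
from scipy.optimize import minimize

f0 = lambda x: (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2   # objective
f1 = lambda x: x[0] + x[1]                              # constraint function
b1 = 2.0

res = minimize(f0, x0=np.zeros(2), method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda x: b1 - f1(x)}])  # f1(x) <= b1
print(res.x, res.fun)   # roughly x = (1.5, 0.5) with objective value 0.5
```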

Fundamental Concepts

Optimization Problems

Mathematical optimization seeks to determine the values of decision variables that either minimize or maximize a given objective function while satisfying a set of constraints. The general formulation of such a problem is to minimize (or equivalently, maximize) an objective function $f: \mathbb{R}^n \to \mathbb{R}$ over decision variables $x \in \mathbb{R}^n$, subject to inequality constraints $g_i(x) \leq 0$ for $i = 1, \dots, m$ and equality constraints $h_j(x) = 0$ for $j = 1, \dots, p$. Here, the decision variables $x$ represent the unknowns to be determined, and the constraints define the allowable values for $x$. Optimization problems are classified based on the nature of the decision variables. In continuous optimization, the variables $x$ take real values, allowing for solutions in $\mathbb{R}^n$. Discrete optimization, in contrast, requires some or all variables to belong to a discrete set, such as the integers, leading to a finite (though potentially large) number of possible solutions. Mixed-integer problems combine both continuous and discrete variables, blending the characteristics of the two categories. Standard forms of optimization problems vary by the types of constraints imposed. An unconstrained problem simply minimizes $f(x)$ without any restrictions on $x$, such as $\min_x x^2$, where the objective is a simple quadratic. Equality-constrained problems include only equality constraints, like $\min_x f(x)$ subject to $h(x) = 0$. Inequality-constrained problems incorporate inequality constraints, for example, $\min_x x$ subject to $x \geq 1$, which limits the decision variable to values of at least 1. These forms provide a structured way to express diverse applications. The feasible set, also known as the feasible region, consists of all decision variables $x$ that satisfy the given constraints, forming the domain over which the objective function is evaluated. In continuous problems, this set is often a convex set or other geometric region in $\mathbb{R}^n$; in discrete cases, it is a finite collection of points. The optimal solution, if it exists, lies within this feasible set and may be local or global, depending on the problem's structure.

Notation and Terminology

Mathematical optimization problems are typically formulated in a standard minimization form, given by
\begin{align*}
\min_{x \in X} \quad & f(x) \\
\text{subject to} \quad & g_i(x) \leq 0, \quad i = 1, \dots, m, \\
& h_j(x) = 0, \quad j = 1, \dots, p,
\end{align*}
where $x \in \mathbb{R}^n$ represents the decision variables, $f: \mathbb{R}^n \to \mathbb{R}$ is the objective function to be minimized, $g_i: \mathbb{R}^n \to \mathbb{R}$ are inequality constraint functions, and $h_j: \mathbb{R}^n \to \mathbb{R}$ are equality constraint functions. The set $X \subseteq \mathbb{R}^n$ denotes the domain, often $\mathbb{R}^n$ itself. Maximization problems are equivalently handled by minimizing the negative of the objective, i.e., $\min_x -f(x)$ for $\max_x f(x)$. The feasible set, denoted $\Omega = \{ x \in X \mid g_i(x) \leq 0, \, i=1,\dots,m; \; h_j(x) = 0, \, j=1,\dots,p \}$, consists of all points satisfying the constraints. Standard notation for optimal points and values includes $\arg\min_x f(x)$, the set of points $x^*$ achieving the minimum; $\inf f(x)$, the greatest lower bound of $f$ over the domain (equal to the minimum if attained); and $\sup f(x)$, the least upper bound (used analogously for maximization). If $\inf f(x) = -\infty$ over $\Omega$, the problem is unbounded below; otherwise, it is bounded. Key terms include convexity: a function $f$ is convex if $f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y)$ for $\lambda \in [0,1]$, ensuring any local minimum is global in convex problems; concave functions satisfy the reverse inequality and are relevant for maximization objectives. For constrained convex problems, Slater's condition provides a constraint qualification ensuring strong duality: there exists a strictly feasible point $\tilde{x} \in \Omega$ such that $g_i(\tilde{x}) < 0$ for all $i$ with convex $g_i$, and the equality constraints hold.

Local and Global Optima

In mathematical optimization, a global minimum of an objective function $f: \Omega \to \mathbb{R}$ over a feasible set $\Omega$ is a point $x^* \in \Omega$ such that $f(x^*) \leq f(x)$ for all $x \in \Omega$. Similarly, a global maximum satisfies $f(x^*) \geq f(x)$ for all $x \in \Omega$. These represent the overall best values of $f$ across the entire domain for minimization and maximization, respectively. In contrast, a local minimum is a point $x^* \in \Omega$ for which there exists a neighborhood $N$ around $x^*$ such that $f(x^*) \leq f(x)$ for all $x \in N \cap \Omega$. Local maxima are defined analogously. Local optima may not coincide with global ones, particularly in nonconvex problems where multiple basins of attraction exist. Optima are further classified as strict or non-strict. A strict local minimum satisfies $f(x^*) < f(x)$ for all $x \in N \cap \Omega \setminus \{x^*\}$, implying uniqueness within the neighborhood. Non-strict optima allow equality with nearby points, such as along a flat region. Saddle points are critical points that are neither local minima nor maxima, where the function increases in some directions and decreases in others. Critical points form the foundation for identifying potential optima. In unconstrained optimization over an open set, a critical point $x^*$ satisfies $\nabla f(x^*) = 0$, where $\nabla f$ is the gradient. For constrained problems, critical points are those satisfying the Karush-Kuhn-Tucker (KKT) conditions, which generalize stationarity to include Lagrange multipliers for equality and inequality constraints. These conditions, introduced by Kuhn and Tucker, require primal feasibility, dual feasibility, complementary slackness, and stationarity of the Lagrangian. Consider the quadratic function $f(x) = x^2$ over $\mathbb{R}$. Here, $x^* = 0$ is both a strict local and global minimum, with $f(0) = 0$ and $f(x) > 0$ for $x \neq 0$. For a multimodal example, $f(x) = \sin(x) + \frac{x^2}{10}$ over $\mathbb{R}$ has multiple local minima near points where $\cos(x) + \frac{x}{5} = 0$, but the global minimum occurs at the leftmost such point, approximately $x \approx -1.3$, due to the dominating quadratic term. Optima may be attainable or unattainable. An attainable optimum is achieved at some $x^* \in \Omega$, so the minimum value equals the infimum $\inf_{x \in \Omega} f(x)$. Unattainable optima occur when the infimum is not achieved, such as for $f(x) = e^x$ over $\mathbb{R}$, where $\inf f(x) = 0$ but no $x$ realizes it. This distinction arises in open or unbounded feasible sets, impacting convergence and problem well-posedness.
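The multimodal example can be checked numerically. In the sketch below (assuming SciPy), a local solver started from two different points converges to the two different local minima of sin(x) + x²/10, only one of which is global.

```python
# Local searches from different starts find different local minima of sin(x) + x**2/10.
import numpy as np
from scipy.optimize import minimize

f = lambda x: np.sin(x[0]) + x[0] ** 2 / 10.0

for start in (-2.0, 4.0):
    res = minimize(f, x0=np.array([start]))
    print(start, res.x, res.fun)
# A start near -2 reaches the global minimum around x ~ -1.3 (f ~ -0.79);
# a start near 4 reaches a higher local minimum around x ~ 3.8.
```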

Historical Development

Early History

The origins of mathematical optimization trace back to ancient civilizations, where geometric and practical problems implicitly involved finding extrema or efficient allocations. In ancient Greece, Euclid's Elements (circa 300 BCE) addressed early forms of optimization through geometric constructions that maximized or minimized quantities. For instance, Book III, Proposition 15, demonstrates that among straight lines in a circle, the diameter is the longest, and chords nearer to the center are greater than those farther away, establishing a foundational result on maximizing line segments within a bounded region. Similarly, in ancient China, The Nine Chapters on the Mathematical Art (circa 200 BCE) included problems on resource allocation, such as fair division of land and labor in Chapter 6 ("Fair Taxes") and solving systems of equations in Chapter 7 ("Excess and Deficit") to optimize distributions under constraints. In the 17th century, optimization ideas emerged in the context of physics and mathematics. Pierre de Fermat's principle of least time, proposed in 1662, stated that light travels between two points along the path that minimizes travel time, unifying laws of reflection and refraction under a variational framework and foreshadowing modern optimization principles. Concurrently, Isaac Newton developed his method for root-finding in the 1670s, an iterative technique to approximate solutions to equations, which later found applications in optimization by solving for critical points where derivatives vanish. The 18th century marked a pivotal advancement with the formalization of the calculus of variations by Leonhard Euler and Joseph-Louis Lagrange. Euler's 1744 treatise Methodus inveniendi lineas curvas maximi minimive proprietate gaudentes introduced systematic methods to find curves extremizing functionals, including solutions to isoperimetric problems that maximize area for a fixed perimeter. Lagrange extended this in his 1788 Mécanique Analytique, developing the Euler-Lagrange equations for variational problems and applying them to mechanics. By the early 19th century, precursors to linear programming appeared in Joseph Fourier's 1823 work on solving systems of linear inequalities, providing an elimination method to determine feasible regions and optimize linear objectives over polyhedra. Additionally, the method of least squares, independently developed by Adrien-Marie Legendre in 1805 and Carl Friedrich Gauss (who claimed prior invention around 1795), minimized the sum of squared residuals for data fitting in astronomy, establishing a cornerstone of regression and parameter estimation in optimization.

Modern Foundations

The modern foundations of mathematical optimization were laid in the 1930s and 1940s through pioneering work on linear programming for resource-allocation problems. Leonid Kantorovich developed the foundational ideas of linear programming in his 1939 monograph, formulating methods to optimize production under linear constraints, which anticipated key concepts like shadow prices. Independently, Tjalling Koopmans advanced similar ideas around 1942 in a memorandum on activity analysis, later elaborated in his 1951 edited volume, applying linear models to economic allocation during wartime logistics efforts. These contributions established linear programming as a tool for efficient resource use, earning Kantorovich and Koopmans the 1975 Nobel Memorial Prize in Economic Sciences. George Dantzig then introduced the simplex method in 1947, providing an efficient algorithmic framework for solving linear programs by traversing vertices of the feasible polyhedron, which became the cornerstone of practical implementation. Following World War II, the field experienced a surge through the operations research (OR) movement, driven by military applications in logistics and planning. This era saw the integration of optimization with emerging computational tools, fostering interdisciplinary growth. John von Neumann's 1928 minimax theorem, which guarantees the existence of optimal mixed strategies in zero-sum games, found direct applications to economic and strategic decision-making, notably in his 1944 collaboration on Theory of Games and Economic Behavior, linking adversarial optimization to broader economic models. The 1950s marked the emergence of nonlinear programming, extending optimization to non-linear objectives and constraints. The Karush-Kuhn-Tucker (KKT) conditions, first derived by William Karush in his 1939 master's thesis and independently published by Harold Kuhn and Albert Tucker in 1951, provide necessary optimality criteria for constrained nonlinear problems under constraint qualifications, generalizing Lagrange multipliers to inequalities. This period also saw Richard Bellman's introduction of dynamic programming in 1957, a recursive approach for solving multistage decision problems by breaking them into subproblems, revolutionizing sequential optimization in control and operations research. A pivotal event was the inaugural International Symposium on Mathematical Programming in 1949, which unified researchers and solidified the field as a distinct discipline. Further advancements included Anthony Fiacco and Garth McCormick's 1968 development of sequential unconstrained minimization techniques (SUMT), which transform constrained nonlinear programs into sequences of unconstrained problems via penalty functions, laying groundwork for modern interior-point and penalty methods.

Contemporary Advances

In the 1980s and 1990s, interior-point methods marked a pivotal advancement in linear programming, with Narendra Karmarkar's 1984 algorithm introducing a polynomial-time approach that efficiently navigated the interior of the feasible region, surpassing the limitations of the simplex method for large-scale problems. This innovation spurred widespread adoption, enabling solutions to industrial-scale linear programs with improved scalability and theoretical guarantees. By the 1990s, extensions of these methods covered convex quadratic and semidefinite programming, facilitating applications in areas such as control and engineering design. The 2010s witnessed the surge of stochastic gradient techniques in machine learning, driven by the need to handle massive datasets in deep learning. Stochastic gradient descent variants, such as the Adam optimizer introduced in 2014, combined adaptive learning rates with momentum to accelerate convergence in nonconvex landscapes, becoming a standard for training neural networks on tasks like image recognition. Concurrently, automatic differentiation emerged as a cornerstone for solving optimization challenges, enabling efficient computation of gradients through complex computational graphs via frameworks like Theano (2010) and TensorFlow (2015), which automated reverse-mode differentiation for scalable training. These tools addressed the computational bottlenecks in training deep architectures, achieving state-of-the-art performance on large benchmarks with reduced engineering effort. Integration with artificial intelligence further propelled optimization in the late 2010s, exemplified by neural architecture search (NAS) methods starting around 2016, which automated the design of network topologies using reinforcement learning or evolutionary strategies to optimize performance metrics like accuracy and efficiency. Such frameworks, employing recurrent networks to explore architectural spaces, yielded models outperforming hand-crafted designs on image-classification benchmarks, with applications spanning computer vision and beyond. Robust optimization, initially formalized by Dimitris Bertsimas and colleagues in the early 2000s for handling parameter uncertainty, matured in the 2020s through data-driven enhancements that incorporated machine learning for uncertainty sets, improving decision-making under ambiguity in fields such as energy systems. These advancements, including two-stage sample robust formulations, provided tractable approximations with probabilistic guarantees, demonstrating superior out-of-sample performance in high-dimensional settings. In the 2020s, quantum optimization gained traction, with the quantum approximate optimization algorithm (QAOA), proposed in 2014, seeing practical implementations on noisy intermediate-scale quantum devices for combinatorial problems like MaxCut, often extending principles from Grover's search algorithm to achieve quadratic speedups in unstructured optimization. Recent extensions have demonstrated speedups over classical methods on small-scale problems with a few qubits, such as in robotic manipulator tasks, leveraging hybrid quantum-classical solvers. Parallel to this, distributed optimization techniques addressed large-scale data challenges, with algorithms like doubly accelerated methods enabling parallel local computations across clusters to minimize communication overhead, achieving convergence rates competitive with centralized approaches on terabyte-scale datasets in distributed machine learning. Contemporary applications increasingly emphasize sustainability, particularly in climate modeling, where optimization integrates with AI to calibrate parameters in Earth system simulations, such as deriving data-driven closure models for atmospheric dynamics that enhance predictive accuracy for global warming scenarios.
These efforts, including multi-objective formulations for emission-reduction pathways, have supported policy decisions by optimizing trade-offs in renewable-energy deployment and carbon capture under uncertain projections. Heuristic methods have evolved alongside these developments, with hybrid approaches providing robust approximations for nonconvex problems where exact solutions remain intractable.

Theoretical Foundations

Feasibility and Existence

In mathematical optimization, the feasible set, denoted as $\Omega$, consists of all points $x$ in the domain that satisfy the problem's constraints, such as equalities $h_i(x) = 0$ and inequalities $g_j(x) \leq 0$. If $\Omega$ is empty, the problem is infeasible, meaning no solution exists that meets all constraints simultaneously. For instance, consider the linear program $\min x$ subject to $x \geq 1$ and $x \leq 0$; the constraints contradict each other, resulting in an empty feasible set. Constraint qualifications, such as the Mangasarian-Fromovitz constraint qualification (MFCQ), ensure that the feasible set possesses desirable geometric properties, like a non-empty interior relative to the active constraints at certain points, which is crucial for the well-posedness of the problem. Introduced by Mangasarian and Fromovitz, MFCQ requires that the gradients of the equality constraints are linearly independent and that there exists a direction satisfying strict inequality for the active inequality constraints. This qualification helps in analyzing the structure of $\Omega$ without singularities that could complicate feasibility assessments. Even when $\Omega$ is non-empty, an optimization problem may lack a solution if the objective function $f$ does not attain its infimum over $\Omega$. The Weierstrass extreme value theorem guarantees existence under suitable conditions: if $f: \mathbb{R}^n \to \mathbb{R}$ is continuous and $\Omega$ is compact (closed and bounded), then $f$ attains both its minimum and maximum on $\Omega$, i.e.,
$$\min_{x \in \Omega} f(x) = f(x^*) \quad \text{for some } x^* \in \Omega,$$
where the infimum is achieved. For unbounded feasible sets, such as $\Omega = \mathbb{R}^n$, compactness fails, but coercivity of $f$ (defined as $\lim_{\|x\| \to \infty} f(x) = +\infty$) ensures a global minimizer exists, as minimizing sequences remain bounded and thus have convergent subsequences yielding the minimum. Problems can also be unbounded below, where $\inf_{x \in \Omega} f(x) = -\infty$, precluding a minimum despite a non-empty $\Omega$. For example, the linear program $\min -x$ subject to $x \geq 0$ has feasible points but no minimum, as $f(x) \to -\infty$ along the ray $x \to +\infty$. In such cases, the infimum is not attained, highlighting the distinction between the infimum value and an actual minimizer in the feasible set.
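The unbounded example above is exactly what an LP solver reports as such. A minimal check, assuming SciPy's linprog interface:

```python
# minimize -x subject to x >= 0: feasible but unbounded below, so no minimum exists.
from scipy.optimize import linprog

res = linprog(c=[-1.0], bounds=[(0, None)], method="highs")
print(res.status, res.message)   # status 3 is SciPy's code for an unbounded problem
```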

Optimality Conditions

Optimality conditions provide necessary and sufficient criteria for identifying local and global optima in mathematical optimization problems, serving as foundational tools for theoretical and algorithmic verification. For unconstrained optimization problems, where the objective is to minimize a differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, the first-order necessary condition for a point $x^*$ to be a local minimum is that the gradient vanishes at $x^*$, i.e., $\nabla f(x^*) = 0$. This condition arises from the requirement that the directional derivative in any feasible direction must be non-negative at a local optimum. Under twice differentiability assumptions, the second-order necessary condition further requires that the Hessian matrix $H(x^*) = \nabla^2 f(x^*)$ is positive semi-definite, meaning $d^T H(x^*) d \geq 0$ for all directions $d \in \mathbb{R}^n$. Conversely, if $H(x^*)$ is positive definite ($d^T H(x^*) d > 0$ for all $d \neq 0$) at a stationary point, then $x^*$ is a strict local minimum, providing a second-order sufficient condition. In the presence of equality constraints, consider the problem of minimizing $f(x)$ subject to $g_i(x) = 0$ for $i = 1, \dots, m$, where $g_i: \mathbb{R}^n \to \mathbb{R}$ are differentiable. The method of Lagrange multipliers introduces the Lagrangian $\mathcal{L}(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i g_i(x)$, and the first-order necessary conditions require the existence of multipliers $\lambda \in \mathbb{R}^m$ such that $\nabla_x \mathcal{L}(x^*, \lambda) = 0$ and $\nabla_\lambda \mathcal{L}(x^*, \lambda) = 0$, or equivalently, $\nabla f(x^*) + \sum_{i=1}^m \lambda_i \nabla g_i(x^*) = 0$ and $g_i(x^*) = 0$ for all $i$. These conditions ensure that at the optimum, the gradient of the objective lies in the span of the gradients of the active constraints, aligning the stationarity requirement with the constraint manifold. Second-order conditions extend this by requiring the Hessian of the Lagrangian, projected onto the tangent space of the constraints, to be positive semi-definite for necessity and positive definite for sufficiency. For problems involving inequality constraints, minimize $f(x)$ subject to $g_i(x) \leq 0$ for $i = 1, \dots, m$ and $h_j(x) = 0$ for $j = 1, \dots, p$; the Karush-Kuhn-Tucker (KKT) conditions generalize the Lagrange framework to handle both equalities and inequalities under suitable constraint qualifications. The KKT conditions consist of stationarity, primal feasibility, dual feasibility, and complementary slackness, stated as follows for a local minimum $x^*$ with associated multipliers $\lambda \in \mathbb{R}^m_{\geq 0}$ and $\mu \in \mathbb{R}^p$:
\begin{align*}
\nabla f(x^*) + \sum_{i=1}^m \lambda_i \nabla g_i(x^*) + \sum_{j=1}^p \mu_j \nabla h_j(x^*) &= 0, \\
g_i(x^*) &\leq 0, \quad i=1,\dots,m, \\
h_j(x^*) &= 0, \quad j=1,\dots,p, \\
\lambda_i &\geq 0, \quad i=1,\dots,m, \\
\lambda_i g_i(x^*) &= 0, \quad i=1,\dots,m.
\end{align*}
Here, the multipliers $\lambda_i$ for inequalities are non-negative, reflecting the directional nature of the constraints, and complementary slackness enforces that $\lambda_i > 0$ only for active inequalities ($g_i(x^*) = 0$). These conditions are necessary for optimality under constraint qualifications and sufficient for global optimality in convex problems.
In convex optimization, where the objective $f$ and feasible set are convex, any point satisfying the first-order (KKT) conditions is a global optimum, as local minima coincide with global ones due to the absence of local traps in the convex landscape. This property, established through the convexity of sublevel sets and epigraphs, ensures that verification via optimality conditions yields the overall solution without exhaustive search.
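For a concrete check of the KKT conditions, consider minimizing x1² + x2² subject to x1 + x2 ≥ 1, written as g(x) = 1 − x1 − x2 ≤ 0. The candidate x* = (0.5, 0.5) with multiplier λ = 1 satisfies all four conditions, as the small numeric sketch below verifies (assuming NumPy; the example problem is chosen for illustration, not taken from the text).

```python
# Verify stationarity, feasibility and complementary slackness at a candidate point.
import numpy as np

x_star = np.array([0.5, 0.5])
lam = 1.0

grad_f = 2 * x_star                        # gradient of f(x) = x1**2 + x2**2 at x*
grad_g = np.array([-1.0, -1.0])            # gradient of g(x) = 1 - x1 - x2
g_val = 1.0 - x_star.sum()                 # constraint value at x*

print(grad_f + lam * grad_g)               # stationarity: should be [0. 0.]
print(g_val <= 0.0, lam >= 0.0)            # primal and dual feasibility: True True
print(lam * g_val)                         # complementary slackness: 0.0
```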

Duality and Sensitivity

In mathematical optimization, duality provides a framework for reformulating a primal optimization problem into a dual problem, offering insights into bounds, optimality, and economic interpretations. The Lagrangian dual is constructed for a primal problem of minimizing an objective function subject to inequality and equality constraints. Consider the primal problem:
\begin{align*}
\min_{x} \quad & f(x) \\
\text{subject to} \quad & g_i(x) \leq 0, \quad i = 1, \dots, m, \\
& h_j(x) = 0, \quad j = 1, \dots, p,
\end{align*}
where $f: \mathbb{R}^n \to \mathbb{R}$ is the objective, $g_i: \mathbb{R}^n \to \mathbb{R}$ are inequality constraint functions, and $h_j: \mathbb{R}^n \to \mathbb{R}$ are equality constraint functions. The Lagrangian is defined as $L(x, \lambda, \mu) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \mu_j h_j(x)$, with $\lambda \geq 0$. The dual function is $g(\lambda, \mu) = \inf_x L(x, \lambda, \mu)$, and the dual problem is $\max_{\lambda, \mu} g(\lambda, \mu)$ subject to $\lambda \geq 0$. Weak duality states that the primal optimal value is at least as large as the dual optimal value, i.e., $f^* \geq g^*$, for any feasible primal and dual points. This holds generally, providing a lower bound on the primal optimum via dual solutions. Strong duality, where $f^* = g^*$ and the duality gap is zero, requires additional conditions; for convex problems satisfying Slater's condition (a strictly feasible point exists for the inequalities), the gap vanishes, enabling exact recovery of the primal optimum from the dual. Sensitivity analysis examines how the optimal value and solutions change with perturbations in problem data, such as objective coefficients or right-hand sides of constraints. Shadow prices, given by the optimal dual variables, quantify these changes; for a constraint $Ax \geq b$, the shadow price $y^*$ satisfies $\frac{\partial f^*}{\partial b_k} = y_k^*$ under strong duality, indicating the marginal improvement in the objective per unit increase in $b_k$. This follows from the envelope theorem, which states that the derivative of the optimal value function with respect to a parameter equals the partial derivative of the Lagrangian with respect to that parameter, evaluated at the optimum, ignoring indirect effects through the optimizer. A canonical example is linear programming duality. The primal is $\min c^\top x$ subject to $Ax \geq b$, $x \geq 0$, with optimal value $f^*$. The dual is $\max b^\top y$ subject to $A^\top y \leq c$, $y \geq 0$, with optimal value $g^*$. Strong duality holds ($f^* = g^*$) whenever the primal has a finite optimum, and the dual variables $y^*$ serve as shadow prices for the resource constraints in $b$. For instance, in resource allocation, $y_k^*$ represents the value of an additional unit of resource $k$. Under suitable regularity conditions, such as differentiability of the objective and constraints, the optimal value function is continuous with respect to perturbations, and the set of optimal solutions varies upper semicontinuously with the parameters. For convex problems satisfying Slater's condition, the optimal solution set is nonempty and compact for nearby perturbations, ensuring stability. These properties are central to perturbation analysis, linking duality to robustness in applications such as control.
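The linear programming example can be verified numerically by solving the primal and the dual separately and comparing optimal values; in the sketch below (assuming SciPy's linprog, with a made-up two-variable instance) the two values coincide, and the dual solution gives the shadow prices.

```python
# Strong duality for a small LP: min c'x s.t. Ax >= b, x >= 0, and its dual.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0],
              [1.0, 3.0]])
b = np.array([2.0, 3.0])

# Primal: Ax >= b is rewritten as -Ax <= -b for linprog's A_ub convention.
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2, method="highs")

# Dual: max b'y s.t. A'y <= c, y >= 0, posed as minimization of -b'y.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2, method="highs")

print(primal.fun, -dual.fun)   # equal optimal values (2.5 and 2.5)
print(dual.x)                  # dual solution = shadow prices of the two constraints
```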

Major Subfields

Linear and Convex Optimization

Linear programming is a fundamental subfield of mathematical optimization that involves minimizing or maximizing a linear objective function subject to linear equality and inequality constraints. The standard form of a linear program is given by \begin{align*} \min &\quad \mathbf{c}^\top \mathbf{x} \\ \text{s.t.} &\quad A\mathbf{x} = \mathbf{b}, \\ &\quad \mathbf{x} \geq \mathbf{0}, \end{align*} where \mathbf{c} \in \mathbb{R}^n, A \in \mathbb{R}^{m \times n}, \mathbf{b} \in \mathbb{R}^m, and \mathbf{x} \in \mathbb{R}^n. This formulation assumes non-negativity constraints, with inequalities convertible to equalities via slack variables. The simplex method, developed by George Dantzig in 1947, solves linear programs by traversing vertices of the polytope defined by the constraints, starting from an initial basic feasible solution and pivoting to improve the objective until optimality is reached. Interior-point methods, which navigate the interior of the feasible region using barrier functions to approach the optimum, emerged as alternatives offering better practical performance for large-scale problems. A landmark advancement was Narendra Karmarkar's 1984 algorithm, which provided the first polynomial-time interior-point method for linear programming, running in O(n^{3.5} L) time where n is the number of variables and L the bit length of the input, sparking widespread development of efficient solvers. Convex optimization generalizes to problems where the objective function and constraints are convex, ensuring that local optima are global and enabling efficient solvability. A set C \subseteq \mathbb{R}^n is convex if for any \mathbf{x}, \mathbf{y} \in C and \theta \in [0,1], the point \theta \mathbf{x} + (1-\theta) \mathbf{y} \in C. A function f: \mathbb{R}^n \to \mathbb{R} is convex if its epigraph \{(\mathbf{x}, t) \mid f(\mathbf{x}) \leq t\} is a convex set, or equivalently, if f(\theta \mathbf{x} + (1-\theta) \mathbf{y}) \leq \theta f(\mathbf{x}) + (1-\theta) f(\mathbf{y}) for all \mathbf{x}, \mathbf{y} in the domain and \theta \in [0,1]. The epigraph formulation transforms a convex problem \min f_0(\mathbf{x}) subject to f_i(\mathbf{x}) \leq 0 into an equivalent form \min t subject to (\mathbf{x}, t) \in \text{epi } f_0 and the remaining constraints, facilitating analysis and solution via linear approximations. Jensen's inequality states that for a convex function f and weights \lambda_i \geq 0 summing to 1, f(\sum_i \lambda_i \mathbf{x}_i) \leq \sum_i \lambda_i f(\mathbf{x}_i), with equality if f is affine on the convex hull of the \mathbf{x}_i. This property underpins many convexity proofs and applications in optimization. Key properties of convex optimization include solution uniqueness under strict convexity of the objective, where any local minimum is the unique global minimum, and separability, where the objective decomposes as f(\mathbf{x}) = \sum_k f_k(\mathbf{x}_k) across blocks \mathbf{x}_k, allowing decomposition into subproblems for parallel or distributed solving. Semidefinite programming extends convex optimization by optimizing linear functions over spectrahedra, the feasible sets defined by linear matrix inequalities F(\mathbf{x}) \succeq 0, where F(\mathbf{x}) is affine in \mathbf{x} and \succeq 0 denotes positive semidefiniteness, capturing problems such as maximum cut approximations beyond linear constraints.
A classic example is the diet problem, formulated as minimizing the cost \sum_j c_j x_j, where x_j is the amount of food j, subject to \sum_j a_{ij} x_j \geq b_i for nutrients i with requirements b_i, and x_j \geq 0; George Stigler posed this problem in 1945, and it is solvable via linear programming to find minimal-cost nutrient-adequate diets.
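As a minimal sketch of this formulation, assuming SciPy and entirely made-up cost and nutrient data, the diet problem can be solved with a standard LP routine:

```python
# Diet problem as a linear program, using SciPy's linprog with
# illustrative (hypothetical) cost and nutrient data.
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.5, 1.5])        # cost per unit of each food
A = np.array([[3.0, 1.0, 2.0],       # nutrient content a_ij: rows = nutrients,
              [1.0, 4.0, 1.0]])      # columns = foods
b = np.array([10.0, 8.0])            # minimum requirement b_i per nutrient

# linprog expects "<=" constraints, so A x >= b becomes -A x <= -b.
res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 3, method="highs")
print(res.x, res.fun)                # optimal amounts x_j and minimal cost
```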

Nonlinear and Nonconvex Optimization

Nonlinear optimization encompasses problems where the objective function or constraints are nonlinear, often leading to complex landscapes with multiple local optima rather than the unique global solutions typical of linear or convex cases. These problems arise in diverse fields such as engineering design, chemical process modeling, and economic equilibrium analysis, where the nonlinearity captures realistic phenomena like saturation effects or interaction terms. Unlike convex optimization, which guarantees global optimality under mild conditions, nonlinear problems generally converge only to local optima, necessitating careful algorithm selection to navigate saddle points and ensure practical convergence. In unconstrained nonlinear optimization, the goal is to minimize a smooth function f: \mathbb{R}^n \to \mathbb{R} without restrictions on the variables. Newton's method is a foundational second-order approach that approximates the function quadratically using the gradient \nabla f(x) and Hessian H(x) = \nabla^2 f(x), yielding the update direction d_k = -H(x_k)^{-1} \nabla f(x_k). This step leverages curvature information for rapid quadratic convergence near a local minimum satisfying second-order optimality conditions, though it can fail far from solutions due to indefinite Hessians or the high computational cost of inverting large matrices. To mitigate these issues, trust-region methods restrict the step to a neighborhood where the quadratic model remains trustworthy, solving a subproblem \min_d m_k(d) = f(x_k) + \nabla f(x_k)^T d + \frac{1}{2} d^T H(x_k) d subject to \|d\| \leq \Delta_k, with trust radius \Delta_k adjusted based on the ratio of actual to predicted function decrease. These methods ensure global convergence to stationary points and handle nonconvexity better than pure Newton iterations by implicitly regularizing the step. For problems with nonlinear constraints, such as \min f(x) subject to c(x) = 0 (equality-constrained), sequential quadratic programming (SQP) iteratively solves quadratic approximations of the Lagrangian, forming subproblems \min_d \nabla f(x_k)^T d + \frac{1}{2} d^T H_k d subject to \nabla c(x_k)^T d + c(x_k) = 0, where H_k approximates the Hessian of the Lagrangian. This approach exploits the structure of quadratic programs for efficiency and achieves superlinear convergence under constraint qualifications, making it suitable for medium-scale nonlinear programs such as aerospace trajectory optimization. Inequality-constrained nonlinear optimization, \min f(x) subject to g(x) \leq 0, often employs barrier methods, which transform the problem into a sequence of unconstrained ones by adding a logarithmic barrier term -\mu \sum \log(-g_i(x)) to the objective, with barrier parameter \mu > 0 decreasing toward zero. This interior-point technique keeps iterates strictly feasible, converging to the optimum while avoiding boundary violations, though it requires careful handling of ill-conditioning near degeneracy. Nonconvex nonlinear optimization introduces significant challenges due to the proliferation of local minima, saddle points, and flat regions, where algorithms may converge to suboptimal solutions depending on initialization.
For instance, verifying global optimality is often NP-hard, as seen in the traveling salesman problem (TSP), a classic nonconvex task that requires finding the shortest tour visiting each city once, with continuous relaxations exhibiting multiple local minima that trap gradient-based methods. These issues underscore the need for hybrid strategies combining local search with randomization, though deterministic guarantees remain elusive without additional problem structure. A representative benchmark for testing nonlinear optimization algorithms is the Rosenbrock function, defined as f(x_1, x_2) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2, which features a narrow, curved valley leading to the global minimum at (1, 1). This nonconvex test case challenges methods with slow convergence along the valley and sensitivity to step sizes, highlighting the trade-offs between simple unconstrained solvers such as steepest descent and more robust trust-region variants.
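A minimal sketch of this benchmark, assuming SciPy (which ships the Rosenbrock function and its derivatives as rosen, rosen_der, and rosen_hess), compares a quasi-Newton solver with a trust-region Newton solver from the standard starting point:

```python
# Minimizing the Rosenbrock function with a quasi-Newton method (BFGS)
# and a trust-region Newton method, via SciPy.
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

x0 = np.array([-1.2, 1.0])   # standard starting point, far from the optimum (1, 1)

# Quasi-Newton (gradient-based) solver.
res_bfgs = minimize(rosen, x0, jac=rosen_der, method="BFGS")

# Trust-region Newton solver using exact Hessian information.
res_tr = minimize(rosen, x0, jac=rosen_der, hess=rosen_hess, method="trust-ncg")

print(res_bfgs.x, res_tr.x)  # both should approach the global minimum at (1, 1)
```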

Stochastic and Robust Optimization

Stochastic optimization addresses decision-making problems where the objective function or constraints involve random variables, typically formulated as minimizing the expected value of a cost function over uncertain parameters. In this framework, the goal is to solve problems of the form \min_x \mathbb{E}_{\xi}[f(x, \xi)], where x is the decision variable, \xi represents the random uncertainty, and the expectation is taken with respect to the distribution of \xi. This approach originated in the mid-20th century with early work on linear programming under uncertainty, such as Dantzig's 1955 paper introducing recourse actions in linear programs. Stochastic programming has since become essential in settings where the distributions are known or can reasonably be assumed, allowing for risk-neutral decisions based on averages over possible outcomes. A key computational technique in stochastic optimization is the sample average approximation (SAA) method, which replaces the intractable expectation with an empirical mean from a finite set of scenarios drawn from the distribution of \xi. This yields an approximate problem \min_x \frac{1}{N} \sum_{i=1}^N f(x, \xi_i), where N is the number of samples, transforming the stochastic problem into a deterministic one that can be solved using standard optimization solvers. Under suitable conditions, such as convexity and finite moments, SAA solutions converge to the true optimum as N increases, with error bounds established through large deviation principles. This method is particularly effective for two-stage stochastic programs, where first-stage decisions are made before uncertainty is revealed, followed by second-stage recourse actions. In contrast, robust optimization focuses on worst-case performance over an uncertainty set \mathcal{U}, formulated as \min_x \max_{\xi \in \mathcal{U}} f(x, \xi), ensuring solutions remain feasible and near-optimal even under adversarial realizations of the uncertainty. This paradigm, developed prominently in the late 1990s, provides tractable reformulations for common uncertainty sets like ellipsoids or polyhedra, converting the min-max problem into a convex program solvable in polynomial time. Robust methods are conservative by design, prioritizing protection against the most detrimental outcomes rather than probabilistic averages, and have been extended to conic and semidefinite programs for broader applicability. Common techniques in these areas include scenario-based approaches, which generate a finite set of plausible uncertainty realizations to approximate either expectations or worst cases, and chance-constrained formulations that enforce probabilistic guarantees on constraints. Scenario methods, as in the scenario approach, draw independent samples to construct convex programs with explicit violation probability bounds, scaling well for high-dimensional problems. Chance constraints, such as P(g(x, \xi) \leq 0) \geq 1 - \alpha, limit the risk of constraint violation to at most \alpha, with tractable approximations available for joint or individual probabilities under log-concave distributions. These techniques bridge stochastic and robust paradigms, often combining sampling with uncertainty sets for practical implementation. A classic application is portfolio optimization under uncertain asset returns, extending Markowitz's 1952 mean-variance model to handle estimation errors in expected returns and covariances.
In stochastic versions, portfolios minimize expected risk over return distributions, while robust extensions consider worst-case returns within ellipsoidal uncertainty sets around nominal estimates, yielding more stable allocations with a controlled degree of conservatism. Such formulations have demonstrated improved out-of-sample performance compared to classical models, particularly in volatile markets. More recently, distributionally robust optimization (DRO) has emerged as a unifying framework, optimizing over ambiguity sets of possible distributions rather than a single nominal one, such as moment-based sets containing all distributions with bounded mean and variance. This addresses distributional ambiguity in data-driven settings, with tractable reformulations via duality yielding conservative yet computationally efficient solutions. Seminal work showed that for linear problems, these reformulations lead to tightened robust counterparts with explicit confidence levels. DRO has gained traction in machine learning and operations research, offering a balance between flexibility and robust performance.
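To make the sample average approximation concrete, here is a minimal sketch on a newsvendor-style problem, assuming SciPy and an illustrative demand distribution and cost parameters (all chosen for the example, not taken from the text):

```python
# Sample average approximation (SAA) for a newsvendor-style problem:
# choose an order quantity x to minimize the expected cost of overage
# (h per unit) and underage (s per unit) under random demand D.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
h, s = 1.0, 4.0                                           # per-unit costs (illustrative)
demand = rng.lognormal(mean=3.0, sigma=0.4, size=5000)    # N sampled scenarios

def saa_cost(x):
    # Empirical mean of the scenario costs: (1/N) * sum_i f(x, xi_i).
    return np.mean(h * np.maximum(x - demand, 0) + s * np.maximum(demand - x, 0))

res = minimize_scalar(saa_cost, bounds=(0, demand.max()), method="bounded")
print(res.x)   # SAA solution; approaches the s/(s+h) demand quantile as N grows
```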

Multi-objective and Global Optimization

Multi-objective optimization addresses problems where multiple conflicting objectives must be optimized simultaneously, such as minimizing cost while maximizing performance in engineering designs. Unlike single-objective problems, solutions are rarely optimal for all objectives at once; instead, the goal is to identify trade-offs among them. A key concept is Pareto optimality, where a feasible solution x is Pareto optimal if no other feasible solution x' exists such that f_i(x') \leq f_i(x) for all objectives i = 1, \dots, k and f_j(x') < f_j(x) for at least one j. This definition ensures that improving one objective cannot occur without degrading another, forming the basis for non-dominated solutions in vector optimization. The set of all Pareto-optimal solutions in the objective space traces the efficient frontier, a curve or surface representing the best possible trade-offs, often visualized for two objectives to highlight compromises like risk versus return in portfolio selection. To generate points on this frontier, scalarization techniques convert the multi-objective problem into a single-objective one. A prominent method is weighted sum scalarization, which minimizes \sum_{i=1}^k w_i f_i(x) subject to the constraints, where weights w_i > 0 reflect objective priorities and sum to 1; this yields properly efficient solutions under convexity assumptions (a sketch appears after this paragraph). In engineering design, this approach optimizes trade-offs such as minimizing production cost while maximizing structural performance in component design, where Pareto solutions balance material expenses against load-bearing capacity. Global optimization seeks the absolute best solution in problems with multimodal landscapes, particularly nonconvex ones where local minima abound, extending beyond local nonlinear solvers by guaranteeing convergence to the global optimum. Branch-and-bound methods systematically partition the feasible region and use lower bounds to prune suboptimal branches, ensuring finite termination for twice-differentiable nonconvex problems. The αBB algorithm exemplifies this by constructing convex underestimators via diagonal perturbation of the Hessian, enabling rigorous global minimization. Spatial partitioning complements this by dividing the search space into subregions, for instance via rectangular subdivision, to refine bounds and focus computation on promising areas in low-dimensional bound-constrained problems. From the 1990s onward, evolutionary algorithms have advanced global search in multi-objective settings by maintaining diverse populations to approximate the entire Pareto front without gradient reliance. The Non-dominated Sorting Genetic Algorithm (NSGA), introduced in 1995, ranks solutions by dominance levels, and its successor NSGA-II adds crowding distance for diversity, effectively handling nonconvex Pareto fronts in applications like aerodynamic design. These methods provide robust approximations in complex, multimodal landscapes where exact global techniques are computationally intensive.
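The following minimal sketch of weighted sum scalarization, assuming SciPy and two illustrative convex objectives chosen for the example, sweeps the weight to trace points on the efficient frontier:

```python
# Weighted-sum scalarization for two convex objectives,
# f1(x) = (x - 1)^2 and f2(x) = (x + 1)^2 (illustrative choices):
# sweeping the weight w traces points on the Pareto (efficient) frontier.
import numpy as np
from scipy.optimize import minimize_scalar

f1 = lambda x: (x - 1.0) ** 2
f2 = lambda x: (x + 1.0) ** 2

front = []
for w in np.linspace(0.05, 0.95, 10):
    # Scalarized problem: min  w*f1(x) + (1-w)*f2(x).
    res = minimize_scalar(lambda x: w * f1(x) + (1.0 - w) * f2(x))
    front.append((f1(res.x), f2(res.x)))

for p in front:
    print(p)    # non-dominated (f1, f2) pairs along the trade-off curve
```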

Optimization Algorithms

Exact Methods

Exact methods in mathematical optimization are algorithms designed to find globally optimal solutions with finite termination guarantees, typically for problems with linear, convex, or structured discrete objectives and constraints. These methods are particularly effective for linear programming (LP) and certain integer programming (IP) instances where the problem size permits exhaustive or polynomial-time exploration. Unlike approximation techniques, exact methods ensure optimality by systematically evaluating feasible regions or subproblems, often leveraging duality or relaxation principles to prune search spaces. They form the backbone of solvers in operations research and related applications, where precision is paramount over computational speed for moderate-scale instances. The simplex method, introduced by George B. Dantzig in 1947, addresses linear programs by iteratively pivoting from one basic feasible solution to an adjacent one along the edges of the feasible polytope, improving the objective value until optimality is reached. This tableau-based approach represents the problem in canonical form, selecting entering and leaving variables based on reduced costs and ratio tests to maintain feasibility. Despite its empirical efficiency on sparse problems, the method's worst-case complexity is exponential because the number of vertices can grow exponentially with problem size; cycling is possible under degeneracy, though anti-cycling rules like Bland's rule prevent it in practice. Interior-point methods, pioneered by Narendra Karmarkar in 1984, solve linear and convex programs by traversing paths through the interior of the feasible region rather than its boundary. These algorithms employ barrier functions, such as the logarithmic barrier for inequality constraints, to transform the constrained problem into a sequence of unconstrained minimizations solved via Newton's method. Primal-dual variants simultaneously optimize primal and dual objectives, following central paths that approximate the analytic center and converge to the optimum. Karmarkar's projective algorithm achieves polynomial-time complexity, specifically O(n^{3.5} L) arithmetic operations for an n-variable problem with L-bit input, marking a theoretical breakthrough over the simplex method's exponential worst case. Branch-and-bound, formalized by Ailsa Land and Alison Doig in 1960, extends exact solving to integer programming by partitioning the feasible space into subproblems via branching on fractional variables from LP relaxations. Each node in the search tree is bounded using continuous relaxations or Lagrangian duals; branches whose bounds cannot improve on the current best solution are fathomed to prune the tree. This enumerative strategy guarantees optimality by exhaustively exploring only promising regions, with enhancements like cutting planes strengthening bounds. For mixed-integer linear programs, the method's worst-case complexity is exponential, reflecting the NP-hard nature of IP, though average performance benefits from tight relaxations in structured cases. Dynamic programming, developed by Richard Bellman in the 1950s, provides an exact recursive framework for sequential decision problems, decomposing them into overlapping subproblems solved via backward or forward induction. The core Bellman recursion defines the value function for stage k as V_k(x) = \min_a \left[ c(x,a) + V_{k-1}(f(x,a)) \right], where x is the state, a the action, c(x,a) the stage cost, and f(x,a) the transition function, with V_0(x) = 0 at the terminal stage. This principle of optimality ensures that optimal subpolicy solutions compose to a global optimum, applicable to finite-horizon problems like inventory control or shortest paths.
Computationally, it requires storing value functions across states, leading to exponential space and time in the state dimension unless sparsity or decomposition is exploited. In terms of computational complexity, exact methods for linear programming achieve polynomial time via interior-point approaches, contrasting with simplex's exponential worst case, while integer programming via branch-and-bound remains exponential in the worst case due to the combinatorial explosion of discrete choices. These guarantees underscore their suitability for verifiable optima in linear and convex subfields, though scalability limits their use to problems with up to thousands of variables.
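A minimal sketch of the Bellman recursion above, on a toy finite-horizon inventory problem with illustrative costs and a fixed per-stage demand (all values invented for the example):

```python
# Finite-horizon dynamic programming via the Bellman recursion
# V_k(x) = min_a [ c(x, a) + V_{k-1}(f(x, a)) ] for a toy inventory problem.
K = 3                      # number of stages
demand = 2                 # units demanded at each stage
max_stock = 5              # state space: stock level x in {0, ..., max_stock}
order_cost, hold_cost = 3.0, 1.0

# V[x] holds the cost-to-go from state x; terminal values are zero (V_0 = 0).
V = [0.0] * (max_stock + 1)

for _ in range(K):
    new_V = [float("inf")] * (max_stock + 1)
    policy = [0] * (max_stock + 1)
    for x in range(max_stock + 1):
        for a in range(max_stock + 1 - x):          # feasible order quantities
            if x + a < demand:                      # demand must be met
                continue
            nxt = x + a - demand                    # transition f(x, a)
            cost = order_cost * a + hold_cost * nxt # stage cost c(x, a)
            if cost + V[nxt] < new_V[x]:
                new_V[x], policy[x] = cost + V[nxt], a
    V = new_V

print(V[0], policy[0])   # optimal cost-to-go and first-stage order from empty stock
```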

Iterative and Gradient-Based Methods

Iterative and gradient-based methods form a cornerstone of continuous optimization, relying on derivatives to iteratively refine solutions to minimization problems. These algorithms generate a sequence of points approaching a local optimum by following the negative gradient direction, which points toward steepest descent for differentiable objectives. Unlike exact methods that terminate finitely for structured problems, iterative approaches are designed for large-scale, general nonlinear functions, often achieving approximate solutions efficiently through repeated updates. Their effectiveness stems from exploiting first- or second-order information, with convergence guaranteed under conditions such as Lipschitz continuity of gradients. Gradient descent, the simplest such method, updates the iterate as x_{k+1} = x_k - \alpha_k \nabla f(x_k), where \alpha_k > 0 is a step size and \nabla f is the gradient of f. Introduced by Cauchy in 1847 for solving systems of equations via minimization, it ensures descent when \alpha_k satisfies sufficient decrease conditions. The step size \alpha_k is typically chosen via line search, such as the Armijo rule, which backtracks from an initial guess until f(x_k - \alpha_k \nabla f(x_k)) \leq f(x_k) - c \alpha_k \|\nabla f(x_k)\|^2 for a parameter 0 < c < 1 and a contraction factor. This rule, proposed by Armijo in 1966, promotes global convergence for smooth functions with Lipschitz gradients. Newton's method enhances this by incorporating curvature via the Hessian matrix H(x_k) = \nabla^2 f(x_k), yielding x_{k+1} = x_k - H(x_k)^{-1} \nabla f(x_k). Originating from root-finding but adapted for optimization, it approximates the quadratic model of f locally, leading to faster progress near stationary points where second-order conditions hold, such as positive definiteness of the Hessian indicating a local minimum. For twice-differentiable functions with Lipschitz continuous Hessians, local quadratic convergence occurs: the error satisfies \|x_{k+1} - x^*\| \leq C \|x_k - x^*\|^2 for some constant C > 0 near the optimum x^*. The conjugate gradient method addresses quadratic objectives f(x) = \frac{1}{2} x^T A x - b^T x with positive definite A, avoiding explicit Hessian inversion by generating conjugate search directions. Developed by Hestenes and Stiefel in 1952, it solves the equivalent linear system A x = b in at most n iterations for n-dimensional problems, with updates x_{k+1} = x_k + \alpha_k d_k, where directions d_k satisfy d_k^T A d_j = 0 for j < k and \alpha_k = \frac{\nabla f(x_k)^T \nabla f(x_k)}{d_k^T A d_k}. For non-quadratic cases, it approximates second-order behavior with only first-order oracles, exhibiting superlinear convergence under suitable conditions. Convergence properties vary by method and assumptions. Gradient descent achieves linear convergence, with the error reduced by a factor \rho < 1 per iteration, \|x_{k+1} - x^*\| \leq \rho \|x_k - x^*\|, for strongly convex f with condition number \kappa, at rate 1 - \frac{1}{\kappa}, improvable to O(1/k^2) via acceleration as in Nesterov's 1983 method. Newton's method offers quadratic rates locally, while conjugate gradient minimizes quadratics exactly in at most n steps and converges linearly at a rate governed by \sqrt{\kappa}. Step size rules like the Armijo condition ensure descent and global convergence to stationary points for nonconvex smooth f.
For constrained optimization, projected gradient descent adapts the update to feasible sets: x_{k+1} = P_\mathcal{C}(x_k - \alpha_k \nabla f(x_k)), where P_\mathcal{C} projects onto the convex set \mathcal{C}. Pioneered by Goldstein in 1964 for Hilbert spaces, it converges linearly for strongly convex problems when the projection can be computed. The augmented Lagrangian method handles general nonlinear constraints by minimizing an augmented objective \mathcal{L}_\rho(x, \lambda) = f(x) + \lambda^T g(x) + \frac{\rho}{2} \|g(x)\|^2, with g(x) = 0 the constraints, and updating the multipliers as \lambda_{k+1} = \lambda_k + \rho g(x_{k+1}). Introduced by Hestenes in 1969, it combines penalty and multiplier ideas to achieve dual feasibility and global convergence under constraint qualifications.
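A minimal sketch of the gradient descent update with Armijo backtracking described above, applied to an illustrative strongly convex quadratic (the matrix and vector are chosen for the example):

```python
# Gradient descent with Armijo backtracking line search on
# f(x) = 1/2 x^T A x - b^T x with a positive definite A.
import numpy as np

A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x = np.zeros(2)
c, beta = 1e-4, 0.5                    # Armijo parameter and backtracking factor
for _ in range(200):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:       # stationarity test
        break
    alpha = 1.0
    # Backtrack until the sufficient-decrease (Armijo) condition holds:
    # f(x - alpha*g) <= f(x) - c * alpha * ||g||^2
    while f(x - alpha * g) > f(x) - c * alpha * g @ g:
        alpha *= beta
    x = x - alpha * g

print(x, np.linalg.solve(A, b))        # iterate vs. exact minimizer A^{-1} b
```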

Heuristic and Approximation Methods

Heuristic and approximation methods provide practical solutions to optimization problems that are computationally intractable for exact algorithms, particularly NP-hard problems in global optimization where local search may trap solutions in suboptimal regions. These methods trade optimality guarantees for efficiency, often yielding high-quality approximations in reasonable time. Metaheuristics, a prominent class, draw inspiration from natural processes to explore large search spaces stochastically, while approximation algorithms offer provable bounds on solution quality relative to the optimum. Genetic algorithms (GAs) emulate Darwinian evolution to optimize over populations of candidate solutions. An initial population of encoded individuals (e.g., binary strings representing decision variables) is iteratively evolved through selection, crossover, and mutation operators. Selection favors fitter individuals based on an objective function, crossover combines parent solutions to produce offspring, and mutation introduces random variations to maintain diversity and escape local optima. This process, formalized by John Holland, enables GAs to converge toward global optima in complex, multimodal landscapes. Simulated annealing (SA) mimics the physical process of annealing in metallurgy to probabilistically explore solution spaces. Starting from an initial solution, SA generates neighboring candidates and accepts worse ones with probability P = \exp(-\Delta E / T), where \Delta E is the change in objective value and T is a temperature parameter that decreases over time. This allows escape from local minima early on, with acceptance becoming more deterministic as T cools. Introduced by Kirkpatrick, Gelatt, and Vecchi, SA has proven effective for combinatorial problems like the traveling salesman. Particle swarm optimization (PSO) models social behavior in flocks or swarms, where particles adjust positions in the search space based on personal and collective experiences. Each particle i updates its velocity according to \mathbf{v}_i^{t+1} = w \mathbf{v}_i^t + c_1 r_1 (\mathbf{pbest}_i - \mathbf{x}_i^t) + c_2 r_2 (\mathbf{gbest} - \mathbf{x}_i^t), followed by the position update \mathbf{x}_i^{t+1} = \mathbf{x}_i^t + \mathbf{v}_i^{t+1}, with inertia weight w, cognitive coefficient c_1, social coefficient c_2, random values r_1, r_2 \in [0,1], personal best \mathbf{pbest}_i, and global best \mathbf{gbest}. Developed by Kennedy and Eberhart, PSO excels in continuous, high-dimensional problems due to its simplicity and few parameters. Approximation methods provide rigorous guarantees for specific problems, such as the 0-1 knapsack problem, which maximizes value under a weight constraint and admits a polynomial-time approximation scheme (PTAS). A fully polynomial-time approximation scheme (FPTAS) achieves a (1 - ε)-approximation in time polynomial in the input size and 1/ε, using scaled dynamic programming to reduce the state space while bounding the error. The original FPTAS, by Ibarra and Kim, scales item values and solves a relaxed instance exactly. In recent developments, reinforcement learning (RL) has emerged for automating hyperparameter tuning in optimization algorithms, treating tuning as a sequential decision process. Population-based training with RL, as in Jaderberg et al.
(2017), evolves hyperparameters online across a population of models, rewarding performance improvements and enabling adaptation to diverse tasks. This approach enhances efficiency in AutoML pipelines for nonconvex and stochastic settings.
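A minimal sketch of simulated annealing, using an illustrative multimodal one-dimensional objective and an invented cooling schedule and neighborhood size:

```python
# Simulated annealing on a multimodal 1-D test function.
import math
import random

def f(x):
    return x * x + 10.0 * math.sin(3.0 * x)   # many local minima

random.seed(0)
x = 5.0                      # initial solution
T = 2.0                      # initial temperature
best_x, best_f = x, f(x)

for _ in range(5000):
    cand = x + random.gauss(0.0, 0.5)          # neighboring candidate
    dE = f(cand) - f(x)
    # Accept improvements always; accept worse moves with prob. exp(-dE/T).
    if dE <= 0 or random.random() < math.exp(-dE / T):
        x = cand
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    T *= 0.999                                 # geometric cooling

print(best_x, best_f)
```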

Applications

Engineering and Physics

Mathematical optimization plays a pivotal role in engineering and physics, enabling the design and analysis of physical systems under constraints such as material limits, energy efficiency, and performance requirements. In structural mechanics, topology optimization seeks to minimize weight while maintaining structural integrity, often formulated as a compliance minimization problem subject to volume constraints. A seminal approach, the Solid Isotropic Material with Penalization (SIMP) method, introduced in the late 1980s, models material distribution by assigning a density variable to each finite element, penalizing intermediate densities to promote binary (solid-void) designs; the effective stiffness is interpolated as E(\rho) = \rho^p E_0, where \rho is the density (0 to 1), p > 1 is the penalization factor, and E_0 is the solid material modulus. This method builds on homogenization techniques for optimal topologies and has been widely adopted for lightweight structures in aerospace and automotive applications, achieving significant weight reductions in benchmark cantilever beam problems compared to uniform designs. In electronics, circuit design optimization adjusts component values to meet specifications such as gain, bandwidth, and power consumption, often by minimizing a least-squares error between desired and simulated responses. For instance, gradient-based methods solve for device sizes and bias points in analog amplifiers, reducing design iterations from weeks to hours in design automation workflows. Similarly, antenna array synthesis uses optimization to shape radiation patterns, adjusting element positions and excitations to minimize sidelobe levels while maximizing directivity; a classic example is the synthesis of linear arrays, where the optimizer navigates the nonconvex landscape to achieve sidelobe suppression below -20 dB in Dolph-Chebyshev configurations. These applications leverage nonlinear methods to handle the inherent nonconvexity of electromagnetic objectives. Control systems in engineering rely on optimization for stabilizing dynamic processes, with the linear quadratic regulator (LQR) providing an optimal feedback law for linear systems. The LQR problem minimizes the infinite-horizon quadratic cost functional J = \int_0^\infty \left( \mathbf{x}^T Q \mathbf{x} + \mathbf{u}^T R \mathbf{u} \right) dt subject to \dot{\mathbf{x}} = A \mathbf{x} + B \mathbf{u}, where \mathbf{x} is the state, \mathbf{u} the control input, Q \geq 0 penalizes state deviations, and R > 0 penalizes control effort. Solved via the algebraic Riccati equation, LQR yields feedback gains that balance stability and performance, commonly applied in spacecraft attitude control to improve settling times over proportional-integral controllers. This framework, originating from early optimal control theory, remains foundational for applications in robotics and automotive suspension. In structural engineering, optimization enhances truss design by sizing members and adjusting geometries to minimize material use under load constraints, often using nonlinear programming for stress-limited problems. Seminal work on optimal topology traces to Michell's 1904 criteria for statically determinate structures, extended numerically to yield layouts using less steel than conventional designs in bridge girders. Infrastructure planning in civil networks, such as water distribution or transportation systems, employs mixed-integer programming to optimize pipe diameters or lane capacities, minimizing costs while ensuring flow demands; for example, genetic algorithms level construction resources across project phases, helping to reduce peak usage and delays in large-scale builds.
Geophysics applies least-squares optimization in seismic inversion to estimate subsurface properties from wave data, formulating the problem as minimizing the misfit \| d - G m \|^2_2, where d are the observed seismograms, G the forward modeling operator, and m the model parameters such as velocity profiles. This damped least-squares approach, regularized with Tikhonov terms, resolves ambiguities in ill-posed inverse problems, enabling accurate imaging of reservoirs with significantly improved resolution over ray-based methods in synthetic benchmarks like the Marmousi model. Originating from advances in inverse theory, it underpins full-waveform inversion for hazard assessment and resource exploration.
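The LQR design described above can be sketched numerically for a double-integrator model with illustrative weighting matrices, assuming SciPy's Riccati solver:

```python
# LQR design for a double-integrator model: solve the continuous-time
# algebraic Riccati equation and form the state-feedback gain K = R^{-1} B^T P.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])       # state matrix (position, velocity)
B = np.array([[0.0],
              [1.0]])            # control enters as acceleration
Q = np.diag([10.0, 1.0])         # state deviation penalty (illustrative)
R = np.array([[1.0]])            # control effort penalty (illustrative)

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)  # optimal feedback law u = -K x
print(K)
print(np.linalg.eigvals(A - B @ K))   # closed-loop eigenvalues (stable)
```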

Economics and Operations Research

In economics, mathematical optimization plays a central role in modeling consumer behavior through the utility maximization problem, where an individual seeks to maximize a utility function u(x) subject to a budget constraint p \cdot x \leq w, with x representing quantities of goods, p their prices, and w the available income. This formulation underpins general equilibrium theory, ensuring that individual optimizations lead to market equilibria under assumptions of convexity and completeness. The problem is typically solved using Lagrange multipliers or Kuhn-Tucker conditions for interior or boundary solutions, highlighting trade-offs in consumption. In finance, optimization techniques are applied to portfolio selection, notably through the mean-variance framework introduced by Harry Markowitz, which minimizes portfolio risk for a given expected return or maximizes return for a given risk level. The model formulates the problem as minimizing the variance \sigma_p^2 = w^T \Sigma w subject to an expected return constraint \mu^T w = r and 1^T w = 1, where w are the asset weights, \Sigma the covariance matrix of returns, and \mu the vector of expected returns (a numerical sketch follows below). This approach revolutionized investment management by quantifying diversification benefits. Operations research employs optimization for scheduling and inventory management. Job-shop scheduling, a classic problem, assigns jobs to machines to minimize makespan or total tardiness, often modeled as a mixed-integer linear program (MILP) with binary variables for sequencing and continuous variables for start times, subject to precedence and capacity constraints. Seminal formulations demonstrated that such MILP models can yield optimal schedules for small instances, though NP-hardness necessitates heuristics for larger scales. Inventory models, such as the economic order quantity (EOQ), optimize ordering policies to balance holding and setup costs, deriving the optimal order size Q = \sqrt{\frac{2DS}{H}}, where D is the annual demand, S the fixed ordering cost, and H the per-unit holding cost.
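A minimal sketch of the mean-variance formulation above, assuming SciPy and illustrative (invented) expected returns and covariance data, with short selling allowed for simplicity:

```python
# Markowitz mean-variance portfolio selection: minimize w^T Sigma w
# subject to mu^T w = r and 1^T w = 1 (illustrative data).
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.08, 0.12, 0.10])                 # expected returns (hypothetical)
Sigma = np.array([[0.10, 0.02, 0.04],
                  [0.02, 0.20, 0.06],
                  [0.04, 0.06, 0.15]])            # covariance matrix (hypothetical)
r_target = 0.10

cons = [{"type": "eq", "fun": lambda w: mu @ w - r_target},
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]

res = minimize(lambda w: w @ Sigma @ w, x0=np.ones(3) / 3,
               constraints=cons, method="SLSQP")
print(res.x, res.x @ Sigma @ res.x)               # weights and minimal variance
```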