Gradient
from Wikipedia
The gradient, represented by the blue arrows, denotes the direction of greatest change of a scalar function. The values of the function are represented in greyscale and increase in value from white (low) to dark (high).

In vector calculus, the gradient of a scalar-valued differentiable function $f$ of several variables is the vector field (or vector-valued function) $\nabla f$ whose value at a point $p$ gives the direction and the rate of fastest increase. The gradient transforms like a vector under change of basis of the space of variables of $f$. If the gradient of a function is non-zero at a point $p$, the direction of the gradient is the direction in which the function increases most quickly from $p$, and the magnitude of the gradient is the rate of increase in that direction, the greatest absolute directional derivative.[1] Further, a point where the gradient is the zero vector is known as a stationary point. The gradient thus plays a fundamental role in optimization theory, where it is used to minimize a function by gradient descent. In coordinate-free terms, the gradient of a function $f(\mathbf{r})$ may be defined by:

$$ df = \nabla f \cdot d\mathbf{r}, $$

where $df$ is the total infinitesimal change in $f$ for an infinitesimal displacement $d\mathbf{r}$, and is seen to be maximal when $d\mathbf{r}$ is in the direction of the gradient $\nabla f$. The nabla symbol $\nabla$, written as an upside-down triangle and pronounced "del", denotes the vector differential operator.

When a coordinate system is used in which the basis vectors are not functions of position, the gradient is given by the vector[a] whose components are the partial derivatives of $f$ at $p$.[2] That is, for $f \colon \mathbb{R}^n \to \mathbb{R}$, its gradient $\nabla f \colon \mathbb{R}^n \to \mathbb{R}^n$ is defined at the point $p = (x_1, \ldots, x_n)$ in n-dimensional space as the vector[b]

$$ \nabla f(p) = \left( \frac{\partial f}{\partial x_1}(p), \ldots, \frac{\partial f}{\partial x_n}(p) \right). $$

Note that the above definition of the gradient applies only if $f$ is differentiable at $p$. There can be functions for which partial derivatives exist in every direction but which nonetheless fail to be differentiable. Furthermore, this definition as the vector of partial derivatives is only valid when the basis of the coordinate system is orthonormal. For any other basis, the metric tensor at that point needs to be taken into account.

For example, the function $f(x,y) = \frac{x^2 y}{x^2 + y^2}$, except at the origin where $f(0,0) = 0$, is not differentiable at the origin as it does not have a well-defined tangent plane despite having well-defined partial derivatives in every direction at the origin.[3] In this particular example, under rotation of the x-y coordinate system, the above formula for gradient fails to transform like a vector (the gradient becomes dependent on the choice of basis for the coordinate system) and also fails to point towards the 'steepest ascent' in some orientations. For differentiable functions where the formula for gradient holds, it can be shown to always transform as a vector under transformation of the basis so as to always point towards the fastest increase.
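A short numerical sketch of this failure mode, using the standard counterexample $f(x,y) = x^2 y / (x^2 + y^2)$ with $f(0,0) = 0$ (the helper names here are illustrative):

```python
import math

# A function whose directional derivatives all exist at the origin,
# yet which is not differentiable there: the formula "grad f . u"
# fails to reproduce the directional derivatives.
def f(x, y):
    return x * x * y / (x * x + y * y) if (x, y) != (0.0, 0.0) else 0.0

def directional_derivative(a, b, t=1e-6):
    # one-sided difference quotient along the unit vector (a, b) at the origin
    return f(t * a, t * b) / t

# Both partial derivatives at the origin are 0 ...
assert abs(directional_derivative(1.0, 0.0)) < 1e-9
assert abs(directional_derivative(0.0, 1.0)) < 1e-9

# ... so the "gradient" would be (0, 0), yet along u = (1, 1)/sqrt(2) the
# actual directional derivative is a^2 * b = 1/(2*sqrt(2)), not 0.
u = 1.0 / math.sqrt(2.0)
print(directional_derivative(u, u))   # ~0.3536, but grad . u would predict 0
```

Along $u = (1,1)/\sqrt{2}$ the exact value is $a^2 b = 1/(2\sqrt{2}) \approx 0.3536$, so the vector of partial derivatives does not control the directional derivatives here.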

The gradient is dual to the total derivative $df$: the value of the gradient at a point is a tangent vector – a vector at each point; while the value of the derivative at a point is a cotangent vector – a linear functional on vectors.[c] They are related in that the dot product of the gradient of $f$ at a point $p$ with another tangent vector $\mathbf{v}$ equals the directional derivative of $f$ at $p$ along $\mathbf{v}$; that is, $\nabla f(p) \cdot \mathbf{v} = \frac{\partial f}{\partial \mathbf{v}}(p) = df_p(\mathbf{v})$. The gradient admits multiple generalizations to more general functions on manifolds; see § Generalizations.

Motivation

Gradient of the 2D function $f(x, y) = x e^{-(x^2 + y^2)}$ is plotted as arrows over the pseudocolor plot of the function.

Consider a room where the temperature is given by a scalar field, T, so at each point (x, y, z) the temperature is T(x, y, z), independent of time. At each point in the room, the gradient of T at that point will show the direction in which the temperature rises most quickly, moving away from (x, y, z). The magnitude of the gradient will determine how fast the temperature rises in that direction.

Consider a surface whose height above sea level at point (x, y) is H(x, y). The gradient of H at a point is a plane vector pointing in the direction of the steepest slope or grade at that point. The steepness of the slope at that point is given by the magnitude of the gradient vector.

The gradient can also be used to measure how a scalar field changes in other directions, rather than just the direction of greatest change, by taking a dot product. Suppose that the steepest slope on a hill is 40%. A road going directly uphill has slope 40%, but a road going around the hill at an angle will have a shallower slope. For example, if the road is at a 60° angle from the uphill direction (when both directions are projected onto the horizontal plane), then the slope along the road will be the dot product between the gradient vector and a unit vector along the road, as the dot product measures how much the unit vector along the road aligns with the steepest slope,[d] which is 40% times the cosine of 60°, or 20%.
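The 20% figure can be checked with a minimal Python sketch (the numbers are the ones from the example; the variable names are illustrative):

```python
import math

# Slope along a road as the dot product of the gradient with a unit
# direction vector.  The steepest slope is 40%, so the gradient of the
# height function (projected to the horizontal plane) has magnitude 0.40;
# align it with the x-axis for convenience.
grad_H = (0.40, 0.0)

angle = math.radians(60)                          # road is 60 degrees off the uphill direction
road_dir = (math.cos(angle), math.sin(angle))     # unit vector along the road

slope_along_road = grad_H[0] * road_dir[0] + grad_H[1] * road_dir[1]
print(round(slope_along_road, 2))   # 0.2, i.e. a 20% grade
```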

More generally, if the hill height function H is differentiable, then the gradient of H dotted with a unit vector gives the slope of the hill in the direction of the vector, the directional derivative of H along the unit vector.

Notation


The gradient of a function $f$ at a point $a$ is usually written as $\nabla f(a)$. It may also be denoted by any of the following:

  • $\vec{\nabla} f(a)$ : to emphasize the vector nature of the result.
  • $\partial_i f$ and $f_i$ : written with Einstein notation, where repeated indices ($i$) are summed over.

Definition

The gradient of the function $f(x,y) = -(\cos^2 x + \cos^2 y)^2$ depicted as a projected vector field on the bottom plane.

The gradient (or gradient vector field) of a scalar function $f(x_1, x_2, x_3, \ldots, x_n)$ is denoted $\nabla f$ or $\vec{\nabla} f$, where $\nabla$ (nabla) denotes the vector differential operator, del. The notation $\operatorname{grad} f$ is also commonly used to represent the gradient. The gradient of $f$ is defined as the unique vector field whose dot product with any vector $\mathbf{v}$ at each point $x$ is the directional derivative of $f$ along $\mathbf{v}$. That is,

$$ \big(\nabla f(x)\big) \cdot \mathbf{v} = D_{\mathbf{v}} f(x), $$

where the right-hand side is the directional derivative and there are many ways to represent it. Formally, the derivative is dual to the gradient; see relationship with derivative.

When a function also depends on a parameter such as time, the gradient often refers simply to the vector of its spatial derivatives only (see Spatial gradient).

The magnitude and direction of the gradient vector are independent of the particular coordinate representation.[4][5]

Cartesian coordinates


In the three-dimensional Cartesian coordinate system with a Euclidean metric, the gradient, if it exists, is given by

$$ \nabla f = \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} + \frac{\partial f}{\partial z} \mathbf{k}, $$

where i, j, k are the standard unit vectors in the directions of the x, y and z coordinates, respectively. For example, the gradient of the function $f(x, y, z) = 2x + 3y^2 - \sin(z)$ is

$$ \nabla f = 2\mathbf{i} + 6y\,\mathbf{j} - \cos(z)\,\mathbf{k}. $$

In some applications it is customary to represent the gradient as a row vector or column vector of its components in a rectangular coordinate system; this article follows the convention of the gradient being a column vector, while the derivative is a row vector.
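As a quick numerical sanity check of the component formula, a central-difference sketch (the sample function and helper names are illustrative):

```python
import math

# Central-difference check of the Cartesian gradient: each component of
# grad f approximates the partial derivative along one coordinate axis.
# Sample function: f(x, y, z) = 2x + 3y^2 - sin(z), so grad f = (2, 6y, -cos z).
def f(x, y, z):
    return 2 * x + 3 * y ** 2 - math.sin(z)

def numerical_gradient(g, p, h=1e-6):
    grads = []
    for i in range(len(p)):
        plus = list(p); plus[i] += h
        minus = list(p); minus[i] -= h
        grads.append((g(*plus) - g(*minus)) / (2 * h))
    return grads

p = (1.0, 2.0, 0.5)
analytic = [2.0, 6 * p[1], -math.cos(p[2])]
numeric = numerical_gradient(f, p)
print(all(abs(a - n) < 1e-5 for a, n in zip(analytic, numeric)))  # True
```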

Cylindrical and spherical coordinates


In cylindrical coordinates, the gradient is given by:[6]

$$ \nabla f(\rho, \varphi, z) = \frac{\partial f}{\partial \rho} \mathbf{e}_\rho + \frac{1}{\rho} \frac{\partial f}{\partial \varphi} \mathbf{e}_\varphi + \frac{\partial f}{\partial z} \mathbf{e}_z, $$

where ρ is the axial distance, φ is the azimuthal or azimuth angle, z is the axial coordinate, and $\mathbf{e}_\rho$, $\mathbf{e}_\varphi$ and $\mathbf{e}_z$ are unit vectors pointing along the coordinate directions.

In spherical coordinates with a Euclidean metric, the gradient is given by:[6]

$$ \nabla f(r, \theta, \varphi) = \frac{\partial f}{\partial r} \mathbf{e}_r + \frac{1}{r} \frac{\partial f}{\partial \theta} \mathbf{e}_\theta + \frac{1}{r \sin\theta} \frac{\partial f}{\partial \varphi} \mathbf{e}_\varphi, $$

where r is the radial distance, φ is the azimuthal angle and θ is the polar angle, and $\mathbf{e}_r$, $\mathbf{e}_\theta$ and $\mathbf{e}_\varphi$ are again local unit vectors pointing in the coordinate directions (that is, the normalized covariant basis).

For the gradient in other orthogonal coordinate systems, see Orthogonal coordinates (Differential operators in three dimensions).
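These coordinate formulas can be verified numerically. A sketch, with an arbitrary sample function, that the cylindrical-coordinate gradient agrees with the Cartesian one after rotating the local basis back to the Cartesian frame:

```python
import math

# f in cylindrical coordinates: f(rho, phi, z) = rho^2 cos(phi) + z
# The same f in Cartesian coordinates: f(x, y, z) = x * sqrt(x^2 + y^2) + z

def grad_cylindrical(rho, phi, z):
    # (df/drho, (1/rho) df/dphi, df/dz) in the local basis (e_rho, e_phi, e_z)
    return (2 * rho * math.cos(phi), -rho * math.sin(phi), 1.0)

def to_cartesian_vector(v, phi):
    # rotate the local cylindrical basis back to the Cartesian basis
    vr, vphi, vz = v
    return (vr * math.cos(phi) - vphi * math.sin(phi),
            vr * math.sin(phi) + vphi * math.cos(phi),
            vz)

def grad_cartesian(x, y, z, h=1e-6):
    f = lambda x, y, z: x * math.hypot(x, y) + z
    return ((f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
            (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
            (f(x, y, z + h) - f(x, y, z - h)) / (2 * h))

rho, phi, z = 2.0, 0.7, 1.0
x, y = rho * math.cos(phi), rho * math.sin(phi)
a = to_cartesian_vector(grad_cylindrical(rho, phi, z), phi)
b = grad_cartesian(x, y, z)
print(all(abs(p - q) < 1e-5 for p, q in zip(a, b)))  # True
```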

General coordinates


We consider general coordinates, which we write as x1, …, xi, …, xn, where n is the number of dimensions of the domain. Here, the upper index refers to the position in the list of the coordinate or component, so x2 refers to the second component—not the quantity x squared. The index variable i refers to an arbitrary element xi. Using Einstein notation, the gradient can then be written as:

$$ \nabla f = \frac{\partial f}{\partial x^i} g^{ij} \mathbf{e}_j $$

(note that its dual is $df = \frac{\partial f}{\partial x^i} \mathbf{e}^i$),

where $\mathbf{e}_i$ and $\mathbf{e}^i$ refer to the unnormalized local covariant and contravariant bases respectively, $g^{ij}$ is the inverse metric tensor, and the Einstein summation convention implies summation over i and j.

If the coordinates are orthogonal we can easily express the gradient (and the differential) in terms of the normalized bases, which we refer to as $\hat{\mathbf{e}}_i$ and $\hat{\mathbf{e}}^i$, using the scale factors (also known as Lamé coefficients) $h_i = \lVert \mathbf{e}_i \rVert = \sqrt{g_{ii}}$:

$$ \nabla f = \sum_{i=1}^{n} \frac{1}{h_i} \frac{\partial f}{\partial x^i} \hat{\mathbf{e}}_i \qquad \left(\text{and } df = \sum_{i=1}^{n} \frac{1}{h_i} \frac{\partial f}{\partial x^i} \hat{\mathbf{e}}^i\right), $$

where we cannot use Einstein notation, since it is impossible to avoid the repetition of more than two indices. Despite the use of upper and lower indices, $\hat{\mathbf{e}}_i$, $\hat{\mathbf{e}}^i$, and $h_i$ are neither contravariant nor covariant.

The latter expression evaluates to the expressions given above for cylindrical and spherical coordinates.

Relationship with derivative


Relationship with total derivative


The gradient is closely related to the total derivative (total differential) $df$: they are transpose (dual) to each other. Using the convention that vectors in $\mathbb{R}^n$ are represented by column vectors, and that covectors (linear maps $\mathbb{R}^n \to \mathbb{R}$) are represented by row vectors,[a] the gradient and the derivative are expressed as a column and row vector, respectively, with the same components, but transpose of each other:

$$ \nabla f(p) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(p) \\ \vdots \\ \dfrac{\partial f}{\partial x_n}(p) \end{bmatrix}; \qquad df_p = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(p) & \cdots & \dfrac{\partial f}{\partial x_n}(p) \end{bmatrix}. $$

While these both have the same components, they differ in what kind of mathematical object they represent: at each point, the derivative is a cotangent vector, a linear form (or covector) which expresses how much the (scalar) output changes for a given infinitesimal change in (vector) input, while at each point, the gradient is a tangent vector, which represents an infinitesimal change in (vector) input. In symbols, the gradient is an element of the tangent space at a point, $\nabla f(p) \in T_p \mathbb{R}^n$, while the derivative is a map from the tangent space to the real numbers, $df_p \colon T_p \mathbb{R}^n \to \mathbb{R}$. The tangent spaces at each point of $\mathbb{R}^n$ can be "naturally" identified[e] with the vector space $\mathbb{R}^n$ itself, and similarly the cotangent space at each point can be naturally identified with the dual vector space $(\mathbb{R}^n)^*$ of covectors; thus the value of the gradient at a point can be thought of as a vector in the original $\mathbb{R}^n$, not just as a tangent vector.

Computationally, given a tangent vector, the vector can be multiplied by the derivative (as matrices), which is equal to taking the dot product with the gradient:

$$ (df_p)(v) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(p) & \cdots & \dfrac{\partial f}{\partial x_n}(p) \end{bmatrix} \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(p)\, v_i = \nabla f(p) \cdot v. $$
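A tiny sketch of this row-times-column versus dot-product identity, using the sample function $f(x,y) = xy$ at the point $(2,3)$, where both objects have components $(y, x) = (3, 2)$:

```python
# The derivative acting on a tangent vector (matrix product of a row with
# a column) equals the dot product of the gradient with that vector.
grad_f = [3.0, 2.0]     # gradient at (2, 3): column vector
df = [3.0, 2.0]         # derivative at (2, 3): row vector, same components

v = [0.5, -1.0]         # an arbitrary tangent vector

matrix_product = sum(df_i * v_i for df_i, v_i in zip(df, v))   # (df_p)(v)
dot_product = sum(g_i * v_i for g_i, v_i in zip(grad_f, v))    # grad f . v
print(matrix_product == dot_product)  # True
```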

Differential or (exterior) derivative


The best linear approximation to a differentiable function $f \colon \mathbb{R}^n \to \mathbb{R}$ at a point $x$ in $\mathbb{R}^n$ is a linear map from $\mathbb{R}^n$ to $\mathbb{R}$ which is often denoted by $df_x$ or $Df(x)$ and called the differential or total derivative of $f$ at $x$. The function $df$, which maps $x$ to $df_x$, is called the total differential or exterior derivative of $f$ and is an example of a differential 1-form.

Much as the derivative of a function of a single variable represents the slope of the tangent to the graph of the function,[7] the directional derivative of a function in several variables represents the slope of the tangent hyperplane in the direction of the vector.

The gradient is related to the differential by the formula $(\nabla f)_x \cdot v = df_x(v)$ for any $v \in \mathbb{R}^n$, where $\cdot$ is the dot product: taking the dot product of a vector with the gradient is the same as taking the directional derivative along the vector.

If $\mathbb{R}^n$ is viewed as the space of (dimension $n$) column vectors (of real numbers), then one can regard $df$ as the row vector with components $\left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)$, so that $df_x(v)$ is given by matrix multiplication. Assuming the standard Euclidean metric on $\mathbb{R}^n$, the gradient is then the corresponding column vector, that is, $(\nabla f)_i = (df)_i^{\mathsf T}$.

Linear approximation to a function


The best linear approximation to a function can be expressed in terms of the gradient, rather than the derivative. The gradient of a function $f$ from the Euclidean space $\mathbb{R}^n$ to $\mathbb{R}$ at any particular point $x_0$ in $\mathbb{R}^n$ characterizes the best linear approximation to $f$ at $x_0$. The approximation is as follows:

$$ f(x) \approx f(x_0) + (\nabla f)_{x_0} \cdot (x - x_0) $$

for $x$ close to $x_0$, where $(\nabla f)_{x_0}$ is the gradient of $f$ computed at $x_0$, and the dot denotes the dot product on $\mathbb{R}^n$. This equation is equivalent to the first two terms in the multivariable Taylor series expansion of $f$ at $x_0$.
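The quality of the linear approximation can be checked numerically: since the dropped Taylor terms are second order, the error should shrink roughly quadratically in the displacement. A sketch with an arbitrary sample function:

```python
import math

# First-order (gradient) approximation of f near x0, with
# f(x, y) = exp(x) * sin(y) at x0 = (0, 1).
def f(x, y):
    return math.exp(x) * math.sin(y)

x0 = (0.0, 1.0)
grad = (math.exp(x0[0]) * math.sin(x0[1]),   # df/dx = e^x sin y
        math.exp(x0[0]) * math.cos(x0[1]))   # df/dy = e^x cos y

def linear_approx(x, y):
    return f(*x0) + grad[0] * (x - x0[0]) + grad[1] * (y - x0[1])

errs = []
for h in (1e-1, 1e-2, 1e-3):
    err = abs(f(x0[0] + h, x0[1] + h) - linear_approx(x0[0] + h, x0[1] + h))
    errs.append(err)
    print(h, err)   # the error scales roughly like h^2
```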

Relationship with Fréchet derivative


Let $U$ be an open set in $\mathbb{R}^n$. If the function $f \colon U \to \mathbb{R}$ is differentiable, then the differential of $f$ is the Fréchet derivative of $f$. Thus $\nabla f$ is a function from $U$ to the space $\mathbb{R}^n$ such that

$$ \lim_{h \to 0} \frac{f(x+h) - f(x) - \nabla f(x) \cdot h}{\|h\|} = 0, $$

where $\cdot$ is the dot product.

As a consequence, the usual properties of the derivative hold for the gradient, though the gradient is not a derivative itself, but rather dual to the derivative:

Linearity
The gradient is linear in the sense that if $f$ and $g$ are two real-valued functions differentiable at the point $a \in \mathbb{R}^n$, and $\alpha$ and $\beta$ are two constants, then $\alpha f + \beta g$ is differentiable at $a$, and moreover

$$ \nabla(\alpha f + \beta g)(a) = \alpha \nabla f(a) + \beta \nabla g(a). $$
Product rule
If $f$ and $g$ are real-valued functions differentiable at a point $a \in \mathbb{R}^n$, then the product rule asserts that the product $fg$ is differentiable at $a$, and

$$ \nabla(fg)(a) = f(a)\,\nabla g(a) + g(a)\,\nabla f(a). $$
Chain rule
Suppose that $f \colon A \to \mathbb{R}$ is a real-valued function defined on a subset $A$ of $\mathbb{R}^n$, and that $f$ is differentiable at a point $a$. There are two forms of the chain rule applying to the gradient. First, suppose that the function $g$ is a parametric curve; that is, a function $g \colon I \to \mathbb{R}^n$ maps a subset $I \subset \mathbb{R}$ into $\mathbb{R}^n$. If $g$ is differentiable at a point $c \in I$ such that $g(c) = a$, then

$$ (f \circ g)'(c) = \nabla f(a) \cdot g'(c), $$

where ∘ is the composition operator: $(f \circ g)(x) = f(g(x))$.

More generally, if instead $I \subset \mathbb{R}^k$, then the following holds:

$$ \nabla (f \circ g)(c) = \big(Dg(c)\big)^{\mathsf T} \big(\nabla f(a)\big), $$

where $(Dg)^{\mathsf T}$ denotes the transpose Jacobian matrix.

For the second form of the chain rule, suppose that $h \colon I \to \mathbb{R}$ is a real-valued function on a subset $I$ of $\mathbb{R}$, and that $h$ is differentiable at the point $f(a) \in I$. Then

$$ \nabla (h \circ f)(a) = h'\big(f(a)\big)\, \nabla f(a). $$
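The first form of the chain rule can be verified numerically; a sketch with sample choices $f(x,y) = x^2 + 3y$ and the curve $g(t) = (\cos t,\ t^2)$:

```python
import math

# Check that d/dt f(g(t)) = grad f(g(t)) . g'(t).
def f(x, y):
    return x * x + 3 * y

t = 0.8
g = (math.cos(t), t * t)
g_prime = (-math.sin(t), 2 * t)

grad_f = (2 * g[0], 3.0)                 # grad f = (2x, 3) evaluated at g(t)
chain = grad_f[0] * g_prime[0] + grad_f[1] * g_prime[1]

h = 1e-6                                 # central-difference derivative of f(g(t))
num = (f(math.cos(t + h), (t + h) ** 2) - f(math.cos(t - h), (t - h) ** 2)) / (2 * h)
print(abs(chain - num) < 1e-5)  # True
```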

Further properties and applications


Level sets


A level surface, or isosurface, is the set of all points where some function has a given value.

If $f$ is differentiable, then the dot product $(\nabla f)_x \cdot v$ of the gradient at a point $x$ with a vector $v$ gives the directional derivative of $f$ at $x$ in the direction $v$. It follows that in this case the gradient of $f$ is orthogonal to the level sets of $f$. For example, a level surface in three-dimensional space is defined by an equation of the form $F(x, y, z) = c$. The gradient of $F$ is then normal to the surface.

More generally, any embedded hypersurface in a Riemannian manifold can be cut out by an equation of the form F(P) = 0 such that dF is nowhere zero. The gradient of F is then normal to the hypersurface.

Similarly, an affine algebraic hypersurface may be defined by an equation F(x1, ..., xn) = 0, where F is a polynomial. The gradient of F is zero at a singular point of the hypersurface (this is the definition of a singular point). At a non-singular point, it is a nonzero normal vector.
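The orthogonality of the gradient to level sets can be seen in a minimal sketch, using the circle $F(x,y) = x^2 + y^2 = c$ as the level curve (whose tangent at $(x, y)$ is $(-y, x)$):

```python
import math

# The gradient of F is normal to a level set of F.
def grad_F(x, y):
    return (2 * x, 2 * y)

# a point on the circle of radius 2 (the level curve F = 4)
x, y = 2.0 * math.cos(0.3), 2.0 * math.sin(0.3)
tangent = (-y, x)                 # tangent direction to the level curve at (x, y)

g = grad_F(x, y)
dot = g[0] * tangent[0] + g[1] * tangent[1]
print(abs(dot) < 1e-12)  # True: gradient and tangent are orthogonal
```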

Conservative vector fields and the gradient theorem


The gradient of a function is called a gradient field. A (continuous) gradient field is always a conservative vector field: its line integral along any path depends only on the endpoints of the path, and can be evaluated by the gradient theorem (the fundamental theorem of calculus for line integrals). Conversely, a (continuous) conservative vector field is always the gradient of a function.
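The gradient theorem can be checked numerically by integrating a gradient field along a path with a Riemann sum; a sketch with the sample potential $f(x,y) = x^2 y$ and a quarter-circle path from $(1,0)$ to $(0,1)$:

```python
import math

# Line integral of grad f along a quarter circle, compared with the
# difference of f at the endpoints (the gradient theorem).
def f(x, y):
    return x * x * y

def grad_f(x, y):
    return (2 * x * y, x * x)

n = 20000
total = 0.0
for k in range(n):
    t0 = (math.pi / 2) * k / n
    t1 = (math.pi / 2) * (k + 1) / n
    tm = 0.5 * (t0 + t1)
    x, y = math.cos(tm), math.sin(tm)                       # midpoint of segment
    dx, dy = math.cos(t1) - math.cos(t0), math.sin(t1) - math.sin(t0)
    gx, gy = grad_f(x, y)
    total += gx * dx + gy * dy                              # grad f . dr

endpoints = f(0.0, 1.0) - f(1.0, 0.0)                       # depends only on endpoints
print(abs(total - endpoints) < 1e-6)  # True
```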

Gradient is direction of steepest ascent


The gradient of a function at a point x is also the direction of its steepest ascent; that is, it maximizes the directional derivative:

Let $v$ be an arbitrary unit vector. With the directional derivative defined as

$$ \nabla_v f(x) = \lim_{h \to 0} \frac{f(x + hv) - f(x)}{h}, $$

we get, by substituting the function $f(x + hv)$ with its Taylor series,

$$ \nabla_v f(x) = \lim_{h \to 0} \frac{f(x) + h\, \nabla f(x) \cdot v + O(h^2) - f(x)}{h}, $$

where $O(h^2)$ denotes higher order terms in $h$.

Dividing by $h$, and taking the limit, yields a term which is bounded from above by the Cauchy–Schwarz inequality[8]

$$ \nabla_v f(x) = \nabla f(x) \cdot v \le \|\nabla f(x)\| \, \|v\| = \|\nabla f(x)\|. $$

Choosing $v^* = \nabla f(x) / \|\nabla f(x)\|$ maximizes the directional derivative, and equals the upper bound:

$$ \nabla_{v^*} f(x) = \|\nabla f(x)\|. $$
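A brute-force sketch of this fact, scanning unit vectors with an assumed sample gradient $(3, 4)$ of magnitude 5:

```python
import math

# The directional derivative grad f . v over unit vectors v is maximized
# when v points along the gradient, and the maximum equals |grad f|.
grad = (3.0, 4.0)
norm = math.hypot(*grad)          # |grad f| = 5

best = max(grad[0] * math.cos(a) + grad[1] * math.sin(a)
           for a in (2 * math.pi * k / 3600 for k in range(3600)))

print(abs(best - norm) < 1e-3)    # True: maximum over the angle grid is ~|grad f|,
                                  # attained near v = grad / |grad|
```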

Generalizations


Jacobian


The Jacobian matrix is the generalization of the gradient for vector-valued functions of several variables and differentiable maps between Euclidean spaces or, more generally, manifolds.[9][10] A further generalization for a function between Banach spaces is the Fréchet derivative.

Suppose $f \colon \mathbb{R}^n \to \mathbb{R}^m$ is a function such that each of its first-order partial derivatives exists on $\mathbb{R}^n$. Then the Jacobian matrix of $f$ is defined to be an $m \times n$ matrix, denoted by $\mathbf{J}_f$ or simply $\mathbf{J}$. The $(i,j)$th entry is $\mathbf{J}_{ij} = \frac{\partial f_i}{\partial x_j}$. Explicitly

$$ \mathbf{J} = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{bmatrix}. $$
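A finite-difference sketch of the Jacobian for a sample map $f \colon \mathbb{R}^2 \to \mathbb{R}^3$ (helper names are illustrative):

```python
import math

# Entry (i, j) of the Jacobian approximates the partial of f_i with
# respect to x_j, here via central differences.
def f(x, y):
    return (x * y, math.sin(x), x + y ** 2)

def jacobian(g, p, h=1e-6):
    m = len(g(*p))
    J = [[0.0] * len(p) for _ in range(m)]
    for j in range(len(p)):
        plus = list(p); plus[j] += h
        minus = list(p); minus[j] -= h
        gp, gm = g(*plus), g(*minus)
        for i in range(m):
            J[i][j] = (gp[i] - gm[i]) / (2 * h)
    return J

p = (1.0, 2.0)
J = jacobian(f, p)
# analytic Jacobian at (1, 2): [[y, x], [cos x, 0], [1, 2y]]
expected = {(0, 0): 2.0, (0, 1): 1.0,
            (1, 0): math.cos(1.0), (1, 1): 0.0,
            (2, 0): 1.0, (2, 1): 4.0}
print(all(abs(J[i][j] - a) < 1e-5 for (i, j), a in expected.items()))  # True
```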

Gradient of a vector field


Since the total derivative of a vector field is a linear mapping from vectors to vectors, it is a tensor quantity.

In rectangular coordinates, the gradient of a vector field $\mathbf{f} = (f^1, f^2, f^3)$ is defined by:

$$ \nabla \mathbf{f} = g^{jk} \frac{\partial f^i}{\partial x^j} \mathbf{e}_i \otimes \mathbf{e}_k $$

(where the Einstein summation notation is used and the tensor product of the vectors $\mathbf{e}_i$ and $\mathbf{e}_k$ is a dyadic tensor of type (2,0)). Overall, this expression equals the transpose of the Jacobian matrix:

$$ \frac{\partial f^i}{\partial x^j} = \frac{\partial (f^1, f^2, f^3)}{\partial (x^1, x^2, x^3)}. $$

In curvilinear coordinates, or more generally on a curved manifold, the gradient involves Christoffel symbols:

$$ \nabla \mathbf{f} = g^{jk} \left( \frac{\partial f^i}{\partial x^j} + \Gamma^i_{jl} f^l \right) \mathbf{e}_i \otimes \mathbf{e}_k, $$

where $g^{jk}$ are the components of the inverse metric tensor and the $\mathbf{e}_i$ are the coordinate basis vectors.

Expressed more invariantly, the gradient of a vector field $\mathbf{f}$ can be defined by the Levi-Civita connection and metric tensor:[11]

$$ \nabla^a f^b = g^{ac} \nabla_c f^b, $$

where $\nabla_c$ is the connection.

Riemannian manifolds


For any smooth function $f$ on a Riemannian manifold $(M, g)$, the gradient of $f$ is the vector field $\nabla f$ such that for any vector field $X$,

$$ g(\nabla f, X) = \partial_X f, $$

that is,

$$ g_x\big((\nabla f)_x, X_x\big) = (\partial_X f)(x), $$

where $g_x(\cdot, \cdot)$ denotes the inner product of tangent vectors at $x$ defined by the metric $g$ and $\partial_X f$ is the function that takes any point $x \in M$ to the directional derivative of $f$ in the direction $X$, evaluated at $x$. In other words, in a coordinate chart $\varphi$ from an open subset of $M$ to an open subset of $\mathbb{R}^n$, $(\partial_X f)(x)$ is given by:

$$ \sum_{j} X^{j}(\varphi(x)) \frac{\partial}{\partial x_j}\big(f \circ \varphi^{-1}\big)\Big|_{\varphi(x)}, $$

where $X^j$ denotes the $j$th component of $X$ in this coordinate chart.

So, the local form of the gradient takes the form:

$$ \nabla f = g^{ik} \frac{\partial f}{\partial x^k} \mathbf{e}_i. $$

Generalizing the case $M = \mathbb{R}^n$, the gradient of a function is related to its exterior derivative, since

$$ (\partial_X f)(x) = (df)_x(X_x). $$

More precisely, the gradient $\nabla f$ is the vector field associated to the differential 1-form $df$ using the musical isomorphism $\sharp = \sharp^g \colon T^*M \to TM$ (called "sharp") defined by the metric $g$. The relation between the exterior derivative and the gradient of a function on $\mathbb{R}^n$ is a special case of this in which the metric is the flat metric given by the dot product.
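The coordinate formula $\nabla f = g^{ik}\,\partial_k f\, \mathbf{e}_i$ can be sketched concretely in polar coordinates on the plane, where the metric is $g = \operatorname{diag}(1, r^2)$; the length of the gradient measured with $g$ must agree with the Euclidean $|\nabla f|$ computed in Cartesian coordinates for the same function (here $f = r^2 \cos\theta$, i.e. $f = x\sqrt{x^2+y^2}$):

```python
import math

r, theta = 2.0, 0.5

# contravariant components grad^i = g^{ij} df/dx^j with g^{-1} = diag(1, 1/r^2)
grad_r = 2 * r * math.cos(theta)                  # g^{rr} * df/dr
grad_t = (-r ** 2 * math.sin(theta)) / r ** 2     # g^{tt} * df/dtheta

# squared length in the metric: g_rr (grad_r)^2 + g_tt (grad_t)^2
length_sq_polar = grad_r ** 2 + r ** 2 * grad_t ** 2

# Cartesian check via central differences on f(x, y) = x * sqrt(x^2 + y^2)
def f(x, y):
    return x * math.hypot(x, y)

x, y, h = r * math.cos(theta), r * math.sin(theta), 1e-6
fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
print(abs(length_sq_polar - (fx ** 2 + fy ** 2)) < 1e-4)  # True
```

The agreement reflects that the gradient is a geometric object: its components change with the chart, but its metric length does not.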

from Grokipedia
In mathematics and physics, the gradient of a scalar-valued differentiable function $ f $ of several variables is a vector field that points in the direction of the function's steepest increase at each point and whose magnitude equals the rate of that increase.[1] For a function $ f(x, y, z) $ in three dimensions, the gradient is formally defined as the vector $ \nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right) $, where the components are the partial derivatives of $ f $ with respect to each variable.[2] This operator, symbolized by the nabla $ \nabla $, was introduced by William Rowan Hamilton in 1853 as part of his work on quaternions and vector analysis.[3] The gradient plays a central role in multivariable calculus, where it enables the computation of directional derivatives: the directional derivative of $ f $ in the direction of a unit vector $ \mathbf{u} $ is given by the dot product $ \nabla f \cdot \mathbf{u} $.[4] Geometrically, level surfaces (or curves in 2D) of $ f $ are perpendicular to the gradient vector at every point, making $ \nabla f $ normal to these surfaces and useful for finding tangent planes.[1] In physics, the gradient describes conservative force fields, such as the gravitational or electric field, where the force on a particle is $ \mathbf{F} = - \nabla V $ for a potential $ V $.[5] Beyond pure mathematics, the gradient is foundational in optimization algorithms like gradient descent, which iteratively adjusts parameters to minimize a loss function by moving opposite to the gradient direction. It also appears in fluid dynamics for pressure gradients driving flow and in computer graphics for shading based on surface normals derived from gradients.[6] These applications underscore the gradient's versatility across disciplines, from theoretical analysis to practical computations in engineering and machine learning.

Basic Concepts

Motivation and Intuition

The concept of the gradient emerged in the 19th century as part of the development of vector calculus, building on the foundations of partial derivatives established earlier. Partial derivatives, which capture how a multivariable function varies with respect to one independent variable while treating others as constant, were systematically developed by Leonhard Euler around 1734, with notation refinements by Carl Gustav Jacob Jacobi in the 1840s. The gradient itself took shape through William Rowan Hamilton's introduction of quaternions in 1843 and the nabla operator in 1853, which laid groundwork for vector operations, and was fully articulated in modern form by J. Willard Gibbs and Oliver Heaviside in the 1880s as they separated scalar and vector components in calculus.[7] Intuitively, the gradient generalizes the idea of a slope to functions of multiple variables, providing a directional measure of change in a scalar field across multidimensional space. At any point, it indicates the path of most rapid increase in the function's value, much like following the steepest uphill route on a hilly landscape, with its length reflecting the sharpness of that rise. This vectorial perspective allows for a unified understanding of variation in all directions, bridging single-variable derivatives to complex spatial behaviors without relying on isolated one-dimensional slices.[8] Physically, the gradient underlies many natural processes by quantifying how scalar quantities like temperature or potential evolve in space, driving flows and forces accordingly. In heat transfer, for example, the temperature gradient determines the direction of thermal conduction, where heat moves perpendicular to isotherms from hotter to cooler regions, as heat flux is proportional to this gradient per Fourier's law established in 1822. Likewise, in gravitational contexts, the gradient of the potential field points toward decreasing potential, aligning with the direction of attractive force and exemplifying how such vectors model conservative systems in mechanics.[9] As a cornerstone of multivariable calculus, the gradient establishes essential intuition for analyzing scalar fields—functions assigning values to points in space—before formal mathematical treatments. It underscores why tracking multidimensional changes matters for modeling real-world scenarios involving multiple influences, such as environmental variations or fluid dynamics, setting the stage for deeper explorations in optimization and field theory.[10]

Notation

The gradient of a scalar function $f$, denoted as $\nabla f$ or $\boldsymbol{\nabla} f$, represents the vector field consisting of its partial derivatives, where $\nabla$ is the nabla symbol or del operator.[11] In vector form, it is often written using boldface notation, such as $\boldsymbol{\nabla} f$, to emphasize its status as a vector.[12] The nabla operator $\nabla$ itself is a vector differential operator, commonly expressed in Cartesian coordinates as $\nabla = \hat{\mathbf{i}} \frac{\partial}{\partial x} + \hat{\mathbf{j}} \frac{\partial}{\partial y} + \hat{\mathbf{k}} \frac{\partial}{\partial z}$, acting on $f$ to yield the gradient vector.[13] Variations in notation include index form, where the $i$-th component of the gradient is $\frac{\partial f}{\partial x_i}$ for coordinates $x_i$, useful in higher-dimensional or tensorial settings.[14] In computational and optimization contexts, the gradient may appear as a column matrix or vector, such as $\nabla f = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix}$, facilitating numerical implementations.[15] Conventions distinguish the gradient from related operators: applied to a scalar field, $\nabla f$ produces a vector, whereas the divergence $\nabla \cdot \mathbf{v}$ (for a vector field $\mathbf{v}$) yields a scalar, and the curl $\nabla \times \mathbf{v}$ yields a vector, ensuring no ambiguity in multivariable calculus.[11] In mathematics, $\nabla f$ is the predominant notation, while physics texts often prefer $\operatorname{grad} f$ for clarity in electromagnetic or fluid dynamics applications.[16] This notation will appear consistently in subsequent equations, such as the simple two-dimensional example $\nabla (x^2 + y^2) = (2x, 2y)$, illustrating the vector pointing in the direction of steepest ascent without specifying coordinate systems here.[13]

Definition

In Cartesian Coordinates

In Cartesian coordinates, the gradient of a scalar-valued function $f \colon \mathbb{R}^n \to \mathbb{R}$ defined on an open set in Euclidean space is a vector whose components are the partial derivatives of $f$ with respect to each coordinate variable. Specifically, at a point $\mathbf{x} = (x_1, x_2, \dots, x_n)$, the gradient is given by

$$ \nabla f(\mathbf{x}) = \left( \frac{\partial f}{\partial x_1}(\mathbf{x}), \frac{\partial f}{\partial x_2}(\mathbf{x}), \dots, \frac{\partial f}{\partial x_n}(\mathbf{x}) \right). $$

This assumes that the partial derivatives exist at $\mathbf{x}$.[17][11] In two dimensions, for $f(x, y)$, the gradient takes the form

$$ \nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right), $$

while in three dimensions, for $f(x, y, z)$, it is

$$ \nabla f(x, y, z) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right) = \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} + \frac{\partial f}{\partial z} \mathbf{k}. $$

These expressions hold assuming the partial derivatives exist at the point of interest, on an open domain in $\mathbb{R}^2$ or $\mathbb{R}^3$.[18][11] To compute the gradient, evaluate each partial derivative separately by treating the other variables as constants and differentiating with respect to the respective coordinate; the resulting vector components are assembled at the point of interest. For example, consider $f(x, y) = x^2 + y^2$; the partial with respect to $x$ is $2x$, and with respect to $y$ is $2y$, yielding $\nabla f(x, y) = (2x, 2y)$. This vector is normal to the level curves of $f$, which are circles centered at the origin.[18][11]

In Curvilinear Coordinates

In orthogonal curvilinear coordinate systems, the gradient of a scalar function $f$ accounts for the local geometry through scale factors, which adjust the partial derivatives to reflect the varying metric of the coordinate basis.[19] These systems are particularly useful for problems with cylindrical or spherical symmetry, where the coordinate curves align with the physical domain.[20] The general expression for the gradient in an orthogonal curvilinear system with coordinates $(u_1, u_2, u_3)$ and corresponding scale factors $h_1, h_2, h_3$ is

$$ \nabla f = \sum_{i=1}^3 \frac{1}{h_i} \frac{\partial f}{\partial u_i} \hat{e}_i, $$

where $\hat{e}_i$ are the unit basis vectors along each coordinate direction.[19] The scale factors $h_i$ are defined as $h_i = |\partial \mathbf{r}/\partial u_i|$, quantifying the infinitesimal arc length per unit change in $u_i$.[20] Cartesian coordinates represent a special case where all $h_i = 1$.[19] In cylindrical coordinates $(\rho, \phi, z)$, the scale factors are $h_\rho = 1$, $h_\phi = \rho$, and $h_z = 1$.[21] Thus, the gradient takes the form

$$ \nabla f = \frac{\partial f}{\partial \rho} \hat{e}_\rho + \frac{1}{\rho} \frac{\partial f}{\partial \phi} \hat{e}_\phi + \frac{\partial f}{\partial z} \hat{e}_z. $$

This expression arises from the metric in cylindrical systems, where the azimuthal direction stretches with radius $\rho$.[22] For spherical coordinates $(r, \theta, \phi)$, the scale factors are $h_r = 1$, $h_\theta = r$, and $h_\phi = r \sin \theta$.[21] The gradient is then

$$ \nabla f = \frac{\partial f}{\partial r} \hat{e}_r + \frac{1}{r} \frac{\partial f}{\partial \theta} \hat{e}_\theta + \frac{1}{r \sin \theta} \frac{\partial f}{\partial \phi} \hat{e}_\phi. $$

The dependence on $\sin \theta$ in the $\phi$-component reflects the contraction of azimuthal circles toward the poles.[22] A representative example is the gradient of a radial potential, such as the electric potential $V = 1/(4\pi \epsilon_0 r)$ from a point charge, which depends only on the radial coordinate $r$.[23] In spherical coordinates, $\partial V / \partial r = -1/(4\pi \epsilon_0 r^2)$ and the angular derivatives vanish, yielding

$$ \nabla V = -\frac{1}{4\pi \epsilon_0 r^2} \hat{e}_r. $$

This purely radial form aligns with the symmetry of the field.[23] These formulations are essential in fields like electromagnetism, where the electric field is the negative gradient of the scalar potential in symmetric geometries, and in fluid dynamics, for computing pressure gradients in axisymmetric or spherical flows.[20]

In General Coordinate Systems

In a general coordinate system on a smooth manifold equipped with a Riemannian metric, the gradient of a scalar function $f \colon M \to \mathbb{R}$ is defined abstractly as the unique vector field $\nabla f$ on $M$ such that for every smooth vector field $X$ on $M$, the inner product satisfies $\langle \nabla f, X \rangle = df(X)$, where $df$ denotes the differential of $f$ and $\langle \cdot, \cdot \rangle$ is the metric tensor.[24] This definition assumes $M$ is a smooth manifold and the metric provides a smoothly varying positive definite inner product on each tangent space $T_p M$, enabling the identification of tangent and cotangent spaces via the musical isomorphism.[25] The differential $df$ is a smooth 1-form, and $\nabla f$ arises as its image under the sharp operator ($\sharp$) induced by the metric, which maps covectors to vectors by raising indices. In local coordinates $(x^1, \dots, x^n)$ on $M$, where the metric tensor has components $g_{ij}$ (with inverse $g^{ij}$), the gradient takes the explicit form

$$ \nabla f = g^{ij} \frac{\partial f}{\partial x^j} \frac{\partial}{\partial x^i}, $$

with summation over repeated indices $i, j = 1, \dots, n$.[25] This coordinate expression leverages the metric to contract the covector $\frac{\partial f}{\partial x^j} \, dx^j$ (the local representation of $df$) against $g^{ij}$ to yield vector components. The assumption here is that $f$ is smooth, ensuring the partial derivatives exist and the expression defines a smooth vector field.[24] From the perspective of differential forms, the gradient $\nabla f$ corresponds to the 1-form $df$ via the metric's musical isomorphism in the Riemannian setting, which provides a canonical way to associate vector fields to 1-forms without relying on a specific coordinate chart.[25] This view emphasizes the coordinate-free nature of the construction, where the metric bridges the duality between tangent vectors and covectors. As a representative example, in flat Euclidean space $\mathbb{R}^n$ with the standard metric $g_{ij} = \delta_{ij}$ (the Kronecker delta), the inverse is $g^{ij} = \delta^{ij}$, so the general formula reduces to the familiar Cartesian gradient $\nabla f = \sum_{i=1}^n \frac{\partial f}{\partial x^i} \frac{\partial}{\partial x^i}$.[24]

Relationships to Derivatives

Connection to Total Derivative

For a scalar-valued function f:RnRf: \mathbb{R}^n \to \mathbb{R}, the total derivative Df(x)Df(\mathbf{x}) at a point x\mathbf{x} is the linear map from Rn\mathbb{R}^n to R\mathbb{R} that approximates the change in ff for small displacements h\mathbf{h}, given by Df(x)(h)=f(x)hDf(\mathbf{x})(\mathbf{h}) = \nabla f(\mathbf{x}) \cdot \mathbf{h}.[26] This representation shows that the gradient f(x)\nabla f(\mathbf{x}) fully encodes the total derivative as a dot product, providing the best linear approximation to the function's variation in any direction.[27] The total differential of ff expands this as
df = \sum_{i=1}^n \frac{\partial f}{\partial x_i} \, dx_i = \nabla f \cdot d\mathbf{x},
where $dx_i$ are infinitesimal changes in the coordinates, directly linking the partial derivatives in the gradient to the overall rate of change.[27] This form arises from the definition of differentiability: $f$ is differentiable at $\mathbf{x}$ if
\lim_{\mathbf{h} \to \mathbf{0}} \frac{f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}) - \nabla f(\mathbf{x}) \cdot \mathbf{h}}{\|\mathbf{h}\|} = 0,
with the linear term $\nabla f(\mathbf{x}) \cdot \mathbf{h}$ constituting the total derivative; the proof amounts to verifying that this limit condition identifies $\nabla f(\mathbf{x})$ as the vector representing the linear approximation.[26]
A key application is the directional derivative, which measures the instantaneous rate of change of $f$ along a unit vector $\mathbf{u}$ and is given by $\nabla f(\mathbf{x}) \cdot \mathbf{u}$.[28] This is a special case of the total derivative with $\mathbf{h} = t \mathbf{u}$ for small $t$, reducing to the projection of the gradient onto the direction $\mathbf{u}$.[28] The connection extends to the multivariable chain rule: for a differentiable path $\mathbf{g}: \mathbb{R} \to \mathbb{R}^n$, the derivative of the composition $f(\mathbf{g}(t))$ is
\frac{d}{dt} f(\mathbf{g}(t)) = \nabla f(\mathbf{g}(t)) \cdot \mathbf{g}'(t).
[28] This follows from applying the total derivative along the curve, where $\mathbf{g}'(t)$ acts as the tangential displacement; a sketch of the proof uses the linear approximation along the path to match the limit definition of the derivative.[28]
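The chain-rule identity lends itself to a direct numerical check. In this sketch (assuming NumPy; the function $f(x,y) = x^2 y$ and the circular path are illustrative choices, not taken from the source), the analytic value $\nabla f(\mathbf{g}(t)) \cdot \mathbf{g}'(t)$ is compared against a central-difference estimate of $\frac{d}{dt} f(\mathbf{g}(t))$:

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 * y

def grad_f(p):
    x, y = p
    return np.array([2 * x * y, x**2])   # (df/dx, df/dy)

def g(t):
    return np.array([np.cos(t), np.sin(t)])   # path on the unit circle

def g_prime(t):
    return np.array([-np.sin(t), np.cos(t)])  # tangent vector g'(t)

t = 0.8
chain_rule = grad_f(g(t)) @ g_prime(t)        # analytic: grad f(g(t)) . g'(t)

eps = 1e-6
numeric = (f(g(t + eps)) - f(g(t - eps))) / (2 * eps)  # central difference
```

The two values agree to roughly the accuracy of the finite-difference step, illustrating that the gradient fully determines the rate of change of $f$ along any smooth path.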

Linear Approximations

The gradient of a differentiable scalar-valued function $ f: \mathbb{R}^n \to \mathbb{R} $ at a point $ \mathbf{x} $ enables the best linear approximation of $ f $ near $ \mathbf{x} $. Specifically,
f(\mathbf{x} + \mathbf{h}) \approx f(\mathbf{x}) + \nabla f(\mathbf{x}) \cdot \mathbf{h},
with the error satisfying $ o(|\mathbf{h}|) $ as $ \mathbf{h} \to \mathbf{0} $.[29][30] This formula arises from the first-order Taylor expansion in multiple variables, where the gradient captures the linear change in $ f $ along any direction $ \mathbf{h} $. This approximation is particularly useful for estimating function values when exact computation is difficult. Geometrically, the linear approximation defines the tangent hyperplane to the graph of $ f $ at the point $ (\mathbf{x}, f(\mathbf{x})) $ in $ \mathbb{R}^{n+1} $. The hyperplane equation is $ z = f(\mathbf{x}) + \nabla f(\mathbf{x}) \cdot (\mathbf{u} - \mathbf{x}) $, providing the closest affine approximation to the graph locally at that point.[31] This extends the one-dimensional tangent line concept to higher dimensions, where the gradient vector serves as the normal to the level sets but here defines the plane's slope in all directions. For illustration, consider $ f(x,y) = \sin x + \cos y $ near $ (0,0) $. Here, $ f(0,0) = 1 $ and $ \nabla f(0,0) = (1, 0) $, so the linear approximation is $ L(x,y) = 1 + x $. For small increments $ (h,k) $, $ f(h,k) = \sin h + \cos k \approx h + (1 - k^2/2) $, confirming that the linear term $ 1 + h $ captures the dominant first-order behavior while neglecting higher-order contributions like $ -k^2/2 $. 
A higher-order refinement incorporates the Hessian matrix $ Hf(\mathbf{x}) $ for the quadratic term $ \frac{1}{2} \mathbf{h}^T Hf(\mathbf{x}) \mathbf{h} $, yielding the second-order approximation $ f(\mathbf{x} + \mathbf{h}) \approx f(\mathbf{x}) + \nabla f(\mathbf{x}) \cdot \mathbf{h} + \frac{1}{2} \mathbf{h}^T Hf(\mathbf{x}) \mathbf{h} $.[30] In optimization, the condition $ \nabla f(\mathbf{x}) = \mathbf{0} $ identifies critical points, which correspond to local minima when the function increases in every direction away from $ \mathbf{x} $ (for instance, when the Hessian there is positive definite).[32] This linear approximation underpins methods like gradient descent, where the gradient's direction and magnitude guide iterative improvements toward minima. The total derivative $ Df(\mathbf{x}) $ formalizes this as the linear map whose standard-basis matrix is the row vector $ \nabla f(\mathbf{x}) $.[29]
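The $ f(x,y) = \sin x + \cos y $ example above can be made concrete by comparing the first- and second-order approximations numerically. A minimal sketch (assuming NumPy; the gradient and Hessian at the origin are written out by hand from the formulas in the text):

```python
import numpy as np

def f(x, y):
    return np.sin(x) + np.cos(y)

# At (0, 0): f = 1, grad f = (cos 0, -sin 0) = (1, 0),
# Hessian = [[-sin 0, 0], [0, -cos 0]] = [[0, 0], [0, -1]].
grad = np.array([1.0, 0.0])
hess = np.array([[0.0, 0.0], [0.0, -1.0]])

h = np.array([0.1, 0.2])                 # small displacement (h, k)
exact = f(*h)
linear = 1.0 + grad @ h                  # L(h, k) = 1 + h
quadratic = linear + 0.5 * h @ hess @ h  # adds the -k^2/2 correction

err_linear = abs(exact - linear)
err_quadratic = abs(exact - quadratic)
```

As expected, the quadratic model recovers the $-k^2/2$ term the linear model neglects, so its error is far smaller for this displacement.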

Fréchet Derivative

The Fréchet derivative generalizes the concept of the derivative to functions between normed vector spaces, particularly Banach spaces, providing a linear approximation that is uniform in all directions. For a function $f: X \to Y$ where $X$ and $Y$ are Banach spaces and $U \subseteq X$ is an open set containing $x \in X$, the Fréchet derivative of $f$ at $x$, denoted $Df(x)$ or $T$, is a bounded linear operator $T: X \to Y$ such that
f(x + h) = f(x) + T(h) + o(\|h\|)
as $h \to 0$, where the little-$o$ notation indicates that $\|o(\|h\|)\| / \|h\| \to 0$ as $\|h\| \to 0$. This condition ensures that the linear term $T(h)$ captures the first-order behavior of $f$ uniformly over the space, making it a stronger notion of differentiability than directional variants.[33] In the specific case of finite-dimensional Euclidean spaces, such as $f: \mathbb{R}^n \to \mathbb{R}$, the Fréchet derivative aligns directly with the classical gradient. Here, the bounded linear operator $T$ is represented by the inner product $T(h) = \nabla f(x) \cdot h$, where $\nabla f(x)$ is the gradient vector of $f$ at $x$. The defining limit then becomes
\frac{|f(x + h) - f(x) - \nabla f(x) \cdot h|}{\|h\|} \to 0
as $\|h\| \to 0$, illustrating how the gradient serves as the Fréchet derivative in this setting by providing the best linear approximation to $f$ near $x$.[33] An illustrative example arises in function spaces, common in the calculus of variations, where functionals map infinite-dimensional spaces like $C[0,1]$ (continuous functions on $[0,1]$ with the sup norm) to $\mathbb{R}$. Consider the integral functional $\phi(f) = \int_0^1 f(x)^2 \, dx$ for $f \in C[0,1]$. The Fréchet derivative at $f$ is the bounded linear functional $A(h) = 2 \int_0^1 f(x) h(x) \, dx$, satisfying $\phi(f + h) = \phi(f) + A(h) + o(\|h\|_\infty)$ as $\|h\|_\infty \to 0$; indeed, the remainder is exactly $\phi(f+h) - \phi(f) - A(h) = \int_0^1 h(x)^2 \, dx \leq \|h\|_\infty^2$. This derivative, naturally identified with multiplication by $2f(x)$, highlights how Fréchet differentiability facilitates optimization in such spaces by linearizing variations around a function.[34] The Fréchet derivative is distinguished from the weaker Gâteaux derivative, which only requires the existence of directional derivatives along each direction $h$ (i.e., the limit along rays $th$ as $t \to 0$) forming a linear map, without uniformity over all directions. Fréchet differentiability implies Gâteaux differentiability, and the two derivatives then coincide; the converse fails in general, although a Gâteaux derivative that exists and is continuous in a neighborhood of $x$ is automatically a Fréchet derivative there.[35]
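The functional example admits a discretized check. In this sketch (assuming NumPy; the base function, perturbation, and hand-rolled trapezoidal quadrature are illustrative choices), the remainder $\phi(f+h) - \phi(f) - A(h)$ comes out exactly equal to $\int_0^1 h^2$, a second-order quantity:

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule for samples y on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

x = np.linspace(0.0, 1.0, 2001)
f = np.sin(np.pi * x)          # base point in C[0, 1]
h = 0.01 * np.cos(3.0 * x)     # small perturbation, ||h||_inf = 0.01

def phi(g):
    """phi(g) = integral of g(t)^2 over [0, 1]."""
    return trapezoid(g**2, x)

A_h = 2.0 * trapezoid(f * h, x)          # candidate Frechet derivative A(h)
remainder = phi(f + h) - phi(f) - A_h    # equals integral of h^2 exactly
```

Because $(f+h)^2 - f^2 - 2fh = h^2$ pointwise, the remainder matches $\int h^2$ up to floating-point rounding, confirming the quadratic decay required by Fréchet differentiability.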

Properties and Applications

Level Sets

In multivariable calculus, the level set of a scalar function $f: \mathbb{R}^n \to \mathbb{R}$ at a constant value $c$ is defined as the set $L_c = \{ \mathbf{x} \in \mathbb{R}^n \mid f(\mathbf{x}) = c \}$. If the gradient $\nabla f(\mathbf{x}_0) \neq \mathbf{0}$ at a point $\mathbf{x}_0 \in L_c$, then the gradient vector is perpendicular to the tangent space of the level set at $\mathbf{x}_0$.[36] To see this, consider a smooth curve $\mathbf{r}(t)$ on the level set $L_c$ passing through $\mathbf{x}_0$ at $t = 0$, so that $f(\mathbf{r}(t)) = c$ for all $t$ near 0. Differentiating with respect to $t$ yields $\frac{d}{dt} f(\mathbf{r}(t)) = \nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t) = 0$, implying that $\nabla f(\mathbf{x}_0)$ is orthogonal to the tangent vector $\mathbf{r}'(0)$. Since this holds for any tangent direction, $\nabla f(\mathbf{x}_0)$ is normal to the entire tangent space of $L_c$ at $\mathbf{x}_0$.[36][37] This perpendicularity has key implications for analysis and computation. The integral curves of the gradient field, known as gradient flow lines, are everywhere normal to the level sets, providing a natural way to traverse from one level set to another along the direction of maximum change. In implicit differentiation, the relation enables computation of tangent spaces or normals to surfaces defined implicitly by $f(\mathbf{x}) = c$, such as in computer graphics or optimization, without explicit parameterization.[37][1] A simple example in two dimensions is $f(x,y) = x^2 + y^2$, whose level sets $L_c = \{ (x,y) \mid x^2 + y^2 = c \}$ for $c > 0$ are circles centered at the origin. The gradient $\nabla f = (2x, 2y)$ points radially outward, perpendicular to the tangent (circumferential) direction at every point on the circle.
In physics, equipotential surfaces—level sets of the electric potential $V$—have the electric field $\mathbf{E} = -\nabla V$ normal to them, explaining why field lines are orthogonal to equipotentials in electrostatics.[1][38] At points where $\nabla f(\mathbf{x}_0) = \mathbf{0}$, known as critical points, the level set $L_c$ may develop singularities, such as cusps or isolated points, and need not form a smooth manifold; the perpendicularity property fails there, complicating local analysis.[36]
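The circle example can be verified directly. This sketch (assuming NumPy; the parameterization is the standard one for the circle of radius $\sqrt{c}$) evaluates $\nabla f \cdot \mathbf{r}'(t)$ at many points of the level set $x^2 + y^2 = c$ and finds it zero everywhere:

```python
import numpy as np

# Level set f(x, y) = x^2 + y^2 = c, parameterized as r(t) = sqrt(c) (cos t, sin t).
c = 4.0
t = np.linspace(0.0, 2 * np.pi, 100, endpoint=False)
points = np.sqrt(c) * np.stack([np.cos(t), np.sin(t)], axis=1)

grads = 2 * points                                            # grad f = (2x, 2y)
tangents = np.sqrt(c) * np.stack([-np.sin(t), np.cos(t)], axis=1)  # r'(t)

dots = np.einsum('ij,ij->i', grads, tangents)   # grad f . r' at each sample
```

Every dot product vanishes (to rounding), confirming that the radial gradient is perpendicular to the circumferential tangent all along the circle.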

Conservative Vector Fields and Gradient Theorem

A vector field $\mathbf{V}$ defined on a domain in $\mathbb{R}^n$ is called conservative if there exists a scalar potential function $f$ such that $\mathbf{V} = \nabla f$.[39] In $\mathbb{R}^3$, for a simply connected domain, a continuously differentiable vector field $\mathbf{V}$ is conservative if and only if its curl is zero, i.e., $\nabla \times \mathbf{V} = \mathbf{0}$.[40] This irrotational condition ensures that line integrals of $\mathbf{V}$ are path-independent, meaning the integral from point $\mathbf{a}$ to $\mathbf{b}$ yields the same value regardless of the path taken.[39] The gradient theorem, also known as the fundamental theorem for line integrals, states that if $\mathbf{V} = \nabla f$ for a scalar function $f$ with continuous partial derivatives on a domain, then for any piecewise smooth curve $C$ parameterized by $\mathbf{r}(t)$ from $t = a$ to $t = b$, the line integral is given by
\int_C \mathbf{V} \cdot d\mathbf{r} = f(\mathbf{r}(b)) - f(\mathbf{r}(a)).
[40] This result generalizes the one-dimensional fundamental theorem of calculus to higher dimensions. The proof relies on the chain rule and the fundamental theorem of calculus. Consider the composition $g(t) = f(\mathbf{r}(t))$; then $g'(t) = \nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t) = \mathbf{V}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)$. Integrating both sides from $a$ to $b$ yields
\int_a^b g'(t) \, dt = \int_a^b \mathbf{V}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt = g(b) - g(a) = f(\mathbf{r}(b)) - f(\mathbf{r}(a)),
which is exactly the line integral along $C$.[40] For the converse direction, the domain matters: if the domain is simply connected (open, connected, and such that every closed curve can be continuously shrunk to a point), then $\nabla \times \mathbf{V} = \mathbf{0}$ implies that a potential $f$ exists.[40] In non-simply connected domains, a curl-free field need not be conservative, but the curl-zero test suffices in simply connected regions.[39] This theorem has key applications in physics, where conservative fields like gravitational or electrostatic forces allow work done by the field to be computed as a potential difference, independent of path. For instance, the gravitational field $\mathbf{F} = -\frac{GMm}{r^2} \hat{r}$ derives from the potential $f = \frac{GMm}{r}$ (so that $\mathbf{F} = \nabla f$), and the work done by the field from $\mathbf{a}$ to $\mathbf{b}$ is $f(\mathbf{b}) - f(\mathbf{a})$.[40] Similarly, in electrostatics, the electric field $\mathbf{E} = -\nabla V$ yields work as a voltage difference.[40]
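The gradient theorem is easy to test numerically: integrate $\mathbf{V} \cdot \mathbf{r}'$ along any curve and compare against the potential difference. A minimal sketch (assuming NumPy; the potential $f(x,y) = x^2 y$ and the curve $\mathbf{r}(t) = (t, t^3)$ are illustrative choices):

```python
import numpy as np

def f(x, y):
    """Potential with V = grad f = (2xy, x^2)."""
    return x**2 * y

# Curve r(t) = (t, t^3), t in [0, 1], from (0, 0) to (1, 1).
t = np.linspace(0.0, 1.0, 20001)
x, y = t, t**3
dx_dt, dy_dt = np.ones_like(t), 3 * t**2

integrand = 2 * x * y * dx_dt + x**2 * dy_dt    # V(r(t)) . r'(t)
dt = t[1] - t[0]
line_integral = float(np.sum((integrand[1:] + integrand[:-1]) / 2) * dt)

potential_difference = f(1.0, 1.0) - f(0.0, 0.0)   # f(r(b)) - f(r(a)) = 1
```

Replacing the curve with any other piecewise smooth path between the same endpoints leaves the integral unchanged, which is exactly the path-independence the theorem asserts.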

Direction of Steepest Ascent

The gradient vector $\nabla f(\mathbf{x})$ at a point $\mathbf{x}$ in the domain of a differentiable scalar function $f$ points in the direction of steepest ascent of $f$, meaning it maximizes the directional derivative among all unit vectors. The magnitude $|\nabla f(\mathbf{x})|$ equals the supremum of the directional derivatives $\nabla f(\mathbf{x}) \cdot \mathbf{u}$ over all unit vectors $\mathbf{u}$ with $|\mathbf{u}| = 1$, and the maximizing direction is the unit vector $\nabla f(\mathbf{x}) / |\nabla f(\mathbf{x})|$. This property arises because the directional derivative $\nabla f(\mathbf{x}) \cdot \mathbf{u}$ represents the rate of change of $f$ along the direction $\mathbf{u}$, and the maximum occurs when $\mathbf{u}$ aligns with $\nabla f(\mathbf{x})$.[41] To see this formally, apply the Cauchy-Schwarz inequality to the inner product:
|\nabla f(\mathbf{x}) \cdot \mathbf{u}| \leq |\nabla f(\mathbf{x})| \, |\mathbf{u}| = |\nabla f(\mathbf{x})|,
since $|\mathbf{u}| = 1$. Equality holds if and only if $\mathbf{u}$ is parallel to $\nabla f(\mathbf{x})$, confirming that the gradient direction achieves the supremum and that $|\nabla f(\mathbf{x})|$ is the maximum rate of increase. The direction of steepest descent, which maximizes the rate of decrease, is then $-\nabla f(\mathbf{x}) / |\nabla f(\mathbf{x})|$. This duality between ascent and descent directions is fundamental in analyzing the local behavior of functions.[41] In optimization, the steepest ascent property underpins gradient ascent algorithms, where iterates are updated as $\mathbf{x}_{k+1} = \mathbf{x}_k + t_k \nabla f(\mathbf{x}_k)$ for a step size $t_k > 0$ to maximize smooth objectives such as likelihood functions in statistical models. Similarly, the flow lines of the gradient vector field—curves $\mathbf{r}(t)$ satisfying $\frac{d\mathbf{r}}{dt} = \nabla f(\mathbf{r}(t))$—trace paths of steepest ascent, representing trajectories that follow the field's direction at each point. These paths align with the normals to the level sets of $f$, pointing toward regions of higher function values. As an illustrative example, consider the function $f(x, y) = -x^2 - y^2$ in $\mathbb{R}^2$, which models a downward paraboloid with a global maximum at the origin. At a point $(x_0, y_0)$ away from the origin, $\nabla f(x_0, y_0) = (-2x_0, -2y_0)$, so the unit direction of steepest ascent is $(-x_0, -y_0)/\sqrt{x_0^2 + y_0^2}$, directing movement inward toward the peak; following this direction repeatedly simulates hill-climbing to the maximum.[41]
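The paraboloid example above doubles as a demonstration of the gradient ascent update $\mathbf{x}_{k+1} = \mathbf{x}_k + t_k \nabla f(\mathbf{x}_k)$. A minimal sketch (assuming NumPy; the starting point and fixed step size are arbitrary illustrative choices):

```python
import numpy as np

def grad_f(p):
    """Gradient of f(x, y) = -x^2 - y^2, which peaks at the origin."""
    return -2.0 * p

p = np.array([3.0, -4.0])   # starting point away from the maximum
step = 0.1                  # fixed step size t_k

for _ in range(200):
    p = p + step * grad_f(p)    # x_{k+1} = x_k + t_k * grad f(x_k)
```

Each update rescales the iterate by $1 - 2 t_k = 0.8$, so the sequence contracts geometrically toward the global maximum at the origin, the hill-climbing behavior described in the text.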

Generalizations

Jacobian Matrix

The Jacobian matrix provides a generalization of the gradient to functions mapping from $\mathbb{R}^n$ to $\mathbb{R}^m$, where $m > 1$. For a differentiable function $\mathbf{F}: \mathbb{R}^n \to \mathbb{R}^m$ with components $F_1, \dots, F_m$, the Jacobian matrix $J_\mathbf{F}$ at a point $\mathbf{x} \in \mathbb{R}^n$ is the $m \times n$ matrix whose $i$-th row is the gradient vector $\nabla F_i(\mathbf{x})$, given by
J_\mathbf{F}(\mathbf{x}) = \begin{pmatrix} \frac{\partial F_1}{\partial x_1} & \cdots & \frac{\partial F_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial F_m}{\partial x_1} & \cdots & \frac{\partial F_m}{\partial x_n} \end{pmatrix}.
[12] This matrix represents the best linear approximation to $\mathbf{F}$ near $\mathbf{x}$, capturing how infinitesimal changes in the input variables affect each output component.[12] When $m = 1$, so $\mathbf{F} = f: \mathbb{R}^n \to \mathbb{R}$ is scalar-valued, the Jacobian matrix reduces to a $1 \times n$ row vector that is the transpose of the standard column gradient $\nabla f$.[12] In this case, $J_f(\mathbf{x}) = (\nabla f(\mathbf{x}))^T$, linking the two concepts directly as the Jacobian extends the directional information of the gradient to multiple outputs.[12] Key properties of the Jacobian include the chain rule for composition: if $\mathbf{F}: \mathbb{R}^m \to \mathbb{R}^p$ and $\mathbf{G}: \mathbb{R}^n \to \mathbb{R}^m$ are differentiable, then $J_{\mathbf{F} \circ \mathbf{G}}(\mathbf{x}) = J_\mathbf{F}(\mathbf{G}(\mathbf{x})) \cdot J_\mathbf{G}(\mathbf{x})$.[42] When the Jacobian is square ($m = n$), its determinant $\det J_\mathbf{F}(\mathbf{x})$ measures the local scaling of volumes under the transformation $\mathbf{F}$, with $|\det J_\mathbf{F}(\mathbf{x})|$ giving the factor by which infinitesimal volumes in the input space are multiplied in the output space.[12] If $\det J_\mathbf{F}(\mathbf{x}) \neq 0$, then $\mathbf{F}$ is locally invertible near $\mathbf{x}$, establishing it as a local diffeomorphism by the inverse function theorem.[43] A representative example is the transformation from polar to Cartesian coordinates in $\mathbb{R}^2$, defined by $x = r \cos \theta$, $y = r \sin \theta$. The Jacobian matrix is
J = \begin{pmatrix} \cos \theta & -r \sin \theta \\ \sin \theta & r \cos \theta \end{pmatrix},
with determinant $\det J = r$.[44] This positive value for $r > 0$ indicates that the transformation stretches areas by a factor of $r$, explaining the factor of $r$ in polar integrals.[44] Applications of the Jacobian include change of variables in multiple integrals, where for a transformation $\mathbf{T}: \mathbb{R}^n \to \mathbb{R}^n$, the integral satisfies $\int_{\mathbf{T}(D)} f(\mathbf{y}) \, d\mathbf{y} = \int_D f(\mathbf{T}(\mathbf{u})) \, |\det J_\mathbf{T}(\mathbf{u})| \, d\mathbf{u}$.[45] The absolute value of the determinant ensures the integral accounts for orientation-preserving or orientation-reversing transformations while preserving the total measure.[45] Additionally, the nonzero-determinant condition is essential for confirming local diffeomorphisms in analysis and geometry.[43]
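The polar-to-Cartesian Jacobian and its determinant $\det J = r$ can be recovered without any hand differentiation. This sketch (assuming NumPy; the central-difference Jacobian helper is a generic illustrative routine, not a library function) builds the Jacobian column by column and checks the determinant:

```python
import numpy as np

def T(u):
    """Polar-to-Cartesian map T(r, theta) = (r cos theta, r sin theta)."""
    r, theta = u
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def numerical_jacobian(F, u, eps=1e-6):
    """Central-difference approximation of the Jacobian of F at u."""
    u = np.asarray(u, dtype=float)
    m = F(u).size
    J = np.zeros((m, u.size))
    for j in range(u.size):
        e = np.zeros(u.size)
        e[j] = eps
        J[:, j] = (F(u + e) - F(u - e)) / (2 * eps)   # column j = dT/du_j
    return J

u = np.array([2.0, 0.5])          # (r, theta)
J = numerical_jacobian(T, u)
det = float(np.linalg.det(J))     # should be close to r = 2
```

The computed matrix matches the closed form $\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}$ to finite-difference accuracy, and its determinant reproduces the area-scaling factor $r$ used in polar integrals.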

Gradient of Vector Fields

The gradient of a vector field $\mathbf{V}: \mathbb{R}^3 \to \mathbb{R}^3$ is a second-order tensor, represented as a $3 \times 3$ matrix whose entries are the partial derivatives of the components of $\mathbf{V}$. Specifically, the components are given by $(\nabla \mathbf{V})_{ij} = \frac{\partial V_i}{\partial x_j}$, where the $i$-th row corresponds to the gradient of the scalar component $V_i$.[12] In explicit matrix form,
\nabla \mathbf{V} = \begin{pmatrix} \frac{\partial V_1}{\partial x_1} & \frac{\partial V_1}{\partial x_2} & \frac{\partial V_1}{\partial x_3} \\ \frac{\partial V_2}{\partial x_1} & \frac{\partial V_2}{\partial x_2} & \frac{\partial V_2}{\partial x_3} \\ \frac{\partial V_3}{\partial x_1} & \frac{\partial V_3}{\partial x_2} & \frac{\partial V_3}{\partial x_3} \end{pmatrix}.
This matrix is a special case of the Jacobian matrix for vector-valued functions from $\mathbb{R}^3$ to $\mathbb{R}^3$.[12] The gradient tensor can be decomposed into its symmetric and antisymmetric parts, which capture the deformation and rotation of the field, respectively. The trace of $\nabla \mathbf{V}$ equals the divergence $\nabla \cdot \mathbf{V} = \sum_{i=1}^3 \frac{\partial V_i}{\partial x_i}$, measuring the net flux out of a volume element.[46] The antisymmetric part relates to the curl $\nabla \times \mathbf{V}$: the curl vector is twice the axial vector associated with this antisymmetric tensor.[46] In fluid dynamics, the gradient of the velocity field $\mathbf{u}$ plays a central role in describing local fluid behavior. A divergence of zero, $\nabla \cdot \mathbf{u} = 0$, characterizes incompressible flows, in which fluid elements neither expand nor contract, simplifying the Navier-Stokes equations.[47] The curl $\nabla \times \mathbf{u}$ defines the vorticity $\boldsymbol{\omega}$, which quantifies the local rotation or spinning of fluid parcels around an axis.[48] For example, consider a simple shear flow with velocity field $\mathbf{u} = (y, 0, 0)$. The gradient tensor is
\nabla \mathbf{u} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
yielding $\nabla \cdot \mathbf{u} = 0$ (incompressible) and $\boldsymbol{\omega} = \nabla \times \mathbf{u} = (0, 0, -1)$, indicating uniform vorticity in the negative $z$-direction due to the shearing.[49]
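The shear-flow computation can be reproduced numerically by differencing the velocity field and reading the divergence and vorticity off the gradient tensor. A sketch (assuming NumPy; the finite-difference helper and evaluation point are illustrative):

```python
import numpy as np

def u(p):
    """Simple shear flow u(x, y, z) = (y, 0, 0)."""
    x, y, z = p
    return np.array([y, 0.0, 0.0])

def velocity_gradient(field, p, eps=1e-6):
    """(grad u)_{ij} = du_i/dx_j via central differences."""
    G = np.zeros((3, 3))
    for j in range(3):
        e = np.zeros(3)
        e[j] = eps
        G[:, j] = (field(p + e) - field(p - e)) / (2 * eps)
    return G

p = np.array([0.3, 1.2, -0.7])
G = velocity_gradient(u, p)

divergence = float(np.trace(G))                # div u = trace of grad u
vorticity = np.array([G[2, 1] - G[1, 2],       # curl u, built from the
                      G[0, 2] - G[2, 0],       # antisymmetric part of G
                      G[1, 0] - G[0, 1]])
```

The trace vanishes (incompressible) and the vorticity comes out as $(0, 0, -1)$, matching the hand computation for this flow.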

On Riemannian Manifolds

In a Riemannian manifold $(M, g)$, the gradient of a smooth scalar function $f: M \to \mathbb{R}$ is the unique vector field $\nabla f$ satisfying $g(\nabla f, X) = df(X)$ for every smooth vector field $X$ on $M$, where $df$ is the differential of $f$. Equivalently, $\nabla f$ is obtained by applying the musical isomorphism induced by the metric $g$, which raises the index of the covector $df$, yielding $\nabla f = g^{-1}(df)$. This definition ensures that $\nabla f$ points in the direction of steepest ascent of $f$ with respect to the geometry defined by $g$.[50][51] In local coordinates $(x^i)$ on $M$, the components of the gradient are given by $\nabla f = g^{ij} \frac{\partial f}{\partial x^j} \frac{\partial}{\partial x^i}$, where $g^{ij}$ are the entries of the inverse metric tensor and summation over repeated indices is implied. This expression arises directly from contracting the covector components $\frac{\partial f}{\partial x^j}$ with $g^{ij}$, without involvement of connection terms, since the differential of a scalar function requires no covariant correction. The squared norm of the gradient is then $|\nabla f|^2 = g(\nabla f, \nabla f) = g^{ij} \frac{\partial f}{\partial x^i} \frac{\partial f}{\partial x^j}$, which quantifies the maximum rate of change of $f$ at each point.[50][51] The integral curves of $\nabla f$, known as gradient flow lines, satisfy the ordinary differential equation $\frac{d\gamma}{dt} = \nabla f(\gamma(t))$ and move in the direction of steepest ascent, so that $f$ increases monotonically along them; in general they are not geodesics. A classic example occurs on the unit sphere $S^2 \subset \mathbb{R}^3$ endowed with the Riemannian metric induced from the Euclidean inner product.
For the height function $f(p) = z$, where $p = (x, y, z) \in S^2$ and $z$ is the third coordinate, the gradient at $p$ is the orthogonal projection of the ambient Euclidean gradient $(0, 0, 1)$ onto the tangent space $T_p S^2$, given explicitly by $\nabla f(p) = (0, 0, 1) - z p = (-xz, -yz, 1 - z^2)$. This vector field vanishes at the poles $(0, 0, \pm 1)$, the critical points of $f$, and elsewhere points along the meridians toward the north pole.[50][52] When the Riemannian manifold is flat, such as Euclidean space in Cartesian coordinates where the metric is $\delta_{ij}$ and the Christoffel symbols $\Gamma^k_{ij} = 0$ vanish, the expression simplifies to the classical gradient $\nabla f = \sum_i \frac{\partial f}{\partial x^i} \frac{\partial}{\partial x^i}$, recovering the familiar directional derivative structure. This flat limit highlights how the Riemannian gradient generalizes the Euclidean case to account for intrinsic geometry via the metric.[50]
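The sphere example is simple enough to verify by hand or in a few lines. This sketch (assuming NumPy; the sample point is an arbitrary illustrative choice on the unit sphere) computes the tangential projection $(0,0,1) - z\,p$ and checks that the result is tangent to the sphere and vanishes at the north pole:

```python
import numpy as np

def sphere_gradient_height(p):
    """Gradient on S^2 of f(p) = z: project the ambient gradient (0, 0, 1)
    onto the tangent space T_p S^2, giving (0,0,1) - z p = (-xz, -yz, 1 - z^2)."""
    p = np.asarray(p, dtype=float)
    ambient = np.array([0.0, 0.0, 1.0])
    return ambient - (ambient @ p) * p

p = np.array([0.6, 0.0, 0.8])                       # a point on the unit sphere
g = sphere_gradient_height(p)

tangency = float(g @ p)                             # 0 if g lies in T_p S^2
at_pole = sphere_gradient_height([0.0, 0.0, 1.0])   # critical point: zero vector
```

At $p = (0.6, 0, 0.8)$ the result $(-0.48, 0, 0.36)$ lies along the meridian toward the north pole, and the field vanishes at the pole itself, matching the critical points of the height function.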

References
