Sigmoid function


A sigmoid function is any mathematical function whose graph has a characteristic S-shaped or sigmoid curve.
A common example of a sigmoid function is the logistic function, which is defined by the formula[1]
$$\sigma(x) = \frac{1}{1 + e^{-x}} = \frac{e^{x}}{1 + e^{x}} = 1 - \sigma(-x).$$
Other sigmoid functions are given in the Examples section. In some fields, most notably in the context of artificial neural networks, the term "sigmoid function" is used as a synonym for "logistic function".
Special cases of the sigmoid function include the Gompertz curve (used in modeling systems that saturate at large values of x) and the ogee curve (used in the spillway of some dams). Sigmoid functions have a domain of all real numbers, with a return (response) value that is commonly monotonically increasing but may be decreasing. Sigmoid functions most often show a return value (y-axis) in the range 0 to 1. Another commonly used range is from −1 to 1.
A wide variety of sigmoid functions including the logistic and hyperbolic tangent functions have been used as the activation function of artificial neurons. Sigmoid curves are also common in statistics as cumulative distribution functions (which go from 0 to 1), such as the integrals of the logistic density, the normal density, and Student's t probability density functions. The logistic sigmoid function is invertible, and its inverse is the logit function.
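The inverse can be written down directly: solving $y = \sigma(x)$ for $x$ gives
$$y = \frac{1}{1 + e^{-x}} \;\Longleftrightarrow\; e^{-x} = \frac{1 - y}{y} \;\Longleftrightarrow\; x = \ln\frac{y}{1 - y} = \operatorname{logit}(y), \qquad 0 < y < 1.$$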
Definition
A sigmoid function is a bounded, differentiable, real function that is defined for all real input values and has a positive derivative at each point.[1][2]
Properties
In general, a sigmoid function is monotonic, and has a first derivative that is bell-shaped. Conversely, the integral of any continuous, non-negative, bell-shaped function (with one local maximum and no local minimum, unless degenerate) will be sigmoidal. Thus the cumulative distribution functions for many common probability distributions are sigmoidal. One such example is the error function, which is related to the cumulative distribution function of a normal distribution; another is the arctan function, which is related to the cumulative distribution function of a Cauchy distribution.
A sigmoid function is constrained by a pair of horizontal asymptotes as $x \to \pm\infty$.
A sigmoid function is convex for values less than a particular point, and it is concave for values greater than that point: in many of the examples here, that point is 0.
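As a worked instance of the integral relationship above, the arctangent sigmoid is the cumulative distribution function of the bell-shaped standard Cauchy density:
$$F(x) = \int_{-\infty}^{x} \frac{dt}{\pi\left(1 + t^{2}\right)} = \frac{1}{2} + \frac{1}{\pi}\arctan x,$$
which increases from 0 to 1, is convex for $x < 0$ and concave for $x > 0$, and has its inflection point at $x = 0$, where the density peaks.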
Examples
- Logistic function
  $f(x) = \dfrac{1}{1 + e^{-x}}$
- Hyperbolic tangent (shifted and scaled version of the logistic function, above)
  $f(x) = \tanh x = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
- Arctangent function
  $f(x) = \arctan x$
- Gudermannian function
  $f(x) = \operatorname{gd}(x) = 2\arctan\left(\tanh\left(\dfrac{x}{2}\right)\right)$
- Error function
  $f(x) = \operatorname{erf}(x) = \dfrac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^{2}}\,dt$
- Generalised logistic function
  $f(x) = \left(1 + e^{-x}\right)^{-\alpha}, \quad \alpha > 0$
- Smoothstep function
- Some algebraic functions, for example
  $f(x) = \dfrac{x}{\sqrt{1 + x^{2}}}$
- and in a more general form[3]
  $f(x) = \dfrac{x}{\left(1 + |x|^{k}\right)^{1/k}}$
- Up to shifts and scaling, many sigmoids are special cases of $f(x) = \varphi(x; \nu, a)$, where $\varphi$ is the inverse of the negative Box–Cox transformation, and $\nu$ and $a$ are shape parameters.[4]
- Smooth transition function[5] normalized to (−1,1):
  $f(x) = \begin{cases} \tanh\left(\dfrac{m x}{1 - x^{2}}\right), & |x| < 1, \\ \operatorname{sgn}(x), & |x| \geq 1, \end{cases}$
  using the hyperbolic tangent mentioned above. Here, $m$ is a free parameter encoding the slope at $x = 0$, which must be greater than or equal to $\sqrt{3}$ because any smaller value will result in a function with multiple inflection points, which is therefore not a true sigmoid. This function is unusual because it actually attains the limiting values of −1 and 1 within a finite range, meaning that its value is constant at −1 for all $x \leq -1$ and at 1 for all $x \geq 1$. Nonetheless, it is smooth (infinitely differentiable, $C^{\infty}$) everywhere, including at $x = \pm 1$.
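A minimal Python sketch (NumPy-based; the function set, range, and tolerances are illustrative choices, not drawn from the sources above) that evaluates several of the listed sigmoids and numerically checks the properties they share, boundedness and strict monotonicity:

```python
import math
import numpy as np

# A few of the sigmoids listed above, each strictly increasing and bounded.
sigmoids = {
    "logistic":  lambda x: 1 / (1 + np.exp(-x)),        # range (0, 1)
    "tanh":      np.tanh,                               # range (-1, 1)
    "arctan":    lambda x: (2 / np.pi) * np.arctan(x),  # rescaled to (-1, 1)
    "erf":       np.vectorize(math.erf),                # range (-1, 1)
    "algebraic": lambda x: x / np.sqrt(1 + x**2),       # range (-1, 1)
}

x = np.linspace(-3, 3, 601)
for name, f in sigmoids.items():
    y = f(x)
    assert np.all(np.diff(y) > 0), f"{name} is not strictly increasing"
    assert -1 < y.min() and y.max() < 1, f"{name} leaves its bounding interval"
    print(f"{name:9s}  f(-3)={y[0]:+.4f}  f(0)={y[300]:+.4f}  f(3)={y[-1]:+.4f}")
```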
Applications
Many natural processes, such as those of complex system learning curves, exhibit a progression from small beginnings that accelerates and approaches a climax over time.[6] When a specific mathematical model is lacking, a sigmoid function is often used.[7]
The van Genuchten–Gupta model is based on an inverted S-curve and applied to the response of crop yield to soil salinity.
Examples of the application of the logistic S-curve to the response of crop yield (wheat) to both the soil salinity and depth to water table in the soil are shown in modeling crop response in agriculture.
In artificial neural networks, sometimes non-smooth functions are used instead for efficiency; these are known as hard sigmoids.
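As an illustration, one widely used hard-sigmoid variant clamps a line of slope 0.2 to the interval [0, 1]; a minimal sketch (the constants match a common convention, but libraries differ in their exact choices):

```python
def hard_sigmoid(x: float) -> float:
    """Piecewise-linear approximation of the logistic sigmoid.

    Avoids the exponential entirely and saturates exactly at 0 and 1
    (for x <= -2.5 and x >= 2.5) rather than only asymptotically.
    """
    return max(0.0, min(1.0, 0.2 * x + 0.5))

print(hard_sigmoid(0.0), hard_sigmoid(2.5), hard_sigmoid(-2.5))  # 0.5 1.0 0.0
```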
In audio signal processing, sigmoid functions are used as waveshaper transfer functions to emulate the sound of analog circuitry clipping.[8]
In biochemistry and pharmacology, the Hill and Hill–Langmuir equations are sigmoid functions.
In computer graphics and real-time rendering, some of the sigmoid functions are used to blend colors or geometry between two values, smoothly and without visible seams or discontinuities.
Titration curves between strong acids and strong bases have a sigmoid shape due to the logarithmic nature of the pH scale.
The logistic function can be calculated efficiently by utilizing type III unums (posits).[9]
A hierarchy of sigmoid growth models of increasing complexity (number of parameters) was built[10] with the primary goal of re-analyzing kinetic data, the so-called N-t curves, from heterogeneous nucleation experiments[11] in electrochemistry. The hierarchy at present includes three models, with 1, 2, and 3 parameters respectively (not counting the maximal number of nuclei Nmax): a tanh2-based model called α21,[12] originally devised to describe diffusion-limited crystal growth (not aggregation) in 2D; the Johnson–Mehl–Avrami–Kolmogorov (JMAK) model;[13] and the Richards model.[14] It was shown that even the simplest model suffices for the purpose at hand, implying that the experiments revisited are an example of two-step nucleation, with the first step being the growth of the metastable phase in which the nuclei of the stable phase form.[10]
See also
- Step function – Linear combination of indicator functions of real intervals
- Sign function – Function returning minus 1, zero or plus 1
- Heaviside step function – Indicator function of positive numbers
- Logistic regression – Statistical model for a binary dependent variable
- Logit – Function in statistics
- Softplus function – Smoothed ramp function
- Soboleva modified hyperbolic tangent – Mathematical activation function in data analysis
- Softmax function – Smooth approximation of one-hot arg max
- Swish function – Mathematical activation function in data analysis
- Weibull distribution – Continuous probability distribution
- Fermi–Dirac statistics – Statistical description for the behavior of fermions
- HELIOS – Hybrid Evaluation of Lifecycle and Impact of Outstanding Science
References
- ^ a b Han, Jun; Moraga, Claudio (1995). "The influence of the sigmoid function parameters on the speed of backpropagation learning". In Mira, José; Sandoval, Francisco (eds.). From Natural to Artificial Neural Computation. Lecture Notes in Computer Science. Vol. 930. pp. 195–201. doi:10.1007/3-540-59497-3_175. ISBN 978-3-540-59497-0.
- ^ Ling, Yibei; He, Bin (December 1993). "Entropic analysis of biological growth models". IEEE Transactions on Biomedical Engineering. 40 (12): 1193–1200. doi:10.1109/10.250574. PMID 8125495.
- ^ Dunning, Andrew J.; Kensler, Jennifer; Coudeville, Laurent; Bailleux, Fabrice (2015-12-28). "Some extensions in continuous methods for immunological correlates of protection". BMC Medical Research Methodology. 15 (107): 107. doi:10.1186/s12874-015-0096-9. PMC 4692073. PMID 26707389.
- ^ "grex --- Growth-curve Explorer". GitHub. 2022-07-09. Archived from the original on 2022-08-25. Retrieved 2022-08-25.
- ^ EpsilonDelta (2022-08-16). "Smooth Transition Function in One Dimension | Smooth Transition Function Series Part 1". 13:29/14:04 – via www.youtube.com.
- ^ Laurens Speelman, Yuki Numata (2022). "Harnessing the Power of S-Curves". RMI. Rocky Mountain Institute.
- ^ Gibbs, Mark N.; Mackay, D. (November 2000). "Variational Gaussian process classifiers". IEEE Transactions on Neural Networks. 11 (6): 1458–1464. doi:10.1109/72.883477. PMID 18249869. S2CID 14456885.
- ^ Smith, Julius O. (2010). Physical Audio Signal Processing (2010 ed.). W3K Publishing. ISBN 978-0-9745607-2-4. Archived from the original on 2022-07-14. Retrieved 2020-03-28.
- ^ Gustafson, John L.; Yonemoto, Isaac (2017-06-12). "Beating Floating Point at its Own Game: Posit Arithmetic" (PDF). Archived (PDF) from the original on 2022-07-14. Retrieved 2019-12-28.
- ^ a b Kleshtanova, Viktoria; Ivanov, Vassil V.; Hodzhaoglu, Feyzim; Prieto, Jose Emilio; Tonchev, Vesselin (2023). "Heterogeneous Substrates Modify Non-Classical Nucleation Pathways: Reanalysis of Kinetic Data from the Electrodeposition of Mercury on Platinum Using Hierarchy of Sigmoid Growth Models". Crystals. 13 (12). MDPI: 1690. doi:10.3390/cryst13121690. hdl:10261/341589.
- ^ Markov, I.; Stoycheva, E. (1976). "Saturation Nucleus Density in the Electrodeposition of Metals onto Inert Electrodes II. Experimental". Thin Solid Films. 35 (1). Elsevier: 21–35. doi:10.1016/0040-6090(76)90237-6.
- ^ Ivanov, V.V.; Tielemann, C.; Avramova, K.; Reinsch, S.; Tonchev, V. (2023). "Modelling Crystallization: When the Normal Growth Velocity Depends on the Supersaturation". Journal of Physics and Chemistry of Solids. 181: 111542. Elsevier. doi:10.1016/j.jpcs.2023.111542.
- ^ Fanfoni, M.; Tomellini, M. (1998). "The Johnson–Mehl–Avrami–Kolmogorov Model: A Brief Review". Il Nuovo Cimento D. 20. Springer: 1171–1182. doi:10.1007/BF03185527.
- ^ Tjørve, E.; Tjørve, K.M.C. (2010). "A Unified Approach to the Richards-Model Family for Use in Growth Analyses: Why We Need Only Two Model Forms". Journal of Theoretical Biology. 267 (3). Elsevier: 417–425. doi:10.1016/j.jtbi.2010.09.008.
Further reading
[edit]- Mitchell, Tom M. (1997). Machine Learning. WCB McGraw–Hill. ISBN 978-0-07-042807-2.. (NB. In particular see "Chapter 4: Artificial Neural Networks" (in particular pp. 96–97) where Mitchell uses the word "logistic function" and the "sigmoid function" synonymously – this function he also calls the "squashing function" – and the sigmoid (aka logistic) function is used to compress the outputs of the "neurons" in multi-layer neural nets.)
- Humphrys, Mark. "Continuous output, the sigmoid function". Archived from the original on 2022-07-14. Retrieved 2022-07-14. (NB. Properties of the sigmoid, including how it can shift along axes and how its domain may be transformed.)
External links
[edit]- "Fitting of logistic S-curves (sigmoids) to data using SegRegA". Archived from the original on 2022-07-14.
Mathematical Foundations
Definition
A sigmoid function is a mathematical function that maps the real numbers to a bounded interval, typically (0,1) or (−1,1), producing a characteristic S-shaped curve.[7] This shape arises from the function's behavior in transitioning smoothly between its limiting values, making it useful for modeling processes with saturation effects.[8] Formally, a sigmoid function $\sigma$ is continuous and differentiable, is strictly increasing, with $\sigma'(x) > 0$ for all $x$, and is constrained by finite horizontal asymptotes, with $\lim_{x \to -\infty} \sigma(x) = 0$ and $\lim_{x \to +\infty} \sigma(x) = 1$ in the standard normalization.[8] It features exactly one inflection point, where the concavity changes from upward to downward.[7] Monotonicity in this context means the function preserves the order of inputs: for any $x_1 < x_2$, $\sigma(x_1) < \sigma(x_2)$, ensuring a consistent progression along the S-curve without reversals.[7] Horizontal asymptotes represent the unchanging limits the function approaches at the extremes of the domain, preventing unbounded growth or decline.[9] The inflection point marks the location of maximum slope, where the rate of change is steepest, dividing the curve into symmetric or asymmetric regions of acceleration and deceleration.[8]
Properties

Sigmoid functions are continuous and infinitely differentiable over the entire real line, ensuring smoothness that facilitates their use in analytical models and numerical computations. This $C^{\infty}$ property holds for standard sigmoid functions, such as those in the logistic family, allowing for higher-order derivatives without discontinuities.[10][11] Their first derivative is strictly positive everywhere, reflecting the absence of flat regions or reversals in the function's growth.[12] These functions exhibit strict monotonicity, being increasing across their domain, which underpins their S-shaped profile and ensures a unique mapping from inputs to outputs within the bounded range.

Regarding convexity, sigmoid functions are convex for inputs below the inflection point and concave above it, with the second derivative changing sign exactly once, marking a transition from accelerating to decelerating growth. This sigmoidal convexity is a defining behavioral trait, distinguishing them from purely convex or concave functions.[13][11] Horizontal asymptotes characterize the long-term behavior: as $x \to +\infty$, the function approaches an upper bound (typically 1), and as $x \to -\infty$, it approaches a lower bound (typically 0). For symmetric variants centered at the origin, the inflection point occurs at $x = 0$, where the function value is midway between the asymptotes. The derivative of logistic-like sigmoids takes the form $\sigma'(x) = \sigma(x)\left(1 - \sigma(x)\right)$, achieving its maximum value of $1/4$ at the inflection point, which quantifies the steepest rate of change.[11][13]

Symmetry properties include the relation $\sigma(-x) = 1 - \sigma(x)$ for standard logistic sigmoids, implying antisymmetry around the midpoint. Under affine transformations, such as scaling by a positive constant or shifting the argument, the function retains its sigmoid nature, preserving monotonicity, boundedness, and the single inflection point. This invariance supports generalizations while maintaining core behavioral traits.[10][11] The uniqueness of the inflection point, where the concavity switches, ensures a single transition in the function's curvature, a hallmark that aligns with their role as activation functions in neural networks for modeling nonlinear transitions.[13][12]
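These properties are easy to verify numerically for the logistic case; a small NumPy sketch (illustrative only) checking the derivative identity, the maximum slope of 1/4 at the inflection point, and the antisymmetry relation:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-6, 6, 1201)
s = sigmoid(x)

# Closed-form derivative sigma*(1 - sigma) vs. finite differences.
assert np.allclose(s * (1 - s), np.gradient(s, x), atol=1e-4)

# Bell-shaped derivative with a single maximum of 1/4 at x = 0.
i = (s * (1 - s)).argmax()
print(x[i], s[i] * (1 - s[i]))  # -> 0.0 0.25

# Antisymmetry around the midpoint: sigma(-x) = 1 - sigma(x).
assert np.allclose(sigmoid(-x), 1 - s)
```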
Variants and Generalizations

Logistic Sigmoid
The logistic sigmoid function, in its standard form, is defined as
$$\sigma(x) = \frac{1}{1 + e^{-x}},$$
which maps every real number to a value in the open interval $(0, 1)$, asymptotically approaching 0 for large negative $x$ and 1 for large positive $x$.[14] This normalization arises naturally in contexts requiring bounded outputs between 0 and 1, such as probability estimates.

A generalized parameterization of the logistic function extends this form to
$$f(x) = \frac{L}{1 + e^{-k(x - x_0)}},$$
where $L$ specifies the upper horizontal asymptote (maximum value), $k$ controls the steepness or growth rate of the curve, and $x_0$ denotes the midpoint, or inflection point, where $f(x_0) = L/2$.[15] This flexible form allows modeling of various S-shaped growth processes by adjusting the parameters to fit empirical data.

The logistic function originates from solving the logistic differential equation
$$\frac{dP}{dt} = r P \left(1 - \frac{P}{K}\right),$$
a model for bounded growth where $P$ is the population at time $t$, $r$ is the intrinsic growth rate, and $K$ is the carrying capacity.[15] Separation of variables and integration yield the explicit solution
$$P(t) = \frac{K}{1 + \dfrac{K - P_0}{P_0}\, e^{-r t}},$$
where $P_0$ is the initial value; rescaling time and normalizing by $K$ recovers the generalized logistic form with $L = K$, $k = r$, and $x_0$ the time at which $P = K/2$. This derivation, introduced by Pierre Verhulst in 1838 (with the term 'logistic' coined in 1845), highlights the function's roots in exponential growth tempered by resource limits.[15]

To map the standard logistic sigmoid to other intervals, such as $(-1, 1)$, the transformation $g(x) = 2\sigma(x) - 1$ is commonly applied, which produces an odd function symmetric about the origin.[16] This scaled version equals $\tanh(x/2)$, linking it to hyperbolic functions while preserving the S-shape.[17]

In computational implementations, direct evaluation of $e^{-x}$ risks overflow or underflow for large $|x|$ due to the exponential term exceeding floating-point limits. To mitigate this, approximations are employed, such as returning 0 for $x \ll 0$ and 1 for $x \gg 0$, or using equivalent expressions like $\sigma(x) = \frac{e^{x}}{1 + e^{x}}$ for $x < 0$ to maintain numerical stability without loss of precision in typical ranges.[18]
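A minimal sketch of the stable evaluation just described, branching on the sign of $x$ so that the exponential always receives a non-positive argument (plain Python; illustrative rather than any library's actual implementation):

```python
import math

def stable_sigmoid(x: float) -> float:
    """Logistic sigmoid evaluated without overflow.

    For x >= 0, exp(-x) <= 1, so 1/(1 + exp(-x)) is safe.
    For x < 0, the algebraically equal form exp(x)/(1 + exp(x))
    avoids exp(-x), which would overflow for very negative x.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0 implies z <= 1: no overflow possible
    return z / (1.0 + z)

print(stable_sigmoid(800.0))   # 1.0
print(stable_sigmoid(-800.0))  # ~0.0; the naive form would raise OverflowError
```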
Other Sigmoid Functions

The hyperbolic tangent function, defined as
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}},$$
serves as a prominent sigmoid alternative, mapping inputs to the range $(-1, 1)$ and exhibiting symmetry around zero due to its odd nature.[8] This zero-centered output facilitates faster convergence in optimization processes compared to positively biased sigmoids.[19] Its saturation occurs at a moderate rate, with steeper gradients near the origin than exponential-based forms.[8]

Another variant is the arctangent-based sigmoid, commonly scaled as
$$f(x) = \frac{1}{2} + \frac{1}{\pi} \arctan(x),$$
which bounds outputs to $(0, 1)$ while providing a smooth, monotonic transition.[20] This form demonstrates slower saturation than the hyperbolic tangent, as its approach to the asymptotes is more gradual, owing to the bounded derivative of the arctangent.[21] It maintains odd symmetry in its unscaled version but is adjusted for positive range applications.[20]

The Gompertz function offers an asymmetric sigmoid, given by
$$f(x) = a e^{-b e^{-c x}},$$
where $a$ sets the upper asymptote, and $b$ and $c$ control growth parameters, yielding a range of $(0, a)$.[22] Its curve features a delayed initial rise followed by rapid acceleration, contrasting with symmetric sigmoids through pronounced asymmetry.[22] Saturation in the upper region is slower than in logistic forms, reflecting its double-exponential structure.[13]

Algebraic sigmoids provide computationally efficient alternatives, such as the rational form $f(x) = \frac{x}{1 + |x|}$, which approximates a bounded S-curve over $(-1, 1)$ without exponentials.[13] Piecewise or rational constructions like this enable faster evaluation in resource-constrained settings, though they may introduce minor discontinuities in derivatives.[23]

These functions differ notably in saturation speed, with the arctangent showing the slowest approach to its bounds, the hyperbolic tangent offering balanced steepness, and the Gompertz function displaying asymmetric deceleration.[19] Symmetry varies from the odd, zero-centered hyperbolic tangent to the asymmetric Gompertz function, while bounded ranges consistently limit outputs to finite intervals, preserving monotonicity as a shared sigmoid trait.[13] Algebraic variants prioritize efficiency over smoothness, saturating more abruptly in approximations.[23]
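The saturation differences can be tabulated directly; a short NumPy sketch (the normalizations to a shared (−1, 1) range are illustrative choices) printing each curve's remaining distance from its upper bound:

```python
import numpy as np

# Each variant rescaled to the range (-1, 1) so the gaps are comparable.
funcs = {
    "tanh":      np.tanh,
    "arctan":    lambda x: (2 / np.pi) * np.arctan(x),
    "algebraic": lambda x: x / (1 + np.abs(x)),
}

for x in (1.0, 5.0, 25.0):
    gaps = "  ".join(f"{name}: {1 - f(x):.2e}" for name, f in funcs.items())
    print(f"x = {x:4.1f}   gap to 1:  {gaps}")
# tanh closes its gap exponentially fast, while the arctangent and
# algebraic forms approach the bound only at rate O(1/x).
```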
Applications

Statistics and Probability
In statistics, the sigmoid function plays a central role in modeling binary outcomes through its connection to the logistic distribution. The cumulative distribution function (CDF) of the logistic distribution is given by the logistic sigmoid:
$$F(x; \mu, s) = \frac{1}{1 + e^{-(x - \mu)/s}},$$
where $\mu$ is the location parameter representing the mean and median, and $s$ is the scale parameter that controls the spread and steepness of the distribution.[24] This form ensures that $F$ maps any real-valued input to a probability between 0 and 1, making it suitable for representing cumulative probabilities in probabilistic models. The logistic distribution is symmetric and bell-shaped, with variance $\pi^{2} s^{2}/3$, and arises naturally in contexts where errors follow a logistic rather than normal distribution.[24]

The logistic sigmoid also serves as an approximation to the cumulative distribution function of the standard normal distribution in probit models, providing a computationally simpler alternative in logistic regression. Specifically, the sigmoid closely resembles $\Phi(x/1.702)$, where $\Phi$ is the normal CDF and the constant $1.702$ scales the argument for a good fit, particularly in the central region around zero.[25] This approximation justifies the use of the logistic model over probit in many applications, as it yields similar coefficient estimates while avoiding the need for numerical integration of the normal CDF.[26]

In logistic regression, the sigmoid output interprets the linear predictor $z = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d$ as the log-odds of the positive outcome, where the probability satisfies $p = \sigma(z)$, so that $\ln\frac{p}{1-p} = z$ for the standard case with scale $s = 1$.[27] This relationship allows coefficients to be exponentiated directly into odds ratios, quantifying how the odds change with predictors; for instance, a coefficient $\beta_j = 0.5$ implies an odds ratio of $e^{0.5} \approx 1.65$, meaning a one-unit increase in the $j$-th predictor multiplies the odds by 1.65, holding other variables constant.[28]

Bayesian frameworks leverage the logistic sigmoid for updating posterior probabilities in binary classification, often modeling the posterior odds as a logistic function of evidence under conjugate priors like the logistic-normal approximation.[29] In Bayesian logistic regression, the sigmoid arises when integrating over parameter uncertainty, enabling variational inference to approximate intractable posteriors and update beliefs about class probabilities based on observed data.[30]

Parameter estimation in sigmoid-based models, such as logistic regression, typically employs maximum likelihood estimation (MLE) to maximize the log-likelihood
$$\ell(\beta) = \sum_{i} \left[ y_i \ln \sigma(z_i) + (1 - y_i) \ln\left(1 - \sigma(z_i)\right) \right],$$
where $y_i \in \{0, 1\}$ are binary responses.[31] This objective is concave (equivalently, the negative log-likelihood is convex), ensuring a unique global maximum solvable via gradient-based methods like Newton-Raphson, which iteratively update $\beta$ using the score function and Hessian derived from the sigmoid's derivative $\sigma'(z) = \sigma(z)(1 - \sigma(z))$.[32] MLE provides consistent and asymptotically efficient estimates under standard regularity conditions, forming the basis for inference in these models.[33]
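A compact sketch of the estimation procedure described above: Newton-Raphson ascent on the logistic log-likelihood, written with NumPy (the toy data, seed, and function names are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X: (n, d) design matrix (first column of ones for the intercept).
    y: (n,) binary responses in {0, 1}.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)              # score function
        W = p * (1 - p)                   # sigma'(z) along the diagonal
        H = -(X.T * W) @ X                # Hessian of the log-likelihood
        beta -= np.linalg.solve(H, grad)  # Newton step: beta - H^{-1} grad
    return beta

# Toy data generated from known coefficients (0.5, 2.0).
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
y = (rng.random(500) < sigmoid(0.5 + 2.0 * x1)).astype(float)
X = np.column_stack([np.ones_like(x1), x1])
print(fit_logistic(X, y))  # estimates close to (0.5, 2.0)
```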
Machine Learning and Neural Networks

In artificial neural networks, the sigmoid function serves as an activation function that introduces non-linearity into the model, enabling it to learn complex patterns beyond linear transformations. Applied to the weighted sum of inputs in hidden layers, it maps real-valued inputs to the range (0, 1), which facilitates the representation of hierarchical features during forward propagation. In the output layer for binary classification tasks, the sigmoid's output is interpreted as the probability of belonging to the positive class, aligning with probabilistic decision-making.

A key advantage of the sigmoid in training neural networks via backpropagation lies in its derivative, which simplifies gradient computation. The derivative is given by:
$$\sigma'(x) = \sigma(x)\left(1 - \sigma(x)\right).$$
This closed-form expression allows efficient calculation of error gradients during the backward pass, as it depends only on the sigmoid's output without requiring additional forward computations. This property contributed to the widespread adoption of sigmoid activations in early multilayer perceptrons, where backpropagation was first demonstrated effectively.

Despite these benefits, the sigmoid activation suffers from the vanishing gradient problem, where gradients approach zero for large positive or negative inputs due to the function's saturation in the flat regions near 0 and 1.[34] This leads to slow or stalled learning in deep networks, as updates to earlier layer weights become negligible during backpropagation.[34] To mitigate this, alternatives like the rectified linear unit (ReLU) activation, which avoids saturation for positive inputs, have become preferred in hidden layers of modern architectures.

In popular deep learning frameworks, the sigmoid is implemented with optimizations for numerical stability. For instance, TensorFlow provides tf.keras.activations.sigmoid, which handles large inputs to prevent overflow in the exponential term.[35] Similarly, PyTorch's torch.nn.Sigmoid module applies the function element-wise, often paired with stable variants like log_sigmoid for loss computations involving logarithms, computed as $\log \sigma(x) = -\log\left(1 + e^{-x}\right)$ to avoid underflow.
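The log-domain rewrite mentioned above can be expressed in a few lines; a sketch that mirrors what such log-sigmoid helpers compute (plain Python, as an illustration rather than the frameworks' actual code):

```python
import math

def log_sigmoid(x: float) -> float:
    """log(sigma(x)) computed without underflow or overflow.

    Uses log(sigma(x)) = -log(1 + exp(-x))
                       = min(x, 0) - log1p(exp(-|x|)),
    so exp only ever sees a non-positive argument.
    """
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))

print(log_sigmoid(-800.0))  # -800.0; naive log(sigmoid(-800)) gives log(0)
print(log_sigmoid(800.0))   # 0.0; the exact value underflows double precision
```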
For binary classification outputs, the sigmoid is typically applied in the final layer, followed by binary cross-entropy loss to measure the divergence between predicted probabilities and true labels. This combination encourages the model to produce well-calibrated probabilities, with the loss defined as
$$\mathcal{L} = -\left[ y \log \hat{y} + (1 - y) \log\left(1 - \hat{y}\right) \right],$$
where $\hat{y} = \sigma(z)$ and $z$ is the linear output. Frameworks like TensorFlow support a from_logits=True option in binary cross-entropy to apply the sigmoid internally, enhancing numerical stability by avoiding explicit computation of the sigmoid on raw logits.
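The from_logits path can be sketched explicitly; the following expansion (standard algebra, written as an illustration rather than any framework's source) computes the same loss from the raw logit without ever forming $\sigma(z)$:

```python
import math

def bce_with_logits(z: float, y: float) -> float:
    """Binary cross-entropy on a raw logit z, with label y in {0, 1}.

    Expands -[y*log(sigma(z)) + (1 - y)*log(1 - sigma(z))] into
    max(z, 0) - z*y + log1p(exp(-|z|)), which never evaluates
    sigma(z) itself and so stays finite for any z.
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

print(bce_with_logits(50.0, 1.0))   # ~0.0: confident and correct
print(bce_with_logits(-50.0, 1.0))  # ~50.0: confident and wrong, large penalty
```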