Sigmoid function

from Wikipedia

[Figure: The logistic curve]
[Figure: Plot of the error function]

A sigmoid function is any mathematical function whose graph has a characteristic S-shaped or sigmoid curve.

A common example of a sigmoid function is the logistic function, which is defined by the formula[1]

$$\sigma(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{1 + e^x} = 1 - \sigma(-x).$$

Other sigmoid functions are given in the Examples section. In some fields, most notably in the context of artificial neural networks, the term "sigmoid function" is used as a synonym for "logistic function".

Special cases of the sigmoid function include the Gompertz curve (used in modeling systems that saturate at large values of x) and the ogee curve (used in the spillway of some dams). Sigmoid functions have a domain of all real numbers, with a return (response) value that is commonly monotonically increasing, though it may be decreasing. Sigmoid functions most often show a return value (y axis) in the range 0 to 1. Another commonly used range is from −1 to 1.

A wide variety of sigmoid functions including the logistic and hyperbolic tangent functions have been used as the activation function of artificial neurons. Sigmoid curves are also common in statistics as cumulative distribution functions (which go from 0 to 1), such as the integrals of the logistic density, the normal density, and Student's t probability density functions. The logistic sigmoid function is invertible, and its inverse is the logit function.
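
Since the logistic sigmoid's inverse is the logit, a minimal Python sketch (the function names are ours, for illustration) can make that relationship concrete:

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, mapping any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p: float) -> float:
    """Inverse of the logistic sigmoid: the log-odds of p in (0, 1)."""
    return math.log(p / (1.0 - p))

# Round trip: logit(sigmoid(x)) recovers x (up to floating-point error).
for x in (-2.0, 0.0, 3.5):
    p = sigmoid(x)
    print(f"x={x:+.2f}  sigmoid={p:.6f}  logit(sigmoid)={logit(p):+.6f}")
```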

Definition


A sigmoid function is a bounded, differentiable, real function that is defined for all real input values and has a positive derivative at each point.[1][2]

Properties


In general, a sigmoid function is monotonic, and has a first derivative which is bell shaped. Conversely, the integral of any continuous, non-negative, bell-shaped function (with one local maximum and no local minimum, unless degenerate) will be sigmoidal. Thus the cumulative distribution functions for many common probability distributions are sigmoidal. One such example is the error function, which is related to the cumulative distribution function of a normal distribution; another is the arctan function, which is related to the cumulative distribution function of a Cauchy distribution.

A sigmoid function is constrained by a pair of horizontal asymptotes as $x \to \pm\infty$.

A sigmoid function is convex for values less than a particular point, and it is concave for values greater than that point: in many of the examples here, that point is 0.
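
For the logistic function this point can be located explicitly; differentiating $\sigma(x) = 1/(1 + e^{-x})$ twice (a standard calculus exercise, not taken from the cited sources) gives

$$\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr), \qquad \sigma''(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr)\bigl(1 - 2\sigma(x)\bigr),$$

so $\sigma''$ changes sign exactly where $\sigma(x) = \tfrac{1}{2}$, that is at $x = 0$: the curve is convex for $x < 0$ and concave for $x > 0$.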

Examples

[Figure: Some sigmoid functions compared. In the drawing all functions are normalized in such a way that their slope at the origin is 1.]
  • Logistic function: $f(x) = \dfrac{1}{1 + e^{-x}}$
  • Hyperbolic tangent (shifted and scaled version of the logistic function, above): $f(x) = \tanh x = \dfrac{e^x - e^{-x}}{e^x + e^{-x}}$
  • Arctangent function: $f(x) = \arctan x$
  • Gudermannian function: $f(x) = \operatorname{gd}(x) = \displaystyle\int_0^x \frac{dt}{\cosh t} = 2\arctan\left(\tanh\frac{x}{2}\right)$
  • Error function: $f(x) = \operatorname{erf}(x) = \dfrac{2}{\sqrt{\pi}} \displaystyle\int_0^x e^{-t^2}\,dt$
  • Generalised logistic function: $f(x) = \left(1 + e^{-x}\right)^{-\alpha}, \quad \alpha > 0$
  • Smoothstep function
  • Some algebraic functions, for example $f(x) = \dfrac{x}{\sqrt{1 + x^2}}$
  • and in a more general form[3] $f(x) = \dfrac{x}{\left(1 + |x|^k\right)^{1/k}}$ for $k \ge 1$
  • Up to shifts and scaling, many sigmoids are special cases of $f(x) = \varphi(\varphi(x, \beta), \alpha)$, where $\varphi$ is the inverse of the negative Box–Cox transformation, and $\alpha$ and $\beta$ are shape parameters.[4]
  • Smooth transition function[5] normalized to (−1,1):

$$f(x) = \begin{cases} -1 & x \le -1 \\ \tanh\left(\dfrac{m x}{1 - x^2}\right) & -1 < x < 1 \\ 1 & x \ge 1 \end{cases}$$

using the hyperbolic tangent mentioned above. Here, $m$ is a free parameter encoding the slope at $x = 0$, which must be greater than or equal to $\sqrt{3}$ because any smaller value will result in a function with multiple inflection points, which is therefore not a true sigmoid. This function is unusual because it actually attains the limiting values of −1 and 1 within a finite range, meaning that its value is constant at −1 for all $x \le -1$ and at 1 for all $x \ge 1$. Nonetheless, it is smooth (infinitely differentiable, $C^\infty$) everywhere, including at $x = \pm 1$.
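
Assuming the reconstruction above, the piecewise definition translates directly into code; a minimal Python sketch (the function name and the default slope are our choices):

```python
import math

SQRT3 = math.sqrt(3.0)

def smooth_transition(x: float, m: float = SQRT3) -> float:
    """Smooth transition sigmoid normalized to (-1, 1).

    Constant at -1 for x <= -1 and at +1 for x >= 1, yet infinitely
    differentiable everywhere. `m` is the slope at x = 0; values below
    sqrt(3) would introduce extra inflection points, so we reject them.
    """
    if m < SQRT3:
        raise ValueError("m must be >= sqrt(3) for a true sigmoid")
    if x <= -1.0:
        return -1.0
    if x >= 1.0:
        return 1.0
    return math.tanh(m * x / (1.0 - x * x))

print([round(smooth_transition(x), 4) for x in (-1.5, -0.9, 0.0, 0.9, 1.5)])
```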

Applications

[Figure: Inverted logistic S-curve to model the relation between wheat yield and soil salinity]

Many natural processes, such as those of complex system learning curves, exhibit a progression from small beginnings that accelerates and approaches a climax over time.[6] When a specific mathematical model is lacking, a sigmoid function is often used.[7]

The van Genuchten–Gupta model is based on an inverted S-curve and applied to the response of crop yield to soil salinity.

Examples of the application of the logistic S-curve to the response of crop yield (wheat) to both the soil salinity and depth to water table in the soil are shown in modeling crop response in agriculture.

In artificial neural networks, sometimes non-smooth functions are used instead for efficiency; these are known as hard sigmoids.

In audio signal processing, sigmoid functions are used as waveshaper transfer functions to emulate the sound of analog circuitry clipping.[8]

In biochemistry and pharmacology, the Hill and Hill–Langmuir equations are sigmoid functions.

In computer graphics and real-time rendering, some of the sigmoid functions are used to blend colors or geometry between two values, smoothly and without visible seams or discontinuities.

Titration curves between strong acids and strong bases have a sigmoid shape due to the logarithmic nature of the pH scale.

The logistic function can be calculated efficiently by utilizing type III Unums.[9]

A hierarchy of sigmoid growth models of increasing complexity (number of parameters) was built[10] with the primary goal of re-analyzing kinetic data, the so-called N-t curves, from heterogeneous nucleation experiments[11] in electrochemistry. The hierarchy at present includes three models, with 1, 2, and 3 parameters respectively (not counting the maximal number of nuclei Nmax): a tanh² based model called α21,[12] originally devised to describe diffusion-limited crystal growth (not aggregation) in 2D; the Johnson–Mehl–Avrami–Kolmogorov (JMAK) model;[13] and the Richards model.[14] It was shown that even the simplest model suffices for this purpose, implying that the experiments revisited are an example of two-step nucleation, with the first step being the growth of the metastable phase in which the nuclei of the stable phase form.[10]

from Grokipedia
The sigmoid function, also known as the logistic sigmoid or simply the sigmoid, is a mathematical function that maps any real-valued number to an output between 0 and 1, producing a characteristic S-shaped curve. It is commonly defined by the formula $\sigma(x) = \frac{1}{1 + e^{-x}}$, where $e$ is the base of the natural logarithm; this form ensures the output approaches 1 as $x$ becomes large and positive, approaches 0 as $x$ becomes large and negative, and equals 0.5 at $x = 0$. The function is continuous, differentiable, and strictly increasing, making it suitable for modeling bounded growth processes and probabilistic interpretations.

Originally developed in the context of population dynamics, the sigmoid function traces its roots to the work of Belgian mathematician Pierre François Verhulst, who introduced the logistic equation in 1838 to describe limited population growth approaching a carrying capacity. Verhulst's model, published in Correspondance Mathématique et Physique, generalized exponential growth by incorporating an upper bound, yielding the differential equation $\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$, whose solution involves the sigmoid form. This logistic curve gained renewed attention in the 20th century for applications in ecology, epidemiology, and economics, where it models phenomena like diffusion of innovations or resource saturation.

In modern statistics and machine learning, the sigmoid function underpins logistic regression, a foundational method for binary classification that estimates the probability of a binary outcome using the logit link: $p = \sigma(\mathbf{w}^T \mathbf{x} + b)$, where $\mathbf{w}$ and $b$ are parameters learned via maximum likelihood. In artificial neural networks, it serves as an activation function to introduce nonlinearity, enabling the approximation of complex functions; its use was popularized in the 1986 seminal paper on backpropagation by Rumelhart, Hinton, and Williams, which demonstrated efficient training of multilayer networks with sigmoid units. Despite its advantages in interpretability and smoothness, the sigmoid's vanishing-gradient problem, where derivatives approach zero for large $|x|$, has led to alternatives like ReLU in deeper architectures, though it remains influential in probabilistic modeling and shallow networks.

Mathematical Foundations

Definition

A sigmoid function is a mathematical function that maps the real numbers to a bounded interval, typically (0, 1) or (−1, 1), producing a characteristic S-shaped curve. This shape arises from the function's smooth transition between its limiting values, making it useful for modeling processes with saturation effects. Formally, a sigmoid function $\sigma: \mathbb{R} \to (a, b)$ with $a < b$ has finite horizontal asymptotes, is strictly increasing with $\sigma'(x) > 0$ for all $x$, is continuous and differentiable, and satisfies $\lim_{x \to -\infty} \sigma(x) = a$ and $\lim_{x \to \infty} \sigma(x) = b$. It features exactly one inflection point, where the concavity changes from upward to downward.

Monotonicity in this context means the function preserves the order of inputs: for any $x_1 < x_2$, $\sigma(x_1) < \sigma(x_2)$, ensuring a consistent progression along the S-curve without reversals. Horizontal asymptotes represent the unchanging limits the function approaches at the extremes of the domain, preventing unbounded growth or decline. The inflection point marks the location of maximum slope, where the rate of change is steepest, dividing the curve into symmetric or asymmetric regions of acceleration and deceleration.
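
These defining conditions can be probed numerically; the following small sketch is our own heuristic, with all names and thresholds invented for illustration (a finite grid cannot prove true limit behavior):

```python
import numpy as np

def looks_sigmoidal(f, lo=-5.0, hi=5.0, n=2001):
    """Heuristic check of the definition above on a finite grid:
    strictly increasing, flattening toward both ends (bounded growth),
    and exactly one inflection point."""
    x = np.linspace(lo, hi, n)
    y = f(x)
    dy = np.diff(y)
    increasing = np.all(dy > 0)
    flattens = max(dy[0], dy[-1]) < 0.01 * dy.max()   # near-asymptotic ends
    curvature = np.diff(y, 2)                         # discrete 2nd derivative
    sign = np.sign(curvature[np.abs(curvature) > 1e-12])
    one_inflection = np.count_nonzero(np.diff(sign)) == 1
    return increasing and flattens and one_inflection

print(looks_sigmoidal(np.tanh))   # True
print(looks_sigmoidal(np.exp))    # False: convex everywhere, no inflection
```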

Properties

Sigmoid functions are continuous and infinitely differentiable over the entire real line, ensuring smoothness that facilitates their use in analytical models and numerical computations. This $C^\infty$ property holds for standard sigmoid functions, such as those in the logistic family, allowing for higher-order derivatives without discontinuities. Their first derivative is strictly positive everywhere, reflecting the absence of flat regions or reversals in the function's growth. These functions exhibit strict monotonicity, being increasing across their domain, which underpins their S-shaped profile and ensures a unique mapping from inputs to outputs within the bounded range.

Regarding convexity, sigmoid functions are convex for inputs below the inflection point and concave above it, with the second derivative changing sign exactly once, marking a transition from accelerating to decelerating growth. This sigmoidal convexity is a defining behavioral trait, distinguishing them from purely convex or concave functions. Horizontal asymptotes characterize the long-term behavior: as $x \to \infty$, the function approaches an upper bound (typically 1), and as $x \to -\infty$, it approaches a lower bound (typically 0). For symmetric variants centered at the origin, the inflection point occurs at $x = 0$, where the function value is midway between the asymptotes.

The derivative of logistic-like sigmoids takes the form $\sigma'(x) = \sigma(x)(1 - \sigma(x))$, achieving its maximum value at the inflection point, which quantifies the steepest rate of change. Symmetry properties include the relation $\sigma(x) + \sigma(-x) = 1$ for standard logistic sigmoids, implying antisymmetry around the midpoint. Under affine transformations, such as scaling by a positive constant or shifting the argument, the function retains its sigmoid nature, preserving monotonicity, boundedness, and the single inflection point. This invariance supports generalizations while maintaining core behavioral traits. The uniqueness of the inflection point, where the concavity switches, ensures a single transition in the function's curvature, a hallmark that aligns with their role as activation functions in neural networks for modeling nonlinear transitions.
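
The identities above are easy to verify numerically; a small NumPy sketch (our own, illustrative only) checks the derivative form, the symmetry relation, and the single sign change of the second derivative for the logistic sigmoid:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-6, 6, 1001)
s = sigmoid(x)

# Derivative identity: sigma'(x) = sigma(x) * (1 - sigma(x))
numeric = np.gradient(s, x)
assert np.allclose(numeric, s * (1 - s), atol=1e-3)

# Symmetry: sigma(x) + sigma(-x) = 1
assert np.allclose(sigmoid(x) + sigmoid(-x), 1.0)

# Convex below the origin, concave above it (single inflection at x = 0)
second = np.gradient(numeric, x)
assert np.all(second[x < -0.1] > 0) and np.all(second[x > 0.1] < 0)
print("all properties verified")
```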

Variants and Generalizations

Logistic Sigmoid

The logistic sigmoid function, in its standard form, is defined as

$$\sigma(x) = \frac{1}{1 + e^{-x}},$$

which maps every real number $x$ to a value in the open interval $(0, 1)$, asymptotically approaching 0 for large negative $x$ and 1 for large positive $x$. This normalization arises naturally in contexts requiring bounded outputs between 0 and 1, such as probability estimates. A generalized parameterization of the logistic function extends this form to

$$\sigma(x) = \frac{L}{1 + e^{-k(x - x_0)}},$$

where $L > 0$ specifies the upper horizontal asymptote (maximum value), $k > 0$ controls the steepness or growth rate of the curve, and $x_0$ denotes the midpoint, or inflection point, where $\sigma(x_0) = L/2$. This flexible form allows modeling of various S-shaped growth processes by adjusting the parameters to fit empirical data.

The logistic function originates from solving the logistic differential equation

$$\frac{dP}{dt} = r P \left(1 - \frac{P}{K}\right),$$

a model for bounded growth where $P(t)$ is the population at time $t$, $r > 0$ is the intrinsic growth rate, and $K > 0$ is the carrying capacity. Separation of variables and integration yield the explicit solution $P(t) = \frac{K}{1 + \left(\frac{K}{P_0} - 1\right) e^{-rt}}$, where $P_0 = P(0)$ is the initial value; rescaling time so that $x = rt$ and normalizing by $K$ recovers the generalized logistic form with $L = K$, $k = r$, and $x_0 = \frac{1}{r} \ln\left(\frac{K}{P_0} - 1\right)$. This derivation, introduced by Pierre Verhulst in 1838 (with the term "logistic" coined in 1845), highlights the function's roots in exponential growth tempered by resource limits.

To map the standard logistic sigmoid to other intervals, such as $(-1, 1)$, the transformation $2\sigma(x) - 1$ is commonly applied, which produces an odd function symmetric about the origin. This scaled version equals $\tanh(x/2)$, linking it to the hyperbolic tangent while preserving the S-shape.

In computational implementations, direct evaluation of $\sigma(x)$ risks overflow or underflow for large $|x|$ because the exponential term can exceed floating-point limits. To mitigate this, stable formulations are employed, such as returning 0 for $x \ll 0$ and 1 for $x \gg 0$, or using equivalent expressions like $\sigma(x) = e^x / (1 + e^x)$ for $x < 0$ to maintain numerical stability without loss of precision in typical ranges.
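
The stability trick just described is short enough to show in full; a Python sketch assuming double-precision floats (the function name is ours):

```python
import math

def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid.

    For x >= 0, 1/(1 + e^{-x}) never overflows (e^{-x} <= 1).
    For x < 0, the algebraically equivalent e^{x}/(1 + e^{x}) is used,
    so the exponential argument is always non-positive.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# The naive form would overflow in math.exp(1000); the stable form does not.
print(stable_sigmoid(1000.0), stable_sigmoid(-1000.0))  # 1.0  0.0
```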

Other Sigmoid Functions

The hyperbolic tangent function, defined as

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}},$$

serves as a prominent sigmoid alternative, mapping inputs to the range $(-1, 1)$ and exhibiting symmetry around zero due to its odd nature. This zero-centered output facilitates faster convergence in optimization processes compared to positively biased sigmoids. It saturates at a moderate rate, with steeper gradients near the origin than the logistic function.

Another variant is the arctangent-based sigmoid, commonly scaled as

$$\sigma(x) = \frac{1}{\pi} \arctan(x) + \frac{1}{2},$$

which bounds outputs to $(0, 1)$ while providing a smooth, monotonic transition. This form demonstrates slower saturation than the hyperbolic tangent, as its approach to the asymptotes is more gradual, owing to the arctangent's slowly decaying derivative. It is odd-symmetric in its unscaled version but is shifted here for positive-range applications.

The Gompertz function offers an asymmetric sigmoid, given by

$$\sigma(x) = a e^{-b e^{-c x}},$$

where $a > 0$ sets the upper asymptote and $b, c > 0$ control the growth parameters, yielding a range of $(0, a)$. Its curve features a delayed initial rise followed by rapid acceleration, contrasting with symmetric sigmoids through pronounced asymmetry. Saturation in the upper region is slower than in logistic forms, reflecting its double-exponential structure.

Algebraic sigmoids provide computationally efficient alternatives, such as the rational form $f(x) = \frac{x}{1 + |x|}$, which traces a bounded S-curve over $(-1, 1)$ without exponentials. Piecewise or rational constructions like this enable faster evaluation in resource-constrained settings, though they may introduce minor discontinuities in higher derivatives.

These functions differ notably in saturation speed, with the arctangent form showing the slowest approach to its bounds, the hyperbolic tangent offering balanced steepness, and the Gompertz function displaying asymmetric deceleration. Symmetry varies from the odd, zero-centered hyperbolic tangent to the asymmetric Gompertz, while the bounded ranges consistently limit outputs to finite intervals, preserving monotonicity as a shared sigmoid trait. Algebraic variants prioritize efficiency over smoothness.
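
The differing saturation speeds are easy to see side by side. In the following NumPy sketch (our own comparison; each variant is rescaled, by our choice, to the range (−1, 1) with unit slope at the origin):

```python
import numpy as np

# Sigmoid variants rescaled to range (-1, 1) with unit slope at the origin.
variants = {
    "tanh":      lambda x: np.tanh(x),
    "arctan":    lambda x: (2.0 / np.pi) * np.arctan(np.pi * x / 2.0),
    "abs-ratio": lambda x: x / (1.0 + np.abs(x)),
    "algebraic": lambda x: x / np.sqrt(1.0 + x**2),
}

for x in (0.5, 2.0, 5.0):
    row = ", ".join(f"{name}={f(np.float64(x)):.4f}" for name, f in variants.items())
    print(f"x={x}: {row}")
# tanh is already ~0.9999 at x=5, while the arctan and algebraic forms
# approach their bound only polynomially (about 0.92 and 0.98 there).
```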

Applications

Statistics and Probability

In statistics, the sigmoid function plays a central role in modeling binary outcomes through its connection to the logistic distribution. The cumulative distribution function (CDF) of the logistic distribution is given by the logistic sigmoid:

$$F(x) = \frac{1}{1 + e^{-(x - \mu)/s}},$$

where $\mu$ is the location parameter, representing the mean and median, and $s > 0$ is the scale parameter that controls the spread and steepness of the distribution. This form ensures that $F(x)$ maps any real-valued input to a probability between 0 and 1, making it suitable for representing cumulative probabilities in probabilistic models. The logistic distribution is symmetric and bell-shaped, with variance $\pi^2 s^2 / 3$, and arises naturally in contexts where errors follow a logistic rather than a normal distribution.

The logistic sigmoid also serves as an approximation to the cumulative distribution function of the standard normal distribution used in probit models, providing a computationally simpler alternative in logistic regression. Specifically, the sigmoid $\sigma(x) = 1 / (1 + e^{-x})$ closely resembles $\Phi(x/\lambda)$, where $\Phi$ is the normal CDF and $\lambda \approx 1.7$ rescales the argument for a good fit, particularly in the central region around zero. This approximation justifies the use of the logistic model over the probit in many applications, as it yields similar coefficient estimates while avoiding numerical integration of the normal CDF.

In logistic regression, the sigmoid output $\sigma(x)$ interprets $x$ (the linear predictor) as the log-odds of the positive outcome: the probability $p = \sigma(x)$ satisfies $\text{odds}(p) = p / (1 - p) = e^x$ in the standard case with scale $s = 1$. This relationship allows coefficients to be exponentiated directly into odds ratios, quantifying how the odds change with predictors; for instance, a coefficient $\beta_j = 0.5$ implies an odds ratio of $e^{0.5} \approx 1.65$, meaning a one-unit increase in the $j$-th predictor multiplies the odds by 1.65, holding other variables constant.

Bayesian frameworks leverage the logistic sigmoid for updating posterior probabilities in binary classification, often modeling the posterior probability as a sigmoid of the accumulated log-odds evidence under conjugate priors like the logistic-normal approximation. In Bayesian logistic regression, the sigmoid arises when integrating over parameter uncertainty, enabling variational inference to approximate intractable posteriors and update beliefs about class probabilities based on observed data.

Parameter estimation in sigmoid-based models, such as logistic regression, typically employs maximum likelihood estimation (MLE) to maximize the log-likelihood

$$\ell(\beta) = \sum_i \left[ y_i x_i^T \beta - \log\left(1 + e^{x_i^T \beta}\right) \right],$$

where $y_i \in \{0, 1\}$ are binary responses. This objective is concave, ensuring a unique global maximum (when it exists) reachable via gradient-based methods like Newton-Raphson, which iteratively update $\beta$ using the score function and Hessian derived from the sigmoid's derivative $\sigma(x)(1 - \sigma(x))$. MLE provides consistent and asymptotically efficient estimates under standard regularity conditions, forming the basis for inference in these models.
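
A bare-bones NumPy sketch of the Newton-Raphson fit just described (our own illustrative code, not a production implementation; the toy data and function names are invented):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, iters=25, tol=1e-10):
    """Logistic-regression MLE via Newton-Raphson.

    X: (n, d) design matrix (include a column of ones for the intercept).
    y: (n,) array of 0/1 responses.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)                    # score function
        W = p * (1.0 - p)                       # sigma'(z) weights
        hess = -(X * W[:, None]).T @ X          # Hessian of log-likelihood
        step = np.linalg.solve(hess, grad)
        beta -= step                            # Newton update: beta - H^{-1} g
        if np.linalg.norm(step) < tol:
            break
    return beta

# Toy data: outcome depends on one feature plus an intercept.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (rng.random(200) < sigmoid(0.5 + 2.0 * x)).astype(float)
X = np.column_stack([np.ones_like(x), x])
print(fit_logistic(X, y))  # estimates near [0.5, 2.0]
```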

Machine Learning and Neural Networks

In artificial neural networks, the sigmoid function serves as an activation function that introduces non-linearity into the model, enabling it to learn complex patterns beyond linear transformations. Applied to the weighted sum of inputs in hidden layers, it maps real-valued inputs to the range (0, 1), which facilitates the representation of hierarchical features during forward propagation. In the output layer for binary classification tasks, the sigmoid's output is interpreted as the probability of belonging to the positive class, aligning with a probabilistic interpretation.

A key advantage of the sigmoid in training neural networks via backpropagation lies in its simple derivative, which streamlines computation:

$$\sigma'(x) = \sigma(x) (1 - \sigma(x)).$$

This allows efficient calculation of error gradients during the backward pass, as it depends only on the sigmoid's output without requiring additional forward computations. This property contributed to the widespread adoption of sigmoid activations in early multilayer perceptrons, where backpropagation was first demonstrated effectively.

Despite these benefits, the sigmoid activation suffers from the vanishing gradient problem, where gradients approach zero for large positive or negative inputs due to the function's saturation in the flat regions near 0 and 1. This leads to slow or stalled learning in deep networks, as updates to earlier layer weights become negligible during backpropagation. To mitigate this, alternatives like the rectified linear unit (ReLU) activation, which avoids saturation for positive inputs, have become preferred in hidden layers of modern architectures.

In popular frameworks, the sigmoid is implemented with optimizations for numerical stability. For instance, TensorFlow provides tf.keras.activations.sigmoid, which handles large inputs to prevent overflow in the exponential term. Similarly, PyTorch's torch.nn.Sigmoid module applies the function element-wise, often paired with stable variants like log_sigmoid for loss computations involving logarithms, computed as $\log(\sigma(x)) = -\log(1 + e^{-x})$ to avoid underflow.

For binary classification outputs, the sigmoid is typically applied in the final layer, followed by binary cross-entropy loss to measure the divergence between predicted probabilities and true labels. This combination encourages the model to produce well-calibrated probabilities, with the loss defined as $-\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]$, where $\hat{y} = \sigma(z)$ and $z$ is the linear output. Frameworks like TensorFlow support a from_logits=True option in binary cross-entropy to apply the sigmoid internally, enhancing numerical stability by avoiding explicit computation of the sigmoid on raw logits.
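
The from_logits trick can be written out explicitly. A NumPy sketch of the standard stable rewriting (ours; not a framework API):

```python
import numpy as np

def bce_with_logits(z, y):
    """Numerically stable binary cross-entropy on raw logits z.

    Equivalent to -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))], but
    rewritten as max(z, 0) - z*y + log(1 + exp(-|z|)) so that no
    exponential of a large positive number is ever evaluated.
    """
    return np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))

z = np.array([-100.0, -2.0, 0.0, 2.0, 100.0])  # raw logits
y = np.array([0.0, 0.0, 1.0, 1.0, 1.0])        # true labels
print(bce_with_logits(z, y))
# Naive sigmoid-then-log would hit log(0) = -inf at z = -100 or z = 100;
# the logit form stays finite everywhere.
```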

Biological and Physical Models

In population dynamics, the sigmoid function arises as the solution to the logistic differential equation, which models bounded growth in biological populations limited by environmental carrying capacity. The equation is given by

$$\frac{dP}{dt} = r P \left(1 - \frac{P}{K}\right),$$

where $P(t)$ is the population size at time $t$, $r$ is the intrinsic growth rate, and $K$ is the carrying capacity. The explicit solution is the logistic sigmoid function

$$P(t) = \frac{K}{1 + \left(\frac{K}{P_0} - 1\right) e^{-r t}},$$

with initial population $P_0$, describing an initial exponential phase followed by deceleration toward the asymptote $K$. This model, originally proposed by Pierre Verhulst in 1838 to fit human population data, has been widely applied to microbial and animal populations where resources constrain growth.

Biological neurons exhibit sigmoidal response curves, where the firing rate increases nonlinearly with input stimulus intensity, saturating at high levels to reflect physiological limits. This graded response allows neurons to perform thresholded computations and gain modulation, as seen in dendritic compartments and synaptic integration. Seminal models, such as those analyzing variance in neuronal populations, derive the sigmoid shape from probabilistic firing mechanisms, where dispersion in the input leads to a smooth transition from low to high activity. Experimental observations in cortical and hippocampal neurons confirm this form, with the steepness of the curve varying by neuron type and modulating network dynamics.

In enzyme kinetics, the Michaelis-Menten equation describes the reaction rate as a saturating function of substrate concentration, capturing saturation effects in catalytic processes (the curve is hyperbolic against concentration and sigmoidal against its logarithm). The rate $v$ is

$$v = \frac{V_{\max} [S]}{K_m + [S]},$$

where $V_{\max}$ is the maximum rate, $[S]$ is the substrate concentration, and $K_m$ is the Michaelis constant representing half-saturation. This form, derived from steady-state assumptions in enzyme-substrate binding, fits empirical data for many biochemical reactions and underpins quantitative analyses in biochemistry. The model was established by Leonor Michaelis and Maud Menten in 1913 through experiments on invertase, providing a foundational tool for studying enzyme efficiency.

Sigmoid functions serve as smooth approximations in physical models of transitions, such as phase changes in materials. In mean-field theory of ferromagnetism, the magnetization $m$ versus reduced temperature follows a sigmoid-like curve near the critical point, arising from the self-consistent solution $m = \tanh\left(\frac{T_c}{T} m + h\right)$, where $T_c$ is the Curie temperature and $h$ is the external field; this captures the abrupt onset of order below $T_c$. Pierre Weiss introduced this molecular-field approach in 1907 to explain ferromagnetic ordering and susceptibility. In phase-field models, sigmoids approximate sharp interfaces, like Heaviside steps in reaction-diffusion systems, enabling numerical tractability while preserving the essential dynamics of interface or boundary propagation. For instance, in lithiation, a flexible sigmoid delineates two-phase regions to model stress evolution.

The Gompertz function, an asymmetric sigmoid, models tumor growth in oncology by describing slower initial proliferation accelerating to a plateau due to nutrient limitations and other constraints. Unlike the symmetric logistic, it places its inflection point early in the growth course, fitting longitudinal data from various cancers like carcinomas and melanomas. This application, pioneered by A. K. Laird in 1964 through analysis of mouse tumor volumes, highlights how the model's parameters correlate with tumor aggressiveness and treatment response, aiding prognostic simulations.
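
The symmetric logistic and asymmetric Gompertz growth curves contrasted here are easy to evaluate side by side; a short Python sketch (all parameter values are made up for the demo):

```python
import numpy as np

def logistic_growth(t, K=100.0, P0=5.0, r=0.8):
    """Closed-form solution of dP/dt = r*P*(1 - P/K) with P(0) = P0."""
    return K / (1.0 + (K / P0 - 1.0) * np.exp(-r * t))

def gompertz_growth(t, K=100.0, P0=5.0, c=0.5):
    """Gompertz curve P(t) = K * exp(log(P0/K) * exp(-c*t)); asymmetric,
    with its inflection at P = K/e (~37% of the plateau) rather than K/2."""
    return K * np.exp(np.log(P0 / K) * np.exp(-c * t))

t = np.linspace(0, 15, 6)
print("t        :", t)
print("logistic :", np.round(logistic_growth(t), 1))
print("gompertz :", np.round(gompertz_growth(t), 1))
```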

History and Development

Origins

The sigmoid curve, characterized by its S-shaped form representing bounded growth, first emerged in mathematical modeling during the early 19th century. Benjamin Gompertz introduced an asymmetric variant in 1825 while studying human mortality rates, proposing a function that described the decreasing intensity of the mortality force over time, approaching an asymptote as age increases. This model, known as the Gompertz function, provided an early non-exponential example of sigmoid behavior in demography, influencing later demographic analyses.

In 1838, Pierre-François Verhulst developed the logistic growth model to describe population growth, deriving a symmetric sigmoid that starts slowly, accelerates, and then tapers off toward a limit. Verhulst's work, published in Correspondance Mathématique et Physique, applied this form to predict bounded population expansion in contrast to unchecked exponential growth, laying foundational principles for ecology and demography. Prior to the formal adoption of the "sigmoid" terminology in the 20th century, S-shaped curves appeared in 19th-century demography and related fields as graphical representations of resource-constrained processes.

Biochemical applications of sigmoid forms arose in the early 20th century with Archibald Vivian Hill's 1910 work on oxygen binding to hemoglobin, where he formulated the Hill equation to capture the cooperative, S-shaped dissociation curve observed in experimental data. This equation modeled the nonlinear response of binding sites, establishing a precedent for sigmoid functions in biochemistry and pharmacology.

In 19th-century statistics, sigmoid shapes were recognized in cumulative distribution functions, particularly through ogives, graphical plots of cumulative frequencies that often resembled S-curves for continuous data. Francis Galton formalized the ogive in the 1880s as the inverse of the normal cumulative distribution, linking these forms to probabilistic interpretations of ordered observations in anthropometric and biological studies.

Modern Usage

The McCulloch-Pitts model of 1943 introduced early artificial-neuron concepts using step functions as activation mechanisms, representing binary threshold logic to mimic neural firing. This foundational work laid the groundwork for computational neural models, but the rigid step functions lacked the differentiability needed by learning algorithms. Frank Rosenblatt's perceptron, introduced in 1958, advanced these ideas by incorporating learning rules while still relying on threshold functions. The perceptron classified patterns through weight updates driven by the perceptron learning rule, marking a pivotal step in making artificial neurons trainable via error minimization, though single layers were limited to linearly separable problems.

The backpropagation era of the 1980s further entrenched the logistic sigmoid in multilayer networks as a differentiable alternative to step functions, as popularized by Rumelhart, Hinton, and Williams in their 1986 work, which demonstrated its effectiveness for propagating errors through hidden layers thanks to its smoothness and bounded output between 0 and 1. Their algorithm enabled the training of deeper architectures, revitalizing interest in neural networks after the limitations highlighted earlier by Minsky and Papert.

In the 1990s and 2000s, concerns over vanishing gradients, where sigmoid derivatives near 0 or 1 cause error signals to diminish in deep or recurrent networks, prompted a shift toward alternatives like ReLU in deep models for faster convergence and reduced saturation. However, the sigmoid persisted in recurrent architectures, notably in the LSTM gates introduced by Hochreiter and Schmidhuber in 1997, where it controls information flow (e.g., the forget and input gates) while the cell state maintains stability over long sequences. Post-1990s, sigmoid functions found broader interdisciplinary use, such as logistic models for binary outcome prediction in biostatistical analyses and environmental modeling of sigmoidal growth patterns like CO2 accumulation or retention curves.
