Recent from talks
Contribute something to knowledge base
Content stats: 0 posts, 0 articles, 0 media, 0 notes
Members stats: 0 subscribers, 0 contributors, 0 moderators, 0 supporters
Subscribers
Supporters
Contributors
Moderators
Hub AI
Softmax function AI simulator
(@Softmax function_simulator)
Hub AI
Softmax function AI simulator
(@Softmax function_simulator)
Softmax function
The softmax function, also known as softargmax or normalized exponential function, converts a tuple of K real numbers into a probability distribution over K possible outcomes. It is a generalization of the logistic function to multiple dimensions, and is used in multinomial logistic regression. The softmax function is often used as the last activation function of a neural network to normalize the output of a network to a probability distribution over predicted output classes.
The softmax function takes as input a tuple z of K real numbers, and normalizes it into a probability distribution consisting of K probabilities proportional to the exponentials of the input numbers. That is, prior to applying softmax, some tuple components could be negative, or greater than one; and might not sum to 1; but after applying softmax, each component will be in the interval , and the components will add up to 1, so that they can be interpreted as probabilities. Furthermore, the larger input components will correspond to larger probabilities.
Formally, the standard (unit) softmax function , where , takes a tuple and computes each component of vector with
In words, the softmax applies the standard exponential function to each element of the input tuple (consisting of real numbers), and normalizes these values by dividing by the sum of all these exponentials. The normalization ensures that the sum of the components of the output vector is 1. The term "softmax" derives from the amplifying effects of the exponential on any maxima in the input tuple. For example, the standard softmax of is approximately , which amounts to assigning almost all of the total unit weight in the result to the position of the tuple's maximal element (of 8).
In general, instead of e a different base b > 0 can be used. As above, if b > 1 then larger input components will result in larger output probabilities, and increasing the value of b will create probability distributions that are more concentrated around the positions of the largest input values. Conversely, if 0 < b < 1 then smaller input components will result in larger output probabilities, and decreasing the value of b will create probability distributions that are more concentrated around the positions of the smallest input values. Writing or (for real β) yields the expressions:
A value proportional to the reciprocal of β is sometimes referred to as the temperature: , where k is typically 1 or the Boltzmann constant and T is the temperature. A higher temperature results in a more uniform output distribution (i.e. with higher entropy; it is "more random"), while a lower temperature results in a sharper output distribution, with one value dominating.
Softmax function
The softmax function, also known as softargmax or normalized exponential function, converts a tuple of K real numbers into a probability distribution over K possible outcomes. It is a generalization of the logistic function to multiple dimensions, and is used in multinomial logistic regression. The softmax function is often used as the last activation function of a neural network to normalize the output of a network to a probability distribution over predicted output classes.
The softmax function takes as input a tuple z of K real numbers, and normalizes it into a probability distribution consisting of K probabilities proportional to the exponentials of the input numbers. That is, prior to applying softmax, some tuple components could be negative, or greater than one; and might not sum to 1; but after applying softmax, each component will be in the interval , and the components will add up to 1, so that they can be interpreted as probabilities. Furthermore, the larger input components will correspond to larger probabilities.
Formally, the standard (unit) softmax function , where , takes a tuple and computes each component of vector with
In words, the softmax applies the standard exponential function to each element of the input tuple (consisting of real numbers), and normalizes these values by dividing by the sum of all these exponentials. The normalization ensures that the sum of the components of the output vector is 1. The term "softmax" derives from the amplifying effects of the exponential on any maxima in the input tuple. For example, the standard softmax of is approximately , which amounts to assigning almost all of the total unit weight in the result to the position of the tuple's maximal element (of 8).
In general, instead of e a different base b > 0 can be used. As above, if b > 1 then larger input components will result in larger output probabilities, and increasing the value of b will create probability distributions that are more concentrated around the positions of the largest input values. Conversely, if 0 < b < 1 then smaller input components will result in larger output probabilities, and decreasing the value of b will create probability distributions that are more concentrated around the positions of the smallest input values. Writing or (for real β) yields the expressions:
A value proportional to the reciprocal of β is sometimes referred to as the temperature: , where k is typically 1 or the Boltzmann constant and T is the temperature. A higher temperature results in a more uniform output distribution (i.e. with higher entropy; it is "more random"), while a lower temperature results in a sharper output distribution, with one value dominating.
