Binary entropy function
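In information theory, the binary entropy function, denoted $\operatorname{H}(p)$ or $\operatorname{H}_\text{b}(p)$, is defined as the entropy of a Bernoulli process (i.i.d. binary variable) with probability $p$ of one of two values, and is given by the formula:
$$\operatorname{H}_\text{b}(p) = -p \log p - (1 - p) \log (1 - p).$$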
The base of the logarithm corresponds to the choice of units of information; base e corresponds to nats and is mathematically convenient, while base 2 (binary logarithm) corresponds to shannons and is conventional; explicitly:
$$\operatorname{H}_\text{b}(p) = -p \log_2 p - (1 - p) \log_2 (1 - p).$$
Note that the values at 0 and 1 are given by the limit $0 \log 0 := \lim_{x \to 0^+} x \log x = 0$ (by L'Hôpital's rule), and that "binary" refers to the two possible values of the variable, not to the units of information.
When $p = 1/2$, the binary entropy function attains its maximum value, 1 shannon (1 binary unit of information); this is the case of an unbiased coin flip. When $p = 0$ or $p = 1$, the binary entropy is 0 (in any units), corresponding to no information, since there is no uncertainty in the variable.
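A minimal Python sketch of the definition, assuming inputs in $[0, 1]$. The function name `binary_entropy` and its `base` parameter are illustrative choices, not part of any library. The endpoint branch applies the $0 \log 0 = 0$ convention rather than evaluating $\log 0$.

```python
import math

def binary_entropy(p: float, base: float = 2.0) -> float:
    """Entropy of a Bernoulli(p) variable; base 2 gives shannons, base e gives nats."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 0.0 or p == 1.0:
        # Limit value: x*log(x) -> 0 as x -> 0+, so there is no uncertainty.
        return 0.0
    q = 1.0 - p
    return -(p * math.log(p, base) + q * math.log(q, base))

print(binary_entropy(0.5))                       # 1.0 (fair coin, in shannons)
print(binary_entropy(0.5, math.e))               # ln 2, about 0.693 nats
print(binary_entropy(0.0), binary_entropy(1.0))  # 0.0 0.0
```

The function is symmetric, so `binary_entropy(p)` and `binary_entropy(1 - p)` agree up to rounding.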
Notation
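Binary entropy $\operatorname{H}_\text{b}(p)$ is a special case of $\mathrm{H}(X)$, the entropy function. $\operatorname{H}_\text{b}(p)$ is distinguished from the entropy function $\mathrm{H}(X)$ in that the former takes a single real number as a parameter, whereas the latter takes a distribution or random variable as a parameter. Thus the binary entropy (of $p$) is the entropy of the distribution $\operatorname{Ber}(p)$, so $\operatorname{H}_\text{b}(p) = \mathrm{H}(\operatorname{Ber}(p))$.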
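Writing the probabilities of the two values as $p$ and $q$, so that $p + q = 1$ and $q = 1 - p$, this corresponds to
$$\operatorname{H}_\text{b}(p) = -p \log p - (1 - p) \log (1 - p) = -p \log p - q \log q = -\sum_{x \in \{0, 1\}} \Pr(X = x) \log \Pr(X = x) = \mathrm{H}(\operatorname{Ber}(p)).$$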
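The identity $\operatorname{H}_\text{b}(p) = \mathrm{H}(\operatorname{Ber}(p))$ can be illustrated by writing a general entropy function over a finite distribution and applying it to the two-point distribution $(p, q)$. This is a sketch: `entropy` and `binary_entropy` are hypothetical helper names, and zero-probability outcomes are skipped to follow the $0 \log 0 = 0$ convention.

```python
import math
from typing import Sequence

def entropy(dist: Sequence[float], base: float = 2.0) -> float:
    """Shannon entropy of a finite distribution; zero-probability terms contribute 0."""
    return -sum(x * math.log(x, base) for x in dist if x > 0)

def binary_entropy(p: float, base: float = 2.0) -> float:
    """H_b(p) computed as the entropy of the Bernoulli distribution (p, q), q = 1 - p."""
    return entropy([p, 1.0 - p], base)

for p in (0.1, 0.25, 0.5):
    print(p, binary_entropy(p))
```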
Sometimes the binary entropy function is also written as $\operatorname{H}_2(p)$. However, it is different from, and should not be confused with, the Rényi entropy, which is denoted as $\mathrm{H}_2(X)$.
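To see that the two notations refer to different quantities, the sketch below compares $\operatorname{H}_\text{b}(p)$ with the order-2 Rényi (collision) entropy $-\log_2 \sum_x \Pr(X = x)^2$ of the same Bernoulli variable. Both function names are illustrative.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon binary entropy in bits, with 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in (p, 1.0 - p) if x > 0)

def renyi2_bernoulli(p: float) -> float:
    """Order-2 Renyi (collision) entropy, in bits, of a Bernoulli(p) variable."""
    return -math.log2(p * p + (1.0 - p) ** 2)

p = 0.25
print(binary_entropy(p))    # about 0.811
print(renyi2_bernoulli(p))  # about 0.678
```

The two agree at $p = 0$, $1/2$ and $1$ but differ elsewhere, with the order-2 Rényi value being the smaller one.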
Explanation
In terms of information theory, entropy is considered to be a measure of the uncertainty in a message. To put it intuitively, suppose $p = 0$. At this probability, the event is certain never to occur, so there is no uncertainty at all, leading to an entropy of 0. If $p = 1$, the result is again certain, so the entropy is 0 here as well. When $p = 1/2$, the uncertainty is at a maximum; if one were to place a fair bet on the outcome in this case, there would be no advantage to be gained from prior knowledge of the probabilities. In this case, the entropy is maximal, at a value of 1 bit. Intermediate values fall between these cases; for instance, if $p = 1/4$, there is still a measure of uncertainty in the outcome, but one can still predict the outcome correctly more often than not, so the uncertainty measure, or entropy, is less than 1 full bit.
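Working the intermediate case $p = 1/4$ through the base-2 formula shows how far below 1 bit the entropy falls:

```latex
\operatorname{H}_\text{b}\left(\tfrac14\right)
  = -\tfrac14 \log_2 \tfrac14 - \tfrac34 \log_2 \tfrac34
  = \tfrac12 + \tfrac34 \log_2 \tfrac43
  \approx 0.5 + 0.75 \times 0.415
  \approx 0.811 \text{ bits}
```

Always guessing the more likely value is correct 3/4 of the time, which is the sense in which the outcome can be predicted "more often than not".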
Properties
Derivative
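The derivative of the binary entropy function may be expressed as the negative of the logit function:
$$\frac{d}{dp} \operatorname{H}_\text{b}(p) = -\operatorname{logit}_2(p) = -\log_2\left(\frac{p}{1 - p}\right).$$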
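The formula can be sanity-checked numerically. The Python sketch below compares a central finite difference of $\operatorname{H}_\text{b}$ (in bits) with $-\log_2\frac{p}{1-p}$. The step size `h` and the sample points are arbitrary choices made for illustration.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy in bits for 0 < p < 1."""
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))

def neg_logit2(p: float) -> float:
    """Negative base-2 logit, -log2(p / (1 - p))."""
    return -math.log2(p / (1.0 - p))

h = 1e-6  # finite-difference step
for p in (0.1, 0.3, 0.5, 0.8):
    numeric = (binary_entropy(p + h) - binary_entropy(p - h)) / (2 * h)
    print(f"p={p}: finite difference {numeric:.6f}, -logit2(p) {neg_logit2(p):.6f}")
```

The derivative is positive for $p < 1/2$, zero at $p = 1/2$ and negative for $p > 1/2$, consistent with the maximum at the fair coin.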
Convex conjugate
The convex conjugate (specifically, the Legendre transform) of the negative binary entropy (with base e) is the softplus function. This is because (following the definition of the Legendre transform: the derivatives are inverse functions) the derivative of negative binary entropy is the logit, whose inverse function is the logistic function, which is the derivative of softplus.
Softplus can be interpreted as logistic loss, so by duality, minimizing logistic loss corresponds to maximizing entropy. This justifies the principle of maximum entropy as loss minimization.
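The conjugacy can be illustrated numerically. Assuming natural logarithms, the sketch below approximates $\sup_{p \in [0, 1]} \left[x p + \operatorname{H}_\text{b}(p)\right]$, the convex conjugate of the negative binary entropy, by a grid search over $p$, and compares it with softplus $\log(1 + e^x)$. The grid resolution `steps` is an arbitrary choice. The exact maximizer is the logistic function $\sigma(x)$.

```python
import math

def binary_entropy_nats(p: float) -> float:
    """Binary entropy in nats, with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in (p, 1.0 - p) if x > 0)

def conjugate_of_neg_entropy(x: float, steps: int = 20_000) -> float:
    """Approximate sup over p in [0, 1] of x*p - (-H(p)) on a uniform grid."""
    return max(x * (i / steps) + binary_entropy_nats(i / steps) for i in range(steps + 1))

def softplus(x: float) -> float:
    # log(1 + e^x), computed with log1p for accuracy when e^x is small.
    return math.log1p(math.exp(x))

for x in (-3.0, -0.5, 0.0, 1.0, 4.0):
    print(x, conjugate_of_neg_entropy(x), softplus(x))
```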
Taylor series
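The Taylor series of the binary entropy function at $p = 1/2$ is
$$\operatorname{H}_\text{b}(p) = 1 - \frac{1}{2 \ln 2} \sum_{n=1}^{\infty} \frac{(1 - 2p)^{2n}}{n (2n - 1)},$$
which converges to the binary entropy function for all values $0 \le p \le 1$.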
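The sketch below evaluates partial sums of this series and compares them with the closed form. The number of terms `n_terms` is a free parameter. Convergence is fast near $p = 1/2$ and slow near $p = 0$ and $p = 1$, where the terms decay only like $1/(2n^2)$.

```python
import math

def binary_entropy(p: float) -> float:
    """Closed-form binary entropy in bits, with 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in (p, 1.0 - p) if x > 0)

def taylor_binary_entropy(p: float, n_terms: int) -> float:
    """Partial sum of the Taylor series of H_b about p = 1/2."""
    x = 1.0 - 2.0 * p
    s = sum(x ** (2 * n) / (n * (2 * n - 1)) for n in range(1, n_terms + 1))
    return 1.0 - s / (2.0 * math.log(2.0))

for p in (0.5, 0.4, 0.1, 0.0):
    print(p, taylor_binary_entropy(p, 50), binary_entropy(p))
```

With 50 terms the partial sum is essentially exact at $p = 0.1$, but still off by roughly 0.007 at $p = 0$.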
Bounds
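The following bounds hold for $0 < p < 1$:[1]
$$\ln(2) \cdot \log_2(p) \cdot \log_2(1 - p) \le \operatorname{H}_\text{b}(p) \le \log_2(p) \cdot \log_2(1 - p)$$
and
$$4p(1 - p) \le \operatorname{H}_\text{b}(p) \le \left(4p(1 - p)\right)^{1/\ln 4},$$
where $\ln$ denotes the natural logarithm.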
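The bounds can be spot-checked on a grid of interior points. This is a sketch with an arbitrary grid and a small floating-point tolerance `tol`, not a proof. `check_bounds` is a hypothetical helper name.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy in bits for 0 < p < 1."""
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))

def check_bounds(p: float, tol: float = 1e-12) -> bool:
    """Return True if both pairs of bounds hold at p, up to tol."""
    h = binary_entropy(p)
    prod = math.log2(p) * math.log2(1.0 - p)
    quad = 4.0 * p * (1.0 - p)
    first = math.log(2.0) * prod - tol <= h <= prod + tol
    second = quad - tol <= h <= quad ** (1.0 / math.log(4.0)) + tol
    return first and second

grid = [i / 1000 for i in range(1, 1000)]
print(all(check_bounds(p) for p in grid))  # True
```

The second upper bound is very tight near $p = 1/2$. For example, at $p = 0.4$ it exceeds $\operatorname{H}_\text{b}(p)$ by only about $3 \times 10^{-5}$.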
References
[1] Topsøe, Flemming (2001). "Bounds for entropy and divergence for distributions over a two-element set". JIPAM. Journal of Inequalities in Pure & Applied Mathematics. 2 (2): Paper No. 25, 13 pp.
Further reading
- MacKay, David J. C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press. ISBN 0-521-64298-1
Binary entropy function
The binary entropy function is defined for $p \in [0, 1]$ by $\operatorname{H}_\text{b}(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$, with the convention that $0 \log_2 0 = 0$.[1] This function, a special case of Shannon entropy for two outcomes, serves as a foundational measure in information theory for assessing the uncertainty of a binary source, or equivalently the information required to encode it.[2]
The binary entropy function has several key properties that highlight its role in probabilistic modeling. It is symmetric about $p = 1/2$, concave, and achieves its maximum value of 1 bit at $p = 1/2$, corresponding to maximum uncertainty in a fair coin flip.[1] At the endpoints, $\operatorname{H}_\text{b}(0) = \operatorname{H}_\text{b}(1) = 0$, reflecting zero uncertainty for deterministic outcomes.[1] Together, these properties mean that the entropy of any binary distribution lies between 0 and 1 bit, and concavity makes $\operatorname{H}_\text{b}$ convenient to work with in optimization problems.[3]
Introduced by Claude Shannon in his seminal 1948 paper "A Mathematical Theory of Communication," the binary entropy function emerged as part of the broader framework for quantifying information transmission over noisy channels.[2] Shannon demonstrated that it represents the fundamental limit on the efficiency of encoding binary sources. A memoryless source that emits one of its two symbols with probability $p$ needs, on average, at least $\operatorname{H}_\text{b}(p)$ bits per symbol. This result influenced the development of source coding theorems.[2] The function is thus the entropy of an ideal binary source and provides a baseline for measuring information loss or gain in communication systems.[4]
In applications, the binary entropy function is central to calculating channel capacities. For the binary symmetric channel, the capacity is $C = 1 - \operatorname{H}_\text{b}(p)$, where $p$ is the crossover probability; this determines the maximum rate of reliable transmission.[1] It also appears in error-correcting codes, in data compression of binary sources (for example, with Huffman coding), and in analyses of randomness in cryptography and machine learning, where it evaluates model uncertainty or feature importance. Beyond communications, extensions of the binary entropy function inform statistical inference and decision theory by quantifying the uncertainty of binary decisions.
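A minimal sketch of the binary symmetric channel formula, assuming capacity is measured in bits per channel use. `bsc_capacity` is an illustrative name, not a library function.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy in bits, with 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in (p, 1.0 - p) if x > 0)

def bsc_capacity(crossover: float) -> float:
    """Capacity C = 1 - H_b(p) of a binary symmetric channel, in bits per use."""
    return 1.0 - binary_entropy(crossover)

for p in (0.0, 0.01, 0.11, 0.5):
    print(p, bsc_capacity(p))
```

A crossover probability of 0.5 gives zero capacity, because the output is then independent of the input. The capacity also satisfies $C(p) = C(1 - p)$, since a receiver can invert every bit.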