Uncertainty coefficient

In statistics, the uncertainty coefficient, also called proficiency, entropy coefficient or Theil's U, is a measure of nominal association. It was first introduced by Henri Theil and is based on the concept of information entropy.

Suppose we have samples of two discrete random variables, X and Y. From the joint distribution P_{X,Y}(x, y) we can calculate the conditional distributions P_{X|Y}(x|y) = P_{X,Y}(x, y) / P_Y(y) and P_{Y|X}(y|x) = P_{X,Y}(x, y) / P_X(x); by then calculating the various entropies, we can determine the degree of association between the two variables.
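As a concrete illustration, the joint, marginal, and conditional distributions can be estimated from paired samples by simple counting (a minimal Python sketch; the sample data are hypothetical):

```python
from collections import Counter

# Hypothetical paired samples of two discrete variables X and Y.
x = ["a", "a", "b", "b", "b", "c"]
y = [0, 1, 0, 0, 1, 1]

n = len(x)
# Joint distribution P_{X,Y}(x, y) estimated from sample frequencies.
p_xy = {pair: count / n for pair, count in Counter(zip(x, y)).items()}

# Marginals P_X(x) and P_Y(y), obtained by summing out the other variable.
p_x = Counter()
p_y = Counter()
for (xi, yi), p in p_xy.items():
    p_x[xi] += p
    p_y[yi] += p

# Conditional distribution P_{X|Y}(x|y) = P_{X,Y}(x, y) / P_Y(y).
p_x_given_y = {(xi, yi): p / p_y[yi] for (xi, yi), p in p_xy.items()}
```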

The entropy of a single distribution is given as:

    H(X) = −∑_x P_X(x) log P_X(x),
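The single-distribution entropy can be computed directly from a distribution given as a mapping from outcomes to probabilities (a sketch, using base-2 logarithms so the result is in bits):

```python
import math

def entropy(dist):
    """Shannon entropy H(X) = -sum_x P(x) log2 P(x), in bits.

    `dist` maps outcomes to probabilities; zero-probability
    outcomes are skipped, following the convention 0 log 0 = 0.
    """
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# A fair coin carries exactly one bit of entropy.
print(entropy({"heads": 0.5, "tails": 0.5}))  # -> 1.0
```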

while the conditional entropy is given as:

    H(X|Y) = −∑_{x,y} P_{X,Y}(x, y) log P_{X|Y}(x|y).
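A corresponding sketch for the conditional entropy, taking the joint distribution and the marginal of the conditioning variable as plain dictionaries (the example data are hypothetical):

```python
import math

def conditional_entropy(p_xy, p_y):
    """H(X|Y) = -sum over (x, y) of P(x, y) * log2 P(x|y),
    where P(x|y) = P(x, y) / P(y)."""
    return -sum(p * math.log2(p / p_y[yi])
                for (xi, yi), p in p_xy.items() if p > 0)

# X uniform over {a, b} and independent of Y: knowing Y tells us
# nothing about X, so H(X|Y) equals H(X) = 1 bit.
p_xy = {("a", 0): 0.25, ("a", 1): 0.25, ("b", 0): 0.25, ("b", 1): 0.25}
p_y = {0: 0.5, 1: 0.5}
print(conditional_entropy(p_xy, p_y))  # -> 1.0
```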

The uncertainty coefficient or proficiency is defined as:

    U(X|Y) = (H(X) − H(X|Y)) / H(X) = I(X;Y) / H(X),

and tells us: given Y, what fraction of the bits of X can we predict? In this case we can think of X as containing the total information, and of Y as allowing one to predict part of such information.
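Putting the pieces together, Theil's U can be estimated directly from paired samples (a self-contained sketch; the function name `uncertainty_coefficient` is ours, and it assumes H(X) > 0):

```python
import math
from collections import Counter

def uncertainty_coefficient(x, y):
    """Estimate U(X|Y) = (H(X) - H(X|Y)) / H(X) from paired samples.

    A sketch using empirical frequencies; assumes H(X) > 0, i.e. X
    takes at least two distinct values in the sample.
    """
    n = len(x)
    c_xy = Counter(zip(x, y))   # joint counts
    c_x = Counter(x)            # marginal counts of X
    c_y = Counter(y)            # marginal counts of Y
    h_x = -sum(c / n * math.log2(c / n) for c in c_x.values())
    # P(x|y) = c_xy / c_y, so each term is P(x, y) * log2 P(x|y).
    h_x_given_y = -sum(c / n * math.log2(c / c_y[yi])
                       for (xi, yi), c in c_xy.items())
    return (h_x - h_x_given_y) / h_x

# Y determines X completely, so all of X's bits are predictable: U = 1.
print(uncertainty_coefficient(["a", "b", "a", "b"], [0, 1, 0, 1]))  # -> 1.0
```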

The above expression makes clear that the uncertainty coefficient is a normalised mutual information I(X;Y). In particular, the uncertainty coefficient ranges in [0, 1], since I(X;Y) ≤ H(X) and both I(X;Y) and H(X) are non-negative.
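The bound follows in one line from the definitions above, using only the identity I(X;Y) = H(X) − H(X|Y) and the fact that, for discrete variables, 0 ≤ H(X|Y) ≤ H(X):

```latex
U(X|Y) = \frac{H(X) - H(X|Y)}{H(X)} = \frac{I(X;Y)}{H(X)},
\qquad
0 \le H(X|Y) \le H(X)
\;\Longrightarrow\;
0 \le U(X|Y) \le 1 .
```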

Note that the value of U (but not H!) is independent of the base of the logarithm: changing the base multiplies every entropy by the same constant factor, which cancels in the ratio.
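This base-independence is easy to verify numerically: entropies in bits and in nats differ exactly by the factor ln 2, which cancels in any ratio of entropies such as U (the distribution below is an arbitrary example):

```python
import math

# An arbitrary three-outcome distribution.
p = [0.5, 0.25, 0.25]

h_bits = -sum(q * math.log2(q) for q in p)  # entropy in bits (log base 2)
h_nats = -sum(q * math.log(q) for q in p)   # entropy in nats (natural log)

# The two differ by the constant factor ln 2 ...
print(h_nats / h_bits)  # -> 0.693..., i.e. ln 2
# ... so any ratio of entropies, such as U, is the same in either base.
```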
