Variation of information
Variation of information
Main page
1044116

Variation of information

logo
Community Hub0 subscribers
What are your thoughts?
Be the first to start a discussion here.
Be the first to start a discussion here.
Variation of information

In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements). It is closely related to mutual information; indeed, it is a simple linear expression involving the mutual information. Unlike the mutual information, however, the variation of information is a true metric, in that it obeys the triangle inequality.

Suppose we have two partitions and of a set , namely and .

Let:

Then the variation of information between the two partitions is:

This is equivalent to the shared information distance between the random variables i and j with respect to the uniform probability measure on defined by for .

We can rewrite this definition in terms that explicitly highlight the information content of this metric.

The set of all partitions of a set form a compact lattice where the partial order induces two operations, the meet and the join , where the maximum is the partition with only one block, i.e., all elements grouped together, and the minimum is , the partition consisting of all elements as singletons. The meet of two partitions and is easy to understand as that partition formed by all pair intersections of one block of, , of and one, , of . It then follows that and .

Let's define the entropy of a partition as

See all
User Avatar
No comments yet.