Jaccard index

The Jaccard index is a statistic used for gauging the similarity and diversity of sample sets. It is defined in general as the ratio of two sizes (areas or volumes): the intersection size divided by the union size. For this reason it is also called intersection over union (IoU).

It was developed by Grove Karl Gilbert in 1884 as his ratio of verification (v) and is now often called the critical success index in meteorology. It was later developed independently by Paul Jaccard, who originally gave it the French name coefficient de communauté (coefficient of community), and was independently formulated again by Taffee Tadashi Tanimoto. Thus, it is also called the Tanimoto index or Tanimoto coefficient in some fields.

The Jaccard index measures similarity between finite non-empty sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$

Note that by design, $0 \le J(A, B) \le 1$. If the sets $A$ and $B$ have no elements in common, their intersection is empty, so $|A \cap B| = 0$ and therefore $J(A, B) = 0$. The other extreme is that the two sets are equal. In that case $A \cap B = A \cup B = A = B$, so $J(A, B) = 1$.

The Jaccard index is widely used in computer science, ecology, genomics and other sciences where binary or binarized data are used. Both exact and approximate methods are available for hypothesis testing with the Jaccard index.
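The definition and its two extreme cases can be illustrated with a minimal Python sketch. The function name `jaccard_index` is chosen here for illustration, and raising an error for two empty sets is an assumption of this sketch, made because the definition covers only non-empty sets.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two finite collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty, so the ratio would be 0/0.
        # Assumption of this sketch: treat that case as undefined.
        raise ValueError("Jaccard index is undefined for two empty sets")
    return len(a & b) / len(union)


# Partial overlap: intersection {2, 3}, union {1, 2, 3, 4}
partial = jaccard_index({1, 2, 3}, {2, 3, 4})   # 2 / 4
# Disjoint sets: empty intersection
disjoint = jaccard_index({1, 2}, {3, 4})        # 0 / 4
# Equal sets: intersection equals union
equal = jaccard_index({1, 2, 3}, {3, 2, 1})     # 3 / 3
```

Here `partial` is 0.5, `disjoint` is 0.0 and `equal` is 1.0, matching the bounds described above.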

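Jaccard similarity also applies to bags, i.e., multisets. This has a similar formula, but the symbols used represent bag intersection and bag sum (not union):

$$J(A, B) = \frac{|A \cap B|}{|A \uplus B|} = \frac{|A \cap B|}{|A| + |B|}$$

The maximum value is 1/2, which is reached when the two bags are equal, because the bag sum then has twice the size of the intersection.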
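As a rough illustration of the bag variant, the sketch below uses `collections.Counter` from the Python standard library, whose `&` operator takes the element-wise minimum of counts (bag intersection). The function name `bag_jaccard` and the error raised for two empty bags are assumptions of this sketch.

```python
from collections import Counter


def bag_jaccard(a, b):
    """Return |A ∩ B| / (|A| + |B|) for two bags (multisets)."""
    a, b = Counter(a), Counter(b)
    # Size of the bag sum: every occurrence in either bag is counted.
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        # Assumption of this sketch: two empty bags are treated as undefined.
        raise ValueError("bag Jaccard similarity is undefined for two empty bags")
    # Bag intersection keeps the minimum multiplicity of each element.
    intersection = sum((a & b).values())
    return intersection / total


# Bags {a, a, b} and {a, b}: intersection {a, b} has size 2, bag sum has size 5
mixed = bag_jaccard("aab", "ab")      # 2 / 5
# Equal bags reach the maximum value
maximum = bag_jaccard("aab", "aab")   # 3 / 6
```

With these inputs `mixed` is 0.4 and `maximum` is 0.5, the upper bound stated above.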

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard index and is obtained by subtracting the Jaccard index from 1 or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union:

$$d_J(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$

An alternative interpretation of the Jaccard distance is as the ratio of the size of the symmetric difference $A \,\triangle\, B = (A \cup B) \setminus (A \cap B)$ to the size of the union. Jaccard distance is commonly used to calculate an n × n matrix for clustering and multidimensional scaling of n sample sets.
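The symmetric-difference form of the distance and the n × n matrix can be sketched as follows. The helper names `jaccard_distance` and `distance_matrix` are hypothetical, and returning 0.0 for two empty sets is a convention assumed by this sketch rather than something stated in the text.

```python
def jaccard_distance(a, b):
    """Return |A △ B| / |A ∪ B|, which equals 1 - J(A, B)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption of this sketch: two empty sets are identical, distance 0.
        return 0.0
    return len(a ^ b) / len(union)  # ^ is the symmetric difference


def distance_matrix(samples):
    """Build the symmetric n x n matrix of pairwise Jaccard distances."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The distance is symmetric, so fill both halves at once.
            matrix[i][j] = matrix[j][i] = jaccard_distance(samples[i], samples[j])
    return matrix


samples = [{1, 2, 3}, {2, 3, 4}, {5}]
matrix = distance_matrix(samples)
```

A matrix like this can then be passed to a clustering or multidimensional scaling routine that accepts precomputed distances.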

This distance is a metric on the collection of all finite sets.
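The metric property can be spot-checked numerically. The sketch below tests the triangle inequality by brute force over every triple of subsets of a small universe. This is only a sanity check on a few sets, not a proof. The function names, the convention that two empty sets have distance 0, and the small floating-point tolerance are all assumptions of the sketch.

```python
from itertools import combinations, product


def jaccard_distance(a, b):
    """Return 1 - J(A, B), with distance 0 assumed for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def all_subsets(universe):
    """Enumerate every subset of a small finite universe as a frozenset."""
    items = list(universe)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


def triangle_inequality_holds(universe, tolerance=1e-12):
    """Check d(A, C) <= d(A, B) + d(B, C) for all triples of subsets."""
    subsets = all_subsets(universe)
    return all(
        jaccard_distance(a, c)
        <= jaccard_distance(a, b) + jaccard_distance(b, c) + tolerance
        for a, b, c in product(subsets, repeat=3)
    )


# 16 subsets of a 4-element universe give 4096 triples to check.
holds = triangle_inequality_holds({1, 2, 3, 4})
```

The check grows exponentially with the size of the universe, so it is only practical for very small examples.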
