Jaccard index

The Jaccard index is a statistic used for gauging the similarity and diversity of sample sets. In general it is defined as the ratio of two sizes (areas or volumes): the size of the intersection divided by the size of the union. For this reason it is also called intersection over union (IoU).

It was developed by Grove Karl Gilbert in 1884 as his ratio of verification (v) and is now often called the critical success index in meteorology. It was later developed independently by Paul Jaccard, who originally gave it the French name coefficient de communauté (coefficient of community), and was independently formulated again by Taffee Tadashi Tanimoto. Thus, it is also called the Tanimoto index or Tanimoto coefficient in some fields.

The Jaccard index measures similarity between finite non-empty sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets:

$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}=\frac{|A\cap B|}{|A|+|B|-|A\cap B|}.$$

Note that by design, $0\leq J(A,B)\leq 1.$ If the sets $A$ and $B$ have no elements in common, their intersection is empty, so $|A\cap B|=0$ and therefore $J(A,B)=0.$ At the other extreme, the two sets are equal. In that case $A\cap B=A\cup B=A=B,$ so $J(A,B)=1.$

The Jaccard index is widely used in computer science, ecology, genomics and other sciences where binary or binarized data are used. Both exact and approximate methods are available for hypothesis testing with the Jaccard index.
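The definition and its two extreme cases can be illustrated with a minimal Python sketch. The function name `jaccard_index` is hypothetical, and raising an error when both inputs are empty is an assumption made here because the definition covers non-empty sets only.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two finite collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: the index is left undefined when both sets are empty.
        raise ValueError("Jaccard index is undefined when both sets are empty")
    return len(a & b) / len(union)


# Partial overlap: intersection {2, 3}, union {1, 2, 3, 4}.
print(jaccard_index({1, 2, 3}, {2, 3, 4}))  # 0.5
# Disjoint sets give 0; equal sets give 1.
print(jaccard_index({1, 2}, {3, 4}))        # 0.0
print(jaccard_index({1, 2}, {1, 2}))        # 1.0
```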

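Jaccard similarity also applies to bags, i.e., multisets. The formula is similar, but the symbols represent bag intersection and bag sum (not union):

$$J(A,B)=\frac{|A\cap B|}{|A\uplus B|}=\frac{|A\cap B|}{|A|+|B|}.$$

Because the bag intersection can be no larger than the smaller of the two bags, the maximum value is 1/2, attained when the two bags are equal.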
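As a rough illustration of the bag variant, the sketch below uses `collections.Counter` from the Python standard library, whose `&` operator takes the minimum of the counts (bag intersection) and whose `+` operator adds the counts (bag sum). The function name `bag_jaccard` is hypothetical, and rejecting two empty bags is an assumption.

```python
from collections import Counter


def bag_jaccard(a, b):
    """Return |A ∩ B| / |A ⊎ B| for two multisets given as iterables."""
    ca, cb = Counter(a), Counter(b)
    intersection = sum((ca & cb).values())  # element-wise minimum of counts
    bag_sum = sum((ca + cb).values())       # element-wise sum of counts
    if bag_sum == 0:
        # Assumption: the similarity is left undefined for two empty bags.
        raise ValueError("bag Jaccard similarity is undefined for two empty bags")
    return intersection / bag_sum


# Bags {a, a, b} and {a, b}: intersection has 2 items, bag sum has 5.
print(bag_jaccard("aab", "ab"))  # 0.4
# Equal bags reach the maximum value of 1/2.
print(bag_jaccard("ab", "ab"))   # 0.5
```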

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard index and is obtained by subtracting the Jaccard index from 1 or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union:

$$d_J(A,B)=1-J(A,B)=\frac{|A\cup B|-|A\cap B|}{|A\cup B|}.$$

An alternative interpretation of the Jaccard distance is as the ratio of the size of the symmetric difference $A\mathbin {\triangle } B=(A\cup B)-(A\cap B)$ to the size of the union. Jaccard distance is commonly used to calculate an n × n matrix for clustering and multidimensional scaling of n sample sets.
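The symmetric-difference form of the distance, and the n × n matrix built from it, might be sketched as follows. The names `jaccard_distance` and `distance_matrix` are hypothetical, and the plain list-of-lists output is a simplification; a real clustering or multidimensional scaling workflow would typically pass such a matrix to a dedicated library.

```python
def jaccard_distance(a, b):
    """Return |A △ B| / |A ∪ B|, which equals 1 - J(A, B)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: the distance is left undefined when both sets are empty.
        raise ValueError("Jaccard distance is undefined when both sets are empty")
    return len(a ^ b) / len(union)  # ^ is the symmetric difference


def distance_matrix(sets):
    """Return the symmetric n x n matrix of pairwise Jaccard distances."""
    n = len(sets)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = jaccard_distance(sets[i], sets[j])
            matrix[i][j] = matrix[j][i] = d  # the distance is symmetric
    return matrix


samples = [{1, 2}, {2, 3}, {1, 2}]
for row in distance_matrix(samples):
    print(row)
```

The diagonal is zero and identical sets (here the first and third) are at distance zero from each other.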

This distance is a metric on the collection of all finite sets.
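The triangle inequality, the least obvious of the metric axioms, can be spot-checked by brute force on a small universe. The sketch below is only a numerical sanity check, not a proof. The helper names are hypothetical, and only non-empty subsets are enumerated, which is an assumption that sidesteps the convention needed for two empty sets.

```python
from itertools import combinations, product


def jaccard_distance(a, b):
    """Return |A △ B| / |A ∪ B| for two sets that are not both empty."""
    return len(a ^ b) / len(a | b)


def nonempty_subsets(universe):
    """Return every non-empty subset of the universe as a frozenset."""
    items = list(universe)
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def triangle_inequality_holds(universe, tol=1e-12):
    """Check d(A, C) <= d(A, B) + d(B, C) for all triples of non-empty subsets."""
    subsets = nonempty_subsets(universe)
    return all(
        jaccard_distance(a, c)
        <= jaccard_distance(a, b) + jaccard_distance(b, c) + tol
        for a, b, c in product(subsets, repeat=3)
    )


# 15 non-empty subsets of a 4-element universe, 3375 ordered triples.
print(triangle_inequality_holds(range(4)))
```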
