Hubbry Logo
search
logo

OPTICS algorithm

logo
Community Hub0 Subscribers
Write something...
Be the first to start a discussion here.
Be the first to start a discussion here.
See all
OPTICS algorithm

Ordering points to identify the clustering structure (OPTICS) is an algorithm for finding density-based clusters in spatial data. It was presented in 1999 by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander. Its basic idea is similar to DBSCAN, but it addresses one of DBSCAN's major weaknesses: the problem of detecting meaningful clusters in data of varying density. To do so, the points of the database are (linearly) ordered such that spatially closest points become neighbors in the ordering. Additionally, a special distance is stored for each point that represents the density that must be accepted for a cluster so that both points belong to the same cluster. This is represented as a dendrogram.

Like DBSCAN, OPTICS requires two parameters: ε, which describes the maximum distance (radius) to consider, and MinPts, describing the number of points required to form a cluster. A point p is a core point if at least MinPts points are found within its ε-neighborhood (including point p itself). In contrast to DBSCAN, OPTICS also considers points that are part of a more densely packed cluster, so each point is assigned a core distance that describes the distance to the MinPtsth closest point:

The reachability-distance of another point o from a point p is either the distance between o and p, or the core distance of p, whichever is bigger:

If p and o are nearest neighbors, this is the we need to assume to have p and o belong to the same cluster.

Both core-distance and reachability-distance are undefined if no sufficiently dense cluster (w.r.t. ε) is available. Given a sufficiently large ε, this never happens, but then every ε-neighborhood query returns the entire database, resulting in runtime. Hence, the ε parameter is required to cut off the density of clusters that are no longer interesting, and to speed up the algorithm.

The parameter ε is, strictly speaking, not necessary. It can simply be set to the maximum possible value. When a spatial index is available, however, it does play a practical role with regards to complexity. OPTICS abstracts from DBSCAN by removing this parameter, at least to the extent of only having to give the maximum value.

The basic approach of OPTICS is similar to DBSCAN, but instead of maintaining known, but so far unprocessed cluster members in a set, they are maintained in a priority queue (e.g. using an indexed heap).

In update(), the priority queue Seeds is updated with the -neighborhood of and , respectively:

See all
User Avatar
No comments yet.