Isolation forest

Isolation Forest is an algorithm for anomaly detection that uses binary trees. It was developed by Fei Tony Liu in 2008. Its linear time complexity and low memory use make it well suited to high-volume data. It is based on the assumption that, because anomalies are few and different from other data, they can be isolated using few partitions. Like decision tree algorithms, it does not perform density estimation. Unlike decision tree algorithms, it uses only path length to output an anomaly score, and does not use leaf-node statistics such as class distribution or target value.

Isolation Forest is fast because it partitions the data space by randomly selecting an attribute and a split point. The anomaly score is inversely related to the path length: because anomalies are few and different, they need fewer splits to be isolated.
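To make the inverse relation between path length and score concrete, the following sketch uses the normalisation from the original iForest paper, which this section does not spell out: $s(x,n)=2^{-E(h(x))/c(n)}$ with $c(n)=2H(n-1)-2(n-1)/n$, where $H(i)\approx \ln(i)+0.5772156649$. The function names are hypothetical and chosen only for illustration.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def average_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful binary search tree lookup over n points."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if n == 2:
        return 1.0  # exact value, since H(1) = 1
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, n: int) -> float:
    """s = 2 ** (-E[h(x)] / c(n)); shorter average paths give scores closer to 1."""
    return 2.0 ** (-mean_path_length / average_path_length(n))
```

Under this normalisation a point whose average path length equals $c(n)$ scores exactly 0.5, and the score approaches 1 as the average path length approaches 0.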

The Isolation Forest (iForest) algorithm was initially proposed by Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou in 2008. In 2012 the same authors showed that iForest has linear time complexity, a small memory requirement, and is applicable to high-dimensional data. In 2010, an extension of the algorithm, SCiForest, was published to address clustered and axis-parallel anomalies.

The premise of the Isolation Forest algorithm is that anomalous data points are easier to separate from the rest of the sample. In order to isolate a data point, the algorithm recursively generates partitions on the sample by randomly selecting an attribute and then randomly selecting a split value between the minimum and maximum values allowed for that attribute.
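The partitioning procedure can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name `splits_to_isolate` is hypothetical, the split range is taken from the points remaining in the current partition, and attributes whose values are all equal are skipped (both are assumptions of this sketch).

```python
import random


def splits_to_isolate(points, target, rng):
    """Count random axis-parallel splits until `target` is alone in its partition."""
    subset = list(points)
    splits = 0
    while len(subset) > 1:
        # Assumption: only attributes that still vary within the subset are candidates.
        dims = [d for d in range(len(target))
                if min(p[d] for p in subset) < max(p[d] for p in subset)]
        if not dims:
            break  # the remaining points are identical and cannot be separated
        q = rng.choice(dims)
        lo = min(p[q] for p in subset)
        hi = max(p[q] for p in subset)
        split = rng.uniform(lo, hi)
        while split <= lo:  # extremely unlikely; redraw so the split separates something
            split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains the target point.
        if target[q] < split:
            subset = [p for p in subset if p[q] < split]
        else:
            subset = [p for p in subset if p[q] >= split]
        splits += 1
    return splits


if __name__ == "__main__":
    rng = random.Random(0)
    cluster = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    inlier, outlier = (0.0, 0.0), (8.0, 8.0)
    sample = cluster + [inlier, outlier]
    for name, point in (("inlier", inlier), ("outlier", outlier)):
        mean = sum(splits_to_isolate(sample, point, rng) for _ in range(100)) / 100
        print(name, mean)
```

Averaged over many runs, the point far from the cluster should need noticeably fewer splits than the point at its centre.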

An example of random partitioning in a 2D dataset of normally distributed points is shown in the first figure for a non-anomalous point and in the second one for a point that is more likely to be an anomaly. The figures show that anomalies require fewer random partitions to be isolated than normal points.

Recursive partitioning can be represented by a tree structure called an Isolation Tree, and the number of partitions required to isolate a point can be interpreted as the length of the path, within the tree, from the root to a terminating node. For example, the path length of point $x_{i}$ in the first figure is greater than the path length of $x_{j}$ in the second figure.
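The tree view of partitioning can be sketched as below. The `Node`, `build_tree` and `path_length` names are hypothetical, and the stopping rule (a node is terminating when it holds a single point or when its points cannot be separated any further) is an assumption of this sketch, as is the absence of any height limit.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    size: int                         # number of points that reached this node
    attribute: Optional[int] = None   # q; None marks a terminating node
    split: Optional[float] = None     # p
    left: Optional["Node"] = None     # points with x[q] < p
    right: Optional["Node"] = None    # points with x[q] >= p


def build_tree(points, rng):
    """Recursively partition `points` with random attributes and split values."""
    if len(points) <= 1:
        return Node(size=len(points))
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return Node(size=len(points))  # all points identical: cannot split
    q = rng.choice(dims)
    lo = min(p[q] for p in points)
    hi = max(p[q] for p in points)
    split = rng.uniform(lo, hi)
    while split <= lo:  # extremely unlikely; redraw so both children are non-empty
        split = rng.uniform(lo, hi)
    left = [p for p in points if p[q] < split]
    right = [p for p in points if p[q] >= split]
    return Node(len(points), q, split, build_tree(left, rng), build_tree(right, rng))


def path_length(node, x):
    """Number of edges traversed from the root to the terminating node reached by x."""
    depth = 0
    while node.attribute is not None:
        node = node.left if x[node.attribute] < node.split else node.right
        depth += 1
    return depth
```

Averaging `path_length` for one point over many independently built trees gives the quantity on which the anomaly score is based.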

Let $X=\{x_{1},\dots ,x_{n}\}$ be a set of $d$-dimensional points and $X'\subset X$. An Isolation Tree (iTree) is defined as a data structure with the following properties:

In order to build an iTree, the algorithm recursively divides $X'$ by randomly selecting an attribute $q$ and a split value $p$, until either
