K-nearest neighbors algorithm


In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method. It was first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. Most often, it is used for classification, as a k-NN classifier, the output of which is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

The k-NN algorithm can also be generalized for regression. In k-NN regression, also known as nearest neighbor smoothing, the output is the property value for the object. This value is the average of the values of the k nearest neighbors. If k = 1, then the output is simply the value of that single nearest neighbor, a scheme also known as nearest neighbor interpolation.
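The averaging rule for regression can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than a reference implementation: the function name `knn_regress`, the choice of Euclidean distance, and the toy data are all assumptions made for the example.

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """Predict a value as the unweighted mean of the k nearest targets."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Euclidean distance from the query to every stored example (assumed metric).
    dists = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    # Indices of the k smallest distances; a stable sort keeps input order on ties.
    nearest = np.argsort(dists, kind="stable")[:k]
    return float(y_train[nearest].mean())

# Toy one-dimensional data, invented for illustration.
X = [[0.0], [1.0], [2.0], [10.0]]
y = [0.0, 1.0, 4.0, 100.0]

pred_k1 = knn_regress(X, y, [1.2], k=1)  # value of the single nearest neighbor (x = 1.0)
pred_k3 = knn_regress(X, y, [1.2], k=3)  # mean of the targets at x = 0.0, 1.0 and 2.0
```

With k = 1 the prediction reduces to nearest neighbor interpolation; larger k smooths the output by averaging over more neighbors.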

For both classification and regression, a useful technique can be to assign weights to the contributions of the neighbors, so that nearer neighbors contribute more to the average than distant ones. For example, a common weighting scheme consists of giving each neighbor a weight of 1/d, where d is the distance to the neighbor.
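The 1/d weighting scheme can be illustrated for classification as follows. This is a sketch under stated assumptions: the function name `weighted_knn_classify` is hypothetical, Euclidean distance is assumed, and the small `eps` term is an added guard against division by zero when the query coincides with a training point (the weighting scheme itself does not specify how to handle d = 0).

```python
import numpy as np
from collections import defaultdict

def weighted_knn_classify(X_train, y_train, x_query, k=3, eps=1e-12):
    """Classify by summing 1/d weights per class over the k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    dists = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    scores = defaultdict(float)
    for i in nearest:
        # eps is an assumption of this sketch, not part of the 1/d scheme itself.
        scores[y_train[i]] += 1.0 / (dists[i] + eps)
    # Return the class with the largest total weight.
    return max(scores, key=scores.get)

# Toy data: one "a" point close to the query, two "b" points farther away.
X = [[0.0], [3.0], [4.0]]
y = ["a", "b", "b"]
label = weighted_knn_classify(X, y, [1.0], k=3)
```

In this toy case the distances are 1, 2 and 3, so class "a" scores 1 while class "b" scores 1/2 + 1/3. The weighted vote therefore picks "a", whereas an unweighted plurality vote over the same three neighbors would pick "b".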

The input consists of the k closest training examples in a data set. The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

A peculiarity (sometimes even a disadvantage) of the k-NN algorithm is its sensitivity to the local structure of the data. In k-NN classification the function is only approximated locally and all computation is deferred until function evaluation. Since this algorithm relies on distance, if the features represent different physical units or come in vastly different scales, then feature-wise normalizing of the training data can greatly improve its accuracy.
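To make the scaling issue concrete, the sketch below standardizes each feature to zero mean and unit variance using statistics computed on the training data only. Standardization is just one possible feature-wise normalization, chosen here as an assumption; the helper name `standardize` and the height/income data are invented for illustration.

```python
import numpy as np

def standardize(X_train, X_query):
    """Scale features using the training set's per-feature mean and std."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return (X_train - mean) / std, (X_query - mean) / std

# Two features on very different scales: height in metres, income in dollars.
X_train = np.array([[1.60, 30000.0], [1.90, 30500.0]])
x_query = np.array([1.61, 30400.0])

# On raw features the income column dominates the Euclidean distance.
raw_nearest = int(np.argmin(np.linalg.norm(X_train - x_query, axis=1)))

# After standardization both features contribute on a comparable scale.
Z_train, z_query = standardize(X_train, x_query)
scaled_nearest = int(np.argmin(np.linalg.norm(Z_train - z_query, axis=1)))
```

Here the nearest neighbor switches from the second training point (raw features) to the first (standardized features). Whether this improves accuracy depends on the data, but it shows how strongly the scale of a feature can steer the neighbor search.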

Suppose we have pairs $(X_{1},Y_{1}),(X_{2},Y_{2}),\dots,(X_{n},Y_{n})$ taking values in $\mathbb{R}^{d}\times\{1,2\}$, where $Y$ is the class label of $X$, so that $X|Y=r\sim P_{r}$ for $r=1,2$ (and probability distributions $P_{r}$). Given some norm $\|\cdot\|$ on $\mathbb{R}^{d}$ and a point $x\in\mathbb{R}^{d}$, let $(X_{(1)},Y_{(1)}),\dots,(X_{(n)},Y_{(n)})$ be a reordering of the training data such that $\|X_{(1)}-x\|\leq\dots\leq\|X_{(n)}-x\|$.
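The reordering by distance to $x$ can be sketched with NumPy. The function name `order_by_distance` is hypothetical, and the norm is passed through the `ord` argument of `np.linalg.norm` as one way to vary $\|\cdot\|$; the two sample points are chosen so that the ordering depends on the norm.

```python
import numpy as np

def order_by_distance(X, Y, x, ord=2):
    """Return the training pairs sorted by ||X_i - x|| in non-decreasing order."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y)
    # Vector norm of each row difference; ord=2 is Euclidean, ord=1 is Manhattan.
    dists = np.linalg.norm(X - np.asarray(x, dtype=float), ord=ord, axis=1)
    order = np.argsort(dists, kind="stable")
    return X[order], Y[order], dists[order]

# Two labelled points in R^2 with labels in {1, 2}; the query is the origin.
X = [[3.0, 0.0], [2.0, 2.0]]
Y = [1, 2]
x = [0.0, 0.0]

_, y_l2, d_l2 = order_by_distance(X, Y, x, ord=2)  # Euclidean: (2, 2) comes first
_, y_l1, d_l1 = order_by_distance(X, Y, x, ord=1)  # Manhattan: (3, 0) comes first
```

Under the Euclidean norm the point (2, 2) is nearer to the origin, while under the 1-norm the point (3, 0) is nearer, so the choice of norm can change which label sits in position $Y_{(1)}$.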

The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point.
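The two phases can be sketched as a small Python class: "training" only stores the data, and classification takes a plurality vote among the k nearest stored samples. The class name `KNNClassifier`, the Euclidean metric, and the tie-breaking rule (ties go to the class encountered first when scanning neighbors from nearest to farthest) are assumptions of this illustration, not part of the algorithm's definition.

```python
import numpy as np
from collections import Counter

class KNNClassifier:
    """Minimal k-NN classifier sketch: store on fit, vote on predict."""

    def __init__(self, k=3):
        self.k = k  # user-defined number of neighbors

    def fit(self, X, y):
        # Training phase: only store the feature vectors and class labels.
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = list(y)
        return self

    def predict_one(self, x):
        # Classification phase: all distance computation happens at query time.
        dists = np.linalg.norm(self.X_ - np.asarray(x, dtype=float), axis=1)
        nearest = np.argsort(dists, kind="stable")[: self.k]
        votes = Counter(self.y_[i] for i in nearest)
        # most_common keeps first-seen order among equal counts, so ties
        # fall to the class of the nearer neighbor (an assumed convention).
        return votes.most_common(1)[0][0]

# Toy two-dimensional data, invented for illustration.
X = [[0, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
y = ["red", "red", "blue", "blue", "blue"]

clf = KNNClassifier(k=3).fit(X, y)
near_red = clf.predict_one([0.5, 0.5])   # neighbors: red, red, blue
near_blue = clf.predict_one([5.0, 5.2])  # neighbors: blue, blue, blue
```

This brute-force search computes the distance to every stored sample on each query, which is the simplest form of the method and is adequate for small data sets.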
