Nearest centroid classifier
In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns to observations the label of the class of training samples whose mean (centroid) is closest to the observation. When applied to text classification using word vectors containing tf–idf weights to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback.[1]
An extended version of the nearest centroid classifier has found applications in the medical domain, specifically classification of tumors.[2]
Algorithm
Training
Given labeled training samples $(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)$ with class labels $y_i \in \mathbf{Y}$, compute the per-class centroids $\vec{\mu}_\ell = \frac{1}{|C_\ell|} \sum_{i \in C_\ell} \vec{x}_i$, where $C_\ell$ is the set of indices of samples belonging to class $\ell \in \mathbf{Y}$.
Prediction
The class assigned to an observation $\vec{x}$ is $\hat{y} = \arg\min_{\ell \in \mathbf{Y}} \lVert \vec{\mu}_\ell - \vec{x} \rVert$.
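The training and prediction steps above can be sketched directly in NumPy; this is a minimal illustration with made-up toy data, not a reference implementation:

```python
import numpy as np

def fit_centroids(X, y):
    """Training: compute the per-class centroid (mean of that class's samples)."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids):
    """Prediction: assign each sample the label of its nearest centroid
    under Euclidean distance."""
    # Pairwise distances, shape (n_samples, n_classes)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

# Two well-separated 2-D classes (toy data)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

classes, centroids = fit_centroids(X, y)
print(predict(np.array([[0.5, 0.5], [5.5, 5.5]]), classes, centroids))  # [0 1]
```

The class-0 centroid is $(1/3, 1/3)$ and the class-1 centroid is $(16/3, 16/3)$, so the two test points fall to classes 0 and 1 respectively.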
References
1. Manning, Christopher; Raghavan, Prabhakar; Schütze, Hinrich (2008). "Vector space classification". Introduction to Information Retrieval. Cambridge University Press.
2. Tibshirani, Robert; Hastie, Trevor; Narasimhan, Balasubramanian; Chu, Gilbert (2002). "Diagnosis of multiple cancer types by shrunken centroids of gene expression". Proceedings of the National Academy of Sciences. 99 (10): 6567–6572. Bibcode:2002PNAS...99.6567T. doi:10.1073/pnas.082099299. PMC 124443. PMID 12011421.
Overview
Definition and intuition
The nearest centroid classifier is a simple prototype-based method for multi-class classification, where each class is represented by a single prototype known as the centroid, and an input data point is assigned to the class whose centroid is closest according to a distance metric. This approach treats the centroid as the "center of mass" of the class's training samples, providing an intuitive summary that captures the average location of the class in feature space. Training is linear-time, requiring only a single pass over the training data to compute the centroids, which makes the method computationally efficient and a useful baseline for more complex classifiers.[1]

The core intuition behind the nearest centroid classifier is that data points from the same class tend to cluster around a central representative point, which "pulls" new samples toward its class during prediction. For a test sample, the classifier measures the distance (typically Euclidean) to each class centroid and assigns the sample to the nearest one, effectively partitioning the feature space into regions dominated by each class prototype. This mirrors the grouping behavior seen in clustering tasks, where points are attracted to their nearest center, and it assumes well-separated classes with roughly spherical distributions for optimal performance.

A high-level pseudocode overview of the workflow is as follows:

Training phase:
- For each class $\ell$ in the set of classes, compute the centroid $\vec{\mu}_\ell$ as the arithmetic mean of all training samples assigned to class $\ell$.

Prediction phase:
- For a test sample $\vec{x}$:
  - Compute the distance $d(\vec{x}, \vec{\mu}_\ell)$ from $\vec{x}$ to each centroid $\vec{\mu}_\ell$.
  - Assign $\vec{x}$ to the class $\ell^*$ where $\ell^* = \arg\min_\ell d(\vec{x}, \vec{\mu}_\ell)$.
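In practice this workflow is available off the shelf: scikit-learn implements it as `sklearn.neighbors.NearestCentroid`. A minimal usage sketch, assuming scikit-learn is installed (the toy data below are made up for illustration):

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

# Two well-separated 2-D classes (toy data)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = NearestCentroid()     # Euclidean distance by default
clf.fit(X, y)               # training phase: computes per-class means

print(clf.centroids_)       # the learned per-class centroids
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))  # [0 1]
```

`clf.centroids_` exposes the learned class means, which makes the fitted model easy to inspect compared with most other classifiers.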
