Multiclass classification
Multiclass classification
Main page

Multiclass classification

logo
Community Hub0 subscribers
What are your thoughts?
Be the first to start a discussion here.
Be the first to start a discussion here.
Multiclass classification

In machine learning and statistical classification, multiclass classification or multinomial classification is the problem of classifying instances into one of three or more classes (classifying instances into one of two classes is called binary classification). For example, deciding on whether an image is showing a banana, peach, orange, or an apple is a multiclass classification problem, with four possible classes (banana, peach, orange, apple), while deciding on whether an image contains an apple or not is a binary classification problem (with the two possible classes being: apple, no apple).

While many classification algorithms (notably multinomial logistic regression) naturally permit the use of more than two classes, some are by nature binary algorithms; these can, however, be turned into multinomial classifiers by a variety of strategies.

Multiclass classification should not be confused with multi-label classification, where multiple labels are to be predicted for each instance (e.g., predicting that an image contains both an apple and an orange, in the previous example).

From the confusion matrix of a multiclass model, we can determine whether a model does better than chance. Let be the number of classes, a set of observations, a model of the target variable and be the number of observations in the set . We note , , , and . It is assumed that the confusion matrix contains at least one non-zero entry in each row, that is for any . Finally we call "normalized confusion matrix" the matrix of conditional probabilities .

The lift is a way of measuring the deviation from independence of two events and  :

We have if and only if events and occur simultaneously with a greater probability than if they were independent. In other words, if one of the two events occurs, the probability of observing the other event increases.

A first condition to satisfy is to have for any . And the quality of a model (better or worse than chance) does not change if we over- or undersample the dataset, that is if we multiply each row of the confusion matrix by a constant . Thus the second condition is that the necessary and sufficient conditions for doing better than chance need only depend on the normalized confusion matrix.

See all
User Avatar
No comments yet.