Hubbry Logo
Discretization of continuous featuresDiscretization of continuous featuresMain
Open search
Discretization of continuous features
Community hub
Discretization of continuous features
logo
7 pages, 0 posts
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Discretization of continuous features
from Wikipedia

In statistics and machine learning, discretization refers to the process of converting or partitioning continuous attributes, features or variables to discretized or nominal attributes/features/variables/intervals. This can be useful when creating probability mass functions – formally, in density estimation. It is a form of discretization in general and also of binning, as in making a histogram. Whenever continuous data is discretized, there is always some amount of discretization error. The goal is to reduce the amount to a level considered negligible for the modeling purposes at hand.

Typically data is discretized into partitions of K equal lengths/width (equal intervals) or K% of the total data (equal frequencies).[1]

Mechanisms for discretizing continuous data include Fayyad & Irani's MDL method,[2] which uses mutual information to recursively define the best bins, CAIM, CACC, Ameva, and many others[3]

Many machine learning algorithms are known to produce better models by discretizing continuous attributes.[4]

Software

[edit]

This is a partial list of software that implement MDL algorithm.

See also

[edit]

References

[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
Add your contribution
Related Hubs
User Avatar
No comments yet.