XGBoost
from Wikipedia
Developer: The XGBoost Contributors
Initial release: March 27, 2014
Stable release: 3.0.0[1] / 15 March 2025
Written in: C++
Operating system: Linux, macOS, Microsoft Windows
Type: Machine learning
License: Apache License 2.0
Website: xgboost.ai

XGBoost[2] (eXtreme Gradient Boosting) is an open-source software library that provides a regularizing gradient boosting framework for C++, Java, Python,[3] R,[4] Julia,[5] Perl,[6] and Scala. It works on Linux, Microsoft Windows,[7] and macOS.[8] According to the project description, it aims to provide a "Scalable, Portable and Distributed Gradient Boosting (GBM, GBRT, GBDT) Library". It runs on a single machine as well as on the distributed processing frameworks Apache Hadoop, Apache Spark, Apache Flink, and Dask.[9][10]

XGBoost gained much popularity and attention in the mid-2010s as the algorithm of choice for many winning teams of machine learning competitions.[11]

History

XGBoost started as a research project by Tianqi Chen[12] within the Distributed (Deep) Machine Learning Community (DMLC) group at the University of Washington. It began as a terminal application that could be configured using a libsvm configuration file. It became well known in machine learning competition circles after it was used in the winning solution of the Higgs Machine Learning Challenge. Python and R packages followed soon after, and XGBoost now has package implementations for Java, Scala, Julia, Perl, and other languages. This brought the library to more developers and contributed to its popularity in the Kaggle community, where it has been used in a large number of competitions.[11]

It was soon integrated with a number of other packages, making it easier to use in their respective communities. It is now integrated with scikit-learn for Python users and with the caret package for R users. It can also be integrated into dataflow frameworks such as Apache Spark, Apache Hadoop, and Apache Flink using the abstracted Rabit[13] and XGBoost4J[14] libraries. XGBoost is also available on OpenCL for FPGAs.[15] An efficient, scalable implementation of XGBoost has been published by Tianqi Chen and Carlos Guestrin.[16]

While an XGBoost model often achieves higher accuracy than a single decision tree, it sacrifices the intrinsic interpretability of decision trees. For example, following the path a single decision tree takes to reach its decision is trivial and self-explanatory, but following the paths of hundreds or thousands of trees is much harder.

Features

Salient features of XGBoost that distinguish it from other gradient boosting algorithms include:[17][18][16]

  • Penalization of tree complexity (the number of leaves and the size of the leaf weights)
  • Proportional shrinking of leaf node values
  • Newton boosting
  • An extra randomization parameter
  • Implementation on single-machine and distributed systems, and out-of-core computation
  • Automatic feature selection [citation needed]
  • Theoretically justified weighted quantile sketching for efficient computation
  • Sparsity-aware, parallelized tree construction
  • Efficient, cache-aware block structure for decision tree training
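One common reading of the first two features is the regularized objective from Chen and Guestrin's paper,[16] in which a tree with $T$ leaves and leaf weights $w$ is penalized by $\gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2$, and each new tree's leaf weights are scaled by a learning rate $\eta$. The sketch below computes the resulting optimal leaf weight and split gain from the sums $G$ and $H$ of per-instance gradients and hessians in a node. The function names and toy numbers are illustrative assumptions; this is not the library's actual code.

```python
# Sketch of XGBoost-style regularized tree scoring (after Chen & Guestrin).
# G, H: sums of per-instance gradients g_i and hessians h_i in a node.
# lam: L2 penalty on leaf weights; gamma: penalty per extra leaf; eta: learning rate.
# All function names here are hypothetical, chosen for illustration.


def leaf_weight(G: float, H: float, lam: float) -> float:
    """Optimal leaf weight w* = -G / (H + lam); a larger lam shrinks it toward 0."""
    return -G / (H + lam)


def shrunk_leaf_weight(G: float, H: float, lam: float, eta: float) -> float:
    """Leaf weight actually added to the model after scaling by the learning rate."""
    return eta * leaf_weight(G, H, lam)


def leaf_score(G: float, H: float, lam: float) -> float:
    """G^2 / (H + lam): twice the objective reduction achieved by one leaf."""
    return G * G / (H + lam)


def split_gain(G_L: float, H_L: float, G_R: float, H_R: float,
               lam: float, gamma: float) -> float:
    """Objective reduction from splitting a node into left/right children.
    gamma is charged for the extra leaf, so a split with gain <= 0 is not worth making."""
    parent = leaf_score(G_L + G_R, H_L + H_R, lam)
    children = leaf_score(G_L, H_L, lam) + leaf_score(G_R, H_R, lam)
    return 0.5 * (children - parent) - gamma


# Toy example with squared error 0.5 * (prediction - target)^2,
# so g_i = prediction - target and h_i = 1.
# Left child gradients [-2, -1], right child gradients [1, 2].
G_L, H_L = -3.0, 2.0
G_R, H_R = 3.0, 2.0
print(leaf_weight(G_L, H_L, lam=1.0))                      # 1.0
print(leaf_weight(G_L, H_L, lam=0.0))                      # 1.5 (no L2 shrinkage)
print(shrunk_leaf_weight(G_L, H_L, lam=1.0, eta=0.3))      # 0.3
print(split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.5))  # 2.5
```

Raising `gamma` above the unpenalized gain (3.0 here) would make the split unprofitable, which is how the per-leaf penalty prunes trees.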

The algorithm

Unlike gradient boosting, which works as gradient descent in function space, XGBoost works as Newton–Raphson in function space. It uses a second-order Taylor approximation of the loss function, and this is what makes the connection to the Newton–Raphson method.
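To make the Newton–Raphson connection concrete, the derivation below expands the loss around the current model $\hat{f}_{(m-1)}$ for one boosting round. The shorthand $g_i$ and $h_i$ for the first and second derivatives is chosen here for illustration; this is a sketch of the standard argument rather than a quotation from the XGBoost documentation.

```latex
% g_i = \partial L(y_i, F) / \partial F,   h_i = \partial^2 L(y_i, F) / \partial F^2,
% both evaluated at F = \hat{f}_{(m-1)}(x_i).
\sum_{i=1}^{N} L\bigl(y_i, \hat{f}_{(m-1)}(x_i) + f(x_i)\bigr)
  \approx \sum_{i=1}^{N} \Bigl[ L\bigl(y_i, \hat{f}_{(m-1)}(x_i)\bigr)
  + g_i\, f(x_i) + \tfrac{1}{2} h_i\, f(x_i)^2 \Bigr]

% If f takes a constant value c on a region R (e.g. one tree leaf),
% the quadratic is minimized at the Newton--Raphson step
c^{\ast} = -\frac{\sum_{i \in R} g_i}{\sum_{i \in R} h_i}

% Gradient boosting instead fits f to -g_i by least squares, giving (before any line search)
c = -\frac{1}{|R|} \sum_{i \in R} g_i ,
% a step that uses only first-order information.
```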

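A generic unregularized XGBoost algorithm is:

Input: training set $\{(x_i, y_i)\}_{i=1}^{N}$, a differentiable loss function $L(y, F(x))$, a number of weak learners $M$ and a learning rate $\alpha$.

Algorithm:

  1. Initialize the model with a constant value:
     $$\hat{f}_{(0)}(x) = \underset{\theta}{\arg\min} \sum_{i=1}^{N} L(y_i, \theta).$$
     Because this is the initialization, the model assigns the same constant to every input. Later iterations fit new functions by optimization, but at this step we only need the single value, shared by all inputs, that minimizes the total loss.
  2. For m = 1 to M:
    1. Compute the 'gradients' and 'hessians' of the loss with respect to the current model's output:
       $$\hat{g}_m(x_i) = \left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x)=\hat{f}_{(m-1)}(x)}, \qquad \hat{h}_m(x_i) = \left[\frac{\partial^2 L(y_i, f(x_i))}{\partial f(x_i)^2}\right]_{f(x)=\hat{f}_{(m-1)}(x)}.$$
    2. Fit a base learner (or weak learner, e.g. tree) to the training set $\left\{x_i, -\frac{\hat{g}_m(x_i)}{\hat{h}_m(x_i)}\right\}_{i=1}^{N}$ by solving the weighted least-squares problem
       $$\hat{\phi}_m = \underset{\phi \in \Phi}{\arg\min} \sum_{i=1}^{N} \frac{1}{2} \hat{h}_m(x_i) \left[\phi(x_i) - \left(-\frac{\hat{g}_m(x_i)}{\hat{h}_m(x_i)}\right)\right]^2,$$
       where $\Phi$ is the class of base learners, and then shrink it by the learning rate: $\hat{f}_m(x) = \alpha\, \hat{\phi}_m(x)$. Expanding the square shows that, up to a constant, this objective equals the second-order approximation $\sum_{i} \left[\hat{g}_m(x_i)\,\phi(x_i) + \tfrac{1}{2}\hat{h}_m(x_i)\,\phi(x_i)^2\right]$ of the loss.
    3. Update the model:
       $$\hat{f}_{(m)}(x) = \hat{f}_{(m-1)}(x) + \hat{f}_m(x).$$
  3. Output
     $$\hat{f}(x) = \hat{f}_{(M)}(x) = \sum_{m=0}^{M} \hat{f}_m(x), \quad \text{with } \hat{f}_0 = \hat{f}_{(0)}.$$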
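The generic algorithm can be sketched in pure Python under some simplifying assumptions: a single numeric feature, depth-1 trees (stumps) as the base learners, and the logistic loss with labels in {0, 1}, for which the gradient is $p - y$ and the hessian is $p(1-p)$, with $p$ the sigmoid of the raw score. All names (`newton_boost`, `fit_stump`, and so on) are hypothetical. The sketch omits regularization, subsampling, and the engineering features listed above, and it is not how the XGBoost library itself is implemented.

```python
import math
from typing import List, Tuple

# A stump is (threshold, left_value, right_value); x <= threshold goes left.
Stump = Tuple[float, float, float]


def sigmoid(f: float) -> float:
    return 1.0 / (1.0 + math.exp(-f))


def grad_hess(y: float, f: float) -> Tuple[float, float]:
    """Gradient and hessian of the logistic loss w.r.t. the raw score f (y in {0, 1})."""
    p = sigmoid(f)
    return p - y, p * (1.0 - p)


def fit_stump(x: List[float], g: List[float], h: List[float]) -> Stump:
    """Weighted least-squares fit of targets -g/h with weights h.
    Each leaf's optimal value is -sum(g)/sum(h); the best threshold
    maximizes G_L^2/H_L + G_R^2/H_R."""
    G, H = sum(g), sum(h)
    best: Stump = (math.inf, -G / H, -G / H)  # fallback: no split, one constant leaf
    best_score = G * G / H
    xs = sorted(set(x))
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2.0  # candidate threshold between consecutive distinct values
        G_L = sum(gi for xi, gi in zip(x, g) if xi <= t)
        H_L = sum(hi for xi, hi in zip(x, h) if xi <= t)
        G_R, H_R = G - G_L, H - H_L
        score = G_L * G_L / H_L + G_R * G_R / H_R
        if score > best_score:
            best_score, best = score, (t, -G_L / H_L, -G_R / H_R)
    return best


def stump_predict(s: Stump, xi: float) -> float:
    t, left, right = s
    return left if xi <= t else right


def newton_boost(x: List[float], y: List[float], n_rounds: int = 50,
                 alpha: float = 0.3) -> Tuple[float, List[Stump]]:
    # Initialization: the constant minimizing total logistic loss is the
    # log-odds of the positive rate (assumes both classes are present).
    ybar = sum(y) / len(y)
    f0 = math.log(ybar / (1.0 - ybar))
    F = [f0] * len(x)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        gh = [grad_hess(yi, fi) for yi, fi in zip(y, F)]  # gradients and hessians
        g = [gi for gi, _ in gh]
        h = [hi for _, hi in gh]
        t, left, right = fit_stump(x, g, h)               # fit the base learner
        s = (t, alpha * left, alpha * right)              # shrink by the learning rate
        stumps.append(s)
        F = [fi + stump_predict(s, xi) for xi, fi in zip(x, F)]  # update the model
    return f0, stumps


def predict_proba(model: Tuple[float, List[Stump]], xi: float) -> float:
    f0, stumps = model
    return sigmoid(f0 + sum(stump_predict(s, xi) for s in stumps))


# Usage on a tiny, linearly separable toy dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
model = newton_boost(x, y, n_rounds=20)
print(predict_proba(model, 2.0), predict_proba(model, 7.0))
```

Each leaf value $-\sum g_i / \sum h_i$ is the exact minimizer of the weighted least-squares step restricted to that leaf, so every stump takes a damped Newton step rather than a plain gradient step.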

Awards

  • John Chambers Award (2016)[19]
  • High Energy Physics meets Machine Learning award (HEP meets ML) (2016)[20]

See also

References
