XGBoost R Tutorial
==================
## Introduction
**Xgboost** is short for the e**X**treme **G**radient **Boost**ing package.
The purpose of this vignette is to show you how to use **Xgboost** to build a model and make predictions.
It is an efficient and scalable implementation of the gradient boosting framework described by @friedman2000additive and @friedman2001greedy. Two solvers are included:
- *linear* model ;
- *tree learning* algorithm.
It supports various objective functions, including *regression*, *classification* and *ranking*. The package is designed to be extensible, so users can also define their own objective functions easily.
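As a rough illustration of what a user-defined objective involves: gradient boosting needs the first and second derivatives of the loss with respect to the raw prediction. The sketch below computes both for the logistic loss in plain R. The helper name `logistic_grad_hess` and its `(preds, labels)` signature are assumptions made for this example, not part of the package, and wiring the function into **Xgboost** is not shown here.

```r
# Hypothetical helper (not part of xgboost): gradient and hessian of the
# logistic loss with respect to the raw (pre-sigmoid) score.
logistic_grad_hess <- function(preds, labels) {
  p <- 1 / (1 + exp(-preds))   # sigmoid turns raw scores into probabilities
  list(
    grad = p - labels,         # first derivative of the loss
    hess = p * (1 - p)         # second derivative of the loss
  )
}

# Toy input: two raw scores of 0 (probability 0.5) with labels 1 and 0.
res <- logistic_grad_hess(preds = c(0, 0), labels = c(1, 0))
print(res$grad)
print(res$hess)
```

Both vectors have one entry per observation, which is the shape a boosting step needs to fit the next tree or linear update.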
It has been [used](https://github.com/dmlc/xgboost) to win several [Kaggle](http://www.kaggle.com) competitions.
It has several features:
* Speed: it can automatically do parallel computation on *Windows* and *Linux*, with *OpenMP*. It is generally over 10 times faster than the classical `gbm`.
* Input Type: it takes several types of input data:
    * *Dense* Matrix: *R*'s *dense* matrix, i.e. `matrix` ;
    * *Sparse* Matrix: *R*'s *sparse* matrix, i.e. `Matrix::dgCMatrix` ;
    * Data File: local data files ;
    * `xgb.DMatrix`: its own class (recommended).
* Sparsity: it accepts *sparse* input for both *tree booster* and *linear booster*, and is optimized for *sparse* input ;
* Customization: it supports customized objective functions and evaluation functions.
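To make the first two input types concrete, here is a small sketch that builds the same toy data as a dense `matrix` and as a sparse `dgCMatrix`. It only relies on the **Matrix** package; the variable names and values are made up for illustration.

```r
library(Matrix)

# A 3 x 4 toy matrix that is mostly zeros (values are arbitrary).
# matrix() fills column by column, so each row below is one column.
dense <- matrix(c(1, 0, 0,
                  0, 0, 2,
                  0, 0, 0,
                  0, 3, 0), nrow = 3, ncol = 4)

# Matrix() with sparse = TRUE stores only the non-zero entries.
sparse <- Matrix(dense, sparse = TRUE)

class(dense)      # base R dense matrix
class(sparse)     # column-compressed sparse matrix from the Matrix package
length(sparse@x)  # number of stored non-zero values
```

When most entries are zero, as with one-hot encoded categorical features, the sparse form keeps memory use proportional to the number of non-zero values rather than to the full matrix size.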
## Installation
### GitHub version
For the weekly updated version (highly recommended), install from *GitHub*:
```r
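# drat lets R install packages from repositories hosted on GitHub
install.packages("drat", repos="https://cran.rstudio.com")
# addRepo() is an exported function, so `::` is sufficient
drat::addRepo("dmlc")
install.packages("xgboost", repos="http://dmlc.ml/drat/", type = "source")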
```
> *Windows* users will need to install [Rtools](http://cran.r-project.org/bin/windows/Rtools/) first.
### CRAN version
Version 0.4-2 is on CRAN, and you can install it with:
```r
install.packages("xgboost")
```
Older versions can be obtained from the CRAN [archive](http://cran.r-project.org/src/contrib/Archive/xgboost).
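Whichever route you take, it can be useful to confirm that the installation worked before going further. The following is a minimal sketch using only base R functions; the variable name `has_xgboost` is ours.

```r
# TRUE if the package can be loaded, FALSE otherwise; nothing is attached.
has_xgboost <- requireNamespace("xgboost", quietly = TRUE)

if (has_xgboost) {
  message("xgboost version: ", as.character(utils::packageVersion("xgboost")))
} else {
  message("xgboost is not installed yet")
}
```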
## Learning
For the purpose of this tutorial we will load the **XGBoost** package.
```r
require(xgboost)
```
### Dataset presentation
In this example, we aim to predict whether a mushroom can be eaten or not (as in many tutorials, the example data are the same as you will use in your everyday life :-).
The mushroom data come from the UCI Machine Learning Repository (@Bache+Lichman:2013).
### Dataset loading
We will load the `agaricus` datasets bundled with the package and assign them to variables.
The datasets are already split into:
* `train`: will be used to build the model ;
* `test`: will be used to assess the quality of our model.
Why *split* the dataset into two parts?
We build the model on the first part and use the second part to test it and assess its quality. Without dividing the dataset, we would test the model on data that the algorithm has already seen.
```r
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
```
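The `agaricus` data ship already split, but for your own data you would have to make the split yourself. The sketch below shows one common way to do it in base R on a made-up data frame. The 80/20 ratio, the seed and all variable names are assumptions chosen for illustration, not something the package prescribes.

```r
set.seed(42)  # make the random split reproducible

# Toy data set: 100 rows, one feature and a binary label (made-up values).
n <- 100
toy <- data.frame(x = rnorm(n), y = rbinom(n, size = 1, prob = 0.5))

# Draw 80% of the row indices for training; the rest is held out for testing.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Sizes of both parts and the number of rows they share.
nrow(toy_train)
nrow(toy_test)
length(intersect(rownames(toy_train), rownames(toy_test)))
```

Because the held-out rows never take part in training, performance measured on them gives a fairer picture of how the model behaves on unseen data.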