Alien-XGBoost


xgboost/R-package/configure

rm -f confcache

test "x$prefix" = xNONE && prefix=$ac_default_prefix
# Let make expand exec_prefix.
test "x$exec_prefix" = xNONE && exec_prefix='${prefix}'

# Transform confdefs.h into DEFS.
# Protect against shell expansion while executing Makefile rules.
# Protect against Makefile macro expansion.
#
# If the first sed substitution is executed (which looks for macros that
# take arguments), then branch to the quote section.  Otherwise,
# look for a macro that doesn't take arguments.
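#
# For example (an illustrative note added here, not in the generated script),
# the :mline loop below joins backslash-continued lines, turning
#
#   #define PACKAGE_NAME \
#   "xgboost"
#
# into a single line before the quoting substitutions are applied.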
ac_script='
:mline
/\\$/{
 N
 s,\\\n,,
 b mline
}
t clear

xgboost/R-package/vignettes/xgboost.Rnw

\verb@xgboost@ is the main function to train a \verb@Booster@, i.e. a model.
\verb@predict@ does prediction on the model.
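
A minimal sketch of this pair in use (not from the original vignette; it
assumes the \verb@agaricus@ demo data shipped with the package):

<<Train and Predict, eval=FALSE>>=
library(xgboost)
data(agaricus.train, package = 'xgboost')
data(agaricus.test, package = 'xgboost')
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 2, eta = 1, nrounds = 2,
               objective = 'binary:logistic')
pred <- predict(bst, agaricus.test$data)
@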

Here we can save the model to a binary local file and load it back when needed,
as sketched below. We cannot inspect the trees inside a binary file; however,
another function saves the model in plain text.
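
A sketch of the binary round trip (the file name \verb@xgboost.model@ is an
arbitrary choice):

<<Save and Load, eval=FALSE>>=
xgb.save(bst, 'xgboost.model')
bst2 <- xgb.load('xgboost.model')
@

The plain-text dump is produced by \verb@xgb.dump@: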
<<Dump Model>>=
xgb.dump(bst, 'model.dump')
@

The output looks like this:

\begin{verbatim}
booster[0]:
0:[f28<1.00001] yes=1,no=2,missing=2
  1:[f108<1.00001] yes=3,no=4,missing=4
    3:leaf=1.85965
    4:leaf=-1.94071
  2:[f55<1.00001] yes=5,no=6,missing=6
    5:leaf=-1.70044
    6:leaf=1.71218
\end{verbatim}

xgboost/demo/kaggle-otto/understandingXGBoostModel.Rmd

In the final model, these *leaves* are supposed to be as pure as possible for each tree, meaning that in our case each *leaf* should be made of only one class of **Otto** product (of course this is not entirely true, but it is what we try to achieve in a minimum of splits).

**Not all *splits* are equally important**. Basically, the first *split* of a tree will have more impact on the purity than, for instance, the deepest *split*. Intuitively, we understand that the first *split* does most of the work, and the following *splits* focus on smaller parts of the dataset which have been misclassified so far.

In the same way, in Boosting we try to optimize the misclassification at each round (this is called the *loss*). So the first *tree* will do the big work, and the following trees will focus on the remaining parts, on what has not been correctly learned by the previous *trees*.
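
As a rough sketch (not from the original tutorial), the per-round *loss* can be watched while training; `dtrain` is assumed to be an `xgb.DMatrix` holding the **Otto** training data:

```{r lossSketch, eval=FALSE}
# Print the training loss after each boosting round: later trees keep
# reducing what the earlier trees left unexplained.
bst <- xgb.train(params = list(objective = "multi:softprob",
                               num_class = 9,
                               eval_metric = "mlogloss"),
                 data = dtrain,
                 nrounds = 10,
                 watchlist = list(train = dtrain))
```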

The improvement brought by each *split* can be measured; it is called the *gain*.

Each *split* is done on one feature only, at one value.

Let's see what the model looks like.

```{r modelDump}
model <- xgb.dump(bst, with.stats = TRUE)
model[1:10]
```
> For convenience, only the first 10 lines of the model are displayed.

Clearly, this output is not easy to interpret.

Basically, each line represents a *branch*: there is the *tree* ID, the feature ID, the value at which the feature *splits*, and the next *branches* to follow (left, right, and which one to take when this feature is N/A for the row).
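
As a sketch of how these statistics become useful (not part of the original text), the *gain* of all *splits* can be aggregated per feature; `featureNames` is assumed to hold the column names of the training matrix:

```{r importanceSketch, eval=FALSE}
# Aggregate the gain of every split, per feature, into an importance table.
importance <- xgb.importance(feature_names = featureNames, model = bst)
head(importance)
```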


