@

We can also save the matrix to a binary file and load it back later with
\verb@xgb.DMatrix@:
<<save model>>=
xgb.DMatrix.save(dtrain, 'xgb.DMatrix')
dtrain = xgb.DMatrix('xgb.DMatrix')
@

\section{Advanced Examples}

The function \verb@xgboost@ is a simple interface with fewer parameters, designed
to be R-friendly. The core training function is \verb@xgb.train@. It is more flexible than \verb@xgboost@, but it requires users to read the documentation more carefully.

\verb@xgb.train@ only accepts an \verb@xgb.DMatrix@ object as its input, but it supports advanced features such as custom objective and evaluation functions.

<<Customized loss function>>=
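# Custom objective: logistic loss on raw (margin) predictions.
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  # Map raw scores to probabilities with the sigmoid function.
  preds <- 1/(1 + exp(-preds))
  # First and second derivatives of the loss w.r.t. the raw score.
  grad <- preds - labels
  hess <- preds * (1 - preds)
  return(list(grad = grad, hess = hess))
}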

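# Custom evaluation: root mean squared error between preds and labels.
# Note: with a custom objective, preds are raw margin scores.
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- sqrt(mean((preds-labels)^2))
  return(list(metric = "RMSE", value = err))
}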

dtest <- xgb.DMatrix(test$data, label = test$label)
watchlist <- list(eval = dtest, train = dtrain)
param <- list(max_depth = 2, eta = 1, silent = 1)

bst <- xgb.train(param, dtrain, nrounds = 2, watchlist = watchlist,
                 obj = logregobj, feval = evalerror, maximize = FALSE)
@

A customized objective function must return both the gradient and the
second-order gradient (Hessian) of the loss with respect to the predictions.
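
As a sketch of where the two quantities returned by \verb@logregobj@ come from,
assume the per-example loss is the logistic loss. The symbols $x$ (raw
prediction), $p$ (predicted probability) and $y$ (label) are notation introduced
here for illustration only. With $p = 1/(1 + e^{-x})$,
\[
  l(y, x) = -y \log p - (1 - y) \log (1 - p),
\]
\[
  \frac{\partial l}{\partial x} = p - y, \qquad
  \frac{\partial^2 l}{\partial x^2} = p (1 - p),
\]
which correspond to \verb@grad@ and \verb@hess@ in the code above.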

We also have \verb@slice@ for extracting a subset of rows from an
\verb@xgb.DMatrix@. It is useful in cross-validation.
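
A minimal sketch of that use is given below: it holds out one of five folds by
hand. It assumes that \verb@slice@ takes the matrix and a vector of row indices,
as in the package version this vignette documents. The names \verb@dfull@,
\verb@fold@, \verb@dfit@ and \verb@dvalid@ and the fold count are illustrative
only, and the chunk is not evaluated when the vignette is built.

<<Manual fold split sketch, eval=FALSE>>=
library(xgboost)
# Small example data shipped with the package.
data(agaricus.train, package = "xgboost")
dfull <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

set.seed(1)
n <- nrow(dfull)
# Randomly assign each row to one of 5 folds (hypothetical fold count).
fold <- sample(rep(1:5, length.out = n))
# Hold out fold 1 for validation and keep the remaining rows for fitting.
dvalid <- slice(dfull, which(fold == 1))
dfit <- slice(dfull, which(fold != 1))
@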

For walkthrough demos, please see \verb@R-package/demo/@.

\section{The Higgs Boson competition}

We have made a demo for \href{http://www.kaggle.com/c/higgs-boson}{the Higgs 
Boson Machine Learning Challenge}. 

Here are the instructions to make a submission:
\begin{enumerate}
    \item Download the \href{http://www.kaggle.com/c/higgs-boson/data}{datasets}
    and extract them to \verb@data/@.
    \item Run the scripts under \verb@xgboost/demo/kaggle-higgs/@:
    \href{https://github.com/tqchen/xgboost/blob/master/demo/kaggle-higgs/higgs-train.R}{higgs-train.R}
    and \href{https://github.com/tqchen/xgboost/blob/master/demo/kaggle-higgs/higgs-pred.R}{higgs-pred.R}.
    The computation takes less than a minute on an Intel i7.
    \item Go to the \href{http://www.kaggle.com/c/higgs-boson/submissions/attach}{submission page} 
    and submit your result.
\end{enumerate}

We provide \href{https://github.com/tqchen/xgboost/blob/master/demo/kaggle-higgs/speedtest.R}{a script}
to compare the training time of \verb@gbm@ and \verb@xgboost@ on the Higgs dataset.
The training set contains 350000 records and 30 features. 

\verb@xgboost@ can automatically do parallel computation. On a machine with an Intel
i7-4700MQ and 24GB of memory, we found that \verb@xgboost@ takes about 35 seconds, which is about 20 times faster
than \verb@gbm@. When we limited \verb@xgboost@ to a single thread, it was
still about two times faster than \verb@gbm@.
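
The single-thread setting can be sketched with the \verb@nthread@ parameter. The
chunk below is only an illustration on the small \verb@agaricus@ data shipped
with the package, not the Higgs benchmark itself, so its timing says nothing
about the figures above. The variable names, the objective and the number of
rounds are arbitrary choices, and the chunk is not evaluated when the vignette
is built.

<<Single thread sketch, eval=FALSE>>=
library(xgboost)
data(agaricus.train, package = "xgboost")
dsmall <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# nthread = 1 restricts training to a single thread.
param1 <- list(max_depth = 2, eta = 1, objective = "binary:logistic",
               nthread = 1)
# Time the training call; nrounds = 10 is an arbitrary illustrative value.
t1 <- system.time(bst1 <- xgb.train(param1, dsmall, nrounds = 10))
print(t1["elapsed"])
@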

Meanwhile, the result from \verb@xgboost@ reaches
\href{http://www.kaggle.com/c/higgs-boson/details/evaluation}{3.60@AMS} with a
single model. This result ranks in the
\href{http://www.kaggle.com/c/higgs-boson/leaderboard}{top 30\%} of the
competition.

\bibliographystyle{jss}
\nocite{*} % list uncited references
\bibliography{xgboost}

\end{document}

<<Temp file cleaning, include=FALSE>>=
file.remove("xgb.DMatrix")
file.remove("model.dump")
file.remove("model.save")
@


