Alien-XGBoost


xgboost/R-package/man/xgb.cv.Rd

\item{label}{vector of response values. Should be provided only when \code{data} is an R matrix.}

\item{missing}{only used when the input is a dense matrix. By default it is set to \code{NA}, 
which means that \code{NA} values should be treated as 'missing' by the algorithm. 
Sometimes a 0 or another extreme value might be used to represent missing values.}

\item{prediction}{A logical value indicating whether to return the test fold predictions 
from each CV model. This parameter engages the \code{\link{cb.cv.predict}} callback.}

\item{showsd}{\code{boolean}, whether to show the standard deviation of the cross validation results}

\item{metrics}{list of evaluation metrics to be used in cross validation.
  When not specified, the evaluation metric is chosen according to the objective function.
  Possible options are:
\itemize{
  \item \code{error} binary classification error rate
  \item \code{rmse} Root mean square error
  \item \code{logloss} negative log-likelihood function
  \item \code{auc} Area under curve
  \item \code{merror} Exact matching error, used to evaluate multi-class classification
}}

\item{obj}{customized objective function. Given the current predictions and \code{dtrain}, 
it returns the gradient and second-order gradient.}
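
A minimal sketch of such a function, assuming a binary logistic objective (the name 
\code{logregobj} and its body are illustrative of the shape \code{xgb.cv} expects):
\preformatted{
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))  # raw margin -> probability
  grad <- preds - labels          # first-order gradient
  hess <- preds * (1 - preds)     # second-order gradient
  list(grad = grad, hess = hess)
}
}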

\item{feval}{customized evaluation function. Returns 
\code{list(metric='metric-name', value='metric-value')} for the given 
predictions and \code{dtrain}.}
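
A minimal sketch of a customized evaluation function returning that shape (the name 
\code{evalerror} is illustrative):
\preformatted{
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- mean(as.numeric(preds > 0) != labels)  # error rate at margin 0
  list(metric = "custom-error", value = err)
}
}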

\item{stratified}{a \code{boolean} indicating whether sampling of folds should be stratified 
by the values of outcome labels.}

\item{folds}{\code{list} of pre-defined CV folds 
(each element must be a vector of the test fold's indices). When folds are supplied, 
the \code{nfold} and \code{stratified} parameters are ignored.}
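
As an illustration, one way to construct such a list by hand (assuming 100 training rows; 
the fold assignment is arbitrary):
\preformatted{
idx <- sample(seq_len(100))                      # shuffled row indices
folds <- split(idx, rep(1:5, length.out = 100))  # 5 disjoint test folds
# cv <- xgb.cv(data = dtrain, folds = folds, nrounds = 3, ...)
}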

\item{verbose}{\code{boolean}, whether to print statistics during the process}

\item{print_every_n}{Print evaluation messages every \code{n}-th iteration when \code{verbose > 0}.
The default is 1, which means every message is printed. This parameter is passed to the 
\code{\link{cb.print.evaluation}} callback.}

\item{early_stopping_rounds}{If \code{NULL}, the early stopping function is not triggered. 
If set to an integer \code{k}, training with a validation set will stop if the performance 
doesn't improve for \code{k} rounds.
Setting this parameter engages the \code{\link{cb.early.stop}} callback.}
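
A hedged usage sketch (the parameter values are illustrative, with \code{dtrain} as in the 
examples below):
\preformatted{
cv <- xgb.cv(data = dtrain, nrounds = 100, nfold = 5, metrics = "auc",
             early_stopping_rounds = 5, maximize = TRUE,
             objective = "binary:logistic", verbose = FALSE)
cv$best_iteration  # stops well before 100 rounds if AUC plateaus
}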

\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set,
then this parameter must be set as well.
When it is \code{TRUE}, it means the larger the evaluation score the better.
This parameter is passed to the \code{\link{cb.early.stop}} callback.}

\item{callbacks}{a list of callback functions to perform various tasks during boosting.
See \code{\link{callbacks}}. Some of the callbacks are automatically created depending on the 
parameters' values. Users can provide either existing or their own callback methods in order 
to customize the training process.}

\item{...}{other parameters to pass to \code{params}.}
}
\value{
An object of class \code{xgb.cv.synchronous} with the following elements:
\itemize{
  \item \code{call} the function call.
  \item \code{params} parameters that were passed to the xgboost library. Note that it does not 
        capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
  \item \code{callbacks} callback functions that were either automatically assigned or 
        explicitly passed.
  \item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
        first column corresponding to iteration number and the rest corresponding to the 
        CV-based evaluation means and standard deviations for the training and test CV-sets.
        It is created by the \code{\link{cb.evaluation.log}} callback.
  \item \code{niter} number of boosting iterations.
  \item \code{folds} the list of CV folds' indices - either those passed through the \code{folds} 
        parameter or randomly generated.
  \item \code{best_iteration} iteration number with the best evaluation metric value
        (only available with early stopping).
  \item \code{best_ntreelimit} the \code{ntreelimit} value corresponding to the best iteration, 
        which could further be used in the \code{predict} method
        (only available with early stopping).
  \item \code{pred} CV prediction values available when \code{prediction} is set. 
        It is either a vector or a matrix (see \code{\link{cb.cv.predict}}).
  \item \code{models} a list of the CV folds' models. It is only available with the explicit 
        setting of the \code{cb.cv.predict(save_models = TRUE)} callback.
}
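
For instance, a short sketch of inspecting these elements (assuming \code{cv} was created 
with early stopping enabled and \code{prediction = TRUE}):
\preformatted{
cv$evaluation_log  # per-iteration CV means and standard deviations
cv$best_iteration  # only present when early stopping was used
head(cv$pred)      # out-of-fold predictions from cb.cv.predict
str(cv$folds)      # the test folds' index vectors
}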
}
\description{
The cross validation function of xgboost
}
\details{
The original sample is randomly partitioned into \code{nfold} equally sized subsamples. 

Of the \code{nfold} subsamples, a single subsample is retained as the validation data for testing the model, and the remaining \code{nfold - 1} subsamples are used as training data. 

The cross-validation process is then repeated \code{nfold} times, with each of the \code{nfold} subsamples used exactly once as the validation data.

All observations are used for both training and validation.

Adapted from \url{http://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29#k-fold_cross-validation}
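
A tiny base-R sketch of this partitioning property (the values of \code{n} and \code{k} are 
illustrative):
\preformatted{
n <- 10; k <- 5
folds <- split(sample(seq_len(n)), rep(seq_len(k), length.out = n))
sort(unlist(folds))  # 1..10: each observation appears in exactly one test fold
}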
}
\examples{
data(agaricus.train, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
cv <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5, metrics = list("rmse","auc"),
             max_depth = 3, eta = 1, objective = "binary:logistic")
print(cv)
print(cv, verbose=TRUE)
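
# A hedged extension of the example above: prediction = TRUE keeps each fold's
# out-of-fold predictions via the cb.cv.predict callback.
cv2 <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5,
              max_depth = 3, eta = 1, objective = "binary:logistic",
              prediction = TRUE)
head(cv2$pred)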

}
