Alien-XGBoost


xgboost/R-package/man/xgb.train.Rd

\title{eXtreme Gradient Boosting Training}
\usage{
xgb.train(params = list(), data, nrounds, watchlist = list(), obj = NULL,
  feval = NULL, verbose = 1, print_every_n = 1L,
  early_stopping_rounds = NULL, maximize = NULL, save_period = NULL,
  save_name = "xgboost.model", xgb_model = NULL, callbacks = list(), ...)

xgboost(data = NULL, label = NULL, missing = NA, weight = NULL,
  params = list(), nrounds, verbose = 1, print_every_n = 1L,
  early_stopping_rounds = NULL, maximize = NULL, save_period = NULL,
  save_name = "xgboost.model", xgb_model = NULL, callbacks = list(), ...)
}
\arguments{
\item{params}{the list of parameters. 
       The complete list of parameters is available at \url{http://xgboost.readthedocs.io/en/latest/parameter.html}.
       Below is a shorter summary:

1. General Parameters

\itemize{
  \item \code{booster} which booster to use, can be \code{gbtree} or \code{gblinear}. Default: \code{gbtree}.
}
 
2. Booster Parameters

2.1. Parameters for Tree Booster

\itemize{
  \item \code{eta} control the learning rate: scale the contribution of each tree by a factor of \code{0 < eta < 1} when it is added to the current approximation. Used to prevent overfitting by making the boosting process more conservative. Lower val...
  \item \code{gamma} minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the value, the more conservative the algorithm will be.
  \item \code{max_depth} maximum depth of a tree. Default: 6
  \item \code{min_child_weight} minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further p...
  \item \code{subsample} subsample ratio of the training instances. Setting it to 0.5 means that xgboost randomly samples half of the data instances to grow trees, which helps prevent overfitting. It makes computation shorter (because less data to ...
  \item \code{colsample_bytree} subsample ratio of columns when constructing each tree. Default: 1
  \item \code{num_parallel_tree} Experimental parameter: the number of trees to grow per round. Useful for testing Random Forest through XGBoost (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{nrounds = 1} accordingly). Default: 1
  \item \code{monotone_constraints} A numerical vector consisting of \code{1}, \code{0} and \code{-1}, with length equal to the number of features in the training data. \code{1} means increasing, \code{-1} means decreasing and \code{0} means no constraint.
}

2.2. Parameters for Linear Booster
 
\itemize{
  \item \code{lambda} L2 regularization term on weights. Default: 0
  \item \code{lambda_bias} L2 regularization term on bias. Default: 0
  \item \code{alpha} L1 regularization term on weights (there is no L1 regularization on the bias because it is not important). Default: 0
}

3. Task Parameters 

\itemize{
\item \code{objective} specifies the learning task and the corresponding learning objective. Users can also pass a self-defined function to it. The built-in objective options are listed below:
  \itemize{
    \item \code{reg:linear} linear regression (Default).
    \item \code{reg:logistic} logistic regression.
    \item \code{binary:logistic} logistic regression for binary classification. Output probability.
    \item \code{binary:logitraw} logistic regression for binary classification, output score before logistic transformation.
    \item \code{num_class} set the number of classes. To use only with multiclass objectives.
    \item \code{multi:softmax} set xgboost to do multiclass classification using the softmax objective. Class is represented by a number and should be from 0 to \code{num_class - 1}.
    \item \code{multi:softprob} same as softmax, but prediction outputs a vector of ndata * nclass elements, which can be further reshaped to ndata, nclass matrix. The result contains predicted probabilities of each data point belonging to each class...
    \item \code{rank:pairwise} set xgboost to do ranking task by minimizing the pairwise loss.
  }
  \item \code{base_score} the initial prediction score of all instances, global bias. Default: 0.5
  \item \code{eval_metric} evaluation metrics for validation data. Users can pass a self-defined function to it. Default: metric will be assigned according to objective (rmse for regression, and error for classification, mean average precision for ran...
}}
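
As a rough illustration of the summary above, parameters are collected in a plain named list. Every value below is an arbitrary example rather than a recommendation, and \code{flat_pred} is a made-up stand-in for model output. The reshaping step assumes that the flat \code{multi:softprob} vector is grouped one data point at a time; that ordering should be checked against the installed version.

```r
# Illustrative parameter list; every value here is an arbitrary example.
params <- list(
  booster = "gbtree",
  eta = 0.1,
  max_depth = 6,
  subsample = 0.8,
  colsample_bytree = 0.8,
  objective = "multi:softprob",
  num_class = 3
)

# multi:softprob yields ndata * nclass values in one flat vector.
# 'flat_pred' is a hypothetical stand-in for such output (2 points, 3 classes).
flat_pred <- c(0.7, 0.2, 0.1,
               0.1, 0.3, 0.6)

# Assumption: values are grouped per data point, so fill the matrix by row.
prob <- matrix(flat_pred, ncol = params$num_class, byrow = TRUE)

# Most likely class per row, shifted to the 0-based labels described above.
pred_class <- max.col(prob) - 1
```

Each row of \code{prob} then holds the class probabilities of one data point.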

\item{data}{training dataset. \code{xgb.train} accepts only an \code{xgb.DMatrix} as the input.
\code{xgboost} additionally accepts a \code{matrix}, a \code{dgCMatrix}, or the name of a local data file.}

\item{nrounds}{maximum number of boosting iterations.}

\item{watchlist}{named list of xgb.DMatrix datasets to use for evaluating model performance.
Metrics specified in either \code{eval_metric} or \code{feval} will be computed for each
of these datasets during each boosting iteration, and stored in the end as a field named 
\code{evaluation_log} in the resulting object. When either \code{verbose>=1} or 
\code{\link{cb.print.evaluation}} callback is engaged, the performance results are continuously
printed out during the training. 
E.g., specifying \code{watchlist=list(validation1=mat1, validation2=mat2)} makes it possible to track
the performance of each round's model on mat1 and mat2.}

\item{obj}{customized objective function. Given the predictions and dtrain, it returns the
gradient and the second-order gradient.}
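
A minimal sketch of such a function for the logistic loss, under stated assumptions: the names \code{logistic_grad_hess} and \code{logregobj} are hypothetical, the predictions are taken to be raw margin scores, the labels are 0/1, and the result is returned as a list with elements \code{grad} and \code{hess}. The wrapper relies on \code{getinfo} from the xgboost package to read the labels.

```r
# Gradient and second-order gradient (hessian) of the logistic loss,
# computed from raw margin scores and 0/1 labels.
logistic_grad_hess <- function(margin, labels) {
  prob <- 1 / (1 + exp(-margin))  # sigmoid turns margins into probabilities
  list(grad = prob - labels, hess = prob * (1 - prob))
}

# Wrapper with the (preds, dtrain) signature described for 'obj'.
# Assumes xgboost's getinfo() is available to read labels from dtrain.
logregobj <- function(preds, dtrain) {
  labels <- xgboost::getinfo(dtrain, "label")
  logistic_grad_hess(preds, labels)
}
```

Keeping the arithmetic in a separate helper makes it possible to check the gradient on plain vectors before passing the wrapper as \code{obj}.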

\item{feval}{customized evaluation function. Given the predictions and dtrain, it returns
\code{list(metric='metric-name', value='metric-value')}.}
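
A minimal sketch of an evaluation function that reports the classification error rate. The names \code{error_rate} and \code{evalerror} are hypothetical, and the default cutoff of 0.5 assumes the predictions are probabilities; with raw margin scores a cutoff of 0 would be the natural choice.

```r
# Fraction of misclassified points, thresholding predictions at 'cutoff'.
error_rate <- function(preds, labels, cutoff = 0.5) {
  mean(as.numeric(preds > cutoff) != labels)
}

# Wrapper with the (preds, dtrain) signature described for 'feval'.
# Assumes xgboost's getinfo() is available to read labels from dtrain.
evalerror <- function(preds, dtrain) {
  labels <- xgboost::getinfo(dtrain, "label")
  list(metric = "error", value = error_rate(preds, labels))
}
```

Because a lower error is better, a metric like this one would be paired with \code{maximize = FALSE} when early stopping is used.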

\item{verbose}{If 0, xgboost will stay silent. If 1, it will print information about performance.
If 2, some additional information will be printed out.
Note that setting \code{verbose > 0} automatically engages the 
\code{cb.print.evaluation(period=1)} callback function.}

\item{print_every_n}{Print evaluation messages for every n-th iteration when \code{verbose>0}.
Default is 1 which means all messages are printed. This parameter is passed to the 
\code{\link{cb.print.evaluation}} callback.}

\item{early_stopping_rounds}{If \code{NULL}, the early stopping function is not triggered. 
If set to an integer \code{k}, training with a validation set will stop if the performance 
doesn't improve for \code{k} rounds.
Setting this parameter engages the \code{\link{cb.early.stop}} callback.}

\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set,
then this parameter must be set as well.
When it is \code{TRUE}, it means the larger the evaluation score the better.
This parameter is passed to the \code{\link{cb.early.stop}} callback.}
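
The interaction between \code{early_stopping_rounds} and \code{maximize} can be illustrated with a small base R sketch. It mirrors the rule as described above (stop once the score has not improved for \code{k} rounds) and is not the code of \code{cb.early.stop}; the function name \code{early_stop_iteration} and the score vector are made up for the example.

```r
# Returns the round at which training would stop, given per-round
# validation scores, patience k, and the direction of improvement.
early_stop_iteration <- function(scores, k, maximize = FALSE) {
  best_score <- if (maximize) -Inf else Inf
  best_iter <- 0
  for (i in seq_along(scores)) {
    improved <- if (maximize) scores[i] > best_score else scores[i] < best_score
    if (improved) {
      best_score <- scores[i]
      best_iter <- i
    } else if (i - best_iter >= k) {
      # k rounds have passed without improvement: stop here.
      return(list(stopped_at = i, best_iteration = best_iter))
    }
  }
  # The patience was never exhausted, so all rounds were run.
  list(stopped_at = length(scores), best_iteration = best_iter)
}

# An error metric (lower is better): the best value occurs at round 3.
res <- early_stop_iteration(c(0.30, 0.25, 0.20, 0.22, 0.21, 0.19), k = 2)
```

Here \code{res$stopped_at} is 5 and \code{res$best_iteration} is 3, so the improvement at round 6 is never reached.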

\item{save_period}{when it is non-NULL, the model is saved to disk after every \code{save_period} rounds;
0 means save at the end. The saving is handled by the \code{\link{cb.save.model}} callback.}

\item{save_name}{the name or path for periodically saved model file.}

\item{xgb_model}{a previously built model to continue the training from.
Could be either an object of class \code{xgb.Booster}, or its raw data, or the name of a 
file with a previously saved model.}

\item{callbacks}{a list of callback functions to perform various tasks during boosting.
See \code{\link{callbacks}}. Some of the callbacks are automatically created depending on the 
parameters' values. Users can provide either existing or their own callback methods in order 
to customize the training process.}

\item{...}{other parameters to pass to \code{params}.}

\item{label}{vector of response values. Should not be provided when data is 


