Alien-XGBoost


MANIFEST

xgboost/CMakeLists.txt
xgboost/CONTRIBUTORS.md
xgboost/ISSUE_TEMPLATE.md
xgboost/Jenkinsfile
xgboost/LICENSE
xgboost/Makefile
xgboost/NEWS.md
xgboost/R-package/DESCRIPTION
xgboost/R-package/LICENSE
xgboost/R-package/NAMESPACE
xgboost/R-package/R/callbacks.R
xgboost/R-package/R/utils.R
xgboost/R-package/R/xgb.Booster.R
xgboost/R-package/R/xgb.DMatrix.R
xgboost/R-package/R/xgb.DMatrix.save.R
xgboost/R-package/R/xgb.create.features.R
xgboost/R-package/R/xgb.cv.R
xgboost/R-package/R/xgb.dump.R
xgboost/R-package/R/xgb.ggplot.R
xgboost/R-package/R/xgb.importance.R
xgboost/R-package/R/xgb.load.R

MANIFEST

xgboost/R-package/demo/custom_objective.R
xgboost/R-package/demo/early_stopping.R
xgboost/R-package/demo/generalized_linear_model.R
xgboost/R-package/demo/poisson_regression.R
xgboost/R-package/demo/predict_first_ntree.R
xgboost/R-package/demo/predict_leaf_indices.R
xgboost/R-package/demo/runall.R
xgboost/R-package/demo/tweedie_regression.R
xgboost/R-package/man/agaricus.test.Rd
xgboost/R-package/man/agaricus.train.Rd
xgboost/R-package/man/callbacks.Rd
xgboost/R-package/man/cb.cv.predict.Rd
xgboost/R-package/man/cb.early.stop.Rd
xgboost/R-package/man/cb.evaluation.log.Rd
xgboost/R-package/man/cb.print.evaluation.Rd
xgboost/R-package/man/cb.reset.parameters.Rd
xgboost/R-package/man/cb.save.model.Rd
xgboost/R-package/man/dim.xgb.DMatrix.Rd
xgboost/R-package/man/dimnames.xgb.DMatrix.Rd
xgboost/R-package/man/getinfo.Rd
xgboost/R-package/man/predict.xgb.Booster.Rd

MANIFEST

xgboost/R-package/man/xgboost-deprecated.Rd
xgboost/R-package/src/Makevars.in
xgboost/R-package/src/Makevars.win
xgboost/R-package/src/init.c
xgboost/R-package/src/xgboost_R.cc
xgboost/R-package/src/xgboost_R.h
xgboost/R-package/src/xgboost_assert.c
xgboost/R-package/src/xgboost_custom.cc
xgboost/R-package/tests/testthat.R
xgboost/R-package/tests/testthat/test_basic.R
xgboost/R-package/tests/testthat/test_callbacks.R
xgboost/R-package/tests/testthat/test_custom_objective.R
xgboost/R-package/tests/testthat/test_dmatrix.R
xgboost/R-package/tests/testthat/test_gc_safety.R
xgboost/R-package/tests/testthat/test_glm.R
xgboost/R-package/tests/testthat/test_helpers.R
xgboost/R-package/tests/testthat/test_lint.R
xgboost/R-package/tests/testthat/test_monotone.R
xgboost/R-package/tests/testthat/test_parameter_exposure.R
xgboost/R-package/tests/testthat/test_poisson_regression.R
xgboost/R-package/tests/testthat/test_update.R

xgboost/R-package/R/callbacks.R

#' 
#' When a callback function has a \code{finalize} parameter, its finalizer part will also be run after 
#' the boosting is completed.
#' 
#' WARNING: side-effects!!! Be aware that these callback functions access and modify things in 
#' the environment from which they are called, which is a fairly uncommon thing to do in R.
#' 
#' To write a custom callback closure, make sure you first understand the main concepts about R environments.
#' Check either R documentation on \code{\link[base]{environment}} or the 
#' \href{http://adv-r.had.co.nz/Environments.html}{Environments chapter} from the "Advanced R" 
#' book by Hadley Wickham. Further, the best option is to read the code of some of the existing callbacks -
#' choose ones that do something similar to what you want to achieve. Also, you would need to get familiar 
#' with the objects available inside of the \code{xgb.train} and \code{xgb.cv} internal environments.
#' 
#' @seealso
#' \code{\link{cb.print.evaluation}},
#' \code{\link{cb.evaluation.log}},
#' \code{\link{cb.reset.parameters}},
#' \code{\link{cb.early.stop}},
#' \code{\link{cb.save.model}},
#' \code{\link{cb.cv.predict}},
#' \code{\link{xgb.train}},
#' \code{\link{xgb.cv}}
#' 
#' @name callbacks
NULL
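
# For illustration only: a minimal sketch of a custom callback closure.
# 'cb.log.iteration' is a hypothetical name (not part of this package); it
# assumes the 'iteration' and 'end_iteration' values that, per the notes
# above, are set in the calling frame of xgb.train and xgb.cv.
cb.log.iteration <- function() {
  callback <- function(env = parent.frame()) {
    cat(sprintf("iteration %d of %d\n", env$iteration, env$end_iteration))
  }
  attr(callback, 'call') <- match.call()
  attr(callback, 'name') <- 'cb.log.iteration'
  callback
}
# usage sketch: xgb.train(params, dtrain, nrounds,
#                         callbacks = list(cb.log.iteration()))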

#
# Callbacks -------------------------------------------------------------------
# 

#' Callback closure for printing the result of evaluation
#' 
#' @param period  results are printed every \code{period} iterations
#' @param showsd  whether standard deviations should be printed (when available)

xgboost/R-package/R/callbacks.R

#' The callback function prints the result of evaluation every \code{period} iterations.
#' The initial and the last iteration's evaluations are always printed.
#' 
#' Callback function expects the following values to be set in its calling frame:
#' \code{bst_evaluation} (also \code{bst_evaluation_err} when available),
#' \code{iteration},
#' \code{begin_iteration},
#' \code{end_iteration}.
#' 
#' @seealso
#' \code{\link{callbacks}}
#' 
#' @export
cb.print.evaluation <- function(period = 1, showsd = TRUE) {
  
  callback <- function(env = parent.frame()) {
    if (length(env$bst_evaluation) == 0 ||
        period == 0 ||
        NVL(env$rank, 0) != 0 )
      return()
    

xgboost/R-package/R/callbacks.R

#' 
#' Note: in the column names of the final data.table, the dash '-' character is replaced with 
#' the underscore '_' in order to make the column names more like regular R identifiers.
#' 
#' Callback function expects the following values to be set in its calling frame:
#' \code{evaluation_log},
#' \code{bst_evaluation},
#' \code{iteration}.
#' 
#' @seealso
#' \code{\link{callbacks}}
#' 
#' @export
cb.evaluation.log <- function() {

  mnames <- NULL
  
  init <- function(env) {
    if (!is.list(env$evaluation_log))
      stop("'evaluation_log' has to be a list")
    mnames <<- names(env$bst_evaluation)

xgboost/R-package/R/callbacks.R

#' reset a parameter value, the \code{nround} argument in this function would be the 
#' number of boosting rounds in the current training.
#'
#' Callback function expects the following values to be set in its calling frame:
#' \code{bst} or \code{bst_folds},
#' \code{iteration},
#' \code{begin_iteration},
#' \code{end_iteration}.
#' 
#' @seealso
#' \code{\link{callbacks}}
#' 
#' @export
cb.reset.parameters <- function(new_params) {

  if (typeof(new_params) != "list") 
    stop("'new_params' must be a list")
  pnames <- gsub("\\.", "_", names(new_params))
  nrounds <- NULL
  
  # run some checks in the beginning

xgboost/R-package/R/callbacks.R

#' \code{stop_condition},
#' \code{bst_evaluation},
#' \code{rank},
#' \code{bst} (or \code{bst_folds} and \code{basket}),
#' \code{iteration},
#' \code{begin_iteration},
#' \code{end_iteration},
#' \code{num_parallel_tree}.
#' 
#' @seealso
#' \code{\link{callbacks}},
#' \code{\link{xgb.attr}}
#' 
#' @export
cb.early.stop <- function(stopping_rounds, maximize = FALSE, 
                          metric_name = NULL, verbose = TRUE) {
  # state variables
  best_iteration <- -1
  best_ntreelimit <- -1
  best_score <- Inf
  best_msg <- NULL

xgboost/R-package/R/callbacks.R

#' @details 
#' This callback function allows saving an xgb-model file, either periodically every \code{save_period} iterations or at the end.
#' 
#' Callback function expects the following values to be set in its calling frame:
#' \code{bst},
#' \code{iteration},
#' \code{begin_iteration},
#' \code{end_iteration}.
#' 
#' @seealso
#' \code{\link{callbacks}}
#' 
#' @export
cb.save.model <- function(save_period = 0, save_name = "xgboost.model") {
  
  if (save_period < 0)
    stop("'save_period' cannot be negative")

  callback <- function(env = parent.frame()) {
    if (is.null(env$bst))
      stop("'save_model' callback requires the 'bst' booster object in its calling frame")

xgboost/R-package/R/callbacks.R

#' Predictions are returned inside of the \code{pred} element, which is either a vector or a matrix,
#' depending on the number of prediction outputs per data row. The order of predictions corresponds 
#' to the order of rows in the original dataset. Note that when a custom \code{folds} list is 
#' provided in \code{xgb.cv}, the predictions would only be returned properly when this list is a 
#' non-overlapping list of k sets of indices, as in a standard k-fold CV. The predictions would not be 
#' meaningful when user-provided folds have overlapping indices as in, e.g., random sampling splits.
#' When some of the indices in the training dataset are not included in user-provided \code{folds},
#' their prediction value would be \code{NA}.
#' 
#' @seealso
#' \code{\link{callbacks}}
#' 
#' @export
cb.cv.predict <- function(save_models = FALSE) {
  
  finalizer <- function(env) {
    if (is.null(env$basket) || is.null(env$bst_folds))
      stop("'cb.cv.predict' callback requires 'basket' and 'bst_folds' lists in its calling frame")
    
    N <- nrow(env$data)
    pred <- 

xgboost/R-package/R/callbacks.R

    if (finalize)
      return(finalizer(env))
  }
  attr(callback, 'call') <- match.call()
  attr(callback, 'name') <- 'cb.cv.predict'
  callback
}
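
# Usage sketch (illustrative, not part of the package source): with
# prediction = TRUE, the out-of-fold predictions are returned in cv$pred,
# ordered as the rows of the training data. Uses the bundled agaricus data.
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
cv <- xgb.cv(params = list(objective = "binary:logistic", max_depth = 2, eta = 1),
             data = dtrain, nrounds = 3, nfold = 5, prediction = TRUE, verbose = 0)
head(cv$pred)  # one out-of-fold prediction per row of dtrain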


#
# Internal utility functions for callbacks ------------------------------------
# 

# Format the evaluation metric string
format.eval.string <- function(iter, eval_res, eval_err = NULL) {
  if (length(eval_res) == 0)
    stop('no evaluation results')
  enames <- names(eval_res)
  if (is.null(enames))
    stop('evaluation results must have names')
  iter <- sprintf('[%d]\t', iter)
  if (!is.null(eval_err)) {
    if (length(eval_res) != length(eval_err))
      stop('eval_res & eval_err lengths mismatch')
    res <- paste0(sprintf("%s:%f+%f", enames, eval_res, eval_err), collapse = '\t')
  } else {
    res <- paste0(sprintf("%s:%f", enames, eval_res), collapse = '\t')
  }
  return(paste0(iter, res))
}
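
# Quick illustration (hypothetical metric values) of the strings produced above:
#   format.eval.string(3, c(train_error = 0.02, test_error = 0.05))
#   ## "[3]\ttrain_error:0.020000\ttest_error:0.050000"
#   format.eval.string(3, c(test_error = 0.05), eval_err = 0.004)
#   ## "[3]\ttest_error:0.050000+0.004000"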

# Extract callback names from the list of callbacks
callback.names <- function(cb_list) {
  unlist(lapply(cb_list, function(x) attr(x, 'name')))
}

# Extract callback calls from the list of callbacks
callback.calls <- function(cb_list) {
  unlist(lapply(cb_list, function(x) attr(x, 'call')))
}

# Add a callback cb to the list and make sure that 
# cb.early.stop and cb.cv.predict are at the end of the list
# with cb.cv.predict being the last (when present)
add.cb <- function(cb_list, cb) {
  cb_list <- c(cb_list, cb)
  names(cb_list) <- callback.names(cb_list)

xgboost/R-package/R/callbacks.R

    # this removes only the first one
    cb_list['cb.early.stop'] <- NULL 
  }
  if ('cb.cv.predict' %in% names(cb_list)) {
    cb_list <- c(cb_list, cb_list['cb.cv.predict'])
    cb_list['cb.cv.predict'] <- NULL 
  }
  cb_list
}
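
# Illustration (hypothetical list): add.cb keeps cb.cv.predict at the tail,
# so it runs after all other callbacks:
#   cbs <- add.cb(list(cb.cv.predict(), cb.print.evaluation()), cb.evaluation.log())
#   names(cbs)
#   ## "cb.print.evaluation" "cb.evaluation.log" "cb.cv.predict"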

# Sort callbacks list into categories
categorize.callbacks <- function(cb_list) {
  list(
    pre_iter = Filter(function(x) {
        pre <- attr(x, 'is_pre_iteration')
        !is.null(pre) && pre 
      }, cb_list),
    post_iter = Filter(function(x) {
        pre <- attr(x, 'is_pre_iteration')
        is.null(pre) || !pre
      }, cb_list),
    finalize = Filter(function(x) {
        'finalize' %in% names(formals(x))
      }, cb_list)
  )
}
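
# Illustration (hypothetical callback): the 'is_pre_iteration' attribute routes
# a callback into the 'pre_iter' group, and any callback with a 'finalize'
# formal argument additionally lands in the 'finalize' group.
cb.noop <- function(env = parent.frame(), finalize = FALSE) invisible(NULL)
attr(cb.noop, 'is_pre_iteration') <- TRUE
attr(cb.noop, 'name') <- 'cb.noop'
cats <- categorize.callbacks(list(cb.noop = cb.noop))
names(cats$pre_iter)  # "cb.noop"
names(cats$finalize)  # "cb.noop"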

# Check whether all callback functions with names given by 'query_names' are present in the 'cb_list'.
has.callbacks <- function(cb_list, query_names) {
  if (length(cb_list) < length(query_names))
    return(FALSE)
  if (!is.list(cb_list) ||
      any(sapply(cb_list, class) != 'function')) {
    stop('`cb_list` must be a list of callback functions')
  }
  cb_names <- callback.names(cb_list)
  if (!is.character(cb_names) ||
      length(cb_names) != length(cb_list) ||
      any(cb_names == "")) {
    stop('All callbacks in the `cb_list` must have a non-empty `name` attribute')
  }
  if (!is.character(query_names) ||
      length(query_names) == 0 ||
      any(query_names == "")) {
    stop('query_names must be a non-empty vector of non-empty character names')
  }
  return(all(query_names %in% cb_names))
}

xgboost/R-package/R/utils.R

  if (!is.null(env$params[['eval_metric']]) &&
      typeof(env$params$eval_metric) == 'closure') {
    env$feval <- env$params$eval_metric
    env$params$eval_metric <- NULL
  }
  
  # require maximize to be set when custom feval and early stopping are used together
  if (!is.null(env$feval) &&
      is.null(env$maximize) && (
        !is.null(env$early_stopping_rounds) || 
        has.callbacks(env$callbacks, 'cb.early.stop')))
    stop("Please set 'maximize' to indicate whether the evaluation metric needs to be maximized or not")
}
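
# Sketch of the case handled above (illustrative only): a custom metric
# closure is moved from params$eval_metric into 'feval', and combining it
# with early stopping then requires an explicit 'maximize'. The hypothetical
# 'evalerror' below thresholds raw margin predictions at 0.
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- mean(as.numeric(preds > 0) != labels)
  list(metric = "error", value = err)
}
# bst <- xgb.train(params, dtrain, nrounds = 10, watchlist = list(train = dtrain),
#                  feval = evalerror, early_stopping_rounds = 3, maximize = FALSE)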


# Update a booster handle for an iteration with dtrain data
xgb.iter.update <- function(booster_handle, dtrain, iter, obj = NULL) {
  if (!identical(class(booster_handle), "xgb.Booster.handle")) {
    stop("booster_handle must be of xgb.Booster.handle class")
  }
  if (!inherits(dtrain, "xgb.DMatrix")) {

xgboost/R-package/R/xgb.Booster.R

    cat('xgb.attributes:\n')
    if (verbose) {
      cat( paste(paste0('  ',names(attrs)),
                 paste0('"', unlist(attrs), '"'),
                 sep = ' = ', collapse = '\n'), '\n', sep = '')
    } else {
      cat('  ', paste(names(attrs), collapse = ', '), '\n', sep = '')
    }
  }
  
  if (!is.null(x$callbacks) && length(x$callbacks) > 0) {
    cat('callbacks:\n')
    lapply(callback.calls(x$callbacks), function(x) {
      cat('  ')
      print(x)
    })
  }
  
  if (!is.null(x$feature_names))
    cat('# of features:', length(x$feature_names), '\n')
  
  cat('niter: ', x$niter, '\n', sep = '')
  # TODO: uncomment when faster xgb.ntree is implemented
  #cat('ntree: ', xgb.ntree(x), '\n', sep='')
  
  for (n in setdiff(names(x), c('handle', 'raw', 'call', 'params', 'callbacks',
                                'evaluation_log','niter','feature_names'))) {
    if (is.atomic(x[[n]])) {
      cat(n, ':', x[[n]], '\n', sep = ' ')
    } else {
      cat(n, ':\n\t', sep = ' ')
      print(x[[n]])
    }
  }
  
  if (!is.null(x$evaluation_log)) {

xgboost/R-package/R/xgb.cv.R

#'        Default is 1 which means all messages are printed. This parameter is passed to the 
#'        \code{\link{cb.print.evaluation}} callback.
#' @param early_stopping_rounds If \code{NULL}, the early stopping function is not triggered. 
#'        If set to an integer \code{k}, training with a validation set will stop if the performance 
#'        doesn't improve for \code{k} rounds.
#'        Setting this parameter engages the \code{\link{cb.early.stop}} callback.
#' @param maximize If \code{feval} and \code{early_stopping_rounds} are set,
#'        then this parameter must be set as well.
#'        When it is \code{TRUE}, it means the larger the evaluation score the better.
#'        This parameter is passed to the \code{\link{cb.early.stop}} callback.
#' @param callbacks a list of callback functions to perform various tasks during boosting.
#'        See \code{\link{callbacks}}. Some of the callbacks are automatically created depending on the 
#'        parameters' values. Users can provide either existing or their own callback methods in order 
#'        to customize the training process.
#' @param ... other parameters to pass to \code{params}.
#' 
#' @details 
#' The original sample is randomly partitioned into \code{nfold} equal size subsamples. 
#' 
#' Of the \code{nfold} subsamples, a single subsample is retained as the validation data for testing the model, and the remaining \code{nfold - 1} subsamples are used as training data. 
#' 
#' The cross-validation process is then repeated \code{nfold} times, with each of the \code{nfold} subsamples used exactly once as the validation data.

xgboost/R-package/R/xgb.cv.R

#' All observations are used for both training and validation.
#' 
#' Adapted from \url{http://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29#k-fold_cross-validation}
#'
#' @return 
#' An object of class \code{xgb.cv.synchronous} with the following elements:
#' \itemize{
#'   \item \code{call} a function call.
#'   \item \code{params} parameters that were passed to the xgboost library. Note that it does not 
#'         capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
#'   \item \code{callbacks} callback functions that were either automatically assigned or 
#'         explicitly passed.
#'   \item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
#'         first column corresponding to iteration number and the rest corresponding to the 
#'         CV-based evaluation means and standard deviations for the training and test CV-sets.
#'         It is created by the \code{\link{cb.evaluation.log}} callback.
#'   \item \code{niter} number of boosting iterations.
#'   \item \code{folds} the list of CV folds' indices - either those passed through the \code{folds} 
#'         parameter or randomly generated.
#'   \item \code{best_iteration} iteration number with the best evaluation metric value
#'         (only available with early stopping).

xgboost/R-package/R/xgb.cv.R

#' cv <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5, metrics = list("rmse","auc"),
#'                   max_depth = 3, eta = 1, objective = "binary:logistic")
#' print(cv)
#' print(cv, verbose=TRUE)
#' 
#' @export
xgb.cv <- function(params=list(), data, nrounds, nfold, label = NULL, missing = NA,
                   prediction = FALSE, showsd = TRUE, metrics=list(),
                   obj = NULL, feval = NULL, stratified = TRUE, folds = NULL, 
                   verbose = TRUE, print_every_n=1L,
                   early_stopping_rounds = NULL, maximize = NULL, callbacks = list(), ...) {

  check.deprecation(...)
  
  params <- check.booster.params(params, ...)
  # TODO: should we deprecate the redundant 'metrics' parameter?
  for (m in metrics)
    params <- c(params, list("eval_metric" = m))
  
  check.custom.obj()
  check.custom.eval()

xgboost/R-package/R/xgb.cv.R

    folds <- generate.cv.folds(nfold, nrow(data), stratified, label, params)
  }
  
  # Potential TODO: sequential CV
  #if (strategy == 'sequential')
  #  stop('Sequential CV strategy is not yet implemented')

  # verbosity & evaluation printing callback:
  params <- c(params, list(silent = 1))
  print_every_n <- max( as.integer(print_every_n), 1L)
  if (!has.callbacks(callbacks, 'cb.print.evaluation') && verbose) {
    callbacks <- add.cb(callbacks, cb.print.evaluation(print_every_n, showsd = showsd))
  }
  # evaluation log callback: always on in CV
  evaluation_log <- list()
  if (!has.callbacks(callbacks, 'cb.evaluation.log')) {
    callbacks <- add.cb(callbacks, cb.evaluation.log())
  }
  # Early stopping callback
  stop_condition <- FALSE
  if (!is.null(early_stopping_rounds) &&
      !has.callbacks(callbacks, 'cb.early.stop')) {
    callbacks <- add.cb(callbacks, cb.early.stop(early_stopping_rounds, 
                                                 maximize = maximize, verbose = verbose))
  }
  # CV-predictions callback
  if (prediction &&
      !has.callbacks(callbacks, 'cb.cv.predict')) {
    callbacks <- add.cb(callbacks, cb.cv.predict(save_models = FALSE))
  }
  # Sort the callbacks into categories
  cb <- categorize.callbacks(callbacks)

  
  # create the booster-folds
  dall <- xgb.get.DMatrix(data, label, missing)
  bst_folds <- lapply(seq_along(folds), function(k) {
    dtest  <- slice(dall, folds[[k]])
    dtrain <- slice(dall, unlist(folds[-k]))
    handle <- xgb.Booster.handle(params, list(dtrain, dtest))
    list(dtrain = dtrain, bst = handle, watchlist = list(train = dtrain, test=dtest), index = folds[[k]])
  })
  # a "basket" to collect some results from callbacks
  basket <- list()

  # extract parameters that can affect the relationship b/w #trees and #iterations
  num_class <- max(as.numeric(NVL(params[['num_class']], 1)), 1)
  num_parallel_tree <- max(as.numeric(NVL(params[['num_parallel_tree']], 1)), 1)

  # those are fixed for CV (no training continuation)
  begin_iteration <- 1
  end_iteration <- nrounds
  

xgboost/R-package/R/xgb.cv.R

    for (f in cb$post_iter) f()
    
    if (stop_condition) break
  }
  for (f in cb$finalize) f(finalize = TRUE)

  # the CV result
  ret <- list(
    call = match.call(),
    params = params,
    callbacks = callbacks,
    evaluation_log = evaluation_log,
    niter = end_iteration,
    folds = folds
  )
  ret <- c(ret, basket)

  class(ret) <- 'xgb.cv.synchronous'
  invisible(ret)
}
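
# Sketch (illustrative): a custom non-overlapping k-fold split supplied via
# the 'folds' parameter, as required for meaningful 'pred' values in
# cb.cv.predict. Assumes N training rows:
#   set.seed(42)
#   folds <- split(sample(N), rep(1:5, length.out = N))
#   cv <- xgb.cv(params, dtrain, nrounds = 10, folds = folds, prediction = TRUE)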

xgboost/R-package/R/xgb.cv.R

      cat('call:\n  ')
      print(x$call)
    }
    if (!is.null(x$params)) {
      cat('params (as set within xgb.cv):\n')
      cat( '  ', 
           paste(names(x$params), 
                 paste0('"', unlist(x$params), '"'),
                 sep = ' = ', collapse = ', '), '\n', sep = '')
    }
    if (!is.null(x$callbacks) && length(x$callbacks) > 0) {
      cat('callbacks:\n')
      lapply(callback.calls(x$callbacks), function(x) {
        cat('  ')
        print(x)
      })
    }
    
    for (n in c('niter', 'best_iteration', 'best_ntreelimit')) {
      if (is.null(x[[n]])) 
        next
      cat(n, ': ', x[[n]], '\n', sep = '')
    }

xgboost/R-package/R/xgb.train.R

#' @param maximize If \code{feval} and \code{early_stopping_rounds} are set,
#'        then this parameter must be set as well.
#'        When it is \code{TRUE}, it means the larger the evaluation score the better.
#'        This parameter is passed to the \code{\link{cb.early.stop}} callback.
#' @param save_period when it is non-NULL, model is saved to disk after every \code{save_period} rounds,
#'        0 means save at the end. The saving is handled by the \code{\link{cb.save.model}} callback.
#' @param save_name the name or path for periodically saved model file.
#' @param xgb_model a previously built model to continue the training from.
#'        Could be either an object of class \code{xgb.Booster}, or its raw data, or the name of a 
#'        file with a previously saved model.
#' @param callbacks a list of callback functions to perform various tasks during boosting.
#'        See \code{\link{callbacks}}. Some of the callbacks are automatically created depending on the 
#'        parameters' values. Users can provide either existing or their own callback methods in order 
#'        to customize the training process.
#' @param ... other parameters to pass to \code{params}.
#' @param label vector of response values. Should not be provided when data is 
#'        a local data file name or an \code{xgb.DMatrix}.
#' @param missing by default is set to NA, which means that NA values should be considered as 'missing'
#'        by the algorithm. Sometimes, 0 or other extreme value might be used to represent missing values.
#'        This parameter is only used when input is a dense matrix.
#' @param weight a vector indicating the weight for each row of the input.
#' 

xgboost/R-package/R/xgb.train.R

#'      \item \code{logloss} negative log-likelihood. \url{http://en.wikipedia.org/wiki/Log-likelihood}
#'      \item \code{mlogloss} multiclass logloss. \url{https://www.kaggle.com/wiki/MultiClassLogLoss/}
#'      \item \code{error} Binary classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
#'            By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
#'            Different threshold (e.g., 0.) could be specified as "error@0."
#'      \item \code{merror} Multiclass classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
#'      \item \code{auc} Area under the curve. \url{http://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_curve} for ranking evaluation.
#'      \item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{http://en.wikipedia.org/wiki/NDCG}
#'   }
#' 
#' The following callbacks are automatically created when certain parameters are set:
#' \itemize{
#'   \item \code{cb.print.evaluation} is turned on when \code{verbose > 0};
#'         and the \code{print_every_n} parameter is passed to it.
#'   \item \code{cb.evaluation.log} is on when \code{watchlist} is present.
#'   \item \code{cb.early.stop}: when \code{early_stopping_rounds} is set.
#'   \item \code{cb.save.model}: when \code{save_period > 0} is set.
#' }
#' 
#' @return 
#' An object of class \code{xgb.Booster} with the following elements:
#' \itemize{
#'   \item \code{handle} a handle (pointer) to the xgboost model in memory.
#'   \item \code{raw} a cached memory dump of the xgboost model saved as R's \code{raw} type.
#'   \item \code{niter} number of boosting iterations.
#'   \item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
#'         first column corresponding to iteration number and the rest corresponding to evaluation
#'         metrics' values. It is created by the \code{\link{cb.evaluation.log}} callback.
#'   \item \code{call} a function call.
#'   \item \code{params} parameters that were passed to the xgboost library. Note that it does not 
#'         capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
#'   \item \code{callbacks} callback functions that were either automatically assigned or 
#'         explicitly passed.
#'   \item \code{best_iteration} iteration number with the best evaluation metric value
#'         (only available with early stopping).
#'   \item \code{best_ntreelimit} the \code{ntreelimit} value corresponding to the best iteration, 
#'         which could further be used in \code{predict} method
#'         (only available with early stopping).
#'   \item \code{best_score} the best evaluation metric value during early stopping.
#'         (only available with early stopping).
#'   \item \code{feature_names} names of the training dataset features
#'         (only when column names were defined in training data).
#' }
#' 
#' @seealso
#' \code{\link{callbacks}},
#' \code{\link{predict.xgb.Booster}},
#' \code{\link{xgb.cv}}
#' 
#' @examples
#' data(agaricus.train, package='xgboost')
#' data(agaricus.test, package='xgboost')
#' 
#' dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
#' dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
#' watchlist <- list(train = dtrain, eval = dtest)

xgboost/R-package/R/xgb.train.R

#' #  or as dedicated 'obj' and 'feval' parameters of xgb.train:
#' bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
#'                  obj = logregobj, feval = evalerror)
#' 
#' 
#' ## An xgb.train example of using variable learning rates at each iteration:
#' param <- list(max_depth = 2, eta = 1, silent = 1, nthread = 2,
#'               objective = "binary:logistic", eval_metric = "auc")
#' my_etas <- list(eta = c(0.5, 0.1))
#' bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
#'                  callbacks = list(cb.reset.parameters(my_etas)))
#' 
#' ## Early stopping:
#' bst <- xgb.train(param, dtrain, nrounds = 25, watchlist,
#'                  early_stopping_rounds = 3)
#' 
#' ## An 'xgboost' interface example:
#' bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, 
#'                max_depth = 2, eta = 1, nthread = 2, nrounds = 2, 
#'                objective = "binary:logistic")
#' pred <- predict(bst, agaricus.test$data)
#' 
#' @rdname xgb.train
#' @export
xgb.train <- function(params = list(), data, nrounds, watchlist = list(),
                      obj = NULL, feval = NULL, verbose = 1, print_every_n = 1L,
                      early_stopping_rounds = NULL, maximize = NULL,
                      save_period = NULL, save_name = "xgboost.model", 
                      xgb_model = NULL, callbacks = list(), ...) {
  
  check.deprecation(...)
  
  params <- check.booster.params(params, ...)

  check.custom.obj()
  check.custom.eval()
  
  # data & watchlist checks
  dtrain <- data

xgboost/R-package/R/xgb.train.R

        !all(vapply(watchlist, inherits, logical(1), what = 'xgb.DMatrix')))
      stop("watchlist must be a list of xgb.DMatrix elements")
    evnames <- names(watchlist)
    if (is.null(evnames) || any(evnames == ""))
      stop("each element of the watchlist must have a name tag")
  }

  # evaluation printing callback
  params <- c(params, list(silent = ifelse(verbose > 1, 0, 1)))
  print_every_n <- max( as.integer(print_every_n), 1L)
  if (!has.callbacks(callbacks, 'cb.print.evaluation') &&
      verbose) {
    callbacks <- add.cb(callbacks, cb.print.evaluation(print_every_n))
  }
  # evaluation log callback:  it is automatically enabled when watchlist is provided
  evaluation_log <- list()
  if (!has.callbacks(callbacks, 'cb.evaluation.log') &&
      length(watchlist) > 0) {
    callbacks <- add.cb(callbacks, cb.evaluation.log())
  }
  # Model saving callback
  if (!is.null(save_period) &&
      !has.callbacks(callbacks, 'cb.save.model')) {
    callbacks <- add.cb(callbacks, cb.save.model(save_period, save_name))
  }
  # Early stopping callback
  stop_condition <- FALSE
  if (!is.null(early_stopping_rounds) &&
      !has.callbacks(callbacks, 'cb.early.stop')) {
    callbacks <- add.cb(callbacks, cb.early.stop(early_stopping_rounds, 
                                                 maximize = maximize, verbose = verbose))
  }
  # Sort the callbacks into categories
  cb <- categorize.callbacks(callbacks)

  # The tree updating process would need slightly different handling
  is_update <- NVL(params[['process_type']], '.') == 'update'

  # Construct a booster (either a new one or load from xgb_model)
  handle <- xgb.Booster.handle(params, append(watchlist, dtrain), xgb_model)
  bst <- xgb.handleToBooster(handle)

  # extract parameters that can affect the relationship b/w #trees and #iterations
  num_class <- max(as.numeric(NVL(params[['num_class']], 1)), 1)

xgboost/R-package/R/xgb.train.R

        !is.null(xgb_model$evaluation_log) &&
        all.equal(colnames(evaluation_log),
                  colnames(xgb_model$evaluation_log))) {
      evaluation_log <- rbindlist(list(xgb_model$evaluation_log, evaluation_log))
    }
    bst$evaluation_log <- evaluation_log
  }

  bst$call <- match.call()
  bst$params <- params
  bst$callbacks <- callbacks
  if (!is.null(colnames(dtrain)))
    bst$feature_names <- colnames(dtrain)
  
  return(bst)
}

xgboost/R-package/R/xgboost.R

# Simple interface for training an xgboost model that wraps \code{xgb.train}.
# Its documentation is combined with xgb.train.
#
#' @rdname xgb.train
#' @export
xgboost <- function(data = NULL, label = NULL, missing = NA, weight = NULL,
                    params = list(), nrounds,
                    verbose = 1, print_every_n = 1L, 
                    early_stopping_rounds = NULL, maximize = NULL, 
                    save_period = NULL, save_name = "xgboost.model",
                    xgb_model = NULL, callbacks = list(), ...) {

  dtrain <- xgb.get.DMatrix(data, label, missing, weight)

  watchlist <- list(train = dtrain)

  bst <- xgb.train(params, dtrain, nrounds, watchlist, verbose = verbose, print_every_n = print_every_n,
                   early_stopping_rounds = early_stopping_rounds, maximize = maximize,
                   save_period = save_period, save_name = save_name,
                   xgb_model = xgb_model, callbacks = callbacks, ...)
  return(bst)
}

#' Training part from Mushroom Data Set
#' 
#' This data set is originally from the Mushroom data set,
#' UCI Machine Learning Repository.
#' 
#' This data set includes the following fields:
#' 

xgboost/R-package/man/callbacks.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/callbacks.R
\name{callbacks}
\alias{callbacks}
\title{Callback closures for booster training.}
\description{
These are used to perform various service tasks either during boosting iterations or at the end.
This approach helps to modularize many such tasks without bloating the main training methods.
}
\details{
By default, a callback function is run after each boosting iteration.
An R-attribute \code{is_pre_iteration} could be set for a callback to define a pre-iteration function.

When a callback function has a \code{finalize} parameter, its finalizer part will also be run after 
the boosting is completed.

WARNING: side-effects!!! Be aware that these callback functions access and modify things in 
the environment from which they are called, which is a fairly uncommon thing to do in R.

To write a custom callback closure, make sure you first understand the main concepts about R environments.
Check either R documentation on \code{\link[base]{environment}} or the 
\href{http://adv-r.had.co.nz/Environments.html}{Environments chapter} from the "Advanced R" 
book by Hadley Wickham. Further, the best option is to read the code of some of the existing callbacks -
choose ones that do something similar to what you want to achieve. Also, you would need to get familiar 
with the objects available inside of the \code{xgb.train} and \code{xgb.cv} internal environments.
}
\seealso{
\code{\link{cb.print.evaluation}},
\code{\link{cb.evaluation.log}},
\code{\link{cb.reset.parameters}},
\code{\link{cb.early.stop}},
\code{\link{cb.save.model}},
\code{\link{cb.cv.predict}},

xgboost/R-package/man/cb.cv.predict.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/callbacks.R
\name{cb.cv.predict}
\alias{cb.cv.predict}
\title{Callback closure for returning cross-validation based predictions.}
\usage{
cb.cv.predict(save_models = FALSE)
}
\arguments{
\item{save_models}{a flag for whether to save the folds' models.}
}
\value{

xgboost/R-package/man/cb.cv.predict.Rd

Callback function expects the following values to be set in its calling frame:
\code{bst_folds},
\code{basket},
\code{data},
\code{end_iteration},
\code{params},
\code{num_parallel_tree},
\code{num_class}.
}
\seealso{
\code{\link{callbacks}}
}

xgboost/R-package/man/cb.early.stop.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/callbacks.R
\name{cb.early.stop}
\alias{cb.early.stop}
\title{Callback closure to activate the early stopping.}
\usage{
cb.early.stop(stopping_rounds, maximize = FALSE, metric_name = NULL,
  verbose = TRUE)
}
\arguments{
\item{stopping_rounds}{The number of rounds with no improvement in 
the evaluation metric after which training is stopped.}

xgboost/R-package/man/cb.early.stop.Rd

\code{stop_condition},
\code{bst_evaluation},
\code{rank},
\code{bst} (or \code{bst_folds} and \code{basket}),
\code{iteration},
\code{begin_iteration},
\code{end_iteration},
\code{num_parallel_tree}.
}
\seealso{
\code{\link{callbacks}},
\code{\link{xgb.attr}}
}

xgboost/R-package/man/cb.evaluation.log.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/callbacks.R
\name{cb.evaluation.log}
\alias{cb.evaluation.log}
\title{Callback closure for logging the evaluation history}
\usage{
cb.evaluation.log()
}
\description{
Callback closure for logging the evaluation history
}
\details{

xgboost/R-package/man/cb.evaluation.log.Rd


Note: in the column names of the final data.table, the dash '-' character is replaced with 
the underscore '_' in order to make the column names more like regular R identifiers.

Callback function expects the following values to be set in its calling frame:
\code{evaluation_log},
\code{bst_evaluation},
\code{iteration}.
}
\seealso{
\code{\link{callbacks}}
}

xgboost/R-package/man/cb.print.evaluation.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/callbacks.R
\name{cb.print.evaluation}
\alias{cb.print.evaluation}
\title{Callback closure for printing the result of evaluation}
\usage{
cb.print.evaluation(period = 1, showsd = TRUE)
}
\arguments{
\item{period}{results are printed every \code{period} iterations}

\item{showsd}{whether standard deviations should be printed (when available)}

xgboost/R-package/man/cb.print.evaluation.Rd

The callback function prints the result of evaluation every \code{period} iterations.
The initial and the last iteration's evaluations are always printed.

Callback function expects the following values to be set in its calling frame:
\code{bst_evaluation} (also \code{bst_evaluation_err} when available),
\code{iteration},
\code{begin_iteration},
\code{end_iteration}.
}
\seealso{
\code{\link{callbacks}}
}

xgboost/R-package/man/cb.reset.parameters.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/callbacks.R
\name{cb.reset.parameters}
\alias{cb.reset.parameters}
\title{Callback closure for resetting the booster's parameters at each iteration.}
\usage{
cb.reset.parameters(new_params)
}
\arguments{
\item{new_params}{a list where each element corresponds to a parameter that needs to be reset.
Each element's value must be either a vector of values of length \code{nrounds} 
to be set at each iteration, 

xgboost/R-package/man/cb.reset.parameters.Rd

reset a parameter value, the \code{nround} argument in this function would be the 
number of boosting rounds in the current training.

Callback function expects the following values to be set in its calling frame:
\code{bst} or \code{bst_folds},
\code{iteration},
\code{begin_iteration},
\code{end_iteration}.
}
\seealso{
\code{\link{callbacks}}
}

xgboost/R-package/man/cb.save.model.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/callbacks.R
\name{cb.save.model}
\alias{cb.save.model}
\title{Callback closure for saving a model file.}
\usage{
cb.save.model(save_period = 0, save_name = "xgboost.model")
}
\arguments{
\item{save_period}{save the model to disk after every 
\code{save_period} iterations; 0 means save the model at the end.}

xgboost/R-package/man/cb.save.model.Rd

\details{
This callback function allows saving an xgb-model file, either periodically every \code{save_period} iterations or at the end.

Callback function expects the following values to be set in its calling frame:
\code{bst},
\code{iteration},
\code{begin_iteration},
\code{end_iteration}.
}
\seealso{
\code{\link{callbacks}}
}

xgboost/R-package/man/xgb.cv.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.cv.R
\name{xgb.cv}
\alias{xgb.cv}
\title{Cross Validation}
\usage{
xgb.cv(params = list(), data, nrounds, nfold, label = NULL, missing = NA,
  prediction = FALSE, showsd = TRUE, metrics = list(), obj = NULL,
  feval = NULL, stratified = TRUE, folds = NULL, verbose = TRUE,
  print_every_n = 1L, early_stopping_rounds = NULL, maximize = NULL,
  callbacks = list(), ...)
}
\arguments{
\item{params}{the list of parameters. Commonly used ones are:
\itemize{
  \item \code{objective} objective function, common ones are
  \itemize{
    \item \code{reg:linear} linear regression
    \item \code{binary:logistic} logistic regression for classification
  }
  \item \code{eta} step size of each boosting step

xgboost/R-package/man/xgb.cv.Rd

\item{early_stopping_rounds}{If \code{NULL}, the early stopping function is not triggered. 
If set to an integer \code{k}, training with a validation set will stop if the performance 
doesn't improve for \code{k} rounds.
Setting this parameter engages the \code{\link{cb.early.stop}} callback.}

\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set,
then this parameter must be set as well.
When it is \code{TRUE}, it means the larger the evaluation score the better.
This parameter is passed to the \code{\link{cb.early.stop}} callback.}

\item{callbacks}{a list of callback functions to perform various tasks during boosting.
See \code{\link{callbacks}}. Some of the callbacks are automatically created depending on the 
parameters' values. Users can provide either existing or their own callback methods in order 
to customize the training process.}

\item{...}{other parameters to pass to \code{params}.}
}
\value{
An object of class \code{xgb.cv.synchronous} with the following elements:
\itemize{
  \item \code{call} a function call.
  \item \code{params} parameters that were passed to the xgboost library. Note that it does not 
        capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
  \item \code{callbacks} callback functions that were either automatically assigned or 
        explicitly passed.
  \item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
        first column corresponding to iteration number and the rest corresponding to the 
        CV-based evaluation means and standard deviations for the training and test CV-sets.
        It is created by the \code{\link{cb.evaluation.log}} callback.
  \item \code{niter} number of boosting iterations.
  \item \code{folds} the list of CV folds' indices - either those passed through the \code{folds} 
        parameter or randomly generated.
  \item \code{best_iteration} iteration number with the best evaluation metric value
        (only available with early stopping).

xgboost/R-package/man/xgb.train.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.train.R, R/xgboost.R
\name{xgb.train}
\alias{xgb.train}
\alias{xgboost}
\title{eXtreme Gradient Boosting Training}
\usage{
xgb.train(params = list(), data, nrounds, watchlist = list(), obj = NULL,
  feval = NULL, verbose = 1, print_every_n = 1L,
  early_stopping_rounds = NULL, maximize = NULL, save_period = NULL,
  save_name = "xgboost.model", xgb_model = NULL, callbacks = list(), ...)

xgboost(data = NULL, label = NULL, missing = NA, weight = NULL,
  params = list(), nrounds, verbose = 1, print_every_n = 1L,
  early_stopping_rounds = NULL, maximize = NULL, save_period = NULL,
  save_name = "xgboost.model", xgb_model = NULL, callbacks = list(), ...)
}
\arguments{
\item{params}{the list of parameters. 
       The complete list of parameters is available at \url{http://xgboost.readthedocs.io/en/latest/parameter.html}.
       Below is a shorter summary:

1. General Parameters

\itemize{
  \item \code{booster} which booster to use, can be \code{gbtree} or \code{gblinear}. Default: \code{gbtree}.

xgboost/R-package/man/xgb.train.Rd


\item{save_period}{when it is non-NULL, model is saved to disk after every \code{save_period} rounds,
0 means save at the end. The saving is handled by the \code{\link{cb.save.model}} callback.}

\item{save_name}{the name or path for periodically saved model file.}

\item{xgb_model}{a previously built model to continue the training from.
Could be either an object of class \code{xgb.Booster}, or its raw data, or the name of a 
file with a previously saved model.}

\item{callbacks}{a list of callback functions to perform various tasks during boosting.
See \code{\link{callbacks}}. Some of the callbacks are automatically created depending on the 
parameters' values. Users can provide either existing or their own callback methods in order 
to customize the training process.}

\item{...}{other parameters to pass to \code{params}.}

\item{label}{vector of response values. Should not be provided when data is 
a local data file name or an \code{xgb.DMatrix}.}

\item{missing}{by default is set to NA, which means that NA values should be considered as 'missing'
by the algorithm. Sometimes, 0 or other extreme value might be used to represent missing values.

xgboost/R-package/man/xgb.train.Rd

\itemize{
  \item \code{handle} a handle (pointer) to the xgboost model in memory.
  \item \code{raw} a cached memory dump of the xgboost model saved as R's \code{raw} type.
  \item \code{niter} number of boosting iterations.
  \item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
        first column corresponding to iteration number and the rest corresponding to evaluation
        metrics' values. It is created by the \code{\link{cb.evaluation.log}} callback.
  \item \code{call} a function call.
  \item \code{params} parameters that were passed to the xgboost library. Note that it does not 
        capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
  \item \code{callbacks} callback functions that were either automatically assigned or 
        explicitly passed.
  \item \code{best_iteration} iteration number with the best evaluation metric value
        (only available with early stopping).
  \item \code{best_ntreelimit} the \code{ntreelimit} value corresponding to the best iteration, 
        which could further be used in \code{predict} method
        (only available with early stopping).
  \item \code{best_score} the best evaluation metric value during early stopping.
        (only available with early stopping).
  \item \code{feature_names} names of the training dataset features
        (only when column names were defined in training data).

xgboost/R-package/man/xgb.train.Rd

     \item \code{logloss} negative log-likelihood. \url{http://en.wikipedia.org/wiki/Log-likelihood}
     \item \code{mlogloss} multiclass logloss. \url{https://www.kaggle.com/wiki/MultiClassLogLoss/}
     \item \code{error} Binary classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
           By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
           Different threshold (e.g., 0.) could be specified as "error@0."
     \item \code{merror} Multiclass classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
     \item \code{auc} Area under the curve. \url{http://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_curve} for ranking evaluation.
     \item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{http://en.wikipedia.org/wiki/NDCG}
  }

The following callbacks are automatically created when certain parameters are set:
\itemize{
  \item \code{cb.print.evaluation} is turned on when \code{verbose > 0};
        and the \code{print_every_n} parameter is passed to it.
  \item \code{cb.evaluation.log} is on when \code{watchlist} is present.
  \item \code{cb.early.stop}: when \code{early_stopping_rounds} is set.
  \item \code{cb.save.model}: when \code{save_period > 0} is set.
}
}
\examples{
data(agaricus.train, package='xgboost')

xgboost/R-package/man/xgb.train.Rd

#  or as dedicated 'obj' and 'feval' parameters of xgb.train:
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
                 obj = logregobj, feval = evalerror)


## An xgb.train example of using variable learning rates at each iteration:
param <- list(max_depth = 2, eta = 1, silent = 1, nthread = 2,
              objective = "binary:logistic", eval_metric = "auc")
my_etas <- list(eta = c(0.5, 0.1))
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
                 callbacks = list(cb.reset.parameters(my_etas)))

## Early stopping:
bst <- xgb.train(param, dtrain, nrounds = 25, watchlist,
                 early_stopping_rounds = 3)

## An 'xgboost' interface example:
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, 
               max_depth = 2, eta = 1, nthread = 2, nrounds = 2, 
               objective = "binary:logistic")
pred <- predict(bst, agaricus.test$data)

}
\seealso{
\code{\link{callbacks}},
\code{\link{predict.xgb.Booster}},
\code{\link{xgb.cv}}
}

xgboost/R-package/tests/testthat/test_basic.R

                 verbose=TRUE)
  , "train-error:")
  expect_is(cv, 'xgb.cv.synchronous')
  expect_false(is.null(cv$evaluation_log))
  expect_lt(cv$evaluation_log[, min(test_error_mean)], 0.03)
  expect_lt(cv$evaluation_log[, min(test_error_std)], 0.004)
  expect_equal(cv$niter, 2)
  expect_false(is.null(cv$folds) && is.list(cv$folds))
  expect_length(cv$folds, 5)
  expect_false(is.null(cv$params) && is.list(cv$params))
  expect_false(is.null(cv$callbacks))
  expect_false(is.null(cv$call))
})

test_that("train and predict with non-strict classes", {
  # standard dense matrix input
  train_dense <- as.matrix(train$data)
  bst <- xgboost(data = train_dense, label = train$label, max_depth = 2,
                 eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic", verbose = 0)
  pr0 <- predict(bst, train_dense)
  

xgboost/R-package/tests/testthat/test_callbacks.R

# More specific testing of callbacks

require(xgboost)
require(data.table)

context("callbacks")

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test

# add some label noise for early stopping tests
add.noise <- function(label, frac) {
  inoise <- sample(length(label), length(label) * frac)
  label[inoise] <- !label[inoise]

xgboost/R-package/tests/testthat/test_callbacks.R

  # fixed eta
  set.seed(111)
  bst0 <- xgb.train(param, dtrain, nrounds = 2, watchlist, eta = 0.9, verbose = 0)
  expect_false(is.null(bst0$evaluation_log))
  expect_false(is.null(bst0$evaluation_log$train_error))

  # same eta but re-set as a vector parameter in the callback
  set.seed(111)
  my_par <- list(eta = c(0.9, 0.9))
  bst1 <- xgb.train(param, dtrain, nrounds = 2, watchlist, verbose = 0,
                    callbacks = list(cb.reset.parameters(my_par)))
  expect_false(is.null(bst1$evaluation_log$train_error))
  expect_equal(bst0$evaluation_log$train_error, 
               bst1$evaluation_log$train_error)
  
  # same eta but re-set via a function in the callback
  set.seed(111)
  my_par <- list(eta = function(itr, itr_end) 0.9)
  bst2 <- xgb.train(param, dtrain, nrounds = 2, watchlist, verbose = 0,
                    callbacks = list(cb.reset.parameters(my_par)))
  expect_false(is.null(bst2$evaluation_log$train_error))
  expect_equal(bst0$evaluation_log$train_error, 
               bst2$evaluation_log$train_error)
  
  # different eta re-set as a vector parameter in the callback
  set.seed(111)
  my_par <- list(eta = c(0.6, 0.5))
  bst3 <- xgb.train(param, dtrain, nrounds = 2, watchlist, verbose = 0,
                    callbacks = list(cb.reset.parameters(my_par)))
  expect_false(is.null(bst3$evaluation_log$train_error))
  expect_false(all(bst0$evaluation_log$train_error == bst3$evaluation_log$train_error))
  
  # resetting multiple parameters at the same time runs with no error
  my_par <- list(eta = c(1., 0.5), gamma = c(1, 2), max_depth = c(4, 8))
  expect_error(
    bst4 <- xgb.train(param, dtrain, nrounds = 2, watchlist, verbose = 0,
                      callbacks = list(cb.reset.parameters(my_par)))
  , NA) # NA = no error
  # CV works as well
  expect_error(
    bst4 <- xgb.cv(param, dtrain, nfold = 2, nrounds = 2, verbose = 0,
                   callbacks = list(cb.reset.parameters(my_par)))
  , NA) # NA = no error

  # expect no learning with 0 learning rate
  my_par <- list(eta = c(0., 0.))
  bstX <- xgb.train(param, dtrain, nrounds = 2, watchlist, verbose = 0,
                    callbacks = list(cb.reset.parameters(my_par)))
  expect_false(is.null(bstX$evaluation_log$train_error))
  er <- unique(bstX$evaluation_log$train_error)
  expect_length(er, 1)
  expect_gt(er, 0.4)
})

test_that("cb.save.model works as expected", {
  files <- c('xgboost_01.model', 'xgboost_02.model', 'xgboost.model')
  for (f in files) if (file.exists(f)) file.remove(f)
  

xgboost/R-package/tests/testthat/test_callbacks.R

                      early_stopping_rounds = 3, maximize = FALSE, verbose = 0)
  )
  expect_equal(bst$evaluation_log, bst0$evaluation_log)
})

test_that("early stopping using a specific metric works", {
  set.seed(11)
  expect_output(
    bst <- xgb.train(param, dtrain, nrounds = 20, watchlist, eta = 0.6,
                     eval_metric="logloss", eval_metric="auc",
                     callbacks = list(cb.early.stop(stopping_rounds = 3, maximize = FALSE,
                                                    metric_name = 'test_logloss')))
  , "Stopping. Best iteration")
  expect_false(is.null(bst$best_iteration))
  expect_lt(bst$best_iteration, 19)
  expect_equal(bst$best_iteration, bst$best_ntreelimit)

  pred <- predict(bst, dtest, ntreelimit = bst$best_ntreelimit)
  expect_equal(length(pred), 1611)
  logloss_pred <- sum(-ltest * log(pred) - (1 - ltest) * log(1 - pred)) / length(ltest)
  logloss_log <- bst$evaluation_log[bst$best_iteration, test_logloss]

xgboost/R-package/tests/testthat/test_callbacks.R

  expect_false(is.null(cv$evaluation_log))
  expect_false(is.null(cv$pred))
  expect_length(cv$pred, nrow(train$data))
  err_pred <- mean( sapply(cv$folds, function(f) mean(err(ltrain[f], cv$pred[f]))) )
  err_log <- cv$evaluation_log[nrounds, test_error_mean]
  expect_equal(err_pred, err_log, tolerance = 1e-6)

  # save CV models
  set.seed(11)
  cvx <- xgb.cv(param, dtrain, nfold = 5, eta = 0.5, nrounds = nrounds, prediction = TRUE, verbose = 0,
                callbacks = list(cb.cv.predict(save_models = TRUE)))
  expect_equal(cv$evaluation_log, cvx$evaluation_log)
  expect_length(cvx$models, 5)
  expect_true(all(sapply(cvx$models, class) == 'xgb.Booster'))
})

test_that("prediction in xgb.cv works for gblinear too", {
  set.seed(11)
  p <- list(booster = 'gblinear', objective = "reg:logistic", nthread = 2)
  cv <- xgb.cv(p, dtrain, nfold = 5, eta = 0.5, nrounds = 2, prediction = TRUE, verbose = 0)
  expect_false(is.null(cv$evaluation_log))

xgboost/demo/guide-python/cross_validation.py

dtrain = xgb.DMatrix('../data/agaricus.txt.train')
param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic'}
num_round = 2

print ('running cross validation')
# do cross validation, this will print result out as
# [iteration]  metric_name:mean_value+std_value
# std_value is standard deviation of the metric
xgb.cv(param, dtrain, num_round, nfold=5,
       metrics={'error'}, seed = 0,
       callbacks=[xgb.callback.print_evaluation(show_stdv=True)])

print ('running cross validation, disable standard deviation display')
# do cross validation, this will print result out as
# [iteration]  metric_name:mean_value
res = xgb.cv(param, dtrain, num_boost_round=10, nfold=5,
             metrics={'error'}, seed = 0,
             callbacks=[xgb.callback.print_evaluation(show_stdv=False),
                        xgb.callback.early_stop(3)])
print (res)
print ('running cross validation, with preprocessing function')
# define the preprocessing function
# used to return the preprocessed training, test data, and parameter
# we can use this to do weight rescale, etc.
# as an example, we try to set scale_pos_weight
def fpreproc(dtrain, dtest, param):
    label = dtrain.get_label()
    ratio = float(np.sum(label == 0)) / np.sum(label==1)

xgboost/dmlc-core/include/dmlc/json.h

  typedef void (*ReadFunction)(JSONReader *reader, void *addr);
  /*! \brief internal data entry */
  struct Entry {
    /*! \brief the reader function */
    ReadFunction func;
    /*! \brief the address to read */
    void *addr;
    /*! \brief whether it is optional */
    bool optional;
  };
  /*! \brief the internal map of reader callbacks */
  std::map<std::string, Entry> map_;
};

#define DMLC_JSON_ENABLE_ANY_VAR_DEF(KeyName)                  \
  static DMLC_ATTRIBUTE_UNUSED ::dmlc::json::AnyJSONManager&   \
  __make_AnyJSONType ## _ ## KeyName ## __

/*!
 * \def DMLC_JSON_ENABLE_ANY
 * \brief Macro to enable save/load JSON of dmlc:: whose actual type is Type.

xgboost/python-package/xgboost/core.py

    Parameters
    ----------
    best_iteration : int
        The best iteration stopped.
    """
    def __init__(self, best_iteration):
        super(EarlyStopException, self).__init__()
        self.best_iteration = best_iteration


# Callback environment used by callbacks
CallbackEnv = collections.namedtuple(
    "XGBoostCallbackEnv",
    ["model",
     "cvfolds",
     "iteration",
     "begin_iteration",
     "end_iteration",
     "rank",
     "evaluation_result_list"])

xgboost/python-package/xgboost/training.py

import numpy as np
from .core import Booster, STRING_TYPES, XGBoostError, CallbackEnv, EarlyStopException
from .compat import (SKLEARN_INSTALLED, XGBStratifiedKFold)
from . import rabit
from . import callback


def _train_internal(params, dtrain,
                    num_boost_round=10, evals=(),
                    obj=None, feval=None,
                    xgb_model=None, callbacks=None):
    """internal training function"""
    callbacks = [] if callbacks is None else callbacks
    evals = list(evals)
    if isinstance(params, dict) \
            and 'eval_metric' in params \
            and isinstance(params['eval_metric'], list):
        params = dict((k, v) for k, v in params.items())
        eval_metrics = params['eval_metric']
        params.pop("eval_metric", None)
        params = list(params.items())
        for eval_metric in eval_metrics:
            params += [('eval_metric', eval_metric)]

xgboost/python-package/xgboost/training.py  view on Meta::CPAN

    if 'num_class' in _params:
        nboost //= _params['num_class']

    # Distributed code: Load the checkpoint from rabit.
    version = bst.load_rabit_checkpoint()
    assert rabit.get_world_size() != 1 or version == 0
    rank = rabit.get_rank()
    start_iteration = version // 2
    nboost += start_iteration

    # Partition callbacks by their 'before_iteration' attribute: those
    # marked run before each boosting iteration, the rest run after it.
    callbacks_before_iter = [
        cb for cb in callbacks if cb.__dict__.get('before_iteration', False)]
    callbacks_after_iter = [
        cb for cb in callbacks if not cb.__dict__.get('before_iteration', False)]

    for i in range(start_iteration, num_boost_round):
        for cb in callbacks_before_iter:
            cb(CallbackEnv(model=bst,
                           cvfolds=None,
                           iteration=i,
                           begin_iteration=start_iteration,
                           end_iteration=num_boost_round,
                           rank=rank,
                           evaluation_result_list=None))
        # Distributed code: need to resume to this point.
        # Skip the first update if it is a recovery step.
        if version % 2 == 0:

xgboost/python-package/xgboost/training.py  view on Meta::CPAN

        # check evaluation result.
        if len(evals) != 0:
            bst_eval_set = bst.eval_set(evals, i, feval)
            if isinstance(bst_eval_set, STRING_TYPES):
                msg = bst_eval_set
            else:
                msg = bst_eval_set.decode()
            res = [x.split(':') for x in msg.split()]
            evaluation_result_list = [(k, float(v)) for k, v in res[1:]]
        try:
            for cb in callbacks_after_iter:
                cb(CallbackEnv(model=bst,
                               cvfolds=None,
                               iteration=i,
                               begin_iteration=start_iteration,
                               end_iteration=num_boost_round,
                               rank=rank,
                               evaluation_result_list=evaluation_result_list))
        except EarlyStopException:
            break
        # do checkpoint after evaluation, in case evaluation also updates booster.

xgboost/python-package/xgboost/training.py  view on Meta::CPAN

        bst.best_score = float(bst.attr('best_score'))
        bst.best_iteration = int(bst.attr('best_iteration'))
    else:
        bst.best_iteration = nboost - 1
    bst.best_ntree_limit = (bst.best_iteration + 1) * num_parallel_tree
    return bst


def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
          maximize=False, early_stopping_rounds=None, evals_result=None,
          verbose_eval=True, xgb_model=None, callbacks=None, learning_rates=None):
    # pylint: disable=too-many-statements,too-many-branches, attribute-defined-outside-init
    """Train a booster with given parameters.

    Parameters
    ----------
    params : dict
        Booster params.
    dtrain : DMatrix
        Data to be trained.
    num_boost_round: int

xgboost/python-package/xgboost/training.py  view on Meta::CPAN

        / the boosting stage found by using `early_stopping_rounds` is also printed.
        Example: with verbose_eval=4 and at least one item in evals, an evaluation metric
        is printed every 4 boosting stages, instead of every boosting stage.
    learning_rates: list or function (deprecated - use callback API instead)
        List of learning rates for each boosting round,
        or a customized function that calculates eta in terms of the
        current round number and the total number of boosting rounds
        (e.g. yielding learning rate decay).
    xgb_model : file name of stored xgb model or 'Booster' instance
        Xgb model to be loaded before training (allows training continuation).
    callbacks : list of callback functions
        List of callback functions that are applied at the end of each iteration.
        It is possible to use predefined callbacks from the xgb.callback module.
        Example: [xgb.callback.reset_learning_rate(custom_rates)]

    Returns
    -------
    booster : a trained booster model
    """
    callbacks = [] if callbacks is None else callbacks

    # Most of the legacy advanced options become callbacks.
    if isinstance(verbose_eval, bool) and verbose_eval:
        callbacks.append(callback.print_evaluation())
    elif isinstance(verbose_eval, int):
        callbacks.append(callback.print_evaluation(verbose_eval))

    if early_stopping_rounds is not None:
        callbacks.append(callback.early_stop(early_stopping_rounds,
                                             maximize=maximize,
                                             verbose=bool(verbose_eval)))
    if evals_result is not None:
        callbacks.append(callback.record_evaluation(evals_result))

    if learning_rates is not None:
        warnings.warn("learning_rates parameter is deprecated - use callback API instead",
                      DeprecationWarning)
        callbacks.append(callback.reset_learning_rate(learning_rates))

    return _train_internal(params, dtrain,
                           num_boost_round=num_boost_round,
                           evals=evals,
                           obj=obj, feval=feval,
                           xgb_model=xgb_model, callbacks=callbacks)
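

# Usage sketch (illustrative, not part of this module): train with a
# validation watchlist, early stopping, and a recorded eval history;
# `dtrain` and `dvalid` are assumed DMatrix objects supplied by a caller.
def _example_train_usage(dtrain, dvalid):
    evals_result = {}
    bst = train({'max_depth': 2, 'eta': 0.3, 'objective': 'binary:logistic'},
                dtrain, num_boost_round=100,
                evals=[(dtrain, 'train'), (dvalid, 'valid')],
                early_stopping_rounds=10, evals_result=evals_result)
    # with early stopping, prediction can be limited to the best trees
    return bst.predict(dvalid, ntree_limit=bst.best_ntree_limit)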


class CVPack(object):
    """Auxiliary data structure to hold one fold of CV."""
    def __init__(self, dtrain, dtest, param):
        """Initialize the CVPack."""
        self.dtrain = dtrain
        self.dtest = dtest
        self.watchlist = [(dtrain, 'train'), (dtest, 'test')]
        self.bst = Booster(param, [dtrain, dtest])

xgboost/python-package/xgboost/training.py  view on Meta::CPAN

        if not isinstance(msg, STRING_TYPES):
            msg = msg.decode()
        mean, std = np.mean(v), np.std(v)
        results.append((k, mean, std))
    return results


def cv(params, dtrain, num_boost_round=10, nfold=3, stratified=False, folds=None,
       metrics=(), obj=None, feval=None, maximize=False, early_stopping_rounds=None,
       fpreproc=None, as_pandas=True, verbose_eval=None, show_stdv=True,
       seed=0, callbacks=None, shuffle=True):
    # pylint: disable = invalid-name
    """Cross-validation with given parameters.

    Parameters
    ----------
    params : dict
        Booster params.
    dtrain : DMatrix
        Data to be trained.
    num_boost_round : int

xgboost/python-package/xgboost/training.py  view on Meta::CPAN

    verbose_eval : bool, int, or None, default None
        Whether to display the progress. If None, progress will be displayed
        when np.ndarray is returned. If True, progress will be displayed at
        every boosting stage. If an integer is given, progress will be displayed
        at every given `verbose_eval` boosting stage.
    show_stdv : bool, default True
        Whether to display the standard deviation in progress.
        Results are not affected and always contain the std.
    seed : int
        Seed used to generate the folds (passed to numpy.random.seed).
    callbacks : list of callback functions
        List of callback functions that are applied at the end of each iteration.
        It is possible to use predefined callbacks from the xgb.callback module.
        Example: [xgb.callback.reset_learning_rate(custom_rates)]
    shuffle : bool
        Shuffle data before creating folds.

    Returns
    -------
    evaluation history : list(string)
    """
    if stratified is True and not SKLEARN_INSTALLED:
        raise XGBoostError('sklearn needs to be installed in order to use stratified cv')

xgboost/python-package/xgboost/training.py  view on Meta::CPAN

            metrics = params['eval_metric']
        else:
            metrics = [params['eval_metric']]

    params.pop("eval_metric", None)

    results = {}
    cvfolds = mknfold(dtrain, nfold, params, seed, metrics, fpreproc,
                      stratified, folds, shuffle)

    # set up callbacks
    callbacks = [] if callbacks is None else callbacks
    if early_stopping_rounds is not None:
        callbacks.append(callback.early_stop(early_stopping_rounds,
                                             maximize=maximize,
                                             verbose=False))

    if isinstance(verbose_eval, bool) and verbose_eval:
        callbacks.append(callback.print_evaluation(show_stdv=show_stdv))
    elif isinstance(verbose_eval, int):
        callbacks.append(callback.print_evaluation(verbose_eval, show_stdv=show_stdv))

    callbacks_before_iter = [
        cb for cb in callbacks if cb.__dict__.get('before_iteration', False)]
    callbacks_after_iter = [
        cb for cb in callbacks if not cb.__dict__.get('before_iteration', False)]

    for i in range(num_boost_round):
        for cb in callbacks_before_iter:
            cb(CallbackEnv(model=None,
                           cvfolds=cvfolds,
                           iteration=i,
                           begin_iteration=0,
                           end_iteration=num_boost_round,
                           rank=0,
                           evaluation_result_list=None))
        for fold in cvfolds:
            fold.update(i, obj)
        res = aggcv([f.eval(i, feval) for f in cvfolds])

        for key, mean, std in res:
            results.setdefault(key + '-mean', []).append(mean)
            results.setdefault(key + '-std', []).append(std)
        try:
            for cb in callbacks_after_iter:
                cb(CallbackEnv(model=None,
                               cvfolds=cvfolds,
                               iteration=i,
                               begin_iteration=0,
                               end_iteration=num_boost_round,
                               rank=0,
                               evaluation_result_list=res))
        except EarlyStopException as e:
            for k in results.keys():
                results[k] = results[k][:(e.best_iteration + 1)]

xgboost/rabit/README.md  view on Meta::CPAN

  - Rabit is one of the backbone libraries supporting distributed XGBoost

## Features
All these features come from facts about small rabbits :)
* Portable: rabit is lightweight and runs everywhere
  - Rabit is a library rather than a framework; a program only needs to link the library to run
  - Rabit only relies on a mechanism to start programs, which is provided by most frameworks
  - You can run rabit programs on many platforms, including Yarn (Hadoop) and MPI, using the same code
* Scalable and Flexible: rabit runs fast
  - Rabit programs use Allreduce to communicate, and do not pay the per-iteration cost of the MapReduce abstraction.
  - Programs can call rabit functions in any order, as opposed to frameworks where callbacks are offered and called by the framework, i.e. the inversion-of-control principle.
  - Programs persist over all the iterations, unless they fail and recover.
* Reliable: rabit digs burrows to avoid disasters
  - Rabit programs can recover the model and results using synchronous function calls.

## Use Rabit
* Typing `make` in the root folder will compile the rabit library into the `lib` folder
* Add `lib` to the library path and `include` to the compiler's include path
* Languages: You can use rabit in C++ and Python
  - It is also possible to port the library to other languages
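
For a flavor of the API, here is a minimal Allreduce sketch using the Python binding (a sketch, assuming the binding's `init`/`allreduce`/`finalize` calls and the `MAX` reduction constant; launch it through rabit's tracker so several workers participate):

    import numpy as np
    import rabit  # rabit's Python binding (assumed to be on PYTHONPATH)

    rabit.init()
    rank = rabit.get_rank()
    # each worker contributes a different vector
    a = np.arange(3, dtype=np.float64) + rank
    # element-wise max across all workers; every worker receives the result
    a = rabit.allreduce(a, rabit.MAX)
    print('@node[%d] after allreduce-max: %s' % (rank, a))
    rabit.finalize()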

xgboost/tests/python/test_basic.py  view on Meta::CPAN

    def test_record_results(self):
        dtrain = xgb.DMatrix(dpath + 'agaricus.txt.train')
        dtest = xgb.DMatrix(dpath + 'agaricus.txt.test')
        param = {'max_depth': 2, 'eta': 1, 'silent': 1, 'objective': 'binary:logistic'}
        # specify validation sets to watch performance
        watchlist = [(dtest, 'eval'), (dtrain, 'train')]
        num_round = 2
        result = {}
        res2 = {}
        xgb.train(param, dtrain, num_round, watchlist,
                  callbacks=[xgb.callback.record_evaluation(result)])
        xgb.train(param, dtrain, num_round, watchlist,
                  evals_result=res2)
        assert result['train']['error'][0] < 0.1
        assert res2 == result

    def test_multiclass(self):
        dtrain = xgb.DMatrix(dpath + 'agaricus.txt.train')
        dtest = xgb.DMatrix(dpath + 'agaricus.txt.test')
        param = {'max_depth': 2, 'eta': 1, 'silent': 1, 'num_class': 2}
        # specify validation sets to watch performance


