xgboost/R-package/R/xgb.Booster.R
#' would not be saved by \code{xgb.save} because an xgboost model is an external memory object
#' and its serialization is handled externally.
#' Also, setting an attribute that has the same name as one of xgboost's parameters does not
#' change the value of that parameter for a model.
#' Use \code{\link{xgb.parameters<-}} to set or change model parameters.
#'
#' The attribute setters are usually more efficient for \code{xgb.Booster.handle}
#' than for \code{xgb.Booster}, since only a handle (pointer) needs to be copied.
#' This only matters if attributes need to be set many times.
#' Note, however, that when a handle of an \code{xgb.Booster} object is passed to the attribute setters,
#' the raw model cache of that \code{xgb.Booster} object is not automatically updated,
#' and it is the user's responsibility to call \code{xgb.save.raw} to update it.
#'
#' The \code{xgb.attributes<-} setter either updates existing attributes or adds one or more new ones,
#' but it does not delete any other existing attributes.
#'
#' @return
#' \code{xgb.attr} returns either the string value of an attribute
#' or \code{NULL} if the attribute is not stored in the model.
#'
#' \code{xgb.attributes} returns a list of all attributes stored in a model
xgboost/R-package/man/xgb.attr.Rd
would not be saved by \code{xgb.save} because an xgboost model is an external memory object
and its serialization is handled externally.
Also, setting an attribute that has the same name as one of xgboost's parameters does not
change the value of that parameter for a model.
Use \code{\link{xgb.parameters<-}} to set or change model parameters.

The attribute setters are usually more efficient for \code{xgb.Booster.handle}
than for \code{xgb.Booster}, since only a handle (pointer) needs to be copied.
This only matters if attributes need to be set many times.
Note, however, that when a handle of an \code{xgb.Booster} object is passed to the attribute setters,
the raw model cache of that \code{xgb.Booster} object is not automatically updated,
and it is the user's responsibility to call \code{xgb.save.raw} to update it.

The \code{xgb.attributes<-} setter either updates existing attributes or adds one or more new ones,
but it does not delete any other existing attributes.
}
\examples{
data(agaricus.train, package='xgboost')
train <- agaricus.train
bst <- xgboost(data = train$data, label = train$label, max_depth = 2,
xgboost/R-package/vignettes/xgboostPresentation.Rmd
* `xgb.DMatrix`: its own class (recommended).
* Sparsity: it accepts *sparse* input for both *tree booster* and *linear booster*, and is optimized for *sparse* input.
* Customization: it supports customized objective functions and evaluation functions.
## Installation
### GitHub version
For the weekly updated version (highly recommended), install from *GitHub*:
```{r installGithub, eval=FALSE}
install.packages("drat", repos="https://cran.rstudio.com")
drat::addRepo("dmlc")
install.packages("xgboost", repos="http://dmlc.ml/drat/", type = "source")
```
> *Windows* users will need to install [Rtools](https://cran.r-project.org/bin/windows/Rtools/) first.
### CRAN version
xgboost/cub/cub/agent/agent_segment_fixup.cuh
}
}
else
{
// Exclusive scan of values and segment_flags
TilePrefixCallbackOpT prefix_op(tile_state, temp_storage.prefix, scan_op, tile_idx);
BlockScanT(temp_storage.scan).ExclusiveScan(pairs, scatter_pairs, scan_op, prefix_op);
tile_aggregate = prefix_op.GetBlockAggregate();
}
// Scatter updated values
#pragma unroll
for (int ITEM = 0; ITEM < ITEMS_PER_THREAD; ++ITEM)
{
if (scatter_pairs[ITEM].key != pairs[ITEM].key)
{
// Update the value at the key location
ValueT value = d_fixup_in[scatter_pairs[ITEM].key];
value = reduction_op(value, scatter_pairs[ITEM].value);
d_aggregates_out[scatter_pairs[ITEM].key] = value;
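As we read the fixup loop above, a scanned pair whose key differs from the thread's own key carries the aggregate of a completed run for that earlier key, so it is reduced into the value already stored at that key. The sketch below models that reduce-by-key fixup sequentially on the host. `SegmentFixup` is a hypothetical helper, and the sketch assumes contiguous runs of equal keys and a single in-place output array, whereas the kernel reads `d_fixup_in` and writes `d_aggregates_out`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sequential model of a segment fixup (not CUB's implementation).
// `pairs` holds (output index, partial aggregate) pairs whose equal keys are
// contiguous; each run is folded with `op`, then combined into out[key].
template <typename ValueT, typename ReductionOp>
void SegmentFixup(const std::vector<std::pair<int, ValueT>>& pairs,
                  std::vector<ValueT>& out, ReductionOp op) {
  std::size_t i = 0;
  while (i < pairs.size()) {
    const int key = pairs[i].first;
    assert(key >= 0 && static_cast<std::size_t>(key) < out.size());
    ValueT run = pairs[i].second;
    std::size_t j = i + 1;
    // Fold the rest of the run that shares this key.
    for (; j < pairs.size() && pairs[j].first == key; ++j) run = op(run, pairs[j].second);
    // Mirrors the kernel's update: read the stored value, reduce, write back.
    out[key] = op(out[key], run);
    i = j;
  }
}
```

Keys that never appear in `pairs` keep their stored value, which is the point of a fixup pass: only segments that received partial aggregates are touched.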
xgboost/cub/cub/block/block_reduce.cuh
* operators.
*
* \par
* Execution consists of three phases:
* -# Upsweep sequential reduction in registers (if threads contribute more
* than one input each). Each thread then places the partial reduction
* of its item(s) into shared memory.
* -# Compute a shallow but inefficient warp-synchronous Kogge-Stone style
* reduction within each warp.
* -# A propagation phase where the warp reduction outputs in each warp are
* updated with the aggregate from each preceding warp.
*
* \par
* \image html block_scan_warpscans.png
* <div class="centercaption">\p BLOCK_REDUCE_WARP_REDUCTIONS data flow for a hypothetical 16-thread threadblock and 4-thread raking warp.</div>
*
* \par Performance Considerations
* - This variant applies more reduction operators than BLOCK_REDUCE_RAKING
* or BLOCK_REDUCE_RAKING_NON_COMMUTATIVE, which may result in lower overall
* throughput across the GPU. However, its turnaround latency may be lower,
* which makes it useful when the GPU is under-occupied.
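A host-side model can make the three phases concrete. The sketch below is illustrative, not CUB's code: it assumes integer addition as the reduction operator, a thread count that is a multiple of `warp_size`, and a hypothetical function name. Phase 2 uses a log-step pattern in which lane i adds the value held by lane i + offset, so lane 0 ends up with the warp total.

```cpp
#include <cassert>
#include <vector>

// CPU model of the three BLOCK_REDUCE_WARP_REDUCTIONS phases (illustrative only).
// thread_items[t] are thread t's inputs; the operator is assumed to be +.
int BlockReduceWarpReductionsModel(const std::vector<std::vector<int>>& thread_items,
                                   int warp_size) {
  const int num_threads = static_cast<int>(thread_items.size());
  assert(warp_size > 0 && num_threads % warp_size == 0);

  // Phase 1: each thread reduces its own items sequentially ("in registers").
  std::vector<int> partial(num_threads, 0);
  for (int t = 0; t < num_threads; ++t)
    for (int v : thread_items[t]) partial[t] += v;

  // Phase 2: log-step reduction inside each warp. In every step, lane i adds
  // the value lane i + offset held in the previous step.
  std::vector<int> warp_totals;
  for (int base = 0; base < num_threads; base += warp_size) {
    std::vector<int> lanes(partial.begin() + base, partial.begin() + base + warp_size);
    for (int offset = 1; offset < warp_size; offset *= 2) {
      const std::vector<int> before = lanes;  // values from the previous step
      for (int lane = 0; lane + offset < warp_size; ++lane)
        lanes[lane] = before[lane] + before[lane + offset];
    }
    warp_totals.push_back(lanes[0]);  // lane 0 now holds the warp total
  }

  // Phase 3: combine each warp's aggregate with those of the preceding warps.
  int running = 0;
  for (int total : warp_totals) running += total;
  return running;
}
```

With the 16-thread, 4-thread-warp shape from the caption above and one item of value 1 per thread, the model returns 16.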
xgboost/cub/cub/block/block_scan.cuh
* scan, allowing the "downsweep" not to re-read them from shared memory.
*/
BLOCK_SCAN_RAKING_MEMOIZE,
/**
* \par Overview
* A quick "tiled warpscans" prefix scan algorithm. Execution consists of four phases:
* -# Upsweep sequential reduction in registers (if threads contribute more than one input each). Each thread then places the partial reduction of its item(s) into shared memory.
* -# Compute a shallow but inefficient warp-synchronous Kogge-Stone style scan within each warp.
* -# A propagation phase where the warp scan outputs in each warp are updated with the aggregate from each preceding warp.
* -# Downsweep sequential scan in registers (if threads contribute more than one input), seeded with the raking scan output.
*
* \par
* \image html block_scan_warpscans.png
* <div class="centercaption">\p BLOCK_SCAN_WARP_SCANS data flow for a hypothetical 16-thread threadblock and 4-thread raking warp.</div>
*
* \par Performance Considerations
* - Although this variant may suffer lower overall throughput across the
* GPU due to its heavy reliance on inefficient warpscans, it can
* often provide lower turnaround latencies when the GPU is under-occupied.
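The four phases can likewise be modeled on the host. This is a sketch under stated assumptions (integer addition, a blocked arrangement, a thread count divisible by `warp_size`, a hypothetical function name) that produces an exclusive prefix sum for every item; phase 4 seeds each thread's sequential scan with that thread's exclusive prefix.

```cpp
#include <cassert>
#include <vector>

// CPU model of the four BLOCK_SCAN_WARP_SCANS phases for an exclusive sum.
// Illustrative only: thread_items[t] are thread t's inputs (blocked arrangement).
std::vector<std::vector<int>> BlockScanWarpScansModel(
    const std::vector<std::vector<int>>& thread_items, int warp_size) {
  const int num_threads = static_cast<int>(thread_items.size());
  assert(warp_size > 0 && num_threads % warp_size == 0);

  // Phase 1 (upsweep): per-thread sequential reduction.
  std::vector<int> partial(num_threads, 0);
  for (int t = 0; t < num_threads; ++t)
    for (int v : thread_items[t]) partial[t] += v;

  // Phase 2: Kogge-Stone inclusive scan within each warp.
  std::vector<int> inclusive = partial;
  for (int base = 0; base < num_threads; base += warp_size) {
    for (int offset = 1; offset < warp_size; offset *= 2) {
      const std::vector<int> prev(inclusive.begin() + base,
                                  inclusive.begin() + base + warp_size);
      for (int lane = offset; lane < warp_size; ++lane)
        inclusive[base + lane] = prev[lane] + prev[lane - offset];
    }
  }

  // Phase 3 (propagation): add the aggregates of all preceding warps.
  int carry = 0;
  for (int base = 0; base < num_threads; base += warp_size) {
    const int warp_aggregate = inclusive[base + warp_size - 1];
    for (int lane = 0; lane < warp_size; ++lane) inclusive[base + lane] += carry;
    carry += warp_aggregate;
  }

  // Phase 4 (downsweep): seed each thread's sequential exclusive scan with
  // its thread-level exclusive prefix (inclusive prefix minus own partial).
  std::vector<std::vector<int>> out(num_threads);
  for (int t = 0; t < num_threads; ++t) {
    int running = inclusive[t] - partial[t];
    for (int v : thread_items[t]) {
      out[t].push_back(running);
      running += v;
    }
  }
  return out;
}
```

For example, threads holding {1, 2}, {3}, {4, 5}, {6} with `warp_size = 2` receive {0, 1}, {3}, {6, 10}, {15}, which matches an exclusive scan of the flattened input 1 through 6.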
xgboost/cub/cub/block/block_shuffle.cuh
/**
* \brief Each <em>thread<sub>i</sub></em> obtains the \p input provided by <em>thread</em><sub><em>i</em>+<tt>distance</tt></sub>. The offset \p distance may be negative.
*
* \par
* - \smemreuse
*/
__device__ __forceinline__ void Offset(
T input, ///< [in] The input item from the calling thread (<em>thread<sub>i</sub></em>)
T& output, ///< [out] The \p input item from the successor (or predecessor) thread <em>thread</em><sub><em>i</em>+<tt>distance</tt></sub> (may be aliased to \p input). This value is only updated for <em>thread<sub>i</sub...
int distance = 1) ///< [in] Offset distance (may be negative)
{
temp_storage[linear_tid].prev = input;
CTA_SYNC();
if ((linear_tid + distance >= 0) && (linear_tid + distance < BLOCK_THREADS))
output = temp_storage[linear_tid + distance].prev;
}
/**
* \brief Each <em>thread<sub>i</sub></em> obtains the \p input provided by <em>thread</em><sub>(<em>i</em>+<tt>distance</tt>)%<tt>BLOCK_THREADS</tt></sub>, wrapping around the end of the thread block.
*
* \par
* - \smemreuse
*/
__device__ __forceinline__ void Rotate(
T input, ///< [in] The calling thread's input item
T& output, ///< [out] The \p input item from thread <em>thread</em><sub>(<em>i</em>+<tt>distance</tt>)%<tt>BLOCK_THREADS</tt></sub> (may be aliased to \p input). This value is not updated for <em>thread</em><sub>BLOCK_THR...
unsigned int distance = 1) ///< [in] Offset distance (0 < \p distance < <tt>BLOCK_THREADS</tt>)
{
temp_storage[linear_tid].prev = input;
CTA_SYNC();
unsigned int offset = linear_tid + distance; // use the linear rank, consistent with temp_storage[linear_tid] above
if (offset >= BLOCK_THREADS)
offset -= BLOCK_THREADS;
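The index arithmetic of `Offset` and `Rotate` can be checked on the host with one item per thread. In this hypothetical model, index i stands for the linear thread rank, and `outputs` starts with each thread's previous output value so that `Offset` visibly leaves out-of-range threads unchanged; `Rotate` wraps around exactly as the `offset -= BLOCK_THREADS` step above does.

```cpp
#include <cassert>
#include <vector>

// Host-side model of BlockShuffle::Offset for one item per thread:
// thread i takes inputs[i + distance] when that index is inside the block.
void OffsetModel(const std::vector<int>& inputs, std::vector<int>& outputs, int distance) {
  const int n = static_cast<int>(inputs.size());
  for (int i = 0; i < n; ++i)
    if (i + distance >= 0 && i + distance < n) outputs[i] = inputs[i + distance];
}

// Host-side model of BlockShuffle::Rotate: thread i takes inputs[(i + distance) % n].
// Assumes 0 < distance < n, matching the documented precondition.
void RotateModel(const std::vector<int>& inputs, std::vector<int>& outputs, unsigned distance) {
  const unsigned n = static_cast<unsigned>(inputs.size());
  assert(distance > 0 && distance < n);
  for (unsigned i = 0; i < n; ++i) {
    unsigned offset = i + distance;
    if (offset >= n) offset -= n;  // single subtraction suffices since distance < n
    outputs[i] = inputs[offset];
  }
}
```

A negative `distance` makes `Offset` pull from predecessors instead of successors, which is why only its bounds check, and not a wraparound, is needed.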
xgboost/cub/cub/block/block_shuffle.cuh
* \brief The thread block rotates its [<em>blocked arrangement</em>](index.html#sec5sec3) of \p input items, shifting it up by one item
*
* \par
* - \blocked
* - \granularity
* - \smemreuse
*/
template <int ITEMS_PER_THREAD>
__device__ __forceinline__ void Up(
T (&input)[ITEMS_PER_THREAD], ///< [in] The calling thread's input items
T (&prev)[ITEMS_PER_THREAD]) ///< [out] The corresponding predecessor items (may be aliased to \p input). The item \p prev[0] is not updated for <em>thread</em><sub>0</sub>.
{
temp_storage[linear_tid].prev = input[ITEMS_PER_THREAD - 1];
CTA_SYNC();
#pragma unroll
for (int ITEM = ITEMS_PER_THREAD - 1; ITEM > 0; --ITEM)
prev[ITEM] = input[ITEM - 1];
xgboost/cub/cub/block/block_shuffle.cuh
* \brief The thread block rotates its [<em>blocked arrangement</em>](index.html#sec5sec3) of \p input items, shifting it up by one item. All threads receive the \p input provided by <em>thread</em><sub><tt>BLOCK_THREADS-1</tt></sub>.
*
* \par
* - \blocked
* - \granularity
* - \smemreuse
*/
template <int ITEMS_PER_THREAD>
__device__ __forceinline__ void Up(
T (&input)[ITEMS_PER_THREAD], ///< [in] The calling thread's input items
T (&prev)[ITEMS_PER_THREAD], ///< [out] The corresponding predecessor items (may be aliased to \p input). The item \p prev[0] is not updated for <em>thread</em><sub>0</sub>.
T &block_suffix) ///< [out] The item \p input[ITEMS_PER_THREAD-1] from <em>thread</em><sub><tt>BLOCK_THREADS-1</tt></sub>, provided to all threads
{
Up(input, prev);
block_suffix = temp_storage[BLOCK_THREADS - 1].prev;
}
/**
* \brief The thread block rotates its [<em>blocked arrangement</em>](index.html#sec5sec3) of \p input items, shifting it down by one item
*
* \par
* - \blocked
* - \granularity
* - \smemreuse
*/
template <int ITEMS_PER_THREAD>
__device__ __forceinline__ void Down(
T (&input)[ITEMS_PER_THREAD], ///< [in] The calling thread's input items
T (&prev)[ITEMS_PER_THREAD]) ///< [out] The corresponding successor items (may be aliased to \p input). The value \p prev[ITEMS_PER_THREAD-1] is not updated for <em>thread</em><sub>BLOCK_THREADS-1</sub>.
{
// Publish the first item so that the preceding thread can use it as its last successor
temp_storage[linear_tid].prev = input[0];
CTA_SYNC();
// Ascending order keeps the shift correct when prev aliases input
#pragma unroll
for (int ITEM = 0; ITEM < ITEMS_PER_THREAD - 1; ++ITEM)
prev[ITEM] = input[ITEM + 1];
if (linear_tid < BLOCK_THREADS - 1)
prev[ITEMS_PER_THREAD - 1] = temp_storage[linear_tid + 1].prev;
}
xgboost/cub/cub/block/block_shuffle.cuh
* \brief The thread block rotates its [<em>blocked arrangement</em>](index.html#sec5sec3) of input items, shifting it down by one item. All threads receive \p input[0] provided by <em>thread</em><sub><tt>0</tt></sub>.
*
* \par
* - \blocked
* - \granularity
* - \smemreuse
*/
template <int ITEMS_PER_THREAD>
__device__ __forceinline__ void Down(
T (&input)[ITEMS_PER_THREAD], ///< [in] The calling thread's input items
T (&prev)[ITEMS_PER_THREAD], ///< [out] The corresponding successor items (may be aliased to \p input). The value \p prev[ITEMS_PER_THREAD-1] is not updated for <em>thread</em><sub>BLOCK_THREADS-1</sub>.
T &block_prefix) ///< [out] The item \p input[0] from <em>thread</em><sub><tt>0</tt></sub>, provided to all threads
{
Down(input, prev);
block_prefix = temp_storage[0].prev;
}
//@} end member group
};
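The sketch below models the documented semantics of `Up` and `Down` on the host: `Up` gives each item its predecessor in the blocked arrangement and `Down` gives each item its successor, with only one boundary item per thread passing through shared memory. `ShiftUpModel`, `ShiftDownModel`, and `Tile` are hypothetical names; outputs start as a copy of the input, mimicking the aliased case where untouched slots keep their old values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// tile[t] holds thread t's items (blocked arrangement); `shared` stands in for
// temp_storage[t].prev. Assumes at least one thread and one item per thread.
using Tile = std::vector<std::vector<int>>;

// Up: every item receives its predecessor; thread 0's first slot is untouched.
Tile ShiftUpModel(const Tile& input, int& block_suffix) {
  const std::size_t n = input.size(), ipt = input.at(0).size();
  std::vector<int> shared(n);
  for (std::size_t t = 0; t < n; ++t) shared[t] = input[t][ipt - 1];  // publish last item
  Tile prev = input;
  for (std::size_t t = 0; t < n; ++t) {
    for (std::size_t i = ipt - 1; i > 0; --i) prev[t][i] = input[t][i - 1];
    if (t > 0) prev[t][0] = shared[t - 1];  // boundary item from thread t-1
  }
  block_suffix = shared[n - 1];  // last item of the last thread
  return prev;
}

// Down: every item receives its successor; the last thread's last slot is untouched.
Tile ShiftDownModel(const Tile& input, int& block_prefix) {
  const std::size_t n = input.size(), ipt = input.at(0).size();
  std::vector<int> shared(n);
  for (std::size_t t = 0; t < n; ++t) shared[t] = input[t][0];  // publish first item
  Tile next = input;
  for (std::size_t t = 0; t < n; ++t) {
    for (std::size_t i = 0; i + 1 < ipt; ++i) next[t][i] = input[t][i + 1];
    if (t + 1 < n) next[t][ipt - 1] = shared[t + 1];  // boundary item from thread t+1
  }
  block_prefix = shared[0];  // first item of thread 0
  return next;
}
```

For a tile {{1, 2}, {3, 4}, {5, 6}}, the up shift yields {{1, 1}, {2, 3}, {4, 5}} with `block_suffix == 6`, and the down shift yields {{2, 3}, {4, 5}, {6, 6}} with `block_prefix == 1`.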
xgboost/cub/cub/device/device_radix_sort.cuh
* \tparam KeyT <b>[inferred]</b> Key type
* \tparam ValueT <b>[inferred]</b> Value type
*/
template <
typename KeyT,
typename ValueT>
CUB_RUNTIME_FUNCTION
static cudaError_t SortPairs(
void *d_temp_storage, ///< [in] %Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to \p temp_storage_bytes and no work is done.
size_t &temp_storage_bytes, ///< [in,out] Reference to size in bytes of \p d_temp_storage allocation
DoubleBuffer<KeyT> &d_keys, ///< [in,out] Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted...
DoubleBuffer<ValueT> &d_values, ///< [in,out] Double-buffer of values whose "current" device-accessible buffer contains the unsorted input values and, upon return, is updated to point to the sorted output value...
int num_items, ///< [in] Number of items to sort
int begin_bit = 0, ///< [in] <b>[optional]</b> The least-significant bit index (inclusive) needed for key comparison
int end_bit = sizeof(KeyT) * 8, ///< [in] <b>[optional]</b> The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8)
cudaStream_t stream = 0, ///< [in] <b>[optional]</b> CUDA stream to launch kernels within. Default is stream<sub>0</sub>.
bool debug_synchronous = false) ///< [in] <b>[optional]</b> Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the consol...
{
// Signed integer type for global offsets
typedef int OffsetT;
return DispatchRadixSort<false, KeyT, ValueT, OffsetT>::Dispatch(
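`SortPairs` takes `DoubleBuffer` references and reports the sorted result through the buffers' "current" selector. A host-side sketch of why that works: a least-significant-digit radix sort over bits `[begin_bit, end_bit)` ping-pongs between two buffers, one stable counting-sort pass per digit, flipping the selector after each pass. `HostDoubleBuffer` and `RadixSortPairsModel` are hypothetical stand-ins, the 4-bit digit width is an arbitrary choice, and keys are assumed to be `std::uint32_t`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host stand-in for a double buffer: two buffers plus a selector.
template <typename T>
struct HostDoubleBuffer {
  std::vector<T> buffers[2];
  int selector = 0;
  std::vector<T>& Current() { return buffers[selector]; }
  std::vector<T>& Alternate() { return buffers[selector ^ 1]; }
};

// LSD radix sort of (key, value) pairs on bits [begin_bit, end_bit), 4 bits per pass.
// Each stable pass scatters from the current buffers into the alternate ones and
// then flips both selectors, so Current() holds the sorted output on return.
template <typename ValueT>
void RadixSortPairsModel(HostDoubleBuffer<std::uint32_t>& keys,
                         HostDoubleBuffer<ValueT>& values,
                         int begin_bit = 0, int end_bit = 32) {
  const int kRadixBits = 4;
  const std::size_t n = keys.Current().size();
  assert(values.Current().size() == n);
  for (int bit = begin_bit; bit < end_bit; bit += kRadixBits) {
    const int pass_bits = std::min(kRadixBits, end_bit - bit);
    const std::uint32_t mask = (1u << pass_bits) - 1u;
    // Histogram of digits, shifted by one so the prefix sum yields start offsets.
    std::vector<std::size_t> offsets((std::size_t{1} << pass_bits) + 1);
    for (std::uint32_t k : keys.Current()) ++offsets[((k >> bit) & mask) + 1];
    for (std::size_t d = 1; d < offsets.size(); ++d) offsets[d] += offsets[d - 1];
    keys.Alternate().resize(n);
    values.Alternate().resize(n);
    for (std::size_t i = 0; i < n; ++i) {  // stable scatter in input order
      const std::uint32_t digit = (keys.Current()[i] >> bit) & mask;
      const std::size_t dst = offsets[digit]++;
      keys.Alternate()[dst] = keys.Current()[i];
      values.Alternate()[dst] = values.Current()[i];
    }
    keys.selector ^= 1;
    values.selector ^= 1;
  }
}
```

Because each pass flips the selector, the caller reads `Current()` after the call instead of assuming which buffer holds the output. Restricting the bit range changes the ordering: sorting only bits [0, 4) orders keys by their low nibble, so {2, 17} becomes {17, 2}.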
xgboost/cub/cub/device/device_radix_sort.cuh
* \tparam KeyT <b>[inferred]</b> Key type
* \tparam ValueT <b>[inferred]</b> Value type
*/
template <
typename KeyT,
typename ValueT>
CUB_RUNTIME_FUNCTION
static cudaError_t SortPairsDescending(
void *d_temp_storage, ///< [in] %Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to \p temp_storage_bytes and no work is done.
size_t &temp_storage_bytes, ///< [in,out] Reference to size in bytes of \p d_temp_storage allocation
DoubleBuffer<KeyT> &d_keys, ///< [in,out] Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted...
DoubleBuffer<ValueT> &d_values, ///< [in,out] Double-buffer of values whose "current" device-accessible buffer contains the unsorted input values and, upon return, is updated to point to the sorted output value...
int num_items, ///< [in] Number of items to sort
int begin_bit = 0, ///< [in] <b>[optional]</b> The least-significant bit index (inclusive) needed for key comparison
int end_bit = sizeof(KeyT) * 8, ///< [in] <b>[optional]</b> The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8)
cudaStream_t stream = 0, ///< [in] <b>[optional]</b> CUDA stream to launch kernels within. Default is stream<sub>0</sub>.
bool debug_synchronous = false) ///< [in] <b>[optional]</b> Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the consol...
{
// Signed integer type for global offsets
typedef int OffsetT;
return DispatchRadixSort<true, KeyT, ValueT, OffsetT>::Dispatch(
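For the descending variant, one simple way to obtain a descending order from ascending machinery is to rank keys by the bitwise complement of their bits in `[begin_bit, end_bit)`. The sketch assumes `std::uint32_t` keys and uses `std::stable_sort` in place of radix passes; it is not a claim about how `SortPairsDescending` is implemented internally, and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mask covering the `width` low bits (assumes 0 < width <= 32).
std::uint32_t LowMask(int width) {
  return width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

// Complementing the bits in [begin_bit, end_bit) reverses their numeric order,
// so ascending order on this rank is descending order on those key bits.
std::uint32_t DescendingRank(std::uint32_t key, int begin_bit, int end_bit) {
  return ~(key >> begin_bit) & LowMask(end_bit - begin_bit);
}

// Illustrative descending pairs sort; std::stable_sort stands in for radix passes.
template <typename ValueT>
void SortPairsDescendingModel(std::vector<std::uint32_t>& keys, std::vector<ValueT>& values,
                              int begin_bit = 0, int end_bit = 32) {
  assert(keys.size() == values.size());
  std::vector<std::size_t> order(keys.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
  std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return DescendingRank(keys[a], begin_bit, end_bit) <
           DescendingRank(keys[b], begin_bit, end_bit);
  });
  std::vector<std::uint32_t> sorted_keys;
  std::vector<ValueT> sorted_values;
  for (std::size_t i : order) {  // gather keys and values in the new order
    sorted_keys.push_back(keys[i]);
    sorted_values.push_back(values[i]);
  }
  keys.swap(sorted_keys);
  values.swap(sorted_values);
}
```

Keys {5, 3, 17, 3, 0} paired with 'a' through 'e' come out as {17, 5, 3, 3, 0} with values {'c', 'a', 'b', 'd', 'e'}; in this model the two equal keys keep their input order because the sort is stable.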
xgboost/cub/cub/device/device_radix_sort.cuh
*
* \endcode
*
* \tparam KeyT <b>[inferred]</b> Key type
*/
template <typename KeyT>
CUB_RUNTIME_FUNCTION
static cudaError_t SortKeys(
void *d_temp_storage, ///< [in] %Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to \p temp_storage_bytes and no work is done.
size_t &temp_storage_bytes, ///< [in,out] Reference to size in bytes of \p d_temp_storage allocation
DoubleBuffer<KeyT> &d_keys, ///< [in,out] Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted out...
int num_items, ///< [in] Number of items to sort
int begin_bit = 0, ///< [in] <b>[optional]</b> The least-significant bit index (inclusive) needed for key comparison
int end_bit = sizeof(KeyT) * 8, ///< [in] <b>[optional]</b> The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8)
cudaStream_t stream = 0, ///< [in] <b>[optional]</b> CUDA stream to launch kernels within. Default is stream<sub>0</sub>.
bool debug_synchronous = false) ///< [in] <b>[optional]</b> Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. ...
{
// Signed integer type for global offsets
typedef int OffsetT;
// Null value type
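All of these entry points share the two-phase temporary-storage convention documented for `d_temp_storage`: a first call with a null pointer only writes the required size to `temp_storage_bytes`, and a second call with an allocation of that size does the work. The host-side function below is hypothetical (not a CUB API) and only illustrates the calling pattern; `Status` stands in for `cudaError_t`, and `std::sort` stands in for the actual sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in error codes for this sketch (hypothetical, in place of cudaError_t).
enum class Status { kSuccess, kInsufficientStorage };

// Hypothetical function following the two-phase temp-storage convention.
Status SortKeysHypothetical(void* d_temp_storage, std::size_t& temp_storage_bytes,
                            std::vector<std::uint32_t>& keys) {
  const std::size_t required = keys.size() * sizeof(std::uint32_t);
  if (d_temp_storage == nullptr) {  // phase 1: size query only, no work done
    temp_storage_bytes = required;
    return Status::kSuccess;
  }
  if (temp_storage_bytes < required) return Status::kInsufficientStorage;
  if (keys.empty()) return Status::kSuccess;
  // Phase 2: use the caller-provided scratch space to do the work.
  auto* scratch = static_cast<std::uint32_t*>(d_temp_storage);
  std::memcpy(scratch, keys.data(), required);
  std::sort(scratch, scratch + keys.size());
  std::memcpy(keys.data(), scratch, required);
  return Status::kSuccess;
}
```

On the device the same sequence applies, with the second call receiving a device allocation of `temp_storage_bytes` bytes.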
xgboost/cub/cub/device/device_radix_sort.cuh
*
* \endcode
*
* \tparam KeyT <b>[inferred]</b> Key type
*/
template <typename KeyT>
CUB_RUNTIME_FUNCTION
static cudaError_t SortKeysDescending(
void *d_temp_storage, ///< [in] %Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to \p temp_storage_bytes and no work is done.
size_t &temp_storage_bytes, ///< [in,out] Reference to size in bytes of \p d_temp_storage allocation
DoubleBuffer<KeyT> &d_keys, ///< [in,out] Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted out...
int num_items, ///< [in] Number of items to sort
int begin_bit = 0, ///< [in] <b>[optional]</b> The least-significant bit index (inclusive) needed for key comparison
int end_bit = sizeof(KeyT) * 8, ///< [in] <b>[optional]</b> The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8)
cudaStream_t stream = 0, ///< [in] <b>[optional]</b> CUDA stream to launch kernels within. Default is stream<sub>0</sub>.
bool debug_synchronous = false) ///< [in] <b>[optional]</b> Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. ...
{
// Signed integer type for global offsets
typedef int OffsetT;
// Null value type
xgboost/cub/cub/device/device_segmented_radix_sort.cuh
* \tparam KeyT <b>[inferred]</b> Key type
* \tparam ValueT <b>[inferred]</b> Value type
*/
template <
typename KeyT,
typename ValueT>
CUB_RUNTIME_FUNCTION
static cudaError_t SortPairs(
void *d_temp_storage, ///< [in] %Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to \p temp_storage_bytes and no work is done.
size_t &temp_storage_bytes, ///< [in,out] Reference to size in bytes of \p d_temp_storage allocation
DoubleBuffer<KeyT> &d_keys, ///< [in,out] Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted...
DoubleBuffer<ValueT> &d_values, ///< [in,out] Double-buffer of values whose "current" device-accessible buffer contains the unsorted input values and, upon return, is updated to point to the sorted output value...
int num_items, ///< [in] The total number of items to sort (across all segments)
int num_segments, ///< [in] The number of segments that comprise the sorting data
const int *d_begin_offsets, ///< [in] %Device-accessible pointer to the sequence of beginning offsets of length \p num_segments, such that <tt>d_begin_offsets[i]</tt> is the first element of the <em>i</em><...
const int *d_end_offsets, ///< [in] %Device-accessible pointer to the sequence of ending offsets of length \p num_segments, such that <tt>d_end_offsets[i]-1</tt> is the last element of the <em>i</em><sup>...
int begin_bit = 0, ///< [in] <b>[optional]</b> The least-significant bit index (inclusive) needed for key comparison
int end_bit = sizeof(KeyT) * 8, ///< [in] <b>[optional]</b> The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8)
cudaStream_t stream = 0, ///< [in] <b>[optional]</b> CUDA stream to launch kernels within. Default is stream<sub>0</sub>.
bool debug_synchronous = false) ///< [in] <b>[optional]</b> Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the consol...
{
// Signed integer type for global offsets
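The segmented variants describe segment i as the half-open range `[d_begin_offsets[i], d_end_offsets[i])`. A host-side sketch of that convention follows; `SegmentedSortKeysModel` is hypothetical, `std::stable_sort` replaces the per-segment radix sort, and the model's handling of empty segments and of items outside every segment is its own choice rather than documented CUB behavior. Passing `offsets` and `offsets + 1` as the begin and end arrays is a common way to describe contiguous segments with a single offsets array of length `num_segments + 1`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host model: segment i covers [begin_offsets[i], end_offsets[i]) and is sorted
// on its own. Empty segments are skipped; items outside every segment are left
// untouched by this model.
void SegmentedSortKeysModel(std::vector<std::uint32_t>& keys,
                            const int* begin_offsets, const int* end_offsets,
                            int num_segments) {
  for (int i = 0; i < num_segments; ++i) {
    const int begin = begin_offsets[i], end = end_offsets[i];
    if (end <= begin) continue;  // empty segment
    assert(begin >= 0 && static_cast<std::size_t>(end) <= keys.size());
    std::stable_sort(keys.begin() + begin, keys.begin() + end);
  }
}
```

With keys {3, 1, 2, 9, 8, 7} and offsets {0, 3, 5}, the two segments become {1, 2, 3} and {8, 9}, while the trailing 7 lies outside both segments and stays in place.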
xgboost/cub/cub/device/device_segmented_radix_sort.cuh
* \tparam KeyT <b>[inferred]</b> Key type
* \tparam ValueT <b>[inferred]</b> Value type
*/
template <
typename KeyT,
typename ValueT>
CUB_RUNTIME_FUNCTION
static cudaError_t SortPairsDescending(
void *d_temp_storage, ///< [in] %Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to \p temp_storage_bytes and no work is done.
size_t &temp_storage_bytes, ///< [in,out] Reference to size in bytes of \p d_temp_storage allocation
DoubleBuffer<KeyT> &d_keys, ///< [in,out] Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted...
DoubleBuffer<ValueT> &d_values, ///< [in,out] Double-buffer of values whose "current" device-accessible buffer contains the unsorted input values and, upon return, is updated to point to the sorted output value...
int num_items, ///< [in] The total number of items to sort (across all segments)
int num_segments, ///< [in] The number of segments that comprise the sorting data
const int *d_begin_offsets, ///< [in] %Device-accessible pointer to the sequence of beginning offsets of length \p num_segments, such that <tt>d_begin_offsets[i]</tt> is the first element of the <em>i</em><...
const int *d_end_offsets, ///< [in] %Device-accessible pointer to the sequence of ending offsets of length \p num_segments, such that <tt>d_end_offsets[i]-1</tt> is the last element of the <em>i</em><sup>...
int begin_bit = 0, ///< [in] <b>[optional]</b> The least-significant bit index (inclusive) needed for key comparison
int end_bit = sizeof(KeyT) * 8, ///< [in] <b>[optional]</b> The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8)
cudaStream_t stream = 0, ///< [in] <b>[optional]</b> CUDA stream to launch kernels within. Default is stream<sub>0</sub>.
bool debug_synchronous = false) ///< [in] <b>[optional]</b> Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the consol...
{
// Signed integer type for global offsets
xgboost/cub/cub/device/device_segmented_radix_sort.cuh
*
* \endcode
*
* \tparam KeyT <b>[inferred]</b> Key type
*/
template <typename KeyT>
CUB_RUNTIME_FUNCTION
static cudaError_t SortKeys(
void *d_temp_storage, ///< [in] %Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to \p temp_storage_bytes and no work is done.
size_t &temp_storage_bytes, ///< [in,out] Reference to size in bytes of \p d_temp_storage allocation
DoubleBuffer<KeyT> &d_keys, ///< [in,out] Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted out...
int num_items, ///< [in] The total number of items to sort (across all segments)
int num_segments, ///< [in] The number of segments that comprise the sorting data
const int *d_begin_offsets, ///< [in] %Device-accessible pointer to the sequence of beginning offsets of length \p num_segments, such that <tt>d_begin_offsets[i]</tt> is the first element of the <em>i</em><sup>...
const int *d_end_offsets, ///< [in] %Device-accessible pointer to the sequence of ending offsets of length \p num_segments, such that <tt>d_end_offsets[i]-1</tt> is the last element of the <em>i</em><sup>th</...
int begin_bit = 0, ///< [in] <b>[optional]</b> The least-significant bit index (inclusive) needed for key comparison
int end_bit = sizeof(KeyT) * 8, ///< [in] <b>[optional]</b> The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8)
cudaStream_t stream = 0, ///< [in] <b>[optional]</b> CUDA stream to launch kernels within. Default is stream<sub>0</sub>.
bool debug_synchronous = false) ///< [in] <b>[optional]</b> Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. ...
{
// Signed integer type for global offsets
xgboost/cub/cub/device/device_segmented_radix_sort.cuh
*
* \endcode
*
* \tparam KeyT <b>[inferred]</b> Key type
*/
template <typename KeyT>
CUB_RUNTIME_FUNCTION
static cudaError_t SortKeysDescending(
void *d_temp_storage, ///< [in] %Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to \p temp_storage_bytes and no work is done.
size_t &temp_storage_bytes, ///< [in,out] Reference to size in bytes of \p d_temp_storage allocation
DoubleBuffer<KeyT> &d_keys, ///< [in,out] Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted out...
int num_items, ///< [in] The total number of items to sort (across all segments)
int num_segments, ///< [in] The number of segments that comprise the sorting data
const int *d_begin_offsets, ///< [in] %Device-accessible pointer to the sequence of beginning offsets of length \p num_segments, such that <tt>d_begin_offsets[i]</tt> is the first element of the <em>i</em><sup>...
const int *d_end_offsets, ///< [in] %Device-accessible pointer to the sequence of ending offsets of length \p num_segments, such that <tt>d_end_offsets[i]-1</tt> is the last element of the <em>i</em><sup>th</...
int begin_bit = 0, ///< [in] <b>[optional]</b> The least-significant bit index (inclusive) needed for key comparison
int end_bit = sizeof(KeyT) * 8, ///< [in] <b>[optional]</b> The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8)
cudaStream_t stream = 0, ///< [in] <b>[optional]</b> CUDA stream to launch kernels within. Default is stream<sub>0</sub>.
bool debug_synchronous = false) ///< [in] <b>[optional]</b> Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. ...
{
// Signed integer type for global offsets
xgboost/cub/cub/device/dispatch/dispatch_radix_sort.cuh
KEYS_ONLY = (Equals<ValueT, NullType>::VALUE),
};
//------------------------------------------------------------------------------
// Problem state
//------------------------------------------------------------------------------
void *d_temp_storage; ///< [in] %Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to \p temp_storage_bytes and no work is done.
size_t &temp_storage_bytes; ///< [in,out] Reference to size in bytes of \p d_temp_storage allocation
DoubleBuffer<KeyT> &d_keys; ///< [in,out] Double-buffer whose current buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys
DoubleBuffer<ValueT> &d_values; ///< [in,out] Double-buffer whose current buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values
OffsetT num_items; ///< [in] Number of items to sort
int begin_bit; ///< [in] The beginning (least-significant) bit index needed for key comparison
int end_bit; ///< [in] The past-the-end (most-significant) bit index needed for key comparison
cudaStream_t stream; ///< [in] CUDA stream to launch kernels within. Default is stream<sub>0</sub>.
bool debug_synchronous; ///< [in] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is \p false.
int ptx_version; ///< [in] PTX version
bool is_overwrite_okay; ///< [in] Whether it is okay to overwrite source buffers
//------------------------------------------------------------------------------
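`KEYS_ONLY` is computed at compile time by comparing the value type with `NullType`, which is how a keys-only sort can reuse the pairs machinery. The snippet below writes out hypothetical stand-ins for `NullType` and `Equals` (modeled on the usage shown above, not copied from CUB) to illustrate the idea.

```cpp
#include <type_traits>

// Hypothetical stand-ins for NullType and Equals, for illustration only.
struct NullType {};

template <typename A, typename B>
struct Equals { enum { VALUE = 0 }; };

template <typename A>
struct Equals<A, A> { enum { VALUE = 1 }; };  // partial specialization: same type

// A keys-only sort is a pairs sort whose value type is NullType, so the
// dispatcher can tell at compile time whether values need to be moved at all.
template <typename KeyT, typename ValueT>
struct DispatchTraitsModel {
  enum { KEYS_ONLY = Equals<ValueT, NullType>::VALUE };
};

static_assert(DispatchTraitsModel<unsigned, NullType>::KEYS_ONLY == 1, "keys only");
static_assert(DispatchTraitsModel<unsigned, float>::KEYS_ONLY == 0, "key-value pairs");
// In standard C++ the same test can be written with std::is_same.
static_assert(std::is_same<NullType, NullType>::value, "equivalent check");
```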
xgboost/cub/cub/device/dispatch/dispatch_radix_sort.cuh
// Dispatch entrypoints
//------------------------------------------------------------------------------
/**
* Internal dispatch routine
*/
CUB_RUNTIME_FUNCTION __forceinline__
static cudaError_t Dispatch(
void* d_temp_storage, ///< [in] %Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to \p temp_storage_bytes and no work is done.
size_t &temp_storage_bytes, ///< [in,out] Reference to size in bytes of \p d_temp_storage allocation
DoubleBuffer<KeyT> &d_keys, ///< [in,out] Double-buffer whose current buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys
DoubleBuffer<ValueT> &d_values, ///< [in,out] Double-buffer whose current buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values
OffsetT num_items, ///< [in] Number of items to sort
int begin_bit, ///< [in] The beginning (least-significant) bit index needed for key comparison
int end_bit, ///< [in] The past-the-end (most-significant) bit index needed for key comparison
bool is_overwrite_okay, ///< [in] Whether it is okay to overwrite source buffers
cudaStream_t stream, ///< [in] CUDA stream to launch kernels within. Default is stream<sub>0</sub>.
bool debug_synchronous) ///< [in] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is \p false.
{
typedef typename DispatchRadixSort::MaxPolicy MaxPolicyT;
cudaError_t error;
xgboost/cub/cub/device/dispatch/dispatch_radix_sort.cuh
KEYS_ONLY = (Equals<ValueT, NullType>::VALUE),
};
//------------------------------------------------------------------------------
// Parameter members
//------------------------------------------------------------------------------
void *d_temp_storage; ///< [in] %Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to \p temp_storage_bytes and no work is done.
size_t &temp_storage_bytes; ///< [in,out] Reference to size in bytes of \p d_temp_storage allocation
DoubleBuffer<KeyT> &d_keys; ///< [in,out] Double-buffer whose current buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys
DoubleBuffer<ValueT> &d_values; ///< [in,out] Double-buffer whose current buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values
OffsetT num_items; ///< [in] Number of items to sort
OffsetT num_segments; ///< [in] The number of segments that comprise the sorting data
const OffsetT *d_begin_offsets; ///< [in] %Device-accessible pointer to the sequence of beginning offsets of length \p num_segments, such that <tt>d_begin_offsets[i]</tt> is the first element of the <em>i</em><sup>th</sup> data se...
const OffsetT *d_end_offsets; ///< [in] %Device-accessible pointer to the sequence of ending offsets of length \p num_segments, such that <tt>d_end_offsets[i]-1</tt> is the last element of the <em>i</em><sup>th</sup> data segmen...
int begin_bit; ///< [in] The beginning (least-significant) bit index needed for key comparison
int end_bit; ///< [in] The past-the-end (most-significant) bit index needed for key comparison
cudaStream_t stream; ///< [in] CUDA stream to launch kernels within. Default is stream<sub>0</sub>.
bool debug_synchronous; ///< [in] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is \p false.
int ptx_version; ///< [in] PTX version
bool is_overwrite_okay; ///< [in] Whether it is okay to overwrite source buffers
xgboost/cub/cub/device/dispatch/dispatch_radix_sort.cuh
//------------------------------------------------------------------------------
// Dispatch entrypoints
//------------------------------------------------------------------------------
/// Internal dispatch routine
CUB_RUNTIME_FUNCTION __forceinline__
static cudaError_t Dispatch(
void* d_temp_storage, ///< [in] %Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to \p temp_storage_bytes and no work is done.
size_t &temp_storage_bytes, ///< [in,out] Reference to size in bytes of \p d_temp_storage allocation
DoubleBuffer<KeyT> &d_keys, ///< [in,out] Double-buffer whose current buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys
DoubleBuffer<ValueT> &d_values, ///< [in,out] Double-buffer whose current buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values
int num_items, ///< [in] Number of items to sort
int num_segments, ///< [in] The number of segments that comprise the sorting data
const int *d_begin_offsets, ///< [in] %Device-accessible pointer to the sequence of beginning offsets of length \p num_segments, such that <tt>d_begin_offsets[i]</tt> is the first element of the <em>i</em><sup>th</sup> dat...
const int *d_end_offsets, ///< [in] %Device-accessible pointer to the sequence of ending offsets of length \p num_segments, such that <tt>d_end_offsets[i]-1</tt> is the last element of the <em>i</em><sup>th</sup> data se...
int begin_bit, ///< [in] The beginning (least-significant) bit index needed for key comparison
int end_bit, ///< [in] The past-the-end (most-significant) bit index needed for key comparison
bool is_overwrite_okay, ///< [in] Whether it is okay to overwrite source buffers
cudaStream_t stream, ///< [in] CUDA stream to launch kernels within. Default is stream<sub>0</sub>.
bool debug_synchronous) ///< [in] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is \p false.
{
xgboost/cub/test/test_block_scan.cu
{
// Copy out and display block_aggregate
printf("\tScan block aggregate: ");
compare = CompareDeviceResults(h_aggregate, d_aggregate, BLOCK_THREADS, g_verbose, g_verbose);
printf("%s\n", compare ? "FAIL" : "PASS");
AssertEquals(0, compare);
}
if (TEST_MODE == PREFIX)
{
// Copy out and display updated prefix
printf("\tScan running total: ");
T running_total = scan_op(initial_value, block_aggregate);
compare = CompareDeviceResults(&running_total, d_out + TILE_SIZE, 1, g_verbose, g_verbose);
printf("%s\n", compare ? "FAIL" : "PASS");
AssertEquals(0, compare);
}
printf("\tElapsed clocks: ");
DisplayDeviceResults(d_elapsed, 1);
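In `PREFIX` mode, the test expects the value stored just past the tile to equal `scan_op(initial_value, block_aggregate)`. The sequential sketch below illustrates that identity. It assumes an exclusive scan seeded with `initial_value` and an associative `scan_op`. The helper `ExclusiveScanWithPrefix` is hypothetical and is not a CUB API:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical sequential model of a block-wide exclusive scan seeded with a
// running prefix. Writes each item's exclusive prefix into `out`, stores the
// updated running total in `running_total`, and returns the block aggregate
// (scan_op folded over the tile's inputs only).
template <typename T, typename Op>
T ExclusiveScanWithPrefix(const std::vector<T>& in, std::vector<T>& out,
                          T initial_value, Op scan_op, T& running_total) {
  out.resize(in.size());
  T running = initial_value;
  T aggregate{};
  for (size_t i = 0; i < in.size(); ++i) {
    out[i] = running;
    running = scan_op(running, in[i]);
    aggregate = (i == 0) ? in[i] : scan_op(aggregate, in[i]);
  }
  running_total = running;
  return aggregate;
}
```

Consider inputs `{1, 2, 3, 4}` with `initial_value = 10` and addition. The aggregate is 10, and the running total is 20, which equals `scan_op(10, 10)`.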
xgboost/doc/R-package/xgboostPresentation.md
* `xgb.DMatrix`: its own class (recommended).
* Sparsity: it accepts *sparse* input for both *tree booster* and *linear booster*, and is optimized for *sparse* input;
* Customization: it supports customized objective functions and evaluation functions.
## Installation
### GitHub version
For the weekly updated version (highly recommended), install from *GitHub*:
```r
install.packages("drat", repos="https://cran.rstudio.com")
drat:::addRepo("dmlc")
install.packages("xgboost", repos="http://dmlc.ml/drat/", type = "source")
```
> *Windows* users will need to install [Rtools](http://cran.r-project.org/bin/windows/Rtools/) first.
xgboost/doc/build.md
## R Package Installation
### Installing pre-packaged version
You can install xgboost from CRAN just like any other R package:
```r
install.packages("xgboost")
```
Or you can install it from our weekly updated drat repo:
```r
install.packages("drat", repos="https://cran.rstudio.com")
drat:::addRepo("dmlc")
install.packages("xgboost", repos="http://dmlc.ml/drat/", type = "source")
```
For OSX users, the single-threaded version will be installed. To install the multi-threaded version,
first follow [Building on OSX](#building-on-osx) to get the OpenMP-enabled compiler, then:
xgboost/doc/parameter.md
- 'grow_local_histmaker': based on local histogram counting.
- 'grow_skmaker': uses the approximate sketching algorithm.
- 'sync': synchronizes trees in all distributed nodes.
- 'refresh': refreshes the tree's statistics and/or leaf values based on the current data. Note that no random subsampling of data rows is performed.
- 'prune': prunes the splits where loss < min_split_loss (or gamma).
- In a distributed setting, the implicit updater sequence value is adjusted as follows:
  - 'grow_histmaker,prune' when dsplit='row' (or default) and prob_buffer_row == 1 (or default), or when the data has multiple sparse pages
  - 'grow_histmaker,refresh,prune' when dsplit='row' and prob_buffer_row < 1
  - 'distcol' when dsplit='col'
* refresh_leaf, [default=1]
- This is a parameter of the 'refresh' updater plugin. When this flag is true, tree leaves as well as tree nodes' stats are updated. When it is false, only node stats are updated.
* process_type, [default='default']
- A type of boosting process to run.
- Choices: {'default', 'update'}
- 'default': the normal boosting process which creates new trees.
- 'update': starts from an existing model and only updates its trees. In each boosting iteration, a tree from the initial model is taken, a specified sequence of updater plugins is run for that tree, and a modified tree is added to the new model....
* grow_policy, string [default='depthwise']
- Controls the way new nodes are added to the tree.
- Currently supported only if `tree_method` is set to 'hist'.
- Choices: {'depthwise', 'lossguide'}
- 'depthwise': split at nodes closest to the root.
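The distributed-setting rules for the implicit updater sequence, listed under `updater` above, amount to a small decision function. The sketch below restates them in C++. `ImplicitDistributedUpdater` is a hypothetical name and not part of xgboost. The documentation does not say which condition wins when the data has multiple sparse pages and `prob_buffer_row < 1` at the same time. This sketch assumes that multiple sparse pages take precedence:

```cpp
#include <cassert>
#include <string>

// Hypothetical restatement of the documented defaults; not xgboost code.
// Parameter names mirror the documentation (dsplit, prob_buffer_row).
inline std::string ImplicitDistributedUpdater(const std::string& dsplit,
                                              float prob_buffer_row,
                                              bool multiple_sparse_pages) {
  if (dsplit == "col") return "distcol";
  // dsplit == "row" (or the default). Assumption: multiple sparse pages
  // take precedence over prob_buffer_row < 1.
  if (multiple_sparse_pages || prob_buffer_row >= 1.0f) {
    return "grow_histmaker,prune";
  }
  return "grow_histmaker,refresh,prune";
}
```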
xgboost/include/xgboost/gbm.h
* \param fi input stream.
*/
virtual void Load(dmlc::Stream* fi) = 0;
/*!
* \brief save model to stream.
* \param fo output stream
*/
virtual void Save(dmlc::Stream* fo) const = 0;
/*!
* \brief whether the model allows lazy checkpointing
* return true if the model is only updated in DoBoost
* after all Allreduce calls
*/
virtual bool AllowLazyCheckPoint() const {
return false;
}
/*!
* \brief perform an update to the model (boosting)
* \param p_fmat feature matrix that provides access to features
* \param in_gpair address of the gradient pair statistics of the data
* \param obj the objective function; optional, can be nullptr when a customized objective is used
xgboost/include/xgboost/tree_updater.h
virtual ~TreeUpdater() {}
/*!
* \brief Initialize the updater with given arguments.
* \param args arguments to the updater.
*/
virtual void Init(const std::vector<std::pair<std::string, std::string> >& args) = 0;
/*!
* \brief perform an update to the tree models
* \param gpair the gradient pair statistics of the data
* \param data The data matrix passed to the updater.
* \param trees references to the trees to be updated; the updater will change their contents.
* note: all the trees in the vector are updated with the same statistics,
* but possibly different random seeds. Usually one tree is passed in at a time;
* there can be multiple trees when training a random-forest-style model.
*/
virtual void Update(const std::vector<bst_gpair>& gpair,
DMatrix* data,
const std::vector<RegTree*>& trees) = 0;
/*!
* \brief determines whether the updater has enough knowledge about a given dataset
* to quickly update the prediction cache for its training data, and performs the
* update if possible.
* \param data: data matrix
* \param out_preds: prediction cache to be updated
* \return boolean indicating whether updater has capability to update
* the prediction cache. If true, the prediction cache will have been
* updated by the time this function returns.
*/
virtual bool UpdatePredictionCache(const DMatrix* data,
std::vector<bst_float>* out_preds) {
return false;
}
/*!
* \brief Create a tree updater given name
* \param name Name of the tree updater.
*/
static TreeUpdater* Create(const std::string& name);
xgboost/jvm-packages/xgboost4j/src/main/scala/ml/dmlc/xgboost4j/scala/rabit/handler/RabitTrackerHandler.scala
.filter{ r => r != -1 && r < rank}
log.debug(s"Rank $rank connected, dependencies: $dependentWorkers")
dependencyMap.put(rank, dependentWorkers)
case RequestAwaitConnWorkers(rank, toConnectSet) =>
val promise = Promise[AwaitingConnections]()
assert(dependencyMap.contains(rank))
val updatedDependency = dependencyMap(rank) diff startedWorkers
if (updatedDependency.isEmpty) {
// all dependencies are satisfied
log.debug(s"Rank $rank has all dependencies satisfied.")
promise.success(awaitingWorkers(toConnectSet))
} else {
log.debug(s"Rank $rank's request for AwaitConnWorkers is pending fulfillment.")
// promise is pending fulfillment due to unresolved dependency
pendingFulfillment.put(rank, Fulfillment(toConnectSet, promise))
}
sender() ! promise.future
case WorkerStarted(_, started, awaitingAcceptance) =>
startedWorkers.add(started)
if (awaitingAcceptance > 0) {
awaitConnWorkers.put(started, sender())
}
// remove the started rank from all dependencies.
dependencyMap.remove(started)
dependencyMap.foreach { case (r, dset) =>
val updatedDependency = dset diff startedWorkers
// fulfill the promise if all dependencies are met (i.e., have started).
if (updatedDependency.isEmpty) {
log.debug(s"Rank $r has all dependencies satisfied.")
pendingFulfillment.remove(r).map{
case Fulfillment(toConnectSet, promise) =>
promise.success(awaitingWorkers(toConnectSet))
}
}
dependencyMap.update(r, updatedDependency)
}
case DropFromWaitingList(rank) =>
assert(awaitConnWorkers.remove(rank).isDefined)
case Terminated(ref) =>
if (ref.equals(handler)) {
context.stop(self)
}
}
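The tracker handler above has each rank wait only for the lower-ranked workers it depends on. It releases a pending request once all of those workers have started. Below is a simplified C++ model of that bookkeeping. The class `DependencyTracker` is hypothetical. It drops the Akka messaging and promises, and it returns the released ranks instead of completing futures. It also filters out already-started workers at registration time, whereas the handler does this when the request arrives:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical model of the tracker's dependency bookkeeping; not xgboost4j code.
class DependencyTracker {
 public:
  // Record the neighbors of `rank`; only valid ranks lower than `rank` that
  // have not started yet become dependencies (mirrors `r != -1 && r < rank`).
  void Register(int rank, const std::vector<int>& neighbors) {
    std::set<int> deps;
    for (int r : neighbors) {
      if (r != -1 && r < rank && started_.count(r) == 0) deps.insert(r);
    }
    deps_[rank] = deps;
  }

  // True if every dependency of `rank` has already started.
  bool Ready(int rank) const {
    auto it = deps_.find(rank);
    return it == deps_.end() || it->second.empty();
  }

  // Mark `rank` as started, remove it from every dependency set, and return
  // the ranks whose last outstanding dependency was just satisfied.
  std::vector<int> WorkerStarted(int rank) {
    started_.insert(rank);
    deps_.erase(rank);
    std::vector<int> released;
    for (auto& kv : deps_) {
      if (kv.second.erase(rank) > 0 && kv.second.empty()) {
        released.push_back(kv.first);
      }
    }
    return released;
  }

 private:
  std::map<int, std::set<int>> deps_;  // rank -> outstanding dependencies
  std::set<int> started_;              // ranks that have started
};
```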
xgboost/nccl/src/primitives.h
*
* In order to reduce the repetition of template arguments, the operations
* are bundled as static methods of the Primitives class.
*
* Each primitive operation copies/reduces a contiguous buffer and syncs
* an optional set of flags against a sub-step counter. The sync value is
* based on the step parameter. Sync flags must be of type WaitFlag or
* PostFlag. The primitive routines wait for all WaitFlag args to attain
* at least a value of SUBSTEPS*(step-1)+substep+1 (i.e. completion of
* corresponding substep by previous step) before executing the transfer.
* After each substep is transferred, all PostFlag arguments get updated to
* the value SUBSTEPS*step+substep+1.
*/
class WaitFlag {
volatile int * const flag;
const int shift;
public:
__device__ __forceinline__
WaitFlag(volatile int * const flag, const int shift) : flag(flag), shift(shift) { }
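The sync values in the primitives comment line up exactly. A step's wait target equals what peers posted after the same substep of the previous step. The sketch below checks that arithmetic. `SUBSTEPS` is fixed to an arbitrary illustrative value, and `kSubsteps`, `WaitTarget` and `PostValue` are hypothetical names, not NCCL code:

```cpp
#include <cassert>

// Illustrative stand-in for SUBSTEPS; the value is arbitrary.
constexpr int kSubsteps = 2;

// Value a WaitFlag must reach before substep `substep` of `step` may run:
// SUBSTEPS*(step-1)+substep+1.
constexpr int WaitTarget(int step, int substep) {
  return kSubsteps * (step - 1) + substep + 1;
}

// Value written to every PostFlag after substep `substep` of `step`:
// SUBSTEPS*step+substep+1.
constexpr int PostValue(int step, int substep) {
  return kSubsteps * step + substep + 1;
}

// Waiting on (step, substep) means waiting for the peer's post of
// (step - 1, substep).
static_assert(WaitTarget(3, 1) == PostValue(2, 1), "wait matches prior post");
```

Because each post value is one larger than the previous substep's, the counter increases by exactly one per substep, including across step boundaries.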
xgboost/plugin/updater_gpu/src/exact/gpu_builder.cuh
__global__ void assignNodeIds(node_id_t* nodeIdsPerInst, int* nodeLocations,
const node_id_t* nodeIds, const int* instId,
const Node<node_id_t>* nodes,
const int* colOffsets, const float* vals,
int nVals, int nCols) {
int id = threadIdx.x + (blockIdx.x * blockDim.x);
const int stride = blockDim.x * gridDim.x;
for (; id < nVals; id += stride) {
// fusing generation of indices for node locations
nodeLocations[id] = id;
// using nodeIds here since the previous kernel would have updated
// the nodeIdsPerInst with all default assignments
int nId = nodeIds[id];
// skip elements that belong to none of the currently active node-ids
if (nId != UNUSED_NODE) {
const Node<node_id_t> n = nodes[nId];
int colId = n.colIdx;
// printf("nid=%d colId=%d id=%d\n", nId, colId, id);
int start = colOffsets[colId];
int end = colOffsets[colId + 1];
///@todo: too many wasted threads!!
xgboost/plugin/updater_gpu/src/exact/split2node.cuh
#include "../../../../src/tree/param.h"
#include "node.cuh"
namespace xgboost {
namespace tree {
namespace exact {
/**
* @brief Helper function to update the child node based on the current status
* of its parent node
* @param nodes the nodes array in which the position at 'nid' will be updated
* @param nid the nodeId in the 'nodes' array corresponding to this child node
* @param grad gradient sum for this child node
* @param param the training parameter struct (minimum child weight, L1/L2 regularizers, max weight step, etc.)
*/
template <typename node_id_t>
DEV_INLINE void updateOneChildNode(Node<node_id_t>* nodes, int nid,
const bst_gpair& grad,
const TrainParam& param) {
nodes[nid].gradSum = grad;
nodes[nid].score = CalcGain(param, grad.grad, grad.hess);
nodes[nid].weight = CalcWeight(param, grad.grad, grad.hess);
nodes[nid].id = nid;
}
/**
* @brief Helper function to update the child nodes based on the current status
* of their parent node
* @param nodes the nodes array in which the two child positions of 'pid' will be updated
* @param pid the nodeId of the parent
* @param gradL gradient sum for the left child node
* @param gradR gradient sum for the right child node
* @param param the training parameter struct
*/
template <typename node_id_t>
DEV_INLINE void updateChildNodes(Node<node_id_t>* nodes, int pid,
const bst_gpair& gradL, const bst_gpair& gradR,
const TrainParam& param) {
int childId = (pid * 2) + 1;
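`updateChildNodes` places the left child of `pid` at `2 * pid + 1`, which is an implicit heap-style layout. The sketch below assumes the right child sits at `2 * pid + 2`. It pairs that layout with simplified weight and gain formulas that use only the L2 term `lambda`. `LeftChild`, `RightChild`, `Parent`, `SimpleWeight` and `SimpleGain` are hypothetical helpers. The `TrainParam`-based `CalcWeight`/`CalcGain` are expected to also handle the L1 term, the minimum child weight and the max weight step, which this sketch omits:

```cpp
#include <cassert>

// Hypothetical helpers for an implicit binary-tree layout rooted at index 0.
inline int LeftChild(int pid) { return 2 * pid + 1; }
inline int RightChild(int pid) { return 2 * pid + 2; }  // assumption
inline int Parent(int nid) { return (nid - 1) / 2; }

// Simplified leaf weight, -G / (H + lambda), with L2 regularization only.
inline double SimpleWeight(double grad, double hess, double lambda) {
  return -grad / (hess + lambda);
}

// Simplified node score, G^2 / (H + lambda), with L2 regularization only.
inline double SimpleGain(double grad, double hess, double lambda) {
  return (grad * grad) / (hess + lambda);
}
```

For example, with gradient sum -4, hessian sum 3 and `lambda = 1`, the weight is 1 and the score is 4.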