Alien-XGBoost
xgboost/cub/CHANGE_LOG.TXT
pointer-to-const type
- Mollify Clang device-side warnings
- Remove out-dated VC project files
//-----------------------------------------------------------------------------
1.6.3 11/20/2016
- API change: BlockLoad and BlockStore are now templated by the local
data type, instead of the Iterator type. This allows for output iterators
having \p void as their \p value_type (e.g., discard iterators).
- Updated GP100 tuning policies for radix sorting (6.2B 32b keys/s)
- Bug fixes:
- Issue #74: WarpReduce executes reduction operator for out-of-bounds items
- Issue #72 (cub::InequalityWrapper::operator() should be non-const)
- Issue #71 (KeyValuePair won't work if Key has non-trivial ctor)
- Issue #70: 1.5.3 breaks BlockScan API. Retroactively reversioned
from v1.5.3 -> v1.6 to appropriately indicate API change.
- Issue #69: cub::BlockStore::Store doesn't compile if OutputIteratorT::value_type != T
- Issue #68 (cub::TilePrefixCallbackOp::WarpReduce doesn't permit ptx
arch specialization)
- Improved support for Win32 platforms (warnings, alignment, etc)
//-----------------------------------------------------------------------------
1.6.2 (was 1.5.5) 10/25/2016
- Updated Pascal tuning policies for radix sorting
- Bug fixes:
- Fix for arm64 compilation of caching allocator
//-----------------------------------------------------------------------------
1.6.1 (was 1.5.4) 10/14/2016
- Bug fixes:
- Fix for radix sorting bug introduced by scan refactorization
//-----------------------------------------------------------------------------
1.6.0 (was 1.5.3) 10/11/2016
- API change: Device/block/warp-wide exclusive scans have been revised to now
accept an "initial value" (instead of an "identity value") for seeding the
computation with an arbitrary prefix.
- API change: Device-wide reductions and scans can now have input sequence types that are
different from output sequence types (as long as they are coercible)
- Reduce repository size (move doxygen binary to doc repository)
- Minor reductions in block-scan instruction count
- Bug fixes:
- Issue #55: warning in cub/device/dispatch/dispatch_reduce_by_key.cuh
- Issue #59: cub::DeviceScan::ExclusiveSum can't prefix sum of float into double
- Issue #58: Infinite loop in cub::CachingDeviceAllocator::NearestPowerOf
- Issue #47: Caching allocator needs to clean up cuda error upon successful retry
- Issue #46: Very high amount of needed memory from the cub::DeviceHistogram::HistogramEven routine
- Issue #45: Caching Device Allocator fails with debug output enabled
- Fix for generic-type reduce-by-key warpscan (sm3.x and newer)
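
The two 1.6.0 API changes (exclusive scans seeded with an "initial value", and
input types that differ from output types, as in Issue #59) can be illustrated
with a minimal host-side sketch. This is plain C++, not CUB code: the function
name exclusive_sum_seeded is hypothetical and only mirrors the semantics
described above, where each output is the initial value plus the sum of all
preceding inputs.

```cpp
#include <vector>

// Host-side analogue only (hypothetical helper, not part of CUB).
// Exclusive prefix sum seeded with an arbitrary initial value:
//   out[i] = init + in[0] + ... + in[i-1]
// InT and OutT may differ as long as InT is convertible to OutT,
// e.g. float inputs accumulated into double outputs.
template <typename InT, typename OutT>
std::vector<OutT> exclusive_sum_seeded(const std::vector<InT>& in, OutT init)
{
    std::vector<OutT> out;
    out.reserve(in.size());
    OutT running = init;                            // seed, not an identity value
    for (const InT& x : in) {
        out.push_back(running);                     // excludes the current element
        running = running + static_cast<OutT>(x);   // accumulate in the output type
    }
    return out;
}
```

Calling it with float inputs {1, 2, 3} and a double initial value of 10 gives
{10, 11, 13}: the first output is the seed itself, and accumulation happens in
the wider output type.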
//-----------------------------------------------------------------------------
1.5.2 03/21/2016
- Improved medium-size scan performance for sm5x (Maxwell)
- Refactored caching allocator for device memory
- Spends less time locked
- Failure to allocate a block from the runtime will retry once after
freeing cached allocations
- Now respects max-bin (issue where blocks in excess of max-bin were
still being retained in free cache)
- Uses C++11 mutex when available
- Bug fixes:
- Fix for generic-type reduce-by-key warpscan (sm3.x and newer)
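
The "retry once after freeing cached allocations" behavior of the refactored
caching allocator can be sketched as follows. This is a host-side illustration
under stated assumptions, not the allocator's actual code: allocate_with_retry,
raw_alloc and free_cached are hypothetical names standing in for the runtime
allocation call and the routine that releases cached blocks.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical sketch of a retry-once allocation policy (not CUB code).
// raw_alloc   : stands in for the runtime allocation; returns nullptr on failure.
// free_cached : stands in for releasing all cached blocks back to the runtime.
void* allocate_with_retry(std::size_t bytes,
                          const std::function<void*(std::size_t)>& raw_alloc,
                          const std::function<void()>& free_cached)
{
    void* ptr = raw_alloc(bytes);
    if (ptr == nullptr) {
        free_cached();           // give cached memory back before retrying
        ptr = raw_alloc(bytes);  // retry exactly once; may still be nullptr
    }
    return ptr;
}
```

The point of the single retry is that memory held in the free cache may be the
only reason the first request failed; a second failure is reported to the
caller rather than retried indefinitely.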
//-----------------------------------------------------------------------------
1.5.1 12/28/2015
- Bug fixes:
- Fix for incorrect DeviceRadixSort output for some small problems on
Maxwell SM52 architectures
- Fix for macro redefinition warnings when compiling with Thrust sort
//-----------------------------------------------------------------------------
1.5.0 12/14/2015
- New Features:
- Added new segmented device-wide operations for device-wide sort and
reduction primitives.
- Bug fixes:
- Fix for Git Issue 36 (Compilation error with GCC 4.8.4 nvcc 7.0.27) and
Forums thread (ThreadLoad generates compiler errors when loading from
pointer-to-const)
- Fix for Git Issue 29 (DeviceRadixSort::SortKeys<bool> yields compiler
errors)
- Fix for Git Issue 26 (CUDA error: misaligned address after
cub::DeviceRadixSort::SortKeys())
- Fix for incorrect/crash on 0-length problems, e.g., Git Issue 25 (Floating
point exception (core dumped) during cub::DeviceRadixSort::SortKeys)
- Fix for CUDA 7.5 issues on SM 5.2 with SHFL-based warp-scan and warp-reduction
on non-primitive data types (e.g., user-defined structs)
- Fix for small radix sorting problems where 0 temporary bytes were
required and user code was invoking malloc(0) on some systems where
that returns NULL. (The implementation assumed the caller was asking for
the size again and did not run the sort.)
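
The malloc(0) problem in the last item comes from the two-phase calling
convention, in which a null temporary-storage pointer means "report the
required size and do no work". The sketch below reproduces that convention in
plain host C++ and shows one caller-side guard. Both functions are hypothetical
stand-ins, not the CUB API, and the guard is an assumption about how a caller
could protect itself, not a description of the fix that was applied in CUB.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a two-phase device-wide call (NOT the CUB API).
// Phase 1: temp == nullptr -> write the required size to temp_bytes, do no work.
// Phase 2: temp != nullptr -> perform the work.
// Returns true only when the work actually ran.
bool two_phase_call(void* temp, std::size_t& temp_bytes)
{
    if (temp == nullptr) {
        temp_bytes = 0;   // models a small problem that needs no scratch space
        return false;     // size query only
    }
    return true;          // work performed
}

// Caller-side guard: never pass a null pointer in phase 2, even for 0 bytes.
bool run_with_guard()
{
    std::size_t bytes = 0;
    two_phase_call(nullptr, bytes);                   // phase 1: size query
    void* temp = std::malloc(bytes > 0 ? bytes : 1);  // malloc(0) may return NULL
    if (temp == nullptr) return false;                // genuine allocation failure
    bool ran = two_phase_call(temp, bytes);           // phase 2: work runs
    std::free(temp);
    return ran;
}
```

Without the guard, a malloc(0) that returns NULL makes the second call look
like another size query, so the work is silently skipped.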
//-----------------------------------------------------------------------------
1.4.1 04/13/2015
- Bug fixes:
- Fixes for CUDA 7.0 issues with SHFL-based warp-scan and warp-reduction
on non-primitive data types (e.g., user-defined structs)
- Fixes for minor CUDA 7.0 performance regressions in cub::DeviceScan,
DeviceReduceByKey
- Fixes to allow cub::DeviceRadixSort and cub::BlockRadixSort on bool types
- Remove requirement for callers to define the CUB_CDP macro
when invoking CUB device-wide routines using CUDA dynamic parallelism
- Fix for headers not being included in the proper order (or missing includes)
for some block-wide functions
//-----------------------------------------------------------------------------
1.4.0 03/18/2015
- New Features:
- Support and performance tuning for new Maxwell GPU architectures
- Updated cub::DeviceHistogram implementation that provides the same
"histogram-even" and "histogram-range" functionality as IPP/NPP.