Alien-XGBoost


xgboost/cub/CHANGE_LOG.TXT

          pointer-to-const type 
        - Mollify Clang device-side warnings
        - Remove out-dated VC project files
          		  
//-----------------------------------------------------------------------------

1.6.3    11/20/2016
    - API change: BlockLoad and BlockStore are now templated by the local
      data type, instead of the Iterator type.  This allows for output iterators
      having \p void as their \p value_type (e.g., discard iterators).
    - Updated GP100 tuning policies for radix sorting (6.2B 32b keys/s)
    - Bug fixes: 
        - Issue #74: WarpReduce executes reduction operator for out-of-bounds items
        - Issue #72 (cub::InequalityWrapper::operator() should be non-const)
        - Issue #71 (KeyValuePair won't work if Key has non-trivial ctor)
        - Issue #70: 1.5.3 breaks BlockScan API.  Retroactively reversioned
          from v1.5.3 -> v1.6 to appropriately indicate API change.
        - Issue #69: cub::BlockStore::Store doesn't compile if OutputIteratorT::value_type != T
        - Issue #68 (cub::TilePrefixCallbackOp::WarpReduce doesn't permit ptx 
          arch specialization)
        - Improved support for Win32 platforms (warnings, alignment, etc)
		  
//-----------------------------------------------------------------------------

1.6.2 (was 1.5.5)    10/25/2016
    - Updated Pascal tuning policies for radix sorting
    - Bug fixes: 
        - Fix for arm64 compilation of caching allocator

//-----------------------------------------------------------------------------

1.6.1 (was 1.5.4)    10/14/2016
    - Bug fixes: 
        - Fix for radix sorting bug introduced by scan refactorization

//-----------------------------------------------------------------------------

1.6.0 (was 1.5.3)    10/11/2016
    - API change: Device/block/warp-wide exclusive scans have been revised to now 
      accept an "initial value" (instead of an "identity value") for seeding the 
      computation with an arbitrary prefix.  
    - API change: Device-wide reductions and scans can now have input sequence types that are 
      different from output sequence types (as long as they are coercible)
    - Reduce repository size (move doxygen binary to doc repository)
    - Minor reductions in block-scan instruction count
    - Bug fixes: 
        - Issue #55: warning in cub/device/dispatch/dispatch_reduce_by_key.cuh 
        - Issue #59: cub::DeviceScan::ExclusiveSum can't prefix sum of float into double
        - Issue #58: Infinite loop in cub::CachingDeviceAllocator::NearestPowerOf
        - Issue #47: Caching allocator needs to clean up cuda error upon successful retry 
        - Issue #46: Very high amount of needed memory from the cub::DeviceHistogram::HistogramEven routine
        - Issue #45: Caching Device Allocator fails with debug output enabled
        - Fix for generic-type reduce-by-key warpscan (sm3.x and newer)
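
The "initial value" semantics and the mixed input/output types described for 1.6.0 can be illustrated with a host-side analogue. The sketch below is plain C++, not CUB code: exclusive_scan_seeded is a hypothetical helper name, and the float-to-double case only mirrors the scenario reported in Issue #59.

```cpp
#include <cstddef>
#include <vector>

// Host-side analogue only (hypothetical helper, not a CUB function).
// An exclusive scan seeded with an arbitrary initial value:
//   out[i] = initial_value + in[0] + ... + in[i-1]
// Input items are float, output items are double, so the running
// prefix is accumulated in the (wider) output type.
std::vector<double> exclusive_scan_seeded(const std::vector<float>& in,
                                          double initial_value)
{
    std::vector<double> out(in.size());
    double running = initial_value;      // the seed, not necessarily an identity
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;                // exclusive: item i is not included yet
        running += in[i];                // float is coerced to double here
    }
    return out;
}
```

With an input of {1, 2, 3} and an initial value of 10, this sketch yields {10, 11, 13}. Passing 0 as the initial value reproduces the older identity-seeded behaviour for addition.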

//-----------------------------------------------------------------------------

1.5.2    03/21/2016
    - Improved medium-size scan performance for sm5x (Maxwell)
    - Refactored caching allocator for device memory
        - Spends less time locked
        - Failure to allocate a block from the runtime will retry once after
          freeing cached allocations
        - Now respects max-bin (issue where blocks in excess of max-bin were
          still being retained in free cache)
        - Uses C++11 mutex when available
    - Bug fixes: 
        - Fix for generic-type reduce-by-key warpscan (sm3.x and newer)
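
The "retry once after freeing cached allocations" behaviour listed for 1.5.2 can be sketched on the host as follows. This is a simplified model under stated assumptions, not the cub::CachingDeviceAllocator implementation: RetryOnceAllocator, raw_alloc, raw_free and release are hypothetical names, the runtime is represented by two callbacks, and reuse of cached blocks, bins and locking are all omitted.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side model of a retry-once allocation policy.
struct RetryOnceAllocator {
    std::function<void*(std::size_t)> raw_alloc;  // stands in for the runtime allocator
    std::function<void(void*)> raw_free;          // stands in for the runtime free
    std::vector<void*> cached;                    // blocks held in the free cache

    // Returning a block keeps it in the cache instead of giving it back to the runtime.
    void release(void* p) { cached.push_back(p); }

    void* allocate(std::size_t bytes) {
        void* p = raw_alloc(bytes);
        if (p == nullptr) {
            // First attempt failed: hand every cached block back to the runtime...
            for (void* c : cached) raw_free(c);
            cached.clear();
            // ...and retry exactly once.
            p = raw_alloc(bytes);
        }
        return p;  // still nullptr if the single retry also failed
    }
};
```

The point of the single retry is that memory held in the free cache should not by itself cause an allocation failure, while a failure that persists after the cache is emptied is reported to the caller.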
          
//-----------------------------------------------------------------------------

1.5.1    12/28/2015
    - Bug fixes: 
        - Fix for incorrect DeviceRadixSort output for some small problems on 
          Maxwell SM52 architectures
        - Fix for macro redefinition warnings when compiling with Thrust sort
          
//-----------------------------------------------------------------------------

1.5.0    12/14/2015
    - New Features:
        - Added new segmented device-wide operations for device-wide sort and 
          reduction primitives.
    - Bug fixes: 
        - Fix for Git Issue 36 (Compilation error with GCC 4.8.4 nvcc 7.0.27) and
          Forums thread (ThreadLoad generates compiler errors when loading from 
          pointer-to-const)
        - Fix for Git Issue 29 (DeviceRadixSort::SortKeys<bool> yields compiler 
          errors)
        - Fix for Git Issue 26 (CUDA error: misaligned address after 
          cub::DeviceRadixSort::SortKeys())
        - Fix for incorrect/crash on 0-length problems, e.g., Git Issue 25 (Floating 
          point exception (core dumped) during cub::DeviceRadixSort::SortKeys)
        - Fix for CUDA 7.5 issues on SM 5.2 with SHFL-based warp-scan and warp-reduction 
          on non-primitive data types (e.g., user-defined structs)
        - Fix for small radix sorting problems where 0 temporary bytes were 
          required and user code was invoking malloc(0) on some systems where
          that returns NULL.  (The implementation assumed the caller was asking 
          for the size again and did not run the sort.)
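
The malloc(0) pitfall in the last item above comes from the two-phase calling convention, where a null temporary-storage pointer means "report the required size only". The sketch below models that convention on the host. All names (sort_like_call, alloc_zero_is_null, run_two_phase) are hypothetical, the caller-side guard is an assumption for illustration, and the changelog does not say how the library itself was fixed.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a two-phase device-wide call (not CUB code).
// A null d_temp_storage is treated as a size query and no work is done.
void sort_like_call(void* d_temp_storage, std::size_t& temp_storage_bytes, bool& did_work)
{
    if (d_temp_storage == nullptr) {
        temp_storage_bytes = 0;   // assume a small problem needing no scratch space
        did_work = false;         // size query only
        return;
    }
    did_work = true;              // the real work would run here
}

// Models an allocator on which a zero-byte request yields a null pointer.
void* alloc_zero_is_null(std::size_t bytes)
{
    return bytes == 0 ? nullptr : std::malloc(bytes);
}

// Returns true if the second call actually performed the work.
bool run_two_phase(bool round_up_zero)
{
    std::size_t temp_bytes = 0;
    bool did_work = false;

    sort_like_call(nullptr, temp_bytes, did_work);           // phase 1: size query
    if (round_up_zero && temp_bytes == 0) temp_bytes = 1;    // caller-side guard (assumption)

    void* temp = alloc_zero_is_null(temp_bytes);
    sort_like_call(temp, temp_bytes, did_work);              // phase 2: intended as the real run

    std::free(temp);                                         // free(nullptr) is a no-op
    return did_work;
}
```

Without the guard, the second call receives a null pointer and is interpreted as another size query, so the work is silently skipped. Rounding a zero-byte request up to one byte keeps the pointer non-null in this model.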
          
//-----------------------------------------------------------------------------

1.4.1    04/13/2015
    - Bug fixes: 
        - Fixes for CUDA 7.0 issues with SHFL-based warp-scan and warp-reduction 
          on non-primitive data types (e.g., user-defined structs)
        - Fixes for minor CUDA 7.0 performance regressions in cub::DeviceScan,
          DeviceReduceByKey
        - Fixes to allow cub::DeviceRadixSort and cub::BlockRadixSort on bool types
        - Remove requirement for callers to define the CUB_CDP macro 
          when invoking CUB device-wide routines using CUDA dynamic parallelism
        - Fix for headers not being included in the proper order (or missing includes)
          for some block-wide functions
          
//-----------------------------------------------------------------------------

1.4.0    03/18/2015
    - New Features:
        - Support and performance tuning for new Maxwell GPU architectures
        - Updated cub::DeviceHistogram implementation that provides the same 
          "histogram-even" and "histogram-range" functionality as IPP/NPP.


