Alien-XGBoost
xgboost/rabit/doc/guide.md
* When node 1 reaches the second Allreduce, the other nodes find out that node 1 has caught up, and they can continue the program normally.
This fault tolerance model is based on a key property of Allreduce and
Broadcast: all nodes get the same result after calling Allreduce/Broadcast.
Because of this property, any node can record the results of past
Allreduce/Broadcast calls. When a node recovers, it can fetch the lost
results from live nodes and rebuild its model.
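The property above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: `SimNode`, `SimAllreduceSum`, and `RecoverFrom` are hypothetical names invented for the illustration and are not part of the rabit API, and the "model" is reduced to a running sum of Allreduce results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simulated node (not a rabit type). Each node keeps a log of
// the results of past Allreduce calls in call order.
struct SimNode {
  std::vector<int> history;  // results of past Allreduce calls
  int model = 0;             // toy model: running sum of all results
};

// Simulated sum-Allreduce. Every participating node receives the same
// result, so every node appends the same value to its history log.
int SimAllreduceSum(std::vector<SimNode*>& nodes,
                    const std::vector<int>& inputs) {
  assert(nodes.size() == inputs.size());
  int total = 0;
  for (int v : inputs) total += v;
  for (SimNode* n : nodes) {
    n->history.push_back(total);
    n->model += total;
  }
  return total;
}

// A restarted node rebuilds its state by replaying the log of any live
// peer. This works only because all logs hold identical results.
void RecoverFrom(const SimNode& peer, SimNode* restarted) {
  restarted->history.clear();
  restarted->model = 0;
  for (int result : peer.history) {
    restarted->history.push_back(result);
    restarted->model += result;
  }
}
```

In this sketch, a node that loses its state after some Allreduce calls can call `RecoverFrom` with any surviving peer and end up with the same `model` and `history` as the others, after which all nodes can continue with the next call.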
The checkpoint is introduced so that we can discard the recorded results of
Allreduce/Broadcast calls made before the latest checkpoint. This reduces the
memory used for backup. The checkpoint of each node is a user-defined model
that can be split into two parts: a global model and a local model. The
global model is shared by all nodes and can be backed up by any node. The
local model of a node is replicated to some other nodes (selected using a ring
replication strategy). The checkpoint is kept only in memory and never
touches the disk, which makes rabit programs more efficient. Rabit's strategy
differs from the fail-restart strategy, in which all nodes restart
from the same checkpoint when any of them fails. In rabit, all the live nodes
block in the Allreduce call and help with the recovery. To catch up, the
recovered node fetches its latest checkpoint, and the results of
Allreduce/Broadcast calls made after that checkpoint, from live nodes.
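The checkpoint and ring replication scheme described above can be sketched as a single-process simulation. Everything here is an assumption made for illustration: `CkptNode`, `RingSuccessor`, `CheckPointAll`, `RecoverNode`, and the `num_replica` parameter are hypothetical names, not the rabit API, and both models are reduced to single integers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical in-memory state of one node (not a rabit type).
struct CkptNode {
  int global_model = 0;       // identical on every node
  int local_model = 0;        // specific to this node
  std::vector<int> history;   // Allreduce results since the last checkpoint
  int ckpt_global = 0;        // checkpointed copy of the global model
  std::map<std::size_t, int> ckpt_local;  // owner rank -> local model copy
};

// Rank of the k-th successor of `rank` on a ring of `world` nodes.
std::size_t RingSuccessor(std::size_t rank, std::size_t k, std::size_t world) {
  assert(world > 0);
  return (rank + k) % world;
}

// Take an in-memory checkpoint on all nodes: snapshot the global model,
// copy each local model to the owner and its next `num_replica` ring
// successors, and discard the history recorded before the checkpoint.
void CheckPointAll(std::vector<CkptNode>& nodes, std::size_t num_replica) {
  const std::size_t world = nodes.size();
  for (CkptNode& n : nodes) n.ckpt_local.clear();
  for (std::size_t r = 0; r < world; ++r) {
    nodes[r].ckpt_global = nodes[r].global_model;
    nodes[r].history.clear();
    for (std::size_t k = 0; k <= num_replica && k < world; ++k) {
      nodes[RingSuccessor(r, k, world)].ckpt_local[r] = nodes[r].local_model;
    }
  }
}

// Rebuild a failed node from live peers: the global checkpoint and the
// post-checkpoint history come from its ring successor, the local model
// from the first successor that holds a replica. Replicas the failed node
// held for other ranks are not restored in this sketch.
bool RecoverNode(std::vector<CkptNode>& nodes, std::size_t failed) {
  const std::size_t world = nodes.size();
  if (world < 2) return false;
  CkptNode fresh;
  const CkptNode& peer = nodes[RingSuccessor(failed, 1, world)];
  fresh.ckpt_global = peer.ckpt_global;
  fresh.global_model = peer.ckpt_global;
  for (int result : peer.history) {  // replay calls made after the checkpoint
    fresh.history.push_back(result);
    fresh.global_model += result;
  }
  bool found = false;
  for (std::size_t k = 1; k < world && !found; ++k) {
    const CkptNode& holder = nodes[RingSuccessor(failed, k, world)];
    auto it = holder.ckpt_local.find(failed);
    if (it != holder.ckpt_local.end()) {
      fresh.local_model = it->second;
      fresh.ckpt_local[failed] = it->second;
      found = true;
    }
  }
  if (!found) return false;
  nodes[failed] = fresh;
  return true;
}
```

The design point the sketch tries to capture is that only the history since the last checkpoint has to be kept, and that the failed node alone is rebuilt while the other nodes keep their state, in contrast to a fail-restart scheme where every node would roll back.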
xgboost/src/common/quantile.h
* \tparam RType type of rank
* \tparam TSummary actual summary data structure it uses
*/
template<typename DType, typename RType, class TSummary>
class QuantileSketchTemplate {
public:
/*! \brief type of summary */
typedef TSummary Summary;
/*! \brief the entry type */
typedef typename Summary::Entry Entry;
/*! \brief same as Summary, but uses an STL vector to back the storage */
struct SummaryContainer : public Summary {
std::vector<Entry> space;
SummaryContainer(const SummaryContainer &src) : Summary(NULL, src.size) {
this->space = src.space;
this->data = dmlc::BeginPtr(this->space);
}
SummaryContainer() : Summary(NULL, 0) {
}
/*! \brief reserve space for summary */
inline void Reserve(size_t size) {