xgboost/doc/tutorials/aws_yarn.md

```bash
cd xgboost
# Upload the demo training and test data to your S3 bucket.
s3cmd put demo/data/agaricus.txt.train s3://${BUCKET}/xgb-demo/train/
s3cmd put demo/data/agaricus.txt.test s3://${BUCKET}/xgb-demo/test/
```

Submit the Jobs
---------------
Now that everything is ready, we can submit the XGBoost distributed job to the YARN cluster.
We will use the [dmlc-submit](https://github.com/dmlc/dmlc-core/tree/master/tracker) script to submit the job.

Now we can run the following command in the distributed training folder (replace `${BUCKET}` with your real bucket name):
```bash
cd xgboost/demo/distributed-training
# Use dmlc-submit to submit the job.
../../dmlc-core/tracker/dmlc-submit --cluster=yarn --num-workers=2 --worker-cores=2 \
    ../../xgboost mushroom.aws.conf nthread=2 \
    data=s3://${BUCKET}/xgb-demo/train \
    eval[test]=s3://${BUCKET}/xgb-demo/test \
    model_dir=s3://${BUCKET}/xgb-demo/model
```
All of the settings, such as `data` and `model_dir`, can also be written directly into the configuration file.
Note that we specified only the folder path rather than a file name;
XGBoost will read all the files in that folder as the training and evaluation data.
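
For reference, a configuration file uses simple `key = value` lines. The following is a hypothetical sketch (the values shown are illustrative, not the actual contents of `mushroom.aws.conf`; replace `mybucket` with your bucket name):
```
# Hypothetical example configuration; values are illustrative.
booster = gbtree
objective = binary:logistic
max_depth = 3
num_round = 2
data = "s3://mybucket/xgb-demo/train"
eval[test] = "s3://mybucket/xgb-demo/test"
model_dir = "s3://mybucket/xgb-demo/model"
```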

In this command, we use two workers, and each worker runs two threads.
XGBoost can benefit from using multiple cores in each worker;
a common choice is 4 to 8 cores per worker.
The trained model will be saved into the specified model folder, which you can browse:
```bash
s3cmd ls s3://${BUCKET}/xgb-demo/model/
```

The following is an example output from distributed training.
```
16/02/26 05:41:59 INFO dmlc.Client: jobname=DMLC[nworker=2]:xgboost,username=ubuntu
16/02/26 05:41:59 INFO dmlc.Client: Submitting application application_1456461717456_0015
16/02/26 05:41:59 INFO impl.YarnClientImpl: Submitted application application_1456461717456_0015
2016-02-26 05:42:05,230 INFO @tracker All of 2 nodes getting started
2016-02-26 05:42:14,027 INFO [05:42:14] [0]  test-error:0.016139        train-error:0.014433
2016-02-26 05:42:14,186 INFO [05:42:14] [1]  test-error:0.000000        train-error:0.001228
2016-02-26 05:42:14,947 INFO @tracker All nodes finishes job
2016-02-26 05:42:14,948 INFO @tracker 9.71754479408 secs between node start and job finish
Application application_1456461717456_0015 finished with state FINISHED at 1456465335961
```

Analyze the Model
-----------------
After the model is trained, we can analyse the learnt model and use it for future prediction tasks.
XGBoost is a portable framework: models trained on any platform are ***exchangeable***.
This means we can load the trained model in Python/R/Julia and take advantage of the data science
pipelines in these languages for model analysis and prediction.
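
As a quick illustration, here is a minimal Python sketch that loads a trained model and runs prediction. It assumes the xgboost Python package is installed and that you have downloaded one of the saved model files from the S3 model folder (the file name `0002.model` below is hypothetical; use whatever `s3cmd ls` shows in your `model_dir`):
```python
import xgboost as xgb

# Load a trained model file downloaded from the S3 model folder
# (hypothetical file name; pick one listed under your model_dir).
bst = xgb.Booster(model_file='0002.model')

# The agaricus test data is in LibSVM format, which DMatrix can read directly.
dtest = xgb.DMatrix('agaricus.txt.test')

# Predict probabilities for the test set and show the first few.
preds = bst.predict(dtest)
print(preds[:10])
```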

For example, you can use [this IPython notebook](https://github.com/dmlc/xgboost/tree/master/demo/distributed-training/plot_model.ipynb)
to plot feature importance and visualize the learnt model.

Troubleshooting
----------------

If you encounter a problem, the best first step is to use the following command
to retrieve the stdout and stderr logs of the containers and check what caused the problem.
```bash
yarn logs -applicationId yourAppId
```

Future Directions
-----------------
You have learned to use distributed XGBoost on YARN in this tutorial.
XGBoost is a portable and scalable framework for gradient boosting.
You can check out more examples and resources in the [resources page](https://github.com/dmlc/xgboost/blob/master/demo/README.md).

The project goal is to make the best scalable machine learning solution available on all platforms.
The API is designed to be portable, and the same code can also run on other platforms such as MPI and SGE.
XGBoost is actively evolving, and we are working on even more exciting features
such as a distributed XGBoost Python/R package. Check out the [RoadMap](https://github.com/dmlc/xgboost/issues/873) for
more details, and you are more than welcome to contribute to the project.


