Alien-XGBoost

 view release on metacpan or  search on metacpan

xgboost/demo/binary_classification/README.md  view on Meta::CPAN

../../xgboost mushroom.conf task=dump model_in=0002.model fmap=featmap.txt name_dump=dump.nice.txt
```

In this demo, the tree boosters obtained will be printed in dump.raw.txt and dump.nice.txt, and the latter one is easier to understand because of usage of feature mapping featmap.txt

Format of ```featmap.txt: <featureid> <featurename> <q or i or int>\n ```:
  - Feature id must be from 0 to number of features, in sorted order.
  - i means this feature is binary indicator feature
  - q means this feature is a quantitative value, such as age, time, can be missing
  - int means this feature is integer value (when int is hinted, the decision boundary will be integer)

#### Monitoring Progress
When you run training we can find there are messages displayed on screen
```
tree train end, 1 roots, 12 extra nodes, 0 pruned nodes ,max_depth=3
[0]  test-error:0.016139
boosting round 1, 0 sec elapsed

tree train end, 1 roots, 10 extra nodes, 0 pruned nodes ,max_depth=3
[1]  test-error:0.000000
```
The messages for evaluation are printed into stderr, so if you want only to log the evaluation progress, simply type
```
../../xgboost mushroom.conf 2>log.txt
```
Then you can find the following content in log.txt
```
[0]     test-error:0.016139
[1]     test-error:0.000000
```
We can also monitor both training and test statistics, by adding following lines to configure
```conf
eval[test] = "agaricus.txt.test"
eval[trainname] = "agaricus.txt.train"
```
Run the command again, we can find the log file becomes
```
[0]     test-error:0.016139     trainname-error:0.014433
[1]     test-error:0.000000     trainname-error:0.001228
```
The rule is eval[name-printed-in-log] = filename, then the file will be added to monitoring process, and evaluated each round.

xgboost also supports monitoring multiple metrics, suppose we also want to monitor average log-likelihood of each prediction during training, simply add ```eval_metric=logloss``` to configure. Run again, we can find the log file becomes
```
[0]     test-error:0.016139     test-negllik:0.029795   trainname-error:0.014433        trainname-negllik:0.027023
[1]     test-error:0.000000     test-negllik:0.000000   trainname-error:0.001228        trainname-negllik:0.002457
```
### Saving Progress Models
If you want to save model every two round, simply set save_period=2. You will find 0002.model in the current folder. If you want to change the output folder of models, add model_dir=foldername. By default xgboost saves the model of last round.

#### Continue from Existing Model
If you want to continue boosting from existing model, say 0002.model, use
```
../../xgboost mushroom.conf model_in=0002.model num_round=2 model_out=continue.model
```
xgboost will load from 0002.model continue boosting for 2 rounds, and save output to continue.model. However, beware that the training and evaluation data specified in mushroom.conf should not change when you use this function.
#### Use Multi-Threading
When you are working with a large dataset, you may want to take advantage of parallelism. If your compiler supports OpenMP, xgboost is naturally multi-threaded, to set number of parallel running add ```nthread``` parameter to you configuration.
Eg. ```nthread=10```

Set nthread to be the number of your real cpu (On Unix, this can be found using ```lscpu```)
Some systems will have ```Thread(s) per core = 2```, for example, a 4 core cpu with 8 threads, in such case set ```nthread=4``` and not 8.



( run in 0.609 second using v1.01-cache-2.11-cpan-39bf76dae61 )