Alien-XGBoost

Python Package Introduction
===========================
This document gives a basic walkthrough of the XGBoost Python package.

***List of other Helpful Links***
* [Python walkthrough code collections](https://github.com/tqchen/xgboost/blob/master/demo/guide-python)
* [Python API Reference](python_api.rst)

Install XGBoost
---------------
To install XGBoost, do the following:

* Run `make` in the root directory of the project
* In the `python-package` directory, run:
```shell
python setup.py install
```

To verify your installation, try to `import xgboost` in Python.
```python
import xgboost as xgb
```
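If a bare `import` failure is not informative enough, the check can be made more explicit. The following is a minimal sketch that uses only the standard library to test whether a module can be found before importing it; the helper name `is_installed` is made up for this illustration and is not part of XGBoost.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located on the import path."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("xgboost"):
    import xgboost as xgb
    print("xgboost version:", xgb.__version__)
else:
    print("xgboost is not importable; re-check the install steps above")
```

This only confirms that Python can locate and import the package; it does not exercise training or prediction.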

Data Interface
--------------
The XGBoost Python module can load data from:
- LibSVM text format files
- NumPy 2D arrays
- SciPy sparse matrices
- XGBoost binary buffer files

The data is stored in a ```DMatrix``` object.
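
For readers unfamiliar with the LibSVM text format listed above, each line holds a label followed by `index:value` pairs for the non-zero features. The sketch below writes such lines from a small dense dataset in plain Python. The helper `to_libsvm_lines` is hypothetical (it is not part of XGBoost), and the 0-based feature indexing is an assumption that should be checked against the convention your data and loader use.

```python
def to_libsvm_lines(rows, labels):
    """Format dense rows as LibSVM text lines: '<label> <index>:<value> ...'.

    Zero entries are skipped, which is what makes the format sparse.
    Feature indices are written 0-based here (an assumption).
    """
    lines = []
    for label, row in zip(labels, rows):
        pairs = [f"{j}:{v:g}" for j, v in enumerate(row) if v != 0]
        lines.append(" ".join([str(label)] + pairs))
    return lines


rows = [[0.0, 1.5, 0.0], [2.0, 0.0, 3.25]]
labels = [1, 0]
lines = to_libsvm_lines(rows, labels)
print("\n".join(lines))
```

Writing these lines to a text file, one per row, gives a file in the same general shape as the `train.svm.txt` used in the examples in this section.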

* To load a LibSVM text file or an XGBoost binary file into ```DMatrix```:
```python
dtrain = xgb.DMatrix('train.svm.txt')
dtest = xgb.DMatrix('test.svm.buffer')
```
* To load a NumPy array into ```DMatrix```:
```python
import numpy as np

data = np.random.rand(5, 10)  # 5 entities, each containing 10 features
label = np.random.randint(2, size=5)  # binary target
dtrain = xgb.DMatrix(data, label=label)
```
* To load a scipy.sparse array into ```DMatrix```:
```python
import scipy.sparse
# dat holds the non-zero values; row and col hold their coordinates
csr = scipy.sparse.csr_matrix((dat, (row, col)))
dtrain = xgb.DMatrix(csr)
```
* Saving ```DMatrix``` into an XGBoost binary file will make loading faster:
```python
dtrain = xgb.DMatrix('train.svm.txt')
dtrain.save_binary("train.buffer")
```
* The value that marks missing entries in the data can be specified in the ```DMatrix``` constructor:
```python
dtrain = xgb.DMatrix(data, label=label, missing=-999.0)
```
* Weights can be set when needed:
```python
w = np.random.rand(5, 1)
dtrain = xgb.DMatrix(data, label=label, missing=-999.0, weight=w)
```
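
The scipy.sparse example above uses `dat`, `row`, and `col` without defining them. As a rough illustration of what those three sequences contain, the sketch below derives them from a small dense matrix in plain Python. The helper name `dense_to_coo` is made up for this example; in practice the triplets usually come straight from the data source.

```python
def dense_to_coo(matrix):
    """Return (dat, row, col) for the non-zero entries of a dense 2D list.

    dat[k] is the value stored at position (row[k], col[k]).
    """
    dat, row, col = [], [], []
    for i, values in enumerate(matrix):
        for j, v in enumerate(values):
            if v != 0:
                dat.append(v)
                row.append(i)
                col.append(j)
    return dat, row, col


matrix = [[0, 2, 0],
          [1, 0, 3]]
dat, row, col = dense_to_coo(matrix)
print(dat, row, col)
```

These three sequences are in the form that `scipy.sparse.csr_matrix((dat, (row, col)))` accepts. If trailing rows or columns may be entirely zero, also passing an explicit `shape` to `csr_matrix` keeps the matrix dimensions from being inferred too small.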

Setting Parameters
------------------
XGBoost can use either a list of pairs or a dictionary to set [parameters](../parameter.md). For instance:
* Booster parameters
```python
param = {'max_depth': 2, 'eta': 1, 'silent': 1, 'objective': 'binary:logistic'}
param['nthread'] = 4
param['eval_metric'] = 'auc'
```
* You can also specify multiple eval metrics:
```python
param['eval_metric'] = ['auc', 'ams@0']

# alternatively:
# plst = list(param.items())
# plst += [('eval_metric', 'ams@0')]
```