edit results from the CPAN

edit
Alien-XGBoost
view release on metacpan or search on metacpan
xgboost/R-package/man/xgb.create.features.Rd view on Meta::CPAN
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.create.features.R
\name{xgb.create.features}
\alias{xgb.create.features}
\title{Create new features from a previously learned model}
\usage{
xgb.create.features(model, data, ...)
}
\arguments{
\item{model}{decision tree boosting model learned on the original data}

\item{data}{original data (usually provided as a \code{dgCMatrix} matrix)}

\item{...}{currently not used}
}
\value{
\code{dgCMatrix} matrix including both the original data and the new features.
}
\description{
May improve the learning by adding new features to the training data based on the decision trees from a previously learned model.
}
\details{
This is the function inspired from the paragraph 3.1 of the paper:

\strong{Practical Lessons from Predicting Clicks on Ads at Facebook}

\emph{(Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yan, xin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, 
Joaquin Quinonero Candela)}
 
International Workshop on Data Mining for Online Advertising (ADKDD) - August 24, 2014

\url{https://research.fb.com/publications/practical-lessons-from-predicting-clicks-on-ads-at-facebook/}.

Extract explaining the method:

"We found that boosted decision trees are a powerful and very
convenient way to implement non-linear and tuple transformations
of the kind we just described. We treat each individual
tree as a categorical feature that takes as value the
index of the leaf an instance ends up falling in. We use 
1-of-K coding of this type of features. 

For example, consider the boosted tree model in Figure 1 with 2 subtrees, 
where the first subtree has 3 leafs and the second 2 leafs. If an
instance ends up in leaf 2 in the first subtree and leaf 1 in
second subtree, the overall input to the linear classifier will
be the binary vector \code{[0, 1, 0, 1, 0]}, where the first 3 entries
correspond to the leaves of the first subtree and last 2 to
those of the second subtree.

[...]

We can understand boosted decision tree
based transformation as a supervised feature encoding that
converts a real-valued vector into a compact binary-valued
vector. A traversal from root node to a leaf node represents
a rule on certain features."
}
\examples{
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
dtrain <- xgb.DMatrix(data = agaricus.train$data, label = agaricus.train$label)
( run in 1.528 second using v1.01-cache-2.11-cpan-13bb782fe5a )