AI-MaxEntropy
For example,
...
$me->cut(1)
will cut all features that occur fewer than one time.
learn
Learns a model from all the samples that the learner has seen so far
and returns an AI::MaxEntropy::Model object, which can be used to make
predictions on unlabeled samples.
...
my $model = $me->learn;
print $model->predict(['x1', 'x2', ...]);
PROPERTIES
algorithm
This property lets the client program choose the algorithm used for
learning the ME model and set its parameters.
There are three main algorithms for learning ME models: GIS, IIS and
L-BFGS. This module implements two of them, L-BFGS and GIS. L-BFGS
provides full functionality, while GIS runs faster but is applicable
only in limited scenarios.
To use GIS, the following conditions must be satisfied:
1. All samples have the same number of active features
2. No feature has been cut
3. No smoother is used (in fact, the property "smoother" is simply
ignored when the algorithm type is 'gis').
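The first condition above can be checked before choosing an algorithm.
The following is a minimal sketch in plain Perl; the helper
same_active_feature_count is hypothetical and not part of
AI::MaxEntropy, and samples are assumed to be array refs of active
feature names.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AI::MaxEntropy: returns true when every
# sample (an array ref of active feature names) has the same length.
sub same_active_feature_count {
    my @samples = @_;
    return 1 unless @samples;
    my $n = scalar @{ $samples[0] };
    for my $s (@samples) {
        return 0 if scalar(@$s) != $n;
    }
    return 1;
}

my @uniform = (['round', 'red'], ['long', 'yellow']);
my @ragged  = (['round', 'red'], ['long']);

# Pick 'gis' only when condition 1 holds; otherwise fall back to 'lbfgs'.
my $type_uniform = same_active_feature_count(@uniform) ? 'gis' : 'lbfgs';
my $type_ragged  = same_active_feature_count(@ragged)  ? 'gis' : 'lbfgs';
print "$type_uniform $type_ragged\n";    # prints "gis lbfgs"
```

Conditions 2 and 3 depend on how the learner was configured and used, so
they are not covered by this check.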
The "algorithm" property is expected to be a hash ref of the form
{
type => ...,
progress_cb => ...,
param_1 => ...,
param_2 => ...,
...,
param_n => ...
}
type
The entry "type => ..." specifies which algorithm is used for the
optimization. Valid values include:
'lbfgs' Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS)
'gis' Generalized Iterative Scaling (GIS)
If omitted, 'lbfgs' is used by default.
progress_cb
The entry "progress_cb => ..." specifies the progress callback
subroutine which is used to trace the process of the algorithm. The
specified callback routine will be called at each iteration of the
algorithm.
For L-BFGS, "progress_cb" is passed directly to "fmin" in
Algorithm::LBFGS. f(x) is the negative log-likelihood of the current
lambda vector.
For GIS, "progress_cb" is expected to accept arguments of the form
progress_cb(i, lambda, d_lambda, lambda_norm, d_lambda_norm)
"i" is the iteration number, "lambda" is an array ref containing the
current lambda vector, "d_lambda" is an array ref containing the change
in the lambda vector in the current iteration, and "lambda_norm" and
"d_lambda_norm" are the Euclidean norms of "lambda" and "d_lambda"
respectively.
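A GIS callback following the argument order described above might look
like the sketch below. It is exercised here on its own with made-up
values rather than through the module. The return value of 0 follows
the "continue" convention of Algorithm::LBFGS callbacks; whether the
GIS code path inspects the return value is an assumption. The
euclidean_norm helper is hypothetical and only illustrates how the last
two arguments are defined.

```perl
use strict;
use warnings;

my @trace;    # collected [iteration, d_lambda_norm] pairs

# Callback with the argument order documented above.
my $progress_cb = sub {
    my ($i, $lambda, $d_lambda, $lambda_norm, $d_lambda_norm) = @_;
    push @trace, [$i, $d_lambda_norm];
    printf "iter %d: |lambda| = %.4f, |d_lambda| = %.4f\n",
        $i, $lambda_norm, $d_lambda_norm;
    return 0;    # assumption: 0 means "keep going", as in Algorithm::LBFGS
};

# Hypothetical helper: Euclidean norm of an array ref of numbers.
sub euclidean_norm {
    my ($v) = @_;
    my $sum = 0;
    $sum += $_ ** 2 for @$v;
    return sqrt($sum);
}

# Stand-alone call with made-up values, just to exercise the callback.
my $lambda   = [3, 4];
my $d_lambda = [0.6, 0.8];
my $ret = $progress_cb->(1, $lambda, $d_lambda,
    euclidean_norm($lambda), euclidean_norm($d_lambda));
```

In real use the code ref would be supplied as the "progress_cb" entry of
the "algorithm" hash ref.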
For both L-BFGS and GIS, the client program can also pass the string
'verbose' as "progress_cb" to use a default progress callback, which
simply prints the progress to the screen.
"progress_cb" can also be omitted if the client program does not want
to trace the progress.
parameters
The remaining entries are parameters for the specified algorithm. Each
parameter takes its default value when it is not given explicitly.
For L-BFGS, the parameters are passed directly to the Algorithm::LBFGS
object. Please refer to "Parameters" in Algorithm::LBFGS for details.
For GIS, there is only one parameter "epsilon", which controls the
precision of the algorithm (similar to the "epsilon" in
Algorithm::LBFGS). Generally speaking, a smaller "epsilon" produces a
more precise result. The default value of "epsilon" is 1e-3.
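As an illustration of the entries described above, the following
fragment builds two "algorithm" hash refs. The value 1e-5 is an
arbitrary choice for the example, and the commented constructor call
assumes that "algorithm" is passed to new in the same way as other
properties.

```perl
use strict;
use warnings;

# L-BFGS with the default progress printer; any further entries would be
# handed on to Algorithm::LBFGS (none are set here).
my $lbfgs = { type => 'lbfgs', progress_cb => 'verbose' };

# GIS with a tighter tolerance than the default epsilon of 1e-3.
my $gis = { type => 'gis', epsilon => 1e-5 };

# Either hash ref would then be supplied as the "algorithm" property, e.g.
#   my $me = AI::MaxEntropy->new(algorithm => $gis);
```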
smoother
The smoother is a solution to the over-fitting problem. This property
chooses which type of smoother the client program wants to apply and
sets the smoothing parameters.
Only one smoother has been implemented in this version of the module,
the Gaussian smoother.
One can apply the Gaussian smoother as follows:
my $me = AI::MaxEntropy->new(
smoother => { type => 'gaussian', sigma => 0.6 }
);
The parameter "sigma" controls the strength of smoothing. Usually,
sigma is a positive number no greater than 1.0. The strength of
smoothing grows as sigma approaches 0.
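To see why a smaller sigma means stronger smoothing, consider the
textbook Gaussian-prior penalty, sum(lambda_i^2) / (2 * sigma^2), which
is added to the negative log-likelihood. The sketch below evaluates it
for a made-up weight vector. It is an assumption that this module uses
exactly this form; the function gaussian_penalty is hypothetical and is
not part of AI::MaxEntropy.

```perl
use strict;
use warnings;

# Textbook Gaussian-prior penalty: sum(lambda_i^2) / (2 * sigma^2).
# Assumption: illustrative only; not taken from the module's source.
sub gaussian_penalty {
    my ($lambda, $sigma) = @_;
    my $sum = 0;
    $sum += $_ ** 2 for @$lambda;
    return $sum / (2 * $sigma ** 2);
}

my $lambda = [1.0, -2.0, 0.5];    # made-up weight vector
for my $sigma (1.0, 0.6, 0.2) {
    printf "sigma = %.1f  penalty = %.3f\n",
        $sigma, gaussian_penalty($lambda, $sigma);
}
```

The penalty grows as sigma shrinks, so large weights are discouraged
more heavily.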
SEE ALSO
AI::MaxEntropy::Model, AI::MaxEntropy::Util
Algorithm::LBFGS