AI-MaxEntropy

 view release on metacpan or  search on metacpan

lib/AI/MaxEntropy.pm  view on Meta::CPAN


  print $model->predict(['x1', 'x2', ...]);

=head1 PROPERTIES

=head2 algorithm

This property enables client program to choose different algorithms for
learning the ME model and set their parameters.

There are mainly 3 algorithm for learning ME models, they are GIS, IIS and
L-BFGS. This module implements 2 of them, namely,  L-BFGS and GIS.
L-BFGS provides full functionality, while GIS runs faster, but only 
applicable on limited scenarios.

To use GIS, the following conditions must be satisified:

1. All samples have same number of active features

2. No feature has been cut

3. No smoother is used (in fact, the property L</smoother> is simplly
ignored when the type of algorithm equal to 'gis').

This property C<algorithm> is supposed to be a hash ref, like

  {
    type => ...,
    progress_cb => ...,
    param_1 => ...,
    param_2 => ...,
    ...,
    param_n => ...
  }

=head3 type

The entry C<type =E<gt> ...> specifies which algorithm is used for the 
optimization. Valid values include:

  'lbfgs'       Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS)
  'gis'         General Iterative Scaling (GIS)

If ommited, C<'lbfgs'> is used by default.

=head3 progress_cb

The entry C<progress_cb =E<gt> ...> specifies the progress callback
subroutine which is used to trace the process of the algorithm. 
The specified callback routine will be called at each iteration of the
algorithm.

For L-BFGS, C<progress_cb> will be directly passed to
L<Algorithm::LBFGS/fmin>. C<f(x)> is the negative log-likelihood of current
lambda vector.

For GIS, the C<progress_cb> is supposed to have a prototype like

  progress_cb(i, lambda, d_lambda, lambda_norm, d_lambda_norm)

C<i> is the number of the iterations, C<lambda> is an array ref containing
the current lambda vector, C<d_lambda> is an array ref containing the
delta of the lambda vector in current iteration, C<lambda_norm> and
C<d_lambda_norm> are Euclid norms of C<lambda> and C<d_lambda> respectively.

For both L-BFGS and GIS, the client program can also pass a string
C<'verbose'> to C<progress_cb> to use a default progress callback
which simply print out the progress on the screen.

C<progress_cb> can also be omitted if the client program
do not want to trace the progress.

=head3 parameters

The rest entries are parameters for the specified algorithm.
Each parameter will be assigned with its default value when it is not
given explicitly.

For L-BFGS, the parameters will be directly passed to
L<Algorithm::LBFGS> object, please refer to L<Algorithm::LBFGS/Parameters>
for details.

For GIS, there is only one parameter C<epsilon>, which controls the
precision of the algorithm (similar to the C<epsilon> in
L<Algorithm::LBFGS>). Generally speaking, a smaller C<epsilon> produces
a more precise result. The default value of C<epsilon> is 1e-3.

=head2 smoother

The smoother is a solution to the over-fitting problem. 
This property chooses which type of smoother the client program want to
apply and sets the smoothing parameters. 

Only one smoother have been implemented in this version of the module, 
the Gaussian smoother.

One can apply the Gaussian smoother as following,

  my $me = AI::MaxEntropy->new(
      smoother => { type => 'gaussian', sigma => 0.6 }
  );

The parameter C<sigma> indicates the strength of smoothing.
Usually, sigma is a positive number no greater than 1.0.
The strength of smoothing grows as sigma getting close to 0.

=head1 SEE ALSO

L<AI::MaxEntropy::Model>, L<AI::MaxEntropy::Util>

L<Algorithm::LBFGS>

L<Statistics::MaxEntropy>, L<Algorithm::CRF>, L<Algorithm::SVM>,
L<AI::DecisionTree>

=head1 AUTHOR

Laye Suen, E<lt>laye@cpan.orgE<gt>

=head1 COPYRIGHT AND LICENSE



( run in 0.470 second using v1.01-cache-2.11-cpan-96521ef73a4 )