Revision history for Perl extension AI::MaxEntropy.
0.20 Thu Mar 6 20:20:00 2008
- Implemented GIS for ME model learning; the client program can now
use either of the two algorithms (GIS and L-BFGS) for learning
- More optimization for speed
- $self->{optimizer} changed to $self->{algorithm}
- all_features changed to all_x
- Some modification on the documentation to avoid the ambiguity of
the concept 'feature'
0.11 Sat Feb 16 17:27:00 2008
- Optimized the XS code; the function 'learn' should now run
at least twice as fast as in version 0.10
- New functions in AI::MaxEntropy::Util, which allow the client
program to manipulate samples more flexibly
- Replace Test::Differences with is_deeply in Test::More
0.10 Wed Feb 13 16:56:00 2008
- rewrite the log likelihood evaluation and smoothing by C, now
my $me = AI::MaxEntropy->new;
# the learner sees 2 red round smooth apples
$me->see(['round', 'smooth', 'red'] => 'apple' => 2);
# the learner sees 3 yellow long smooth bananas
$me->see(['long', 'smooth', 'yellow'] => 'banana' => 3);
# and more
# samples needn't have the same number of active features
$me->see(['rough', 'big'] => 'pomelo');
# the order of active features does not matter, either
$me->see(['big', 'rough'] => 'pomelo');
# ...
# and, let it learn
my $model = $me->learn;
# then, we can make predictions on unseen data
# ask what a red thing is most likely to be
# ask what a smooth thing is most likely to be
print $model->predict(['smooth'])."\n";
# the answer is banana, because the learner has seen more smooth bananas
# (weighted 3) than smooth apples (weighted 2)
# ask what a red, long thing is most likely to be
print $model->predict(['red', 'long'])."\n";
# the answer is banana, because the learner has seen more long bananas
# (weighted 3) than red apples (weighted 2)
# print out the scores of all possible answers for the features round and red
for ($model->all_labels) {
my $s = $model->score(['round', 'red'] => $_);
print "$_: $s\n";
}
# save the model
$model->save('model_file');
# load the model
$model->load('model_file');
Theoretically, an ME learner tries to recover the real probability
distribution of the data from a limited number of observations, by
applying the principle of maximum entropy.
You can find some good tutorials on the Maximum Entropy model here:
<http://homepages.inf.ed.ac.uk/s0450736/maxent.html>
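As a rough illustration of what such a model computes, the sketch below evaluates the standard log-linear form of a conditional maximum entropy model, p(y | xs) = exp(sum of weights of active features) / Z(xs). The weights in %lambda, the key format "x|y", and the helper name prob are made up for this example; they are assumptions, not the module's internal representation. The defined-or operator requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical weights for <x, y> features; NOT taken from a trained model.
my %lambda = (
    'smooth|apple'  => 0.2,
    'smooth|banana' => 0.6,
    'red|apple'     => 1.1,
);
my @labels = ('apple', 'banana');

# p(y | xs) = exp(sum of weights of active <x, y> features) / Z(xs)
sub prob {
    my ($xs, $y) = @_;
    my %score;
    for my $label (@labels) {
        my $sum = 0;
        # features without a weight contribute nothing
        $sum += $lambda{"$_|$label"} // 0 for @$xs;
        $score{$label} = exp($sum);
    }
    # Z(xs): normalizer over all labels
    my $z = 0;
    $z += $_ for values %score;
    return $score{$y} / $z;
}

printf "%s: %.3f\n", $_, prob(['smooth', 'red'], $_) for @labels;
```

The probabilities over all labels sum to one by construction, which is what distinguishes this from a raw score.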
Features
Generally, a feature is a binary function that answers a yes-no question
about a specified piece of data.
For example,
"Is it a red apple?"
"Is it a yellow banana?"
If the answer is yes, we say this feature is active on that piece of
data.
In practice, a feature is usually represented as a tuple "<x, y>". For
example, the above two features can be represented as
<red, apple>
<yellow, banana>
Samples
A sample is a set of active features, all of which share a common "y".
This common "y" is sometimes called the label or tag. For example, for a
big round red apple, the corresponding sample is
{<big, apple>, <round, apple>, <red, apple>}
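The relation between a sample and its active features can be sketched in a few lines of Perl. The helper sample_to_features is hypothetical (it is not part of AI::MaxEntropy); it simply pairs every active "x" with the common label "y".

```perl
use strict;
use warnings;

# Hypothetical helper: pair every active x with the sample's label y.
sub sample_to_features {
    my ($xs, $y) = @_;
    return map { "<$_, $y>" } @$xs;
}

my @features = sample_to_features(['big', 'round', 'red'], 'apple');
print '{', join(', ', @features), "}\n";
# prints {<big, apple>, <round, apple>, <red, apple>}
```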
In this module, a sample is denoted in Perl code as
$xs => $y => $w
$xs is an array ref holding all "x", $y is a scalar holding the label
Actually, the two samples above are converted internally to,
$me->see(['color:yellow', 'shape:long'] => 'banana');
$me->see(['color:red', 'color:green', 'shape:round'] => 'apple');
forget_all
Forget all samples the learner has seen previously.
cut
Cut the features that occur fewer times than the specified number.
For example,
...
$me->cut(1)
will cut all features that occur less than once.
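The idea behind cutting can be sketched as a frequency filter. This is only an illustration under stated assumptions: the sample layout [ \@xs, $y, $w ], the helper name cut_features, and the choice to count each <x, y> pair weighted by the sample weight are all hypothetical, and the module's own counting rule may differ.

```perl
use strict;
use warnings;

# Hypothetical samples, each stored as [ \@xs, $y, $weight ].
my @samples = (
    [ ['round', 'smooth', 'red'],   'apple',  2 ],
    [ ['long', 'smooth', 'yellow'], 'banana', 3 ],
    [ ['rough', 'big'],             'pomelo', 1 ],
);

# Assumption: a feature is an <x, y> pair, counted with the sample weight.
sub cut_features {
    my ($samples, $threshold) = @_;
    my %count;
    for my $s (@$samples) {
        my ($xs, $y, $w) = @$s;
        $count{"$_|$y"} += $w for @$xs;
    }
    # Rebuild every sample, keeping only features seen often enough.
    return map {
        my ($xs, $y, $w) = @$_;
        [ [ grep { $count{"$_|$y"} >= $threshold } @$xs ], $y, $w ];
    } @$samples;
}

my @kept = cut_features(\@samples, 3);
print join(' ', @{ $_->[0] }), " => $_->[1]\n" for @kept;
```

With a threshold of 1 nothing is removed in this sketch, since every feature that appears at all has a count of at least one.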
learn
Learn a model from all the samples that the learner has seen so far.
Returns an AI::MaxEntropy::Model object, which can be used to make
predictions on unlabeled samples.
...
my $model = $me->learn;
This property enables the client program to choose among different
algorithms for learning the ME model and to set their parameters.
There are three main algorithms for learning ME models: GIS, IIS,
and L-BFGS. This module implements two of them, namely L-BFGS and GIS.
L-BFGS provides full functionality, while GIS runs faster but is only
applicable in limited scenarios.
To use GIS, the following conditions must be satisfied:
1. All samples have the same number of active features
2. No feature has been cut
3. No smoother is used (in fact, the property "smoother" is simply
ignored when the type of algorithm equals 'gis').
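Condition 1 is easy to check before choosing GIS. The sketch below is a hypothetical pre-flight check (same_feature_count is not a function of this module) that takes the "x" lists of the samples and reports whether they all have the same length.

```perl
use strict;
use warnings;

# Hypothetical check for condition 1: every sample has the same
# number of active features. Each argument is an array ref of x values.
sub same_feature_count {
    my @samples = @_;
    my %sizes;
    $sizes{ scalar @$_ }++ for @samples;
    # at most one distinct size means all samples agree
    return keys(%sizes) <= 1;
}

my @uniform = (['round', 'smooth', 'red'], ['long', 'smooth', 'yellow']);
my @mixed   = (['round', 'smooth', 'red'], ['rough', 'big']);

print "uniform: ", (same_feature_count(@uniform) ? "ok for GIS" : "use L-BFGS"), "\n";
print "mixed:   ", (same_feature_count(@mixed)   ? "ok for GIS" : "use L-BFGS"), "\n";
```

A sample set that mixes three-feature and two-feature samples fails this check, so under the conditions listed above it would call for L-BFGS instead.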
The property "algorithm" is expected to be a hash ref, such as
{
type => ...,
progress_cb => ...,
param_1 => ...,
inc/Module/AutoInstall.pm view on Meta::CPAN
use strict;
use Cwd ();
use ExtUtils::MakeMaker ();
use vars qw{$VERSION};
BEGIN {
$VERSION = '1.03';
}
# special map on pre-defined feature sets
my %FeatureMap = (
'' => 'Core Features', # XXX: deprecated
'-core' => 'Core Features',
);
# various lexical flags
my ( @Missing, @Existing, %DisabledTests, $UnderCPAN, $HasCPANPLUS );
my ( $Config, $CheckOnly, $SkipInstall, $AcceptDefault, $TestOnly );
my ( $PostambleActions, $PostambleUsed );
inc/Module/AutoInstall.pm view on Meta::CPAN
map {
ref($_)
? ( ( ref($_) eq 'HASH' ) ? keys(%$_) : @{$_} )
: ''
}
map { +{@args}->{$_} }
grep { /^[^\-]/ or /^-core$/i } keys %{ +{@args} }
)[0]
);
while ( my ( $feature, $modules ) = splice( @args, 0, 2 ) ) {
my ( @required, @tests, @skiptests );
my $default = 1;
my $conflict = 0;
if ( $feature =~ m/^-(\w+)$/ ) {
my $option = lc($1);
# check for a newer version of myself
_update_to( $modules, @_ ) and return if $option eq 'version';
# sets CPAN configuration options
$Config = $modules if $option eq 'config';
# promote every feature to core status
$core_all = ( $modules =~ /^all$/i ) and next
if $option eq 'core';
next unless $option eq 'core';
}
print "[" . ( $FeatureMap{ lc($feature) } || $feature ) . "]\n";
$modules = [ %{$modules} ] if UNIVERSAL::isa( $modules, 'HASH' );
unshift @$modules, -default => &{ shift(@$modules) }
if ( ref( $modules->[0] ) eq 'CODE' ); # XXX: bugward combatability
while ( my ( $mod, $arg ) = splice( @$modules, 0, 2 ) ) {
if ( $mod =~ m/^-(\w+)$/ ) {
my $option = lc($1);
inc/Module/AutoInstall.pm view on Meta::CPAN
$DisabledTests{$_} = 1 for map { glob($_) } @skiptests;
}
else {
print "missing." . ( $arg ? " (would need $arg)" : '' ) . "\n";
push @required, $mod => $arg;
}
}
next unless @required;
my $mandatory = ( $feature eq '-core' or $core_all );
if (
!$SkipInstall
and (
$CheckOnly
or _prompt(
qq{==> Auto-install the }
. ( @required / 2 )
. ( $mandatory ? ' mandatory' : ' optional' )
. qq{ module(s) from CPAN?},
inc/Module/AutoInstall.pm view on Meta::CPAN
no strict 'refs';
$PostambleUsed = 0;
local *MY::postamble = \&postamble unless defined &MY::postamble;
ExtUtils::MakeMaker::WriteMakefile(%args);
print << "." unless $PostambleUsed;
*** WARNING: Makefile written with customized MY::postamble() without
including contents from Module::AutoInstall::postamble() --
auto installation features disabled. Please contact the author.
.
return 1;
}
sub postamble {
$PostambleUsed = 1;
return << ".";
inc/Module/Install/AutoInstall.pm view on Meta::CPAN
my @config = @_;
# We'll need Module::AutoInstall
$self->include('Module::AutoInstall');
require Module::AutoInstall;
Module::AutoInstall->import(
(@config ? (-config => \@config) : ()),
(@core ? (-core => \@core) : ()),
$self->features,
);
$self->makemaker_args( Module::AutoInstall::_make_args() );
my $class = ref($self);
$self->postamble(
"# --- $class section:\n" .
Module::AutoInstall::postamble()
);
}
inc/Module/Install/Metadata.pm view on Meta::CPAN
require Module::Build;
my $build = Module::Build->new(
dist_name => $self->name,
dist_version => $self->version,
license => $self->license,
);
$self->provides(%{ $build->find_dist_packages || {} });
}
sub feature {
my $self = shift;
my $name = shift;
my $features = ( $self->{values}{features} ||= [] );
my $mods;
if ( @_ == 1 and ref( $_[0] ) ) {
# The user used ->feature like ->features by passing in the second
# argument as a reference. Accommodate that.
$mods = $_[0];
} else {
$mods = \@_;
}
my $count = 0;
push @$features, (
$name => [
map {
ref($_) ? ( ref($_) eq 'HASH' ) ? %$_
: @$_
: $_
} @$mods
]
);
return @$features;
}
sub features {
my $self = shift;
while ( my ( $name, $mods ) = splice( @_, 0, 2 ) ) {
$self->feature( $name, @$mods );
}
return $self->{values}->{features}
? @{ $self->{values}->{features} }
: ();
}
sub no_index {
my $self = shift;
my $type = shift;
push @{ $self->{values}{no_index}{$type} }, @_ if $type;
return $self->{values}{no_index};
}
lib/AI/MaxEntropy.pm view on Meta::CPAN
my $me = AI::MaxEntropy->new;
# the learner sees 2 red round smooth apples
$me->see(['round', 'smooth', 'red'] => 'apple' => 2);
# the learner sees 3 yellow long smooth bananas
$me->see(['long', 'smooth', 'yellow'] => 'banana' => 3);
# and more
# samples needn't have the same number of active features
$me->see(['rough', 'big'] => 'pomelo');
# the order of active features does not matter, either
$me->see(['big', 'rough'] => 'pomelo');
# ...
# and, let it learn
my $model = $me->learn;
# then, we can make predictions on unseen data
# ask what a red thing is most likely to be
lib/AI/MaxEntropy.pm view on Meta::CPAN
# ask what a smooth thing is most likely to be
print $model->predict(['smooth'])."\n";
# the answer is banana, because the learner has seen more smooth bananas
# (weighted 3) than smooth apples (weighted 2)
# ask what a red, long thing is most likely to be
print $model->predict(['red', 'long'])."\n";
# the answer is banana, because the learner has seen more long bananas
# (weighted 3) than red apples (weighted 2)
# print out the scores of all possible answers for the features round and red
for ($model->all_labels) {
my $s = $model->score(['round', 'red'] => $_);
print "$_: $s\n";
}
# save the model
$model->save('model_file');
# load the model
$model->load('model_file');
lib/AI/MaxEntropy.pm view on Meta::CPAN
Theoretically, an ME learner tries to recover the real probability distribution
of the data from a limited number of observations, by applying the
principle of maximum entropy.
You can find some good tutorials on the Maximum Entropy model here:
L<http://homepages.inf.ed.ac.uk/s0450736/maxent.html>
=head2 Features
Generally, a feature is a binary function that answers a yes-no question
about a specified piece of data.
For example,
"Is it a red apple?"
"Is it a yellow banana?"
If the answer is yes,
we say this feature is active on that piece of data.
In practice, a feature is usually represented as
a tuple C<E<lt>x, yE<gt>>. For example, the above two features can be
represented as
<red, apple>
<yellow, banana>
=head2 Samples
A sample is a set of active features, all of which share a common C<y>.
This common C<y> is sometimes called the label or tag.
For example, for a big round red apple, the corresponding sample is
{<big, apple>, <round, apple>, <red, apple>}
In this module, a sample is denoted in Perl code as
$xs => $y => $w
C<$xs> is an array ref holding all C<x>,
lib/AI/MaxEntropy.pm view on Meta::CPAN
$me->see(['color:yellow', 'shape:long'] => 'banana');
$me->see(['color:red', 'color:green', 'shape:round'] => 'apple');
=head2 forget_all
Forget all samples the learner has seen previously.
=head2 cut
Cut the features that occur fewer times than the specified number.
For example,
...
$me->cut(1)
will cut all features that occur less than once.
=head2 learn
Learn a model from all the samples that the learner has seen so far.
Returns an L<AI::MaxEntropy::Model> object, which can be used to make
predictions on unlabeled samples.
...
my $model = $me->learn;
lib/AI/MaxEntropy.pm view on Meta::CPAN
This property enables the client program to choose among different
algorithms for learning the ME model and to set their parameters.
There are three main algorithms for learning ME models: GIS, IIS, and
L-BFGS. This module implements two of them, namely L-BFGS and GIS.
L-BFGS provides full functionality, while GIS runs faster but is only
applicable in limited scenarios.
To use GIS, the following conditions must be satisfied:
1. All samples have the same number of active features
2. No feature has been cut
3. No smoother is used (in fact, the property L</smoother> is simply
ignored when the type of algorithm equals 'gis').
The property C<algorithm> is expected to be a hash ref, such as
{
type => ...,
progress_cb => ...,
param_1 => ...,
should make any changes to your code, and whether any special defines
should be used, F<ppport.h> can be run as a Perl script to check your
source code. Simply say:
perl ppport.h
The result will usually be a list of patches suggesting changes
that should at least be acceptable, if not necessarily the most
efficient solution, or a fix for all possible problems.
If you know that your XS module uses features only available in
newer Perl releases, if you're aware that it uses C++ comments,
and if you want all suggestions as a single patch file, you could
use something like this:
perl ppport.h --compat-version=5.6.0 --cplusplus --patch=test.diff
If you only want your code to be scanned without any suggestions
for changes, use:
perl ppport.h --nochanges