AI-MaxEntropy


Changes  view on Meta::CPAN

Revision history for Perl extension AI::MaxEntropy.

0.20  Thu Mar  6 20:20:00 2008
	- Implemented GIS for ME model learning; the client program can now
	  use either of the two algorithms (GIS and L-BFGS) for learning
	- More optimization for speed
	- $self->{optimizer} changed to $self->{algorithm}
	- all_features changed to all_x
	- Some modification on the documentation to avoid the ambiguity of
	  the concept 'feature'

0.11  Sat Feb 16 17:27:00 2008
	- Optimize the XS code; the function 'learn' should now run at
	  least twice as fast as in version 0.10
	- New functions in AI::MaxEntropy::Util, which allow the client
	  program to manipulate samples more flexibly
	- Replace Test::Differences with is_deeply in Test::More

0.10  Wed Feb 13 16:56:00 2008
	- rewrite the log likelihood evaluation and smoothing by C, now

README  view on Meta::CPAN

      my $me = AI::MaxEntropy->new; 
  
      # the learner sees 2 red round smooth apples
      $me->see(['round', 'smooth', 'red'] => 'apple' => 2);
  
      # the learner sees 3 yellow long smooth bananas
      $me->see(['long', 'smooth', 'yellow'] => 'banana' => 3);

      # and more

      # samples needn't have the same number of active features
      $me->see(['rough', 'big'] => 'pomelo');

      # the order of active features does not matter either
      $me->see(['big', 'rough'] => 'pomelo');

      # ...

      # and, let it learn
      my $model = $me->learn;

      # then, we can make predictions on unseen data

      # ask what a red thing is most likely to be

README  view on Meta::CPAN

      # ask what a smooth thing is most likely to be
      print $model->predict(['smooth'])."\n";
      # the answer is banana, because the learner has seen more smooth bananas
      # (weighted 3) than smooth apples (weighted 2)

      # ask what a red, long thing is most likely to be
      print $model->predict(['red', 'long'])."\n";
      # the answer is banana, because the learner has seen more long bananas
      # (weighted 3) than red apples (weighted 2)

      # print the scores of all possible labels for the features round and red
      for ($model->all_labels) {
          my $s = $model->score(['round', 'red'] => $_);
          print "$_: $s\n";
      }
  
      # save the model
      $model->save('model_file');

      # load the model
      $model->load('model_file');

README  view on Meta::CPAN


    Theoretically, an ME learner tries to recover the real probability
    distribution of the data from a limited number of observations, by
    applying the principle of maximum entropy.

    You can find some good tutorials on the Maximum Entropy model here:

    <http://homepages.inf.ed.ac.uk/s0450736/maxent.html>

  Features
    Generally, a feature is a binary function that answers a yes-no
    question about a specified piece of data.

    For example,

      "Is it a red apple?"

      "Is it a yellow banana?"

    If the answer is yes, we say this feature is active on that piece of
    data.

    In practice, a feature is usually represented as a tuple "<x, y>". For
    example, the above two features can be represented as

      <red, apple>

      <yellow, banana>
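
    A feature in this sense can be sketched as a closure that returns 1
    when it is active on a piece of data and 0 otherwise. The helper
    make_feature and the { attrs, label } data layout are hypothetical,
    chosen only for illustration; they are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: build the binary feature function for a tuple <x, y>.
# It answers "does this piece of data have attribute x and label y?"
sub make_feature {
    my ($x, $y) = @_;
    return sub {
        my ($data) = @_;    # { attrs => [...], label => '...' }
        my $has_x = grep { $_ eq $x } @{ $data->{attrs} };
        return ($has_x && $data->{label} eq $y) ? 1 : 0;
    };
}

my $red_apple     = make_feature('red',    'apple');
my $yellow_banana = make_feature('yellow', 'banana');

my $data = { attrs => ['round', 'smooth', 'red'], label => 'apple' };

print "red apple active: ",     $red_apple->($data),     "\n";
print "yellow banana active: ", $yellow_banana->($data), "\n";
```

    Here <red, apple> is active on the data and <yellow, banana> is not.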

  Samples
    A sample is a set of active features, all of which share a common "y".
    This common "y" is sometimes called the label or tag. For example, if
    we have a big round red apple, the corresponding sample is

      {<big, apple>, <round, apple>, <red, apple>}
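
    As a minimal sketch, the set above can be produced from a list of "x"
    values and one shared "y". The helper sample_features is hypothetical
    and only illustrates the notation; it is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a list of x's and a label y into the set of
# active features <x, y> that make up one sample.
sub sample_features {
    my ($xs, $y) = @_;
    return map { "<$_, $y>" } @$xs;
}

my @features = sample_features(['big', 'round', 'red'], 'apple');
print '{', join(', ', @features), "}\n";
```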

    In this module, a sample is denoted in Perl code as

      $xs => $y => $w

    $xs is an array ref holding all "x", $y is a scalar holding the label

README  view on Meta::CPAN


    Actually, the two samples above are converted internally to,

      $me->see(['color:yellow', 'shape:long'] => 'banana');
      $me->see(['color:red', 'color:green', 'shape:round'] => 'apple');
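
    The two original samples are elided from this excerpt. Assuming they
    were given as attribute => value hash refs (an assumption), the
    conversion to "attribute:value" strings might look like the sketch
    below. flatten_attrs is a hypothetical helper, not the module's
    internal code.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's actual code: flatten an
# attribute => value hash into a list of "attribute:value" strings.
# A value may be a scalar or an array ref holding several values.
sub flatten_attrs {
    my ($attrs) = @_;
    my @xs;
    for my $name (sort keys %$attrs) {
        my $value  = $attrs->{$name};
        my @values = ref($value) eq 'ARRAY' ? @$value : ($value);
        push @xs, map { "${name}:$_" } @values;
    }
    return \@xs;
}

my $banana = flatten_attrs({ color => 'yellow', shape => 'long' });
my $apple  = flatten_attrs({ color => ['red', 'green'], shape => 'round' });

print join(', ', @$banana), "\n";
print join(', ', @$apple),  "\n";
```

    Sorting the attribute names only makes the output order stable; the
    order of active features does not matter to the learner.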

  forget_all
    Forget all samples the learner has seen previously.

  cut
    Cut the features that occur fewer times than the specified number.

    For example,

      ...
      $me->cut(1)

    will cut all features that occur fewer than one time.
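
    The idea of cutting by frequency can be sketched in plain Perl as
    follows. This is not the module's implementation: feature_counts and
    cut_features are hypothetical helpers, and counting a sample of weight
    $w as $w occurrences is an assumption.

```perl
use strict;
use warnings;

# Samples in the form [$xs, $y, $w], mirroring $xs => $y => $w.
my @samples = (
    [ ['round', 'smooth', 'red'],    'apple',  2 ],
    [ ['long',  'smooth', 'yellow'], 'banana', 3 ],
    [ ['rough', 'big'],              'pomelo', 1 ],
);

# Hypothetical helper: count occurrences of each feature <x, y>,
# treating a sample of weight $w as $w occurrences (an assumption).
sub feature_counts {
    my %count;
    for my $sample (@_) {
        my ($xs, $y, $w) = @$sample;
        $count{"$_\t$y"} += $w for @$xs;
    }
    return \%count;
}

# Keep only the features that occur at least $threshold times.
sub cut_features {
    my ($threshold, @samples) = @_;
    my $count = feature_counts(@samples);
    return grep { $count->{$_} >= $threshold } sort keys %$count;
}

my @kept = cut_features(2, @samples);
print join("\n", map { join(' => ', split /\t/) } @kept), "\n";
```

    With a threshold of 2, the pomelo features (seen once) are dropped
    while the apple and banana features are kept.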

  learn
    Learn a model from all the samples that the learner has seen so far.
    Returns an AI::MaxEntropy::Model object, which can be used to make
    predictions on unlabeled samples.

      ...

      my $model = $me->learn;

README  view on Meta::CPAN

    This property enables the client program to choose among different
    algorithms for learning the ME model and to set their parameters.

    There are three main algorithms for learning ME models: GIS, IIS and
    L-BFGS. This module implements two of them, L-BFGS and GIS. L-BFGS
    provides full functionality, while GIS runs faster but is applicable
    only in limited scenarios.

    To use GIS, the following conditions must be satisfied:

    1. All samples have the same number of active features

    2. No feature has been cut

    3. No smoother is used (in fact, the property "smoother" is simply
    ignored when the algorithm type is 'gis').
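
    Condition 1 can be checked before choosing an algorithm. The sketch
    below is only an illustration: same_feature_count is a hypothetical
    helper, samples are assumed to be [$xs, $y, $w] triples, and 'lbfgs'
    is an assumed spelling of the L-BFGS type ('gis' is the one named
    above).

```perl
use strict;
use warnings;

# Hypothetical check for condition 1: every sample has the same number
# of active features. Samples are [$xs, $y, $w] triples.
sub same_feature_count {
    my @samples = @_;
    return 1 unless @samples;
    my $n   = scalar @{ $samples[0][0] };
    my @odd = grep { scalar @{ $_->[0] } != $n } @samples;
    return @odd ? 0 : 1;
}

my @fruit = (
    [ ['round', 'smooth', 'red'],    'apple',  2 ],
    [ ['long',  'smooth', 'yellow'], 'banana', 3 ],
);
my @mixed = ( @fruit, [ ['rough', 'big'], 'pomelo', 1 ] );

# Pick an algorithm type accordingly; 'lbfgs' is an assumed name.
for my $set ( [ fruit => \@fruit ], [ mixed => \@mixed ] ) {
    my ($name, $samples) = @$set;
    my $type = same_feature_count(@$samples) ? 'gis' : 'lbfgs';
    print "$name: $type\n";
}
```

    Conditions 2 and 3 depend on how the learner was configured, so they
    are not covered by this check.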

    The property "algorithm" is expected to be a hash ref, like

      {
        type => ...,
        progress_cb => ...,
        param_1 => ...,

inc/Module/AutoInstall.pm  view on Meta::CPAN


use strict;
use Cwd                 ();
use ExtUtils::MakeMaker ();

use vars qw{$VERSION};
BEGIN {
	$VERSION = '1.03';
}

# special map on pre-defined feature sets
my %FeatureMap = (
    ''      => 'Core Features',    # XXX: deprecated
    '-core' => 'Core Features',
);

# various lexical flags
my ( @Missing, @Existing,  %DisabledTests, $UnderCPAN,     $HasCPANPLUS );
my ( $Config,  $CheckOnly, $SkipInstall,   $AcceptDefault, $TestOnly );
my ( $PostambleActions, $PostambleUsed );

inc/Module/AutoInstall.pm  view on Meta::CPAN

              map  {
                ref($_)
                  ? ( ( ref($_) eq 'HASH' ) ? keys(%$_) : @{$_} )
                  : ''
              }
              map { +{@args}->{$_} }
              grep { /^[^\-]/ or /^-core$/i } keys %{ +{@args} }
        )[0]
    );

    while ( my ( $feature, $modules ) = splice( @args, 0, 2 ) ) {
        my ( @required, @tests, @skiptests );
        my $default  = 1;
        my $conflict = 0;

        if ( $feature =~ m/^-(\w+)$/ ) {
            my $option = lc($1);

            # check for a newer version of myself
            _update_to( $modules, @_ ) and return if $option eq 'version';

            # sets CPAN configuration options
            $Config = $modules if $option eq 'config';

            # promote every features to core status
            $core_all = ( $modules =~ /^all$/i ) and next
              if $option eq 'core';

            next unless $option eq 'core';
        }

        print "[" . ( $FeatureMap{ lc($feature) } || $feature ) . "]\n";

        $modules = [ %{$modules} ] if UNIVERSAL::isa( $modules, 'HASH' );

        unshift @$modules, -default => &{ shift(@$modules) }
          if ( ref( $modules->[0] ) eq 'CODE' );    # XXX: bugward combatability

        while ( my ( $mod, $arg ) = splice( @$modules, 0, 2 ) ) {
            if ( $mod =~ m/^-(\w+)$/ ) {
                my $option = lc($1);

inc/Module/AutoInstall.pm  view on Meta::CPAN

                $DisabledTests{$_} = 1 for map { glob($_) } @skiptests;
            }
            else {
                print "missing." . ( $arg ? " (would need $arg)" : '' ) . "\n";
                push @required, $mod => $arg;
            }
        }

        next unless @required;

        my $mandatory = ( $feature eq '-core' or $core_all );

        if (
            !$SkipInstall
            and (
                $CheckOnly
                or _prompt(
                    qq{==> Auto-install the }
                      . ( @required / 2 )
                      . ( $mandatory ? ' mandatory' : ' optional' )
                      . qq{ module(s) from CPAN?},

inc/Module/AutoInstall.pm  view on Meta::CPAN


    no strict 'refs';

    $PostambleUsed = 0;
    local *MY::postamble = \&postamble unless defined &MY::postamble;
    ExtUtils::MakeMaker::WriteMakefile(%args);

    print << "." unless $PostambleUsed;
*** WARNING: Makefile written with customized MY::postamble() without
    including contents from Module::AutoInstall::postamble() --
    auto installation features disabled.  Please contact the author.
.

    return 1;
}

sub postamble {
    $PostambleUsed = 1;

    return << ".";

inc/Module/Install/AutoInstall.pm  view on Meta::CPAN


    my @config = @_;

    # We'll need Module::AutoInstall
    $self->include('Module::AutoInstall');
    require Module::AutoInstall;

    Module::AutoInstall->import(
        (@config ? (-config => \@config) : ()),
        (@core   ? (-core   => \@core)   : ()),
        $self->features,
    );

    $self->makemaker_args( Module::AutoInstall::_make_args() );

    my $class = ref($self);
    $self->postamble(
        "# --- $class section:\n" .
        Module::AutoInstall::postamble()
    );
}

inc/Module/Install/Metadata.pm  view on Meta::CPAN


    require Module::Build;
    my $build = Module::Build->new(
        dist_name    => $self->name,
        dist_version => $self->version,
        license      => $self->license,
    );
    $self->provides(%{ $build->find_dist_packages || {} });
}

sub feature {
    my $self     = shift;
    my $name     = shift;
    my $features = ( $self->{values}{features} ||= [] );

    my $mods;

    if ( @_ == 1 and ref( $_[0] ) ) {
        # The user used ->feature like ->features by passing in the second
        # argument as a reference.  Accommodate that.
        $mods = $_[0];
    } else {
        $mods = \@_;
    }

    my $count = 0;
    push @$features, (
        $name => [
            map {
                ref($_) ? ( ref($_) eq 'HASH' ) ? %$_
                                                : @$_
                        : $_
            } @$mods
        ]
    );

    return @$features;
}

sub features {
    my $self = shift;
    while ( my ( $name, $mods ) = splice( @_, 0, 2 ) ) {
        $self->feature( $name, @$mods );
    }
    return $self->{values}->{features}
    	? @{ $self->{values}->{features} }
    	: ();
}

sub no_index {
    my $self = shift;
    my $type = shift;
    push @{ $self->{values}{no_index}{$type} }, @_ if $type;
    return $self->{values}{no_index};
}

lib/AI/MaxEntropy.pm  view on Meta::CPAN

  my $me = AI::MaxEntropy->new; 
  
  # the learner sees 2 red round smooth apples
  $me->see(['round', 'smooth', 'red'] => 'apple' => 2);
  
  # the learner sees 3 yellow long smooth bananas
  $me->see(['long', 'smooth', 'yellow'] => 'banana' => 3);

  # and more

  # samples needn't have the same number of active features
  $me->see(['rough', 'big'] => 'pomelo');

  # the order of active features does not matter either
  $me->see(['big', 'rough'] => 'pomelo');

  # ...

  # and, let it learn
  my $model = $me->learn;

  # then, we can make predictions on unseen data

  # ask what a red thing is most likely to be

lib/AI/MaxEntropy.pm  view on Meta::CPAN

  # ask what a smooth thing is most likely to be
  print $model->predict(['smooth'])."\n";
  # the answer is banana, because the learner has seen more smooth bananas
  # (weighted 3) than smooth apples (weighted 2)

  # ask what a red, long thing is most likely to be
  print $model->predict(['red', 'long'])."\n";
  # the answer is banana, because the learner has seen more long bananas
  # (weighted 3) than red apples (weighted 2)

  # print the scores of all possible labels for the features round and red
  for ($model->all_labels) {
      my $s = $model->score(['round', 'red'] => $_);
      print "$_: $s\n";
  }
  
  # save the model
  $model->save('model_file');

  # load the model
  $model->load('model_file');

lib/AI/MaxEntropy.pm  view on Meta::CPAN

Theoretically, an ME learner tries to recover the real probability
distribution of the data from a limited number of observations, by
applying the principle of maximum entropy.

You can find some good tutorials on the Maximum Entropy model here:

L<http://homepages.inf.ed.ac.uk/s0450736/maxent.html>

=head2 Features

Generally, a feature is a binary function that answers a yes-no question
about a specified piece of data.

For example,

  "Is it a red apple?"

  "Is it a yellow banana?"

If the answer is yes,
we say this feature is active on that piece of data.

In practice, a feature is usually represented as
a tuple C<E<lt>x, yE<gt>>. For example, the above two features can be
represented as

  <red, apple>

  <yellow, banana>

=head2 Samples

A sample is a set of active features, all of which share a common C<y>.
This common C<y> is sometimes called the label or tag.
For example, if we have a big round red apple, the corresponding sample is

  {<big, apple>, <round, apple>, <red, apple>}

In this module, a sample is denoted in Perl code as

  $xs => $y => $w

C<$xs> is an array ref holding all C<x>,

lib/AI/MaxEntropy.pm  view on Meta::CPAN


  $me->see(['color:yellow', 'shape:long'] => 'banana');
  $me->see(['color:red', 'color:green', 'shape:round'] => 'apple');

=head2 forget_all

Forget all samples the learner has seen previously.

=head2 cut

Cut the features that occur fewer times than the specified number.

For example, 

  ...
  $me->cut(1)

will cut all features that occur fewer than one time.

=head2 learn 

Learn a model from all the samples that the learner has seen so far.
Returns an L<AI::MaxEntropy::Model> object, which can be used to make
predictions on unlabeled samples.

  ...

  my $model = $me->learn;

lib/AI/MaxEntropy.pm  view on Meta::CPAN

This property enables the client program to choose among different
algorithms for learning the ME model and to set their parameters.

There are three main algorithms for learning ME models: GIS, IIS and
L-BFGS. This module implements two of them, L-BFGS and GIS.
L-BFGS provides full functionality, while GIS runs faster but is
applicable only in limited scenarios.

To use GIS, the following conditions must be satisfied:

1. All samples have the same number of active features

2. No feature has been cut

3. No smoother is used (in fact, the property L</smoother> is simply
ignored when the algorithm type is 'gis').

The property C<algorithm> is expected to be a hash ref, like

  {
    type => ...,
    progress_cb => ...,
    param_1 => ...,

ppport.h  view on Meta::CPAN

should make any changes to your code, and whether any special defines
should be used, F<ppport.h> can be run as a Perl script to check your
source code. Simply say:

    perl ppport.h

The result will usually be a list of patches suggesting changes
that should at least be acceptable, if not necessarily the most
efficient solution, or a fix for all possible problems.

If you know that your XS module uses features only available in
newer Perl releases, if you're aware that it uses C++ comments,
and if you want all suggestions as a single patch file, you could
use something like this:

    perl ppport.h --compat-version=5.6.0 --cplusplus --patch=test.diff

If you only want your code to be scanned without any suggestions
for changes, use:

    perl ppport.h --nochanges


