Algorithm-SVMLight

lib/Algorithm/SVMLight.pm


sub predict {
  my ($self, %params) = @_;
  for ('attributes') {
    die "Missing required '$_' parameter" unless exists $params{$_};
  }
  
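  # Translate attribute names to integer feature indices, silently
  # skipping names that were never registered by add_instance().
  my (@values, @indices);
  while (my ($key) = each %{ $params{attributes} }) {
    push @indices, $self->{features}{$key} if exists $self->{features}{$key};
  }

  # Sort by index and collect the matching values in the same order.
  @indices = sort {$a <=> $b} @indices;
  foreach my $i (@indices) {
    push @values, $params{attributes}{ $self->{rfeatures}[$i] };
  }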

  # warn "Predicting: (@indices), (@values)\n";
  $self->predict_i(\@indices, \@values);
}

sub add_instance {
  my ($self, %params) = @_;
  for ('attributes', 'label') {
    die "Missing required '$_' parameter" unless exists $params{$_};
  }
  for ($params{label}) {
    die "Label must be a real number, not '$_'" unless /^-?\d+(\.\d+)?$/;
  }
  
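  # Give each attribute name not seen before the next free integer index
  # (starting at 1) and record the reverse mapping in rfeatures.
  my @values;
  my @indices;
  while (my ($key, $val) = each %{ $params{attributes} }) {
    unless ( exists $self->{features}{$key} ) {
      $self->{features}{$key} = 1 + keys %{ $self->{features} };
      push @{ $self->{rfeatures} }, $key;
    }
    push @indices, $self->{features}{$key};
  }

  # Sort by index and collect the matching values in the same order.
  @indices = sort { $a <=> $b } @indices;
  foreach my $i (@indices) {
    push @values, $params{attributes}{ $self->{rfeatures}[$i] };
  }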

  #warn "Adding document: (@indices), (@values) => $params{label}\n";
  my $id = exists $params{query_id} ? $params{query_id} : 0;
  my $slack = exists $params{slack_id} ? $params{slack_id} : 1;
  my $cost = exists $params{cost_factor} ? $params{cost_factor} : 1.0;
  $self->add_instance_i($params{label}, "", \@indices, \@values, $id, $slack, $cost);
}

sub write_model {
  my ($self, $file) = @_;
  $self->_write_model($file);

  # Write a footer line
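  if ( my $numf = keys %{ $self->{features} } ) {
    # Three-argument open, so odd characters in $file are never read as a mode
    open my $fh, '>>', $file or die "Can't write footer to $file: $!";
    print $fh ('#rfeatures: [undef, ' ,
	       join( ', ', map _escape($self->{rfeatures}[$_]), 1..$numf ),
	       "]\n");
    close $fh or die "Can't close $file: $!";
  }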
}

sub read_model {
  my ($self, $file) = @_;
  $self->_read_model($file);

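  # Read the footer line, which holds the reverse feature list as a Perl
  # array literal.  It is passed to eval, so only load trusted model files.
  open my $fh, '<', $file or die "Can't read $file: $!";
  local $_;
  while (<$fh>) {
    next unless /^#rfeatures: (\[.*\])$/;
    my $rf = $self->{rfeatures} = eval $1;
    die $@ if $@;
    $self->{features} = { map {$rf->[$_], $_} 1..$#$rf };
  }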
}

sub _escape {
  local $_ = shift;
  s/([\\'])/\\$1/g;
  s/\n/\\n/g;
  s/\r/\\r/g;
  return "'$_'";
}

1;
__END__

=head1 NAME

Algorithm::SVMLight - Perl interface to SVMLight Machine-Learning Package

=head1 SYNOPSIS

  use Algorithm::SVMLight;
  my $s = Algorithm::SVMLight->new;
  
  $s->add_instance
    (attributes => {foo => 1, bar => 1, baz => 3},
     label => 1);
  
  $s->add_instance
    (attributes => {foo => 2, blurp => 1},
     label => -1);
  
  # ... repeat for several more instances, then:
  $s->train;

  # Find results for unseen instances
  my $result = $s->predict
    (attributes => {bar => 3, blurp => 2});
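
The attribute names passed to add_instance() and predict() are translated to integer feature indices before they reach SVMLight. The following is a minimal pure-Perl sketch of that bookkeeping, modelled on those two methods. The index_attributes() helper and the standalone %features and @rfeatures variables are illustrative only and not part of this module's API. The sketch also assumes that slot 0 of the reverse list is unused, and it sorts the keys only to make the output reproducible.

```perl
use strict;
use warnings;

# Illustrative stand-ins for the object's {features} and {rfeatures} slots.
my %features;               # attribute name => integer index (1-based)
my @rfeatures = (undef);    # integer index => attribute name; slot 0 unused (assumption)

# Hypothetical helper: returns index-sorted, parallel lists of indices and
# values.  With $grow true (training) unseen names get the next free index;
# with $grow false (prediction) unseen names are skipped.
sub index_attributes {
  my ($attributes, $grow) = @_;
  my @indices;
  for my $key (sort keys %$attributes) {   # sorted only for reproducibility
    unless (exists $features{$key}) {
      next unless $grow;
      my $next = 1 + scalar keys %features;
      $features{$key} = $next;
      push @rfeatures, $key;
    }
    push @indices, $features{$key};
  }
  @indices = sort { $a <=> $b } @indices;
  my @values = map { $attributes->{ $rfeatures[$_] } } @indices;
  return (\@indices, \@values);
}

index_attributes({ foo => 1, bar => 1, baz => 3 }, 1);   # bar=1, baz=2, foo=3
index_attributes({ foo => 2, blurp => 1 }, 1);           # blurp=4

my ($indices, $values) =
  index_attributes({ bar => 3, blurp => 2, unseen => 9 }, 0);
print "indices: (@$indices)  values: (@$values)\n";
# prints "indices: (1 4)  values: (3 2)"
```

Attributes that never appeared in training, such as "unseen" here, contribute nothing to the prediction.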


=head1 DESCRIPTION

This module implements a Perl interface to Thorsten Joachims' SVMLight
package:

=over 4

SVMLight is an implementation of Vapnik's Support Vector Machine
[Vapnik, 1995] for the problem of pattern recognition, for the problem
of regression, and for the problem of learning a ranking function. The
optimization algorithms used in SVMlight are described in [Joachims,
2002a] and [Joachims, 1999a]. The algorithm has scalable memory
requirements and can handle problems with many thousands of support
vectors efficiently.

 -- http://svmlight.joachims.org/

=back

Support Vector Machines in general, and SVMLight specifically,
represent some of the best-performing Machine Learning approaches in
domains such as text categorization, image recognition, bioinformatics
string processing, and others.
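
A model saved with write_model() carries a "#rfeatures:" footer line holding the feature names, which read_model() parses back into the name-to-index map. The sketch below reproduces that round trip in plain Perl without touching SVMLight. Here escape_name() mirrors the module's private _escape() routine, and the sample feature names are made up for illustration.

```perl
use strict;
use warnings;

# Same quoting rules as the module's _escape(): backslash-escape \ and ',
# spell out newlines and carriage returns, then wrap in single quotes.
sub escape_name {
  my ($name) = @_;
  $name =~ s/([\\'])/\\$1/g;
  $name =~ s/\n/\\n/g;
  $name =~ s/\r/\\r/g;
  return "'$name'";
}

# Hypothetical reverse feature list; slot 0 is unused.
my @rfeatures = (undef, 'foo', "it's", 'a\\b');

# Build the footer the way write_model() does.
my $footer = '#rfeatures: [undef, '
           . join(', ', map { escape_name($rfeatures[$_]) } 1 .. $#rfeatures)
           . ']';

# Parse it back the way read_model() does: eval the array literal, then
# invert it into a name => index hash.
my ($list) = $footer =~ /^#rfeatures: (\[.*\])$/
  or die "No rfeatures footer found";
my $rf = eval $list;
die $@ if $@;
my %features = map { ($rf->[$_] => $_) } 1 .. $#$rf;

print "$footer\n";
print "$_ => $features{$_}\n" for sort keys %features;
```

Quotes and backslashes survive the round trip. A name containing a newline does not: it is written as a backslash followed by "n" inside single quotes, so it comes back as those two literal characters.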

For efficiency reasons, the underlying SVMLight engine indexes features by integers, not


