AI-Categorizer
lib/AI/Categorizer/Learner/Weka.pm
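  return unless $self->{model};
  # Copy each learner's model file into a 'models' directory under $path
  my $model_dir = File::Spec->catdir($path, 'models');
  mkdir($model_dir, 0777) or die "Couldn't create $model_dir: $!";
  while (my ($name, $learner) = each %{$self->{model}{learners}}) {
    # machine_file names a file inside the current model directory
    my $oldpath = File::Spec->catfile($self->{model}{_in_dir}, $learner->{machine_file});
    my $newpath = File::Spec->catfile($model_dir, "${name}_model");
    File::Copy::copy($oldpath, $newpath)
      or die "Couldn't copy $oldpath to $newpath: $!";
  }
  # Look for model files in the saved location from here on
  $self->{model}{_in_dir} = $model_dir;
}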
sub restore_state {
  my ($pkg, $path) = @_;
  my $self = $pkg->SUPER::restore_state($path);
  my $model_dir = File::Spec->catdir($path, 'models');
  return $self unless -e $model_dir;
  $self->{model}{_in_dir} = $model_dir;
  return $self;
}
1;
__END__
=head1 NAME
AI::Categorizer::Learner::Weka - Pass-through wrapper to Weka system
=head1 SYNOPSIS
  use AI::Categorizer::Learner::Weka;
  # Here $k is an AI::Categorizer::KnowledgeSet object
  my $nb = AI::Categorizer::Learner::Weka->new(...parameters...);
  $nb->train(knowledge_set => $k);
  $nb->save_state('filename');
  ... time passes ...
  $nb = AI::Categorizer::Learner::Weka->restore_state('filename');
  my $c = AI::Categorizer::Collection::Files->new( path => ... );
  while (my $document = $c->next) {
    my $hypothesis = $nb->categorize($document);
    print "Best assigned category: ", $hypothesis->best_category, "\n";
  }
=head1 DESCRIPTION
This class doesn't implement any machine learners of its own; it
merely passes the data through to the Weka machine learning system
(http://www.cs.waikato.ac.nz/~ml/weka/). This can give you access to
a collection of machine learning algorithms not otherwise implemented
in C<AI::Categorizer>.
Currently this is a simple command-line wrapper that calls C<java>
subprocesses. In the future it may be converted to an
C<Inline::Java> wrapper for faster running times. However, if you
need high performance, this is probably the wrong place to look: the
Weka wrapper is intended mainly as a way to try many different
machine learning methods.
=head1 METHODS
This class inherits from the C<AI::Categorizer::Learner> class, so all
of that class's methods are available and behave as documented there
unless noted otherwise here.
=head2 new()
Creates a new Weka Learner and returns it. In addition to the
parameters accepted by the C<AI::Categorizer::Learner> class, the
Weka subclass accepts the following parameters:
=over 4
=item java_path
Specifies where the C<java> executable can be found on this system.
The default is simply C<java>, meaning that it will search your
C<PATH> to find java.
=item java_args
Specifies a list of any additional arguments to give to the java
process. Commonly it's necessary to allocate more memory than the
default, using an argument like C<-Xmx130m>.
=item weka_path
Specifies the path to the C<weka.jar> file containing the Weka
bytecode. If Weka has been installed somewhere in your java
C<CLASSPATH>, you needn't specify a C<weka_path>.
=item weka_classifier
Specifies the Weka class to use for a categorizer. The default is
C<weka.classifiers.NaiveBayes>. Consult your Weka documentation for a
list of other classifiers available.
=item weka_args
Specifies a list of any additional arguments to pass to the Weka
classifier class when building the categorizer.
=item tmpdir
A directory in which temporary files will be written when training the
categorizer and categorizing new documents. The default is given by
C<< File::Spec->tmpdir >>.
=back
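
To illustrate how these parameters might fit together, here is a
minimal sketch that assembles a C<java> command line from them. The
helper C<build_java_command> and the sample C<weka.jar> location are
hypothetical and are not part of this module; the command the module
actually runs may be put together differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AI::Categorizer): combine the
# constructor parameters into an argument list for a java subprocess.
sub build_java_command {
  my (%p) = @_;
  my @cmd = ($p{java_path} || 'java');      # default: find java via PATH
  push @cmd, @{ $p{java_args} || [] };      # e.g. JVM memory options
  push @cmd, '-classpath', $p{weka_path}    # skipped if Weka is on CLASSPATH
    if defined $p{weka_path};
  push @cmd, $p{weka_classifier} || 'weka.classifiers.NaiveBayes';
  push @cmd, @{ $p{weka_args} || [] };      # options for the classifier class
  return @cmd;
}

my @cmd = build_java_command(
  java_args => ['-Xmx130m'],
  weka_path => '/opt/weka/weka.jar',        # assumed location, for illustration
);
print join(' ', @cmd), "\n";
```

Keeping the command as a list, rather than one string, means it can be
handed to C<system(@cmd)> without any shell quoting.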
=head2 train(knowledge_set => $k)
Trains the categorizer. This prepares it for later use in
categorizing documents. The C<knowledge_set> parameter must provide
an object of the class C<AI::Categorizer::KnowledgeSet> (or a subclass
thereof), populated with lots of documents and categories. See
L<AI::Categorizer::KnowledgeSet> for the details of how to create such
an object.
=head2 categorize($document)
Returns an C<AI::Categorizer::Hypothesis> object representing the
categorizer's "best guess" about which categories the given document
should be assigned to. See L<AI::Categorizer::Hypothesis> for more
details on how to use this object.
=head2 save_state($path)
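Saves the categorizer for later use. This method extends the
C<save_state> method inherited from C<AI::Categorizer::Storable>: if a
trained model exists, its model files are also copied into a C<models>
directory under C<$path>.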
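
The model-copying step in the C<save_state> code at the top of this
file can be tried in isolation. The following is a rough sketch using
only core modules; the plain hash stands in for a trained learner, and
the category name C<sports> and file name C<sports_tmp> are made up
for illustration.

```perl
use strict;
use warnings;
use File::Copy ();
use File::Spec;
use File::Temp ();

# Stand-in for a trained learner: one model file in a scratch directory.
# The category and file names are invented for this example.
my $tmpdir = File::Temp::tempdir(CLEANUP => 1);
my $model  = {
  _in_dir  => $tmpdir,
  learners => { sports => { machine_file => 'sports_tmp' } },
};
my $file = File::Spec->catfile($tmpdir, 'sports_tmp');
open my $fh, '>', $file or die "Can't write $file: $!";
print $fh "fake model data\n";
close $fh or die "Can't close $file: $!";

# Save: copy each model file into "$path/models", then repoint _in_dir.
my $path      = File::Temp::tempdir(CLEANUP => 1);
my $model_dir = File::Spec->catdir($path, 'models');
mkdir($model_dir, 0777) or die "Couldn't create $model_dir: $!";
for my $name (sort keys %{ $model->{learners} }) {
  my $old = File::Spec->catfile($model->{_in_dir},
                                $model->{learners}{$name}{machine_file});
  my $new = File::Spec->catfile($model_dir, "${name}_model");
  File::Copy::copy($old, $new) or die "Couldn't copy $old to $new: $!";
}
$model->{_in_dir} = $model_dir;

# Restore: a reloaded object may still point at a stale directory, so
# repoint it at "$path/models" only if that directory exists.
my $restored = { %$model, _in_dir => $tmpdir };
$restored->{_in_dir} = $model_dir if -e $model_dir;

my $saved  = File::Spec->catfile($model_dir, 'sports_model');
my $status = -e $saved ? 'present' : 'missing';
print "sports_model is $status in $model_dir\n";
```

Repointing C<_in_dir> on restore matters because the directory used
during training is typically temporary and may no longer exist when
the saved state is loaded again.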
=head1 AUTHOR
Ken Williams, ken@mathforum.org
=head1 COPYRIGHT
Copyright 2000-2003 Ken Williams. All rights reserved.
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
=head1 SEE ALSO