AI-Categorizer
view release on metacpan or search on metacpan
lib/AI/Categorizer/Learner/NaiveBayes.pm view on Meta::CPAN
package AI::Categorizer::Learner::NaiveBayes;
use strict;
use AI::Categorizer::Learner;
use base qw(AI::Categorizer::Learner);
use Params::Validate qw(:types);
use Algorithm::NaiveBayes;
__PACKAGE__->valid_params
(
threshold => {type => SCALAR, default => 0.3},
);
sub create_model {
my $self = shift;
my $m = $self->{model} = Algorithm::NaiveBayes->new;
foreach my $d ($self->knowledge_set->documents) {
$m->add_instance(attributes => $d->features->as_hash,
label => [ map $_->name, $d->categories ]);
}
$m->train;
}
sub get_scores {
my ($self, $newdoc) = @_;
return ($self->{model}->predict( attributes => $newdoc->features->as_hash ),
$self->{threshold});
}
sub threshold {
my $self = shift;
$self->{threshold} = shift if @_;
return $self->{threshold};
}
sub save_state {
my $self = shift;
local $self->{knowledge_set}; # Don't need the knowledge_set to categorize
$self->SUPER::save_state(@_);
}
sub categories {
my $self = shift;
return map AI::Categorizer::Category->by_name( name => $_ ), $self->{model}->labels;
}
1;
__END__
=head1 NAME
AI::Categorizer::Learner::NaiveBayes - Naive Bayes Algorithm For AI::Categorizer
=head1 SYNOPSIS
use AI::Categorizer::Learner::NaiveBayes;
# Here $k is an AI::Categorizer::KnowledgeSet object
my $nb = new AI::Categorizer::Learner::NaiveBayes(...parameters...);
$nb->train(knowledge_set => $k);
$nb->save_state('filename');
... time passes ...
$nb = AI::Categorizer::Learner::NaiveBayes->restore_state('filename');
my $c = new AI::Categorizer::Collection::Files( path => ... );
while (my $document = $c->next) {
my $hypothesis = $nb->categorize($document);
print "Best assigned category: ", $hypothesis->best_category, "\n";
print "All assigned categories: ", join(', ', $hypothesis->categories), "\n";
}
=head1 DESCRIPTION
This is an implementation of the Naive Bayes decision-making
algorithm, applied to the task of document categorization (as defined
by the AI::Categorizer module). See L<AI::Categorizer> for a complete
description of the interface.
This module is now a wrapper around the stand-alone
C<Algorithm::NaiveBayes> module. I moved the discussion of Bayes'
Theorem into that module's documentation.
=head1 METHODS
This class inherits from the C<AI::Categorizer::Learner> class, so all
of its methods are available unless explicitly mentioned here.
=head2 new()
Creates a new Naive Bayes Learner and returns it. In addition to the
parameters accepted by the C<AI::Categorizer::Learner> class, the
Naive Bayes subclass accepts the following parameters:
=over 4
=item * threshold
Sets the score threshold for category membership. The default is
currently 0.3. Set the threshold lower to assign more categories per
( run in 1.920 second using v1.01-cache-2.11-cpan-39bf76dae61 )