AI-NaiveBayes
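    # Shift the log-scores so the largest is 0 (this keeps exp() from
    # underflowing), convert them out of log space, and sum their squares.
    my $total = 0;
    my $max = max(values %$scores);
    foreach (values %$scores) {
        $_ = exp($_ - $max);
        $total += $_**2;
    }
    # Normalize to unit Euclidean (L2) length. The hash is modified in
    # place because values() returns aliases to the hash entries.
    $total = sqrt($total);
    foreach (values %$scores) {
        $_ /= $total;
    }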
}
__PACKAGE__->meta->make_immutable;
1;
__END__
# ABSTRACT: A Bayesian classifier

=encoding utf8

=head1 SYNOPSIS

    # AI::NaiveBayes objects are created by AI::NaiveBayes::Learner,
    # but for a quick start you can use the 'train' class method,
    # a shortcut that uses the default AI::NaiveBayes::Learner settings
    my $classifier = AI::NaiveBayes->train(
        {
            attributes => {
                sheep => 1, very => 1, valuable => 1, farming => 1
            },
            labels => ['farming']
        },
        {
            attributes => {
                vampires => 1, cannot => 1, see => 1, their => 1,
                images => 1, mirrors => 1
            },
            labels => ['vampire']
        },
    );

    # Classify a feature vector
    my $result = $classifier->classify({bar => 3, blurp => 2});

    # $result is now an AI::NaiveBayes::Classification object
    my $best_category = $result->best_category;

=head1 DESCRIPTION

This module implements the classic "Naive Bayes" machine learning
algorithm. It is a low-level class that accepts only precomputed
feature vectors as input; see L<AI::Classifier::Text> for a text
classifier that uses this class.

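Because the class accepts only precomputed feature vectors, text has to be turned into a hashref of C<< feature => weight >> pairs first. The following is a minimal sketch of one simple way to do that, using word counts as weights. C<text_to_features> is a hypothetical helper written for this illustration; it is not part of C<AI::NaiveBayes> and is not how L<AI::Classifier::Text> actually extracts features.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: converts raw text into a
# feature vector { word => count } of the kind classify() accepts.
sub text_to_features {
    my ($text) = @_;
    my %features;
    for my $word (split /\W+/, lc $text) {
        next if $word eq '';    # skip the empty field produced by leading punctuation
        $features{$word}++;     # weight = number of occurrences
    }
    return \%features;
}

my $vector = text_to_features("Sheep are very valuable; sheep farming pays.");
print join(', ', map { "$_ => $vector->{$_}" } sort keys %$vector), "\n";
```

The SYNOPSIS uses a weight of 1 for every word that is present; counts, as here, are another common choice.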
An C<AI::NaiveBayes> classifier object is created from training data
by L<AI::NaiveBayes::Learner>. For a quick start, you can use the
limited C<train> class method, which trains the classifier with the
default settings.

The classifier object is immutable.

Naive Bayes is a well-studied probabilistic algorithm often used in
automatic text categorization. Compared to other algorithms (kNN,
SVM, decision trees), it is fast and reasonably competitive in the
quality of its results.

A paper by Fabrizio Sebastiani provides a good introduction to
text categorization:
L<http://faure.iei.pi.cnr.it/~fabrizio/Publications/ACMCS02.pdf>

=head1 METHODS

=over 4

=item new( model => $model )

Internal. See L<AI::NaiveBayes::Learner> to learn how to create an
C<AI::NaiveBayes> classifier from training data.

=item train( LIST of HASHREFS )

Shortcut for creating a trained classifier using the default
L<AI::NaiveBayes::Learner> settings. Each argument is passed, one at
a time, to the C<add_example> method of an L<AI::NaiveBayes::Learner>
object.

=item classify( HASHREF )

Classifies a feature vector of the form:

    { feature1 => weight1, feature2 => weight2, ... }

The result is an C<AI::NaiveBayes::Classification> object.

=item rescale

Internal. Rescales a hashref of log-scores in place: shifts them so
the largest is zero, exponentiates them, and normalizes the result to
unit Euclidean length.

=back

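A minimal sketch of the kind of computation behind C<classify> and C<rescale>, assuming a hypothetical model layout: per-label log-priors, per-label log-probabilities of features, and a made-up fallback value for features a label has not seen. This layout and all the numbers are illustrative assumptions, not the internal format produced by L<AI::NaiveBayes::Learner>; only C<rescale_scores> follows the C<rescale> code at the top of this listing.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical model, NOT the library's internal format; values are made up.
my %model = (
    prior => { farming => log(0.5), vampire => log(0.5) },
    probs => {
        farming => { sheep => log(0.3),  mirrors => log(0.01) },
        vampire => { sheep => log(0.01), mirrors => log(0.3)  },
    },
    unseen => log(0.001),    # hypothetical fallback log-probability
);

# Score a feature vector { feature => weight } in log space:
# log P(label) + sum over features of weight * log P(feature | label)
sub score_features {
    my ($model, $features) = @_;
    my %scores = %{ $model->{prior} };
    for my $label (keys %scores) {
        for my $f (keys %$features) {
            my $logp = $model->{probs}{$label}{$f} // $model->{unseen};
            $scores{$label} += $features->{$f} * $logp;
        }
    }
    return \%scores;
}

# Same scheme as rescale(): shift so the best score is 0,
# exponentiate, then divide by the Euclidean (L2) norm.
sub rescale_scores {
    my ($scores) = @_;
    my $max   = max(values %$scores);
    my $total = 0;
    for (values %$scores) {
        $_ = exp($_ - $max);
        $total += $_**2;
    }
    $total = sqrt($total);
    $_ /= $total for values %$scores;
    return $scores;
}

my $scores = rescale_scores(score_features(\%model, { sheep => 2 }));
my ($best) = sort { $scores->{$b} <=> $scores->{$a} } keys %$scores;
print "$best\n";
```

Because the scores are divided by their Euclidean norm rather than their sum, they keep the ranking of the labels but do not, in general, sum to 1. Subtracting the maximum first matters when every log-score is very negative: C<exp(-1000)> underflows to 0 in double precision, whereas after the shift the best label always gets C<exp(0) = 1>.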
=head1 ATTRIBUTES

=over 4

=item model

Internal

=back

=head1 THEORY

Bayes' Theorem is a way of inverting a conditional probability. It
states:

                P(y|x) P(x)
      P(x|y) = -------------
                    P(y)

The notation C<P(x|y)> means "the probability of C<x> given C<y>." See also
L<http://mathforum.org/dr.math/problems/battisfore.03.22.99.html>
for a simple but complete example of Bayes' Theorem.

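As a worked example with made-up numbers (they do not come from the original text), take C<P(x) = 0.01>, C<P(y|x) = 0.9> and C<P(y|not x) = 0.05>. The denominator C<P(y)> follows from the law of total probability, and Bayes' Theorem then gives C<P(x|y) = 0.009 / 0.0585>, roughly 0.154:

```perl
use strict;
use warnings;

# Illustrative, made-up probabilities.
my $p_x          = 0.01;   # P(x)
my $p_y_given_x  = 0.9;    # P(y|x)
my $p_y_given_nx = 0.05;   # P(y|not x)

# Law of total probability: P(y) = P(y|x) P(x) + P(y|not x) P(not x)
my $p_y = $p_y_given_x * $p_x + $p_y_given_nx * (1 - $p_x);

# Bayes' Theorem: P(x|y) = P(y|x) P(x) / P(y)
my $p_x_given_y = $p_y_given_x * $p_x / $p_y;

printf "P(y) = %.4f, P(x|y) = %.4f\n", $p_y, $p_x_given_y;
```

Even though C<y> is very likely when C<x> holds, C<x> remains fairly unlikely given C<y>, because C<x> is rare to begin with.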
In this case, we want to know the probability of a given category given a
certain string of words in a document, so we have:

                        P(words | cat) P(cat)
      P(cat | words) = -----------------------
                              P(words)

We have applied Bayes' Theorem because C<P(cat | words)> is difficult
to compute directly, whereas C<P(words | cat)> and C<P(cat)> can be
estimated from training data (see below).

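Because C<P(words)> is the same for every category, it only rescales the scores. Assuming the categories are mutually exclusive and exhaustive, it equals the sum of C<P(words | cat) P(cat)> over all categories, so the posteriors can be obtained by normalizing the numerators. The sketch below does this in log space for the two categories from the SYNOPSIS; the likelihoods and priors are made-up illustrative values.

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# Made-up inputs: log P(cat) and log P(words | cat) for two categories.
my %log_prior      = (farming => log(0.6),  vampire => log(0.4));
my %log_likelihood = (farming => log(2e-6), vampire => log(5e-7));

# Numerator of Bayes' Theorem in log space: log P(words | cat) + log P(cat)
my %log_joint = map { ($_ => $log_likelihood{$_} + $log_prior{$_}) } keys %log_prior;

# log P(words) = log of the sum over categories of exp(log_joint),
# computed with the max subtracted first so that exp() does not underflow.
my $max         = max(values %log_joint);
my $log_p_words = $max + log(sum(map { exp($_ - $max) } values %log_joint));

# Posterior P(cat | words) for each category; these sum to 1.
my %posterior = map { ($_ => exp($log_joint{$_} - $log_p_words)) } keys %log_joint;

my ($best) = sort { $posterior{$b} <=> $posterior{$a} } keys %posterior;
printf "%s: %.4f\n", $_, $posterior{$_} for sort keys %posterior;
print "best: $best\n";
```

If only the best category is needed, the normalization can be skipped entirely, since dividing every score by the same C<P(words)> does not change which one is largest.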
The greater the value of C<P(cat | words)>, the more likely it is that the
document belongs to the given category, so we want the category for which
this value is largest. We write this as