AI-Categorizer
Build.PL
Changes
doc/classes-overview.png
doc/classes.png
eg/categorizer
eg/demo.pl
eg/easy_guesser.pl
INSTALL
lib/AI/Categorizer.pm
lib/AI/Categorizer/Category.pm
lib/AI/Categorizer/Collection.pm
lib/AI/Categorizer/Collection/DBI.pm
lib/AI/Categorizer/Collection/Files.pm
lib/AI/Categorizer/Collection/InMemory.pm
lib/AI/Categorizer/Collection/SingleFile.pm
lib/AI/Categorizer/Document.pm
lib/AI/Categorizer/Document/SMART.pm
lib/AI/Categorizer/Util.pm
Makefile.PL
MANIFEST
README
t/01-naive_bayes.t
t/02-experiment.t
t/03-weka.t
t/04-decision_tree.t
t/05-svm.t
t/06-knn.t
t/07-guesser.t
t/09-rocchio.t
t/10-tools.t
t/11-feature_vector.t
t/12-hypothesis.t
t/13-document.t
t/14-collection.t
t/15-knowledge_set.t
t/common.pl
t/traindocs/doc1
t/traindocs/doc2
determining its category, but they are usually some version of the "most
important" words in the document. A list of features and their weights in
each document is encapsulated by the "AI::Categorizer::FeatureVector" class.
You may think of this class as roughly analogous to a Perl hash, where the
keys are the names of features and the values are their weights.
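The hash analogy can be made concrete with a plain Perl hash. This is only a sketch of the idea, not the "AI::Categorizer::FeatureVector" API: the feature names, the weights, and the variables %features and %other are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical feature weights for one document: feature name => weight.
my %features = (
    perl     => 3,
    module   => 2,
    document => 1,
);

# Look up a single feature's weight; absent features count as 0.
my $weight = $features{perl} // 0;

# Euclidean length of the vector, a common basis for normalizing weights.
my $sum_sq = 0;
$sum_sq += $_ ** 2 for values %features;
my $length = sqrt($sum_sq);

# Dot product with a second (also hypothetical) vector over shared features.
my %other = ( perl => 1, test => 4 );
my $dot = 0;
for my $name (keys %features) {
    $dot += $features{$name} * $other{$name} if exists $other{$name};
}

printf "weight(perl)=%d length=%.3f dot=%d\n", $weight, $length, $dot;
```

Only features present in both hashes contribute to the dot product, which is why sparse hashes are a natural fit for document vectors.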
Hypotheses
The result of asking a categorizer to categorize a previously unseen
document is called a hypothesis, because it is some kind of "statistical
guess" of what categories this document should be assigned to. Since you may
be interested in any of several pieces of information about the hypothesis
(for instance, which categories were assigned, which category was the single
most likely category, the scores assigned to each category, etc.), the
hypothesis is returned as an object of the "AI::Categorizer::Hypothesis"
class, and you can use its object methods to get information about the
hypothesis. See its class documentation for the details.
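The kinds of information listed above can be sketched in plain Perl from a set of per-category scores and a cutoff. This is an illustration under stated assumptions rather than the "AI::Categorizer::Hypothesis" implementation: the scores, the threshold value, and the strict "greater than" comparison are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical per-category scores for one document, plus a cutoff.
my %scores    = ( sports => 0.82, politics => 0.55, cooking => 0.10 );
my $threshold = 0.5;

# Categories ordered from highest to lowest score.
my @ranked = sort { $scores{$b} <=> $scores{$a} } keys %scores;

# Assumption: a category counts as assigned when its score exceeds the cutoff.
my @assigned = grep { $scores{$_} > $threshold } @ranked;

# The single most likely category is the top-ranked one.
my $best = $ranked[0];

print "Best category: $best\n";
print "Assigned categories: ", join(', ', @assigned), "\n";
```

Keeping the raw scores alongside the threshold lets one object answer all three questions (assigned categories, best category, individual scores) without re-running the categorizer.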
Experiments
The "AI::Categorizer::Experiment" class helps you organize the results of
lib/AI/Categorizer.pm
some version of the "most important" words in the document. A list of
features and their weights in each document is encapsulated by the
C<AI::Categorizer::FeatureVector> class. You may think of this class
as roughly analogous to a Perl hash, where the keys are the names of
features and the values are their weights.
=head2 Hypotheses
The result of asking a categorizer to categorize a previously unseen
document is called a hypothesis, because it is some kind of
"statistical guess" of what categories this document should be
assigned to. Since you may be interested in any of several pieces of
information about the hypothesis (for instance, which categories were
assigned, which category was the single most likely category, the
scores assigned to each category, etc.), the hypothesis is returned as
an object of the C<AI::Categorizer::Hypothesis> class, and you can use
its object methods to get information about the hypothesis. See its
class documentation for the details.
=head2 Experiments
lib/AI/Categorizer/Learner.pm
categorizing documents. The C<knowledge_set> parameter must provide
an object of the class C<AI::Categorizer::KnowledgeSet> (or a subclass
thereof), populated with lots of documents and categories. See
L<AI::Categorizer::KnowledgeSet> for the details of how to create such
an object. If you provided a C<knowledge_set> parameter to C<new()>,
specifying one here will override it.
=item categorize($document)
Returns an C<AI::Categorizer::Hypothesis> object representing the
categorizer's "best guess" about which categories the given document
should be assigned to. See L<AI::Categorizer::Hypothesis> for more
details on how to use this object.
=item categorize_collection(collection => $collection)
Categorizes every document in a collection and returns an Experiment
object representing the results. Note that the Experiment does not
contain knowledge of the assigned categories for every document, only
a statistical summary of the results.
lib/AI/Categorizer/Learner/DecisionTree.pm
Trains the categorizer. This prepares it for later use in
categorizing documents. The C<knowledge_set> parameter must provide
an object of the class C<AI::Categorizer::KnowledgeSet> (or a subclass
thereof), populated with lots of documents and categories. See
L<AI::Categorizer::KnowledgeSet> for the details of how to create such
an object.
=head2 categorize($document)
Returns an C<AI::Categorizer::Hypothesis> object representing the
categorizer's "best guess" about which categories the given document
should be assigned to. See L<AI::Categorizer::Hypothesis> for more
details on how to use this object.
=head2 save_state($path)
Saves the categorizer for later use. This method is inherited from
C<AI::Categorizer::Storable>.
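The save-and-restore round trip can be illustrated with Perl's core C<Storable> module. This is a sketch of the general technique only: whether C<AI::Categorizer::Storable> works this way internally is an assumption, and the class name C<My::ToyLearner>, the model contents, and the file name are hypothetical.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Hypothetical stand-in for a trained categorizer: a blessed hash holding a model.
my $learner = bless { model => { probs => { sports => 0.75, cooking => 0.25 } } },
    'My::ToyLearner';

# Write the object to disk, then read it back as a new object.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/learner.sto";
store($learner, $path);
my $restored = retrieve($path);

print ref($restored), " restored with sports prior ",
    $restored->{model}{probs}{sports}, "\n";
```

Serializing the whole object means the expensive training step runs once, and later processes only pay the cost of reading the file.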
=head1 AUTHOR
lib/AI/Categorizer/Learner/Guesser.pm
return (\%scores, 0.5);
}
1;
__END__
=head1 NAME
AI::Categorizer::Learner::Guesser - Simple guessing based on class probabilities
=head1 SYNOPSIS
  use AI::Categorizer::Learner::Guesser;
  # Here $k is an AI::Categorizer::KnowledgeSet object
  my $l = AI::Categorizer::Learner::Guesser->new;
  $l->train(knowledge_set => $k);
  $l->save_state('filename');
  $l = AI::Categorizer::Learner->restore_state('filename');
  my $c = AI::Categorizer::Collection::Files->new( path => ... );
  while (my $document = $c->next) {
    my $hypothesis = $l->categorize($document);
    print "Best assigned category: ", $hypothesis->best_category, "\n";
    print "All assigned categories: ", join(', ', $hypothesis->categories), "\n";
  }
=head1 DESCRIPTION
This implements a simple category guesser that makes assignments based
solely on the prior probabilities of categories. For instance, if 5%
of the training documents belong to a certain category, then the
probability of any test document being assigned to that category is
0.05. This can be useful for providing baseline scores to compare
with other, more sophisticated algorithms.
See L<AI::Categorizer> for a complete description of the interface.
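The idea of guessing from prior probabilities can be sketched in plain Perl. This is not the module's code: the training labels, the C<guess> helper, and the choice to draw each category independently with C<rand()> are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical training labels: one list of categories per training document.
my @training = ( ['sports'], ['sports', 'politics'], ['cooking'], ['sports'] );

# Prior probability of a category = fraction of training documents carrying it.
my %count;
for my $cats (@training) {
    $count{$_}++ for @$cats;
}
my $total = @training;
my %prior = map { $_ => $count{$_} / $total } keys %count;

# Guess categories for a new document without looking at its content:
# each category is assigned independently with probability equal to its prior.
sub guess {
    my ($prior) = @_;
    return grep { rand() < $prior->{$_} } sort keys %$prior;
}

printf "%-8s prior=%.2f\n", $_, $prior{$_} for sort keys %prior;
print "Guess: ", join(', ', guess(\%prior)), "\n";
```

Because the guess ignores document content entirely, any learner that scores no better than this on the same test set has learned nothing useful from the features.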
=head1 METHODS
lib/AI/Categorizer/Learner/KNN.pm
Trains the categorizer. This prepares it for later use in
categorizing documents. The C<knowledge_set> parameter must provide
an object of the class C<AI::Categorizer::KnowledgeSet> (or a subclass
thereof), populated with lots of documents and categories. See
L<AI::Categorizer::KnowledgeSet> for the details of how to create such
an object.
=head2 categorize($document)
Returns an C<AI::Categorizer::Hypothesis> object representing the
categorizer's "best guess" about which categories the given document
should be assigned to. See L<AI::Categorizer::Hypothesis> for more
details on how to use this object.
=head2 save_state($path)
Saves the categorizer for later use. This method is inherited from
C<AI::Categorizer::Storable>.
=head1 AUTHOR
lib/AI/Categorizer/Learner/NaiveBayes.pm
Trains the categorizer. This prepares it for later use in
categorizing documents. The C<knowledge_set> parameter must provide
an object of the class C<AI::Categorizer::KnowledgeSet> (or a subclass
thereof), populated with lots of documents and categories. See
L<AI::Categorizer::KnowledgeSet> for the details of how to create such
an object.
=head2 categorize($document)
Returns an C<AI::Categorizer::Hypothesis> object representing the
categorizer's "best guess" about which categories the given document
should be assigned to. See L<AI::Categorizer::Hypothesis> for more
details on how to use this object.
=head2 save_state($path)
Saves the categorizer for later use. This method is inherited from
C<AI::Categorizer::Storable>.
=head1 CALCULATIONS
lib/AI/Categorizer/Learner/SVM.pm
Trains the categorizer. This prepares it for later use in
categorizing documents. The C<knowledge_set> parameter must provide
an object of the class C<AI::Categorizer::KnowledgeSet> (or a subclass
thereof), populated with lots of documents and categories. See
L<AI::Categorizer::KnowledgeSet> for the details of how to create such
an object.
=head2 categorize($document)
Returns an C<AI::Categorizer::Hypothesis> object representing the
categorizer's "best guess" about which categories the given document
should be assigned to. See L<AI::Categorizer::Hypothesis> for more
details on how to use this object.
=head2 save_state($path)
Saves the categorizer for later use. This method is inherited from
C<AI::Categorizer::Storable>.
=head1 AUTHOR
lib/AI/Categorizer/Learner/Weka.pm
Trains the categorizer. This prepares it for later use in
categorizing documents. The C<knowledge_set> parameter must provide
an object of the class C<AI::Categorizer::KnowledgeSet> (or a subclass
thereof), populated with lots of documents and categories. See
L<AI::Categorizer::KnowledgeSet> for the details of how to create such
an object.
=head2 categorize($document)
Returns an C<AI::Categorizer::Hypothesis> object representing the
categorizer's "best guess" about which categories the given document
should be assigned to. See L<AI::Categorizer::Hypothesis> for more
details on how to use this object.
=head2 save_state($path)
Saves the categorizer for later use. This method is inherited from
C<AI::Categorizer::Storable>.
=head1 AUTHOR