AI-Categorizer
- Added the 'stopword_behavior' parameter for controlling how
stopword lists and stemming interact. Previously, if stopwords &
stemming were both used, stopwords were assumed to be pre-stemmed,
which often isn't the case.
- parse() is now an instance method of the Document class, not a
class method. This means it can operate directly on an object and
no longer has to return a hash of content, which allows more
flexible document parsing. This may cause backward-compatibility
problems for code that overrides the parse() method.
- Added a parse_handle() method, which can parse a document directly
from a filehandle.
- Fixed documentation for add_hypothesis() [spotted by Thierry
Guillotin]
- Added documentation for the AI::Categorizer::Collection::Files
class.
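The parse() and parse_handle() entries above can be pictured with a small
sketch. It assumes a hypothetical My::Document class (not the real
AI::Categorizer::Document); the whitespace tokenizer and the 'features'
slot are illustrative only. Because parse() is an instance method here, it
stores its results on the object instead of returning a hash, and
parse_handle() just reads the handle and delegates.

```perl
use strict;
use warnings;

package My::Document;    # hypothetical stand-in, not the real Document class

sub new { my ($class) = @_; return bless { features => {} }, $class }

# Instance method: updates the object in place rather than returning a hash.
sub parse {
    my ($self, %args) = @_;
    $self->{features}{ lc $_ }++ for split ' ', $args{content};
    return $self;
}

# Reads the whole filehandle, then delegates to parse().
sub parse_handle {
    my ($self, %args) = @_;
    my $fh      = $args{handle};
    my $content = do { local $/; <$fh> };
    return $self->parse(content => $content);
}

sub features { $_[0]{features} }

package main;

# An in-memory filehandle keeps the example self-contained.
my $text = "Cats chase mice\nMice fear cats\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my $doc = My::Document->new->parse_handle(handle => $fh);
close $fh;

printf "%s=%d\n", $_, $doc->features->{$_} for sort keys %{ $doc->features };
```

A subclass that overrides parse() under this design only needs to update
$self; it does not have to build and return a content hash.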
0.04 Thu Nov 7 19:27:15 AEST 2002
- Added learners for SVMs, Decision Trees, and a pass-through to
Weka.
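The 'stopword_behavior' entry earlier in this changelog concerns how a
stopword list interacts with stemming. The sketch below illustrates the
underlying problem only; it does not use the module, and the toy stemmer
and helper names are assumptions made for the example.

```perl
use strict;
use warnings;

# Toy stemmer (an assumption for illustration): strips a trailing "ing" or "s".
sub toy_stem { my ($w) = @_; $w =~ s/(?:ing|s)$//; return $w }

# Stem every token, then keep only those not present in the stopword hash.
sub filter_tokens {
    my ($tokens, $stop) = @_;
    return [ grep { !$stop->{$_} } map { toy_stem($_) } @$tokens ];
}

my @tokens    = qw(having running dogs the);
my %stopwords = map { $_ => 1 } qw(having the);    # unstemmed list

# Treating the list as already stemmed lets "hav" slip through.
my $pre_stemmed = filter_tokens(\@tokens, \%stopwords);

# Stemming the stopwords first makes both sides comparable.
my %stemmed_stop = map { ( toy_stem($_) => 1 ) } keys %stopwords;
my $stem_first   = filter_tokens(\@tokens, \%stemmed_stop);

print "assumed pre-stemmed: @$pre_stemmed\n";    # hav runn dog
print "stopwords stemmed:   @$stem_first\n";     # runn dog
```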
lib/AI/Categorizer/Collection/DBI.pm
sub new {
  my $class = shift;
  my $self = $class->SUPER::new(@_);
  die "Must provide 'dbh' or 'connection_string' arguments"
    unless $self->{dbh} or $self->{connection_string};
  unless ($self->{dbh}) {
    $self->{dbh} = DBI->connect($self->{connection_string}, '', '', {RaiseError => 1})
      or die DBI->errstr;
    delete $self->{connection_string};
  }
  $self->rewind;
  return $self;
}
sub dbh { shift()->{dbh} }
sub rewind {
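The constructor excerpt above accepts either an existing handle or a
connection string. The following sketch mirrors only that argument
handling so it can run without a database: My::Collection and the
'connector' coderef (standing in for DBI->connect) are hypothetical and
are not part of the module.

```perl
use strict;
use warnings;

package My::Collection;    # hypothetical; mirrors only the argument handling

sub new {
    my ($class, %args) = @_;
    my $self = bless {%args}, $class;

    die "Must provide 'dbh' or 'connection_string' arguments\n"
        unless $self->{dbh} or $self->{connection_string};

    unless ($self->{dbh}) {
        # 'connector' is an assumed coderef standing in for DBI->connect.
        $self->{dbh} = $self->{connector}->($self->{connection_string});
        delete $self->{connection_string};
    }
    return $self;
}

sub dbh { $_[0]{dbh} }

package main;

my $fake_connect = sub { return { dsn => $_[0] } };    # pretend handle

my $from_string = My::Collection->new(
    connection_string => 'dbi:SQLite:dbname=docs.db',
    connector         => $fake_connect,
);
my $from_handle = My::Collection->new(dbh => { dsn => 'existing' });

# With neither argument the constructor dies; capture the message.
my $error = eval { My::Collection->new(); 1 } ? '' : $@;

print "from string: ", $from_string->dbh->{dsn}, "\n";
print "from handle: ", $from_handle->dbh->{dsn}, "\n";
print "no args:     $error";
```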
lib/AI/Categorizer/Document.pm
indicates that no biasing should be done.
=item categories
A reference to an array of Category objects that this document belongs
to. Optional.
=item stopwords
A list/hash of features (words) that should be ignored when parsing
document content. A hash reference is preferred, with the features as
the keys. If you pass an array reference containing the features, it
will be converted to a hash reference internally.
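As a rough illustration of the conversion described for C<stopwords>, the
sketch below turns an array reference into a hash reference and uses it to
filter words. The helper name C<normalize_stopwords> is hypothetical; the
module's internal code may differ.

```perl
use strict;
use warnings;

# Accept either an array ref or a hash ref and always return a hash ref.
sub normalize_stopwords {
    my ($stop) = @_;
    return $stop if ref $stop eq 'HASH';
    return +{ map { $_ => 1 } @$stop };
}

my $stop = normalize_stopwords([qw(the a of)]);

# Hash lookups make each membership test cheap, regardless of list size.
my @kept = grep { !$stop->{$_} } qw(the history of perl);

print "@kept\n";    # history perl
```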
=item use_features
A Feature Vector specifying the only features that should be
considered when parsing this document. This is an alternative to
using C<stopwords>.
=item stemming
lib/AI/Categorizer/Learner.pm
   hypothesis => {
     class => 'AI::Categorizer::Hypothesis',
     delayed => 1,
   },
   experiment => {
     class => 'AI::Categorizer::Experiment',
     delayed => 1,
   },
);
# Subclasses must override these virtual methods:
sub get_scores;
sub create_model;
# Optional virtual method for on-line learning:
sub add_knowledge;
sub verbose {
  my $self = shift;
  if (@_) {
    $self->{verbose} = shift;
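The forward declarations of get_scores and create_model above mark them as
methods a subclass is expected to supply. A minimal sketch of that pattern
follows. Every name in it (My::Learner, My::Learner::WordCount, the 'docs'
argument, the word-count model) is hypothetical, and it assumes, without
confirmation from the excerpt, that training calls create_model.

```perl
use strict;
use warnings;

package My::Learner;    # hypothetical base class, not AI::Categorizer::Learner

sub new { my ($class, %args) = @_; return bless {%args}, $class }

# Virtual methods: declared here, defined only by subclasses.
sub get_scores;
sub create_model;

# Assumed flow for this sketch: store the training data, then build the model.
sub train {
    my ($self, %args) = @_;
    $self->{docs} = $args{docs};
    $self->create_model;
    return $self;
}

package My::Learner::WordCount;    # hypothetical subclass
our @ISA = ('My::Learner');

# Count how often each word appears under each category.
sub create_model {
    my ($self) = @_;
    for my $doc (@{ $self->{docs} }) {
        $self->{model}{ $doc->{category} }{$_}++ for split ' ', $doc->{text};
    }
}

# Score a text by summing the per-category counts of its words.
sub get_scores {
    my ($self, $text) = @_;
    my %scores;
    for my $cat (keys %{ $self->{model} }) {
        $scores{$cat} += $self->{model}{$cat}{$_} // 0 for split ' ', $text;
    }
    return \%scores;
}

package main;

my $learner = My::Learner::WordCount->new->train(docs => [
    { category => 'pets',  text => 'cat dog cat' },
    { category => 'tools', text => 'hammer saw' },
]);
my $scores = $learner->get_scores('cat saw');
print "$_: $scores->{$_}\n" for sort keys %$scores;
```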
lib/AI/Categorizer/Learner.pm
=item train()
=item train(knowledge_set => $k)
Trains the categorizer. This prepares it for later use in
categorizing documents. The C<knowledge_set> parameter must provide
an object of the class C<AI::Categorizer::KnowledgeSet> (or a subclass
thereof), populated with lots of documents and categories. See
L<AI::Categorizer::KnowledgeSet> for the details of how to create such
an object. If you provided a C<knowledge_set> parameter to C<new()>,
specifying one here will override it.
=item categorize($document)
Returns an C<AI::Categorizer::Hypothesis> object representing the
categorizer's "best guess" about which categories the given document
should be assigned to. See L<AI::Categorizer::Hypothesis> for more
details on how to use this object.
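The train/categorize cycle described above can be sketched without the
module installed. Everything here is a stand-in: C<My::Categorizer> is
hypothetical, the knowledge set is a plain hash of category names to
texts rather than an C<AI::Categorizer::KnowledgeSet>, and C<categorize>
returns a plain hash instead of an C<AI::Categorizer::Hypothesis>. The
point is only to show a C<knowledge_set> passed to C<train()> replacing
one given to C<new()>.

```perl
use strict;
use warnings;

package My::Categorizer;    # hypothetical stand-in for a Learner subclass

sub new { my ($class, %args) = @_; return bless {%args}, $class }

sub train {
    my ($self, %args) = @_;
    # A knowledge_set given here overrides one passed to new().
    $self->{knowledge_set} = $args{knowledge_set} if $args{knowledge_set};
    my $ks = $self->{knowledge_set} or die "No knowledge_set to train on\n";
    $self->{model} = {};
    for my $cat (keys %$ks) {
        $self->{model}{$cat}{$_}++ for map { split ' ', $_ } @{ $ks->{$cat} };
    }
    return $self;
}

# Returns a plain hash standing in for a Hypothesis object.
sub categorize {
    my ($self, $text) = @_;
    my %scores = map { $_ => 0 } keys %{ $self->{model} };
    for my $cat (keys %scores) {
        $scores{$cat} += $self->{model}{$cat}{$_} // 0 for split ' ', $text;
    }
    my ($best) = sort { $scores{$b} <=> $scores{$a} || $a cmp $b } keys %scores;
    return { scores => \%scores, best_category => $best };
}

package main;

my $c = My::Categorizer->new(knowledge_set => { old => ['stale data'] });
$c->train(knowledge_set => {
    sports  => ['goal match team', 'team win'],
    cooking => ['bake oven flour'],
});
my $h = $c->categorize('team bake team');
print "best: $h->{best_category}\n";
```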
=item categorize_collection(collection => $collection)