AI-Categorizer

 view release on metacpan or  search on metacpan

Changes  view on Meta::CPAN


 - Added the 'stopword_behavior' parameter for controlling how
   stopword lists and stemming interact.  Previously, if stopwords &
   stemming were both used, stopwords were assumed to be pre-stemmed,
   which often isn't the case.

 - parse() is now an instance method of the Document class, not a
   class method.  This means it can operate directly on an object, it
   doesn't have to return a hash of content.  This allows more
   flexible document parsing.  This may cause some backward
   compatibility problems if people were overriding the parse()
   method.

 - Added a parse_handle() method, which can parse a document directly
   from a filehandle.

 - Fixed documentation for add_hypothesis() [spotted by Thierry
   Guillotin]

 - Added documentation for the AI::Categorizer::Collection::Files
   class.

0.04  Thu Nov  7 19:27:15 AEST 2002

 - Added learners for SVMs, Decision Trees, and a pass-through to
   Weka.

lib/AI/Categorizer/Collection/DBI.pm  view on Meta::CPAN


sub new {
  my $class = shift;
  my $self = $class->SUPER::new(@_);
  
  die "Must provide 'dbh' or 'connection_string' arguments"
    unless $self->{dbh} or $self->{connection_string};
  
  unless ($self->{dbh}) {
    $self->{dbh} = DBI->connect($self->{connection_string}, '', '', {RaiseError => 1})
      or die DBI->errstr;
    delete $self->{connection_string};
  }
  
  $self->rewind;
  return $self;
}

sub dbh { shift()->{dbh} }

sub rewind {

lib/AI/Categorizer/Document.pm  view on Meta::CPAN

indicates that no biasing should be done.

=item categories

A reference to an array of Category objects that this document belongs
to.  Optional.

=item stopwords

A list/hash of features (words) that should be ignored when parsing
document content.  A hash reference is preferred, with the features as
the keys.  If you pass an array reference containing the features, it
will be converted to a hash reference internally.

=item use_features

A Feature Vector specifying the only features that should be
considered when parsing this document.  This is an alternative to
using C<stopwords>.

=item stemming

lib/AI/Categorizer/Learner.pm  view on Meta::CPAN

   hypothesis => {
		  class => 'AI::Categorizer::Hypothesis',
		  delayed => 1,
		 },
   experiment => {
		  class => 'AI::Categorizer::Experiment',
		  delayed => 1,
		 },
  );

# Subclasses must override these virtual methods:
sub get_scores;
sub create_model;

# Optional virtual method for on-line learning:
sub add_knowledge;

sub verbose {
  my $self = shift;
  if (@_) {
    $self->{verbose} = shift;

lib/AI/Categorizer/Learner.pm  view on Meta::CPAN

=item train()

=item train(knowledge_set => $k)

Trains the categorizer.  This prepares it for later use in
categorizing documents.  The C<knowledge_set> parameter must provide
an object of the class C<AI::Categorizer::KnowledgeSet> (or a subclass
thereof), populated with lots of documents and categories.  See
L<AI::Categorizer::KnowledgeSet> for the details of how to create such
an object.  If you provided a C<knowledge_set> parameter to C<new()>,
specifying one here will override it.

=item categorize($document)

Returns an C<AI::Categorizer::Hypothesis> object representing the
categorizer's "best guess" about which categories the given document
should be assigned to.  See L<AI::Categorizer::Hypothesis> for more
details on how to use this object.

=item categorize_collection(collection => $collection)



( run in 0.664 second using v1.01-cache-2.11-cpan-49f99fa48dc )