AI-Categorizer
    
    
  
  
  
   gone away.
 - The building & installing process now uses Module::Build rather
   than ExtUtils::MakeMaker.
 - When the features_kept mechanism was used to explicitly state the
   features to use, and the scan_first parameter was left as its
   default value, the features_kept mechanism would silently fail to
   do anything.  This has now been fixed. [Spotted by Arnaud Gaudinat]
 - Recent versions of Weka have changed the name of the SVM class, so
   I've updated it in our test (t/03-weka.t) of the Weka wrapper
   too. [Sebastien Aperghis-Tramoni]
0.07  Tue May  6 16:15:04 CDT 2003
 - Oops - eg/demo.pl and t/15-knowledge_set.t didn't make it into the
   MANIFEST, so they weren't included in the 0.06 distribution.
   [Spotted by Zoltan Barta]
0.06  Tue Apr 22 10:27:26 CDT 2003
   for providing a set of baseline scores against which to evaluate
   other machine learners.
 - The NaiveBayes learner is now a wrapper around my new
   Algorithm::NaiveBayes module, which is just the old NaiveBayes code
   from here, turned into its own standalone module.
 - Much more extensive regression testing of the code.
 - Added a Document subclass for XML documents.  Its interface is
   still unstable and may change in later releases. [Implemented by
   Jae-Moon Lee]
 - Added a 'Build.PL' file for an alternate installation method using
   Module::Build.
 - Fixed a problem in the Hypothesis' best_category() method that
   would often result in the wrong category being reported.  Added a
   regression test to exercise the Hypothesis class.  [Spotted by
   Xiaobo Li]
 - Added save_features() and restore_features() to KnowledgeSet.
 - Added default categories() and categorize() methods to Learner base
   class.  get_scores() is now abstract.
 - Extended interface of ObjectSet class with retrieve(), includes(),
   and includes_name().
 - Moved 'term_weighting' parameter from Document to KnowledgeSet,
   since the normalized version needs to know the maximum
   term-frequency.  Also changed its values to 'n', 'l', 'b', and 't',
   with 'x' a synonym for 't'.
 - Implemented full range of TF/IDF term weighting methods (see Salton
   & Buckley, "Term Weighting Approaches in Automatic Text Retrieval",
   in journal "Information Processing & Management", 1988 #5)
0.03  Wed Jul 24 01:57:00 AEST 2002
 - First version released to CPAN
        classes), or any class that *they* create. This is managed by the
        "Class::Container" module, so see its documentation for the details of
        how this works.
        The specific parameters accepted here are:
        progress_file
            A string that indicates a place where objects will be saved during
            several of the methods of this class. The default value is the
            string "save", which means files like "save-01-knowledge_set" will
            get created. The exact names of these files may change in future
            releases, since they're just used internally to resume where we last
            left off.
        verbose
            If true, a few status messages will be printed during execution.
        training_set
            Specifies the "path" parameter that will be fed to the
            KnowledgeSet's "scan_features()" and "read()" methods during our
            "scan_features()" and "read_training_set()" methods.
    stats_table()
        Returns the value of the Experiment's (as created by
        "evaluate_test_set()") "stats_table()" method. This is a string that
        shows various statistics about the accuracy/precision/recall/F1/etc. of
        the assignments made during testing.
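
    As a rough illustration of the statistics named above, the following
    sketch computes precision, recall, and F1 from raw counts. The helper
    "prf1" and the counts are hypothetical; this is not the module's own
    code, only the standard definitions of the three measures.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AI::Categorizer.
#   $tp : category assignments that were correct (true positives)
#   $fp : category assignments that were wrong   (false positives)
#   $fn : correct categories that were missed    (false negatives)
sub prf1 {
    my ($tp, $fp, $fn) = @_;

    # Guard each ratio against a zero denominator.
    my $precision = ($tp + $fp) ? $tp / ($tp + $fp) : 0;
    my $recall    = ($tp + $fn) ? $tp / ($tp + $fn) : 0;

    # F1 is the harmonic mean of precision and recall.
    my $f1 = ($precision + $recall)
        ? 2 * $precision * $recall / ($precision + $recall)
        : 0;

    return ($precision, $recall, $f1);
}

# Made-up counts, for illustration only.
my ($precision, $recall, $f1) = prf1(8, 2, 4);
printf "precision=%.3f recall=%.3f F1=%.3f\n", $precision, $recall, $f1;
```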
HISTORY
    This module is a revised and redesigned version of the previous
    "AI::Categorize" module by the same author. Note the added 'r' in the new
    name. The older module has a different interface, and no attempt at backward
    compatibility has been made - that's why I changed the name.
    You can have both "AI::Categorize" and "AI::Categorizer" installed at the
    same time on the same machine, if you want. The two modules neither know
    about each other nor use conflicting namespaces.
AUTHOR
    Ken Williams <ken@mathforum.org>
    Discussion about this module can be directed to the perl-AI list at
    <perl-ai@perl.org>. For more info about the list, see
eg/categorizer
  my $result = $c->stats_table;
  print $result if $c->verbose;
  print $out_fh $result if $out_fh;
}
sub run_section {
  my ($section, $stage, $do_stage) = @_;
  return unless $do_stage->{$stage};
  if (keys %$do_stage > 1) {
    print " % $0 @ARGV -$stage\n" if $c->verbose;
    die "$0 is not executable, please change its execution permissions"
      unless -x $0;
    system($0, @ARGV, "-$stage") == 0
      or die "$0 returned nonzero status, \$?=$?";
    return;
  }
  my $start = Benchmark->new;
  $c->$section();
  my $end = Benchmark->new;
  my $summary = timestr(timediff($end, $start));
  my ($rss, $vsz) = memory_usage();
lib/AI/Categorizer.pm
The specific parameters accepted here are:
=over 4
=item progress_file
A string that indicates a place where objects will be saved during
several of the methods of this class.  The default value is the string
C<save>, which means files like C<save-01-knowledge_set> will get
created.  The exact names of these files may change in future
releases, since they're just used internally to resume where we last
left off.
=item verbose
If true, a few status messages will be printed during execution.
=item training_set
Specifies the C<path> parameter that will be fed to the KnowledgeSet's
lib/AI/Categorizer.pm
accuracy/precision/recall/F1/etc. of the assignments made during
testing.
=back
=head1 HISTORY
This module is a revised and redesigned version of the previous
C<AI::Categorize> module by the same author.  Note the added 'r' in
the new name.  The older module has a different interface, and no
attempt at backward compatibility has been made - that's why I changed
the name.
You can have both C<AI::Categorize> and C<AI::Categorizer> installed
at the same time on the same machine, if you want.  The two modules
neither know about each other nor use conflicting namespaces.
=head1 AUTHOR
Ken Williams <ken@mathforum.org>
lib/AI/Categorizer/Document.pm
=item use_features
A Feature Vector specifying the only features that should be
considered when parsing this document.  This is an alternative to
using C<stopwords>.
=item stemming
Indicates the linguistic procedure that should be used to convert
tokens in the document to features.  Possible values are C<none>,
which indicates that the tokens should be used without change, or
C<porter>, indicating that the Porter stemming algorithm should be
applied to each token.  This requires the C<Lingua::Stem> module from
CPAN.
=item stopword_behavior
There are a few ways you might want the stopword list (specified with
the C<stopwords> parameter) to interact with the stemming algorithm
(specified with the C<stemming> parameter).  These options can be
controlled with the C<stopword_behavior> parameter, which can take the
lib/AI/Categorizer/FeatureSelector.pm
where C<N> is the total number of documents in the collection, and
C<n> is the number of documents in which term C<t> is found.
=item p
Probabilistic inverse document frequency - multiply term C<t>'s value
by C<log((N-n)/n)> (same variable meanings as above).
=item x
No change - multiply by 1.
=back
The third character specifies the "normalization" component, which
can take the following values:
=over 4
=item c
Apply cosine normalization - multiply by 1/length(document_vector).
=item x
No change - multiply by 1.
=back
The three components may alternatively be specified by the
C<term_weighting>, C<collection_weighting>, and C<normalize_weighting>
parameters respectively.
=item verbose
If set to a true value, some status/debugging information will be
lib/AI/Categorizer/KnowledgeSet.pm
where C<N> is the total number of documents in the collection, and
C<n> is the number of documents in which term C<t> is found.
=item p
Probabilistic inverse document frequency - multiply term C<t>'s value
by C<log((N-n)/n)> (same variable meanings as above).
=item x
No change - multiply by 1.
=back
The third character specifies the "normalization" component, which
can take the following values:
=over 4
=item c
Apply cosine normalization - multiply by 1/length(document_vector).
=item x
No change - multiply by 1.
=back
The three components may alternatively be specified by the
C<term_weighting>, C<collection_weighting>, and C<normalize_weighting>
parameters respectively.
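
The collection and normalization components described above can be
sketched in plain Perl.  This is only an illustration under stated
assumptions, not the module's implementation: the helper names
C<collection_factor> and C<normalize_vector> are hypothetical, the
logarithm is assumed to be the natural log, and the document counts are
made up.

```perl
use strict;
use warnings;

# Hypothetical helper: collection factor for one term.
#   $code : 'p' (probabilistic inverse document frequency) or 'x' (no change)
#   $N    : total number of documents in the collection
#   $n    : number of documents in which the term is found
sub collection_factor {
    my ($code, $N, $n) = @_;
    return 1 if $code eq 'x';
    die "unknown collection code '$code'" unless $code eq 'p';

    # log((N-n)/n) is undefined when n == 0 or n == N.
    die "'p' weighting needs 0 < n < N" unless $n > 0 && $n < $N;
    return log(($N - $n) / $n);    # natural log (assumption)
}

# Hypothetical helper: normalize a feature vector (hash ref) in place.
#   $code : 'c' (cosine normalization) or 'x' (no change)
sub normalize_vector {
    my ($code, $vec) = @_;
    return $vec if $code eq 'x';
    die "unknown normalization code '$code'" unless $code eq 'c';

    # Euclidean length of the document vector.
    my $sum_sq = 0;
    $sum_sq += $_ ** 2 for values %$vec;
    my $length = sqrt $sum_sq;
    return $vec unless $length > 0;

    # values() aliases the hash entries, so this scales them in place.
    $_ /= $length for values %$vec;
    return $vec;
}

# Made-up example: raw term frequencies and document frequencies.
my $N        = 10;
my %doc_freq = (perl => 2, module => 4);
my %vector   = (perl => 3, module => 1);

$vector{$_} *= collection_factor('p', $N, $doc_freq{$_}) for keys %vector;
normalize_vector('c', \%vector);

printf "%-6s %.4f\n", $_, $vector{$_} for sort keys %vector;
```

Note that the C<p> factor becomes negative once a term appears in more
than half of the documents, since C<(N-n)/n> then drops below 1.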
=item verbose
If set to a true value, some status/debugging information will be