AI-Categorizer

Changes

   MANIFEST, so they weren't included in the 0.06 distribution.
   [Spotted by Zoltan Barta]

0.06  Tue Apr 22 10:27:26 CDT 2003

 - Added a relatively simple example script at the request of several
   people, at eg/demo.pl

 - Forgot to note a dependency on Algorithm::NaiveBayes in version
   0.05.  Fixed.

 - AI::Categorizer class wasn't loading AI::Categorizer::KnowledgeSet
   class.  Fixed.

 - Fixed a bug in which the 'documents' and 'categories' parameters to
   the KnowledgeSet objects were never accepted: validation always
   claimed they failed the "All are Document objects" or "All are
   Category objects" callbacks. [Spotted by rob@phraud.org]

 - Moved the 'stopword_file' parameter from Categorizer.pm to the
   Collection class.

0.05  Sat Mar 29 00:38:21 CST 2003

 - Feature selection is now handled by an abstract FeatureSelector
   framework class.  Currently the only concrete subclass implemented
   is FeatureSelector::DocFrequency.  The 'feature_selection'
   parameter has been replaced with a 'feature_selector_class'
   parameter.

 - Added a k-Nearest-Neighbor machine learner. [First revision
   implemented by David Bell]

 - Added a Rocchio machine learner. [Partially implemented by Xiaobo
   Li]

 - Added a "Guesser" machine learner which simply uses overall class
   probabilities to make categorization decisions.  Sometimes useful
   for providing a set of baseline scores against which to evaluate
   other machine learners.
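
   A minimal, self-contained sketch of the baseline idea behind such a
   learner, assuming each category's probability is estimated as the
   fraction of training documents carrying it.  The data and the
   category_priors() and guess_categories() helpers are hypothetical
   and are not taken from the module's source:

```perl
use strict;
use warnings;

# Hypothetical training data: document name => list of categories.
my %training = (
    doc1 => ['sports'],
    doc2 => ['sports', 'politics'],
    doc3 => ['politics'],
    doc4 => ['sports'],
);

# Estimate each category's overall probability as the fraction of
# training documents that belong to it.
sub category_priors {
    my ($docs) = @_;
    my $total = keys %$docs;
    my %count;
    for my $cats (values %$docs) {
        $count{$_}++ for @$cats;
    }
    return { map { $_ => $count{$_} / $total } keys %count };
}

# Guess category membership for a document while ignoring its content:
# each category is assigned with probability equal to its prior.
sub guess_categories {
    my ($priors) = @_;
    return [ grep { rand() < $priors->{$_} } sort keys %$priors ];
}

my $priors = category_priors(\%training);
printf "%-10s %.2f\n", $_, $priors->{$_} for sort keys %$priors;
print "guess: @{ guess_categories($priors) }\n";
```

   Because the guesses never look at document content, scores from a
   real learner can be compared against them to see how much the
   features actually contribute.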

 - The NaiveBayes learner is now a wrapper around my new
   Algorithm::NaiveBayes module, which is just the old NaiveBayes code
   from here, turned into its own standalone module.

 - Much more extensive regression testing of the code.

 - Added a Document subclass for XML documents. [Implemented by
   Jae-Moon Lee] Its interface is still unstable and may change in
   later releases.

 - Added a 'Build.PL' file for an alternate installation method using
   Module::Build.

 - Fixed a problem in the Hypothesis class's best_category() method
   that would often result in the wrong category being reported.
   Added a regression test to exercise the Hypothesis class.  [Spotted
   by Xiaobo Li]

 - The 'categorizer' script now records more useful benchmarking
   information about time & memory in its outfile.

 - The AI::Categorizer->dump_parameters() method now tries to avoid
   showing you its entire list of stopwords.

 - Document objects now use a default 'name' if none is supplied.

 - For some Learner classes, the generated Hypothesis objects had
   non-functioning all_categories() methods.  Fixed.

 - The Collection::Files class now uses File::Spec internally to
   manage cross-platform filenames.
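
   For readers unfamiliar with File::Spec, a short sketch of the kind
   of portable path handling it provides.  The directory and file names
   are hypothetical, and this does not reproduce how Collection::Files
   uses the module internally:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical corpus layout; these names are illustrative only.
my $dir  = File::Spec->catdir('corpus', 'training');
my $path = File::Spec->catfile($dir, 'doc1.txt');

# The directory separator is chosen for the current platform.
print "$path\n";

# splitdir() takes the path apart again without hard-coding "/" or "\".
my @parts = File::Spec->splitdir($path);
print join(' | ', @parts), "\n";
```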

 - Added the 'stopword_behavior' parameter for controlling how
   stopword lists and stemming interact.  Previously, if stopwords &
   stemming were both used, stopwords were assumed to be pre-stemmed,
   which often isn't the case.
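
   The interaction can be sketched as follows.  This is an illustration
   under stated assumptions, not the module's code: toy_stem() is a
   made-up stemmer, and the word lists are invented.  It shows why
   comparing an unstemmed stopword list against stemmed tokens misses
   words, and two ways around that:

```perl
use strict;
use warnings;

# Toy stemmer, for illustration only: strips a trailing "ing" or "s".
sub toy_stem {
    my ($word) = @_;
    $word =~ s/(?:ing|s)$//;
    return $word;
}

my @stopwords = qw(being was);           # unstemmed stopword list
my @tokens    = qw(being running dogs);  # tokens from a document

# Old assumption: the stopwords are already stemmed, so they are
# compared directly against stemmed tokens.  "being" stems to "be",
# which is not in the list, so it slips through.
my %stop = map { $_ => 1 } @stopwords;
my @assume_pre_stemmed = grep { !$stop{$_} } map { toy_stem($_) } @tokens;

# Alternative 1: remove stopwords first, then stem what remains.
my @filter_then_stem = map { toy_stem($_) } grep { !$stop{$_} } @tokens;

# Alternative 2: stem the stopwords too, then compare stem to stem.
my %stemmed_stop = map { (toy_stem($_) => 1) } @stopwords;
my @stem_both = grep { !$stemmed_stop{$_} } map { toy_stem($_) } @tokens;

print "assume pre-stemmed: @assume_pre_stemmed\n";
print "filter then stem:   @filter_then_stem\n";
print "stem both:          @stem_both\n";
```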

 - parse() is now an instance method of the Document class, not a
   class method.  This means it can operate directly on an object
   instead of having to return a hash of content, which allows more
   flexible document parsing.  This may cause some backward
   compatibility problems if people were overriding the parse()
   method.
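
   A rough sketch of what an instance-method parse() makes possible,
   using hypothetical Toy::Document classes rather than the real
   AI::Categorizer::Document.  The 'content' argument name and the
   internal hash layout are assumptions made for the example:

```perl
use strict;
use warnings;

package Toy::Document;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, content => {} }, $class;
}

# Instance-method style: parse() stores what it finds on the object
# itself instead of returning a hash of content to the caller.
sub parse {
    my ($self, %args) = @_;
    $self->{content}{body} = $args{content};
    return;
}

package Toy::Document::WithTitle;
our @ISA = ('Toy::Document');

# Override: treat the first line as a title, the rest as the body.
sub parse {
    my ($self, %args) = @_;
    my ($title, $body) = split /\n/, $args{content}, 2;
    $self->{content}{title} = $title;
    $self->{content}{body}  = defined $body ? $body : '';
    return;
}

package main;

my $doc = Toy::Document::WithTitle->new(name => 'doc1');
$doc->parse(content => "Hello\nworld of text\n");
print "$doc->{content}{title}: $doc->{content}{body}";
```

   Code that overrode parse() as a class method and returned a hash
   would need to be adapted to this style, which is the compatibility
   concern noted above.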

 - Added a parse_handle() method, which can parse a document directly
   from a filehandle.

 - Fixed documentation for add_hypothesis(). [Spotted by Thierry
   Guillotin]

 - Added documentation for the AI::Categorizer::Collection::Files
   class.

0.04  Thu Nov  7 19:27:15 AEST 2002

 - Added learners for SVMs, Decision Trees, and a pass-through to
   Weka.

 - Added a virtual class for binary classifiers.

 - Wrote documentation for lots of the undocumented classes.

 - Added a PNG file giving an overview diagram of the classes.

 - Added a script 'categorizer' to provide a simple command-line
   interface to AI::Categorizer.

 - save_state() and restore_state() now save to a directory, not a
   file.

 - Removed F1(), precision(), recall(), etc. from Util package since
   they're in Statistics::Contingency.  Added random_elements() to
   Util.

 - Collection::Files now warns when no category information is known
   about a document in the collection (knowing it's in zero categories
   is okay).

 - Added the Collection::InMemory class.


