Algorithm-VSM

 view release on metacpan or  search on metacpan

lib/Algorithm/VSM.pm  view on Meta::CPAN

modeling and retrieval process.  So you are very likely to get faster performance
with retrieval based on LSA modeling, especially if you store the model once
constructed in a database file on the disk and carry out retrievals using the
disk-based model.


=head1 CAN THIS MODULE BE USED FOR GENERAL TEXT RETRIEVAL?

This module has only been tested for software retrieval.  For more general text
retrieval, you would need to replace the simple stemmer used in the module by one
based on, say, Porter's Stemming Algorithm.  You would also need to vastly expand the
list of stop words appropriate to the text corpora of interest to you. As previously
mentioned, the stop words are the commonly occurring words that do not carry much
discriminatory power from the standpoint of distinguishing between the documents.
See the file 'stop_words.txt' in the 'examples' directory for how such a file must be
formatted.


=head1 HOW DOES ONE DEAL WITH VERY LARGE LIBRARIES/CORPORA?

It is not uncommon for large software libraries to consist of tens of thousands of



( run in 1.988 second using v1.01-cache-2.11-cpan-5b529ec07f3 )