Algorithm-VSM
or
calculate_similarity_matrix_for_all_normalized_docs.pl
The former uses regular document vectors to calculate the similarity between every
pair of documents in the corpus; the latter uses normalized document vectors for the
same purpose. The rows and columns of the matrix are indexed in the alphabetic order
of the document names in the corpus directory.
=back
=head1 EXPORT
None by design.
=head1 SO THAT YOU DO NOT LOSE RELEVANCY JUDGMENTS
You have to be careful when carrying out Precision versus Recall calculations if you
do not wish to lose the previously created relevancy judgments. Invoking the method
C<estimate_doc_relevancies()> in your own script will cause the file C<relevancy.txt>
to be overwritten. If you have created a relevancy database and stored it in a file
called, say, C<relevancy.txt>, you should make a backup copy of this file before
executing a script that calls C<estimate_doc_relevancies()>.
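The backup step described above can be sketched as a small shell helper. This is only an illustration: the function name C<backup_relevancy> and the C<.bak> suffix are assumptions made for this example and are not part of the module.

```shell
#!/bin/sh
# Hypothetical helper (not part of Algorithm::VSM): copy the relevancy
# file to "<name>.bak" before running a script that calls
# estimate_doc_relevancies(), which would overwrite relevancy.txt.
backup_relevancy() {
    src="${1:-relevancy.txt}"
    if [ -f "$src" ]; then
        # -p preserves the timestamp and permissions of the original
        cp -p "$src" "$src.bak" || return 1
        echo "backed up $src to $src.bak"
    else
        echo "no $src found, nothing to back up"
    fi
}

# Call this first, then run your own script.
backup_relevancy relevancy.txt
```

Note that a second call replaces the earlier C<.bak> copy, so rename the backup if you need to keep more than one generation of judgments.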
=head1 BUGS
Please notify the author if you encounter any bugs. When sending email, please place
the string 'VSM' in the subject line to get past my spam filter.
=head1 INSTALLATION
Download the archive from CPAN into any directory of your choice. Unpack the archive
with a command that on a Linux machine would look like:
tar zxvf Algorithm-VSM-1.70.tar.gz
This will create an installation directory for you whose name will be
C<Algorithm-VSM-1.70>. Enter this directory and execute the following commands for a
standard install of the module if you have root privileges:
perl Makefile.PL
make
make test
sudo make install
If you do not have root privileges, you can carry out a non-standard install of the
module in any directory of your choice with:
perl Makefile.PL prefix=/some/other/directory/
make
make test
make install
With a non-standard install, you may also have to set your PERL5LIB environment
variable so that this module can find the other modules it requires. How you do that
depends on the platform you are working on. To install this module on a Linux
machine on which I use tcsh for the shell, I set the PERL5LIB environment variable
with
setenv PERL5LIB /some/other/directory/lib64/perl5/:/some/other/directory/share/perl5/
If I used bash, I'd need to declare:
export PERL5LIB=/some/other/directory/lib64/perl5/:/some/other/directory/share/perl5/
=head1 THANKS
Many thanks are owed to Shivani Rao and Bunyamin Sisman for sharing with me their
deep insights in IR. Version 1.4 was prompted by Zahn Bozanic's interest in
similarity matrix characterization of a corpus. Thanks, Zahn!
Several of the recent changes to the module are a result of the feedback I have
received from Naveen Kulkarni of Infosys Labs. Thanks, Naveen!
Version 1.62 was a result of Slaven Rezic's recommendation that I remove the Perl
version restriction on the module since he was able to run it with Perl version
5.8.9. Another important reason for v. 1.62 was the discovery of the two bugs
mentioned in Changes, one of them brought to my attention by Naveen Kulkarni.
=head1 AUTHOR
The author, Avinash Kak, recently finished a 17-year-long "Objects Trilogy" project
with the publication of the book "B<Designing with Objects>" by John Wiley. If
interested, check out his web page at Purdue to find out what the Objects Trilogy
project was all about. You might like "B<Designing with Objects>" especially if you
enjoyed reading Harry Potter as a kid (or even as an adult, for that matter). The
other two books in the trilogy are "B<Programming with Objects>" and "B<Scripting
with Objects>".
For all issues related to this module, contact the author at C<kak@purdue.edu>.
If you send email, please place the string "VSM" in your subject line to get past the
author's spam filter.
=head1 COPYRIGHT
This library is free software; you can redistribute it and/or modify it under the
same terms as Perl itself.
Copyright 2015 Avinash Kak
=cut