Algorithm-VSM

 view release on metacpan or  search on metacpan

lib/Algorithm/VSM.pm  view on Meta::CPAN

or

    calculate_similarity_matrix_for_all_normalized_docs.pl

The former uses regular document vectors for calculating the similarity between every
pair of documents in the corpus. And the latter uses normalized document vectors for
the same purpose.  The document order used for row and column indexing of the matrix
corresponds to the alphabetic ordering of the document names in the corpus directory.

=back


=head1 EXPORT

None by design.

=head1 SO THAT YOU DO NOT LOSE RELEVANCY JUDGMENTS

You have to be careful when carrying out Precision verses Recall calculations if you
do not wish to lose the previously created relevancy judgments. Invoking the method
C<estimate_doc_relevancies()> in your own script will cause the file C<relevancy.txt>
to be overwritten.  If you have created a relevancy database and stored it in a file
called, say, C<relevancy.txt>, you should make a backup copy of this file before
executing a script that calls C<estimate_doc_relevancies()>.

=head1 BUGS

Please notify the author if you encounter any bugs.  When sending email, please place
the string 'VSM' in the subject line to get past my spam filter.

=head1 INSTALLATION

Download the archive from CPAN in any directory of your choice.  Unpack the archive
with a command that on a Linux machine would look like:

    tar zxvf Algorithm-VSM-1.70.tar.gz

This will create an installation directory for you whose name will be
C<Algorithm-VSM-1.70>.  Enter this directory and execute the following commands for a
standard install of the module if you have root privileges:

    perl Makefile.PL
    make
    make test
    sudo make install

If you do not have root privileges, you can carry out a non-standard install the
module in any directory of your choice by:

    perl Makefile.PL prefix=/some/other/directory/
    make
    make test
    make install

With a non-standard install, you may also have to set your PERL5LIB environment
variable so that this module can find the required other modules. How you do that
would depend on what platform you are working on.  In order to install this module in
a Linux machine on which I use tcsh for the shell, I set the PERL5LIB environment
variable by

    setenv PERL5LIB /some/other/directory/lib64/perl5/:/some/other/directory/share/perl5/

If I used bash, I'd need to declare:

    export PERL5LIB=/some/other/directory/lib64/perl5/:/some/other/directory/share/perl5/


=head1 THANKS

Many thanks are owed to Shivani Rao and Bunyamin Sisman for sharing with me their
deep insights in IR.  Version 1.4 was prompted by Zahn Bozanic's interest in
similarity matrix characterization of a corpus. Thanks, Zahn!  

Several of the recent changes to the module are a result of the feedback I have
received from Naveen Kulkarni of Infosys Labs. Thanks, Naveen!

Version 1.62 was a result of Slaven Rezic's recommendation that I remove the Perl
version restriction on the module since he was able to run it with Perl version
5.8.9.  Another important reason for v. 1.62 was the discovery of the two bugs
mentioned in Changes, one of them brought to my attention by Naveen Kulkarni.

=head1 AUTHOR

The author, Avinash Kak, recently finished a 17-year long "Objects Trilogy" project
with the publication of the book "B<Designing with Objects>" by John-Wiley.  If
interested, check out his web page at Purdue to find out what the Objects Trilogy
project was all about.  You might like "B<Designing with Objects>" especially if you
enjoyed reading Harry Potter as a kid (or even as an adult, for that matter).  The
other two books in the trilogy are "B<Programming with Objects>" and "B<Scripting
with Objects>".

For all issues related to this module, contact the author at C<kak@purdue.edu>

If you send email, please place the string "VSM" in your subject line to get past the
author's spam filter.

=head1 COPYRIGHT

This library is free software; you can redistribute it and/or modify it under the
same terms as Perl itself.

 Copyright 2015 Avinash Kak

=cut




( run in 0.433 second using v1.01-cache-2.11-cpan-cdf2f3d4e48 )