Algorithm-VSM


lib/Algorithm/VSM.pm  view on Meta::CPAN

        $vsm->precision_and_recall_calculator('vsm');
        $vsm->display_precision_vs_recall_for_queries();
        $vsm->display_average_precision_for_queries_and_map();

      Measuring precision and recall requires a set of queries.  These are supplied
      through the constructor parameter 'query_file'.  The format of this file must
      follow that of the sample file 'test_queries.txt' in the 'examples' directory.
      The module estimates the relevancies of the documents to the queries and writes
      them to the file named by the 'relevancy_file' constructor parameter.  The
      constructor parameter 'relevancy_threshold' decides which documents are
      considered relevant to a query: a document must contain at least
      'relevancy_threshold' occurrences of query words to be considered relevant.
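
      The thresholding rule described above can be sketched as follows.  This is
      only an illustration, not the module's own code: the helper names
      count_query_word_hits() and is_relevant() are hypothetical, and the
      case-insensitive matching on word boundaries is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::VSM): counts how many times
# any of the query words occurs in the text of a document.
# Assumption: matching is case-insensitive and words are split on \W+.
sub count_query_word_hits {
    my ($doc_text, @query_words) = @_;
    my %wanted = map { lc($_) => 1 } @query_words;
    my $hits = 0;
    for my $word (split /\W+/, lc $doc_text) {
        $hits++ if $wanted{$word};
    }
    return $hits;
}

# A document counts as relevant when the number of query-word occurrences
# reaches the threshold (the role played by 'relevancy_threshold').
sub is_relevant {
    my ($doc_text, $threshold, @query_words) = @_;
    return count_query_word_hits($doc_text, @query_words) >= $threshold ? 1 : 0;
}

my $doc = "string getAllChars() returns the chars of a string";
print "hits:     ", count_query_word_hits($doc, qw(string chars)), "\n";
print "relevant: ", is_relevant($doc, 3, qw(string chars)), "\n";
```

      Raising the threshold makes the relevance judgments stricter, which in
      turn changes the precision and recall values computed from them.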



  # FOR MEASURING PRECISION VERSUS RECALL FOR LSA:

        my $lsa = Algorithm::VSM->new( 
                            break_camelcased_and_underscored  => 1, 
                            case_sensitive         => 0,


        $lsa->display_doc_relevancies();
        $lsa->precision_and_recall_calculator('lsa');
        $lsa->display_precision_vs_recall_for_queries();
        $lsa->display_average_precision_for_queries_and_map();

      We have already explained the purpose of the constructor parameter 'query_file'
      and the constraints on the format of the queries in the file named through
      this parameter.  As mentioned earlier, the module estimates the relevancies of
      the documents to the queries and writes them to the file named by the
      'relevancy_file' constructor parameter.  The constructor parameter
      'relevancy_threshold' decides which documents are considered relevant to a
      query: a document must contain at least 'relevancy_threshold' occurrences of
      query words to be considered relevant.  We have previously explained the role
      of the constructor parameter 'lsa_svd_threshold'.



  # FOR MEASURING PRECISION VERSUS RECALL FOR VSM USING FILE-BASED RELEVANCE JUDGMENTS:

        my $corpus_dir = "corpus";  



=item B<construct_lsa_model():>

You call this subroutine for constructing an LSA model for your corpus
after you have extracted the corpus vocabulary and constructed document
vectors:

    $vsm->construct_lsa_model();

The SVD that is carried out in LSA model construction uses the constructor
parameter C<lsa_svd_threshold> to decide how many of the singular values to
retain for the LSA model.  A singular value is retained only if it is larger
than C<lsa_svd_threshold> times the largest singular value.
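
The retention rule can be illustrated with a short sketch.  The helper name
retained_singular_values() and the sample singular values are hypothetical;
the code only demonstrates the comparison against a fraction of the largest
singular value and is not taken from the module.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not part of Algorithm::VSM): keeps only the singular
# values that are larger than $threshold times the largest singular value.
sub retained_singular_values {
    my ($threshold, @singular_values) = @_;
    return () unless @singular_values;
    my $cutoff = $threshold * max(@singular_values);
    return grep { $_ > $cutoff } @singular_values;
}

# Made-up singular values; with a threshold of 0.01 the cutoff is 0.12.
my @kept = retained_singular_values(0.01, 12.0, 3.5, 0.4, 0.05);
print "retained ", scalar(@kept), " singular values: @kept\n";
```

A smaller threshold retains more singular values and therefore a
higher-dimensional LSA model; a larger threshold truncates more aggressively.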


=item B<display_average_precision_for_queries_and_map():>

The Average Precision for a query is the average of the Precision-at-rank values
associated with each of the corpus documents relevant to the query.  The mean of the
Average Precision values for all the queries is the Mean Average Precision (MAP).
The C<Average Precision> values for the queries and the overall C<MAP> can be printed
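
To make the definitions of Average Precision and MAP concrete, here is a
minimal sketch.  The function names and file names are hypothetical, and
dividing by the total number of relevant documents (so that relevant
documents that were never retrieved contribute zero) is an assumed
convention rather than something confirmed from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper: Average Precision for one query.
#   $ranked_docs   - array ref of retrieved documents, best match first
#   $relevant_docs - array ref of documents judged relevant to the query
sub average_precision {
    my ($ranked_docs, $relevant_docs) = @_;
    my %is_relevant = map { $_ => 1 } @$relevant_docs;
    return 0 unless %is_relevant;
    my ($hits, $sum) = (0, 0);
    for my $i (0 .. $#$ranked_docs) {
        next unless $is_relevant{ $ranked_docs->[$i] };
        $hits++;
        $sum += $hits / ($i + 1);    # Precision at rank $i + 1
    }
    # Assumption: relevant documents that were not retrieved count as zero.
    return $sum / scalar(keys %is_relevant);
}

# Mean Average Precision: the mean of the per-query Average Precision values.
sub mean_average_precision {
    my @ap_values = @_;
    return 0 unless @ap_values;
    my $total = 0;
    $total += $_ for @ap_values;
    return $total / @ap_values;
}

my $ap1 = average_precision([qw(a.txt b.txt c.txt d.txt)], [qw(a.txt c.txt)]);
my $ap2 = average_precision([qw(x.txt y.txt)], [qw(y.txt)]);
printf "AP1=%.3f AP2=%.3f MAP=%.3f\n",
    $ap1, $ap2, mean_average_precision($ap1, $ap2);
```

For the first query the relevant documents sit at ranks 1 and 3, giving
precision values of 1/1 and 2/3, so the Average Precision is their mean.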


