RePrec

lib/RePrec.pm

B<RePrec> is an abstract class for computing recall-precision curves.
Subclasses implement different measures for interpreting
recall-precision curves. The theoretical background is given in detail
in the I<Information Retrieval Lecture Notes> by Norbert Fuhr (chapter 3,
Evaluation). Web address:
F<http://ls6-www.cs.uni-dortmund.de/ir/teaching/>.

=head1 METHODS

=over

=cut #---------------------------------------------------------------#


package RePrec;


use Carp;
use IO::File;
require RePrec::Tools;


our $VERSION;
'$Name: release_0_32 $ 0_0' =~ /(\d+)[-_](\d+)/; $VERSION = sprintf '%d.%03d', $1, $2;


## public ############################################################

=pod #---------------------------------------------------------------#

=item $rp = RePrec::<Subclass>->new($distribution)

Constructor. Takes a distribution as its argument. $distribution is a
reference to an array containing one two-element array reference for
each rank (topmost rank first). The first element of each pair holds
the number of relevant documents at that rank, the second the number
of non-relevant documents. The constructor croaks if $distribution
does not have this format.
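
As an illustration of the expected format, the following minimal
sketch builds a distribution for three ranks and sums it in the same
way the constructor does. The numbers and variable names are invented
for this example, and no RePrec subclass is loaded.

```perl
use strict;
use warnings;

# Hypothetical search result with three ranks, topmost rank first.
# Each entry is [relevant, non-relevant] for the documents at that rank.
my $distribution = [
  [2, 0],    # rank 1: two relevant documents
  [1, 3],    # rank 2: one relevant, three non-relevant documents
  [0, 4],    # rank 3: four non-relevant documents
];

# Validate each rank and sum both columns to obtain the totals.
my ($rels, $nrels) = (0, 0);
foreach my $rank (@$distribution) {
  die "distribution: wrong format\n"
    unless ref $rank eq 'ARRAY' and @$rank == 2;
  $rels  += $rank->[0];
  $nrels += $rank->[1];
}
my $numdocs = $rels + $nrels;

print "relevant: $rels, non-relevant: $nrels, total: $numdocs\n";
```

A structure like this is what would be passed to
RePrec::<Subclass>->new($distribution).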

=cut #---------------------------------------------------------------#

sub new {

  my $proto = shift;
  my $class = ref($proto) || $proto;
  my $self  = {};

  my $distribution = shift;
  croak 'distribution: wrong format' unless ref $distribution eq 'ARRAY';

  # validate each rank and sum up relevant and non-relevant documents
  my($rels, $nrels) = (0, 0);
  foreach (@$distribution) {
    croak 'distribution: wrong format' unless ref $_ eq 'ARRAY' and @$_ == 2;
    $rels += $_->[0];
    $nrels += $_->[1];
  }

  $self->{rels}    = $rels;
  $self->{nrels}   = $nrels;
  $self->{numdocs} = $rels + $nrels;

  bless $self => $class;

  $self->_sortrfdata($distribution);

  $self;
}


=pod #---------------------------------------------------------------#

=item $visual = $rp->visual

Returns a textual representation of the search result.

=cut #---------------------------------------------------------------#

sub visual {

  my $self = shift;
  return $self->{resultstring};
}


=pod #---------------------------------------------------------------#

=item ($graph, $average) = $rp->calculate([$points])

Calculates precision values for $points. $points may be an integer
(specifying the number of recall points for which precision is to be
computed), a reference to a list of recall points, the string I<smart>
(implying the recall points 0.25, 0.50, and 0.75), the string I<trec>
(implying the recall points 0, 0.1, 0.2, ..., 1), or the string
I<rank> (implying one recall point computed after each rank). If
$points is omitted, precision is computed for ten recall points
(i.e., 0.1, 0.2, ..., 1).
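
The mapping from $points to recall points described above can be
sketched as follows. The helper recall_points is hypothetical and not
part of RePrec. The handling of an arbitrary integer n (evenly spaced
points 1/n, 2/n, ..., 1) is an assumption that generalizes the
documented default for ten points.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RePrec): turn the $points argument
# into a list of recall points.
sub recall_points {
  my $points = shift;
  $points = 10 unless defined $points;    # documented default

  # an explicit list of recall points is used as given
  return @$points if ref $points eq 'ARRAY';

  # marker meaning "one recall point after each rank"
  return ('rank') if $points =~ /rank/i;

  return (0.25, 0.50, 0.75) if $points =~ /smart/i;

  # 0, 0.1, 0.2, ..., 1
  return map { $_ / 10 } 0 .. 10 if $points =~ /trec/i;

  # Assumption: an integer n yields the points 1/n, 2/n, ..., 1.
  die "points: wrong format\n"
    unless $points =~ /^\d+$/ and $points > 0;
  return map { $_ / $points } 1 .. $points;
}

print join(', ', recall_points('smart')), "\n";
print join(', ', recall_points(4)), "\n";
```

The array reference is checked first so that a reference is never
matched against the string patterns.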

The result is a list of (recall, precision) pairs (an array of array
references with two elements each) and the precision averaged over all
recall points computed. Returns undef if the distribution contains no
relevant documents.
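
The shape of the result can be illustrated with a short sketch. The
precision values are invented, $graph is assumed to be an array
reference, and the average is assumed to be the arithmetic mean of the
precision values; the actual numbers depend on the subclass used.

```perl
use strict;
use warnings;

# Hypothetical result for the recall points 0.25, 0.50, and 0.75:
# each entry is a [recall, precision] pair. The numbers are invented.
my $graph = [ [0.25, 0.8], [0.50, 0.6], [0.75, 0.4] ];

# Assumption: the average is the arithmetic mean of the precision values.
my $sum = 0;
$sum += $_->[1] foreach @$graph;
my $average = $sum / @$graph;

printf "recall %.2f  precision %.2f\n", @$_ foreach @$graph;
printf "average precision %.2f\n", $average;
```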

=cut #---------------------------------------------------------------#

sub calculate {

  my $self = shift;
  my $points = shift;

  # no relevant documents: recall is undefined, nothing to compute
  return undef unless $self->{rels};

  $points = 10 unless defined $points;

  # calculate recall points for which precision is to be computed
  my @points;
  if ($points =~ /rank/i) {
    # calculate precision at the end of each rank
    @points = 'rank';
  } elsif ($points =~ /smart/i) {
    # calculate precision for smart default recall points
    @points = (0.25, 0.50, 0.75);
  } elsif ($points =~ /trec/i) {
    # calculate precision for trec default recall points


