AI-Calibrate

 view release on metacpan or  search on metacpan

lib/AI/Calibrate.pm  view on Meta::CPAN


    my $last_prob = 1.0;

    for my $tuple (@$calibrated) {
        my($bound, $prob) = @$tuple;
        return $prob if $score >= $bound;
        $last_prob = $prob;
    }
    #  If we drop off the end, probability estimate is zero
    return 0;
}


=item B<print_mapping>

This is a simple utility function that takes the structure returned by
B<calibrate> and prints out a simple list of lines describing the mapping
created.

Example calling form:

  print_mapping($calibrated);

Sample output:

  1.00 > SCORE >= 1.00     prob = 1.000
  1.00 > SCORE >= 0.71     prob = 0.667
  0.71 > SCORE >= 0.39     prob = 0.000
  0.39 > SCORE >= 0.00     prob = 0.000

These ranges are not necessarily compressed/optimized, as this sample output
shows.

=back

=cut
sub print_mapping {
    my($calibrated) = @_;
    my $last_bound = 1.0;
    for my $tuple (@$calibrated) {
        my($bound, $prob) = @$tuple;
        printf("%0.3f > SCORE >= %0.3f     prob = %0.3f\n",
               $last_bound, $bound, $prob);
        $last_bound = $bound;
    }
    if ($last_bound != 0) {
        printf("%0.3f > SCORE >= %0.3f     prob = %0.3f\n",
               $last_bound, 0, 0);
    }
}

=head1 DETAILS

The PAV algorithm is conceptually straightforward.  Given a set of training
cases ordered by the scores assigned by the classifier, it first assigns a
probability of one to each positive instance and a probability of zero to each
negative instance, and puts each instance in its own group.  It then looks, at
each iteration, for adjacent violators: adjacent groups whose probabilities
locally increase rather than decrease.  When it finds such groups, it pools
them and replaces their probability estimates with the average of the group's
values.  It continues this process of averaging and replacement until the
entire sequence is monotonically decreasing.  The result is a sequence of
instances, each of which has a score and an associated probability estimate,
which can then be used to map scores into probability estimates.

For further information on the PAV algorithm, you can read the section in my
paper referenced below.

=head1 EXPORT

This module exports three functions: calibrate, score_prob and print_mapping.

=head1 BUGS

None known.  This implementation is straightforward but inefficient (its time
is O(n^2) in the length of the data series).  A linear time algorithm is
known, and in a later version of this module I'll probably implement it.

=head1 SEE ALSO

The AI::NaiveBayes1 perl module.

My paper "PAV and the ROC Convex Hull" has a good discussion of the PAV
algorithm, including examples:
L<http://home.comcast.net/~tom.fawcett/public_html/papers/PAV-ROCCH-dist.pdf>

If you want to read more about the general issue of classifier calibration,
here are some good papers, which are freely available on the web:

I<"Transforming classifier scores into accurate multiclass probability estimates">
by Bianca Zadrozny and Charles Elkan

I<"Predicting Good Probabilities With Supervised Learning">
by A. Niculescu-Mizil and R. Caruana


=head1 AUTHOR

Tom Fawcett, E<lt>tom.fawcett@gmail.comE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2008-2012 by Tom Fawcett

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.8 or,
at your option, any later version of Perl 5 you may have available.

=cut
1;



( run in 1.851 second using v1.01-cache-2.11-cpan-39bf76dae61 )