AI-Calibrate

Revision history for Perl extension AI::Calibrate.

1.5   Fri Aug 3 2012
      - Changes to ./t/AI-Calibrate-1.t to let it pass with almost-equal
        numbers.

1.4   Thu Aug 2 2012
      - Revised calibration algorithm based on bug
      - Updated tests in ./t
      - Added ./t/AI-Calibrate-KL.t using Kun Liu's dataset.
      - Added ./t/AI-Calibrate-pathologies.t to test for pathological cases.

1.3   Fri Nov 4
      - Removed dependency on Test::Deep, added explicit declaration of
        dependency on Test::More to Makefile.PL

1.2   Thu Nov 3
      - Fixed test ./t/AI-Calibrate-NB.t so that test wouldn't fail.  Used to
        call is_deeply, which was failing on slight differences between
        floating point numbers.  Now compares with a small tolerance.
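The tolerance-based comparison mentioned in the 1.2 entry can be sketched roughly as follows. This is a minimal illustration, not the code in ./t/AI-Calibrate-NB.t: the helper names close_enough and mappings_close and the default tolerance of 1e-6 are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AI::Calibrate or its test suite):
# true when two numbers differ by no more than $tol.
sub close_enough {
    my ($got, $expected, $tol) = @_;
    $tol = 1e-6 unless defined $tol;    # assumed default tolerance
    return abs($got - $expected) <= $tol ? 1 : 0;
}

# Hypothetical helper: compare two lists of [score, prob] pairs
# element by element, using the tolerance instead of exact equality.
sub mappings_close {
    my ($got, $expected, $tol) = @_;
    return 0 unless @$got == @$expected;
    for my $i (0 .. $#$got) {
        for my $j (0, 1) {
            return 0
                unless close_enough($got->[$i][$j], $expected->[$i][$j], $tol);
        }
    }
    return 1;
}
```

In a Test::More script this would typically be wrapped as ok(mappings_close($got, $expected), '...') in place of is_deeply.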

MANIFEST view on Meta::CPAN

Changes
Makefile.PL
MANIFEST
README
t/AI-Calibrate-1.t
t/AI-Calibrate-pathologies.t
t/AI-Calibrate-NB.t
t/AI-Calibrate-KL.t
lib/AI/Calibrate.pm
META.yml                                 Module YAML meta-data (added by MakeMaker)
META.json                                Module JSON meta-data (added by MakeMaker)

lib/AI/Calibrate.pm view on Meta::CPAN

=cut

=head1 FUNCTIONS

=over 4

=item B<calibrate>

This is the main calibration function.  The calling form is:

  my $calibrated = calibrate($data, $sorted);

$data looks like: C<[ [score, class], [score, class], [score, class]...]>
Each score is a number.  Each class is either 0 (negative class) or 1
(positive class).

$sorted is a boolean (0 by default) indicating whether the data are already
sorted by score in descending order.  Unless this is set to 1, calibrate()
will sort the data itself.

calibrate() returns a reference to an ordered list of array references:

  [ [score, prob], [score, prob], [score, prob] ... ]

Scores will be in descending numerical order.  See the DESCRIPTION section for
how this structure is interpreted.  You can pass this structure to the
B<score_prob> function, along with a new score, to get a probability.
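As a rough illustration of how such a structure might be consulted, the sketch below walks the list from the highest threshold down and returns the probability of the first threshold the new score meets or exceeds. The function name lookup_prob, the fallback probability of 0, and the mapping values are assumptions made for the example; this is not the module's score_prob.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a lookup over the [score, prob] structure.
# Assumes the pairs are in descending score order, as calibrate() documents.
sub lookup_prob {
    my ($mapping, $score) = @_;
    for my $pair (@$mapping) {
        my ($threshold, $prob) = @$pair;
        return $prob if $score >= $threshold;
    }
    return 0;    # assumption: a score below every threshold maps to 0
}

# Made-up mapping, for illustration only.
my $mapping = [ [0.8, 1], [0.6, 0.5] ];

printf "%.2f -> %s\n", $_, lookup_prob($mapping, $_) for (0.9, 0.7, 0.2);
```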

=cut

sub calibrate {
    my($data, $sorted) = @_;

    if (DEBUG) {
        print "Original data:\n";
        for my $pair (@$data) {
            # Input pairs are [score, class]; probabilities exist only after PAV
            my($score, $class) = @$pair;
            print "($score, $class)\n";
        }
    }

    #  Copy the data over so PAV can clobber the PROB field
    my $new_data = [ map([@$_], @$data) ];

    #   If not already sorted, sort data decreasing by score
    if (!$sorted) {
        $new_data = [ sort { $b->[SCORE] <=> $a->[SCORE] } @$new_data ];
    }

    PAV($new_data);

    if (DEBUG) {
        print("After PAV, vector is:\n");
        print_vector($new_data);
    }

    my(@result);
    my( $last_prob, $last_score);

    push(@$new_data, [-1e10, 0]);

    for my $pair (@$new_data) {
        print "Seeing @$pair\n" if DEBUG;
        my($score, $prob) = @$pair;
        if (defined($last_prob) and $prob < $last_prob) {
            print("Pushing [$last_score, $last_prob]\n") if DEBUG;
            push(@result, [$last_score, $last_prob] );
        }
        $last_prob = $prob;
        $last_score = $score;
    }
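
The boundary-extraction loop above can be tried in isolation on a small vector that has already been pooled. The sketch below repeats the same logic in a standalone function; the name extract_steps and the sample values are made up for the example, and unlike the excerpt it appends the sentinel to a copy rather than to the input.

```perl
use strict;
use warnings;

# Illustrative helper (hypothetical name): collapse [score, prob] pairs,
# sorted by descending score with non-increasing prob, into one pair per
# run of equal probability. Each emitted pair carries the lowest score
# of its run.
sub extract_steps {
    my ($pooled) = @_;
    my @pairs = (@$pooled, [-1e10, 0]);    # sentinel, as in the loop above
    my (@result, $last_score, $last_prob);
    for my $pair (@pairs) {
        my ($score, $prob) = @$pair;
        if (defined $last_prob and $prob < $last_prob) {
            push @result, [$last_score, $last_prob];
        }
        ($last_score, $last_prob) = ($score, $prob);
    }
    return \@result;
}

my $steps = extract_steps([
    [0.9, 1], [0.8, 1], [0.7, 0.5], [0.6, 0.5], [0.3, 0], [0.1, 0],
]);
# $steps now holds [0.8, 1] and [0.6, 0.5]. The trailing run with
# probability 0 is not emitted, because the sentinel's probability (0)
# is not lower than it.
```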

lib/AI/Calibrate.pm view on Meta::CPAN

For further information on the PAV algorithm, you can read the section in my
paper referenced below.

=head1 EXPORT

This module exports three functions: calibrate, score_prob and print_mapping.

=head1 BUGS

None known.  This implementation is straightforward but inefficient (its
running time is O(n^2) in the length of the data series).  A linear-time
algorithm is known, and in a later version of this module I'll probably
implement it.
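
For comparison, one common way to get linear time is a stack of pooled blocks, where each item is pushed once and merged at most once. The sketch below is an illustration under stated assumptions and not this module's PAV(): the name pav_linear is hypothetical, the input is assumed to be [score, class] pairs already sorted by descending score, and tied scores get no special treatment.

```perl
use strict;
use warnings;

# Sketch of a stack-based pool-adjacent-violators pass (hypothetical name).
# Input:  [score, class] pairs, already sorted by descending score.
# Output: a new list of [score, prob] pairs with non-increasing prob.
sub pav_linear {
    my ($data) = @_;
    my @blocks;    # each block is [sum of classes, number of items]
    for my $pair (@$data) {
        push @blocks, [ $pair->[1], 1 ];
        # Merge while the newest block's mean exceeds the mean of the block
        # before it. Means are compared by cross-multiplication.
        while (@blocks > 1
            and $blocks[-1][0] * $blocks[-2][1] > $blocks[-2][0] * $blocks[-1][1])
        {
            my $top = pop @blocks;
            $blocks[-1][0] += $top->[0];
            $blocks[-1][1] += $top->[1];
        }
    }
    # Expand each block back into one probability per input item.
    my @probs = map { my ($sum, $n) = @$_; ($sum / $n) x $n } @blocks;
    return [ map { [ $data->[$_][0], $probs[$_] ] } 0 .. $#$data ];
}

my $pooled = pav_linear([ [0.9, 1], [0.8, 0], [0.7, 1], [0.6, 0], [0.5, 0] ]);
print join(', ', map { "[$_->[0], $_->[1]]" } @$pooled), "\n";
```

Because every item enters the stack once and can be popped at most once, the total work is proportional to the number of items.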

=head1 SEE ALSO

The AI::NaiveBayes1 Perl module.

My paper "PAV and the ROC Convex Hull" has a good discussion of the PAV
algorithm, including examples:
L<http://home.comcast.net/~tom.fawcett/public_html/papers/PAV-ROCCH-dist.pdf>
