AI-Calibrate

 view release on metacpan or  search on metacpan

lib/AI/Calibrate.pm  view on Meta::CPAN


 my $points = [
              [.9, 1],
              [.8, 1],
              [.7, 0],
              [.6, 1],
              [.55, 1],
              [.5, 1],
              [.45, 0],
              [.4, 1],
              [.35, 1],
              [.3, 0 ],
              [.27, 1],
              [.2, 0 ],
              [.18, 0],
              [.1, 1 ],
              [.02, 0]
             ];

If you then call calibrate($points), it will return this structure:

 [
   [.9,    1 ],
   [.7,  3/4 ],
   [.45, 2/3 ],
   [.3,  1/2 ],
   [.2,  1/3 ],
   [.02,   0 ]
  ]

This means that, given a SCORE produced by the classifier, you can map the
SCORE onto a probability like this:

               SCORE >= .9        prob = 1
         .9  > SCORE >= .7        prob = 3/4
         .7  > SCORE >= .45       prob = 2/3
         .45 > SCORE >= .3        prob = 3/4
         .2  > SCORE >= .7        prob = 3/4
         .02 > SCORE              prob = 0

For a realistic example of classifier calibration, see the test file
t/AI-Calibrate-NB.t, which uses the AI::NaiveBayes1 module to train a Naive
Bayes classifier then calibrates it using this module.

=cut

=head1 FUNCTIONS

=over 4

=item B<calibrate>

This is the main calibration function.  The calling form is:

my $calibrated = calibrate( $data, $sorted);

$data looks like: C<[ [score, class], [score, class], [score, class]...]>
Each score is a number.  Each class is either 0 (negative class) or 1
(positive class).

$sorted is boolean (0 by default) indicating whether the data are already
sorted by score.  Unless this is set to 1, calibrate() will sort the data
itself.

Calibrate returns a reference to an ordered list of references:

  [ [score, prob], [score, prob], [score, prob] ... ]

Scores will be in descending numerical order.  See the DESCRIPTION section for
how this structure is interpreted.  You can pass this structure to the
B<score_prob> function, along with a new score, to get a probability.

=cut

sub calibrate {
    my($data, $sorted) = @_;

    if (DEBUG) {
        print "Original data:\n";
        for my $pair (@$data) {
            my($score, $prob) = @$pair;
            print "($score, $prob)\n";
        }
    }

    #  Copy the data over so PAV can clobber the PROB field
    my $new_data = [ map([@$_], @$data) ];

    #   If not already sorted, sort data decreasing by score
    if (!$sorted) {
        $new_data = [ sort { $b->[SCORE] <=> $a->[SCORE] } @$new_data ];
    }

    PAV($new_data);

    if (DEBUG) {
        print("After PAV, vector is:\n");
        print_vector($new_data);
    }

    my(@result);
    my( $last_prob, $last_score);

    push(@$new_data, [-1e10, 0]);

    for my $pair (@$new_data) {
        print "Seeing @$pair\n" if DEBUG;
        my($score, $prob) = @$pair;
        if (defined($last_prob) and $prob < $last_prob) {
            print("Pushing [$last_score, $last_prob]\n") if DEBUG;
            push(@result, [$last_score, $last_prob] );
        }
        $last_prob = $prob;
        $last_score = $score;
    }

    return \@result;
}


sub PAV {



( run in 1.248 second using v1.01-cache-2.11-cpan-cdf2f3d4e48 )