AI-Calibrate
lib/AI/Calibrate.pm
  my $points = [
                [.9,  1],
                [.8,  1],
                [.7,  0],
                [.6,  1],
                [.55, 1],
                [.5,  1],
                [.45, 0],
                [.4,  1],
                [.35, 1],
                [.3,  0],
                [.27, 1],
                [.2,  0],
                [.18, 0],
                [.1,  1],
                [.02, 0]
               ];
If you then call calibrate($points), it will return this structure:
  [
   [.9,  1  ],
   [.7,  3/4],
   [.45, 2/3],
   [.3,  1/2],
   [.2,  1/3],
   [.02, 0  ]
  ]
This means that, given a SCORE produced by the classifier, you can map the
SCORE onto a probability like this:
         SCORE >= .9     prob = 1
   .9  > SCORE >= .7     prob = 3/4
   .7  > SCORE >= .45    prob = 2/3
   .45 > SCORE >= .3     prob = 1/2
   .3  > SCORE >= .2     prob = 1/3
   .2  > SCORE           prob = 0
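The threshold lookup described above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: C<lookup_prob> is a hypothetical helper written for this example, not a function of this module, and the structure is copied by hand from the example output above.

```perl
use strict;
use warnings;

# Calibrated structure from the example above:
# [threshold_score, probability], in descending score order.
my $calibrated = [
    [.9,  1  ],
    [.7,  3/4],
    [.45, 2/3],
    [.3,  1/2],
    [.2,  1/3],
    [.02, 0  ],
];

# lookup_prob is a hypothetical helper for illustration only.
# It returns the probability of the first (highest) threshold
# that the score reaches.
sub lookup_prob {
    my ($calibrated, $score) = @_;
    for my $pair (@$calibrated) {
        my ($threshold, $prob) = @$pair;
        return $prob if $score >= $threshold;
    }
    return 0;    # score is below every threshold
}

for my $score (.95, .6, .25, .01) {
    printf "score %.2f -> prob %.3f\n", $score, lookup_prob($calibrated, $score);
}
```

Because the thresholds are in descending order, the first one the score reaches determines the probability, so a linear scan is enough for a structure this small.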
For a realistic example of classifier calibration, see the test file
t/AI-Calibrate-NB.t, which uses the AI::NaiveBayes1 module to train a Naive
Bayes classifier and then calibrates it using this module.
=cut
=head1 FUNCTIONS
=over 4
=item B<calibrate>
This is the main calibration function. The calling form is:
my $calibrated = calibrate( $data, $sorted);
$data looks like: C<[ [score, class], [score, class], [score, class]...]>
Each score is a number. Each class is either 0 (negative class) or 1
(positive class).
$sorted is a boolean (0 by default) indicating whether the data are already
sorted by score in descending order. Unless this is set to 1, calibrate()
will sort the data itself.
calibrate() returns a reference to an ordered list of references:
  [ [score, prob], [score, prob], [score, prob] ... ]
Scores will be in descending numerical order. See the DESCRIPTION section for
how this structure is interpreted. You can pass this structure to the
B<score_prob> function, along with a new score, to get a probability.
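calibrate() relies on a PAV (pool adjacent violators) routine whose body is not shown in this section. As a rough, standalone illustration of the pooling idea only, the sketch below takes the class labels from the DESCRIPTION example, ordered by descending score, and averages neighbouring groups until the values no longer increase. C<pav_sketch> is a hypothetical name; this is not the module's own PAV() and makes no claim to match its details.

```perl
use strict;
use warnings;

# pav_sketch is a hypothetical helper, not the module's PAV().
# Input:  class labels (0 or 1), ordered by descending score.
# Output: one pooled probability per input point, never increasing.
sub pav_sketch {
    my @labels = @_;
    my @blocks;    # each block is [sum_of_labels, count]
    for my $y (@labels) {
        push @blocks, [$y, 1];
        # Pool while the newest block's mean exceeds the previous one's.
        while (@blocks > 1
            && $blocks[-1][0] / $blocks[-1][1] > $blocks[-2][0] / $blocks[-2][1])
        {
            my $last = pop @blocks;
            $blocks[-1][0] += $last->[0];
            $blocks[-1][1] += $last->[1];
        }
    }
    # Expand every block back into one probability per point.
    return map { my ($sum, $n) = @$_; ($sum / $n) x $n } @blocks;
}

# Labels of the DESCRIPTION example, scores .9 down to .02.
my @probs = pav_sketch(1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0);
print join(" ", map { sprintf "%.2f", $_ } @probs), "\n";
```

On the example data the distinct pooled values are 1, 3/4, 2/3, 1/2, 1/3 and 0, the same probabilities that appear in the calibrated structure shown in the DESCRIPTION.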
=cut
sub calibrate {
    my($data, $sorted) = @_;

    if (DEBUG) {
        print "Original data:\n";
        for my $pair (@$data) {
            # Input pairs are [score, class], not yet probabilities
            my($score, $class) = @$pair;
            print "($score, $class)\n";
        }
    }

    # Copy the data over so PAV can clobber the PROB field
    my $new_data = [ map([@$_], @$data) ];

    # If not already sorted, sort data decreasing by score
    if (!$sorted) {
        $new_data = [ sort { $b->[SCORE] <=> $a->[SCORE] } @$new_data ];
    }

    PAV($new_data);

    if (DEBUG) {
        print("After PAV, vector is:\n");
        print_vector($new_data);
    }

    my(@result);
    my( $last_prob, $last_score);

    # Append a sentinel pair (very low score, probability 0) so that the
    # final block is pushed if its probability is above 0.
    push(@$new_data, [-1e10, 0]);

    for my $pair (@$new_data) {
        print "Seeing @$pair\n" if DEBUG;
        my($score, $prob) = @$pair;
        # A drop in probability ends a block: record the last score seen
        # in that block together with the block's probability.
        if (defined($last_prob) and $prob < $last_prob) {
            print("Pushing [$last_score, $last_prob]\n") if DEBUG;
            push(@result, [$last_score, $last_prob] );
        }
        $last_prob = $prob;
        $last_score = $score;
    }

    return \@result;
}
sub PAV {