AI-Calibrate
Revision history for Perl extension AI::Calibrate.
1.5 Fri Aug 3 2012
- Changes to ./t/AI-Calibrate-1.t to let it pass with almost-equal
numbers.
1.4 Thu Aug 2 2012
- Revised calibration algorithm based on bug
- Updated tests in ./t
- Added ./t/AI-Calibrate-KL.t using Kun Liu's dataset.
- Added ./t/AI-Calibrate-pathologies.t to test for pathological cases.
1.3 Fri Nov 4
- Removed dependency on Test::Deep, added explicit declaration of
dependency on Test::More to Makefile.PL
1.2 Thu Nov 3
- Fixed test ./t/AI-Calibrate-NB.t so that test wouldn't fail. Used to
call is_deeply, which was failing on slight differences between
floating point numbers. Now compares with a small tolerance.
Changes
Makefile.PL
MANIFEST
README
t/AI-Calibrate-1.t
t/AI-Calibrate-pathologies.t
t/AI-Calibrate-NB.t
t/AI-Calibrate-KL.t
lib/AI/Calibrate.pm
META.yml Module YAML meta-data (added by MakeMaker)
META.json Module JSON meta-data (added by MakeMaker)
lib/AI/Calibrate.pm view on Meta::CPAN
=cut
=head1 FUNCTIONS
=over 4
=item B<calibrate>
This is the main calibration function. The calling form is:
my $calibrated = calibrate( $data, $sorted);
$data looks like: C<[ [score, class], [score, class], [score, class]...]>
Each score is a number. Each class is either 0 (negative class) or 1
(positive class).
$sorted is a boolean (0 by default) indicating whether the data are already
sorted by score in descending order. Unless this is set to 1, calibrate()
will sort the data itself.
calibrate() returns a reference to an ordered list of array references:
[ [score, prob], [score, prob], [score, prob] ... ]
Scores will be in descending numerical order. See the DESCRIPTION section for
how this structure is interpreted. You can pass this structure to the
B<score_prob> function, along with a new score, to get a probability.
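The following is a minimal, self-contained sketch of the data shapes described above. It does not call this module: the input values are made up, the mapping is written by hand in the documented output shape, and lookup_prob is a hypothetical helper that assumes each pair means "scores at or above this threshold get this probability". See the DESCRIPTION section for the authoritative interpretation.

```perl
use strict;
use warnings;

# Input shape expected by calibrate(): [score, class] pairs, class is 0 or 1.
# These values are made up for illustration.
my $data = [ [0.2, 0], [0.9, 1], [0.5, 1], [0.7, 0] ];

# Sorting by descending score is what passing $sorted = 1 promises.
my $sorted_data = [ sort { $b->[0] <=> $a->[0] } @$data ];

# A hand-written mapping in the documented output shape, [score, prob],
# with scores in descending order. It is NOT real calibrate() output.
my $mapping = [ [0.9, 1.0], [0.5, 0.5] ];

# Hypothetical helper (not the module's score_prob): return the prob of the
# first pair whose score threshold the new score meets, or 0 if none does.
sub lookup_prob {
    my ($map, $score) = @_;
    for my $pair (@$map) {
        my ($threshold, $prob) = @$pair;
        return $prob if $score >= $threshold;
    }
    return 0;
}

print join(' ', map { $_->[0] } @$sorted_data), "\n";
printf "%.2f\n", lookup_prob($mapping, 0.6);
```

With the real module, the equivalent step would be to pass the structure returned by calibrate() and a new score to score_prob.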
=cut
sub calibrate {
my($data, $sorted) = @_;
if (DEBUG) {
print "Original data:\n";
for my $pair (@$data) {
my($score, $class) = @$pair;
print "($score, $class)\n";
}
}
# Copy the data over so PAV can clobber the PROB field
my $new_data = [ map { [@$_] } @$data ];
# If not already sorted, sort data decreasing by score
if (!$sorted) {
$new_data = [ sort { $b->[SCORE] <=> $a->[SCORE] } @$new_data ];
}
PAV($new_data);
if (DEBUG) {
print("After PAV, vector is:\n");
print_vector($new_data);
}
my(@result);
my( $last_prob, $last_score);
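# Sentinel with probability 0: lets the loop below emit the last run whose
# probability is above 0.
push(@$new_data, [-1e10, 0]);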
for my $pair (@$new_data) {
print "Seeing @$pair\n" if DEBUG;
my($score, $prob) = @$pair;
if (defined($last_prob) and $prob < $last_prob) {
print("Pushing [$last_score, $last_prob]\n") if DEBUG;
push(@result, [$last_score, $last_prob] );
}
$last_prob = $prob;
$last_score = $score;
}
lib/AI/Calibrate.pm view on Meta::CPAN
For further information on the PAV algorithm, you can read the section in my
paper referenced below.
=head1 EXPORT
This module exports three functions: calibrate, score_prob and print_mapping.
=head1 BUGS
None known. This implementation is straightforward but inefficient: its
running time is O(n^2) in the length of the data series. A linear-time
algorithm is known, and I will probably implement it in a later version of
this module.
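As a rough illustration of how a linear-time PAV can work, the sketch below keeps a stack of pooled blocks and merges a new block backwards while it violates the ordering. It is not this module's implementation: pav_stack and its [sum, count] block layout are hypothetical names chosen for the example, and it assumes the class labels are already ordered by descending score.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AI::Calibrate: pool adjacent violators
# over class labels that are already ordered by descending score.
sub pav_stack {
    my (@labels) = @_;
    my @blocks;    # each block is [sum_of_labels, count]
    for my $y (@labels) {
        my ($sum, $count) = ($y, 1);
        # Fitted values must not increase as the score falls, so merge while
        # the new block's mean exceeds the mean of the block before it.
        # Cross-multiplying compares the two means without dividing.
        while (@blocks && $sum * $blocks[-1][1] > $blocks[-1][0] * $count) {
            my $prev = pop @blocks;
            $sum   += $prev->[0];
            $count += $prev->[1];
        }
        push @blocks, [$sum, $count];
    }
    # Expand each block back to one fitted value per input label.
    return map { my ($s, $n) = @$_; ($s / $n) x $n } @blocks;
}

my @fitted = pav_stack(1, 0, 1, 0);
print "@fitted\n";
```

Each label is pushed onto the stack once and each block is popped at most once, so the total work is linear in the number of labels.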
=head1 SEE ALSO
The AI::NaiveBayes1 Perl module.
My paper "PAV and the ROC Convex Hull" has a good discussion of the PAV
algorithm, including examples:
L<http://home.comcast.net/~tom.fawcett/public_html/papers/PAV-ROCCH-dist.pdf>