AI-Calibrate
Revision history for Perl extension AI::Calibrate.
1.5 Fri Aug 3 2012
- Changes to ./t/AI-Calibrate-1.t to let it pass with almost-equal
numbers.
1.4 Thu Aug 2 2012
- Revised calibration algorithm based on a bug report
- Updated tests in ./t
- Added ./t/AI-Calibrate-KL.t using Kun Liu's dataset.
- Added ./t/AI-Calibrate-pathologies.t to test for pathological cases.
1.3 Fri Nov 4
- Removed dependency on Test::Deep, added explicit declaration of
dependency on Test::More to Makefile.PL
1.2 Thu Nov 3
- Fixed test ./t/AI-Calibrate-NB.t so that the test wouldn't fail
  spuriously. It used to call is_deeply, which was failing on slight
  differences between floating point numbers. It now compares with a
  small tolerance.
lib/AI/Calibrate.pm
Classifiers usually return some sort of instance score along with their
classifications. These scores can be used as probabilities in various
calculations, but first they need to be I<calibrated>. Naive Bayes, for
example, is a very useful classifier, but the scores it produces are usually
"bunched" around 0 and 1, making these scores poor probability estimates.
Support vector machines have a similar problem. Both classifier types should
be calibrated before their scores are used as probability estimates.
This module calibrates classifier scores using a method called the Pool
Adjacent Violators (PAV) algorithm. After you train a classifier, you take a
(usually separate) set of test instances and run them through the classifier,
collecting the scores assigned to each. You then supply this set of instances
to the calibrate function defined here, and it returns a mapping from score
ranges to probability estimates.
For example, assume you have the following set of instance results from your
classifier. Each result is of the form C<[ASSIGNED_SCORE, TRUE_CLASS]>:
    my $points = [
        [.9,  1],
        [.8,  1],
        [.7,  0],
        [.6,  1],
        [.55, 1],
        [.5,  1],
        [.45, 0],
lib/AI/Calibrate.pm
This is the main calibration function. The calling form is:
my $calibrated = calibrate( $data, $sorted);
$data looks like: C<[ [score, class], [score, class], [score, class]...]>
Each score is a number. Each class is either 0 (negative class) or 1
(positive class).
$sorted is boolean (0 by default) indicating whether the data are already
sorted by score. Unless this is set to 1, calibrate() will sort the data
itself.
Calibrate returns a reference to an ordered list of references:
[ [score, prob], [score, prob], [score, prob] ... ]
Scores will be in descending numerical order. See the DESCRIPTION section for
how this structure is interpreted. You can pass this structure to the
B<score_prob> function, along with a new score, to get a probability.
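For example, here is a minimal sketch that prints the returned mapping
(assuming $points is the data set shown in the DESCRIPTION, and reading each
entry's score as the lower bound of its range):

    use AI::Calibrate ':all';

    my $calibrated = calibrate($points);
    for my $pair (@$calibrated) {
        my ($score, $prob) = @$pair;
        printf("SCORE >= %0.3f  =>  prob = %0.3f\n", $score, $prob);
    }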
lib/AI/Calibrate.pm
This is a simple utility function that takes the structure returned by
B<calibrate>, along with a new score, and returns the probability estimate.
Example calling form:
$p = score_prob($calibrated, $score);
Once you have a trained, calibrated classifier, you could imagine using it
like this:
    $calibrated = calibrate( $calibration_set );
    print "Input instances, one per line:\n";

    while (<>) {
        chomp;
        my(@fields) = split;
        my $score = classifier(@fields);
        my $prob = score_prob($calibrated, $score);
        print "Estimated probability: $prob\n";
    }
=cut
lib/AI/Calibrate.pm
        $last_bound = $bound;
    }
    # Print the final catch-all range: scores below the last boundary
    # map to probability zero.
    if ($last_bound != 0) {
        printf("%0.3f > SCORE >= %0.3f prob = %0.3f\n",
               $last_bound, 0, 0);
    }
}
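Run against a calibrated mapping, a loop like this prints one line per range.
The numbers below are illustrative only, not taken from a real data set:

    1.000 > SCORE >= 0.800 prob = 1.000
    0.800 > SCORE >= 0.500 prob = 0.667
    0.500 > SCORE >= 0.000 prob = 0.000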
=head1 DETAILS
The PAV algorithm is conceptually straightforward. Given a set of training
cases ordered by the scores assigned by the classifier, it first assigns a
probability of one to each positive instance and a probability of zero to each
negative instance, and puts each instance in its own group. It then looks, at
each iteration, for adjacent violators: adjacent groups whose probabilities
locally increase rather than decrease. When it finds such groups, it pools
them and replaces their probability estimates with the average of the group's
values. It continues this process of averaging and replacement until the
entire sequence is monotonically decreasing. The result is a sequence of
instances, each of which has a score and an associated probability estimate,
which can then be used to map scores into probability estimates.
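As an illustration only (this is not the module's internal code), a minimal
Perl sketch of this pooling step might look like the following. It assumes
the input pairs are already sorted by descending score, and it keeps each
pooled group's lowest score as the boundary of its range:

    sub pav_sketch {
        my ($data) = @_;    # [[score, class], ...], descending by score

        # One group per instance; the initial estimate is the class itself.
        my @groups = map { +{ score => $_->[0], sum => $_->[1], n => 1 } }
                         @$data;

        my $i = 0;
        while ($i < $#groups) {
            my ($cur, $next) = @groups[$i, $i + 1];
            if ($cur->{sum} / $cur->{n} < $next->{sum} / $next->{n}) {
                # Adjacent violators: pool the groups and average them.
                $cur->{sum}  += $next->{sum};
                $cur->{n}    += $next->{n};
                $cur->{score} = $next->{score};   # keep the lower boundary
                splice(@groups, $i + 1, 1);
                $i-- if $i > 0;   # pooling may expose an earlier violation
            }
            else {
                $i++;
            }
        }
        return [ map { [ $_->{score}, $_->{sum} / $_->{n} ] } @groups ];
    }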
t/AI-Calibrate-NB.t
'yes'],
[ {outlook=>'overcast',temperature=>72,humidity=>90,windy=>'TRUE'},
'yes'],
[ {outlook=>'overcast',temperature=>81,humidity=>75,windy=>'FALSE'},
'yes'],
[ {outlook=>'rainy',temperature=>71,humidity=>91,windy=>'TRUE'},
'no']
);
my $nb = AI::NaiveBayes1->new;
$nb->set_real('temperature', 'humidity');
for my $inst (@instances) {
    my($attrs, $play) = @$inst;
    $nb->add_instance(attributes => $attrs, label => "play=$play");
}
$nb->train;
my @points;
for my $inst (@instances) {
t/AI-Calibrate-pathologies.t
# Before `make install' is performed this script should be runnable with
# `make test'. After `make install' it should work as `perl AI-Calibrate.t'
use Test::More tests => 6;
BEGIN { use_ok('AI::Calibrate', ':all') };
my $points0 = [ ];
use Data::Dumper;
is_deeply( calibrate($points0), [], "empty point set");
my $points1 = [
[.9, 1]
];
is_deeply(calibrate($points1), [[0.9,1]], "Singleton point set");
my $points2 = [
[.8, 1],
[.7, 0],
];
is_deeply(calibrate($points2), [[0.8, 1]], "two-point perfect");
my $points3 = [
[.8, 0],