Algorithm-CurveFit-Simple
view release on metacpan or search on metacpan
lib/Algorithm/CurveFit/Simple.pm view on Meta::CPAN
package Algorithm::CurveFit::Simple;
# ABSTRACT: Convenience wrapper around Algorithm::CurveFit.
our $VERSION = '1.03'; # VERSION 1.03
use strict;
use warnings;
use Algorithm::CurveFit;
use Time::HiRes;
use JSON::PP;
our %STATS_H; # side-products of fit() stored here for profiling purposes
BEGIN {
require Exporter;
our $VERSION = '1.03';
our @ISA = qw(Exporter);
our @EXPORT_OK = qw(fit %STATS_H);
}
# fit() - only public function for this distribution
# Given at least parameter "xy", generate a best-fit curve within a time limit.
# Output: max deviation, avg deviation, implementation source string (perl or C, for now).
# Optional parameters and their defaults:
# terms => 3 # number of terms in formula, max is 10
# time_limit => 3 # number of seconds to try for better fit
# inv => 1 # invert sense of curve-fit, from x->y to y->x
# impl_lang => 'perl' # programming language used for output implementation: perl, c
# impl_name => 'x2y' # name given to output implementation function
sub fit {
my %p = @_;
my $formula = _init_formula(%p);
my ($xdata, $ydata) = _init_data(%p);
my $parameters = _init_parameters($xdata, $ydata, %p);
my $iter_mode = 'time';
my $time_limit = 3; # sane default?
$time_limit = 0.01 if ($time_limit < 0.01);
my $n_iter;
if (defined($p{iterations})) {
$iter_mode = 'iter';
$n_iter = $p{iterations} || 10000;
} else {
$time_limit = $p{time_limit} // $time_limit;
$n_iter = 10000 * $time_limit; # will use this to figure out how long it -really- takes.
}
my ($n_sec, $params_ar_ar);
if ($iter_mode eq 'time') {
($n_sec, $params_ar_ar) = _try_fit($formula, $parameters, $xdata, $ydata, $n_iter, $p{fitter_class});
$STATS_H{iter_mode} = $iter_mode;
$STATS_H{fit_calib_iter} = $n_iter;
$STATS_H{fit_calib_time} = $n_sec;
$STATS_H{fit_calib_parar} = $params_ar_ar;
$n_iter = int(($time_limit / $n_sec) * $n_iter + 1);
}
($n_sec, $params_ar_ar) = _try_fit($formula, $parameters, $xdata, $ydata, $n_iter, $p{fitter_class});
$STATS_H{fit_iter} = $n_iter;
$STATS_H{fit_time} = $n_sec;
$STATS_H{fit_parar} = $params_ar_ar;
my $coderef = _implement_formula($params_ar_ar, "coderef", "", $xdata, \%p);
my ($max_dev, $avg_dev) = _calculate_deviation($coderef, $xdata, $ydata);
my $impl_lang = $p{impl_lang} // 'perl';
$impl_lang = lc($impl_lang);
my $impl_name = $p{inv} ? "y2x" : "x2y";
$impl_name = $p{impl_name} // $impl_name;
my $impl = $coderef;
$impl = _implement_formula($params_ar_ar, $impl_lang, $impl_name, $xdata, \%p) unless($impl_lang eq 'coderef');
return ($max_dev, $avg_dev, $impl);
}
# ($n_sec, $params_ar_ar) = _try_fit($formula, $parameters, $xdata, $ydata, $n_iter, $p{fitter_class});
sub _try_fit {
my ($formula, $parameters, $xdata, $ydata, $n_iter, $fitter_class) = @_;
$fitter_class //= "Algorithm::CurveFit";
my $params_ar_ar = [map {[@$_]} @$parameters]; # making a copy because curve_fit() is destructive
my $tm0 = Time::HiRes::time();
my $res = $fitter_class->curve_fit(
lib/Algorithm/CurveFit/Simple.pm view on Meta::CPAN
$STATS_H{deviation_exception} = $@;
$STATS_H{deviation_exception_datum} = $x;
die "caught exception calculating deviations";
}
my $observed_y = $ydata->[$i];
if ($observed_y && $y) {
my $deviation = $y > $observed_y ? $y / $observed_y : $observed_y / $y;
my $dev_mag = abs($deviation - 1.0);
my $max_mag = abs($max_off - 1.0);
# print "x=$x\ty=$y\toy=$observed_y\tdev_mag=$dev_mag\tmax_mag=$max_mag\n";
($max_off, $max_off_datum) = ($deviation, $x) if ($dev_mag > $max_mag);
$tot_off += $deviation;
} else {
$tot_off += 1.0;
}
}
$STATS_H{deviation_max_offset_datum} = $max_off_datum;
return ($max_off, $tot_off / @$xdata);
}
1;
=head1 NAME
Algorithm::CurveFit::Simple - Convenience wrapper around Algorithm::CurveFit
=head1 SYNOPSIS
use Algorithm::CurveFit::Simple qw(fit);
my ($max_dev, $avg_dev, $src) = fit(xdata => \@xdata, ydata => \@ydata, ..options..);
# Alternatively pass xdata and ydata together:
my ($max_dev, $avg_dev, $src) = fit(xydata => [\@xdata, \@ydata], ..options..);
# Alternatively pass data as array of [x,y] pairs:
my ($max_dev, $avg_dev, $src) = fit(xydata => [[1, 2], [2, 5], [3, 10]], ..options..);
=head1 DESCRIPTION
This is a convenience wrapper around L<Algorithm::CurveFit>. Given a body of (x, y) data points, it will generate a polynomial formula f(x) = y which fits that data.
Its main differences from L<Algorithm::CurveFit> are:
=over 4
=item * It synthesizes the initial formula for you,
=item * It allows for a time limit on the curve-fit instead of an iteration count,
=item * It implements the formula as source code (or as a perl coderef, if you want to use the formula immediately in your program).
=back
Additionally it returns a maximum deviation and average deviation of the formula vs the xydata, which is more useful (to me, at least) than L<Algorithm::CurveFit>'s square residual output. Closer to 1.0 indicates a better fit. Play with C<terms =E<...
=head1 SUBROUTINES
There is only one public subroutine, C<fit()>. It B<must> be given either C<xydata> or C<xdata> and C<ydata> parameters. All other paramters are optional.
It returns three values: A maximum deviation, the average deviation and the formula implementation.
=head2 Options
=over 4
=item C<fit(xdata =E<gt> \@xdata, ydata =E<gt> \@ydata)>
The data points the formula will fit. Same as L<Algorithm::CurveFit> parameters of the same name.
=item C<fit(xydata =E<gt> [[1, 2, 3, 4], [10, 17, 26, 37]])>
=item C<fit(xydata =E<gt> [[1, 10], [2, 17], [3, 26], [4, 37]])>
A more convenient way to provide data points. C<fit()> will try to detect how the data points are organized -- list of x and list of y, or list of [x,y].
=item C<fit(terms =E<gt> 3)>
Sets the order of the polynomial, which will be of the form C<k + a*x + b*x**2 + c*x**3 ...>. The default is 3 and the limit is 10.
There is no need to specify initial C<k>. It will be calculated from C<xydata>.
=item C<fit(time_limit =E<gt> 3)>
If a time limit is given (in seconds), C<fit()> will spend no more than that long trying to fit the data. It may return in much less time. The default is 3.
=item C<fit(iterations =E<gt> 10000)>
If an iteration count is given, C<fit()> will ignore any time limit and iterate up to C<iterations> times trying to fit the curve. Same as L<Algorithm::CurveFit> parameter of the same name.
=item C<fit(inv =E<gt> 1)>
Setting C<inv> inverts the sense of the fit. Instead of C<f(x) = y> the formula will fit C<f(y) = x>.
=item C<fit(impl_lang =E<gt> "perl")>
Sets the programming language in which the formula will be implemented. Currently supported languages are C<"C">, C<"coderef"> and the default, C<"perl">.
When C<impl_lang =E<gt> "coderef"> is specified, a code reference is returned instead which may be used immediately by your perl script:
my($max_dev, $avg_dev, $x2y) = fit(xydata => \@xy, impl_lang => "coderef");
my $y = $x2y->(42);
More implementation languages will be supported in the future.
=item C<fit(impl_name =E<gt> "x2y")>
Sets the name of the function implementing the formula. The default is C<"x2y">. Has no effect when used with C<impl_lang =E<gt> "coderef")>.
my($max_dev, $avg_dev, $src) = fit(xydata => \@xy, impl_name => "converto");
print "$src\n";
sub converto {
my($x) = @_;
my $y = -5340.93059104837 + 249.23009968947 * $x + -3.87745746448 * $x**2 + 0.02114780993 * $x**3;
return $y;
}
( run in 2.776 seconds using v1.01-cache-2.11-cpan-39bf76dae61 )