Algorithm-CurveFit-Simple


lib/Algorithm/CurveFit/Simple.pm

=head2 Options

=over 4

=item C<fit(xdata =E<gt> \@xdata, ydata =E<gt> \@ydata)>

The data points the formula will fit.  Same as L<Algorithm::CurveFit> parameters of the same name.

=item C<fit(xydata =E<gt> [[1, 2, 3, 4], [10, 17, 26, 37]])>

=item C<fit(xydata =E<gt> [[1, 10], [2, 17], [3, 26], [4, 37]])>

A more convenient way to provide data points.  C<fit()> will try to detect how the data points are organized -- list of x and list of y, or list of [x,y].
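
The layout detection described above can be illustrated with a minimal sketch. C<split_xydata> is a hypothetical helper, not part of this module, and its heuristic is an assumption: two equal-length rows with more than two values each are read as a list of x and a list of y, and anything else is read as a list of [x,y] pairs. A 2x2 input is ambiguous and is treated as pairs here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::CurveFit::Simple: split xydata
# into separate x and y lists, guessing the layout from its shape.
sub split_xydata {
    my ($xydata) = @_;
    my @rows = @$xydata;
    die "xydata is empty\n" unless @rows;

    # Two equal-length rows holding more than two values each cannot be
    # [x, y] pairs, so read them as [\@x, \@y].
    if (@rows == 2 && @{$rows[0]} == @{$rows[1]} && @{$rows[0]} > 2) {
        return ([@{$rows[0]}], [@{$rows[1]}]);
    }

    # Otherwise expect a list of [x, y] pairs.
    for my $row (@rows) {
        die "expected [x, y] pairs\n" unless @$row == 2;
    }
    return ([map { $_->[0] } @rows], [map { $_->[1] } @rows]);
}

# Both layouts from the examples above describe the same four points.
my ($x1, $y1) = split_xydata([[1, 2, 3, 4], [10, 17, 26, 37]]);
my ($x2, $y2) = split_xydata([[1, 10], [2, 17], [3, 26], [4, 37]]);
print "x: @$x1 / y: @$y1\n";
print "x: @$x2 / y: @$y2\n";
```

Both calls yield the same x and y lists, which is the property the two C<xydata> forms rely on.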

=item C<fit(terms =E<gt> 3)>

Sets the order of the polynomial, which will be of the form C<k + a*x + b*x**2 + c*x**3 ...>.  The default is 3 and the limit is 10.

There is no need to specify initial C<k>.  It will be calculated from C<xydata>.
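
As an illustration of the polynomial form above, the following sketch evaluates C<k + a*x + b*x**2 + c*x**3> from a list of coefficients. C<eval_poly> is a hypothetical helper and the coefficients are made-up values chosen for easy arithmetic; the generated implementations shown elsewhere in this document write the formula out term by term instead.

```perl
use strict;
use warnings;

# Hypothetical helper: evaluate k + a*x + b*x**2 + ... from coefficients
# ordered (k, a, b, ...), using Horner's method.
sub eval_poly {
    my ($x, @coef) = @_;
    my $y = 0;
    $y = $y * $x + $_ for reverse @coef;
    return $y;
}

# Illustrative coefficients only: k = 1, a = 2, b = 3, c = 4.
my @coef = (1, 2, 3, 4);
print eval_poly(2, @coef), "\n";   # 1 + 2*2 + 3*4 + 4*8 = 49
```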

=item C<fit(time_limit =E<gt> 3)>

If a time limit is given (in seconds), C<fit()> will spend no more than that long trying to fit the data.  It may return in much less time.  The default is 3.

=item C<fit(iterations =E<gt> 10000)>

If an iteration count is given, C<fit()> will ignore any time limit and iterate up to C<iterations> times trying to fit the curve.  Same as L<Algorithm::CurveFit> parameter of the same name.
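
The precedence between C<iterations> and C<time_limit> described above can be sketched as follows. C<choose_iter_mode> is a hypothetical helper, not the module's code; the mode names follow the C<iter_mode> values documented under VARIABLES, and the fallback of 3 seconds is the documented C<time_limit> default.

```perl
use strict;
use warnings;

# Hypothetical helper: decide which limit governs the fit.
# An explicit iteration count wins over any time limit.
sub choose_iter_mode {
    my (%opt) = @_;
    return ("iter", $opt{iterations}) if defined $opt{iterations};
    my $limit = defined $opt{time_limit} ? $opt{time_limit} : 3;
    return ("time", $limit);
}

my ($mode, $limit) = choose_iter_mode(time_limit => 5, iterations => 10000);
print "$mode $limit\n";   # iterations given, so the time limit is ignored
```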

=item C<fit(inv =E<gt> 1)>

Setting C<inv> inverts the sense of the fit.  Instead of C<f(x) = y> the formula will fit C<f(y) = x>.

=item C<fit(impl_lang =E<gt> "perl")>

Sets the programming language in which the formula will be implemented.  Currently supported languages are C<"C">, C<"coderef"> and the default, C<"perl">.

When C<impl_lang =E<gt> "coderef"> is specified, a code reference is returned in place of the source string, and your Perl script can call it immediately:

    my($max_dev, $avg_dev, $x2y) = fit(xydata => \@xy, impl_lang => "coderef");

    my $y = $x2y->(42);

More implementation languages will be supported in the future.

=item C<fit(impl_name =E<gt> "x2y")>

Sets the name of the function implementing the formula.  The default is C<"x2y">.  Has no effect when used with C<impl_lang =E<gt> "coderef">.

    my($max_dev, $avg_dev, $src) = fit(xydata => \@xy, impl_name => "converto");

    print "$src\n";

    sub converto {
        my($x) = @_;
        my $y = -5340.93059104837 + 249.23009968947 * $x + -3.87745746448 * $x**2 + 0.02114780993 * $x**3;
        return $y;
    }

=item C<fit(bounds_check =E<gt> 1)>

When set, the implementation will include logic for checking whether the input is out-of-bounds, per the highest and lowest x points in the data used to fit the formula.  For implementation languages which support exceptions, an exception will be thr...

For instance, if the highest x in C<xydata> is 83.8 and the lowest x is 60.8:

    my($max_dev, $avg_dev, $src) = fit(xydata => \@xy, bounds_check => 1);

    print "$src\n";

    sub x2y {
        my($x) = @_;
        die "x out of bounds (high)" if ($x > 83.80000000000);
        die "x out of bounds (low)"  if ($x < 60.80000000000);
        my $y = -5340.93059104837 + 249.23009968947 * $x + -3.87745746448 * $x**2 + 0.02114780993 * $x**3;
        return $y;
    }
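
A caller that prefers not to die on out-of-range input can trap the exception with C<eval>. The sketch below copies the generated C<x2y> shown above and wraps it; C<x2y_or_undef> is a hypothetical name, and returning C<undef> is only one possible policy.

```perl
use strict;
use warnings;

# Generated implementation as shown above (bounds_check => 1).
sub x2y {
    my($x) = @_;
    die "x out of bounds (high)" if ($x > 83.80000000000);
    die "x out of bounds (low)"  if ($x < 60.80000000000);
    my $y = -5340.93059104837 + 249.23009968947 * $x + -3.87745746448 * $x**2 + 0.02114780993 * $x**3;
    return $y;
}

# Hypothetical wrapper: return undef instead of dying when x is out of bounds.
sub x2y_or_undef {
    my ($x) = @_;
    my $y;
    my $ok = eval { $y = x2y($x); 1 };
    return $ok ? $y : undef;
}

for my $x (50, 70, 100) {
    my $y = x2y_or_undef($x);
    print defined $y ? "x2y($x) = $y\n" : "x2y($x) is out of bounds\n";
}
```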

=item C<fit(round_result =E<gt> 1)>

When set, the implementation will round the output to the nearest whole number.  When the implementation language is C<"C"> this adds an C<#include E<lt>math.hE<gt>> directive to the source code, which will have to be compiled against libm -- see C<m...

    my($max_dev, $avg_dev, $src) = fit(xydata => \@xy, round_result => 1);

    print "$src\n";

    sub x2y {
        my($x) = @_;
        my $y = -5340.93059104837 + 249.23009968947 * $x + -3.87745746448 * $x**2 + 0.02114780993 * $x**3;
        $y = int($y + 0.5);
        return $y;
    }
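
Note that Perl's C<int> truncates toward zero, so the C<int($y + 0.5)> expression in the generated code above rounds to the nearest whole number only when C<$y> is not negative. The sketch below compares it with a sign-aware variant; C<round_half_away> is a hypothetical helper, not something the module emits.

```perl
use strict;
use warnings;

# Rounding as it appears in the generated Perl above.
sub round_generated {
    my ($y) = @_;
    return int($y + 0.5);
}

# Hypothetical sign-aware alternative: round half away from zero.
sub round_half_away {
    my ($y) = @_;
    return $y < 0 ? -int(-$y + 0.5) : int($y + 0.5);
}

for my $y (2.7, -2.7) {
    printf "%s -> %d (generated) vs %d (sign-aware)\n",
        $y, round_generated($y), round_half_away($y);
}
```

If the fitted formula can produce negative y values, it may be worth post-processing the unrounded result yourself.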

=item C<fit(suppress_includes =E<gt> 1)>

When set and C<impl_lang =E<gt> "C">, any C<#include> directives which the implementation might need will be suppressed.

=back

=head1 VARIABLES

The class variable C<%STATS_H> contains various intermediate values which might be helpful.  For instance, C<$STATS_H{deviation_max_offset_datum}> contains the x data point which corresponds to the maximum deviation returned.

The contents of C<%STATS_H> are subject to change and might not be fully documented in future versions.  The current fields are:

=over 4

=item C<deviation_max_offset_datum>: The x data point corresponding to the returned maximum deviation.

=item C<fit_calib_parar>: Arrayref of formula parameters as returned by L<Algorithm::CurveFit> after a short fitting attempt used for timing calibration.

=item C<fit_calib_time>: The number of seconds L<Algorithm::CurveFit> spent in the calibration run.

=item C<fit_iter>: The iterations parameter passed to L<Algorithm::CurveFit>.

=item C<fit_parar>: Arrayref of formula parameters as returned by L<Algorithm::CurveFit>.

=item C<fit_time>: The number of seconds L<Algorithm::CurveFit> actually spent fitting the formula.

=item C<impl_exception>: The exception thrown when the implementation was used to calculate the deviations, or the empty string if none.

=item C<impl_formula>: The formula part of the implementation.

=item C<impl_source>: The implementation source string.

=item C<iter_mode>: One of C<"time"> or C<"iter">, indicating whether a time limit was used or an iteration count.

=item C<xdata>: Arrayref of x data points as passed to L<Algorithm::CurveFit>.

=item C<ydata>: Arrayref of y data points as passed to L<Algorithm::CurveFit>.

=back
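
The relationship between the returned deviations and C<deviation_max_offset_datum> can be sketched as below. This is an illustration under stated assumptions, not the module's code: C<deviations> is a hypothetical helper, deviation is taken to be the absolute difference between the fitted value and the data, and the last y value is perturbed on purpose so that one point deviates.

```perl
use strict;
use warnings;

# Hypothetical helper: compare a fitted function against its data.
# Deviation is assumed to be |f(x) - y|; the module may define it differently.
sub deviations {
    my ($f, $xdata, $ydata) = @_;
    my ($max_dev, $sum, $max_x) = (0, 0, undef);
    for my $i (0 .. $#$xdata) {
        my $dev = abs($f->($xdata->[$i]) - $ydata->[$i]);
        $sum += $dev;
        # Remember the x value at which the largest deviation occurs.
        if (!defined $max_x || $dev > $max_dev) {
            ($max_dev, $max_x) = ($dev, $xdata->[$i]);
        }
    }
    my $avg_dev = @$xdata ? $sum / @$xdata : 0;
    return ($max_dev, $avg_dev, $max_x);
}

# x**2 + 4*x + 5 passes through (1,10), (2,17), (3,26), (4,37);
# the last y is changed to 38 here to create a deviation of 1 at x = 4.
my $f = sub { my ($x) = @_; return $x**2 + 4*$x + 5 };
my ($max_dev, $avg_dev, $max_x) = deviations($f, [1, 2, 3, 4], [10, 17, 26, 38]);
print "max $max_dev at x=$max_x, avg $avg_dev\n";
```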

=head1 CAVEATS

=over 4

=item * Only simple polynomial functions are supported.  Sometimes you need something else.  Use L<Algorithm::CurveFit> for such cases.

=item * If C<xydata> is very large, iterating over it to calculate deviations can take more time than permitted by C<time_limit>.

=item * The dangers of overfitting are real!  L<https://en.wikipedia.org/wiki/Overfitting>

=item * Using too many terms can dramatically reduce the accuracy of the fitted formula.

=item * Sometimes calling L<Algorithm::CurveFit> with a ten-term polynomial causes it to hang.

=back

=head1 TO DO

=over 4

=item * Support more programming languages for formula implementation: R, MATLAB, Python

=item * Calculate the actual term sigfigs and set precision appropriately in the formula implementation instead of just "%.11f".

=item * Support trying a range of terms and returning whatever gives the best fit.

=item * Support piecewise output formulas.

=item * Work around L<Algorithm::CurveFit>'s occasional hang problem when using ten-term polynomials.

=back

=head1 SEE ALSO

L<Algorithm::CurveFit>

L<curvefit>

=cut


