Algorithm-DecisionTree
ExamplesRegression/README_for_regression_tree_examples
README For Regression Tree Examples
To become familiar with the regression-tree capabilities of this module, start
by executing the following scripts without changing them and see what sort of
output they produce:
regression4.pl
regression5.pl
regression6.pl
regression8.pl
lib/Algorithm/DecisionTree.pm
=head1 DESCRIPTION

B<Algorithm::DecisionTree> is a I<perl5> module for constructing a decision tree from
a training datafile containing multidimensional data. In one form or another,
decision trees have been around for about fifty years. From a statistical
perspective, they are closely related to classification and regression by recursive
partitioning of multidimensional data. Early work that demonstrated the usefulness
of such partitioning of data for classification and regression can be traced to the
work of Terry Therneau in the early 1980's in the statistics community, and to the
work of Ross Quinlan in the mid 1990's in the machine learning community.
For those not familiar with decision tree ideas, the traditional way to classify
multidimensional data is to start with a feature space whose dimensionality is the
same as that of the data. Each feature in this space corresponds to the attribute
that each dimension of the data measures. You then use the training data to carve up
the feature space into different regions, each corresponding to a different class.
Subsequently, when you try to classify a new data sample, you locate it in the
feature space and find the class label of the region to which it belongs. One can
also give the new data point the same class label as that of the nearest training
sample. This is referred to as nearest-neighbor classification. There exist
hundreds of variations of varying power on these two basic approaches to the
classification of multidimensional data.
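To make the second of these two approaches concrete, here is a minimal,
self-contained Perl sketch of nearest-neighbor classification; the feature
vectors and class labels are invented for illustration and do not involve this
module's API:

  use strict;
  use warnings;

  # toy training data: each sample is a feature vector plus a class label
  my @training = (
      { features => [1.0, 2.0], label => 'class_A' },
      { features => [8.0, 9.0], label => 'class_B' },
      { features => [1.5, 1.8], label => 'class_A' },
  );

  # squared Euclidean distance between two feature vectors
  sub sq_dist {
      my ($p, $q) = @_;
      my $d = 0;
      $d += ($p->[$_] - $q->[$_]) ** 2 for 0 .. $#$p;
      return $d;
  }

  # give the new point the label of its closest training sample
  sub nearest_neighbor_label {
      my ($point) = @_;
      my ($best_label, $best_d);
      for my $sample (@training) {
          my $d = sq_dist($point, $sample->{features});
          ($best_label, $best_d) = ($sample->{label}, $d)
              if !defined $best_d or $d < $best_d;
      }
      return $best_label;
  }

  print nearest_neighbor_label([1.2, 2.1]), "\n";    # prints "class_A"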
The module provides methods for classifying individual samples, and also for bulk
classification when you place all your test samples in a single file.
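Adapted from the module's synopsis, classifying a single sample typically looks
like the sketch below; treat the file name, column indices, constructor
thresholds, and the handling of the returned hash as placeholders to check
against the documentation of your installed version:

  use strict;
  use warnings;
  use Algorithm::DecisionTree;

  my $dt = Algorithm::DecisionTree->new(
      training_datafile                         => 'stage3cancer.csv',
      csv_class_column_index                    => 2,
      csv_columns_for_features                  => [3, 4, 5, 6, 7, 8],
      entropy_threshold                         => 0.01,
      max_depth_desired                         => 8,
      symbolic_to_numeric_cardinality_threshold => 10,
      csv_cleanup_needed                        => 1,
  );
  $dt->get_training_data();
  $dt->calculate_first_order_probabilities();
  $dt->calculate_class_priors();
  my $root_node = $dt->construct_decision_tree_classifier();

  # a test sample is given as a list of "feature=value" strings
  my @test_sample = qw/ g2=4.2 grade=2.3 gleason=4 eet=1.7 age=55.0 ploidy=diploid /;
  my $classification = $dt->classify($root_node, \@test_sample);

  # the returned hash holds the estimated class probabilities
  # (and, in recent versions, a 'solution_path' entry as well)
  print "$_ => $classification->{$_}\n" for sort keys %$classification;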
=head1 USING BOOSTING
Starting with Version 3.20, you can use the class C<BoostedDecisionTree> for
constructing a boosted decision-tree classifier. Boosting results in a cascade of
decision trees in which each decision tree is constructed with samples that are
mostly those that are misclassified by the previous decision tree. To be precise,
you create a probability distribution over the training samples for the selection of
samples for training each decision tree in the cascade. To start out, the
distribution is uniform over all of the samples. Subsequently, this probability
distribution changes according to the misclassifications by each tree in the cascade:
if a sample is misclassified by a given tree in the cascade, the probability of its
being selected for training the next tree is increased significantly. You also
associate a trust factor with each decision tree depending on its power to classify
correctly all of the training data samples. After a cascade of decision trees is
constructed in this manner, you construct a final classifier that calculates the
class label for a test data sample by taking into account the classification
decisions made by each individual tree in the cascade, the decisions being weighted
by the trust factors associated with the individual classifiers. These boosting
capabilities are demonstrated by the scripts in the C<ExamplesBoost> subdirectory
of the installation directory.
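The distribution update described above is the classic AdaBoost recipe. The
following stand-alone Perl sketch, which is not the module's internal code,
shows a single round of it: misclassified samples gain probability mass for the
next tree, and the tree earns a trust factor that grows with its weighted
accuracy:

  use strict;
  use warnings;
  use List::Util qw(sum);

  # One round of the update.  @$weights is the current probability
  # distribution over the training samples; @$predicted and @$truth hold
  # one tree's output labels and the true labels.  All names here are
  # hypothetical.
  sub update_sample_distribution {
      my ($weights, $predicted, $truth) = @_;
      my $n = @$weights;
      # weighted error rate of this tree over the training samples
      my $err = sum map { $predicted->[$_] ne $truth->[$_] ? $weights->[$_] : 0 } 0 .. $n - 1;
      $err = 1e-10 if $err < 1e-10;              # guard against a perfect tree
      # trust factor: grows as the weighted error shrinks
      my $alpha = 0.5 * log( (1 - $err) / $err );
      # boost misclassified samples, damp the correctly classified ones
      my @updated = map {
          $weights->[$_] * exp( $predicted->[$_] ne $truth->[$_] ? $alpha : -$alpha )
      } 0 .. $n - 1;
      my $z = sum @updated;                      # renormalize into a distribution
      return ( [ map { $_ / $z } @updated ], $alpha );
  }

  # start with a uniform distribution over four samples; sample 3 is misclassified
  my ($new_dist, $trust) =
      update_sample_distribution( [ (0.25) x 4 ], [qw(a b a a)], [qw(a b b a)] );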
Beyond boosting, another example script illustrates how you can evaluate the
classification power of an ensemble of decision trees, as constructed by
C<RandomizedTreesForBigData>, by classifying a large number of test samples
extracted randomly from the training database.
=head1 THE C<ExamplesRegression> DIRECTORY
The C<ExamplesRegression> subdirectory in the main installation directory shows
example scripts that you can use to become familiar with regression trees and how
they can be used for nonlinear regression. If you are new to the concept of
regression trees, start by executing the following scripts without changing them and
see what sort of output is produced by them:
regression4.pl
regression5.pl
regression6.pl
regression8.pl
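Here is the self-contained sketch promised above (independent of this module's
API). It shows the core step in growing a regression tree: choose the split
threshold that minimizes the summed squared error when each side of the split
predicts the mean of its C<y> values. A full regression tree applies this step
recursively, and this module goes further by fitting regressions at the tree
nodes rather than using plain means:

  use strict;
  use warnings;
  use List::Util qw(sum);

  # summed squared error of a set of values around their mean
  sub sse {
      my ($vals) = @_;
      my $mean = sum(@$vals) / @$vals;
      return sum( map { ($_ - $mean) ** 2 } @$vals );
  }

  # try every x value as a threshold and keep the one that minimizes
  # the total squared error of the two resulting groups
  sub best_split {
      my ($x, $y) = @_;
      my ($best_t, $best_sse);
      for my $i (0 .. $#$x) {
          my $t = $x->[$i];
          my (@left, @right);
          for my $j (0 .. $#$x) {
              if ($x->[$j] <= $t) { push @left,  $y->[$j] }
              else                { push @right, $y->[$j] }
          }
          next unless @left and @right;          # a split must be two-sided
          my $err = sse(\@left) + sse(\@right);
          ($best_t, $best_sse) = ($t, $err)
              if !defined $best_sse or $err < $best_sse;
      }
      return $best_t;
  }

  # noisy piecewise-constant data with a jump between x = 4 and x = 6
  my @x = (1, 2, 3, 4, 6, 7, 8, 9);
  my @y = (1.1, 0.9, 1.0, 1.2, 5.1, 4.9, 5.0, 5.2);
  printf "best threshold: %s\n", best_split(\@x, \@y);   # prints 4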
lib/Algorithm/RegressionTree.pm
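The excerpt below comes from the iterative parameter-estimation logic in this
file. It appears to implement a gradient-style search: an outer loop restarts
the search with the step size C<$gamma> shrunk by a factor of ten on each pass,
and an inner loop runs up to 1500 update iterations whose descent direction
depends on the value of C<$jacobian_choice>.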
# ($X, $y, $beta0, $betarows, $nrows, $jacobian_choice, and sum() are set up earlier in the file)
my $gamma = 0.1;                     # step size; shrunk on every outer pass
my $iterate_again_flag = 1;
my $delta = 0.001;
my $master_iteration_index = 0;
$|++;                                # autoflush so the progress marks appear immediately
while (1) {
    print "*" unless $master_iteration_index++ % 100;
    last unless $iterate_again_flag;
    $gamma *= 0.1;                   # restart the search with a ten-times smaller step
    $beta0 = 0.99 * $beta0;
    print "\n\n======== starting iterations with gamma= $gamma ===========\n\n\n" if $self->{_debug2_r};
    $beta = $beta0;
    my $beta_old = Math::GSL::Matrix->new($betarows, 1)->zero;
    # mean absolute residual of the all-zero estimate, used as the starting error
    my $error_old = sum( map abs, ($y - ($X * $beta_old))->col(0)->as_list ) / $nrows;
    my $error;
    foreach my $iteration (0 .. 1499) {
        print "." unless $iteration % 100;
        $beta_old = $beta->copy;
        my $jacobian;
        if ($jacobian_choice == 1) {
            $jacobian = $X;          # for a linear model the Jacobian is just the design matrix