Algorithm-DecisionTree

lib/Algorithm/DecisionTree.pm


This call calculates the predictions for all of the predictor-variable data in your
training file.  The parameter C<$root_node> is the value returned by the call to
C<construct_regression_tree()>.  The values predicted for the dependent variable
can be seen by calling C<display_all_plots()>, the method described next.

=item B<display_all_plots():>

This method displays the results obtained by calling the prediction method of the
previous entry.  It also creates a hardcopy of the plots and saves it to disk as a
C<.png> file.  The name of this output file is always C<regression_plots.png>.

=item B<mse_for_tree_regression_for_all_training_samples( $root_node ):>

This method carries out an error analysis of the predictions for the samples in your
training data file.  It reports the overall MSE (Mean Squared Error) for tree-based
regression, the MSE for the data samples at each leaf node of the regression tree,
and the MSE for ordinary Linear Regression applied to all of the data.
The parameter C<$root_node> in the call syntax is the value returned by the call to
C<construct_regression_tree()>.
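
For readers who want to see what an MSE figure like the ones reported above measures, here is a minimal Perl sketch. It is not the module's own implementation: the helper name C<mean_squared_error> is hypothetical and the observed and predicted values are made-up numbers used only for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of Algorithm::RegressionTree): the mean
# squared error between observed and predicted dependent-variable values.
sub mean_squared_error {
    my ($observed, $predicted) = @_;
    die "Need two non-empty lists of equal length\n"
        unless @$observed && @$observed == @$predicted;
    # Squared difference for each sample, then the average over all samples.
    my @squared_errors =
        map { ($observed->[$_] - $predicted->[$_]) ** 2 } 0 .. $#$observed;
    return sum(@squared_errors) / scalar(@squared_errors);
}

# Made-up numbers, purely for illustration.
my @observed  = (1.0, 2.0, 3.0, 4.0);
my @predicted = (1.5, 2.0, 2.5, 5.0);
printf "MSE = %.3f\n", mean_squared_error(\@observed, \@predicted);   # prints "MSE = 0.375"
```

Applying the same calculation separately to the samples that fall in each leaf node, and once more to the predictions of a single straight-line fit, would give the three kinds of figures this method is described as reporting.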

lib/Algorithm/RegressionTree.pm

        my ($feature,$value) = ($1,$2);
        die "Your test data has a formatting error" unless defined($feature) && defined($value);
        return 0 unless contained_in($feature, @{$self->{_feature_names}});
    }
    return 1;
}

sub display_all_plots {
    my $self = shift;
    my $ncols = $self->{_XMatrix}->cols;
    unlink "regression_plots.png" if -e "regression_plots.png";
    my $master_datafile = $self->{_training_datafile};
    my $filename = basename($master_datafile);
    my $temp_file = "__temp_" . $filename;
    unlink $temp_file if -e $temp_file;
    # Three-argument open keeps the mode separate from the filename.
    open OUTPUT, '>', $temp_file
           or die "Unable to open a temp file in this directory: $!\n";
    if ($ncols == 2) {
        my @predictor_entries = $self->{_XMatrix}->col(0)->as_list;
        my @dependent_val_vals = $self->{_YVector}->col(0)->as_list;
        # Block 0: the training samples, one "x y" pair per line.
        for my $row (0 .. $self->{_XMatrix}->rows - 1) {
            print OUTPUT "$predictor_entries[$row] $dependent_val_vals[$row]\n";
        }
        print OUTPUT "\n\n";
        # One further block per plot; two blank lines separate the blocks
        # that gnuplot later selects with "index".
        foreach my $plot (sort {$a <=> $b} keys %{$self->{_output_for_plots}}) {
            my ($xs, $ys) = @{$self->{_output_for_plots}->{$plot}}[0, 1];
            print OUTPUT "$xs->[$_] $ys->[$_]\n" for 0 .. $#$xs;
            print OUTPUT "\n\n";
        }
        close OUTPUT;
        my $gplot = Graphics::GnuplotIF->new( persist => 1 );
        my $hardcopy_plot = Graphics::GnuplotIF->new();
        $hardcopy_plot->gnuplot_cmd('set terminal png', "set output \"regression_plots.png\"");        
        $gplot->gnuplot_cmd( "set noclip" );
        $gplot->gnuplot_cmd( "set pointsize 2" );
        my $arg_string = "";
        foreach my $i (0 .. scalar(keys %{$self->{_output_for_plots}})) {
            if ($i == 0) {            
                $arg_string .= "\"$temp_file\" index $i using 1:2 notitle with points lt -1 pt 1, ";
            } elsif ($i == 1) {
                $arg_string .= "\"$temp_file\" index $i using 1:2 title \"linear regression\" with lines lt 1 lw 4, ";
            } elsif ($i == 2) {
                $arg_string .= "\"$temp_file\" index $i using 1:2 title \"tree regression\" with lines lt 3 lw 4, ";

lib/Algorithm/RegressionTree.pm

        foreach my $plot (sort {$a <=> $b} keys %{$self->{_output_for_surface_plots}}) {
            my @plot_data = @{$self->{_output_for_surface_plots}->{$plot}};
            my @predictors = @{$plot_data[0]};
            my @predictions = @{$plot_data[1]};
            print OUTPUT "$predictors[$_] $predictions[$_]\n" for 0 .. $#predictions;
            print OUTPUT "\n\n";
        }
        close OUTPUT;
        my $gplot = Graphics::GnuplotIF->new( persist => 1 );
        my $hardcopy_plot = Graphics::GnuplotIF->new();
        $hardcopy_plot->gnuplot_cmd('set terminal png', "set output \"regression_plots.png\"");        
        $gplot->gnuplot_cmd( "set noclip" );
        $gplot->gnuplot_cmd( "set pointsize 2" );
        my $arg_string = "";
        foreach my $i (0 .. scalar(keys %{$self->{_output_for_surface_plots}})) {
            if ($i == 0) {            
                $arg_string .= "\"$temp_file\" index $i using 1:2:3 notitle with points lt -1 pt 1, ";
            } elsif ($i == 1) {
                $arg_string .= "\"$temp_file\" index $i using 1:2:3 title \"linear regression\" with points lt 1 pt 2, ";
            } elsif ($i == 2) {
                $arg_string .= "\"$temp_file\" index $i using 1:2:3 title \"tree regression\" with points lt 3 pt 3, ";


