Algorithm-DecisionTree

lib/Algorithm/DecisionTree.pm

The first record of the training CSV file must list the names of the fields, and the
first column must contain an integer ID for each record.
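
For example, a conforming training file might begin like this (the field names and
values below are purely illustrative):

    ID,xvar1,xvar2,yval
    1,23.4,12.9,40.1
    2,11.0,19.3,27.5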

=item B<construct_regression_tree():>

As the name implies, this is the method that constructs a regression tree.
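
A typical call, assuming C<$rt> holds the C<Algorithm::RegressionTree> instance you
constructed earlier:

    my $root_node = $rt->construct_regression_tree();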

=item B<display_regression_tree("     "):>

Displays the regression tree, as the name implies.  The white-space string argument
specifies the offset to use in displaying the child nodes in relation to a parent
node.
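
For example, with C<$rt> again standing in for your regression-tree instance:

    $rt->display_regression_tree("   ");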

=item B<prediction_for_single_data_point( $root_node, $test_sample ):>

You call this method after you have constructed a regression tree if you want to
calculate the prediction for one sample.  The parameter C<$root_node> is what is
returned by the call C<construct_regression_tree()>.  The formatting of the argument
bound to the C<$test_sample> parameter is important.  To elaborate, let's say you are
using two variables named C<$xvar1> and C<$xvar2> as your predictor variables. In
this case, the C<$test_sample> parameter will be bound to a list that will look like
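the following, where each element is a C<'name = value'> string, one per predictor
variable (the names and numbers shown are only illustrative):

    my $test_sample = ['xvar1 = 23.4', 'xvar2 = 12.9'];   # 'name = value' strings; values illustrative
    my $prediction  = $rt->prediction_for_single_data_point($root_node, $test_sample);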

lib/Algorithm/RegressionTree.pm

    $root_node->set_num_data_points($XMatrix->cols);    # one training sample per column of XMatrix
    print "\nerror at root: $error\n";
    print "\nbeta at root:\n";
    display_matrix($beta);
    $self->{_root_node} = $root_node;
    $self->recursive_descent($root_node) if $self->{_max_depth_desired} > 0;    # grow the tree below the root
    return $root_node;
}

##  We first look for a feature, along with its partitioning point, that yields the 
##  largest reduction in MSE compared to the MSE at the parent node.  This feature and
##  its partitioning point are then used to create two child nodes in the tree.
sub recursive_descent {
    my $self = shift;
    my $node = shift;
    print "\n==================== ENTERING RECURSIVE DESCENT ==========================\n";
    my $node_serial_number = $node->get_serial_num();
    my @features_and_values_or_thresholds_on_branch = @{$node->get_branch_features_and_values_or_thresholds()};
    my @copy_of_path_attributes = @{deep_copy_array(\@features_and_values_or_thresholds_on_branch)};
    if (@features_and_values_or_thresholds_on_branch > 0) {
        my ($error,$beta,$XMatrix,$YVector) = 

lib/Algorithm/RegressionTree.pm

        my $right_child_node = RTNode->new(undef, undef, undef, 
                        \@extended_branch_features_and_values_or_thresholds_for_greaterthan_child, $self);
        $node->add_child_link($right_child_node);
        $self->recursive_descent($right_child_node);
    }
}
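
##  To make the criterion above concrete, here is a minimal standalone sketch
##  (NOT part of this module) of scanning one numeric feature for the threshold
##  whose worse-side MSE is smallest.  For simplicity, the MSE here is measured
##  against each partition's mean rather than against the per-node linear
##  regression fit that the module itself uses.
sub _sketch_mse {
    my ($y) = @_;
    return 0 unless @$y;
    my $mean = 0;  $mean += $_ for @$y;  $mean /= @$y;
    my $sse  = 0;  $sse  += ($_ - $mean) ** 2 for @$y;
    return $sse / @$y;
}
sub _sketch_best_threshold {
    my ($x, $y) = @_;                         # parallel arrays: feature values, targets
    my ($best_t, $best_minmax);
    foreach my $t (@$x) {                     # every observed value is a candidate threshold
        my (@y_le, @y_gt);
        for my $i (0 .. $#$x) {
            $x->[$i] <= $t ? push(@y_le, $y->[$i]) : push(@y_gt, $y->[$i]);
        }
        next unless @y_le && @y_gt;           # both partitions must be non-empty
        my ($m1, $m2) = (_sketch_mse(\@y_le), _sketch_mse(\@y_gt));
        my $worse = $m1 > $m2 ? $m1 : $m2;    # judge a split by its worse partition
        ($best_t, $best_minmax) = ($t, $worse)
            if !defined($best_minmax) || $worse < $best_minmax;
    }
    return ($best_t, $best_minmax);
}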

##  This is the heart of the regression tree constructor.  Its main job is to figure
##  out the best feature to use for partitioning the training data samples at the
##  current node.  The partitioning criterion is that the larger of the MSEs in the
##  two partitions should be smaller than the error associated with the parent node.
sub best_feature_calculator {
    my $self = shift;
    my $features_and_values_or_thresholds_on_branch = shift;
    my @features_and_values_or_thresholds_on_branch =  @$features_and_values_or_thresholds_on_branch;
    print "\n\nfeatures_and_values_or_thresholds_on_branch: @features_and_values_or_thresholds_on_branch\n";
    if (@features_and_values_or_thresholds_on_branch == 0) {
        my $best_partition_point_for_feature_hash = { map {$_ => undef} @{$self->{_feature_names}} };
        my $best_minmax_error_for_feature_hash = { map {$_ => undef} @{$self->{_feature_names}} };
        foreach my $i (0 .. @{$self->{_feature_names}}-1) {


