Algorithm-DecisionTree
lib/Algorithm/DecisionTree.pm:
The first record of the training datafile must list the names of the fields, and the
first column must contain an integer ID for each record.
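For illustration, the first few records of such a file might look like the following
(the field names and values here are purely hypothetical):

    sample_id,xvar1,xvar2,yvalue
    1,23.4,12.9,141.7
    2,18.1,10.2,98.3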
=item B<construct_regression_tree():>
As the name implies, this is the method that constructs a regression tree.
=item B<display_regression_tree(" "):>
Displays the regression tree. The whitespace string argument specifies the offset
used to display the child nodes relative to their parent node.
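A minimal usage sketch (it assumes C<$rt> is an already-constructed and initialized
C<Algorithm::RegressionTree> instance, which is not shown here):

    ## Sketch only: $rt is assumed to be a fully initialized instance.
    my $root_node = $rt->construct_regression_tree();
    $rt->display_regression_tree("     ");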
=item B<prediction_for_single_data_point( $root_node, $test_sample ):>
You call this method after you have constructed a regression tree when you want to
calculate the prediction for a single sample. The parameter C<$root_node> is what is
returned by the call to C<construct_regression_tree()>. The formatting of the argument
bound to the C<$test_sample> parameter is important. To elaborate, suppose you are
using two variables named C<$xvar1> and C<$xvar2> as your predictor variables. In
that case, the C<$test_sample> parameter will be bound to a list like the one in the
sketch below.
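Continuing the earlier sketch, a hedged example of such a call (the C<'name = value'>
string format for the elements of C<$test_sample> is an assumption, since the original
example is cut off in this excerpt):

    ## Sketch only: the element format of $test_sample is an assumption.
    my $test_sample = ['xvar1 = 23.4', 'xvar2 = 12.9'];
    my $prediction  = $rt->prediction_for_single_data_point($root_node, $test_sample);
    print "prediction: $prediction\n";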
lib/Algorithm/RegressionTree.pm:
    ## Tail end of construct_regression_tree(): record the regression fit at
    ## the root node, then grow the tree by recursive descent.
    $root_node->set_num_data_points($XMatrix->cols);
    print "\nerror at root: $error\n";
    print "\nbeta at root:\n";
    display_matrix($beta);
    $self->{_root_node} = $root_node;
    $self->recursive_descent($root_node) if $self->{_max_depth_desired} > 0;
    return $root_node;
}
## We first look for a feature, along with its partitioning point, that yields the
## largest reduction in MSE compared to the MSE at the parent node. This feature and
## its partitioning point are then used to create two child nodes in the tree.
sub recursive_descent {
    my $self = shift;
    my $node = shift;
    print "\n==================== ENTERING RECURSIVE DESCENT ==========================\n";
    my $node_serial_number = $node->get_serial_num();
    my @features_and_values_or_thresholds_on_branch = @{$node->get_branch_features_and_values_or_thresholds()};
    my @copy_of_path_attributes = @{deep_copy_array(\@features_and_values_or_thresholds_on_branch)};
    if (@features_and_values_or_thresholds_on_branch > 0) {
        my ($error,$beta,$XMatrix,$YVector) =
        ## ... (intervening lines elided in this excerpt) ...
        my $right_child_node = RTNode->new(undef, undef, undef,
                    \@extended_branch_features_and_values_or_thresholds_for_greaterthan_child, $self);
        $node->add_child_link($right_child_node);
        $self->recursive_descent($right_child_node);
    }
}
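The partitioning criterion used above and in C<best_feature_calculator()> below can
be illustrated with a small self-contained sketch. This is not the module's code: it
assumes the predictor at a node is simply the mean of the y-values, whereas the
module fits a linear model at each node.

    use strict;
    use warnings;
    use List::Util qw(sum);

    ## Mean squared error of a set of y-values against their own mean.
    sub mse {
        my @y = @_;
        return 0 unless @y;
        my $mean = sum(@y) / @y;
        return sum(map { ($_ - $mean) ** 2 } @y) / @y;
    }

    ## For one feature, try every observed value as a partitioning point and
    ## keep the one whose worse (larger) child MSE is smallest.  A split is
    ## worthwhile only if that minmax error is below the MSE at the parent.
    sub best_partition_point {
        my ($x, $y) = @_;        # parallel arrayrefs: feature values, targets
        my ($best_point, $best_minmax) = (undef, 9**99);
        foreach my $candidate (@$x) {
            my (@y_le, @y_gt);
            foreach my $i (0 .. $#$x) {
                if ($x->[$i] <= $candidate) { push @y_le, $y->[$i] }
                else                        { push @y_gt, $y->[$i] }
            }
            next unless @y_le && @y_gt;          # skip degenerate splits
            my ($e1, $e2) = (mse(@y_le), mse(@y_gt));
            my $worse = $e1 > $e2 ? $e1 : $e2;
            ($best_point, $best_minmax) = ($candidate, $worse) if $worse < $best_minmax;
        }
        return ($best_point, $best_minmax);
    }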
## This is the heart of the regression tree constructor. Its main job is to figure
## out the best feature to use for partitioning the training data samples at the
## current node. The partitioning criterion is that the largest of the MSE's in
## the two partitions should be smaller than the error associated with the parent
## node.
sub best_feature_calculator {
    my $self = shift;
    my $features_and_values_or_thresholds_on_branch = shift;
    my @features_and_values_or_thresholds_on_branch = @$features_and_values_or_thresholds_on_branch;
    print "\n\nfeatures_and_values_or_thresholds_on_branch: @features_and_values_or_thresholds_on_branch\n";
    if (@features_and_values_or_thresholds_on_branch == 0) {
        ## At the root node: for every feature, initialize the best partition
        ## point found so far and the minmax error it yields.
        my $best_partition_point_for_feature_hash = { map {$_ => undef} @{$self->{_feature_names}} };
        my $best_minmax_error_for_feature_hash    = { map {$_ => undef} @{$self->{_feature_names}} };
        foreach my $i (0 .. @{$self->{_feature_names}}-1) {