Algorithm-DecisionTree
lib/Algorithm/DecisionTree.pm view on Meta::CPAN
$dt->get_training_data();
$dt->calculate_first_order_probabilities();
$dt->calculate_class_priors();
# Next you construct a decision tree for your training data by calling:
$root_node = $dt->construct_decision_tree_classifier();
# where $root_node is an instance of the DTNode class that is also defined in the
# module file. Now you are ready to classify a new data record. Let's say that
# your data record looks like:
my @test_sample = qw / g2=4.2
                       grade=2.3
                       gleason=4
                       eet=1.7
                       age=55.0
                       ploidy=diploid /;
# You can classify it by calling:
lib/Algorithm/DecisionTree.pm view on Meta::CPAN
test set of data is a good way to develop greater proficiency with decision trees.
=head1 WHAT PRACTICAL PROBLEM IS SOLVED BY THIS MODULE
If you are new to decision trees, their practical utility is best understood with an
example that involves only symbolic features. However, as mentioned earlier, versions
of the module higher than 2.0 allow you to use both symbolic and numeric features.
Consider the following scenario: Let's say you are running a small investment company
that employs a team of stockbrokers who make buy/sell decisions for the customers of
your company. Assume that your company has asked the traders to make each investment
decision on the basis of the following four criteria:
    price_to_earnings_ratio   (P_to_E)
    price_to_sales_ratio      (P_to_S)
    return_on_equity          (R_on_E)
lib/Algorithm/DecisionTree.pm view on Meta::CPAN
This will display the decision tree in your terminal window by using a recursively
determined offset for each node as the display routine descends the tree.
I have intentionally left the syntax fragment C<$root_node> in the above call to
remind the reader that C<display_decision_tree()> is NOT called on the instance of
the C<DecisionTree> we constructed earlier, but on the C<DTNode> instance returned by
the call to C<construct_decision_tree_classifier()>.
=item B<classify($root_node, \@test_sample):>
Let's say you want to classify the following data record:
    my @test_sample = qw / g2=4.2
                           grade=2.3
                           gleason=4
                           eet=1.7
                           age=55.0
                           ploidy=diploid /;
you'd make the following call:
lib/Algorithm/DecisionTree.pm view on Meta::CPAN
Note in particular the following two constructor parameters:
    how_many_bags
    bag_overlap_fraction
where, as the name implies, the parameter C<how_many_bags> controls how many bags
(and, therefore, how many decision trees) will be constructed from your training
dataset; and where the parameter C<bag_overlap_fraction> controls the degree of
overlap between the bags. To understand what exactly is achieved by setting the
parameter C<bag_overlap_fraction> to 0.2 in the above example, let's say that the
non-overlapping partitioning of the training data between the bags results in 100
training samples per bag. With C<bag_overlap_fraction> set to 0.2, an additional 20
samples drawn randomly from the other bags will be added to the data in each bag.
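The arithmetic in the preceding paragraph can be sketched in a few lines of plain
Perl. This is only an illustration of the idea, not the module's own code: the
helper C<make_bags()> and its argument order are hypothetical, and it operates on
sample indices rather than on real training records. It assumes the bags are first
formed as a non-overlapping partition and that each bag is then padded with randomly
chosen samples from the other bags.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper (not part of Algorithm::DecisionTree): partition the
# sample indices into disjoint bags, then pad each bag with samples drawn
# at random from the other bags.
sub make_bags {
    my ($num_samples, $how_many_bags, $bag_overlap_fraction) = @_;
    my @indices = shuffle(0 .. $num_samples - 1);

    # Non-overlapping partition: deal the shuffled indices round-robin.
    my @bags = map { [] } 1 .. $how_many_bags;
    push @{ $bags[$_ % $how_many_bags] }, $indices[$_] for 0 .. $#indices;

    my @padded;
    for my $i (0 .. $#bags) {
        my @own    = @{ $bags[$i] };
        my @others = shuffle(map { @{ $bags[$_] } } grep { $_ != $i } 0 .. $#bags);

        # Number of extra samples, rounded to the nearest integer and
        # capped at what the other bags can supply.
        my $extra = int($bag_overlap_fraction * scalar(@own) + 0.5);
        $extra = scalar(@others) if $extra > scalar(@others);

        push @padded, [ @own, @others[0 .. $extra - 1] ];
    }
    return \@padded;
}

# 400 samples in 4 bags gives 100 per bag; a fraction of 0.2 adds 20 more.
my $bags = make_bags(400, 4, 0.2);
printf "bag %d holds %d samples\n", $_, scalar @{ $bags->[$_] } for 0 .. $#{$bags};
```

With these inputs every bag ends up with 120 samples: its own 100 plus 20 borrowed
from the other three bags.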
=back
=head2 B<Methods defined for C<DecisionTreeWithBagging> class>
=over 8
lib/Algorithm/DecisionTree.pm view on Meta::CPAN
Displays the regression tree, as the name implies. The white-space string argument
specifies the offset to use in displaying the child nodes in relation to a parent
node.
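As a rough illustration of how a white-space offset string can drive such a display,
here is a minimal sketch in plain Perl. It does not use the module's node classes:
the tree is built from ordinary hashes, and the field names C<label> and C<children>
as well as the function C<tree_lines()> are assumptions made for this example only.

```perl
use strict;
use warnings;

# Illustrative tree built from plain hashes; the field names 'label' and
# 'children' are assumptions, not the module's own node structure.
my $tree = {
    label    => 'root',
    children => [
        {   label    => 'xvar1 < 10',
            children => [
                { label => 'leaf A', children => [] },
                { label => 'leaf B', children => [] },
            ],
        },
        { label => 'xvar1 >= 10', children => [] },
    ],
};

# Return the display lines for $node. Each child is prefixed with one more
# copy of the $offset white-space string than its parent.
sub tree_lines {
    my ($node, $offset, $prefix) = @_;
    $prefix //= '';
    my @lines = ($prefix . $node->{label});
    push @lines, tree_lines($_, $offset, $prefix . $offset) for @{ $node->{children} };
    return @lines;
}

print "$_\n" for tree_lines($tree, '    ');
```

Passing a longer or shorter white-space string changes only how far each level is
indented relative to its parent.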
=item B<prediction_for_single_data_point( $root_node, $test_sample ):>
You call this method after you have constructed a regression tree if you want to
calculate the prediction for one sample. The parameter C<$root_node> is what is
returned by the call C<construct_regression_tree()>. The formatting of the argument
bound to the C<$test_sample> parameter is important. To elaborate, let's say you are
using two variables named C<$xvar1> and C<$xvar2> as your predictor variables. In
this case, the C<$test_sample> parameter will be bound to a list that will look like
    ['xvar1 = 23.4', 'xvar2 = 12.9']
An arbitrary amount of white space, including none, is allowed on either side of the
equality symbol in the construct shown above. A call to this method returns a
dictionary with two key-value pairs. One of the keys is called C<solution_path> and
the other C<prediction>. The value associated with key C<solution_path> is the path
in the regression tree to the leaf node that yielded the prediction. And the value