AI-DecisionTree
won't be making many (any?) more of these kinds of changes, but
these ones were probably necessary.
- Removed a bit of debugging code that I left in for 0.03.
0.03 Mon Sep 2 11:41:18 AEST 2002
- Added a 'prune' parameter to new(), which controls whether the tree
will be pruned after training. This is usually a good idea, so the
default is to prune. Currently we prune using a simple
minimum-description-length criterion.
- Training instances are now represented using a C struct rather than
a Perl hash. This can dramatically reduce memory usage, though it
doesn't have much effect on speed. Note that Inline.pm is now
required.
- The list of instances is now deleted after training, since it's no
longer needed.
- Small speedup to the train() method, achieved by less copying of data.
lib/AI/DecisionTree.pm
sub entropy {
  # Compute the entropy (in bits) of a list of category labels.
  my %count;
  $count{$_}++ foreach @_;
  # Entropy is defined with log base 2 - we just divide by log(2) at the end to adjust.
  my $sum = 0;
  $sum += $_ * log($_) foreach values %count;
  return +(log(@_) - $sum/@_)/log(2);
}
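# Sanity check (illustrative only, not part of the module): a list split
# evenly between two categories carries exactly one bit of entropy, so
# entropy(qw(yes yes no no)) returns 1.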
sub prune_tree {
my $self = shift;
# We use a minimum-description-length approach. We calculate the
# score of each node:
# n = number of nodes below
# r = number of results (categories) in the entire tree
# i = number of instances in the entire tree
# e = number of errors below this node
# Hypothesis description length (MML):
# describe tree: number of nodes + number of edges
# describe exceptions: num_exceptions * log2(total_num_instances) * log2(total_num_results)
my $r = keys %{ $self->{results} };
my $i = $self->{tree}{instances};
my $exception_cost = log($r) * log($i) / log(2)**2;
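# Illustrative numbers (not taken from the module): with r = 2 result
# categories and i = 100 instances, each exception costs about
# log2(2) * log2(100) ~= 1 * 6.64 = 6.64 bits, so a subtree survives
# pruning only if the errors it removes outweigh the cost of describing
# its extra nodes and edges.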
# Pruning can turn a branch into a leaf
my $maybe_prune = sub {
my ($self, $node) = @_;
lib/AI/DecisionTree.pm
If C<noise_mode> is set to C<fatal> (the default), the C<train()>
method will throw an exception (die). If C<noise_mode> is set to
C<pick_best>, the most frequent result at each noisy node will be
selected.
=item prune
A boolean parameter specifying whether the tree should be pruned
after training. This is usually a good idea, so the default is to
prune. Currently we prune using a simple minimum-description-length
criterion.
=item verbose
If set to a true value, some status information will be output while
training a decision tree. Default is false.
=item purge
If set to a true value, the C<do_purge()> method will be invoked
during C<train()>. The default is true.
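Taken together, these constructor options might be combined as in the
following sketch (the attribute data is invented for illustration;
C<add_instance()> and C<train()> are the module's training methods):

  use AI::DecisionTree;

  my $dtree = AI::DecisionTree->new(
    noise_mode => 'pick_best',   # keep going on contradictory instances
    prune      => 1,             # MDL pruning after training (the default)
    verbose    => 0,
    purge      => 1,             # discard training instances after train()
  );

  $dtree->add_instance(
    attributes => { outlook => 'sunny', humidity => 'high' },
    result     => 'no',
  );
  $dtree->train;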
lib/AI/DecisionTree.pm
build this tree.
=item nodes()
Returns the number of nodes in the trained decision tree.
=item depth()
Returns the depth of the tree. This is the maximum number of
decisions that would need to be made to classify an unseen instance,
i.e. the length of the longest path from the tree's root to a leaf. A
tree with a single node would have a depth of zero.
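For example (a minimal sketch; C<$dtree> is assumed to be a trained
AI::DecisionTree object):

  printf "%d nodes, depth %d\n", $dtree->nodes, $dtree->depth;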
=item rule_tree()
Returns a data structure representing the decision tree. For
instance, for the tree diagram above, the following data structure
is returned:
[ 'outlook', {
'rain' => [ 'wind', {