AI-DecisionTree

 view release on metacpan or  search on metacpan

Changes  view on Meta::CPAN

   won't be making many (any?) more of these kinds of changes, but
   these ones were probably necessary.

 - Removed a bit of debugging code that I left in for 0.03.

0.03  Mon Sep  2 11:41:18 AEST 2002

 - Added a 'prune' parameter to new(), which controls whether the tree
   will be pruned after training.  This is usually a good idea, so the
   default is to prune.  Currently we prune using a simple
   minimum-description-length criterion.

 - Training instances are now represented using a C struct rather than
   a Perl hash.  This can dramatically reduce memory usage, though it
   doesn't have much effect on speed.  Note that Inline.pm is now
   required.

 - The list of instances is now deleted after training, since it's no
   longer needed.

 - Small speedup to the train() method, achieved by less copying of data.

lib/AI/DecisionTree.pm  view on Meta::CPAN


  # Entropy is defined with log base 2 - we just divide by log(2) at the end to adjust.
  my $sum = 0;
  $sum += $_ * log($_) foreach values %count;
  return +(log(@_) - $sum/@_)/log(2);
}

sub prune_tree {
  my $self = shift;

  # We use a minimum-description-length approach.  We calculate the
  # score of each node:
  #  n = number of nodes below
  #  r = number of results (categories) in the entire tree
  #  i = number of instances in the entire tree
  #  e = number of errors below this node

  # Hypothesis description length (MML):
  #  describe tree: number of nodes + number of edges
  #  describe exceptions: num_exceptions * log2(total_num_instances) * log2(total_num_results)
  
  my $r = keys %{ $self->{results} };
  my $i = $self->{tree}{instances};
  my $exception_cost = log($r) * log($i) / log(2)**2;

  # Pruning can turn a branch into a leaf
  my $maybe_prune = sub {
    my ($self, $node) = @_;

lib/AI/DecisionTree.pm  view on Meta::CPAN

If C<noise_mode> is set to C<fatal> (the default), the C<train()>
method will throw an exception (die).  If C<noise_mode> is set to
C<pick_best>, the most frequent result at each noisy node will be
selected.

=item prune

A boolean C<prune> parameter which specifies
whether the tree should be pruned after training.  This is usually a
good idea, so the default is to prune.  Currently we prune using a
simple minimum-description-length criterion.

=item verbose

If set to a true value, some status information will be output while
training a decision tree.  Default is false.

=item purge

If set to a true value, the C<do_purge()> method will be invoked
during C<train()>.  The default is true.

lib/AI/DecisionTree.pm  view on Meta::CPAN

build this tree.

=item nodes()

Returns the number of nodes in the trained decision tree.

=item depth()

Returns the depth of the tree.  This is the maximum number of
decisions that would need to be made to classify an unseen instance,
i.e. the length of the longest path from the tree's root to a leaf.  A
tree with a single node would have a depth of zero.

=item rule_tree()

Returns a data structure representing the decision tree.  For 
instance, for the tree diagram above, the following data structure 
is returned:

 [ 'outlook', {
     'rain' => [ 'wind', {



( run in 0.478 second using v1.01-cache-2.11-cpan-65fba6d93b7 )