Algorithm-DecisionTree


      $dt->get_training_data();
      $dt->calculate_first_order_probabilities();
      $dt->calculate_class_priors();

  # Next you construct a decision tree for your training data by calling:

      $root_node = $dt->construct_decision_tree_classifier();

  # where $root_node is an instance of the DTNode class that is also defined in the
  # module file.  Now you are ready to classify a new data record.  Let's say that
  # your data record looks like:

      my @test_sample  = qw /  g2=4.2
                               grade=2.3
                               gleason=4
                               eet=1.7
                               age=55.0
                               ploidy=diploid /;

  # You can classify it by calling:
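  # [Sketch, not shown in this excerpt: the call below assumes the classify()
  # method documented elsewhere in the module, which takes the root node and a
  # reference to the test sample and returns a hashref of classification
  # results -- verify the exact API against the POD.]

      my $classification = $dt->classify($root_node, \@test_sample);

  # The returned hashref is expected to map class labels to their estimated
  # probabilities, so the result can be displayed with something like:

      foreach my $class (sort keys %$classification) {
          print "$class  =>  $classification->{$class}\n";
      }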


The first script places you in an interactive session in which you are first
asked for the node number you are interested in.  You are then asked whether
you want answers to the specific questions that the introspection can provide
for that node.  The second script descends the decision tree and shows, for
each node, the training samples that fall directly in the portion of the
feature space assigned to that node.  The third script shows, for each training
sample, how it affects the decision-tree nodes either directly or indirectly
through the generalization achieved by the probabilistic modeling of the data.
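If you want to drive the introspection programmatically rather than through
these demo scripts, a sketch along the following lines may apply.  The
C<DTIntrospection> class name and its methods are assumptions recalled from the
module's documentation and are not shown in this excerpt, so verify them
against the POD:

    # Sketch only -- class and method names are assumptions, not confirmed by
    # this excerpt.  $dt is the trained Algorithm::DecisionTree object
    # constructed as shown earlier.
    my $introspector = DTIntrospection->new($dt);
    $introspector->initialize();
    # start the interactive question-and-answer session over the tree nodes:
    $introspector->explain_classifications_at_multiple_nodes_interactively();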

The output of the script
C<introspection_show_training_samples_at_all_nodes_direct_influence.pl> looks like:

    Node 0: the samples are: None
    Node 1: the samples are: [sample_46 sample_58]
    Node 2: the samples are: [sample_1 sample_4 sample_7 .....]
    Node 3: the samples are: []
    Node 4: the samples are: []
    ...
    ...            

The nodes for which no samples are listed come into existence through
the generalization achieved by the probabilistic modeling of the data.

The output produced by the script
C<introspection_show_training_samples_to_nodes_influence_propagation.pl> looks like:

    sample_1:                                                                 
       nodes affected directly: [2 5 19 23]                                
       nodes affected through probabilistic generalization:                   
            2=> [3 4 25]                                                    
                25=> [26]                                                     
            5=> [6]                                                           
                6=> [7 13]                                                   
                    7=> [8 11]                                               
                        8=> [9 10]                                           


Bagging involves constructing multiple decision trees for different (possibly
overlapping) segments of the data extracted from your training dataset and then
aggregating the decisions made by the individual decision trees for the final
classification.  The aggregation of the classification decisions can average
out the noise and bias that may otherwise affect the classification decision
obtained from just one tree.
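As a plain-Perl illustration of the aggregation step (this is not the module's
own API, just a sketch of the idea), a simple majority vote over the labels
returned by the individual trees could look like:

    # Illustrative only: aggregate per-tree class labels by majority vote.
    sub majority_vote {
        my @per_tree_labels = @_;
        my %votes;
        $votes{$_}++ for @per_tree_labels;
        # pick the label that received the most votes:
        my ($winner) = sort { $votes{$b} <=> $votes{$a} } keys %votes;
        return $winner;
    }

    my $final = majority_vote('benign', 'malignant', 'benign', 'benign');
    print "aggregated decision: $final\n";    # prints "benign"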

=over 4

=item B<Calling the bagging constructor:>

A typical call to the constructor for the C<DecisionTreeWithBagging> class looks
like:

    use Algorithm::DecisionTreeWithBagging;
    
    my $training_datafile = "stage3cancer.csv";
    
    my $dtbag = Algorithm::DecisionTreeWithBagging->new(
                                  training_datafile => $training_datafile,
                                  csv_class_column_index => 2,
                                  csv_columns_for_features => [3,4,5,6,7,8],


Keep in mind that the theoretical guarantees provided by boosting ordinarily
apply only to the case of binary classification.  Additionally, your training
dataset must capture all of the significant statistical variations in the
classes represented therein.

=over 4

=item B<Calling the BoostedDecisionTree constructor:>

If you'd like to experiment with boosting, a typical call to the constructor for the
C<BoostedDecisionTree> class looks like: 

    use Algorithm::BoostedDecisionTree;
    my $training_datafile = "training6.csv";
    my $boosted = Algorithm::BoostedDecisionTree->new(
                              training_datafile => $training_datafile,
                              csv_class_column_index => 1,
                              csv_columns_for_features => [2,3],
                              entropy_threshold => 0.01,
                              max_depth_desired => 8,
                              symbolic_to_numeric_cardinality_threshold => 10,
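This constructor call is also truncated in the excerpt.  A sketch of the calls
that typically follow is shown below, with the method names again recalled from
the module's documentation rather than confirmed here:

    # Assumed workflow for Algorithm::BoostedDecisionTree -- verify the method
    # names against the module's POD before relying on them.
    $boosted->get_training_data_for_base_tree();
    $boosted->calculate_first_order_probabilities_and_class_priors();
    $boosted->construct_base_decision_tree();
    $boosted->construct_cascade_of_trees();

    # @test_sample is built in the same "feature=value" format shown earlier:
    $boosted->classify_with_boosting(\@test_sample);
    $boosted->display_classification_results_for_all_trees();
    $boosted->trust_weighted_majority_vote_classifier();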


