Algorithm-DecisionTree


lib/Algorithm/DecisionTree.pm

decision trees, each using the minority class samples along with samples drawn
randomly from the majority class.  The final classification is made through a
majority vote from all the decision trees.  B<(3)> Assuming you are faced with a
big-data problem --- in the sense that you have been given a training database with a
very large number of training records --- the class C<RandomizedTreesForBigData> will
also let you construct multiple decision trees by pulling training data randomly from
your training database (without paying attention to the relative populations of the
classes).  The final classification decision for a test sample is based on a majority
vote from all the decision trees thus constructed.  See the C<ExamplesRandomizedTrees>
directory for how to use these new features of the module. And, finally, B<(4)>
Support for the old-style '.dat' training files has been dropped in this version.
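
As a rough sketch of how the C<RandomizedTreesForBigData> class is typically driven
(the constructor options and method names shown below are assumptions patterned on
the module's general interface, not a verbatim transcript; the scripts in the
C<ExamplesRandomizedTrees> directory are the authoritative reference):

    use strict;
    use warnings;
    use Algorithm::RandomizedTreesForBigData;

    ## Sketch only: the option and method names here are assumed, not
    ## guaranteed; see the ExamplesRandomizedTrees scripts for the real ones.
    my $rt = Algorithm::RandomizedTreesForBigData->new(
                 training_datafile                  => 'my_big_training_database.csv',
                 csv_class_column_index             => 1,
                 csv_columns_for_features           => [2, 3, 4, 5],
                 how_many_trees                     => 20,
                 how_many_training_samples_per_tree => 50,
                 entropy_threshold                  => 0.01,
                 max_depth_desired                  => 8,
             );

    $rt->get_training_data_for_all_trees();
    $rt->construct_all_decision_trees();

    ## Classify one test sample through a majority vote over all the trees:
    my @test_sample = ('feature2=3.7', 'feature3=1.2', 'feature4=0.9', 'feature5=2');
    $rt->classify_with_all_trees(\@test_sample);
    print "classification: ", $rt->get_majority_vote_classification(), "\n";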

B<Version 3.21:> This version makes it easier to use a CSV training file that
violates the assumption that commas are used only to separate the different field
values in a line record.  Some large econometrics databases use double-quoted values
for the fields, and these values may themselves contain commas (presumably for better
readability).  This version also allows you to specify the leftmost entry in the
first CSV record, the record that names all the fields; previously, this entry was
required to be an empty double-quoted string.  I have also made some minor changes to
the 'C<get_training_data_from_csv()>' method to make it more user-friendly for large
training files that may contain tens of thousands of records.  When pulling training

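For concreteness, the first few records of a CSV training file of the kind described
above might look as follows (a purely hypothetical layout): the field values are
double-quoted and may themselves contain commas, and the leftmost entry of the header
record names the record-identifier column rather than being an empty double-quoted
string.

    "record_id","company","annual_revenue","class"
    "1","Acme, Inc.","12,500,000","profitable"
    "2","Widgets, Ltd.","3,250,000","unprofitable"

A file laid out like this is handed to the module in the usual way through the
C<training_datafile> constructor option, with C<csv_class_column_index> and
C<csv_columns_for_features> set to match the column positions shown.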