Catmandu-Stat
lib/Catmandu/Exporter/Stat.pm
=head1 DESCRIPTION
The L<Catmandu::Stat> package can be used to calculate statistics on the availability of
fields in a data file. Use this exporter to count how often fields occur or to count
the number of duplicate values. For each field the exporter calculates the following
statistics:
* name : the name of a field
* count : the number of occurrences of a field in all records
* zeros : the number of records without a field
* zeros% : the percentage of records without a field
* min : the minimum number of occurrences of a field in any record
* max : the maximum number of occurrences of a field in any record
* mean : the mean number of occurrences of a field per record
* variance : the variance of the number of occurrences of a field per record
* stdev : the standard deviation of the number of occurrences of a field per record
* uniq~ : the estimated number of unique records
* uniq% : the estimated percentage of unique values
* entropy : the minimum and maximum entropy of the field values (estimated)
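The per-field statistics in this list can be illustrated with a short Perl sketch. It is not the exporter's implementation, only one way to compute the listed quantities under stated assumptions: the helpers C<occurrences> and C<field_stats> are hypothetical, records are assumed to be plain hash references in which an array value counts once per element, the variance is assumed to be the population variance (dividing by the number of records), C<uniq%> is assumed to be relative to C<count>, and unique values are counted exactly, whereas the exporter estimates them. Entropy is omitted here.

```perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Hypothetical helper: how often $field occurs in one record.
# Assumption: a missing field counts as 0, an array value counts each element.
sub occurrences {
    my ($record, $field) = @_;
    return 0 unless exists $record->{$field};
    my $value = $record->{$field};
    return ref $value eq 'ARRAY' ? scalar @$value : 1;
}

# Hypothetical helper: statistics for one field over a list of hash-ref records.
sub field_stats {
    my ($field, @records) = @_;
    my @n        = map { occurrences($_, $field) } @records;   # occurrences per record
    my $records  = scalar @n;
    my $count    = sum(0, @n);
    my $zeros    = grep { $_ == 0 } @n;                        # records without the field
    my $mean     = $records ? $count / $records : 0;
    # Assumption: population variance (divide by the number of records)
    my $variance = $records ? sum(0, map { ($_ - $mean) ** 2 } @n) / $records : 0;

    # Exact distinct-value count; the exporter estimates this instead
    my %seen;
    for my $r (@records) {
        next unless exists $r->{$field};
        my $v = $r->{$field};
        $seen{$_}++ for ref $v eq 'ARRAY' ? @$v : ($v);
    }
    my $uniq = keys %seen;

    return {
        name     => $field,
        count    => $count,
        zeros    => $zeros,
        'zeros%' => $records ? 100 * $zeros / $records : 0,
        min      => min(@n),
        max      => max(@n),
        mean     => $mean,
        variance => $variance,
        stdev    => sqrt($variance),
        'uniq~'  => $uniq,
        # Assumption: percentage of unique values relative to count
        'uniq%'  => $count ? 100 * $uniq / $count : 0,
    };
}

# Usage: the isbn field occurs 2, 0 and 1 times in three records
my @records = (
    { title => 'A', isbn => ['1', '2'] },
    { title => 'B' },
    { title => 'A', isbn => ['1'] },
);
my $s = field_stats('isbn', @records);
printf "%s: count=%d zeros=%d mean=%.2f stdev=%.2f uniq~=%d\n",
    @{$s}{qw(name count zeros mean stdev uniq~)};
# prints: isbn: count=3 zeros=1 mean=1.00 stdev=0.82 uniq~=2
```

Counting exactly with a hash, as done here, needs memory proportional to the number of distinct values; that cost is what the exporter avoids by estimating C<uniq~>, C<uniq%> and entropy.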
Details:
* entropy indicates how much the field values vary (whether some values occur more often than others)
* entropy values are displayed as minimum/maximum
* when the minimum entropy is 0, all field values are equal
* when the minimum and maximum entropy are equal, all field values are different
* the 'uniq%' and 'entropy' fields are estimates, normally within 1% of the
correct value (estimating keeps the memory requirements of this module low)
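These details do not say how the minimum and maximum entropy are obtained. One reading that matches the rules listed is sketched below in Perl, and it is an assumption rather than the module's documented method: the minimum is taken as the Shannon entropy of the observed values in bits, and the maximum as log2(n), the entropy that n values would have if all of them were different. With that reading, identical values give a minimum of 0, and all-different values give a minimum equal to the maximum. The helper C<entropy_range> is hypothetical and computes exact values, whereas the exporter estimates them.

```perl
use strict;
use warnings;

# Hypothetical helper (assumed interpretation, not the module's code):
# returns (Shannon entropy of the observed values in bits, log2(n)).
sub entropy_range {
    my @values = @_;
    my $n = scalar @values;
    return (0, 0) if $n == 0;

    my %freq;
    $freq{$_}++ for @values;              # frequency of each distinct value

    my $h = 0;
    for my $c (values %freq) {
        my $p = $c / $n;                  # relative frequency
        $h -= $p * log($p) / log(2);      # -sum p * log2(p)
    }
    my $max = log($n) / log(2);           # entropy if all n values differ
    return ($h, $max);
}

printf "%.2f/%.2f\n", entropy_range(qw(a a a a));   # all equal:     0.00/2.00
printf "%.2f/%.2f\n", entropy_range(qw(a b c d));   # all different: 2.00/2.00
printf "%.2f/%.2f\n", entropy_range(qw(a a b c));   # mixed:         1.50/2.00
```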