Catmandu-Stat


lib/Catmandu/Exporter/Stat.pm

=head1 DESCRIPTION

The L<Catmandu::Stat> package calculates statistics on the availability of
fields in a data file. Use this exporter to count how often fields occur or to count
the number of duplicate values. For each field the exporter calculates the following
statistics:

  * name     : the name of a field
  * count    : the number of occurrences of a field in all records
  * zeros    : the number of records without a field
  * zeros%   : the percentage of records without a field
  * min      : the minimum number of occurrences of a field in any record
  * max      : the maximum number of occurrences of a field in any record
  * mean     : the mean number of occurrences of a field in all records
  * variance : the variance of the number of occurrences of a field per record
  * stdev    : the standard deviation of the number of occurrences of a field per record
  * uniq~    : the estimated number of unique records
  * uniq%    : the estimated percentage of unique values
  * entropy  : the minimum and maximum entropy in the field values (estimated value)
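
The occurrence-based columns above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the module's Perl implementation. The helper name C<field_stats> and the sample records are hypothetical. The sketch assumes a non-empty list of records, that a repeated field is stored as a list, and that the variance is the population variance (the documentation does not say which variant the exporter uses).

```python
from statistics import mean, pvariance, pstdev

def field_stats(records, field):
    """Per-record occurrence statistics for one field (illustrative only)."""
    counts = []
    for rec in records:
        value = rec.get(field)
        if value is None:
            counts.append(0)           # field is absent in this record
        elif isinstance(value, list):
            counts.append(len(value))  # repeated field: one occurrence per element
        else:
            counts.append(1)           # single value: one occurrence
    zeros = sum(1 for c in counts if c == 0)
    return {
        "name": field,
        "count": sum(counts),
        "zeros": zeros,
        "zeros%": 100.0 * zeros / len(counts),
        "min": min(counts),
        "max": max(counts),
        "mean": mean(counts),
        "variance": pvariance(counts),  # assumption: population variance
        "stdev": pstdev(counts),
    }

# Hypothetical sample data: 'author' occurs 2, 0, 1 and 0 times.
records = [
    {"title": "A", "author": ["x", "y"]},
    {"title": "B"},
    {"title": "C", "author": "z"},
    {"title": "D"},
]
stats = field_stats(records, "author")
```

For this sample the 'author' field has count 3, zeros 2, zeros% 50, min 0, max 2 and mean 0.75.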

Details:

  * entropy indicates the variation in the field values (whether some values occur more often than others)
  * entropy values are displayed as: minimum/maximum entropy
  * when the minimum entropy is 0, all the field values are equal
  * when the minimum and maximum entropy are equal, all the field values are different
  * the 'uniq%' and 'entropy' fields are estimates and are normally within 1% of the
    correct value (estimating keeps the memory requirements of this module low)
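
The two boundary cases above can be checked with a small sketch. It is written in Python and is not the module's code. It assumes that the first number of the pair is the Shannon entropy (in bits) of the observed value distribution and that the second is log2 of the number of values, which is the largest entropy that many values can have. The function name C<entropy_pair> is hypothetical. Unlike the exporter, the sketch computes exact values rather than estimates.

```python
from collections import Counter
from math import log2

def entropy_pair(values):
    """Return (observed entropy, maximum entropy) in bits for a list of values."""
    n = len(values)
    freq = Counter(values)
    # Shannon entropy: sum over distinct values of p * log2(1/p), with p = c/n
    observed = sum((c / n) * log2(n / c) for c in freq.values())
    # Upper bound, reached when all n values are different
    maximum = log2(n)
    return observed, maximum

all_equal = entropy_pair(["a", "a", "a", "a"])      # observed entropy is 0
all_different = entropy_pair(["a", "b", "c", "d"])  # observed equals maximum
mixed = entropy_pair(["a", "a", "b", "c"])          # somewhere in between
```

With four values the maximum is 2 bits. The all-equal list gives an observed entropy of 0, the all-different list gives 2, and the mixed list gives 1.5.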


