Statistics-TopK

META.json

{
   "abstract" : "Implementation of the top-k streaming algorithm",
   "author" : [
      "gray <gray@cpan.org>"
   ],
   "dynamic_config" : 1,
   "generated_by" : "ExtUtils::MakeMaker version 7.04, CPAN::Meta::Converter version 2.150001",
   "license" : [
      "perl_5"
   ],
   "meta-spec" : {
      "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",

META.yml

---
abstract: 'Implementation of the top-k streaming algorithm'
author:
  - 'gray <gray@cpan.org>'
build_requires:
  Test::More: '0.98'
configure_requires:
  ExtUtils::MakeMaker: '0'
dynamic_config: 1
generated_by: 'ExtUtils::MakeMaker version 7.04, CPAN::Meta::Converter version 2.150001'
license: perl
meta-spec:

README

Statistics-TopK
==============

This module implements the top-k streaming algorithm.

INSTALLATION

To install this module type the following:

    perl Makefile.PL
    make
    make test
    make install

lib/Statistics/TopK.pm

    return %{$_[0]->[_COUNTS]};
}


1;

__END__

=head1 NAME

Statistics::TopK - Implementation of the top-k streaming algorithm

=head1 SYNOPSIS

    use Statistics::TopK;

    my $counter = Statistics::TopK->new(10);
    while (my $val = <STDIN>) {
        chomp $val;
        $counter->add($val);
    }
    my @top = $counter->top;
    my %counts = $counter->counts;

=head1 DESCRIPTION

The C<Statistics::TopK> module implements the top-k streaming algorithm,
also known as the "heavy hitters" algorithm. It is designed to process
data streams and probabilistically calculate the C<k> most frequent items
while using limited memory.
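
The following minimal Perl sketch makes the bounded-memory idea concrete. It uses one well-known heavy-hitters technique, the Misra-Gries frequent-items summary, and is only an illustration. The function name `misra_gries` is hypothetical, and this is not necessarily the algorithm Statistics::TopK uses internally: the module describes its method as probabilistic, while Misra-Gries is deterministic. The summary never holds more than `$k` counters. Any item that occurs more than n/(k+1) times in a stream of n items is guaranteed to remain among them.

```perl
use strict;
use warnings;

# Misra-Gries frequent-items summary (hypothetical helper, illustration only;
# not taken from Statistics::TopK). At most $k counters are kept, so memory
# stays bounded no matter how many distinct items the stream contains.
sub misra_gries {
    my ($k, @stream) = @_;
    my %counters;
    for my $item (@stream) {
        if (exists $counters{$item}) {
            $counters{$item}++;              # already tracked: count it
        }
        elsif (keys(%counters) < $k) {
            $counters{$item} = 1;            # free slot: start tracking it
        }
        else {
            # No free slot: decrement every counter and drop any that reach
            # zero. The new item itself is not stored in this case.
            for my $key (keys %counters) {
                delete $counters{$key} if --$counters{$key} == 0;
            }
        }
    }
    # Surviving keys are candidate heavy hitters; their counts never
    # exceed the true frequencies.
    return %counters;
}
```

For example, `misra_gries(2, qw(a b a c a b d a))` keeps only `a`, with an estimated count of 2 against a true count of 4. The estimates are lower bounds, so exact counts would need a second pass over the data.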

A typical use is finding the 10 most frequent IP addresses in an access
log. The simple solution is to map each IP address to a counter in a hash
and then sort the hash by count. However, that hash needs one key per
distinct address, which for IPv4 could theoretically mean over 4 billion
(2**32) keys.
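
The simple exact approach can be sketched in a few lines of Perl. The helper name `exact_top_ips` is illustrative only. The sketch also assumes that the client IP is the first whitespace-separated field of each log line, as in Common Log Format. Memory grows with the number of distinct addresses, and that is the cost a streaming top-k algorithm avoids.

```perl
use strict;
use warnings;

# Exact, memory-hungry baseline: one hash entry per distinct IP address.
# Hypothetical helper for illustration; not part of Statistics::TopK.
sub exact_top_ips {
    my ($n, @lines) = @_;
    my %count;
    for my $line (@lines) {
        # Assumption: the client IP is the first whitespace-separated field,
        # as in Common Log Format access logs.
        my ($ip) = split ' ', $line;
        $count{$ip}++ if defined $ip;
    }
    # Sort by descending count, breaking ties by address for stable output.
    my @sorted = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
    $#sorted = $n - 1 if @sorted > $n;   # keep only the top $n
    return @sorted;
}
```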

The top-k algorithm only requires storage space proportional to the number
