Statistics-TopK
{
"abstract" : "Implementation of the top-k streaming algorithm",
"author" : [
"gray <gray@cpan.org>"
],
"dynamic_config" : 1,
"generated_by" : "ExtUtils::MakeMaker version 7.04, CPAN::Meta::Converter version 2.150001",
"license" : [
"perl_5"
],
"meta-spec" : {
"url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",
---
abstract: 'Implementation of the top-k streaming algorithm'
author:
- 'gray <gray@cpan.org>'
build_requires:
Test::More: '0.98'
configure_requires:
ExtUtils::MakeMaker: '0'
dynamic_config: 1
generated_by: 'ExtUtils::MakeMaker version 7.04, CPAN::Meta::Converter version 2.150001'
license: perl
meta-spec:
Statistics-TopK
==============
This module implements the top-k streaming algorithm.
INSTALLATION
To install this module type the following:
perl Makefile.PL
make
make test
make install
lib/Statistics/TopK.pm
return %{$_[0]->[_COUNTS]};
}
1;
__END__
=head1 NAME
Statistics::TopK - Implementation of the top-k streaming algorithm
=head1 SYNOPSIS
use Statistics::TopK;
my $counter = Statistics::TopK->new(10);
while (my $val = <STDIN>) {
chomp $val;
$counter->add($val);
}
my @top = $counter->top;
my %counts = $counter->counts;
=head1 DESCRIPTION
The C<Statistics::TopK> module implements the top-k streaming algorithm,
also known as the "heavy hitters" algorithm. It is designed to process
data streams and probabilistically calculate the C<k> most frequent items
while using limited memory.
A typical example would be to determine the top 10 IP addresses listed in an
access log. A simple solution would be to hash each IP address to a counter
and then sort the resulting hash by counter value. But the hash could
theoretically require over 4 billion keys, one per possible IPv4 address.
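For comparison, the simple hash-and-sort solution described above might look
like the following minimal sketch. The helper name C<top_k_exact> is
hypothetical and is not part of C<Statistics::TopK>; it only illustrates why
memory grows with the number of distinct values rather than with C<k>.

```perl
use strict;
use warnings;

# Naive exact counting: keeps one hash entry per distinct value seen.
# top_k_exact is a hypothetical helper, not part of Statistics::TopK.
sub top_k_exact {
    my ($k, @values) = @_;

    my %count;
    $count{$_}++ for @values;

    # Sort by descending count; break ties by key so the order is stable.
    my @sorted = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

    # Never ask for more items than there are distinct values.
    $k = @sorted if $k > @sorted;
    return @sorted[0 .. $k - 1];
}

my @top = top_k_exact(2, qw(10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.1 10.0.0.2));
print "$_\n" for @top;
```

This gives exact answers, but C<%count> holds every distinct value in the
input, which is the cost the streaming approach is meant to avoid.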
The top-k algorithm only requires storage space proportional to the number