App-freqtable

script/freqtable


=head2 Formatting the output line: custom formatting (`--format` option)

 % freqtable --format '%04d: %s' input-lines.txt
 0003: five

=head2 Filter by rank

Only display the top 3 ranks:

 % freqtable input-lines.txt -r -3
 % freqtable input-lines.txt -r 1-3
         3 five
         2 eight
         2 one

=head2 Sorting

By default, output is sorted by frequency in descending order. If you specify
C<--sort-sub> (and optionally one or more C<--sort-arg>), you can instead sort
by key using one of the L<Sort::Sub>::* subroutines. Examples:

 # sort by keys, asciibetically
 % freqtable -F input-lines.txt --sort-sub asciibetically
 2       eight
 3       five
 1       four
 1       nine
 2       one
 1       seven
 1       six
 1       three
 1       two

 # sort by keys, asciibetically (descending order)
 % freqtable -F input-lines.txt --sort-sub 'asciibetically<r>'
 1       two
 1       three
 1       six
 1       seven
 2       one
 1       nine
 1       four
 3       five
 2       eight

 # sort by keys randomly, using Perl code (essentially shuffling)
 % freqtable -F input-lines.txt --sort-sub 'by_perl_code' --sort-arg 'code=int(rand()*3)-1'
 3       five
 1       three
 2       eight
 1       seven
 2       one
 1       six
 1       nine
 1       two
 1       four

=head2 Running table (`--output-every` option)

If you have streaming input, you can instruct `freqtable` to print the result
periodically, after every given number of input lines/words/characters/bytes.
You can also instruct it to clear the terminal screen before each output
(`--clear-before-output`).

 % perl -MArray::Sample::WeightedRandom=sample_weighted_random_with_replacement \
     -E'say sample_weighted_random_with_replacement(
          [ ["a", 1], ["b", 2], ["c", 3], ["d",5] ], 1) while 1' | \
   freqtable --output-every 10000 --clear --percent

Sample output:

 45.43%  d
 27.28%  c
 18.20%  b
  9.10%  a
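
The sample output is consistent with the weights passed to the sampler: with
weights 1, 2, 3 and 5 (total 11), the observed shares should approach roughly
9.09%, 18.18%, 27.27% and 45.45% as more input is read. The sketch below
computes those expected shares with plain arithmetic in awk. It does not call
freqtable, and the helper name C<expected_shares> is hypothetical, introduced
only for illustration.

```shell
# Hypothetical helper (not part of freqtable): read "item weight" pairs on
# standard input and print each item's expected percentage, largest first.
expected_shares() {
    awk '{ item[NR] = $1; weight[NR] = $2; total += $2 }
         END { for (i = 1; i <= NR; i++)
                   printf "%6.2f%%  %s\n", 100 * weight[i] / total, item[i] }' |
        sort -nr
}

# The weights used in the streaming example above.
printf 'a 1\nb 2\nc 3\nd 5\n' | expected_shares
```

Comparing these figures with a running table is a quick sanity check that the
stream is being counted as intended.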

=head1 DESCRIPTION

This utility counts the occurrences of lines (or words/characters) in the
input, then displays each unique line along with its number of occurrences. You
can also instruct it to show only lines that have a specified number of
occurrences.

You can use the following Unix command to count occurrences of lines:

 % sort input-lines.txt | uniq -c | sort -nr

With a bit more work, you can also combine existing Unix commands to count
occurrences of words or characters, and to filter items that have a specified
number of occurrences. freqtable simply makes this more convenient.
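
As a rough sketch of that extra work, the pipeline below counts word
occurrences using only standard Unix tools. The function name C<count_words>
is hypothetical and not part of freqtable. It assumes words are separated by
whitespace, which may differ from how freqtable's C<--words> option splits its
input.

```shell
# Hypothetical helper (not part of freqtable): count whitespace-separated
# words on standard input, most frequent first.
count_words() {
    tr -s '[:space:]' '\n' |   # put each word on its own line
        grep -v '^$' |         # drop the empty line left by leading whitespace
        sort |                 # group identical words together
        uniq -c |              # prefix each unique word with its count
        sort -k1,1nr -k2       # order by count (descending), then by word
}

printf 'one two two\nthree two one\n' | count_words
```

Appending a filter such as C<awk '$1 == 2'> to the pipeline would keep only
the words that occur exactly twice.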

=head1 EXIT CODES

0 on success.

255 on I/O error.

99 on command-line options error.

=head1 OPTIONS

=over

=item * --bytes, -c

=item * --chars, -m

=item * --words, -w

=item * --lines, -l

=item * --number, -n

Treat each line as a number. A line like this:

 9.99 cents

will be regarded as:

 9.99


