App-freqtable
=head2 Formatting the output line: custom formatting (`--format` option)
% freqtable --format '%04d: %s' input-lines.txt
0003: five
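The example above suggests that the format string is an ordinary C<sprintf> template that receives the count first and the key second. The following is a minimal sketch under that assumption; C<format_line> is a hypothetical helper, not part of freqtable.

```perl
use strict;
use warnings;

# Hypothetical helper (not freqtable's API): render one output line from a
# sprintf-style template. Assumes the count is passed first and the key
# second, as the '%04d: %s' example implies.
sub format_line {
    my ($template, $count, $key) = @_;
    return sprintf($template, $count, $key);
}

print format_line('%04d: %s', 3, 'five'), "\n";   # prints "0003: five"
```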
=head2 Filter by rank
Only display the top 3 ranks:
% freqtable input-lines.txt -r -3
% freqtable input-lines.txt -r 1-3
3 five
2 eight
2 one
=head2 Sorting
By default, output is sorted by frequency in descending order. If you specify
C<--sort-sub> (and optionally one or more C<--sort-arg>), the output is instead
sorted by key using one of the L<Sort::Sub>::* subroutines. Examples:
# sort by keys, asciibetically
% freqtable -F input-lines.txt --sort-sub asciibetically
2 eight
3 five
1 four
1 nine
2 one
1 seven
1 six
1 three
1 two
# sort by keys, asciibetically (descending order)
% freqtable -F input-lines.txt --sort-sub 'asciibetically<r>'
1 two
1 three
1 six
1 seven
2 one
1 nine
1 four
3 five
2 eight
# sort by keys, randomly using perl code (essentially, shuffling)
% freqtable -F input-lines.txt --sort-sub 'by_perl_code' --sort-arg 'code=int(rand()*3)-1'
3 five
1 three
2 eight
1 seven
2 one
1 six
1 nine
1 two
1 four
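For readers unfamiliar with the term, sorting keys "asciibetically" can be approximated in plain Perl with the C<cmp> operator. The sketch below reuses the counts from the example output above; C<keys_asciibetically> is a hypothetical helper and does not use L<Sort::Sub>, so it only illustrates the ordering, not how freqtable implements it.

```perl
use strict;
use warnings;

# Counts copied from the example output above.
my %freq = (
    eight => 2, five => 3, four  => 1, nine => 1, one => 2,
    seven => 1, six  => 1, three => 1, two  => 1,
);

# Hypothetical helper (not from Sort::Sub): return the keys in ascending
# string order, or descending order when $reverse is true.
# Call it in list context.
sub keys_asciibetically {
    my ($freq, $reverse) = @_;
    my @keys = sort { $a cmp $b } keys %$freq;
    return $reverse ? reverse(@keys) : @keys;
}

# Ascending, like --sort-sub asciibetically
printf "%d %s\n", $freq{$_}, $_ for keys_asciibetically(\%freq);

# Descending, like --sort-sub 'asciibetically<r>'
printf "%d %s\n", $freq{$_}, $_ for keys_asciibetically(\%freq, 1);
```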
=head2 Running table (`--output-every` option)
If you have streaming input, you can instruct `freqtable` to print the result
periodically, after every given number of input lines/words/characters/bytes.
You can also have it clear the terminal screen before each output
(`--clear-before-output`).
% perl -MArray::Sample::WeightedRandom=sample_weighted_random_with_replacement \
-E'say sample_weighted_random_with_replacement(
[ ["a", 1], ["b", 2], ["c", 3], ["d",5] ], 1) while 1' | \
freqtable --output-every 10000 --clear --percent
Sample output:
45.43% d
27.28% c
18.20% b
9.10% a
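The idea of a running table can be sketched in a few lines of Perl: keep a hash of counts and hand a snapshot to a callback after every N items. This is only an illustration of the technique; C<running_counts> and its arguments are hypothetical names, and freqtable's actual implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch (not freqtable's code): count items one at a time and
# call $emit with a copy of the counts after every $every items.
# Returns a hash ref with the final counts.
sub running_counts {
    my ($every, $emit, @items) = @_;
    my %freq;
    my $seen = 0;
    for my $item (@items) {
        $freq{$item}++;
        $emit->({ %freq }) if ++$seen % $every == 0;
    }
    return \%freq;
}

# Print a snapshot after every 3 items of some made-up input.
running_counts(
    3,
    sub {
        my ($snap) = @_;
        print join(", ", map { "$_=$snap->{$_}" } sort keys %$snap), "\n";
    },
    qw(a b b c d d d b c),
);
```

Passing a copy of the hash to the callback keeps each snapshot independent of later updates, at the cost of copying the counts on every emit.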
=head1 DESCRIPTION
This utility counts the occurrences of lines (or words/characters) in the input,
then displays each unique line along with its number of occurrences. You can
also instruct it to only show lines that have a specified number of occurrences.
You can use the following Unix command to count occurrences of lines:
% sort input-lines.txt | uniq -c | sort -nr
and with a bit more work you can also use a combination of existing Unix
commands to count occurrences of words/characters, as well as filter items that
have a specified number of occurrences; freqtable basically offers convenience.
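The counting itself is simple enough to sketch in plain Perl, which may help clarify what both the pipeline above and freqtable compute. The helper names and the sample input below are made up for illustration, and breaking ties by key is an assumption rather than documented freqtable behavior.

```perl
use strict;
use warnings;

# Count how often each item occurs; returns a hash ref of item => count.
sub count_items {
    my (@items) = @_;
    my %freq;
    $freq{$_}++ for @items;
    return \%freq;
}

# Return the keys ordered by descending count. Ties are broken by key so
# the output is deterministic (an assumption made for this sketch).
sub by_frequency {
    my ($freq) = @_;
    return sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq;
}

# Made-up sample input, one element per input line.
my @lines = qw(five one eight five two one eight five);
my $freq  = count_items(@lines);
printf "%d %s\n", $freq->{$_}, $_ for by_frequency($freq);
```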
=head1 EXIT CODES
0 on success.
255 on I/O error.
99 on command-line options error.
=head1 OPTIONS
=over
=item * --bytes, -c
=item * --chars, -m
=item * --words, -w
=item * --lines, -l
=item * --number, -n
Treat each line as a number. A line like this:
9.99 cents
will be regarded as:
9.99