App-BloomUtils
script/bloomcalc
url => "/App/BloomUtils/bloom_filter_calculator",
program_name => "bloomcalc",
log => 1,
read_config => 0,
read_env => 0,
);
$cmdline->run;
# ABSTRACT: Shorter alias for bloom-filter-calculator
# PODNAME: bloomcalc
__END__
=pod
=encoding UTF-8
=head1 NAME
bloomcalc - Shorter alias for bloom-filter-calculator
=head1 VERSION
This document describes version 0.007 of bloomcalc (from Perl distribution App-BloomUtils), released on 2020-05-24.
=head1 SYNOPSIS
Usage:
% bloomcalc [--debug] [--false-positive-rate=s] [--format=name]
[--fp-rate=s] [--json] [-k=s] [--log-level=level] [-m=s]
[--(no)naked-res] [--num-bits=s]
[--num-hashes-to-bits-per-item-ratio=s] [--num-hashes=s] [-p=s]
[--page-result[=program]] [--quiet] [--trace] [--verbose] <num_items>
=head1 DESCRIPTION
You supply lines of text from STDIN and it will output the bloom filter bits on
STDOUT. You can also customize C<num_bits> (C<m>) and C<num_hashes> (C<k>), or, more
easily, C<num_items> and C<fp_rate>. Some rules of thumb to remember:
=over
=item * One byte per item in the input set gives about a 2% false positive rate. So if
you expect to have 1024 elements, a 1KB bloom filter gives about a 2%
false positive rate. For other false positive rates:
10% - 4.8 bits per item
1% - 9.6 bits per item
0.1% - 14.4 bits per item
0.01% - 19.2 bits per item
=item * The optimal number of hash functions is 0.7 times the number of bits per item.
Note that the number of hashes dominates performance. If you want higher
performance, pick a smaller number of hashes. But for most cases, use the
optimal number of hash functions.
=item * What is an acceptable false positive rate? This depends on your needs. 1% (1
in 100) or 0.1% (1 in 1,000) is a good start. If you want to make sure that a
user's chosen password is not in a known wordlist, a higher false positive
rate will annoy your user more by rejecting her password more often, while a
lower false positive rate will require more memory.
=back
Ref: https://corte.si/posts/code/bloom-filter-rules-of-thumb/index.html
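The rules of thumb above follow from the standard bloom filter sizing formulas. Here is a minimal sketch, assuming the textbook relations C<m/n = -ln(p) / (ln 2)^2> and C<k = (m/n) * ln 2>; the helper names are hypothetical and this is not taken from the App-BloomUtils source. Rounded to one decimal, it reproduces the bits-per-item figures listed above:

```perl
use strict;
use warnings;

# Standard sizing formulas (an assumption; not copied from App-BloomUtils):
#   bits per item   m/n = -ln(p) / (ln 2)^2
#   optimal hashes  k   = (m/n) * ln 2   (ln 2 is about 0.693, hence "0.7")

# Bits per item needed for a target false positive rate $p.
sub bits_per_item {
    my ($p) = @_;
    return -log($p) / (log(2) ** 2);
}

# Optimal number of hash functions for a given bits-per-item budget.
sub optimal_num_hashes {
    my ($bits_per_item) = @_;
    return $bits_per_item * log(2);
}

# Inverse of bits_per_item(): the false positive rate reached with a given
# bits-per-item budget, assuming the optimal number of hashes is used.
sub fp_rate_for_bits_per_item {
    my ($bits_per_item) = @_;
    return exp(-$bits_per_item * (log(2) ** 2));
}

for my $p (0.1, 0.01, 0.001, 0.0001) {
    my $bpi = bits_per_item($p);
    printf "p=%-7g bits/item=%.1f k=%.1f\n",
        $p, $bpi, optimal_num_hashes($bpi);
}

# One byte (8 bits) per item lands close to the 2% rule of thumb.
printf "8 bits/item -> p=%.3f\n", fp_rate_for_bits_per_item(8);
```

The hash counts printed here are fractional; in practice C<k> has to be a whole number, so the achieved rate differs slightly from the target.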
B<FAQ>
=over
=item * Why do two different false positive rates (e.g. 1% and 0.1%) give the same bloom filter size?
The parameter C<m> is rounded up to the next power of 2 (e.g. 1024*8
bits stays 1024*8 bits but 1025*8 becomes 2048*8 bits), so sometimes two
false positive rates with different C<m> get rounded to the same value of C<m>.
Use the C<bloom_filter_calculator> routine to see the C<actual_m> and C<actual_p>
(actual false-positive rate).
=back
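The rounding effect described in the FAQ can be illustrated with a short sketch. It assumes the standard formula C<m = -n ln(p) / (ln 2)^2> for the ideal size and C<p = (1 - exp(-k*n/m))^k> for the resulting rate; the helper names, the item count, and the way C<k> is rounded are all hypothetical, and the real C<bloom_filter_calculator> routine may compute its C<actual_m> and C<actual_p> differently:

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Smallest power of 2 that is >= $m (the rounding rule from the FAQ).
sub round_up_pow2 {
    my ($m) = @_;
    my $pow = 1;
    $pow *= 2 while $pow < $m;
    return $pow;
}

# Ideal number of bits for $n items at false positive rate $p
# (standard formula; an assumption about what the tool does).
sub ideal_num_bits {
    my ($n, $p) = @_;
    return ceil(-$n * log($p) / (log(2) ** 2));
}

# Standard approximation of the false positive rate for m bits,
# n items and k hash functions.
sub fp_rate {
    my ($m, $n, $k) = @_;
    return (1 - exp(-$k * $n / $m)) ** $k;
}

my $n = 1000;    # hypothetical number of items
for my $p (0.01, 0.001) {
    my $m        = ideal_num_bits($n, $p);
    my $actual_m = round_up_pow2($m);
    # Assumption: k is 0.7 * bits per item, rounded to the nearest integer.
    my $k        = int(0.7 * $actual_m / $n + 0.5) || 1;
    printf "requested p=%g: ideal m=%d, rounded m=%d, k=%d, resulting p=%.6f\n",
        $p, $m, $actual_m, $k, fp_rate($actual_m, $n, $k);
}
```

With 1000 items, both requested rates need an ideal C<m> between 8192 and 16384 bits, so both are rounded up to the same power of 2.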
=head1 OPTIONS
C<*> marks required options.
=head2 Main options
=over
=item B<--false-positive-rate>=I<s>, B<-p>, B<--fp-rate>
Default value:
0.02
=item B<--num-bits>=I<s>, B<-m>
Number of bits to set for the bloom filter.
=item B<--num-hashes-to-bits-per-item-ratio>=I<s>
0.7 (the default) is optimal.
=item B<--num-hashes>=I<s>, B<-k>
=item B<--num-items>=I<s>*, B<-n>
Expected number of items to add to bloom filter.
=back
=head2 Logging options
=over
=item B<--debug>
Shortcut for --log-level=debug.
=item B<--log-level>=I<s>