App-BloomUtils

 view release on metacpan or  search on metacpan

script/bloomcalc  view on Meta::CPAN

    url => "/App/BloomUtils/bloom_filter_calculator",
    program_name => "bloomcalc",
    log => 1,
    read_config => 0,
    read_env => 0,
);

$cmdline->run;

# ABSTRACT: Shorter alias for bloom-filter-calculator
# PODNAME: bloomcalc

__END__

=pod

=encoding UTF-8

=head1 NAME

bloomcalc - Shorter alias for bloom-filter-calculator

=head1 VERSION

This document describes version 0.007 of bloomcalc (from Perl distribution App-BloomUtils), released on 2020-05-24.

=head1 SYNOPSIS

Usage:

 % bloomcalc [--debug] [--false-positive-rate=s] [--format=name]
     [--fp-rate=s] [--json] [-k=s] [--log-level=level] [-m=s]
     [--(no)naked-res] [--num-bits=s]
     [--num-hashes-to-bits-per-item-ratio=s] [--num-hashes=s] [-p=s]
     [--page-result[=program]] [--quiet] [--trace] [--verbose] <num_items>

=head1 DESCRIPTION

You supply lines of text from STDIN and it will output the bloom filter bits on
STDOUT. You can also customize C<num_bits> (C<m>) and C<num_hashes> (C<k>), or, more
easily, C<num_items> and C<fp_rate>. Some rules of thumb to remember:

=over

=item * One byte per item in the input set gives about a 2% false positive rate. So if
you expect two have 1024 elements, create a 1KB bloom filter with about 2%
false positive rate. For other false positive rates:

10%    -  4.8 bits per item
 1%    -  9.6 bits per item
 0.1%  - 14.4 bits per item
 0.01% - 19.2 bits per item

=item * Optimal number of hash functions is 0.7 times number of bits per item. Note
that the number of hashes dominate performance. If you want higher
performance, pick a smaller number of hashes. But for most cases, use the the
optimal number of hash functions.

=item * What is an acceptable false positive rate? This depends on your needs. 1% (1
in 100) or 0.1% (1 in 1,000) is a good start. If you want to make sure that
user's chosen password is not in a known wordlist, a higher false positive
rates will annoy your user more by rejecting her password more often, while
lower false positive rates will require a higher memory usage.

=back

Ref: https://corte.si/posts/code/bloom-filter-rules-of-thumb/index.html

B<FAQ>

=over

=item * Why does two different false positive rates (e.g. 1% and 0.1%) give the same bloom filter size?

The parameter C<m> is rounded upwards to the nearest power of 2 (e.g. 1024*8
bits becomes 1024*8 bits but 1025*8 becomes 2048*8 bits), so sometimes two
false positive rates with different C<m> get rounded to the same value of C<m>.
Use the C<bloom_filter_calculator> routine to see the C<actual_m> and C<actual_p>
(actual false-positive rate).

=back

=head1 OPTIONS

C<*> marks required options.

=head2 Main options

=over

=item B<--false-positive-rate>=I<s>, B<-p>, B<--fp-rate>

Default value:

 0.02

=item B<--num-bits>=I<s>, B<-m>

Number of bits to set for the bloom filter.

=item B<--num-hashes-to-bits-per-item-ratio>=I<s>

0.7 (the default) is optimal.

=item B<--num-hashes>=I<s>, B<-k>

=item B<--num-items>=I<s>*, B<-n>

Expected number of items to add to bloom filter.

=back

=head2 Logging options

=over

=item B<--debug>

Shortcut for --log-level=debug.

=item B<--log-level>=I<s>



( run in 1.420 second using v1.01-cache-2.11-cpan-39bf76dae61 )