Algorithm-RabinKarp

lib/Algorithm/RabinKarp.pm

Schleimer, Wilkerson, and Aiken.  Following the suggestion of Schleimer,
I am using their second equation:

  $H[ $c[2..$k + 1] ] = (( $H[ $c[1..$k] ] - $c[1] * $b ** $k ) + $c[$k+1] ) * $b

Here $b is the base of the hash and $k is the k-gram length.

The result of this hash encodes information about the next k values in
the stream (hence k-gram). This means that for any given stream of n
integer values (or characters), you will get back n - k + 1 hash
values.
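
The rolling update can be sketched in plain Perl as follows. This is a
minimal illustration, not the module's implementation: the helper name
C<kgram_hashes>, the choice of base 2, and the absence of a modulus are
all assumptions made for the example. It uses the recurrence in the form
with a separate base b, H' = (H - c1 * b**k + c_new) * b.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::RabinKarp's API.
# Returns the n - k + 1 rolling hash values for a list of integers.
sub kgram_hashes {
    my ($k, $base, @values) = @_;
    return () if $k < 1 || @values < $k;

    # Hash of the first window: c1*b**k + c2*b**(k-1) + ... + ck*b
    my $hash = 0;
    $hash = ($hash + $values[$_]) * $base for 0 .. $k - 1;

    my @hashes = ($hash);
    my $high   = $base ** $k;    # weight of the oldest value in the window
    for my $i ($k .. $#values) {
        # Drop the oldest value, add the newest, then shift one position.
        # No modulus is applied here, so a large k or base can exceed
        # integer precision; this sketch is meant for small inputs only.
        $hash = ($hash - $values[$i - $k] * $high + $values[$i]) * $base;
        push @hashes, $hash;
    }
    return @hashes;
}

my @stream = map { ord } split //, 'abcde';
my @hashes = kgram_hashes(3, 2, @stream);
printf "%d values, %d hashes\n", scalar @stream, scalar @hashes;
```

With five input values and k = 3 the sketch yields three hashes, which
matches the n - k + 1 count described above.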

For best results, you will want to create a code generator that filters
your data to remove all unnecessary information. For example, in a large
English document, you should probably remove all whitespace and fold
all text to a single case.
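
One way such a filtering generator might look is sketched below. The
name C<normalized_stream> and the calling convention (a closure that
returns one integer per call and undef when exhausted) are assumptions
for illustration; check the module's own documentation for the
interface it actually expects.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that yields one code point per
# call, with whitespace removed and all letters folded to lower case.
sub normalized_stream {
    my ($text) = @_;
    my @chars = grep { !/\s/ } split //, lc $text;
    my $pos   = 0;
    return sub {
        return if $pos >= @chars;    # undef signals end of stream
        return ord $chars[$pos++];
    };
}

my $next = normalized_stream("The Quick\n brown FOX");
my @values;
while (defined(my $v = $next->())) {
    push @values, $v;
}
print join('', map { chr } @values), "\n";
```

Normalizing before hashing means that two passages differing only in
spacing or capitalization produce the same sequence of k-gram hashes.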

=head1 INTENT

Preprocessing your document with the Rabin-Karp hashing algorithm
makes it possible to create a "fingerprint" of your document (or documents)
and then perform multiple searches for fragments contained within your
document database.
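
The fingerprint-then-search idea can be sketched as follows. Everything
here is hypothetical and independent of the module's API: the helper
names, the sample text, k = 4, and base 2 are choices made for the
example. The index maps each k-gram hash to the offsets where it occurs,
and each candidate offset is verified by direct comparison because
different k-grams can share a hash.

```perl
use strict;
use warnings;

# Hypothetical helper: rolling k-gram hashes over a string's code points.
sub kgram_hashes {
    my ($k, $base, $text) = @_;
    my @c = map { ord } split //, $text;
    return () if $k < 1 || @c < $k;
    my $hash = 0;
    $hash = ($hash + $c[$_]) * $base for 0 .. $k - 1;
    my @hashes = ($hash);
    my $high   = $base ** $k;
    for my $i ($k .. $#c) {
        $hash = ($hash - $c[$i - $k] * $high + $c[$i]) * $base;
        push @hashes, $hash;
    }
    return @hashes;
}

my ($k, $base) = (4, 2);
my $document = 'thequickbrownfoxjumpsoverthelazydog';

# Fingerprint: map each k-gram hash to the offsets where it occurs.
my %index;
my @doc_hashes = kgram_hashes($k, $base, $document);
push @{ $index{ $doc_hashes[$_] } }, $_ for 0 .. $#doc_hashes;

# Search: look up the fragment's first k-gram, then verify each candidate.
sub find_fragment {
    my ($fragment) = @_;
    my ($first) = kgram_hashes($k, $base, $fragment);
    return () unless defined $first;    # fragment shorter than k
    return grep { substr($document, $_, length $fragment) eq $fragment }
           @{ $index{$first} || [] };
}

print join(',', find_fragment('brownfox')), "\n";
```

The index is built once and can then serve any number of fragment
lookups, which is the point of fingerprinting the document up front.
Fragments shorter than k cannot be looked up in this sketch.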
