Algorithm-RabinKarp
lib/Algorithm/RabinKarp.pm
Schleimer, Wilkerson, and Aiken. Following the suggestion of Schleimer,
I am using their second equation:
$H[ $c[2..$k + 1] ] = (( $H[ $c[1..$k] ] - $c[1] ** $k ) + $c[$k+1] ) * $k
The result of this hash encodes information about the next k values in
the stream (hence k-gram). This means that for any stream of n integer
values (or characters), you will get back n - k + 1 hash values.
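To make the rolling update and the n - k + 1 count concrete, here is a minimal sketch. It is not this module's implementation: the helper name kgram_hashes, the base of 256, and the modulus 1_000_000_007 are all assumptions chosen for illustration. It follows the recurrence as I read it in Schleimer, Wilkerson, and Aiken, with an explicit base b where the formula above writes $k.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Algorithm::RabinKarp's API.
# Hashes every k-gram of @codes as
#   c1*b**k + c2*b**(k-1) + ... + ck*b   (mod $mod)
# and rolls from one window to the next instead of recomputing.
sub kgram_hashes {
    my ($k, $base, $mod, @codes) = @_;
    return () if $k < 1 || @codes < $k;

    # b**k mod m: the weight carried by the value leaving the window.
    my $high = 1;
    $high = ($high * $base) % $mod for 1 .. $k;

    # Hash of the first window, built with Horner's rule.
    my $hash = 0;
    $hash = (($hash + $_) * $base) % $mod for @codes[0 .. $k - 1];

    my @hashes = ($hash);
    for my $i ($k .. $#codes) {
        my $out = $codes[$i - $k];    # value sliding out of the window
        # Drop the outgoing term, add the incoming value, shift by the base.
        # Perl's % returns a non-negative result for a positive modulus.
        $hash = ((($hash - $out * $high) + $codes[$i]) * $base) % $mod;
        push @hashes, $hash;
    }
    return @hashes;
}

my @codes = map { ord } split //, "abcabc";
my @h = kgram_hashes(3, 256, 1_000_000_007, @codes);
print scalar(@h), "\n";    # 6 - 3 + 1 = 4 hash values
```

In this sketch the first and last values of @h are equal, because both windows cover the k-gram "abc". That equality is what makes the hashes usable for matching fragments.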
For best results, you will want to create a code generator that filters
your data to remove all unnecessary information. For example, in a large
English document, you should probably remove all whitespace and fold
all text to lower case.
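As an illustration of that kind of filtering, the sketch below strips whitespace and folds case before turning the text into integer codes. The name filtered_codes is hypothetical, and how such a filter is handed to the module is not shown here. Treat it as one possible shape rather than the module's own interface.

```perl
use strict;
use warnings;

# Hypothetical filter, NOT part of Algorithm::RabinKarp's API.
# Lower-cases the text, removes all whitespace, and returns one
# integer code per remaining character.
sub filtered_codes {
    my ($text) = @_;
    (my $clean = lc $text) =~ s/\s+//g;
    return map { ord } split //, $clean;
}

print join(",", filtered_codes("A b\nC")), "\n";    # 97,98,99
```

With this filter, "A b C" and "abc" produce the same codes, so differences in spacing or capitalization no longer change the resulting hashes.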
=head1 INTENT
Preprocessing your document with the Rabin-Karp hashing algorithm
makes it possible to create a "fingerprint" of your document (or documents),
and then perform multiple searches for fragments contained within your document
database.