Algorithm-RabinKarp
integer values (or characters), you will get back n - k + 1 hash
values.
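The n - k + 1 count can be checked with a small pure-Perl sketch of a rolling hash. This is only an illustration and not this module's implementation: the helper name C<rolling_hashes>, the base of 256 and the modulus of 1_000_003 are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::RabinKarp: returns one hash
# per window of $k consecutive values, i.e. n - k + 1 hashes for n values.
sub rolling_hashes {
    my ($k, @values) = @_;
    my ($base, $mod) = (256, 1_000_003);    # assumed constants
    return () if $k < 1 || @values < $k;

    # $high is $base ** ($k - 1) mod $mod, the weight of the outgoing value.
    my $high = 1;
    $high = ($high * $base) % $mod for 2 .. $k;

    # Hash of the first window, computed directly.
    my $hash = 0;
    $hash = ($hash * $base + $_) % $mod for @values[0 .. $k - 1];

    my @hashes = ($hash);
    for my $i ($k .. $#values) {
        # Remove the oldest value, shift the window, add the newest value.
        my $out = ($values[$i - $k] * $high) % $mod;
        $hash = ((($hash - $out + $mod) % $mod) * $base + $values[$i]) % $mod;
        push @hashes, $hash;
    }
    return @hashes;
}

my @codes  = map { ord($_) } split //, 'abracadabra';
my @hashes = rolling_hashes(4, @codes);
print scalar(@hashes), " hashes for ", scalar(@codes), " values\n";
```

Because each hash is derived from the previous one in constant time, identical windows (here the two occurrences of "abra") produce identical hash values without rehashing the whole window.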
For best results, you will want to create a code generator that filters
your data to remove all unnecessary information. For example, in a large
English document, you should probably remove all whitespace and fold all
text to a single case.
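A filtering generator of this kind might be sketched as a closure that returns one integer per call. The name C<make_filtered_stream> is hypothetical, and returning C<undef> at the end of the stream is an assumption of this example rather than documented behaviour of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps a string in a callback that yields one
# character code per call, skipping whitespace and folding case.
sub make_filtered_stream {
    my ($text) = @_;
    my $pos = 0;
    return sub {
        while ($pos < length($text)) {
            my $char = substr($text, $pos++, 1);
            next if $char =~ /\s/;       # drop all whitespace
            return ord(lc($char));       # ignore capitalization
        }
        return;                          # assumed end-of-stream signal
    };
}

my $stream = make_filtered_stream("Hello  World");
my @codes;
while (defined(my $code = $stream->())) {
    push @codes, $code;
}
print join('', map { chr($_) } @codes), "\n";    # prints "helloworld"
```

With this filter, "Hello  World" and "helloworld" yield the same stream of integers, so they would hash identically.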
=head1 INTENT
Preprocessing your document with the Rabin-Karp hashing algorithm
makes it possible to create a "fingerprint" of your document (or documents),
and then perform multiple searches for fragments contained within your document
database.
Schleimer, Wilkerson, and Aiken suggest preprocessing to remove
unnecessary information (like whitespace), as well as known redundant information
(such as copyright notices or other boilerplate that is 'acceptable').
They also suggest a post-processing pass to reduce data volume, using a technique
called winnowing (see the link at the end of this documentation).
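As a rough illustration of the winnowing idea, the sketch below slides a window of C<$w> hashes over the list, keeps the smallest hash in each window (the rightmost one on ties), and records each selected position only once. This reflects one common reading of the technique, not code from this module; the helper name C<winnow> and the sample hash values are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: for every window of $w consecutive hashes, select
# the minimum (rightmost on ties) and record each position only once.
# Returns a list of [hash, position] pairs.
sub winnow {
    my ($w, @hashes) = @_;
    return () if $w < 1;
    my @selected;
    my $last = -1;                       # position of the previous pick
    for my $start (0 .. @hashes - $w) {
        my $min = $start;
        for my $i ($start + 1 .. $start + $w - 1) {
            # "<=" makes a later equal value win, i.e. rightmost on ties.
            $min = $i if $hashes[$i] <= $hashes[$min];
        }
        next if $min == $last;           # same pick as the previous window
        push @selected, [ $hashes[$min], $min ];
        $last = $min;
    }
    return @selected;
}

my @hashes = (77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98);
my @prints = winnow(4, @hashes);
print join(' ', map { "$_->[0]\@$_->[1]" } @prints), "\n";
# prints "17@3 17@6 8@8 39@11 17@15"
```

Here 17 input hashes are reduced to 5 fingerprints, while any run of at least C<$w> consecutive hashes is still guaranteed to contribute at least one selected value.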
=head1 METHODS
=over
=item new($k, [FileHandle|Scalar|Coderef] )
Creates a new hash generator. If you provide a callback function, it must
return the next integer value in the stream. Additionally, you may