Algorithm-RabinKarp
view release on metacpan or search on metacpan
lib/Algorithm/RabinKarp.pm view on Meta::CPAN
sub next {
my $self = shift;
# assume, for now, that each value is an integer, or can
# auto cast to char
my $values = $self->{vals}; #assume that @values always contains k values
my $prev = shift @$values || [0, undef];
my $hash = $self->{hash} || 0;
while (@$values < $self->{k}) {
my $nextval = [$self->{stream}->()];
return unless @$nextval;
push @$values, $nextval;
$hash <<= 1;
$hash -= $prev->[0] << $self->{k};
$hash += $nextval->[0];
}
$self->{hash} = $hash;
return $hash, $values->[0][1], $values->[-1][1], @{ $values }[0, -1];
}
=item values
Returns an array containing all C<n - k + 1> hash values contained
within the data stream, and the positions associated with them (in the same
format as yielded by L<next|/METHODS>.)
After calling C<values()> the stream will be completely exhausted, causing
subsequent calls to C<values> and C<next()> to return C<undef>.
NOTE: You should use C<next> if your source stream is infinite, as values
will greedily attempt to consume all values.
=cut
sub values {
my $self = shift;
my @values;
while (my @next = $self->next()) {
push @values, \@next;
}
return @values;
}
=back
=cut
=head1 BUGS
The current multipliers and modulus lead to very poor hash
distributions. I'll investigate methods of improving this
in future versions.
=head1 SEE ALSO
"Winnowing: Local Algorithms for Document Fingerprinting"
L<http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf>
Wikipedia: Rabin-Karp string search algorithm
L<http://en.wikipedia.org/wiki/Rabin-Karp>
=head1 AUTHOR
Norman Nunley E<lt>nnunley@gmail.comE<gt>
Nicholas Clark (Who paired with me)
=cut
1;
( run in 2.227 seconds using v1.01-cache-2.11-cpan-5a3173703d6 )