Algorithm-RabinKarp

 view release on metacpan or  search on metacpan

lib/Algorithm/RabinKarp.pm  view on Meta::CPAN

sub next {
  my $self = shift;

  # assume, for now, that each value is an integer, or can
  # auto cast to char
  my $values = $self->{vals}; #assume that @values always contains k values
  my $prev = shift @$values || [0, undef];
  my $hash = $self->{hash} || 0;
  while (@$values < $self->{k}) {
    my $nextval = [$self->{stream}->()];
    return unless @$nextval;
    push @$values, $nextval;
    $hash <<= 1;
    $hash -= $prev->[0] << $self->{k};
    $hash += $nextval->[0];
    
  }

  $self->{hash} = $hash;
  
  return $hash, $values->[0][1], $values->[-1][1], @{ $values }[0, -1];
}

=item values

Returns an array containing all C<n - k + 1> hash values contained
within the data stream, and the positions associated with them (in the same
format as yielded by L<next|/METHODS>.)

After calling C<values()> the stream will be completely exhausted, causing 
subsequent calls to C<values> and C<next()> to return C<undef>.

NOTE: You should use C<next> if your source stream is infinite, as values
will greedily attempt to consume all values.

=cut

sub values {
  my $self = shift;
  
  my @values;
  while (my @next = $self->next()) {
    push @values, \@next;
  }
  return @values;
}

=back

=cut

=head1 BUGS

The current multipliers and modulus lead to very poor hash
distributions.  I'll investigate methods of improving this
in future versions.

=head1 SEE ALSO

  "Winnowing: Local Algorithms for Document Fingerprinting"
  L<http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf>

  Wikipedia: Rabin-Karp string search algorithm
  L<http://en.wikipedia.org/wiki/Rabin-Karp>

=head1 AUTHOR

  Norman Nunley E<lt>nnunley@gmail.comE<gt>
  Nicholas Clark (Who paired with me)

=cut

1;



( run in 2.227 seconds using v1.01-cache-2.11-cpan-5a3173703d6 )