Algorithm-AhoCorasick

 view release on metacpan or  search on metacpan

lib/Algorithm/AhoCorasick.pm  view on Meta::CPAN

         print "$pos: $keywords\n";
     }
 }

=head1 DESCRIPTION

Aho-Corasick is a classic (1975) algorithm for locating elements of a
finite set of strings within an input text. It constructs a finite
state machine from a list of keywords, then uses the machine to locate
all occurrences of the keywords. Construction of the machine takes
time proportional to the sum of the lengths of the keywords and the
machine processes the input string in a single pass - that is, the
algorithm may be considerably more efficient than searching for each
keyword separately.

=head1 PROCEDURAL INTERFACE

The module exports 2 functions for the common use cases: C<find_all>
for finding all matches, and C<find_first> for finding whether a match
exists at all. Note that both functions must be explicitly imported
(i.e. with C<use Algorithm::AhoCorasick qw(find_all find_first);>)
before they can be called. Both functions take the same arguments: the
first argument is the text to be searched, the following are the
keywords to search for (there must be at least one, and the functions
die rather than search for empty strings).

=head2 find_all

When no keyword is found in the input text, C<find_all> returns
undef; when some keywords are found, the return value is a hash
reference mapping positions to keywords (in an array reference,
ordered by length) found at those positions.

=head2 find_first

When no keyword is found in the input text, C<find_first> returns
undef in scalar context and an empty array in list context; when a
keyword is found, the return value is a pair of its position in the
input text and the found keyword (as a list if the function has been
called in list context, as an array reference otherwise).

=head1 OBJECT-ORIENTED INTERFACE

lib/Algorithm/AhoCorasick/SearchMachine.pm  view on Meta::CPAN

    }

    $self->{root}->failure($self->{root});
    $self->{state} = $self->{root};
}

sub feed {
    my ($self, $text, $callback) = @_;

    my $index = 0;
    my $l = length($text);
    while ($index < $l) {
	my $trans = undef;
	while (1) {
	    $trans = $self->{state}->get_transition(substr($text, $index, 1));
	    last if ($self->{state} == $self->{root}) || $trans;
	    $self->{state} = $self->{state}->failure;
	}

	if ($trans) {
	    $self->{state} = $trans;
	}

	foreach my $found ($self->{state}->results) {
	    my $rv = &$callback($index - length($found) + 1, $found);
	    if ($rv) {
		return $rv;
	    }
	}

	++$index;
    }

    return undef;
}



( run in 0.710 second using v1.01-cache-2.11-cpan-65fba6d93b7 )