Algorithm-NGram

lib/Algorithm/NGram.pm

=cut

=item token_table

Returns the n-gram table.

=cut

=item add_text

Splits a block of text up by whitespace and processes each word as a
token. Automatically calls C<add_start_token()> at the beginning of
the text and C<add_end_token()> at the end.
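
As a rough illustration of the behavior described above, here is a minimal
standalone sketch. It does not call this module: C<tokenize_with_markers> and
the C<< <start> >> / C<< <end> >> marker strings are hypothetical stand-ins,
since the real start and end tokens are not shown in this excerpt.

```perl
use strict;
use warnings;

# Hypothetical marker values, chosen only for this illustration.
my $START = '<start>';
my $END   = '<end>';

# Split on runs of whitespace, drop empty fields, and wrap the token
# list in start/end markers, mirroring the description of add_text.
sub tokenize_with_markers {
    my ($text) = @_;

    # A leading whitespace run yields one empty leading field; the
    # length check removes it while keeping tokens such as "0".
    my @tokens = grep { length } split /\s+/, $text;

    return ($START, @tokens, $END);
}

my @toks = tokenize_with_markers("  the quick\tbrown\nfox 0 ");
print join('|', @toks), "\n";   # prints <start>|the|quick|brown|fox|0|<end>
```

Filtering with C<length> rather than plain truthiness matters here, because a
token consisting of the single character C<0> is false in Perl and would
otherwise be dropped.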

=cut

# process a block of text, auto-tokenizing it
sub add_text {
    my ($self, $text) = @_;

    $self->add_start_token;

    # tokenize text on runs of whitespace, as documented above
    foreach my $tok (split(/\s+/, $text)) {
        # skip empty fields (e.g. from leading whitespace) but keep "0"
        next unless length $tok;
