Algorithm-NGram
lib/Algorithm/NGram.pm
=cut
=item token_table
Returns the n-gram table.
=cut
=item add_text
Splits a block of text up by whitespace and processes each word as a
token. Automatically calls C<add_start_token()> at the beginning of
the text and C<add_end_token()> at the end.
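
The tokenizing behaviour described above can be illustrated with a minimal,
self-contained sketch. It does not use this module: C<tokenize_text> is a
hypothetical helper, and the C<< <s> >> and C<< </s> >> strings are placeholder
markers standing in for whatever C<add_start_token()> and C<add_end_token()>
record internally.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the tokenizing step of add_text();
# this is an illustration, not the module's own code.
sub tokenize_text {
    my ($text) = @_;

    # split(' ', ...) is Perl's "awk mode": it splits on runs of any
    # whitespace and discards leading empty fields.
    my @tokens = split(' ', $text);

    # '<s>' and '</s>' are assumed placeholder markers for the start
    # and end tokens; the real module may represent them differently.
    return ('<s>', @tokens, '</s>');
}

my @tokens = tokenize_text("  the quick\tbrown\nfox 0 ");
print join('|', @tokens), "\n";   # prints: <s>|the|quick|brown|fox|0|</s>
```

Note that the token C<0> survives here because no truthiness test is applied
to the tokens; a check such as C<next unless $tok> would silently drop it.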
=cut
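# Process a block of text, tokenizing it on whitespace
sub add_text {
    my ($self, $text) = @_;

    $self->add_start_token;

    # Tokenize on runs of any whitespace, as documented above
    foreach my $tok (split(/\s+/, $text)) {
        # Skip the empty field produced by leading whitespace; a length
        # test (rather than truthiness) keeps "0" as a valid token
        next unless length $tok;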