Algorithm-AhoCorasick-XS

README.md

If you pass Unicode strings to the matcher, they will be interpreted as a sequence
of UTF-8 bytes. This means the output of `matches`, `match_details` etc. will also
be in terms of bytes.
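
To see why byte offsets and character offsets drift apart, the sketch below emulates the byte view with core `Encode::encode`. It does not call Algorithm::AhoCorasick::XS; it assumes, as described above, that the matcher scans the UTF-8 encoding of the string. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Character string: "\x{e9}" is e-acute and "\x{e8}" is e-grave
my $text   = "caf\x{e9} cr\x{e8}me";
my $needle = "cr\x{e8}me";

# The UTF-8 byte sequence a byte-oriented matcher would scan
my $bytes = encode('UTF-8', $text);

my $char_pos = index($text, $needle);                     # offset in characters
my $byte_pos = index($bytes, encode('UTF-8', $needle));   # offset in bytes

printf "%d characters, %d bytes\n", length($text), length($bytes);
printf "needle at character %d, byte %d\n", $char_pos, $byte_pos;
```

Each accented letter occupies two bytes in UTF-8, so every offset after one is shifted by one extra byte: the needle starts at character 5 but at byte 6.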

You can call `decode('UTF-8', ...)` from Encode on the matched substrings to get their
Unicode versions. The offsets, however, remain byte offsets; converting them to character
offsets in the Unicode string takes a little more work:

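    use Encode qw(decode);
    use bytes ();  # makes bytes::substr available without turning on byte semantics
    # $start is a byte offset: count the characters in the bytes before it
    my $unicode_start = length(decode('UTF-8', bytes::substr($string, 0, $start)));
    # inclusive end: character start plus the match's character length, minus one
    my $unicode_end   = $unicode_start + length(decode('UTF-8', $word)) - 1;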
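
A self-contained variant of the conversion above, wrapped in a hypothetical helper `char_offsets` (not part of Algorithm::AhoCorasick::XS). Since the module is not loaded here, the match is simulated with `index` on the UTF-8 encoding, assuming that is what the matcher would report. Instead of `bytes::substr`, this sketch encodes the string explicitly, so the prefix is real UTF-8 regardless of how Perl stores `$string` internally.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: convert a byte-based match (byte start offset plus the
# UTF-8 encoded word) into inclusive character offsets in the original string.
sub char_offsets {
    my ($string, $byte_start, $word_bytes) = @_;
    # Take the prefix from an explicit UTF-8 encoding of the string
    my $prefix     = substr(encode('UTF-8', $string), 0, $byte_start);
    my $char_start = length(decode('UTF-8', $prefix));
    my $char_end   = $char_start + length(decode('UTF-8', $word_bytes)) - 1;
    return ($char_start, $char_end);
}

# Simulate what a byte-oriented matcher would report for the word "cr\x{e8}me"
my $string     = "caf\x{e9} cr\x{e8}me";
my $word_bytes = encode('UTF-8', "cr\x{e8}me");
my $byte_start = index(encode('UTF-8', $string), $word_bytes);

my ($start, $end) = char_offsets($string, $byte_start, $word_bytes);
print "byte start $byte_start -> characters $start..$end\n";
print substr($string, $start, $end - $start + 1) eq "cr\x{e8}me" ? "ok\n" : "mismatch\n";
```

This prints `byte start 6 -> characters 5..9` followed by `ok`: the character offsets slice the original string back to the matched word.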

This will be handled for you in a future version.

# CAVEATS

This is an early release and has not been tested thoroughly; use it at your own risk.
The API is subject to change until version 1.0.

If your keyword list contains duplicates, you will get duplicate matches.
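
One way to avoid duplicate matches is to de-duplicate the keyword list before building the matcher. A minimal sketch in plain Perl; the keyword list is made up, and the constructor call in the final comment assumes `new` takes an array reference of keywords.

```perl
use strict;
use warnings;

# Hypothetical keyword list containing repeats
my @keywords = qw(he she his she hers he);

# Keep the first occurrence of each keyword, preserving the original order
my %seen;
my @unique = grep { !$seen{$_}++ } @keywords;

print "@unique\n";   # he she his hers

# Assumption: build the matcher from the de-duplicated list, e.g.
#   my $matcher = Algorithm::AhoCorasick::XS->new(\@unique);
```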

ppport.h

parse_fullexpr||5.013008|
parse_fullstmt||5.013005|
parse_gv_stash_name|||
parse_ident|||
parse_label||5.013007|
parse_listexpr||5.013008|
parse_lparen_question_flags|||
parse_stmtseq||5.013006|
parse_subsignature|||
parse_termexpr||5.013008|
parse_unicode_opts|||
parser_dup|||
parser_free_nexttoke_ops|||
parser_free|||
path_is_searchable|||n
peep|||
pending_ident|||
perl_alloc_using|||n
perl_alloc|||n
perl_clone_using|||n
perl_clone|||n

ppport.h

#endif
#ifndef PERL_PV_PRETTY_DUMP
#  define PERL_PV_PRETTY_DUMP            PERL_PV_PRETTY_ELLIPSES|PERL_PV_PRETTY_QUOTE
#endif

#ifndef PERL_PV_PRETTY_REGPROP
#  define PERL_PV_PRETTY_REGPROP         PERL_PV_PRETTY_ELLIPSES|PERL_PV_PRETTY_LTGT|PERL_PV_ESCAPE_RE
#endif

/* Hint: pv_escape
 * Note that unicode functionality is only backported to
 * those perl versions that support it. For older perl
 * versions, the implementation will fall back to bytes.
 */

#ifndef pv_escape
#if defined(NEED_pv_escape)
static char * DPPP_(my_pv_escape)(pTHX_ SV * dsv, char const * const str, const STRLEN count, const STRLEN max, STRLEN * const escaped, const U32 flags);
static
#else
extern char * DPPP_(my_pv_escape)(pTHX_ SV * dsv, char const * const str, const STRLEN count, const STRLEN max, STRLEN * const escaped, const U32 flags);


