Algorithm-AhoCorasick-XS
Other modules produce false negatives when given overlapping substrings,
segfault, or cannot tell you precisely which substrings matched.
# FUNCTIONS
- new ( ARRAYREF\_OF\_SUBSTRINGS )
Constructs a matcher object given an arrayref of substrings. Builds the internal
automaton.
- matches ( INPUT )
Given a string, returns a list of the substrings which are present in the input.
There may be duplicates if a substring occurs more than once.
- unique\_matches ( INPUT )
As above but runs `uniq` on the list for you.
- first\_match ( INPUT )
Returns the first match only (or undef if none). This is efficient - the matcher
will stop once it encounters the first match, and the rest of the string will be
ignored.
- match\_details ( INPUT )
Returns a list of hashrefs, containing the keys `word`, `start` and `end`.
Each hashref describes one occurrence of a substring: the word, and its start and
end offsets within the string.
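
The functions above can be exercised with a short script. This is a minimal sketch that assumes `Algorithm::AhoCorasick::XS` is installed. The word list, the input string and the `demo_matcher` wrapper are illustrative and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not part of the module): builds a matcher and
# collects the result of each documented method for one input string.
sub demo_matcher {
    my ($words, $input) = @_;
    require Algorithm::AhoCorasick::XS;    # assumed to be installed
    my $ac    = Algorithm::AhoCorasick::XS->new($words);
    my $first = $ac->first_match($input);  # undef when nothing matches
    return {
        all     => [ $ac->matches($input) ],         # may contain duplicates
        unique  => [ $ac->unique_matches($input) ],  # duplicates removed
        first   => $first,
        details => [ $ac->match_details($input) ],   # hashrefs: word, start, end
    };
}

# Illustrative data: overlapping substrings in a single input.
my $result = eval { demo_matcher([qw(he she hers his)], "ahishers") };

if ($result) {
    print "all:    @{ $result->{all} }\n";
    print "unique: @{ $result->{unique} }\n";
    print "first:  ", $result->{first} // '(none)', "\n";
    printf "%s spans offsets %d to %d\n", @{$_}{qw(word start end)}
        for @{ $result->{details} };
} else {
    warn "Algorithm::AhoCorasick::XS is not available: $@";
}
```

Use `first_match` when you only need to know whether anything matched, since it stops scanning at the first hit.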
# ENCODING SUPPORT
The matcher runs at the byte level, so you can use any encoding you like. If you
want to match strings regardless of encoding, I recommend that you encode everything
into UTF-8 and apply NFC normalization (or perhaps NFD).
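
The recommendation above could be sketched as follows, using the core `Encode` and `Unicode::Normalize` modules. The helper name `to_match_bytes` is hypothetical and not part of this distribution. You would apply it to both the substrings passed to `new` and the input passed to the matching functions.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper (not part of the module): normalize a character
# string to NFC, then encode it to UTF-8 bytes for byte-level matching.
sub to_match_bytes {
    my ($text) = @_;
    return encode('UTF-8', NFC($text));
}

my $composed   = "caf\x{E9}";     # e-acute as a single code point
my $decomposed = "cafe\x{301}";   # 'e' followed by a combining acute accent

# After normalization and encoding, both spellings yield the same bytes,
# so a byte-level matcher would treat them as the same substring.
my $same = to_match_bytes($composed) eq to_match_bytes($decomposed);
print $same ? "identical bytes\n" : "different bytes\n";
```

Without the normalization step, the two spellings would encode to different byte sequences and one would not match the other.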
lib/Algorithm/AhoCorasick/XS.pm
=head1 FUNCTIONS
=over
=item new ( ARRAYREF_OF_SUBSTRINGS )
Constructs a matcher object given an arrayref of substrings. Builds the internal
automaton.
=item matches ( INPUT )
Given a string, returns a list of the substrings which are present in the input.
There may be duplicates if a substring occurs more than once.
=item unique_matches ( INPUT )
As above but runs C<uniq> on the list for you.
=item first_match ( INPUT )
Returns the first match only (or undef if none). This is efficient - the matcher
will stop once it encounters the first match, and the rest of the string will be
ignored.
=item match_details ( INPUT )
Returns a list of hashrefs, containing the keys C<word>, C<start> and C<end>.
Each hashref describes one occurrence of a substring: the word, and its start and
end offsets within the string.
=back
=head1 ENCODING SUPPORT
The matcher runs at the byte level, so you can use any encoding you like. If you
perlobject.map
# "perlobject.map" Dean Roehrich, version 19960302
#
# TYPEMAPs
#
# HV * -> unblessed Perl HV object.
# AV * -> unblessed Perl AV object.
#
# INPUT/OUTPUT maps
#
# O_* -> opaque blessed objects
# T_* -> opaque blessed or unblessed objects
#
# O_OBJECT -> link an opaque C or C++ object to a blessed Perl object.
# T_OBJECT -> link an opaque C or C++ object to an unblessed Perl object.
# O_HvRV -> a blessed Perl HV object.
# T_HvRV -> an unblessed Perl HV object.
# O_AvRV -> a blessed Perl AV object.
# T_AvRV -> an unblessed Perl AV object.
TYPEMAP
HV * T_HvRV
AV * T_AvRV
######################################################################
OUTPUT
# The Perl object is blessed into 'CLASS', which should be a
# char* having the name of the package for the blessing.
O_OBJECT
sv_setref_pv( $arg, CLASS, (void*)$var );
T_OBJECT
sv_setref_pv( $arg, Nullch, (void*)$var );
# Cannot use sv_setref_pv() because that will destroy
# the AV-ness of the object. Remember that newRV() will increment
# the refcount.
O_AvRV
$arg = sv_bless( newRV((SV*)$var), gv_stashpv(CLASS,1) );
T_AvRV
$arg = newRV((SV*)$var);
######################################################################
INPUT
O_OBJECT
if( sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG) )
$var = ($type)SvIV((SV*)SvRV( $arg ));
else{
warn( \"${Package}::$func_name() -- $var is not a blessed SV reference\" );
XSRETURN_UNDEF;
}
T_OBJECT
PTR2ul|5.007001||p
PTRV|5.006000||p
PUSHMARK|||
PUSH_MULTICALL||5.024000|
PUSHi|||
PUSHmortal|5.009002||p
PUSHn|||
PUSHp|||
PUSHs|||
PUSHu|5.004000||p
PUTBACK|||
PadARRAY||5.024000|
PadMAX||5.024000|
PadlistARRAY||5.024000|
PadlistMAX||5.024000|
PadlistNAMESARRAY||5.024000|
PadlistNAMESMAX||5.024000|
PadlistNAMES||5.024000|
PadlistREFCNT||5.017004|
PadnameIsOUR|||
PadnameIsSTATE|||
{
dSP;
SV* sv = newSVpv(p, 0);
PUSHMARK(sp);
eval_sv(sv, G_SCALAR);
SvREFCNT_dec(sv);
SPAGAIN;
sv = POPs;
PUTBACK;
if (croak_on_error && SvTRUE(GvSV(errgv)))
croak(SvPVx(GvSV(errgv), na));
return sv;
}
#endif
#endif
TYPEMAP
AhoCorasick::Matcher* O_OBJECT
string T_STD_STRING
vector<string> T_STD_VECTOR_STRING
vector<AhoCorasick::match> T_STD_VECTOR_MATCH
INPUT
T_STD_STRING
if (!SvOK($arg)) {
$var = std::string();
} else {
$var = std::string(SvPV_nolen($arg), SvCUR($arg));
}
// From ExtUtils::Typemap::STL::Vector.
// Accept an arrayref
T_STD_VECTOR_STRING
}
else
${var}[i] = std::string(\"\");
}
}
else
Perl_croak(aTHX_ \"%s: %s is not an array reference\",
${$ALIAS?\q[GvNAME(CvGV(cv))]:\qq[\"$pname\"]},
\"$var\");
OUTPUT
T_STD_STRING
$arg = sv_2mortal(newSVpvn($var.c_str(), $var.length()));
// Output a list.
T_STD_VECTOR_STRING
{
U32 ix_$var;
size_t size_$var = $var.size();
SSize_t extend_size =
/* The weird way this is written is because g++ is dumb