Algorithm-AhoCorasick-XS


README.md

Other modules return false negatives when given overlapping substrings, segfault,
or cannot tell you precisely which substrings matched.

# FUNCTIONS

- new ( ARRAYREF\_OF\_SUBSTRINGS )

    Constructs a matcher object given an arrayref of substrings. Builds the internal
    automaton.

- matches ( INPUT )

    Given a string, returns a list of the substrings which are present in the input.
    There may be duplicates if a substring occurs more than once.

- unique\_matches ( INPUT )

    As `matches`, but runs `uniq` on the list for you, so each substring is
    returned at most once.

- first\_match ( INPUT )

    Returns the first match only (or undef if none). This is efficient - the matcher
    will stop once it encounters the first match, and the rest of the string will be
    ignored.

- match\_details ( INPUT )

    Returns a list of hashrefs, each containing the keys `word`, `start` and `end`.
    Each hashref describes one occurrence of a substring: the word itself and its
    start and end offsets within the input string.
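
A minimal usage sketch of the functions listed above. The word list and input
string are illustrative only, and calling `new` as a class method on
`Algorithm::AhoCorasick::XS` is assumed from the module path rather than shown
in this excerpt. No expected output is given here; run it to see the exact
offsets that `match_details` reports.

```perl
use strict;
use warnings;
use Algorithm::AhoCorasick::XS;

# Build the automaton once, then reuse it for as many inputs as needed.
# (Illustrative word list; several words overlap on purpose.)
my $matcher = Algorithm::AhoCorasick::XS->new([qw(he she hers his)]);

my $input = "ahishers";

# Every occurrence, duplicates included.
my @all = $matcher->matches($input);
print "all: @all\n";

# Each distinct substring at most once.
my @unique = $matcher->unique_matches($input);
print "unique: @unique\n";

# Stops scanning at the first hit; undef when nothing matches.
my $first = $matcher->first_match($input);
print "first: ", (defined $first ? $first : "(none)"), "\n";

# One hashref per occurrence, carrying the word and its offsets.
for my $m ($matcher->match_details($input)) {
    printf "%s: start=%d end=%d\n", $m->{word}, $m->{start}, $m->{end};
}
```

Building the matcher is the expensive step, so it is usually worth keeping the
object around instead of reconstructing it for each input.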

# ENCODING SUPPORT

The matcher runs at the byte level, so you can use any encoding you like, as long
as the substrings and the input share it. If you want to match strings regardless
of encoding, I recommend that you encode everything into UTF-8 and apply NFC
normalization (or perhaps NFD).
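
The recommendation above might be sketched as follows, using the core `Encode`
and `Unicode::Normalize` modules. The helper name `to_match_bytes` is
hypothetical and not part of this distribution; it only shows one way to
prepare both the substrings and the input before handing them to the matcher.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper: turn a decoded Perl string into NFC-normalized
# UTF-8 bytes, suitable for a byte-level matcher.
sub to_match_bytes {
    my ($text) = @_;
    return encode('UTF-8', NFC($text));
}

my $composed   = "caf\x{e9}";    # e-acute as a single code point
my $decomposed = "cafe\x{301}";  # "e" followed by a combining acute accent

# After normalization and encoding, both spellings are the same bytes,
# so a substring built from one will match input written as the other.
if (to_match_bytes($composed) eq to_match_bytes($decomposed)) {
    print "same bytes\n";
} else {
    print "different bytes\n";
}
```

Apply the same helper to every substring passed to `new` and to every input
string; mixing normalized and unnormalized data would defeat the purpose.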

lib/Algorithm/AhoCorasick/XS.pm


=head1 FUNCTIONS

=over

=item new ( ARRAYREF_OF_SUBSTRINGS )

Constructs a matcher object given an arrayref of substrings. Builds the internal
automaton.

=item matches ( INPUT )

Given a string, returns a list of the substrings which are present in the input.
There may be duplicates if a substring occurs more than once.

=item unique_matches ( INPUT )

As C<matches>, but runs C<uniq> on the list for you, so each substring is
returned at most once.

=item first_match ( INPUT )

Returns the first match only (or undef if none). This is efficient - the matcher
will stop once it encounters the first match, and the rest of the string will be
ignored.

=item match_details ( INPUT )

Returns a list of hashrefs, each containing the keys C<word>, C<start> and C<end>.
Each hashref describes one occurrence of a substring: the word itself and its
start and end offsets within the input string.

=back

=head1 ENCODING SUPPORT

The matcher runs at the byte level, so you can use any encoding you like. If you

perlobject.map

# "perlobject.map"  Dean Roehrich, version 19960302
#
# TYPEMAPs
#
# HV *    -> unblessed Perl HV object.
# AV *    -> unblessed Perl AV object.
#
# INPUT/OUTPUT maps
#
# O_*   -> opaque blessed objects
# T_*   -> opaque blessed or unblessed objects
#
# O_OBJECT  -> link an opaque C or C++ object to a blessed Perl object.
# T_OBJECT  -> link an opaque C or C++ object to an unblessed Perl object.
# O_HvRV  -> a blessed Perl HV object.
# T_HvRV  -> an unblessed Perl HV object.
# O_AvRV  -> a blessed Perl AV object.
# T_AvRV  -> an unblessed Perl AV object.

TYPEMAP

HV *    T_HvRV
AV *    T_AvRV


######################################################################
OUTPUT

# The Perl object is blessed into 'CLASS', which should be a
# char* having the name of the package for the blessing.
O_OBJECT
  sv_setref_pv( $arg, CLASS, (void*)$var );

T_OBJECT
  sv_setref_pv( $arg, Nullch, (void*)$var );

# Cannot use sv_setref_pv() because that will destroy

perlobject.map

# the AV-ness of the object.  Remember that newRV() will increment
# the refcount.
O_AvRV
  $arg = sv_bless( newRV((SV*)$var), gv_stashpv(CLASS,1) );

T_AvRV
  $arg = newRV((SV*)$var);


######################################################################
INPUT

O_OBJECT
  if( sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG) )
    $var = ($type)SvIV((SV*)SvRV( $arg ));
  else{
    warn( \"${Package}::$func_name() -- $var is not a blessed SV reference\" );
    XSRETURN_UNDEF;
  }

T_OBJECT

ppport.h

PTR2ul|5.007001||p
PTRV|5.006000||p
PUSHMARK|||
PUSH_MULTICALL||5.024000|
PUSHi|||
PUSHmortal|5.009002||p
PUSHn|||
PUSHp|||
PUSHs|||
PUSHu|5.004000||p
PUTBACK|||
PadARRAY||5.024000|
PadMAX||5.024000|
PadlistARRAY||5.024000|
PadlistMAX||5.024000|
PadlistNAMESARRAY||5.024000|
PadlistNAMESMAX||5.024000|
PadlistNAMES||5.024000|
PadlistREFCNT||5.017004|
PadnameIsOUR|||
PadnameIsSTATE|||

ppport.h

{
    dSP;
    SV* sv = newSVpv(p, 0);

    PUSHMARK(sp);
    eval_sv(sv, G_SCALAR);
    SvREFCNT_dec(sv);

    SPAGAIN;
    sv = POPs;
    PUTBACK;

    if (croak_on_error && SvTRUE(GvSV(errgv)))
        croak(SvPVx(GvSV(errgv), na));

    return sv;
}

#endif
#endif

typemap

TYPEMAP
AhoCorasick::Matcher* O_OBJECT

string T_STD_STRING
vector<string> T_STD_VECTOR_STRING

vector<AhoCorasick::match> T_STD_VECTOR_MATCH

INPUT
T_STD_STRING
  if (!SvOK($arg)) {
    $var = std::string();
  } else {
    $var = std::string(SvPV_nolen($arg), SvCUR($arg));
  }

// From ExtUtils::Typemap::STL::Vector.
// Accept an arrayref
T_STD_VECTOR_STRING

typemap

      }
      else
        ${var}[i] = std::string(\"\");
    }
  }
  else
    Perl_croak(aTHX_ \"%s: %s is not an array reference\",
               ${$ALIAS?\q[GvNAME(CvGV(cv))]:\qq[\"$pname\"]},
               \"$var\");

OUTPUT
T_STD_STRING
  $arg = sv_2mortal(newSVpvn($var.c_str(), $var.length()));

// Output a list.
T_STD_VECTOR_STRING
   {
	    U32 ix_$var;
        size_t size_$var = $var.size();
        SSize_t extend_size =
            /* The weird way this is written is because g++ is dumb


