re-engine-Plugin




The second parameter of each routine is the paren number being
requested or stored. The following mapping applies to those numbers:

    -2 => $` or ${^PREMATCH}
    -1 => $' or ${^POSTMATCH}
     0 => $& or ${^MATCH}
     1 => $1
     # ...
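
As a rough sketch of how a callback might dispatch on that number, the
plain Perl helper below maps a paren number to a piece of the subject
string. The name C<fetch_for_paren> and its argument list (match
offsets plus a list of group values) are assumptions made for the
example, not part of this module:

    use strict;
    use warnings;

    # Hypothetical helper: given the subject string, the start and end
    # offsets of the match, a paren number and the captured groups,
    # return the value a FETCH callback could hand back.
    sub fetch_for_paren {
        my ($str, $start, $end, $paren, @groups) = @_;

        return substr($str, 0, $start)             if $paren == -2; # $`
        return substr($str, $end)                  if $paren == -1; # $'
        return substr($str, $start, $end - $start) if $paren ==  0; # $&
        return $groups[$paren - 1];                                 # $1, $2, ...
    }

    # "hello world" with a match covering offsets 3 to 7 ("lo w")
    # and a single capture group holding "o"
    for my $paren (-2, -1, 0, 1) {
        my $value = fetch_for_paren('hello world', 3, 7, $paren, 'o');
        printf "%2d => '%s'\n", $paren, $value;
    }

A real engine would keep the offsets and groups wherever it likes; the
point is only that one callback serves every capture variable.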

Assignment to capture variables makes it possible to implement
something like Perl 6 C<:rw> semantics. Because the capture variables
can return any scalar instead of just a string, it also becomes
possible to implement, for example, Perl 6 match object semantics.

=head2 named_captures

B<TODO>: implement

perl internals still need to be changed to support this, but once
that's done it will allow the binding of C<%+> and C<%-> and support
the L<Tie::Hash> methods FETCH, STORE, DELETE, CLEAR, EXISTS, FIRSTKEY,
NEXTKEY and SCALAR.
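
Since C<named_captures> is not implemented, nothing below is this
module's API. It is only a minimal illustration, in plain Perl, of the
tied hash interface named above. The class name C<My::Captures> is made
up for the example:

    package My::Captures;    # hypothetical class name
    use strict;
    use warnings;

    # Each method receives the tied object first, then key and value
    sub TIEHASH  { my ($class, %init) = @_; return bless { %init }, $class }
    sub FETCH    { $_[0]{ $_[1] } }
    sub STORE    { $_[0]{ $_[1] } = $_[2] }
    sub DELETE   { delete $_[0]{ $_[1] } }
    sub CLEAR    { %{ $_[0] } = () }
    sub EXISTS   { exists $_[0]{ $_[1] } }
    sub FIRSTKEY { my $reset = keys %{ $_[0] }; each %{ $_[0] } }
    sub NEXTKEY  { each %{ $_[0] } }
    sub SCALAR   { scalar %{ $_[0] } }

    package main;

    tie my %named, 'My::Captures', year => 2008;
    $named{month} = 12;                          # calls STORE
    print join(',', sort keys %named), "\n";     # FIRSTKEY and NEXTKEY
    print "has year\n" if exists $named{year};   # calls EXISTS

An engine binding C<%+> or C<%-> would presumably supply callbacks
with the same roles, computing values from its own match state rather
than from a stored hash.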

=head1 CONSTANTS

=head2 C<REP_THREADSAFE>

True iff the module could have been built with thread-safety features
enabled.

=head2 C<REP_FORKSAFE>

True iff this module could have been built with fork-safety features
enabled. This will always be true except on Windows where it's false
for perl 5.10.0 and below.

=head1 TAINTING

The only way to untaint an existing variable in Perl is to use it as a
hash key or to reference subpatterns from a regular expression match
(see L<perlsec|perlsec/Laundering and Detecting Tainted Data>). The
latter only works in perl's regex engine because that engine
explicitly untaints its capture variables. A custom engine will need
to do the same if it wants its capture variables to be untainted.
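
Both routes can be observed in plain Perl, independent of this module.
The sketch below assumes it is run under C<perl -T>, where values from
C<%ENV> are tainted; without C<-T> nothing is tainted and every line
reports 0:

    use strict;
    use warnings;
    use Scalar::Util qw(tainted);

    # Tainted under -T because it comes from the environment
    my $input = defined $ENV{PATH} ? $ENV{PATH} : '';

    # Route 1: hash keys do not carry the taint flag
    my %seen = ($input => 1);
    my ($via_key) = keys %seen;

    # Route 2: perl's own engine untaints what it captures
    my ($via_capture) = $input =~ /\A(.*)\z/s;

    printf "input:   %d\n", tainted($input)       ? 1 : 0;
    printf "key:     %d\n", tainted($via_key)     ? 1 : 0;
    printf "capture: %d\n", tainted($via_capture) ? 1 : 0;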

There are basically two ways for a custom engine to untaint its
captures. The first and most obvious one is to make use of Perl's
lexical scoping, which enables the use of perl's built-in regex engine
in the scope of the overriding engine's callbacks:

    use re::engine::Plugin (
        exec => sub {
            my ($re, $str) = @_; # $str is tainted

            $re->num_captures(
                FETCH => sub {
                    my ($re, $paren) = @_;

                    # This is perl's engine doing the match
                    $str =~ /(.*)/;

                    # $1 has been untainted
                    return $1;
                },
            );
        },
    );

The second is to use something like L<Taint::Util>, which flips the
taint flag on the scalar without invoking perl's regex engine:

    use Taint::Util;
    use re::engine::Plugin (
        exec => sub {
            my ($re, $str) = @_; # $str is tainted

            $re->num_captures(
                FETCH => sub {
                    my ($re, $paren) = @_;

                    # Copy $str and untaint the copy
                    untaint(my $ret = $str);

                    # Return the untainted value
                    return $ret;
                },
            );
        },
    );

In either case, a regex engine using perl's L<regex api|perlreapi> or
this module is responsible for whether and how it untaints its
variables.

=head1 SEE ALSO

L<perlreapi>, L<Taint::Util>

=head1 TODO & CAVEATS

I<here be dragons>

=over

=item *

Engines implemented with this module don't support C<s///> and C<split
//>. The appropriate parts of the C<REGEXP> struct need to be wrapped
and documented.

=item *

This module is still not a complete wrapper for L<perlreapi> in other
ways: it needs methods for some C<REGEXP> struct members, some
callbacks aren't implemented, etc.

=item *

Support overloading operations on the C<qr//> object. This would allow
control over C<qr//> objects in a manner that isn't limited by


