App-s2p

=head2 Atoms

The possible atoms of a BRE are: B<.>, matching any single character;
B<^> and B<$>, matching the null string at the beginning or end
of a string, respectively; a I<bracket expression>, enclosed
in B<[> and B<]> (see below); and any single character with no
other significance (matching that character). A B<\> before one
of B<.>, B<^>, B<$>, B<[>, B<*> or B<\> matches the character
after the backslash. A sequence of atoms enclosed in B<\(> and B<\)>
becomes an atom and establishes the target for a I<backreference>,
consisting of the substring that actually matches the enclosed atoms.
Finally, B<\> followed by one of the digits B<0> through B<9> is a
backreference.

A B<^> that is not first, or a B<$> that is not last, has no
special significance and need not be preceded by a backslash to
become literal. The same is true for a B<]> that does not terminate
a bracket expression.

An unescaped backslash cannot be last in a BRE.
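
As an illustration of the atoms described above (a sketch written for this
text, not code taken from psed): in Perl's own regular expression syntax the
BRE group delimiters C<\(> and C<\)> are written as plain parentheses, while
the dot, the anchors and backreferences keep their meaning. The sample
strings are made up for the example.

```perl
use strict;
use warnings;

# BRE:  ^\(ab\)\1$   (group "ab", followed by a backreference to it)
# Perl: ^(ab)\1$     (same meaning; only the group delimiters differ)
my $grouped = qr/^(ab)\1$/;

# BRE:  a\.b         (an escaped dot matches a literal '.')
my $literal_dot = qr/a\.b/;

# BRE:  a.b          (an unescaped dot matches any single character)
my $any_char = qr/a.b/;

for my $s ( 'abab', 'abba', 'a.b', 'axb' ) {
    printf "%-5s grouped=%d literal_dot=%d any_char=%d\n", $s,
        ( $s =~ $grouped     ? 1 : 0 ),
        ( $s =~ $literal_dot ? 1 : 0 ),
        ( $s =~ $any_char    ? 1 : 0 );
}
```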

=head2 Bounds

The BRE bounds are: B<*>, specifying 0 or more matches of the preceding
atom; B<\{>I<count>B<\}>, specifying exactly that many repetitions;
B<\{>I<minimum>B<,\}>, giving a lower limit only; and
B<\{>I<minimum>B<,>I<maximum>B<\}>, giving both a lower and an upper
limit.

A bound appearing as the first item in a BRE is taken literally.
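
The interval bounds map closely onto Perl's quantifiers, which use the same
notation without the backslashes. The following sketch shows that mapping
with a hypothetical helper, C<bounds_to_perl>, which is not part of psed. It
is deliberately simplistic: it ignores bracket expressions and escaped
backslashes, both of which a real translator has to handle.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of psed): rewrite the BRE interval bounds
# \{m\}, \{m,\} and \{m,n\} into Perl's {m}, {m,} and {m,n}.
sub bounds_to_perl {
    my ($bre) = @_;
    ( my $re = $bre ) =~ s/\\\{(\d+(?:,\d*)?)\\\}/{$1}/g;
    return $re;
}

# The BRE ^a\{2,3\}$ becomes the Perl pattern ^a{2,3}$
my $re = bounds_to_perl('^a\{2,3\}$');
for my $s ( 'a', 'aa', 'aaa', 'aaaa' ) {
    printf "%-4s %s\n", $s, ( $s =~ /$re/ ? 'matches' : 'does not match' );
}
```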

=head2 Bracket Expressions

A I<bracket expression> is a list of characters, character ranges
and character classes enclosed in B<[> and B<]> and matches any
single character from the represented set of characters.

A character range is written as two characters separated by B<-> and
represents all characters (according to the character collating sequence)
that are not less than the first and not greater than the second.
(Ranges are very collating-sequence-dependent, and portable programs
should avoid relying on them.)

A character class is one of the class names

   alnum     digit     punct
   alpha     graph     space
   blank     lower     upper
   cntrl     print     xdigit

enclosed in B<[:> and B<:]> and represents the set of characters
as defined in ctype(3).
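
Perl accepts the same class notation inside its own bracketed character
classes, so the following sketch can show the idea directly. The patterns
and sample strings are made up for the example.

```perl
use strict;
use warnings;

# [[:digit:]] and [[:alpha:]_] are ordinary bracket expressions that
# contain a character class; Perl accepts this notation unchanged.
my $number     = qr/^[[:digit:]]+$/;
my $identifier = qr/^[[:alpha:]_][[:alnum:]_]*$/;

for my $s ( '2024', 'x_1', '9lives' ) {
    printf "%-7s number=%d identifier=%d\n", $s,
        ( $s =~ $number     ? 1 : 0 ),
        ( $s =~ $identifier ? 1 : 0 );
}
```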

If the first character after B<[> is B<^>, the sense of matching is
inverted.

To include a literal 'C<^>', place it anywhere but first. To
include a literal 'C<]>', place it first or immediately after an
initial B<^>. To include a literal 'C<->', make it the first (or
second after B<^>) or last character, or the second endpoint of
a range.
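
Perl's bracketed character classes follow the same placement rules for these
three characters, so they can be tried out directly. This is a small sketch
with made-up patterns.

```perl
use strict;
use warnings;

my $has_bracket = qr/[]x]/;       # ']' placed first: matches ']' or 'x'
my $not_bracket = qr/^[^]x]$/;    # ']' right after '^': one char other than ']' or 'x'
my $has_caret   = qr/[x^]/;       # '^' not first: matches 'x' or '^'
my $has_hyphen  = qr/[x-]/;       # '-' placed last: matches 'x' or '-'

for my $s ( ']', '^', '-', 'x', 'a' ) {
    printf "%s bracket=%d not_bracket=%d caret=%d hyphen=%d\n", $s,
        ( $s =~ $has_bracket ? 1 : 0 ),
        ( $s =~ $not_bracket ? 1 : 0 ),
        ( $s =~ $has_caret   ? 1 : 0 ),
        ( $s =~ $has_hyphen  ? 1 : 0 );
}
```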

The special bracket expression constructs C<[[:E<lt>:]]> and C<[[:E<gt>:]]>
match the null string at the beginning and end of a word respectively.
(Note that neither is identical to Perl's '\b' atom.)
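
The difference from Perl's C<\b> can be made concrete. One way to emulate
the two constructs in Perl is to combine C<\b> with a lookaround; this is an
assumption made for illustration, and the translation psed actually performs
may differ. The helper C<positions> is hypothetical and only collects the
offsets at which a pattern matches.

```perl
use strict;
use warnings;

# Assumed emulations, for illustration only:
my $word_start = qr/\b(?=\w)/;     # like [[:<:]]: boundary followed by a word character
my $word_end   = qr/\b(?<=\w)/;    # like [[:>:]]: boundary preceded by a word character

# Hypothetical helper: return every offset at which $re matches in $text.
sub positions {
    my ( $text, $re ) = @_;
    my @pos;
    push @pos, $-[0] while $text =~ /$re/g;
    return @pos;
}

my $text = 'foo bar';
print "\\b:         @{[ positions( $text, qr/\b/ ) ]}\n";        # 0 3 4 7
print "word start: @{[ positions( $text, $word_start ) ]}\n";   # 0 4
print "word end:   @{[ positions( $text, $word_end ) ]}\n";     # 3 7
```

Perl's C<\b> matches at both ends of each word, whereas each emulated
construct matches at only one of them.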

=head2 Additional Atoms

Since some sed implementations provide additional regular expression
atoms (not defined in POSIX 1003.2), B<psed> is capable of translating
the following backslash escapes:

=over 4

=item B<\E<lt>> This is the same as C<[[:E<lt>:]]>.

=item B<\E<gt>> This is the same as C<[[:E<gt>:]]>.

=item B<\w> This is an abbreviation for C<[[:alnum:]_]>.

=item B<\W> This is an abbreviation for C<[^[:alnum:]_]>.

=item B<\y> Match the empty string at a word boundary.

=item B<\B> Match the empty string between two word characters or between two non-word characters.

=back

To enable this feature, the environment variable PSEDEXTBRE must be set
to a string containing the requested characters, e.g.:
C<PSEDEXTBRE='E<lt>E<gt>wW'>.
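
A program could interpret such a setting as shown in the following sketch.
The helper name C<enabled_escapes> and the lookup-table approach are
assumptions made for illustration; they do not describe how psed itself
reads the variable.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the value of PSEDEXTBRE into a lookup table
# of the escape characters that were requested.
sub enabled_escapes {
    my ($value) = @_;
    $value = '' unless defined $value;    # an unset variable enables nothing
    my %enabled = map { ( $_ => 1 ) } split //, $value;
    return \%enabled;
}

my $ext = enabled_escapes( $ENV{PSEDEXTBRE} );
for my $esc ( '<', '>', 'w', 'W', 'y', 'B' ) {
    printf "\\%s is %s\n", $esc, $ext->{$esc} ? 'enabled' : 'disabled';
}
```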

=cut

#####
# bre2p - convert BRE to Perl RE
#
sub peek(\$$){
    my( $pref, $ic ) = @_;
    $ic < length($$pref)-1 ? substr( $$pref, $ic+1, 1 ) : '';
}

sub bre2p($$$){
    my( $del, $pat, $fl ) = @_;
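    # Derive the closing delimiter: an opening bracket maps to its partner;
    # for any other delimiter $led is left empty.
    my $led = $del;
    $led =~ tr/{([</})]>/;
    $led = '' if $led eq $del;

    # Drop the first and last character of the pattern
    # (presumably the enclosing delimiters).
    $pat = substr( $pat, 1, length($pat) - 2 );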
    my $res = '';
    my $bracklev = 0;
    my $backref  = 0;
    my $parlev = 0;
    for( my $ic = 0; $ic < length( $pat ); $ic++ ){
        my $c = substr( $pat, $ic, 1 );
        if( $c eq '\\' ){
            ### backslash escapes
            my $nc = peek($pat,$ic);
            if( $nc eq '' ){
                Warn( "'\\' cannot be last in pattern", $fl );