App-s2p
=head2 Atoms

The possible atoms of a BRE are: B<.>, matching any single character;
B<^> and B<$>, matching the null string at the beginning or end
of a string, respectively; a I<bracket expression>, enclosed
in B<[> and B<]> (see below); and any single character with no
other significance (matching that character). A B<\> followed by one
of B<.>, B<^>, B<$>, B<[>, B<*> or B<\> is also an atom, matching the
character after the backslash literally. A sequence of atoms enclosed
in B<\(> and B<\)> becomes an atom and establishes the target for a
I<backreference>, consisting of the substring that actually matches
the enclosed atoms. Finally, B<\> followed by one of the digits B<1>
through B<9> is a backreference.

A B<^> that is not first, or a B<$> that is not last, has no special
significance and need not be preceded by a backslash to become
literal. The same is true for a B<]> that does not terminate a
bracket expression.

An unescaped backslash cannot be last in a BRE.

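The following is a minimal sketch of how the atoms above could map onto Perl regular-expression syntax. It covers only the constructs in this section plus the B<*> bound; bracket expressions and interval bounds are not handled. The function name C<atoms_to_perl> is hypothetical. It is not psed's C<bre2p>, and it does not check that a backreference refers to an existing group.

```perl
use strict;
use warnings;

# Hypothetical helper (not psed's bre2p): translate the BRE atoms of this
# section, plus the '*' bound, into Perl regex source text. Bracket
# expressions and \{...\} bounds are deliberately not handled.
sub atoms_to_perl {
    my ($bre) = @_;
    my $len = length $bre;
    my $out = '';
    my $at_start = 1;    # true where a '*' has nothing to repeat
    for (my $i = 0; $i < $len; $i++) {
        my $c = substr($bre, $i, 1);
        if ($c eq '\\') {
            die "'\\' cannot be last in a BRE\n" if $i == $len - 1;
            my $n = substr($bre, ++$i, 1);
            if ($n eq '(') {                       # \( opens a group
                $out .= '(';
                $at_start = 1;
                next;
            }
            elsif ($n eq ')')         { $out .= ')' }           # \) closes it
            elsif ($n =~ /\A[1-9]\z/) { $out .= "\\$n" }        # backreference
            else                      { $out .= quotemeta $n }  # escaped literal
        }
        elsif ($c eq '^' && $i == 0) {             # leading ^ anchors
            $out .= '^';
            next;                                  # a following '*' is literal
        }
        elsif ($c eq '.')                   { $out .= '.' }
        elsif ($c eq '$' && $i == $len - 1) { $out .= '$' }     # trailing $ anchors
        elsif ($c eq '*' && !$at_start)     { $out .= '*' }     # repeat previous atom
        else                                { $out .= quotemeta $c }  # literal
        $at_start = 0;
    }
    return $out;
}
```

For example, the BRE C<^\(ab\)*\1$> becomes the Perl pattern C<^(ab)*\1$>. A B<^> in the middle of the BRE comes out as C<\^>.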
=head2 Bounds

The BRE bounds are: B<*>, specifying 0 or more matches of the preceding
atom; B<\{>I<count>B<\}>, specifying exactly I<count> repetitions;
B<\{>I<minimum>B<,\}>, specifying at least I<minimum> repetitions; and
B<\{>I<minimum>B<,>I<maximum>B<\}>, specifying between I<minimum> and
I<maximum> repetitions (inclusive).

A bound appearing as the first item in a BRE is taken literally.

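The interval bounds translate almost one-to-one into Perl quantifiers. Only the braces lose their backslashes. The sketch below assumes that the bound has already been isolated from the rest of the BRE. C<bound_to_perl> is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one BRE interval bound, written with
# escaped braces (e.g. '\{2,5\}'), into the equivalent Perl quantifier.
sub bound_to_perl {
    my ($bound) = @_;
    my ($min, $comma, $max) = $bound =~ /\A\\\{(\d+)(,?)(\d*)\\\}\z/
        or die "not a BRE bound: $bound\n";
    die "bound minimum exceeds maximum: $bound\n"
        if length($max) && $min > $max;
    # '{n}' for an exact count, '{n,}' or '{n,m}' when a comma is present.
    return $comma ? "{$min,$max}" : "{$min}";
}
```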
=head2 Bracket Expressions

A I<bracket expression> is a list of characters, character ranges
and character classes enclosed in B<[> and B<]> and matches any
single character from the represented set of characters.

A character range is written as two characters separated by B<-> and
represents all characters (according to the character collating sequence)
that are not less than the first and not greater than the second.
(Ranges are very collating-sequence-dependent, and portable programs
should avoid relying on them.)

A character class is one of the class names

   alnum     digit     punct
   alpha     graph     space
   blank     lower     upper
   cntrl     print     xdigit

enclosed in B<[:> and B<:]> and represents the set of characters
as defined in ctype(3).

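Perl's regular-expression engine accepts the same C<[:name:]> syntax inside a bracketed character class, so these class names can usually be carried over unchanged. For ASCII input the results agree with the C locale. For other input or locales they may differ from ctype(3). The sketch below uses a hypothetical helper, C<chars_in_class>, to show Perl's native support.

```perl
use strict;
use warnings;

# The twelve class names listed above.
my @classes = qw(alnum alpha blank cntrl digit graph
                 lower print punct space upper xdigit);

# Hypothetical helper: return the characters of $text that belong to the
# POSIX class $name, using Perl's native [[:name:]] support.
sub chars_in_class {
    my ($name, $text) = @_;
    die "unknown class: $name\n" unless grep { $_ eq $name } @classes;
    my $src = '[[:' . $name . ':]]';
    my $rx  = qr/$src/;
    return join '', grep { /$rx/ } split //, $text;
}
```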
If the first character after B<[> is B<^>, the sense of matching is
inverted.

To include a literal 'C<^>', place it anywhere but first. To
include a literal 'C<]>', place it first or immediately after an
initial B<^>. To include a literal 'C<->', make it the first (or
second after B<^>) or last character, or the second endpoint of
a range.

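These placement rules are close to Perl's own rules for character classes. The main difference is that a backslash is an ordinary character inside a BRE bracket expression but an escape inside a Perl class. The sketch below translates one complete bracket expression. It assumes that collating elements (C<[. .]>) and equivalence classes (C<[= =]>) do not occur. C<bracket_to_perl> is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: translate one complete BRE bracket expression
# (the whole string, from '[' to the closing ']') into a Perl character
# class. Collating elements and equivalence classes are not handled.
sub bracket_to_perl {
    my ($bre) = @_;
    my $len = length $bre;
    die "not a bracket expression\n" unless substr($bre, 0, 1) eq '[';
    my $i   = 1;
    my $out = '[';
    if (substr($bre, $i, 1) eq '^') { $out .= '^';  $i++ }   # negation
    if (substr($bre, $i, 1) eq ']') { $out .= '\]'; $i++ }   # leading ] is literal
    while ($i < $len) {
        my $c = substr($bre, $i, 1);
        if ($c eq ']') {                                     # closing bracket
            die "trailing text after bracket expression\n" if $i != $len - 1;
            return $out . ']';
        }
        if (substr($bre, $i, 2) eq '[:') {                   # copy [:class:] as is
            my $end = index($bre, ':]', $i + 2);
            die "unterminated character class\n" if $end < 0;
            $out .= substr($bre, $i, $end + 2 - $i);
            $i = $end + 2;
            next;
        }
        # A backslash is literal in a BRE bracket expression but an escape in
        # Perl, so it is doubled; '[' is escaped to avoid '[:' confusion.
        $out .= $c eq '\\' ? '\\\\' : $c eq '[' ? '\\[' : $c;
        $i++;
    }
    die "unterminated bracket expression\n";
}
```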
The special bracket expression constructs C<[[:E<lt>:]]> and C<[[:E<gt>:]]>
match the null string at the beginning and end of a word, respectively.
(Neither is identical to Perl's C<\b>, which matches at either end of a word.)

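One way to express these two constructs in Perl is with lookaround assertions. This assumes that "word character" means Perl's C<\w>. It is an illustration, not necessarily the code B<psed> generates. C<%word_edge> and C<edge_offsets> are hypothetical names. Comparing the offsets with those of C<\b> shows the difference noted above.

```perl
use strict;
use warnings;

# Assumed Perl renderings of the two word-edge constructs.
my %word_edge = (
    '[[:<:]]' => '(?<!\w)(?=\w)',   # start of word: no \w before, \w after
    '[[:>:]]' => '(?<=\w)(?!\w)',   # end of word: \w before, no \w after
);

# Hypothetical helper: return the offsets in $text where the
# zero-width pattern $re matches.
sub edge_offsets {
    my ($re, $text) = @_;
    my @pos;
    push @pos, pos($text) while $text =~ /$re/g;
    return @pos;
}
```

On C<"foo bar">, the start-of-word pattern matches at offsets 0 and 4 and the end-of-word pattern at 3 and 7. C<\b> matches at all four.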
=head2 Additional Atoms

Since some sed implementations provide additional regular expression
atoms (not defined in POSIX 1003.2), B<psed> is capable of translating
the following backslash escapes:

=over 4

=item B<\E<lt>>

This is the same as C<[[:E<lt>:]]>: it matches the null string at the
beginning of a word.

=item B<\E<gt>>

This is the same as C<[[:E<gt>:]]>: it matches the null string at the
end of a word.

=item B<\w>

This is an abbreviation for C<[[:alnum:]_]>.

=item B<\W>

This is an abbreviation for C<[^[:alnum:]_]>.

=item B<\y>

Match the empty string at a word boundary.

=item B<\B>

Match the empty string between two word characters or between two
non-word characters.

=back

To enable this feature, the environment variable PSEDEXTBRE must be set
to a string containing the requested characters, e.g.:
C<PSEDEXTBRE='E<lt>E<gt>wW'>.
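The sketch below shows how the extra escapes might be looked up and gated by PSEDEXTBRE. The Perl renderings in C<%ext_atom> and the helper C<ext_escape> are assumptions for illustration, not psed's own tables. The lookarounds treat a word character as Perl's C<\w>, which equals C<[[:alnum:]_]> for ASCII text.

```perl
use 5.010;    # for the // operator
use strict;
use warnings;

# Assumed Perl renderings of the extra escapes (hypothetical table).
my %ext_atom = (
    '<' => '(?<!\w)(?=\w)',   # start of word
    '>' => '(?<=\w)(?!\w)',   # end of word
    'w' => '\w',
    'W' => '\W',
    'y' => '\b',              # any word boundary
    'B' => '\B',              # not a word boundary
);

# Hypothetical helper: return the Perl text for the escape \$char, or undef
# when it is unknown or its character is not listed in PSEDEXTBRE.
sub ext_escape {
    my ($char) = @_;
    my $enabled = $ENV{PSEDEXTBRE} // '';
    return undef unless index($enabled, $char) >= 0;
    return $ext_atom{$char};
}
```

With PSEDEXTBRE set to C<E<lt>E<gt>wW>, as in the example above, C<\y> and C<\B> are not enabled and C<ext_escape> returns undef for them.
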

=cut
#####
# bre2p - convert BRE to Perl RE
#
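sub peek(\$$){
    # Return the character following position $ic in the string
    # referenced by $pref, or '' if $ic is at or past the last character.
    my( $pref, $ic ) = @_;
    $ic < length($$pref)-1 ? substr( $$pref, $ic+1, 1 ) : '';
}

sub bre2p($$$){
    my( $del, $pat, $fl ) = @_;
    # If the delimiter is an opening bracket, $led becomes the matching
    # closing bracket; otherwise it is empty.
    my $led = $del;
    $led =~ tr/{([</})]>/;
    $led = '' if $led eq $del;

    # Strip the leading and trailing delimiters from the pattern.
    $pat = substr( $pat, 1, length($pat) - 2 );
    my $res = '';
    my $bracklev = 0;
    my $backref = 0;
    my $parlev = 0;
    for( my $ic = 0; $ic < length( $pat ); $ic++ ){
        my $c = substr( $pat, $ic, 1 );
        if( $c eq '\\' ){
            ### backslash escapes
            my $nc = peek($pat,$ic);
            if( $nc eq '' ){
                Warn( "'\\' cannot be last in pattern", $fl );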