Perl6-Doc

share/Apocalypse/A05.pod view on Meta::CPAN

    my $foo = ?/.../;   # boolean context, return whether matched
    my $foo = +/.../;   # numeric context, return count of matches
    my $foo = _/.../;   # string context, return captured/matched string

[Update: Unary C<_> is now unary C<~>.]

The standard match and substitution forms also force immediate
evaluation regardless of context:

    $result = m/.../;           # do match on topic string
    $result = s/.../.../;       # do substitution on topic string

These forms also force the regex to start matching at the beginning of
the string in question and scan forward through the string for the
match, as if there were an implicit C<.*?> in front of every iteration.
(Both of these behaviors are suppressed if you use the C<:c>/C<:cont>
modifier.) In contrast, the meaning of the deferred forms is dependent
on context. In particular, a deferred regex naturally assumes C<:c>
when used as a subrule. That is, it continues where the last match left
off, and the next thing has to match right there at the head of the
string.

[Update: A regex actually assumes C<:p> semantics when used as a subrule.
And the long form of C<:c> is C<:continue>.]

In any other context, including list context, a deferred regex is
I<not> immediately evaluated, but produces a reference to the regex
object:

    my $rx = /.../;     # not evaluated
    my @foo = $rx;      # ERROR: type mismatch.
    my @foo = ($rx);    # One element, a regex object.
    my @foo = (/.../);  # Same thing.
    my @foo := $rx;     # Set autogrow rule for @foo.

To evaluate repeatedly in list context, treat the regex object as you
would any other iterator:

    my @foo = <$rx>;

You can also use the more explicit form:

    my @foo = m/<$rx>/;

Those aren't identical, since the former assumes C<:c> and starts up at
the current position of the unmentioned topic, while the latter
explicitly resets the position to the beginning before scanning. Also,
since the deferred regex assumes a C<:c> modifier, C<< <$rx> >>
won't scan through the string like C<m//>. It can return multiple
values to the list, but they have to be contiguous. You can get the
scanning effect of C<m//> by prepending the pattern with C<.*?>.

[Update: Where the preceding paragraphs say C<:c>, read C<:p>.]

But it's vitally important to understand this fundamental change, that
C<//> is no longer a short form of C<m//>, but rather a short form of
C<rx//>. If you want to add modifiers to a C<//>, you have to turn it
into an C<rx//>, not an C<m//>. It's now I<wrong> to call C<split> like
this:

    split m/.../

(That is, it's wrong unless you actually want the return value of the
pattern match to be used as the literal split delimiter.)

The old C<?...?> syntax is gone. Indeed, it has to go for us to get the
unary C<?> operator.

    Old                 New
    ---                 ---
    ?pat?               m:once/pat/
    qr//                rx//
                        rule { }

[Update: The C<:once> modifier is also gone.  You can write the equivalent
using a C<state> variable now.]
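
The C<state>-variable approach mentioned in the update can be sketched in
Perl 5 (5.10 or later), which has C<state> variables too. This is only an
illustration under that assumption, not the Perl 6 spelling; the helper
name C<match_once> and the pattern C</pat/> are hypothetical.

```perl
use strict;
use warnings;
use feature 'state';

# Hypothetical helper: reports success only for the first string that
# matches /pat/, roughly what ?pat? did between resets.
sub match_once {
    my ($text) = @_;
    state $done = 0;                  # keeps its value across calls
    return 0 if $done;                # already matched once
    return 0 unless $text =~ /pat/;   # no match yet, keep waiting
    $done = 1;
    return 1;
}

my @results = map { match_once($_) } ('no', 'pattern', 'pat again');
print "@results\n";                   # prints "0 1 0"
```

Unlike C<?pat?>, this sketch has no C<reset>; adding one would need a
second entry point sharing the same flag.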

=head2 Null String Reform

The null pattern is now illegal. To match whatever you used to match
with a null pattern, use one of these:

    Old                 New
    ---                 ---
    //                  /<prior>/       # match what prior match did
    //                  /<null>/        # match the null string between chars
    (a|b|)              (a|b|<null>)    # match a null alternative
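
For reference, the two old meanings of C<//> that the table separates can
be seen in Perl 5 itself. The sketch below assumes stock Perl 5 behavior:
an empty pattern in a match reuses the last successfully matched regex,
while C<split //> treats it as the null string between characters. The
variable names are illustrative only.

```perl
use strict;
use warnings;

# A successful match; its pattern becomes the "last successful" one.
my $first = ('hello' =~ /l+/) ? 1 : 0;

# In a match, the empty pattern reuses that last successful pattern.
my $reused_hit  = ('world' =~ //) ? 1 : 0;   # "world" contains an "l"
my $reused_miss = ('xyz'   =~ //) ? 1 : 0;   # "xyz" does not

# In split, the empty pattern matches the null string between characters.
my @chars = split //, 'abc';

print "$first $reused_hit $reused_miss @chars\n";   # prints "1 1 0 a b c"
```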

Note that, as an assertion, C<< <null> >> always succeeds. You never
want to say:

    / <null> | single | double | triple | home run /

because you'll never get to first base.

[Update: now we just ignore the first C<|> if there's nothing before it
so that you can say

    [
    | single
    | double
    | triple
    | home run
    ]

]

=head2 Extension Syntax Reform

There are no longer any C<(?...)> sequences, because parens now always
capture. Some of the replacement sequences take their intrinsic
scoping from C<< <...> >>, while others are associated with other
bracketing characters, or with any arbitrary atom that could be a
bracketed construct. Looking at the metasyntax problem from the
perspective of a Perl5-to-Perl6 translator, here's what the various
Perl 5 extension constructs translate to:

    Old                 New
    ---                 ---
    (??{$rule})         <$rule>         # call regex in variable
    (?{ code })         { code }        # call Perl code, ignore result

[...]

Probably okay, though we need a way to translate old code. It was
originally done this way because split on whitespace would typically
return an extra field after the newline. But most newlines will be
prechomped in Perl 6.

=over

=item I<When not in list context, split currently splits into C<@_>. I
suggest that this side-effect be removed.>

=back

Fine. It's easy enough to translate to an explicit assignment.
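
A minimal Perl 5 sketch of that translation, with illustrative variable
names: assign the fields to an array explicitly, then take the count from
the array instead of relying on a side effect on C<@_>.

```perl
use strict;
use warnings;

my $line = 'a:b:c';

# Explicit assignment instead of an implicit split into @_.
my @fields = split /:/, $line;

# The field count is just the array in scalar context.
my $count = @fields;

print "$count: @fields\n";   # prints "3: a b c"
```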

=over

=item I<C<split ?pat?> in any context currently splits into C<@_>. I
suggest that this side-effect be removed.>

=back

Fine. I don't think anyone uses that.

=over

=item I<C<split ' '> (but not C<split / />) currently splits on
whitespace, but also removes leading empty fields. I suggest that this
irregularity be removed.>

=back
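
The irregularity the RFC describes can be reproduced in Perl 5. The sketch
below compares C<split ' '> with C<split / /> and C<split /\s+/> on the
same input; the sample string and variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "  a b  c";

# Special case: split on runs of whitespace, dropping leading empty fields.
my @awk_style = split ' ', $string;

# A real single-space pattern keeps the leading and interior empty fields.
my @single = split / /, $string;

# A whitespace-run pattern still keeps one leading empty field.
my @runs = split /\s+/, $string;

print join('|', @awk_style), "\n";   # prints "a|b|c"
print join('|', @single), "\n";      # prints "||a|b||c"
print join('|', @runs), "\n";        # prints "|a|b|c"
```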

The question is, what to replace it with, since it's a very handy
construct. We could use a different conventional pattern:

    @array = split /<ws>/, $string;

Or we could say that it's now a split on whitespace only if the split
argument is unspecified. That wouldn't work very well with the old
syntax, where we often have to supply the second argument. But given
that the C<=~> operator now serves as a topicalizer for any term, we
could translate:

    @array = split ' ', $string;

to this:

    @array = $string =~ split;

Oddly, this probably also works:

    $string =~ (@array = split);

or maybe even this:

    @array = split given $string;

But I think I like the OO notation better here anyway:

    @array = $string.split;

In fact, split may not be a function at all. The default split might
just be a string method and use unary dot:

    @array = .split;

We still have the third argument to deal with, but that's likely to be
specified like this:

    @array = $string.split(limit => 3);
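
For comparison, this is what the third argument does in Perl 5 today. It
is a sketch of the existing positional form, not of the proposed named
C<limit> argument; the sample data is illustrative.

```perl
use strict;
use warnings;

# With a limit of 3, the last field keeps the unsplit remainder.
my @parts = split /,/, 'a,b,c,d', 3;

print scalar(@parts), ": ", join('|', @parts), "\n";   # prints "3: a|b|c,d"
```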

We could conceivably make a different method for word splitting, much
like REXX does:

    @array = .words;

Then a limit could be the first argument:

    @array = .words(3);

But there almost doesn't need to be such a method, since

    @array = m/ [ (\S*) \s* ]* /;

will do the right thing. Admittedly, a C<.words> method would be much
more readable...
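
The same idea, extracting words with a match in list context, already
works in Perl 5 using the C</g> modifier. This sketch is the Perl 5
analogue rather than the Perl 6 pattern above, and the sample string is
illustrative.

```perl
use strict;
use warnings;

my $string = "  the quick  fox ";

# In list context, a /g match returns every captured run of non-whitespace.
my @words = $string =~ /(\S+)/g;

print join('|', @words), "\n";   # prints "the|quick|fox"
```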

Fortunately, C<split> is a function, so I can put off that decision
till Apocalypse 29. C<:-)>

[Update: At the moment I think there's a C<words> multimethod on strings.]

=head1 Rejected RFCs

=head2 RFC 135: Require explicit m on matches, even with ?? and // as
delimiters.

Squish that gnat... C<:-)>

A decent Perl parser is still going to have to keep track of whether a
term or an operator is expected. And while we're simplifying the
grammar in many ways, it's also the case that we're letting users
install their own grammar rules to perform syntactic warpage. Besides,
people I<like> to write patterns with C</.../>. So rather than
impoverishing Perl's syntax artificially, let's make the standard
parser more accessible by writing it all in Perl 6 regexes.

=head2 RFC 145: Brace-matching for Perl Regular Expressions

Good problem, not-so-good solution from a complexity point of view. I'd
like to leverage existing character class and backref notations maybe.
If there were simply some way to tell a backref to invert any match
characters, that might do it. Or maybe reverse them when you remember
them, and leave the backref ignorant? (Downside is nested brackets
would probably need recursive patterns.)

Recursion might be advisable anyway--you can't really pick up the
arguments to a function, for instance, without also handling things
like quoted strings, which may have different bracketing rules than
outside of strings. Certainly matching C<\"> would be dependent on
whether you're inside or outside of a string. Given that recursion is
often necessary, I'm not sure making this construct recurse itself is
