Regexp-Grammars
series of nested hashes (and possibly arrays) representing the hierarchical
structure of the parsed data.
Typically, the next step is to walk that tree, extracting or
converting or otherwise processing that information. If the tree has nodes of
many different types, it can be difficult to build a recursive subroutine that
can navigate it easily.
A much cleaner solution is possible if the nodes of the tree are proper
objects. In that case, you just define a C<process()> or C<traverse()> method
for each of the classes, and have every node call that method on each of its
children. For example, if the parser were to return a tree of nodes
representing the contents of a LaTeX file, then you could define the following
methods:
    sub Latex::file::explain
    {
        my ($self, $level) = @_;
        for my $element (@{$self->{element}}) {
            $element->explain($level);
        }
    }

    sub Latex::element::explain {
        my ($self, $level) = @_;
        ( $self->{command} || $self->{literal})->explain($level);
    }

    sub Latex::command::explain {
        my ($self, $level) = @_;
        say "\t"x$level, "Command:";
        say "\t"x($level+1), "Name: $self->{name}";
        if ($self->{options}) {
            say "\t"x$level, "\tOptions:";
            $self->{options}->explain($level+2);
        }
        for my $arg (@{$self->{arg}}) {
            say "\t"x$level, "\tArg:";
            $arg->explain($level+2);
        }
    }

    sub Latex::options::explain {
        my ($self, $level) = @_;
        $_->explain($level) foreach @{$self->{option}};
    }

    sub Latex::literal::explain {
        my ($self, $level, $label) = @_;
        $label //= 'Literal';
        say "\t"x$level, "$label: ", $self->{q{}};
    }
and then simply write:
    if ($text =~ $LaTeX_parser) {
        $/{LaTeX_file}->explain();
    }
and the chain of C<explain()> calls would cascade down the nodes of the tree,
each one invoking the appropriate C<explain()> method according to the type of
node encountered.
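As a minimal sketch of this cascade, independent of any parser, the following hand-builds a small tree of blessed hashes and walks it. The C<Demo::List> and C<Demo::Item> classes are hypothetical, and the methods return their lines instead of printing them, purely so that the result is easy to inspect:

```perl
use 5.014;      # package NAME { ... } blocks need Perl 5.14 or later
use warnings;

# Hypothetical node classes (not part of Regexp::Grammars).
# Each explain() returns a list of indented lines describing its node.
package Demo::List {
    sub explain {
        my ($self, $level) = @_;
        $level //= 0;
        my $indent = "\t" x $level;

        # Describe this node, then delegate to every child one level deeper
        return ("${indent}List:",
                map { $_->explain($level + 1) } @{ $self->{item} });
    }
}

package Demo::Item {
    sub explain {
        my ($self, $level) = @_;
        $level //= 0;
        my $indent = "\t" x $level;

        # The empty-string key holds the text matched by the node
        return ("${indent}Item: $self->{q{}}");
    }
}

# A hand-built tree of the general shape an object-returning grammar yields
my $tree = bless {
    item => [
        bless({ q{} => 'first'  }, 'Demo::Item'),
        bless({ q{} => 'second' }, 'Demo::Item'),
    ],
}, 'Demo::List';

my @lines = $tree->explain();
print "$_\n" for @lines;
```

No node needs to know the class of its children: each call to C<explain()> is dispatched on the class into which the child was blessed.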
The only problem is that, by default, Regexp::Grammars returns a tree of
plain-old hashes, not LaTeX::Whatever objects. Fortunately, it's easy to
request that the result hashes be automatically blessed into the appropriate
classes, using the C<< <objrule:...> >> and C<< <objtoken:...> >> directives.
These directives are identical to the C<< <rule:...> >> and C<<
<token:...> >> directives (respectively), except that the rule or token
they create will also convert the hash it normally returns into an
object of a specified class. This conversion is done by passing the result
hash to the class's constructor:
    $class->new(\%result_hash)
if the class has a constructor method named C<new()>, or else (if
the class doesn't provide a constructor) by directly blessing the
result hash:
    bless \%result_hash, $class
Note that, even if the object is constructed via its own constructor, the
module still expects the new object to be hash-based, and will issue an
error if the object is anything but a blessed hash.
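The conversion rule described above can be approximated in plain Perl. This is only a sketch of the documented behaviour, not the module's actual implementation; C<objectify()>, C<With::Constructor>, and C<Without::Constructor> are hypothetical names used for illustration:

```perl
use 5.014;      # package NAME { ... } blocks need Perl 5.14 or later
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical helper approximating the documented conversion.
# It is NOT the code that Regexp::Grammars itself runs.
sub objectify {
    my ($class, $result_hash) = @_;

    # Use the class's own constructor if it has one, else bless directly
    my $object = $class->can('new')
               ? $class->new($result_hash)
               : bless($result_hash, $class);

    # Either way, the outcome must be a blessed, hash-based object
    die "$class did not produce a blessed hash\n"
        unless blessed($object) && reftype($object) eq 'HASH';

    return $object;
}

# Hypothetical class that provides a constructor
package With::Constructor {
    sub new {
        my ($class, $data) = @_;
        return bless { %{$data}, built_by => 'new' }, $class;
    }
}

# With::Constructor has new(); Without::Constructor defines nothing at all
my $built   = objectify('With::Constructor',    { name => 'documentclass' });
my $blessed = objectify('Without::Constructor', { name => 'documentclass' });

say ref($built),   ': ', join ',', sort keys %{$built};
say ref($blessed), ': ', join ',', sort keys %{$blessed};
```

A constructor is free to copy or extend the result hash, as the hypothetical C<new()> does here, so long as what it returns is still a blessed hash.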
The generic syntax for these types of rules and tokens is:
    <objrule: CLASS::NAME = RULENAME >
    <objtoken: CLASS::NAME = TOKENNAME >
For example:
    <objrule: LaTeX::Element=component>
        # ...Defines a rule that can be called as <component>
        # ...and which returns a hash-based LaTeX::Element object

    <objtoken: LaTeX::Literal=atom>
        # ...Defines a token that can be called as <atom>
        # ...and which returns a hash-based LaTeX::Literal object
Note that, just as in L<aliased subrule calls|"Renaming subrule results">,
the name by which something is referred to outside the grammar (in this
case, the class name) comes I<before> the C<=>, whereas the name by which
it is referred to inside the grammar comes I<after> the C<=>.
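To show the two directives in context, here is a small sketch of a complete grammar. The C<Demo::Pair> and C<Demo::Key> classes, their methods, and the C<pair>, C<key>, and C<value> names are all hypothetical. Neither class defines C<new()>, so the result hashes are expected to be blessed directly:

```perl
use strict;
use warnings;
use 5.010;
use Regexp::Grammars;

# Hypothetical methods for the two (equally hypothetical) node classes
sub Demo::Pair::describe {
    my ($self) = @_;
    return $self->{key}->name() . ' is ' . $self->{value};
}

sub Demo::Key::name {
    my ($self) = @_;
    return $self->{q{}};    # the empty-string key holds the matched text
}

my $parser = qr{
    <pair>

    <objrule: Demo::Pair = pair>
        <key> = <value>

    <objtoken: Demo::Key = key>
        [a-z]+

    <token: value>
        \d+
}xms;

my ($pair, $summary);
if ('width = 42' =~ $parser) {
    $pair    = $/{pair};            # stored under the rule name, not the class name
    $summary = $pair->describe();   # works because the node is an object
    say ref($pair), ': ', $summary;
}
```

Inside the grammar the rule is still called as C<< <pair> >>, and its result is still stored under the key C<pair>. The class name affects only what the stored value is blessed into. The plain C<< <token: value> >> still returns an ordinary string.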
You can freely mix object-returning and plain-old-hash-returning rules
and tokens within a single grammar, though you have to be careful not to
subsequently try to call a method on any of the unblessed nodes.
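One way to stay safe when walking such a mixed tree is to test each node before calling a method on it. The following sketch assumes a hypothetical C<Demo::Node> class and a hypothetical C<describe()> helper, and uses C<blessed()> from the core L<Scalar::Util> module:

```perl
use 5.014;      # package NAME { ... } blocks need Perl 5.14 or later
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical class standing in for the result of an object-returning rule
package Demo::Node {
    sub explain {
        my ($self) = @_;
        return "node: $self->{name}";
    }
}

# Hypothetical helper: describe a child whether or not it was blessed
sub describe {
    my ($node) = @_;

    # Only call the method on genuine objects that actually provide it
    return $node->explain()
        if blessed($node) && $node->can('explain');

    # Otherwise fall back to handling the plain data directly
    return ref($node) eq 'HASH' ? 'plain hash' : "plain value: $node";
}

my @children = (
    bless({ name => 'section' }, 'Demo::Node'),   # as from an <objrule:...>
    { name => 'raw' },                            # as from a plain <rule:...>
    'just text',                                  # as from a plain <token:...>
);

my @descriptions = map { describe($_) } @children;
say for @descriptions;
```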
=head4 An important caveat regarding OO rules
Prior to Perl 5.14.0, Perl's regex engine was not fully re-entrant.
This means that in older versions of Perl, it is not possible to
re-invoke the regex engine when already inside the regex engine.
You therefore need to be careful that the C<new()>
constructors that are called by your object-rules do not themselves
use regexes in any way, unless you're running under Perl 5.14 or later