To solve a shift-reduce conflict between a production C<A --E<gt> SOMETHING>
and a token C<'a'> you can follow this procedure:
=over
=item 1. Edit the C<.output> file
=item 2. Search for the state in which the production and the token
conflict. In our example it looks like:
  pl@nereida:~/src/perl/YappWithDefaultAction/examples$ sed -ne '56,65p' ambiguities.output
  State 5:

        exp -> exp . '-' exp    (Rule 2)
        exp -> exp '-' exp .    (Rule 2)

        '-'     shift, and go to state 3

        '-'     [reduce using rule 2 (exp)]
        $default        reduce using rule 2 (exp)
=item 3. Inside the state there must be a production of the form C<A --E<gt> SOMETHING.>
(with the dot at the end),
indicating that a reduction is possible. There must also be another production
of the form C<A --E<gt> prefix . suffix>, where C<suffix> can I<start> with the involved
token C<'a'>.
=item 4. Decide which action, shift or reduce, produces the kind of tree you want.
In this example we want C<NUM - NUM - NUM> to produce a tree like
C<MINUS(MINUS(NUM, NUM), NUM)> and not C<MINUS(NUM, MINUS(NUM, NUM))>. We want the
conflict in C<exp - exp . - NUM> to be resolved in favor of the reduction
by C<exp: exp '-' exp>.
This is achieved by declaring C<'-'> left-associative with C<%left '-'>.
=back
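The difference between the two possible resolutions can be made concrete with a few lines of plain Perl. This is only a sketch: the helpers C<left_assoc>, C<right_assoc>, C<show> and C<value> are hypothetical names introduced for illustration and are not part of Parse::Eyapp. They build by hand the tree that results from always reducing (the C<%left> choice) and the tree that results from always shifting (the C<%right> choice).

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Parse::Eyapp: they only mimic the two
# tree shapes that the shift/reduce decision can produce.

# Reduce first (what %left '-' selects): group from the left.
sub left_assoc {
    my @nums = @_;
    my $tree = shift @nums;
    $tree = [ 'MINUS', $tree, $_ ] for @nums;
    return $tree;
}

# Shift first (what %right '-' would select): group from the right.
sub right_assoc {
    my @nums = @_;
    my $tree = pop @nums;
    $tree = [ 'MINUS', $_, $tree ] for reverse @nums;
    return $tree;
}

# Render a tree in the MINUS(..., ...) notation used in the text.
sub show {
    my $t = shift;
    return ref($t) ? 'MINUS(' . show($t->[1]) . ', ' . show($t->[2]) . ')' : $t;
}

# Evaluate a tree.
sub value {
    my $t = shift;
    return ref($t) ? value($t->[1]) - value($t->[2]) : $t;
}

my @input = (7, 2, 1);
printf "%-24s = %d\n", show(left_assoc(@input)),  value(left_assoc(@input));
printf "%-24s = %d\n", show(right_assoc(@input)), value(right_assoc(@input));
```

For C<7 - 2 - 1> the left-grouped tree evaluates to 4 and the right-grouped one to 6, which is why the choice matters for subtraction.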
=head3 Error Recovery
The token name C<error> is reserved for error handling. This name can
be used in grammar productions; it suggests places where errors are
expected, and recovery can take place:
  line:
      '\n'         { undef }
    | exp '\n'     { print "$_[1]\n" if defined($_[1]); $_[1] }
    | error '\n'
        {
          $_[0]->YYErrok;
          undef
        }
When a syntax error is detected, the parser pops its stack until
it enters a state where the token C<error> is legal. It then shifts
the token C<error> and proceeds to discard tokens until it finds
one that is acceptable. In the example,
all the tokens up to the next C<'\n'> are skipped.
If no error productions have been specified,
processing halts at the first error.
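The token-skipping step can be illustrated with a toy model. The sketch below is a hand-written loop, not the Parse::Eyapp driver: there is no parser stack, and the subroutine C<parse_lines> is a hypothetical name. It assumes every line has the form C<NUM ('-' NUM)* "\n"> or is empty, and on the first unexpected token it discards the rest of the line up to and including the C<"\n">, which is the effect the rule C<error '\n'> has in the grammar above.

```perl
use strict;
use warnings;

# Toy model only: a hand-written loop, not the Parse::Eyapp driver.
# Each line must look like  NUM ('-' NUM)* "\n"  or be empty.
sub parse_lines {
    my @tokens = @_;
    my (@results, @errors);
    my $i = 0;

    LINE: while ($i < @tokens) {
        my ($value, $want_num, $count) = (undef, 1, 0);

        while ($i < @tokens) {
            my $tok = $tokens[$i++];

            if ($want_num && $tok =~ /^[0-9]+$/) {
                $value = defined($value) ? $value - $tok : $tok;
                $want_num = 0;
            }
            elsif (!$want_num && $tok eq '-') {
                $want_num = 1;
            }
            elsif ($tok eq "\n" && (!$want_num || !$count)) {
                push @results, $value;    # undef for an empty line
                next LINE;
            }
            else {
                # Syntax error: remember the offending token, then discard
                # input until a "\n" has been consumed.
                push @errors, $tok;
                $i++ while $i < @tokens && $tokens[$i - 1] ne "\n";
                push @results, undef;     # like the action of the error rule
                next LINE;
            }
            $count++;
        }
    }
    return (\@results, \@errors);
}

my ($results, $errors) =
    parse_lines(qw(7 - 2), "\n", qw(3 3 - 1), "\n", qw(5 - 1), "\n");

print join(', ', map { defined($_) ? $_ : 'undef' } @$results), "\n";
print "offending tokens: @$errors\n";
```

The second line fails at its second C<3>; the tokens C<- 1 "\n"> are thrown away and parsing resumes cleanly with the third line.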
In order to prevent a cascade of error messages, the parser, after
detecting an error, remains in error state until three tokens have been
successfully read and shifted. If an error is detected when the parser
is already in error state, no message is given, and the input token is
quietly deleted. The method C<YYErrok> used in the example
communicates to the parser
that a satisfactory recovery has been reached
and that it can safely emit new error
messages.
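The three-token rule can be pictured as a small counter. What follows is a simplified model of the behaviour just described, not the actual driver code; the names C<$errflag>, C<on_error>, C<on_shift> and C<errok> are hypothetical. A message is recorded only when the counter is zero, every error resets it to three, every shifted token decrements it, and C<errok> plays the role of C<YYErrok>.

```perl
use strict;
use warnings;

# Simplified model of the message-suppression rule; the names below are
# hypothetical and are not the Parse::Eyapp internals.
my $errflag = 0;      # greater than 0 means "in error state"
my @messages;

sub on_error {        # a syntax error was detected at token $tok
    my $tok = shift;
    push @messages, "syntax error near '$tok'" if $errflag == 0;
    $errflag = 3;     # three clean shifts are needed to leave error state
    return;
}

sub on_shift { $errflag-- if $errflag > 0; return }   # a token was shifted
sub errok    { $errflag = 0; return }                 # what YYErrok communicates

on_error('a');            # reported
on_shift();
on_error('b');            # suppressed: still in error state
on_shift() for 1 .. 3;
on_error('c');            # reported: three tokens were shifted
errok();
on_error('d');            # reported: recovery was declared complete

print "$_\n" for @messages;
```

Only the errors near C<a>, C<c> and C<d> produce a message; the one near C<b> arrives while the model is still in error state and is silently dropped.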
You cannot have a literal I<'error'> in your grammar as it would
confuse the driver with the I<error> token. Use a symbolic token instead.
=head2 The Tail
The tail section contains Perl code. The lexical analyzer and the
error-handling subroutine usually go there. A better practice, however, is to isolate
both subroutines in a module and use them from the grammar. The
files C<examples/CalcUsingTail.eyp> and C<examples/Tail.pm> show an example.
=head3 The Lexical Analyzer
The lexical analyzer
is called each time the parser needs a new token.
It is called with only one argument (the parser object)
and returns a pair
containing the next token and its associated attribute.
Because it receives the parser object,
the parser
methods are accessible inside the lexical analyzer.
Especially useful is the C<$_[0]-E<gt>YYData>
method, which provides access to the user data area.
I<When the lexical analyzer reaches the end of input, it must return the
pair> C<('', undef)>.
The following lexical analyzer is taken from the file C<examples/Calc.eyp>:
  sub make_lexer {
    my $input = shift;   # reference to the input string

    return sub {
      my $parser = shift;

      for ($$input) {
        m{\G[ \t]*}gc;   # skip blanks and tabs
        m{\G([0-9]+(?:\.[0-9]+)?)}gc   and return ('NUM',$1);
        m{\G([A-Za-z][A-Za-z0-9_]*)}gc and return ('VAR',$1);
        # $lineno is assumed to be declared elsewhere in the grammar file
        m{\G\n}gc                      and do { $lineno++; return ("\n", "\n") };
        m{\G(.)}gc                     and return ($1,$1);   # any other character

        return('',undef);   # end of input
      }
    }
  }
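A lexer of this kind can be exercised by hand, without any parser, which is a convenient way to check the pairs it returns. The sketch below repeats the lexer so that it runs on its own. It assumes that C<$lineno> is an ordinary file-scoped variable, and it passes C<undef> in place of the parser object, which this particular lexer never uses. The loop stops on the end-of-input pair C<('', undef)>.

```perl
use strict;
use warnings;

my $lineno = 1;    # assumption: a file-scoped line counter

sub make_lexer {
    my $input = shift;    # reference to the string being scanned

    return sub {
        my $parser = shift;

        for ($$input) {
            m{\G[ \t]*}gc;
            m{\G([0-9]+(?:\.[0-9]+)?)}gc   and return ('NUM', $1);
            m{\G([A-Za-z][A-Za-z0-9_]*)}gc and return ('VAR', $1);
            m{\G\n}gc                      and do { $lineno++; return ("\n", "\n") };
            m{\G(.)}gc                     and return ($1, $1);

            return ('', undef);
        }
    };
}

my $text  = "a = 4.5 - 2\n";
my $lexer = make_lexer(\$text);

my (@tokens, @attrs);
while (1) {
    my ($token, $attr) = $lexer->(undef);   # no parser object is needed here
    last if $token eq '';                   # end-of-input pair ('', undef)
    push @tokens, $token;
    push @attrs,  $attr;
}

# Show newlines as \n so the token list stays on one line.
print join(' ', map { $_ eq "\n" ? '\n' : $_ } @tokens), "\n";
```

The scanning position survives between calls because C<\G> and the C</gc> flags work on the C<pos> of the referenced string, so each call resumes where the previous one stopped.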
The subroutine C<make_lexer> creates the lexical analyzer as
a closure. The lexer returned by C<make_lexer> is used by the
C<YYParse> method: