Alt-CWB-ambs
view release on metacpan or search on metacpan
lib/CWB/CEQL/Parser.pm view on Meta::CPAN
}
=item I<$result> = I<$grammar>->B<Parse>(I<$string> [, I<$rule>]);
Parse input string I<$string> as a constituent of type I<$rule> (if
unspecified, the C<default> rule will be used). The return value I<$result>
is typically a string containing the transformed query, but may also be an
arbitrary data structure or object (such as a parse tree for I<$input>).
Consult the relevant grammar documentation for details. If parsing fails,
B<undef> is returned.
=cut
sub Parse {
croak 'Usage: $result = $grammar->Parse($string [, $rule]);'
unless @_ == 2 or @_ == 3;
my ($self, $input, $rule) = @_;
$rule = "default"
unless defined $rule;
confess "CWB::CEQL::Parser: Parse() method is not re-entrant\n(tried to parse '$input' while parsing '".$self->{INPUT}."')"
if defined $self->{INPUT};
$self->{INPUT} = $input;
%{$self->{PARAM}} = %{$self->{PARAM_DEFAULTS}}; # shallow copy of hash
$self->{CALLSTACK} = []; # re-initialise call stack (should destroy information from last parse)
$self->{GROUPS} = undef; # indicate that shift-reduce parser is not active
$self->{CURRENT_GROUP} = undef;
$self->{GROUPSTACK} = undef;
$self->{ERROR} = undef; # clear previous errors
my $result = eval { $self->Call($rule, $input) }; # catch exceptions from parse errors
if (not defined $result) {
my $error = $@;
chomp($error); # remove trailing newline
$error = "parse of '' ".$self->{INPUT}." '' returned no result (reason unknown)"
if $error eq "";
$error =~ s/\s*\n\s*/ **::** /g;
$self->{ERROR} = $error;
}
$self->{INPUT} = undef; # no active parse
$self->{PARAM} = undef; # restore global parameter values (PARAM_DEFAULTS)
return $result; # undef if parse failed
}
=item I<@lines_of_text> = I<$grammar>->B<ErrorMessage>;
If the last parse failed, returns a detailed error message and backtrace of
the callstack as a list of text lines (without newlines). Otherwise, returns
empty list.
=cut
sub ErrorMessage {
my $self = shift;
my $error = $self->{ERROR};
return ()
unless defined $error;
my @lines = "**Error:** $error";
my $previous_frame = { RULE => "", INPUT => "" }; # init do dummy frame to avoid special case below
foreach my $frame (reverse @{$self->{CALLSTACK}}) {
my $rule = $frame->{RULE};
if ($rule eq "APPLY") {
my @done = @{$frame->{APPLY_DONE}};
my @remain = @{$frame->{APPLY_ITEMS}};
push @lines, " - at this location: '' @done ''**<==**'' @remain ''";
}
else {
my $input = $frame->{INPUT};
my $previous_input = $previous_frame->{INPUT} || "";
if (($previous_input eq $input) and ($previous_frame->{RULE} ne "APPLY")) {
$lines[-1] .= ", **$rule**";
}
else {
push @lines, " - when parsing '' $input '' as **$rule**";
}
}
$previous_frame = $frame;
}
return @lines;
}
=item I<$html_code> = I<$grammar>->B<HtmlErrorMessage>;
If the last parse failed, returns HTML-formatted error message and backtrace
of the callstack. The string I<$html_code> is valid HTML and can directly be
included in a generated Web page. In particular, unsafe and non-ASCII
characters have been encoded as HTML entities. Simple, non-recursive
wiki-style markup in an error message is interpreted in the following way:
**<text>** <text> is shown in bold font (<b> ... </b>)
//<text>// <text> is displayed in italics (<i> ... </i>)
''<text>'' <text> is shown in typewriter font (<code> ... </code>)
Lines starting with C< - > (note the two blanks) are converted into list items.
=cut
sub HtmlErrorMessage {
my $self = shift;
my @text_lines = $self->ErrorMessage();
if (@text_lines > 0) {
return $self->formatHtmlText(@text_lines);
}
else {
return undef;
}
}
=item I<$grammar>->B<SetParam>(I<$name>, I<$value>);
=item I<$value> = I<$grammar>->B<GetParam>(I<$name>);
Set the value of parameter I<$name> (B<SetParam>), or read its current value
(B<GetParam>). The parameter I<$name> must have been defined by the grammar
class (which I<$grammar> is an instance of) and should be described in the
grammar's documentation.
=cut
sub SetParam {
croak 'Usage: $grammar->SetParam($name, $value)'
unless @_ == 3;
my ($self, $name, $value) = @_;
## select either global parameter values (user level) or working copy (during parse)
my $param_set = (defined $self->{INPUT}) ? $self->{PARAM} : $self->{PARAM_DEFAULTS};
croak "CWB::CEQL::Parser: parameter '$name' does not exist"
unless exists $param_set->{$name};
$param_set->{$name} = $value;
}
sub GetParam {
croak 'Usage: $grammar->GetParam($name)'
unless @_ == 2;
my ($self, $name) = @_;
my $param_set = (defined $self->{INPUT}) ? $self->{PARAM} : $self->{PARAM_DEFAULTS};
croak "CWB::CEQL::Parser: parameter '$name' does not exist"
unless exists $param_set->{$name};
return $param_set->{$name};
}
=back
=head1 METHODS USED BY GRAMMAR AUTHORS
Methods for grammar authors. Since these methods are intended for use in the
rules of a DPP grammar, they are typically applied to the object I<$self>.
=over 4
=item I<$self>->B<NewParam>(I<$name>, I<$default_value>);
Define new parameter I<$name> with default value I<$default_value>. This
method is normally called in the constructor (method B<new>) of a
parameterized grammar. If it is used in a rule body, the new parameter
will be created in the working copy of the parameter set and will only be
available during the current parse.
=cut
sub NewParam {
confess 'Usage: $self->NewParam($name, $default_value)'
unless @_ == 3;
my ($self, $name, $value) = @_;
## select either global parameter values (user level) or working copy (during parse)
my $param_set = (defined $self->{INPUT}) ? $self->{PARAM} : $self->{PARAM_DEFAULTS};
confess "CWB::CEQL::Parser: parameter '$name' already exists, cannot create with NewParam()"
if exists $param_set->{$name};
$param_set->{$name} = $value;
}
=item I<$result> = I<$self>->B<Call>(I<$rule>, I<$input>);
Apply rule I<$rule> to input string I<$input>. The return value I<$result>
depends on the grammar rule, but is usually a string containing a translated
version of I<$input>. Grammar rules may also annotate this string with
B<attributes> or by B<bless>ing it into a custom class, or return a complex
data structure such as a parse tree for I<$input>. The caller has to be aware
what kind of value I<$rule> returns.
Note that B<Call> never returns B<undef>. In case of an error, the entire
parse is aborted.
=cut
sub Call {
confess 'Usage: $result = $self->Call($rule, $input);'
unless @_ == 3;
my ($self, $rule, $input) = @_;
confess "Sorry, we're not parsing yet"
unless defined $self->{INPUT};
my $method = $self->can("$rule");
confess "the rule **$rule** does not exist in grammar **".ref($self)."** (internal error)\n"
unless defined $method;
my $frame = {RULE => $rule,
INPUT => $input};
push @{$self->{CALLSTACK}}, $frame;
my $result = $method->($self, $input);
die "rule **$rule** failed to return a result (internal error)\n"
unless defined $result;
my $return_frame = pop @{$self->{CALLSTACK}};
die "call stack has been corrupted (internal error)"
unless $return_frame eq $frame;
return $result;
}
=item I<$result> = I<$self>->B<Try>(I<$rule>, I<$input>);
Tentatively apply rule I<$rule> to the input string. If I<$input> is parsed
successfully, B<Try> returns the translated version I<$result> (or an
arbitrary data structure such as a parse tree for I<$input>) just as B<Call>
would. If parsing fails, B<Try> does not abort but simply returns B<undef>,
ignoring any error messages generated during the attempt. In addition, the
call stack is restored and all parameters are reset to their previous values,
so that parsing can continue as if nothing had happened (note, however, that
this is based on flat backup copies, so complex data structures may have been
altered destructively).
=cut
sub Try {
confess 'Usage: $result = $self->Try($rule, $input);'
unless @_ == 3;
my ($self, $rule, $input) = @_;
confess "Sorry, we're not parsing yet"
unless defined $self->{INPUT};
## make flat backup copies of important data structures and ensure they are restored upon return
## (this is not completely safe, but should undo most changes that a failed parse may have made)
my $back_param = [ @{$self->{PARAM}} ];
my $back_callstack = [ @{$self->{CALLSTACK}} ];
my ($back_groups, $back_current_group, $back_groupstack) = (undef, undef, undef);
if (defined $self->{GROUPS}) {
$back_groups = [ @{$self->{GROUPS}} ];
$back_current_group = [ @{$back_groups->[0]} ]
if @$back_groups > 0;
}
if (defined $self->{GROUPSTACK}) {
$back_groupstack = [ @{$self->{GROUPSTACK}} ];
}
my $result = eval { $self->Call($rule, $input) };
## if parsing failed, restore internal data structures from backup copies
if (not defined $result) {
$self->{PARAM} = $back_param;
$self->{CALLSTACK} = $back_callstack;
$self->{GROUPS} = $back_groups;
if (defined $back_groups and defined $back_current_group) {
$self->{GROUPS}->[0] = $back_current_group;
}
$self->{GROUPSTACK} = $back_groupstack;
}
return $result;
}
=item I<@results> = I<$self>->B<Apply>(I<$rule>, I<@items>);
Apply rule I<$rule> to each input string in the list I<@items>. The return
values are collected and returned as a list I<@results>, which has to be
further processed by the caller. Note that empty strings (C<"">) are
automatically removed from the list of return values.
=cut
sub Apply {
confess 'Usage: @results = $self->Apply($rule, @items);'
unless @_ >= 2;
my $self = shift;
my $rule = shift;
my $frame = {RULE => "APPLY",
INPUT => undef,
APPLY_ITEMS => [ @_ ],
APPLY_DONE => []};
push @{$self->{CALLSTACK}}, $frame;
## data structures for nested groups and result values must be restored on exit (in case of nested Apply())
local $self->{GROUPS} = [ [] ]; # set up data structure to collect result values of nested groups
local $self->{GROUPSTACK} = []; # stack of nested groups (keeps track of nesting depth and ensures proper nesting)
## process each input item in turn
while (@{$frame->{APPLY_ITEMS}}) {
my $input = shift @{$frame->{APPLY_ITEMS}};
push @{$frame->{APPLY_DONE}}, $input;
my $result = $self->Call($rule, $input);
push @{$self->{GROUPS}->[0]}, $result
unless $result eq "" and not ref $result; # plain empty string indicates that this item does not generate output
}
## check that nested bracketing was balanced and that data structure for result values is sane
if (@{$self->{GROUPSTACK}} > 0) {
my $next_type = pop @{$self->{GROUPSTACK}};
my $type_msg = ($next_type eq "*") ? "" : "of type ''$next_type''";
die "bracketing is not balanced: too many opening delimiters $type_msg\n";
}
confess "data structure for result values is corrupt in Apply() call (internal error)"
unless @{$self->{GROUPS}} == 1;
my @results = @{$self->{GROUPS}->[0]};
my $return_frame = pop @{$self->{CALLSTACK}};
die "call stack has been corrupted (internal error)"
unless $return_frame eq $frame;
return @results;
}
=item I<$self>->B<BeginGroup>([I<$name>]);
Marks the start of a nested group, when an opening delimiter is encountered.
B<BeginGroup> may only be called while the shift-reduce parser is active
during an B<Apply> operation. The optional parameter I<$name> can be used to
ensure proper nesting of different types of groups; the default group name is
C<*>. After calling B<BeginGroup>, a DPP rule will often return C<""> since
the opening determiner has a purely syntactic function and is not generate
output directly.
=cut
sub BeginGroup {
my ($self, $name) = @_;
$name = "*"
unless defined $name;
my $group_stack = $self->{GROUPSTACK};
confess "BeginGroup() called outside Apply() operation (internal error)"
unless defined $group_stack;
push @$group_stack, $name;
unshift @{$self->{GROUPS}}, [];
}
=item I<@group_results> = I<$self>->B<EndGroup>([I<$name>]);
Marks the end of a nested group, when a closing delimiter is encountered. The
optional parameter I<$name> (or the default name C<*>) must be identical to
the group name of the matching opening delimiter. B<EndGroup> returns a list
containing the result values collected from this nested group.
=cut
sub EndGroup {
my ($self, $name) = @_;
$name = "*"
unless defined $name;
my $group_stack = $self->{GROUPSTACK};
my $groups = $self->{GROUPS};
confess "EndGroup() called outside Apply() operation (internal error)"
unless defined $group_stack;
my $type_msg = ($name eq "*") ? "" : "of type ''$name''";
die "bracketing is not balanced: too many closing delimiters $type_msg\n"
unless @$group_stack > 0;
confess "data structure for result values is corrupt in Apply() call (internal error)"
unless @$groups == @$group_stack + 1;
my $active_group = pop @$group_stack;
die "opening delimiter of type ''$active_group'' paired with closing delimiter of type ''$name''\n"
unless $name eq $active_group;
my $group_results = shift @$groups;
return @$group_results;
}
=item I<$n> = I<$self>->B<NestingLevel>;
Returns the nesting depth I<$n> of the current group during an B<Apply>
operation. A nesting depth of 0 corresponds to the top level.
B<NestingLevel> may only be called while the shift-reduce parser is active and
( run in 1.813 second using v1.01-cache-2.11-cpan-e1769b4cff6 )