Language-FormulaEngine
lib/Language/FormulaEngine/Parser.pm
=head1 NAME
Language::FormulaEngine::Parser - Create parse tree from an input string
=head1 VERSION
version 0.08
=head1 SYNOPSIS
  my $parse_tree= Language::FormulaEngine::Parser->new->parse($string);
=head1 DESCRIPTION
This class scans tokens from an input string and builds a parse tree. In compiler terminology,
it is both a Scanner and Parser. It performs a top-down recursive descent parse, because this
is easy and gives good error messages. It only parses strings, but leaves room for subclasses
to implement streaming. By default, the parser simply applies a Grammar to the input, without
checking whether the functions or variables exist, but can be subclassed to do more detailed
analysis during the parse.
The generated parse tree is made up of Function nodes (each infix operator is converted to a
named function) and each Function node may contain Symbols, Strings, Numbers, and other
Function nodes. The parse tree can be passed to the Evaluator for instant execution, or passed
to the Compiler to generate an optimized perl coderef. The parse tree is lightweight, and does
not include token/context information; this could also be added by a subclass.
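As a rough illustration of the approach described above (not this module's actual code), the following self-contained sketch does a top-down recursive descent parse of a tiny grammar and converts each infix operator into a named function node. The node layout (C<< [ call => $name, @args ] >> array refs), the function names C<sum> and C<mul>, and all sub names are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical mini-grammar: numbers, identifiers, '+', '*', and parentheses.
# Nodes are plain arrays: [ num => 3 ], [ sym => 'x' ], [ call => 'sum', @args ].

# Split the text into tokens, dying on the first character that fits no token.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while ($text !~ /\G\s*\z/gc) {
        $text =~ /\G\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|[+*()])/gc
            or die "Unexpected input at char " . pos($text) . "\n";
        push @tokens, $1;
    }
    return \@tokens;
}

sub parse_formula {
    my ($text) = @_;
    my $tokens = tokenize($text);
    my $tree   = parse_sum($tokens);
    die "Unexpected '$tokens->[0]'\n" if @$tokens;   # left-over input
    return $tree;
}

# sum := product ( '+' product )*
sub parse_sum {
    my ($tokens) = @_;
    my @args = parse_product($tokens);
    while (@$tokens && $tokens->[0] eq '+') {
        shift @$tokens;
        push @args, parse_product($tokens);
    }
    return @args > 1 ? [ call => 'sum', @args ] : $args[0];
}

# product := atom ( '*' atom )*
sub parse_product {
    my ($tokens) = @_;
    my @args = parse_atom($tokens);
    while (@$tokens && $tokens->[0] eq '*') {
        shift @$tokens;
        push @args, parse_atom($tokens);
    }
    return @args > 1 ? [ call => 'mul', @args ] : $args[0];
}

# atom := number | identifier | '(' sum ')'
sub parse_atom {
    my ($tokens) = @_;
    my $tok = shift @$tokens;
    die "Unexpected end of input\n" unless defined $tok;
    return [ num => $tok ] if $tok =~ /^\d/;
    return [ sym => $tok ] if $tok =~ /^[A-Za-z_]/;
    if ($tok eq '(') {
        my $inner = parse_sum($tokens);
        my $close = shift @$tokens;
        die "Expected ')'\n" unless defined $close && $close eq ')';
        return $inner;
    }
    die "Unexpected '$tok'\n";
}

# The '+' becomes a 'sum' call whose second argument is a 'mul' call.
my $sketch_tree = parse_formula('a + 2 * (b + 1)');
```

Each grammar rule is one sub, which is what keeps this style easy to write and lets every C<die> name the exact token that was unexpected.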
=head1 PUBLIC API
=head2 parse
Parse a new input text, updating all derived attributes with the result of the operation.
It returns the value of L</parse_tree> (which is undef if the parse failed).
On failure, the exception is stored in L</error> and other attributes like L</token_pos> may
contain useful diagnostic information.
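A minimal usage sketch, assuming only the behavior described above: C<parse> returns C<undef> on failure and leaves the reason in C<error>. The formulas are invented for the example, and the exact wording of the error message is not shown because it depends on the module.

```perl
use strict;
use warnings;
use Language::FormulaEngine::Parser;

my $parser = Language::FormulaEngine::Parser->new;

# The second formula has an unbalanced parenthesis and is expected to fail.
for my $formula ('a + b * 2', '(a + b') {
    if (defined(my $tree = $parser->parse($formula))) {
        print "parsed '$formula'\n";
    }
    else {
        # parse() returned undef, so the reason is available from ->error
        print "failed to parse '$formula': ", $parser->error, "\n";
    }
}
```

Because each call to C<parse> replaces the derived attributes, the same parser object can be reused for both formulas.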
=head2 parse_tree
This holds the generated parse tree, or C<undef> if the parse failed. See L</"Parse Nodes">.
=head2 error
This is C<undef> if the parse succeeded, else an error message describing the syntax that ended
the parse.
=head2 functions
A set (hashref) of all function names encountered during the parse.
=head2 symbols
A set (hashref) of all non-function symbols encountered (variables, constants, etc.).
=head2 reset
Clear the results of the previous parse, to re-use the object. Returns C<$self> for chaining.
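The attributes above can be inspected after a parse roughly as follows. This is a sketch under assumptions: the formula and the names C<round>, C<price> and C<qty> are invented, and the exact key names reported (for example, how infix operators are named) depend on the grammar, so no output is shown.

```perl
use strict;
use warnings;
use Language::FormulaEngine::Parser;

my $parser = Language::FormulaEngine::Parser->new;
defined $parser->parse('round(price * qty, 2)')
    or die "parse failed: " . $parser->error;

# Both attributes are sets stored as hashrefs, so the names are the keys.
print "functions: @{[ sort keys %{ $parser->functions } ]}\n";
print "symbols:   @{[ sort keys %{ $parser->symbols } ]}\n";

# reset returns $self, so it chains directly into the next parse.
my $next_tree = $parser->reset->parse('1 + 2');
```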
=head2 deparse
  my $formula_text= $parser->deparse($tree);
Return a canonical formula text for the current parse tree, or for a parse tree that you supply.
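One way to use this is to normalize user input and check that it round-trips. The sketch below assumes only the documented C<parse> and C<deparse> calls; the formula is invented and the canonical text itself is not shown, since its exact spacing is up to the module.

```perl
use strict;
use warnings;
use Language::FormulaEngine::Parser;

my $parser = Language::FormulaEngine::Parser->new;

# Irregular spacing on purpose; deparse should produce one canonical spelling.
my $tree = $parser->parse('a+b  *  2');
defined $tree or die "parse failed: " . $parser->error;

my $canonical = $parser->deparse($tree);
print "$canonical\n";

# The canonical text is itself a formula, so it should parse again.
defined $parser->parse($canonical)
    or die "round trip failed: " . $parser->error;
```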
=head1 EXTENSIBLE API
These methods and attributes are documented for purposes of subclassing the parser.
=head2 input
The input string being scanned.
Code within the parser should access this as C<< $self->{input} >> for efficiency.
=head2 input_pos
Shortcut for C<< pos($self->{input}) >>.
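For readers less familiar with C<pos>, this self-contained sketch shows how the match position travels with the string stored in a hash field. The C<$self> hash here is a hypothetical stand-in, not a real parser object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser object: only the {input} field matters here.
my $self = { input => 'foo + 42' };
pos($self->{input}) = 0;

# A \G-anchored match with /gc consumes text and advances pos() on success...
$self->{input} =~ /\G\s*(\w+)/gc or die "expected a word";
print "matched '$1', input_pos is now ", pos($self->{input}), "\n";

# ...and leaves pos() untouched on failure, so another pattern can be tried.
if ($self->{input} !~ /\G\s*(\d+)/gc) {
    print "no number here, input_pos is still ", pos($self->{input}), "\n";
}
```

Both print statements report position 3, the offset just past C<foo>.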
=head2 token_type
Type of current token scanned from C<input>.
Code within the parser should access this as C<< $self->{token_type} >> for efficiency.
=head2 token_value
Value of current token scanned from C<input>, with escape sequences and the like resolved to a
sensible perl value.
Code within the parser should access this as C<< $self->{token_value} >> for efficiency.
=head2 token_pos
An offset within C<input> where this token started.
Code within the parser should access this as C<< $self->{token_pos} >> for efficiency.
=head2 next_token
Advance to the next token, replacing the values of the C<token_*> attributes and updating
C<input_pos>. Returns the C<token_type>. Every token type is a true value except EOF, whose
type is C<0>, so the return value is true if a token was parsed and false at EOF.
It dies if no token could be parsed.
If you call C<next_token> again after the EOF token, it throws an exception.
This method is a wrapper around L</scan_token>. Override that method to add new token types.
=head2 scan_token
Pattern-match the next token, and either return C<< $type => $value >> or an empty list if
the syntax is invalid. This is intended to be overridden by subclasses.
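The contract between C<next_token> and C<scan_token> can be sketched with a small stand-alone class. Everything here is hypothetical (the C<MiniScanner> package, its token types and its error messages); it only mirrors the behavior described above and is not the module's implementation.

```perl
use strict;
use warnings;

package MiniScanner;

sub new {
    my ($class, $input) = @_;
    my $self = bless { input => $input }, $class;
    pos($self->{input}) = 0;
    return $self;
}

# Return ($type => $value) for the token at the current position,
# or an empty list if nothing valid starts there.
sub scan_token {
    my $self = shift;
    return (0 => '') if pos($self->{input}) >= length($self->{input});   # EOF is type 0
    return (Number     => $1 + 0) if $self->{input} =~ /\G(\d+(?:\.\d+)?)/gc;
    return (Identifier => $1)     if $self->{input} =~ /\G([A-Za-z_]\w*)/gc;
    return ($1         => $1)     if $self->{input} =~ /\G([-+*\/(),])/gc;
    return;
}

# Wrapper: skip whitespace, record where the token starts, call scan_token,
# store the result in the token_* fields, and return the type.
sub next_token {
    my $self = shift;
    die "Can't read past end of input\n"
        if defined $self->{token_type} && $self->{token_type} eq '0';
    $self->{input} =~ /\G\s+/gc;
    my $start = pos($self->{input});
    my ($type, $value) = $self->scan_token
        or die "Syntax error at char $start\n";
    @{$self}{qw( token_type token_value token_pos )} = ($type, $value, $start);
    return $type;
}

package main;

my $scanner = MiniScanner->new('total + 2.5');

# Every real token type is a true string, so the loop stops at EOF (type 0).
while (my $type = $scanner->next_token) {
    printf "%-10s %-6s at %d\n", $type, $scanner->{token_value}, $scanner->{token_pos};
}
```

Keeping the pattern matching in C<scan_token> means a subclass can add a token type by trying its own pattern first and falling back to the inherited method.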
=head2 consume_token
  return $self->consume_token if $self->{token_type} eq $desired_type;
This is a shorthand for returning the current C<token_value> while also calling C<next_token>.
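A self-contained sketch of that shorthand, using a hypothetical C<MiniTokens> class whose tokens are already scanned. The class, its fields and the C<parse_optional_identifier> rule are assumptions for illustration only.

```perl
use strict;
use warnings;

package MiniTokens;

# Hypothetical stand-in for the parser: tokens are pre-scanned [ type, value ] pairs,
# followed by an EOF token of type 0.
sub new {
    my ($class, @tokens) = @_;
    my $self = bless { queue => [ @tokens, [ 0, '' ] ] }, $class;
    $self->next_token;
    return $self;
}

sub next_token {
    my $self = shift;
    my $tok = shift @{ $self->{queue} } or die "No more tokens\n";
    @{$self}{qw( token_type token_value )} = @$tok;
    return $self->{token_type};
}

# Return the current token's value, then advance to the following token.
sub consume_token {
    my $self  = shift;
    my $value = $self->{token_value};
    $self->next_token;
    return $value;
}

# Example grammar rule: take an identifier if one is present, else leave the token alone.
sub parse_optional_identifier {
    my $self = shift;
    return $self->consume_token if $self->{token_type} eq 'Identifier';
    return undef;
}

package main;

my $tokens = MiniTokens->new([ Identifier => 'x' ], [ '+' => '+' ]);
my $name = $tokens->parse_optional_identifier;   # 'x'; the current token is now '+'
my $none = $tokens->parse_optional_identifier;   # undef; '+' is left in place
```

Returning the value and advancing in one call keeps each grammar rule to a single line per token type it accepts.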
=head2 token_context
  my $text= $self->token_context(%options);
Default behavior generates a string like:
  "'blah blah' on line 15, char 12"
Passing C<< token_context(multiline => 1) >> generates a string like