Language-FormulaEngine

lib/Language/FormulaEngine/Parser.pm


=head1 NAME

Language::FormulaEngine::Parser - Create parse tree from an input string

=head1 VERSION

version 0.08

=head1 SYNOPSIS

  my $parse_tree= Language::FormulaEngine::Parser->new->parse($string);

=head1 DESCRIPTION

This class scans tokens from an input string and builds a parse tree.  In compiler terminology,
it is both a Scanner and a Parser.  It performs a top-down recursive descent parse, because this
approach is simple to implement and gives good error messages.  It only parses strings, but
leaves room for subclasses to implement streaming.  By default, the parser simply applies a
Grammar to the input without checking whether the functions or variables exist, but it can be
subclassed to do more detailed analysis during the parse.

The generated parse tree is made up of Function nodes (each infix operator is converted to a
named function), and each Function node may contain Symbols, Strings, Numbers, and other
Function nodes.  The parse tree can be passed to the Evaluator for immediate execution, or passed
to the Compiler to generate an optimized Perl coderef.  The parse tree is lightweight and does
not include token/context information; this could be added by a subclass.

=head1 PUBLIC API

=head2 parse

Parse a new input text, updating all derived attributes with the result of the operation.
It returns the value of L</parse_tree> (which is C<undef> if the parse failed).
On failure, the exception is stored in L</error>, and other attributes such as L</token_pos> may
contain useful diagnostic information.
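
A minimal sketch of the success and failure paths described above, using only the documented
C<parse>, C<error>, and C<token_pos> members.  The two formula strings are illustrative
assumptions: the first is assumed to be valid under the default grammar, and the unclosed
parenthesis in the second is assumed to be rejected as a syntax error.

```perl
use strict;
use warnings;
use Language::FormulaEngine::Parser;

my $parser = Language::FormulaEngine::Parser->new;
my %outcome;

# The second string is deliberately malformed (assumption: an unclosed
# parenthesis is a syntax error under the default grammar).
for my $text ('1 + 2 * x', '(1 + 2') {
    my $tree = $parser->parse($text);
    if (defined $tree) {
        $outcome{$text} = 'ok';
        print "parsed '$text'\n";
    }
    else {
        $outcome{$text} = 'error';
        # token_pos may hint at where the scanner stopped.
        my $pos = $parser->token_pos;
        printf "could not parse '%s': %s (token_pos: %s)\n",
            $text, $parser->error, defined $pos ? $pos : 'n/a';
    }
}
```

Because C<parse> reports failure by returning C<undef> rather than by throwing, the caller
checks the return value with C<defined> and then reads C<error>.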

=head2 parse_tree

This holds the generated parse tree, or C<undef> if the parse failed.  See L</"Parse Nodes">.

=head2 error

This is C<undef> if the parse succeeded, else an error message describing the syntax that ended
the parse.

=head2 functions

A set (hashref) of all function names encountered during the parse.

=head2 symbols

A set (hashref) of all non-function symbols encountered (variables, constants, etc.).
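
As a rough illustration, the two sets can be inspected after a parse to find out what a formula
depends on.  The formula text and the names C<round>, C<price>, and C<qty> are hypothetical; the
parser is documented as not checking whether functions or variables exist, so none of them need
to be defined.  The exact keys stored for infix operators are not shown here, since they depend
on how the grammar names them.

```perl
use strict;
use warnings;
use Language::FormulaEngine::Parser;

my $parser = Language::FormulaEngine::Parser->new;

# Hypothetical formula; none of these names need to exist to be parsed.
my $tree = $parser->parse('round(price * qty, 2)');
die "parse failed: " . $parser->error unless defined $tree;

# Both attributes are hashrefs used as sets, so the keys are the names.
my @function_names = sort keys %{ $parser->functions };
my @symbol_names   = sort keys %{ $parser->symbols };

print "functions: @function_names\n";
print "symbols:   @symbol_names\n";
```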

=head2 reset

Clear the results of the previous parse, to re-use the object.  Returns C<$self> for chaining.
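
A short sketch of re-using one parser object, assuming that C<reset> clears C<parse_tree> and
C<error> as the description implies.  The formula strings are illustrative only.

```perl
use strict;
use warnings;
use Language::FormulaEngine::Parser;

my $parser = Language::FormulaEngine::Parser->new;

$parser->parse('a + b');
my $had_tree = defined $parser->parse_tree;

# Discard the previous results before re-using the object.
$parser->reset;
my $cleared = !defined $parser->parse_tree && !defined $parser->error;

# reset returns $self, so it can be chained directly into the next parse.
my $tree = $parser->reset->parse('c * 2');
print "second parse ", (defined $tree ? "succeeded" : "failed"), "\n";
```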

=head2 deparse

  my $formula_text= $parser->deparse($tree);

Return a canonical formula text for the current L</parse_tree>, or for a parse tree that you supply.
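
A minimal sketch of both calling styles, assuming that calling C<deparse> with no argument falls
back to the parser's own C<parse_tree>.  The input formula is illustrative, and the exact
canonical text it produces is not shown because it depends on the deparse rules.

```perl
use strict;
use warnings;
use Language::FormulaEngine::Parser;

my $parser = Language::FormulaEngine::Parser->new;

# Irregular spacing on purpose: deparse should emit a normalized form.
my $tree = $parser->parse('1+2 *   x');
die "parse failed: " . $parser->error unless defined $tree;

my $from_parser = $parser->deparse;          # uses the parser's parse_tree
my $from_tree   = $parser->deparse($tree);   # uses the tree passed in

print "$from_parser\n";
print "$from_tree\n";
```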

=head1 EXTENSIBLE API

These methods and attributes are documented for purposes of subclassing the parser.

=head2 input

The input string being scanned.
Code within the parser should access this as C<< $self->{input} >> for efficiency.

=head2 input_pos

Shortcut for C<< pos($self->{input}) >>.

=head2 token_type

Type of current token scanned from C<input>.
Code within the parser should access this as C<< $self->{token_type} >> for efficiency.

=head2 token_value

Value of current token scanned from C<input>, with escape sequences and the like resolved to a
sensible Perl value.
Code within the parser should access this as C<< $self->{token_value} >> for efficiency.

=head2 token_pos

An offset within C<input> where this token started.
Code within the parser should access this as C<< $self->{token_pos} >> for efficiency.

=head2 next_token

Advance to the next token, replacing the values of the C<token_*> attributes and updating
C<input_pos>.  Returns the C<token_type>.  Every token type is a true value except EOF, whose
type is C<0>, so the return value is true if a token was scanned and false if the scanner
reached EOF.  It dies if no token could be parsed.
Calling C<next_token> again after the EOF token throws an exception.

This method is a wrapper around L</scan_token>. Override that method to add new token types.

=head2 scan_token

Pattern-match the next token, and either return C<< $type => $value >> or an empty list if
the syntax is invalid.  This is intended to be overridden by subclasses.

=head2 consume_token

  return $self->consume_token if $self->{token_type} eq $desired_type;

This is a shorthand for returning the current C<token_value> while also calling C<next_token>.

=head2 token_context

  my $text= $self->token_context(%options);

Default behavior generates a string like:

  "'blah blah' on line 15, char 12"

Passing C<< token_context(multiline => 1) >> generates a string like


