Parse-Gnaw
lib/Parse/Gnaw/Blocks/ParsingMethods.pm
=cut
sub cc{
    my ($llobj, $subrule)=@_;
    my $href_info=$subrule->[2];
    my $hash_of_letters = $href_info->{hash_of_letters};
    my $letter_payload = $llobj->[LIST__CURR_START]->[LETTER__DATA_PAYLOAD];
    #print "called cc with letter_payload '$letter_payload' and class hash "; print Dumper $hash_of_letters; warn " ";
    # fail the grammar unless the current letter is in the character class
    unless(exists($hash_of_letters->{$letter_payload})){
        #warn "dying ";
        die "GRAMMARFAIL";
    }
    # mark the current letter as consumed
    $llobj->[LIST__CURR_START]->[LETTER__LETTER_HAS_BEEN_CONSUMED]=1;
}
=head2 notcc
This is short for "not character class".
In Perl regular expressions, this is written as [^ ].
The letters in the square brackets are the letters in the character class you do NOT want to match.
For example, [^aeiou] does NOT match any single vowel.
It DOES match any single character that is NOT a vowel.
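
The difference between cc and notcc can be illustrated with a minimal standalone sketch.
The names sketch_cc and sketch_notcc are hypothetical and are not part of this module:
they take a plain letter and a hash of letters rather than the linked-list object,
and only mirror the hash lookup and the "GRAMMARFAIL" convention used here.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for cc/notcc: they take a plain letter and a
# hash of letters instead of the linked-list object used by this module.
my %vowels = map { $_ => 1 } qw(a e i o u);

sub sketch_cc {
    my ($letter, $hash_of_letters) = @_;
    # fail unless the letter is a member of the class
    die "GRAMMARFAIL" unless exists $hash_of_letters->{$letter};
    return 1;    # letter is in the class, so it would be consumed
}

sub sketch_notcc {
    my ($letter, $hash_of_letters) = @_;
    # fail if the letter is a member of the class
    die "GRAMMARFAIL" if exists $hash_of_letters->{$letter};
    return 1;    # letter is outside the class, so it would be consumed
}

# eval turns the die into a false value so the outcomes can be compared.
my $cc_e    = eval { sketch_cc('e', \%vowels) }    ? 1 : 0;
my $notcc_e = eval { sketch_notcc('e', \%vowels) } ? 1 : 0;
my $notcc_x = eval { sketch_notcc('x', \%vowels) } ? 1 : 0;

# prints "cc(e)=1 notcc(e)=0 notcc(x)=1"
print "cc(e)=$cc_e notcc(e)=$notcc_e notcc(x)=$notcc_x\n";
```

The only difference between the two subs is the sense of the membership test, which matches the unless/if difference between cc and notcc.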
=cut
sub notcc{
    my ($llobj, $subrule)=@_;
    my $href_info=$subrule->[2];
    my $hash_of_letters = $href_info->{hash_of_letters};
    my $letter_payload = $llobj->[LIST__CURR_START]->[LETTER__DATA_PAYLOAD];
    #print "called notcc with letter_payload '$letter_payload' and class hash "; print Dumper $hash_of_letters; warn " ";
    # fail the grammar if the current letter is in the excluded character class
    if(exists($hash_of_letters->{$letter_payload})){
        #warn "dying ";
        die "GRAMMARFAIL";
    }
    # mark the current letter as consumed
    $llobj->[LIST__CURR_START]->[LETTER__LETTER_HAS_BEEN_CONSUMED]=1;
}
=head2 thrifty
Perform a thrifty quantifier match.
Note: Since we want to be able to read petabytes of streamed data,
we default to thrifty matching,
i.e. match as little as possible and move on.
If we did greedy matching, the first .* we ran into would
read the entire stream (petabytes) into memory and could crash the system.
If it didn't crash, it would then back up until it found a match.
Thrifty matching reads in only as much of the
stream as is needed to find a match.
We can DO greedy matching, but it can be a problem if we're streaming massive quantities of data.
Basic thrifty algorithm:
Try the rule at least min times.
If that matches, then return and let the rest of the grammar try.
If the rest of the grammar dies, then revert to the min location
and try matching one more time.
If that passes, then return and let the rest of the grammar try.
If the rest of the grammar dies, then revert to the min+1 location
and try another rule.
Keep doing this until you reach "max" number of matches.
If that still doesn't produce a match, then the quantifier dies
and the expression fails.
rule1 : 'a' rule2 'b'
rule2 : 'c' d+ rule3 e+
rule3 : f g+ rule4 h
rule4 : i*
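
The basic thrifty algorithm described above can be sketched on a plain string.
This is only an illustration under stated assumptions, not this module's implementation:
thrifty_sketch, literal, and $any_letter are hypothetical names,
rules are modelled as code refs that return the new position or undef,
and failure is signalled with undef instead of die "GRAMMARFAIL".

```perl
use strict;
use warnings;

# Hypothetical helper: build a rule that matches one literal string.
# A rule takes ($text, $pos) and returns the new position or undef.
sub literal {
    my ($string) = @_;
    return sub {
        my ($text, $pos) = @_;
        return substr($text, $pos, length($string)) eq $string
            ? $pos + length($string)
            : undef;
    };
}

# Hypothetical rule that matches any single letter, like '.' in a regex.
my $any_letter = sub {
    my ($text, $pos) = @_;
    return $pos < length($text) ? $pos + 1 : undef;
};

# Match $rule between $min and $max times starting at $pos, trying the
# fewest repetitions first, then hand over to $rest (the rest of the grammar).
# Returns (number of matches, end position) or an empty list on failure.
sub thrifty_sketch {
    my ($text, $pos, $rule, $min, $max, $rest) = @_;
    my $count = 0;

    # The rule must match at least $min times before the rest may run.
    while ($count < $min) {
        $pos = $rule->($text, $pos);
        return unless defined $pos;
        $count++;
    }

    while (1) {
        # Let the rest of the grammar try from the current location.
        my $end = $rest->($text, $pos);
        return ($count, $end) if defined $end;

        # The rest failed: match the rule one more time, if allowed.
        return if $count >= $max;
        $pos = $rule->($text, $pos);
        return unless defined $pos;
        $count++;
    }
}

my ($count, $end) = thrifty_sketch('xbxb', 0, $any_letter, 0, 10, literal('b'));

# prints "rule matched 1 time(s); rest of grammar ended at position 2"
print "rule matched $count time(s); rest of grammar ended at position $end\n";
```

In this sketch the quantified rule stops at the first "b" and never looks at the second half of the string, which is the property that matters when the input is a very large stream.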
=cut
sub thrifty {
    my ($llobj, $subrule)=@_;
    my $payload=$subrule->[1];
    my $rule = $payload->{rule};
    my $then_call = $payload->{then_call};
    my $grammarref=$llobj->convert_grammar_name_to_array_ref($rule);
    $llobj->parse_grammarref($grammarref, $then_call );
    return; # must have matched.
}
=head2 greedy
Basic greedy algorithm:
Try the rule max times.
If not even zero match, die.
At the end of every match, record the letter location of that specific match.
Return and let the rest of the grammar try.
If the rest of the grammar dies, then revert to the max-1 location
and try another rule.
Return and let the rest of the grammar try.
If the rest of the grammar dies, then revert to the max-2 location
and try another rule.
Keep doing this until you reach "min" number of matches.
If we can't find a match even at "min", then the quantifier dies