Parse-Gnaw

lib/Parse/Gnaw/Blocks/ParsingMethods.pm


=cut

sub cc{
	my ($llobj, $subrule)=@_;
	my $href_info=$subrule->[2];
	my $hash_of_letters = $href_info->{hash_of_letters};


	my $letter_payload = $llobj->[LIST__CURR_START]->[LETTER__DATA_PAYLOAD];

	#print "called cc with letter_payload '$letter_payload' and class hash "; print Dumper $hash_of_letters; warn " ";

	unless(exists($hash_of_letters->{$letter_payload})){
		#warn "dying ";
		die "GRAMMARFAIL";
	}
	$llobj->[LIST__CURR_START]->[LETTER__LETTER_HAS_BEEN_CONSUMED]=1;
}

=head2 notcc

This is short for "not character class".
In Perl regular expressions, this is written as [^ ].
The letters inside the square brackets are the letters in the character class that you do NOT want to match.
For example, [^aeiou] will NOT match any single vowel,
but it WILL match any single character that is not a vowel.
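
A minimal sketch of the lookup that cc and notcc rely on, assuming the
class is stored as a hash keyed by single letters. The helper names
make_class_hash, in_class and not_in_class are hypothetical and are not
part of this module; only the exists() test mirrors the code here.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a string such as 'aeiou' into a lookup hash
# with one key per letter (an assumption about how hash_of_letters is built).
sub make_class_hash {
    my ($letters) = @_;
    my %hash_of_letters = map { $_ => 1 } split //, $letters;
    return \%hash_of_letters;
}

# Character class test: true when the letter IS a key of the hash.
sub in_class {
    my ($hash_of_letters, $letter) = @_;
    return exists $hash_of_letters->{$letter} ? 1 : 0;
}

# Negated character class test: true when the letter is NOT a key.
sub not_in_class {
    my ($hash_of_letters, $letter) = @_;
    return in_class($hash_of_letters, $letter) ? 0 : 1;
}

my $vowels = make_class_hash('aeiou');
for my $letter (qw(a b e z)) {
    printf "%s: cc=%d notcc=%d\n",
        $letter, in_class($vowels, $letter), not_in_class($vowels, $letter);
}
```

The two tests differ only in the sense of the exists() check, which is
also the only difference between cc and notcc.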

=cut

sub notcc{
	my ($llobj, $subrule)=@_;
	my $href_info=$subrule->[2];
	my $hash_of_letters = $href_info->{hash_of_letters};


	my $letter_payload = $llobj->[LIST__CURR_START]->[LETTER__DATA_PAYLOAD];

	#print "called notcc with letter_payload '$letter_payload' and class hash "; print Dumper $hash_of_letters; warn " ";

	if(exists($hash_of_letters->{$letter_payload})){
		#warn "dying ";
		die "GRAMMARFAIL";
	}
	$llobj->[LIST__CURR_START]->[LETTER__LETTER_HAS_BEEN_CONSUMED]=1;
}

=head2 thrifty

Perform a thrifty quantifier match.

Note: Since we want to be able to read petabytes of streamed data,
we default to thrifty matching,
i.e. match as little as possible and move on.
If we did greedy matching, the first .* we ran into would
read the entire stream (petabytes) into memory and crash the system.
If it didn't crash, it would back up until it found a match.
Thrifty matching reads in only as much of the
stream as is needed to find a match.
We CAN do greedy matching, but it can be a problem if we're streaming massive quantities of data.

Basic thrifty algorithm:
Try the rule at least min times.
If that matches, then return and let the rest of the grammar try.
If the rest of the grammar dies, then revert to the min location
and try matching the rule one more time.
If that passes, then return and let the rest of the grammar try.
If the rest of the grammar dies, then revert to the min+1 location
and try matching the rule again.

Keep doing this until you reach the "max" number of matches.
If the rest of the grammar still dies, then the quantifier dies
and the expression fails.

    rule1 : 'a' rule2 'b'
    rule2 : 'c' d+ rule3 e+
    rule3 : f g+ rule4 h
    rule4 : i*
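
The basic thrifty algorithm described above can be sketched as follows.
This is an illustration only, not the code in this module: it works on a
plain string and an integer position instead of the linked list of
letters, and the names thrifty_match and literal are hypothetical. The
"rest of the grammar" is modelled as a code reference that returns the
end position on success or undef on failure.

```perl
use strict;
use warnings;

# Hypothetical helper: build a matcher for a literal string.
# The matcher returns the position after the match, or undef on failure.
sub literal {
    my ($string) = @_;
    return sub {
        my ($text, $pos) = @_;
        return substr($text, $pos, length($string)) eq $string
            ? $pos + length($string)
            : undef;
    };
}

# Thrifty quantifier sketch: match $rule between $min and $max times,
# consuming as little as possible while still letting $rest succeed.
sub thrifty_match {
    my ($text, $pos, $rule, $min, $max, $rest) = @_;

    # The rule must match the mandatory minimum number of times.
    for (1 .. $min) {
        $pos = $rule->($text, $pos);
        return unless defined $pos;
    }

    my $count = $min;
    while (1) {
        # Let the rest of the grammar try from the current location.
        my $end = $rest->($text, $pos);
        return $end if defined $end;

        # The rest failed: match the rule once more, unless already at max.
        return if $count >= $max;
        $pos = $rule->($text, $pos);
        return unless defined $pos;
        $count++;
    }
}

# 'a' matched between 1 and 5 times, followed by the literal 'ab'.
my $end = thrifty_match('aaab', 0, literal('a'), 1, 5, literal('ab'));
print defined $end ? "matched up to position $end\n" : "no match\n";
```

When the rest of the grammar accepts any position, this sketch consumes
nothing beyond the minimum, which is the property that matters for
streamed input.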



=cut

sub thrifty {
	my ($llobj, $subrule)=@_;

	my $payload=$subrule->[1];

	my $rule 	= $payload->{rule};
	my $then_call 	= $payload->{then_call};

	my $grammarref=$llobj->convert_grammar_name_to_array_ref($rule);

	$llobj->parse_grammarref($grammarref, $then_call );


	

	return;	# must have matched.
}

=head2 greedy

Basic greedy algorithm:
Try the rule up to max times.
If not even zero matches succeed, die.
At the end of every match, record the letter location of that specific match.

Return and let the rest of the grammar try.
If the rest of the grammar dies, then revert to the max-1 location
and try again.
Return and let the rest of the grammar try.
If the rest of the grammar dies, then revert to the max-2 location
and try again.

Keep doing this until you reach the "min" number of matches.
If we can't find a match even at "min", then the quantifier dies.
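
The greedy steps above can be sketched in the same spirit. Again this is
an illustration under assumptions, not this module's implementation: it
uses a plain string and integer positions, the names greedy_match and
literal are hypothetical, and it treats "fewer than min matches" as the
failure condition for the first phase.

```perl
use strict;
use warnings;

# Hypothetical helper: build a matcher for a literal string.
# The matcher returns the position after the match, or undef on failure.
sub literal {
    my ($string) = @_;
    return sub {
        my ($text, $pos) = @_;
        return substr($text, $pos, length($string)) eq $string
            ? $pos + length($string)
            : undef;
    };
}

# Greedy quantifier sketch: match $rule as many times as possible (up to
# $max), then back off one match at a time until $rest succeeds.
sub greedy_match {
    my ($text, $pos, $rule, $min, $max, $rest) = @_;

    # $locations[N] is the position reached after N successful matches.
    my @locations = ($pos);
    while ($#locations < $max) {
        my $next = $rule->($text, $locations[-1]);
        last unless defined $next;
        push @locations, $next;
    }

    # Assumption: fail outright if the minimum count was never reached.
    return if $#locations < $min;

    # Try the rest of the grammar from the longest match first, then
    # revert to each shorter recorded location down to $min matches.
    for (my $count = $#locations; $count >= $min; $count--) {
        my $end = $rest->($text, $locations[$count]);
        return $end if defined $end;
    }
    return;
}

# 'a' matched up to 10 times, followed by the literal 'ab'.
my $end = greedy_match('aaab', 0, literal('a'), 0, 10, literal('ab'));
print defined $end ? "matched up to position $end\n" : "no match\n";
```

Note that the first loop runs the rule to exhaustion before the rest of
the grammar is consulted, which is why greedy matching is risky on an
unbounded stream.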


