CodeGen-Cpppp
lib/CodeGen/Cpppp/CParser.pm
         | # all other characters
           (.) (?{ $_type= 'unknown'; $_error= q{parse error} })
         )
      }xcg
   ) {
      my @token= ($_type, $_value // $1, $-[0], $+[0] - $-[0], defined $_error? ($_error) : ());
      # disambiguate negative number from minus operator
      if (($_type eq 'integer' || $_type eq 'real')
         && @tokens && $tokens[-1][0] eq '-'
         && (@tokens == 1 || !$tokens_before_infix_minus{$tokens[-2]->type})
      ) {
         $token[1]= -$token[1];
         $token[2]= $tokens[-1][2];
         $token[3]= $+[0] - $tokens[-1][2];
         @{$tokens[-1]}= @token;
      } else {
         push @tokens, bless \@token, 'CodeGen::Cpppp::CParser::Token';
      }
      ($_error, $_value)= (undef, undef);
   }
   return @tokens;
}
1;
__END__
=pod
=encoding UTF-8
=head1 NAME
CodeGen::Cpppp::CParser - C Parser Utility Library
=head1 METHODS
=head2 tokenize
@tokens= $class->tokenize($string);
@tokens= $class->tokenize(\$string);
@tokens= $class->tokenize(\$string, $max_tokens);
Parse some number of C language tokens from the input string, and update the
string's regex C<pos()> so that a later call can resume parsing where this one
stopped. Because this modifies the string's C<pos()>, you can pass the string
by reference to make that side effect clearer to readers.
If C<$max_tokens> is given, at most that many tokens will be returned.
Whitespace is ignored (not returned as a token), except for whitespace contained
in a 'directive' token. The body of a directive needs to be tokenized further.
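The resume-by-C<pos()> behavior described above can be illustrated with a small stand-alone sketch. This is not the module's implementation: C<toy_tokenize> is a hypothetical helper that recognizes only identifiers and integers, and it assumes the usual C<\G> plus C</gc> regex idiom for continuing a match from the string's saved position.

```perl
use strict;
use warnings;

# Hypothetical toy tokenizer (not the module's code): reads up to $max
# identifier or integer tokens, starting at the string's current pos().
sub toy_tokenize {
    my ($ref, $max) = @_;
    my @tokens;
    # Check the limit first so no token is consumed once it is reached.
    while ((!defined $max || @tokens < $max)
        && $$ref =~ /\G \s* (?: ([A-Za-z_]\w*) | ([0-9]+) ) /xgc
    ) {
        push @tokens, defined $1 ? [ ident => $1 ] : [ integer => $2 ];
    }
    return @tokens;
}

my $src       = "foo 42 bar 7";
my @first     = toy_tokenize(\$src, 2);  # consumes "foo 42"
my $resume_at = pos($src);               # 6: where the next call continues
my @rest      = toy_tokenize(\$src);     # picks up at "bar", then "7"
```

Passing C<\$src> makes it visible at the call site that the string's C<pos()> is being changed, which is the reason the documentation suggests the reference form.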
Each token is an arrayref of the form:
[ $type, $value, $offset, $length, $error=undef ]
$type:   'directive', 'comment', 'string', 'char', 'real', 'integer',
         'keyword', 'ident', 'unknown', or any punctuation character
$value:  for constants, this is the decoded string or numeric value
         for directives and comments, it is the body text
         for punctuation, it is a copy of $type
         for unknown, it is the exact character that didn't parse
$offset: the character offset within the source $string
$length: the number of characters occupied in the source $string
$error:  if the token is invalid in some way, but still unambiguously that
         type of token (e.g. unclosed string or unclosed comment), it will be
         returned with a 5th element containing the error message.
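As a rough illustration of the layout above, the following sketch unpacks a few tokens and checks for the optional 5th element. The token list is hand-written sample data for the source text C<x = "abc> (it is not output captured from C<tokenize>), and the error message text is an assumption.

```perl
use strict;
use warnings;

# Hand-written sample tokens in the documented layout:
#   [ $type, $value, $offset, $length, $error ]
my @tokens = (
    [ 'ident',  'x',   0, 1 ],
    [ '=',      '=',   2, 1 ],
    [ 'string', 'abc', 4, 4, 'unclosed string' ],  # assumed error text
);

my @report;
for my $tok (@tokens) {
    my ($type, $value, $offset, $length, $error) = @$tok;
    # The 5th element is only present on invalid tokens.
    push @report, defined $error
        ? "$type at $offset: $error"
        : "$type at $offset: ok";
}
```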
For some tokens, you will need to inspect C<< substr($string, $offset, $length) >>
to get the full details, like the suffixes on integer constants.
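A minimal sketch of that lookup, using a hand-written token for the source text C<n = 10UL;>. The token values and the suffix regex are assumptions for illustration only.

```perl
use strict;
use warnings;

my $source = 'n = 10UL;';
# Hand-written sample token: the decoded value is 10, while the source
# span (offset 4, length 4) still covers the "UL" suffix.
my $tok = [ 'integer', 10, 4, 4 ];

my ($type, $value, $offset, $length) = @$tok;
my $text = substr($source, $offset, $length);   # "10UL"

# Pull a trailing run of u/U/l/L characters, if there is one.
my ($suffix) = $text =~ /([uUlL]+)\z/;
$suffix //= '';                                 # "UL" here
```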
Consecutive string tokens are not merged, since the parser needs to handle
that step after preprocessor macros are substituted.
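If a caller does want adjacent string literals joined (for example, once macro substitution is finished), something along these lines could work. C<merge_adjacent_strings> is a hypothetical helper, not part of this module, and the sample tokens are hand-written for the source text C<"foo" "bar";>.

```perl
use strict;
use warnings;

# Hypothetical helper: joins runs of adjacent 'string' tokens into one
# token whose span covers the whole run. Input tokens are left unmodified.
sub merge_adjacent_strings {
    my @merged;
    for my $tok (@_) {
        if ($tok->[0] eq 'string' && @merged && $merged[-1][0] eq 'string') {
            my $prev = $merged[-1];
            $prev->[1] .= $tok->[1];                          # join values
            $prev->[3]  = $tok->[2] + $tok->[3] - $prev->[2]; # extend length
        } else {
            push @merged, [ @$tok ];                          # shallow copy
        }
    }
    return @merged;
}

my @tokens = (
    [ 'string', 'foo', 0,  5 ],
    [ 'string', 'bar', 6,  5 ],
    [ ';',      ';',   11, 1 ],
);
my @merged = merge_adjacent_strings(@tokens);
# @merged is ( ['string', 'foobar', 0, 11], [';', ';', 11, 1] )
```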
=head1 AUTHOR
Michael Conrad <mike@nrdvana.net>
=head1 VERSION
version 0.005
=head1 COPYRIGHT AND LICENSE
This software is copyright (c) 2024 by Michael Conrad.
This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.
=cut