Alt-CWB-ambs
lib/CWB/CEQL.pm
use CWB::CEQL::String;
=head1 NAME
CWB::CEQL - The Common Elementary Query Language for CQP front-ends
=head1 SYNOPSIS
  use CWB::CEQL;
  our $CEQL = new CWB::CEQL;
  $CEQL->SetParam("pos_attribute", "tags"); # **TODO: parameters**
  $cqp_query = $CEQL->Parse($ceql_query);
  if (not defined $cqp_query) {
    @error_msg = $CEQL->ErrorMessage;
    $html_msg = $CEQL->HtmlErrorMessage;
  }
  ## extend or modify standard CEQL grammar by subclassing
  package BNCWEB::CEQL;
  use base 'CWB::CEQL';
  sub lemma {
    my ($self, $string) = @_; # rule methods receive the parser object and the input string
    ## override 'lemma' rule here (e.g. to allow for BNCweb's ``{bucket/N}'' notation)
    my $orig_result = $self->SUPER::lemma($string); # call original rule if needed
    return $orig_result;
  }
  ## you can now use BNCWEB::CEQL in the same way as CWB::CEQL
=head1 DESCRIPTION
B<** TODO **>
=head1 METHODS
The most important user-level methods are inherited from B<CWB::CEQL::Parser>.
=over 4
=item I<$CEQL> = B<new> CWB::CEQL;
Create parser object for CEQL queries. Use the B<Parse> method of I<$CEQL>
to translate a CEQL query into CQP code.
=cut
sub new {
  my $class = shift;
  my $self = new CWB::CEQL::Parser;
  $self->NewParam("pos_attribute", "pos");
  $self->NewParam("lemma_attribute", "lemma");
  $self->NewParam("simple_pos", undef);
  $self->NewParam("simple_pos_attribute", undef);
  $self->NewParam("s_attributes", { "s" => 1 });
  $self->NewParam("default_ignore_case", 1);
  $self->NewParam("default_ignore_diac", 0);
  return bless($self, $class);
}
=item I<$cqp_query> = I<$CEQL>->B<Parse>(I<$simple_query>);
Parses a simple query in CEQL syntax and returns the equivalent CQP code. If
there is a syntax error in I<$simple_query> or parsing fails for some other
reason, an B<undef>ined value is returned.
=item I<@text_lines> = I<$CEQL>->B<ErrorMessage>;
=item I<$html_code> = I<$CEQL>->B<HtmlErrorMessage>;
If the last CEQL query failed to parse, these methods return an error message
either as a list of text lines (B<ErrorMessage>) or as pre-formatted HTML code
that can be used directly by a Web interface (B<HtmlErrorMessage>). The error
message includes a backtrace of the internal call stack in order to help users
identify the precise location of the problem.
=item I<$CEQL>->B<SetParam>(I<$name>, I<$value>);
Change parameters of the CEQL grammar. Currently, the following parameters
are available:
=over 4
=item C<pos_attribute>
The p-attribute used to store part-of-speech tags in the CWB corpus (default:
C<pos>). CEQL queries should not be used for corpora without POS tagging,
which we consider to be a minimal level of annotation.
=item C<lemma_attribute>
The p-attribute used to store lemmata (base forms) in the CWB corpus (default:
C<lemma>). Set to B<undef> if the corpus has not been lemmatised.
=item C<simple_pos>
Lookup table for simple part-of-speech tags (in CEQL constructions like
C<run_{N}>). Must be a hashref with simple POS tags as keys and CQP regular
expressions matching an appropriate set of standard POS tags as the
corresponding values. The default value is B<undef>, indicating that no
simple POS tags have been defined. A very basic setup for the Penn
Treebank tag set might look like this:
  $CEQL->SetParam("simple_pos", {
      "N" => "NN.*",   # common nouns
      "V" => "V.*",    # any verb forms
      "A" => "JJ.*",   # adjectives
    });
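The role of the lookup table can be illustrated with a small stand-alone
sketch. It does not load CWB::CEQL; the helper C<simple_pos_constraint> is
hypothetical and only mimics how a simple tag such as C<N> might be expanded
into a CQP constraint on the C<pos> attribute. The CQP code generated by the
real grammar may well differ.

```perl
use strict;
use warnings;

# Lookup table as in the Penn Treebank example: simple tag => CQP regexp.
my %simple_pos = (
    "N" => "NN.*",   # common nouns
    "V" => "V.*",    # any verb forms
    "A" => "JJ.*",   # adjectives
);

# Hypothetical helper, NOT part of CWB::CEQL: expand a simple POS tag into
# a CQP attribute constraint; returns undef if the tag is not in the table.
sub simple_pos_constraint {
    my ($table, $attribute, $tag) = @_;
    my $regexp = $table->{$tag};
    return unless defined $regexp;
    return qq{$attribute="$regexp"};
}

my $known   = simple_pos_constraint(\%simple_pos, "pos", "N");
my $unknown = simple_pos_constraint(\%simple_pos, "pos", "X");

print defined $known   ? "$known\n"   : "undefined simple POS tag\n";
print defined $unknown ? "$unknown\n" : "undefined simple POS tag\n";
```

A tag missing from the table yields an undefined value in this sketch, which
a caller would presumably turn into a parse error.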
=item C<simple_pos_attribute>
Simple POS tags may use a different p-attribute than standard POS tags,
specified by the C<simple_pos_attribute> parameter. If it is set to B<undef>
(default), the C<pos_attribute> will be used for simplified POS tags as well.
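The fallback rule can be sketched as follows. This is an illustration under
stated assumptions rather than the module's own code: the parameter hashes,
the helper C<attribute_for_simple_pos> and the attribute name C<class> are
all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical parameter tables, NOT the internal storage of CWB::CEQL.
my %defaults = (pos_attribute => "pos", simple_pos_attribute => undef);
my %custom   = (pos_attribute => "pos", simple_pos_attribute => "class");

# Choose the p-attribute for simple POS tags: use simple_pos_attribute
# if it is defined, otherwise fall back to pos_attribute.
sub attribute_for_simple_pos {
    my ($param) = @_;
    return defined $param->{simple_pos_attribute}
        ? $param->{simple_pos_attribute}
        : $param->{pos_attribute};
}

my $default_att = attribute_for_simple_pos(\%defaults);
my $custom_att  = attribute_for_simple_pos(\%custom);

print "default setup uses: $default_att\n";
print "custom setup uses:  $custom_att\n";
```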
=item C<s_attributes>
Lookup table indicating which s-attributes in the CWB corpus may be accessed