Alt-CWB-ambs

lib/CWB/CEQL.pm

use CWB::CEQL::String;

=head1 NAME

CWB::CEQL - The Common Elementary Query Language for CQP front-ends

=head1 SYNOPSIS

  use CWB::CEQL;
  our $CEQL = new CWB::CEQL;

  $CEQL->SetParam("pos_attribute", "tags"); # **TODO: parameters**

  $cqp_query = $CEQL->Parse($ceql_query);
  if (not defined $cqp_query) {
    @error_msg = $CEQL->ErrorMessage;
    $html_msg = $CEQL->HtmlErrorMessage;
  }

  ## extend or modify standard CEQL grammar by subclassing
  package BNCWEB::CEQL;
  use base 'CWB::CEQL';

  sub lemma {
    my ($self, $string) = @_;
    ## overwrite 'lemma' rule here (e.g. to allow for BNCweb's ``{bucket/N}'' notation)
    my $orig_result = $self->SUPER::lemma($string); # call original rule if needed
  }

  ## you can now use BNCWEB::CEQL in the same way as CWB::CEQL

=head1 DESCRIPTION

B<** TODO **>


=head1 METHODS

The most important user-level methods are inherited from B<CWB::CEQL::Parser>.

=over 4

=item I<$CEQL> = B<new> CWB::CEQL;

Creates a parser object for CEQL queries.  Use the B<Parse> method of I<$CEQL>
to translate a CEQL query into CQP code.

=cut

sub new {
  my $class = shift;
  my $self = new CWB::CEQL::Parser;
  ## register grammar parameters with their default values
  $self->NewParam("pos_attribute", "pos");
  $self->NewParam("lemma_attribute", "lemma");
  $self->NewParam("simple_pos", undef);            # no simple POS tags defined
  $self->NewParam("simple_pos_attribute", undef);  # fall back to pos_attribute
  $self->NewParam("s_attributes", { "s" => 1 });
  $self->NewParam("default_ignore_case", 1);
  $self->NewParam("default_ignore_diac", 0);
  ## rebless the generic parser object into the requested (sub)class
  return bless($self, $class);
}

=item I<$cqp_query> = I<$CEQL>->B<Parse>(I<$simple_query>);

Parses a simple query in CEQL syntax and returns the equivalent CQP code.  If
there is a syntax error in I<$simple_query> or parsing fails for some other
reason, an B<undef>ined value is returned.

=item I<@text_lines> = I<$CEQL>->B<ErrorMessage>;

=item I<$html_code> = I<$CEQL>->B<HtmlErrorMessage>;

If the last CEQL query failed to parse, these methods return an error message
either as a list of text lines (B<ErrorMessage>) or as pre-formatted HTML code
that can be used directly by a Web interface (B<HtmlErrorMessage>).  The error
message includes a backtrace of the internal call stack in order to help users
identify the precise location of the problem.
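
The calling pattern for B<Parse> and B<ErrorMessage> can be sketched as
follows.  This is a minimal sketch and not part of CWB::CEQL:
C<translate_or_report> is a hypothetical helper, and C<Stub::Parser> is a
stand-in class (an assumption) that merely mimics the B<Parse> and
B<ErrorMessage> interface so that the example runs without a CWB
installation.  Its output is a placeholder, not a real CEQL translation.

```perl
use strict;
use warnings;

## Hypothetical stand-in for a CEQL parser object (NOT the real CWB::CEQL):
## it accepts any non-empty query and records an error message otherwise.
package Stub::Parser;

sub new { return bless { error => [] }, shift }

sub Parse {
  my ($self, $query) = @_;
  if (defined $query and $query =~ /\S/) {
    $self->{error} = [];
    return "[word=\"$query\"]";  # placeholder output, not a real translation
  }
  $self->{error} = ["Empty query.", "Please enter a search term."];
  return;                        # undefined value signals failure
}

sub ErrorMessage { return @{ $_[0]->{error} } }

package main;

## Hypothetical helper: returns (cqp_query, undef) on success and
## (undef, error_text) if Parse returned an undefined value.
sub translate_or_report {
  my ($parser, $ceql_query) = @_;
  my $cqp_query = $parser->Parse($ceql_query);
  return ($cqp_query, undef) if defined $cqp_query;
  my @lines = $parser->ErrorMessage;  # list of text lines
  return (undef, join("\n", @lines));
}

my $parser = Stub::Parser->new;
my ($ok_query, $ok_error)   = translate_or_report($parser, "walk");
my ($bad_query, $bad_error) = translate_or_report($parser, "");
```

With a real parser, pass the I<$CEQL> object instead of the stub; a Web
front-end would typically call B<HtmlErrorMessage> in place of
B<ErrorMessage>.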

=item I<$CEQL>->B<SetParam>(I<$name>, I<$value>);

Change parameters of the CEQL grammar.  Currently, the following parameters
are available:

=over 4

=item C<pos_attribute>

The p-attribute used to store part-of-speech tags in the CWB corpus (default:
C<pos>).  CEQL queries should not be used for corpora without POS tagging,
which we consider to be a minimal level of annotation.

=item C<lemma_attribute>

The p-attribute used to store lemmata (base forms) in the CWB corpus (default:
C<lemma>).  Set to B<undef> if the corpus has not been lemmatised.

=item C<simple_pos>

Lookup table for simple part-of-speech tags (in CEQL constructions like
C<run_{N}>).  Must be a hashref with simple POS tags as keys and CQP regular
expressions matching an appropriate set of standard POS tags as the
corresponding values.  The default value is B<undef>, indicating that no
simple POS tags have been defined.  A very basic setup for the Penn
Treebank tag set might look like this:

  $CEQL->SetParam("simple_pos", {
      "N" => "NN.*",   # common nouns
      "V" => "V.*",    # any verb forms
      "A" => "JJ.*",   # adjectives
    });

=item C<simple_pos_attribute>

Simple POS tags may use a different p-attribute than standard POS tags,
specified by the C<simple_pos_attribute> parameter.  If it is set to B<undef>
(default), the C<pos_attribute> will be used for simplified POS tags as well.
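
The interplay of C<simple_pos> and C<simple_pos_attribute> can be illustrated
with a short sketch.  C<resolve_simple_pos> is a hypothetical helper written
for this example, not a CWB::CEQL method; it only mirrors the lookup and
fallback rules described above and assumes that the parameters are held in a
plain hash.  The attribute name C<class> is likewise an assumption.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of CWB::CEQL): look up a simple POS tag and
## report which p-attribute and regular expression would apply to it.
sub resolve_simple_pos {
  my ($params, $tag) = @_;
  my $table = $params->{simple_pos};
  die "no simple POS tags have been defined\n"
    unless defined $table;
  die "unknown simple POS tag '$tag'\n"
    unless exists $table->{$tag};
  ## fall back to pos_attribute if simple_pos_attribute is undef
  my $attribute = $params->{simple_pos_attribute} // $params->{pos_attribute};
  return ($attribute, $table->{$tag});
}

my %params = (
  pos_attribute        => "pos",
  simple_pos_attribute => undef,
  simple_pos           => { "N" => "NN.*", "V" => "V.*", "A" => "JJ.*" },
);

## default setup: simple tags are matched against the standard POS attribute
my ($att, $regexp) = resolve_simple_pos(\%params, "N");

## with a dedicated attribute for simple tags, that attribute takes precedence
$params{simple_pos_attribute} = "class";   # assumed attribute name
my ($att2, $regexp2) = resolve_simple_pos(\%params, "V");
```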

=item C<s_attributes>

Lookup table indicating which s-attributes in the CWB corpus may be accessed


