Alt-CWB-ambs


lib/CWB/CEQL/Parser.pm


The DPP architecture was motivated by the observation that simplified queries
are often very similar in structure to the corresponding low-level queries,
and that many authors use cascaded regular expression substitutions to
transform one into the other.  While such cascades are very easy to write in
Perl and perform efficiently, they have two important limitations: they cannot
(i) validate and transform recursive structures, or (ii) restrict a particular
transformation to a certain scope, although both would often be useful.
Because of these limitations, incorrect user input -- and sometimes even
correct input -- leads to malformed low-level queries.  Without an intimate
knowledge of the
implementation, it is often impossible to guess the true location of the
problem from the cryptic error messages generated by the backend processor.
Moreover, simplified query languages based on regular expression substitution
typically have rather limited expressiveness and flexibility: because the
substitutions are applied unconditionally, a symbol cannot have different
meanings in different contexts.
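To make the first limitation concrete, here is a minimal sketch of the cascade approach. The simplified syntax (a `*` wildcard inside space-separated words) and the function name `naive_translate` are hypothetical illustrations, not part of CWB::CEQL; the output uses CQP's `[word="..."]` token notation.

```perl
use strict;
use warnings;

# Hypothetical simplified-query translator built as a cascade of regexp
# substitutions (illustration only, not the CWB::CEQL implementation).
sub naive_translate {
    my ($query) = @_;
    my $q = $query;
    $q =~ s/\*/.*/g;               # step 1: wildcard "*" becomes regexp ".*"
    $q =~ s/(\S+)/[word="$1"]/g;   # step 2: every token becomes a CQP token
    return $q;
}

# Well-formed input produces a well-formed CQP query:
print naive_translate("walk* the dog"), "\n";
# [word="walk.*"] [word="the"] [word="dog"]

# A stray quote is passed through unchecked; the malformed result would
# only be rejected later by the backend, far from the user's mistake:
print naive_translate('say "hi'), "\n";
# [word="say"] [word=""hi"]
```

Neither substitution knows whether it is operating inside a quoted string or a bracketed group, so there is no natural place to validate the input or to report where it went wrong.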

B<CWB::CEQL::Parser> aims to overcome these limitations by combining
regexp-based matching and substitution with a simple top-down parser for
context-free grammars, as well as a shift-reduce-style parser for nested
bracketing.  Parsing complexity is limited by enforcing a B<fully

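The bracketing part can be illustrated with a small stack-based matcher in the spirit of a shift-reduce parser: opening brackets are shifted onto a stack and reduced when the matching closing bracket arrives, so a mismatch is reported at the offset where it occurs rather than surfacing as a backend error. This is a hypothetical sketch (`check_brackets` is not a CWB::CEQL::Parser method), and it checks only bracket balance, not any grammar.

```perl
use strict;
use warnings;

# Hypothetical stack-based bracket checker (not CWB::CEQL::Parser code).
my %closer_for = ('(' => ')', '[' => ']', '{' => '}');
my %opener_for = reverse %closer_for;          # ')' => '(' etc.

sub check_brackets {
    my ($input) = @_;
    my @stack;                                 # entries: [bracket, offset]
    for my $pos (0 .. length($input) - 1) {
        my $ch = substr($input, $pos, 1);
        if (exists $closer_for{$ch}) {
            push @stack, [$ch, $pos];          # shift an opening bracket
        }
        elsif (exists $opener_for{$ch}) {
            return "unexpected '$ch' at offset $pos" unless @stack;
            my ($open, $open_pos) = @{ pop @stack };   # reduce
            return "'$open' at offset $open_pos closed by '$ch' at offset $pos"
                unless $closer_for{$open} eq $ch;
        }
    }
    if (@stack) {                              # report innermost unclosed one
        my ($open, $open_pos) = @{ $stack[-1] };
        return "unclosed '$open' at offset $open_pos";
    }
    return "ok";
}

print check_brackets("(a [b] c)"), "\n";   # ok
print check_brackets("(a [b) c]"), "\n";   # '[' at offset 3 closed by ')' at offset 5
print check_brackets("((a)"), "\n";        # unclosed '(' at offset 0
```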
lib/CWB/Encoder.pm

  $bnc->memory(400);          # use up to 400 MB of RAM (default: 75)
  $bnc->validate(0);          # disable validation for faster indexing
  $bnc->debug(1);             # enable debugging output

  $bnc->make("word", "pos");  # build index & compress
  $bnc->makeall;              # process all p-attributes


  $bnc = CWB::Encoder->new("BNC");

  $bnc->registry("/path/to/registry");  # will try to guess otherwise
  $bnc->dir("/path/to/data/directory"); # directory for corpus data files
  $bnc->overwrite(1);         # may overwrite existing files / directories
  
  $bnc->longname("British National Corpus"); # optional
  $bnc->info("Line1.\nLine2.\n...");    # optional multi-line info text
  $bnc->charset("latin1");    # defaults to latin1
  $bnc->language("en");       # defaults to "??" (unspecified)
  
  $bnc->group("corpora");     # optional: group and access permissions
  $bnc->perm("640");          # for newly created files & directories

lib/CWB/Encoder.pm

  my $overwrite = $self->{OVERWRITE};
  
  my $name = $self->{NAME};     # check that setup is complete
  croak "CWB::Encoder: Corpus ID hasn't been specified (with name() method)\n"
    unless $name;
  croak "CWB::Encoder: No positional attributes specified.\n"
    unless @{$self->{PATT}} > 0;

  my $reg = $self->{REGISTRY};
  if (not defined $reg) {
    $reg = CWB::RegistryDirectory(); # try to guess registry if not specified
    $self->{REGISTRY} = $reg;
  }
  croak "CWB::Encoder: Can't determine unique registry directory (path is $reg).\n"
    if $reg =~ /:/;
  croak "CWB::Encoder: Registry directory $reg does not exist.\n"
    unless -d $reg;
  print STDERR "CWB::Encoder: registry directory is $reg\n"
    if $self->{DEBUG};

  my $regfile = "$reg/$name";   # remove registry entry if it exists
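The registry checks above can be factored into a standalone function, which makes them easy to exercise in isolation. `registry_file_for` is a hypothetical helper, not part of CWB::Encoder; it assumes, as the `/:/` test above suggests, that a ':' in the path separates several registry directories.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper (not part of CWB::Encoder) reproducing the checks above.
# Returns the registry file path for corpus $name inside registry $reg.
sub registry_file_for {
    my ($reg, $name) = @_;
    croak "Corpus ID hasn't been specified.\n"
        unless $name;
    croak "Can't determine unique registry directory (path is $reg).\n"
        if $reg =~ /:/;            # assumed: ':' separates several directories
    croak "Registry directory $reg does not exist.\n"
        unless -d $reg;
    return "$reg/$name";          # same "$reg/$name" layout as the original
}

# Usage: trap the error with eval instead of letting it terminate the script.
my $file = eval { registry_file_for("/corpora/reg1:/corpora/reg2", "BNC") };
print "rejected: $@" unless defined $file;
```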


