Alt-CWB-ambs

 view release on metacpan or  search on metacpan

lib/CWB.pm  view on Meta::CPAN

  $reg->set_comments(":INFO", @lines);  # overwrite existing comments
  $reg->add_comments("::language", "", "comment for language property", "");
  $reg->set_comments("::language");     # delete comments before property
  $comment = $reg->line_comment("np");  # inline comment of np attribute
  $reg->line_comment("word", "the required word attribute");  # set comment
  $reg->delete_line_comment("word");    # delete inline comment

  # (over)write registry file (requires full pathname)
  $reg->write("/corpora/c1/registry/dickens");

=head1 DESCRIPTION

This module offers basic support for using the IMS Open Corpus Workbench
(L<http://cwb.sourceforge.net/>) from Perl scripts.
Several additional functions are included to perform tasks
that are often needed by corpus-related scripts.


=head1 CWB PATHNAMES

Package variables give the full pathnames of B<CQP> and the B<CWB tools>, 
so they can be used in shell commands even when they are not
installed in the user's search path. The following variables are available:

  $CWB::CQP;             # cqp
  $CWB::Config;          # cwb-config
  $CWB::Encode;          # cwb-encode
  $CWB::Makeall;         # cwb-makeall
  $CWB::Decode;          # cwb-decode
  $CWB::Lexdecode;       # cwb-lexdecode
  $CWB::DescribeCorpus;  # cwb-describe-corpus
  $CWB::Huffcode;        # cwb-huffcode
  $CWB::CompressRdx;     # cwb-compress-rdx
  $CWB::Itoa;            # cwb-itoa
  $CWB::Atoi;            # cwb-atoi
  $CWB::SEncode;         # cwb-s-encode
  $CWB::SDecode;         # cwb-s-decode
  $CWB::ScanCorpus;      # cwb-scan-corpus
  $CWB::Align;           # cwb-align
  $CWB::AlignEncode;     # cwb-align-encode
  $CWB::CQPserver;       # cqpserver

Other configuration information includes the general installation prefix,
the directory containing CWB binaries (which might be used to install additional
software related to the CWB), and the default registry directory.  B<NB:>
individual install paths may have overridden the general prefix, so the
package variable I<$CWB::Prefix> does not have much practical importance.
Use the B<cwb-config> program to find out the precise installation paths.

  $CWB::Prefix;          # general installation prefix
  $CWB::BinDir;          # directory for CWB binaries (executable programs)
  $CWB::DefaultRegistry; # compiled-in default registry directory

=cut

# make package configuration variables available
our $Prefix = $CWB::Config::Prefix; # this doesn't say much, as individual install directories may have been overwritten
our $BinDir = $CWB::Config::BinDir;
our $DefaultRegistry = $CWB::Config::Registry;

# global variables: full paths to CWB tools
our $Config = "$BinDir/cwb-config";
our $SEncode = "$BinDir/cwb-s-encode";
our $SDecode = "$BinDir/cwb-s-decode";
our $Encode = "$BinDir/cwb-encode";
our $Decode = "$BinDir/cwb-decode";
our $Lexdecode = "$BinDir/cwb-lexdecode";
our $Makeall = "$BinDir/cwb-makeall";
our $DescribeCorpus = "$BinDir/cwb-describe-corpus";
our $Itoa = "$BinDir/cwb-itoa";
our $Atoi = "$BinDir/cwb-atoi";
our $CompressRdx = "$BinDir/cwb-compress-rdx";
our $Huffcode = "$BinDir/cwb-huffcode";
our $ScanCorpus = "$BinDir/cwb-scan-corpus";
our $Align = "$BinDir/cwb-align";
our $AlignEncode = "$BinDir/cwb-align-encode";
our $CQP = "$BinDir/cqp";
our $CQPserver = "$BinDir/cqpserver";

## ======================================================================
##  some general utility functions
## ======================================================================

=head1 MISCELLANEOUS FUNCTIONS

=over 4

=item @dirs = CWB::RegistryDirectory();

The function B<CWB::RegistryDirectory> can be used to determine the I<effective>
registry directory (either the compiled-in default registry or a setting made
in the I<CORPUS_REGISTRY> environment variable). It is possible to specify multiple
registry directories, so B<CWB::RegistryDirectory> returns a list of strings.

=cut

sub RegistryDirectory {
  my $registry = $ENV{'CORPUS_REGISTRY'} || $DefaultRegistry;
  my @dirs = split /:/, $registry;
  foreach (@dirs) { s/^\?// }; # remove '?' marking optional registry directories
  return wantarray ? @dirs : shift @dirs;
}

=item $fh = CWB::OpenFile($name);

=item $fh = CWB::OpenFile($mode, $name);

Open file I<$name> for reading, writing, or appending. Returns B<FileHandle>
object if successful, otherwise it B<die>s with an error message. It is thus
never necessary to check whether I<$fh> is defined.

If B<CWB::OpenFile> is called with two arguments, I<$mode> indicates the file
access mode: C<E<lt>> for reading, C<E<gt>> for writing, C<E<gt>E<gt>> for
appending, C<|-> for a write pipe and C<-|> for a read pipe (see
L<perlfunc/"open"> for details).

In the one-argument form, B<CWB::OpenFile> examines the file name for an
embedded access mode specifier. If I<$name> starts with C<E<gt>> the file is
opened for writing (an existing file will be overwritten), if it starts
with C<E<gt>E<gt>> the file is opened for appending. The default is to open
the file for reading, which can optionally be made explicit by a leading C<E<lt>>. 

lib/CWB.pm  view on Meta::CPAN

  my $status = $self->{STATUS};
  croak "CWB::TempFile: Can't rewind tempfile ".$self->name." with status ".$self->status
    if $status eq "D" or $status eq "W";
  if ($status ne "R") {
    # if rewind is called before first read, it does nothing
  }
  else {
    $self->{FH}->close
      or croak "CWB::TempFile:: Error writing tempfile ".$self->name." ($!)";
    $self->{FH} = CWB::OpenFile $self->{NAME};
  }
}

=back

=cut

## ======================================================================
##  execute shell command with thorough error checks
## ======================================================================

package CWB::Shell;

use Carp;

=head1 SHELL COMMANDS

The B<CWB::>B<Shell::Cmd()> function provides a convenient replacement
for the built-in B<system> command. Standard output and error messages
produced by the invoked shell command are captured to avoid screen
clutter, and the former is available to the Perl script (similar to
the backtick operator C<`$shell_cmd`>). B<CWB::>B<Shell::Cmd()> also checks
for a variety of error conditions and returns an error level value ranging
from 0 (successful) to 6 (fatal error):

  Error Level  Description
    6          command execution failed (system error)
    5          non-zero exit value or error message on STDERR
    4          -- reserved for future use --
    3          warning message on STDERR
    2          any output on STDERR
    1          error message on STDOUT

Depending on the value of I<$CWB::Shell::Paranoid>, a warning message will
be issued or the function will B<die> with an error message.

=over 4

=item $CWB::Shell::Paranoid = 0;

With the default setting of 0, B<CWB::>B<Shell::Cmd()> will B<die> if the
error level is 5 or greater. In the B<extra paranoid> setting (+1), it
will almost always B<die> (error level 2 or greater). In the B<less paranoid>
setting (-1) only an error level of 6 (i.e. failure to execute the shell
command) will cause the script to abort.

=cut

our $Paranoid = 0;

# use global variables and sub to handle warn/die situations
our $return_status = 0;
our $current_cmd = "";

# internal function: raise error (according to current Paranoid setting)
#   Error $errlevel, $message [, $message ...];
# error levels are:
# LVL  <less p.>   <normal>   <extra-p.>
#  6 :  fatal      fatal       fatal 
#  5 :  warn       fatal       fatal
#  4 :  warn       warn        fatal
#  3 :  nothing    warn        fatal
#  2 :  nothing    nothing     fatal
#  1 :  nothing    nothing     warn
#  0 :  nothing    nothing     nothing
sub Error ( $@ ) {
  my $errlevel = shift;
  my @message = @_;
  # $action is: 0=fatal, 1=warn, 2=nothing 
  my $action;
  
  if ($errlevel >= $return_status) {  
    $return_status = $errlevel;

    if ($Paranoid < 0) {
      $action = [qw<2 2 2 2 1 1 0>]->[$errlevel];
    }
    elsif ($Paranoid == 0) {
      $action = [qw<2 2 2 1 1 0 0>]->[$errlevel];
    }
    else {
      $action = [qw<2 1 0 0 0 0 0>]->[$errlevel];
    }
  }
  else {
    $action = 2;                # don't report this error if a more serious one has already occurred
  }

  if ($action == 0) {
    croak
      "\nSHELL CMD '$current_cmd' FAILED:\n",
      map {chomp; ">> $_\n"} @message;
  }
  elsif ($action == 1) {
    print "\nWARNING (SHELL CMD '$current_cmd'):\n";
    map {chomp; print "-> $_\n"} @message;
  }
  else {
    # nothing :o)
  }
}


=item $errlvl = CWB::Shell::Cmd($cmd);

=item $errlvl = CWB::Shell::Cmd($cmd, $filename);

=item $errlvl = CWB::Shell::Cmd($cmd, \@lines);

The first form executes I<$cmd> as a shell command (through the built-in
B<system> function) and returns an error level value. With the default



( run in 0.743 second using v1.01-cache-2.11-cpan-ceb78f64989 )