Alt-CWB-ambs
view release on metacpan or search on metacpan
$reg->set_comments(":INFO", @lines); # overwrite existing comments
$reg->add_comments("::language", "", "comment for language property", "");
$reg->set_comments("::language"); # delete comments before property
$comment = $reg->line_comment("np"); # inline comment of np attribute
$reg->line_comment("word", "the required word attribute"); # set comment
$reg->delete_line_comment("word"); # delete inline comment
# (over)write registry file (requires full pathname)
$reg->write("/corpora/c1/registry/dickens");
=head1 DESCRIPTION
This module offers basic support for using the IMS Open Corpus Workbench
(L<http://cwb.sourceforge.net/>) from Perl scripts.
Several additional functions are included to perform tasks
that are often needed by corpus-related scripts.
=head1 CWB PATHNAMES
Package variables give the full pathnames of B<CQP> and the B<CWB tools>,
so they can be used in shell commands even when they are not
installed in the user's search path. The following variables are available:
$CWB::CQP; # cqp
$CWB::Config; # cwb-config
$CWB::Encode; # cwb-encode
$CWB::Makeall; # cwb-makeall
$CWB::Decode; # cwb-decode
$CWB::Lexdecode; # cwb-lexdecode
$CWB::DescribeCorpus; # cwb-describe-corpus
$CWB::Huffcode; # cwb-huffcode
$CWB::CompressRdx; # cwb-compress-rdx
$CWB::Itoa; # cwb-itoa
$CWB::Atoi; # cwb-atoi
$CWB::SEncode; # cwb-s-encode
$CWB::SDecode; # cwb-s-decode
$CWB::ScanCorpus; # cwb-scan-corpus
$CWB::Align; # cwb-align
$CWB::AlignEncode; # cwb-align-encode
$CWB::CQPserver; # cqpserver
Other configuration information includes the general installation prefix,
the directory containing CWB binaries (which might be used to install additional
software related to the CWB), and the default registry directory. B<NB:>
individual install paths may have overridden the general prefix, so the
package variable I<$CWB::Prefix> does not have much practical importance.
Use the B<cwb-config> program to find out the precise installation paths.
$CWB::Prefix; # general installation prefix
$CWB::BinDir; # directory for CWB binaries (executable programs)
$CWB::DefaultRegistry; # compiled-in default registry directory
=cut
# make package configuration variables available
our $Prefix = $CWB::Config::Prefix; # this doesn't say much, as individual install directories may have been overwritten
our $BinDir = $CWB::Config::BinDir;
our $DefaultRegistry = $CWB::Config::Registry;
# global variables: full paths to CWB tools
our $Config = "$BinDir/cwb-config";
our $SEncode = "$BinDir/cwb-s-encode";
our $SDecode = "$BinDir/cwb-s-decode";
our $Encode = "$BinDir/cwb-encode";
our $Decode = "$BinDir/cwb-decode";
our $Lexdecode = "$BinDir/cwb-lexdecode";
our $Makeall = "$BinDir/cwb-makeall";
our $DescribeCorpus = "$BinDir/cwb-describe-corpus";
our $Itoa = "$BinDir/cwb-itoa";
our $Atoi = "$BinDir/cwb-atoi";
our $CompressRdx = "$BinDir/cwb-compress-rdx";
our $Huffcode = "$BinDir/cwb-huffcode";
our $ScanCorpus = "$BinDir/cwb-scan-corpus";
our $Align = "$BinDir/cwb-align";
our $AlignEncode = "$BinDir/cwb-align-encode";
our $CQP = "$BinDir/cqp";
our $CQPserver = "$BinDir/cqpserver";
## ======================================================================
## some general utility functions
## ======================================================================
=head1 MISCELLANEOUS FUNCTIONS
=over 4
=item @dirs = CWB::RegistryDirectory();
The function B<CWB::RegistryDirectory> can be used to determine the I<effective>
registry directory (either the compiled-in default registry or a setting made
in the I<CORPUS_REGISTRY> environment variable). It is possible to specify multiple
registry directories, so B<CWB::RegistryDirectory> returns a list of strings.
=cut
sub RegistryDirectory {
my $registry = $ENV{'CORPUS_REGISTRY'} || $DefaultRegistry;
my @dirs = split /:/, $registry;
foreach (@dirs) { s/^\?// }; # remove '?' marking optional registry directories
return wantarray ? @dirs : shift @dirs;
}
=item $fh = CWB::OpenFile($name);
=item $fh = CWB::OpenFile($mode, $name);
Open file I<$name> for reading, writing, or appending. Returns B<FileHandle>
object if successful, otherwise it B<die>s with an error message. It is thus
never necessary to check whether I<$fh> is defined.
If B<CWB::OpenFile> is called with two arguments, I<$mode> indicates the file
access mode: C<E<lt>> for reading, C<E<gt>> for writing, C<E<gt>E<gt>> for
appending, C<|-> for a write pipe and C<-|> for a read pipe (see
L<perlfunc/"open"> for details).
In the one-argument form, B<CWB::OpenFile> examines the file name for an
embedded access mode specifier. If I<$name> starts with C<E<gt>> the file is
opened for writing (an existing file will be overwritten), if it starts
with C<E<gt>E<gt>> the file is opened for appending. The default is to open
the file for reading, which can optionally be made explicit by a leading C<E<lt>>.
my $status = $self->{STATUS};
croak "CWB::TempFile: Can't rewind tempfile ".$self->name." with status ".$self->status
if $status eq "D" or $status eq "W";
if ($status ne "R") {
# if rewind is called before first read, it does nothing
}
else {
$self->{FH}->close
or croak "CWB::TempFile:: Error writing tempfile ".$self->name." ($!)";
$self->{FH} = CWB::OpenFile $self->{NAME};
}
}
=back
=cut
## ======================================================================
## execute shell command with thorough error checks
## ======================================================================
package CWB::Shell;
use Carp;
=head1 SHELL COMMANDS
The B<CWB::>B<Shell::Cmd()> function provides a convenient replacement
for the built-in B<system> command. Standard output and error messages
produced by the invoked shell command are captured to avoid screen
clutter, and the former is available to the Perl script (similar to
the backtick operator C<`$shell_cmd`>). B<CWB::>B<Shell::Cmd()> also checks
for a variety of error conditions and returns an error level value ranging
from 0 (successful) to 6 (fatal error):
Error Level Description
6 command execution failed (system error)
5 non-zero exit value or error message on STDERR
4 -- reserved for future use --
3 warning message on STDERR
2 any output on STDERR
1 error message on STDOUT
Depending on the value of I<$CWB::Shell::Paranoid>, a warning message will
be issued or the function will B<die> with an error message.
=over 4
=item $CWB::Shell::Paranoid = 0;
With the default setting of 0, B<CWB::>B<Shell::Cmd()> will B<die> if the
error level is 5 or greater. In the B<extra paranoid> setting (+1), it
will almost always B<die> (error level 2 or greater). In the B<less paranoid>
setting (-1) only an error level of 6 (i.e. failure to execute the shell
command) will cause the script to abort.
=cut
our $Paranoid = 0;
# use global variables and sub to handle warn/die situations
our $return_status = 0;
our $current_cmd = "";
# internal function: raise error (according to current Paranoid setting)
# Error $errlevel, $message [, $message ...];
# error levels are:
# LVL <less p.> <normal> <extra-p.>
# 6 : fatal fatal fatal
# 5 : warn fatal fatal
# 4 : warn warn fatal
# 3 : nothing warn fatal
# 2 : nothing nothing fatal
# 1 : nothing nothing warn
# 0 : nothing nothing nothing
sub Error ( $@ ) {
my $errlevel = shift;
my @message = @_;
# $action is: 0=fatal, 1=warn, 2=nothing
my $action;
if ($errlevel >= $return_status) {
$return_status = $errlevel;
if ($Paranoid < 0) {
$action = [qw<2 2 2 2 1 1 0>]->[$errlevel];
}
elsif ($Paranoid == 0) {
$action = [qw<2 2 2 1 1 0 0>]->[$errlevel];
}
else {
$action = [qw<2 1 0 0 0 0 0>]->[$errlevel];
}
}
else {
$action = 2; # don't report this error if a more serious one has already occurred
}
if ($action == 0) {
croak
"\nSHELL CMD '$current_cmd' FAILED:\n",
map {chomp; ">> $_\n"} @message;
}
elsif ($action == 1) {
print "\nWARNING (SHELL CMD '$current_cmd'):\n";
map {chomp; print "-> $_\n"} @message;
}
else {
# nothing :o)
}
}
=item $errlvl = CWB::Shell::Cmd($cmd);
=item $errlvl = CWB::Shell::Cmd($cmd, $filename);
=item $errlvl = CWB::Shell::Cmd($cmd, \@lines);
The first form executes I<$cmd> as a shell command (through the built-in
B<system> function) and returns an error level value. With the default
( run in 0.743 second using v1.01-cache-2.11-cpan-ceb78f64989 )