lib/CWB/CEQL.pm
lib/CWB/CEQL/Parser.pm
lib/CWB/CEQL/String.pm
lib/CWB/CQP.pm
lib/CWB/Encoder.pm
Makefile.PL
MANIFEST This list of files
README
script/cwb-align-import
script/cwb-make
script/cwb-regedit
t/00_load.t
t/10_cwb_tools.t
t/11_cwb_file.t
t/12_cwb_tempfile.t
t/13_cwb_shell.t
t/14_cwb_registry.t
t/20_encode_vss.t
t/31_cqp_dickens.t.dont_run
t/40_ceql_parser.t
t/41_ceql_string.t
Makefile.PL
STOP
$fh->close
or die "Error writing '$config_file': $!";
print "Configuration saved to file $config_file\n";
## now we can let MakeMaker generate the makefile
WriteMakefile(
'NAME' => "Alt::CWB::ambs",
'VERSION_FROM' => "lib/Alt/CWB/ambs.pm",
'EXE_FILES' => ["script/cwb-make",
"script/cwb-regedit",
"script/cwb-align-import"],
'META_MERGE' => {
"meta-spec" => { version => 2 },
"no_index" => {
directory => [ 'lib/CWB/' ],
file => [ 'lib/CWB.pm' ],
},
},
);
If the "cwb-config" utility is not in your standard search path, you can specify
its location with the --config option:
perl Makefile.PL --config=~/cwb-3.0/bin/cwb-config
PACKAGE OVERVIEW
The CWB/Perl package contains the following Perl modules:
CWB            general utility functions, registry editor
CWB::CQP       run CQP process in the background
CWB::Encoder   stream-lined encoding and indexing of CWB corpora
CWB::CEQL      an implementation of the CEQL simple query language
See the manual pages (e.g. "perldoc CWB::CQP") of these modules for further
information. The package also installs a number of command-line utility
programs:
cwb-make           stream-lined indexing & compression of attributes
cwb-regedit        a simple registry editor for the command line
cwb-align-import   import existing sentence alignment into the CWB
Type e.g. "cwb-make -h" for a usage summary, and "perldoc cwb-make" for the
full manual page.
COPYRIGHT
Copyright (C) 1999-2010 by Stefan Evert [http://purl.org/stefan.evert]
data/vrt/VeryShortStories.vrt
pull-down JJ pull-down
menus NNS menu
. SENT .
</s>
<s>
A DT a
short JJ short
note NN note
from IN from
the DT the
editor NN editor
informed VBD inform
him PP him
about IN about
today NN today
's POS 's
work NN work
. SENT .
</s>
<s>
First JJ first
data/vrt/VeryShortStories.vrt
to TO to
pay VB pay
. SENT .
</s>
<s>
He PP he
gulped VBD gulp
and CC and
gave VBD give
his PP$ his
credit NN credit
card NN card
a DT a
melancholic JJ melancholic
glance NN glance
before IN before
he PP he
inserted VBD insert
it PP it
into IN into
the DT the
data/vrt/VeryShortStories.vrt
In IN in
order NN order
to TO to
call VB call
a DT a
tin NN tin
, , ,
Ed NP Ed
inserted VBD insert
his PP$ his
credit NN credit
card NN card
into IN into
a DT a
slot NN slot
. SENT .
</s>
<s>
A DT a
display NN display
lit VBD light
data/vrt/VeryShortStories.vrt
messages NNS message
item NN item
. SENT .
</s>
<s>
The DT the
usual JJ usual
message NN message
from IN from
the DT the
editor NN editor
appeared VBD appear
, , ,
and CC and
another DT another
one NN one
below RB below
. SENT .
</s>
<s>
After IN after
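The sample data above is in the verticalized text (VRT) format used as input for cwb-encode: each token occupies one line, its positional attributes (here word form, part-of-speech tag and lemma) are given in columns that are separated by tabs in a VRT file, and XML-style tags such as <s> on lines of their own mark structural regions. As a rough sketch of how such input can be read with core Perl, the hypothetical function read_vrt_sentences below groups the token lines into sentences; it is not how cwb-encode or the CWB modules process the file.

```perl
use strict;
use warnings;

# Hypothetical reader, not part of the CWB modules: collect the tokens of each
# <s>...</s> region from VRT lines (one token per line, tab-separated word,
# POS tag and lemma). Other tags and lines outside <s> regions are ignored.
sub read_vrt_sentences {
    my @lines = @_;
    my (@sentences, $current);
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^<s[\s>]/) {
            $current = [];                       # a new sentence starts
        }
        elsif ($line =~ m{^</s>}) {
            push @sentences, $current if $current;
            undef $current;                      # back outside any sentence
        }
        elsif ($current and $line !~ /^</) {
            my ($word, $pos, $lemma) = split /\t/, $line;
            push @$current, { word => $word, pos => $pos, lemma => $lemma };
        }
    }
    return @sentences;
}

my @vrt = ("<s>", "He\tPP\the", "gulped\tVBD\tgulp", ".\tSENT\t.", "</s>");
for my $s (read_vrt_sentences(@vrt)) {
    print join(" ", map { $_->{lemma} } @$s), "\n";   # prints "he gulp ."
}
```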
$CWB::Shell::Paranoid = 1; # more paranoid checks (-1 for less paranoid)
# read / modify / write registry files (must be in canonical format)
$reg = new CWB::RegistryFile; # create new registry file
$reg = new CWB::RegistryFile "/corpora/c1/registry/dickens"; # load file
die "failed" unless defined $reg; # will fail if not in canonical format
$reg = new CWB::RegistryFile "dickens"; # search in standard registry
$filename = $reg->filename; # retrieve full pathname
# edit standard fields
$name = $reg->name; # read NAME field
$reg->name("Charles Dickens"); # modify NAME field
$corpus_id = $reg->id; # same for ID, HOME, INFO
$home_dir = $reg->home;
$info_file = $reg->info;
$reg->delete_info; # INFO line is optional and may be deleted
# edit corpus properties
@properties = $reg->list_properties;
$value = $reg->property("language"); # get property value
$reg->property("language", "en"); # set / add property
$reg->delete_property("language");
# edit attributes ('p'=positional, 's'=structural, 'a'=alignment)
@attr = $reg->list_attributes; # list all attributes
@s_attr = $reg->list_attributes('s'); # list structural attributes
$type = $reg->attribute("word"); # 'p'/'s'/'a' or undef
$reg->delete_attribute("np");
$reg->add_attribute("np", 's'); # specify type when adding attribute
$dir = $reg->attribute_path("lemma"); # may be stored in different directory
$reg->attribute_path("lemma", $dir); # set attribute path
$reg->delete_attribute_path("lemma"); # default location is HOME directory
# comment lines (preceding field/declaration) and inline comments use keys:
## parse, modify and create registry entries (in canonical format)
## ======================================================================
package CWB::RegistryFile;
use Carp;
=head1 REGISTRY FILE EDITING
Registry files in B<canonical format> can be loaded into B<CWB::RegistryFile> objects,
edited using the various access methods detailed below, and written back to disk. It
is also possible to create a registry entry from scratch and save it to a disk file.
Canonical registry files consist of a B<header> and a B<body>. The
B<header> begins with a NAME, ID, HOME, and optional INFO field:
NAME "long descriptive name"
ID my-corpus
HOME /path/to/data/directory
INFO /path/to/info/file.txt
The second form attempts to read and parse the registry file I<$filename>. If
successful, a B<CWB::RegistryFile> object storing all relevant information is
returned. If I<$filename> does not contain the character C</> and cannot be
found in the current directory, the constructor will automatically search the
standard registry directories for it. The full pathname of the registry file
can later be determined with the B<filename> method.
If the load operation fails (i.e. the file does not exist or is not in the
canonical registry file format), an error message is printed and an undefined
value is returned (so this module can be used e.g. to write a robust graphical
registry editor). Always check the return value of the constructor before
proceeding.
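As an illustration of the header layout only, the following sketch reads the four header fields from the text of a registry entry using core Perl. It is not the parser used by B<CWB::RegistryFile>: the function name C<parse_registry_header> is hypothetical, and the sketch assumes that each header field occupies a single line of the form C<KEY value>, with the NAME value enclosed in double quotes. Like the constructor, it returns an undefined value instead of dying when the input does not match.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the CWB module: extract the NAME, ID, HOME
# and optional INFO fields from the text of a registry entry.
# Returns a hash reference, or an undefined value if a required field is missing.
sub parse_registry_header {
    my ($text) = @_;
    my %header;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#|$)/;          # skip comments and blank lines
        if ($line =~ /^NAME\s+"(.*)"\s*$/) {
            $header{NAME} = $1;                  # descriptive name in double quotes
        }
        elsif ($line =~ /^(ID|HOME|INFO)\s+(\S+)\s*$/) {
            $header{$1} = $2;                    # single-token value
        }
        # all other lines (body declarations) are ignored in this sketch
    }
    for my $key (qw(NAME ID HOME)) {             # INFO is optional
        return unless defined $header{$key};
    }
    return \%header;
}

my $entry = <<'END';
NAME "long descriptive name"
ID my-corpus
HOME /path/to/data/directory
INFO /path/to/info/file.txt
END

my $header = parse_registry_header($entry)
    or die "not a canonical registry header\n";
print "$header->{ID}: $header->{HOME}\n";   # prints "my-corpus: /path/to/data/directory"
```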
=cut
sub new {
my $class = shift;
my $filename = shift;
my $self = # create and initialise object
{
NAME => "", # name of corpus (defaults to empty string)
lib/CWB/Encoder.pm
B<CWB::Indexer> builds indices for some or all positional attributes
of an existing corpus (using the B<cwb-makeall> tool). In addition,
these attributes are automatically compressed (using the
B<cwb-huffcode> and B<cwb-compress-rdx> tools). Compression and
indexing are interleaved to minimise the required amount of temporary
disk space, and a B<make>-like system ensures that out-of-date index
files are automatically updated.
B<CWB::Encoder> automates all steps necessary to encode a CWB corpus
(which includes cleaning up old files, running B<cwb-encode>, editing
the registry entry, indexing & compressing positional attributes, and
setting access permissions). Both modules can be set up with a few
simple method calls. Full descriptions are given separately in the
following sections.
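The B<make>-like behaviour amounts to a timestamp rule: a derived file has to be rebuilt if it does not exist or is older than any of the files it was created from. The sketch below expresses only that rule in core Perl; C<needs_rebuild> is a hypothetical name, and the actual dependencies that B<CWB::Indexer> tracks between index and compressed files are not modelled.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper, not part of CWB::Indexer: true if $target is missing
# or older than at least one of the @sources it was derived from.
sub needs_rebuild {
    my ($target, @sources) = @_;
    return 1 unless -e $target;
    my $target_mtime = (stat $target)[9];          # modification time in epoch seconds
    for my $src (@sources) {
        my $src_mtime = (stat $src)[9];
        die "missing source file '$src'\n" unless defined $src_mtime;
        return 1 if $src_mtime > $target_mtime;    # source changed after target was built
    }
    return 0;
}

# demo with two empty files and explicit timestamps
my $dir = tempdir(CLEANUP => 1);
my ($src, $idx) = ("$dir/source", "$dir/index");
for my $file ($src, $idx) {
    open my $fh, '>', $file or die "can't create $file: $!";
    close $fh;
}
utime 1000, 1000, $idx;    # index built at time 1000
utime 2000, 2000, $src;    # source modified later, at time 2000
print needs_rebuild($idx, $src) ? "rebuild\n" : "up to date\n";   # prints "rebuild"
```

Because only modification times are compared, a source file that is replaced by a copy carrying an older timestamp (e.g. one made with C<cp -p>) will not trigger a rebuild.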
=cut
## ======================================================================
## automatic creation, compression and updating of CWB index files (for p-attributes)
## ======================================================================
lib/CWB/Encoder.pm
if $self->{VERBOSE};
my $infofile = "$dir/.info";
my $fh = CWB::OpenFile "> $infofile";
print $fh $self->{INFO}, "\n";
$fh->close;
CWB::Shell::Cmd("chmod $perm '$infofile'")
if $perm;
CWB::Shell::Cmd("chgrp $group '$infofile'")
if $group;
print "Editing registry entry ...\n" # edit registry file
if $self->{VERBOSE};
my $reg = $self->{REGISTRY};
my $name = $self->{NAME};
my $regfile = "$reg/$name";
my $rf = new CWB::RegistryFile $regfile;
croak "CWB::Encoder: Syntax error in registry entry $regfile\n"
unless defined $rf;
$rf->name($self->{LONGNAME});
# $rf->property("charset", $self->{CHARSET}); # -- already set by cwb-encode (since v2.2.101)
$rf->property("language", $self->{LANG});
$rf->write($regfile);
print STDERR "CWB::Encoder: registry entry $regfile has been edited\n"
if $self->{DEBUG};
print STDERR "CWB::Encoder: setting access permissions for $regfile\n"
if $self->{DEBUG} and ($perm or $group);
CWB::Shell::Cmd("chmod $perm '$regfile'")
if $perm;
CWB::Shell::Cmd("chgrp $group '$regfile'")
if $group;
my $idx = new CWB::Indexer "$reg:".(uc $name); # build indices and compress p-attributes
$idx->group($group)
script/cwb-regedit
#!/usr/bin/perl
## -*-cperl-*-
## Author: Stefan Evert
## Purpose: simple command-line registry file editor
##
$| = 1;
use warnings;
use strict;
use locale;
use CWB;
use Getopt::Long;
sub Usage {
die <<STOP;
Usage: cwb-regedit [options] (CORPUS | <filename>) <command> [<command> ...]
Options:
-r <dir> use registry directory <dir> [system default]
--registry=<dir>
-h show this help page
--help
Commands:
:info
print basic information about the registry entry
script/cwb-regedit
while (@ARGV and not is_command()) {
push @ARGS, shift @ARGV;
}
return 1;
}
__END__
=head1 NAME
cwb-regedit - A simple command-line editor for CWB registry files
=head1 SYNOPSIS
cwb-regedit [options] (CORPUS | <filename>) <command> [<command> ...]
Options:
-r <dir>, --registry=<dir> use registry directory <dir> [system default]
-h, --help display usage summary
Commands:
:info
script/cwb-regedit
add positional, structural or alignment attributes
:del <names>
delete attributes of any type
Command names are case-insensitive, and some intuitive abbreviations and variations
are allowed (but only the forms listed above are guaranteed to work).
=head1 DESCRIPTION
B<cwb-regedit> is a simple command-line editor for CWB registry files in B<canonical format>.
It allows you to display and modify header information, corpus properties and attribute declarations.
For more complex editing tasks (such as adding descriptive comments), use the functionality
provided by the B<CWB> module (see L<CWB/"REGISTRY FILE EDITING">).
The first argument must be the name of a CWB registry file or a corpus ID.
In the latter case, B<cwb-regedit> will attempt to locate the corresponding registry file in
the system registry, or in a directory specified with the C<--registry> (C<-r>) option.
The remaining arguments are editing or display commands, which are applied to the registry entry in turn.
If any changes have been made, the modified registry file will be written back to disk automatically.
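Because each command may take a different number of arguments, the command line has to be split into command groups before the commands can be applied in turn. The sketch below assumes that a token starting with a colon begins a new command and that all following tokens up to the next such token are its arguments; C<split_commands> is a hypothetical name, and the real B<cwb-regedit> also accepts abbreviations and variations of command names that this sketch does not handle.

```perl
use strict;
use warnings;

# Hypothetical helper, not the actual cwb-regedit code: split the command-line
# tokens that follow the corpus argument into [command, args...] groups.
# A token starting with ':' opens a new command; names are lowercased so that
# they are matched case-insensitively.
sub split_commands {
    my @tokens = @_;
    my @groups;
    for my $tok (@tokens) {
        if ($tok =~ /^:/) {
            push @groups, [lc $tok];           # start a new command group
        }
        elsif (@groups) {
            push @{ $groups[-1] }, $tok;       # argument of the current command
        }
        else {
            die "argument '$tok' does not follow a command\n";
        }
    }
    return @groups;
}

# tokens as in: cwb-regedit DICKENS :prop language en :INFO
for my $g (split_commands(':prop', 'language', 'en', ':INFO')) {
    my ($cmd, @args) = @$g;
    print "$cmd(@args)\n";   # prints ":prop(language en)", then ":info()"
}
```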
The following commands are currently supported:
=over 4
=item B<:info>
Print basic information about the registry entry, including pathname of the registry file,
corpus ID, name, data directory and info file, as well as all corpus properties defined in the registry entry.
script/cwb-regedit
Display corpus header information (corpus ID, corpus name, data directory, info file).
=item (B<:id> | B<:name> | B<:home> | B<:ifile>) I<value>
Modify corpus header information (corpus ID, corpus name, data directory, info file).
Don't forget to quote I<value> with single or double quotes if it contains whitespace or other special characters.
=item B<:prop> I<name>
Display corpus property I<name>. If this property is not defined, B<cwb-regedit> prints an
empty line and issues a warning message on B<stderr>.
=item B<:prop> I<name> I<value>
Modify or add corpus property I<name>. Don't forget to quote I<value> if it contains whitespace or special characters.
=item B<:list> (B<:p> | B<:s> | B<:a>) ...
List all attributes of the specified type: B<:p> for positional attributes, B<:s> for structural attributes,
and B<:a> for alignment attributes. Attribute names are printed on a single line separated by blanks.
t/14_cwb_registry.t
# -*-cperl-*-
## Test registry file editor
use Test::More tests => 18;
use File::Path qw(remove_tree);
use CWB;
use File::Compare;
our $dickens = new CWB::RegistryFile "data/registry/dickens";
isa_ok($dickens, 'CWB::RegistryFile', "load registry entry into RegistryFile object"); # T1