Alt-CWB-ambs


MANIFEST  view on Meta::CPAN

lib/CWB/CEQL.pm
lib/CWB/CEQL/Parser.pm
lib/CWB/CEQL/String.pm
lib/CWB/CQP.pm
lib/CWB/Encoder.pm
Makefile.PL
MANIFEST			This list of files
README
script/cwb-align-import
script/cwb-make
script/cwb-regedit
t/00_load.t
t/10_cwb_tools.t
t/11_cwb_file.t
t/12_cwb_tempfile.t
t/13_cwb_shell.t
t/14_cwb_registry.t
t/20_encode_vss.t
t/31_cqp_dickens.t.dont_run
t/40_ceql_parser.t
t/41_ceql_string.t

Makefile.PL  view on Meta::CPAN

STOP
$fh->close
  or die "Error writing '$config_file': $!";
print "Configuration saved to file $config_file\n";

## now we can let MakeMaker generate the makefile
WriteMakefile(
              'NAME'         => "Alt::CWB::ambs",
              'VERSION_FROM' => "lib/Alt/CWB/ambs.pm",
              'EXE_FILES'    => ["script/cwb-make",
                                 "script/cwb-regedit",
                                 "script/cwb-align-import"],
              'META_MERGE'   => {
                                 "meta-spec" => { version => 2 },
                                 "no_index"  => {
                                                 directory => [ 'lib/CWB/'   ],
                                                 file      => [ 'lib/CWB.pm' ],
                                                },
                                },
             );

README  view on Meta::CPAN

If the "cwb-config" utility is not in your standard search path, you can specify
its location with the --config option:

    perl Makefile.PL --config=~/cwb-3.0/bin/cwb-config


PACKAGE OVERVIEW

The CWB/Perl package contains the following Perl modules:

    CWB           general utility functions, registry editor
    CWB::CQP      run CQP process in the background
    CWB::Encoder  stream-lined encoding and indexing of CWB corpora
    CWB::CEQL     an implementation of the CEQL simple query language

See the manual pages (e.g. "perldoc CWB::CQP") of these modules for further
information.  The package also installs a number of command-line utility
programs:

    cwb-make      stream-lined indexing & compression of attributes
    cwb-regedit   a simple registry editor for the command line
    cwb-align-import  import existing sentence alignment into the CWB

Type e.g. "cwb-make -h" for a usage summary, and "perldoc cwb-make" for the
full manual page.


COPYRIGHT

Copyright (C) 1999-2010 by Stefan Evert [http://purl.org/stefan.evert]

data/vrt/VeryShortStories.vrt  view on Meta::CPAN

pull-down	JJ	pull-down
menus	NNS	menu
.	SENT	.
</s>
<s>
A	DT	a
short	JJ	short
note	NN	note
from	IN	from
the	DT	the
editor	NN	editor
informed	VBD	inform
him	PP	him
about	IN	about
today	NN	today
's	POS	's
work	NN	work
.	SENT	.
</s>
<s>
First	JJ	first

data/vrt/VeryShortStories.vrt  view on Meta::CPAN

to	TO	to
pay	VB	pay
.	SENT	.
</s>
<s>
He	PP	he
gulped	VBD	gulp
and	CC	and
gave	VBD	give
his	PP$	his
credit	NN	credit
card	NN	card
a	DT	a
melancholic	JJ	melancholic
glance	NN	glance
before	IN	before
he	PP	he
inserted	VBD	insert
it	PP	it
into	IN	into
the	DT	the

data/vrt/VeryShortStories.vrt  view on Meta::CPAN

In	IN	in
order	NN	order
to	TO	to
call	VB	call
a	DT	a
tin	NN	tin
,	,	,
Ed	NP	Ed
inserted	VBD	insert
his	PP$	his
credit	NN	credit
card	NN	card
into	IN	into
a	DT	a
slot	NN	slot
.	SENT	.
</s>
<s>
A	DT	a
display	NN	display
lit	VBD	light

data/vrt/VeryShortStories.vrt  view on Meta::CPAN

messages	NNS	message
item	NN	item
.	SENT	.
</s>
<s>
The	DT	the
usual	JJ	usual
message	NN	message
from	IN	from
the	DT	the
editor	NN	editor
appeared	VBD	appear
,	,	,
and	CC	and
another	DT	another
one	NN	one
below	RB	below
.	SENT	.
</s>
<s>
After	IN	after
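
The excerpts above show the .vrt layout: one token per line with tab-separated columns, interleaved with <s> ... </s> tags that mark sentence boundaries. As a minimal sketch, assuming the three columns are word form, part-of-speech tag and lemma (the column interpretation, the helper name read_vrt_sentences and the abbreviated sample are illustrative and not taken from the distribution), such input could be grouped into sentences roughly like this:

```perl
use strict;
use warnings;

# Illustrative helper (not part of the distribution): split .vrt-style text
# into sentences, each a list of [word, pos, lemma] triples.
# Assumes <s> ... </s> delimit sentences; lines outside a sentence are ignored.
sub read_vrt_sentences {
  my ($text) = @_;
  my (@sentences, $current);
  foreach my $line (split /\n/, $text) {
    if ($line eq '<s>') {
      $current = [];                        # start a new sentence
    }
    elsif ($line eq '</s>') {
      push @sentences, $current if $current;
      undef $current;                       # wait for the next <s>
    }
    elsif (defined $current and $line =~ /\t/) {
      push @$current, [split /\t/, $line];  # token line: word, pos, lemma
    }
  }
  return @sentences;
}

# Abbreviated sample in the same layout as the data file
my $vrt = join "\n",
  "<s>",
  "The\tDT\tthe",
  "usual\tJJ\tusual",
  "message\tNN\tmessage",
  "appeared\tVBD\tappear",
  ".\tSENT\t.",
  "</s>",
  "";

foreach my $sentence (read_vrt_sentences($vrt)) {
  print join(" ", map { "$_->[0]/$_->[1]" } @$sentence), "\n";
}
```

The sketch keeps each token as an array reference so that further columns, if present, are preserved without changes to the loop.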

lib/CWB.pm  view on Meta::CPAN

  $CWB::Shell::Paranoid = 1;    # more paranoid checks (-1 for less paranoid)

  # read / modify / write registry files (must be in canonical format)
  $reg = new CWB::RegistryFile; # create new registry file
  $reg = new CWB::RegistryFile "/corpora/c1/registry/dickens";  # load file
  die "failed" unless defined $reg;    # will fail if not in canonical format

  $reg = new CWB::RegistryFile "dickens";       # search in standard registry
  $filename = $reg->filename;                   # retrieve full pathname

  # edit standard fields
  $name = $reg->name;           # read NAME field
  $reg->name("Charles Dickens");# modify NAME field
  $corpus_id = $reg->id;        # same for ID, HOME, INFO
  $home_dir = $reg->home;
  $info_file = $reg->info;
  $reg->delete_info;            # INFO line is optional and may be deleted

  # edit corpus properties
  @properties = $reg->list_properties;
  $value = $reg->property("language");  # get property value
  $reg->property("language", "en");     # set / add property
  $reg->delete_property("language");

  # edit attributes ('p'=positional, 's'=structural, 'a'=alignment)
  @attr = $reg->list_attributes;        # list all attributes
  @a_attr = $reg->list_attributes('a'); # list alignment attributes
  $type = $reg->attribute("word");      # 'p'/'s'/'a' or undef
  $reg->delete_attribute("np");
  $reg->add_attribute("np", 's');       # specify type when adding attribute
  $dir = $reg->attribute_path("lemma"); # may be stored in different directory
  $reg->attribute_path("lemma", $dir);  # set attribute path
  $reg->delete_attribute_path("lemma"); # default location is HOME directory

  # comment lines (preceding field/declaration) and inline comments use keys:

lib/CWB.pm  view on Meta::CPAN

##  parse, modify and create registry entries (in canonical format)
## ======================================================================

package CWB::RegistryFile;

use Carp;

=head1 REGISTRY FILE EDITING

Registry files in B<canonical format> can be loaded into B<CWB::RegistryFile> objects,
edited using the various access methods detailed below, and written back to disk. It
is also possible to create a registry entry from scratch and save it to a disk file.

Canonical registry files consist of a B<header> and a B<body>. The
B<header> begins with a NAME, ID, HOME, and optional INFO field:

  NAME "long descriptive name"
  ID   my-corpus
  HOME /path/to/data/directory
  INFO /path/to/info/file.txt
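
As a rough illustration of the header layout described above, the following self-contained sketch extracts the header fields from a string with plain regular expressions. It does not use CWB::RegistryFile and is not how the module itself parses registry files. The helper name parse_registry_header and the sample values are hypothetical, and the data directory field is assumed to be named HOME, matching the home accessor shown in the synopsis.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the CWB module): pull the header
# fields out of the text of a registry file in canonical format.
sub parse_registry_header {
  my ($text) = @_;
  my %header;
  foreach my $line (split /\n/, $text) {
    next if $line =~ /^\s*(#|$)/;            # skip comment and blank lines
    if ($line =~ /^NAME\s+"(.*)"\s*$/) {     # NAME holds a double-quoted string
      $header{NAME} = $1;
    }
    elsif ($line =~ /^(ID|HOME|INFO)\s+(\S.*?)\s*$/) {
      $header{$1} = $2;                      # ID, HOME and the optional INFO
    }
    else {
      last;                                  # first other line ends the header
    }
  }
  return \%header;
}

my $sample = <<'END';
NAME "long descriptive name"
ID   my-corpus
HOME /path/to/data/directory
INFO /path/to/info/file.txt
END

my $header = parse_registry_header($sample);
print "$_ = $header->{$_}\n" foreach sort keys %$header;
```

For real registry files, the accessor methods of CWB::RegistryFile (name, id, home, info) are the supported way to read these fields; the sketch only makes the layout concrete.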

lib/CWB.pm  view on Meta::CPAN

The second form attempts to read and parse the registry file I<$filename>. If
successful, a B<CWB::RegistryFile> object storing all relevant information is
returned.  If I<$filename> does not contain the character C</> and cannot be
found in the current directory, the constructor will automatically search the
standard registry directories for it.  The full pathname of the registry file
can later be determined with the B<filename> method.

If the load operation failed (i.e. the file does not exist or is not in the
canonical registry file format), an error message is printed and an undefined
value returned (so this module can be used e.g. to write a robust graphical
registry editor). Always check the return value of the constructor before
proceeding.

=cut

sub new {
  my $class = shift;
  my $filename = shift;
  my $self =                                    # create and initialise object
    {
     NAME => "",                                # name of corpus (defaults to empty string)

lib/CWB/Encoder.pm  view on Meta::CPAN


B<CWB::Indexer> builds indices for some or all positional attributes
of an existing corpus (using the B<cwb-makeall> tool). In addition,
these attributes are automatically compressed (using the
B<cwb-huffcode> and B<cwb-compress-rdx> tools). Compression and
indexing are interleaved to minimise the required amount of temporary
disk space, and a B<make>-like system ensures that old index files are
automatically updated.

B<CWB::Encoder> automates all steps necessary to encode a CWB corpus
(which includes cleaning up old files, running B<cwb-encode>, editing
the registry entry, indexing & compressing positional attributes, and
setting access permissions). Both modules can be set up with a few
simple method calls. Full descriptions are given separately in the
following sections. 

=cut

## ======================================================================
##  automatic creation, compression and updating of CWB index files (for p-attributes)
## ======================================================================

lib/CWB/Encoder.pm  view on Meta::CPAN

    if $self->{VERBOSE};
  my $infofile = "$dir/.info";
  my $fh = CWB::OpenFile "> $infofile";
  print $fh $self->{INFO}, "\n";
  $fh->close;
  CWB::Shell::Cmd("chmod $perm '$infofile'")
    if $perm;
  CWB::Shell::Cmd("chgrp $group '$infofile'")
    if $group;

  print "Editing registry entry ...\n" # edit registry file
    if $self->{VERBOSE};
  my $reg = $self->{REGISTRY};
  my $name = $self->{NAME};
  my $regfile = "$reg/$name";
  my $rf = new CWB::RegistryFile $regfile;
  croak "CWB::Encoder: Syntax error in registry entry $regfile\n"
    unless defined $rf;
  $rf->name($self->{LONGNAME});
  # $rf->property("charset", $self->{CHARSET}); # -- already set by cwb-encode (since v2.2.101)
  $rf->property("language", $self->{LANG});
  $rf->write($regfile);
  print STDERR "CWB::Encoder: registry entry $regfile has been edited\n"
    if $self->{DEBUG};
  print STDERR "CWB::Encoder: setting access permissions for $regfile\n"
    if $self->{DEBUG} and ($perm or $group);
  CWB::Shell::Cmd("chmod $perm '$regfile'")
    if $perm;
  CWB::Shell::Cmd("chgrp $group '$regfile'")
    if $group;

  my $idx = new CWB::Indexer "$reg:".(uc $name); # build indices and compress p-attributes
  $idx->group($group)

script/cwb-regedit  view on Meta::CPAN

#!/usr/bin/perl
## -*-cperl-*-
## Author:  Stefan Evert
## Purpose: simple command-line registry file editor
##
$| = 1;
use warnings;
use strict;

use locale;

use CWB;
use Getopt::Long;

sub Usage {
  die <<STOP;

Usage:  cwb-regedit [options] (CORPUS | <filename>) <command> [<command> ...]

Options:
  -r <dir>  use registry directory <dir> [system default]
     --registry=<dir>
  -h        show this help page
     --help

Commands:
   :info
        print basic information about the registry entry

script/cwb-regedit  view on Meta::CPAN

  while (@ARGV and not is_command()) {
    push @ARGS, shift @ARGV;
  }
  return 1;
}

__END__

=head1 NAME

cwb-regedit - A simple command-line editor for CWB registry files

=head1 SYNOPSIS

  cwb-regedit [options] (CORPUS | <filename>) <command> [<command> ...]

Options:

  -r <dir>, --registry=<dir>  use registry directory <dir> [system default]
  -h, --help                  display usage summary
     

Commands:

  :info

script/cwb-regedit  view on Meta::CPAN

       add positional, structural or alignment attributes
  :del <names>
       delete attributes of any type

Command names are case-insensitive, and some intuitive abbreviations and variations
are allowed (but only the forms listed above are guaranteed to work).


=head1 DESCRIPTION

B<cwb-regedit> is a simple command-line editor for CWB registry files in B<canonical format>.
It allows you to display and modify header information, corpus properties and attribute declarations.
For more complex editing tasks (such as adding descriptive comments), use the functionality
provided by the B<CWB> module (see L<CWB/"REGISTRY FILE EDITING"> manpage).

The first argument must be the name of a CWB registry file or a corpus ID.
In the latter case, B<cwb-regedit> will attempt to locate the corresponding registry file in
the system registry, or in a directory specified with the C<--registry> (C<-r>) option.
The remaining arguments are editing or display commands, which are applied to the registry entry in turn.
If any changes have been made, the modified registry file will be written back to disk automatically.

The following commands are currently supported:

=over 4

=item B<:info>

Print basic information about the registry entry, including pathname of the registry file,
corpus ID, name, data directory and info file, as well as all corpus properties defined in the registry entry.

script/cwb-regedit  view on Meta::CPAN


Display corpus header information (corpus ID, corpus name, data directory, info file).

=item (B<:id> | B<:name> | B<:home> | B<:ifile>) I<value>

Modify corpus header information (corpus ID, corpus name, data directory, info file).
Don't forget to quote I<value> with single or double quotes if it contains whitespace or other special characters.

=item B<:prop> I<name>

Display corpus property I<name>.  If this property is not defined, B<cwb-regedit> prints an
empty line and issues a warning message on B<stderr>.

=item B<:prop> I<name> I<value>

Modify or add corpus property I<name>.  Don't forget to quote I<value> if it contains whitespace or special characters.

=item B<:list> (B<:p> | B<:s> | B<:a>) ...

List all attributes of the specified type: B<:p> for positional attributes, B<:s> for structural attributes, 
and B<:a> for alignment attributes.  Attribute names are printed on a single line separated by blanks.

t/14_cwb_registry.t  view on Meta::CPAN

# -*-cperl-*-
## Test registry file editor

use Test::More tests => 18;

use File::Path qw(remove_tree);

use CWB;
use File::Compare;

our $dickens = new CWB::RegistryFile "data/registry/dickens";
isa_ok($dickens, "CWB::RegistryFile", "load registry entry into RegistryFile object"); # T1


