Alt-CWB-ambs
lib/CWB/Encoder.pm view on Meta::CPAN
This package contains modules for the automatic encoding and indexing
of CWB corpora.
B<CWB::Indexer> builds indices for some or all positional attributes
of an existing corpus (using the B<cwb-makeall> tool). In addition,
these attributes are automatically compressed (using the
B<cwb-huffcode> and B<cwb-compress-rdx> tools). Compression and
indexing are interleaved to minimise the amount of temporary disk
space required, and a B<make>-like system ensures that outdated index
files are updated automatically.
B<CWB::Encoder> automates all steps necessary to encode a CWB corpus
(which includes cleaning up old files, running B<cwb-encode>, editing
the registry entry, indexing & compressing positional attributes, and
setting access permissions). Both modules can be set up with a few
simple method calls. Full descriptions are given separately in the
following sections.
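
The B<make>-like behaviour described above can be illustrated with a small, self-contained sketch. It is not the module's actual implementation: the rule table, the component names (C<data>, C<index>, C<compressed>) and the C<make_component> helper are hypothetical, and an in-memory hash stands in for files on disk. It only shows the general idea of building a component's prerequisites recursively before the component itself.

```perl
use strict;
use warnings;

# Hypothetical rule table: each component lists the components it needs
# and a command (here just a code ref returning a description).
my %RULES = (
    data       => { NEEDED => [],                 COMMAND => sub { "encode data" } },
    index      => { NEEDED => ['data'],           COMMAND => sub { "build index" } },
    compressed => { NEEDED => ['data', 'index'],  COMMAND => sub { "compress data" } },
);

my %built;   # components that already exist (stands in for files on disk)
my @log;     # commands in the order they were run

# Build one component, first building everything it needs.
sub make_component {
    my ($comp) = @_;
    my $rule = $RULES{$comp}
        or die "no rule found for component $comp\n";
    return if $built{$comp};                      # already up to date
    make_component($_) for @{ $rule->{NEEDED} };  # prerequisites first
    push @log, $rule->{COMMAND}->();              # then the component itself
    $built{$comp} = 1;
    return;
}

make_component('compressed');
print "$_\n" for @log;
```

Requesting C<compressed> runs the three commands in dependency order, and a second request for any already built component does nothing.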
=cut
lib/CWB/Encoder.pm view on Meta::CPAN
# internal method: make single component (recursively builds dependencies)
sub make_comp {
  my ($self, $att, $comp) = @_;
  my $rule = $RULES{$comp};
  croak "CWB::Indexer: no rule found for component $comp (aborted).\n"
    unless defined $rule;
  my ($trigger, $needed, $creates, $command, $delete) =
    @$rule{qw<TRIGGER NEEDED CREATES COMMAND DELETE>};
  my $update = 0; # check whether component needs to be created / updated
  my $file = $self->filename($att, $comp);
  if (not -f $file) {
    print STDERR "CWB::Indexer: component $att/$comp does not exist -> create\n"
      if $self->{DEBUG};
    $update = 1; # file does not exist -> create
  }
  else {
    my $age = -M $file;
    foreach my $t (@$trigger) { # check for triggers that are newer than target
      my $t_file = $self->filename($att, $t);
lib/CWB/Encoder.pm view on Meta::CPAN
croak "CWB::Indexer: Can't delete intermediate file $f (aborted).\n"
if -f $f;
}
}
}
=item $idx->make($att1, $att2, ...);
Process one or more positional attributes. An index is built for each
attribute and the data files are compressed. Missing files are
re-created (if possible) and old files are updated automatically.
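
The rule "missing files are re-created, old files are updated" can be sketched as a simple freshness check. This is an illustration under stated assumptions rather than the module's code: C<needs_update>, C<touch> and the file names C<word.data> and C<word.index> are hypothetical. It relies only on Perl's C<-f> and C<-M> file tests, the same tests that appear in C<make_comp>.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Return true if $target is missing or older than any existing trigger file.
sub needs_update {
    my ($target, @triggers) = @_;
    return 1 unless -f $target;              # missing -> must be created
    my $age = -M $target;                    # age in days (smaller = newer)
    foreach my $t (@triggers) {
        return 1 if -f $t and -M $t < $age;  # trigger is newer than target
    }
    return 0;
}

# Create an empty file and set its modification time explicitly.
sub touch {
    my ($file, $mtime) = @_;
    open my $fh, '>', $file or die "cannot create $file: $!";
    close $fh;
    utime $mtime, $mtime, $file or die "cannot set mtime of $file: $!";
}

# Demo with hypothetical file names in a scratch directory.
my $dir = tempdir(CLEANUP => 1);
my ($source, $index) =
    map { File::Spec->catfile($dir, $_) } qw(word.data word.index);

my $now = time;
touch($source, $now - 200);
my $missing = needs_update($index, $source);  # index does not exist yet
touch($index, $now - 100);
my $fresh = needs_update($index, $source);    # index is newer than source
touch($source, $now - 50);
my $stale = needs_update($index, $source);    # source changed after index
print "missing=$missing fresh=$fresh stale=$stale\n";
```

The script prints C<missing=1 fresh=0 stale=1>: the index is rebuilt when it is absent or when its source has been modified more recently.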
=cut
sub make {
  my $self = shift;
  my $corpus = $self->{NAME};
  foreach my $att (@_) {
    my $type = $self->{TYPES}->{$att};
    croak "CWB::Indexer: $corpus.$att is not a positional attribute (aborted).\n"
      unless $type and $type eq "P";
script/cwb-make view on Meta::CPAN
=head1 DESCRIPTION
The B<cwb-make> utility automates index building and compression for a CWB corpus,
calling B<cwb-makeall>, B<cwb-huffcode> and B<cwb-compress-rdx> as needed.
The main advantages over the manual procedure are:
=over 4
=item *
Old index files are updated automatically (unlike B<cwb-makeall>, which does
not check the age of index files), and it is safe to call B<cwb-make> on an
indexed and compressed corpus (again, unlike B<cwb-makeall>).
=item *
Data files that are no longer needed after compression are immediately deleted.
=item *
The build process is optimised to reduce the amount of temporary disk space and