Alt-CWB-ambs


lib/CWB/Encoder.pm


This package contains modules for the automatic encoding and indexing
of CWB corpora. 

B<CWB::Indexer> builds indices for some or all positional attributes
of an existing corpus (using the B<cwb-makeall> tool). In addition,
these attributes are automatically compressed (using the
B<cwb-huffcode> and B<cwb-compress-rdx> tools). Compression and
indexing are interleaved to minimise the required amount of temporary
disk space, and a B<make>-like system ensures that old index files are
automatically updated.

B<CWB::Encoder> automates all steps necessary to encode a CWB corpus
(which includes cleaning up old files, running B<cwb-encode>, editing
the registry entry, indexing & compressing positional attributes, and
setting access permissions). Both modules can be set up with a few
simple method calls. Full descriptions are given separately in the
following sections. 

=cut
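
The B<make>-like update rule described above can be illustrated with a small standalone sketch. It is not taken from CWB::Indexer: the helper name C<needs_update> and the file names are hypothetical, and it compares modification times from C<stat> rather than using the module's own rule table. A target counts as out of date if it is missing or if any existing trigger file is newer than it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of CWB::Indexer): returns 1 if $target is
# missing or older than any existing file in @triggers, 0 otherwise.
sub needs_update {
  my ($target, @triggers) = @_;
  return 1 unless -f $target;                  # missing -> must be created
  my $target_mtime = (stat $target)[9];
  foreach my $t (@triggers) {
    next unless -f $t;                         # skip triggers that do not exist
    return 1 if (stat $t)[9] > $target_mtime;  # trigger is newer -> rebuild
  }
  return 0;
}

# Demonstration with two made-up files in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
my ($src, $idx) = ("$dir/data.src", "$dir/data.idx");
foreach my $f ($src, $idx) {
  open(my $fh, ">", $f) or die "Can't create $f: $!";
  close($fh);
}

my $now = time;
utime($now, $now - 100, $src);   # source last modified 100 seconds ago
utime($now, $now,       $idx);   # index is newer than the source
print "index newer than source: ", needs_update($idx, $src), "\n";   # prints 0

utime($now, $now - 200, $idx);   # index is now older than the source
print "index older than source: ", needs_update($idx, $src), "\n";   # prints 1
```

Comparing absolute modification times keeps the check independent of when the script started, whereas the C<-M> file test measures age relative to script start time.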

lib/CWB/Encoder.pm


# internal method: make single component (recursively builds dependencies)
sub make_comp {
  my ($self, $att, $comp) = @_;
  my $rule = $RULES{$comp};
  croak "CWB::Indexer:  no rule found for component $comp (aborted).\n"
    unless defined $rule;
  my ($trigger, $needed, $creates, $command, $delete) =
    @$rule{qw<TRIGGER NEEDED CREATES COMMAND DELETE>};

  my $update = 0;               # check whether component needs to be created / updated
  my $file = $self->filename($att, $comp);
  if (not -f $file) {
    print STDERR "CWB::Indexer: component $att/$comp does not exist -> create\n"
      if $self->{DEBUG};
    $update = 1;                # file does not exist -> create
  }
  else {
    my $age = -M $file;
    foreach my $t (@$trigger) { # check for triggers that are newer than target
      my $t_file = $self->filename($att, $t);

lib/CWB/Encoder.pm

      croak "CWB::Indexer: Can't delete intermediate file $f (aborted).\n"
        if -f $f;
    }
  }
}

=item $idx->make($att1, $att2, ...);

Process one or more positional attributes. An index is built for each
attribute and the data files are compressed. Missing files are
re-created (if possible) and old files are updated automatically.

=cut

sub make {
  my $self = shift;
  my $corpus = $self->{NAME};
  foreach my $att (@_) {
    my $type = $self->{TYPES}->{$att};
    croak "CWB::Indexer:  $corpus.$att is not a positional attribute (aborted).\n"
      unless $type and $type eq "P";

script/cwb-make

=head1 DESCRIPTION

The B<cwb-make> utility automates index building and compression for a CWB corpus,
calling B<cwb-makeall>, B<cwb-huffcode> and B<cwb-compress-rdx> as needed.
Main advantages over the manual procedure are:

=over 4

=item *

Old index files are updated automatically (unlike B<cwb-makeall>, which does
not check the age of index files), and it is safe to call B<cwb-make> on an
indexed and compressed corpus (again, unlike B<cwb-makeall>).

=item *

Data files that are no longer needed after compression are immediately deleted.

=item *

The build process is optimised to reduce the amount of temporary disk space and


