Alt-CWB-ambs

 view release on metacpan or  search on metacpan

lib/CWB.pm  view on Meta::CPAN


  $tf = new CWB::TempFile;  # create new temporary file in /tmp dir
  $tf->write(...);     # write cycle (buffered output, like print function)
  $tf->finish;         # complete write cycle (flushes buffer)
  $line = $tf->read;   # read cycle (like getline method for FileHandle)
 [$tf->rewind;         # optional: start re-reading temporary file ]
 [$line = $tf->read;                                               ]
  $tf->close;          # delete temporary file

Once the temporary file has been read from, it cannot be re-written; a
new B<CWB::TempFile> object has to be created for the next cycle. When
the write stage is completed (but before reading has started, i.e. after
calling the B<finish> method), the temporary file can be accessed 
and/or overwritten by external programs. Use the B<name> method to
obtain its full pathname. If no direct access to the temporary file is
required, the B<finish> method is optional. The write cycle will
automatically be completed before the first B<read> method call.

=over 4

=item $tf = new CWB::TempFile [ $prefix ];

Creates temporary file in F</tmp> directory. If the optional I<$prefix>
is specified, the filename will begin with I<$prefix> and be extended
to a unique name. If I<$prefix> contains a C</> character, it is interpreted
as an absolute or relative path, and the temporary file will not be created
in the F</tmp> directory. To create a temporary file in the current
working directory, use F<./MyPrefix>. 

You can add the extension C<.Z>, C<.gz>, or C<.bz2> to I<$prefix> in order
to create a compressed temporary file. The actual filename (returned by the
B<name> method) will have the same extension in this case. 

The temporary file is immediately created and opened for writing.

=cut

# $tf = new CWB::TempFile;                (chooses name automatically)
# $tf = new CWB::TempFile "NP-Chunks";    (uses name beginning with "NP-Chunks")
# $tf = new CWB::TempFile "NP-Chunks.gz"; (tell module to write gzipped tempfile, using OpenFile magic)
sub new {
  my $class = shift;
  my $prefix = shift;
  $prefix = "CWB_TempFile" unless defined $prefix;
  my $suffix = "";
  if ($prefix =~ s/\.(gz|bz2|Z)$//) {
    $suffix = $&;
  }
  my $self = {};
  my $basedir = ($prefix =~ /\//) ? "" : "/tmp/";  # if $prefix isn't absolute or relative path, create temp file in /tmp directory
  my $name = $basedir.$prefix.".$$".$suffix;
  my $num = 1;
  while (-e $name) {            # choose unique name in case file already exists
    $name = $basedir.$prefix.".$$-".$num.$suffix;
    $num++;
  }
  my $fh = CWB::OpenFile "> $name";
  $self->{NAME} = $name;
  $self->{FH} = $fh;
  $self->{STATUS} = "W";        # W = writing, F = finished, R = reading, D = deleted
  return bless($self, $class);
}

sub DESTROY {
  my $self = shift;
  if ($self->{STATUS} ne "D") {
    $self->close;
  }
}

=item $tf->close;

Closes all open file handles and deletes the temporary file. This will be done
automatically when the B<CWB::TempFile> object is destroyed. Use B<close> to
free disk space immediately.  

=cut

sub close {
  my $self = shift;
  my $status = $self->{STATUS};
  my $name = $self->{NAME};
  my $fh = $self->{FH};
  if (($status eq "W" or $status eq "R") and defined $fh) {
    $fh->close
      or carp "CWB::TempFile: Error writing/reading tempfile $name ($!)";
  }
  if (-f $name) {
    carp "CWB::TempFile: Could not unlink tempfile $name ($!)"
      unless unlink $name;
  }
  $self->{STATUS} = "D";
}

=item $filename = $tf->name;

Returns the real filename of a temporary file. B<NB:> direct access to this
file (e.g. by external programs) is I<only> allowed after calling B<finish>, 
and before the first B<read>.

=cut

sub name {
  my $self = shift;
  return $self->{NAME};
}

=item $status = $tf->status;

Returns the current status of the temporary file, i.e. the stage in its
life cycle.  The return value is one of the strings
C<WRITING> (initial state),
C<FINISHED> (immediately after B<finish>, before first read),
C<READING> (while reading or after B<rewind>) or
C<DELETED> (after B<close>).

=cut

sub status {
  my $self = shift;
  return {

lib/CWB.pm  view on Meta::CPAN

sub type2keyword {
  my $type = shift;
  my %mapping = qw<p ATTRIBUTE  s STRUCTURE  a ALIGNED>;
  die "Internal error: type2keyword('$type') undefined."
    unless defined $mapping{$type};
  return $mapping{$type};
}

sub keyword2type {
  my $key = shift;
  my %mapping = qw<ATTRIBUTE p  STRUCTURE s  ALIGNED a>;
  die "Internal error: keyword2type('$key') undefined."
    unless defined $mapping{$key};
  return $mapping{$key};
}

=over 4

=item $reg = new CWB::RegistryFile;

=item $reg = new CWB::RegistryFile $filename;

The first form of the B<CWB::RegistryFile> constructor creates a new, 
empty registry entry. The mandatory fields have to be filled in by the
Perl script before the I<$reg> object can be saved to disk. It is also highly
advisable to declare at least the C<word> attribute. :-)

The second form attempts to read and parse the registry file I<$filename>. If
successful, a B<CWB::RegistryFile> object storing all relevant information is
returned.  If I<$filename> does not contain the character C</> and cannot be
found in the current directory, the constructor will automatically search the
standard registry directories for it.  The full pathname of the registry file
can later be determined with the B<filename> method.

If the load operation failed (i.e. the file does not exist or is not in the
canonical registry file format), an error message is printed and an undefined
value returned (so this module can be used e.g. to write a robust graphical
registry editor). Always check the return value of the constructor before
proceeding.

=cut

sub new {
  my $class = shift;
  my $filename = shift;
  my $self =                                    # create and initialise object
    {
     NAME => "",                                # name of corpus (defaults to empty string)
     ID => undef,                               # corpus ID (required)
     HOME => undef,                             # home directory (required)
     INFO => undef,                             # info file (optional, but highly recommended)
     PROPERTIES => [],                          # corpus properties ([property, value] pairs)
     ATT => {},                                 # attributes (att => 'p' / 's' / 'a')
     ATT_PATH => {},                            # data paths for attributs
     SERIALIZE => [],                           # order in which attributes are listed in the registry entry
     COMMENTS =>  {},                           # comments and/or blank lines preceding each content line
                                                # (att => [comment1, comment2, ...], ':NAME' => [...], '::property' => ...)
     LINECOMMENT => {},                         # line comments on content lines (att => comment, ':NAME' => comment, ...)
     FILENAME => undef,                         # filename of registry file (if loaded from file)
    };
  bless($self, $class);

  # if filename was specified, try loading registry entry (searches in registry directories if necessary)
  if (defined $filename) {
    if ($filename !~ /\// and not -f $filename) {
      my @dirs = CWB::RegistryDirectory();
      my @files = grep { -f $_ } map { "$_/".lc($filename) } @dirs; # corpus ID may be specified in uppercase
      return $self->error("Found multiple registry entries for corpus ".uc($filename).":", @files)
        if @files > 1;
      $filename = shift @files
        if @files;
    }
    return $self->error("Can't access registry file or corpus $filename")
      unless -r $filename;
    my $fh = CWB::OpenFile $filename;
    $self->{FILENAME} = $filename;
    # NAME (required)
    my ($l, @c) = read_segment($fh);
    return $self->error("Missing or misplaced NAME line in registry file $filename",
                        "  >> $l <<")
      unless $l =~ /^NAME\s+/;
    return $self->error("Syntax error in registry file $filename:",
                        "  >> $l <<",
                        "(expected >> NAME \" ... \" <<)")
      unless $l =~ /^NAME\s+\"([^\"]*)\"\s*(\#.*)?$/;
    my ($v, $lc) = ($1, $2);
    $lc =~ s/^\#// if $lc;
    $self->{NAME} = $v;
    $self->{COMMENTS}->{':NAME'} = [@c];
    $self->{LINECOMMENT}->{':NAME'} = $lc
      if $lc;
    # ID (required)
    ($l, @c) = read_segment($fh);
    return $self->error("Missing or misplaced ID line in registry file $filename",
                        "  >> $l <<")
      unless $l =~ /^ID\s+/;
    return $self->error("Syntax error in registry file $filename:",
                        "  >> $l <<",
                        "(expected >> ID lowercase-name <<)")
      unless $l =~ /^ID\s+([a-z_][a-z0-9_-]*)\s*(\#.*)?$/;
    ($v, $lc) = ($1, $2);
    $lc =~ s/^\#// if $lc;
    $self->{ID} = $v;
    $self->{COMMENTS}->{':ID'} = [@c];
    $self->{LINECOMMENT}->{':ID'} = $lc
      if $lc;
    # HOME (required)
    ($l, @c) = read_segment($fh);
    return $self->error("Missing or misplaced HOME line in registry file $filename",
                        "  >> $l <<")
      unless $l =~ /^HOME\s+/;
    return $self->error("Syntax error in registry file $filename:",
                        "  >> $l <<",
                        "(expected >> HOME directory <<)")
      unless $l =~ /^HOME\s+(\S+|".+")\s*(\#.*)?$/;  # Can't really check whether path is valid
    ($v, $lc) = ($1, $2);
    $lc =~ s/^\#// if $lc;
    $v =~ s/^"|"$//g; # remove string delimiters if PATH is double-quoted string
    $self->{HOME} = $v;
    $self->{COMMENTS}->{':HOME'} = [@c];
    $self->{LINECOMMENT}->{':HOME'} = $lc



( run in 2.158 seconds using v1.01-cache-2.11-cpan-437f7b0c052 )