File-ANVL

lib/File/ANVL.pm

For the formats ANVL, Plain, and XML, the returned text string is wrapped
to 72 columns by default.

     use File::OM;
     my $om = File::OM->new("ANVL");
     $anvl_record = $om->elems(
         "erc", "",
         "who", $creator,
         "what", $title,
         "when", $date,
         "where", $identifier)
         . "\n";    # 2nd newline in a row terminates ANVL record

The C<getlines()> function reads from $filehandle up to a blank line and
returns the lines read.  This is a general function for reading
"paragraphs", which is useful for reading ANVL records.  If unspecified,
$filehandle defaults to *ARGV, which makes it easy to take input from
successive file arguments specified on the command line (or from STDIN if
none) of the calling program.

For convenience, C<trimlines()> is often used to process the record just
returned by C<getlines()>.  It strips leading whitespace, optionally
counts lines, and returns undef if the passed record is undefined or
contains only whitespace, both being equivalent to end-of-file (EOF).

These functions treat whitespace specially.  Input is read until at least
one non-whitespace character has been seen and then either a blank line
(two newlines in a row) or EOF is reached.  If EOF is reached and the
record would contain only whitespace, undef is returned.  Input line
counts for the leading whitespace that was trimmed ($wslines) and for the
real record lines ($rrlines) can be returned through optional scalar
references given to C<trimlines()>.
These functions work together to permit the caller access to all inputs,
to accurate line counts, and a familiar "loop until EOF" paradigm, as in

     while (defined trimlines(getlines(), \$wslcount, \$rrlcount)) ...
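
The read-then-trim pattern can be sketched in plain Perl.  This is only an
illustration of the behavior described above, not the module's code: the
helpers C<read_paragraph()> and C<trim_paragraph()> are hypothetical
stand-ins for C<getlines()> and C<trimlines()>, and the line counts here
are simply counts of newline characters.

```perl
use strict;
use warnings;

# Hypothetical stand-in for getlines(): read chunks ending at a blank line
# until at least one non-whitespace character has been seen, or EOF.
sub read_paragraph {
    my ($fh) = @_;
    local $/ = "\n\n";                 # a chunk ends at two newlines in a row
    my $rec = '';
    while (defined(my $chunk = <$fh>)) {
        $rec .= $chunk;
        last if $rec =~ /\S/;          # stop once something non-blank arrived
    }
    return length($rec) ? $rec : undef;    # explicit undef signals EOF
}

# Hypothetical stand-in for trimlines(): strip leading whitespace and
# report newline counts through the optional scalar references.
sub trim_paragraph {
    my ($rec, $ws_ref, $rr_ref) = @_;
    return undef unless defined $rec;
    $rec =~ s/^(\s*)//;                          # strip leading whitespace
    my $ws = $1;
    $$ws_ref = ($ws  =~ tr/\n//) if $ws_ref;     # newlines in stripped prefix
    $$rr_ref = ($rec =~ tr/\n//) if $rr_ref;     # newlines left in the record
    return length($rec) ? $rec : undef;          # whitespace-only acts as EOF
}

my $input = "\n\n\na: 1\nb: 2\n\nc: 3\n";        # made-up sample input
open(my $fh, '<', \$input) or die "cannot open in-memory file: $!";
my ($wsl, $rrl, @records);
while (defined(my $rec = trim_paragraph(read_paragraph($fh), \$wsl, \$rrl))) {
    push @records, $rec;
    print "record: $wsl leading newline(s), $rrl record newline(s)\n";
}
close $fh;
```

Returning an explicit undef (rather than an empty list) matters in this
sketch because the reader's result is passed straight into the argument
list of the trimmer.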

The C<anvl_recarray()> function splits an ANVL record into ANVL elements,
returning them via the array reference given as the second argument.  The
C<n>-th returned ANVL element corresponds to three Perl array elements as
follows:

     INDEX   CONTENT
     3n + 0  input file line number
     3n + 1  n-th ANVL element name
     3n + 2  n-th ANVL element value

This means, for example, that the first two ANVL element names would be
found at Perl array indices 4 and 7.  The first triple is special; array
elements 0 and 2 are undefined unless the record begins with an unlabeled
value (not strictly ANVL), such as,

     Smith, Jo
     home: 555-1234
     work: 555-9876

in which case they contain the line number and value, respectively.  Array
element 1 always contains a string naming the format of the input, such
as "ANVL", "JSON", or "XML".
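
The index arithmetic can be checked against a hand-built array.  The
array below is illustrative only (it is not captured C<anvl_recarray()>
output), and C<triple()> is a hypothetical helper.

```perl
use strict;
use warnings;

# Hand-built array in the 3n+0 / 3n+1 / 3n+2 layout; values are made up.
my @elems = (
    undef, "ANVL", undef,          # triple 0: format name only
    "1:",  "home", "555-1234",     # triple 1
    "2:",  "work", "555-9876",     # triple 2
);

# Hypothetical helper: return (lineno, name, value) for the n-th triple.
sub triple {
    my ($elems, $n) = @_;
    return @{$elems}[ 3 * $n .. 3 * $n + 2 ];
}

my $count = @elems / 3;            # number of triples, including triple 0
for my $n (1 .. $count - 1) {
    my ($lineno, $name, $value) = triple(\@elems, $n);
    printf "n=%d  name index=%d  %s = %s\n", $n, 3 * $n + 1, $name, $value;
}
```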

The remaining triples are free form except that the values will have been
drawn from the original format and possibly decoded.  The first item
("lineno") in each remaining triple is a number followed by a character,
for example, "34:" or "6#".  The number indicates the line number (or
octet offset, depending on the origin format) of the start of the
element.  The character is either ':' to indicate a real element or '#'
to indicate a comment; if the latter, the element name has no defined
meaning and the comment is contained in the value.  Here's example code
that reads a 3-element record and reformats it.

     ($msg = File::ANVL::anvl_recarray('
     a: b c
     d:  e
       f
     g:
       h i
     ', \@elems)) and die "anvl_recarray: $msg";  # report what went wrong
     for ($i = 4; $i < $#elems; $i += 3)
         { print "[$elems[$i] <- $elems[$i+1]]  "; }

which prints

     [a <- b c]  [d <- e f]  [g <- h i]
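
When comments are kept, a caller may want to tell real elements from
comments by looking at the "lineno" item.  A minimal sketch, assuming
only the "number plus ':' or '#'" form described above; C<parse_lineno()>
is a hypothetical helper and the array is hand-built for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a lineno item such as "34:" or "6#" into its
# number and a kind, either 'element' or 'comment'.
sub parse_lineno {
    my ($lineno) = @_;
    return unless defined $lineno && $lineno =~ /^(\d+)([:#])$/;
    return ($1, $2 eq '#' ? 'comment' : 'element');
}

# Illustrative array (not real anvl_recarray() output) with one comment.
my @elems = (
    undef, "ANVL", undef,
    "1:",  "a",    "b c",
    "2#",  "",     "a comment",    # name of a comment triple is not meaningful
    "3:",  "d",    "e f",
);

for (my $i = 3; $i < $#elems; $i += 3) {
    my ($num, $kind) = parse_lineno($elems[$i]);
    next unless defined $kind && $kind eq 'element';   # skip comments
    print "line $num: $elems[$i+1] = $elems[$i+2]\n";
}
```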

An optional third argument to C<anvl_recarray> gives the starting line
number (default 1).  An optional fourth argument is a reference to a hash
containing options; the argument { comments => 1, autoindent => 0 } will
cause comments to be kept (stripped by default) and recoverable indentation
errors to be flagged as errors (corrected to continuation lines by
default).  This function returns the empty string on success, or a
message beginning "warning: ..." or "error: ...".
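
Since a returned message may be a warning rather than an error, a caller
might prefer to classify it before deciding whether to stop.  The sketch
below relies only on the documented prefixes; C<msg_severity()> is a
hypothetical helper and the sample messages are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a returned message by its documented prefix.
sub msg_severity {
    my ($msg) = @_;
    return 'ok'      if !defined $msg || $msg eq '';
    return 'warning' if $msg =~ /^warning:/;
    return 'error'   if $msg =~ /^error:/;
    return 'unknown';
}

# Options hash in the shape described above; a reference to it (\%opts)
# would be given as the fourth argument.
my %opts = (comments => 1, autoindent => 0);
print "options: ", join(", ", map { "$_ => $opts{$_}" } sort keys %opts), "\n";

# Made-up messages for illustration; real text comes from the module.
for my $msg ('', 'warning: example text', 'error: example text') {
    my $sev = msg_severity($msg);
    print "'$msg' is treated as: $sev\n";
}
```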

C<erc_anvl_expand_array()> inspects and possibly modifies in place the
kind of element array resulting from a call to C<anvl_recarray()>.  It
returns the empty string on success, otherwise an error message.  This
routine is useful for transforming a short form ERC ANVL record into long
form, for example, expanding C<erc: a | b | c | d> into the equivalent,

     erc:
     who: a
     what: b
     when: c
     where: d
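
The short-to-long mapping itself is simple enough to sketch in plain
Perl.  This is not the module's implementation and it works on a bare
value string rather than on an element array; C<expand_short_erc()> is a
hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: expand the value of a short-form "erc" element into
# a list of [name, value] pairs, or an empty list if it lacks four parts.
sub expand_short_erc {
    my ($value) = @_;
    my @names = qw(who what when where);
    (my $trimmed = $value) =~ s/^\s+|\s+$//g;      # trim both ends
    my @parts = split /\s*\|\s*/, $trimmed;        # split on "|" with padding
    return unless @parts == @names;
    return map { [ $names[$_], $parts[$_] ] } 0 .. $#names;
}

print "erc:\n";
print "$_->[0]: $_->[1]\n" for expand_short_erc("a | b | c | d");
```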

The C<anvl_arrayhash()> function takes the kind of element array
resulting from a call to C<anvl_recarray()> or C<erc_anvl_expand_array()>
and modifies the hash reference given as the second argument by storing,
for each element name, a list of integers corresponding to the triples
that bear that name.  You should always empty the hash first (for
example, with C<undef %hash>) or you may see unexpected results.  So to
print the value (the 2nd array element past the start of the triple) of
the first instance (index 0) of "who",

     anvl_arrayhash(\@elems, \%hash);
     print "First who: ", $elems[ $hash{who}->[0] + 2 ], "\n";
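
To make the shape of that hash concrete, here is a plain-Perl sketch that
builds a comparable index from a hand-built array.  C<index_by_name()> is
a hypothetical stand-in, not the module's code; skipping the special
first triple is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for anvl_arrayhash(): map each element name to the
# array indices at which its triples start.
sub index_by_name {
    my ($elems, $hash) = @_;
    %$hash = ();                                    # start from an empty hash
    for (my $i = 3; $i + 2 <= $#$elems; $i += 3) {  # assumption: skip triple 0
        my $name = $elems->[$i + 1];
        next unless defined $name;
        push @{ $hash->{$name} }, $i;
    }
    return;
}

# Illustrative array (not real anvl_recarray() output) with a repeated name.
my @elems = (
    undef, "ANVL", undef,
    "1:",  "who",  "Smith, Jo",
    "2:",  "what", "A title",
    "3:",  "who",  "Doe, Jan",
);
my %hash;
index_by_name(\@elems, \%hash);
print "First who: ",  $elems[ $hash{who}->[0] + 2 ], "\n";
print "Second who: ", $elems[ $hash{who}->[1] + 2 ], "\n";
```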

The C<anvl_valsplit()> function splits an ANVL value into sub-values 
(svals) and repeated values (rvals), returning them as an array of arrays
via the array reference given as the second argument.  The top-level of
the array represents svals and the next level represents rvals.  This
function returns the empty string on success, or a message beginning
"warning: ..." or "error: ...".

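The array-of-arrays result can be pictured with a small plain-Perl
sketch.  Note the assumption: the text above does not name the
separators, so this sketch takes "|" to separate sub-values and ";" to
separate repeated values.  C<split_value()> is a hypothetical stand-in,
not the module's code, and its error text is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for anvl_valsplit().  ASSUMPTION: "|" separates
# sub-values and ";" separates repeated values.
sub split_value {
    my ($value, $svals) = @_;
    return "error: no value given" unless defined $value;
    @$svals = ();
    for my $sval (split /\|/, $value, -1) {
        my @rvals = split /;/, $sval, -1;
        s/^\s+|\s+$//g for @rvals;       # trim each repeated value in place
        push @$svals, \@rvals;           # top level: svals; next level: rvals
    }
    return '';                           # empty string means success
}

my @svals;
my $msg = split_value("a; b | c", \@svals);
die "split_value: $msg" if $msg;
for my $s (0 .. $#svals) {
    print "sval $s: ", join(", ", @{ $svals[$s] }), "\n";
}
```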