
=item F<require(SELF, NAME)>

=item F<has(SELF, NAME)>

Get attribute NAME from object SELF.  If NAME does not exist, F<get> returns DEFAULT (or F<undef>
when no DEFAULT is given).  F<require> has no default value; it dies unless NAME exists and is
defined.  F<has> returns true when NAME exists, false otherwise.  For NAME the leading hyphen is
optional.  For example,

    $self->get('foo');          # returns $self->{-foo} or undef
    $self->get(-foo=>);         # same as above
    $self->get('foo', 42);      # returns $self->{-foo} or 42

=back

=cut

sub new {
    my($prototype, $k) = shift;
    carp <<___ if @_ & 1;
$prototype->Data::Rlist::new(${\(join(', ', @_))})
    odd number of arguments supplied, expecting key/value pairs
___
    my %args = @_;
    bless { map { $k = $_;
                  s/^_+//o;         # remove leading underscores
                  s/^([^\-])/-$1/o; # prepend missing '-'
                  $_ => $args{$k}
              } keys %args }, ref($prototype) || $prototype;
}

sub set {
    my($self) = shift;
    my %attr = @_;
    while(my($k, $v) = each %attr) {
        $self->{$k} = $v
    } $self
}

sub require($$) {               # get attribute or confess
    my($self, $attr) = @_;
    my $v = $self->get($attr);
    confess "$self->require(): missing '$attr' attribute:\n\t\t".join("\n\t\t", map { "$_ = $self->{$_}" } keys %$self) unless defined $v;
    return $v;
}

sub get($$;$) {                 # get attribute or return default value/undef
    my($self, $attr, $default) = @_;
    $attr = '-'.$attr unless $attr =~ /^-/;
    return $self->{$attr} if exists $self->{$attr};
    return $default;
}

sub has($$) {
    my($self, $attr) = @_;
    $attr = '-'.$attr unless $attr =~ /^-/;
    exists $self->{$attr};
}

sub dock($\&) {
    carp "package Data::Rlist locked" if $Locked++; # TODO: use critical sections and atomic increment
    my ($self, $block) = @_;
    local $MaxDepth = $self->get(-MaxDepth=>) if $self->has(-MaxDepth=>);
    local $SafeCppMode = $self->get(-SafeCppMode=>) if $self->has(-SafeCppMode=>);
    local $EchoStderr = $self->get(-EchoStderr=>) if $self->has(-EchoStderr=>);
    local $RoundScientific = $self->get(-RoundScientific=>) if $self->has(-RoundScientific=>);
    local $DefaultCsvDelimiter = $self->get(-DefaultCsvDelimiter=>) if $self->has(-DefaultCsvDelimiter=>);
    local $DefaultConfDelimiter = $self->get(-DefaultConfDelimiter=>) if $self->has(-DefaultConfDelimiter=>);
    local $DefaultConfSeparator = $self->get(-DefaultConfSeparator=>) if $self->has(-DefaultConfSeparator=>);
    local $DefaultNanoscriptToken = $self->get(-DefaultNanoscriptToken=>) if $self->has(-DefaultNanoscriptToken=>);
    local $DEBUG = $self->get(-DEBUG=>) if $self->has(-DEBUG=>);
    local $/ = $self->get(-InputRecordSeparator=>) if $self->has(-InputRecordSeparator=>);
    local $R;
    unless (defined wantarray) { # void context
        $block->(); --$Locked;
    } elsif (wantarray) {
        my @r = $block->(); --$Locked; return @r;
    } else {
        my $r = $block->(); --$Locked; return $r;
    }
}

=head2 Public Functions

=over

=item F<read(INPUT[, FILTER, FILTER-ARGS])>

Parse data from INPUT, which specifies Rlist text.  See also F<L</errors>>, F<L</write>>.

B<PARAMETERS>

INPUT shall be either

- some Rlist object created by F<L</new>>,

- a string reference, in which case F<read> and F<L</read_string>> parse Rlist text from it,

- a string scalar, in which case F<read> treats it as the name of a file to parse.

See F<L</open_input>> for the FILTER and FILTER-ARGS parameters, which are used to preprocess an
input file.  When an input file cannot be F<open>'d and F<flock>'d, this function dies.  When INPUT
is an object, FILTER and FILTER-ARGS, if given, override the F<-filter> and F<-filter_args>
attributes.
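
The three INPUT forms can be told apart with the core module Scalar::Util, in the spirit of the
F<reftype> check in F<L</open_input>>.  The following is a minimal sketch; F<classify_input> is a
hypothetical helper, not part of F<Data::Rlist>.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# classify_input() is a hypothetical helper, not part of Data::Rlist. It
# shows how the three INPUT forms accepted by read() can be told apart.
sub classify_input {
    my ($input) = @_;
    return 'object' if blessed($input);                             # an object from new()
    return 'string' if ref($input) && reftype($input) eq 'SCALAR';  # \"Rlist text"
    return 'file'   if defined($input) && !ref($input);             # plain scalar: a filename
    die "INPUT must be an object, a string reference or a filename\n";
}

my $text = '{ foo = 1; }';
print classify_input(bless({}, 'Data::Rlist')), "\n";   # object (a stand-in, module not loaded)
print classify_input(\$text), "\n";                      # string
print classify_input('data.rlist'), "\n";                # file
```

Checking F<blessed> first means that an object is never mistaken for a plain reference.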

B<RESULT>

The parsed data as array- or hash-reference, or F<undef> if there was no data.  The latter may also
be the case when the file consists only of comments or whitespace.

B<NOTES>

This function may die.  Dying is Perl's mechanism to raise exceptions, which can be caught with
F<eval>.  For example,

    my $host = eval { use Sys::Hostname; hostname; } || 'some unknown machine';

This code fragment  traps the F<die> exception, so  that F<eval> returns F<undef> or  the result of
calling F<hostname>. The following example uses F<eval> to trap exceptions thrown by F<read>:

    my $object = Data::Rlist->new(-input => $thingfile);
    my $thing  = eval { $object->read };

    unless (defined $thing) {
        if ($object->errors) {
            print STDERR "$thingfile has syntax errors\n";
        } else {
            print STDERR "$thingfile not found, is locked or empty\n";
        }
    } else {
        # Can use $thing
        # ...
    }

=item F<read_csv(INPUT[, OPTIONS, FILTER, FILTER-ARGS])>

=item F<read_conf(INPUT[, OPTIONS, FILTER, FILTER-ARGS])>

Parse data from INPUT, which specifies comma-separated values (CSV) text.  Both functions

- read data from strings or files,

- use an optional delimiter,

- ignore delimiters in quoted strings,

- ignore empty lines,

- ignore lines beginning with F<#>.

F<read_conf> is a variant of F<read_csv> dedicated to configuration files. Such files consist
of lines of the form

    key = value

B<PARAMETERS>

For INPUT see F<L</read>>.  For FILTER,  FILTER-ARGS see F<L</open_input>>.

OPTIONS can be used to override the L<C<"delimiter">|/Compile Options> regex.  For example, a
delimiter of C<'\s+'> splits the line at whitespace into multiple values (respecting quoted
strings).  For F<read_csv> the delimiter defaults to C<'\s*,\s*'>, and for F<read_conf> to
C<'\s*=\s*'>.  See also F<L</write_csv>> and F<L</write_conf>>.

B<RESULT>

Both functions return a reference to a list of lists.  Each embedded array holds the fields of one
line.

B<EXAMPLES>

Un/quoting of values happens implicitly.  Given a file F<db.conf>

    # Comment
    SERVER      = hostname
    DATABASE    = database_name
    LOGIN       = "user,password"

the call C<$opts = ReadConf("db.conf")> assigns

    [ [ 'SERVER', 'hostname' ],
      [ 'DATABASE', 'database_name' ],
      [ 'LOGIN', 'user,password' ]
    ]
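
The splitting rules above (a delimiter regex, quoted strings kept intact, comments and empty lines
skipped) can be approximated with F<parse_line> from the core module Text::ParseWords.  This is
only a sketch under that assumption, not how F<Data::Rlist> itself parses CSV text; F<split_lines>
is a hypothetical helper.  Applied to the F<db.conf> text it yields the same rows as shown above.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# split_lines() is a hypothetical helper, not part of Data::Rlist: it splits
# each line of CSV/conf text at the regex $delim, keeps quoted strings intact
# (parse_line removes the quotes), and skips comment and empty lines.
sub split_lines {
    my ($text, $delim) = @_;
    my @rows;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#|\z)/;            # comment or empty line
        push @rows, [ parse_line($delim, 0, $line) ];
    }
    return \@rows;                                  # reference to a list of lists
}

my $conf = <<'END';
# Comment
SERVER      = hostname
DATABASE    = database_name
LOGIN       = "user,password"
END

my $rows = split_lines($conf, '\s*=\s*');
print join('|', @$_), "\n" for @$rows;
```

The same helper splits CSV lines when given the delimiter C<'\s*,\s*'>.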

The F<L</WriteConf>> function can be used to create or update the configuration:

    push @$opts, [ 'MAGIC VALUE' => 3.14_15 ];


=item F<open_input(INPUT[, FILTER, FILTER-ARGS])>

=item F<close_input>

Open/close  Rlist text  file  or string  INPUT for  parsing.   Used internally  by F<L</read>>  and
F<L</read_csv>>.

B<PREPROCESSING>

The function can preprocess the INPUT file using FILTER.  Use the special value 1 to select the
default C preprocessor (F<gcc -E -Wp,-C -x c++>).  FILTER-ARGS is an optional string of additional
command-line arguments to be appended to FILTER.  For example,

    my $foo = Data::Rlist::read("foo", 1, "-DEXTRA");

does not parse F<foo> directly, but the output of the command

    gcc -E -Wp,-C -x c++ -DEXTRA foo

Hence C preprocessor directives are allowed within F<foo>.  For example,

    {
    #ifdef EXTRA
    #include "extra.rlist"
    #endif

        123 = (1, 2, 3);
        foobar = {
            .
            .
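
The command that ends up being run can be sketched as follows, assuming only the string handling
visible in the source of F<open_input> (the special value 1, appending FILTER-ARGS, then the input
file name).  F<build_filter_cmd> is hypothetical; the real function opens the result as a pipe.

```perl
use strict;
use warnings;

# build_filter_cmd() is a hypothetical helper, not part of Data::Rlist. It
# mirrors how open_input() assembles the preprocessor command from FILTER and
# FILTER-ARGS. Unlike the source, it compares FILTER with eq, so a command
# name such as "cpp" cannot trigger a "not numeric" warning.
sub build_filter_cmd {
    my ($input, $filter, $filter_args) = @_;
    $filter = 'gcc -E -Wp,-C -x c++' if $filter eq '1';  # special value 1: default preprocessor
    $filter .= " $filter_args" if $filter_args;           # append extra arguments
    return "$filter $input";
}

print build_filter_cmd('foo', 1, '-DEXTRA'), "\n";   # gcc -E -Wp,-C -x c++ -DEXTRA foo
print build_filter_cmd('foo', 'cpp'), "\n";          # cpp foo
```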

B<SAFE CPP MODE>

This mode uses F<sed> and a temporary file.  It is enabled by setting F<$Data::Rlist::SafeCppMode>
to 1 (the default is 0).  It protects single-line F<#>-comments when FILTER begins with F<gcc>,
F<g++> or F<cpp>: F<L</open_input>> then additionally runs F<sed> over the input to escape every
F<#> character that begins a line, optionally after leading whitespace.  Only the following
F<cpp> directives are exempt, and only when they appear in column 1:

- F<#include> and F<#pragma>

- F<#define> and F<#undef>

- F<#if>, F<#ifdef>, F<#else> and F<#endif>.

In all other such lines F<sed> escapes F<#> as C<\#>, which prevents the C preprocessor from
evaluating them; the escaping is reversed when the preprocessed text is read.  Because Perl's
F<open> function cannot handle long pipes, F<sed> writes to a temporary file, which is created in
the same directory as the input file and removed by F<L</close_input>>.  If you only use F<//>
and F</* */> comments, this read mode is not required.
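
The three F<sed> substitutions can be mimicked in pure Perl to see which lines reach the
preprocessor unescaped.  F<escape_for_cpp> is a hypothetical helper that only illustrates the
rules, following the F<sed> commands in the source; the real code runs F<sed> over the whole file.

```perl
use strict;
use warnings;

# escape_for_cpp() is a hypothetical helper, not part of Data::Rlist. It
# applies, in the same order, the three substitutions that open_input()
# passes to sed in safe-CPP mode.
sub escape_for_cpp {
    my ($line) = @_;
    $line =~ s/^([ \t]+)#/$1\\#/;          # (1) indented '#' becomes '\#'
    $line =~ s/^#(include|pragma|if|ifdef|else|endif|define|undef)/ #$1/;  # (2) known directive: prefix a blank
    $line =~ s/^#/\\#/;                    # (3) any other '#' in column 1 becomes '\#'
    return $line;
}

for my $line ('#if 0', '#include "extra.rlist"', '  # indented comment',
              '# column-1 comment', 'foo = 1;') {
    print escape_for_cpp($line), "\n";
}
```

Step (2) works because F<cpp> accepts directives after leading blanks, while step (3) then only
sees the remaining column-1 F<#> lines.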

=cut

sub open_input($;$$)
{
    my($input, $fcmd, $fcmdargs) = @_;
    my($rls, $filename);
    my $rtp = reftype $input;

    carp "\n${\((caller(0))[3])}: filename or scalar-ref required as INPUT" if defined $rtp && $rtp ne 'SCALAR';
    carp "\n${\((caller(0))[3])}: package locked" if $Readstruct;
    $Readstruct = $ReadFh = undef;
    local $| = 1 if $DEBUG;

    if (defined $input) {
        $Readstruct = { };
        unless (ref $input) {
            $Readstruct->{filename} = $input;
            unless ($fcmd) {	# the file is read unfiltered
                unless (open($Readstruct->{fh}, "<$input") && flock($Readstruct->{fh}, 1)) {
                    $Readstruct = undef;
                    pr1nt('ERROR', "input file '$input'", $!);
                }
            } else {                # pipe it through $fcmd
                $fcmd = "gcc -E -Wp,-C -x c++" if $fcmd == 1;
                $fcmd = "$fcmd $fcmdargs" if $fcmdargs;

                if ($SafeCppMode) {
                    if ($fcmd =~ /^(gcc|g\+\+|cpp)/i) {
                        # Filter input with sed:
                        #
                        # (1) Because known #-commands must start at column 1 we first escape all
                        #     indented '#'s into '\#'s:
                        #           "(^ +)#" -> '$1\#'
                        # (2) Next we prefix the known commands with a blank, e.g.
                        #           "#if 0" -> " #if 0"
                        # (3) Finally we escape all unknown #-commands at column 1:
                        #           "^#" -> "\#"
                        #
                        # lexln will then reverse the escaping.  Since the builtin open does not
                        # support true pipes, a temporary file receives the output of sed, which is
                        # then preprocessed. The temporary file will be removed in close_input.

                        my($sedfh, $tmpfh);
                        open($sedfh,
							 "sed '".
							 join('; ', ("s/^\\([ \t][ \t]*\\)#/\\1\\\\#/", # many seds don't know \t -> insert literally
										 "s/^#\\(include\\|pragma\\|if\\|ifdef\\|else\\|endif\\|define\\|undef\\)/ #\\1/",
										 "s/^#/\\\\#/")).";' <$input 2>nul |") ||
										 die "\nERROR: input file '$fcmd': $!";
                        my($tmpinput, $i) = (undef, 0);
                        do { $tmpinput = $input.'.tmp'.$i++ } while -e $tmpinput;
                        $Readstruct->{tmpfile} = $input = $tmpinput;
                        open ($tmpfh, ">$input") || die "\nERROR: temporary file '$input': $!";
                        print $tmpfh readline($sedfh);
                        close $tmpfh;
                        close $sedfh;
                    }
                }

                # Open the file $input (or the temporary sed'd file) for preprocessing.

                unless (open($Readstruct->{fh}, "$fcmd $input 2>nul |")) {
                    $Readstruct = undef;
                    pr1nt('ERROR', "preprocessed input '$fcmd $input': $!");
                }
            }

            if (defined $Readstruct) {
                $ReadFh = $Readstruct->{fh};
                $LnArray = undef;


As one can see, F<Data::Dumper> writes the data right in Perl syntax, which means the dumped text
can simply be F<eval>'d, and the data can be restored very fast.  Rlists are not quite Perl syntax:
a dedicated parser is required.  In return, Rlist text is portable and can be read from other
programming languages such as L</C++>.

With F<$Data::Dumper::Useqq> enabled, F<Data::Dumper> was observed to render output significantly
slower than F<L</compile>>.  This is actually surprising, since F<Data::Rlist> tests each scalar for
whether it is numeric, and properly quotes and escapes strings.  By default F<Data::Dumper> quotes
many scalars, including numbers such as C<3.14>, and it does not escape control characters such as
newlines.  This can lead to odd output.  For example,

    use Data::Dumper;
    print Dumper "foo\n";

yields

    $VAR1 = 'foo
    ';

while

    use Data::Rlist;
    PrintData "foo\n";

yields

    { "foo\n"; }

Finally, F<Data::Rlist> generates smaller files.  With the default F<$Data::Dumper::Indent> of 2,
F<Data::Dumper>'s output is 4-5 times the size of F<Data::Rlist>'s.  This is because
F<Data::Dumper> indents with blanks instead of horizontal tabs, which inflates file sizes
considerably.

=head2 Rlist vs. Perl Syntax

Rlists are not Perl syntax:

    RLIST    PERL
    -----    ----
     5;       { 5 => undef }
     "5";     { "5" => undef }
     5=1;     { 5 => 1 }
     {5=1;}   { 5 => 1 }
     (5)      [ 5 ]
     {}       { }
     ;        { }
     ()       [ ]

=head2 Debugging Data

To reduce recursive data structures to true hierarchies, set F<$Data::Rlist::MaxDepth> to an
integer above 0.  It then defines the maximum depth to which F<L</compile>> descends.  The
compilation of Perl data into Rlist text still continues, but a message like the following is
printed on F<STDERR>:

    ERROR: compile2() broken in deep ARRAY(0x101aaeec) (depth = 101, max-depth = 100)

This message is also repeated as a comment when the compiled Rlist is written to a file.
Furthermore, F<$Data::Rlist::Broken> is incremented by one.  While the compilation continues, any
attempt to descend deeper than F<$Data::Rlist::MaxDepth> allows is blocked.

See F<L</broken>>.
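
The depth limit can be illustrated with a small recursive walker that stops descending once the
depth exceeds the limit and counts how often that happened, much like F<$Data::Rlist::Broken>.
F<walk> is a hypothetical sketch, not the F<compile> implementation; treating a limit of 0 as "no
limit" is an assumption based on the text above.

```perl
use strict;
use warnings;

# walk() is a hypothetical sketch, not Data::Rlist's compile(). It counts the
# nodes of a nested structure but refuses to descend deeper than $max_depth.
# Assumption: a $max_depth of 0 or less disables the check.
sub walk {
    my ($data, $depth, $max_depth, $broken) = @_;
    my $type = ref $data;
    return 1 unless $type eq 'ARRAY' || $type eq 'HASH';    # plain scalar: a leaf
    if ($max_depth > 0 && $depth > $max_depth) {            # too deep: block descent
        $$broken++;
        warn "walk() broken in deep $data (depth = $depth, max-depth = $max_depth)\n";
        return 0;
    }
    my @children = $type eq 'ARRAY' ? @$data : values %$data;
    my $count = 1;                                          # this array or hash
    $count += walk($_, $depth + 1, $max_depth, $broken) for @children;
    return $count;
}

my $cyclic = [1];
push @$cyclic, $cyclic;            # the array now contains itself
my $broken = 0;
my $nodes  = walk($cyclic, 0, 100, \$broken);
print "nodes = $nodes, broken = $broken\n";
```

Without the limit the cyclic array would recurse forever; with it, the walk stops at depth 101
and records one break.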

=head2 Speeding up Compilation (Explicit Quoting)

Much work has been spent to optimize F<Data::Rlist> for speed.  Still, it is implemented in pure
Perl (no XS).  A rough estimate for Perl 5.8 is "each MB takes one second per GHz".  For example,
when the resulting Rlist file has a size of 13 MB, compiling it from a Perl script on a 3-GHz PC
requires about 5-7 seconds.  Compiling the same data under Solaris, on a sparcv9 processor
operating at 750 MHz, takes about 18-22 seconds.

The compilation can be sped up by calling F<L</quote7>> explicitly on scalars, that is, before
calling F<L</write>> or F<L</write_string>>.  Big data sets may compile faster when F<L</quote7>>
is called in advance on scalars that certainly do not qualify as symbolic names:

    use Data::Rlist qw/:strings/;

    $data{quote7($key)} = $value;
        .
        .
    Data::Rlist::write("data.rlist", \%data);

instead of

    $data{$key} = $value;
        .
        .
    Data::Rlist::write("data.rlist", \%data);

It depends on the case whether  the first variant is faster: F<L</compile>> and F<L</compile_fast>>
both have to call  F<L</is_random_text>> on each scalar.  When the scalar  is already quoted, i.e.,
its first character is C<">, this test ought to run faster.

Internally F<L</is_random_text>> applies the precompiled regex F<$Data::Rlist::REValue>.  Note that
the  expression S<F<($s!~$Data::Rlist::REValue)>>  can  be up  to  20% faster  than the  equivalent
F<is_random_text($s)>.
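
To check such a difference on a given machine, the core Benchmark module can time a bare regex
match against the same match wrapped in a sub.  The regex and sub below are hypothetical stand-ins,
not F<$Data::Rlist::REValue> or F<is_random_text>, so the timings only illustrate the cost of the
extra sub call and will vary between machines.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-ins: $re plays the role of $Data::Rlist::REValue
# (here simply an identifier-like token) and looks_random() the role of
# is_random_text().
my $re = qr/^[A-Za-z_][A-Za-z0-9_]*\z/;
sub looks_random { return $_[0] !~ $re }

my @samples = ('foo', 'bar_1', 'hello world', '"quoted"', '3.14');

sub count_inline  { my $n = 0; for my $s (@samples) { $n++ if $s !~ $re } return $n }
sub count_wrapped { my $n = 0; for my $s (@samples) { $n++ if looks_random($s) } return $n }

# Both variants must agree before timing them.
die "mismatch\n" unless count_inline() == count_wrapped();

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, { inline => \&count_inline, wrapped => \&count_wrapped });
```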

=head2 Quoting strings that look like numbers

Normally you don't have to care about strings, since un/quoting happens as required when
reading/compiling Rlist or CSV text.  A common problem, however, occurs when a string has the same
lexical form as a number.

Perl defines the string as the basic building block for all program data, then lets the program
decide F<what strings mean>.  Analogously, in a printed book the reader has to decipher the glyphs
and decide what meaning they carry.  Printed text relies on well-defined glyphs, typographic
conventions and, finally, the competence of the reader to recognize numbers.  But computers need
to know the exact number type and format.  Integer?  Float?  Hexadecimal?  Scientific?  Klingon?
The Perl Cookbook recommends the use of a regular expression to distinguish numeric from string
scalars (recipe 2.1).
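
A regex of that kind, here the C-float pattern given in perlfaq4, can be compared with
F<looks_like_number> from the core module Scalar::Util.  F<protect_key> is a hypothetical helper,
not part of F<Data::Rlist>; it applies the C<"_"> prefix that this section suggests for hash keys.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# C-style float pattern as given in perlfaq4 ("How do I determine whether a
# scalar is a number/whole/integer/float?").
my $c_float = qr/^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i;

# protect_key() is a hypothetical helper, not part of Data::Rlist: it prefixes
# "_" to strings that would otherwise be read back as numbers.
sub protect_key {
    my ($s) = @_;
    return $s =~ $c_float ? "_$s" : $s;
}

for my $s ('-3.14', '007324', '1e5', 'foo', '_-9') {
    printf "%-8s looks_like_number=%d  c_float=%d  key=%s\n",
        $s, (looks_like_number($s) ? 1 : 0), ($s =~ $c_float ? 1 : 0), protect_key($s);
}
```

Note that F<looks_like_number> is more permissive than the pattern; it also accepts, for example,
leading whitespace and strings such as C<"Inf">.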

In Rlist, string scalars that look like numbers need to be quoted explicitly.  Otherwise, for
example, the string scalar C<"-3.14"> appears as F<-3.14> in the output, and C<"007324"> is
compiled into 7324.  Such text is not preserved: it is read back as a number.  Of course, in most
cases this is just what you want.  For hash keys, however, it might be a problem.  One solution is
to prefix the string with C<"_">:

    my $s = '-9'; $s = "_$s";

Such strings do not qualify as a number anymore.  In the C++ implementation it will then become
