Data-Rlist
lib/Data/Rlist.pm view on Meta::CPAN
=item F<require(SELF[, NAME])>
=item F<has(SELF[, NAME])>
Get the attribute NAME from object SELF. F<get> returns DEFAULT when NAME does not exist, or
F<undef> when no DEFAULT was given. The F<require> method has no default value, hence it dies
unless NAME exists and is defined. F<has> returns true when NAME exists, false otherwise. For NAME
the leading hyphen is optional. For example,
$self->get('foo'); # returns $self->{-foo} or undef
$self->get(-foo=>); # ditto
$self->get('foo', 42); # returns $self->{-foo} or 42
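The lookup rules above can be tried in isolation. The following is a minimal sketch, not the module's own code: the package C<My::Attrs>, its constructor and the helper C<_key> are hypothetical stand-ins, and only the hyphen handling mirrors the methods described here.

```perl
use strict;
use warnings;

package My::Attrs;    # hypothetical stand-in class, not Data::Rlist itself

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Prepend the hyphen when the caller left it out.
sub _key {
    my ($name) = @_;
    return $name =~ /^-/ ? $name : "-$name";
}

# Return the attribute, or the default (possibly undef) when it is missing.
sub get {
    my ($self, $name, $default) = @_;
    my $key = _key($name);
    return exists($self->{$key}) ? $self->{$key} : $default;
}

# Return 1 when the attribute exists, 0 otherwise.
sub has {
    my ($self, $name) = @_;
    return exists($self->{ _key($name) }) ? 1 : 0;
}

package main;

my $obj = My::Attrs->new(-foo => 'bar');
my @results = (
    $obj->get('foo'),        # hyphen added implicitly
    $obj->get(-foo =>),      # same attribute, hyphen given
    $obj->get('baz', 42),    # missing attribute, default returned
    $obj->has('baz'),        # attribute does not exist
);
print join(', ', @results), "\n";    # prints "bar, bar, 42, 0"
```

Normalizing the name in one helper keeps F<get> and F<has> consistent with each other, which is the property the documentation relies on.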
=back
=cut
sub new {
my($prototype, $k) = shift;
carp <<___ if @_ & 1;
$prototype->Data::Rlist::new(${\(join(', ', @_))})
odd number of arguments supplied, expecting key/value pairs
___
my %args = @_;
bless { map { $k = $_;
s/^_+//o; # remove leading underscores
s/^([^\-])/-$1/o; # prepend missing '-'
$_ => $args{$k}
} keys %args }, ref($prototype) || $prototype;
}
sub set {
my($self) = shift;
my %attr = @_;
while(my($k, $v) = each %attr) {
$self->{$k} = $v
} $self
}
sub require($$) { # get attribute or confess
my($self, $attr) = @_;
my $v = $self->get($attr);
confess "$self->require(): missing '$attr' attribute:\n\t\t".join("\n\t\t", map { "$_ = $self->{$_}" } keys %$self) unless defined $v;
return $v;
}
sub get($$;$) { # get attribute or return default value/undef
my($self, $attr, $default) = @_;
$attr = '-'.$attr unless $attr =~ /^-/;
return $self->{$attr} if exists $self->{$attr};
return $default;
}
sub has($$) {
my($self, $attr) = @_;
$attr = '-'.$attr unless $attr =~ /^-/;
exists $self->{$attr};
}
sub dock($\&) {
carp "package Data::Rlist locked" if $Locked++; # TODO: use critical sections and atomic increment
my ($self, $block) = @_;
local $MaxDepth = $self->get(-MaxDepth=>) if $self->has(-MaxDepth=>);
local $SafeCppMode = $self->get(-SafeCppMode=>) if $self->has(-SafeCppMode=>);
local $EchoStderr = $self->get(-EchoStderr=>) if $self->has(-EchoStderr=>);
local $RoundScientific = $self->get(-RoundScientific=>) if $self->has(-RoundScientific=>);
local $DefaultCsvDelimiter = $self->get(-DefaultCsvDelimiter=>) if $self->has(-DefaultCsvDelimiter=>);
local $DefaultConfDelimiter = $self->get(-DefaultConfDelimiter=>) if $self->has(-DefaultConfDelimiter=>);
local $DefaultConfSeparator = $self->get(-DefaultConfSeparator=>) if $self->has(-DefaultConfSeparator=>);
local $DefaultNanoscriptToken = $self->get(-DefaultNanoscriptToken=>) if $self->has(-DefaultNanoscriptToken=>);
local $DEBUG = $self->get(-DEBUG=>) if $self->has(-DEBUG=>);
local $/ = $self->get(-InputRecordSeparator=>) if $self->has(-InputRecordSeparator=>);
local $R;
unless (defined wantarray) { # void context
$block->(); --$Locked;
} elsif (wantarray) {
my @r = $block->(); --$Locked; return @r;
} else {
my $r = $block->(); --$Locked; return $r;
}
}
=head2 Public Functions
=over
=item F<read(INPUT[, FILTER, FILTER-ARGS])>
Parse data from INPUT, which specifies some Rlist-text. See also F<L</errors>>, F<L</write>>.
B<PARAMETERS>
INPUT shall be either
- some Rlist object created by F<L</new>>,
- a string reference, in which case F<read> and F<L</read_string>> parse Rlist text from it,
- a string scalar, in which case F<read> assumes a file to parse.
See F<L</open_input>> for the FILTER and FILTER-ARGS parameters, which are used to preprocess an
input file. When an input file cannot be F<open>'d and F<flock>'d this function dies. When INPUT
is an object, arguments passed for FILTER and FILTER-ARGS override the F<-filter> and
F<-filter_args> attributes.
B<RESULT>
The parsed data as array- or hash-reference, or F<undef> if there was no data. The latter may also
be the case when the file consists only of comments/whitespace.
B<NOTES>
This function may die. Dying is Perl's mechanism to raise exceptions, which can be caught with
F<eval>. For example,
my $host = eval { use Sys::Hostname; hostname; } || 'some unknown machine';
This code fragment traps the F<die> exception, so that F<eval> returns F<undef> or the result of
calling F<hostname>. The following example uses F<eval> to trap exceptions thrown by F<read>:
$object = new Data::Rlist(-input => $thingfile);
$thing = eval { $object->read };
unless (defined $thing) {
if ($object->errors) {
print STDERR "$thingfile has syntax errors"
} else {
print STDERR "$thingfile not found, is locked or empty"
}
} else {
# Can use $thing
.
.
}
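The same pattern can be exercised without any input files. The sketch below is self-contained and makes one assumption explicit: C<load_thing> is a hypothetical reader (not part of F<Data::Rlist>) that dies for a missing name and returns F<undef> for empty input. Unlike the example above, it inspects C<$@> rather than an F<errors> method to tell the two failure cases apart.

```perl
use strict;
use warnings;

# Hypothetical reader: dies on a missing entry, returns undef for empty input.
sub load_thing {
    my ($name, %files) = @_;
    die "cannot open '$name'\n" unless exists $files{$name};
    return length($files{$name}) ? [ split /,/, $files{$name} ] : undef;
}

my %files = (full => 'a,b,c', empty => '');
my @report;
for my $name (qw(full empty missing)) {
    # eval returns undef when the block dies; $@ then holds the message.
    my $thing = eval { load_thing($name, %files) };
    if (defined $thing) {
        push @report, "$name: " . scalar(@$thing) . " values";
    } elsif ($@) {
        chomp(my $err = $@);
        push @report, "$name: died ($err)";
    } else {
        push @report, "$name: no data";
    }
}
print "$_\n" for @report;
```

Checking C<$@> only after F<eval> returned F<undef> separates "the call died" from "the call succeeded but found nothing".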
=item F<read_csv(INPUT[, OPTIONS, FILTER, FILTER-ARGS])>
=item F<read_conf(INPUT[, OPTIONS, FILTER, FILTER-ARGS])>
Parse data from INPUT, which specifies some comma-separated-values (CSV) text. Both functions
- read data from strings or files,
- use an optional delimiter,
- ignore delimiters in quoted strings,
- ignore empty lines,
- ignore lines beginning with F<#>.
F<read_conf> is a variant of F<read_csv> dedicated to configuration files. Such files consist
of lines of the form
key = value
B<PARAMETERS>
For INPUT see F<L</read>>. For FILTER, FILTER-ARGS see F<L</open_input>>.
OPTIONS can be used to override the L<C<"delimiter">|/Compile Options> regex. For example, a
delimiter of C<'\s+'> splits the line at horizontal whitespace into multiple values (quoted
strings are not split). For F<read_csv> the delimiter defaults to C<'\s*,\s*'>, and for
F<read_conf> to C<'\s*=\s*'>. See also F<L</write_csv>> and F<L</write_conf>>.
B<RESULT>
Both functions return a list of lists. Each embedded array defines the fields in a line.
B<EXAMPLES>
Un/quoting of values happens implicitly. Given a file F<db.conf>
# Comment
SERVER = hostname
DATABASE = database_name
LOGIN = "user,password"
the call F<$opts=ReadConf(C<"db.conf">)> assigns
[ [ 'SERVER', 'hostname' ],
[ 'DATABASE', 'database_name' ],
[ 'LOGIN', 'user,password' ]
]
The F<L</WriteConf>> function can be used to create or update the configuration:
push @$opts, [ 'MAGIC VALUE' => 3.14_15 ];
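How a delimiter regex and quoted strings interact can be sketched in plain Perl. This is an illustration under stated assumptions, not the parser used by F<read_conf>: the functions C<split_line> and C<parse_conf> are hypothetical, the delimiter is assumed never to match the empty string, and only double-quoted strings are handled.

```perl
use strict;
use warnings;

# Split one line at $delim, but never inside a double-quoted string.
# Assumes $delim cannot match the empty string.
sub split_line {
    my ($line, $delim) = @_;
    my @fields;
    my $field = '';
    while (length $line) {
        if ($line =~ s/^"((?:[^"\\]|\\.)*)"//) {    # quoted part: keep content, drop quotes
            $field .= $1;
        } elsif ($line =~ s/^(?:$delim)//) {        # delimiter: close the current field
            push @fields, $field;
            $field = '';
        } else {                                    # any other character
            $field .= substr($line, 0, 1, '');
        }
    }
    return [ @fields, $field ];
}

# Turn configuration text into a list of lists, one inner array per line.
sub parse_conf {
    my ($text, $delim) = @_;
    my @rows;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#.*)?$/;    # skip comments and empty lines
        $line =~ s/^\s+|\s+$//g;             # trim surrounding whitespace
        push @rows, split_line($line, $delim);
    }
    return \@rows;
}

my $rows = parse_conf(<<'CONF', '\s*=\s*');
# Comment
SERVER = hostname
DATABASE = database_name
LOGIN = "user,password"
CONF
print join(' | ', @$_), "\n" for @$rows;
```

Consuming quoted strings before testing for the delimiter is what keeps a delimiter inside quotes from splitting the field.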
=item F<open_input(INPUT[, FILTER, FILTER-ARGS])>
=item F<close_input>
Open/close Rlist text file or string INPUT for parsing. Used internally by F<L</read>> and
F<L</read_csv>>.
B<PREPROCESSING>
The function can preprocess the INPUT file using FILTER. Use the special value 1 to select the
default C preprocessor (F<gcc -E -Wp,-C>). FILTER-ARGS is an optional string of additional
command-line arguments to be appended to FILTER. For example,
my $foo = Data::Rlist::read("foo", 1, "-DEXTRA")
does not parse F<foo> itself, but the output of the command
gcc -E -Wp,-C -DEXTRA foo
Hence C preprocessor statements are now allowed within F<foo>. For example,
{
#ifdef EXTRA
#include "extra.rlist"
#endif
123 = (1, 2, 3);
foobar = {
.
.
B<SAFE CPP MODE>
This mode uses F<sed> and a temporary file. It is enabled by setting F<$Data::Rlist::SafeCppMode>
to 1 (the default is 0). It protects single-line F<#>-comments when FILTER begins with either
F<gcc>, F<g++> or F<cpp>. F<L</open_input>> then additionally runs F<sed> to convert all input
lines beginning with whitespace plus the F<#> character. Only the following F<cpp>-commands are
excluded, and only when they appear in column 1:
- F<#include> and F<#pragma>
- F<#define> and F<#undef>
- F<#if>, F<#ifdef>, F<#else> and F<#endif>.
For all other lines F<sed> converts F<#> into F<##>. This prevents the C preprocessor from
evaluating them. Because Perl's builtin F<open> does not support true pipes, the output of F<sed>
is first written to a temporary file, which is then preprocessed. The temporary file is created in
the same directory as the input file and removed again by F<close_input>. When you only use F<//>
and F</* */> comments, however, this read mode is not required.
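The escaping step can be reproduced in pure Perl for experimentation. The sketch below applies the same three substitutions as the F<sed> expressions in F<open_input>, which escape with a backslash; the function name C<protect_hash_comments> is hypothetical, and the reversal of the escaping during lexing is not shown.

```perl
use strict;
use warnings;

# Apply the three substitutions to one line, in the same order as the sed script.
sub protect_hash_comments {
    my ($line) = @_;
    # (1) escape a '#' that follows leading blanks or tabs
    $line =~ s/^([ \t]+)#/$1\\#/;
    # (2) shield known directives at column 1 with a leading blank
    $line =~ s/^#(include|pragma|if|ifdef|else|endif|define|undef)/ #$1/;
    # (3) escape every other '#' at column 1
    $line =~ s/^#/\\#/;
    return $line;
}

my @in = (
    '#include "extra.rlist"',
    '# plain comment',
    '    # indented comment',
    'foo = 1;',
);
my @out = map { protect_hash_comments($_) } @in;
print "$_\n" for @out;
```

The order matters: step (2) moves the known directives away from column 1, so step (3) only touches the remaining F<#> lines.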
=cut
sub open_input($;$$)
{
my($input, $fcmd, $fcmdargs) = @_;
my($rls, $filename);
my $rtp = reftype $input;
carp "\n${\((caller(0))[3])}: filename or scalar-ref required as INPUT" if defined $rtp && $rtp ne 'SCALAR';
carp "\n${\((caller(0))[3])}: package locked" if $Readstruct;
$Readstruct = $ReadFh = undef;
local $| = 1 if $DEBUG;
if (defined $input) {
$Readstruct = { };
unless (ref $input) {
$Readstruct->{filename} = $input;
unless ($fcmd) { # the file is read unfiltered
unless (open($Readstruct->{fh}, "<$input") && flock($Readstruct->{fh}, 1)) {
$Readstruct = undef;
pr1nt('ERROR', "input file '$input'", $!);
}
} else { # pipe it through $fcmd
$fcmd = "gcc -E -Wp,-C -x c++" if $fcmd == 1;
$fcmd = "$fcmd $fcmdargs" if $fcmdargs;
if ($SafeCppMode) {
if ($fcmd =~ /^(gcc|g\+\+|cpp)/i) {
# Filter input with sed:
#
# (1) Because known #-commands must start at column 1 we first escape all
# indented '#'s into '##'s:
# "(^ +)#" -> '$1\#'
# (2) Next we prefix the known commands with a blank, e.g.
# "#if 0" -> " #if 0"
# (3) Finally we escape all unknown #-commands at column 1:
# "^#" -> "\#"
#
# lexln will then reverse the escaping. Since the builtin open does not
# support true pipes, a temporary file receives the output of sed, which is
# then preprocessed. The temporary file will be removed in close_input.
my($sedfh, $tmpfh);
open($sedfh,
"sed '".
join('; ', ("s/^\\([ \t][ \t]*\\)#/\\1\\\\#/", # many seds don't know \t -> insert literally
"s/^#\\(include\\|pragma\\|if\\|ifdef\\|else\\|endif\\|define\\|undef\\)/ #\\1/",
"s/^#/\\\\#/")).";' <$input 2>nul |") ||
die "\nERROR: input file '$fcmd': $!";
my($tmpinput, $i) = (undef, 0);
do { $tmpinput = $input.'.tmp'.$i++ } while -e $tmpinput;
$Readstruct->{tmpfile} = $input = $tmpinput;
open ($tmpfh, ">$input") || die "\nERROR: temporary file '$input': $!";
print $tmpfh readline($sedfh);
close $tmpfh;
close $sedfh;
}
}
# Open the file $input (or the temporary sed'd file) for preprocessing.
unless (open($Readstruct->{fh}, "$fcmd $input 2>nul |")) {
$Readstruct = undef;
pr1nt('ERROR', "preprocessed input '$fcmd $input': $!");
}
}
if (defined $Readstruct) {
$ReadFh = $Readstruct->{fh};
$LnArray = undef;
As one can see F<Data::Dumper> writes the data right in Perl syntax, which means the dumped text
can be simply F<eval>'d, and the data can be restored very fast. Rlists are not quite Perl-syntax:
a dedicated parser is required. In return, Rlist text is portable and can be read from other
programming languages such as L</C++>.
With F<$Data::Dumper::Useqq> enabled it was observed that F<Data::Dumper> renders output
significantly slower than F<L</compile>>. This is actually surprising, since F<Data::Rlist> tests
for each scalar whether it is numeric, and truly quotes/escapes strings. F<Data::Dumper> quotes
all scalars (including numbers), and it does not escape strings. This may also result in some odd
behaviors. For example,
use Data::Dumper;
print Dumper "foo\n";
yields
$VAR1 = 'foo
';
while
use Data::Rlist;
PrintData "foo\n"
yields
{ "foo\n"; }
Finally, F<Data::Rlist> generates smaller files. With the default F<$Data::Dumper::Indent> of 2
F<Data::Dumper>'s output is 4-5 times that of F<Data::Rlist>'s. This is because F<Data::Dumper>
recklessly uses blanks, instead of horizontal tabulators, which blows up file sizes without
measure.
=head2 Rlist vs. Perl Syntax
Rlists are not Perl syntax:
    RLIST    PERL
    -----    ----
    5;       { 5 => undef }
    "5";     { "5" => undef }
    5=1;     { 5 => 1 }
    {5=1;}   { 5 => 1 }
    (5)      [ 5 ]
    {}       { }
    ;        { }
    ()       [ ]
=head2 Debugging Data
To reduce recursive data structures (into true hierarchies) set F<$Data::Rlist::MaxDepth> to an
integer above 0. It then defines the depth below which F<L</compile>> shall not descend.
The compilation of Perl data (into Rlist text) then continues, but on F<STDERR> a message like the
following is printed:
ERROR: compile2() broken in deep ARRAY(0x101aaeec) (depth = 101, max-depth = 100)
This message will also be repeated as a comment when the compiled Rlist is written to a file.
Furthermore F<$Data::Rlist::Broken> is incremented by one. While the compilation continues,
any attempt to descend deeper than F<$Data::Rlist::MaxDepth> allows is blocked.
See F<L</broken>>.
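The effect of a depth limit on a self-referential structure can be sketched as follows. This is not the module's compiler: C<render>, C<$max_depth> and C<$broken> are hypothetical stand-ins for F<L</compile>>, F<$Data::Rlist::MaxDepth> and F<$Data::Rlist::Broken>, and the C<?> placeholder is an assumption of this sketch.

```perl
use strict;
use warnings;

my $max_depth = 3;    # hypothetical stand-in for $Data::Rlist::MaxDepth
my $broken    = 0;    # counts how often the limit cut off a branch

# Render nested array references, refusing to descend below $max_depth.
sub render {
    my ($data, $depth) = @_;
    return "$data" unless ref($data) eq 'ARRAY';
    if ($depth > $max_depth) {
        $broken++;
        return '?';    # placeholder for the pruned branch
    }
    return '(' . join(', ', map { render($_, $depth + 1) } @$data) . ')';
}

my @loop = (1, 2);
push @loop, \@loop;    # the array now contains itself
my $text = render(\@loop, 1);
print "$text (broken = $broken)\n";    # prints "(1, 2, (1, 2, (1, 2, ?))) (broken = 1)"
```

Without the limit the recursion would never terminate; with it, the output stays a finite hierarchy and the counter records that something was cut.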
=head2 Speeding up Compilation (Explicit Quoting)
Much work has gone into optimizing F<Data::Rlist> for speed. Still, it is implemented in pure
Perl (no XS). A rough estimate for Perl 5.8 is "each MB takes one second per GHz". For example,
when the resulting Rlist file has a size of 13 MB, compiling it from a Perl script on a 3-GHz-PC
requires about 5-7 seconds. Compiling the same data under Solaris, on a sparcv9 processor
operating at 750 MHz, takes about 18-22 seconds.
Compilation can be sped up by calling F<L</quote7>> explicitly on scalars before calling
F<L</write>> or F<L</write_string>>. Big data sets may compile faster when F<L</quote7>> is called
in advance for scalars that certainly do not qualify as symbolic names:
use Data::Rlist qw/:strings/;
$data{quote7($key)} = $value;
.
.
Data::Rlist::write("data.rlist", \%data);
instead of
$data{$key} = $value;
.
.
Data::Rlist::write("data.rlist", \%data);
It depends on the case whether the first variant is faster: F<L</compile>> and F<L</compile_fast>>
both have to call F<L</is_random_text>> on each scalar. When the scalar is already quoted, i.e.,
its first character is C<">, this test ought to run faster.
Internally F<L</is_random_text>> applies the precompiled regex F<$Data::Rlist::REValue>. Note that
the expression S<F<($s!~$Data::Rlist::REValue)>> can be up to 20% faster than the equivalent
F<is_random_text($s)>.
=head2 Quoting strings that look like numbers
Normally you don't have to care about strings, since un/quoting happens as required when
reading/compiling Rlist or CSV text. A common problem, however, occurs when some string is
lexically indistinguishable from a number.
Perl defines the string as the basic building block for all program data, then lets the program
decide F<what strings mean>. Analogously, in a printed book the reader has to decipher the glyphs
and decide what they mean. Printed text relies on well-defined glyphs and typographic
conventions, and finally on the competence of the reader, to recognize numbers. But computers need
to know the exact number type and format. Integer? Float? Hexadecimal? Scientific? Klingon? The
Perl Cookbook recommends the use of a regular expression to distinguish number from string scalars
(recipe 2.1).
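As an alternative to a hand-written regular expression, the core module F<Scalar::Util> offers C<looks_like_number>. The sketch below uses it to classify a few sample strings; it is only an illustration, and the test F<Data::Rlist> applies internally may classify some inputs differently.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Sample scalars: some look like numbers to Perl, some do not.
my @samples = ('-3.14', '007324', '1e5', '0x1F', 'foo', '_-9');
my %numeric = map { ($_ => (looks_like_number($_) ? 1 : 0)) } @samples;

printf "%-8s %s\n", $_, ($numeric{$_} ? 'number' : 'string') for @samples;
```

Here C<-3.14>, C<007324> and C<1e5> are reported as numbers, while C<0x1F>, C<foo> and C<_-9> are reported as strings.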
In Rlist, string scalars that look like numbers need to be quoted explicitly. Otherwise, for
example, the string scalar C<"-3.14"> appears as F<-3.14> in the output, C<"007324"> is compiled
into 7324 etc. The original text is then lost: it is read back as a number. Of course, in most
cases this is just what you want. For hash keys, however, it might be a problem. One solution is
to prefix the string with C<"_">:
my $s = '-9'; $s = "_$s";
Such strings do not qualify as a number anymore. In the C++ implementation it will then become