Data-Rlist
Rlist.pm.html view on Meta::CPAN
<dl>
<dt><strong><a name="item_read"><em>read(INPUT[, FILTER, FILTER-ARGS])</em></a></strong>
<dd>
<p>Parse data from INPUT, which specifies some Rlist-text. See also <em><a href="#item_errors">errors</a></em>, <em><a href="#item_write">write</a></em>.</p>
</dd>
<dd>
<p><strong>PARAMETERS</strong></p>
</dd>
<dd>
<p>INPUT shall be either</p>
</dd>
<dd>
<p>- some Rlist object created by <em><a href="#item_new">new</a></em>,</p>
</dd>
<dd>
<p>- a string reference, in which case <em>read</em> and <em><a href="#item_read_string">read_string</a></em> parse Rlist text from it,</p>
</dd>
<dd>
<p>- a string scalar, in which case <em>read</em> assumes a file to parse.</p>
</dd>
<dd>
<p>See <em><a href="#item_open_input">open_input</a></em> for the FILTER and FILTER-ARGS parameters, which are used to preprocess an
input file. This function dies when an input file cannot be <em>open</em>'d and <em>flock</em>'d. When INPUT
is an object, arguments given for FILTER and FILTER-ARGS override the <em>-filter</em> and
<em>-filter_args</em> attributes.</p>
</dd>
<dd>
<p><strong>RESULT</strong></p>
</dd>
<dd>
<p>The parsed data as array- or hash-reference, or <em>undef</em> if there was no data. The latter may also
be the case when the file consists only of comments/whitespace.</p>
</dd>
<dd>
<p><strong>NOTES</strong></p>
</dd>
<dd>
<p>This function may die. Dying is Perl's mechanism to raise exceptions, which can be
caught with <em>eval</em>. For example,</p>
</dd>
<dd>
<pre>
my $host = eval { use Sys::Hostname; hostname; } || 'some unknown machine';</pre>
</dd>
<dd>
<p>This code fragment traps the <em>die</em> exception, so that <em>eval</em> returns <em>undef</em> or the result of
calling <em>hostname</em>. The following example uses <em>eval</em> to trap exceptions thrown by <em>read</em>:</p>
</dd>
<dd>
<pre>
$object = new Data::Rlist(-input => $thingfile);
$thing = eval { $object->read };</pre>
</dd>
<dd>
<pre>
unless (defined $thing) {
    if ($object->errors) {
        print STDERR "$thingfile has syntax errors"
    } else {
        print STDERR "$thingfile not found, is locked or empty"
    }
} else {
    # Can use $thing
        .
        .
}</pre>
</dd>
</li>
<dt><strong><a name="item_read_csv"><em>read_csv(INPUT[, OPTIONS, FILTER, FILTER-ARGS])</em></a></strong>
<dt><strong><a name="item_read_conf"><em>read_conf(INPUT[, OPTIONS, FILTER, FILTER-ARGS])</em></a></strong>
<dd>
<p>Parse data from INPUT, which specifies some comma-separated-values (CSV) text. Both functions</p>
</dd>
<dd>
<p>- read data from strings or files,</p>
</dd>
<dd>
<p>- use an optional delimiter,</p>
</dd>
<dd>
<p>- ignore delimiters in quoted strings,</p>
</dd>
<dd>
<p>- ignore empty lines,</p>
</dd>
<dd>
<p>- ignore lines beginning with <em>#</em>.</p>
</dd>
<dd>
<p><em>read_conf</em> is a variant of <em>read_csv</em> dedicated to configuration files. Such files consist
of lines of the form</p>
</dd>
<dd>
<pre>
key = value</pre>
</dd>
<dd>
<p><strong>PARAMETERS</strong></p>
</dd>
<dd>
<p>For INPUT see <em><a href="#item_read">read</a></em>. For FILTER, FILTER-ARGS see <em><a href="#item_open_input">open_input</a></em>.</p>
</dd>
<dd>
<p>OPTIONS can be used to override the <a href="#compile_options"><code>"delimiter"</code></a> regex. For example, a
delimiter of <code>'\s+'</code> splits the line at horizontal whitespace into multiple values (while
respecting quoted strings). For <em>read_csv</em> the delimiter defaults to <code>'\s*,\s*'</code>, and for <em>read_conf</em>
to <code>'\s*=\s*'</code>. See also <em><a href="#item_write_csv">write_csv</a></em> and <em><a href="#item_write_conf">write_conf</a></em>.</p>
</dd>
<dd>
<p><strong>RESULT</strong></p>
</dd>
<dd>
<p>Both functions return a list of lists. Each embedded array defines the fields in a line.</p>
</dd>
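<dd>
<p>The shape of this result can be sketched without <em>Data::Rlist</em> itself. The fragment below is an illustration only, not the module's implementation: it uses the core module <em>Text::ParseWords</em> to split each line at the default <em>read_conf</em> delimiter while leaving delimiters inside quoted strings alone. The sample lines and the variable names are made up for the example.</p>
</dd>
<dd>
<pre>
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Made-up input: one comment, one empty line, two "key = value" lines.
my @lines = ('# comment', '', 'host = "db = primary"', 'port = 5432');

my @rows;                                    # the list of lists
for my $line (@lines) {
    next if $line =~ /^\s*#/ or $line !~ /\S/;   # skip comments and empty lines
    # Split at the delimiter regex; quoted strings stay in one piece.
    push @rows, [ parse_line('\s*=\s*', 0, $line) ];
}

print join(' | ', @$_), "\n" for @rows;</pre>
</dd>
<dd>
<p>Each element of <em>@rows</em> is an array reference holding the fields of one line. The quoted value <em>db = primary</em> stays a single field although it contains the delimiter.</p>
</dd>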
<dd>
<p><strong>EXAMPLES</strong></p>
</dd>
<dd>
default_only = Y;
num_threads = 5;
.
.
};
}</pre>
<p>As one can see, <em>Data::Dumper</em> writes the data directly in Perl syntax, which means the dumped text
can simply be <em>eval</em>'d, and the data can be restored very fast. Rlists are not quite Perl syntax:
a dedicated parser is required. In return, Rlist text is portable and can be read from other
programming languages such as <a href="#c__">C++</a>.</p>
<p>With <em>$Data::Dumper::Useqq</em> enabled it was observed that <em>Data::Dumper</em> renders output
significantly slower than <em><a href="#item_compile">compile</a></em>. This is actually surprising, since <em>Data::Rlist</em> tests
for each scalar whether it is numeric, and truly quotes/escapes strings. <em>Data::Dumper</em> quotes
all scalars (including numbers), and it does not escape strings. This may also result in some odd
behaviors. For example,</p>
<pre>
use Data::Dumper;
print Dumper "foo\n";</pre>
<p>yields</p>
<pre>
$VAR1 = 'foo
';</pre>
<p>while</p>
<pre>
use Data::Rlist;
PrintData "foo\n"</pre>
<p>yields</p>
<pre>
{ "foo\n"; }</pre>
<p>Finally, <em>Data::Rlist</em> generates smaller files. With the default <em>$Data::Dumper::Indent</em> of 2
<em>Data::Dumper</em>'s output is 4-5 times the size of <em>Data::Rlist</em>'s. This is because <em>Data::Dumper</em>
recklessly uses blanks instead of horizontal tabs, which inflates file sizes
considerably.</p>
<p>
</p>
<h2><a name="rlist_vs__perl_syntax">Rlist vs. Perl Syntax</a></h2>
<p>Rlists are not Perl syntax:</p>
<pre>
RLIST      PERL
-----      ----
5;         { 5 => undef }
"5";       { "5" => undef }
5=1;       { 5 => 1 }
{5=1;}     { 5 => 1 }
(5)        [ 5 ]
{}         { }
;          { }
()         [ ]</pre>
<p>
</p>
<h2><a name="debugging_data">Debugging Data</a></h2>
<p>To reduce recursive data structures (into true hierarchies) set <em>$Data::Rlist::MaxDepth</em> to an
integer above 0. It then defines the depth below which <em><a href="#item_compile">compile</a></em> shall not descend.
The compilation of Perl data (into Rlist text) then continues, but on <em>STDERR</em> a message like the
following is printed:</p>
<pre>
ERROR: compile2() broken in deep ARRAY(0x101aaeec) (depth = 101, max-depth = 100)</pre>
<p>This message will also be repeated as a comment when the compiled Rlist is written to a file.
Furthermore <em>$Data::Rlist::Broken</em> is incremented by one. The compilation continues, but
any attempt to descend deeper than <em>$Data::Rlist::MaxDepth</em> allows is
blocked.</p>
<p>See <em><a href="#item_broken">broken</a></em>.</p>
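<p>The effect of such a depth limit can be sketched in plain Perl. The fragment below is a hypothetical illustration, not the code of <em>compile</em>: the function <em>count_leaves</em>, its parameters and the counter <em>$broken</em> are invented names (the counter merely mimics the role of <em>$Data::Rlist::Broken</em>). It walks an array that contains itself and refuses to descend once the limit is exceeded, while the traversal continues elsewhere.</p>
<pre>
use strict;
use warnings;

my $broken = 0;                      # counts blocked descents

# Hypothetical depth-limited walker: counts scalar leaves, but never
# looks at anything deeper than $max_depth (0 means "no limit").
sub count_leaves {
    my ($node, $depth, $max_depth) = @_;
    if ($max_depth > 0 and $depth > $max_depth) {
        $broken++;                   # blocked here; the walk goes on elsewhere
        return 0;
    }
    return 1 unless ref $node;       # a plain scalar is a leaf
    my @children = ref($node) eq 'HASH' ? values %$node : @$node;
    my $leaves = 0;
    $leaves += count_leaves($_, $depth + 1, $max_depth) for @children;
    return $leaves;
}

my @cycle = (1, 2);
push @cycle, \@cycle;                # the array now contains itself

my $leaves = count_leaves(\@cycle, 1, 3);
print "leaves=$leaves broken=$broken\n";</pre>
<p>Without the limit the walk over <em>@cycle</em> would never terminate. With it, the self-reference is unrolled a fixed number of times and each blocked descent is counted.</p>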
<p>
</p>
<h2><a name="speeding_up_compilation__explicit_quoting_">Speeding up Compilation (Explicit Quoting)</a></h2>
<p>Much work has gone into optimizing <em>Data::Rlist</em> for speed. Still, it is implemented in pure
Perl (no XS). A rough estimate for Perl 5.8 is "each MB takes one second per GHz". For example,
when the resulting Rlist file has a size of 13 MB, compiling it from a Perl script on a 3-GHz PC
requires about 5-7 seconds. Compiling the same data under Solaris, on a sparcv9 processor
operating at 750 MHz, takes about 18-22 seconds.</p>
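<p>Read as a formula, the rule of thumb says that the time in seconds is roughly the file size in MB divided by the clock rate in GHz. This reading is an assumption, and <em>estimate_seconds</em> is a name invented for the sketch:</p>
<pre>
use strict;
use warnings;

# Assumed reading of the rule of thumb: seconds ~ megabytes / gigahertz.
sub estimate_seconds {
    my ($megabytes, $gigahertz) = @_;
    return $megabytes / $gigahertz;
}

printf "3 GHz PC:        %.1f s\n", estimate_seconds(13, 3);
printf "750 MHz sparcv9: %.1f s\n", estimate_seconds(13, 0.75);</pre>
<p>The figures it prints are somewhat below the measured times quoted above, as one would expect from a rough estimate.</p>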
<p>Compilation can be sped up by calling <em><a href="#item_quote7">quote7</a></em> explicitly on scalars, that is,
before calling <em><a href="#item_write">write</a></em> or <em><a href="#item_write_string">write_string</a></em>. Big data sets may compile faster when <em><a href="#item_quote7">quote7</a></em> is called in advance for
scalars that certainly do not qualify as symbolic names:</p>
<pre>
use Data::Rlist qw/:strings/;</pre>
<pre>
$data{quote7($key)} = $value;
.
.
Data::Rlist::write("data.rlist", \%data);</pre>
<p>instead of</p>
<pre>
$data{$key} = $value;
.
.
Data::Rlist::write("data.rlist", \%data);</pre>
<p>It depends on the case whether the first variant is faster: <em><a href="#item_compile">compile</a></em> and <em><a href="#item_compile_fast">compile_fast</a></em>
both have to call <em><a href="#item_is_random_text">is_random_text</a></em> on each scalar. When the scalar is already quoted, i.e.,
its first character is <code>"</code>, this test ought to run faster.</p>
<p>Internally <em><a href="#item_is_random_text">is_random_text</a></em> applies the precompiled regex <em>$Data::Rlist::REValue</em>. Note that
the expression <em>($s!~$Data::Rlist::REValue)</em> can be up to 20% faster than the equivalent
<em>is_random_text($s)</em>.</p>
<p>
</p>
<h2><a name="quoting_strings_that_look_like_numbers">Quoting strings that look like numbers</a></h2>
<p>Normally you don't have to care about strings, since un/quoting happens as required when
reading/compiling Rlist or CSV text. A common problem, however, occurs when some string
looks lexically like a number.</p>
<p>Perl defines the string as the basic building block for all program data, then lets the program
decide <em>what strings mean</em>. By analogy, in a printed book the reader has to decipher the glyphs
and decide what information they carry. Printed text relies on well-defined glyphs and typographic
conventions, and finally on the competence of the reader, to recognize numbers. But computers need to
know the exact number type and format. Integer? Float? Hexadecimal? Scientific? Klingon? The
Perl Cookbook recommends the use of a regular expression to distinguish number from string scalars
(recipe 2.1).</p>
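<p>A minimal sketch of such a test follows. The pattern is an assumption made for illustration: it covers plain decimal and scientific notation only, and it is not the <em>$Data::Rlist::REValue</em> regex used by the module. The helper name <em>looks_numeric</em> is likewise invented.</p>
<pre>
use strict;
use warnings;

# Hypothetical helper: returns 1 when $s reads like a decimal number, else 0.
sub looks_numeric {
    my ($s) = @_;
    return $s =~ /^[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?$/ ? 1 : 0;
}

for my $s ('-3.14', '007324', '1e6', '_-9', 'v1.0') {
    printf "%-8s %s\n", $s, looks_numeric($s) ? 'number' : 'string';
}</pre>
<p>With this pattern <em>-3.14</em>, <em>007324</em> and <em>1e6</em> are classified as numbers, while <em>_-9</em> and <em>v1.0</em> are classified as strings.</p>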
<p>In Rlist, string scalars that look like numbers need to be quoted explicitly. Otherwise, for
example, the string scalar <code>"-3.14"</code> appears as <em>-3.14</em> in the output, <code>"007324"</code> is compiled
into 7324 etc. The original text is then lost: it is read back as a number. Of course, in most cases this is just
what you want. For hash keys, however, it might be a problem. One solution is to prefix the string
with <code>"_"</code>:</p>
<pre>
my $s = '-9'; $s = "_$s";</pre>
<p>Such a string no longer qualifies as a number. In the C++ implementation it then becomes
a <em>std::string</em>, not a <em>double</em>. But the leading <code>"_"</code> has to be removed by the reading
program. Perhaps a better solution is to explicitly call <em><a href="#item_quote7">quote7</a></em>:</p>
<pre>
use Data::Rlist qw/:strings/;</pre>
<pre>
$k = -9;
$k = quote7($k); # returns qq'"-9"'</pre>
<pre>