Data-Rlist


Rlist.pm.html  view on Meta::CPAN

<dl>
<dt><strong><a name="item_read"><em>read(INPUT[, FILTER, FILTER-ARGS])</em></a></strong>

<dd>
<p>Parse data from INPUT, which specifies some Rlist-text.  See also <em><a href="#item_errors">errors</a></em>, <em><a href="#item_write">write</a></em>.</p>
</dd>
<dd>
<p><strong>PARAMETERS</strong></p>
</dd>
<dd>
<p>INPUT shall be either</p>
</dd>
<dd>
<p>- some Rlist object created by <em><a href="#item_new">new</a></em>,</p>
</dd>
<dd>
<p>- a string reference, in which case <em>read</em> and <em><a href="#item_read_string">read_string</a></em> parse Rlist text from it,</p>
</dd>
<dd>
<p>- a string scalar, in which case <em>read</em> assumes a file to parse.</p>
</dd>
<dd>
<p>See <em><a href="#item_open_input">open_input</a></em> for the FILTER and FILTER-ARGS parameters, which are used to preprocess an
input file.  This function dies when an input file cannot be <em>open</em>'d and <em>flock</em>'d.  When INPUT
is an object, arguments passed for FILTER and FILTER-ARGS override the <em>-filter</em> and
<em>-filter_args</em> attributes.</p>
</dd>
<dd>
<p><strong>RESULT</strong></p>
</dd>
<dd>
<p>The parsed data as array- or hash-reference, or <em>undef</em> if there was no data. The latter may also
be the case when the file consists only of comments/whitespace.</p>
</dd>
<dd>
<p><strong>NOTES</strong></p>
</dd>
<dd>
<p>This function may die.  Dying is Perl's mechanism to raise exceptions, which can be
caught with <em>eval</em>.  For example,</p>
</dd>
<dd>
<pre>
    my $host = eval { use Sys::Hostname; hostname; } || 'some unknown machine';</pre>
</dd>
<dd>
<p>This code fragment  traps the <em>die</em> exception, so  that <em>eval</em> returns <em>undef</em> or  the result of
calling <em>hostname</em>. The following example uses <em>eval</em> to trap exceptions thrown by <em>read</em>:</p>
</dd>
<dd>
<pre>
    $object = new Data::Rlist(-input =&gt; $thingfile);
    $thing = eval { $object-&gt;read };</pre>
</dd>
<dd>
<pre>
    unless (defined $thing) {
        if ($object-&gt;errors) {
            print STDERR &quot;$thingfile has syntax errors&quot;
        } else {
            print STDERR &quot;$thingfile not found, is locked or empty&quot;
        }
    } else {
        # Can use $thing
            .
            .
    }</pre>
</dd>
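<dd>
<p>An <em>eval</em> block yields <em>undef</em> both when the code inside dies and when it simply returns
<em>undef</em>, so it can help to tell the two cases apart.  The following is a minimal sketch in plain
Perl.  The helper <em>try_read</em> and the two anonymous subs are hypothetical stand-ins and not part of
<em>Data::Rlist</em>:</p>
</dd>
<dd>
<pre>
    use strict;
    use warnings;

    # Hypothetical helper: run a code reference and report whether it died.
    # Returns (result, undef) on success and (undef, message) after a die.
    sub try_read {
        my ($reader) = @_;
        my $data;
        my $ok = eval { $data = $reader-&gt;(); 1 };
        return ($data, undef) if $ok;
        my $error = $@ || 'unknown error';
        return (undef, $error);
    }

    my ($empty, $err1) = try_read(sub { return undef });             # no data, no exception
    my ($none,  $err2) = try_read(sub { die &quot;cannot open file\n&quot; });  # exception

    print &quot;first call: no data, no error\n&quot; unless defined $empty or defined $err1;
    print &quot;second call failed: $err2&quot; if defined $err2;</pre>
</dd>
<dd>
<p>The trailing <em>1</em> inside the <em>eval</em> block makes the block's own result true whenever no
exception occurred, regardless of what the called code returned.</p>
</dd>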
</li>
<dt><strong><a name="item_read_csv"><em>read_csv(INPUT[, OPTIONS, FILTER, FILTER-ARGS])</em></a></strong>

<dt><strong><a name="item_read_conf"><em>read_conf(INPUT[, OPTIONS, FILTER, FILTER-ARGS])</em></a></strong>

<dd>
<p>Parse data from INPUT, which specifies some comma-separated-values (CSV) text.  Both functions</p>
</dd>
<dd>
<p>- read data from strings or files,</p>
</dd>
<dd>
<p>- use an optional delimiter,</p>
</dd>
<dd>
<p>- ignore delimiters in quoted strings,</p>
</dd>
<dd>
<p>- ignore empty lines,</p>
</dd>
<dd>
<p>- ignore lines beginning with <em>#</em>.</p>
</dd>
<dd>
<p><em>read_conf</em> is a variant of <em>read_csv</em> dedicated to configuration files. Such files consist
of lines of the form</p>
</dd>
<dd>
<pre>
    key = value</pre>
</dd>
<dd>
<p><strong>PARAMETERS</strong></p>
</dd>
<dd>
<p>For INPUT see <em><a href="#item_read">read</a></em>.  For FILTER,  FILTER-ARGS see <em><a href="#item_open_input">open_input</a></em>.</p>
</dd>
<dd>
<p>OPTIONS can be used to override the <a href="#compile_options"><code>&quot;delimiter&quot;</code></a> regex.  For example, a
delimiter of <code>'\s+'</code> splits the line at horizontal whitespace into multiple values (while
respecting quoted strings).  For <em>read_csv</em> the delimiter defaults to <code>'\s*,\s*'</code>, and for <em>read_conf</em>
to <code>'\s*=\s*'</code>.  See also <em><a href="#item_write_csv">write_csv</a></em> and <em><a href="#item_write_conf">write_conf</a></em>.</p>
</dd>
<dd>
<p><strong>RESULT</strong></p>
</dd>
<dd>
<p>Both functions return a list of lists.  Each embedded array defines the fields in a line.</p>
</dd>
<dd>
<p><strong>EXAMPLES</strong></p>
</dd>
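<dd>
<p>To illustrate what splitting at a delimiter regex while respecting quoted strings means, here is
a minimal sketch in plain Perl.  It is not the implementation used by <em>read_csv</em> or <em>read_conf</em>.
The function <em>split_fields</em> is a hypothetical name, and the sketch assumes a delimiter regex that
never matches the empty string:</p>
</dd>
<dd>
<pre>
    use strict;
    use warnings;

    # Hypothetical splitter: break one line into fields at a delimiter regex,
    # leaving delimiters inside double-quoted strings untouched.
    sub split_fields {
        my ($line, $delimiter) = @_;
        $delimiter = '\s*,\s*' unless defined $delimiter;
        my @fields;
        my $field = '';
        pos($line) = 0;
        while (pos($line) &lt; length $line) {
            if ($line =~ /\G&quot;((?:[^&quot;\\]|\\.)*)&quot;/gc) {
                $field .= $1;               # quoted part, escapes kept verbatim
            } elsif ($line =~ /\G(?:$delimiter)/gc) {
                push @fields, $field;       # delimiter ends the current field
                $field = '';
            } elsif ($line =~ /\G(.)/gcs) {
                $field .= $1;               # any other character
            }
        }
        push @fields, $field;
        return @fields;
    }

    print join('|', split_fields('a , &quot;b, c&quot; , d')), &quot;\n&quot;;         # prints a|b, c|d
    print join('|', split_fields('key = value', '\s*=\s*')), &quot;\n&quot;;  # prints key|value</pre>
</dd>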
<dd>

Rlist.pm.html  view on Meta::CPAN

            default_only = Y;
            num_threads = 5;
                .
                .
        };
    }</pre>
<p>As one can see, <em>Data::Dumper</em> writes the data directly in Perl syntax, which means the dumped text
can simply be <em>eval</em>'d, and the data can be restored very fast. Rlists are not quite Perl syntax:
a dedicated parser is required.  In return, Rlist text is portable and can be read from other
programming languages such as <a href="#c__">C++</a>.</p>
<p>With <em>$Data::Dumper::Useqq</em> enabled it was observed that <em>Data::Dumper</em> renders output
significantly slower than <em><a href="#item_compile">compile</a></em>. This is actually surprising, since <em>Data::Rlist</em> tests
for each scalar whether it is numeric, and truly quotes/escapes strings.  <em>Data::Dumper</em> quotes
all scalars (including numbers), and it does not escape strings.  This may also result in some odd
behaviors.  For example,</p>
<pre>
    use Data::Dumper;
    print Dumper &quot;foo\n&quot;;</pre>
<p>yields</p>
<pre>
    $VAR1 = 'foo
    ';</pre>
<p>while</p>
<pre>
    use Data::Rlist;
    PrintData &quot;foo\n&quot;</pre>
<p>yields</p>
<pre>
    { &quot;foo\n&quot;; }</pre>
<p>Finally, <em>Data::Rlist</em>  generates smaller files.   With the default <em>$Data::Dumper::Indent</em>  of 2
<em>Data::Dumper</em>'s output  is 4-5  times that of  <em>Data::Rlist</em>'s. This is  because <em>Data::Dumper</em>
recklessly  uses blanks,  instead  of horizontal  tabulators,  which blows  up  file sizes  without
measure.</p>
<p>
</p>
<h2><a name="rlist_vs__perl_syntax">Rlist vs. Perl Syntax</a></h2>
<p>Rlists are not Perl syntax:</p>
<pre>
    RLIST    PERL
    -----    ----
     5;       { 5 =&gt; undef }
     &quot;5&quot;;     { &quot;5&quot; =&gt; undef }
     5=1;     { 5 =&gt; 1 }
     {5=1;}   { 5 =&gt; 1 }
     (5)      [ 5 ]
     {}       { }
     ;        { }
     ()       [ ]</pre>
<p>
</p>
<h2><a name="debugging_data">Debugging Data</a></h2>
<p>To reduce recursive data structures (into true hierarchies) set <em>$Data::Rlist::MaxDepth</em> to an
integer above 0.  It then defines the depth below which <em><a href="#item_compile">compile</a></em> shall not venture.
The compilation of Perl data (into Rlist text) then continues, but on <em>STDERR</em> a message like the
following is printed:</p>
<pre>
    ERROR: compile2() broken in deep ARRAY(0x101aaeec) (depth = 101, max-depth = 100)</pre>
<p>This message will also be repeated as a comment when the compiled Rlist is written to a file.
Furthermore <em>$Data::Rlist::Broken</em> is incremented by one. While the compilation continues,
any attempt to venture deeper than <em>$Data::Rlist::MaxDepth</em> allows will be
blocked.</p>
<p>See <em><a href="#item_broken">broken</a></em>.</p>
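<p>The effect of such a depth limit can be illustrated with a minimal sketch in plain Perl.  It does
not use <em>Data::Rlist</em>.  The function <em>render</em> and the variables <em>$max_depth</em> and <em>$broken</em>
are hypothetical stand-ins for the compiler, <em>$Data::Rlist::MaxDepth</em> and <em>$Data::Rlist::Broken</em>:</p>
<pre>
    use strict;
    use warnings;

    my $max_depth = 3;   # hypothetical stand-in for $Data::Rlist::MaxDepth
    my $broken    = 0;   # hypothetical stand-in for $Data::Rlist::Broken

    # Render nested arrays as text, refusing to descend deeper than $max_depth.
    sub render {
        my ($data, $depth) = @_;
        $depth ||= 1;
        if ($max_depth &gt; 0 and $depth &gt; $max_depth) {
            $broken++;                      # count each blocked descent
            return '?';
        }
        return &quot;$data&quot; unless ref($data) eq 'ARRAY';
        return '(' . join(', ', map { render($_, $depth + 1) } @$data) . ')';
    }

    my @loop;
    push @loop, \@loop;                     # the array contains itself

    print render(\@loop), &quot;\n&quot;;             # prints (((?)))
    print &quot;broken = $broken\n&quot;;             # prints broken = 1</pre>
<p>Without the limit the recursion over the self-referential array would never terminate.</p>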
<p>
</p>
<h2><a name="speeding_up_compilation__explicit_quoting_">Speeding up Compilation (Explicit Quoting)</a></h2>
<p>Much work has gone into optimizing <em>Data::Rlist</em> for speed.  Still, it is implemented in pure
Perl (no XS).  A rough estimate for Perl 5.8 is ``each MB takes one second per GHz''.  For example,
when the resulting Rlist file has a size of 13 MB, compiling it from a Perl script on a 3-GHz-PC
requires about 5-7 seconds (the rule of thumb gives 13/3, roughly 4.3 seconds).  Compiling the same
data under Solaris, on a sparcv9 processor operating at 750 MHz, takes about 18-22 seconds (the rule
of thumb gives 13/0.75, roughly 17.3 seconds).</p>
<p>The process of compiling can be sped up by calling <em><a href="#item_quote7">quote7</a></em> explicitly on scalars, that is,
before calling <em><a href="#item_write">write</a></em> or <em><a href="#item_write_string">write_string</a></em>.  Big data sets may compile faster when
<em><a href="#item_quote7">quote7</a></em> is called in advance for scalars that certainly do not qualify as symbolic names:</p>
<pre>
    use Data::Rlist qw/:strings/;</pre>
<pre>
    $data{quote7($key)} = $value;
        .
        .
    Data::Rlist::write(&quot;data.rlist&quot;, \%data);</pre>
<p>instead of</p>
<pre>
    $data{$key} = $value;
        .
        .
    Data::Rlist::write(&quot;data.rlist&quot;, \%data);</pre>
<p>It depends on the case whether  the first variant is faster: <em><a href="#item_compile">compile</a></em> and <em><a href="#item_compile_fast">compile_fast</a></em>
both have to call  <em><a href="#item_is_random_text">is_random_text</a></em> on each scalar.  When the scalar  is already quoted, i.e.,
its first character is <code>&quot;</code>, this test ought to run faster.</p>
<p>Internally <em><a href="#item_is_random_text">is_random_text</a></em> applies the precompiled regex <em>$Data::Rlist::REValue</em>.  Note that
the  expression <em>($s!~$Data::Rlist::REValue)</em>  can  be up  to  20% faster  than the  equivalent
<em>is_random_text($s)</em>.</p>
<p>
</p>
<h2><a name="quoting_strings_that_look_like_numbers">Quoting strings that look like numbers</a></h2>
<p>Normally you don't have to care about strings, since un/quoting happens as required when
reading/compiling Rlist or CSV text.  A common problem, however, occurs when some string uses the
same lexicography as numbers do.</p>
<p>Perl defines the string as the basic building block for all program data, then lets the program
decide <em>what strings mean</em>.  Analogously, in a printed book the reader has to decipher the glyphs
and decide what evidence they hide.  Printed text relies on well-defined glyphs and typographic
conventions, and finally on the competence of the reader, to recognize numbers.  But computers need to
know the exact number type and format.  Integer?  Float?  Hexadecimal?  Scientific?  Klingon?  The
Perl Cookbook recommends the use of a regular expression to distinguish number from string scalars
(recipe 2.1).</p>
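<p>Such a test can be sketched in plain Perl as follows.  The pattern <em>$re_number</em> is a hypothetical
example that covers decimal integers and floats only.  It is neither the regex from the Perl Cookbook
nor the one used by <em>Data::Rlist</em>.  For comparison the sketch also calls <em>looks_like_number</em> from
the core module <em>Scalar::Util</em>:</p>
<pre>
    use strict;
    use warnings;
    use Scalar::Util qw(looks_like_number);

    # Hypothetical pattern for decimal integers and floats, precompiled with qr//.
    my $re_number = qr/^[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?\z/;

    for my $s ('-3.14', '007324', '1e6', '_-9', 'foo') {
        my $by_regex = $s =~ $re_number      ? 'number' : 'string';
        my $by_core  = looks_like_number($s) ? 'number' : 'string';
        printf &quot;%-8s regex: %s, looks_like_number: %s\n&quot;, $s, $by_regex, $by_core;
    }

    # Numeric conversion drops leading zeros.
    print '007324' + 0, &quot;\n&quot;;               # prints 7324</pre>
<p>Both tests classify the first three strings as numbers and the last two as strings.</p>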
<p>In Rlist, string scalars that look like numbers need to be quoted explicitly.  Otherwise, for
example, the string scalar <code>&quot;-3.14&quot;</code> appears as <em>-3.14</em> in the output, <code>&quot;007324&quot;</code> is compiled
into 7324 etc. The original text is then lost and read back as a number.  Of course, in most cases this is just
what you want. For hash keys, however, it might be a problem.  One solution is to prefix the string
with <code>&quot;_&quot;</code>:</p>
<pre>
    my $s = '-9'; $s = &quot;_$s&quot;;</pre>
<p>Such strings do not qualify as a number anymore.  In the C++ implementation it will then become
some <em>std::string</em>, not a <em>double</em>.  But the leading <code>&quot;_&quot;</code> has to be removed by the reading
program.  Perhaps a better solution is to explicitly call <em><a href="#item_quote7">quote7</a></em>:</p>
<pre>
    use Data::Rlist qw/:strings/;</pre>
<pre>
    $k = -9;
    $k = quote7($k);            # returns qq'&quot;-9&quot;'</pre>
<pre>


