Text-CSV

 view release on metacpan or  search on metacpan

lib/Text/CSV_PP.pm  view on Meta::CPAN

    my $key  = delete $attr{'key'};
    my $val  = delete $attr{'value'};
    my $kh   = delete $attr{'keep_headers'} ||
        delete $attr{'keep_column_names'} ||
        delete $attr{'kh'};

    my $cbai = delete $attr{'callbacks'}{'after_in'} ||
        delete $attr{'after_in'}                     ||
        delete $attr{'callbacks'}{'after_parse'}     ||
        delete $attr{'after_parse'};
    my $cbbo = delete $attr{'callbacks'}{'before_out'} ||
        delete $attr{'before_out'};
    my $cboi = delete $attr{'callbacks'}{'on_in'} ||
        delete $attr{'on_in'};
    my $cboe = delete $attr{'callbacks'}{'on_error'} ||
        delete $attr{'on_error'};

    my $hd_s = delete $attr{'sep_set'} ||
        delete $attr{'seps'};
    my $hd_b = delete $attr{'detect_bom'} ||
        delete $attr{'bom'};
    my $hd_m = delete $attr{'munge'} ||
        delete $attr{'munge_column_names'};
    my $hd_c = delete $attr{'set_column_names'};

    my $fh;
    my $sink = 0;
    my $cls  = 0;                                                          # If I open a file, I have to close it
    my $in   = delete $attr{in} || delete $attr{file} or croak($csv_usage);
    my $out  = exists $attr{out} && !$attr{out} ? \"skip"
        :   delete $attr{out} || delete $attr{file};

    ref $in eq "CODE" || ref $in eq "ARRAY" and $out ||= \*STDOUT;

    my ($fho, $fho_cls);
    if ($in && $out and (!ref $in || ref $in eq "GLOB" || ref \$in eq "GLOB")
        and (!ref $out || ref $out eq "GLOB" || ref \$out eq "GLOB")) {
        if (ref $out or "GLOB" eq ref \$out) {
            $fho = $out;
        }
        else {
            open $fho, ">", $out or croak "$out: $!\n";
            if (my $e = $attr{'encoding'}) {
                binmode $fho, ":encoding($e)";
                $hd_b and print $fho "\x{feff}";
            }
            $fho_cls = 1;
        }
        if ($cboi && !$cbai) {
            $cbai = $cboi;
            $cboi = undef;
        }
        if ($cbai) {
            my $cb = $cbai;
            $cbai = sub { $cb->(@_); $_[0]->say($fho, $_[1]); 0 };
        }
        else {
            $cbai = sub { $_[0]->say($fho, $_[1]); 0 };
        }

        # Put all callbacks back in place for streaming behavior
        $attr{'callbacks'}{'after_parse'} = $cbai; $cbai = undef;
        $attr{'callbacks'}{'before_out'}  = $cbbo; $cbbo = undef;
        $attr{'callbacks'}{'on_in'}       = $cboi; $cboi = undef;
        $attr{'callbacks'}{'on_error'}    = $cboe; $cboe = undef;
        $out                              = undef;
        $sink                             = 1;
    }

    if ($out) {
        if (ref $out and ("ARRAY" eq ref $out or "HASH" eq ref $out)) {
            delete $attr{out};
            $sink = 1;
        }
        elsif ((ref $out and "SCALAR" ne ref $out) or "GLOB" eq ref \$out) {
            $fh = $out;
        }
        elsif (ref $out and "SCALAR" eq ref $out and defined ${$out} and ${$out} eq "skip") {
            delete $attr{out};
            $sink = 1;
        }
        else {
            open $fh, ">", $out or croak("$out: $!");
            $cls = 1;
        }
        if ($fh) {
            if ($enc) {
                binmode $fh, $enc;
                my $fn = fileno $fh; # This is a workaround for a bug in PerlIO::via::gzip
            }
            unless (defined $attr{eol} || defined $fho) {
                my @layers = eval { PerlIO::get_layers($fh) };
                $attr{eol} = (grep m/crlf/ => @layers) ? "\n" : "\r\n";
            }
        }
    }

    if (ref $in eq "CODE" or ref $in eq "ARRAY") {
        # All done
    }
    elsif (ref $in eq "SCALAR") {
        # Strings with code points over 0xFF may not be mapped into in-memory file handles
        # "<$enc" does not change that :(
        open $fh, "<", $in or croak("Cannot open from SCALAR using PerlIO");
        $cls = 1;
    }
    elsif (ref $in or "GLOB" eq ref \$in) {
        if (!ref $in && $] < 5.008005) {
            $fh = \*{$in}; # uncoverable statement ancient perl version required
        }
        else {
            $fh = $in;
        }
    }
    else {
        open $fh, "<$enc", $in or croak("$in: $!");
        $cls = 1;
    }
    $fh || $sink or croak(qq{No valid source passed. "in" is required});

    for ([quo => "quote"],

lib/Text/CSV_PP.pm  view on Meta::CPAN


 csv (in => $fh,   out => \@aoa);
 csv (in => $fh,   out => \@aoh, bom => 1);
 csv (in => $fh,   out => \%hsh, key => "key");

 csv (in => $file, out => $file);
 csv (in => $file, out => $fh);
 csv (in => $fh,   out => $file);
 csv (in => $fh,   out => $fh);

In output mode, the default CSV options when producing CSV are

 eol       => "\r\n"

The L</fragment> attribute is ignored in output mode.

C<out> can be a file name  (e.g.  C<"file.csv">),  which will be opened for
writing and closed when finished,  a file handle (e.g. C<$fh> or C<FH>),  a
reference to a glob (e.g. C<\*STDOUT>),  the glob itself (e.g. C<*STDOUT>),
or a reference to a scalar (e.g. C<\my $data>).

 csv (in => sub { $sth->fetch },            out => "dump.csv");
 csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
      headers => $sth->{NAME_lc});

When a code-ref is used for C<in>, the output is generated  per invocation,
so no buffering is involved. This implies that there is no size restriction
on the number of records. The C<csv> function ends when the coderef returns
a false value.

If C<out> is set to a reference of the literal string C<"skip">, the output
will be suppressed completely,  which might be useful in combination with a
filter for side effects only.

 my %cache;
 csv (in    => "dump.csv",
      out   => \"skip",
      on_in => sub { $cache{$_[1][1]}++ });

Currently,  setting C<out> to any false value  (C<undef>, C<"">, 0) will be
equivalent to C<\"skip">.

If the C<in> argument point to something to parse, and the C<out> is set to
a reference to an C<ARRAY> or a C<HASH>, the output is appended to the data
in the existing reference. The result of the parse should match what exists
in the reference passed. This might come handy when you have to parse a set
of files with similar content (like data stored per period) and you want to
collect that into a single data structure:

 my %hash;
 csv (in => $_, out => \%hash, key => "id") for sort glob "foo-[0-9]*.csv";

 my @list; # List of arrays
 csv (in => $_, out => \@list)              for sort glob "foo-[0-9]*.csv";

 my @list; # List of hashes
 csv (in => $_, out => \@list, bom => 1)    for sort glob "foo-[0-9]*.csv";

=head4 Streaming

If B<both> C<in> and C<out> are files, file handles or globs,  streaming is
enforced by injecting an C<after_parse> callback  that immediately uses the
L<C<say ()>|/say> method of the same instance to output the result and then
rejects the record.

If a C<after_parse> was already passed as attribute,  that will be included
in the injected call. If C<on_in> was passed and C<after_parse> was not, it
will be used instead. If both were passed, C<on_in> is ignored.

The EOL of the first record of the C<in> source is consistently used as EOL
for all records in the C<out> destination.

The C<filter> attribute is not available.

All other attributes are shared for C<in> and C<out>,  so you cannot define
different encodings for C<in> and C<out>.  You need to pass a C<$fh>, where
C<binmode> was used to apply the encoding layers.

Note that this is work in progress and things might change.

=head3 encoding

If passed,  it should be an encoding accepted by the  C<:encoding()> option
to C<open>. There is no default value. This attribute does not work in perl
5.6.x.  C<encoding> can be abbreviated to C<enc> for ease of use in command
line invocations.

If C<encoding> is set to the literal value C<"auto">, the method L</header>
will be invoked on the opened stream to check if there is a BOM and set the
encoding accordingly.   This is equal to passing a true value in the option
L<C<detect_bom>|/detect_bom>.

Encodings can be stacked, as supported by C<binmode>:

 # Using PerlIO::via::gzip
 csv (in       => \@csv,
      out      => "test.csv:via.gz",
      encoding => ":via(gzip):encoding(utf-8)",
      );
 $aoa = csv (in => "test.csv:via.gz",  encoding => ":via(gzip)");

 # Using PerlIO::gzip
 csv (in       => \@csv,
      out      => "test.csv:via.gz",
      encoding => ":gzip:encoding(utf-8)",
      );
 $aoa = csv (in => "test.csv:gzip.gz", encoding => ":gzip");

=head3 detect_bom

If  C<detect_bom>  is given, the method  L</header>  will be invoked on the
opened stream to check if there is a BOM and set the encoding accordingly.

C<detect_bom> can be abbreviated to C<bom>.

This is the same as setting L<C<encoding>|/encoding> to C<"auto">.

Note that as the method  L</header> is invoked,  its default is to also set
the headers.

=head3 headers



( run in 1.372 second using v1.01-cache-2.11-cpan-39bf76dae61 )