Text-CSV_XS
my $key  = delete $attr{'key'};
my $val  = delete $attr{'value'};
my $kh   = delete $attr{'keep_headers'}      ||
           delete $attr{'keep_column_names'} ||
           delete $attr{'kh'};
my $cbai = delete $attr{'callbacks'}{'after_in'}    ||
           delete $attr{'after_in'}                 ||
           delete $attr{'callbacks'}{'after_parse'} ||
           delete $attr{'after_parse'};
my $cbbo = delete $attr{'callbacks'}{'before_out'} ||
           delete $attr{'before_out'};
my $cboi = delete $attr{'callbacks'}{'on_in'} ||
           delete $attr{'on_in'};
my $cboe = delete $attr{'callbacks'}{'on_error'} ||
           delete $attr{'on_error'};
my $hd_s = delete $attr{'sep_set'} ||
           delete $attr{'seps'};
my $hd_b = delete $attr{'detect_bom'} ||
           delete $attr{'bom'};
my $hd_m = delete $attr{'munge'} ||
           delete $attr{'munge_column_names'};
my $hd_c = delete $attr{'set_column_names'};
my $fh;
my $sink = 0;
my $cls  = 0;  # If I open a file, I have to close it
my $in  = delete $attr{'in'}  || delete $attr{'file'} or croak ($csv_usage);
my $out = exists $attr{'out'} && !$attr{'out'} ? \"skip"
        : delete $attr{'out'} || delete $attr{'file'};
ref $in eq "CODE" || ref $in eq "ARRAY" and $out ||= \*STDOUT;
my ($fho, $fho_cls);
if ($in && $out and (!ref $in  || ref $in  eq "GLOB" || ref \$in  eq "GLOB")
                and (!ref $out || ref $out eq "GLOB" || ref \$out eq "GLOB")) {
    if (ref $out or "GLOB" eq ref \$out) {
        $fho = $out;
        }
    else {
        open $fho, ">", $out or croak "$out: $!\n";
        if (my $e = $attr{'encoding'}) {
            binmode $fho, ":encoding($e)";
            $hd_b and print $fho "\x{feff}";
            }
        $fho_cls = 1;
        }
    if ($cboi && !$cbai) {
        $cbai = $cboi;
        $cboi = undef;
        }
    if ($cbai) {
        my $cb = $cbai;
        $cbai = sub { $cb->(@_); $_[0]->say ($fho, $_[1]); 0 };
        }
    else {
        $cbai = sub {            $_[0]->say ($fho, $_[1]); 0 };
        }
    # Put all callbacks back in place for streaming behavior
    $attr{'callbacks'}{'after_parse'} = $cbai; $cbai = undef;
    $attr{'callbacks'}{'before_out'}  = $cbbo; $cbbo = undef;
    $attr{'callbacks'}{'on_in'}       = $cboi; $cboi = undef;
    $attr{'callbacks'}{'on_error'}    = $cboe; $cboe = undef;
    $out  = undef;
    $sink = 1;
    }
if ($out) {
    if (ref $out and ("ARRAY" eq ref $out or "HASH" eq ref $out)) {
        delete $attr{'out'};
        $sink = 1;
        }
    elsif ((ref $out and "SCALAR" ne ref $out) or "GLOB" eq ref \$out) {
        $fh = $out;
        }
    elsif (ref $out and "SCALAR" eq ref $out and defined ${$out} and ${$out} eq "skip") {
        delete $attr{'out'};
        $sink = 1;
        }
    else {
        open $fh, ">", $out or croak ("$out: $!");
        $cls = 1;
        }
    if ($fh) {
        if ($enc) {
            binmode $fh, $enc;
            my $fn = fileno $fh;  # This is a workaround for a bug in PerlIO::via::gzip
            }
        unless (defined $attr{'eol'} || defined $fho) {
            my @layers = eval { PerlIO::get_layers ($fh) };
            $attr{'eol'} = (grep m/crlf/ => @layers) ? "\n" : "\r\n";
            }
        }
    }
if (ref $in eq "CODE" or ref $in eq "ARRAY") {
    # All done
    }
elsif (ref $in eq "SCALAR") {
    # Strings with code points over 0xFF may not be mapped into
    # in-memory file handles, and "<$enc" does not change that :(
    open $fh, "<", $in or croak ("Cannot open from SCALAR using PerlIO");
    $cls = 1;
    }
elsif (ref $in or "GLOB" eq ref \$in) {
    if (!ref $in && $] < 5.008005) {
        $fh = \*{$in};  # uncoverable statement: ancient perl version required
        }
    else {
        $fh = $in;
        }
    }
else {
    open $fh, "<$enc", $in or croak ("$in: $!");
    $cls = 1;
    }
$fh || $sink or croak (qq{No valid source passed. "in" is required});
csv (in => $aoa, out => \"skip");
csv (in => $fh, out => \@aoa);
csv (in => $fh, out => \@aoh, bom => 1);
csv (in => $fh, out => \%hsh, key => "key");
csv (in => $file, out => $file);
csv (in => $file, out => $fh);
csv (in => $fh, out => $file);
csv (in => $fh, out => $fh);
In output mode, the default CSV options when producing CSV are
eol => "\r\n"
The L</fragment> attribute is ignored in output mode.
C<out> can be a file name (e.g. C<"file.csv">), which will be opened for
writing and closed when finished, a file handle (e.g. C<$fh> or C<FH>), a
reference to a glob (e.g. C<\*STDOUT>), the glob itself (e.g. C<*STDOUT>),
or a reference to a scalar (e.g. C<\my $data>).
csv (in => sub { $sth->fetch }, out => "dump.csv");
csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
headers => $sth->{NAME_lc});
When a code-ref is used for C<in>, the output is generated per invocation,
so no buffering is involved. This implies that there is no size restriction
on the number of records. The C<csv> function ends when the coderef returns
a false value.
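The coderef contract can be sketched without the module itself: the consumer keeps calling the coderef until it returns a false value. A minimal pure-Perl sketch of that loop (the C<$fetch_row> closure is a hypothetical stand-in for e.g. C<sub { $sth-E<gt>fetch }>):

```perl
use strict;
use warnings;

# Hypothetical row source: a closure that returns one array-ref per
# call and undef when exhausted -- the same contract csv () expects
# from a CODE-ref passed as "in".
my @rows = ([ 1, "foo" ], [ 2, "bar" ], [ 3, "baz" ]);
my $fetch_row = sub { shift @rows };   # undef ends the stream

# This loop mirrors the consumption side: call until false
my $count = 0;
while (my $row = $fetch_row->()) {
    $count++;
    print join (",", @$row), "\n";
    }
print "records: $count\n";   # records: 3
```

Because each row is produced on demand, memory use is bounded by a single record, which is what makes this suitable for dumping arbitrarily large result sets.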
If C<out> is set to a reference of the literal string C<"skip">, the output
will be suppressed completely, which might be useful in combination with a
filter for side effects only.
my %cache;
csv (in => "dump.csv",
out => \"skip",
on_in => sub { $cache{$_[1][1]}++ });
Currently, setting C<out> to any false value (C<undef>, C<"">, 0) will be
equivalent to C<\"skip">.
If the C<in> argument points to something to parse, and C<out> is set to
a reference to an C<ARRAY> or a C<HASH>, the output is appended to the data
in the existing reference. The result of the parse should match what exists
in the reference passed. This might come in handy when you have to parse a set
of files with similar content (like data stored per period) and you want to
collect that into a single data structure:
my %hash;
csv (in => $_, out => \%hash, key => "id") for sort glob "foo-[0-9]*.csv";
my @list; # List of arrays
csv (in => $_, out => \@list) for sort glob "foo-[0-9]*.csv";
my @list; # List of hashes
csv (in => $_, out => \@list, bom => 1) for sort glob "foo-[0-9]*.csv";
=head4 Streaming
X<streaming>
If B<both> C<in> and C<out> are files, file handles or globs, streaming is
enforced by injecting an C<after_parse> callback that immediately uses the
L<C<say ()>|/say> method of the same instance to output the result and then
rejects the record.
If an C<after_parse> callback was already passed as an attribute, it will be
included in the injected call. If C<on_in> was passed and C<after_parse> was not, it
will be used instead. If both were passed, C<on_in> is ignored.
The EOL of the first record of the C<in> source is consistently used as EOL
for all records in the C<out> destination.
The C<filter> attribute is not available.
All other attributes are shared for C<in> and C<out>, so you cannot define
different encodings for C<in> and C<out>. You need to pass a C<$fh>, where
C<binmode> was used to apply the encoding layers.
Note that this is work in progress and things might change.
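Applying a per-handle encoding with C<binmode> is plain core Perl and can be sketched without the module. A minimal round-trip through an in-memory handle (real streaming code would open disk files instead; the string and encoding here are arbitrary examples):

```perl
use strict;
use warnings;

# Because attributes are shared between "in" and "out", a differing
# encoding is applied on the handle itself with binmode.
open my $out, ">", \my $bytes or die "open: $!";
binmode $out, ":encoding(UTF-16LE)";     # out-side layer
print $out "h\x{e9}llo";                 # 5 characters
close $out;

print length ($bytes), " bytes on disk\n";   # 10 bytes on disk

open my $in, "<", \$bytes or die "open: $!";
binmode $in, ":encoding(UTF-16LE)";      # in-side layer must match
my $text = <$in>;
close $in;
print $text eq "h\x{e9}llo" ? "round-trip ok\n" : "mismatch\n";
```

A handle prepared this way can then be passed as C<in> or C<out> and keeps its own layers, independent of any shared attributes.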
=head3 encoding
X<encoding>
If passed, it should be an encoding accepted by the C<:encoding()> option
to C<open>. There is no default value. This attribute does not work in perl
5.6.x. C<encoding> can be abbreviated to C<enc> for ease of use in command
line invocations.
If C<encoding> is set to the literal value C<"auto">, the method L</header>
will be invoked on the opened stream to check if there is a BOM and set the
encoding accordingly. This is equal to passing a true value in the option
L<C<detect_bom>|/detect_bom>.
Encodings can be stacked, as supported by C<binmode>:
# Using PerlIO::via::gzip
csv (in => \@csv,
out => "test.csv:via.gz",
encoding => ":via(gzip):encoding(utf-8)",
);
$aoa = csv (in => "test.csv:via.gz", encoding => ":via(gzip)");
# Using PerlIO::gzip
csv (in => \@csv,
out => "test.csv:via.gz",
encoding => ":gzip:encoding(utf-8)",
);
$aoa = csv (in => "test.csv:gzip.gz", encoding => ":gzip");
=head3 detect_bom
X<detect_bom>
If C<detect_bom> is given, the method L</header> will be invoked on the
opened stream to check if there is a BOM and set the encoding accordingly.
Note that the attribute L<C<headers>|/headers> can be used to overrule the
default behavior of how that method automatically sets the attribute.
C<detect_bom> can be abbreviated to C<bom>.
This is the same as setting L<C<encoding>|/encoding> to C<"auto">.
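What C<detect_bom> looks for can be shown with core Perl alone: a BOM is the code point C<U+FEFF>, which under an UTF-8 layer is written as the three bytes C<EF BB BF> at the very start of the stream. A minimal sketch using an in-memory handle (no Text::CSV_XS involved):

```perl
use strict;
use warnings;

# Write a UTF-8 BOM followed by a header line, then inspect the
# raw bytes that a BOM detector would see.
open my $fh, ">", \my $data or die "open: $!";
binmode $fh, ":encoding(UTF-8)";
print $fh "\x{feff}";        # BOM first ...
print $fh "id,name\n";       # ... then the CSV content
close $fh;

printf "first bytes: %v02X\n", substr $data, 0, 3;   # EF.BB.BF
```

This mirrors what the streaming code above does when both C<detect_bom> and an C<encoding> are given for the output handle.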
=head3 Generating CSV into memory
Format a data-set (C<@foo>) into a scalar value in memory (C<$data>):
# The data
my @foo = map { [ 0 .. 5 ] } 0 .. 3;
# in a loop
my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\r\n" });
open my $fh, ">", \my $data;
$csv->print ($fh, $_) for @foo;
close $fh;
# a single call
csv (in => \@foo, out => \my $data);
=head2 Rewriting CSV
=head3 Changing separator
Rewrite C<CSV> files with C<;> as separator character to well-formed C<CSV>:
use Text::CSV_XS qw( csv );
csv (in => csv (in => "bad.csv", sep_char => ";"), out => *STDOUT);
As C<STDOUT> is now default in L</csv>, a one-liner converting a UTF-16 CSV
file with BOM and TAB-separation to valid UTF-8 CSV could be:
$ perl -C3 -MText::CSV_XS=csv -we\
'csv(in=>"utf16tab.csv",encoding=>"utf16",sep=>"\t")' >utf8.csv
=head3 Unifying EOL
Rewrite a CSV file with mixed EOL and/or inconsistent quotation into a new
CSV file with consistent EOL and quotation. Attributes apply.
use Text::CSV_XS qw( csv );
csv (in => "file.csv", out => "newfile.csv", quote_space => 1);
=head2 Dumping database tables to CSV
Dumping a database table can be as simple as this (TIMTOWTDI):
my $dbh = DBI->connect (...);
my $sql = "select * from foo";
# using your own loop
open my $fh, ">", "foo.csv" or die "foo.csv: $!\n";
my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\r\n" });
my $sth = $dbh->prepare ($sql); $sth->execute;
$csv->print ($fh, $sth->{NAME_lc});
while (my $row = $sth->fetch) {
$csv->print ($fh, $row);
}
# using the csv function, all in memory
csv (out => "foo.csv", in => $dbh->selectall_arrayref ($sql));
# using the csv function, streaming with callbacks
my $sth = $dbh->prepare ($sql); $sth->execute;
csv (out => "foo.csv", in => sub { $sth->fetch });
csv (out => "foo.csv", in => sub { $sth->fetchrow_hashref });
Note that this does not discriminate between "empty" values and NULL-values
from the database, as both will be the same empty field in CSV. To enable
distinction between the two, use L<C<quote_empty>|/quote_empty>.
csv (out => "foo.csv", in => sub { $sth->fetch }, quote_empty => 1);
If the database import utility supports special sequences to insert C<NULL>
values into the database, as MySQL/MariaDB do with C<\N>, use a filter
or a map:
csv (out => "foo.csv", in => sub { $sth->fetch },
on_in => sub { $_ //= "\\N" for @{$_[1]} });
while (my $row = $sth->fetch) {
$csv->print ($fh, [ map { $_ // "\\N" } @$row ]);
}
Note that this will not work as expected when choosing the backslash (C<\>)
as C<escape_char>, as that will cause the C<\> to need to be escaped by yet
another C<\>, which will cause the field to need quotation and thus ending
up as C<"\\N"> instead of C<\N>. See also L<C<undef_str>|/undef_str>.
csv (out => "foo.csv", in => sub { $sth->fetch }, undef_str => "\\N");
These special sequences are not recognized by Text::CSV_XS when parsing the
CSV generated like this, but map and filter are your friends again:
while (my $row = $csv->getline ($fh)) {
$sth->execute (map { $_ eq "\\N" ? undef : $_ } @$row);
}
csv (in => "foo.csv", filter => { 1 => sub {
$sth->execute (map { $_ eq "\\N" ? undef : $_ } @{$_[1]}); 0; }});
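The C<map> used in both snippets above is plain Perl and easy to verify in isolation. A minimal sketch with hypothetical row data:

```perl
use strict;
use warnings;

# The parse-side inverse of undef_str: turn the literal sequence \N
# back into undef before the row reaches the database.
my @row    = ("1", "\\N", "foo");
my @mapped = map { $_ eq "\\N" ? undef : $_ } @row;

print defined $mapped[1] ? "defined\n" : "NULL\n";   # NULL
```

Passing such a list to C<$sth-E<gt>execute> lets the DBI driver bind the C<undef> as a real database C<NULL>.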
=head2 Converting CSV to JSON
use Text::CSV_XS qw( csv );
use JSON; # or Cpanel::JSON::XS for better performance
# AoA (no header interpretation)
say encode_json (csv (in => "file.csv"));
# AoH (convert to structures)
say encode_json (csv (in => "file.csv", bom => 1));
Yes, it is that simple.
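The JSON side of this can be sketched on its own, assuming C<csv (in =E<gt> "file.csv", bom =E<gt> 1)> returned the hand-built AoH below (a hypothetical two-record file with an C<id,name> header). JSON::PP, a core module, stands in for JSON or Cpanel::JSON::XS:

```perl
use strict;
use warnings;
use JSON::PP;   # core module; JSON or Cpanel::JSON::XS are drop-ins

# Hypothetical AoH, shaped as csv () with bom => 1 would return it
my $aoh = [
    { id => 1, name => "foo" },
    { id => 2, name => "bar" },
    ];

# canonical () sorts hash keys for stable, reproducible output
my $json = JSON::PP->new->canonical;
print $json->encode ($aoh), "\n";
# [{"id":1,"name":"foo"},{"id":2,"name":"bar"}]
```

Without C<bom =E<gt> 1> (or C<headers>), C<csv ()> returns an AoA instead, and the same C<encode> call produces an array of arrays.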
=head2 The examples folder
For more extended examples, see the F<examples/> [1] sub-directory in the
original distribution or the git repository [2].
1. https://github.com/Tux/Text-CSV_XS/tree/master/examples
2. https://github.com/Tux/Text-CSV_XS
The following files can be found there: