use strict;
package Parse::CSV;
$Parse::CSV::VERSION = '2.07';
=pod

=head1 NAME

Parse::CSV - Highly flexible CSV parser for large files

=head1 VERSION

version 2.07

=head1 SYNOPSIS

  # Simple headerless comma-separated column parser
  my $simple = Parse::CSV->new(
      file => 'file.csv',
  );

  while ( my $array_ref = $simple->fetch ) {
     # Do something...
  }

... or a more complex example...

  # Parse a semicolon-separated variables file from a handle as a hash
  # based on headers from the first line.
  # Then filter, so we emit objects rather than the plain hash.
  my $objects = Parse::CSV->new(
      handle     => $io_handle,
      sep_char   => ';',
      names      => 1,
      filter     => sub { My::Object->new( $_ ) },
  );

  while ( my $object = $objects->fetch ) {
      $object->do_something;
  }

=head1 DESCRIPTION

Surely the CPAN doesn't need yet another CSV parsing module.

L<Text::CSV_XS> is the standard parser for CSV files. It is fast as hell,
but unfortunately it can be a bit verbose to use.

A number of other modules have attempted to put usability wrappers around
this venerable module, but they have all focused on parsing the entire
file into memory at once.

This method is fine unless your CSV files start to get large. Once that
happens, the only existing option is to fall back on the relatively slow
and heavyweight L<XML::SAXDriver::CSV> module.

L<Parse::CSV> fills this functionality gap. It provides a flexible
and light-weight streaming parser for large, extremely large, or
arbitrarily large CSV files.

=head2 Main Features

B<Stream-Based Parser> - All parsing is done a line at a time.

B<Array Mode> - Parsing can be done in simple array mode, returning
a reference to an array if the columns are not named.

B<Hash Mode> - Parsing can be done in hash mode, putting the data into
a hash and returning a reference to it.

B<Filter Capability> - All items returned can be passed through a
custom filter. This filter can either modify the data on the fly,
or drop records you don't need.

=head2 Writing Filters

A L<Parse::CSV> filter is a subroutine reference that is passed the
original record as C<$_> (not as a function argument), and should
C<return> the alternative or modified record.

A no-op filter (does not modify or drop any records) would look like the
following.

  sub { $_ }

A filter that reversed the order of the columns (assuming the parser
is in array mode) might look like the following.

  sub { [ reverse @$_ ] }

To drop the record, return C<undef> from the filter. The
parser will then keep pulling and parsing new records until one
passes the filter.

  # Only keep records where the 'foo' field is true
  sub { $_->{foo} ? $_ : undef }

To signal an error, throw an exception.

  sub {
      $_->{foo} =~ /bar/ or die "Assumption failed";
      return $_;
  }

Feel free to modify C<$_> as a side-effect of your filter routine -
this will have no effect on anything.
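
The filter styles above can be combined in a single parsing loop. The
following is a minimal sketch rather than a definitive recipe. The file
name F<people.csv> and the C<name> and C<age> columns are hypothetical,
the file is assumed to start with a header line, and the parser's
C<errstr> method is assumed to hold a true value only after an error.

  use strict;
  use warnings;
  use Parse::CSV;

  my $parser = Parse::CSV->new(
      file   => 'people.csv',   # hypothetical input file
      names  => 1,              # hash mode, headers from the first line
      filter => sub {
          # Drop records that have no age
          return undef unless defined $_->{age} and length $_->{age};

          # Modify the remaining records on the fly
          $_->{name} = uc $_->{name};
          return $_;
      },
  );

  while ( my $record = $parser->fetch ) {
      print "$record->{name} is $record->{age}\n";
  }

  # fetch returns false at end of file and on failure, so check
  # for an error once the loop has finished
  die "Parse error: " . $parser->errstr if $parser->errstr;

Keeping the drop test and the modification in one filter means each
record is only handled once as it streams past.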

=head1 METHODS

=cut

use 5.005;
use Carp              ();
use IO::File     1.13 ();
use Text::CSV_XS 1.22 ();
use Params::Util 1.00 ();