DBD-CSV

 view release on metacpan or  search on metacpan

lib/DBD/CSV.pm  view on Meta::CPAN

All SQL processing for DBD::CSV is done by SQL::Statement. See
L<SQL::Statement> for more specific information about its feature set.
Features include joins, aliases, built-in and user-defined functions,
and more.  See L<SQL::Statement::Syntax> for a description of the SQL
syntax supported in DBD::CSV.

Table- and column-names are case insensitive unless quoted. Column names
will be sanitized unless L</raw_header> is true.

=head1 Using DBD::CSV with DBI

For most things, DBD-CSV operates the same as any DBI driver.
See L<DBI> for detailed usage.

=head2 Creating a database handle (connect)

Creating a database handle usually implies connecting to a database server.
Thus this command reads

    use DBI;
    my $dbh = DBI->connect ("dbi:CSV:", "", "", {
	f_dir => "/home/user/folder",
	});

The directory tells the driver where it should create or open tables (a.k.a.
files). It defaults to the current directory, so the following are equivalent:

    $dbh = DBI->connect ("dbi:CSV:");
    $dbh = DBI->connect ("dbi:CSV:", undef, undef, { f_dir => "." });
    $dbh = DBI->connect ("dbi:CSV:f_dir=.");

We were told, that VMS might - for whatever reason - require:

    $dbh = DBI->connect ("dbi:CSV:f_dir=");

The preferred way of passing the arguments is by driver attributes:

    # specify most possible flags via driver flags
    $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
        f_schema         => undef,
        f_dir            => "data",
        f_dir_search     => [],
        f_ext            => ".csv/r",
        f_lock           => 2,
        f_encoding       => "utf8",

        csv_eol          => "\r\n",
        csv_sep_char     => ",",
        csv_quote_char   => '"',
        csv_escape_char  => '"',
        csv_class        => "Text::CSV_XS",
        csv_null         => 1,
        csv_bom          => 0,
        csv_tables       => {
            syspwd => {
                sep_char    => ":",
                quote_char  => undef,
                escape_char => undef,
                file        => "/etc/passwd",
                col_names   => [qw( login password
                                    uid gid realname
                                    directory shell )],
		},
            },

        RaiseError       => 1,
        PrintError       => 1,
        FetchHashKeyName => "NAME_lc",
        }) or die $DBI::errstr;

but you may set these attributes in the DSN as well, separated by semicolons.
Pay attention to the semi-colon for C<csv_sep_char> (as seen in many CSV
exports from MS Excel) is being escaped in below example, as is would
otherwise be seen as attribute separator:

    $dbh = DBI->connect (
	"dbi:CSV:f_dir=$ENV{HOME}/csvdb;f_ext=.csv;f_lock=2;" .
	"f_encoding=utf8;csv_eol=\n;csv_sep_char=\\;;" .
	"csv_quote_char=\";csv_escape_char=\\;csv_class=Text::CSV_XS;" .
	"csv_null=1") or die $DBI::errstr;

Using attributes in the DSN is easier to use when the DSN is derived from an
outside source (environment variable, database entry, or configure file),
whereas specifying entries in the attribute hash is easier to read and to
maintain.

The default value for C<csv_binary> is C<1> (True).

The default value for C<csv_auto_diag> is <1>. Note that this might cause
trouble on perl versions older than 5.8.9, so up to and including perl
version 5.8.8 it might be required to use C<;csv_auto_diag=0> inside the
C<DSN> or C<csv_auto_diag => 0> inside the attributes.

=head2 Creating and dropping tables

You can create and drop tables with commands like the following:

    $dbh->do ("CREATE TABLE $table (id INTEGER, name CHAR (64))");
    $dbh->do ("DROP TABLE $table");

Note that currently only the column names will be stored and no other data.
Thus all other information including column type (INTEGER or CHAR (x), for
example), column attributes (NOT NULL, PRIMARY KEY, ...) will silently be
discarded. This may change in a later release.

A drop just removes the file without any warning.

See L<DBI> for more details.

Table names cannot be arbitrary, due to restrictions of the SQL syntax.
I recommend that table names are valid SQL identifiers: The first
character is alphabetic, followed by an arbitrary number of alphanumeric
characters. If you want to use other files, the file names must start
with "/", "./" or "../" and they must not contain white space.

=head2 Inserting, fetching and modifying data

The following examples insert some data in a table and fetch it back:
First, an example where the column data is concatenated in the SQL string:

    $dbh->do ("INSERT INTO $table VALUES (1, ".

lib/DBD/CSV.pm  view on Meta::CPAN

by default Text::CSV_XS. Alternatively you may pass an instance as
I<csv_csv>, the latter takes precedence. Note that the I<binary>
attribute I<must> be set to a true value in that case.

Additionally you may overwrite these attributes on a per-table base in
the I<csv_tables> attribute.

=item csv_null
X<csv_null>

With this option set, all new statement handles will set C<always_quote>
and C<blank_is_undef> in the CSV parser and writer, so it knows how to
distinguish between the empty string and C<undef> or C<NULL>. You cannot
reset it with a false value. You can pass it to connect, or set it later:

  $dbh = DBI->connect ("dbi:CSV:", "", "", { csv_null => 1 });

  $dbh->{csv_null} = 1;

=item csv_bom
X<csv_bom>

With this option set, the CSV parser will try to detect BOM (Byte Order Mark)
in the header line. This requires L<Text::CSV_XS> version 1.22 or higher.

  $dbh = DBI->connect ("dbi:CSV:", "", "", { csv_bom => 1 });

  $dbh->{csv_bom} = 1;

=item csv_tables
X<csv_tables>

This hash ref is used for storing table dependent metadata. For any
table it contains an element with the table name as key and another
hash ref with the following attributes:

=over 4

=item o

All valid attributes to the CSV parsing module. Any of them can optionally
be prefixed with C<csv_>.

=item o

All attributes valid to DBD::File

=back

If you pass it C<f_file> or its alias C<file>, C<f_ext> has no effect, but
C<f_dir> and C<f_encoding> still have.

  csv_tables => {
      syspwd => {                   # Table name
          csv_sep_char => ":",      # Text::CSV_XS
          quote_char   => undef,    # Text::CSV_XS
          escape_char  => undef,    # Text::CSV_XS
          f_dir        => "/etc",   # DBD::File
          f_file       => "passwd", # DBD::File
          col_names    =>           # DBD::File
            [qw( login password uid gid realname directory shell )],
          },
      },

=item csv_*
X<csv_*>

All other attributes that start with C<csv_> and are not described above
will be passed to C<Text::CSV_XS> (without the C<csv_> prefix). These
extra options are only likely to be useful for reading (select)
handles. Examples:

  $dbh->{csv_allow_whitespace}    = 1;
  $dbh->{csv_allow_loose_quotes}  = 1;
  $dbh->{csv_allow_loose_escapes} = 1;

See the C<Text::CSV_XS> documentation for the full list and the documentation.

=back

=head2 Driver specific attributes

=over 4

=item f_file
X<f_file>

The name of the file used for the table; defaults to

    "$dbh->{f_dir}/$table"

=item eol
X<eol>

=item sep_char
X<sep_char>

=item quote_char
X<quote_char>

=item escape_char
X<escape_char>

=item class
X<class>

=item csv
X<csv>

These correspond to the attributes I<csv_eol>, I<csv_sep_char>,
I<csv_quote_char>, I<csv_escape_char>, I<csv_class> and I<csv_csv>.
The difference is that they work on a per-table basis.

=item col_names
X<col_names>

=item skip_first_row
X<skip_first_row>

By default DBD::CSV assumes that column names are stored in the first row
of the CSV file and sanitizes them (see C<raw_header> below). If this is

lib/DBD/CSV.pm  view on Meta::CPAN

and sometimes confusing errors. If your CSV has headers that match reserved
words, you will require these two attributes.

If C<test.csv> looks like

 select,from
 1,2

the select query would result in C<select select, from from test;>, which
obviously is illegal SQL.

=item raw_header
X<raw_header>

Due to the SQL standard, field names cannot contain special characters
like a dot (C<.>) or a space (C< >) unless the column names are quoted.
Following the approach of mdb_tools, all these tokens are translated to an
underscore (C<_>) when reading the first line of the CSV file, so all field
names are 'sanitized'. If you do not want this to happen, set C<raw_header>
to a true value and the entries in the first line of the CSV data will be
used verbatim for column headers and field names.  DBD::CSV cannot guarantee
that any part in the toolchain will work if field names have those characters,
and the chances are high that the SQL statements will fail.

Currently, the sanitizing of headers is as simple as

  s/\W/_/g;

Note that headers (column names) might be folded in other parts of the code
stack, specifically SQL::Statement, whose docs mention:

 Wildcards are expanded to lower cased identifiers. This might
 confuse some people, but it was easier to implement.

That means that in

 my $sth = $dbh->prepare ("select * from foo");
 $sth->execute;
 while (my $row = $sth->fetchrow_hashref) {
     say for keys %$row;
     }

all keys will show as all lower case, regardless of the original header.

=back

It's strongly recommended to check the attributes supported by
L<DBD::File/Metadata>.

Example: Suppose you want to use F</etc/passwd> as a CSV file. :-)
There simplest way is:

    use DBI;
    my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
	f_dir           => "/etc",
	csv_sep_char    => ":",
	csv_quote_char  => undef,
	csv_escape_char => undef,
	});
    $dbh->{csv_tables}{passwd} = {
	col_names => [qw( login password uid gid realname
			  directory shell )];
	};
    $sth = $dbh->prepare ("SELECT * FROM passwd");

Another possibility where you leave all the defaults as they are and
override them on a per table basis:

    require DBI;
    my $dbh = DBI->connect ("dbi:CSV:");
    $dbh->{csv_tables}{passwd} = {
	eol         => "\n",
	sep_char    => ":",
	quote_char  => undef,
	escape_char => undef,
	f_file      => "/etc/passwd",
	col_names   => [qw( login password uid gid
			    realname directory shell )],
	};
    $sth = $dbh->prepare ("SELECT * FROM passwd");

=head2 Driver private methods

These methods are inherited from DBD::File:

=over 4

=item data_sources
X<data_sources>

The C<data_sources> method returns a list of sub-directories of the current
directory in the form "dbi:CSV:directory=$dirname".

If you want to read the sub-directories of another directory, use

    my $drh  = DBI->install_driver ("CSV");
    my @list = $drh->data_sources (f_dir => "/usr/local/csv_data");

=item list_tables
X<list_tables>

This method returns a list of file-names inside $dbh->{directory}.
Example:

    my $dbh  = DBI->connect ("dbi:CSV:directory=/usr/local/csv_data");
    my @list = $dbh->func ("list_tables");

Note that the list includes all files contained in the directory, even
those that have non-valid table names, from the view of SQL. See
L<Creating and dropping tables> above.

=back

=head1 KNOWN ISSUES

=over 4

=item *

The module is using flock () internally. However, this function is not
available on some platforms. Use of flock () is disabled on MacOS and
Windows 95: There's no locking at all (perhaps not so important on
these operating systems, as they are for single users anyways).

=back

=head1 TODO

=over 4

=item Tests
X<Tests>

Aim for a full 100% code coverage

 - eol      Make tests for different record separators.
 - csv_xs   Test with a variety of combinations for



( run in 0.777 second using v1.01-cache-2.11-cpan-97f6503c9c8 )