ACME-QuoteDB

 view release on metacpan or  search on metacpan

lib/ACME/QuoteDB/LoadDB.pm  view on Meta::CPAN



__END__

=head1 NAME

ACME::QuoteDB::LoadDB - Database loader for ACME::QuoteDB

=head1 VERSION

Version 0.1.1

=head1 SYNOPSIS

load a csv file to quotes database 

  my $load_db = ACME::QuoteDB::LoadDB->new({
                              file => '/home/me/data/simpsons_quotes.csv',
                              file_format => 'csv',
                          });
  
  $load_db->data_to_db;

  print $load_db->success; # bool

header columns of the csv file as follows:

"Quote", "Attribution Name", "Attribution Source", "Category", "Rating"


=head1 DESCRIPTION

This module is part of L<ACME::QuoteDB>. This is a Database loader, it
takes (quotes) data and loads into a database 
(currently L<sqlite3 or mysql|/'CONFIGURATION AND ENVIRONMENT'>),
which is then accessed by L<ACME::QuoteDB>.


There are several ways to get quote data into the db via this loader:
(There are more aimed towards 'batch' operations, i.e load a bunch of 
records quickly)

=over 4

=item 1

* csv file (pre determined format)

     pros: quick and easy to load.
     cons: getting the quotes data into the correct format need by this module

=item 2

* any source.

    One can take quote data from any source, override
    L<ACME::QuoteDB::LoadDB/dbload> loader methods to populate a record
    and write it to the db.
     pros: can get any quote data into the db.
     cons: you supply the method. depending on the complexity of the data
           source and munging required this will take longer then the other 
           methods.

=back

=head3 load from csv file

The pre defined csv file format is:

format of file is as follows: (headers)
"Quote", "Attribution Name", "Attribution Source", "Category", "Rating"
  
   for example:
   "Quote", "Attribution Name", "Attribution Source", "Category", "Rating"
   "I hope this has taught you kids a lesson: kids never learn.","Chief Wiggum","The Simpsons","Humor",9
   "Sideshow Bob has no decency. He called me Chief Piggum. (laughs) Oh wait, I get it, he's all right.","Chief Wiggum","The Simpsons","Humor",8


   my $load_db = ACME::QuoteDB::LoadDB->new({
                               file => dirname(__FILE__).'/data/simpsons_quotes.csv',
                               file_format => 'csv',
                           });
   
   $load_db->data_to_db;

   if (!$load_db->success){print 'failed'}


=head3 load from any source

If those dont catch your interest, ACME::QuoteDB::LoadDB is sub-classable, 
so one can extract data anyway they like and populate the db themselves. 
(there is a test that illustrates overriding the stub method, 'dbload')

you need to populate a record data structure:

    $self->set_record(quote  => q{}); # mandatory
    $self->set_record(name   => q{}); # mandatory
    $self->set_record(source => q{}); # optional but useful
    $self->set_record(catg   => q{}); # optional but useful
    $self->set_record(rating => q{}); # optional but useful

    # then to write the record you call
    $self->write_record;

NOTE: this is a one record at a time operation, so one would perform 
this within a loop. there is no bulk write operation currently.


=head1 OVERVIEW

You have a collection of quotes (adages/sayings/quips/epigrams, etc) for
whatever reason, you use these quotes for whatever reason, you want to 
access these quotes in a variety of ways,...

This module is part of L<ACME::QuoteDB>. 

This is a Database loader, it takes data (quotes) and loads into a database, 
which is then accessed by L<ACME::QuoteDB>.

See L<ACME::QuoteDB>.


=head1 USAGE

General usage, csv/tsv file in the expected format loaded to the database

  my $load_db = ACME::QuoteDB::LoadDB->new({
                              file => '/home/me/data/sorta_funny_quotes.tsv',
                              file_format => 'tsv',
                              delimiter => "\t",
                              # provide a attr_source for all (if not in data)
                              # data is used first, if not defined use below
                              attr_source => 'Things Randomly Overheard',
                              # provide a category for all (if not in data)
                              category => 'Humor',
                              # provide a rating for all
                              rating   => 5, # scale 1-10
                          });
  $load_db->data_to_db;

  if (!$load_db->success){print 'failed'}

Also see t/01-load_quotes.t included with the distribution.

(available from the CPAN if not included on your system)


=head1 SUBROUTINES/METHODS 

This is an Object Oriented module. There is no proceedural interface.

=head2 new

  Instantiate a ACME::QuoteDB::LoadDB object.

  Argument is a hash ref. Params below 


=head4 Data Related Parameters

=over 4

=item  file or directory - one or the other required (not both)

if file, must be in our defined format, full path is needed.

if directory, full path is needed, can supply a basic glob type filter.

example:

{ file  => '/home/me/data/simpsons_quotes.csv' }

{ dir  => '/home/me/data/*.csv' }
 

=item  file_format - required

can be one of: 'csv', 'tsv', 'custom', or 'html'

if 'html' or 'custom' you must supply the method for parsing. 
(see tests for examples)

example:

{ file_format => 'csv' }


=item  delimiter - optional, default is a comma for csv

csv/tsv options tested: comma(,) and tab(\t)

'html' - not applicable

example:

{ delimiter => "\t" }

=item  category - optional, extracted from data if exists, otherwise will use what you
specify

TODO one quote to multiple categories

=item  attr_source - extracted from data if exists, otherwise will use what you
specify

example:

{attr_source => 'The Simpsons'}

=item  file_encoding - optional

Files being loaded are assumed to be utf8 encoded. if utf8 flag is not detected,
falls back to latin1 (iso-8859-1). If neither of these is correct, set this
option to the encoding your file is in.

=back

=head4 Operation Related Parameters

=over 4

=item  dry_run - optional

do not write to the database. Use with verbose flag to see what would have beed
written.

This can be helpful for testing the outcome of Loading results. 

i.e. like to confirm that the parsing of your data is correct

example:

{
 dry_run => 1,
 verbose => 1
}

=item  verbose  - optional

display to STDOUT what is being done

This can be helpful for testing quotes extraction from file parsing

example:

{verbose => 1}

=item  create_db  - optional (boolean)

L<ACME::QuoteDB::LoadDB> default behaviour is to always assume there is a
database and append new data to that. (It is usually only needed the first 
time one load's data)

setting this parameter to a true value will create a new database.
(so while this is an optional param, it is required at least once ;)

B<NOTE: it is not intelligent, if you hand it a populated database,
it will happily overwrite all data>

B<AGAIN: setting this param will destroy the current database, creating a new
empty one>

example:

{create_db => 1}


=back 

=head2 data_to_db

takes the data input provided to new, process' it and writes to the database.
should appropriatly blow up if not successful

=head2 dbload_from_csv

takes a csv file (in our defined format) as an argument, parses it and writes
the data to the database. (uses L<Text::CSV> with pure perl parser)
utf-8 safe. (opens file as utf8)

will croak with message if not successful


=head2 dbload

if your file format is set to 'html' or 'custom' you must 
define this method to do your parsing in a sub class.

Load from html is not supported because there are too many 
ways to represt the data. (same with 'custom')
(see tests for examples - there is a test for loading a 'fortune' file format)

One can subclass ACME::QuoteDB::LoadDB and override dbload,
to do our html parsing

=head2 debug_record

dump record (show what is set on the internal data structure) 

e.g. Data::Dumper

=head2 set_record

only needed it one plans to sub-class this module.
otherwise, is transparent in usage.

if you are sub-classing this module, you would have to populate 
this record. (L</write_record> knows about/uses this data structure)

possible fields consist of:

$self->set_record(quote  => q{});
$self->set_record(rating => q{});
$self->set_record(name   => q{});
$self->set_record(source => q{});

lib/ACME/QuoteDB/LoadDB.pm  view on Meta::CPAN


=head1 DIAGNOSTICS

An error such as:

C<DBD::SQLite::db prepare_cached failed: no such table: ,...>

probably means that you do not have a database created in the correct format.

basically, you need to create the database, usually, on a first run

you need to add the flag:

create_db => 1, # first run, create the db

appending to an existing database is the default behaviour

see L</create_db_tables>


=head1 CONFIGURATION AND ENVIRONMENT

if you are running perl > 5.8.5 and have access to
install cpan modules, you should have no problem installing this module
(utf-8 support in Text::CSV not avaible until 5.8 - we don't support 'non
utf-8 mode)

=over 1

=item * By default, the quotes database used by this module installs in the 
system path, 'lib', (See L<Module::Build/"INSTALL PATHS">)
as world writable - i.e. 0666 (and probably owned by root)
If you don't like this, you can modify Build.PL to not chmod the file and it
will install as 444/readonly, you can also set a chown in there for whoever
you want to have RW access to the quotes db.

Alternativly, one can specify a location to a quotes database (file) to use.
(Since the local mode is sqlite3, the file doesn't even need to exist, just
needs read/write access to the path)

Set the environmental variable:

$ENV{ACME_QUOTEDB_PATH} (untested on windows)

(this has to be set before trying a database load and also (everytime) before 
using this module, obviouly)

Something such as:

BEGIN {
    # give alternate path to the DB
    # doesn't need to exist, will create
    $ENV{ACME_QUOTEDB_PATH} = '/home/me/my_stuff/my_quote_db'
}

* (NOTE: be sure this (BEGIN) exists *before* the 'use ACME::QuoteDB' lines)

The default is to use sqlite3.

In order to connect to a mysql database, several environmental variables
are required.

BEGIN {
    # have to set this to use remote database
    $ENV{ACME_QUOTEDB_REMOTE} =  'mysql';
    $ENV{ACME_QUOTEDB_DB}     =  'acme_quotedb';
    $ENV{ACME_QUOTEDB_HOST}   =  'localhost';
    $ENV{ACME_QUOTEDB_USER}   =  'acme_user';
    $ENV{ACME_QUOTEDB_PASS}   =  'acme';
}

Set the above in a begin block and all operations are the same but now
you will be writing to the remote mysql database specified.

(The user will need read/write permissions to the db/tables)
(mysql admin duties are beyond the scope of this module)

The only supported databases at this time are sqlite and mysql.

It is trivial to add support for others


see: L<LOADING QUOTES|ACME::QuoteDB/LOADING QUOTES>


=back 


=head1 DEPENDENCIES

L<Carp>

L<Data::Dumper>

L<criticism> (pragma - enforce Perl::Critic if installed)

L<version>(pragma - version numbers)

L<aliased>

L<Test::More>

L<DBD::SQLite>

L<DBI>

L<Class::DBI>

L<File::Basename>

L<Readonly>

L<Module::Build>


=head1 INCOMPATIBILITIES

none known of

=head1 SEE ALSO



( run in 0.678 second using v1.01-cache-2.11-cpan-d8267643d1d )