DBSchema-Normalizer
docs/Normalizer.txt view on Meta::CPAN
| artist | varchar(20) | | MUL | | |
+-----------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql> describe albums;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| album_id | int(11) | | PRI | NULL | auto_increment |
| album | varchar(30) | | MUL | | |
| genre | varchar(10) | | MUL | | |
| artist_id | int(11) | | MUL | 0 | |
+-----------+-------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
mysql> describe songs;
+----------+-------------+------+-----+----------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+----------+----------------+
| ID | int(11) | | PRI | NULL | auto_increment |
| title | varchar(40) | | MUL | | |
| duration | time | | | 00:00:00 | |
| size | int(11) | | | 0 | |
| album_id | int(11) | | MUL | 0 | |
+----------+-------------+------+-----+----------+----------------+
5 rows in set (0.00 sec)
It should be clear now WHAT we have to do. Less clear is HOW. The above
instructions seem to imply that we manually copy the field structure
from the source table to the lookup and normalized tables.
Actually, that SQL code (except the DESCRIBEs) was produced by this very
module and printed to STDOUT, so that all I had to do was some
cut-and-paste. And then we are back to the question of the algorithm. If
this is all SQL, where does Perl come in? The answer is that Perl will
reduce the amount of information we need to give to the database engine.
The information about the field structure and indexes is already in the
database. Our Perl module (with a [not so] little help from the DBI) can
extract the structure from the database and create the appropriate SQL
statements. On the practical side, this means that, before producing SQL
code, this module will gather information about the source table. It
will issue "SHOW FIELDS FROM tablename" and "SHOW INDEX FROM
tablename" statements, and parse their results to prepare the
operational code.
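To make that concrete, here is a rough sketch (my own illustration, not the module's actual internals) of how rows shaped like the output of "SHOW FIELDS FROM tablename" can be turned back into column definitions. The rows are hard-coded to mimic what DBI's selectall_arrayref() would return for the artists table above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each row mimics SHOW FIELDS output: Field, Type, Null, Key, Default, Extra.
# In the real module these rows would come from the database via the DBI.
my @fields = (
    [ 'artist_id', 'int(11)',     '', 'PRI', undef, 'auto_increment' ],
    [ 'artist',    'varchar(20)', '', 'MUL', '',    ''               ],
);

my @columns;
for my $row (@fields) {
    my ($name, $type, $null, $key, $default, $extra) = @$row;
    my $def = "$name $type";
    $def .= " NOT NULL"           unless $null;   # empty Null column means NOT NULL
    $def .= " DEFAULT '$default'" if defined $default && $default ne '';
    $def .= " $extra"             if $extra;      # e.g. auto_increment
    push @columns, $def;
}

# Assemble a CREATE TABLE statement from the reconstructed definitions.
my $create = "CREATE TABLE artists (\n  "
           . join(",\n  ", @columns, "PRIMARY KEY (artist_id)")
           . "\n)";
print $create, "\n";
```

The real module also parses "SHOW INDEX FROM tablename" in a similar fashion to rebuild the non-primary indexes.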
That's it. It seems a rather small contribution to your average
workload, but if you ever have to deal with a project involving several
large tables, with many fields, to be transformed into many normalized
tables, I am sure you will appreciate the GPL (Golden Perl Laziness)
behind this module.
BTW, this is the code used to produce the above SQL statements:
#!/usr/bin/perl -w
use strict;
use DBSchema::Normalizer;
my $norm = DBSchema::Normalizer->new ({
DSN => "DBI:mysql:music;host=localhost;"
. "mysql_read_default_file=$ENV{HOME}/.my.cnf",
src_table => "MP3",
index_field => "album_id",
lookup_fields => "artist,album,genre",
lookup_table => "tmp_albums",
dest_table => "songs",
copy_indexes => 1,
simulate => 1
});
$norm->do();
$norm->create_lookup_table ({
src_table => "tmp_albums",
index_field => "artist_id",
lookup_fields => "artist",
lookup_table => "artists"
});
$norm->create_normalized_table ({
src_table => "tmp_albums",
lookup_table => "artists",
index_field => "artist_id",
lookup_fields => "artist",
dest_table => "albums"
});
Twenty-five lines of code. Not bad for such a complicated task. But even
that could have been replaced by these two one-liners:
perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer->snew(qw(localhost music MP3 \
album_id album,artist,genre tmp_albums songs 1 1 1))->do()'
perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer->snew(qw(localhost music \
tmp_albums artist_id artist artists albums 1 1 1))->do()'
(See below item "snew" for more details.)
One thing that this module won't do for you, though, is to decide which
columns should stay with the source table and which ones should go to
the lookup table. This is something for which you need to apply some
database theory, and I don't expect you to know it unless you have
studied it, either at school or independently (or unless you happen to
be E.F. Codd). I am planning (in a very idle and distant way) another
module that will analyze a database table and decide if it is a good
design or not. The examples from commercial software I have seen so far
did not impress me a lot. I am still convinced that humans are better
than machines at this task. But, hey! Ten years ago I was convinced that
humans were much better than machines at chess, and instead, not long
ago, I watched an IBM box do very nasty things to Garry Kasparov.
So maybe I'll change my mind. In the meantime, I am enjoying my present
intellectual superiority and I keep analyzing databases with the same
pleasure that I once felt when solving a chess problem.
Simulation mode
This module can do the data transfer for you, or you may want to run it
in "simulation mode", by adding simulate => "1" to the constructor
parameters. When in simulation mode, the DBSchema::Normalizer will just
print the necessary SQL statements to STDOUT, without passing them to
the database engine. You can thus check the queries and, if needed,
adjust your script before running it for real.
It may speed up the insertion
for large tables.
verbose if "1", messages indicating what is going on
will be sent to STDOUT.
With "2", even more verbose information is
given (all queries are printed before execution).
Level "3" also shows details about src_table
fields and indexes.
ignore_warning if "1", warnings about missing indexes on lookup fields
are ignored, and the requested operation is carried
out even at the price of a long wait. Default "0"
simulate if "1", no operation is carried out
but the queries are printed to STDOUT (as in
verbose => 2)
note: src_table, dest_table and lookup_table cannot be called *src*
or *lkp*, which are used internally by the Normalizer. If such names
are given, a fatal error is issued.
If the keys for src_table, index_field, lookup_table and lookup_fields
are missing, they can (actually they MUST) be provided later through
calls to create_lookup_table() and create_normalized_table().
snew
snew is a shortcut for new(). It is called with parameters passed by
position instead of using a hash reference. It is a
"quick-and-dirty" ("dirty" being the operative word) method
intended for impatient users who don't want to write a script.
It assumes that you have a MySQL configuration file with username
and password. Parameters are passed in this order:
host
database
source table
index field
lookup fields
lookup table
destination table
copy indexes
verbose
simulate
Here is an example of one-liner normalization call:
perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer->snew(qw(localhost music MP3 \
album_id album,artist,genre tmp_albums songs 1 1 1))->do()'
Note: ALL parameters must be passed, or a "use of uninitialized
value" error is issued.
This one-liner is equivalent to the following script:
#!/usr/bin/perl
no warnings; # Yeah. No warnings. I said it is equivalent,
# not recommended.
no strict; # Yup. No strict either.
use DBSchema::Normalizer;
$norm = DBSchema::Normalizer->new (
{
DSN => "DBI:mysql:music;host=localhost;"
. "mysql_read_default_file=$ENV{HOME}/.my.cnf",
src_table => "MP3",
index_field => "album_id",
lookup_fields => "artist,album,genre",
lookup_table => "tmp_albums",
dest_table => "songs",
copy_indexes => 1,
verbose => 1,
simulate => 1,
});
$norm->do();
It is definitely not as safe as the normal call. However, TMTOWTDI,
and it's your call. I am using it, but I don't recommend it. Read my
lips: I DON'T RECOMMEND IT.
do()
do() performs the Normalization according to the parameters already
received. Internally, it calls *create_lookup_table()* and
*create_normalized_table()*. It will fail if not enough parameters
have been supplied to new().
create_normalized_table()
create_normalized_table() will create a 2nd normal form table,
getting data from a source table and a lookup table. Lookup fields
in the source table will be replaced by the corresponding index
field in the lookup table.
If called without parameters, assumes the parameters passed to the
object constructor.
Parameters are passed as a hash reference, and are the same as those
given to new(), except *DSN*, *username* and *password*. None are
compulsory here; the missing ones are taken from the constructor.
However, a check is made to ensure that all required parameters are
supplied by one call or the other.
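Conceptually, the replacement amounts to a join: the destination table keeps the non-lookup fields and receives the index field, resolved by matching the lookup fields themselves. Here is a hand-written sketch of such a statement, built in Perl with the table names from the earlier example and the src/lkp aliases the module reserves internally (an illustration of the idea, not necessarily the module's literal output):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fields that stay with the destination table, and the fields that
# move to the lookup table (from the MP3/tmp_albums example above).
my @src_fields    = qw(ID title duration size);
my @lookup_fields = qw(artist album genre);
my $index_field   = 'album_id';

# Join the source and the lookup table on the lookup fields themselves,
# so each source row can pick up its numeric key.
my $join_cond = join " AND ",
    map { "src.$_ = lkp.$_" } @lookup_fields;

my $sql = "CREATE TABLE songs SELECT "
        . join(", ", map { "src.$_" } @src_fields)
        . ", lkp.$index_field"
        . " FROM MP3 src INNER JOIN tmp_albums lkp ON $join_cond";
print "$sql\n";
```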
create_lookup_table
create_lookup_table() will create a lookup table, extracting
repeating fields from a 1st normal form table. A numeric primary key
is created.
When called without parameters, assumes the values passed to the
object constructor (*new*).
Parameters are passed as a hash reference, and should include the
following:
src_table table where to take the values from
lookup_fields which fields to take
lookup_table table to create
index_field primary key (will be foreign key in src_table)
to be created
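The idea can be sketched as a single statement: select the distinct combinations of the repeating fields and attach an auto-incrementing numeric key. Again, this is a hand-written illustration with the names from the earlier example, not necessarily the module's literal output:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The repeating fields to be extracted from the 1st normal form table.
my @lookup_fields = qw(artist album genre);
my $field_list    = join ", ", @lookup_fields;

# MySQL's CREATE TABLE ... SELECT lets us declare the new numeric
# primary key and fill the remaining columns from the query.
my $sql = "CREATE TABLE tmp_albums "
        . "(album_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) "
        . "SELECT DISTINCT $field_list FROM MP3";
print "$sql\n";
```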
AUTHOR
Giuseppe Maxia, giuseppe@gmaxia.it
COPYRIGHT
The DBSchema::Normalizer module is Copyright (c) 2001 Giuseppe Maxia,