DBSchema-Normalizer
Normalizer.pm view on Meta::CPAN
artist_id INT(11) NOT NULL,
PRIMARY KEY (album_id),
KEY genre (genre), KEY album (album), KEY artist_id (artist_id));
INSERT INTO albums
SELECT src.album_id, src.album, src.genre, lkp.artist_id
FROM tmp_albums src
INNER JOIN artists lkp ON (src.artist = lkp.artist);
mysql> describe artists;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| artist_id | int(11) | | PRI | NULL | auto_increment |
| artist | varchar(20) | | MUL | | |
+-----------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql> describe albums;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| album_id | int(11) | | PRI | NULL | auto_increment |
| album | varchar(30) | | MUL | | |
| genre | varchar(10) | | MUL | | |
| artist_id | int(11) | | MUL | 0 | |
+-----------+-------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
mysql> describe songs;
+----------+-------------+------+-----+----------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+----------+----------------+
| ID | int(11) | | PRI | NULL | auto_increment |
| title | varchar(40) | | MUL | | |
| duration | time | | | 00:00:00 | |
| size | int(11) | | | 0 | |
| album_id | int(11) | | MUL | 0 | |
+----------+-------------+------+-----+----------+----------------+
5 rows in set (0.00 sec)
It should be clear by now WHAT we have to do. Less clear is HOW. The above statements seem to imply that we must manually copy the field structure from the source table to the lookup and normalized tables.
Actually, that SQL code (except the DESCRIBEs) was produced by this very module and printed to STDOUT, so all I had to do was some cut-and-paste.
And then we are back to the question of the algorithm. If this is all SQL, where does Perl come in?
The answer is that Perl reduces the amount of information we need to give the database engine.
The information about the field structure and indexes is already in the database. Our Perl module (with a [not so] little help from the DBI) can extract that structure from the database and create the appropriate SQL statements.
On the practical side, this means that, before producing SQL code, this module gathers information about the source table. It issues a "SHOW FIELDS FROM tablename" and a "SHOW INDEX FROM tablename" statement, and parses their results to prepare the SQL statements shown above.
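This extraction step can be sketched in a few lines of DBI-flavored Perl. The helper below is hypothetical (it is not part of DBSchema::Normalizer's API), and the canned rows stand in for what $dbh->selectall_arrayref("SHOW FIELDS FROM tablename") would return from a live connection:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given rows shaped like the output of
# "SHOW FIELDS FROM tablename" (Field, Type, Null, Key, Default, Extra),
# build the column clauses for a CREATE TABLE statement, keeping only
# the requested fields.
sub columns_from_show_fields {
    my ($rows, @wanted) = @_;
    my %wanted = map { $_ => 1 } @wanted;
    my @clauses;
    for my $row (@$rows) {
        my ($field, $type, $null) = @$row[0, 1, 2];
        next unless $wanted{$field};
        push @clauses, "$field $type" . ($null eq 'YES' ? '' : ' NOT NULL');
    }
    return join ",\n  ", @clauses;
}

# With a live handle one would fetch the rows with something like
#   my $rows = $dbh->selectall_arrayref("SHOW FIELDS FROM MP3");
# Here canned rows are used for illustration.
my $rows = [
    [ 'artist', 'varchar(20)', '', 'MUL', '', '' ],
    [ 'album',  'varchar(30)', '', 'MUL', '', '' ],
    [ 'genre',  'varchar(10)', '', 'MUL', '', '' ],
];
print columns_from_show_fields($rows, qw(artist album genre)), "\n";
# artist varchar(20) NOT NULL,
#   album varchar(30) NOT NULL,
#   genre varchar(10) NOT NULL
```

The same idea applies to "SHOW INDEX": fetch the rows once, keep only the entries that matter for the destination table, and emit the corresponding KEY clauses.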
That's it. It seems a rather small contribution to your average workload, but if you ever have to deal with a project involving several large tables, with many fields, to be transformed into many normalized tables, I am sure you will appreciate the G...
BTW, this is the code used to produce the above SQL statements:
#!/usr/bin/perl -w
use strict;
use DBSchema::Normalizer;
my $norm = DBSchema::Normalizer->new ({
DSN => "DBI:mysql:music;host=localhost;"
. "mysql_read_default_file=$ENV{HOME}/.my.cnf",
src_table => "MP3",
index_field => "album_id",
lookup_fields => "artist,album,genre",
lookup_table => "tmp_albums",
dest_table => "songs",
copy_indexes => 1,
simulate => 1
});
$norm->do();
$norm->create_lookup_table ({
src_table => "tmp_albums",
index_field => "artist_id",
lookup_fields => "artist",
lookup_table => "artists"
});
$norm->create_normalized_table ({
src_table => "tmp_albums",
lookup_table => "artists",
index_field => "artist_id",
lookup_fields => "artist",
dest_table => "albums"
});
Twenty-five lines of code. Not bad for such a complicated task. But even that could have been replaced by these two one-liners:
perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer->snew(qw(localhost music MP3 \
album_id album,artist,genre tmp_albums songs 1 1 1))->do()'
perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer->snew(qw(localhost music \
tmp_albums artist_id artist artists albums 1 1 1))->do()'
(See below item "snew" for more details.)
One thing that this module won't do for you, though, is to decide which columns should stay with the source table and which ones should go to the lookup table. This is something for which you need to apply some database theory, and I don't expect you...
I am planning (in a very idle and distant way) another module that will analyze a database table and decide if it is a good design or not. The examples from commercial software I have seen so far did not impress me a lot. I am still convinced that hu...
=head2 Simulation mode
This module can do the data transfer for you, or you may want to run it in "simulation mode" by adding simulate => "1" to the constructor parameters. When in simulation mode, DBSchema::Normalizer will just print the necessary SQL statements to STDOUT instead of executing them.
=head2 EXPORT
new, snew, create_lookup_table, create_normalized_table
=head2 DEPENDENCIES
DBI, DBD::mysql
=head2 Architecture
The Normalizer doesn't enforce private data protection. You are only supposed to call the methods documented here as public. In the spirit of Perl OO philosophy, nobody will prevent you from calling private methods (the ones whose names begin with an underscore).
=head2 PORTABILITY
The algorithm used here is general. It was initially developed in C for an embedded RDBMS, and there is no reason to assume it won't work in any other database engine. However, the information about field types and indexes is, in this module, MySQL-specific.
To adapt this module to a different database, equivalents for the MySQL-specific SHOW INDEX and SHOW FIELDS statements must be provided. Also, the syntax for INNER JOIN might not be portable across databases.
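As a sketch of what such an adaptation could look like, the metadata query could be chosen per DBI driver, falling back to the standard information_schema catalog for engines that provide it. The helper name is hypothetical, not part of this module:

```perl
use strict;
use warnings;

# Hypothetical helper: return the SQL that lists a table's columns,
# depending on which DBI driver is in use. MySQL keeps its own
# SHOW FIELDS syntax; many other engines expose the standard
# information_schema catalog instead.
sub show_fields_sql {
    my ($driver, $table) = @_;
    return "SHOW FIELDS FROM $table" if lc($driver) eq 'mysql';
    return "SELECT column_name, data_type, is_nullable "
         . "FROM information_schema.columns "
         . "WHERE table_name = '$table' ORDER BY ordinal_position";
}

print show_fields_sql('mysql', 'MP3'), "\n";
print show_fields_sql('Pg',    'mp3'), "\n";
```

A real port would also need per-engine handling of index listings and of the CREATE TABLE type names, which is where most of the work would lie.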
$_dbh= DBI->connect($params->{DSN}, $params->{username},
$params->{password}, { RaiseError => 1});
}
my $self = bless {
verbose => 0,
copy_indexes => 0,
simulate => 0,
ignore_warning => 0,
_dbh => $_dbh # Being an object, $_dbh is already
# a reference. Doesn't need the "\"
# before it.
}, $class;
foreach my $key (keys (%$params)) {
croak "invalid parameter $key \n" unless exists $_accepted_params{$key};
$self->{$key} = $params->{$key};
}
if ($self->{simulate} eq "1") {
$self->{verbose} = "2";
}
elsif ($self->{simulate} ne "0") {
croak "invalid value for <simulate>\n";
}
return ($self);
}
=item snew
snew is a shortcut for new(). It is called with parameters passed by position instead of through a hash reference.
It is a "quick-and-dirty" ("dirty" being the operational word) method intended for impatient users who do not want to write a script.
B<Assumes that you have a configuration file for MySQL with username and password>.
Parameters are passed in this order:
host
database
source table
index field
lookup fields
lookup table
destination table
copy indexes
verbose
simulate
Here is an example of one-liner normalization call:
perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer->snew(qw(localhost music MP3 \
album_id album,artist,genre tmp_albums songs 1 1 1))->do()'
Note: ALL 10 parameters must be passed, or a "use of uninitialized value" warning is issued.
This one-liner is equivalent to the following script:
#!/usr/bin/perl
no warnings; # Yeah. No warnings. I said it is equivalent,
# not recommended.
no strict; # Yup. No strict either.
use DBSchema::Normalizer;
$norm = DBSchema::Normalizer->new (
{
DSN => "DBI:mysql:music;host=localhost;"
. "mysql_read_default_file=$ENV{HOME}/.my.cnf",
src_table => "MP3",
index_field => "album_id",
lookup_fields => "artist,album,genre",
lookup_table => "tmp_albums",
dest_table => "songs",
copy_indexes => 1,
verbose => 1,
simulate => 1,
});
$norm->do();
It is definitely not as safe as the normal call. However, TMTOWTDI, and it's your call. I am using it, but I don't recommend it. Read my lips: I DON'T RECOMMEND IT.
=cut
sub snew { # shortcut new (parameters called by position)
my ($class, $host, $db, $src_table, $index_field,
$lookup_fields, $lookup_table, $dest_table,
$copy_indexes, $verbose, $simulate) = @_;
my $DSN= "DBI:mysql:$db;host=$host;"
. "mysql_read_default_file=$ENV{HOME}/.my.cnf";
return $class->new ({
DSN => $DSN,
src_table => $src_table,
index_field => $index_field,
lookup_fields => $lookup_fields,
lookup_table => $lookup_table,
dest_table => $dest_table,
copy_indexes => $copy_indexes,
verbose => $verbose,
simulate => $simulate,
});
}
=for internal use
(The destructor will clean up DBI objects.)
=cut
# use Data::Dumper;
sub DESTROY {
my $self = shift;
# print STDERR Data::Dumper->Dump([$self],["InDestroyer"]);
$self->{_dbh}->disconnect();
undef $self->{_dbh};
}
=item do
do();
do() performs the normalization according to the parameters already received. Internally, it calls I<create_lookup_table()> and I<create_normalized_table()>.
It will fail if not enough parameters have been supplied to new().
=cut
sub do {
my $self = shift;
return 0 unless $self->_init();
$self->create_lookup_table();
$self->create_normalized_table();
return 1;
}
=for internal use
(Checks that the given keys in the internal blessed hash are defined)
=cut
sub _init_field {
my $self = shift;
my @fields = @_;
foreach (@fields) {
if (!defined $self->{$_}) {
$self->_verbose("0", "missing $_\n");
return 0;
}
}
return 1;