DBSchema-Normalizer
view release on metacpan or search on metacpan
docs/Normalizer.html view on Meta::CPAN
KEY genre (genre), KEY album (album), KEY artist_id (artist_id));</PRE>
<PRE>
INSERT INTO albums
SELECT src.album_id, src.album, src.genre, artist_id
FROM tmp_albums src
INNER JOIN artists lkp ON (src.artist =lkp.artist);</PRE>
<PRE>
mysql> describe artists;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| artist_id | int(11) | | PRI | NULL | auto_increment |
| artist | varchar(20) | | MUL | | |
+-----------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
</PRE>
<PRE>
mysql> describe albums;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| album_id | int(11) | | PRI | NULL | auto_increment |
| album | varchar(30) | | MUL | | |
| genre | varchar(10) | | MUL | | |
| artist_id | int(11) | | MUL | 0 | |
+-----------+-------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)</PRE>
<PRE>
mysql> describe songs;
+----------+-------------+------+-----+----------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+----------+----------------+
| ID | int(11) | | PRI | NULL | auto_increment |
| title | varchar(40) | | MUL | | |
| duration | time | | | 00:00:00 | |
| size | int(11) | | | 0 | |
| album_id | int(11) | | MUL | 0 | |
+----------+-------------+------+-----+----------+----------------+
5 rows in set (0.00 sec)</PRE>
<PRE>
It should be clear now WHAT we have to do. Less clear is HOW. The above instructions seem to imply that we manually copy the field structure from the source table to the lookup and normalized tables.</PRE>
<P>Actually, that SQL code (except the DESCRIBEs) was produced by this very module and printed to the STDOUT, so that all I had to do was some cut-and-paste.
And then we are back to the question of the algorithm. If this is all SQL, where is Perl involved?
The answer is that Perl will reduce the amount of information we need to give to the database engine.
The information about the field structure and indexes is already in the database. Our Perl module (with a [not so] little help from the DBI) can extract the structure from the database and create the appropriate SQL statements.
On the practical side, this means that, before producing SQL code, this module will gather information about the source table. It will issue a ``SHOW FIELDS FROM tablename'' and a ``SHOW INDEX FROM tablename'' statements, and parse their results to p...
<P>That's it. It seems a rather small contribution to your average workload, but if you ever have to deal with a project involving several large tables, with many fields, to be transformed into many normalized tables, I am sure you will appreciate th...
<P>BTW, this is the code used to produce the above SQL statements:</P>
<PRE>
#!/usr/bin/perl -w
use strict;</PRE>
<PRE>
use DBSchema::Normalizer;</PRE>
<PRE>
my $norm = DBSchema::Normalizer->new ({
DSN => "DBI:mysql:music;host=localhost;"
. "mysql_read_default_file=$ENV{HOME}/.my.cnf",
src_table => "MP3",
index_field => "album_id",
lookup_fields => "artist,album,genre",
lookup_table => "tmp_albums",
dest_table => "songs",
copy_indexes => 1,
simulate => 1
});</PRE>
<PRE>
$norm->do();</PRE>
<PRE>
$norm->create_lookup_table ({
src_table => "tmp_albums",
index_field => "artist_id",
lookup_fields => "artist",
lookup_table => "artists"
});</PRE>
<PRE>
$norm->create_normalized_table ({
src_table => "tmp_albums",
lookup_table => "artists",
index_field => "artist_id",
lookup_fields => "artist",
dest_table => "albums"
});</PRE>
<P>Twenty-five lines of code. Not bad for such a complicated task. But even that could have been replaced by these two one-liners:</P>
<PRE>
perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer->snew(qw(localhost music MP3 \
album_id album,artist,genre tmp_albums songs 1 1 1))->do()'
</PRE>
<PRE>
perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer->snew(qw(localhost music \
tmp_albums artist_id artist artists albums 1 1 1))->do()'</PRE>
<P>(See below item ``snew'' for more details.)
</P>
<PRE>
One thing that this module won't do for you, though, is to decide which columns should stay with the source table and which ones should go to the lookup table. This is something for which you need to apply some database theory, and I don't expect you...
I am planning (in a very idle and distant way) another module that will analyze a database table and decide if it is a good design or not. The examples from commercial software I have seen so far did not impress me a lot. I am still convinced that hu...
<P>
<H2><A NAME="simulation mode">Simulation mode</A></H2>
<P>This module can do the data transfer for you, or you may want to run it in ``simulation mode'', by adding simulate => ``1'' to the constructor parameters. When in simulation mode, the DBSchema::Normalizer will just print the necessary SQL state...
<P>
<H2><A NAME="export">EXPORT</A></H2>
<P>new, snew, create_lookup_table, create_normalized_table</P>
<P>
<H2><A NAME="dependencies">DEPENDENCIES</A></H2>
<P>DBI, DBD::mysql</P>
<P>
<H2><A NAME="architecture">Architecture</A></H2>
<P>The Normalizer doesn't enforce private data protection. You are only supposed to call the methods which are documented here as public. In the spirit of Perl OO philosophy, nobody will prevent you from calling private methods (the ones beginning wi...
<P>
<H2><A NAME="portability">PORTABILITY</A></H2>
<P>The algorithm used here is general. It was initially developed in C for an embedded RDBMS and there is no reason to assume that it won't work in any other database engine. However, the information about field types and indexes is, in this module, ...
To adapt this module for a different database, corresponding SQL statements for the MYSQL specific SHOW INDEX and SHOW FIELDS should be provided. Also the syntax for INNER JOIN might not be portable across databases.</P>
<P>
<H2><A NAME="caveat">CAVEAT</A></H2>
<P>As always, when dealing with databases, some caution is needed.
The <A HREF="#item_create_lookup_table"><CODE>create_lookup_table()</CODE></A> method will <STRONG>drop the lookup table</STRONG>, if exists. Be careful about the name you supply for this purpose. If you want to use an existing lookup table (whose da...
docs/Normalizer.html view on Meta::CPAN
"yes" or "1" : indexes from the source table will
be immediately replicated to the
destination table
"later" or "2" : indexes will be created after the
data transfer,
as an ALTER TABLE statement.
It may speed up the insertion
for large tables.
verbose if "1", messages indicating what is going on
will be sent to STDOUT.
Using "2", even more verbose information is
given (all queries printed before execution);
Level "3" will also show details about src_table
fields and indexes;
ignore_warning if "1", warning on missing indexes on lookup fields
are ignored, and the requested operation carried
out even at a price of long waiting. Default "0"
simulate if "1", no operation is carried out
but the queries are printed to STDOUT (as in
verbose => 2)</PRE>
<P><STRONG>note</STRONG>: src_table, dest_table and lookup_table <STRONG>cannot</STRONG> be called <EM>src</EM> or <EM>lkp</EM>, which are used internally by the Normalizer. If such names are given, a fatal error is issued.</P>
<P>If the keys for src_table, index_field, lookup table and fields are missing, they can (actually they MUST) be later provided by calls to <A HREF="#item_create_lookup_table"><CODE>create_lookup_table()</CODE></A> and create_normalized_table().</P>
<DT><STRONG><A NAME="item_snew">snew</A></STRONG><BR>
<DD>
snew is a shortcut for new(). It is called with parameters passed by position instead of using a hash reference.
It is a ``quick-and-dirty'' (``dirty'' being the operational word) method intended for the impatient who does not want to write a script.
<STRONG>Assumes that you have a configuration file for MySQL with username and password</STRONG>.
Parameters are passed in this order:
<PRE>
host
database
source table
index field
lookup fields
lookup table
destination table
copy indexes
verbose
simulate
</PRE>
<PRE>
Here is an example of one-liner normalization call:</PRE>
<PRE>
perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer->snew(qw(localhost music MP3 \
album_id album,artist,genre tmp_albums songs 1 1 1))->do()'</PRE>
<P>Note: ALL 11 parameters must be passed, or an ``use of uninitialized value'' error is issued.</P>
<P>This one-liner is equivalent to the following script:
</P>
<PRE>
#!/usr/bin/perl
no warnings; # Yeah. No warnings. I said it is equivalent,
# not recommended.
no strict; # Yup. No strict either.
use DBSchema::Normalizer;
$norm = DBSchema::Normalizer->new (
{
DSN => "DBI:mysql:music;host=localhost;"
. "mysql_read_default_file=$ENV{HOME}/.my.cnf",
src_table => "MP3",
index_field => "album_id",
lookup_fields => "artist,album,genre",
lookup_table => "tmp_albums",
dest_table => "songs",
copy_indexes => 1,
verbose => 1,
simulate => 1,
});
$norm->do();</PRE>
<P>It is definitely not as safe as the normal call. However, TMTOWTDI, and it's your call. I am using it, but I don't recommend it. Read my lips: I DON'T RECOMMEND IT.</P>
<P></P>
<DT><STRONG><A NAME="item_do">do</A></STRONG><BR>
<DD>
do();
<P><A HREF="#item_do"><CODE>do()</CODE></A> performs the Normalization according to the parameters already received. Internally calls <EM>create_lookup_table()</EM> and <EM>create_normalized_table()</EM></P>
<P>Will fail if not enough parameters have been supplied to <A HREF="#item_new"><CODE>new()</CODE></A></P>
<P></P>
<DT><STRONG><A NAME="item_create_normalized_table"><CODE>create_normalized_table()</CODE></A></STRONG><BR>
<DD>
<A HREF="#item_create_normalized_table"><CODE>create_normalized_table()</CODE></A> will create a 2nd normal form table, getting data from a source table and a lookup table.
Lookup fields in the source table will be replaced by the corresponding index field in the lookup table.
<P>If called without parameters, assumes the parameters passed to the object constructor.</P>
<P>Parameters are passed as a hash reference, and are the same given to <A HREF="#item_new"><CODE>new()</CODE></A> except <EM>DSN</EM>, <EM>username</EM> and <EM>password</EM>. None are compulsory here. The missing ones are taken from the constructor...
<P></P>
<DT><STRONG><A NAME="item_create_lookup_table">create_lookup_table</A></STRONG><BR>
<DD>
<A HREF="#item_create_lookup_table"><CODE>create_lookup_table()</CODE></A> will create a lookup table, extracting repeating fields from a 1st normal form table. A numeric primary key is created.
<P>When called without parameters, assumes the values passed to the object constructor (<EM>new</EM>).</P>
<P>Parameters are passed as a hash reference, and should include the following</P>
<PRE>
src_table table where to take the values from
lookup_fields which fields to take
lookup_table table to create
index_field primary key (will be foreign key in src_table)
to be created</PRE>
<P></P></DL>
<P>
<HR>
<H1><A NAME="author">AUTHOR</A></H1>
<P>Giuseppe Maxia, <A HREF="mailto:giuseppe@gmaxia.it">giuseppe@gmaxia.it</A></P>
<P>
<HR>
<H1><A NAME="copyright">COPYRIGHT</A></H1>
<P>The DBSchema::Normalizer module is Copyright (c) 2001 Giuseppe Maxia,
Sardinia, Italy. All rights reserved.
</P>
<PRE>
You may distribute this software under the terms of either the GNU
General Public License version 2 or the Artistic License, as
specified in the Perl README file.</PRE>
<PRE>
The embedded and encosed documentation is released under
the GNU FDL Free Documentation License 1.1</PRE>
</BODY>
</HTML>
( run in 0.886 second using v1.01-cache-2.11-cpan-39bf76dae61 )