mysql_read_default_file results from the CPAN

mysql_read_defaul...

DBSchema-Normalizer

view release on metacpan or search on metacpan

        KEY genre (genre), KEY album (album), KEY artist_id (artist_id));</PRE>
<PRE>

        INSERT INTO albums 
        SELECT src.album_id, src.album, src.genre, artist_id 
        FROM tmp_albums src 
        INNER JOIN artists lkp ON (src.artist =lkp.artist);</PRE>
<PRE>
 mysql&gt; describe artists;
 +-----------+-------------+------+-----+---------+----------------+
 | Field     | Type        | Null | Key | Default | Extra          |
 +-----------+-------------+------+-----+---------+----------------+
 | artist_id | int(11)     |      | PRI | NULL    | auto_increment |
 | artist    | varchar(20) |      | MUL |         |                |
 +-----------+-------------+------+-----+---------+----------------+
 2 rows in set (0.00 sec)
</PRE>
<PRE>

 mysql&gt; describe albums;
 +-----------+-------------+------+-----+---------+----------------+
 | Field     | Type        | Null | Key | Default | Extra          |
 +-----------+-------------+------+-----+---------+----------------+
 | album_id  | int(11)     |      | PRI | NULL    | auto_increment |
 | album     | varchar(30) |      | MUL |         |                |
 | genre     | varchar(10) |      | MUL |         |                |
 | artist_id | int(11)     |      | MUL | 0       |                |
 +-----------+-------------+------+-----+---------+----------------+
 4 rows in set (0.00 sec)</PRE>
<PRE>

 mysql&gt; describe songs;
 +----------+-------------+------+-----+----------+----------------+
 | Field    | Type        | Null | Key | Default  | Extra          |
 +----------+-------------+------+-----+----------+----------------+
 | ID       | int(11)     |      | PRI | NULL     | auto_increment |
 | title    | varchar(40) |      | MUL |          |                |
 | duration | time        |      |     | 00:00:00 |                |
 | size     | int(11)     |      |     | 0        |                |
 | album_id | int(11)     |      | MUL | 0        |                |
 +----------+-------------+------+-----+----------+----------------+
 5 rows in set (0.00 sec)</PRE>
<PRE>

It should be clear now WHAT we have to do. Less clear is HOW. The above instructions seem to imply that we manually copy the field structure from the source table to the lookup and normalized tables.</PRE>
<P>Actually, that SQL code (except the DESCRIBEs) was produced by this very module and printed to the STDOUT, so that all I had to do was some cut-and-paste.
And then we are back to the question of the algorithm. If this is all SQL, where is Perl involved?
The answer is that Perl will reduce the amount of information we need to give to the database engine.
The information about the field structure and indexes is already in the database. Our Perl module (with a [not so] little help from the DBI) can extract the structure from the database and create the appropriate SQL statements.
On the practical side, this means that, before producing SQL code, this module will gather information about the source table. It will issue a ``SHOW FIELDS FROM tablename'' and a ``SHOW INDEX FROM tablename'' statements, and parse their results to p...
<P>That's it. It seems a rather small contribution to your average workload, but if you ever have to deal with a project involving several large tables, with many fields, to be transformed into many normalized tables, I am sure you will appreciate th...
<P>BTW, this is the code used to produce the above SQL statements:</P>
<PRE>
        #!/usr/bin/perl -w
        use strict;</PRE>
<PRE>
        use DBSchema::Normalizer;</PRE>
<PRE>
        my $norm = DBSchema::Normalizer-&gt;new ({
                DSN  =&gt; &quot;DBI:mysql:music;host=localhost;&quot;
                         . &quot;mysql_read_default_file=$ENV{HOME}/.my.cnf&quot;, 
                src_table     =&gt; &quot;MP3&quot;,
                index_field   =&gt; &quot;album_id&quot;,
                lookup_fields =&gt; &quot;artist,album,genre&quot;,
                lookup_table  =&gt; &quot;tmp_albums&quot;, 
                dest_table    =&gt; &quot;songs&quot;,
                copy_indexes  =&gt;  1,
                simulate      =&gt;  1
         });</PRE>
<PRE>
        $norm-&gt;do();</PRE>
<PRE>
        $norm-&gt;create_lookup_table ({ 
          src_table     =&gt; &quot;tmp_albums&quot;,
          index_field   =&gt; &quot;artist_id&quot;,
          lookup_fields =&gt; &quot;artist&quot;,
          lookup_table  =&gt; &quot;artists&quot;
        });</PRE>
<PRE>
        $norm-&gt;create_normalized_table ({
          src_table     =&gt; &quot;tmp_albums&quot;,
          lookup_table  =&gt; &quot;artists&quot;,
          index_field   =&gt; &quot;artist_id&quot;,
          lookup_fields =&gt; &quot;artist&quot;,
          dest_table    =&gt; &quot;albums&quot;
        });</PRE>
<P>Twenty-five lines of code. Not bad for such a complicated task. But even that could have been replaced by these two one-liners:</P>
<PRE>
 perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer-&gt;snew(qw(localhost music MP3 \ 
       album_id album,artist,genre tmp_albums songs 1 1 1))-&gt;do()'
</PRE>
<PRE>

 perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer-&gt;snew(qw(localhost music \ 
    tmp_albums artist_id artist artists albums 1 1 1))-&gt;do()'</PRE>
<P>(See below item ``snew'' for more details.)
</P>
<PRE>

One thing that this module won't do for you, though, is to decide which columns should stay with the source table and which ones should go to the lookup table. This is something for which you need to apply some database theory, and I don't expect you...
I am planning (in a very idle and distant way) another module that will analyze a database table and decide if it is a good design or not. The examples from commercial software I have seen so far did not impress me a lot. I am still convinced that hu...
<P>
<H2><A NAME="simulation mode">Simulation mode</A></H2>
<P>This module can do the data transfer for you, or you may want to run it in ``simulation mode'', by adding simulate =&gt; ``1'' to the constructor parameters. When in simulation mode, the DBSchema::Normalizer will just print the necessary SQL state...
<P>
<H2><A NAME="export">EXPORT</A></H2>
<P>new, snew, create_lookup_table, create_normalized_table</P>
<P>
<H2><A NAME="dependencies">DEPENDENCIES</A></H2>
<P>DBI, DBD::mysql</P>
<P>
<H2><A NAME="architecture">Architecture</A></H2>
<P>The Normalizer doesn't enforce private data protection. You are only supposed to call the methods which are documented here as public. In the spirit of Perl OO philosophy, nobody will prevent you from calling private methods (the ones beginning wi...
<P>
<H2><A NAME="portability">PORTABILITY</A></H2>
<P>The algorithm used here is general. It was initially developed in C for an embedded RDBMS and there is no reason to assume that it won't work in any other database engine. However, the information about field types and indexes is, in this module, ...
To adapt this module for a different database, corresponding SQL statements for the MYSQL specific SHOW INDEX and SHOW FIELDS should be provided. Also the syntax for INNER JOIN might not be portable across databases.</P>
<P>
<H2><A NAME="caveat">CAVEAT</A></H2>
<P>As always, when dealing with databases, some caution is needed. 
The <A HREF="#item_create_lookup_table"><CODE>create_lookup_table()</CODE></A> method will <STRONG>drop the lookup table</STRONG>, if exists. Be careful about the name you supply for this purpose. If you want to use an existing lookup table (whose da...

docs/Normalizer.html view on Meta::CPAN

                        &quot;yes&quot; or &quot;1&quot;   : indexes from the source table will
                                         be immediately replicated to the 
                                         destination table
                        &quot;later&quot; or &quot;2&quot; : indexes will be created after the 
                                                         data transfer,
                                         as an ALTER TABLE statement. 
                                         It may speed up the insertion
                                         for large tables.
        verbose         if &quot;1&quot;, messages indicating what is going on
                        will be sent to STDOUT. 
                        Using &quot;2&quot;, even more verbose information is 
                        given (all queries printed before execution);
                        Level &quot;3&quot; will also show details about src_table 
                        fields and indexes;
        ignore_warning  if &quot;1&quot;, warning on missing indexes on lookup fields 
                        are ignored, and the requested operation carried 
                                        out even at a price of long waiting. Default &quot;0&quot;
        simulate        if &quot;1&quot;, no operation is carried out
                        but the queries are printed to STDOUT (as in 
                                        verbose =&gt; 2)</PRE>
<P><STRONG>note</STRONG>: src_table, dest_table and lookup_table <STRONG>cannot</STRONG> be called <EM>src</EM> or <EM>lkp</EM>, which are used internally by the Normalizer. If such names are given, a fatal error is issued.</P>
<P>If the keys for src_table, index_field, lookup table and fields are missing, they can (actually they MUST) be later provided by calls to <A HREF="#item_create_lookup_table"><CODE>create_lookup_table()</CODE></A> and create_normalized_table().</P>
<DT><STRONG><A NAME="item_snew">snew</A></STRONG><BR>
<DD>
snew is a shortcut for new(). It is called with parameters passed by position instead of using a hash reference.
It is a ``quick-and-dirty'' (``dirty'' being the operational word) method intended for the impatient who does not want to write a script.
<STRONG>Assumes that you have a configuration file for MySQL with username and password</STRONG>.
Parameters are passed in this order:
<PRE>
        host
        database
        source table
        index field
        lookup fields
        lookup table
        destination table
        copy indexes
        verbose
        simulate
</PRE>
<PRE>

Here is an example of one-liner normalization call:</PRE>
<PRE>

 perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer-&gt;snew(qw(localhost music MP3 \
    album_id album,artist,genre tmp_albums songs 1 1 1))-&gt;do()'</PRE>
<P>Note: ALL 11 parameters must be passed, or an ``use of uninitialized value'' error is issued.</P>
<P>This one-liner is equivalent to the following script:
</P>
<PRE>

        #!/usr/bin/perl 
        no warnings; # Yeah. No warnings. I said it is equivalent, 
                                 # not recommended.
        no strict;   # Yup. No strict either. 
        use DBSchema::Normalizer;
        $norm = DBSchema::Normalizer-&gt;new ( 
        {
                DSN =&gt; &quot;DBI:mysql:music;host=localhost;&quot;
                        . &quot;mysql_read_default_file=$ENV{HOME}/.my.cnf&quot;, 
                src_table     =&gt; &quot;MP3&quot;,
                index_field   =&gt; &quot;album_id&quot;,
                lookup_fields =&gt; &quot;artist,album,genre&quot;,
                lookup_table  =&gt; &quot;tmp_albums&quot;, 
                dest_table    =&gt; &quot;songs&quot;,
                copy_indexes  =&gt;  1,
                verbose       =&gt;  1,
                simulate      =&gt;  1,
         });
        $norm-&gt;do();</PRE>
<P>It is definitely not as safe as the normal call. However, TMTOWTDI, and it's your call. I am using it, but I don't recommend it. Read my lips: I DON'T RECOMMEND IT.</P>
<P></P>
<DT><STRONG><A NAME="item_do">do</A></STRONG><BR>
<DD>
do();
<P><A HREF="#item_do"><CODE>do()</CODE></A> performs the Normalization according to the parameters already received. Internally calls <EM>create_lookup_table()</EM> and <EM>create_normalized_table()</EM></P>
<P>Will fail if not enough parameters have been supplied to <A HREF="#item_new"><CODE>new()</CODE></A></P>
<P></P>
<DT><STRONG><A NAME="item_create_normalized_table"><CODE>create_normalized_table()</CODE></A></STRONG><BR>
<DD>
<A HREF="#item_create_normalized_table"><CODE>create_normalized_table()</CODE></A> will create a 2nd normal form table, getting data from a source table and a lookup table.
Lookup fields in the source table will be replaced by the corresponding index field in the lookup table.
<P>If called without parameters, assumes the parameters passed to the object constructor.</P>
<P>Parameters are passed as a hash reference, and are the same given to <A HREF="#item_new"><CODE>new()</CODE></A> except <EM>DSN</EM>, <EM>username</EM> and <EM>password</EM>. None are compulsory here. The missing ones are taken from the constructor...
<P></P>
<DT><STRONG><A NAME="item_create_lookup_table">create_lookup_table</A></STRONG><BR>
<DD>
<A HREF="#item_create_lookup_table"><CODE>create_lookup_table()</CODE></A> will create a lookup table, extracting repeating fields from a 1st normal form table. A numeric primary key is created.
<P>When called without parameters, assumes the values passed to the object constructor (<EM>new</EM>).</P>
<P>Parameters are passed as a hash reference, and should include the following</P>
<PRE>
        src_table      table where to take the values from
        lookup_fields  which fields to take
        lookup_table   table to create
        index_field        primary key (will be foreign key in src_table) 
                   to be created</PRE>
<P></P></DL>
<P>
<HR>
<H1><A NAME="author">AUTHOR</A></H1>
<P>Giuseppe Maxia, <A HREF="mailto:giuseppe@gmaxia.it">giuseppe@gmaxia.it</A></P>
<P>
<HR>
<H1><A NAME="copyright">COPYRIGHT</A></H1>
<P>The DBSchema::Normalizer module is Copyright (c) 2001 Giuseppe Maxia,
Sardinia, Italy. All rights reserved.
</P>
<PRE>

You may distribute this software under the terms of either the GNU
General Public License version 2 or the Artistic License, as
specified in the Perl README file.</PRE>
<PRE>

The embedded and encosed documentation is released under 
the GNU FDL Free Documentation License 1.1</PRE>

</BODY>

</HTML>

( run in 0.440 second using v1.01-cache-2.11-cpan-bbcb1afb8fc )