DBSchema-Normalizer


docs/Normalizer.txt  view on Meta::CPAN

     | artist    | varchar(20) |      | MUL |         |                |
     +-----------+-------------+------+-----+---------+----------------+
     2 rows in set (0.00 sec)
 
     mysql> describe albums;
     +-----------+-------------+------+-----+---------+----------------+
     | Field     | Type        | Null | Key | Default | Extra          |
     +-----------+-------------+------+-----+---------+----------------+
     | album_id  | int(11)     |      | PRI | NULL    | auto_increment |
     | album     | varchar(30) |      | MUL |         |                |
     | genre     | varchar(10) |      | MUL |         |                |
     | artist_id | int(11)     |      | MUL | 0       |                |
     +-----------+-------------+------+-----+---------+----------------+
     4 rows in set (0.00 sec)
 
     mysql> describe songs;
     +----------+-------------+------+-----+----------+----------------+
     | Field    | Type        | Null | Key | Default  | Extra          |
     +----------+-------------+------+-----+----------+----------------+
     | ID       | int(11)     |      | PRI | NULL     | auto_increment |
     | title    | varchar(40) |      | MUL |          |                |
     | duration | time        |      |     | 00:00:00 |                |
     | size     | int(11)     |      |     | 0        |                |
     | album_id | int(11)     |      | MUL | 0        |                |
     +----------+-------------+------+-----+----------+----------------+
     5 rows in set (0.00 sec)
 
    It should be clear now WHAT we have to do. Less clear is HOW. The above
    instructions seem to imply that we manually copy the field structure
    from the source table to the lookup and normalized tables.

    Actually, that SQL code (except the DESCRIBEs) was produced by this very
    module and printed to STDOUT, so that all I had to do was some
    cut-and-paste. And then we are back to the question of the algorithm. If
    this is all SQL, where is Perl involved? The answer is that Perl will
    reduce the amount of information we need to give to the database engine.
    The information about the field structure and indexes is already in the
    database. Our Perl module (with a [not so] little help from the DBI) can
    extract the structure from the database and create the appropriate SQL
    statements. On the practical side, this means that, before producing SQL
    code, this module will gather information about the source table. It
    will issue "SHOW FIELDS FROM tablename" and "SHOW INDEX FROM tablename"
    statements, and parse their results to prepare the operational code.
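
    As an illustration of that parsing step (a hand-written sketch, not the
    module's actual source), here is how the rows returned by "SHOW FIELDS
    FROM tablename" could be turned into column definitions ready to be
    embedded in a CREATE TABLE statement. The function name
    column_definitions and the hard-coded sample rows are mine, for the
    example only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn rows shaped like the output of "SHOW FIELDS FROM t"
# ({ Field => ..., Type => ..., Null => ..., Default => ..., Extra => ... })
# into a comma-separated list of column definitions.
# Illustrative sketch only -- the real module works through the DBI.
sub column_definitions {
    my ($rows) = @_;
    my @defs;
    for my $row (@$rows) {
        my $def = "$row->{Field} $row->{Type}";
        $def .= " NOT NULL"
            unless $row->{Null} && $row->{Null} eq 'YES';
        $def .= " DEFAULT '$row->{Default}'"
            if defined $row->{Default} && $row->{Extra} ne 'auto_increment';
        $def .= " $row->{Extra}" if $row->{Extra};
        push @defs, $def;
    }
    return join ",\n", @defs;
}

# With a live connection, the rows would come from something like:
#   my $rows = $dbh->selectall_arrayref("SHOW FIELDS FROM songs",
#                                       { Slice => {} });
# Here they are hard-coded from the DESCRIBE output shown above.
my $rows = [
    { Field => 'ID',    Type => 'int(11)',
      Null  => '',      Default => undef, Extra => 'auto_increment' },
    { Field => 'title', Type => 'varchar(40)',
      Null  => '',      Default => '',    Extra => '' },
];
print column_definitions($rows), "\n";
```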

    That's it. It seems a rather small contribution to your average
    workload, but if you ever have to deal with a project involving several
    large tables, with many fields, to be transformed into many normalized
    tables, I am sure you will appreciate the GPL (Golden Perl Laziness)
    behind this module.

    BTW, this is the code used to produce the above SQL statements:

            #!/usr/bin/perl -w
            use strict;

            use DBSchema::Normalizer;

            my $norm = DBSchema::Normalizer->new ({
                    DSN  => "DBI:mysql:music;host=localhost;"
                             . "mysql_read_default_file=$ENV{HOME}/.my.cnf", 
                    src_table     => "MP3",
                    index_field   => "album_id",
                    lookup_fields => "artist,album,genre",
                    lookup_table  => "tmp_albums", 
                    dest_table    => "songs",
                    copy_indexes  =>  1,
                    simulate      =>  1
             });

            $norm->do();

            $norm->create_lookup_table ({ 
              src_table     => "tmp_albums",
              index_field   => "artist_id",
              lookup_fields => "artist",
              lookup_table  => "artists"
            });

            $norm->create_normalized_table ({
              src_table     => "tmp_albums",
              lookup_table  => "artists",
              index_field   => "artist_id",
              lookup_fields => "artist",
              dest_table    => "albums"
            });

    Twenty-five lines of code. Not bad for such a complicated task. But even
    that could have been replaced by these two one-liners:

     perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer->snew(qw(localhost music MP3 \ 
           album_id album,artist,genre tmp_albums songs 1 1 1))->do()'
        
     perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer->snew(qw(localhost music \ 
        tmp_albums artist_id artist artists albums 1 1 1))->do()'

    (See below item "snew" for more details.)

    One thing that this module won't do for you, though, is to decide which
    columns should stay with the source table and which ones should go to
    the lookup table. This is something for which you need to apply some
    database theory, and I don't expect you to know it (unless you happen
    to be E.F. Codd) without having studied it, either at school or
    independently. I am planning (in a very idle and distant way) another
    module that will analyze a database table and decide if it is a good
    design or not. The examples from commercial software I have seen so far
    did not impress me a lot. I am still convinced that humans are better
    than machines at this task. But, hey! Ten years ago I was convinced that
    humans were much better than machines at chess, and then, not long ago,
    I had to watch an IBM box doing very nasty things to Garry Kasparov.
    So maybe I'll change my mind. In the meantime, I am enjoying my present
    intellectual superiority and I keep analyzing databases with the same
    pleasure that I once felt when solving a chess problem.

  Simulation mode

    This module can do the data transfer for you, or you may want to run it
    in "simulation mode", by adding simulate => "1" to the constructor
    parameters. When in simulation mode, the DBSchema::Normalizer will just
    print the necessary SQL statements to STDOUT, without passing them to
    the database engine. You can thus check the queries and eventually

                                                 It may speed up the insertion
                                                 for large tables.
                verbose         if "1", messages indicating what is going on
                                will be sent to STDOUT.
                                Using "2", even more verbose information is
                                given (all queries are printed before
                                execution).
                                Level "3" will also show details about
                                src_table fields and indexes.
                ignore_warning  if "1", warnings about missing indexes on
                                lookup fields are ignored, and the requested
                                operation is carried out even at the price of
                                a long wait. Default "0".
                simulate        if "1", no operation is carried out, but the
                                queries are printed to STDOUT (as with
                                verbose => 2).

        note: src_table, dest_table and lookup_table cannot be called *src*
        or *lkp*, which are used internally by the Normalizer. If such names
        are given, a fatal error is issued.

        If the keys for src_table, index_field, lookup_table and
        lookup_fields are missing, they can (actually they MUST) be provided
        later through calls to create_lookup_table() and
        create_normalized_table().

    snew
        snew is a shortcut for new(). It is called with parameters passed
        by position instead of through a hash reference. It is a
        "quick-and-dirty" ("dirty" being the operative word) method
        intended for impatient users who do not want to write a script.
        It assumes that you have a configuration file for MySQL with
        username and password. Parameters are passed in this order:

                host
                database
                source table
                index field
                lookup fields
                lookup table
                destination table
                copy indexes
                verbose
                simulate
        
        Here is an example of one-liner normalization call:

         perl -e 'use DBSchema::Normalizer; DBSchema::Normalizer->snew(qw(localhost music MP3 \
            album_id album,artist,genre tmp_albums songs 1 1 1))->do()'

        Note: ALL 10 parameters must be passed, or a "use of uninitialized
        value" error is issued.

        This one-liner is equivalent to the following script:

                #!/usr/bin/perl 
                no warnings; # Yeah. No warnings. I said it is equivalent, 
                                         # not recommended.
                no strict;   # Yup. No strict either. 
                use DBSchema::Normalizer;
                $norm = DBSchema::Normalizer->new ( 
                {
                        DSN => "DBI:mysql:music;host=localhost;"
                                . "mysql_read_default_file=$ENV{HOME}/.my.cnf", 
                        src_table     => "MP3",
                        index_field   => "album_id",
                        lookup_fields => "artist,album,genre",
                        lookup_table  => "tmp_albums", 
                        dest_table    => "songs",
                        copy_indexes  =>  1,
                        verbose       =>  1,
                        simulate      =>  1,
                 });
                $norm->do();

        It is definitely not as safe as the normal call. However, TMTOWTDI,
        and it's your call. I am using it, but I don't recommend it. Read my
        lips: I DON'T RECOMMEND IT.

    do  do();

        do() performs the normalization according to the parameters already
        received. Internally, it calls *create_lookup_table()* and
        *create_normalized_table()*.

        It will fail if not enough parameters have been supplied to new().

    create_normalized_table()
        create_normalized_table() will create a 2nd normal form table,
        getting data from a source table and a lookup table. Lookup fields
        in the source table will be replaced by the corresponding index
        field in the lookup table.

        If called without parameters, assumes the parameters passed to the
        object constructor.

        Parameters are passed as a hash reference, and are the same given
        to new() except *DSN*, *username* and *password*. None are
        compulsory here. The missing ones are taken from the constructor.
        However, a check is done to ensure that all the required parameters
        are available, whether they come from the constructor or from this
        call.
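
        Conceptually, filling the normalized table is an INSERT ... SELECT
        that joins the source table to the lookup table, replacing the
        lookup fields with the index field. The following sketch builds the
        equivalent SQL by hand for the songs example above; the function
        normalize_sql and the keep_fields parameter (the columns that stay
        in the destination table) are my own illustrative inventions, not
        part of the module's API:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the conceptual "fill the normalized table" statement:
# every lookup field in the source row is replaced by the lookup
# table's index field, matched through an inner join.
# (Hand-written sketch; not the module's literal output.)
sub normalize_sql {
    my (%p) = @_;
    my $on = join " AND ",
        map { "src.$_ = lkp.$_" } @{ $p{lookup_fields} };
    return "INSERT INTO $p{dest_table}\n"
         . "SELECT "
         . join(", ", map { "src.$_" } @{ $p{keep_fields} })
         . ", lkp.$p{index_field}\n"
         . "FROM $p{src_table} src\n"
         . "INNER JOIN $p{lookup_table} lkp ON ($on)";
}

print normalize_sql(
    src_table     => 'MP3',
    dest_table    => 'songs',
    lookup_table  => 'tmp_albums',
    index_field   => 'album_id',
    lookup_fields => [qw(artist album genre)],
    keep_fields   => [qw(ID title duration size)],  # columns that stay
), "\n";
```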

    create_lookup_table
        create_lookup_table() will create a lookup table, extracting
        repeating fields from a 1st normal form table. A numeric primary key
        is created.

        When called without parameters, assumes the values passed to the
        object constructor (*new*).

        Parameters are passed as a hash reference, and should include the
        following

                src_table      the table to take the values from
                lookup_fields  the fields to extract
                lookup_table   the table to create
                index_field    the primary key to be created (it will be a
                               foreign key in src_table)
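
        Conceptually, this amounts to a CREATE TABLE with an auto-increment
        primary key, followed by an INSERT of the DISTINCT lookup values.
        Here is a hand-written sketch of the equivalent SQL for the artists
        example above (illustrative only; the real module copies the column
        types from the source table instead of hard-coding them as done
        here):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Conceptual equivalent of create_lookup_table() for the artists
# example: src_table tmp_albums, lookup_fields artist,
# lookup_table artists, index_field artist_id.
my @sql = (
    "CREATE TABLE artists (
       artist_id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
       artist VARCHAR(20) NOT NULL,
       KEY artist (artist)
     )",
    "INSERT INTO artists (artist)
     SELECT DISTINCT artist FROM tmp_albums",
);
print "$_;\n" for @sql;
```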

AUTHOR
    Giuseppe Maxia, giuseppe@gmaxia.it

COPYRIGHT
    The DBSchema::Normalizer module is Copyright (c) 2001 Giuseppe Maxia,


