AI-Genetic

 view release on metacpan or  search on metacpan

Genetic.pm  view on Meta::CPAN

# useful little methods to set/query parameters.
sub size       { $_[0]{POPSIZE}    = $_[1] if defined $_[1]; $_[0]{POPSIZE}   }
sub crossProb  { $_[0]{CROSSRATE}  = $_[1] if defined $_[1]; $_[0]{CROSSRATE} }
sub mutProb    { $_[0]{MUTPROB}    = $_[1] if defined $_[1]; $_[0]{MUTPROB}   }
sub indType    { $_[0]{INDIVIDUAL} }
sub generation { $_[0]{GENERATION} }

# sub inject():
# This method is used to add individuals to the current population.
# The point of it is that sometimes the population gets stagnant,
# so it could be useful add "fresh blood".
# Takes a variable number of arguments. The first argument is the
# total number, N, of new individuals to add. The remaining arguments
# are genomes to inject. There must be at most N genomes to inject.
# If the number, n, of genomes to inject is less than N, N - n random
# genomes are added. Perhaps an example will help?
# returns 1 on success and undef on error.

sub inject {
  my ($self, $count, @genomes) = @_;

Genetic.pm  view on Meta::CPAN

         my $fitness;
         # assign a number to $fitness based on the @$genes
         # ...

         return $fitness;
      }

      sub terminateFunc {
         my $ga = shift;

         # terminate if reached some threshold.
         return 1 if $ga->getFittest->score > $THRESHOLD;
         return 0;
      }

=head1 DESCRIPTION

This module implements a Genetic Algorithm (GA) in pure Perl.
Other Perl modules that achieve the same thing (perhaps better,
perhaps worse) do exist. Please check CPAN. I mainly wrote this
module to satisfy my own needs, and to learn something about GAs
along the way.

B<PLEASE NOTE:> As of v0.02, AI::Genetic has been re-written from
scratch to be more modular and expandable. To achieve this, I had
to modify the API, so it is not backward-compatible with v0.01.
As a result, I do not plan on supporting v0.01.

I will not go into the details of GAs here, but here are the
bare basics. Plenty of information can be found on the web.

In a GA, a population of individuals compete for survival. Each
individual is designated by a set of genes that define its
behaviour. Individuals that perform better (as defined by the
fitness function) have a higher chance of mating with other
individuals. When two individuals mate, they swap some of
their genes, resulting in an individual that has properties
from both of its "parents". Every now and then, a mutation
occurs where some gene randomly changes value, resulting in
a different individual. If all is well defined, after a few
generations, the population should converge on a "good-enough"
solution to the problem being tackled.

A GA implementation runs for a discrete number of time steps
called I<generations>. What happens during each generation can
vary greatly depending on the strategy being used (See 
L</"STRATEGIES"> for more info).
Typically, a variation of the following happens at
each generation:

Genetic.pm  view on Meta::CPAN

Here the performance of all the individuals is evaluated
based on the fitness function, and each is given a specific
fitness value. The higher the value, the bigger the chance
of an individual passing its genes on in future generations
through mating (crossover).

=item B<2. Crossover>

Here, individuals selected are randomly paired up for
crossover (aka I<sexual reproduction>). This is further
controlled by the crossover rate specified and may result in
a new offspring individual that contains genes common to
both parents. New individuals are injected into the current
population.

=item B<3. Mutation>

In this step, each individual is given the chance to mutate
based on the mutation probability specified. If an individual
is to mutate, each of its genes is given the chance to randomly
switch its value to some other state.

Genetic.pm  view on Meta::CPAN

For bitvectors, the argument is simply the length of the bitvector.

    $ga->init(10);

this initializes a population where each individual has 10 genes.

=item o

For listvectors, the argument is an anonymous list of lists. The
number of sub-lists is equal to the number of genes of each individual.
Each sub-list defines the possible string values that the corresponding gene
can assume.

    $ga->init([
               [qw/red blue green/],
               [qw/big medium small/],
               [qw/very_fat fat fit thin very_thin/],
              ]);

this initializes a population where each individual has 3 genes, and each gene
can assume one of the given values.

=item o

For rangevectors, the argument is an anonymous list of lists. The
number of sub-lists is equal to the number of genes of each individual.
Each sub-list defines the minimum and maximum integer values that the
corresponding gene can assume.

    $ga->init([
               [1, 5],
               [0, 20],
               [4, 9],
              ]);

this initializes a population where each individual has 3 genes, and each gene
can assume an integer within the corresponding range.

=back

=item I<$ga>-E<gt>B<inject>(I<N>, ?I<args>?)

This method can be used to add more individuals to the population. New individuals
can be randomly generated, or be explicitly specified. The first argument specifies
the number, I<N>, of new individuals to add. This can be followed by at most I<N>
arguments, each of which is an anonymous list that specifies the genome of a
single individual to add. If the number of genomes given, I<n>, is less than I<N>, then

Genetic.pm  view on Meta::CPAN

Each generation consists of the following steps:

=over

=item o

The population is sorted according to the individuals' fitnesses.

=item o

The subroutine corresponding to the named strategy is called with one argument,
the AI::Genetic object. This subroutine is expected to alter the object itself.

=item o

If a termination subroutine is given, it is executed and the return value is
checked. Evolution terminates if this sub returns a true value.

=back

=item I<$ga>-E<gt>B<getFittest>(?I<N>?)

This returns the I<N> fittest individuals. If not specified,
I<N> defaults to 1. As a side effect, it sorts the population by
fitness score. The actual AI::Genetic::Individual objects are returned.
You can use the C<genes()> and C<score()> methods to get the genes and the
scores of the individuals. Please check L<AI::Genetic::Individual> for details.

=item I<$ga>-E<gt>B<sortPopulation>

This method sorts the population according to fitness function. The results
are cached for speed.

=item I<$ga>-E<gt>B<sortIndividuals>(?[I<ListOfIndividuals>]?)

Given an anonymous list of individuals, this method sorts them according
to fitness, returning an anonymous list of the sorted individuals.

=item I<$ga>-E<gt>B<people>()

Returns an anonymous list of individuals of the current population.

Genetic.pm  view on Meta::CPAN

This method returns the current generation.

=back

=head1 FITNESS FUNCTION

Very quickly you will realize that properly defining the fitness function
is the most important aspect of a GA. Most of the time that a genetic
algorithm takes to run is spent in running the fitness function for each
separate individual to get its fitness. AI::Genetic tries to minimize this
time by caching the fitness result for each individual. But, B<you should
spend a lot of time optimizing your fitness function to achieve decent run
times.>

The fitness function should expect only one argument, an anonymous list of
genes, corresponding to the individual being analyzed. It is expected
to return a number which defines the fitness score of the said individual.
The higher the score, the more fit the individual, the more the chance it
has to be chosen for crossover.

=head1 STRATEGIES

AI::Genetic comes with 9 predefined strategies. These are:

=over

Genetic.pm  view on Meta::CPAN

    }

=head1 A NOTE ON SPEED/EFFICIENCY

Genetic algorithms are inherently slow.
Perl can be pretty fast, but will never reach the speed of optimized
C code (at least my Perl coding will not). I wrote AI::Genetic mainly
for my own learning experience, but still tried to optimize it as
much as I can while trying to keep it as flexible as possible.

To do that, I resorted to some well-known tricks like passing a
reference of a long list instead of the list itself (for example,
when calling the fitness function, a reference of the gene list
is passed), and caching fitness scores (if you try to evaluate
the fitness of the same individual more than once, then the fitness
function will not be called, and the cached result is returned).

To help speed up your run times, you should pay special attention
to the design of your fitness function since this will be called once
for each unique individual in each generation. If you can shave off a
few clock cycles here and there, then it will be greatly magnified in
the total run time.

=head1 BUGS

I have tested this module quite a bit, and even used it to solve a

Genetic.pm  view on Meta::CPAN

=head1 AUTHOR & CREDITS

Written by Ala Qumsieh I<aqumsieh@cpan.org>.

Special thanks go to John D. Porter and Oliver Smith for stimulating
discussions and great suggestions. Daniel Martin and Ivan Tubert-Brohman
uncovered various bugs and for this I'm grateful.

=head1 COPYRIGHTS

(c) 2003-2005 Ala Qumsieh. All rights reserved.
This module is distributed under the same terms as Perl itself.

=cut

Genetic/Individual.pm  view on Meta::CPAN


package AI::Genetic::Individual;

use strict;
use vars qw/$VERSION/;
$VERSION = 0.02;

# this package is to serve as a base package to
# all other individuals. It doesn't do anything
# interesting.

1;

sub new {  # hmm .. do I need this?
  my ($class, $genes) = @_;

  my $self;
  if (ref $class) { # clone mode
    $self = bless {} => ref $class;
    $self->{$_} = $class->{$_} for keys %$class;

Genetic/Individual.pm  view on Meta::CPAN

  my $self = shift;

  return $self->{SCORE} if $self->{CALCED};

  $self->{SCORE}  = $self->{FITFUNC}->(scalar $self->genes);
  $self->{CALCED} = 1;

  return $self->{SCORE};
}

sub resetScore { $_[0]{CALCED} = 0 }

# hmmm .. how do I reset {CALCED} in case of mutation?

__END__

=head1 NAME

AI::Genetic::Individual - Base class for AI::Genetic Individuals.

=head1 SYNOPSIS

See L<AI::Genetic>.

Genetic/Individual.pm  view on Meta::CPAN

calculate the individual's fitness. If an argument is given, it expects
a subroutine reference which will be set as the fitness subroutine. In
either case, it'll return the fitness sub ref.

=item I<$individual>-E<gt>B<score()>

This method returns the fitness score of the individual. If the score has
never been calculated before, then the fitness function is executed and
the score saved. Subsequent calls to score() will return the cached value.

=item I<$individual>-E<gt>B<resetScore()>

This method resets the score of the individual such that a subsequent call
to I<score()> will result in the execution of the fitness sub.

=back

The following methods are meant to be over-ridden by any class that
inherits from AI::Genetic::Individual:

=over 4

=item I<$individual>-E<gt>B<newRandom(options)>

Genetic/Individual.pm  view on Meta::CPAN

Note that in order for your own individual class to be useful, you have to define
your own custom strategy that knows how to evolve such individuals. Conceptually,
this should be very simple.

=head1 AUTHOR

Written by Ala Qumsieh I<aqumsieh@cpan.org>.

=head1 COPYRIGHTS

(c) 2003,2004 Ala Qumsieh. All rights reserved.
This module is distributed under the same terms as Perl itself.

=cut

=cut

Genetic/OpCrossover.pm  view on Meta::CPAN

The following method is defined:

=over

=item B<vectorSinglePoint>(I<Xprob, parent1, parent2>)

The first argument is the crossover rate. The second and third arguments are anonymous
lists that define the B<genes>
of the parents (not AI::Genetic::Individual objects, but the return value of the I<genes()>
method in scalar context).
If mating occurs, two anonymous lists of genes are returned corresponding to the two
new children. If no mating occurs, 0 is returned.

=back

=item Two Point

In two point crossover, two points are selected along the choromosomes of both parents.
The chromosomes are then cut at those points, and the middle parts are swapped,
creating two child chromosomes. The following method is defined:

=over

=item B<vectorTwoPoint>(I<Xprob, parent1, parent2>)

The first argument is the crossover rate. The second and third arguments are anonymous
lists that define the B<genes>
of the parents (not AI::Genetic::Individual objects, but the return value of the I<genes()>
method in scalar context).
If mating occurs, two anonymous lists of genes are returned corresponding to the two
new children. If no mating occurs, 0 is returned.

=back

=item Uniform

In uniform crossover, two child chromosomes are created by looking at each gene in both
parents, and randomly selecting which one to go with each child.
The following method is defined:

=over

=item B<vectorUniform>(I<Xprob, parent1, parent2>)

The first argument is the crossover rate. The second and third arguments are anonymous
lists that define the B<genes>
of the parents (not AI::Genetic::Individual objects, but the return value of the I<genes()>
method in scalar context).
If mating occurs, two anonymous lists of genes are returned corresponding to the two
new children. If no mating occurs, 0 is returned.

=back

=back

=head1 AUTHOR

Written by Ala Qumsieh I<aqumsieh@cpan.org>.

=head1 COPYRIGHTS

(c) 2003,2004 Ala Qumsieh. All rights reserved.
This module is distributed under the same terms as Perl itself.

=cut

Genetic/OpMutation.pm  view on Meta::CPAN


  return $genes;
}

# sub rangeVector():
# each gene is a floating number, and can be anything
# within a range of two numbers.
# arguments are mutation prob., anon list of genes,
# and anon list of ranges. Each element in $ranges is
# an anon list of two numbers, min and max value of
# the corresponding gene.

sub rangeVector {
  my ($prob, $ranges, $genes) = @_;

  my $i = -1;
  for my $g (@$genes) {
    $i++;
    next if rand > $prob;

    # now randomly choose another value from the range.

Genetic/OpMutation.pm  view on Meta::CPAN


  return $genes;
}

# sub listVector():
# each gene is a string, and can be anything
# from a list of possible values supplied by user.
# arguments are mutation prob., anon list of genes,
# and anon list of value lists. Each element in $lists
# is an anon list of the possible values of
# the corresponding gene.

sub listVector {
  my ($prob, $lists, $genes) = @_;

  my $i = -1;
  for my $g (@$genes) {
    $i++;
    next if rand > $prob;

    # now randomly choose another value from the lists.

Genetic/OpMutation.pm  view on Meta::CPAN

identical to the given ones.

=back

=head1 AUTHOR

Written by Ala Qumsieh I<aqumsieh@cpan.org>.

=head1 COPYRIGHTS

(c) 2003,2004 Ala Qumsieh. All rights reserved.
This module is distributed under the same terms as Perl itself.

=cut

Genetic/OpSelection.pm  view on Meta::CPAN

=back

=back

=head1 AUTHOR

Written by Ala Qumsieh I<aqumsieh@cpan.org>.

=head1 COPYRIGHTS

(c) 2003,2004 Ala Qumsieh. All rights reserved.
This module is distributed under the same terms as Perl itself.

=cut

README  view on Meta::CPAN

             my $fitness;
             # assign a number to $fitness based on the @$genes
             # ...

             return $fitness;
          }

          sub terminateFunc {
             my $ga = shift;

             # terminate if reached some threshold.
             return 1 if $ga->getFittest->score > $THRESHOLD;
             return 0;
          }

DESCRIPTION
    This module implements a Genetic Algorithm (GA) in pure Perl. Other Perl
    modules that achieve the same thing (perhaps better, perhaps worse) do
    exist. Please check CPAN. I mainly wrote this module to satisfy my own
    needs, and to learn something about GAs along the way.

    PLEASE NOTE: As of v0.02, AI::Genetic has been re-written from scratch
    to be more modular and expandable. To achieve this, I had to modify the
    API, so it is not backward-compatible with v0.01. As a result, I do not
    plan on supporting v0.01.

    I will not go into the details of GAs here, but here are the bare
    basics. Plenty of information can be found on the web.

    In a GA, a population of individuals compete for survival. Each
    individual is designated by a set of genes that define its behaviour.
    Individuals that perform better (as defined by the fitness function)
    have a higher chance of mating with other individuals. When two
    individuals mate, they swap some of their genes, resulting in an
    individual that has properties from both of its "parents". Every now and
    then, a mutation occurs where some gene randomly changes value,
    resulting in a different individual. If all is well defined, after a few
    generations, the population should converge on a "good-enough" solution
    to the problem being tackled.

    A GA implementation runs for a discrete number of time steps called
    *generations*. What happens during each generation can vary greatly
    depending on the strategy being used (See the section on "STRATEGIES"
    for more info). Typically, a variation of the following happens at each
    generation:

    1. Selection
        Here the performance of all the individuals is evaluated based on
        the fitness function, and each is given a specific fitness value.
        The higher the value, the bigger the chance of an individual passing
        its genes on in future generations through mating (crossover).

    2. Crossover
        Here, individuals selected are randomly paired up for crossover (aka
        *sexual reproduction*). This is further controlled by the crossover
        rate specified and may result in a new offspring individual that
        contains genes common to both parents. New individuals are injected
        into the current population.

    3. Mutation
        In this step, each individual is given the chance to mutate based on
        the mutation probability specified. If an individual is to mutate,
        each of its genes is given the chance to randomly switch its value
        to some other state.

CLASS METHODS

README  view on Meta::CPAN

            bitvector.

                $ga->init(10);

            this initializes a population where each individual has 10
            genes.

        o   For listvectors, the argument is an anonymous list of lists. The
            number of sub-lists is equal to the number of genes of each
            individual. Each sub-list defines the possible string values
            that the corresponding gene can assume.

                $ga->init([
                           [qw/red blue green/],
                           [qw/big medium small/],
                           [qw/very_fat fat fit thin very_thin/],
                          ]);

            this initializes a population where each individual has 3 genes,
            and each gene can assume one of the given values.

        o   For rangevectors, the argument is an anonymous list of lists.
            The number of sub-lists is equal to the number of genes of each
            individual. Each sub-list defines the minimum and maximum
            integer values that the corresponding gene can assume.

                $ga->init([
                           [1, 5],
                           [0, 20],
                           [4, 9],
                          ]);

            this initializes a population where each individual has 3 genes,
            and each gene can assume an integer within the corresponding
            range.

    *$ga*->inject(*N*, ?*args*?)
        This method can be used to add more individuals to the population.
        New individuals can be randomly generated, or be explicitly
        specified. The first argument specifies the number, *N*, of new
        individuals to add. This can be followed by at most *N* arguments,
        each of which is an anonymous list that specifies the genome of a
        single individual to add. If the number of genomes given, *n*, is
        less than *N*, then *N* - *n* random individuals are added for a

README  view on Meta::CPAN

        specified strategy. A strategy name has to be specified as the first
        argument. The second argument is optional and specifies the number
        of generations to evolve. It defaults to 1. See the section on
        "STRATEGIES" for more information on the default strategies.

        Each generation consists of the following steps:

        o   The population is sorted according to the individuals'
            fitnesses.

        o   The subroutine corresponding to the named strategy is called
            with one argument, the AI::Genetic object. This subroutine is
            expected to alter the object itself.

        o   If a termination subroutine is given, it is executed and the
            return value is checked. Evolution terminates if this sub
            returns a true value.

    *$ga*->getFittest(?*N*?)
        This returns the *N* fittest individuals. If not specified, *N*
        defaults to 1. As a side effect, it sorts the population by fitness
        score. The actual AI::Genetic::Individual objects are returned. You
        can use the "genes()" and "score()" methods to get the genes and the
        scores of the individuals. Please check the AI::Genetic::Individual
        manpage for details.

    *$ga*->sortPopulation
        This method sorts the population according to fitness function. The
        results are cached for speed.

    *$ga*->sortIndividuals(?[*ListOfIndividuals*]?)
        Given an anonymous list of individuals, this method sorts them
        according to fitness, returning an anonymous list of the sorted
        individuals.

    *$ga*->people()
        Returns an anonymous list of individuals of the current population.
        IMPORTANT: the actual array reference used by the AI::Genetic object
        is returned, so any changes to it will be reflected in *$ga*.

README  view on Meta::CPAN

        *listvector*, or *rangevector*.

    *$ga*->generation()
        This method returns the current generation.

FITNESS FUNCTION
    Very quickly you will realize that properly defining the fitness
    function is the most important aspect of a GA. Most of the time that a
    genetic algorithm takes to run is spent in running the fitness function
    for each separate individual to get its fitness. AI::Genetic tries to
    minimize this time by caching the fitness result for each individual.
    But, you should spend a lot of time optimizing your fitness function to
    achieve decent run times.

    The fitness function should expect only one argument, an anonymous list
    of genes, corresponding to the individual being analyzed. It is expected
    to return a number which defines the fitness score of the said
    individual. The higher the score, the more fit the individual, the more
    the chance it has to be chosen for crossover.

STRATEGIES
    AI::Genetic comes with 9 predefined strategies. These are:

    rouletteSinglePoint
        This strategy implements roulette-wheel selection and single-point
        crossover.

README  view on Meta::CPAN

          }
        }

A NOTE ON SPEED/EFFICIENCY
    Genetic algorithms are inherently slow. Perl can be pretty fast, but
    will never reach the speed of optimized C code (at least my Perl coding
    will not). I wrote AI::Genetic mainly for my own learning experience,
    but still tried to optimize it as much as I can while trying to keep it
    as flexible as possible.

    To do that, I resorted to some well-known tricks like passing a
    reference of a long list instead of the list itself (for example, when
    calling the fitness function, a reference of the gene list is passed),
    and caching fitness scores (if you try to evaluate the fitness of the
    same individual more than once, then the fitness function will not be
    called, and the cached result is returned).

    To help speed up your run times, you should pay special attention to the
    design of your fitness function since this will be called once for each
    unique individual in each generation. If you can shave off a few clock
    cycles here and there, then it will be greatly magnified in the total
    run time.

BUGS
    I have tested this module quite a bit, and even used it to solve a
    work-related problem successfully. But, if you think you found a bug

README  view on Meta::CPAN

    Perl.

AUTHOR & CREDITS
    Written by Ala Qumsieh *aqumsieh@cpan.org*.

    Special thanks go to John D. Porter and Oliver Smith for stimulating
    discussions and great suggestions. Daniel Martin and Ivan Tubert-Brohman
    uncovered various bugs and for this I'm grateful.

COPYRIGHTS
    (c) 2003-2005 Ala Qumsieh. All rights reserved. This module is
    distributed under the same terms as Perl itself.



( run in 1.475 second using v1.01-cache-2.11-cpan-49f99fa48dc )