AI-Gene-Sequence
If you do not need to keep track of what sort of thing
currently occupies each slot in the gene, you would be better
off using the AI::Gene::Simple class instead, as it
will be somewhat faster. The interface to the mutations is
the same, though, so if you need to switch in future,
it will not be too painful.
This module should not be confused with the I<bioperl> modules
which are used to analyse DNA sequences.
It is intended that the methods in this code are inherited
by other modules.
=head2 Anatomy of a gene
A gene is a sequence of tokens, each a member of some group
of similar tokens (they can of course all be members of a
single group). This module encodes a gene as a string
representing the token types, together with an array containing the
tokens themselves. This allows arbitrary data to be
stored as a token in a gene.
For instance, a regular expression could be encoded as:
$self = ['ccartm',['a', 'b', '|', '[A-Z]', '\W', '*?'] ]
Using a string to indicate the sort of thing held at the
corresponding part of the gene allows for a simple test
of the validity of a proposed gene by using a regular
expression.
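As a minimal sketch of that idea, the check below tests a type string against a made-up grammar. The reading of the type letters (C<c> character, C<a> alternation, C<m> modifier) and the rule itself are assumptions for illustration only; they are not taken from the module.

```perl
use strict;
use warnings;

# Gene in the layout described above: [ type string, [ tokens ] ].
my $gene = ['ccartm', ['a', 'b', '|', '[A-Z]', '\W', '*?']];

# Hypothetical rule: a modifier (m) may not start the gene, and may not
# directly follow an alternation (a) or another modifier (m).
sub looks_valid {
    my ($types) = @_;
    return 0 if $types =~ /^m/;
    return 0 if $types =~ /[am]m/;
    return 1;
}

print looks_valid($gene->[0]) ? "valid\n" : "invalid\n";
```

Because only the type string is inspected, the tokens themselves can be arbitrary data without making the check any harder.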
=head2 Using the module
To use the genetic sequences, you must write your own
implementations of the following methods:
=over 4
=item generate_token
=item valid_gene
=back
You may also want to override the following methods:
=over 4
=item new
=item clone
=item render_gene
=back
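A rough sketch of the two required methods follows. The class name C<My::Gene>, the token table, and the exact calling conventions (C<generate_token> returning a type and token pair, C<valid_gene> receiving the proposed type string) are assumptions made for this example; check them against the method documentation before relying on them. Inheritance from the module is left out so the sketch runs on its own.

```perl
use strict;
use warnings;

{
    package My::Gene;
    # In real use this class would inherit from AI::Gene::Sequence;
    # that is omitted here so the sketch has no dependencies.

    my %tokens = (
        c => ['a' .. 'z'],       # hypothetical "character" tokens
        m => ['*', '+', '?'],    # hypothetical "modifier" tokens
    );

    sub new { return bless ['', []], shift }

    # Assumed contract: return (type, token); an optional requested
    # type restricts the choice to that group.
    sub generate_token {
        my ($self, $type) = @_;
        my @types = sort keys %tokens;
        $type = $types[rand @types]
            unless defined $type && exists $tokens{$type};
        my $pool = $tokens{$type};
        return ($type, $pool->[rand @$pool]);
    }

    # Assumed contract: receive the proposed type string and return
    # true if it describes an acceptable gene.
    sub valid_gene {
        my ($self, $types) = @_;
        return $types !~ /^m|mm/ ? 1 : 0;
    }
}

my $g = My::Gene->new;
my ($type, $token) = $g->generate_token('m');
print "$type => $token\n";
```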
=head2 Mutation methods
Mutation methods are all named C<mutate_*>. In general, the
first argument will be the number of mutations required, followed
by the positions in the gene which should be affected, followed
by the lengths of sequences within the gene which should be affected.
If positions are not defined, then random ones are chosen. If
lengths are not defined, a length of 1 is assumed (i.e. working on
single tokens only); if a length of 0 is requested, then a random
length is chosen.
Also, if a mutation is suggested but would result in an invalid
sequence, then the mutation will not be carried out.
If a mutation is attempted which could corrupt your gene (copying
from a region beyond the end of the gene for instance) then it
will be silently skipped. Mutation methods all return the number
of mutations carried out (not the number of tokens affected).
These methods all expect to be passed positive integers, undef or zero;
other values could (and likely will) do something unpredictable.
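The argument conventions above can be sketched as a small helper. The name C<resolve_args> is hypothetical, and the way a random length is drawn (anywhere up to the end of the gene) is an assumption; the module's own choice may differ.

```perl
use strict;
use warnings;

# Resolve (pos, len) for a gene of $gene_len tokens.
# Returns an empty list when the request should be skipped.
sub resolve_args {
    my ($gene_len, $pos, $len) = @_;
    $pos = int(rand($gene_len)) unless defined $pos;  # undef: random position
    return if $pos >= $gene_len;                      # nothing to work on
    $len = 1 unless defined $len;                     # undef: single token
    $len = 1 + int(rand($gene_len - $pos))            # zero: random length
        if $len == 0;                                 #   (assumed range)
    # A span running past the end of the gene is skipped, not clipped.
    return if $pos + $len > $gene_len;
    return ($pos, $len);
}

my @span = resolve_args(6, 2, undef);
print "@span\n";    # 2 1
```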
=over 4
=item C<mutate([num, ref to hash of probs & methods])>
This will call at random one of the other mutation methods.
It will repeat itself I<num> times. If passed a reference
to a hash as its second argument, it will use that to
decide which mutation to attempt.
This hash should contain keys which fit $1 in C<mutate_(.*)>
and values indicating the weight to be given to that method.
The module normalises the weights, so they do not have to sum to 1.
This lets you define your own mutation methods in addition to
overriding any you do not like in the module.
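One way such weighted selection can work is sketched below. This is not the module's code; C<pick_weighted> is a hypothetical helper that shows why unnormalised weights are enough.

```perl
use strict;
use warnings;

# Pick one key from a hash of weights; weights need not sum to 1.
sub pick_weighted {
    my ($weights) = @_;
    my @names = sort keys %$weights;
    my $total = 0;
    $total += $weights->{$_} for @names;
    return unless $total > 0;
    my $roll = rand($total);              # 0 <= $roll < $total
    for my $name (@names) {
        $roll -= $weights->{$name};
        return $name if $roll < 0;
    }
    return $names[-1];                    # guard against float rounding
}

# Keys match the part of the method name after "mutate_".
my %probs  = (insert => 1, reverse => 2, shuffle => 1);
my $name   = pick_weighted(\%probs);
my $method = "mutate_$name";
print "$method\n";
```

With these weights, C<reverse> would be picked about half the time and the other two about a quarter of the time each.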
=item C<mutate_insert([num, pos])>
Inserts a single token into the gene at position I<pos>.
The token will be randomly generated by the calling object's
C<generate_token> method.
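The effect on the two-part gene layout can be sketched as follows. C<insert_token> is a hypothetical standalone function, not the module's implementation; the validity check is passed in as a code reference for the sake of the example.

```perl
use strict;
use warnings;

# Insert one (type, token) pair at $pos, but only if the resulting
# type string passes the supplied validity check.
sub insert_token {
    my ($gene, $pos, $type, $token, $is_valid) = @_;
    return 0 if $pos > length($gene->[0]);
    my $types = $gene->[0];
    substr($types, $pos, 0, $type);           # proposed type string
    return 0 unless $is_valid->($types);      # invalid: leave gene alone
    $gene->[0] = $types;
    splice @{ $gene->[1] }, $pos, 0, $token;  # keep tokens in step
    return 1;
}

my $gene = ['cc', ['a', 'b']];
insert_token($gene, 1, 'm', '*', sub { $_[0] !~ /^m/ });
print "$gene->[0]: @{ $gene->[1] }\n";        # cmc: a * b
```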
=item C<mutate_overwrite([num, pos1, pos2, len])>
Copies a section of the gene (starting at I<pos1>, length I<len>)
and writes it back into the gene, overwriting current elements,
starting at I<pos2>.
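A standalone sketch of the overwrite, assuming the two-part gene layout described earlier. C<overwrite_span> is hypothetical; it omits the validity check, and refusing a destination that runs past the end of the gene is an assumption.

```perl
use strict;
use warnings;

# Copy $len tokens starting at $pos1 over the tokens starting at $pos2.
sub overwrite_span {
    my ($gene, $pos1, $pos2, $len) = @_;
    my $size = length($gene->[0]);
    return 0 if $len < 1 || $pos1 + $len > $size || $pos2 + $len > $size;
    # Take copies first so overlapping spans behave predictably.
    my $types = substr($gene->[0], $pos1, $len);
    my @copy  = @{ $gene->[1] }[$pos1 .. $pos1 + $len - 1];
    substr($gene->[0], $pos2, $len, $types);
    splice @{ $gene->[1] }, $pos2, $len, @copy;
    return 1;
}

my $gene = ['abcd', [1, 2, 3, 4]];
overwrite_span($gene, 0, 2, 2);
print "$gene->[0]: @{ $gene->[1] }\n";    # abab: 1 2 1 2
```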
=item C<mutate_reverse([num, pos, len])>
Takes a sequence within the gene and reverses the ordering of the
elements within that sequence. Starts at position I<pos> for
length I<len>.
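In terms of the two-part gene layout, a reversal has to flip both the type string and the token array over the same span. The function below is a hypothetical sketch of that, without the validity check the module applies.

```perl
use strict;
use warnings;

# Reverse $len tokens (and their types) starting at $pos.
sub reverse_span {
    my ($gene, $pos, $len) = @_;
    return 0 if $len < 1 || $pos + $len > length($gene->[0]);
    my $types = reverse substr($gene->[0], $pos, $len);  # scalar reverse
    my @rev   = reverse @{ $gene->[1] }[$pos .. $pos + $len - 1];
    substr($gene->[0], $pos, $len, $types);
    splice @{ $gene->[1] }, $pos, $len, @rev;
    return 1;
}

my $gene = ['abcd', [1, 2, 3, 4]];
reverse_span($gene, 1, 3);
print "$gene->[0]: @{ $gene->[1] }\n";    # adcb: 1 4 3 2
```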
=item C<mutate_shuffle([num, pos1, pos2, len])>
This takes a sequence (starting at I<pos1>, length I<len>)
from within a gene and moves
it to another position (starting at I<pos2>). Odd things might occur if the
position to move the sequence into lies within the
section to be moved, but the module will try its hardest
to cause a mutation.
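A sketch of the move on the two-part gene layout is given below. C<move_span> is hypothetical and deliberately simpler than the behaviour described above: it treats I<pos2> as a position in the original gene, and it refuses a destination inside the moved section rather than attempting a mutation anyway. The validity check is also omitted.

```perl
use strict;
use warnings;

# Move $len tokens starting at $pos1 so that they land in front of the
# token originally at $pos2.
sub move_span {
    my ($gene, $pos1, $pos2, $len) = @_;
    my $size = length($gene->[0]);
    return 0 if $len < 1 || $pos1 + $len > $size || $pos2 > $size;
    # Destination inside the moved section: skipped in this sketch.
    return 0 if $pos2 >= $pos1 && $pos2 < $pos1 + $len;
    my $types  = substr($gene->[0], $pos1, $len, '');   # cut types
    my @tokens = splice @{ $gene->[1] }, $pos1, $len;   # cut tokens
    $pos2 -= $len if $pos2 > $pos1;    # indices shifted by the cut
    substr($gene->[0], $pos2, 0, $types);
    splice @{ $gene->[1] }, $pos2, 0, @tokens;
    return 1;
}

my $gene = ['abcde', [1, 2, 3, 4, 5]];
move_span($gene, 0, 4, 2);
print "$gene->[0]: @{ $gene->[1] }\n";    # cdabe: 3 4 1 2 5
```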
=item C<mutate_duplicate([num, pos1, pos2, length])>