AI-Gene-Sequence

 view release on metacpan or  search on metacpan

demo/Regexgene.pm  view on Meta::CPAN

  our ($VERSION, @ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS);
  $VERSION     = 0.01;
  @ISA         = qw(Exporter AI::Gene::Sequence);
  @EXPORT      = ();
  %EXPORT_TAGS = ();
  @EXPORT_OK   = qw();
}
our @EXPORT_OK;

=head2 Globals

We have a load of globals, these form the basis of our token types, anything
from the same array, is the same type eg.

 @modifiers = qw( * *? + +? ?? ); # are of type 'm' for modifier

=cut

our @modifiers  = qw( * *? + +? ?? );
our @char_types = qw( \w \W \d \D \s \S .);
our @ranges     = qw( [A-Z] [a-z] [0-9] );
our @chars      = ((0..9,'a'..'z','A'..'Z','_'),
                  (map '\\'.chr, 32..47, 58..64, 91..94, 96, 123..126));


=head2 Constructor

As we want to be able to fill our regular expression at the same
time as we create it and because we will want to nest sequences
we will need some way to know how deep we are, then a different
B<new> method is needed.

If called as an object method (C<$obj->new>), this decreases the depth
count by one from the invoking object.  It also adds tokens to the regular
expression and uses the B<valid_gene> method to ensure we stay sane 
from the start.

As can be seen, we use array offsets above $self->[1] to store information
which is specific to our implementation.

=cut

sub new {
  my $gene = ['',[], ref($_[0]) ? $_[0]->[2]-1 : 3 ]; # limit recursion
  bless $gene, ref($_[0]) || $_[0];
  my $length = $_[1] || 5;
  for (1..$length) {
    my @token = $gene->generate_token();
    my $new = $gene->[0] . $token[0];
    redo unless $gene->valid_gene($new); # hmmmm, enter turing from the wings
    $gene->[0] = $new;
    push @{$gene->[1]}, $token[1];
  }
  return $gene;
}

=head2 clone

As we are going to allow nested sequences, then we need to make sure that
when we copy an object we create new versions of everthing, rather than
reusing pointers to data used by other objects.

=cut

sub clone {
  my $self = shift;
  my $new = bless [$self->[0], [], $self->[2]], ref($self);
  @{$new->[1]} = map {ref($_) ? $_->clone : $_} @{$self->[1]}; # woohoo, recursive objects
  return $new;
}

=head2 generate_token

This is where we really start needing to have our own implementation.
This method is used by AI::Gene::Sequence when it needs a new
token, we also use it ourselves when we create a new object, but we
did not have to.

If we are provided with a token type, we use that, otherwise we chose
one at random.  We make sure that we return a two element list.
If we had wanted, when passed a type of 'g' along with a second
argument, we could have caused this method to mutate the nested
regex, instead, we just create a different one.

=cut

sub generate_token {
  my $self = shift;
  my $type = $_[0] || (qw(m t c r a g))[rand 6];
  my @rt;
  $rt[0] = $type;
 SWITCH: for ($type) {
    /^m/ && do {$rt[1] = $modifiers[rand@modifiers]  ;last SWITCH}; # modifier
    /^t/ && do {$rt[1] = $char_types[rand@char_types];last SWITCH}; # type
    /^c/ && do {$rt[1] = $chars[rand@chars]          ;last SWITCH}; # lone char
    /^r/ && do {$rt[1] = $ranges[rand@ranges]        ;last SWITCH}; # range
    /^a/ && do {$rt[1] = '|'                         ;last SWITCH}; # altern
    /^g/ && do {
      if ($self->[2] > 0) {  # recursion avoidance...
	$rt[1] = $self->new;
      }
      else {
	$rt[1] = $chars[rand@chars];
      }
      ;last SWITCH}; # grouping
    die "Unknown type of regex token ($type)";
  }
  return @rt[0,1];
}

# returns true if a valid regex, otherwise false, ignores optional posn arg

=head2 valid_gene

Because we have restricted ourselves to simple regular
expressions we only need to make sure that modifers and alternation
do not follow or precede things they should not.

Note that we do not use the current version of the gene in $self->[0].
This is because it is the un-mutated version, if we do not accept the
mutation then $self will be left alone by our calling methods.



( run in 1.743 second using v1.01-cache-2.11-cpan-39bf76dae61 )