Algorithm-NeedlemanWunsch
view release on metacpan or search on metacpan
lib/Algorithm/NeedlemanWunsch.pm view on Meta::CPAN
the other sequence, or corresponds to a I<gap> (which is always
aligned with an item - aligning two gaps wouldn't help anything). This
approach is especially suitable for comparing sequences of comparable
length and somewhat similar along their whole lengths - that is,
without long stretches that have nothing to do with each other. If
your sequences don't satisfy these requirements, consider using local
alignment, which, strictly speaking, isn't Needleman-Wunsch, but is
similar enough to be implemented in this module as well - see below
for details.
In the example above, the second alignment has more gaps than the
first, but perhaps your a's are structurally important and you like
them lined up so much that you'd still prefer the second
alignment. Conversely, if c is "almost the same" as g, it might be the
first alignment that matches better. Needleman-Wunsch formalizes such
considerations into a I<similarity matrix>, assigning payoffs to each
(ordered, but the matrix is normally symmetrical so that the order
doesn't matter) pair of possible sequence items, plus a I<gap
penalty>, quantifying the desirability of a gap in a sequence. A
preference of pairings over gaps is expressed by a low (relative to
the similarity matrix values, normally negative) gap penalty.
The alignment score is then defined as the sum, over the positions
where at least one sequence has an item, of the similarity matrix
values indexed by the first and second item (when both are defined)
and gap penalties (for items aligned with a gap). For example, if C<S>
is the similarity matrix and C<g> denotes the gap penalty, the
alignment
sequence A: a a t t c c
sequence B: a - - - t c
has score C<S[a, a] + 3 * g + S[c, t] + S[c, c]>.
lib/Algorithm/NeedlemanWunsch.pm view on Meta::CPAN
=head2 Standard algorithm
=head3 new(\&score_sub [, $gap_penalty ])
The constructor. Takes one mandatory argument, which is a coderef to a
sub implementing the similarity matrix, plus an optional gap penalty
argument. If the gap penalty isn't specified as a constructor
argument, the C<Algorithm::NeedlemanWunsch> object gets it by calling
the scoring sub without arguments; apart from that case, the sub is
called with 2 arguments, which are items from the first and second
sequence, respectively, passed to
C<Algorithm::NeedlemanWunsch::align>. Note that the sub must be pure,
i.e. always return the same value when called with the same arguments.
=head3 align(\@a, \@b [, \%callbacks ])
The core of the algorithm. Creates a bottom-up dynamic programming
matrix, fills it with alignment scores and then traces back to find an
optimal alignment, informing the application about its items by
invoking the callbacks passed to the method.
lib/Algorithm/NeedlemanWunsch.pm view on Meta::CPAN
=over
=item align
Aligns two sequence items. The callback is called with 2 arguments,
which are the positions of the paired items in C<\@a> and C<\@b>,
respectively.
=item shift_a
Aligns an item of the first sequence with a gap in the second
sequence. The callback is called with 1 argument, which is the
position of the item in C<\@a>.
=item shift_b
Aligns a gap in the first sequence with an item of the second
sequence. The callback is called with 1 argument, which is the
position of the item in C<\@b>.
=item select_align
Called when there's more than one way to construct the optimal
alignment, with 1 argument which is a hashref enumerating the
possibilities. The hash may contain the following keys:
=over
=item align
If this key exists, the optimal alignment may align two sequence
items. The key's value is an arrayref with the positions of the paired
items in C<\@a> and C<\@b>, respectively.
=item shift_a
If this key exists, the optimal alignment may align an item of the
first sequence with a gap in the second sequence. The key's value is
the position of the item in C<\@a>.
=item shift_b
If this key exists, the optimal alignment may align a gap in the first
sequence with an item of the second sequence. The key's value is
the position of the item in C<\@b>.
=back
All keys are optional, but the hash will always have at least one. The
callback must select one of the possibilities by returning one of the
keys.
=back
lib/Algorithm/NeedlemanWunsch.pm view on Meta::CPAN
also implements two popular extensions: local alignment and affine
block gap penalties. Use of both extensions is controlled by setting
the properties of C<Algorithm::NeedlemanWunsch> object described
below.
=head3 local
When this flag is set before calling
C<Algorithm::NeedlemanWunsch::align>, the alignment scoring doesn't
charge the gap penalty for gaps at the beginning (i.e. before the
first item) and end (after the last item) of the second sequence
passed to C<align>, so that for example the optimal (with identity
matrix as similarity matrix and a negative gap penalty) alignment of
C<a b c d e f g h> and C<b c h> becomes
sequence A: a b c d e f g h
| |
sequence B: b c h
instead of the global
lib/Algorithm/NeedlemanWunsch.pm view on Meta::CPAN
=head3 gap_open_penalty, gap_extend_penalty
Using the same gap penalty for every gap has the advantage of
simplicity, but some applications may want a more complicated
approach. Biologists, for example, looking for gaps longer than one
DNA sequence base, typically want to distinguish a gap opening
(costly) from more missing items following it (shouldn't cost so
much). That requirement can be modelled by charging 2 gap penalties:
C<gap_open_penalty> for the first gap, and then C<gap_extend_penalty>
for each consecutive gap on the same sequence.
Note that you must set both these properties if you set any of them
and that C<gap_open_penalty> must be less than C<gap_extend_penalty>
(if you know of a use case where the gap opening penalty should be
preferred to gap extension, let me know). With such penalties set
before calling C<Algorithm::NeedlemanWunsch::align>, sequences C<A T G
T A G T G T A T A G T A C A T G C A> and C<A T G T A G T A C A T G C
A> are aligned
sequence A: A T G T A G T G T A T A G T A C A T G C A
$ob = $b[$j] . $ob;
}
sub prepend_first_only {
my $i = shift;
$oa = $a[$i] . $oa;
$ob = "-$ob";
}
sub prepend_second_only {
my $j = shift;
$oa = "-$oa";
$ob = $b[$j] . $ob;
}
$matcher->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only,
});
is($oa, 'ATGTAGTGTATAGTACATGCA');
is($ob, 'ATG-------TAGTACATGCA');
my @t = @a; @a = @b; @b = @t;
$oa = '';
$ob = '';
$matcher->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only,
});
is($oa, 'ATG-------TAGTACATGCA');
is($ob, 'ATGTAGTGTATAGTACATGCA');
$ob = $b[$j] . $ob;
}
sub prepend_first_only {
my $i = shift;
$oa = $a[$i] . $oa;
$ob = "-$ob";
}
sub prepend_second_only {
my $j = shift;
$oa = "-$oa";
$ob = $b[$j] . $ob;
}
my $matcher = Algorithm::NeedlemanWunsch->new(\&kronecker, 0);
my $score = $matcher->align([ ], [ ]);
is($score, 0);
is($score, 0);
@a = qw(a b c);
@b = qw(d e);
$oa = '';
$ob = '';
$score = $matcher->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only
});
is($score, 0);
is($oa, 'abc');
is ($ob, '-de');
my $float_gap = -3.14;
my $eps = 0.0001;
$matcher = Algorithm::NeedlemanWunsch->new(\&kronecker, $float_gap);
$score = $matcher->align([ 1 ], [ ]);
@a = ( 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1 );
@b = ( 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0 );
$matcher = Algorithm::NeedlemanWunsch->new(\&simple);
$matcher->gap_open_penalty(-1);
$matcher->gap_extend_penalty(0);
$score = $matcher->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only
});
is($score, 3);
is($oa, '-10--11--00011111');
is($ob, '010011100000-----');
$oa = '';
$ob = '';
my @t = @a; @a = @b; @b = @t;
$score = $matcher->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only
});
is($score, 3);
is($oa, '-01--00--11100000');
is($ob, '101100011111-----');
$oa = '';
$ob = '';
$matcher->local(1);
$score = $matcher->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only
});
is($score, 5);
is($oa, '0---100---11100000');
is($ob, '-101100011111-----');
unshift @foreground, $needle[$j];
}
sub prepend_first_only {
my $i = shift;
unshift @background, $haystack[$i];
unshift @foreground, '-';
}
sub prepend_second_only {
my $j = shift;
unshift @background, '-';
unshift @foreground, $needle[$j];
}
my $matcher = Algorithm::NeedlemanWunsch->new(\&score_sub, -1);
$matcher->local(1);
my $score = $matcher->align(\@haystack, \@needle,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only
});
is($score, 5);
is_deeply(\@background,
[ 'div', 'span', 'font', '/font', '/span', '-', 'a', '/a',
'table', 'tr', 'td', 'font', '-', 'span', '/span',
'a', '/a', 'br', 'span', 'a', '/a', '/span', 'br', 'br',
'span', '/span', 'nobr', 'a', '/a', '/nobr', '/font',
'/td', '/tr', '/table', '/div' ]);
is_deeply(\@foreground,
[ '-', '-', '-', '-', '-', 'div', 'a', '/a', 'table', 'tr',
@background = ();
@foreground = ();
$matcher = Algorithm::NeedlemanWunsch->new(\&score_sub);
$matcher->local(1);
$matcher->gap_open_penalty(-2);
$matcher->gap_extend_penalty(-1);
$score = $matcher->align(\@haystack, \@needle,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only
});
is($score, 3);
is_deeply(\@background,
[ 'div', 'span', 'font', '/font', '/span', 'a', '/a',
'table', 'tr', 'td', 'font', 'span', '/span',
'a', '/a', 'br', 'span', '-', 'a', '/a', '-', '-', '/span',
'br', 'br', 'span', '/span', 'nobr', '-', '-', 'a', '/a',
'/nobr', '/font', '/td', '/tr', '/table', '/div' ]);
is_deeply(\@foreground,
[ '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-',
unshift @alignment, [$a[$i], $b[$j]];
}
sub prepend_first_only {
my $i = shift;
unshift @alignment, [$a[$i], undef];
}
sub prepend_second_only {
my $j = shift;
unshift @alignment, [undef, $b[$j]];
}
my $simple = Algorithm::NeedlemanWunsch->new(\&simple_scheme);
@alignment = ();
my $score = $simple->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only
});
my $expected = [ [ 'A', 'A' ], [ 'T', 'T' ], [ 'G', 'G' ], [ undef, 'A' ],
[ 'G', undef ], [ 'C', undef ], [ 'G', 'G' ], [ 'T', 'T' ] ];
is($score, 5);
is_deeply(\@alignment, $expected);
$simple->local(1);
@alignment = ();
$score = $simple->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only
});
is($score, 5);
is_deeply(\@alignment, $expected);
$simple = Algorithm::NeedlemanWunsch->new(\&simple_scheme, -5);
@alignment = ();
$score = $simple->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only
});
is($score, -1);
is_deeply(\@alignment,
[ [ 'A', 'A' ], [ 'T', 'T' ], [ 'G', undef ], [ 'G', 'G' ],
[ 'C', 'A' ], [ 'G', 'G' ], [ 'T', 'T' ] ]);
$simple->local(1);
@alignment = ();
$score = $simple->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only
});
is($score, 0);
is_deeply(\@alignment,
[ [ 'A', undef ], [ 'T', 'A' ], [ 'G', 'T' ], [ 'G', 'G' ],
[ 'C', 'A' ], [ 'G', 'G' ], [ 'T', 'T' ] ]);
sub postpone_gap {
my $arg = shift;
if (exists($arg->{shift_a})) {
prepend_first_only($arg->{shift_a});
return 'shift_a';
} elsif (exists($arg->{shift_b})) {
prepend_second_only($arg->{shift_b});
return 'shift_b';
} else {
prepend_align(@{$arg->{align}});
return 'align';
}
}
$simple->local(0);
@alignment = ();
$score = $simple->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only,
select_align => \&postpone_gap
});
is($score, -1);
is_deeply(\@alignment,
[ [ 'A', 'A' ], [ 'T', 'T' ], [ 'G', 'G' ], [ 'G', 'A' ],
[ 'C', undef ], [ 'G', 'G' ], [ 'T', 'T' ] ]);
$simple->local(1);
@alignment = ();
$score = $simple->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only,
select_align => \&postpone_gap
});
is($score, 0);
is_deeply(\@alignment,
[ [ 'A', 'A' ], [ 'T', 'T' ], [ 'G', 'G' ], [ 'G', 'A' ],
[ 'C', 'G' ], [ 'G', 'T' ], [ 'T', undef ] ]);
my $evo = Algorithm::NeedlemanWunsch->new(\&evo_scheme);
@alignment = ();
$score = $evo->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only
});
is($score, 11);
$expected = [ [ 'A', 'A' ], [ 'T', 'T' ], [ 'G', 'G' ], [ 'G', 'A' ],
[ 'C', undef ], [ 'G', 'G' ], [ 'T', 'T' ] ];
is_deeply(\@alignment, $expected);
$evo->local(1);
@alignment = ();
$score = $evo->align(\@a, \@b,
{
align => \&prepend_align,
shift_a => \&prepend_first_only,
shift_b => \&prepend_second_only
});
is($score, 11);
is_deeply(\@alignment,, $expected);
# sequences & scoring from
# http://sedefcho.icnhost.net/web/algorithms/needleman_wunsch.html
my $index = { A => 0, G => 1, C => 2, T => 3 };
my $matrix = [ ];
push @$matrix, [ qw(10 -1 -3 -4) ];
$ob = $b[$j] . $ob;
}
sub prepend_first_only2 {
my $i = shift;
$oa = $a[$i] . $oa;
$ob = "-$ob";
}
sub prepend_second_only2 {
my $j = shift;
$oa = "-$oa";
$ob = $b[$j] . $ob;
}
$evo = Algorithm::NeedlemanWunsch->new(\&fine_scheme);
$oa = '';
$ob = '';
@a = qw(A G A C T A G T T A C);
@b = qw(C G A G A C G T);
$score = $evo->align(\@a, \@b,
{
align => \&prepend_align2,
shift_a => \&prepend_first_only2,
shift_b => \&prepend_second_only2
});
is($score, 16);
is($oa, '--AGACTAGTTAC');
is($ob, 'CGAGAC--G-T--');
$evo->local(1);
$oa = '';
$ob = '';
$score = $evo->align(\@a, \@b,
{
align => \&prepend_align2,
shift_a => \&prepend_first_only2,
shift_b => \&prepend_second_only2
});
is($score, 31);
is($oa, '--AGACTAGTTAC');
is($ob, 'CGAGAC--GT---');
$evo->local(0);
sub select_align2 {
my $arg = shift;
if (exists($arg->{align})) {
prepend_align2(@{$arg->{align}});
return 'align';
} elsif (exists($arg->{shift_a})) {
prepend_first_only2($arg->{shift_a});
return 'shift_a';
} else {
prepend_second_only2($arg->{shift_b});
return 'shift_b';
}
}
$oa = '';
$ob = '';
@a = qw(A A G T A G A G);
@b = qw(T A C C G A T A T T A T);
$score = $evo->align(\@a, \@b, { select_align => \&select_align2 });
is($score, 16);
is($score, 16);
$oa = '';
$ob = '';
@a = qw(T A G C A C A C A A C);
@b = qw(A C G T A C G C G A C T A G T C);
$score = $evo->align(\@a, \@b,
{
align => \&prepend_align2,
shift_a => \&prepend_first_only2,
shift_b => \&prepend_second_only2
});
is($score, 38);
is($oa, 'TA-GCA--C-AC-AA-C');
is($ob, '-ACGTACGCGACTAGTC');
$oa = '';
$ob = '';
$evo->local(1);
$score = $evo->align(\@a, \@b,
{
align => \&prepend_align2,
shift_a => \&prepend_first_only2,
shift_b => \&prepend_second_only2
});
is($score, 43);
is($oa, 'TA-GCA--C-AC-AA-C');
is($ob, '-ACGTACGCGACTAGTC');
$evo->local(0);
$oa = '';
$ob = '';
@a = qw(A A G G A T A T A T G C);
@b = qw(T A C C G C T A);
$score = $evo->align(\@a, \@b,
{
align => \&prepend_align2,
shift_a => \&prepend_first_only2,
shift_b => \&prepend_second_only2
});
is($score, -3);
is($oa, '-AAGGATATATGC');
is($ob, 'TACCG-C-TA---');
$oa = '';
$ob = '';
@a = qw(G C C T A T G C C T);
@b = qw(A G T C T A G C T G A T A T T G);
$score = $evo->align(\@a, \@b,
{
align => \&prepend_align2,
shift_a => \&prepend_first_only2,
shift_b => \&prepend_second_only2
});
is($score, 27);
is($oa, '-GCCTA--TG-C-CT-');
is($ob, 'AGTCTAGCTGATATTG');
( run in 0.686 second using v1.01-cache-2.11-cpan-39bf76dae61 )