Algorithm-NeedlemanWunsch
view release on metacpan or search on metacpan
lib/Algorithm/NeedlemanWunsch.pm view on Meta::CPAN
=item align
Aligns two sequence items. The callback is called with 2 arguments,
which are the positions of the paired items in C<\@a> and C<\@b>,
respectively.
=item shift_a
Aligns an item of the first sequence with a gap in the second
sequence. The callback is called with 1 argument, which is the
position of the item in C<\@a>.
=item shift_b
Aligns a gap in the first sequence with an item of the second
sequence. The callback is called with 1 argument, which is the
position of the item in C<\@b>.
=item select_align
Called when there's more than one way to construct the optimal
alignment, with 1 argument which is a hashref enumerating the
possibilities. The hash may contain the following keys:
=over
=item align
If this key exists, the optimal alignment may align two sequence
items. The key's value is an arrayref with the positions of the paired
items in C<\@a> and C<\@b>, respectively.
=item shift_a
If this key exists, the optimal alignment may align an item of the
first sequence with a gap in the second sequence. The key's value is
the position of the item in C<\@a>.
=item shift_b
If this key exists, the optimal alignment may align a gap in the first
sequence with an item of the second sequence. The key's value is
the position of the item in C<\@b>.
=back
All keys are optional, but the hash will always have at least one. The
callback must select one of the possibilities by returning one of the
keys.
=back
All callbacks are optional. When there is just one way to make the
optimal alignment, the C<Algorithm::NeedlemanWunsch> object prefers
calling the specific callbacks, but will call C<select_align> if it's
defined and the specific callback isn't.
Note that C<select_align> is called I<instead> of the specific
callbacks, not in addition to them - users defining both
C<select_align> and other callbacks should probably call the specific
callback explicitly from their C<select_align>, once it decides which
one to prefer.
Also note that the passed positions move backwards, from the sequence
ends to zero - if you're building the alignment in your callbacks, add
items to the front.
=head2 Extensions
In addition to the standard Needleman-Wunsch algorithm, this module
also implements two popular extensions: local alignment and affine
block gap penalties. Use of both extensions is controlled by setting
the properties of C<Algorithm::NeedlemanWunsch> object described
below.
=head3 local
When this flag is set before calling
C<Algorithm::NeedlemanWunsch::align>, the alignment scoring doesn't
charge the gap penalty for gaps at the beginning (i.e. before the
first item) and end (after the last item) of the second sequence
passed to C<align>, so that for example the optimal (with identity
matrix as similarity matrix and a negative gap penalty) alignment of
C<a b c d e f g h> and C<b c h> becomes
sequence A: a b c d e f g h
| |
sequence B: b c h
instead of the global
sequence A: a b c d e f g h
| | |
sequence B: - b c - - - - h
Note that local alignment is asymmetrical - when using it, the longer
sequence should be the first passed to
C<Algorithm::NeedlemanWunsch::align>.
=head3 gap_open_penalty, gap_extend_penalty
Using the same gap penalty for every gap has the advantage of
simplicity, but some applications may want a more complicated
approach. Biologists, for example, looking for gaps longer than one
DNA sequence base, typically want to distinguish a gap opening
(costly) from more missing items following it (shouldn't cost so
much). That requirement can be modelled by charging 2 gap penalties:
C<gap_open_penalty> for the first gap, and then C<gap_extend_penalty>
for each consecutive gap on the same sequence.
Note that you must set both these properties if you set any of them
and that C<gap_open_penalty> must be less than C<gap_extend_penalty>
(if you know of a use case where the gap opening penalty should be
preferred to gap extension, let me know). With such penalties set
before calling C<Algorithm::NeedlemanWunsch::align>, sequences C<A T G
T A G T G T A T A G T A C A T G C A> and C<A T G T A G T A C A T G C
A> are aligned
sequence A: A T G T A G T G T A T A G T A C A T G C A
| | | | | | | | | | | | | |
sequence B: A T G - - - - - - - T A G T A C A T G C A
i.e. with all gaps bunched together.
=head1 SEE ALSO
L<Algorithm::Diff>, L<Text::WagnerFischer>
=head1 AUTHOR
Vaclav Barta, C<< <vbar@comp.cz> >>
=head1 BUGS
Please report any bugs or feature requests to
C<bug-algorithm-needlemanwunsch at rt.cpan.org>, or through the web
interface at
L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Algorithm-NeedlemanWunsch>.
I will be notified, and then you'll automatically be notified of
progress on your bug as I make changes.
=head1 COPYRIGHT & LICENSE
Copyright 2007-2013 Vaclav Barta, all rights reserved.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
=head1 CREDITS
The algorithm is defined by Saul Needleman and Christian Wunsch in "A
general method applicable to the search for similarities in the amino
acid sequence of two proteins", J Mol Biol. 48(3):443-53.
This implementation is based mostly on
L<http://www.ludwig.edu.au/course/lectures2005/Likic.pdf>, local
alignment is from
L<http://www.techfak.uni-bielefeld.de/bcd/Curric/PrwAli/node6.html>.
=cut
( run in 2.435 seconds using v1.01-cache-2.11-cpan-cdf2f3d4e48 )