Algorithm-NeedlemanWunsch

 view release on metacpan or  search on metacpan

lib/Algorithm/NeedlemanWunsch.pm  view on Meta::CPAN

=item align

Aligns two sequence items. The callback is called with 2 arguments,
which are the positions of the paired items in C<\@a> and C<\@b>,
respectively.

=item shift_a

Aligns an item of the first sequence with a gap in the second
sequence. The callback is called with 1 argument, which is the
position of the item in C<\@a>.

=item shift_b

Aligns a gap in the first sequence with an item of the second
sequence. The callback is called with 1 argument, which is the
position of the item in C<\@b>.

=item select_align

Called when there's more than one way to construct the optimal
alignment, with 1 argument which is a hashref enumerating the
possibilities. The hash may contain the following keys:

=over

=item align

If this key exists, the optimal alignment may align two sequence
items. The key's value is an arrayref with the positions of the paired
items in C<\@a> and C<\@b>, respectively.

=item shift_a

If this key exists, the optimal alignment may align an item of the
first sequence with a gap in the second sequence. The key's value is
the position of the item in C<\@a>.

=item shift_b

If this key exists, the optimal alignment may align a gap in the first
sequence with an item of the second sequence. The key's value is
the position of the item in C<\@b>.

=back

All keys are optional, but the hash will always have at least one. The
callback must select one of the possibilities by returning one of the
keys.

=back

All callbacks are optional. When there is just one way to make the
optimal alignment, the C<Algorithm::NeedlemanWunsch> object prefers
calling the specific callbacks, but will call C<select_align> if it's
defined and the specific callback isn't.

Note that C<select_align> is called I<instead> of the specific
callbacks, not in addition to them - users defining both
C<select_align> and other callbacks should probably call the specific
callback explicitly from their C<select_align>, once it decides which
one to prefer.

Also note that the passed positions move backwards, from the sequence
ends to zero - if you're building the alignment in your callbacks, add
items to the front.

=head2 Extensions

In addition to the standard Needleman-Wunsch algorithm, this module
also implements two popular extensions: local alignment and affine
block gap penalties. Use of both extensions is controlled by setting
the properties of C<Algorithm::NeedlemanWunsch> object described
below.

=head3 local

When this flag is set before calling
C<Algorithm::NeedlemanWunsch::align>, the alignment scoring doesn't
charge the gap penalty for gaps at the beginning (i.e. before the
first item) and end (after the last item) of the second sequence
passed to C<align>, so that for example the optimal (with identity
matrix as similarity matrix and a negative gap penalty) alignment of
C<a b c d e f g h> and C<b c h> becomes

  sequence A: a b c d e f g h
                | |
  sequence B:   b c h

instead of the global

  sequence A: a b c d e f g h
                | |         |
  sequence B: - b c - - - - h

Note that local alignment is asymmetrical - when using it, the longer
sequence should be the first passed to
C<Algorithm::NeedlemanWunsch::align>.

=head3 gap_open_penalty, gap_extend_penalty

Using the same gap penalty for every gap has the advantage of
simplicity, but some applications may want a more complicated
approach. Biologists, for example, looking for gaps longer than one
DNA sequence base, typically want to distinguish a gap opening
(costly) from more missing items following it (shouldn't cost so
much). That requirement can be modelled by charging 2 gap penalties:
C<gap_open_penalty> for the first gap, and then C<gap_extend_penalty>
for each consecutive gap on the same sequence.

Note that you must set both these properties if you set any of them
and that C<gap_open_penalty> must be less than C<gap_extend_penalty>
(if you know of a use case where the gap opening penalty should be
preferred to gap extension, let me know). With such penalties set
before calling C<Algorithm::NeedlemanWunsch::align>, sequences C<A T G
T A G T G T A T A G T A C A T G C A> and C<A T G T A G T A C A T G C
A> are aligned

  sequence A: A T G T A G T G T A T A G T A C A T G C A
              | | |               | | | | | | | | | | |
  sequence B: A T G - - - - - - - T A G T A C A T G C A

i.e. with all gaps bunched together.

=head1 SEE ALSO

L<Algorithm::Diff>, L<Text::WagnerFischer>

=head1 AUTHOR

Vaclav Barta, C<< <vbar@comp.cz> >>

=head1 BUGS

Please report any bugs or feature requests to
C<bug-algorithm-needlemanwunsch at rt.cpan.org>, or through the web
interface at
L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Algorithm-NeedlemanWunsch>.
I will be notified, and then you'll automatically be notified of
progress on your bug as I make changes.

=head1 COPYRIGHT & LICENSE

Copyright 2007-2013 Vaclav Barta, all rights reserved.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

=head1 CREDITS

The algorithm is defined by Saul Needleman and Christian Wunsch in "A
general method applicable to the search for similarities in the amino
acid sequence of two proteins", J Mol Biol. 48(3):443-53.

This implementation is based mostly on
L<http://www.ludwig.edu.au/course/lectures2005/Likic.pdf>, local
alignment is from
L<http://www.techfak.uni-bielefeld.de/bcd/Curric/PrwAli/node6.html>.

=cut



( run in 2.435 seconds using v1.01-cache-2.11-cpan-cdf2f3d4e48 )