Algorithm-MLCS
view release on metacpan or search on metacpan
lib/Algorithm/MLCS.pm view on Meta::CPAN
for (@seq) {
while (@$_) { last if @$_ == ( shift @$_ )->{ $lcs[-1] } }
}
}
wantarray ? @lcs : scalar @lcs;
}
# Auxiliary function that gets single sequence arrayref and
# build specific data structure for further processing
# in order to find LCS
sub _build_seq {
my ( $seq, %dict, @seq_st ) = @_;
for ( 0 .. $#{$seq} ) { push @{ $dict{ $seq->[$_] } }, $_ }
for my $i ( 0 .. $#{$seq} ) {
my %tok;
for ( keys %dict ) {
$tok{$_} = @{$seq} - $dict{$_}[0];
if ( $dict{$_}[0] == $i ) {
shift @{ $dict{$_} };
delete $dict{$_} if !@{ $dict{$_} };
}
}
$seq_st[$i] = \%tok;
}
return \@seq_st;
}
# Auxiliary function that gets arrayref of sequences (arrayrefs),
# builds dictionary of unique tokens presented in all given sequences
# and returns the arrayref of new sequences with only tokens from dictionary
sub _get_dict {
my $seq = shift;
my %dict = map { $_ => 1 } @{ $seq->[0] };
for ( @{$seq}[ 1 .. $#{$seq} ] ) {
%dict = map { $_ => 1 } grep { $dict{$_} } @$_;
last unless %dict;
}
return map { [ grep { $dict{$_} } @$_ ] } @{$seq};
}
1;
=head1 NAME
Algorithm::MLCS - Fast heuristic algorithm for finding Longest Common Subsequence
of multiple sequences
=head1 VERSION
Version 1.02
=head1 SYNOPSIS
use Data::Dumper;
use Algorithm::MLCS;
my @seqs = (
[ qw/a b c d f g h j q z/ ],
[ qw/a b c d f g h j q z/ ],
[ qw/a b c x f h j q z/ ],
[ qw/a b c f g j q z/ ],
);
my @lcs = lcs( \@seqs );
my $lcs_length = lcs( \@seqs );
print Dumper( \@lcs );
=head1 ABSTRACT
Finding the longest common subsequence (LCS) for the general case of an arbitrary
number of input sequences is an NP-hard problem. Algorithm::MLCS implements a fast
heuristic algorithm that addresses the general case of multiple sequences.
It is able to extract common subsequence that is close to the optimal ones.
=head1 METHODS
=head2 lcs ( \@seqs )
Finds a Longest Common Subsequence of multiple sequences given by @seqs arrayref.
Each element of @seqs is arrayref that represents the one of multiple sequences
(e.g. [ ['a', 'b', 'c'], ['a', 'c', 'd', 'e'], ... ]). In list context it returns
LCS array, in scalar - the length of LCS.
=head1 SEE ALSO
Algorithm::LCS
=head1 AUTHOR
Slava Moiseev, C<< <slava.moiseev at yahoo.com> >>
=head1 LICENSE AND COPYRIGHT
Copyright 2012 Slava Moiseev.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
( run in 0.834 second using v1.01-cache-2.11-cpan-39bf76dae61 )