Bio-MUST-Core


lib/Bio/MUST/Core/Ali/Stash.pm

id (a string without whitespace, such as an accession) is unique across the
database, it can be used instead via the option C<< truncate_ids => 1 >> of the
C<load> method (see the SYNOPSIS for an example).

While this class is more efficient than the standard C<Ali>, it is much slower
at reading large sequence databases than specialized external programs, such as
NCBI C<blastdbcmd>, that work on indexed binary files. Thus, if you need better
performance, consider the C<Blast::Database> class from the
L<Bio::MUST::Drivers> distribution.

=head1 ATTRIBUTES

=head2 seqs

L<Bio::MUST::Core::Ali> object (required)

This required attribute contains the L<Bio::MUST::Core::Seq> objects that
populate the associated sequence database file. It should be initialized
through the class method C<load> (see the SYNOPSIS for an example).

For now, it provides the following methods: C<count_comments>,
C<all_comments>, C<get_comment>, C<guessing>, C<all_seq_ids>, C<has_uniq_ids>,
C<is_protein>, C<is_aligned>, C<get_seq>, C<get_seq_with_id> (see below),
C<first_seq>, C<all_seqs>, C<filter_seqs> and C<count_seqs> (see
L<Bio::MUST::Core::Ali>).

=head2 lookup

L<Bio::MUST::Core::IdList> object (auto)

This attribute is automatically initialized with an index of the sequence ids
of the internal C<Ali> object. Hence, it cannot be user-specified.

It provides the following method: C<index_for> (see
L<Bio::MUST::Core::IdList>). However, it is essentially a private method.
Instead, individual sequences should be accessed through the C<get_seq_with_id>
method (see below), while sequence batches should be recovered via
user-specified IdList objects (see the SYNOPSIS for an example).
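
Conceptually, such an index can be pictured as a hash mapping each sequence id
to its position in the internal C<Ali> object, so that single ids and whole
batches of ids both resolve without scanning the database. The plain-Perl
sketch below illustrates that idea under this assumption only; it does not use
L<Bio::MUST::Core::IdList>, and the names C<%index_for>, C<@ids> and C<@batch>
are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the sequence ids of the internal Ali object.
my @ids = ('seq_A', 'seq_B', 'seq_C', 'seq_D');

# Build the lookup: id => position in @ids. This is an illustrative
# assumption, not the actual IdList implementation. Ids must be unique,
# otherwise later positions silently overwrite earlier ones.
my %index_for;
@index_for{@ids} = 0 .. $#ids;

# Resolve a user-specified batch of ids; unknown ids map to undef.
my @batch     = ('seq_C', 'seq_A', 'seq_Z');
my @positions = map { $index_for{$_} } @batch;

# Keep only the ids that were actually found.
my @found = grep { defined $index_for{$_} } @batch;

printf "%s => %s\n", $_, defined $index_for{$_} ? $index_for{$_} : 'undef'
    for @batch;
```

Running it prints C<seq_C => 2>, C<seq_A => 0> and C<seq_Z => undef>, one per
line. Each lookup is a single hash access, whatever the size of the database.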

=head1 ACCESSORS

=head2 get_seq_with_id

Returns the sequence of the Ali::Stash that has the specified id. Note that
sequence ids are assumed to be unique in the corresponding database. If no
sequence exists for the specified id, this method returns C<undef>.

    use Carp;

    my $id  = 'Pyrus malus_3750@658052655';
    my $seq = $db->get_seq_with_id($id);
    croak "Seq $id not found in Ali::Stash!" unless defined $seq;

This method accepts a single id (not a list or an array slice of ids).

It is a faster implementation of the method of the same name in the C<Ali>
class.
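
To illustrate why an indexed lookup can be faster, the plain-Perl sketch below
contrasts a linear scan over all sequences with a precomputed id-to-sequence
hash. Both return C<undef> for a missing id, mirroring the behavior documented
above. The hash-based design is an assumption made for illustration, and the
subs C<get_by_scan> and C<get_by_index> are hypothetical; neither is part of
Bio::MUST::Core.

```perl
use strict;
use warnings;

# Hypothetical in-memory database: each record holds an id and a sequence.
my @seqs = (
    { id => 'id_1', seq => 'MKTAYIAK' },
    { id => 'id_2', seq => 'MALWMRLL' },
    { id => 'id_3', seq => 'MSDNELLK' },
);

# Linear scan: visits records one by one, O(n) per query.
sub get_by_scan {
    my ($id) = @_;
    for my $rec (@seqs) {
        return $rec if $rec->{id} eq $id;
    }
    return undef;    # no sequence with this id
}

# Indexed lookup: build the id => record hash once (assumes unique ids),
# then each query is a single hash access (O(1) on average).
my %by_id = map { $_->{id} => $_ } @seqs;

sub get_by_index {
    my ($id) = @_;
    return $by_id{$id};    # undef if the id is absent
}

for my $id ('id_2', 'id_9') {
    my $rec = get_by_index($id);
    print defined $rec ? "$id: $rec->{seq}\n" : "$id: not found\n";
}
```

The loop prints C<id_2: MALWMRLL> followed by C<id_9: not found>. Building the
hash costs one pass over the database, which pays off as soon as many ids are
queried.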

=head1 I/O METHODS

=head2 load

Class method (constructor) returning a new Ali::Stash read from disk. As in
C<Ali>, this method will transparently import plain FASTA files in addition to
the MUST pseudo-FASTA format (ALI files).

    use aliased 'Bio::MUST::Core::Ali::Stash';

    # load database
    my $db = Stash->load( 'database.fasta' );

    # alternatively... (indexing only accessions)
    $db = Stash->load( 'database.fasta', { truncate_ids => 1 } );

This method requires one argument (the path of the database file) and accepts
an optional second argument controlling how sequence ids are processed. This
second argument is a hash reference that may only contain the following key:

    - truncate_ids: consider only the first id word (accession)
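
The effect of C<truncate_ids> can be pictured as keeping only the leading
non-whitespace word of each FASTA definition line as the sequence id. The
plain-Perl sketch below illustrates that idea and the uniqueness requirement
it implies; the sub C<first_id_word> and the sample definition lines are
hypothetical, and the actual parsing rules of C<load> may differ in details.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first whitespace-delimited word of
# a definition line (without the leading '>'), i.e. the accession.
sub first_id_word {
    my ($defline) = @_;
    $defline =~ s/^>//;                    # drop the FASTA marker if present
    my ($word) = $defline =~ m/^(\S+)/;    # undef if nothing but whitespace
    return $word;
}

# Hypothetical definition lines (ids with trailing descriptions).
my @deflines = (
    '>acc_1 some protein description',
    '>acc_2 another description',
);

my %seen;
for my $line (@deflines) {
    (my $full = $line) =~ s/^>//;    # assumed default id: the whole line
    my $acc = first_id_word($line);
    # Truncated ids are only safe if they remain unique across the database.
    warn "duplicate accession: $acc\n" if $seen{$acc}++;
    print "$full  =>  $acc\n";
}
```

If two definition lines shared the same first word, the sketch would warn,
which is exactly the situation in which C<truncate_ids> should not be used.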

=head1 AUTHOR

Denis BAURAIN <denis.baurain@uliege.be>

=head1 COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by University of Liege / Unit of Eukaryotic Phylogenomics / Denis BAURAIN.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

=cut


