Bio-DB-SeqFeature
lib/Bio/DB/SeqFeature/Store.pm view on Meta::CPAN
NOTE: This search will fail to find features that do not have a display name!
You can use fetch() or fetch_many() with the returned IDs to get to
the features themselves.
=cut
sub search_attributes {
my $self = shift;
my ($search_string,$attribute_names,$limit) = @_;
my $attribute_array = ref $attribute_names
&& ref $attribute_names eq 'ARRAY' ? $attribute_names : [$attribute_names];
return $self->_search_attributes($search_string,$attribute_array,$limit);
}
=head2 search_notes
Title : search_notes
Usage : @result_list = $db->search_notes("full text search string",$limit)
Function: Search the notes for a text string
Returns : array of results
Args : full text search string, and an optional feature limit
Status : public
Given a search string, this method performs a full-text search of the
"Notes" attribute and returns an array of results. Each row of the
returned array is an arrayref containing the following fields:
column 1   The display_name of the feature, suitable for passing to get_features_by_name()
column 2   The text of the note
column 3   A relevance score
column 4   The type
NOTE: This is equivalent to $db-E<gt>search_attributes('full text search
string','Note',$limit). This search will fail to find features that do
not have a display name!
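As a rough illustration of the row layout described above, the following sketch unpacks each row and reports hits by descending relevance. The rows are made-up placeholders standing in for what search_notes() might return, and format_row() is a hypothetical helper; no database is queried.

```perl
use strict;
use warnings;

# Hypothetical rows in the documented layout:
# [display_name, note text, relevance score, type]
my @results = (
    ['geneB', 'kinase-like pseudogene', 0.4, 'pseudogene'],
    ['geneA', 'putative kinase',        0.9, 'gene'],
);

# Format one result row as a tab-separated line (hypothetical helper).
sub format_row {
    my ($row) = @_;
    my ($name, $note, $score, $type) = @$row;
    return join "\t", $name, $type, sprintf('%.2f', $score), $note;
}

# Report hits from most to least relevant (column 3 is the score).
my @lines = map  { format_row($_) }
            sort { $b->[2] <=> $a->[2] } @results;
print "$_\n" for @lines;
```

Because features without a display name are not returned, column 1 can be assumed to be defined in every row.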
=cut
###
# search_notes()
#
sub search_notes {
my $self = shift;
my ($search_string,$limit) = @_;
return $self->_search_attributes($search_string,['Note'],$limit);
}
=head2 types
Title : types
Usage : @type_list = $db->types
Function: Get all the types in the database
Returns : array of Bio::DB::GFF::Typename objects
Args : none
Status : public
=cut
sub types {
shift->throw_not_implemented;
}
=head2 insert_sequence
Title : insert_sequence
Usage : $success = $db->insert_sequence($seqid,$sequence_string,$offset)
Function: Inserts sequence data into the database at the indicated offset
Returns : true if successful
Args : see below
Status : public
This method inserts the DNA or protein sequence fragment
$sequence_string, identified by the ID $seqid, into the database at
the indicated offset $offset. It is used internally by the GFF3Loader
to load sequence data from the files.
=cut
###
# insert_sequence()
#
# insert a bit of primary sequence into the database
#
sub insert_sequence {
my $self = shift;
my ($seqid,$seq,$offset) = @_;
$offset ||= 0;
$self->_insert_sequence($seqid,$seq,$offset);
}
=head2 fetch_sequence
Title : fetch_sequence
Usage : $sequence = $db->fetch_sequence(-seq_id=>$seqid,-start=>$start,-end=>$end)
Function: Fetch the indicated subsequence from the database
Returns : The sequence string (not a Bio::PrimarySeq object, unless -bioseq is true)
Args : see below
Status : public
This method retrieves a portion of the indicated sequence. The arguments are:
Argument   Value
--------   -----
-seq_id    Chromosome, contig or other DNA segment
-seqid     Synonym for -seq_id
-name      Synonym for -seq_id
-start     Start of range
-end       End of range
-class     Obsolete argument used for Bio::DB::GFF compatibility. If
           specified will qualify the seq_id as "$class:$seq_id".
-bioseq    Boolean flag; if true, returns a Bio::PrimarySeq object instead
           of a sequence string.
You can call fetch_sequence using the following shortcuts:
$seq = $db->fetch_sequence('chr3'); # entire chromosome
$seq = $db->fetch_sequence('chr3',1000); # position 1000 to end of chromosome
$seq = $db->fetch_sequence('chr3',undef,5000); # position 1 to 5000
$seq = $db->fetch_sequence('chr3',1000,5000); # positions 1000 to 5000
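The defaulting behaviour of these shortcuts can be sketched on a plain string. This is only an illustration under the assumption of 1-based, inclusive coordinates (as the comments above suggest); subseq() is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: extract a 1-based, inclusive range from a string.
# A missing start defaults to 1; a missing end defaults to the sequence length.
sub subseq {
    my ($seq, $start, $end) = @_;
    $start = 1            unless defined $start;
    $end   = length($seq) unless defined $end;
    return substr($seq, $start - 1, $end - $start + 1);
}

my $chr = 'ACGTACGTAC';               # stand-in for a stored sequence
print subseq($chr), "\n";             # entire sequence
print subseq($chr, 4), "\n";          # position 4 to the end
print subseq($chr, undef, 5), "\n";   # positions 1 to 5
print subseq($chr, 4, 5), "\n";       # positions 4 to 5
```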
$self->{seqfeatureclass} = $class;
}
$d;
}
=head2 reindex
Title : reindex
Usage : $db->reindex
Function: reindex the database
Returns : nothing
Args : nothing
Status : public
This method will force the secondary indexes (name, location,
attributes, feature types) to be recalculated. It may be useful to
rebuild a corrupted database.
=cut
###
# force reindexing
#
sub reindex {
my $self = shift;
my $count = 0;
my $now;
my $last_time = time();
$self->_start_reindexing;
my $iterator = $self->get_seq_stream;
while (my $f = $iterator->next_seq) {
if (++$count %1000 == 0) {
$now = time();
my $elapsed = sprintf(" in %5.2fs",$now - $last_time);
$last_time = $now;
print STDERR "$count features indexed$elapsed...",' 'x60;
print STDERR -t STDOUT && !$ENV{EMACS} ? "\r" : "\n";
}
$self->_update_indexes($f);
}
$self->_end_reindexing;
}
=head2 attributes
Title : attributes
Usage : @a = $db->attributes
Function: Returns list of all known attributes
Returns : list of all known attributes
Args : nothing
Status : public
=cut
sub attributes {
my $self = shift;
# call on $self: a second shift would take from the now-empty @_
$self->throw_not_implemented;
}
=head2 start_bulk_update,finish_bulk_update
Title : start_bulk_update,finish_bulk_update
Usage : $db->start_bulk_update
$db->finish_bulk_update
Function: Activate optimizations for large number of insertions/updates
Returns : nothing
Args : nothing
Status : public
With some adaptors (currently only the DBI::mysql adaptor), these
methods signal the adaptor that a large number of insertions or
updates are to be performed, and activate certain optimizations. These
methods are called automatically by the
Bio::DB::SeqFeature::Store::GFF3Loader module.
Example:
$db->start_bulk_update;
for my $f (@features) {
$db->store($f);
}
$db->finish_bulk_update;
=cut
sub start_bulk_update { shift->_start_bulk_update(@_) }
sub finish_bulk_update { shift->_finish_bulk_update(@_) }
=head2 add_SeqFeature
Title : add_SeqFeature
Usage : $count = $db->add_SeqFeature($parent,@children)
Function: store a parent/child relationship between a $parent and @children
features that are already stored in the database
Returns : number of children successfully stored
Args : parent feature or primary ID and children features or primary IDs
Status : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS
If can_store_parentage() returns true, then some store-aware features
(e.g. Bio::DB::SeqFeature) will invoke this method to store
feature/subfeature relationships in a normalized table.
=cut
# these two are called only if can_store_parentage() returns true
# _add_SeqFeature ($parent,@children)
sub add_SeqFeature { shift->_add_SeqFeature(@_) }
=head2 fetch_SeqFeatures
Title : fetch_SeqFeatures
Usage : @children = $db->fetch_SeqFeatures($parent_feature)
Function: return the immediate subfeatures of the indicated feature
Returns : list of subfeatures
Args : the parent feature and an optional list of children types
Status : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS
$feature->primary_id($key);
$self->store($feature);
}
sub FETCH {
my $self = shift;
$self->fetch(@_);
}
sub FIRSTKEY {
my $self = shift;
$self->_firstid;
}
sub NEXTKEY {
my $self = shift;
my $lastkey = shift;
$self->_nextid($lastkey);
}
sub EXISTS {
my $self = shift;
my $key = shift;
$self->_existsid($key);
}
sub DELETE {
my $self = shift;
my $key = shift;
$self->_deleteid($key);
}
sub CLEAR {
my $self = shift;
$self->_clearall;
}
sub SCALAR {
my $self = shift;
$self->_featurecount;
}
###################### TO BE IMPLEMENTED BY ADAPTOR ##########
=head2 _init_database
Title : _init_database
Usage : $success = $db->_init_database([$erase])
Function: initialize an empty database
Returns : true on success
Args : optional boolean flag to erase contents of an existing database
Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY AN ADAPTOR
This method is the back end for init_database(). It must be
implemented by an adaptor that inherits from
Bio::DB::SeqFeature::Store. It returns true on success.
=cut
sub _init_database { shift->throw_not_implemented }
=head2 _store
Title : _store
Usage : $success = $db->_store($indexed,@objects)
Function: store seqfeature objects into database
Returns : true on success
Args : a boolean flag indicating whether objects are to be indexed,
and one or more objects
Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY AN ADAPTOR
This method is the back end for store() and store_noindex(). It should
write the seqfeature objects into the database. If indexing is
requested, the features should be indexed for query and
retrieval. Otherwise the features should be stored without indexing
(it is not required that adaptors respect this).
If the object has no primary_id (undef), then the object is written
into the database and assigned a new primary_id. If the object already
has a primary_id, then the system will perform an update, replacing
whatever was there before.
In practice, the implementation will serialize each object using the
freeze() method and then store it in the database under the
corresponding primary_id. The object is then updated with the
primary_id.
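The storage contract described above can be sketched with a toy in-memory class. Everything here is an assumption made for illustration: ToyStore is a hypothetical package, features are plain hash references, and Storable's freeze()/thaw() stand in for the store's own serialization methods.

```perl
use strict;
use warnings;

package ToyStore;    # hypothetical in-memory stand-in, not a real adaptor
use Storable qw(freeze thaw);

sub new {
    my $class = shift;
    return bless { data => {}, next_id => 1 }, $class;
}

# Store each object under its primary_id, assigning a new id when none is set.
# The $indexed flag is accepted but ignored in this sketch.
sub _store {
    my ($self, $indexed, @objs) = @_;
    for my $obj (@objs) {
        my $id = $obj->{primary_id};
        $id = $self->{next_id}++ unless defined $id;
        $obj->{primary_id} = $id;            # update the object with its id
        $self->{data}{$id} = freeze($obj);   # replaces any earlier copy
    }
    return 1;
}

# Retrieve and deserialize one object, or undef if the id is unknown.
sub _fetch {
    my ($self, $id) = @_;
    my $frozen = $self->{data}{$id};
    return defined $frozen ? thaw($frozen) : undef;
}

package main;

my $db = ToyStore->new;
my $f  = { name => 'exon1', start => 100, end => 200 };   # no primary_id yet
$db->_store(1, $f);
print "assigned id: $f->{primary_id}\n";

$f->{end} = 250;          # storing again with the same id is an update
$db->_store(1, $f);
my $copy = $db->_fetch($f->{primary_id});
print "end is now $copy->{end}\n";
```

A real adaptor would also need to avoid collisions between generated IDs and IDs supplied by the caller, which this sketch does not attempt.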
=cut
# _store($indexed,@objs)
sub _store {
my $self = shift;
my $indexed = shift;
my @objs = @_;
$self->throw_not_implemented;
}
=head2 _fetch
Title : _fetch
Usage : $feature = $db->_fetch($primary_id)
Function: fetch feature from database
Returns : feature
Args : primary id
Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY AN ADAPTOR
This method is the back end for fetch(). It accepts a primary_id and
returns a feature object. It must be implemented by the adaptor.
In practice, the implementation will retrieve the serialized
Bio::SeqFeatureI object from the database and pass it to the thaw()
method to unserialize it and synchronize the primary_id.
=cut
# _fetch($id)
sub _fetch { shift->throw_not_implemented }
=head2 _fetch_many
Title : _fetch_many
Usage : @features = $db->_fetch_many(@primary_ids)
Function: fetch many features from database
Returns : list of features
Args : list of primary ids
Status : private -- does not need to be implemented
This method fetches many features specified by a list of IDs. The
default implementation simply calls _fetch() once for each
primary_id. Implementors can override it if needed for efficiency.
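The difference between the default and an overridden version can be sketched as follows. BaseStore and BatchStore are hypothetical toy classes; the "batched" retrieval is just a hash slice standing in for whatever single-round-trip query a real adaptor might issue.

```perl
use strict;
use warnings;

package BaseStore;    # hypothetical stand-in for the base class
sub new {
    my $class = shift;
    return bless { data => {@_}, calls => 0 }, $class;
}
sub _fetch {
    my ($self, $id) = @_;
    $self->{calls}++;                 # count individual lookups
    return $self->{data}{$id};
}
# Default behaviour: one _fetch() call per id.
sub _fetch_many {
    my $self = shift;
    return map { $self->_fetch($_) } @_;
}

package BatchStore;   # hypothetical adaptor with a batched override
our @ISA = ('BaseStore');
# Override: retrieve all requested ids in one operation (a hash slice here).
sub _fetch_many {
    my $self = shift;
    return @{ $self->{data} }{@_};
}

package main;

my @contents = (1 => 'featA', 2 => 'featB', 3 => 'featC');
my $slow = BaseStore->new(@contents);
my $fast = BatchStore->new(@contents);
my @a = $slow->_fetch_many(1, 3);
my @b = $fast->_fetch_many(1, 3);
print "default:  @a ($slow->{calls} _fetch calls)\n";
print "override: @b ($fast->{calls} _fetch calls)\n";
```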
=cut
# _fetch_many(@ids)
# this one will fall back to many calls on _fetch() if you don't
# override it
sub _fetch_many {
my $self = shift;
return map {$self->_fetch($_)} @_;
}
=head2 _update_indexes
Title : _update_indexes
Usage : $success = $db->_update_indexes($feature)
Function: update the indexes for a feature
Returns : true on success
Args : A seqfeature object
Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY AN ADAPTOR
This method is called by reindex() to update the searchable indexes
for a feature object that has changed.
=cut
# this is called to index a feature
sub _update_indexes { shift->throw_not_implemented }
=head2 _start_reindexing, _end_reindexing
Title : _start_reindexing, _end_reindexing
Usage : $db->_start_reindexing()
$db->_end_reindexing
Function: flag that a series of reindexing operations is beginning/ending
Returns : true on success
Args : none
Status : MAY BE IMPLEMENTED BY AN ADAPTOR (optional)
These methods are called by reindex() before and immediately after a
series of reindexing operations. The default behavior is to do
nothing, but these methods can be overridden by an adaptor in order to
perform optimizations, turn off autocommits, etc.
=cut
# these do not necessarily have to be overridden
# they are called at beginning and end of reindexing process
sub _start_reindexing {}
sub _end_reindexing {}
=head2 _features
Title : _features
Usage : @features = $db->_features(@args)
Function: back end for all get_features_by_*() queries
Returns : list of features
Args : see below
Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY ADAPTOR
This is the backend for features(), get_features_by_name(),
get_features_by_location(), etc. Arguments are as described for the
features() method, except that only the named-argument form is
recognized.
=cut
# bottleneck query generator
sub _features { shift->throw_not_implemented }
=head2 _search_attributes
Title : _search_attributes
Usage : @result_list = $db->_search_attributes("text search string",[$tag1,$tag2...],$limit)
Function: back end for the search_attributes() method
Returns : results list
Args : as per search_attributes()
Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY ADAPTOR
See search_attributes() for the format of the results list. The only
difference between this and the public method is that the tag list is
guaranteed to be an array reference.
=cut
sub _search_attributes { shift->throw_not_implemented }
=head2 can_store_parentage
Title : can_store_parentage
Usage : $flag = $db->can_store_parentage
Function: return true if this adaptor can store parent/child relationships
Returns : boolean
Args : none
Status : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS
Override this method and return true if this adaptor supports the
_add_SeqFeature() and _fetch_SeqFeatures() methods, which are used for
storing feature parent/child relationships in a normalized
fashion. Default is false (parent/child relationships are stored in
denormalized form in each feature).
=cut
# return true here if the storage engine is prepared to store parent/child
# relationships using _add_SeqFeature and return them using _fetch_SeqFeatures
sub can_store_parentage { return; }
=head2 _add_SeqFeature
Title : _add_SeqFeature
Usage : $count = $db->_add_SeqFeature($parent,@children)
Function: store a parent/child relationship between $parent and @children
Returns : number of children successfully stored
Args : parent feature and one or more children
Status : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS
If can_store_parentage() returns true, then some store-aware features
(e.g. Bio::DB::SeqFeature) will invoke this method to store
feature/subfeature relationships in a normalized table.
=cut
sub _add_SeqFeature { shift->throw_not_implemented }
=head2 _fetch_SeqFeatures
Title : _fetch_SeqFeatures
Usage : @children = $db->_fetch_SeqFeatures($parent_feature)
Function: return the immediate subfeatures of the indicated feature
Returns : list of subfeatures
Args : the parent feature
Status : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS
If can_store_parentage() returns true, then some store-aware features
(e.g. Bio::DB::SeqFeature) will invoke this method to retrieve
feature/subfeature relationships from the database.
=cut
# _fetch_SeqFeatures($parent,@list_of_child_types)
sub _fetch_SeqFeatures {shift->throw_not_implemented }
=head2 _insert_sequence
Title : _insert_sequence
Usage : $success = $db->_insert_sequence($seqid,$sequence_string,$offset)
Function: Inserts sequence data into the database at the indicated offset
Returns : true if successful
Args : see below
Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY ADAPTOR
This is the back end for insert_sequence(). Adaptors must implement
this method in order to store and retrieve nucleotide or protein
sequence.
=cut
sub _insert_sequence { shift->throw_not_implemented }
# _fetch_sequence() is similar to old dna() method
=head2 _fetch_sequence
Title : _fetch_sequence
Usage : $sequence = $db->_fetch_sequence(-seq_id=>$seqid,-start=>$start,-end=>$end)
Function: Fetch the indicated subsequence from the database
Returns : The sequence string (not a Bio::PrimarySeq object!)
Args : see below
Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY ADAPTOR
This is the back end for fetch_sequence(). Adaptors must implement
this method in order to store and retrieve nucleotide or protein
sequence.
=cut
sub _fetch_sequence { shift->throw_not_implemented }
sub seq {
my $self = shift;
my ($seq_id,$start,$end) = @_;
if (my $a = $self->dna_accessor) {
return $a->can('seq') ? $a->seq($seq_id,$start,$end)
:$a->can('fetch_sequence')? $a->fetch_sequence($seq_id,$start,$end)
: undef;
}
else {
return $self->_fetch_sequence($seq_id,$start,$end);
}
}
=head2 _seq_ids
Title : _seq_ids
Usage : @ids = $db->_seq_ids()
Function: Return all sequence IDs contained in database
Returns : list of sequence IDs
Args : none
Status : TO BE IMPLEMENTED BY ADAPTOR
This method is invoked by seq_ids() to return all sequence IDs
(coordinate systems) known to the database.
=cut
sub _seq_ids { shift->throw_not_implemented }
=head2 _start_bulk_update,_finish_bulk_update
Title : _start_bulk_update, _finish_bulk_update
Usage : $db->_start_bulk_update
$db->_finish_bulk_update
Function: Activate optimizations for large number of insertions/updates
Returns : nothing
Args : nothing
Status : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTOR
These are the backends for start_bulk_update() and
finish_bulk_update(). The default behavior of both methods is to do
nothing.
=cut
# Optional flags to change behavior to optimize bulk updating.
sub _start_bulk_update { }
sub _finish_bulk_update { }
# for full TIE() interface - not necessary to implement in most cases
=head2 Optional methods needed to implement full TIEHASH interface
The core TIEHASH interface will work if just the _store() and _fetch()
methods are implemented. To support the full TIEHASH interface,
including support for keys(), each(), and exists(), the following
methods should be implemented:
=over 4
=item $id = $db-E<gt>_firstid()
Return the first primary ID in the database. Needed for the each()
function.
=item $next_id = $db-E<gt>_nextid($id)
Given a primary ID, return the next primary ID in the series. Needed
for the each() function.
=item $boolean = $db-E<gt>_existsid($id)
Returns true if the indicated primary ID is in the database. Needed
for the exists() function.
=item $db-E<gt>_deleteid($id)
Delete the feature corresponding to the given primary ID. Needed for
delete().
=item $db-E<gt>_clearall()
Empty the database. Needed for %tied_hash = ().
=item $count = $db-E<gt>_featurecount()
Return the number of features in the database. Needed for scalar
%tied_hash.
=back
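How the tie hooks might delegate to the methods listed above can be sketched with a toy in-memory class. ToyTie is hypothetical and keeps its data in a plain hash; the sorted key order used for iteration is an arbitrary choice made for this example.

```perl
use strict;
use warnings;

package ToyTie;    # hypothetical in-memory class, not a real adaptor

sub TIEHASH {
    my $class = shift;
    return bless { data => {}, order => [] }, $class;
}

# Back-end methods, named after the list above.
sub _firstid {
    my $self = shift;
    $self->{order} = [ sort keys %{ $self->{data} } ];   # snapshot key order
    return $self->{order}[0];
}
sub _nextid {
    my ($self, $last) = @_;
    my $order = $self->{order};
    for my $i (0 .. $#$order) {
        return $order->[$i + 1] if $order->[$i] eq $last;
    }
    return;
}
sub _existsid     { my ($self, $id) = @_; return exists $self->{data}{$id} }
sub _deleteid     { my ($self, $id) = @_; return delete $self->{data}{$id} }
sub _clearall     { my $self = shift; %{ $self->{data} } = (); return }
sub _featurecount { my $self = shift; return scalar keys %{ $self->{data} } }

# Tie hooks delegate to the back-end methods.
sub STORE    { my ($self, $k, $v) = @_; $self->{data}{$k} = $v }
sub FETCH    { my ($self, $k) = @_; return $self->{data}{$k} }
sub FIRSTKEY { $_[0]->_firstid }
sub NEXTKEY  { $_[0]->_nextid($_[1]) }
sub EXISTS   { $_[0]->_existsid($_[1]) }
sub DELETE   { $_[0]->_deleteid($_[1]) }
sub CLEAR    { $_[0]->_clearall }
sub SCALAR   { $_[0]->_featurecount }

package main;

tie my %db, 'ToyTie';
$db{10} = 'featA';
$db{20} = 'featB';
my @ids    = keys %db;           # FIRSTKEY, then NEXTKEY until undef
my $count  = scalar(%db);        # SCALAR
my $has_10 = exists $db{10};     # EXISTS
delete $db{10};                  # DELETE
my $after  = scalar(%db);
%db = ();                        # CLEAR
my $empty  = scalar(%db);
print "ids=@ids count=$count after=$after empty=$empty\n";
```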
=cut
sub _firstid { shift->throw_not_implemented }
sub _nextid { shift->throw_not_implemented }
sub _existsid { shift->throw_not_implemented }
sub _deleteid { shift->throw_not_implemented }
sub _clearall { shift->throw_not_implemented }
sub _featurecount { shift->throw_not_implemented }
=head1 Internal Methods
These methods are internal to Bio::DB::SeqFeature::Store and adaptors.
=head2 new_instance
Title : new_instance
Usage : $db = $db->new_instance()
Function: class constructor
Returns : A descendant of Bio::DB::SeqFeature::Store
Args : none
Status : internal
This method is called internally by new() to create a new
uninitialized instance of Bio::DB::SeqFeature::Store. It is used
internally and should not be called by application software.
=cut
sub new_instance {
my $class = shift;
return bless {},ref($class) || $class;
}
=head2 init
Title : init
Usage : $db->init(@args)
Function: initialize object
Returns : none
Args : Arguments passed to new()
Status : private
This method is called internally by new() to initialize a
newly-created object using the arguments passed to new(). It is to be
overridden by Bio::DB::SeqFeature::Store adaptors.
=cut
sub init {
my $self = shift;
$self->default_settings();
}
=head2 default_settings
Title : default_settings
Usage : $db->default_settings()
Function: set up default settings for the adaptor
Returns : none
Args : none
Status : private
This method may be overridden by adaptors. It is responsible for
setting up object default settings.
=cut
my ($seq_name,$start,$end,$types,$bins,$iterator) =
rearrange([['SEQID','SEQ_ID','REF'],'START',['STOP','END'],
['TYPES','TYPE','PRIMARY_TAG'],
'BINS',
'ITERATOR',
],@_);
my ($coverage,$tag) = $self->coverage_array(-seqid=> $seq_name,
-start=> $start,
-end => $end,
-type => $types,
-bins => $bins) or return;
my $score = 0;
for (@$coverage) { $score += $_ }
$score /= @$coverage;
my $feature = Bio::SeqFeature::Lite->new(-seq_id => $seq_name,
-start => $start,
-end => $end,
-type => $tag,
-score => $score,
-attributes =>
{ coverage => [$coverage] });
return $iterator
? Bio::DB::SeqFeature::Store::FeatureIterator->new($feature)
: $feature;
}
=head2 coverage_array
Title : coverage_array
Usage : $arrayref = $db->coverage_array(@args)
Function: returns a coverage summary across indicated region/type
Returns : an array reference
Args : see below
Status : public
This method is used to get coverage density information across a
region of interest. The arguments are identical to feature_summary,
except that instead of returning a Bio::SeqFeatureI object, it returns
an array reference of the desired number of bins. The value of each
element corresponds to the number of features in the bin.
Arguments:
Argument       Description
--------       -----------
-seq_id        Sequence ID for the region
-start         Start of region
-end           End of region
-type/-types   Feature type of interest or array ref of types
-bins          Number of bins across region. Defaults to 1000.
Note that this method uses an approximate algorithm that is only
accurate to 500 bp, so when dealing with bins that are smaller than
1000 bp, you may see some shifting of counts between adjacent bins.
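To make the idea of per-bin feature counts concrete, here is a small sketch that bins features by exact overlap. It is not the approximate algorithm mentioned above: toy_coverage() is a hypothetical helper, features are given as [start, end] pairs, and coordinates are assumed to be 1-based and inclusive.

```perl
use strict;
use warnings;

# Hypothetical helper: count features per bin by exact overlap.
# $features is an array ref of [start, end] pairs.
sub toy_coverage {
    my ($features, $start, $end, $bins) = @_;
    my $width = ($end - $start + 1) / $bins;   # bin width in bp
    my @cov   = (0) x $bins;
    for my $f (@$features) {
        my ($fs, $fe) = @$f;
        next if $fe < $start || $fs > $end;    # entirely outside the region
        $fs = $start if $fs < $start;          # clip to the region
        $fe = $end   if $fe > $end;
        my $first = int(($fs - $start) / $width);
        my $last  = int(($fe - $start) / $width);
        $cov[$_]++ for $first .. $last;        # one count per overlapped bin
    }
    return \@cov;
}

my $cov = toy_coverage([[1, 250], [300, 400], [900, 1000]], 1, 1000, 4);
print "@$cov\n";

# Mean of the bins, analogous to how a summary score can be derived
# from the coverage array.
my $score = 0;
$score += $_ for @$cov;
$score /= @$cov;
print "$score\n";
```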
=cut
sub coverage_array {
shift->throw_not_implemented;
}
package Bio::DB::SeqFeature::Store::FeatureIterator;
$Bio::DB::SeqFeature::Store::FeatureIterator::VERSION = '1.7.5';
sub new {
my $self = shift;
my @features = @_;
return bless \@features,ref $self || $self;
}
sub next_seq {
my $self = shift;
return unless @$self;
return shift @$self;
}
sub begin_work { }# noop
sub commit { }# noop
sub rollback { }# noop
1;
__END__
=head1 BUGS
This is an early version, so there are certainly some bugs. Please
use the BioPerl bug tracking system to report bugs.
=head1 SEE ALSO
L<Bio::DB::SeqFeature>,
L<Bio::DB::SeqFeature::Store::GFF3Loader>,
L<Bio::DB::SeqFeature::Segment>,
L<Bio::DB::SeqFeature::Store::DBI::mysql>,
L<Bio::DB::SeqFeature::Store::berkeleydb>,
L<Bio::DB::SeqFeature::Store::memory>
=head1 AUTHOR
Lincoln Stein E<lt>lstein@cshl.orgE<gt>.
Copyright (c) 2006 Cold Spring Harbor Laboratory.
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
=cut