Bio-DB-SeqFeature


lib/Bio/DB/SeqFeature/Store.pm  view on Meta::CPAN

NOTE: This search will fail to find features that do not have a display name!

You can use fetch() or fetch_many() with the returned IDs to get to
the features themselves.

=cut

sub search_attributes {
  my $self = shift;
  my ($search_string,$attribute_names,$limit) = @_;
  my $attribute_array   = ref $attribute_names
                      && ref $attribute_names eq 'ARRAY' ? $attribute_names : [$attribute_names];
  return $self->_search_attributes($search_string,$attribute_array,$limit);
}

=head2 search_notes

 Title   : search_notes
 Usage   : @result_list = $db->search_notes("full text search string",$limit)
 Function: Search the notes for a text string
 Returns : array of results
 Args    : full text search string, and an optional feature limit
 Status  : public

Given a search string, this method performs a full-text search of the
"Notes" attribute and returns an array of results.  Each row of the
returned array is an arrayref containing the following fields:

  column 1     The display_name of the feature, suitable for passing to get_feature_by_name()
  column 2     The text of the note
  column 3     A relevance score.
  column 4     The type

NOTE: This is equivalent to $db-E<gt>search_attributes('full text search
string','Note',$limit). This search will fail to find features that do
not have a display name!
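
The row layout described above can be unpacked as in the following
sketch. The rows are made-up sample data standing in for what
search_notes() might return; no database is queried, and the variable
names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical sample rows in the documented layout:
# [display_name, note text, relevance score, type]
my @results = (
    ['geneA', 'putative kinase',        0.42, 'gene'],
    ['geneB', 'kinase-like pseudogene', 0.87, 'pseudogene'],
);

# Sort by descending relevance (column 3) and unpack each arrayref.
my @ranked = sort { $b->[2] <=> $a->[2] } @results;
for my $row (@ranked) {
    my ($name, $note, $score, $type) = @$row;
    printf "%-8s %-12s %.2f  %s\n", $name, $type, $score, $note;
}
```

The display name in column 1 is what you would hand to a name-based
lookup to retrieve the feature itself.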

=cut

###
# search_notes()
#
sub search_notes {
  my $self = shift;
  my ($search_string,$limit) = @_;
  return $self->_search_attributes($search_string,['Note'],$limit);
}

=head2 types

 Title   : types
 Usage   : @type_list = $db->types
 Function: Get all the types in the database
 Returns : array of Bio::DB::GFF::Typename objects
 Args    : none
 Status  : public

=cut

sub types {
    shift->throw_not_implemented;
}

=head2 insert_sequence

 Title   : insert_sequence
 Usage   : $success = $db->insert_sequence($seqid,$sequence_string,$offset)
 Function: Inserts sequence data into the database at the indicated offset
 Returns : true if successful
 Args    : see below
 Status  : public

This method inserts the DNA or protein sequence fragment
$sequence_string, identified by the ID $seqid, into the database at
the indicated offset $offset. It is used internally by the GFF3Loader
to load sequence data from the files.

=cut

###
# insert_sequence()
#
# insert a bit of primary sequence into the database
#
sub insert_sequence {
  my $self = shift;
  my ($seqid,$seq,$offset) = @_;
  $offset ||= 0;
  $self->_insert_sequence($seqid,$seq,$offset);
}


=head2 fetch_sequence

 Title   : fetch_sequence
 Usage   : $sequence = $db->fetch_sequence(-seq_id=>$seqid,-start=>$start,-end=>$end)
 Function: Fetch the indicated subsequence from the database
 Returns : The sequence string (a Bio::PrimarySeq object only if -bioseq is true)
 Args    : see below
 Status  : public

This method retrieves a portion of the indicated sequence. The arguments are:

  Argument       Value
  --------       -----
  -seq_id        Chromosome, contig or other DNA segment
  -seqid         Synonym for -seq_id
  -name          Synonym for -seq_id
  -start         Start of range
  -end           End of range
  -class         Obsolete argument used for Bio::DB::GFF compatibility. If
                  specified will qualify the seq_id as "$class:$seq_id".
  -bioseq        Boolean flag; if true, returns a Bio::PrimarySeq object instead
                  of a sequence string.

You can call fetch_sequence using the following shortcuts:

 $seq = $db->fetch_sequence('chr3');  # entire chromosome
 $seq = $db->fetch_sequence('chr3',1000);        # position 1000 to end of chromosome
 $seq = $db->fetch_sequence('chr3',undef,5000);  # position 1 to 5000
 $seq = $db->fetch_sequence('chr3',1000,5000);   # positions 1000 to 5000
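
How the positional shortcuts fill in missing coordinates can be
sketched against a plain string. The helper subseq_sketch() is
hypothetical and not part of this module; it assumes 1-based,
inclusive coordinates, with an undefined start meaning position 1 and
an undefined end meaning the end of the sequence.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: resolves the positional
# shortcuts against a plain string (1-based, inclusive coordinates).
sub subseq_sketch {
    my ($seq, $start, $end) = @_;
    $start = 1            unless defined $start;
    $end   = length($seq) unless defined $end;
    return substr($seq, $start - 1, $end - $start + 1);
}

my $chr = 'ACGTACGTAC';
print subseq_sketch($chr), "\n";            # whole sequence
print subseq_sketch($chr, 4), "\n";         # position 4 to end
print subseq_sketch($chr, undef, 3), "\n";  # positions 1 to 3
print subseq_sketch($chr, 4, 6), "\n";      # positions 4 to 6
```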

lib/Bio/DB/SeqFeature/Store.pm  view on Meta::CPAN

    $self->{seqfeatureclass} = $class;
  }
  $d;
}

=head2 reindex

 Title   : reindex
 Usage   : $db->reindex
 Function: reindex the database
 Returns : nothing
 Args    : nothing
 Status  : public

This method will force the secondary indexes (name, location,
attributes, feature types) to be recalculated. It may be useful to
rebuild a corrupted database.

=cut

###
# force reindexing
#
sub reindex {
  my $self = shift;

  my $count = 0;
  my $now;
  my $last_time = time();

  $self->_start_reindexing;

  my $iterator = $self->get_seq_stream;
  while (my $f = $iterator->next_seq) {
    if (++$count %1000 == 0) {
      $now = time();
      my $elapsed = sprintf(" in %5.2fs",$now - $last_time);
      $last_time = $now;
      print STDERR "$count features indexed$elapsed...",' 'x60;
      print STDERR -t STDOUT && !$ENV{EMACS} ? "\r" : "\n";
    }
    $self->_update_indexes($f);
  }

  $self->_end_reindexing;
}

=head2 attributes

 Title   : attributes
 Usage   : @a = $db->attributes
 Function: Returns list of all known attributes
 Returns : list of all known attributes
 Args    : nothing
 Status  : public

=cut

sub attributes {
    my $self = shift;
    $self->throw_not_implemented;
}


=head2 start_bulk_update,finish_bulk_update

 Title   : start_bulk_update,finish_bulk_update
 Usage   : $db->start_bulk_update
           $db->finish_bulk_update
 Function: Activate optimizations for large number of insertions/updates
 Returns : nothing
 Args    : nothing
 Status  : public

With some adaptors (currently only the DBI::mysql adaptor), these
methods signal the adaptor that a large number of insertions or
updates are to be performed, and activate certain optimizations. These
methods are called automatically by the
Bio::DB::SeqFeature::Store::GFF3Loader module.

Example:

  $db->start_bulk_update;
  for my $f (@features) {
    $db->store($f);
  }
  $db->finish_bulk_update;

=cut

sub start_bulk_update  { shift->_start_bulk_update(@_) }
sub finish_bulk_update { shift->_finish_bulk_update(@_) }

=head2 add_SeqFeature

 Title   : add_SeqFeature
 Usage   : $count = $db->add_SeqFeature($parent,@children)
 Function: store a parent/child relationship between a $parent and @children
           features that are already stored in the database
 Returns : number of children successfully stored
 Args    : parent feature or primary ID and children features or primary IDs
 Status  : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS

If can_store_parentage() returns true, then some store-aware features
(e.g. Bio::DB::SeqFeature) will invoke this method to store
feature/subfeature relationships in a normalized table.

=cut

# these two are called only if can_store_parentage() returns true
# _add_SeqFeature ($parent,@children)
sub add_SeqFeature  { shift->_add_SeqFeature(@_)   }

=head2 fetch_SeqFeatures

 Title   : fetch_SeqFeatures
 Usage   : @children = $db->fetch_SeqFeatures($parent_feature)
 Function: return the immediate subfeatures of the indicated feature
 Returns : list of subfeatures
 Args    : the parent feature and an optional list of children types
 Status  : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS

lib/Bio/DB/SeqFeature/Store.pm  view on Meta::CPAN

  $feature->primary_id($key);
  $self->store($feature);
}

sub FETCH {
  my $self = shift;
  $self->fetch(@_);
}

sub FIRSTKEY {
  my $self = shift;
  $self->_firstid;
}

sub NEXTKEY {
  my $self    = shift;
  my $lastkey = shift;
  $self->_nextid($lastkey);
}

sub EXISTS {
  my $self = shift;
  my $key  = shift;
  $self->_existsid($key);
}

sub DELETE {
  my $self = shift;
  my $key  = shift;
  $self->_deleteid($key);
}

sub CLEAR {
  my $self = shift;
  $self->_clearall;
}

sub SCALAR {
  my $self = shift;
  $self->_featurecount;
}


###################### TO BE IMPLEMENTED BY ADAPTOR ##########

=head2 _init_database

 Title   : _init_database
 Usage   : $success = $db->_init_database([$erase])
 Function: initialize an empty database
 Returns : true on success
 Args    : optional boolean flag to erase contents of an existing database
 Status  : ABSTRACT METHOD; MUST BE IMPLEMENTED BY AN ADAPTOR

This method is the back end for init_database(). It must be
implemented by an adaptor that inherits from
Bio::DB::SeqFeature::Store. It returns true on success.

=cut

sub _init_database { shift->throw_not_implemented }

=head2 _store

 Title   : _store
 Usage   : $success = $db->_store($indexed,@objects)
 Function: store seqfeature objects into database
 Returns : true on success
 Args    : a boolean flag indicating whether objects are to be indexed,
           and one or more objects
 Status  : ABSTRACT METHOD; MUST BE IMPLEMENTED BY AN ADAPTOR

This method is the back end for store() and store_noindex(). It should
write the seqfeature objects into the database. If indexing is
requested, the features should be indexed for query and
retrieval. Otherwise the features should be stored without indexing
(it is not required that adaptors respect this).

If the object has no primary_id (undef), then the object is written
into the database and assigned a new primary_id. If the object already
has a primary_id, then the system will perform an update, replacing
whatever was there before.

In practice, the implementation will serialize each object using the
freeze() method and then store it in the database under the
corresponding primary_id. The object is then updated with the
primary_id.
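
The contract described above can be sketched with a toy in-memory
adaptor. Everything here is an assumption made for illustration: the
package name My::MemoryStoreSketch, the hash layout, the sequential ID
counter, and the use of plain hashrefs in place of real feature
objects. Storable's freeze() stands in for the serialization step.

```perl
use strict;
use warnings;
use Storable ();

# Hypothetical in-memory adaptor; names and layout are illustrative only.
package My::MemoryStoreSketch;

sub new { return bless { data => {}, next_id => 1 }, shift }

# Assign a primary_id when missing, overwrite when present, and
# serialize the object under that id.
sub _store {
    my ($self, $indexed, @objs) = @_;
    for my $obj (@objs) {
        my $id = $obj->{primary_id};
        $id = $self->{next_id}++ unless defined $id;
        $obj->{primary_id} = $id;
        $self->{data}{$id} = Storable::freeze($obj);
        # A real adaptor would update its indexes here when $indexed is true.
    }
    return 1;
}

package main;

my $db = My::MemoryStoreSketch->new;
my $f  = { name => 'geneA' };    # stand-in for a feature object
$db->_store(1, $f);
print "assigned primary_id $f->{primary_id}\n";
```

A production adaptor would also need to keep generated IDs from
colliding with IDs supplied by the caller, which this sketch ignores.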

=cut

# _store($indexed,@objs)
sub _store {
  my $self    = shift;
  my $indexed = shift;
  my @objs    = @_;
  $self->throw_not_implemented;
}

=head2 _fetch

 Title   : _fetch
 Usage   : $feature = $db->_fetch($primary_id)
 Function: fetch feature from database
 Returns : feature
 Args    : primary id
 Status  : ABSTRACT METHOD; MUST BE IMPLEMENTED BY AN ADAPTOR

This method is the back end for fetch(). It accepts a primary_id and
returns a feature object. It must be implemented by the adaptor.

In practice, the implementation will retrieve the serialized
Bio::SeqFeatureI object from the database and pass it to the thaw()
method to unserialize it and synchronize the primary_id.
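
A minimal sketch of that retrieve-and-thaw step follows. It assumes a
plain hash of frozen hashrefs as the backing store and uses Storable
directly; fetch_sketch() and %frozen are hypothetical names, not part
of this module.

```perl
use strict;
use warnings;
use Storable ();

# Hypothetical store: a plain hash mapping primary_id to frozen features.
my %frozen = ( 7 => Storable::freeze({ name => 'geneA' }) );

# Thaw the stored bytes and make sure the returned object carries the
# id it was stored under.
sub fetch_sketch {
    my ($id) = @_;
    my $bytes = $frozen{$id};
    return unless defined $bytes;
    my $feature = Storable::thaw($bytes);
    $feature->{primary_id} = $id;
    return $feature;
}

my $f = fetch_sketch(7);
print "$f->{name} has primary_id $f->{primary_id}\n";
```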

=cut

# _fetch($id)
sub _fetch { shift->throw_not_implemented }

=head2 _fetch_many

 Title   : _fetch_many
 Usage   : @features = $db->_fetch_many(@primary_ids)
 Function: fetch many features from database
 Returns : list of features
 Args    : list of primary ids
 Status  : private -- does not need to be implemented

This method fetches many features specified by a list of IDs. The
default implementation simply calls _fetch() once for each
primary_id. Implementors can override it if needed for efficiency.
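
An override might look like the following sketch, in which a single
hash slice stands in for one batched backend query. The package name
My::BatchSketch and the "trips" counter are assumptions added purely
to make the difference observable.

```perl
use strict;
use warnings;

# Hypothetical adaptor that counts backend round trips.
package My::BatchSketch;

sub new {
    return bless { data => { 1 => 'f1', 2 => 'f2', 3 => 'f3' }, trips => 0 }, shift;
}

# One lookup per call, as the default _fetch_many() would do.
sub _fetch {
    my ($self, $id) = @_;
    $self->{trips}++;
    return $self->{data}{$id};
}

# Override: a single hash slice stands in for one batched query.
sub _fetch_many {
    my ($self, @ids) = @_;
    $self->{trips}++;
    return grep { defined } @{ $self->{data} }{@ids};
}

package main;

my $db  = My::BatchSketch->new;
my @got = $db->_fetch_many(1, 3);
print "fetched @got in $db->{trips} round trip(s)\n";
```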

=cut

# _fetch_many(@ids)
# this one will fall back to many calls on _fetch() if you don't
# override it
sub _fetch_many {
  my $self = shift;
  return map {$self->_fetch($_)} @_;
}

=head2 _update_indexes

 Title   : _update_indexes
 Usage   : $success = $db->_update_indexes($feature)
 Function: update the indexes for a feature
 Returns : true on success
 Args    : A seqfeature object
 Status  : ABSTRACT METHOD; MUST BE IMPLEMENTED BY AN ADAPTOR

This method is called by reindex() to update the searchable indexes
for a feature object that has changed.

=cut

# this is called to index a feature
sub _update_indexes { shift->throw_not_implemented }

=head2 _start_reindexing, _end_reindexing

 Title   : _start_reindexing, _end_reindexing
 Usage   : $db->_start_reindexing()
           $db->_end_reindexing
 Function: flag that a series of reindexing operations is beginning/ending
 Returns : true on success
 Args    : none
 Status  : MAY BE IMPLEMENTED BY AN ADAPTOR (optional)

These methods are called by reindex() before and immediately after a
series of reindexing operations. The default behavior is to do
nothing, but these methods can be overridden by an adaptor in order to
perform optimizations, turn off autocommits, etc.
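
One way an adaptor could use the two hooks is to buffer index writes
and flush them once at the end. This is only a sketch under that
assumption; the package My::ReindexSketch, its fields, and the use of
hashrefs with a "name" key as features are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical adaptor: buffers index writes between the two hooks.
package My::ReindexSketch;

sub new { return bless { buffer => [], index => {}, batching => 0 }, shift }

sub _start_reindexing { $_[0]{batching} = 1 }

sub _update_indexes {
    my ($self, $f) = @_;
    if ($self->{batching}) { push @{ $self->{buffer} }, $f }
    else                   { $self->{index}{ $f->{name} } = $f }
}

# Flush everything buffered since _start_reindexing() in one pass.
sub _end_reindexing {
    my $self = shift;
    $self->{index}{ $_->{name} } = $_ for @{ $self->{buffer} };
    $self->{buffer}   = [];
    $self->{batching} = 0;
}

package main;

my $db = My::ReindexSketch->new;
$db->_start_reindexing;
$db->_update_indexes({ name => $_ }) for qw(geneA geneB);
my $pending = scalar @{ $db->{buffer} };
$db->_end_reindexing;
print join(',', sort keys %{ $db->{index} }), "\n";
```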

=cut

# these do not necessarily have to be overridden
# they are called at beginning and end of reindexing process
sub _start_reindexing {}
sub _end_reindexing   {}

=head2 _features

 Title   : _features
 Usage   : @features = $db->_features(@args)
 Function: back end for all get_features_by_*() queries
 Returns : list of features
 Args    : see below
 Status  : ABSTRACT METHOD; MUST BE IMPLEMENTED BY ADAPTOR

This is the backend for features(), get_features_by_name(),
get_features_by_location(), etc. Arguments are as described for the
features() method, except that only the named-argument form is
recognized.

=cut

# bottleneck query generator
sub _features { shift->throw_not_implemented }

=head2 _search_attributes

 Title   : _search_attributes
 Usage   : @result_list = $db->_search_attributes("text search string",[$tag1,$tag2...],$limit)
 Function: back end for the search_attributes() method
 Returns : results list
 Args    : as per search_attributes()
 Status  : ABSTRACT METHOD; MUST BE IMPLEMENTED BY ADAPTOR

See search_attributes() for the format of the results list. The only
difference between this and the public method is that the tag list is
guaranteed to be an array reference.

=cut

sub _search_attributes { shift->throw_not_implemented }

=head2 can_store_parentage

 Title   : can_store_parentage
 Usage   : $flag = $db->can_store_parentage
 Function: return true if this adaptor can store parent/child relationships
 Returns : boolean
 Args    : none
 Status  : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS

Override this method and return true if this adaptor supports the
_add_SeqFeature() and _fetch_SeqFeatures() methods, which are used for
storing feature parent/child relationships in a normalized
fashion. Default is false (parent/child relationships are stored in
denormalized form in each feature).
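
A sketch of an adaptor that opts in to normalized parentage storage is
shown below. It assumes parents and children are identified by plain
ID strings and keeps the relationships in a hash; the package name
My::ParentageSketch and its layout are illustrative, not taken from
any shipped adaptor.

```perl
use strict;
use warnings;

# Hypothetical adaptor that stores parent/child links in a hash.
package My::ParentageSketch;

sub new { return bless { children => {} }, shift }

# Advertise that the two methods below are available.
sub can_store_parentage { return 1 }

sub _add_SeqFeature {
    my ($self, $parent, @children) = @_;
    push @{ $self->{children}{$parent} }, @children;
    return scalar @children;    # number of children stored
}

sub _fetch_SeqFeatures {
    my ($self, $parent) = @_;
    return @{ $self->{children}{$parent} || [] };
}

package main;

my $db    = My::ParentageSketch->new;
my $count = $db->_add_SeqFeature('gene1', 'exon1', 'exon2');
my @kids  = $db->_fetch_SeqFeatures('gene1');
print "stored $count children: @kids\n";
```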

=cut

# return true here if the storage engine is prepared to store parent/child
# relationships using _add_SeqFeature and return them using _fetch_SeqFeatures
sub can_store_parentage { return; }

=head2 _add_SeqFeature

 Title   : _add_SeqFeature
 Usage   : $count = $db->_add_SeqFeature($parent,@children)
 Function: store a parent/child relationship between $parent and @children
 Returns : number of children successfully stored
 Args    : parent feature and one or more children
 Status  : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS

If can_store_parentage() returns true, then some store-aware features
(e.g. Bio::DB::SeqFeature) will invoke this method to store
feature/subfeature relationships in a normalized table.

=cut

sub _add_SeqFeature { shift->throw_not_implemented }

=head2 _fetch_SeqFeatures

 Title   : _fetch_SeqFeatures
 Usage   : @children = $db->_fetch_SeqFeatures($parent_feature)
 Function: return the immediate subfeatures of the indicated feature
 Returns : list of subfeatures
 Args    : the parent feature
 Status  : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS

If can_store_parentage() returns true, then some store-aware features
(e.g. Bio::DB::SeqFeature) will invoke this method to retrieve
feature/subfeature relationships from the database.

=cut

# _fetch_SeqFeatures($parent,@list_of_child_types)
sub _fetch_SeqFeatures {shift->throw_not_implemented }

=head2 _insert_sequence

 Title   : _insert_sequence
 Usage   : $success = $db->_insert_sequence($seqid,$sequence_string,$offset)
 Function: Inserts sequence data into the database at the indicated offset
 Returns : true if successful
 Args    : see below
 Status  : ABSTRACT METHOD; MUST BE IMPLEMENTED BY ADAPTOR

This is the back end for insert_sequence(). Adaptors must implement
this method in order to store and retrieve nucleotide or protein
sequence.

=cut

sub _insert_sequence   { shift->throw_not_implemented }

# _fetch_sequence() is similar to old dna() method

=head2 _fetch_sequence

 Title   : _fetch_sequence
 Usage   : $sequence = $db->_fetch_sequence(-seq_id=>$seqid,-start=>$start,-end=>$end)
 Function: Fetch the indicated subsequence from the database
 Returns : The sequence string (not a Bio::PrimarySeq object!)
 Args    : see below
 Status  : ABSTRACT METHOD; MUST BE IMPLEMENTED BY ADAPTOR

This is the back end for fetch_sequence(). Adaptors must implement
this method in order to store and retrieve nucleotide or protein
sequence.

=cut

sub _fetch_sequence    { shift->throw_not_implemented }

sub seq {
    my $self     = shift;
    my ($seq_id,$start,$end) = @_;
    if (my $a = $self->dna_accessor) {
	return $a->can('seq')           ? $a->seq($seq_id,$start,$end)
	      :$a->can('fetch_sequence')? $a->fetch_sequence($seq_id,$start,$end)
          : undef;
    }
    else {
	return $self->_fetch_sequence($seq_id,$start,$end);
    }
}

=head2 _seq_ids

 Title   : _seq_ids
 Usage   : @ids = $db->_seq_ids()
 Function: Return all sequence IDs contained in database
 Returns : list of sequence IDs
 Args    : none
 Status  : TO BE IMPLEMENTED BY ADAPTOR

This method is invoked by seq_ids() to return all sequence IDs
(coordinate systems) known to the database.

=cut

sub _seq_ids { shift->throw_not_implemented }

=head2 _start_bulk_update,_finish_bulk_update

 Title   : _start_bulk_update, _finish_bulk_update
 Usage   : $db->_start_bulk_update
           $db->_finish_bulk_update
 Function: Activate optimizations for large number of insertions/updates
 Returns : nothing
 Args    : nothing
 Status  : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTOR

These are the backends for start_bulk_update() and
finish_bulk_update(). The default behavior of both methods is to do
nothing.

=cut

# Optional flags to change behavior to optimize bulk updating.
sub _start_bulk_update { }
sub _finish_bulk_update { }


# for full TIE() interface  - not necessary to implement in most cases

=head2 Optional methods needed to implement full TIEHASH interface

The core TIEHASH interface will work if just the _store() and _fetch()
methods are implemented. To support the full TIEHASH interface,
including support for keys(), each(), and exists(), the following
methods should be implemented:

=over 4

=item $id = $db-E<gt>_firstid()

Return the first primary ID in the database. Needed for the each()
function.

=item $next_id = $db-E<gt>_nextid($id)

Given a primary ID, return the next primary ID in the series. Needed
for the each() function.

=item $boolean = $db-E<gt>_existsid($id)

Returns true if the indicated primary ID is in the database. Needed
for the exists() function.

=item $db-E<gt>_deleteid($id)

Delete the feature corresponding to the given primary ID. Needed for
delete().

=item $db-E<gt>_clearall()

Empty the database. Needed for %tied_hash = ().

=item $count = $db-E<gt>_featurecount()

Return the number of features in the database. Needed for scalar
%tied_hash.

=back
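
How these backend methods drive keys(), each() and exists() can be
sketched with a small tied class. The package My::TieSketch and its
hard-coded data are assumptions for illustration; iteration order here
is simply ascending numeric ID, which a real adaptor need not follow.

```perl
use strict;
use warnings;

package My::TieSketch;

sub TIEHASH { return bless { data => { 3 => 'c', 1 => 'a', 2 => 'b' } }, shift }

# Backend methods, named after the ones documented above.
sub _sorted_ids { my $self = shift; return sort { $a <=> $b } keys %{ $self->{data} } }
sub _firstid    { my ($first) = $_[0]->_sorted_ids; return $first }
sub _nextid {
    my ($self, $id) = @_;
    for my $candidate ($self->_sorted_ids) {
        return $candidate if $candidate > $id;
    }
    return;    # no more ids: ends the iteration
}
sub _existsid { return exists $_[0]{data}{ $_[1] } }

# Tie methods delegate to the backend.
sub FETCH    { return $_[0]{data}{ $_[1] } }
sub FIRSTKEY { return $_[0]->_firstid }
sub NEXTKEY  { return $_[0]->_nextid($_[1]) }
sub EXISTS   { return $_[0]->_existsid($_[1]) }

package main;

tie my %h, 'My::TieSketch';
my @ids = keys %h;
print "ids: @ids\n";
print "has 2\n" if exists $h{2};
```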

=cut

sub _firstid  { shift->throw_not_implemented }
sub _nextid   { shift->throw_not_implemented }
sub _existsid { shift->throw_not_implemented }
sub _deleteid { shift->throw_not_implemented }
sub _clearall { shift->throw_not_implemented }
sub _featurecount { shift->throw_not_implemented }


=head1 Internal Methods

These methods are internal to Bio::DB::SeqFeature::Store and adaptors.

=head2 new_instance

 Title   : new_instance
 Usage   : $db = $db->new_instance()
 Function: class constructor
 Returns : A descendant of Bio::DB::SeqFeature::Store
 Args    : none
 Status  : internal

This method is called internally by new() to create a new
uninitialized instance of Bio::DB::SeqFeature::Store. It is used
internally and should not be called by application software.

=cut

sub new_instance {
  my $class = shift;
  return bless {},ref($class) || $class;
}

=head2 init

 Title   : init
 Usage   : $db->init(@args)
 Function: initialize object
 Returns : none
 Args    : Arguments passed to new()
 Status  : private

This method is called internally by new() to initialize a
newly-created object using the arguments passed to new(). It is to be
overridden by Bio::DB::SeqFeature::Store adaptors.

=cut

sub init {
  my $self = shift;
  $self->default_settings();
}

=head2 default_settings

 Title   : default_settings
 Usage   : $db->default_settings()
 Function: set up default settings for the adaptor
 Returns : none
 Args    : none
 Status  : private

This method may be overridden by adaptors. It is responsible for
setting up object default settings.

=cut

lib/Bio/DB/SeqFeature/Store.pm  view on Meta::CPAN

    my ($seq_name,$start,$end,$types,$bins,$iterator) =
	rearrange([['SEQID','SEQ_ID','REF'],'START',['STOP','END'],
		   ['TYPES','TYPE','PRIMARY_TAG'],
		   'BINS',
		   'ITERATOR',
		  ],@_);
    my ($coverage,$tag) = $self->coverage_array(-seqid=> $seq_name,
						-start=> $start,
						-end  => $end,
						-type => $types,
						-bins => $bins) or return;
    my $score = 0;
    for (@$coverage) { $score += $_ }
    $score /= @$coverage;

    my $feature = Bio::SeqFeature::Lite->new(-seq_id => $seq_name,
					     -start  => $start,
					     -end    => $end,
					     -type   => $tag,
					     -score  => $score,
					     -attributes =>
					     { coverage => [$coverage] });
    return $iterator
	   ? Bio::DB::SeqFeature::Store::FeatureIterator->new($feature)
	   : $feature;
}

=head2 coverage_array

 Title   : coverage_array
 Usage   : $arrayref = $db->coverage_array(@args)
 Function: returns a coverage summary across indicated region/type
 Returns : an array reference
 Args    : see below
 Status  : public

This method is used to get coverage density information across a
region of interest. The arguments are identical to feature_summary,
except that instead of returning a Bio::SeqFeatureI object, it returns
an array reference of the desired number of bins. The value of each
element corresponds to the number of features in the bin.

Arguments:

  Argument       Description
  --------       -----------

  -seq_id        Sequence ID for the region
  -start         Start of region
  -end           End of region
  -type/-types   Feature type of interest or array ref of types
  -bins          Number of bins across region. Defaults to 1000.

Note that this method uses an approximate algorithm that is only
accurate to 500 bp, so when dealing with bins that are smaller than
1000 bp, you may see some shifting of counts between adjacent bins.
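
The idea of a binned coverage array can be sketched as follows. Note
that this is an exact count over a list of positions, not the
approximate algorithm mentioned above, and that it assigns each
feature to the bin containing its start coordinate, which is a
simplification. The function coverage_sketch() and its argument order
are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical exact binning; the module's own algorithm is approximate.
# Each feature is counted in the bin that contains its start position.
sub coverage_sketch {
    my ($start, $end, $bins, @feature_starts) = @_;
    my @coverage = (0) x $bins;
    my $width    = ($end - $start + 1) / $bins;
    for my $pos (@feature_starts) {
        next if $pos < $start || $pos > $end;    # outside the region
        my $bin = int(($pos - $start) / $width);
        $coverage[$bin]++;
    }
    return \@coverage;
}

# Region 1..1000 split into 4 bins of 250 bp each.
my $cov = coverage_sketch(1, 1000, 4, 10, 260, 270, 990, 2000);
print "@$cov\n";

# Mean count per bin, computed the same way a summary score could be.
my $score = 0;
$score += $_ for @$cov;
$score /= @$cov;
print "mean per bin: $score\n";
```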

=cut

sub coverage_array {
    shift->throw_not_implemented;
}


package Bio::DB::SeqFeature::Store::FeatureIterator;
$Bio::DB::SeqFeature::Store::FeatureIterator::VERSION = '1.7.5';
sub new {
    my $self     = shift;
    my @features = @_;
    return bless \@features,ref $self || $self;
}
sub next_seq {
  my $self  = shift;
  return unless @$self;
  return shift @$self;
}

sub begin_work { }# noop
sub commit     { }# noop
sub rollback   { }# noop

1;

__END__

=head1 BUGS

This is an early version, so there are certainly some bugs. Please
use the BioPerl bug tracking system to report bugs.

=head1 SEE ALSO

L<Bio::DB::SeqFeature>,
L<Bio::DB::SeqFeature::Store::GFF3Loader>,
L<Bio::DB::SeqFeature::Segment>,
L<Bio::DB::SeqFeature::Store::DBI::mysql>,
L<Bio::DB::SeqFeature::Store::berkeleydb>,
L<Bio::DB::SeqFeature::Store::memory>

=head1 AUTHOR

Lincoln Stein E<lt>lstein@cshl.orgE<gt>.

Copyright (c) 2006 Cold Spring Harbor Laboratory.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=cut


