Bio-DB-SeqFeature


lib/Bio/DB/SeqFeature/Store.pm  view on Meta::CPAN

    @args  = (-name=>shift);
  }
  push @args,(-aliases=>1);
  $self->get_features_by_name(@args);
}

=head2 get_features_by_type

 Title   : get_features_by_type
 Usage   : @features = $db->get_features_by_type(@types)
 Function: looks up features by their primary_tag
 Returns : a list of matching features
 Args    : list of primary tags
 Status  : public

This method will return a list of features that have any of the
primary tags given in the argument list. For compatibility with
gbrowse and Bio::DB::GFF, types can be qualified using a colon:

  primary_tag:source_tag

in which case only features that match both the primary_tag B<and> the
indicated source_tag will be returned. If the database was loaded from
a GFF3 file, this corresponds to the third and second columns of the
row, in that order.

For example, given the GFF3 lines:

  ctg123 geneFinder exon 1300 1500 . + . ID=exon001
  ctg123 fgenesH    exon 1300 1520 . + . ID=exon002

exon001 and exon002 will be returned by searching for type "exon", but
only exon002 will be returned by searching for type "exon:fgenesH".
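The qualified-type rule can be illustrated with a small self-contained
Perl sketch. It does not call Bio::DB::SeqFeature::Store; the in-memory
@rows and the match_types() helper are hypothetical, and the matching
here is an exact string comparison, which may differ from the store's
own handling of case.

```perl
use strict;
use warnings;

# Hypothetical in-memory rows standing in for the two GFF3 lines above:
# [seqid, source, primary_tag, start, end, strand, ID]
my @rows = (
    [ 'ctg123', 'geneFinder', 'exon', 1300, 1500, '+', 'exon001' ],
    [ 'ctg123', 'fgenesH',    'exon', 1300, 1520, '+', 'exon002' ],
);

# Return the IDs of rows matching ANY of the given types. A type is either
# "primary_tag" or "primary_tag:source_tag" (column 3, then column 2).
sub match_types {
    my ($rows, @types) = @_;
    my @hits;
    ROW: for my $row (@$rows) {
        my ($source, $primary, $id) = @{$row}[1, 2, 6];
        for my $type (@types) {
            my ($want_primary, $want_source) = split /:/, $type, 2;
            next unless $primary eq $want_primary;
            next if defined $want_source && $source ne $want_source;
            push @hits, $id;
            next ROW;    # one match is enough for this row
        }
    }
    return @hits;
}

print join(',', match_types(\@rows, 'exon')), "\n";          # exon001,exon002
print join(',', match_types(\@rows, 'exon:fgenesH')), "\n";  # exon002
```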

=cut

sub get_features_by_type {
  my $self = shift;
  my @types = @_;
  $self->_features(-type=>\@types);
}

=head2 get_features_by_location

 Title   : get_features_by_location
 Usage   : @features = $db->get_features_by_location(@args)
 Function: looks up features by their location
 Returns : a list of matching features
 Args    : see below
 Status  : public

This method fetches features based on a location range lookup. You
call it using a positional list of arguments, or a list of
(-argument=E<gt>$value) pairs.

The positional form is as follows:

 $db->get_features_by_location($seqid [,$start [,$end]])

The $seqid is the name of the sequence on which the feature resides,
and $start and $end are optional endpoints for the match. If $end is
omitted, the range extends to the end of the sequence; if both
endpoints are omitted, any feature on the indicated seqid is
returned.

Examples:

 @features = $db->get_features_by_location('chr1');            # all features on chromosome 1
 @features = $db->get_features_by_location('chr1',5000);       # features from 5000 to the end
 @features = $db->get_features_by_location('chr1',5000,8000);  # features between 5000 and 8000

By default, location lookups use overlap semantics: a feature is
returned if it partially or completely overlaps the indicated range.
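A minimal sketch of the overlap rule behind the positional examples,
assuming 1-based inclusive coordinates. The overlaps() helper and the
%features data are hypothetical stand-ins for the database query; an
undefined endpoint leaves that side of the query range open.

```perl
use strict;
use warnings;

# True if feature [$f_start, $f_end] overlaps the query range; an undefined
# $start or $end leaves that side of the query range open-ended.
sub overlaps {
    my ($f_start, $f_end, $start, $end) = @_;
    return 0 if defined $start && $f_end   < $start;   # feature ends before range
    return 0 if defined $end   && $f_start > $end;     # feature starts after range
    return 1;
}

# Hypothetical features on one sequence: name => [start, end]
my %features = (
    a => [ 100,  4999 ],
    b => [ 4990, 5010 ],    # only partially overlaps 5000..8000
    c => [ 7000, 9000 ],
    d => [ 8500, 9500 ],
);

# The three positional forms: (seqid), (seqid, start), (seqid, start, end)
for my $query ([undef, undef], [5000, undef], [5000, 8000]) {
    my @hits = sort grep { overlaps(@{ $features{$_} }, @$query) } keys %features;
    print join(',', @hits), "\n";
}
```

With these hypothetical features the three queries print a,b,c,d, then
b,c,d, then b,c.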

The named argument form gives you more control:

  Argument       Value
  --------       -----

  -seq_id        The name of the sequence on which the feature resides
  -start         Start of the range
  -end           End of the range
  -strand        Strand of the feature
  -range_type    Type of range to search over

The B<-strand> argument, if present, can be "+1" to find only plus
strand features or "-1" to find only minus strand features. A strand
of "0" or undef is the same as not specifying this argument at all,
and retrieves all features regardless of their strandedness (any
false strand value is converted to undef before the query is run).
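The strand handling can be sketched as follows. strand_filter() and the
feature hashes are hypothetical; the only piece taken from this module
is the C<$strand||undef> conversion visible in
get_features_by_location().

```perl
use strict;
use warnings;

# Filter hypothetical feature hashes by strand. As in
# get_features_by_location(), a false strand (0 or undef) becomes undef,
# which means "no strand filter".
sub strand_filter {
    my ($strand, @features) = @_;    # each feature: { id => ..., strand => -1|0|1 }
    $strand = $strand || undef;
    return @features unless defined $strand;
    return grep { $_->{strand} == $strand } @features;   # "+1" == 1 numerically
}

my @f = (
    { id => 'p', strand => 1 },
    { id => 'm', strand => -1 },
    { id => 'u', strand => 0 },
);

print join(',', map { $_->{id} } strand_filter('+1', @f)), "\n";   # p
print join(',', map { $_->{id} } strand_filter(-1, @f)), "\n";     # m
print join(',', map { $_->{id} } strand_filter(0, @f)), "\n";      # p,m,u
```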

The B<-range_type> argument, if present, can be one of "overlaps" (the
default), to find features whose positions overlap the indicated
range; "contains", to find features whose endpoints are both
contained within the indicated range; and "contained_in", to find
features whose endpoints both lie outside the indicated range, so that
the feature spans the whole range.
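The three range types reduce to simple endpoint comparisons. This sketch
assumes 1-based inclusive coordinates and treats shared endpoints as
matching; in_range() and %range_test are hypothetical names, and the
database back ends may express these tests differently.

```perl
use strict;
use warnings;

# Each test receives feature start/end ($fs, $fe) and query start/end ($qs, $qe).
my %range_test = (
    overlaps     => sub { my ($fs, $fe, $qs, $qe) = @_; $fs <= $qe && $fe >= $qs },
    contains     => sub { my ($fs, $fe, $qs, $qe) = @_; $fs >= $qs && $fe <= $qe },
    contained_in => sub { my ($fs, $fe, $qs, $qe) = @_; $fs <= $qs && $fe >= $qe },
);

# Returns 1 or 0; an undefined range type falls back to "overlaps".
sub in_range {
    my ($range_type, @coords) = @_;
    my $test = $range_test{ $range_type // 'overlaps' }
        or die "unknown range type '$range_type'\n";
    return $test->(@coords) ? 1 : 0;
}

# Query range 1000..2000 against three hypothetical features.
for my $f ([1500, 2500], [1200, 1800], [500, 3000]) {
    print "@$f: ",
        join(' ', map { $_ . '=' . in_range($_, @$f, 1000, 2000) }
                  qw(overlaps contains contained_in)),
        "\n";
}
```

All three features overlap 1000..2000, but only 1200..1800 satisfies
"contains" and only 500..3000 satisfies "contained_in".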

=cut

sub get_features_by_location {
  my $self = shift;
  my ($seqid,$start,$end,$strand,$rangetype) =
    rearrange([['SEQ_ID','SEQID','REF'],'START',['STOP','END'],'STRAND','RANGE_TYPE'],@_);
  $self->_features(-seqid=>$seqid,
		   -start=>$start||undef,
		   -end=>$end||undef,
		   -strand=>$strand||undef,
		   -range_type=>$rangetype);
}

=head2 get_features_by_attribute

 Title   : get_features_by_attribute
 Usage   : @features = $db->get_features_by_attribute(@args)
 Function: looks up features by their attributes/tags
 Returns : a list of matching features
 Args    : see below
 Status  : public

This implements a simple tag filter. Pass a list of tag names and
their values. The module will return a list of features whose tag
names and values match. Tag names are case insensitive. If multiple
tag name/value pairs are present, they will be ANDed together. To
match any of a list of values, use an array reference for the value.

Examples:

 # return all features whose "function" tag is "GO:0000123"
 @features = $db->get_features_by_attribute(function => 'GO:0000123');

 # return all features whose "function" tag is "GO:0000123" or "GO:0000555"
 @features = $db->get_features_by_attribute(function => ['GO:0000123','GO:0000555']);

 # return all features whose "function" tag is "GO:0000123" or "GO:0000555"
 # and whose "confirmed" tag is 1
 @features = $db->get_features_by_attribute(function  => ['GO:0000123','GO:0000555'],
                                            confirmed => 1);
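A rough in-memory model of these matching rules: tag names compared
case-insensitively, multiple tags ANDed, and an array reference ORing
its values. match_attributes() and the feature layout (tag name =E<gt>
list of values) are hypothetical, and exact matching of values is an
assumption of this sketch.

```perl
use strict;
use warnings;

# $feature_tags: { tag_name => [values] }; %query: tag_name => value or [values]
sub match_attributes {
    my ($feature_tags, %query) = @_;
    # Tag names are compared case-insensitively.
    my %tags = map { lc($_) => $feature_tags->{$_} } keys %$feature_tags;
    for my $name (keys %query) {
        # An array ref means "any of these values" (OR).
        my @wanted = ref $query{$name} eq 'ARRAY' ? @{ $query{$name} } : ($query{$name});
        my %have   = map { $_ => 1 } @{ $tags{ lc $name } || [] };
        # Every queried tag must match (AND).
        return 0 unless grep { $have{$_} } @wanted;
    }
    return 1;
}

my $feature = { Function => ['GO:0000555'], confirmed => [1] };

print match_attributes($feature, function => 'GO:0000123'), "\n";                    # 0
print match_attributes($feature, function => ['GO:0000123', 'GO:0000555']), "\n";    # 1
print match_attributes($feature, function  => ['GO:0000123', 'GO:0000555'],
                                 confirmed => 1), "\n";                               # 1
```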

=cut

sub get_features_by_attribute {
  my $self       = shift;
  my %attributes = ref($_[0]) ? %{$_[0]} : @_;
  %attributes  or $self->throw("Usage: get_features_by_attribute(attribute_name=>\$attribute_value...)");
  $self->_features(-attributes=>\%attributes);
}
###
# features() call -- main query interface
#

=head2 features

 Title   : features
 Usage   : @features = $db->features(@args)
 Function: generalized query & retrieval interface
 Returns : list of features

=head2 fetch_sequence

 Title   : fetch_sequence
 Usage   : $sequence = $db->fetch_sequence(-seq_id=>$seqid,-start=>$start,-end=>$end)
 Function: Fetch the indicated subsequence from the database
 Returns : The sequence string, or a Bio::Seq object if -bioseq is true
 Args    : see below
 Status  : public

This method retrieves a portion of the indicated sequence. The arguments are:

  Argument       Value
  --------       -----
  -seq_id        Chromosome, contig or other DNA segment
  -seqid         Synonym for -seq_id
  -name          Synonym for -seq_id
  -start         Start of range
  -end           End of range
  -class         Obsolete argument used for Bio::DB::GFF compatibility. If
                  specified will qualify the seq_id as "$class:$seq_id".
  -bioseq        Boolean flag; if true, returns a Bio::Seq object instead
                  of a sequence string.

You can call fetch_sequence using the following shortcuts:

 $seq = $db->fetch_sequence('chr3');             # entire chromosome
 $seq = $db->fetch_sequence('chr3',1000);        # position 1000 to end of chromosome
 $seq = $db->fetch_sequence('chr3',undef,5000);  # position 1 to 5000
 $seq = $db->fetch_sequence('chr3',1000,5000);   # positions 1000 to 5000
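These shortcuts follow 1-based inclusive coordinates in which a missing
start means position 1 and a missing end means the end of the sequence.
The sketch below models that with substr() on a hypothetical in-memory
%dna hash; fetch_subseq() is not part of this module, and it assumes
$start is not greater than $end.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the sequence store: seqid => DNA string.
my %dna = (chr3 => 'ACGT' x 2500);    # 10,000 bp

# Return bases $start..$end (1-based, inclusive), defaulting the start to 1
# and the end to the last base; returns undef for an unknown seqid.
sub fetch_subseq {
    my ($seqid, $start, $end) = @_;
    return unless exists $dna{$seqid};
    my $seq = $dna{$seqid};
    $start //= 1;
    $end   //= length $seq;
    return substr($seq, $start - 1, $end - $start + 1);
}

print fetch_subseq('chr3', 1, 8), "\n";                  # ACGTACGT
print length(fetch_subseq('chr3', 1000, 5000)), "\n";    # 4001
```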

=cut

###
# fetch_sequence()
#
# equivalent to old Bio::DB::GFF->dna() method
#
sub fetch_sequence {
  my $self = shift;
  my ($seqid,$start,$end,$class,$bioseq) = rearrange([['NAME','SEQID','SEQ_ID'],
						      'START',['END','STOP'],'CLASS','BIOSEQ'],@_);
  $seqid = "$seqid:$class" if defined $class;
  my $seq = $self->seq($seqid,$start,$end);
  return $seq unless $bioseq;

  require Bio::Seq unless Bio::Seq->can('new');
  my $display_id = defined $start ? "$seqid:$start..$end" : $seqid;
  return Bio::Seq->new(-display_id=>$display_id,-seq=>$seq);
}

=head2 segment

 Title   : segment
 Usage   : $segment = $db->segment($seq_id [,$start] [,$end] [,$absolute])
 Function: restrict the database to a sequence range
 Returns : a Bio::DB::SeqFeature::Segment object
 Args    : sequence ID; optional start, end and absolute flag
 Status  : public

This is a convenience method that can be used when you are interested
in the contents of a particular sequence landmark, such as a
contig. Specify the ID of a sequence or other landmark in the database
and optionally a start and end point relative to that landmark. The
method will look up the region and return a
Bio::DB::SeqFeature::Segment object that spans it. You can then use
this segment object to make location-restricted queries on the database.

Example:

 $segment  = $db->segment('contig23',1,1000);  # first 1000 bp of contig23
 my @mRNAs = $segment->features('mRNA');       # all mRNAs that overlap segment

Although you will usually want to fetch segments that correspond to
physical sequences in the database, you can actually use any feature
in the database as the sequence ID. The segment() method will perform
a get_features_by_name() internally and then transform the feature
into the appropriate coordinates.
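For a plus-strand landmark, turning coordinates relative to the
landmark into reference coordinates is an offset by the landmark's
start. This sketch covers only that case; relative_to_absolute() is
hypothetical and ignores minus-strand landmarks and any clipping the
real method may apply.

```perl
use strict;
use warnings;

# Map 1-based coordinates relative to a plus-strand landmark onto the
# landmark's reference sequence. Defaults cover the whole landmark.
sub relative_to_absolute {
    my ($landmark_start, $landmark_end, $rel_start, $rel_end) = @_;
    $rel_start //= 1;
    $rel_end   //= $landmark_end - $landmark_start + 1;
    return ($landmark_start + $rel_start - 1, $landmark_start + $rel_end - 1);
}

# A hypothetical landmark at 50_001..80_000 on its reference; take its first 1000 bp.
my ($abs_start, $abs_end) = relative_to_absolute(50_001, 80_000, 1, 1000);
print "$abs_start..$abs_end\n";    # 50001..51000
```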

The named feature should exist once and only once in the database. If
it exists multiple times in the database and you attempt to call
segment() in a scalar context, you will get an exception. A workaround
is to call the method in a list context, as in:

  my ($segment) = $db->segment('contig23',1,1000);

or

  my @segments  = $db->segment('contig23',1,1000);

However, having multiple same-named features in the database is often
an indication of underlying data problems.
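The scalar-versus-list behavior relies on Perl's wantarray. A minimal
sketch of the same guard, using a hypothetical lookup() function and
%index in place of the database:

```perl
use strict;
use warnings;

# Hypothetical name index: one unique name, one duplicated name.
my %index = (contig23 => ['contig23#1'], dup => ['dup#1', 'dup#2']);

# Like segment(): more than one match is an error in scalar context,
# while list context returns every match.
sub lookup {
    my $name    = shift;
    my @matches = @{ $index{$name} || [] };
    die "lookup() called in a scalar context but multiple features match\n"
        if !wantarray && @matches > 1;
    return wantarray ? @matches : $matches[0];
}

my $one   = lookup('contig23');    # scalar context, exactly one match
my ($dup) = lookup('dup');         # list context: first of several matches
my @all   = lookup('dup');         # list context: all matches
my $err   = eval { my $s = lookup('dup'); 1 } ? '' : $@;
print "error: $err" if $err;
```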

If the optional $absolute argument is a true value, then the specified
coordinates are relative to the reference (absolute) coordinates.

=cut

###
# Replacement for Bio::DB::GFF->segment() method
#
sub segment {
  my $self = shift;
  my (@features,@args);

  if (@_ == 1 && blessed($_[0])) {
    @features = @_;
    @args = ();
  }
  else {
    @args     = $self->setup_segment_args(@_);
    @features = $self->get_features_by_name(@args);
  }
  if (!wantarray && @features > 1) {
    $self->throw(<<END);
segment() called in a scalar context but multiple features match.
Either call in a list context or narrow your search using the -types or -class arguments
END
  }
  my ($rel_start,$rel_end,$abs) = rearrange(['START',['STOP','END'],'ABSOLUTE'],@args);
  $rel_start = 1 unless defined $rel_start;

  my @segments;
  for my $f (@features) {


