AcePerl
Ace/Sequence.pm view on Meta::CPAN
*source_seq = \&source;
*source_tag = \&subtype;
*primary_tag = \&type;
my %plusminus = ( '+' => '-',
'-' => '+',
'.' => '.');
# internal keys
# parent => reference Sequence in "+" strand
# p_offset => our start in the parent
# length => our length
# strand => our strand (+ or -)
# refseq => reference Sequence for coordinate system
# object constructor
# usually called like this:
# $seq = Ace::Sequence->new($object);
# but can be called like this:
# $seq = Ace::Sequence->new(-db=>$db,-name=>$name);
# or
# $seq = Ace::Sequence->new(-seq => $object,
# -offset => $offset,
# -length => $length,
# -refseq => $refseq
# );
# $refseq, if provided, will be used to establish the coordinate
# system. Otherwise the first base pair will be set to 1.
sub new {
my $pack = shift;
my ($seq,$start,$end,$offset,$length,$refseq,$db) =
rearrange([
['SEQ','SEQUENCE','SOURCE'],
'START',
['END','STOP'],
['OFFSET','OFF'],
['LENGTH','LEN'],
'REFSEQ',
['DATABASE','DB'],
],@_);
# Object must have a parent sequence and/or a reference
# sequence. In some cases, the parent sequence will be the
# object itself. The reference sequence is used to set up
# the frame of reference for the coordinate system.
# fetch the sequence object if we don't have it already
croak "Please provide either a Sequence object or a database and name"
unless ref($seq) || ($seq && $db);
# convert start into offset
$offset = $start - 1 if defined($start) and !defined($offset);
# convert stop/end into length
$length = ($end > $start) ? $end - $offset : $end - $offset - 2
if defined($end) && !defined($length);
# if just a string is passed, try to fetch a Sequence object
my $obj = ref($seq) ? $seq : $db->fetch('Sequence'=>$seq);
unless ($obj) {
Ace->error("No Sequence named $seq found in database");
return;
}
# get parent coordinates and length of this sequence
# the parent is an Ace Sequence object in the "+" strand
my ($parent,$p_offset,$p_length,$strand) = find_parent($obj);
return unless $parent;
# handle negative strands
my $r_strand = $strand;
my $r_offset = $p_offset;
$offset ||= 0;
$offset *= -1 if $strand < 0;
# handle feature objects
$offset += $obj->offset if $obj->can('smapped');
# get source
my $source = $obj->can('smapped') ? $obj->source : $obj;
# store the object into our instance variables
my $self = bless {
obj => $source,
offset => $offset,
length => $length || $p_length,
parent => $parent,
p_offset => $p_offset,
refseq => [$source,$r_offset,$r_strand],
strand => $strand,
absolute => 0,
automerge => 1,
},$pack;
# set the reference sequence
eval { $self->refseq($refseq) } or return if defined $refseq;
# wheww!
return $self;
}
# return the "source" object that the user offset from
sub source {
$_[0]->{obj};
}
# return the parent object
sub parent { $_[0]->{parent} }
# return the length
#sub length { $_[0]->{length} }
sub length {
my $self = shift;
my ($start,$end) = ($self->start,$self->end);
return $end - $start + ($end > $start ? 1 : -1); # for stupid 1-based adjustments
}
sub reversed { return shift->strand < 0; }
sub automerge {
my $self = shift;
The class method named new() is the interface to these facilities. In
its simplest, one-argument form, you provide new() with a
previously-created I<Ace::Object> that points to a Sequence or
sequence-like object (the meaning of "sequence-like" is explained in
more detail below). The new() method will return an I<Ace::Sequence>
object extending from the beginning of the object through to its
natural end.
In the named-parameter form of new(), the following arguments are
recognized:
=over 4
=item -source
The sequence source. This must be an I<Ace::Object> of the "Sequence"
class, or be a sequence-like object containing the SMap tag (see
below).
=item -offset
An offset from the beginning of the source sequence. The retrieved
I<Ace::Sequence> will begin at this position. The offset can be any
positive or negative integer. Offsets are B<0-based>.
=item -length
The length of the sequence to return. Either a positive or negative
integer can be specified. If a negative length is given, the returned
sequence will be complemented relative to the source sequence.
=item -refseq
The sequence to use to establish the coordinate system for the
returned sequence. Normally the source sequence is used to establish
the coordinate system, but this can be used to override that choice.
You can provide either an I<Ace::Object> or just a sequence name for
this argument. The source and reference sequences must share a common
ancestor, but do not have to be directly related. An attempt to use a
disjunct reference sequence, such as one on a different chromosome,
will fail.
=item -name
As an alternative to using an I<Ace::Object> with the B<-source>
argument, you may specify a source sequence using B<-name> and B<-db>.
The I<Ace::Sequence> module will use the provided database accessor to
fetch a Sequence object with the specified name. new() will return
undef if no Sequence by this name is known.
=item -db
This argument is required if the source sequence is specified by name
rather than by object reference.
=back
If new() is successful, it will create an I<Ace::Sequence> object and
return it. Otherwise it will return undef and leave a descriptive
message in Ace->error(). Certain programming errors, such as a
failure to provide required arguments, cause a fatal error.
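The constructor source also accepts B<-start> and B<-end> (alias B<-stop>) and converts them to an offset and a length. The following is a minimal standalone sketch of that arithmetic, assuming 1-based start and end coordinates. The helper name span_to_offset_length() is hypothetical and is not part of I<Ace::Sequence>.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Ace::Sequence): convert 1-based
# start/end coordinates into a 0-based offset and a signed length,
# using the same expressions that new() applies to -start and -end.
sub span_to_offset_length {
    my ($start, $end) = @_;
    my $offset = $start - 1;
    my $length = ($end > $start) ? $end - $offset
                                 : $end - $offset - 2;
    return ($offset, $length);
}

# Forward span: bases 101..200
printf "offset=%d length=%d\n", span_to_offset_length(101, 200);
# prints "offset=100 length=100"

# Reversed span: bases 200 down to 101 gives a negative length
printf "offset=%d length=%d\n", span_to_offset_length(200, 101);
# prints "offset=199 length=-100"
```

A negative length is what marks the returned sequence as complemented relative to the source, as described under B<-length> above.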
=head2 Reference Sequences and the Coordinate System
When retrieving information from an I<Ace::Sequence>, the coordinate
system is based on the sequence segment selected at object creation
time. That is, the "+1" strand is the natural direction of the
I<Ace::Sequence> object, and base pair 1 is its first base pair. This
behavior can be overridden by providing a reference sequence to the
new() method, in which case the orientation and position of the
reference sequence establishes the coordinate system for the object.
In addition to the reference sequence, there are two other sequences
used by I<Ace::Sequence> for internal bookkeeping. The "source"
sequence corresponds to the smallest ACeDB sequence object that
completely encloses the selected sequence segment. The "parent"
sequence is the smallest ACeDB sequence object that contains the
"source". The parent is used to derive the length and orientation of
source sequences that are not directly associated with DNA objects.
In many cases, the source sequence will be identical to the sequence
initially passed to the new() method. However, there are exceptions
to this rule. One common exception occurs when the offset and/or
length cross the boundaries of the passed-in sequence. In this case,
the ACeDB database is searched for the smallest sequence that contains
both endpoints of the I<Ace::Sequence> object.
The other common exception occurs in Ace 4.8, where there is support
for "sequence-like" objects that contain the C<SMap> ("Sequence Map")
tag. The C<SMap> tag provides genomic location information for
arbitrary objects -- not just those descended from the Sequence class.
This allows ACeDB to perform genome map operations on objects that are
not directly related to sequences, such as genetic loci that have been
interpolated onto the physical map. When an C<SMap>-containing object
is passed to the I<Ace::Sequence> new() method, the module will again
choose the smallest ACeDB Sequence object that contains both
end-points of the desired region.
If an I<Ace::Sequence> object is used to create a new I<Ace::Sequence>
object, then the original object's source is inherited.
=head1 Object Methods
Once an I<Ace::Sequence> object is created, you can query it using the
following methods:
=head2 asString()
$name = $seq->asString;
Returns a human-readable identifier for the sequence in the form
I<Source/start-end>, where "Source" is the name of the source
sequence, and "start" and "end" are the endpoints of the sequence
relative to the source (using 1-based indexing). This method is
called automatically when the I<Ace::Sequence> is used in a string
context.
=head2 source_seq()
$source = $seq->source_seq;
for details.
=head2 clones()
This returns a list of Ace::Sequence::Feature objects containing
reconstructed clones. This is a nasty hack, because ACEDB currently
records clone ends, but not the clones themselves, meaning that we
will not always know both ends of the clone. In this case the missing
end has a synthetic position of -99,999,999 or +99,999,999. Sorry.
=head2 gff()
$gff = $seq->gff();
$gff = $seq->gff(-abs => 1,
-features => ['exon','intron:GeneFinder']);
This method returns a GFF file as a scalar. The following arguments
are optional:
=over 4
=item -abs
Ordinarily the feature entries in the GFF file will be returned in
coordinates relative to the start of the I<Ace::Sequence> object.
Position 1 will be the start of the sequence object, and the "+"
strand will be the sequence object's natural orientation. However if
a true value is provided to B<-abs>, the coordinate system used will
be relative to the start of the source sequence, i.e. the native ACeDB
Sequence object (usually a cosmid sequence or a link).
If a reference sequence was provided when the I<Ace::Sequence> was
created, it will be used by default to set the coordinate system.
Relative coordinates can be reenabled by providing a false value to
B<-abs>.
Ordinarily the coordinate system manipulations automatically "do what
you want" and you will not need to adjust them. See also the abs()
method described below.
=item -features
The B<-features> argument filters the features according to a list of
types and subtypes. The format is identical to the one described for
the features() method. A single filter may be provided as a scalar
string. Multiple filters may be passed as an array reference.
=back
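The filter format used by B<-features> can be illustrated with a small standalone sketch. It assumes that a filter is either "type" or "type:subtype", that matching is exact and case-sensitive, and that each feature is a plain hash with C<type> and C<subtype> keys. All of these are assumptions made for illustration; the helper names are hypothetical and this is not the module's own matching code.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the feature matches any filter.
# A filter without a subtype matches every subtype of that type.
sub matches_filter {
    my ($feature, @filters) = @_;
    for my $filter (@filters) {
        my ($type, $subtype) = split /:/, $filter, 2;
        next unless $feature->{type} eq $type;
        return 1 if !defined $subtype || $feature->{subtype} eq $subtype;
    }
    return 0;
}

# Hypothetical helper: accept a single scalar filter or an array
# reference of filters, mirroring the two calling styles described above.
sub filter_features {
    my ($features, $filter_arg) = @_;
    my @filters = ref($filter_arg) eq 'ARRAY' ? @$filter_arg : ($filter_arg);
    return grep { matches_filter($_, @filters) } @$features;
}

my @features = (
    { type => 'exon',   subtype => 'curated'    },
    { type => 'intron', subtype => 'GeneFinder' },
    { type => 'intron', subtype => 'curated'    },
);

my @kept = filter_features(\@features, ['exon', 'intron:GeneFinder']);
print scalar(@kept), "\n";   # prints 2
```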
See also the GFF() method described next.
=head2 GFF()
$gff_object = $seq->GFF;
$gff_object = $seq->GFF(-abs => 1,
-features => ['exon','intron:GeneFinder']);
The GFF() method takes the same arguments as gff() described above,
but it returns a I<GFF::GeneFeatureSet> object from the GFF.pm
module. If the GFF module is not installed, this method will generate
a fatal error.
=head2 absolute()
$abs = $seq->absolute;
$abs = $seq->absolute(1);
This method controls whether the coordinates of features are returned
in absolute or relative coordinates. "Absolute" coordinates are
relative to the underlying source or reference sequence. "Relative"
coordinates are relative to the I<Ace::Sequence> object. By default,
coordinates are relative unless new() was provided with a reference
sequence. This default can be examined and changed using absolute().
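The difference between the two coordinate systems can be sketched with simple arithmetic. This example assumes that the segment's first base lies at a known 1-based position on the source and that the strand is given as +1 or -1. The helper rel_to_abs() is hypothetical and only illustrates the idea; it is not how I<Ace::Sequence> performs the conversion internally.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Ace::Sequence.
# $rel       : 1-based position relative to the segment
# $seg_start : 1-based position of the segment's first base on the source
# $strand    : +1 if the segment runs with the source, -1 if against it
sub rel_to_abs {
    my ($rel, $seg_start, $strand) = @_;
    return $seg_start + $strand * ($rel - 1);
}

print rel_to_abs(1,  1001, +1), "\n";   # prints 1001
print rel_to_abs(10, 1001, +1), "\n";   # prints 1010
print rel_to_abs(10, 1001, -1), "\n";   # prints 992
```

Relative position 1 always maps to the segment's own first base, which matches the statement above that base pair 1 is the first base pair of the I<Ace::Sequence> object.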
=head2 automerge()
$merge = $seq->automerge;
$seq->automerge(0);
This method controls whether groups of features will automatically be
merged together by the features() call. If true (the default), then
the left and right end of clones will be merged into "clone" features,
introns, exons and CDS entries will be merged into
Ace::Sequence::Transcript objects, and similarity entries will be
merged into Ace::Sequence::GappedAlignment objects.
=head2 db()
$db = $seq->db;
Returns the L<Ace> database accessor associated with this sequence.
=head1 SEE ALSO
L<Ace>, L<Ace::Object>, L<Ace::Sequence::Feature>,
L<Ace::Sequence::FeatureList>, L<GFF>
=head1 AUTHOR
Lincoln Stein <lstein@cshl.org> with extensive help from Jean
Thierry-Mieg <mieg@kaa.crbm.cnrs-mop.fr>
Many thanks to David Block <dblock@gene.pbi.nrc.ca> for finding and
fixing the nasty off-by-one errors.
Copyright (c) 1999, Lincoln D. Stein
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself. See DISCLAIMER.txt for
disclaimers of warranty.
=cut
__END__