AcePerl


docs/GFF_Spec.html

candidates. <P>

Let us assume that the score of an exon can be decomposed into three
parts: the score of the 5' splice site, the score of the 3' splice
site, and the sum of the scores of all the codons in between. In such
a case it can be much more efficient to use the GFF format to report
separate scores for the splice site sensors and for the individual
codons in all three (or six, including the reverse strand) frames, and let
the program that interprets this file assemble the exon scores.  The
exon scores can be calculated efficiently by first creating three
arrays, one per frame (six if the reverse strand is included), each of
which contains in its [i]th position a value A[i] that is the partial
sum of the codon scores in that frame for the entire sequence from
position 1 up to and including position i.  Then for any positions
i &lt; j, the sum of the scores of all codons in that frame that start
at positions i+1 through j can be obtained as A[j] - A[i]. Using these
arrays, along with the candidate splice site scores, a very large
number of scores for overlapping exons are implicitly defined in a
data structure that takes only linear space with respect to the number
of positions in the sequence, and the score of each exon can be
retrieved in constant time. <P>
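The prefix-sum construction described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not part of the GFF specification. It assumes forward-strand codon scores only, stored as a list in which entry p-1 holds the score of the codon starting at 1-based position p (0.0 where there is none), and it assigns that codon to frame (p-1) mod 3. The exon "recipe" used here (3' splice score + 5' splice score + the scores of the codons lying wholly inside the exon in the frame that begins at its first base) is an assumed example, and all function names are made up for this sketch.

```python
from itertools import accumulate


def build_frame_prefix_sums(codon_scores):
    """Build one prefix-sum array per forward-strand frame.

    codon_scores[p - 1] is the score of the codon starting at 1-based
    position p (0.0 if none). Returns A with A[f][i] = sum of the scores
    of frame-f codons (those with (p - 1) % 3 == f) starting at 1..i,
    and A[f][0] == 0.0.
    """
    arrays = []
    for f in range(3):
        # Keep only the codons belonging to frame f; zero out the rest.
        in_frame = [s if (p - 1) % 3 == f else 0.0
                    for p, s in enumerate(codon_scores, start=1)]
        arrays.append([0.0] + list(accumulate(in_frame)))
    return arrays


def codon_sum(A, frame, i, j):
    """Scores of frame-`frame` codons starting at positions i+1..j (i < j)."""
    return A[frame][j] - A[frame][i]


def exon_score(A, start, end, splice3, splice5):
    """Hypothetical recipe: 3' splice score + 5' splice score + codons
    lying wholly in [start, end] in the frame that begins at `start`."""
    frame = (start - 1) % 3
    last_codon_start = end - 2  # a codon starting here ends exactly at `end`
    if last_codon_start < start:
        codons = 0.0  # exon too short to hold a whole codon
    else:
        codons = codon_sum(A, frame, start - 1, last_codon_start)
    return splice3 + codons + splice5


A = build_frame_prefix_sums([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
print(exon_score(A, 1, 6, 0.5, 0.25))  # 5.75
```

In the usage line, the exon from 1 to 6 contains the frame-0 codons starting at positions 1 and 4 (scores 1.0 and 4.0), so its score is 0.5 + 5.0 + 0.25. Each exon query costs two array lookups regardless of exon length, which is what makes scoring many overlapping candidates cheap. <P>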

When the GFF format is used to transmit scores that can be summed for
efficient retrieval as in the case of the codon scores above, we ask
that the provider of the scores indicate that these scores are
summable in this manner, and provide a recipe for calculating the
scores that are to be derived from these summable scores, such as the
exon scores described above. We place no limit on the complexity of
this recipe, nor do we provide a standard protocol for such assembly,
other than providing examples.  It behooves the sensor score provider
to keep the recipe simple enough that others can easily implement it.
<P>
Back to <A HREF="#TOC">Table of Contents</A>
<P>
<HR>

<A NAME="mailing_list"><h2>Mailing list</h2></A>
<P>
There is a <A HREF="mailto:gff-list@sanger.ac.uk"> mailing list </a>
to which you can send comments, enquiries, complaints etc. about GFF.
If you want to be added to the mailing list, please send
mail to <A HREF="mailto:Majordomo@sanger.ac.uk">Majordomo@sanger.ac.uk</A> with the 
following command in the body of your email message:
<P>

<code>
    subscribe gff-list
</code>
<P>
<P>
Back to <A HREF="#TOC">Table of Contents</A>
<P>
<HR>
<A NAME="edit_history"><h2>Edit History</h2></A>
<P>
971028 rd: I changed the comment initiator to '#' from '//' because a 
single symbol is easier for simple parsers.<P>

971028 rd: We also now allow extra text after &#060;group&#062;
without a comment character, because this immediately proved useful.<P>

971028 rd: I considered switching from start-end notation to
start-length notation, on the suggestion of Anders Krogh.  This seems
nicer in many cases, but is a debatable point.  I then switched back!<P>

971028 rd: I added the section about name space.<P>

971108 rd: added ## line proposals - moved them into main text 971113.<P>

971113 rd: added an extra "source" field, as discussed at the Newton Institute
meeting 971029.  There are two main reasons.  First, to help prevent
name space clashes: each program would have its own source
designation.  Second, to help reuse feature names, so that one could have
"exon" for exon predictions from each prediction program.<P>

971113 rd: added section on mailing list.<P>

980909 ihh: fixed some small things and put this page on the Sanger
GFF site.<P>

981216 rd: introduced version 2 changes.<P>

990226 rbsk: incorporated amendments to the version 2 specification as follows:<P>
<UL>
     <LI>Non-printing characters (e.g. newlines, tabs) in Version 2 double-quoted
"free text values" must be explicitly represented by their C (UNIX) style 
backslash escape sequences (e.g. '\t' for tabs, '\n' for newlines, etc.)<br>
     <LI>Removed field (256) and line (32K) character size limitations for Version 2.
     <LI>Removed the permission to use arbitrary whitespace as a field delimiter.
TAB ('\t') field delimiters are now enforced again, as in Version 1.<br>
</UL>
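As an illustration of the first amendment above, the following hypothetical Python helper writes a double-quoted free text value using C-style escapes. The amendment itself names only '\t' and '\n' (plus "etc."); escaping backslashes and embedded double quotes, and using three-digit octal escapes for any other control characters, are assumptions made here so that the value can be read back unambiguously.

```python
# Hypothetical helper; escaping of '\\' and '"' and the octal form for other
# control characters are assumptions, not spelled out in the amendment.
_C_ESCAPES = {"\t": "\\t", "\n": "\\n", "\r": "\\r", "\a": "\\a",
              "\b": "\\b", "\f": "\\f", "\v": "\\v",
              "\\": "\\\\", '"': '\\"'}


def escape_free_text(value: str) -> str:
    """Return `value` as a double-quoted GFF v2 free text value."""
    out = []
    for ch in value:
        if ch in _C_ESCAPES:
            out.append(_C_ESCAPES[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("\\%03o" % ord(ch))  # other control chars as octal
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


print(escape_free_text('line one\nline\ttwo'))
# "line one\nline\ttwo"
```

Because no raw tab or newline survives escaping, a value written this way can never be mistaken for a field delimiter or a line break. <P>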
990317 rbsk:
<UL>
   <LI>End-of-line comments following the Version 2 [group] field tag-value structures must be
       delimited by a tab ('\t') or a hash ('#').
</UL>       
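A minimal Python sketch of reading one Version 2 line under the rules above: lines beginning with '#' are treated as comments (per the 971028 entry), fields are separated by single TAB characters, and an end-of-line comment after the [group] field begins at the first tab or '#' outside a double-quoted value. The field names, the dictionary return shape, and the treatment of backslash escapes inside quotes are assumptions of this sketch, not a reference parser.

```python
from typing import Optional

# Names of the eight fixed fields that precede the [group] field.
FIELD_NAMES = ("seqname", "source", "feature", "start", "end",
               "score", "strand", "frame")


def split_group_and_comment(rest: str):
    """Split the text after the 8th tab into (group, comment).

    The group ends at the first tab or '#' outside double quotes; inside
    quotes a backslash escapes the next character (assumed here).
    """
    in_quotes = False
    escaped = False
    for i, ch in enumerate(rest):
        if escaped:
            escaped = False
        elif in_quotes and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch in "\t#":
            return rest[:i].rstrip(), rest[i + 1:].strip()
    return rest, None


def parse_gff2_line(line: str) -> Optional[dict]:
    """Parse one GFF v2 line; return None for blank and '#' lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    parts = line.split("\t", 8)  # TAB is the only field delimiter
    if len(parts) < 8:
        raise ValueError("expected at least 8 tab-delimited fields")
    record = dict(zip(FIELD_NAMES, parts[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    group, comment = (split_group_and_comment(parts[8])
                      if len(parts) == 9 else (None, None))
    record["group"] = group
    record["comment"] = comment
    return record
```

Splitting with a maximum of eight tabs keeps everything after the [frame] field together, so the group and any trailing comment can then be separated with the quote-aware scan rather than by a plain split. <P>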
<P>
<P>
Back to <A HREF="#TOC">Table of Contents</A>
<P>
<HR>
<A NAME="authors"><h2>Authors</h2></A>
<P>
GFF Protocol Specification initially proposed by: 
<A HREF="mailto:rd@sanger.ac.uk">Richard Durbin</a> and 
<A HREF="mailto:haussler@cse.ucsc.edu">David Haussler</a>
<P>with amendments proposed by: 
<A HREF="mailto:lstein@cshl.org">Lincoln Stein</a>, Anders Krogh and others.
<P>The GFF specification is now maintained at the Sanger Centre by 
<A HREF="mailto:rbsk@sanger.ac.uk">Richard Bruskiewich</a>
<P>
Back to <A HREF="#TOC">Table of Contents</A>
<P>

<!-- page content ends here -->

</TD></TR></TABLE></CENTER>  <!-- close table for page content -->

 <HR ALIGN="CENTER" WIDTH="90%">

<!-- open table for page footer -->
<TABLE BORDER="0" WIDTH="100%">
