<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>
The Sanger Centre : Gene-Finding Format - introduction and specification
</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TOPMARGIN="0">
<TABLE BORDER=0 WIDTH=100%>
<TR>
<TD ALIGN=LEFT>
<FONT FACE="Arial,Helvetica,sans-serif" SIZE="-1"> <I><A HREF="/Projects/release-policy.shtml">Data release policy</a>
and <A HREF="/Projects/use-policy.shtml">Guidelines and conditions on use of data</A></I></FONT>
</TD>
</TR>
</TABLE>
<TABLE BORDER=1 WIDTH=100%>
<TR>
<TD>
<TABLE BORDER="0" WIDTH="100%">
<TR>
<TD WIDTH="23" ALIGN=LEFT VALIGN=MIDDLE ROWSPAN=3>
<IMG WIDTH="23" HEIGHT="55" ALT="" BORDER="0" SRC="/header-icons/helix.gif">
</TD>
<TD ALIGN=CENTER VALIGN=TOP>
<A HREF=/><IMG WIDTH="236" HEIGHT="29" BORDER="0"
ALT="[The Sanger Centre]"
SRC="/header-icons/sanger-centre.gif"></A>
</TD>
<TD WIDTH="55" ALIGN=RIGHT VALIGN=MIDDLE ROWSPAN=3>
<IMG WIDTH="55" HEIGHT="55" ALT="" BORDER=0 SRC=/header-icons/sw.gif>
</TD>
</TR>
<TR>
<TD ALIGN=CENTER VALIGN=TOP NOWRAP>
<TT><FONT FACE=Arial,Helvetica,sans-serif SIZE=-1>
|
<A HREF=/Info/>Info</A>
|
<A HREF=/HGP/>HGP</A>
|
<A HREF=/Projects/>Projects</A>
|
<A HREF=/DataSearch/>Database Searches</A>
|
<A HREF=/Software/><B>Software</B></A>
|
<A HREF=/Teams/>Teams</A>
|
<A HREF=http://search.sanger.ac.uk>Search</A>
|
</FONT></TT>
</TD>
</TR>
<TR> <TD ALIGN=LEFT VALIGN=TOP NOWRAP>
<FONT FACE=Arial,Helvetica,sans-serif SIZE=-1><TT>
<A HREF=/><IMG WIDTH=11 HEIGHT=10 BORDER=0 HSPACE=0 ALIGN=TOP ALT="Home page" SRC=/icons/arrow.small.up.gif> Home</A>
<A HREF=/Software/><IMG WIDTH=11 HEIGHT=10 BORDER=0 HSPACE=0 ALIGN=TOP ALT="up to Software &amp; Databases " SRC=/icons/arrow.small.left.gif> Software &amp; Databases </A>
<A HREF=/Software/GFF/><IMG WIDTH=11 HEIGHT=10 BORDER=0 HSPACE=0 ALIGN=TOP ALT="up to GFF" SRC=/icons/arrow.small.left.gif> GFF</A>
</TT></FONT>
</TD>
</TR>
</TABLE>
</TD>
</TR>
</TABLE>
<P>
<!-- open table cell holding the page content -->
<CENTER><TABLE BORDER="0" WIDTH="80%"><TR><TD ALIGN="LEFT" VALIGN="TOP">
<!-- page content starts here -->
<A NAME="TOC"></A>
<H1 ALIGN="CENTER">GFF (Gene Finding Features) Specifications Document</H1>
<!-- INDEX BEGIN -->
<UL>
<LI><A HREF="#introduction">Introduction</A>
<LI><A HREF="#version_2_update">Version 2 GFF Update</A>
<LI><A HREF="#fields">Definition</A>
<UL>
<LI><A HREF="#standard_feature_table">Standard Table of Features</A>
<LI><A HREF="#group_field">Group Field</A>
<LI><A HREF="#comments">Comments</A>
<UL>
<LI><A HREF="#meta_info">Comments for Meta-Information</A>
</UL>
<LI><A HREF="#file_names">File Naming</A>
</UL>
<LI><A HREF="#semantics">Semantics</A>
<LI><A HREF="#GFF_use">Ways to use GFF</A>
<UL>
<LI><A HREF="#examples">Complex Examples</A>
<UL>
<LI><A HREF="#homology_feature">Similarities to Other Sequences</A>
</UL>
<LI><A HREF="#cum_score_array">Cumulative Score Arrays</A>
</UL>
<LI><A HREF="#mailing_list">Mailing List</A>
<LI><A HREF="#edit_history">Edit History</A>
<LI><A HREF="#authors">Authors</A>
</UL>
<!-- INDEX END -->
<HR>
<A NAME="introduction"><h2>Introduction</h2></A>
<P>
Essentially all current approaches to gene finding in higher organisms
use a variety of recognition methods that give scores to likely
signals (starts, splice sites, stops etc.) or to extended regions
(exons, introns etc.), and then combine these to give complete gene
structures. Normally the combination step is done in the same program
as the feature detection, often using dynamic programming methods. We
would like to enable these processes to be decoupled, by proposing a
format called GFF (Gene-Finding Format) for the transfer of feature
information. It would then be possible to take features from an
outside source and add them to an existing program, or in the
<P>[...]<P>
meeting 971029. There are two main reasons. First, to help prevent
name space clashes -- each program would have its own source
designation. Second, to help reuse feature names, so one could have
"exon" for exon predictions from each prediction program.<P>
971113 rd: added section on mailing list.<P>
980909 ihh: fixed some small things and put this page on the Sanger
GFF site.<P>
981216 rd: introduced version 2 changes.<P>
990226 rbsk: incorporated amendments to the version 2 specification as follows:<P>
<UL>
<LI>Non-printing characters (e.g. newlines, tabs) in Version 2 double-quoted
"free text values" must be explicitly represented by their C (UNIX) style
backslash-escaped form (e.g. '\t' for tabs, '\n' for newlines).<br>
<LI>Removed the field (256) and line (32K) character size limitations for Version 2.
<LI>Removed the permission to use arbitrary whitespace as a field delimiter.
TAB ('\t') field delimiters are now enforced again, as in Version 1.<br>
</UL>
990317 rbsk:
<UL>
<LI>End-of-line comments following Version 2 [group] field tag-value structures must be
delimited by a tab ('\t') or a hash ('#').
</UL>
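<P>
As a rough illustration of the Version 2 amendments listed above (C-style
escapes in double-quoted free text values, and TAB-only field delimiters), a
minimal Python sketch might look like the following. The helper names are
hypothetical and not part of the GFF specification. Escaping the backslash and
double-quote characters themselves is an assumption, since the entries above
name only '\t' and '\n' explicitly.

```python
# Hypothetical helpers for illustration; not part of the GFF specification.

# '\t' and '\n' come from the amendment text. The other entries are an
# assumption made so that the escaped value can be decoded unambiguously.
ESCAPES = {"\t": "\\t", "\n": "\\n", "\r": "\\r", '"': '\\"', "\\": "\\\\"}


def escape_free_text(value):
    """Return value as a double-quoted free text value with C-style escapes."""
    return '"' + "".join(ESCAPES.get(ch, ch) for ch in value) + '"'


def split_fields(line):
    """Split one GFF line on TAB only; spaces inside a field are preserved."""
    return line.rstrip("\n").split("\t")


if __name__ == "__main__":
    # Build a line whose last field holds a free text value with a raw tab.
    note = escape_free_text("first part\tsecond part")
    line = "\t".join(["seq1", "prog", "exon", "1", "10", ".", "+", ".", "Note " + note])
    print(split_fields(line))
```

Because the raw tab inside the note is written out as a backslash followed by
"t", splitting the line on TAB should still yield the same number of fields
that were joined.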
<P>
Back to <A HREF="#TOC">Table of Contents</A>
<P>
<HR>
<A NAME="authors"><h2>Authors</h2></A>
<P>
GFF Protocol Specification initially proposed by:
<A HREF="mailto:rd@sanger.ac.uk">Richard Durbin</a> and
<A HREF="mailto:haussler@cse.ucsc.edu">David Haussler</a>
<P>with amendments proposed by:
<A HREF="mailto:lstein@cshl.org">Lincoln Stein</a>, Anders Krogh and others.
<P>The GFF specification is now maintained at the Sanger Centre by
<A HREF="mailto:rbsk@sanger.ac.uk">Richard Bruskiewich</a>.
<P>
Back to <A HREF="#TOC">Table of Contents</A>
<P>
<!-- page content ends here -->
</TD></TR></TABLE></CENTER> <!-- close table for page content -->
<HR ALIGN="CENTER" WIDTH="90%">
<!-- open table for page footer -->
<TABLE BORDER="0" WIDTH="100%">
<TR>
<TD ALIGN=LEFT>
<I>
last modified : 25-Mar-1999, 01:59 PM
</I>
</TD>
<TD ALIGN=RIGHT>
<A HREF=/Users/rbsk/>Richard Bruskiewich</A>
<I>(<A HREF=mailto:rbsk@sanger.ac.uk>rbsk@sanger.ac.uk</A>)</I>
</TD>
</TR>
</TABLE> <!-- close table for page footer -->
</BODY>
</HTML>