Bio-ToolBox

scripts/get_gene_regions.pl

which is described at L<Ensembl TSL glossary entry|http://uswest.ensembl.org/info/website/glossary.html>.
Provide a level of support to filter. Values include: 
    
    1       All splice junctions supported by evidence
    2       Transcript flagged as suspect or only support from multiple ESTs
    3       Only support from single EST
    4       Best supporting EST is suspect
    5       No support
    best    Transcripts at the best (lowest) available level are taken
    best1   The word "best" followed by a digit 1-5, keeping any transcript 
            at or better (lower) than the indicated level
    NA      Only transcripts without a level (NA) are retained.
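
The selection rules in the table above can be sketched as follows. This is only an illustration and not the script's actual code: the C<filter_by_tsl> helper and the C<tsl> hash key are hypothetical names, and the sketch assumes "best" is evaluated over whatever set of transcripts is passed in (for example, those of a single gene).

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: keep transcripts matching a TSL filter value.
# Each transcript is a hash reference with a 'tsl' key of 1-5 or 'NA'.
sub filter_by_tsl {
    my ($filter, @transcripts) = @_;

    # 'NA' keeps only transcripts without an assigned level
    return grep { $_->{tsl} eq 'NA' } @transcripts if $filter eq 'NA';

    # all remaining modes consider only transcripts with a numeric level
    my @ranked = grep { $_->{tsl} =~ /^[1-5]$/ } @transcripts;

    if ($filter =~ /^[1-5]$/) {
        # a plain digit keeps exactly that level
        return grep { $_->{tsl} == $filter } @ranked;
    }
    if ($filter =~ /^best([1-5])$/) {
        # 'bestN' keeps level N or better (numerically lower)
        my $max = $1;
        return grep { $_->{tsl} <= $max } @ranked;
    }
    if ($filter eq 'best') {
        # 'best' keeps only the lowest level present
        return unless @ranked;
        my $best = min map { $_->{tsl} } @ranked;
        return grep { $_->{tsl} == $best } @ranked;
    }
    die "unrecognized TSL filter '$filter'\n";
}
```

Under these assumptions, four transcripts with levels 2, 3, NA, and 2 would yield the two level 2 transcripts for "best", the first, second, and fourth for "best3", and only the third for "NA".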

=item --unique

Compare start and stop coordinates of each collected region from 
each feature and remove duplicate regions. When the --slop option 
is provided, only the start coordinate plus/minus the slop factor 
is checked. 

=item --slop E<lt>integerE<gt>

When identifying unique regions, specify the number of bp to add to 
and subtract from the start position of each region (the slop or fudge 
factor) when checking for duplicates. Any other region whose start falls 
within this window is considered a duplicate. This is useful, for 
example, when transcription start sites are not precisely mapped, 
but not with precisely defined introns and exons. Only regions from the 
current gene are compared; transcripts from other genes are not 
considered. The default is 0 (no sloppiness).
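
The duplicate check described for --unique and --slop can be sketched roughly as below. The C<unique_regions> helper and the C<start> and C<stop> hash keys are hypothetical, and sorting by start so that the first region in a window is the one retained is an assumption of this sketch, not documented behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: drop duplicate regions collected from one gene.
# Each region is a hash reference with 'start' and 'stop' keys.
sub unique_regions {
    my ($slop, @regions) = @_;
    my @kept;
    REGION: foreach my $r (sort { $a->{start} <=> $b->{start} } @regions) {
        foreach my $k (@kept) {
            if ($slop) {
                # with slop, only the start coordinate is compared
                next REGION if abs($r->{start} - $k->{start}) <= $slop;
            }
            else {
                # without slop, both coordinates must match exactly
                next REGION
                    if $r->{start} == $k->{start}
                    and $r->{stop} == $k->{stop};
            }
        }
        push @kept, $r;
    }
    return @kept;
}
```

With a slop of 5, regions starting at 100, 103, and 120 would reduce to those starting at 100 and 120. The sketch ignores strand, since it assumes all regions come from a single gene.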

=item --chrskip E<lt>regexE<gt>

Exclude features from the output whose sequence ID or chromosome matches 
the provided regular expression. Expressions should be quoted or 
properly escaped on the command line. Examples include: 
    
    'chrM'
    'scaffold.+'
    'chr.+alt|chrUn.+|chr.+_random'
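
How such an expression might be applied is sketched below. The C<skip_chromosomes> helper and the C<seq_id> hash key are hypothetical, and the sketch assumes an unanchored match; whether the script anchors the expression is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper: drop features whose sequence ID matches the pattern.
# Each feature is a hash reference with a 'seq_id' key.
sub skip_chromosomes {
    my ($pattern, @features) = @_;
    my $regex = qr/$pattern/;    # compile the user-supplied string once
    return grep { $_->{seq_id} !~ $regex } @features;
}
```

Under the unanchored assumption, a pattern such as 'chr1' would also match 'chr10', so an anchored form such as '^chr1$' may be needed to exclude a single chromosome.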

=back

=head2 Adjustments

=over 4

=item --start E<lt>integerE<gt>

=item --begin E<lt>integerE<gt>

=item --stop E<lt>integerE<gt>

=item --end E<lt>integerE<gt>

Optionally adjust the reported start and end coordinates of the 
collected regions by the specified values. A negative value shifts the 
coordinate upstream (5' direction), and a positive value shifts it 
downstream (3' direction). Adjustments are made relative to the feature's 
strand, such that a start adjustment always modifies the feature's 5' end, 
which is either the feature start point or end point, depending on its 
orientation. 
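
The strand-relative arithmetic can be illustrated with a small sketch. The C<adjust_region> helper is hypothetical and assumes coordinates are stored with start less than or equal to stop and strand given as 1 or -1; it does no bounds checking.

```perl
use strict;
use warnings;

# Hypothetical helper: apply strand-aware start and stop adjustments.
# $strand is 1 (forward) or -1 (reverse); coordinates have start <= stop.
sub adjust_region {
    my ($start, $stop, $strand, $start_adj, $stop_adj) = @_;
    if ($strand >= 0) {
        # forward strand: the 5' end is the start coordinate
        $start += $start_adj;
        $stop  += $stop_adj;
    }
    else {
        # reverse strand: the 5' end is the stop coordinate, and
        # upstream corresponds to a higher coordinate
        $stop  -= $start_adj;
        $start -= $stop_adj;
    }
    return ($start, $stop);
}
```

For a region at 1000-2000 with a start adjustment of -100 and a stop adjustment of 50, this gives 900-2050 on the forward strand and 950-2100 on the reverse strand.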

=back

=head2 Output options

=over 4

=item --out E<lt>filenameE<gt>

Specify the output filename.

=item --bed

Automatically convert the output file to a BED file.

=item --bedname E<lt>nameE<gt>

Specify what to use for the Name column in the output BED file.
Several options are available, including:

    geneid          - The Primary ID of the parent Gene feature
    genename        - The Display Name of the parent Gene feature
    transcriptid    - The Primary ID of the parent Transcript feature
    transcriptname  - The Display Name of the parent Transcript feature
    featurename     - The generated name of the feature (default)

=item --gz

Specify whether (or not) the output file should be compressed with gzip.

=back

=head2 General options

=over 4

=item --version

Print the version number.

=item --help

Display this POD documentation.

=back

=head1 DESCRIPTION

This program will collect specific regions from annotated genes and/or 
transcripts. Often these regions are not explicitly defined in the 
source GFF3 annotation, necessitating a script to pull them out. These 
regions include the start and stop sites of transcription, introns, 
the splice sites (both 5' and 3'), exons, the first (5') or last (3') 
exons, or all alternate or common exons of genes with multiple 
transcripts. Importantly, the output may be restricted to unique regions, 
which is especially useful when a single gene has multiple alternative 
transcripts. A slop factor is available to accommodate imprecise annotation.
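
As an illustration of a region that is usually implicit in the annotation, introns can be derived as the gaps between consecutive exons of a transcript. The following is a minimal sketch under stated assumptions, not the program's implementation: the C<introns_from_exons> helper is hypothetical and takes exons as [start, stop] pairs in 1-based inclusive coordinates, as used by GFF3.

```perl
use strict;
use warnings;

# Hypothetical helper: derive introns as the gaps between consecutive exons.
# Exons are [start, stop] pairs in 1-based inclusive coordinates.
sub introns_from_exons {
    my @exons = sort { $a->[0] <=> $b->[0] } @_;
    my @introns;
    for my $i (0 .. $#exons - 1) {
        my $start = $exons[$i][1] + 1;        # first base after this exon
        my $stop  = $exons[$i + 1][0] - 1;    # last base before the next exon
        push @introns, [$start, $stop] if $start <= $stop;
    }
    return @introns;
}
```

Exons at 100-200, 301-400, and 501-600 would give introns at 201-300 and 401-500; adjacent or overlapping exons produce no intron between them.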

The program will report the chromosome, start and stop coordinates, 
strand, name, and parent and transcript names for each region 


