Bio-Glite
lib/Bio/Glite.pm
COMMENT
$gb->{COMMENT}
FEATURE
Each FEATURE is numbered (FEATURE1 .. FEATURE1172) and is a
hash structure that contains all the keys of the GenBank entry.
In most cases, the FEATURE$i hash contains at least
the information listed below:
$gb->{FEATURE$i}->{start}
$gb->{FEATURE$i}->{end}
$gb->{FEATURE$i}->{direction}
$gb->{FEATURE$i}->{join}
$gb->{FEATURE$i}->{note}
$gb->{FEATURE$i}->{type} - CDS, gene, RNA, etc.
$gb->{FEATURE$i}->{feature} - same as $i
To analyze each FEATURE, write:
foreach my $feature ($gb->feature()){
print $gb->{$feature}->{type}, "\n";
}
In the same manner, to analyze all CDS, write:
foreach my $cds ($gb->cds()){
print $gb->{$cds}->{gene}, "\n";
}
Feature or gene information can also be accessed with CDS numbers:
$gb->{CDS$i}->{start}
or with locus_tags or gene names (for CDS, tRNA, and rRNA)
$gb->{thrL}->{start}
$gb->{b0001}->{start}
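The access patterns above can be sketched without loading a genome. The example below is only an illustration: it builds a plain hash with the key layout described here, not a real G instance, and the helper C<feature_keys> is hypothetical. Modeling FEATURE1, CDS1, the gene name, and the locus_tag as references to one shared hash is an assumption; the module may store them differently. The coordinate values are illustrative.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a loaded G instance: a plain hash that follows
# the key layout described above. This is NOT the real Bio::Glite object.
my %thrL = (
    start => 190, end => 255, direction => 'direct',
    type  => 'CDS', feature => 1, gene => 'thrL',
);

# Assumption: the same record is reachable under several keys.
my $gb = {
    FEATURE1 => \%thrL,
    CDS1     => \%thrL,
    thrL     => \%thrL,
    b0001    => \%thrL,
};

# Hypothetical helper: return the FEATURE$i keys in numeric order.
sub feature_keys {
    my ($g) = @_;
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { /^FEATURE(\d+)$/ ? [$1, $_] : () } keys %$g;
}

for my $feature (feature_keys($gb)) {
    printf "%s %s %d..%d\n", $feature,
        $gb->{$feature}->{type}, $gb->{$feature}->{start}, $gb->{$feature}->{end};
}
```

With a real instance, the same loop body would be used with C<$gb-E<gt>feature()> in place of the hypothetical helper.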
BASE COUNT
$gb->{BASE_COUNT}
SEQ
$gb->seq() -sequence data following "ORIGIN"
=head1 Supported methods of G-language Genome Analysis Environment
=cut
=head2 $gb = new G("genome file")
Name: $gb = new G("genome file") - create a G instance
See "help load" for more information.
=cut
=head2 load
Name: load - load genome databases
This function is used to load genome databases into memory.
The first option is the filename of the database. The default format is
GenBank; otherwise the database format is guessed from the file extension
(e.g. .gbk => GenBank, .fasta => FASTA, .embl => EMBL).
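As a rough illustration of extension-based format guessing, the sketch below maps the three extensions named above to format names. The helper C<guess_format> and its lookup table are hypothetical and not part of this module; falling back to GenBank for unknown or missing extensions is an assumption based on the stated default.

```perl
use strict;
use warnings;

# Hypothetical lookup table covering only the extensions named in the text.
my %FORMAT_BY_EXT = (
    gbk   => 'GenBank',
    fasta => 'FASTA',
    embl  => 'EMBL',
);

# Hypothetical helper: guess a format name from a filename.
# Assumption: anything unrecognized falls back to the GenBank default.
sub guess_format {
    my ($filename) = @_;
    my ($ext) = $filename =~ /\.([^.\/]+)$/;
    return 'GenBank' unless defined $ext;
    return $FORMAT_BY_EXT{lc $ext} // 'GenBank';
}

print guess_format($_), "\n" for qw(ecoli.gbk reads.fasta entry.embl);
```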
There are also several sample bacterial genomes included in the system.
$eco = load("ecoli"); # Escherichia coli K12 MG1655 - NC_000913
$bsub = load("bsub"); # Bacillus subtilis - NC_000964
$mgen = load("mgen"); # Mycoplasma genitalium - NC_000908
$cyano = load("cyano"); # Synechococcus sp. - NC_005070
$pyro = load("pyro"); # Pyrococcus furiosus - NC_003413
Data can be automatically downloaded from public databases using
Uniform Sequence Address (USA) keys.
http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html
Currently supported database keys are:
swiss, genbank, genpept, embl, refseq
e.g.
$gb = load("embl:xlrhodop");
$gb = load("genbank:AY063336");
$gb = load("swiss:ROA1_HUMAN");
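The "database:identifier" shape of these keys can be checked before calling load. The following is a minimal sketch, assuming a key is a database name, a colon, and an identifier without whitespace; C<parse_usa> is a hypothetical helper, not a function of this module, and it only recognizes the five database keys listed above.

```perl
use strict;
use warnings;

# The database keys listed in the text above.
my %SUPPORTED = map { $_ => 1 } qw(swiss genbank genpept embl refseq);

# Hypothetical helper: split a USA-style key into (database, identifier).
# Returns an empty list when the string is not a supported key.
sub parse_usa {
    my ($usa) = @_;
    my ($db, $id) = $usa =~ /^([A-Za-z]+):(\S+)$/ or return;
    return unless $SUPPORTED{lc $db};
    return (lc $db, $id);
}

for my $key ("embl:xlrhodop", "genbank:AY063336", "ecoli") {
    my @parts = parse_usa($key);
    print @parts ? "$key: db=$parts[0] id=$parts[1]\n"
                 : "$key: not a supported USA key\n";
}
```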
The second option specifies detailed actions.
'no msg' suppresses all STDOUT messages printed
when loading a database, including the
copyright info and sequence statistics.
'no cache' suppresses the use of database caching.
By default, databases are cached for
optimized performance. (since v.1.6.4)
'force cache' rebuilds database cache.
'multiple locus' this option merges multiple loci in the
database and loads the information
as a G-language instance.
'bioperl' this option creates a G instance from
a BioPerl object.
e.g. $bp = $bp->next_seq(); # bioperl
$gb = load($bp, "bioperl"); # G
'longest ORF annotation' this option predicts genes with the longest-ORF
algorithm (the longest frame from a start codon
to a stop codon, with more than 17 amino
acids) and annotates the sequence.
'glimmer annotation' this option predicts genes using glimmer2,
a gene prediction program for microbial
genomes available from TIGR.
http://www.tigr.org/softlab/
A local installation of glimmer2 and a matching
PATH environment variable setting are required.
- the following options require a BioPerl installation -
'Fasta' this option loads a Fasta format database.
'EMBL' this option loads an EMBL format database.
'swiss' this option loads a swiss format database.
'SCF' this option loads an SCF format database.
'PIR' this option loads a PIR format database.
'GCG' this option loads a GCG format database.
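The longest-ORF rule behind the 'longest ORF annotation' option can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: only the forward strand is scanned, only ATG counts as a start codon, and the 17-residue threshold is applied to the codons before the stop, start codon included. The function C<longest_orfs> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: for each stop codon, report the longest ORF that ends
# there, i.e. the one beginning at the first ATG after the previous stop.
# Returns [start, end, frame] triples, 1-based and inclusive of the stop codon.
sub longest_orfs {
    my ($seq, $min_aa) = @_;
    $min_aa = 17 unless defined $min_aa;
    $seq = uc $seq;
    my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);
    my @orfs;
    for my $frame (0 .. 2) {
        my $start;    # 0-based position of the first ATG since the last stop
        for (my $i = $frame; $i + 3 <= length($seq); $i += 3) {
            my $codon = substr($seq, $i, 3);
            if (!defined $start) {
                $start = $i if $codon eq 'ATG';
            }
            elsif ($is_stop{$codon}) {
                my $aa = ($i - $start) / 3;    # residues, stop codon excluded
                push @orfs, [$start + 1, $i + 3, $frame + 1] if $aa > $min_aa;
                undef $start;
            }
        }
    }
    return @orfs;
}

# Synthetic test sequence: ATG followed by 20 GCT codons and a TAA stop.
my $seq = 'CC' . 'ATG' . ('GCT' x 20) . 'TAA' . 'GG';
printf "ORF %d..%d (frame %d)\n", @$_ for longest_orfs($seq);
```

A fuller version would also scan the reverse complement and accept alternative start codons; both are left out here to keep the sketch short.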