Bio-Glite

lib/Bio/Glite.pm


 COMMENT  
         $gb->{COMMENT}  

 FEATURE  
         Each FEATURE is numbered (e.g. FEATURE1 .. FEATURE1172) and is a
         hash structure that contains all the keys (qualifiers) of the
         GenBank feature. In most cases, the hash for FEATURE$i contains
         at least the following keys:
         $gb->{"FEATURE$i"}->{start}
         $gb->{"FEATURE$i"}->{end}
         $gb->{"FEATURE$i"}->{direction}
         $gb->{"FEATURE$i"}->{join}
         $gb->{"FEATURE$i"}->{note}
         $gb->{"FEATURE$i"}->{type}      - CDS, gene, RNA, etc.
         $gb->{"FEATURE$i"}->{feature}   - same as $i
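The start, end and direction keys listed above are enough to cut a feature's sequence out of the genome string. The following is a plain-Perl sketch, not a Bio::Glite method: it assumes GenBank-style 1-based inclusive coordinates and a direction value of "complement" for minus-strand features, and it ignores the join key (multi-segment features). The helper name feature_seq is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the nucleotides covered by one feature hash.
# Assumes 1-based inclusive start/end and direction 'complement' for the
# minus strand; multi-segment ('join') features are not handled.
sub feature_seq {
    my ($seq, $feat) = @_;
    my $len = $feat->{end} - $feat->{start} + 1;
    my $sub = substr($seq, $feat->{start} - 1, $len);
    if (($feat->{direction} // '') eq 'complement') {
        # Reverse complement for minus-strand features.
        $sub = reverse $sub;
        $sub =~ tr/acgtACGT/tgcaTGCA/;
    }
    return $sub;
}

my $seq  = 'atgaaacccgggtttaa';
my %feat = (start => 1, end => 6, direction => 'direct', type => 'CDS');
print feature_seq($seq, \%feat), "\n";   # prints "atgaaa"
```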

         To analyze each FEATURE, write: 

         foreach my $feature ($gb->feature()){
               print $gb->{$feature}->{type}, "\n";
         }  

         In the same manner, to analyze all CDS, write:  
 
         foreach my $cds ($gb->cds()){
               print $gb->{$cds}->{gene}, "\n";
         }
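Because each FEATURE is an ordinary hash, the loops above can be mimicked on plain data. This sketch uses a hand-built hash with made-up values in place of a loaded instance; the helpers feature_ids and count_types are hypothetical and only approximate what $gb->feature() returns (here, the FEATURE keys in numeric order).

```perl
use strict;
use warnings;

# Made-up stand-in for a loaded G instance. The numbering is sparse only
# to show that the keys are sorted numerically, not as strings.
my %gb = (
    FEATURE1  => { type => 'gene', start => 190, end => 255, feature => 1 },
    FEATURE2  => { type => 'CDS',  start => 190, end => 255, feature => 2 },
    FEATURE10 => { type => 'tRNA', start => 300, end => 376, feature => 10 },
);

# Hypothetical helper: FEATURE keys sorted by their numeric suffix.
sub feature_ids {
    my ($gb) = @_;
    # "FEATURE" is 7 characters long, so substr($key, 7) is the number.
    return sort { substr($a, 7) <=> substr($b, 7) }
           grep { /^FEATURE\d+$/ } keys %$gb;
}

# Hypothetical helper: tally features by their 'type' key.
sub count_types {
    my ($gb) = @_;
    my %count;
    $count{ $gb->{$_}{type} }++ for feature_ids($gb);
    return \%count;
}

print join(' ', feature_ids(\%gb)), "\n";   # prints "FEATURE1 FEATURE2 FEATURE10"
my $counts = count_types(\%gb);
print "$_\t$counts->{$_}\n" for sort keys %$counts;
```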

         Feature or gene information can also be accessed by CDS number:
         $gb->{"CDS$i"}->{start}

         or by locus_tag or gene name (for CDS, tRNA, and rRNA):
         $gb->{thrL}->{start}
         $gb->{b0001}->{start}

 BASE COUNT  
         $gb->{BASE_COUNT}  

 SEQ  
         $gb->seq()              -sequence data following "ORIGIN" 
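The SEQ and BASE COUNT entries can be cross-checked with a few lines of Perl. This sketch counts a, c, g and t in a plain string standing in for the value of $gb->seq() and derives the GC content; base_stats is a hypothetical name, and nothing is assumed about how $gb->{BASE_COUNT} itself is formatted.

```perl
use strict;
use warnings;

# Hypothetical helper: count a/c/g/t (case-insensitive) and derive GC
# content. Other characters (e.g. n) are left out of the totals.
sub base_stats {
    my ($seq) = @_;
    my %count = map { $_ => 0 } qw(a c g t);
    $count{$_}++ for grep { exists $count{$_} } split //, lc $seq;
    my $total = $count{a} + $count{c} + $count{g} + $count{t};
    my $gc    = $total ? ($count{g} + $count{c}) / $total : 0;
    return (\%count, $gc);
}

my ($count, $gc) = base_stats('atgcgcnnat');
print join(' ', map { "$count->{$_} $_" } qw(a c g t)), "\n";   # prints "2 a 2 c 2 g 2 t"
printf "GC = %.3f\n", $gc;                                      # prints "GC = 0.500"
```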

=head1 Supported methods of G-language Genome Analysis Environment

=cut

=head2 $gb = new G("genome file")

     Name: $gb = new G("genome file")   -   create a G instance

     see "help load" for more information.

=cut

=head2 load

     Name: load   -   load genome databases

         This function loads genome databases into memory.
         The first argument is the filename of the database. The default
         format is GenBank; the format is guessed from the file extension
         (e.g. .gbk => GenBank, .fasta => FASTA, .embl => EMBL).
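Extension-based format detection can be pictured as a lookup table with a GenBank fallback. This is an illustrative sketch, not the module's code: the table holds only the three examples listed above, and guess_format is a hypothetical name.

```perl
use strict;
use warnings;

# Only the examples given in this documentation; the real list may differ.
my %FORMAT_BY_EXT = (
    gbk   => 'GenBank',
    fasta => 'FASTA',
    embl  => 'EMBL',
);

# Hypothetical helper: guess the format from the last extension of the
# file name, falling back to GenBank when it is missing or unknown.
sub guess_format {
    my ($filename) = @_;
    my ($ext) = $filename =~ /\.([^.\/]+)$/;
    return 'GenBank' unless defined $ext;
    return $FORMAT_BY_EXT{ lc $ext } // 'GenBank';
}

print guess_format('ecoli.fasta'), "\n";   # prints "FASTA"
```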

         Several sample bacterial genomes are also included with the system:
         $eco   = load("ecoli"); # Escherichia coli K12 MG1655 - NC_000913
         $bsub  = load("bsub");  # Bacillus subtilis           - NC_000964
         $mgen  = load("mgen");  # Mycoplasma genitalium       - NC_000908
         $cyano = load("cyano"); # Synechococcus sp.           - NC_005070
         $pyro  = load("pyro");  # Pyrococcus furiosus         - NC_003413

         Data can be automatically downloaded from public databases using
         Uniform Sequence Address (USA) keys.
         http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html
         Currently supported database keys are:
            swiss, genbank, genpept, embl, refseq
         e.g.
            $gb = load("embl:xlrhodop");
            $gb = load("genbank:AY063336");
            $gb = load("swiss:ROA1_HUMAN");
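A USA key of this kind is a database name and an entry ID separated by a colon. The sketch below splits such a string and checks the database against the keys listed above; parse_usa is hypothetical, and the real loader may accept other USA forms.

```perl
use strict;
use warnings;

# Database keys listed in this documentation.
my %SUPPORTED_DB = map { $_ => 1 } qw(swiss genbank genpept embl refseq);

# Hypothetical helper: split "db:id" and accept only the keys above.
# Returns (db, id) on success and an empty list otherwise.
sub parse_usa {
    my ($usa) = @_;
    my ($db, $id) = $usa =~ /^([A-Za-z]+):(\S+)$/ or return;
    return unless $SUPPORTED_DB{ lc $db };
    return (lc $db, $id);
}

my ($db, $id) = parse_usa('genbank:AY063336');
print "$db $id\n";   # prints "genbank AY063336"
```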

         The second argument specifies additional loading options.

           'no msg'                  suppresses all STDOUT messages printed
                                     when loading a database, including the
                                     copyright info and sequence statistics.

           'no cache'                suppresses the use of database caching.
                                     By default, databases are cached for
                                     optimized performance. (since v.1.6.4)

           'force cache'             rebuilds database cache.

           'multiple locus'          this option merges multiple loci in the
                                     database and loads them as a single
                                     G-language instance.

           'bioperl'                 this option creates a G instance from
                                     a BioPerl sequence object.
                                     e.g. $bp = $seqio->next_seq();   # BioPerl (Bio::SeqIO)
                                          $gb = load($bp, "bioperl"); # G

           'longest ORF annotation'  this option predicts genes with the longest
                                     ORF algorithm (the longest frame from a start
                                     codon to a stop codon, encoding more than 17
                                     amino acids) and annotates the sequence.

           'glimmer annotation'      this option predicts genes using glimmer2,
                                     gene prediction software for microbial
                                     genomes available from TIGR.
                                     http://www.tigr.org/softlab/
                                     glimmer2 must be installed locally and
                                     its location added to the PATH
                                     environment variable.

               - the following options require a BioPerl installation -

           'Fasta'              this option loads a FASTA format database.
           'EMBL'               this option loads an EMBL format database.
           'swiss'              this option loads a Swiss-Prot format database.
           'SCF'                this option loads an SCF format database.
           'PIR'                this option loads a PIR format database.
           'GCG'                this option loads a GCG format database.
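The longest-ORF strategy behind 'longest ORF annotation' can be sketched as follows. This is not the module's implementation: it scans the three forward frames only, treats ATG as the only start codon and TAA/TAG/TGA as stops, keeps the earliest ATG before each stop (giving the longest frame), and reads "more than 17 amino acids" as more than 17 codons before the stop, counting the initial Met. The names longest_orfs and $MIN_AA are hypothetical.

```perl
use strict;
use warnings;

# Minimum ORF length, read here as "more than 17 codons before the stop,
# counting the initial ATG" (an assumption about the documented rule).
my $MIN_AA = 17;

# Hypothetical helper: longest ORFs on the forward strand only.
# Returns [start, end] pairs, 1-based and including the stop codon.
sub longest_orfs {
    my ($seq) = @_;
    $seq = uc $seq;
    my @orfs;
    for my $frame (0 .. 2) {
        my $start;    # 0-based offset of the first ATG since the last stop
        for (my $i = $frame; $i + 3 <= length $seq; $i += 3) {
            my $codon = substr($seq, $i, 3);
            if (!defined $start && $codon eq 'ATG') {
                $start = $i;    # earliest start gives the longest frame
            }
            elsif (defined $start && $codon =~ /^(?:TAA|TAG|TGA)$/) {
                my $aa = ($i - $start) / 3;    # codons before the stop
                push @orfs, [ $start + 1, $i + 3 ] if $aa > $MIN_AA;
                undef $start;
            }
        }
    }
    return sort { $a->[0] <=> $b->[0] } @orfs;
}

my $seq = 'ATG' . ('GCT' x 18) . 'TAA';
printf "%d..%d\n", @$_ for longest_orfs($seq);   # prints "1..60"
```

A full annotator would also scan the reverse complement and may allow alternative start codons; both are left out to keep the sketch short.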


