Bio-BPWrapper

bin/bioseq

=item --codon-sim <CUTG file>

(Temporarily disabled) 

Output a CDS encoding the same amino-acid sequence, with each codon replaced by a synonymous codon chosen at random from a genome codon pool (specified by a CUTG GCG-format file; see test-files/BbB31.cutg). Only the first sequence is used if multiple sequences are supplied.

To test the significance of codon bias, run this option repeatedly to generate at least 100 simulated CDSs, e.g.,

    for i in {1..100}; do bioseq --codon-sim test-files/BbB31.cutg test-files/test-bioseq.cds; done

Requires two non-BioPerl modules: Algorithm::Numerical::Sample and Math::Random.
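The idea behind --codon-sim can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the bioseq implementation: the `usage` dictionary stands in for codon counts parsed from a CUTG file (its values here are made up), the standard genetic code is assumed, and `simulate_cds` is a hypothetical helper name.

```python
import random
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the genome uses the standard table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO))


def translate(cds):
    """Translate a CDS, ignoring any trailing incomplete codon."""
    end = len(cds) - len(cds) % 3
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, end, 3))


def simulate_cds(cds, usage, rng):
    """Replace each codon with a synonymous one drawn in proportion to `usage`."""
    pool = defaultdict(list)
    for codon, count in usage.items():
        pool[CODON_TO_AA[codon]].append((codon, count))
    out = []
    end = len(cds) - len(cds) % 3
    for i in range(0, end, 3):
        codons, weights = zip(*pool[CODON_TO_AA[cds[i:i + 3]]])
        out.append(rng.choices(codons, weights=weights)[0])
    return "".join(out)


# Hypothetical codon counts; a real run would read them from a CUTG file.
usage = {codon: n + 1 for n, codon in enumerate(CODONS)}
cds = "ATGGCTCTGAAATAA"
sim = simulate_cds(cds, usage, random.Random(1))
print(sim, translate(sim) == translate(cds))
```

Sampling with weights rather than uniformly is what ties the simulated sequences to the genome's codon pool, so that an observed CDS can be compared against this null distribution.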

=item --codon-table

Interface to Bio::Tools::CodonTable. Supported operations: translating a codon to an amino acid and, in reverse, listing the codons for an amino acid. Currently accepts only a 3-letter DNA codon or a 1-letter uppercase IUPAC amino-acid code:

    --codon-table 'ACG'
    --codon-table 'L'

=item --composition, -c

Base or AA composition.

=item --count-codons, -C

Count codons for coding sequences (e.g., a genome file consisting of CDS sequences).
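As a rough illustration of what codon counting involves (not the bioseq code itself), the Python sketch below tallies in-frame codons across a list of CDS strings. The function name `count_codons` and the choice to drop a trailing incomplete codon are assumptions made for the example.

```python
from collections import Counter


def count_codons(sequences):
    """Tally in-frame codons over all sequences, skipping partial codons."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        end = len(seq) - len(seq) % 3  # ignore a trailing 1- or 2-base remainder
        counts.update(seq[i:i + 3] for i in range(0, end, 3))
    return counts


codon_counts = count_codons(["ATGAAATAA", "atgaaac"])
for codon, n in sorted(codon_counts.items()):
    print(codon, n)
```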

=item --delete, -d 'tag:value'

Delete a sequence or a comma-separated list of sequences, e.g.,

   --delete id:foo       # by id
   --delete order:2      # by order
   --delete length:n     # by min length, where 'n' is length
   --delete ambig:x      # by max number of x ambiguous bases (non-ATCGs), e.g., if x=20, delete seqs with 20 or more N's
   --delete id:foo,bar   # list by id
   --delete re:REGEX     # using a regular expression (only one regex is expected)
   --delete file:name    # by file, one id per line

=item --feat2fas | -F

Extract gene sequences in FASTA format from a GenBank file of a bacterial genome. This does not work for a eukaryotic GenBank file, and an error is thrown if the input is not a GenBank file. For example:

   bioseq -i'genbank' -F <genbank_file.gb>

=item --fetch, -f <genbank_accession>

Retrieve a sequence from GenBank using the provided accession number, e.g.,

   bioseq -f 'NC_003078' -o 'genbank'

=item --hydroB, -H

Return the mean Kyte-Doolittle hydropathicity for protein sequences.
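For reference, the mean Kyte-Doolittle hydropathicity is the sum of per-residue scale values divided by the number of residues. The Python sketch below shows the calculation; it is not the bioseq implementation, and skipping residues outside the 20 standard amino acids is an assumption of this example.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(protein):
    """Average the scale values over scored residues (assumption: others are skipped)."""
    scores = [KD_SCALE[aa] for aa in protein.upper() if aa in KD_SCALE]
    if not scores:
        raise ValueError("no standard amino acids in sequence")
    return sum(scores) / len(scores)


print(round(mean_hydropathy("MKVLAAGIVG"), 3))
```

Positive values indicate a predominantly hydrophobic sequence and negative values a predominantly hydrophilic one.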

=item --iep

Return the isoelectric point of a protein sequence, as well as its charge at a series of pH values. Depends on Bio::Tools::pICalculator.

=item --input, -i

Input file format. By default, this is 'fasta'. For GenBank format, use 'genbank'. For EMBL format, use 'embl'. For FASTQ, use 'fastq'.

[We tried to guess the format using Bio::Tools::GuessSeqFormat, but it did not work for piped input. Format guessing is deferred until this issue is fixed.]

=item --lead-gaps | -G

Count and return the number of leading gaps in each sequence.

=item --length, -l

Print all sequence lengths.

=item --linearize, -L

Linearize FASTA, one sequence per line.

=item --longest-orf, -z

Find and return the longest ORF. The original sequence is returned if translation with -t1 raises no error, i.e., there is no internal stop codon in the first reading frame. This is useful for fixing out-of-frame sequences. Use -Z (no reverse complement) to search only the given strand.

=item --mol-wt

Print the lower and upper bounds of the molecular weight.

=item --num-gaps-dna

Print the number of non-ATCG characters in a DNA sequence.

Redirect STDERR to a file to view the non-ATCG positions, e.g.,

   bioseq --num-gaps-dna seq.fas > out 2> log
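The count-on-STDOUT, positions-on-STDERR split described above can be mimicked with a short Python sketch. It is illustrative only: the helper name `non_atcg_positions`, the 1-based coordinates, and the case-insensitive matching are assumptions, not documented bioseq behavior.

```python
import sys


def non_atcg_positions(seq):
    """Return 1-based positions of characters other than A, T, C or G."""
    return [i for i, base in enumerate(seq.upper(), start=1) if base not in "ATCG"]


positions = non_atcg_positions("ATGNNC-A")
print(len(positions))               # the count goes to STDOUT
print(positions, file=sys.stderr)   # the positions go to STDERR
```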

=item --num-gaps-aa

Print the number of non-amino-acid characters in a protein sequence.

Redirect STDERR to a file to view the non-amino-acid positions, e.g.,

   bioseq --num-gaps-aa seq.pep > out 2> log

=item --no-gaps, -g

Remove gaps.

=item --num-seq, -n

Print number of sequences.

=item --output, -o 'format'

Output file format. By default, this is 'fasta'. For GenBank format, use 'genbank'. For EMBL format, use 'embl'.

=item --pick, -p 'tag:value'

Select a single sequence:

   --pick 'id:foo'        by id
   --pick 'order:2'       by order
   --pick 're:REGEX'      using a regular expression

Select a list of sequences:

   --pick 'id:foo,bar'    list by id
   --pick 'order:2,3'     list by order
   --pick 'order:2-10'    list by range


