Bio-BPWrapper
=item --codon-sim <CUTG file>
(Temporarily disabled)
Output a CDS encoding the same amino-acid sequence, with each codon replaced by a synonymous codon randomly chosen from a genome-wide codon pool (specified by a CUTG file in GCG format; see test-files/BbB31.cutg). If multiple sequences are supplied, only the first is used.
To test the significance of codon-bias information, run this repeatedly to generate at least 100 simulated CDSs, e.g.,
for i in {1..100}; do bioseq --codon-sim test-files/BbB31.cutg test-files/test-bioseq.cds; done
Requires two non-BioPerl modules: Algorithm::Numerical::Sample and Math::Random.
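The resampling idea can be sketched in Python as below. This is an illustration, not bioseq's code: it assumes the CUTG pool has already been parsed into a dict of codon counts (parsing is not shown), uses the standard genetic code, draws each synonymous codon with probability proportional to its genome count, and expects an unambiguous, in-frame CDS. simulate_cds() and translate() are hypothetical names.

```python
import random

# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AAS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate whole codons of an unambiguous CDS ('*' marks stops)."""
    cds = cds.upper()
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def simulate_cds(cds, codon_counts, rng=random):
    """Hypothetical helper: redraw every codon from its synonymous set,
    weighting each synonym by its count in the genome codon pool."""
    cds = cds.upper()
    simulated = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TO_AA[cds[i:i + 3]]
        synonyms = [codon for codon, a in CODON_TO_AA.items() if a == aa]
        weights = [codon_counts.get(codon, 0) for codon in synonyms]
        if sum(weights) == 0:
            weights = [1] * len(synonyms)  # assumption: uniform if the pool lacks this AA
        simulated.append(rng.choices(synonyms, weights=weights, k=1)[0])
    return "".join(simulated)


if __name__ == "__main__":
    pool = {"GCT": 30, "GCC": 10, "AAA": 5, "AAG": 5, "TAA": 1}  # made-up counts
    rng = random.Random(42)
    sims = [simulate_cds("GCTAAATAA", pool, rng) for _ in range(100)]
    assert all(translate(s) == "AK*" for s in sims)  # protein is preserved
    print(sims[:3])
```

Generating 100 or more simulated CDSs this way gives a null distribution against which a codon-bias statistic of the real CDS can be compared.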
=item --codon-table
Interface to Bio::Tools::CodonTable. Translates a codon to its amino acid, or, in reverse, lists the codons for an amino acid. Currently accepts only a three-base DNA codon or a one-letter uppercase IUPAC amino-acid code, e.g.,
--codon-table 'ACG'
--codon-table 'L'
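Both lookups can be mimicked in Python with a dict built from the standard genetic code. This is only a sketch: codon_table_query() is a hypothetical name, and it handles only the standard code and unambiguous codons.

```python
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AAS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_table_query(query):
    """Codon -> amino acid, or uppercase 1-letter amino acid -> sorted list of codons."""
    if len(query) == 3 and set(query.upper()) <= set("ACGT"):
        return CODON_TO_AA[query.upper()]
    if len(query) == 1 and query.isupper():
        return sorted(codon for codon, aa in CODON_TO_AA.items() if aa == query)
    raise ValueError(f"expected a 3-base codon or a 1-letter uppercase AA code: {query!r}")


print(codon_table_query("ACG"))  # T
print(codon_table_query("L"))    # ['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG']
```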
=item --composition, -c
Print base (DNA) or amino-acid (protein) composition.
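A minimal Python sketch of a composition count, assuming the input is a plain sequence string. Gap and ambiguity characters are counted like any other symbol here, which may differ from bioseq's handling; composition() is a hypothetical name.

```python
from collections import Counter


def composition(seq):
    """Map each residue to (count, fraction of all characters in seq)."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {residue: (n, n / total) for residue, n in sorted(counts.items())}


print(composition("AACGTTTA"))
# {'A': (3, 0.375), 'C': (1, 0.125), 'G': (1, 0.125), 'T': (3, 0.375)}
```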
=item --count-codons, -C
Count codons for coding sequences (e.g., a genome file consisting of CDS sequences).
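Codon counting amounts to tallying non-overlapping triplets starting at the first base of each CDS. A Python sketch, assuming every input string is an in-frame CDS (a trailing partial codon is ignored); count_codons() is a hypothetical name.

```python
from collections import Counter


def count_codons(cds_list):
    """Tally in-frame codons across all CDSs; a trailing partial codon is ignored."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            counts[cds[i:i + 3]] += 1
    return counts


genome = ["ATGAAATAA", "ATGAAGAAATGA"]
for codon, n in sorted(count_codons(genome).items()):
    print(codon, n)
```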
=item --delete, -d 'tag:value'
Delete a sequence or a comma-separated list of sequences, e.g.,
--delete id:foo # by id
--delete order:2 # by order
--delete length:n # by min length, where 'n' is length
--delete ambig:x # by number of ambiguous bases (non-ATCGs): deletes seqs with x or more, e.g., if x=20, deletes seqs with 20 or more N's
--delete id:foo,bar # list by id
--delete re:REGEX # using a regular expression (only one regex is expected)
--delete file:name # by file, one id per line
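The tag:value filters can be sketched in Python as below. Only the id, order, ambig and re tags are covered. The sketch assumes records are (id, sequence) pairs in input order, that re: is matched against sequence ids, and that ambig:x counts every non-ATCG character. delete_seqs() is a hypothetical name, not bioseq's internal routine.

```python
import re


def delete_seqs(records, spec):
    """Return the records that survive a 'tag:value' deletion spec."""
    tag, value = spec.split(":", 1)
    if tag == "id":
        drop = set(value.split(","))
        return [(i, s) for i, s in records if i not in drop]
    if tag == "order":  # 1-based positions in the input
        drop = {int(v) for v in value.split(",")}
        return [r for n, r in enumerate(records, start=1) if n not in drop]
    if tag == "ambig":  # delete seqs with x or more non-ATCG characters
        x = int(value)
        return [(i, s) for i, s in records
                if sum(1 for ch in s.upper() if ch not in "ATCG") < x]
    if tag == "re":  # assumption: the regex is applied to ids
        pattern = re.compile(value)
        return [(i, s) for i, s in records if not pattern.search(i)]
    raise ValueError(f"unsupported tag in this sketch: {tag}")


records = [("a", "ACGT"), ("b", "ANNN"), ("c", "GGGG")]
print(delete_seqs(records, "ambig:3"))  # drops 'b'
```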
=item --feat2fas | -F
Extract gene sequences in FASTA format from a GenBank file of a bacterial genome. Does not work with eukaryotic GenBank files. For example:
bioseq -i 'genbank' -F <genbank_file.gb> # throws an error if the input is not a GenBank file
=item --fetch, -f <genbank_accession>
Retrieves a sequence from GenBank using the provided accession number, e.g.,
bioseq -f 'NC_003078' -o 'genbank'
=item --hydroB, -H
Return the mean Kyte-Doolittle hydropathicity for protein sequences.
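The statistic is the average of per-residue values on the Kyte-Doolittle hydropathy scale (often called GRAVY). A minimal Python sketch; skipping residues outside the 20 standard amino acids (e.g., X, *, gaps) is an assumption, and mean_hydropathy() is a hypothetical name.

```python
# Kyte-Doolittle hydropathy scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(protein):
    """Average Kyte-Doolittle value over standard residues (others are skipped)."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("no standard amino acids in sequence")
    return sum(values) / len(values)


print(round(mean_hydropathy("MKAIL*"), 3))
```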
=item --iep
Return the isoelectric point of protein sequences, as well as their charges at a series of pH values. Depends on Bio::Tools::pICalculator.
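The underlying idea: sum the Henderson-Hasselbalch charges of the termini and ionizable side chains at a given pH; the isoelectric point is the pH at which that sum is zero. The Python sketch below is not Bio::Tools::pICalculator's code. The pKa values are an assumed set (published sets differ, so results vary slightly between tools), and bisection is just one simple way to locate the zero. net_charge() and isoelectric_point() are hypothetical names.

```python
# Assumed pKa set; published sets differ, so values vary slightly between tools.
PKA = {"N_term": 8.6, "C_term": 3.6, "K": 10.8, "R": 12.5, "H": 6.5,
       "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
POSITIVE = ("K", "R", "H")
NEGATIVE = ("D", "E", "C", "Y")


def net_charge(protein, ph):
    """Henderson-Hasselbalch net charge of a protein at the given pH."""
    seq = protein.upper()
    charge = 1 / (1 + 10 ** (ph - PKA["N_term"]))   # free N-terminal amine
    charge -= 1 / (1 + 10 ** (PKA["C_term"] - ph))  # free C-terminal carboxyl
    for aa in POSITIVE:
        charge += seq.count(aa) / (1 + 10 ** (ph - PKA[aa]))
    for aa in NEGATIVE:
        charge -= seq.count(aa) / (1 + 10 ** (PKA[aa] - ph))
    return charge


def isoelectric_point(protein, lo=0.0, hi=14.0, tol=1e-4):
    """Bisect for the pH of zero net charge (charge falls monotonically with pH)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(protein, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


seq = "MKDEHRCY"
print(f"pI = {isoelectric_point(seq):.2f}")
for ph in range(2, 13, 2):
    print(ph, round(net_charge(seq, ph), 2))
```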
=item --input, -i
Input file format. By default, this is 'fasta'. For GenBank format, use 'genbank'. For EMBL format, use 'embl'. For FASTQ format, use 'fastq'.
[We tried to guess the format using Bio::Tools::GuessSeqFormat, but it did not work with piped input. Format guessing is postponed until this issue is fixed.]
=item --lead-gaps | -G
Count and return the number of leading gaps in each sequence.
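Counting leading gaps means measuring the prefix of gap characters before the first residue. A one-function Python sketch; treating only '-' as a gap by default is an assumption, and leading_gaps() is a hypothetical name.

```python
def leading_gaps(aligned_seq, gap_chars="-"):
    """Number of gap characters before the first residue."""
    return len(aligned_seq) - len(aligned_seq.lstrip(gap_chars))


for seq_id, seq in [("s1", "---ACG-T"), ("s2", "ACGT"), ("s3", "--..AC")]:
    print(seq_id, leading_gaps(seq, gap_chars="-."))
```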
=item --length, -l
Print all sequence lengths.
=item --linearize, -L
Linearize FASTA, one sequence per line.
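Linearizing joins each record's wrapped sequence lines into one string. A Python sketch that yields each header with its sequence on a single line; the exact output layout bioseq prints may differ, and linearize_fasta() is a hypothetical name.

```python
def linearize_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = [">seq1 demo", "ACGT", "ACG", ">seq2", "TTTT"]
for header, seq in linearize_fasta(example):
    print(header)
    print(seq)
```

To filter a file or a pipe, pass sys.stdin (or an open file object) instead of the example list.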
=item --longest-orf, -z
Find and return the longest ORF. The original sequence is returned unchanged if it translates without error under -t1, i.e., it has no internal stop codon in the first reading frame. This is useful for fixing out-of-frame sequences. Turn on -Z to skip the reverse complement and search only the given strand.
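A Python sketch of such a search, under an assumed ORF definition: the longest run of whole codons containing no stop codon, in any of the three frames of the given strand and, unless strand-only mode is requested (as with -Z), of its reverse complement. The stop codon itself is excluded, and ties keep the earliest candidate, so an input with no stop in the first frame comes back unchanged (trimmed to whole codons). bioseq's own definition (e.g., whether a start codon is required) may differ; longest_orf() is a hypothetical name.

```python
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AAS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcom(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq, both_strands=True):
    """Longest stop-free run of whole codons (assumed ORF definition)."""
    seq = seq.upper()
    strands = [seq, revcom(seq)] if both_strands else [seq]
    best = ""
    for strand in strands:
        for frame in range(3):
            run_start = frame
            for pos in range(frame, len(strand) - 2, 3):
                if CODON_TO_AA.get(strand[pos:pos + 3]) == "*":
                    if pos - run_start > len(best):  # strict: ties keep earlier hit
                        best = strand[run_start:pos]
                    run_start = pos + 3
            end = frame + 3 * ((len(strand) - frame) // 3)  # last whole codon
            if end - run_start > len(best):
                best = strand[run_start:end]
    return best


print(longest_orf("ATGAAATAAGGG", both_strands=False))  # GAAATAAGG
print(longest_orf("ATGAAATAAGGG"))                      # CCCTTATTTCAT
```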
=item --mol-wt
Print the lower and upper bounds of the molecular weight.
=item --num-gaps-dna
Print the number of non-ATCG characters in a DNA sequence.
Redirect STDERR to a file to view the non-ATCG positions, e.g.:
bioseq --num-gaps-dna seq.fas > out 2> log
=item --num-gaps-aa
Print the number of non-amino-acid characters in a protein sequence.
Redirect STDERR to a file to view the non-amino-acid positions, e.g.:
bioseq --num-gaps-aa seq.pep > out 2> log
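Both options count characters outside an expected alphabet and report the offending positions on STDERR. A Python sketch of that split: the alphabets (ATCG for DNA, the 20 standard one-letter amino-acid codes for protein) follow the descriptions above, while the tab-separated id/position/character layout of the STDERR report is an assumption. count_nonstandard() is a hypothetical name.

```python
import sys

DNA = set("ATCG")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def count_nonstandard(seq_id, seq, alphabet, log=sys.stderr):
    """Return how many characters fall outside alphabet; log each 1-based position."""
    count = 0
    for pos, ch in enumerate(seq.upper(), start=1):
        if ch not in alphabet:
            count += 1
            print(f"{seq_id}\t{pos}\t{ch}", file=log)
    return count


# Counts go to STDOUT and positions to STDERR, so "> out 2> log" separates them.
print("s1", count_nonstandard("s1", "ACGN-T", DNA))
print("p1", count_nonstandard("p1", "MKX*L", AMINO_ACIDS))
```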
=item --no-gaps, -g
Remove gaps from sequences.
=item --num-seq, -n
Print number of sequences.
=item --output, -o 'format'
Output file format. By default, this is 'fasta'. For GenBank format, use 'genbank'. For EMBL format, use 'embl'.
=item --pick, -p 'tag:value'
Select a single sequence:
--pick 'id:foo' # by id
--pick 'order:2' # by order
--pick 're:REGEX' # using a regular expression
Select a list of sequences:
--pick 'id:foo,bar' # list by id
--pick 'order:2,3' # list by order
--pick 'order:2-10' # list by range
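The selectors can be sketched in Python as follows. The sketch assumes records are (id, sequence) pairs in input order, order numbers are 1-based, ranges are inclusive, re: is matched against ids, and id and order lists come back in the order requested. pick_seqs() and parse_orders() are hypothetical names, not bioseq internals.

```python
import re


def parse_orders(value):
    """Expand '2', '2,3' or '2-10' (inclusive) into a list of 1-based positions."""
    orders = []
    for part in value.split(","):
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            orders.extend(range(start, end + 1))
        else:
            orders.append(int(part))
    return orders


def pick_seqs(records, spec):
    """Return the records selected by a 'tag:value' spec."""
    tag, value = spec.split(":", 1)
    if tag == "id":
        by_id = dict(records)
        return [(i, by_id[i]) for i in value.split(",")]
    if tag == "order":
        return [records[n - 1] for n in parse_orders(value)]
    if tag == "re":  # assumption: the regex is applied to ids
        pattern = re.compile(value)
        return [(i, s) for i, s in records if pattern.search(i)]
    raise ValueError(f"unsupported tag: {tag}")


records = [("a", "A"), ("b", "C"), ("c", "G"), ("d", "T")]
print(pick_seqs(records, "order:2-3"))
```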