Bio-BPWrapper
 Seq5/1-8              ATGAATAA
 Seq6/1-8              ATGAATAA
                       ********
 nuc.aln.slice-19.aln:

 Seq1/19-33            AGCATAGAAGAATTA
 Seq2/19-33            AGCATAGAAGAATTA
 Seq3/19-33            AGCATAGAAGAATTA
 Seq4/19-33            AGCATAGAAGAATTA
 Seq5/19-33            AGCATAGAAGAATTA
 Seq6/19-33            AGCATAGAAGAATTA
                       ***************

 nuc.aln.slice-40.aln:

 Seq1/40-47            AAAATAAG
 Seq2/40-47            AAAATAAG
 Seq3/40-47            AAAATAAG
 Seq4/40-47            AAAATAAG
 Seq5/40-47            AAAATAAG
 Seq6/40-47            AAAATAAG
                       ********

=item --concat, -A

Concatenate multiple alignments sharing the same set of IDs. This is normally used for concatenating individual gene
alignments of the same set of samples into a single alignment for making a "supertree".

   bioaln --concat gene1.aln gene2.aln gene3.aln gene4.aln

or use a wildcard to specify multiple files (check with "ls *.aln" first to confirm the alignment order):

   bioaln --concat gene*.aln

Two outputs:
 1. concatenated alignment (to STDOUT)
 2. "concat.log" file, which shows mapped positions for a reference seq (specified by "-r"; otherwise the first sequence)
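
The concatenation step can be pictured with a minimal Python sketch. It is an illustration only, not the bioaln implementation: the function name C<concat_alignments> and the dict-based alignment representation are assumptions made for this example, and it does not produce the "concat.log" position map.

```python
def concat_alignments(alignments):
    """Join alignments end to end.

    Each alignment is a dict mapping sequence id -> aligned sequence
    (a hypothetical representation chosen for this sketch).
    """
    if not alignments:
        raise ValueError("no alignments given")
    ids = list(alignments[0])  # id order is taken from the first alignment
    for aln in alignments[1:]:
        if set(aln) != set(ids):
            raise ValueError("alignments do not share the same set of IDs")
    # Append each gene's row for a given id, in the order the files were given.
    return {seq_id: "".join(aln[seq_id] for aln in alignments) for seq_id in ids}


gene1 = {"Seq1": "ATGAAT", "Seq2": "ATGAAC"}
gene2 = {"Seq2": "GG-A", "Seq1": "GGTA"}  # same IDs, different order
combined = concat_alignments([gene1, gene2])
print(combined)
```

Because rows are matched by id rather than by position, the order of sequences within each file does not matter in this sketch, but the order of the files determines the order of the genes in the result.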

=item --consensus, -C 'percent' (default 50)

Add a consensus sequence to the end of the alignment, computed at the given percent threshold and given the id Consensus_<percent>.
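
One way to read the threshold is sketched below in Python. This is a hedged illustration, not the code bioaln runs: the function name C<consensus>, the '?' placeholder for columns that miss the threshold, and the choice to ignore gap characters when counting are all assumptions of this example.

```python
from collections import Counter


def consensus(seqs, percent=50, gap_chars="-."):
    """Return a consensus string for equal-length aligned sequences.

    A column's most frequent residue is used only if it occurs in at
    least `percent` percent of all sequences; otherwise '?' is emitted
    (the placeholder is an assumption of this sketch).
    """
    needed = len(seqs) * percent / 100
    out = []
    for column in zip(*seqs):
        # Count residues only; gap characters never become the consensus.
        counts = Counter(c for c in column if c not in gap_chars)
        residue, count = counts.most_common(1)[0] if counts else ("?", 0)
        out.append(residue if count > 0 and count >= needed else "?")
    return "".join(out)


seqs = ["ATGA", "ATGC", "ATGC", "A-TC"]
print(consensus(seqs, 50))
print(consensus(seqs, 100))
```

Raising the threshold makes the consensus stricter: at 100 only columns where every sequence agrees keep their residue.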

=item --delete, -d 'seq_id1,seq_id2,etc'

Delete sequences based on their ids. Option takes a comma-separated list of ids.

=item --dna2pep, -D

Translate an in-frame protein-coding sequence alignment into the corresponding protein alignment.

=item --gap-char '.'

Change the gap character '.' (e.g., as produced by BCFtools, which causes problems for --uniq-seq) to the default gap character '-'.

=item --gap-states

Prints one alignment gap per line, including its start, end, whether in-frame, whether on-edge, how many copies, and alignment length. (The original context for this option is no longer remembered; it can be ignored.)

=item --gap-states2

Prints one alignment gap per column, including its start-end as column heading and presence/absence (1/0) in each sequence.

=item --input, -i 'format'

The input format is now guessed automatically. Only BLAST output still needs to be specified explicitly.

[Deprecated except for blast output] Specify input file format. Common ones include 'clustalw' (default), 'fasta' and 'phylip'. See L<Bio::AlignIO> for supported formats.

It also reads NCBI BLAST output, e.g., C<bioaln -i'blast' blast.out>.

=item --length, -l

Print alignment length.

=item --listids, -L

List all sequence ids.

=item --match, -m

Go through all columns and change residues identical to the reference sequence to be the match character, '.'.

For input:

 Seq1           ATGAATAAAAAGATATATAGCATAGAAGAATTAGTAGATAAA--ATAAGT
 Seq2           ATGAATAAAAAGATATACAGCATAGAAGAATTAATAGATAAACGATAAGC
 Seq3           ATGAATAATAAAATATACAGCATAGAAGAATTAATAGATAAA--ATAAGC
 Seq4           ATGAATAAAAAAACATATAGCATAGAAGAATTAATAGATAAA--ATAAGT
 Seq5           ATGAATAAAAAAATATATAGCATAGAAGAATTAATAGACAAAC-ATAAGC
 Seq6           ATGAATAAAAAAATATATAGCATAGAAGAATTAATAGACAAA--ATAAGT
                ******** ** * *** *************** **** ***  *****

C<bioaln -m input.aln> gives:

 Seq1           ATGAATAAAAAGATATATAGCATAGAAGAATTAGTAGATAAA--ATAAGT
 Seq2           .................C...............A........CG.....C
 Seq3           ........T..A.....C...............A...............C
 Seq4           ...........A.C...................A................
 Seq5           ...........A.....................A....C...C......C
 Seq6           ...........A.....................A....C...........
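
The masking rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the bioaln source: the function name C<mask_matches> is hypothetical, and the first sequence is taken as the reference. As in the output above, a gap that is identical to the reference's gap is also replaced by the match character.

```python
def mask_matches(alignment, ref_id=None, match_char="."):
    """Replace residues identical to the reference with `match_char`.

    `alignment` maps sequence id -> aligned sequence; the reference
    defaults to the first sequence (an assumption of this sketch).
    """
    ids = list(alignment)
    ref_id = ids[0] if ref_id is None else ref_id
    ref = alignment[ref_id]
    masked = {}
    for seq_id, seq in alignment.items():
        if seq_id == ref_id:
            masked[seq_id] = seq  # the reference row is printed unchanged
        else:
            masked[seq_id] = "".join(
                match_char if a == b else a for a, b in zip(seq, ref)
            )
    return masked


# First 12 columns of Seq1 and Seq3 from the input above.
aln = {"Seq1": "ATGAATAAAAAG", "Seq3": "ATGAATAATAAA"}
masked = mask_matches(aln)
print(masked["Seq3"])
```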

=item --no-flat, -F

By default, sequence names do not contain 'begin-end'. This option turns ON 'begin-end' naming.

=item --no-gaps, -g

Remove gaps (and return a de-gapped alignment).

=item --num-seq, -n

Print number of sequences in alignment.

=item --output, -o 'format'

Output file format. Common ones include 'clustalw' (default), 'fasta' and 'phylip'. See L<Bio::AlignIO> for supported formats. An additional format 'paml' is supported.

=item --pair-diff

Print pairwise sequence differences, including columns: seqA, seqB, num_variable_sites (no gap), num_pair_diff (no gap), total_pair_length (no gap), percent identity, fraction diff, and pair_diff/num_variable. For DNA seqs, it counts any non-ATCG's (...