Bio-MUST-Drivers


bin/annotate-ali.pl  view on Meta::CPAN


use Tie::IxHash;

use Bio::MUST::Core;
use Bio::MUST::Core::Utils qw(secure_outfile);
use Bio::MUST::Drivers;


# TODO: add support for prebuilt reference database (e.g. nr)

# convert fractional identity threshold to percentage (see Euclid)
$ARGV_identity *= 100.0 if 0 < $ARGV_identity && $ARGV_identity <= 1;

### Building database: $ARGV_ref_file
my $blastdb = Bio::MUST::Drivers::Blast::Database::Temporary->new(
    seqs => $ARGV_ref_file
);

for my $infile (@ARGV_infiles) {

    ### Processing: $infile

bin/annotate-ali.pl  view on Meta::CPAN

    } );

    #### Parsing BLAST report...
    tie my %ann_for,    'Tie::IxHash';
    tie my %hit_id_for, 'Tie::IxHash';
    my $curr_id = q{};

    HIT:
    while ( my $hit = $parser->next_hit ) {
        my ($qid, $hid, $identity)
            = map { $hit->$_ } qw(query_id hit_id percent_identity);

        next HIT if $identity < $ARGV_identity;     # skip weak-identity hits

        unless ($ARGV_hit_list) {                   # optionally
            next HIT if $qid eq $curr_id;           # skip non-first hits
            $curr_id = $qid;
        }

        # capture annotation bit in ref seq_id using regex
        my ($annotation) = $blastdb->long_id_for($hid) =~ $ARGV_ref_regex;

bin/annotate-ali.pl  view on Meta::CPAN


E-value threshold for annotating a sequence [default: 1e-10].

=for Euclid: float.type: number
    float.default: 1e-10

=item --identity [=] <number>

Identity threshold for annotating a sequence [default: 0]. When specified as a
fraction (greater than 0 and up to 1, inclusive), it is first multiplied by 100
so that it is interpreted as a percentage. A value of exactly 1 is therefore
treated as 100%.

=for Euclid: number.type: number
    number.default: 0

=item --max-hits [=] <number>

Number of hits to return for each query (BLAST -max_target_seqs option)
[default: 10]. Mostly useful in conjunction with the C<--hit-list> option.

=for Euclid: number.type: number

bin/annotate-ali.pl  view on Meta::CPAN

=item --hit-list

Print a list of id/hit pairs (at the specified E-value and identity thresholds)
instead of the standard annotation report [default: no].

=item --fields [=] <str>...

List of whitespace-separated BLAST fields to be displayed in the final report
[default: none].

Valid fields are: percent_identity, hsp_length, mismatches, gaps, query_from,
query_to, hit_from, hit_to, evalue, bit_score, query_start, query_end,
hit_start, hit_end.

=for Euclid: str.type: string, str eq "percent_identity" || str eq "hsp_length" || str eq "mismatches" || str eq "gaps" || str eq "query_from" || str eq "query_to" || str eq "hit_from" || str eq "hit_to" || str eq "evalue" || str eq "bit_score" || st...
    str.default: []

=item --out[-suffix] [=] <suffix>

Suffix to append to infile basenames for deriving outfile names [default:
-ann]. When not specified, outfile names are taken from infiles, but the
original infiles are preserved by appending a .bak suffix to their names.

=for Euclid: suffix.type: string
    suffix.default: '-ann'
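
The conversion applied to C<--identity> can be illustrated in isolation. The
following is a minimal sketch, not code from the script: C<normalize_identity>
is a hypothetical helper name that wraps the same condition as the one-line
conversion near the top of the script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of annotate-ali.pl): returns the threshold
# as a percentage, using the same test as the script's conversion line.
sub normalize_identity {
    my ($identity) = @_;

    # fractions in ]0, 1] are scaled to percentages; 0 and values > 1 pass through
    return $identity * 100.0 if 0 < $identity && $identity <= 1;
    return $identity;
}

# show how a few sample thresholds are interpreted
printf "%-5s -> %s\n", $_, normalize_identity($_) for qw(0 0.5 1 30 95);
```

With this rule a threshold of 1% cannot be given as C<1>, since that value is
read as a fraction; under the same rule it would have to be written as
C<0.01>.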


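The interaction between C<--identity> and C<--hit-list> in the parsing loop can
be sketched with plain data. This is an illustration under stated assumptions:
the hash references stand in for parsed BLAST hits, and C<select_hits> and the
sample ids are hypothetical names, not part of Bio::MUST::Drivers. Only the
order of the two filters is taken from the script.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for parsed BLAST hits, grouped by query as in a report.
my @hits = (
    { query_id => 'seq1', hit_id => 'refA', percent_identity => 98.0 },
    { query_id => 'seq1', hit_id => 'refB', percent_identity => 91.5 },
    { query_id => 'seq2', hit_id => 'refC', percent_identity => 42.0 },
    { query_id => 'seq2', hit_id => 'refA', percent_identity => 88.0 },
);

# Hypothetical helper: returns "query/hit" pairs that survive both filters.
sub select_hits {
    my ($hits, $min_identity, $hit_list) = @_;

    my @kept;
    my $curr_id = q{};

    HIT:
    for my $hit ( @{$hits} ) {

        # the identity filter runs first, as in the script
        next HIT if $hit->{percent_identity} < $min_identity;

        # without hit-list mode, keep only the first surviving hit per query
        unless ($hit_list) {
            next HIT if $hit->{query_id} eq $curr_id;
            $curr_id = $hit->{query_id};
        }

        push @kept, "$hit->{query_id}/$hit->{hit_id}";
    }

    return @kept;
}

print join( q{ }, select_hits(\@hits, 80, 0) ), "\n";   # default mode
print join( q{ }, select_hits(\@hits, 80, 1) ), "\n";   # hit-list mode
```

Because weak hits are discarded before the first-hit check, a query whose top
hit falls below the threshold can still be annotated from a later hit (seq2 in
this sample data).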
