Bio-MUST-Apps-FortyTwo
bin/compress-db.pl
=item --cap3-o=<n>
Overlap length cutoff for CAP3 (should be > 15) [default: 40].
=for Euclid: n.type: n > 15
n.default: 40
=item --cap3-p=<n>
Overlap percent identity cutoff for CAP3 (should be > 65) [default: 90].
=for Euclid: n.type: n > 65
n.default: 90
=item --verbosity=<level>
Verbosity level for logging to STDERR [default: 0]. Available levels range from
0 to 6. Level 6 corresponds to debugging mode.
=for Euclid: level.type: int, level >= 0 && level <= 6
bin/debrief-42.pl
# Write file contents
say {$out} join "\t", @{ $line_for{$_} } for @sort_all_banks;
say {$out_sum} join "\t", @{ $line_for{$_} }[0..8,-7..-1] for @sort_all_banks;
### Done!
##################################### SUBS #####################################
sub compute_percentage {
my $array = shift;
my $total = shift;
my @results;
my $percentage;
CALC:
for my $value (@$array) {
if ($value == 0) {
$percentage = 0;
push @results, '-/-';
next CALC;
}
else {
$percentage = $value / $total;
# $percentage = $value * 100 / $total;
}
$percentage = sprintf("%.2f", $percentage);
push @results, $value . '/' . $percentage;
}
return \@results;
}
# for testing:
# perl -Ilib bin/debrief-42.pl --indir=xtest/tax_reports/ \
# --in-strip=-42-camera-megan99-tf --taxdir=../Bio-MUST-Core/test/taxdump \
# --seq_labeling=xtest/seq-labels.idl --contam_labeling=xtest/contam-labels.idl \
# --outdir=dbout
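The helper above formats each count as `count/fraction` (note that, despite its name, the active branch divides by the total without multiplying by 100). A minimal sketch of the same idea follows; the name `compute_percentage_safe` and the guard against a zero total are assumptions added for illustration, not part of the distribution.

```perl
use strict;
use warnings;

# Hypothetical variant of the helper above. The zero-total guard is an
# assumption: the original does not say how an empty total is reported.
sub compute_percentage_safe {
    my ($values, $total) = @_;

    my @results;
    for my $value (@$values) {
        # Report a placeholder when there is nothing to divide.
        if ($value == 0 || !$total) {
            push @results, '-/-';
            next;
        }
        # Same output shape as the original: raw count, slash, fraction.
        push @results, $value . '/' . sprintf('%.2f', $value / $total);
    }

    return \@results;
}

my $res = compute_percentage_safe([ 0, 10, 20 ], 40);
print join("\t", @$res), "\n";    # -/-  10/0.25  20/0.50 (tab-separated)
```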
bin/prune-outliers.pl
Minimal identity value used for selecting sequences that match at least this
proportion in the all-versus-all BLAST searches [default: n.default]. One output
directory is created per step of 0.1 between the min and max thresholds.
=for Euclid: n.type: num
n.default: 0.3
=item --max-ident=<n> | --max_ident=<n>
Maximal identity value used for selecting sequences that match at least this
proportion in the all-versus-all BLAST searches [default: n.default]. One output
directory is created per step of 0.1 between the min and max thresholds.
=for Euclid: n.type: num
n.default: 0.8
=item --min-hits=<n> | --min_hits=<n>
Minimum number of hits in the all-versus-all BLAST searches required for a
sequence to be retained in the output file [default: n.default].
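The option text describes a sweep of identity thresholds in steps of 0.1 between `--min-ident` and `--max-ident`. A minimal sketch of how such a sweep could be enumerated is given below; `ident_thresholds` is a hypothetical helper, and treating both bounds as inclusive is an assumption. Counting in integer tenths avoids the drift that comes from repeatedly adding 0.1 to a float.

```perl
use strict;
use warnings;

# Hypothetical helper: list thresholds from $min to $max in steps of 0.1.
# Assumes both bounds are inclusive and given to one decimal place.
sub ident_thresholds {
    my ($min, $max) = @_;

    # Work in integer tenths so that no rounding error accumulates.
    my $first = int($min * 10 + 0.5);
    my $last  = int($max * 10 + 0.5);

    return map { sprintf '%.1f', $_ / 10 } $first .. $last;
}

# With the defaults shown above (0.3 and 0.8):
print join(' ', ident_thresholds(0.3, 0.8)), "\n";    # 0.3 0.4 0.5 0.6 0.7 0.8
```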
bin/yaml-generator-42.pl
[% IF megan_like -%]
tax_max_hits: 100
[% ELSIF best_hit -%]
tax_max_hits: 1
[% ELSIF tax_max_hits -%]
tax_max_hits: [% tax_max_hits %]
[% END -%]
# ===Min identity of relatives to use when inferring taxonomy of new seqs===
# Only meaningful when enabling 'tax_reports' or specifying 'tax_filter'.
# This parameter is the traditional BLAST 'percent identity' statistic except
# that it is specified as a fractional number (between 0 and 1). It is
# evaluated on the first HSP of potential relatives.
# When not specified, 'tax_min_ident' internally defaults to 0.
[% IF megan_like -%]
tax_min_ident: 0
[% ELSIF best_hit -%]
tax_min_ident: 0
[% ELSIF tax_min_ident -%]
tax_min_ident: [% tax_min_ident %]
[% END -%]
bin/yaml-generator-42.pl
my $hit_filtering = prompt "\nSet hit-filtering mode: ",
-menu => { 'default values' => 0, 'length/identity' => 'length_identity', 'Bit score' => 'bitscore' };
if ($hit_filtering eq 'bitscore') {
$ARGV{'--tax_min_score'} = prompt "\nSet minimum bit score to consider a hit: ",
-must => { 'be an integer' => qr{^[0-9]+\z} },
-def => $ARGV{'--tax_min_score'};
}
if ($hit_filtering eq 'length_identity') {
$ARGV{'--tax_min_ident'} = prompt "\nSet minimum identity (as a fraction between 0 and 1) to consider a hit: ",
-must => { 'be a number between 0 and 1' => qr{\A (?: 0 (?: \.\d+ )? | 1 (?: \.0+ )? ) \z}xms },
-def => $ARGV{'--tax_min_ident'};
$ARGV{'--tax_min_len'} = prompt "\nSet minimum length to consider a hit: ",
-must => { 'be a positive integer' => qr{^[1-9][0-9]*\z} },
-def => $ARGV{'--tax_min_len'};
}
}
# MEGAN_LIKE
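Validating a fraction between 0 and 1 with a regex requires anchoring at both ends and excluding values such as `1.5`. The sketch below compares an unanchored pattern with an anchored one; the variable names and sample answers are illustrative assumptions, not taken from the script.

```perl
use strict;
use warnings;

# Unanchored at the end: only the first digit is really checked.
my $LOOSE_RE = qr{^[0-1](?:\.\d+)?}xms;

# Anchored at both ends: 0, 0.xxx, 1 or 1.000 only.
my $FRACTION_RE = qr{\A (?: 0 (?: \.\d+ )? | 1 (?: \.0+ )? ) \z}xms;

for my $answer (qw(0 0.5 1 1.0 1.5 0.5x 50)) {
    printf "%-5s loose=%-3s strict=%s\n",
        $answer,
        ($answer =~ $LOOSE_RE    ? 'yes' : 'no'),
        ($answer =~ $FRACTION_RE ? 'yes' : 'no');
}
```

Here `1.5` and `0.5x` pass the loose pattern but fail the strict one, while `50` fails both.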
lib/Bio/MUST/Apps/FortyTwo/Manual.pod
the C<megan-like> algorithm, so as to avoid false positives during LCA
computation, with a C<--tax_score_mul> of C<0.99> instead of C<0.95> and a
C<--tax_min_ident> of C<50> instead of C<0>.
The follow-up consists of running C<debrief-42.pl>, which parses the taxonomic
reports produced by C<42> in order to compare the taxonomic label (LCA) of each
ortholog computed by C<42> with the source organism lineage (according to I<NCBI
Taxonomy>) and classifies the sequences as contaminants if they differ at a
predefined taxonomic rank, based on a first user-defined list of taxa
(C<--seq_labeling>). After each ortholog has been classified, an estimated
contamination percentage is computed.
Additionally, contaminations are further classified to determine the main
sources of contaminants, based on a second user-defined list of taxa
(C<--contam_labeling>), which allows the user to fine-tune the output report.
In this regard, we distinguish two types of sequences, B<classified
contaminations> and B<unclassified contaminations>. The latter are those that
bear an uninformative taxonomic label, i.e., too broad to point to a specific
lineage with accuracy (e.g., C<Sar>). Finally, the sequences that can only be
affiliated at the highest taxonomic levels, such as C<cellular organisms>,
C<Eukaryota>, C<Bacteria> or C<Archaea>, are classified as B<unknown sequences>.
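The categories described above can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration: the real C<debrief-42.pl> works on NCBI lineages and user-supplied label lists, whereas this toy version compares plain label strings, hard-codes the "too broad" set, and takes the contamination percentage as contaminants over all classified orthologs.

```perl
use strict;
use warnings;

# Hypothetical label sets (the real lists come from user-defined files).
my %IS_TOP_LEVEL = map { $_ => 1 }
    ('cellular organisms', 'Eukaryota', 'Bacteria', 'Archaea');
my %IS_TOO_BROAD = map { $_ => 1 } ('Sar');

# Toy classifier: compares label strings instead of full lineages.
sub classify_label {
    my ($lca_label, $expected_label) = @_;
    return 'unknown'      if $IS_TOP_LEVEL{$lca_label};
    return 'clean'        if $lca_label eq $expected_label;
    return 'unclassified' if $IS_TOO_BROAD{$lca_label};
    return 'classified';
}

# Assumed definition: contaminants (classified + unclassified) over all seqs.
sub contamination_percentage {
    my ($expected_label, @lca_labels) = @_;
    return 0 unless @lca_labels;

    my %count;
    $count{ classify_label($_, $expected_label) }++ for @lca_labels;

    my $contam = ($count{classified} // 0) + ($count{unclassified} // 0);
    return 100 * $contam / scalar @lca_labels;
}

my @labels = ('Metazoa', 'Metazoa', 'Fungi', 'Sar', 'Eukaryota');
printf "%.1f%%\n", contamination_percentage('Metazoa', @labels);    # 40.0%
```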
lib/Bio/MUST/Apps/FortyTwo/OrgProcessor.pm
$ali->add_seq( shift @seqs2cap );
return;
}
# TODO: add debugging comments?
# try to cap seqs
my $cap = Cap3->new(
seqs => \@seqs2cap,
cap3_args => {
-p => $rp->merge_min_ident * 100.0, # CAP3 expects percents
-o => $rp->merge_min_len,
},
);
# add singlet seqs
my @singlets = $cap->all_singlets;
$ali->add_seq($_) for @singlets;
# proceed only if contigs of seqs
my @contigs = $cap->all_contigs;
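The excerpt converts a fractional identity to the percent scale CAP3 expects. The sketch below isolates that conversion and adds the lower bounds stated in the `compress-db.pl` option text (`-p` above 65, `-o` above 15). The helper `cap3_args_for` is hypothetical and not part of the module; whether the module itself enforces these bounds is not shown in the excerpt.

```perl
use strict;
use warnings;

# Hypothetical helper: build CAP3 arguments from a fractional identity
# and a minimum overlap length, rejecting values CAP3 should not be given.
sub cap3_args_for {
    my ($min_ident, $min_len) = @_;

    my $percent = $min_ident * 100.0;    # CAP3 expects percents
    die "-p must be > 65 (got $percent)\n" unless $percent > 65;
    die "-o must be > 15 (got $min_len)\n" unless $min_len > 15;

    return { '-p' => $percent, '-o' => $min_len };
}

my $args = cap3_args_for(0.9, 40);
printf "-p %.0f -o %d\n", $args->{'-p'}, $args->{'-o'};

# A fraction of 0.5 would map to 50 percent and be rejected:
my $ok = eval { cap3_args_for(0.5, 40); 1 };
print "rejected: $@" unless $ok;
```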
test/config-42-prot-tax.yaml
tax_min_hits: 1
# ===Max number of relatives to use when inferring taxonomy of new seqs===
# Only meaningful when enabling 'tax_reports' or specifying 'tax_filter'.
# As for 'tax_min_hits' above, this parameter is an upper bound.
# When not specified, 'tax_max_hits' internally defaults to unlimited.
tax_max_hits: 1
# ===Min identity of relatives to use when inferring taxonomy of new seqs===
# Only meaningful when enabling 'tax_reports' or specifying 'tax_filter'.
# This parameter is the traditional BLAST 'percent identity' statistic except
# that it is specified as a fractional number (between 0 and 1). It is
# evaluated on the first HSP of potential relatives.
# When not specified, 'tax_min_ident' internally defaults to 0.
tax_min_ident: 0
# ===Min length of relatives to use when inferring taxonomy of new seqs===
# Only meaningful when enabling 'tax_reports' or specifying 'tax_filter'.
# This parameter is the traditional BLAST 'alignment length' statistic. It is
# evaluated on the first HSP of potential relatives.
# When not specified, 'tax_min_len' internally defaults to 0.
tax_min_len: 0
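To make the interplay of these three parameters concrete, here is a minimal sketch of a filter that keeps hits meeting both minimums and then caps the list at `tax_max_hits`. The hit records, the `filter_hits` helper, and the inclusive comparisons are all assumptions for illustration; the excerpt does not show how `42` applies these settings internally.

```perl
use strict;
use warnings;

# Hypothetical filter: hits are hashrefs with a fractional 'ident' and a
# 'len', already sorted best-first. Thresholds are treated as inclusive.
sub filter_hits {
    my ($config, @hits) = @_;

    my @kept = grep {
           $_->{ident} >= $config->{tax_min_ident}
        && $_->{len}   >= $config->{tax_min_len}
    } @hits;

    # An undefined tax_max_hits means "unlimited".
    my $max = $config->{tax_max_hits};
    splice @kept, $max if defined $max && @kept > $max;

    return @kept;
}

my %config = (tax_min_ident => 0.5, tax_min_len => 100, tax_max_hits => 2);
my @hits = (
    { id => 'a', ident => 0.92, len => 250 },
    { id => 'b', ident => 0.48, len => 300 },    # identity too low
    { id => 'c', ident => 0.75, len => 80  },    # alignment too short
    { id => 'd', ident => 0.60, len => 120 },
    { id => 'e', ident => 0.55, len => 150 },    # dropped by the cap
);

print join(' ', map { $_->{id} } filter_hits(\%config, @hits)), "\n";    # a d
```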