Bio-Community
bin/bc_summarize view on Meta::CPAN
L<Bio::Community::Tools::Summarizer> for more information.
=head1 REQUIRED ARGUMENTS
=over
=item -if <input_files>... | -input_files <input_files>...
Input files containing the communities to summarize. If the communities are in
a format that supports only one community per file (e.g. gaas), you can
provide multiple input files.
=for Euclid:
input_files.type: readable
=back
=head1 OPTIONAL ARGUMENTS
=over
=item -wf <weight_files>... | -weight_files <weight_files>...
Tab-delimited files containing weights to assign to the community members.
=for Euclid:
weight_files.type: readable
=item -wa <weight_assign> | -weight_assign <weight_assign>
When using files of weights, define what to do for community members whose
weight is not specified in the weight file (default: weight_assign.default):
* $num : assign to the member the arbitrary weight $num provided.
* file_average : assign to the member the average weight in this file.
* community_average : assign to the member the average weight in this community.
* ancestor : go up the taxonomic lineage of the member and assign to it the weight
of the first ancestor that has a weight in the weights file. Fall back to the
'community_average' method if no taxonomic information is available for this
member (for example a member with no BLAST hit).
See the weight_assign() method in Bio::Community::IO for more details.
=for Euclid:
weight_assign.type: string
weight_assign.default: 'ancestor'
=item -op <output_prefix> | -output_prefix <output_prefix>
Path and prefix for the output files. Default: output_prefix.default
=for Euclid:
output_prefix.type: string
output_prefix.default: 'bc_summarize'
=item -cr <convert_relab> | -convert_relab <convert_relab>
Convert counts into relative abundances (in percentage, taking into account
weights): 1 (yes), 0 (no). Default: convert_relab.default
=for Euclid:
convert_relab.type: integer, convert_relab == 0 || convert_relab == 1
convert_relab.type.error: <convert_relab> must be 0 or 1 (not convert_relab)
convert_relab.default: 1
=item -md <merge_dups> | -merge_dups <merge_dups>
Merge community members with the exact same lineage: 1 (yes), 0 (no).
Default: merge_dups.default
=for Euclid:
merge_dups.type: integer, merge_dups == 0 || merge_dups == 1
merge_dups.type.error: <merge_dups> must be 0 or 1 (not merge_dups)
merge_dups.default: 1
=item -td <taxo_dups> | -taxo_dups <taxo_dups>
By default, two members are considered duplicates only if their lineages match
character for character. When this option is activated (1: on, 0: off), the
lineages are parsed and interpreted taxonomically instead. For example, these
three members would be duplicates:
  k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria
  k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria
  k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__;f__;g__;s__
Default: taxo_dups.default
=for Euclid:
taxo_dups.type: integer, taxo_dups == 0 || taxo_dups == 1
taxo_dups.type.error: <taxo_dups> must be 0 or 1 (not taxo_dups)
taxo_dups.default: 0
=item -rl <relab_lt> | -relab_lt <relab_lt>
Group community members with a relative abundance less than the specified
threshold (in %) in ALL the communities into an 'Other' group. Default: relab_lt.default %
=for Euclid:
relab_lt.type: number, relab_lt >= 0 && relab_lt <= 100
relab_lt.type.error: <relab_lt> must be between 0 and 100 (not relab_lt)
relab_lt.default: 1
=item -tl <tax_level_min> [.. <tax_level_max>] | -tax_level <tax_level_min> [.. <tax_level_max>]
Group members that belong to the same taxon at the specified taxonomic level.
For the Greengenes taxonomy, level 1 represents kingdom, level 2 represents
phylum, and so on, down to level 7, which represents species. When a range of
levels is given, the communities are summarized at each level in the range.
Members without taxonomic information are grouped together in a member with
the description 'Unknown taxonomy'. Default: none
=for Euclid:
tax_level_min.type: number, tax_level_min > 0
tax_level_max.type: number, tax_level_max > 0
tax_level_min.type.error: <tax_level> must be larger than 0
tax_level_max.type.error: <tax_level> must be larger than 0
=back
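The -relab_lt option above only moves a member into 'Other' when it is rare in
ALL the communities. The following is a minimal sketch of that rule in plain
Perl, not the code used by Bio::Community::Tools::Summarizer. The helper name
group_rare_members() and the hash-of-hashes input are assumptions made for
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Community): pool the members whose
# relative abundance is below $threshold (in %) in ALL communities.
# $relab is a hash ref: { community name => { member name => percentage } }
sub group_rare_members {
   my ($relab, $threshold) = @_;

   # Highest relative abundance reached by each member in any community
   my %max_relab;
   for my $members (values %$relab) {
      for my $member (keys %$members) {
         my $pct = $members->{$member};
         $max_relab{$member} = $pct
            if not exists $max_relab{$member} or $pct > $max_relab{$member};
      }
   }

   # Rebuild each community, adding the rare members to 'Other'
   my %grouped;
   for my $community (keys %$relab) {
      for my $member (keys %{ $relab->{$community} }) {
         my $key = $max_relab{$member} < $threshold ? 'Other' : $member;
         $grouped{$community}{$key} += $relab->{$community}{$member};
      }
   }
   return \%grouped;
}

my $grouped = group_rare_members(
   {
      soil  => { Bacteria => 99.0, Archaea => 0.6, Viruses => 0.4 },
      water => { Bacteria => 97.0, Archaea => 2.5, Viruses => 0.5 },
   },
   1,   # threshold in %, same as the relab_lt default
);
```

In this made-up data set, Archaea is below 1 % in 'soil' but reaches 2.5 % in
'water', so it is kept in both communities. Viruses stays below 1 % everywhere
and ends up in 'Other'.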
      if (not defined $tax_level_max) {
         # A single level was requested
         $tax_level_max = $tax_level_min;
      } else {
         # Make sure that min <= max
         if ($tax_level_min > $tax_level_max) {
            ($tax_level_min, $tax_level_max) = ($tax_level_max, $tax_level_min);
         }
      }
   } else {
      # Keep original taxonomic level
      $tax_level_min = 0;
      $tax_level_max = 0;
   }

   # Summarize communities at all requested taxonomic levels, from the highest
   # level number down to the lowest
   my $summarized_meta = $meta;
   for (my $tax_level = $tax_level_max; $tax_level >= $tax_level_min; $tax_level--) {
      my $suffix = '';
      my $summarizer = Bio::Community::Tools::Summarizer->new(
         -metacommunity    => $summarized_meta,
         -merge_dups       => $merge_dups,
         -identify_dups_by => $taxo_dups ? 'taxon' : 'desc',
      );
      if ($tax_level) {
         $summarizer->by_tax_level($tax_level);
         $suffix = 'L'.$tax_level;
      }
      if ($by_rel_ab) {
         $summarizer->by_rel_ab( ['<', $by_rel_ab] );
      }
      # Note: this 'my' declares a new variable scoped to the loop body. The
      # outer $summarized_meta is left untouched, so every level is summarized
      # from the original metacommunity.
      my $summarized_meta = $summarizer->get_summary;

      # Write results, converting to relative abundance if desired
      write_communities($summarized_meta, $output_prefix, $format, $suffix, $convert2relab);
   }
   return 1;
}
func write_communities ($meta, $output_prefix, $output_format, $type = '',
   $convert2relab = 1) {
   my $multiple_communities = Bio::Community::IO->new(-format=>$output_format)->multiple_communities;
   my $num = 0;
   my $out;
   my $output_file = '';
   while (my $community = $meta->next_community) {
      if (not defined $out) {
         if ($multiple_communities) {
            $output_file = $output_prefix;
         } else {
            $num++;
            $output_file = $output_prefix.'_'.$num;
         }
         if ($type) {
            $output_file .= '_'.$type;
         }
         $output_file .= '.'.$output_format;
         $out = Bio::Community::IO->new(
            -format         => $output_format,
            -file           => '>'.$output_file,
            -abundance_type => $convert2relab ? 'percentage' : 'count',
         );
      }
      print "Writing community '".$community->name."' to file '$output_file'\n";
      $out->write_community($community);
      if (not $multiple_communities) {
         $out->close;
         $out = undef;
      }
   }
   if (defined $out) {
      $out->close;
   }
   return 1;
}
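To see which files a run produces, the naming logic of write_communities() and
the descending loop over taxonomic levels can be reproduced without any
Bio::Community objects. This is only a sketch: output_files() is a
hypothetical helper, and the number of communities and the "multiple
communities per file" flag are passed in by hand instead of being queried from
Bio::Community::IO.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the file naming shown above: list the files
# written for one taxonomic level. $multiple says whether the output format
# can hold several communities in a single file.
sub output_files {
   my ($prefix, $format, $tax_level, $num_communities, $multiple) = @_;
   my $suffix = $tax_level ? 'L'.$tax_level : '';
   # One shared file if the format allows it, else one file per community
   my $last = ($multiple && $num_communities) ? 1 : $num_communities;
   my @files;
   for my $num (1 .. $last) {
      my $file = $prefix;
      $file .= '_'.$num    if not $multiple;
      $file .= '_'.$suffix if $suffix;
      $file .= '.'.$format;
      push @files, $file;
   }
   return @files;
}

# Levels 2 to 3, two communities, one community per file (as with gaas).
# Levels are processed from the highest number down to the lowest.
my ($min, $max) = (2, 3);
my @all = map { output_files('bc_summarize', 'gaas', $_, 2, 0) } reverse $min .. $max;
print "$_\n" for @all;
```

With these inputs the sketch lists bc_summarize_1_L3.gaas and
bc_summarize_2_L3.gaas first, followed by the two L2 files. Without a
taxonomic level (level 0) and with a format that holds several communities per
file, a single file named after the prefix and the format is listed.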