Bio-Community

bin/bc_summarize  view on Meta::CPAN

L<Bio::Community::Tools::Summarizer> for more information.

=head1 REQUIRED ARGUMENTS

=over

=item -if <input_files>... | -input_files <input_files>...

Input files containing the communities to summarize. When the format supports
only one community per file (e.g. gaas), you can provide multiple input files.

=for Euclid:
   input_files.type: readable

=back

=head1 OPTIONAL ARGUMENTS

=over

=item -wf <weight_files>... | -weight_files <weight_files>...

Tab-delimited files containing weights to assign to the community members.

=for Euclid:
   weight_files.type: readable

=item -wa <weight_assign> | -weight_assign <weight_assign>

When using weight files, define how to handle community members whose weight
is not specified in the weight file (default: weight_assign.default):

* $num : assign to the member the arbitrary weight $num provided

* file_average : assign to the member the average weight in this file.

* community_average : assign to the member the average weight in this community.

* ancestor : go up the taxonomic lineage of the member and assign to it the weight
of the first ancestor that has a weight in the weight file. Fall back to the
'community_average' method if no taxonomic information is available for this
member (for example a member with no BLAST hit).

See the weight_assign() method in Bio::Community::IO for more details.

=for Euclid:
   weight_assign.type: string
   weight_assign.default: 'ancestor'

=item -op <output_prefix> | -output_prefix <output_prefix>

Path and prefix for the output files. Default: output_prefix.default

=for Euclid:
   output_prefix.type: string
   output_prefix.default: 'bc_summarize'

=item -cr <convert_relab> | -convert_relab <convert_relab>

Convert counts into relative abundances (in percentage, taking into account
weights): 1 (yes), 0 (no). Default: convert_relab.default

=for Euclid:
   convert_relab.type: integer, convert_relab == 0 || convert_relab == 1
   convert_relab.type.error: <convert_relab> must be 0 or 1 (not convert_relab)
   convert_relab.default: 1

=item -md <merge_dups> | -merge_dups <merge_dups>

Merge community members with the exact same lineage: 1 (yes), 0 (no).
Default: merge_dups.default

=for Euclid:
   merge_dups.type: integer, merge_dups == 0 || merge_dups == 1
   merge_dups.type.error: <merge_dups> must be 0 or 1 (not merge_dups)
   merge_dups.default: 1

=item -td <taxo_dups> | -taxo_dups <taxo_dups>

By default, the lineages of two members must match character for character for
the members to be considered duplicates. When you activate this option (1: on,
0: off), lineages are parsed and interpreted taxonomically instead. For example,
these three members would be duplicates:

  k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria
  k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria
  k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__;f__;g__;s__

Default: taxo_dups.default

=for Euclid:
   taxo_dups.type: integer, taxo_dups == 0 || taxo_dups == 1
   taxo_dups.type.error: <taxo_dups> must be 0 or 1 (not taxo_dups)
   taxo_dups.default: 0

=item -rl <relab_lt> | -relab_lt <relab_lt>

Group community members with a relative abundance less than the specified
threshold (in %) in ALL the communities into an 'Other' group. Default: relab_lt.default %

=for Euclid:
   relab_lt.type: number, relab_lt >= 0 && relab_lt <= 100
   relab_lt.type.error: <relab_lt> must be between 0 and 100 (not relab_lt)
   relab_lt.default: 1

=item -tl <tax_level_min> [.. <tax_level_max>] | -tax_level <tax_level_min> [.. <tax_level_max>]

Group members by taxon at the given taxonomic level, or at every level of the
given range, producing one summary per level. For the Greengenes taxonomy,
level 1 represents kingdom, level 2 represents phylum, and so on, until level 7,
representing the species level. Members without taxonomic information are
grouped together in a member with the description 'Unknown taxonomy'. Default:
none

=for Euclid:
   tax_level_min.type: number, tax_level_min > 0
   tax_level_max.type: number, tax_level_max > 0
   tax_level_min.type.error: <tax_level> must be larger than 0
   tax_level_max.type.error: <tax_level> must be larger than 0

=back
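
As a rough illustration of the 'ancestor' strategy described under
-weight_assign, the following Perl sketch walks up a lineage until it finds a
weight. The helper name ancestor_weight, the lineage-keyed hash and the weight
values are assumptions made for this example; the actual logic lives in
Bio::Community::IO and may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Bio::Community::IO implementation): pick a
# weight for a member by walking up its lineage, one rank at a time.
sub ancestor_weight {
   my ($lineage, $weights, $community_average) = @_;
   my @ranks = split /;\s*/, $lineage;
   # Try the full lineage first, then progressively shorter ancestor lineages
   while (@ranks) {
      my $key = join ';', @ranks;
      return $weights->{$key} if exists $weights->{$key};
      pop @ranks;
   }
   # No match at any level (e.g. no taxonomy at all): use the community average
   return $community_average;
}

# Illustrative weights keyed by lineage (made-up values)
my %weights = (
   'k__Bacteria;p__Proteobacteria' => 2.5,
   'k__Archaea'                    => 1.0,
);

my $with_taxonomy = ancestor_weight(
   'k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria', \%weights, 3.0);
my $no_taxonomy   = ancestor_weight('', \%weights, 3.0);
print "with taxonomy: $with_taxonomy, without taxonomy: $no_taxonomy\n";
```

In this sketch the class-level member inherits the weight of its phylum, while
the member without any lineage receives the community average.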


      if (not defined $tax_level_max) {
         $tax_level_max = $tax_level_min;
      } else {
         if ($tax_level_min > $tax_level_max) {
            ($tax_level_min, $tax_level_max) = ($tax_level_max, $tax_level_min);
         }
      }
   } else {
      # Keep original taxonomic level
      $tax_level_min = 0;
      $tax_level_max = 0;
   }

   # Summarize communities at all requested taxonomic levels
   my $summarized_meta = $meta;
   for (my $tax_level = $tax_level_max; $tax_level >= $tax_level_min; $tax_level--) {
      my $suffix = '';
      my $summarizer = Bio::Community::Tools::Summarizer->new(
         -metacommunity    => $summarized_meta,
         -merge_dups       => $merge_dups,
         -identify_dups_by => $taxo_dups ? 'taxon' : 'desc',
      );

      if ($tax_level) {
         $summarizer->by_tax_level($tax_level);
         $suffix = 'L'.$tax_level;
      }
      if ($by_rel_ab) {
         $summarizer->by_rel_ab( ['<', $by_rel_ab] );
      }
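      # NOTE: this 'my' declares a new variable scoped to the loop body. It does
      # not update the outer $summarized_meta, so every level is summarized from
      # the original metacommunity rather than from the previous summary.
      my $summarized_meta = $summarizer->get_summary;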
      # Write results, converting to relative abundance if desired
      write_communities($summarized_meta, $output_prefix, $format, $suffix, $convert2relab);
   }

   return 1;
}
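
The range handling above can be condensed into a small standalone sketch. The
helper name tax_levels is hypothetical, and treating an undefined minimum as
"no -tax_level given" is an assumption, since the enclosing condition is not
shown in this excerpt.

```perl
use strict;
use warnings;

# Hypothetical helper: return the taxonomic levels to summarize, from the
# most specific (highest number) to the least specific (lowest number).
sub tax_levels {
   my ($min, $max) = @_;
   # Level 0 stands for "keep the original taxonomic level"
   return (0) if not defined $min;
   # A single level was given: the range collapses to that level
   $max = $min if not defined $max;
   # Accept the bounds in either order
   ($min, $max) = ($max, $min) if $min > $max;
   return reverse $min .. $max;
}

for my $args ([2, 4], [4, 2], [3], []) {
   my @levels   = tax_levels(@$args);
   # Same suffix rule as in the loop above: 'L' plus the level, or nothing
   my @suffixes = map { $_ ? 'L'.$_ : '(none)' } @levels;
   print "levels: @levels, suffixes: @suffixes\n";
}
```

Swapping reversed bounds means that "-tl 4 .. 2" and "-tl 2 .. 4" would request
the same set of summaries.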


func write_communities ($meta, $output_prefix, $output_format, $type = '',
   $convert2relab = 1) {
   my $multiple_communities = Bio::Community::IO->new(-format=>$output_format)->multiple_communities;
   my $num = 0;
   my $out;
   my $output_file = '';
   while (my $community = $meta->next_community) {
      if (not defined $out) {
         if ($multiple_communities) {
            $output_file = $output_prefix;
         } else {
            $num++;
            $output_file = $output_prefix.'_'.$num;
         }
         if ($type) {
            $output_file .= '_'.$type;
         }
         $output_file .= '.'.$output_format;
         $out = Bio::Community::IO->new(
            -format         => $output_format,
            -file           => '>'.$output_file,
            -abundance_type => $convert2relab ? 'percentage' : 'count',
         );
      }
      print "Writing community '".$community->name."' to file '$output_file'\n";
      $out->write_community($community);
      if (not $multiple_communities) {
         $out->close;
         $out = undef;
      }
   }
   if (defined $out) {
      $out->close;
   }
   return 1;
}
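
The file naming performed by write_communities can be sketched in isolation.
The helper name output_name is hypothetical, the $multiple flag stands in for
whatever multiple_communities reports for the chosen format, and 'fmt' is a
placeholder format name rather than a real one.

```perl
use strict;
use warnings;

# Hypothetical helper reproducing the naming rule used above:
#   <prefix>[_<num>][_<type>].<format>
sub output_name {
   my ($prefix, $format, $multiple, $num, $type) = @_;
   # Formats holding several communities share one file; others get a number
   my $file = $multiple ? $prefix : $prefix.'_'.$num;
   # Optional suffix, e.g. 'L2' for a summary at taxonomic level 2
   $file .= '_'.$type if $type;
   return $file.'.'.$format;
}

# One file for all communities, summarized at level 2
print output_name('bc_summarize', 'fmt', 1, undef, 'L2'), "\n";
# Third community of a one-community-per-file format, original level
print output_name('bc_summarize', 'gaas', 0, 3, ''), "\n";
```

With a one-community-per-file format, each community therefore ends up in its
own numbered file, and the level suffix keeps summaries from different
taxonomic levels from overwriting each other.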


