Bio-Community

lib/Bio/Community/Tools/Rarefier.pm  view on Meta::CPAN

# BioPerl module for Bio::Community::Tools::Rarefier
#
# Please direct questions and support issues to <bioperl-l@bioperl.org>
#
# Copyright 2011-2014 Florent Angly <florent.angly@gmail.com>
#
# You may distribute this module under the same terms as perl itself


=head1 NAME

Bio::Community::Tools::Rarefier - Normalize communities by count

=head1 SYNOPSIS

  use Bio::Community::Tools::Rarefier;

  # Normalize communities in a metacommunity by repeatedly taking 1,000 random members
  my $rarefier = Bio::Community::Tools::Rarefier->new(
     -metacommunity => $meta,
     -sample_size   => 1000,
     -threshold     => 0.001, # stop bootstrap iterations when threshold is reached
  );

  # Rarefied results, with decimal counts
  my $average_community = $rarefier->get_avg_meta->next_community;

  # Round counts to integer numbers
  my $representative_community = $rarefier->get_repr_meta->next_community;
  

  # Alternatively, specify a number of repetitions
  my $rarefier = Bio::Community::Tools::Rarefier->new(
     -metacommunity   => $meta,
     -sample_size     => 1000,
     -num_repetitions => 100, # stop after this number of bootstrap iterations
  );

  # ... or assume an infinite number of repetitions
  my $rarefier = Bio::Community::Tools::Rarefier->new(
     -metacommunity   => $meta,
     -sample_size     => 1000,
     -num_repetitions => 'inf',
  );

=head1 DESCRIPTION

This module takes a metacommunity and normalizes (rarefies) the communities it
contains by their number of counts.

Comparing the composition and diversity of biological communities can be biased
by sampling artefacts. When comparing two identical communities, one sampled
with 10,000 counts and the other with only 1,000 counts, the smaller community
will appear less diverse. A solution is to repeatedly bootstrap the larger
community by taking 1,000 random members from it.

This module uses L<Bio::Community::Sampler> to take random members from
communities and normalize them by their number of counts. After all random
repetitions have been performed, average communities or representative
communities are returned. These communities all have the same number of counts.
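
The bootstrap-and-average idea described above can be sketched in plain Perl,
without Bio::Community. This is an illustration only, not this module's
implementation: it assumes that members are drawn with replacement, in
proportion to their counts, and the helpers rarefy_once() and
average_rarefied() are hypothetical names that are not part of this module.

  ```perl
  use strict;
  use warnings;
  use List::Util qw(sum);

  # Hypothetical helper: draw $size members with replacement, each species
  # being picked with a probability proportional to its count (an assumption).
  sub rarefy_once {
     my ($counts, $size) = @_;
     my @species = sort keys %$counts;
     my $total   = sum( map { $counts->{$_} } @species );
     my %sample  = map { $_ => 0 } @species;
     for (1 .. $size) {
        my $pick   = rand($total);   # uniform in [0, $total)
        my $chosen = $species[-1];   # fallback guards against rounding
        for my $sp (@species) {
           if ($pick < $counts->{$sp}) { $chosen = $sp; last }
           $pick -= $counts->{$sp};
        }
        $sample{$chosen}++;
     }
     return \%sample;
  }

  # Hypothetical helper: average the counts over $reps independent draws
  sub average_rarefied {
     my ($counts, $size, $reps) = @_;
     my %avg = map { $_ => 0 } keys %$counts;
     for (1 .. $reps) {
        my $sample = rarefy_once($counts, $size);
        $avg{$_} += $sample->{$_} / $reps for keys %$sample;
     }
     return \%avg;
  }

  my %large = ( sp_a => 6000, sp_b => 3000, sp_c => 1000 ); # 10,000 counts
  my $avg   = average_rarefied(\%large, 1000, 50);
  printf "%s\t%.1f\n", $_, $avg->{$_} for sort keys %$avg;
  ```

Every draw contains exactly 1,000 members, so the averaged counts also sum to
1,000 but are generally not integers, which is why a separate rounded
(representative) community is useful.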

=head1 AUTHOR

Florent Angly L<florent.angly@gmail.com>

=head1 SUPPORT AND BUGS

User feedback is an integral part of the evolution of this and other Bioperl
modules. Please direct usage questions or support issues to the mailing list,
L<bioperl-l@bioperl.org>, rather than to the module maintainer directly. Many
experienced and responsive experts will be able to look at the problem and
quickly address it. Please include a thorough description of the problem with
code and data examples if at all possible.

If you have found a bug, please report it on the BioPerl bug tracking system
to help us keep track of the bugs and their resolution:
L<https://redmine.open-bio.org/projects/bioperl/>

=head1 COPYRIGHT

Copyright 2011-2014 by Florent Angly <florent.angly@gmail.com>

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.10.1 or,
at your option, any later version of Perl 5 you may have available.

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _

=head2 new

 Function: Create a new Bio::Community::Tools::Rarefier object
 Usage   : my $rarefier = Bio::Community::Tools::Rarefier->new( );
 Args    : -metacommunity  : see metacommunity()

lib/Bio/Community/Tools/Rarefier.pm  view on Meta::CPAN

 Returns : positive integer for the (minimum) number of repetitions

=cut

has threshold => (
   is => 'rw',
   isa => 'Maybe[PositiveNum]',
   required => 0, 
   default => 1E-5, # maybe impossible to reach lower thresholds for simplistic communities
   lazy => 1,
   init_arg => '-threshold',
   trigger => sub { $_[0]->_clear_avg_meta; $_[0]->_clear_repr_meta },
);


=head2 num_repetitions

 Function: Get or set the number of bootstrap repetitions to perform. When given,
           instead of relying on the threshold() to determine when to stop
           repeating the bootstrap process, perform an arbitrary number of
           repetitions. After communities have been normalized by count using
           the threshold() method, the number of repetitions actually done can
           be accessed using this method. As a special case, specify 'inf' to
           simulate an infinite number of repetitions.
 Usage   : my $repetitions = $rarefier->num_repetitions;
 Args    : positive integer or 'inf' for the number of repetitions
 Returns : positive integer for the (minimum) number of repetitions
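
As an illustration of the 'inf' special case: assuming that members are drawn
with replacement in proportion to their counts, averaging over an unbounded
number of repetitions converges to the expected counts, which can be computed
directly without any random sampling. The sketch below is plain Perl,
expected_counts() is a hypothetical helper and not a method of this module,
and it is not necessarily how this module computes the result.

  ```perl
  use strict;
  use warnings;
  use List::Util qw(sum);

  # Hypothetical helper: expected count of each species in a sample of $size
  # members, i.e. its relative abundance multiplied by the sample size.
  sub expected_counts {
     my ($counts, $size) = @_;
     my $total = sum(values %$counts);
     return { map { $_ => $size * $counts->{$_} / $total } keys %$counts };
  }

  my %large    = ( sp_a => 6000, sp_b => 3000, sp_c => 1000 );
  my $expected = expected_counts(\%large, 1000);
  printf "%s\t%g\n", $_, $expected->{$_} for sort keys %$expected;
  # prints sp_a 600, sp_b 300 and sp_c 100, one tab-separated pair per line
  ```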

=cut

has num_repetitions => (
   is => 'rw',
   isa => 'Maybe[PositiveInt | Str]',
   required => 0, 
   default => undef,
   lazy => 1,
   init_arg => '-num_repetitions',
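   trigger => sub {
      my ($self, $value) = @_; # Moose passes the new value as second argument
      $self->_clear_avg_meta;
      $self->_clear_repr_meta;
      # The new value may be undef, so check definedness before comparing
      $self->_clear_sampler if defined $value && $value eq 'inf';
   },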
);

*repetitions = \&num_repetitions;


=head2 get_seed, set_seed

 Usage   : $rarefier->set_seed(1234513451);
 Function: Get or set the seed used to pick the random members.
 Args    : Positive integer
 Returns : Positive integer

=cut


=head2 verbose

 Function: Get or set verbose mode. In verbose mode, the current number of
           iterations (and beta diversity if a threshold is used) is displayed.
 Usage   : $rarefier->verbose(1);
 Args    : 0 (default) or 1
 Returns : 0 or 1

=cut

has verbose => (
   is => 'rw',
   isa => 'Bool',
   required => 0, 
   default => 0,
   lazy => 1,
   init_arg => '-verbose',
);


=head2 drop

 Function: Get or set drop mode. In drop mode, this module silently drops
           communities that do not have enough members instead of reporting an
           error.
 Usage   : $rarefier->drop(1);
 Args    : 0 (default) or 1
 Returns : 0 or 1
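
The difference between the two modes can be sketched in plain Perl as follows.
This only illustrates the behaviour described above: select_communities() is a
hypothetical helper, not part of this module, and the error message is made
up for the example.

  ```perl
  use strict;
  use warnings;
  use List::Util qw(sum);

  # Hypothetical helper: return the names of the communities that have at
  # least $size counts. Smaller communities are skipped when $drop is true
  # and are a fatal error otherwise.
  sub select_communities {
     my ($communities, $size, $drop) = @_;
     my @kept;
     for my $name (sort keys %$communities) {
        my $total = sum(0, values %{ $communities->{$name} });
        if ($total >= $size) {
           push @kept, $name;
        } elsif (not $drop) {
           die "Community $name has $total counts, fewer than $size\n";
        }
     }
     return @kept;
  }

  my %meta = (
     gut  => { sp_a => 900, sp_b => 400 },   # 1,300 counts
     soil => { sp_a => 200, sp_c => 100 },   #   300 counts
  );
  my @kept = select_communities(\%meta, 1000, 1);
  print "kept: @kept\n";   # prints "kept: gut"
  ```

With the third argument set to 0, the same call dies on the soil community
instead of skipping it.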

=cut

has drop => (
   is => 'rw',
   isa => 'Bool',
   required => 0, 
   default => 0,
   lazy => 1,
   init_arg => '-drop',
);


=head2 get_avg_meta

 Function: Calculate an average metacommunity.
 Usage   : my $meta = $rarefier->get_avg_meta;
 Args    : none
 Returns : Bio::Community::Meta object

=cut

has average_meta => (
   is => 'rw',
   isa => 'Maybe[Bio::Community::Meta]',
   required => 0,
   default => undef,
   lazy => 1,
   reader => 'get_avg_meta',
   writer => '_set_avg_meta',
   predicate => '_has_avg_meta',
   clearer => '_clear_avg_meta',
);

before get_avg_meta => sub {
   my ($self) = @_;


