Bio-Kmer
The BioPerl way
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Kmer;
# Load any Bio::SeqIO object. Quality values are faked
# internally for compatibility, even if a fastq file is
# given.
my $seqin = Bio::SeqIO->new(-file=>"input.fasta");
my $kmer=Bio::Kmer->new($seqin);
my $kmerHash=$kmer->kmers();
my $countOfCounts=$kmer->histogram();
=head1 DESCRIPTION
A module for kmer analysis. The basic methods count kmers and can produce a count of counts (a histogram of kmer frequencies). Currently this module only supports fastq format. Although this module can count kmers with pure Perl, it is recommended to give the opti...
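To make "count of counts" concrete, here is a minimal pure-Perl sketch. It is independent of this module's internals: the helper names count_kmers and count_of_counts are hypothetical, and the exact structure returned by $kmer->histogram() may differ from what is shown here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Kmer): count every
# overlapping kmer of length $k in a single sequence string.
sub count_kmers {
  my ($seq, $k) = @_;
  my %count;
  for my $i (0 .. length($seq) - $k) {
    $count{ substr($seq, $i, $k) }++;
  }
  return \%count;
}

# Hypothetical helper: turn kmer counts into a count of counts.
# Index = how often a kmer was seen; value = how many distinct
# kmers were seen that often.
sub count_of_counts {
  my ($kmerHash) = @_;
  my @hist;
  $hist[$_]++ for values %$kmerHash;
  $_ //= 0 for @hist;    # fill unused frequencies with zero
  return \@hist;
}

my $kmerHash = count_kmers("AAAAC", 2);
my $hist     = count_of_counts($kmerHash);
print "$_\t$hist->[$_]\n" for 1 .. $#$hist;
```

For the sequence AAAAC with k=2, the kmers are AA (seen three times) and AC (seen once), so the histogram holds one kmer at frequency 1 and one kmer at frequency 3.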
=head1 DEPENDENCIES
* BioPerl
* Jellyfish >=2
* Perl threads
* Perl >=5.10
=head1 VARIABLES
=over
=item $Bio::Kmer::iThreads
Boolean describing whether the module instance is using threads.
=back
=head1 METHODS
=over
=item Bio::Kmer->new($filename, \%options)
Create a new instance of the kmer counter. One object per file.
Filename can be either a file path or a Bio::SeqIO object.
Applicable arguments for \%options:
  Argument      Default  Description
  kmercounter   perl     What kmer counter software to use.
                         Choices: Perl, Jellyfish.
  kmerlength|k  21       Kmer length.
  numcpus       1        Number of CPUs. Used for Perl
                         multithreading with the pure-Perl
                         counter, or passed on to other
                         software such as Jellyfish.
  gt            1        Ignore any kmer whose count is
                         lower than this. This might help
                         speed up analysis if you do not
                         care about low-count kmers.
  sample        1        Retain only a fraction of kmers.
                         1 is 100%; 0 is 0%.
                         Only works with the Perl kmer counter.
  verbose       0        Print more messages.
Examples:
my $kmer=Bio::Kmer->new("file.fastq.gz",{kmercounter=>"jellyfish",numcpus=>4});
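The gt and sample options can be pictured with the following pure-Perl sketch. It is an illustration under assumptions, not the module's code: the helper names filter_by_gt and keep_by_sample are hypothetical, and whether the module applies sampling per read, per kmer occurrence, or per distinct kmer is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): keep an item with
# probability $sample. rand() returns a value in [0, 1), so a
# $sample of 1 keeps everything and a $sample of 0 keeps nothing.
sub keep_by_sample {
  my ($sample) = @_;
  return rand() < $sample ? 1 : 0;
}

# Hypothetical helper: drop kmers seen fewer than $gt times.
sub filter_by_gt {
  my ($kmerHash, $gt) = @_;
  my %kept;
  for my $kmer (keys %$kmerHash) {
    $kept{$kmer} = $kmerHash->{$kmer} if $kmerHash->{$kmer} >= $gt;
  }
  return \%kept;
}

my %count = (AAA => 5, AAC => 1, ACG => 2);
my $kept  = filter_by_gt(\%count, 2);
print join(",", sort keys %$kept), "\n";
```

With gt set to 2, the singleton kmer AAC is dropped while AAA and ACG are kept. With the default of 1, nothing is filtered, since every observed kmer has a count of at least 1.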
=back
=cut
=pod
=over
=item $kmer->ntcount()
Returns the number of base pairs counted.
In some cases, such as when counting with Jellyfish,
that number is not calculated directly; instead it
is derived from the total length of kmers.
Internally, this number is stored as $kmer->{_ntcount}.
Note: internally runs $kmer->histogram() if
$kmer->{_ntcount} is not already set.
Arguments: None
Returns: integer
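The lazy behavior described in the note can be sketched with a toy class. My::ToyKmer is a hypothetical stand-in, not Bio::Kmer, and the total it stores (the summed occurrences of all kmers) is only an assumed placeholder; the module's actual formula is not shown in this documentation.

```perl
use strict;
use warnings;

package My::ToyKmer;    # hypothetical stand-in, not Bio::Kmer

sub new {
  my ($class, $kmerHash) = @_;
  return bless { _kmers => $kmerHash, _ntcount => undef }, $class;
}

# Builds the count of counts and, as a side effect, stores a
# stand-in total: the summed occurrences of all kmers.
sub histogram {
  my ($self) = @_;
  my (@hist, $total);
  for my $count (values %{ $self->{_kmers} }) {
    $hist[$count]++;
    $total += $count;
  }
  $self->{_ntcount} = $total // 0;
  return \@hist;
}

# Lazy accessor: compute via histogram() only if no value is cached.
sub ntcount {
  my ($self) = @_;
  $self->histogram() if !defined $self->{_ntcount};
  return $self->{_ntcount};
}

package main;

my $toy = My::ToyKmer->new({ AA => 3, AC => 1 });
print $toy->ntcount(), "\n";
```

The first call to ntcount() triggers histogram(); later calls return the cached value without recomputing it.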
=back
=cut
=pod
=over
=item $kmer->count()
Count kmers. This method is called as soon as new() is called,
so you should never have to run it yourself.
Internally caches the kmer counts in RAM.
Arguments: None
Returns: None
=back
=cut
=pod
=over
=item $kmer->clearCache
Clears kmer counts and histogram counts. You should probably never use