Bio-ToolBox
lib/Bio/ToolBox/Data.pm
name of the saved file, complete with any changes to the file extension.
=item summary_file
Write a separate file summarizing columns of data (mean values).
The mean value of each column becomes a row value, and each column
header becomes a row identifier (i.e. the table is transposed). The
best use of this is to summarize the mean profile of windowed data
collected across a feature. See the L<Bio::ToolBox> scripts
L<get_relative_data.pl> and L<get_binned_data.pl> as examples.
For data from L<get_binned_data.pl> where the columns are expressed
as percentile bins, the reported midpoint column is automatically
converted based on a length of 1000 bp.
You may pass any of the following options; all are optional.
=over 4
=item filename
Pass an optional new filename. The default is to take the basename
and append "_<method>_summary" to it.
=item startcolumn
=item stopcolumn
Provide the starting and ending columns to summarize. The default
start is the leftmost column without a recognized standard name.
The default stop is the rightmost column.
=item dataset
Pass a string that is the name of the dataset. This could be collected
from the metadata, if present. This will become the name of the score
column if defined.
=item method
Pass the name of the method to combine the values. Methods include
C<mean> (default if not specified), C<median>, or C<trimmean>, where
the top and bottom 1% of the sorted values are discarded and a mean
of the remaining 98% of the values is used. If fewer than 100 values
are available, no trimming is done and a straight mean value is
determined.
=back
The name of the summarized column is either the provided dataset name,
the defined basename in the metadata of the Data structure, or a generic
name. If successful, it will return the name of the file saved.
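The column summary and the C<trimmean> rule described above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the module's implementation: C<combine_values>, C<summarize_columns>, and the small C<@table> are hypothetical names and data invented for the example.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: combine a list of numbers with the named method.
sub combine_values {
    my ( $method, @values ) = @_;
    return unless @values;
    my @sorted = sort { $a <=> $b } @values;
    my $count  = scalar @sorted;
    if ( $method eq 'median' ) {
        my $mid = int( $count / 2 );
        return $count % 2
            ? $sorted[$mid]
            : ( $sorted[ $mid - 1 ] + $sorted[$mid] ) / 2;
    }
    if ( $method eq 'trimmean' and $count >= 100 ) {
        # discard the top and bottom 1% of the sorted values
        my $trim = int( $count / 100 );
        @sorted = @sorted[ $trim .. $#sorted - $trim ];
    }
    # plain mean, also used for trimmean with fewer than 100 values
    return sum(@sorted) / scalar(@sorted);
}

# Hypothetical helper: each summarized column becomes one output row
# of [ column header, combined value ], i.e. the table is transposed.
sub summarize_columns {
    my ( $table, $start, $stop, $method ) = @_;
    my @header = @{ $table->[0] };
    my @summary;
    for my $col ( $start .. $stop ) {
        my @values = map { $_->[$col] } @{$table}[ 1 .. $#{$table} ];
        push @summary, [ $header[$col], combine_values( $method, @values ) ];
    }
    return \@summary;
}

# Made-up windowed data: the first row is the header, the first column an ID.
my @table = (
    [ 'ID', '-100', '0', '100' ],
    [ 'g1', 1,      2,   3 ],
    [ 'g2', 3,      4,   5 ],
);
my $summary = summarize_columns( \@table, 1, 3, 'mean' );
printf "%s\t%s\n", @{$_} for @{$summary};
```

Each output line pairs a column header with its combined value, which is the layout a mean profile across windows would need.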
=back
=head2 Verifying Datasets
When working with row Features and collecting scores, the dataset
from which you are collecting must be verified prior to collection.
This ensures that the proper database adaptor is available and loaded,
and that the dataset is correctly specified (otherwise nothing would be
collected). This verification is normally performed transparently when
you call L<get_score|Bio::ToolBox::Data::Feature/get_score> or
L<get_position_scores|Bio::ToolBox::Data::Feature/get_position_scores>.
However, datasets may be explicitly verified prior to calling the score
methods.
=over 4
=item verify_dataset
    my $verified = $Data->verify_dataset($dataset, $database);
Pass the name of the dataset (GFF type or type:source) for a GFF3-based
database, e.g. L<Bio::DB::SeqFeature::Store>, or path and file name for a
data file, e.g. Bam, BigWig, BigBed, or USeq file. If a separate database
is being used, pass the name or opened database object as a second
parameter. For more advanced options, see
L<Bio::ToolBox::db_helper/verify_or_request_feature_types>.
The name of the verified dataset, with a prefix if necessary, is returned.
=back
=head2 Efficient Data Access
Most of the time we need to iterate over the Data table, one row
at a time, collecting data or processing information. These methods
simplify the process.
=over 4
=item iterate
    $Data->iterate( sub {
        my $row = shift;
        my $number = $row->value($index);
        my $log_number = log($number);
        $row->value($index, $log_number);
    } );
This method executes a code reference once for every row in the data
table. Pass a subroutine or code reference. The subroutine will
receive the row as a L<Bio::ToolBox::Data::Feature> object. With this
object, you can retrieve values, set values, and add new values.
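The callback pattern behind C<iterate> can be sketched without the module. This is a minimal illustration, assuming a plain array of array references in place of the data table; C<iterate_rows> and C<@rows> are hypothetical names, and rows are accessed as array references rather than through the C<value> method. The guard is there because Perl's C<log> dies on zero or negative input.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a data table: each row is an array reference.
my @rows = ( [ 'g1', 1 ], [ 'g2', 10 ], [ 'g3', 0 ] );

# Hypothetical helper mimicking the callback pattern:
# call the code reference once for every row, in order.
sub iterate_rows {
    my ( $rows, $code ) = @_;
    die "iterate_rows requires a code reference\n"
        unless ref($code) eq 'CODE';
    $code->($_) for @{$rows};
    return scalar @{$rows};
}

my $index = 1;
my $seen  = iterate_rows(
    \@rows,
    sub {
        my $row    = shift;
        my $number = $row->[$index];
        # skip values for which log() would die
        return unless defined $number and $number > 0;
        $row->[$index] = log($number);
    }
);
print "processed $seen rows\n";
```

Because the callback receives a reference to the row, changes made inside it are visible in the table afterwards.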
=item row_stream
This returns a C<Bio::ToolBox::Data::Iterator> object, which has one
method, C<next_row>. Call this method repeatedly until it returns
C<undef> to work through each row of data.
Users of the C<Bio::DB> family of database adaptors may recognize the
analogy to the C<seq_stream> method.
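The stream pattern can be illustrated with a small stand-in class. This is a sketch only: C<My::RowStream> is a hypothetical class written for this example and is not part of Bio::ToolBox, and its rows are plain array references. It shows the one property the loop depends on, namely that C<next_row> returns C<undef> once the rows are exhausted.

```perl
use strict;
use warnings;

package My::RowStream;    # hypothetical class, not part of Bio::ToolBox

sub new {
    my ( $class, $rows ) = @_;
    return bless { rows => $rows, index => 0 }, $class;
}

# Return the next row, or undef when no rows remain.
sub next_row {
    my $self = shift;
    return if $self->{index} > $#{ $self->{rows} };
    return $self->{rows}[ $self->{index}++ ];
}

package main;

my $stream = My::RowStream->new( [ [ 'g1', 5 ], [ 'g2', 7 ] ] );
my $total  = 0;
while ( my $row = $stream->next_row ) {
    $total += $row->[1];
}
print "total: $total\n";
```

Unlike a callback, a stream leaves control of the loop with the caller, so it is possible to stop early with C<last>.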
=item next_row
    my $stream = $Data->row_stream;
    while (my $row = $stream->next_row) {
        # each $row is a Bio::ToolBox::Data::Feature object
        # representing the row in the data table
        my $value = $row->value($index);
    }