Bio-ToolBox

lib/Bio/ToolBox/Data.pm

name of the saved file, complete with any changes to the file extension.

=item summary_file

Write a separate file summarizing columns of data (mean values). 
The mean value of each column becomes a row value, and each column 
header becomes a row identifier (i.e. the table is transposed). The 
best use of this is to summarize the mean profile of windowed data 
collected across a feature. See the L<Bio::ToolBox> scripts 
L<get_relative_data.pl> and L<get_binned_data.pl> for examples. 
For data from L<get_binned_data.pl>, where the columns are expressed 
as percentile bins, the reported midpoint column is automatically 
converted based on a length of 1000 bp.
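The column-to-row transposition described above can be sketched in plain Perl, independent of the module. This is an illustration under stated assumptions, not the code C<summary_file> actually runs: the table is a hypothetical array of arrays with a header row, C<summarize_columns> is a made-up helper, and only the plain mean is computed.

```perl
    use strict;
    use warnings;
    use List::Util qw(sum);

    # Hypothetical example table: a header row followed by data rows.
    # Column 0 is a feature name; columns 1..3 hold windowed scores.
    my @table = (
        [ 'Name',  '-100', '0', '100' ],
        [ 'geneA', 1,      4,   7     ],
        [ 'geneB', 3,      6,   9     ],
    );

    # Summarize columns $start..$stop: each column header becomes a row
    # identifier and the column mean becomes that row's value.
    sub summarize_columns {
        my ( $table, $start, $stop ) = @_;
        my ( $header, @rows ) = @$table;
        my @summary;
        for my $col ( $start .. $stop ) {
            my @values = map { $_->[$col] } @rows;
            push @summary, [ $header->[$col], sum(@values) / scalar(@values) ];
        }
        return @summary;
    }

    my @summary = summarize_columns( \@table, 1, 3 );
    print join( "\t", @$_ ), "\n" for @summary;
```

Each output line pairs a column header (here a window position) with that column's mean, which is the transposed shape of the summary file.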

The following options may be passed; all are optional.

=over 4

=item filename

Pass an optional new filename. The default is to take the basename 
and append "_<method>_summary" to it.

=item startcolumn

=item stopcolumn

Provide the starting and ending columns to summarize. The default 
start is the leftmost column without a recognized standard name. 
The default end is the rightmost (last) column.

=item dataset

Pass the name of the dataset as a string. This value may be taken 
from the metadata, if present. When defined, it becomes the name of 
the score column.

=item method

Pass the name of the method used to combine the values. Methods include 
C<mean> (the default), C<median>, and C<trimmean>. With C<trimmean>, 
the top and bottom 1% of the sorted values are discarded and the mean 
of the remaining 98% is reported. If fewer than 100 values are 
available, no trimming is done and a simple mean is calculated.

=back

The name of the summarized column is the provided dataset name, 
the basename defined in the metadata of the Data structure, or a 
generic name, in that order of preference. If successful, the method 
returns the name of the saved file.
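A minimal sketch of the C<trimmean> calculation described for the C<method> option. The helper C<trimmed_mean> is hypothetical, and rounding 1% of the value count down to a whole number with C<int()> is an assumption of this sketch; the module may round differently.

```perl
    use strict;
    use warnings;
    use List::Util qw(sum);

    # Drop the top and bottom 1% of the sorted values, then average the
    # rest. With fewer than 100 values, no trimming is done.
    # Rounding the 1% down with int() is an assumption, not the module's rule.
    sub trimmed_mean {
        my @values = sort { $a <=> $b } @_;
        return unless @values;
        my $n = scalar @values;
        if ( $n >= 100 ) {
            my $trim = int( $n * 0.01 );
            @values = @values[ $trim .. $n - $trim - 1 ];
        }
        return sum(@values) / scalar(@values);
    }

    # 100 values: 1..98 plus two outliers, which are trimmed away.
    my @data = ( -1000, 1 .. 98, 1000 );
    printf "%.1f\n", trimmed_mean(@data);
```

With exactly 100 values, one value is dropped from each end, leaving the 98% described above; the outliers therefore have no effect on the result.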

=back

=head2 Verifying Datasets

When working with row Features and collecting scores, the dataset 
from which you are collecting must be verified prior to collection. 
This ensures that the proper database adaptor is available and loaded, 
and that the dataset is correctly specified (otherwise nothing would be 
collected). This verification is normally performed transparently when 
you call L<get_score|Bio::ToolBox::Data::Feature/get_score> or 
L<get_position_scores|Bio::ToolBox::Data::Feature/get_position_scores>.
However, datasets may be explicitly verified prior to calling the score 
methods. 

=over 4

=item verify_dataset

 my $dataset = $Data->verify_dataset($dataset, $database);

Pass the name of the dataset (GFF type or type:source) for a GFF3-based 
database, e.g. L<Bio::DB::SeqFeature::Store>, or path and file name for a 
data file, e.g. Bam, BigWig, BigBed, or USeq file. If a separate database 
is being used, pass the name or opened database object as a second 
parameter. For more advanced options, see 
L<Bio::ToolBox::db_helper/verify_or_request_feature_types>. 

The name of the verified dataset, with a prefix if necessary, is returned.

=back
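To illustrate the forms of dataset string that C<verify_dataset> accepts, the hypothetical helper C<classify_dataset> below sorts a string into a data file path, a GFF C<type:source> pair, or a bare GFF type. It is not part of Bio::ToolBox and performs no real verification (no database is consulted); the list of recognized extensions (C<.bam>, C<.bw>, C<.bigwig>, C<.bb>, C<.bigbed>, C<.useq>) is an assumption for the sketch.

```perl
    use strict;
    use warnings;

    # Hypothetical helper, NOT part of Bio::ToolBox: guess how a dataset
    # string would be interpreted, following the forms described above.
    sub classify_dataset {
        my $dataset = shift;
        # data file, recognized by an assumed list of extensions
        if ( $dataset =~ /\.(bam|bw|bigwig|bb|bigbed|useq)$/i ) {
            return ( 'file', lc $1 );
        }
        # GFF type:source, as used with a Bio::DB::SeqFeature::Store database
        if ( $dataset =~ /^([^:]+):([^:]+)$/ ) {
            return ( 'type:source', $1, $2 );
        }
        # anything else is treated as a bare GFF type
        return ( 'type', $dataset );
    }

    for my $d ( 'sample.bam', 'exon:ensembl', 'gene' ) {
        print join( "\t", $d, classify_dataset($d) ), "\n";
    }
```

The real method also checks that the adaptor can be loaded and that the dataset exists; this sketch covers only the string forms.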

=head2 Efficient Data Access

Most of the time we need to iterate over the Data table, one row 
at a time, collecting data or processing information. These methods 
simplify the process.

=over 4

=item iterate

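    # $index holds the index of the column to transform
    use Scalar::Util qw(looks_like_number);
    $Data->iterate( sub {
       my $row = shift;
       my $number = $row->value($index);
       # skip non-numeric (null) and non-positive values; log() dies on zero
       return unless looks_like_number($number) and $number > 0;
       $row->value($index, log($number));
    } );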

This method executes a code reference on every row in the data 
table. Pass a subroutine or code reference. The subroutine will 
receive the row as a L<Bio::ToolBox::Data::Feature> object. With this 
object, you can retrieve values, set values, and add new values. 

=item row_stream

This returns a C<Bio::ToolBox::Data::Iterator> object, which has one 
method, C<next_row>. Call this method repeatedly until it returns 
C<undef> to work through each row of data.

Users of the C<Bio::DB> family of database adaptors may recognize the 
analogy to the C<seq_stream> method.
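The stream contract is simple: an object keeps its position and hands back one row per call until nothing is left, then returns C<undef>. The hypothetical C<RowStream> package below sketches that contract with plain array rows; it is not how C<Bio::ToolBox::Data::Iterator> is implemented, and its rows are not C<Bio::ToolBox::Data::Feature> objects.

```perl
    use strict;
    use warnings;

    # Hypothetical minimal stream, NOT the Bio::ToolBox implementation:
    # it walks an array of rows and returns undef once exhausted.
    package RowStream;

    sub new {
        my ( $class, @rows ) = @_;
        return bless { rows => \@rows, index => 0 }, $class;
    }

    sub next_row {
        my $self = shift;
        return undef if $self->{index} >= @{ $self->{rows} };
        return $self->{rows}[ $self->{index}++ ];
    }

    package main;

    my $stream = RowStream->new( [ 'chr1', 100 ], [ 'chr2', 200 ] );
    my @seen;
    while ( my $row = $stream->next_row ) {
        push @seen, $row->[0];
    }
    print "@seen\n";
```

Because the stream holds only a position, rows are visited one at a time without copying the table, which is the same usage pattern as C<seq_stream>.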

=item next_row

  my $stream = $Data->row_stream;
  while (my $row = $stream->next_row) {
     # each $row is a Bio::ToolBox::Data::Feature object
     # representing the row in the data table
     my $value = $row->value($index);
     # ... work with $value here ...
  }


