Bio-ToolBox
=head2 Data manipulation
Work with data columns and/or rows in data tables.
=over 4
=item L<manipulate_datasets.pl>
An interactive, menu-driven application for quickly and easily performing all
sorts of common functions on columns, rows, and values.
=item L<manipulate_wig.pl>
Perform various numeric transformations on the scores in text WIG,
bedGraph, and bigWig files.
=back
=head2 File manipulation
Work on columns or rows of one or more data tables.
=over 4
=item L<merge_datasets.pl>
Join columns from two or more data files into one file, with or without using
a lookup value.
=item L<split_data_file.pl>
Split a data file by rows into multiple files.
=item L<join_data_file.pl>
Join two or more data files by rows into one file.
=item L<pull_features.pl>
Take a list of identifiers and pull the corresponding rows from a source file
into a separate table of wanted features.
=back
=head1 USAGE
This module provides a handful of commonly used convenience methods
as entry points to working with data files. Most of them use or
return a L<Bio::ToolBox::Data> object.
=head2 Methods
=over 4
=item load_file
Open a tab-delimited text file as a L<Bio::ToolBox::Data> object.
Simply pass the file path as a single argument. It assumes that the first
row contains the column headers and that comment lines begin with C<#>.
Compressed files are transparently handled. See the
L<Bio::ToolBox::Data> C<new> method for more details or options.
$Data = Bio::ToolBox->load_file('myfile.txt');
For advanced options, pass key =E<gt> value pairs as defined for the
L<Bio::ToolBox::Data> C<new()> method.
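As a rough illustration of the parsing assumptions described above (a header row, C<#> comment lines, tab-separated fields), the following core-Perl sketch reads such a file into a hash of comments, headers, and rows. It is not how L<Bio::ToolBox::Data> is implemented: the helper C<read_table> and its return structure are hypothetical, comment lines are assumed to be skippable wherever they appear, and compression handling is left out for brevity.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper, not part of Bio::ToolBox: read a tab-delimited text
# file, skipping lines that begin with "#" and treating the first remaining
# line as the column headers. Compression handling is omitted.
sub read_table {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    my (@comments, @headers, @rows);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';                 # ignore blank lines
        if ($line =~ /^#/) {                 # comment line
            push @comments, $line;
            next;
        }
        my @fields = split /\t/, $line, -1;  # -1 keeps trailing empty fields
        if (@headers) {
            push @rows, \@fields;            # data row
        }
        else {
            @headers = @fields;              # first non-comment line
        }
    }
    close $fh;
    return { comments => \@comments, headers => \@headers, rows => \@rows };
}

# Usage example with a small temporary file.
my ($tmp, $file) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print {$tmp} "# example comment\n", "Name\tID\tScore\n",
             "geneA\t1\t0.5\n", "geneB\t2\t1.25\n";
close $tmp;

my $table = read_table($file);
print join(',', @{ $table->{headers} }), "\n";   # prints "Name,ID,Score"
print scalar(@{ $table->{rows} }), " rows\n";    # prints "2 rows"
```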
=item parse_file
Parse an annotation file, such as a BED, GTF, GFF3, UCSC genePred, or
refFlat file, into a L<Bio::ToolBox::Data> table with two columns:
PrimaryID (geneID, transcriptID, or coordinate string) and Name. Each
row in the resulting table is linked to a parsed, top-level SeqFeature
object. See the L<Bio::ToolBox::Data> C<new> method for more details
or options. Default options include parsing subfeatures (exon, cds,
and utr) and simple GFF attributes.
$Data = Bio::ToolBox->parse_file('genes.gtf.gz');
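To make the two-column output more concrete, here is a much simplified core-Perl sketch that scans GTF lines and keeps one PrimaryID/Name pair per C<gene_id>. The helper C<gtf_gene_table> is hypothetical, and falling back to the ID when no C<gene_name> attribute exists is an assumption made for this example; C<parse_file> itself builds full SeqFeature objects with subfeatures and supports many more formats.

```perl
use strict;
use warnings;

# Hypothetical, much simplified helper: build PrimaryID/Name pairs from GTF
# lines, one per distinct gene_id, using gene_name when present and falling
# back to the ID otherwise. No SeqFeature objects or subfeatures are built.
sub gtf_gene_table {
    my @lines = @_;
    my (@table, %seen);
    for my $line (@lines) {
        next if $line =~ /^#/ or $line !~ /\S/;   # skip comments and blanks
        my @f = split /\t/, $line;
        next unless @f >= 9;                      # GTF has 9 columns
        # Column 9 holds attributes such as: gene_id "g1"; gene_name "ABC1";
        my %attr = $f[8] =~ /(\S+)\s+"([^"]*)"/g;
        next unless defined $attr{gene_id};
        my $id = $attr{gene_id};
        next if $seen{$id}++;                     # keep the first line per gene
        push @table, [ $id, $attr{gene_name} // $id ];
    }
    return \@table;
}

my @gtf = (
    qq{chr1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "g1"; gene_name "ABC1";},
    qq{chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1"; gene_name "ABC1";},
    qq{chr2\tsrc\tgene\t50\t400\t.\t-\t.\tgene_id "g2";},
);
my $genes = gtf_gene_table(@gtf);
print join("\t", @$_), "\n" for @$genes;   # prints "g1<TAB>ABC1" then "g2<TAB>g2"
```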
=item new_data
Generate a new, empty L<Bio::ToolBox::Data> table with the given
column names. Pass a list of the column names for the new table.
$Data = Bio::ToolBox->new_data( qw(Name ID Score) );
Alternatively, for explicit control, pass a list of key =E<gt> value
arguments, which are passed on to the L<Bio::ToolBox::Data> C<new()> method.
=item new_bed
Generate a new, empty L<Bio::ToolBox::Data> table in BED format.
Pass the number of columns desired (an integer in the range 3..12
inclusive). The default is 6 (standard BED6 format).
$Data = Bio::ToolBox->new_bed(4);
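The column count maps onto the fixed field order of the BED format. This sketch lists the twelve field names as given in the UCSC BED specification and returns the first I<n> of them; the helper C<bed_columns> is hypothetical, and the column names that C<new_bed> actually assigns may differ.

```perl
use strict;
use warnings;

# The twelve BED fields in order, as named in the UCSC BED format
# specification. The names Bio::ToolBox assigns may differ.
my @BED_FIELDS = qw(chrom chromStart chromEnd name score strand
                    thickStart thickEnd itemRgb blockCount blockSizes blockStarts);

# Hypothetical helper: the first $n BED field names, where $n must be an
# integer from 3 to 12; defaults to 6 as described for new_bed.
sub bed_columns {
    my ($n) = @_;
    $n //= 6;
    die "BED column count must be an integer from 3 to 12\n"
        unless $n =~ /\A\d+\z/ && $n >= 3 && $n <= 12;
    return @BED_FIELDS[0 .. $n - 1];
}

print join(' ', bed_columns(4)), "\n";   # prints "chrom chromStart chromEnd name"
my @bed6 = bed_columns();
print scalar(@bed6), " columns, last is $bed6[-1]\n";   # prints "6 columns, last is strand"
```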
=item read_file
Open a generic file handle for reading. It transparently handles
compression as necessary. Returns an L<IO::File> object. Pass the
file path as an argument.
$fh = Bio::ToolBox->read_file('mydata.txt.gz');
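One way to get this kind of transparent behaviour in core Perl is sketched below with L<IO::Uncompress::Gunzip>, whose C<Transparent> option passes uncompressed input through unchanged, so the same call reads both plain and gzip-compressed files. The helper C<open_for_read> is hypothetical and only covers gzip; it is not the code behind C<read_file>.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper, not the Bio::ToolBox implementation: open a file for
# reading whether or not it is gzip-compressed. With Transparent => 1,
# uncompressed input is returned as-is.
sub open_for_read {
    my ($path) = @_;
    my $fh = IO::Uncompress::Gunzip->new($path, Transparent => 1)
        or die "cannot open $path: $GunzipError";
    return $fh;
}

# Usage: write the same text plain and gzip-compressed, then read both back.
my $dir   = tempdir(CLEANUP => 1);
my $plain = File::Spec->catfile($dir, 'mydata.txt');
my $gz    = File::Spec->catfile($dir, 'mydata.txt.gz');
open(my $out, '>', $plain) or die "cannot write $plain: $!";
print {$out} "line one\nline two\n";
close $out;
gzip($plain => $gz) or die "gzip failed: $GzipError";

my %content;
for my $path ($plain, $gz) {
    my $fh   = open_for_read($path);
    my $text = '';
    while (defined(my $line = $fh->getline)) {
        $text .= $line;
    }
    $fh->close;
    $content{$path} = $text;
}
print "both files read identically\n" if $content{$plain} eq $content{$gz};
```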
=item write_file
Open a generic file handle for writing. It transparently handles
compression as necessary, based on the filename extension or passed
options. It will use the external, multi-threaded C<pigz> compression
utility if available. See the C<open_to_write_fh> method in
L<Bio::ToolBox::Data::file> for more information.
$fh = Bio::ToolBox->write_file('mynewdata.txt.gz');
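The sketch below illustrates the described strategy under some stated assumptions: compress only when the name ends in F<.gz>, pipe through an external C<pigz> when one is found on C<PATH>, and otherwise fall back to the core L<IO::Compress::Gzip> module. The helpers C<find_program> and C<open_for_write> are hypothetical; the real C<open_to_write_fh> has its own options and logic.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use IO::File;
use IO::Compress::Gzip qw($GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: full path of an executable found on PATH, or undef.
sub find_program {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $path = File::Spec->catfile($dir, $name);
        return $path if -f $path && -x _;
    }
    return;
}

# Hypothetical helper, not the Bio::ToolBox implementation: open a write
# handle, gzip-compressing when the file name ends in .gz. It pipes through
# an external pigz when one is on PATH, else uses IO::Compress::Gzip.
sub open_for_write {
    my ($path) = @_;
    if ($path !~ /\.gz\z/i) {
        my $fh = IO::File->new($path, 'w') or die "cannot write $path: $!";
        return $fh;
    }
    if (my $pigz = find_program('pigz')) {
        open(my $out, '>', $path) or die "cannot write $path: $!";
        # Implicit fork: the child becomes "pigz -c", reading the pipe on
        # STDIN and writing compressed data to $path on STDOUT.
        my $pid = open(my $fh, '|-');
        die "cannot fork: $!" unless defined $pid;
        if ($pid == 0) {
            open(STDOUT, '>&', $out) or die "cannot redirect STDOUT: $!";
            exec($pigz, '-c') or die "cannot run $pigz: $!";
        }
        close $out;    # the child holds its own copy of the output handle
        return $fh;    # closing this handle waits for pigz to finish
    }
    my $gz = IO::Compress::Gzip->new($path) or die "gzip failed: $GzipError";
    return $gz;
}

# Usage: write a compressed file, then read it back to check the round trip.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'mynewdata.txt.gz');
my $fh   = open_for_write($file);
print {$fh} "Name\tScore\n", "geneA\t1.5\n";
close($fh) or die "close failed: $! $?";

gunzip($file => \my $text) or die "gunzip failed: $GunzipError";
print $text;
```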
=item open_database
Open a database, which may be a binary or indexed file (BAM, bigWig,
bigBed, Fasta, USeq, or any other supported format) or a
L<Bio::DB::SeqFeature::Store> database given as an SQLite file or a
named MySQL connection. The database type is detected automatically
from common file extensions, if present. See the C<open_db_connection>
function in L<Bio::ToolBox::db_helper> for more information.
$db = Bio::ToolBox->open_database($database);
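Extension-based detection can be pictured as an ordered lookup table of filename patterns. The mapping and type labels below are a hypothetical illustration, not the list used by C<open_db_connection>, which supports more formats and also accepts named MySQL connections that carry no file extension.

```perl
use strict;
use warnings;

# Hypothetical extension-to-type table for illustration only; the real
# open_db_connection in Bio::ToolBox::db_helper has its own, fuller logic.
my @DB_TYPES = (
    [ qr/\.bam\z/i,            'bam' ],
    [ qr/\.(?:bw|bigwig)\z/i,  'bigwig' ],
    [ qr/\.(?:bb|bigbed)\z/i,  'bigbed' ],
    [ qr/\.(?:fa|fasta)\z/i,   'fasta' ],
    [ qr/\.(?:sqlite|db)\z/i,  'seqfeature_sqlite' ],
    [ qr/\.useq\z/i,           'useq' ],
);

# Return the first matching type label, or undef when nothing matches
# (a caller might then treat the name as a named database connection).
sub guess_db_type {
    my ($name) = @_;
    for my $entry (@DB_TYPES) {
        my ($pattern, $type) = @$entry;
        return $type if $name =~ $pattern;
    }
    return;
}

for my $name (qw(sample.bam coverage.bw genome.fa annot.sqlite reads.useq mydb)) {
    printf "%-14s %s\n", $name, guess_db_type($name) // 'unknown';
}
```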
=back
=head1 REPOSITORY
Source code for the Bio::ToolBox package is maintained at
L<https://github.com/tjparnell/biotoolbox/>.
Bugs and issues should be submitted to L<https://github.com/tjparnell/biotoolbox/issues>.
=head1 SEE ALSO
L<Bio::Perl>, L<Bio::DB::SeqFeature::Store>, L<Bio::SeqFeatureI>, L<Bio::DB::BigWig>,
L<Bio::DB::BigBed>, L<Bio::DB::Sam>, L<Bio::DB::HTS>, L<Bio::DB::USeq>,
L<Bio::ViennaNGS>
=head1 AUTHOR
Timothy J. Parnell, PhD
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112
=head1 LICENSE
This package is free software; you can redistribute it and/or modify
it under the terms of the Artistic License 2.0.