Bio-ToolBox
######## Bio::ToolBox revision history #############
v2.041
- Fix bug that prevented compilation on Perls <= 5.28
- Fix testing bug
v2.04
- Fix bugs that prevented scripts get_binned_data.pl and
get_relative_data.pl from compiling.
- Clean up code in utility and big_helper modules.
- Add new Build tests for big_helper.pm, which requires that
external UCSC utilities are in the PATH, but should safely
skip if they are not.
- Add new Build test to verify all included application scripts.
- Change all tests to use the modern Test2 suite.
v2.03
- Improve Data table sorting (again) to handle natural sorting by
using embedded numeric values, whether at the beginning, middle, or
end of a string; useful for genes or numbered items with prefix/suffix.
- Add new functions to report statistics on feature length and filter
features by length (minimum - maximum range) in manipulate_datasets.pl.
- Add new option to filter alignments based on mapping quality when
counting using data collection apps get_datasets.pl, get_binned_data.pl
and get_relative_data.pl.
- Add new options for specifying what to use for the output Bed Name
column from apps get_features.pl and get_gene_regions.pl, including
feature Name or ID.
- Optimize alignment filtering based on flags for a very slight, but
measurable, improvement in execution time when collecting alignment
counts or generating wig files with bam2wig.pl.
- Avoid writing duplicate comment lines when merging files in merge_datasets.pl.
- Remove silly multiple-zero prefix when naming features in data2bed.pl
and data2gff.pl.
- The new_data() method now properly recognizes options in Bio::ToolBox.
- Add new new_bed() shortcut method to Bio::ToolBox.
- Avoid writing any metadata or comment lines to TSV files. Presumption
is that these are primarily for data export and sharing. Add rudimentary
support for writing CSV files.
- Improve coordinate extraction from coordinate strings, for example
allowing extraction from "chr1:123,456-789,000:-".
- Allow genomic coordinate sorting by coordinate string.
- Handle new Ensembl gencode tags when filtering in Bio::ToolBox::GeneTools.
- Implement map quality filtering in low level alignment callbacks used by
HTS and Sam adapters. Add new use_minimum_mapq() function in
Bio::ToolBox::db_helper to set the map quality level on global scale.
- Optimize name counting 'ncount' method in Bio::ToolBox::db_helper.
- Optimize and update API for counting all alignments in a bam file with
sum_total_bam_alignments() functions.
- Remove outdated functions in manipulate_datasets.pl.
- Fix bug with setting tag values of zero in SeqFeature objects.
- Rename splice_data() to split_data() in Bio::ToolBox::Data.
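
The natural sorting and coordinate string entries in v2.03 above can be
illustrated with a minimal sketch. This is not the Bio::ToolBox
implementation: it is written in Python for brevity, the function names
parse_coordinate_string() and natural_key() are hypothetical, and the
accepted string grammar (chrom:start-end with optional thousands
separators and an optional trailing :strand) is an assumption based only
on the example string given in the entry.

```python
import re

# Assumed grammar: chrom:start-end[:strand], commas allowed in numbers.
# This pattern is an illustration, not the pattern used by Bio::ToolBox.
COORD_RE = re.compile(r"^([^:]+):([0-9,]+)-([0-9,]+)(?::([+\-.]))?$")


def parse_coordinate_string(text):
    """Return (chrom, start, end, strand), or None if the text does not match."""
    match = COORD_RE.match(text.strip())
    if match is None:
        return None
    chrom, start, end, strand = match.groups()
    # Strip thousands separators before converting to integers.
    start = int(start.replace(",", ""))
    end = int(end.replace(",", ""))
    # Use "." when no strand was given (assumed convention).
    return chrom, start, end, strand or "."


def natural_key(name):
    """Sort key that compares embedded digit runs numerically.

    re.split with a capture group alternates non-digit and digit runs,
    so odd positions are always the digit runs.
    """
    parts = re.split(r"([0-9]+)", name)
    return [
        (0, int(part)) if index % 2 else (1, part)
        for index, part in enumerate(parts)
        if part
    ]


print(parse_coordinate_string("chr1:123,456-789,000:-"))
# ('chr1', 123456, 789000, '-')
print(sorted(["gene10", "gene2", "gene1"], key=natural_key))
# ['gene1', 'gene2', 'gene10']
```

Tagging each run as numeric or text keeps mixed comparisons well defined,
so a number at the beginning, middle, or end of a name sorts the same way.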
v2.02
- Add support for newer versions of UCSC utilities that no longer
allow reading from standard input, particularly 'wigToBigWig'.
Temporary wig files are written first and then the utility is
called. Updates made to bam2wig.pl, data2wig.pl, and manipulate_wig.pl.
- Add public headers() boolean method for Data objects.
- Change duplicate() method for Stream objects to allow generating
a duplicate Data object if no output file name is provided.
- Various library bug fixes and improvements, including reading bedGraph
files, writing simple text files, handling spaces in numeric index lists,
automatic checking of extensions, parsing annotation files into tables,
and speeding up row deletion in large data tables.
- Fix issues with default output filenames in data collection scripts
get_datasets.pl, get_binned_data.pl, and get_relative_data.pl. Default is
to reuse input filename unless it was parsed, in which case the basename
plus txt is used. Also fix bugs regarding mismatched column names when
explicitly not parsing input annotation files.
- Fix bug with leaving behind MergeDatasetCoordinate column in merge_datasets.pl.
- Fix bug with using new column name instead of original column name in
message statements when manipulating columns in manipulate_datasets.pl.
- Fix possible bug with undefined strand when automatically flipping coordinates
of reversed intervals in SeqFeature objects.
- Remove deprecated and unused functions.
- Add missing POD method sections for complete coverage.
v2.01
- Update chromosome sorting to properly handle chromosomal arms, for
example with Drosophila
- Change '.groups.txt' group file name to '.col_groups.txt' when writing
column metadata file for scripts get_binned_data.pl and get_relative_data.pl
- Change back to '_summary.txt' file name when writing a summary file
- Change "--blacklist" option to "--exclude" in bam2wig.pl
- Improve error handling scenarios in data2wig.pl, including invalid indexes
- Fix bugs in manipulate_datasets.pl, including missing lines in the view function
and restricting the addname function to only update a proper "Name" column
- Update any remaining POD text references about 0-base indexing to 1-base
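
For code that still carries hard-coded 0-base column indexes, the switch
to 1-base indexing (introduced in v1.70) amounts to an off-by-one
translation at the boundary between user-facing column numbers and
array offsets. The sketch below shows that translation only; it is
written in Python, the function name column_to_offset() is hypothetical,
and nothing here is taken from the Bio::ToolBox API.

```python
def column_to_offset(column, n_columns):
    """Translate a 1-base column number into a 0-base array offset.

    Raises IndexError for numbers outside 1..n_columns, so a stale
    0-base index of 0 is rejected rather than silently wrapping around.
    """
    if not 1 <= column <= n_columns:
        raise IndexError(f"column {column} is outside 1..{n_columns}")
    return column - 1


row = ["chr1", "100", "200", "geneA"]
# Column 1 is the first column under 1-base indexing.
print(row[column_to_offset(1, len(row))])  # chr1
print(row[column_to_offset(4, len(row))])  # geneA
```

Rejecting 0 explicitly is a deliberate choice in this sketch: it makes
leftover 0-base indexes fail loudly instead of returning the wrong column.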
v2.0
- Version number change, no code changes
v1.70
- MAJOR UPDATE: Change all internal and user-oriented column indexing
to 1-base instead of 0-base indexing, i.e. column numbers are now
listed beginning with 1 instead of 0. WARNING!!! THIS WILL BREAK ALL
PRE-EXISTING SCRIPTS AND CODE THAT USES HARD-CODED COLUMN INDEXES!!!
- MAJOR UPDATE: Use a single unified Bio::ToolBox::Parser module with
subclasses for bed, gff, gtf, and ucsc table formats. NOTE: This changed
name capitalization of Bio::ToolBox::Parser subclasses from parser
- Improve parsing of gtf files, especially with duplicate tags
- Replaced old table sorting algorithm to use numeric, mixed digit-string,