Bio-ToolBox
- Minor optimization when calculating shift values in script bam2wig.pl.
v.1.12 (svn 619)
- Major improvements to performance of some data collection scripts by
adding multi-threaded options. These include get_datasets.pl,
get_relative_data.pl, average_gene.pl, and bam2wig.pl. The number of
CPU forks may be specified with the --cpu option (default 2). This option
requires the installation of Parallel::ForkManager, available through
CPAN. Run the check_dependencies.pl script to install it.
- All reads and writes of gzip-compressed files are now piped through an
external gzip utility for a considerable boost in performance (2-5X).
The gzip executable must be in your path for this to work (it usually
is in Unix-like environments).
- Added --long option when collecting data from long features in script
average_gene.pl.
- Improved efficiency when collecting data from very large windows in
both get_relative_data.pl and average_gene.pl.
- Summing the total number of read alignments in Bam files is now
multi-threaded. Summing the total number of intervals in a BigBed file
is also improved.
- Fixed a critical error where not all windows had data collected when
using the script get_relative_data.pl.
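
The external gzip change described in this release can be sketched roughly as
follows. This is a minimal illustration using Perl's built-in pipe opens, not
the library's actual code; the helper names open_gz_for_reading and
open_gz_for_writing are hypothetical, and it assumes a gzip executable is in
the path.

```perl
use strict;
use warnings;

# Hypothetical helper: read a gzip file through an external gzip process.
sub open_gz_for_reading {
    my ($file) = @_;
    # List-form pipe open: no shell is involved, so the name needs no quoting.
    open(my $fh, '-|', 'gzip', '-dc', $file)
        or die "Cannot start gzip to read '$file': $!\n";
    return $fh;
}

# Hypothetical helper: write a gzip file through an external gzip process.
sub open_gz_for_writing {
    my ($file) = @_;
    # The redirection needs a shell, so single-quote the file name first.
    (my $quoted = $file) =~ s/'/'\\''/g;
    open(my $fh, '|-', "gzip -c > '$quoted'")
        or die "Cannot start gzip to write '$file': $!\n";
    return $fh;
}
```

Closing a piped handle waits for the gzip child to exit, so the return value
of close should be checked after writing to catch a failed compression.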
v.1.11 (svn 603)
- Major revision of how features are retrieved from the database: they
are now looked up by Primary_ID rather than by relying on unique names.
Generating lists of features will now return Primary_ID, Name, and Type.
The Primary_ID is unique to a database and is usually non-portable.
Current feature lists with only Name and Type will still work, but remain
subject to the limitations of non-unique Names in the database. This affects
all scripts that work with database features, including get_features.pl,
get_feature_info.pl, get_datasets.pl, get_relative_data.pl,
average_gene.pl, get_intersecting_features.pl, and correlate_position_data.pl.
- GFF3 annotation scripts get_ensembl_annotation.pl and ucsc_table2gff3.pl
now produce GFF3 files that better match the GFF3 specification. Names
are no longer made unique (which broke ties with the originating data),
proper Dbxref tags are attributed when external sources could be
identified, and chromosomes are now sorted by name. Other minor
improvements were also made.
- Fixed critical bug that prevented spliced alignments from being
counted in script bam2wig.pl. Thanks to Pinal K. for reporting.
v.1.10.3 (svn 597)
- Unified column names and improved their recognition in scripts
get_feature_info.pl and the graphing scripts graph_data.pl,
graph_histogram.pl, and graph_profile.pl.
- Graphing scripts now write the output graph directory in the input
file parent directory instead of the current directory.
v.1.10.2 (svn 591)
- Added a new position option for adjusting the coordinates of retrieved
features using the script get_features.pl. Coordinates may be adjusted
at the 5 prime, 3 prime, or both ends of stranded features. This also
fixes bugs where collected features on the reverse strand with adjusted
coordinates were not reported properly.
- Improved automatic recognition of the name, score, and other columns
in the converter scripts data2bed.pl, data2gff.pl, and data2wig.pl.
- Improved the Cluster and Treeview export function in script
manipulate_datasets.pl. The CDT files generated now include separate ID
and NAME columns per the specification, and new manipulations are
included prior to exporting, including percentile rank and log2.
- The convert null function now also converts zero values if requested
in script manipulate_datasets.pl.
- Added new option of a minimum size when trimming windows in the script
find_enriched_regions.pl.
- Increased the radius from 35 bp to 50 bp when verifying a putative
mapped nucleosome in script map_nucleosomes.pl, leading to fewer
overlapping or offset nucleosomes.
- Added new option to re-center offset nucleosomes in script
verify_nucleosome_mapping.pl. Also improved report formatting.
- Added checks and warnings when writing file names longer than 256
characters. Some scripts automatically generate file names that may
exceed this limit, preventing writing. File names are now truncated.
Thanks to Adam F. for reporting.
- Added new methods and code improvements to the GFF3 parsing library.
- Fixed a bug in script merge_datasets.pl where the column index for a
second file might not be properly validated, leading to premature
termination.
- Fixed a bug where multiple datasets combined with an ampersand for
merging were not properly verified.
- Fixed a bug where a user may not be prompted to select a dataset from
a database if none was supplied from the command line.
- Fixed a bug where files containing trailing nulls did not load
properly.
- Fixed a bug related to finding specific data columns by name.
- Fixed a bug with writing summary files.
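
The file name length check added in this release might look something like the
sketch below. The changelog does not say how names are truncated, so the
strategy here (shorten the base name, keep the extension) is an assumption,
and the helper name truncate_filename and the MAX_NAME_LENGTH constant are
hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Limit taken from the changelog entry; the real check may differ.
use constant MAX_NAME_LENGTH => 256;

# Hypothetical helper: shorten an over-long file name, keeping its extension.
sub truncate_filename {
    my ($path, $max) = @_;
    $max = MAX_NAME_LENGTH unless defined $max;

    # Split into base name, directory, and extension (with optional .gz).
    my ($base, $dir, $suffix) = fileparse($path, qr/\.[^.]+(?:\.gz)?/);

    # Names within the limit are returned untouched.
    return $path if length($base . $suffix) <= $max;

    my $keep = $max - length($suffix);
    die "Extension alone exceeds $max characters\n" if $keep < 1;

    warn "File name exceeds $max characters and will be truncated\n";
    return $dir . substr($base, 0, $keep) . $suffix;
}
```

Only the final path component is measured, since the limit applies to the
file name rather than to the full path.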
v.1.10.1 (svn 568)
- Added support for Bio::DB::Fasta in the main BioToolBox library, and
added the support to scripts data2fasta.pl and CpG_calculator.pl. Any
BioToolBox program that requires chromosome information or sequence can
now use a genomic multi-fasta or directory of fasta files in the --db
option.
- Fixed critical error in data2gff.pl that prevented files from being
converted to GFF format.
- Fixed critical error in merge_datasets.pl that prevented column headers
from being written to the output file.
- Made the warning about unavailable files on the UCSC FTP server less
scary in the script ucsc_table2gff3.pl.
- Updated and clarified some script documentation.
v.1.10 (svn 559)
- Significantly improved performance when collecting data from Bam files
by using a low-level API. Improvements of at least 2X may be realized.
- Significantly improved the performance of the bam2wig.pl script by at
least 2X. Added a new option of recording extended regions across the
predicted fragment based on empirically determined shift values.
Sampling to determine shift values has been increased. BedGraph files
are now written more efficiently. A maximum number of identical reads is
now enforced.
- Significantly improved the performance of the split_bam_by_isize.pl
script to increase speed by at least 2X. Added an option to skip
checking of mates. Improved reporting of results.
- Added a filter option to remove overlapping nucleosomes in script
verify_nucleosome_mapping.pl; also fixed bugs in reporting offset
distances and improved output reporting.
- Removed confusing separate scan and tag datasets required for script
map_nucleosomes.pl. Cleaned up and organized code. Fixed bugs that
prevented datasets from being validated.
- Fixed critical bug where data was not collected for the final row in
script get_datasets.pl.