Bio-ToolBox

CHANGES

	parallel process forks.
	- Fixed bug where the interactive menu would exit upon an empty value
	in script manipulate_datasets.pl. A "q" must now be provided to exit.
	- Minor optimization when calculating shift values in script bam2wig.pl.

v.1.12 (svn 619)
	- Major improvements to performance of some data collection scripts by 
	adding multi-threaded options. These include get_datasets.pl, 
	get_relative_data.pl, average_gene.pl, and bam2wig.pl. The number of 
	CPU forks may be specified with the --cpu option (default 2). This option 
	requires the installation of Parallel::ForkManager, available through 
	CPAN. Run the check_dependencies.pl script to install it.
	- All gzip compression reads and writes are now forked through an 
	external gzip utility for a considerable boost in performance (2-5X). 
	The gzip executable must be in your path for this to work (it usually 
	is in Unix-like environments).
	- Added --long option when collecting data from long features in script 
	average_gene.pl.
	- Improved efficiency when collecting data from very large windows in 
	both get_relative_data.pl and average_gene.pl.
	- Summing the total number of read alignments in Bam files is also  
	multi-threaded. Summing the total number of intervals in a BigBed file 
	is also improved.
	- Fixed a critical error where not all windows had data collected when 
	using the script get_relative_data.pl.

v.1.11 (svn 603)
	- Major revision of how features are now retrieved from the database 
	using primary_IDs rather than relying on unique names in the database. 
	Generating lists of features will now return Primary_ID, Name, and Type. 
	The Primary_ID is unique to a database and is usually non-portable. 
	Current feature lists with only Name and Type will still work, and are 
	subject to limitations of non-unique Names in the database. This affects 
	all scripts that work with database features, including get_features.pl, 
	get_feature_info.pl, get_datasets.pl, get_relative_data.pl, 
	average_gene.pl, get_intersecting_features.pl, and correlate_position_data.pl.
	- GFF3 annotation scripts get_ensembl_annotation.pl and ucsc_table2gff3.pl 
	now produce GFF3 files that better match the GFF3 specification. Names 
	are no longer made unique (which broke ties with the originating data), 
	proper Dbxref tags are attributed when external sources could be 
	identified, and chromosomes are now sorted by name. Other minor 
	improvements were also made.
	- Fixed critical bug that prevented spliced alignments from being 
	counted in script bam2wig.pl. Thanks to Pinal K. for reporting.

v.1.10.3 (svn 597)
	- Unified column names and improved their recognition in scripts 
	get_feature_info.pl and the graphing scripts graph_data.pl, 
	graph_histogram.pl, and graph_profile.pl.
	- Graphing scripts now write the output graph directory in the input 
	file parent directory instead of the current directory.

v.1.10.2 (svn 591)
	- Added a new option of position when adjusting coordinates of retrieved
	features using the script get_features.pl. Coordinates may be adjusted
	at the 5 prime, 3 prime, or both ends of stranded features. This also 
	fixes bugs where collected features on the reverse strand with adjusted
	coordinates were not reported properly.
	- Improved automatic recognition of the name, score, and other columns
	in the converter scripts data2bed.pl, data2gff.pl, and data2wig.pl.
	- Improved the Cluster and Treeview export function in script
	manipulate_datasets.pl. The CDT files generated now include separate ID
	and NAME columns per the specification, and new manipulations are
	included prior to exporting, including percentile rank and log2.
	- The convert null function now also converts zero values if requested
	in script manipulate_datasets.pl.
	- Added new option of a minimum size when trimming windows in the script
	find_enriched_regions.pl.
	- Increased the radius from 35 bp to 50 bp when verifying a putative
	mapped nucleosome in script map_nucleosomes.pl, leading to fewer
	overlapping or offset nucleosomes.
	- Added new option to re-center offset nucleosomes in script
	verify_nucleosome_mapping.pl. Also improved report formatting.
	- Added checks and warnings when writing file names longer than 256
	characters. Some scripts automatically generate file names that may
	exceed this limit, preventing writing. File names are now truncated.
	Thanks to Adam F. for reporting.
	- Added new methods and code improvements to the gff3 parsing library.
	- Fixed a bug in script merge_datasets.pl where the column index for a
	second file was not always properly validated, leading to premature
	termination.
	- Fixed a bug where multiple datasets combined with an ampersand for
	merging were not properly verified. 
	- Fixed a bug where a user may not be prompted to select a dataset from
	a database if none was supplied from the command line.
	- Fixed a bug where files containing trailing nulls did not load
	properly.
	- Fixed a bug related to finding specific data columns by name.
	- Fixed a bug with writing summary files.

v.1.10.1 (svn 568)
	- Added support for Bio::DB::Fasta in the main BioToolBox library, and
	added the support to scripts data2fasta.pl and CpG_calculator.pl. Any
	BioToolBox program that requires chromosome information or sequence can
	now use a genomic multi-fasta or directory of fasta files in the --db
	option.
	- Fixed critical error in data2gff.pl that prevented files from being
	converted to GFF format.
	- Fixed critical error in merge_datasets.pl that prevented column headers
	from being written to the output file.
	- Made the warning about unavailable files on the UCSC FTP server less
	scary in the script ucsc_table2gff3.pl.
	- Updated and clarified some script documentation.

v.1.10 (svn 559)
	- Significantly improved performance when collecting data from Bam files
	by using a low level API. Improvements of at least 2X may be realized.
	- Significantly improved the performance of the bam2wig.pl script by at
	least 2X. Added a new option of recording extended regions across the
	predicted fragment based on empirically determined shift values.
	Sampling to determine shift values has been increased. BedGraph files
	are now written more efficiently. A maximum number of identical reads is
	now enforced.
	- Significantly improved the performance of the split_bam_by_isize.pl
	script to increase speed by at least 2X. Added an option to skip
	checking of mates. Improved reporting of results.
	- Added a filter option to remove overlapping nucleosomes in script
	verify_nucleosome_mapping.pl; also fixed bugs in reporting offset
	distances and improved output reporting.
	- Removed confusing separate scan and tag datasets required for script
	map_nucleosomes.pl. Cleaned up and organized code. Fixed bugs that

v.1.9.5 (svn 525)
	- Changed the non-intuitive --except option to a more intuitive --zero
	option in script manipulate_datasets.pl; this is now a boolean option to
	include or exclude zero values when calculating statistics. The printed
	statistics output has also been cleaned up and no longer includes
	decimal formatting. The export function will generate a name on its
	own when executed automatically.
	- Added capability to use a column of source values rather than a static
	text string for the GFF source tag in script data2gff.pl. Also made
	improvements to the interactive ask session.
	- Added the capability to use a big file dataset as the database for
	chromosome information in script find_enriched_regions.pl.
	- Added an option to automatically convert the output file to a BED file
	in script get_gene_regions.pl, and included a description of the --in
	option in the POD documentation.

v.1.9.4 (svn 519)
	- Fixed first critical bug in script get_datasets.pl where strand
	information in input files with genomic coordinates (e.g. BED files) was
	not considered when adjusting coordinates (start, stop, or fractional). 
	- Fixed second critical bug in script get_datasets.pl where collecting
	fractional data for named database features resulted in data collection
	over the entire feature.
	- Improved interpretation of input file features as genomic regions or
	named features in script get_datasets.pl.
	- Changed the --set_strand option to --force_strand in multiple data
	collection scripts. This should make the function a little more obvious
	as to its purpose. Documentation changed as appropriate.

v.1.9.3 (svn 516)
	- Fixed bug where wig definition lines may not be written when no
	alignments exist in the first 2 Mb of a chromosome when converting a bam
	file to a wig file in script bam2wig.pl. Definition lines are now always
	written. Thanks to Matt J. for reporting.
	- Fixed bug where the format_with_commas sub was not properly imported
	into the tim_db_helper library.
	- Fixed bug where the bed output from script get_features.pl did not
	properly report strand information.

v.1.9.2 (svn 510)
	- Fixed critical bug where codon changes were not reported correctly for
	minus strand genes in script locate_SNPs.pl. Thanks to Craig K. for
	reporting.

v.1.9.1 (svn 507)
	- Added critical code to interpret strand information from input files
	such as Bed and GFF into BioPerl standards. Essential for collecting
	stranded data. Strand information is also properly written back for
	valid Bed and GFF files.
	- Updated and unified internal library methods for validating and
	requesting database feature types. By default, all database features are
	presented to the user as a list when selecting database features to
	collect data. The source_exclude parameter in the biotoolbox.cfg
	configuration file is now deprecated.
	- Upgraded script get_intersecting_features.pl to automatically
	recognize input file columns and search for more than one feature type.
	- Fixed bug in script get_datasets.pl where the program would not
	continue when only a data database was provided.
	- Fixed bug of requesting index when using a .kgg file as a gene list in
	script pull_features.pl.
	- Fixed bug in generating file name for Treeview export function in
	script manipulate_datasets.pl.
	- Fixed behavior when reading files to prevent adding the current
	program name to the metadata when the input file does not have this
	metadata.
	- Minor updates to script novo_wrapper.pl.
	
v.1.9.0 (svn 493)
	- Added new script get_features.pl which generates a list of features
	for one or more feature types from a database. Information about the
	features may be returned, including name, type, and coordinates. Sub
	features may be included. The data may be written as a BioToolBox
	formatted text file, GFF or BED.
	- Added new script correlate_position_data.pl that calculates a Pearson
	correlation between the score values at identical positions along a
	feature between two datasets. This helps in identifying changes in
	spatial distribution of values. An option for calculating shifts is also
	available.
	- Improved Big File generation such that Bio::DB::BigWig or
	Bio::DB::BigBed is no longer required just to generate the big file, as
	conversion uses external utilities anyway.
	- Fixed generation of bin values when calculating distribution
	frequencies in scripts data2frequency.pl and graph_histogram.pl.

v.1.8.7 (svn 487)
	- Added new command line options to script merge_datasets.pl to control
	the program's behavior. The "--lookupname" option allows you to specify
	the name of the lookup column, while "--manual" turns off all automatic
	guessing of columns. Also improved handling of original_file metadata.
	- Added a new option to collect data from long features (such as genomic
	annotations) instead of point data (microarray or sequence data) in
	script get_relative_data.pl.
	- Added option to convert to and from Roman numerals in chromosome names
	and support for wig files in script change_chr_prefix.pl.
	- Added option to change the IP port number when connecting to a remote
	MySQL database host in script get_ensembl_annotation.pl.
	- Fixed bug to properly close opened files in script split_data_file.pl
	and avoid unnecessary error messages.
	- Modified statements and warnings regarding step and span values in
	script data2wig.pl.

v.1.8.6 (svn 477)
	- Added numerous enhancements and bug fixes to script data2wig.pl,
	including automatically assigning the span parameter in the wig file,
	identifying coordinate columns, adding command line options for
	coordinate columns, and updating the POD documentation.
	- Improved the treeview export function in script manipulate_datasets.pl
	to include different manipulations, including median center of genes or
	datasets, converting to Z-scores, and converting null values. Also
	changed the default output name to <basename>.cdt.
	- Added advanced option to script merge_datasets.pl to specify the
	column order on the command line instead of interactively. Also
	increased the number of columns that can be specified as letters.
	- Added the "value" command line option to specify the type of data to
	collect to the script find_enriched_regions.pl. Also added the sum
	method plus some improvements for identifying depleted regions.
	- Updated the script run_cluster.pl to accept any file name as input,
	and added basic file format validation checks prior to running the
	cluster algorithm, among a few other minor improvements.
	- Improved handling of error messages when attempting to open databases
	that do not exist or cannot otherwise be opened.


