Bio-ToolBox

	- Minor optimization when calculating shift values in script bam2wig.pl.

v.1.12 (svn 619)
	- Major improvements to performance of some data collection scripts by 
	adding multi-threaded options. These include get_datasets.pl, 
	get_relative_data.pl, average_gene.pl, and bam2wig.pl. The number of 
	CPU forks may be specified with the --cpu option (default 2). This option 
	requires the installation of Parallel::ForkManager, available through 
	CPAN. Run the check_dependencies.pl script to install it.
	- All gzip compression reads and writes are now forked through an 
	external gzip utility for a considerable boost in performance (2-5X). 
	The gzip executable must be in your path for this to work (it usually 
	is in Unix-like environments).
	- Added --long option when collecting data from long features in script 
	average_gene.pl.
	- Improved efficiency when collecting data from very large windows in 
	both get_relative_data.pl and average_gene.pl.
	- Summing the total number of read alignments in Bam files is also  
	multi-threaded. Summing the total number of intervals in a BigBed file 
	is also improved.
	- Fixed a critical error where not all windows had data collected when 
	using the script get_relative_data.pl.

v.1.11 (svn 603)
	- Major revision of how features are retrieved from the database: they 
	are now looked up by Primary_ID rather than by relying on unique names. 
	Generating lists of features will now return Primary_ID, Name, and Type. 
	The Primary_ID is unique to a database and is usually non-portable. 
	Current feature lists with only Name and Type will still work, but remain 
	subject to the limitations of non-unique Names in the database. This affects 
	all scripts that work with database features, including get_features.pl, 
	get_feature_info.pl, get_datasets.pl, get_relative_data.pl, 
	average_gene.pl, get_intersecting_features.pl, and correlate_position_data.pl.
	- GFF3 annotation scripts get_ensembl_annotation.pl and ucsc_table2gff3.pl 
	now produce GFF3 files that better match the GFF3 specification. Names 
	are no longer made unique (which broke ties with the originating data), 
	proper Dbxref tags are attributed when external sources could be 
	identified, and chromosomes are now sorted by name. Other minor 
	improvements were also made.
	- Fixed critical bug that prevented spliced alignments from being 
	counted in script bam2wig.pl. Thanks to Pinal K. for reporting.

v.1.10.3 (svn 597)
	- Unified column names and improved their recognition in scripts 
	get_feature_info.pl and the graphing scripts graph_data.pl, 
	graph_histogram.pl, and graph_profile.pl.
	- Graphing scripts now write the output graph directory in the input 
	file parent directory instead of the current directory.

v.1.10.2 (svn 591)
	- Added a new position option for adjusting the coordinates of retrieved
	features in the script get_features.pl. Coordinates may be adjusted
	at the 5 prime, 3 prime, or both ends of stranded features. This also 
	fixes bugs where collected features on the reverse strand with adjusted
	coordinates were not reported properly.
	- Improved automatic recognition of the name, score, and other columns
	in the converter scripts data2bed.pl, data2gff.pl, and data2wig.pl. 
	- Improved the Cluster and Treeview export function in script
	manipulate_datasets.pl. The CDT files generated now include separate ID
	and NAME columns per the specification, and new manipulations are
	included prior to exporting, including percentile rank and log2.
	- The convert null function now also converts zero values if requested
	in script manipulate_datasets.pl.
	- Added new option of a minimum size when trimming windows in the script
	find_enriched_regions.pl.
	- Increased the radius from 35 bp to 50 bp when verifying a putative
	mapped nucleosome in script map_nucleosomes.pl, leading to fewer
	overlapping or offset nucleosomes.
	- Added new option to re-center offset nucleosomes in script
	verify_nucleosome_mapping.pl. Also improved report formatting.
	- Added checks and warnings when writing file names longer than 256
	characters. Some scripts automatically generate file names that may
	exceed this limit, preventing writing. File names are now truncated.
	Thanks to Adam F. for reporting.
	- Added new methods and code improvements to the gff3 parsing library.
	- Fixed a bug in script merge_datasets.pl where the column index for a
	second file might not be properly validated, leading to premature
	termination.
	- Fixed a bug where multiple datasets combined with an ampersand for
	merging were not properly verified. 
	- Fixed a bug where a user might not be prompted to select a dataset from
	a database if none was supplied from the command line.
	- Fixed a bug where files containing trailing nulls did not load
	properly.
	- Fixed a bug related to finding specific data columns by name.
	- Fixed a bug with writing summary files.

v.1.10.1 (svn 568)
	- Added support for Bio::DB::Fasta in the main BioToolBox library, and
	added the support to scripts data2fasta.pl and CpG_calculator.pl. Any
	BioToolBox program that requires chromosome information or sequence can
	now use a genomic multi-fasta or directory of fasta files in the --db
	option.
	- Fixed critical error in data2gff.pl that prevented files from being
	converted to GFF format.
	- Fixed critical error in merge_datasets.pl that prevented column headers
	from being written to the output file.
	- Made the warning about unavailable files on the UCSC FTP server less
	scary in the script ucsc_table2gff3.pl.
	- Updated and clarified some script documentation.

v.1.10 (svn 559)
	- Significantly improved performance when collecting data from Bam files
	by using a low level API. Improvements of at least 2X may be realized.
	- Significantly improved the performance of the bam2wig.pl script by at
	least 2X. Added a new option of recording extended regions across the
	predicted fragment based on empirically determined shift values.
	Sampling to determine shift values has been increased. BedGraph files
	are now written more efficiently. A maximum number of identical reads is
	now enforced.
	- Significantly improved the performance of the split_bam_by_isize.pl
	script, increasing speed by at least 2X. Added an option to skip
	checking of mates. Improved reporting of results.
	- Added a filter option to remove overlapping nucleosomes in script
	verify_nucleosome_mapping.pl; also fixed bugs in reporting offset
	distances and improved output reporting.
	- Removed confusing separate scan and tag datasets required for script
	map_nucleosomes.pl. Cleaned up and organized code. Fixed bugs that
	prevented datasets from being validated.
	- Fixed critical bug where data was not collected for the final row in
	script get_datasets.pl.


