Bio-ToolBox

scripts/manipulate_wig.pl


sub process_score {
	my $v = shift;    # score

	# manipulations run in the documented order; null-like strings
	# (null, NA, N/A, NaN, inf, -inf) are zeroed first
	if ( $doNull and $v =~ m/^(?: null | n.?[na] | \-?inf ) $/xi ) { $v = 0 }
	if ($deLogValue)                 { $v = $deLogValue**$v }
	if ($doAbsolute)                 { $v = abs($v) }
	if ($multiplyValue)              { $v *= $multiplyValue }
	if ($addValue)                   { $v += $addValue }
	if ($logValue)                   { $v = $v == 0 ? 0 : log($v) / $logValue }
	if ( $doMin and $v < $minValue ) { $v = $minValue }
	if ( $doMax and $v > $maxValue ) { $v = $maxValue }
	if ($places)                     { $v = sprintf( $places, $v ) }

	# an undefined return tells the caller to drop the line
	return undef if ( $noZeroes and $v == 0 );
	return $v;
}

sub process_step_stats {
	return unless defined $_[0];
	for ( 1 .. $span ) {
		$stats->{count}      += 1;
		$stats->{sumData}    += $_[0];
		$stats->{sumSquares} += $_[0]**2;
		$stats->{minVal} = $_[0] if not defined $stats->{minVal};
		$stats->{maxVal} = $_[0] if not defined $stats->{maxVal};
		$stats->{minVal} = $_[0] if $_[0] < $stats->{minVal};
		$stats->{maxVal} = $_[0] if $_[0] > $stats->{maxVal};
	}
}

sub binVariance {
	return 0 unless $stats->{count};
	my $var = $stats->{sumSquares} - $stats->{sumData}**2 / $stats->{count};
	if ( $stats->{count} > 1 ) {
		$var /= $stats->{count} - 1;
	}
	return 0 if $var < 0;
	return $var;
}

__END__

=head1 NAME

manipulate_wig.pl

A program to manipulate wiggle files.

=head1 SYNOPSIS

manipulate_wig.pl [options] -i <file1.wig> -o <file1.out.wig>

  File Options: 
  -i --in <file>            Input file. Accepts 'stdin'.
  -o --out <file>           Output file. Accepts 'stdout'.
  
  Selection functions:
  -k --skip <regex>         Skip lines where chromosomes match regex 
  -y --apply <regex>        Only apply manipulations to matching chromosomes 
  
  Manipulation functions (in order of execution):
  -u --null                 Convert null, NA, N/A, NaN, inf values to 0
  -d --delog [2|10]         Delog values of given base 
  -b --abs                  Convert to the absolute value 
  -m --mult <float>         Multiply score by the given value
  -a --add <float>          Add the given value to the score
  -l --log [2|10]           Convert to log2 or log10. 
  -n --min <float>          Set the minimum score
  -x --max <float>          Set the maximum score
  -p --place <int>          Format score to the given decimal places
  -z --zero                 Discard lines with zero values

  BigWig support:
  --chromo <file>           Chromosome sizes file for writing bigWig
  --db <file>               Indexed file to obtain chromosome info
  --bw2w <path>             Path to UCSC bigWigToWig utility
  --w2bw <path>             Path to UCSC wigToBigWig utility
  
  General functions:
  -t --stats                Calculate statistics 
  -v --version              Print version and exit
  -h --help                 Show extended documentation

=head1 OPTIONS

The command line flags and descriptions:

=head2 File options

=over 4

=item --in E<lt>fileE<gt>

Specify the input wig file. All three formats, variableStep, fixedStep, and 
bedGraph, are supported. Files may be gzipped. BigWig files are supported, 
so long as the UCSC bigWigToWig utility is available. Alternatively, the input 
may be read from standard input by specifying 'stdin' as the file name. 

=item --out E<lt>fileE<gt>

Specify the output wig file. The output format will be the same format as the
input. The file may be gzipped by appending F<.gz> to the name. BigWig files are
supported, so long as the UCSC wigToBigWig utility is available and a chromosome
file is provided. Alternatively, the output may be sent to standard output by
specifying 'stdout' as the file name. 

=back

=head2 Selection functions

=over 4

=item --skip E<lt>regexE<gt>

Selectively skip (discard) lines whose chromosome name matches the 
provided regular expression. For example, to skip the mitochondrial, 
unplaced, and random contigs, use C<chrM|chrUn|random>.

=item --apply E<lt>regexE<gt>

Selectively apply manipulation functions only to chromosomes that match 
the provided regular expression, leaving all remaining lines untouched. 
For example, to apply a normalization to the X chromosome only, use C<chrX>.

=back
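
The two selection options can be pictured with a minimal sketch. It assumes
bedGraph-style, tab-delimited lines and that skipping is tested before the
apply filter. The C<filter_line> helper and its arguments are hypothetical
names used only for illustration and are not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper for bedGraph-style lines (chrom, start, end, score).
# $skip and $apply are compiled regexes or undef; $code manipulates one score.
# Assumption: the skip test runs before the apply test.
sub filter_line {
	my ( $line, $skip, $apply, $code ) = @_;
	my ( $chr, $start, $end, $score ) = split /\t/, $line;

	# a matching --skip regex discards the line entirely
	return if defined $skip and $chr =~ $skip;

	# without --apply every line is manipulated; with it, only matches are
	if ( not defined $apply or $chr =~ $apply ) {
		$score = $code->($score);
	}
	return join "\t", $chr, $start, $end, $score;
}

my $skip   = qr/chrM|chrUn|random/;
my $apply  = qr/chrX/;
my $double = sub { $_[0] * 2 };    # stand-in for any manipulation

for my $line ( "chr1\t0\t10\t1.5", "chrX\t0\t10\t1.5", "chrM\t0\t10\t1.5" ) {
	my $out = filter_line( $line, $skip, $apply, $double );
	print defined $out ? "$out\n" : "(skipped)\n";
}
```

In this sketch the chr1 line passes through unchanged, the chrX score is
doubled, and the chrM line is dropped. Note that an unanchored pattern such
as C<chrX> would also match a name like C<chrX_random>.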

=head2 Manipulation functions

=over 4

=item --null

Convert lines with a score of C<null>, C<NA>, C<N/A>, C<NaN>, or C<inf> to 
a value of 0. 

=item --delog [2|10]

Convert line scores out of log space using the indicated base, so that 
each score C<v> becomes the base raised to the power C<v>.

=item --abs

Convert line scores to absolute values.

=item --mult E<lt>floatE<gt>

Multiply line scores by the indicated value.

=item --add E<lt>floatE<gt>

Add the indicated value to each line score.

=item --log [2|10]
 
Convert the line score to its logarithm in the indicated base. A score 
of exactly zero is left as zero.

=item --min E<lt>floatE<gt>

Set the minimum floor score. Any score below the indicated value 
will be set to the indicated value.

=item --max E<lt>floatE<gt>

Set the maximum ceiling score. Any score above the indicated value 
will be set to the indicated value.

=item --place E<lt>integerE<gt>

Format the score value to the indicated number of decimal places.

=item --zero 

Discard lines with a score value of zero.

=back
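
Because the manipulations run in the order listed above, combining options
is order-sensitive: C<--mult> is applied before C<--add>, and both are
applied before C<--log>. The following minimal sketch mirrors that order in
a self-contained form. The C<manipulate> function and its option hash are
hypothetical names for illustration only. It also assumes that C<--place>
maps to a C<%.Nf> format and that the log conversion divides by the natural
log of the base.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the documented manipulations, in order, to one
# score. Keys of %opt mirror the long option names; this is a sketch and not
# the script's actual interface.
sub manipulate {
	my ( $v, %opt ) = @_;
	if ( $opt{null} and $v =~ m/^(?: null | n.?[na] | \-?inf ) $/xi ) { $v = 0 }
	if ( $opt{delog} ) { $v = $opt{delog}**$v }
	if ( $opt{abs} )   { $v = abs $v }
	if ( $opt{mult} )  { $v *= $opt{mult} }
	if ( $opt{add} )   { $v += $opt{add} }

	# assumption: log in base b is computed as ln(v) / ln(b); zero stays zero
	if ( $opt{log} ) { $v = $v == 0 ? 0 : log($v) / log( $opt{log} ) }
	if ( defined $opt{min} and $v < $opt{min} ) { $v = $opt{min} }
	if ( defined $opt{max} and $v > $opt{max} ) { $v = $opt{max} }

	# assumption: --place N corresponds to a "%.Nf" sprintf format
	if ( defined $opt{place} ) { $v = sprintf( "%.$opt{place}f", $v ) }

	# an undefined result means the line would be discarded
	return if $opt{zero} and $v == 0;
	return $v;
}

# abs(-4) = 4, then 4 * 0.5 = 2, then clamped to the maximum of 1.5
print manipulate( -4, abs => 1, mult => 0.5, max => 1.5 ), "\n";    # prints 1.5

# 2**3 = 8, formatted to one decimal place
print manipulate( 3, delog => 2, place => 1 ), "\n";                # prints 8.0
```

Since the zero test comes last, a score that only becomes zero after
clamping or rounding is also discarded when C<--zero> is given.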

=head2 BigWig support

=over 4

=item --chromo E<lt>fileE<gt>

When writing to a bigWig output file, provide a chromosome sizes text 
file for use with the F<wigToBigWig> utility. Alternatively, use a 
database file, below.

=item --db E<lt>fileE<gt>

When writing to a bigWig output file, provide an indexed database file, 
such as another bigWig file, a Bam file, or an indexed Fasta file, from 
which a chromosome sizes text file is automatically generated for use with 
the F<wigToBigWig> utility. If a bigWig input file was specified, it will 
be used as the database by default. B<Note> that the C<--skip> option will 
be applied to the generated chromosome file.


