Acme-Tools

 view release on metacpan or  search on metacpan

Tools.pm  view on Meta::CPAN


However, there is two methods of interpolation:

Example, we have only three numbers: 5, 6 and 7.

Method 1: The most common is to say that 5 and 7 lays on the 25- and
75-percentile. This method is used in Acme::Tools.

Method 2: In Oracle databases the least and greatest numbers
always lay on the 0- and 100-percentile.

As an argument on why Oracles (and others?) definition is not the best way is to
look at your data as for instance temperature measurements.  If you
place the highest temperature on the 100-percentile you are sort of
saying that there can never be a higher temperatures in future measurements.

A quick non-exhaustive Google survey suggests that method 1 here is most used.

The larger the data sets, the less difference there is between the two methods.

B<Extrapolation:>

In method one, when you want a percentile outside of any possible
interpolation, you use the smallest and second smallest to extrapolate
from. For instance in the data set C<5, 6, 7>, if you want an
x-percentile of x < 25, this is below 5.

If you feel tempted to go below 0 or above 100, C<percentile()> will
I<die> (or I<croak> to be more precise)

Another method could be to use "soft curves" instead of "straight
lines" in interpolation. Maybe B-splines or Bezier curves. This is not
used here.

For large sets of data Hoares algorithm would be faster than the
simple straightforward implementation used in C<percentile()>
here. Hoares don't sort all the numbers fully.

B<Differences between the two main methods described above:>

 Data: 1, 4, 6, 7, 8, 9, 22, 24, 39, 49, 555, 992

 Percentile    Method 1                      Method 2
               (Acme::Tools::percentile      (Oracle)
               and others)
 ------------- ----------------------------- ---------
 0             -2                            1
 1             -1.61                         1.33
 25            6.25                          6.75
 50 (median)   15.5                          15.5
 75            46.5                          41.5
 99            1372.19                       943.93
 100           1429                          992

Found like this:

 perl -MAcme::Tools -le 'print for percentile([0,1,25,50,75,99,100], 1,4,6,7,8,9,22,24,39,49,555,992)'

And like this in Oracle-databases:

 select
   percentile_cont(0.00) within group(order by n) per0,
   percentile_cont(0.01) within group(order by n) per1,
   percentile_cont(0.25) within group(order by n) per25,
   percentile_cont(0.50) within group(order by n) per50,
   percentile_cont(0.75) within group(order by n) per75,
   percentile_cont(0.99) within group(order by n) per99,
   percentile_cont(1.00) within group(order by n) per100
 from (
   select 0+regexp_substr('1,4,6,7,8,9,22,24,39,49,555,992','[^,]+',1,i) n
   from dual,(select level i from dual connect by level <= 12)
 );

(Oracle also provides a similar function: C<percentile_disc> where I<disc>
is short for I<discrete>, meaning no interpolation is taking
place. Instead the closest number from the data set is picked.)

=cut

sub percentile {
  my(@p,@t,@ret);
  if(ref($_[0]) eq 'ARRAY'){ @p=@{shift()} }
  elsif(not ref($_[0]))    { @p=(shift())  }
  else{croak()}
  @t=@_;
  return if !@p;
  croak if !@t;
  @t=sort{$a<=>$b}@t;
  push@t,$t[0] if @t==1;
  for(@p){
    croak if $_<0 or $_>100;
    my $i=(@t+1)*$_/100-1;
    push@ret,
      $i<0       ? $t[0]+($t[1]-$t[0])*$i:
      $i>$#t     ? $t[-1]+($t[-1]-$t[-2])*($i-$#t):
      $i==int($i)? $t[$i]:
                   $t[$i]*(int($i+1)-$i) + $t[$i+1]*($i-int($i));
  }
  return @p==1 ? $ret[0] : @ret;
}

=head1 RANDOM

=head2 random

B<Input:> One or two arguments.

B<Output:>

If two integer arguments: returns a random integer between the integers in argument one and two.

If the first argument is an arrayref: returns a random member of that array without changing the array.

If the first argument is an arrayref and there is a second arg: return that many random members of that array

If the first argument is an hashref and there is no second arg: return a random key weighted by the values of that hash

If the first argument is an hashref and there is a second arg: return that many random keys weighted by the values of that hash

If there is no second argument and the first is an integer, a random integer between 0 and that number is returned. Including 0 and the number itself.

B<Examples:>

 $dice=random(1,6);                                      # 1, 2, 3, 4, 5 or 6
 $dice=random([1..6]);                                   # same as previous
 @dice=random([1..6],10);                                # 10 dice tosses
 $dice=random({1=>1, 2=>1, 3=>1, 4=>1, 5=>1, 6=>2});     # weighted dice with 6 being twice as likely as the others
 @dice=random({1=>1, 2=>1, 3=>1, 4=>1, 5=>1, 6=>2},10);  # 10 weighted dice tosses
 print random({head=>0.4999,tail=>0.4999,edge=>0.0002}); # coin toss (sum 1 here but not required to be)
 print random(2);                                        # prints 0, 1 or 2
 print 2**random(7);                                     # prints 1, 2, 4, 8, 16, 32, 64 or 128

Tools.pm  view on Meta::CPAN

 print "count: $c\n";                                    # prints random number up to 24 = 4*3*2*1 = 4!

=cut

sub perm {
    my(@i,@r) = 0..$#_;
    @_ || return;
    while ( push @r, [@_[@i]] ) {
	my $p = $#i || last;
	--$p || last while $i[$p-1] > $i[$p];
	push @i, reverse splice @i, my$q=$p;
	++$q while $i[$p-1] > $i[$q];
	@i[$p-1,$q] = @i[$q,$p-1];
    }
    @r
}

sub permute (&@) {
    return permute_continue(@_) if 'CODE,ARRAY,ARRAY' eq join',',map ref,@_;
    my $f = shift;
    my @i = 0..$#_;
    my $n = 0;
    @_ || do{ &$f(@_); return 0 };
    while ( ++$n and &$f(@_[@i]) ) {
	my $p = $#i || last;
	--$p || last while $i[$p-1] > $i[$p];
	push @i, reverse splice @i, my$q=$p;
	++$q while $i[$p-1] > $i[$q];
	@i[$p-1,$q] = @i[$q,$p-1];
    }
    $n;
}

#Fischer-Krause permutation starting from a specific sequence, for example to farm out permute to more than one process
sub permute_continue (&\@\@) {
    my ($f,$begin,$from) = @_;
    my %h; @h{@$begin} = 0 .. $#$begin;
    my @idx = @h{@$from};
    my $n = 0;
    while ( ++$n and &$f(@$begin[@idx]) ) {
	my $p = $#idx || last;
	--$p || last while $idx[$p-1] > $idx[$p];
	push @idx, reverse splice @idx, my$q=$p;
	++$q while $idx[$p-1] > $idx[$q];
	@idx[$p-1,$q]=@idx[$q,$p-1];
    }
    $n
}


=head2 cart

Cartesian product

B<Easy usage:>

Input: two or more arrayrefs with accordingly x, y, z and so on number of elements.

Output: An array of x * y * z number of arrayrefs. The arrays being the cartesian product of the input arrays.

It can be useful to think of this as joins in SQL. In C<select> statements with
more than one table behind C<from>, but without any C<where> condition to join the tables.

B<Advanced usage, with condition(s):>

B<Input:>

- Either two or more arrayrefs with x, y, z and so on number of elements.

- Or coderefs to subs containing condition checks. Somewhat like C<where> conditions in SQL.

B<Output:> An array of x * y * z number of arrayrefs (the cartesian product)
minus the ones that did not fulfill the condition(s).

This of is as joins with one or more where conditions as coderefs.

The coderef input arguments can be placed last or among the array refs
to save both runtime and memory if the conditions depend on
arrays further back.

B<Examples, this:>

 for(cart(\@a1,\@a2,\@a3)){
   my($a1,$a2,$a3) = @$_;
   print "$a1,$a2,$a3\n";
 }

Prints the same as this:

 for my $a1 (@a1){
   for my $a2 (@a2){
     for my $a3 (@a3){
       print "$a1,$a2,$a3\n";
     }
   }
 }

B<This:> with a condition: the sum of the first two should be divisible by 3:

 for( cart( \@a1, \@a2, sub{sum(@$_)%3==0}, \@a3 ) ) {
   my($a1,$a2,$a3)=@$_;
   print "$a1,$a2,$a3\n";
 }

Prints the same as this:

 for my $a1 (@a1){
   for my $a2 (@a2){
     next if 0==($a1+$a2)%3;
     for my $a3 (@a3){
       print "$a1,$a2,$a3\n";
     }
   }
 }

B<Examples, from the tests:>

 my @a1 = (1,2);
 my @a2 = (10,20,30);
 my @a3 = (100,200,300,400);

Tools.pm  view on Meta::CPAN

Note: using sub-ref filters do not work (yet) in hash-mode. Use grep on result instead.

=cut

sub cart {
  my @ars=@_;
  if(!ref($_[0])){ #if hash-mode detected
    my(@k,@v); push@k,shift@ars and push@v,shift@ars while @ars;
    return map{my%h;@h{@k}=@$_;\%h}cart(@v);
  }
  my @res=map[$_],@{shift@ars};
  for my $ar (@ars){
    @res=grep{&$ar(@$_)}@res and next if ref($ar) eq 'CODE';
    @res=map{my$r=$_;map{[@$r,$_]}@$ar}@res;
  }
  return @res;
}

sub cart_easy { #not tested, not exported http://stackoverflow.com/questions/2457096/in-perl-how-can-i-get-the-cartesian-product-of-multiple-sets
  my $last = pop @_;
  @_ ? (map {my$left=$_; map [@$left, $_], @$last } cart_easy(@_) )
     : (map [$_], @$last);
}

=head2 reduce

From: Why Functional Programming Matters: L<http://www.md.chalmers.se/~rjmh/Papers/whyfp.pdf> L<http://www.cse.chalmers.se/~rjmh/Papers/whyfp.html>

L<http://www.md.chalmers.se/~rjmh/Papers/whyfp.html>

DON'T TRY THIS AT HOME, C PROGRAMMERS.

 sub reduce (&@) {
   my ($proc, $first, @rest) = @_;
   return $first if @rest == 0;
   local ($a, $b) = ($first, reduce($proc, @rest));
   return $proc->();
 }

Many functions can then be implemented with very little code. Such as:

 sub mean { (reduce {$a + $b} @_) / @_ }

=cut

sub reduce (&@) {
  my ($proc, $first, @rest) = @_;
  return $first if @rest == 0;
  no warnings;
  local ($a, $b) = ($first, reduce($proc, @rest));
  return $proc->();
}


=head2 pivot

Resembles the pivot table function in Excel.

C<pivot()> is used to spread out a slim and long table to a visually improved layout.

For instance spreading out the results of C<group by>-selects from SQL:

 pivot( arrayref, columnname1, columnname2, ...)

 pivot( ref_to_array_of_arrayrefs, @list_of_names_to_down_fields )

The first argument is a ref to a two dimensional table.

The rest of the arguments is a list which also signals the number of
columns from left in each row that is ending up to the left of the
data table, the rest ends up at the top and the last element of
each row ends up as data.

                   top1 top1 top1 top1
 left1 left2 left3 top2 top2 top2 top2
 ----- ----- ----- ---- ---- ---- ----
                   data data data data
                   data data data data
                   data data data data

Example:

 my @table=(
               ["1997","Gerd", "Weight", "Summer",66],
               ["1997","Gerd", "Height", "Summer",170],
               ["1997","Per",  "Weight", "Summer",75],
               ["1997","Per",  "Height", "Summer",182],
               ["1997","Hilde","Weight", "Summer",62],
               ["1997","Hilde","Height", "Summer",168],
               ["1997","Tone", "Weight", "Summer",70],

               ["1997","Gerd", "Weight", "Winter",64],
               ["1997","Gerd", "Height", "Winter",158],
               ["1997","Per",  "Weight", "Winter",73],
               ["1997","Per",  "Height", "Winter",180],
               ["1997","Hilde","Weight", "Winter",61],
               ["1997","Hilde","Height", "Winter",164],
               ["1997","Tone", "Weight", "Winter",69],

               ["1998","Gerd", "Weight", "Summer",64],
               ["1998","Gerd", "Height", "Summer",171],
               ["1998","Per",  "Weight", "Summer",76],
               ["1998","Per",  "Height", "Summer",182],
               ["1998","Hilde","Weight", "Summer",62],
               ["1998","Hilde","Height", "Summer",168],
               ["1998","Tone", "Weight", "Summer",70],

               ["1998","Gerd", "Weight", "Winter",64],
               ["1998","Gerd", "Height", "Winter",171],
               ["1998","Per",  "Weight", "Winter",74],
               ["1998","Per",  "Height", "Winter",183],
               ["1998","Hilde","Weight", "Winter",62],
               ["1998","Hilde","Height", "Winter",168],
               ["1998","Tone", "Weight", "Winter",71],
             );

.

 my @reportA=pivot(\@table,"Year","Name");
 print "\n\nReport A\n\n".tablestring(\@reportA);

Tools.pm  view on Meta::CPAN

  drow
  drows
  drowc
  drowsc
  dcols
  dpk
  dsel
  ddo
  dins
  dupd
  ddel
  dcommit
  drollback

=cut

#my$dummy=<<'SOON';
sub dtype {
  my $connstr=shift;
  return 'SQLite' if $connstr=~/(\.sqlite|sqlite:.*\.db)$/i;
  return 'Oracle' if $connstr=~/\@/;
  return 'Pg' if 1==2;
  die;
}

our($Dbh,@Dbh,%Sth);
our %Dbattr=(RaiseError => 1, AutoCommit => 0); #defaults
sub dlogin {
  my $connstr=shift();
  my %attr=(%Dbattr,@_);
  my $type=dtype($connstr);
  my($dsn,$u,$p)=('','','');
  if($type eq 'SQLite'){
    $dsn=$connstr;
  }
  elsif($type eq 'Oracle'){
    ($u,$p,$dsn)=($connstr=~m,(.+?)(/.+?)?\@(.+),);
  }
  elsif($type eq 'Pg'){
    croak "todo";
  }
  else{
    croak "dblogin: unknown database type for connection string $connstr\n";
  }
  $dsn="dbi:$type:$dsn";
  push @Dbh, $Dbh if $Dbh; #local is better?
  require DBI;
  $Dbh=DBI->connect($dsn,$u,$p,\%attr); #connect_cached?
}
sub dlogout {
  $Dbh->disconnect;
  $Dbh=pop@Dbh if @Dbh;
}
sub drow {
  my($q,@b)=_dattrarg(@_);
  #my $sth=do{$Sth{$Dbh,$q} ||= $Dbh->prepare_cached($q)};
  my $sth=$Dbh->prepare_cached($q);
  $sth->execute(@b);
  my @r=$sth->fetchrow_array;
  $sth->finish if $$Dbh{Driver}{Name} eq 'SQLite';
  #$dbh->selectrow_array($statement);
  return @r==1?$r[0]:@r;
}
sub drows {
}
sub drowc {
}
sub drowsc {
}
sub dcols {
}
sub dpk {
}
sub dsel {
}
sub ddo {
  my @arg=_dattrarg(@_);
  #warn serialize(\@arg,'arg','',1);
  $Dbh->do(@arg); #hm cache?
}
sub dins {
}
sub dupd {
}
sub ddel {
}
sub dcommit { $Dbh->commit }
sub drollback { $Dbh->rollback }

sub _dattrarg {
  my @arg=@_;
  splice @arg,1,0, ref($arg[-1]) eq 'HASH' ? pop(@arg) : {};
  @arg;
}

=head2 self_update

Update Acme::Tools to newest version quick and dirty:

 function pmview(){ ls -ld `perl -M$1 -le'$m=shift;$mi=$m;$mi=~s,::,/,g;print $INC{"$mi.pm"};warn"Version ".${$m."::VERSION"}."\n"' $1`;}

 pmview Acme::Tools                                     #view date and version before
 sudo perl -MAcme::Tools -e Acme::Tools::self_update    #update to newest version
 pmview Acme::Tools                                     #view date and version after

Does C<cd> to where Acme/Tools.pm are and then wget -N https://raw.githubusercontent.com/kjetillll/Acme-Tools/master/Tools.pm

TODO: cmd_acme_tools_self_update, accept --no-check-certificate to use on curl

=cut

our $Wget;
our $Self_update_url='https://raw.githubusercontent.com/kjetillll/Acme-Tools/master/Tools.pm'; #todo: change site
sub self_update {
  #in($^O,'linux','cygwin') or die"ERROR: self_update works on linux and cygwin only";
  $Wget||=(grep -x$_,map"$_/wget",'/usr/bin','/bin','/usr/local/bin','.')[0]; #hm --no-check-certificate
  -x$Wget or die"ERROR: wget ($Wget) executable not found\n";
  my $d=dirname(__FILE__);
  sys("cd $d; ls -l Tools.pm; md5sum Tools.pm");
  sys("cd $d; $Wget -N ".($ARGV[0]||$Self_update_url));
  sys("cd $d; ls -l Tools.pm; md5sum Tools.pm");



( run in 1.352 second using v1.01-cache-2.11-cpan-2398b32b56e )