length results from the CPAN

length

Algorithm-InversionList

view release on metacpan or search on metacpan

);
our $VERSION = '0.03';

sub invlist
{
 my $string = shift @_;

 # we need a valid string
 return undef unless defined $string;

 # handle trivial case of 0-length string
 return [] unless length $string;

 # this is suboptimal, we eventually want to do things in multiples of 8 (on byte boundaries)
 # $length is length in bits, we avoid b* because it will create a list 8 times larger than C*
 my @unpacked = unpack("C*", $string);
 my $length = scalar @unpacked * 8;
 my $current = vec($string, 0, 1);
 my $new;
 my @list;

 push @list, 0 if $current;

 foreach my $offset (0..$length)
 {  
  $new = vec($string, $offset, 1);
  if ($new != $current)
  {
   push @list, $offset;
  }
  $current = $new;
 }

 push @list, $length unless exists $list[-1] && $list[-1] == $length;

 return \@list;
}

sub data_from_invlist
{
 my $out = '';				# start with a blank string
 my $append = 0;			# we're appending '0' bits first
 my $list = shift @_;

InversionList.pm view on Meta::CPAN



=head1 DESCRIPTION

Inversion lists are data structures that store a sequence of bits as
the numeric position of each switch between a run of 0 and 1 bits.
Thus, the data "111111100" is encoded as the list of numbers 0, 7 in
an inversion list.  This module begins the list with the start of the
run of 1's, so if the first 2 bits in the string are 0, the first
entry in the list will be a 2 (where we find the first bit that is 1).
The last number will always be the length of the string, so that we
know where to end it.

Inversion lists are very efficient.  Because of the way that Perl
stores scalars and lists and the various architectures to which Perl
has been ported, there is no definitive rule as to what's the exact
proportion of bit runs to bitstring length required to make inversion
lists efficient.  Generally, if you see long runs of repeated 0 or 1
bits, an inversion list may be appropriate.

This module stores inversion lists in an offset-based format which has
some nice properties, for instance searching is fast and you can
easily do boolean operations on two inversion lists stored in the
offset-based format.

=head2 EXPORT

test.pl view on Meta::CPAN

sub do_test
{
 my $data = shift @_;
 my $reps = shift @_ || 1;
# print 'Test pattern   ', unpack("b*", $data), "\n";

 $data x= $reps;
 
 my $inv = invlist($data);
# print "Inversion list: @$inv\n";
 ok(scalar @$inv || !length($data));
 my $out = data_from_invlist($inv);
# print 'Output pattern ', unpack("b*", $out), "\n";
 ok($out, $data);
}

( run in 0.560 second using v1.01-cache-2.11-cpan-33209edd8b4 )