Compression-Util

 view release on metacpan or  search on metacpan

README.md  view on Meta::CPAN

```

Performs Run-Length Encoding (RLE) on a sequence of symbolic elements, specifically designed for runs of four or more consecutive symbols.

It takes two parameters: `\@symbols`, representing an array of symbols, and `$max_run`, indicating the maximum run length allowed during encoding.

The function returns the encoded RLE sequence as an array-ref of symbols.

By default, the maximum run-length is limited to `255`.

## rle4\_decode

```perl
    my $symbols = rle4_decode(\@rle4);
    my $symbols = rle4_decode($rle4_string);
```

Inverse of `rle4_encode()`.

## zrle\_encode

```perl
    my $zrle = zrle_encode(\@symbols);
```

Performs Zero-Run-Length Encoding (ZRLE) on a sequence of symbolic elements, returning the encoded ZRLE sequence as an array-ref of symbols.

This function efficiently encodes runs of zeros, but also increments each symbol by `1`.

## zrle\_decode

```perl
    my $symbols = zrle_decode($zrle);
```

Inverse of `zrle_encode()`.

## ac\_encode

```perl
    my ($bitstring, $freq) = ac_encode(\@symbols);
```

Performs Arithmetic Coding on the provided symbols.

It takes a single parameter, `\@symbols`, representing the symbols to be encoded.

The function returns two values: `$bitstring`, which is a string of 1s and 0s, and `$freq`, representing the frequency table used for encoding.

## ac\_decode

```perl
    my $symbols = ac_decode($bits_fh, \%freq);
    my $symbols = ac_decode($bitstring, \%freq);
```

Performs Arithmetic Coding decoding using the provided frequency table and a string of 1s and 0s. Inverse of `ac_encode()`.

It takes two parameters: `$bitstring`, representing a string of 1s and 0s containing the arithmetic coded data, and `\%freq`, representing the frequency table used for encoding.

The function returns the decoded sequence of symbols.

## adaptive\_ac\_encode

```perl
    my ($bitstring, $alphabet) = adaptive_ac_encode(\@symbols);
```

Performs Adaptive Arithmetic Coding on the provided symbols.

It takes a single parameter, `\@symbols`, representing the symbols to be encoded.

The function returns two values: `$bitstring`, which is a string of 1s and 0s, and `$alphabet`, which is an array-ref of distinct sorted symbols.

## adaptive\_ac\_decode

```perl
    my $symbols = adaptive_ac_decode($bits_fh, \@alphabet);
    my $symbols = adaptive_ac_decode($bitstring, \@alphabet);
```

Performs Adaptive Arithmetic Coding decoding using the provided frequency table and a string of 1s and 0s.

It takes two parameters: `$bitstring`, representing a string of 1s and 0s containing the adaptive arithmetic coded data, and `\@alphabet`, representing the array of distinct sorted symbols that appear in the encoded data.

The function returns the decoded sequence of symbols.

## lzw\_encode

```perl
    my $symbols = lzw_encode($string);
```

Performs Lempel-Ziv-Welch (LZW) encoding on the provided string.

It takes a single parameter, `$string`, representing the data to be encoded.

The function returns an array-ref of symbols.

## lzw\_decode

```perl
    my $string = lzw_decode(\@symbols);
```

Performs Lempel-Ziv-Welch (LZW) decoding on the provided symbols. Inverse of `lzw_encode()`.

The function returns the decoded string.

# INTERFACE FOR LOW-LEVEL FUNCTIONS

## crc32

```perl
    my $int32 = crc32($data);
    my $int32 = crc32($data, $prev_crc32);
```

Compute the CRC32 checksum of a given string.

## adler32

```perl
    my $int32 = adler32($data);
    my $int32 = adler32($data, $prev_adler32);
```

Compute the Adler32 checksum of a given string.

## read\_bit

```perl
    my $bit = read_bit($fh, \$buffer);
```

Reads a single bit from a file-handle `$fh` (MSB order).

The function stores the extra bits inside the `$buffer`, reading one character at a time from the file-handle.

## read\_bit\_lsb

```perl
    my $bit = read_bit_lsb($fh, \$buffer);
```

Reads a single bit from a file-handle `$fh` (LSB order).

The function stores the extra bits inside the `$buffer`, reading one character at a time from the file-handle.

## read\_bits

```perl
    my $bitstring = read_bits($fh, $bits_len);
```

Reads a specified number of bits (`$bits_len`) from a file-handle (`$fh`) and returns them as a string, in MSB order.

## read\_bits\_lsb

```perl
    my $bitstring = read_bits_lsb($fh, $bits_len);
```

Reads a specified number of bits (`$bits_len`) from a file-handle (`$fh`) and returns them as a string, in LSB order.

## int2bits

```perl

README.md  view on Meta::CPAN

Read `$size` bits from a file-handle `$fh` and convert them to an integer, in LSB order. Inverse of `int2bits_lsb()`.

The function stores the extra bits inside the `$buffer`, reading one character at a time from the file-handle.

## bytes2int

```perl
    my $integer = bytes2int($fh, $n);
    my $integer = bytes2int($str, $n);
```

Read `$n` bytes from a file-handle `$fh` or from a string `$str` and convert them to an integer, in MSB order.

## bytes2int\_lsb

```perl
    my $integer = bytes2int_lsb($fh, $n);
    my $integer = bytes2int_lsb($str, $n);
```

Read `$n` bytes from a file-handle `$fh` or from a string `$str` and convert them to an integer, in LSB order.

## string2symbols

```perl
    my $symbols = string2symbols($string)
```

Returns an array-ref of code points, given a string.

## symbols2string

```perl
    my $string = symbols2string(\@symbols)
```

Returns a string, given an array-ref of code points.

## read\_null\_terminated

```perl
    my $string = read_null_terminated($fh)
```

Read a string from file-handle `$fh` that ends with a NULL character ("\\0").

## binary\_vrl\_encode

```perl
    my $bitstring_enc = binary_vrl_encode($bitstring);
```

Given a string of 1s and 0s, returns back a bitstring of 1s and 0s encoded using variable run-length encoding.

## binary\_vrl\_decode

```perl
    my $bitstring = binary_vrl_decode($bitstring_enc);
```

Given an encoded bitstring, returned by `binary_vrl_encode()`, gives back the decoded string of 1s and 0s.

## bwt\_sort

```perl
    my $indices = bwt_sort($string);
    my $indices = bwt_sort($string, $lookahead_len);
```

Low-level function that sorts the rotations of a given string using the Burrows-Wheeler Transform (BWT) algorithm.

It takes two parameters: `$string`, which is the input string to be transformed, and `$LOOKAHEAD_LEN` (optional), representing the length of look-ahead during sorting.

The function returns an array-ref of indices.

There is probably no need to call this function explicitly. Use `bwt_encode()` instead!

## bwt\_sort\_symbolic

```perl
    my $indices = bwt_sort_symbolic(\@symbols);
```

Low-level function that sorts the rotations of a sequence of symbolic elements using the Burrows-Wheeler Transform (BWT) algorithm.

It takes a single parameter `\@symbols`, which represents the input sequence of symbolic elements. The function returns an array of indices.

There is probably no need to call this function explicitly. Use `bwt_encode_symbolic()` instead!

## huffman\_from\_freq

```perl
    my $dict = huffman_from_freq(\%freq);
    my ($dict, $rev_dict) = huffman_from_freq(\%freq);
```

Low-level function that constructs Huffman prefix codes, based on the frequency of symbols provided in a hash table.

It takes a single parameter, `\%freq`, representing the hash table where keys are symbols, and values are their corresponding frequencies.

The function returns two values: `$dict`, which is the mapping of symbols to Huffman codes, and `$rev_dict`, which holds the reverse mapping of Huffman codes to symbols.

The prefix codes are in canonical form, as defined in RFC 1951 (Section 3.2.2).

## huffman\_from\_symbols

```perl
    my $dict = huffman_from_symbols(\@symbols);
    my ($dict, $rev_dict) = huffman_from_symbols(\@symbols);
```

Low-level function that constructs Huffman prefix codes, given an array-ref of symbols.

It takes a single parameter, `\@symbols`, from which it computes the frequency of each symbol and generates the corresponding Huffman prefix codes.

The function returns two values: `$dict`, which is the mapping of symbols to Huffman codes, and `$rev_dict`, which holds the reverse mapping of Huffman codes to symbols.

The prefix codes are in canonical form, as defined in RFC 1951 (Section 3.2.2).

## huffman\_from\_code\_lengths

```perl
    my $dict = huffman_from_code_lengths(\@code_lengths);
    my ($dict, $rev_dict) = huffman_from_code_lengths(\@code_lengths);
```

Low-level function that constructs a dictionary of canonical prefix codes, given an array of code lengths, as defined in RFC 1951 (Section 3.2.2).

It takes a single parameter, `\@code_lengths`, where entry `$i` in the array corresponds to the code length for symbol `$i`.

The function returns two values: `$dict`, which is the mapping of symbols to Huffman codes, and `$rev_dict`, which holds the reverse mapping of Huffman codes to symbols.

## huffman\_encode

```perl
    my $bitstring = huffman_encode(\@symbols, $dict);
```

Low-level function that performs Huffman encoding on a sequence of symbols using a provided dictionary, returned by `huffman_from_freq()`.

It takes two parameters: `\@symbols`, representing the sequence of symbols to be encoded, and `$dict`, representing the Huffman dictionary mapping symbols to their corresponding Huffman codes.

The function returns a concatenated string of 1s and 0s, representing the Huffman-encoded sequence of symbols.

## huffman\_decode

```perl
    my $symbols = huffman_decode($bitstring, $rev_dict);
```

Low-level function that decodes a Huffman-encoded binary string into a sequence of symbols using a provided reverse dictionary.

It takes two parameters: `$bitstring`, representing the Huffman-encoded string of 1s and 0s, as returned by `huffman_encode()`, and `$rev_dict`, representing the reverse dictionary mapping Huffman codes to their corresponding symbols.

The function returns the decoded sequence of symbols as an array-ref.

## lz77\_encode / lz77\_encode\_symbolic

```perl
    my ($literals, $distances, $lengths, $matches) = lz77_encode($string);
    my ($literals, $distances, $lengths, $matches) = lz77_encode(\@symbols);
```

Low-level function that combines LZSS with ideas from the LZ4 method.

The function returns four values:

```perl
    $literals   # array-ref of uncompressed symbols
    $distances  # array-ref of back-reference distances
    $lengths    # array-ref of literal lengths
    $matches    # array-ref of match lengths
```

The output can be decoded with `lz77_decode()` and `lz77_decode_symbolic()`, respectively.

## lz77\_decode / lz77\_decode\_symbolic

```perl
    my $string  = lz77_decode(\@literals, \@distances, \@lengths, \@matches);
    my $symbols = lz77_decode_symbolic(\@literals, \@distances, \@lengths, \@matches);
```

Low-level function that performs decoding using the provided literals, distances, lengths and matches, returned by LZ77 encoding.

Inverse of `lz77_encode()` and `lz77_encode_symbolic()`, respectively.

## lzss\_encode / lzss\_encode\_fast / lzss\_encode\_symbolic / lzss\_encode\_fast\_symbolic

```perl
    # Standard version
    my ($literals, $distances, $lengths) = lzss_encode($data, %params);
    my ($literals, $distances, $lengths) = lzss_encode(\@symbols, %params);

    # Faster version
    my ($literals, $distances, $lengths) = lzss_encode_fast($data, %params);
    my ($literals, $distances, $lengths) = lzss_encode_fast(\@symbols, %params);
```

Low-level function that applies the LZSS (Lempel-Ziv-Storer-Szymanski) algorithm on the provided data.

The accepted `%params` are:

```perl
    min_len         => $LZ_MIN_LEN,
    max_len         => $LZ_MAX_LEN,
    max_dist        => $LZ_MAX_DIST,
    max_chain_len   => $LZ_MAX_CHAIN_LEN,
```

The function returns three values:

```perl
    $literals   # array-ref of uncompressed symbols
    $distances  # array-ref of back-reference distances
    $lengths    # array-ref of match lengths
```

The output can be decoded with `lzss_decode()` and `lzss_decode_symbolic()`, respectively.

## lzss\_decode / lzss\_decode\_symbolic

    my $string  = lzss_decode(\@literals, \@distances, \@lengths);
    my $symbols = lzss_decode_symbolic(\@literals, \@distances, \@lengths);

Low-level function that decodes the LZSS encoding, using the provided literals, distances, and lengths of matched sub-strings.

Inverse of `lzss_encode()` and `lzss_encode_fast()`.

## deflate\_encode

```perl
    # Returns a binary string
    my $string = deflate_encode(\@literals, \@distances, \@lengths);
    my $string = deflate_encode(\@literals, \@distances, \@lengths, \&create_ac_entry);
```

Low-level function that encodes the results returned by `lzss_encode()` and `lzss_encode_fast()`, using a DEFLATE-like approach, combined with Huffman coding.

## deflate\_decode

```perl
    # Huffman decoding
    my ($literals, $distances, $lengths) = deflate_decode($fh);
    my ($literals, $distances, $lengths) = deflate_decode($string);

    # Arithmetic decoding
    my ($literals, $distances, $lengths) = deflate_decode($fh, \&decode_ac_entry);
    my ($literals, $distances, $lengths) = deflate_decode($string, \&decode_ac_entry);
```

Inverse of `deflate_encode()`.

## make\_deflate\_tables

```perl
    my ($DISTANCE_SYMBOLS, $LENGTH_SYMBOLS, $LENGTH_INDICES) = make_deflate_tables($max_dist, $max_len);
```

Low-level function that returns a list of tables used in encoding the relative back-reference distances and lengths returned by `lzss_encode()` and `lzss_encode_fast()`.

When no arguments are provided:

```perl
    $max_dist = $Compression::Util::LZ_MAX_DIST
    $max_len  = $Compression::Util::LZ_MAX_LEN
```

There is no need to call this function explicitly. Use `deflate_encode()` instead!

## find\_deflate\_index

```perl
    my $index = find_deflate_index($value, $DISTANCE_SYMBOLS);
```

Low-level function that returns the index inside the DEFLATE tables for a given value.

## deflate\_create\_block\_type\_0\_header

```perl
    my $bt0_header = deflate_create_block_type_0_header($chunk);
```

Creates the header for a DEFLATE block of type 0 (uncompressed), as a bitstring, without including the block code number `00`.

The length of the `$chunk` must not exceed `2^16 - 1`.

To create a DEFLATE block of type 0, including the content, use:

```perl
    my $block_type_0 = pack('b*', '00') . pack('b*', $bt0_header) . $chunk;
```

which can be recovered as:

```perl
    open my $fh, '<:raw', \$block_type_0;
    my ($buffer, $search_window) = ('', '');
    my $chunk = deflate_extract_next_block($fh, \$buffer, \$search_window);
```

## deflate\_create\_block\_type\_1

```perl
    my $bitstring = deflate_create_block_type_1($literals, $distances, $lengths);
```

Creates a DEFLATE block of type 1 (fixed prefix-codes), as a bitstring, given the ARRAY-refs of literals, distances and lengths, returned by `lzss_encode()`.

This type of block uses fixed prefix-codes and is pretty fast.

## deflate\_create\_block\_type\_2

```perl
    my $bitstring = deflate_create_block_type_1($literals, $distances, $lengths);
```

Creates a DEFLATE block of type 2 (dynamic prefix-codes), as a bitstring, given the ARRAY-refs of literals, distances and lengths, returned by `lzss_encode()`.

This type of block uses dynamic prefix-codes (Huffman codes) and produces good compression ratio on most inputs.

## deflate\_extract\_block\_type\_0

```perl
    my $data = deflate_extract_block_type_0($fh, \$buffer, \$search_window);
```

Given an input filehandle, it extracts a DEFLATE block of type 0 (uncompressed).

```perl
    my ($buffer, $search_window) = ('', '');
    my $block_type = bits2int_lsb($fh, 2, \$buffer);
    $block_type == 0 or die "Not a block of type 0";
    my $decoded_chunk = deflate_extract_block_type_0($fh, \$buffer, \$search_window);
```

## deflate\_extract\_block\_type\_1

```perl
    my $data = deflate_extract_block_type_1($fh, \$buffer, \$search_window);
```

Given an input filehandle, a bitstring buffer and a search window, it extracts a DEFLATE block of type 1 (fixed prefix-codes).

```perl
    my ($buffer, $search_window) = ('', '');
    my $block_type = bits2int_lsb($fh, 2, \$buffer);
    $block_type == 1 or die "Not a block of type 1";
    my $decoded_chunk = deflate_extract_block_type_1($fh, \$buffer, \$search_window);
```

## deflate\_extract\_block\_type\_2

```perl
    my $data = deflate_extract_block_type_2($fh, \$buffer, \$search_window);
```

Given an input filehandle, a bitstring buffer and a search window, it extracts a DEFLATE block of type 2 (dynamic prefix-codes).

```perl
    my ($buffer, $search_window) = ('', '');
    my $block_type = bits2int_lsb($fh, 2, \$buffer);
    $block_type == 2 or die "Not a block of type 2";
    my $decoded_chunk = deflate_extract_block_type_2($fh, \$buffer, \$search_window);
```

## deflate\_extract\_next\_block

```perl
    my $data = deflate_extract_next_block($fh, \$buffer, \$search_window);
```

Given an input filehandle, a bitstring buffer and a search window, it extracts the next DEFLATE block. The next two bits in the input file-handle (or in the bitstring buffer) must contain the block-type number.

# EXPORT

Each function can be exported individually, as:

```perl
    use Compression::Util qw(bwt_compress);
```

By specifying the **:all** keyword, will export all the exportable functions:

```perl
    use Compression::Util qw(:all);
```

Nothing is exported by default.

# EXAMPLES

The functions can be combined in various ways, easily creating novel compression methods, as illustrated in the following examples.

## Combining LZSS + MRL compression:

```perl
    my $enc = lzss_compress($str, \&mrl_compress_symbolic);
    my $dec = lzss_decompress($enc, \&mrl_decompress_symbolic);
```

## Combining LZ77 + OBH encoding:

```perl
    my $enc = lz77_compress($str, \&obh_encode);
    my $dec = lz77_decompress($enc, \&obh_decode);
```

## Combining LZSS + symbolic BWT compression:

```perl
    my $enc = lzss_compress($str, \&bwt_compress_symbolic);
    my $dec = lzss_decompress($enc, \&bwt_decompress_symbolic);
```

## Combining BWT + symbolic LZSS:

```perl
    my $enc = bwt_compress($str, \&lzss_compress_symbolic);
    my $dec = bwt_decompress($enc, \&lzss_decompress_symbolic);
```

## Combining LZW + Fibonacci encoding:



( run in 1.336 second using v1.01-cache-2.11-cpan-df04353d9ac )