Compression-Util
view release on metacpan or search on metacpan
```
Performs Run-Length Encoding (RLE) on a sequence of symbolic elements, specifically designed for runs of four or more consecutive symbols.
It takes two parameters: `\@symbols`, representing an array of symbols, and `$max_run`, indicating the maximum run length allowed during encoding.
The function returns the encoded RLE sequence as an array-ref of symbols.
By default, the maximum run-length is limited to `255`.
## rle4\_decode
```perl
my $symbols = rle4_decode(\@rle4);
my $symbols = rle4_decode($rle4_string);
```
Inverse of `rle4_encode()`.
## zrle\_encode
```perl
my $zrle = zrle_encode(\@symbols);
```
Performs Zero-Run-Length Encoding (ZRLE) on a sequence of symbolic elements, returning the encoded ZRLE sequence as an array-ref of symbols.
This function efficiently encodes runs of zeros, but also increments each symbol by `1`.
## zrle\_decode
```perl
my $symbols = zrle_decode($zrle);
```
Inverse of `zrle_encode()`.
## ac\_encode
```perl
my ($bitstring, $freq) = ac_encode(\@symbols);
```
Performs Arithmetic Coding on the provided symbols.
It takes a single parameter, `\@symbols`, representing the symbols to be encoded.
The function returns two values: `$bitstring`, which is a string of 1s and 0s, and `$freq`, representing the frequency table used for encoding.
## ac\_decode
```perl
my $symbols = ac_decode($bits_fh, \%freq);
my $symbols = ac_decode($bitstring, \%freq);
```
Performs Arithmetic Coding decoding using the provided frequency table and a string of 1s and 0s. Inverse of `ac_encode()`.
It takes two parameters: `$bitstring`, representing a string of 1s and 0s containing the arithmetic coded data, and `\%freq`, representing the frequency table used for encoding.
The function returns the decoded sequence of symbols.
## adaptive\_ac\_encode
```perl
my ($bitstring, $alphabet) = adaptive_ac_encode(\@symbols);
```
Performs Adaptive Arithmetic Coding on the provided symbols.
It takes a single parameter, `\@symbols`, representing the symbols to be encoded.
The function returns two values: `$bitstring`, which is a string of 1s and 0s, and `$alphabet`, which is an array-ref of distinct sorted symbols.
## adaptive\_ac\_decode
```perl
my $symbols = adaptive_ac_decode($bits_fh, \@alphabet);
my $symbols = adaptive_ac_decode($bitstring, \@alphabet);
```
Performs Adaptive Arithmetic Coding decoding using the provided frequency table and a string of 1s and 0s.
It takes two parameters: `$bitstring`, representing a string of 1s and 0s containing the adaptive arithmetic coded data, and `\@alphabet`, representing the array of distinct sorted symbols that appear in the encoded data.
The function returns the decoded sequence of symbols.
## lzw\_encode
```perl
my $symbols = lzw_encode($string);
```
Performs Lempel-Ziv-Welch (LZW) encoding on the provided string.
It takes a single parameter, `$string`, representing the data to be encoded.
The function returns an array-ref of symbols.
## lzw\_decode
```perl
my $string = lzw_decode(\@symbols);
```
Performs Lempel-Ziv-Welch (LZW) decoding on the provided symbols. Inverse of `lzw_encode()`.
The function returns the decoded string.
# INTERFACE FOR LOW-LEVEL FUNCTIONS
## crc32
```perl
my $int32 = crc32($data);
my $int32 = crc32($data, $prev_crc32);
```
Compute the CRC32 checksum of a given string.
## adler32
```perl
my $int32 = adler32($data);
my $int32 = adler32($data, $prev_adler32);
```
Compute the Adler32 checksum of a given string.
## read\_bit
```perl
my $bit = read_bit($fh, \$buffer);
```
Reads a single bit from a file-handle `$fh` (MSB order).
The function stores the extra bits inside the `$buffer`, reading one character at a time from the file-handle.
## read\_bit\_lsb
```perl
my $bit = read_bit_lsb($fh, \$buffer);
```
Reads a single bit from a file-handle `$fh` (LSB order).
The function stores the extra bits inside the `$buffer`, reading one character at a time from the file-handle.
## read\_bits
```perl
my $bitstring = read_bits($fh, $bits_len);
```
Reads a specified number of bits (`$bits_len`) from a file-handle (`$fh`) and returns them as a string, in MSB order.
## read\_bits\_lsb
```perl
my $bitstring = read_bits_lsb($fh, $bits_len);
```
Reads a specified number of bits (`$bits_len`) from a file-handle (`$fh`) and returns them as a string, in LSB order.
## int2bits
```perl
Read `$size` bits from a file-handle `$fh` and convert them to an integer, in LSB order. Inverse of `int2bits_lsb()`.
The function stores the extra bits inside the `$buffer`, reading one character at a time from the file-handle.
## bytes2int
```perl
my $integer = bytes2int($fh, $n);
my $integer = bytes2int($str, $n);
```
Read `$n` bytes from a file-handle `$fh` or from a string `$str` and convert them to an integer, in MSB order.
## bytes2int\_lsb
```perl
my $integer = bytes2int_lsb($fh, $n);
my $integer = bytes2int_lsb($str, $n);
```
Read `$n` bytes from a file-handle `$fh` or from a string `$str` and convert them to an integer, in LSB order.
## string2symbols
```perl
my $symbols = string2symbols($string)
```
Returns an array-ref of code points, given a string.
## symbols2string
```perl
my $string = symbols2string(\@symbols)
```
Returns a string, given an array-ref of code points.
## read\_null\_terminated
```perl
my $string = read_null_terminated($fh)
```
Read a string from file-handle `$fh` that ends with a NULL character ("\\0").
## binary\_vrl\_encode
```perl
my $bitstring_enc = binary_vrl_encode($bitstring);
```
Given a string of 1s and 0s, returns back a bitstring of 1s and 0s encoded using variable run-length encoding.
## binary\_vrl\_decode
```perl
my $bitstring = binary_vrl_decode($bitstring_enc);
```
Given an encoded bitstring, returned by `binary_vrl_encode()`, gives back the decoded string of 1s and 0s.
## bwt\_sort
```perl
my $indices = bwt_sort($string);
my $indices = bwt_sort($string, $lookahead_len);
```
Low-level function that sorts the rotations of a given string using the Burrows-Wheeler Transform (BWT) algorithm.
It takes two parameters: `$string`, which is the input string to be transformed, and `$LOOKAHEAD_LEN` (optional), representing the length of look-ahead during sorting.
The function returns an array-ref of indices.
There is probably no need to call this function explicitly. Use `bwt_encode()` instead!
## bwt\_sort\_symbolic
```perl
my $indices = bwt_sort_symbolic(\@symbols);
```
Low-level function that sorts the rotations of a sequence of symbolic elements using the Burrows-Wheeler Transform (BWT) algorithm.
It takes a single parameter `\@symbols`, which represents the input sequence of symbolic elements. The function returns an array of indices.
There is probably no need to call this function explicitly. Use `bwt_encode_symbolic()` instead!
## huffman\_from\_freq
```perl
my $dict = huffman_from_freq(\%freq);
my ($dict, $rev_dict) = huffman_from_freq(\%freq);
```
Low-level function that constructs Huffman prefix codes, based on the frequency of symbols provided in a hash table.
It takes a single parameter, `\%freq`, representing the hash table where keys are symbols, and values are their corresponding frequencies.
The function returns two values: `$dict`, which is the mapping of symbols to Huffman codes, and `$rev_dict`, which holds the reverse mapping of Huffman codes to symbols.
The prefix codes are in canonical form, as defined in RFC 1951 (Section 3.2.2).
## huffman\_from\_symbols
```perl
my $dict = huffman_from_symbols(\@symbols);
my ($dict, $rev_dict) = huffman_from_symbols(\@symbols);
```
Low-level function that constructs Huffman prefix codes, given an array-ref of symbols.
It takes a single parameter, `\@symbols`, from which it computes the frequency of each symbol and generates the corresponding Huffman prefix codes.
The function returns two values: `$dict`, which is the mapping of symbols to Huffman codes, and `$rev_dict`, which holds the reverse mapping of Huffman codes to symbols.
The prefix codes are in canonical form, as defined in RFC 1951 (Section 3.2.2).
## huffman\_from\_code\_lengths
```perl
my $dict = huffman_from_code_lengths(\@code_lengths);
my ($dict, $rev_dict) = huffman_from_code_lengths(\@code_lengths);
```
Low-level function that constructs a dictionary of canonical prefix codes, given an array of code lengths, as defined in RFC 1951 (Section 3.2.2).
It takes a single parameter, `\@code_lengths`, where entry `$i` in the array corresponds to the code length for symbol `$i`.
The function returns two values: `$dict`, which is the mapping of symbols to Huffman codes, and `$rev_dict`, which holds the reverse mapping of Huffman codes to symbols.
## huffman\_encode
```perl
my $bitstring = huffman_encode(\@symbols, $dict);
```
Low-level function that performs Huffman encoding on a sequence of symbols using a provided dictionary, returned by `huffman_from_freq()`.
It takes two parameters: `\@symbols`, representing the sequence of symbols to be encoded, and `$dict`, representing the Huffman dictionary mapping symbols to their corresponding Huffman codes.
The function returns a concatenated string of 1s and 0s, representing the Huffman-encoded sequence of symbols.
## huffman\_decode
```perl
my $symbols = huffman_decode($bitstring, $rev_dict);
```
Low-level function that decodes a Huffman-encoded binary string into a sequence of symbols using a provided reverse dictionary.
It takes two parameters: `$bitstring`, representing the Huffman-encoded string of 1s and 0s, as returned by `huffman_encode()`, and `$rev_dict`, representing the reverse dictionary mapping Huffman codes to their corresponding symbols.
The function returns the decoded sequence of symbols as an array-ref.
## lz77\_encode / lz77\_encode\_symbolic
```perl
my ($literals, $distances, $lengths, $matches) = lz77_encode($string);
my ($literals, $distances, $lengths, $matches) = lz77_encode(\@symbols);
```
Low-level function that combines LZSS with ideas from the LZ4 method.
The function returns four values:
```perl
$literals # array-ref of uncompressed symbols
$distances # array-ref of back-reference distances
$lengths # array-ref of literal lengths
$matches # array-ref of match lengths
```
The output can be decoded with `lz77_decode()` and `lz77_decode_symbolic()`, respectively.
## lz77\_decode / lz77\_decode\_symbolic
```perl
my $string = lz77_decode(\@literals, \@distances, \@lengths, \@matches);
my $symbols = lz77_decode_symbolic(\@literals, \@distances, \@lengths, \@matches);
```
Low-level function that performs decoding using the provided literals, distances, lengths and matches, returned by LZ77 encoding.
Inverse of `lz77_encode()` and `lz77_encode_symbolic()`, respectively.
## lzss\_encode / lzss\_encode\_fast / lzss\_encode\_symbolic / lzss\_encode\_fast\_symbolic
```perl
# Standard version
my ($literals, $distances, $lengths) = lzss_encode($data, %params);
my ($literals, $distances, $lengths) = lzss_encode(\@symbols, %params);
# Faster version
my ($literals, $distances, $lengths) = lzss_encode_fast($data, %params);
my ($literals, $distances, $lengths) = lzss_encode_fast(\@symbols, %params);
```
Low-level function that applies the LZSS (Lempel-Ziv-Storer-Szymanski) algorithm on the provided data.
The accepted `%params` are:
```perl
min_len => $LZ_MIN_LEN,
max_len => $LZ_MAX_LEN,
max_dist => $LZ_MAX_DIST,
max_chain_len => $LZ_MAX_CHAIN_LEN,
```
The function returns three values:
```perl
$literals # array-ref of uncompressed symbols
$distances # array-ref of back-reference distances
$lengths # array-ref of match lengths
```
The output can be decoded with `lzss_decode()` and `lzss_decode_symbolic()`, respectively.
## lzss\_decode / lzss\_decode\_symbolic
my $string = lzss_decode(\@literals, \@distances, \@lengths);
my $symbols = lzss_decode_symbolic(\@literals, \@distances, \@lengths);
Low-level function that decodes the LZSS encoding, using the provided literals, distances, and lengths of matched sub-strings.
Inverse of `lzss_encode()` and `lzss_encode_fast()`.
## deflate\_encode
```perl
# Returns a binary string
my $string = deflate_encode(\@literals, \@distances, \@lengths);
my $string = deflate_encode(\@literals, \@distances, \@lengths, \&create_ac_entry);
```
Low-level function that encodes the results returned by `lzss_encode()` and `lzss_encode_fast()`, using a DEFLATE-like approach, combined with Huffman coding.
## deflate\_decode
```perl
# Huffman decoding
my ($literals, $distances, $lengths) = deflate_decode($fh);
my ($literals, $distances, $lengths) = deflate_decode($string);
# Arithmetic decoding
my ($literals, $distances, $lengths) = deflate_decode($fh, \&decode_ac_entry);
my ($literals, $distances, $lengths) = deflate_decode($string, \&decode_ac_entry);
```
Inverse of `deflate_encode()`.
## make\_deflate\_tables
```perl
my ($DISTANCE_SYMBOLS, $LENGTH_SYMBOLS, $LENGTH_INDICES) = make_deflate_tables($max_dist, $max_len);
```
Low-level function that returns a list of tables used in encoding the relative back-reference distances and lengths returned by `lzss_encode()` and `lzss_encode_fast()`.
When no arguments are provided:
```perl
$max_dist = $Compression::Util::LZ_MAX_DIST
$max_len = $Compression::Util::LZ_MAX_LEN
```
There is no need to call this function explicitly. Use `deflate_encode()` instead!
## find\_deflate\_index
```perl
my $index = find_deflate_index($value, $DISTANCE_SYMBOLS);
```
Low-level function that returns the index inside the DEFLATE tables for a given value.
## deflate\_create\_block\_type\_0\_header
```perl
my $bt0_header = deflate_create_block_type_0_header($chunk);
```
Creates the header for a DEFLATE block of type 0 (uncompressed), as a bitstring, without including the block code number `00`.
The length of the `$chunk` must not exceed `2^16 - 1`.
To create a DEFLATE block of type 0, including the content, use:
```perl
my $block_type_0 = pack('b*', '00') . pack('b*', $bt0_header) . $chunk;
```
which can be recovered as:
```perl
open my $fh, '<:raw', \$block_type_0;
my ($buffer, $search_window) = ('', '');
my $chunk = deflate_extract_next_block($fh, \$buffer, \$search_window);
```
## deflate\_create\_block\_type\_1
```perl
my $bitstring = deflate_create_block_type_1($literals, $distances, $lengths);
```
Creates a DEFLATE block of type 1 (fixed prefix-codes), as a bitstring, given the ARRAY-refs of literals, distances and lengths, returned by `lzss_encode()`.
This type of block uses fixed prefix-codes and is pretty fast.
## deflate\_create\_block\_type\_2
```perl
my $bitstring = deflate_create_block_type_1($literals, $distances, $lengths);
```
Creates a DEFLATE block of type 2 (dynamic prefix-codes), as a bitstring, given the ARRAY-refs of literals, distances and lengths, returned by `lzss_encode()`.
This type of block uses dynamic prefix-codes (Huffman codes) and produces good compression ratio on most inputs.
## deflate\_extract\_block\_type\_0
```perl
my $data = deflate_extract_block_type_0($fh, \$buffer, \$search_window);
```
Given an input filehandle, it extracts a DEFLATE block of type 0 (uncompressed).
```perl
my ($buffer, $search_window) = ('', '');
my $block_type = bits2int_lsb($fh, 2, \$buffer);
$block_type == 0 or die "Not a block of type 0";
my $decoded_chunk = deflate_extract_block_type_0($fh, \$buffer, \$search_window);
```
## deflate\_extract\_block\_type\_1
```perl
my $data = deflate_extract_block_type_1($fh, \$buffer, \$search_window);
```
Given an input filehandle, a bitstring buffer and a search window, it extracts a DEFLATE block of type 1 (fixed prefix-codes).
```perl
my ($buffer, $search_window) = ('', '');
my $block_type = bits2int_lsb($fh, 2, \$buffer);
$block_type == 1 or die "Not a block of type 1";
my $decoded_chunk = deflate_extract_block_type_1($fh, \$buffer, \$search_window);
```
## deflate\_extract\_block\_type\_2
```perl
my $data = deflate_extract_block_type_2($fh, \$buffer, \$search_window);
```
Given an input filehandle, a bitstring buffer and a search window, it extracts a DEFLATE block of type 2 (dynamic prefix-codes).
```perl
my ($buffer, $search_window) = ('', '');
my $block_type = bits2int_lsb($fh, 2, \$buffer);
$block_type == 2 or die "Not a block of type 2";
my $decoded_chunk = deflate_extract_block_type_2($fh, \$buffer, \$search_window);
```
## deflate\_extract\_next\_block
```perl
my $data = deflate_extract_next_block($fh, \$buffer, \$search_window);
```
Given an input filehandle, a bitstring buffer and a search window, it extracts the next DEFLATE block. The next two bits in the input file-handle (or in the bitstring buffer) must contain the block-type number.
# EXPORT
Each function can be exported individually, as:
```perl
use Compression::Util qw(bwt_compress);
```
By specifying the **:all** keyword, will export all the exportable functions:
```perl
use Compression::Util qw(:all);
```
Nothing is exported by default.
# EXAMPLES
The functions can be combined in various ways, easily creating novel compression methods, as illustrated in the following examples.
## Combining LZSS + MRL compression:
```perl
my $enc = lzss_compress($str, \&mrl_compress_symbolic);
my $dec = lzss_decompress($enc, \&mrl_decompress_symbolic);
```
## Combining LZ77 + OBH encoding:
```perl
my $enc = lz77_compress($str, \&obh_encode);
my $dec = lz77_decompress($enc, \&obh_decode);
```
## Combining LZSS + symbolic BWT compression:
```perl
my $enc = lzss_compress($str, \&bwt_compress_symbolic);
my $dec = lzss_decompress($enc, \&bwt_decompress_symbolic);
```
## Combining BWT + symbolic LZSS:
```perl
my $enc = bwt_compress($str, \&lzss_compress_symbolic);
my $dec = bwt_decompress($enc, \&lzss_decompress_symbolic);
```
## Combining LZW + Fibonacci encoding:
( run in 1.336 second using v1.01-cache-2.11-cpan-df04353d9ac )