Atomic-Pipe
accept:
compression => 'zstd'
Enable Zstd compression. Currently 'zstd' is the only supported
algorithm; any other value croaks at construction.
compression_level => $level
Zstd compression level, defaults to 3. Only meaningful when
compression is enabled.
compression_dictionary => $bytes
Optional shared Zstd dictionary, supplied as raw bytes. Both ends
must use the same dictionary content. Mutually exclusive with
compression_dictionary_file.
compression_dictionary_file => $path
Same as compression_dictionary but loaded from a file via
"new_from_file" in Compress::Zstd::CompressionDictionary. The file is
read on demand.
keep_compressed => $bool
When set together with compression, reads expose the on-wire
compressed bytes alongside the decompressed payload. See
"read_message" and "get_line_burst_or_data" for the exact
return-shape changes. Has no effect without compression.
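As a rough illustration, the options above can be collected into one list of key/value pairs. The sketch below only builds and sanity-checks that list using the documented option names. Which constructor receives it is not shown in this section, so the commented-out call at the end is an assumption, and $dict_file is a hypothetical placeholder.

```perl
use strict;
use warnings;

# Option names are taken from the list above; the values are illustrative.
my %compression_opts = (
    compression       => 'zstd',   # the only supported algorithm
    compression_level => 3,        # documented default
    keep_compressed   => 0,        # set true to also receive on-wire bytes
);

# Hypothetical: path to a trained dictionary file, left undefined here.
my $dict_file = undef;
$compression_opts{compression_dictionary_file} = $dict_file
    if defined $dict_file;

# The two dictionary options are documented as mutually exclusive.
die "supply only one dictionary option\n"
    if exists $compression_opts{compression_dictionary}
    && exists $compression_opts{compression_dictionary_file};

# Assumption: a constructor such as Atomic::Pipe->pair accepts these pairs.
# my ($r, $w) = Atomic::Pipe->pair(mixed_data_mode => 1, %compression_opts);
```

Keeping the options in one hash makes it easier to hand identical settings to both ends of the pipe.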
Custom dictionary
Custom Zstd dictionaries can dramatically reduce frame size for small,
repetitive payloads. Either form (bytes or file) may be supplied at
construction or via "set_compression_dictionary" /
"set_compression_dictionary_file".
Caveat: raw zstd dictionaries do not embed a dict-ID. As a result a
mismatched peer dictionary will silently decode to garbage rather than
fail. (Hard frame corruption -- truncated or invalid frames -- still
raises fatally.) Both ends must agree on byte-identical dictionary
content.
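Because a mismatch is silent, it can help to compare a fingerprint of the dictionary bytes on both ends before exchanging compressed messages. The following is a minimal sketch of that idea using the core Digest::SHA module; dictionary_fingerprint is a hypothetical helper, not part of Atomic::Pipe, and the dictionary strings are stand-ins.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical helper: fingerprint the raw dictionary bytes so both peers can
# compare the value out of band.
sub dictionary_fingerprint {
    my ($bytes) = @_;
    return sha256_hex($bytes);
}

# Stand-in byte strings; real dictionaries would come from zstd training.
my $writer_dict = "example dictionary bytes";
my $reader_dict = "example dictionary bytes";

my $writer_fp = dictionary_fingerprint($writer_dict);
my $reader_fp = dictionary_fingerprint($reader_dict);

# Refuse to proceed unless both sides hold byte-identical content.
die "dictionary mismatch: $writer_fp vs $reader_fp\n"
    unless $writer_fp eq $reader_fp;
```

How the fingerprint is exchanged (configuration, a handshake message, a deployment check) is left to the application.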
Performance
Compression is not just a wire-size optimization for Atomic::Pipe: when
messages exceed PIPE_BUF (typically 4096 bytes on Linux) the writer
must fragment them into multiple non-atomic chunks, and the reader must
reassemble them. Compressing the payload first frequently collapses a
multi-part message back into a single atomic burst, which avoids that
per-message protocol overhead entirely. As a result, on workloads
dominated by larger-than-PIPE_BUF messages, compression is often much
faster end-to-end than no compression, even after accounting for the
CPU cost of compress/decompress.
The kernel pipe buffer size (see "resize") does not affect this --
fragmentation is keyed on the POSIX PIPE_BUF atomic-write threshold,
not on the buffer capacity.
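The effect of the PIPE_BUF threshold can be estimated with simple arithmetic. The sketch below computes a lower bound on the number of atomic writes a payload needs, before and after compression. It ignores any per-chunk header the wire protocol adds, min_chunks is a hypothetical helper, and the 3.3x ratio is only an illustrative figure.

```perl
use strict;
use warnings;
use POSIX ();

# PIPE_BUF from limits.h where available; fall back to the typical Linux value.
my $pipe_buf = eval { POSIX::PIPE_BUF() } || 4096;

# Lower bound on the number of atomic writes needed for a payload.
# Simplification: ignores any per-chunk header the wire protocol adds.
sub min_chunks {
    my ($payload_bytes, $limit) = @_;
    return 1 if $payload_bytes <= $limit;
    return int(($payload_bytes + $limit - 1) / $limit);
}

# Illustrative sizes: a 10000-byte message, and the same message after a
# hypothetical 3.3x compression ratio.
my $raw_size        = 10_000;
my $compressed_size = int($raw_size / 3.3);

printf "PIPE_BUF=%d raw=%d chunk(s), compressed=%d chunk(s)\n",
    $pipe_buf,
    min_chunks($raw_size,        $pipe_buf),
    min_chunks($compressed_size, $pipe_buf);
```

With a 4096-byte limit the raw message needs at least three writes while the compressed one fits in a single atomic burst.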
Benchmark: streaming JSON objects
Numbers below are from bench/zstd_compression.pl in the distribution.
The workload is a synthetic but representative stream of JSON log/event
objects sent in mixed-data mode via write_message. The corpus is
generated once and reused across all runs; sizes are JSON-encoded byte
counts.
Two corpora were measured:
Small JSON (10 MB total, 11785 objects)
Object sizes 181 .. 1977 bytes, average ~890 B; ~37% of objects under
500 B. Most messages fit in a single PIPE_BUF burst regardless of
compression.
    level      raw MB/s   wire MB   ratio   saved
    plain          9.74     10.00       -       -
    L-3           15.98      6.68   1.50x   33.2%
    L1            24.55      4.92   2.03x   50.8%
    L3 (def)      27.79      4.91   2.04x   50.9%
    L5            46.34      4.87   2.05x   51.3%
    L7            63.72      4.87   2.05x   51.3%
    L12           27.02      4.85   2.06x   51.5%
    L22           14.43      4.84   2.07x   51.6%
For this size distribution, every measured level is faster than no
compression, with levels 1..7 the strongest (pipe back-pressure on the
uncompressed run still dominates).
Larger JSON (100 MB total, 20407 objects)
Object sizes 187 .. 10000 bytes, average ~5.1 KB, evenly distributed
across the 1..10 KB range. Most objects exceed PIPE_BUF, so the
uncompressed path pays the multi-part fragmentation cost on nearly
every message.
    level      raw MB/s   wire MB   ratio   saved
    plain          0.29    100.00       -       -
    L-3          287.85     35.61   2.81x   64.4%
    L-1          273.56     33.92   2.95x   66.1%
    L1           237.04     30.56   3.27x   69.4%
    L3 (def)     207.61     30.25   3.31x   69.7%
    L5           113.02     30.01   3.33x   70.0%
    L9            39.35     29.93   3.34x   70.1%
    L18            7.81     28.14   3.55x   71.9%
    L22            7.85     28.14   3.55x   71.9%
Here the uncompressed run collapses to ~0.29 MB/s, while levels up to
the default of 3 achieve 200+ MB/s -- roughly a 700x to 1000x
throughput improvement, driven almost entirely by avoided
fragmentation. Levels above ~5 trade significant CPU for negligible
additional ratio.
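The ratio and speedup figures follow directly from the table. As a worked check, the sketch below recomputes them for three rows; the numbers are copied from the 100 MB table and nothing is measured here.

```perl
use strict;
use warnings;

# Figures copied from the 100 MB table above: level => [raw MB/s, wire MB].
my $corpus_mb  = 100;
my $plain_rate = 0.29;
my %measured   = (
    'L-3' => [287.85, 35.61],
    'L3'  => [207.61, 30.25],
    'L22' => [7.85,   28.14],
);

for my $level (sort keys %measured) {
    my ($rate, $wire_mb) = @{ $measured{$level} };
    # ratio = corpus size / bytes on the wire; speedup = rate / plain rate
    printf "%-4s ratio %.2fx  speedup vs plain %.0fx\n",
        $level, $corpus_mb / $wire_mb, $rate / $plain_rate;
}
```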
Pipe buffer size has minimal impact
The same 100 MB corpus, holding mode constant and varying the kernel
pipe buffer (32 KB, 128 KB, 512 KB, 1 MB), shows almost no movement
in either direction. The bottleneck is PIPE_BUF-aligned framing, not
buffer fill, so calling "resize" with a larger size will not rescue
an uncompressed large-message workload.