
=over 4

=item compression => 'zstd'

Enable Zstd compression. Currently C<'zstd'> is the only supported algorithm;
any other value croaks at construction.

=item compression_level => $level

Zstd compression level, defaults to 3. Only meaningful when C<compression> is
enabled.

=item compression_dictionary => $bytes

Optional shared Zstd dictionary, supplied as raw bytes. Both ends must use the
same dictionary content. Mutually exclusive with C<compression_dictionary_file>.

=item compression_dictionary_file => $path

Same as C<compression_dictionary> but loaded from a file via
L<Compress::Zstd::CompressionDictionary/new_from_file>. The file is read on
demand.

=item keep_compressed => $bool

When set together with C<compression>, reads expose the on-wire compressed
bytes alongside the decompressed payload. See L</read_message> and
L</get_line_burst_or_data> for the exact return-shape changes. Has no effect
without C<compression>.

=back
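
A minimal construction sketch using the options above. It assumes
C<< Atomic::Pipe->pair >> accepts them as named parameters; the option names
come from this list, but the exact call shape is an assumption and should be
checked against the constructor documentation.

    use strict;
    use warnings;
    use Atomic::Pipe;

    # Assumption: pair() forwards these named options to both ends.
    my ($r, $w) = Atomic::Pipe->pair(
        compression       => 'zstd',
        compression_level => 3,    # documented default, spelled out for clarity
    );

    # Small message: fits in the pipe buffer, so same-process read is safe.
    $w->write_message('{"event":"started"}');
    my $msg = $r->read_message;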

=head2 Custom dictionary

Custom Zstd dictionaries can dramatically reduce frame size for small,
repetitive payloads. Either form (bytes or file) may be supplied at
construction or via L</set_compression_dictionary> /
L</set_compression_dictionary_file>.

B<Caveat:> raw Zstd dictionaries do not embed a dictionary ID, so a
B<mismatched> peer dictionary silently decodes to garbage instead of failing.
(Hard frame corruption, such as a truncated or invalid frame, still raises a
fatal error.) Both ends must agree on byte-identical dictionary content.
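
Because a mismatch is silent, it can help to compare a digest of the
dictionary bytes on both ends before enabling it. The following is a sketch
only: C<dictionary_fingerprint> and C<dictionaries_match> are hypothetical
helpers, not part of C<Atomic::Pipe>, and how the fingerprint reaches the peer
(configuration, handshake message, and so on) is left to the application.

    use strict;
    use warnings;
    use Digest::SHA qw(sha256_hex);

    # Hypothetical helper: hex SHA-256 of the raw dictionary bytes.
    sub dictionary_fingerprint {
        my ($bytes) = @_;
        return sha256_hex($bytes);
    }

    # Hypothetical helper: true only when the local dictionary hashes to
    # the fingerprint reported by the peer.
    sub dictionaries_match {
        my ($local_bytes, $peer_fingerprint) = @_;
        return dictionary_fingerprint($local_bytes) eq $peer_fingerprint;
    }

A peer that reports a different fingerprint should be rejected before any
compressed message is exchanged.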

=head2 Performance

Compression is not just a wire-size optimization for C<Atomic::Pipe>: when
messages exceed C<PIPE_BUF> (typically 4096 bytes on Linux) the writer must
fragment them into multiple non-atomic chunks, and the reader must reassemble
them. Compressing the payload first frequently collapses a multi-part message
back into a single atomic burst, which avoids that per-message protocol
overhead entirely. As a result, on workloads dominated by messages larger than
C<PIPE_BUF>, compression is often B<much faster end-to-end than no compression>,
even after accounting for the CPU cost of compress/decompress.

The kernel pipe buffer size (see L</resize>) does B<not> affect this --
fragmentation is keyed on the POSIX C<PIPE_BUF> atomic-write threshold, not on
the buffer capacity.
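
The effect can be estimated with simple arithmetic. The sketch below counts
how many C<PIPE_BUF>-sized chunks a payload would need; C<estimated_chunks> is
a hypothetical helper, and it deliberately ignores any per-chunk framing
header, so real chunk counts may be higher.

    use strict;
    use warnings;
    use POSIX qw(PIPE_BUF ceil);

    # Hypothetical helper: lower bound on the number of chunks needed.
    # Framing overhead is ignored (assumption).
    sub estimated_chunks {
        my ($payload_bytes, $pipe_buf) = @_;
        $pipe_buf //= PIPE_BUF;    # platform value unless overridden
        return 1 if $payload_bytes <= $pipe_buf;
        return ceil($payload_bytes / $pipe_buf);
    }

    printf "%5d bytes -> %d chunk(s)\n", $_, estimated_chunks($_, 4096)
        for 890, 5100, 10000;

With a 4096-byte threshold, a 5100-byte payload needs at least two chunks,
while the same payload compressed to roughly a third of its size fits in one.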

=head3 Benchmark: streaming JSON objects

Numbers below are from C<bench/zstd_compression.pl> in the distribution. The
workload is a synthetic but representative stream of JSON log/event objects
sent in mixed-data mode via C<write_message>. The corpus is generated once and
reused across all runs; sizes are JSON-encoded byte counts.

Two corpora were measured:

=over 4

=item Small JSON (10 MB total, 11785 objects)

Object sizes 181 .. 1977 bytes, average ~890 B; ~37% of objects under 500 B.
Most messages fit in a single C<PIPE_BUF> burst regardless of compression.

  level     raw MB/s   wire MB    ratio   saved
  plain         9.74    10.00       -        -
  L-3          15.98     6.68    1.50x    33.2%
  L1           24.55     4.92    2.03x    50.8%
  L3 (def)     27.79     4.91    2.04x    50.9%
  L5           46.34     4.87    2.05x    51.3%
  L7           63.72     4.87    2.05x    51.3%
  L12          27.02     4.85    2.06x    51.5%
  L22          14.43     4.84    2.07x    51.6%

For this size distribution, every level measured, not only 1..7, is faster
than no compression (pipe back-pressure on the uncompressed run still
dominates).

=item Larger JSON (100 MB total, 20407 objects)

Object sizes 187 .. 10000 bytes, average ~5.1 KB, evenly distributed across
the 1..10 KB range. Most objects exceed C<PIPE_BUF>, so the uncompressed path
pays the multi-part fragmentation cost on nearly every message.

  level     raw MB/s   wire MB    ratio   saved
  plain         0.29   100.00       -        -
  L-3         287.85    35.61    2.81x    64.4%
  L-1         273.56    33.92    2.95x    66.1%
  L1          237.04    30.56    3.27x    69.4%
  L3 (def)    207.61    30.25    3.31x    69.7%
  L5          113.02    30.01    3.33x    70.0%
  L9           39.35    29.93    3.34x    70.1%
  L18           7.81    28.14    3.55x    71.9%
  L22           7.85    28.14    3.55x    71.9%

Here the uncompressed run collapses to ~0.29 MB/s, while even modest
compression levels achieve 200+ MB/s -- roughly 700x to 1000x the uncompressed
throughput, driven almost entirely by avoided fragmentation. Levels above ~5
trade significant CPU for negligible additional ratio.

=item Pipe buffer size has minimal impact

The same 100 MB corpus, holding mode constant and varying the kernel pipe
buffer (32 KB, 128 KB, 512 KB, 1 MB), shows almost no movement in either
direction. The bottleneck is C<PIPE_BUF>-aligned framing, not buffer fill, so
calling L</resize> with a larger size will not rescue an uncompressed
large-message workload.

=back
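
The C<ratio> and C<saved> columns follow directly from the raw and on-wire
sizes, and the speedup figures from the throughput column. A small sketch of
that arithmetic, using values copied from the 100 MB table (C<wire_stats> is a
hypothetical helper, not part of the benchmark script):

    use strict;
    use warnings;

    # Hypothetical helper: compression ratio and percentage saved, given
    # raw and on-wire sizes in the same units.
    sub wire_stats {
        my ($raw_mb, $wire_mb) = @_;
        return (
            ratio => $raw_mb / $wire_mb,
            saved => 100 * (1 - $wire_mb / $raw_mb),
        );
    }

    # L-3 row of the 100 MB table: 100.00 MB raw, 35.61 MB on the wire.
    my %s = wire_stats(100.00, 35.61);
    printf "ratio %.2fx, saved %.1f%%\n", $s{ratio}, $s{saved};

    # Default level (L3) throughput relative to the plain run.
    printf "L3 speedup: ~%.0fx\n", 207.61 / 0.29;

This prints a ratio of 2.81x with 64.4% saved, matching the table, and a
speedup of about 716x for the default level.
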