EV-ClickHouse

 view release on metacpan or  search on metacpan

README.md  view on Meta::CPAN

    # Or via URI: clickhouse[+native]://user:pass@host:port/db?key=val
    my $ch = EV::ClickHouse->new(
        uri        => 'clickhouse+native://default:@127.0.0.1:9000/default',
        on_connect => sub { ... },
    );

    # select
    $ch->query("select number from system.numbers limit 3", sub {
        my ($rows, $err) = @_;
        die $err if $err;
        print "row: @$_\n" for @$rows;     # row: 0 / row: 1 / row: 2
    });

    # Per-query settings + parameterized values (no string interpolation)
    $ch->query(
        "select {x:UInt32} + {y:UInt32} as sum",
        { params => { x => 40, y => 2 }, max_execution_time => 30 },
        sub { my ($rows, $err) = @_; print $rows->[0][0], "\n" },  # 42
    );

    # insert - arrayref of rows (no TSV escaping needed)
    $ch->insert("my_table", [
        [1, "hello\tworld"],   # embedded tab is fine
        [2, undef],            # null
        [3, [10, 20]],         # Array column
    ], sub { my (undef, $err) = @_; warn "insert: $err" if $err });

    # insert - pre-formatted TSV string
    $ch->insert("my_table", "1\tfoo\n2\tbar\n", sub { ... });

    # Raw HTTP response body (HTTP only)
    $ch->query("select * from t format CSV", { raw => 1 }, sub {
        my ($body, $err) = @_;
        print $body;
    });

    EV::run;

# DESCRIPTION

EV::ClickHouse is an asynchronous ClickHouse client that integrates with
the [EV](https://metacpan.org/pod/EV) event loop. It speaks both the ClickHouse HTTP protocol
(port 8123) and the native TCP protocol (port 9000) directly in XS, with
no external ClickHouse client library linked. zlib is required; OpenSSL
(for TLS) and liblz4 (for native compression) are optional and detected
at build time.

## Features

- HTTP and native TCP protocols, with the same Perl API
- gzip compression (HTTP) and LZ4 compression with CityHash
checksums (native)
- TLS/SSL via OpenSSL, with optional `tls_skip_verify` for
self-signed certs and `tls_ca_file` for additional roots
- Connection URIs (`clickhouse[+native]://user:pass@host:port/db`),
including bracketed IPv6 literals
- Per-query and connection-level ClickHouse settings; parameterized
queries via `params`; external tables (native) via `external`
- Auto-reconnect with exponential backoff; queued (unsent) queries
are preserved across reconnects
- Keepalive pings for idle native connections; graceful drain;
query cancellation and skip\_pending
- Streaming results via `on_data` per-block callback (native);
on\_progress for native progress packets
- Raw HTTP response mode for CSV / JSONEachRow / Parquet / etc.
- 35+ ClickHouse types including Int/UInt 8..256, Float32/64,
BFloat16, Decimal32/64/128/256, UUID, IPv4/IPv6, Nullable, Array,
Tuple, Map, LowCardinality (with cross-block dictionaries),
SimpleAggregateFunction, Nested, Geo (Point/Ring/LineString/Polygon
and the Multi variants), and JSON / Object('json') with auto-flattened
hashref leaves (Int64/Float64/Bool/String + Array variants).
- Opt-in decode of Date/DateTime, Decimal, and Enum columns; named-rows
(hashref) mode

# CONSTRUCTOR

## new

    my $ch = EV::ClickHouse->new(%args);

The connection is initiated immediately; `new` returns before it
completes. Queries issued before `on_connect` fires are queued and
dispatched once the connection is ready.

**Connection parameters:**

- uri => $uri\_string

    Single-string connection target:
    `clickhouse[+native]://user:pass@host:port/database?key=value`.

    The `+native` suffix selects the native protocol; otherwise HTTP is used.
    Hostnames, IPv4 addresses, and bracketed IPv6 literals are all accepted
    (e.g. `clickhouse://[::1]:9000/db`). Query-string values are merged into
    the constructor arguments. Discrete `host`, `port`, etc. arguments
    override the URI.

- host => $hostname

    Server hostname. Default: `127.0.0.1`.

    **Note:** DNS resolution is blocking unless [EV::cares](https://metacpan.org/pod/EV%3A%3Acares) is installed.
    With [EV::cares](https://metacpan.org/pod/EV%3A%3Acares) available, hostnames are resolved off-loop at
    construct time (the constructor returns immediately, queries queue
    until the resolved address is connected). Falls back to blocking
    `getaddrinfo` otherwise.

- hosts => \[$h1, $h2, ...\]

    Multi-host failover list. Each entry is `host`, `host:port`, or a
    bracketed-IPv6 literal. On a connect-phase failure (refused, timeout,
    ServerHello stall), the client advances to the next host in round-robin
    order; pair with `auto_reconnect => 1` for automatic recovery.
    The single `host` argument is honoured as a fallback when
    `hosts` isn't given.

- port => $port

    Server port. Default: `8123` (HTTP), `9000` (native).

- protocol => 'http' | 'native'

README.md  view on Meta::CPAN


    TCP/TLS connection timeout. `0` (default) means no timeout. Floating
    point allowed.

- query\_timeout => $seconds

    Default per-query timeout applied to every query and insert. The query
    callback receives a `timeout` error if exceeded. Override per-call via
    the `query_timeout` key in the settings hashref.

- max\_query\_size => $bytes

    Client-side guard: croak before sending any query whose SQL text exceeds
    this many bytes. `0` (default) disables the check. Useful as a
    last-resort defense against accidentally sending unbounded strings.

- max\_recv\_buffer => $bytes

    Defensive ceiling on the response. The cap applies to the raw recv
    buffer (every protocol), the chunked-decoded body (HTTP), and the
    gzip-decompressed body (HTTP), so the same upper bound applies to the
    user-visible payload regardless of transport encoding. On overflow the
    query callback receives an appropriate error ("recv buffer overflow",
    "chunked response too large", or "gzip body exceeds max\_recv\_buffer")
    and the connection is torn down so no subsequent query can slip past
    the cap on the same socket. `0` (default) keeps the historical
    no-cap behaviour (still bounded internally by a hard 128 MB ceiling
    on compressed paths). Recommended in production when the schema is
    constrained and you want a hard upper bound (e.g.
    `128 * 1024 * 1024` for 128 MB).

- http\_basic\_auth => 0 | 1

    HTTP only. When set, send credentials as
    `Authorization: Basic base64(user:password)` instead of the default
    `X-ClickHouse-User` / `X-ClickHouse-Key` header pair. Use this when
    the connection passes through an HTTP gateway (nginx, Envoy, ...) that
    strips the X-ClickHouse-\* headers but forwards Basic auth verbatim.
    Default: `0`.

- auto\_reconnect => 0 | 1

    Reconnect automatically on connection loss. Default: `0`. When enabled,
    queued (unsent) queries are preserved across reconnects; in-flight queries
    receive an error.

    The reconnect path covers TCP/TLS connect failures, `connect_timeout`
    or `query_timeout` expiry, and any clean server-side EOF (idle or
    mid-request). Mid-query I/O errors (ECONNRESET / EPIPE) and a malformed
    native ServerHello are **not** retried - they typically indicate a
    misconfigured peer or client-side bug that retry would only loop on.
    Combine with `reconnect_max_attempts` for an explicit ceiling.

- settings => \\%hash

    ClickHouse settings applied to every query and insert. Per-call settings
    (see ["query"](#query), ["insert"](#insert)) override these.

        settings => { async_insert => 1, max_threads => 4 }

- keepalive => $seconds

    Send a keepalive request every N seconds while the connection is idle:
    a native CLIENT\_PING on the native protocol or a `GET /ping` on HTTP
    (some load balancers / NATs drop idle HTTP connections after a few
    seconds; TCP-level keepalive is too coarse). Default: `0` (disabled).

- reconnect\_delay => $seconds

    Initial delay for the `auto_reconnect` exponential backoff. Each failed
    attempt doubles the delay, capped at `reconnect_max_delay`. Default:
    `0` (immediate retry, no backoff).

- reconnect\_max\_delay => $seconds

    Backoff ceiling. Default: `0`, meaning no explicit cap; the implementation
    still bounds the backoff exponent at 20 doublings, so with
    `reconnect_delay = 0.5` the worst case is roughly 6 days. Setting an
    explicit ceiling is recommended in production.

- reconnect\_jitter => $fraction

    Multiplicative jitter applied to each backoff delay: the actual sleep
    is uniformly random in `[delay, delay * (1 + jitter)]`. `0` (default)
    disables. Set to `0.1`-`0.5` when many clients reconnect against a
    shared cluster - without jitter, every replica restart causes a
    synchronised reconnect storm at the same backoff intervals. Jitter is
    applied _after_ `reconnect_max_delay` clamping, then re-clamped, so
    the ceiling is never exceeded.

- reconnect\_max\_attempts => $N

    Cap the total number of reconnect attempts before giving up. Once the
    cap is reached, `on_error` fires with the message
    `"max reconnect attempts exceeded"` and no further attempts are made
    (the user can manually call `reset` later). Default: `0` (unlimited
    retries; be careful with permanent failures like wrong host).

- progress\_period => $seconds

    Coalesce `on_progress` packets so the callback fires at most once per
    N seconds, with the per-field counters accumulated over the interval.
    Useful for big SELECTs where the server can emit hundreds of progress
    packets per second. Default: `0` (fire on every packet).

- query\_log\_comment => 1 | $string

    Prepend a SQL block comment to every query for `system.query_log`
    traceability. `1` auto-generates `ev_ch user=$ENV{USER} pid=$$`;
    a string is taken literally. Omit (or pass a falsy value) to disable.
    Embedded `*/` sequences are escaped to keep the comment well-formed.

**Decode options (native protocol only):**

These shape how column values are returned. All are opt-in and default
to `0`, which returns raw numeric forms for stable round-tripping.

- decode\_datetime => 0 | 1

    Return `Date`, `Date32`, `DateTime`, and `DateTime64` as formatted
    strings (e.g. `"2024-01-15"`, `"2024-01-15 10:30:00"`) instead of raw
    integers. Uses UTC; columns with an explicit timezone
    (`DateTime('America/New_York')`) are converted to that zone.

- decode\_decimal => 0 | 1

README.md  view on Meta::CPAN

    spec (odd structure list, non-arrayref row, or a column type that
    cannot be encoded).

**Native protocol type notes:** values come back as typed Perl scalars.
By default `Date`/`DateTime` are integers (days since epoch / Unix
timestamps); enable `decode_datetime` for strings. `Enum` values are
numeric codes; `decode_enum` returns labels. `Decimal` values are
unscaled integers; `decode_decimal` scales them to floats.
`SimpleAggregateFunction` is transparently decoded as its inner type.
`Nested` columns become arrays of tuples. `LowCardinality` works
correctly across multi-block results with shared dictionaries.

## insert

    $ch->insert($table, $data, sub { my (undef, $err) = @_ });
    $ch->insert($table, $data, \%settings, sub { my (undef, $err) = @_ });

`$data` may be either:

- A pre-formatted TabSeparated string (tabs separate columns,
newlines separate rows, with the standard ClickHouse escapes).
- An arrayref of arrayrefs (rows of column values).

When using arrayrefs, no TSV escaping is needed: `undef` maps to null
and strings may contain tabs and newlines freely.

Nested arrayrefs (Array/Tuple columns) and hashrefs (Map columns) are
supported **only on the native protocol**, where the encoder has the
column type from the server's sample block. On HTTP the same call
croaks rather than silently produce malformed TSV; use the native
protocol or pre-serialise nested types into ClickHouse TSV literal form.

    # Native: nested types encode directly.
    $ch->insert("my_table", [
        [1, "hello\tworld"],   # embedded tab
        [2, undef],            # null
        [3, [10, 20]],         # Array column   (native only)
        [4, { a => 1, b => 2 }],  # Map column  (native only)
    ], sub { ... });

The optional `\%settings` hashref works exactly as in ["query"](#query),
including `query_id`, `query_timeout`, and `params`. Two extra
flags are recognised here:

- `idempotent => 1 | $token`

    Auto-mints (or uses the supplied) `insert_deduplication_token`, so a
    reconnect-driven retry of the same insert doesn't double-write. Falsy
    values are a no-op.

- `async_insert => 1`

    Enables ClickHouse server-side insert batching by setting
    `async_insert=1, wait_for_async_insert=0`. Both sub-settings can be
    overridden by passing them explicitly.

## ping

    $ch->ping(sub { my ($result, $err) = @_ });

Send a no-op round trip to verify the connection is alive. On success
`$result` is true, `$err` is `undef`. On error: `(undef, $error)`.

## is\_healthy

    $ch->is_healthy(sub { my ($ok, $err) = @_ });
    $ch->is_healthy(sub { ... }, $timeout_seconds);

Bounded health probe: wraps ["ping"](#ping) with a deadline (default 5s). The
callback receives `(1, undef)` on a successful round trip, or
`(0, $msg)` on ping error or timeout. Failure does **not** tear down the
connection; recovery (`reset`, host rotation, etc.) is the caller's
choice. Useful for L4 load-balancer probes and self-monitoring loops.

## ping\_round\_trip

    $ch->ping_round_trip(sub {
        my ($seconds, $err) = @_;
        die "ping: $err" if $err;
        printf "rtt = %.3fms\n", $seconds * 1000;
    });

Issue a single PING and report wall-clock latency in seconds. Lighter
than installing ["track\_query\_durations"](#track_query_durations) for a one-shot probe;
returns `(undef, $err)` on transport failure. Pairs well with
["is\_healthy"](#is_healthy) for health-check endpoints that want both liveness and
latency.

## slow\_query\_log

    my $prev = $ch->slow_query_log(0.1, sub {
        my ($qid, $rows, $bytes, $code, $dur, $err) = @_;
        warn sprintf("SLOW %.3fs %s\n", $dur, $qid // '?');
    });

Filtered variant of ["on\_query\_complete"](#on_query_complete) that fires only when the
query took at least `$threshold` seconds. Returns the previous
`on_query_complete` so the caller can restore it. The previous
handler is also chained on every call, so installing this on top of
existing instrumentation is safe.

## server\_setting

    $ch->server_setting('max_threads', sub {
        my ($value, $err) = @_;
        warn "max_threads = $value\n";
    });

Looks one value up from `system.settings`. Convenient one-liner for
"what's the server's effective `$x`?". Returns `undef` via the
callback if the setting name isn't present on this server.

## row\_count

    $ch->row_count('events', sub { ... });
    $ch->row_count('events', "ts > now() - interval 1 hour", sub { ... });

`select count() from $table [where $where]`. `$where` is interpolated
literally; use parameterized predicates via the ["query"](#query) `params`
mechanism for user-supplied filters. Returns the row count or
`(undef, $err)`.

README.md  view on Meta::CPAN

        my ($info, $err) = @_;
        die $err if $err;
        for my $col (@{ $info->{columns} }) {
            printf "%-20s %s\n", $col->{name}, $col->{type};
        }
    });

Schema introspection: issues `describe table $name` and delivers
`{ columns => [{name=>..., type=>...}, ...] }` to the
callback. Useful for generic insert pipelines that need column types
without hard-coding them. `$name` may be `table` or `db.table`;
non-identifier characters are rejected up-front.

## iterate

    my $it = $ch->iterate("select number from numbers(1_000_000)");
    while (my $batch = $it->next($timeout)) {
        process($_) for @$batch;
    }
    die $it->error if $it->error;

**Native protocol only** - relies on the per-block `on_data` hook and
will croak if invoked on an HTTP connection.

Synchronous-feeling pull iterator over a streaming select. Internally
wraps the native `on_data` per-block callback and drives the EV loop
from inside `->next` until the next block arrives, the query
completes, or the optional timeout (seconds) expires. Useful for
procedural ETL / export code that doesn't fit a callback shape.

`->error`, `->is_done`, and `->cancel` are also
available on the returned iterator object.

## on\_log

    $ch->on_log(sub {
        my ($entry) = @_;
        # $entry: { event_time, host_name, query_id, thread_id,
        #          priority, source, text }
        printf "[CH %s] %s\n", $entry->{priority}, $entry->{text};
    });

Native protocol only. Fires once per row inside any `SERVER_LOG`
packet the server emits. Useful for surfacing
`send_logs_level => 'information'` server-side trace events to
the application's own log stream without polling `system.text_log`.
The row hash keys mirror the server-side log block schema; missing
keys (older revisions) come through as `undef`.

## on\_query\_start

    $ch->on_query_start(sub {
        my ($query_id) = @_;
        log_metric_start($query_id);
    });

Optional connection-level hook that fires the moment a query is
dispatched to the wire (after the query\_id has been resolved, before
the first send byte). Symmetric with ["on\_query\_complete"](#on_query_complete); useful for
deriving accurate "query in flight" durations without depending on
the per-query callback closure. Keepalive PINGs are suppressed, the
same as for `on_query_complete`. Also accepted as a constructor
argument.

## on\_query\_complete

    $ch->on_query_complete(sub {
        my ($query_id, $rows, $bytes, $error_code, $duration_s, $err) = @_;
        log_metric(...);
    });

Optional connection-level hook that fires after every query (success
or error). Arguments: query\_id (or undef), profile\_rows, profile\_bytes,
last\_error\_code, wall-clock duration in seconds, error message (or
undef). Useful for statsd/Prometheus-style instrumentation. Also
accepted as a constructor argument.

A per-query override may be passed in the `\%settings` hashref of
["query"](#query) or ["insert"](#insert). When set, it **replaces** (does not augment)
the connection-level handler for that single call, so per-query
instrumentation doesn't double-count against global metrics:

    $ch->query(
        $sql,
        { on_query_complete => sub {
              my ($qid, $rows, $bytes, $code, $dur, $err) = @_;
              record_slow_query($qid, $dur);
        } },
        $cb,
    );

## insert\_streamer

    my $s = $ch->insert_streamer('events',
        batch_size     => 5_000,
        settings       => { query_id => 'ingest-1' },     # optional
        on_batch_error => sub { warn "batch err: $_[0]" }, # per-failure
    );
    while (my $row = next_event()) {
        $s->push_row($row);
    }
    $s->finish(sub {
        my (undef, $err) = @_;
        die "ingest failed: $err" if $err;
    });

Buffered streaming insert for ETL workloads. Rows are buffered until
`batch_size` is reached, then dispatched as a single `insert()`.
Dispatches are serialised; push\_row keeps buffering while a batch is
in flight (the native protocol cannot pipeline INSERTs). `finish`
flushes the remaining buffer and fires its callback once all batches
complete; if any batch failed the first error is delivered as
`$err`. The streamer also offers `buffered_count` and `in_flight`
accessors for backpressure logic.

`$streamer->reset` discards any rows still in the local buffer
and clears the sticky error so the streamer can be reused after a
permanent error (e.g. a schema fix). Does **not** touch the underlying
`$ch` - any batch already on the wire still completes normally. Any
callback registered via `finish` or `await_drain` that has not yet
fired is invoked with a `'streamer reset'` error rather than being

README.md  view on Meta::CPAN

async chain.

    $pool->with_session(sub {
        my ($ch, $release) = @_;
        $ch->query("create temporary table t (n UInt32)", sub {
            $ch->query("insert into t values (1),(2),(3)", sub {
                $ch->query("select sum(n) from t", sub {
                    my ($rows) = @_;
                    say $rows->[0][0];
                    $release->();
                });
            });
        });
    });

`$pool->query_to($idx, $sql, $cb)` /
`$pool->insert_to($idx, $table, $data, $cb)` force-routes a
call to a specific member without going through `_pick`. Circuit
breaker observation still applies (success/failure is recorded
against that member). Useful for replica-targeted DDL, S3 ingest
that has to land on a chosen node, or sticky-affinity reads.

`$pool->nominate($idx)` returns the underlying connection so
subsequent calls bypass the pool entirely. Use sparingly - calls
made directly on the nominated connection don't update the
circuit-breaker state.

`$pool->hedged_query($sql, hedge => 2, $cb)` dispatches
the same select to `hedge` distinct random members and resolves
with whichever returns first. The callback receives
`($rows, undef, $member_idx)` on success (so callers can attribute
wins per member) or `(undef, $err)` if _every_ member fails.
Extra completions after the winner are silently discarded.
Recommended for tail-latency-sensitive selects on replicated tables.
**Do not** use for insert - would silently double-write when the
server's dedupe window misses.

`$pool->fan_out($sql, $cb)` sends the same select to _every_
member and collects per-member results into one arrayref:

    $pool->fan_out("select hostName(), uptime()", sub {
        for my $r (@{ $_[0] }) {
            printf "[%d] err=%s rows=%s\n",
                   $r->{member}, $r->{err} // '-',
                   $r->{rows} ? scalar @{$r->{rows}} : '-';
        }
    });

Useful for shard-aware diagnostics (per-replica lag, distinct
`system.*` values across the pool). Errors are per-member, not
aggregated - the callback always fires with a complete list. Pass
`settings => \%h` for per-query options.

**Circuit breaker:** pass `circuit_threshold => N` at construction
to enable per-member fail-fast. After N consecutive query/insert/ping
errors on a given member, that member is excluded from `_pick` for
`circuit_cooldown` seconds (default 30). A successful callback resets
the per-member fail counter. If every member is dead at pick time the
breaker is bypassed so the next attempt still has a chance to recover.
Inspect with `$pool->circuit_state` which returns one
`{ fails => N, dead_until => $epoch, alive => 0|1 }`
hashref per member.

**Graceful shutdown:** `$pool->shutdown($grace_seconds, $cb)`
drains every member, then calls `finish` on each. If `$grace_seconds`
elapses before every member drains, members still in flight are
force-finished and `$cb` receives the string
`"Pool::shutdown timed out after Ns"`. On a clean shutdown `$cb`
receives undef. `$grace_seconds` may be 0 (or undef) to wait
indefinitely. The callback fires exactly once.

    $SIG{TERM} = sub { $pool->shutdown(10, sub { EV::break }) };

# LIFECYCLE

## finish

    $ch->finish;

Close the connection. Pending queries receive an error callback. Aliased
as `disconnect`.

## reset

    $ch->reset;

Disconnect and immediately reconnect using the original parameters.
Aliased as `reconnect`.

## drain

    $ch->drain(sub { ... });

Register a callback to fire once all pending queries (queued + in-flight)
have completed. If nothing is pending, the callback fires synchronously.
The classic graceful-shutdown pattern:

    $ch->query("select 1", sub { ... });
    $ch->query("select 2", sub { ... });
    $ch->drain(sub {
        $ch->finish;
        EV::break;
    });

## cancel

    $ch->cancel;

Cancel the currently in-flight query. Native protocol sends CLIENT\_CANCEL
and waits for the server's EndOfStream/Exception; HTTP closes the connection
(use `auto_reconnect` or call ["reset"](#reset) to recover). The query's callback
receives an error.

## skip\_pending

    $ch->skip_pending;

Drop every pending operation: each queued and in-flight callback is invoked
with `(undef, $error_message)`. If a request was on the wire, the connection
is torn down; call ["reset"](#reset) (or rely on `auto_reconnect`) before issuing
new queries.

README.md  view on Meta::CPAN

    `EV::ClickHouse->is_retryable_error($code)` returns true for the
    common transient codes (timeouts, network errors, replica catch-up,
    keeper exceptions, ...). Inspect `$ch->last_error_code` from
    inside your query callback and schedule a retry only when the predicate
    fires - permanent errors (auth failures, missing tables) won't qualify.

    Sample skeleton:

        $ch->query($sql, sub {
            my ($r, $err) = @_;
            if ($err && EV::ClickHouse->is_retryable_error($ch->last_error_code)) {
                schedule_retry($sql);
            } elsif ($err) { warn "permanent: $err" }
        });

- Idempotent insert silently drops some rows

    `idempotent => 1` auto-mints
    `insert_deduplication_token`; if your producer issues the SAME logical
    batch twice (e.g. retry after a transient network blip) only the first
    write lands, by design. To force two distinct logical batches through,
    either pass an explicit `idempotent => $token` per batch or
    omit the option for fresh inserts. See `eg/idempotent_insert.pl`.

- `on_data` vs `iterate` - which should I pick?

    `on_data => sub { }` in the per-query settings is the
    lowest-overhead streaming path: each native data block is delivered as
    soon as the parser has it, no per-row allocation overhead beyond the
    batch arrayref. `iterate` is a synchronous-feeling pull wrapper around
    the same machinery - useful when the surrounding code is procedural
    (ETL scripts, exporters) and a callback shape doesn't fit. Both are
    native-only.

- Connection in front of nginx / reverse proxy strips X-ClickHouse-\* headers

    Pass `http_basic_auth => 1` to send the credentials as
    `Authorization: Basic ...` instead. Most HTTP gateways forward
    Authorization verbatim while filtering proprietary headers.

# TUNING

- Native vs HTTP

    Native (port 9000) is typically 2-5x faster for insert and select-of-many-rows
    because rows ship as binary columns instead of TSV text. Use HTTP only when
    the network path requires HTTPS-only or when you need `raw => 1` CSV /
    JSONEachRow / Parquet bodies.

- `compress => 1`

    Enables LZ4 (native) or gzip (HTTP). LZ4 cost is small and saves ~50-70%
    on text-heavy columns. Gzip is heavier; turn on only if you're bandwidth-bound.

- `insert_streamer` batch\_size

    Default 10\_000 is a good baseline. Smaller (1k-2k) reduces memory pressure
    on the producer; larger (50k-100k) reduces server-side merge cost on
    MergeTree. Match to your row width: ~1 MB per batch is a sweet spot.

- `keepalive`

    Enable on long-lived idle connections (HTTP behind a load balancer or
    NAT, or a native connection that may sit minutes between queries). 15-30s
    is typical.

- `reconnect_max_attempts`

    Always set in production. Default is unlimited; a permanent failure
    (wrong host, wrong port, dead server) will spin `on_error` forever
    otherwise.

- `progress_period`

    Coalesce on\_progress packets to one fire per N seconds. Big SELECTs can
    emit hundreds per second; throttle to 1-5s for monitoring dashboards.

- Pull-iterator vs `on_data`

    `on_data` has lower per-block overhead. `iterate` trades that for a
    synchronous-feeling API; use it when the surrounding code is procedural.

- `EV::ClickHouse::Pool`

    A Pool fans concurrent queries across N independent connections, so a
    slow query on one doesn't head-of-line-block the others. Use it for
    read-mostly fan-out; do not use it for queries that depend on
    session-level state (temporary tables, `set`) since each query may
    land on a different connection.

## Performance tuning checklist

- 1. Pick the right protocol

    Native (port 9000) beats HTTP (port 8123) for almost all workloads.
    HTTP is only required for HTTPS-fronted ingress, the `raw` mode that
    returns `RowBinary` / `JSONEachRow` / `Parquet` bodies unparsed, or
    gateway authentication that strips proprietary CH headers (see
    `http_basic_auth`).

- 2. Tune `batch_size` for INSERTs

    Aim for ~1 MB per batch. ClickHouse merges every block into a part on
    disk, so 1k blocks of 1k rows each is dramatically slower than 1 block
    of 1M rows because of merge amplification. `insert_streamer` with
    `batch_size => $rows_for_1MB` + `high_water` backpressure is the
    production-grade default.

- 3. Cap `max_recv_buffer`

    Without a cap, a runaway select (or a buggy upstream that returns
    gigabytes) will grow the recv buffer until the process is OOM-killed.
    Set `max_recv_buffer => 64 * 1024 * 1024` (64 MB) and let the
    parser tear the connection down with a clean error if exceeded - the
    caller's on\_error can decide whether to retry or surface to the user.

- 4. Watch for head-of-line blocking

    A single `EV::ClickHouse` serialises queries. Use
    [EV::ClickHouse::Pool](https://metacpan.org/pod/EV%3A%3AClickHouse%3A%3APool) when concurrent queries should run in parallel
    (read-mostly workloads, dashboard fan-out). For latency-sensitive



( run in 0.625 second using v1.01-cache-2.11-cpan-f56aa216473 )