EV-ClickHouse
lib/EV/ClickHouse.pm view on Meta::CPAN
(connect, dispatch, disconnect). Useful for diagnosing protocol issues.
=item on_failover => sub { my ($old_host, $old_port, $new_host, $new_port, $msg) = @_ }
Multi-host only. Fires after the failover wrapper rotates to the next
host in the C<hosts =E<gt> [...]> list, with the old and new (host, port)
pair plus the triggering error message. Use it for metrics ("which host
am I on?") or to log host transitions. Fires before the user's C<on_error>.
=back
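As an illustration of the C<on_failover> callback described above, here is a minimal sketch that tracks the current target and counts transitions. The C<%failover_count> store, the C<@log> array, and the host names are hypothetical; only the argument order comes from the documentation. The callback is invoked by hand here so the snippet runs without a server.

```perl
use strict;
use warnings;

# Hypothetical metrics store: counts rotations per "old -> new" transition.
my %failover_count;
my @log;
my $current = '';

# Callback using the argument order documented for on_failover.
my $on_failover = sub {
    my ($old_host, $old_port, $new_host, $new_port, $msg) = @_;
    my $from = "$old_host:$old_port";
    $current = "$new_host:$new_port";
    $failover_count{"$from -> $current"}++;
    push @log, "failover $from -> $current ($msg)";
};

# Simulate one rotation by hand; in real use the client invokes the callback.
$on_failover->('ch1.example', 9000, 'ch2.example', 9000, 'connection refused');
print "$_\n" for @log;
```

In real use the code reference would be passed as C<<< on_failover =E<gt> $on_failover >>> alongside the C<hosts> list.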
B<Options:>
=over 4
=item compress => 0 | 1
Enable compression: gzip on HTTP (request and response), LZ4 with CityHash
checksums on the native protocol. Default: C<0>. Native compression
requires liblz4 at build time.
=item session_id => $id
HTTP session id for stateful operations (temporary tables, C<SET>, etc.).
The native protocol is stateful by nature, so this option applies to HTTP only.
=item connect_timeout => $seconds
TCP/TLS connection timeout. C<0> (default) means no timeout. Floating
point allowed.
=item query_timeout => $seconds
Default per-query timeout applied to every query and insert. The query
callback receives a C<timeout> error if exceeded. Override per-call via
the C<query_timeout> key in the settings hashref.
=item max_query_size => $bytes
Client-side guard: croak before sending any query whose SQL text exceeds
this many bytes. C<0> (default) disables the check. Useful as a
last-resort defense against accidentally sending unbounded strings.
=item max_recv_buffer => $bytes
Defensive ceiling on the response. The cap applies to the raw recv
buffer (every protocol), the chunked-decoded body (HTTP), and the
gzip-decompressed body (HTTP), so the same upper bound applies to the
user-visible payload regardless of transport encoding. On overflow the
query callback receives an appropriate error ("recv buffer overflow",
"chunked response too large", or "gzip body exceeds max_recv_buffer")
and the connection is torn down so no subsequent query can slip past
the cap on the same socket. C<0> (default) keeps the historical
no-cap behaviour (still bounded internally by a hard 128 MB ceiling
on compressed paths). Setting an explicit cap is recommended in
production when the schema is constrained and you want a hard upper
bound (e.g. C<128 * 1024 * 1024> for 128 MB).
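The timeout and size guards above can be collected into one options hash. This is only a sketch: the values are illustrative, and the constructor call in the comment is an assumption (the constructor itself is not shown in this section).

```perl
use strict;
use warnings;

# Illustrative values only; tune them to your workload.
my %limits = (
    connect_timeout => 2.5,                 # seconds, fractional allowed
    query_timeout   => 30,                  # default for every query and insert
    max_query_size  => 1024 * 1024,         # croak on SQL text over 1 MB
    max_recv_buffer => 128 * 1024 * 1024,   # 128 MB response ceiling
);

# Assumed constructor call (hypothetical %connection_opts):
#   my $ch = EV::ClickHouse->new(%connection_opts, %limits);
printf "%-16s %s\n", $_, $limits{$_} for sort keys %limits;
```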
=item http_basic_auth => 0 | 1
HTTP only. When set, send credentials as
C<Authorization: Basic base64(user:password)> instead of the default
C<X-ClickHouse-User> / C<X-ClickHouse-Key> header pair. Use this when
the connection passes through an HTTP gateway (nginx, Envoy, ...) that
strips the X-ClickHouse-* headers but forwards Basic auth verbatim.
Default: C<0>.
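When debugging a gateway, it can help to reproduce the header value by hand and compare it with what the proxy logs. A minimal sketch using the core L<MIME::Base64> module; C<basic_auth_header> is a hypothetical helper, not part of this module, and the credentials are placeholders.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Build the Basic auth header value for the given credentials.
# The empty second argument suppresses the trailing newline that
# encode_base64 appends by default.
sub basic_auth_header {
    my ($user, $password) = @_;
    return 'Basic ' . encode_base64("$user:$password", '');
}

my $header = basic_auth_header('default', 'secret');   # placeholder credentials
print "Authorization: $header\n";
```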
=item auto_reconnect => 0 | 1
Reconnect automatically on connection loss. Default: C<0>. When enabled,
queued (unsent) queries are preserved across reconnects; in-flight queries
receive an error.
The reconnect path covers TCP/TLS connect failures, C<connect_timeout>
or C<query_timeout> expiry, and any clean server-side EOF (idle or
mid-request). Mid-query I/O errors (ECONNRESET / EPIPE) and a malformed
native ServerHello are B<not> retried - they typically indicate a
misconfigured peer or client-side bug that retry would only loop on.
Combine with C<reconnect_max_attempts> for an explicit ceiling.
=item settings => \%hash
ClickHouse settings applied to every query and insert. Per-call settings
(see L</query>, L</insert>) override these.
    settings => { async_insert => 1, max_threads => 4 }
=item keepalive => $seconds
Send a keepalive request every N seconds while the connection is idle:
a native CLIENT_PING on the native protocol or a C<GET /ping> on HTTP
(some load balancers / NATs drop idle HTTP connections after a few
seconds; TCP-level keepalive is too coarse). Default: C<0> (disabled).
=item reconnect_delay => $seconds
Initial delay for the C<auto_reconnect> exponential backoff. Each failed
attempt doubles the delay, capped at C<reconnect_max_delay>. Default:
C<0> (immediate retry, no backoff).
=item reconnect_max_delay => $seconds
Backoff ceiling. Default: C<0>, meaning no explicit cap; the implementation
still bounds the backoff exponent at 20 doublings, so with
C<<< reconnect_delay =E<gt> 0.5 >>> the worst case is 0.5 * 2**20 seconds,
roughly 6 days. Setting an explicit ceiling is recommended in production.
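The arithmetic behind that worst case can be checked with a small model of the documented schedule. This is a sketch of the behaviour as described, not the module's implementation; C<backoff_delay> is a hypothetical helper, and numbering attempts from 0 is an assumption.

```perl
use strict;
use warnings;

# Model of the documented backoff: start at $initial, double per failed
# attempt, stop doubling after 20 doublings, clamp to $max_delay if non-zero.
# Assumption: attempt numbering starts at 0 (the first retry waits $initial).
sub backoff_delay {
    my ($initial, $max_delay, $attempt) = @_;
    my $exp   = $attempt > 20 ? 20 : $attempt;
    my $delay = $initial * 2 ** $exp;
    $delay = $max_delay if $max_delay > 0 && $delay > $max_delay;
    return $delay;
}

# Uncapped worst case for reconnect_delay => 0.5, expressed in days.
my $worst_days = backoff_delay(0.5, 0, 25) / 86_400;

# With reconnect_max_delay => 30 the schedule flattens after a few attempts.
my @capped = map { backoff_delay(0.5, 30, $_) } 0 .. 8;

printf "uncapped worst case: %.2f days\n", $worst_days;
print  "capped schedule: @capped\n";
```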
=item reconnect_jitter => $fraction
Multiplicative jitter applied to each backoff delay: the actual sleep
is uniformly random in C<[delay, delay * (1 + jitter)]>. C<0> (default)
disables. Set to C<0.1>-C<0.5> when many clients reconnect against a
shared cluster - without jitter, every replica restart causes a
synchronised reconnect storm at the same backoff intervals. Jitter is
applied I<after> C<reconnect_max_delay> clamping, then re-clamped, so
the ceiling is never exceeded.
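The jitter rule above can be modelled in a few lines. Again this is a sketch of the documented behaviour rather than the module's code; C<jittered_delay> is a hypothetical helper that takes an already-clamped backoff delay.

```perl
use strict;
use warnings;

# Model of the documented jitter: pick a value uniformly between
# $delay and $delay * (1 + $jitter), then clamp again so a non-zero
# ceiling is never exceeded. $delay is assumed to be already clamped.
sub jittered_delay {
    my ($delay, $jitter, $max_delay) = @_;
    my $sleep = $delay * (1 + rand() * $jitter);
    $sleep = $max_delay if $max_delay > 0 && $sleep > $max_delay;
    return $sleep;
}

# 1000 samples around a 10 s delay with 30% jitter and a 12 s ceiling.
my @samples = map { jittered_delay(10, 0.3, 12) } 1 .. 1000;
my ($min, $max) = (sort { $a <=> $b } @samples)[0, -1];
printf "min %.2f  max %.2f\n", $min, $max;
```

Every sample stays between the base delay and the ceiling, which is the property that spreads out simultaneous reconnects without ever exceeding C<reconnect_max_delay>.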
=item reconnect_max_attempts => $N
Cap the total number of reconnect attempts before giving up. Once the
cap is reached, C<on_error> fires with the message
lib/EV/ClickHouse.pm view on Meta::CPAN
Without L<EV::cares>, DNS resolution falls back to blocking
C<getaddrinfo>. Install L<EV::cares> for non-blocking lookup; otherwise
use an IP literal or a local caching resolver (nscd / systemd-resolved).
=item C<connect_timeout> doesn't fire
It does fire: the timer covers TCP connect, TLS handshake, and the native
ServerHello. If it appears not to fire, the underlying issue is usually a
synchronous DNS stall (see above), which happens before C<start_connect>
arms the timer; install L<EV::cares> to push DNS off the loop.
=item Per-query C<query_timeout> is ignored
Set it inside the C<\%settings> hashref, not as a top-level argument:
C<<< $ch->query($sql, { query_timeout =E<gt> 5 }, $cb) >>>.
=item Which host am I currently pointed at after failover?
C<<< $ch->current_host >>> and C<<< $ch->current_port >>> reflect the
live target after a multi-host rotation. Use C<<< on_failover =E<gt>
sub { ... } >>> to get notified at the moment of each rotation.
=item How do I retry only on transient errors?
C<<< EV::ClickHouse->is_retryable_error($code) >>> returns true for the
common transient codes (timeouts, network errors, replica catch-up,
keeper exceptions, ...). Inspect C<<< $ch->last_error_code >>> from
inside your query callback and schedule a retry only when the predicate
fires - permanent errors (auth failures, missing tables) won't qualify.
Sample skeleton:
    $ch->query($sql, sub {
        my ($r, $err) = @_;
        if ($err && EV::ClickHouse->is_retryable_error($ch->last_error_code)) {
            schedule_retry($sql);    # your own re-queue helper
        } elsif ($err) {
            warn "permanent: $err";
        }
    });
=item Idempotent insert silently drops some rows
C<<< idempotent =E<gt> 1 >>> auto-mints
C<insert_deduplication_token>; if your producer issues the SAME logical
batch twice (e.g. retry after a transient network blip) only the first
write lands, by design. To force two distinct logical batches through,
either pass an explicit C<<< idempotent =E<gt> $token >>> per batch or
omit the option for fresh inserts. See F<eg/idempotent_insert.pl>.
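One way to produce an explicit per-batch token is to hash the batch contents, so a retry of the same batch reuses the token while a different batch gets a new one. This is a minimal sketch using the core L<Digest::SHA> module; C<batch_token>, the table name, and the row layout are hypothetical, and two intentionally identical batches would still be deduplicated under this scheme.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Derive a deterministic token from the table name and batch contents.
# Assumption: rows are arrayrefs of defined byte-string scalars that
# contain no tabs or newlines (otherwise the join is ambiguous).
sub batch_token {
    my ($table, $rows) = @_;
    my $payload = join "\n", map { join "\t", @$_ } @$rows;
    return sha256_hex("$table\n$payload");
}

my @batch_a = ([1, 'alice'], [2, 'bob']);
my @batch_b = ([3, 'carol']);

my $token_a       = batch_token('events', \@batch_a);
my $token_a_retry = batch_token('events', \@batch_a);
my $token_b       = batch_token('events', \@batch_b);

# The token would then go into the per-call settings, e.g.
#   { idempotent => $token_a }
print "retry reuses token: ", ($token_a eq $token_a_retry ? "yes" : "no"), "\n";
print "new batch gets new token: ", ($token_a ne $token_b ? "yes" : "no"), "\n";
```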
=item C<on_data> vs C<iterate> - which should I pick?
C<<< on_data =E<gt> sub { } >>> in the per-query settings is the
lowest-overhead streaming path: each native data block is delivered as
soon as the parser has it, with no per-row allocation overhead beyond the
batch arrayref. C<iterate> is a synchronous-feeling pull wrapper around
the same machinery, useful when the surrounding code is procedural
(ETL scripts, exporters) and a callback shape doesn't fit. Both are
native-only.
=item nginx / reverse proxy in front of ClickHouse strips X-ClickHouse-* headers
Pass C<<< http_basic_auth =E<gt> 1 >>> to send the credentials as
C<Authorization: Basic ...> instead. Most HTTP gateways forward
Authorization verbatim while filtering proprietary headers.
=back
=head1 TUNING
=over 4
=item Native vs HTTP
Native (port 9000) is typically 2-5x faster for inserts and for selects that
return many rows, because rows ship as binary columns instead of TSV text.
Use HTTP only when the network path allows nothing but HTTPS or when you need
C<raw =E<gt> 1> CSV / JSONEachRow / Parquet bodies.
=item C<compress =E<gt> 1>
Enables LZ4 (native) or gzip (HTTP). LZ4 costs little CPU and shrinks
text-heavy columns by ~50-70%. Gzip is heavier; turn it on only if you're
bandwidth-bound.
=item C<insert_streamer> batch_size
The default of 10_000 is a good baseline. Smaller batches (1k-2k) reduce
memory pressure on the producer; larger ones (50k-100k) reduce server-side
merge cost on MergeTree. Match it to your row width: ~1 MB per batch is a
sweet spot.
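The row-width rule of thumb above can be turned into a small calculation. This is a sketch under stated assumptions: C<suggest_batch_size> is a hypothetical helper, the average row size is something you measure yourself, and the clamp uses the 1k-100k range mentioned above.

```perl
use strict;
use warnings;

# Pick a batch_size that targets roughly 1 MB per batch, clamped to the
# 1k-100k range discussed above. $avg_row_bytes must be greater than zero.
sub suggest_batch_size {
    my ($avg_row_bytes, $target_bytes) = @_;
    $target_bytes //= 1_000_000;
    my $n = int($target_bytes / $avg_row_bytes);
    $n = 1_000   if $n < 1_000;
    $n = 100_000 if $n > 100_000;
    return $n;
}

printf "%5d-byte rows -> batch_size %d\n", $_, suggest_batch_size($_)
    for 20, 100, 5_000;
```

Narrow rows push the suggestion towards the large end, while very wide rows hit the lower clamp, matching the trade-off between producer memory and server-side merge cost.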
=item C<keepalive>
Enable on long-lived idle connections (HTTP behind a load balancer or
NAT, or a native connection that may sit minutes between queries). 15-30s
is typical.
=item C<reconnect_max_attempts>
Always set in production. Default is unlimited; a permanent failure
(wrong host, wrong port, dead server) will spin C<on_error> forever
otherwise.
=item C<progress_period>
Coalesces C<on_progress> packets into at most one callback per N seconds.
Big SELECTs can emit hundreds per second; throttle to 1-5s for monitoring
dashboards.
=item Pull-iterator vs C<on_data>
C<on_data> has lower per-block overhead. C<iterate> trades that for a
synchronous-feeling API; use it when the surrounding code is procedural.
=item C<EV::ClickHouse::Pool>
A Pool fans concurrent queries across N independent connections, so a
slow query on one doesn't head-of-line-block the others. Use it for
read-mostly fan-out; do not use it for queries that depend on
session-level state (temporary tables, C<SET>), since each query may
land on a different connection.
=back
=head2 Performance tuning checklist