ClickHouse-Encoder
==================
Fast XS encoder for ClickHouse Native format. Builds a binary block from a
Perl arrayref of rows; the result is the request body for an
`INSERT ... FORMAT Native` statement over HTTP, the native TCP protocol,
or via stdin to clickhouse-client.
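For orientation, the following pure-Perl sketch builds a one-column Native block by hand. It follows the publicly documented layout of the Native format as sent over HTTP (column count, row count, then name, type and data per column) and is not taken from this module's XS source. The helper names `varuint`, `lp_string` and `native_block_uint32` are hypothetical.

```perl
use strict;
use warnings;

# Unsigned LEB128 varint, used for counts and string lengths.
sub varuint {
    my ($n) = @_;
    my $out = '';
    while ( $n >= 0x80 ) {
        $out .= chr( ( $n & 0x7f ) | 0x80 );    # low 7 bits + continuation bit
        $n >>= 7;
    }
    return $out . chr($n);
}

# Length-prefixed string. Assumes $s is already a byte string.
sub lp_string {
    my ($s) = @_;
    return varuint( length $s ) . $s;
}

# One block holding a single UInt32 column.
sub native_block_uint32 {
    my ( $name, $values ) = @_;
    return varuint(1)                      # number of columns
         . varuint( scalar @$values )      # number of rows
         . lp_string($name)                # column name
         . lp_string('UInt32')             # column type
         . pack( 'V*', @$values );         # little-endian 32-bit values
}

my $block = native_block_uint32( id => [ 1, 2, 300 ] );
print unpack( 'H*', $block ), "\n";
```

The encoder does this work in XS for every supported type; the sketch only shows why the output is a flat byte string that can be sent as a request body.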
INSTALLATION
perl Makefile.PL
make
make test
make install
REQUIREMENTS
A 64-bit Perl ($Config{ivsize} >= 8). No external C library is required;
the encoder is implemented entirely in XS.
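The integer-size requirement can be checked before building with a short script like the one below. It uses only the core Config module; the function name `perl_is_64bit` is illustrative.

```perl
use strict;
use warnings;
use Config;

# ivsize is the size in bytes of Perl's native integer (IV) type.
sub perl_is_64bit {
    return ( $Config{ivsize} // 0 ) >= 8 ? 1 : 0;
}

print perl_is_64bit()
    ? "ok: ivsize=$Config{ivsize}\n"
    : "unsupported: ivsize=$Config{ivsize}, need >= 8\n";
```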
SUPPORTED TYPES
Int8/16/32/64, UInt8/16/32/64, Float32/64, BFloat16, String, FixedString(N),
Date, Date32, DateTime, DateTime('tz'), DateTime64(p),
Decimal32(s), Decimal64(s), Decimal128(s), Decimal256(s), Decimal(P, S),
Enum8(...), Enum16(...),
Bool / Boolean, UUID, IPv4, IPv6,
Map(K, V), LowCardinality(String|FixedString|Nullable(...)),
Variant(T1, T2, ...) (CH 24.1+),
SimpleAggregateFunction(func, T),
Tuple(T1, T2, ...) including named: Tuple(a Int32, b String),
Geo: Point, Ring, LineString, MultiLineString, Polygon, MultiPolygon,
Array(T), Nullable(T),
JSON / Object('json') (CH 24.8+): hashref input with nested hashref
auto-flattening to dotted paths; per-path types inferred from
Perl SV flags (Int64, Float64, Bool, String) and arrayref leaves
encoded as Array(T) variants. Symmetric on decode (unflattens).
Dynamic: standalone Dynamic column, same wire format as one JSON
path's Dynamic sub-column without the Object wrapper.
DateTime / DateTime64 strings accept ISO 8601 with timezone offsets
(Z, +HH:MM, -HH:MM, +HHMM); the offset is applied to convert to UTC.
See `perldoc ClickHouse::Encoder` for value coercion rules and limits.
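The offset handling described above amounts to the following pure-Perl sketch. It is an illustration, not the module's XS parser: the function name `iso8601_to_epoch` is hypothetical, fractional seconds are left out, and treating a string without an offset as UTC is an assumption made here.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Parse "YYYY-MM-DD[T ]hh:mm:ss" with an optional Z / +HH:MM / -HH:MM / +HHMM
# suffix and return the UTC epoch second.
sub iso8601_to_epoch {
    my ($s) = @_;
    my ( $y, $mo, $d, $h, $mi, $sec, $tz ) = $s =~ m{
        \A (\d{4})-(\d{2})-(\d{2}) [T\x20] (\d{2}):(\d{2}):(\d{2})
        ( Z | [+-]\d{2}:?\d{2} )? \z
    }x or die "unparseable timestamp: $s";

    # Interpret the wall-clock fields as if they were UTC first.
    my $epoch = timegm( $sec, $mi, $h, $d, $mo - 1, $y );

    if ( defined $tz && $tz ne 'Z' ) {
        my ( $sign, $oh, $om ) = $tz =~ /\A([+-])(\d{2}):?(\d{2})\z/;
        my $offset = ( $oh * 60 + $om ) * 60;
        # A positive offset means local time is ahead of UTC, so subtract it.
        $epoch += $sign eq '+' ? -$offset : $offset;
    }
    return $epoch;
}

print iso8601_to_epoch('2024-01-01T02:00:00+02:00'), "\n";
```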
OUTPUT APIS
encode(\@rows)                           return Native bytes for one block
encode_into(\$buf, \@rows)               append a block to an existing scalar
encode_columns(\%cols)                   column-oriented input (same bytes)
encode_to_handle($fh, \@rows)            direct write to a filehandle
stream(\&iter, \&writer, batch_size=>N)  pull rows from iter, emit blocks
streamer(\&writer, batch_size=>N)        ->push_row($r); ...; ->finish
                                         ->reset / ->buffered_count
                                         / ->is_empty
validate_rows(\@rows)                    [{row=>N,error=>...}] for bad rows
encode_to_command(\@cmd, \@rows)         pipe encoded bytes into a child cmd
compressed_writer($mode, \&writer)       wrap a writer with gzip/zstd
flatten_nested(\@cols)                   expand Nested(...) -> flat name.field
encode_row_binary(\@rows)                RowBinary body (row-major format)
decode_row_binary($bytes)                decode a RowBinary byte string
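The pull-and-batch pattern behind an iterator/writer pair can be sketched in plain Perl as follows. This is not the module's implementation: `batch_stream` is a hypothetical stand-in, the encoder is passed in as a coderef so the example runs without the module, and an iterator that returns undef at end of input is an assumption.

```perl
use strict;
use warnings;

# Pull rows from $iter, hand full batches to $encode, pass the result to $writer.
# Returns the number of blocks written.
sub batch_stream {
    my ( $iter, $encode, $writer, %opt ) = @_;
    my $batch_size = $opt{batch_size} || 1000;
    my @buf;
    my $blocks = 0;
    while ( defined( my $row = $iter->() ) ) {
        push @buf, $row;
        next if @buf < $batch_size;
        $writer->( $encode->( \@buf ) );    # one block per full batch
        @buf = ();
        $blocks++;
    }
    if (@buf) {                             # flush the final partial batch
        $writer->( $encode->( \@buf ) );
        $blocks++;
    }
    return $blocks;
}

# Usage with a stand-in encoder that just reports the batch size.
my @pending = map { [ $_, "name$_" ] } 1 .. 5;
my $written = batch_stream(
    sub { shift @pending },
    sub { "block of " . scalar( @{ $_[0] } ) . " rows\n" },
    sub { print $_[0] },
    batch_size => 2,
);
print "$written blocks\n";
```

Memory stays bounded by one batch of rows, which is the point of streaming rather than encoding the full row set at once.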
HTTP INSERT
ClickHouse::Encoder->insert_http(host=>..., port=>..., table=>..., rows=>...)
    one-shot HTTP insert (POSTs Native bytes, optional zstd/gzip).
ClickHouse::Encoder->bulk_inserter(host=>..., table=>..., columns=>...)
    long-lived inserter with auto-flush at batch_size, retries on
    transient errors, keep-alive, optional compression. ->summary
    rolls up CH X-ClickHouse-Summary stats across batches;
    ->last_response gives the most recent flush's HTTP response with
    parsed CH metadata attached at ->{ch}{query-id,server,summary,...}.
ClickHouse::Encoder->for_query($select_sql, host=>..., port=>...)
    runs DESCRIBE ($select_sql) and returns an encoder configured for
    that result shape; useful when the schema isn't a real table.
ClickHouse::Encoder->ping(host=>..., port=>...)
    liveness probe via /ping; returns 1 or croaks.
All HTTP entry points accept scheme=>'https' (needs IO::Socket::SSL
+ Net::SSLeay), ssl_options/verify_SSL pass-throughs to HTTP::Tiny,
settings=>{...} for per-query CH settings, and dedup_token=>$id for
idempotent inserts.
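A hand-rolled version of such an insert, using HTTP::Tiny directly, might look like the sketch below. It relies on the ClickHouse HTTP interface accepting the query and per-query settings as URL parameters. The names `insert_url` and `post_native` are hypothetical, and mapping dedup_token to the `insert_deduplication_token` setting is an assumption about what the module does, not something this README states.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Build the request URL. $a{table} is interpolated as-is, so it must be a
# trusted identifier.
sub insert_url {
    my (%a) = @_;
    my $scheme = $a{scheme} || 'http';
    my $port   = $a{port}   || 8123;
    my %params = (
        query => "INSERT INTO $a{table} FORMAT Native",
        %{ $a{settings} || {} },            # per-query settings as URL params
    );
    # Assumption: dedup_token corresponds to this ClickHouse setting.
    $params{insert_deduplication_token} = $a{dedup_token}
        if defined $a{dedup_token};
    return "$scheme://$a{host}:$port/?"
         . HTTP::Tiny->www_form_urlencode( \%params );
}

# POST already-encoded Native bytes; dies on a non-2xx response.
sub post_native {
    my ( $url, $bytes ) = @_;
    my $res = HTTP::Tiny->new->post( $url, {
        content => $bytes,
        headers => { 'Content-Type' => 'application/octet-stream' },
    } );
    die "insert failed: $res->{status} $res->{content}\n"
        unless $res->{success};
    return $res;
}

print insert_url(
    host        => 'localhost',
    table       => 'events',
    settings    => { max_insert_block_size => 100000 },
    dedup_token => 'batch-42',
), "\n";
```

The module's own entry points add retries, keep-alive and compression on top of this basic request shape.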
SCHEMA INTROSPECTION
ClickHouse::Encoder->for_table($table, via => 'client', ...)
ClickHouse::Encoder->for_table($table, via => 'http', port => 8123, ...)
ClickHouse::Encoder->server_version(host => ..., port => ...)
    fetches SELECT version() over HTTP, returns {major,minor,patch,...}.
ClickHouse::Encoder->types
    list of supported type names
ClickHouse::Encoder->schema_diff(\@a, \@b)
    {added,removed,changed}
ClickHouse::Encoder->apply_schema_diff($diff, table=>...)
    ALTER TABLE statements (drops -> modifies -> adds)
ClickHouse::Encoder->format_create_table(table=>..., columns=>...)
    CREATE TABLE SQL; columns accept codec/ttl/default/...
ClickHouse::Encoder->parse_create_table($ddl)
    SHOW CREATE TABLE -> hashref {database,table,columns,...}
ClickHouse::Encoder->parse_wkt($wkt)
    WKT -> Geo arrayref shape
$enc->estimate_size($nrows)
    byte-size hint for sizing
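The idea of a schema diff with added/removed/changed buckets can be sketched in plain Perl as below. The input shape (arrayrefs of [name, type] pairs) and the shape of each bucket entry are assumptions made for this illustration; `diff_columns` is a hypothetical name and the module's actual structures are documented in its POD.

```perl
use strict;
use warnings;

# Compare two column lists given as [ [name, type], ... ].
sub diff_columns {
    my ( $old, $new ) = @_;
    my %old_type = map { $_->[0] => $_->[1] } @$old;
    my %new_type = map { $_->[0] => $_->[1] } @$new;
    my %diff = ( added => [], removed => [], changed => [] );

    for my $col (@$new) {
        my ( $name, $type ) = @$col;
        if ( !exists $old_type{$name} ) {
            push @{ $diff{added} }, [ $name, $type ];
        }
        elsif ( $old_type{$name} ne $type ) {
            # name, old type, new type
            push @{ $diff{changed} }, [ $name, $old_type{$name}, $type ];
        }
    }
    for my $col (@$old) {
        push @{ $diff{removed} }, [@$col] unless exists $new_type{ $col->[0] };
    }
    return \%diff;
}

my $d = diff_columns(
    [ [ id => 'UInt32' ], [ name => 'String' ], [ old => 'Date' ] ],
    [ [ id => 'UInt64' ], [ name => 'String' ], [ ts  => 'DateTime' ] ],
);
printf "%d added, %d removed, %d changed\n",
    scalar @{ $d->{added} }, scalar @{ $d->{removed} }, scalar @{ $d->{changed} };
```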
DECIMAL HELPERS
ClickHouse::Encoder->decimal128_str($n) / ->decimal256_str($n)
format a 16- or 32-byte little-endian decimal value as a signed
base-10 string (avoids host-side bigint arithmetic for large precisions).
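As a reference for what such a conversion computes, the sketch below does the same thing in pure Perl with the core Math::BigInt module, which is exactly the dependency the XS helpers are meant to spare. The function name `le_bytes_to_signed_str` is hypothetical, and the result is the unscaled integer; whether the module's helpers apply a decimal scale is not stated here.

```perl
use strict;
use warnings;
use Math::BigInt;

# Interpret a little-endian two's-complement byte string (16 or 32 bytes for
# Decimal128/Decimal256) as a signed integer and return it in base 10.
sub le_bytes_to_signed_str {
    my ($bytes) = @_;
    my $nbytes = length $bytes;
    die "empty input\n" unless $nbytes;

    # Hex notation is most-significant digit first, so reverse the bytes.
    my $hex = unpack( 'H*', scalar reverse($bytes) );
    my $n   = Math::BigInt->new("0x$hex");

    # Top bit set means the value is negative: subtract 2**(8 * nbytes).
    if ( ord( substr( $bytes, -1 ) ) & 0x80 ) {
        $n->bsub( Math::BigInt->new(2)->bpow( 8 * $nbytes ) );
    }
    return $n->bstr;
}

print le_bytes_to_signed_str( pack( 'C16', (0xff) x 16 ) ), "\n";
```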
DECODER
ClickHouse::Encoder->decode_block($bytes) / ->decode_rows($bytes)
are the XS-side decoders for `SELECT ... FORMAT Native` responses.
Supports every type encode handles; round-trips are symmetric.
->decode_blocks($bytes) walks a concatenated multi-block stream
(also accepts a callback). ->decode_blocks_iter($bytes) returns
a coderef iterator. ->decode_stream($fh, $cb) pulls bytes
incrementally from a filehandle - memory bounded by one block at
a time. ->decode_block($bytes, $offset, \%keep) skips data for
unwanted columns (saves memory on wide SELECT * results).
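To show what walking a block from an offset involves, here is a pure-Perl sketch that reads only the block header and the first column's name and type. It follows the publicly documented Native layout as returned over HTTP and is not the module's decoder; `read_varuint`, `read_string` and `read_block_header` are hypothetical names.

```perl
use strict;
use warnings;

# Read an unsigned LEB128 varint from $buf at offset $$pos, advancing $$pos.
sub read_varuint {
    my ( $buf, $pos ) = @_;
    my ( $n, $shift ) = ( 0, 0 );
    while (1) {
        die "truncated varint\n" if $$pos >= length $buf;
        my $byte = ord substr( $buf, $$pos, 1 );
        $$pos++;
        $n |= ( $byte & 0x7f ) << $shift;
        return $n unless $byte & 0x80;      # no continuation bit: done
        $shift += 7;
    }
}

# Read a varint-length-prefixed byte string.
sub read_string {
    my ( $buf, $pos ) = @_;
    my $len = read_varuint( $buf, $pos );
    die "truncated string\n" if $$pos + $len > length $buf;
    my $s = substr( $buf, $$pos, $len );
    $$pos += $len;
    return $s;
}

# Column count, row count, and the first column's name and type.
sub read_block_header {
    my ( $buf, $offset ) = @_;
    my $pos = $offset // 0;
    my %h = (
        columns => read_varuint( $buf, \$pos ),
        rows    => read_varuint( $buf, \$pos ),
    );
    if ( $h{columns} ) {
        $h{first_name} = read_string( $buf, \$pos );
        $h{first_type} = read_string( $buf, \$pos );
    }
    $h{data_offset} = $pos;                 # where the first column's data starts
    return \%h;
}

my $bytes = "\x01\x03\x02id\x06UInt32" . pack( 'V*', 1, 2, 300 );
my $hdr   = read_block_header($bytes);
print "$hdr->{columns} column(s), $hdr->{rows} row(s): ",
    "$hdr->{first_name} $hdr->{first_type}\n";
```

Because each column's type is read before its data, a decoder can skip the data of columns it does not need, which is where the memory saving for wide results comes from.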
DOCUMENTATION
See `perldoc ClickHouse::Encoder` after install, or the POD in
lib/ClickHouse/Encoder.pm.
EXAMPLES
eg/insert_http.pl - end-to-end insert over HTTP::Tiny
eg/insert_streaming.pl - reuse one encoder across many batches
eg/for_table.pl - schema discovery via clickhouse-client
eg/from_csv.pl - read CSV, encode, insert via HTTP
eg/insert_clickhouse_local.pl - server-less ingest to Parquet/ORC
eg/etl_dbi.pl - DBI -> Native -> insert pipeline