ClickHouse-Encoder
==================

Fast XS encoder for ClickHouse Native format. Builds a binary block from a
Perl arrayref of rows; the result is the request body for an
`insert ... format native` operation over HTTP, the native TCP protocol,
or via stdin to clickhouse-client.
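
To make "a binary block" concrete, the sketch below hand-encodes a tiny
two-column block in pure Perl. It assumes the Native layout as commonly
described for HTTP bodies: a varint column count, a varint row count, then
for each column its name, its type name, and the column's values stored
contiguously. The helper names (varuint, ch_string, native_block_sketch)
are hypothetical and are not part of this module; the XS encoder covers
far more types and is not implemented this way.

```perl
use strict;
use warnings;

# Unsigned LEB128 varint, used for counts and string lengths.
sub varuint {
    my ($n) = @_;
    my $out = '';
    while ($n >= 0x80) {
        $out .= chr(($n & 0x7f) | 0x80);
        $n >>= 7;
    }
    return $out . chr($n);
}

# Length-prefixed string; assumes $s already holds bytes, not wide characters.
sub ch_string { my ($s) = @_; return varuint(length $s) . $s }

# $cols: [[name, type], ...]; $rows: arrayref of arrayrefs (row-major input).
sub native_block_sketch {
    my ($cols, $rows) = @_;
    my $out = varuint(scalar @$cols) . varuint(scalar @$rows);
    for my $i (0 .. $#$cols) {
        my ($name, $type) = @{ $cols->[$i] };
        $out .= ch_string($name) . ch_string($type);
        # Column-major output: all values of column $i, in row order.
        for my $row (@$rows) {
            my $v = $row->[$i];
            if    ($type eq 'UInt32') { $out .= pack('V', $v) }   # 4 bytes, little-endian
            elsif ($type eq 'String') { $out .= ch_string($v) }
            else  { die "sketch handles only UInt32 and String, got $type" }
        }
    }
    return $out;
}

my $bytes = native_block_sketch(
    [ [ id => 'UInt32' ], [ name => 'String' ] ],
    [ [ 1, 'a' ], [ 2, 'bc' ] ],
);
```

Note that the input is row-major while the output is column-major, which is
why the encoder needs the whole batch of rows before it can emit a block.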

INSTALLATION
    perl Makefile.PL
    make
    make test
    make install

REQUIREMENTS
    A 64-bit Perl ($Config{ivsize} >= 8). No external C library is required;
    the encoder is implemented entirely in XS.

SUPPORTED TYPES
    Int8/16/32/64, UInt8/16/32/64, Float32/64, BFloat16, String, FixedString(N),
    Date, Date32, DateTime, DateTime('tz'), DateTime64(p),
    Decimal32(s), Decimal64(s), Decimal128(s), Decimal256(s), Decimal(P, S),
    Enum8(...), Enum16(...),
    Bool / Boolean, UUID, IPv4, IPv6,
    Map(K, V), LowCardinality(String|FixedString|Nullable(...)),
    Variant(T1, T2, ...) (CH 24.1+),
    SimpleAggregateFunction(func, T),
    Tuple(T1, T2, ...) including named: Tuple(a Int32, b String),
    Geo: Point, Ring, LineString, MultiLineString, Polygon, MultiPolygon,
    Array(T), Nullable(T),
    JSON / Object('json') (CH 24.8+): hashref input with nested hashref
    auto-flattening to dotted paths; per-path types inferred from
    Perl SV flags (Int64, Float64, Bool, String) and arrayref leaves
    encoded as Array(T) variants. Symmetric on decode (unflattens).
    Dynamic: standalone Dynamic column, same wire format as one JSON
    path's Dynamic sub-column without the Object wrapper.
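
    The "auto-flattening to dotted paths" for JSON input can be pictured
    with the pure-Perl sketch below. It only illustrates the idea; the
    function name flatten_paths is hypothetical, and details such as how
    the module treats empty nested hashrefs or keys that contain dots are
    not covered by this README and are not modelled here.

```perl
use strict;
use warnings;

# Recursively flatten nested hashrefs into { 'a.b.c' => leaf } form.
# Arrayrefs and scalars are treated as leaves and kept as-is.
sub flatten_paths {
    my ($h, $prefix, $out) = @_;
    $out //= {};
    for my $k (sort keys %$h) {
        my $path = defined $prefix ? "$prefix.$k" : $k;
        if (ref $h->{$k} eq 'HASH') {
            flatten_paths($h->{$k}, $path, $out);
        }
        else {
            $out->{$path} = $h->{$k};
        }
    }
    return $out;
}

my $flat = flatten_paths({ a => { b => 1, c => { d => 'x' } }, e => [ 1, 2 ] });
```

    Here $flat ends up with the keys 'a.b', 'a.c.d' and 'e'; decoding
    reverses the process by splitting each path on dots.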
    DateTime / DateTime64 strings accept ISO 8601 with timezone offsets
    (Z, +HH:MM, -HH:MM, +HHMM); the offset is applied to convert to UTC.
    See `perldoc ClickHouse::Encoder` for value coercion rules and limits.
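
    As an illustration of the offset handling described above, the sketch
    below converts an ISO 8601 string to a UTC epoch using only the core
    Time::Local module. The function name iso8601_to_epoch is hypothetical,
    fractional seconds are not handled, and treating a string without an
    offset as UTC is an assumption of this sketch, not a documented rule of
    the module.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

sub iso8601_to_epoch {
    my ($s) = @_;
    my ($Y, $M, $D, $h, $m, $sec, $tz) =
        $s =~ /\A(\d{4})-(\d\d)-(\d\d)[T ](\d\d):(\d\d):(\d\d)(Z|[+-]\d\d:?\d\d)?\z/
        or die "unrecognised timestamp: $s";

    # Interpret the wall-clock fields as if they were UTC ...
    my $epoch = timegm($sec, $m, $h, $D, $M - 1, $Y);

    # ... then remove the offset: local = UTC + offset, so UTC = local - offset.
    if (defined $tz && $tz ne 'Z') {
        my ($sign, $oh, $om) = $tz =~ /\A([+-])(\d\d):?(\d\d)\z/;
        my $offset = ($oh * 60 + $om) * 60;
        $epoch -= ($sign eq '+' ? $offset : -$offset);
    }
    return $epoch;
}
```

    With this sketch, '2024-01-01T12:00:00+02:00', '2024-01-01T04:30:00-0530'
    and '2024-01-01T10:00:00Z' all map to the same epoch value.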

OUTPUT APIS
    encode(\@rows)                        return Native bytes for one block
    encode_into(\$buf, \@rows)            append a block to an existing scalar
    encode_columns(\%cols)                column-oriented input (same bytes)
    encode_to_handle($fh, \@rows)         direct write to a filehandle
    stream(\&iter, \&writer, batch_size=>N)  pull rows from iter, emit blocks
    streamer(\&writer, batch_size=>N)        ->push_row($r); ...; ->finish
                                              ->reset / ->buffered_count
                                              / ->is_empty
    validate_rows(\@rows)                 [{row=>N,error=>...}] for bad rows
    encode_to_command(\@cmd, \@rows)      pipe encoded bytes into a child cmd
    compressed_writer($mode, \&writer)    wrap a writer with gzip/zstd
    flatten_nested(\@cols)                expand Nested(...) -> flat name.field
    encode_row_binary(\@rows)             RowBinary body (row-major format)
    decode_row_binary($bytes)             decode a RowBinary byte string
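
    The pull-and-batch behaviour listed for stream() can be sketched as a
    plain loop. This is not the module's code: stream_sketch is a
    hypothetical name, the $encode coderef stands in for a real encoder,
    and the convention that the iterator returns undef at end of input is
    an assumption of this sketch.

```perl
use strict;
use warnings;

# Pull rows from $iter, hand full batches to $encode, pass the result to
# $writer. Returns the number of blocks emitted.
sub stream_sketch {
    my ($iter, $writer, $encode, %opt) = @_;
    my $batch_size = $opt{batch_size} || 1000;
    my @batch;
    my $blocks = 0;
    while (defined(my $row = $iter->())) {
        push @batch, $row;
        next if @batch < $batch_size;
        $writer->($encode->(\@batch));
        @batch = ();
        $blocks++;
    }
    if (@batch) {                      # flush the final partial batch
        $writer->($encode->(\@batch));
        $blocks++;
    }
    return $blocks;
}

# Usage with stand-ins: five rows, batches of two, "encoding" is a join.
my @rows = map { [$_] } 1 .. 5;
my @written;
my $n_blocks = stream_sketch(
    sub { shift @rows },
    sub { push @written, $_[0] },
    sub { join ',', map { $_->[0] } @{ $_[0] } },
    batch_size => 2,
);
```

    Memory use is bounded by one batch, which is the usual reason to prefer
    a streaming entry point over encoding all rows in one call.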

HTTP INSERT
    ClickHouse::Encoder->insert_http(host=>..., port=>..., table=>..., rows=>...)
        One-shot HTTP insert (POSTs Native bytes, optional zstd/gzip).
    ClickHouse::Encoder->bulk_inserter(host=>..., table=>..., columns=>...)
        Long-lived inserter with auto-flush at batch_size, retries on
        transient errors, keep-alive, optional compression. ->summary
        rolls up CH X-ClickHouse-Summary stats across batches;
        ->last_response gives the most recent flush's HTTP response with
        parsed CH metadata attached at ->{ch}{query-id,server,summary,...}.
    ClickHouse::Encoder->for_query($select_sql, host=>..., port=>...)
        Runs describe ($select_sql) and returns an encoder configured for
        that result shape; useful when the schema isn't a real table.
    ClickHouse::Encoder->ping(host=>..., port=>...)
        Liveness probe via /ping; returns 1 or croaks.

    All HTTP entry points accept scheme=>'https' (needs IO::Socket::SSL
    + Net::SSLeay), ssl_options/verify_SSL pass-throughs to HTTP::Tiny,
    settings=>{...} for per-query CH settings, and dedup_token=>$id for
    idempotent inserts.

SCHEMA INTROSPECTION
    ClickHouse::Encoder->for_table($table, via => 'client', ...)
    ClickHouse::Encoder->for_table($table, via => 'http', port => 8123, ...)
    ClickHouse::Encoder->server_version(host => ..., port => ...)
                                                fetches select version() over
                                                HTTP, returns
                                                {major,minor,patch,...}
    ClickHouse::Encoder->types                  list of supported type names
    ClickHouse::Encoder->schema_diff(\@a, \@b)  {added,removed,changed}
    ClickHouse::Encoder->apply_schema_diff($diff, table=>...)
                                                alter table statements
                                                (drops -> modifies -> adds)
    ClickHouse::Encoder->format_create_table(table=>..., columns=>...)
                                                create table SQL; columns
                                                accept codec/ttl/default/...
    ClickHouse::Encoder->parse_create_table($ddl)
                                                show create table -> hashref
                                                {database,table,columns,...}
    ClickHouse::Encoder->parse_wkt($wkt)        WKT -> Geo arrayref shape
    $enc->estimate_size($nrows)                 byte-size hint for sizing
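
    The {added,removed,changed} result of schema_diff can be pictured with
    the sketch below. The input shape ([name, type] pairs) and the exact
    contents of each result list are assumptions made for illustration;
    check the POD for the shapes the module really uses. The function name
    schema_diff_sketch is hypothetical.

```perl
use strict;
use warnings;

# Compare two column lists given as [[name, type], ...].
sub schema_diff_sketch {
    my ($old, $new) = @_;
    my %old = map { $_->[0] => $_->[1] } @$old;
    my %new = map { $_->[0] => $_->[1] } @$new;
    my %diff = (added => [], removed => [], changed => []);

    for my $col (@$new) {
        my ($name, $type) = @$col;
        if (!exists $old{$name}) {
            push @{ $diff{added} }, [ $name, $type ];
        }
        elsif ($old{$name} ne $type) {
            push @{ $diff{changed} }, [ $name, $old{$name}, $type ];
        }
    }
    for my $col (@$old) {
        push @{ $diff{removed} }, $col->[0] unless exists $new{ $col->[0] };
    }
    return \%diff;
}

my $diff = schema_diff_sketch(
    [ [ id => 'UInt32' ], [ name => 'String' ], [ old => 'Date' ] ],
    [ [ id => 'UInt64' ], [ name => 'String' ], [ ts  => 'DateTime' ] ],
);
```

    Applying such a diff in the order drops, then modifies, then adds (as
    apply_schema_diff is documented to do) avoids name clashes between a
    removed column and a newly added one.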

DECIMAL HELPERS
    ClickHouse::Encoder->decimal128_str($n) / ->decimal256_str($n)
        Format a 16- or 32-byte little-endian decimal value as a signed
        base-10 string, so high-precision values can be handled without
        bigint support on the host.
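
    For reference, the conversion these helpers perform can be expressed in
    pure Perl with the core Math::BigInt module, which the helpers exist to
    make unnecessary. The sketch assumes the value is a two's-complement
    integer and returns it unscaled; whether the module's helpers apply the
    decimal scale is not stated in this README. The function name
    le_bytes_to_decimal_str is hypothetical.

```perl
use strict;
use warnings;
use Math::BigInt;

# Interpret $bytes as a little-endian two's-complement integer of any width
# and return it as a signed base-10 string.
sub le_bytes_to_decimal_str {
    my ($bytes) = @_;
    # Reverse to big-endian, then parse the hex digits as an unsigned value.
    my $n = Math::BigInt->new('0x' . unpack('H*', scalar reverse $bytes));
    # If the top bit of the most significant byte is set, the value is
    # negative: subtract 2**(number of bits).
    if (ord(substr($bytes, -1)) & 0x80) {
        $n -= Math::BigInt->new(2)->bpow(8 * length($bytes));
    }
    return $n->bstr;
}
```

    For example, sixteen 0xFF bytes give "-1", and a 16-byte value with
    only the ninth byte set to 1 gives "18446744073709551616" (2**64).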

DECODER
    ClickHouse::Encoder->decode_block($bytes) / ->decode_rows($bytes)
    are the XS-side decoders for select ... format native responses.
    They support every type that encode handles; round-trips are symmetric.
    ->decode_blocks($bytes) walks a concatenated multi-block stream
    (also accepts a callback). ->decode_blocks_iter($bytes) returns
    a coderef iterator. ->decode_stream($fh, $cb) pulls bytes
    incrementally from a filehandle, so memory is bounded by one block
    at a time. ->decode_block($bytes, $offset, \%keep) skips data for
    unwanted columns (a memory win on wide select * results).

DOCUMENTATION
    See `perldoc ClickHouse::Encoder` after install, or the POD in
    lib/ClickHouse/Encoder.pm.

EXAMPLES
    eg/insert_http.pl              - end-to-end insert over HTTP::Tiny
    eg/insert_streaming.pl         - reuse one encoder across many batches
    eg/for_table.pl                - schema discovery via clickhouse-client
    eg/from_csv.pl                 - read CSV, encode, insert via HTTP
    eg/insert_clickhouse_local.pl  - server-less ingest to Parquet/ORC
    eg/etl_dbi.pl                  - DBI -> Native -> insert pipeline


