CBOR-Free

 view release on metacpan or  search on metacpan

Changes  view on Meta::CPAN


0.20 31 Jan 2020
- Update test requirements to avoid dev release of Test::Class::Tiny.

0.19 30 Jan 2020
- BUG FIX: Properly validate UTF-8 map keys when decoding.
- InvalidUTF8 errors now correctly handle NULs in the invalid string.

0.18 19 Jan 2020
- BREAKING CHANGE: Encode UTF8-flagged hash keys as CBOR text.
- BUG FIX: Properly decoded hash keys that contain code points 128-255 were
  previously mangled. This release fixes that.
- BUG FIX: Decode CBOR text strings as UTF8-flaged Perl strings.
  This is an admittedly-imperfect attempt to preserve distinctly-typed text
  strings in Perl (a language that, by design, doesn’t distinguish text vs.
  binary). The POD has described this behavior for some time; it just was never
  implemented correctly until now.
- Add (EXPERIMENTAL) “text_keys” encode mode, which causes all CBOR map keys
  to be text. This requires proper character decoding!
- Add native_utf8() decode mode, which skips UTF-8 validation in decoding.
- Build fix: Require a reasonably recent ExtUtils::MakeMaker.

README.md  view on Meta::CPAN

        “mis-typed” strings.

    - `encode_text`: Treats all strings as unencoded characters.
    All CBOR strings will be text.

        This is probably what you want if you
        follow the receive-decode-process-encode-output workflow that
        [perlunitut](https://metacpan.org/pod/perlunitut) recommends (which you might be doing via `use utf8`)
        **AND** if you intend for your CBOR to contain exclusively text.

        Think of this option as: “All my strings are decoded.”

        (Perl internals note: if !SvUTF8, the CBOR will be the UTF8-upgraded
        version.)

    - `as_text`: Treats all strings as octets of UTF-8.
    Wide characters (i.e., code points above 255) are thus invalid input.
    All CBOR strings will be text.

        This is probably what you want if you forgo character decoding (and encoding),
        treating all input as octets, **BUT** you still intend for your CBOR to

README.md  view on Meta::CPAN

An error is thrown on excess recursion or an unrecognized object.

## $data = decode( $CBOR )

Decodes a data structure from CBOR. Errors are thrown to indicate
invalid CBOR. A warning is thrown if $CBOR is longer than is needed
for $data.

Notes on mapping CBOR to Perl:

- `decode()` decodes CBOR text strings as UTF-8-decoded Perl strings.
CBOR binary strings become undecoded Perl strings.

    (See [CBOR::Free::Decoder](https://metacpan.org/pod/CBOR%3A%3AFree%3A%3ADecoder) and [CBOR::Free::SequenceDecoder](https://metacpan.org/pod/CBOR%3A%3AFree%3A%3ASequenceDecoder) for more
    character-decoding options.)

    Notes:

    - Invalid UTF-8 in a CBOR text string is usually considered
    invalid input and will thus prompt a thrown exception. (See
    [CBOR::Free::Decoder](https://metacpan.org/pod/CBOR%3A%3AFree%3A%3ADecoder) and [CBOR::Free::SequenceDecoder](https://metacpan.org/pod/CBOR%3A%3AFree%3A%3ASequenceDecoder) if you want
    to tolerate invalid UTF-8.)
    - You can reliably use `utf8::is_utf8()` to determine if a given Perl
    string came from CBOR text or binary, but **ONLY** if you test the scalar as
    it appears in the newly-decoded data structure itself. Generally Perl code
    should avoid `is_utf8()`, but with CBOR::Free-created strings this limited
    use case is legitimate and potentially gainful.

- The only map keys that `decode()` accepts are integers and strings.
An exception is thrown if the decoder finds anything else as a map key.
Note that, because Perl does not distinguish between binary and text strings,
if two keys of the same map contain the same bytes, Perl will consider these
a duplicate key and prefer the latter.
- CBOR booleans become the corresponding [Types::Serialiser](https://metacpan.org/pod/Types%3A%3ASerialiser) values.
Both CBOR null and undefined become Perl undef.

cbor_free_decode.c  view on Meta::CPAN

                    _RETURN_IF_INCOMPLETE( decstate, 3, NULL );

                    ret = newSVnv( decode_half_float( (uint8_t *) (1 + decstate->curbyte) ) );

                    decstate->curbyte += 3;
                    break;

                case CBOR_FLOAT:
                    _RETURN_IF_INCOMPLETE( decstate, 5, NULL );

                    float decoded_flt;

#if IS_LITTLE_ENDIAN
                    decoded_flt = _decode_float_to_host( aTHX_ decstate, (uint8_t *) (1 + decstate->curbyte ) );
#else
                    decoded_flt = *( (float *) (1 + decstate->curbyte) );
#endif

                    ret = newSVnv( (NV) decoded_flt );

                    decstate->curbyte += 5;
                    break;

                case CBOR_DOUBLE:
                    _RETURN_IF_INCOMPLETE( decstate, 9, NULL );

                    double decoded_dbl;

#if IS_LITTLE_ENDIAN
                    decoded_dbl = _decode_double_to_le( decstate, (uint8_t *) (1 + decstate->curbyte ) );
#else
                    decoded_dbl = *( (double *) (1 + decstate->curbyte) );
#endif

                    ret = newSVnv( (NV) decoded_dbl );

                    decstate->curbyte += 9;
                    break;

                default:
                    _croak_invalid_control( aTHX_ decstate );
            }

            break;

lib/CBOR/Free.pm  view on Meta::CPAN

=back

=item * C<encode_text>: Treats all strings as unencoded characters.
All CBOR strings will be text.

This is probably what you want if you
follow the receive-decode-process-encode-output workflow that
L<perlunitut> recommends (which you might be doing via C<use utf8>)
B<AND> if you intend for your CBOR to contain exclusively text.

Think of this option as: “All my strings are decoded.”

(Perl internals note: if !SvUTF8, the CBOR will be the UTF8-upgraded
version.)

=item * C<as_text>: Treats all strings as octets of UTF-8.
Wide characters (i.e., code points above 255) are thus invalid input.
All CBOR strings will be text.

This is probably what you want if you forgo character decoding (and encoding),
treating all input as octets, B<BUT> you still intend for your CBOR to

lib/CBOR/Free.pm  view on Meta::CPAN

=head2 $data = decode( $CBOR )

Decodes a data structure from CBOR. Errors are thrown to indicate
invalid CBOR. A warning is thrown if $CBOR is longer than is needed
for $data.

Notes on mapping CBOR to Perl:

=over

=item * C<decode()> decodes CBOR text strings as UTF-8-decoded Perl strings.
CBOR binary strings become undecoded Perl strings.

(See L<CBOR::Free::Decoder> and L<CBOR::Free::SequenceDecoder> for more
character-decoding options.)

Notes:

=over

=item * Invalid UTF-8 in a CBOR text string is usually considered
invalid input and will thus prompt a thrown exception. (See
L<CBOR::Free::Decoder> and L<CBOR::Free::SequenceDecoder> if you want
to tolerate invalid UTF-8.)

=item * You can reliably use C<utf8::is_utf8()> to determine if a given Perl
string came from CBOR text or binary, but B<ONLY> if you test the scalar as
it appears in the newly-decoded data structure itself. Generally Perl code
should avoid C<is_utf8()>, but with CBOR::Free-created strings this limited
use case is legitimate and potentially gainful.

=back

=item * The only map keys that C<decode()> accepts are integers and strings.
An exception is thrown if the decoder finds anything else as a map key.
Note that, because Perl does not distinguish between binary and text strings,
if two keys of the same map contain the same bytes, Perl will consider these
a duplicate key and prefer the latter.

lib/CBOR/Free/Decoder.pm  view on Meta::CPAN


If in doubt, leave this off.

=cut

#----------------------------------------------------------------------

=head2 $obj = I<OBJ>->string_decode_cbor();

This causes I<OBJ> to decode strings according to their CBOR type:
text strings are UTF8-decoded; binary strings are left as-is. This is
the default configuration, à la C<CBOR::Free::decode()>.

=head2 $obj = I<OBJ>->string_decode_never();

This causes I<OBJ> to leave all strings undecoded. This is useful for
applications that treat all strings as octet sequences. Note that CBOR
text strings will still be validated as UTF-8 unless C<naive_utf8()> is
enabled.

=head2 $obj = I<OBJ>->string_decode_always();

This causes I<OBJ> to decode all CBOR strings (including binary strings)
as UTF-8, applying appropriate pre-validation unless C<naive_utf8()> is
enabled. This is useful if you expect all strings (including binary) to be
UTF-8 and want to handle them in Perl as character strings instead of
byte strings.

=head2 I<OBJ>->set_tag_handlers( %TAG_CALLBACK )

Takes a list of key/value pairs where each key is a tag (i.e., number)
and each value is a coderef that CBOR::Free will run when that tag is
seen during a decode operation. The coderef will receive the tagged value,
and its (scalar) return will be inserted into the decoded data structure.

To unset a tag handler, assign undef to it.

This returns the I<OBJ>.

B<NOTE:> Handlers assigned here will only fire if CBOR::Free itself
doesn’t decode the tag. For example, a handler for the “indirection” tag
here will be ignored.

=cut

lib/CBOR/Free/SequenceDecoder.pm  view on Meta::CPAN

=head1 NAME

CBOR::Free::SequenceDecoder

=head1 SYNOPSIS

    my $decoder = CBOR::Free::SequenceDecoder->new();

    if ( my $got_sr = $decoder->give( $some_cbor ) ) {

        # Do something with your decoded CBOR.
    }

    while (my $got_sr = $decoder->get()) {
        # Do something with your decoded CBOR.
    }

=head1 DESCRIPTION

This module implements a parser for CBOR Sequences
(L<RFC 8742|https://tools.ietf.org/html/rfc8742>).

=cut

#----------------------------------------------------------------------

lib/CBOR/Free/SequenceDecoder.pm  view on Meta::CPAN


=over

=item * a B<scalar reference> to the (parsed) first CBOR document in the
internal buffer

=item * undef, if there is no such document

=back

Note that if your decoded CBOR document’s root element is already a reference
(e.g., an array or hash reference), then the return value is a reference
B<to> that reference. So, for example, if you expect all documents in your
stream to be array references, you could do:

    if ( my $got_sr = $decoder->give( $some_cbor ) ) {
        my @decoded_array = @{ $$got_sr };

        # …
    }

=head2 $got_sr = I<CLASS>->get();

Like C<give()> but doesn’t append onto the internal CBOR buffer.

=cut

t/against_cbor_xs.t  view on Meta::CPAN


            { map { ($_ => undef) } 1 .. 1 },
            { map { ($_ => undef) } 1 .. 23},
            { map { ($_ => undef) } 1 .. 24},
            { map { ($_ => undef) } 1 .. 255 },
            { map { ($_ => undef) } 1 .. 256 },
        ],
    );

    for my $item ( @tests ) {
        my ($cbor, $decoded);

        $cbor = CBOR::XS::encode_cbor($item);
        $decoded = CBOR::Free::decode($cbor);

        my $item_q = ref($item) ? "$item" : _dump_string($item);

        is_deeply(
            $decoded,
            $item,
            sprintf("we decode what CBOR::XS encoded ($item_q, %d bytes)", length $cbor),
        ) or diag explain sprintf("CBOR: %v.02x", $cbor);

        $cbor = CBOR::Free::encode($item) or die "failed to encode($item)?";
        $decoded = CBOR::XS::decode_cbor($cbor);

        is_deeply(
            $decoded,
            $item,
            sprintf( "CBOR::XS decodes what we encoded (%d bytes)", length $cbor),
        ) or diag sprintf('CBOR: %v.02x', $cbor);
    }

    #----------------------------------------------------------------------

    use Config;

    for my $key (keys %Config::Config) {

t/dec_strings.t  view on Meta::CPAN


for my $n ( 256, 65535 ) {
    push @tests, [ $n, ('a' x $n), pack('Cna*', 89, $n, ('a' x $n)) ];
}

for my $n ( 65536, 100000 ) {
    push @tests, [ $n, ('a' x $n), pack('CNa*', 90, $n, ('a' x $n)) ];
}

for my $t (@tests) {
    my ($size, $decoded, $cbor) = @$t;

    is(
        CBOR::Free::decode($cbor),
        $decoded,
        "binary size: $size",
    );

    my $utf8cbor = $cbor;
    substr( $utf8cbor, 0, 1 ) ^= "\x40";
    substr( $utf8cbor, 0, 1 ) |= "\x60";

    is(
        CBOR::Free::decode($utf8cbor),
        $decoded,
        "UTF-8 size: $size",
    );
}

is(
    CBOR::Free::decode("\x5f\x41a\x42bc\xff"),
    'abc',
    'indefinite-length binary',
);

t/encode_modes.t  view on Meta::CPAN

use constant UTF8_0100 => do { utf8::encode( my $v = "\x{100}" ); $v };

__PACKAGE__->runtests() if !caller;

sub T32_test_given_unchanged {
    for my $canonical ( 0, 1 ) {
        for my $mode ( qw( sv encode_text as_text as_binary ) ) {
            my $v = UTF8_00FF;
            my $utf8_flag = utf8::is_utf8($v);
            CBOR::Free::encode($v, canonical => $canonical, string_encode_mode => $mode);
            is( $v, UTF8_00FF, "$mode: given undecoded scalar is unchanged" );
            is( utf8::is_utf8($v), $utf8_flag, "$mode: undecoded scalar internals are unchanged (canonical: $canonical)" );

            utf8::decode($v);
            my $v_copy = $v;
            $utf8_flag = utf8::is_utf8($v);
            CBOR::Free::encode($v, canonical => $canonical, string_encode_mode => $mode);
            is( $v, $v_copy, "$mode: given decoded scalar is unchanged" );
            is( utf8::is_utf8($v), $utf8_flag, "$mode: decoded scalar internals are unchanged (canonical: $canonical)" );
        }
    }
}

sub T4_test_sv_hash_key {
    my $the_key = (sort keys %!)[0];

    my $key_cbor_text = CBOR::Free::encode($the_key, string_encode_mode => 'as_text');
    my $key_cbor_binary = CBOR::Free::encode($the_key, string_encode_mode => 'as_binary');

t/examples.t  view on Meta::CPAN


for my $t (@examples) {
    is(
        unpack( 'H*', CBOR::Free::encode( $t->[0] ) ),
        $t->[1],
        sprintf('Encode to %s', $t->[1]),
    );

    use Devel::Peek;

    my $decoded = CBOR::Free::decode( pack( 'H*', $t->[1] ) );

    is_deeply(
        $decoded,
        $t->[0],
        sprintf('Decode %s', $t->[1])
    ) or Devel::Peek::Dump($decoded);

    my $got = CBOR::Free::decode( CBOR::Free::encode( $t->[0] ) );
    is_deeply(
        $got,
        $t->[0],
        sprintf("Round-trip: $t->[1]"),
    ) or Devel::Peek::Dump($got);
}

# NB: Different perls have historically represented these values

t/examples.t  view on Meta::CPAN

    [ [1, [2, 3], [4, 5]] => '83019f0203ff820405' ],
    [ [ 1 .. 25 ] => '9f0102030405060708090a0b0c0d0e0f101112131415161718181819ff' ],
    [ { a => 1, b => [2,3] } => 'bf61610161629f0203ffff' ],
    [ ['a', { b => 'c' }] => '826161bf61626163ff' ],
    [ { Fun => Types::Serialiser::true(), Amt => -2 } => 'bf6346756ef563416d7421ff' ],
);

for my $t (@decode) {
    my @w;

    my $decoded = do {
        local $SIG{'__WARN__'} = sub { push @w, @_ };
        CBOR::Free::decode( pack( 'H*', $t->[1] ) );
    };

    if (@w && (@w != 1 || $w[0] !~ m<tag>)) {
        warn "Unexpected warning(s): @w";
    }

    is_deeply(
        $decoded,
        $t->[0],
        sprintf('Decode %s', $t->[1])
    ) or diag explain $decoded;

    my $encoded = CBOR::Free::encode( $decoded );

    is_deeply(
        scalar( CBOR::Free::decode( $encoded ) ),
        $t->[0],
        sprintf("Round-trip: $t->[1]"),
    );
}

done_testing;

t/hash.t  view on Meta::CPAN

sub T2_text_key {
    my $hash_w_text_key = { "\x{100}" => '123' };
    my $cbor = CBOR::Free::encode($hash_w_text_key);

    is(
        $cbor,
        "\xa1\x62\xc4\x80C123",
        'hash w/ text key encoded as expected',
    ) or diag explain sprintf('%v.02x', $cbor);

    my $decoded = CBOR::Free::decode("$cbor");

    is(
        ord( (keys %$decoded)[0] ),
        256,
        'decoded map’s key is decoded correctly',
    );
}

sub T4_invalid_text_key {
    my $cbor = "\xa1\x63\0\xff\x80C123";

    throws_ok(
        sub { CBOR::Free::decode("$cbor") },
        'CBOR::Free::X::InvalidUTF8',
        'die() on normal attempt to decode invalid-UTF8 text map key',

t/hash.t  view on Meta::CPAN

        all(
            re( qr<double float> ),
            re( qr<5> ),
        ),
        '… with the expected error message',
    );
}

#----------------------------------------------------------------------

sub T1_decoded_high_bit_key {
    my $eacute_utf8 = "é";

    my $eacute = $eacute_utf8;
    utf8::decode($eacute);

    my $cbor = CBOR::Free::encode( { $eacute => 1 } );

    _cmpbin(
        $cbor,
        "\xa1" . "\x62$eacute_utf8" . "\1",
        'decoded UTF-8 e-acute encodes correctly',
    );
}

sub _cmpbin {
    my ($got, $expect, $label) = @_;

    $_ = sprintf('%v.02x', $_) for ($got, $expect);

    return is( $got, $expect, $label );
}

t/scalar_ref.t  view on Meta::CPAN

use Test::More;
use Test::FailWarnings;

use Data::Dumper;

use_ok('CBOR::Free');

# Example taken from http://cbor.schmorp.de/indirection
my $canonical = pack( 'C*', 0x82, 0x80, 0xd9, 0x56, 0x52, 0x66, ) . 'string';

my $decoded = CBOR::Free::decode($canonical);

is_deeply(
    $decoded,
    [ [], \'string' ],
    'decode a string reference (from specification)',
);

my $all_types_ar = [ \undef, \0, \1, \'haha', \[], \{}, \do { \[] } ];

my $round_tripped = CBOR::Free::decode( CBOR::Free::encode($all_types_ar, scalar_references => 1) );

is_deeply(
    $round_tripped,

t/string_decode_modes.t  view on Meta::CPAN


sub _test_string_modes {
    my ($decoder, $decode_cr) = @_;

    my $cbor_text = "\x62é";
    my $cbor_binary = "\x42é";

    my $dec_text = $decode_cr->($cbor_text);
    my $dec_binary = $decode_cr->($cbor_binary);

    is( length($dec_text), 1, 'default: text -> decoded' );
    is( length($dec_binary), 2, 'default: binary -> non-decoded' );

    # ----------------------------------------------------------------------

    my $ret = $decoder->string_decode_never();
    is( $ret, $decoder, 'string_decode_never() returns object' );

    $dec_text = $decode_cr->($cbor_text);
    $dec_binary = $decode_cr->($cbor_binary);

    is( length($dec_text), 2, 'string_decode_never: text -> non-decoded' );
    is( length($dec_binary), 2, 'string_decode_never: binary -> non-decoded' );

    # ----------------------------------------------------------------------

    $ret = $decoder->string_decode_always();
    is( $ret, $decoder, 'string_decode_always() returns object' );

    $dec_text = $decode_cr->($cbor_text);
    $dec_binary = $decode_cr->($cbor_binary);

    is( length($dec_text), 1, 'string_decode_always: text -> decoded' );
    is( length($dec_binary), 1, 'string_decode_always: binary -> decoded' );

    #----------------------------------------------------------------------

    $ret = $decoder->string_decode_cbor();
    is( $ret, $decoder, 'string_decode_cbor() returns object' );

    $dec_text = $decode_cr->($cbor_text);
    $dec_binary = $decode_cr->($cbor_binary);

    is( length($dec_text), 1, 'string_decode_cbor: text -> decoded' );
    is( length($dec_binary), 2, 'string_decode_cbor: binary -> non-decoded' );

    return;
}

1;

t/tag_decode.t  view on Meta::CPAN

    my $tag = int rand 0xffffffff;

    my $decoder = CBOR::Free::Decoder->new()->set_tag_handlers(
        $tag => sub { 42 + shift() },
    );

    my $cbor = CBOR::Free::encode(
        CBOR::Free::tag( $tag, 123 ),
    );

    my $decoded = $decoder->decode( $cbor );
    is( $decoded, 165, "single callback OK (tag $tag)" );
}

my $decoder = CBOR::Free::Decoder->new()->set_tag_handlers(
    1 => sub { 42 + shift() },
);

my @w;
my $decoded = do {
    local $SIG{'__WARN__'} = sub { push @w, @_ };
    $decoder->decode( "\xcb\x80" );
};

cmp_deeply(
    \@w,
    [ all(
        re(qr<11>),         # tag number
        re(qr<4>),          # major type
        re( qr<array> ),    # major type label
    ) ],
    'warning about unrecognized tag',
) or diag explain \@w;

is_deeply($decoded, [], '… and the value is correct' );

{
    my $decoder = CBOR::Free::Decoder->new()->set_tag_handlers(
        111 => sub { shift() },
    );

    $decoder->set_tag_handlers( 111 => undef );

    my $cbor = CBOR::Free::encode(
        CBOR::Free::tag( 111, 123 ),
    );

    my @w;
    my $decoded = do {
        local $SIG{'__WARN__'} = sub { push @w, @_ };
        $decoder->decode( $cbor );
    };

    like( $w[0], qr<111>, 'setting tag handler to undef removes it' );
}

dies_ok(
    sub { $decoder->set_tag_handlers( 99 ) },
    'dies: missing tag handler',

t_manual/upstream_test_vectors.t  view on Meta::CPAN


        # Perl can’t store -(~0).
        next if $t->{'hex'} eq '3bffffffffffffffff';

        # diag $t->{'hex'};

        my $cbor = pack 'H*', $t->{'hex'};

        my $expected;

        if (exists $t->{'decoded'}) {
            $expected = $t->{'decoded'};
        }
        elsif (my $diag = $t->{'diagnostic'}) {
            if (!exists $diagnostic{$diag}) {
                diag "Unknown diagnostic: “$diag”";
                next CBOR_VALUE;
            }

            $expected = $diagnostic{$diag};
        }

        my $decoded = CBOR::Free::decode($cbor);

        is_deeply( $decoded, $expected, "decode: $t->{'hex'}" );

        if ($t->{'roundtrip'}) {
            my $back2cbor = CBOR::Free::encode($expected);
            my $reparsed = CBOR::Free::decode($back2cbor);

            is_deeply(
                $reparsed,
                $expected,
                '… and it round-trips to back CBOR and out again',
            );



( run in 0.474 second using v1.01-cache-2.11-cpan-26ccb49234f )