Cpanel-JSON-XS


Changes  view on Meta::CPAN

        - Fix memory corruption in sv_to_ivuv() function (#136 by Pali)
        - Add new method ->require_types (#135 by Pali)
        - Fix typed json encoder conversion from scalar's PV and NV slot to
          JSON_TYPE_INT (#133, #134 by Pali)
        - Fix inconsistency with warnings in typed json encoder (#131 by Pali)
        - Fix Perl 5.8.0 support (#130 by Pali)
        - Fixed minor pod typo (#129 by sheeit)
        - Document invalid recursive callbacks or overloads (#128)

4.11 2019-03-26 (rurban)
        - Fix unicode strings with BOM corrupting ->utf8 state (#125)
          The BOM encoding affects only its own decode call,
          not its object.

4.10 2019-03-18 (rurban)
        - Fix incr_text refcounts (#123)
        - Add incr_rest testcase (#123)
        - Fix encode_stringify string-overload refcnt problem (#124)
          "Attempt to free unreferenced scalar" with convert_blessed and overload.

4.09 2019-02-15 (rurban)
        - Add separate allow_dupkeys property, in relaxed (#122)

Changes  view on Meta::CPAN

3.99_02 2018-01-30 (rurban,pali)
        - Avoid setlocale race in threads with non-C locales, where the threads differ
          in the LC_NUMERIC locale. (#99 pali)

3.99_01 2018-01-30 (rurban,pali)
        - Added Cpanel::JSON::XS::Type as 2nd optional encode argument. (#94 pali)
        - Removed calling get magic hooks twice in encode.

3.0240 2017-12-17 (rurban)
        - Simplify ->allow_singlequote check, coverity cid #165321
        - Deprecate UTF-16 or UTF-32 BOM's: RFC 8259.

3.0239 2017-08-28 (rurban)
        - Fix compilation under windows. (#98 mauke)

3.0238 2017-08-25 (rurban)
        - Make printing of numbers on perls earlier than 5.22
          locale-insensitive, to produce a dot as decimal separator. (#96)

3.0237 2017-07-28 (rurban)
        - relax inf/nan tests as in t/op/infnan.t for windows.

Changes  view on Meta::CPAN

          We used a wrong range also, but the wrong code from 3.02220
          was never executed because of a coding error. #73, #74
        - Fixed a perl 5.6 compilation regression from 3.0220.
        - Improve decode_bom for multibyte encoding, but not yet enabled.
          refcount error.
        - Add 5.24 to travis tests

3.0220 2016-10-28 (rurban)
        - add comprehensive JSON decode spectests from
          http://seriot.ch/parsing_json.html. #72
        - decode with BOM (UTF-8, UTF-16, or UTF-32). For now only UTF-8,
          the others error.
        - fixed detection of final \0 as illegal non-whitespace garbage. Fixes
          spectest 'n_number_then_00'. #72
        - changed decode of unicode noncharacters between U+FFFD and U+10FFFF
          to the recommended U+FFFD REPLACEMENT CHARACTER, when not in the binary
          or relaxed mode.
        - fail decode of non-unicode raw characters above U+10FFFF
          when not in relaxed mode.

3.0219 2016-10-26 (rurban)

MANIFEST  view on Meta::CPAN

t/test_parsing/i_string_incomplete_surrogates_escape_valid.json
t/test_parsing/i_string_inverted_surrogates_U+1D11E.json
t/test_parsing/i_string_lone_second_surrogate.json
t/test_parsing/i_string_not_in_unicode_range.json
t/test_parsing/i_string_truncated-utf-8.json
t/test_parsing/i_string_unicode_U+10FFFE_nonchar.json
t/test_parsing/i_string_unicode_U+1FFFE_nonchar.json
t/test_parsing/i_string_unicode_U+FDD0_nonchar.json
t/test_parsing/i_string_unicode_U+FFFE_nonchar.json
t/test_parsing/i_structure_500_nested_arrays.json
t/test_parsing/i_structure_UTF-8_BOM_empty_object.json
t/test_parsing/n_array_1_true_without_comma.json
t/test_parsing/n_array_a_invalid_utf8.json
t/test_parsing/n_array_colon_instead_of_comma.json
t/test_parsing/n_array_comma_after_close.json
t/test_parsing/n_array_comma_and_number.json
t/test_parsing/n_array_double_comma.json
t/test_parsing/n_array_double_extra_comma.json
t/test_parsing/n_array_extra_close.json
t/test_parsing/n_array_extra_comma.json
t/test_parsing/n_array_incomplete.json

MANIFEST  view on Meta::CPAN

t/test_parsing/n_string_single_quote.json
t/test_parsing/n_string_single_string_no_double_quotes.json
t/test_parsing/n_string_start_escape_unclosed.json
t/test_parsing/n_string_unescaped_crtl_char.json
t/test_parsing/n_string_unescaped_newline.json
t/test_parsing/n_string_unescaped_tab.json
t/test_parsing/n_string_unicode_CapitalU.json
t/test_parsing/n_string_with_trailing_garbage.json
t/test_parsing/n_structure_100000_opening_arrays.json
t/test_parsing/n_structure_U+2060_word_joined.json
t/test_parsing/n_structure_UTF8_BOM_no_data.json
t/test_parsing/n_structure_array_trailing_garbage.json
t/test_parsing/n_structure_array_with_extra_array_close.json
t/test_parsing/n_structure_array_with_unclosed_string.json
t/test_parsing/n_structure_ascii-unicode-identifier.json
t/test_parsing/n_structure_capitalized_True.json
t/test_parsing/n_structure_close_unopened_array.json
t/test_parsing/n_structure_comma_instead_of_closing_brace.json
t/test_parsing/n_structure_double_array.json
t/test_parsing/n_structure_end_array.json
t/test_parsing/n_structure_incomplete_UTF8_BOM.json
t/test_parsing/n_structure_lone-invalid-utf-8.json
t/test_parsing/n_structure_lone-open-bracket.json
t/test_parsing/n_structure_lt.gt.json
t/test_parsing/n_structure_ltnullgt.json
t/test_parsing/n_structure_no_data.json
t/test_parsing/n_structure_null-byte-outside-string.json
t/test_parsing/n_structure_number_with_trailing_garbage.json
t/test_parsing/n_structure_object_followed_by_closing_object.json
t/test_parsing/n_structure_object_unclosed_no_value.json
t/test_parsing/n_structure_object_with_comment.json

README  view on Meta::CPAN

    - extended testsuite, passes all
    http://seriot.ch/projects/parsing_json.html tests. In fact it is the
    only known JSON decoder which does so, while also being the fastest.

    - support many more options and methods from JSON::PP: stringify_infnan,
    allow_unknown, allow_stringify, allow_barekey, encode_stringify,
    allow_bignum, allow_singlequote, dupkeys_as_arrayref, sort_by
    (partially), escape_slash, convert_blessed, ... optional decode_json(,
    allow_nonref) arg. relaxed implements allow_dupkeys.

    - support all 5 unicode BOM's: UTF-8, UTF-16LE, UTF-16BE, UTF-32LE,
    UTF-32BE, encoding internally to UTF-8.

FUNCTIONAL INTERFACE
    The following convenience methods are provided by this module. They are
    exported by default:

    $json_text = encode_json $perl_scalar, [json_type]
        Converts the given Perl data structure to a UTF-8 encoded, binary
        string (that is, the string contains octets only). Croaks on error.

README  view on Meta::CPAN

             # else add more data
             sysread $fh, my $buf, 65536
                or die "read error: $!";
             $json->incr_parse ($buf); # void context, so no parsing
          }

    This is a complex example, but most of the complexity comes from the
    fact that we are trying to be correct (bear with me if I am wrong, I
    never ran the above example :).

BOM
    Detects all Unicode Byte Order Marks on decode: UTF-8, UTF-16LE,
    UTF-16BE, UTF-32LE and UTF-32BE.

    The BOM encoding is set only for one specific decode call, it does not
    change the state of the JSON object.

    Warning: With perls older than 5.20 you need to load the Encode module
    before decoding a multibyte BOM, i.e. >= UTF-16. Otherwise an error is
    thrown. This is an implementation limitation and might get fixed later.

    See <https://tools.ietf.org/html/rfc7159#section-8.1> *"JSON text SHALL
    be encoded in UTF-8, UTF-16, or UTF-32."*

    *"Implementations MUST NOT add a byte order mark to the beginning of a
    JSON text", "implementations (...) MAY ignore the presence of a byte
    order mark rather than treating it as an error".*

    See also <http://www.unicode.org/faq/utf_bom.html#BOM>.

    Beware that Cpanel::JSON::XS is currently the only JSON module which
    does accept and decode a BOM.

    The latest JSON spec
    <https://www.greenbytes.de/tech/webdav/rfc8259.html#character.encoding>
    forbids the use of UTF-16 or UTF-32; the character encoding is UTF-8.
    Thus in subsequent updates BOM's of UTF-16 or UTF-32 will throw an
    error.
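
    The detection described in this section can be sketched in C. This is
    a minimal illustration, not the module's actual code: the names
    bom_kind, detect_bom and bom_length are hypothetical. It assumes only
    the five standard BOM byte sequences, and it tests the 4-byte UTF-32LE
    mark before the 2-byte UTF-16LE mark because both begin with FF FE.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; they are not part of the module. */
typedef enum {
    BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE, BOM_UTF32LE, BOM_UTF32BE
} bom_kind;

static const unsigned char BOM8[]    = { 0xEF, 0xBB, 0xBF };
static const unsigned char BOM16LE[] = { 0xFF, 0xFE };
static const unsigned char BOM16BE[] = { 0xFE, 0xFF };
static const unsigned char BOM32LE[] = { 0xFF, 0xFE, 0x00, 0x00 };
static const unsigned char BOM32BE[] = { 0x00, 0x00, 0xFE, 0xFF };

/* Return nonzero if buf (len bytes) starts with the n-byte mark. */
static int starts_with(const unsigned char *buf, size_t len,
                       const unsigned char *mark, size_t n)
{
    return len >= n && memcmp(buf, mark, n) == 0;
}

/* Classify the leading bytes of a buffer. */
static bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    if (starts_with(buf, len, BOM8, sizeof BOM8))       return BOM_UTF8;
    /* UTF-32LE must be checked before UTF-16LE: both begin with FF FE. */
    if (starts_with(buf, len, BOM32LE, sizeof BOM32LE)) return BOM_UTF32LE;
    if (starts_with(buf, len, BOM32BE, sizeof BOM32BE)) return BOM_UTF32BE;
    if (starts_with(buf, len, BOM16LE, sizeof BOM16LE)) return BOM_UTF16LE;
    if (starts_with(buf, len, BOM16BE, sizeof BOM16BE)) return BOM_UTF16BE;
    return BOM_NONE;
}

/* Number of bytes the mark occupies, i.e. how far a decoder would skip. */
static size_t bom_length(bom_kind kind)
{
    switch (kind) {
    case BOM_UTF8:                      return 3;
    case BOM_UTF16LE: case BOM_UTF16BE: return 2;
    case BOM_UTF32LE: case BOM_UTF32BE: return 4;
    default:                            return 0;
    }
}
```

    A caller would pass the raw input bytes to detect_bom and, for
    BOM_UTF8, skip bom_length bytes before parsing; the other kinds would
    first need transcoding to UTF-8.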

MAPPING
    This section describes how Cpanel::JSON::XS maps Perl values to JSON
    values and vice versa. These mappings are designed to "do the right
    thing" in most circumstances automatically, preserving round-tripping
    characteristics (what you put in comes out as something equivalent).

    For the more enlightened: note that in the following descriptions,
    lowercase *perl* refers to the Perl interpreter, while uppercase *Perl*

SIGNATURE  view on Meta::CPAN

SHA256 a1ee85c5d6cec2cc0cea8b91ec6d6c253f2bb7d2bd48a2ce5e9f0db49d9d0b32 t/test_parsing/i_string_incomplete_surrogates_escape_valid.json
SHA256 17dc4691fa7a327ac3411391f7d3b5db983715391f8dfee1c008ffb453959fff t/test_parsing/i_string_inverted_surrogates_U+1D11E.json
SHA256 24b9bc232f8cf6dff1869f7ef72790c7c5aac9c7529dd9c36427bb012bc20f37 t/test_parsing/i_string_lone_second_surrogate.json
SHA256 1e4a55bc32c7de4b1af008ed206899f4debd32ce7676e11b017633aba8a548f8 t/test_parsing/i_string_not_in_unicode_range.json
SHA256 65fd4cdbf6c28fe2d4e18f81ea37e154a90198fae4fe923eb41533dbd301cb02 t/test_parsing/i_string_truncated-utf-8.json
SHA256 95affe1744a2fb5a585d0a4a8b28b3238f36473ac55e21b43ead5e9824389eb3 t/test_parsing/i_string_unicode_U+10FFFE_nonchar.json
SHA256 1522be6352030d1fea7484adcccd731408f3f000fe893b567c4cc6fc79378ad6 t/test_parsing/i_string_unicode_U+1FFFE_nonchar.json
SHA256 093f3d1b0f5c40df1427a3bbb8e1999df3e923f475b0a9655c1a7eafbf69ebe3 t/test_parsing/i_string_unicode_U+FDD0_nonchar.json
SHA256 7e7d42cac051b1e55f093262ecb8cda1cc080cb9cde6a2cbe163b7a5db9fbfa9 t/test_parsing/i_string_unicode_U+FFFE_nonchar.json
SHA256 a6c6b45361ff77e7372a015a1f0289e9c09d1fe1ed59cf9773d599c55acf57cf t/test_parsing/i_structure_500_nested_arrays.json
SHA256 aa25e978046d680ef8740d837e6de5bc1e2a2dc6089dbda1012544b538d53f65 t/test_parsing/i_structure_UTF-8_BOM_empty_object.json
SHA256 66510f3118b7ede39314518bc5a1bc71eb2b4d36a8a01cb0e76829be6768409b t/test_parsing/n_array_1_true_without_comma.json
SHA256 00308ed1d30dd0c6248a36107049cfe7a4a795c774e252d6eeec4d64ccee4aad t/test_parsing/n_array_a_invalid_utf8.json
SHA256 b4bceb6e7fefaf43b73f472b8c260c5aad24610faea9a1abe6b9a3035065deb4 t/test_parsing/n_array_colon_instead_of_comma.json
SHA256 31f674b45f5eaff8419175ab479b1e8d5946d3fae55106a86f116d77f58672be t/test_parsing/n_array_comma_after_close.json
SHA256 2b3646f1ef5cf8b30a71b9e140e89b77da7eb4354ddc1f346f57ffc66ad68cd0 t/test_parsing/n_array_comma_and_number.json
SHA256 1b4a2022b0702d87a15a49f031e9795ab6824cb3945c106bcfa5ad55c2e945cc t/test_parsing/n_array_double_comma.json
SHA256 2161a4ca5aa1599950752bf9aa8a42358035362098dc702e9bde01dc05f3a674 t/test_parsing/n_array_double_extra_comma.json
SHA256 b9ffad41832eeef97b229c9de4f664d7ad3bd3094ebdf5342729564427b1688f t/test_parsing/n_array_extra_close.json
SHA256 20f9fd883ff8af1cf1a64b7bb00f233d64a590cfd08efda89fc96a170bcbe17c t/test_parsing/n_array_extra_comma.json
SHA256 301f5421cd3a61da99963d2f6fbabb29f0d9f00cdd9b3886f512240d148a335f t/test_parsing/n_array_incomplete.json

SIGNATURE  view on Meta::CPAN

SHA256 0389d0d5920ee2996811974bdcbfe05c9ca0ed79af909704b8485bc86f6eaf25 t/test_parsing/n_string_single_quote.json
SHA256 ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad t/test_parsing/n_string_single_string_no_double_quotes.json
SHA256 71efe1dd33978d346960b810a8dd22b9bbca77490b3c35d16ae329f457aea737 t/test_parsing/n_string_start_escape_unclosed.json
SHA256 c85db33857913bc4c4d4f044c72cd06d50ffb5644b2e5093ef8b46ebfe340ed0 t/test_parsing/n_string_unescaped_crtl_char.json
SHA256 b53a55c51fa3b643f1e760d7a2a98869e9bff4e59edff443a632864c37c29e25 t/test_parsing/n_string_unescaped_newline.json
SHA256 f69aad8a6518ca6367a55724dfdc6eace3e202b9035e1f85eab97e3c1495fd2e t/test_parsing/n_string_unescaped_tab.json
SHA256 2871101babb06b63ad8cf90cc9c4b49d52f79d7165db60af743cf42391fae7d2 t/test_parsing/n_string_unicode_CapitalU.json
SHA256 3861f42f224ea26ea00ff20a8a4fba07e3fb26be247c112cf51f4d9840805616 t/test_parsing/n_string_with_trailing_garbage.json
SHA256 13f86ea1e7edd116d18d4ba6c6fa114cd3c927516182d24259623874955d21d1 t/test_parsing/n_structure_100000_opening_arrays.json
SHA256 aacc3ed19ba1f9596daefc68914fe3253c00703648e8ee606f03823feaed1da6 t/test_parsing/n_structure_U+2060_word_joined.json
SHA256 f1945cd6c19e56b3c1c78943ef5ec18116907a4ca1efc40a57d48ab1db7adfc5 t/test_parsing/n_structure_UTF8_BOM_no_data.json
SHA256 4a0aa4da859bc84269fc36c50a5968cf4109b9b07099c0cb49f331f50c5c848b t/test_parsing/n_structure_array_trailing_garbage.json
SHA256 f9bfc9cf84e12516f67e02e9b412520c6ffd0a016730e1da2b5ae32417f8affb t/test_parsing/n_structure_array_with_extra_array_close.json
SHA256 7d5c01400b39d5ebebb81a2a51ec727d9fbf722bbf5caff62cb695c4fb288e29 t/test_parsing/n_structure_array_with_unclosed_string.json
SHA256 c9e0bfc2ef7439ecd1084b7496f1b96ec27e1c8a80c7811eb7f91d4fa64da666 t/test_parsing/n_structure_ascii-unicode-identifier.json
SHA256 b78c88a26a2b512a2964fc4160ddb4d32e2aed8b01219ab5e3666b633572b336 t/test_parsing/n_structure_capitalized_True.json
SHA256 892491211326600b77f0464218fe38f041857e6b8cb5253558f03fe6f81ce40b t/test_parsing/n_structure_close_unopened_array.json
SHA256 d2b042da0c0664285a8485eec9097afc54e376f1a675818386699c5f2a65433d t/test_parsing/n_structure_comma_instead_of_closing_brace.json
SHA256 821bf06b4dcb406ea508a4a992eadc22f29850cd208ba24aea7c29148de8ccf1 t/test_parsing/n_structure_double_array.json
SHA256 cfae0d4248f7142f7b17f826cd7a519280e312577690e957830d23dcf35a3fff t/test_parsing/n_structure_end_array.json
SHA256 3f801979daa8eb44ebbc793bafe85ec1dd22ca8c976d9069fb0b0195c1cad8f9 t/test_parsing/n_structure_incomplete_UTF8_BOM.json
SHA256 ab61ba11a38b007ff98baa3ab20e2a584e15269fd428db3c857e2a2d568b5725 t/test_parsing/n_structure_lone-invalid-utf-8.json
SHA256 245843abef9e72e7efac30138a994bf6301e7e1d7d7042a33d42e863d2638811 t/test_parsing/n_structure_lone-open-bracket.json
SHA256 5545c186e2e58097f78948f4318b1da4a944853d7aa5d431db0a7cc14328ea23 t/test_parsing/n_structure_lt.gt.json
SHA256 7227cd2265406dc197d250c3f0e5293cee7912acad093392367d8679b221f41d t/test_parsing/n_structure_ltnullgt.json
SHA256 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 t/test_parsing/n_structure_no_data.json
SHA256 b092606b4e272c079151f7627c83db66b93c3624d0b051e6d0712925ed8b5c79 t/test_parsing/n_structure_null-byte-outside-string.json
SHA256 4c1731d722cf5ff61708486fbe2593cb779d91d6536e504300da66bdb4754507 t/test_parsing/n_structure_number_with_trailing_garbage.json
SHA256 8a299b12f74132eef36aa0ee1005e6ee4c0610572acf986becf7f0905db3d88f t/test_parsing/n_structure_object_followed_by_closing_object.json
SHA256 c6157167c74dace056a67dee056bee30ec1dbe40227b410d1948578317ef87fd t/test_parsing/n_structure_object_unclosed_no_value.json
SHA256 7ed28a9d4e3ca53e831b6ddf4bc8db6f30ac41bd9a95763662b281d41edb57b1 t/test_parsing/n_structure_object_with_comment.json

XS.pm  view on Meta::CPAN

  tests.  In fact it is the only known JSON decoder which does so,
  while also being the fastest.

- support many more options and methods from JSON::PP:
  stringify_infnan, allow_unknown, allow_stringify, allow_barekey,
  encode_stringify, allow_bignum, allow_singlequote, dupkeys_as_arrayref,
  sort_by (partially), escape_slash, convert_blessed, ...
  optional decode_json(, allow_nonref) arg.
  relaxed implements allow_dupkeys.

- support all 5 unicode L<BOM|/BOM>'s: UTF-8, UTF-16LE, UTF-16BE, UTF-32LE,
  UTF-32BE, encoding internally to UTF-8.

=cut

our @ISA = qw(Exporter);
our @EXPORT = qw(encode_json decode_json to_json from_json);

sub to_json($@) {
   if ($] >= 5.008) {
     require Carp;

XS.pm  view on Meta::CPAN

         # else add more data
         sysread $fh, my $buf, 65536
            or die "read error: $!";
         $json->incr_parse ($buf); # void context, so no parsing
      }

This is a complex example, but most of the complexity comes from the fact
that we are trying to be correct (bear with me if I am wrong, I never ran
the above example :).

=head1 BOM

Detects all Unicode B<Byte Order Marks> on decode:
UTF-8, UTF-16LE, UTF-16BE, UTF-32LE and UTF-32BE.

The BOM encoding is set only for one specific decode call, it does not
change the state of the JSON object.

B<Warning>: With perls older than 5.20 you need to load the Encode module
before decoding a multibyte BOM, i.e. >= UTF-16. Otherwise an error is
thrown. This is an implementation limitation and might get fixed later.

See L<https://tools.ietf.org/html/rfc7159#section-8.1>
I<"JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.">

I<"Implementations MUST NOT add a byte order mark to the beginning of a
JSON text", "implementations (...) MAY ignore the presence of a byte
order mark rather than treating it as an error".>

See also L<http://www.unicode.org/faq/utf_bom.html#BOM>.

Beware that Cpanel::JSON::XS is currently the only JSON module which
does accept and decode a BOM.

The latest JSON spec
L<https://www.greenbytes.de/tech/webdav/rfc8259.html#character.encoding>
forbids the use of UTF-16 or UTF-32; the character encoding is UTF-8.
Thus in subsequent updates BOM's of UTF-16 or UTF-32 will throw an error.
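
The statement that a BOM affects only one decode call and not the JSON
object can be illustrated with a small C sketch of a save-and-restore
pattern for the UTF-8 flag. This is an illustration under assumptions, not
the module's code: the struct json_state, the helper names, and the bit
value chosen for F_UTF8 are all hypothetical.

```c
#include <stdint.h>

#define F_UTF8 0x4u  /* assumed bit value, for illustration only */

/* Hypothetical stand-in for the decoder object's option flags. */
typedef struct { uint32_t flags; } json_state;

/* Force F_UTF8 on for one call. The return value remembers the prior
   state: 1 if the flag was clear, 1 + F_UTF8 if it was already set. */
static uint32_t bom_begin(json_state *json)
{
    uint32_t converted = 1 + (json->flags & F_UTF8);
    json->flags |= F_UTF8;
    return converted;
}

/* Undo bom_begin: clear F_UTF8 only if this call was the one that set it. */
static void bom_end(json_state *json, uint32_t converted)
{
    if (converted == 1)
        json->flags &= ~F_UTF8;
}

/* Run the begin/end pair around a pretend decode and report the flags
   the object is left with afterwards. */
static uint32_t flags_after_bom_decode(uint32_t initial_flags)
{
    json_state json = { initial_flags };
    uint32_t converted = bom_begin(&json);
    /* ... the actual decode would run here with F_UTF8 set ... */
    bom_end(&json, converted);
    return json.flags;
}
```

Packing the earlier flag state into the same variable that records "a BOM
was seen" avoids a second local; whatever the flags were before the call,
they are the same afterwards.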

=head1 MAPPING

This section describes how Cpanel::JSON::XS maps Perl values to JSON
values and vice versa. These mappings are designed to "do the right
thing" in most circumstances automatically, preserving round-tripping
characteristics (what you put in comes out as something equivalent).

For the more enlightened: note that in the following descriptions,
lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>

XS.xs  view on Meta::CPAN

#endif

#if PERL_VERSION < 22 && defined(HAS_SETLOCALE)
#define NEED_NUMERIC_LOCALE_C
#ifdef I_XLOCALE
#include <xlocale.h>
#endif
#endif

/* FIXME: still a refcount error */
#define HAVE_DECODE_BOM
#define UTF8BOM     "\357\273\277"      /* EF BB BF */
/* UTF16/32BOM is deprecated, RFC 8259 */
#define UTF16BOM    "\377\376"          /* FF FE or U+FEFF */
#define UTF16BOM_BE "\376\377"          /* FE FF */
#define UTF32BOM    "\377\376\000\000"  /* FF FE 00 00 or U+FEFF */
#define UTF32BOM_BE "\000\000\376\377"  /* 00 00 FE FF */

/* Need to disable broken gcc-9.[0-3] -O1. Assume gcc-10 is also still broken */
/* Only gcc defines __GNUC_PATCHLEVEL__, clang and icc do define __GNUC__ */
#if defined(__GNUC__) && defined(__GNUC_PATCHLEVEL__) && \
    (((__GNUC__ == 9) && (__GNUC_MINOR__ <= 3) || \
      (__GNUC__ > 9)))
#  define BROKEN_GCC_OPT
#endif

/* mingw with USE_LONG_DOUBLE (and implied USE_MINGW_ANSI_STDIO) do use the

XS.xs  view on Meta::CPAN


/* decode UTF32-LE/... to UTF-8:
   $utf8 = Encode::decode("UTF-32", $string); */
static SV *
decode_bom(pTHX_ const char* encoding, SV* string, STRLEN offset)
{
  dSP;
  I32 items;
  PERL_UNUSED_ARG(offset);

#ifndef HAVE_DECODE_BOM
  croak ("Cannot handle multibyte BOM yet");
  return string;
#else
  ENTER;
#if PERL_VERSION > 18
  /* on older perls (<5.20) this corrupts ax */
  Perl_load_module(aTHX_ PERL_LOADMOD_NOIMPORT, newSVpvs("Encode"),
                   NULL, NULL, NULL);
#else
  if (!get_cvs("Encode::decode", GV_NOADD_NOINIT|GV_NO_SVGMAGIC))
    croak("Multibyte BOM needs to use Encode before");
#endif
  LEAVE;
  ENTER;
  PUSHMARK(SP);
  XPUSHs(newSVpvn(encoding, strlen(encoding)));
  XPUSHs(string);
  PUTBACK;
  /* Calling Encode::Unicode::decode_xs would be faster, but we'd need the blessed
     enc hash from find_encoding() then. e.g. $Encode::Encoding{'UTF-16LE'}
     bless {Name=>UTF-16,size=>2,endian=>'',ucs2=>undef}, 'Encode::Unicode';
     And currently we enjoy the simplicity of the BOM offset advance by
     endianness autodetection.
   */
  items = call_sv(MUTABLE_SV(get_cvs("Encode::decode",
              GV_NOADD_NOINIT|GV_NO_SVGMAGIC)), G_SCALAR);
  SPAGAIN;
  if (items >= 0 && SvPOK(TOPs)) {
    LEAVE;
    SvUTF8_on(TOPs);
    return POPs;
  } else {

XS.xs  view on Meta::CPAN

    len = SvOK (string) ? sv_len (string) : 0;
#else
    len = SvCUR (string);
#endif

    if (UNLIKELY(len > json->max_size && json->max_size))
      croak ("attempted decode of JSON text of %lu bytes size, but max_size is set to %lu",
             (unsigned long)len, (unsigned long)json->max_size);
  }

  /* Detect BOM and possibly convert to UTF-8 and set UTF8 flag.

     https://tools.ietf.org/html/rfc7159#section-8.1
     JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.
     Byte Order Mark - While section 8.1 states "Implementations MUST
     NOT add a byte order mark to the beginning of a JSON text",
     "implementations (...) MAY ignore the presence of a byte order
     mark rather than treating it as an error". */
  if (UNLIKELY(len > 2 && SvPOK(string) && !json->incr_pos)) {
    U8 *s = (U8*)SvPVX (string);
    if (*s >= 0xEF) {
      if (len >= 3 && memEQc(s, UTF8BOM)) {
        converted = 1 + (json->flags & F_UTF8);
        json->flags |= F_UTF8;
        offset = 3;
        SvPV_set(string, SvPVX_mutable (string) + 3);
        SvCUR_set(string, len - 3);
        SvUTF8_on(string);
        /* omitting the endian name will skip the BOM in the result */
      } else if (len >= 4 && memEQc(s, UTF32BOM)) {
        string = decode_bom(aTHX_ "UTF-32", string, 4);
        converted = 1 + (json->flags & F_UTF8);
        json->flags |= F_UTF8;
      } else if (memEQc(s, UTF16BOM)) {
        string = decode_bom(aTHX_ "UTF-16", string, 2);
        converted = 1 + (json->flags & F_UTF8);
        json->flags |= F_UTF8;
      } else if (memEQc(s, UTF16BOM_BE)) {
        string = decode_bom(aTHX_ "UTF-16", string, 2);
        converted = 1 + (json->flags & F_UTF8);
        json->flags |= F_UTF8;
      }
    } else if (UNLIKELY(len >= 4 && !*s && memEQc(s, UTF32BOM_BE))) {
      string = decode_bom(aTHX_ "UTF-32", string, 4);
      converted = 1 + (json->flags & F_UTF8);
      json->flags |= F_UTF8;
    }
  }

  if (LIKELY(!converted)) {
    if (DECODE_WANTS_OCTETS (json))
      sv_utf8_downgrade (string, 0);
    else

XS.xs  view on Meta::CPAN

      /* check for trailing garbage */
      decode_ws (&dec);

      if ((dec.end - dec.cur) || *dec.cur)
        {
          dec.err = "garbage after JSON object";
          SvREFCNT_dec (sv);
          sv = NULL;
        }
    }
  /* restore old utf8 string with BOM */
  if (UNLIKELY(offset)) {
    SvPV_set(string, SvPVX_mutable (string) - offset);
    SvCUR_set(string, len);
  }

  if (!sv)
    {
      SV *uni = sv_newmortal ();
      COP cop = *PL_curcop;
      if (dec.cur >= dec.end) // overshoot

XS.xs  view on Meta::CPAN

             dec.err,
             (int)ptr_to_index (aTHX_ string, dec.cur - SvPVX(string)),
             dec.cur < dec.end ? SvPV_nolen (uni) : "(end of string)");
    }

  if (!(dec.json.flags & F_ALLOW_NONREF) && json_nonref(aTHX_ sv)) {
    SvREFCNT_dec (sv);
    croak ("JSON text must be an object or array (but found number, string, true, false or null, use allow_nonref to allow this)");
  }

  if (UNLIKELY(converted && !(converted - 1))) /* with BOM, and UTF8 was not set */
    json->flags &= ~F_UTF8;
  return sv_2mortal (sv);
}

/*/////////////////////////////////////////////////////////////////////////// */
/* incremental parser */

/* Note that our good friend gcc-9.x crashes here, which looks like one of the
   well-known internal gcc tree-optimizer bugs. */
#ifdef BROKEN_GCC_OPT

t/30_jsonspec.t  view on Meta::CPAN

                   );
}

# undefined i_ tests:
# also pass with relaxed
my %i_pass = map{$_ => 1}
  qw(
      i_number_neg_int_huge_exp
      i_number_pos_double_huge_exp
      i_structure_500_nested_arrays
      i_structure_UTF-8_BOM_empty_object
      i_string_unicode_U+10FFFE_nonchar
      i_string_unicode_U+1FFFE_nonchar
      i_string_unicode_U+FDD0_nonchar
      i_string_unicode_U+FFFE_nonchar
   );
# should also fail with relaxed, except i_string_not_in_unicode_range
my %i_parseerr = map{$_ => 1}
  qw(
      i_object_key_lone_2nd_surrogate
      i_string_1st_surrogate_but_2nd_missing

t/30_jsonspec.t  view on Meta::CPAN

      i_error($s, $base);
    } else {
      i_undefined($s, $base);
    }
  }
}

#n_error("[1,\n1\n,1",      "n_array_unclosed_with_new_lines.json");
#n_error("[\"a\",\n4\n,1,", "n_array_newlines_unclosed.json");
#i_pass("[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[...
#n_error("\x{EF}\x{BB}\x{BF}\x{00}{}","i_structure_UTF-8_BOM_empty_object.json");

#done_testing;

t/31_bom.t  view on Meta::CPAN

# Detect BOM and possibly convert to UTF-8 and set UTF8 flag.
#
# https://tools.ietf.org/html/rfc7159#section-8.1
# JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.
use Test::More ($] >= 5.008) ? (tests => 9) : (skip_all => "needs 5.8");
use Cpanel::JSON::XS;
use Encode; # Currently required for <5.20
use charnames qw(:short);
use utf8;

my $json = Cpanel::JSON::XS->new->utf8->allow_nonref;

t/31_bom.t  view on Meta::CPAN

   ["\xfe\xff\000\133\000\042\000\351\000\042\000\135", 'UTF16-BE'],
   ["\xff\xfe\133\000\042\000\351\000\042\000\135\000", 'UTF16-LE'],
   ["\xff\xfe\000\000\133\000\000\000\042\000\000\000\351\000\000\000\042\000\000\000\135\000\000\000",   'UTF32-LE'],
   ["\000\000\xfe\xff\000\000\000\133\000\000\000\042\000\000\000\351\000\000\000\042\000\000\000\135",   'UTF32-BE'],
  );

for my $bom (@bom) {
  y_pass(@$bom);
}

# [GH #125] BOM in the middle corrupts state, sets utf8 flag
my $j = Cpanel::JSON::XS->new;

ok(my $as_json = eval {
    $j->encode({ example => "data with non-ASCII characters",
                 unicode => "\N{greek:Sigma}" })
}, 'can encode a basic structure');
ok(eval { $j->decode($as_json) }, 'can decode again');
ok(eval { $j->decode("\x{feff}" . $as_json) }, 'can decode with BOM');
ok(eval { $j->decode($as_json) }, 'can decode original');

xt/pod-spelling.t  view on Meta::CPAN

favourably
latin1
licences
utf8
www
16be
16le
32le
32be
BMP
BOM
BOM's
BSON
CBOR
CVE
Cpanel
Crockford
Crockford's
DESERIALIZATION
Deserializing
ECMAscript
GH


