Cpanel-JSON-XS
NAME
Cpanel::JSON::XS - cPanel fork of JSON::XS, fast and correct serializing
SYNOPSIS
use Cpanel::JSON::XS;
# exported functions, they croak on error
# and expect/generate UTF-8
$utf8_encoded_json_text = encode_json $perl_hash_or_arrayref;
$perl_hash_or_arrayref = decode_json $utf8_encoded_json_text;
# OO-interface
$coder = Cpanel::JSON::XS->new->ascii->pretty->allow_nonref;
$pretty_printed_unencoded = $coder->encode ($perl_scalar);
$perl_scalar = $coder->decode ($unicode_json_text);
# Note that Perl 5.6 lacks most of the smart utf8 and encoding functionality
# of newer releases.
# Note that JSON::MaybeXS will automatically use Cpanel::JSON::XS
# if available, at virtually no speed overhead either, so you should
# be able to just:
use JSON::MaybeXS;
# and do the same things, except that you have a pure-perl fallback now.
Note that this module will soon be replaced by a new JSON::Safe module,
with the same API but guaranteed safe defaults.
DESCRIPTION
This module converts Perl data structures to JSON and vice versa. Its
primary goal is to be *correct* and its secondary goal is to be *fast*.
To reach the latter goal it was written in C.
As this is the n-th-something JSON module on CPAN, what was the reason
to write yet another JSON module? While it seems there are many JSON
modules, none of them correctly handle all corner cases, and in most
cases their maintainers are unresponsive, gone missing, or not listening
to bug reports for other reasons.
See below for the cPanel fork.
See MAPPING, below, on how Cpanel::JSON::XS maps perl values to JSON
values and vice versa.
FEATURES
* correct Unicode handling
This module knows how to handle Unicode with Perl versions higher
than 5.8.5, documents how and when it does so, and even documents
what "correct" means.
* round-trip integrity
When you serialize a perl data structure using only data types
supported by JSON and Perl, the deserialized data structure is
identical on the Perl level. (e.g. the string "2.0" doesn't suddenly
become "2" just because it looks like a number). There *are* minor
exceptions to this, read the MAPPING section below to learn about
those.
* strict checking of JSON correctness
There is no guessing, no generating of illegal JSON texts by
default, and only JSON is accepted as input by default. The latter
is a security feature.
* fast
Compared to other JSON modules and other serializers such as
Storable, this module usually compares favourably in terms of speed,
too.
* simple to use
This module has both a simple functional interface as well as an
object oriented interface.
* reasonably versatile output formats
You can choose between the most compact guaranteed-single-line
format possible (nice for simple line-based protocols), a pure-ASCII
format (for when your transport is not 8-bit clean, still supports
the whole Unicode range), or a pretty-printed format (for when you
want to read that stuff). Or you can combine those features in
whatever way you like.
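The round-trip claim above can be checked with a short script. This is a minimal sketch, assuming Cpanel::JSON::XS is installed; the hash contents and variable names are made up for illustration, and "canonical" is enabled only to make the key order of the output predictable.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# canonical sorts hash keys, so the encoded text is stable
my $coder = Cpanel::JSON::XS->new->canonical;

# "2.0" is a string and 2 is an integer on the Perl side
my $in   = { version => "2.0", count => 2 };
my $text = $coder->encode($in);     # {"count":2,"version":"2.0"}
my $out  = $coder->decode($text);

# the string keeps its exact spelling after the round trip
print "$text\n";
print "version is still '$out->{version}'\n";
```

The string "2.0" is written as a JSON string and the integer as a JSON number, so neither changes type on the way back.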
cPanel fork
Since the original author MLEHMANN has no public bugtracker, this cPanel
fork now lives on GitHub.
src repo: <https://github.com/rurban/Cpanel-JSON-XS>
original: <http://cvs.schmorp.de/JSON-XS/>
RT: <https://github.com/rurban/Cpanel-JSON-XS/issues> or
<https://rt.cpan.org/Public/Dist/Display.html?Queue=Cpanel-JSON-XS>
Changes to JSON::XS
- bare hashkeys are now checked for utf8. (GH #209)
- stricter decode_json() as documented: non-refs are disallowed, safe by
default. Added a 2nd optional argument. decode() now honors
allow_nonref.
- fixed encode of numbers for dual-vars. Different string
representations are preserved, but numbers with temporary strings which
represent the same number are here treated as numbers, not strings.
Cpanel::JSON::XS is a bit slower, but preserves numeric types better.
- numbers ending with .0 stay numbers, are not converted to integers.
[#63] dual-vars which are represented as number not integer (42+"bar" !=
5.8.9) are now encoded as number (=> 42.0) because internally it's now a
NOK type. However !!1 which is wrongly encoded in 5.8 as "1"/1.0 is
still represented as integer.
- different handling of inf/nan. Default now to null, optionally with
stringify_infnan() to "inf"/"nan". [#28, #32]
- added "binary" extension, non-JSON and non JSON parsable, allows
"\xNN" and "\NNN" sequences.
- 5.6.2 support; sacrificing some utf8 features (assuming bytes
all-over), no multi-byte unicode characters with 5.6.
- interop for true/false overloading. JSON::XS, JSON::PP and Mojo::JSON
representations for booleans are accepted and JSON::XS accepts
Cpanel::JSON::XS booleans [#13, #37] Fixed overloading of booleans.
Cpanel::JSON::XS::true stringifies again to "1", not "true", analog to
all other JSON modules.
- native boolean mapping of yes and no to true and false, as in
YAML::XS. In perl "!0" is yes, "!1" is no. The JSON value true maps to
1, false maps to 0. [#39]
- support arbitrary stringification with encode, with convert_blessed
and allow_blessed.
- ithread support. Cpanel::JSON::XS is thread-safe, JSON::XS is not.
- is_bool can be called as a method, JSON::XS::is_bool cannot.
- performance optimizations for threaded Perls
- relaxed mode, allowing many popular extensions
- protect our magic object from corruption by wrong or missing external
methods, like FREEZE/THAW or serialization with other methods.
- additional fixes for:
- #208 - no security-relevant out-of-bounds reading of module memory
when decoding hash keys without ending ':'
- [cpan #88061] AIX atof without USE_LONG_DOUBLE
- #10 unshare_hek crash
- #7, #29 avoid re-blessing where possible. It fails in JSON::XS for
READONLY values, i.e. restricted hashes.
- #41 overloading of booleans, use the object not the reference.
- #62 -Dusequadmath conversion and no SEGV.
- #72 parsing of values followed by \0, like 1\0, now fails.
- #72 parsing of illegal unicode or non-unicode characters.
- #96 locale-insensitive numeric conversion.
- #154 numeric conversion fixed since 5.22, using the same strtold as perl5.
- #167 sort tied hashes with canonical.
- #212 fix utf8 object stringification
- public maintenance and bugtracker
- use ppport.h, sanify XS.xs comment styles, harness C coding style
- common::sense is optional. When available it is not used in the
published production module, just during development and testing.
- extended testsuite, passes all
http://seriot.ch/projects/parsing_json.html tests. In fact it is the
only known JSON decoder which does so, while also being the fastest.
- support many more options and methods from JSON::PP: stringify_infnan,
allow_unknown, allow_stringify, allow_barekey, encode_stringify,
allow_bignum, allow_singlequote, dupkeys_as_arrayref, sort_by
(partially), escape_slash, convert_blessed, ... optional decode_json(,
allow_nonref) arg. relaxed implements allow_dupkeys.
- support all 5 unicode BOM's: UTF-8, UTF-16LE, UTF-16BE, UTF-32LE,
UTF-32BE, encoding internally to UTF-8.
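Two of the changes listed above, the boolean objects and the default handling of inf/nan, can be observed directly. The following is a small sketch that assumes Cpanel::JSON::XS is installed and left at its default settings; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new;

# the boolean objects map to the JSON literals true and false
my $bools = $json->encode([ Cpanel::JSON::XS::true(), Cpanel::JSON::XS::false() ]);

# 9**9**9 overflows to infinity; by default this is encoded as null
my $inf = $json->encode([ 9**9**9 ]);

print "$bools\n";   # [true,false]
print "$inf\n";     # [null]
```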
FUNCTIONAL INTERFACE
The following convenience functions are provided by this module. They are
exported by default:
$json_text = encode_json $perl_scalar [, $json_type]
Converts the given Perl data structure to a UTF-8 encoded, binary
string (that is, the string contains octets only). Croaks on error.
This function call is functionally identical to:
$json_text = Cpanel::JSON::XS->new->utf8->encode ($perl_scalar, $json_type)
Except being faster.
For the type argument see Cpanel::JSON::XS::Type.
$perl_scalar = decode_json $json_text [, $allow_nonref [, my $json_type ] ]
The opposite of "encode_json": expects a UTF-8 (binary) string
containing a JSON reference (object or array) and tries to parse it as
UTF-8 encoded JSON text, returning the resulting reference. Croaks on error.
This function call is functionally identical to:
$perl_scalar = Cpanel::JSON::XS->new->utf8->decode ($json_text, $json_type)
except being faster.
Note that decode_json in Cpanel::JSON::XS versions older than 3.0116,
and in JSON::XS, did not set allow_nonref but allowed non-references
anyway due to a bug in the decoder.
By default "allow_nonref" is disabled, in line with the safe-by-default
behaviour listed under "Changes to JSON::XS". If the 2nd optional
$allow_nonref argument is set to a true value, the function accepts all
JSON values: objects, arrays, strings, numbers, "null", "true", and
"false". See ""OLD" VS. "NEW" JSON (RFC 4627 VS. RFC 7159)" for why
you might not want this.
For the 3rd optional type argument see Cpanel::JSON::XS::Type.
$is_boolean = Cpanel::JSON::XS::is_bool $scalar
Returns true if the passed scalar represents either "JSON::PP::true"
or "JSON::PP::false", two constants that act like 1 and 0,
respectively and are used to represent JSON "true" and "false"
values in Perl. (Also recognizes the booleans produced by JSON::XS.)
See MAPPING, below, for more information on how JSON values are
mapped to Perl.
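The three functions described above can be exercised together. This is a hedged sketch rather than a reference: it assumes Cpanel::JSON::XS is installed, and the JSON texts and variable names are invented for the example.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS qw(encode_json decode_json);

# decode a UTF-8 encoded JSON object into a hash reference
my $data = decode_json('{"ok":true,"n":1}');

# JSON true comes back as a boolean object, a plain number 1 does not
my $ok_is_bool = Cpanel::JSON::XS::is_bool($data->{ok}) ? 1 : 0;
my $n_is_bool  = Cpanel::JSON::XS::is_bool($data->{n})  ? 1 : 0;

# a bare JSON string is accepted when the 2nd argument is true
my $str = decode_json('"hello"', 1);

# encode_json returns UTF-8 octets
my $text = encode_json([1, "a"]);

print "ok is boolean: $ok_is_bool, n is boolean: $n_is_bool\n";
print "$str $text\n";
```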
DEPRECATED FUNCTIONS
from_json
from_json has been renamed to decode_json
to_json
to_json has been renamed to encode_json
A FEW NOTES ON UNICODE AND PERL
Since this often leads to confusion, here are a few very clear words on
how Unicode works in Perl, modulo bugs.
1. Perl strings can store characters with ordinal values > 255.
This enables you to store Unicode characters as single characters in
a Perl string - very natural.
2. Perl does *not* associate an encoding with your strings.
... until you force it to, e.g. when matching it against a regex, or
printing the scalar to a file, in which case Perl either interprets
your string as locale-encoded text, octets/binary, or as Unicode,
depending on various settings. In no case is an encoding stored
together with your data, it is *use* that decides encoding, not any
magical meta data.
3. The internal utf-8 flag has no meaning with regards to the encoding
of your string.
4. A "Unicode String" is simply a string where each character can be
validly interpreted as a Unicode code point.
If you have UTF-8 encoded data, it is no longer a Unicode string,
but a Unicode string encoded in UTF-8, giving you a binary string.
5. A string containing "high" (> 255) character values is *not* a UTF-8
string.
6. Raw non-Unicode characters below U+10FFFF are allowed.
The 66 Unicode noncharacters U+FDD0..U+FDEF, and U+*FFFE, U+*FFFF
are allowed without warning, as JSON::PP does, see
<http://www.unicode.org/versions/corrigendum9.html>. But illegal
surrogate pairs fail to parse.
7. Raw non-Unicode characters above U+10FFFF are disallowed.
Raw non-Unicode characters outside the valid unicode range fail to
parse, because "A string is a sequence of zero or more Unicode
characters" RFC 7159 section 1 and "JSON text SHALL be encoded in
Unicode" RFC 7159 section 8.1. We now use the UTF8_DISALLOW_SUPER
flag when parsing unicode.
8. Lone surrogates or illegal surrogate pairs are disallowed.
Since RFC 3629, U+D800 through U+DFFF are not legal Unicode values
and their UTF-8 encodings must be treated as an invalid byte
sequence. RFC 8259 section 8.2 admits the spec allows string values
that contain bit sequences that cannot encode Unicode characters and
that the behavior of software that receives such values is
unpredictable. To avoid introducing non-Unicode strings into Perl we
use the UTF8_DISALLOW_SURROGATE flag when parsing Unicode and verify
escaped surrogates form valid pairs.
I hope this helps :)
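Points 1, 4 and 5 above can be made concrete with the core Encode module alone. The sketch below assumes nothing beyond a standard Perl installation; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# one character with an ordinal value above 255 (EURO SIGN)
my $chars = "\x{20ac}";

# encoding to UTF-8 yields a binary string of three octets
my $octets = encode("UTF-8", $chars);

printf "characters: %d, octets: %d\n", length($chars), length($octets);

# decoding the octets gives back the original Unicode string
my $again = decode("UTF-8", $octets);
print "round trip ok\n" if $again eq $chars;
```

The character string and the octet string hold the same text, but only the second one is "UTF-8 encoded data" in the sense used above.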
OBJECT-ORIENTED INTERFACE
The object oriented interface lets you configure your own encoding or
decoding style, within the limits of supported formats.
$json = new Cpanel::JSON::XS
Creates a new JSON object that can be used to de/encode JSON
strings. All boolean flags described below are by default
*disabled*.
The mutators for flags all return the JSON object again and thus
calls can be chained:
my $json = Cpanel::JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
=> {"a": [1, 2]}
$json = $json->ascii ([$enable])
$enabled = $json->get_ascii
If $enable is true (or missing), then the "encode" method will not
generate characters outside the code range 0..127 (which is ASCII).
Any Unicode characters outside that range will be escaped using
either a single "\uXXXX" (BMP characters) or a double
"\uHHHH\uLLLL" escape sequence, as per RFC4627. The resulting
encoded JSON text can be treated as a native Unicode string, an
ascii-encoded, latin1-encoded or UTF-8 encoded string, or any other
superset of ASCII.
If $enable is false, then the "encode" method will not escape
Unicode characters unless required by the JSON syntax or other
flags. This results in a faster and more compact format.
See also the section *ENCODING/CODESET FLAG NOTES* later in this
document.
The main use for this flag is to produce JSON texts that can be
transmitted over a 7-bit channel, as the encoded JSON texts will not
contain any 8 bit characters.
Cpanel::JSON::XS->new->ascii (1)->encode ([chr 0x10401])
=> ["\ud801\udc01"]
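The double escape in the example above follows the usual UTF-16 surrogate pair arithmetic. The following sketch recomputes it by hand in plain Perl; it illustrates the arithmetic only and is not taken from the module's implementation.

```perl
use strict;
use warnings;

my $codepoint = 0x10401;

# subtract 0x10000, then split the remaining 20 bits into two halves
my $offset = $codepoint - 0x10000;
my $high   = 0xD800 + ($offset >> 10);     # upper 10 bits
my $low    = 0xDC00 + ($offset & 0x3FF);   # lower 10 bits

my $escaped = sprintf '\u%04x\u%04x', $high, $low;
print "$escaped\n";   # \ud801\udc01
```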
$json = $json->latin1 ([$enable])
$enabled = $json->get_latin1
If $enable is true (or missing), then the "encode" method will
encode the resulting JSON text as latin1 (or ISO-8859-1), escaping
any characters outside the code range 0..255. The resulting string
can be treated as a latin1-encoded JSON text or a native Unicode
string. The "decode" method will not be affected in any way by this
flag, as "decode" by default expects Unicode, which is a strict
superset of latin1.
If $enable is false, then the "encode" method will not escape
Unicode characters unless required by the JSON syntax or other
flags.
See also the section *ENCODING/CODESET FLAG NOTES* later in this
document.
The main use for this flag is efficiently encoding binary data as
JSON text, as most octets will not be escaped, resulting in a
smaller encoded size. The disadvantage is that the resulting JSON
text is encoded in latin1 (and must correctly be treated as such
when storing and transferring), a rare encoding for JSON. It is
therefore most useful when you want to store data structures known
to contain binary data efficiently in files or databases, not when
talking to other JSON encoders/decoders.
Cpanel::JSON::XS->new->latin1->encode (["\x{89}\x{abc}"])
=> ["\x{89}\\u0abc"] # (perl syntax, U+abc escaped, U+89 not)
$json = $json->binary ([$enable])
$enabled = $json->get_binary
If the $enable argument is true (or missing), then the "encode"
method will not try to detect a UTF-8 encoding in any JSON string,
it will strictly interpret it as byte sequence. The result might
contain new "\xNN" sequences, which is unparsable JSON. The "decode"
method forbids "\uNNNN" sequences and accepts "\xNN" and octal
"\NNN" sequences.
There is also special logic for perl 5.6 and utf8. 5.6 encodes any
string to utf-8 automatically when seeing a codepoint >= 0x80 and <
0x100. With the binary flag enabled, the encoder decodes the perl
utf8-encoded string back to the original byte encoding and encodes this
with "\xNN" escapes. This results in the same encodings as with newer
perls. But note that binary multi-byte codepoints with 5.6 will result in
"illegal unicode character in binary string" errors, unlike with
newer perls.
If $enable is false, then the "encode" method will smartly try to
detect Unicode characters and escape them only when required by the JSON
syntax or other flags, and hex and octal sequences are forbidden when
decoding.
See also the section *ENCODING/CODESET FLAG NOTES* later in this
document.
The main use for this flag is to avoid the smart unicode detection
and possible double encoding. The disadvantage is that the resulting
JSON text is encoded in new "\xNN" and in latin1 characters and must
correctly be treated as such when storing and transferring, a rare
encoding for JSON. It will produce non-readable JSON strings in the
browser. It is therefore most useful when you want to store data
structures known to contain binary data efficiently in files or
databases, not when talking to other JSON encoders/decoders. The
binary decoding method can also be used when an encoder produced a
non-JSON conformant hex or octal encoding "\xNN" or "\NNN".
Cpanel::JSON::XS->new->binary->encode (["\x{89}\x{abc}"])
5.6: Error: malformed or illegal unicode character in binary string
>=5.8: ['\x89\xe0\xaa\xbc']
Cpanel::JSON::XS->new->binary->encode (["\x{89}\x{bc}"])
=> ["\x89\xbc"]
Cpanel::JSON::XS->new->binary->decode ('["\x89\ua001"]')
Error: malformed or illegal unicode character in binary string
Cpanel::JSON::XS->new->decode ('["\x89"]')
Error: illegal hex character in non-binary string
$json = $json->utf8 ([$enable])
$enabled = $json->get_utf8
If $enable is true (or missing), then the "encode" method will
encode the JSON result into UTF-8, as required by many protocols,
while the "decode" method expects to be handed a UTF-8-encoded
string. Please note that UTF-8-encoded strings do not contain any
characters outside the range 0..255, they are thus useful for
bytewise/binary I/O. In future versions, enabling this option might
enable autodetection of the UTF-16 and UTF-32 encoding families, as
described in RFC4627.
If $enable is false, then the "encode" method will return the JSON
string as a (non-encoded) Unicode string, while "decode" thus expects
a Unicode string. Any decoding or encoding (e.g. to UTF-8 or
UTF-16) needs to be done yourself, e.g. using the Encode module.
See also the section *ENCODING/CODESET FLAG NOTES* later in this
document.
Example, output UTF-16BE-encoded JSON:
use Encode;
$jsontext = encode "UTF-16BE", Cpanel::JSON::XS->new->encode ($object);
Example, decode UTF-32LE-encoded JSON:
use Encode;
$object = Cpanel::JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
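The difference between the two settings shows up in the length of the result. This is a minimal sketch, assuming Cpanel::JSON::XS is installed; the data and variable names are made up for the example.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $data = ["\x{20ac}"];   # a single EURO SIGN character

# with utf8: the result is UTF-8 octets, ready for binary I/O
my $octets = Cpanel::JSON::XS->new->utf8->encode($data);

# without utf8: the result is a Unicode (character) string
my $chars = Cpanel::JSON::XS->new->encode($data);

printf "octets: %d, characters: %d\n", length($octets), length($chars);

# each form decodes with the matching setting
my $from_octets = Cpanel::JSON::XS->new->utf8->decode($octets);
my $from_chars  = Cpanel::JSON::XS->new->decode($chars);
```

Both decoded structures contain the same one-character string; mixing the settings (for instance decoding the octets without "utf8") is what typically leads to double-encoded text.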
$json = $json->pretty ([$enable])
This enables (or disables) all of the "indent", "space_before" and
"space_after" (and in the future possibly more) flags in one call to
generate the most readable (or most compact) form possible.
Example, pretty-print some simple structure:
my $json = Cpanel::JSON::XS->new->pretty(1)->encode ({a => [1,2]})
=>
{
"a" : [
1,
2
]
}
$json = $json->indent ([$enable])
$enabled = $json->get_indent
If $enable is true (or missing), then the "encode" method will use a
multiline format as output, putting every array member or
object/hash key-value pair into its own line, indenting them
properly.
If $enable is false, no newlines or indenting will be produced, and
the resulting JSON text is guaranteed not to contain any "newlines".
This setting has no effect when decoding JSON texts.
for (;;) {
sysread $fh, my $buf, 65536
or die "read error: $!";
$json->incr_parse ($buf); # void context, so no parsing
# Exit the loop once we found and removed(!) the initial "[".
# In essence, we are (ab-)using the $json object as a simple scalar
# we append data to.
last if $json->incr_text =~ s/^ \s* \[ //x;
}
# now we have skipped the initial "[", so continue
# parsing all the elements.
for (;;) {
# in this loop we read data until we got a single JSON object
for (;;) {
if (my $obj = $json->incr_parse) {
# do something with $obj
last;
}
# add more data
sysread $fh, my $buf, 65536
or die "read error: $!";
$json->incr_parse ($buf); # void context, so no parsing
}
# in this loop we read data until we either found and parsed the
# separating "," between elements, or the final "]"
for (;;) {
# first skip whitespace
$json->incr_text =~ s/^\s*//;
# if we find "]", we are done
if ($json->incr_text =~ s/^\]//) {
print "finished.\n";
exit;
}
# if we find ",", we can continue with the next element
if ($json->incr_text =~ s/^,//) {
last;
}
# if we find anything else, we have a parse error!
if (length $json->incr_text) {
die "parse error near ", $json->incr_text;
}
# else add more data
sysread $fh, my $buf, 65536
or die "read error: $!";
$json->incr_parse ($buf); # void context, so no parsing
}
}
This is a complex example, but most of the complexity comes from the
fact that we are trying to be correct (bear with me if I am wrong, I
never ran the above example :).
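A much smaller variant of the same idea can be run without a file handle. The sketch below feeds hand-made chunks to "incr_parse" and collects each object as soon as it is complete; it assumes Cpanel::JSON::XS is installed, and the chunk boundaries and names are invented for illustration.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new;

# two concatenated JSON objects, split at arbitrary positions
my @chunks = ('{"a":', '1}{"b"', ':2}');
my @objects;

for my $chunk (@chunks) {
    $json->incr_parse($chunk);   # void context: only appends the data

    # scalar context: returns one complete object, or undef if none yet
    while (defined(my $obj = $json->incr_parse)) {
        push @objects, $obj;
    }
}

printf "parsed %d objects\n", scalar @objects;
```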
BOM
Detects all Unicode Byte Order Marks on decode: UTF-8, UTF-16LE,
UTF-16BE, UTF-32LE and UTF-32BE.
The BOM encoding is set only for one specific decode call, it does not
change the state of the JSON object.
Warning: With perls older than 5.20 you need to load the Encode module
before loading a multibyte BOM, i.e. >= UTF-16. Otherwise an error is
thrown. This is an implementation limitation and might get fixed later.
See <https://tools.ietf.org/html/rfc7159#section-8.1> *"JSON text SHALL
be encoded in UTF-8, UTF-16, or UTF-32."*
*"Implementations MUST NOT add a byte order mark to the beginning of a
JSON text", "implementations (...) MAY ignore the presence of a byte
order mark rather than treating it as an error".*
See also <http://www.unicode.org/faq/utf_bom.html#BOM>.
Beware that Cpanel::JSON::XS is currently the only JSON module which
does accept and decode a BOM.
The latest JSON spec
<https://www.greenbytes.de/tech/webdav/rfc8259.html#character.encoding>
forbids the usage of UTF-16 or UTF-32; the character encoding is UTF-8.
Thus in subsequent updates, BOMs of UTF-16 or UTF-32 will throw an
error.
MAPPING
This section describes how Cpanel::JSON::XS maps Perl values to JSON
values and vice versa. These mappings are designed to "do the right
thing" in most circumstances automatically, preserving round-tripping
characteristics (what you put in comes out as something equivalent).
For the more enlightened: note that in the following descriptions,
lowercase *perl* refers to the Perl interpreter, while uppercase *Perl*
refers to the abstract Perl language itself.
JSON -> PERL
object
A JSON object becomes a reference to a hash in Perl. No ordering of
object keys is preserved (JSON does not preserve object key ordering
itself).
array
A JSON array becomes a reference to an array in Perl.
string
A JSON string becomes a string scalar in Perl - Unicode codepoints
in JSON are represented by the same codepoints in the Perl string,
so no manual decoding is necessary.
number
A JSON number becomes either an integer, numeric (floating point) or
string scalar in perl, depending on its range and any fractional
parts. On the Perl level, there is no difference between those as
Perl handles all the conversion details, but an integer may take
slightly less memory and might represent more values exactly than
floating point numbers.
If the number consists of digits only, Cpanel::JSON::XS will try to
represent it as an integer value. If that fails, it will try to
represent it as a numeric (floating point) value if that is possible
without loss of precision. Otherwise it will preserve the number as
a string value (in which case you lose roundtripping ability, as the
JSON number will be re-encoded to a JSON string).
Numbers containing a fractional or exponential part will always be
represented as numeric (floating point) values, possibly at a loss
of precision (in which case you might lose perfect roundtripping
ability, but the JSON number will still be re-encoded as a JSON
number).
Note that precision is not accuracy - binary floating point values
cannot represent most decimal fractions exactly, and when converting
from and to floating point, "Cpanel::JSON::XS" only guarantees
precision up to but not including the least significant bit.
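The JSON to Perl mappings described so far can be inspected with "ref" after decoding. This is a minimal sketch, assuming Cpanel::JSON::XS is installed; the JSON text is made up, and "canonical" is used only so that the re-encoded text has a predictable key order.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new->canonical;
my $data = $json->decode('{"list":[1,2.5],"name":"x"}');

# object -> hash reference, array -> array reference
print ref($data), " ", ref($data->{list}), "\n";   # HASH ARRAY

# numbers and strings keep their JSON type when encoded again
my $again = $json->encode($data);
print "$again\n";   # {"list":[1,2.5],"name":"x"}
```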
JSON syntax is based on how literals are represented in javascript (the
not-standardized predecessor of ECMAscript) which is presumably why it
is called "JavaScript Object Notation".
However, JSON is not a subset (and also not a superset of course) of
ECMAscript (the standard) or javascript (whatever browsers actually
implement).
If you want to use javascript's "eval" function to "parse" JSON, you
might run into parse errors for valid JSON texts, or the resulting data
structure might not be queryable:
One of the problems is that U+2028 and U+2029 are valid characters
inside JSON strings, but are not allowed in ECMAscript string literals,
so the following Perl fragment will not output something that can be
guaranteed to be parsable by javascript's "eval":
use Cpanel::JSON::XS;
print encode_json [chr 0x2028];
The right fix for this is to use a proper JSON parser in your javascript
programs, and not rely on "eval" (see for example Douglas Crockford's
json2.js parser).
If this is not an option, you can, as a stop-gap measure, simply encode
to ASCII-only JSON:
use Cpanel::JSON::XS;
print Cpanel::JSON::XS->new->ascii->encode ([chr 0x2028]);
Note that this will enlarge the resulting JSON text quite a bit if you
have many non-ASCII characters. You might be tempted to run some regexes
to only escape U+2028 and U+2029, e.g.:
# DO NOT USE THIS!
my $json = Cpanel::JSON::XS->new->utf8->encode ([chr 0x2028]);
$json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
$json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
print $json;
Note that *this is a bad idea*: the above only works for U+2028 and
U+2029 and thus only for fully ECMAscript-compliant parsers. Many
existing javascript implementations, however, have issues with other
characters as well - using "eval" naively simply *will* cause problems.
Another problem is that some javascript implementations reserve some
property names for their own purposes (which probably makes them
non-ECMAscript-compliant). For example, Iceweasel reserves the
"__proto__" property name for its own purposes.
If that is a problem, you could try to filter the resulting JSON
output for these property strings, e.g.:
$json =~ s/"__proto__"\s*:/"__proto__renamed":/g;
This works because "__proto__" is not valid outside of strings, so every
occurrence of ""__proto__"\s*:" must be a string used as property name.
Raw non-Unicode characters outside the valid unicode range now fail to
parse, because "A string is a sequence of zero or more Unicode
characters" RFC 7159 section 1 and "JSON text SHALL be encoded in
Unicode" RFC 7159 section 8.1. We now use the UTF8_DISALLOW_SUPER flag
when parsing unicode.
Since RFC 3629, U+D800 through U+DFFF are not legal Unicode values and
their UTF-8 encodings must be treated as an invalid byte sequence. RFC
8259 section 8.2 admits the spec allows string values that contain bit
sequences that cannot encode Unicode characters and that the behavior of
software that receives such values is unpredictable. To avoid
introducing non-Unicode strings into Perl we use the
UTF8_DISALLOW_SURROGATE flag when parsing Unicode and verify escaped
surrogates form valid pairs.
If you know of other incompatibilities, please let me know.
JSON and YAML
You often hear that JSON is a subset of YAML. However, *in general,
there is no way to configure JSON::XS to output a data structure as valid
YAML* that
works in all cases. If you really must use Cpanel::JSON::XS to generate
YAML, you should use this algorithm (subject to change in future
versions):
my $to_yaml = Cpanel::JSON::XS->new->utf8->space_after (1);
my $yaml = $to_yaml->encode ($ref) . "\n";
This will *usually* generate JSON texts that also parse as valid YAML.
SPEED
It seems that JSON::XS is surprisingly fast, as shown in the following
tables. They have been generated with the help of the "eg/bench" program
in the JSON::XS distribution, to make it easy to compare on your own
system.
Together with Data::MessagePack and Sereal, JSON::XS is one of the fastest
serializers, because JSON and JSON::XS do not support backrefs (no graph
structures), only trees. Storable supports backrefs, i.e. graphs.
Data::MessagePack encodes its data as binary (like Storable) and supports
only a very simple subset of JSON.
First comes a comparison between various modules using a very short
single-line JSON string (also available at
<http://dist.schmorp.de/misc/json/short.json>).
{"method": "handleMessage", "params": ["user1",
"we were just talking"], "id": null, "array":[1,11,234,-5,1e5,1e7,
1, 0]}
It shows the number of encodes/decodes per second (JSON::XS uses the
functional interface, while Cpanel::JSON::XS/2 uses the OO interface
with pretty-printing and hash key sorting enabled, Cpanel::JSON::XS/3
enables shrink. JSON::DWIW/DS uses the deserialize function, while
JSON::DWIW::FJ uses the from_json method). Higher is better:
module        | encode     | decode     |
--------------|------------|------------|
JSON::DWIW/DS |  86302.551 | 102300.098 |
JSON::DWIW/FJ |  86302.551 |  75983.768 |
JSON::PP      |  15827.562 |   6638.658 |
JSON::Syck    |  63358.066 |  47662.545 |
JSON::XS      | 511500.488 | 511500.488 |
JSON::XS/2    | 291271.111 | 388361.481 |
JSON::XS/3    | 361577.931 | 361577.931 |
Storable      |  66788.280 | 265462.278 |