Cpanel-JSON-XS
package Cpanel::JSON::XS;
our $VERSION = '4.42';
our $XS_VERSION = $VERSION;
# $VERSION = eval $VERSION;
=pod
=head1 NAME
Cpanel::JSON::XS - cPanel fork of JSON::XS, fast and correct serializing
=head1 SYNOPSIS
use Cpanel::JSON::XS;
# exported functions, they croak on error
# and expect/generate UTF-8
$utf8_encoded_json_text = encode_json $perl_hash_or_arrayref;
$perl_hash_or_arrayref = decode_json $utf8_encoded_json_text;
# OO-interface
$coder = Cpanel::JSON::XS->new->ascii->pretty->allow_nonref;
$pretty_printed_unencoded = $coder->encode ($perl_scalar);
$perl_scalar = $coder->decode ($unicode_json_text);
# Note that 5.6 misses most smart utf8 and encoding functionalities
# of newer releases.
# Note that L<JSON::MaybeXS> will automatically use Cpanel::JSON::XS
# if available, at virtually no speed overhead either, so you should
# be able to just:
use JSON::MaybeXS;
# and do the same things, except that you have a pure-perl fallback now.
Note that this module will soon be replaced by a new JSON::Safe module,
with the same API but with guaranteed safe defaults.
=head1 DESCRIPTION
This module converts Perl data structures to JSON and vice versa. Its
primary goal is to be I<correct> and its secondary goal is to be
I<fast>. To reach the latter goal it was written in C.
As this is the n-th-something JSON module on CPAN, what was the reason
to write yet another JSON module? While it seems there are many JSON
modules, none of them correctly handle all corner cases, and in most cases
their maintainers are unresponsive, gone missing, or not listening to bug
reports for other reasons.
See below for the cPanel fork.
See MAPPING, below, on how Cpanel::JSON::XS maps perl values to JSON
values and vice versa.
=head2 FEATURES
=over 4
=item * correct Unicode handling
This module knows how to handle Unicode with Perl versions higher than 5.8.5,
documents how and when it does so, and even documents what "correct" means.
=item * round-trip integrity
When you serialize a perl data structure using only data types supported
by JSON and Perl, the deserialized data structure is identical on the Perl
level. (e.g. the string "2.0" doesn't suddenly become "2" just because
it looks like a number). There I<are> minor exceptions to this, read the
MAPPING section below to learn about those.
=item * strict checking of JSON correctness
There is no guessing, no generating of illegal JSON texts by default,
and only JSON is accepted as input by default. The latter is a security
feature.
=item * fast
Compared to other JSON modules and other serializers such as Storable,
this module usually compares favourably in terms of speed, too.
=item * simple to use
This module has both a simple functional interface as well as an object
oriented interface.
=item * reasonably versatile output formats
You can choose between the most compact guaranteed-single-line format
possible (nice for simple line-based protocols), a pure-ASCII format
(for when your transport is not 8-bit clean, still supports the whole
Unicode range), or a pretty-printed format (for when you want to read that
stuff). Or you can combine those features in whatever way you like.
=back
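The round-trip claim above can be illustrated with a minimal sketch. The hash contents are made up for illustration, and C<canonical> is enabled only so that the key order in the output is predictable:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# Illustrative data: a string that merely looks like a number,
# next to a genuine integer.
my $data = { version => "2.0", count => 3 };

my $coder = Cpanel::JSON::XS->new->canonical;
my $json  = $coder->encode($data);
my $back  = $coder->decode($json);

# The string stays quoted in the JSON text, the integer does not.
print "$json\n";
print "string survived the round trip\n" if $back->{version} eq "2.0";
```

The string "2.0" is expected to stay a JSON string rather than being turned into the number 2, while the integer is emitted unquoted.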
=head2 cPanel fork
Since the original author MLEHMANN has no public
bugtracker, this cPanel fork now lives on GitHub.
src repo: L<https://github.com/rurban/Cpanel-JSON-XS>
original: L<http://cvs.schmorp.de/JSON-XS/>
RT: L<https://github.com/rurban/Cpanel-JSON-XS/issues>
or L<https://rt.cpan.org/Public/Dist/Display.html?Queue=Cpanel-JSON-XS>
B<Changes to JSON::XS>
- bare hashkeys are now checked for utf8. (GH #209)
- stricter decode_json() as documented. non-refs are disallowed.
safe by default.
added a 2nd optional argument. decode() honors now allow_nonref.
- fixed encode of numbers for dual-vars. Different string
representations are preserved, but numbers with temporary strings
which represent the same number are here treated as numbers, not
strings. Cpanel::JSON::XS is a bit slower, but preserves numeric
types better.
- numbers ending with .0 stay numbers; they are not converted to
integers. [#63] dual-vars which are represented as number not
integer (42+"bar" != 5.8.9) are now encoded as number (=> 42.0)
because internally it's now a NOK type. However !!1 which is
wrongly encoded in 5.8 as "1"/1.0 is still represented as integer.
- different handling of inf/nan. Default now to null, optionally with
stringify_infnan() to "inf"/"nan". [#28, #32]
- added C<binary> extension, non-JSON and non JSON parsable, allows
C<\xNN> and C<\NNN> sequences.
- 5.6.2 support; sacrificing some utf8 features (assuming bytes
all-over), no multi-byte unicode characters with 5.6.
- interop for true/false overloading. JSON::XS, JSON::PP and Mojo::JSON
representations for booleans are accepted and JSON::XS accepts
Cpanel::JSON::XS booleans [#13, #37]
Fixed overloading of booleans. Cpanel::JSON::XS::true stringifies again
to "1", not "true", analogous to all other JSON modules.
- native boolean mapping of yes and no to true and false, as in YAML::XS.
In perl C<!0> is yes, C<!1> is no.
The JSON value true maps to 1, false maps to 0. [#39]
- support arbitrary stringification with encode, with convert_blessed
and allow_blessed.
- ithread support. Cpanel::JSON::XS is thread-safe, JSON::XS not
- is_bool can be called as method, JSON::XS::is_bool not.
- performance optimizations for threaded Perls
- relaxed mode, allowing many popular extensions
- protect our magic object from corruption by wrong or missing external
methods, like FREEZE/THAW or serialization with other methods.
- additional fixes for:
- #208 - no security-relevant out-of-bounds reading of module memory
when decoding hash keys without ending ':'
- [cpan #88061] AIX atof without USE_LONG_DOUBLE
- #10 unshare_hek crash
- #7, #29 avoid re-blessing where possible. It fails in JSON::XS for
READONLY values, i.e. restricted hashes.
- #41 overloading of booleans, use the object not the reference.
- #62 -Dusequadmath conversion and no SEGV.
- #72 parsing of values followed by \0, like 1\0, now fails.
- #72 parsing of illegal unicode or non-unicode characters.
- #96 locale-insensitive numeric conversion.
- #154 numeric conversion fixed since 5.22, using the same strtold as perl5.
- #167 sort tied hashes with canonical.
- #212 fix utf8 object stringification
- public maintenance and bugtracker
- use ppport.h, sanify XS.xs comment styles, harness C coding style
- common::sense is optional. When available it is not used in the
published production module, just during development and testing.
- extended testsuite, passes all http://seriot.ch/projects/parsing_json.html
tests. In fact it is the only known JSON decoder which does so,
while also being the fastest.
- support many more options and methods from JSON::PP:
stringify_infnan, allow_unknown, allow_stringify, allow_barekey,
encode_stringify, allow_bignum, allow_singlequote, dupkeys_as_arrayref,
sort_by (partially), escape_slash, convert_blessed, ...
optional decode_json(, allow_nonref) arg.
relaxed implements allow_dupkeys.
- support all 5 unicode L<BOMs|/BOM>: UTF-8, UTF-16LE, UTF-16BE, UTF-32LE,
UTF-32BE, encoding internally to UTF-8.
=cut
our @ISA = qw(Exporter);
our @EXPORT = qw(encode_json decode_json to_json from_json);
sub to_json($@) {
if ($] >= 5.008) {
require Carp;
Carp::croak ("Cpanel::JSON::XS::to_json has been renamed to encode_json,".
" either downgrade to pre-2.0 versions of Cpanel::JSON::XS or".
" rename the call");
} else {
_to_json(@_);
}
}
sub from_json($@) {
if ($] >= 5.008) {
require Carp;
Carp::croak ("Cpanel::JSON::XS::from_json has been renamed to decode_json,".
" either downgrade to pre-2.0 versions of Cpanel::JSON::XS or".
" rename the call");
} else {
_from_json(@_);
}
}
use Exporter;
use XSLoader;
=head1 FUNCTIONAL INTERFACE
The following convenience methods are provided by this module. They are
exported by default:
=over 4
=item $json_text = encode_json $perl_scalar, [json_type]
Converts the given Perl data structure to a UTF-8 encoded, binary string
(that is, the string contains octets only). Croaks on error.
This function call is functionally identical to:
$json_text = Cpanel::JSON::XS->new->utf8->encode ($perl_scalar, $json_type)
except that it is faster.
For the type argument see L<Cpanel::JSON::XS::Type>.
=item $perl_scalar = decode_json $json_text [, $allow_nonref [, my $json_type ] ]
The opposite of C<encode_json>: expects a UTF-8 (binary) string holding
a JSON reference (an array or object) and tries to parse that as a UTF-8
encoded JSON text, returning the resulting reference. Croaks on error.
This function call is functionally identical to:
respectively and are used to represent JSON C<true> and C<false>
values in Perl. (Also recognizes the booleans produced by L<JSON::XS>.)
See MAPPING, below, for more information on how JSON values are mapped
to Perl.
=back
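As a minimal sketch of the functional interface described above (the sample data and the deliberately broken input are made up for illustration), C<encode_json> returns UTF-8 octets, C<decode_json> takes them back, and errors can be trapped with C<eval> because both functions croak:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS qw(encode_json decode_json);

# encode_json returns UTF-8 encoded octets, not a character string.
my $bytes = encode_json({ name => "caf\x{e9}" });

# decode_json expects UTF-8 octets and returns a reference.
my $data = decode_json($bytes);

# Both functions croak on error, so wrap untrusted input in eval.
my $ok = eval { decode_json('{"broken":'); 1 };
print "decode failed: $@" unless $ok;
```

The character U+00E9 occupies two octets in C<$bytes>, but comes back as a single character in C<< $data->{name} >>.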
=head1 DEPRECATED FUNCTIONS
=over
=item from_json
from_json has been renamed to decode_json
=item to_json
to_json has been renamed to encode_json
=back
=head1 A FEW NOTES ON UNICODE AND PERL
Since this often leads to confusion, here are a few very clear words on
how Unicode works in Perl, modulo bugs.
=over 4
=item 1. Perl strings can store characters with ordinal values > 255.
This enables you to store Unicode characters as single characters in a
Perl string - very natural.
=item 2. Perl does I<not> associate an encoding with your strings.
... until you force it to, e.g. when matching it against a regex, or
printing the scalar to a file, in which case Perl either interprets
your string as locale-encoded text, octets/binary, or as Unicode,
depending on various settings. In no case is an encoding stored
together with your data, it is I<use> that decides encoding, not any
magical meta data.
=item 3. The internal utf-8 flag has no meaning with regards to the
encoding of your string.
=item 4. A "Unicode String" is simply a string where each character
can be validly interpreted as a Unicode code point.
If you have UTF-8 encoded data, it is no longer a Unicode string, but
a Unicode string encoded in UTF-8, giving you a binary string.
=item 5. A string containing "high" (> 255) character values is I<not>
a UTF-8 string.
=item 6. Raw non-Unicode characters below U+10FFFF are allowed.
The 66 Unicode noncharacters U+FDD0..U+FDEF, and U+*FFFE, U+*FFFF are
allowed without warning, as JSON::PP does, see
L<http://www.unicode.org/versions/corrigendum9.html>. But illegal
surrogate pairs fail to parse.
=item 7. Raw non-Unicode characters above U+10FFFF are disallowed.
Raw non-Unicode characters outside the valid unicode range fail to
parse, because "A string is a sequence of zero or more Unicode
characters" (RFC 7159 section 1) and "JSON text SHALL be encoded in
Unicode" (RFC 7159 section 8.1). We now use the UTF8_DISALLOW_SUPER
flag when parsing unicode.
=item 8. Lone surrogates or illegal surrogate pairs are disallowed.
Since RFC 3629, U+D800 through U+DFFF are not legal Unicode values and
their UTF-8 encodings must be treated as an invalid byte sequence.
RFC 8259 section 8.2 admits the spec allows string values that contain
bit sequences that cannot encode Unicode characters and that the
behavior of software that receives such values is unpredictable. To
avoid introducing non-Unicode strings into Perl we use the
UTF8_DISALLOW_SURROGATE flag when parsing Unicode and verify escaped
surrogates form valid pairs.
=back
I hope this helps :)
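The difference between a Unicode string and its UTF-8 encoded form (points 1, 4 and 5 above) can be sketched with the core Encode module alone. The sample character is arbitrary:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One character with an ordinal value > 255: a Unicode string.
my $chars = "\x{263A}";

# Encoding yields a binary string of octets, no longer a Unicode string.
my $bytes = encode("UTF-8", $chars);

printf "characters: %d, octets: %d\n", length($chars), length($bytes);

# Decoding the octets recovers the original Unicode string.
my $again = decode("UTF-8", $bytes);
print "same string after decoding\n" if $again eq $chars;
```

It is the explicit C<encode> and C<decode> calls that decide the encoding here, not any property stored with the scalar.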
=head1 OBJECT-ORIENTED INTERFACE
The object oriented interface lets you configure your own encoding or
decoding style, within the limits of supported formats.
=over 4
=item $json = new Cpanel::JSON::XS
Creates a new JSON object that can be used to de/encode JSON
strings. All boolean flags described below are by default I<disabled>.
The mutators for flags all return the JSON object again and thus calls can
be chained:
my $json = Cpanel::JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
=> {"a": [1, 2]}
=item $json = $json->ascii ([$enable])
=item $enabled = $json->get_ascii
If C<$enable> is true (or missing), then the C<encode> method will not
generate characters outside the code range C<0..127> (which is ASCII). Any
Unicode characters outside that range will be escaped using either a
single C<\uXXXX> (BMP characters) or a double C<\uHHHH\uLLLL> escape sequence,
as per RFC4627. The resulting encoded JSON text can be treated as a native
Unicode string, an ascii-encoded, latin1-encoded or UTF-8 encoded string,
or any other superset of ASCII.
If C<$enable> is false, then the C<encode> method will not escape Unicode
characters unless required by the JSON syntax or other flags. This results
in a faster and more compact format.
See also the section I<ENCODING/CODESET FLAG NOTES> later in this
document.
The main use for this flag is to produce JSON texts that can be
transmitted over a 7-bit channel, as the encoded JSON texts will not
contain any 8 bit characters.
Cpanel::JSON::XS->new->ascii (1)->encode ([chr 0x10401])
=> ["\ud801\udc01"]
=item $json = $json->latin1 ([$enable])
=item $enabled = $json->get_latin1
If C<$enable> is true (or missing), then the C<encode> method will encode
the resulting JSON text as latin1 (or ISO-8859-1), escaping any characters
outside the code range C<0..255>. The resulting string can be treated as a
latin1-encoded JSON text or a native Unicode string. The C<decode> method
will not be affected in any way by this flag, as C<decode> by default
expects Unicode, which is a strict superset of latin1.
If C<$enable> is false, then the C<encode> method will not escape Unicode
characters unless required by the JSON syntax or other flags.
See also the section I<ENCODING/CODESET FLAG NOTES> later in this
document.
The main use for this flag is efficiently encoding binary data as JSON
text, as most octets will not be escaped, resulting in a smaller encoded
size. The disadvantage is that the resulting JSON text is encoded
in latin1 (and must correctly be treated as such when storing and
transferring), a rare encoding for JSON. It is therefore most useful when
you want to store data structures known to contain binary data efficiently
in files or databases, not when talking to other JSON encoders/decoders.
Cpanel::JSON::XS->new->latin1->encode (["\x{89}\x{abc}"])
=> ["\x{89}\\u0abc"] # (perl syntax, U+abc escaped, U+89 not)
=item $json = $json->binary ([$enable])
=item $enabled = $json->get_binary
If the C<$enable> argument is true (or missing), then the C<encode>
method will not try to detect a UTF-8 encoding in any JSON string; it
will strictly interpret it as a byte sequence. The result might contain
new C<\xNN> sequences, which is B<unparsable JSON>. The C<decode>
method forbids C<\uNNNN> sequences and accepts C<\xNN> and octal
C<\NNN> sequences.
There is also special logic for perl 5.6 and utf8. 5.6 encodes any
string to utf-8 automatically when seeing a codepoint >= C<0x80> and
< C<0x100>. With the binary flag enabled, the perl utf8-encoded
string is decoded to the original byte encoding and then encoded with
C<\xNN> escapes. This results in the same encodings as with newer
perls. But note that binary multi-byte codepoints with 5.6 will
result in C<illegal unicode character in binary string> errors,
unlike with newer perls.
If C<$enable> is false, then the C<encode> method will smartly try to
detect Unicode characters, unless the JSON syntax or other flags
require otherwise, and hex and octal sequences are forbidden.
See also the section I<ENCODING/CODESET FLAG NOTES> later in this
document.
The main use for this flag is to avoid the smart unicode detection and
possible double encoding. The disadvantage is that the resulting JSON
text is encoded in new C<\xNN> and in latin1 characters and must
correctly be treated as such when storing and transferring, a rare
encoding for JSON. It will produce non-readable JSON strings in the
browser. It is therefore most useful when you want to store data
structures known to contain binary data efficiently in files or
databases, not when talking to other JSON encoders/decoders. The
binary decoding method can also be used when an encoder produced a
non-JSON conformant hex or octal encoding C<\xNN> or C<\NNN>.
Cpanel::JSON::XS->new->binary->encode (["\x{89}\x{abc}"])
5.6: Error: malformed or illegal unicode character in binary string
>=5.8: ['\x89\xe0\xaa\xbc']
Cpanel::JSON::XS->new->binary->encode (["\x{89}\x{bc}"])
=> ["\x89\xbc"]
Cpanel::JSON::XS->new->binary->decode ('["\x89\ua001"]')
Error: malformed or illegal unicode character in binary string
Cpanel::JSON::XS->new->decode ('["\x89"]')
Error: illegal hex character in non-binary string
=item $json = $json->utf8 ([$enable])
=item $enabled = $json->get_utf8
If C<$enable> is true (or missing), then the C<encode> method will encode
the JSON result into UTF-8, as required by many protocols, while the
C<decode> method expects to be handed a UTF-8-encoded string. Please
note that UTF-8-encoded strings do not contain any characters outside the
range C<0..255>, they are thus useful for bytewise/binary I/O. In future
versions, enabling this option might enable autodetection of the UTF-16
and UTF-32 encoding families, as described in RFC4627.
If C<$enable> is false, then the C<encode> method will return the JSON
string as a (non-encoded) Unicode string, while C<decode> thus expects a
Unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs
to be done yourself, e.g. using the Encode module.
See also the section I<ENCODING/CODESET FLAG NOTES> later in this
document.
Example, output UTF-16BE-encoded JSON:
use Encode;
$jsontext = encode "UTF-16BE", Cpanel::JSON::XS->new->encode ($object);
Example, decode UTF-32LE-encoded JSON:
use Encode;
$object = Cpanel::JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
=item $json = $json->pretty ([$enable])
This enables (or disables) all of the C<indent>, C<space_before> and
C<space_after> (and in the future possibly more) flags in one call to
generate the most readable (or most compact) form possible.
Example, pretty-print some simple structure:
my $json = Cpanel::JSON::XS->new->pretty(1)->encode ({a => [1,2]})
=>
{
"a" : [
1,
2
]
}
=item $json = $json->indent ([$enable])
=item $enabled = $json->get_indent
If C<$enable> is true (or missing), then the C<encode> method will use
a multiline format as output, putting every array member or
object/hash key-value pair into its own line, indenting them properly.
sysread $fh, my $buf, 65536
or die "read error: $!";
$json->incr_parse ($buf); # void context, so no parsing
# Exit the loop once we found and removed(!) the initial "[".
# In essence, we are (ab-)using the $json object as a simple scalar
# we append data to.
last if $json->incr_text =~ s/^ \s* \[ //x;
}
# now we have skipped the initial "[", so continue
# parsing all the elements.
for (;;) {
# in this loop we read data until we got a single JSON object
for (;;) {
if (my $obj = $json->incr_parse) {
# do something with $obj
last;
}
# add more data
sysread $fh, my $buf, 65536
or die "read error: $!";
$json->incr_parse ($buf); # void context, so no parsing
}
# in this loop we read data until we either found and parsed the
# separating "," between elements, or the final "]"
for (;;) {
# first skip whitespace
$json->incr_text =~ s/^\s*//;
# if we find "]", we are done
if ($json->incr_text =~ s/^\]//) {
print "finished.\n";
exit;
}
# if we find ",", we can continue with the next element
if ($json->incr_text =~ s/^,//) {
last;
}
# if we find anything else, we have a parse error!
if (length $json->incr_text) {
die "parse error near ", $json->incr_text;
}
# else add more data
sysread $fh, my $buf, 65536
or die "read error: $!";
$json->incr_parse ($buf); # void context, so no parsing
}
}
This is a complex example, but most of the complexity comes from the fact
that we are trying to be correct (bear with me if I am wrong, I never ran
the above example :).
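For comparison, here is a much smaller sketch of the same incremental interface that reads from an in-memory list instead of a file handle. The chunk contents are made up for illustration, and only C<incr_parse> in void and scalar context is used, as in the example above:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new;

# Hypothetical input: three JSON texts arriving in arbitrarily split chunks.
my @chunks = ('{"a":1} {"b"', ':2}', ' [3]');

my @objects;
for my $chunk (@chunks) {
    $json->incr_parse($chunk);    # void context: only buffers the data

    # scalar context: extracts at most one complete JSON text per call
    while (defined(my $obj = $json->incr_parse)) {
        push @objects, $obj;
    }
}

printf "parsed %d JSON texts\n", scalar @objects;
```

Draining the buffer with an inner loop after each chunk keeps the buffered text small and delivers each JSON text as soon as it is complete.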
=head1 BOM
Detect all unicode B<Byte Order Marks> on decode.
Which are UTF-8, UTF-16LE, UTF-16BE, UTF-32LE and UTF-32BE.
The BOM encoding is set only for one specific decode call, it does not
change the state of the JSON object.
B<Warning>: With perls older than 5.20 you need to load the Encode module
before loading a multibyte BOM, i.e. >= UTF-16. Otherwise an error is
thrown. This is an implementation limitation and might get fixed later.
See L<https://tools.ietf.org/html/rfc7159#section-8.1>
I<"JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.">
I<"Implementations MUST NOT add a byte order mark to the beginning of a
JSON text", "implementations (...) MAY ignore the presence of a byte
order mark rather than treating it as an error".>
See also L<http://www.unicode.org/faq/utf_bom.html#BOM>.
Beware that Cpanel::JSON::XS is currently the only JSON module which
does accept and decode a BOM.
The latest JSON spec
L<https://www.greenbytes.de/tech/webdav/rfc8259.html#character.encoding>
forbids the usage of UTF-16 or UTF-32; the character encoding is UTF-8.
Thus in subsequent updates BOMs of UTF-16 or UTF-32 will throw an error.
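A minimal sketch of BOM handling on decode, assuming a UTF-8 BOM (the three octets EF BB BF) prepended by hand to an otherwise ordinary JSON text. The payload is made up for illustration:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS qw(decode_json);

# A JSON text preceded by a UTF-8 byte order mark.
my $with_bom = "\xEF\xBB\xBF" . '{"answer":42}';

# The BOM is detected and skipped for this one decode call only.
my $data = decode_json($with_bom);

print "answer: $data->{answer}\n";
```

Only the UTF-8 BOM is shown because, as noted above, UTF-16 and UTF-32 BOMs need the Encode module on older perls and may be rejected in future versions.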
=head1 MAPPING
This section describes how Cpanel::JSON::XS maps Perl values to JSON
values and vice versa. These mappings are designed to "do the right
thing" in most circumstances automatically, preserving round-tripping
characteristics (what you put in comes out as something equivalent).
For the more enlightened: note that in the following descriptions,
lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
refers to the abstract Perl language itself.
=head2 JSON -> PERL
=over 4
=item object
A JSON object becomes a reference to a hash in Perl. No ordering of object
keys is preserved (JSON does not preserve object key ordering itself).
=item array
A JSON array becomes a reference to an array in Perl.
=item string
A JSON string becomes a string scalar in Perl - Unicode codepoints in JSON
are represented by the same codepoints in the Perl string, so no manual
decoding is necessary.
=item number
A JSON number becomes either an integer, numeric (floating point) or
string scalar in perl, depending on its range and any fractional parts. On
the Perl level, there is no difference between those as Perl handles all
the conversion details, but an integer may take slightly less memory and
might represent more values exactly than floating point numbers.
If the number consists of digits only, Cpanel::JSON::XS will try to
represent it as an integer value. If that fails, it will try to
represent it as a numeric (floating point) value if that is possible
without loss of precision. Otherwise it will preserve the number as a
string value (in which case you lose roundtripping ability, as the
JSON number will be re-encoded to a JSON string).
Numbers containing a fractional or exponential part will always be
represented as numeric (floating point) values, possibly at a loss of
precision (in which case you might lose perfect roundtripping ability, but
the JSON number will still be re-encoded as a JSON number).
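The common cases of the number mapping can be sketched as follows. The input values are arbitrary small numbers chosen so that no precision is lost:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $coder = Cpanel::JSON::XS->new;

# An integer, a number with a fractional part, and a string of digits.
my $values = $coder->decode('[10, 1.5, "10"]');

# Re-encoding keeps the numbers as JSON numbers and the string as a string.
my $again = $coder->encode($values);
print "$again\n";
```

On the Perl level the first and third elements compare equal numerically, but the decoder keeps track of which one was a JSON string, so the re-encoded text distinguishes them again.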
JSON syntax is based on how literals are represented in javascript (the
not-standardized predecessor of ECMAscript) which is presumably why it is
called "JavaScript Object Notation".
However, JSON is not a subset (and also not a superset of course) of
ECMAscript (the standard) or javascript (whatever browsers actually
implement).
If you want to use javascript's C<eval> function to "parse" JSON, you
might run into parse errors for valid JSON texts, or the resulting data
structure might not be queryable:
One of the problems is that U+2028 and U+2029 are valid characters inside
JSON strings, but are not allowed in ECMAscript string literals, so the
following Perl fragment will not output something that can be guaranteed
to be parsable by javascript's C<eval>:
use Cpanel::JSON::XS;
print encode_json [chr 0x2028];
The right fix for this is to use a proper JSON parser in your javascript
programs, and not rely on C<eval> (see for example Douglas Crockford's
F<json2.js> parser).
If this is not an option, you can, as a stop-gap measure, simply encode to
ASCII-only JSON:
use Cpanel::JSON::XS;
print Cpanel::JSON::XS->new->ascii->encode ([chr 0x2028]);
Note that this will enlarge the resulting JSON text quite a bit if you
have many non-ASCII characters. You might be tempted to run some regexes
to only escape U+2028 and U+2029, e.g.:
# DO NOT USE THIS!
my $json = Cpanel::JSON::XS->new->utf8->encode ([chr 0x2028]);
$json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
$json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
print $json;
Note that I<this is a bad idea>: the above only works for U+2028 and
U+2029 and thus only for fully ECMAscript-compliant parsers. Many existing
javascript implementations, however, have issues with other characters as
well - using C<eval> naively simply I<will> cause problems.
Another problem is that some javascript implementations reserve
some property names for their own purposes (which probably makes
them non-ECMAscript-compliant). For example, Iceweasel reserves the
C<__proto__> property name for its own purposes.
If that is a problem, you could try to filter the resulting JSON
output for these property strings, e.g.:
$json =~ s/"__proto__"\s*:/"__proto__renamed":/g;
This works because C<__proto__> is not valid outside of strings, so every
occurrence of C<"__proto__"\s*:> must be a string used as property name.
Raw non-Unicode characters outside the valid unicode range now fail to
parse, because "A string is a sequence of zero or more Unicode
characters" (RFC 7159 section 1) and "JSON text SHALL be encoded in
Unicode" (RFC 7159 section 8.1). We now use the UTF8_DISALLOW_SUPER
flag when parsing unicode.
Since RFC 3629, U+D800 through U+DFFF are not legal Unicode values and
their UTF-8 encodings must be treated as an invalid byte sequence.
RFC 8259 section 8.2 admits the spec allows string values that contain
bit sequences that cannot encode Unicode characters and that the
behavior of software that receives such values is unpredictable. To
avoid introducing non-Unicode strings into Perl we use the
UTF8_DISALLOW_SURROGATE flag when parsing Unicode and verify escaped
surrogates form valid pairs.
If you know of other incompatibilities, please let me know.
=head2 JSON and YAML
You often hear that JSON is a subset of YAML. However, I<in general, there
is no way to configure JSON::XS to output a data structure as valid YAML>
that works in all cases. If you really must use Cpanel::JSON::XS to
generate YAML, you should use this algorithm (subject to change in
future versions):
my $to_yaml = Cpanel::JSON::XS->new->utf8->space_after (1);
my $yaml = $to_yaml->encode ($ref) . "\n";
This will I<usually> generate JSON texts that also parse as valid
YAML.
=head2 SPEED
It seems that JSON::XS is surprisingly fast, as shown in the following
tables. They have been generated with the help of the C<eg/bench> program
in the JSON::XS distribution, to make it easy to compare on your own
system.
JSON::XS is, along with L<Data::MessagePack> and L<Sereal>, one of the
fastest serializers, because JSON and JSON::XS do not support backrefs (no
graph structures), only trees. Storable supports backrefs,
i.e. graphs. Data::MessagePack encodes its data in binary (as Storable
does) and supports only a very simple subset of JSON.
First comes a comparison between various modules using
a very short single-line JSON string (also available at
L<http://dist.schmorp.de/misc/json/short.json>).
{"method": "handleMessage", "params": ["user1",
"we were just talking"], "id": null, "array":[1,11,234,-5,1e5,1e7,
1, 0]}
It shows the number of encodes/decodes per second (JSON::XS uses
the functional interface, while Cpanel::JSON::XS/2 uses the OO interface
with pretty-printing and hash key sorting enabled, and Cpanel::JSON::XS/3
enables shrink. JSON::DWIW/DS uses the deserialize function, while
JSON::DWIW/FJ uses the from_json method). Higher is better:
module | encode | decode |
--------------|------------|------------|
JSON::DWIW/DS | 86302.551 | 102300.098 |
JSON::DWIW/FJ | 86302.551 | 75983.768 |
JSON::PP | 15827.562 | 6638.658 |