Devel-IPerl-Plugin-Perlbrew

 view release on metacpan or  search on metacpan

bin/perlbrewise-spec  view on Meta::CPAN

  C<Cpanel::JSON::XS> throws an exception.
  
  =back
  
  =head3 DESERIALIZATION
  
  For deserialization there are only two cases to consider: either
  nonstandard tagging was used, in which case C<allow_tags> decides,
  or objects cannot be automatically be deserialized, in which
  case you can use postprocessing or the C<filter_json_object> or
  C<filter_json_single_key_object> callbacks to get some real objects our of
  your JSON.
  
  This section only considers the tagged value case: I a tagged JSON object
  is encountered during decoding and C<allow_tags> is disabled, a parse
  error will result (as if tagged values were not part of the grammar).
  
  If C<allow_tags> is enabled, C<Cpanel::JSON::XS> will look up the C<THAW> method
  of the package/classname used during serialization (it will not attempt
  to load the package as a Perl module). If there is no such method, the
  decoding will fail with an error.
  
  Otherwise, the C<THAW> method is invoked with the classname as first
  argument, the constant string C<JSON> as second argument, and all the
  values from the JSON array (the values originally returned by the
  C<FREEZE> method) as remaining arguments.
  
  The method must then return the object. While technically you can return
  any Perl scalar, you might have to enable the C<enable_nonref> setting to
  make that work in all cases, so better return an actual blessed reference.
  
  As an example, let's implement a C<THAW> function that regenerates the
  C<My::Object> from the C<FREEZE> example earlier:
  
     sub My::Object::THAW {
        my ($class, $serializer, $type, $id) = @_;
  
        $class->new (type => $type, id => $id)
     }
  
  See the L</SECURITY CONSIDERATIONS> section below. Allowing external
  json objects being deserialized to perl objects is usually a very bad
  idea.
  
  
  =head1 ENCODING/CODESET FLAG NOTES
  
  The interested reader might have seen a number of flags that signify
  encodings or codesets - C<utf8>, C<latin1>, C<binary> and
  C<ascii>. There seems to be some confusion on what these do, so here
  is a short comparison:
  
  C<utf8> controls whether the JSON text created by C<encode> (and expected
  by C<decode>) is UTF-8 encoded or not, while C<latin1> and C<ascii> only
  control whether C<encode> escapes character values outside their respective
  codeset range. Neither of these flags conflict with each other, although
  some combinations make less sense than others.
  
  Care has been taken to make all flags symmetrical with respect to
  C<encode> and C<decode>, that is, texts encoded with any combination of
  these flag values will be correctly decoded when the same flags are used
  - in general, if you use different flag settings while encoding vs. when
  decoding you likely have a bug somewhere.
  
  Below comes a verbose discussion of these flags. Note that a "codeset" is
  simply an abstract set of character-codepoint pairs, while an encoding
  takes those codepoint numbers and I<encodes> them, in our case into
  octets. Unicode is (among other things) a codeset, UTF-8 is an encoding,
  and ISO-8859-1 (= latin 1) and ASCII are both codesets I<and> encodings at
  the same time, which can be confusing.
  
  =over 4
  
  =item C<utf8> flag disabled
  
  When C<utf8> is disabled (the default), then C<encode>/C<decode> generate
  and expect Unicode strings, that is, characters with high ordinal Unicode
  values (> 255) will be encoded as such characters, and likewise such
  characters are decoded as-is, no changes to them will be done, except
  "(re-)interpreting" them as Unicode codepoints or Unicode characters,
  respectively (to Perl, these are the same thing in strings unless you do
  funny/weird/dumb stuff).
  
  This is useful when you want to do the encoding yourself (e.g. when you
  want to have UTF-16 encoded JSON texts) or when some other layer does
  the encoding for you (for example, when printing to a terminal using a
  filehandle that transparently encodes to UTF-8 you certainly do NOT want
  to UTF-8 encode your data first and have Perl encode it another time).
  
  =item C<utf8> flag enabled
  
  If the C<utf8>-flag is enabled, C<encode>/C<decode> will encode all
  characters using the corresponding UTF-8 multi-byte sequence, and will
  expect your input strings to be encoded as UTF-8, that is, no "character"
  of the input string must have any value > 255, as UTF-8 does not allow
  that.
  
  The C<utf8> flag therefore switches between two modes: disabled means you
  will get a Unicode string in Perl, enabled means you get an UTF-8 encoded
  octet/binary string in Perl.
  
  =item C<latin1>, C<binary> or C<ascii> flags enabled
  
  With C<latin1> (or C<ascii>) enabled, C<encode> will escape
  characters with ordinal values > 255 (> 127 with C<ascii>) and encode
  the remaining characters as specified by the C<utf8> flag.
  With C<binary> enabled, ordinal values > 255 are illegal.
  
  If C<utf8> is disabled, then the result is also correctly encoded in those
  character sets (as both are proper subsets of Unicode, meaning that a
  Unicode string with all character values < 256 is the same thing as a
  ISO-8859-1 string, and a Unicode string with all character values < 128 is
  the same thing as an ASCII string in Perl).
  
  If C<utf8> is enabled, you still get a correct UTF-8-encoded string,
  regardless of these flags, just some more characters will be escaped using
  C<\uXXXX> then before.
  
  Note that ISO-8859-1-I<encoded> strings are not compatible with UTF-8
  encoding, while ASCII-encoded strings are. That is because the ISO-8859-1
  encoding is NOT a subset of UTF-8 (despite the ISO-8859-1 I<codeset> being
  a subset of Unicode), while ASCII is.
  
  Surprisingly, C<decode> will ignore these flags and so treat all input
  values as governed by the C<utf8> flag. If it is disabled, this allows you
  to decode ISO-8859-1- and ASCII-encoded strings, as both strict subsets of
  Unicode. If it is enabled, you can correctly decode UTF-8 encoded strings.
  
  So neither C<latin1>, C<binary> nor C<ascii> are incompatible with the
  C<utf8> flag - they only govern when the JSON output engine escapes a
  character or not.
  
  The main use for C<latin1> or C<binary> is to relatively efficiently
  store binary data as JSON, at the expense of breaking compatibility
  with most JSON decoders.
  
  The main use for C<ascii> is to force the output to not contain characters
  with values > 127, which means you can interpret the resulting string
  as UTF-8, ISO-8859-1, ASCII, KOI8-R or most about any character set and

bin/perlbrewise-spec  view on Meta::CPAN

  JSON syntax is based on how literals are represented in javascript (the
  not-standardized predecessor of ECMAscript) which is presumably why it is
  called "JavaScript Object Notation".
  
  However, JSON is not a subset (and also not a superset of course) of
  ECMAscript (the standard) or javascript (whatever browsers actually
  implement).
  
  If you want to use javascript's C<eval> function to "parse" JSON, you
  might run into parse errors for valid JSON texts, or the resulting data
  structure might not be queryable:
  
  One of the problems is that U+2028 and U+2029 are valid characters inside
  JSON strings, but are not allowed in ECMAscript string literals, so the
  following Perl fragment will not output something that can be guaranteed
  to be parsable by javascript's C<eval>:
  
     use Cpanel::JSON::XS;
  
     print encode_json [chr 0x2028];
  
  The right fix for this is to use a proper JSON parser in your javascript
  programs, and not rely on C<eval> (see for example Douglas Crockford's
  F<json2.js> parser).
  
  If this is not an option, you can, as a stop-gap measure, simply encode to
  ASCII-only JSON:
  
     use Cpanel::JSON::XS;
  
     print Cpanel::JSON::XS->new->ascii->encode ([chr 0x2028]);
  
  Note that this will enlarge the resulting JSON text quite a bit if you
  have many non-ASCII characters. You might be tempted to run some regexes
  to only escape U+2028 and U+2029, e.g.:
  
     # DO NOT USE THIS!
     my $json = Cpanel::JSON::XS->new->utf8->encode ([chr 0x2028]);
     $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
     $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
     print $json;
  
  Note that I<this is a bad idea>: the above only works for U+2028 and
  U+2029 and thus only for fully ECMAscript-compliant parsers. Many existing
  javascript implementations, however, have issues with other characters as
  well - using C<eval> naively simply I<will> cause problems.
  
  Another problem is that some javascript implementations reserve
  some property names for their own purposes (which probably makes
  them non-ECMAscript-compliant). For example, Iceweasel reserves the
  C<__proto__> property name for its own purposes.
  
  If that is a problem, you could parse try to filter the resulting JSON
  output for these property strings, e.g.:
  
     $json =~ s/"__proto__"\s*:/"__proto__renamed":/g;
  
  This works because C<__proto__> is not valid outside of strings, so every
  occurrence of C<"__proto__"\s*:> must be a string used as property name.
  
  Unicode non-characters between U+FFFD and U+10FFFF are decoded either
  to the recommended U+FFFD REPLACEMENT CHARACTER (see Unicode PR #121:
  Recommended Practice for Replacement Characters), or in the binary or
  relaxed mode left as is, keeping the illegal non-characters as before.
  
  Raw non-Unicode characters outside the valid unicode range fail now to
  parse, because "A string is a sequence of zero or more Unicode
  characters" RFC 7159 section 1 and "JSON text SHALL be encoded in
  Unicode RFC 7159 section 8.1. We use now the UTF8_DISALLOW_SUPER
  flag when parsing unicode.
  
  If you know of other incompatibilities, please let me know.
  
  
  =head2 JSON and YAML
  
  You often hear that JSON is a subset of YAML.  I<in general, there is
  no way to configure JSON::XS to output a data structure as valid YAML>
  that works in all cases.  If you really must use Cpanel::JSON::XS to
  generate YAML, you should use this algorithm (subject to change in
  future versions):
  
     my $to_yaml = Cpanel::JSON::XS->new->utf8->space_after (1);
     my $yaml = $to_yaml->encode ($ref) . "\n";
  
  This will I<usually> generate JSON texts that also parse as valid
  YAML.
  
  
  =head2 SPEED
  
  It seems that JSON::XS is surprisingly fast, as shown in the following
  tables. They have been generated with the help of the C<eg/bench> program
  in the JSON::XS distribution, to make it easy to compare on your own
  system.
  
  JSON::XS is with L<Data::MessagePack> and L<Sereal> one of the fastest
  serializers, because JSON and JSON::XS do not support backrefs (no
  graph structures), only trees. Storable supports backrefs,
  i.e. graphs. Data::MessagePack encodes its data binary (as Storable)
  and supports only very simple subset of JSON.
  
  First comes a comparison between various modules using
  a very short single-line JSON string (also available at
  L<http://dist.schmorp.de/misc/json/short.json>).
  
     {"method": "handleMessage", "params": ["user1",
     "we were just talking"], "id": null, "array":[1,11,234,-5,1e5,1e7,
     1,  0]}
  
  It shows the number of encodes/decodes per second (JSON::XS uses
  the functional interface, while Cpanel::JSON::XS/2 uses the OO interface
  with pretty-printing and hash key sorting enabled, Cpanel::JSON::XS/3 enables
  shrink. JSON::DWIW/DS uses the deserialize function, while JSON::DWIW::FJ
  uses the from_json method). Higher is better:
  
     module        |     encode |     decode |
     --------------|------------|------------|
     JSON::DWIW/DS |  86302.551 | 102300.098 |
     JSON::DWIW/FJ |  86302.551 |  75983.768 |
     JSON::PP      |  15827.562 |   6638.658 |



( run in 0.890 second using v1.01-cache-2.11-cpan-98e64b0badf )