1NAME
2 Cpanel::JSON::XS - cPanel fork of JSON::XS, fast and correct serializing
3
4SYNOPSIS
5 use Cpanel::JSON::XS;
6
7 # exported functions, they croak on error
8 # and expect/generate UTF-8
9
10 $utf8_encoded_json_text = encode_json $perl_hash_or_arrayref;
11 $perl_hash_or_arrayref = decode_json $utf8_encoded_json_text;
12
13 # OO-interface
14
15 $coder = Cpanel::JSON::XS->new->ascii->pretty->allow_nonref;
16 $pretty_printed_unencoded = $coder->encode ($perl_scalar);
17 $perl_scalar = $coder->decode ($unicode_json_text);
18
 # Note that Perl 5.6 lacks most of the smart UTF-8 and encoding
 # functionality of newer releases.
21
 # Note that JSON::MaybeXS will automatically use Cpanel::JSON::XS
23 # if available, at virtually no speed overhead either, so you should
24 # be able to just:
25
26 use JSON::MaybeXS;
27
28 # and do the same things, except that you have a pure-perl fallback now.
29
 Note that this module will soon be replaced by a new JSON::Safe module
 with the same API but guaranteed safe defaults.
32
33DESCRIPTION
34 This module converts Perl data structures to JSON and vice versa. Its
35 primary goal is to be *correct* and its secondary goal is to be *fast*.
36 To reach the latter goal it was written in C.
37
38 As this is the n-th-something JSON module on CPAN, what was the reason
39 to write yet another JSON module? While it seems there are many JSON
40 modules, none of them correctly handle all corner cases, and in most
41 cases their maintainers are unresponsive, gone missing, or not listening
42 to bug reports for other reasons.
43
44 See below for the cPanel fork.
45
46 See MAPPING, below, on how Cpanel::JSON::XS maps perl values to JSON
47 values and vice versa.
48
49 FEATURES
50 * correct Unicode handling
51
 This module knows how to handle Unicode with Perl versions newer
53 than 5.8.5, documents how and when it does so, and even documents
54 what "correct" means.
55
56 * round-trip integrity
57
58 When you serialize a perl data structure using only data types
59 supported by JSON and Perl, the deserialized data structure is
60 identical on the Perl level. (e.g. the string "2.0" doesn't suddenly
61 become "2" just because it looks like a number). There *are* minor
62 exceptions to this, read the MAPPING section below to learn about
63 those.
64
65 * strict checking of JSON correctness
66
67 There is no guessing, no generating of illegal JSON texts by
 default, and only JSON is accepted as input by default. The latter
69 is a security feature.
70
71 * fast
72
73 Compared to other JSON modules and other serializers such as
74 Storable, this module usually compares favourably in terms of speed,
75 too.
76
77 * simple to use
78
79 This module has both a simple functional interface as well as an
80 object oriented interface.
81
82 * reasonably versatile output formats
83
84 You can choose between the most compact guaranteed-single-line
85 format possible (nice for simple line-based protocols), a pure-ASCII
86 format (for when your transport is not 8-bit clean, still supports
87 the whole Unicode range), or a pretty-printed format (for when you
88 want to read that stuff). Or you can combine those features in
89 whatever way you like.
90
91 cPanel fork
 Since the original author MLEHMANN has no public bugtracker, this cPanel
 fork now lives on GitHub.

 src repo: <https://github.com/rurban/Cpanel-JSON-XS>
 original: <http://cvs.schmorp.de/JSON-XS/>
97
98 RT: <https://github.com/rurban/Cpanel-JSON-XS/issues> or
99 <https://rt.cpan.org/Public/Dist/Display.html?Queue=Cpanel-JSON-XS>
100
101 Changes to JSON::XS
102
 - stricter decode_json() as documented. Non-refs are disallowed. Added a
 2nd optional argument. decode() now honors allow_nonref.
105
106 - fixed encode of numbers for dual-vars. Different string
107 representations are preserved, but numbers with temporary strings which
108 represent the same number are here treated as numbers, not strings.
109 Cpanel::JSON::XS is a bit slower, but preserves numeric types better.
110
 - numbers ending with .0 stay numbers, and are not converted to integers.
112 [#63] dual-vars which are represented as number not integer (42+"bar" !=
113 5.8.9) are now encoded as number (=> 42.0) because internally it's now a
114 NOK type. However !!1 which is wrongly encoded in 5.8 as "1"/1.0 is
115 still represented as integer.
116
117 - different handling of inf/nan. Default now to null, optionally with
118 stringify_infnan() to "inf"/"nan". [#28, #32]
119
120 - added "binary" extension, non-JSON and non JSON parsable, allows
121 "\xNN" and "\NNN" sequences.
122
123 - 5.6.2 support; sacrificing some utf8 features (assuming bytes
124 all-over), no multi-byte unicode characters with 5.6.
125
126 - interop for true/false overloading. JSON::XS, JSON::PP and Mojo::JSON
127 representations for booleans are accepted and JSON::XS accepts
128 Cpanel::JSON::XS booleans [#13, #37] Fixed overloading of booleans.
 Cpanel::JSON::XS::true stringifies again to "1", not "true", analogous
 to all other JSON modules.
131
132 - native boolean mapping of yes and no to true and false, as in
133 YAML::XS. In perl "!0" is yes, "!1" is no. The JSON value true maps to
134 1, false maps to 0. [#39]
135
136 - support arbitrary stringification with encode, with convert_blessed
137 and allow_blessed.
138
 - ithread support. Cpanel::JSON::XS is thread-safe, JSON::XS is not.

 - is_bool can be called as a method, JSON::XS::is_bool cannot.
142
143 - performance optimizations for threaded Perls
144
145 - relaxed mode, allowing many popular extensions
146
147 - additional fixes for:
148
149 - [cpan #88061] AIX atof without USE_LONG_DOUBLE
150
151 - #10 unshare_hek crash
152
153 - #7, #29 avoid re-blessing where possible. It fails in JSON::XS for
154 READONLY values, i.e. restricted hashes.
155
156 - #41 overloading of booleans, use the object not the reference.
157
158 - #62 -Dusequadmath conversion and no SEGV.
159
 - #72 parsing of values followed by \0, like 1\0, now fails.
161
162 - #72 parsing of illegal unicode or non-unicode characters.
163
164 - #96 locale-insensitive numeric conversion.
165
166 - #154 numeric conversion fixed since 5.22, using the same strtold as perl5.
167
168 - #167 sort tied hashes with canonical.
169
170 - public maintenance and bugtracker
171
172 - use ppport.h, sanify XS.xs comment styles, harness C coding style
173
174 - common::sense is optional. When available it is not used in the
175 published production module, just during development and testing.
176
177 - extended testsuite, passes all http://seriot.ch/parsing_json.html
 tests. In fact it is the only known JSON decoder which does so, while
179 also being the fastest.
180
181 - support many more options and methods from JSON::PP: stringify_infnan,
182 allow_unknown, allow_stringify, allow_barekey, encode_stringify,
183 allow_bignum, allow_singlequote, sort_by (partially), escape_slash,
184 convert_blessed, ... optional decode_json(, allow_nonref) arg. relaxed
185 implements allow_dupkeys.
186
 - support all 5 Unicode BOMs: UTF-8, UTF-16LE, UTF-16BE, UTF-32LE,
188 UTF-32BE, encoding internally to UTF-8.
189
190FUNCTIONAL INTERFACE
191 The following convenience methods are provided by this module. They are
192 exported by default:
193
 $json_text = encode_json $perl_scalar [, $json_type]
195 Converts the given Perl data structure to a UTF-8 encoded, binary
196 string (that is, the string contains octets only). Croaks on error.
197
198 This function call is functionally identical to:
199
200 $json_text = Cpanel::JSON::XS->new->utf8->encode ($perl_scalar, $json_type)
201
 except being faster.
203
204 For the type argument see Cpanel::JSON::XS::Type.
205
206 $perl_scalar = decode_json $json_text [, $allow_nonref [, my $json_type
207 ] ]
 The opposite of "encode_json": expects a UTF-8 (binary) string
 containing a JSON object or array, tries to parse that as UTF-8
 encoded JSON text, and returns the resulting reference. Croaks on
 error.
211
212 This function call is functionally identical to:
213
214 $perl_scalar = Cpanel::JSON::XS->new->utf8->decode ($json_text, $json_type)
215
216 except being faster.
217
 Note that decode_json in JSON::XS, and in Cpanel::JSON::XS versions
 older than 3.0116, did not set allow_nonref but accepted non-reference
 values anyway due to a bug in the decoder.

 If the new 2nd optional $allow_nonref argument is set and not false,
 the "allow_nonref" option will be set and the function will act as
 described in the more relaxed RFC 7159, allowing all values such as
 objects, arrays, strings, numbers, "null", "true", and "false". See
 ""OLD" VS. "NEW" JSON (RFC 4627 VS. RFC 7159)" below for why you don't
 want to do that.
228
229 For the 3rd optional type argument see Cpanel::JSON::XS::Type.
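
 As a rough illustration of the two functions above, the following sketch
 round-trips a hash, decodes a top-level string by passing a true second
 argument to decode_json, and passes a type specification to encode_json.
 The sample data is made up for the example, and the type constants are
 assumed to be the ones exported by Cpanel::JSON::XS::Type.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;          # exports encode_json and decode_json
use Cpanel::JSON::XS::Type;    # assumed to export the JSON_TYPE_* constants

# Round trip through the functional interface (sample data is illustrative).
my $json_text = encode_json({ name => "demo" });
my $data      = decode_json($json_text);

# A true second argument lets decode_json accept a top-level non-reference.
my $string = decode_json('"hello"', 1);

# The optional type argument of encode_json sets the JSON type per element.
my $typed = encode_json([10, "10", 10.25],
                        [JSON_TYPE_INT, JSON_TYPE_INT, JSON_TYPE_STRING]);

print "$json_text\n";    # {"name":"demo"}
print "$string\n";       # hello
print "$typed\n";        # [10,10,"10.25"]
```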
230
231 $is_boolean = Cpanel::JSON::XS::is_bool $scalar
232 Returns true if the passed scalar represents either "JSON::PP::true"
233 or "JSON::PP::false", two constants that act like 1 and 0,
234 respectively and are used to represent JSON "true" and "false"
235 values in Perl. (Also recognizes the booleans produced by JSON::XS.)
236
237 See MAPPING, below, for more information on how JSON values are
238 mapped to Perl.
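
 A small sketch of how is_bool can tell decoded JSON booleans apart from
 ordinary numbers and strings. The input text is invented for the example.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# Only the first two elements are JSON booleans.
my $decoded = decode_json('[true, false, 1, "true"]');

# Map each element to 1 if is_bool recognizes it, 0 otherwise.
my @flags = map { Cpanel::JSON::XS::is_bool($_) ? 1 : 0 } @$decoded;

print "@flags\n";    # 1 1 0 0
```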
239
240DEPRECATED FUNCTIONS
241 from_json
242 from_json has been renamed to decode_json
243
244 to_json
245 to_json has been renamed to encode_json
246
247A FEW NOTES ON UNICODE AND PERL
248 Since this often leads to confusion, here are a few very clear words on
249 how Unicode works in Perl, modulo bugs.
250
251 1. Perl strings can store characters with ordinal values > 255.
252 This enables you to store Unicode characters as single characters in
253 a Perl string - very natural.
254
255 2. Perl does *not* associate an encoding with your strings.
256 ... until you force it to, e.g. when matching it against a regex, or
257 printing the scalar to a file, in which case Perl either interprets
258 your string as locale-encoded text, octets/binary, or as Unicode,
259 depending on various settings. In no case is an encoding stored
260 together with your data, it is *use* that decides encoding, not any
261 magical meta data.
262
263 3. The internal utf-8 flag has no meaning with regards to the encoding
264 of your string.
265 4. A "Unicode String" is simply a string where each character can be
266 validly interpreted as a Unicode code point.
267 If you have UTF-8 encoded data, it is no longer a Unicode string,
268 but a Unicode string encoded in UTF-8, giving you a binary string.
269
270 5. A string containing "high" (> 255) character values is *not* a UTF-8
271 string.
272 6. Unicode noncharacters only warn, as in core.
273 The 66 Unicode noncharacters U+FDD0..U+FDEF, and U+*FFFE, U+*FFFF
274 just warn, see <http://www.unicode.org/versions/corrigendum9.html>.
275 But illegal surrogate pairs fail to parse.
276
 7. Raw non-Unicode characters above U+10FFFF are disallowed.
 Raw non-Unicode characters outside the valid Unicode range fail to
 parse, because "A string is a sequence of zero or more Unicode
 characters" (RFC 7159 section 1) and "JSON text SHALL be encoded in
 Unicode" (RFC 7159 section 8.1). We now use the UTF8_DISALLOW_SUPER
 flag when parsing Unicode.
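
 The difference between a Unicode string and its UTF-8 encoding (points 1,
 4 and 5 above) can be seen with the core Encode module. This is only a
 sketch with an arbitrarily chosen character:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $unicode = "\x{263A}";                  # one character, U+263A
my $octets  = encode("UTF-8", $unicode);   # its UTF-8 encoding, a binary string

# The Unicode string has one character, the encoded form has three octets.
printf "%d character, %d octets\n", length($unicode), length($octets);

# Decoding the octets gives back the original Unicode string.
my $back = decode("UTF-8", $octets);
print $back eq $unicode ? "same\n" : "different\n";    # same
```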
283
284 I hope this helps :)
285
286OBJECT-ORIENTED INTERFACE
287 The object oriented interface lets you configure your own encoding or
288 decoding style, within the limits of supported formats.
289
290 $json = new Cpanel::JSON::XS
291 Creates a new JSON object that can be used to de/encode JSON
292 strings. All boolean flags described below are by default
293 *disabled*.
294
295 The mutators for flags all return the JSON object again and thus
296 calls can be chained:
297
298 my $json = Cpanel::JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
299 => {"a": [1, 2]}
300
301 $json = $json->ascii ([$enable])
302 $enabled = $json->get_ascii
303 If $enable is true (or missing), then the "encode" method will not
304 generate characters outside the code range 0..127 (which is ASCII).
305 Any Unicode characters outside that range will be escaped using
 either a single "\uXXXX" (BMP characters) or a double
 "\uHHHH\uLLLL" escape sequence, as per RFC4627. The resulting
308 encoded JSON text can be treated as a native Unicode string, an
309 ascii-encoded, latin1-encoded or UTF-8 encoded string, or any other
310 superset of ASCII.
311
312 If $enable is false, then the "encode" method will not escape
313 Unicode characters unless required by the JSON syntax or other
314 flags. This results in a faster and more compact format.
315
316 See also the section *ENCODING/CODESET FLAG NOTES* later in this
317 document.
318
319 The main use for this flag is to produce JSON texts that can be
320 transmitted over a 7-bit channel, as the encoded JSON texts will not
321 contain any 8 bit characters.
322
323 Cpanel::JSON::XS->new->ascii (1)->encode ([chr 0x10401])
324 => ["\ud801\udc01"]
325
326 $json = $json->latin1 ([$enable])
327 $enabled = $json->get_latin1
328 If $enable is true (or missing), then the "encode" method will
329 encode the resulting JSON text as latin1 (or ISO-8859-1), escaping
330 any characters outside the code range 0..255. The resulting string
331 can be treated as a latin1-encoded JSON text or a native Unicode
332 string. The "decode" method will not be affected in any way by this
333 flag, as "decode" by default expects Unicode, which is a strict
334 superset of latin1.
335
336 If $enable is false, then the "encode" method will not escape
337 Unicode characters unless required by the JSON syntax or other
338 flags.
339
340 See also the section *ENCODING/CODESET FLAG NOTES* later in this
341 document.
342
343 The main use for this flag is efficiently encoding binary data as
344 JSON text, as most octets will not be escaped, resulting in a
345 smaller encoded size. The disadvantage is that the resulting JSON
346 text is encoded in latin1 (and must correctly be treated as such
347 when storing and transferring), a rare encoding for JSON. It is
348 therefore most useful when you want to store data structures known
349 to contain binary data efficiently in files or databases, not when
350 talking to other JSON encoders/decoders.
351
 Cpanel::JSON::XS->new->latin1->encode (["\x{89}\x{abc}"])
353 => ["\x{89}\\u0abc"] # (perl syntax, U+abc escaped, U+89 not)
354
355 $json = $json->binary ([$enable])
 $enabled = $json->get_binary
 If the $enable argument is true (or missing), then the "encode"
 method will not try to detect a UTF-8 encoding in any JSON string,
359 it will strictly interpret it as byte sequence. The result might
360 contain new "\xNN" sequences, which is unparsable JSON. The "decode"
361 method forbids "\uNNNN" sequences and accepts "\xNN" and octal
362 "\NNN" sequences.
363
 There is also special logic for Perl 5.6 and utf8. Perl 5.6 encodes any
 string to UTF-8 automatically when it sees a codepoint >= 0x80 and <
 0x100. With the binary flag enabled, the module decodes the Perl
 UTF-8-encoded string back to the original byte encoding and encodes
 that with "\xNN" escapes. This results in the same encodings as with
 newer perls. But note that binary multi-byte codepoints with 5.6 will
 result in "illegal unicode character in binary string" errors, unlike
 with newer perls.

 If $enable is false, then the "encode" method will smartly try to
 detect Unicode characters, subject to the other flags, and "decode"
 forbids hex and octal escape sequences.
376
377 See also the section *ENCODING/CODESET FLAG NOTES* later in this
378 document.
379
380 The main use for this flag is to avoid the smart unicode detection
381 and possible double encoding. The disadvantage is that the resulting
382 JSON text is encoded in new "\xNN" and in latin1 characters and must
383 correctly be treated as such when storing and transferring, a rare
384 encoding for JSON. It will produce non-readable JSON strings in the
385 browser. It is therefore most useful when you want to store data
386 structures known to contain binary data efficiently in files or
387 databases, not when talking to other JSON encoders/decoders. The
388 binary decoding method can also be used when an encoder produced a
389 non-JSON conformant hex or octal encoding "\xNN" or "\NNN".
390
391 Cpanel::JSON::XS->new->binary->encode (["\x{89}\x{abc}"])
392 5.6: Error: malformed or illegal unicode character in binary string
393 >=5.8: ['\x89\xe0\xaa\xbc']
394
395 Cpanel::JSON::XS->new->binary->encode (["\x{89}\x{bc}"])
396 => ["\x89\xbc"]
397
 Cpanel::JSON::XS->new->binary->decode ('["\x89\ua001"]')
 Error: malformed or illegal unicode character in binary string

 Cpanel::JSON::XS->new->decode ('["\x89"]')
 Error: illegal hex character in non-binary string
403
404 $json = $json->utf8 ([$enable])
405 $enabled = $json->get_utf8
406 If $enable is true (or missing), then the "encode" method will
407 encode the JSON result into UTF-8, as required by many protocols,
 while the "decode" method expects to be handed a UTF-8-encoded
409 string. Please note that UTF-8-encoded strings do not contain any
410 characters outside the range 0..255, they are thus useful for
411 bytewise/binary I/O. In future versions, enabling this option might
412 enable autodetection of the UTF-16 and UTF-32 encoding families, as
413 described in RFC4627.
414
 If $enable is false, then the "encode" method will return the JSON
 string as a (non-encoded) Unicode string, while "decode" expects
 a Unicode string. Any decoding or encoding (e.g. to UTF-8 or
 UTF-16) needs to be done yourself, e.g. using the Encode module.
419
420 See also the section *ENCODING/CODESET FLAG NOTES* later in this
421 document.
422
423 Example, output UTF-16BE-encoded JSON:
424
425 use Encode;
426 $jsontext = encode "UTF-16BE", Cpanel::JSON::XS->new->encode ($object);
427
428 Example, decode UTF-32LE-encoded JSON:
429
430 use Encode;
431 $object = Cpanel::JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
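
 To make the effect of the utf8 flag concrete, this sketch encodes the
 same one-character string with and without it and compares the lengths
 of the results. The character is an arbitrary choice.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $data = ["\x{263A}"];    # a single character above U+00FF

# With utf8: the result is a string of octets, ready for binary I/O.
my $octets = Cpanel::JSON::XS->new->utf8->encode($data);

# Without utf8: the result is a Unicode (character) string.
my $chars = Cpanel::JSON::XS->new->encode($data);

print length($octets), " octets, ", length($chars), " characters\n";
# 7 octets, 5 characters

# Each form decodes with the matching setting.
my $from_octets = Cpanel::JSON::XS->new->utf8->decode($octets);
my $from_chars  = Cpanel::JSON::XS->new->decode($chars);
```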
432
433 $json = $json->pretty ([$enable])
434 This enables (or disables) all of the "indent", "space_before" and
435 "space_after" (and in the future possibly more) flags in one call to
436 generate the most readable (or most compact) form possible.
437
438 Example, pretty-print some simple structure:
439
440 my $json = Cpanel::JSON::XS->new->pretty(1)->encode ({a => [1,2]})
441 =>
442 {
443 "a" : [
444 1,
445 2
446 ]
447 }
448
449 $json = $json->indent ([$enable])
450 $enabled = $json->get_indent
451 If $enable is true (or missing), then the "encode" method will use a
452 multiline format as output, putting every array member or
453 object/hash key-value pair into its own line, indenting them
454 properly.
455
456 If $enable is false, no newlines or indenting will be produced, and
457 the resulting JSON text is guaranteed not to contain any "newlines".
458
459 This setting has no effect when decoding JSON texts.
460
461 $json = $json->indent_length([$number_of_spaces])
462 $length = $json->get_indent_length()
 Set the indent length (default 3). This option is only useful when
 you also enable indent or pretty. The acceptable range is from 0 (no
 indentation) to 15.
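
 A minimal sketch of indent combined with indent_length, using two
 spaces per level instead of the default three:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# Multiline output with two spaces per nesting level.
my $json = Cpanel::JSON::XS->new->indent->indent_length(2);
my $text = $json->encode([1, 2]);

print $text;
# [
#   1,
#   2
# ]
```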
466
467 $json = $json->space_before ([$enable])
468 $enabled = $json->get_space_before
469 If $enable is true (or missing), then the "encode" method will add
470 an extra optional space before the ":" separating keys from values
471 in JSON objects.
472
473 If $enable is false, then the "encode" method will not add any extra
474 space at those places.
475
476 This setting has no effect when decoding JSON texts. You will also
477 most likely combine this setting with "space_after".
478
479 Example, space_before enabled, space_after and indent disabled:
480
481 {"key" :"value"}
482
483 $json = $json->space_after ([$enable])
484 $enabled = $json->get_space_after
485 If $enable is true (or missing), then the "encode" method will add
486 an extra optional space after the ":" separating keys from values in
487 JSON objects and extra whitespace after the "," separating key-value
488 pairs and array members.
489
490 If $enable is false, then the "encode" method will not add any extra
491 space at those places.
492
493 This setting has no effect when decoding JSON texts.
494
495 Example, space_before and indent disabled, space_after enabled:
496
497 {"key": "value"}
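
 The two spacing flags can be tried side by side. This sketch encodes
 the same one-pair hash with each flag alone and with both combined:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my %pair = (key => "value");

my $before = Cpanel::JSON::XS->new->space_before->encode(\%pair);
my $after  = Cpanel::JSON::XS->new->space_after->encode(\%pair);
my $both   = Cpanel::JSON::XS->new->space_before->space_after->encode(\%pair);

print "$before\n";    # {"key" :"value"}
print "$after\n";     # {"key": "value"}
print "$both\n";      # {"key" : "value"}
```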
498
499 $json = $json->relaxed ([$enable])
500 $enabled = $json->get_relaxed
 If $enable is true (or missing), then "decode" will accept some
 extensions to normal JSON syntax (see below). "encode" will not be
 affected in any way. *Be aware that this option makes you accept
 invalid JSON texts as if they were valid!* I suggest using this
 option only to parse application-specific files written by humans
 (configuration files, resource files etc.)
507
508 If $enable is false (the default), then "decode" will only accept
509 valid JSON texts.
510
511 Currently accepted extensions are:
512
513 * list items can have an end-comma
514
515 JSON *separates* array elements and key-value pairs with commas.
516 This can be annoying if you write JSON texts manually and want
517 to be able to quickly append elements, so this extension accepts
 a comma at the end of such items, not just between them:
519
520 [
521 1,
522 2, <- this comma not normally allowed
523 ]
524 {
525 "k1": "v1",
526 "k2": "v2", <- this comma not normally allowed
527 }
528
529 * shell-style '#'-comments
530
531 Whenever JSON allows whitespace, shell-style comments are
532 additionally allowed. They are terminated by the first
533 carriage-return or line-feed character, after which more
534 white-space and comments are allowed.
535
536 [
537 1, # this comment not allowed in JSON
538 # neither this one...
539 ]
540
541 * literal ASCII TAB characters in strings
542
 Literal ASCII TAB characters are allowed in strings (and treated
 as "\t") in relaxed mode, even though JSON mandates that a TAB
 character inside a string be written as the "\t" escape sequence.
546
547 [
548 "Hello\tWorld",
549 "Hello<TAB>World", # literal <TAB> would not normally be allowed
550 ]
551
552 * allow_singlequote
553
554 Single quotes are accepted instead of double quotes. See the
555 "allow_singlequote" option.
556
557 { "foo":'bar' }
558 { 'foo':"bar" }
559 { 'foo':'bar' }
560
561 * allow_barekey
562
563 Accept unquoted object keys instead of with mandatory double
564 quotes. See the "allow_barekey" option.
565
566 { foo:"bar" }
567
568 * allow_dupkeys
569
570 Allow decoding of duplicate keys in hashes. By default duplicate
571 keys are forbidden. See <http://seriot.ch/parsing_json.php#24>:
572 RFC 7159 section 4: "The names within an object should be
573 unique." See the "allow_dupkeys" option.
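
 As a sketch of relaxed mode in practice, the following decodes a
 hand-written configuration text that uses trailing commas and a
 shell-style comment, and shows that a default (strict) decoder rejects
 the same text. The configuration content is invented for the example.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# Hand-written text using two relaxed-mode extensions.
my $config = <<'JSON';
{
  "hosts": [
    "alpha",
    "beta",   # trailing commas and this comment are extensions
  ],
  "port": 8080,
}
JSON

my $data = Cpanel::JSON::XS->new->relaxed->decode($config);
print scalar(@{ $data->{hosts} }), " hosts, port $data->{port}\n";
# 2 hosts, port 8080

# The default decoder croaks on the same text.
my $strict_ok = eval { Cpanel::JSON::XS->new->decode($config); 1 } ? 1 : 0;
print "strict decode ", ($strict_ok ? "succeeded" : "failed"), "\n";
# strict decode failed
```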
574
575 $json = $json->canonical ([$enable])
576 $enabled = $json->get_canonical
577 If $enable is true (or missing), then the "encode" method will
 output JSON objects by sorting their keys. This adds a
 comparatively high overhead.
580
581 If $enable is false, then the "encode" method will output key-value
582 pairs in the order Perl stores them (which will likely change
583 between runs of the same script, and can change even within the same
584 run from 5.18 onwards).
585
586 This option is useful if you want the same data structure to be
587 encoded as the same JSON text (given the same overall settings). If
 it is disabled, the same hash might be encoded differently even if
 it contains the same data, as key-value pairs have no inherent ordering
590 in Perl.
591
592 This setting has no effect when decoding JSON texts.
593
 This is now also done with tied hashes, contrary to JSON::XS. But
 note that for most large tied hashes stored as a tree it is advisable
 to have the iterator return keys in sorted order and not to sort the
 hash output here. Most such iterators are already sorted, e.g.
 DB_File with "DB_BTREE".
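
 A short sketch of canonical output for an ordinary (untied) hash. The
 keys are inserted out of order on purpose:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my %h = (b => 1, c => 3, a => 2);

# With canonical enabled the keys are emitted in sorted order.
my $text = Cpanel::JSON::XS->new->canonical->encode(\%h);

print "$text\n";    # {"a":2,"b":1,"c":3}
```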
599
600 $json = $json->sort_by (undef, 0, 1 or a block)
601 This currently only (un)sets the "canonical" option, and ignores
602 custom sort blocks.
603
604 This setting has no effect when decoding JSON texts.
605
606 This setting has currently no effect on tied hashes.
607
608 $json = $json->escape_slash ([$enable])
609 $enabled = $json->get_escape_slash
 According to the JSON grammar, the *forward slash* character
 (U+002F) "/" may be escaped as "\/", but does not need to be. By
 default strings are encoded without escaping slashes, as in all
 other Perl JSON encoders.
613
614 If $enable is true (or missing), then "encode" will escape slashes,
615 "\/".
616
617 This setting has no effect when decoding JSON texts.
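
 For illustration, this sketch encodes the same string with and without
 escape_slash. The URL is a placeholder.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $data = ["http://example.com/"];    # placeholder URL

my $plain   = Cpanel::JSON::XS->new->encode($data);
my $escaped = Cpanel::JSON::XS->new->escape_slash->encode($data);

print "$plain\n";      # ["http://example.com/"]
print "$escaped\n";    # ["http:\/\/example.com\/"]

# Both forms decode to the same Perl string.
my $same = Cpanel::JSON::XS->new->decode($plain)->[0]
        eq Cpanel::JSON::XS->new->decode($escaped)->[0];
```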
618
619 $json = $json->unblessed_bool ([$enable])
620 $enabled = $json->get_unblessed_bool
623 If $enable is true (or missing), then "decode" will return Perl
624 non-object boolean variables (1 and 0) for JSON booleans ("true" and
625 "false"). If $enable is false, then "decode" will return
626 "JSON::PP::Boolean" objects for JSON booleans.
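
 A sketch comparing the default decoding of JSON booleans with the
 unblessed_bool setting:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $text = '[true, false]';

# Default: JSON booleans become JSON::PP::Boolean objects.
my $objects = Cpanel::JSON::XS->new->decode($text);

# unblessed_bool: JSON booleans become plain Perl scalars.
my $plain = Cpanel::JSON::XS->new->unblessed_bool->decode($text);

print ref($objects->[0]), "\n";                                # JSON::PP::Boolean
print ref($plain->[0]) ? "object\n" : "plain scalar\n";        # plain scalar
```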
627
628 $json = $json->allow_singlequote ([$enable])
629 $enabled = $json->get_allow_singlequote
 If $enable is true (or missing), then "decode" will accept JSON
 strings quoted with single quotes, which are not valid in standard
 JSON.

 $json->allow_singlequote->decode(q({"foo":'bar'}));
 $json->allow_singlequote->decode(q({'foo':"bar"}));
 $json->allow_singlequote->decode(q({'foo':'bar'}));

 This is also enabled with "relaxed". As with the "relaxed" option,
 this option may be used to parse application-specific files written
 by humans.
642
643 $json = $json->allow_barekey ([$enable])
644 $enabled = $json->get_allow_barekey
 If $enable is true (or missing), then "decode" will accept bare
 (unquoted) keys in JSON objects, which are not valid in standard
 JSON.

 As with the "relaxed" option, this option may be used to parse
 application-specific files written by humans.
652
653 $json->allow_barekey->decode('{foo:"bar"}');
654
655 $json = $json->allow_bignum ([$enable])
656 $enabled = $json->get_allow_bignum
 If $enable is true (or missing), then "decode" will convert integers
 too big for Perl to handle natively into Math::BigInt objects, and
 convert any floating-point number into a Math::BigFloat object.

 Conversely, "encode" converts "Math::BigInt" and "Math::BigFloat"
 objects into JSON numbers when "allow_blessed" is enabled.
666
667 $json->allow_nonref->allow_blessed->allow_bignum;
668 $bigfloat = $json->decode('2.000000000000000000000000001');
669 print $json->encode($bigfloat);
670 # => 2.000000000000000000000000001
671
672 See "MAPPING" about the normal conversion of JSON number.
673
674 $json = $json->allow_bigint ([$enable])
675 This option is obsolete and replaced by allow_bignum.
676
677 $json = $json->allow_nonref ([$enable])
678 $enabled = $json->get_allow_nonref
679 If $enable is true (or missing), then the "encode" method can
680 convert a non-reference into its corresponding string, number or
681 null JSON value, which is an extension to RFC4627. Likewise,
682 "decode" will accept those JSON values instead of croaking.
683
684 If $enable is false, then the "encode" method will croak if it isn't
685 passed an arrayref or hashref, as JSON texts must either be an
686 object or array. Likewise, "decode" will croak if given something
687 that is not a JSON object or array.
688
689 Example, encode a Perl scalar as JSON value with enabled
690 "allow_nonref", resulting in an invalid JSON text:
691
692 Cpanel::JSON::XS->new->allow_nonref->encode ("Hello, World!")
693 => "Hello, World!"
694
695 $json = $json->allow_unknown ([$enable])
696 $enabled = $json->get_allow_unknown
697 If $enable is true (or missing), then "encode" will *not* throw an
698 exception when it encounters values it cannot represent in JSON (for
699 example, filehandles) but instead will encode a JSON "null" value.
 Note that blessed objects are not included here and are handled
 separately by "allow_blessed" and "convert_blessed".
702
703 If $enable is false (the default), then "encode" will throw an
704 exception when it encounters anything it cannot encode as JSON.
705
706 This option does not affect "decode" in any way, and it is
707 recommended to leave it off unless you know your communications
708 partner.
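
 A sketch of allow_unknown, using a code reference as the value that
 JSON cannot represent (the text above mentions filehandles; a code
 reference is assumed here to behave the same way):

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $data = [ sub { 42 } ];    # a code reference has no JSON representation

# Default: encode throws an exception.
my $strict_ok = eval { Cpanel::JSON::XS->new->encode($data); 1 } ? 1 : 0;

# allow_unknown: the value is encoded as null instead.
my $lenient = Cpanel::JSON::XS->new->allow_unknown->encode($data);

print "strict encode ", ($strict_ok ? "succeeded" : "failed"), "\n";
print "$lenient\n";    # [null]
```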
709
710 $json = $json->allow_stringify ([$enable])
711 $enabled = $json->get_allow_stringify
712 If $enable is true (or missing), then "encode" will stringify the
713 non-object perl value or reference. Note that blessed objects are
714 not included here and are handled separately by "allow_blessed" and
715 "convert_blessed". String references are stringified to the string
716 value, other references as in perl.
717
718 This option does not affect "decode" in any way.
719
720 This option is special to this module, it is not supported by other
721 encoders. So it is not recommended to use it.
722
723 $json = $json->require_types ([$enable])
 $enabled = $json->get_require_types
 If $enable is true (or missing), then "encode" requires either
 "type_all_string" to be enabled or a second argument supplying the
 JSON types. See Cpanel::JSON::XS::Type. When "type_all_string" is
 not enabled and the second argument is not provided (or is undef),
 "encode" croaks. It also croaks when the type specification given
 to "encode" is incomplete for the provided structure.
733
734 $json = $json->type_all_string ([$enable])
 $enabled = $json->get_type_all_string
 If $enable is true (or missing), then "encode" will always produce
 stable, deterministic JSON string types in the resulting output.

 When $enable is false, the encoded JSON output may differ between
 Perl versions and may depend on which modules are loaded.

 This is useful if you need deterministic JSON types, independent of
 the Perl version and other modules in use, but do not want to write
 complicated type definitions for Cpanel::JSON::XS::Type.
748
749 $json = $json->allow_dupkeys ([$enable])
750 $enabled = $json->get_allow_dupkeys
751 If $enable is true (or missing), then the "decode" method will not
752 die when it encounters duplicate keys in a hash. "allow_dupkeys" is
753 also enabled in the "relaxed" mode.
754
 The JSON spec allows duplicate names in objects but recommends
 against them. Perl hashes cannot hold duplicate keys, so a Perl
 parser that accepts them silently keeps only the last value found
 for each name.
759
760 See <http://seriot.ch/parsing_json.php#24>: RFC 7159 section 4: "The
761 names within an object should be unique."
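
 A sketch of decoding an object with a repeated key. With allow_dupkeys
 the last value is kept; the default decoder is wrapped in eval because
 it is described above as dying on such input.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $text = '{"a":1,"a":2}';    # the key "a" appears twice

# Default decoder: wrapped in eval, since duplicate keys are rejected.
my $strict_ok = eval { Cpanel::JSON::XS->new->decode($text); 1 } ? 1 : 0;
print "strict decode ", ($strict_ok ? "succeeded" : "failed"), "\n";

# With allow_dupkeys the last value found wins.
my $h = Cpanel::JSON::XS->new->allow_dupkeys->decode($text);
print "a = $h->{a}\n";    # a = 2
```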
762
763 $json = $json->allow_blessed ([$enable])
764 $enabled = $json->get_allow_blessed
765 If $enable is true (or missing), then the "encode" method will not
766 barf when it encounters a blessed reference. Instead, the value of
767 the convert_blessed option will decide whether "null"
768 ("convert_blessed" disabled or no "TO_JSON" method found) or a
769 representation of the object ("convert_blessed" enabled and
770 "TO_JSON" method found) is being encoded. Has no effect on "decode".
771
772 If $enable is false (the default), then "encode" will throw an
773 exception when it encounters a blessed object without
774 "convert_blessed" and a "TO_JSON" method.
775
776 This setting has no effect on "decode".
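
 A sketch of allow_blessed on its own, without convert_blessed. The
 class name My::Thing is hypothetical and has no TO_JSON method.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# A blessed hash from a hypothetical class without a TO_JSON method.
my $obj = bless { id => 7 }, 'My::Thing';

# Default: encode throws an exception on the blessed reference.
my $strict_ok = eval { Cpanel::JSON::XS->new->encode([$obj]); 1 } ? 1 : 0;

# allow_blessed: the object is encoded as null.
my $text = Cpanel::JSON::XS->new->allow_blessed->encode([$obj]);

print "strict encode ", ($strict_ok ? "succeeded" : "failed"), "\n";
print "$text\n";    # [null]
```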
777
778 $json = $json->convert_blessed ([$enable])
779 $enabled = $json->get_convert_blessed
If $enable is true (or missing), then "encode", upon encountering a
blessed object, will check for the availability of the "TO_JSON"
method on the object's class. If found, it will be called in scalar
context and the resulting scalar will be encoded instead of the
object. If no "TO_JSON" method is found, a stringification overload
method is tried next. If neither is found, the value of
"allow_blessed" will decide what to do.
787
788 The "TO_JSON" method may safely call die if it wants. If "TO_JSON"
789 returns other blessed objects, those will be handled in the same
790 way. "TO_JSON" must take care of not causing an endless recursion
791 cycle (== crash) in this case. The same care must be taken with
792 calling encode in stringify overloads (even if this works by luck in
793 older perls) or other callbacks. The name of "TO_JSON" was chosen
794 because other methods called by the Perl core (== not by the user of
795 the object) are usually in upper case letters and to avoid
796 collisions with any "to_json" function or method.
797
798 If $enable is false (the default), then "encode" will not consider
799 this type of conversion.
800
801 This setting has no effect on "decode".
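
As an illustration, here is a sketch with a hypothetical "My::Point"
class whose "TO_JSON" method returns a plain hash. The class and its
fields are assumptions made for this example:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# My::Point is a hypothetical class used only for this sketch.
package My::Point;
sub new     { my ($class, $x, $y) = @_; return bless { x => $x, y => $y }, $class }
sub TO_JSON { my ($self) = @_; return { x => $self->{x}, y => $self->{y} } }

package main;

# canonical sorts the keys so the output is stable.
my $json = Cpanel::JSON::XS->new->canonical->convert_blessed;
my $text = $json->encode([ My::Point->new(1, 2) ]);
print "$text\n";
```

The information that the value was a "My::Point" object is lost in
the JSON text; only the returned hash is encoded.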
802
803 $json = $json->allow_tags ([$enable])
804 $enabled = $json->get_allow_tags
805 See "OBJECT SERIALIZATION" for details.
806
If $enable is true (or missing), then "encode", upon encountering a
blessed object, will check for the availability of the "FREEZE"
method on the object's class. If found, it will be used to serialize
the object into a nonstandard tagged JSON value (which standard
JSON decoders cannot decode).
812
813 It also causes "decode" to parse such tagged JSON values and
814 deserialize them via a call to the "THAW" method.
815
816 If $enable is false (the default), then "encode" will not consider
817 this type of conversion, and tagged JSON values will cause a parse
818 error in "decode", as if tags were not part of the grammar.
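
A round trip through "FREEZE" and "THAW" might look like the sketch
below. "My::Date" is a hypothetical class, and the "THAW" argument
order (class, serializer name, frozen values) follows the
Types::Serialiser protocol as described in "OBJECT SERIALIZATION":

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# My::Date is a hypothetical class used only for this sketch.
package My::Date;
sub new    { my ($class, @ymd) = @_; return bless { ymd => [@ymd] }, $class }
# FREEZE receives the object and the serializer name ("JSON").
sub FREEZE { my ($self, $serializer) = @_; return @{ $self->{ymd} } }
# THAW receives the class, the serializer name and the frozen values.
sub THAW   { my ($class, $serializer, @ymd) = @_; return $class->new(@ymd) }

package main;

my $json = Cpanel::JSON::XS->new->allow_tags;
my $text = $json->encode([ My::Date->new(2013, 10, 29) ]);
print "$text\n";   # contains a tagged value, not standard JSON

my $copy = $json->decode($text)->[0];
print ref($copy), ": @{ $copy->{ymd} }\n";
```

Both sides must enable "allow_tags"; a decoder without it treats the
tagged value as a syntax error.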
819
820 $json = $json->filter_json_object ([$coderef->($hashref)])
When $coderef is specified, it will be called from "decode" each
time it decodes a JSON object. The only argument is a reference to
the newly-created hash. If the code reference returns a single
scalar (which need not be a reference), this value (i.e. a copy of
that scalar to avoid aliasing) is inserted into the deserialized
data structure. If it returns an empty list (NOTE: *not* "undef",
which is a valid scalar), the original deserialized hash will be
inserted. This setting can slow down decoding considerably.
829
830 When $coderef is omitted or undefined, any existing callback will be
831 removed and "decode" will not change the deserialized hash in any
832 way.
833
834 Example, convert all JSON objects into the integer 5:
835
    my $js = Cpanel::JSON::XS->new->filter_json_object (sub { 5 });
    # returns [5]
    $js->decode ('[{}]');
    # throws an exception because allow_nonref is not enabled,
    # so a lone 5 is not allowed.
    $js->decode ('{"a":1, "b":2}');
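
A slightly more selective callback shows the difference between
returning a scalar and returning the empty list. The "celsius" key is
made up for this sketch:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new->filter_json_object(sub {
    my ($hash) = @_;

    # Replace {"celsius": N} objects by the plain number N.
    return $hash->{celsius}
        if keys(%$hash) == 1 && exists $hash->{celsius};

    # Empty list (not undef): keep the original hash.
    return;
});

my $data = $json->decode('[{"celsius":21},{"name":"x"}]');
print "first: $data->[0], second is a ", ref($data->[1]), "\n";
```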
842
843 $json = $json->filter_json_single_key_object ($key [=>
844 $coderef->($value)])
Works somewhat similarly to "filter_json_object", but is only
called for JSON objects having a single key named $key.
847
848 This $coderef is called before the one specified via
849 "filter_json_object", if any. It gets passed the single value in the
850 JSON object. If it returns a single value, it will be inserted into
851 the data structure. If it returns nothing (not even "undef" but the
852 empty list), the callback from "filter_json_object" will be called
853 next, as if no single-key callback were specified.
854
855 If $coderef is omitted or undefined, the corresponding callback will
856 be disabled. There can only ever be one callback for a given key.
857
As this callback gets called less often than the
"filter_json_object" one, decoding speed will not usually suffer as
much. Therefore, single-key objects make excellent targets to
serialize Perl objects into, especially as single-key JSON objects
are as close to the type-tagged value concept as JSON gets (it's
basically an ID/VALUE tuple). Of course, JSON does not support this
in any way, so you need to make sure your data never looks like a
serialized Perl hash.
866
867 Typical names for the single object key are "__class_whatever__", or
868 "$__dollars_are_rarely_used__$" or "}ugly_brace_placement", or even
869 things like "__class_md5sum(classname)__", to reduce the risk of
870 clashing with real hashes.
871
872 Example, decode JSON objects of the form "{ "__widget__" => <id> }"
873 into the corresponding $WIDGET{<id>} object:
874
    # return whatever is in $WIDGET{5}:
    Cpanel::JSON::XS
       ->new
       ->filter_json_single_key_object (__widget__ => sub {
             $WIDGET{ $_[0] }
          })
       ->decode ('{"__widget__": 5}');

    # this can be used with a TO_JSON method in some "widget" class
    # for serialization to json:
    sub WidgetBase::TO_JSON {
       my ($self) = @_;

       unless ($self->{id}) {
          $self->{id} = ..get..some..id..;
          $WIDGET{$self->{id}} = $self;
       }

       return { __widget__ => $self->{id} };
    }
895
896 $json = $json->shrink ([$enable])
897 $enabled = $json->get_shrink
898 Perl usually over-allocates memory a bit when allocating space for
899 strings. This flag optionally resizes strings generated by either
900 "encode" or "decode" to their minimum size possible. This can save
901 memory when your JSON texts are either very very long or you have
902 many short strings. It will also try to downgrade any strings to
903 octet-form if possible: perl stores strings internally either in an
904 encoding called UTF-X or in octet-form. The latter cannot store
905 everything but uses less space in general (and some buggy Perl or C
906 code might even rely on that internal representation being used).
907
908 The actual definition of what shrink does might change in future
909 versions, but it will always try to save space at the expense of
910 time.
911
912 If $enable is true (or missing), the string returned by "encode"
913 will be shrunk-to-fit, while all strings generated by "decode" will
914 also be shrunk-to-fit.
915
916 If $enable is false, then the normal perl allocation algorithms are
917 used. If you work with your data, then this is likely to be faster.
918
919 In the future, this setting might control other things, such as
920 converting strings that look like integers or floats into integers
921 or floats internally (there is no difference on the Perl level),
922 saving space.
923
924 $json = $json->max_depth ([$maximum_nesting_depth])
925 $max_depth = $json->get_max_depth
926 Sets the maximum nesting level (default 512) accepted while encoding
927 or decoding. If a higher nesting level is detected in JSON text or a
928 Perl data structure, then the encoder and decoder will stop and
929 croak at that point.
930
Nesting level is defined by the number of hash- or arrayrefs that
the encoder needs to traverse to reach a given point, or the number
of "{" or "[" characters without their matching closing brackets
crossed to reach a given character in a string.

Setting the maximum depth to one disallows any nesting, which
ensures that the top-level value is only a single hash/object or
array.
938
939 If no argument is given, the highest possible setting will be used,
940 which is rarely useful.
941
942 Note that nesting is implemented by recursion in C. The default
943 value has been chosen to be as large as typical operating systems
944 allow without crashing.
945
946 See "SECURITY CONSIDERATIONS", below, for more info on why this is
947 useful.
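
A small sketch of the depth limit described above, using a maximum
depth of one so that a flat array is accepted and a nested one is
rejected:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new->max_depth(1);

# A single, flat array stays within the limit.
my $flat = $json->decode('[1,2,3]');
print "flat array has ", scalar(@$flat), " elements\n";

# One level of nesting exceeds the limit, so decode croaks.
my $nested = eval { $json->decode('[[1]]') };
print "nested array rejected\n" unless defined $nested;
```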
948
949 $json = $json->max_size ([$maximum_string_size])
950 $max_size = $json->get_max_size
Set the maximum length (in bytes) a JSON text may have when
decoding is attempted. The default is 0, meaning no limit. When
"decode" is called on a string that is longer than this many bytes,
it will not attempt to decode the string but throw an exception.
This setting has no effect on "encode" (yet).
956
957 If no argument is given, the limit check will be deactivated (same
958 as when 0 is specified).
959
960 See "SECURITY CONSIDERATIONS", below, for more info on why this is
961 useful.
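
The size check can be sketched like this; the limit of 8 bytes is
chosen only to keep the example short:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new->max_size(8);

# 7 bytes: within the limit, decoded normally.
my $small = $json->decode('[1,2,3]');
print "small text decoded, ", scalar(@$small), " elements\n";

# 11 bytes: over the limit, so decode throws before parsing.
my $large = eval { $json->decode('[1,2,3,4,5]') };
print "large text rejected\n" unless defined $large;
```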
962
963 $json->stringify_infnan ([$infnan_mode = 1])
964 $infnan_mode = $json->get_stringify_infnan
Get or set how Cpanel::JSON::XS encodes "inf", "-inf" or "nan" for
numeric values (and qnan, snan or negative nan on some platforms).

"null": infnan_mode = 0. Always encodes "null", similar to most
JSON modules in other languages.

stringified: infnan_mode = 1. As in Mojo::JSON. Platform-specific
strings, stringified via sprintf("%g") and wrapped in double quotes.

inf/nan: infnan_mode = 2. As in JSON::XS and older releases. Passes
through platform-dependent values, which is invalid JSON.
Stringified via sprintf("%g"), but without double quotes.

"inf/-inf/nan": infnan_mode = 3. Platform-independent inf/nan/-inf
strings. No QNAN/SNAN/negative NAN support; these are unified to
"nan". Much easier to detect, but may conflict with valid strings.
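
A sketch comparing mode 0 and mode 3, the two modes whose output does
not depend on the platform. The method is called as a separate
statement here because its return value is not relied upon:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $inf  = 9**9**9;    # overflows to infinity
my $json = Cpanel::JSON::XS->new;

# Mode 0: inf and -inf are encoded as null.
$json->stringify_infnan(0);
my $as_null = $json->encode([$inf, -$inf]);

# Mode 3: platform-independent quoted strings.
$json->stringify_infnan(3);
my $as_text = $json->encode([$inf, -$inf]);

print "mode 0: $as_null\n";
print "mode 3: $as_text\n";
```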
981
982 $json_text = $json->encode ($perl_scalar, $json_type)
Converts the given Perl data structure (a simple scalar or a
reference to a hash or array) to its JSON representation. Simple
scalars will be converted into JSON string or number sequences,
while references to arrays become JSON arrays and references to
hashes become JSON objects. Undefined Perl values (e.g. "undef")
become JSON "null" values. Simple scalars never produce "true" or
"false"; see "MAPPING", below, for the values that do.
990
991 For the type argument see Cpanel::JSON::XS::Type.
992
993 $perl_scalar = $json->decode ($json_text, my $json_type)
994 The opposite of "encode": expects a JSON text and tries to parse it,
995 returning the resulting simple scalar or reference. Croaks on error.
996
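JSON numbers and strings become simple Perl scalars. JSON arrays
become Perl arrayrefs and JSON objects become Perl hashrefs. "true"
and "false" become boolean objects that act like 1 and 0 (or the
plain values 1 and 0 when "unblessed_bool" is set), and "null"
becomes "undef". See "MAPPING", below, for details.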
1000
1001 For the type argument see Cpanel::JSON::XS::Type.
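
A minimal round trip through "encode" and "decode"; "canonical" is
enabled only so that the key order in the output is predictable, and
the data is invented for this sketch:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json = Cpanel::JSON::XS->new->canonical;

# Hashref -> JSON object, arrayref -> JSON array, undef -> null.
my $text = $json->encode({
    name  => "widget",
    sizes => [1, 2.5],
    owner => undef,
});
print "$text\n";

# And back again into a Perl data structure.
my $data = $json->decode($text);
print "name: $data->{name}, sizes: @{ $data->{sizes} }\n";
```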
1002
1003 ($perl_scalar, $characters) = $json->decode_prefix ($json_text)
1004 This works like the "decode" method, but instead of raising an
1005 exception when there is trailing garbage after the first JSON
1006 object, it will silently stop parsing there and return the number of
1007 characters consumed so far.
1008
1009 This is useful if your JSON texts are not delimited by an outer
1010 protocol and you need to know where the JSON text ends.
1011
1012 Cpanel::JSON::XS->new->decode_prefix ("[1] the tail")
1013 => ([1], 3)
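
The returned count can be used to split off the remaining text, as in
this sketch (the input string is made up):

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $input = '{"id":7} trailing text';

# List context: the decoded value and the number of characters used.
my ($data, $consumed) = Cpanel::JSON::XS->new->decode_prefix($input);

# Everything after the JSON text is left for the caller.
my $rest = substr $input, $consumed;

print "id = $data->{id}, consumed = $consumed, rest = '$rest'\n";
```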
1014
1015 $json->to_json ($perl_hash_or_arrayref)
1016 Deprecated method for perl 5.8 and newer. Use encode_json instead.
1017
1018 $json->from_json ($utf8_encoded_json_text)
1019 Deprecated method for perl 5.8 and newer. Use decode_json instead.
1020
1021INCREMENTAL PARSING
1022 In some cases, there is the need for incremental parsing of JSON texts.
1023 While this module always has to keep both JSON text and resulting Perl
1024 data structure in memory at one time, it does allow you to parse a JSON
1025 stream incrementally. It does so by accumulating text until it has a
1026 full JSON object, which it then can decode. This process is similar to
1027 using "decode_prefix" to see if a full JSON object is available, but is
1028 much more efficient (and can be implemented with a minimum of method
1029 calls).
1030
Cpanel::JSON::XS will only attempt to parse the JSON text once it is
sure it has enough text to get a decisive result, using a very simple
but truly incremental parser. This means that it sometimes won't stop as
early as the full parser; for example, it doesn't detect mismatched
brackets. The only thing it guarantees is that it starts decoding as
soon as a syntactically valid JSON text has been seen. This means you
need to set resource limits (e.g. "max_size") to ensure the parser will
stop parsing in the presence of syntax errors.
1039
1040 The following methods implement this incremental parser.
1041
1042 [void, scalar or list context] = $json->incr_parse ([$string])
1043 This is the central parsing function. It can both append new text
1044 and extract objects from the stream accumulated so far (both of
1045 these functions are optional).
1046
1047 If $string is given, then this string is appended to the already
1048 existing JSON fragment stored in the $json object.
1049
1050 After that, if the function is called in void context, it will
1051 simply return without doing anything further. This can be used to
1052 add more text in as many chunks as you want.
1053
1054 If the method is called in scalar context, then it will try to
1055 extract exactly *one* JSON object. If that is successful, it will
1056 return this object, otherwise it will return "undef". If there is a
1057 parse error, this method will croak just as "decode" would do (one
1058 can then use "incr_skip" to skip the erroneous part). This is the
1059 most common way of using the method.
1060
1061 And finally, in list context, it will try to extract as many objects
1062 from the stream as it can find and return them, or the empty list
1063 otherwise. For this to work, there must be no separators between the
1064 JSON objects or arrays, instead they must be concatenated
1065 back-to-back. If an error occurs, an exception will be raised as in
1066 the scalar context case. Note that in this case, any
1067 previously-parsed JSON texts will be lost.
1068
1069 Example: Parse some JSON arrays/objects in a given string and return
1070 them.
1071
1072 my @objs = Cpanel::JSON::XS->new->incr_parse ("[5][7][1,2]");
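
The void and scalar context behaviour can be combined to feed text
that arrives in arbitrary chunks. In this sketch the chunk boundaries
are chosen deliberately so that they split the JSON texts:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $json   = Cpanel::JSON::XS->new;
my @chunks = ('[1,', '2]{"a"', ':3}');
my @objs;

for my $chunk (@chunks) {
    # Void context: only append the chunk to the internal buffer.
    $json->incr_parse($chunk);

    # Scalar context: extract one complete text at a time, or undef
    # if the buffer does not yet hold a complete one.
    while (defined(my $obj = $json->incr_parse)) {
        push @objs, $obj;
    }
}

print "decoded ", scalar(@objs), " JSON texts\n";
```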
1073
1074 $lvalue_string = $json->incr_text (>5.8 only)
This method returns the currently stored JSON fragment as an lvalue,
that is, you can manipulate it. This *only* works 1. when a
preceding call to "incr_parse" in *scalar context* successfully
returned an object, and 2. only with Perl >= 5.8.

Under all other circumstances you must not call this function (I
mean it: although in simple tests it might actually work, it *will*
fail under real world conditions). As a special exception, you can
also call this method before having parsed anything.
1084
1085 This function is useful in two cases: a) finding the trailing text
1086 after a JSON object or b) parsing multiple JSON objects separated by
1087 non-JSON text (such as commas).
1088
1089 $json->incr_skip
1090 This will reset the state of the incremental parser and will remove
1091 the parsed text from the input buffer so far. This is useful after
1092 "incr_parse" died, in which case the input buffer and incremental
1093 parser state is left unchanged, to skip the text parsed so far and
1094 to reset the parse state.
1095
1096 The difference to "incr_reset" is that only text until the parse
1097 error occurred is removed.
1098
1099 $json->incr_reset
1100 This completely resets the incremental parser, that is, after this
1101 call, it will be as if the parser had never parsed anything.
1102
1103 This is useful if you want to repeatedly parse JSON objects and want
1104 to ignore any trailing data, which means you have to reset the
1105 parser after each successful decode.
1106
1107 LIMITATIONS
1108 All options that affect decoding are supported, except "allow_nonref".
1109 The reason for this is that it cannot be made to work sensibly: JSON
1110 objects and arrays are self-delimited, i.e. you can concatenate them
1111 back to back and still decode them perfectly. This does not hold true
1112 for JSON numbers, however.
1113
For example, is the string 1 a single JSON number, or is it simply the
start of 12? Or is 12 a single JSON number, or the concatenation of 1
and 2? In neither case can you tell, and this is why Cpanel::JSON::XS
takes the conservative route and disallows this case.
1118
1119 EXAMPLES
1120 Some examples will make all this clearer. First, a simple example that
1121 works similarly to "decode_prefix": We want to decode the JSON object at
1122 the start of a string and identify the portion after the JSON object:
1123
    my $text = "[1,2,3] hello";

    my $json = Cpanel::JSON::XS->new;

    my $obj = $json->incr_parse ($text)
       or die "expected JSON object or array at beginning of string";

    my $tail = $json->incr_text;
    # $tail now contains " hello"
1133
1134 Easy, isn't it?
1135
1136 Now for a more complicated example: Imagine a hypothetical protocol
1137 where you read some requests from a TCP stream, and each request is a
1138 JSON array, without any separation between them (in fact, it is often
1139 useful to use newlines as "separators", as these get interpreted as
1140 whitespace at the start of the JSON text, which makes it possible to
1141 test said protocol with "telnet"...).
1142
1143 Here is how you'd do it (it is trivial to write this in an event-based
1144 manner):
1145
    my $json = Cpanel::JSON::XS->new;

    # read some data from the socket
    while (sysread $socket, my $buf, 4096) {

       # split and decode as many requests as possible
       for my $request ($json->incr_parse ($buf)) {
          # act on the $request
       }
    }
1156
1157 Another complicated example: Assume you have a string with JSON objects
1158 or arrays, all separated by (optional) comma characters (e.g. "[1],[2],
1159 [3]"). To parse them, we have to skip the commas between the JSON texts,
1160 and here is where the lvalue-ness of "incr_text" comes in useful:
1161
    my $text = "[1],[2], [3]";
    my $json = Cpanel::JSON::XS->new;

    # void context, so no parsing done
    $json->incr_parse ($text);

    # now extract as many objects as possible. note the
    # use of scalar context so incr_text can be called.
    while (my $obj = $json->incr_parse) {
       # do something with $obj

       # now skip the optional comma
       $json->incr_text =~ s/^ \s* , //x;
    }
1176
Now let's go for a very complex example: Assume that you have a gigantic
JSON array-of-objects, many gigabytes in size, and you want to parse it,
but you cannot load it into memory fully (this has actually happened in
the real world :).
1181
1182 Well, you lost, you have to implement your own JSON parser. But
1183 Cpanel::JSON::XS can still help you: You implement a (very simple) array
1184 parser and let JSON decode the array elements, which are all full JSON
1185 objects on their own (this wouldn't work if the array elements could be
1186 JSON numbers, for example):
1187
    my $json = Cpanel::JSON::XS->new;

    # open the monster
    open my $fh, "<", "bigfile.json"
       or die "bigfile: $!";

    # first parse the initial "["
    for (;;) {
       sysread $fh, my $buf, 65536
          or die "read error: $!";
       $json->incr_parse ($buf); # void context, so no parsing

       # Exit the loop once we found and removed(!) the initial "[".
       # In essence, we are (ab-)using the $json object as a simple scalar
       # we append data to.
       last if $json->incr_text =~ s/^ \s* \[ //x;
    }

    # now we have skipped the initial "[", so continue
    # parsing all the elements.
    for (;;) {
       # in this loop we read data until we got a single JSON object
       for (;;) {
          if (my $obj = $json->incr_parse) {
             # do something with $obj
             last;
          }

          # add more data
          sysread $fh, my $buf, 65536
             or die "read error: $!";
          $json->incr_parse ($buf); # void context, so no parsing
       }

       # in this loop we read data until we either found and parsed the
       # separating "," between elements, or the final "]"
       for (;;) {
          # first skip whitespace
          $json->incr_text =~ s/^\s*//;

          # if we find "]", we are done
          if ($json->incr_text =~ s/^\]//) {
             print "finished.\n";
             exit;
          }

          # if we find ",", we can continue with the next element
          if ($json->incr_text =~ s/^,//) {
             last;
          }

          # if we find anything else, we have a parse error!
          if (length $json->incr_text) {
             die "parse error near ", $json->incr_text;
          }

          # else add more data
          sysread $fh, my $buf, 65536
             or die "read error: $!";
          $json->incr_parse ($buf); # void context, so no parsing
       }
    }
1249
1250 This is a complex example, but most of the complexity comes from the
1251 fact that we are trying to be correct (bear with me if I am wrong, I
1252 never ran the above example :).
1253
1254BOM
On decode, all Unicode byte order marks (BOMs) are detected: UTF-8,
UTF-16LE, UTF-16BE, UTF-32LE and UTF-32BE.
1257
1258 The BOM encoding is set only for one specific decode call, it does not
1259 change the state of the JSON object.
1260
Warning: With perls older than 5.20 you need to load the Encode module
before decoding a text with a multibyte BOM, i.e. >= UTF-16. Otherwise
an error is thrown. This is an implementation limitation and might get
fixed later.
1264
1265 See <https://tools.ietf.org/html/rfc7159#section-8.1> *"JSON text SHALL
1266 be encoded in UTF-8, UTF-16, or UTF-32."*
1267
1268 *"Implementations MUST NOT add a byte order mark to the beginning of a
1269 JSON text", "implementations (...) MAY ignore the presence of a byte
1270 order mark rather than treating it as an error".*
1271
1272 See also <http://www.unicode.org/faq/utf_bom.html#BOM>.
1273
1274 Beware that Cpanel::JSON::XS is currently the only JSON module which
1275 does accept and decode a BOM.
1276
The latest JSON spec
<https://www.greenbytes.de/tech/webdav/rfc8259.html#character.encoding>
forbids the usage of UTF-16 or UTF-32; the character encoding is UTF-8.
Thus in subsequent updates UTF-16 or UTF-32 BOMs will throw an error.
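
A minimal sketch of decoding a text that starts with a UTF-8 BOM,
assuming a release with the BOM detection described in this section.
The payload is ASCII-only to keep the example independent of other
encoding questions:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS qw(decode_json);

# The three bytes EF BB BF are the UTF-8 byte order mark.
my $bytes = "\xEF\xBB\xBF" . '{"ok":1}';

# The BOM is detected and skipped for this one decode call.
my $data = decode_json($bytes);
print "ok = $data->{ok}\n";
```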
1282
1283MAPPING
1284 This section describes how Cpanel::JSON::XS maps Perl values to JSON
1285 values and vice versa. These mappings are designed to "do the right
1286 thing" in most circumstances automatically, preserving round-tripping
1287 characteristics (what you put in comes out as something equivalent).
1288
1289 For the more enlightened: note that in the following descriptions,
1290 lowercase *perl* refers to the Perl interpreter, while uppercase *Perl*
1291 refers to the abstract Perl language itself.
1292
1293 JSON -> PERL
1294 object
1295 A JSON object becomes a reference to a hash in Perl. No ordering of
1296 object keys is preserved (JSON does not preserve object key ordering
1297 itself).
1298
1299 array
1300 A JSON array becomes a reference to an array in Perl.
1301
1302 string
1303 A JSON string becomes a string scalar in Perl - Unicode codepoints
1304 in JSON are represented by the same codepoints in the Perl string,
1305 so no manual decoding is necessary.
1306
1307 number
1308 A JSON number becomes either an integer, numeric (floating point) or
1309 string scalar in perl, depending on its range and any fractional
1310 parts. On the Perl level, there is no difference between those as
1311 Perl handles all the conversion details, but an integer may take
1312 slightly less memory and might represent more values exactly than
1313 floating point numbers.
1314
1315 If the number consists of digits only, Cpanel::JSON::XS will try to
1316 represent it as an integer value. If that fails, it will try to
1317 represent it as a numeric (floating point) value if that is possible
1318 without loss of precision. Otherwise it will preserve the number as
1319 a string value (in which case you lose roundtripping ability, as the
1320 JSON number will be re-encoded to a JSON string).
1321
1322 Numbers containing a fractional or exponential part will always be
1323 represented as numeric (floating point) values, possibly at a loss
1324 of precision (in which case you might lose perfect roundtripping
1325 ability, but the JSON number will still be re-encoded as a JSON
1326 number).
1327
1328 Note that precision is not accuracy - binary floating point values
1329 cannot represent most decimal fractions exactly, and when converting
1330 from and to floating point, "Cpanel::JSON::XS" only guarantees
1331 precision up to but not including the least significant bit.
1332
1333 true, false
1334 When "unblessed_bool" is set to true, then JSON "true" becomes 1 and
1335 JSON "false" becomes 0.
1336
1337 Otherwise these JSON atoms become "JSON::PP::true" and
1338 "JSON::PP::false", respectively. They are "JSON::PP::Boolean"
1339 objects and are overloaded to act almost exactly like the numbers 1
1340 and 0. You can check whether a scalar is a JSON boolean by using the
1341 "Cpanel::JSON::XS::is_bool" function.
1342
In the other direction, from Perl to JSON, "!0" (which is
represented as "yes") becomes "true", and "!1" (which is
represented as "no") becomes "false".
1346
1347 Via Cpanel::JSON::XS::Type you can now even force negation in
1348 "encode", without overloading of "!":
1349
1350 my $false = Cpanel::JSON::XS::false;
1351 print($json->encode([!$false], [JSON_TYPE_BOOL]));
1352 => [true]
1353
1354 null
1355 A JSON null atom becomes "undef" in Perl.
1356
1357 shell-style comments ("# *text*")
1358 As a nonstandard extension to the JSON syntax that is enabled by the
1359 "relaxed" setting, shell-style comments are allowed. They can start
1360 anywhere outside strings and go till the end of the line.
1361
1362 tagged values ("(*tag*)*value*").
1363 Another nonstandard extension to the JSON syntax, enabled with the
1364 "allow_tags" setting, are tagged values. In this implementation, the
1365 *tag* must be a perl package/class name encoded as a JSON string,
1366 and the *value* must be a JSON array encoding optional constructor
1367 arguments.
1368
1369 See "OBJECT SERIALIZATION", below, for details.
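
The standard mappings in this list can be observed with a short
sketch such as the following. The JSON text is made up, and only
properties stated above are inspected:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $data = Cpanel::JSON::XS->new->decode(
    '{"list":[1,2.5,"text"],"flag":true,"none":null}'
);

printf "object -> %s\n", ref $data;           # hash reference
printf "array  -> %s\n", ref $data->{list};   # array reference
printf "true   -> %s\n", ref $data->{flag};   # boolean object
print  "flag is a JSON boolean\n"
    if Cpanel::JSON::XS::is_bool($data->{flag});
print  "null became undef\n" unless defined $data->{none};

# With unblessed_bool the booleans are plain scalars instead.
my $plain = Cpanel::JSON::XS->new->unblessed_bool->decode('[true,false]');
print "unblessed: $plain->[0] and $plain->[1]\n";
```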
1370
1371 PERL -> JSON
1372 The mapping from Perl to JSON is slightly more difficult, as Perl is a
1373 truly typeless language, so we can only guess which JSON type is meant
1374 by a Perl value.
1375
1376 hash references
1377 Perl hash references become JSON objects. As there is no inherent
1378 ordering in hash keys (or JSON objects), they will usually be
1379 encoded in a pseudo-random order that can change between runs of the
1380 same program but stays generally the same within a single run of a
1381 program. Cpanel::JSON::XS can optionally sort the hash keys
1382 (determined by the *canonical* flag), so the same datastructure will
1383 serialize to the same JSON text (given same settings and version of
1384 Cpanel::JSON::XS), but this incurs a runtime overhead and is only
1385 rarely useful, e.g. when you want to compare some JSON text against
1386 another for equality.
1387
1388 array references
1389 Perl array references become JSON arrays.
1390
1391 other references
1392 Other unblessed references are generally not allowed and will cause
1393 an exception to be thrown, except for references to the integers 0
1394 and 1, which get turned into "false" and "true" atoms in JSON.
1395
1396 With the option "allow_stringify", you can ignore the exception and
1397 return the stringification of the perl value.
1398
1399 With the option "allow_unknown", you can ignore the exception and
1400 return "null" instead.
1401
    encode_json [\"x"]   # => cannot encode reference to scalar 'SCALAR(0x..)'
                         # unless the scalar is 0 or 1
    encode_json [\0, \1] # yields [false,true]

    # the options are set on a coder object, not on encode_json:
    Cpanel::JSON::XS->new->allow_stringify->encode ([\"x"]) # yields "x" unlike JSON::PP
    Cpanel::JSON::XS->new->allow_unknown->encode ([\"x"])   # yields null as in JSON::PP
1408
1409 Cpanel::JSON::XS::true, Cpanel::JSON::XS::false
1410 These special values become JSON true and JSON false values,
1411 respectively. You can also use "\1" and "\0" or "!0" and "!1"
1412 directly if you want.
1413
1414 encode_json [Cpanel::JSON::XS::false, Cpanel::JSON::XS::true] # yields [false,true]
1415 encode_json [!1, !0], [JSON_TYPE_BOOL, JSON_TYPE_BOOL] # yields [false,true]
1416
1417 eq/ne comparisons with true, false:
1418
1419 false is eq to the empty string or the string 'false' or the special
1420 empty string "!!0" or "!1", i.e. "SV_NO", or the numbers 0 or 0.0.
1421
1422 true is eq to the string 'true' or to the special string "!0" (i.e.
1423 "SV_YES") or to the numbers 1 or 1.0.
1424
1425 blessed objects
1426 Blessed objects are not directly representable in JSON, but
1427 "Cpanel::JSON::XS" allows various optional ways of handling objects.
1428 See "OBJECT SERIALIZATION", below, for details.
1429
See the "allow_blessed" and "convert_blessed" methods for the
various options on how to deal with this: basically, you can choose
between throwing an exception, encoding the object as "null", using
the object's overloaded stringification method, or providing your
own serializer method.
1435
1436 simple scalars
1437 Simple Perl scalars (any scalar that is not a reference) are the
1438 most difficult objects to encode: Cpanel::JSON::XS will encode
1439 undefined scalars or inf/nan as JSON "null" values and other scalars
1440 to either number or string in non-deterministic way which may be
1441 affected or changed by Perl version or any other loaded Perl module.
1442
1443 If you want to have stable and deterministic types in JSON encoder
1444 then use Cpanel::JSON::XS::Type.
1445
1446 Alternative way for deterministic types is to use "type_all_string"
1447 method when all perl scalars are encoded to JSON strings.
1448
1449 Non-deterministic behavior is following: scalars that have last been
1450 used in a string context before encoding as JSON strings, and
1451 anything else as number value:
1452
1453 # dump as number
1454 encode_json [2] # yields [2]
1455 encode_json [-3.0e17] # yields [-3e+17]
1456 my $value = 5; encode_json [$value] # yields [5]
1457
1458 # used as string, but the two representations are for the same number
1459 print $value;
1460 encode_json [$value] # yields [5]
1461
1462 # used as different string (non-matching dual-var)
1463 my $str = '0 but true';
1464 my $num = 1 + $str;
1465 encode_json [$num, $str] # yields [1,"0 but true"]
1466
1467 # undef becomes null
1468 encode_json [undef] # yields [null]
1469
1470 # inf or nan becomes null, unless you answered
1471 # "Do you want to handle inf/nan as strings" with yes
1472 encode_json [9**9**9] # yields [null]
1473
1474 You can force the type to be a JSON string by stringifying it:
1475
1476 my $x = 3.1; # some variable containing a number
1477 "$x"; # stringified
1478 $x .= ""; # another, more awkward way to stringify
1479 print $x; # perl does it for you, too, quite often
1480
1481 You can force the type to be a JSON number by numifying it:
1482
1483 my $x = "3"; # some variable containing a string
1484 $x += 0; # numify it, ensuring it will be dumped as a number
1485 $x *= 1; # same thing, the choice is yours.
1486
Note that numerical precision has the same meaning as under Perl (so
binary to decimal conversion follows the same rules as in Perl,
which can differ from other languages). Also, your perl interpreter
might expose extensions to the floating point numbers of your
platform, such as infinities or NaNs. These cannot be represented
in JSON, and thus null is returned instead. Optionally you can
configure it to stringify inf and nan values.
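
For deterministic output, a type specification can be passed as the
second argument to "encode". The following is a sketch using the
constants exported by Cpanel::JSON::XS::Type; the record and its
field names are invented for the example:

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;
use Cpanel::JSON::XS::Type;

my $json = Cpanel::JSON::XS->new->canonical;

# The values deliberately have the "wrong" Perl types.
my $record = { id => "42", name => 7, active => 1 };

# The type specification mirrors the structure of the data.
my $types = {
    id     => JSON_TYPE_INT,
    name   => JSON_TYPE_STRING,
    active => JSON_TYPE_BOOL,
};

my $text = $json->encode($record, $types);
print "$text\n";
```

The output types then follow the specification rather than the
history of each scalar.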
1494
1495 OBJECT SERIALIZATION
1496 As JSON cannot directly represent Perl objects, you have to choose
1497 between a pure JSON representation (without the ability to deserialize
1498 the object automatically again), and a nonstandard extension to the JSON
1499 syntax, tagged values.
1500
1501 SERIALIZATION
1502 What happens when "Cpanel::JSON::XS" encounters a Perl object depends on
1503 the "allow_blessed", "convert_blessed" and "allow_tags" settings, which
1504 are used in this order:
1505
1506 1. "allow_tags" is enabled and the object has a "FREEZE" method.
1507 In this case, "Cpanel::JSON::XS" uses the Types::Serialiser object
1508 serialization protocol to create a tagged JSON value, using a
1509 nonstandard extension to the JSON syntax.
1510
1511 This works by invoking the "FREEZE" method on the object, with the
1512 first argument being the object to serialize, and the second
1513 argument being the constant string "JSON" to distinguish it from
1514 other serializers.
1515
 The "FREEZE" method can return any number of values (i.e. zero or
 more). These values and the package/classname of the object will
 then be encoded as a tagged JSON value in the following format:

     ("classname")[FREEZE return values...]

 e.g.:

     ("URI")["http://www.google.com/"]
     ("MyDate")[2013,10,29]
     ("ImageData::JPEG")["Z3...VlCg=="]

 For example, the hypothetical "My::Object" "FREEZE" method might use
 the object's "type" and "id" members to encode the object:

     sub My::Object::FREEZE {
         my ($self, $serializer) = @_;

         ($self->{type}, $self->{id})
     }
1536
1537 2. "convert_blessed" is enabled and the object has a "TO_JSON" method.
1538 In this case, the "TO_JSON" method of the object is invoked in
1539 scalar context. It must return a single scalar that can be directly
1540 encoded into JSON. This scalar replaces the object in the JSON text.
1541
1542 For example, the following "TO_JSON" method will convert all URI
1543 objects to JSON strings when serialized. The fact that these values
1544 originally were URI objects is lost.
1545
     sub URI::TO_JSON {
         my ($uri) = @_;
         $uri->as_string
     }
1550
1551 3. "convert_blessed" is enabled and the object has a stringification
1552 overload.
1553 In this case, the overloaded "" method of the object is invoked in
1554 scalar context. It must return a single scalar that can be directly
1555 encoded into JSON. This scalar replaces the object in the JSON text.
1556
1557 For example, the following "" method will convert all URI objects to
1558 JSON strings when serialized. The fact that these values originally
1559 were URI objects is lost.
1560
     package URI;
     use overload '""' => sub { shift->as_string };
1563
1564 4. "allow_blessed" is enabled.
1565 The object will be serialized as a JSON null value.
1566
1567 5. none of the above
1568 If none of the settings are enabled or the respective methods are
1569 missing, "Cpanel::JSON::XS" throws an exception.
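
 The order of these rules can be observed with a small experiment. The
 sketch below assumes a hypothetical "My::Point" class that provides both
 "FREEZE" and "TO_JSON"; the class and its fields exist only for this
 illustration. The object is wrapped in an array so that the result is a
 JSON array regardless of the "allow_nonref" setting.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# Hypothetical class used only for this illustration.
package My::Point {
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub FREEZE  { my ($self, $serializer) = @_; return ($self->{x}, $self->{y}) }
    sub TO_JSON { my ($self) = @_; return { x => $self->{x}, y => $self->{y} } }
}

my $point = My::Point->new(x => 1, y => 2);

# Case 1: allow_tags and a FREEZE method yield a nonstandard tagged value.
my $tagged = Cpanel::JSON::XS->new->allow_tags->encode([$point]);

# Case 2: convert_blessed calls TO_JSON; canonical sorts keys for stable output.
my $plain = Cpanel::JSON::XS->new->convert_blessed->canonical->encode([$point]);

# Case 4: allow_blessed alone replaces the object with null.
my $null = Cpanel::JSON::XS->new->allow_blessed->encode([$point]);

# Case 5: with no setting enabled, encode throws an exception.
my $ok = eval { Cpanel::JSON::XS->new->encode([$point]); 1 };

print "$tagged\n";   # [("My::Point")[1,2]]
print "$plain\n";    # [{"x":1,"y":2}]
print "$null\n";     # [null]
print "default settings: ", ($ok ? "encoded" : "croaked"), "\n";   # croaked
```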
1570
1571 DESERIALIZATION
 For deserialization there are only two cases to consider: either
 nonstandard tagging was used, in which case "allow_tags" decides, or
 objects cannot automatically be deserialized, in which case you can
 use postprocessing or the "filter_json_object" or
 "filter_json_single_key_object" callbacks to get some real objects out
 of your JSON.
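
 For the untagged case, a callback can rebuild objects from plain JSON.
 The following is a minimal sketch using "filter_json_single_key_object";
 the "My::Widget" class and the "__widget__" marker key are assumptions
 made up for this example, not conventions of the module.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# Hypothetical class and marker key, chosen only for this illustration.
package My::Widget {
    sub new { my ($class, $id) = @_; return bless { id => $id }, $class }
}

my $decoder = Cpanel::JSON::XS->new->filter_json_single_key_object(
    __widget__ => sub {
        my ($id) = @_;                 # the value stored under "__widget__"
        return My::Widget->new($id);   # replaces the single-key JSON object
    },
);

my $data = $decoder->decode('[{"__widget__": 5}, {"other": 1}]');

print ref($data->[0]), "\n";   # My::Widget
print ref($data->[1]), "\n";   # HASH
```

 Only JSON objects with exactly that one key are passed to the callback;
 every other object is decoded as an ordinary hash.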
1578
 This section only considers the tagged value case: if a tagged JSON
 object is encountered during decoding and "allow_tags" is disabled, a
 parse error will result (as if tagged values were not part of the
 grammar).
1583
1584 If "allow_tags" is enabled, "Cpanel::JSON::XS" will look up the "THAW"
1585 method of the package/classname used during serialization (it will not
1586 attempt to load the package as a Perl module). If there is no such
1587 method, the decoding will fail with an error.
1588
1589 Otherwise, the "THAW" method is invoked with the classname as first
1590 argument, the constant string "JSON" as second argument, and all the
1591 values from the JSON array (the values originally returned by the
1592 "FREEZE" method) as remaining arguments.
1593
 The method must then return the object. While technically you can return
 any Perl scalar, you might have to enable the "allow_nonref" setting to
 make that work in all cases, so better return an actual blessed
 reference.
1598
1599 As an example, let's implement a "THAW" function that regenerates the
1600 "My::Object" from the "FREEZE" example earlier:
1601
     sub My::Object::THAW {
         my ($class, $serializer, $type, $id) = @_;

         $class->new (type => $type, id => $id)
     }

 See the "SECURITY CONSIDERATIONS" section below. Allowing external JSON
 objects to be deserialized into Perl objects is usually a very bad idea.
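
 Putting "FREEZE" and "THAW" together, a full round trip might look like
 the sketch below. The "new" constructor of "My::Object" is an assumption
 (the text above only shows the two hooks), and the object is wrapped in
 an array so the example does not depend on "allow_nonref".

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# Hypothetical class; only FREEZE and THAW are described in the text above.
package My::Object {
    sub new    { my ($class, %args) = @_; return bless {%args}, $class }
    sub FREEZE { my ($self, $serializer) = @_; return ($self->{type}, $self->{id}) }
    sub THAW   {
        my ($class, $serializer, $type, $id) = @_;
        return $class->new(type => $type, id => $id);
    }
}

my $coder = Cpanel::JSON::XS->new->allow_tags;

my $json = $coder->encode([ My::Object->new(type => "widget", id => 42) ]);
my $copy = $coder->decode($json)->[0];

print "$json\n";                      # [("My::Object")["widget",42]]
print ref($copy), " $copy->{id}\n";   # My::Object 42

# Without allow_tags the same text is a parse error.
my $ok = eval { Cpanel::JSON::XS->new->decode($json); 1 };
print "without allow_tags: ", ($ok ? "decoded" : "rejected"), "\n";   # rejected
```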
1610
1611ENCODING/CODESET FLAG NOTES
1612 The interested reader might have seen a number of flags that signify
1613 encodings or codesets - "utf8", "latin1", "binary" and "ascii". There
1614 seems to be some confusion on what these do, so here is a short
1615 comparison:
1616
 "utf8" controls whether the JSON text created by "encode" (and expected
 by "decode") is UTF-8 encoded or not, while "latin1" and "ascii" only
 control whether "encode" escapes character values outside their
 respective codeset range. None of these flags conflicts with the
 others, although some combinations make less sense than others.
1622
1623 Care has been taken to make all flags symmetrical with respect to
1624 "encode" and "decode", that is, texts encoded with any combination of
1625 these flag values will be correctly decoded when the same flags are used
1626 - in general, if you use different flag settings while encoding vs. when
1627 decoding you likely have a bug somewhere.
1628
1629 Below comes a verbose discussion of these flags. Note that a "codeset"
1630 is simply an abstract set of character-codepoint pairs, while an
1631 encoding takes those codepoint numbers and *encodes* them, in our case
1632 into octets. Unicode is (among other things) a codeset, UTF-8 is an
1633 encoding, and ISO-8859-1 (= latin 1) and ASCII are both codesets *and*
1634 encodings at the same time, which can be confusing.
1635
1636 "utf8" flag disabled
1637 When "utf8" is disabled (the default), then "encode"/"decode"
1638 generate and expect Unicode strings, that is, characters with high
1639 ordinal Unicode values (> 255) will be encoded as such characters,
1640 and likewise such characters are decoded as-is, no changes to them
1641 will be done, except "(re-)interpreting" them as Unicode codepoints
1642 or Unicode characters, respectively (to Perl, these are the same
1643 thing in strings unless you do funny/weird/dumb stuff).
1644
1645 This is useful when you want to do the encoding yourself (e.g. when
1646 you want to have UTF-16 encoded JSON texts) or when some other layer
1647 does the encoding for you (for example, when printing to a terminal
1648 using a filehandle that transparently encodes to UTF-8 you certainly
1649 do NOT want to UTF-8 encode your data first and have Perl encode it
1650 another time).
1651
1652 "utf8" flag enabled
 If the "utf8" flag is enabled, "encode"/"decode" will encode all
 characters using the corresponding UTF-8 multi-byte sequence, and
 will expect your input strings to be encoded as UTF-8, that is, no
 "character" of the input string may have any value > 255, as UTF-8
 does not allow that.

 The "utf8" flag therefore switches between two modes: disabled means
 you will get a Unicode string in Perl, enabled means you get a
 UTF-8 encoded octet/binary string in Perl.
1662
1663 "latin1", "binary" or "ascii" flags enabled
1664 With "latin1" (or "ascii") enabled, "encode" will escape characters
1665 with ordinal values > 255 (> 127 with "ascii") and encode the
1666 remaining characters as specified by the "utf8" flag. With "binary"
1667 enabled, ordinal values > 255 are illegal.
1668
 If "utf8" is disabled, then the result is also correctly encoded in
 those character sets (as both are proper subsets of Unicode, meaning
 that a Unicode string with all character values < 256 is the same
 thing as an ISO-8859-1 string, and a Unicode string with all
 character values < 128 is the same thing as an ASCII string in
 Perl).

 If "utf8" is enabled, you still get a correct UTF-8-encoded string,
 regardless of these flags, just some more characters will be escaped
 using "\uXXXX" than before.
1679
1680 Note that ISO-8859-1-*encoded* strings are not compatible with UTF-8
1681 encoding, while ASCII-encoded strings are. That is because the
1682 ISO-8859-1 encoding is NOT a subset of UTF-8 (despite the ISO-8859-1
1683 *codeset* being a subset of Unicode), while ASCII is.
1684
 Surprisingly, "decode" will ignore these flags and so treat all
 input values as governed by the "utf8" flag. If it is disabled, this
 allows you to decode ISO-8859-1- and ASCII-encoded strings, as both
 are strict subsets of Unicode. If it is enabled, you can correctly
 decode UTF-8 encoded strings.

 So neither "latin1", "binary" nor "ascii" is incompatible with the
 "utf8" flag - they only govern when the JSON output engine escapes a
 character or not.
1694
1695 The main use for "latin1" or "binary" is to relatively efficiently
1696 store binary data as JSON, at the expense of breaking compatibility
1697 with most JSON decoders.
1698
 The main use for "ascii" is to force the output to not contain
 characters with values > 127, which means you can interpret the
 resulting string as UTF-8, ISO-8859-1, ASCII, KOI8-R or just about
 any character set and 8-bit encoding, and still get the same data
 structure back. This is useful when your channel for JSON transfer
 is not 8-bit clean or the encoding might be mangled in between (e.g.
 in mail), and works because ASCII is a proper subset of most 8-bit
 and multibyte encodings in use in the world.
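
 The effect of the flags discussed above can be checked by encoding the
 same data with different coders. This is a small sketch; the sample
 string (U+00E9 followed by U+20AC) is chosen only for illustration.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $data = ["\x{e9}\x{20ac}"];   # e-acute (U+00E9) and euro sign (U+20AC)

my $chars  = Cpanel::JSON::XS->new->encode($data);           # Unicode string
my $octets = Cpanel::JSON::XS->new->utf8->encode($data);     # UTF-8 octets
my $ascii  = Cpanel::JSON::XS->new->ascii->encode($data);    # everything > 127 escaped
my $latin1 = Cpanel::JSON::XS->new->latin1->encode($data);   # only > 255 escaped

printf "%d characters, %d octets\n", length($chars), length($octets);   # 6 characters, 9 octets
print "$ascii\n";                                                       # ["\u00e9\u20ac"]
printf "latin1 length: %d\n", length($latin1);                          # latin1 length: 11

# Decoding with the same flag that was used for encoding restores the data.
my $back = Cpanel::JSON::XS->new->utf8->decode($octets);
print "round trip ok\n" if $back->[0] eq $data->[0];
```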
1707
1708 JSON and ECMAscript
1709 JSON syntax is based on how literals are represented in javascript (the
1710 not-standardized predecessor of ECMAscript) which is presumably why it
1711 is called "JavaScript Object Notation".
1712
1713 However, JSON is not a subset (and also not a superset of course) of
1714 ECMAscript (the standard) or javascript (whatever browsers actually
1715 implement).
1716
1717 If you want to use javascript's "eval" function to "parse" JSON, you
1718 might run into parse errors for valid JSON texts, or the resulting data
1719 structure might not be queryable:
1720
1721 One of the problems is that U+2028 and U+2029 are valid characters
1722 inside JSON strings, but are not allowed in ECMAscript string literals,
1723 so the following Perl fragment will not output something that can be
1724 guaranteed to be parsable by javascript's "eval":
1725
     use Cpanel::JSON::XS;

     print encode_json [chr 0x2028];
1729
1730 The right fix for this is to use a proper JSON parser in your javascript
1731 programs, and not rely on "eval" (see for example Douglas Crockford's
1732 json2.js parser).
1733
1734 If this is not an option, you can, as a stop-gap measure, simply encode
1735 to ASCII-only JSON:
1736
     use Cpanel::JSON::XS;

     print Cpanel::JSON::XS->new->ascii->encode ([chr 0x2028]);
1740
1741 Note that this will enlarge the resulting JSON text quite a bit if you
1742 have many non-ASCII characters. You might be tempted to run some regexes
1743 to only escape U+2028 and U+2029, e.g.:
1744
     # DO NOT USE THIS!
     my $json = Cpanel::JSON::XS->new->utf8->encode ([chr 0x2028]);
     $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
     $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
     print $json;
1750
1751 Note that *this is a bad idea*: the above only works for U+2028 and
1752 U+2029 and thus only for fully ECMAscript-compliant parsers. Many
1753 existing javascript implementations, however, have issues with other
1754 characters as well - using "eval" naively simply *will* cause problems.
1755
1756 Another problem is that some javascript implementations reserve some
1757 property names for their own purposes (which probably makes them
1758 non-ECMAscript-compliant). For example, Iceweasel reserves the
1759 "__proto__" property name for its own purposes.
1760
 If that is a problem, you could try to filter the resulting JSON
 output for these property strings, e.g.:

     $json =~ s/"__proto__"\s*:/"__proto__renamed":/g;
1765
1766 This works because "__proto__" is not valid outside of strings, so every
1767 occurrence of ""__proto__"\s*:" must be a string used as property name.
1768
1769 Unicode non-characters between U+FFFD and U+10FFFF are decoded either to
1770 the recommended U+FFFD REPLACEMENT CHARACTER (see Unicode PR #121:
1771 Recommended Practice for Replacement Characters), or in the binary or
1772 relaxed mode left as is, keeping the illegal non-characters as before.
1773
 Raw non-Unicode characters outside the valid Unicode range now fail to
 parse, because "A string is a sequence of zero or more Unicode
 characters" (RFC 7159 section 1) and "JSON text SHALL be encoded in
 Unicode" (RFC 7159 section 8.1). We now use the UTF8_DISALLOW_SUPER flag
 when parsing Unicode.
1779
1780 If you know of other incompatibilities, please let me know.
1781
1782 JSON and YAML
 You often hear that JSON is a subset of YAML. However, *in general,
 there is no way to configure JSON::XS to output a data structure as
 valid YAML* that works in all cases. If you really must use
 Cpanel::JSON::XS to generate YAML, you should use this algorithm
 (subject to change in future versions):

     my $to_yaml = Cpanel::JSON::XS->new->utf8->space_after (1);
     my $yaml = $to_yaml->encode ($ref) . "\n";
1791
1792 This will *usually* generate JSON texts that also parse as valid YAML.
1793
1794 SPEED
1795 It seems that JSON::XS is surprisingly fast, as shown in the following
1796 tables. They have been generated with the help of the "eg/bench" program
1797 in the JSON::XS distribution, to make it easy to compare on your own
1798 system.
1799
 Together with Data::MessagePack and Sereal, JSON::XS is one of the
 fastest serializers, because JSON and JSON::XS do not support backrefs
 (no graph structures), only trees. Storable supports backrefs, i.e.
 graphs. Data::MessagePack encodes its data in binary (like Storable)
 and supports only a very simple subset of JSON.
1805
1806 First comes a comparison between various modules using a very short
1807 single-line JSON string (also available at
1808 <http://dist.schmorp.de/misc/json/short.json>).
1809
     {"method": "handleMessage", "params": ["user1",
     "we were just talking"], "id": null, "array":[1,11,234,-5,1e5,1e7,
     1, 0]}

 It shows the number of encodes/decodes per second (JSON::XS uses the
 functional interface, while JSON::XS/2 uses the OO interface with
 pretty-printing and hash key sorting enabled, and JSON::XS/3 enables
 shrink. JSON::DWIW/DS uses the deserialize function, while
 JSON::DWIW/FJ uses the from_json method). Higher is better:
1819
     module        |     encode |     decode |
     --------------|------------|------------|
     JSON::DWIW/DS |  86302.551 | 102300.098 |
     JSON::DWIW/FJ |  86302.551 |  75983.768 |
     JSON::PP      |  15827.562 |   6638.658 |
     JSON::Syck    |  63358.066 |  47662.545 |
     JSON::XS      | 511500.488 | 511500.488 |
     JSON::XS/2    | 291271.111 | 388361.481 |
     JSON::XS/3    | 361577.931 | 361577.931 |
     Storable      |  66788.280 | 265462.278 |
     --------------|------------|------------|
1831
1832 That is, JSON::XS is almost six times faster than JSON::DWIW on
1833 encoding, about five times faster on decoding, and over thirty to
1834 seventy times faster than JSON's pure perl implementation. It also
1835 compares favourably to Storable for small amounts of data.
1836
 Using a longer test string (roughly 18KB, generated from Yahoo! Locals
 search API, available at <http://dist.schmorp.de/misc/json/long.json>):

     module        |     encode |     decode |
     --------------|------------|------------|
     JSON::DWIW/DS |   1647.927 |   2673.916 |
     JSON::DWIW/FJ |   1630.249 |   2596.128 |
     JSON::PP      |    400.640 |     62.311 |
     JSON::Syck    |   1481.040 |   1524.869 |
     JSON::XS      |  20661.596 |   9541.183 |
     JSON::XS/2    |  10683.403 |   9416.938 |
     JSON::XS/3    |  20661.596 |   9400.054 |
     Storable      |  19765.806 |  10000.725 |
     --------------|------------|------------|

 Again, JSON::XS leads by far (except for Storable, which unsurprisingly
 decodes a bit faster).
1854
1855 On large strings containing lots of high Unicode characters, some
1856 modules (such as JSON::PC) seem to decode faster than JSON::XS, but the
1857 result will be broken due to missing (or wrong) Unicode handling. Others
1858 refuse to decode or encode properly, so it was impossible to prepare a
1859 fair comparison table for that case.
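
 To get comparable numbers on your own system without the "eg/bench"
 program, a rough comparison can be sketched with the core "Benchmark"
 module. This is not the script that produced the tables above; it only
 compares Cpanel::JSON::XS against the core JSON::PP on the short sample
 document, and the labels are arbitrary.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Cpanel::JSON::XS ();
use JSON::PP ();

# The short sample document quoted earlier in this section.
my $json = '{"method": "handleMessage", "params": ["user1", '
         . '"we were just talking"], "id": null, '
         . '"array":[1,11,234,-5,1e5,1e7, 1, 0]}';

my $xs   = Cpanel::JSON::XS->new->utf8;
my $pp   = JSON::PP->new->utf8;
my $data = $xs->decode($json);

# Run each candidate for at least one CPU second and print a rate table.
cmpthese(-1, {
    'xs decode' => sub { $xs->decode($json) },
    'pp decode' => sub { $pp->decode($json) },
    'xs encode' => sub { $xs->encode($data) },
    'pp encode' => sub { $pp->encode($data) },
});
```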
1860
1861 For updated graphs see
1862 <https://github.com/Sereal/Sereal/wiki/Sereal-Comparison-Graphs>
1863
1864INTEROP with JSON and JSON::XS and other JSON modules
 As long as you only serialize data that can be directly expressed in
 JSON, "Cpanel::JSON::XS" is incapable of generating invalid JSON output
 (modulo bugs, but "JSON::XS" has found more bugs in the official JSON
 testsuite (1) than the official JSON testsuite has found in "JSON::XS"
 (0)). "Cpanel::JSON::XS" is currently the only known JSON decoder which
 passes all <http://seriot.ch/parsing_json.html> tests, while also being
 the fastest.
1872
1873 When you have trouble decoding JSON generated by this module using other
1874 decoders, then it is very likely that you have an encoding mismatch or
1875 the other decoder is broken.
1876
1877 When decoding, "JSON::XS" is strict by default and will likely catch all
1878 errors. There are currently two settings that change this: "relaxed"
1879 makes "JSON::XS" accept (but not generate) some non-standard extensions,
1880 and "allow_tags" or "allow_blessed" will allow you to encode and decode
1881 Perl objects, at the cost of being totally insecure and not outputting
1882 valid JSON anymore.
1883
1884 JSON-XS-3.01 broke interoperability with JSON-2.90 with booleans. See
1885 JSON.
1886
 Cpanel::JSON::XS needs to know the JSON and JSON::XS versions to be able
 to work with those objects, especially when encoding booleans like
 "{"is_true":true}". So you need to load these modules beforehand.
1890
1891 true/false overloading and boolean representations are supported.
1892
 JSON::XS and JSON::PP representations are accepted and older JSON::XS
 accepts Cpanel::JSON::XS booleans. All JSON modules (JSON, JSON::PP,
 JSON::XS, Cpanel::JSON::XS) produce JSON::PP::Boolean objects; only Mojo
 and JSON::YAJL do not. Mojo produces Mojo::JSON::_Bool and
 JSON::YAJL::Parser just an unblessed IV.
1898
1899 Cpanel::JSON::XS accepts JSON::PP::Boolean and Mojo::JSON::_Bool objects
1900 as booleans.
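
 The boolean interoperability described above can be checked with a short
 sketch. It loads the core JSON::PP first, as advised, and uses only
 "decode_json", "encode_json" and "Cpanel::JSON::XS::is_bool".

```perl
use strict;
use warnings;
use JSON::PP ();          # loaded before Cpanel::JSON::XS, as advised above
use Cpanel::JSON::XS;

# Decoded JSON booleans are objects, not plain 1/0.
my $decoded = decode_json('[true,false]');
print ref($decoded->[0]), "\n";                                   # JSON::PP::Boolean
print "is a boolean\n" if Cpanel::JSON::XS::is_bool($decoded->[1]);

# Booleans created by JSON::PP are encoded as JSON true/false.
my $encoded = encode_json([JSON::PP::true(), JSON::PP::false()]);
print "$encoded\n";                                               # [true,false]
```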
1901
1902 I cannot think of any reason to still use JSON::XS anymore.
1903
1904 TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS
 When you use "allow_tags" to use the extended (and also nonstandard and
 invalid) JSON syntax for serialized objects, and you still want to
 decode the generated text with a standard JSON decoder, you can run a
 regex to replace the tagged syntax by standard JSON arrays (it only
 works for "normal" package names without commas, newlines or single
 colons). First, the readable Perl version:

     # if your FREEZE methods return no values, you need this replace first:
     $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;

     # this works for non-empty constructor arg lists:
     $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;
1917
 And here is a less readable version that is easy to adapt to other
 languages:

     $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g;

 Here is an ECMAScript version (same regex):

     json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,");
1926
 Since this syntax converts to standard JSON arrays, it might be hard to
 distinguish serialized objects from normal arrays. You can prepend a
 "magic number" as first array element to reduce chances of a collision:

     $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g;

 And after decoding the JSON text, you could walk the data structure
 looking for arrays with a first element of
 "XU1peReLzT4ggEllLanBYq4G9VzliwKF".
1936
 The same approach can be used to create the tagged format with another
 encoder. First, you create an array with the magic string as first
 member, the classname as second, and constructor arguments last, encode
 it as part of your JSON structure, and then:

     $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;

 Again, this has some limitations - the magic string must not be encoded
 with character escapes, and the constructor arguments must be non-empty.
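
 As a worked example of the two readable regexes from the start of this
 section, the sketch below rewrites a hand-written tagged text and feeds
 the result to the core JSON::PP, which stands in here for "any standard
 JSON decoder". The class names in the sample text are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP ();   # stands in for a decoder that knows nothing about tags

# Tagged text as an allow_tags encoder would produce it.
my $json = '[("MyDate")[2013,10,29],("My::Empty")[]]';

# Empty argument lists first, then non-empty ones.
$json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;
$json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;

print "$json\n";   # [["MyDate",2013,10,29],["My::Empty"]]

my $data = JSON::PP->new->decode($json);
print "$data->[0][0] $data->[0][1]\n";   # MyDate 2013
```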
1946
1947RFC7159
1948 Since this module was written, Google has written a new JSON RFC, RFC
1949 7159 (and RFC7158). Unfortunately, this RFC breaks compatibility with
1950 both the original JSON specification on www.json.org and RFC4627.
1951
1952 As far as I can see, you can get partial compatibility when parsing by
1953 using "->allow_nonref". However, consider the security implications of
1954 doing so.
1955
1956 I haven't decided yet when to break compatibility with RFC4627 by
1957 default (and potentially leave applications insecure) and change the
1958 default to follow RFC7159, but application authors are well advised to
1959 call "->allow_nonref(0)" even if this is the current default, if they
1960 cannot handle non-reference values, in preparation for the day when the
1961 default will change.
1962
1963SECURITY CONSIDERATIONS
 JSON::XS and Cpanel::JSON::XS are not only fast. JSON is generally the
 most secure serializing format because, besides Data::MessagePack, it
 is the only one that does not deserialize objects by default. This
 holds for all languages, not just Perl. The binary variant BSON
 (MongoDB) does more but is unsafe.
1969
1970 It is trivial for any attacker to create such serialized objects in JSON
1971 and trick perl into expanding them, thereby triggering certain methods.
1972 Watch <https://www.youtube.com/watch?v=Gzx6KlqiIZE> for an exploit demo
1973 for "CVE-2015-1592 SixApart MovableType Storable Perl Code Execution"
1974 for a deserializer which expands objects. Deserializing even coderefs
1975 (methods, functions) or external data would be considered the most
1976 dangerous.
1977
 Security-relevant overview of serializers regarding deserializing
 objects by default:

                        Objects   Coderefs  External Data

     Data::Dumper       YES       YES       YES
     Storable           YES       NO (def)  NO
     Sereal             YES       NO        NO
     YAML               YES       NO        NO
     B::C               YES       YES       YES
     B::Bytecode        YES       YES       YES
     BSON               YES       YES       NO
     JSON::SL           YES       NO        YES
     JSON               NO (def)  NO        NO
     Data::MessagePack  NO        NO        NO
     XML                NO        NO        YES

     Pickle             YES       YES       YES
     PHP Deserialize    YES       NO        NO
1997
1998 When you are using JSON in a protocol, talking to untrusted potentially
1999 hostile creatures requires relatively few measures.
2000
2001 First of all, your JSON decoder should be secure, that is, should not
2002 have any buffer overflows. Obviously, this module should ensure that.
2003
 Second, you need to avoid resource-starving attacks. That means you
 should limit the size of JSON texts you accept, or make sure that when
 your resources run out, that's just fine (e.g. by using a separate
 process that can crash safely). The size of a JSON text in octets or
 characters is usually a good indication of the size of the resources
 required to decode it into a Perl structure. While JSON::XS can check
 the size of the JSON text, it might be too late when you already have it
 in memory, so you might want to check the size before you accept the
 string.
2013
2014 Third, Cpanel::JSON::XS recurses using the C stack when decoding objects
2015 and arrays. The C stack is a limited resource: for instance, on my amd64
2016 machine with 8MB of stack size I can decode around 180k nested arrays
2017 but only 14k nested JSON objects (due to perl itself recursing deeply on
2018 croak to free the temporary). If that is exceeded, the program crashes.
2019 To be conservative, the default nesting limit is set to 512. If your
2020 process has a smaller stack, you should adjust this setting accordingly
2021 with the "max_depth" method.
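
 The size and nesting limits discussed above can be set on a decoder
 object. The following sketch uses the "max_size" and "max_depth"
 methods; the concrete limits (64KB, 16 levels) are arbitrary values
 picked for the illustration and should be tuned to your protocol.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

# Limits chosen for illustration only.
my $decoder = Cpanel::JSON::XS->new->utf8->max_depth(16)->max_size(64 * 1024);

my $shallow = '[[[1]]]';
my $deep    = ('[' x 100) . (']' x 100);

# decode croaks when a limit is exceeded, so wrap it in eval.
my $shallow_ok = eval { $decoder->decode($shallow); 1 };
my $deep_ok    = eval { $decoder->decode($deep);    1 };

print "shallow: ", ($shallow_ok ? "accepted" : "rejected"), "\n";   # accepted
print "deep: ",    ($deep_ok    ? "accepted" : "rejected"), "\n";   # rejected
```

 Catching the exception in "eval" also gives you a place to log the
 error message privately instead of passing it on to the remote side.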
2022
2023 Also keep in mind that Cpanel::JSON::XS might leak contents of your Perl
2024 data structures in its error messages, so when you serialize sensitive
2025 information you might want to make sure that exceptions thrown by
2026 JSON::XS will not end up in front of untrusted eyes.
2027
 If you are using Cpanel::JSON::XS to return packets for consumption by
 JavaScript scripts in a browser you should have a look at
 <http://blog.archive.jpsykes.com/47/practical-csrf-and-json-security/>
 to see whether you are vulnerable to some common attack vectors (which
 really are browser design bugs, but it is still you who will have to
 deal with it, as major browser developers care only for features, not
 about getting security right). You might also want to look at the
 special escape rules of Mojo::JSON, which help to prevent XSS attacks.
2036
2037"OLD" VS. "NEW" JSON (RFC 4627 VS. RFC 7159)
 TL;DR: Due to security concerns, Cpanel::JSON::XS will not allow scalar
 data in JSON texts by default - you need to create your own
 Cpanel::JSON::XS object and enable "allow_nonref":

     my $json = Cpanel::JSON::XS->new->allow_nonref;

     $text = $json->encode ($data);
     $data = $json->decode ($text);
2046
 The long version: JSON being an important and supposedly stable format,
 the IETF standardized it as RFC 4627 in 2006. Unfortunately, the inventor
 of JSON, Douglas Crockford, unilaterally changed the definition of JSON in
 javascript. Rather than create a fork, the IETF decided to standardize
 the new syntax (apparently, so I was told, without finding it very
 amusing).
2053
2054 The biggest difference between the original JSON and the new JSON is
2055 that the new JSON supports scalars (anything other than arrays and
2056 objects) at the top-level of a JSON text. While this is strictly
2057 backwards compatible to older versions, it breaks a number of protocols
2058 that relied on sending JSON back-to-back, and is a minor security
2059 concern.
2060
2061 For example, imagine you have two banks communicating, and on one side,
2062 the JSON coder gets upgraded. Two messages, such as 10 and 1000 might
2063 then be confused to mean 101000, something that couldn't happen in the
2064 original JSON, because neither of these messages would be valid JSON.
2065
2066 If one side accepts these messages, then an upgrade in the coder on
2067 either side could result in this becoming exploitable.
2068
2069 This module has always allowed these messages as an optional extension,
2070 by default disabled. The security concerns are the reason why the
2071 default is still disabled, but future versions might/will likely upgrade
2072 to the newer RFC as default format, so you are advised to check your
2073 implementation and/or override the default with "->allow_nonref (0)" to
2074 ensure that future versions are safe.
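
 The difference between the two behaviours can be made explicit in code.
 This sketch sets "allow_nonref" explicitly on both coders, so it does not
 depend on whatever default the installed version uses.

```perl
use strict;
use warnings;
use Cpanel::JSON::XS;

my $strict  = Cpanel::JSON::XS->new->allow_nonref(0);   # RFC 4627 behaviour, stated explicitly
my $lenient = Cpanel::JSON::XS->new->allow_nonref(1);   # RFC 7159 behaviour

# A top-level scalar is accepted only by the lenient coder.
my $value = $lenient->decode('1000');
print "lenient: $value\n";                                          # lenient: 1000

my $scalar_ok = eval { $strict->decode('1000'); 1 };
print "strict scalar: ", ($scalar_ok ? "accepted" : "rejected"), "\n";   # rejected

# Arrays and objects are fine in both modes.
my $array_ok = eval { $strict->decode('[1000]'); 1 };
print "strict array: ", ($array_ok ? "accepted" : "rejected"), "\n";     # accepted
```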
2075
2076THREADS
2077 Cpanel::JSON::XS has proper ithreads support, unlike JSON::XS. If you
2078 encounter any bugs with thread support please report them.
2079
2080 From Version 4.00 - 4.19 you couldn't encode true with threads::shared
2081 magic.
2082
2083BUGS
2084 While the goal of the Cpanel::JSON::XS module is to be correct, that
2085 unfortunately does not mean it's bug-free, only that the author thinks
2086 its design is bug-free. If you keep reporting bugs and tests they will
2087 be fixed swiftly, though.
2088
 Since the JSON::XS author refuses to use a public bugtracker and prefers
 private emails, we use the tracker at github, so you might want to
 report any issues twice: once in private to MLEHMANN to be fixed in
 JSON::XS and once to our public tracker. Issues fixed by JSON::XS
 with a new release will also be backported to Cpanel::JSON::XS and
 5.6.2, as long as cPanel relies on 5.6.2 and Cpanel::JSON::XS as our
 serializer of choice.
2096
2097 <https://github.com/rurban/Cpanel-JSON-XS/issues>
2098
2099LICENSE
2100 This module is available under the same licences as perl, the Artistic
2101 license and the GPL.
2102
2103SEE ALSO
2104 The cpanel_json_xs command line utility for quick experiments.
2105
2106 JSON, JSON::XS, JSON::MaybeXS, Mojo::JSON, Mojo::JSON::MaybeXS,
2107 JSON::SL, JSON::DWIW, JSON::YAJL, JSON::Any, Test::JSON,
2108 Locale::Wolowitz, <https://metacpan.org/search?q=JSON>
2109
2110 <https://tools.ietf.org/html/rfc7159>
2111
2112 <https://tools.ietf.org/html/rfc4627>
2113
2114AUTHOR
2115 Reini Urban <rurban@cpan.org>
2116
2117 Marc Lehmann <schmorp@schmorp.de>, http://home.schmorp.de/
2118
2119MAINTAINER
2120 Reini Urban <rurban@cpan.org>
2121
2122