# BSON Corpus

This BSON test data corpus consists of a JSON file for each BSON type, plus
a `top.json` file for testing the overall, enclosing document.

Top-level keys include:

* `description`: human-readable description of what is in the file.
* `bson_type`: hex string of the first byte of a BSON element (e.g. "0x01"
  for type "double"); this will be the synthetic value "0x00" for `top.json`.
* `test_key`: name of the field in a `valid` test case's `extjson` document
  that should be checked against the case's `string` field.
* `valid` (optional): an array of valid test cases (see below).
* `decodeErrors` (optional): an array of decode error cases (see below).
* `parseErrors` (optional): an array of type-specific parse error cases (see
  below).
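
As a rough illustration of how a test runner might consume these keys, the Python sketch below parses the contents of one corpus file and summarizes them. `summarize_corpus` is a hypothetical helper, not something shipped with the corpus. It treats `valid`, `decodeErrors` and `parseErrors` as optional, per the list above, and also tolerates a missing `test_key`.

```python
import json


def summarize_corpus(text: str) -> dict:
    """Parse one corpus file's JSON text and summarize its top-level keys.

    Hypothetical helper for illustration only.
    """
    doc = json.loads(text)
    # "bson_type" is a hex string such as "0x01"; int(..., 16) accepts the 0x prefix.
    type_byte = int(doc["bson_type"], 16)
    return {
        "description": doc["description"],
        "type_byte": type_byte,
        # top.json uses the synthetic value 0x00 for the enclosing document.
        "is_top_level": type_byte == 0x00,
        "test_key": doc.get("test_key"),
        # valid, decodeErrors and parseErrors are all optional arrays.
        "valid": len(doc.get("valid", [])),
        "decodeErrors": len(doc.get("decodeErrors", [])),
        "parseErrors": len(doc.get("parseErrors", [])),
    }
```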

Valid test case keys include:

* `description`: human-readable test case label.
* `subject`: an (uppercase) big-endian hex representation of a BSON byte
  string. Be sure to convert the hex case as appropriate in any roundtrip
  tests.
* `string` (optional): a representation of an element in the `extjson`
  field that can be checked to verify correct extjson decoding. How to
  check it is language- and BSON-type-specific.
* `extjson`: a document representing the decoded extended JSON document
  equivalent to the subject.
* `decodeOnly` (optional): if true, indicates that the BSON cannot
  roundtrip; decoding the BSON in `subject` and re-encoding the result will
  not generate identical BSON. Otherwise, encode(decode(subject)) should be
  the same as the subject.
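
The roundtrip rule for valid cases can be sketched as follows. `decode` and `encode` are placeholders for whatever the BSON library under test provides (an assumption, not a real API). Hex strings are compared in uppercase so that the case convention of `subject` cannot cause spurious failures.

```python
from typing import Any, Callable


def check_valid_case(case: dict,
                     decode: Callable[[bytes], Any],
                     encode: Callable[[Any], bytes]) -> None:
    """Run one `valid` case against a caller-supplied (hypothetical) BSON codec."""
    # bytes.fromhex accepts both upper- and lowercase hex digits.
    subject = bytes.fromhex(case["subject"])
    decoded = decode(subject)  # decoding must succeed even for decodeOnly cases
    if case.get("decodeOnly", False):
        # This BSON cannot roundtrip, so re-encoding is not checked.
        return
    reencoded = encode(decoded).hex().upper()
    expected = case["subject"].upper()
    if reencoded != expected:
        raise AssertionError(f"{case['description']}: {reencoded} != {expected}")
```

In a real runner, `decode` and `encode` would wrap the library's own functions; passing them in keeps the sketch independent of any particular BSON implementation.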

Decode error cases provide an invalid BSON document or field that
should result in an error. For each case, keys include:

* `description`: human-readable test case label.
* `subject`: an (uppercase) big-endian hex representation of an invalid
  BSON string that should fail to decode.
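
A full decoder is out of scope here, but the outermost framing rules that any decoder must enforce can be sketched from the BSON spec: a document is a little-endian int32 byte count (covering the whole document, including the length field and the trailing NUL), then the element list, then a `0x00` terminator. `check_document_frame` and `BSONDecodeError` are hypothetical names; a real test would call the library's decoder rather than this framing check, which does not inspect individual elements.

```python
import struct


class BSONDecodeError(ValueError):
    """Hypothetical error type used by this sketch."""


def check_document_frame(data: bytes) -> int:
    """Check only the outer framing of a BSON document and return its length.

    Element contents are NOT validated here.
    """
    if len(data) < 5:  # smallest document: 4-byte length + terminator
        raise BSONDecodeError("document shorter than the 5-byte minimum")
    (length,) = struct.unpack_from("<i", data, 0)
    if length != len(data):
        raise BSONDecodeError(f"length prefix {length} != actual size {len(data)}")
    if data[-1] != 0:
        raise BSONDecodeError("document is not NUL-terminated")
    return length


def expect_decode_error(case: dict) -> None:
    """A decodeErrors case passes only if decoding raises."""
    try:
        check_document_frame(bytes.fromhex(case["subject"]))
    except BSONDecodeError:
        return
    raise AssertionError(f"{case['description']}: decoded without error")
```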

Parse error cases are type-specific and represent some input that cannot
be encoded to the `bson_type` under test. For each case, keys include:

* `description`: human-readable test case label.
* `subject`: a text or numeric representation of an input that cannot be
  encoded.
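
For example, if int32 (`bson_type` "0x10") is the type under test, then a value outside the signed 32-bit range, or text that is not an integer, cannot be encoded. The sketch below illustrates that idea. The specific inputs are assumptions and are not copied from the corpus files, and `encode_int32_element` is a hypothetical helper.

```python
import struct

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def encode_int32_element(key: str, subject) -> bytes:
    """Encode an int32 (type 0x10) BSON element, rejecting unencodable input.

    `subject` may be text or a number, like the parseErrors subjects.
    """
    if isinstance(subject, float):
        # Reject 1.5, inf, nan, etc. rather than silently truncating.
        if not subject.is_integer():
            raise ValueError(f"{subject!r} is not an integer")
        subject = int(subject)
    value = int(subject)  # int("abc") and int("1.5") raise ValueError
    if not INT32_MIN <= value <= INT32_MAX:
        raise ValueError(f"{value} does not fit in a signed 32-bit integer")
    if "\x00" in key:
        raise ValueError("element names cannot contain NUL")
    # Element layout: type byte, NUL-terminated key (cstring), little-endian int32.
    return b"\x10" + key.encode("utf-8") + b"\x00" + struct.pack("<i", value)
```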

## Extended JSON extensions

The extended JSON documentation doesn't include extensions for all BSON
types. These are supported by `mongoexport`:

    # Javascript
    { "$code": "<code here>" }

    # Javascript with scope
    { "$code": "<code here>", "$scope": { "x":1, "y":1 } }

    # Int32
    { "$numberInt": "<number>" }
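
Because each wrapper is identified by its set of `$`-prefixed keys, a reader can map it to a BSON type byte: 0x0D for JavaScript code, 0x0F for code with scope, and 0x10 for int32, per the BSON spec. Below is a minimal sketch with `classify_mongoexport_value` as a hypothetical name. It recognizes only the three forms listed above.

```python
def classify_mongoexport_value(value) -> int:
    """Return the BSON type byte for one of the mongoexport wrappers above."""
    if not isinstance(value, dict):
        raise TypeError("expected a JSON object")
    keys = set(value)
    if keys == {"$code"}:
        return 0x0D  # JavaScript code
    if keys == {"$code", "$scope"}:
        return 0x0F  # JavaScript code with scope
    if keys == {"$numberInt"}:
        return 0x10  # int32; the number itself is carried as a string
    raise ValueError(f"not one of the recognized wrappers: {sorted(keys)}")
```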

However, this corpus extends JSON further to include the following:

    # Double (needed for NaN, etc.)
    { "$numberDouble": "<value|NaN|Inf|-Inf>" }

    # DBpointer (deprecated): <id> is 24 hex chars
    { "$dbpointer": "<id>", "$ns":"<namespace>" }

    # Symbol (deprecated)
    { "$symbol": "<text>" }
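
Decoding these corpus-specific wrappers is mostly string validation. The sketch below assumes that `$numberDouble` uses exactly the `NaN`, `Inf` and `-Inf` tokens shown above for special values (Python's `float()` would otherwise also accept spellings such as `nan` or `infinity`), and that the `$dbpointer` id must be exactly 24 hex characters. The function names are hypothetical.

```python
import math
import re

_HEX24 = re.compile(r"[0-9a-fA-F]{24}")


def parse_number_double(text: str) -> float:
    """Decode the value of a {"$numberDouble": ...} wrapper."""
    specials = {"NaN": math.nan, "Inf": math.inf, "-Inf": -math.inf}
    if text in specials:
        return specials[text]
    value = float(text)  # raises ValueError for non-numeric text
    if math.isnan(value) or math.isinf(value):
        # This sketch accepts special values only as the exact tokens above.
        raise ValueError(f"special or overflowing value must use NaN/Inf/-Inf: {text!r}")
    return value


def check_dbpointer(doc: dict) -> tuple:
    """Validate a {"$dbpointer": "<id>", "$ns": "<namespace>"} wrapper."""
    oid = doc["$dbpointer"]
    if not _HEX24.fullmatch(oid):
        raise ValueError(f"$dbpointer id must be 24 hex chars, got {oid!r}")
    # Return the namespace and the 12 raw id bytes.
    return doc["$ns"], bytes.fromhex(oid)
```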

## Visualizing BSON

The directory includes a Perl script `bsonview`, which will decompose and
highlight elements of a BSON document. It may be used like this:

    echo "0900000010610005000000" | perl bsonview -x
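
For readers without Perl, a rough Python equivalent of the decomposition step is sketched below. It is not a port of `bsonview`, whose internals are not described here. It only walks the top level of a document and understands only strings and a handful of fixed-width types, so nested documents and most other types are rejected. `decompose` is a hypothetical name.

```python
import struct

# Value sizes for a few fixed-width BSON types, per the BSON spec:
# double, boolean, UTC datetime, null, int32, timestamp, int64.
_FIXED = {0x01: 8, 0x08: 1, 0x09: 8, 0x0A: 0, 0x10: 4, 0x11: 8, 0x12: 8}


def decompose(hex_string: str) -> list:
    """Split a BSON document into (type, key, value_bytes) triples."""
    data = bytes.fromhex(hex_string)
    (length,) = struct.unpack_from("<i", data, 0)
    if length != len(data) or data[-1] != 0:
        raise ValueError("bad length prefix or missing terminator")
    pos, parts = 4, []
    while pos < length - 1:
        etype = data[pos]
        key_end = data.index(b"\x00", pos + 1)  # key is a NUL-terminated cstring
        key = data[pos + 1:key_end].decode("utf-8")
        pos = key_end + 1
        if etype == 0x02:
            # String: int32 byte count (including its trailing NUL), then the bytes.
            (slen,) = struct.unpack_from("<i", data, pos)
            size = 4 + slen
        elif etype in _FIXED:
            size = _FIXED[etype]
        else:
            raise ValueError(f"type 0x{etype:02X} not handled in this sketch")
        parts.append((etype, key, data[pos:pos + size]))
        pos += size
    if pos != length - 1:
        raise ValueError("element overran the document")
    return parts
```

Applied to `0C0000001061000500000000` (the 12-byte document `{"a": 5}` with an int32 value), it returns a single `(0x10, "a", ...)` element whose 4-byte value is 5 in little-endian order.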

## Open Questions

These issues are still TBD:

* Can "-0.0" be represented "canonically" in BSON? Some languages might
  not round-trip it. (Do we need a "lossy_bson" field to capture this?)

* How should DBPointer round-trip? Should we expect it to be turned into a
  DBRef or to round-trip faithfully?

* How should Symbol round-trip? Should we expect it to be turned into a
  string?

* How should Undefined round-trip? Should we expect it to be turned into a
  null?

* Should we flag cases where extjson is lossy compared to BSON?
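
The "-0.0" question comes down to the sign bit: numerically, -0.0 equals 0.0, but its BSON double encoding (an 8-byte little-endian IEEE 754 value) is different. The quick Python check below illustrates this, with `double_bytes` as a hypothetical helper. Any decoder that maps -0.0 to a value without a sign, such as an integer zero, would lose the distinction and fail a byte-for-byte roundtrip.

```python
import math
import struct


def double_bytes(x: float) -> str:
    """Little-endian IEEE 754 bytes of a BSON double (type 0x01), as uppercase hex."""
    return struct.pack("<d", x).hex().upper()


# -0.0 and 0.0 compare equal as numbers...
assert -0.0 == 0.0
# ...but their encodings differ in the sign bit, the high bit of the last byte.
assert double_bytes(0.0) == "0000000000000000"
assert double_bytes(-0.0) == "0000000000000080"
# The sign is still observable on the float itself.
assert math.copysign(1.0, -0.0) == -1.0
```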