Tokenizer tests
===============

The test format is [JSON](http://www.json.org/). This has the advantage
that the syntax allows backward-compatible extensions to the tests and
the disadvantage that it is relatively verbose.

Basic Structure
---------------

    {"tests": [
        {"description": "Test description",
        "input": "input_string",
        "output": [expected_output_tokens],
        "initialStates": [initial_states],
        "lastStartTag": last_start_tag,
        "ignoreErrorOrder": ignore_error_order
        }
    ]}

Multiple tests per file are allowed simply by adding more objects to the
"tests" list.

`description`, `input` and `output` are always present. The other values
are optional.
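
For instance, a Python harness (an illustrative sketch; the file name is
just an example) could load and walk one test file like this:

    import json

    # Each .test file is a JSON document whose single top-level member
    # "tests" holds the list of test objects described above.
    with open("test1.test", encoding="utf-8") as f:
        suite = json.load(f)

    for test in suite["tests"]:
        print(test["description"], repr(test["input"]), len(test["output"]))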

### Test set-up

`test.input` is a string containing the characters to pass to the
tokenizer. Specifically, it represents the characters of the **input
stream**, and so implementations are expected to perform the processing
described in the spec's **Preprocessing the input stream** section
before feeding the result to the tokenizer.
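
Most of that preprocessing is newline normalization; a minimal sketch of
that step (the parse errors it can raise for controls, noncharacters and
surrogates are left out here):

    def preprocess_input_stream(chars: str) -> str:
        """Normalize newlines as described in the spec's "Preprocessing
        the input stream" section: CRLF pairs and lone CRs both become a
        single LF."""
        return chars.replace("\r\n", "\n").replace("\r", "\n")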

If `test.doubleEscaped` is present and `true`, then `test.input` is not
quite as described above. Instead, it must first be subjected to another
round of unescaping (i.e., in addition to any unescaping involved in the
JSON import), and the result of *that* represents the characters of the
input stream. Currently, the only unescaping required by this option is
to convert each sequence of the form \\uHHHH (where H is a hex digit)
into the corresponding Unicode code point. (Note that this option also
affects the interpretation of `test.output`.)
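
A sketch of that extra unescaping round, assuming a Python harness:

    import re

    def double_unescape(s: str) -> str:
        """Turn each remaining \\uHHHH sequence into the code point it
        names; JSON parsing has already handled one level of escaping."""
        return re.sub(r"\\u([0-9A-Fa-f]{4})",
                      lambda m: chr(int(m.group(1), 16)),
                      s)

When `doubleEscaped` is `true`, the same unescaping must also be applied
to every string in `test.output` (see **Test results** below).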

`test.initialStates` is a list of strings, each being the name of a
tokenizer state. The test should be run once for each string, using it
to set the tokenizer's initial state for that run. If
`test.initialStates` is omitted, it defaults to `["data state"]`.

`test.lastStartTag` is a lowercase string that should be used as "the
tag name of the last start tag to have been emitted from this
tokenizer", referenced in the spec's definition of **appropriate end tag
token**. If it is omitted, it is treated as if "no start tag has been
emitted from this tokenizer".
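
Taken together, the set-up fields determine one tokenizer run per
initial state. A minimal sketch (the test object below is illustrative
rather than copied from a test file, and `run_tokenizer` is a
hypothetical stand-in for the implementation under test):

    # Illustrative test object, not taken from one of the .test files.
    test = {
        "description": "End tag matching the last start tag in RAWTEXT",
        "input": "</xmp>",
        "output": [["EndTag", "xmp"]],
        "initialStates": ["RAWTEXT state"],
        "lastStartTag": "xmp",
    }

    for state in test.get("initialStates", ["data state"]):
        last_start_tag = test.get("lastStartTag")  # None: no start tag emitted yet
        # actual = run_tokenizer(test["input"], initial_state=state,
        #                        last_start_tag=last_start_tag)
        # assert actual == test["output"], test["description"]
        print(state, last_start_tag)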

### Test results

`test.output` is a list of tokens, in the order the tokenizer produces
them: the first token produced is the first (leftmost) entry in the
list. The list must match the **complete** list of tokens that the
tokenizer should produce. Valid tokens are:

    ["DOCTYPE", name, public_id, system_id, correctness]
    ["StartTag", name, {attributes}*, true*]
    ["StartTag", name, {attributes}]
    ["EndTag", name]
    ["Comment", data]
    ["Character", data]
    "ParseError"

`public_id` and `system_id` are either strings or `null`. `correctness`
is either `true` or `false`; `true` corresponds to the force-quirks flag
being false, and vice-versa.

When the self-closing flag is set, the `StartTag` array has `true` as
its fourth entry. When the flag is not set, the array has only three
entries for backwards compatibility.

All adjacent character tokens are coalesced into a single
`["Character", data]` token.

If `test.doubleEscaped` is present and `true`, then every string within
`test.output` must be further unescaped (as described above) before
being compared with the tokenizer's output.

`test.ignoreErrorOrder` is a boolean value indicating that the order of
`ParseError` tokens relative to other tokens in the output stream is
unimportant, and implementations should ignore such differences between
their output and `expected_output_tokens`. (This is used for errors
emitted by the input stream preprocessing stage, since it is useful to
test that code but the spec does not define exactly when those errors
occur.) If it is omitted, it defaults to `false`.
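
One way a harness might honor this flag when comparing token lists (a
sketch, not a prescribed algorithm):

    def tokens_match(expected, actual, ignore_error_order=False):
        """Compare token lists; when ignore_error_order is set, only the
        number of "ParseError" entries matters, not where they occur."""
        if not ignore_error_order:
            return expected == actual
        strip = lambda toks: [t for t in toks if t != "ParseError"]
        return (strip(expected) == strip(actual)
                and expected.count("ParseError") == actual.count("ParseError"))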

xmlViolation tests
------------------

`tokenizer/xmlViolation.test` differs from the above in a couple of
ways:

-   The name of the single member of the top-level JSON object is
    "xmlViolationTests" instead of "tests".
-   Each test's expected output assumes that the implementation is
    applying the tweaks given in the spec's "Coercing an HTML DOM into
    an infoset" section.