Tokenizer tests
===============

The test format is [JSON](http://www.json.org/). This has the advantage
that the syntax allows backward-compatible extensions to the tests and
the disadvantage that it is relatively verbose.

Basic Structure
---------------

    {"tests": [
        {"description": "Test description",
        "input": "input_string",
        "output": [expected_output_tokens],
        "initialStates": [initial_states],
        "lastStartTag": last_start_tag,
        "ignoreErrorOrder": ignore_error_order
        }
    ]}

Multiple tests per file are allowed simply by adding more objects to the
"tests" list.

`description`, `input` and `output` are always present. The other values
are optional.
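
For instance, a Python harness (an illustrative sketch; the file name is
just an example) could load and walk one test file like this:

    import json

    # Each .test file is a JSON document whose single top-level member
    # "tests" holds the list of test objects described above.
    with open("test1.test", encoding="utf-8") as f:
        suite = json.load(f)

    for test in suite["tests"]:
        print(test["description"], repr(test["input"]), len(test["output"]))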

### Test set-up

`test.input` is a string containing the characters to pass to the
tokenizer. Specifically, it represents the characters of the **input
stream**, and so implementations are expected to perform the processing
described in the spec's **Preprocessing the input stream** section
before feeding the result to the tokenizer.
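
Most of that preprocessing is newline normalization; a minimal sketch of
that step (the parse errors it can raise for controls, noncharacters and
surrogates are left out here):

    def preprocess_input_stream(chars: str) -> str:
        """Normalize newlines as described in the spec's "Preprocessing
        the input stream" section: CRLF pairs and lone CRs both become a
        single LF."""
        return chars.replace("\r\n", "\n").replace("\r", "\n")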

If `test.doubleEscaped` is present and `true`, then `test.input` is not
quite as described above. Instead, it must first be subjected to another
round of unescaping (i.e., in addition to any unescaping involved in the
JSON import), and the result of *that* represents the characters of the
input stream. Currently, the only unescaping required by this option is
to convert each sequence of the form \\uHHHH (where H is a hex digit)
into the corresponding Unicode code point. (Note that this option also
affects the interpretation of `test.output`.)
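
A sketch of that extra unescaping round, assuming a Python harness:

    import re

    def double_unescape(s: str) -> str:
        """Turn each remaining \\uHHHH sequence into the code point it
        names; JSON parsing has already handled one level of escaping."""
        return re.sub(r"\\u([0-9A-Fa-f]{4})",
                      lambda m: chr(int(m.group(1), 16)),
                      s)

When `doubleEscaped` is `true`, the same unescaping must also be applied
to every string in `test.output` (see **Test results** below).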

`test.initialStates` is a list of strings, each being the name of a
tokenizer state. The test should be run once for each string, using it
to set the tokenizer's initial state for that run. If
`test.initialStates` is omitted, it defaults to `["data state"]`.

`test.lastStartTag` is a lowercase string that should be used as "the
tag name of the last start tag to have been emitted from this
tokenizer", referenced in the spec's definition of **appropriate end tag
token**. If it is omitted, it is treated as if "no start tag has been
emitted from this tokenizer".
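
Taken together, the set-up fields determine one tokenizer run per
initial state. A minimal sketch (the test object below is illustrative
rather than copied from a test file, and `run_tokenizer` is a
hypothetical stand-in for the implementation under test):

    # Illustrative test object, not taken from one of the .test files.
    test = {
        "description": "End tag matching the last start tag in RAWTEXT",
        "input": "</xmp>",
        "output": [["EndTag", "xmp"]],
        "initialStates": ["RAWTEXT state"],
        "lastStartTag": "xmp",
    }

    for state in test.get("initialStates", ["data state"]):
        last_start_tag = test.get("lastStartTag")  # None: no start tag emitted yet
        # actual = run_tokenizer(test["input"], initial_state=state,
        #                        last_start_tag=last_start_tag)
        # assert actual == test["output"], test["description"]
        print(state, last_start_tag)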

### Test results

`test.output` is a list of tokens, in the order the tokenizer produces
them: the first token produced is the first (leftmost) entry in the
list. The list must match the **complete** list of tokens that the
tokenizer should produce. Valid tokens are:

    ["DOCTYPE", name, public_id, system_id, correctness]
    ["StartTag", name, {attributes}*, true*]
    ["StartTag", name, {attributes}]
    ["EndTag", name]
    ["Comment", data]
    ["Character", data]
    "ParseError"

`public_id` and `system_id` are either strings or `null`. `correctness`
is either `true` or `false`; `true` corresponds to the force-quirks flag
being false, and vice-versa.

When the self-closing flag is set, the `StartTag` array has `true` as
its fourth entry. When the flag is not set, the array has only three
entries for backwards compatibility.

All adjacent character tokens are coalesced into a single
`["Character", data]` token.

If `test.doubleEscaped` is present and `true`, then every string within
`test.output` must be further unescaped (as described above) before
being compared with the tokenizer's output.

`test.ignoreErrorOrder` is a boolean value indicating that the order of
`ParseError` tokens relative to other tokens in the output stream is
unimportant, and implementations should ignore such differences between
their output and `expected_output_tokens`. (This is used for errors
emitted by the input stream preprocessing stage, since it is useful to
test that code but the spec does not define exactly when those errors
occur.) If it is omitted, it defaults to `false`.
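
One way a harness might honor this flag when comparing token lists (a
sketch, not a prescribed algorithm):

    def tokens_match(expected, actual, ignore_error_order=False):
        """Compare token lists; when ignore_error_order is set, only the
        number of "ParseError" entries matters, not where they occur."""
        if not ignore_error_order:
            return expected == actual
        strip = lambda toks: [t for t in toks if t != "ParseError"]
        return (strip(expected) == strip(actual)
                and expected.count("ParseError") == actual.count("ParseError"))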

xmlViolation tests
------------------

`tokenizer/xmlViolation.test` differs from the above in a couple of
ways:

-   The name of the single member of the top-level JSON object is
    "xmlViolationTests" instead of "tests".
-   Each test's expected output assumes that the implementation is
    applying the tweaks given in the spec's "Coercing an HTML DOM into
    an infoset" section.