  subunit: A streaming protocol for test results
  Copyright (C) 2005-2013 Robert Collins <robertc@robertcollins.net>

  Licensed under either the Apache License, Version 2.0 or the BSD 3-clause
  license at the users choice. A copy of both licenses are available in the
  project source as Apache-2.0 and BSD. You may not use this file except in
  compliance with one of these two licences.

  Unless required by applicable law or agreed to in writing, software
  distributed under these licenses is distributed on an "AS IS" BASIS, WITHOUT
  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
  license you chose for the specific language governing permissions and
  limitations under that license.

  See the COPYING file for full details on the licensing of Subunit.

  subunit reuses iso8601 by Michael Twomey, distributed under an MIT style
  licence - see python/iso8601/LICENSE for details.

Subunit
-------

Subunit is a streaming protocol for test results.

There are two major revisions of the protocol. Version 1 was trivially human
readable but had significant defects as far as highly parallel testing was
concerned - it had no room for doing discovery and execution in parallel,
required substantial buffering when multiplexing and was fragile - a corrupt
byte could cause an entire stream to be misparsed. Version 1.1 added
encapsulation of binary streams which mitigated some of the issues but the
core remained.

Version 2 shares many of the good characteristics of Version 1 - it can be
embedded into a regular text stream (e.g. from a build system) and it still
models xUnit style test execution. It also fixes many of the issues with
Version 1 - Version 2 can be multiplexed without excessive buffering (in
time or space), it has a well defined recovery mechanism for dealing with
corrupted streams (e.g.
where two processes write to the same stream
concurrently, or where the stream generator suffers a bug).

More details on both protocol versions can be found in the 'The protocol'
section of this document.

Subunit comes with command line filters to process a subunit stream and
language bindings for Python, C, C++ and shell. Bindings are easy to write
for other languages.

A number of useful things can be done easily with subunit:
 * Test aggregation: Tests run separately can be combined and then
   reported/displayed together. For instance, tests from different languages
   can be shown as a seamless whole, and tests running on multiple machines
   can be aggregated into a single stream through a multiplexer.
 * Test archiving: A test run may be recorded and replayed later.
 * Test isolation: Tests that may crash or otherwise interact badly with each
   other can be run separately and then aggregated, rather than interfering
   with each other or requiring an ad hoc test->runner reporting protocol.
 * Grid testing: subunit can act as the necessary serialisation and
   deserialisation to get test runs on distributed machines to be reported in
   real time.

Subunit supplies the following filters:
 * tap2subunit - convert Perl's TestAnythingProtocol to subunit.
 * subunit2csv - convert a subunit stream to csv.
 * subunit2disk - export a subunit stream to files on disk.
 * subunit2pyunit - convert a subunit stream to pyunit test results.
 * subunit2gtk - show a subunit stream in GTK.
 * subunit2junitxml - convert a subunit stream to JUnit's XML format.
 * subunit-diff - compare two subunit streams.
 * subunit-filter - filter out tests from a subunit stream.
 * subunit-ls - list info about tests present in a subunit stream.
 * subunit-stats - generate a summary of a subunit stream.
 * subunit-tags - add or remove tags from a stream.

Integration with other tools
----------------------------

Subunit's language bindings act as integration with various test runners such
as 'check', 'cppunit' and Python's 'unittest'. Beyond that a small amount of
glue (typically a few lines) will allow Subunit to be used in more
sophisticated ways.

Python
======

Subunit has excellent Python support: most of the filters and tools are written
in Python and there are facilities for using Subunit to increase test isolation
seamlessly within a test suite.

The most common way is to run an existing Python test suite and have it output
subunit via the ``subunit.run`` module::

  $ python -m subunit.run mypackage.tests.test_suite

For more information on the Python support Subunit offers, please see
``pydoc subunit``, or the source in ``python/subunit/``.

C
=

Subunit has C bindings to emit the protocol. The 'check' C unit testing project
has included subunit support in their project for some years now. See
'c/README' for more details.

C++
===

The C library is includable and usable directly from C++. A TestListener for
CPPUnit is included in the Subunit distribution. See 'c++/README' for details.

shell
=====

There are two sets of shell tools. There are filters, which accept a subunit
stream on stdin and output processed data (or a transformed stream) on stdout.

Then there are unittest facilities similar to those for C: shell bindings
consisting of simple functions to output protocol elements, and a patch for
adding subunit output to the 'ShUnit' shell test runner. See 'shell/README' for
details.

Filter recipes
--------------

To ignore some failing tests whose root cause is already known::

  subunit-filter --without 'AttributeError.*flavor'


The xUnit test model
--------------------

Subunit implements a slightly modified xUnit test model.
The stock standard
model is that there are tests, which have an id(), can be run, and when run
start, emit an outcome (like success or failure) and then finish.

Subunit extends this with the idea of test enumeration (find out about tests
a runner has without running them), tags (allow users to describe tests in
ways the test framework doesn't apply any semantic value to), file attachments
(allow arbitrary data to make analysing a failure easy) and timestamps.

The protocol
------------

Version 2, or v2, is new and still under development, but is intended to
supersede version 1 in the very near future. Subunit's bundled tools accept
only version 2 and only emit version 2, but the new filters subunit-1to2 and
subunit-2to1 can be used to interoperate with older third party libraries.

Version 2
=========

Version 2 is a binary protocol consisting of independent packets that can be
embedded in the output from tools like make - as long as each packet has no
other bytes mixed in with it (which 'make -j N>1' has a tendency to do).
Version 2 is currently in draft form, and early adopters should be willing
to either discard stored results (if protocol changes are made), or bulk
convert them back to v1 and then to a newer edition of v2.

The protocol synchronises at the start of the stream, after a packet, or
after any 0x0A byte. That is, a subunit v2 packet starts after a newline or
directly after the end of the prior packet.

Subunit is intended to be transported over a reliable streaming protocol such
as TCP. As such it does not concern itself with out of order delivery of
packets. However, because of the possibility of corruption due to either
bugs in the sender, or due to mixed up data from concurrent writes to the same
fd when being embedded, subunit strives to recover reasonably gracefully from
damaged data.

A key design goal for Subunit version 2 is to allow processing and multiplexing
without forcing buffering for semantic correctness, as buffering tends to hide
hung or otherwise misbehaving tests. That said, limited time based buffering
for network efficiency is a good idea - this is ultimately the implementer's
choice. Line buffering is also discouraged for subunit streams, as dropping
into a debugger or other tool may require interactive traffic even if line
buffering would not otherwise be a problem.

In version 2 there are two conceptual events - a test status event and a file
attachment event. Events may have timestamps, and the path of multiplexers that
an event is routed through is recorded to permit sending actions back to the
source (such as new tests to run or stdin for driving debuggers and other
interactive input). Test status events are used to enumerate tests and to
report tests and test helpers as they run. Tests may have tags, used to allow
tunnelling extra meanings through subunit without requiring parsing of
arbitrary file attachments. Things that are not standalone tests get marked
as such by setting the 'Runnable' flag to false. (For instance, individual
assertions in TAP are not runnable tests, only the top level TAP test script
is runnable).

File attachments are used to provide rich detail about the nature of a failure.
File attachments can also be used to encapsulate stdout and stderr both during
and outside tests.

Most numbers are stored in network byte order - Most Significant Byte first -
encoded using a variation of http://www.dlugosz.com/ZIP2/VLI.html. The top two
bits of the first byte encode the total number of octets in the number.
This encoding can encode values from 0 to 2**30-1, enough to hold a nanosecond
count within one second (at most 999999999). Numbers that are not variable
length encoded are still stored in MSB order.
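
The variable length number encoding can be illustrated with a short Python
sketch. The function names ``encode_number`` and ``decode_number`` are invented
for this example and are not part of the subunit bindings; the sketch only
assumes what is stated here, namely that the top two bits of the first octet
give the octet count (00 for one octet up to 11 for four) and that the
remaining bits hold the value, most significant byte first.

```python
def encode_number(value: int) -> bytes:
    """Encode 0 <= value < 2**30 in 1 to 4 octets, most significant byte first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    for octets in (1, 2, 3, 4):
        value_bits = 8 * octets - 2  # two bits are taken by the size prefix
        if value < (1 << value_bits):
            prefix = (octets - 1) << value_bits  # 00, 01, 10 or 11 in the top bits
            return (prefix | value).to_bytes(octets, "big")
    raise ValueError("value must be below 2**30")


def decode_number(data: bytes, offset: int = 0):
    """Return (value, octets_consumed) for the number starting at offset."""
    octets = (data[offset] >> 6) + 1  # top two bits select the total size
    raw = data[offset:offset + octets]
    if len(raw) < octets:
        raise ValueError("truncated number")
    mask = (1 << (8 * octets - 2)) - 1  # strip the two prefix bits
    return int.from_bytes(raw, "big") & mask, octets


for sample in (0, 63, 64, 16383, 16384, 2**30 - 1):
    encoded = encode_number(sample)
    assert decode_number(encoded) == (sample, len(encoded))
```

Because the size is carried in the first octet, a consumer can skip a number
it does not care about after reading a single byte.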

+--------+--------+---------------+---------------+
| prefix | octets | max (formula) | max (decimal) |
+========+========+===============+===============+
| 00     |      1 |        2**6-1 |            63 |
+--------+--------+---------------+---------------+
| 01     |      2 |       2**14-1 |         16383 |
+--------+--------+---------------+---------------+
| 10     |      3 |       2**22-1 |       4194303 |
+--------+--------+---------------+---------------+
| 11     |      4 |       2**30-1 |    1073741823 |
+--------+--------+---------------+---------------+

All variable length elements of the packet are stored with a length prefix
number allowing them to be skipped over for consumers that don't need to
interpret them.

UTF-8 strings are stored with no terminating NUL and should not have any
embedded NULs. Implementations SHOULD validate any such strings that they
process and take some remedial action (such as discarding the packet as
corrupt).

In short the structure of a packet is::

  PACKET := SIGNATURE FLAGS PACKET_LENGTH TIMESTAMP? TESTID? TAGS? MIME?
            FILECONTENT? ROUTING_CODE? CRC32

In more detail...

Packets are identified by a single byte signature - 0xB3, which is never legal
in a UTF-8 stream as the first byte of a character. 0xB3 starts with the first
bit set and the second not, which is the UTF-8 signature for a continuation
byte. 0xB3 was chosen as 0x73 ('s' in ASCII) with the top two bits replaced by
the 1 and 0 for a continuation byte.

If subunit packets are being embedded in a non-UTF-8 text stream, where 0xB3 is
a legal character, consider either recoding the text to UTF-8, or using
subunit's 'file' packets to embed the text stream in subunit, rather than the
other way around.

Following the signature byte comes a 16-bit flags field, which includes a
4-bit version field - if the version is not 0x2 then the packet cannot be
read. It is recommended to signal an error at this point (e.g.
by emitting
a synthetic error packet and returning to the top level loop to look for
new packets, or exiting with an error). If recovery is desired, treat the
packet signature as an opaque byte and scan for a new synchronisation point.
NB: Subunit V1 and V2 packets may legitimately include 0xB3 internally,
as they are an 8-bit safe container format, so recovery from this situation
may involve an arbitrary number of false positives until an actual packet
is encountered, and even then it may still be false, failing after passing
the version check due to coincidence.

Flags are stored in network byte order too.

+------------+------------+------------------------+
| High byte               | Low byte               |
+------------+------------+------------------------+
| 15 14 13 12 11 10  9  8 | 7  6  5  4  3  2  1  0 |
+------------+------------+------------------------+
| VERSION    | feature bits                        |
+------------+-------------------------------------+

Valid version values are:

* 0x2 - version 2

Feature bits:

* Bit 11 - mask 0x0800 - Test id present.
* Bit 10 - mask 0x0400 - Routing code present.
* Bit 9 - mask 0x0200 - Timestamp present.
* Bit 8 - mask 0x0100 - Test is 'runnable'.
* Bit 7 - mask 0x0080 - Tags are present.
* Bit 6 - mask 0x0040 - File content is present.
* Bit 5 - mask 0x0020 - File MIME type is present.
* Bit 4 - mask 0x0010 - EOF marker.
* Bit 3 - mask 0x0008 - Must be zero in version 2.

Test status gets three bits (bits 2, 1 and 0 - mask 0x0007), read as a test
status enum lookup:

* 0x0 - undefined / no test
* 0x1 - Enumeration / existence
* 0x2 - In progress
* 0x3 - Success
* 0x4 - Unexpected Success
* 0x5 - Skipped
* 0x6 - Failed
* 0x7 - Expected failure

After the flags field is a number field giving the length in bytes for the
entire packet including the signature and the checksum. This length must
be less than 4MiB (at most 4194303 bytes).
The encoding can obviously record a larger
number but one of the goals is to avoid requiring large buffers, or causing
large latency in the packet forward/processing pipeline. Larger file
attachments can be communicated in multiple packets, and the overhead in such a
4MiB packet is approximately 0.2%.

The rest of the packet is a series of optional features as specified by the set
feature bits in the flags field. When absent they are entirely absent.

Forwarding and multiplexing of packets can be done without interpreting the
remainder of the packet until the routing code and checksum (which are both at
the end of the packet). Additionally, routers can often avoid copying or moving
the bulk of the packet, as long as the routing code size increase doesn't force
the length encoding to take up a new byte (which will only happen to packets
less than or equal to 16KiB in length) - large packets are very efficient to
route.

Timestamp when present is a 32 bit unsigned integer for seconds, and a variable
length number for nanoseconds, representing UTC time since Unix Epoch in
seconds and nanoseconds.

Test id when present is a UTF-8 string. The test id should uniquely identify
runnable tests such that they can be selected individually. For tests and other
actions which cannot be individually run (such as test
fixtures/layers/subtests) uniqueness is not required (though being human
meaningful is highly recommended).

Tags when present is a length prefixed vector of UTF-8 strings, one per tag.
There are no restrictions on tag content (other than the restrictions on UTF-8
strings in subunit in general). Tags have no ordering.

When a MIME type is present, it defines the MIME type for the file across all
packets of the same file (routing code + testid + name uniquely identifies a
file, reset when EOF is flagged).
If a file never has a MIME type set, it should be
treated as application/octet-stream.

File content when present is a UTF-8 string for the name followed by the length
in bytes of the content, and then the content octets.

The routing code, if present, is a UTF-8 string. The routing code is used to
determine which test backend a test was running on when doing data analysis,
and to route stdin to the test process if interaction is required.

Multiplexers SHOULD add a routing code if none is present, and prefix any
existing routing code with a routing code ('/' separated) if one is already
present. For example, a multiplexer might label each stream it is multiplexing
with a simple ordinal ('0', '1' etc), and given an incoming packet with route
code '3' from stream '0' would adjust the route code when forwarding the packet
to be '0/3'.

The final element of the packet is a CRC-32 checksum of the contents of the
packet including the signature.

Example packets
~~~~~~~~~~~~~~~

Trivial test "foo" enumeration packet, with test id, runnable set,
status=enumeration. Spaces below are to visually break up signature / flags /
length / testid / crc32::

  b3 2901 0c 03666f6f 08555f1b


Version 1 (and 1.1)
===================

Version 1 (and 1.1) are mostly human readable protocols.

Sample subunit wire contents
----------------------------

The following::

  test: test foo works
  success: test foo works
  test: tar a file.
  failure: tar a file. [
  ..
   ].. space is eaten.
  foo.c:34 WARNING foo is not defined.
  ]
  a writeln to stdout

When run through subunit2pyunit::

  .F
  a writeln to stdout

  ========================
  FAILURE: tar a file.
  -------------------
  ..
  ].. space is eaten.
  foo.c:34 WARNING foo is not defined.
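
For comparison with the text samples, the version 2 example packet shown
earlier (``b3 2901 0c 03666f6f 08555f1b``) can be pulled apart with a few lines
of Python. This is a sketch rather than the bundled parser: the names
``read_short_number`` and ``decode_test_id_packet`` are hypothetical, it only
accepts packets whose sole optional field is a test id, it only reads
one-octet numbers, and it assumes the checksum is the common zlib/IEEE CRC-32
stored most significant byte first.

```python
import zlib

SIGNATURE = 0xB3
FLAG_TEST_ID = 0x0800      # feature bit 11
FLAG_RUNNABLE = 0x0100     # feature bit 8
STATUS_MASK = 0x0007       # bits 2..0
UNSUPPORTED = 0x06E0       # routing code, timestamp, tags, file content, MIME


def read_short_number(packet: bytes, offset: int) -> int:
    """Read a one-octet (prefix 00) number; longer encodings are out of scope."""
    value = packet[offset]
    if value >> 6 != 0:
        raise ValueError("this sketch only handles one-octet numbers")
    return value


def decode_test_id_packet(packet: bytes) -> dict:
    """Decode a packet laid out as signature, flags, length, test id, CRC."""
    if packet[0] != SIGNATURE:
        raise ValueError("missing 0xB3 signature")
    flags = int.from_bytes(packet[1:3], "big")  # network byte order
    if flags >> 12 != 0x2:
        raise ValueError("not a version 2 packet")
    if flags & UNSUPPORTED or not flags & FLAG_TEST_ID:
        raise ValueError("this sketch expects a test id and nothing else")
    length = read_short_number(packet, 3)
    id_length = read_short_number(packet, 4)
    test_id = packet[5:5 + id_length].decode("utf-8")
    stored_crc = int.from_bytes(packet[length - 4:length], "big")
    # Assumption: zlib's CRC-32 over every byte that precedes the checksum.
    computed_crc = zlib.crc32(packet[:length - 4]) & 0xFFFFFFFF
    return {
        "version": flags >> 12,
        "runnable": bool(flags & FLAG_RUNNABLE),
        "status": flags & STATUS_MASK,
        "length": length,
        "test_id": test_id,
        "crc_matches": stored_crc == computed_crc,
    }


info = decode_test_id_packet(bytes.fromhex("b329010c03666f6f08555f1b"))
print(info)
```

The flags value 0x2901 breaks down as version 0x2, test id present (0x0800),
runnable (0x0100) and status 0x1 (enumeration), which matches the description
of the example packet.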


Subunit v1 protocol description
===============================

This description is being ported to an EBNF style. Currently it's only partly in
that style, but should be fairly clear all the same. When in doubt, refer to the
source (and ideally help fix up the description!). Generally the protocol is
line orientated and consists of either directives and their parameters, or
when outside a DETAILS region unexpected lines which are not interpreted by
the parser - they should be forwarded unaltered::

  test|testing|test:|testing: test LABEL
  success|success:|successful|successful: test LABEL
  success|success:|successful|successful: test LABEL DETAILS
  failure: test LABEL
  failure: test LABEL DETAILS
  error: test LABEL
  error: test LABEL DETAILS
  skip[:] test LABEL
  skip[:] test LABEL DETAILS
  xfail[:] test LABEL
  xfail[:] test LABEL DETAILS
  uxsuccess[:] test LABEL
  uxsuccess[:] test LABEL DETAILS
  progress: [+|-]X
  progress: push
  progress: pop
  tags: [-]TAG ...
  time: YYYY-MM-DD HH:MM:SSZ

  LABEL: UTF8*
  NAME: UTF8*
  DETAILS ::= BRACKETED | MULTIPART
  BRACKETED ::= '[' CR UTF8-lines ']' CR
  MULTIPART ::= '[ multipart' CR PART* ']' CR
  PART ::= PART_TYPE CR NAME CR PART_BYTES CR
  PART_TYPE ::= Content-Type: type/sub-type(;parameter=value,parameter=value)
  PART_BYTES ::= (DIGITS CR LF BYTE{DIGITS})* '0' CR LF

unexpected output on stdout -> stdout.
exit w/0 or last test completing -> error

Tags given outside a test are applied to all following tests.
Tags given after a test: line and before the result line for the same test
apply only to that test, and inherit the current global tags.
A '-' before a tag is used to remove tags - e.g. to prevent a global tag
applying to a single test, or to cancel a global tag.
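
To make the line orientated structure concrete, here is a rough Python sketch
of a reader for a subset of the grammar above. It is not the parser shipped
with subunit: ``parse_v1`` and the event tuples it yields are invented for this
illustration, everything after the directive keyword is treated as the label
(an assumption based on the sample stream), and multipart details, escaping
inside bracketed details and tag scoping are all left out.

```python
import re

# Directive spellings from the grammar, mapped to one canonical event name.
_OUTCOMES = {
    "test": "start", "testing": "start",
    "success": "success", "successful": "success",
    "failure": "failure", "error": "error",
    "skip": "skip", "xfail": "xfail", "uxsuccess": "uxsuccess",
}
_COLON_REQUIRED = ("failure", "error")
_OTHER_DIRECTIVES = ("progress", "tags", "time")
_LINE = re.compile(r"^(\w+)(:?) (.*)$")


def parse_v1(lines):
    """Yield (kind, text, details) tuples for an iterable of v1 text lines."""
    lines = iter(lines)
    for line in lines:
        line = line.rstrip("\n")
        match = _LINE.match(line)
        name = match.group(1) if match else None
        has_colon = bool(match and match.group(2))
        if name in _OUTCOMES and (has_colon or name not in _COLON_REQUIRED):
            label, details = match.group(3), None
            if label.endswith(" ["):
                # BRACKETED details: collect lines until a lone ']'.
                label = label[:-2]
                body = []
                for detail_line in lines:
                    detail_line = detail_line.rstrip("\n")
                    if detail_line == "]":
                        break
                    body.append(detail_line)
                details = "\n".join(body)
            yield (_OUTCOMES[name], label, details)
        elif name in _OTHER_DIRECTIVES and has_colon:
            yield (name, match.group(3), None)
        else:
            # Not a directive: forward the line unaltered.
            yield ("passthrough", line, None)


sample = [
    "test: test foo works",
    "success: test foo works",
    "test: tar a file.",
    "failure: tar a file. [",
    "..",
    "foo.c:34 WARNING foo is not defined.",
    "]",
    "a writeln to stdout",
]
for event in parse_v1(sample):
    print(event)
```

Lines inside a bracketed details region are consumed by the inner loop, so
only lines outside such a region can end up as passthrough output.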

The progress directive is used to provide progress information about a stream
so that a stream consumer can provide completion estimates, progress bars and
so on. Stream generators that know how many tests will be present in the stream
should output "progress: COUNT". Stream filters that add tests should output
"progress: +COUNT", and those that remove tests should output
"progress: -COUNT". An absolute count should reset the progress indicators in
use - it indicates that two separate streams from different generators have
been trivially concatenated together, and there is no knowledge of how many
more complete streams are incoming. Smart concatenation could scan each stream
for its count and sum them, or alternatively translate absolute counts into
relative counts inline. It is recommended that outputters avoid absolute counts
unless necessary.

The push and pop directives are used to provide local regions
for progress reporting. This fits with hierarchically operating test
environments - such as those that organise tests into suites - the top-most
runner can report on the number of suites, and each suite surrounds its output
with a (push, pop) pair. Interpreters should interpret a pop as also advancing
the progress of the restored level by one step. Encountering progress
directives between the start and end of a test pair indicates that a previous
test was interrupted and did not cleanly terminate: it should be implicitly
closed with an error (the same as when a stream ends with no closing test
directive for the most recently started test).

The time directive acts as a clock event - it sets the time for all future
events. The value should be a valid ISO8601 time.

The skip, xfail and uxsuccess outcomes are not supported by all testing
environments.
In Python the testtools (https://launchpad.net/testtools)
library is used to translate these automatically if an older Python version
that does not support them is in use. See the testtools documentation for the
translation policy.

skip is used to indicate a test was discovered but not executed. xfail is used
to indicate a test that errored in some expected fashion (also known as "TODO"
tests in some frameworks). uxsuccess is used to indicate an unexpected success
where a test thought to be failing actually passes. It is complementary to
xfail.

Hacking on subunit
------------------

Releases
========

* Update versions in configure.ac and python/subunit/__init__.py.
* Update NEWS.
* Do a make distcheck, which will update Makefile etc.
* Do a PyPI release: ``PYTHONPATH=../../python python ../../setup.py sdist bdist_wheel upload -s``
* Upload the regular one to LP.
* Push a tagged commit: ``git push -t origin master:master``