% Native/Unit Test Development Guidelines

The purpose of these guidelines is to establish a shared vision on
what kind of native tests we want to develop for HotSpot using
GoogleTest and how we want to develop them. Hence these guidelines
include style items as well as test approach items.

The first section of this document describes properties of good tests
which are common to almost all types of tests regardless of language,
framework, etc. Further sections provide recommendations on how to
achieve those properties as well as other HotSpot and/or GoogleTest
specific guidelines.

## Good test properties

### Lightness

Use the most lightweight type of tests.

In HotSpot, there are three types of tests with respect to their
dependency on a JVM; each next level is slower than the previous one:

* `TEST` : a test which does not depend on a JVM

* `TEST_VM` : a test which depends on an initialized JVM, but is
supposed not to break the JVM, i.e. to leave it in a workable state

* `TEST_OTHER_VM` : a test which depends on a JVM and either requires a
freshly initialized JVM or leaves the JVM in a non-workable state

### Isolation

Tests have to be isolated: they should have no visible side effects
and no influence on other tests' results.

Results of one test should not depend on test execution order or on
other tests; otherwise it becomes almost impossible to find out why a
test failed. Due to HotSpot specifics, it is not easy to get full
isolation, e.g. we share an initialized JVM between all `TEST_VM` tests,
so if your test changes the JVM's state too drastically and does not
change it back, you had better consider `TEST_OTHER_VM`.

### Atomicity and self-containment

Tests should be *atomic* and *self-contained* at the same time.

One test should check a particular part of a class, subsystem,
functionality, etc. Then it is quite easy to determine what parts of a
product are broken based on test failures.
On the other hand, a test
should test that part more or less entirely, because when one sees a
test `FooTest::bar`, they assume all aspects of `bar` from `Foo` are
tested.

However, it is impossible to cover all aspects even of a method, not
to mention a subsystem. In such cases, it is recommended to have
several tests, one for each aspect of the thing under test. For example,
one test to check how `Foo::bar` works if an argument is `null`, another
test to check how it works if an argument is acceptable but `Foo` is not
in the right state to accept it, and so on. This not only helps to keep
tests atomic and self-contained but also makes test names
self-descriptive (discussed in more detail in [Test names](#test-names)).

### Repeatability

Tests have to be repeatable.

Reproducibility is crucial for a test. No one likes sporadic test
failures: they are hard to investigate, fix, and verify a fix for.

In some cases, it is quite hard to write a 100% repeatable test, since
besides the test itself there can be other moving parts, e.g. in case of
`TEST_VM` there are several concurrently running threads. Despite this,
we should try to make a test as reproducible as possible.

### Informativeness

In case of a failure, a test should be as *informative* as possible.

Having more information about a test failure than just the compared
values can be very useful for failure troubleshooting; it can reduce or
even completely eliminate hours of debugging. This is even more
important in case of failures which are not 100% reproducible.

While achieving this property, one can easily make a test too verbose,
so it will be really hard to find useful information in the ocean of
useless information. Hence one should think not only about how to
provide [good information](#error-messages), but also about
[when to do it](#uncluttered-output).

### Testing instead of visiting

Tests should *test*.
It is not enough just to "visit" some code; a test should check that
the code does what it is supposed to do: compare return values with
expected values, check that desired side effects take place and
undesired ones do not, and so on. In other words, a test should contain
at least one GoogleTest assertion and should not rely on JVM asserts.

Generally speaking, to write a good test, one should create a model of
the system under test and a model of possible bugs (or bugs which one
wants to find), and design tests using those models.

### Nearness

Prefer having checks inside test code.

Having test logic outside, e.g. in a verification method that depends
on asserts in product code, not only contradicts several of the items
above but also decreases the test's readability and stability. It is
much easier to understand what a test is testing when all testing logic
is located inside the test or nearby in shared test libraries. As a rule
of thumb, the closer a check is to a test, the better.

## Asserts

### Several checks

Prefer `EXPECT` over `ASSERT` if possible.

This is related to the [informativeness](#informativeness) property of
tests; information from other checks can help to better localize a
defect's root cause. One should use `ASSERT` if it is impossible to
continue test execution after a failed check or if it does not make
much sense. Later in the text, `EXPECT` forms will be used to refer to
both `ASSERT` and `EXPECT`.

When it is possible to make several different checks, but impossible
to continue test execution if at least one check fails, you can
use the `::testing::Test::HasNonfatalFailure()` function. The
recommended way to express that is
`ASSERT_FALSE(::testing::Test::HasNonfatalFailure())`. Besides making it
clear why the test is aborted, it also allows you to provide more
information about a failure.
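As an illustration, here is a sketch combining independent `EXPECT`
checks with an `ASSERT_FALSE(::testing::Test::HasNonfatalFailure())`
barrier. The class `Foo` and its accessors are hypothetical names, and
the `unittest.hpp` include follows the convention used by HotSpot's
gtest tests:

```cpp
#include "unittest.hpp" // HotSpot's GoogleTest wrapper (assumed)

// Hypothetical example: Foo, size(), is_empty() and head() are
// illustrative names, not real HotSpot entities.
TEST(Foo, freshly_created_foo_is_empty) {
  Foo foo;
  // Independent checks: use EXPECT so that all failures are reported.
  EXPECT_EQ(0, foo.size());
  EXPECT_TRUE(foo.is_empty());
  // The check below inspects internal state, so it makes no sense to
  // continue if any of the checks above has already failed.
  ASSERT_FALSE(::testing::Test::HasNonfatalFailure())
      << "Foo's basic accessors are broken, skipping head() check";
  EXPECT_TRUE(foo.head() == NULL);
}
```

Note that the extra message after `ASSERT_FALSE` explains *why* the
test aborts, which is exactly the additional information this pattern
allows you to provide.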
### First parameter is expected value

In all equality assertions, expected values should be passed as the
first parameter.

This convention is adopted by GoogleTest, and there is a slight
difference in how GoogleTest treats the parameters; the most important
one is `null` detection. For various reasons, `null` detection is
enabled only for the first parameter, that is to say
`EXPECT_EQ(NULL, object)` checks that `object` is `null`, while
`EXPECT_EQ(object, NULL)` checks that `object` equals `NULL`. GoogleTest
is very strict regarding the types of compared values, so the latter
will generate a compile-time error.

### Floating-point comparison

Use the special floating-point macros to compare `float`/`double`
values.

Because of floating-point number representations and round-off errors,
regular equality comparison will not return true in most cases. There
are special `EXPECT_FLOAT_EQ`/`EXPECT_DOUBLE_EQ` assertions which check
that the distance between the compared values is no more than 4 ULPs;
there is also `EXPECT_NEAR(v1, v2, eps)` which checks that the absolute
value of the difference between `v1` and `v2` is not greater than `eps`.

### C string comparison

Use the special string macros to compare C strings.

`EXPECT_EQ` just compares pointer values, which is hardly what one
wants when comparing C strings. GoogleTest provides the `EXPECT_STREQ`
and `EXPECT_STRNE` macros to compare C string contents. There are also
case-insensitive versions: `EXPECT_STRCASEEQ` and `EXPECT_STRCASENE`.

### Error messages

Provide informative, but not too verbose error messages.

All GoogleTest asserts print the compared expressions and their values,
so there is no need to repeat them in error messages. Asserts print
only the compared values; they do not print any interim variables, e.g.
`ASSERT_TRUE((val1 == val2 && isFail(foo(8))) || i == 18)` prints only
one value.
If you use complex predicates, please consider the
`EXPECT_PRED*` or `EXPECT_PRED_FORMAT*` assertion families; they check
that a predicate returns true/success and print out all parameter
values.

However, in some cases the default information is not enough; a
commonly used example is an assert inside a loop, where GoogleTest will
not print the iteration values (unless they are an assert's parameters).
Other illustrative examples are printing an error code and the
corresponding error message, or printing internal state which might
have an impact on the results. One should add such information to the
assert message using the `<<` operator.

### Uncluttered output

Print information only if it is needed.

Overly verbose tests which print all information even when they pass
are a very bad practice. They just pollute the output, so it becomes
harder to find useful information. In order not to print information
until it is really needed, one should consider saving it to a temporary
buffer and passing it to an assert.
<https://hg.openjdk.java.net/jdk/jdk/file/tip/test/hotspot/gtest/gc/shared/test_memset_with_concurrent_readers.cpp>
has a good example of how to do that.

### Failures propagation

Wrap a subroutine call into the `EXPECT_NO_FATAL_FAILURE` macro to
propagate failures.

`ASSERT` and `FAIL` abort only the current function, so if you use them
in a subroutine, a test will not be aborted after the subroutine even
if an `ASSERT` or `FAIL` fires. You should wrap calls to such
subroutines in the `ASSERT_NO_FATAL_FAILURE` macro to propagate fatal
failures and abort the test. `(EXPECT|ASSERT)_NO_FATAL_FAILURE` can also
be used to provide more information.

For obvious reasons, there are no
`(EXPECT|ASSERT)_NO_NONFATAL_FAILURE` macros.
However, if you need to
check whether a subroutine generated a nonfatal failure (failed an
`EXPECT`), you can use the `::testing::Test::HasNonfatalFailure`
function, or the `::testing::Test::HasFailure` function to check whether
a subroutine generated any failure at all; see
[Several checks](#several-checks).

## Naming and Grouping

### Test group names

Test group names should be in CamelCase and start and end with a
letter. A test group should be named after the tested class,
functionality, subsystem, etc.

This naming scheme helps to find tests and filter them, and simplifies
test failure analysis. For example: class `Foo` - test group `Foo`,
compiler logging subsystem - test group `CompilerLogging`, G1 GC - test
group `G1GC`, and so forth.

### Filename

A test file must have the `test_` prefix and the `.cpp` suffix.

Both are actually requirements of the current build system so that it
can recognize your tests.

### File location

Test file location should reflect the location of the tested part of
the product.

* All unit tests for a class from `foo/bar/baz.cpp` should be placed in
`foo/bar/test_baz.cpp` in the `hotspot/test/native/` directory. Having
all tests for a class in one file is a common practice for unit tests;
it helps to see all existing tests at once and to share functions
and/or resources without losing encapsulation.

* For tests which test more than one class, the directory hierarchy
should be the same as the product hierarchy, and the file name should
reflect the name of the tested subsystem/functionality. For example, if
a subsystem under test belongs to `gc/g1`, tests should be placed in the
`gc/g1` directory.

Please note that the framework prepends the directory name to a test
group name.
For example, if `TEST(foo, check_this)` and `TEST(bar, check_that)`
are defined in the `hotspot/test/native/gc/shared/test_foo.cpp` file,
they will be reported as `gc/shared/foo::check_this` and
`gc/shared/bar::check_that`.

### Test names

Test names should be in small_snake_case and start and end with a
letter. A test name should reflect what the test checks.

Such naming makes tests self-descriptive and helps a lot during the
whole test life cycle. It makes it easy to do test planning and test
inventory, to see what things are not tested, to review tests, to
analyze test failures, and to evolve a test. For example,
`foo_return_0_if_name_is_null` is better than `foo_sanity` or
`foo_basic` or just `foo`, and
`humongous_objects_can_not_be_moved_by_young_gc` is better than
`ho_young_gc`.

Actually, using underscores is against the GoogleTest project's
convention, because it can lead to illegal identifiers; however, this
is too strict. Restricting the usage of underscores to test names only
and prohibiting test names that start or end with an underscore are
enough to be safe.

### Fixture classes

Fixture classes should be named after the tested classes, subsystems,
etc. (following the [Test group names rule](#test-group-names)) and
have the `Test` suffix to prevent class name conflicts.

### Friend classes

All test-purpose friends should have either the `Test` or the
`Testable` suffix.

This greatly simplifies understanding of a friendship's purpose and
allows one to statically check that private members are not exposed
unexpectedly. Having `FooTest` as a friend of `Foo` without any comments
will be understood as a necessary evil to get testability.

### OS/CPU specific tests

Guard OS/CPU specific tests by `#ifdef` and have the OS/CPU name in the
filename.
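For instance, a Linux-x64-only test might be guarded like this. This is
a sketch: the test body is illustrative, the file name
`test_os_linux_x64.cpp` is hypothetical, and the conditional macros
`LINUX` and `AMD64` follow HotSpot's platform-define conventions:

```cpp
// Hypothetical file: test_os_linux_x64.cpp
#include "unittest.hpp" // HotSpot's GoogleTest wrapper (assumed)

#if defined(LINUX) && defined(AMD64)

TEST(os_linux_x64, large_page_size_is_power_of_two) {
  // Illustrative platform-specific check; os::large_page_size() is
  // used here only as a plausible example.
  size_t sz = os::large_page_size();
  EXPECT_TRUE((sz & (sz - 1)) == 0) << "size: " << sz;
}

#endif // defined(LINUX) && defined(AMD64)
```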
For the time being, we do not support separate directories for OS-,
CPU-, and OS-CPU-specific tests; in case we get lots of such tests, we
will change the directory layout and the build system to support them
in the same way as is done in HotSpot.

## Miscellaneous

### Hotspot style

Abide by the norms and rules accepted in the HotSpot style guide.

Tests are a part of HotSpot, so everything (if applicable) we use for
HotSpot should be used for tests as well. These guidelines cover only
test-specific things.

### Code/test metrics

Coverage information and other code/test metrics are quite useful for
deciding what tests should be written, what tests should be improved,
and what can be removed.

For unit tests, a widely used and well-known coverage metric is branch
coverage, which provides good test quality with a relatively easy test
development process. For other levels of testing, branch coverage is
not as good, and one should consider other metrics, e.g. transaction
flow coverage or data flow coverage.

### Access to non-public members

Use an explicit friend class to get access to non-public members.

We do not use the GoogleTest macro to declare a friendship relation,
because, from our point of view, it is less clear than an explicit
declaration.

Declaring a test fixture class as a friend class of the tested class is
the easiest and clearest way to get access. However, it has some
disadvantages; here are some of them:

* Each test has to be declared as a friend
* Subclasses do not inherit the friendship relation

In other words, it is harder to share code between tests.
Hence if you
want to share code or expect it to be useful in other tests, you should
consider making the members of the tested class protected and
introducing a shared test-only class which exposes those members via
public functions, or even making the members publicly accessible right
away in the product class. If changing the members' visibility is not
an option, one can create a friend class which exposes them.

### Death tests

You cannot use death tests inside `TEST_OTHER_VM` and
`TEST_VM_ASSERT*`.

We tried to make the HotSpot-GoogleTest integration as transparent as
possible; however, due to the current implementation of `TEST_OTHER_VM`
and `TEST_VM_ASSERT*` tests, you cannot use death test functionality in
them. These tests are implemented as GoogleTest death tests, and
GoogleTest does not allow having a death test inside another death
test.

### External flags

Passing external flags to a tested JVM is not supported.

The rationale behind this design decision is to simplify both the tests
and the test framework, and to avoid failures related to incompatible
flag combinations until there is a good solution for that. However,
there are cases when one wants to test a JVM with a specific flag
combination; the `_JAVA_OPTIONS` environment variable can be used to do
that. Flags from `_JAVA_OPTIONS` will be used in `TEST_VM`,
`TEST_OTHER_VM` and `TEST_VM_ASSERT*` tests.

### Test-specific flags

Passing flags to a tested JVM in `TEST_OTHER_VM` and `TEST_VM_ASSERT*`
should be possible, but is not implemented yet.

A facility to pass test-specific flags is needed for system,
regression, or other types of tests which require a fully initialized
JVM in some particular configuration, e.g. with the Serial GC selected.
There is no support for such tests now; however, there is a plan to add
that in upcoming releases.
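Until then, a flag-dependent test can guard itself with an early return
and a `@requires` comment. This is a sketch: the test group and body
are hypothetical, and `UseG1GC` is a real JVM flag used here only as an
example:

```cpp
#include "unittest.hpp" // HotSpot's GoogleTest wrapper (assumed)

// @requires UseG1GC
TEST_VM(G1GC, hypothetical_g1_specific_check) {
  if (!UseG1GC) {
    return; // there is no flag passing facility yet, so skip the test
  }
  // ... G1-specific checks would go here ...
  EXPECT_TRUE(UseG1GC);
}
```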
For now, if a test depends on flag values, it should have `if
(!<flag>) { return }` guards at the very beginning and a `@requires`
comment, similar to the jtreg `@requires` directive, right before the
test macros.
<https://hg.openjdk.java.net/jdk/jdk/file/tip/test/hotspot/gtest/gc/g1/test_g1IHOPControl.cpp>
has an example of this temporary workaround. It is important to follow
that pattern, as it allows us to easily find all such tests and update
them as soon as there is an implementation of the flag passing
facility.

In the long term, we expect jtreg to support GoogleTest tests as first
class citizens, that is to say, jtreg will parse `@requires` comments
and filter out inapplicable tests.

### Flag restoring

Restore changed flags.

It is quite common for tests to configure a JVM in a certain way by
changing flag values. GoogleTest provides two ways to set up the
environment before a test and restore it afterward: using either a
constructor and destructor or the `SetUp` and `TearDown` functions.
Both ways require a test fixture class, which is sometimes too wordy.
The simpler facilities, such as the `FLAG_GUARD` macro or the
`*FlagSetting` classes, can be used in such cases to set and restore
values.

Caveats:

* Changing a flag's value could break the invariants between flags'
values and hence could lead to an unexpected/unsupported JVM state.

* `FLAG_SET_*` macros can change more than one flag (in order to
maintain invariants), so it is hard to predict what flags will be
changed, and this makes restoring all changed flags a nontrivial task.
Thus in case one uses `FLAG_SET_*` macros, they should use the
`TEST_OTHER_VM` test type.

### GoogleTest documentation

In case you have any questions regarding GoogleTest itself, its
asserts, test declaration macros, other macros, etc., please consult
its documentation.
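As an illustration of the flag-restoring facilities mentioned in
[Flag restoring](#flag-restoring), here is a sketch. The flag `Verbose`
is used purely as an example, and the exact form of the guard is
approximate; see the HotSpot sources for the real macro and class
definitions:

```cpp
#include "unittest.hpp" // HotSpot's GoogleTest wrapper (assumed)

TEST_VM(FlagGuard, hypothetical_flag_restoring) {
  // FLAG_GUARD remembers the current value of the flag and restores
  // it when the guard goes out of scope (sketch).
  FLAG_GUARD(Verbose);
  Verbose = true;
  // ... checks that depend on Verbose being enabled ...
  EXPECT_TRUE(Verbose);
} // Verbose is automatically restored here
```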
## TODO

Although this document provides guidelines on the most important parts
of test development using GoogleTest, it still misses a few items:

* Examples, especially for
  [access to non-public members](#access-to-non-public-members)

* Test types: purpose, drawbacks, limitations
    * `TEST_VM`
    * `TEST_VM_F`
    * `TEST_OTHER_VM`
    * `TEST_VM_ASSERT`
    * `TEST_VM_ASSERT_MSG`

* Miscellaneous
    * Test libraries
        * where to place them
        * how to write them
        * how to use them
    * Test your tests
        * how to run tests in random order
        * how to run only specific tests
        * how to run each test separately
        * check that a test can find the bugs it is supposed to by
          introducing them
    * Mocks/stubs/dependency injection
    * `SetUp`/`TearDown`
        * vs constructor/destructor
        * empty test to test them
    * Internal (declared in .cpp) structs/classes