1These are some simple design notes to document this program and to guide 2further work. 3 4This program is designed to do the following: 5 6 a) Encapsulate every RNG known to humans for testing. We will use the 7GSL both for its rich supply of ready-to-test RNG's and for its 8relatively simple and consistent encapsulation of any new RNG's we add 9-- /dev/random is already encapsulated and can serve as a template for 10future work. [1/16/03 rgb] 11 12 b) test speed of the available RNG's. Both: 13 i) In "native/standard" subroutine calls (with the subroutine call 14 overhead included). [This is done, 1/16/03, rgb] 15 ii) In a custom "vector" form, where a whole block of dedicated 16 memory is filled in one pass from a given algorithm, and 17 subsequently accessed via a macro that only calls for a refill 18 when the vector is exhausted. 19 20 c) test quality of the available RNG's. There are at least three 21primary sources of tests that will be used in this tool. 22 23 i) RGB tests. These are the tests I myself have implemented. To 24 the best of my knowledge these are new, although of course 25 my knowledge is far from complete. 26 27 ii) DIEHARD, by Dr. George Marsaglia of FSU. This is a well-known 28 and venerable "battery of tests" of RNG's. 29 30 iii) NIST STS/FIPS (NIST special publication 800-22, revised 31 5/15/2001). This is also a suite of RNG tests, with some 32 overlap with DIEHARD. I will probably implement only selected 33 tests from this suite at least at first, as some of them 34 appear to be relatively weak compared to e.g. rgb_binomial 35 and to test basically the same thing. 36 37 iv) Additional tests that are to be found on the web and in the 38 literature, where they appear to test things not 39 well-represented in tests already implemented in one suite 40 or another. 41 42 d) test quality of any input set of "random" data according to i-iv in 43c). This will let us test arbitrary RNG's via their data, including and 44especially hardware generators. Note that hardware generators available 45as "devices" will be interfaced via a). This step will only be 46implemented when enough tests to make it worthwhile are already 47implemented. 48 49 50 Addendum and explanation of copyright issues. 51 52The STS was written by a variety of authors: 53 54 Andrew Rukhin, Juan Soto, James Nechvatal, Miles Smid, Elaine Barker, 55 Stefan Leigh, Mark Levenson, Mark Vangel, David Banks, Alan Heckert, 56 James Dray, San Vo. 57 58None of the actual code in the STS suite has been used in this tool -- 59all testing routines have been written using only the STS published 60document as an excellent reference. GSL routines have been used 61throughout, where possible, for computing things such as erfc or Q. 62Since I am not using their code, I am omitting their copyright notice, 63but wish to acknowledge their documentation of the selected tests 64anyway. All the code in dieharder is GPL open source in any event, but 65academic credit matters. 66 67Similarly, DIEHARD is the work of George Marsaglia. Dr. Marsaglia 68doen't have any written copyright notice or license terms that I could 69find in the sources provided on the website where he openly distributes 70source for his own suite, but in keeping with a minor design goal of 71completely original GPL code, all the DIEHARD algorithms have also been 72completely rewritten without directly utilizing any of the actual 73diehard code. Fortunately, Dr. Marsaglia also provides clear 74documentation for his test suite, making it a fairly simple matter to 75implement the tests. Source code WAS examined (but not copied) to 76clarify places where the documentation of the tests was openly erroneous 77(as in the parking lot test) or unclear but dieharder code is completely 78original and its development thoroughly documented in CVS and (later) 79SVN tree form, checkin by checkin. 80 81The relatively few tests I have added are motivated by a desire to get a 82less ambiguous answer than many of these tests provide. In many cases 83it is not (or should not) be correct to say that one "accepts" a 84generator as being a good one just because a test run of the generator 85has a p-value significantly greater than 0. A few moment's 86experimentation, especially with relatively small data sets, should 87convince one that even "bad" RNG's can sometimes, or even frequently, 88return an "acceptable" p-value. Only when the size of the test is 89increased, or the test is repeatedly run with different seeds, does it 90become apparent that although it sometimes performs acceptably (the 91result of a run isn't inconsistent with the generator producing "real" 92random numbers) it sometimes performs so poorly that the result can 93>>never<< be explained or believed for a true random number generator. 94 95That is, while a really poor p-value allows us to reject the hypothesis 96that the tested generator is random, acceptable p-values, even a fair 97number of them in repeated tests, do not usually support accepting the 98hypothesis -- at best they don't support rejection. 99 100Running a test repeatedly to generate a full distribution of results and 101then directly comparing the entire distribution with theory appears to 102provide greater sensitivity and accuracy in the rejection process than 103"simple" tests that generate only a single quantity. This motivated the 104rgb_binomial test, which has proven very good at rejecting "bad" rng's. 105 106Similarily, it may prove useful to directly generate an empirical 107distribution of p-values themselves (repeating a test many times with 108different seeds and binning the p-values) and compare it to the expected 109distribution (which might work for erfc-based p-values, although not so 110well for Q-based p-values). This can replace running a test many times 111looking for an anomalous number of "poor" p-values mixed in with ones 112that are not so poor as to lead to immediate rejection. 113 114Contributions of rng's (consistently GSL-wrapped as e.g. dev_random.c 115demonstrates) and/or additional tests would be most welcome. 116 117 rgb 118 119 120 STS Critique, By Statistic 121 122 The Monobit Statistic 123 124The monobit statistic checks to make sure that the raw number of 1's and 1250's approximately balance in a given bitstring. This is actually a 126decent thing to test, but there is a better way to test it. Given a 127relatively short bitstring (e.g. n<=160 bits), the probability that it 128will have k bits according to the null hypothesis is p(n,k,1/2) (the 129binomial distribution of outcomes over n independent trials, each with 130p=1/2). If one runs m runs of n independent trials each, and 131accumulates X[k] (the count histogram for each possible value of output 132k in [0,n]) then one has X[k], the expected value histogram Y(k) = 133m*p(n,k,1/2), and sigma(k) = sqrt(m)*p(n,k,1/2). From this it is simple 134to compute a p-value from an incomplete gamma function for the 135>>details<< of the distribution of 1's. One can even do it two ways -- 136with an independent random number seed for each of the m samples, or 137by simply running the rng with its original seed long enough to produce 138m*n bits. 139 140This binomial test is stronger than the monobit test. Any result 141histogram that passes it necessarily passes the monobit test for the 142aggregate distribution with m*n bits (as I hope is obvious -- passing it 143means that the distribution is binomial in detail, not just uniformly 144balanced. However, it is simple enough to compute the monobit result 145more or less in passing in the binomial test by simply summing the count 146histogram at the end or aggregating a total 1-bit count as one goes. In 147that case one can generate two p-values -- a p-value for the monobit 148test and one for the binomial -- in one pass for the same data. I would 149predict that the binomial p-value is strictly less than the monobit 150p-value, and that there exist rng's that pass the monobit that 151explicitly fail the binomial, but we can determine that empirically. 152 153This chisq-based binomial test methodology can be used to improve nearly 154any single test statistic in sts. Instead of looking at the expected 155value of the statistic itself, frame its generation as a Bernoulli trial 156with a known percentage, compute the expected binomial distribution and 157its error, generate the result histogram as the test statistic, and use 158chisq and the complementary incomplete gamma function Q(a,x) to generate 159a p-value over many smaller "independent" trials instead of one long 160one. 161 162This suggests ways of improving e.g. the runs and other tests as well, 163see below. 164 165 The Runs Statistic 166 167For n bits, count the test statistic X as the number of (overlapping) 01 168and 10 pairs and add one. Compute p_1 = (# ones)/n, p_0 = (1 - p_1) 169(and ensure that they are close enough to p_0 = p_1 = 0.5). Compute the 170expected number of 01 and 10 pairs as Y = 2*n*p_1*p_0. Compute the 171expected error in the number of 01 and 10 pairs as sigma = 1722*sqrt(2*n)*p_1*p_0. Finally, compute the p-value as erfc(|X - 173Y|/sigma). 174 175This methodology seems questionable, at least to me. 176 177First of all, it seems odd to use a p_0 and p_1 derived from the sample, 178given that the null hypothesis is p_0 = p_1 = 0.5 and independent. 179Ensuring that the sample passes monobit (in a slightly different 180formulation) to start means that one has conditionally >>accepted<< this 181null hypothesis for the rng before beginning the runs test, using 182different p's implies a different null hypothesis. 183 184In addition, given their prior assumption that p_1 = # ones/# bits, 185their computation of the probability of p_01 and p_10 (and hence Y) is 186erroneous. The probability of 01 in a finite string of n bits, subject 187to constrained numbers of 0's and 1's needs to be computed WITHOUT 188replacement, just as one does with a deck of cards. After shuffling a 189deck of cards known to contain four aces, the probability that the top 190two cards are aces is not (4/52)*(4/52), but rather (4/52)*(3/52) -- if 191the first card is an ace, it means there are fewer aces left in the deck 192for the second card. To be consistent, they should compute the expected 193number of 01 strings in a 10-bit sequence containing 6 1's as 194(4/10)(6/9) and not (4/10)(6/10). They need to choose between a null 195hypothesis of p_1 = p_0 = 0.5 (where they can presume successive bit 196probabilities to be independent) and a null hypothesis of p_1 = (# 1971's)/(# bits) and use "probability for a 1, given that the first draw is 198a 0". This matters for short strings. 199 200Next, they don't wrap the n-value bitstring into a torus. Consequently, 201there are only n-1 bitpairs examined, and thus the expected value should 202be Y = 2*(n-1)*p_1*p_0 and not 2*n*p_1*p_0. For large n, of course, it 203won't matter (neither will the previous two problems, at least not 204much), but it is surprisingly sloppy mathematically. 205 206In addition there is no reason I can see to add a 1 to the test 207statistic of (# of 01 and 10 pairs in an overlapping scan of the 208string). This is very difficult to understand, as it clearly breaks the 209symmetry of this test statistic with the complementary view where the 210test statistic X is (# of 00 or 11 pairs in an overlapping scan of the 211string) but Y and sigma are the same -- the sum of the Y's necessarily 212equals n (or n-1, if that is the number of examined pairs). This leads 213to an obvious problem for short strings, e.g. n = 2, where the formula 214they give for the expected number of 01 or 10's measured is 2152*2*(1/2)*(1/2) = 1, but the statistic can only take on the values 1 or 2162, each with probability 1/2. There is something odd here. 217 218Finally, it is more than a bit crazy to view this assessment as somehow 219a measure of excessive sequential correlation, as a measure of "runs". 220Let us reframe the test statistic (count of 01 and 10's) in terms of our 221null hypothesis. Presuming random, equal probability 0's and 1's, we 222expect 25% each of 00 01 10 11. Given an exemplary bit string of e.g. 2231001101011, it doesn't really matter if one examines disjoint bitpairs 224(10 01 10 10 11) (n/2 bitpairs for n even) or the overlapping bitpairs 22510 00 01 11 10 01 10 01 11 11 (wrapping the last pair around 226periodically to ensure symmetry and n pairs of bits, EACH pair is 227expected to occur with a frequency, in the limit of many bitpairs, of 228#bitpairs*(1/4). 229 230Furthermore, in an ensemble of measurements of bitpair frequencies, each 231distinct possibility is (for lots of identical bitsets) to be binomially 232distributed with respect to the rest (as in p_01 = 1/4, p_!01 = 3/4) 233regardless of whether or not sequential overlapping bitpairs are 234counted. We can understand this by noting that our null hypothesis (a 235random generator producing pairs of bits) can be applied to any pair of 236bits in any string. 237 238Note also that in a given bitstring with periodic wraparound, the number 239of 01 (overlapping) pairs must equal the number of 10 pairs. This can 240be proven with ease. Every bit flipped can either change >>both<< the 24101 and 10 count by one (flipping a bit with two neighbors that are both 2420 or both 1) or it can leave them both the same (flipping a bit with a 243neighboring 0 AND a neighboring 1). This wreaks a certain measure of 244havoc on the notion that each sequentially overlapping pair examined is 245an independent measurement -- measuring any pair of (01 or 10 count) and 246(00 or 11 count) suffices to determine the counts for all four possible 247bit pairs. 248 249The only question remaining is whether measuring the counts using 250overlapping bitpairs is more or less rigorous a test than measuring the 251counts using non-overlapping pairs. That is, can a generator create a 252bitstring that is correctly pairwise distributed in both non-overlapping 253pair sequences but that is NOT correctly pairwise distributed in 254overlapping pair sequences or vice versa. The answer is not terribly 255obvious, but if we make the very weak presumption that our test 256measurments do not depend, on average, on whether we begin them 257systematically at bit 0 or bit 1 (so that if we have a bad generator 258we'll eventually fail for either starting point with roughly equal 259frequency for different starting seeds and/or first-bit displacements), 260then measuring both possibilities for each string should (on average) 261produce results that are conservatively related to measuring one 262possibility on twice as many strings. 263 264To put it another way, if a generator creates strings that >>fail<< the 265overlapping test, they would necessarily fail at least one of the two 266non-overlapping tests possible for the same bitstring, since the counts 267for the two non-overlapping tests add to produce the count for the 268non-overlapping test. For this addition to push values out of 269acceptable ranges one would require two deviations from the expected 270value in the same direction. If we only make the extremely weak 271assumption that failure of the non-overlapping tests is equally likely 272for both possible cyclic starting points, the 1/sqrt(2) scaling implicit 273in using both strings is thus conservative. 274 275However, we can easily average over this as well by simply randomly 276starting each non-overlapping run on bit 0 or bit 1. This might 277actually simplify the code and provides an ADDITIONAL measure of 278randomness as the two results should themselves be binomially 279distributed, independently. 280 281The final problem with runs is that it is not easy to see how to extend 282it to a higher order. For example, does 101 occur too often? How about 283010? Again, each bit pattern should occur at a rigorously predicted 284rate for p_0 = p_1 = 0.5 and independent bits, e.g. p_000 = p_001 = 285p_010 = ... = 1/8 for any distinct three-bit combination, p_0000... = 2861/16 for four-bit combinations, and so on. EACH of these outcomes 287should occur with binomially distributed frequencies when measured as a 288bernoulli trial with p_xyz, (1 - p_xyz), making it easy to compute both 289Y and sigma. This suggests that the correct generalization of monobit 290and runs (and probably more of the sts tests) is a test that (in one 291pass): 292 293 a) presumes a valid bitlevel rng so that p_0 = p_1 = 0.5 exactly, all 294bits independent (null hypothesis). 295 296 b) counts 1's (and infers 0's) for M sets of N bits, incrementing a 297histogram vector of the number of 1's observed accordingly. This is 298rigorously compared for EACH possible value of the count to the expected 299binomial distribution of counts. 300 301 c) counts e.g. 00,01,10,11 overlapping pairs. We showed above that 302the count of 01's exactly equals the count of 10's. The count of 11 303pairs equals the total number of pairs less the count of 00's, 01's and 30410's (all known). Even though these are not all independent, there is 305no harm and a small ease of coding advantage in binning all four counts 306in four result histograms, each to be compared to its binomial 307expectation. 308 309 d) counts e.g. 000, 001, 010... overlapping pairs. Here we IGNORE 310possible count relationships for sure (as we won't stop with three 311bits:-). Again, bin the outcome count frequencies and compare to their 312expected binomial distribution. 313 314This process can continue indefinitely, as long as we can accumulate 315enough counts in each pattern histogram to make aggregate statistics 316reliable. For example, at 8 bits, one is basically ensuring that the 317distribution of ascii characters is BOTH overall uniform (has the 318correct mean number of letters) AND that the actual distribution of 319those letters in samples is correct from the point of view of Bernoulli 320trials. 321 322I think that this process is strengthened significantly by conducting it 323simultaneously for all the sequences. The first measures "bit frequency 324and randomness". The second measures "pairwise sequential 325correlations". The third measure "three-bit sequential correlations" 326and so forth. A sequence might pass the first and fail the second, pass 327the first two and fail the third, pass the first seven and fail the 328eighth, pass the first eight and fail the 32nd (so that floats formed 329from the inverse are not uniformly distributed). 330 331=============== 332 333Addendum to previous note. The insight has struck me that all that this 334does is give one the actual DISTRIBUTION of outcomes for any given 335experiment. The distribution of sample means should be -- guess what -- 336normal -- provided that there are enough elements in the sample itself. 337For small numbers of elements in the sample, they should be binomial if 338the experiment is framed as a bernoulli trial. I'll bet that if I 339convert smoothly from binomial into normal as the number of outcome 340states exceeds the threshold where computing binomial probabilities is 341reasonably possible, I'll see no difference in an appropriately computed 342chisq! 343 344This is the solution to the problem of doing long strings. If I counted 345e.g. bitpair combinations across a string of length 8, (for many 346samples) binomial is appropriate. If I count bitpair combinations 347across a string of length 2^23 (8 million), I can't do binomial, but 348don't have to. Outcomes will be normally distributed about the expected 349means. 350 351SO, this is the answer to the bigger problem of why one should ever 352trust "p-value" for a single sample, or even a dozen samples, of a 353random process. Sure (to quote Marsaglia) "p happens", but it doesn't 354JUST happen, it happens according to a distribution. Instead of 355wondering if THIS occurrence of p = 0.02 from a single mean value is 356reasonable or not, compute the p of an entire distribution of mean 357values compared to the appropriate normal or binomial distribution, and 358get a p-value one can really trust! 359 360======================= 361 362OK, so I'm stupid and I'm wrong. sts_monobit is, in fact, more 363sensitive than rgb_binomial although rgb_binomial might yet pick up a 364generator that passes sts_monobit and fail it -- one that preserves the 365symmetry of its 0/1 distribution about the midpoint (so the mean number 366of 0's and 1's is correct) but has the >>wrong<< distribution, e.g. is 367not binomial. 368 369As it turns out, most distributions that fail rgb_binomial do not have 370symmetry about their midpoint and consequently fail sts_monobit "before" 371failing rgb_binomial. If you like, rgb_binomial only becomes reliable 372when there is enough data to fill the primary non-zero bins near the 373middle, which takes a goodly chunk of data. sts_monobit, in the 374meantime, just adds up the number of 0's and 1's while steadily chopping 375down sigma -- by the time one has sampled a few thousand bits a lot of 376rng's are already 0/1 asymmetric enough to fail monobit, but the bin 377sigma's are still large enough to forgive a lot of slop in any given 378bin. 379 380Interestingly, binomial can be applied to very small bitstrings -- in 381fact, to bitstrings of length 1 (I think). At that point it should 382become equivalent to monobit, but in a somewhat different way, 383especially since rgb_binomial now evaluates monobit directly on the 384accumulated bit values. 385 386Looks like I >>do<< have some work to do to fully understand everything, 387I do I do...;-) 388 389============== 390 391Whoo, dawgies! Out of a bit of insight, rgb_persist was born, and it 392explains why a LOT of rng's suck incredibly. Lots of rng's quite 393literally never change the contents of certain bits (usually lower 394order bits). One common symptom is for an rng to always return odd 395or even numbers. Another common symptom is for a rng to fail a 396distributional test when the repeated bits are measured (as repeated 1's 397cause an excess of 1's, repeated 0's an excess of 0's, even repeated 39801's can cause runs/bitpair tests to fail). Worse, rgb_persist clearly 399shows that MOST rng's that exhibit the problem at all exhibit it 400STRONGLY (LOTS of repeated lower order bits) for at least some seeds. 401Using them, one runs a significant (indeed, a near "sure thing") risk of 402encountering seeds such that as many as 18 bits out of 32 are repeated, 403leaving one with only a fraction of the variation you thought you 404had. 405 406At this point, I'd have to say that rgb_persist is one of the first 407tests to run on ANY rng, and any rng that fails it should probably 408just be rejected out of hand -- even if only the last (least 409significant) bit is repeated, it seems dangerous to use an rng that 410produces only even or only odd numbers from any given seed, even for 411games. 412 413Writing the dumpbits and rgb_persist routines also gave me insight into 414how rgb_binomial and even the sts routines might be failing relative to 415their design goals. I've been processing bits least to most 416significant, and I've been ignoring random_max when doing so. This is 417a capital mistake, as of course an rng will fail when it only returns 418e.g. 16 bits in a 32 bit slot (and the other bits are necessarily 419repeated). I've probably been "passing" only rng's that return a full 42032 "random bits" (and of course don't fail e.g. rgb_persist()). 421 422SO, I need to rewrite sts_monobit, sts_runs, and rgb_binomial (last 423first) so that they only check bits in the valid part of each returned 424word. This will be modestly annoying, but is of course absolutely 425necessary for the results to mean a damn thing. 426 427==================== 428 429Note the new test rgb_bitdist, which is the BEST generalization of 430sts_monobit and perhaps better than rgb_binomial as well. We need to 431merge sts_monobit, rgb_persist, and rgb_bitdist into a single test 432that measures the total/average bit number, the per-bit-slot 433total/average number, and derives the cumulative bitmask of repeated 434bits. 435 436Two additional directions to take this: 437 438First: 439 440 generate a set of Ntest variables for e.g. 441 1 bit 442 2 bit 443 3 bit 444 4 bit 445 ... 446 N bit 447 448combinations (one each). For number of bits and bitpattern, sweep the 449bitstring and accumulate the frequency histogram (for each possible 450bitpattern in each string location) 451 452 ntest[no_of_bits].x[bit_pattern_slot] 453 454with periodic wraparound on the bitstring and so forth -- this is, in 455fact, sts_runs but for arbitrary numbers of bits and accumulating BOTH 456the totals AND the actual distribution, and check the chisq on the 457distribution and the pvalue of the total expected. 458 459Second: 460 461Here we look for subtle sequential bitlevel correlations. We do the 462same test as before, except that the frequency histogram is LATERAL -- 463doing each bit position in a sequential fashion. This test should see a 464clear failure for bits that don't change (strong sequential correlation, 465that) but should ALSO discover individual bits that do things like: 46601010101 or 0011110000111100 or any other "balanced" repetition. 467 468In fact, we may eventually do a fft on the forward sequential bit 469sequences to see if there are even very weak sequential bit-level 470correlations like 001010100001100010 (where every sequential triplet has 471two 0's and one 1), even if the triplets themselves are "random". A 472good frequency test on all 3-way bit patterns should catch this, of 473course, and the lovely thing about binary is that I can always use a 474suitably masked integer conversion of the bit pattern itself as the 475histogram index! So it needn't be TOO horrible to encode. 476 477===================================================== 478 479Note on diehard parking lot test. First of all, the documentation for 480the test is incorrect -- it tests for "crashes" (overlap) of SQUARE cars 481with sides of length 1, not ROUND cars of RADIUS one as Marsaglia's 482documentation and comments assert. If round cars ("helicopters", to 483borrow Marsaglia's term) are used a different distribution is obtained 484with mean over 4000. 485 486Actually to me, round cars seems very very similar to one of Knuth's 487tests for hyperplanar distribution of random coordinates, which has a 488much more solid theoretical foundation; I suspect the tests are more or 489less equivalent in sensitivity and will yield identical success/failure 490patterns in 99% of all cases (maybe even 100%). And the sensitivity of 491this test sucks -- it was very difficult for me to find a test that 492failed it out of the entire GSL collection (trying a half dozen 493known-weak generators or more in the process). The one that finally DID 494fail it really, really sucked and failed all sorts of other tests. 495 496I'd be interested to see if this test deserves to live, in the long run. 497I rather suspect that Knuth tests in variable dimensions will yield a 498much more systematic view of RNG failure, just as rgb_bitdist does 499compared to e.g. runs tests. 500 501================================================================== 50207/11/06 503 504I've just finished testing Marsaglia's bitstream test, and it is (like 505rgb_bitdist) a test that nothing survives. In fact, it is a dead 506certainty that rgb_bitdist is a better version of the same basic test, 507and that generators that fail rgb_bitdist at (say) 6 bits are going to 508fail Marsaglia's less sensitive 20-bit test. What his test does that 509bitstream doesn't is use a single relatively coarse grained test 510statistic -- the expected number of missing 20 bit numbers out of 2 511million trials -- instead of doing what rgb_bitdist does, which is 512do a chisq test on the fractional occupancy of the entire VECTOR of 20 513bit numbers. 514 515Research question: Which is more sensitive AT 20 bits? It seems to me 516that the chisq vector test is clearly superior as it would catch many 517possible "bad" distributions that conserved the mean number of missing 518numbers at this sample size (however unlikely such special deviations 519might be in practice) but then there is the question of sensitivity 520given a fixed sample size. SO FAR, however, rgb_bitdist continues to 521be the premier test for revealing failure, with nothing getting out 522alive past 6 bits, ASSUMING that I'm computing the vector chisq and 523p-value per run correctly and not just revealing that I'm using the 524wrong form as the number of bits goes up. 525 526Oh, and nothing survives diehard_bitdist EITHER, at least when one 527cranks the test up past Marsaglia's wimply 20 iterations to (say) 200. 528At that point the non-uniformity of the p-distribution is absolutely 529apparent -- it even has structure. That is, some seeds lead to biased 530results that are TOO CLOSE to the mean value and not properly 531distributed... an interesting observation all by itself. 532 533 rgb 534 535================================================================== 53607/11/06 later that day... 537 538diehard_count_1s_stream() now WORKS! What a PITA! And (I suspect) how 539unnecessary all the complexity! This test (as it turns out) computes 540chisq on what amounts to a strangely remapped rgb_bitdist test 541(precisely!) on four and five digit base 5 integers obtained by 542remapping bytes based on the number of 1's therein. 543 544That is, 256 possibilities are reduced to 5, each with a 545trivial-to-compute frequency/probability, the base 5 integer is left 546shifted (base 5!) and added to the next one to cumulate 4 or 5 digit 547numbers, those numbers are then counted, and the counts compared to the 548EXPECTED counts given the number of samples and turned into chisq (one 549each, for 4 and for 5 digit numbers). Sigh. 550 551OK then, we COULD just print the p-value of those chisq's -- an 552absolutely straightforward process. Alternatively, we could do what 553Marsaglia does -- compute the DIFFERENCE of the chisq's themselves, 554which of course are supposedly distributed around the number of degrees 555of freedom of each one separately, that is 3125 - 625 = 2500! This is 556the mean value of the difference, with stddev = sqrt(2*2500). This 557final result is turned into a p-value. 558 559This of course is wierd and wrong in so many ways. For one, it is 560entirely possible for chisq_4 and chisq_5 to differ systematically from 561their expected values, and it is perfectly possible and likely that 562those deviations would occur in the same direction -- e.g. down -- so 563that the 4 byte and 5 byte streams BOTH have fewer degrees of freedom 564than expected. Of course in this case part of the individual deviations 565CANCEL and is not visible in the final p-value! 566 567Honestly, I can think of pretty much "no" reasonable ways that this 568final pvalue can fail where any/all of the p-values for the 4 or 5 (or 2 569or 3 or 1 or 17) digit distributions would not also fail, but I can (and 570just did) think of a way that the final p-value might MARGINALLY pass 571while the individual p-values fail. 572 573Although the general approach is a good one, what is clearly needed is a 574new test -- one that extends rgb_bitdist to multidigit strings of smaller 575base. For example, right now rgb_bitdist tests the FREQUENCY of the 576occurrence of all 5 digit strings but not all 5 digit strings taken two 577at a time (32 x 32 = 1K possbilities). Of course one REASON that I 578don't do that is that I >>already<< do e.g. 10 bit strings -- the DIRECT 579frequency distribution of those 1K possibilities, which obviously 580samples all the 5 bit combos taken two at a time!. And besides, every 581rng I have tested fails at the level of what amounts to two digit octal 582numbers -- six bit strings. So I'm cynical about this. 583 584I also don't think that this really counts 1's. How is this one iota 585different from simply evaluating the distribution of bytes (8 bit 586randoms)? If the complete distribution of 8 bit bytes (rgb_bitdist at 8 587bits) is random, ALL DERIVED DISTRIBUTIONS are random, with the only 588catchy part being temporal/sequential correlations that might be 589revealed from taking the strings four or five bytes at a time (at the 590expense of compressing the information tremendously). 591 592Once again, this test seems reduceable to but not as sensitive as 593rgb_bitdist at a given level, unfortunately a level far beyond where all 594known rng's fail anyway. It is a MEASURE of the lack of sensitivity 595that this does does NOT fail for many of those rng's where rgb_bitdist 596does... 597 598 rgb 599 600================================================================== 60108/16/06 602 603OK, lots has happened. First of all, I've discovered what I believe to 604be at least one, maybe two diehard data errors -- rngs consistently fail 605operm5 and bitstream that I (and everybody else) believe to be "good" 606ones insofar as such a thing exists. Not surprising -- the data being 607used to compute the test statistics was simulated between 10 and 20 608years ago and this test routinely uses more random numbers from better 609generators in any given test than were likely used to simulate them in 610the first place. 611 612Second, dieharder is already in the process of developing quite a user 613base. People have ported it to Windows, for example, and Dirk 614Eddelbuettel is interested in adding a native R interface to the tests 615(to be able to take advantage of several of the other facilities that R 616provides, a la octave/matlab/maple/mathematica as a programming UI). 617 618To facilitate both this latter project and the eventual creation of a 619true GTK-based GUI for dieharder, I've begun the fairly monumental task 620of splitting off all the tests, the add-on rng's that are intended to be 621part of the basic package, and all the intrinsic support functions into 622a standalone link library and associated set of include files. I know 623from experience writing libwulf to support wulfware apps that this is a 624bit tricky, and I expect there to be deep bugs and places where the API 625and separation are not clean for quite some time now. However, when I'm 626DONE it should be REALLY EASY to replace the front end UI. In fact, if 627I clean up the UI itself while I'm at it to make it even more modular 628than it is, I should be able to add a GTK basic interface in a very 629short period of time. 630 631I'm also stripping down the actual source part of dieharder (the tty 632interface version) and putting most of the documentation and so on in 633with the LIBRARY, since that is where it will be needed. Eventually I 634should need something like man pages and/or a manual for the 635libdieharder library, in addition to a much smaller set of toplevel 636dieharder documentation. The application (in any UI version) should 637REALLY be fairly self-documenting as a design goal. 638 639To mark the advent of a version that appears to build on top of 640completely separated libdieharder.a, largely decrufted on both sides, 641I'm bumping the major number of dieharder even though this wasn't an 642original design goal. It clearly merits a major version bump anyway. 643The minor.minor numbers will retain their original meaning (bugfix 644advances and number of tests supported) except that now I should 645probably reset bugfix to 0 in both libdieharder and dieharder. I don't 646know what I'll do about synchronicity between the two -- really I should 647extract numbers from libdieharder to set dieharder version numbers, but 648parsing out all the screen-scraping macros gives me a headache so I'll 649await inspiration or need before messing with it. 650 651Note finally that doing this has put adding more e.g. marsaglia_tsang 652tests on hold. I do have an extensive agenda here -- fixing up a 653"super" version of sts_serial, adding sts_serial itself (which should be 654more or less equivalent to rgb_bitdist, only the latter doesn't compute 655the correct statistic for a chisq, alas), figuring out what the RIGHT 656data is for diehard_operm5 and diehard_bitstream, completing the 657installation of 10^11 to 10^12 sample-based target data of my OWN for 658marsaglia_tsang_gcd, and so on. However, clearly this needs to wait 659until the development environment is stable again, which could be days 660yet, or even longer depending on what else I have to do for Duke etc. 661 662I also haven't even looked at the WinXX patches -- I'll have to figure 663them out all over again after splitting off the library, I expect. 664 665Still, progress is being made. I think that the application is 666definitely out there to where a) it could be announced more publically 667on e.g. the stats lists -- maybe when the libdieharder split has been in 668"alpha" for at least a while...;-) and b) it is time to look for grant 669support for this project to accomplish certain specific goals -- 670supporting testing on a cluster (by the time we're done the 671sts_series/rgb_bitdist/rgb_lmn tests are going to be BIG CPU 672consumers:-), finishing off all the desired/required tests, 673re-simulating key target data for the brave new world of unlimited rands 674to test, and "getting the word out" via publication, travelling to 675learned meetings and presenting, and so on. I'd guess that this will 676require a fairly serious grant for at least 3-5 years to do all that 677needs to be done that I know of already, and may need even more if (as I 678suspect) things are discovered that require significant new work to be 679done on testing algorithms themselves, e.g. "moment expansion" type 680tests, a connection between high-dimensional test failure and bit 681distribution tests. 682 683I have high hopes for the lmn test, although it may prove only the first 684step in lmnop, lmnopqr, lmnopqrst, etc... -- sampling lagged correlation 685between l-bit number, m-bit gap, n-bit number, o-bit gap, p-bit 686number...etc. MANY tests seem to be fancy ways of doing just this and 687nothing more, interestingly enough -- in some cases ways of examining 688only a projective subspace of the failures one might occur from a truly 689systematic test of this sort. For example, a Knuth test that reveals 690hyperplanar clustering of rand triplets viewed as spatial coordinates is 691effectively observing non-uniformity of the random integers represented 692by the conjunction of the three bitstrings... but without attempting to 693validate true randomness on 96-bit integers! 694================================================================== 695 69610/10/06 697 698Well, many many more bugs were discovered (mostly by Charles Karney) and 699fixed and the tests are much improved as a result. At this point "good" 700generators pass all but the operm5 and bitstream tests. operm5 I 701STRONGLY suspect of containing a significant error. bitstream the jury 702is still out on -- the whole series of n-bit monkey tests bothers me 703as they seem inevitably related to rgb_bitdist but less specific and 704sensitive. They are, however, more asymptotic and I've got to really 705think about it before passing final judgement (hopefully buttressed by 706very specific examples of correlated failure according to specific 707patterns). 708 709I added a whole selection of bit manipulation tools. I can now grab 710GUARANTEED sequential bits from the bitstream produced by any of the 711supported rngs (independent of rmax_bits). I can easily generate a 712string of (precisely) uints with 32 supposedly random bits from this 713sequential stream. I can rotate this uint in place and mask/select any 714of its bits. I can grab arbitrary bit windows within a block of memory, 715and can grab arbitrary ntuples within a block of memory with periodic 716boundary conditions. These routines (only) should be used to access the 717rngs in nearly all tests as they eliminate spurious effects from a rng 718returning only 31 bits on a test that de facto presumes 32, or 22 bits, 719or 16 bits. They also permit true overlapping samples to be generated 720absolutely trivially on a given fixed buffer for overlapping sample 721tests in dieharder (where it is by no means obvious that overlapping 722matters in any significant way). 723 724It is time to create two new tests. One of them generalizes rgb_bitdist 725so that the ntuples it samples can be created by more or less arbitrary 726sample patterns, e.g. 727 728stream: 0101010001111101010110101110100010101110101001011001110110110... 729 -------|(repeat) 730 731Get sequential 8-tuples e.g. 01010100 -- existing rgb_bitdist test for 732nbits = 8. 733 734 735stream: 0101010001111101010110101110100010101110101001011001110110110... 736 ----....---|(repeat) 737 738Get 4 bits, skip 4 bits, get 4 bits to make the 8-tuple 01010111, test 739stream of 8-tuples created with this repeating pattern with rgb_bitdist. 740This should reveal fourier-like correlations of certain kinds within 12 741bits, but is LESS sensitive than properly testing 12 bits directly. 742 743 744stream: 0101010001111101010110101110100010101110101001011001110110110... 745 --........--........--........-|(repeat) 746 747Get 2 bits, skip 8 bits, get 2 bits, skip 8, get 2, skip 8, get 2 to 748make the 8-tuple 01111000, test stream of 8-tuples created with this 749repeating pattern with rgb_bitdist. Note that this test looks for 750relatively LONG range ntuplet correlations -- a fourier-like correlation 751over 32 bit cycles and embedded 8 bit cycles at the same time. 752 753stream: 0101010001111101010110101110100010101110101001011001110110110... 754 .....................................---|(repeat) 755 756Skip 37 bits, get 4, run rgb_bitdist on the stream of 4-tuplets produced 757e.g. 1101 etc. Note well that 37 bits is odd and prime, as is 41! This 758approach can look for trouble in difficult places that might well be 759missed by a binary-happy sampling of sequential uints. 760 761This method of sampling can also be run on a single large buffer with 762overlap. In principle, one can construct a lagged test series that 763should reveal e.g. hyperplanar distributions very precisely. However, 764it is still likely limited by the number of bits one can practically run 765rgb_bitdist on, which appears to be somewhere in the ballpark of 12 to 76616. Nevertheless, it also allows one to sample and discover at least a 767large class of problems across a much larger number of bits. 768 769Note well that in principle ALL the tests in dieharder could run on 770random numbers (uints or whatever) that are generated by using a 771patterned, lagged selection of bits such as this. We can therefore 772accomplish a huge amount of flexibility if we build still more routines 773into bits.c, e.g. 774 775 uint get_rand_pattern((void)*result,int rsize,int *pattern,gsl_rng *gsl_rng){} 776 777where pattern is a variable length vector such as: 778 779 pattern[0] = 2; 780 pattern[1] = -8; 781 pattern[2] = 2; 782 pattern[3] = -8; 783 pattern[4] = 2; 784 pattern[5] = -8; 785 pattern[6] = 2; 786 pattern[7] = 0; 787 788which can be parsed to build an arbitrary sized return very easily. 789Another routine might use pattern and a varying offset to generate the 790ten distinct samplings of the bitstream that use one or more of the 791skipped bits, or even the full 32 corresponding to all overlapping 792samples with an offset that runs from 0 to 31, pulled either from 793primary bitstream or from a large cyclic buffer. 794 795I'm moderately tempted to write the Master Bit Selector in this way, but 796the damnable bit selection routines are actually a real pain in the ass 797to write and especially debug, so I think I'll start with 798get_rand_pattern (an obvious generalization of get_rand_bits()) and go 799from there. 800 801Then the entire decision on what to run and test is on the calling 802program. If I make int *pattern a global variable that can be 803allocated, set, used, and freed by the caller before calling any test 804and use only get_rand_pattern() to get ints for testing, ALL tests can 805be run on selected patterns. Dieharder can then sweep over patterns and 806really test the hell out of rngs... 807 808There is a second kind of test that we must implement -- one that I'm 809very well qualified to write on the basis of published research. The 810CLT is a very powerful thing. It predicts BOTH the mean AND the 811variance of any normal sampling process (and probably higher order 812moments as well -- there should be a series of cumulants, really, that 813describe the distribution of means). 814 815This gives us the ability to write a very subtle and sensitive test for 816global autocorrelation over arbitrary internal scales. This is a 817research question (for me, anyway) but it seems that if a system has 818internal correlations that prevent samples from being truly iid, this 819will show up as a variance that does not scale correctly with the number 820of samples. By comparing the VARIANCES, not the means, one should be 821able to directly observe that a statistic has e.g. fewer samples in it 822than you think that it does because of correlations between samples, 823even correlations over a very long time scale. 824 825A classic example is a sample stream with true periodicity -- you think 826that you've accumulated M samples with a variance that scales like 8271/sqrt{M} but really you've gotten the same P samples (each with 828variance that scales like 1/sqrt{P}). The observed variance is 829therefore going to be much smaller than the expected variance for M 830samples, off by a factor of sqrt{M/P}. Similar considerations hold for 831e.g. overlapping bit pattern selection where there is an 832"autocorrelation time" associated with the overlap. 833 834Here we might have a string like: 835 836 11010111010101000101010... 837 838generating: 839 840 11 01 01 11 01 01 01 00 01 01 01 0... 841 842and 843 844 .1 10 10 11 10 10 10 10 00 10 10 10... 845 846EACH of these strings can be tested with e.g. rgb_bitdist to see if the 847patterns are correctly distributed to yield a p-value. If the generator 848passes two bit tests independent of the offset, the distribution of p 849will of course be correct for either sequence. What of the "mixed" 850(interleaved) sequence: 851 852 11 01 01 11 01 01 01 ... 853 854+ 10 10 11 10 10 10 10 ... 855 856= 11 10 01 10 01 11 11 10 01 10 01 10 01 10 ... ? 857 858Well, it is pretty much guaranteed to have the right number of 00's, 85901's, 10's and 11's overall. However, we can make an even stronger 860statement. Suppose that we know that the original bit sequence passes 861the >>3<<-tuple rgb_bitdist test. Then in fact we know that it has the 862right mean distribution of 000 001 010 011 100 101 110 111's. In that 863case as we parse these combinations into the two overlapping duplets, we 864get: 865 866 P(000) = 1/8 => 00 00 867 P(001) = 1/8 => 00 01 868 P(010) = 1/8 => 01 10 869 P(011) = 1/8 => 01 11 870 P(100) = 1/8 => 10 00 871 P(101) = 1/8 => 10 01 872 P(110) = 1/8 => 11 10 873 P(111) = 1/8 => 11 11 874 875If we see 00, then (as we will with probability 1/4, inherited from the 876triplet probability table and validated by the passing of rgb_bitdist 877for the two independent bit sequences separately), we know that it must 878be followed by 00 or 01. 879 880Suppose there were a surplus of 00s relative to 01s. Then there would 881be more than 1/8 000s and less than 1/8 001s and we could not have 882passed the rgb_bitdist test for triplets. This idea is obviously 883extensible for arbitrary ntuplets -- if one passes rgb_bitdist without 884overlap for n bits, one is guaranteed to pass with overlap at n-1 bits. 885 886Consequently there is really >>no point<< in running rgb_bitdist on a 887series of ntuplets. If one observes the correct (binomial) probability 888distribution for the duplets 00, 01, 10, 11 over many samples, then one 889cannot help but have the correct probability distribution for the monets 8900 and 1 separately. If one has the the correct binomial distribution of 891triplets, one cannot help but have the right distribution of duplets and 892monets, and so on, whether or not overlapping samples are used. 893 894This last example is an excellent case of conditional probability at 895work. If we shift our sample window by one bit from a window that 896yields a valid distribution, {\em all that matters} is whether or not 897that additional bit is a 0 or 1 with the correct frequency. There are 898always only {\em two} possibilities: 899 900 bbbb0 -> bbbb and bbb0 901 bbbb1 -> bbbb and bbb1 902 903If that bit is random with probability 0.5, then we should be able to 904use Bayes theorem to generate very simple proofs that everything stated 905above is correct. 906 907================================================================== 90801/29/07 909 910OK, just finished repackaging dieharder to use subpackages and a single 911source tree to build both libraries and the dieharder ui. This will 912TREMENDOUSLY simplify both the semiseparate maintenance of the R UI as 913an RPM and the soon-come GUI built with Gtk/glade or the like. In fact, 914any number of UI's can be added to this source tree, including ones 915under separate subversion trees. I rather expect that even if I 916maintain a separate svn tree and a standalone RPM build/specfile for 917gdieharder, I'll probably keep the gdieharder development node here for 918convenience if nothing else. It's either that or develop only on 919platforms where libdieharder is stable and installed. 920 921I've added a directory point for R, there is already a separate 922directory point for the manual (which will eventually get its own RPM). 923 924I'm wondering if I need to create a mailman list for dieharder. 925Probably really close. Anyway, on to future plans, which is one thing 926the NOTES are for. I still have several new RNGs to add, several tests 927to debug or re-engineer (they consistently fail "good" generators, 928suggesting "bad" tests). I need to create a "standard dataset" and run 929the old fortran diehard against it to create a test reference, then see 930if I can reproduce the reference test values (within spitting distance) 931using the dieharder replacements. That would help me resolve this 932question -- if I get the same numbers on a reference dataset then the 933new code is "good", so if that same code fails "good" rngs, the old code 934is probably bad. Or something like that. 935 936I also need to add a still further improved reporting facility, 937especially for e.g the bitlevel tests. One doesn't ONLY want to know if 938the rgb_bitdist 1's test fails, one wants to know if the sample had an 939excess of 0's or 1's, and by how many sigma, per run, in a table. 940Ditto the per run distribution of 00 01 10 11 in the 2's test, 000 001 941010... for the 3's test, at least out to maybe 8 bits (a table with 256 942entries is BARELY displayable, at least on a pixelized panel of lights 943display in a GUI). Improved visualization is also important, for 944example to see hyperplane development for certain generators. 945 946I still need to work on the bits.c routines and at the very least get to 947where I can pack uints with arbitrary blocks of bits and holes. This 948will permit me to work on fourier and lagged sample tests, ones that 949look for bitlevel correlations in samples pulled with (say) 23-bit holes 950between the end of one uint bitset and the next in an imagined 951continuous stream of random bits. 952 953I need to upload the new 2.x release to the website, and probably clean 954up the website. Sigh. LOTS to do, and so little time and 955competition... 956 957================================================================== 95802/27/07 959 960dieharder now builds with the usual ./configure;make sequence. Gnu 961Build Tools Suck. However, they are standard and expected, if 962infinitely painful. Hopefully this will facilitate the debian build, 963though, and make rpm still works. 964 965May have fixed a minor bug in operm5, where it has been claimed that the 966correct number of degrees of freedom is 5!-4! = 96, not 99. This makes 967at least some sense, so I implemented the change, but operm5 has MORE 968errors in either code or data, and I don't have time or knowledge to fix 969them... 970 971On to the next step -- putting the code on a truly public svn repo --- 972code.google.com. This will take one little bit of work, as the svnsync 973didn't work the first times I tried it... 974 975========================================================================== 97603/09/08 977 978OK, so GBT don't suck, they are merely difficult. I'm about to release 979a fairly major set of fixes and a new test or three, and one of the main 980focuses of this round of changes has been to make libtool and the GBT 981truly happy and dieharder perhaps more portable as a consequence. This 982is still a process not without pain -- documentation is mediocre, 983undocumented features abound, rpath evil must be hand-repaired in order 984to facilitate rpm builds -- but it does look as if the GBT CAN be used 985to run major projects and automate a lot of the same stuff my original 986build automated, once you figure out how. 987 988As far as tests go, I've implemented a number of fixes and "new" tests 989since the last time around, and am about to get truly serious about a 990certain class of tests. 991 992The most important new test is sts_serial. IMO this may prove to be the 993most useful of all rng tests save -- perhaps -- rgb_bitdist, which tests 994very much the same thing a very slightly different way. 995 996The most important planned addition is a "shuffler" that can be used to 997access the rng stream with lag, with or without loss. In particular, 998I'd like to be able to generate a stream to sample with an adjustable 999lag in between samples, at the bit level, to be able to detect weakly 1000correlated cycles in the bitstream. 1001 1002Most of the tests in dieharder are really only strongly sensitive to 1003what I might call "first order" departure from randomness, first order 1004decorrelation. They measure statistics generated from sampling the rng 1005stream in order, with each sample presumed to be independent. They do 1006not, for the most part (with certain exceptions, that is) detect whether 1007any given rand is correlated with the rand that immediate follows it, or 1008the one two rands downstream, or the one fifty rands downstream -- those 1009are "second order" correlations. 1010 1011Amazingly, there seems to be no test that simply and directly evaluates 1012 1013 A(d) = <(1.0-2.0*x_i) *(1.0-2.0*x_{i+d})>, 1014 1015the average autocorrelation of the floating point random number stream 1016shifted so that the expected autocorrelation is zero for all 1017displacements d. However, this test really needs to be run in a variety 1018of ways, as one can imagine bitlevel correlations that will show up 1019strongly on bitlevel tests but weakly on floating point tests. 1020 1021Naturally, one can imagine looking at higher order moments as well. 1022 1023To facilitate tests of all of these sorts, I want to wrap the gsl 1024generator in such a way that one can request: 1025 1026 n bits, skip m bits (throwing them away), to facilitate "direct" 1027 searches for 1028 1029and returning the n bits, or 1030 1031 a uint composed of the upper n bits, skip m uints, lower 32-n bits or 1032 vice versa. Nothing is thrown away. 1033 1034Or 1035 1036 a bit stream returnable as n-bit window or uints made up of any 1037 reasonable shuffling or displacements of components, with or without 1038 interleaving (lossless) or loss. 1039 1040With such an interface, one could use ALL of the dieharder tests to 1041check for the (mostly) first order randomness of streams generated from 1042various interleaved displacements of any generator. They would then all 1043"magically" become second order (or higher order) tests! 1044 1045 rgb 1046 1047========================================================================== 104809/04/08 1049 1050I've made many changes and bugfixes -- one of which is to fix the 1051run_test() routines so that they create, run, and destroy the tests, one 1052at a time (which suggests that it may well be time to alter the way 1053results are passed back or futher encapsulate tests to simplify 1054dieharder's logic, even at the expense of having tests that "function 1055like diehard's" did. As in some diehard tests generate two statistics 1056in one run; it may be time to separate them into two tests and run them 1057independently with their own names and their own pvalue returns, even at 1058the expense of efficiency. 1059 1060However, I'm writing to post this as my current design goal for -T(able) 1061output with an optional flat to control just what is included in the 1062table and header. Here's the proposed "standard" table output: 1063 1064#======================================================================== 1065# Generator Tested: mt19937 1066# Test Name | n-bits | p-samples | t-samples | p-value(s) 1067#======================================================================== 1068RGB Bit Distribution Test| 1 | 100 | 100000 | 0.55579848 1069RGB Bit Distribution Test| 2 | 100 | 100000 | 0.69322169 1070... 1071 1072Each field's presence or absence will be able to be turned on or off by 1073means of a flag bit. The presence of the header, and whether or not it 1074contains e.g. the rng descriptors (name and/or filename) will be flag 1075bit controlled as well, with -q basically setting the bits to suppress 1076headers altogether. 1077 1078In this way, with suitable flag settings, one can "just" output pvalues, 1079can output names and pvalues, can output just names and assessments, 1080etc. 1081 1082I'm guessing that the following are all fields we want to be able to 1083control with their own flag bits: 1084 1085THEADER 1086TGEN 1087TNAME 1088TSNAME 1089TNBITS 1090TPSAMPLES 1091TTSAMPLES 1092TPVALUE 1093TASSESSMENT 1094 1095Anyway, let's do it... 1096 10979/19/08 1098 1099That worked really well. However, I now have to handle some way of 1100managing the tests that take ntuple as an argument for e.g. lag or 1101ntuple size. I >>think<< that the only correct way to manage this will 1102be to move the loops to the run_all_tests() function, since that is ONLY 1103a CLI feature, and call the execute_test within it as per usual. That 1104way run_test WITH a setting of ntuple will be entirely API driven and 1105consistent. 1106 1107The other thing(s) I have to implement include: Some sort of CLI flags 1108or control variables to tell dieharder when and how to set a random 1109number seed. We have several possible seed strategies, you see: 1110 1111 a) Dieharder reseeds from /dev/random or clock one time at the time 1112the rng is initialized, and then left alone (CLI) until the rng is freed 1113(UI) or the program is ended (CLI). The Rdh UI could in principle 1114permit the seed to be interactively reset by the user at any time; 1115that's up to Dirk. 1116 1117When dieharder outputs results, one of the output flags will indicate 1118whether or not the rng seed should appear. In general it probably 1119should, but only the one seed randomly picked at the beginning. 1120 1121 b) Dieharder uses a seed entered by the user one time at the 1122beginning. This should clearly be able to perfectly reproduce any 1123identical sequence of tests (including dieharder -a results) that began 1124with just that seed, so this specific thing should be a possibility. 1125 1126 c) Dieharder uses a seed entered by the user at the beginning to set 1127the seed at the beginning of each test. This usage is primarily for 1128validation purposes, since doing this makes results from "series" tests 1129that run closely related tests on various ntuples or lags not 1130independent. For example, rgb_bitstream for an ntuple of 4 is obviously 1131going to be strongly correlated with the results for an ntuple of 8 1132STARTED FROM THE SAME SEED; the two half-bytes that make up each 8 were 1133previously tested and the results of that test will clearly affect the 1134results for 8. The same problem exists for the closely related 1135sts_serial, but it keeps track and attempts to correct for it. 1136 1137Still, entering a seed and having it used to reseed EACH test run means 1138that dieharder -a -S 1 can EASILY generate an entire set of 1139reference/validation results that can then be generated and compared one 1140at a time in Rdh (which doesn't "do" a -a test). If we seeded only at 1141the beginning, any tiny variation in e.g. the number of rands called or 1142test order would produce different downstream numbers and a failure to 1143validate. 1144 1145Obviously this mode should not be used for testing, as one expects 1146"failures" to cluster in linked tests because of the aforementioned 1147non-independent of the results. If 4 bit numbers lead to a "weak" 1148pvalue, testing 8 bit numbers should also lead to a weak pvalue and/or 1149vice versa. 1150 1151Our current difficulty is that we have THREE possibilities but only TWO 1152CLI control states -- with -S whatever or without. Without is fine -- 1153that can and should select random seeding. With, however, requires a 1154second CLI switch option to select between the two seeding strategies. 1155 1156There really isn't anything wrong with providing the fourth combination 1157of two separate toggles as follows: 1158 1159 1160 Strategy \ Seed| 0 | 1 1161------------------------------------------------------------------ 1162 beginning| random at beginning | from Seed at beginning 1163 per test | random per test | from Seed per test 1164 1165SO, it looks like we need to introduce TWO variables. -s sets the 1166strategy. -S sets the seed to the selected value, or left alone at 0, 1167sets seed to a "random" choice. 1168 1169This works because -s is liberated now that we don't need to select sts 1170tests that way. So is -r and -d. We should probably keep -d and use it 1171for -d(ieharder test name/test number) the same way -g allows or will 1172allow one to pick either one. -r, well, we'll have to think about what 1173to do with it, eventually. 1174 1175This is enough for now. 1176