1These are some simple design notes to document this program and to guide 2further work. 3 4This program is designed to do the following: 5 6 a) Encapsulate every RNG known to humans for testing. We will use the 7GSL both for its rich supply of ready-to-test RNG's and for its 8relatively simple and consistent encapsulation of any new RNG's we add 9-- /dev/random is already encapsulated and can serve as a template for 10future work. [1/16/03 rgb] 11 12 b) test speed of the available RNG's. Both: 13 i) In "native/standard" subroutine calls (with the subroutine call 14 overhead included). [This is done, 1/16/03, rgb] 15 ii) In a custom "vector" form, where a whole block of dedicated 16 memory is filled in one pass from a given algorithm, and 17 subsequently accessed via a macro that only calls for a refill 18 when the vector is exhausted. 19 20 c) test quality of the available RNG's. There are at least three 21primary sources of tests that will be used in this tool. 22 23 i) RGB tests. These are the tests I myself have implemented. To 24 the best of my knowledge these are new, although of course 25 my knowledge is far from complete. 26 27 ii) DIEHARD, by Dr. George Marsaglia of FSU. This is a well-known 28 and venerable "battery of tests" of RNG's. 29 30 iii) NIST STS/FIPS (NIST special publication 800-22, revised 31 5/15/2001). This is also a suite of RNG tests, with some 32 overlap with DIEHARD. I will probably implement only selected 33 tests from this suite at least at first, as some of them 34 appear to be relatively weak compared to e.g. rgb_binomial 35 and to test basically the same thing. 36 37 iv) Additional tests that are to be found on the web and in the 38 literature, where they appear to test things not 39 well-represented in tests already implemented in one suite 40 or another. 41 42 d) test quality of any input set of "random" data according to i-iv in 43c). This will let us test arbitrary RNG's via their data, including and 44especially hardware generators. Note that hardware generators available 45as "devices" will be interfaced via a). This step will only be 46implemented when enough tests to make it worthwhile are already 47implemented. 48 49 50 Addendum and explanation of copyright issues. 51 52The STS was written by a variety of authors: 53 54 Andrew Rukhin, Juan Soto, James Nechvatal, Miles Smid, Elaine Barker, 55 Stefan Leigh, Mark Levenson, Mark Vangel, David Banks, Alan Heckert, 56 James Dray, San Vo. 57 58None of the actual code in the STS suite has been used in this tool -- 59all testing routines have been written using only the STS published 60document as an excellent reference. GSL routines have been used 61throughout, where possible, for computing things such as erfc or Q. 62Since I am not using their code, I am omitting their copyright notice, 63but wish to acknowledge their documentation of the selected tests 64anyway. All the code in dieharder is GPL open source in any event, but 65academic credit matters. 66 67Similarly, DIEHARD is the work of George Marsaglia. Dr. Marsaglia 68doen't have any written copyright notice or license terms that I could 69find in the sources provided on the website where he openly distributes 70source for his own suite, but in keeping with a minor design goal of 71completely original GPL code, all the DIEHARD algorithms have also been 72completely rewritten without directly utilizing any of the actual 73diehard code. Fortunately, Dr. Marsaglia also provides clear 74documentation for his test suite, making it a fairly simple matter to 75implement the tests. Source code WAS examined (but not copied) to 76clarify places where the documentation of the tests was openly erroneous 77(as in the parking lot test) or unclear but dieharder code is completely 78original and its development thoroughly documented in CVS and (later) 79SVN tree form, checkin by checkin. 80 81The relatively few tests I have added are motivated by a desire to get a 82less ambiguous answer than many of these tests provide. In many cases 83it is not (or should not) be correct to say that one "accepts" a 84generator as being a good one just because a test run of the generator 85has a p-value significantly greater than 0. A few moment's 86experimentation, especially with relatively small data sets, should 87convince one that even "bad" RNG's can sometimes, or even frequently, 88return an "acceptable" p-value. Only when the size of the test is 89increased, or the test is repeatedly run with different seeds, does it 90become apparent that although it sometimes performs acceptably (the 91result of a run isn't inconsistent with the generator producing "real" 92random numbers) it sometimes performs so poorly that the result can 93>>never<< be explained or believed for a true random number generator. 94 95That is, while a really poor p-value allows us to reject the hypothesis 96that the tested generator is random, acceptable p-values, even a fair 97number of them in repeated tests, do not usually support accepting the 98hypothesis -- at best they don't support rejection. 99 100Running a test repeatedly to generate a full distribution of results and 101then directly comparing the entire distribution with theory appears to 102provide greater sensitivity and accuracy in the rejection process than 103"simple" tests that generate only a single quantity. This motivated the 104rgb_binomial test, which has proven very good at rejecting "bad" rng's. 105 106Similarily, it may prove useful to directly generate an empirical 107distribution of p-values themselves (repeating a test many times with 108different seeds and binning the p-values) and compare it to the expected 109distribution (which might work for erfc-based p-values, although not so 110well for Q-based p-values). This can replace running a test many times 111looking for an anomalous number of "poor" p-values mixed in with ones 112that are not so poor as to lead to immediate rejection. 113 114Contributions of rng's (consistently GSL-wrapped as e.g. dev_random.c 115demonstrates) and/or additional tests would be most welcome. 116 117 rgb 118 119 120 STS Critique, By Statistic 121 122 The Monobit Statistic 123 124The monobit statistic checks to make sure that the raw number of 1's and 1250's approximately balance in a given bitstring. This is actually a 126decent thing to test, but there is a better way to test it. Given a 127relatively short bitstring (e.g. n<=160 bits), the probability that it 128will have k bits according to the null hypothesis is p(n,k,1/2) (the 129binomial distribution of outcomes over n independent trials, each with 130p=1/2). If one runs m runs of n independent trials each, and 131accumulates X[k] (the count histogram for each possible value of output 132k in [0,n]) then one has X[k], the expected value histogram Y(k) = 133m*p(n,k,1/2), and sigma(k) = sqrt(m)*p(n,k,1/2). From this it is simple 134to compute a p-value from an incomplete gamma function for the 135>>details<< of the distribution of 1's. One can even do it two ways -- 136with an independent random number seed for each of the m samples, or 137by simply running the rng with its original seed long enough to produce 138m*n bits. 139 140This binomial test is stronger than the monobit test. Any result 141histogram that passes it necessarily passes the monobit test for the 142aggregate distribution with m*n bits (as I hope is obvious -- passing it 143means that the distribution is binomial in detail, not just uniformly 144balanced. However, it is simple enough to compute the monobit result 145more or less in passing in the binomial test by simply summing the count 146histogram at the end or aggregating a total 1-bit count as one goes. In 147that case one can generate two p-values -- a p-value for the monobit 148test and one for the binomial -- in one pass for the same data. I would 149predict that the binomial p-value is strictly less than the monobit 150p-value, and that there exist rng's that pass the monobit that 151explicitly fail the binomial, but we can determine that empirically. 152 153This chisq-based binomial test methodology can be used to improve nearly 154any single test statistic in sts. Instead of looking at the expected 155value of the statistic itself, frame its generation as a Bernoulli trial 156with a known percentage, compute the expected binomial distribution and 157its error, generate the result histogram as the test statistic, and use 158chisq and the complementary incomplete gamma function Q(a,x) to generate 159a p-value over many smaller "independent" trials instead of one long 160one. 161 162This suggests ways of improving e.g. the runs and other tests as well, 163see below. 164 165 The Runs Statistic 166 167For n bits, count the test statistic X as the number of (overlapping) 01 168and 10 pairs and add one. Compute p_1 = (# ones)/n, p_0 = (1 - p_1) 169(and ensure that they are close enough to p_0 = p_1 = 0.5). Compute the 170expected number of 01 and 10 pairs as Y = 2*n*p_1*p_0. Compute the 171expected error in the number of 01 and 10 pairs as sigma = 1722*sqrt(2*n)*p_1*p_0. Finally, compute the p-value as erfc(|X - 173Y|/sigma). 174 175This methodology seems questionable, at least to me. 176 177First of all, it seems odd to use a p_0 and p_1 derived from the sample, 178given that the null hypothesis is p_0 = p_1 = 0.5 and independent. 179Ensuring that the sample passes monobit (in a slightly different 180formulation) to start means that one has conditionally >>accepted<< this 181null hypothesis for the rng before beginning the runs test, using 182different p's implies a different null hypothesis. 183 184In addition, given their prior assumption that p_1 = # ones/# bits, 185their computation of the probability of p_01 and p_10 (and hence Y) is 186erroneous. The probability of 01 in a finite string of n bits, subject 187to constrained numbers of 0's and 1's needs to be computed WITHOUT 188replacement, just as one does with a deck of cards. After shuffling a 189deck of cards known to contain four aces, the probability that the top 190two cards are aces is not (4/52)*(4/52), but rather (4/52)*(3/52) -- if 191the first card is an ace, it means there are fewer aces left in the deck 192for the second card. To be consistent, they should compute the expected 193number of 01 strings in a 10-bit sequence containing 6 1's as 194(4/10)(6/9) and not (4/10)(6/10). They need to choose between a null 195hypothesis of p_1 = p_0 = 0.5 (where they can presume successive bit 196probabilities to be independent) and a null hypothesis of p_1 = (# 1971's)/(# bits) and use "probability for a 1, given that the first draw is 198a 0". This matters for short strings. 199 200Next, they don't wrap the n-value bitstring into a torus. Consequently, 201there are only n-1 bitpairs examined, and thus the expected value should 202be Y = 2*(n-1)*p_1*p_0 and not 2*n*p_1*p_0. For large n, of course, it 203won't matter (neither will the previous two problems, at least not 204much), but it is surprisingly sloppy mathematically. 205 206In addition there is no reason I can see to add a 1 to the test 207statistic of (# of 01 and 10 pairs in an overlapping scan of the 208string). This is very difficult to understand, as it clearly breaks the 209symmetry of this test statistic with the complementary view where the 210test statistic X is (# of 00 or 11 pairs in an overlapping scan of the 211string) but Y and sigma are the same -- the sum of the Y's necessarily 212equals n (or n-1, if that is the number of examined pairs). This leads 213to an obvious problem for short strings, e.g. n = 2, where the formula 214they give for the expected number of 01 or 10's measured is 2152*2*(1/2)*(1/2) = 1, but the statistic can only take on the values 1 or 2162, each with probability 1/2. There is something odd here. 217 218Finally, it is more than a bit crazy to view this assessment as somehow 219a measure of excessive sequential correlation, as a measure of "runs". 220Let us reframe the test statistic (count of 01 and 10's) in terms of our 221null hypothesis. Presuming random, equal probability 0's and 1's, we 222expect 25% each of 00 01 10 11. Given an exemplary bit string of e.g. 2231001101011, it doesn't really matter if one examines disjoint bitpairs 224(10 01 10 10 11) (n/2 bitpairs for n even) or the overlapping bitpairs 22510 00 01 11 10 01 10 01 11 11 (wrapping the last pair around 226periodically to ensure symmetry and n pairs of bits, EACH pair is 227expected to occur with a frequency, in the limit of many bitpairs, of 228#bitpairs*(1/4). 229 230Furthermore, in an ensemble of measurements of bitpair frequencies, each 231distinct possibility is (for lots of identical bitsets) to be binomially 232distributed with respect to the rest (as in p_01 = 1/4, p_!01 = 3/4) 233regardless of whether or not sequential overlapping bitpairs are 234counted. We can understand this by noting that our null hypothesis (a 235random generator producing pairs of bits) can be applied to any pair of 236bits in any string. 237 238Note also that in a given bitstring with periodic wraparound, the number 239of 01 (overlapping) pairs must equal the number of 10 pairs. This can 240be proven with ease. Every bit flipped can either change >>both<< the 24101 and 10 count by one (flipping a bit with two neighbors that are both 2420 or both 1) or it can leave them both the same (flipping a bit with a 243neighboring 0 AND a neighboring 1). This wreaks a certain measure of 244havoc on the notion that each sequentially overlapping pair examined is 245an independent measurement -- measuring any pair of (01 or 10 count) and 246(00 or 11 count) suffices to determine the counts for all four possible 247bit pairs. 248 249The only question remaining is whether measuring the counts using 250overlapping bitpairs is more or less rigorous a test than measuring the 251counts using non-overlapping pairs. That is, can a generator create a 252bitstring that is correctly pairwise distributed in both non-overlapping 253pair sequences but that is NOT correctly pairwise distributed in 254overlapping pair sequences or vice versa. The answer is not terribly 255obvious, but if we make the very weak presumption that our test 256measurments do not depend, on average, on whether we begin them 257systematically at bit 0 or bit 1 (so that if we have a bad generator 258we'll eventually fail for either starting point with roughly equal 259frequency for different starting seeds and/or first-bit displacements), 260then measuring both possibilities for each string should (on average) 261produce results that are conservatively related to measuring one 262possibility on twice as many strings. 263 264To put it another way, if a generator creates strings that >>fail<< the 265overlapping test, they would necessarily fail at least one of the two 266non-overlapping tests possible for the same bitstring, since the counts 267for the two non-overlapping tests add to produce the count for the 268non-overlapping test. For this addition to push values out of 269acceptable ranges one would require two deviations from the expected 270value in the same direction. If we only make the extremely weak 271assumption that failure of the non-overlapping tests is equally likely 272for both possible cyclic starting points, the 1/sqrt(2) scaling implicit 273in using both strings is thus conservative. 274 275However, we can easily average over this as well by simply randomly 276starting each non-overlapping run on bit 0 or bit 1. This might 277actually simplify the code and provides an ADDITIONAL measure of 278randomness as the two results should themselves be binomially 279distributed, independently. 280 281The final problem with runs is that it is not easy to see how to extend 282it to a higher order. For example, does 101 occur too often? How about 283010? Again, each bit pattern should occur at a rigorously predicted 284rate for p_0 = p_1 = 0.5 and independent bits, e.g. p_000 = p_001 = 285p_010 = ... = 1/8 for any distinct three-bit combination, p_0000... = 2861/16 for four-bit combinations, and so on. EACH of these outcomes 287should occur with binomially distributed frequencies when measured as a 288bernoulli trial with p_xyz, (1 - p_xyz), making it easy to compute both 289Y and sigma. This suggests that the correct generalization of monobit 290and runs (and probably more of the sts tests) is a test that (in one 291pass): 292 293 a) presumes a valid bitlevel rng so that p_0 = p_1 = 0.5 exactly, all 294bits independent (null hypothesis). 295 296 b) counts 1's (and infers 0's) for M sets of N bits, incrementing a 297histogram vector of the number of 1's observed accordingly. This is 298rigorously compared for EACH possible value of the count to the expected 299binomial distribution of counts. 300 301 c) counts e.g. 00,01,10,11 overlapping pairs. We showed above that 302the count of 01's exactly equals the count of 10's. The count of 11 303pairs equals the total number of pairs less the count of 00's, 01's and 30410's (all known). Even though these are not all independent, there is 305no harm and a small ease of coding advantage in binning all four counts 306in four result histograms, each to be compared to its binomial 307expectation. 308 309 d) counts e.g. 000, 001, 010... overlapping pairs. Here we IGNORE 310possible count relationships for sure (as we won't stop with three 311bits:-). Again, bin the outcome count frequencies and compare to their 312expected binomial distribution. 313 314This process can continue indefinitely, as long as we can accumulate 315enough counts in each pattern histogram to make aggregate statistics 316reliable. For example, at 8 bits, one is basically ensuring that the 317distribution of ascii characters is BOTH overall uniform (has the 318correct mean number of letters) AND that the actual distribution of 319those letters in samples is correct from the point of view of Bernoulli 320trials. 321 322I think that this process is strengthened significantly by conducting it 323simultaneously for all the sequences. The first measures "bit frequency 324and randomness". The second measures "pairwise sequential 325correlations". The third measure "three-bit sequential correlations" 326and so forth. A sequence might pass the first and fail the second, pass 327the first two and fail the third, pass the first seven and fail the 328eighth, pass the first eight and fail the 32nd (so that floats formed 329from the inverse are not uniformly distributed). 330 331=============== 332 333Addendum to previous note. The insight has struck me that all that this 334does is give one the actual DISTRIBUTION of outcomes for any given 335experiment. The distribution of sample means should be -- guess what -- 336normal -- provided that there are enough elements in the sample itself. 337For small numbers of elements in the sample, they should be binomial if 338the experiment is framed as a bernoulli trial. I'll bet that if I 339convert smoothly from binomial into normal as the number of outcome 340states exceeds the threshold where computing binomial probabilities is 341reasonably possible, I'll see no difference in an appropriately computed 342chisq! 343 344This is the solution to the problem of doing long strings. If I counted 345e.g. bitpair combinations across a string of length 8, (for many 346samples) binomial is appropriate. If I count bitpair combinations 347across a string of length 2^23 (8 million), I can't do binomial, but 348don't have to. Outcomes will be normally distributed about the expected 349means. 350 351SO, this is the answer to the bigger problem of why one should ever 352trust "p-value" for a single sample, or even a dozen samples, of a 353random process. Sure (to quote Marsaglia) "p happens", but it doesn't 354JUST happen, it happens according to a distribution. Instead of 355wondering if THIS occurrence of p = 0.02 from a single mean value is 356reasonable or not, compute the p of an entire distribution of mean 357values compared to the appropriate normal or binomial distribution, and 358get a p-value one can really trust! 359 360======================= 361 362OK, so I'm stupid and I'm wrong. sts_monobit is, in fact, more 363sensitive than rgb_binomial although rgb_binomial might yet pick up a 364generator that passes sts_monobit and fail it -- one that preserves the 365symmetry of its 0/1 distribution about the midpoint (so the mean number 366of 0's and 1's is correct) but has the >>wrong<< distribution, e.g. is 367not binomial. 368 369As it turns out, most distributions that fail rgb_binomial do not have 370symmetry about their midpoint and consequently fail sts_monobit "before" 371failing rgb_binomial. If you like, rgb_binomial only becomes reliable 372when there is enough data to fill the primary non-zero bins near the 373middle, which takes a goodly chunk of data. sts_monobit, in the 374meantime, just adds up the number of 0's and 1's while steadily chopping 375down sigma -- by the time one has sampled a few thousand bits a lot of 376rng's are already 0/1 asymmetric enough to fail monobit, but the bin 377sigma's are still large enough to forgive a lot of slop in any given 378bin. 379 380Interestingly, binomial can be applied to very small bitstrings -- in 381fact, to bitstrings of length 1 (I think). At that point it should 382become equivalent to monobit, but in a somewhat different way, 383especially since rgb_binomial now evaluates monobit directly on the 384accumulated bit values. 385 386Looks like I >>do<< have some work to do to fully understand everything, 387I do I do...;-) 388 389============== 390 391Whoo, dawgies! Out of a bit of insight, rgb_persist was born, and it 392explains why a LOT of rng's suck incredibly. Lots of rng's quite 393literally never change the contents of certain bits (usually lower 394order bits). One common symptom is for an rng to always return odd 395or even numbers. Another common symptom is for a rng to fail a 396distributional test when the repeated bits are measured (as repeated 1's 397cause an excess of 1's, repeated 0's an excess of 0's, even repeated 39801's can cause runs/bitpair tests to fail). Worse, rgb_persist clearly 399shows that MOST rng's that exhibit the problem at all exhibit it 400STRONGLY (LOTS of repeated lower order bits) for at least some seeds. 401Using them, one runs a significant (indeed, a near "sure thing") risk of 402encountering seeds such that as many as 18 bits out of 32 are repeated, 403leaving one with only a fraction of the variation you thought you 404had. 405 406At this point, I'd have to say that rgb_persist is one of the first 407tests to run on ANY rng, and any rng that fails it should probably 408just be rejected out of hand -- even if only the last (least 409significant) bit is repeated, it seems dangerous to use an rng that 410produces only even or only odd numbers from any given seed, even for 411games. 412 413Writing the dumpbits and rgb_persist routines also gave me insight into 414how rgb_binomial and even the sts routines might be failing relative to 415their design goals. I've been processing bits least to most 416significant, and I've been ignoring random_max when doing so. This is 417a capital mistake, as of course an rng will fail when it only returns 418e.g. 16 bits in a 32 bit slot (and the other bits are necessarily 419repeated). I've probably been "passing" only rng's that return a full 42032 "random bits" (and of course don't fail e.g. rgb_persist()). 421 422SO, I need to rewrite sts_monobit, sts_runs, and rgb_binomial (last 423first) so that they only check bits in the valid part of each returned 424word. This will be modestly annoying, but is of course absolutely 425necessary for the results to mean a damn thing. 426 427==================== 428 429Note the new test rgb_bitdist, which is the BEST generalization of 430sts_monobit and perhaps better than rgb_binomial as well. We need to 431merge sts_monobit, rgb_persist, and rgb_bitdist into a single test 432that measures the total/average bit number, the per-bit-slot 433total/average number, and derives the cumulative bitmask of repeated 434bits. 435 436Two additional directions to take this: 437 438First: 439 440 generate a set of Ntest variables for e.g. 441 1 bit 442 2 bit 443 3 bit 444 4 bit 445 ... 446 N bit 447 448combinations (one each). For number of bits and bitpattern, sweep the 449bitstring and accumulate the frequency histogram (for each possible 450bitpattern in each string location) 451 452 ntest[no_of_bits].x[bit_pattern_slot] 453 454with periodic wraparound on the bitstring and so forth -- this is, in 455fact, sts_runs but for arbitrary numbers of bits and accumulating BOTH 456the totals AND the actual distribution, and check the chisq on the 457distribution and the pvalue of the total expected. 458 459Second: 460 461Here we look for subtle sequential bitlevel correlations. We do the 462same test as before, except that the frequency histogram is LATERAL -- 463doing each bit position in a sequential fashion. This test should see a 464clear failure for bits that don't change (strong sequential correlation, 465that) but should ALSO discover individual bits that do things like: 46601010101 or 0011110000111100 or any other "balanced" repetition. 467 468In fact, we may eventually do a fft on the forward sequential bit 469sequences to see if there are even very weak sequential bit-level 470correlations like 001010100001100010 (where every sequential triplet has 471two 0's and one 1), even if the triplets themselves are "random". A 472good frequency test on all 3-way bit patterns should catch this, of 473course, and the lovely thing about binary is that I can always use a 474suitably masked integer conversion of the bit pattern itself as the 475histogram index! So it needn't be TOO horrible to encode. 476 477===================================================== 478 479Note on diehard parking lot test. First of all, the documentation for 480the test is incorrect -- it tests for "crashes" (overlap) of SQUARE cars 481with sides of length 1, not ROUND cars of RADIUS one as Marsaglia's 482documentation and comments assert. If round cars ("helicopters", to 483borrow Marsaglia's term) are used a different distribution is obtained 484with mean over 4000. 485 486Actually to me, round cars seems very very similar to one of Knuth's 487tests for hyperplanar distribution of random coordinates, which has a 488much more solid theoretical foundation; I suspect the tests are more or 489less equivalent in sensitivity and will yield identical success/failure 490patterns in 99% of all cases (maybe even 100%). And the sensitivity of 491this test sucks -- it was very difficult for me to find a test that 492failed it out of the entire GSL collection (trying a half dozen 493known-weak generators or more in the process). The one that finally DID 494fail it really, really sucked and failed all sorts of other tests. 495 496I'd be interested to see if this test deserves to live, in the long run. 497I rather suspect that Knuth tests in variable dimensions will yield a 498much more systematic view of RNG failure, just as rgb_bitdist does 499compared to e.g. runs tests. 500 501================================================================== 50207/11/06 503 504I've just finished testing Marsaglia's bitstream test, and it is (like 505rgb_bitdist) a test that nothing survives. In fact, it is a dead 506certainty that rgb_bitdist is a better version of the same basic test, 507and that generators that fail rgb_bitdist at (say) 6 bits are going to 508fail Marsaglia's less sensitive 20-bit test. What his test does that 509bitstream doesn't is use a single relatively coarse grained test 510statistic -- the expected number of missing 20 bit numbers out of 2 511million trials -- instead of doing what rgb_bitdist does, which is 512do a chisq test on the fractional occupancy of the entire VECTOR of 20 513bit numbers. 514 515Research question: Which is more sensitive AT 20 bits? It seems to me 516that the chisq vector test is clearly superior as it would catch many 517possible "bad" distributions that conserved the mean number of missing 518numbers at this sample size (however unlikely such special deviations 519might be in practice) but then there is the question of sensitivity 520given a fixed sample size. SO FAR, however, rgb_bitdist continues to 521be the premier test for revealing failure, with nothing getting out 522alive past 6 bits, ASSUMING that I'm computing the vector chisq and 523p-value per run correctly and not just revealing that I'm using the 524wrong form as the number of bits goes up. 525 526Oh, and nothing survives diehard_bitdist EITHER, at least when one 527cranks the test up past Marsaglia's wimply 20 iterations to (say) 200. 528At that point the non-uniformity of the p-distribution is absolutely 529apparent -- it even has structure. That is, some seeds lead to biased 530results that are TOO CLOSE to the mean value and not properly 531distributed... an interesting observation all by itself. 532 533 rgb 534 535================================================================== 53607/11/06 later that day... 537 538diehard_count_1s_stream() now WORKS! What a PITA! And (I suspect) how 539unnecessary all the complexity! This test (as it turns out) computes 540chisq on what amounts to a strangely remapped rgb_bitdist test 541(precisely!) on four and five digit base 5 integers obtained by 542remapping bytes based on the number of 1's therein. 543 544That is, 256 possibilities are reduced to 5, each with a 545trivial-to-compute frequency/probability, the base 5 integer is left 546shifted (base 5!) and added to the next one to cumulate 4 or 5 digit 547numbers, those numbers are then counted, and the counts compared to the 548EXPECTED counts given the number of samples and turned into chisq (one 549each, for 4 and for 5 digit numbers). Sigh. 550 551OK then, we COULD just print the p-value of those chisq's -- an 552absolutely straightforward process. Alternatively, we could do what 553Marsaglia does -- compute the DIFFERENCE of the chisq's themselves, 554which of course are supposedly distributed around the number of degrees 555of freedom of each one separately, that is 3125 - 625 = 2500! This is 556the mean value of the difference, with stddev = sqrt(2*2500). This 557final result is turned into a p-value. 558 559This of course is wierd and wrong in so many ways. For one, it is 560entirely possible for chisq_4 and chisq_5 to differ systematically from 561their expected values, and it is perfectly possible and likely that 562those deviations would occur in the same direction -- e.g. down -- so 563that the 4 byte and 5 byte streams BOTH have fewer degrees of freedom 564than expected. Of course in this case part of the individual deviations 565CANCEL and is not visible in the final p-value! 566 567Honestly, I can think of pretty much "no" reasonable ways that this 568final pvalue can fail where any/all of the p-values for the 4 or 5 (or 2 569or 3 or 1 or 17) digit distributions would not also fail, but I can (and 570just did) think of a way that the final p-value might MARGINALLY pass 571while the individual p-values fail. 572 573Although the general approach is a good one, what is clearly needed is a 574new test -- one that extends rgb_bitdist to multidigit strings of smaller 575base. For example, right now rgb_bitdist tests the FREQUENCY of the 576occurrence of all 5 digit strings but not all 5 digit strings taken two 577at a time (32 x 32 = 1K possbilities). Of course one REASON that I 578don't do that is that I >>already<< do e.g. 10 bit strings -- the DIRECT 579frequency distribution of those 1K possibilities, which obviously 580samples all the 5 bit combos taken two at a time!. And besides, every 581rng I have tested fails at the level of what amounts to two digit octal 582numbers -- six bit strings. So I'm cynical about this. 583 584I also don't think that this really counts 1's. How is this one iota 585different from simply evaluating the distribution of bytes (8 bit 586randoms)? If the complete distribution of 8 bit bytes (rgb_bitdist at 8 587bits) is random, ALL DERIVED DISTRIBUTIONS are random, with the only 588catchy part being temporal/sequential correlations that might be 589revealed from taking the strings four or five bytes at a time (at the 590expense of compressing the information tremendously). 591 592Once again, this test seems reduceable to but not as sensitive as 593rgb_bitdist at a given level, unfortunately a level far beyond where all 594known rng's fail anyway. It is a MEASURE of the lack of sensitivity 595that this does does NOT fail for many of those rng's where rgb_bitdist 596does... 597 598 rgb 599