1These are some simple design notes to document this program and to guide
2further work.
3
4This program is designed to do the following:
5
6  a) Encapsulate every RNG known to humans for testing.  We will use the
7GSL both for its rich supply of ready-to-test RNG's and for its
8relatively simple and consistent encapsulation of any new RNG's we add
9-- /dev/random is already encapsulated and can serve as a template for
10future work. [1/16/03 rgb]
11
12  b) test speed of the available RNG's.  Both:
13     i) In "native/standard" subroutine calls (with the subroutine call
14        overhead included).  [This is done, 1/16/03, rgb]
15    ii) In a custom "vector" form, where a whole block of dedicated
16        memory is filled in one pass from a given algorithm, and
17        subsequently accessed via a macro that only calls for a refill
18        when the vector is exhausted.
19
20  c) test quality of the available RNG's.  There are at least three
21primary sources of tests that will be used in this tool.
22
23     i) RGB tests.  These are the tests I myself have implemented.  To
24        the best of my knowledge these are new, although of course
25        my knowledge is far from complete.
26
27    ii) DIEHARD, by Dr. George Marsaglia of FSU.  This is a well-known
28        and venerable "battery of tests" of RNG's.
29
30   iii) NIST STS/FIPS (NIST special publication 800-22, revised
31        5/15/2001).  This is also a suite of RNG tests, with some
32        overlap with DIEHARD.  I will probably implement only selected
33        tests from this suite at least at first, as some of them
34        appear to be relatively weak compared to e.g. rgb_binomial
35        and to test basically the same thing.
36
37    iv) Additional tests that are to be found on the web and in the
38        literature, where they appear to test things not
39        well-represented in tests already implemented in one suite
40        or another.
41
42  d) test quality of any input set of "random" data according to i-iv in
43c).  This will let us test arbitrary RNG's via their data, including and
44especially hardware generators.  Note that hardware generators available
45as "devices" will be interfaced via a).  This step will only be
46implemented when enough tests to make it worthwhile are already
47implemented.
48
49
50         Addendum and explanation of copyright issues.
51
52The STS was written by a variety of authors:
53
54  Andrew Rukhin, Juan Soto, James Nechvatal, Miles Smid, Elaine Barker,
55  Stefan Leigh, Mark Levenson, Mark Vangel, David Banks, Alan Heckert,
56  James Dray, San Vo.
57
58None of the actual code in the STS suite has been used in this tool --
59all testing routines have been written using only the STS published
60document as an excellent reference.  GSL routines have been used
61throughout, where possible, for computing things such as erfc or Q.
62Since I am not using their code, I am omitting their copyright notice,
63but wish to acknowledge their documentation of the selected tests
64anyway.  All the code in dieharder is GPL open source in any event, but
65academic credit matters.
66
67Similarly, DIEHARD is the work of George Marsaglia.  Dr. Marsaglia
doesn't have any written copyright notice or license terms that I could
69find in the sources provided on the website where he openly distributes
70source for his own suite, but in keeping with a minor design goal of
71completely original GPL code, all the DIEHARD algorithms have also been
72completely rewritten without directly utilizing any of the actual
73diehard code.  Fortunately, Dr.  Marsaglia also provides clear
74documentation for his test suite, making it a fairly simple matter to
75implement the tests.  Source code WAS examined (but not copied) to
76clarify places where the documentation of the tests was openly erroneous
(as in the parking lot test) or unclear, but dieharder code is completely
78original and its development thoroughly documented in CVS and (later)
79SVN tree form, checkin by checkin.
80
81The relatively few tests I have added are motivated by a desire to get a
82less ambiguous answer than many of these tests provide.  In many cases
it is not (or should not be) correct to say that one "accepts" a
84generator as being a good one just because a test run of the generator
has a p-value significantly greater than 0.  A few moments'
86experimentation, especially with relatively small data sets, should
87convince one that even "bad" RNG's can sometimes, or even frequently,
88return an "acceptable" p-value.  Only when the size of the test is
89increased, or the test is repeatedly run with different seeds, does it
90become apparent that although it sometimes performs acceptably (the
91result of a run isn't inconsistent with the generator producing "real"
92random numbers) it sometimes performs so poorly that the result can
93>>never<< be explained or believed for a true random number generator.
94
95That is, while a really poor p-value allows us to reject the hypothesis
96that the tested generator is random, acceptable p-values, even a fair
97number of them in repeated tests, do not usually support accepting the
98hypothesis -- at best they don't support rejection.
99
100Running a test repeatedly to generate a full distribution of results and
101then directly comparing the entire distribution with theory appears to
102provide greater sensitivity and accuracy in the rejection process than
103"simple" tests that generate only a single quantity.  This motivated the
104rgb_binomial test, which has proven very good at rejecting "bad" rng's.
105
Similarly, it may prove useful to directly generate an empirical
107distribution of p-values themselves (repeating a test many times with
108different seeds and binning the p-values) and compare it to the expected
109distribution (which might work for erfc-based p-values, although not so
110well for Q-based p-values).  This can replace running a test many times
111looking for an anomalous number of "poor" p-values mixed in with ones
112that are not so poor as to lead to immediate rejection.
113
114Contributions of rng's (consistently GSL-wrapped as e.g. dev_random.c
115demonstrates) and/or additional tests would be most welcome.
116
117     rgb
118
119
120                     STS Critique, By Statistic
121
122                       The Monobit Statistic
123
124The monobit statistic checks to make sure that the raw number of 1's and
1250's approximately balance in a given bitstring.  This is actually a
126decent thing to test, but there is a better way to test it.  Given a
127relatively short bitstring (e.g. n<=160 bits), the probability that it
will have k 1's according to the null hypothesis is p(n,k,1/2) (the
129binomial distribution of outcomes over n independent trials, each with
130p=1/2).  If one runs m runs of n independent trials each, and
131accumulates X[k] (the count histogram for each possible value of output
132k in [0,n]) then one has X[k], the expected value histogram Y(k) =
m*p(n,k,1/2), and sigma(k) = sqrt(m*p(n,k,1/2)*(1-p(n,k,1/2))).  From this it is simple
134to compute a p-value from an incomplete gamma function for the
135>>details<< of the distribution of 1's.  One can even do it two ways --
136with an independent random number seed for each of the m samples, or
137by simply running the rng with its original seed long enough to produce
138m*n bits.
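
A minimal sketch of the binomial version of the test described above, assuming
the GSL is available (gsl_ran_binomial_pdf for p(n,k,1/2), gsl_sf_gamma_inc_Q
for the p-value); the generator, m, and n are purely illustrative, and sparse
tail bins are crudely lumped into one:

  #include <stdio.h>
  #include <gsl/gsl_rng.h>
  #include <gsl/gsl_randist.h>    /* gsl_ran_binomial_pdf */
  #include <gsl/gsl_sf_gamma.h>   /* gsl_sf_gamma_inc_Q   */

  int main(void)
  {
    int n = 160, m = 100000;            /* n bits per run, m runs           */
    double X[161] = {0.0};              /* count histogram X[k], n+1 bins   */
    gsl_rng *rng = gsl_rng_alloc(gsl_rng_mt19937);

    for(int i = 0; i < m; i++){
      int k = 0;
      for(int j = 0; j < n; j++) k += gsl_rng_uniform_int(rng, 2);
      X[k] += 1.0;                      /* number of 1's in this run        */
    }

    /* chisq of X[k] against Y[k] = m*p(n,k,1/2)                            */
    double chisq = 0.0, xtail = 0.0, ytail = 0.0;
    int nbins = 0;
    for(int k = 0; k <= n; k++){
      double Y = m * gsl_ran_binomial_pdf(k, 0.5, n);
      if(Y < 5.0){ xtail += X[k]; ytail += Y; continue; }
      chisq += (X[k] - Y)*(X[k] - Y)/Y;
      nbins++;
    }
    chisq += (xtail - ytail)*(xtail - ytail)/ytail;   /* lumped tail bins   */
    nbins++;

    double dof = nbins - 1;             /* counts constrained to sum to m   */
    printf("chisq = %g  p = %g\n", chisq,
           gsl_sf_gamma_inc_Q(0.5*dof, 0.5*chisq));
    gsl_rng_free(rng);
    return 0;
  }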
139
140This binomial test is stronger than the monobit test.  Any result
141histogram that passes it necessarily passes the monobit test for the
142aggregate distribution with m*n bits (as I hope is obvious -- passing it
143means that the distribution is binomial in detail, not just uniformly
balanced).  However, it is simple enough to compute the monobit result
145more or less in passing in the binomial test by simply summing the count
146histogram at the end or aggregating a total 1-bit count as one goes.  In
147that case one can generate two p-values -- a p-value for the monobit
148test and one for the binomial -- in one pass for the same data.  I would
149predict that the binomial p-value is strictly less than the monobit
p-value, and that there exist rng's that pass the monobit but
151explicitly fail the binomial, but we can determine that empirically.
152
153This chisq-based binomial test methodology can be used to improve nearly
154any single test statistic in sts.  Instead of looking at the expected
155value of the statistic itself, frame its generation as a Bernoulli trial
156with a known percentage, compute the expected binomial distribution and
157its error, generate the result histogram as the test statistic, and use
158chisq and the complementary incomplete gamma function Q(a,x) to generate
159a p-value over many smaller "independent" trials instead of one long
160one.
161
162This suggests ways of improving e.g. the runs and other tests as well,
163see below.
164
165                        The Runs Statistic
166
167For n bits, count the test statistic X as the number of (overlapping) 01
168and 10 pairs and add one.  Compute p_1 = (# ones)/n, p_0 = (1 - p_1)
169(and ensure that they are close enough to p_0 = p_1 = 0.5).  Compute the
170expected number of 01 and 10 pairs as Y = 2*n*p_1*p_0.  Compute the
171expected error in the number of 01 and 10 pairs as sigma =
1722*sqrt(2*n)*p_1*p_0.  Finally, compute the p-value as erfc(|X -
173Y|/sigma).
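
For concreteness, a literal transcription of that recipe (the critique below
notwithstanding) -- not the STS reference code -- assuming the bits arrive as
an int array and leaving out the preliminary check that p_1 is close to 0.5:

  #include <math.h>
  #include <gsl/gsl_sf_erf.h>   /* gsl_sf_erfc */

  double runs_pvalue(const int *bit, int n)
  {
    int ones = 0, X = 1;                    /* the test adds one to the count */
    for(int i = 0; i < n; i++) ones += bit[i];
    for(int i = 0; i < n - 1; i++)
      if(bit[i] != bit[i+1]) X++;           /* overlapping 01 or 10 pair      */

    double p1 = (double) ones / n, p0 = 1.0 - p1;
    double Y = 2.0 * n * p1 * p0;
    double sigma = 2.0 * sqrt(2.0 * n) * p1 * p0;
    return gsl_sf_erfc(fabs((double) X - Y) / sigma);
  }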
174
175This methodology seems questionable, at least to me.
176
177First of all, it seems odd to use a p_0 and p_1 derived from the sample,
178given that the null hypothesis is p_0 = p_1 = 0.5 and independent.
179Ensuring that the sample passes monobit (in a slightly different
180formulation) to start means that one has conditionally >>accepted<< this
null hypothesis for the rng before beginning the runs test; using
182different p's implies a different null hypothesis.
183
184In addition, given their prior assumption that p_1 = # ones/# bits,
185their computation of the probability of p_01 and p_10 (and hence Y) is
186erroneous.  The probability of 01 in a finite string of n bits, subject
187to constrained numbers of 0's and 1's needs to be computed WITHOUT
188replacement, just as one does with a deck of cards.  After shuffling a
189deck of cards known to contain four aces, the probability that the top
two cards are aces is not (4/52)*(4/52), but rather (4/52)*(3/51) -- if
191the first card is an ace, it means there are fewer aces left in the deck
192for the second card.  To be consistent, they should compute the expected
193number of 01 strings in a 10-bit sequence containing 6 1's as
194(4/10)(6/9) and not (4/10)(6/10).  They need to choose between a null
195hypothesis of p_1 = p_0 = 0.5 (where they can presume successive bit
196probabilities to be independent) and a null hypothesis of p_1 = (#
1971's)/(# bits) and use "probability for a 1, given that the first draw is
198a 0".  This matters for short strings.
199
200Next, they don't wrap the n-value bitstring into a torus.  Consequently,
201there are only n-1 bitpairs examined, and thus the expected value should
202be Y = 2*(n-1)*p_1*p_0 and not 2*n*p_1*p_0.  For large n, of course, it
203won't matter (neither will the previous two problems, at least not
204much), but it is surprisingly sloppy mathematically.
205
206In addition there is no reason I can see to add a 1 to the test
207statistic of (# of 01 and 10 pairs in an overlapping scan of the
208string).  This is very difficult to understand, as it clearly breaks the
209symmetry of this test statistic with the complementary view where the
210test statistic X is (# of 00 or 11 pairs in an overlapping scan of the
211string) but Y and sigma are the same -- the sum of the Y's necessarily
212equals n (or n-1, if that is the number of examined pairs).  This leads
213to an obvious problem for short strings, e.g. n = 2, where the formula
214they give for the expected number of 01 or 10's measured is
2152*2*(1/2)*(1/2) = 1, but the statistic can only take on the values 1 or
2162, each with probability 1/2.  There is something odd here.
217
218Finally, it is more than a bit crazy to view this assessment as somehow
219a measure of excessive sequential correlation, as a measure of "runs".
220Let us reframe the test statistic (count of 01 and 10's) in terms of our
221null hypothesis.  Presuming random, equal probability 0's and 1's, we
222expect 25% each of 00 01 10 11.  Given an exemplary bit string of e.g.
2231001101011, it doesn't really matter if one examines disjoint bitpairs
224(10 01 10 10 11) (n/2 bitpairs for n even) or the overlapping bitpairs
22510 00 01 11 10 01 10 01 11 11 (wrapping the last pair around
periodically to ensure symmetry and n pairs of bits): EACH pair is
227expected to occur with a frequency, in the limit of many bitpairs, of
228#bitpairs*(1/4).
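
A tiny sketch of the two ways of counting referred to above (plain C, no
library calls), with the overlapping count wrapping the last pair around
periodically:

  /* Count the four bitpairs 00,01,10,11 in bit[0..n-1], both as disjoint
     pairs and as overlapping pairs with periodic wraparound. */
  void count_bitpairs(const int *bit, int n, long disjoint[4], long overlap[4])
  {
    for(int k = 0; k < 4; k++) disjoint[k] = overlap[k] = 0;
    for(int i = 0; i + 1 < n; i += 2)             /* n/2 disjoint pairs    */
      disjoint[(bit[i] << 1) | bit[i+1]]++;
    for(int i = 0; i < n; i++)                    /* n overlapping pairs,  */
      overlap[(bit[i] << 1) | bit[(i+1) % n]]++;  /* wrapping the last one */
  }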
229
230Furthermore, in an ensemble of measurements of bitpair frequencies, each
distinct possibility is expected (for lots of identical bitsets) to be
binomially distributed with respect to the rest (as in p_01 = 1/4, p_!01 = 3/4)
233regardless of whether or not sequential overlapping bitpairs are
234counted.  We can understand this by noting that our null hypothesis (a
235random generator producing pairs of bits) can be applied to any pair of
236bits in any string.
237
238Note also that in a given bitstring with periodic wraparound, the number
239of 01 (overlapping) pairs must equal the number of 10 pairs.  This can
240be proven with ease.  Every bit flipped can either change >>both<< the
24101 and 10 count by one (flipping a bit with two neighbors that are both
2420 or both 1) or it can leave them both the same (flipping a bit with a
243neighboring 0 AND a neighboring 1).  This wreaks a certain measure of
244havoc on the notion that each sequentially overlapping pair examined is
245an independent measurement -- measuring any pair of (01 or 10 count) and
246(00 or 11 count) suffices to determine the counts for all four possible
247bit pairs.
248
249The only question remaining is whether measuring the counts using
250overlapping bitpairs is more or less rigorous a test than measuring the
251counts using non-overlapping pairs.  That is, can a generator create a
252bitstring that is correctly pairwise distributed in both non-overlapping
253pair sequences but that is NOT correctly pairwise distributed in
overlapping pair sequences, or vice versa?  The answer is not terribly
255obvious, but if we make the very weak presumption that our test
measurements do not depend, on average, on whether we begin them
257systematically at bit 0 or bit 1 (so that if we have a bad generator
258we'll eventually fail for either starting point with roughly equal
259frequency for different starting seeds and/or first-bit displacements),
260then measuring both possibilities for each string should (on average)
261produce results that are conservatively related to measuring one
262possibility on twice as many strings.
263
264To put it another way, if a generator creates strings that >>fail<< the
265overlapping test, they would necessarily fail at least one of the two
266non-overlapping tests possible for the same bitstring, since the counts
267for the two non-overlapping tests add to produce the count for the
overlapping test.  For this addition to push values out of
269acceptable ranges one would require two deviations from the expected
270value in the same direction.  If we only make the extremely weak
271assumption that failure of the non-overlapping tests is equally likely
272for both possible cyclic starting points, the 1/sqrt(2) scaling implicit
273in using both strings is thus conservative.
274
275However, we can easily average over this as well by simply randomly
276starting each non-overlapping run on bit 0 or bit 1.  This might
277actually simplify the code and provides an ADDITIONAL measure of
278randomness as the two results should themselves be binomially
279distributed, independently.
280
281The final problem with runs is that it is not easy to see how to extend
282it to a higher order.  For example, does 101 occur too often?  How about
283010?  Again, each bit pattern should occur at a rigorously predicted
284rate for p_0 = p_1 = 0.5 and independent bits, e.g. p_000 = p_001 =
285p_010 = ... = 1/8 for any distinct three-bit combination, p_0000... =
2861/16 for four-bit combinations, and so on.  EACH of these outcomes
287should occur with binomially distributed frequencies when measured as a
288bernoulli trial with p_xyz, (1 - p_xyz), making it easy to compute both
289Y and sigma.  This suggests that the correct generalization of monobit
290and runs (and probably more of the sts tests) is a test that (in one
291pass):
292
293   a) presumes a valid bitlevel rng so that p_0 = p_1 = 0.5 exactly, all
294bits independent (null hypothesis).
295
296   b) counts 1's (and infers 0's) for M sets of N bits, incrementing a
297histogram vector of the number of 1's observed accordingly.  This is
298rigorously compared for EACH possible value of the count to the expected
299binomial distribution of counts.
300
301   c) counts e.g. 00,01,10,11 overlapping pairs.  We showed above that
302the count of 01's exactly equals the count of 10's.  The count of 11
303pairs equals the total number of pairs less the count of 00's, 01's and
30410's (all known).  Even though these are not all independent, there is
305no harm and a small ease of coding advantage in binning all four counts
306in four result histograms, each to be compared to its binomial
307expectation.
308
309   d) counts e.g. 000, 001, 010... overlapping pairs.  Here we IGNORE
310possible count relationships for sure (as we won't stop with three
311bits:-).  Again, bin the outcome count frequencies and compare to their
312expected binomial distribution.
313
314This process can continue indefinitely, as long as we can accumulate
315enough counts in each pattern histogram to make aggregate statistics
316reliable.  For example, at 8 bits, one is basically ensuring that the
317distribution of ascii characters is BOTH overall uniform (has the
318correct mean number of letters) AND that the actual distribution of
319those letters in samples is correct from the point of view of Bernoulli
320trials.
321
322I think that this process is strengthened significantly by conducting it
323simultaneously for all the sequences.  The first measures "bit frequency
324and randomness".  The second measures "pairwise sequential
325correlations".  The third measure "three-bit sequential correlations"
326and so forth.  A sequence might pass the first and fail the second, pass
327the first two and fail the third, pass the first seven and fail the
328eighth, pass the first eight and fail the 32nd (so that floats formed
329from the inverse are not uniformly distributed).
330
331===============
332
333Addendum to previous note.  The insight has struck me that all that this
334does is give one the actual DISTRIBUTION of outcomes for any given
335experiment.  The distribution of sample means should be -- guess what --
336normal -- provided that there are enough elements in the sample itself.
337For small numbers of elements in the sample, they should be binomial if
338the experiment is framed as a bernoulli trial.  I'll bet that if I
339convert smoothly from binomial into normal as the number of outcome
340states exceeds the threshold where computing binomial probabilities is
341reasonably possible, I'll see no difference in an appropriately computed
342chisq!
343
344This is the solution to the problem of doing long strings.  If I counted
e.g. bitpair combinations across a string of length 8 (for many
samples), binomial is appropriate.  If I count bitpair combinations
347across a string of length 2^23 (8 million), I can't do binomial, but
348don't have to.  Outcomes will be normally distributed about the expected
349means.
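
In code the binomial-to-normal crossover is just the usual normal
approximation; a minimal sketch (the only GSL call assumed is gsl_sf_erfc),
where X is an observed count out of t trials with per-trial probability p:

  #include <math.h>
  #include <gsl/gsl_sf_erf.h>

  double normal_pvalue(double X, double t, double p)
  {
    double z = (X - t*p) / sqrt(t*p*(1.0 - p));  /* standardized deviation */
    return gsl_sf_erfc(fabs(z) / sqrt(2.0));     /* two-sided p-value      */
  }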
350
351SO, this is the answer to the bigger problem of why one should ever
352trust "p-value" for a single sample, or even a dozen samples, of a
353random process.  Sure (to quote Marsaglia) "p happens", but it doesn't
354JUST happen, it happens according to a distribution.  Instead of
355wondering if THIS occurrence of p = 0.02 from a single mean value is
356reasonable or not, compute the p of an entire distribution of mean
357values compared to the appropriate normal or binomial distribution, and
358get a p-value one can really trust!
359
360=======================
361
362OK, so I'm stupid and I'm wrong.  sts_monobit is, in fact, more
sensitive than rgb_binomial, although rgb_binomial might yet fail a
generator that passes sts_monobit -- one that preserves the
365symmetry of its 0/1 distribution about the midpoint (so the mean number
366of 0's and 1's is correct) but has the >>wrong<< distribution, e.g. is
367not binomial.
368
369As it turns out, most distributions that fail rgb_binomial do not have
370symmetry about their midpoint and consequently fail sts_monobit "before"
371failing rgb_binomial.  If you like, rgb_binomial only becomes reliable
372when there is enough data to fill the primary non-zero bins near the
373middle, which takes a goodly chunk of data.  sts_monobit, in the
374meantime, just adds up the number of 0's and 1's while steadily chopping
375down sigma -- by the time one has sampled a few thousand bits a lot of
376rng's are already 0/1 asymmetric enough to fail monobit, but the bin
377sigma's are still large enough to forgive a lot of slop in any given
378bin.
379
380Interestingly, binomial can be applied to very small bitstrings -- in
381fact, to bitstrings of length 1 (I think).  At that point it should
382become equivalent to monobit, but in a somewhat different way,
383especially since rgb_binomial now evaluates monobit directly on the
384accumulated bit values.
385
386Looks like I >>do<< have some work to do to fully understand everything,
387I do I do...;-)
388
389==============
390
391Whoo, dawgies!  Out of a bit of insight, rgb_persist was born, and it
392explains why a LOT of rng's suck incredibly.  Lots of rng's quite
393literally never change the contents of certain bits (usually lower
394order bits).  One common symptom is for an rng to always return odd
395or even numbers.  Another common symptom is for a rng to fail a
396distributional test when the repeated bits are measured (as repeated 1's
397cause an excess of 1's, repeated 0's an excess of 0's, even repeated
39801's can cause runs/bitpair tests to fail).  Worse, rgb_persist clearly
399shows that MOST rng's that exhibit the problem at all exhibit it
400STRONGLY (LOTS of repeated lower order bits) for at least some seeds.
401Using them, one runs a significant (indeed, a near "sure thing") risk of
402encountering seeds such that as many as 18 bits out of 32 are repeated,
403leaving one with only a fraction of the variation you thought you
404had.
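
One way such repeated bits can be detected (a sketch, not the actual
rgb_persist code): bits that are 1 in every sample survive in a running AND,
bits that are 0 in every sample never appear in a running OR, so the two
masks together flag every bit position that never changed:

  #include <gsl/gsl_rng.h>

  unsigned int persist_mask(gsl_rng *rng, int nsamples)
  {
    unsigned int and_mask = 0xffffffffu, or_mask = 0;
    for(int i = 0; i < nsamples; i++){
      unsigned int x = (unsigned int) gsl_rng_get(rng);
      and_mask &= x;
      or_mask  |= x;
    }
    return and_mask | ~or_mask;  /* set bits = positions that never changed */
  }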
405
406At this point, I'd have to say that rgb_persist is one of the first
407tests to run on ANY rng, and any rng that fails it should probably
408just be rejected out of hand -- even if only the last (least
409significant) bit is repeated, it seems dangerous to use an rng that
410produces only even or only odd numbers from any given seed, even for
411games.
412
413Writing the dumpbits and rgb_persist routines also gave me insight into
414how rgb_binomial and even the sts routines might be failing relative to
415their design goals.  I've been processing bits least to most
416significant, and I've been ignoring random_max when doing so.  This is
417a capital mistake, as of course an rng will fail when it only returns
418e.g. 16 bits in a 32 bit slot (and the other bits are necessarily
419repeated).  I've probably been "passing" only rng's that return a full
42032 "random bits" (and of course don't fail e.g. rgb_persist()).
421
422SO, I need to rewrite sts_monobit, sts_runs, and rgb_binomial (last
423first) so that they only check bits in the valid part of each returned
424word.  This will be modestly annoying, but is of course absolutely
425necessary for the results to mean a damn thing.
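
The core of that fix is small (a sketch, assuming only the GSL interface):
work out how many low-order bits the generator actually fills from
gsl_rng_max() and mask everything above them off before testing.  Generators
whose range is not a power of two are messier still, of course.

  #include <gsl/gsl_rng.h>

  unsigned int valid_bits(const gsl_rng *rng)
  {
    unsigned long max = gsl_rng_max(rng);   /* largest value returned     */
    unsigned int nbits = 0;
    while(max){ nbits++; max >>= 1; }       /* number of significant bits */
    return nbits;
  }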
426
427====================
428
429Note the new test rgb_bitdist, which is the BEST generalization of
430sts_monobit and perhaps better than rgb_binomial as well.  We need to
431merge sts_monobit, rgb_persist, and rgb_bitdist into a single test
432that measures the total/average bit number, the per-bit-slot
433total/average number, and derives the cumulative bitmask of repeated
434bits.
435
436Two additional directions to take this:
437
438First:
439
440   generate a set of Ntest variables for e.g.
441     1 bit
442     2 bit
443     3 bit
444     4 bit
445     ...
446     N bit
447
combinations (one each).  For each number of bits and bitpattern, sweep the
449bitstring and accumulate the frequency histogram (for each possible
450bitpattern in each string location)
451
452  ntest[no_of_bits].x[bit_pattern_slot]
453
454with periodic wraparound on the bitstring and so forth -- this is, in
455fact, sts_runs but for arbitrary numbers of bits and accumulating BOTH
456the totals AND the actual distribution, and check the chisq on the
457distribution and the pvalue of the total expected.
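
A sketch of that bookkeeping (my own guess at the data layout, not final
code), using the masked integer value of each window directly as the
histogram index and wrapping periodically on the bitstring:

  typedef struct { unsigned long *x; } Ntest;  /* x has 2^nbits slots      */

  /* caller allocates ntest[1..maxbits], each with 2^nb slots in x         */
  void sweep_patterns(const int *bit, int n, Ntest *ntest, int maxbits)
  {
    for(int nb = 1; nb <= maxbits; nb++){
      for(int i = 0; i < n; i++){              /* every string location    */
        unsigned int pat = 0;
        for(int j = 0; j < nb; j++)
          pat = (pat << 1) | (unsigned int) bit[(i + j) % n];
        ntest[nb].x[pat]++;                    /* ntest[no_of_bits].x[slot] */
      }
    }
  }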
458
459Second:
460
461Here we look for subtle sequential bitlevel correlations.  We do the
462same test as before, except that the frequency histogram is LATERAL --
463doing each bit position in a sequential fashion.  This test should see a
464clear failure for bits that don't change (strong sequential correlation,
465that) but should ALSO discover individual bits that do things like:
46601010101 or 0011110000111100 or any other "balanced" repetition.
467
468In fact, we may eventually do a fft on the forward sequential bit
469sequences to see if there are even very weak sequential bit-level
470correlations like 001010100001100010 (where every sequential triplet has
471two 0's and one 1), even if the triplets themselves are "random".  A
472good frequency test on all 3-way bit patterns should catch this, of
473course, and the lovely thing about binary is that I can always use a
474suitably masked integer conversion of the bit pattern itself as the
475histogram index!  So it needn't be TOO horrible to encode.
476
477=====================================================
478
479Note on diehard parking lot test.  First of all, the documentation for
480the test is incorrect -- it tests for "crashes" (overlap) of SQUARE cars
481with sides of length 1, not ROUND cars of RADIUS one as Marsaglia's
482documentation and comments assert.  If round cars ("helicopters", to
483borrow Marsaglia's term) are used a different distribution is obtained
484with mean over 4000.
485
Actually to me, round cars seem very, very similar to one of Knuth's
487tests for hyperplanar distribution of random coordinates, which has a
488much more solid theoretical foundation; I suspect the tests are more or
489less equivalent in sensitivity and will yield identical success/failure
490patterns in 99% of all cases (maybe even 100%).  And the sensitivity of
this test sucks -- it was very difficult for me to find a generator that
492failed it out of the entire GSL collection (trying a half dozen
493known-weak generators or more in the process).  The one that finally DID
494fail it really, really sucked and failed all sorts of other tests.
495
496I'd be interested to see if this test deserves to live, in the long run.
497I rather suspect that Knuth tests in variable dimensions will yield a
498much more systematic view of RNG failure, just as rgb_bitdist does
499compared to e.g. runs tests.
500
501==================================================================
50207/11/06
503
504I've just finished testing Marsaglia's bitstream test, and it is (like
505rgb_bitdist) a test that nothing survives.  In fact, it is a dead
506certainty that rgb_bitdist is a better version of the same basic test,
507and that generators that fail rgb_bitdist at (say) 6 bits are going to
508fail Marsaglia's less sensitive 20-bit test.  What his test does that
rgb_bitdist doesn't is use a single relatively coarse grained test
510statistic -- the expected number of missing 20 bit numbers out of 2
511million trials -- instead of doing what rgb_bitdist does, which is
512do a chisq test on the fractional occupancy of the entire VECTOR of 20
513bit numbers.
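
For orientation, that coarse grained statistic is easy to reconstruct from the
classical occupancy problem (assuming the sample really is the roughly 2
million -- 2^21 -- overlapping 20 bit words described in the diehard
documentation): with t words drawn from N = 2^20 possibilities,

  expected missing ~= N*(1 - 1/N)^t ~= N*exp(-t/N)
                    = 2^20 * exp(-2^21/2^20) = 2^20 * e^-2 ~= 141909

so the whole 2^20-slot occupancy vector is collapsed into the distance of one
observed number from that mean.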
514
515Research question:  Which is more sensitive AT 20 bits?  It seems to me
516that the chisq vector test is clearly superior as it would catch many
517possible "bad" distributions that conserved the mean number of missing
518numbers at this sample size (however unlikely such special deviations
519might be in practice) but then there is the question of sensitivity
520given a fixed sample size.  SO FAR, however, rgb_bitdist continues to
521be the premier test for revealing failure, with nothing getting out
522alive past 6 bits, ASSUMING that I'm computing the vector chisq and
523p-value per run correctly and not just revealing that I'm using the
524wrong form as the number of bits goes up.
525
526Oh, and nothing survives diehard_bitdist EITHER, at least when one
cranks the test up past Marsaglia's wimpy 20 iterations to (say) 200.
528At that point the non-uniformity of the p-distribution is absolutely
529apparent -- it even has structure.  That is, some seeds lead to biased
530results that are TOO CLOSE to the mean value and not properly
531distributed... an interesting observation all by itself.
532
533   rgb
534
535==================================================================
53607/11/06 later that day...
537
538diehard_count_1s_stream() now WORKS!  What a PITA!  And (I suspect) how
539unnecessary all the complexity!  This test (as it turns out) computes
540chisq on what amounts to a strangely remapped rgb_bitdist test
541(precisely!) on four and five digit base 5 integers obtained by
542remapping bytes based on the number of 1's therein.
543
544That is, 256 possibilities are reduced to 5, each with a
545trivial-to-compute frequency/probability, the base 5 integer is left
546shifted (base 5!) and added to the next one to cumulate 4 or 5 digit
547numbers, those numbers are then counted, and the counts compared to the
548EXPECTED counts given the number of samples and turned into chisq (one
549each, for 4 and for 5 digit numbers).  Sigh.
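
A sketch of the remapping as I read it -- the letter boundaries (0-2, 3, 4, 5,
6-8 ones per byte) and their probabilities 37/256, 56/256, 70/256, 56/256,
37/256 are my reading of the diehard documentation, so treat the details as
assumptions:

  /* Map a byte to a base 5 "letter" by its number of 1 bits, then pack
     five successive letters into a base 5 integer in [0, 3125). */
  static int byte_to_letter(unsigned int byte)
  {
    int ones = 0;
    for(int i = 0; i < 8; i++) ones += (byte >> i) & 1;
    if(ones <= 2) return 0;       /* p = 37/256                  */
    if(ones >= 6) return 4;       /* p = 37/256                  */
    return ones - 2;              /* 3,4,5 ones -> letters 1,2,3 */
  }

  static int word5(const unsigned char *byte)
  {
    int w = 0;
    for(int i = 0; i < 5; i++) w = 5*w + byte_to_letter(byte[i]);
    return w;                     /* index into the 3125-bin count vector */
  }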
550
551OK then, we COULD just print the p-value of those chisq's -- an
552absolutely straightforward process.  Alternatively, we could do what
553Marsaglia does -- compute the DIFFERENCE of the chisq's themselves,
554which of course are supposedly distributed around the number of degrees
555of freedom of each one separately, that is 3125 - 625 = 2500!  This is
556the mean value of the difference, with stddev = sqrt(2*2500).  This
557final result is turned into a p-value.
558
This of course is weird and wrong in so many ways.  For one, it is
560entirely possible for chisq_4 and chisq_5 to differ systematically from
561their expected values, and it is perfectly possible and likely that
562those deviations would occur in the same direction -- e.g. down -- so
that the 4 digit and 5 digit streams BOTH have fewer degrees of freedom
564than expected.  Of course in this case part of the individual deviations
CANCELS and is not visible in the final p-value!
566
567Honestly, I can think of pretty much "no" reasonable ways that this
568final pvalue can fail where any/all of the p-values for the 4 or 5 (or 2
569or 3 or 1 or 17) digit distributions would not also fail, but I can (and
570just did) think of a way that the final p-value might MARGINALLY pass
571while the individual p-values fail.
572
573Although the general approach is a good one, what is clearly needed is a
574new test -- one that extends rgb_bitdist to multidigit strings of smaller
575base.  For example, right now rgb_bitdist tests the FREQUENCY of the
576occurrence of all 5 digit strings but not all 5 digit strings taken two
at a time (32 x 32 = 1K possibilities).  Of course one REASON that I
don't do that is that I >>already<< do e.g. 10 bit strings -- the DIRECT
frequency distribution of those 1K possibilities, which obviously
samples all the 5 bit combos taken two at a time!  And besides, every
581rng I have tested fails at the level of what amounts to two digit octal
582numbers -- six bit strings.  So I'm cynical about this.
583
584I also don't think that this really counts 1's.  How is this one iota
585different from simply evaluating the distribution of bytes (8 bit
586randoms)?  If the complete distribution of 8 bit bytes (rgb_bitdist at 8
587bits) is random, ALL DERIVED DISTRIBUTIONS are random, with the only
588catchy part being temporal/sequential correlations that might be
589revealed from taking the strings four or five bytes at a time (at the
590expense of compressing the information tremendously).
591
Once again, this test seems reducible to but not as sensitive as
593rgb_bitdist at a given level, unfortunately a level far beyond where all
594known rng's fail anyway.  It is a MEASURE of the lack of sensitivity
that this test does NOT fail for many of those rng's where rgb_bitdist
596does...
597
598   rgb
599
600==================================================================
60108/16/06
602
603OK, lots has happened.  First of all, I've discovered what I believe to
be at least one, maybe two diehard data errors -- rngs that I (and
everybody else) believe to be "good" ones, insofar as such a thing
exists, consistently fail operm5 and bitstream.  Not surprising -- the
data being
607used to compute the test statistics was simulated between 10 and 20
608years ago and this test routinely uses more random numbers from better
609generators in any given test than were likely used to simulate them in
610the first place.
611
612Second, dieharder is already in the process of developing quite a user
613base.  People have ported it to Windows, for example, and Dirk
614Eddelbuettel is interested in adding a native R interface to the tests
615(to be able to take advantage of several of the other facilities that R
616provides, a la octave/matlab/maple/mathematica as a programming UI).
617
618To facilitate both this latter project and the eventual creation of a
619true GTK-based GUI for dieharder, I've begun the fairly monumental task
620of splitting off all the tests, the add-on rng's that are intended to be
621part of the basic package, and all the intrinsic support functions into
622a standalone link library and associated set of include files.  I know
623from experience writing libwulf to support wulfware apps that this is a
624bit tricky, and I expect there to be deep bugs and places where the API
625and separation are not clean for quite some time now.  However, when I'm
626DONE it should be REALLY EASY to replace the front end UI.  In fact, if
627I clean up the UI itself while I'm at it to make it even more modular
628than it is, I should be able to add a GTK basic interface in a very
629short period of time.
630
631I'm also stripping down the actual source part of dieharder (the tty
632interface version) and putting most of the documentation and so on in
633with the LIBRARY, since that is where it will be needed.  Eventually I
634should need something like man pages and/or a manual for the
635libdieharder library, in addition to a much smaller set of toplevel
636dieharder documentation.  The application (in any UI version) should
637REALLY be fairly self-documenting as a design goal.
638
639To mark the advent of a version that appears to build on top of
640completely separated libdieharder.a, largely decrufted on both sides,
641I'm bumping the major number of dieharder even though this wasn't an
642original design goal.  It clearly merits a major version bump anyway.
643The minor.minor numbers will retain their original meaning (bugfix
644advances and number of tests supported) except that now I should
645probably reset bugfix to 0 in both libdieharder and dieharder.  I don't
646know what I'll do about synchronicity between the two -- really I should
647extract numbers from libdieharder to set dieharder version numbers, but
648parsing out all the screen-scraping macros gives me a headache so I'll
649await inspiration or need before messing with it.
650
651Note finally that doing this has put adding more e.g. marsaglia_tsang
652tests on hold.  I do have an extensive agenda here -- fixing up a
653"super" version of sts_serial, adding sts_serial itself (which should be
654more or less equivalent to rgb_bitdist, only the latter doesn't compute
655the correct statistic for a chisq, alas), figuring out what the RIGHT
656data is for diehard_operm5 and diehard_bitstream, completing the
657installation of 10^11 to 10^12 sample-based target data of my OWN for
658marsaglia_tsang_gcd, and so on.  However, clearly this needs to wait
659until the development environment is stable again, which could be days
660yet, or even longer depending on what else I have to do for Duke etc.
661
662I also haven't even looked at the WinXX patches -- I'll have to figure
663them out all over again after splitting off the library, I expect.
664
665Still, progress is being made.  I think that the application is
definitely out there to where a) it could be announced more publicly
667on e.g. the stats lists -- maybe when the libdieharder split has been in
668"alpha" for at least a while...;-) and b) it is time to look for grant
669support for this project to accomplish certain specific goals --
670supporting testing on a cluster (by the time we're done the
sts_serial/rgb_bitdist/rgb_lmn tests are going to be BIG CPU
672consumers:-), finishing off all the desired/required tests,
673re-simulating key target data for the brave new world of unlimited rands
674to test, and "getting the word out" via publication, travelling to
675learned meetings and presenting, and so on.  I'd guess that this will
676require a fairly serious grant for at least 3-5 years to do all that
677needs to be done that I know of already, and may need even more if (as I
678suspect) things are discovered that require significant new work to be
679done on testing algorithms themselves, e.g. "moment expansion" type
680tests, a connection between high-dimensional test failure and bit
681distribution tests.
682
683I have high hopes for the lmn test, although it may prove only the first
684step in lmnop, lmnopqr, lmnopqrst, etc... -- sampling lagged correlation
685between l-bit number, m-bit gap, n-bit number, o-bit gap, p-bit
686number...etc.  MANY tests seem to be fancy ways of doing just this and
687nothing more, interestingly enough -- in some cases ways of examining
only a projective subspace of the failures one might observe from a truly
689systematic test of this sort.  For example, a Knuth test that reveals
690hyperplanar clustering of rand triplets viewed as spatial coordinates is
691effectively observing non-uniformity of the random integers represented
692by the conjunction of the three bitstrings... but without attempting to
693validate true randomness on 96-bit integers!
694==================================================================
695
69610/10/06
697
698Well, many many more bugs were discovered (mostly by Charles Karney) and
699fixed and the tests are much improved as a result.  At this point "good"
700generators pass all but the operm5 and bitstream tests.  operm5 I
701STRONGLY suspect of containing a significant error.  bitstream the jury
702is still out on -- the whole series of n-bit monkey tests bothers me
703as they seem inevitably related to rgb_bitdist but less specific and
704sensitive.  They are, however, more asymptotic and I've got to really
705think about it before passing final judgement (hopefully buttressed by
706very specific examples of correlated failure according to specific
707patterns).
708
709I added a whole selection of bit manipulation tools.  I can now grab
710GUARANTEED sequential bits from the bitstream produced by any of the
711supported rngs (independent of rmax_bits).  I can easily generate a
712string of (precisely) uints with 32 supposedly random bits from this
713sequential stream.  I can rotate this uint in place and mask/select any
714of its bits.  I can grab arbitrary bit windows within a block of memory,
715and can grab arbitrary ntuples within a block of memory with periodic
716boundary conditions.  These routines (only) should be used to access the
717rngs in nearly all tests as they eliminate spurious effects from a rng
718returning only 31 bits on a test that de facto presumes 32, or 22 bits,
719or 16 bits.  They also permit true overlapping samples to be generated
720absolutely trivially on a given fixed buffer for overlapping sample
721tests in dieharder (where it is by no means obvious that overlapping
722matters in any significant way).
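
The flavor of those helpers, as a sketch rather than the actual bits.c code:
pull an n-bit window (n <= 32) starting at an arbitrary bit offset in a byte
buffer of nbits total bits, with periodic wraparound:

  unsigned int get_bit_window(const unsigned char *buf, unsigned int nbits,
                              unsigned int offset, unsigned int n)
  {
    unsigned int w = 0;
    for(unsigned int i = 0; i < n; i++){
      unsigned int b = (offset + i) % nbits;               /* periodic      */
      w = (w << 1) | ((buf[b >> 3] >> (7 - (b & 7))) & 1); /* MSB-first bit */
    }
    return w;
  }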
723
724It is time to create two new tests.  One of them generalizes rgb_bitdist
725so that the ntuples it samples can be created by more or less arbitrary
726sample patterns, e.g.
727
728stream: 0101010001111101010110101110100010101110101001011001110110110...
729        -------|(repeat)
730
731Get sequential 8-tuples e.g. 01010100 -- existing rgb_bitdist test for
732nbits = 8.
733
734
735stream: 0101010001111101010110101110100010101110101001011001110110110...
736        ----....---|(repeat)
737
738Get 4 bits, skip 4 bits, get 4 bits to make the 8-tuple 01010111, test
739stream of 8-tuples created with this repeating pattern with rgb_bitdist.
740This should reveal fourier-like correlations of certain kinds within 12
741bits, but is LESS sensitive than properly testing 12 bits directly.
742
743
744stream: 0101010001111101010110101110100010101110101001011001110110110...
745        --........--........--........-|(repeat)
746
747Get 2 bits, skip 8 bits, get 2 bits, skip 8, get 2, skip 8, get 2 to
748make the 8-tuple 01111000, test stream of 8-tuples created with this
749repeating pattern with rgb_bitdist.  Note that this test looks for
750relatively LONG range ntuplet correlations -- a fourier-like correlation
751over 32 bit cycles and embedded 8 bit cycles at the same time.
752
753stream: 0101010001111101010110101110100010101110101001011001110110110...
754        .....................................---|(repeat)
755
756Skip 37 bits, get 4, run rgb_bitdist on the stream of 4-tuplets produced
757e.g. 1101 etc.  Note well that 37 bits is odd and prime, as is 41!  This
758approach can look for trouble in difficult places that might well be
759missed by a binary-happy sampling of sequential uints.
760
761This method of sampling can also be run on a single large buffer with
762overlap.  In principle, one can construct a lagged test series that
763should reveal e.g. hyperplanar distributions very precisely.  However,
764it is still likely limited by the number of bits one can practically run
765rgb_bitdist on, which appears to be somewhere in the ballpark of 12 to
76616.  Nevertheless, it also allows one to sample and discover at least a
767large class of problems across a much larger number of bits.
768
769Note well that in principle ALL the tests in dieharder could run on
770random numbers (uints or whatever) that are generated by using a
771patterned, lagged selection of bits such as this.  We can therefore
772accomplish a huge amount of flexibility if we build still more routines
773into bits.c, e.g.
774
 uint get_rand_pattern(void *result, int rsize, int *pattern, gsl_rng *rng){}
776
777where pattern is a variable length vector such as:
778
779 pattern[0] = 2;
780 pattern[1] = -8;
781 pattern[2] = 2;
782 pattern[3] = -8;
783 pattern[4] = 2;
784 pattern[5] = -8;
785 pattern[6] = 2;
786 pattern[7] = 0;
787
788which can be parsed to build an arbitrary sized return very easily.
789Another routine might use pattern and a varying offset to generate the
790ten distinct samplings of the bitstream that use one or more of the
791skipped bits, or even the full 32 corresponding to all overlapping
792samples with an offset that runs from 0 to 31, pulled either from
793primary bitstream or from a large cyclic buffer.
794
795I'm moderately tempted to write the Master Bit Selector in this way, but
796the damnable bit selection routines are actually a real pain in the ass
797to write and especially debug, so I think I'll start with
798get_rand_pattern (an obvious generalization of get_rand_bits()) and go
799from there.
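
As a starting point, here is my own guess at how the pattern vector could be
parsed (not the eventual bits.c code): positive entries mean "take this many
bits", negative entries mean "skip this many", and 0 terminates, exactly as in
the example above.  The bit_source() helper standing in for the sequential
bitstream is hypothetical, and for brevity this variant just returns up to 32
bits instead of filling a caller-supplied buffer:

  #include <gsl/gsl_rng.h>

  extern int bit_source(gsl_rng *rng);    /* hypothetical: next stream bit */

  unsigned int get_rand_pattern_sketch(int *pattern, gsl_rng *rng)
  {
    unsigned int result = 0;
    for(int i = 0; pattern[i] != 0; i++){
      if(pattern[i] > 0){
        for(int j = 0; j < pattern[i]; j++)      /* take these bits        */
          result = (result << 1) | (unsigned int) bit_source(rng);
      } else {
        for(int j = 0; j < -pattern[i]; j++)     /* throw these away       */
          (void) bit_source(rng);
      }
    }
    return result;
  }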
800
801Then the entire decision on what to run and test is on the calling
802program.  If I make int *pattern a global variable that can be
803allocated, set, used, and freed by the caller before calling any test
804and use only get_rand_pattern() to get ints for testing, ALL tests can
805be run on selected patterns.  Dieharder can then sweep over patterns and
806really test the hell out of rngs...
807
808There is a second kind of test that we must implement -- one that I'm
809very well qualified to write on the basis of published research.  The
810CLT is a very powerful thing.  It predicts BOTH the mean AND the
811variance of any normal sampling process (and probably higher order
812moments as well -- there should be a series of cumulants, really, that
813describe the distribution of means).
814
815This gives us the ability to write a very subtle and sensitive test for
816global autocorrelation over arbitrary internal scales.  This is a
817research question (for me, anyway) but it seems that if a system has
818internal correlations that prevent samples from being truly iid, this
819will show up as a variance that does not scale correctly with the number
820of samples.  By comparing the VARIANCES, not the means, one should be
821able to directly observe that a statistic has e.g. fewer samples in it
822than you think that it does because of correlations between samples,
823even correlations over a very long time scale.
824
825A classic example is a sample stream with true periodicity -- you think
826that you've accumulated M samples with a variance that scales like
8271/sqrt{M} but really you've gotten the same P samples (each with
828variance that scales like 1/sqrt{P}).  The observed variance is
829therefore going to be much smaller than the expected variance for M
830samples, off by a factor of sqrt{M/P}.  Similar considerations hold for
831e.g. overlapping bit pattern selection where there is an
832"autocorrelation time" associated with the overlap.
833
834Here we might have a string like:
835
836 11010111010101000101010...
837
838generating:
839
840 11 01 01 11 01 01 01 00 01 01 01 0...
841
842and
843
844 .1 10 10 11 10 10 10 10 00 10 10 10...
845
846EACH of these strings can be tested with e.g. rgb_bitdist to see if the
847patterns are correctly distributed to yield a p-value.  If the generator
848passes two bit tests independent of the offset, the distribution of p
849will of course be correct for either sequence.  What of the "mixed"
850(interleaved) sequence:
851
852  11    01    01    11    01    01    01 ...
853
854+    10    10    11    10    10    10    10 ...
855
856= 11 10 01 10 01 11 11 10 01 10 01 10 01 10 ... ?
857
858Well, it is pretty much guaranteed to have the right number of 00's,
85901's, 10's and 11's overall.  However, we can make an even stronger
860statement.  Suppose that we know that the original bit sequence passes
861the >>3<<-tuple rgb_bitdist test.  Then in fact we know that it has the
862right mean distribution of 000 001 010 011 100 101 110 111's.  In that
863case as we parse these combinations into the two overlapping duplets, we
864get:
865
866 P(000) = 1/8 => 00 00
867 P(001) = 1/8 => 00 01
868 P(010) = 1/8 => 01 10
869 P(011) = 1/8 => 01 11
870 P(100) = 1/8 => 10 00
871 P(101) = 1/8 => 10 01
872 P(110) = 1/8 => 11 10
873 P(111) = 1/8 => 11 11
874
875If we see 00, then (as we will with probability 1/4, inherited from the
876triplet probability table and validated by the passing of rgb_bitdist
877for the two independent bit sequences separately), we know that it must
878be followed by 00 or 01.
879
880Suppose there were a surplus of 00s relative to 01s.  Then there would
881be more than 1/8 000s and less than 1/8 001s and we could not have
882passed the rgb_bitdist test for triplets.  This idea is obviously
883extensible for arbitrary ntuplets -- if one passes rgb_bitdist without
884overlap for n bits, one is guaranteed to pass with overlap at n-1 bits.
885
886Consequently there is really >>no point<< in running rgb_bitdist on a
887series of ntuplets.  If one observes the correct (binomial) probability
888distribution for the duplets 00, 01, 10, 11 over many samples, then one
889cannot help but have the correct probability distribution for the monets
0 and 1 separately.  If one has the correct binomial distribution of
891triplets, one cannot help but have the right distribution of duplets and
892monets, and so on, whether or not overlapping samples are used.
893
894This last example is an excellent case of conditional probability at
895work.  If we shift our sample window by one bit from a window that
896yields a valid distribution, {\em all that matters} is whether or not
897that additional bit is a 0 or 1 with the correct frequency.  There are
898always only {\em two} possibilities:
899
900  bbbb0 -> bbbb and bbb0
901  bbbb1 -> bbbb and bbb1
902
903If that bit is random with probability 0.5, then we should be able to
904use Bayes theorem to generate very simple proofs that everything stated
905above is correct.
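
For the record, the simplest version of that argument is just conditional
probability (with bbbb standing for any particular 4-bit window):

  P(bbbb0) = P(bbbb)*P(0|bbbb) = (1/2^4)*(1/2) = 1/2^5
  P(bbbb0) + P(bbbb1) = P(bbbb)

so an unbiased, independent extra bit promotes a correct 4-tuple distribution
to a correct 5-tuple distribution, and a correct (n+1)-tuple distribution
always marginalizes back down to a correct n-tuple distribution, which is the
containment claimed above.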
906
907==================================================================
90801/29/07
909
910OK, just finished repackaging dieharder to use subpackages and a single
911source tree to build both libraries and the dieharder ui.  This will
912TREMENDOUSLY simplify both the semiseparate maintenance of the R UI as
913an RPM and the soon-come GUI built with Gtk/glade or the like.  In fact,
914any number of UI's can be added to this source tree, including ones
915under separate subversion trees.  I rather expect that even if I
916maintain a separate svn tree and a standalone RPM build/specfile for
917gdieharder, I'll probably keep the gdieharder development node here for
918convenience if nothing else.  It's either that or develop only on
919platforms where libdieharder is stable and installed.
920
921I've added a directory point for R, there is already a separate
922directory point for the manual (which will eventually get its own RPM).
923
924I'm wondering if I need to create a mailman list for dieharder.
925Probably really close.  Anyway, on to future plans, which is one thing
926the NOTES are for.  I still have several new RNGs to add, several tests
927to debug or re-engineer (they consistently fail "good" generators,
928suggesting "bad" tests).  I need to create a "standard dataset" and run
929the old fortran diehard against it to create a test reference, then see
930if I can reproduce the reference test values (within spitting distance)
931using the dieharder replacements.  That would help me resolve this
932question -- if I get the same numbers on a reference dataset then the
933new code is "good", so if that same code fails "good" rngs, the old code
934is probably bad.  Or something like that.
935
936I also need to add a still further improved reporting facility,
especially for e.g. the bitlevel tests.  One doesn't ONLY want to know if
938the rgb_bitdist 1's test fails, one wants to know if the sample had an
939excess of 0's or 1's, and by how many sigma, per run, in a table.
940Ditto the per run distribution of 00 01 10 11 in the 2's test, 000 001
941010... for the 3's test, at least out to maybe 8 bits (a table with 256
942entries is BARELY displayable, at least on a pixelized panel of lights
943display in a GUI).  Improved visualization is also important, for
944example to see hyperplane development for certain generators.
945
946I still need to work on the bits.c routines and at the very least get to
947where I can pack uints with arbitrary blocks of bits and holes.  This
948will permit me to work on fourier and lagged sample tests, ones that
949look for bitlevel correlations in samples pulled with (say) 23-bit holes
950between the end of one uint bitset and the next in an imagined
951continuous stream of random bits.
952
953I need to upload the new 2.x release to the website, and probably clean
954up the website.  Sigh.  LOTS to do, and so little time and
955competition...
956
957==================================================================
95802/27/07
959
960dieharder now builds with the usual ./configure;make sequence.  Gnu
961Build Tools Suck.  However, they are standard and expected, if
962infinitely painful.  Hopefully this will facilitate the debian build,
963though, and make rpm still works.
964
965May have fixed a minor bug in operm5, where it has been claimed that the
966correct number of degrees of freedom is 5!-4! = 96, not 99.  This makes
967at least some sense, so I implemented the change, but operm5 has MORE
968errors in either code or data, and I don't have time or knowledge to fix
969them...
970
971On to the next step -- putting the code on a truly public svn repo ---
972code.google.com.  This will take one little bit of work, as the svnsync
973didn't work the first times I tried it...
974
975==========================================================================
97603/09/08
977
978OK, so GBT don't suck, they are merely difficult.  I'm about to release
979a fairly major set of fixes and a new test or three, and one of the main
980focuses of this round of changes has been to make libtool and the GBT
981truly happy and dieharder perhaps more portable as a consequence.  This
982is still a process not without pain -- documentation is mediocre,
983undocumented features abound, rpath evil must be hand-repaired in order
984to facilitate rpm builds -- but it does look as if the GBT CAN be used
985to run major projects and automate a lot of the same stuff my original
986build automated, once you figure out how.
987
988As far as tests go, I've implemented a number of fixes and "new" tests
989since the last time around, and am about to get truly serious about a
990certain class of tests.
991
992The most important new test is sts_serial.  IMO this may prove to be the
993most useful of all rng tests save -- perhaps -- rgb_bitdist, which tests
994very much the same thing a very slightly different way.
995
996The most important planned addition is a "shuffler" that can be used to
997access the rng stream with lag, with or without loss.  In particular,
998I'd like to be able to generate a stream to sample with an adjustable
999lag in between samples, at the bit level, to be able to detect weakly
1000correlated cycles in the bitstream.
1001
1002Most of the tests in dieharder are really only strongly sensitive to
1003what I might call "first order" departure from randomness, first order
1004decorrelation.  They measure statistics generated from sampling the rng
1005stream in order, with each sample presumed to be independent.  They do
1006not, for the most part (with certain exceptions, that is) detect whether
any given rand is correlated with the rand that immediately follows it, or
1008the one two rands downstream, or the one fifty rands downstream -- those
1009are "second order" correlations.
1010
1011Amazingly, there seems to be no test that simply and directly evaluates
1012
1013  A(d) = <(1.0-2.0*x_i) *(1.0-2.0*x_{i+d})>,
1014
1015the average autocorrelation of the floating point random number stream
1016shifted so that the expected autocorrelation is zero for all
1017displacements d.  However, this test really needs to be run in a variety
1018of ways, as one can imagine bitlevel correlations that will show up
1019strongly on bitlevel tests but weakly on floating point tests.
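
A minimal floating point version of that statistic (my own sketch, assuming
the GSL uniform deviate interface; a bitlevel variant would sample the stream
differently): each term u_i*u_{i+d} has mean 0 and variance 1/9 under the null
hypothesis, so the scaled sum is asymptotically standard normal:

  #include <math.h>
  #include <stdlib.h>
  #include <gsl/gsl_rng.h>
  #include <gsl/gsl_sf_erf.h>

  double autocorr_pvalue(gsl_rng *rng, unsigned long N, unsigned long d)
  {
    double *u = malloc((N + d) * sizeof(double));
    for(unsigned long i = 0; i < N + d; i++)
      u[i] = 1.0 - 2.0 * gsl_rng_uniform(rng);   /* zero-mean deviates     */

    double sum = 0.0;
    for(unsigned long i = 0; i < N; i++)
      sum += u[i] * u[i + d];                    /* lag-d autocorrelation  */

    double z = sum / sqrt(N / 9.0);              /* var of each term = 1/9 */
    free(u);
    return gsl_sf_erfc(fabs(z) / sqrt(2.0));     /* two-sided p-value      */
  }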
1020
1021Naturally, one can imagine looking at higher order moments as well.
1022
1023To facilitate tests of all of these sorts, I want to wrap the gsl
1024generator in such a way that one can request:
1025
1026  n bits, skip m bits (throwing them away), to facilitate "direct"
1027  searches for
1028
1029and returning the n bits, or
1030
1031  a uint composed of the upper n bits, skip m uints, lower 32-n bits or
1032  vice versa.  Nothing is thrown away.
1033
1034Or
1035
1036  a bit stream returnable as n-bit window or uints made up of any
1037  reasonable shuffling or displacements of components, with or without
1038  interleaving (lossless) or loss.
1039
1040With such an interface, one could use ALL of the dieharder tests to
1041check for the (mostly) first order randomness of streams generated from
1042various interleaved displacements of any generator.  They would then all
1043"magically" become second order (or higher order) tests!
1044
1045   rgb
1046
1047==========================================================================
104809/04/08
1049
1050I've made many changes and bugfixes -- one of which is to fix the
1051run_test() routines so that they create, run, and destroy the tests, one
1052at a time (which suggests that it may well be time to alter the way
results are passed back or further encapsulate tests to simplify
dieharder's logic, even at the expense of having tests that "function
like diehard's" did).  For example, some diehard tests generate two statistics
in one run; it may be time to separate them into two tests and run them
1057independently with their own names and their own pvalue returns, even at
1058the expense of efficiency.
1059
1060However, I'm writing to post this as my current design goal for -T(able)
output with an optional flag to control just what is included in the
1062table and header.  Here's the proposed "standard" table output:
1063
1064#========================================================================
1065# Generator Tested: mt19937
1066# Test Name              | n-bits | p-samples | t-samples | p-value(s)
1067#========================================================================
1068RGB Bit Distribution Test|      1 |       100 |    100000 | 0.55579848
1069RGB Bit Distribution Test|      2 |       100 |    100000 | 0.69322169
1070...
1071
1072Each field's presence or absence will be able to be turned on or off by
1073means of a flag bit.  The presence of the header, and whether or not it
1074contains e.g. the rng descriptors (name and/or filename) will be flag
1075bit controlled as well, with -q basically setting the bits to suppress
1076headers altogether.
1077
1078In this way, with suitable flag settings, one can "just" output pvalues,
1079can output names and pvalues, can output just names and assessments,
1080etc.
1081
1082I'm guessing that the following are all fields we want to be able to
1083control with their own flag bits:
1084
1085THEADER
1086TGEN
1087TNAME
1088TSNAME
1089TNBITS
1090TPSAMPLES
1091TTSAMPLES
1092TPVALUE
1093TASSESSMENT
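
These would presumably end up as single-bit defines tested against one output
control word, along the lines of the sketch below (names from the list above,
values and field widths merely illustrative):

  #include <stdio.h>

  #define THEADER      0x0001
  #define TGEN         0x0002
  #define TNAME        0x0004
  #define TSNAME       0x0008
  #define TNBITS       0x0010
  #define TPSAMPLES    0x0020
  #define TTSAMPLES    0x0040
  #define TPVALUE      0x0080
  #define TASSESSMENT  0x0100

  void table_line(unsigned int tflag, const char *name, int nbits,
                  int psamples, int tsamples, double pvalue)
  {
    if(tflag & TNAME)     printf("%-25s|", name);
    if(tflag & TNBITS)    printf(" %6d |", nbits);
    if(tflag & TPSAMPLES) printf(" %9d |", psamples);
    if(tflag & TTSAMPLES) printf(" %9d |", tsamples);
    if(tflag & TPVALUE)   printf(" %10.8f", pvalue);
    printf("\n");
  }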
1094
1095Anyway, let's do it...
1096
10979/19/08
1098
1099That worked really well.  However, I now have to handle some way of
1100managing the tests that take ntuple as an argument for e.g. lag or
1101ntuple size.  I >>think<< that the only correct way to manage this will
1102be to move the loops to the run_all_tests() function, since that is ONLY
1103a CLI feature, and call the execute_test within it as per usual.  That
1104way run_test WITH a setting of ntuple will be entirely API driven and
1105consistent.
1106
1107The other thing(s) I have to implement include:  Some sort of CLI flags
1108or control variables to tell dieharder when and how to set a random
1109number seed.  We have several possible seed strategies, you see:
1110
1111  a) Dieharder reseeds from /dev/random or clock one time at the time
1112the rng is initialized, and then left alone (CLI) until the rng is freed
1113(UI) or the program is ended (CLI).  The Rdh UI could in principle
1114permit the seed to be interactively reset by the user at any time;
1115that's up to Dirk.
1116
1117When dieharder outputs results, one of the output flags will indicate
1118whether or not the rng seed should appear.  In general it probably
1119should, but only the one seed randomly picked at the beginning.
1120
1121  b) Dieharder uses a seed entered by the user one time at the
1122beginning.  This should clearly be able to perfectly reproduce any
1123identical sequence of tests (including dieharder -a results) that began
1124with just that seed, so this specific thing should be a possibility.
1125
1126  c) Dieharder uses a seed entered by the user at the beginning to set
1127the seed at the beginning of each test.  This usage is primarily for
1128validation purposes, since doing this makes results from "series" tests
1129that run closely related tests on various ntuples or lags not
1130independent.  For example, rgb_bitstream for an ntuple of 4 is obviously
1131going to be strongly correlated with the results for an ntuple of 8
1132STARTED FROM THE SAME SEED; the two half-bytes that make up each 8 were
1133previously tested and the results of that test will clearly affect the
1134results for 8.  The same problem exists for the closely related
1135sts_serial, but it keeps track and attempts to correct for it.
1136
1137Still, entering a seed and having it used to reseed EACH test run means
1138that dieharder -a -S 1 can EASILY generate an entire set of
1139reference/validation results that can then be generated and compared one
1140at a time in Rdh (which doesn't "do" a -a test).  If we seeded only at
1141the beginning, any tiny variation in e.g. the number of rands called or
1142test order would produce different downstream numbers and a failure to
1143validate.
1144
1145Obviously this mode should not be used for testing, as one expects
1146"failures" to cluster in linked tests because of the aforementioned
non-independence of the results.  If 4 bit numbers lead to a "weak"
1148pvalue, testing 8 bit numbers should also lead to a weak pvalue and/or
1149vice versa.
1150
1151Our current difficulty is that we have THREE possibilities but only TWO
1152CLI control states -- with -S whatever or without.  Without is fine --
1153that can and should select random seeding.  With, however, requires a
1154second CLI switch option to select between the two seeding strategies.
1155
1156There really isn't anything wrong with providing the fourth combination
1157of two separate toggles as follows:
1158
1159
1160   Strategy \ Seed|            0         |           1
1161------------------------------------------------------------------
1162         beginning|  random at beginning | from Seed at beginning
1163         per test |  random per test     | from Seed per test
1164
1165SO, it looks like we need to introduce TWO variables.  -s sets the
1166strategy.  -S sets the seed to the selected value, or left alone at 0,
1167sets seed to a "random" choice.
1168
1169This works because -s is liberated now that we don't need to select sts
tests that way.  So are -r and -d.  We should probably keep -d and use it
1171for -d(ieharder test name/test number) the same way -g allows or will
1172allow one to pick either one.  -r, well, we'll have to think about what
1173to do with it, eventually.
1174
1175This is enough for now.
1176