Cookbook part 2: Random things, and some math
================================================================

Randomly selecting words from a list
----------------------------------------------------------------

Given this `word list <https://github.com/johnkerl/miller/blob/master/docs/data/english-words.txt>`_, first take a look at the first few lines:

::

    $ head data/english-words.txt
    a
    aa
    aal
    aalii
    aam
    aardvark
    aardwolf
    aba
    abac
    abaca

Then the following randomly samples ten words that are four to eight characters long:

::

    $ mlr --from data/english-words.txt --nidx filter -S 'n=strlen($1);4<=n&&n<=8' then sample -k 10
    thionine
    birchman
    mildewy
    avigate
    addedly
    abaze
    askant
    aiming
    insulant
    coinmate
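To reproduce this outside Miller, here is a minimal Python sketch of the same two steps: a length filter, then reservoir sampling (Algorithm R), which keeps a uniform random sample of *k* items from a stream of unknown length in a single pass. Reservoir sampling is the standard technique for this and, as far as I know, what ``sample -k`` does, but this is not a port of Miller's code; the function names and the ``seed`` parameter are illustrative.

```python
import random
from typing import Iterable, List


def reservoir_sample(items: Iterable[str], k: int, rng: random.Random) -> List[str]:
    """Algorithm R: a uniform random sample of k items from a stream, in one pass."""
    reservoir: List[str] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)  # the first k items fill the reservoir
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = item  # item i is kept with probability k/(i+1)
    return reservoir


def sample_words(lines: Iterable[str], k: int = 10, lo: int = 4, hi: int = 8,
                 seed: int = 1) -> List[str]:
    """Keep words whose length is in [lo, hi], then sample k of them."""
    rng = random.Random(seed)
    words = (line.strip() for line in lines)
    candidates = (w for w in words if lo <= len(w) <= hi)  # same test as the filter expression
    return reservoir_sample(candidates, k, rng)
```

To run it on the same file: ``sample_words(open("data/english-words.txt"))``. Because of the fixed seed, repeated runs return the same ten words, though not the ten shown above.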

Randomly generating jabberwocky words
----------------------------------------------------------------

These are simple *n*-grams as `described here <http://johnkerl.org/randspell/randspell-slides-ts.pdf>`_. Some common functions are `located here <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ngfuncs.mlr.txt>`_, and there are scripts for `1-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng1.mlr.txt>`_, `2-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng2.mlr.txt>`_, `3-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng3.mlr.txt>`_, `4-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng4.mlr.txt>`_, and `5-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng5.mlr.txt>`_.

The idea is that words from the input file are consumed, then taken apart and pasted back together in ways which imitate the letter-to-letter transitions found in the word list, giving us automatically generated words in the same vein as *bromance* and *spork*:

::

    $ mlr --nidx --from ./ngrams/gsl-2000.txt put -q -f ./ngrams/ngfuncs.mlr -f ./ngrams/ng5.mlr
    beard
    plastinguish
    politicially
    noise
    loan
    country
    controductionary
    suppery
    lose
    lessors
    dollar
    judge
    rottendence
    lessenger
    diffendant
    suggestional
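Here is a minimal Python sketch of the letter-transition idea, written independently of the ``ng*.mlr`` scripts, so their details (boundary handling, weighting, length limits) may differ. For a given *n* it records which letter follows each run of *n* - 1 letters in the training words, then builds a new word by repeatedly drawing a successor for the most recent *n* - 1 letters. The ``^`` and ``$`` boundary markers and the ``max_len`` cap are assumptions of this sketch.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical boundary markers; this sketch assumes no input word contains them.
START, END = "^", "$"


def train(words: Iterable[str], n: int) -> Dict[str, List[str]]:
    """Map each context of n-1 letters to every letter seen right after it.

    Repeats are kept, so drawing uniformly from a list reproduces the
    observed transition frequencies.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    table: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        padded = START * (n - 1) + word + END
        for i in range(n - 1, len(padded)):
            table[padded[i - (n - 1):i]].append(padded[i])
    return table


def generate(table: Dict[str, List[str]], n: int, rng: random.Random,
             max_len: int = 30) -> str:
    """Walk the transition table from the start context until END (or max_len)."""
    context = START * (n - 1)
    letters: List[str] = []
    while len(letters) < max_len:
        nxt = rng.choice(table[context])
        if nxt == END:
            break
        letters.append(nxt)
        # Slide the window: drop the oldest character (context stays n-1 long).
        context = (context + nxt)[1:]
    return "".join(letters)
```

For example, ``train([w.strip() for w in open("ngrams/gsl-2000.txt") if w.strip()], 5)`` followed by ``generate(table, 5, random.Random(1))``. Larger *n* copies longer runs of real words, so more of the output is genuine vocabulary (like *beard* and *loan* above); smaller *n* gives stranger results.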

Program timing
----------------------------------------------------------------

This admittedly artificial example demonstrates using Miller time and stats functions to introspectively acquire some information about Miller's own runtime. The ``delta`` stepper of the ``step`` verb (``step -a delta``) computes the difference between successive timestamps.

::

POKI_INCLUDE_ESCAPED(data/timing-example.txt)HERE
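The same bookkeeping as a Python sketch: stamp each record with the wall-clock time (Python's ``time.time()`` plays the role of Miller's ``systime()`` here), difference successive stamps, and summarize. What to report for the first record, which has no predecessor, is a choice; this sketch uses 0, so check the ``step`` documentation for Miller's own convention.

```python
import time
from typing import List, Optional


def deltas(timestamps: List[float]) -> List[float]:
    """Difference between each timestamp and the one before it.

    The first entry has no predecessor; this sketch reports 0.0 for it.
    """
    out: List[float] = []
    prev: Optional[float] = None
    for t in timestamps:
        out.append(0.0 if prev is None else t - prev)
        prev = t
    return out


if __name__ == "__main__":
    # One wall-clock timestamp per simulated record, in seconds since the epoch.
    stamps = [time.time() for _ in range(10000)]
    ds = deltas(stamps)[1:]  # skip the first entry, which has no predecessor
    print(f"records={len(stamps)} mean_delta={sum(ds) / len(ds):.3e}s max_delta={max(ds):.3e}s")
```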

Computing interquartile ranges
----------------------------------------------------------------

For one or more specified field names, compute p25 and p75, then write the interquartile range (IQR) as the difference of p75 and p25:

::

POKI_INCLUDE_AND_RUN_ESCAPED(data/iqr1.sh)HERE
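To sanity-check IQR values outside Miller, here is a small Python sketch that sorts each column and picks the values at the 25th and 75th percentiles. It uses a simple non-interpolated rule (index ``int(p/100 * n)``, clamped to the last element); Miller's exact index convention, and its option for interpolated percentiles, may give slightly different numbers on small inputs. The list-of-dicts record layout and the ``x``/``y`` field names are hypothetical.

```python
from typing import Dict, List, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Non-interpolated percentile of already-sorted data.

    Uses index int(p/100 * n), clamped to the valid range; this is one common
    convention and may differ from Miller's on small inputs.
    """
    n = len(sorted_values)
    if n == 0:
        raise ValueError("percentile of empty data")
    idx = min(max(int(p / 100.0 * n), 0), n - 1)
    return sorted_values[idx]


def iqr_by_field(records: List[Dict[str, float]], fields: List[str]) -> Dict[str, float]:
    """For each named field: p25, p75, and the IQR p75 - p25."""
    out: Dict[str, float] = {}
    for name in fields:
        column = sorted(r[name] for r in records if name in r)
        p25, p75 = percentile(column, 25), percentile(column, 75)
        out[name + "_p25"] = p25
        out[name + "_p75"] = p75
        out[name + "_iqr"] = p75 - p25
    return out


if __name__ == "__main__":
    rows = [{"x": float(i), "y": float(10 * i)} for i in range(1, 9)]
    print(iqr_by_field(rows, ["x", "y"]))
```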

For wildcarded field names, first compute p25 and p75, then loop over field names with ``p25`` in them:

::

POKI_INCLUDE_AND_RUN_ESCAPED(data/iqrn.sh)HERE
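The wildcard version boils down to a loop over one statistics record. Here is a Python sketch, assuming the statistics arrive as a single dict whose keys follow a ``<field>_p25`` / ``<field>_p75`` naming pattern (like the names ``stats1`` produces); every key ending in ``_p25`` that has a matching ``_p75`` key gets an ``_iqr`` companion. The function name is illustrative.

```python
from typing import Dict


def add_iqrs(stats: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of a stats record with <name>_iqr added for every
    <name>_p25 key that has a matching <name>_p75 key."""
    out = dict(stats)
    suffix = "_p25"
    for key, p25 in stats.items():
        if key.endswith(suffix):
            base = key[: -len(suffix)]
            p75_key = base + "_p75"
            if p75_key in stats:
                out[base + "_iqr"] = stats[p75_key] - p25
    return out


if __name__ == "__main__":
    # Hypothetical stats record with percentiles for two fields.
    print(add_iqrs({"x_p25": 3.0, "x_p75": 7.0, "y_p25": 30.0, "y_p75": 70.0}))
```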

Computing weighted means
----------------------------------------------------------------

This might be more elegantly implemented as an option within the ``stats1`` verb. Meanwhile, it's expressible within the DSL:

::

POKI_INCLUDE_AND_RUN_ESCAPED(data/weighted-mean.sh)HERE
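The quantity being computed is the weighted mean ``sum(w*x) / sum(w)``, optionally per group. Here is a Python sketch of the same accumulation: two running sums per group, divided at the end. The field names (``x``, ``w``, ``g``) are hypothetical, since the fields used by ``weighted-mean.sh`` are not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional, Union

Value = Union[str, float]


def weighted_means(
    records: Iterable[Mapping[str, Value]],
    value_field: str,
    weight_field: str,
    group_field: Optional[str] = None,
) -> Dict[str, float]:
    """sum(w * x) / sum(w) for each group (or overall, keyed by "", if no group field)."""
    sum_wx: Dict[str, float] = defaultdict(float)  # running sum of w*x per group
    sum_w: Dict[str, float] = defaultdict(float)   # running sum of w per group
    for rec in records:
        key = str(rec[group_field]) if group_field is not None else ""
        x, w = float(rec[value_field]), float(rec[weight_field])
        sum_wx[key] += w * x
        sum_w[key] += w
    # Groups whose weights sum to zero have no defined weighted mean; skip them.
    return {k: sum_wx[k] / sum_w[k] for k in sum_w if sum_w[k] != 0}


if __name__ == "__main__":
    example = [
        {"g": "a", "x": 1, "w": 1},
        {"g": "a", "x": 3, "w": 2},
        {"g": "b", "x": 10, "w": 3},
    ]
    print(weighted_means(example, "x", "w", "g"))
```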

Generating random numbers from various distributions
----------------------------------------------------------------

Here we can chain together a few simple building blocks:

::

POKI_RUN_COMMAND{{cat expo-sample.sh}}HERE

Namely:

* Set the Miller random-number seed so this webdoc looks the same every time I regenerate it.
* Use pretty-printed tabular output.
* Use ``seqgen`` to produce 100,000 records ``i=0``, ``i=1``, etc.
* Send those to a ``put`` step which defines an inverse-transform-sampling function and calls it twice, then computes the sum and product of the two samples.
* Send those to a histogram, and from there to a bar-plotter. This is just for visualization; you could just as well output CSV and send that off to your own plotting tool, etc.

The output is as follows:

::

POKI_RUN_COMMAND{{sh expo-sample.sh}}HERE
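The inverse-transform trick: if *U* is uniform on [0, 1) and *F* is a continuous, strictly increasing CDF, then applying the inverse of *F* to *U* gives a sample with CDF *F*. For an exponential distribution with rate ``rate``, F(x) = 1 - exp(-rate * x), so the inverse is -ln(1 - u) / rate. Here is a Python sketch of the same pipeline, assuming (as the script name ``expo-sample.sh`` suggests) an exponential distribution with rate 1; the function names, bin range, and star-based bars are illustrative choices, not Miller's.

```python
import math
import random
from typing import List, Tuple


def expo_sample(rng: random.Random, rate: float = 1.0) -> float:
    """Inverse-transform sampling for the exponential distribution.

    F(x) = 1 - exp(-rate * x), so the inverse CDF is -ln(1 - u) / rate.
    """
    u = rng.random()                  # uniform on [0, 1)
    return -math.log(1.0 - u) / rate  # 1 - u is in (0, 1], so the log is defined


def simulate(n: int, seed: int = 1) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Draw two samples per record, plus their sum and product."""
    rng = random.Random(seed)  # fixed seed for repeatable output, like setting Miller's seed
    xs, ys, sums, prods = [], [], [], []
    for _ in range(n):
        x, y = expo_sample(rng), expo_sample(rng)
        xs.append(x)
        ys.append(y)
        sums.append(x + y)
        prods.append(x * y)
    return xs, ys, sums, prods


def histogram(values: List[float], lo: float, hi: float, nbins: int) -> List[int]:
    """Count values in nbins equal-width bins over [lo, hi); values outside are dropped."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), nbins - 1)] += 1
    return counts


if __name__ == "__main__":
    lo, hi, nbins = 0.0, 4.0, 20
    xs, ys, sums, prods = simulate(100_000)
    counts = histogram(sums, lo, hi, nbins)
    scale = max(counts) / 50  # longest bar is 50 characters
    for i, c in enumerate(counts):
        print(f"{lo + i * (hi - lo) / nbins:5.2f} {'*' * int(c / scale)}")
```

Since the sum of two independent rate-1 exponentials has mean 2 and the product has mean 1, those are quick checks on the simulated columns.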

Sieve of Eratosthenes
----------------------------------------------------------------

The `Sieve of Eratosthenes <http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes>`_ is a standard introductory programming topic. The idea is to find all primes up to some *N* by making a list of the numbers 2 to *N*, then striking out all multiples of 2 except 2 itself, all multiples of 3 except 3 itself, all multiples of 4 except 4 itself, and so on. Whatever survives without getting marked is a prime. This is easy enough in Miller. Notice that here all the work is in the ``begin`` and ``end`` blocks; there is no file input (so we use ``mlr -n`` to keep Miller from waiting for input data).

::

POKI_RUN_COMMAND{{cat programs/sieve.mlr}}HERE

::

POKI_RUN_COMMAND{{mlr -n put -f programs/sieve.mlr}}HERE
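For comparison, here is the same algorithm as a minimal Python sketch. It adds two standard refinements that the description above leaves out: it only strikes out multiples of numbers that are still unmarked (multiples of 4 are already gone as multiples of 2), and it starts striking at ``i*i``, since smaller multiples of ``i`` have a smaller prime factor and were struck earlier. The resulting list of primes is the same.

```python
from typing import List


def sieve(n: int) -> List[int]:
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False  # 0 and 1 are not prime
    i = 2
    while i * i <= n:
        if is_prime[i]:
            # Strike out multiples of i, starting at i*i: smaller multiples
            # were already struck by a smaller prime factor.
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
        i += 1
    return [k for k, p in enumerate(is_prime) if p]


if __name__ == "__main__":
    print(sieve(100))
```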

Mandelbrot-set generator
----------------------------------------------------------------

The `Mandelbrot set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_ is also easily expressed. This isn't an important case of data processing in the vein for which Miller was designed, but it is an example of Miller as a general-purpose programming language: a test case for the expressiveness of the language.

The (approximate) computation of which points in the complex plane are and aren't members of the set is just a few lines of complex arithmetic (see the Wikipedia article); how to render them is another task. Using graphics libraries you can create PNG or JPEG files, but another fun way to do this is by printing various characters to the screen:

::

POKI_RUN_COMMAND{{cat programs/mand.mlr}}HERE

At standard resolution this makes a nice little ASCII plot:

::

POKI_RUN_COMMAND{{mlr -n put -f ./programs/mand.mlr}}HERE

But by using a very small font size (as small as my Mac will let me go), and by choosing coordinates that zoom in on a particular part of the complex plane, we can get a much more detailed picture:

::

    #!/bin/bash
    # Get the number of rows and columns from the terminal window dimensions
    iheight=$(stty size | mlr --nidx --fs space cut -f 1)
    iwidth=$(stty size | mlr --nidx --fs space cut -f 2)
    # Pass the view parameters to mand.mlr as a single key-value input record
    echo "rcorn=-1.755350,icorn=+0.014230,side=0.000020,maxits=10000,iheight=$iheight,iwidth=$iwidth" \
    | mlr put -f programs/mand.mlr

.. image:: pix/mand.png
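The complex arithmetic mentioned above is the escape-time test: iterate ``z = z*z + c`` starting from ``z = 0``, and treat ``c`` as a member if ``|z|`` stays at most 2 for ``maxits`` iterations. Here is a minimal Python sketch with a two-character ASCII renderer. It borrows the parameter names ``rcorn``, ``icorn``, ``side``, ``maxits``, ``iwidth``, and ``iheight`` from the shell script above, but reading (``rcorn``, ``icorn``) as the lower-left corner of a square of edge ``side`` is an assumption of this sketch, not necessarily how ``mand.mlr`` interprets them.

```python
from typing import List


def escape_time(c: complex, maxits: int) -> int:
    """Iterate z = z*z + c from z = 0.

    Returns the iteration index at which |z| first exceeds 2, or maxits if it
    never does, in which case c is treated as a member of the set.
    """
    z = 0j
    for k in range(maxits):
        z = z * z + c
        if abs(z) > 2.0:
            return k
    return maxits


def render(rcorn: float, icorn: float, side: float,
           iwidth: int, iheight: int, maxits: int) -> List[str]:
    """ASCII picture of the square with lower-left corner (rcorn, icorn) and edge side.

    '*' marks points that never escaped; ' ' marks points that did.
    """
    rows: List[str] = []
    for row in range(iheight):
        # Row 0 is the top of the picture, so the imaginary part decreases downward.
        y = icorn + side * (iheight - 1 - row) / max(iheight - 1, 1)
        chars = []
        for col in range(iwidth):
            x = rcorn + side * col / max(iwidth - 1, 1)
            chars.append("*" if escape_time(complex(x, y), maxits) == maxits else " ")
        rows.append("".join(chars))
    return rows


if __name__ == "__main__":
    for line in render(rcorn=-2.0, icorn=-1.25, side=2.5, iwidth=60, iheight=30, maxits=100):
        print(line)
```

Swapping the two characters for a palette indexed by the escape count gives the shaded look of the standard-resolution plot.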