Cookbook part 2: Random things, and some math
================================================================

Randomly selecting words from a list
----------------------------------------------------------------

Given this `word list <https://github.com/johnkerl/miller/blob/master/docs/data/english-words.txt>`_, first look at the first few lines:

::

    $ head data/english-words.txt
    a
    aa
    aal
    aalii
    aam
    aardvark
    aardwolf
    aba
    abac
    abaca

Then the following will randomly sample ten words with four to eight characters in them:

::

    $ mlr --from data/english-words.txt --nidx filter -S 'n=strlen($1);4<=n&&n<=8' then sample -k 10
    thionine
    birchman
    mildewy
    avigate
    addedly
    abaze
    askant
    aiming
    insulant
    coinmate

Randomly generating jabberwocky words
----------------------------------------------------------------

These are simple *n*-grams as `described here <http://johnkerl.org/randspell/randspell-slides-ts.pdf>`_. Some common functions are `located here <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ngfuncs.mlr.txt>`_. Then here are scripts for `1-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng1.mlr.txt>`_, `2-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng2.mlr.txt>`_, `3-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng3.mlr.txt>`_, `4-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng4.mlr.txt>`_, and `5-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng5.mlr.txt>`_.
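
As a rough illustration of the *n*-gram approach outside of Miller, here is a minimal Python sketch. It is not a port of the linked ``.mlr`` scripts: the function names ``build_transitions`` and ``generate_word``, the ``^``/``$`` padding characters, and the tiny word list are all assumptions made for this example.

```python
import random
from collections import defaultdict


def build_transitions(words, n=3):
    """Map each (n-1)-letter context to the letters observed right after it."""
    transitions = defaultdict(list)
    for word in words:
        # "^" pads the start of the word and "$" marks its end (both hypothetical markers).
        padded = "^" * (n - 1) + word + "$"
        for i in range(len(padded) - n + 1):
            context = padded[i:i + n - 1]
            next_letter = padded[i + n - 1]
            transitions[context].append(next_letter)
    return transitions


def generate_word(transitions, n=3, rng=random, max_len=30):
    """Walk the transition table from the start context until the end marker."""
    context = "^" * (n - 1)
    letters = []
    while len(letters) < max_len:
        next_letter = rng.choice(transitions[context])
        if next_letter == "$":
            break
        letters.append(next_letter)
        # Slide the context window forward by one letter.
        context = (context + next_letter)[1:]
    return "".join(letters)


if __name__ == "__main__":
    sample_words = ["spoon", "fork", "brunch", "lunch", "breakfast", "motel", "hotel"]
    table = build_transitions(sample_words, n=3)
    for _ in range(5):
        print(generate_word(table, n=3))
```

Larger values of ``n`` keep longer runs of the source words intact, so the output looks more like real words; smaller values produce stranger results.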

The idea is that words from the input file are consumed, then taken apart and pasted back together in ways which imitate the letter-to-letter transitions found in the word list -- giving us automatically generated words in the same vein as *bromance* and *spork*:

::

    $ mlr --nidx --from ./ngrams/gsl-2000.txt put -q -f ./ngrams/ngfuncs.mlr -f ./ngrams/ng5.mlr
    beard
    plastinguish
    politicially
    noise
    loan
    country
    controductionary
    suppery
    lose
    lessors
    dollar
    judge
    rottendence
    lessenger
    diffendant
    suggestional

Program timing
----------------------------------------------------------------

This admittedly artificial example demonstrates using Miller time and stats functions to introspectively acquire some information about Miller's own runtime. The ``delta`` function computes the difference between successive timestamps.

::

POKI_INCLUDE_ESCAPED(data/timing-example.txt)HERE

Computing interquartile ranges
----------------------------------------------------------------

For one or more specified field names, simply compute p25 and p75, then write the IQR as the difference of p75 and p25:

::

POKI_INCLUDE_AND_RUN_ESCAPED(data/iqr1.sh)HERE

For wildcarded field names, first compute p25 and p75, then loop over field names with ``p25`` in them:

::

POKI_INCLUDE_AND_RUN_ESCAPED(data/iqrn.sh)HERE

Computing weighted means
----------------------------------------------------------------

This might be more elegantly implemented as an option within the ``stats1`` verb.
Meanwhile, it's expressible within the DSL:

::

POKI_INCLUDE_AND_RUN_ESCAPED(data/weighted-mean.sh)HERE

Generating random numbers from various distributions
----------------------------------------------------------------

Here we can chain together a few simple building blocks:

::

POKI_RUN_COMMAND{{cat expo-sample.sh}}HERE

Namely:

* Set the Miller random-number seed so this webdoc looks the same every time I regenerate it.
* Use pretty-printed tabular output.
* Use ``seqgen`` to produce 100,000 records ``i=0``, ``i=1``, etc.
* Send those to a ``put`` step which defines an inverse-transform-sampling function and calls it twice, then computes the sum and product of samples.
* Send those to a histogram, and from there to a bar-plotter. This is just for visualization; you could just as well output CSV and send that off to your own plotting tool, etc.

The output is as follows:

::

POKI_RUN_COMMAND{{sh expo-sample.sh}}HERE

Sieve of Eratosthenes
----------------------------------------------------------------

The `Sieve of Eratosthenes <http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes>`_ is a standard introductory programming topic. The idea is to find all primes up to some *N* by making a list of the numbers 1 to *N*, then striking out all multiples of 2 except 2 itself, all multiples of 3 except 3 itself, all multiples of 4 except 4 itself, and so on. Whatever survives that without getting marked (apart from 1, which is neither prime nor composite) is a prime. This is easy enough in Miller. Notice that here all the work is in ``begin`` and ``end`` statements; there is no file input (so we use ``mlr -n`` to keep Miller from waiting for input data).
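
For comparison, the striking-out procedure just described can be sketched in a few lines of Python. This is an illustration only, not a translation of ``programs/sieve.mlr``; the function name ``primes_up_to`` is made up for this example.

```python
def primes_up_to(n):
    """Return all primes <= n using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False  # 0 and 1 are neither prime nor composite
    for i in range(2, n + 1):
        # Strike out every multiple of i except i itself.
        for j in range(2 * i, n + 1, i):
            is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Like the prose description, this strikes multiples of every number, including composites such as 4; skipping already-struck numbers would be a common optimization. The Miller program itself is listed next.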

::

POKI_RUN_COMMAND{{cat programs/sieve.mlr}}HERE

::

POKI_RUN_COMMAND{{mlr -n put -f programs/sieve.mlr}}HERE

Mandelbrot-set generator
----------------------------------------------------------------

The `Mandelbrot set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_ is also easily expressed. This isn't an important case of data-processing in the vein for which Miller was designed, but it is an example of Miller as a general-purpose programming language -- a test case for the expressiveness of the language.

The (approximate) computation of points in the complex plane which are and aren't members is just a few lines of complex arithmetic (see the Wikipedia article); how to render them is another task. Using graphics libraries you can create PNG or JPEG files, but another fun way to do this is by printing various characters to the screen:

::

POKI_RUN_COMMAND{{cat programs/mand.mlr}}HERE

At standard resolution this makes a nice little ASCII plot:

::

POKI_RUN_COMMAND{{mlr -n put -f ./programs/mand.mlr}}HERE

But using a very small font size (as small as my Mac will let me go), and by choosing the coordinates to zoom in on a particular part of the complex plane, we can get a nice little picture:

::

    #!/bin/bash
    # Get the number of rows and columns from the terminal window dimensions
    iheight=$(stty size | mlr --nidx --fs space cut -f 1)
    iwidth=$(stty size | mlr --nidx --fs space cut -f 2)
    echo "rcorn=-1.755350,icorn=+0.014230,side=0.000020,maxits=10000,iheight=$iheight,iwidth=$iwidth" \
    | mlr put -f programs/mand.mlr

.. image:: pix/mand.png
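
For readers who want the "few lines of complex arithmetic" spelled out independently of Miller, here is a minimal Python sketch of the usual escape-time test with character rendering. The parameter names echo those in the shell script above, but their meaning here (``rcorn`` and ``icorn`` as the lower-left corner of a square region, ``side`` as its side length) is an assumption for this example and is not taken from ``programs/mand.mlr``.

```python
def escape_iterations(c, maxits=100):
    """Count iterations of z -> z*z + c before |z| exceeds 2, capped at maxits."""
    z = 0j
    for i in range(maxits):
        if abs(z) > 2.0:
            return i  # escaped: c is not in the set
        z = z * z + c
    return maxits  # never escaped within maxits: treat c as a member


def render(rcorn=-2.0, icorn=-1.5, side=3.0, iwidth=60, iheight=24, maxits=100):
    """Return an ASCII picture of the square region as a multi-line string.

    Assumes iwidth >= 2 and iheight >= 2.
    """
    rows = []
    for row in range(iheight):
        # The top row of text corresponds to the largest imaginary part.
        im = icorn + side * (iheight - 1 - row) / (iheight - 1)
        line = ""
        for col in range(iwidth):
            re = rcorn + side * col / (iwidth - 1)
            inside = escape_iterations(complex(re, im), maxits) == maxits
            line += "#" if inside else "."
        rows.append(line)
    return "\n".join(rows)


if __name__ == "__main__":
    print(render())
```

Raising ``maxits`` sharpens the boundary at the cost of more arithmetic per point, which matters most when zooming in on a small region.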