• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..03-May-2022-

makefileH A D03-May-2022135 93

readmeH A D20-Mar-199636.1 KiB759600

wordplay.cH A D03-May-202231.3 KiB1,125739

readme

1------------------------------------------------------------------------------
2
3Wordplay Version 7.22     Evans A Criswell   03-20-96
4
5------------------------------------------------------------------------------
6
7This program was written for fun and is free.  Distribute it as you please,
8but please distribute the entire package, with the original words721.txt and
9the readme file.  If you modify the code, please mention my name in it as the
10original author.  Please send me a copy of improvements you make, because I
11may include them in a future version.
12
13I may be contacted by email at criswell@cs.uah.edu
14
15Evans A Criswell
16Research Associate
17Computer Science Department
18University of Alabama in Huntsville
19Huntsville, AL  35899
20
21------------------------------------------------------------------------------
22
23Wordplay is an anagram finder.  What is an anagram?  Well, let's turn to
24Merriam-Webster's Collegiate Dictionary, Tenth Edition:
25
26Definition:  anagram:  a word or phrase made by transposing the letters
27		       of another word or phrase.
28
29Each letter in the anagram must appear in the same frequency as in the
30original string.
31
32For example, the letters in the word "stop" can be rearranged to spell
33"tops" or "pots" or "sotp".  "sotp" is not a word and is not of interest
34when generating anagrams.  "stop" has four letters, so there are 24 ways
35to rearrange its letters.  However, very few of the rearrangements actually
36spell words.
37
38Wordplay, by using a list of words, takes a specified string of letters and
39uses the list of words to find anagrams of the string.
40
41By the way, "Wordplay" anagrams to "Rowdy Pal", and the program really can
42live up to that particular anagram.  I have been able to come up with
43anagrams of most of my coworkers' names that are humorous, descriptive,
44satirical, or, occasionally, quite vulgar.
45------------------------------------------------------------------------------
46
47Compiling the program:
48
49The only version of Wordplay that requires compilation is the UNIX version.
50The PC versions contain a pre-compiled executable since most people do not
51have 32-bit C compilers on their PC's.  The source is provided in all the
52packages.  I recommend the Gnu C compiler for DOS for compiling it.  There
53is a file in the DOS package that tells where to get the package, as well as
54how to set environment variables for the GO32 extender.
55
56Under UNIX, the program should be compiled with an ANSI C compiler.  Never
57fear; it's easy.  If you have the GNU C compiler, use it as follows:
58
59  gcc -O -o wordplay wordplay.c
60
61If you do not have "gcc", or for whatever reason, wish to use your machine's
62native compiler, use "cc" in place of "gcc", as follows:
63
64  cc -O -o wordplay wordplay.c
65
66This assumes your native "cc" is an ANSI compiler.  If not, it WILL NOT WORK.
67If your compiler does not support the optimization option "-O", leave it out.
68
69Feel free to use optimization options that your particular compiler offers.
70They do make a significant difference with some compilers.
71-----------------------------------------------------------------------------
72
73Usage:
74
75To use the program, simply invoke the program with a combination of options
76that make sense together.  Here is the format:
77
78     wordplay string [-slxavnXmXdX] [-w word] [-f wordfile]
79
80where the capital X's represent integers.  Please see the examples below
81the option descriptions.  The square brackets are not part of the command
82line.  They simply are used to show that the options they surround are
83optional.
84
85wordplay:              This is the name of the executable (under UNIX) after
86		       you have compiled the program.  Under other
87		       environments, method of invocation may vary.
88
89string:                This should be seen to the program AS A SINGLE
90		       ARGUMENT.  If you feel you must put spaces in the
91		       string, under UNIX, you will have to put backslashes
92		       in front of the spaces or just put the entire string
93		       in double quotes.  Just leave the spaces out because
94		       the program throws them out anyway.  Under environments
95		       other than UNIX, you may be forced to omit them anyway,
96		       like under MS-DOS or OS/2 .
97
98Other options:
99
100  s:  Silent operation.  If this option is used, the header and line
101      numbers are not printed.  This is useful if you want the output to
102      contain only the anagrams.  Use this option with the l (and x) option
103      to generate a wordlist which can be piped or redirected.
104
105      This option does not suppress error messages that are printed to
106      stderr.  Finding zero anagrams is not an error.
107
108  l:  Print list of candidate words before anagramming.  This is the list of
109      words that can be spelled with the letters from the specified string,
110      with no letters being used more often that they appear in the input
111      string.
112
113  x:  Do not perform anagramming.  Use with l if you just want the
114      candidate word list without anagrams.
115
116  a:  Allow anagrams containing two or more occurrences of a word.
117
118  v:  Consider strings with no vowels as candidate words and do not give
119      up when there are no vowels remaining after extractions.
120
121  m:  Limit candidate word length to a maximum number of letters.
122      Follow by an integer.  m12 means limit words to 12 letters.
123      m5 means limit them to 5 letters.
124
125  n:  Limit candidate word length to a minimum number of letters.
126      Follow by an integer.  n2 means limit words to 2 letters.
127      n11 means limit them to 11 letters.
128
129  d:  Limit number of words in anagrams to a maximum number.
130      Follow by an integer.  d3 means no anagrams should contain more
131      than 3 words.  d12 means limit anagrams to 12 words.  This is
132      currently the option that I recommend to limit output, since
133      an optimization has been added to speed execution in some cases
134      when this option is used.
135
136  w:  Specify a word which should appear in all anagrams.  This is useful
137      if you already have a word in mind that you want in the anagrams.
138      This option should be specified at the end of the command, followed
139      by a space and the word to use.
140
141  f:  Specify which word list to use.  See example!  This option should
142      be specified at the end of the command, followed by a space and the
143      alternate wordfile name.  This is useful if you have other word lists
144      to try or if you are interested in making your own customized word
145      list.
146
147      New feature:  Use a hyphen as the filename if the wordlist should
148      be read from stdin.
149
150
151Examples of usage:
152
153(1)
154wordplay persiangulf
155
156     Anagram the string "persiangulf" .
157
158(2)
159wordplay anagramming -lx
160
161     Print the list of words from the wordlist that can be spelled by using
162     the letters from the word "anagramming".  A letter may not be used more
163     often than the number of times it occurs in the word "anagramming".
164     No anagrams are generated.
165
166(3)
167wordplay tomservocrow -n3m8
168
169     Anagram the string "tomservocrow" .  Do not use words shorter than
170     3 letters or longer than 8 letters.
171
172(4)
173wordplay persiangulf -ld3m10 -f /usr/dict/words
174
175     Print the candidate words for the string "persiangulf".
176     Print anagrams containing up to 3 words, without considering any
177     words longer than 10 characters.  Usr the file "/usr/dict/words"
178     rather than "words721.txt".
179
180(5)
181wordplay soylentgreen -n3w stolen -f w2     or
182wordplay soylentgreen -n3 -w stolen -f w2   or
183wordplay soylentgreen -n3f w2 -w stolen     or
184wordplay soylentgreen -n3 -f w2 -w stolen   (get the idea?)
185
186     Print anagrams of "soylentgreen" containing the word "stolen" and
187     use the file "w2" as the wordlist file.  Discard candidate words
188     shorter than 3 characters.
189
190(6)
191wordplay university -slx
192
193     Print the candidate word list for the string "university".  The
194     output will consist of just the words.  This output is more useful
195     for redirecting to a file or for piping to another program.
196
197(7)
198wordplay trymeout -s
199
200    Anagram the string "trymeout" and print the anagrams with no line
201    numbers.  The header will not be printed.  This is useful for piping
202    the output to another process (or saving it to a file to be used
203    by another program) without having to parse the output to remove the
204    numbers and header.
205
206(8)
207wordplay trymeout -v
208
209    Anagram "trymeout" as usual, but in case vowel-free strings are in
210    the wordlist, consider them as possible candidate words.
211
212(9)  UNIX ONLY!
213cat wordlist1 wordlist2 wordlist3 | sort -u | wordplay trymeout -f -
214
215    Anagram "trymeout" and read the wordlist from stdin, so that, in
216    this case, under UNIX, the three wordlists "wordlist1", "wordlist2",
217    and "wordlist3" will be concatenated and piped into wordplay as
218    the wordlist.  The "sort -u" is there to remove duplicate words
219    from the combined wordlist.
220
221------------------------------------------------------------------------------
222
223Using the "f" option to specify a word file:
224
225If the option specifiers are combined, as in "an7m7d5f" or "d3n5f", the f
226should come last, followed by a space and the word list file, as shown in
227example number 4 above.
228
229The "w" option is used in the same manner.
230
231------------------------------------------------------------------------------
232
233Notes:
234
235Limit the number of words to consider, if desired, using the n and m options,
236or better yet, use the d option to limit depth, when anagramming certain
237time-consuming strings.  The program is currently optimized to speed execution
238in some cases when the d option is used.
239
240Plurals and past tenses
241
242The words721.txt does not contain plural forms of nouns obtained by adding "s"
243or "es".  It usually does not contain verb forms obtained by adding "ed" or
244"d", and it does not contain many adjective forms obtained by adding "y".
245If the string you are anagramming contains an "s", try anagramming the
246string without the "s" and add an "s" in the output.  This trick may also
247work effectively with "d" and "y".
248
249Apostrophes, hyphens, and other non-alphabetics
250
251All non-alphabetic characters in a word are preserved, INCLUDING BLANKS!!!
252If you have a dictionary with words like "DON'T", "ONE-EYED", and
253"ATOMIC NUMBER", each will be correctly processed.  Note that words
254like "ATOMIC NUMBER" or "KNOW IT ALL" in your word list will be considered
255as a single word by the program!  If "ATOMIC" and "NUMBER" appear as
256single words also, "ATOMIC NUMBER" will appear twice in the output, once
257as a one-word anagram and once as a two-word anagram.  This is not a flaw
258in the program.  The words721.txt word list does not contain "double words",
259but other dictionaries, like web2/web2a, do contain such things.
260
261The "words721.txt" wordfile:
262
263If no wordfile is specified, "words721.txt" is used.  It is highly
264recommended that the "words721.txt" file distributed with the program be
265used, since many nonsense two and three-letter combinations that are not
266words have been eliminated.  This makes the quality of the output slightly
267better and speeds execution of the program a slight bit.  Any word list may
268be used, as long as there is one word per line.  Feel free to create your
269own custom word list and use it instead.  The word list does not have to be
270sorted in any particular way.
271
272------------------------------------------------------------------------------
273Program development history:
274
275Version 1.00
276
277In a restroom in the building I used to work, in the last stall, one person
278had written "roll tide" and someone else had responded with "war eagle".
279(University of Alabama and University of Auburn rivalry).  Well, some people
280started rearranging the letters of the two scribbles and soon there was quite
281a long column of anagrams below each.  People then wrote other things, like
282"tomatoes" and "boredom" and those got anagrammed by everyone.  I thought as I
283sat there one day, "How hard would it be to write a program to do that?".
284One night, I decided to give it a try.  On March 29, 1991, I tackled it.
285I coded one little step at a time and got it working.  I anagrammed
286"Persian Gulf" and wrote 30 anagrams of it in that restroom stall.  I took
287the code and tried to run it under VMS.  I had to change one line to make it
288work.  It also worked on a Stardent UNIX computer after changing that same
289line.
290
291Version 1.00 was written in one night.  I sat down at my PC at home, and using
292Microsoft Fortran 5.0, coded it up.  It just read the words into an array, and
293prompted for strings to anagram or a "." to terminate.  Each time a string was
294entered, the number of occurrences of each letter was counted and stored in
295an array of size 26.  The word list was then filtered, with the result going
296to a second array.  Any word containing a letter that did not appear in the
297string being anagrammed was not copied over.  This eliminated many of the
298words in most cases.
299
300Lengths of the words were checked while doing one and two word anagrams.
301If the length of a word was not equal to the length of the string being
302anagrammed, there was no need to check it.  Likewise, with two-word anagrams,
303there was no need to check them if the sums of the lengths did not match the
304length of the string being anagrammed.
305
306This program did the job, but was quite slow.
307
308Version 1.10
309
310I had two main thoughts in my head when I worked on the program again five
311days later on April 3, 1991.  First, it needed to run faster, and second,
312if the program takes a while to execute, it would be nice if it could give
313an indication of how long the anagramming would take.
314
315The speed increase came from inserting a second word filter into the program.
316Basically, if a word contains more of a particular letter than the string
317being anagrammed, it can be tossed out.  Note than any word thrown out in
318pass one would be thrown out by this second pass, but since the second pass
319is much more "expensive" timewise to execute, the first pass was not changed.
320The second pass simply works on the list produced by pass one, and eliminates,
321on the average, one third to one half of the words remaining after pass one.
322The result of pass two is the list of words that can be spelled using the
323letters of the string being anagrammed.  The result of the second word filter
324being inserted was an average three to four times faster execution.
325
326I inserted a loop similar to the loop the program executes while anagramming
327and timed it when the program started to see how long it took on whatever
328machine it was being run on to get a rough idea of the machine speed.  This
329number, whatever it was, was multiplied by the square of the number of
330candidate words remaining after filtering in order to produce an estimate.
331This speed test only worked under OS/2, though.  Running the program on any
332other system required commenting out that code.
333
334A small but nice thing I did is automatically converted all letters in the
335user's input strings to upper case so the user wouldn't be forced to use all
336upper case.  Blanks were automatically stripped out of the user's string so
337that they wouldn't have to run things together.
338
339I decided at this point to go ahead and give the user some commands that
340could be run by using a "/" as the first character of the string.  Most were
341trivial to implement.  "VER" (print program version), "WORD" (print last
342string anagrammed), "HELP" (print commands available), "EXIT" (same as
343entering a "." as the first character), and "LIST".  "LIST" printed the
344candidate word list after pass two.  Remember in elementary school where
345the teacher would write a word on the board and ask you to find as many
346words as you could that could be spelled using the word's letters?  That's
347what the "LIST" command provided.
348
349Version 1.11
350
351This version was written on April 11, 1991.
352
353An array containing the lengths of the filtered words was added to the
354program to keep from having to repeatedly recompute lengths of the words.
355The length-based optimizations from version 1.00 ran faster this way.
356
357Version 2.00
358
359One day after writing version 1.11, on April 12, 1991, I made a major
360upgrade.  I decided the program was running efficiently enough to
361possibly handle three word anagrams.  I coded the three word anagram
362using basically the same technique I used for the two-word ones.  I
363checked to see if the lengths of the three words being tested added to
364the length of the string being anagrammed before testing the number of
365letters, etc.
366
367More commands and options were added in this version.  Since three word
368anagrams took a lot of time and produced lengthy output, I figured most
369of the time, users would not want to do the three word anagramming.  I
370decided to make various steps of the program options that could be enabled
371or disabled.  The steps that could be disabled or enabled independently
372were the pass two word filter, one word anagrams, two word anagrams, and
373three word anagrams.  The three word anagrams were disabled by default.
374The new commands were "DIS" and "ENA" to enable and disable options, along
375with "STAT", which showed the settings of various options.
376
377Version 2.10
378
379Only four days after writing version 2.00, I improved the program again
380(4-14-91) by adding the minimum and maximum candidate word length restriction
381options.  This allowed the user to specify that words shorter or longer than
382a specified number of letters should not be considered as candidate words.
383This weeds out a lot of small words and speeds things up when anagramming
384longer strings.  Obviously, if the length of the string being anagrammed
385is less than twice the minimum word length, no two word anagrams will be
386found.  Also, if the length of the string being anagrammed is more than
387twice the the minimum word length, no two word anagrams will be found, so
388those checks were put in.  Similar checks were put in for the three word
389anagrams.  The "MIN" and "MAX" commands were the new commands added to
390allow the user specify minimum and maximum word lengths.
391
392Version 3.00
393
394The version 2.10 was satisfactory for quite a while (8 months).
395On December 16, 1991, however, I upgraded it again, and to users of the
396program, except for the new version number "3.00", the program looked
397and behaved exactly the same way, except for a couple of things.  The output
398was no longer in alphabetical order and the program was faster, especially
399when doing three word anagrams.  Losing alphabetical order was a small
400price to pay for the substantial speed increase, which was accomplished by
401sorting the word list by word length, using the combsort algorithm from BYTE
402magazine.  A reprint of the article describing combsort appears in the book
403THE BEST OF BYTE.  Anyway, this was the second optimization.  Let me first
404describe the first optimization:
405
406After reading the user's string to anagram, the least and greatest letters
407occurring in the string were stored.  For example, if the user entered
408"boredom", then the minimum letter was 2 for "b" and the maximum was
40918 for "r".  The array containing the numbers of occurrences for each letter
410of the alphabet was then processed into two "parallel arrays", called
411nlasci and nlascl.  The arrays were created in such a way that, for example,
412if nlasci(1) was 4 and nlascl was 1, this meant there were was one "d" in the
413input string.  nlasci(2) being 9 and nlascl(2) being 4 would mean there were 4
414occurrences of "i" in the input string.  Note that the sizes of nlasci and
415nlascl never exceed 26 and the number of elements used in each is equal to
416the number of distinct letters in the input string.  For example, "boredom"
417contains 6 distinct characters, so the nlasci and nlascl arrays only contain
4186 entries each, with the empty slots in nlasci being filled with "-1" and
419empty slots in nlascl being filled with "0" .  Why create these arrays?
420Because the anagram procedure now has a list of which letters to check for
421equal numbers of occurrences instead of having to check all 26 letters of
422the alphabet every time!  A huge speed increase for shorter strings!
423
424Now back to sorting the word list by length.  I then created indexes into
425the sorted word list by length.  For example, the program now easily "knew"
426that words of length 7, for example, were in, say, elements 234 through 327
427of the candidate word array!  If I'm doing two word anagrams, and my first
428word to consider is 5 characters long, and my anagramming string is 11
429characters long, I can obviously scan the candidate words that are 6
430characters in length and ignore the rest.  The indexes make that very easy.
431
432A change was make to the three-word anagram procedure.  It was faster to
433modify the two word anagram procedure a slight bit and concatenate the
434first two words being considered and call the modified two word anagram
435procedure with two words consisting of the word1 + word2, and word3.
436
437Version 4.00
438
439I felt much more comfortable with C at this point, so I decided I'd try
440to tackle porting the program.  On April 30, 1993, in one night, I ported
441everything except the three word anagrams and a lot of the interactive
442options.  The new version was command-line based instead of interactive
443and was more typical of a UNIX-style program.  About a month later, on
444May 25, I added the three word anagram code and got the program in a
445fairly stable state so it could be distributed.  Actually, I don't think
446I ever took the word "Beta" out of it.  :-)  The program printed its
447command line parameters (debug statements).  That was never taken out.
448The C version was invoked as follows:
449
450wordplay string_to_anagram -123l -f word_file
451
452The 1, 2, 3, and l options were obviously the old one, two, three, and word
453list flags.  The new option was allowing the user to specify an alternate
454word file on the command line.  This program was a straight port, for the
455most part, of the FORTRAN code as far as the anagramming went.  Since I
456had a 32-bit C compiler on my 386/33 system, this program ran much faster
457than the FORTRAN version using the same algorithms.
458
459Version 5.00
460
461Sick of being limited to three word anagrams, since C supports recursion,
462I added a recursive algorithm that almost makes the old iterative routines
463obsolete.  They're still there, though.  The "r" option tells wordplay
464to use the recursive algorithm, which basically works as follows
465very much oversimplified).  Read the actual code for details.
466
467int recursive_anagram (string, accumulation, level):
468{
469  if no vowels in string, decrement level, return   (dead end)
470  go through list of important candidate words:  for each:
471  {
472    attempt to extract the word from the string passed into the function.
473    if extraction not possible, continue (try next word)
474    if extraction was perfect (left no letters), print the anagram
475		       (accumulation plus the word), then decrement the level,
476		       return
477    if extraction was successful, but left some letters, add the word to
478		       accumulation, increment level, and call the function
479		       with args (<<whatever is left after extraction>>,
480				  accumulation, and level)
481  }
482  if no extractions were a success, decrement level and return.
483  decrement level and return anyway, since we're at the end.  :-)
484}
485
486Boy, that was a cheesy description.
487
488Version 5.01
489
490Later during the night of writing 5.00, I realized that I should eliminate
491strings of anagrams like "A A ..." or "A I I ... " or "I I ..." or "THE THE .."
492because such anagrams are seldom of interest.  Taking them out makes the
493program run more quickly and slightly increases the quality of the output.
494The check was easy to put into place, so I did it.  I had already given the
495code to some friends as 5.00 without that check, so I called this one 5.01.
496
497Version 5.02
498
499November 30, 1993
500Thanksgiving!  While anagramming "Thanksgiving", I found a bug that both
5015.00 and 5.01 had.  Those versions would occasionally miss some anagrams.
502It had to do with the fact that if there were no words in the word list with
503length equal to the string being passed to the recursive anagram function, no
504words would be used for extraction in the loop.  The bug was fixed my
505adjusting a variable called "max_important length" by decrementing it until
506there are some words in the word list having that length or until it reaches
507zero, of course.
508
509Version 5.10
510
511April 11, 1994
512Newsgroup alt.anagrams created.  I thought, "I should tell everyone about my
513program and make it available".  I didn't have much of a documentation file,
514but I posted the information.  In the first 40 hours, after April 9 at 21:00,
51560 people had taken the code!  I improved the readme file and added the
516minimum and maximum candidate word restriction options back in.  They can
517really be of help when anagramming longer strings.
518
519Version 5.11
520
521On April 14, 1994, I rewrote the "extract" function, and did not really change
522the algorithm, but changed from using arrays and subscripts and integer loop
523control variables to using pointers and pointer loop control variables.
524Depending on the compiler, this change may or not speed execution of the
525program.  My results so far have ranged from no speed increase to a quite
526dramatic 40 percent reduction in execution time, just by using different
527compilers.
528
529Version 5.20
530
531April 17, 1994
532
533Note:  There was no version 5.12.  It became 5.20 before release.
534
535Since the program was ported to C, it only anagrams one string per run.
536Storing the entire wordfile into an array is unnecessary and takes a lot
537of memory.  The reason I left it in there so long is because I felt
538eventually I would have the program fixed somehow so it would anagram
539more than one string per run.  I changed my mind since anagramming a new
540string requires going back through the entire word list to make a new
541candidate word list.  The extract routine I originally added in version
5425.00 for use by the recursive anagram procedure, I realized, could be used
543to eliminate unnecessary words just as well as the pass1 and pass2 filters
544I had been using before.  I made that change and applied it to the words
545as they were being read in, directly storing them in the words2 array.
546The "words" array was removed from the program.  Now, if MAX_WORDS is set
547reasonable low enough, the program might run under MS-DOS without much
548work.  The program now starts much more quickly and uses less memory.
549It is only storing the candidate words and associated information.
550
551Version 5.21
552
553April 21, 1994
554Bug fixes:
555
556Ron Gregory found a simple fencepost error in the one-word iterative
557anagramming.  When porting from FORTRAN to C in version 4.00, the error
558was made and it was not discovered until a year later!  When using the
559indexes into the word list which say "the words of a particular length
560are in elements x through y of the candidate word list (the words2 array),
561the loop needs to go from x to and including y.  I used a less than instead
562of a less than or equal to in the for statement, stopping one short and
563missing the last word.  When doing one-word anagrams of a single word, the
564last word in the list is usually always that word, so the most obvious one
565was getting missed!  Thanks Ron!  I fixed a similar problem in the
566recursive algorithm, except it was caused by a decrement of a variable
567and a return appearing where I should have had a continue in one of the
568cases.
569
570However, I discovered a much worse bug, in a way, when I tested the program
571with a different wordfile.  I discovered that if the words were lowercase,
572it would not be processed correctly.  This problem came in with version
5735.20 with its new read routine.  I apologize for that one.
574
575Yet another bad bug.  Try wordplay aiai -r .  It blows up any wordplay
5765.xx up to 5.20 .  Fewer candidate words than recursive anagramr calls
577caused that one.  Solution:  place a #define MAX_ANAGRAM_WORDS at the
578top of the program that the user can increase to suit his needs.  Going
579deeper than that value crashes the program.  The number of candidate words
580was the value previously used, which is too small when anagramming strings
581like "catcatcatcatcatcatcatcat" or "aiai".  Another one fixed!
582
583Version 5.22
584
585April 25, 1994
586No bug fixes.  Just a speedup.  If the string returned by exts has
587character zero as the first character, go ahead and continue (try next
588word) instead of going through the remaining body of the for loop.  Duh.
589
590Version 5.23
591
592May 13, 1994
593A tiny bug fix.  The statement
594while ((wordsn[curpos] == curlen) && (curpos < ncount)) curpos++;
595should keep curpos within bounds.  curpos does get incremented to the
596value of ncount however, in some cases.  Evaluating wordsn[curpos]
597with curpos == ncount could cause a crash, since wordsn is of size
598ncount (max valid subscript = ncount - 1).
599
600Version 5.24
601
602May 16, 1994
603Anu Garg at Case Western Reserve University, Cleveland OH found that, on
604his machine, if there are no candidate words loaded, the program prints
605an error message about malloc() returning NULL.  Evidently, there are some
606versions of malloc() that do not like being passed zero as the amount to
607allocate and return a NULL pointer if they do.  This has never caused a
608problem or an error on any C compiler I've used, but at the end of the
609word loading/filtering routine, I now check for zero candidate words and
610exit with a different error message.
611
612Version 6.00
613
614On May 16, 1994, I decided to try a new approach that I previously had
615tried to work into the program without much of a speed increase.  This
616time, though, I did it more thoroughly.  I made a second copy of the
617words2 array (wordss), and sorted the characters of each word
618alphabetically.  I experimented with this idea a long time ago in one
619of the old FORTRAN versions, but never went through with it.  I then
620sorted this second list in alphabetical order, maintaining a pointer
621to the associated words in the words2 array as I sorted.  I then created
622indexes into this new array by first letter as I did earlier with lengths in
623the words2 array.  I then made some modifications to the recursive procedure
624to take advantage of this new indexing scheme.  It was much faster.  I had
625seen several other public domain anagram programs that seemed much faster
626then mine, and they seemed to be using an algorithm nearly identical in
627functionality to mine.  I finally realized that indexing by least letter
628can possible divide the word list into 26 parts, which are each indexed.
629Doing it the version 5 way, by length, since the average word length is
6307 characters, with few words longer than, say, 10 characters, would divide
631the word list into only a dozen parts or less in many cases.  Dividing the
632list into finer parts was not the only speed increase.  The new algorithm
633should not generate all permutations of a given anagram.  This means far
634fewer operations.  This was one of the main requests from some of the local
635wordplay users, "Can you get rid of the duplicates?".  I am not sure if
636100 percent of the duplicates are removed in all cases, though, but it seems
637to get nearly all of them.  It is much faster, also.
638
639Version 7.00    5-26-94
640
641All redundant permutations are eliminated by not trying any words having
642keys less than the current key we're dealing with at the current level.
643Also, the user can now specify whether anagrams containing more than one
644occurrence of a given word are acceptable.  This option is more important
645now that only one permutation of each anagram is printed.  The recursive
646algorithm is now used for everything, and the "1", "2", "3", "o", and "r"
647options were removed.  Anagram depth (number of words) is now specified
648with the "d" option ("d3" for three words maximum).  The major version
649number was changed because of the command line option changes.
650The "r" is no longer required.  The "x" option is used to turn off
651anagramming, if all you want to do is generate a word list with the "l"
652option.  A word may be specified with the "w" option to start anagrams, if
653you have a word in mind that you want the anagrams to contain.  Dictionary
654words containing apostrophes, hyphens, or other non-alphabetics retain
655their internal punctuation when displayed, but only the alphabetic characters
656are significant when processing anagrams.  "DON'T" = "DONT", "ONE-EYED" =
657"ONEEYED", "KANSAS CITY" = "KANSASCITY", and is processed as a single word.
658
659Version 7.01    6-3-94
660
661A user at io.com could not get wordplay to work.  It would load the words
662and not generate any anagrams.  The uppercase conversion was not working.
663For the toupper macro to work properly on BSD/386 (their operating system),
664the ctypes.h file needed to be included.  That's all.  Nothing more.  It
665should be included everywhere, though.  BSD/386 is the only environment
666where that omission has caused a problem, though.
667
668Version 7.02     7-14-94
669
670A user, Ulf Lunde, suggested a mode of operation where only the anagrams
671are output, with no header, line numbers, or other messages, so they
672could be piped or redirected without having to filter anything.  I added
673the "s" option to accomplish that.  Adding this option to any command
674simply turns off numbering of output and turns off the header.  If used
675with the "l" and "x" options (as in -slx), a word list can be generated
676that is directly useful.  I had thought of this option before, but never
677got around to doing it until a user actually requested it.
678
679Version 7.10   08-14-94
680
681Wordplay was up until now, a serious memory hog!  Now, I've made several
682significant changes to reduce the memory usage.  I allocate one contiguous
683block to store the words (and another for the keys).  The word block is
684reallocated (using realloc()) to increase its size dynamically as more and
685more words are read in.  That way, it does not need to be made a "maximum"
686size that is big enough to handle the largest dictionary.  It can now start
687small and grow as needed.  The vowel index arrays were not being used, so
688I deleted them, and their associated code.  The word length indexes (the
689ones that say "words of length i are found in positions a through b" are
690allocated to be as big as the maximum word length (plus one), instead of
691MAX_WORD_LENGTH.
692
693After making these changes, I brought the code to an MS-DOS machine and
694it compiled under Turbo C ON THE FIRST TRY!!!  It ran fine, unless I tried
695to anagram something that required storing too many candidate words.
696
697Version 7.11
698
699Thanks to Dan Ingalls at Interval Research (interval.com) for a suggestion!
700Create integer masks representing the occurring letters in each word:
701
70233222222222211111111110000000000    Bit of integer (tens digit)
70310987654321098765432109876543210    Bit of integer (ones digit)
704......ZYXWVUTSRQPONMLKJIHGFEDCBA    Letter positions  (only 26 bits used)
705
706Example:  intmask ("ACE") is 00000000000000000000000000010101 = 21 decimal
707
708Before the main loop in anagramr7, the intmask of "s" is computed and
709stored (in "s_mask") and inside the loop, the first thing that is
710checked is  "if ((s_mask | wordsmasks[i]) != s_mask)"  skip the rest of
711the loop body on this iteration of the loop.
712
713Version 7.20
714
715Bug fixes and improvements:
716
7171.  The "4" bug.  :-)
718    Numbers are no longer treated as punctuation.  Strings with digits
719    are no longer considered equivalent to the string without the digits.
720    That is, "4th" no longer is equivalent to "th".
721
7222.  Wordplay now is capable of taking the wordlist from stdin by using
723    a hyphen as the input file name.  For those of you familiar with
724    the "tar" command, it'll come natural to you.  This will allow users
725    to break their wordfile into several parts, one with common words,
726    one with places, one with people's names, and concatenate the
727    desired ones together and pipe the combination into wordplay.
728
7293.  An option to override the vowel-checking is available.  When this
730    option is used, the "anagramr7" routine will continue trying to
731    anagram, even when there are no vowels left in the string after
732    an extraction.
733
7344.  When the "starting word" specified with the "w" option is an anagram
735    of the initial string, that "starting word" is printed as an anagram,
736    instead of saying "No anagrams found".
737
738Version 7.21
739
740In the anagramr7 function, the current depth is subtracted from the
741maximum allowable depth and then multiplied by the length of the longest
742candidate word.  If this product is less than the length of the string
743passed into the function, then the case is treated as a dead end, causing
744the program to run faster in many cases when the "d" option is used.
745
746Version 7.22
747
748In a couple of places in the code, my malloc statements were not taking
749into account the space needed to store the trailing NULL in a character
750string.  This bug has rarely affected anyone, but I have now received two
751mail messages about it.
752
753------------------------------------------------------------------------------
754
755HAVE FUN !!!
756
757------------------------------------------------------------------------------
758
759