1NAME
2    CHANGES - Revision history for WordNet::Similarity
3
4DESCRIPTION
5  Version 2.07 (Released 10/05/2015)
6    (1) Fix make test error in lesktrace.t caused by overlap results being
7        returned in unpredictable order. The problem is documented at
8        <https://rt.cpan.org/Ticket/Display.html?id=86437>. The fix,
9        provided by Phil Goetz (philgoetz@gmail.com), sorts the overlaps
10        in lesk.pm to guarantee their order in testing. Note that the keys
11        had to be regenerated after this fix was installed, using perl
12        t/trace.t --key (TDP)
13
14    (2) Install patch to fix WordNet version detection issues on Windows.
15        Problem description and patch provided here:
16        <https://rt.cpan.org/Ticket/Display.html?id=79065>
17
18    (3) add doc/update-pod.sh in order to create plain text documentation
19        (TDP)
20
21    (4) fix WordNet download location in install.pod (TDP)
22
23    (5) update prereqs in Makefile.PL (TDP)
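
        Item (1) above fixes a test that depended on the order in which
        overlaps came back. The general technique can be sketched as below.
        This is a Python illustration, not the Perl code in lesk.pm; the
        function name format_overlaps, the phrase:count output format, and
        the plain lexicographic sort key are all assumptions made for the
        example.

```python
def format_overlaps(overlaps):
    """Render a {phrase: count} mapping in a reproducible order.

    Sorting the items before printing means the output no longer depends
    on the order in which the mapping happened to be built.
    """
    return " ".join(
        f"{phrase}:{count}" for phrase, count in sorted(overlaps.items())
    )


if __name__ == "__main__":
    # Same overlaps, inserted in two different orders.
    first = {"paper money": 1, "bank": 2}
    second = {"bank": 2, "paper money": 1}
    print(format_overlaps(first))
    print(format_overlaps(second))
```

        Both calls print the same line, so a stored test key stays valid
        however the overlaps were collected.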
24
25  Version 2.05 (Released 06/16/2008)
26    (1) Created new module WordNet::Similarity::FrequencyCounter containing
27        common support code for information content programs. (Sid)
28
29    (2) Updated all the frequency counting programs in /utils (*Freq.pl) to
30        use the common code in WordNet::Similarity::FrequencyCounter. (Sid)
31
32    (3) Changed the default path to Perl from /usr/local/bin to /usr/bin in
33        all scripts and tests in the package. (Sid)
34
35    (4) Fixed incorrect handling of BNC header information. (Sid)
36
37    (5) Modified the compoundify() method in WordNet::Tools to include
38        compounds containing special characters (period, hyphen,
39        forward-slash, single-quote). (Sid)
40
41    (6) Updated compoundify() to handle larger compounds. (Sid)
42
43    *   04/23/08
44
45        (1) Fixed the "excessive ROOTs" bug in *Freq.pl. (Sid)
46
47        (2) Fixed the extra verb concept counts in *Freq.pl. (Sid)
48
49  Version 2.04 (Released 04/19/2008)
50    *   04/17/08
51
52        (1) Reorganized similarity_server initialization. (Sid)
53
54        (2) The similarity server now prints more intuitive messages. (Sid)
55
56        (3) Attached timestamps to log messages. (Sid)
57
58        (4) Added additional checks to input strings from clients. (Sid)
59
60    *   04/12/08
61
62        (1) Added more detailed description of information content to
63            rawtextFreq.pl, and made minor copy editing and formatting
64            changes to other /utils files (TDP)
65
66        (2) Made minor copy editing and formatting changes to files in /doc
67            (TDP)
68
69    *   04/10/08
70
71        (1) Moved get_wn_info, stem and vectorFile modules under WordNet,
72            i.e., they are now WordNet::get_wn_info, WordNet::stem and
73            WordNet::vectorFile. (Sid)
74
75        (2) Updated all the modules and programs using the above modules.
76            (Sid)
77
78        (3) Added copyright notices in all module and program headers. (Sid)
79
80        (4) Added method getCompoundsList() to WordNet::Tools. (Sid)
81
82        (5) Made a more distributable version of similarity_server. The
83            similarity_server is now "daemonized", and is installed in
84            /usr/bin along with the other utils. (Sid)
85
86    *   03/23/08
87
88        (1) Added SIGNATURE to distribution to enable package verification.
89            (Sid)
90
91        (2) Updated MANIFEST to reflect new SIGNATURE. (Sid)
92
93        (3) Set the LICENSE to gpl in META.yml and Makefile.PL. (Sid)
94
95    *   03/17/08
96
97        (1) Added NO_META option to Makefile.PL to prevent automatic
98            generation of META.yml during 'make dist'. (Sid)
99
100        (2) Removed unused variable "loaded" from Makefile.PL. (Sid)
101
102  Version 2.03 (Released 03/11/2008)
103    *   03/07/08
104
105        (1) Removed all references to WordNet::QueryData from Makefile.PL.
106            This is based on the following advice present in the
107            ExtUtils::MakeMaker documentation: "Module installation tools
108            have ways of resolving unmet dependencies but to do that they
109            need a Makefile". By checking for the presence of
110            WordNet::QueryData during 'perl Makefile.PL', we are preventing
111            any opportunity for automated dependency resolution. (Sid)
112
113        (2) The WordNet path (if specified by the WNHOME option during 'perl
114            Makefile.PL') is not checked for validity beforehand, and is now
115            directly provided as-is to build/Infocontent.PL and
116            build/Depthfiles.PL. In case of a WNHOME error, now 'make'
117            should fail instead of 'perl Makefile.PL' (which is more
118            appropriate). (Sid)
119
120        (3) Corrected a typo in DepthFinder.pm synopsis that referred to
121            getTaxonomyRoot rather than getTaxonomies. Removed some cut and
122            paste documentation from the template used for GlossFinder.pm
123            and PathFinder.pm (Ted)
124
125        (4) Made synopsis examples WordNet version independent by not hard
126            coding offsets, etc. Did this in DepthFinder.pm, PathFinder.pm,
127            ICFinder.pm, and GlossFinder.pm (Ted)
128
129        (5) Made minor changes in path names and file names in the /samples
130            directory and the /config-files subdirectory. (Ted)
131
132  Version 2.02 (Released 03/04/2008)
133    *   03/04/08
134
135        (1) Applied patch from Ben Haskell to fix a bug report (submitted by
136            Quang Do Xuan) about failing self-similarity of tilde#n#1 using
137            wup and lch measures. (Sid)
138
139        (2) Added tests for above bug to t/wup.t and t/lch.t. (Sid)
140
141        (3) Added WordNet::Similarity package version info to similarity.pl
142            --version. (Sid)
143
144    *   01/31/08
145
146        (1) Changed some default options in the similarity_server.conf
147            configuration. (Sid)
148
149        (2) Reformatted some of the similarity_server code. (Sid)
150
151    *   01/10/08
152
153        (1) Reduced version requirements of some of the PREREQ_PM modules.
154            (Sid)
155
156        (2) Changed WordNet::QueryData requirements to v1.40 in the
157            documentation. (Sid)
158
159  Version 2.01 (Released 10/14/2007)
160    *   10/13/07
161
162        (1) Fixed error in loading WordNet::Tools for similarity_server.pl.
163            (Sid)
164
165        (2) Removed the use of default (hardcoded) stoplist and word-vectors
166            file for similarity_server.pl. (Sid)
167
168        (3) Print WordNet hash-code instead of WordNet version, for
169            similarity.cgi WordNet version information. (Sid)
170
171    *   10/09/07
172
173        (1) Updated the PathFinder code to handle loops in the WordNet is-a
174            hierarchy (like the one in WN3.0). (Sid)
175
176        (2) Updated MANIFEST, changelog and documentation to reflect the new
177            changes. (Sid)
178
179    *   10/08/07
180
181        (1) The modules now are not dependent on the version() method of
182            WordNet::QueryData (which is no longer reliable). Instead they
183            now use a 'hash-code' representing a specific WordNet
184            distribution. (Sid)
185
186        (2) Added module WordNet::Tools which provides the hashCode and
187            compoundify methods used by most of the other modules and
188            utilities. (Sid)
189
190        (3) Completely modified the build procedure to generate data files
191            during the 'make' step instead of the 'perl Makefile.PL' step.
192            (Sid)
193
194        (4) Removed the WordNet version numbers appended to synsetdepths.dat
195            and treedepths.dat. (Sid)
196
197        (5) Added two "build" utilities -- build/Infocontent.PL and
198            build/Depthfiles.PL -- which are run during the 'make' step to
199            generate data files. (Sid)
200
201        (6) The default WordNet version is now v3.0. Changed all
202            documentation, code and examples to reflect this. (Sid)
203
204        (7) The package now requires WordNet::QueryData version 1.46 or
205            above. (Sid)
206
207        (8) Revised all tests and test-keys for the new code and new version
208            of WordNet and QueryData. (Sid)
209
210        (9) Removed the multiple pieces of code implementing "compoundify"
211            and moved it all into a single method in WordNet::Tools. (Sid)
212
213    *   10/04/07
214
215        (1) Included a default word vectors file in the distribution and
216            eliminated the creation of a default word vectors file at
217            install time. (Sid)
218
219    *   02/25/07
220
221        (1) Fixed documentation where module WordNet::Similarity::path was
222            referred to as WordNet::Similarity::edge (old name). (Sid)
223
224    *   01/30/07
225
226        (1) Fixed wnDepths.pl man-page to display the wnpath option
227            consistently in the usage and the description. (Sid)
228
229        (2) Fixed the "deep recursion" error (only with WN3.0) in the
230            findWPSDepths() subroutine in the wnDepths.pl script. (Sid)
231
232  Version 1.04 (Released 12/13/2006)
233    *   12/13/06
234
235        (1) Fixed major bug reported in vector_pairs, where every alternate
236            function is skipped because of a loop variable being incremented
237            twice. (Sid)
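
            The class of bug described in item (1) above, where a loop
            variable is advanced twice per pass so that every other entry
            is skipped, can be shown with a minimal Python sketch. This is
            not the vector_pairs code; the function names and the sample
            list are invented for the illustration.

```python
def visit_buggy(items):
    """Walk a list with an index that is advanced twice per pass."""
    visited = []
    i = 0
    while i < len(items):
        visited.append(items[i])
        i += 1  # intended increment
        i += 1  # stray second increment: the next item is never visited
    return visited


def visit_fixed(items):
    """Same walk with the index advanced exactly once per pass."""
    visited = []
    i = 0
    while i < len(items):
        visited.append(items[i])
        i += 1
    return visited


if __name__ == "__main__":
    functions = ["f1", "f2", "f3", "f4"]  # hypothetical function names
    print(visit_buggy(functions))
    print(visit_fixed(functions))
```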
238
239    *   04/21/06
240
241        (1) The web-interface was still not working for the vector measure,
242            because only one side of the client-server interface had been
243            updated. Updated the similarity server with code to support
244            both the vector and vector_pairs measures. (Sid)
245
246        (2) Updated the description of the Gloss Vector measure in
247            measures.html (web interface). (Sid)
248
249  Version 1.03 (Released 04/14/2006)
250    *   04/14/06
251
252        (1) Applied Ben Haskell's patch to ICFinder.pm (to make the
253            behaviour of the probability() and IC() functions consistent
254            with their comments).
255
256    *   04/05/06
257
258        (1) Updated the names for the Extended Gloss Overlaps measure and
259            the Gloss Vector measure in the documentation. (Sid)
260
261    *   02/19/06
262
263        (1) Updated PODs for all modules. (Sid)
264
265        (2) Added tests for POD errors and for POD coverage. (Sid)
266
267    *   03/31/06
268
269        (1) Changed "hash-style" constants (Perl v5.8) to single line
270            constants (Perl v5.6) for compatibility with Perl v5.6.0. (Sid)
271
272  Version 1.02 (Released 02/07/2006)
273    *   02/06/06
274
275        (1) Added utility rankFormat.pl for ranking the output of
276            similarity.pl and making the output suitable for input to
277            rank.pl (to compute Spearman's correlation coefficient) of the
278            Text::NSP package. (Sid)
279
280    *   01/15/06
281
282        (1) Fixed issue in lesk.pm where undefined values for $wc1 and $wc2
283            caused errors with the normalize option. (Sid)
284
285        (2) Fixed minor UI issues in wnDepths.pl. (Sid)
286
287  Version 1.01 (Released 12/21/2005)
288    *   12/09/05
289
290        (1) Modified get_wn_info.pm with Wybo Wiersma's changes. (Sid)
291
292        (2) Modified lesk.pm, vector.pm and vector_pairs.pm to be compatible
293            with above changes. (Sid)
294
295    *   12/07/05
296
297        (1) Updated all utilities to use WordNet 2.1 (WordNet::QueryData
298            1.39 or above). (Sid)
299
300        (2) Updated all modules and test cases for WordNet 2.1. (Sid)
301
302    *   12/05/05
303
304        (1) Changed order of authors in package documentation. (Sid)
305
306  Version 0.16 (Released 12/12/2005)
307    *   12/01/05
308
309        (1) Added Wybo Wiersma's super-gloss caching code to GlossFinder.pm.
310            (Sid)
311
312        (2) Updated documentation to reflect above changes. (Sid)
313
314  Version 0.15 (Re-released 12/11/2005)
315    *   12/11/05
316
317        (1) The tar file for the June 12 release of v0.15 unpacked as
318            WordNet-Similarity; it now unpacks as WordNet-Similarity-0.15,
319            which is consistent with all previous versions. (Ted)
320
321        (2) Similarity.pm version was shown as 0.14, is now 0.15. Our
322            general convention for modules is that their version numbers
323            only change when the module itself changes, so the module
324            version number tells you when the module last changed.
325            However, for Similarity.pm this is needlessly confusing, so it
326            will always carry the same version number as the release. (Ted)
327
328  Version 0.15 (Released 6/12/2005)
329    *   06/10/05
330
331        (1) Fixed a minor bug in MANIFEST. (Sid)
332
333        (2) Updated modules.pod and developers.pod to reflect new software
334            architecture. (Jason)
335
336  Version 0.14 (Released 6/9/2005)
337    *   06/08/05
338
339        (1) Re-introduced the previous (non-pairwise-comparison) vector.
340            (Sid)
341
342        (2) Updated documentation and test cases to support the new vector
343            measure. (Sid)
344
345        (3) Added default relation file for new vector measure. (Sid)
346
347        (4) Expunged erroneous references to LCSFinder, esp. in test
348            scripts. (JM)
349
350  Version 0.13 (Released 5/9/2005)
351    *   04/21/05
352
353        (1) removed LCSFinder module; moved LCS methods to DepthFinder,
354            ICFinder, and PathFinder (JM)
355
356        (2) renamed vector measure vector_pairs (JM)
357
358    *   03/24/05
359
360        (1) Modified the documentation to reflect the relation file format
361            for vector and for lesk. (Sid)
362
363    *   03/02/05
364
365        (1) Set up selective test cases for "make test", depending upon the
366            default data files installed by user. (Sid)
367
368    *   02/24/05
369
370        (1) Reinstated default relation files for vector and lesk. In case
371            the default relation files (vector-relation.dat and
372            lesk-relation.dat) are missing, both modules would default to
373            the glosexample-glosexample relation. (Sid)
374
375        (2) Modified Makefile.PL to query the user before installing default
376            data files. (Sid)
377
378        (3) Removed infocontent file generation code from Makefile.PL. Now
379            Makefile.PL simply calls utilities from the /utils directory
380            (wnDepths.pl, semCorFreq.pl and wordVectors.pl) to generate
381            all the default data files. (Sid)
382
383        (4) Installation process now generates a default word vectors file.
384            The vectordb configuration variable for vector is now optional.
385            (Sid)
386
387        (5) Earlier, the WNHOME option was given to Makefile.PL as --WNHOME
388            <path>, whereas the PREFIX option was written as PREFIX=<path>.
389            This inconsistent (and potentially confusing) notation has now
390            been fixed. Now, the WNHOME option is provided to Makefile.PL as
391            WNHOME=<path>. (Sid)
392
393        (6) Added some basic tests for vector in t/vector.t.
394
395    *   12/11/04
396
397        (1) Created WordNet::Similarity::GlossFinder.pm, a super-class of
398            WordNet::Similarity::vector and WordNet::Similarity::lesk. (Sid)
399
400        (2) Removed default relation file for lesk. Vector and lesk both
401            default to glosexample-glosexample. (Sid)
402
403  Version 0.12 (Released 10/29/04)
404    *   10/29/04
405
406        (1) Added vector to the CGI interface. (JM)
407
408        (2) Incorporated a configuration file into similarity_server.pl.
409            (JM)
410
411    *   10/28/04
412
413        (1) Removed readDB.pl. (JM)
414
415    *   10/27/04
416
417        (1) Modified string overlap finding in lesk to use the
418            Text::OverlapFinder module. Removed string_compare.pm. This
419            fixed an old bug where the relatedness of word1 and word2 wasn't
420            always equal to the relatedness of word2 and word1. (JM)
421
422        (2) Updated Makefile.PL, INSTALL, and doc/install.pod to reflect new
423            dependency on Text::OverlapFinder. (JM)
424
425        (3) Removed lib/dbInterface.pm and lib/string_compare.pm from
426            MANIFEST. (JM)
427
428    *   10/19/04
429
430        (1) Word vectors no longer stored in a BerkeleyDB database, a plain
431            text file is now used. Modified wordVectors.pl,
432            WordNet::Similarity::vector to use the plain text word vectors
433            file. New module vectorFile.pm now used to access this plain
434            text database. Module dbInterface.pm is obsolete. (Sid)
435
436        (2) Modified Makefile.PL to no longer check for BerkeleyDB
437            dependency. All modules are installed. (Sid)
438
439  Version 0.11 (Released 09/23/04)
440    *   09/23/04
441
442        (1) Fixed bug in wup that allowed some relatedness scores to be
443            greater than 1. This bug is discussed in the archives of the
444            mailing list. (JM)
445
446  Version 0.10 (Released 09/03/04)
447    *   09/01/04
448
449        (1) Modified vector to look like the other measures. It now is
450            derived from WordNet::Similarity.pm. (Sid)
451
452        (2) Updated the MANIFEST. (Sid)
453
454        (3) Fixed some minor typos in Makefile.PL. (Sid)
455
456        (4) Added single test case (for vector) to t/access.t. (Sid)
457
458        (5) Fixed config option name conflict in WordNet::Similarity.pm.
459            (JM)
460
461        (6) Fixed WNHOME and WNSEARCHDIR related bugs. (JM)
462
463        (7) Updated documentation for the web interface. (JM)
464
465  Version 0.09 (Released 05/19/04)
466    *   05/19/04
467
468        (1) Fixed over-counting problem in *Freq.pl programs. Under certain
469            conditions, word senses would sometimes get counted twice. (JM)
470
471        (2) Updated *Freq.pl programs to use WordNet 2.0. (JM)
472
473        (3) Input files to rawtextFreq.pl are now specified with the
474            --infile option. (JM)
475
476        (4) Improved speed of compound identification in rawtextFreq.pl by
477            adding ',', ';', and ':' to the list of characters that we
478            consider to be the end of a sentence (compound identification
479            time is proportional to the square of the length of the
480            sentence). (JM)
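
            The speed-up in item (4) above follows from the quadratic cost:
            cutting a sentence into shorter segments reduces the number of
            candidate word spans. The Python sketch below only counts
            spans. It assumes that every contiguous run of words in a
            segment is a candidate compound and that the earlier
            terminator set was ".!?"; neither detail is stated in this
            changelog, and the function names are invented.

```python
import re


def candidate_spans(n_words):
    """Number of contiguous word spans in a segment of n_words words."""
    return n_words * (n_words + 1) // 2


def total_spans(text, terminators):
    """Split text at any terminator character and sum spans per segment."""
    pattern = "[" + re.escape(terminators) + "]"
    segments = re.split(pattern, text)
    return sum(candidate_spans(len(segment.split())) for segment in segments)


if __name__ == "__main__":
    text = "a b c, d e f; g h."
    print(total_spans(text, ".!?"))     # one 8-word segment
    print(total_spans(text, ".!?,;:"))  # segments of 3, 3, and 2 words
```

            For the sample text the count falls from 36 spans to 15 once
            ',', ';', and ':' also end a segment.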
481
482  Version 0.08 (Released 04/28/04)
483    *   04/28/2004
484
485        (1) Created a CGI-based web interface for the relatedness modules.
486            (JM)
487
488    *   04/19/2004
489
490        (1) Fixed problem with path to Perl interpreter in Makefile.PL. This
491            was causing problems during installation if there was no
492            /usr/local/bin/perl. (JM)
493
494        (2) wnDepths.pl had forgotten that on Windows some filenames are
495            different; for example, data.noun is noun.dat. (JM)
496
497  Version 0.07 (Released 03/24/04)
498    *   03/23/2004
499
500        (1) In /t, save diff files between 0.06 and 0.07. Make sure to run
501            diff tests for path/0.07 and edge/0.06.
502
503    *   03/16/2004
504
505        (1) make sure that every .pm and .pl file has the same GNU copyleft
506            language. Use PathFinder.pm as a template.
507
508        (2) make sure that documentation is clear that vector and lesk
509            require different format relation files (ie they are not
510            interchangeable).
511
512        (3) convert README into a series of pod documents in doc directory.
513            In the intro.pod, provide a table of contents like structure
514            (much like perldoc perl does).
515
516            Make sure that each pod document follows the CPAN style (name,
517            synopsis, etc.). This should be true of any pod documentation in
518            the package.
519
520        (4) Modify INSTALL to describe local install correctly. In
521            particular, the description of how to do a 'use lib' or -I may
522            need adjustment.
523
524    *   03/12/2004
525
526        (1) Make developers.pod into a self contained document that provides
527            a step by step tutorial on how to write a measure of
528            relatedness. The file NewStats.txt in NSP provides an example of
529            the style of presentation that is expected.
530
531        (2) developers.pod should be a tutorial that explains how to create
532            a new measure. It should take the reader through a complete
533            example, such as creating a measure that returns the sum of the
534            information content of the concepts found in the shortest path
535            between two concepts. This should include an example of how to
536            use all of the available configuration options, and also adding
537            a new one.
538
539    *   03/11/2004
540
541        (1) document measure modules (lch.pm, wup.pm, etc.) with information
542            about effect of hypo root node. (Take discussion from email
543            explaining why it has an effect, and why it doesn't have an
544            effect) and make it a part of the .pm perldoc. This will
545            eventually be used in thesis writing, so it should be complete
546            and detailed. Of particular importance is the behavior of lch.pm,
547            but all of the modules should have their expected behaviour with
548            and without the hypo root node clearly documented. Also, you
549            should note what the behavior was in 0.06 for both nouns and
550            verbs, and if this has changed.
551
552    *   03/09/2004
553
554        (1) lch.pm does not yet support not having a hypo root. Remember
555            that the lack of hypo root will change (potentially) the max
556            path length found for each taxonomy.
557
558    *   03/08/2004
559
560        (1) depth finding code should be contained within DepthFinder.pm. We
561            should not do any depth finding on the fly, rather that should
562            all be precomputed (like we do info content). That includes the
563            depth of individual concepts, and the max depths of taxonomies.
564
565        (2) When wup.pm encounters two or more paths to the root, the trace
566            output "condenses" those paths into a single path. It would be
567            better to show all paths in the trace (as res does, for
568            example). Also, make sure that the depth reported in such cases
569            is always the minimum (shortest path to root).
570
571    *   03/05/2004
572
573        (1) Modify wnDepths such that it shows both the depths of individual
574            concepts, as well as the max distance from a root node. In the
575            case of multiple inheritance, wnDepths should show the depth of
576            the concept in each case, and also the relevant root node.
577            wnDepths should sort these depths from shortest to longest. The
578            output of wnDepths should be formatted like infocontent.dat,
579            anticipating an eventual merger.
580
581    *   03/02/2004
582
583        (1) in docs, update/replace current discussion of modules. Include
584            example usage as well. Make sure that path length is clearly
585            defined for lch, edge, and wup.
586
587    *   02/25/2004
588
589        (1) In PathFinder.pm, Infocontent.pm, Similarity.pm, and
590            LCSFinder.pm each function should be documented in perldoc form
591            such that their input, output and basic functionality is
592            described. This should then appear in the DESCRIPTION portion of
593            the perldoc. The SYNOPSIS should contain examples or templates
594            of each function being used.
595
596    *   02/23/2004
597
598        (1) redo random pairs testing such that we have 60 noun-noun pairs,
599            25 verb-verb pairs, and 15 mixed pairs.
600
601    *   02/20/2004
602
603        (1) Revisit the distance versus similarity issue in jcn.pm. It may
604            be that simply inverting the distance is too extreme a solution.
605            One possibility is to make it a linear transformation via
606            maxdist - dist instead. (JM - we'll stick with inverting the
607            distance, but added a discussion of this issue to the
608            documentation)
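
            The two options weighed in item (1) above can be compared
            side by side. The Python sketch below uses the standard
            Jiang-Conrath distance, IC(c1) + IC(c2) - 2 * IC(lcs). The
            function names, the sample information content values, and the
            maxdist value are assumptions for illustration, and the sketch
            does not show how jcn.pm treats a distance of zero.

```python
def jcn_distance(ic1, ic2, ic_lcs):
    """Jiang-Conrath distance from the information content of two
    concepts and of their lowest common subsumer."""
    return ic1 + ic2 - 2.0 * ic_lcs


def inverted(dist):
    """Similarity as the reciprocal of the distance (the approach kept)."""
    if dist <= 0:
        raise ValueError("reciprocal is undefined for a non-positive distance")
    return 1.0 / dist


def linear(dist, maxdist):
    """Similarity as a linear transformation (the alternative considered)."""
    return maxdist - dist


if __name__ == "__main__":
    dist = jcn_distance(3.0, 5.0, 2.0)  # hypothetical IC values
    print(dist, inverted(dist), linear(dist, 10.0))
```

            Inverting compresses large distances toward zero and grows
            without bound as the distance shrinks, whereas maxdist - dist
            keeps differences between distances unchanged.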
609
610    *   02/18/2004
611
612        (1) document all multiple inheritance issues that are being handled
613            for measures.
614
615    *   02/16/2004
616
617        (1) validateSynset should check wps format fairly closely, and issue
618            descriptive errors if the wps is ill formed. Words can
619            apparently be about anything (except #) but pos should be lower
620            case nvra, and senses should be digits. Error messages should
621            point out which field is the problem, or if there are too few or
622            too many fields.
623
624        (2) place all hypo root handling node code in PathFinder.pm. The
625            measures should not have any hypo root handling code in them.
626
627        (3) PathFinder.pm should include a function getAllPaths.pm that
628            returns all paths between two concepts, their length, and their
629            "tops" (the candidate LCSs). This should be used as the main
630            source of input for the getLCS* functions, and for
631            getShortestPath.
632
633        (4) remove all "input verification" code from the measures. That
634            should be inherited from Similarity.pm.
635
636        (5) There is replicated code in the measure modules that checks
637            validity of input. This should be removed to a common module
638            that can be called by all of the measures. Any other replicated
639            code should be removed as well. The goal of 0.07 is to largely
640            eliminate replicated code via the use of inheritance, and to
641            make the writing of new measures simpler.
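
            The format checks asked for in item (1) above can be sketched
            as follows. This is a Python illustration rather than the
            package's Perl validateSynset; the function name validate_wps
            and the wording of the messages are assumptions.

```python
def validate_wps(wps):
    """Return None if wps looks like word#pos#sense, else an error message
    naming the field that is wrong."""
    fields = wps.split("#")
    if len(fields) < 3:
        return f"too few fields in '{wps}' (expected word#pos#sense)"
    if len(fields) > 3:
        return f"too many fields in '{wps}' (expected word#pos#sense)"
    word, pos, sense = fields
    if word == "":
        return f"empty word field in '{wps}'"
    # Part of speech must be one lower-case letter out of n, v, r, a.
    if pos not in ("n", "v", "r", "a"):
        return f"bad part of speech '{pos}' in '{wps}' (expected n, v, r, or a)"
    # Sense must consist of ASCII digits only.
    if not (sense.isascii() and sense.isdigit()):
        return f"bad sense '{sense}' in '{wps}' (expected digits)"
    return None


if __name__ == "__main__":
    for candidate in ("cat#n#2", "cat#dog#1", "cat#n#n", "cat"):
        print(candidate, "->", validate_wps(candidate))
```

            Because the string is split on '#', the word field can hold
            anything except '#', which matches the rule stated in the item.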
642
643    *   02/13/2004
644
645        (1) add pod/perldoc to lib/ICFinder.pm. Should also be done for all
646            other files as they are modified for other reasons. In
647            particular, introductory material that appears in source code
648            comments, author information, GPL, etc. should be moved into pod
649            and removed from source code comments. See similarity.pl for an
650            example.
651
652        (2) path should use getShortestPath from PathFinder.pm.
653
654    *   02/09/2004
655
656        (1) getLCSDepth, getLCSInfo, getLCSPath should appear in
657            LCSFinder.pm, which should inherit from both ICFinder and
658            PathFinder.
659
660        (2) The measures (lch, path, jcn, lin, res, wup) should default to
661            having the hypo root node turned on (for both nouns and verbs).
662            This will eventually be true of hso, but is not currently. hypo
663            root nodes could also be used for lesk and vector, although they
664            are not currently.
665
666    *   02/04/2004
667
668        (1) Wps and offsets will be supported internally. The user can
669            request either mode via an option to getRelatedness. offset is
670            our default. profiling has shown wps to be somewhat faster, in
671            that it makes fewer calls to getSense, although it does make
672            some. For input, we only support wps. For trace output we
673            support wps and offset. For output we support wps and offset.
674
675    *   01/29/2004
676
677        (1) modify option in config files such that an option without a
678            value reverts to the default in all cases (except vectordb).
679
680    *   01/24/2004
681
682        (1) Provide support for undefined values in the path finding and
683            info content measures (path, wup, lch, res, lin, jcn). If two
684            concepts are not in the same taxonomy then an error should be
685            issued and a large negative integer should be returned. This can
686            occur in two cases, between the same part of speech (noun-noun,
687            verb-verb), or between nouns and verbs. Distinct error messages
688            should be indicated in both cases.
689
690    *   01/20/2004
691
692        (1) Clean up configuration file examples (in samples). Make them
693            consistent by having a master list (all-options.conf) that is
694            what we make changes to. Then specific example files can be
695            created via copy and paste. Make sure all possible options for a
696            measure are included, and that the explanations describe all
697            possible values as well as default handling. (TDP updated
698            all-options.conf on 12/10/03, use this as source of cut and
699            paste).
700
701    *   01/19/2004
702
703        (1) Create test scripts that can be run to verify the correctness of
704            output - they should include "correct" answers that can be
705            compared to (automatically) and rerun as the system changes. We
706            should use the CPAN module Test::More, and create .t files in a
707            /t directory that test specific situations/problems, etc. The .t
708            files themselves should be documented with an explanation of
709            what is being tested. We should have lots of smaller, specific
710            .t tests (rather than a few big test files). Whenever a bug is
711            found and fixed, a .t file should be created that tests the fix,
712            and this should be mentioned in the source code comments where
713            the fix is made (this fix is tested by t/xyz.t).
714
715            Make sure that the testing system can be easily
716            extended/modified, and that it can support the use of multiple
717            input files and configuration files. We should have multiple *.t
718            files to run our tests, and each module and utility should have
719            at least its own *.t file (maybe more than one in some cases).
720            We should also have *.t files that are dedicated to particular
721            situations that affect a number of measures (like what happens
722            when info content is zero for one concept, what happens if one
723            of the concepts being compared is the lcs of the other, what if
724            the two concepts are the same (self similarity), and so forth).
725
726        (2) Test cases for configuration file handling should include:
727
728            repeated options in configuration file, as in
729
730                trace::0
731                trace::1
732
733            bad values in configuration file, as in
734
735                trace::nothankyou
736
737            bad options in configuration file, as in
738
739                tracer::0
740
741        (3) Test cases for similarity.pl should include:
742
743            ill formed file input for similarity.pl, as in
744
745                cat#dog#1 cat#n#2
746                cat#n#n cat#n#2
747                cat
748
749        (4) Test cases for measures should include:
750
751            show that wps and offset methods of path finding are equivalent
752
753            check trace output for each of the measures. use wps format, as
754            that is subject to fewer changes than offsets.
755
756            a "big" file of word pairs (maybe 100 pairs) that run all the
757            measures and compare values to what is obtained in 0.06. If there
758            are differences, let's see what they are.
759
760        (5) Test cases for information content programs should include:
761
762            an information content file based on one of our resident text
763            files that is large enough to be interesting (readme, gpl, etc.)
764            as computed in 0.6/0.7 (should be the same). This can be used as
765            a reference point when we make changes in future.
766
            Information content computed with a very small number of
            concepts, to expose the counting problem that Ted mentions
            below.
770
771        (6) Test cases for wnDepth...
772
773            Generate output for 0.07 to use as a point of reference. A few
774            specific manual checks would be good too (leather_carp, entity,
775            etc.)
776
777        (7) run tests to determine where the system now provides different
778            results from version 0.06 - make sure to document these cases
779            (that are different).
780
781    *   01/12/2004
782
783        (1) document configuration options extensively in a separate pod
784            called doc/config.pod. Organize such that you have options that
785            are used with all measures, and then those that are used with
786            certain classes of measures. Then, use this as a master copy to
787            update .pm files with.
788
789    *   01/09/2004
790
791        (1) modify option handling such that multiple occurrences of an
792            option in a config file cause an error. For example
793
794              trace::
795              trace::1
796
797            should cause an error.
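
        The rule above, together with the bad-value and bad-option cases
        that appear elsewhere in this changelog, could be checked along
        these lines. This is only a sketch in Python (the package itself
        is Perl): the name check_config and the KNOWN_OPTIONS table are
        hypothetical, and treating an empty value as "use the default" is
        an assumption. Only the option::value format and the trace
        examples are taken from this document.

```python
# Hypothetical table of known options and their allowed values.
# Only "trace" with values 0 and 1 is taken from the changelog examples.
KNOWN_OPTIONS = {"trace": {"0", "1"}}


def check_config(lines):
    """Return a list of error strings for a list of config lines."""
    errors = []
    seen = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, sep, value = line.partition("::")
        if not sep:
            errors.append(f"line {lineno}: expected option::value")
            continue
        if name not in KNOWN_OPTIONS:
            errors.append(f"line {lineno}: unknown option '{name}'")
            continue
        if name in seen:
            errors.append(f"line {lineno}: option '{name}' repeated")
        seen.add(name)
        # Assumption: an empty value is accepted and means "use the default".
        if value and value not in KNOWN_OPTIONS[name]:
            errors.append(f"line {lineno}: bad value '{value}' for '{name}'")
    return errors


print(check_config(["trace::", "trace::1"]))    # repeated option
print(check_config(["trace::nothankyou"]))      # bad value
print(check_config(["tracer::0"]))              # bad option
```

        Each call returns a list of messages, so a .t file could compare
        that list against the expected errors for a given config file.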
798
799    *   12/17/2003
800
801        (1) SemCor1.7Freq.pl and SemTagFreq.pl need to be renamed. They are
802            now called semCorRawFreq.pl and SemCorFreq.pl. semCorRawFreq.pl
803            counts without sense tags and SemCorFreq.pl counts the sense
804            tags. (TDP)
805
806    *   12/09/2003
807
808        (1) In similarity.pl cache error strings that indicate that two
809            input synsets are from different parts of speech so that we only
810            print out a warning once for each unique word1#pos1 word2#pos2
811            combination (JM)
812
813        (2)
814
            (a) Enhance similarity.pl file handling (for input files).
                Comments should be allowed - this will help in creation of
                test data (we can explain in the comment what "case" is
                being tested by a particular set of pairs). Use standard
                perl commenting style: a line starting with a # is a
                comment. Note that I don't think we can use the convention
                of # anywhere in a line as being the start of a comment
                (due to w#p#s), but I think any line that starts with a #
                can be safely treated as a comment. (JM -- we are using //
                to indicate the start of a comment)
825
826            (b) Enhance similarity.pl file handling (for input files). At
827                present if a single word (not a pair) appears on a line, no
828                error is issued. It silently ignores this case. This should
829                result in an error to the effect that the input format is
830                invalid, only one word. Also, I'm not sure what happens if
831                you have more than two words on a line. An error of some
                sort would also be necessary in that case. Also, I am not
                sure if similarity.pl checks to see that the word pairs are
                "well formed", that is to say, whether they adhere to the
                word, word#pos, or word#pos#number format. It would be good
                to have a simple check that verifies we have alphanumeric
                words, pos of n, v, a, or r, and numeric sense numbers. (JM)
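
        The checks requested in (a) and (b) might look roughly like the
        following Python sketch (illustrative only; similarity.pl is
        Perl). The name check_line is hypothetical, and reading
        "alphanumeric" as letters, digits and underscore is an
        assumption. Comment lines follow the // convention noted in (a),
        and only a single part of speech per word is handled.

```python
import re

# word, word#pos, or word#pos#sense; pos is one of n, v, a, r.
# Assumption: "alphanumeric" means letters, digits and underscore.
TOKEN = re.compile(r"^[A-Za-z0-9_]+(#[nvar](#[0-9]+)?)?$")


def check_line(line):
    """Return None for a comment or blank line, else (ok, message)."""
    stripped = line.strip()
    if not stripped or stripped.startswith("//"):
        return None  # comment or blank line, nothing to validate
    tokens = stripped.split()
    if len(tokens) != 2:
        return (False, f"expected 2 words, found {len(tokens)}")
    for tok in tokens:
        if not TOKEN.match(tok):
            return (False, f"ill-formed word '{tok}'")
    return (True, "ok")


for sample in ["// a test case", "cat#n#1 dog#n#1",
               "cat#dog#1 cat#n#2", "cat#n#n cat#n#2", "cat"]:
    print(repr(sample), "->", check_line(sample))
```

        A line with one word or with more than two words is rejected
        rather than silently skipped, which is the behaviour (b) asks
        for.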
838
839    *   12/08/2003
840
841        (1) Clean up configuration file examples (in samples). Make them
842            consistent by having a master list (all-options.conf) that is
843            what we make changes to. Then specific example files can be
844            created via copy and paste. Make sure all possible options for a
845            measure are included, and that the explanations describe all
846            possible values as well as default handling. (JM)(TDP updated
847            all-options.conf on 12/10/03, use this as source of cut and
848            paste).
849
850        (2) Determine if it is feasible (not too difficult or time
851            consuming) to modify --version option so it can display both the
852            version of similarity.pl and the version of the module used when
853            --type is specified. (JM -- version will show module version as
854            well if a module is specified)
855
856    *   12/05/2003
857
858        (1) all configuration options are now printed to traceString after
859            module initialization. (JM)
860
861        (2) explain the distinction between compounds and collocations
862            raised in sample README. (Drop the distinction, and clarify what
863            we mean by Wordnet compounds. TDP Dec 3). (JM)
864
865    *   12/04/2003
866
867        (1) document caching for random (normally random uses an unlimited
868            cache size) (JM -- random now uses the same default as all other
869            measures)
870
871        (2) determine a reasonable default cache size. Should not be
872            unlimited. Current default is 1000, maybe it can be increased to
873            5000 or 10000. Let lesk with trace be the standard as to what is
874            reasonable. (JM -- default is now 5,000).
875
876        (3) Improve error handling when processing config files. Make sure
877            the values specified are valid and that filenames refer to
878            extant files. All options should allow the value to be omitted,
879            in which case the default is used. (JM)
880
881    *   12/01/2003
882
883        (1) Adjust Makefile.PL to account for new contents of samples
884            directory. Added entries to MANIFEST as well. (JM)
885
886        (2) update samples/sample.pl to run with the new files (and
887            organization) provided in the samples directory. This was also a
888            problem in 0.06, where it did not run for hso properly due to a
889            mismatch in the name specified in sample.pl and the
890            configuration file.
891
892        (3) Rename infocontent.dat in Makefile.PL to use our standard name
893            for semcor information content files. Name should reflect
894            options used in computing information content values (if any).
895            JM
896
897        (4) relation.dat is in lib/WordNet. Should be referred to as
898            lesk-relation.dat. Should also have vector-relation.dat I would
899            think. (if not, what does vector do?). JM (vector doesn't try
900            finding a default relation file--it fails silently).
901
902        (5) /sample/vector-relation.dat is wrong. Calls itself
903            LeskRelationFile. JM
904
905        (6) In intro.pod, provide instruction on how to convert to html or
906            whatever if user wishes (just point them to documentation that
907            describes this elsewhere even). JM
908
909    *   11/28/2003
910
911        (1) remove wordnet 1.7.1 compounds from samples directory. (TDP)
912
913        (2) change comment in Similarity.pm to explain the pluses and
914            minuses of using/not using a unique root node. (JM)
915
916    *   11/26/2003
917
918        (1) added info content files in samples/Infocontent
919
920        (2) changed version numbers to 0.07 in all modules and utils
921
922        (3) fixed bug in wup: if user supplies car#n#1 and auto#n#1, the LCS
923            found by wup is motor_vehicle#n#1, not car#n#1
924
925        (4) added POD to all programs in /samples
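
        The situation behind fix (3) can be pictured with a toy
        hierarchy. The Python sketch below is illustrative only: the
        small tree is made up, depth is counted in nodes from the root
        (an assumption), and car#n#1 and auto#n#1 are assumed to name the
        same concept, so the pair reduces to comparing car#n#1 with
        itself. The score uses the usual Wu-Palmer form,
        2 * depth(lcs) / (depth(a) + depth(b)).

```python
# Toy hierarchy (child -> parent); NOT real WordNet data.
PARENT = {
    "motor_vehicle#n#1": "vehicle#n#1",
    "car#n#1": "motor_vehicle#n#1",
    "truck#n#1": "motor_vehicle#n#1",
}


def path_to_root(concept):
    """Return [concept, parent, ..., root]."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1 (assumption)."""
    return len(path_to_root(concept))


def lcs(a, b):
    """Deepest concept subsuming both a and b (a concept subsumes itself)."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is deepest
        if node in ancestors_of_a:
            return node
    return None


def wup(a, b):
    """Wu-Palmer score: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


print(lcs("car#n#1", "car#n#1"), wup("car#n#1", "car#n#1"))
print(lcs("car#n#1", "truck#n#1"), wup("car#n#1", "truck#n#1"))
```

        When a concept is compared with itself, the LCS is that concept
        and the score is 1; picking its parent instead would lower the
        score.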
926
927    *   11/24/2003
928
929        (1) added documentation (in the form of POD) to /doc
930
931    *   11/21/2003
932
933        (1) added /doc directory to contain documentation
934
935    *   11/18/2003
936
937        (1) ensured that each measure initializes a part-of-speech list in
938            _initialize
939
940        (2) all measures (except vector) now use fetchFromCache and
941            storeToCache
942
943        (3) updated README:
944
            (a) Replaced most references to WordNet 1.7.1 with 2.0

            (b) Added some documentation on how to write a new measure
948
949        (4) added an INSTALL file
950
951        (5) cleaned up /samples. relation.dat is now named lesk-relation.dat
952            and added vector-relation.dat. A sample config file is also
953            provided for each measure (in /samples/config-files)
954
955    *   11/15/2003
956
        (1) updated jcn, hso, random, and lesk to use the functions that
958            have been moved to Similarity.pm (such as the cache management
959            functions).
960
961        (2) cleaned up the /samples directory. Removed outdated files. Put
962            sample config files in samples/config-files. Added README in
963            /samples.
964
965    *   11/12/2003
966
967        (1) Added fetchFromCache() and storeToCache() to Similarity.pm to
968            make caching easier and cleaner.
969
970        (2) Updated wup, edge, lch, res, and lin to use fetchFromCache() and
971            storeToCache().
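
        As a rough picture of what a fetch/store pair with a size limit
        can look like, here is a Python sketch (not the Perl code in
        Similarity.pm). The class BoundedCache, the method signatures,
        and the oldest-first eviction policy are all assumptions made for
        illustration; the scores are made-up values.

```python
from collections import OrderedDict


class BoundedCache:
    """Size-limited cache; the oldest entry is dropped first (assumed policy)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()

    def fetch_from_cache(self, a, b):
        """Return the cached score for the pair, or None if absent."""
        return self._data.get((a, b))

    def store_to_cache(self, a, b, score):
        """Store a score, evicting the oldest entry when the cache is full."""
        if self.max_size <= 0:
            return  # caching disabled
        key = (a, b)
        if key not in self._data and len(self._data) >= self.max_size:
            self._data.popitem(last=False)  # drop the oldest entry
        self._data[key] = score


cache = BoundedCache(max_size=2)
cache.store_to_cache("cat#n#1", "dog#n#1", 0.5)
cache.store_to_cache("car#n#1", "bus#n#1", 0.4)
cache.store_to_cache("tree#n#1", "oak#n#1", 0.9)  # evicts the cat/dog pair
print(cache.fetch_from_cache("cat#n#1", "dog#n#1"))
print(cache.fetch_from_cache("tree#n#1", "oak#n#1"))
```

        Keeping both operations in one place means each measure only
        needs to call fetch before computing a score and store after.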
972
973    *   10/25/2003
974
        (1) Reduced the amount of duplicated code in the measure modules by
976            moving some common code to WordNet::Similarity.
977            WordNet::Similarity is now a base class for all the measures.
978            Also added a module called infocontent.pm from which all
979            information content measures are descended (i.e., res, lin,
980            jcn).
981
982        (2) Removed @ symbol from all email addresses in all files (I
983            think). This might help keep spammers from harvesting our email
984            addresses.
985
986  Version 0.06
987    *   10/18/2003
988
989        (1) Removed dependence of the vector measure on PDL. Implemented
990            "in-house" sparse vector manipulation functions.
991
992        (2) Modified the README with updated documentation of similarity.pl
993            (--interact option) and wordVectors.pl.
994
995    *   10/15/2003
996
997        (1) Changed Makefile.PL so that it checks for version 1.30 of
998            QueryData
999
1000    *   10/13/2003
1001
1002        (1) Added "maxCacheSize" option to all measures.
1003
1004        (2) Added "maxCacheSize" option info to the man/pod documentation.
1005
1006        (3) Used the new dataPath() method of QueryData 1.31 in all the
1007            utilities to obtain the path of the WordNet data files.
1008
1009        (4) Modified Makefile.PL to check for PDL and BerkeleyDB dependency
1010            during installation. vector.pm is not installed on failed
1011            dependencies.
1012
1013    *   10/11/2003
1014
1015        (1) Replaced instances of deprecated WordNet::QueryData::query with
1016            WordNet::QueryData::queryWord in hso.pm
1017
1018        (2) made hso.pm check QueryData version. queryWord was broken in
1019            QueryData 1.29 and earlier
1020
1021        (3) added support for new relations in WordNet 2.0 to get_wn_info.pm
1022
1023        (4) updated test scripts to work with WN 2.0 (and WN 1.7.1)
1024
1025    *   10/06/2003
1026
1027        (1) Added rootNode option to wup.pm
1028
1029    *   09/27/2003
1030
1031        (1) Fixed syntax error in wordVectors.pl.
1032
1033        (2) Added readDB.pl to utils.
1034
1035        (3) Changed contact information in docs.
1036
1037        (4) Re-organized the samples subdirectory.
1038
1039        (5) Fixed typo in random.pm.
1040
1041        (6) Updated the MANIFEST.
1042
1043    *   09/21/2003
1044
1045        (1) Updated POD for WordNet::Similarity::wup
1046
1047        (2) Added option to wup to specify a cache size in a configuration
1048            file.
1049
1050        (3) similarity.pl now 'use's QueryData 1.30 or later. Previous
1051            versions of QueryData will not work. t/access.t also 'use's
1052            QueryData 1.30. get_wn_info.pm and lesk.pm both check for
            QueryData 1.30 and will die if it is not found.
1054
1055        (4) Reorganized the bibliography in README and slightly re-worded
1056            part of the introduction.
1057
1058    *   09/18/2003
1059
1060        (1) Added new Wu Palmer measure of similarity
1061            (lib/WordNet/Similarity/wup.pm)
1062
1063        (2) Updated README to mention wup
1064
1065        (3) Added t/wup.t
1066
1067        (4) Updated POD for WordNet::Similarity to mention wup
1068
1069        (5) Updated the help message of similarity.pl to mention wup
1070
1071        (6) Added t/wup.t and lib/WordNet/Similarity/wup.pm to MANIFEST
1072
1073    *   09/05/2003
1074
1075        (1) Added '--interact' option to similarity.pl.
1076
1077        (2) Changed the structure of the Vector Relation File.
1078
1079        (3) Fixed a minor bug in similarity.pl. (s///g)
1080
1081        (4) Updated the perldocs for the measures.
1082
1083        (5) Incorporated some new features into the 'wordVectors.pl'
1084            utility. These features were used for thesis experiments.
1085
1086        (6) Added documentation about the Lesk and Vector relation files
1087            (they have different formats now).
1088
1089  Version 0.05
1090    *   06/03/2003
1091
1092        (1) Added new measure of semantic relatedness, based on
1093            co-occurrence vectors of WordNet glosses.
1094
1095        (2) Set up the package so that similarity.pl and the other perl
1096            utilities get installed in "/usr/local/bin".
1097
1098        (3) Complete rewrite of similarity.pl with cleaner code and added
1099            functionality:
1100
1101            (a) Multiple parts of speech can be specified as car#nv (noun
1102                and verb forms of car) or cool#nar (noun, adjective and
1103                adverb forms of cool).
1104
1105            (b) Word senses can now be specified as car#n#2, jump#v#2, etc.
1106
1107            (c) Added functionality to similarity.pl to use a local install
1108                of WordNet::Similarity modules (in non-standard
1109                directories).
1110
1111            (d) Output of similarity.pl now specifies the senses that
1112                represent the relatedness of two words.
1113
1114        (4) Enforced limit on the cache size of modules.
1115
1116        (5) Updated README to reflect the changes and to specify options for
1117            local installs of similarity.pl and the other utilities.
1118
1119        (6) Fixed the perl docs (remove leading spaces).
1120
1121        (7) Added mailing list address to documentation --
1122            (http://groups.yahoo.com/group/wn-similarity).
1123
1124        (8) Improved jcn and lin tracing ("bird-crane" problem obvious now).
1125
1126        (9) Added new utility wordVectors.pl required for
1127            WordNet::Similarity::vector module.
1128
1129  Version 0.04
1130    *   05/02/2003
1131
1132        (1) *Fixed* newline in traces.
1133
1134        (2) *Fixed* blank line bug in brownFreq.pl.
1135
1136        (3) *Fixed* "--offset" option bug in similarity.pl.
1137
1138        (4) *Fixed* lin measure non-normalized scores... added zero
1139            infocontent handling in jcn and lin.
1140
1141        (5) New utility rawtextFreq.pl, to generate information content
1142            files from plain text.
1143
1144        (6) similarity.pl supports option to specify part-of-speech of input
1145            words while measuring relatedness.
1146
        (7) Added option to specify (configuration / information content)
1148            file in similarity.pl.
1149
1150        (8) Added Resnik counting option to the information content
1151            generation utilities.
1152
1153        (9) More documentation on information content utilities.
1154
        (10) Added Add-1 smoothing option to the information content
             generation utilities.
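
        To show why add-one smoothing matters for information content,
        here is a generic Python sketch. It is not the code of the
        utilities: the function name, the flat (non-hierarchical) counts,
        and the sample numbers are assumptions. Information content is
        taken as -log(p), which is undefined for a concept whose count is
        zero unless one is added to every count first.

```python
import math


def information_content(counts, add_one=False):
    """Map each concept to -log(p), or None when its count is zero."""
    if add_one:
        counts = {c: n + 1 for c, n in counts.items()}
    total = sum(counts.values())
    ic = {}
    for concept, n in counts.items():
        ic[concept] = -math.log(n / total) if n > 0 else None
    return ic


# Made-up counts for illustration only.
raw = {"cat#n#1": 3, "dog#n#1": 1, "unicorn#n#1": 0}
print(information_content(raw))
print(information_content(raw, add_one=True))
```

        Without smoothing the unseen concept has no usable value; with
        it, every concept gets a finite information content.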
1158
1159  Version 0.03
1160    *   03/10/2003
1161
1162        (1) Removed trace bug in hso.pm.
1163
1164        (2) Added test cases for all modules.
1165
1166  Version 0.01
1167    *   02/10/2003
1168
1169        (1) Created CPAN modules from distance ver 0.11.
1170
1171        (2) Modules are completely object oriented.
1172
1173        (3) Added Adapted Lesk semantic relatedness measure -- lesk.pm.
1174
1175        (4) Added simple edge counting semantic relatedness measure --
1176            edge.pm.
1177
1178        (5) Added a random relatedness measure -- random.pm.
1179
1180        (6) jcn, res and lin measures now support verb hierarchies.
1181
1182        (7) Information content files can now be specified as parameters to
1183            the modules.
1184
1185        (8) Tools provided to build information content files from various
1186            publicly available corpora.
1187
1188        (9) Various parameters now control the behavior of the modules.
1189            These parameters are passed to the modules through
1190            'configuration files'.
1191
1192AUTHORS
1193      Ted Pedersen, University of Minnesota, Duluth
1194      tpederse at d.umn.edu
1195
1196      Siddharth Patwardhan, University of Utah, Salt Lake City
1197      sidd at cs.utah.edu
1198
1199      Satanjeev Banerjee, Carnegie Mellon University, Pittsburgh
1200      banerjee+ at cs.cmu.edu
1201
1202      Jason Michelizzi
1203
1204SEE ALSO
1205    todo.pod
1206
1207COPYRIGHT
1208    Copyright (c) 2005, Ted Pedersen, Siddharth Patwardhan, Satanjeev
1209    Banerjee and Jason Michelizzi
1210
1211    Permission is granted to copy, distribute and/or modify this document
1212    under the terms of the GNU Free Documentation License, Version 1.2 or
1213    any later version published by the Free Software Foundation; with no
1214    Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
1215
1216    Note: a copy of the GNU Free Documentation License is available on the
1217    web at <http://www.gnu.org/copyleft/fdl.html> and is included in this
1218    distribution as FDL.txt.
1219
1220