Name          Date          Size      #Lines  LOC
bin/          08-Oct-2015   -         342     136
doc/          08-Oct-2015   -         532     318
lib/Text/     08-Oct-2015   -         1,231   556
samples/      03-May-2022   -         394     393
t/            03-May-2022   -         756     486
CHANGES       08-Oct-2015   6.9 KiB   186     130
INSTALL       08-Oct-2015   2 KiB     72      51
MANIFEST      08-Oct-2015   627       30      29
META.json     08-Oct-2015   872       40      39
META.yml      08-Oct-2015   514       22      21
Makefile.PL   26-Jun-2013   921       23      15
README        08-Oct-2015   3.4 KiB   91      68

README

NAME
    README - General information about Text::Similarity

DESCRIPTION
    Text-Similarity is a Perl module that allows a user to measure the
    similarity between two strings or two files. One method for computing
    similarity is currently supported, Text::Similarity::Overlaps, and
    others can be added.

    When using Text::Similarity::Overlaps, text similarity is based on
    counting the number of overlapping words between the two files, and is
    (optionally) normalized by the length of the files.
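    The README does not spell out the normalization formula. As a rough
    sketch only, one plausible choice is the harmonic mean of the raw
    score divided by each text's length. The helper name
    normalized_score and the formula itself are assumptions made for
    illustration, not a statement of what the module computes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Similarity): turn a raw overlap
# score into a length-normalized value using the harmonic mean of the
# two per-text ratios. The formula is an assumption for illustration.
sub normalized_score {
    my ($raw, $len1, $len2) = @_;
    return 0 if $raw == 0 || $len1 == 0 || $len2 == 0;
    my $ratio1 = $raw / $len1;    # share of the first text that overlaps
    my $ratio2 = $raw / $len2;    # share of the second text that overlaps
    return 2 * $ratio1 * $ratio2 / ($ratio1 + $ratio2);
}

# 3 overlapping words between a 5-word text and a 10-word text.
printf "%.3f\n", normalized_score(3, 5, 10);
```

    Under this assumed formula, identical texts score 1 and texts with
    no shared words score 0, whatever their lengths.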

    The lesk value provided in Text::Similarity::Overlaps is based on
    counting the number of overlapping words and phrases between the two
    files, and is (optionally) normalized by the length of the files.
    Phrasal matches are scored more highly.
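    To make "phrasal matches are scored more highly" concrete, here is a
    minimal sketch of one way such a score could be computed: repeatedly
    take the longest run of consecutive shared words, add the square of
    its length, and remove it from both texts. The squaring rule and the
    names longest_common_run and lesk_score are assumptions for
    illustration; the module's own scoring may differ in its details.

```perl
use strict;
use warnings;

# Find the longest run of consecutive tokens shared by two token lists.
# Tokens already consumed are undef and never match.
# Returns (length, start index in first, start index in second).
sub longest_common_run {
    my ($x, $y) = @_;
    my ($best, $best_i, $best_j) = (0, 0, 0);
    for my $i (0 .. $#$x) {
        for my $j (0 .. $#$y) {
            my $k = 0;
            $k++ while $i + $k <= $#$x
                    && $j + $k <= $#$y
                    && defined $x->[$i + $k]
                    && defined $y->[$j + $k]
                    && $x->[$i + $k] eq $y->[$j + $k];
            ($best, $best_i, $best_j) = ($k, $i, $j) if $k > $best;
        }
    }
    return ($best, $best_i, $best_j);
}

# Hypothetical scoring rule: each shared phrase of n words contributes
# n squared, so one 3-word phrase (9) outweighs three single words (3).
sub lesk_score {
    my ($s1, $s2) = @_;
    my @x = split ' ', $s1;
    my @y = split ' ', $s2;
    my $score = 0;
    while (1) {
        my ($len, $i, $j) = longest_common_run(\@x, \@y);
        last if $len == 0;
        $score += $len ** 2;
        $x[$_] = undef for $i .. $i + $len - 1;    # consume matched words
        $y[$_] = undef for $j .. $j + $len - 1;
    }
    return $score;
}

# "on the mat" (3 words) gives 9, "the cat" (2 words) gives 4.
print lesk_score('the cat sat on the mat', 'the cat lay on the mat'), "\n";    # prints 13
```

    The sketch splits on whitespace only and does no case folding or
    punctuation handling.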

    The smallest units considered for matches are whitespace-separated
    strings. For example, 'the cat and the hat' and 'these cats and these
    hats' match only on 'and'; matches below the word level are not
    measured.
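    The example above can be reproduced with a few lines of plain Perl.
    This is only an illustration of whole-token matching; shared_tokens
    is a hypothetical helper, not a function provided by the module.

```perl
use strict;
use warnings;

# Return the distinct whitespace-separated tokens of $s1 that also occur
# in $s2, in order of first appearance. Matching is exact and whole-token,
# so 'cat' does not match 'cats'.
sub shared_tokens {
    my ($s1, $s2) = @_;
    my %in_second = map { $_ => 1 } split ' ', $s2;
    my %seen;
    return grep { $in_second{$_} && !$seen{$_}++ } split ' ', $s1;
}

my @shared = shared_tokens('the cat and the hat', 'these cats and these hats');
print "@shared\n";    # prints "and"
```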

    Each input file is treated as a single string. Methods are provided
    that allow you to write programs that measure the similarity of files
    (getSimilarity) and identify the overlaps present in strings
    (getOverlaps).
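    A program built on getSimilarity might look roughly like the sketch
    below. It assumes that the constructor takes a hash reference of
    options, that a 'normalize' option exists, and that getSimilarity
    takes two file names and returns a score in scalar context; please
    check the module's perldoc before relying on any of this. The
    helpers write_temp and file_similarity are hypothetical, and the
    sketch returns undef when Text::Similarity is not installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $text to a temporary file and return its path.
sub write_temp {
    my ($text) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Score two files, or return undef when the module is not available.
sub file_similarity {
    my ($file1, $file2) = @_;
    eval { require Text::Similarity::Overlaps; 1 } or return;

    # Assumed interface: options passed as a hash reference.
    my $mod = Text::Similarity::Overlaps->new({ normalize => 1 });
    defined $mod or die "Construction of Text::Similarity::Overlaps failed";

    return scalar $mod->getSimilarity($file1, $file2);
}

my $score = file_similarity(
    write_temp("the cat and the hat\n"),
    write_temp("these cats and these hats\n"),
);
print defined $score
    ? "similarity: $score\n"
    : "Text::Similarity is not installed\n";
```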

CONTENTS
    When the distribution is unpacked, several subdirectories are created:

    /bin
        This directory contains a driver program called text_similarity.pl
        that can be used to conveniently measure two files for similarity.
        Please see the perldoc for this program for more details.

    /lib
        This directory contains the Perl modules that do the actual work of
        measuring similarity. By default, these files are installed into
        /usr/local/lib/perl5/site_perl/PERL_VERSION (where PERL_VERSION is
        the version of Perl you are using). See the INSTALL file for more
        information.

    /doc
        This directory contains all of the *.pod files used to document the
        system. These are processed via pod2text and the output is placed
        in the top-level directory. The resulting top-level text files
        should be considered read only.

    /t  This directory contains test scripts. These scripts are run when you
        execute 'make test'.

    /samples
        This directory contains stoplist files in two formats: one word
        per line (stoplist.txt) and regular expression format
        (stoplist-nsp.regex).
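    As an illustration of the one-word-per-line format, the sketch below
    reads such a list and drops the listed words from a string. The
    helper names, the three sample stop words and the case folding are
    assumptions made for this example; it does not show how the module
    itself applies a stoplist, and the regular expression format is not
    covered.

```perl
use strict;
use warnings;

# Read a one-word-per-line stoplist from an open filehandle into a hash.
sub read_stoplist {
    my ($fh) = @_;
    my %stop;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $line eq '';        # skip blank lines
        $stop{lc $line} = 1;        # case folding is an assumption
    }
    return \%stop;
}

# Drop stop words from a whitespace-tokenized string.
sub remove_stop_words {
    my ($text, $stop) = @_;
    return grep { !$stop->{lc $_} } split ' ', $text;
}

# In-memory stand-in for a file such as samples/stoplist.txt.
my $sample = "the\nand\nof\n";
open my $fh, '<', \$sample or die "Cannot open in-memory stoplist: $!";
my $stop = read_stoplist($fh);
close $fh;

print join(' ', remove_stop_words('the cat and the hat', $stop)), "\n";    # prints "cat hat"
```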

SEE ALSO
    <http://text-similarity.sourceforge.net>

AUTHORS
     Ted Pedersen, University of Minnesota, Duluth
     tpederse at d.umn.edu

     Siddharth Patwardhan, University of Utah
     sidd at cs.utah.edu

     Satanjeev Banerjee, Carnegie Mellon University
     banerjee at cs.cmu.edu

     Jason Michelizzi

     Ying Liu, University of Minnesota, Twin Cities
     liux0395 at umn.edu

    Last modified by: $Id: README.pod,v 1.1.1.1 2013/06/26 02:38:12 tpederse
    Exp $

COPYRIGHT AND LICENSE
    Copyright (C) 2004-2008 by Jason Michelizzi, Ted Pedersen, Siddharth
    Patwardhan, Satanjeev Banerjee

    Permission is granted to copy, distribute and/or modify this document
    under the terms of the GNU Free Documentation License, Version 1.2 or
    any later version published by the Free Software Foundation; with no
    Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

    Note: a copy of the GNU Free Documentation License is available on the
    web at <http://www.gnu.org/copyleft/fdl.html> and is included in this
    distribution as FDL.txt.