# Go Porter Stemmer

A native Go clean-room implementation of the Porter Stemming Algorithm.

This algorithm is of interest to people doing Machine Learning or
Natural Language Processing (NLP).

This is NOT a port. This is a native Go implementation from the human-readable
description of the algorithm.

I've tried to make it (more) efficient by NOT using strings internally, but
instead using []rune slices and reusing the same (array) buffer that backs the
[]rune slice (and its sub-slices) at every step of the algorithm.

For the Porter Stemmer algorithm, see:

http://tartarus.org/martin/PorterStemmer/def.txt      (URL #1)

http://tartarus.org/martin/PorterStemmer/             (URL #2)
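
The description at URL #1 is built around a word's *measure* m: any word can be written as [C](VC)^m[V], where C is a run of consonants and V a run of vowels, and a letter counts as a consonant if it is not a, e, i, o or u and is not a y preceded by a consonant. The sketch below is an independent reading of that description, not this library's code; the names `isConsonant` and `measure` are hypothetical, and input is assumed to be lowercase.

```go
package main

import "fmt"

// isConsonant reports whether s[i] is a consonant per the def.txt rule:
// a letter other than a, e, i, o, u, and other than a y that follows a
// consonant. Input is assumed lowercase.
func isConsonant(s []rune, i int) bool {
	switch s[i] {
	case 'a', 'e', 'i', 'o', 'u':
		return false
	case 'y':
		// A leading y is a consonant; otherwise y is a consonant only
		// when the preceding letter is a vowel.
		if i == 0 {
			return true
		}
		return !isConsonant(s, i-1)
	default:
		return true
	}
}

// measure returns m in [C](VC)^m[V], i.e. the number of places where a
// vowel is immediately followed by a consonant.
func measure(s []rune) int {
	m := 0
	prevVowel := false
	for i := range s {
		c := isConsonant(s, i)
		if c && prevVowel {
			m++
		}
		prevVowel = !c
	}
	return m
}

func main() {
	for _, w := range []string{"tree", "by", "trouble", "ivy", "troubles", "private"} {
		fmt.Println(w, measure([]rune(w)))
	}
	// Prints: tree 0, by 0, trouble 1, ivy 1, troubles 2, private 2 (one per line).
}
```

These are the m = 0, 1 and 2 example words given in the description at URL #1.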

## Departures

When I initially implemented it, it failed the tests at...

http://tartarus.org/martin/PorterStemmer/voc.txt      (URL #3)

http://tartarus.org/martin/PorterStemmer/output.txt   (URL #4)

... After reading the human-readable text over and over again to figure out
what error I had made (and doing all sorts of things to debug it), I came to the
conclusion that some of these tests were wrong according to the human-readable
description of the algorithm.

This led me to wonder whether other people's code that passed these tests had
rules that were not in the human-readable description, which led me to look at
the source code here...

http://tartarus.org/martin/PorterStemmer/c.txt        (URL #5)

... There I noticed that some items are marked as a "DEPARTURE", meaning they
differ from the original algorithm. (There are 2 of these.)
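
In the C code at URL #5, both lines marked DEPARTURE appear to sit in step 2: the published rule `abli -> able` is replaced by `bli -> ble`, and an extra rule `logi -> log` is added. Like the other step 2 rules, they only fire when the part of the word before the suffix has measure m > 0. The sketch below shows just these two rules; the names `rule`, `departureRules`, `vowelAt`, `hasPositiveMeasure` and `applyDepartures` are hypothetical, and input is assumed to be lowercase ASCII.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is a hypothetical suffix rewrite: if a word ends in from and the
// part before it has measure m > 0, replace from with to.
type rule struct{ from, to string }

// departureRules lists only the two step 2 rules that the C code marks as
// DEPARTURE (as read from URL #5). The published description instead has
// "abli" -> "able" and no "logi" rule at all.
var departureRules = []rule{
	{"bli", "ble"},
	{"logi", "log"},
}

// vowelAt: a, e, i, o, u are vowels; y is a vowel when it follows a consonant.
func vowelAt(s string, i int) bool {
	switch s[i] {
	case 'a', 'e', 'i', 'o', 'u':
		return true
	case 'y':
		return i > 0 && !vowelAt(s, i-1)
	}
	return false
}

// hasPositiveMeasure reports m > 0, i.e. some vowel is immediately
// followed by a consonant.
func hasPositiveMeasure(stem string) bool {
	for i := 0; i+1 < len(stem); i++ {
		if vowelAt(stem, i) && !vowelAt(stem, i+1) {
			return true
		}
	}
	return false
}

// applyDepartures applies the first matching departure rule, if any.
// As in the C code, a matching suffix ends the search even when the
// measure condition fails and nothing is replaced.
func applyDepartures(word string) string {
	for _, r := range departureRules {
		if strings.HasSuffix(word, r.from) {
			stem := strings.TrimSuffix(word, r.from)
			if hasPositiveMeasure(stem) {
				return stem + r.to
			}
			return word
		}
	}
	return word
}

func main() {
	fmt.Println(applyDepartures("possibli")) // possible
	fmt.Println(applyDepartures("analogi"))  // analog
	fmt.Println(applyDepartures("bli"))      // bli (empty stem, so m = 0)
}
```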

I implemented these departures, and the tests at URL #3 and URL #4 all passed.

## Usage

To use this Go library, write something like:

    package main

    import (
      "fmt"
      "github.com/reiver/go-porterstemmer"
    )

    func main() {

      word := "Waxes"

      stem := porterstemmer.StemString(word)

      fmt.Printf("The word [%s] has the stem [%s].\n", word, stem)
    }

Alternatively, if you want to be a bit more efficient, use []rune slices instead, with code like:

    package main

    import (
      "fmt"
      "github.com/reiver/go-porterstemmer"
    )

    func main() {

      word := []rune("Waxes")

      // Stem may modify word in place, so keep a copy of the original text for printing.
      original := string(word)

      stem := porterstemmer.Stem(word)

      fmt.Printf("The word [%s] has the stem [%s].\n", original, string(stem))
    }

NOTE, however, that the above code may modify the original slice (named "word" in the example) as a
side effect, for efficiency reasons, and that the slice named "stem" may be a sub-slice of the slice
named "word".
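
If you still need the original word after stemming, one option is to hand the stemmer a private copy of the slice. To stay self-contained, the sketch below uses a hypothetical stand-in, `fakeStem`, that only mimics the side effects described in the NOTE (lowercasing in place and returning a sub-slice); its "stemming" just drops a trailing 's' and is not the Porter algorithm.

```go
package main

import (
	"fmt"
	"unicode"
)

// fakeStem is a hypothetical stand-in with the documented side effects:
// it lowercases s in place and returns a sub-slice of s.
func fakeStem(s []rune) []rune {
	for i, r := range s {
		s[i] = unicode.ToLower(r)
	}
	if n := len(s); n > 0 && s[n-1] == 's' {
		return s[:n-1]
	}
	return s
}

func main() {
	word := []rune("Waxes")

	// Calling the stemmer directly mutates word.
	stem := fakeStem(word)
	fmt.Println(string(word), string(stem)) // waxes waxe

	// Stemming a private copy leaves the caller's slice untouched.
	word = []rune("Waxes")
	buf := append([]rune(nil), word...)
	stem = fakeStem(buf)
	fmt.Println(string(word), string(stem)) // Waxes waxe
}
```

The copy costs one allocation, so it is only worth making when the original runes are needed afterwards.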

Alternatively, if you already know that your word is lowercase (and you don't need this library to
lowercase it for you), you can instead use code like:

    package main

    import (
      "fmt"
      "github.com/reiver/go-porterstemmer"
    )

    func main() {

      word := []rune("waxes")

      // StemWithoutLowerCasing may also modify word in place, so keep a copy for printing.
      original := string(word)

      stem := porterstemmer.StemWithoutLowerCasing(word)

      fmt.Printf("The word [%s] has the stem [%s].\n", original, string(stem))
    }

Again NOTE (as with the previous example) that the above code may modify the original slice (named
"word" in the example) as a side effect, for efficiency reasons, and that the slice named "stem" may
be a sub-slice of the slice named "word".
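
Whether the lowercase shortcut applies can be checked up front. A minimal sketch: the helper `isAllLower` is hypothetical, and it compares each rune with `unicode.ToLower` from Go's standard library, so runes without case (digits, hyphens) count as already lowercase.

```go
package main

import (
	"fmt"
	"unicode"
)

// isAllLower is a hypothetical helper: it reports whether lowercasing
// s rune by rune with unicode.ToLower would change nothing.
func isAllLower(s []rune) bool {
	for _, r := range s {
		if unicode.ToLower(r) != r {
			return false
		}
	}
	return true
}

func main() {
	for _, w := range []string{"waxes", "Waxes", "co-op"} {
		fmt.Println(w, isAllLower([]rune(w)))
	}
	// Prints: waxes true, Waxes false, co-op true (one per line).
}
```

When such a check returns true, `StemWithoutLowerCasing` skips work that would be a no-op; otherwise `Stem` is the safer choice.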