src/backend/snowball/README

Snowball-Based Stemming
=======================

This module uses the word stemming code developed by the Snowball project,
http://snowball.tartarus.org/
which the project releases under a BSD-style license.

The files under src/backend/snowball/libstemmer/ and
src/include/snowball/libstemmer/ are taken directly from the Snowball
libstemmer_c distribution, with only minor adjustments to file inclusions.
Note that most of these files are derived files, not master source.
The master sources are written in the Snowball language, and are available
along with the Snowball-to-C compiler from the Snowball project. We choose
to include the derived files in the PostgreSQL distribution because most
installations will not have the Snowball compiler available.

To update the PostgreSQL sources from a new Snowball libstemmer_c
distribution:

1. Copy the *.c files in libstemmer_c/src_c/ to src/backend/snowball/libstemmer,
replacing "../runtime/header.h" with "header.h", for example:

	for f in libstemmer_c/src_c/*.c
	do
		sed 's|\.\./runtime/header\.h|header.h|' "$f" >libstemmer/"$(basename "$f")"
	done

(Alternatively, if you rebuild the stemmer files from the master Snowball
sources, just omit "-r ../runtime" from the Snowball compiler switches.)

2. Copy the *.c files in libstemmer_c/runtime/ to
src/backend/snowball/libstemmer, and edit them to remove direct inclusions
of system headers such as <stdio.h>; they should include only "header.h".
(This removal avoids portability problems on some platforms where <stdio.h>
is sensitive to largefile compilation options.)

3. Copy the *.h files in libstemmer_c/src_c/ and libstemmer_c/runtime/
to src/include/snowball/libstemmer. At the time of writing, the header files
do not require any changes.

4. Check whether any stemmer modules have been added or removed. If so, edit
the OBJS list in Makefile, the list of #include directives in dict_snowball.c,
and the stemmer_modules[] table in dict_snowball.c.

5. The various stopword files in stopwords/ must be downloaded
individually from pages on the snowball.tartarus.org website.
Make sure these files are stored in UTF-8 encoding.

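The mechanical parts of steps 2, 4 and 5 can be sketched as a few shell
helper functions. This is only an illustration, not part of the PostgreSQL
or Snowball distributions: the function names are hypothetical, the sketch
assumes that stemmer modules are the files matching stem_*.c, and it assumes
that mktemp and iconv are available. Review the results by hand before
committing anything.

```shell
#!/bin/sh
# Sketch only. All function names below are hypothetical helpers.

# Step 2: print a runtime source file with direct system-header inclusions
# (lines such as '#include <stdio.h>') removed. Quoted inclusions such as
# '#include "header.h"' are left untouched.
strip_system_includes() {
    sed '/^[[:space:]]*#[[:space:]]*include[[:space:]]*</d' "$1"
}

# Step 4: list the stemmer module names found in a directory, one per line,
# sorted. Assumes (not confirmed here) that modules are named stem_*.c.
list_stemmers() {
    for f in "$1"/stem_*.c
    do
        [ -e "$f" ] && basename "$f" .c
    done | sort
}

# Step 4, continued: show modules present in only one of two directories.
# Lines starting with '<' exist only in the first directory, lines starting
# with '>' only in the second.
diff_stemmers() {
    old_list=$(mktemp) && new_list=$(mktemp) || return 1
    list_stemmers "$1" >"$old_list"
    list_stemmers "$2" >"$new_list"
    diff "$old_list" "$new_list" | sed -n '/^[<>]/p'
    rm -f "$old_list" "$new_list"
}

# Step 5: print the names of the given files that are not valid UTF-8.
# Relies on iconv failing when its input is invalid in the source encoding.
report_non_utf8() {
    for f in "$@"
    do
        iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1 || echo "$f"
    done
}
```

For example, after sourcing these definitions, running
diff_stemmers src/backend/snowball/libstemmer libstemmer_c/src_c
would point out modules that were added or removed, and
report_non_utf8 src/backend/snowball/stopwords/* would list any stopword
files that still need conversion. The output of strip_system_includes should
be redirected to the corresponding file under src/backend/snowball/libstemmer.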