src/backend/snowball/README

Snowball-Based Stemming
=======================

This module uses the word stemming code developed by the Snowball project,
http://snowballstem.org (formerly http://snowball.tartarus.org)
which is released by them under a BSD-style license.

The Snowball project is not currently making formal releases; it's best
to pull from their git repository

    git clone https://github.com/snowballstem/snowball.git

and then building the derived files is as simple as

    cd snowball
    make

At least on Linux, no platform-specific adjustment is needed.

Postgres' files under src/backend/snowball/libstemmer/ and
src/include/snowball/libstemmer/ are taken directly from the Snowball
files, with only some minor adjustments of file inclusions.  Note
that most of these files are in fact derived files, not master source.
The master sources are in the Snowball language, and are built using
the Snowball-to-C compiler that is also part of the Snowball project.
We choose to include the derived files in the PostgreSQL distribution
because most installations will not have the Snowball compiler available.

We are currently synced with the Snowball git commit
4456b82c26c02493e8807a66f30593a98c5d2888
of 2019-06-24.
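
To rebuild from the same Snowball revision that the sources are synced
with, an existing clone can be pinned to that commit before running make.
A minimal sketch, assuming a POSIX shell and git; the function name
checkout_snowball_commit is made up for illustration and is not part of
PostgreSQL or Snowball:

```shell
# Hypothetical helper: put an existing Snowball clone on one exact commit.
#   $1 = path to the clone
#   $2 = commit hash to check out
checkout_snowball_commit() {
    # --detach makes it explicit that we leave the branch and sit on the commit.
    git -C "$1" checkout --quiet --detach "$2"
}
```

For example:

    checkout_snowball_commit snowball 4456b82c26c02493e8807a66f30593a98c5d2888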

To update the PostgreSQL sources from a new Snowball version:

0. If you haven't already done so, run "make -C snowball".

1. Copy the *.c files in snowball/src_c/ to src/backend/snowball/libstemmer
with replacement of "../runtime/header.h" by "header.h", for example

    for f in .../snowball/src_c/*.c
    do
        sed 's|\.\./runtime/header\.h|header.h|' "$f" >libstemmer/`basename "$f"`
    done

2. Copy the *.c files in snowball/runtime/ to
src/backend/snowball/libstemmer, and edit them to remove direct inclusions
of system headers such as <stdio.h>; they should only include "header.h".
(This removal avoids portability problems on some platforms where <stdio.h>
is sensitive to largefile compilation options.)
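
The edit in step 2 can be partly automated with sed. The sketch below
assumes that every '#include <...>' line in the runtime files is a system
header that should go; that is an assumption, so review the result by hand.
The function name copy_runtime_sources is hypothetical:

```shell
# Hypothetical helper: copy the runtime *.c files, dropping system-header includes.
#   $1 = snowball/runtime directory
#   $2 = destination libstemmer directory
copy_runtime_sources() (
    for f in "$1"/*.c
    do
        # Delete every '#include <...>' line.  '#include "header.h"' survives
        # because it uses double quotes rather than angle brackets.
        sed '/^[[:space:]]*#[[:space:]]*include[[:space:]]*</d' "$f" \
            >"$2/$(basename "$f")"
    done
)
```

For example:

    copy_runtime_sources .../snowball/runtime src/backend/snowball/libstemmer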

3. Copy the *.h files in snowball/src_c/ and snowball/runtime/
to src/include/snowball/libstemmer.  At this writing the header files
do not require any changes.
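
Step 3 is a plain copy. A minimal sketch, with copy_snowball_headers as a
hypothetical name:

```shell
# Hypothetical helper: copy the Snowball header files unchanged.
#   $1 = top of the Snowball checkout
#   $2 = destination, e.g. src/include/snowball/libstemmer
copy_snowball_headers() {
    cp "$1"/src_c/*.h "$1"/runtime/*.h "$2"/
}
```

For example:

    copy_snowball_headers .../snowball src/include/snowball/libstemmer

Running "git status" afterwards shows which headers actually changed.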

4. Check whether any stemmer modules have been added or removed.  If so, edit
the OBJS list in Makefile, the list of #include's in dict_snowball.c, and the
stemmer_modules[] table in dict_snowball.c.  You might also need to change
the LANGUAGES list in Makefile and tsearch_config_languages in initdb.c.
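
One way to do the check in step 4 is to compare file names between the
freshly built snowball/src_c/ and the existing libstemmer directory, before
overwriting the latter. The sketch below assumes that stemmer modules are
exactly the files matching stem_*.c, so the runtime files are ignored; the
function names are hypothetical:

```shell
# Hypothetical helper: list the stem_*.c files in directory $1, one per line.
list_stemmers() (
    for f in "$1"/stem_*.c
    do
        # Skip the unexpanded pattern when the directory has no matches.
        if [ -e "$f" ]; then basename "$f"; fi
    done | LC_ALL=C sort
)

# Hypothetical helper: report stemmer files that exist on only one side.
#   $1 = new snowball/src_c directory
#   $2 = current src/backend/snowball/libstemmer directory
compare_stemmers() (
    new_list=$(mktemp) && old_list=$(mktemp) || exit 1
    list_stemmers "$1" >"$new_list"
    list_stemmers "$2" >"$old_list"
    # comm -23 keeps lines found only in the first file,
    # comm -13 keeps lines found only in the second file.
    LC_ALL=C comm -23 "$new_list" "$old_list" | sed 's/^/added: /'
    LC_ALL=C comm -13 "$new_list" "$old_list" | sed 's/^/removed: /'
    rm -f "$new_list" "$old_list"
)
```

For example:

    compare_stemmers .../snowball/src_c src/backend/snowball/libstemmer

No output means the set of stemmer files is unchanged.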

5. The various stopword files in stopwords/ must be downloaded
individually from pages on the snowballstem.org website.
Make sure that these files are stored in UTF-8 encoding.
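
The encoding requirement in step 5 can be checked mechanically. The sketch
below assumes that iconv is installed and that the stopword files are named
*.stop; the function name check_stopwords_utf8 is hypothetical:

```shell
# Hypothetical helper: print every *.stop file in directory $1 that is not
# valid UTF-8.  Returns nonzero if at least one such file was found.
check_stopwords_utf8() (
    status=0
    for f in "$1"/*.stop
    do
        # Converting UTF-8 to UTF-8 fails on any invalid byte sequence.
        if ! iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1
        then
            echo "not UTF-8: $f"
            status=1
        fi
    done
    exit "$status"
)
```

For example:

    check_stopwords_utf8 src/backend/snowball/stopwords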