libstemmer_c_README

libstemmer_c
============

This document pertains to the C version of the libstemmer distribution,
available for download from:

http://snowball.tartarus.org/dist/libstemmer_c.tgz


Compiling the library
=====================

A simple makefile is provided for Unix style systems.  On such systems, it
should be possible simply to run "make", and the file "libstemmer.o"
and the example program "stemwords" will be generated.

If this doesn't work on your system, you need to write your own build
system (or call the compiler directly).  The files to compile are
all contained in the "libstemmer", "runtime" and "src_c" directories,
and the public header file is contained in the "include" directory.

The library comes in two flavours: UTF-8 only, and UTF-8 plus other character
sets.  To use the UTF-8 only flavour, compile "libstemmer_utf8.c" instead of
"libstemmer.c".

For convenience, "mkinc.mak" is a makefile fragment listing the source files
and header files used to compile the standard version of the library.
"mkinc_utf8.mak" is a comparable makefile fragment listing just the source
files for the UTF-8 only version of the library.


Using the library
=================

The library provides a simple C API.  Essentially, a new stemmer can
be obtained by using "sb_stemmer_new".  "sb_stemmer_stem" is then
used to stem a word, "sb_stemmer_length" returns the stemmed
length of the last word processed, and "sb_stemmer_delete" is
used to delete a stemmer.

Creating a stemmer is a relatively expensive operation - the expected
usage pattern is that a new stemmer is created when needed, used
to stem many words, and deleted after some time.

Stemmers are re-entrant, but not threadsafe.  In other words, if
you wish to access the same stemmer object from multiple threads,
you must ensure that all access is protected by a mutex or similar
device.

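One way to arrange the locking described above is to keep the stemmer and
its mutex together in one structure.  The following is only a sketch under
stated assumptions: it uses POSIX threads, and the names "shared_stemmer",
"stem_fn" and "demo_stem" are hypothetical and not part of libstemmer.  The
stemmer is reached through a callback so that the sketch compiles on its own;
in a real program the callback would call "sb_stemmer_stem" and
"sb_stemmer_length" and copy the result into the caller's buffer.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: stems `word` into `out` (NUL-terminated, at
 * most out_size bytes including the NUL) and returns the stem length, or
 * -1 on failure. */
typedef int (*stem_fn)(void *stemmer, const char *word,
                       char *out, size_t out_size);

struct shared_stemmer {
    pthread_mutex_t lock;   /* serializes every use of `stemmer` */
    void *stemmer;          /* the stemmer object shared between threads */
    stem_fn stem;           /* how to stem one word with it */
};

/* Returns 0 on success, or the pthread error code. */
static int shared_stemmer_init(struct shared_stemmer *s,
                               void *stemmer, stem_fn stem)
{
    s->stemmer = stemmer;
    s->stem = stem;
    return pthread_mutex_init(&s->lock, NULL);
}

/* Stem one word while holding the lock.  The result is written into the
 * caller's own buffer before the lock is released, so no thread ever reads
 * memory that another thread's stemming call may overwrite. */
static int shared_stemmer_stem(struct shared_stemmer *s, const char *word,
                               char *out, size_t out_size)
{
    int len;

    if (pthread_mutex_lock(&s->lock) != 0)
        return -1;
    len = s->stem(s->stemmer, word, out, out_size);
    pthread_mutex_unlock(&s->lock);
    return len;
}

static void shared_stemmer_destroy(struct shared_stemmer *s)
{
    pthread_mutex_destroy(&s->lock);
}

/* Stand-in for a real stemmer, for demonstration only: it just strips one
 * trailing 's'.  This is NOT a Snowball algorithm. */
static int demo_stem(void *stemmer, const char *word,
                     char *out, size_t out_size)
{
    size_t len = strlen(word);

    (void)stemmer;                      /* unused by the stand-in */
    if (len > 1 && word[len - 1] == 's')
        len--;
    if (len >= out_size)
        return -1;
    memcpy(out, word, len);
    out[len] = '\0';
    return (int)len;
}
```

An alternative that avoids locking altogether is to create one stemmer per
thread, at the cost of the creation overhead mentioned above.
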
libstemmer does not currently incorporate any mechanism for caching the results
of stemming operations.  Such caching can greatly increase the performance of a
stemmer in certain situations, so suitable patches will be considered for
inclusion.

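Until such a mechanism exists, a cache can be layered on top of the library
by the calling program.  The following is a minimal sketch of a small
direct-mapped cache, not a proposed patch: all names in it ("stem_cache",
"stem_fn", "demo_backend" and so on) are hypothetical, the slot count and
maximum word length are arbitrary, and the stemmer is reached through a
callback so that the sketch compiles on its own.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 256   /* arbitrary size chosen for the sketch */
#define MAX_WORD    64    /* longer words are not cached or stemmed here */

/* Hypothetical callback type: stems `word` into `out` (NUL-terminated, at
 * most out_size bytes including the NUL); returns the length or -1. */
typedef int (*stem_fn)(void *ctx, const char *word,
                       char *out, size_t out_size);

struct cache_entry {
    int  used;
    char word[MAX_WORD];
    char stem[MAX_WORD];
};

struct stem_cache {
    struct cache_entry slots[CACHE_SLOTS];
    stem_fn backend;          /* called on a cache miss */
    void *ctx;                /* passed through to the backend */
    unsigned long hits, misses;
};

/* Simple multiplicative string hash (djb2 style). */
static unsigned long hash_word(const char *s)
{
    unsigned long h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

static void stem_cache_init(struct stem_cache *c, stem_fn backend, void *ctx)
{
    memset(c, 0, sizeof *c);
    c->backend = backend;
    c->ctx = ctx;
}

/* Returns the stem of `word`, taken from the cache when possible.
 * Returns NULL if the word is too long or the backend fails.  The pointer
 * stays valid only until a later lookup evicts the same slot. */
static const char *stem_cache_lookup(struct stem_cache *c, const char *word)
{
    struct cache_entry *e;
    char stem[MAX_WORD];

    if (strlen(word) >= MAX_WORD)
        return NULL;

    e = &c->slots[hash_word(word) % CACHE_SLOTS];
    if (e->used && strcmp(e->word, word) == 0) {
        c->hits++;
        return e->stem;
    }

    c->misses++;
    /* Stem into a temporary so a failed call leaves the slot untouched. */
    if (c->backend(c->ctx, word, stem, sizeof stem) < 0)
        return NULL;
    strcpy(e->word, word);
    strcpy(e->stem, stem);
    e->used = 1;
    return e->stem;
}

/* Stand-in backend for demonstration only: strips one trailing 's' and
 * counts its calls through ctx.  This is NOT a Snowball algorithm. */
static int demo_backend(void *ctx, const char *word,
                        char *out, size_t out_size)
{
    size_t len = strlen(word);

    if (len > 1 && word[len - 1] == 's')
        len--;
    if (len >= out_size)
        return -1;
    memcpy(out, word, len);
    out[len] = '\0';
    ++*(int *)ctx;
    return (int)len;
}
```

A cache of this kind helps most when the input repeats the same words often,
as running text usually does.  Like the stemmer itself, it is not threadsafe.
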
The standard libstemmer sources contain an algorithm for each of the supported
languages.  The algorithm may be selected using the English name of the
language, or using the 2- or 3-letter ISO 639 language codes.  In addition,
the traditional "Porter" stemming algorithm for English is included for
backwards compatibility purposes, but we recommend use of the "English"
stemmer in preference for new projects.

(Some minor algorithms which are included only as curiosities in the snowball
website, such as the Lovins stemmer and the Kraaij Pohlmann stemmer, are not
included in the standard libstemmer sources.  These are not really supported by
the snowball project, but it would be possible to compile a modified libstemmer
library containing these if desired.)


The stemwords example
=====================

The stemwords example program allows you to run any of the stemmers
compiled into the libstemmer library on a sample vocabulary.  For
details on how to use it, run it with the "-h" command line option.


Using the library in a larger system
====================================

If you are incorporating the library into the build system of a larger
program, I recommend copying the unpacked tarball without modification into
a subdirectory of the sources of your program.  Future versions of the
library are intended to keep the same structure, so this will keep the
work required to move to a new version of the library to a minimum.

As an additional convenience, the list of source and header files used
in the library is detailed in mkinc.mak - a file which is in a suitable
format for inclusion by a Makefile.  By including this file in your build
system, you can link the snowball system into your program with a few
extra rules.

Using the library in a system using GNU autotools
=================================================

The libstemmer_c library can be integrated into a larger system which uses the
GNU autotools framework (and in particular, automake and autoconf) as follows:

1) Unpack libstemmer_c.tgz in the top level project directory so that there is
   a libstemmer_c subdirectory of the top level directory of the project.

2) Add a file "Makefile.am" to the unpacked libstemmer_c folder, containing:

noinst_LTLIBRARIES = libstemmer.la
include $(srcdir)/mkinc.mak
noinst_HEADERS = $(snowball_headers)
libstemmer_la_SOURCES = $(snowball_sources)

(You may also need to add other lines to this, for example, if you are using
compiler options which are not compatible with compiling the libstemmer
library.)

3) Add libstemmer_c/Makefile to the AC_CONFIG_FILES declaration in the
   project's configure.ac file.

4) Add to the top level "Makefile.am" the following lines (or modify existing
   assignments to these variables appropriately):

AUTOMAKE_OPTIONS = subdir-objects
AM_CPPFLAGS = -I$(top_srcdir)/libstemmer_c/include
SUBDIRS=libstemmer_c
<name>_LIBADD = libstemmer_c/libstemmer.la

(Where <name> is the name of the library which links against libstemmer.
For an executable, automake uses <name>_LDADD instead of <name>_LIBADD.)


libstemmer_java_README

libstemmer_java
===============

This document pertains to the Java version of the libstemmer distribution,
available for download from:

http://snowball.tartarus.org/dist/libstemmer_java.tgz


Compiling the library
=====================

Simply run the Java compiler on all the Java source files under the java
directory.  For example, this can be done under Unix by changing directory into
the java directory, and running:

 javac org/tartarus/snowball/*.java org/tartarus/snowball/ext/*.java

This will compile the library and also an example program "TestApp" which
provides a command line interface to the library.


Using the library
=================

There is currently no formal documentation on the use of the Java version
of the library.  Additionally, its interface is not guaranteed to be
stable.

The best documentation of the library is the source of the TestApp example
program.


The TestApp example
===================

The TestApp example program allows you to run any of the stemmers
compiled into the libstemmer library on a sample vocabulary.  For
details on how to use it, run it with no command line parameters.