* There are packages available for most Linux distributions through the usual channels.
* The CLucene SourceForge website also has some distributions available.

This document also explains how to build from source, troubleshoot problems,
measure performance, and create a new distribution.


Building from source:
---------------------

Dependencies:
* CMake version 2.4.2 or later.
* A functioning and fairly new C++ compiler. We test mostly on GCC and Visual Studio 6+.
  Anything other than that may not work.
* Something to unzip/untar the source code.

Build instructions:
1.) Download the latest source code from http://www.sourceforge.net/projects/clucene
    [Choose stable if you want the 'time tested' version of the code. However, the
    unstable version will often suit your needs better, since it is newer and has had
    more work put into it. The decision is up to you.]
2.) Unpack the tarball/zip/bzip/whatever.
3.) Open a command prompt, terminal window, or cygwin session.
4.) Change directory into the root of the source code (from now on referred to as <clucene>)
# cd <clucene>
5.) Create and change directory into an 'out-of-source' directory for your build.
    [This is by far the easiest way to build, and it lets you create different
    types of builds from the same source tree.]
# mkdir <clucene>/build-name
# cd <clucene>/build-name
6.) Configure using cmake. This can be done in many different ways, but the basic syntax is
# cmake [-G "Generator name"] ..
    [Where "Generator name" is the name of the build scripts to generate (e.g. Visual Studio 8 2005).
    A list of supported generators can be found by calling]
# cmake --help
7.) You can configure several options such as the build type, debugging information,
    mmap support, etc., by using the CMake GUI or by calling
# ccmake ..
    Make sure you run configure again if you make any changes.
8.) Start the build. This depends on which generator you specified, but it would be something like
# make
or
# nmake
    Or open the solution files with your IDE.

    [You can also build just a certain target, such as cl_test, cl_demo,
    clucene-core (shared library), or clucene-core-static (static library).]
9.) The binary files will be available in <clucene>/build-name/bin
10.) Test the code. (The tests must be built first; this is done by default, or by calling make cl_test.)
# ctest -V
11.) At this point you can install the library:
# make install
    [There are options to do this from the IDE, but I find it easier to create a
    distribution (see instructions below) and install that instead.]
or
# make cl_demo
    [This creates the demo application, which demonstrates simple text indexing and searching.]
or
    Adjust build values using ccmake or the CMake GUI and rebuild.

12.) Now you can develop your own code. This is beyond the scope of this document.
    Read the README for information about documentation or to get help on the mailing list.
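
The configure, build, and test steps above can be strung together in a small
script. The following is only a sketch under some assumptions: the variable
names CLUCENE_SRC, BUILD_NAME, GENERATOR and DRY_RUN are made up for this
example, the "Unix Makefiles" generator is assumed, and the build type is set
with -DCMAKE_BUILD_TYPE instead of through ccmake. With DRY_RUN=1 (the default
here) the commands are only printed.

```shell
#!/bin/sh
# Sketch of the out-of-source build described in the steps above.
# CLUCENE_SRC, BUILD_NAME, GENERATOR and DRY_RUN are hypothetical names
# chosen for this example; they are not part of CLucene.
CLUCENE_SRC=${CLUCENE_SRC:-$HOME/clucene}
BUILD_NAME=${BUILD_NAME:-build-release}
GENERATOR=${GENERATOR:-Unix Makefiles}
DRY_RUN=${DRY_RUN:-1}

# Print a command and, unless DRY_RUN is 1, execute it in the current shell.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@" || return 1
    fi
}

build_clucene() {
    run mkdir -p "$CLUCENE_SRC/$BUILD_NAME" &&
    run cd "$CLUCENE_SRC/$BUILD_NAME" &&
    run cmake -G "$GENERATOR" -DCMAKE_BUILD_TYPE=Release .. &&
    run make &&
    run ctest -V
}

build_clucene
```

Run it with DRY_RUN=0 once the printed commands look right for your machine.
Using one build directory per configuration (for example build-debug and
build-release) keeps the different builds from overwriting each other.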

Other platforms:
----------------
Some platforms require specific actions to get cmake working. Here are some general tips:

Solaris:
I had problems when using the standard STL library; using the stlport4 switch worked. I also
had to specify the compiler on the command line, along the lines of:
cmake -DCMAKE_CXX_COMPILER=xxx -DCMAKE_CXX_FLAGS=-library=stlport4 ..

Building Performance
--------------------
Using ccache will speed up repeated builds considerably. I found it easiest to add the
/usr/lib/ccache directory to the beginning of your PATH. This works for most common compilers.

PATH=/usr/lib/ccache:$PATH

Note: you must do this BEFORE you configure the build with cmake, since the compiler path
cannot be changed after it has been configured.
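
If you put the PATH change into a shell profile, it helps to guard it so the
directory is only added when it exists and is not added twice. This is a small
sketch; the function name prepend_path is made up for this example, and
/usr/lib/ccache is the location mentioned above (it may differ on your system).

```shell
#!/bin/sh
# Sketch: prepend a directory to PATH only if it exists and is not
# already listed. prepend_path is a hypothetical helper name.
prepend_path() {
    dir=$1
    case ":$PATH:" in
        *":$dir:"*)
            # Already on PATH, nothing to do.
            ;;
        *)
            if [ -d "$dir" ]; then
                PATH="$dir:$PATH"
            fi
            ;;
    esac
    export PATH
}

# Must happen before cmake is run for the first time in a build directory.
prepend_path /usr/lib/ccache
```

After this, running cmake in a fresh build directory should pick up the
compiler wrappers from the ccache directory, provided ccache is installed there.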

Installing:
-----------
By default, CLucene is installed under CMAKE_INSTALL_PREFIX.

CLucene used to put the config headers next to the library. This was done
because these headers are generated and are specific to the library build, whereas
CMAKE_INSTALL_PREFIX was for system-independent files. The idea is that
you could have several versions of the library installed (ASCII version,
UCS-2 version, multithreaded, etc.) and have only one set of headers.
In version 0.9.24+ this is still possible, but you have to use
LUCENE_SYS_INCLUDES to specify where to install these files.

Troubleshooting:
----------------

'Too many open files'
Some platforms don't provide enough file handles to run CLucene properly.
To solve this, increase the open file limit:

On Solaris, for the current shell:
ulimit -n 1024
or system-wide, in /etc/system:
set rlim_fd_cur=1024
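
The shell limit can also be checked and raised from a script before the tests
are started. This is a sketch only: the function name raise_nofile_limit and
the WANTED variable are made up for this example, and it assumes a shell whose
ulimit builtin supports -S and -n (bash and dash do). It changes the soft limit
only, so the hard limit is left alone.

```shell
#!/bin/sh
# Sketch: raise the soft open-file limit to at least the requested value.
# raise_nofile_limit and WANTED are hypothetical names for this example;
# 1024 is the value suggested above.
WANTED=${WANTED:-1024}

raise_nofile_limit() {
    soft=$(ulimit -S -n)
    if [ "$soft" = "unlimited" ]; then
        return 0
    fi
    if [ "$soft" -lt "$1" ]; then
        # This fails when the hard limit is lower than the requested value.
        if ! ulimit -S -n "$1" 2>/dev/null; then
            echo "could not raise the open file limit above $soft" >&2
            return 1
        fi
    fi
    return 0
}

raise_nofile_limit "$WANTED" || true
echo "open file limit is now: $(ulimit -S -n)"
```

The new limit applies to the current shell and to programs started from it, so
run the tests from the same shell session.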

GDB - GNU debugging tool (Linux only)
-------------------------------------
If you get a crash, try the following. More information on GDB can be found on the internet.

# gdb bin/cl_test
# gdb> run
When gdb shows a crash, run
# gdb> bt
A backtrace will be printed. This may help to solve the problem.
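
The same run/bt sequence can be done without typing at the gdb prompt, which is
handy when attaching a backtrace to a bug report. This is a sketch; the
function name backtrace_on_crash and the TEST_BIN variable are made up for this
example. It uses gdb's -batch, -ex and --args options.

```shell
#!/bin/sh
# Sketch: run a program under gdb non-interactively and print a backtrace.
# backtrace_on_crash and TEST_BIN are hypothetical names for this example.
TEST_BIN=${TEST_BIN:-bin/cl_test}

backtrace_on_crash() {
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found" >&2
        return 127
    fi
    if [ ! -x "$1" ]; then
        echo "not an executable: $1" >&2
        return 126
    fi
    # -batch: exit after the commands have run
    # -ex:    execute the given gdb command (run, then bt)
    # --args: everything after it is the program and its arguments
    gdb -batch -ex run -ex bt --args "$@"
}

backtrace_on_crash "$TEST_BIN" || echo "no backtrace produced (exit status $?)" >&2
```

Redirect the output to a file if you want to keep it. A Debug build gives much
more readable backtraces than an optimised one.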

Code layout
-----------
File locations:
* clucene-config.h is required and is distributed next to the library, so that multiple libraries can exist on the
  same machine but use the same header files.
* _HeaderFile.h files are private and are not to be used or distributed by anything besides the clucene-core library.
* _clucene-config.h should NOT be used; it is also internal.
* HeaderFile.h files are public and are distributed; the classes within should be exported using CLUCENE_EXPORT.
* The exception to the internal/public convention is the static library. In this case the internal
  symbols will be available (this is how the test program tests internal code). However, this is not recommended.

Memory management
-----------------
Memory in CLucene has been difficult to manage because of the
unclear specification of who owns what memory. This was mostly a result of
CLucene's Java-esque coding style, which came from porting from Java to C++ without
much rewriting of the API. However, CLucene is slowly improving
in this respect, and we try to follow these development and coding rules (though
we don't guarantee that they are all met at this stage):

1. Whenever possible the caller must create the object that is being filled. For example:
IndexReader->getDocument(id, document);
As opposed to the old method of document = IndexReader->getDocument(id);

2. Clone always returns a new object that must be cleaned up manually.

Questions:
1. What should be the convention for an object taking ownership of memory?
   Some documentation is available on this, but not much.

Working with valgrind
---------------------
Valgrind reports memory leaks and other memory problems. Tests should always run
cleanly under valgrind before they are considered to pass.

# valgrind --leak-check=full <program>
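
To use valgrind as a pass/fail check in a script, valgrind's --error-exitcode
option can be added so that reported errors turn into a non-zero exit status.
This is a sketch; the function name leak_check is made up for this example, and
it assumes you run it from the build directory so that bin/cl_test exists.

```shell
#!/bin/sh
# Sketch: run a program under valgrind and fail if valgrind reports errors.
# leak_check is a hypothetical helper name for this example.
leak_check() {
    if ! command -v valgrind >/dev/null 2>&1; then
        echo "valgrind not found" >&2
        return 127
    fi
    # --error-exitcode=1 makes valgrind exit with status 1 when it has
    # reported errors; otherwise the program's own exit status is returned.
    valgrind --leak-check=full --error-exitcode=1 "$@"
}

if [ -x bin/cl_test ]; then
    if leak_check bin/cl_test; then
        echo "valgrind: no errors reported"
    else
        echo "valgrind reported errors, or the tests themselves failed" >&2
    fi
else
    echo "bin/cl_test not found; run this from the build directory" >&2
fi
```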

Memory leak tracking with dmalloc
---------------------------------
dmalloc (http://dmalloc.com/) is also a nice tool for finding memory leaks.
To enable it, set the ENABLE_DMALLOC flag to ON in cmake. You will of course
need to have the dmalloc library installed for this to work.

By default, cl_test prints a small number of errors and leaks into
the dmalloc.log.txt file (however, this has a tendency to report false positives).
You can override this by setting the DMALLOC_OPTIONS environment variable.
See http://dmalloc.com/ or dmalloc --usage for more information on how to use dmalloc.

For example:
# DMALLOC_OPTIONS=medium,log=dmalloc.log.txt
# export DMALLOC_OPTIONS

UPDATE: when I upgraded my machine to Ubuntu 9.04, dmalloc stopped working (it caused
CLucene to crash).

Performance with callgrind
--------------------------
This is really simple:

# valgrind --tool=callgrind <command: e.g. bin/cl_test>
This will create a file like callgrind.out.12345. You can open this with kcachegrind or a
similar tool.
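
If no GUI is available, the profile can be written to a fixed file name and
summarised on the terminal with callgrind_annotate, which is installed together
with valgrind. This is a sketch; the function name profile_with_callgrind and
the PROFILE_OUT variable are made up for this example.

```shell
#!/bin/sh
# Sketch: profile a program with callgrind and print a text summary.
# profile_with_callgrind and PROFILE_OUT are hypothetical names.
PROFILE_OUT=${PROFILE_OUT:-callgrind.out.cl_test}

profile_with_callgrind() {
    out=$1
    shift
    if ! command -v valgrind >/dev/null 2>&1; then
        echo "valgrind not found" >&2
        return 127
    fi
    # --callgrind-out-file picks the output name instead of callgrind.out.<pid>
    valgrind --tool=callgrind --callgrind-out-file="$out" "$@" || return $?
    # Print the first part of the flat profile, most expensive functions first.
    callgrind_annotate "$out" | head -n 40
}

if [ -x bin/cl_test ]; then
    profile_with_callgrind "$PROFILE_OUT" bin/cl_test
else
    echo "bin/cl_test not found; run this from the build directory" >&2
fi
```

The output file can still be opened in kcachegrind afterwards.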


Performance with gprof
----------------------
Note: I recommend callgrind; it works much better.

Compile with gprof turned on (ENABLE_GPROF in the CMake GUI or ccmake).
I've found (at least on Windows with cygwin) that gprof didn't work across
DLL boundaries; running the cl_test-pedantic monolithic build worked better.

This is typically what I use to produce some meaningful output after a -pg
compiled application has exited:
# gprof bin/cl_test-pedantic.exe gmon.out >gprof.txt

Code coverage with gcov
-----------------------
To create a code coverage report of the tests, you can use gcov. Here are the
steps I followed to create a nice HTML report. I recommend using an out-of-source
build directory, as lots of files will be generated.

NOTE: you must have the lcov package installed to generate the HTML report.

* It is normally recommended to compile with no optimisations, so change CMAKE_BUILD_TYPE
  to Debug.

* I have created a cl_test-gcov target which already contains the necessary gcc switches,
  so all you need to do is
# make test-gcov

If everything goes well, there will be a directory called code-coverage containing the report.

If you want to do this process manually, then:
# lcov --directory ./src/test/CMakeFiles/cl_test-gcov.dir/__/core/CLucene -c -o clucene-coverage.info
# lcov --remove clucene-coverage.info "/usr/*" > clucene-coverage.clean
# genhtml -o clucene-coverage clucene-coverage.clean

If all of those commands succeed, there will be a CLucene coverage report in the
clucene-coverage directory.

Benchmarks
----------
Very little benchmarking has been done on CLucene. Andi Vajda posted some
limited statistics on the CLucene list a while ago, with the following results.

There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
6108kb of HTML text.
org.apache.lucene.demo.IndexFiles with java and gcj,
on Mac OS X 10.3.1 (Panther), PowerBook G4 1GHz, 1GB:
    . running with java 1.4.1_01-99 : 20379 ms
    . running with gcj 3.3.2 -O2    : 17842 ms
    . running clucene 0.8.9's demo  :  9930 ms

I recently did some more tests and came up with these rough results:
663mb (797 files) of Gutenberg texts
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields
- Jlucene: 646453ms. peak mem usage ~72mb, avg ~14mb ram
- Clucene: 232141ms. peak mem usage ~60mb, avg ~4mb ram

Searching the index using 10,000 single word queries
- Jlucene: ~60078ms and used ~13mb ram
- Clucene: ~48359ms and used ~4.2mb ram
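
To put the timings above into proportion, the ratios can be worked out with a
line of awk. This is just arithmetic on the numbers quoted above; the function
name speedup is made up for this example.

```shell
#!/bin/sh
# Sketch: compute how many times faster the second timing is than the first.
# speedup is a hypothetical helper name; the inputs are the millisecond
# figures quoted in the benchmark lists above.
speedup() {
    awk -v base="$1" -v other="$2" 'BEGIN { printf "%.2f\n", base / other }'
}

echo "HTML docs, java vs clucene demo : $(speedup 20379 9930)x"
echo "Gutenberg indexing              : $(speedup 646453 232141)x"
echo "10,000 single word queries      : $(speedup 60078 48359)x"
```

This works out to roughly 2.05x, 2.78x and 1.24x respectively. The tests are
rough, so treat the ratios as an indication only.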

Distribution
------------
CPack is used for creating distributions.
* Create an out-of-source build as usual.
* Make sure the version number is correct (see <clucene>/CMakeLists.txt, right at the top of the file).
* Make sure you are compiling in the correct release mode (check ccmake or the CMake GUI).
* Make sure you enable ENABLE_PACKAGING (check ccmake or the CMake GUI).
* Next, check that the package is compliant using several tests (must be done from a Linux terminal or cygwin):
# cd <clucene>/build-name
# ../dist-check.sh
* Make sure the source directory is clean. Make sure there are no unknown svn files:
# svn stat ..
* Run the tests to make sure that the code is ok (documented above).
* If all tests pass, then run
# make package
for the binary package (and header files). This will only create a tar.gz package.
and/or
# make package_source
for the source package. This will create a ZIP on Windows, and tar.bz2 and tar.gz packages on other platforms.

There are also options for creating RPM, Cygwin, NSIS, Debian packages, etc. It depends on your version of CPack.
Call
# cpack --help
to get a list of generators.

Then create a special package by calling
# cpack -G <GENERATOR> CPackConfig.cmake
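
The checklist above can be turned into a script that stops at the first failing
step. This is a sketch under some assumptions: it is meant to be run from
inside the build directory, the names DRY_RUN, run and make_distribution are
made up for this example, and the options are passed with -D on the cmake
command line instead of being set in ccmake. With DRY_RUN=1 (the default here)
the commands are only printed.

```shell
#!/bin/sh
# Sketch of the release checklist above; run from <clucene>/build-name.
# DRY_RUN, run and make_distribution are hypothetical names for this example.
DRY_RUN=${DRY_RUN:-1}

# Print a command and, unless DRY_RUN is 1, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@" || return 1
    fi
}

make_distribution() {
    # Setting the cache variables with -D is assumed to be equivalent to
    # switching them on in ccmake or the CMake GUI.
    run cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_PACKAGING=ON .. &&
    run ../dist-check.sh &&
    run svn stat .. &&
    run make &&
    run ctest -V &&
    run make package &&
    run make package_source
}

make_distribution
```

The svn stat output still has to be read by a person: lines starting with '?'
are files that svn does not know about.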