* There are packages available for most Linux distributions through the usual channels.
* The CLucene SourceForge website also has some distributions available.

This document also explains how to build from source, troubleshoot problems,
measure performance, and create a new distribution.


Building from source:
---------------------

Dependencies:
* CMake version 2.4.2 or later.
* A functioning and fairly new C++ compiler. We test mostly on GCC and Visual Studio 6+.
  Anything other than that may not work.
* Something to unzip/untar the source code.

Build instructions:
1.) Download the latest source code from http://www.sourceforge.net/projects/clucene
    [Choose stable if you want the 'time tested' version of the code. However, the
    unstable version will often suit your needs better, since it is newer and has had
    more work put into it. The decision is up to you.]
2.) Unpack the tarball/zip/bzip/whatever.
3.) Open a command prompt, terminal window, or cygwin session.
4.) Change directory into the root of the source code (from now on referred to as <clucene>)
# cd <clucene>
5.) Create and change directory into an 'out-of-source' directory for your build.
    [This is by far the easiest way to build, and it lets you create different types
    of builds from the same source tree.]
# mkdir <clucene>/build-name
# cd <clucene>/build-name
6.) Configure using cmake. This can be done in many different ways, but the basic syntax is
# cmake [-G "Script name"] ..
    [Where "Script name" is the name of the build scripts to generate (e.g. Visual Studio 8 2005).
    A list of supported build scripts can be found by calling]
# cmake --help
7.) You can configure several options such as the build type, debugging information,
    mmap support, etc., by using the CMake GUI or by calling
# ccmake ..
    Make sure you call configure again if you make any changes.
8.) Start the build.
    This depends on which build script you specified, but it would be something like
# make
or
# nmake
    Or open the solution files with your IDE.

    [You can also build just a certain target, such as cl_test, cl_demo,
    clucene-core (shared library), or clucene-core-static (static library).]
9.) The binary files will be available in <clucene>/build-name/bin
10.) Test the code (after building the tests, which is done by default or by calling make cl_test):
# ctest -V
11.) At this point you can install the library:
# make install
    [There are options to do this from the IDE, but I find it easier to create a
    distribution (see instructions below) and install that instead.]
or
# make cl_demo
    [This creates the demo application, which demonstrates simple text indexing and searching.]
or
    Adjust build values using ccmake or the CMake GUI and rebuild.

12.) Now you can develop your own code. This is beyond the scope of this document.
    Read the README for information about documentation or to get help on the mailing list.

Other platforms:
----------------
Some platforms require specific actions to get cmake working. Here are some general tips:

Solaris:
I had problems when using the standard STL library. Using the -stlport4 switch worked. I had
to specify the compiler from the command line: cmake -DCXX_COMPILER=xxx -stlport4

Building Performance
--------------------
Use of ccache will speed up build times a lot. I found it easiest to add the /usr/lib/ccache
directory to the beginning of your PATH. This works for most common compilers.

PATH=/usr/lib/ccache:$PATH

Note: you must do this BEFORE you run the cmake configure step, since the compiler path
cannot be changed after it has been configured.

Installing:
-----------
CLucene is installed in CMAKE_INSTALL_PREFIX by default.

CLucene used to put config headers next to the library.
This was done because these headers are generated and are relevant to the library.
CMAKE_INSTALL_PREFIX was for system-independent files. The idea is that
you could have several versions of the library installed (ASCII version,
UCS2 version, multithreaded, etc.) and have only one set of headers.
In version 0.9.24+ we allow this feature, but you have to use
LUCENE_SYS_INCLUDES to specify where to install these files.

Troubleshooting:
----------------

'Too many open files'
Some platforms don't provide enough file handles to run CLucene properly.
To solve this, increase the open file limit:

On Solaris:
ulimit -n 1024
set rlim_fd_cur=1024

GDB - GNU debugging tool (linux only)
-------------------------------------
If you get an error, try doing this. More information on GDB can be found on the internet.

# gdb bin/cl_test
# gdb> run
When gdb shows a crash, run
# gdb> bt
A backtrace will be printed. This may help to solve any problems.

Code layout
-----------
File locations:
* clucene-config.h is required and is distributed next to the library, so that multiple libraries can exist on the
  same machine, but use the same header files.
* _HeaderFile.h files are private, and are not to be used or distributed by anything besides the clucene-core library.
* _clucene-config.h should NOT be used; it is also internal.
* HeaderFile.h files are public and are distributed, and the classes within should be exported using CLUCENE_EXPORT.
* The exception to the internal/public conventions is if you use the static library. In this case the internal
  symbols will be available (this is how the test program tests internal code). However, this is not recommended.

Memory management
-----------------
Memory in CLucene has been a bit of a difficult thing to manage because of the
unclear specification about who owns what memory.
This was mostly a result of
CLucene's Java-esque coding style, which came from porting from Java to C++ without
too much rewriting of the API. However, CLucene is slowly improving
in this respect and we try to follow these development and coding rules (though
we don't guarantee that they are all met at this stage):

1. Whenever possible the caller must create the object that is being filled. For example:
IndexReader->getDocument(id, document);
As opposed to the old method of document = IndexReader->getDocument(id);

2. Clone always returns a new object that must be cleaned up manually.

Questions:
1. What should be the convention for an object taking ownership of memory?
   Some documentation is available on this, but not much.

Working with valgrind
---------------------
Valgrind reports memory leaks and memory problems. Tests should always run clean
under valgrind before they are considered passed.

# valgrind --leak-check=full <program>

Memory leak tracking with dmalloc
---------------------------------
dmalloc (http://dmalloc.com/) is also a nice tool for finding memory leaks.
To enable it, set the ENABLE_DMALLOC flag to ON in cmake. You will of course
have to have the dmalloc lib installed for this to work.

The cl_test file will by default print a low number of errors and leaks into
the dmalloc.log.txt file (however, this has a tendency to print false positives).
You can override this by setting your environment variable DMALLOC_OPTIONS.
See http://dmalloc.com/ or dmalloc --usage for more information on how to use dmalloc.

For example:
# DMALLOC_OPTIONS=medium,log=dmalloc.log.txt
# export DMALLOC_OPTIONS

UPDATE: when I upgraded my machine to Ubuntu 9.04, dmalloc stopped working (it caused
clucene to crash).

Performance with callgrind
--------------------------
Really simple:

# valgrind --tool=callgrind <command: e.g. bin/cl_test>
This will create a file like callgrind.out.12345. You can open this with kcachegrind or
a similar tool.


Performance with gprof
----------------------
Note: I recommend callgrind; it works much better.

Compile with gprof turned on (ENABLE_GPROF in the cmake gui or using ccmake).
I've found (at least on windows cygwin) that gprof wasn't working over
dll boundaries; running the cl_test-pedantic monolithic build worked better.

This is typically what I use to produce some meaningful output after a -pg
compiled application has exited:
# gprof bin/cl_test-pedantic.exe gmon.out >gprof.txt

Code coverage with gcov
-----------------------
To create a code coverage report of the tests, you can use gcov. Here are the
steps I followed to create a nice html report. You'll need the lcov package
installed to generate html. Also, I recommend using an out-of-source build
directory as there are lots of files that will be generated.

NOTE: you must have lcov installed for this to work

* It is normally recommended to compile with no optimisations, so change CMAKE_BUILD_TYPE
  to Debug.

* I have created a cl_test-gcov target which already contains the necessary gcc switches.
  So all you need to do is
# make test-gcov

If everything goes well, there will be a directory called code-coverage containing the report.

If you want to do this process manually, then:
# lcov --directory ./src/test/CMakeFiles/cl_test-gcov.dir/__/core/CLucene -c -o clucene-coverage.info
# lcov --remove clucene-coverage.info "/usr/*" > clucene-coverage.clean
# genhtml -o clucene-coverage clucene-coverage.clean

If all of these commands succeed, there will be a clucene coverage report in the
clucene-coverage directory.

Benchmarks
----------
Very little benchmarking has been done on clucene.
Andi Vajda posted some
limited statistics on the clucene list a while ago with the following results.

There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
6108kb of HTML text.
org.apache.lucene.demo.IndexFiles with java and gcj:
on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
    . running with java 1.4.1_01-99 : 20379 ms
    . running with gcj 3.3.2 -O2    : 17842 ms
    . running clucene 0.8.9's demo  :  9930 ms

I recently did some more tests and came up with these rough results:
663mb (797 files) of Gutenberg texts
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields
- Jlucene: 646453ms, peak mem usage ~72mb, avg ~14mb ram
- Clucene: 232141ms, peak mem usage ~60mb, avg ~4mb ram

Searching the index using 10,000 single word queries
- Jlucene: ~60078ms and used ~13mb ram
- Clucene: ~48359ms and used ~4.2mb ram

Distribution
------------
CPack is used for creating distributions.
* Create an out-of-source build as per usual
* Make sure the version number is correct (see <clucene>/CMakeLists.txt, right at the top of the file)
* Make sure you are compiling in the correct release mode (check ccmake or the cmake gui)
* Make sure you enable ENABLE_PACKAGING (check ccmake or the cmake gui)
* Next, check that the package is compliant using several tests (must be done from a linux terminal, or cygwin):
# cd <clucene>/build-name
# ../dist-check.sh
* Make sure the source directory is clean. Make sure there are no unknown svn files:
# svn stat ..
* Run the tests to make sure that the code is ok (documented above)
* If all tests pass, then run
# make package
for the binary package (and header files). This will only create a tar.gz package.
and/or
# make package_source
for the source package. This will create a ZIP on windows, and tar.bz2 and tar.gz packages on other platforms.
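
The checklist above can be condensed into a small shell sketch. This is an
illustration only: release_steps is a hypothetical helper name (it is not part
of CLucene), build-name is the placeholder directory used throughout this
document, and the function only prints the commands from this section in order
so that they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch only: prints the release commands from the checklist, in order.
# release_steps is a hypothetical helper, not something shipped with CLucene.
release_steps() {
    build_dir="${1:-build-name}"    # out-of-source build directory
    echo "cd $build_dir"
    echo "../dist-check.sh"         # package compliance checks
    echo "svn stat .."              # look for unknown files in the source tree
    echo "ctest -V"                 # run the test suite
    echo "make package"             # binary package (tar.gz)
    echo "make package_source"      # source package
}

release_steps build-name
```

To actually run the steps, the output could be piped into a shell from the
<clucene> directory, for example: release_steps build-name | sh -e
(the -e option stops at the first failing command).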

There are also options for creating RPM, Cygwin, NSIS, Debian packages, etc. It depends on your version of CPack.
Call
# cpack --help
to get a list of generators.

Then create a special package by calling
# cpack -G <GENERATOR> CPackConfig.cmake

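If several package types are needed, the cpack call above can be repeated once
per generator. The sketch below assumes a hypothetical helper named
cpack_commands and uses TGZ and ZIP purely as example generator names; check
the output of cpack --help for the generators your CPack version really
offers. Like the command above, it is meant to be run from the build
directory, and it only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints one cpack command per requested generator.
# cpack_commands is a hypothetical helper; generator names must come from
# the list that "cpack --help" prints for your CPack version.
cpack_commands() {
    for generator in "$@"; do
        echo "cpack -G $generator CPackConfig.cmake"
    done
}

cpack_commands TGZ ZIP
```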