1\documentclass{report}
2\usepackage{verbatim}
3\usepackage{emboss}
4
5\begin{document}
6\title{The \EMBOSS\ Administrator's Guide}
7\author{David Martin, EMBnet Norway \\
8Peter Rice, LION Bioscience \\
9Alan Bleasby, HGMP (EMBnet UK)}
10\date{This guide relates to \EMBOSS\ 2.5.0}
11
12\maketitle
13
14Copyright (c) 2000, 2002 David Martin, Peter Rice, Alan Bleasby.
15
16Permission is granted to copy, distribute and/or modify this document
17under the terms of the GNU Free Documentation
18License\URL{http://www.gnu.org/copyleft/fdl.html}, Version 1.1 or any
19later version published by the Free Software Foundation; with no
20Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
21Texts.  A copy of the license is included in the chapter entitled "GNU
22Free Documentation License".
23
24\tableofcontents
25
26\chapter{Introduction}
27\section{About this document}
28This guide has been written to assist system administrators and
29developers with the installation and configuration of \EMBOSS. If you
30are reading this to find out how to do bioinformatics then you are
31wasting your time. You are referred instead to the Resources chapter
32below where there is a list of more relevant literature and web sites.
33Experienced users may find this document useful for configuring their
34own databases and customising their \EMBOSS\ experience.
35
36
37\subsection{Credits}
38The original author of this guide was David
39Martin\URL{damartin\@@hgmp.mrc.ac.uk} at the Norwegian EMBnet
40node.\URL{http://www.no.embnet.org} It is however the result of a team
41effort. Thanks are due in particular to Johann Visagie for the FreeBSD
42information. Other contributors are acknowledged in the text.
43
44\subsection{Reproduction}
45The obligatory bit of legalese. The first version of this guide was
46not in the public domain but has been released under the GNU Free
47Documentation License by the original author.
48
49Although 'Free' in this license is usually explained as 'free as in
50freedom, not as in beer' the authors are likely to appreciate offers
51of free drinks should you ever meet them.
52\section{What is \EMBOSS?}
53
54\EMBOSS\ is a freely available suite of bioinformatics applications
55and libraries. It can be downloaded via the internet, copied,
56customised, and passed on under the terms of the various General
57Public Licenses.  \EMBOSS\ has been developed in response to the need
58for a powerful, adaptable suite of software that can interface readily
with many different situations and meet the needs of professional
60bioinformaticists, particularly those needing high throughput and/or
61scriptable capabilities.
62
63\EMBOSS\ has primarily been developed by those responsible for the
public extensions to the GCG package. \EMBOSS\ supersedes much of EGCG
65and includes far better database interaction. \EMBOSS\ also has the
66benefit of freely accessible source code so novel applications can be
67developed rapidly and at minimal cost.
68
69\EMBOSS\ is currently only available for Unix/Linux systems but it has
70been known to compile and run on Windows NT. This document will only
71consider the UNIX version and will assume the reader has some
72familiarity with UNIX system administration.
73
74\subsection{Where to get it?}
75
76\EMBOSS\ is available for download from the primary site at Open-Bio
77by anonymous ftp.\URL{ftp://emboss.open-bio.org/pub/EMBOSS/} This
78directory contains the \EMBOSS\ package and several associated
79packages (collectively known as EMBASSY) that are distributed with
80\EMBOSS. Download these to a suitable location. Documentation is
81available on the WWW at the \EMBOSS\ web
82site.\URL{http://emboss.sf.net/}
83
84FreeBSD distributions from 4.2 onwards now include \EMBOSS\ as an
85optional package maintained by Johann
86Visagie.\URL{johann\@@egenetics.com} Please see section
87\ref{sec:FreeBSD} for more information on installation on FreeBSD.
88
89\chapter{Installation}
90\section{Retrieving \EMBOSS\ by anonymous ftp}
91\subsection{Interactive FTP}
92
93Change directory to the location in which you wish to download the
94\EMBOSS\ source code. In this example we will download the source to
95\filename{/packages/EMBOSS}. Then start your ftp client and point it
96to emboss.open-bio.org.
97
98\begin{verbatim}
99% ftp emboss.open-bio.org
100Connected to emboss.open-bio.org.
101220 (vsFTPd 2.0.1)
102530 Please login with USER and PASS.
103530 Please login with USER and PASS.
104KERBEROS_V4 rejected as an authentication type
105Name (emboss.open-bio.org:someuser):
106\end{verbatim}
107
108We are using anonymous FTP so type the username \ilcomm{anonymous}.
109
110\begin{verbatim}
111Name (emboss.open-bio.org:someuser): anonymous
112331 Guest login ok, send your complete e-mail address as password.
113Password:
114\end{verbatim}
115
116Enter your email address here as the password for user \filename{anonymous}.
117
118\begin{verbatim}
119Password:
120230 Login successful.
121Remote system type is UNIX.
122Using binary mode to transfer files.
123ftp>
124\end{verbatim}
125
126Move to the \EMBOSS\ directory and list the files. The output has been
127truncated a little to save space.
128
129\begin{verbatim}
130ftp> cd /pub/EMBOSS
131ftp> ls
132200 PORT command successful.
133150 Opening BINARY mode data connection for /bin/ls.
134total 22334
135...     1024 May 26 20:17 .gnu
136...  9079913 May 14 21:37 EMBOSS-2.5.0.tar.gz
137...       19 May 14 21:37 EMBOSS-latest.tar.gz -> EMBOSS-2.5.0.tar.gz
138...   196872 May 12 18:49 EMNU-1.0.5.tar.gz
139...   231485 May 15 13:55 ESIM4-1.0.0.tar.gz
140...   405620 May 12 18:49 HMMER-2.1.1.tar.gz
141...     1024 Jul 25 08:54 Jemboss
142...   264189 May 12 18:49 MEME-2.3.1.tar.gz
143...   251061 Jul  9 19:01 MSE-0.0.4.tar.gz
144...   694450 May 12 18:49 PHYLIP-3.573c.tar.gz
145...   200490 May 12 18:49 TOPO-0.1.tar.gz
146...     1536 Jul  9 19:01 old
147...      512 Jun 27 14:40 patchfiles
148...      512 Feb 22 15:19 tutorials
149226 Transfer complete.
150ftp>
151\end{verbatim}
152
153Now download the source files
154
155\begin{verbatim}
156ftp> get EMBOSS-latest.tar.gz
157200 PORT command successful.
158150 Opening BINARY mode data connection for EMBOSS-latest.tar.gz
159(9079913 bytes).
160...
161ftp>
162\end{verbatim}
163
Repeat for each file, or use \ilcomm{mget *gz} to download all the
files at once.  Exit your ftp session with the command \ilcomm{bye}.
166
167\subsection{FTP using \progname{wget}}
168The program \progname{wget} can be used to download a remote directory
169noninteractively. More details on \progname{wget} can be obtained from
170the Free Software Foundation.\URL{http://www.gnu.org} Assuming you
171have \progname{wget} installed, use the following command which
172generates a lot of output on the screen:
173
174\begin{verbatim}
175% wget -m 'ftp://emboss.open-bio.org/pub/EMBOSS'
176--15:04:41--  ftp://emboss.open-bio.org:21/pub/EMBOSS
177           => `emboss.open-bio.org/pub/.listing'
178Connecting to emboss.open-bio.org:21... connected!
179Logging in as anonymous ... Logged in!
180==> TYPE I ... done.  ==> CWD pub ... done.
181==> PORT ... done.    ==> LIST ... done.
182
183...
184many pages truncated
185...
186
187FINISHED --15:04:55--
188Downloaded: 2,657,366 bytes in 4 files
189\end{verbatim}
190
191A new directory \filename{emboss.open-bio.org} has been created and
192EMBOSS can be found at \filename{emboss.open-bio.org/pub/EMBOSS}. You
193may wish to create a symbolic link to this from your
194\filename{/packages} directory for convenience.
195
196
197\section{Unpacking}
198
199You will have downloaded the \EMBOSS\ and EMBASSY packages to a
200suitable directory. For this example we will assume you have
201downloaded them to \filename{/packages} so you should now have the
202following files (or similar) and maybe more packages in EMBASSY.
203
204\begin{verbatim}
205% ls
206EMBOSS-latest.tar.gz
207EMNU-1.0.5.tar.gz
208ESIM4-1.0.0.tar.gz
209HMMER-2.1.1.tar.gz
210MEME-2.3.1.tar.gz
211MSE-0.0.4.tar.gz
212PHYLIP-3.573c.tar.gz
213TOPO-0.1.tar.gz
214\end{verbatim}
215
216First unpack the \EMBOSS\ distribution
217
218\begin{verbatim}
219% gunzip EMBOSS-latest.tar.gz
220% tar xf EMBOSS-latest.tar
221\end{verbatim}
222
223This will create a new directory, \filename{EMBOSS-2.5.0} or
224similar. You may wish to use \ilcomm{tar xpf} for unpacking \EMBOSS.
225
226Enter the \EMBOSS\ directory
227
228\begin{verbatim}
229% cd EMBOSS-2.5.0
230\end{verbatim}
231
Create a directory for the EMBASSY packages
233
234\begin{verbatim}
235% mkdir embassy
236\end{verbatim}
237
238Now move the EMBASSY packages to the EMBASSY directory
239
240\begin{verbatim}
% mv ../MSE-0.0.4.tar.gz ../PHYLIP-3.573c.tar.gz \
   ../TOPO-0.1.tar.gz embassy
243\end{verbatim}
244
245Go into the EMBASSY directory and unpack those packages.
246
247\begin{verbatim}
248% cd embassy
249
250% gunzip MSE-0.0.4.tar.gz
251% tar xf MSE-0.0.4.tar
252\end{verbatim}
253
254and so on for each EMBASSY package.
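
Rather than typing the two commands for every archive, the unpacking
can be done in a loop. The following is a minimal sketch for a Bourne
compatible shell; the function name \ilcomm{unpack\_all} is our own
invention and is not part of \EMBOSS. It assumes that every
\filename{.tar.gz} file in the given directory is an EMBASSY archive
that should be unpacked where it sits.

```shell
#!/bin/sh
# unpack_all DIR: unpack every *.tar.gz archive found in DIR, in place.
# (unpack_all is a hypothetical helper name, not part of EMBOSS.)
unpack_all() {
    dir=$1
    for archive in "$dir"/*.tar.gz; do
        # Skip the unexpanded pattern when DIR holds no archives
        [ -f "$archive" ] || continue
        # Decompress to stdout and let tar unpack inside DIR
        gunzip -c "$archive" | ( cd "$dir" && tar xf - ) || return 1
    done
}
```

From the main \EMBOSS\ directory this would be called as
\ilcomm{unpack\_all embassy}. Unlike the manual steps, the compressed
archives are left untouched.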
255
256Go back up one directory to the main \EMBOSS\ package directory and
257prepare to start compilation.
258
259\section{Graphics Requirements}
260
261Depending on your system you may need to explicitly configure the
262graphics. EMBOSS includes the plplot graphics library and will link to
263X11 and the recent (non-GIF) releases of the gd graphics library which
264also require libz and libpng (and possibly libjpeg). Please see the
265section 'Configuring \EMBOSS\ graphics' below.
266
267To get PLPLOT to produce PNG images you will need to have the
268\filename{z}\URL{http://www.info-zip.org/pub/infozip/zlib/},
269\filename{png}\URL{http://libpng.sourceforge.net/} and
270\filename{gd}\URL{http://www.boutell.com/gd/} libraries
271installed. \filename{gd} version $>=$ 1.8.4 is recommended. A recent
272release must be used as older versions support GIF which is NOT
273supported in later versions because of software patent problems.  If
274for some reason you do not have the required libraries and your system
275support group will not update them for the system then install all
276three latest versions (\filename{z},\filename{gd},\filename{png}) to a
new directory and then add this new directory to your configure line
for \EMBOSS: \verb+./configure --with-pngdriver=my_dir+ where
\verb+my_dir+ is the installation prefix used when building the
\filename{z}, \filename{png} and \filename{gd} libraries (for example
with \verb+./configure --prefix=my_dir+ where the library provides a
configure script).
281
It may also be helpful to ensure that the \ilcomm{LD\_LIBRARY\_PATH}
environment variable is set appropriately to include the directory
containing these libraries.
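
As a minimal sketch, assuming the libraries were installed under a
private prefix so that the shared objects live in its \filename{lib}
subdirectory, a Bourne shell function along these lines puts that
directory at the front of \ilcomm{LD\_LIBRARY\_PATH} while keeping any
existing entries. The function name is hypothetical.

```shell
#!/bin/sh
# prepend_lib_path DIR: put DIR at the front of LD_LIBRARY_PATH,
# keeping any entries that are already there.
# (prepend_lib_path is a hypothetical helper name.)
prepend_lib_path() {
    # ${VAR:+...} expands to nothing when VAR is unset or empty,
    # so no stray ':' is added for a previously empty path
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}
```

For example, \ilcomm{prepend\_lib\_path /home/joe/local/lib} could be
placed in a login script for sh, ksh or bash users.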
285
The libraries can be obtained over HTTP from the following sites,
which also list the various mirror sites for users outside the UK:

\begin{itemize}
\item gd: \filename{http://www.boutell.com/gd/}
\item zlib: \filename{http://www.mirror.ac.uk/sites/ftp.cdrom.com/pub/infozip/zlib/}
\item libpng: \filename{http://www.mirror.ac.uk/sites/ftp.libpng.org/pub/png/libpng.html}
\end{itemize}

Alternatively, using FTP:

\begin{itemize}
\item gd: boutell.com no longer allows FTP and there are no known
mirror sites, so use HTTP.
\item zlib: \filename{ftp://ftp.info-zip.org/pub/infozip/zlib/zlib-1.1.3.tar.gz}
\item libpng: \filename{ftp://swrinde.nde.swri.edu/pub/png/src/libpng.1.2.1.tar.gz}
\end{itemize}

You can unpack the \filename{tar.gz} files in any directory, and
install them in a common area. By default everything (including
\EMBOSS) installs in \filename{/usr/local} but in the examples below
we use \filename{/home/joe/local}.

Note: gd does not use a \ilcomm{./configure} script, and will fail at
the \ilcomm{make install} stage if the installation directory does not
have a \filename{bin} subdirectory. You can create this directory
(e.g. \filename{/home/joe/local/bin}) if it does not already exist.
307
308\subsection{zlib}
309
Zlib is available from these sites:
311
312\filename{http://www.mirror.ac.uk/sites/ftp.cdrom.com/pub/infozip/zlib/}
313\URL{http://www.mirror.ac.uk/sites/ftp.cdrom.com/pub/infozip/zlib/}
314\filename{http://www.info-zip.org/pub/infozip/zlib/}
315\URL{http://www.info-zip.org/pub/infozip/zlib/}
316\filename{ftp://ftp.info-zip.org/pub/infozip/zlib/zlib-1.1.3.tar.gz}
317\URL{ftp://ftp.info-zip.org/pub/infozip/zlib/zlib-1.1.3.tar.gz}
318
319To install, pick up the sources and then:
320
321\begin{verbatim}
% gunzip -c zlib-1.1.3.tar.gz | tar xf -
% ln -s zlib-1.1.3 zlib
% cd zlib
% ./configure --prefix=/home/joe/local
% make
% make install
% cd ..
329\end{verbatim}
330
331\subsection{libpng}
332
Libpng is available from these sites:
334
335\URL{http://libpng.sourceforge.net/}
336\URL{http://www.mirror.ac.uk/sites/ftp.libpng.org/pub/png/libpng.html}
337\URL{ftp://swrinde.nde.swri.edu/pub/png/src/libpng.1.2.1.tar.gz}
338
339To install, pick up the sources and then:
340
341\begin{verbatim}
% gunzip -c libpng-1.2.1.tar.gz | tar xf -
% ln -s libpng-1.2.1 libpng
% cd libpng
% cp scripts/makefile.linux makefile
346\end{verbatim}
347
Libpng has no configure script so you have to do some work by
hand. Edit \filename{makefile}, changing prefix to
\filename{/home/joe/local} and adjusting any other paths that need it:
some settings point to \filename{../zlib}, others use
\filename{/usr/local/lib} and \filename{/usr/local/include}. On HP-UX
this is trickier. CFLAGS has to match the definition for zlib.
353
354Now build using the edited makefile:
355
356\begin{verbatim}
% make
% make install
% cd ..
360\end{verbatim}
361
362
363\subsection{gd}
364
365Gd is available from these sites:
366
367\URL{http://www.boutell.com/gd/}
368
369There is no FTP server at this site.
370
371To install, pick up the sources, build zlib and libpng first, and then:
372
373\begin{verbatim}
374% gunzip -c gd-1.8.4.tar.gz     | tar xf -
375% ln -s gd-1.8.4     gd
376% cd gd
377\end{verbatim}
378
379Now edit Makefile, change the definitions for INCLUDEDIRS, LIBDIRS,
380INSTALL\_LIB, INSTALL\_INCLUDE, INSTALL\_BIN, and change all
381\filename{/usr/local} to \filename{/home/joe/local}
382
383\begin{verbatim}
384% make
385% make install
386% cd ..
387\end{verbatim}
388
389If the gd "make install" fails with a warning about the "bin"
390directory, you need to create it by hand (see above).
391
392To compile with the local version your EMBOSS configure line should
393now read:
394
395\begin{verbatim}
396./configure --with-pngdriver=/home/joe/local
397\end{verbatim}
398
399This will look for the graphics libraries in your local installation
under \filename{/home/joe/local} instead of a system-wide location.

\ilcomm{configure} keeps a copy of the previous settings. With earlier releases
403of EMBOSS, or as a developer with an earlier release of autoconf, you
404may need to delete files \filename{config.cache} and
405\filename{config.status} if configure has been run before.
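
Before rerunning configure it can be worth confirming that all three
libraries really did land under the local prefix. The sketch below is
one way to do that in a Bourne compatible shell. It assumes the
libraries are installed as \filename{lib/libz.*},
\filename{lib/libpng.*} and \filename{lib/libgd.*} beneath the prefix;
the function name is hypothetical and the check is no substitute for
the tests that configure itself performs.

```shell
#!/bin/sh
# check_png_libs PREFIX: report which of libz, libpng and libgd are
# missing from PREFIX/lib. Returns 0 only if all three are found.
# (check_png_libs is a hypothetical helper name.)
check_png_libs() {
    prefix=$1
    missing=0
    for lib in z png gd; do
        found=no
        # Match static or shared variants, e.g. libz.a or libz.so.1
        for f in "$prefix"/lib/lib"$lib".*; do
            if [ -e "$f" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            echo "lib$lib not found under $prefix/lib" >&2
            missing=1
        fi
    done
    return $missing
}
```

A typical call would be \ilcomm{check\_png\_libs /home/joe/local}
followed by \ilcomm{./configure --with-pngdriver=/home/joe/local} once
it succeeds.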
406
407\section{Compilation}
408
409Building \EMBOSS\ is easy. It follows the usual GNU style of
410\ilcomm{./configure}, \ilcomm{make}, \ilcomm{make install}. We'll take
411these steps one at a time.
412
413\subsection{Configure}
414
415To accept the default configuration, just type \ilcomm{./configure}
416and let \EMBOSS\ get on with it. You may however want to make some
417changes to the configuration parameters according to your local
418policy. This section will not cover all the possibilities, just some
419of the more common. The configuration script will attempt to find the
420necessary components in your system to determine how to successfully
421build \EMBOSS. It typically expects the GNU C compiler (gcc) and
422several standard libraries that should already be part of your
423Unix/Linux system. \EMBOSS\ should configure, compile and run on most
424modern Linux distributions straight out of the box.
425
426
427\subsubsection{Installation directory}
428
429You need to have write permission on the directory in which you
430eventually wish to install \EMBOSS. You may also wish to put it
431somewhere else other than the standard location of
432\filename{/usr/local/emboss}.
433
434The installation directory is controlled by the \ilcomm{--prefix}
435argument. For example, you can have all third party applications owned
436by a non-privileged user and installed in a package specific directory
437under \filename{/site/prog}
438
439\begin{verbatim}
440% ./configure --prefix=/site/prog/emboss
441\end{verbatim}
442
443will install \EMBOSS\ under \filename{/site/prog/emboss}. The binaries
444will be installed in \filename{/site/prog/emboss/bin} with shared
445libraries installed in \filename{/site/prog/emboss/lib}. System wide
446data are installed in \filename{/site/prog/emboss/share/EMBOSS/data},
447and the configuration files (ACD files) for the applications will be
448installed in \filename{/site/prog/emboss/share/EMBOSS/acd} (or for
449EMBASSY in directories corresponding to the package name.)
450Documentation is installed in
451\filename{/site/prog/emboss/share/EMBOSS/doc}.  The installation
452directory should be specified using a full path otherwise interesting
453failures may occur.
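
The two requirements above (a full path, and write permission) can be
checked before configure is run. The following is a rough sketch only;
\ilcomm{check\_prefix} is a name we have made up, and the test for
write permission looks at the nearest directory that already exists,
on the assumption that the installation will create anything missing
below it.

```shell
#!/bin/sh
# check_prefix DIR: succeed only if DIR is a full (absolute) path and
# DIR, or its nearest existing parent directory, is writable.
# (check_prefix is a hypothetical helper name.)
check_prefix() {
    case $1 in
        /*) ;;
        *)  echo "prefix must be a full path: $1" >&2
            return 1 ;;
    esac
    dir=$1
    # Walk up to the nearest directory that already exists
    while [ ! -d "$dir" ]; do
        dir=$(dirname "$dir")
    done
    if [ ! -w "$dir" ]; then
        echo "no write permission on $dir" >&2
        return 1
    fi
}
```

For example,
\ilcomm{check\_prefix /site/prog/emboss \&\& ./configure --prefix=/site/prog/emboss}.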
454
455The individual directories for installation can be modified with other
456configuration commands but this is usually not necessary. Run
457\ilcomm{./configure --help} to get more information on the directories
458that can be changed and other configuration options.
459
460Run \ilcomm{./configure} with the options you wish to use. This may
461take a short time as various messages scroll up the screen.
462
463All should be well with this and configure should exit with a message
464like this:
465
466\begin{verbatim}
467... much output skipped
468
469creating ./config.status
470creating plplot/Makefile
471creating plplot/lib/Makefile
472creating nucleus/Makefile
473creating ajax/Makefile
474creating emboss/Makefile
475creating emboss/acd/Makefile
476creating test/Makefile
477creating test/data/Makefile
478creating test/embl/Makefile
479creating test/pir/Makefile
480creating test/swiss/Makefile
481creating test/swnew/Makefile
482creating test/wormpep/Makefile
483creating emboss/data/Makefile
484creating emboss/data/AAINDEX/Makefile
485creating emboss/data/CODONS/Makefile
486creating emboss/data/REBASE/Makefile
487creating emboss/data/PRINTS/Makefile
488creating emboss/data/PROSITE/Makefile
489creating Makefile
490\end{verbatim}
491
492Configuration is now complete.
493
494\subsubsection{Reconfiguration}
495
496If at first you don't succeed, try, try and try again. It is not
497uncommon to make typos or other mistakes when running
498\ilcomm{./configure}. If you want to run configure again you should
499run \ilcomm{make clean} before running \ilcomm{./configure} with
500(hopefully) the correct options. With an earlier EMBOSS release, or as
501a developer with an earlier release of autoconf, you must first delete
502the file \filename{config.cache} but this is no longer produced.
503
504\subsubsection{Configuring \EMBOSS\ graphics}
505
506The PLPLOT library can produce output to many devices but requires
certain libraries that are NOT distributed with \EMBOSS.
508
509To get X-windows based output you must have X installed, or else PLplot
510will not build the required driver. You may need to specify the
location of your X-windows library with the configuration options:

\begin{itemize}
\item \ilcomm{--x-includes=DIR} (X include files are in DIR)
\item \ilcomm{--x-libraries=DIR} (X library files are in DIR)
\end{itemize}
514
515To explicitly configure PLPLOT without X-windows, use \ilcomm{--without-x}.
516
517You can explicitly tell \EMBOSS\ to not include PNG support with
518\ilcomm{--without-pngdriver}.
519
520 You can tell if \ilcomm{./configure} has
521found a suitable PNG library by watching for something like the
522following when running \ilcomm{./configure}:
523
524\begin{verbatim}
525checking if png driver is wanted... yes
526checking for inflateEnd in -lz... (cached) yes
527checking for png_destroy_read_struct in -lpng... (cached) yes
528checking for gdImageCreateFromPng in -lgd... (cached) yes
529\end{verbatim}
530
531This means that the configuration script has located the PNG libraries
532on your system. If you see a message indicating that
533\ilcomm{./configure} could not find the libraries or that the version
534of \filename{gd} was too old then you should install the latest
535versions of the libraries yourself and rerun configure with the
536correct \ilcomm{--with-pngdriver} value.
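
Because the configure output scrolls past quickly, it can help to save
it to a file and pick out the relevant lines afterwards. The sketch
below assumes the output was saved with something like
\ilcomm{./configure 2>\&1 | tee configure.log}; the log file name and
the function name are our own choices, and the pattern simply matches
the four messages shown above.

```shell
#!/bin/sh
# png_checks LOGFILE: print the PNG-related "checking ..." lines from
# a saved configure log.
# (png_checks is a hypothetical helper name.)
png_checks() {
    # Matches the driver question and the -lz, -lpng and -lgd probes
    grep -E 'png driver|-l(z|png|gd)\.\.\.' "$1"
}
```

Running \ilcomm{png\_checks configure.log} should then show whether
each library was found.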
537
538When you run an EMBOSS graphical application you can see the list of
539installed graph devices by giving '?' as the response to the 'Graph
540type' prompt.
541
542\subsection{Configuring for 64 bit systems}
543
\EMBOSS\ configure looks for \progname{gcc} and uses it in
preference when compiling \EMBOSS. This is not ideal for those who
wish to have a compiled and linked 64 bit version of \EMBOSS. The
547current version is NOT 64 bit clean (ie. it does not necessarily use
54864 bit representation internally) but will compile and run quite
549happily on 64 bit systems.
550
551Additional notes are appended below for the various operating systems
552we have information on.
553
554\subsubsection{IRIX 6.5.10}
555
556In order to compile for 64 bit on IRIX you have to specify the native
557compiler in 64 bit mode (\ilcomm{cc -64}) and the linker in 64 bit
558mode (\ilcomm{/bin/ld -64}). The following notes were provided by Jose
559Ramon Valverde\footnote{jrvalverde\@@cnb.uam.es}.
560
561
562{\it We have succeeded in compiling EMBOSS for IRIX using 64 bit
563compilation.
564
565It required some tweaking, but works. The recipe for those willing to
566give it a try is: }
567
568\begin{itemize}
569	\item remove '\filename{gcc}' from your path
570	\item define \filename{COMPILER\_DEFAULTS\_PATH} appropriately
571	(see \filename{pe\_environ}) to look for a
572	\filename{compiler.defaults} file containing
573	e.g. \ilcomm{:abi=64:isa=4:proc=r10k}
574	\item \ilcomm{./configure} in \EMBOSS\ and all EMBASSY subdirs
575	\item search in all files for '\ilcomm{CC = cc}' and
576	substitute it for '\ilcomm{CC = cc -64}'
577	\item same for '\ilcomm{LD = /bin/ld}' to '\ilcomm{LD = /bin/ld -64}'
578	\item \ilcomm{make}
579\end{itemize}
580
581{\it The reason is that compiling depends on the Makefile and on libtool,
582as well as linking. We didn't spend much in looking at configure since
the above steps were so straightforward. We know we should look into
584the configure script and add an option for 64-bit-irix-compile or some
585such, but that'll have to wait till we have time for it.
586
587Yes, we know, the search and substitute thing looks tedious, but it
isn't, honest: create a 'chfile.sh' outside the EMBOSS source hierarchy
589containing: }
590
591\begin{verbatim}
#!/bin/sh
# Usage: chfile.sh FILE
# Rewrites FILE so that cc and /bin/ld are invoked in 64 bit mode,
# keeping the original as FILE.orig
cp "$1" "$1.orig"
mv "$1" tmpfile
sed -e 's/CC="cc"/CC="cc -64"/g' tmpfile | \
sed -e 's/CC = cc/CC = cc -64/g' | \
sed -e 's/\/bin\/ld/\/bin\/ld -64/g' > "$1"
rm tmpfile
## if you are sure, uncomment this
#rm "$1.orig"
602
603{\it
604'\ilcomm{cd}' to the \filename{emboss} directory and run}
605
606\begin{verbatim}
607	find . -type f -exec /path/to/chfile.sh {} \; -print
608\end{verbatim}
609
610{\it and you are done with the \progname{CC}
611changes. \progname{Libtool} requires special treatment since it uses
612quotes.  }
613
614\subsection{Building \EMBOSS}
615
616Building \EMBOSS\ is a matter of typing '\ilcomm{make}' and going to
617find something else to do for the next ten minutes to half an hour
618depending on the speed of your system. \EMBOSS\ will first build the
619shared libraries (\filename{PL\_PLOT}, \filename{AJAX}, and
620\filename{NUCLEUS}) and then build the applications.
621
622You may see plenty of warnings (especially on SGI systems) complaining
623about libraries not being used to resolve any symbols. These can be
624safely ignored.
625
If all goes according to plan you should have built \EMBOSS\ successfully.
If not you will have to try to work out why the build
failed. If you can't work it out yourself, send an email describing
the problem to emboss-bug@emboss.open-bio.org, preferably with a copy
of the output from the installation.
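
To have that output to hand if the build does fail, it can be captured
to a file as it is produced. This is a minimal sketch; the function
name \ilcomm{run\_logged} and the log file name used in the example
are hypothetical.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND...: run COMMAND, saving both stdout and
# stderr to LOGFILE, and return COMMAND's own exit status.
# (run_logged is a hypothetical helper name.)
run_logged() {
    log=$1
    shift
    "$@" > "$log" 2>&1
}

# Example (not run here): run_logged make.log make
```

Nothing is shown on the screen while the command runs; use
\ilcomm{tail -f make.log} in another window to follow progress.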
631
632Assuming that compilation was successful, you can\footnote{You don't
633have to do this. You can leave \EMBOSS\ where it is and just add the
634path to the \filename{emboss} directory to your \ilcomm{PATH}} now
635type '\ilcomm{make install}'. After a few minutes and many pagefuls of
636messages, \EMBOSS\ should be installed where you specified in the
637\ilcomm{--prefix} option (or in the default location of
638\filename{/usr/local/emboss} if \ilcomm{--prefix} was not specified).
639
640\subsection{Post compilation setup}
641
You will now need to make a few adjustments to your environment to
ensure that \EMBOSS\ runs smoothly.  \EMBOSS\ looks for certain
environment variables to determine where the libraries and data are
found. These instructions assume you installed \EMBOSS\ in
646\filename{/site/prog/emboss}. Adjust these instructions to suit your
647installation.  Insert the following lines at the end of
648\filename{/etc/cshrc} (or \filename{~/.cshrc} for a personal
649installation)
650
651\begin{verbatim}
652setenv PLPLOT_LIB /site/prog/emboss/lib
set path=( /site/prog/emboss/bin ${path} )
654\end{verbatim}
655
656Or for bash/ksh/sh users, insert the following at the end of
657\filename{/etc/profile} or \filename{~/.bashrc}
658
659\begin{verbatim}
660PLPLOT_LIB=/site/prog/emboss/lib
PATH=/site/prog/emboss/bin:$PATH
662export PLPLOT_LIB PATH
663\end{verbatim}
664
665\EMBOSS\ should now be ready for use.
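
After logging in again, a quick check that the new directory really is
on the search path can save some head scratching. The function below
is a small sketch for sh, ksh or bash users; its name is our own.

```shell
#!/bin/sh
# path_has DIR: succeed if DIR is one of the entries in PATH.
# (path_has is a hypothetical helper name.)
path_has() {
    # Wrapping PATH in colons lets every entry match as ":DIR:"
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example,
\ilcomm{path\_has /site/prog/emboss/bin || echo "EMBOSS is not on PATH"}.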
666
667\subsection{\EMBOSS\ data files}
668
669\EMBOSS\ will by default install the data files (including those
installed with \progname{Rebaseextract}, \progname{Prosextract},
\progname{Printsextract}, \progname{Aaindexextract} or
672\progname{Cutgsextract}) in the default directory
673\filename{share/EMBOSS/data} in the install prefix directory.  If
674\EMBOSS\ is not installed (for example, your own personal
675installation) the data files are written to \filename{emboss/data} in
676the directory where emboss was built.
677
678If you want to place your data files elsewhere, or have a separate set
679of datafiles you wish to use, you can set the \ilcomm{EMBOSS\_DATA}
680variable in \filename{emboss.default} or, for personal use, in your \filename{.embossrc} file.
681
682\subsection{Testing your \EMBOSS\ installation}
683
684You can test your \EMBOSS\ installation by trying the program
685'\ilcomm{wossname}'
686
687\begin{verbatim}
688% wossname -auto |more
689\end{verbatim}
690
691This should give a long list of programs that are available. Press
space to page down through the list. This is just the \EMBOSS\ programs
and doesn't include any of the EMBASSY programs, but only
because they are not yet installed. (Note: Although
\progname{wossname} does have a \ilcomm{-noembassy} option this does
not work with installed programs because \progname{wossname} can no
longer find any difference between EMBOSS and EMBASSY.)
697
698\section{Installing EMBASSY}
699
700As well as the base libraries and standard EMBOSS distribution,
701various extra packages (EMBASSY) are distributed with EMBOSS.
702
To install an EMBASSY package, go to the directory into which it was
unpacked. For example, to install PHYLIP, which was unpacked into
\filename{/packages/EMBOSS-2.5.0/embassy/PHYLIP-3.573c} earlier:
707
708\begin{verbatim}
709% cd  /packages/EMBOSS-2.5.0/embassy/PHYLIP-3.573c
710% ./configure --prefix=/site/prog/emboss
711... output not shown
712% make
713... output not shown
714% make install
715... output not shown
716\end{verbatim}
717
718Note. You {\bf MUST} use the same arguments for \ilcomm{./configure}
719that you used for the installation of the main \EMBOSS\ package. It
720may be necessary to add other options as required by individual
721packages (see below).
722
723Repeat as necessary for the other EMBASSY packages. It should also be
724noted that certain EMBASSY packages may require additional libraries.
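
The repetition can be scripted. The sketch below is an illustration
only: \ilcomm{build\_embassy} and the \ilcomm{RUN} variable are names
we have made up, and it passes nothing but \ilcomm{--prefix} to
configure, so any package that needs extra options (see below) still
has to be handled by hand. Setting \ilcomm{RUN=echo} prints the
commands instead of running them.

```shell
#!/bin/sh
# build_embassy PREFIX DIR...: configure, build and install each
# unpacked EMBASSY package directory with the same --prefix.
# Set RUN=echo to print the commands instead of running them.
# (build_embassy and RUN are hypothetical names.)
build_embassy() {
    prefix=$1
    shift
    for pkg in "$@"; do
        (
            # The subshell keeps the caller's working directory intact
            cd "$pkg" || exit 1
            ${RUN:-} ./configure --prefix="$prefix" &&
            ${RUN:-} make &&
            ${RUN:-} make install
        ) || return 1
    done
}
```

For example, from the \filename{embassy} directory:
\ilcomm{build\_embassy /site/prog/emboss MSE-0.0.4 PHYLIP-3.573c TOPO-0.1}.
The function stops at the first package that fails.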
725
726You should now find that running \progname{wossname} as before lists
727the EMBASSY programs.
728
729\subsection{EMBASSY package specific notes}
730
731In most cases, EMBASSY packages should build with no problems. Known
732problems are described below.
733
734\subsubsection{Packages with no known problems}
735So far \progname{ESIM4}, \progname{HMMER}, \progname{MEME},
736\progname{MSE}, \progname{PHYLIP} and \progname{TOPO} appear to
737install without a problem using the same arguments to
738\ilcomm{configure}.
739
740\subsubsection{\progname{EMNU}}
741
742\progname{EMNU} requires the \filename{curses} or \filename{ncurses} libraries
743that come as standard on most Unix-like systems. In particular \progname{EMNU}
744requires two header files \filename{form.h} and \filename{menu.h} that are not
745distributed with all implementations.
746
747If your \filename{curses/ncurses}
748library is installed in a strange place then you may need to instruct
749\ilcomm{configure} with the option
750
751\begin{verbatim}
752--with-curses=/path/to/curses
753\end{verbatim}
754
755
756\section{Installing \EMBOSS\ in package format}
757\label{sec:FreeBSD}
758\EMBOSS\ can be installed on almost all Unix/Linux operating systems
759using the instructions above, but the package format can be far more
760convenient.  A package is a precompiled set of binaries with
761installation instructions that can be set up on your system with a
762minimum of work. In some cases the package will check for the correct
763libraries and install those as necessary.
764
765Brief instructions are given here for the packages of which we are
766aware. These are maintained separately from the main source tree and
767may also install some files in operating system standard locations
instead of the locations used by the `raw' \EMBOSS\ distribution.
Please read the more detailed instructions that
accompany each package.
771
772\subsection{Installing \EMBOSS\ on FreeBSD}
773
774A FreeBSD \EMBOSS\ package has been created by Johann
775Visagie\URL{johann\@@egenetics.com} of Electric Genetics. This will be
distributed on the installation CDs and through the normal
777distribution channels from FreeBSD version 4.2 onwards.
778
779For the FreeBSD user with an up-to-date ports tree\footnote{FreeBSD
780users can update their ports tree through a variety of
781mechanisms. Please see the FreeBSD specific guide produced by Johann
782for more information}, installing \EMBOSS\ reduces to two simple
783commands (as root):
784
785\begin{verbatim}
786# cd /usr/ports/biology/emboss
787# make install
788\end{verbatim}
789
790The FreeBSD specific parts of the port are that
791\filename{emboss.default} is included with the other configuration
792files under \filename{/usr/local/etc} as
793\filename{emboss.default.sample}, and the \EMBOSS\ documentation is
794installed in \filename{/usr/local/share/doc/EMBOSS} instead of the
795default location.  For further information on installation under
796FreeBSD you are referred to the Resources chapter.
797
798
799\chapter{Configuration}
800
801\EMBOSS\ can be readily configured to match your requirements. In a
802standard installation of \EMBOSS\ the configuration directives are
803looked for in the following locations and in the following search
804order:
805\begin{enumerate}
806\item A file \filename{emboss.default} in the \filename{share/EMBOSS}
807subdirectory of your \EMBOSS\ installation.\footnote{This location may
808have been redefined in installations of \EMBOSS\ that have been
809packaged for specific operating systems. See section \ref{sec:FreeBSD}
810for further information on OS specific package
811installations.}\footnote{\EMBOSS\ will also look in the
812\filename{emboss} directory under the \EMBOSS\ source distribution for
813\filename{emboss.default.template} and install this as
814\filename{emboss.default} if no existing file is found under the
815installation directory}
816\item A file \filename{.embossrc} in the directory specified by the
817\ilcomm{EMBOSSRC} environment variable.
\item A file \filename{.embossrc} in the user's home directory.
819\end{enumerate}
\filename{emboss.default} and \filename{.embossrc} are plain text
files that can readily be edited to suit.\footnote{A sample
\filename{emboss.default} is located in \filename{emboss/acd} under
the source distribution.} A parameter that is defined again in a file
read later overrides the earlier definition. In the descriptions that
follow only \filename{.embossrc} will be mentioned but all directives
can be placed in \filename{emboss.default} for site-wide
configuration.
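
As a minimal sketch of this override behaviour, assume a site-wide
\filename{emboss.default} and a personal \filename{.embossrc} that both
define the same variable. The variable name and both paths are
hypothetical and chosen only for illustration.

\begin{verbatim}
# in emboss.default (site-wide)
set emboss_db_dir /data/databases

# in the user's own .embossrc (read later, so this value wins)
set emboss_db_dir /home/alice/testdatabases
\end{verbatim}

With both files in place, references to \ilcomm{\$emboss\_db\_dir} in
that user's \filename{.embossrc} would be expected to resolve to the
second path.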
828
829Several aspects of \EMBOSS\ can be defined. These are:
830\begin{itemize}
831\item\EMBOSS\ environment variables
832\item\EMBOSS\ databases
833\item Default behaviour of \EMBOSS\ programs
834\end{itemize}
835Databases are by far the most complex of these.
836
837\EMBOSS\ will ignore blank lines in the \filename{emboss.default} and
838\filename{.embossrc} files. It will also ignore any lines beginning
839with \ilcomm{\#} or \ilcomm{!} allowing comments to illuminate the
840declarations in the file.
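
A short sketch of how comments and blank lines might be used to
document a configuration file. The path shown is a placeholder.

\begin{verbatim}
# Site-wide settings, maintained by the system administrator
! Lines starting with an exclamation mark are ignored as well

set emboss_db_dir /data/databases
\end{verbatim}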
841
842
843\section{\EMBOSS\ environment variables}
844
\EMBOSS\ environment variables are set with an '\ilcomm{env}' or a
'\ilcomm{set}' declaration; the two are interchangeable.  The most
important environment variable is the one giving the location of the
\filename{.acd} files that describe each program.
849
850\begin{verbatim}
851set emboss_acdroot /site/prog/emboss/share/EMBOSS/acd
852\end{verbatim}
853
854Environment variables are useful for simplifying maintenance of your
855\filename{.embossrc}. For example you may want to specify the location
856of your databases as an environment variable. Then if you move the
857databases you only have to update one line in the configuration file.
858
859\begin{verbatim}
860set emboss_database_dir /data/databases/flatfiles
861\end{verbatim}
862
863This would then be referred to later in \filename{.embossrc} as
864
865\begin{verbatim}
$emboss_database_dir/embl
867\end{verbatim}
868
for the directory \filename{/data/databases/flatfiles/embl}.
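
Putting the two fragments together, a definition that relies on the
variable might look like the sketch below. The database name and its
attribute values are illustrative assumptions; the \ilcomm{DB} syntax
itself is described in the Databases section.

\begin{verbatim}
set emboss_database_dir /data/databases/flatfiles

DB embl [
   type: "N"
   method: "emblcd"
   format: "embl"
   dir: "$emboss_database_dir/embl"
]
\end{verbatim}

Moving the flat files then means changing only the \ilcomm{set} line.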
870
871\subsection{Configuring \EMBOSS\ differently for different groups of users}
You may have groups of users who need to share a specific setup, for
example to access a different set of databases or to use a different
data directory.

Maintaining a series of individual \filename{.embossrc} files is
time-consuming and error-prone, as is requiring users to work in the
same directory or to copy an \filename{.embossrc} into each directory
they wish to work in.  Instead, the environment variable
\ilcomm{EMBOSSRC} can be set to point to an arbitrary directory
containing an \filename{.embossrc}, which then provides the
workgroup-specific configuration. Each user only needs to set
\ilcomm{EMBOSSRC} in their \filename{.cshrc} (\progname{csh}) or
\filename{.profile} (\progname{bash}) to get the workgroup-specific
setup.
886
887In our case we have several groups of researchers for whom we maintain
888biological sequence databases. These databases have been made
889available under restrictive licenses so that we cannot allow
890researchers outside the groups to access the databases. Using
891\ilcomm{\$EMBOSSRC} we can set up a common configuration for the
892members of each group by defining the databases in the
893\filename{\$EMBOSSRC/.embossrc} file.
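
As an illustration only, the shared file for one such group could
look like the sketch below. The directory
\filename{/site/emboss/groupa}, the variable name, the database name
and the data paths are all hypothetical; members of the group would
set \ilcomm{EMBOSSRC} to that directory.

\begin{verbatim}
# /site/emboss/groupa/.embossrc
# Read by every user who has EMBOSSRC set to /site/emboss/groupa

set emboss_groupa_dir /data/licensed/groupa

DB groupadb [
   type: "N"
   method: "emblcd"
   format: "embl"
   dir: "$emboss_groupa_dir/groupadb"
   comment: "Licensed database, group A only"
]
\end{verbatim}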
894
895
896\section{Databases}
897
898\subsection{Database access modes}
899
900\EMBOSS\ offers three modes for accessing databases:
901\begin{description}
902
903       \item[Single:]\EMBOSS\ retrieves a single sequence indexed by
904       ID.
905
906       \item[Query:]\EMBOSS\ retrieves a set of sequences
907       corresponding to a query that can return more than one entry,
908       including accession numbers or wildcard IDs.
909
910       \item[All:]\EMBOSS\ returns all the sequences in the database
911       in no particular order.
912
913\end{description}
914
915Each database definition can configure one or many of these modes for
916database access.
917
918Typically \EMBOSS\ uses variations on the \progname{emblcd} system of
919database indexing to provide rapid access in single and query modes to
920flat file databases. The \progname{emblcd} method is implemented in a
921variety of ways depending on the original format of your database.
922The \progname{emblcd} method assumes that you have one or both of ID
923and accession number in each record and that they are unique for the
whole database index.  \EMBOSS\ also provides methods for retrieving
sequences via the WWW and three specific methods for interaction with
SRS\URL{http://www.lionbioscience.com/solutions/srs} installed locally
or through a remote public server.  For other non-flatfile databases,
or flat file databases in formats not currently supported by \EMBOSS,
you will have to configure an external application to retrieve
sequences.
931
\subsection{General database configuration}
933
934Each database is configured using a DB declaration.
935
936The generalised form is
937
938\begin{verbatim}
939DB databasename [
940
941Configuration options
942
943]
944\end{verbatim}
945
946The configuration options are tag/value pairs and must contain at
947least a description of the access method (using \ilcomm{method:} or
948one or more of \ilcomm{methodsingle:}, \ilcomm{methodquery:} and
949\ilcomm{methodall:}) and a description of the original format of the
950sequences (using \ilcomm{format:}).  In addition to these tags there
951will be other tags that are needed for particular methods and other
952tags that are optional.
953
954\subsubsection{Database access methods}
955
956The scope of each method is:
957
958\begin{description}
959
960\item[Single mode - \ilcomm{s}] Supports retrieval of a single
961sequence.
962
\item[Query mode - \ilcomm{q}] Supports retrieval of a subset of the
sequences in the database, specified using a wildcard query in the
USA.\footnote{Please see the \EMBOSS\ documentation for a description
of the Uniform Sequence Address format.}
967
968\item[All mode - \ilcomm{a}] Supports retrieval of all sequences in
969the database as a stream of data.
970
971\end{description}
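
As a rough illustration of how the three modes map onto USAs, consider
the hypothetical definition below. The entry name \ilcomm{x13776} and
the \ilcomm{hs*} pattern are placeholders, and the exact USA syntax
should be checked against the \EMBOSS\ documentation.

\begin{verbatim}
DB mydb [
   method: "emblcd"
   format: "embl"
   type: "N"
   dir: "$emboss_db_dir/mydb"
]

# USAs that would be expected to use each mode:
#   mydb:x13776   single mode, one entry by ID
#   mydb:hs*      query mode, every entry whose ID matches the wildcard
#   mydb:*        all mode, every entry in the database
\end{verbatim}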
972
An example entry for each access method is shown below.
974
975\paragraph{APP}\par\noindent
976Modes: \ilcomm{a q s}\par\noindent
977APP is the same as EXTERNAL.
978
979\paragraph{BLAST}\par\noindent
980Modes: \ilcomm{a q s} \par\noindent BLAST uses EMBLCD indices created
981with \progname{dbiblast} to access databases in BLAST format, created
982with NCBI's \ilcomm{formatdb} program.
983
984Note that the latest 'format version 4' is not yet documented by
985NCBI. \EMBOSS\ will only work with 'format version 3' databases, indexed
986with:
987
988\begin{verbatim}
989formatdb -A F
990\end{verbatim}
991
We hope to support 'format version 4' databases in future. If you pick
up a BLAST database from NCBI (or elsewhere) check the format. If it
is in the new format, you will need to pick up the original FASTA
format file, and either index it yourself with \ilcomm{formatdb}, or run
\ilcomm{dbifasta} and use the FASTA file in \EMBOSS\ (see the EMBLCD
access method).
998
The definition should use \ilcomm{format: ncbi} because this is what
the BLAST \ilcomm{formatdb} databases store internally.
1001
1002\begin{verbatim}
DB mydb [
#required parameters
   method: "blast"
   format: "ncbi"
   type: "N"
   dir: "$emboss_db_dir/blast"
#optional parameters
   fields: "sv des"
   release: "63.0"
   comment: "my comment"
   indexdir: "$emboss_db_dir/blastindices"
]
1014\end{verbatim}
1015
1016The index files can be kept in the same directory as the database, but
1017as each EMBLCD index needs its own directory (the filenames are fixed)
1018the indexdir is usually defined.
1019
1020The EMBLCD index files include the filenames indexed by
1021\ilcomm{dbiblast}. You can use the file: and exclude: attributes to
1022create file-specific subsets from a single \ilcomm{dbiblast} generated
1023index, but as blast index files are split only by the number of
1024entries this is not generally useful.
1025
1026If the database was indexed with additional fields, they can be
1027included in the definition as fields: to allow their use in USAs.
1028
1029\paragraph{DIRECT}\par\noindent
1030Modes: \ilcomm{a}\par\noindent Direct accesses the flatfile
1031directly. It returns all the database entries, one after the other. It
1032assumes no indexing. Queries are still possible as \EMBOSS\ will read
1033each entry and match it against the query, but are slow as the entire
1034database must be read.
1035
1036\begin{verbatim}
1037DB mydb [
1038#required parameters
1039   method: "direct"
1040   format: "embl"
1041   type: "N"
   dir: "$emboss_db_dir/mydb"
1043   file: "*.dat"
1044#optional parameters
1045   fields: "sv des key org"
1046   release: "63.0"
1047   comment: "My own database with no indices"
1048   exclude: "est*.dat"
1049]
1050\end{verbatim}
1051
1052For most cases, it is simpler to use \ilcomm{dbiflat} for EMBL,
1053Genbank or SwissProt format, or \ilcomm{dbifasta} to index FASTA or NCBI
1054format files, and to use the EMBLCD access method.
1055
1056If the file format supports additional fields, they can be
1057included in the definition as fields: to allow their use in USAs.
1058
1059\paragraph{EMBLCD}\par\noindent
1060Modes: \ilcomm{a q s}\par\noindent EMBLCD uses EMBLCD indices created
1061with \progname{dbiflat} or \progname{dbifasta} to access flatfile
1062databases in the original format.
1063
1064\begin{verbatim}
1065DB mydb [
1066#required parameters
1067   method: "emblcd"
1068   format: "embl"
1069   type: "N"
   dir: "$emboss_db_dir/embl"
#optional parameters
   fields: "sv des key org"
   file: "*.dat"
   release: "63.0"
   comment: "my comment"
   exclude: "est*.dat"
   indexdir: "$emboss_db_dir/indices"
1078]
1079\end{verbatim}
1080
1081The EMBLCD index files include the filenames indexed by
1082\ilcomm{dbiflat} or \ilcomm{dbifasta}. You can use the file: and
1083exclude: attributes to create file-specific subsets from a single
1084index.
1085
1086This method can require careful setup. Please read the more specific
1087descriptions below.
1088
1089If the database was indexed with additional fields, they can be
1090included in the definition as fields: to allow their use in USAs.
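
A sketch of what this makes possible, assuming an index that was built
with the description, keyword and organism fields. The search terms are
placeholders, and the field-qualified USA form is extrapolated from
the \ilcomm{mydb-id:} and \ilcomm{mydb-acc:} forms used elsewhere in
this guide.

\begin{verbatim}
DB mydb [
   method: "emblcd"
   format: "embl"
   type: "N"
   dir: "$emboss_db_dir/embl"
   indexdir: "$emboss_db_dir/indices"
   fields: "des key org"
]

# USAs that the fields: line is intended to allow:
#   mydb-des:kinase
#   mydb-key:promoter
#   mydb-org:drosophila
\end{verbatim}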
1091
1092\paragraph{EXTERNAL}\par\noindent
1093Modes: \ilcomm{a q s}\par\noindent EXTERNAL uses an external
1094application to retrieve sequences.  The ID is passed as an argument to
1095the application, either replacing \%s in the command string (if
1096present) or as an additional argument (if there is no \%s).
1097
1098EXTERNAL requires the application to return the sequence on STDOUT. If
1099the application writes to somewhere else, simply wrap it in a script
1100that copies the output to STDOUT.
1101
1102\begin{verbatim}
DB mydb [
#required parameters
    method: "app"
    format: "fasta"
    type: "P"
    app: "getfromdb"
#optional parameters
    comment: "my own protein database with a custom retrieval program"
#alternative app: definition (use in place of the one above)
    app: "getfromdb mydatabase %s"
]
1113\end{verbatim}
1114
The first \ilcomm{app:} definition will use the default call
'\ilcomm{getfromdb mydb:id}'.

The alternative \ilcomm{app:} definition will use the \%s format and
call '\ilcomm{getfromdb mydatabase id}'.

Both will pass either the ID or accession from the query, so that the
USAs \ilcomm{mydb-id:x13776} and \ilcomm{mydb-acc:x13776} are equivalent.
1122
1123\paragraph{GCG}\par\noindent
1124Modes: \ilcomm{a q s}\par\noindent GCG uses EMBLCD indices created
1125with \progname{dbigcg} to access databases in GCG format. This method
1126uses the \filename{.ref} and \filename{.seq} files created by the
1127\progname{GCG} suite of programs.
1128
1129\begin{verbatim}
1130DB mygcgdb [
1131#required parameters
1132   method: "gcg"
1133   format: "embl"
1134   type: "N"
   dir: "$emboss_db_dir/gcgembl"
#optional parameters
   fields: "sv des key org"
   file: "*.seq"
   release: "63.0"
   comment: "my comment"
   exclude: "est*"
   indexdir: "$emboss_db_dir/indices"
1143]
1144\end{verbatim}
1145
1146The EMBLCD index files include the filenames indexed by
1147\ilcomm{dbigcg}. You can use the file: and exclude: attributes to
1148create file-specific subsets from a single \ilcomm{dbigcg} generated
1149index.
1150
1151\paragraph{SRS}\par\noindent
1152Modes: \ilcomm{a q s}\par\noindent SRS returns entries from a local
1153installation of SRS using the -e switch to getz to return entries in
1154the original format.
1155
1156\begin{verbatim}
1157DB mydb [
1158#required parameters
1159   method: "srs"
1160   format: "embl"
1161   type: "N"
1162#optional parameters
1163   dbalias: "embl"
1164   fields: "sv des key org"
1165   app: "getz"
1166   comment: "My srs indexed database"
1167   release: "63.0"
1168]
1169\end{verbatim}
1170
This access method builds an SRS command-line query to
\progname{getz}. If you have \progname{getz} installed under another
name, define this as \ilcomm{app:}.

The SRS query by default uses the \EMBOSS\ database name. If the
database has a different name in SRS, define \ilcomm{dbalias:} as the
database name to pass to SRS.

SRS will return the results using '\ilcomm{getz -e}' so the format
should match the format of the original data. For some formats this
can be tricky (PIR for example), so consider using SRSFASTA, although
this will lose information that is not included in the FASTA format
SRS output.

To query using the additional fields SRS supports, add them as
\ilcomm{fields:}.
1184
1185\paragraph{SRSFASTA}\par\noindent
1186Modes: \ilcomm{a q s}\par\noindent
1187As SRS but returns the sequences in FASTA format. The definition must
1188include format: fasta so that EMBOSS will read the results in FASTA
1189format.
1190
1191\begin{verbatim}
1192DB mydb [
1193#required parameters
1194   method: "srsfasta"
1195   format: "fasta"
1196   type: "N"
1197#optional parameters
1198   dbalias: "embl"
1199   fields: "sv des key org"
1200   app: "getz"
1201   comment: "My srs indexed database"
1202   release: "63.0"
1203]
1204\end{verbatim}
1205
This access method builds an SRS command-line query to
\progname{getz}. If you have \progname{getz} installed under another
name, define this as \ilcomm{app:}.

The SRS query by default uses the \EMBOSS\ database name. If the
database has a different name in SRS, define \ilcomm{dbalias:} as the
database name to pass to SRS.

SRS will return the results using '\ilcomm{getz -f -sf fasta}' so the
format must be 'fasta'.

To query using the additional fields SRS supports, add them as
\ilcomm{fields:}.
1217
1218\paragraph{SRSWWW}\par\noindent
1219Modes: \ilcomm{a q s}\par\noindent
1220As URL, but specific to an SRS web server. This method takes a base
1221URL (up to wgetz) for an SRS server, and builds the rest of the URL as
1222a valid SRS query.
1223
1224By building the URL, SRSWWW access can query both ID and accession
1225number, and can query additional fields 'sv', 'des', 'key' and 'org'
1226if they are allowed with a fields definition.
1227
1228\begin{verbatim}
1229DB mydb [
1230# required parameters
1231    method: "srswww"
1232    format: "genbank"
1233    type: "N"
1234    url: "http://www.infobiogen.fr/srs5bin/cgi-bin/wgetz?"
1235#optional parameters
1236    dbalias: "genbank"
1237    fields: "sv des key org"
1238    comment: "Genbank by SRS from InfoBiogen"
1239    proxy: ":"
1240    httpversion: "1.0"
1241]
1242\end{verbatim}
1243
1244Because queries for such fields to a remote server can find a very
1245large number of hits, and EMBOSS will load the entire output into
1246memory to process the HTML, many EMBOSS administrators choose not to
1247define these fields for an SRSWWW server.
1248
1249If there is sufficient demand, it should be possible to rewrite the
1250HTML preprocessing to avoid buffering in memory.
1251
SRSWWW supports the \ilcomm{proxy} and \ilcomm{httpversion} settings
described under access method URL.
1254
1255\paragraph{URL}\par\noindent
1256Modes: \ilcomm{s}\par\noindent URL uses a defined web server to
1257retrieve a specific entry. EMBOSS may fail if the HTML causes
1258complications with parsing of the entry.
1259
1260\begin{verbatim}
1261DB mydb [
1262# required parameters
1263   method: "url"
1264   format: "genbank"
1265   type: "N"
1266   url: "http://www.infobiogen.fr/srs5bin/cgi-bin/wgetz?-e+[genbank-id:%s]"
1267#optional parameters
1268   comment: "Genbank by ID from InfoBiogen"
1269]
1270\end{verbatim}
1271
1272The \%s in the URL string indicates where \EMBOSS\ will insert the
1273identifier portion of the USA.
1274
1275At many sites, remote HTTP access is controlled by a proxy
1276server. EMBOSS uses a proxy server defined as EMBOSS\_PROXY with a
1277value in the format \ilcomm{domain.address:port}, for example:
1278
1279\begin{verbatim}
set emboss_proxy "proxy.mydomain.org:8080"
1281\end{verbatim}
1282
1283This is a global definition. For selected databases (local web-based
1284services, for example) you can turn off the proxy inside the database
1285definition with:
1286
1287\begin{verbatim}
1288DB [ ...
1289  proxy: ":"
1290]
1291\end{verbatim}
1292
HTTP access by default uses HTTP protocol version 1.0. \EMBOSS\ can
also support version 1.1, which returns results in chunks to improve
network performance. The HTTP version is controlled by a
variable EMBOSS\_HTTPVERSION and by a DB attribute, for example:
1297
1298\begin{verbatim}
1299set emboss_httpversion "1.1"
1300\end{verbatim}
1301
1302or
1303
1304\begin{verbatim}
1305DB [ ...
1306  httpversion: '1.1'
1307]
1308\end{verbatim}
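
The two settings can be combined. The sketch below assumes a site
proxy for remote servers and a hypothetical intranet server,
\ilcomm{seqserver.mydomain.org}, that must be reached directly. The
host names, the database name and the query URL are placeholders, not
real services.

\begin{verbatim}
# Global defaults
set emboss_proxy "proxy.mydomain.org:8080"
set emboss_httpversion "1.1"

# Local web service: bypass the proxy and stay with HTTP 1.0
DB localseq [
   method: "url"
   format: "fasta"
   type: "P"
   url: "http://seqserver.mydomain.org/cgi-bin/getseq?id=%s"
   proxy: ":"
   httpversion: "1.0"
   comment: "Intranet sequence server, no proxy"
]
\end{verbatim}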
1309
1310\subsection{Mixed access methods}
1311
1312For any given \ilcomm{method:} declaration, \EMBOSS\ will use that
1313method for those access modes supported by the method.
1314
If you wish to specify which access mode (all, query or single) should
be handled by which database retrieval method then the
\ilcomm{methodsingle:}, \ilcomm{methodquery:} and \ilcomm{methodall:}
declarations should be used in place of \ilcomm{method:}, as in this
example:
1319
1320\begin{verbatim}
1321DB mydb [
1322methodsingle: app
1323format: fasta
1324app: "customapp myproteindb"
1325methodall: direct
dir: $emboss_db_dir/myproteindb
1327file: myproteindb.dat
1328type: P
1329comment: "single and all access for myproteindb"
1330]
1331\end{verbatim}
1332
You can mix these, for example to use a script to query a file and
direct access to read all entries:
1335
1336\begin{verbatim}
1337  methodall: 'direct'
1338  methodquery: 'external'
1339\end{verbatim}
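
A complete definition along these lines might look like the following
sketch. The script name \ilcomm{querymydb}, the database name and the
file names are hypothetical.

\begin{verbatim}
DB myproteins [
   type: "P"
   format: "fasta"
   comment: "all entries read directly, queries handled by a script"
# all mode: read the whole file
   methodall: "direct"
   dir: "$emboss_db_dir/myproteins"
   file: "myproteins.fasta"
# query mode: hand the query to a script that prints FASTA on STDOUT
   methodquery: "external"
   app: "querymydb %s"
]
\end{verbatim}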
1340
1341\subsection{Indexing and configuring flatfile databases}
1342
1343Flatfile databases are plain text files in a defined format such as
1344those released by EMBL, Swissprot and so on. The \EMBOSS\ program
1345\progname{dbiflat} is used to generate EMBLCD indices that can be used
1346for all types of database access. \progname{dbiflat} can process
1347databases in EMBL, SWISSPROT and GENBANK format. Pseudo EMBL format
1348databases which do not have unique ID and AC entries may cause
1349\progname{dbiflat} to do mysterious things and should be avoided.
1350
1351\progname{dbiflat} (and the EMBLCD access method) requires the
1352databases to be uncompressed. The examples given here will not probe
1353the deeper secrets of \progname{dbiflat} (for which the reader is
1354referred to the documentation, or failing that the source code) but
1355will show a typical installation for a common database.
1356
1357We assume that \EMBOSS\ has been installed and works. This can be
1358tested with the command \ilcomm{wossname -auto} which should list all
1359the programs available.
1360
1361In this example we will index and configure the EMBL database for use
1362with \EMBOSS.
1363
1364First download and unpack the EMBL database. This will require a
1365considerable amount of disk space. If you do not have sufficient space
1366available then just download a subset of the database.
1367
Use \ilcomm{cd} to move to the directory in which you have unpacked
EMBL. This should look something like this when you run \ilcomm{ls}:
1370
1371\begin{verbatim}
1372% ls
1373est_fun.dat
1374est_hum1.dat
1375est_hum10.dat
1376.
1377Output truncated
1378.
1379syn.dat
1380unc.dat
1381vrl.dat
1382vrt.dat
1383\end{verbatim}
1384
1385Run \progname{dbiflat} to create the EMBLCD indices.
1386
1387\begin{verbatim}
1388% dbiflat
1389
1390Index a flat file database
1391      EMBL : EMBL
1392     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
1393        GB : Genbank, DDBJ
1394Entry format [SWISS]: EMBL
1395Database name: embl
1396Database directory [.]:
1397Wildcard database filename [*.dat]:
1398Release number [0.0]: 63.0
1399Index date [00/00/00]: 31/07/00
1400\end{verbatim}
1401
1402\progname{dbiflat} should happily chug away for some considerable time
1403(up to a few hours depending on the speed of your machine) and will
1404generate (eventually) the following index files:
1405
1406\begin{verbatim}
1407% ls
1408acnum.hit
1409acnum.trg
1410division.lkp
1411entrynam.idx
1412\end{verbatim}
1413
Now we create an entry in the \EMBOSS\ configuration files to access
the database. It is probably a good idea to try new database
definitions in your local configuration file first.

Put the following entry in your \filename{.embossrc}:
1419
1420\begin{verbatim}
1421DB embl [
1422   type: N
1423   method: emblcd
1424   format: embl
   dir: $emboss_db_dir/embl
1426   file: "*.dat"
1427   release: "63.0"
1428   comment: "EMBL release 63.0"
1429]
1430\end{verbatim}
1431
You will need to have defined \ilcomm{\$emboss\_db\_dir} beforehand,
using a directive such as
1434
1435\begin{verbatim}
1436set emboss_db_dir /path_to_databases
1437\end{verbatim}
1438
1439somewhere in your \filename{emboss.default} or \filename{.embossrc}.
1440
1441Save \filename{.embossrc} and try \progname{showdb}. You should see a
1442line that looks like:
1443
1444\begin{verbatim}
1445% showdb
1446.. output deleted
1447embl          N    OK  OK  OK  EMBL release 63.0
1448.. output deleted
1449\end{verbatim}
1450
\subsection{Fine tuning the installation}
1452\label{sec:finetune}
1453It is probably a good idea to set up subsections of the database so
1454that end users can search just the regions they wish to search. This
1455section applies to all access methods that use EMBLCD style indexes
1456and probably to others as well.
1457
1458Files can be included with the declaration \ilcomm{file:} or excluded
1459with the declaration \ilcomm{exclude:}. It is a good idea to put the
wild card directory specifier (\filename{*/}) in front of the filename
1461to ensure that any path that may be included in
1462\filename{division.lkp} will be matched. Please note especially the
1463notes for \progname{GCG} formatted databases indexed with
1464\progname{dbigcg}.
1465
To select just the EST files in our EMBL database, try the following:
1467
1468\begin{verbatim}
1469DB emblest [
1470   type: N
1471   method: emblcd
1472   format: embl
   dir: $emboss_db_dir/embl
1474   file: "est*.dat"
1475   release: "63.0"
1476   comment: "EMBL release 63.0"
1477]
1478\end{verbatim}
1479
1480Files can also be given as a space separated list enclosed in
quotes. For example to set up a database of all mammalian sequences
1482(except genomes) try the following:
1483
1484\begin{verbatim}
1485DB emblallmam [
1486   type: N
1487   method: emblcd
1488   format: embl
   dir: $emboss_db_dir/embl
1490   file: "rod*.dat hum*.dat mam*.dat"
1491   release: "63.0"
1492   comment: "EMBL release 63.0"
1493]
1494\end{verbatim}
1495
1496As you can see from these two examples, the \ilcomm{file:} tag takes a
1497space delimited list of filenames enclosed in quotes that can contain
1498normal wildcard (\ilcomm{?*}) characters.
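
To illustrate the difference between the two wildcard characters, the
sketch below uses file names from the directory listing shown
earlier. It assumes the usual shell-style meaning of the wildcards,
where \ilcomm{?} matches exactly one character and \ilcomm{*} matches
any number of characters. The database name is hypothetical.

\begin{verbatim}
DB emblesthum [
   type: "N"
   method: "emblcd"
   format: "embl"
   dir: "$emboss_db_dir/embl"
# est_hum?.dat would match est_hum1.dat but not est_hum10.dat;
# est_hum*.dat would match both
   file: "est_hum?.dat"
   release: "63.0"
   comment: "EMBL release 63.0, single-digit human EST files"
]
\end{verbatim}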
1499
It can be quite tedious to set up a long list of files to
search. In many cases you can use the \ilcomm{exclude:} tag to make
things easier.
1503
1504\begin{verbatim}
1505DB emblnoest [
1506   type: N
1507   method: emblcd
1508   format: embl
   dir: $emboss_db_dir/embl
1510   file: "*.dat"
1511   exclude: "est*.dat"
1512   release: "63.0"
1513   comment: "EMBL release 63.0"
1514]
1515\end{verbatim}
1516
This configures the \ilcomm{emblnoest} database to contain all of
EMBL except the ESTs.
1519
1520\subsection{Indexing and configuring GCG format databases}
1521
\EMBOSS\ can access GCG formatted databases, so sites that still use
GCG do not need to keep a second copy of each database in flatfile
format.  \EMBOSS\ creates EMBLCD-like indices for the GCG format
databases using the program \progname{dbigcg}.  This runs in much the
same way as \progname{dbiflat}. You will need the GCG format
\filename{.seq} and \filename{.header} files in order to create an
EMBLCD indexed database.

Move to the GCG database directory containing your data and run
\progname{dbigcg}:
1533
1534\begin{verbatim}
1535Index a GCG formatted database
1536      EMBL : EMBL
1537     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
1538        GB : Genbank, DDBJ
1539       PIR : NBRF
1540Entry format [EMBL]:
1541Database name: embl
1542Database directory [.]:
1543Wildcard database filename [*.seq]:
1544Release number [0.0]: 63.0
1545Index date [00/00/00]: 31/07/00
1546\end{verbatim}
1547
1548The program will chug along for a while and will then generate the
1549EMBLCD index files for the GCG format database.
1550
When \progname{dbigcg} prompts for the entry format (\ilcomm{Entry
format [EMBL]:}) you should enter the format the database was in
before you ran \progname{embltogcg} or a similar program to generate
the \progname{GCG} databases.

The following entry should be put in your \filename{.embossrc}:
1557
1558\begin{verbatim}
1559DB gcgembl [
1560   type: N
1561   method: gcg
1562   format: embl
   dir: $emboss_db_dir/embl
   file: "*.seq"
1565   release: "63.0"
1566   comment: "EMBL release 63.0"
1567]
1568\end{verbatim}
1569
1570\progname{showdb} should show your newly configured database.
1571
You can configure subsets of the databases in the same way as for the
original format databases, described in section \ref{sec:finetune}
above. One difference from \progname{dbiflat} indexing is that both the
\filename{.seq} and \filename{.header} files are listed in the
\filename{division.lkp} file. \ilcomm{file:} and \ilcomm{exclude:}
directives should therefore omit the file extension, taking the form
\ilcomm{exclude: */em\_est*} instead of \ilcomm{exclude: */em\_est*.seq}.
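
For instance, a definition that leaves out the EST division of a GCG
format EMBL database might be sketched as follows. The database name
and the \filename{em\_est} file prefix are assumptions about a typical
GCG installation.

\begin{verbatim}
DB gcgemblnoest [
   type: "N"
   method: "gcg"
   format: "embl"
   dir: "$emboss_db_dir/gcgembl"
# no .seq suffix, so that both the .seq and .header entries match
   exclude: "*/em_est*"
   release: "63.0"
   comment: "GCG format EMBL without ESTs"
]
\end{verbatim}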
1579
1580\subsection{Indexing and configuring BLAST databases}
1581BLAST format databases are generated for efficient homology searching
1582using the BLAST programs. It can be convenient to avoid redundant
1583copies of databases so \EMBOSS\ provides a mechanism for accessing
1584these databases.
1585
1586BLAST format databases are those generated using the tools distributed
1587with NCBI-BLAST or with WU-BLAST.
1588
1589\begin{comment}At present \EMBOSS
1590will only index BLAST databases created from FASTA format input files
1591with one of the recognised header formats.  More information on the
1592relevant formats can be found in subsection \ref{subsec:fasta}
1593below.
1594\end{comment}
1595
To index a single BLAST database, move to the
directory containing your BLAST format databases and run
\progname{dbiblast}:
1599
1600\begin{verbatim}
1601Index a BLAST database
1602Database name: blastsw
1603Database directory [.]:
1604database base filename [blastsw]:
1605Release number [0.0]:
1606Index date [00/00/00]:
1607         N : nucleic
1608         P : protein
1609         ? : unknown
1610Sequence type [unknown]: p
1611         1 : wublast and setdb/pressdb
1612         2 : formatdb
1613         0 : unknown
1614Blast index version [unknown]: 2
1615
1616\end{verbatim}
1617
1618The program will chug along for a while and will then generate the
1619EMBLCD index files for the BLAST format database.
1620
1621The following entry (or one like it that is more appropriate to your
1622particular installation) should be put in your \filename{.embossrc}
1623
1624\begin{verbatim}
1625DB blastsw [
1626   type: P
1627   method: blast
1628   format: ncbi
   dir: $emboss_db_dir/blastsw
1630   file: "blastsw"
1631   release: "38.9"
1632   comment: "BLAST format Swissprot"
1633]
1634\end{verbatim}
1635
1636\progname{showdb} should show your newly configured database.
1637
Because of the way BLAST works, many sites group their BLAST
databases in the same directory. You can index these {\it in situ}
with \progname{dbiblast}, but this may require some extra steps if your
databases are not all of the same type, because generating a
subsequent set of index files will overwrite those that already
exist. To avoid overwriting index files you can index many databases
with one set of index files, or you can use the \ilcomm{indexdir}
options to place the indices in a different directory.
1646
1647There are two requirements for indexing several databases together in
1648one index. The first is that the databases are the same type
1649(protein/nucleic acid) and generated with the same tool (pressdb or
1650formatdb); the second is that all the ID and accession numbers in the
1651combined databases are unique.
1652
1653Run \progname{dbiblast} as before but specify all the databases you
1654wish to be included when prompted for the database filename.
1655
1656\begin{verbatim}
1657Index a BLAST database
1658Database name: alldbs
1659Database directory [.]:
1660database base filename [alldbs]: dbone dbtwo dbthree dbfour
1661Release number [0.0]:
1662Index date [00/00/00]:
1663         N : nucleic
1664         P : protein
1665         ? : unknown
1666Sequence type [unknown]: p
1667         1 : wublast and setdb/pressdb
1668         2 : formatdb
1669         0 : unknown
1670Blast index version [unknown]: 2
1671
1672\end{verbatim}
1673
1674These can then be configured as described in section
1675\ref{sec:finetune} above by using the '\ilcomm{file:}' and
1676'\ilcomm{exclude:}' tags as appropriate.\footnote{There is one
1677difference to the standard EMBLCD access method in that the database
1678indexes will not allow the generation of exclusive subsections of the
1679combined database. If an ID or accession number is specified that is
1680present in the index then the sequence will be returned irrespective
1681of which database it is in.}
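
As a sketch, one of the four databases indexed above could be exposed
under its own name as shown below. The data directory is a
placeholder, and as the footnote warns, lookups by ID or accession
number may still return entries from the other databases in the
combined index.

\begin{verbatim}
DB dbone [
   type: "P"
   method: "blast"
   format: "ncbi"
   dir: "$emboss_db_dir/blast"
   file: "dbone"
   comment: "dbone, taken from the combined alldbs index"
]
\end{verbatim}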
1682
1683When you have databases of different types, generated with different
1684programs or where the ID/accession numbers are duplicated between
1685databases the preferred strategy is probably to keep the source data
1686for the individual databases in separate directories and index them
1687there.\footnote{Keeping one directory with symbolic links for your
1688BLAST installation will ensure that BLAST continues to function
1689correctly if you set BLASTDB to point to the directory containing the
1690symbolic links. The EMBOSS indices can be placed wherever you wish as
1691long as you remember to run \progname{dbiblast} with the appropriate
1692options and put an appropriate \ilcomm{indexdir} tag in the DB
configuration in your \filename{.embossrc}.}
1694
Alternatively you can place the index files in a separate
directory. This requires that you run \progname{dbiblast} with the
\ilcomm{-indexdirectory} option and set the \ilcomm{indexdir:} tag in
the database configuration to point to the same directory. The
example below illustrates database configuration using the
\ilcomm{indexdir} options.
1701
1702\begin{verbatim}
1703% dbiblast -indexdir=/databases/indices/mydb
1704Index a BLAST database
1705Database name: mydb
1706Database directory [.]:
1707database base filename [mydb]:
1708Release number [0.0]:
1709Index date [00/00/00]:
1710         N : nucleic
1711         P : protein
1712         ? : unknown
1713Sequence type [unknown]: p
1714         1 : wublast and setdb/pressdb
1715         2 : formatdb
1716         0 : unknown
1717Blast index version [unknown]: 2
1718
1719\end{verbatim}
1720
1721The corresponding entry in \filename{~/.embossrc} (or
1722\filename{emboss.default}) would look like:
1723
1724
1725\begin{verbatim}
1726DB mydb [
1727   type: P
1728   method: blast
1729   format: ncbi
1730   dir: \$emboss_db_dir/blastsw
1731   indexdir: /databases/indices/mydb
1732   file: mydb
1733   release: "1.0"
1734   comment: "My BLAST DB with an index in a different directory"
1735]
1736\end{verbatim}
1737
1738Again, multiple indices cannot coexist in the same directory so care
1739should be taken when using the \ilcomm{indexdir} options that an
1740existing database index is not overwritten.
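
As a sketch of how to keep indices apart, the two definitions below
give each BLAST database its own index directory. The database names,
file names and paths are hypothetical; only tags already shown above
are used.

\begin{verbatim}
DB protdb1 [
   type: P
   method: blast
   format: ncbi
   dir: /databases/blast
   indexdir: /databases/indices/protdb1
   file: protdb1
   comment: "First BLAST DB, indexed in its own directory"
]

DB protdb2 [
   type: P
   method: blast
   format: ncbi
   dir: /databases/blast
   indexdir: /databases/indices/protdb2
   file: protdb2
   comment: "Second BLAST DB, indexed in its own directory"
]
\end{verbatim}

Each index directory would be filled by a separate run of
\progname{dbiblast} with the matching \ilcomm{-indexdir} value.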
1741
1742\begin{comment}
1743\subsubsection{FASTA formats used with \progname{dbiblast}}
1744\label{subsec:fasta}
1745The following FASTA formats are recognised by \progname{dbiblast}:
1746
1747\begin{tabular}[t]{|l|l|}\hline \setlength{\baselineskip}{1.2\baselineskip}
1748GENBANK/NCBI & \ilcomm{> \ldots |accno|id \ldots }\\
1749\hline
1750GCG & \ilcomm{>{\sl dbname}:accno id \ldots }\\
1751\hline
1752SIMPLE &\ilcomm{ >accno id \ldots} \\
1753\hline
1754ID & \ilcomm{>id}\\
1755\hline
1756\end{tabular}
1757\ilcomm{...} refers to any text. Note that the ID must be the only
1758item in the header for the ID format.
1759
1760\end{comment}
1761\subsection{Indexing and configuring FASTA databases}
1762
The FASTA specifications just define the sequence file as a header
line that begins with \ilcomm{>} and subsequent lines containing the
sequence.  The header line can be present in an almost infinite number
of formats, several of which can be processed by \EMBOSS.
\EMBOSS\ attempts to determine the accession number and/or ID for each
sequence.  For indexing purposes there is no semantic difference
between an accession number and an ID. In the real world, accession
numbers are immutable, i.e.\ they do not change with subsequent releases
of the database, but IDs may change. In any case IDs and accession
numbers are unique, and that is all that matters for database indexing
in \EMBOSS.
1774
1775The program used to process FASTA format databases is
1776\progname{dbifasta}. It can recognise the following header line
1777formats, specified on the command line:
1778
1779\begin{tabular}[t]{|l|l|}\hline\setlength{\baselineskip}{1.5\baselineskip}
1780simple &%
1781\ilcomm{>id ...}\\
1782\hline
1783idacc &%
1784\ilcomm{>id accno ...}\\
1785\hline
1786gcgid &%
1787\ilcomm{>db:id ...}\footnotemark[\value{footnote}]\\
1788\hline
1789gcgidacc &%
1790\ilcomm{>db:id acc ...}\footnotemark[\value{footnote}]\\
1791\hline
1792dbid &%
1793\ilcomm{>db id ...}\footnotemark\\
1794\hline
1795ncbi &%
1796\ilcomm{>...[|accno]|id ...}\footnotemark\\
1797\hline
1798\end{tabular}
1799\addtocounter{footnote}{-1} \footnotetext{{\em db} is one word}
1800\addtocounter{footnote}{1} \footnotetext{The ID is always taken to be
1801the characters after the last bar (\ilcomm{|}). The previous field is
1802also indexed but ONLY if it looks like an accession number
1803(e.g. AC00001).}
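
As an illustration, the header lines below show how one entry might
look in each of these formats. The ID \ilcomm{MYSEQ1}, the accession
number \ilcomm{X00001} and the database name \ilcomm{mydb} are
invented for the example.

\begin{verbatim}
simple    >MYSEQ1 some description
idacc     >MYSEQ1 X00001 some description
gcgid     >mydb:MYSEQ1 some description
gcgidacc  >mydb:MYSEQ1 X00001 some description
dbid      >mydb MYSEQ1 some description
ncbi      >gb|X00001|MYSEQ1 some description
\end{verbatim}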
1804
1805
1806Other header formats will not be recognised by \progname{dbifasta} and
1807will cause indexing and/or database lookup to fail. If you have a
1808different header format that \progname{dbifasta} cannot yet handle you
1809have two options:
1810\begin{enumerate}
\item (The preferred option) Get a C programmer to modify the source
code for \progname{dbifasta} and recompile. If you are a
community-spirited person you will also contribute these changes to the
main \EMBOSS\ source tree (email emboss-dev\@@emboss.open-bio.org for
more information on contributing changes to the \EMBOSS\ source code
and/or read the \EMBOSS\ developers' documentation).
1817\item (The quick hack) Write a custom script (using
1818e.g. BioPerl\URL{http://www.bioperl.org}) to access your database and
1819use \ilcomm{method: external} to configure it. This is less desirable
1820as you may be limited in the access modes you can use.
1821\end{enumerate}
1822
1823To index a FASTA format database, run \progname{dbifasta}.
1824
1825\begin{verbatim}
1826% dbifasta
1827Index a fasta database
1828    simple : >ID
1829     idacc : >ID ACC
1830     gcgid : >db:ID
1831  gcgidacc : >db:ID ACC
1832      ncbi : >blah|...[|ACC]|ID
1833ID line format [idacc]:
1834Database name: mydb
1835Database directory [.]:
1836Wildcard database filename [*.dat]: mydb.fasta
1837Release number [0.0]:
1838Index date [00/00/00]:
1839\end{verbatim}
1840
\progname{dbifasta} will chug along for a little while and will
produce the index files. You can use the same \ilcomm{indexdir}
options as for \progname{dbiflat}, \progname{dbigcg} and
\progname{dbiblast} to place the indices in a different directory.

Place the following entry in your \filename{.embossrc}:
1847
1848\begin{verbatim}
1849DB mydb [
1850        type: P
1851        method: emblcd
1852        format: fasta
1853        dir: \$emboss_db_dir/mydb
        file: mydb.fasta
1855        comment: "My database"
1856]
1857\end{verbatim}
1858
\ilcomm{format:} should be \ilcomm{dbid}, \ilcomm{ncbi} or
\ilcomm{fasta} (for every format except \ilcomm{dbid} or
\ilcomm{ncbi}). The same \ilcomm{file:} and \ilcomm{exclude:} tags can
be used as for the other database indexing programs.
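
As a sketch of the \ilcomm{indexdir} variant, the index for a FASTA
database could be written to a separate directory and the definition
extended to match. The index path is hypothetical, the
\ilcomm{-indexdir} form simply mirrors the \progname{dbiblast}
example, and the data directory is written out in full rather than
through a variable.

\begin{verbatim}
% dbifasta -indexdir=/databases/indices/mydb
\end{verbatim}

\begin{verbatim}
DB mydb [
        type: P
        method: emblcd
        format: fasta
        dir: /databases/mydb
        indexdir: /databases/indices/mydb
        file: mydb.fasta
        comment: "My database, indexed in a separate directory"
]
\end{verbatim}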
1863
1864
\subsection{Configuring \EMBOSS\ to use SRS for database lookup}
1866
1867\ilcomm{method: srs} is really a special case of \ilcomm{method:
1868external} with some additional features.
1869
1870SRS is a powerful database querying system that can cross reference
1871between different databases, launch applications and so on. SRS can be
1872run either through a web interface (see the description of the URL
1873method above for an example) or via the command line program
\progname{getz}.  Indexing and configuring databases for SRS is
outside the scope of this document, which describes only how to connect
to preconfigured and indexed SRS databases.\footnote{For information
on configuring and indexing SRS databases please look at the SRS
administrator's guide \filename{www/doc/srsadmin.pdf} in your SRS 6
installation.} If \progname{getz} is already in your \ilcomm{PATH}
environment variable then insert the following (or similar) in your
\filename{.embossrc}:
1882
1883\begin{verbatim}
1884 DB emblgetz [
1885    type: N
1886    method: srs
1887    release: "63"
1888    format: embl
1889    comment: 'EMBL using getz'
1890    dbalias: embl
1891    app: getz
1892]
1893\end{verbatim}
1894
1895This will provide access to the SRS database 'embl' as
1896\ilcomm{emblgetz:acc}. If the SRS database has a different name to the
1897\EMBOSS\ database (as is the case here) then the \ilcomm{dbalias:} tag
1898should be used to access the correct SRS database.
1899
This configuration can be extremely slow for the \ilcomm{all} access
mode. It is probably a better idea to set up the database as follows:
1902
1903\begin{verbatim}
1904 DB emblgetz [
1905    type: N
1906    methodquery: srs
1907    release: "63"
1908    format: embl
1909    comment: 'EMBL using getz'
1910    dbalias: embl
1911    app: getz
1912    methodall: direct
1913    file: "*.dat"
1914    dir: \$emboss_db_dir/embl
1915]
1916\end{verbatim}
1917
1918which will use \ilcomm{method: srs} for the \ilcomm{query} access mode
1919but will use \ilcomm{method: direct} for the \ilcomm{all} access mode,
1920thus speeding up reading of the whole database.
1921
1922The SRSFASTA access method is identical to the normal SRS method
1923except that it returns the sequence in FASTA format and so does not
1924need a \ilcomm{format:} tag.
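
A definition using this method might look like the following
sketch. The database name \ilcomm{emblfasta} is hypothetical, and the
remaining tags are copied from the SRS example above with the
\ilcomm{format:} tag left out.

\begin{verbatim}
 DB emblfasta [
    type: N
    method: srsfasta
    release: "63"
    comment: 'EMBL using getz, returned as FASTA'
    dbalias: embl
    app: getz
]
\end{verbatim}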
1925
1926
1927\subsection{Indexing and configuring other databases}
1928
1929Many institutions may have local databases set up in their own
1930Laboratory Information Management System. \EMBOSS\ provides a simple
1931mechanism for interfacing with such systems.
1932
As long as a program is available that can be called noninteractively
and returns the specified sequence on standard output, \EMBOSS\ can
interface with it.  Use \ilcomm{method: app} or \ilcomm{method:
external} (the two are equivalent) and \ilcomm{app: "program
command"}.  The ID given in the USA will be appended to the command
used to run the program. It is probably best to specify the methods
available using the method subsets, \ilcomm{methodall:},
\ilcomm{methodquery:} and \ilcomm{methodsingle:}, rather than using
the generic \ilcomm{method:} tag.
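
A minimal sketch of such a definition follows. The database name
\ilcomm{lims} and the program \filename{/site/bin/limsfetch} are
hypothetical; the program is assumed to accept an ID as its last
argument and to write the matching entry to standard output in FASTA
format.

\begin{verbatim}
DB lims [
    type: P
    format: fasta
    methodquery: external
    app: "/site/bin/limsfetch"
    comment: "Local LIMS sequences via an external program"
]
\end{verbatim}

With this entry, and following the description above, a USA such as
\ilcomm{lims:MYSEQ1} would cause \EMBOSS\ to run
\ilcomm{/site/bin/limsfetch MYSEQ1} and read the sequence from its
output.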
1941
1942
1943\section{Other data}
1944
1945\EMBOSS\ can be integrated with some common biological
1946databases. These are described in this section.
1947
1948\subsection{REBASE}
1949
Rebase is the restriction enzyme database maintained by New
England Biolabs. It is needed for programs such as \progname{remap}
and \progname{restrict}.
1953
1954The latest version of Rebase can be obtained by anonymous
1955FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/rebase} \EMBOSS\ needs
1956the \filename{withrefm} file. The data is extracted for \EMBOSS\ with
1957the program \progname{rebaseextract}.
1958
1959\begin{verbatim}
1960% mkdir /site/prog/emboss/data/REBASE
1961% rebaseextract
1962Extract data from REBASE
1963Full pathname of WITHREFM: /data/rebase/withrefm.208
1964\end{verbatim}
1965
1966Rebase is now installed and ready to use.
1967
1968\subsection{TRANSFAC}
1969
1970Transfac is the transcription factor binding site database. It is
1971available by anonymous
FTP.\footnote{ftp://transfac.gbf.de/pub/transfac/ascii/} Unpacking the
distribution reveals a file called \filename{site.dat}. This is the
one \EMBOSS\ needs.
1975
1976Run \progname{tfextract} to extract the data from TRANSFAC.
1977
1978\begin{verbatim}
1979% tfextract
1980Extract data from TRANSFAC
1981Full pathname of transfac SITE.DAT: /databases/transfac/site.dat
1982\end{verbatim}
1983
1984\progname{tfscan} can now access the TRANSFAC database.
1985
1986\subsection{PROSITE}
1987
1988Prosite is a database of regular expressions that match potentially
1989diagnostic regions for structural/functional classification of
proteins. \EMBOSS\ needs this database for the \progname{patmatmotifs} program.
1991
1992PROSITE can be obtained via anonymous
1993FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/prosite}
1994
1995You may need to create a PROSITE subdirectory under data in the
1996\EMBOSS\ installation directory.
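
Assuming the installation directory used in the REBASE example
(\filename{/site/prog/emboss}), the subdirectory could be created as
follows; adjust the path to your own installation.

\begin{verbatim}
% mkdir /site/prog/emboss/data/PROSITE
\end{verbatim}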
1997
1998Then run \progname{prosextract} to build the \EMBOSS\ Prosite database.
1999
2000\begin{verbatim}
2001% prosextract
2002Builds the PROSITE motif database for patmatmotifs to search
2003Enter name of prosite directory: /data/prosite
2004\end{verbatim}
2005
PROSITE is now integrated into your \EMBOSS\ installation.
2007
2008\subsection{PRINTS}
2009
2010Prints is a database of diagnostic patterns of blocks of sequence
2011homology in protein families. The PRINTS database can be searched
2012using the \EMBOSS\ program \progname{pscan}.
2013
2014PRINTS can be obtained via anonymous
FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/prints} The database
is made available as compressed files which should be uncompressed
using \progname{gzip} before integrating them into \EMBOSS.

PRINTS is integrated with \EMBOSS\ using the program \progname{printsextract}.
2020
2021\begin{verbatim}
2022% printsextract
2023Extract data from PRINTS
2024Input file: /data/prints/prints27_0.dat
2025\end{verbatim}
2026
2027The PRINTS database is now integrated with \EMBOSS.
2028
2029\subsection{AAINDEX}
2030
2031An amino acid index is a set of 20 numerical values representing any
2032of the different physicochemical and biological properties of amino
2033acids.  The AAindex1 section of the Amino Acid Index Database is a
2034collection of published indices together with the result of cluster
2035analysis using the correlation coefficient as the distance between two
2036indices.  This section currently contains 437 indices in release
2037\filename{4.0} of the database.
2038
The \EMBOSS\ programs \progname{pepwindow} and \progname{pepwindowall} plot
hydrophobicity using the data from an Aaindex entry. If Aaindex is
installed these programs can plot the other amino acid properties.

Aaindex can be obtained via anonymous
FTP.\footnote{ftp://ftp.genome.ad.jp/pub/db/genomenet/aaindex/aaindex1}

Aaindex is integrated with \EMBOSS\ using the program \progname{aaindexextract}.
2047
2048\begin{verbatim}
2049% aaindexextract
2050Extract data from AAINDEX
2051Full pathname of file aaindex1: /data/aaindex/aaindex1
2052\end{verbatim}
2053
2054The AAINDEX database is now integrated with \EMBOSS.
2055
2056\subsection{CUTG}
2057
2058The CUTG database contains a series of codon usage tables calculated
2059from GenBank.
2060
2061CUTG can be obtained via anonymous
2062FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/cutg/ or
2063ftp://ftp.kazusa.or.jp/pub/codon/current/}
2064
2065CUTG is integrated with \EMBOSS\ using the program
2066\progname{cutgextract} which writes files to the CODONS data
2067directory.
2068
2069\begin{verbatim}
2070% cutgextract
2071Extract data from CUTG
2072CUTG directory [.]: /data/cutg/
2073\end{verbatim}
2074
2075The CUTG database is now integrated with \EMBOSS.
2076
2077\subsection{Miscellaneous data files}
2078
Other data files should be kept in the data directory under the main
\EMBOSS\ installation. Individual users' personal data files can be
kept in the current working directory, a subdirectory
\filename{.embossdata} of the current directory, their home directory
or a subdirectory \filename{.embossdata} of their home
directory. \EMBOSS\ will search these locations in this order and will
stop as soon as it finds a matching file. If the personal directories
do not contain the desired file, \EMBOSS\ will search the system-wide
data directory, \filename{/site/prog/emboss/data} in this example.
2088
2089Apparently inexplicable errors when running \EMBOSS\ programs may be
2090caused by the system not using the data files one expects. The search
2091path can be displayed in search order using the command
2092\progname{embossdata}.
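
For example, a user could keep a private copy of a data file in the
\filename{.embossdata} subdirectory of their home directory and then
check where \EMBOSS\ will look. The file name \filename{mydata.dat} is
hypothetical.

\begin{verbatim}
% mkdir ~/.embossdata
% cp mydata.dat ~/.embossdata/
% embossdata
\end{verbatim}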
2093
2094\section{Default program settings}
2095
2096As with many other areas, the default behaviour of programs can be
2097controlled by setting appropriate values in \filename{.embossrc}.
2098
2099All general qualifiers\footnote{See the \EMBOSS\ Quick Guide or the
2100web documentation (or use \ilcomm{wossname -help -verbose}) for an
2101overview of general qualifiers.} can be specified as
2102
2103\begin{verbatim}
2104set emboss_QUALIFIER 1
2105\end{verbatim}
2106
where \ilcomm{QUALIFIER} is one of the general qualifiers and the
value can be \ilcomm{1} or \ilcomm{Y} for true, or \ilcomm{0} or
\ilcomm{N} for false.
2110
2111Setting the qualifier value to true has the effect of running every
program with that qualifier set.\footnote{You can specifically unset
it by using the \ilcomm{-noQUALIFIER} command line option.} Qualifiers
2114can be set and will work in the same way as if you set them when
2115running the program. For example you can \ilcomm{set emboss\_verbose
2116Y} and the program will run normally, but when the program is run with
2117the \ilcomm{-help} qualifier, the output will be in verbose form.
2118
2119There is no point in globally setting options that are there for
2120producing help output.
2121
2122Qualifiers that can be set:
2123
2124\begin{description}
2125
2126\item[VERBOSE] Causes \ilcomm{-help} to print verbose text.
2127
\item[STDOUT] Causes all output to go to \filename{STDOUT} as
default. Programs will usually build a default output file name from
the input sequence and the program name.

\item[DEBUG] Writes debugging output to a file. Most useful as a
command line option when tracking down bugs.
2134
2135\item[OPTIONS] Enable prompting for optional parameters.
2136
2137\item[FILTER] Take input from \filename{STDIN} and send it to
2138\filename{STDOUT}, and turn on \ilcomm{-auto}
2139
2140\item[AUTO] Do not prompt for any options but accept the defaults if
2141no values are given.
2142
2143\item[WARNING] Print warning messages to \filename{STDERR} (default is true)
2144
2145\item[ERROR] Print error messages to \filename{STDERR} (default is true)
2146
2147\item[FATAL] Print fatal messages to \filename{STDERR} (default is true)
2148
2149\item[DIE] Print crash messages to \filename{STDERR}
2150
2151\end{description}
2152
2153These general qualifiers are typically used by advanced users
2154(\ilcomm{-options}, \ilcomm{-verbose}) or by developers
2155(\ilcomm{-debug -acdlog}).
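
As a sketch, a user who always wants to be prompted for optional
parameters and to see verbose help could put the following in
\filename{.embossrc}; the choice of qualifiers is only an example.

\begin{verbatim}
set emboss_options Y
set emboss_verbose Y
\end{verbatim}

For a single run the first setting could still be switched off with
\ilcomm{-nooptions} on the command line.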
2156
2157
Other program options that can be set are \ilcomm{emboss\_format},
\ilcomm{emboss\_acdroot}, and \ilcomm{emboss\_data}. The values of
\ilcomm{emboss\_format} and \ilcomm{emboss\_outformat} determine the
default sequence formats for input and output. For example, if you are
running \EMBOSS\ alongside \progname{GCG} you may wish to have the
following entry in your \filename{.embossrc}:
2164
2165\begin{verbatim}
2166set emboss_FORMAT gcg
2167set emboss_OUTFORMAT gcg
2168\end{verbatim}
2169
2170which has the effect of using \progname{GCG} format by
2171default.\footnote{This can of course be overridden using the
2172\ilcomm{-sformat} and \ilcomm{-osformat} associated qualifiers. See
2173the \EMBOSS\ ACD Syntax documentation or the \EMBOSS\ Quick Guide for
2174more information.}
2175
2176\ilcomm{emboss\_acdroot} \filename{/path/to/acd} can be set if you
2177wish to use a different directory for the ACD files, and
2178\ilcomm{emboss\_data} \filename{/path/to/data} if you wish to use a
2179separate data directory.
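
Both are set in the same way as the other variables. The paths in
this sketch are hypothetical.

\begin{verbatim}
set emboss_acdroot /site/local/emboss/acd
set emboss_data /site/local/emboss/data
\end{verbatim}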
2180
2181
2182\section{Logging}
2183
2184Many system administrators may wish to make use of the logging
2185facilities of \EMBOSS. Setting the variable \ilcomm{emboss\_logfile}
2186in \filename{emboss.default} or \filename{.embossrc} allows the system
2187to keep a log of which programs are used when and by whom.
2188
2189\begin{verbatim}
2190set emboss_logfile /site/log/emboss.log
2191\end{verbatim}
2192
The log file structure is very simple. Three tab-separated fields are
stored: program name, user name, and the date and time.
2195
2196\begin{verbatim}
2197prettyplot      joeuser        Wed Aug 02 14:29:13 2000
2198\end{verbatim}
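
Because the program name is the first tab-separated field, standard
UNIX tools are enough for a quick usage summary. The following sketch,
which assumes the log file location used above, counts how often each
program has been run and lists the most used first.

\begin{verbatim}
% cut -f1 /site/log/emboss.log | sort | uniq -c | sort -rn
\end{verbatim}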
2199
2200The file defined in emboss\_logfile should be world writable. The
2201following command ensures logging can occur.
2202
2203\begin{verbatim}
chmod a+w /site/log/emboss.log
2205\end{verbatim}
2206
All settings can be overridden in a user's \filename{.embossrc} file
by redefining the relevant variables. So to prevent our system usage
being logged we can redefine \ilcomm{emboss\_logfile} by putting the
following entry in our \filename{.embossrc} file:
2211
2212\begin{verbatim}
2213set emboss_logfile /dev/null
2214\end{verbatim}
2215
2216This behaviour may change in the future to prevent users redefining
2217some system settings.
2218
2219\chapter{Graphical interfaces to EMBOSS}
2220
2221This chapter needs to be written. It will be written when the
2222available GUIs are stable enough to document.
2223
2224\chapter{Resources}
2225\section{Web sites}
2226\subsection{Programs}
2227\begin{description}
2228\item[\EMBOSS\ source code]ftp://emboss.open-bio.org/pub/EMBOSS
2229\item[\EMBOSS\ Documentation]http://emboss.sf.net/
2230\item[BLAST tools]Tools for generating BLAST format databases are
2231contained in the NCBI toolkit which can be obtained from NCBI at:
2232\begin{quote}
2233http://www.ncbi.nlm.nih.gov/
2234\end{quote}
2235\item[SRS software]The SRS software can be obtained from Lion
2236Bioscience.\URL{http://www.lionbioscience.com/solutions/srs} This is a
2237commercial package but at the time of writing is available free of
2238charge to academic institutions.
2239\item[\progname{wget}]Various useful utilities including the
2240\progname{wget} program are available from the Free Software
2241Foundation.\URL{http://www.gnu.org}
2242\end{description}
2243\subsection{Databases}
2244
2245Most of the databases mentioned in the text along with many others can
2246be obtained via anonymous ftp from the European Bioinformatics
2247Institute (EBI) at:
2248\begin{quote}
2249ftp://ftp.ebi.ac.uk/pub/databases
2250\end{quote}
2251Please use a mirror site where possible to avoid overloading of the
2252EBI's resources.
2253
Other databases can be obtained from NCBI (GenBank, UniGene, etc.).
2255
2256\subsection{Other Documentation}
2257Please review the \EMBOSS\ documentation available on the WWW at the
2258URL above.
2259
2260\begin{description}
2261\item[The \EMBOSS\ Quick guide]A pocket reference guide to using
2262\EMBOSS\URL{ftp://ftp.no.embnet.org/pub/EMBOSS-extra/emboss-qg.ps}.
2263\item[The \EMBOSS\ Tutorial]A tutorial to give an introduction to
2264using \EMBOSS\ for bioinformatics
2265users.\URL{http://www.hgmp.mrc.ac.uk/Registered/Option/emboss.html}
2266\item[The updated ABC guide]This is a series of bioinformatics
2267practicals based predominantly on
2268\EMBOSS.\URL{ftp://ftp.no.embnet.org/pub/ABC}
2269\item[EMBOSS-FreeBSD-HOWTO]Detailed documentation on installation of
2270\EMBOSS\ on
2271FreeBSD.\URL{ftp://ftp.no.embnet.org/pub/EMBOSS-extra/EMBOSS-FreeBSD-HOWTO}
2272\end{description}
2273
\section{Maintenance of your \EMBOSS\ installation}
2275
\EMBOSS\ is a rapidly evolving software package. It is constantly
2277being improved, new features added and `issues' resolved. In addition
2278there are new applications added and you probably want to make use of
2279these.
2280
2281\subsection{Automated installation of \EMBOSS\ and EMBASSY}
2282
2283Once you have installed \EMBOSS\ and got it to work you have solved
2284the hardest part of the struggle. Updating \EMBOSS\ as new releases
2285appear\footnote{\EMBOSS\ is rebuilt nightly from CVS, tested, and,
2286assuming it passes the compilation tests, the latest version is posted
2287to the \EMBOSS\ FTP server. } can be quite tedious. UNIX is designed
2288for the lazy, so here is our lazy man's guide to always having an up to
2289the minute \EMBOSS\ installation.
2290
2291The following script can be run manually (it should probably be
2292`\ilcomm{source}d' rather than executed directly) or can be fired off
2293with cron (in the early hours of the morning is a good time). It
2294assumes you are installing \EMBOSS\ outside the source directory and
2295have write permissions to do so.
2296
Installing a new release will update the files distributed with
\EMBOSS\ but will not alter or overwrite your own
datafiles\footnote{Assuming of course that you haven't overwritten
\EMBOSS\ datafiles with your own to begin with.} or your
\filename{emboss.default}.
2301
2302\begin{verbatim}
2303
2304# This script should be sourced, not run.
2305# EMBOSS UPDATE.
2306# it assumes \$packages_dir/EMBOSS is a symbolic link to
2307# \$mirror_dir/emboss.open-bio.org/pub/EMBOSS
2308#
2309
2310#site specific variables: season according to taste..
2311
2312set mirror_dir=('/ftp/mirrors')
2313set packages_dir=('/site/newprog')
2314set emboss_config_options=\
2315('--prefix=/site/prog/emboss --with-pngdriver=/site/lib')
2316
2317# Now the script proper
2318
2319set oldpwd=`pwd`
2320
2321cd \$mirror_dir
2322echo 'updating EMBOSS'
2323if ( `wget -m 'ftp://emboss.open-bio.org/pub/EMBOSS' |& \
2324  tail -1 | awk '/^Downloaded:/{print \$5}'` != "0" ) then
2325
2326    cd \${packages_dir}/EMBOSS
2327    echo 'new EMBOSS programs found .. installing'
2328    set latest_emboss=`ls -t EMBOSS*|head -1`
2329
2330    cd \$packages_dir
2331    rm -Rf EMBOSS-*
2332    tar zxf EMBOSS/\$latest_emboss
2333    set emboss_dir=`ls -dt EMBOSS-*[^z]|head -1`
2334
2335#the next line is necessary on our system but may not be for yours.
2336    setenv LD_LIBRARYN32_PATH /site/lib
2337
2338    cd \$emboss_dir
2339
2340# If you have any site specific changes to the source code
2341# that you want to include, copy them in here
2342
2343    ./configure \$emboss_config_options &&\
2344    make && \
2345    make install
2346
2347#Now unpack and build EMBASSY
2348
2349    mkdir embassy
2350    cd embassy
2351
2352#Unpack and build each package one at a time
2353
    foreach embassadir ( `ls ../../EMBOSS/*gz |grep -v EMBOSS-` )
2356
2357	tar zxf \$embassadir
2358	set embassadir_arch=\$embassadir:t
2359	set embassadir_root=\$embassadir_arch:r
2360
2361	cd \$embassadir_root:r
2362	./configure  \$emboss_config_options &&\
2363	make && \
2364	make install
2365
2366	cd ..
2367    end
2368else
2369    echo 'No new version of EMBOSS available'
2370endif
2371
2372cd \$oldpwd
2373\end{verbatim}
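
To run the update from cron, a crontab entry along the following lines
could be used. This is only a sketch: the script location
\filename{/site/scripts/emboss-update.csh} is hypothetical, and the
script is handed to \progname{csh} explicitly because it is written in
\progname{csh} syntax.

\begin{verbatim}
30 3 * * * /bin/csh /site/scripts/emboss-update.csh
\end{verbatim}

This starts the update at 03:30 every night.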
2374
2375\subsection{Automated database updating}
2376
2377In the same way, scripts can be written to automatically update the
2378biological databases. An example is given here for REBASE. As all the
2379parameters for \EMBOSS\ programs can be specified on the command line
2380it is a trivial matter to include index generation in your nightly
2381update scripts. The management of a bioinformatic resource is beyond
2382the scope of this document, though \EMBOSS\ goes a long way towards
2383easing the burden of management.
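
As a sketch of what such a step might look like, the line below runs
\progname{dbifasta} without prompting. The qualifier names are
assumptions derived from the prompts shown earlier and should be
checked against the output of \ilcomm{dbifasta -help} for your
release; the database name, paths, release number and date are
hypothetical.

\begin{verbatim}
dbifasta -auto -idformat idacc -dbname mydb \
    -directory /databases/mydb -filenames mydb.fasta \
    -release 1.0 -date 01/06/02
\end{verbatim}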
2384
2385\subsubsection{Automated update of REBASE}
2386
2387This script will look for a new version of REBASE and install it in
2388\EMBOSS\ using \progname{rebaseextract}.
2389
2390\begin{verbatim}
2391# This script should be sourced, not run.
2392# REBASE UPDATE. Should be run just after the beginning of the month.
2393set mirrors_dir=('/ftp/mirrors')
2394set oldpwd=`pwd`
2395
2396cd \$mirrors_dir
2397
2398if ( ` wget -m 'ftp://ftp.ebi.ac.uk/pub/databases/rebase/*' |& \
2399  tail -1 | awk '/^Downloaded:/{print \$5}'` != "0" ) then
2400	cd ftp.ebi.ac.uk/pub/databases/rebase
2401	cp `ls -t withrefm.*.Z|head -1` withrefm.Z
2402	uncompress withrefm.Z
2403	rebaseextract \
2404  \${mirrors_dir}/ftp.ebi.ac.uk/pub/databases/rebase/withrefm
2405	rm withrefm
2406endif
2407
2408cd \$oldpwd
2409\end{verbatim}
2410
We make no guarantees that these scripts will work correctly on your
system. If they delete all your files, spam your associates, scratch
your CDs and initiate a nuclear strike on a small unpopulated
Pacific island it is NOT OUR FAULT.  They just happen to work for us.
2415
2416\chapter{GNU Free Documentation License}
2417
2418\begin{verbatim}
2419		GNU Free Documentation License
2420		   Version 1.1, March 2000
2421
2422 Copyright (C) 2000  Free Software Foundation, Inc.
2423     59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
2424 Everyone is permitted to copy and distribute verbatim copies
2425 of this license document, but changing it is not allowed.
2426
2427
24280. PREAMBLE
2429
2430The purpose of this License is to make a manual, textbook, or other
2431written document "free" in the sense of freedom: to assure everyone
2432the effective freedom to copy and redistribute it, with or without
2433modifying it, either commercially or noncommercially.  Secondarily,
2434this License preserves for the author and publisher a way to get
2435credit for their work, while not being considered responsible for
2436modifications made by others.
2437
2438This License is a kind of "copyleft", which means that derivative
2439works of the document must themselves be free in the same sense.  It
2440complements the GNU General Public License, which is a copyleft
2441license designed for free software.
2442
2443We have designed this License in order to use it for manuals for free
2444software, because free software needs free documentation: a free
2445program should come with manuals providing the same freedoms that the
2446software does.  But this License is not limited to software manuals;
2447it can be used for any textual work, regardless of subject matter or
2448whether it is published as a printed book.  We recommend this License
2449principally for works whose purpose is instruction or reference.
2450
2451
24521. APPLICABILITY AND DEFINITIONS
2453
2454This License applies to any manual or other work that contains a
2455notice placed by the copyright holder saying it can be distributed
2456under the terms of this License.  The "Document", below, refers to any
2457such manual or work.  Any member of the public is a licensee, and is
2458addressed as "you".
2459
2460A "Modified Version" of the Document means any work containing the
2461Document or a portion of it, either copied verbatim, or with
2462modifications and/or translated into another language.
2463
2464A "Secondary Section" is a named appendix or a front-matter section of
2465the Document that deals exclusively with the relationship of the
2466publishers or authors of the Document to the Document's overall subject
2467(or to related matters) and contains nothing that could fall directly
2468within that overall subject.  (For example, if the Document is in part a
2469textbook of mathematics, a Secondary Section may not explain any
2470mathematics.)  The relationship could be a matter of historical
2471connection with the subject or with related matters, or of legal,
2472commercial, philosophical, ethical or political position regarding
2473them.
2474
2475The "Invariant Sections" are certain Secondary Sections whose titles
2476are designated, as being those of Invariant Sections, in the notice
2477that says that the Document is released under this License.
2478
2479The "Cover Texts" are certain short passages of text that are listed,
2480as Front-Cover Texts or Back-Cover Texts, in the notice that says that
2481the Document is released under this License.
2482
2483A "Transparent" copy of the Document means a machine-readable copy,
2484represented in a format whose specification is available to the
2485general public, whose contents can be viewed and edited directly and
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters.  A copy made in an otherwise Transparent file
format whose markup has been designed to thwart or discourage
subsequent modification by readers is not Transparent.  A copy that is
not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format, SGML
or XML using a publicly available DTD, and standard-conforming simple
HTML designed for human modification.  Opaque formats include
PostScript, PDF, proprietary formats that can be read and edited only
by proprietary word processors, SGML or XML for which the DTD and/or
processing tools are not generally available, and the
machine-generated HTML produced by some word processors for output
purposes only.

The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page.  For works in
formats which do not have any title page as such, "Title Page" means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.


2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License.  You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute.  However, you may accept
compensation in exchange for copies.  If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.


3. COPYING IN QUANTITY

If you publish printed copies of the Document numbering more than 100,
and the Document's license notice requires Cover Texts, you must enclose
the copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover.  Both covers must also clearly and legibly identify
you as the publisher of these copies.  The front cover must present
the full title with all words of the title equally prominent and
visible.  You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a publicly-accessible computer-network location containing a complete
Transparent copy of the Document, free of added material, which the
general network-using public has access to download anonymously at no
charge using public-standard network protocols.  If you use the latter
option, you must take reasonably prudent steps, when you begin
distribution of Opaque copies in quantity, to ensure that this
Transparent copy will remain thus accessible at the stated location
until at least one year after the last time you distribute an Opaque
copy (directly or through your agents or retailers) of that edition to
the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.


4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it.  In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct
   from that of the Document, and from those of previous versions
   (which should, if there were any, be listed in the History section
   of the Document).  You may use the same title as a previous version
   if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities
   responsible for authorship of the modifications in the Modified
   Version, together with at least five of the principal authors of the
   Document (all of its principal authors, if it has less than five).
C. State on the Title page the name of the publisher of the
   Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications
   adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice
   giving the public permission to use the Modified Version under the
   terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections
   and required Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section entitled "History", and its title, and add to
   it an item stating at least the title, year, new authors, and
   publisher of the Modified Version as given on the Title Page.  If
   there is no section entitled "History" in the Document, create one
   stating the title, year, authors, and publisher of the Document as
   given on its Title Page, then add an item describing the Modified
   Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for
   public access to a Transparent copy of the Document, and likewise
   the network locations given in the Document for previous versions
   it was based on.  These may be placed in the "History" section.
   You may omit a network location for a work that was published at
   least four years before the Document itself, or if the original
   publisher of the version it refers to gives permission.
K. In any section entitled "Acknowledgements" or "Dedications",
   preserve the section's title, and preserve in the section all the
   substance and tone of each of the contributor acknowledgements
   and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document,
   unaltered in their text and in their titles.  Section numbers
   or the equivalent are not considered part of the section titles.
M. Delete any section entitled "Endorsements".  Such a section
   may not be included in the Modified Version.
N. Do not retitle any existing section as "Endorsements"
   or to conflict in title with any Invariant Section.

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant.  To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.

You may add a section entitled "Endorsements", provided it contains
nothing but endorsements of your Modified Version by various
parties--for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version.  Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity.  If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.


5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy.  If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections entitled "History"
in the various original documents, forming one section entitled
"History"; likewise combine any sections entitled "Acknowledgements",
and any sections entitled "Dedications".  You must delete all sections
entitled "Endorsements."


6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.


7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, does not as a whole count as a Modified Version
of the Document, provided no compilation copyright is claimed for the
compilation.  Such a compilation is called an "aggregate", and this
License does not apply to the other self-contained works thus compiled
with the Document, on account of their being thus compiled, if they
are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one quarter
of the entire aggregate, the Document's Cover Texts may be placed on
covers that surround only the Document within the aggregate.
Otherwise they must appear on covers around the whole aggregate.


8. TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections.  You may include a
translation of this License provided that you also include the
original English version of this License.  In case of a disagreement
between the translation and the original English version of this
License, the original English version will prevail.


9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License.  Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License.  However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.


10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time.  Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.  See
http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License "or any later version" applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation.  If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation.


ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

      Copyright (c)  YEAR  YOUR NAME.
      Permission is granted to copy, distribute and/or modify this document
      under the terms of the GNU Free Documentation License, Version 1.1
      or any later version published by the Free Software Foundation;
      with the Invariant Sections being LIST THEIR TITLES, with the
      Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
      A copy of the license is included in the section entitled "GNU
      Free Documentation License".

If you have no Invariant Sections, write "with no Invariant Sections"
instead of saying which ones are invariant.  If you have no
Front-Cover Texts, write "no Front-Cover Texts" instead of
"Front-Cover Texts being LIST"; likewise for Back-Cover Texts.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.
\end{verbatim}

\chapter{Acknowledgements}

The acknowledgements and credits are found at the front of this guide
because no one ever reads them if they are at the back.

\end{document}