\documentclass{report}
\usepackage{verbatim}
\usepackage{emboss}

\begin{document}
\title{The \EMBOSS\ Administrator's Guide}
\author{David Martin, EMBnet Norway \\
Peter Rice, LION Bioscience \\
Alan Bleasby, HGMP (EMBnet UK)}
\date{This guide relates to \EMBOSS\ 2.5.0}

\maketitle

Copyright (c) 2000, 2002 David Martin, Peter Rice, Alan Bleasby.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation
License\URL{http://www.gnu.org/copyleft/fdl.html}, Version 1.1 or any
later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the chapter entitled "GNU
Free Documentation License".

\tableofcontents

\chapter{Introduction}
\section{About this document}
This guide has been written to assist system administrators and
developers with the installation and configuration of \EMBOSS. If you
are reading this to find out how to do bioinformatics then you are
wasting your time. You are referred instead to the Resources chapter
below where there is a list of more relevant literature and web sites.
Experienced users may find this document useful for configuring their
own databases and customising their \EMBOSS\ experience.


\subsection{Credits}
The original author of this guide was David
Martin\URL{damartin\@@hgmp.mrc.ac.uk} at the Norwegian EMBnet
node.\URL{http://www.no.embnet.org} It is however the result of a team
effort. Thanks are due in particular to Johann Visagie for the FreeBSD
information. Other contributors are acknowledged in the text.

\subsection{Reproduction}
The obligatory bit of legalese. The first version of this guide was
not in the public domain but has been released under the GNU Free
Documentation License by the original author.

Although 'Free' in this license is usually explained as 'free as in
freedom, not as in beer' the authors are likely to appreciate offers
of free drinks should you ever meet them.

\section{What is \EMBOSS?}

\EMBOSS\ is a freely available suite of bioinformatics applications
and libraries. It can be downloaded via the internet, copied,
customised, and passed on under the terms of the various General
Public Licenses. \EMBOSS\ has been developed in response to the need
for a powerful, adaptable suite of software that can interface readily
with many different situations and meet the needs of professional
bioinformaticists, particularly those needing high throughput and/or
scriptable capabilities.

\EMBOSS\ has primarily been developed by those responsible for the
public extensions to the GCG package. \EMBOSS\ supersedes much of EGCG
and includes far better database interaction. \EMBOSS\ also has the
benefit of freely accessible source code so novel applications can be
developed rapidly and at minimal cost.

\EMBOSS\ is currently only available for Unix/Linux systems but it has
been known to compile and run on Windows NT. This document will only
consider the UNIX version and will assume the reader has some
familiarity with UNIX system administration.

\subsection{Where to get it?}

\EMBOSS\ is available for download from the primary site at Open-Bio
by anonymous ftp.\URL{ftp://emboss.open-bio.org/pub/EMBOSS/} This
directory contains the \EMBOSS\ package and several associated
packages (collectively known as EMBASSY) that are distributed with
\EMBOSS. Download these to a suitable location.
Documentation is
available on the WWW at the \EMBOSS\ web
site.\URL{http://emboss.sf.net/}

FreeBSD distributions from 4.2 onwards now include \EMBOSS\ as an
optional package maintained by Johann
Visagie.\URL{johann\@@egenetics.com} Please see section
\ref{sec:FreeBSD} for more information on installation on FreeBSD.

\chapter{Installation}
\section{Retrieving \EMBOSS\ by anonymous ftp}
\subsection{Interactive FTP}

Change directory to the location in which you wish to download the
\EMBOSS\ source code. In this example we will download the source to
\filename{/packages/EMBOSS}. Then start your ftp client and point it
to emboss.open-bio.org.

\begin{verbatim}
% ftp emboss.open-bio.org
Connected to emboss.open-bio.org.
220 (vsFTPd 2.0.1)
530 Please login with USER and PASS.
530 Please login with USER and PASS.
KERBEROS_V4 rejected as an authentication type
Name (emboss.open-bio.org:someuser):
\end{verbatim}

We are using anonymous FTP so type the username \ilcomm{anonymous}.

\begin{verbatim}
Name (emboss.open-bio.org:someuser): anonymous
331 Guest login ok, send your complete e-mail address as password.
Password:
\end{verbatim}

Enter your email address here as the password for user \filename{anonymous}.

\begin{verbatim}
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp>
\end{verbatim}

Move to the \EMBOSS\ directory and list the files. The output has been
truncated a little to save space.

\begin{verbatim}
ftp> cd /pub/EMBOSS
ftp> ls
200 PORT command successful.
150 Opening BINARY mode data connection for /bin/ls.
total 22334
...     1024 May 26 20:17 .gnu
...  9079913 May 14 21:37 EMBOSS-2.5.0.tar.gz
...       19 May 14 21:37 EMBOSS-latest.tar.gz -> EMBOSS-2.5.0.tar.gz
...   196872 May 12 18:49 EMNU-1.0.5.tar.gz
...   231485 May 15 13:55 ESIM4-1.0.0.tar.gz
...   405620 May 12 18:49 HMMER-2.1.1.tar.gz
...     1024 Jul 25 08:54 Jemboss
...   264189 May 12 18:49 MEME-2.3.1.tar.gz
...   251061 Jul  9 19:01 MSE-0.0.4.tar.gz
...   694450 May 12 18:49 PHYLIP-3.573c.tar.gz
...   200490 May 12 18:49 TOPO-0.1.tar.gz
...     1536 Jul  9 19:01 old
...      512 Jun 27 14:40 patchfiles
...      512 Feb 22 15:19 tutorials
226 Transfer complete.
ftp>
\end{verbatim}

Now download the source files:

\begin{verbatim}
ftp> get EMBOSS-latest.tar.gz
200 PORT command successful.
150 Opening BINARY mode data connection for EMBOSS-latest.tar.gz
(9079913 bytes).
...
ftp>
\end{verbatim}

Repeat for each file, or use \ilcomm{mget *gz} to download all the
files at once. Exit your ftp session with the command \ilcomm{bye}.

\subsection{FTP using \progname{wget}}
The program \progname{wget} can be used to download a remote directory
noninteractively. More details on \progname{wget} can be obtained from
the Free Software Foundation.\URL{http://www.gnu.org} Assuming you
have \progname{wget} installed, use the following command, which
generates a lot of output on the screen:

\begin{verbatim}
% wget -m 'ftp://emboss.open-bio.org/pub/EMBOSS'
--15:04:41--  ftp://emboss.open-bio.org:21/pub/EMBOSS
           => `emboss.open-bio.org/pub/.listing'
Connecting to emboss.open-bio.org:21... connected!
Logging in as anonymous ... Logged in!
==> TYPE I ... done.  ==> CWD pub ... done.
==> PORT ... done.    ==> LIST ... done.

...
many pages truncated
...

FINISHED --15:04:55--
Downloaded: 2,657,366 bytes in 4 files
\end{verbatim}

A new directory \filename{emboss.open-bio.org} has been created and
\EMBOSS\ can be found at \filename{emboss.open-bio.org/pub/EMBOSS}. You
may wish to create a symbolic link to this from your
\filename{/packages} directory for convenience.
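
The symbolic link suggested above could be created as follows. This is
only a sketch for \progname{sh} or \progname{bash} users:
\ilcomm{link\_mirror} is a hypothetical helper, and the link name
\filename{EMBOSS} is an assumption. Choose another name if a directory
called \filename{EMBOSS} already exists.

\begin{verbatim}
# link_mirror DIR   (hypothetical helper)
# Create DIR/EMBOSS as a link to the mirrored download directory.
# The target is relative, so it is resolved against DIR itself.
link_mirror() {
    ln -s emboss.open-bio.org/pub/EMBOSS "$1/EMBOSS"
}
\end{verbatim}

If \progname{wget} was run in \filename{/packages}, calling
\ilcomm{link\_mirror /packages} makes the downloaded files visible as
\filename{/packages/EMBOSS}.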


\section{Unpacking}

You will have downloaded the \EMBOSS\ and EMBASSY packages to a
suitable directory. For this example we will assume you have
downloaded them to \filename{/packages} so you should now have the
following files (or similar) and maybe more packages in EMBASSY.

\begin{verbatim}
% ls
EMBOSS-latest.tar.gz
EMNU-1.0.5.tar.gz
ESIM4-1.0.0.tar.gz
HMMER-2.1.1.tar.gz
MEME-2.3.1.tar.gz
MSE-0.0.4.tar.gz
PHYLIP-3.573c.tar.gz
TOPO-0.1.tar.gz
\end{verbatim}

First unpack the \EMBOSS\ distribution:

\begin{verbatim}
% gunzip EMBOSS-latest.tar.gz
% tar xf EMBOSS-latest.tar
\end{verbatim}

This will create a new directory, \filename{EMBOSS-2.5.0} or
similar. You may wish to use \ilcomm{tar xpf} for unpacking \EMBOSS,
which preserves the file permissions recorded in the archive.

Enter the \EMBOSS\ directory:

\begin{verbatim}
% cd EMBOSS-2.5.0
\end{verbatim}

Create a directory for the EMBASSY packages:

\begin{verbatim}
% mkdir embassy
\end{verbatim}

Now move the EMBASSY packages from the parent directory to the
\filename{embassy} directory:

\begin{verbatim}
% mv ../MSE-0.0.4.tar.gz ../PHYLIP-3.573c.tar.gz \
     ../TOPO-0.1.tar.gz embassy
\end{verbatim}

Go into the \filename{embassy} directory and unpack those packages:

\begin{verbatim}
% cd embassy

% gunzip MSE-0.0.4.tar.gz
% tar xf MSE-0.0.4.tar
\end{verbatim}

and so on for each EMBASSY package.

Go back up one directory to the main \EMBOSS\ package directory and
prepare to start compilation.

\section{Graphics Requirements}

Depending on your system you may need to explicitly configure the
graphics. \EMBOSS\ includes the plplot graphics library and will link to
X11 and the recent (non-GIF) releases of the gd graphics library which
also require libz and libpng (and possibly libjpeg). Please see the
section 'Configuring \EMBOSS\ graphics' below.

To get PLPLOT to produce PNG images you will need to have the
\filename{z}\URL{http://www.info-zip.org/pub/infozip/zlib/},
\filename{png}\URL{http://libpng.sourceforge.net/} and
\filename{gd}\URL{http://www.boutell.com/gd/} libraries
installed. \filename{gd} version 1.8.4 or later is recommended. A recent
release must be used as older versions support GIF which is NOT
supported in later versions because of software patent problems. If
for some reason you do not have the required libraries and your system
support group will not update them for the system then install the
latest versions of all three (\filename{z}, \filename{gd},
\filename{png}) to a new directory and then add this new directory to
your configure line for \EMBOSS:
\verb+./configure --with-pngdriver=my_dir+ where \verb+my_dir+ is the
directory into which the \filename{z}, \filename{png} and
\filename{gd} libraries were installed (for example with
\verb+./configure --prefix=my_dir+ for zlib; see the individual
sections below).

It may also be helpful to ensure that the \ilcomm{LD\_LIBRARY\_PATH}
environment variable is set appropriately to include the libraries in
the path.

The libraries can be obtained over HTTP from:

\begin{description}
 \item[GD:] \filename{http://www.boutell.com/gd/}
 \item[Z:] \filename{http://www.mirror.ac.uk/sites/ftp.cdrom.com/pub/infozip/zlib/}
 \item[PNG:] \filename{http://www.mirror.ac.uk/sites/ftp.libpng.org/pub/png/libpng.html}
\end{description}

These pages also list the various mirror sites for users outside the UK.

Alternatively, using ftp:

\begin{description}
 \item[GD:] boutell.com no longer allows FTP and there are no known
 mirror sites, so use HTTP.
 \item[Z:] \filename{ftp://ftp.info-zip.org/pub/infozip/zlib/zlib-1.1.3.tar.gz}
 \item[PNG:] \filename{ftp://swrinde.nde.swri.edu/pub/png/src/libpng.1.2.1.tar.gz}
\end{description}

You can unpack the tar.gz files in any directory, and install them in
a common area.

By default everything (including \EMBOSS) installs in
\filename{/usr/local} but in the examples below we use
\filename{/home/joe/local}.

Note: gd does not use a \ilcomm{./configure} script, and will fail at the
\ilcomm{make install} stage if the installation directory does not have a
\filename{bin} subdirectory. You can create this directory
(e.g. \filename{/home/joe/local/bin}) if it does not already exist.

\subsection{zlib}

Zlib is available from these sites:

\filename{http://www.mirror.ac.uk/sites/ftp.cdrom.com/pub/infozip/zlib/}
\URL{http://www.mirror.ac.uk/sites/ftp.cdrom.com/pub/infozip/zlib/}
\filename{http://www.info-zip.org/pub/infozip/zlib/}
\URL{http://www.info-zip.org/pub/infozip/zlib/}
\filename{ftp://ftp.info-zip.org/pub/infozip/zlib/zlib-1.1.3.tar.gz}
\URL{ftp://ftp.info-zip.org/pub/infozip/zlib/zlib-1.1.3.tar.gz}

To install, pick up the sources and then:

\begin{verbatim}
% gunzip -c zlib-1.1.3.tar.gz | tar xf -
% ln -s zlib-1.1.3 zlib
% cd zlib
% ./configure --prefix=/home/joe/local
% make
% make install
% cd ..
\end{verbatim}

\subsection{libpng}

Libpng is available from these sites:

\URL{http://libpng.sourceforge.net/}
\URL{http://www.mirror.ac.uk/sites/ftp.libpng.org/pub/png/libpng.html}
\URL{ftp://swrinde.nde.swri.edu/pub/png/src/libpng.1.2.1.tar.gz}

To install, pick up the sources and then:

\begin{verbatim}
% gunzip -c libpng-1.2.1.tar.gz | tar xf -
% ln -s libpng-1.2.1 libpng
% cd libpng
% cp scripts/makefile.linux makefile
\end{verbatim}

This example uses the Linux makefile; on other systems pick the
matching file from the \filename{scripts} directory. Libpng has no
configure script so you have to do some work by hand. Edit
\filename{makefile}, change \ilcomm{prefix} to
\filename{/home/joe/local} and adjust any other paths that need it:
some entries point to \filename{../zlib}, others use
\filename{/usr/local/lib} and \filename{/usr/local/include}. On HP-UX
this is trickier. \ilcomm{CFLAGS} has to match the definition used for
zlib.
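
The \ilcomm{prefix} edit can also be scripted. The following is only a
sketch for \progname{sh} or \progname{bash} users:
\ilcomm{set\_prefix} is a hypothetical helper, not part of libpng, and
it assumes that the makefile sets the installation directory on a
single line beginning \ilcomm{prefix=}. Check your copy of the makefile
before relying on it. The other paths mentioned above still need to be
reviewed by hand.

\begin{verbatim}
# set_prefix FILE DIR   (hypothetical helper, not part of libpng)
# Point the "prefix=" line of FILE at DIR, keeping a copy in FILE.orig.
set_prefix() {
    cp "$1" "$1.orig" || return 1
    sed -e "s|^prefix=.*|prefix=$2|" "$1.orig" > "$1"
}
\end{verbatim}

For example, \ilcomm{set\_prefix makefile /home/joe/local} rewrites the
prefix line, after which \ilcomm{grep -n /usr/local makefile} lists any
remaining hard-coded paths.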

Now build using the edited makefile:

\begin{verbatim}
% make
% make install
% cd ..
\end{verbatim}


\subsection{gd}

Gd is available from this site:

\URL{http://www.boutell.com/gd/}

There is no FTP server at this site.

To install, pick up the sources, build zlib and libpng first, and then:

\begin{verbatim}
% gunzip -c gd-1.8.4.tar.gz | tar xf -
% ln -s gd-1.8.4 gd
% cd gd
\end{verbatim}

Now edit \filename{Makefile}, change the definitions for INCLUDEDIRS,
LIBDIRS, INSTALL\_LIB, INSTALL\_INCLUDE and INSTALL\_BIN, and change all
\filename{/usr/local} to \filename{/home/joe/local}.

\begin{verbatim}
% make
% make install
% cd ..
\end{verbatim}

If the gd \ilcomm{make install} fails with a warning about the
\filename{bin} directory, you need to create it by hand (see above).

To compile with the local version your \EMBOSS\ configure line should
now read:

\begin{verbatim}
./configure --with-pngdriver=/home/joe/local
\end{verbatim}

This will look for the graphics libraries in your local installation
under \filename{/home/joe/local} instead of a system-wide location.

\ilcomm{configure} keeps a copy of the previous settings. With earlier
releases of \EMBOSS, or as a developer with an earlier release of
autoconf, you may need to delete the files \filename{config.cache} and
\filename{config.status} if configure has been run before.

\section{Compilation}

Building \EMBOSS\ is easy. It follows the usual GNU style of
\ilcomm{./configure}, \ilcomm{make}, \ilcomm{make install}. We'll take
these steps one at a time.

\subsection{Configure}

To accept the default configuration, just type \ilcomm{./configure}
and let \EMBOSS\ get on with it. You may however want to make some
changes to the configuration parameters according to your local
policy.
This section will not cover all the possibilities, just some
of the more common. The configuration script will attempt to find the
necessary components in your system to determine how to successfully
build \EMBOSS. It typically expects the GNU C compiler (gcc) and
several standard libraries that should already be part of your
Unix/Linux system. \EMBOSS\ should configure, compile and run on most
modern Linux distributions straight out of the box.


\subsubsection{Installation directory}

You need to have write permission on the directory in which you
eventually wish to install \EMBOSS. You may also wish to put it
somewhere other than the standard location of
\filename{/usr/local/emboss}.

The installation directory is controlled by the \ilcomm{--prefix}
argument. For example, you can have all third party applications owned
by a non-privileged user and installed in a package specific directory
under \filename{/site/prog}. The command

\begin{verbatim}
% ./configure --prefix=/site/prog/emboss
\end{verbatim}

will install \EMBOSS\ under \filename{/site/prog/emboss}. The binaries
will be installed in \filename{/site/prog/emboss/bin} with shared
libraries installed in \filename{/site/prog/emboss/lib}. System wide
data are installed in \filename{/site/prog/emboss/share/EMBOSS/data},
and the configuration files (ACD files) for the applications will be
installed in \filename{/site/prog/emboss/share/EMBOSS/acd} (or for
EMBASSY in directories corresponding to the package name).
Documentation is installed in
\filename{/site/prog/emboss/share/EMBOSS/doc}. The installation
directory should be specified as a full (absolute) path, otherwise
interesting failures may occur.

The individual directories for installation can be modified with other
\ilcomm{configure} options but this is usually not necessary.
Run
\ilcomm{./configure --help} to get more information on the directories
that can be changed and other configuration options.

Run \ilcomm{./configure} with the options you wish to use. This may
take a short time as various messages scroll up the screen.

If all is well, \ilcomm{configure} should finish with output like
this:

\begin{verbatim}
... much output skipped

creating ./config.status
creating plplot/Makefile
creating plplot/lib/Makefile
creating nucleus/Makefile
creating ajax/Makefile
creating emboss/Makefile
creating emboss/acd/Makefile
creating test/Makefile
creating test/data/Makefile
creating test/embl/Makefile
creating test/pir/Makefile
creating test/swiss/Makefile
creating test/swnew/Makefile
creating test/wormpep/Makefile
creating emboss/data/Makefile
creating emboss/data/AAINDEX/Makefile
creating emboss/data/CODONS/Makefile
creating emboss/data/REBASE/Makefile
creating emboss/data/PRINTS/Makefile
creating emboss/data/PROSITE/Makefile
creating Makefile
\end{verbatim}

Configuration is now complete.

\subsubsection{Reconfiguration}

If at first you don't succeed, try, try and try again. It is not
uncommon to make typos or other mistakes when running
\ilcomm{./configure}. If you want to run configure again you should
run \ilcomm{make clean} before running \ilcomm{./configure} with
(hopefully) the correct options. With an earlier \EMBOSS\ release, or as
a developer with an earlier release of autoconf, you must also first
delete the file \filename{config.cache}; current releases no longer
produce this file.

\subsubsection{Configuring \EMBOSS\ graphics}

The PLPLOT library can produce output to many devices but requires
certain libraries that are NOT distributed with \EMBOSS.

To get X-windows based output you must have X installed, or else PLplot
will not build the required driver.
You may need to specify the
location of your X-windows library with the configuration options
\ilcomm{--x-includes=DIR} (X include files are in DIR) and
\ilcomm{--x-libraries=DIR} (X library files are in DIR).

To explicitly configure PLPLOT without X-windows, use \ilcomm{--without-x}.

You can explicitly tell \EMBOSS\ to not include PNG support with
\ilcomm{--without-pngdriver}.

You can tell whether \ilcomm{./configure} has found a suitable PNG
library by watching its output for something like the following:

\begin{verbatim}
checking if png driver is wanted... yes
checking for inflateEnd in -lz... (cached) yes
checking for png_destroy_read_struct in -lpng... (cached) yes
checking for gdImageCreateFromPng in -lgd... (cached) yes
\end{verbatim}

This means that the configuration script has located the PNG libraries
on your system. If you see a message indicating that
\ilcomm{./configure} could not find the libraries or that the version
of \filename{gd} was too old then you should install the latest
versions of the libraries yourself and rerun configure with the
correct \ilcomm{--with-pngdriver} value.

When you run an \EMBOSS\ graphical application you can see the list of
installed graph devices by giving '?' as the response to the 'Graph
type' prompt.

\subsection{Configuring for 64 bit systems}

\EMBOSS\ configure looks for \progname{gcc} and uses it in
preference to other compilers when compiling \EMBOSS. This is not
ideal for those who wish to have a compiled and linked 64 bit version
of \EMBOSS. The current version is NOT 64 bit clean (i.e. it does not
necessarily use 64 bit representation internally) but will compile and
run quite happily on 64 bit systems.

Additional notes are appended below for the various operating systems
we have information on.

\subsubsection{IRIX 6.5.10}

In order to compile for 64 bit on IRIX you have to specify the native
compiler in 64 bit mode (\ilcomm{cc -64}) and the linker in 64 bit
mode (\ilcomm{/bin/ld -64}). The following notes were provided by Jose
Ramon Valverde\footnote{jrvalverde\@@cnb.uam.es}.


{\it We have succeeded in compiling EMBOSS for IRIX using 64 bit
compilation.

It required some tweaking, but works. The recipe for those willing to
give it a try is: }

\begin{itemize}
 \item remove '\filename{gcc}' from your path
 \item define \filename{COMPILER\_DEFAULTS\_PATH} appropriately
 (see \filename{pe\_environ}) to look for a
 \filename{compiler.defaults} file containing
 e.g. \ilcomm{:abi=64:isa=4:proc=r10k}
 \item \ilcomm{./configure} in \EMBOSS\ and all EMBASSY subdirs
 \item search in all files for '\ilcomm{CC = cc}' and
 replace it with '\ilcomm{CC = cc -64}'
 \item same for '\ilcomm{LD = /bin/ld}' to '\ilcomm{LD = /bin/ld -64}'
 \item \ilcomm{make}
\end{itemize}

{\it The reason is that compiling depends on the Makefile and on libtool,
as well as linking. We didn't spend much time looking at configure since
the above steps were so straightforward. We know we should look into
the configure script and add an option for 64-bit-irix-compile or some
such, but that'll have to wait till we have time for it.

Yes, we know, the search and substitute thing looks tedious, but it
isn't, honest: create a 'chfile.sh' outside the EMBOSS source hierarchy
containing: }

\begin{verbatim}
#!/bin/sh
# keep a backup copy of the original file
cp $1 $1.orig
mv $1 tmpfile
# add -64 to the compiler and linker definitions
sed -e 's/CC="cc"/CC="cc -64"/g' tmpfile | \
sed -e 's/CC = cc/CC = cc -64/g' | \
sed -e 's/\/bin\/ld/\/bin\/ld -64/g' > $1
rm tmpfile
## if you are sure, uncomment this
#rm $1.orig
\end{verbatim}

{\it
'\ilcomm{cd}' to the \filename{emboss} directory and run}

\begin{verbatim}
 find . -type f -exec /path/to/chfile.sh {} \; -print
\end{verbatim}

{\it and you are done with the \progname{CC}
changes. \progname{Libtool} requires special treatment since it uses
quotes. }

\subsection{Building \EMBOSS}

Building \EMBOSS\ is a matter of typing '\ilcomm{make}' and going to
find something else to do for the next ten minutes to half an hour
depending on the speed of your system. \EMBOSS\ will first build the
shared libraries (\filename{PL\_PLOT}, \filename{AJAX}, and
\filename{NUCLEUS}) and then build the applications.

You may see plenty of warnings (especially on SGI systems) complaining
about libraries not being used to resolve any symbols. These can be
safely ignored.

If all goes according to plan you should have built \EMBOSS\ 
successfully. If not you will have to try to work out why the build
failed. If you can't work it out yourself, send an email describing
the problem to emboss-bug@emboss.open-bio.org, preferably with a copy
of the output from the installation.

Assuming that compilation was successful, you can\footnote{You don't
have to do this. You can leave \EMBOSS\ where it is and just add the
path to the \filename{emboss} directory to your \ilcomm{PATH}.} now
type '\ilcomm{make install}'. After a few minutes and many pagefuls of
messages, \EMBOSS\ should be installed where you specified in the
\ilcomm{--prefix} option (or in the default location of
\filename{/usr/local/emboss} if \ilcomm{--prefix} was not specified).

\subsection{Post compilation setup}

You will now need to make a few adjustments to your environment to
ensure that \EMBOSS\ runs smoothly. \EMBOSS\ looks for certain
environment variables to determine where the libraries and data are
found. These instructions assume you installed \EMBOSS\ in
\filename{/site/prog/emboss}. Adjust these instructions to suit your
installation.
Insert the following lines at the end of
\filename{/etc/cshrc} (or \filename{~/.cshrc} for a personal
installation):

\begin{verbatim}
setenv PLPLOT_LIB /site/prog/emboss/lib
set path=( /site/prog/emboss/bin ${path} )
\end{verbatim}

Or for bash/ksh/sh users, insert the following at the end of
\filename{/etc/profile} or \filename{~/.bashrc}:

\begin{verbatim}
PLPLOT_LIB=/site/prog/emboss/lib
PATH=/site/prog/emboss/bin:$PATH
export PLPLOT_LIB PATH
\end{verbatim}

\EMBOSS\ should now be ready for use.

\subsection{\EMBOSS\ data files}

\EMBOSS\ will by default install the data files (including those
installed with \progname{Rebaseextract}, \progname{Prosextract},
\progname{Printsextract}, \progname{Aaindexextract} or
\progname{Cutgsextract}) in the default directory
\filename{share/EMBOSS/data} in the install prefix directory. If
\EMBOSS\ is not installed (for example, your own personal
installation) the data files are written to \filename{emboss/data} in
the directory where \EMBOSS\ was built.

If you want to place your data files elsewhere, or have a separate set
of data files you wish to use, you can set the \ilcomm{EMBOSS\_DATA}
variable in \filename{emboss.default} or, for personal use, in your
\filename{.embossrc} file.

\subsection{Testing your \EMBOSS\ installation}

You can test your \EMBOSS\ installation by trying the program
'\ilcomm{wossname}':

\begin{verbatim}
% wossname -auto |more
\end{verbatim}

This should give a long list of programs that are available. Press
space to page down through the list. This is just the \EMBOSS\ 
programs and doesn't include any of the EMBASSY programs, but only
because they are not yet installed.
(Note: Although \progname{wossname} does have
a \ilcomm{-noembassy} option this does not work with installed programs
because \progname{wossname} can no longer find any difference between
\EMBOSS\ and EMBASSY.)

\section{Installing EMBASSY}

As well as the base libraries and standard \EMBOSS\ distribution,
various extra packages (EMBASSY) are distributed with \EMBOSS.

To install an EMBASSY package, go to the relevant directory and run
the same three steps as for \EMBOSS\ itself. For example, to install
PHYLIP (which was unpacked into
\filename{/packages/EMBOSS-2.5.0/embassy/PHYLIP-3.573c} earlier):

\begin{verbatim}
% cd /packages/EMBOSS-2.5.0/embassy/PHYLIP-3.573c
% ./configure --prefix=/site/prog/emboss
... output not shown
% make
... output not shown
% make install
... output not shown
\end{verbatim}

Note. You {\bf MUST} use the same arguments for \ilcomm{./configure}
that you used for the installation of the main \EMBOSS\ package. It
may be necessary to add other options as required by individual
packages (see below).

Repeat as necessary for the other EMBASSY packages. It should also be
noted that certain EMBASSY packages may require additional libraries.

You should now find that running \progname{wossname} as before lists
the EMBASSY programs.

\subsection{EMBASSY package specific notes}

In most cases, EMBASSY packages should build with no problems. Known
problems are described below.

\subsubsection{Packages with no known problems}
So far \progname{ESIM4}, \progname{HMMER}, \progname{MEME},
\progname{MSE}, \progname{PHYLIP} and \progname{TOPO} appear to
install without a problem using the same arguments to
\ilcomm{configure}.

\subsubsection{\progname{EMNU}}

\progname{EMNU} requires the \filename{curses} or \filename{ncurses} libraries
that come as standard on most Unix-like systems.
In particular \progname{EMNU}
requires two header files \filename{form.h} and \filename{menu.h} that are not
distributed with all implementations.

If your \filename{curses/ncurses}
library is installed in a non-standard location then you may need to
tell \ilcomm{configure} where to find it with the option

\begin{verbatim}
--with-curses=/path/to/curses
\end{verbatim}


\section{Installing \EMBOSS\ in package format}
\label{sec:FreeBSD}
\EMBOSS\ can be installed on almost all Unix/Linux operating systems
using the instructions above, but the package format can be far more
convenient. A package is a precompiled set of binaries with
installation instructions that can be set up on your system with a
minimum of work. In some cases the package will check for the correct
libraries and install those as necessary.

Brief instructions are given here for the packages of which we are
aware. These are maintained separately from the main source tree and
may also install some files in operating system standard locations
instead of the locations used by the `raw' \EMBOSS\ 
distribution. Please read the more detailed instructions that
accompany each package.

\subsection{Installing \EMBOSS\ on FreeBSD}

A FreeBSD \EMBOSS\ package has been created by Johann
Visagie\URL{johann\@@egenetics.com} of Electric Genetics. This will be
distributed on the installation CDs and through the normal
distribution channels from FreeBSD version 4.2 onwards.

For the FreeBSD user with an up-to-date ports tree\footnote{FreeBSD
users can update their ports tree through a variety of
mechanisms.
Please see the FreeBSD specific guide produced by Johann
for more information.}, installing \EMBOSS\ reduces to two simple
commands (as root):

\begin{verbatim}
# cd /usr/ports/biology/emboss
# make install
\end{verbatim}

The FreeBSD specific parts of the port are that
\filename{emboss.default} is included with the other configuration
files under \filename{/usr/local/etc} as
\filename{emboss.default.sample}, and the \EMBOSS\ documentation is
installed in \filename{/usr/local/share/doc/EMBOSS} instead of the
default location. For further information on installation under
FreeBSD you are referred to the Resources chapter.


\chapter{Configuration}

\EMBOSS\ can be readily configured to match your requirements. In a
standard installation of \EMBOSS\ the configuration directives are
searched for in the following locations, in this order:
\begin{enumerate}
\item A file \filename{emboss.default} in the \filename{share/EMBOSS}
subdirectory of your \EMBOSS\ installation.\footnote{This location may
have been redefined in installations of \EMBOSS\ that have been
packaged for specific operating systems. See section \ref{sec:FreeBSD}
for further information on OS specific package
installations.}\footnote{\EMBOSS\ will also look in the
\filename{emboss} directory under the \EMBOSS\ source distribution for
\filename{emboss.default.template} and install this as
\filename{emboss.default} if no existing file is found under the
installation directory.}
\item A file \filename{.embossrc} in the directory specified by the
\ilcomm{EMBOSSRC} environment variable.
\item A file \filename{.embossrc} in the user's home directory.
\end{enumerate}
\filename{emboss.default} and \filename{.embossrc} are plain text
files that can readily be edited to suit.\footnote{A sample
\filename{emboss.default} is located in \filename{emboss/acd} under
the source distribution.} A later definition of a configuration
parameter overrides an earlier one, so settings read later in the
search order take precedence. In the descriptions that
follow only \filename{.embossrc} will be mentioned but all directives
can be placed in \filename{emboss.default} for site wide
configuration.

Several aspects of \EMBOSS\ can be defined. These are:
\begin{itemize}
\item\EMBOSS\ environment variables
\item\EMBOSS\ databases
\item Default behaviour of \EMBOSS\ programs
\end{itemize}
Databases are by far the most complex of these.

\EMBOSS\ will ignore blank lines in the \filename{emboss.default} and
\filename{.embossrc} files. It will also ignore any lines beginning
with \ilcomm{\#} or \ilcomm{!}, so comments can be used to explain the
declarations in the file.


\section{\EMBOSS\ environment variables}

\EMBOSS\ environment variables are set with an '\ilcomm{env}' or a
'\ilcomm{set}' declaration. '\ilcomm{env}' and '\ilcomm{set}' are
interchangeable. The most important environment variable is the
location of the \filename{.acd} files that describe each program.

\begin{verbatim}
set emboss_acdroot /site/prog/emboss/share/EMBOSS/acd
\end{verbatim}

Environment variables are useful for simplifying maintenance of your
\filename{.embossrc}. For example you may want to specify the location
of your databases as an environment variable. Then if you move the
databases you only have to update one line in the configuration file.

\begin{verbatim}
set emboss_database_dir /data/databases/flatfiles
\end{verbatim}

This would then be referred to later in \filename{.embossrc} as

\begin{verbatim}
\$emboss_database_dir/embl
\end{verbatim}

for the directory \filename{/data/databases/flatfiles/embl}.

\subsection{Configuring \EMBOSS\ differently for different groups of users}
You may have users who need to share a specific setup. For example,
they may need access to a different set of databases or need to use a
different data directory.

It can be time-consuming and error-prone to maintain a series of
individual \filename{.embossrc} files, to make users work in the same
directory, or to copy an \filename{.embossrc} to each
directory they wish to work in. The environment variable
\ilcomm{EMBOSSRC} can be set to point to an arbitrary directory
containing an \filename{.embossrc} which can then be used to give
workgroup specific configuration. Each user then only needs to set
\ilcomm{EMBOSSRC} in their \filename{.cshrc} (\progname{csh}) or
\filename{.profile} (\progname{bash}) to get the workgroup specific
setup.

In our case we have several groups of researchers for whom we maintain
biological sequence databases. These databases have been made
available under restrictive licenses so that we cannot allow
researchers outside the groups to access the databases. Using
\ilcomm{\$EMBOSSRC} we can set up a common configuration for the
members of each group by defining the databases in the
\filename{\$EMBOSSRC/.embossrc} file.


\section{Databases}

\subsection{Database access modes}

\EMBOSS\ offers three modes for accessing databases:
\begin{description}

 \item[Single:]\EMBOSS\ retrieves a single sequence indexed by
 ID.

 \item[Query:]\EMBOSS\ retrieves a set of sequences
 corresponding to a query that can return more than one entry,
 including accession numbers or wildcard IDs.

 \item[All:]\EMBOSS\ returns all the sequences in the database
 in no particular order.

\end{description}

Each database definition can configure one or more of these modes for
database access.

Typically \EMBOSS\ uses variations on the \progname{emblcd} system of
database indexing to provide rapid access in single and query modes to
flat file databases. The \progname{emblcd} method is implemented in a
variety of ways depending on the original format of your database.
The \progname{emblcd} method assumes that you have one or both of ID
and accession number in each record and that they are unique for the
whole database index. \EMBOSS\ also provides methods for retrieving
sequences via the WWW and three specific methods for interaction with
SRS\URL{http://www.lionbioscience.com/solutions/srs} installed locally
or through a remote public server. For other non flatfile databases
or flat file databases in formats not currently supported by \EMBOSS\
you will have to configure an external application to retrieve
sequences.

\subsection{General database configuration}

Each database is configured using a DB declaration.

The generalised form is

\begin{verbatim}
DB databasename [

Configuration options

]
\end{verbatim}

The configuration options are tag/value pairs and must contain at
least a description of the access method (using \ilcomm{method:} or
one or more of \ilcomm{methodsingle:}, \ilcomm{methodquery:} and
\ilcomm{methodall:}) and a description of the original format of the
sequences (using \ilcomm{format:}). In addition to these tags there
will be other tags that are needed for particular methods and other
tags that are optional.
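
As a minimal sketch of this rule, the hypothetical definition below
carries an access method and a format, plus the \ilcomm{type:},
\ilcomm{dir:} and \ilcomm{file:} tags that the \ilcomm{direct} method
is shown using later in this chapter. The database name
\ilcomm{tinydb}, the directory and the file name are assumptions made
for illustration only and are not part of any \EMBOSS\ distribution.

\begin{verbatim}
# hypothetical example: name, path and filename are placeholders
DB tinydb [
  method: "direct"
  format: "fasta"
  type: "P"
  dir: "/data/databases/flatfiles/tinydb"
  file: "tinydb.fasta"
]
\end{verbatim}

Which further tags are required depends on the access method chosen;
they are listed method by method in the descriptions that follow.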

\subsubsection{Database access methods}

Each access method supports one or more of the following access modes:

\begin{description}

\item[Single mode - \ilcomm{s}] Supports retrieval of a single
sequence.

\item[Query mode - \ilcomm{q}] Supports retrieval of a subset of the
sequences in the database specified using a wild card query in the
USA.\footnote{Please see the \EMBOSS\ documentation for a description
of the Uniform Sequence Address format.}

\item[All mode - \ilcomm{a}] Supports retrieval of all sequences in
the database as a stream of data.

\end{description}

An example entry for each access method is shown.

\paragraph{APP}\par\noindent
Modes: \ilcomm{a q s}\par\noindent
APP is the same as EXTERNAL.

\paragraph{BLAST}\par\noindent
Modes: \ilcomm{a q s} \par\noindent BLAST uses EMBLCD indices created
with \progname{dbiblast} to access databases in BLAST format, created
with NCBI's \ilcomm{formatdb} program.

Note that the latest 'format version 4' is not yet documented by
NCBI. \EMBOSS\ will only work with 'format version 3' databases, indexed
with:

\begin{verbatim}
formatdb -A F
\end{verbatim}

We hope to support 'format version 4' databases in future. If you pick
up a BLAST database from NCBI (or elsewhere) check the format. If it
is in the new format, you will need to pick up the original FASTA
format file, and either index it yourself with \ilcomm{formatdb}, or run
\ilcomm{dbifasta} and use the FASTA file in \EMBOSS\ (see the EMBLCD
access method).

The definition should use format: ncbi because this is what the blast
formatdb databases store internally.

\begin{verbatim}
DB mydb [
#required parameters
  method: "blast"
  format: "ncbi"
  type: "N"
  dir: "\$emboss_db_dir/blast"
#optional parameters
  fields: "sv des"
  release: "63.0"
  comment: "my comment"
  indexdir: "\$emboss_db_dir/blastindices"
]
\end{verbatim}

The index files can be kept in the same directory as the database, but
as each EMBLCD index needs its own directory (the filenames are fixed)
the indexdir is usually defined.

The EMBLCD index files include the filenames indexed by
\ilcomm{dbiblast}. You can use the file: and exclude: attributes to
create file-specific subsets from a single \ilcomm{dbiblast} generated
index, but as blast index files are split only by the number of
entries this is not generally useful.

If the database was indexed with additional fields, they can be
included in the definition as fields: to allow their use in USAs.

\paragraph{DIRECT}\par\noindent
Modes: \ilcomm{a}\par\noindent DIRECT reads the flatfile
directly. It returns all the database entries, one after the other. It
assumes no indexing. Queries are still possible as \EMBOSS\ will read
each entry and match it against the query, but they are slow as the
entire database must be read.

\begin{verbatim}
DB mydb [
#required parameters
  method: "direct"
  format: "embl"
  type: "N"
  dir: "\$emboss_db_dir/mydb"
  file: "*.dat"
#optional parameters
  fields: "sv des key org"
  release: "63.0"
  comment: "My own database with no indices"
  exclude: "est*.dat"
]
\end{verbatim}

For most cases, it is simpler to use \ilcomm{dbiflat} for EMBL,
Genbank or SwissProt format, or \ilcomm{dbifasta} to index FASTA or NCBI
format files, and to use the EMBLCD access method.

If the file format supports additional fields, they can be
included in the definition as fields: to allow their use in USAs.

\paragraph{EMBLCD}\par\noindent
Modes: \ilcomm{a q s}\par\noindent EMBLCD uses EMBLCD indices created
with \progname{dbiflat} or \progname{dbifasta} to access flatfile
databases in the original format.

\begin{verbatim}
DB mydb [
#required parameters
  method: "emblcd"
  format: "embl"
  type: "N"
  dir: "\$emboss_db_dir/embl"
#optional parameters
  fields: "sv des key org"
  file: "*.dat"
  release: "63.0"
  comment: "my comment"
  exclude: "est*.dat"
  indexdir: "\$emboss_db_dir/indices"
]
\end{verbatim}

The EMBLCD index files include the filenames indexed by
\ilcomm{dbiflat} or \ilcomm{dbifasta}. You can use the file: and
exclude: attributes to create file-specific subsets from a single
index.

This method can require careful setup. Please read the more specific
descriptions below.

If the database was indexed with additional fields, they can be
included in the definition as fields: to allow their use in USAs.

\paragraph{EXTERNAL}\par\noindent
Modes: \ilcomm{a q s}\par\noindent EXTERNAL uses an external
application to retrieve sequences. The ID is passed as an argument to
the application, either replacing \%s in the command string (if
present) or as an additional argument (if there is no \%s).

EXTERNAL requires the application to return the sequence on STDOUT. If
the application writes to somewhere else, simply wrap it in a script
that copies the output to STDOUT.

\begin{verbatim}
DB mydb [
#required parameters
  method: "app"
  format: "fasta"
  type: "P"
  app: "getfromdb"
#optional parameters
  comment: "my own protein database with a custom retrieval program"
#alternative app: definition using the \%s form
  app: "getfromdb mydatabase \%s"
]
\end{verbatim}

The first app: definition will use the default call 'getfromdb mydb:id'.

The alternative app: definition will use the \%s format and call
'getfromdb mydatabase id'.

Both will pass either the ID or accession from the query, so that USAs
mydb-id:x13776 and mydb-acc:x13776 are equivalent.

\paragraph{GCG}\par\noindent
Modes: \ilcomm{a q s}\par\noindent GCG uses EMBLCD indices created
with \progname{dbigcg} to access databases in GCG format. This method
uses the \filename{.ref} and \filename{.seq} files created by the
\progname{GCG} suite of programs.

\begin{verbatim}
DB mygcgdb [
#required parameters
  method: "gcg"
  format: "embl"
  type: "N"
  dir: "\$emboss_db_dir/gcgembl"
#optional parameters
  fields: "sv des key org"
  file: "*.seq"
  release: "63.0"
  comment: "my comment"
  exclude: "est*"
  indexdir: "\$emboss_db_dir/indices"
]
\end{verbatim}

The EMBLCD index files include the filenames indexed by
\ilcomm{dbigcg}. You can use the file: and exclude: attributes to
create file-specific subsets from a single \ilcomm{dbigcg} generated
index.

\paragraph{SRS}\par\noindent
Modes: \ilcomm{a q s}\par\noindent SRS returns entries from a local
installation of SRS using the -e switch to getz to return entries in
the original format.

\begin{verbatim}
DB mydb [
#required parameters
  method: "srs"
  format: "embl"
  type: "N"
#optional parameters
  dbalias: "embl"
  fields: "sv des key org"
  app: "getz"
  comment: "My srs indexed database"
  release: "63.0"
]
\end{verbatim}

This access method builds an SRS commandline query to getz. If you
have getz installed under another name, define this as app:

The SRS query by default uses the EMBOSS database name. If the
database has a different name in SRS, define dbalias: as the database
name to pass to SRS.

SRS will return the results using 'getz -e' so the format should match
the format of the original data. For some formats this can be tricky
(PIR for example), so consider using SRSFASTA although this will lose
information that is not included in the FASTA format SRS output.

To query using the additional fields SRS supports, add them as fields:

\paragraph{SRSFASTA}\par\noindent
Modes: \ilcomm{a q s}\par\noindent
As SRS but returns the sequences in FASTA format. The definition must
include format: fasta so that EMBOSS will read the results in FASTA
format.

\begin{verbatim}
DB mydb [
#required parameters
  method: "srsfasta"
  format: "fasta"
  type: "N"
#optional parameters
  dbalias: "embl"
  fields: "sv des key org"
  app: "getz"
  comment: "My srs indexed database"
  release: "63.0"
]
\end{verbatim}

This access method builds an SRS commandline query to getz. If you
have getz installed under another name, define this as app:

The SRS query by default uses the EMBOSS database name. If the
database has a different name in SRS, define dbalias: as the database
name to pass to SRS.

SRS will return the results using 'getz -f -sf fasta' so the format
must be 'fasta'.

To query using the additional fields SRS supports, add them as fields:

\paragraph{SRSWWW}\par\noindent
Modes: \ilcomm{a q s}\par\noindent
As URL, but specific to an SRS web server. This method takes a base
URL (up to wgetz) for an SRS server, and builds the rest of the URL as
a valid SRS query.

By building the URL, SRSWWW access can query both ID and accession
number, and can query additional fields 'sv', 'des', 'key' and 'org'
if they are allowed with a fields definition.

\begin{verbatim}
DB mydb [
# required parameters
  method: "srswww"
  format: "genbank"
  type: "N"
  url: "http://www.infobiogen.fr/srs5bin/cgi-bin/wgetz?"
#optional parameters
  dbalias: "genbank"
  fields: "sv des key org"
  comment: "Genbank by SRS from InfoBiogen"
  proxy: ":"
  httpversion: "1.0"
]
\end{verbatim}

Because queries for such fields to a remote server can find a very
large number of hits, and EMBOSS will load the entire output into
memory to process the HTML, many EMBOSS administrators choose not to
define these fields for an SRSWWW server.

If there is sufficient demand, it should be possible to rewrite the
HTML preprocessing to avoid buffering in memory.

SRSWWW supports the \ilcomm{proxy} and \ilcomm{httpversion} settings
described under access method URL.

\paragraph{URL}\par\noindent
Modes: \ilcomm{s}\par\noindent URL uses a defined web server to
retrieve a specific entry. EMBOSS may fail if the HTML causes
complications with parsing of the entry.

\begin{verbatim}
DB mydb [
# required parameters
  method: "url"
  format: "genbank"
  type: "N"
  url: "http://www.infobiogen.fr/srs5bin/cgi-bin/wgetz?-e+[genbank-id:%s]"
#optional parameters
  comment: "Genbank by ID from InfoBiogen"
]
\end{verbatim}

The \%s in the URL string indicates where \EMBOSS\ will insert the
identifier portion of the USA.

At many sites, remote HTTP access is controlled by a proxy
server. EMBOSS uses a proxy server defined as EMBOSS\_PROXY with a
value in the format \ilcomm{domain.address:port}, for example:

\begin{verbatim}
set emboss_proxy 'proxy.mydomain.org:8080'
\end{verbatim}

This is a global definition. For selected databases (local web-based
services, for example) you can turn off the proxy inside the database
definition with:

\begin{verbatim}
DB [ ...
  proxy: ":"
]
\end{verbatim}

HTTP access by default uses HTTP protocol version 1.0. EMBOSS can also
support version 1.1, which provides chunked HTML results to
improve network performance. The HTTP version is controlled by a
variable EMBOSS\_HTTPVERSION and by a DB attribute, for example:

\begin{verbatim}
set emboss_httpversion "1.1"
\end{verbatim}

or

\begin{verbatim}
DB [ ...
  httpversion: '1.1'
]
\end{verbatim}

\subsection{Mixed access methods}

For any given \ilcomm{method:} declaration, \EMBOSS\ will use that
method for those access modes supported by the method.
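
As a sketch of this rule, using the mode lists given for each method
above: the hypothetical definition below names only \ilcomm{method:
url}, and URL is listed with mode \ilcomm{s} alone, so the database
would be expected to offer single mode but neither query nor all. The
database name \ilcomm{remotegb} is an assumption made for
illustration; the URL is copied from the URL example above.

\begin{verbatim}
# hypothetical example: only single mode is expected to be available
DB remotegb [
  method: "url"
  format: "genbank"
  type: "N"
  url: "http://www.infobiogen.fr/srs5bin/cgi-bin/wgetz?-e+[genbank-id:%s]"
]
\end{verbatim}

A definition naming \ilcomm{method: emblcd} instead would be expected
to cover all three modes, since EMBLCD is listed with \ilcomm{a q s}.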

If you wish to specify which access mode (all, query or single) should
be handled by which database retrieval method then the
\ilcomm{methodsingle:}, \ilcomm{methodquery:} and \ilcomm{methodall:}
declarations should be used instead of \ilcomm{method:}

\begin{verbatim}
DB mydb [
  methodsingle: app
  format: fasta
  app: "customapp myproteindb"
  methodall: direct
  dir: \$emboss_db_dir/myproteindb
  file: myproteindb.dat
  type: P
  comment: "single and all access for myproteindb"
]
\end{verbatim}

You can mix these, for example, to use a script to query a file, and
direct access to read all entries:

\begin{verbatim}
  methodall: 'direct'
  methodquery: 'external'
\end{verbatim}

\subsection{Indexing and configuring flatfile databases}

Flatfile databases are plain text files in a defined format such as
those released by EMBL, Swissprot and so on. The \EMBOSS\ program
\progname{dbiflat} is used to generate EMBLCD indices that can be used
for all types of database access. \progname{dbiflat} can process
databases in EMBL, SWISSPROT and GENBANK format. Pseudo EMBL format
databases which do not have unique ID and AC entries may cause
\progname{dbiflat} to do mysterious things and should be avoided.

\progname{dbiflat} (and the EMBLCD access method) requires the
databases to be uncompressed. The examples given here will not probe
the deeper secrets of \progname{dbiflat} (for which the reader is
referred to the documentation, or failing that the source code) but
will show a typical installation for a common database.

We assume that \EMBOSS\ has been installed and works. This can be
tested with the command \ilcomm{wossname -auto} which should list all
the programs available.

In this example we will index and configure the EMBL database for use
with \EMBOSS.

First download and unpack the EMBL database. This will require a
considerable amount of disk space. If you do not have sufficient space
available then just download a subset of the database.

Use \ilcomm{cd} to move to the directory in which you have unpacked
EMBL. This should look something like this when you run \ilcomm{ls}:

\begin{verbatim}
% ls
est_fun.dat
est_hum1.dat
est_hum10.dat
.
Output truncated
.
syn.dat
unc.dat
vrl.dat
vrt.dat
\end{verbatim}

Run \progname{dbiflat} to create the EMBLCD indices.

\begin{verbatim}
% dbiflat

Index a flat file database
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
        GB : Genbank, DDBJ
Entry format [SWISS]: EMBL
Database name: embl
Database directory [.]:
Wildcard database filename [*.dat]:
Release number [0.0]: 63.0
Index date [00/00/00]: 31/07/00
\end{verbatim}

\progname{dbiflat} should happily chug away for some considerable time
(up to a few hours depending on the speed of your machine) and will
generate (eventually) the following index files:

\begin{verbatim}
% ls
acnum.hit
acnum.trg
division.lkp
entrynam.idx
\end{verbatim}

Now we create an entry in the \EMBOSS\ configuration files to access
the database. It is probably a good idea to try new database
definitions in your local configuration file first.

Put the following entry in your \filename{.embossrc}:

\begin{verbatim}
DB embl [
  type: N
  method: emblcd
  format: embl
  dir: \$emboss_db_dir/embl
  file: "*.dat"
  release: "63.0"
  comment: "EMBL release 63.0"
]
\end{verbatim}

You will need to have defined \ilcomm{\$emboss\_db\_dir} using a
directive such as

\begin{verbatim}
set emboss_db_dir /path_to_databases
\end{verbatim}

somewhere in your \filename{emboss.default} or \filename{.embossrc}.

Save \filename{.embossrc} and try \progname{showdb}. You should see a
line that looks like:

\begin{verbatim}
% showdb
.. output deleted
embl          N    OK  OK  OK  EMBL release 63.0
.. output deleted
\end{verbatim}

\subsection{Fine tuning the installation}
\label{sec:finetune}
It is probably a good idea to set up subsections of the database so
that end users can search just the regions they wish to search. This
section applies to all access methods that use EMBLCD style indexes
and probably to others as well.

Files can be included with the declaration \ilcomm{file:} or excluded
with the declaration \ilcomm{exclude:}. It is a good idea to put the
wild card directory specifier (\filename{*/}) in front of the filename
to ensure that any path that may be included in
\filename{division.lkp} will be matched. Please note especially the
notes for \progname{GCG} formatted databases indexed with
\progname{dbigcg}.

To take just the EST files in our EMBL database try the following:

\begin{verbatim}
DB emblest [
  type: N
  method: emblcd
  format: embl
  dir: \$emboss_db_dir/embl
  file: "est*.dat"
  release: "63.0"
  comment: "EMBL release 63.0"
]
\end{verbatim}

Files can also be given as a space separated list enclosed in
quotes.
For example to set up a database of all mammalian sequences
(except genomes) try the following:

\begin{verbatim}
DB emblallmam [
  type: N
  method: emblcd
  format: embl
  dir: \$emboss_db_dir/embl
  file: "rod*.dat hum*.dat mam*.dat"
  release: "63.0"
  comment: "EMBL release 63.0"
]
\end{verbatim}

As you can see from these two examples, the \ilcomm{file:} tag takes a
space delimited list of filenames enclosed in quotes that can contain
normal wildcard (\ilcomm{?*}) characters.

It can be quite tedious to set up a long list of sequences to
search. In many cases you can use the \ilcomm{exclude:} tag to make
things easier.

\begin{verbatim}
DB emblnoest [
  type: N
  method: emblcd
  format: embl
  dir: \$emboss_db_dir/embl
  file: "*.dat"
  exclude: "est*.dat"
  release: "63.0"
  comment: "EMBL release 63.0"
]
\end{verbatim}

This configures the \filename{emblnoest} database to contain all of
EMBL except the ESTs.

\subsection{Indexing and configuring GCG format databases}

\EMBOSS\ can access GCG formatted databases, thus avoiding having
multiple copies of the same databases in different formats for those
who still use GCG alongside the flatfiles. \EMBOSS\ creates
EMBLCD-like indices for the GCG format databases using the program
\progname{dbigcg}. This runs in much the same way as
\progname{dbiflat}. You will need the GCG format \filename{.seq} and
\filename{.header} files in order to create an EMBLCD indexed
database.

Move to the GCG database directory containing your data and run
\progname{dbigcg}

\begin{verbatim}
Index a GCG formatted database
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
        GB : Genbank, DDBJ
       PIR : NBRF
Entry format [EMBL]:
Database name: embl
Database directory [.]:
Wildcard database filename [*.seq]:
Release number [0.0]: 63.0
Index date [00/00/00]: 31/07/00
\end{verbatim}

The program will chug along for a while and will then generate the
EMBLCD index files for the GCG format database.

When \progname{dbigcg} prompts for the entry format (\ilcomm{Entry
format [EMBL]:}) you should enter the format the database had before
you ran \progname{embltogcg} or similar to generate the \progname{GCG}
databases.

The following entry should be put in your \filename{.embossrc}

\begin{verbatim}
DB gcgembl [
  type: N
  method: gcg
  format: embl
  dir: \$emboss_db_dir/embl
  file: "*.seq"
  release: "63.0"
  comment: "EMBL release 63.0"
]
\end{verbatim}

\progname{showdb} should show your newly configured database.

You can configure subsets of the databases in the same way as for the
original format databases, described in section \ref{sec:finetune}
above. One difference to \progname{dbiflat} indexing is that both the
\filename{.seq} and \filename{.header} files are listed in the
\filename{division.lkp} file. \ilcomm{file:} and \ilcomm{exclude:}
directives should therefore be of the form \ilcomm{exclude:
*/em\_est*} instead of just \ilcomm{*/em\_est*.seq}.

\subsection{Indexing and configuring BLAST databases}
BLAST format databases are generated for efficient homology searching
using the BLAST programs. It can be convenient to avoid redundant
copies of databases so \EMBOSS\ provides a mechanism for accessing
these databases.

BLAST format databases are those generated using the tools distributed
with NCBI-BLAST or with WU-BLAST.

\begin{comment}At present \EMBOSS
will only index BLAST databases created from FASTA format input files
with one of the recognised header formats. More information on the
relevant formats can be found in subsection \ref{subsec:fasta}
below.
\end{comment}

To index one BLAST database, move to the
directory containing your BLAST format databases and run
\progname{dbiblast}

\begin{verbatim}
Index a BLAST database
Database name: blastsw
Database directory [.]:
database base filename [blastsw]:
Release number [0.0]:
Index date [00/00/00]:
         N : nucleic
         P : protein
         ? : unknown
Sequence type [unknown]: p
         1 : wublast and setdb/pressdb
         2 : formatdb
         0 : unknown
Blast index version [unknown]: 2
\end{verbatim}

The program will chug along for a while and will then generate the
EMBLCD index files for the BLAST format database.

The following entry (or one like it that is more appropriate to your
particular installation) should be put in your \filename{.embossrc}

\begin{verbatim}
DB blastsw [
  type: P
  method: blast
  format: ncbi
  dir: \$emboss_db_dir/blastsw
  file: "blastsw"
  release: "38.9"
  comment: "BLAST format Swissprot"
]
\end{verbatim}

\progname{showdb} should show your newly configured database.

Because of the way BLAST works, many sites may group their BLAST
databases in the same directory. You can index these {\it in situ}
with \progname{dbiblast} but this may require some extra steps if your
databases are not of the same type, as generating a subsequent set of
index files will overwrite those that already exist.
To avoid overwriting of
index files you can index many databases with one set of index files,
or you can use the \ilcomm{indexdir} options to place the indices in a
different directory.

There are two requirements for indexing several databases together in
one index. The first is that the databases are the same type
(protein/nucleic acid) and generated with the same tool (pressdb or
formatdb); the second is that all the ID and accession numbers in the
combined databases are unique.

Run \progname{dbiblast} as before but specify all the databases you
wish to be included when prompted for the database filename.

\begin{verbatim}
Index a BLAST database
Database name: alldbs
Database directory [.]:
database base filename [alldbs]: dbone dbtwo dbthree dbfour
Release number [0.0]:
Index date [00/00/00]:
         N : nucleic
         P : protein
         ? : unknown
Sequence type [unknown]: p
         1 : wublast and setdb/pressdb
         2 : formatdb
         0 : unknown
Blast index version [unknown]: 2
\end{verbatim}

These can then be configured as described in section
\ref{sec:finetune} above by using the '\ilcomm{file:}' and
'\ilcomm{exclude:}' tags as appropriate.\footnote{There is one
difference to the standard EMBLCD access method in that the database
indexes will not allow the generation of exclusive subsections of the
combined database.
If an ID or accession number is specified that is
present in the index then the sequence will be returned irrespective
of which database it is in.}

When you have databases of different types, generated with different
programs or where the ID/accession numbers are duplicated between
databases the preferred strategy is probably to keep the source data
for the individual databases in separate directories and index them
there.\footnote{Keeping one directory with symbolic links for your
BLAST installation will ensure that BLAST continues to function
correctly if you set BLASTDB to point to the directory containing the
symbolic links. The EMBOSS indices can be placed wherever you wish as
long as you remember to run \progname{dbiblast} with the appropriate
options and put an appropriate \ilcomm{indexdir} tag in the DB
configuration in the \filename{.embossrc} in your home directory.}

Alternatively you can place the index files in a separate
directory. This requires that you run \progname{dbiblast} with the
\ilcomm{-indexdirectory} option and set the \ilcomm{indexdir:} tag in
the database configuration to point to the correct index directory. The
example below illustrates database configuration using the
\ilcomm{indexdir} options.

\begin{verbatim}
% dbiblast -indexdir=/databases/indices/mydb
Index a BLAST database
Database name: mydb
Database directory [.]:
database base filename [mydb]:
Release number [0.0]:
Index date [00/00/00]:
         N : nucleic
         P : protein
         ? : unknown
Sequence type [unknown]: p
         1 : wublast and setdb/pressdb
         2 : formatdb
         0 : unknown
Blast index version [unknown]: 2
\end{verbatim}

The corresponding entry in the \filename{.embossrc} in your home
directory (or in \filename{emboss.default}) would look like:


\begin{verbatim}
DB mydb [
  type: P
  method: blast
  format: ncbi
  dir: \$emboss_db_dir/blastsw
  indexdir: /databases/indices/mydb
  file: mydb
  release: "1.0"
  comment: "My BLAST DB with an index in a different directory"
]
\end{verbatim}

Again, multiple indices cannot coexist in the same directory so care
should be taken when using the \ilcomm{indexdir} options that an
existing database index is not overwritten.

\begin{comment}
\subsubsection{FASTA formats used with \progname{dbiblast}}
\label{subsec:fasta}
The following FASTA formats are recognised by \progname{dbiblast}:

\begin{tabular}[t]{|l|l|}\hline \setlength{\baselineskip}{1.2\baselineskip}
GENBANK/NCBI & \ilcomm{> \ldots |accno|id \ldots }\\
\hline
GCG & \ilcomm{>{\sl dbname}:accno id \ldots }\\
\hline
SIMPLE &\ilcomm{ >accno id \ldots} \\
\hline
ID & \ilcomm{>id}\\
\hline
\end{tabular}
\ilcomm{...} refers to any text. Note that the ID must be the only
item in the header for the ID format.

\end{comment}
\subsection{Indexing and configuring FASTA databases}

The FASTA specifications just define the sequence file as a header
line that begins with \ilcomm{>} and subsequent lines containing the
sequence. The header line can be present in an almost infinite number
of formats, several of which can be processed by \EMBOSS. \EMBOSS\
attempts to determine the accession number and/or ID for each
sequence. For indexing purposes there is no semantic difference
between an accession number and an ID.
In the real world, accession
numbers are immutable, i.e. they do not change with subsequent releases
of the database, but IDs may change. In any case IDs and accession
numbers are unique, and that is all that matters for database indexing
in \EMBOSS.

The program used to process FASTA format databases is
\progname{dbifasta}. It can recognise the following header line
formats, specified on the command line:

\begin{tabular}[t]{|l|l|}\hline\setlength{\baselineskip}{1.5\baselineskip}
simple &%
\ilcomm{>id ...}\\
\hline
idacc &%
\ilcomm{>id accno ...}\\
\hline
gcgid &%
\ilcomm{>db:id ...}\footnotemark[\value{footnote}]\\
\hline
gcgidacc &%
\ilcomm{>db:id acc ...}\footnotemark[\value{footnote}]\\
\hline
dbid &%
\ilcomm{>db id ...}\footnotemark\\
\hline
ncbi &%
\ilcomm{>...[|accno]|id ...}\footnotemark\\
\hline
\end{tabular}
\addtocounter{footnote}{-1} \footnotetext{{\em db} is one word}
\addtocounter{footnote}{1} \footnotetext{The ID is always taken to be
the characters after the last bar (\ilcomm{|}). The previous field is
also indexed but ONLY if it looks like an accession number
(e.g. AC00001).}


Other header formats will not be recognised by \progname{dbifasta} and
will cause indexing and/or database lookup to fail. If you have a
different header format that \progname{dbifasta} cannot yet handle you
have two options:
\begin{enumerate}
\item (The preferred option) Get a C programmer to modify the source
code for \progname{dbifasta} and recompile. If you are a community
spirited person you will also contribute these changes to the main
\EMBOSS\ source tree. (email emboss-dev\@@emboss.open-bio.org for more
information on contributing changes to the \EMBOSS\ source code and/or
read the \EMBOSS\ developers documentation)
\item (The quick hack) Write a custom script (using
e.g.
BioPerl\URL{http://www.bioperl.org}) to access your database and
use \ilcomm{method: external} to configure it. This is less desirable
as you may be limited in the access modes you can use.
\end{enumerate}

To index a FASTA format database, run \progname{dbifasta}.

\begin{verbatim}
% dbifasta
Index a fasta database
    simple : >ID
     idacc : >ID ACC
     gcgid : >db:ID
  gcgidacc : >db:ID ACC
      ncbi : >blah|...[|ACC]|ID
ID line format [idacc]:
Database name: mydb
Database directory [.]:
Wildcard database filename [*.dat]: mydb.fasta
Release number [0.0]:
Index date [00/00/00]:
\end{verbatim}

\progname{dbifasta} will chug along for a little while and will
produce the index files. You can use the same \ilcomm{indexdir}
options as for \progname{dbiflat}, \progname{dbigcg} and
\progname{dbiblast} to place the indices in a different directory.

Place the following entry in your \filename{.embossrc}:

\begin{verbatim}
DB mydb [
   type:    P
   method:  emblcd
   format:  fasta
   dir:     \$emboss_db_dir/mydb
   file:    mydb.fasta
   comment: "My database"
]
\end{verbatim}

\ilcomm{format:} should be \ilcomm{dbid}, \ilcomm{ncbi} or
\ilcomm{fasta} (for every format except \ilcomm{dbid} or
\ilcomm{ncbi}). The same \ilcomm{file:} and \ilcomm{include:} tags can
be used as for the other database indexing programs.


\subsection{Configuring \EMBOSS\ to use SRS for database lookup}

\ilcomm{method: srs} is really a special case of \ilcomm{method:
external} with some additional features.

SRS is a powerful database querying system that can cross reference
between different databases, launch applications and so on. SRS can be
run either through a web interface (see the description of the URL
method above for an example) or via the command line program
\progname{getz}.
Indexing and configuring databases for SRS is
outside the scope of this document, which describes only how to connect
to preconfigured and indexed SRS databases.\footnote{For information
on configuring and indexing SRS databases please look at the SRS
administrator's guide \filename{www/doc/srsadmin.pdf} in your SRS 6
installation.} If \progname{getz} is already in your \ilcomm{PATH}
environment variable then insert the following (or similar) in your
\filename{.embossrc}:

\begin{verbatim}
DB emblgetz [
   type:    N
   method:  srs
   release: "63"
   format:  embl
   comment: 'EMBL using getz'
   dbalias: embl
   app:     getz
]
\end{verbatim}

This will provide access to the SRS database 'embl' as
\ilcomm{emblgetz:acc}. If the SRS database has a different name to the
\EMBOSS\ database (as is the case here) then the \ilcomm{dbalias:} tag
should be used to access the correct SRS database.

This configuration can be extremely slow for the \ilcomm{all} access
mode. It is probably a better idea to set up the database as follows:

\begin{verbatim}
DB emblgetz [
   type:        N
   methodquery: srs
   release:     "63"
   format:      embl
   comment:     'EMBL using getz'
   dbalias:     embl
   app:         getz
   methodall:   direct
   file:        "*.dat"
   dir:         \$emboss_db_dir/embl
]
\end{verbatim}

which will use \ilcomm{method: srs} for the \ilcomm{query} access mode
but will use \ilcomm{method: direct} for the \ilcomm{all} access mode,
thus speeding up reading of the whole database.

The SRSFASTA access method is identical to the normal SRS method
except that it returns the sequence in FASTA format and so does not
need a \ilcomm{format:} tag.


\subsection{Indexing and configuring other databases}

Many institutions may have local databases set up in their own
Laboratory Information Management System.
\EMBOSS\ provides a simple
mechanism for interfacing with such systems.

As long as a program is available that can be called noninteractively
and returns the specified sequence on standard output, \EMBOSS\ can
interface with it. Use \ilcomm{method: app} or \ilcomm{method:
external} (the two are equivalent) and \ilcomm{app: "program
command"}. The ID given in the USA will
be appended to the command used to run the program. It is probably
best to specify the methods available using the method subsets
\ilcomm{methodall:}, \ilcomm{methodquery:} and \ilcomm{methodsingle:}
rather than using the generic \ilcomm{method:} tag.


\section{Other data}

\EMBOSS\ can be integrated with some common biological
databases. These are described in this section.

\subsection{REBASE}

Rebase is the restriction enzyme database maintained by New
England Biolabs. It is needed for programs such as \progname{remap} and
\progname{restrict}.

The latest version of Rebase can be obtained by anonymous
FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/rebase} \EMBOSS\ needs
the \filename{withrefm} file. The data is extracted for \EMBOSS\ with
the program \progname{rebaseextract}.

\begin{verbatim}
% mkdir /site/prog/emboss/data/REBASE
% rebaseextract
Extract data from REBASE
Full pathname of WITHREFM: /data/rebase/withrefm.208
\end{verbatim}

Rebase is now installed and ready to use.

\subsection{TRANSFAC}

Transfac is the transcription factor binding site database. It is
available by anonymous
FTP.\footnote{ftp://transfac.gbf.de/pub/transfac/ascii/} Unpacking the
distribution reveals a file called \filename{site.dat}. This is the
one \EMBOSS\ needs.

Run \progname{tfextract} to extract the data from TRANSFAC.

\begin{verbatim}
% tfextract
Extract data from TRANSFAC
Full pathname of transfac SITE.DAT: /databases/transfac/site.dat
\end{verbatim}

\progname{tfscan} can now access the TRANSFAC database.

\subsection{PROSITE}

Prosite is a database of regular expressions that match potentially
diagnostic regions for structural/functional classification of
proteins. \EMBOSS\ needs this database for the \progname{patmatmotifs}
program.

PROSITE can be obtained via anonymous
FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/prosite}

You may need to create a PROSITE subdirectory under data in the
\EMBOSS\ installation directory.

Then run \progname{prosextract} to build the \EMBOSS\ Prosite database.

\begin{verbatim}
% prosextract
Builds the PROSITE motif database for patmatmotifs to search
Enter name of prosite directory: /data/prosite
\end{verbatim}

PROSITE is now integrated into your \EMBOSS\ installation.

\subsection{PRINTS}

Prints is a database of diagnostic patterns of blocks of sequence
homology in protein families. The PRINTS database can be searched
using the \EMBOSS\ program \progname{pscan}.

PRINTS can be obtained via anonymous
FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/prints} The database
is made available as compressed files which should be uncompressed
using \progname{gzip} before integrating them into \EMBOSS.

PRINTS is integrated with \EMBOSS\ using the program
\progname{printsextract}:

\begin{verbatim}
% printsextract
Extract data from PRINTS
Input file: /data/prints/prints27_0.dat
\end{verbatim}

The PRINTS database is now integrated with \EMBOSS.

\subsection{AAINDEX}

An amino acid index is a set of 20 numerical values representing any
of the different physicochemical and biological properties of amino
acids.
The AAindex1 section of the Amino Acid Index Database is a
collection of published indices together with the result of cluster
analysis using the correlation coefficient as the distance between two
indices. This section currently contains 437 indices in release
\filename{4.0} of the database.

The \EMBOSS\ programs \progname{pepwindow} and \progname{pepwindowall}
plot hydrophobicity using the data from an Aaindex entry. If Aaindex is
installed these programs can plot the other amino acid properties.

Aaindex can be obtained via anonymous
FTP.\footnote{ftp://ftp.genome.ad.jp/pub/db/genomenet/aaindex/aaindex1}

Aaindex is integrated with \EMBOSS\ using the program
\progname{aaindexextract}:

\begin{verbatim}
% aaindexextract
Extract data from AAINDEX
Full pathname of file aaindex1: /data/aaindex/aaindex1
\end{verbatim}

The AAINDEX database is now integrated with \EMBOSS.

\subsection{CUTG}

The CUTG database contains a series of codon usage tables calculated
from GenBank.

CUTG can be obtained via anonymous
FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/cutg/ or
ftp://ftp.kazusa.or.jp/pub/codon/current/}

CUTG is integrated with \EMBOSS\ using the program
\progname{cutgextract} which writes files to the CODONS data
directory.

\begin{verbatim}
% cutgextract
Extract data from CUTG
CUTG directory [.]: /data/cutg/
\end{verbatim}

The CUTG database is now integrated with \EMBOSS.

\subsection{Miscellaneous data files}

Other data files should be kept in the data directory under the main
\EMBOSS\ installation. Individual users' personal data files can be
kept in the current working directory, a subdirectory
\filename{.embossdata} of the current directory, their home directory
or a subdirectory \filename{.embossdata} of their home
directory.
\EMBOSS\ will search these locations in this order and will
stop as soon as it finds a matching file. If the personal directories
do not contain the desired file, \EMBOSS\ will search the system-wide
data directory, \filename{/site/prog/emboss/data} in this example.

Apparently inexplicable errors when running \EMBOSS\ programs may be
caused by the system not using the data files one expects. The search
path can be displayed in search order using the command
\progname{embossdata}.

\section{Default program settings}

As with many other areas, the default behaviour of programs can be
controlled by setting appropriate values in \filename{.embossrc}.

All general qualifiers\footnote{See the \EMBOSS\ Quick Guide or the
web documentation (or use \ilcomm{wossname -help -verbose}) for an
overview of general qualifiers.} can be specified as

\begin{verbatim}
set emboss_QUALIFIER 1
\end{verbatim}

where \ilcomm{QUALIFIER} is one of the general qualifiers and the
value can be \ilcomm{Y} or \ilcomm{1} for true, or \ilcomm{0} or
\ilcomm{N} for false.

Setting the qualifier value to true has the effect of running every
program with that qualifier set.\footnote{You can specifically unset
it by using the \ilcomm{-noQUALIFIER} command line option.} Qualifiers
set in this way work exactly as if you had given them when
running the program. For example you can \ilcomm{set emboss\_verbose
Y} and the program will run normally, but when the program is run with
the \ilcomm{-help} qualifier, the output will be in verbose form.

There is no point in globally setting options that are there for
producing help output.

Qualifiers that can be set:

\begin{description}

\item[VERBOSE] Causes \ilcomm{-help} to print verbose text.

\item[STDOUT] Causes all output to go to \filename{STDOUT} as
default.
Without this setting, programs will usually build a default output
file name from the input sequence and the program name.

\item[DEBUG] Writes debugging output to a file. Useful for finding
bugs as a command line option.

\item[OPTIONS] Enable prompting for optional parameters.

\item[FILTER] Take input from \filename{STDIN} and send it to
\filename{STDOUT}, and turn on \ilcomm{-auto}.

\item[AUTO] Do not prompt for any options but accept the defaults if
no values are given.

\item[WARNING] Print warning messages to \filename{STDERR} (default is true).

\item[ERROR] Print error messages to \filename{STDERR} (default is true).

\item[FATAL] Print fatal messages to \filename{STDERR} (default is true).

\item[DIE] Print crash messages to \filename{STDERR}.

\end{description}

These general qualifiers are typically used by advanced users
(\ilcomm{-options}, \ilcomm{-verbose}) or by developers
(\ilcomm{-debug -acdlog}).


Other program options that can be set are \ilcomm{emboss\_format},
\ilcomm{emboss\_acdroot}, and \ilcomm{emboss\_data}. The value of
\ilcomm{emboss\_format} determines which default sequence format to
use for output. For example, if you are running \EMBOSS\ alongside
\progname{GCG} you may wish to have the following entry in your
\filename{.embossrc}:

\begin{verbatim}
set emboss_FORMAT gcg
set emboss_OUTFORMAT gcg
\end{verbatim}

which has the effect of using \progname{GCG} format by
default.\footnote{This can of course be overridden using the
\ilcomm{-sformat} and \ilcomm{-osformat} associated qualifiers.
See
the \EMBOSS\ ACD Syntax documentation or the \EMBOSS\ Quick Guide for
more information.}

\ilcomm{emboss\_acdroot} \filename{/path/to/acd} can be set if you
wish to use a different directory for the ACD files, and
\ilcomm{emboss\_data} \filename{/path/to/data} if you wish to use a
separate data directory.


\section{Logging}

Many system administrators may wish to make use of the logging
facilities of \EMBOSS. Setting the variable \ilcomm{emboss\_logfile}
in \filename{emboss.default} or \filename{.embossrc} allows the system
to keep a log of which programs are used, when, and by whom.

\begin{verbatim}
set emboss_logfile /site/log/emboss.log
\end{verbatim}

The log file structure is very simple. Three tab-separated fields are
stored: program name, user name, and the date and time.

\begin{verbatim}
prettyplot joeuser Wed Aug 02 14:29:13 2000
\end{verbatim}

The file defined in \ilcomm{emboss\_logfile} should be world writable.
The following command grants write permission to all users so that
logging can occur.

\begin{verbatim}
chmod a+w /site/log/emboss.log
\end{verbatim}

All settings can be overridden in a user's \filename{.embossrc} file
by redefining the relevant variables. So to prevent our system usage
being logged we can redefine \ilcomm{emboss\_logfile} by putting the
following entry in our \filename{.embossrc} file.

\begin{verbatim}
set emboss_logfile /dev/null
\end{verbatim}

This behaviour may change in the future to prevent users redefining
some system settings.

\chapter{Graphical interfaces to EMBOSS}

This chapter needs to be written. It will be written when the
available GUIs are stable enough to document.

\chapter{Resources}
\section{Web sites}
\subsection{Programs}
\begin{description}
\item[\EMBOSS\ source code]ftp://emboss.open-bio.org/pub/EMBOSS
\item[\EMBOSS\ Documentation]http://emboss.sf.net/
\item[BLAST tools]Tools for generating BLAST format databases are
contained in the NCBI toolkit which can be obtained from NCBI at:
\begin{quote}
http://www.ncbi.nlm.nih.gov/
\end{quote}
\item[SRS software]The SRS software can be obtained from Lion
Bioscience.\URL{http://www.lionbioscience.com/solutions/srs} This is a
commercial package but at the time of writing is available free of
charge to academic institutions.
\item[\progname{wget}]Various useful utilities including the
\progname{wget} program are available from the Free Software
Foundation.\URL{http://www.gnu.org}
\end{description}
\subsection{Databases}

Most of the databases mentioned in the text along with many others can
be obtained via anonymous ftp from the European Bioinformatics
Institute (EBI) at:
\begin{quote}
ftp://ftp.ebi.ac.uk/pub/databases
\end{quote}
Please use a mirror site where possible to avoid overloading the
EBI's resources.

Other databases can be obtained from NCBI (GenBank, UniGene etc.).

\subsection{Other Documentation}
Please review the \EMBOSS\ documentation available on the WWW at the
URL above.

\begin{description}
\item[The \EMBOSS\ Quick guide]A pocket reference guide to using
\EMBOSS\URL{ftp://ftp.no.embnet.org/pub/EMBOSS-extra/emboss-qg.ps}.
\item[The \EMBOSS\ Tutorial]A tutorial to give an introduction to
using \EMBOSS\ for bioinformatics
users.\URL{http://www.hgmp.mrc.ac.uk/Registered/Option/emboss.html}
\item[The updated ABC guide]This is a series of bioinformatics
practicals based predominantly on
\EMBOSS.\URL{ftp://ftp.no.embnet.org/pub/ABC}
\item[EMBOSS-FreeBSD-HOWTO]Detailed documentation on installation of
\EMBOSS\ on
FreeBSD.\URL{ftp://ftp.no.embnet.org/pub/EMBOSS-extra/EMBOSS-FreeBSD-HOWTO}
\end{description}

\section{Maintenance of your \EMBOSS\ installation}

\EMBOSS\ is a rapidly evolving software package. It is constantly
being improved, new features added and `issues' resolved. In addition
there are new applications added and you probably want to make use of
these.

\subsection{Automated installation of \EMBOSS\ and EMBASSY}

Once you have installed \EMBOSS\ and got it to work you have solved
the hardest part of the struggle. Updating \EMBOSS\ as new releases
appear\footnote{\EMBOSS\ is rebuilt nightly from CVS, tested, and,
assuming it passes the compilation tests, the latest version is posted
to the \EMBOSS\ FTP server. } can be quite tedious. UNIX is designed
for the lazy, so here is our lazy man's guide to always having an
up-to-the-minute \EMBOSS\ installation.

The following script can be run manually (it should probably be
`\ilcomm{source}d' rather than executed directly) or can be fired off
with cron (the early hours of the morning are a good time). It
assumes you are installing \EMBOSS\ outside the source directory and
have write permissions to do so.

The \EMBOSS\ installation step will update the files distributed with
\EMBOSS\ but will not alter or
overwrite your own datafiles\footnote{Assuming of course that you
haven't overwritten \EMBOSS\ datafiles with your own to begin with.}
or your \filename{emboss.default}.
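
For the cron route, a crontab entry along the following lines is one
possibility. It is only a sketch: the script location
\filename{/site/admin/emboss-update.csh} and the log file
\filename{/site/log/emboss-update.log} are hypothetical names, and it
assumes that the update script listed in this section has been saved
to that location and that \progname{csh} is installed as
\filename{/bin/csh}.

\begin{verbatim}
# Hypothetical crontab entry: run the update script at 03:30 every night.
# The script and log file paths are assumptions; adjust them for your site.
30 3 * * * /bin/csh -f /site/admin/emboss-update.csh >> /site/log/emboss-update.log 2>&1
\end{verbatim}

Running the script through \ilcomm{csh -f} skips the user's
\filename{.cshrc}, so any environment the build needs has to be set in
the script itself.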

\begin{verbatim}

# This script should be sourced, not run.
# EMBOSS UPDATE.
# it assumes \$packages_dir/EMBOSS is a symbolic link to
# \$mirror_dir/emboss.open-bio.org/pub/EMBOSS
#

#site specific variables: season according to taste..

set mirror_dir=('/ftp/mirrors')
set packages_dir=('/site/newprog')
set emboss_config_options=\
('--prefix=/site/prog/emboss --with-pngdriver=/site/lib')

# Now the script proper

set oldpwd=`pwd`

cd \$mirror_dir
echo 'updating EMBOSS'
if ( `wget -m 'ftp://emboss.open-bio.org/pub/EMBOSS' |& \
  tail -1 | awk '/^Downloaded:/{print \$5}'` != "0" ) then

  cd \${packages_dir}/EMBOSS
  echo 'new EMBOSS programs found .. installing'
  set latest_emboss=`ls -t EMBOSS*|head -1`

  cd \$packages_dir
  rm -Rf EMBOSS-*
  tar zxf EMBOSS/\$latest_emboss
  set emboss_dir=`ls -dt EMBOSS-*[^z]|head -1`

#the next line is necessary on our system but may not be for yours.
  setenv LD_LIBRARYN32_PATH /site/lib

  cd \$emboss_dir

# If you have any site specific changes to the source code
# that you want to include, copy them in here

  ./configure \$emboss_config_options &&\
  make && \
  make install

#Now unpack and build EMBASSY

  mkdir embassy
  cd embassy

#Unpack and build each package one at a time

  foreach embassadir ( `ls ../../EMBOSS/*gz |grep -v EMBOSS-` )

    tar zxf \$embassadir
    set embassadir_arch=\$embassadir:t
    set embassadir_root=\$embassadir_arch:r

    cd \$embassadir_root:r
    ./configure \$emboss_config_options &&\
    make && \
    make install

    cd ..
  end
else
  echo 'No new version of EMBOSS available'
endif

cd \$oldpwd
\end{verbatim}

\subsection{Automated database updating}

In the same way, scripts can be written to automatically update the
biological databases. An example is given here for REBASE. As all the
parameters for \EMBOSS\ programs can be specified on the command line
it is a trivial matter to include index generation in your nightly
update scripts. The management of a bioinformatic resource is beyond
the scope of this document, though \EMBOSS\ goes a long way towards
easing the burden of management.

\subsubsection{Automated update of REBASE}

This script will look for a new version of REBASE and install it in
\EMBOSS\ using \progname{rebaseextract}.

\begin{verbatim}
# This script should be sourced, not run.
# REBASE UPDATE. Should be run just after the beginning of the month.
set mirrors_dir=('/ftp/mirrors')
set oldpwd=`pwd`

cd \$mirrors_dir

if ( ` wget -m 'ftp://ftp.ebi.ac.uk/pub/databases/rebase/*' |& \
  tail -1 | awk '/^Downloaded:/{print \$5}'` != "0" ) then
  cd ftp.ebi.ac.uk/pub/databases/rebase
  cp `ls -t withrefm.*.Z|head -1` withrefm.Z
  uncompress withrefm.Z
  rebaseextract \
    \${mirrors_dir}/ftp.ebi.ac.uk/pub/databases/rebase/withrefm
  rm withrefm
endif

cd \$oldpwd
\end{verbatim}

We make no guarantees that these scripts will work correctly on your
system. If they delete all your files, spam your associates, scratch
your CDs and initiate a nuclear strike on a small unpopulated
Pacific island it is NOT OUR FAULT. They just happen to work for us.

\chapter{GNU Free Documentation License}

\begin{verbatim}
                GNU Free Documentation License
                   Version 1.1, March 2000

 Copyright (C) 2000 Free Software Foundation, Inc.
     59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.


0. PREAMBLE

The purpose of this License is to make a manual, textbook, or other
written document "free" in the sense of freedom: to assure everyone
the effective freedom to copy and redistribute it, with or without
modifying it, either commercially or noncommercially. Secondarily,
this License preserves for the author and publisher a way to get
credit for their work, while not being considered responsible for
modifications made by others.

This License is a kind of "copyleft", which means that derivative
works of the document must themselves be free in the same sense. It
complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does. But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book. We recommend this License
principally for works whose purpose is instruction or reference.


1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work that contains a
notice placed by the copyright holder saying it can be distributed
under the terms of this License. The "Document", below, refers to any
such manual or work. Any member of the public is a licensee, and is
addressed as "you".

A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of
the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall subject
(or to related matters) and contains nothing that could fall directly
within that overall subject. (For example, if the Document is in part a
textbook of mathematics, a Secondary Section may not explain any
mathematics.) The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.

The "Invariant Sections" are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License.

The "Cover Texts" are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License.

A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, whose contents can be viewed and edited directly and
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters. A copy made in an otherwise Transparent file
format whose markup has been designed to thwart or discourage
subsequent modification by readers is not Transparent.
A copy that is
not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format, SGML
or XML using a publicly available DTD, and standard-conforming simple
HTML designed for human modification. Opaque formats include
PostScript, PDF, proprietary formats that can be read and edited only
by proprietary word processors, SGML or XML for which the DTD and/or
processing tools are not generally available, and the
machine-generated HTML produced by some word processors for output
purposes only.

The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page. For works in
formats which do not have any title page as such, "Title Page" means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.


2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License. You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute. However, you may accept
compensation in exchange for copies. If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.


3.
COPYING IN QUANTITY

If you publish printed copies of the Document numbering more than 100,
and the Document's license notice requires Cover Texts, you must enclose
the copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover. Both covers must also clearly and legibly identify
you as the publisher of these copies. The front cover must present
the full title with all words of the title equally prominent and
visible. You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a publicly-accessible computer-network location containing a complete
Transparent copy of the Document, free of added material, which the
general network-using public has access to download anonymously at no
charge using public-standard network protocols. If you use the latter
option, you must take reasonably prudent steps, when you begin
distribution of Opaque copies in quantity, to ensure that this
Transparent copy will remain thus accessible at the stated location
until at least one year after the last time you distribute an Opaque
copy (directly or through your agents or retailers) of that edition to
the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.


4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it. In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct
   from that of the Document, and from those of previous versions
   (which should, if there were any, be listed in the History section
   of the Document). You may use the same title as a previous version
   if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities
   responsible for authorship of the modifications in the Modified
   Version, together with at least five of the principal authors of the
   Document (all of its principal authors, if it has less than five).
C. State on the Title page the name of the publisher of the
   Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications
   adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice
   giving the public permission to use the Modified Version under the
   terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections
   and required Cover Texts given in the Document's license notice.
H.
Include an unaltered copy of this License.
I. Preserve the section entitled "History", and its title, and add to
   it an item stating at least the title, year, new authors, and
   publisher of the Modified Version as given on the Title Page. If
   there is no section entitled "History" in the Document, create one
   stating the title, year, authors, and publisher of the Document as
   given on its Title Page, then add an item describing the Modified
   Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for
   public access to a Transparent copy of the Document, and likewise
   the network locations given in the Document for previous versions
   it was based on. These may be placed in the "History" section.
   You may omit a network location for a work that was published at
   least four years before the Document itself, or if the original
   publisher of the version it refers to gives permission.
K. In any section entitled "Acknowledgements" or "Dedications",
   preserve the section's title, and preserve in the section all the
   substance and tone of each of the contributor acknowledgements
   and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document,
   unaltered in their text and in their titles. Section numbers
   or the equivalent are not considered part of the section titles.
M. Delete any section entitled "Endorsements". Such a section
   may not be included in the Modified Version.
N. Do not retitle any existing section as "Endorsements"
   or to conflict in title with any Invariant Section.

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant.
To do this, add their titles to the 2626list of Invariant Sections in the Modified Version's license notice. 2627These titles must be distinct from any other section titles. 2628 2629You may add a section entitled "Endorsements", provided it contains 2630nothing but endorsements of your Modified Version by various 2631parties--for example, statements of peer review or that the text has 2632been approved by an organization as the authoritative definition of a 2633standard. 2634 2635You may add a passage of up to five words as a Front-Cover Text, and a 2636passage of up to 25 words as a Back-Cover Text, to the end of the list 2637of Cover Texts in the Modified Version. Only one passage of 2638Front-Cover Text and one of Back-Cover Text may be added by (or 2639through arrangements made by) any one entity. If the Document already 2640includes a cover text for the same cover, previously added by you or 2641by arrangement made by the same entity you are acting on behalf of, 2642you may not add another; but you may replace the old one, on explicit 2643permission from the previous publisher that added the old one. 2644 2645The author(s) and publisher(s) of the Document do not by this License 2646give permission to use their names for publicity for or to assert or 2647imply endorsement of any Modified Version. 2648 2649 26505. COMBINING DOCUMENTS 2651 2652You may combine the Document with other documents released under this 2653License, under the terms defined in section 4 above for modified 2654versions, provided that you include in the combination all of the 2655Invariant Sections of all of the original documents, unmodified, and 2656list them all as Invariant Sections of your combined work in its 2657license notice. 2658 2659The combined work need only contain one copy of this License, and 2660multiple identical Invariant Sections may be replaced with a single 2661copy. 
If there are multiple Invariant Sections with the same name but 2662different contents, make the title of each such section unique by 2663adding at the end of it, in parentheses, the name of the original 2664author or publisher of that section if known, or else a unique number. 2665Make the same adjustment to the section titles in the list of 2666Invariant Sections in the license notice of the combined work. 2667 2668In the combination, you must combine any sections entitled "History" 2669in the various original documents, forming one section entitled 2670"History"; likewise combine any sections entitled "Acknowledgements", 2671and any sections entitled "Dedications". You must delete all sections 2672entitled "Endorsements." 2673 2674 26756. COLLECTIONS OF DOCUMENTS 2676 2677You may make a collection consisting of the Document and other documents 2678released under this License, and replace the individual copies of this 2679License in the various documents with a single copy that is included in 2680the collection, provided that you follow the rules of this License for 2681verbatim copying of each of the documents in all other respects. 2682 2683You may extract a single document from such a collection, and distribute 2684it individually under this License, provided you insert a copy of this 2685License into the extracted document, and follow this License in all 2686other respects regarding verbatim copying of that document. 2687 2688 26897. AGGREGATION WITH INDEPENDENT WORKS 2690 2691A compilation of the Document or its derivatives with other separate 2692and independent documents or works, in or on a volume of a storage or 2693distribution medium, does not as a whole count as a Modified Version 2694of the Document, provided no compilation copyright is claimed for the 2695compilation. 
Such a compilation is called an "aggregate", and this 2696License does not apply to the other self-contained works thus compiled 2697with the Document, on account of their being thus compiled, if they 2698are not themselves derivative works of the Document. 2699 2700If the Cover Text requirement of section 3 is applicable to these 2701copies of the Document, then if the Document is less than one quarter 2702of the entire aggregate, the Document's Cover Texts may be placed on 2703covers that surround only the Document within the aggregate. 2704Otherwise they must appear on covers around the whole aggregate. 2705 2706 27078. TRANSLATION 2708 2709Translation is considered a kind of modification, so you may 2710distribute translations of the Document under the terms of section 4. 2711Replacing Invariant Sections with translations requires special 2712permission from their copyright holders, but you may include 2713translations of some or all Invariant Sections in addition to the 2714original versions of these Invariant Sections. You may include a 2715translation of this License provided that you also include the 2716original English version of this License. In case of a disagreement 2717between the translation and the original English version of this 2718License, the original English version will prevail. 2719 2720 27219. TERMINATION 2722 2723You may not copy, modify, sublicense, or distribute the Document except 2724as expressly provided for under this License. Any other attempt to 2725copy, modify, sublicense or distribute the Document is void, and will 2726automatically terminate your rights under this License. However, 2727parties who have received copies, or rights, from you under this 2728License will not have their licenses terminated so long as such 2729parties remain in full compliance. 2730 2731 273210. 
FUTURE REVISIONS OF THIS LICENSE 2733 2734The Free Software Foundation may publish new, revised versions 2735of the GNU Free Documentation License from time to time. Such new 2736versions will be similar in spirit to the present version, but may 2737differ in detail to address new problems or concerns. See 2738http://www.gnu.org/copyleft/. 2739 2740Each version of the License is given a distinguishing version number. 2741If the Document specifies that a particular numbered version of this 2742License "or any later version" applies to it, you have the option of 2743following the terms and conditions either of that specified version or 2744of any later version that has been published (not as a draft) by the 2745Free Software Foundation. If the Document does not specify a version 2746number of this License, you may choose any version ever published (not 2747as a draft) by the Free Software Foundation. 2748 2749 2750ADDENDUM: How to use this License for your documents 2751 2752To use this License in a document you have written, include a copy of 2753the License in the document and put the following copyright and 2754license notices just after the title page: 2755 2756 Copyright (c) YEAR YOUR NAME. 2757 Permission is granted to copy, distribute and/or modify this document 2758 under the terms of the GNU Free Documentation License, Version 1.1 2759 or any later version published by the Free Software Foundation; 2760 with the Invariant Sections being LIST THEIR TITLES, with the 2761 Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. 2762 A copy of the license is included in the section entitled "GNU 2763 Free Documentation License". 2764 2765If you have no Invariant Sections, write "with no Invariant Sections" 2766instead of saying which ones are invariant. If you have no 2767Front-Cover Texts, write "no Front-Cover Texts" instead of 2768"Front-Cover Texts being LIST"; likewise for Back-Cover Texts. 
2769 2770If your document contains nontrivial examples of program code, we 2771recommend releasing these examples in parallel under your choice of 2772free software license, such as the GNU General Public License, 2773to permit their use in free software. 2774\end{verbatim} 2775 2776\chapter{Acknowledgements} 2777 2778The acknowledgements and credits are found at the front of this guide 2779because no one ever reads them if they are at the back. 2780 2781\end{document} 2782 2783 2784 2785