\section{Some other topics}
\label{section:more}
\setcounter{footnote}{0}

\subsection{How do I cite Infernal?}

If you'd like to cite a paper, please cite the Infernal 1.1 application
note in \emph{Bioinformatics}:

Infernal 1.1: 100-fold faster RNA homology searches.
EP Nawrocki and SR Eddy.
Bioinformatics, 29:2933-2935, 2013.

The most appropriate citation is to the web site,
\url{http://eddylab.org/infernal/}. You should also cite what version
of the software you used. We archive all old versions, so anyone
should be able to obtain the version you used, when exact
reproducibility of an analysis is an issue.

The version number is in the header of most output files. To see it
quickly, do something like \prog{cmscan -h} to get a help page, and
the header will say:

\begin{sreoutput}
# cmscan :: search sequence(s) against a CM database
# INFERNAL 1.1.3 (November 2019)
# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
\end{sreoutput}

So (from the second line there) this is from Infernal 1.1.3.

\subsection{How do I report a bug?}

Email us, at \url{eric.nawrocki@nih.gov}.

Before we can see what needs fixing, we almost always need to
reproduce a bug on one of our machines. This means we want to have a
small, reproducible test case that shows us the failure you're seeing.
So if you're reporting a bug, please send us:

\begin{itemize}
 \item A brief description of what went wrong.
 \item The command line(s) that reproduce the problem.
 \item Copies of any files we need to run those command lines.
 \item Information about what kind of hardware you're on, what
   operating system, and (if you compiled the software yourself rather
   than running precompiled binaries), what compiler and version you
   used, with what configuration arguments.
\end{itemize}

Depending on how glaring the bug is, we may not need all this
information, but any work you can put into giving us a clean
reproducible test case doesn't hurt and often helps.

The information about hardware, operating system, and compiler is
important. Bugs are frequently specific to particular configurations
of hardware/OS/compiler. We have a wide variety of systems available
for trying to reproduce bugs, and we'll try to match your system as
closely as we can.

If you first see a problem on some huge compute (like running a
zillion query sequences over a huge profile database), it will really,
really help us if you spend a bit of time yourself trying to isolate
whether the problem really only manifests itself on that huge compute,
or if you can isolate a smaller test case for us. The ideal bug report
(for us) gives us everything we need to reproduce your problem in one
email with at most some small attachments.

Remember, we're not a company with dedicated support staff -- we're a
small lab of busy researchers like you. Somebody here is going to drop
what they're doing to try to help you out. Try to save us some time,
and we're more likely to stay in our usual good mood.

If we're in our usual good mood, we'll reply quickly. We'll probably
tell you we fixed the bug in our development code, and that the fix
will appear in the next Infernal release. This of course doesn't help you
much, since nobody knows when the next Infernal release is going to be.
So if possible, we'll usually try to describe a workaround for the
bug.

If the code fix is small, we might also tell you how to patch and
recompile the code yourself. You may or may not want to do this.

There are currently not enough open bugs to justify having a formal
on-line bug tracking system. We have a bugtracking system, but it's
internal.

\subsection{Input files}

\subsubsection{Reading from a stdin pipe using - (dash) as a filename argument}

Generally, Infernal programs read their sequence and/or profile input
from files. Unix power users often find it convenient to string an
incantation of commands together with pipes (indeed, such wizardly
incantations are a point of pride). For example, you might extract a
subset of query sequences from a larger file using a one-liner
combination of scripting commands (perl, awk, whatever). To facilitate
the use of Infernal programs in such incantations, you can almost
always use an argument of '-' (dash) in place of a filename, and the
program will take its input from a standard input pipe instead of
opening a file.\footnote{An important exception is the use of '-' in
place of the target sequence file in \prog{cmsearch}. This is not
allowed because \prog{cmsearch} first quickly reads the target
sequence file to determine its size (it needs to know this to know how
to set filter thresholds), then rewinds it and starts to process
it. There are a couple of additional cases where stdin piping won't
work; these are described later in this section.}

For example, the following three commands are entirely equivalent, and
give essentially identical output:

\user{cmalign tRNA5.cm mrum-tRNAs10.fa}

\user{cat tRNA5.cm | ../src/cmalign - mrum-tRNAs10.fa}

\user{cat mrum-tRNAs10.fa | ../src/cmalign tRNA5.cm -}

Most Easel ``miniapp'' programs share this pipe-reading ability.

Because the programs for CM fetching (\prog{cmfetch}) and
sequence fetching (\prog{esl-sfetch}) can fetch any number of profiles
or (sub)sequences by names/accessions given in a list, \emph{and} these
programs can also read these lists from a stdin pipe, you can craft
incantations that generate subsets of queries or targets on the
fly.
For example, you can extract and align all hits found by
\prog{cmsearch} with an E-value below the inclusion threshold as
follows (using the \textbackslash{} character twice below to split up the final
command onto three lines):

%note: can't use \user{} here because too many special characters
%(believe me I tried). Only difference between \user{} and the way
%I've done it below is that we're not bold, oh well.
\indent\indent\small\verb+> cmsearch --tblout tRNA5.mrum-genome.tbl tRNA5.cm mrum-genome.fa+ \\
\indent\indent\small\verb+> esl-sfetch --index mrum-genome.fa+ \\
\indent\indent\small\verb+> cat tRNA5.mrum-genome.tbl | grep -v ^# | grep ! \ + \\
\indent\indent\small\verb+> | awk '{ printf("%s/%s-%s %s %s %s\n", $1, $8, $9, $8, $9, $1); }' \ + \\
\indent\indent\small\verb+> | esl-sfetch -Cf mrum-genome.fa - | ../src/cmalign tRNA5.cm - + \\

The first command performs the search using the CM file
\ccode{tRNA5.cm} and sequence file \ccode{mrum-genome.fa} (these
were used in the tutorial), and saves tabular output to
\prog{tRNA5.mrum-genome.tbl}. The second command indexes the genome
file to prepare it for fast (sub)sequence retrieval. In the third
command, the first two \prog{grep} commands extract only those lines from
\prog{tRNA5.mrum-genome.tbl} that do not begin with a \prog{\#} (these
are comment lines) and that include a \prog{!} (these are hits that
have E-values below the inclusion threshold).
This output is then sent through \prog{awk} to
reformat the tabular output to the ``GDF'' format that
\prog{esl-sfetch} expects: \otext{<newname> <from> <to> <source
seqname>}. These lines are then piped into \prog{esl-sfetch} (using
the '-' argument) to retrieve each hit (only the subsequence that
comprises each hit -- not the full target sequence).
\prog{esl-sfetch}
will output a FASTA file, which is finally piped into
\prog{cmalign}, again using the '-' argument. The output that is
actually printed to the screen will be a multiple alignment of all the
included tRNA hits.

You can construct similar commands that pipe subsets of CMs. Suppose
you have a copy of Rfam in \prog{Rfam.cm}:

\user{cmfetch --index Rfam.cm} \\
\user{cat myqueries.list | cmfetch -f Rfam.cm - | cmsearch - mrum-genome.fa}

This takes a list of query CM names/accessions in
\prog{myqueries.list}, fetches them one by one from Rfam, and runs a
\prog{cmsearch} with each of them against the sequence file
\prog{mrum-genome.fa}. As above, the \prog{cat myqueries.list} part
can be replaced by any suitable incantation that generates a list of
profile names/accessions.

There are three kinds of cases where using '-' is restricted or
doesn't work. A fairly obvious restriction is that you can only use
one '-' per command; you can't do a \prog{cmalign - -} that tries to
read both a CM and sequences through the same stdin
pipe. A second case is when an input file must be obligately
associated with additional, separately generated auxiliary files, so
reading data from a single stream using '-' doesn't work because the
auxiliary files aren't present (in this case, using '-' will be
prohibited by the program). An example is \prog{cmscan}, which needs
its \prog{<cmfile>} argument to be associated with four auxiliary
files named \prog{<cmfile>.i1\{mifp\}} that \prog{cmpress} creates,
so \prog{cmscan} does not permit a '-' for its \prog{<cmfile>}
argument.
Finally, when a command would require multiple passes over
an input file, the command will generally abort after the first pass
if you are trying to read that file through a standard input pipe
(pipes are nonrewindable in general; a few Easel programs
will buffer input streams to make multiple passes possible, but this
is not usually the case). An important example is trying to search a
database that is streamed into \prog{cmsearch}. This is not allowed
because \prog{cmsearch} first reads the entire sequence file to
determine its size (which dictates the filter thresholds that will be
used for the search), then needs to rewind the file before beginning
the search.

In general, Infernal, HMMER and Easel programs document in their man pages
whether (and which) command line arguments can be replaced by '-'.
You can always check by trial and error, too. The worst that can
happen is a ``Failed to open file -'' error message, if the program
can't read from pipes.
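
When a program needs to rewind its input, as \prog{cmsearch} does for
its target sequence file, one way around the restriction is to buffer
the stream in a temporary file first. The following is a minimal sketch
under stated assumptions, not something provided by Infernal: the shell
function \verb+run_on_buffered_stdin+ and the file
\verb+mytargets.list+ are hypothetical names, and the sketch assumes a
POSIX-style shell in which \verb+mktemp+ is available.

\begin{verbatim}
# Hypothetical helper (not part of Infernal): copy stdin to a temporary
# file, run the given command with that file as its last argument, then
# remove the file and return the command's exit status.
run_on_buffered_stdin() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    "$@" "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example use (mytargets.list is a hypothetical list of sequence names):
#   esl-sfetch -f mrum-genome.fa mytargets.list \
#     | run_on_buffered_stdin cmsearch tRNA5.cm
\end{verbatim}

Because the command then sees an ordinary file, it can make as many
passes over it as it needs. The cost is the disk space for one
temporary copy of the streamed data.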