1\section{Some other topics}
2\label{section:more}
3\setcounter{footnote}{0}
4
5\subsection{How do I cite Infernal?}
6
7If you'd like to cite a paper, please cite the Infernal 1.1 application
8note in \emph{Bioinformatics}:
9
10Infernal 1.1: 100-fold faster RNA homology searches.
11EP Nawrocki and SR Eddy.
12Bioinformatics, 29:2933-2935, 2013.
13
14The most appropriate citation is to the web site,
15\url{http://eddylab.org/infernal/}. You should also cite what version
16of the software you used. We archive all old versions, so anyone
17should be able to obtain the version you used, when exact
18reproducibility of an analysis is an issue.
19
20The version number is in the header of most output files. To see it
21quickly, do something like \prog{cmscan -h} to get a help page, and
22the header will say:
23
24\begin{sreoutput}
25# cmscan :: search sequence(s) against a CM database
26# INFERNAL 1.1.3 (November 2019)
27# Copyright (C) 2019 Howard Hughes Medical Institute.
28# Freely distributed under the BSD open source license.
29# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
30\end{sreoutput}
31
32So (from the second line there) this is from Infernal 1.1.3.
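
If you want to record the version from a script, you can parse the
same header line. The following is a minimal sketch, not something
that ships with Infernal: the function name
\ccode{infernal\_version} is made up for illustration, and it assumes
the header keeps the \ccode{\# INFERNAL <version> (<date>)} layout
shown above.

```shell
# Hypothetical helper (not part of Infernal): read a program's help text on
# stdin and print the version number from the "# INFERNAL ..." header line.
# Assumes the header layout "# INFERNAL <version> (<date>)".
infernal_version() {
  awk '$1 == "#" && $2 == "INFERNAL" { print $3; exit }'
}
```

You would use it as \verb+cmscan -h | infernal_version+. It prints
nothing if no such header line is found.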
33
34\subsection{How do I report a bug?}
35
36Email us, at \url{eric.nawrocki@nih.gov}.
37
38Before we can see what needs fixing, we almost always need to
39reproduce a bug on one of our machines. This means we want to have a
40small, reproducible test case that shows us the failure you're seeing.
41So if you're reporting a bug, please send us:
42
43\begin{itemize}
44 \item A brief description of what went wrong.
45 \item The command line(s) that reproduce the problem.
46 \item Copies of any files we need to run those command lines.
47 \item Information about what kind of hardware you're on, what
48   operating system, and (if you compiled the software yourself rather
49   than running precompiled binaries), what compiler and version you
50   used, with what configuration arguments.
51\end{itemize}
52
53Depending on how glaring the bug is, we may not need all this
54information, but any work you can put into giving us a clean
55reproducible test case doesn't hurt and often helps.
56
57The information about hardware, operating system, and compiler is
58important. Bugs are frequently specific to particular configurations
59of hardware/OS/compiler.  We have a wide variety of systems available
60for trying to reproduce bugs, and we'll try to match your system as
61closely as we can.
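
One way to collect these details is a short shell function like the
sketch below. It is only an illustration: the name
\ccode{bug\_report\_info} is made up, it assumes your compiler is
reachable as \prog{cc} and accepts \ccode{--version}, and it assumes
\prog{cmsearch} is in your \ccode{PATH}. Adjust it to match how you
actually built and installed Infernal.

```shell
# Hypothetical helper (not part of Infernal): print details worth pasting
# into a bug report. Always prints exactly three labeled lines.
bug_report_info() {
  # Kernel, OS release, and hardware architecture.
  echo "System:   $(uname -a)"
  # The compiler only matters if you built Infernal yourself.
  # Assumption: the compiler is installed as "cc" and accepts --version.
  if command -v cc >/dev/null 2>&1; then
    echo "Compiler: $(cc --version 2>&1 | head -n 1)"
  else
    echo "Compiler: (no cc found on PATH)"
  fi
  # Infernal programs print their version in the help header.
  if command -v cmsearch >/dev/null 2>&1; then
    echo "Infernal: $(cmsearch -h | grep '^# INFERNAL')"
  else
    echo "Infernal: (cmsearch not found on PATH)"
  fi
}
```

Running \verb+bug_report_info+ and pasting its output into your email,
along with the configuration arguments you used, covers most of the
last item in the list above.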
62
If you first see a problem on some huge compute (like running a
zillion query sequences over a huge profile database), it will really,
65really help us if you spend a bit of time yourself trying to isolate
66whether the problem really only manifests itself on that huge compute,
67or if you can isolate a smaller test case for us. The ideal bug report
68(for us) gives us everything we need to reproduce your problem in one
69email with at most some small attachments.
70
71Remember, we're not a company with dedicated support staff -- we're a
72small lab of busy researchers like you. Somebody here is going to drop
73what they're doing to try to help you out. Try to save us some time,
74and we're more likely to stay in our usual good mood.
75
76If we're in our usual good mood, we'll reply quickly.  We'll probably
77tell you we fixed the bug in our development code, and that the fix
78will appear in the next Infernal release. This of course doesn't help you
79much, since nobody knows when the next Infernal release is going to be.
80So if possible, we'll usually try to describe a workaround for the
81bug.
82
83If the code fix is small, we might also tell you how to patch and
84recompile the code yourself. You may or may not want to do this.
85
There are currently not enough open bugs to justify having a formal
on-line bug tracking system. We have a bug tracking system, but it's
internal.
89
90\subsection{Input files}
91
92\subsubsection{Reading from a stdin pipe using - (dash) as a filename argument}
93
94Generally, Infernal programs read their sequence and/or profile input
95from files. Unix power users often find it convenient to string an
96incantation of commands together with pipes (indeed, such wizardly
97incantations are a point of pride). For example, you might extract a
98subset of query sequences from a larger file using a one-liner
99combination of scripting commands (perl, awk, whatever). To facilitate
100the use of Infernal programs in such incantations, you can almost
101always use an argument of '-' (dash) in place of a filename, and the
102program will take its input from a standard input pipe instead of
103opening a file.\footnote{An important exception is the use of '-' in
104place of the target sequence file in \prog{cmsearch}. This is not
105allowed because \prog{cmsearch} first quickly reads the target
106sequence file to determine its size (it needs to know this to know how
107to set filter thresholds), then rewinds it and starts to process
it. There are a couple of additional cases where stdin piping won't
work; they are described later in this section.}
110
111For example, the following three commands are entirely equivalent, and
112give essentially identical output:
113
114\user{cmalign tRNA5.cm mrum-tRNAs10.fa}
115
116\user{cat tRNA5.cm | ../src/cmalign - mrum-tRNAs10.fa}
117
118\user{cat mrum-tRNAs10.fa | ../src/cmalign tRNA5.cm -}
119
Most Easel ``miniapp'' programs can read from pipes in the same way.
121
122Because the programs for CM fetching (\prog{cmfetch}) and
123sequence fetching (\prog{esl-sfetch}) can fetch any number of profiles
124or (sub)sequences by names/accessions given in a list, \emph{and} these
125programs can also read these lists from a stdin pipe, you can craft
126incantations that generate subsets of queries or targets on the
127fly. For example, you can extract and align all hits found by
128\prog{cmsearch} with an E-value below the inclusion threshold as
follows (using the \textbackslash{} character twice below to split up the final
command onto three lines):
131
132%note: can't use \user{} here because too many special characters
133%(believe me I tried). Only difference between \user{} and the way
134%I've done it below is that we're not bold, oh well.
135\indent\indent\small\verb+> cmsearch --tblout tRNA5.mrum-genome.tbl tRNA5.cm mrum-genome.fa+ \\
136\indent\indent\small\verb+> esl-sfetch --index mrum-genome.fa+ \\
137\indent\indent\small\verb+> cat tRNA5.mrum-genome.tbl | grep -v ^# | grep ! \ + \\
\indent\indent\small\verb+> | awk '{ printf("%s/%s-%s %s %s %s\n", $1, $8, $9, $8, $9, $1); }' \ + \\
139\indent\indent\small\verb+> | esl-sfetch -Cf mrum-genome.fa - | ../src/cmalign tRNA5.cm - + \\
140
141The first command performed the search using the CM file
\ccode{tRNA5.cm} and sequence file \ccode{mrum-genome.fa} (these
143were used in the tutorial), and saved tabular output to
144\prog{tRNA5.mrum-genome.tbl}.  The second command indexed the genome
145file to prepare it for fast (sub)sequence retrieval. In the third
146command we've extracted only those lines from
147\prog{tRNA5.mrum-genome.tbl} that do not begin with a \prog{\#} (these
148are comment lines) and also include a \prog{!} (these are hits that
149have E-values below the inclusion threshold) using the first two
150\prog{grep} commands. This output was then sent through \prog{awk} to
151reformat the tabular output to the ``GDF'' format that
152\prog{esl-sfetch} expects: \otext{<newname> <from> <to> <source
153seqname>}.  These lines are then piped into \prog{esl-sfetch} (using
154the '-' argument) to retrieve each hit (only the subsequence that
155comprises each hit -- not the full target sequence). \prog{esl-sfetch}
156will output a FASTA file, which is finally being piped into
157\prog{cmalign}, again using the '-' argument. The output that is
158actually printed to the screen will be a multiple alignment of all the
159included tRNA hits.
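
If you use this filtering and reformatting step often, you can wrap it
in a shell function. The sketch below simply packages the same two
\prog{grep} commands and the same \prog{awk} command described above;
the function name \ccode{tbl\_to\_gdf} is made up for illustration.
Like the one-liner, it assumes that fields 1, 8, and 9 of each hit
line are the target name and the hit's start and end coordinates, and
that \prog{!} appears only on lines for included hits.

```shell
# Hypothetical helper (not part of Infernal): read cmsearch --tblout output
# on stdin and write one GDF line per included hit:
#   <newname> <from> <to> <source seqname>
tbl_to_gdf() {
  grep -v '^#' |   # drop comment lines
    grep '!' |     # keep only hits marked as included
    awk '{ printf("%s/%s-%s %s %s %s\n", $1, $8, $9, $8, $9, $1); }'
}
```

With this defined, the final command above could be written as
\verb+tbl_to_gdf < tRNA5.mrum-genome.tbl | esl-sfetch -Cf mrum-genome.fa - | ../src/cmalign tRNA5.cm -+.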
160
You can use similar commands to pipe subsets of CMs. Suppose you have a copy of Rfam in \prog{Rfam.cm}:
162
163\user{cmfetch --index Rfam.cm} \\
164\user{cat myqueries.list | cmfetch -f Rfam.cm - | cmsearch - mrum-genome.fa}
165
166This takes a list of query CM names/accessions in
\prog{myqueries.list}, fetches them one by one from Rfam, and runs
\prog{cmsearch} with each of them against the sequence file
169\prog{mrum-genome.fa}. As above, the \prog{cat myqueries.list} part
170can be replaced by any suitable incantation that generates a list of
171profile names/accessions.
172
173There are three kinds of cases where using '-' is restricted or
174doesn't work. A fairly obvious restriction is that you can only use
175one '-' per command; you can't do a \prog{cmalign - -} that tries to
176read both a CM and sequences through the same stdin
pipe. A second case is when an input file must be obligately
178associated with additional, separately generated auxiliary files, so
179reading data from a single stream using '-' doesn't work because the
180auxiliary files aren't present (in this case, using '-' will be
181prohibited by the program). An example is \prog{cmscan}, which needs
182its \prog{<cmfile>} argument to be associated with four auxiliary
183files named \prog{<cmfile>.i1\{mifp\}} that \prog{cmpress} creates,
184so \prog{cmscan} does not permit a '-' for its \prog{<cmfile>}
185argument. Finally, when a command would require multiple passes over
186an input file the command will generally abort after the first pass
187if you are trying to read that file through a standard input pipe
188(pipes are nonrewindable in general; a few Easel programs
189will buffer input streams to make multiple passes possible, but this
190is not usually the case). An important example is trying to search a
191database that is streamed into \prog{cmsearch}. This is not allowed
192because \prog{cmsearch} first reads the entire sequence file to
193determine its size (which dictates the filter thresholds that will be
194used for the search), then needs to rewind the file before beginning
195the search.
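
When a program cannot read a particular input from a pipe, one
workaround is to spool the stream to a temporary file first and hand
the program that file instead. The following is a minimal sketch of
that idea, not something Infernal provides: the function name
\ccode{with\_tmpfile} is made up, it assumes the \prog{mktemp} utility
is available, and it assumes the file argument comes last on the
command line.

```shell
# Hypothetical helper (not part of Infernal): save all of stdin to a
# temporary file, run the given command with that file as its last
# argument, then remove the file. Returns the command's exit status.
with_tmpfile() {
  wt_tmp=$(mktemp) || return 1
  cat > "$wt_tmp"      # spool the whole stream to disk so it can be rewound
  "$@" "$wt_tmp"       # run the command on the now-rewindable file
  wt_status=$?
  rm -f "$wt_tmp"
  return "$wt_status"
}
```

For example, \verb+cat mrum-genome.fa | with_tmpfile cmsearch tRNA5.cm+
would run \prog{cmsearch} on a temporary copy of the streamed
sequences, where \prog{cat} stands in for any incantation that
generates FASTA output. The cost is the disk space and time needed to
write that copy.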
196
In general, Infernal, HMMER and Easel programs document in their man pages
198whether (and which) command line arguments can be replaced by '-'.
199You can always check by trial and error, too. The worst that can
200happen is a ``Failed to open file -'' error message, if the program
201can't read from pipes.
202
203
204
205