1\chapter[\xpe] {\xpe: \\A Powerful Editor and Simulation Center}
2\label{chap:xpertedit}
3\index{\xpe|(}
4
After completing this chapter, you will be able to perform
sophisticated polymer chemistry simulations on polymer sequences
that can be edited in place, with automatic recalculation of the
masses.
9
10\renewcommand{\sectitle}{\xpe\ Invocation}
11\section*{\textcolor{sectioningcolor}{\sectitle}}
12\addcontentsline{toc}{section}{\numberline{}\sectitle}
13\index{\xpe!module~invocation}
14
15The \xpe\ module is easily called by pulling down the \guimenu{\xpe}
16menu item from the \mXp\ program's menu. The user may start the \xpe\
17module by:
18\smallskip
19
20\begin{itemize}
21
22\item Opening a sample polymer sequence;
23
24\item Creating a new polymer sequence;
25
26\item Loading a polymer sequence from disk.
27
28\end{itemize}
29
30
31\renewcommand{\sectitle}{\xpe\ Operation: \textit{In Medias Res}}
32\section*{\textcolor{sectioningcolor}{\sectitle}}
33\addcontentsline{toc}{section}{\numberline{}\sectitle}
34\index{\xpe!open~sequence}
35
The first way to start an \xpe\ session is by opening a sample
sequence from the list of sequences that were shipped along with
\mXp. The \guimenu{\xpe}\guimenuitem{Open Sample Sequence} menu item
opens the dialog box shown in
Figure~\vref{fig:xpertedit-select-sample-sequence}. The drop-down
widget in this dialog window lists all the polymer sequence files that
were shipped along with \mXp. Simply select one item and click
\guilabel{OK}. To select a polymer sequence file that is not in the
list, click \guilabel{Cancel}, which opens the system's file selection
dialog so that you can browse to the location where the polymer
sequence file is stored. The process is then identical to the normal
polymer sequence file opening (see below).
48
49\begin{figure}
50  \begin{center}
51    \includegraphics[width=0.75\textwidth]
52    {figures/xpertedit-select-sample-sequence.png}
53  \end{center}
54  \caption[Selection of a sample polymer sequence]{\textbf{Selection
55      of a sample polymer sequence.}  \mXp\ ships with a number of
56    sample polymer sequences which are designed to allow easy
57    demonstration of the \xpe\ features. This selection dialog lists
58    all the polymer sequence files that were shipped along with \mXp.}
59  \label{fig:xpertedit-select-sample-sequence}
60\end{figure}
61
62
The second way to start an \xpe\ session is by creating a new polymer
sequence\index{\xpe!create~sequence} (\guimenu{\xpe}\guimenuitem{New
  Sequence} menu). The program immediately asks the user to select a
polymer chemistry definition, as shown in
Figure~\ref{fig:xpertedit-choose-pol-chem-def}. The drop-down widget
lists all the polymer chemistry definitions currently registered on
the system. If the polymer chemistry definition is not listed,
clicking \guilabel{Cancel} will let the user browse the disk in
search of a polymer chemistry definition file.\footnote{Note that
  once the sequence is saved, the polymer chemistry definition file
  \emph{must} be registered or the sequence file will not be
  loadable. This is described in a later chapter.} Once the polymer
chemistry definition has been selected and successfully parsed by the
program, the user is presented with an empty sequence editor.
77
78The third way to start an \xpe\ session is by opening an existing
79polymer sequence file. Once the sequence file has been opened, the user is
80presented with a sequence editor as represented in
81Figure~\ref{fig:xpertedit-protein-main-view}. At this point, when the
82user starts editing a sequence, the characters entered at the
83keyboard, or pasted from the clipboard, will be interpreted using the
84polymer chemistry definition that was selected in the initialization
85window described above.
86
87
88\begin{figure}
89  \begin{center}
90    \includegraphics[width=0.75\textwidth]
91    {figures/xpertedit-choose-pol-chem-def.png}
92  \end{center}
  \caption[Selection of the polymer chemistry
  definition]{\textbf{Selection of the polymer chemistry definition.}
    When creating a new polymer sequence, it is necessary to first
    indicate which polymer chemistry definition the polymer sequence
    will use. This window lists all the polymer chemistry definitions
    currently available on the system.}
99  \label{fig:xpertedit-choose-pol-chem-def}
100\end{figure}
101
102
103\begin{figure}
104  \begin{center}
105    \includegraphics[width=0.8\textwidth]
106    {figures/xpertedit-protein-main-view.png}
107  \end{center}
108  \caption[The \xpe\ module]{\textbf{The \xpe\ module.} This figure shows
109    a polymer sequence displayed in an {\xpe}or window.}
110  \label{fig:xpertedit-protein-main-view}
111\end{figure}
112
113
114
Now, of course, editing a polymer sequence is not enough for a mass
spec\-trome\-tric-ori\-ented software suite; what we want is to
\emph{compute masses!}\index{\xpe!mass~calculation} The mass
calculation process is immediately visible on the right-hand side of
the sequence editor shown in
Figure~\ref{fig:xpertedit-protein-main-view}. The \guilabel{Masses}
frame~box widget contains two items: \smallskip
122
123\begin{itemize}
124
125\item \guilabel{Whole
126    Sequence}\index{\xpe!mass~calculation!whole~sequence} A frame~box
127  widget displaying the \guilabel{Mono} and \guilabel{Avg} masses of
128  the whole polymer sequence, irrespective of the current selection;
129
130\item \guilabel{Selected Sequence}\index{\xpe!mass~calculation!selected~region}
131  A frame~box widget displaying the \guilabel{Mono} and \guilabel{Avg}
132  masses of the currently selected region of the polymer sequence.
133
134\end{itemize}
135%
136The user may change the mass calculation engine configuration at any
137point in time using the widgets in the \guilabel{Calculation
138  Engine}\index{\xpe!mass~calculation~engine} tool~box that
139contains the following configurable parameters: \smallskip
140
141\begin{itemize}
142
143\item \guilabel{Polymer}
144
145  \begin{itemize}
146
147  \item \guilabel{Left
148      Cap}\index{\xpe!mass~calculation~engine!left~cap} If checked,
149    the left cap of the polymer sequence will be taken into account;
150
  \item \guilabel{Right
      Cap}\index{\xpe!mass~calculation~engine!right~cap} If checked,
    the right cap of the polymer sequence will be taken into account;

  \item \guilabel{Left
      Modif}\index{\xpe!mass~calculation~engine!left~modif} If
    checked, the modification of the polymer sequence's left end will
    be taken into account. Note that if \guilabel{Force} is also
    checked, then the modification is taken into account even when
    selecting a region of the sequence that does not encompass the
    left end monomer;
165
166  \item \guilabel{Right
167      Modif}\index{\xpe!mass~calculation~engine!right~modif} Same as
168    above, but for the right end modification;
169
170  \end{itemize}
171
172\item \guilabel{Selections and regions}
173
174  \begin{itemize}
175
  \item
    \guilabel{Multi-region}\index{\xpe!mass~calculation~engine!multi-region}
    If checked, the sequence editor allows more than one region to be
    selected at any given time (with no limitation on the number of
    selected regions);

  \item
    \guilabel{Multi-selection}\index{\xpe!mass~calculation~engine!multi-selection}
    If checked, the sequence editor allows not only the selection of
    multiple regions at any given time, but also the selection of
    totally or partially overlapping regions;
187
188  \item
189    \guilabel{Oligomers}\index{\xpe!mass~calculation~engine!oligomers}
190    When multiple regions are selected, each selected region behaves
191    like an oligomer, that is, it gets its left and right end caps
192    added (if the corresponding calculation engine configuration item
193    is activated);
194
  \item \guilabel{Residual
      chains}\index{\xpe!mass~calculation~engine!residual~chains}
    When multiple regions are selected, the different regions behave
    like residual chains: the left and right end caps are added only
    once (if the corresponding calculation engine configuration item
    is activated).
201
202  \end{itemize}
203
204\item \guilabel{Monomers}
205
206  \begin{itemize}
207
208  \item
209    \guilabel{Modifications}\index{\xpe!mass~calculation~engine!modifications}
210    If checked, the monomer modifications will be taken into account;
211
  \item
    \guilabel{Cross-links}\index{\xpe!mass~calculation~engine!cross-links}
    If checked, the cross-links in the polymer sequence will be taken
    into account. Note that \emph{only cross-links fully encompassed
      by the selected sequence region(s)} will be taken into account
    for the \guilabel{Selected sequence} mass calculations. If one or
    more cross-links are not fully encompassed by the currently
    selected sequence region, then their number is displayed next to
    the following label in the \guilabel{Selected sequence}
    group box: \guilabel{Incomplete cross-links:}.
222
223  \end{itemize}
224
225\item \guilabel{Ionization}\index{\xpe!mass~calculation~engine!ionization}
226
227  \begin{itemize}
228
229  \item \guivalue{+H} This formula represents the ionization agent
230    formula (that is, a protonation);
231
232  \item \guilabel{Unitary charge} \guivalue{1} Charge brought by the
233    ionization agent. In the example, a protonation brings a positive
234    charge;
235
236  \item \guilabel{Ionization level} \guivalue{1} Level of the
237    ionization requested. In the example, a single ionization is
238    requested, that is a monoprotonation.
239
240  \end{itemize}
241
242\end{itemize}
243%
When any parameter listed above is changed, the masses of both the
\guilabel{Whole sequence} and the \guilabel{Selected sequence} are
recalculated and the new values are displayed in their respective
line~edit widgets, described earlier. Because the user can specify
ionization rules, the values that are displayed are actually \mz ratios
(as long as at least one ionization is requested).
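
As a worked illustration of the remark above, the displayed value can
be related to the mass $M$ of the non-ionized polymer (or selected
region). This is a sketch under stated assumptions, not a description
of the program's internals: $m_a$ (the mass of the ionization agent
formula), $z$ (the unitary charge) and $n$ (the ionization level) are
notations introduced here for the three \guilabel{Ionization}
parameters.

\[
  \frac{m}{z} = \frac{M + n \, m_a}{n \, z} \qquad (n \geq 1)
\]

For instance, assuming a monoisotopic mass $M = 1000.0000$ and taking
$m_a \approx 1.00783$ for \guivalue{+H} (the monoisotopic mass of
hydrogen), a unitary charge of \guivalue{1} and an ionization level of
\guivalue{2} would give $(1000.0000 + 2 \times 1.00783) / 2 \approx
501.00783$.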
251
252
253\renewcommand{\sectitle}{The Editor Window Menu}
254\section*{\textcolor{sectioningcolor}{\sectitle}}
255\addcontentsline{toc}{section}{\numberline{}\sectitle}
256\index{\xpe!editor~window}
257
258The menu bar in the polymer sequence editor displays a number of menu
259items, reviewed below: \smallskip
260
261\begin{itemize}
262
263  %%%%%%% FILE
264\item \guimenu{File} (Figure~\ref{fig:xpertedit-file-menu})
265
266  \begin{itemize}
267
268  \item \guimenu{File}\guimenuitem{Close} Closes the sequence;
269
270  \item \guimenu{File}\guimenuitem{Save} Saves the sequence. If the
271    sequence has no filename yet, the user is invited to select a
272    filename;
273
  \item \guimenu{File}\guimenuitem{Save As} Saves the sequence in a new
    file;
276
277  \item \guimenu{File}\guimenuitem{Import
278      Raw}\index{\xpe!sequence-editor!sequence~import} Opens a text file
279    and tries to import the sequence. If invalid monomer code
280    characters are found, the user is given a chance to revise the
281    imported sequence;
282
283  \item \guimenu{File}\guimenuitem{Export to
284      Clipboard}\index{\xpe!sequence~editor!sequence~export} Copies the
285    sequence and all the data (masses and calculation options) to the
286    clipboard, in the form of simple text;
287
  \item \guimenu{File}\guimenuitem{Export to File} Writes the
    sequence and all the data (masses and calculation options) to a
    file, in the form of simple text (if a filename was already
    selected; otherwise the user is invited to select a file into
    which the data are to be written);

  \item \guimenu{File}\guimenuitem{Select export file} Invites the
    user to select a file into which the data are to be written.
296
297  \end{itemize}
298
299  %%%%%%% EDIT
300\item \guimenu{Edit}
301
302  \begin{itemize}
303
  \item \guimenu{Edit}\guimenuitem{Copy} Copies the currently selected
    region(s) (if any) to the clipboard. If more than one region is
    currently selected, the user is informed that the copied sequence
    will correspond to the sequences of these regions joined
    together. \emph{Be aware that the order in which the region
      sequences are joined is the order in which the regions were
      selected, and not the order in which the sequences appear in
      the whole polymer sequence};

  \item \guimenu{Edit}\guimenuitem{Cut} Copies the current selection
    (if any) to the clipboard and removes it from the sequence. Note
    that it is not yet possible to cut more than one selected region
    in one single operation;
317
318  \item \guimenu{Edit}\guimenuitem{Paste} Pastes the sequence from the
319    clipboard into the sequence at point (that is the current cursor
320    location). If the pasted sequence is found to contain characters
321    not valid for the current polymer chemistry definition, the user
322    is given a chance to revise the pasted sequence. If one sequence
323    region was selected, it is replaced with the pasted sequence. If
324    more than one sequence region was selected, the operation cannot
325    be performed and the user is informed;
326
327  \item \guimenu{Edit}\guimenuitem{Find
328      Sequence}\index{\xpe!sequence~editor!find~sequence~motif} Finds
329    a sequence motif in the polymer sequence.
330
331  \end{itemize}
332
333  %%%%%%% CHEMISTRY
334\item
335  \guimenu{Chemistry}\index{\xpe!sequence~editor!chemical~simulations}
336  (Figure~\ref{fig:xpertedit-chemistry-menu})
337
338  \begin{itemize}
339
340  \item \guimenu{Chemistry}\guimenuitem{Modify Monomer(s)} Modify (or
341    unmodify) one or more monomers in the polymer sequence;
342
343  \item \guimenu{Chemistry}\guimenuitem{Modify Polymer} Set (or unset)
344    the left (or right, or both) modification of the polymer sequence;
345
346  \item \guimenu{Chemistry}\guimenuitem{Cross-link Monomers} Set
347    cross-links to monomers of the polymer sequence;
348
  \item \guimenu{Chemistry}\guimenuitem{Cleave} Perform a
    chemical/enzymatic cleavage of the polymer sequence;
351
352  \item \guimenu{Chemistry}\guimenuitem{Fragment} Perform the gas
353    phase fragmentation of the currently selected oligomer;
354
  \item \guimenu{Chemistry}\guimenuitem{Mass Search} Search for any
    sequence having a mass matching the searched mass;
357
358  \item \guimenu{Chemistry}\guimenuitem{Compute m/z Ratios} Starting
359    from a given \mz ratio and a given ionization status, calculate a
360    range of \mz ratios with a given ionization agent;
361
362  \item \guimenu{Chemistry}\guimenuitem{Determine Compositions}
363    Calculate the monomeric/element composition of the whole polymer
364    sequence or of the current selection;
365
366  \item \guimenu{Chemistry}\guimenuitem{pKa pH pI} Perform acidity, pH
367    and isoelectric point calculations on the whole sequence or on the
368    current selection.
369
370  \end{itemize}
371
372  %%%%%%% OPTIONS
373\item
374  \guimenu{Options}\index{\xpe!sequence~editor!number~display}
375
376  \begin{itemize}
377
378  \item \guimenu{Options}\guimenuitem{Decimal places} Set the number
379    of decimal places to be used to display the numerical values.
380
381  \end{itemize}
382
383
384\end{itemize}
385
386
387\begin{figure}
388  \begin{center}
389    \includegraphics[width=0.33\textwidth]
390    {figures/xpertedit-file-menu.png}
391  \end{center}
  \caption[The \xpe\ window File menu]{\textbf{The \xpe\ window File
      menu.} This figure shows the File menu as a drop-down menu
    in the polymer sequence window.}
395  \label{fig:xpertedit-file-menu}
396\end{figure}
397
398
399\begin{figure}
400  \begin{center}
401    \includegraphics[width=0.33\textwidth]
402    {figures/xpertedit-chemistry-menu.png}
403  \end{center}
  \caption[The \xpe\ window Chemistry menu]{\textbf{The \xpe\ window
      Chemistry menu.} This figure shows the Chemistry menu as a
    drop-down menu in the polymer sequence window.}
407  \label{fig:xpertedit-chemistry-menu}
408\end{figure}
409
410
411
412
413\renewcommand{\sectitle}{Editing Polymer
414  Sequences}\index{\xpe!sequence~editor!sequence~editing}
415\section*{\textcolor{sectioningcolor}{\sectitle}}
416\addcontentsline{toc}{section}{\numberline{}\sectitle}
417
As described earlier, in the chapter about the \xpd\ module, a polymer
chemistry definition may allow more than one character to qualify the
codes of the monomers (see chapter~\ref{chap:xpertdef},
section~\vref{sect:monomers}). It was also noted that setting the
number of allowed characters to \cfgval{3}, for example, does not mean
that every monomer code of the polymer chemistry definition must be
three characters long: \cfgval{3} is the \emph{maximum}
number of characters that may be used.
426
427\renewcommand{\sectitle}{Multi-Character Monomer Codes}
428\subsection*{\textcolor{sectioningcolor}{\sectitle}}
429\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
430\index{\xpe!multi-character~monomer~code}
431
432\begin{figure}
433  \begin{center}
434    \includegraphics[scale=0.75]
435    {figures/xpertedit-3-letter-code-whole-process.png}
436  \end{center}
437  \caption[Multi-character code sequence editing in
438  \xpe]{\textbf{Multi-character code sequence editing in \xpe.} This
439    figure shows the process by which it is made possible to edit
440    polymer sequences with a monomer code set that allows more than
441    one character per code.}
442  \label{fig:xpertedit-3-letter-code-whole-process}
443\end{figure}
444
445This section deals with the editing of a polymer sequence for which
446monomer codes can be made of more than one character.
447Figure~\vref{fig:xpertedit-3-letter-code-whole-process} shows the case
448of a polymer sequence for which the polymer chemistry definition
449allows three characters to define monomer codes. The example is based
450on the following real-world situation: the user wants to edit the
451sequence by insertion---at the cursor point---of a new ``Aspartate''
452monomer, of which the user knows only that its code starts with an
453`A'. The cursor is located after the first ``Ala'' monomer at position
4541 (panel~1st).
455
After keying-in \kbdKey{A} (panel~1st), no sequence modification is
visible in the sequence editor. Instead, an `A' character is now
displayed in the left line~edit widget under the sequence.  The reason
for this apparently odd behaviour is that the polymer chemistry
definition allows up to 3 characters to describe a monomer code. If no
monomer vignette is displayed in the polymer sequence, that means that
more than one monomer code starts with an `A' character: \xpe\ cannot
figure out which monomer code was actually meant by the user when
keying-in \kbdKey{A}.
465
466There is a way, called \emph{code
467  completion}\index{\xpe!code~completion}, to know which monomer
468code(s)---in the current polymer chemistry definition---do start with
469the keyed-in character(s) (currently, `A'). The user can always enter
470the \emph{code completion mode} by hitting the \kbdKey{ENTER}~key.
471This is what is shown in the panel~1st, right hand side
472\guilabel{Monomer List} listview widget (click on that
473\guilabel{Monomer List} label to show that list if it is not already
474visible). We see that, in the current polymer chemistry definition,
475four monomer codes start with an `A' character, and these are ``Ala'',
476``Arg'', ``Asp'' and ``Asn'' (as highlighted in the code completion
477monomer list).
478
Because we now know that the code we are to key-in is ``Asp'', we
key-in an \kbdKey{s}. The result is shown in panel~2nd. What we see
here is that, this time also, nothing changed in the polymer sequence.
What changed is that the character string in the left line~edit widget
below the sequence is now ``As''.  Let's press the
\kbdKey{ENTER}~key once more.  This time, only two items are
highlighted in the code completion monomer list: ``Asp'' and ``Asn''
(panel~2nd). This is easy to understand: there are only two monomer
codes that start with the two letters `A' and `s' (``As'') that we
have keyed-in so far.  We now key-in a last character: \kbdKey{p}. At
this point, the monomer is effectively inserted in the polymer
sequence, as the ``Asp'' monomer left of the cursor, as shown in
panel~3rd.
491
492\renewcommand{\sectitle}{Unambiguous Single-/Multi-Character Monomer Codes}
493\subsection*{\textcolor{sectioningcolor}{\sectitle}}
494\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
495
496Let's imagine that we have a polymer chemistry definition that allows
497up to 3 characters for the definition of monomer codes, but that we
498have one of these monomer codes (let's say the one for the
499``Glutamate'' monomer) that is one-letter-long: `E'. This monomer code
500`E' is the only one in the polymer chemistry definition to start with
501an `E' character. In this case, when we key-in \kbdKey{E}, we'll
502observe that the monomer code is immediately validated and that its
503corresponding monomer vignette is also immediately inserted in the
504polymer sequence.  This is because, \emph{if there is no ambiguity,
505  \xpe\ will immediately validate the code being edited}.
506
The mechanism described above means that the user is absolutely free
to define \emph{only single-character monomer codes} in a polymer
chemistry definition; the program then behaves exactly as if the
multi-character code feature did not exist: each time a new uppercase
letter is keyed-in, it is automatically validated and the
corresponding monomer is created in the sequence.
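
The code completion and validation behaviour described in this section
can be summarized with a little notation. This is only a sketch
consistent with the examples given in this chapter, not a
specification of the program: $\mathcal{C}$ (the set of monomer codes
of the polymer chemistry definition), $b$ (the characters currently
held in the left line~edit widget) and $P(b)$ are names introduced
here for illustration.

\[
  P(b) = \{\, c \in \mathcal{C} \mid b \mbox{ is a prefix of } c \,\}
\]

Code completion lists the codes in $P(b)$. The examples suggest that
the code is validated, and the monomer inserted, when $P(b) = \{b\}$,
that is, when $b$ is itself a code and no other code starts with $b$:

\begin{center}
  \begin{tabular}{lll}
    Keyed-in so far ($b$) & $P(b)$ & Result \\
    \hline
    A & Ala, Arg, Asp, Asn & nothing inserted \\
    As & Asp, Asn & nothing inserted \\
    Asp & Asp & ``Asp'' inserted \\
    E & E & ``E'' inserted \\
  \end{tabular}
\end{center}

How the program behaves when one code is a strict prefix of another
(for example `A' and ``Ala'') is not covered by these examples.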
514
515
516\renewcommand{\sectitle}{Erroneous Monomer Codes}
517\subsection*{\textcolor{sectioningcolor}{\sectitle}}
518\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
519\index{\xpe!monomer~code~errors}
520
The typing error detection system triggers immediate alerts whenever
the code being keyed-in is incorrect. This is illustrated in
Figure~\vref{fig:xpertedit-sequence-editor-bad-char}. If the user
enters an uppercase character not matching any monomer code currently
defined in the polymer chemistry definition, or a lowercase character
as the first character of a monomer code, the program immediately
complains in the right line~edit widget below the sequence. In this
case, the character is not put into the left line~edit widget, which
means it is simply ignored.
530
531\begin{figure}
532  \begin{center}
533    \includegraphics[width=0.8\textwidth]
534    {figures/xpertedit-sequence-editor-bad-char.png}
535  \end{center}
536  \caption[Bad code character in \xpe\ sequence editor]{\textbf{Bad
537      code character in \xpe\ sequence editor.} This figure shows the
538    feedback that the user is provided by the code editing engine,
539    when a bad character code is keyed-in.}
540  \label{fig:xpertedit-sequence-editor-bad-char}
541\end{figure}
542
If the user has started keying-in valid monomer code characters, as
we did earlier with ``As'', and wants to erase them after a change of
mind, she \emph{must not} use the \kbdKey{BACKSPACE} key, because this
key will erase the monomer left of the cursor point in the polymer
sequence! To remove the characters currently displayed in the left
line~edit widget below the sequence, the user has to press the
\kbdKey{Esc} key once for each character. For example, let's say she
has already keyed-in \kbdKey{A} and \kbdKey{s}. In this case the left
line~edit widget displays these two characters: ``As''. Now, if she
changes her mind and wants to enter the ``Gly'' monomer code instead
of ``Asp'', all she has to do is press the \kbdKey{Esc} key once to
remove the `s' character and once more to remove the remaining `A'
character.  At this point it is possible to start afresh with the
``Gly'' monomer code by keying-in sequentially \kbdKey{G},
\kbdKey{l} and finally \kbdKey{y}.
559
560
561\renewcommand{\sectitle}{Simplified Editing}
562\subsection*{\textcolor{sectioningcolor}{\sectitle}}
563\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
564
When the monomer codes of a given polymer chemistry definition are too
numerous or too long to remember, one simplified editing strategy is
to use the list of available monomers located on the right side of
the sequence editor (widget labelled \guilabel{Monomer list}). The
items in the list are active: double-clicking an item inserts its
corresponding monomer code in the sequence at the current
cursor location. This list thus makes it easy to ``visually'' edit the
polymer sequence without having to remember all the codes in the
polymer chemistry definition.
574
575
576\renewcommand{\sectitle}{Finding sequence motifs}
577\section*{\textcolor{sectioningcolor}{\sectitle}}
578\addcontentsline{toc}{section}{\numberline{}\sectitle}
579\index{\xpe!sequence~editor!find~sequence~motif}
580
581Finding sequence motifs in the polymer sequence is performed by
582selecting the \guimenu{Edit}\guimenuitem{Find Sequence} menu item. The
583dialog window is shown in
584Figure~\vref{fig:xpertedit-find-sequence-dlg}. When performing the
585first search in a polymer sequence, the \guilabel{Find} button should
586be used. This will trigger a search starting at the beginning of the
587polymer sequence. For each successive search, the \guilabel{Next}
588button should be used.
589
590Each searched sequence motif will be stored in a history list that is
591made available by dropping down the combo box widget where the
592sequence motif is entered. The \guilabel{Clear history} button will
593erase all the searched sequence motifs from the history, thus
594resetting it.
595
596\begin{figure}
597  \begin{center}
598    \includegraphics[width=0.5\textwidth]
599    {figures/xpertedit-find-sequence-dlg.png}
600  \end{center}
  \caption[Finding a sequence motif in the polymer
  sequence]{\textbf{Finding a sequence motif in the polymer sequence.}
    The first iteration should be performed by clicking the
    \guilabel{Find} button, and each following iteration should be
    performed using the \guilabel{Next} button.}
606  \label{fig:xpertedit-find-sequence-dlg}
607\end{figure}
608
609
610\renewcommand{\sectitle}{Importing Sequences}
611\section*{\textcolor{sectioningcolor}{\sectitle}}
612\addcontentsline{toc}{section}{\numberline{}\sectitle}
613\index{\xpe!sequence~editor!sequence~import}
614
Very often, the user will make a sequence search on the web and be
provided with a polymer sequence that is cluttered with non-code
characters. That web output might either be saved in a text file for
future reference or copied to the clipboard for immediate use in \mXp.
The two cases are reviewed below.
620
621
622\renewcommand{\sectitle}{Importing From The Clipboard}
623\subsection*{\textcolor{sectioningcolor}{\sectitle}}
624\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
625
\xpe\ provides a convenient way to spot invalid characters in a text
and to let the user ``purify'' the imported sequence.  A
clipboard-imported sequence is systematically parsed. When invalid
characters are found, the window depicted in
Figure~\vref{fig:xpertedit-sequence-editor-sequence-import-errors-first}
is presented to the user for her to make appropriate adjustments (in
this example we tried to paste from the clipboard the following
sequence: ``\texttt{!100 ATGCATGC ATGCATGC ATGCATGC ATGCAUGC
  anotherSilly-Text;}'').
635
636\begin{figure}
637  \begin{center}
638    \includegraphics[width=0.75\textwidth]
639    {figures/xpertedit-sequence-editor-sequence-import-errors-first.png}
640  \end{center}
641  \caption[Clipboard-imported sequence
642  error-checking]{\textbf{Clipboard-imported sequence error-checking.}
643    If a sequence that is imported through the clipboard to the \xpe\
644    sequence editor contains invalid characters, the user is provided
645    with a facility to ``purify'' the sequence. This facility is
646    provided to the user through the window depicted in this figure.}
647  \label{fig:xpertedit-sequence-editor-sequence-import-errors-first}
648\end{figure}
649
As soon as a character does not correspond to any valid monomer code,
it is tagged, and the sequence is presented to the user in a text~edit
widget (\guilabel{Initial Sequence}) with all the improper
characters tagged by underlining. At that point, if the user clicks
the \guilabel{Remove Tagged From Initial} button, all the tagged
characters will be automatically removed and the purified sequence
will show up in the \guilabel{Purified Sequence} text~edit widget.
657
658Also, the user is provided with automatic ``purification'' procedures
659whereby it is possible to remove one or more classes of characters
660from the imported sequence (\guilabel{Purification Options} frame
661widget). Checking one or more of the \guilabel{Numerals} or
662\guilabel{Spaces} or \guilabel{Punctuation} or \guilabel{LowerCase} or
663\guilabel{Uppercase} checkbuttons, or even entering other
664user-specified regular expressions in the \guilabel{Other (RegExp)}
665line~edit widget, will elicit their removal from the imported sequence
666after the user clicks the \guilabel{Purify Initial (Options)} button.
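
To make the effect of these options concrete, here is a hedged worked
example on the sample text used above. The character classes are
written as regular expressions purely for illustration; the exact
patterns used internally by \xpe\ and the regular expression syntax
accepted by the \guilabel{Other (RegExp)} line~edit widget are
assumptions, not taken from the program.

\begin{verbatim}
Initial:     !100 ATGCATGC ATGCATGC ATGCATGC ATGCAUGC anotherSilly-Text;

Numerals     [0-9]    removes  100
Spaces       \s       removes  the blanks between the groups
Punctuation  [!;-]    removes  ! - ;  (the ones present here)
LowerCase    [a-z]    removes  another illy ext

Result:      ATGCATGCATGCATGCATGCATGCATGCAUGCST
\end{verbatim}

The uppercase letters `S' and `T' of the trailing text survive these
four options. Any remaining character that is not a valid monomer code
in the current polymer chemistry definition still has to be dealt
with separately.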
667
668\begin{figure}
669  \begin{center}
670    \includegraphics[width=0.75\textwidth]
671    {figures/xpertedit-sequence-editor-sequence-import-errors-second.png}
672  \end{center}
673  \caption[Clipboard-imported sequence
674  purification]{\textbf{Clipboard-imported sequence purification.}
675    There are a number of ways to purify a sequence. Here the
676    \guilabel{Remove Tagged From Initial} button was clicked. The
677    purified sequence shows up in the \guilabel{Purified Sequence}
678    text~edit widget.}
679  \label{fig:xpertedit-sequence-editor-sequence-import-errors-second}
680\end{figure}
681
682
683When the user is confident that almost all the erroneous characters
684have been removed
685(Figure~\vref{fig:xpertedit-sequence-editor-sequence-import-errors-second}),
686she can click the \guilabel{Test Purified} button, which will trigger
687a ``re-reading'' of the sequence in the \guilabel{Purified Sequence}
688text~edit widget. If erroneous characters are still found, they are
689tagged.
690
691Note that, for maximum flexibility, the user is allowed an immediate
692and direct editing of the purified sequence in the \guilabel{Purified
693  Sequence} text~edit widget (that is, that text~edit widget is
694\emph{not} read-only).
695
Once the sequence is finally purged of all the invalid characters,
the user can select it in the text~edit widget and paste it in the
\xpe\ sequence editor. This time, the paste operation will be
error-free.  Note that if any sequence portion is currently selected,
it will be replaced by the one that is being pasted into the editor.
701
702
703\renewcommand{\sectitle}{Importing From Raw Text Files}
704\subsection*{\textcolor{sectioningcolor}{\sectitle}}
705\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
706
707
It might be of interest to be able to import a sequence from a raw
text file. To this end, the user is provided with the menu
\guimenu{File}\guimenuitem{Import Raw} that opens up a file selection
window from which to choose the file to import. The program then
iterates over the lines of that file and checks their contents for
validity. If errors are found, then the same process as described
earlier for clipboard-imported sequences is started. The user can then
purify the sequence imported from the file and finally integrate that
sequence in the polymer sequence currently edited. Note that if any
sequence portion is currently selected, it will be replaced by the one
that is being imported.
719
720
721\renewcommand{\sectitle}{Multi-region
722  Selections}
723\section*{\textcolor{sectioningcolor}{\sectitle}}
724\addcontentsline{toc}{section}{\numberline{}\sectitle}
725\index{\xpe!multi-region~selections}
726\index{\xpe!sequence~editor!multi-region~selection}
727
\mXp\ implements a sophisticated multi-region selection model. Two
selection modes are available:
730
731\begin{itemize}
732
\item \emph{Multi-region selection mode:}\/ In this mode, it is
  possible to select more than one region in the polymer sequence. In
  all cases below, make sure that the \guilabel{Multi-region}
  checkbutton is checked in the \guilabel{Selections and regions} group
  box. This is how these selections are performed:

  \begin{itemize}

  \item \textsl{With the
      mouse:}\index{\xpe!sequence~editor!mouse~selections}
    Left-click and drag to make the first selection. Move the mouse
    cursor to the beginning of the new selection, then hold the
    \kbdKey{Ctrl} key down while left-clicking and dragging to perform
    the second region selection. Continue as many times as necessary;
747
748  \item \textsl{With the
749      keyboard:}\index{\xpe!sequence~editor!keyboard~selections}
750    Position the cursor at the beginning of the first region to be
751    selected, hold the \kbdKey{Ctrl}+\kbdKey{Shift} keys down while
752    moving the cursor with the direction keys (\kbdKey{$\leftarrow$},
753    \kbdKey{$\rightarrow$}, \kbdKey{$\uparrow$},
754    \kbdKey{$\downarrow$}). Hold the \kbdKey{Ctrl} key down and use
755    the direction keys to go to the beginning of the new region
756    selection, press the \kbdKey{Shift} key and hold it down while
757    moving the cursor with the direction keys to actually perform the
758    region selection.
759
760  \end{itemize}
761
762\item \emph{Multi-selection region mode:}\/ In this mode (which
763  requires the multi-region selection mode to be enabled), it is
764  possible to perform selections that overlap. For example, one could
765  select the sequence ``MAMISGM'' and then select the sequence
766  ``SGMSGRKAS''. The overlapping sequence is thus ``SGM''.
767
768\end{itemize}
769
\noindent Being able to select multiple regions and/or to select
the same region multiple times requires some configuration as far as
the calculation of the relevant masses is concerned. Indeed, whatever
the selection mode that is enabled, each time one selection (overlapping
with another or not) is added or removed, masses are recalculated for
the current selection.\footnote{``Selection'', here, is thus used to
  collectively represent all multi-region selections and
  multi-selection regions at any given time in the polymer sequence
  editor.} The way the multi-region selections and the multi-selection
regions are handled, from the mass calculation standpoint, is
configured as follows:
781
782\begin{itemize}
783
784\item \emph{Regions are oligomers:} In this configuration, each
785  selection behaves as an oligomer, and thus should normally be capped
786  on both its left and right ends. This is typically the situation
787  when the user wants to simulate the formation of a cross-linked
788  species arising from the cross-linking of two oligomers: each
789  oligomer is capped on both its ends;
790
791\item \emph{Regions are residual chains:} In this configuration, each
792  selection behaves as a residual chain, and thus the oligomer
793  resulting from the multi-region selections is capped on its left and
794  right ends only once. This situation is typically encountered when
795  simulating partial cleavages by first selecting an oligomer,
796  checking its mass and then continuing selection to simulate a longer
797  oligomer resulting from a partial cleavage. Also, the situation
798  might be encountered when there are multiple repeated sequence
799  motifs in a polymer sequence and mass data are difficult to analyze.
800
801\end{itemize}
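
\noindent As a minimal sketch of the difference between these two
configurations (the symbols below are illustrative notation, not names
used by the program), let $k$ regions be selected, let $R_i$ be the
summed mass of the residues of region~$i$, and let $C_L$ and $C_R$ be
the masses of the left and right caps. Leaving aside modifications,
cross-links and ionization, the \emph{regions are oligomers}
configuration would amount to
\[
  M_{\mathrm{olig}} = \sum_{i=1}^{k} \left( R_i + C_L + C_R \right)
  = \sum_{i=1}^{k} R_i + k\,(C_L + C_R),
\]
whereas the \emph{regions are residual chains} configuration would
amount to
\[
  M_{\mathrm{chain}} = \sum_{i=1}^{k} R_i + C_L + C_R .
\]
For a single selected region ($k=1$) both expressions coincide; for
$k$ regions they differ by $(k-1)\,(C_L + C_R)$, that is, by one pair
of caps for each additional region.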
802
803
804\renewcommand{\sectitle}{Polymer Sequence Modification}
805\section*{\textcolor{sectioningcolor}{\sectitle}}
806\addcontentsline{toc}{section}{\numberline{}\sectitle}
807
It very often happens that the (bio)chemist uses chemical
reactions to modify the polymer sequence she is working on. Mass
spectrometry is then often used to check whether the reaction proceeded
properly. Further, in nature, chemical modifications of
biopolymer sequences are very often encountered. For example, protein
sequences often get modified as a means to regulate their function
(phosphorylations, for example, or acetylations, methylations\dots).
Nucleic acid sequences are very often and extensively modified, for
example by methylation.
817
818It is thus crucial that \mXp\ be able to model with high precision and
819flexibility the various chemical reactions that can be either made in
820the chemistry lab or found in nature. The \mXp\ program provides two
821different chemical modification processes:
822
823\begin{itemize}
824\item A process by which monomers belonging to the polymer sequence
825  can be individually modified;
826\item A process by which the whole polymer sequence can be modified,
827  either on its left end or on its right end or even on both ends.
828\end{itemize}
829
830\renewcommand{\sectitle}{Selected Monomer(s) Modification}
831\subsection*{\textcolor{sectioningcolor}{\sectitle}}
832\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
833\label{subsect:chemical-modification-monomers}
834\index{\xpe!simulations!monomer~modification}
835
There are a number of ways in which monomers can be modified in a
polymer sequence. Figure~\vref{fig:xpertedit-modify-monomer} shows the
simplest one: the user first selects the monomer vignette to be
modified and calls the \guimenu{Chemistry}\guimenuitem{Modify
  Monomer(s)} menu.  A window shows up where all the modifications
currently available in the polymer chemistry definition are listed.
Because a monomer vignette was initially selected in the editor
window, the \guilabel{Selected Monomer} target radiobutton is on by
default.\footnote{Note that if a sequence was selected when the
  monomer modification task was started, then selecting
  \guilabel{Current selection} would be required to modify all the
  monomers in the selection. Alternatively, if this is not what is
  required, re-selecting the right monomer in the sequence and
  selecting \guilabel{Current selection} will ensure the modification
  applies only to the currently selected monomer.}  It is then simply
a matter of choosing the right modification from the
\guilabel{Available modifications} list and clicking the
\guilabel{Modify} button. The target(s) of a given modification (as
selected in the \guilabel{Target} frame widget) can be identified
according to: \smallskip
856
857\begin{figure}
858  \begin{center}
859    \includegraphics[width=1\textwidth]
860    {figures/xpertedit-modify-monomer.png}
861  \end{center}
862  \caption[Modification of a monomer in a polymer
863  sequence]{\textbf{Modification of a monomer in a polymer sequence.}
864    This figure shows how the chemical modification of monomer(s) can
865    be performed.}
866  \label{fig:xpertedit-modify-monomer}
867\end{figure}
868
869\begin{itemize}
870
\item The \guilabel{Selected Monomer} frame will display data in its
  two line~edit widgets if a single monomer vignette was selected at
  the time the monomer modification action was invoked (exactly as in
  Figure~\vref{fig:xpertedit-modify-monomer}). Only the monomer whose
  code and position are displayed will be modified (even
  if it is no longer selected or if the sequence has changed and the
  monomer at the displayed position is not the same anymore);
878
879\item The \guilabel{Current Selection} radiobutton widget indicates
880  that the modification should be performed on all the monomers that
881  are \textit{currently} selected, that is, if the selection changed
882  after the modification window was displayed, the new selection is
883  modified, not the old one;
884
\item \guilabel{Monomers Of Same Code}: if a monomer code is
  displayed in the \guilabel{Selected Monomer} frame, all the monomers
  in the sequence that have that code are modified;

\item \guilabel{Monomers From The List}: all the monomers in the
  polymer sequence having a code corresponding to any code selected in
  the \guilabel{Available Monomers} list are modified;

\item \guilabel{All Monomers}: all the monomers of the polymer sequence
  are modified.
895
896\end{itemize}
897%
Note that there is one checkbox widget (\guilabel{Override target
  limitations}) that requires explanation. In the chapter about the
definition of polymer chemistries (chapter~\vref{chap:xpertdef}) the
definition of modifications was detailed, and the notion of target was
explained. If, during a monomer modification, \mXp\ detects that the
user is trying to modify a monomer that is not a target of the
modification at hand, it will complain, as shown in the
\guilabel{Messages} text~edit widget of
Figure~\vref{fig:xpertedit-modify-monomer}. In this example, indeed,
the user tried to modify monomer \emph{Isoleucine} with
\emph{Phosphorylation}, which is not possible because modification
\emph{Phosphorylation} has been defined as not having monomer
\emph{Isoleucine} as any of its targets. Another situation where
target limitations might show up is when trying to modify a monomer
more times than authorized by the \guilabel{Max. count} value, that
is, the maximum number of times that monomer might be modified with
that modification. For example,
when working on the methylation of proteins, it might happen that lysyl
residues get methylated more than once (tri-methylation
occurs often in histones). If the chemical modification was defined in
\xpd\ with a max count of 2 and a third chemical modification is asked
on a given target monomer, then the program refuses to perform the
modification. To override this limitation, check the
\guilabel{Override target limitations} checkbox widget.
921
922
The general idea is this: the \guilabel{Override target
  limitations} checkbox widget is unchecked by default so that the
user does not make mistakes unknowingly. However, flexibility is
desirable, and the \guilabel{Override target limitations} checkbox
widget can be checked if required.
928
929As a result of the monomer modification, the monomer vignette gets
930modified. Figure~\vref{fig:xpertedit-modify-monomer} shows one
931phosphorylated Seryl residue at position 8: a transparent graphics
932object (a red `P') was overlaid onto the corresponding seryl monomer
933vignette. If the user modifies a monomer with a modification that has
934no corresponding \fileformat{svg} file defined for its graphical
935rendering in file \filename{modification\_dictionary}, then a default
936modification rendering is used.
937
938The user is responsible for correctly reading the messages that might
939be published in the \guilabel{Messages} text~edit widget. It is
940important to understand that, when a monomer is modified, its previous
941modification (if any) is overwritten with the new one. The user is
942invited to experiment a bit with the monomer modification process, so
943as to be confident of the results that she is going to obtain when
944real polymer chemistry work is to be modelled in \mXp.
945
If the modification to be applied is not readily available in the list
of modifications defined in the polymer chemistry definition, then it
is possible to define a modification manually by checking the
\guilabel{Define modification} check button widget. This procedure leads
to the modification of the target monomer(s) exactly as if the
modification had been selected from the list of available
modifications. But, because the modification has a name not known to
the polymer chemistry definition, the editor cannot modify the monomer
vignette with a predefined transparent raster image. Thus, as seen in
Figure~\ref{fig:xpertedit-modify-monomer-manually-defined-modif}, the
modified residue is rendered using the default transparent
raster image (four question marks, one at each corner of the monomer
vignette square).
959
960\begin{figure}
961  \begin{center}
962    \includegraphics[width=0.66\textwidth]
963    {figures/xpertedit-modify-monomer-manually-defined-modif.png}
964  \end{center}
965  \caption[Rendering of a monomer modification in a polymer
966  sequence]{\textbf{Rendering of a monomer modification in a polymer
967      sequence.}  This figure shows how the chemical modification of
968    monomer(s) is graphically rendered. The `K' residue is modified
969    using an ``Acetylation'' modification. The `S' residue is modified
970    with a modification that has no associated graphical vignette. The
971    default vignette is thus used.}
972  \label{fig:xpertedit-modify-monomer-manually-defined-modif}
973\end{figure}
974
It is perfectly feasible to modify a single monomer more than once
(with the same modification or not; for example, a tri-methylation
with a methylation modification). This is why, when the window depicted
in Figure~\ref{fig:xpertedit-modify-monomer} shows up, the two lists
on the right-hand side show the monomers currently modified and the
modification(s) that are currently set to these modified
monomers. Selecting one item from the \guilabel{Modified monomers}
list will show only the modifications set to that monomer in the
\guilabel{Modifications} list. If all the modifications in the polymer
sequence are to be displayed, then checking the \guilabel{All
  modifications} check box widget will trigger the display of all the
modifications set to any monomer in the whole polymer sequence.
987
988Unmodification of monomers is easily performed by selecting any number
989of items from the \guilabel{Modifications} list and clicking the
990\guilabel{Unmodify} button.
991
\fbox{\parbox{0.9\textwidth}{\textsl{It should be noted that once a
      monomer modification dialog window has been opened, the polymer
      sequence should not be edited. This is because the
      modification/unmodification process takes for granted that the
      polymer sequence still is identical to what it was when the
      monomer modification dialog was opened. Mechanisms are there to
      ensure that the irreparable does not happen, but this warning is
      in order.}}}
1000
1001
1002\renewcommand{\sectitle}{Whole Sequence Modification}
1003\subsection*{\textcolor{sectioningcolor}{\sectitle}}
1004\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
1005\index{\xpe!simulations!polymer~modification}
1006
As described above, it is possible to modify any monomer in the
polymer sequence; when any modified monomer is removed, the
modification associated with it also disappears. The modifications that
1010we describe here are not of this kind. They can be applied to either
1011the left end of the polymer sequence or its right end (or both ends at
1012any given time).  But these modifications do belong to the polymer
1013sequence \textit{per se} and are not removed from it---even if the
1014polymer sequence is edited by removing the left end monomer or the
1015right end monomer.  This is why these modifications are \emph{polymer
1016  modifications} and not monomer modifications.
1017
1018\begin{figure}
1019  \begin{center}
1020    \includegraphics[width=0.66\textwidth]
1021    {figures/xpertedit-modify-polymer.png}
1022  \end{center}
1023  \caption[Modification of the left end of a polymer
1024  sequence]{\textbf{Modification of the left end of a polymer
1025      sequence.} This figure shows how simple it is to permanently
1026    modify a polymer sequence on either or both its left/right ends.}
1027  \label{fig:xpertedit-modify-polymer}
1028\end{figure}
1029
The way in which a polymer sequence is modified using \emph{polymer
  modifications} is much simpler than in the previous \emph{monomer
  modifications} case. The modification window is opened by choosing
the \guimenu{Chemistry}\guimenuitem{Modify Polymer} menu.
Figure~\vref{fig:xpertedit-modify-polymer} shows that window. The
modification is easy to perform, with clear feedback
provided to the user (the permanent modifications are listed in two
line~edit widgets located in front of the \guilabel{Target}
checkbuttons \guilabel{Left End} and \guilabel{Right End}).
1039
1040Note that, as a convenience for the user, it is possible to modify the
1041polymer sequence using an arbitrary modification in the form of a
1042combination of a name and a formula (check the \guilabel{Define
1043  modification} checkbox, to that effect). The modification object
1044used is created on-the-fly by the program and gets saved in the file
1045as if the user had selected a modification out of the list of
1046available modifications. In the example
1047(Figure~\vref{fig:xpertedit-modify-polymer}), the polymer sequence was
1048modified on its left end using the ``Acetylation'' modification
1049available in the polymer chemistry definition and was amidated
1050(formula \guivalue{-OH+NH2}) with a manually-defined modification
1051called \guivalue{MyModif}. The polymer sequence editor window displays
1052the left end and right end modifications as labels of buttons located
1053in the \guilabel{Polymer modifications} groupbox.
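
\noindent As a worked example of what such a formula means in terms
of mass, the amidation formula \guivalue{-OH+NH2} can be evaluated by
hand. Assuming standard monoisotopic atomic masses (H~1.007825,
N~14.003074, O~15.994915; these values are not taken from the polymer
chemistry definition, which might use slightly different figures), the
net mass change is
\[
  \Delta m = -(15.994915 + 1.007825) + (14.003074 + 2 \times 1.007825)
  = -17.002740 + 16.018724 = -0.984016,
\]
that is, the amidated polymer is lighter than the unmodified one by
about 0.984 mass units.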
1054
1055
1056\renewcommand{\sectitle}{Monomer Cross-linking}
1057\section*{\textcolor{sectioningcolor}{\sectitle}}
1058\addcontentsline{toc}{section}{\numberline{}\sectitle}
1059\label{subsect:monomer-cross-link}
1060\index{\xpe!monomer~cross-linking}
1061
A cross-link is a covalent bond that links a monomer with one
or more other monomers. A monomer might be cross-linked more than once.
1064The dialog window in which the user might define cross-links is shown
1065in Figure~\ref{fig:xpertedit-cross-link-monomers}.
1066
\begin{figure}
  \begin{center}
    \includegraphics[width=1\textwidth]
    {figures/xpertedit-cross-link-monomers.png}
  \end{center}
  \caption[Cross-linking of monomers]{\textbf{Cross-linking of
      monomers.}  This figure shows the window in which monomers can
    be cross-linked together. A cross-link (as defined in the current
    polymer chemistry definition) is selected and the targets are
    specified in the \guilabel{Targets' positions} text line edit
    widget in the form of monomer positions separated by semicolons
    (`;').}
  \label{fig:xpertedit-cross-link-monomers}
\end{figure}
1081
1082Cross-linkers were defined in the section about \xpd\ (see
1083page~\pageref{sect:cross-linkers}). A cross-linker might either define
1084no modification to be applied to the cross-linked monomers or the same
1085number of modifications as there are monomers cross-linked. For
1086example, fluorescent proteins have a chromophore that is made by
1087reaction of three residues (Threonyl [or Seryl]--Tryptophanyl [or
1088Tyrosinyl or Phenylalanyl]--Glycyl), as shown in
Figure~\ref{fig:xpertedit-cross-linked-monomers}. When cross-linking
with the fluorescent protein cross-linker, there must be three
monomers involved, as there are three modifications defined in the
cross-linker.
1093
1094\begin{figure}
1095  \begin{center}
1096    \includegraphics[width=0.4\textwidth]
1097    {figures/xpertedit-cross-linked-monomers.png}
1098  \end{center}
1099  \caption[Graphical rendering of cross-linked
1100  monomers]{\textbf{Graphical rendering of cross-linked monomers.}
1101    This figure shows the three monomers (TWG) from cyan fluorescent
1102    protein cross-linked together.}
1103  \label{fig:xpertedit-cross-linked-monomers}
1104\end{figure}
1105
When any monomer involved in a cross-link is removed from a polymer
sequence, the cross-link(s) it was involved in are automatically
dissolved and destroyed. A cross-link can also be destroyed explicitly
by selecting it in the \guilabel{Cross-links}
list widget on the right-hand side of the dialog window depicted in
Figure~\ref{fig:xpertedit-cross-link-monomers} and clicking the
\guilabel{Uncross-link} button.
1113
1114
1115
1116\renewcommand{\sectitle}{Sequence Cleavage}
1117\section*{\textcolor{sectioningcolor}{\sectitle}}
1118\addcontentsline{toc}{section}{\numberline{}\sectitle}
1119\label{sect:cleave-polymer-sequences}
1120\index{\xpe!simulations!sequence~cleavage}
1121
1122It happens very often that polymer sequences get cleaved in a
1123sequence-specific manner. These specific cleavages do occur very often
1124in nature, and are made by enzymes that do cleave biopolymer
1125sequences, like the glycosidases (cleaving saccharides), the proteases
1126(cleaving proteins) or the nucleases (cleaving nucleic acids). But the
1127scientist also uses purified enzymes or chemicals to perform such
1128cleavages in the test tube.  \mXp\ must be able to perform those
1129cleavages \textit{in silico}.
1130
1131\begin{figure}
1132  \begin{center}
1133    \includegraphics[width=0.9\textwidth]
1134    {figures/xpertedit-cleavages.png}
1135  \end{center}
1136  \caption[Polymer sequence cleavage window]{\textbf{Polymer sequence
1137      cleavage window.}  This figure shows the window in which polymer
1138    sequence cleavage is performed.  One cleavage specification is
1139    selected and the number of allowed partial cleavages is set. The
1140    results are displayed in the same window. The cleavage might be
1141    performed on the currently selected polymer sequence region or the
1142    whole sequence. It is possible to stack oligomers from different
1143    cleavage simulation in the same window.}
1144  \label{fig:xpertedit-cleavages}
1145\end{figure}
1146
1147It is a matter of having a polymer sequence opened in an editor window
1148and selecting the \guimenu{Chemistry}\guimenuitem{Cleave} menu. The
1149user is provided with a window where a number of cleavage
1150specifications are listed (Figure~\ref{fig:xpertedit-cleavages},
1151page~\pageref{fig:xpertedit-cleavages}) along with options that allow
1152customizing the production of oligomers.  The cleavage specifications
1153are listed in the \guilabel{Available cleavage agents} list widget by
1154looking into the polymer chemistry definition corresponding to the
1155polymer sequence to be cleaved. The program knows, for example, that
1156the polymer sequence to be cleaved is of the ``protein-1-letter''
1157chemistry type, and thus will list all the cleavage specifications
1158that were defined in that polymer chemistry definition.
1159
The user selects the cleavage specification of interest and sets other
useful parameters, such as the number of partial cleavages that the
cleaving agent may yield. Entering \guivalue{0} means
that the cleavage reaction will yield the set of oligomers
corresponding to a total cleavage of the polymer sequence (no missed
cleavages, that is, zero partial cleavages). Also, the user might
indicate that the
oligomers computed during the cleavage should be ionized according to
the current ionization rule (displayed in the main window) and in the
specified range. Finally, when the window is opened, the
1169\guilabel{Oligomer coordinates} group box widget lists the coordinates
1170of the currently selected region of the polymer sequence. Either leave
1171the values as they are shown or check the \guilabel{Whole sequence}
1172check box widget. In the first case, the cleavage will occur only
1173inside the selected region of the polymer sequence (that is, taking
1174that region to be the actual polymer sequence of interest); in the
1175second case, the cleavage will take place in the whole polymer
1176sequence whatever the currently selected polymer sequence region.
1177This feature, which was introduced in version 2.3.0, is useful so as
1178to simulate a first cleavage of a polymer sequence and then a second
1179cleavage of a selected oligomer using a different cleavage agent. In
1180protein chemistry, that would be useful to explore possibilities of
1181double sequential cleavages of a protein, first with EndoAspN, for
1182example, and then with Trypsin.
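
\noindent The effect of the partial cleavages setting on the number of
oligomers can be estimated with a simple count. This is only a sketch
under the usual missed-cleavage model, in which an oligomer with $k$
partial cleavages spans $k+1$ consecutive total-cleavage oligomers;
the symbols $n$, $p$ and $N$ are illustrative, and the exact
enumeration performed by the program is not detailed here. If the
cleavage agent finds $n$ cleavage sites in the sequence and up to
$p \le n$ partial cleavages are allowed, the number of distinct
oligomers before ionization is
\[
  N(p) = \sum_{k=0}^{p} (n + 1 - k) = (p+1)(n+1) - \frac{p(p+1)}{2} .
\]
For instance, with $n=3$ sites, a total cleavage ($p=0$) yields
$4$~oligomers, while allowing one partial cleavage ($p=1$) yields
$4 + 3 = 7$. If an ionization range is specified, each of these would
presumably be listed once per charge level.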
1183
1184The user might want to generate oligomers for different kinds of
1185cleavages. For example, it might be interesting to have in the same
1186tree view widget the oligomers generated using first trypsin and then
cyanogen bromide. In order to add new oligomers to pre-existing ones,
it is simply required to check the \guilabel{Stack oligomers} check
button widget prior to clicking the \guilabel{Cleave} button again
with the new cleavage settings.
1191
1192The \guilabel{Details} frame widget at the bottom of the window
1193displays a number of informative data. In particular, the
1194\guilabel{Sequence} tab widget displays the sequence of the oligomer
1195currently selected in the \guilabel{Oligomers} table view along with the
1196name of the cleavage agent which it arose from. The \guilabel{Cleavage
1197  Details} tab widget displays the mass calculation engine
configuration at the time the \emph{last} cleavage was performed (a
red LED means that the related feature was off; conversely, a green LED
means that the feature was on). In our example, the mass calculation
1201for the oligomers did not account for the monomer modifications nor
1202for the left/right ends of the polymer, nor for the cross-links.
1203
1204When the user triggers a cleavage, the mass calculation engine
1205configuration currently set in the sequence editor is used for the
1206calculation of the mass of the oligomers obtained \textit{per} the
1207cleavage.  This process allows an easy change in the mass calculation
1208engine configuration between one cleavage and another so as to allow
1209comparison of masses obtained for the same cleavage but with different
1210mass calculation engine configurations.
1211
Finally, one last note: if the list of monoisotopic or average masses
is desired in the form of a text list, right-clicking onto the table
view widget will allow copying to the clipboard either the monoisotopic
or the average masses. Also, it is possible to either export the data
to the clipboard or to a file or even to drag the displayed oligomer
items into a text editor. Only the selected items in the tree view
widget will be exported.
1219
1220For oligomer data filtering, please refer to
1221section~\ref{sect:oligomer-data-filtering}, page
1222\pageref{sect:oligomer-data-filtering}.
1223
1224\renewcommand{\sectitle}{Spectrum calculation}
1225\subsection*{\textcolor{sectioningcolor}{\sectitle}}
1226\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
1227\index{\xpe!simulations!spectrum-calculation}
1228
It is possible to create a full spectrum simulation based on the
oligomers presented in the \guilabel{Oligomers} table widget. For
that, click the \guilabel{Create spectrum} item in the drop-down
menu. Clicking that item will open the window shown
in Figure~\ref{fig:xpertedit-spectrum-creation-from-cleavages}.
1235
1236
1237\begin{figure}
1238  \begin{center}
1239    \includegraphics[scale=1]
1240    {figures/xpertedit-spectrum-creation-from-cleavages.png}
1241  \end{center}
1242  \caption[Spectrum simulation for cleavage-obtained
1243  oligomers]{\textbf{Spectrum simulation for cleavage-obtained
1244      oligomers.}  This figure shows how to configure the calculation
1245    of a spectrum for a set of oligomers obtained after the cleavage
1246    of a polymer sequence.}
1247  \label{fig:xpertedit-spectrum-creation-from-cleavages}
1248\end{figure}
1249
1250If the \guilabel{Isotopic cluster} check box is not checked, then the
1251spectrum will not contain the isotopic cluster for each
1252oligomer. Instead, a single peak will be calculated, based either on
1253the monoisotopic or on the average mass of the oligomer that is used
as the peak centroid. When the \guilabel{Isotopic cluster} check box
is checked, the starting mass is necessarily monoisotopic, as the
isotopic cluster is calculated starting from that mass. Note that the
1257other parameters have been explained earlier
1258(see section~\ref{sect:xpertcalc-isotopic-pattern-calculator},
1259page~\pageref{sect:xpertcalc-isotopic-pattern-calculator}).
1260
1261Selecting a file to write the results (that is the (x y) pairs making
1262the spectrum) is recommended. Otherwise, when the calculation is
1263finished, refer to the \guilabel{Results} tab page widget for the same
1264spectrum (x y) pairs.
1265
1266During the calculation, the \guilabel{Log} tab page widget shows the
1267details of the running calculation. For example, the following is the
1268log for the first two oligomers of a set of 123:
1269
1270{\small
1271\begin{verbatim}
1272
1273Simulating a spectrum with calculation of
1274an isotopic cluster for each oligomer.
1275
1276There are 123 oligomers. Calculating sub-spectrum for each
1277
1278Computing isotopic cluster for oligomer 1
1279	formula: C82H123N22O25.
1280 Validating formula... Success.
1281	mono m/z: 1815.9
1282	charge: 1
1283	fwhm: 0.18159
1284	increment: 0.024212
1285
1286		Done computing the cluster
1287
1288Computing isotopic cluster for oligomer 2
1289	formula: C82H124N22O25.
1290 Validating formula... Success.
1291	mono m/z: 908.455
1292	charge: 2
1293	fwhm: 0.0908455
1294	increment: 0.00605637
1295
1296		Done computing the cluster
1297\end{verbatim}
1298}
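
\noindent The figures in this log can be cross-checked by hand. The
two formulas differ by exactly one hydrogen, which suggests (this is
an inference from the log, not a documented statement) that both
entries describe the same oligomer at charges 1 and 2, each charge
adding one hydrogen of about 1.008 mass units. Under that assumption,
\[
  (m/z)_{2} = \frac{(m/z)_{1} + m_{\mathrm{H}}}{2}
  \approx \frac{1815.9 + 1.008}{2} \approx 908.45,
\]
which agrees with the logged value of 908.455 within the rounding of
the first entry. In both entries the ratio of the \texttt{fwhm} value
to the mono m/z value is $10^{-4}$ ($0.18159/1815.9$ and
$0.0908455/908.455$), which would correspond to a resolving power of
$10\,000$; that setting is not printed in the log and is deduced here
from the numbers only.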
1299
1300The previous example dealt with the horse apomyoglobin that was
1301cleaved with trypsin, with 1 partial cleavage and charge levels from 1
1302to 3. That cleavage simulation yielded 123 oligomers, for which a
1303spectrum was calculated which spans the [49.7--3418] m/z
1304range. Figure~\ref{fig:xpertedit-spectrum-simulation-cleavage-oligomers}
1305shows that spectrum, zoomed in the region [744--759]. Four distinct
1306isotopic clusters are visible:
1307
1308\begin{figure}
1309  \begin{center}
1310    \includegraphics[width=\textwidth]
1311    {figures/xpertedit-spectrum-simulation-cleavage-oligomers.png}
1312  \end{center}
1313  \caption[Simulated spectrum for cleavage-obtained
1314  oligomers]{\textbf{Simulated spectrum for cleavage-obtained
1315      oligomers.}  This spectrum (zoomed portion viewed in
1316    \progname{mMass}) has been simulated starting from a list of
1317    oligomers obtained by cleaving the horse apomyoglobin protein with
1318    trypsin.}
1319  \label{fig:xpertedit-spectrum-simulation-cleavage-oligomers}
1320\end{figure}
1321
1322\begin{tabbing}
1323mono m/z \phantom{room} \= Peptide sequence\phantom{still some roooom here} \= charge\\[2mm]
1324
1325744.70 \> HPGDFGADAQGAMTKALELFR \> 3+\\[2mm]
1326748.44 \> ALELFR \> 1+\\[2mm]
1327751.84 \> HPGDFGADAQGAMTK \> 2+\\[2mm]
1328753.98 \> KHGTVVLTALGGILK \> 2+\\[2mm]
1329\> HGTVVLTALGGILKK \> 2+\\[2mm]
1330\end{tabbing}
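
\noindent The charge states listed above can be cross-checked by
hand. The following sketch assumes that each charge is brought by one
added hydrogen of mass $m_{\mathrm{H}} \approx 1.0078$ and that
joining two tryptic peptides removes one water molecule
($\approx 18.011$); both values are assumptions made for this
illustration, not values read from the program. For an ion of charge
$z$, the mass of the neutral species is
\[
  M = z \times \left[ (m/z) - m_{\mathrm{H}} \right],
\]
so that the two shorter peptides have
\[
  M_{\mathrm{HPGDFGADAQGAMTK}} = 2 \times (751.84 - 1.0078) \approx 1501.664,
  \qquad
  M_{\mathrm{ALELFR}} = 1 \times (748.44 - 1.0078) \approx 747.432 .
\]
The partial cleavage product that contains both of them is then
expected at
\[
  M \approx 1501.664 + 747.432 - 18.011 = 2231.085,
  \qquad
  (m/z)_{3+} \approx \frac{2231.085 + 3 \times 1.0078}{3} \approx 744.70,
\]
which matches the first line of the list.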
1331
1332
Computing a full spectrum starting from oligomers that might have
large masses ($>$\,6000) requires a large amount of CPU time. The
above apomyoglobin example could be handled in $\approx$\,20~s on a
rather powerful laptop (albeit with a single processor used throughout
the task).
1338
1339
1340\renewcommand{\sectitle}{Oligomer Fragmentation}
1341\section*{\textcolor{sectioningcolor}{\sectitle}}
1342\addcontentsline{toc}{section}{\numberline{}\sectitle}
1343\label{sect:fragment-polymer-sequence}
1344\index{\xpe!simulations!oligomer~fragmentation}
1345
Polymer sequences very often need to be fragmented in the gas phase
(in the mass spectrometer) so that structure characterizations may be
performed. In protein chemistry, this is typically done to get
sequence information for a given peptide ion selected in the gas
phase. \mXp\ must thus be able to perform those fragmentations
\textit{in silico}.  Let's see how an oligomer can be fragmented using
\mXp.
1353
1354\begin{figure}
1355  \begin{center}
1356    \includegraphics[scale=1]
1357    {figures/xpertedit-fragmentation.png}
1358  \end{center}
1359  \caption[Oligomer fragmentation window]{\textbf{Oligomer
1360      fragmentation window.}  This figure shows the window in which
1361    oligomer fragmentation is performed.  One or more fragmentation
1362    patterns might be selected in one fragmentation step.}
1363  \label{fig:xpertedit-fragmentation}
1364\end{figure}
1365
1366It is a matter of having a polymer sequence opened in an editor window
1367and selecting the sequence region to be fragmented. Once this is done,
1368the user selects the \guimenu{Chemistry}\guimenuitem{Fragment} menu.
1369The user is provided with a window where a number of fragmentation
1370specifications are listed (Figure~\vref{fig:xpertedit-fragmentation}).
1371As detailed for the cleavage of polymers, these fragmentation
1372specifications are listed by looking into the polymer chemistry
1373definition corresponding to the polymer sequence of which an oligomer
1374is to be fragmented.
1375
The user selects the fragmentation specification(s) of interest, sets
the ionization range required for the generated fragment oligomers
(the same as for polymer cleavage) and clicks the \guilabel{Fragment}
button.  Upon successful termination of the fragmentation reaction,
the generated fragments are displayed in the \guilabel{Oligomers}
table view widget.
1382
1383As detailed for the cleavage of polymer sequences, the
1384\guilabel{Details} frame widget displays data about the fragments
1385generated and the way masses were calculated for them.
1386
It is possible to take into account cross-links that are borne by
monomers contained in the oligomer. Only cross-links that are fully
contained in the oligomer are taken into account. Partial cross-links,
that is, cross-links that have at least one involved monomer outside
of the oligomer, are ignored.
1392
Figure~\ref{fig:xpertedit-cfp-chromophore-disulfide-bond-no-account-cross-links}
shows the \xpe\ module with the cyan fluorescent protein. The
chromophore is shown as an internal cross-link between residues T, W
and G (net mass change: $-20$~Da). There is also a disulfide bond
involving two cysteine residues (net mass change: $-2$~Da). In this
example, the mass calculation engine did not take the cross-links into
account (see the unchecked \guilabel{Cross-links} check box). When
that check box is checked, the mass calculation engine yields mass
data with a differential of $-22$~Da ($-20$~Da and $-2$~Da): both
cross-links have now been taken into account
(Figure~\ref{fig:xpertedit-cfp-chromophore-disulfide-bond-account-cross-links}).
1403
1404\begin{figure}
1405  \begin{center}
1406    \includegraphics[width=0.75\textwidth]
1407    {figures/xpertedit-cfp-chromophore-disulfide-bond-no-account-cross-links.png}
1408  \end{center}
1409  \caption[Two cross-links in the cyan fluorescent protein
1410  sequence]{\textbf{Two cross-links in the cyan fluorescent protein
      sequence.}  This figure shows two cross-links (T--W--G and C--C)
    set in the cyan fluorescent protein. The mass calculation engine
    is not configured to take these cross-links into account.}
1414  \label{fig:xpertedit-cfp-chromophore-disulfide-bond-no-account-cross-links}
1415\end{figure}
1416
1417
1418\begin{figure}
1419  \begin{center}
1420    \includegraphics[width=0.75\textwidth]
1421    {figures/xpertedit-cfp-chromophore-disulfide-bond-account-cross-links.png}
1422  \end{center}
1423  \caption[Calculations when cross-links are accounted
1424  for]{\textbf{Calculations when cross-links are accounted for.}  This
    figure shows that the two cross-links shown in
1426    Figure~\ref{fig:xpertedit-cfp-chromophore-disulfide-bond-no-account-cross-links}
1427    are now taken into account, which translates into a mass decrease
1428    of 22~Da.}
1429  \label{fig:xpertedit-cfp-chromophore-disulfide-bond-account-cross-links}
1430\end{figure}
1431
1432
If we select the oligomer region [38--77] and ask for a
fragmentation, the fragmentation results will take the cross-links
into account only if the generated fragments fully encompass one or
more cross-links.
1437
1438The following calculation rationale applies:
1439
1440\begin{itemize}
1441
\item Fragments b (left end) from b$_1$ (D) to b$_{12}$ (up to I) do
  not take into account the cross-links as both are outside of their
  scope;
1445
1446\item Fragments b$_{13}$ (up to C) to b$_{34}$ (up to Q) do not take
1447  into account the cross-links because the outer cross-link (disulfide
1448  bond between cysteine residues) is not complete (the second cysteine
1449  is left out of the fragment);
1450
1451\item Fragments b$_{35}$ (up to C) to b$_{40}$ (up to P) do take into
1452  account both cross-links because both are contained in the fragments;
1453
\item Likewise, the only y fragments (right end) that do take into
  account the cross-links are fragment y$_{28}$ (up to C) and all
  the longer ones, as for these fragments, the cross-links are both
  fully contained.
1458
1459\end{itemize}
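
\noindent The rationale above boils down to interval containment. As
a sketch, assume 1-based, inclusive monomer positions, and assume
that the two cross-links (the chromophore being nested inside the
disulfide bond) are handled as a single cross-linked region $[s, e]$.
The fragment indices quoted above are then consistent with $s = 50$
and $e = 72$; these positions are inferred from the list, not read
from the program. For the oligomer $[38, 77]$, fragment b$_n$ covers
$[38, 37 + n]$ and fragment y$_n$ covers $[78 - n, 77]$, so that the
region is fully contained when
\[
  37 + n \geq e \iff n \geq 35 \quad \mbox{(b fragments)},
  \qquad
  78 - n \leq s \iff n \geq 28 \quad \mbox{(y fragments)}.
\]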
1460
1461
1462\begin{figure}
1463  \begin{center}
1464    \includegraphics[width=0.75\textwidth]
1465    {figures/xpertedit-fragmentation-cross-linked-oligomer.png}
1466  \end{center}
1467  \caption[Complicated cross-linking situation]{\textbf{Complicated
1468      cross-linking situation.}  This figure shows a complicated
1469    cross-linking situation with an oligomer that has five
1470    cross-links, four of which are fully encompassed by the oligomer
1471    and one that involves a monomer outside of the oligomer.}
1472  \label{fig:xpertedit-fragmentation-cross-linked-oligomer}
1473\end{figure}
1474
1475
1476The calculation of the fragments for this oligomer involves the
1477following steps:
1478
1479\begin{itemize}
1480
1481\item Calculate regions of the oligomer that involve cross-links
1482  either overlapping or not. The regions are thus the following:
1483  [3--5], [8--11] and [13--15]. Note that the cross-link involving
1484  monomer~12 is never taken into account as it involves also a monomer
1485  outside of the oligomer;
1486
1487\item For fragments that have the left end of the oligomer (``Left-end
1488  nomenclature''), the following rationale is used:
1489
1490  \begin{itemize}
1491
1492  \item Fragments $\rightarrow$1 and $\rightarrow$2 do not have any
1493    cross-link;
1494
  \item Fragments $\rightarrow$3 to $\rightarrow$4 do not account for
    cross-link~a because that cross-link is not fully encompassed by
    the fragments;
1498
1499  \item Fragments $\rightarrow$5 to $\rightarrow$10 account only for
1500    the cross-link~a as this is the only cross-linked region to be
1501    fully encompassed by these fragments;
1502
1503  \item Fragments $\rightarrow$11 to $\rightarrow$14 account for
1504    cross-links~a, b and c as they are all fully encompassed in the
1505    fragments;
1506
1507  \item Fragments $\rightarrow$15 to $\rightarrow$16 account for all
1508    cross-links, a, b, c, d as they are all fully encompassed in the
1509    fragments;
1510
1511  \end{itemize}
1512
\item For fragments that have the right end of the oligomer
  (``Right-end nomenclature''), the following rationale is used:
1515
1516  \begin{itemize}
1517
1518  \item Fragments 1$\leftarrow$ and 2$\leftarrow$ do not have any
1519    cross-link;
1520
1521  \item Fragments 3$\leftarrow$ and 4$\leftarrow$ do not account for
1522    cross-link~d because that cross-link is not fully encompassed by
1523    the fragments;
1524
1525  \item Fragments 5$\leftarrow$ and 6$\leftarrow$ account for
1526    cross-link~d because it is fully encompassed in these fragments;
1527
1528  \item Fragments 7$\leftarrow$ to 9$\leftarrow$ only account for
1529    cross-link~d because cross-links~b and c (which make one
1530    cross-linked region) are not fully encompassed by these fragments;
1531
1532  \item Fragments 10$\leftarrow$ to 14$\leftarrow$ account for
1533    cross-links~d, c and b, but not for cross-link~a as this last
1534    cross-link is not fully encompassed in these fragments;
1535
1536  \item Fragments 15$\leftarrow$ and 16$\leftarrow$ account for all
1537    the cross-links of the oligomer.
1538
1539  \end{itemize}
1540
1541\end{itemize}
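
\noindent The two lists above follow from one containment rule
applied to each cross-linked region. As a sketch, assume that the
monomers of the oligomer are numbered from 1 to $N$ and that a region
$[s, e]$ is accounted for only when the fragment covers it entirely. A
left-end fragment $\rightarrow n$ covers $[1, n]$ and a right-end
fragment $n\leftarrow$ covers $[N - n + 1, N]$, hence
\[
  \rightarrow n \mbox{ accounts for } [s, e] \iff n \geq e,
  \qquad
  n\leftarrow \mbox{ accounts for } [s, e] \iff n \geq N - s + 1 .
\]
The right-end thresholds listed above are consistent with $N = 17$ (a
value inferred from the lists, not stated explicitly):

\begin{center}
  \begin{tabular}{lccc}
    Region & Cross-link(s) & First $\rightarrow n$ & First $n\leftarrow$ \\
    \hline
    $[3, 5]$   & a    & 5  & 15 \\
    $[8, 11]$  & b, c & 11 & 10 \\
    $[13, 15]$ & d    & 15 & 5  \\
  \end{tabular}
\end{center}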
1542
1543\noindent It is necessary to repeat one more time that cross-links
1544that involve monomer(s) outside of the oligomer are ignored. The user
1545is alerted whenever this situation is encountered.
1546
Finally, one last note: if the list of monoisotopic or average masses
is desired in the form of a text list, right-clicking onto the table
view widget will allow copying to the clipboard either the
monoisotopic or the average masses. Also, it is possible to either
export the data to the clipboard or to a file or even to drag the
displayed oligomer items into a text editor.
1553
1554For oligomer data filtering, please refer to
1555section~\ref{sect:oligomer-data-filtering}, page
1556\pageref{sect:oligomer-data-filtering}.
1557
1558
1559\renewcommand{\sectitle}{Mass Searching}
1560\section*{\textcolor{sectioningcolor}{\sectitle}}
1561\addcontentsline{toc}{section}{\numberline{}\sectitle}
1562\label{sect:search-masses-polymer-sequence}
1563\index{\xpe!mass~searching}
1564
1565It may happen that the scientist needs to know if some arbitrary
1566sequence region would have a given mass. \mXp\ allows for mass
1567searching operations in the polymer sequence. This is done by using
1568the menu \guimenu{Chemistry}\guimenuitem{Mass Search}. The window
1569illustrated in Figure~\vref{fig:xpertedit-mass-search} shows up and
1570the user enters masses to search for. A number of parameters are to be
1571detailed:
1572\smallskip
1573
1574\begin{itemize}
1575
\item \guilabel{Targets} Should the masses be searched for in the
  whole sequence or in the currently selected region?

\item \guilabel{Ionization} When calculating masses for the potential
  oligomers matching the searched mass, should different levels of
  ionization be calculated? For example, one finds a peak at
  \mz{1245} in the mass spectrum of an electrospray ionization
  experiment. It is not possible to know the ionization level for that
  ion. One could imagine that this value is for a monoprotonated or
  for a multiprotonated species. To assess this, we might ask that the
  mass be searched for by computing a range of possible ionization
  levels between \guilabel{Start} \guivalue{1} and \guilabel{End}
  \guivalue{4} (admitting that for that experiment this is what one
  would expect).
1589
1590\end{itemize}
1591%
1592Once the masses have been searched for, if results are found they are
1593displayed in the same window in the \guilabel{Oligomers} table view
1594widgets (the left one for the mono masses and the right one for the
1595avg masses).
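
As an illustration of the ionization range, the following sketch
lists the neutral masses that would be compatible with the \mz{1245}
peak for charges 1 to 4. It assumes that each charge is brought by one
added hydrogen of mass $m_{\mathrm{H}} \approx 1.0078$ (an assumption
made for this example; the ionization rule actually in force is the
one set in the window):
\[
  M_z = z \times (1245 - 1.0078)
  \qquad
  \begin{array}{c|cccc}
    z   & 1 & 2 & 3 & 4 \\
    \hline
    M_z & 1243.99 & 2487.98 & 3731.98 & 4975.97 \\
  \end{array}
\]
Each of these four masses is a candidate for the mass of a sequence
region matching the peak.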
1596
1597
1598\begin{figure}
1599  \begin{center}
1600    \includegraphics[scale=0.75]
1601    {figures/xpertedit-mass-search.png}
1602  \end{center}
  \caption[Searching masses in a polymer sequence]{\textbf{Searching
1604      masses in a polymer sequence.} This figure shows the window in
1605    which to search for masses in a polymer sequence.}
1606  \label{fig:xpertedit-mass-search}
1607\end{figure}
1608
1609
Finally, one last note: if the list of monoisotopic or average masses
is desired in the form of a text list, right-clicking onto the table
view widget will allow copying to the clipboard either the
monoisotopic or the average masses. Also, it is possible to either
export the data to the clipboard or to a file or even to drag the
displayed oligomer items into a text editor.
1616
1617For oligomer data filtering, please refer to
1618section~\ref{sect:oligomer-data-filtering}, page
1619\pageref{sect:oligomer-data-filtering}.
1620
1621
1622\renewcommand{\sectitle}{Oligomer Data Filtering}
1623\section*{\textcolor{sectioningcolor}{\sectitle}}
1624\addcontentsline{toc}{section}{\numberline{}\sectitle}
1625\label{sect:oligomer-data-filtering}
1626\index{\xpe!data~filtering}
1627
Oligomer-generating simulations, like polymer sequence cleavages,
fragmentations or mass searches, may produce a very large amount of
data. It is often desirable to quickly filter some specific data out
of this bulk of data.
1632
1633In all three simulations mentioned above, the results that are
1634displayed in the corresponding dialog windows are easily filtered
1635using the mechanism illustrated in
1636Figure~\ref{fig:xpertedit-filtering-oligomer-data}.
1637
1638\begin{figure}
1639  \begin{center}
1640    \includegraphics[width=1\textwidth]
1641    {figures/xpertedit-filtering-oligomer-data.png}
1642  \end{center}
1643  \caption[Oligomer data filtering]{\textbf{Oligomer data filtering.}
1644    This figure shows how oligomer data can be filtered. The
1645    \guilabel{Filtering options} group box contains four line edit
1646    widgets where filtering might be triggered: \guilabel{Partial},
1647    \guilabel{Mono}, \guilabel{Avg}, \guilabel{Charge}. The filtered
    data are displayed in the same window (this example shows polymer
    sequence cleavage oligomer data).}
1650  \label{fig:xpertedit-filtering-oligomer-data}
1651\end{figure}
1652
1653
Filtering on the data is easily performed by entering the options in
the \guilabel{Filtering options} group box
(Figure~\ref{fig:xpertedit-filtering-oligomer-data},
page~\pageref{fig:xpertedit-filtering-oligomer-data}). For any
filtering operation, only one criterion can be used: for example,
filtering can occur on the basis of the monoisotopic mass or of the
average mass, but not of both masses at once.

For example, to filter a huge set of data against a monoisotopic mass
of 850 plus or minus 3 atomic mass units, set the monoisotopic mass to
\guivalue{850} with a tolerance of \guivalue{3 AMU} in the
corresponding line edit widgets of the \guilabel{Filtering options}
group box. To perform that filtering action, first set the tolerance
value (\guivalue{3}) in its line edit widget and next set the
monoisotopic mass value (\guivalue{850}) in the \guilabel{Mono} line
edit widget. While the cursor \emph{is still} in the \guilabel{Mono}
line edit widget, press the keyboard key combination
\kbdKey{Ctrl}+\kbdKey{ENTER}. The filtering is immediate and the table
view shows the data that passed the filter.

Note that the combo box widget holding the unit of the tolerance (in
the example, \guilabel{AMU}, that is ``atomic mass unit'') and the
line edit widget where the tolerance value proper is set (\guivalue{3}
in the example) do not trigger any filtering by themselves; these
widgets are only useful in conjunction with the other oligomer data
line edit widgets: \guilabel{Mono}, \guilabel{Avg} or \guilabel{Error}
(depending on the dialog window in which the filtering occurs:
cleavage, fragmentation or mass search). In our example, the filtering
would thus read: \textsl{``Only show the oligomers for which the
  monoisotopic mass is 850 plus or minus 3 atomic mass units''}.
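
Written as a formula, with a target value $m_0$ and a tolerance $t$
both expressed in atomic mass units, an oligomer of mass $m$ passes
the filter when
\[
  |m - m_0| \leq t ,
  \qquad \mbox{that is, for } m_0 = 850 \mbox{ and } t = 3 : \quad
  847 \leq m \leq 853 .
\]
Whether the two bounds themselves are included is an assumption of
this sketch.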
1684
1685To exit the data filtering mode, simply uncheck the
1686\guilabel{Filtering options} check box, and all the initial data will
1687be displayed, irrespective of any data in the line edit boxes
1688described above.
1689
1690
1691\renewcommand{\sectitle}{m/z Ratio Calculation}
1692\section*{\textcolor{sectioningcolor}{\sectitle}}
1693\addcontentsline{toc}{section}{\numberline{}\sectitle}
1694\label{sect:m-over-z-ratio-calculation}
1695\index{\xpe!simulations!m/z~calculations}
1696
In electrospray ionization, a given polymer sequence might be charged
a large number of times. The tool shown in
Figure~\vref{fig:xpertedit-mz-ratio-calculator} computes a
range of m/z ratios starting from one m/z value for a given charge and
a given ionization agent. It is also possible to switch ionization
agent on the fly.
1703
1704\begin{figure}
1705  \begin{center}
1706    \includegraphics[scale=0.8]
1707    {figures/xpertedit-mz-ratio-calculator.png}
1708  \end{center}
1709  \caption[Calculation of ranges of m/z ratios]{\textbf{Calculation of
1710      ranges of m/z ratios.} This figure shows the window in which to
1711    perform the calculation of different m/z ratios starting from one
1712    m/z value with a given ionization agent.}
1713  \label{fig:xpertedit-mz-ratio-calculator}
1714\end{figure}
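
The conversion behind such a calculation can be sketched as follows,
assuming that each charge is brought by one ionization agent of mass
$m_{\mathrm{a}}$ (the symbols and the numerical values below are
introduced for illustration only). Starting from an ion at $(m/z)_1$
with charge $z_1$, the neutral mass $M$ and the ratio at any other
charge $z_2$ are
\[
  M = z_1 \times \left[ (m/z)_1 - m_{\mathrm{a}} \right],
  \qquad
  (m/z)_2 = \frac{M + z_2 \times m_{\mathrm{a}}}{z_2} .
\]
For example, with protonation ($m_{\mathrm{a}} \approx 1.0078$), a
singly charged ion at m/z 1245 corresponds to $M \approx 1243.99$,
and the doubly charged species is expected at
$(1243.99 + 2 \times 1.0078) / 2 \approx 623.00$. Switching the
ionization agent to sodium ($m_{\mathrm{a}} \approx 22.9898$) moves
the singly charged species to $1243.99 + 22.99 \approx 1266.98$.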
1715
1716
1717\renewcommand{\sectitle}{Monomeric And Elemental Compositions}
1718\section*{\textcolor{sectioningcolor}{\sectitle}}
1719\addcontentsline{toc}{section}{\numberline{}\sectitle}
1720\label{sect:monomeric-elemental-compositions}
1721\index{\xpe!elemental~composition}
1722\index{\xpe!monomeric~composition}
1723
1724The \guimenu{Chemistry}\guimenuitem{Determine Compositions} menu
1725triggers the window shown in Figure~\ref{fig:xpertedit-compositions}.
1726The elemental composition is determined using the calculations engine
1727configuration currently set in the polymer sequence editor window.
1728
1729\begin{figure}
1730  \begin{center}
1731    \includegraphics[scale=0.9]
1732    {figures/xpertedit-compositions.png}
1733  \end{center}
1734  \caption[Determination of the compositions]{\textbf{Determination of
1735      the compositions.} This figure shows how to determine the
1736    monomeric and elemental compositions for the whole sequence or the
1737    current selection.}
1738  \label{fig:xpertedit-compositions}
1739\end{figure}
1740
1741
1742
1743\renewcommand{\sectitle}{pKa, pH, pI and Charges}
1744\section*{\textcolor{sectioningcolor}{\sectitle}}
1745\addcontentsline{toc}{section}{\numberline{}\sectitle}
1746\label{sect:acido-basic-calculations}
1747\index{\xpe!pKa}
1748\index{\xpe!pH}
1749\index{\xpe!pI}
1750
1751When preparing biochemical experiments, very often users need to know
1752how many charges a given polymer sequence will bear at any given pH.
1753Equally important is the ability to know at which pH value the polymer
1754sequence will have a net charge near to zero. The pH value for which a
1755given polymer sequence has a net charge near to zero (typically this
1756means that the number of positive charges equals the number of
1757negative charges) is called the isoelectric point---the pI.
1758
Such computations are rather computer-intensive and require a very
precise knowledge of the chemical structure of the different monomers
that take part in the definition of the polymer chemistry. A file
called \filename{pka\_ph\_pi.xml} is located in the polymer chemistry
definition directory. This file lists all the chemical groups that are
possibly charged; each monomer of the polymer definition is
represented by a \verb|<monomer>| element in which data are defined
for any chemical group of that monomer that might bear a charge at any
given pH. You can find the listing of the \filename{pka\_ph\_pi.xml}
file in chapter~\vref{chap:appendices}.  We'll discuss every aspect of
this file's contents in the next sections with enough detail that the
user will be able to write one such file for her specific polymer
chemistry.
1772
1773At the moment, two entities in the polymer chemistry definition might
1774have chemical groups bearing charges: monomers and modifications.
1775We will first review monomers, and modifications next.
1776
1777\renewcommand{\sectitle}{Ionized Group(s) In Monomers}
1778\subsection*{\textcolor{sectioningcolor}{\sectitle}}
1779\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
1780
1781Monomers are the building blocks of polymer sequences. These blocks
1782must have at least two reactive groups so that they can be polymerized
1783into a polymer sequence thread. Reactive groups are often chargeable
groups; for example, the amino group of amino-acids is such that it
gets protonated (positively charged) at a pH lower than its pKa.
1786Similarly, the carboxylic acid group of amino-acids is deprotonated
1787(negatively charged) at physiological pH.
1788
1789\subsubsection*{Some Theory First}
1790
1791\begin{figure}
1792  \begin{center}
1793    \includegraphics[scale=2]
1794    {figures/protein-monomer-acidobasic-data.png}
1795  \end{center}
1796  \caption[Different pKa values for a number of amino-acids' chemical
1797  groups]{\textbf{Different pKa values for a number of amino-acids'
1798      chemical groups.} All of the twenty amino-acids are represented
    here, with each amino-acid's lateral chain fully represented.
1800    Above each chemical group---for which the value makes sense from a
1801    biological perspective---the pKa value is indicated.}
1802  \label{fig:protein-monomer-acidobasic-data}
1803\end{figure}
1804
For the non-biochemist reader: amino-acids involved in the formation
of proteins always have at least two chemical groups that are of
opposite electrical charge at physiological pH values (see
Figure~\ref{fig:protein-monomer-acidobasic-data}):
1809
1810\begin{itemize}
1811\item The amino group (called $\rm \alpha NH_2$) has a typical pKa
1812  value of 9.6. This means that, at physiological pH values (between
1813  6.5 and 7.5), the amino group will find the environment rather
1814  acidic, and will thus be protonated, leading to a positively-charged
1815  species ($\rm \alpha NH_3^+$);
1816\item The carboxylic group (called $\rm \alpha COOH$) has a typical pKa
1817  value of 2.35. This means that, at physiological pH values, the
1818  carboxylic group will be in a rather basic environment, and will
1819  thus be deprotonated, leading to a negatively-charged species ($\rm
1820  \alpha COO^-$).
1821\end{itemize}
1822
\noindent It should be clear that, at physiological pH values, the two
$\rm \alpha$ chemical groups together have a net charge of 0. But
proteins are charged, and this is because some of the twenty common
amino-acids have other chemical groups besides the two already
described. Indeed, some amino-acids have lateral chains that bear
groups that might be charged depending on the pH. Seryl residues, for
example, have an alcohol group that has a pKa of 13; that means that
it is almost always uncharged (form ROH at physiological pH values).
The lateral chain of lysine has a pKa of 10.53, which means that at pH
values below this pKa value, the $\rm \epsilon NH_2$ group gets
protonated, introducing a positive charge in the protein. Similarly,
the amino-acids glutamate and aspartate have a lateral chain ending
with a $\rm \gamma COOH$ and a $\rm \beta COOH$, respectively.  Their
pKa values are below 4.5, and thus these groups are negatively charged
at physiological pH values.
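
The protonation statements above can be made quantitative with the
Henderson--Hasselbalch relation. This is a sketch based on the
textbook formula; the exact way the program evaluates each group is
not detailed here. The fraction of a chemical group that is in its
acidic (protonated) form at a given pH is
\[
  f_{\mathrm{acid}} = \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{pKa}}} .
\]
At pH~7, the $\rm \alpha NH_2$ group (pKa 9.6) has
$f_{\mathrm{acid}} = 1 / (1 + 10^{-2.6}) \approx 0.9975$, hence a
charge of about $+0.9975$. The $\rm \alpha COOH$ group (pKa 2.35) has
$f_{\mathrm{acid}} = 1 / (1 + 10^{4.65}) \approx 2.2 \times 10^{-5}$;
it is thus almost entirely in its $\rm COO^-$ form, with a charge of
about $-(1 - f_{\mathrm{acid}}) \approx -1.0000$. The sum of both
contributions is about $-0.0025$, which is the ``net charge of 0''
mentioned above.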
1838
When the net charge of a polymer sequence has to be computed for a
given pH condition, the program iterates in the sequence and, for each
monomer, checks which of its chemical groups are possibly
charged.  For this to happen, a number of data must be
known for each monomer's chemical group that might play a role in the
determination of the polymer sequence's electrical charge. Thus, for
each chemical group, the following data should be listed in the
\filename{pka\_ph\_pi.xml} file (please see that file in
chapter~\vref{chap:appendices}):
1848
1849\begin{itemize}
1850\item the chemical group's \verb|<name>| element is required.
1851  {\footnotesize Examples: ``$\rm \alpha NH_2$'' or ``$\rm \epsilon
1852    NH_2$'' or ``$\alpha$COOH'';}
1853\item the chemical group's \verb|<pka>| element is optional, but is
1854  the basis for the charge calculation. {\footnotesize Examples: 9.6
1855    for the ``$\alpha$NH$\rm _2$'' or 2.35 for ``$\alpha$COOH'';}
\item the \verb|<acidcharged>| element is required if the \verb|<pka>|
  element is given. This element tells whether the
  chemical group is charged (positively) when the pH is lower than the pKa
  (that is, when the medium is acidic with respect to the pKa).
  {\footnotesize Examples: an amine is positively charged when it is
    in its acidic form (protonated); a carboxylic acid is \emph{not}
    charged when it is in its acidic form;}
\item there can be zero, one or more \verb|<polrule>| element(s) for
  each chemgroup. The \verb|<polrule>| element gives information
1865  about the way the chemical group at hand might be ``trapped'' (or
1866  not) in the formation of inter-monomer bonds (while the monomer is
1867  polymerized into the polymer sequence). The value ``left\_trapped''
1868  means that the chemical group ceases to be involved in charge
1869  calculations as soon as it has a monomer at its left end. The value
1870  ``right\_trapped'' means the same as above, but when a monomer is
1871  polymerized at its right end. For a chemical group that is
1872  ``left\_trapped'', we understand that it is only effectively
1873  evaluated if it is at the left end of the polymer sequence, since in
1874  this case it does not have a monomer at its left side. Conversely, a
1875  chemical group that has a \verb|<polrule>| element with value
1876  ``right\_trapped'', will be evaluated only if the monomer is
1877  actually the right end monomer in the polymer sequence. Finally, the
1878  typical lateral chains of amino-acids have a \verb|<polrule>|
1879  element with a value ``never\_trapped'', as these chemical groups do
1880  not take part in the formation of the inter-monomer bond;
1881\item there can be none, one or more \verb|<chemgrouprule>| element(s)
1882  for each chemgroup. A chemgrouprule element should contain the
1883  following:
1884  \begin{itemize}
1885  \item there must be an \verb|<entity>| element that indicates what
1886    is the chemical entity being dealt with in the current chemgroup
1887    element.  Valid values for this element are ``LE\_PLM\_MODIF'',
1888    ``RE\_PLM\_MODIF'' or ``MNM\_MODIF'';
1889  \item there must be a \verb|<name>| element naming the chemical
1890    entity properly;
1891  \item there must be an \verb|<outcome>| element telling what action
1892    should be taken when encountering the \verb|<entity>| on the
1893    chemgroup.  Valid values are either ``LOST'' or ``PRESERVED''.
1894  \end{itemize}
1895\end{itemize}
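
\noindent Put together, these elements determine which chemical
groups contribute to the net charge and with which sign. As a sketch,
assume that the net charge is the plain sum of the contributions of
the groups that remain once the rules have been applied (the symbols
below are introduced for illustration; this is not a description of
the program's code). For a polymer sequence of $n$ monomers,
\[
  Q(\mathrm{pH}) = \sum_{g \in G} q_g(\mathrm{pH}),
  \qquad
  q_g = \left\{
    \begin{array}{ll}
      + f_g       & \mbox{if acidcharged is TRUE,} \\
      -(1 - f_g)  & \mbox{if acidcharged is FALSE,}
    \end{array}
  \right.
  \qquad
  f_g = \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{pKa}_g}},
\]
where the set $G$ holds the ``left\_trapped'' groups of monomer~1
only, the ``right\_trapped'' groups of monomer~$n$ only and the
``never\_trapped'' groups of every monomer, minus any group for which
a matching chemgrouprule has the outcome ``LOST''. The isoelectric
point is then the pH value at which $Q(\mathrm{pH}) = 0$.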
1896
1897
1898\subsubsection*{Understanding By Example}
1899
1900Let us take some examples in order to make sure we actually understand
1901the process of describing how an electrical net charge is calculated
1902for a given polymer sequence and at any given pH value.
1903
1904Let us see the example of the aspartate amino-acid, of which the
1905lateral chain is nothing but $\rm CH_2COOH$:
1906
1907\begin{alltt}
1908    <monomer>
1909      <code>D</code>
1910      <mnmchemgroup>
1911        <name>N-term NH2</name>
1912        <pka>9.6</pka>
1913        <acidcharged>TRUE</acidcharged>
1914        <polrule>left_trapped</polrule>
1915        <chemgrouprule>
1916          <entity>LE_PLM_MODIF</entity>
1917          <name>Acetylation</name>
1918          <outcome>LOST</outcome>
1919        </chemgrouprule>
1920      </mnmchemgroup>
1921      <mnmchemgroup>
1922        <name>C-term COOH</name>
1923        <pka>2.36</pka>
1924        <acidcharged>FALSE</acidcharged>
1925        <polrule>right_trapped</polrule>
1926      </mnmchemgroup>
1927      <mnmchemgroup>
1928        <name>Lateral COOH</name>
1929        <pka>3.65</pka>
1930        <acidcharged>FALSE</acidcharged>
1931        <polrule>never_trapped</polrule>
1932        <chemgrouprule>
1933          <entity>MONOMER_MODIF</entity>
1934          <name>AmidationAsp</name>
1935          <outcome>LOST</outcome>
1936        </chemgrouprule>
1937      </mnmchemgroup>
1938    </monomer>
1939\end{alltt}
1940
1941\noindent We see that the code of the monomer for which acid-basic
1942data are being defined is `D' and that this monomer has three chemical
1943groups that might bring electrical charges. These chemical groups are
1944described by three \verb|<mnmchemgroup>| elements that we will review in
1945detail below (see Figure~\vref{fig:protein-monomer-acidobasic-data}).
1946
1947\medskip
1948
1949The first \verb|<mnmchemgroup>| element is related to the $\rm \alpha
1950NH_2$ amino group of the amino-acid:
1951
1952\begin{itemize}
1953\item \verb|<name>N-term NH2</name>| The name of the chemical group is
1954  not immediately useful, but will be used when reports are to be
1955  prepared for the calculation;
1956\item \verb|<pka>9.6</pka>| This element is optional. However, of
1957  course, if the chemical group might be electrically charged, the pKa
1958  value will be essential in order to compute the charge that is
1959  brought by this chemical group at any given pH;
\item \verb|<acidcharged>TRUE</acidcharged>| This element is also
  optional; however, if the previous element is given, then this one
  is compulsory. Telling whether the conjugate acid (protonated) form
  is charged is essential in order to know what sign the charge
  has when the chemical group is ionized. The value ``TRUE''
1965  indicates that when the pH is lower than the pKa, the chemical group
1966  is charged, thus protonated (in the form $\rm NH_3^+$).
1967  Consequently, if the pH is higher than the pKa, then the chemical
1968  group is neutral (in the form $\rm NH_2$);
1969\item \verb|<polrule>left_trapped</polrule>| This element indicates
1970  that the chemical group should only be taken into account in the
1971  eventuality that the monomer bearing it (code `D') is the left end
1972  monomer of the polymer sequence. This can easily be understood, as
1973  this chemical group is responsible for the establishment of the
1974  inter-monomer bond towards the left end of the polymer sequence;
1975\item \verb|<chemgrouprule>| This element provides further details on
1976  the chemistry that this chemical group might be involved in:
1977  \begin{itemize}
1978  \item \verb|<entity>LE_PLM_MODIF</entity>| This element indicates
1979    that the supplementary data in the current \verb|<chemgrouprule>|
1980    element are pertaining to the $\rm \alpha NH_2$ chemical group
1981    \emph{only} in case the polymer sequence is left end-modified
1982    (that is with a permanent left end modification) and the monomer
1983    (code `D') is located at the left end of the polymer sequence
1984    (that is: it is the first monomer of the sequence for which the
1985    electrical charge---or pI---calculation is to be performed).
1986  \item \verb|<name>Acetylation</name>| This element goes further in
1987    the detail of the potential chemistry of the $\rm \alpha NH_2$
1988    chemical group: if the left end permanent modification is
1989    ``Acetylation'', then the current chemgrouprule element can be
1990    further processed, otherwise it should be abandoned;
  \item \verb|<outcome>LOST</outcome>| This element indicates
    what should be done with the chemical group for which the
    chemgrouprule is being defined. What we see here is:
    \textsl{``If the $\rm \alpha NH_2$ chemical group, belonging to
      a `D' monomer located at the left end of a polymer sequence, is
      modified permanently with an `Acetylation' left end
      modification, it should not be taken into account when computing
      the charge that it could bring to the polymer sequence.''}
1999  \end{itemize}
2000\end{itemize}
2001
2002\noindent The second \verb|<mnmchemgroup>| element is related to the
2003$\rm \alpha COOH$ carboxylic group of the amino-acid:
2004
2005\begin{itemize}
2006\item \verb|<name>C-term COOH</name>| Same remark as above;
2007\item \verb|<pka>2.36</pka>| Same remark as above;
\item \verb|<acidcharged>FALSE</acidcharged>| Same remark as above.
  However, as we can see, the value indicates that the conjugate acid
  (form $\rm COOH$) does not bring any charge. This means that when
  the conjugate base is predominant (that is, when $\rm pH > pKa$), it
  brings a negative charge: the form is $\rm COO^-$;
2013\item \verb|<polrule>right_trapped</polrule>| The chemical group
2014  should not be evaluated if a monomer is linked to it at its right
2015  side. That means that the current chemical group is only evaluated
2016  if the monomer bearing it is located at the right end of the polymer
2017  sequence. This is easily understood, as the $\rm \alpha COOH$
2018  chemical group is involved in the formation of the inter-monomer
2019  bond towards the right end of the polymer sequence.
2020\end{itemize}
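
\noindent The two \verb|<acidcharged>| cases described above can be
summarised with the textbook Henderson--Hasselbalch relation. This is
only a sketch of the usual treatment, not a statement of how the
program implements it; the symbols $f_{\mathrm{AH}}$ (fraction of the
group in its conjugate acid form) and $q$ (average charge brought by
the group) are introduced here for illustration:
\[
  f_{\mathrm{AH}} = \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_a}}
\]
\[
  q_{\mathrm{TRUE}} = +f_{\mathrm{AH}}, \qquad
  q_{\mathrm{FALSE}} = -\left(1 - f_{\mathrm{AH}}\right)
\]
When $\mathrm{pH} = \mathrm{p}K_a$, $f_{\mathrm{AH}} = 0.5$, so that a
group declared with ``TRUE'' brings $+0.5$ and a group declared with
``FALSE'' brings $-0.5$.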
2021
2022\noindent The third \verb|<mnmchemgroup>| element is related to the
2023$\rm \beta COOH$ carboxylic group of the amino-acid:
2024
2025\begin{itemize}
2026\item \verb|<name>Lateral COOH</name>|;
2027\item \verb|<pka>3.65</pka>|;
2028\item \verb|<acidcharged>FALSE</acidcharged>|;
2029\item \verb|<polrule>never_trapped</polrule>| This element indicates
2030  that, whatever the position of the monomer bearing the chemical
2031  group in the polymer sequence (left end, right end or middle), the
2032  chemical group is to be evaluated;
2033\item \verb|<chemgrouprule>| This element provides further details on
2034  the chemistry that the chemical group at hand ($\rm \beta COOH$)
2035  might be involved in:
2036  \begin{itemize}
  \item \verb|<entity>MONOMER_MODIF</entity>| This element indicates
    that the supplementary data in the current \verb|<chemgrouprule>|
    element pertain to the $\rm \beta COOH$ chemical group
    \emph{only} if the monomer bearing the chemical group is
    chemically modified;
2042  \item \verb|<name>AmidationAsp</name>| This is the modification by
2043    which the monomer should be modified in order to have the
2044    \verb|<chemgrouprule>| element effectively evaluated;
  \item \verb|<outcome>LOST</outcome>| This element indicates
    that if the monomer bearing the chemical group is modified with an
    ``AmidationAsp'' chemical modification, then the chemical group
    should no longer be evaluated for the electrical charge (or pI)
    calculations, since reacting a carboxylate group with an amino
    group produces an amide group, which is not readily ionized at
    physiological pH values.
2052  \end{itemize}
2053\end{itemize}
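
\noindent The rules described in the three lists above can be
sketched in a few lines of Python. This is an illustration of the
logic as explained in the text, not the program's actual code: the
names \verb|ChemGroup|, \verb|ChemGroupRule| and \verb|is_evaluated|
are hypothetical, and only the rule values that appear in the excerpt
are handled.

```python
from dataclasses import dataclass, field


@dataclass
class ChemGroupRule:
    """One <chemgrouprule> element (hypothetical representation)."""
    entity: str   # "LE_PLM_MODIF" or "MONOMER_MODIF", as in the excerpt
    name: str     # name of the modification that triggers the rule
    outcome: str  # "LOST" is the only outcome shown in the excerpt


@dataclass
class ChemGroup:
    """One <mnmchemgroup> element (hypothetical representation)."""
    name: str
    pka: float
    acid_charged: bool
    polrule: str  # "left_trapped", "right_trapped" or "never_trapped"
    rules: list = field(default_factory=list)


def is_evaluated(group, is_left_end, is_right_end,
                 left_end_modif=None, monomer_modifs=()):
    """Return True if the group should be counted in charge calculations."""
    # A left_trapped group only counts on the left end monomer,
    # a right_trapped group only on the right end monomer.
    if group.polrule == "left_trapped" and not is_left_end:
        return False
    if group.polrule == "right_trapped" and not is_right_end:
        return False
    for rule in group.rules:
        if rule.outcome != "LOST":
            continue  # other outcomes are not covered by this sketch
        if (rule.entity == "LE_PLM_MODIF" and is_left_end
                and left_end_modif == rule.name):
            return False
        if rule.entity == "MONOMER_MODIF" and rule.name in monomer_modifs:
            return False
    return True


# The lateral carboxylic group of the 'D' monomer, as in the excerpt.
lateral_cooh = ChemGroup(
    "Lateral COOH", 3.65, False, "never_trapped",
    [ChemGroupRule("MONOMER_MODIF", "AmidationAsp", "LOST")],
)
```

\noindent With these definitions,
\verb|is_evaluated(lateral_cooh, False, False)| is true for an
unmodified monomer anywhere in the sequence, and becomes false as soon
as \verb|"AmidationAsp"| is listed in \verb|monomer_modifs|.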
2054
\noindent At this point we should have made it clear how the charge
calculations can be configured for the different monomers in the
polymer chemistry definition. As usual, the more sophisticated the
polymer chemistry definition, the more sophisticated the computations
it allows.
2060
2061
2062\renewcommand{\sectitle}{Ionized Group(s) In Modifications}
2063\subsection*{\textcolor{sectioningcolor}{\sectitle}}
2064\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
2065
2066
In the excerpt from the \filename{pka\_ph\_pi.xml} file below, we see
that chemical modifications can also bring charges. The example of the
chemical modification ``Phosphorylation'' shows that when a monomer is
phosphorylated, two chemical groups are brought in: the first has a
pKa value of 1.2 (that is, it will always be deprotonated at
physiological pH values); the second has a pKa value of 6.5 (that is,
at a pH equal to this pKa it is split evenly between a protonated,
uncharged form and a deprotonated, negatively charged form, leading
to a net electrical charge of $\mathrm{-0.5}$ for this group).
2076
2077\begin{alltt}
2078    <modif>
2079      <name>Phosphorylation</name>
2080      <mdfchemgroup>
2081        <name>none_set</name>
2082        <pka>1.2</pka>
2083        <acidcharged>FALSE</acidcharged>
2084      </mdfchemgroup>
2085      <mdfchemgroup>
2086        <name>none_set</name>
2087        <pka>6.5</pka>
2088        <acidcharged>FALSE</acidcharged>
2089      </mdfchemgroup>
2090    </modif>
2091\end{alltt}
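
\noindent As a worked example, the charge brought by these two groups
can be estimated with the textbook Henderson--Hasselbalch relation.
The choice of pH~7 is an assumption made here for illustration, and
the result is an estimate under that relation, not an output of the
program. For a group whose conjugate acid is not charged, the average
charge is the opposite of the deprotonated fraction:
\[
  q = -\frac{1}{1 + 10^{\,\mathrm{p}K_a - \mathrm{pH}}}
\]
With the two pKa values of the excerpt, at pH~7:
\[
  q_1 = -\frac{1}{1 + 10^{\,1.2 - 7}} \approx -1.00, \qquad
  q_2 = -\frac{1}{1 + 10^{\,6.5 - 7}} \approx -0.76
\]
so that the phosphorylation brings a net charge of about $-1.76$ at
that pH.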
2092
2093\noindent At this point we should be able to study the way
2094computations are actually performed in the \xpe\ module.
2095
2096
2097\renewcommand{\sectitle}{pH, pI and Charge Calculations}
2098\subsection*{\textcolor{sectioningcolor}{\sectitle}}
2099\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
2100
A user wishing to compute charges (positive, negative, net) or the
isoelectric point for the current polymer sequence uses the menu
\guimenu{Chemistry}\guimenuitem{pKa pH pI}, which opens the window
shown in
Figure~\vref{fig:xpertedit-net-charge-pka-ph-pi}.
2106
2107\begin{figure}
2108  \begin{center}
2109    \includegraphics[width=0.66\textwidth]
2110    {figures/xpertedit-net-charge-pka-ph-pi.png}
2111  \end{center}
2112  \caption[Acido-basic computations: net charges]{\textbf{Acido-basic
2113      computations: net charges.} This figure shows the options that
    can be set for the calculation of the charges borne by the
2115    polymer sequence.}
2116  \label{fig:xpertedit-net-charge-pka-ph-pi}
2117\end{figure}
2118
This figure shows that the user can calculate the charges (positive,
negative and net) borne by the polymer sequence (either the whole
sequence or the current selection) by setting the \guilabel{pH} value
at which the computation should take place. It is also possible to
calculate the isoelectric point by clicking the
\guilabel{Isoelectric Point} button.
2125
2126Note that the computations might involve the permanent left/right
2127modifications of the polymer sequence, as well as the monomer chemical
2128modifications. To configure the way net charge---or pI---calculations
2129are performed, use the calculations engine configuration of the
2130sequence editor window.
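
The two computations offered by this window can be sketched as
follows. This is a minimal illustration based on the textbook
Henderson--Hasselbalch relation and a bisection search, not the
program's actual algorithm. The function names are hypothetical, and
the pKa of 9.6 used for the amino group in the example is an assumed
value that does not come from the excerpts above.

```python
def group_charge(pka, acid_charged, ph):
    """Average charge of one chemical group at the given pH."""
    # Fraction of the group present in its conjugate acid form.
    protonated = 1.0 / (1.0 + 10.0 ** (ph - pka))
    # acid_charged: the acid form bears +1 and the base form 0;
    # otherwise the acid form bears 0 and the base form -1.
    return protonated if acid_charged else protonated - 1.0


def net_charge(groups, ph):
    """Sum of the charges of all (pka, acid_charged) groups."""
    return sum(group_charge(pka, charged, ph) for pka, charged in groups)


def isoelectric_point(groups, low=0.0, high=14.0, tolerance=1e-4):
    """Bisection search for the pH at which the net charge is zero.

    Returns None when the net charge does not change sign between
    `low` and `high`.
    """
    if net_charge(groups, low) < 0 or net_charge(groups, high) > 0:
        return None
    # The net charge only decreases as the pH rises, so bisection applies.
    while high - low > tolerance:
        middle = (low + high) / 2.0
        if net_charge(groups, middle) > 0:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0


# A lone 'D' monomer: amino group (assumed pKa 9.6) plus the two
# carboxylic groups with the pKa values of the excerpt.
aspartate = [(9.6, True), (2.36, False), (3.65, False)]
```

For this example, \verb|isoelectric_point(aspartate)| falls close to
3.0, that is, midway between the two carboxylic pKa values, as
expected for an acidic residue.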
2131\index{\xpe|)}
2132
2133
2134\renewcommand{\sectitle}{General Options}
2135\section*{\textcolor{sectioningcolor}{\sectitle}}
2136\addcontentsline{toc}{section}{\numberline{}\sectitle}
2137
One of the options most valued by users is the ability to set the
number of decimal places used to display numbers. The settings apply
separately to the different entities for which numerical values are
displayed. The default (and recommended) values are the following:
2143
2144\begin{itemize}
2145
\item Atoms (and all related entities: isotopic masses, isotopic
  abundances): 10;
2148
2149\item pKa, pH, pI: 2;
2150
2151\item Oligomers (obtained \textit{via} mass searches, polymer
2152  cleavages, oligomer fragmentations): 5;
2153
\item Polymers: 3.
2155
2156\end{itemize}
2157
\noindent Note that modified values take effect without restarting
the program. However, the results of cleavages and fragmentations
that are already displayed are updated only when a new cleavage or a
new fragmentation is triggered. These options are stored on disk and
persist across sessions.
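
The effect of these settings can be pictured with a short Python
sketch. The dictionary keys and the \verb|format_value| function are
hypothetical names chosen for illustration; only the numbers of
decimal places come from the list above.

```python
# Default decimal places per kind of entity, as listed above.
DECIMAL_PLACES = {
    "atom": 10,
    "pka_ph_pi": 2,
    "oligomer": 5,
    "polymer": 3,
}


def format_value(value, entity):
    """Format a number with the decimal places set for its entity."""
    return f"{value:.{DECIMAL_PLACES[entity]}f}"


print(format_value(1234.56789, "polymer"))   # prints 1234.568
print(format_value(1234.56789, "oligomer"))  # prints 1234.56789
```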
2163
2164\cleardoublepage
2165
2166
2167%%% Local Variables:
2168%%% mode: latex
2169%%% TeX-master: "polyxmass"
2170%%% End:
2171