
\chapter[\xpe] {\xpe: \\A Powerful Editor and Simulation Center}
\label{chap:xpertedit}
\index{\xpe|(}

After having completed this chapter you will be able to perform
sophisticated polymer chemistry simulations on polymer
sequences---that can be edited in place---along with automatic mass
recalculations.

\renewcommand{\sectitle}{\xpe\ Invocation}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\index{\xpe!module~invocation}

The \xpe\ module is easily called by pulling down the \guimenu{\xpe}
menu item from the \mXp\ program's menu. The user may start the \xpe\
module by:
\smallskip

\begin{itemize}

\item Opening a sample polymer sequence;

\item Creating a new polymer sequence;

\item Loading a polymer sequence from disk.

\end{itemize}


\renewcommand{\sectitle}{\xpe\ Operation: \textit{In Medias Res}}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\index{\xpe!open~sequence}

The first way to start an \xpe\ session is by opening a sample
sequence out of the list of sequences that were shipped along with
\mXp. The \guimenu{\xpe}\guimenuitem{Open Sample Sequence} menu item
opens the dialog box shown in
Figure~\vref{fig:xpertedit-select-sample-sequence}. The drop-down
widget in this dialog window lists all the polymer sequence files that
were shipped along with \mXp. Simply select one item and click
\guilabel{OK}. To select another polymer sequence file, click
\guilabel{Cancel}, which will trigger the system's file selection
dialog to open for you to browse to the location where the polymer
sequence file is stored. The process is identical to the normal
polymer sequence file opening (see below).

\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-select-sample-sequence.png}
  \end{center}
  \caption[Selection of a sample polymer sequence]{\textbf{Selection
      of a sample polymer sequence.}  \mXp\ ships with a number of
    sample polymer sequences which are designed to allow easy
    demonstration of the \xpe\ features. This selection dialog lists
    all the polymer sequence files that were shipped along with \mXp.}
  \label{fig:xpertedit-select-sample-sequence}
\end{figure}


The second way to start an \xpe\ session is by creating a new polymer
sequence\index{\xpe!create~sequence} (\guimenu{\xpe}\guimenuitem{New
  Sequence} menu). The program immediately asks the user to select a polymer
chemistry definition, as shown in
Figure~\ref{fig:xpertedit-choose-pol-chem-def}. The drop-down widget
lists all the polymer chemistry definitions currently registered on
the system. If the polymer chemistry definition is not listed,
clicking \guilabel{Cancel} will let the user browse the disk in
search for a polymer chemistry definition file.\footnote{Note that
  once the sequence is saved, the polymer chemistry definition file
  \emph{must} be registered or the sequence file will not be
  loadable. This is described in a later chapter.} Once the polymer
chemistry definition has been selected and successfully parsed by the
program, the user is presented with an empty sequence editor.

The third way to start an \xpe\ session is by opening an existing
polymer sequence file. Once the sequence file has been opened, the user is
presented with a sequence editor as represented in
Figure~\ref{fig:xpertedit-protein-main-view}. At this point, when the
user starts editing a sequence, the characters entered at the
keyboard, or pasted from the clipboard, will be interpreted using the
polymer chemistry definition that was selected in the initialization
window described above.


\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-choose-pol-chem-def.png}
  \end{center}
  \caption[Selection of the polymer chemistry
  definition]{\textbf{Selection of the polymer chemistry definition.}
    When creating a new polymer sequence, it is first necessary to
    indicate which polymer chemistry definition the polymer sequence
    is based on. This window lists all the polymer chemistry definitions
    currently available on the system.}
  \label{fig:xpertedit-choose-pol-chem-def}
\end{figure}


\begin{figure}
  \begin{center}
    \includegraphics[width=0.8\textwidth]
    {figures/xpertedit-protein-main-view.png}
  \end{center}
  \caption[The \xpe\ module]{\textbf{The \xpe\ module.} This figure shows
    a polymer sequence displayed in an {\xpe}or window.}
  \label{fig:xpertedit-protein-main-view}
\end{figure}


Now, of course, editing a polymer sequence is not enough for a mass
spec\-trome\-tric-ori\-ented software suite; what we want is to
\emph{compute masses!}\index{\xpe!mass~calculation} The mass
calculation process is immediately visible on the right hand side of
the sequence editor shown in
Figure~\ref{fig:xpertedit-protein-main-view}. The \guilabel{Masses}
frame~box widget contains two items: \smallskip

\begin{itemize}

\item \guilabel{Whole
    Sequence}\index{\xpe!mass~calculation!whole~sequence} A frame~box
  widget displaying the \guilabel{Mono} and \guilabel{Avg} masses of
  the whole polymer sequence, irrespective of the current selection;

\item \guilabel{Selected Sequence}\index{\xpe!mass~calculation!selected~region}
  A frame~box widget displaying the \guilabel{Mono} and \guilabel{Avg}
  masses of the currently selected region of the polymer sequence.

\end{itemize}
%
The user may change the mass calculation engine configuration at any
point in time using the widgets in the \guilabel{Calculation
  Engine}\index{\xpe!mass~calculation~engine} tool~box that
contains the following configurable parameters: \smallskip

\begin{itemize}

\item \guilabel{Polymer}

  \begin{itemize}

  \item \guilabel{Left
      Cap}\index{\xpe!mass~calculation~engine!left~cap} If checked,
    the left cap of the polymer sequence will be taken into account;

  \item \guilabel{Right
      Cap}\index{\xpe!mass~calculation~engine!right~cap} If checked,
    the right cap of the polymer sequence will be taken into account;
  \item \guilabel{Left
      Modif}\index{\xpe!mass~calculation~engine!left~modif} If
    checked, the modification of the polymer sequence's left end will
    be taken into account. Note that if \guilabel{Force} is also
    checked, then the modification is taken into account even when
    selecting a region of the sequence that does not encompass the
    left end monomer;

  \item \guilabel{Right
      Modif}\index{\xpe!mass~calculation~engine!right~modif} Same as
    above, but for the right end modification;

  \end{itemize}

\item \guilabel{Selections and regions}

  \begin{itemize}

  \item
    \guilabel{Multi-region}\index{\xpe!mass~calculation~engine!multi-region}
    If checked, the sequence editor allows more than one region to be
    selected at any given time (no limitation on the number of
    selected regions);

  \item
    \guilabel{Multi-selection}\index{\xpe!mass~calculation~engine!multi-selection}
    If checked, the sequence editor allows not only the selection of
    multiple regions at any given time, but also the selection of
    totally or partially overlapping regions;

  \item
    \guilabel{Oligomers}\index{\xpe!mass~calculation~engine!oligomers}
    When multiple regions are selected, each selected region behaves
    like an oligomer, that is, it gets its left and right end caps
    added (if the corresponding calculation engine configuration item
    is activated);

  \item \guilabel{Residual
      chains}\index{\xpe!mass~calculation~engine!residual~chains}
    When multiple regions are selected, the different regions behave
    like residual chains: the left and right end caps are added only
    once (if the corresponding calculation engine configuration item
    is activated).

  \end{itemize}

\item \guilabel{Monomers}

  \begin{itemize}

  \item
    \guilabel{Modifications}\index{\xpe!mass~calculation~engine!modifications}
    If checked, the monomer modifications will be taken into account;

  \item
    \guilabel{Cross-links}\index{\xpe!mass~calculation~engine!cross-links}
    If checked, the cross-links in the polymer sequence will be taken
    into account. Note that \emph{only cross-links fully encompassed
      by the selected sequence region(s)} will be taken into account
    for the \guilabel{Selected sequence} mass calculations. If any
    number of cross-links are not fully encompassed by the currently
    selected sequence region, then that number is displayed along with
    the following label visible in the \guilabel{Selected sequence}
    group box: \guilabel{Incomplete cross-links:}.

  \end{itemize}

\item \guilabel{Ionization}\index{\xpe!mass~calculation~engine!ionization}

  \begin{itemize}

  \item \guivalue{+H} This formula represents the ionization agent
    formula (that is, a protonation);

  \item \guilabel{Unitary charge} \guivalue{1} Charge brought by the
    ionization agent. In the example, a protonation brings a positive
    charge;

  \item \guilabel{Ionization level} \guivalue{1} Level of the
    ionization requested. In the example, a single ionization is
    requested, that is a monoprotonation.

  \end{itemize}

\end{itemize}
%
When any parameter listed above is changed, the recalculation of the
masses---for both the \guilabel{Whole sequence} and the
\guilabel{Selected sequence}---is triggered and the new masses are
updated in their respective line~edit widgets, described earlier. The
fact that the user can specify ionization rules should make it clear
that the values that are displayed are actually \mz ratios (as long as
at least one ionization is requested).
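
\smallskip

\noindent As an illustration only, and assuming the usual definition
of an $m/z$ ratio, the effect of the ionization parameters can be
written as follows. The symbols are notation introduced here for the
sake of the example, not labels used by the program: $M$ is the mass
of the non-ionized polymer sequence (or selected region), $m_a$ is
the mass of the ionization agent formula, $z_a$ is its unitary charge
and $n$ is the ionization level.
\[
  m/z = \frac{M + n\,m_a}{n\,z_a}
\]
With the settings shown above (\guivalue{+H}, unitary charge
\guivalue{1}, ionization level \guivalue{1}), this expression reduces
to $M + m_{\mathrm{H}}$, where $m_{\mathrm{H}}$ stands for the mass
of the \guivalue{+H} formula. Under the same assumption, an
ionization level of \guivalue{2} would give
$(M + 2\,m_{\mathrm{H}})/2$.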


\renewcommand{\sectitle}{The Editor Window Menu}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\index{\xpe!editor~window}

The menu bar in the polymer sequence editor displays a number of menu
items, reviewed below: \smallskip

\begin{itemize}

  %%%%%%% FILE
\item \guimenu{File} (Figure~\ref{fig:xpertedit-file-menu})

  \begin{itemize}

  \item \guimenu{File}\guimenuitem{Close} Closes the sequence;

  \item \guimenu{File}\guimenuitem{Save} Saves the sequence. If the
    sequence has no filename yet, the user is invited to select a
    filename;

  \item \guimenu{File}\guimenuitem{Save As} Saves the sequence in a new
    file;

  \item \guimenu{File}\guimenuitem{Import
      Raw}\index{\xpe!sequence~editor!sequence~import} Opens a text file
    and tries to import the sequence. If invalid monomer code
    characters are found, the user is given a chance to revise the
    imported sequence;

  \item \guimenu{File}\guimenuitem{Export to
      Clipboard}\index{\xpe!sequence~editor!sequence~export} Copies the
    sequence and all the data (masses and calculation options) to the
    clipboard, in the form of simple text;

  \item \guimenu{File}\guimenuitem{Export to File} Writes the
    sequence and all the data (masses and calculation options) to a
    file, in the form of simple text (if a filename was already
    selected; otherwise the user is invited to select a file into
    which the data are to be written);

  \item \guimenu{File}\guimenuitem{Select export file} Invites the
    user to select a file into which the data are to be written.

  \end{itemize}

  %%%%%%% EDIT
\item \guimenu{Edit}

  \begin{itemize}

  \item \guimenu{Edit}\guimenuitem{Copy} Copies the currently selected
    region(s) (if any) to the clipboard. If more than one region is
    currently selected, then the user is informed that the copied
    sequence will correspond to the selected region sequences joined
    together. \emph{Be aware that the order in which the region
      sequences are joined is the order in which the regions were
      selected, and not the order in which the sequences appear in
      the whole polymer sequence};

  \item \guimenu{Edit}\guimenuitem{Cut} Copies the current selection
    (if any) to the clipboard and removes it from the sequence. Note
    that it is not yet possible to cut more than one selected region
    in one single operation;

  \item \guimenu{Edit}\guimenuitem{Paste} Pastes the sequence from the
    clipboard into the sequence at point (that is the current cursor
    location). If the pasted sequence is found to contain characters
    not valid for the current polymer chemistry definition, the user
    is given a chance to revise the pasted sequence. If one sequence
    region was selected, it is replaced with the pasted sequence. If
    more than one sequence region was selected, the operation cannot
    be performed and the user is informed;

  \item \guimenu{Edit}\guimenuitem{Find
      Sequence}\index{\xpe!sequence~editor!find~sequence~motif} Finds
    a sequence motif in the polymer sequence.

  \end{itemize}

  %%%%%%% CHEMISTRY
\item
  \guimenu{Chemistry}\index{\xpe!sequence~editor!chemical~simulations}
  (Figure~\ref{fig:xpertedit-chemistry-menu})

  \begin{itemize}

  \item \guimenu{Chemistry}\guimenuitem{Modify Monomer(s)} Modify (or
    unmodify) one or more monomers in the polymer sequence;

  \item \guimenu{Chemistry}\guimenuitem{Modify Polymer} Set (or unset)
    the left (or right, or both) modification of the polymer sequence;

  \item \guimenu{Chemistry}\guimenuitem{Cross-link Monomers} Set
    cross-links to monomers of the polymer sequence;

  \item \guimenu{Chemistry}\guimenuitem{Cleave} Perform a
    chemical/enzymatic cleavage of the polymer sequence;

  \item \guimenu{Chemistry}\guimenuitem{Fragment} Perform the gas
    phase fragmentation of the currently selected oligomer;

  \item \guimenu{Chemistry}\guimenuitem{Mass Search} Search for any
    sequence having a mass matching the searched mass;

  \item \guimenu{Chemistry}\guimenuitem{Compute m/z Ratios} Starting
    from a given \mz ratio and a given ionization status, calculate a
    range of \mz ratios with a given ionization agent;

  \item \guimenu{Chemistry}\guimenuitem{Determine Compositions}
    Calculate the monomeric/elemental composition of the whole polymer
    sequence or of the current selection;

  \item \guimenu{Chemistry}\guimenuitem{pKa pH pI} Perform acidity, pH
    and isoelectric point calculations on the whole sequence or on the
    current selection.

  \end{itemize}

  %%%%%%% OPTIONS
\item
  \guimenu{Options}\index{\xpe!sequence~editor!number~display}

  \begin{itemize}

  \item \guimenu{Options}\guimenuitem{Decimal places} Set the number
    of decimal places to be used to display the numerical values.

  \end{itemize}


\end{itemize}


\begin{figure}
  \begin{center}
    \includegraphics[width=0.33\textwidth]
    {figures/xpertedit-file-menu.png}
  \end{center}
  \caption[The \xpe\ window File menu]{\textbf{The \xpe\ window File
      menu.} This figure shows the File menu as a drop-down menu
    in the polymer sequence window.}
  \label{fig:xpertedit-file-menu}
\end{figure}


\begin{figure}
  \begin{center}
    \includegraphics[width=0.33\textwidth]
    {figures/xpertedit-chemistry-menu.png}
  \end{center}
  \caption[The \xpe\ window Chemistry menu]{\textbf{The \xpe\ window
      Chemistry menu.} This figure shows the Chemistry menu as a
    drop-down menu in the polymer sequence window.}
  \label{fig:xpertedit-chemistry-menu}
\end{figure}


\renewcommand{\sectitle}{Editing Polymer
  Sequences}\index{\xpe!sequence~editor!sequence~editing}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

As described earlier, in the chapter about the \xpd\ module, a polymer
chemistry definition may allow more than one character to qualify the
codes of the monomers (see chapter~\ref{chap:xpertdef},
section~\vref{sect:monomers}). It was also noted that setting the
number of allowed characters to \cfgval{3}, for example, does not mean
that all the monomer codes of the polymer chemistry definition must be
defined using three characters: \cfgval{3} is the \emph{maximum}
number of characters that may be used.

\renewcommand{\sectitle}{Multi-Character Monomer Codes}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\index{\xpe!multi-character~monomer~code}

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.75]
    {figures/xpertedit-3-letter-code-whole-process.png}
  \end{center}
  \caption[Multi-character code sequence editing in
  \xpe]{\textbf{Multi-character code sequence editing in \xpe.} This
    figure shows the process by which it is made possible to edit
    polymer sequences with a monomer code set that allows more than
    one character per code.}
  \label{fig:xpertedit-3-letter-code-whole-process}
\end{figure}

This section deals with the editing of a polymer sequence for which
monomer codes can be made of more than one character.
Figure~\vref{fig:xpertedit-3-letter-code-whole-process} shows the case
of a polymer sequence for which the polymer chemistry definition
allows three characters to define monomer codes. The example is based
on the following real-world situation: the user wants to edit the
sequence by insertion---at the cursor point---of a new ``Aspartate''
monomer, of which the user knows only that its code starts with an
`A'. The cursor is located after the first ``Ala'' monomer at position
1 (panel~1st).

After keying-in \kbdKey{A} (panel~1st), no sequence modification is
visible in the sequence editor. Instead, an `A' character is now
displayed in the left line~edit widget under the sequence.  The reason
for this apparently odd behaviour is that the polymer chemistry
definition allows up to 3 characters to describe a monomer code. If no
monomer vignette is displayed in the polymer sequence, that means that
more than one monomer code starts with an `A' character: \xpe\ cannot
figure out which monomer code was actually meant by the user when
keying-in \kbdKey{A}.

There is a way, called \emph{code
  completion}\index{\xpe!code~completion}, to know which monomer
code(s)---in the current polymer chemistry definition---do start with
the keyed-in character(s) (currently, `A'). The user can always enter
the \emph{code completion mode} by hitting the \kbdKey{ENTER}~key.
This is what is shown in the panel~1st, right hand side
\guilabel{Monomer List} listview widget (click on that
\guilabel{Monomer List} label to show that list if it is not already
visible). We see that, in the current polymer chemistry definition,
four monomer codes start with an `A' character, and these are ``Ala'',
``Arg'', ``Asp'' and ``Asn'' (as highlighted in the code completion
monomer list).

Because we now know that the code we are to key-in is ``Asp'', we
key-in a \kbdKey{s}. The result is shown in panel~2nd. What we see
here is that, this time also, nothing changed in the polymer sequence.
What changed is that the character string in the left line~edit widget
below the sequence is now ``As''.  Let's press the
\kbdKey{ENTER}~key once more.  This time, only two items are highlighted:
``Asp'' and ``Asn'' in the code completion monomer list (panel~2nd).
This is easy to understand: there are only two monomer codes that
start with the two letters `A' and `s' (``As'') that we have keyed-in
so far.  At this time, we key-in a last character: \kbdKey{p}. At this
point, the monomer is effectively inserted in the polymer sequence, as
the ``Asp'' monomer left of the cursor, as shown in panel~3rd.

\renewcommand{\sectitle}{Unambiguous Single-/Multi-Character Monomer Codes}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}

Let's imagine that we have a polymer chemistry definition that allows
up to 3 characters for the definition of monomer codes, but that we
have one of these monomer codes (let's say the one for the
``Glutamate'' monomer) that is one-letter-long: `E'. This monomer code
`E' is the only one in the polymer chemistry definition to start with
an `E' character. In this case, when we key-in \kbdKey{E}, we'll
observe that the monomer code is immediately validated and that its
corresponding monomer vignette is also immediately inserted in the
polymer sequence.  This is because, \emph{if there is no ambiguity,
  \xpe\ will immediately validate the code being edited}.

The mechanism described above means that the user is absolutely free
to define \emph{only single-character monomer codes} in a polymer
chemistry definition; the program then behaves exactly as if the
multi-character code feature did not exist: each time a new uppercase
letter is keyed-in, it is automatically validated and the
corresponding monomer is created in the sequence.
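
The keying-in behaviour described in the two subsections above can be
sketched in a few lines of Python. This is only an illustration of the
idea, not the code used by \xpe: the monomer code list and the
function names \texttt{candidates} and \texttt{key\_in} are
hypothetical, and the sketch assumes that a code is validated as soon
as the keyed-in characters match exactly one monomer code in full.

\begin{verbatim}
# Hypothetical sketch of monomer code completion.
# CODES, candidates() and key_in() are illustrative names only.
CODES = ["Ala", "Arg", "Asn", "Asp", "Gly", "E"]


def candidates(prefix, codes=CODES):
    """Return the codes starting with the keyed-in prefix."""
    return [code for code in codes if code.startswith(prefix)]


def key_in(buffer, char, codes=CODES):
    """Handle one keyed-in character.

    Returns (new_buffer, inserted_code). inserted_code is None
    while the prefix is still incomplete or when the character
    is rejected.
    """
    prefix = buffer + char
    matches = candidates(prefix, codes)
    if not matches:
        # Bad character: ignored, the buffer is unchanged.
        return buffer, None
    if matches == [prefix]:
        # No ambiguity left: the code is validated.
        return "", prefix
    # More characters are needed to identify the code.
    return prefix, None
\end{verbatim}

In this sketch, keying-in \texttt{A} and then \texttt{s} leaves the
buffer at \texttt{A} and then \texttt{As} without inserting anything,
while a final \texttt{p} validates \texttt{Asp}. Keying-in \texttt{E}
validates the single-character code immediately, because no other
code in the list starts with that letter.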


\renewcommand{\sectitle}{Erroneous Monomer Codes}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\index{\xpe!monomer~code~errors}

The typing error detection system triggers immediate alerts whenever
the code being keyed-in is incorrect. This is illustrated in
Figure~\vref{fig:xpertedit-sequence-editor-bad-char}. If the user
enters an uppercase character not matching any monomer code currently
defined in the polymer chemistry definition, or a lowercase character
as the first character of a monomer code, the program immediately
complains in the right line~edit widget below the sequence. In this
case, the monomer code is not put into the left text widget, which
means it is simply ignored.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.8\textwidth]
    {figures/xpertedit-sequence-editor-bad-char.png}
  \end{center}
  \caption[Bad code character in \xpe\ sequence editor]{\textbf{Bad
      code character in \xpe\ sequence editor.} This figure shows the
    feedback that the user is provided by the code editing engine,
    when a bad character code is keyed-in.}
  \label{fig:xpertedit-sequence-editor-bad-char}
\end{figure}

If the user starts keying-in valid monomer code characters, as we did
earlier with ``As'', and then wants to erase these characters, she
\emph{must not} use the \kbdKey{BACKSPACE} key, because this key will
erase the monomer left of the cursor point in the polymer sequence!
To remove the characters currently displayed in the left line~edit
widget below the sequence, the user must press the \kbdKey{Esc} key
once for each character. For example, let's say the user has already
keyed-in \kbdKey{A} and \kbdKey{s}. In this case the left line~edit
widget displays these two characters: ``As''. Now, if the user changes
her mind and wants to enter the ``Gly'' monomer code instead of
``Asp'', all she has to do is press the \kbdKey{Esc} key once for the
`s' character (which disappears) and once more to remove the
remaining `A' character.  At this point it is possible to start fresh
with the ``Gly'' monomer code by keying-in sequentially \kbdKey{G},
\kbdKey{l} and finally \kbdKey{y}.


\renewcommand{\sectitle}{Simplified Editing}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}

When the monomer codes of a given polymer chemistry definition are too
numerous or too long to remember, one simplified editing strategy is
to use the list of available monomers located on the right side of
the sequence editor (widget labelled \guilabel{Monomer list}). The
items in the list are active: double-clicking an item inserts its
corresponding monomer code in the sequence at the current
cursor location. This list thus makes it easy to ``visually'' edit the
polymer sequence without having to remember all the codes in the
polymer chemistry definition.


\renewcommand{\sectitle}{Finding sequence motifs}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\index{\xpe!sequence~editor!find~sequence~motif}

Finding sequence motifs in the polymer sequence is performed by
selecting the \guimenu{Edit}\guimenuitem{Find Sequence} menu item. The
dialog window is shown in
Figure~\vref{fig:xpertedit-find-sequence-dlg}. When performing the
first search in a polymer sequence, the \guilabel{Find} button should
be used. This will trigger a search starting at the beginning of the
polymer sequence. For each successive search, the \guilabel{Next}
button should be used.

Each searched sequence motif will be stored in a history list that is
made available by dropping down the combo box widget where the
sequence motif is entered. The \guilabel{Clear history} button will
erase all the searched sequence motifs from the history, thus
resetting it.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.5\textwidth]
    {figures/xpertedit-find-sequence-dlg.png}
  \end{center}
  \caption[Finding a sequence motif in the polymer
  sequence]{\textbf{Finding a sequence motif in the polymer sequence.}
    The first iteration should be performed by clicking onto the
    \guilabel{Find} button, and each following iteration should be
    performed using the \guilabel{Next} button.}
  \label{fig:xpertedit-find-sequence-dlg}
\end{figure}


\renewcommand{\sectitle}{Importing Sequences}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\index{\xpe!sequence~editor!sequence~import}

Very often, the user will make a sequence search on the web and be
provided with a polymer sequence that is riddled with non-code
characters. That web output might either be saved in a text file for
future reference or copied to the clipboard for immediate use in \mXp.
The two cases are reviewed below.


\renewcommand{\sectitle}{Importing From The Clipboard}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}

\xpe\ provides a convenient way to spot non-valid characters in a text
and to let the user ``purify'' the imported sequence.  A
clipboard-imported sequence is systematically parsed. When invalid
characters are found, the window depicted in
Figure~\vref{fig:xpertedit-sequence-editor-sequence-import-errors-first}
is presented to the user for her to make appropriate adjustments (in
this example we tried to copy from clipboard the following sequence:
``\texttt{!100 ATGCATGC ATGCATGC ATGCATGC ATGCAUGC
  anotherSilly-Text;}'').

\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-sequence-editor-sequence-import-errors-first.png}
  \end{center}
  \caption[Clipboard-imported sequence
  error-checking]{\textbf{Clipboard-imported sequence error-checking.}
    If a sequence that is imported through the clipboard to the \xpe\
    sequence editor contains invalid characters, the user is provided
    with a facility to ``purify'' the sequence. This facility is
    provided to the user through the window depicted in this figure.}
  \label{fig:xpertedit-sequence-editor-sequence-import-errors-first}
\end{figure}

As soon as a character does not correspond to any valid monomer code,
it is tagged, and the sequence is presented to the user in a text~edit
widget (\guilabel{Initial Sequence}) with all the improper
characters tagged by underlining. At that point, if the user clicks
the \guilabel{Remove Tagged From Initial} button, all the tagged
characters will be automatically removed and the purified sequence
will show up in the \guilabel{Purified Sequence} text~edit widget.

Also, the user is provided with automatic ``purification'' procedures
whereby it is possible to remove one or more classes of characters
from the imported sequence (\guilabel{Purification Options} frame
widget). Checking one or more of the \guilabel{Numerals} or
\guilabel{Spaces} or \guilabel{Punctuation} or \guilabel{LowerCase} or
\guilabel{Uppercase} checkbuttons, or even entering other
user-specified regular expressions in the \guilabel{Other (RegExp)}
line~edit widget, will elicit their removal from the imported sequence
after the user clicks the \guilabel{Purify Initial (Options)} button.
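
The effect of these purification options can be illustrated with a
short Python sketch. It is not the \xpe\ implementation: the mapping
of option names to regular expressions, the function names
\texttt{purify} and \texttt{invalid\_positions}, and the
single-character code set \texttt{ATGC} are all assumptions made for
the sake of the example.

\begin{verbatim}
import re

# Hypothetical mapping of purification options to regexps.
CLASSES = {
    "numerals": r"[0-9]",
    "spaces": r"\s",
    "punctuation": r"[^\w\s]",
    "lowercase": r"[a-z]",
    "uppercase": r"[A-Z]",
}


def purify(text, options=(), other_regexp=None):
    """Remove each selected character class from text."""
    for name in options:
        text = re.sub(CLASSES[name], "", text)
    if other_regexp:
        text = re.sub(other_regexp, "", text)
    return text


def invalid_positions(text, valid_codes):
    """Indexes of characters that are not valid codes.

    Assumes single-character monomer codes.
    """
    return [i for i, ch in enumerate(text)
            if ch not in valid_codes]


raw = ("!100 ATGCATGC ATGCATGC ATGCATGC ATGCAUGC "
       "anotherSilly-Text;")
clean = purify(
    raw, ["numerals", "spaces", "punctuation", "lowercase"])
still_bad = invalid_positions(clean, "ATGC")
\end{verbatim}

With the example sequence used above, removing numerals, spaces,
punctuation and lowercase characters still leaves two characters that
are invalid under the assumed \texttt{ATGC} code set (the `U' and the
`S' left over from the trailing text), so a second check of the
purified sequence remains useful.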

\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-sequence-editor-sequence-import-errors-second.png}
  \end{center}
  \caption[Clipboard-imported sequence
  purification]{\textbf{Clipboard-imported sequence purification.}
    There are a number of ways to purify a sequence. Here the
    \guilabel{Remove Tagged From Initial} button was clicked. The
    purified sequence shows up in the \guilabel{Purified Sequence}
    text~edit widget.}
  \label{fig:xpertedit-sequence-editor-sequence-import-errors-second}
\end{figure}


When the user is confident that almost all the erroneous characters
have been removed
(Figure~\vref{fig:xpertedit-sequence-editor-sequence-import-errors-second}),
she can click the \guilabel{Test Purified} button, which will trigger
a ``re-reading'' of the sequence in the \guilabel{Purified Sequence}
text~edit widget. If erroneous characters are still found, they are
tagged.

Note that, for maximum flexibility, the user is allowed an immediate
and direct editing of the purified sequence in the \guilabel{Purified
  Sequence} text~edit widget (that is, that text~edit widget is
\emph{not} read-only).

Once the sequence is finally purged of all the invalid characters,
the user can select it in the text~edit widget and paste it in the
\xpe\ sequence editor. This time, the paste operation will be
error-free.  Note that if any sequence portion is currently selected,
it will be replaced by the one that is being pasted into the editor.


\renewcommand{\sectitle}{Importing From Raw Text Files}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}


It might be of interest to be able to import a sequence from a raw
file. To this end, the user is provided with the
\guimenu{File}\guimenuitem{Import Raw} menu item, which opens up a file
selection window from which to choose the file to import. The program
then iterates over the lines of that file and checks their contents for
validity. If errors are found, then the same process as described
earlier for clipboard-imported sequences is started. The user can then
purify the sequence imported from the file and finally integrate that
sequence in the polymer sequence currently edited. Note that if any
sequence portion is currently selected, it will be replaced by the one
that is being imported.


\renewcommand{\sectitle}{Multi-region
  Selections}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\index{\xpe!multi-region~selections}
\index{\xpe!sequence~editor!multi-region~selection}

\mXp\ implements a sophisticated multi-region selection model. Two
selection modes are available: \smallskip

\begin{itemize}

\item \emph{Multi-region selection mode:}\/ In this mode, it is
  possible to select more than one region in the polymer sequence. In
  all cases below, make sure that the \guilabel{Multi-region}
  checkbutton is checked in the \guilabel{Selections and regions}
  group box. This is how these selections are performed:

  \begin{itemize}

  \item \textsl{With the
      mouse:}\index{\xpe!sequence~editor!mouse~selections}
    Left-click and drag to make the first selection. Move the mouse
    cursor to the beginning of the new selection, hold the \kbdKey{Ctrl}
    key down while left-clicking and dragging to perform the second
    region selection. Continue as many times as necessary;

  \item \textsl{With the
      keyboard:}\index{\xpe!sequence~editor!keyboard~selections}
    Position the cursor at the beginning of the first region to be
    selected, hold the \kbdKey{Ctrl}+\kbdKey{Shift} keys down while
    moving the cursor with the direction keys (\kbdKey{$\leftarrow$},
    \kbdKey{$\rightarrow$}, \kbdKey{$\uparrow$},
    \kbdKey{$\downarrow$}). Hold the \kbdKey{Ctrl} key down and use
    the direction keys to go to the beginning of the new region
    selection, press the \kbdKey{Shift} key and hold it down while
    moving the cursor with the direction keys to actually perform the
    region selection.

  \end{itemize}

\item \emph{Multi-selection region mode:}\/ In this mode (which
  requires the multi-region selection mode to be enabled), it is
  possible to perform selections that overlap. For example, one could
  select the sequence ``MAMISGM'' and then select the sequence
  ``SGMSGRKAS''. The overlapping sequence is thus ``SGM''.

\end{itemize}

\noindent Being able to select multiple regions and/or to select
the same region multiple times requires some configuration, as far as
calculating the relevant masses is concerned. Indeed, whatever the
selection mode that is enabled, each time one selection (overlapping
with another or not) is added or removed, masses are recalculated for
the current selection.\footnote{``Selection'', here, is thus used to
  collectively represent all multi-region selections and
  multi-selection regions at any given time in the polymer sequence
  editor.} The way the multi-region selections and the multi-selection
regions are handled, from the mass calculation standpoint, is
configured as follows:\\

\begin{itemize}

\item \emph{Regions are oligomers:} In this configuration, each
  selection behaves as an oligomer, and thus should normally be capped
  on both its left and right ends. This is typically the situation
  when the user wants to simulate the formation of a cross-linked
  species arising from the cross-linking of two oligomers: each
  oligomer is capped on both its ends;

\item \emph{Regions are residual chains:} In this configuration, each
  selection behaves as a residual chain, and thus the oligomer
  resulting from the multi-region selections is capped on its left and
  right ends only once. This situation is typically encountered when
  simulating partial cleavages by first selecting an oligomer,
  checking its mass and then continuing selection to simulate a longer
  oligomer resulting from a partial cleavage. Also, the situation
  might be encountered when there are multiple repeated sequence
  motifs in a polymer sequence and mass data are difficult to analyze.

\end{itemize}
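
The difference between the two configurations can be summarized with a
short worked formula. The symbols used here are illustrative notation
only and are not names used by the program: $m_i$ is the sum of the
monomer masses in region~$i$ out of $n$ selected regions, while $c_L$
and $c_R$ are the masses of the left and right caps defined in the
polymer chemistry definition. Ionization is left out of this sketch.
\[
  M_{\mathrm{oligomers}} = \sum_{i=1}^{n} \left( m_i + c_L + c_R \right)
  = \sum_{i=1}^{n} m_i + n \, (c_L + c_R)
\]
\[
  M_{\mathrm{residual\ chains}} = \sum_{i=1}^{n} m_i + c_L + c_R
\]
The two results thus differ by $(n-1)(c_L + c_R)$. Assuming, as is usual
for proteins, a hydrogen (H) left cap and a hydroxyl (OH) right cap,
selecting two regions yields masses that differ by one water molecule,
that is, about 18.011~Da (monoisotopic).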


\renewcommand{\sectitle}{Polymer Sequence Modification}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

It very often happens that the (bio)chemist uses chemical
reactions to modify the polymer sequence she is working on. Mass
spectrometry is then often used to check if the reaction proceeded
properly or not. Further, in nature, chemical modifications of
biopolymer sequences are very often encountered. For example, protein
sequences often get modified as a means to regulate their function
(phosphorylations, for example, or acetylations, methylations\dots).
Nucleic acid sequences are very often and extensively modified with
modifications such as methylation\dots

It is thus crucial that \mXp\ be able to model with high precision and
flexibility the various chemical reactions that can be either made in
the chemistry lab or found in nature. The \mXp\ program provides two
different chemical modification processes:

\begin{itemize}
\item A process by which monomers belonging to the polymer sequence
  can be individually modified;
\item A process by which the whole polymer sequence can be modified,
  either on its left end or on its right end or even on both ends.
\end{itemize}

\renewcommand{\sectitle}{Selected Monomer(s) Modification}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\label{subsect:chemical-modification-monomers}
\index{\xpe!simulations!monomer~modification}

There are a number of manners in which monomers can be modified in a
polymer sequence. Figure~\vref{fig:xpertedit-modify-monomer} shows the
simplest manner: the user first selects the monomer vignette to be
modified and calls the \guimenu{Chemistry}\guimenuitem{Modify
  Monomer(s)} menu.  A window shows up where all the modifications
currently available in the polymer chemistry definition are listed.
Because a monomer vignette was initially selected in the editor
window, the \guilabel{Selected Monomer} target radiobutton is on by
default.\footnote{Note that if a sequence was selected when the
  monomer modification task was started, then selecting
  \guilabel{Current selection} would be required to modify all the
  monomers in the selection. Alternatively, if this is not what is
  required, re-selecting the right monomer in the sequence and
  selecting \guilabel{Current selection} will ensure the modification
  applies only to the currently selected monomer.}  It is then simply
a matter of choosing the right modification from the
\guilabel{Available modifications} list and clicking the
\guilabel{Modify} button. The target(s) of a given modification (as
selected in the \guilabel{Target} frame widget) can be identified
according to: \smallskip

\begin{figure}
  \begin{center}
    \includegraphics[width=1\textwidth]
    {figures/xpertedit-modify-monomer.png}
  \end{center}
  \caption[Modification of a monomer in a polymer
  sequence]{\textbf{Modification of a monomer in a polymer sequence.}
    This figure shows how the chemical modification of monomer(s) can
    be performed.}
  \label{fig:xpertedit-modify-monomer}
\end{figure}

\begin{itemize}

\item The \guilabel{Selected Monomer} frame will display data in its
  two line~edit widgets if a single monomer vignette was selected at
  the time the monomer modification action was invoked (exactly as in
  Figure~\vref{fig:xpertedit-modify-monomer}). Only the monomer whose
  code and position are displayed will be modified (even if it is no
  longer selected or if the sequence has changed and the monomer at
  the displayed position is not the same anymore);

\item The \guilabel{Current Selection} radiobutton widget indicates
  that the modification should be performed on all the monomers that
  are \textit{currently} selected, that is, if the selection changed
  after the modification window was displayed, the new selection is
  modified, not the old one;

\item \guilabel{Monomers Of Same Code}: if a monomer code is
  displayed in the \guilabel{Selected Monomer} frame, all the monomers
  in the sequence that have that code are modified;

\item \guilabel{Monomers From The List}: all the monomers in the
  polymer sequence having a code corresponding to any code selected in
  the \guilabel{Available Monomers} list are modified;

\item \guilabel{All Monomers}: all the monomers of the polymer sequence
  are modified.

\end{itemize}
%
Note that there is one checkbox widget (\guilabel{Override target
  limitations}) that requires explanation. In the chapter about the
definition of polymer chemistries (chapter~\vref{chap:xpertdef}) the
definition of modifications was detailed, and the notion of target was
explained. If, during a monomer modification, \mXp\ detects that the
user is trying to modify a monomer that is not a target of the
modification at hand, it will complain, as shown in the
\guilabel{Messages} text~edit widget of
Figure~\vref{fig:xpertedit-modify-monomer}. In this example, indeed,
the user tried to modify monomer \emph{Isoleucine} with
\emph{Phosphorylation}, which is not possible because modification
\emph{Phosphorylation} has been defined as not having monomer
\emph{Isoleucine} as any of its targets. Another situation where
target limitations might show up is when trying to modify a monomer
more times than authorized by the \guilabel{Max. count} value, that
is, the number of times that monomer might be modified at once with
that modification. For example, when working on the methylation of
proteins, it might happen that lysyl residues get methylated more than
once (tri-methylation occurs often in histones). If the chemical
modification was defined in \xpd\ with a max count of 2 and a third
chemical modification is asked on a given target monomer, then the
program refuses to perform the modification. To override this
limitation, check the \guilabel{Override target limitations} checkbox
widget.
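
As a worked illustration of the \guilabel{Max. count} limitation,
assume that a methylation is defined as the net addition of CH$_2$ to
the target monomer. This is a hypothetical definition chosen for the
sake of the example; the actual formula is whatever the polymer
chemistry definition states. Each methylation then adds, in
monoisotopic terms,
\[
  \Delta m_{\mathrm{methyl}} = 12.00000 + 2 \times 1.00783
  = 14.01565\ \mathrm{Da},
\]
so that a tri-methylated lysyl residue carries
\[
  3 \times 14.01565 = 42.04695\ \mathrm{Da}
\]
more than the unmodified residue. With a max count of 2, only the first
two of these three increments would be accepted unless the limitation
is overridden.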


The general idea is the following: the \guilabel{Override target
  limitations} checkbox widget is unchecked by default so that the
user does not make mistakes unknowingly. However, flexibility is
desirable, and the \guilabel{Override target limitations} checkbox
widget can be checked if required.

As a result of the monomer modification, the monomer vignette gets
modified. Figure~\vref{fig:xpertedit-modify-monomer} shows one
phosphorylated seryl residue at position 8: a transparent graphics
object (a red `P') was overlaid onto the corresponding seryl monomer
vignette. If the user modifies a monomer with a modification that has
no corresponding \fileformat{svg} file defined for its graphical
rendering in file \filename{modification\_dictionary}, then a default
modification rendering is used.

The user is responsible for correctly reading the messages that might
be published in the \guilabel{Messages} text~edit widget. It is
important to understand that, when a monomer is modified, its previous
modification (if any) is overwritten with the new one. The user is
invited to experiment a bit with the monomer modification process, so
as to be confident of the results that she is going to obtain when
real polymer chemistry work is to be modelled in \mXp.

If the modification to be applied is not readily available in the list
of modifications defined in the polymer chemistry definition, then it
is possible to define a modification manually by checking the
\guilabel{Define modification} check button widget. This procedure
leads to the modification of the target monomer(s) exactly as if the
modification had been selected from the list of available
modifications. But, because the modification has a name not known to
the polymer chemistry definition, the editor cannot modify the monomer
vignette with a predefined transparent raster image. Thus, as seen in
Figure~\ref{fig:xpertedit-modify-monomer-manually-defined-modif}, the
modified residue gets visually modified using the default transparent
raster image (four question marks, one at each corner of the monomer
vignette square).

\begin{figure}
  \begin{center}
    \includegraphics[width=0.66\textwidth]
    {figures/xpertedit-modify-monomer-manually-defined-modif.png}
  \end{center}
  \caption[Rendering of a monomer modification in a polymer
  sequence]{\textbf{Rendering of a monomer modification in a polymer
      sequence.}  This figure shows how the chemical modification of
    monomer(s) is graphically rendered. The `K' residue is modified
    using an ``Acetylation'' modification. The `S' residue is modified
    with a modification that has no associated graphical vignette. The
    default vignette is thus used.}
  \label{fig:xpertedit-modify-monomer-manually-defined-modif}
\end{figure}

It is perfectly feasible to modify a single monomer more than once
(with the same modification or not; for example, a tri-methylation
with a methylation modification). This is why, when the window depicted
in Figure~\ref{fig:xpertedit-modify-monomer} shows up, the two lists
at the right hand side show the monomers currently modified and the
modification(s) that are currently set to these modified
monomers. Selecting one item from the \guilabel{Modified monomers}
list will show only the modifications set to that monomer in the
\guilabel{Modifications} list. If all the modifications in the polymer
sequence are to be displayed, then checking the \guilabel{All
  modifications} check box widget will trigger the display of all the
modifications set to any monomer in the whole polymer sequence.

Unmodification of monomers is easily performed by selecting any number
of items from the \guilabel{Modifications} list and clicking the
\guilabel{Unmodify} button.

\fbox{\parbox{0.9\textwidth}{\textsl{It should be noted that once a
      monomer modification dialog window has been opened, the polymer
      sequence should not be edited. This is because the
      modification/unmodification process takes for granted that the
      polymer sequence still is identical to what it was when the
      monomer modification dialog was opened. Mechanisms are there to
      ensure that the irreparable does not happen, but this warning is
      in order.}}}


\renewcommand{\sectitle}{Whole Sequence Modification}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\index{\xpe!simulations!polymer~modification}

As described above, it is possible to modify any monomer in the
polymer sequence; when any modified monomer is removed, the
modification associated with it also disappears. The modifications that
we describe here are not of this kind. They can be applied to either
the left end of the polymer sequence or its right end (or both ends at
any given time).  But these modifications do belong to the polymer
sequence \textit{per se} and are not removed from it---even if the
polymer sequence is edited by removing the left end monomer or the
right end monomer.  This is why these modifications are \emph{polymer
  modifications} and not monomer modifications.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.66\textwidth]
    {figures/xpertedit-modify-polymer.png}
  \end{center}
  \caption[Modification of the left end of a polymer
  sequence]{\textbf{Modification of the left end of a polymer
      sequence.} This figure shows how simple it is to permanently
    modify a polymer sequence on either or both its left/right ends.}
  \label{fig:xpertedit-modify-polymer}
\end{figure}

The way in which a polymer sequence is modified using \emph{polymer
  modifications} is much easier than the previous \emph{monomer
  modifications} case. The modification window is opened by choosing
the \guimenu{Chemistry}\guimenuitem{Modify Polymer} menu.
Figure~\vref{fig:xpertedit-modify-polymer} shows that window. The
modification is easy to perform, with clear feedback
provided to the user (by listing the permanent modifications in two
line~edit widgets located in front of the \guilabel{Target}
checkbuttons \guilabel{Left End} and \guilabel{Right End}).

Note that, as a convenience for the user, it is possible to modify the
polymer sequence using an arbitrary modification in the form of a
combination of a name and a formula (check the \guilabel{Define
  modification} checkbox, to that effect). The modification object
used is created on-the-fly by the program and gets saved in the file
as if the user had selected a modification out of the list of
available modifications. In the example
(Figure~\vref{fig:xpertedit-modify-polymer}), the polymer sequence was
modified on its left end using the ``Acetylation'' modification
available in the polymer chemistry definition and was amidated
(formula \guivalue{-OH+NH2}) with a manually-defined modification
called \guivalue{MyModif}. The polymer sequence editor window displays
the left end and right end modifications as labels of buttons located
in the \guilabel{Polymer modifications} groupbox.
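
The mass effect of the two end modifications of this example can be
checked by hand. The amidation formula \guivalue{-OH+NH2} is taken from
the example above; the net composition assumed below for
``Acetylation'' (C$_2$H$_2$O) is an assumption made for illustration,
the authoritative formula being the one stored in the polymer chemistry
definition. Using monoisotopic masses:
\[
  \Delta m_{\mathrm{amidation}} = -(15.99491 + 1.00783)
  + (14.00307 + 2 \times 1.00783) = -0.98402\ \mathrm{Da}
\]
\[
  \Delta m_{\mathrm{acetylation}} = 2 \times 12.00000 + 2 \times 1.00783
  + 15.99491 = 42.01057\ \mathrm{Da}
\]
Under these assumptions, a polymer carrying both modifications would be
heavier than the unmodified one by $42.01057 - 0.98402 =
41.02655$~Da, provided the mass calculation engine is configured to
take the left and right end modifications into account.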


\renewcommand{\sectitle}{Monomer Cross-linking}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{subsect:monomer-cross-link}
\index{\xpe!monomer~cross-linking}

A cross-link is a covalent bond that links a monomer with one
or more other monomers. A monomer might be cross-linked more than once.
The dialog window in which the user might define cross-links is shown
in Figure~\ref{fig:xpertedit-cross-link-monomers}.

\begin{figure}
  \begin{center}
    \includegraphics[width=1\textwidth]
    {figures/xpertedit-cross-link-monomers.png}
  \end{center}
  \caption[Cross-linking of monomers]{\textbf{Cross-linking of
      monomers.}  This figure shows the window in which monomers can
    be cross-linked together. A cross-link (as defined in the current
    polymer chemistry definition) is selected and the targets are
    specified in the \guilabel{Targets' positions} text line edit
    widget in the form of monomer positions separated by semicolons
    (`;').}
  \label{fig:xpertedit-cross-link-monomers}
\end{figure}

Cross-linkers were defined in the section about \xpd\ (see
page~\pageref{sect:cross-linkers}). A cross-linker might either define
no modification to be applied to the cross-linked monomers or the same
number of modifications as there are monomers cross-linked. For
example, fluorescent proteins have a chromophore that is made by
reaction of three residues (Threonyl [or Seryl]--Tryptophanyl [or
Tyrosinyl or Phenylalanyl]--Glycyl), as shown in
Figure~\ref{fig:xpertedit-cross-linked-monomers}. When cross-linking
with the fluorescent protein cross-linker, there must be three
monomers involved, as there are three modifications defined in the
cross-linker.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.4\textwidth]
    {figures/xpertedit-cross-linked-monomers.png}
  \end{center}
  \caption[Graphical rendering of cross-linked
  monomers]{\textbf{Graphical rendering of cross-linked monomers.}
    This figure shows the three monomers (TWG) from cyan fluorescent
    protein cross-linked together.}
  \label{fig:xpertedit-cross-linked-monomers}
\end{figure}

When any monomer involved in a cross-link is removed from a polymer
sequence, the cross-link(s) it was involved in are automatically
dissolved and destroyed. Destruction of a cross-link might be
performed by selecting the cross-link in the \guilabel{Cross-links}
list widget at the right hand side of the dialog window depicted in
Figure~\ref{fig:xpertedit-cross-link-monomers} and by clicking the
\guilabel{Uncross-link} button.


\renewcommand{\sectitle}{Sequence Cleavage}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:cleave-polymer-sequences}
\index{\xpe!simulations!sequence~cleavage}

It happens very often that polymer sequences get cleaved in a
sequence-specific manner. These specific cleavages do occur very often
in nature, and are made by enzymes that do cleave biopolymer
sequences, like the glycosidases (cleaving saccharides), the proteases
(cleaving proteins) or the nucleases (cleaving nucleic acids). But the
scientist also uses purified enzymes or chemicals to perform such
cleavages in the test tube.  \mXp\ must be able to perform those
cleavages \textit{in silico}.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.9\textwidth]
    {figures/xpertedit-cleavages.png}
  \end{center}
  \caption[Polymer sequence cleavage window]{\textbf{Polymer sequence
      cleavage window.}  This figure shows the window in which polymer
    sequence cleavage is performed.  One cleavage specification is
    selected and the number of allowed partial cleavages is set. The
    results are displayed in the same window. The cleavage might be
    performed on the currently selected polymer sequence region or the
    whole sequence. It is possible to stack oligomers from different
    cleavage simulations in the same window.}
  \label{fig:xpertedit-cleavages}
\end{figure}

It is a matter of having a polymer sequence opened in an editor window
and selecting the \guimenu{Chemistry}\guimenuitem{Cleave} menu. The
user is provided with a window where a number of cleavage
specifications are listed (Figure~\ref{fig:xpertedit-cleavages},
page~\pageref{fig:xpertedit-cleavages}) along with options that allow
customizing the production of oligomers.  The cleavage specifications
are listed in the \guilabel{Available cleavage agents} list widget by
looking into the polymer chemistry definition corresponding to the
polymer sequence to be cleaved. The program knows, for example, that
the polymer sequence to be cleaved is of the ``protein-1-letter''
chemistry type, and thus will list all the cleavage specifications
that were defined in that polymer chemistry definition.

The user selects the cleavage specification of interest and sets other
useful parameters, like the number of partial cleavages that the
cleaving agent may yield, for example. Entering \guivalue{0} means
that the cleavage reaction will yield the set of oligomers
corresponding to a total cleavage of the polymer sequence (no missed
cleavages, that is, zero partial cleavages). Also, the user might indicate that the
oligomers computed during the cleavage should be ionized according to
the current ionization rule (displayed in the main window) and in the
specified range. Finally, when the window is opened, the
\guilabel{Oligomer coordinates} group box widget lists the coordinates
of the currently selected region of the polymer sequence. Either leave
the values as they are shown or check the \guilabel{Whole sequence}
check box widget. In the first case, the cleavage will occur only
inside the selected region of the polymer sequence (that is, taking
that region to be the actual polymer sequence of interest); in the
second case, the cleavage will take place in the whole polymer
sequence whatever the currently selected polymer sequence region.
This feature, which was introduced in version 2.3.0, is useful so as
to simulate a first cleavage of a polymer sequence and then a second
cleavage of a selected oligomer using a different cleavage agent. In
protein chemistry, that would be useful to explore possibilities of
double sequential cleavages of a protein, first with EndoAspN, for
example, and then with Trypsin.
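
The number of oligomers to expect from a given setting can be estimated
with a short calculation. This is a sketch that assumes a linear
sequence and assumes that setting the partial cleavages value to $p$
produces all the oligomers having from 0 to $p$ missed cleavages; the
symbols ($s$, $p$, $N$, $z$) are illustrative notation. With $s$ sites
actually recognized by the cleavage agent, a total cleavage yields $s+1$
oligomers, and there are $s+1-k$ oligomers spanning exactly $k$ missed
cleavages, so that
\[
  N(s,p) = \sum_{k=0}^{p} (s + 1 - k)
  = (p+1)(s+1) - \frac{p\,(p+1)}{2}.
\]
If each oligomer is ionized at every charge level from $z_{\min}$ to
$z_{\max}$, the list contains $(z_{\max} - z_{\min} + 1) \times N(s,p)$
entries. For example, with $s=20$, $p=1$ and charge levels 1 to 3:
\[
  N(20,1) = 21 + 20 = 41, \qquad 3 \times 41 = 123.
\]
This happens to match the 123 oligomers reported for the apomyoglobin
simulation discussed later in this chapter, which would be consistent
with 20 cleaved sites in that sequence.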

The user might want to generate oligomers for different kinds of
cleavages. For example, it might be interesting to have in the same
tree view widget the oligomers generated using first trypsin and then
cyanogen bromide. In order to add new oligomers to pre-existing ones,
it is simply required to check the \guilabel{Stack oligomers} check
button widget prior to clicking the \guilabel{Cleave} button again
with the new cleavage settings.

The \guilabel{Details} frame widget at the bottom of the window
displays a number of informative data. In particular, the
\guilabel{Sequence} tab widget displays the sequence of the oligomer
currently selected in the \guilabel{Oligomers} table view along with the
name of the cleavage agent which it arose from. The \guilabel{Cleavage
  Details} tab widget displays the mass calculation engine
configuration at the time the \emph{last} cleavage was performed (a
red LED means that the related feature was off; conversely, a green LED
means that the feature was on). In our example, the mass calculation
for the oligomers did not account for the monomer modifications nor
for the left/right ends of the polymer, nor for the cross-links.

When the user triggers a cleavage, the mass calculation engine
configuration currently set in the sequence editor is used for the
calculation of the mass of the oligomers obtained \textit{per} the
cleavage.  This process allows an easy change in the mass calculation
engine configuration between one cleavage and another so as to allow
comparison of masses obtained for the same cleavage but with different
mass calculation engine configurations.

Finally, one last note: if the list of monoisotopic or average masses
is desired in the form of a text list, right-clicking onto the table
view widget will allow copying to the clipboard either the monoisotopic
or the average masses. Also, it is possible to either export the data
to the clipboard or to a file or even to drag the displayed oligomer
items into a text editor. Only the selected items in the tree view
widget will be exported.

For oligomer data filtering, please refer to
section~\ref{sect:oligomer-data-filtering},
page~\pageref{sect:oligomer-data-filtering}.

\renewcommand{\sectitle}{Spectrum Calculation}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\index{\xpe!simulations!spectrum-calculation}

It is possible to create a full spectrum simulation based on the
oligomers presented in the \guilabel{Oligomers} table widget. For
that, click the \guilabel{Create spectrum} item in the drop-down
menu. This opens the window shown
in Figure~\ref{fig:xpertedit-spectrum-creation-from-cleavages}.


\begin{figure}
  \begin{center}
    \includegraphics[scale=1]
    {figures/xpertedit-spectrum-creation-from-cleavages.png}
  \end{center}
  \caption[Spectrum simulation for cleavage-obtained
  oligomers]{\textbf{Spectrum simulation for cleavage-obtained
      oligomers.}  This figure shows how to configure the calculation
    of a spectrum for a set of oligomers obtained after the cleavage
    of a polymer sequence.}
  \label{fig:xpertedit-spectrum-creation-from-cleavages}
\end{figure}

If the \guilabel{Isotopic cluster} check box is not checked, then the
spectrum will not contain the isotopic cluster for each
oligomer. Instead, a single peak will be calculated, based either on
the monoisotopic or on the average mass of the oligomer that is used
as the peak centroid. When the \guilabel{Isotopic cluster} check box
is checked, the starting mass is evidently monoisotopic as the
isotopic cluster is calculated starting from that mass. Note that the
other parameters have been explained earlier
(see section~\ref{sect:xpertcalc-isotopic-pattern-calculator},
page~\pageref{sect:xpertcalc-isotopic-pattern-calculator}).

Selecting a file in which to write the results (that is, the (x y)
pairs making up the spectrum) is recommended. Otherwise, when the
calculation is finished, refer to the \guilabel{Results} tab page
widget for the same spectrum (x y) pairs.

During the calculation, the \guilabel{Log} tab page widget shows the
details of the running calculation. For example, the following is the
log for the first two oligomers of a set of 123:

{\small
\begin{verbatim}

Simulating a spectrum with calculation of
an isotopic cluster for each oligomer.

There are 123 oligomers. Calculating sub-spectrum for each

Computing isotopic cluster for oligomer 1
	formula: C82H123N22O25.
 Validating formula... Success.
	mono m/z: 1815.9
	charge: 1
	fwhm: 0.18159
	increment: 0.024212

		Done computing the cluster

Computing isotopic cluster for oligomer 2
	formula: C82H124N22O25.
 Validating formula... Success.
	mono m/z: 908.455
	charge: 2
	fwhm: 0.0908455
	increment: 0.00605637

		Done computing the cluster
\end{verbatim}
}
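
The figures in this log can be cross-checked by hand. The relations
below are inferred from the logged values themselves and are not
documented program settings; they are given only as a reading aid. The
two formulas differ by one hydrogen atom and the charge goes from 1 to
2, which suggests that the second entry is the same oligomer carrying
one more charge (taking 1.00783~Da for the added hydrogen):
\[
  \frac{1815.9 + 1.00783}{2} \approx 908.454,
\]
in agreement with the logged 908.455 within the rounding of the first
value. The logged \texttt{fwhm} values are consistent with a peak width
of one ten-thousandth of the m/z value:
\[
  \frac{1815.9}{10000} = 0.18159, \qquad
  \frac{908.455}{10000} = 0.0908455.
\]
Likewise, the logged \texttt{increment} values are consistent with the
peak width divided by $7.5 \times z$, where $z$ is the charge:
\[
  \frac{0.18159}{7.5 \times 1} = 0.024212, \qquad
  \frac{0.0908455}{7.5 \times 2} = 0.00605637.
\]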

The previous example dealt with the horse apomyoglobin that was
cleaved with trypsin, with 1 partial cleavage and charge levels from 1
to 3. That cleavage simulation yielded 123 oligomers, for which a
spectrum was calculated which spans the [49.7--3418] m/z
range. Figure~\ref{fig:xpertedit-spectrum-simulation-cleavage-oligomers}
shows that spectrum, zoomed in the region [744--759]. Four distinct
isotopic clusters are visible:

\begin{figure}
  \begin{center}
    \includegraphics[width=\textwidth]
    {figures/xpertedit-spectrum-simulation-cleavage-oligomers.png}
  \end{center}
  \caption[Simulated spectrum for cleavage-obtained
  oligomers]{\textbf{Simulated spectrum for cleavage-obtained
      oligomers.}  This spectrum (zoomed portion viewed in
    \progname{mMass}) has been simulated starting from a list of
    oligomers obtained by cleaving the horse apomyoglobin protein with
    trypsin.}
  \label{fig:xpertedit-spectrum-simulation-cleavage-oligomers}
\end{figure}

\begin{tabbing}
mono m/z \phantom{room} \= Peptide sequence\phantom{still some roooom here} \= charge\\[2mm]

744.70 \> HPGDFGADAQGAMTKALELFR \> 3+\\[2mm]
748.44 \> ALELFR \> 1+\\[2mm]
751.84 \> HPGDFGADAQGAMTK \> 2+\\[2mm]
753.98 \> KHGTVVLTALGGILK \> 2+\\[2mm]
\> HGTVVLTALGGILKK \> 2+\\[2mm]
\end{tabbing}
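
The first three entries of this list are related, since
HPGDFGADAQGAMTKALELFR is the partial cleavage product that joins
HPGDFGADAQGAMTK and ALELFR. The listed m/z values can be checked for
consistency with a short calculation, assuming that each charge is
carried by one added hydrogen (1.00783~Da) and that joining two
oligomers removes one water molecule (18.01~Da); both assumptions are
made here for illustration. The neutral masses of the two short
peptides are
\[
  M_1 = 2 \times 751.84 - 2 \times 1.00783 \approx 1501.66, \qquad
  M_2 = 748.44 - 1.00783 \approx 747.43,
\]
so that the longer peptide is expected at
\[
  M_{12} = M_1 + M_2 - 18.01 \approx 2231.08, \qquad
  \frac{M_{12} + 3 \times 1.00783}{3} \approx 744.70,
\]
which is the value listed for the 3+ ion.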


Computing a full spectrum starting from oligomers which might have
large masses ($>$\,6000) will require a large amount of CPU time. The
above apomyoglobin example could be handled in $\approx$\,20~s on a
rather powerful laptop (albeit with a single processor used throughout
the task).


\renewcommand{\sectitle}{Oligomer Fragmentation}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:fragment-polymer-sequence}
\index{\xpe!simulations!oligomer~fragmentation}

It happens very often that polymer sequences need to be fragmented in
the gas phase (in the mass spectrometer) so that structure
characterizations may be performed. For protein chemistry, this
happens very often in order to get sequence information for a given
peptide ion selected in the gas phase. \mXp\ must be able to perform
those fragmentations \textit{in silico}.  Let's see how an oligomer
can be fragmented using \mXp.

\begin{figure}
  \begin{center}
    \includegraphics[scale=1]
    {figures/xpertedit-fragmentation.png}
  \end{center}
  \caption[Oligomer fragmentation window]{\textbf{Oligomer
      fragmentation window.}  This figure shows the window in which
    oligomer fragmentation is performed.  One or more fragmentation
    patterns might be selected in one fragmentation step.}
  \label{fig:xpertedit-fragmentation}
\end{figure}

It is a matter of having a polymer sequence opened in an editor window
and selecting the sequence region to be fragmented. Once this is done,
the user selects the \guimenu{Chemistry}\guimenuitem{Fragment} menu.
The user is provided with a window where a number of fragmentation
specifications are listed (Figure~\vref{fig:xpertedit-fragmentation}).
As detailed for the cleavage of polymers, these fragmentation
specifications are listed by looking into the polymer chemistry
definition corresponding to the polymer sequence of which an oligomer
is to be fragmented.

The user selects the fragmentation specification(s) of interest, sets
the ionization range required for the generated fragment oligomers
(the same as for polymer cleavage) and clicks the \guilabel{Fragment}
button.  Upon successful termination of the fragmentation reaction,
the generated fragments are displayed in the \guilabel{Oligomers}
table view widget.

As detailed for the cleavage of polymer sequences, the
\guilabel{Details} frame widget displays data about the fragments
generated and the way masses were calculated for them.

It is possible to take into account cross-links that are borne by
monomers contained in the oligomer. Only cross-links that are fully
contained in the oligomer are taken into account. Partial cross-links,
that is, cross-links that have at least one involved monomer outside
of the oligomer, are ignored.

Figure~\ref{fig:xpertedit-cfp-chromophore-disulfide-bond-no-account-cross-links}
shows the \xpe\ module with the cyan fluorescent protein. The
chromophore is shown as an internal cross-link between residues T, W
and G (net mass change: $-20$~Da). There is also a disulfide bond
involving two cysteine residues (net mass change: $-2$~Da). In this
example, the mass calculation engine did not take into account the
cross-links (see the unchecked \guilabel{Cross-links} check box). When
that check box is checked, the mass calculation engine yields mass
data with a differential of $-22$~Da ($-20-2$): both cross-links have
now been taken into account.
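
The nominal figures quoted above can be refined into monoisotopic
values. The net formulas used here are assumptions made for
illustration (loss of one water molecule and two hydrogen atoms for the
chromophore, loss of two hydrogen atoms for the disulfide bond); the
authoritative formulas are those of the cross-linkers in the polymer
chemistry definition.
\[
  \Delta m_{\mathrm{chromophore}} = -(18.01056 + 2.01565)
  = -20.02621\ \mathrm{Da}
\]
\[
  \Delta m_{\mathrm{disulfide}} = -2 \times 1.00783
  = -2.01565\ \mathrm{Da}
\]
\[
  \Delta m_{\mathrm{total}} = -20.02621 - 2.01565 = -22.04186\ \mathrm{Da}
\]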

\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-cfp-chromophore-disulfide-bond-no-account-cross-links.png}
  \end{center}
  \caption[Two cross-links in the cyan fluorescent protein
  sequence]{\textbf{Two cross-links in the cyan fluorescent protein
      sequence.}  This figure shows two cross-links (T--W--G and C--C)
    set to the cyan fluorescent protein. The mass calculation engine
    is configured not to take these cross-links into account.}
  \label{fig:xpertedit-cfp-chromophore-disulfide-bond-no-account-cross-links}
\end{figure}


\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-cfp-chromophore-disulfide-bond-account-cross-links.png}
  \end{center}
  \caption[Calculations when cross-links are accounted
  for]{\textbf{Calculations when cross-links are accounted for.}  This
    figure shows that the two cross-links shown in
    Figure~\ref{fig:xpertedit-cfp-chromophore-disulfide-bond-no-account-cross-links}
    are now taken into account, which translates into a mass decrease
    of 22~Da.}
  \label{fig:xpertedit-cfp-chromophore-disulfide-bond-account-cross-links}
\end{figure}


If we select the oligomer region [38--77] and ask for a
fragmentation, the fragmentation results take the cross-links into
account only for those generated fragments that fully encompass
them.

The following calculation rationale applies:

\begin{itemize}

\item Fragments b (left end) from b$_1$ (D) to b$_{12}$ (up to I) do
  not take into account the cross-links as both are outside of their
  scope;

\item Fragments b$_{13}$ (up to C) to b$_{34}$ (up to Q) do not take
  into account the cross-links because the outer cross-link (disulfide
  bond between cysteine residues) is not complete (the second cysteine
  is left out of the fragment);

\item Fragments b$_{35}$ (up to C) to b$_{40}$ (up to P) do take into
  account both cross-links because both are contained in the fragments;

\item Likewise, the only y fragments (right end) that do take into
  account the cross-links are fragment y$_{28}$ (up to C) and all
  the longer ones, since these fragments fully contain both
  cross-links.

\end{itemize}
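
\noindent The index arithmetic behind this list can be sketched as
follows. The positions used here are assumptions read off the list
itself: the oligomer [38--77] has $77 - 38 + 1 = 40$ monomers, and the
two cysteine residues sit at oligomer positions 13 and 35 (fragments
b$_{13}$ and b$_{35}$ both end on a C), so that the outer cross-linked
region spans positions 13 to 35. A fragment b$_n$ covers positions
$[1, n]$ and a fragment y$_n$ covers positions $[41 - n, 40]$, hence:
\[
\begin{array}{lcl}
  \mathrm{b}_n \mbox{ accounts for the cross-links}
  & \Longleftrightarrow & n \geq 35, \\
  \mathrm{y}_n \mbox{ accounts for the cross-links}
  & \Longleftrightarrow & 41 - n \leq 13
  \;\Longleftrightarrow\; n \geq 28.
\end{array}
\]
These thresholds match the b$_{35}$ and y$_{28}$ fragments named in
the list.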


\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-fragmentation-cross-linked-oligomer.png}
  \end{center}
  \caption[Complicated cross-linking situation]{\textbf{Complicated
      cross-linking situation.}  This figure shows a complicated
    cross-linking situation with an oligomer that has five
    cross-links, four of which are fully encompassed by the oligomer
    and one that involves a monomer outside of the oligomer.}
  \label{fig:xpertedit-fragmentation-cross-linked-oligomer}
\end{figure}


The calculation of the fragments for this oligomer involves the
following steps:

\begin{itemize}

\item Calculate regions of the oligomer that involve cross-links
  either overlapping or not. The regions are thus the following:
  [3--5], [8--11] and [13--15]. Note that the cross-link involving
  monomer~12 is never taken into account as it involves also a monomer
  outside of the oligomer;

\item For fragments that have the left end of the oligomer (``Left-end
  nomenclature''), the following rationale is used:

  \begin{itemize}

  \item Fragments $\rightarrow$1 and $\rightarrow$2 do not have any
    cross-link;

  \item Fragments $\rightarrow$3 to $\rightarrow$4 do not account for
    cross-link~a because that cross-link is not fully encompassed by
    the fragments;

  \item Fragments $\rightarrow$5 to $\rightarrow$10 account only for
    the cross-link~a as this is the only cross-linked region to be
    fully encompassed by these fragments;

  \item Fragments $\rightarrow$11 to $\rightarrow$14 account for
    cross-links~a, b and c as they are all fully encompassed in the
    fragments;

  \item Fragments $\rightarrow$15 to $\rightarrow$16 account for all
    cross-links, a, b, c, d as they are all fully encompassed in the
    fragments;

  \end{itemize}

\item For fragments that have the right end of the oligomer (``Right-end
  nomenclature''), the following rationale is used:

  \begin{itemize}

  \item Fragments 1$\leftarrow$ and 2$\leftarrow$ do not have any
    cross-link;

  \item Fragments 3$\leftarrow$ and 4$\leftarrow$ do not account for
    cross-link~d because that cross-link is not fully encompassed by
    the fragments;

  \item Fragments 5$\leftarrow$ and 6$\leftarrow$ account for
    cross-link~d because it is fully encompassed in these fragments;

  \item Fragments 7$\leftarrow$ to 9$\leftarrow$ only account for
    cross-link~d because cross-links~b and c (which make one
    cross-linked region) are not fully encompassed by these fragments;

  \item Fragments 10$\leftarrow$ to 14$\leftarrow$ account for
    cross-links~d, c and b, but not for cross-link~a as this last
    cross-link is not fully encompassed in these fragments;

  \item Fragments 15$\leftarrow$ and 16$\leftarrow$ account for all
    the cross-links of the oligomer.

  \end{itemize}

\end{itemize}
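
\noindent The rationale above amounts to an interval-containment test,
sketched here under stated assumptions. The symbols $N$, $r_s$ and
$r_e$ are introduced for illustration only: $N$ is the number of
monomers in the oligomer and $[r_s, r_e]$ is a cross-linked region.
The left-end fragment $\rightarrow n$ covers positions $[1, n]$ and
the right-end fragment $n\leftarrow$ covers positions
$[N - n + 1, N]$, so that:
\[
\begin{array}{lcl}
  \rightarrow n \mbox{ accounts for } [r_s, r_e]
  & \Longleftrightarrow & r_e \leq n, \\
  n\leftarrow \mbox{ accounts for } [r_s, r_e]
  & \Longleftrightarrow & N - n + 1 \leq r_s .
\end{array}
\]
With the regions [3--5], [8--11] and [13--15], the left-end thresholds
are $n \geq 5$, $n \geq 11$ and $n \geq 15$, as listed above. The
right-end thresholds listed above ($n \geq 5$ for [13--15],
$n \geq 10$ for [8--11] and $n \geq 15$ for [3--5]) are reproduced
with $N = 17$; this value is inferred from the listed ranges and is
not stated explicitly in the text.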

\noindent It is necessary to repeat one more time that cross-links
that involve monomer(s) outside of the oligomer are ignored. The user
is alerted whenever this situation is encountered.

Finally, one last note: if the list of monoisotopic or average masses
is desired in the form of a text list, right-clicking onto the table
view widget will allow copying to the clipboard either the
monoisotopic or the average masses. Also, it is possible to
export the data to the clipboard or to a file, or even to drag the
displayed oligomer items into a text editor.

For oligomer data filtering, please refer to
section~\ref{sect:oligomer-data-filtering}, page
\pageref{sect:oligomer-data-filtering}.


\renewcommand{\sectitle}{Mass Searching}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:search-masses-polymer-sequence}
\index{\xpe!mass~searching}

It may happen that the scientist needs to know if some arbitrary
sequence region would have a given mass. \mXp\ allows for mass
searching operations in the polymer sequence. This is done by using
the menu \guimenu{Chemistry}\guimenuitem{Mass Search}. The window
illustrated in Figure~\vref{fig:xpertedit-mass-search} shows up and
the user enters masses to search for. A number of parameters are to be
detailed:
\smallskip

\begin{itemize}

\item \guilabel{Targets} Should the masses be searched for in the
  whole sequence or in the currently selected region?

\item \guilabel{Ionization} When calculating masses for the potential
  oligomers matching the searched mass, should different levels of
  ionization be calculated? For example, one might find in an
  electrospray ionization mass spectrum a peak at \mz{1245}. It is not
  possible to know the ionization level for that ion. One could imagine
  that this value is for a monoprotonated or for a multiprotonated
  species. If we wanted to assess this, we might ask that the mass be
  searched for by computing a range of possible ionization levels
  between \guilabel{Start} \guivalue{1} and \guilabel{End} \guivalue{4}
  (assuming that this is what one would expect for that experiment).

\end{itemize}
%
Once the masses have been searched for, if results are found they are
displayed in the same window in the \guilabel{Oligomers} table view
widgets (the left one for the mono masses and the right one for the
avg masses).
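
To make the ionization example concrete, here is a worked sketch that
assumes ionization by protonation, with $m_{\mathrm{H}} \approx
1.007$~Da added per charge (an assumption: the actual value depends on
the ionization rule in use). For an ion observed at a ratio
$(m/z) = 1245$ with $z$ charges, the neutral mass $M$ would be:
\[
M = z \, \bigl[ (m/z) - m_{\mathrm{H}} \bigr]
\]
\[
\begin{array}{c|cccc}
  z & 1 & 2 & 3 & 4 \\ \hline
  M \mbox{ (Da)} & 1244.0 & 2488.0 & 3732.0 & 4976.0
\end{array}
\]
Searching with ionization levels 1 to 4 thus amounts to looking for
sequence regions whose mass is close to any one of these four
candidates.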


\begin{figure}
  \begin{center}
    \includegraphics[scale=0.75]
    {figures/xpertedit-mass-search.png}
  \end{center}
  \caption[Searching masses in a polymer sequence]{\textbf{Searching
      masses in a polymer sequence.} This figure shows the window in
    which to search for masses in a polymer sequence.}
  \label{fig:xpertedit-mass-search}
\end{figure}


Finally, one last note: if the list of monoisotopic or average masses
is desired in the form of a text list, right-clicking onto the table
view widget will allow copying to the clipboard either the
monoisotopic or the average masses. Also, it is possible to
export the data to the clipboard or to a file, or even to drag the
displayed oligomer items into a text editor.

For oligomer data filtering, please refer to
section~\ref{sect:oligomer-data-filtering}, page
\pageref{sect:oligomer-data-filtering}.


\renewcommand{\sectitle}{Oligomer Data Filtering}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:oligomer-data-filtering}
\index{\xpe!data~filtering}

Oligomer-generating simulations, such as polymer sequence cleavages,
fragmentations or mass searches, may produce a very large amount of
data. It is often desirable to quickly filter some specific
data out of this bulk of data.

In all three simulations mentioned above, the results that are
displayed in the corresponding dialog windows are easily filtered
using the mechanism illustrated in
Figure~\ref{fig:xpertedit-filtering-oligomer-data}.

\begin{figure}
  \begin{center}
    \includegraphics[width=1\textwidth]
    {figures/xpertedit-filtering-oligomer-data.png}
  \end{center}
  \caption[Oligomer data filtering]{\textbf{Oligomer data filtering.}
    This figure shows how oligomer data can be filtered. The
    \guilabel{Filtering options} group box contains four line edit
    widgets where filtering might be triggered: \guilabel{Partial},
    \guilabel{Mono}, \guilabel{Avg}, \guilabel{Charge}. The filtered
    data are displayed in the same window (this example is for polymer
    sequence-cleavage oligomer data).}
  \label{fig:xpertedit-filtering-oligomer-data}
\end{figure}


Filtering on the data is easily performed by entering the options in
the \guilabel{Filtering options} group box
(Figure~\ref{fig:xpertedit-filtering-oligomer-data},
page~\pageref{fig:xpertedit-filtering-oligomer-data}). For any
filtering operation, only one criterion can be used; that is,
filtering can occur on the basis of the monoisotopic
mass or of the average mass, but not on both masses. For example, if
one wanted to filter a huge set of data against a specific
monoisotopic mass of 850 plus or minus 3 atomic mass units, it would
simply be a matter of setting the monoisotopic mass to be
\guivalue{850} with a tolerance of \guivalue{3 AMU} in the
corresponding line edit widgets contained in the \guilabel{Filtering
  options} group box. To perform that filtering action, first set the
tolerance value (\guivalue{3}) in its line edit widget and next set
the monoisotopic mass value to be \guivalue{850} in the corresponding
line edit widget. While the cursor \emph{is still} in the
\guilabel{Mono} line edit where \guivalue{850} was entered, press the
keyboard key combination \kbdKey{Ctrl}+\kbdKey{ENTER}. The filtering
will be immediate and the table view will show the data that passed
the filter. Note that the combo box widget holding the unit of the
tolerance (in the example, that unit is \guilabel{AMU}, that is
``atomic mass unit'') and the line edit widget where the tolerance
value proper is set (\guivalue{3} in the example) do not trigger any
filtering by themselves; these widgets are only useful in conjunction
with other oligomer data: the \guilabel{Mono}, \guilabel{Avg} or
\guilabel{Error} line edit widgets (depending on the dialog window in
which the filtering occurs: cleavage, fragmentation or mass search). In
our example, thus, the filtering could be phrased like this:
---\textsl{``Only show the oligomers for which the monoisotopic mass
  is 850 plus or minus 3 atomic mass units''}.
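
\noindent Written as a condition, and assuming that the tolerance
bounds are inclusive (the text does not say whether they are), the
filter of this example keeps an oligomer of monoisotopic mass $m$
when:
\[
|m - 850| \leq 3
\quad\Longleftrightarrow\quad
847 \leq m \leq 853 .
\]
A hypothetical oligomer at 852.4 would thus be shown, whereas one at
853.6 would be hidden.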

To exit the data filtering mode, simply uncheck the
\guilabel{Filtering options} check box, and all the initial data will
be displayed, irrespective of any data in the line edit boxes
described above.


\renewcommand{\sectitle}{m/z Ratio Calculation}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:m-over-z-ratio-calculation}
\index{\xpe!simulations!m/z~calculations}

In electrospray ionization, a given polymer sequence might be charged
a large number of times. The tool shown in
Figure~\vref{fig:xpertedit-mz-ratio-calculator} computes a
range of m/z ratios starting from one m/z value for a given charge and
a given ionization agent. It is also possible to switch ionization
agent on the fly.

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.8]
    {figures/xpertedit-mz-ratio-calculator.png}
  \end{center}
  \caption[Calculation of ranges of m/z ratios]{\textbf{Calculation of
      ranges of m/z ratios.} This figure shows the window in which to
    perform the calculation of different m/z ratios starting from one
    m/z value with a given ionization agent.}
  \label{fig:xpertedit-mz-ratio-calculator}
\end{figure}
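
The underlying conversion can be sketched as follows, assuming that
each charge is brought by one ionization agent of mass $m_a$ (the
symbols $z_1$, $z_2$ and $m_a$ are introduced here for illustration
only). Starting from a ratio $(m/z)_1$ at charge $z_1$, the neutral
mass $M$ and the ratio at another charge $z_2$ would be:
\[
M = z_1 \, \bigl[ (m/z)_1 - m_a \bigr],
\qquad
(m/z)_2 = \frac{M + z_2 \, m_a}{z_2} .
\]
For example, with protonation ($m_a \approx 1.007$~Da, an approximate
value), a singly charged ion at $(m/z)_1 = 1245$ corresponds to
$M \approx 1243.993$~Da and would show up at
$(m/z)_2 \approx (1243.993 + 2.014)/2 \approx 623.0$ when doubly
charged. Switching the ionization agent amounts to using a different
value of $m_a$ in the second formula.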


\renewcommand{\sectitle}{Monomeric And Elemental Compositions}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:monomeric-elemental-compositions}
\index{\xpe!elemental~composition}
\index{\xpe!monomeric~composition}

The \guimenu{Chemistry}\guimenuitem{Determine Compositions} menu
triggers the window shown in Figure~\ref{fig:xpertedit-compositions}.
The elemental composition is determined using the calculations engine
configuration currently set in the polymer sequence editor window.

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.9]
    {figures/xpertedit-compositions.png}
  \end{center}
  \caption[Determination of the compositions]{\textbf{Determination of
      the compositions.} This figure shows how to determine the
    monomeric and elemental compositions for the whole sequence or the
    current selection.}
  \label{fig:xpertedit-compositions}
\end{figure}


\renewcommand{\sectitle}{pKa, pH, pI and Charges}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:acido-basic-calculations}
\index{\xpe!pKa}
\index{\xpe!pH}
\index{\xpe!pI}

When preparing biochemical experiments, very often users need to know
how many charges a given polymer sequence will bear at any given pH.
Equally important is the ability to know at which pH value the polymer
sequence will have a net charge near to zero. The pH value for which a
given polymer sequence has a net charge near to zero (typically this
means that the number of positive charges equals the number of
negative charges) is called the isoelectric point---the pI.

Such computations are pretty computer-intensive and require a very
precise knowledge of the chemical structure of the different monomers
that take part in the definition of the polymer chemistry. A file,
called \filename{pka\_ph\_pi.xml} is located in the polymer chemistry
definition directory. This file lists all the chemical groups that are
possibly charged; each monomer of the polymer definition is
represented by a \verb|<monomer>| element in which data are defined
for any chemical group of that monomer that might bear a charge at any
given pH. You can find the listing of the \filename{pka\_ph\_pi.xml}
file in chapter~\vref{chap:appendices}.  We will discuss every aspect of
this file's contents in the next sections with enough detail that the
user will be able to write one such file for her specific polymer
chemistry.

At the moment, two entities in the polymer chemistry definition might
have chemical groups bearing charges: monomers and modifications.
We will first review monomers, and modifications next.

\renewcommand{\sectitle}{Ionized Group(s) In Monomers}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}

Monomers are the building blocks of polymer sequences. These blocks
must have at least two reactive groups so that they can be polymerized
into a polymer sequence thread. Reactive groups are often chargeable
groups; for example, the amino group of amino-acids is such that it
gets protonated (positively charged) at a pH lower than its pKa.
Similarly, the carboxylic acid group of amino-acids is deprotonated
(negatively charged) at physiological pH.

\subsubsection*{Some Theory First}

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/protein-monomer-acidobasic-data.png}
  \end{center}
  \caption[Different pKa values for a number of amino-acids' chemical
  groups]{\textbf{Different pKa values for a number of amino-acids'
      chemical groups.} All of the twenty amino-acids are represented
    here, with each amino-acid's lateral chain fully represented.
    Above each chemical group---for which the value makes sense from a
    biological perspective---the pKa value is indicated.}
  \label{fig:protein-monomer-acidobasic-data}
\end{figure}

For the non-biochemist reader, amino-acids involved in the formation
of proteins always have at least two chemical groups that are of
opposite electrical charge at physiological pH values (see
Figure~\ref{fig:protein-monomer-acidobasic-data}):

\begin{itemize}
\item The amino group (called $\rm \alpha NH_2$) has a typical pKa
  value of 9.6. This means that, at physiological pH values (between
  6.5 and 7.5), the amino group will find the environment rather
  acidic, and will thus be protonated, leading to a positively-charged
  species ($\rm \alpha NH_3^+$);
\item The carboxylic group (called $\rm \alpha COOH$) has a typical pKa
  value of 2.35. This means that, at physiological pH values, the
  carboxylic group will be in a rather basic environment, and will
  thus be deprotonated, leading to a negatively-charged species ($\rm
  \alpha COO^-$).
\end{itemize}

\noindent It should be clear that, at physiological pH values the two
$\rm \alpha$ chemical groups have a net charge of 0. But proteins are
charged, and this is because some of the twenty common amino-acids
have other chemical groups beyond the two others already described.
Indeed, some amino-acids have lateral chains that bear groups that
might be charged depending on the pH: seryl residues have an alcohol
group that has a pKa of 13, for example; that means that it is almost
always uncharged (form ROH at physiological pH values). The lateral
chain of lysine has a pKa of 10.53, which means that at pH values
below this pKa value, the $\rm \epsilon NH_2$ gets protonated,
introducing a positive charge in the protein. Similarly, amino-acids
glutamate and aspartate do have a lateral chain ended with a $\rm
\gamma COOH$ and a $\rm \beta COOH$, respectively.  Their pKa values
are below 4.5, and thus the groups are negatively charged at
physiological pH values.
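
These statements can be made quantitative with the
Henderson--Hasselbalch relation. The formulas below are a sketch of
the standard textbook treatment; that the program computes charges
exactly this way is an assumption. For a group whose acidic form is
charged (such as an amine) and for a group whose basic form is charged
(such as a carboxylic acid), the average charge at a given pH would
be, respectively:
\[
q_{+} = \frac{+1}{1 + 10^{\,\mathrm{pH} - \mathrm{pKa}}},
\qquad
q_{-} = \frac{-1}{1 + 10^{\,\mathrm{pKa} - \mathrm{pH}}} .
\]
At pH~7, the $\rm \alpha NH_2$ group (pKa 9.6) would carry
$q_{+} = 1/(1 + 10^{-2.6}) \approx +0.997$ and the $\rm \alpha COOH$
group (pKa 2.35) would carry
$q_{-} = -1/(1 + 10^{-4.65}) \approx -1.000$, which gives the net
charge of about 0 mentioned above for the two $\rm \alpha$ groups.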

When the net charge of a polymer sequence has to be computed for a
given pH condition, the program iterates in the sequence, and for each
monomer will check which one of its chemical group(s) is possibly
charged.  For this to happen, it is required that a number of data be
known for each monomer's chemical group that might play a role in the
determination of the polymer sequence's electrical charge. Thus, for
each chemical group a number of data should be listed in the
\filename{pka\_ph\_pi.xml} file (please see that file in
chapter~\vref{chap:appendices}):

\begin{itemize}
\item the chemical group's \verb|<name>| element is required.
  {\footnotesize Examples: ``$\rm \alpha NH_2$'' or ``$\rm \epsilon
    NH_2$'' or ``$\alpha$COOH'';}
\item the chemical group's \verb|<pka>| element is optional, but is
  the basis for the charge calculation. {\footnotesize Examples: 9.6
    for the ``$\alpha$NH$\rm _2$'' or 2.35 for ``$\alpha$COOH'';}
\item the \verb|<acidcharged>| element is required if the \verb|<pka>|
  element is given. This element is responsible for telling if the
  chemical group is charged (positively) when the pH is lower than pKa
  (that is when the medium is acidic with respect to the pKa).
  {\footnotesize Examples: an amine is positively charged when it is
    in its acidic form (protonated); a carboxylic acid is \emph{not}
    charged when it is in its acidic form;}
\item there can be none, one or more \verb|<polrule>| element(s) for
  each chemgroup. The \verb|<polrule>| element gives information
  about the way the chemical group at hand might be ``trapped'' (or
  not) in the formation of inter-monomer bonds (while the monomer is
  polymerized into the polymer sequence). The value ``left\_trapped''
  means that the chemical group ceases to be involved in charge
  calculations as soon as it has a monomer at its left end. The value
  ``right\_trapped'' means the same as above, but when a monomer is
  polymerized at its right end. For a chemical group that is
  ``left\_trapped'', we understand that it is only effectively
  evaluated if it is at the left end of the polymer sequence, since in
  this case it does not have a monomer at its left side. Conversely, a
  chemical group that has a \verb|<polrule>| element with value
  ``right\_trapped'', will be evaluated only if the monomer is
  actually the right end monomer in the polymer sequence. Finally, the
  typical lateral chains of amino-acids have a \verb|<polrule>|
  element with a value ``never\_trapped'', as these chemical groups do
  not take part in the formation of the inter-monomer bond;
\item there can be none, one or more \verb|<chemgrouprule>| element(s)
  for each chemgroup. A chemgrouprule element should contain the
  following:
  \begin{itemize}
  \item there must be an \verb|<entity>| element that indicates what
    is the chemical entity being dealt with in the current chemgroup
    element.  Valid values for this element are ``LE\_PLM\_MODIF'',
    ``RE\_PLM\_MODIF'' or ``MNM\_MODIF'';
  \item there must be a \verb|<name>| element naming the chemical
    entity properly;
  \item there must be an \verb|<outcome>| element telling what action
    should be taken when encountering the \verb|<entity>| on the
    chemgroup.  Valid values are either ``LOST'' or ``PRESERVED''.
  \end{itemize}
\end{itemize}


\subsubsection*{Understanding By Example}

Let us take some examples in order to make sure we actually understand
the process of describing how an electrical net charge is calculated
for a given polymer sequence and at any given pH value.

Let us see the example of the aspartate amino-acid, of which the
lateral chain is nothing but $\rm CH_2COOH$:

\begin{alltt}
    <monomer>
      <code>D</code>
      <mnmchemgroup>
        <name>N-term NH2</name>
        <pka>9.6</pka>
        <acidcharged>TRUE</acidcharged>
        <polrule>left_trapped</polrule>
        <chemgrouprule>
          <entity>LE_PLM_MODIF</entity>
          <name>Acetylation</name>
          <outcome>LOST</outcome>
        </chemgrouprule>
      </mnmchemgroup>
      <mnmchemgroup>
        <name>C-term COOH</name>
        <pka>2.36</pka>
        <acidcharged>FALSE</acidcharged>
        <polrule>right_trapped</polrule>
      </mnmchemgroup>
      <mnmchemgroup>
        <name>Lateral COOH</name>
        <pka>3.65</pka>
        <acidcharged>FALSE</acidcharged>
        <polrule>never_trapped</polrule>
        <chemgrouprule>
          <entity>MONOMER_MODIF</entity>
          <name>AmidationAsp</name>
          <outcome>LOST</outcome>
        </chemgrouprule>
      </mnmchemgroup>
    </monomer>
\end{alltt}

\noindent We see that the code of the monomer for which acid-basic
data are being defined is `D' and that this monomer has three chemical
groups that might bring electrical charges. These chemical groups are
described by three \verb|<mnmchemgroup>| elements that we will review in
detail below (see Figure~\vref{fig:protein-monomer-acidobasic-data}).

\medskip

The first \verb|<mnmchemgroup>| element is related to the $\rm \alpha
NH_2$ amino group of the amino-acid:

\begin{itemize}
\item \verb|<name>N-term NH2</name>| The name of the chemical group is
  not immediately useful, but will be used when reports are to be
  prepared for the calculation;
\item \verb|<pka>9.6</pka>| This element is optional. However, of
  course, if the chemical group might be electrically charged, the pKa
  value will be essential in order to compute the charge that is
  brought by this chemical group at any given pH;
\item \verb|<acidcharged>TRUE</acidcharged>| This element is also
  optional, however, if the previous element is given, then this one
  is compulsory. Telling if the conjugated acid form is charged (that
  is protonated) is essential in order to know what sign the charge
  has to be when the chemical group is ionized. The value ``TRUE''
  indicates that when the pH is lower than the pKa, the chemical group
  is charged, thus protonated (in the form $\rm NH_3^+$).
  Consequently, if the pH is higher than the pKa, then the chemical
  group is neutral (in the form $\rm NH_2$);
\item \verb|<polrule>left_trapped</polrule>| This element indicates
  that the chemical group should only be taken into account in the
  eventuality that the monomer bearing it (code `D') is the left end
  monomer of the polymer sequence. This can easily be understood, as
  this chemical group is responsible for the establishment of the
  inter-monomer bond towards the left end of the polymer sequence;
\item \verb|<chemgrouprule>| This element provides further details on
  the chemistry that this chemical group might be involved in:
  \begin{itemize}
  \item \verb|<entity>LE_PLM_MODIF</entity>| This element indicates
    that the supplementary data in the current \verb|<chemgrouprule>|
    element are pertaining to the $\rm \alpha NH_2$ chemical group
    \emph{only} in case the polymer sequence is left end-modified
    (that is with a permanent left end modification) and the monomer
    (code `D') is located at the left end of the polymer sequence
    (that is: it is the first monomer of the sequence for which the
    electrical charge---or pI---calculation is to be performed).
  \item \verb|<name>Acetylation</name>| This element goes further in
    the detail of the potential chemistry of the $\rm \alpha NH_2$
    chemical group: if the left end permanent modification is
    ``Acetylation'', then the current chemgrouprule element can be
    further processed, otherwise it should be abandoned;
  \item \verb|<outcome>LOST</outcome>| This element actually indicates
    what should be done with the chemical group for which the
    chemgrouprule is being defined. What we see here is:
    ---\textsl{``If the $\rm \alpha NH_2$ chemical group, belonging to
      a `D' monomer located at the left end of a polymer sequence, is
      modified permanently with an ``Acetylation'' left end
      modification, it should not be taken into account when computing
      the charge that it could bring to the polymer sequence.''}
  \end{itemize}
\end{itemize}

\noindent The second \verb|<mnmchemgroup>| element is related to the
$\rm \alpha COOH$ carboxylic group of the amino-acid:

\begin{itemize}
\item \verb|<name>C-term COOH</name>| Same remark as above;
\item \verb|<pka>2.36</pka>| Same remark as above;
\item \verb|<acidcharged>FALSE</acidcharged>| Same remark as above.
  However, as we can see, the value indicates that the acid conjugate
  (form $\rm COOH$) does not bring any charge. This means that when
  the basic conjugate is predominant (that is when $\mathrm{pH} > \mathrm{pKa}$), it
  brings a negative charge: the form is $\rm COO^-$;
\item \verb|<polrule>right_trapped</polrule>| The chemical group
  should not be evaluated if a monomer is linked to it at its right
  side. That means that the current chemical group is only evaluated
  if the monomer bearing it is located at the right end of the polymer
  sequence. This is easily understood, as the $\rm \alpha COOH$
  chemical group is involved in the formation of the inter-monomer
  bond towards the right end of the polymer sequence.
\end{itemize}

\noindent The third \verb|<mnmchemgroup>| element is related to the
$\rm \beta COOH$ carboxylic group of the amino-acid:

\begin{itemize}
\item \verb|<name>Lateral COOH</name>|;
\item \verb|<pka>3.65</pka>|;
\item \verb|<acidcharged>FALSE</acidcharged>|;
\item \verb|<polrule>never_trapped</polrule>| This element indicates
  that, whatever the position of the monomer bearing the chemical
  group in the polymer sequence (left end, right end or middle), the
  chemical group is to be evaluated;
\item \verb|<chemgrouprule>| This element provides further details on
  the chemistry that the chemical group at hand ($\rm \beta COOH$)
  might be involved in:
  \begin{itemize}
  \item \verb|<entity>MONOMER_MODIF</entity>| This element indicates
    that the supplementary data in the current \verb|<chemgrouprule>|
    element are pertaining to the $\rm \beta COOH$ chemical group
    \emph{only} in case the monomer bearing the chemical group is
    chemically modified;
  \item \verb|<name>AmidationAsp</name>| This is the modification by
    which the monomer should be modified in order to have the
    \verb|<chemgrouprule>| element effectively evaluated;
  \item \verb|<outcome>LOST</outcome>| This element actually indicates
    that if the monomer bearing the chemical group is modified with an
    ``AmidationAsp'' chemical modification, then the chemical group
    should not be evaluated any more for the electrical charge ---or
    pI--- calculations, since reacting a carboxylate group with an
    amino group produces an amide group which is not easily chargeable
    at physiological pH values.
  \end{itemize}
\end{itemize}

\noindent At this point we should have made it clear how the charge
calculations can be configured for the different monomers in the
polymer chemistry definition. As usual, the more sophisticated the
polymer chemistry definition, the more sophisticated the computations
it allows.
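
As a worked sketch of what this configuration implies, take the `D'
monomer above at pH~7 and assume the Henderson--Hasselbalch relation,
in which a group whose acidic form is charged carries
$+1/(1 + 10^{\mathrm{pH} - \mathrm{pKa}})$ and a group whose acidic
form is neutral carries $-1/(1 + 10^{\mathrm{pKa} - \mathrm{pH}})$
(an assumption about the computation, not a statement from the
configuration file):
\[
\begin{array}{lcl}
  \mbox{Lateral COOH (pKa 3.65)} & : &
  -1/(1 + 10^{-3.35}) \approx -0.9996 \\
  \mbox{N-term NH2 (pKa 9.6)} & : &
  +1/(1 + 10^{-2.6}) \approx +0.9975
\end{array}
\]
A `D' in the middle of the sequence would contribute only the lateral
group (about $-1.00$), because both $\rm \alpha$ groups are trapped. A
`D' at the left end of an unmodified sequence would add the N-term
group, for a total close to zero ($-0.9996 + 0.9975 \approx -0.002$).
With an ``Acetylation'' left end modification, the N-term group is
\texttt{LOST} and the contribution returns to about $-1.00$.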


\renewcommand{\sectitle}{Ionized Group(s) In Modifications}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}


In the excerpt from the \filename{pka\_ph\_pi.xml} file below, we see
that chemical modifications can also bring charges. The example of the
chemical modification ``Phosphorylation'' shows that when a monomer is
phosphorylated, two chemical groups are brought in: the first has a
pKa value of 1.2 (that is, it will always be deprotonated at
physiological pH values); the second has a pKa value of 6.5 (that is,
at a pH equal to this pKa, half of the population is in the
protonated, uncharged form and half in the deprotonated, negatively
charged form, so that this group contributes an average charge of
$\mathrm{-0.5}$).

\begin{alltt}
    <modif>
      <name>Phosphorylation</name>
      <mdfchemgroup>
        <name>none_set</name>
        <pka>1.2</pka>
        <acidcharged>FALSE</acidcharged>
      </mdfchemgroup>
      <mdfchemgroup>
        <name>none_set</name>
        <pka>6.5</pka>
        <acidcharged>FALSE</acidcharged>
      </mdfchemgroup>
    </modif>
\end{alltt}
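
\noindent As a rough numerical sketch, assume the
Henderson--Hasselbalch relation, in which a group whose acidic form is
neutral carries an average charge of
$-1/(1 + 10^{\mathrm{pKa} - \mathrm{pH}})$ (whether the program uses
exactly this expression is an assumption). The two groups of the
listing above would then contribute, at pH~7:
\[
\frac{-1}{1 + 10^{\,1.2 - 7}} + \frac{-1}{1 + 10^{\,6.5 - 7}}
\approx -1.00 - 0.76 = -1.76 .
\]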

\noindent At this point we should be able to study the way
computations are actually performed in the \xpe\ module.


\renewcommand{\sectitle}{pH, pI and Charge Calculations}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}

The user willing to compute charges (positive, negative, net) or the
isoelectric point for the current polymer sequence uses the menu
\guimenu{Chemistry}\guimenuitem{pKa pH pI} which triggers the
appearance of the window shown in
Figure~\vref{fig:xpertedit-net-charge-pka-ph-pi}.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.66\textwidth]
    {figures/xpertedit-net-charge-pka-ph-pi.png}
  \end{center}
  \caption[Acido-basic computations: net charges]{\textbf{Acido-basic
      computations: net charges.} This figure shows the options that
    can be set for the calculation of the charges borne by the
    polymer sequence.}
  \label{fig:xpertedit-net-charge-pka-ph-pi}
\end{figure}

This figure shows that the user can calculate the charges (positive,
negative and net) borne by the polymer sequence (either the whole
sequence or the current selection) by setting the \guilabel{pH} value
at which the computation should take place. It is also possible to
calculate the isoelectric point by clicking onto the
\guilabel{Isoelectric Point} button.
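
In symbols, and as a sketch only (the notation $Q$ and $q_i$ is
introduced here, and the search strategy actually used by the program
is not described in this manual), the net charge is the sum of the
average charges $q_i$ of all the evaluated chemical groups, and the
isoelectric point is the pH at which that sum vanishes:
\[
Q(\mathrm{pH}) = \sum_i q_i(\mathrm{pH}),
\qquad
Q(\mathrm{pI}) = 0 .
\]
Because the protonated fraction of any group can only decrease when
the pH increases, $Q$ never increases with the pH. A simple bracketing
method, such as bisection over the interval $[0, 14]$, would therefore
be one way to locate the pI.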

Note that the computations might involve the permanent left/right
modifications of the polymer sequence, as well as the monomer chemical
modifications. To configure the way net charge---or pI---calculations
are performed, use the calculations engine configuration of the
sequence editor window.
\index{\xpe|)}


\renewcommand{\sectitle}{General Options}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

One of the options that users value most is the ability to set
the number of decimal places used to display numbers. The settings
should apply in a distinct manner depending on the different entities
for which numerical values are to be displayed. The following are the
default values (and recommended ones):

\begin{itemize}

\item Atoms (and all related entities: isotopic masses, isotopic
  abundances): 10;

\item pKa, pH, pI: 2;

\item Oligomers (obtained \textit{via} mass searches, polymer
  cleavages, oligomer fragmentations): 5;

\item Polymers: 3.

\end{itemize}

\noindent Note that new values take effect immediately, without
needing to restart the program. However, data that are already
displayed are only updated according to the new options when a new
cleavage or a new fragmentation is triggered. These
options are stored on the disk and are permanent.

\cleardoublepage


%%% Local Variables:
%%% mode: latex
%%% TeX-master: "polyxmass"
%%% End: