1\chapter[Basics in Polymer Chemistry]{Basics in Polymer Chemistry}
2\label{chap:basics-polymer-chemistry}
3
This chapter introduces the basics of polymer
chemistry\index{polymer~chemistry}. The coverage is admittedly biased
towards mass spectrometry and biological polymers. The aim of this
chapter is to provide the reader with the specialized vocabulary that
will later be used to describe and explain the inner workings of the
\mXp\ program. This manual is not a ``crash course'' in biochemistry.
11
12\renewcommand{\sectitle}{Polymers? Where? Everywhere!}
13\section*{\textcolor{sectioningcolor}{\sectitle}}
14\addcontentsline{toc}{section}{\numberline{}\sectitle}
15
Indeed, polymers are everywhere. If you ask somebody to show you
something polymeric, they will point to the first plastic object
in the vicinity. Right, plastic materials are made of hydrocarbon
polymers. We also have many different polymers in our body. Proteins
are polymers, complex sugars are polymers, DNA (the so-called
``molecule of heredity'') is a \emph{huge} polymer. There are polymers
in wine, in wood\dots\ Where? Everywhere!
23
24\bigskip
25
26\noindent The \textsl{Oxford Advanced Learner's Dictionary of Current
27  English} gives for \emph{polymer} the following definition:
28\textit{natural or artificial compound made up of large molecules
29  which are themselves made from combinations of small simple
30  molecules}.
31
32\bigskip
33
\noindent A polymer is indeed made by covalently linking small simple
molecules together. These small simple molecules are called
\emph{monomer}s, and a \emph{polymer} is thus made of
a number of monomers. The general term for the process that
leads to the formation of a polymer is \emph{polymerization}.
There are many ways to polymerize monomers
together. For example, a polymer might be either linear or branched. A
polymer is linear if each monomer that is polymerized can be joined
at most two times. The first junction links the monomer to an
elongating polymer (thus making it the new end of the elongating
polymer, which is now longer by one unit) and the
second junction links the new end of the elongating polymer to another
monomer. This process goes on until the reaction is stopped, at which
point the polymer reaches its \emph{finished
  state}\index{finished~state}. A branched polymer is a polymer in
which at least one monomer is able to form more than two bonds. A
single monomer linked three times to other
monomers yields a ``T-structure'', which is the simplest branched
structure.
53
In the following sections we describe a number of different kinds
of polymers. Each time, we first detail
the structure of their constitutive monomers and then describe the
formation of the polymer. At each step we shall try to set forth each
polymer's characteristics in such a manner as to introduce the way \mXp\
``thinks polymers'' and to introduce specialized terminology. Once
the basic chemistries of the different polymers have all been
described, we will enter a more complex subject that is of enormous
importance to the mass spectrometry specialist: polymer chain
disrupting chemistry. We shall see that this terminology actually
covers two kinds of chemistries: cleavage, on the one hand, and
fragmentation, on the other hand.
66
While \mXp\ is basically oriented to linear single-stranded polymer
chemistries, it can also be used to simulate highly complex polymer
chemistries. Biological polymers are the main focus of this manual;
however, all the concepts described here may be applied without
modification to synthetic polymer chemistries.
72
73
74\renewcommand{\sectitle}{Various Biopolymer Structures}
75\section*{\textcolor{sectioningcolor}{\sectitle}}
76\addcontentsline{toc}{section}{\numberline{}\sectitle}
77
78Biopolymers are amongst the most sophisticated and complex polymers on
79earth and it certainly is not a mistake to take them as examples of
80how monomers (be these complex or not) can assemble covalently into
81life-enabling polymers. In this section we will visit three different
82polymers encountered in the living world: proteins, nucleic acids and
83polysaccharides. We shall be concerned with 1) the monomers'
84structure, 2) the polymerization reaction and 3) the final end-capping
85reaction responsible for putting the polymer in its finished state.
86
87\renewcommand{\sectitle}{Proteins}
88\subsection*{\textcolor{sectioningcolor}{\sectitle}}
89\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
90\index{protein}
91
These biopolymers are made of amino acids. There are twenty major
amino acids in nature, and each protein is made of a number of these
amino acids\index{amino~acid}. The number of possible combinations is
virtually unlimited, providing enormous diversity of proteins to the
living world.
96
A protein is a polar polymer: it has a left end and a right
end\index{protein!left/right ends}, and polymerization actually occurs
from left to right (from N-terminus to C-terminus, see
below). Figure~\ref{fig:peptbond-formation} shows that the chemical
reaction at the basis of protein synthesis is a
\emph{condensation}\index{condensation}. A protein is the result of
the condensation of amino acids with each other in an orderly polar
fashion. The left end is called the \emph{N-terminus} (or \emph{amino
  terminal end}) and the right end is called the \emph{C-terminus} (or
\emph{carboxyl terminal end}). The left end is an amino group
($\mathrm{H_2N}$--) corresponding to the unreacted amino group of the
first amino acid. Upon
condensation of a new amino acid onto the first one, the carboxyl
group of the first amino acid reacts with the amino group of the
second amino acid. A water molecule is released, and the formation of
an amide bond between the two amino acids yields a dipeptide. The
right end of the dipeptide is a carboxyl group (--COOH) corresponding
to the unreacted carboxyl group of the last amino acid to have
``polymerized in''.
115
The bond formed by condensation of two amino acids is an amide
bond\index{protein!amide~bond}, also called (in protein chemistry) a
\emph{peptidic bond}. The elongation of the protein is a simple
repetition of the condensation reaction shown in
Figure~\ref{fig:peptbond-formation}, given that the elongation
\emph{always} proceeds in the described direction (a new monomer
arrives at the right end of the elongating polymer, and elongation
proceeds from left to right).
124
125\begin{figure}
126  \begin{center}
127    \includegraphics[width=0.4\textwidth]{figures/peptbond-formation.png}
128  \end{center}
129  \caption[Peptidic bond formation]{\textbf{Peptidic bond formation by
130      condensation.} The left end monomer $\mathrm{R_1}$ is condensed
131    to the right end monomer $\mathrm{R_2}$ to yield a peptidic bond.
132    A water molecule is lost during the process.}
133  \label{fig:peptbond-formation}
134\end{figure}
135
Now we should point out a terminology issue specific to protein chemistry:
we have seen that a protein is a polymer made of a number of monomers,
called amino acids\index{amino~acid}. In protein chemistry, there is a
subtlety: once a monomer is polymerized into a protein it is no longer
called a monomer, it is called a \emph{residue}\index{residue}. We may
say that a residue is an amino acid less a water molecule.
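
\medskip

\noindent As a worked illustration of this definition (the formula of
free glycine, $\mathrm{C_2H_5O_2N_1}$, is standard chemistry and is not
taken from this manual), removing one water molecule from the amino
acid gives the glycyl residue:
\[
  \underbrace{\mathrm{C_2H_5O_2N_1}}_{\mathrm{glycine}}
  \;-\; \mathrm{H_2O}
  \;=\;
  \underbrace{\mathrm{C_2H_3O_1N_1}}_{\mathrm{glycyl\ residue}}
\]
The same subtraction applied to alanine ($\mathrm{C_3H_7O_2N_1}$)
gives the alanyl residue $\mathrm{C_3H_5O_1N_1}$.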
142
From what we have seen so far, we may define a protein this way:
\textsl{``A protein is a chain of residues linked together in an
  orderly polar fashion, with the residues being numbered starting
  from 1 and ending at n, from the first residue on the left end to
  the last one on the right end''}. This definition is still partly
inexact, however. Indeed, as shown in
Figure~\ref{fig:prot-polymer}, there is still a problem with the
extremities of the residual chain: what about the amino group on the
left end of a protein (the amino group of the first amino
acid of the protein), and what about the carboxyl group on the right
end of a protein (the carboxyl group of the last amino
acid of the protein)? Because these groups lie at the extremities of
the residual chain, they remained unreacted during the polymerization
process. But because we are modelling a residual chain using residues
and not amino acids, we still need to put the residual chain in its
finished state by \emph{capping} the left end with a proton
\emph{cap} (so as to complete the amino group) and the right end with
a hydroxyl cap (so as to complete the carboxyl
group)\index{protein!left/right~caps}. The capping of the residual
chain extremities ensures that the polymer is in its finished state,
and that it cannot be elongated anymore. The proton is the \emph{left
  cap} of the protein polymer and the hydroxyl is the \emph{right cap}
of the protein polymer.
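
\medskip

\noindent The capping scheme can be sketched as an elemental
composition. The symbol $R_i$, standing for the formula of the residue
at position $i$, is a notation introduced here for illustration only;
it is not \mXp\ syntax. A finished protein of $n$ residues is then:
\[
  \mathrm{protein} \;=\;
  \underbrace{\mathrm{H}}_{\mathrm{left\ cap}}
  \;+\; \sum_{i=1}^{n} R_i \;+\;
  \underbrace{\mathrm{OH}}_{\mathrm{right\ cap}}
\]
For the dipeptide Gly Ala, with the glycyl residue
$\mathrm{C_2H_3O_1N_1}$ and the alanyl residue
$\mathrm{C_3H_5O_1N_1}$, this gives:
\[
  \mathrm{H} + \mathrm{C_2H_3O_1N_1} + \mathrm{C_3H_5O_1N_1} +
  \mathrm{OH} \;=\; \mathrm{C_5H_{10}O_3N_2}
\]
which is also what is obtained by adding the two free amino acids and
removing the single water molecule lost upon condensation.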
166
167\begin{figure}
168  \begin{center}
169    \includegraphics[width=0.4\textwidth]{figures/prot-polymer.png}
170   \end{center}
171   \caption[End capping chemistry of the protein polymer]{\textbf{End
172       capping chemistry of the protein polymer.} A protein is made of
173     a chain of residues and of two caps. The left cap is the
174     N-terminal proton and the right cap is the C-terminal hydroxyl.
175     Altogether, the residual chain (enclosed here in the blue
176     polygon) and both the H and OH red-colored caps do form a
177     complete protein polymer in its finished state.}
178  \label{fig:prot-polymer}
179\end{figure}
180
Now comes the question of unambiguously defining the structure of a
protein. It is commonly accepted that the simple ordered sequence of
the residue codes in the protein, from left to right, constitutes an
unambiguous description of the protein's primary structure (that is,
its sequence). Of course, proteins have three-dimensional structures,
but this is of no interest to a program like \mXp, which is aimed at
calculating masses of polymers. To state unambiguously the
sequence of a protein, one would use a notation like this:
189\smallskip
190
191\begin{mynoindent}
192  {\footnotesize using the 3-letter code of the amino acids:}\\
193  Ala Gly Trp Tyr Glu Gly Lys\\
194  {\footnotesize or, using the 1-letter code of the amino acids:}\\
195  A G W Y E G K\\
  Alanine is thus residue 1 and Lysine is the last residue (n =
  7).
198\end{mynoindent}
199
200
201\renewcommand{\sectitle}{Nucleic Acids}
202\subsection*{\textcolor{sectioningcolor}{\sectitle}}
203\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
204\index{nucleic~acid}
205
These biopolymers are more complex than proteins, mainly because they
are composed of monomers \emph{(nucleotides)}\index{nucleotide} that
have three different chemical parts, and because those parts differ in
DNA and RNA. A nucleotide\index{nucleic~acid!left/right~ends} is the
building block of a nucleic acid: \emph{a nucleotide consists of a nitrogenous
  base combined with a ribose/deoxyribose sugar and with a phosphate
  group}. There are two different kinds of nucleic acids:
deoxyribonucleic acid (DNA, the sugar is a deoxyribose) and
ribonucleic acid (RNA, the sugar is a ribose). DNA is most often found
in its double-stranded form, while RNA is most often found in
single-stranded form. There are four nitrogenous bases for each: Adenine,
Thymine, Guanine and Cytosine for DNA; in RNA only one of these bases
changes: Thymine is replaced by Uracil. As with proteins, nucleic
acids are polar polymers: the polymerization process is polar, from
left to right (sometimes left is up and right is down in certain
vertical representations found mainly in textbooks).
222
This manual is not meant to teach biochemistry, which is why the structure
of the monomers is not described in atomic detail. However, since it
is important to understand how the polymerization occurs,
Figure~\ref{fig:nucacbond-formation} represents the polymerization
reaction mechanism between a nucleotide and another one, to yield a
dinucleotide. That reaction is a \emph{trans-esterification}. A
nucleic acid has a left end (the \emph{5' end}, which is often
phosphorylated) and a right end (the \emph{3' end}, or hydroxyl end). The
trans-esterification reaction is the attack of the first phosphorus of the
new (deoxy)nucleotide triphosphate by the 3'OH of the right end of the
elongating nucleotidic chain. Upon
trans-esterification\index{trans-esterification}, an \emph{inorganic
  pyrophosphate} ($\mathrm{PP_i}$) is released, and the formation of a
phosphodiester bond\index{nucleic~acid!phosphodiester~bond} between the two
nucleotides yields a dinucleotide. The elongation of the nucleic acid
polymer is a simple repetition of this trans-esterification reaction so that
the chain growth is always in the 5'$\Longrightarrow$3' direction.
This is achieved in the living cells by what is called the
\emph{5'$\Longrightarrow$3' polymerase enzymatic activity}.
242
243\begin{figure}
244  \begin{center}
245    \includegraphics[width=0.8\textwidth]{figures/nucacbond-formation.png}
246   \end{center}
  \caption[Phosphodiester bond formation]{\textbf{Phosphodiester bond
      formation by esterification.} The arriving monomer (on the
    right) has the triphosphate on the 5' carbon of its sugar
    esterified by nucleophilic attack of the first phosphorus by the
    alcohol function borne by the 3' carbon of the (deoxy)ribose
    sugar ring of the left monomer. The bond that is formed is a
    phosphodiester bond, with release of a pyrophosphate group
    ($\mathrm{PP_i}$). Note that the sugar and nitrogenous bases are
    schematically represented in this figure.}
256  \label{fig:nucacbond-formation}
257\end{figure}
258
259
The conventional representation of a nucleic acid involves showing the
5' end on the left, and the 3' end on the right, horizontally.
Sometimes, to clearly indicate that the left end is phosphorylated,
while the right end is not, the ends are indicated as ``5'P'' and
``3'OH''. Figure~\ref{fig:nucac-polymer} shows a simple way to
formalize what a nucleic acid polymer is. The molecule represented on
the left is the ``monomer'' in the sense that the polymer is made of n
monomers. On the right side of that figure, the polymer made of n
monomers is shown as a residual chain (inside the blue polygon box)
that is capped with OH on its left end and H on its right end
(red-colored atoms)\index{nucleic~acid!left/right~caps}. Thus, in the
case of the nucleic acid polymers, the left cap is a hydroxyl and the
right cap is a proton. Incidentally, this happens to be the exact
converse of what was described earlier for proteins.
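
\medskip

\noindent As a worked sketch of this capping, consider the DNA
dinucleotide A C. The residue formul{\ae} used below,
$\mathrm{C_{10}H_{12}O_5N_5P_1}$ and $\mathrm{C_9H_{12}O_6N_3P_1}$, are
assumed to be those of the 2'-deoxynucleoside monophosphates less one
water molecule. Adding the left cap (OH) and the right cap (H) gives
the finished polymer:
\[
  \mathrm{OH} + \mathrm{C_{10}H_{12}O_5N_5P_1} +
  \mathrm{C_9H_{12}O_6N_3P_1} + \mathrm{H}
  \;=\; \mathrm{C_{19}H_{26}O_{12}N_8P_2}
\]
The two caps together amount to one water molecule, exactly as for
proteins; only their left/right assignment is swapped.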
274
275\begin{figure}
276  \begin{center}
277    \includegraphics[width=0.5\textwidth]{figures/nucac-polymer.png}
278      \end{center}
279  \caption[A nucleic acid is a capped nucleotide chain]{\textbf{End
280      capping chemistry of the nucleic acid polymer.} A nucleic acid
281    is made of a chain of nucleotides (left formula) and of two caps.
282    The left cap is the hydroxyl group that belongs to the terminal
283    phosphate of the 5' carbon of the sugar. The right cap is the
284    proton that belongs to the hydroxyl group of the 3' carbon of the
285    sugar ring (right formula). Altogether, a finished nucleic acid
286    polymer is made of the nucleotidic chain (enclosed here in the
287    blue polygon), made of the repetitive elements (one of which is
288    shown on the left), and of the two caps (red-colored OH and H, out
289    of the box on the right).}
290  \label{fig:nucac-polymer}
291\end{figure}
292
293Now comes the question of unambiguously defining the structure of a
294nucleic acid. It is commonly accepted that the listing of the named
295nitrogenous bases in the nucleic acid---from left (5' end) to right
296(3' end)---constitutes an unambiguous description of the nucleic acid
297sequence. To enunciate the sequence of a gene, one would use a
298symbology like this:
299\smallskip
300
301\begin{mynoindent}
302  {\footnotesize for a DNA, using the 1-letter code of the nitrogenous
303    bases:} A T G C A G T C\\
304  {\footnotesize for an RNA, using the 1-letter code of the
305    nitrogenous bases:} A U G C A G U C\\
  Adenine is thus base 1 and Cytosine is the last base (n = 8).
307\end{mynoindent}
308
309
310\renewcommand{\sectitle}{Saccharides}
311\subsection*{\textcolor{sectioningcolor}{\sectitle}}
312\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
313\index{saccharide}
314
These biopolymers are certainly amongst the most complex ones in the
living world. This is mainly due to the fact that saccharides are
usually heavily modified in living cells with a huge variety of
chemical modifications. Furthermore, branching in the polymer
structure is the rule rather than the exception. Interestingly,
these molecules are first thought of as the ``fuel'' for the cell,
which is certainly far from being total nonsense, but there is no
doubt that their structural role is also extremely important (often in
combination with proteinaceous material). Another interesting aspect
of their ability to form complex structures is their use as ``key''
systems for identification processes: a number of complex sugars are
located on the cell surface and provide ``recognition patterns'' for
other cells to deal with\dots
328
Nonetheless, the general picture is not that complex, if the way
monomers are polymerized together is the only concern (which is the
case in this manual). As far as we are concerned, in fact, the
polymerization mechanism is a simple condensation (much like what has
been described for proteins), yielding a sugar
bond\index{saccharide!sugar~bond}. Indeed, some people use the same
terminology: a monomeric sugar becomes a residue once polymerized in
the saccharidic chain. There are two main kinds of sugars:
\emph{pentoses} (in $\mathrm{C_5}$) and \emph{hexoses} (in
$\mathrm{C_6}$); it should be noted, however, that there is a variety
of other common molecules, like \emph{sialic acids},
\emph{heptoses}\dots
341
As already seen for proteins and nucleic acids, a saccharidic
polymer is polar: it has a left end and a right
end\index{saccharide!left/right~ends}. The terminology regarding the
ends of a saccharidic polymer is rather unexpected at first sight: the
left end is said to be the \emph{non-reducing
  end}\index{non-reducing~end} while the right end is said to be the
\emph{reducing end}\index{saccharide!reducing~end}. Historically, the
term comes from the observation that monosaccharides (also called
\emph{monoses}\index{monose}) reduce cupric
($\mathrm{Cu^{2+}}$) ions and are themselves oxidized at the
carbonyl (when in the open-ring aldehydic form).
353
Figure~\vref{fig:sacchbond-formation} shows the polymerization
reaction between a sugar and another one (two glucose monomers,
actually), to yield a maltose disaccharide. The polymerization
mechanism is a simple condensation. The elongation of the saccharidic
polymer is a simple repetition of this condensation reaction so that
the chain growth is always in the same orientation, from the
non-reducing end to the reducing end. The conventional representation
of a polysaccharide involves showing the non-reducing end on the left,
and the reducing end on the right, horizontally.
Figure~\vref{fig:sacch-polymer} shows a simple way to formalize what a
saccharidic polymer is. The top formula is the representation of the
monomer. The bottom formula represents a polysaccharide, with the
repetitive elements boxed (there are n monomers polymerized). The
atoms shown in red (outside the boxed repetitive elements) are the
saccharidic polymer caps. Thus, we see clearly that in the case of
polysaccharides, the left cap is a proton and the right cap is a
hydroxyl\index{saccharide!left/right~caps}. Incidentally, this happens
to be identical to proteins and the exact converse of what we
described previously for nucleic acids.
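
\medskip

\noindent The maltose example can be checked with a short worked
composition. The formula of free glucose, $\mathrm{C_6H_{12}O_6}$, is
standard chemistry and is not taken from this manual; the glucosyl
residue is that formula less one water molecule:
\[
  \mathrm{C_6H_{12}O_6} - \mathrm{H_2O} \;=\; \mathrm{C_6H_{10}O_5}
\]
Capping a chain of two such residues with H on the left and OH on the
right gives the finished disaccharide:
\[
  \mathrm{H} + 2 \times \mathrm{C_6H_{10}O_5} + \mathrm{OH}
  \;=\; \mathrm{C_{12}H_{22}O_{11}}
\]
which is the formula of maltose.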
373
374
375
376\begin{figure}
377  \begin{center}
378    \includegraphics[width=0.8\textwidth]{figures/sacchbond-formation.png}
379  \end{center}
380  \caption[Osidic bond formation]{\textbf{Osidic bond formation by
381      condensation.} The two monomers are subject to condensation with
382    loss of one molecule of water.}
383  \label{fig:sacchbond-formation}
384\end{figure}
385
386\begin{figure}
387  \begin{center}
388    \includegraphics[width=0.4\textwidth]{figures/sacch-polymer.png}
389  \end{center}
390  \caption[A saccharidic polymer is a capped osidic residue
391  chain]{\textbf{End capping chemistry of the polysaccharidic
392      polymer.} A polysaccharide is made of a chain of osidic residues
393    (blue-boxed formula) and of two caps (red-colored atoms). The left
394    cap is the proton group that belongs to the non-reducing end of
395    the polymer. The right cap is the hydroxyl group that belongs to
396    the reducing end of the polymer.}
397  \label{fig:sacch-polymer}
398\end{figure}
399
Now comes the question of unambiguously defining the structure of a
saccharidic polymer. It is commonly accepted that the simple ordered
sequence of the named monoses in the saccharidic polymer, from left
(non-reducing end) to right (reducing end), constitutes an unambiguous
description of the glycan sequence. To state the sequence of a
glycan, one would use a notation like this:
406
407\begin{mynoindent}
408  {\footnotesize using a  3-letter code:}\\
409  Ara Gal Xyl Glc Hep Man Fru\\
  Arabinose is thus monose 1 and Fructose is the last monose (n =
  7).
412\end{mynoindent}
413
414\bigskip
415
\noindent Incidentally, this is where the ability of \mXp\ to handle monomer
codes of unrestricted length comes in handy!
418
419\renewcommand{\sectitle}{To Sum Up}
420\section*{\textcolor{sectioningcolor}{\sectitle}}
421\addcontentsline{toc}{section}{\numberline{}\sectitle}
422
We have given a rapid overview of the three major polymers in the living
world. A great many other polymers exist around us.
Table~\vref{tab:three-biopolym-exples} sums up the
information gathered so far. Note that the formul{\ae} given for the
monomers are the ``residual'' ones. For example, the formula of the
glycyl residue corresponds to the formula of the Glycine monomer less
one molecule of water. Many synthetic polymers are much simpler than
the ones we have rapidly reviewed, and it should be clear that, if
\mXp\ can deal with the complex biopolymers described so far, it
certainly will be very proficient with less complex synthetic
polymers. Describing the formation of polymers is one thing, but we
also have to describe how to disrupt polymers. This is what we shall
do in the next section.
436
437
438\begin{table}
439    \begin{small}
440      \begin{tabular}{c|ccccc}\hline
441        polymer     &   name & code  &    formula                &  left cap  & right cap \\
442        \hline
443        protein     &   &      &                           &      H         &           OH       \\
444        & Glycine   &   G     & $\mathrm{C_2H_3O_1N_1}$   &                &                    \\
445        & Alanine   &   A     & $\mathrm{C_3H_5O_1N_1}$   &                &                    \\
        & Tyrosine  &   Y     & $\mathrm{C_9H_9O_2N_1}$   &                &                    \\
447        nucleic acid&   &      &                           &      OH        &            H       \\
448        & Adenine   &   A     & $\mathrm{C_{10}H_{12}O_5N_5P_1}$ &         &                    \\
449        & Cytosine  &   C     & $\mathrm{C_9H_{12}O_6N_3P_1}$    &         &                    \\
450        saccharide  &   &      &                           &      H         &            OH      \\
451        & Arabinose &   Ara   & $\mathrm{C_5H_8O_4}$      &                &                    \\
        & Heptose   &   Hep   & $\mathrm{C_7H_{12}O_6}$   &                &                    \\
453        \hline
        \multicolumn{6}{c}{Note: the formul{\ae} are those of the residues (monomer less one water molecule)}\\
455        \hline
456      \end{tabular}
457      \caption[Comparison of three common biopolymers]{\textbf{Quick comparison of three biopolymers with examples of monomers}}\label{tab:three-biopolym-exples}
458    \end{small}
459\end{table}
460
461
462\renewcommand{\sectitle}{Polymer Chain Disrupting Chemistry}
463\section*{\textcolor{sectioningcolor}{\sectitle}}
464\addcontentsline{toc}{section}{\numberline{}\sectitle}
465
466\label{sect:pol-chain-disrupt-chem}
467
The ``polymer chain disrupting chemistry'' was mentioned earlier as a
complex subject that is of \emph{enormous} importance to the mass
spectrometrist. This is why that subject will be treated in a fairly
thorough manner. First of all it should be noted that a chemical
modification of a polymer does not necessarily involve the
perturbation of the chain structure of the polymer. Here, however, we
are concerned specifically with two kinds of chemical modifications
that yield a polymer chain perturbation: \emph{cleavage} and
\emph{fragmentation}:
477\smallskip
478
479\begin{mynoindent}
480  \textsc{A cleavage is a chemical process}\index{cleavage} by which a
481  cleaving agent will act directly on the polymer chain making it fall
482  into at least two separated pieces (the \emph{oligomers}). As a
483  result of the cleavage reaction, groups originating in the cleaving
484  molecule remain attached to the polymer at the precise cleavage
485  location;
486
487\smallskip
488
489\textsc{A fragmentation is a chemical process}\index{fragmentation} by
490which the polymer structure is disrupted into separated pieces (the
491\emph{fragments}) mainly because of energy-dependent electron doublet
492rearrangements leading to bond breakage.
493\end{mynoindent}
494
495
496
497\renewcommand{\sectitle}{Polymer Cleavage}
498\subsection*{\textcolor{sectioningcolor}{\sectitle}}
499\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
500\label{sect:polymer-cleavage}
501\index{cleavage}
502
We said above that, upon cleavage of a polymer, the cleaving molecule
reacts with it, and by doing so directly or indirectly
``\emph{dissolves}'' an inter-monomer bond. A polymer cleavage always
occurs in such a way as to generate a set of \emph{true} polymers
(smaller in size than the parent polymer, evidently, which is why they
are called \emph{oligomers}). Indeed, let us take the example shown in
Figure~\ref{fig:prot-cleavage}, where a tripeptide (a very small
protein, containing a methionyl residue at position 2) is subjected
either to a water-mediated cleavage (hydrolysis, upper panel) or to a
cyanogen bromide-mediated cleavage (lower panel). The two cases
presented in this figure are similar in some respects and different in
others:
515
516\begin{itemize}
517
518\item In the first case the molecule that is responsible for the
519  cleavage is water, while in the second case it is cyanogen bromide;
520
521\item In both cases the bond that is cleaved is the inter-monomer bond
522  (in protein chemistry this is a peptidic bond);
523
524\item In both cases the Oligomer 2 has the same structure;
525
526\item The structures of the Oligomer 1 species differ when produced
527  using water or cyanogen bromide as the cleaving molecule.
528
529\end{itemize}
530%
531%
The difference between hydrolysis and cyanogen bromide cleavage is in
the generation of the Oligomer 1 species: the cyanogen bromide
cleavage has the side effect of generating a homoserine as the right end
monomer of Oligomer 1, while hydrolysis generates a genuine methionine
monomer. This is because water reverses in a very symmetrical manner
what polymerization did (hydrolysis is the converse of condensation),
while cyanogen bromide\index{cyanogen~bromide} chemically
modifies the generated Oligomer 1 species.
540
541\begin{figure}
542  \begin{center}
543    \includegraphics[width=0.8\textwidth]{figures/prot-cleavage.png}
544  \end{center}
545  \caption[Protein cleavage by water and cyanogen
546  bromide]{\textbf{Protein cleavage by water and cyanogen bromide.} A
547    tripeptide is cleaved at position 1 either by hydrolysis (top) or
548    by cyanogen bromide (bottom). Cyanogen bromide cleaves
549    specifically on the right of a methionine monomer. Upon cleavage,
550    the methionyl monomer gets converted into homoserine by the
551    cyanogen bromide reagent.}
552  \label{fig:prot-cleavage}
553\end{figure}
554
Nonetheless, the reader might have noted that, interestingly, all
four oligomers do effectively have their left cap (a proton) and
their right cap (the hydroxyl). This means that in both water- and
cyanogen bromide-mediated cleavages, all the generated oligomers are
indeed true polymers in the sense that: 1) they are a chain of
monomers (modified or not) and 2) they are correctly capped
(\textit{i.e.} they are polymers in their finished state). This is
important because it is the basis on which we shall
distinguish between a cleavage process and a fragmentation process.
Thus, the \mXp\ definition of an oligomer might be: \emph{an oligomer
  is a polymer (of at least one monomer) in its finished state that
  was generated upon cleavage of a longer polymer}.
567
568
When the polymer cleavage reaction precisely reverses the reaction
that was performed for the same polymer's synthesis, there is no
special difficulty. But when the cleavage reaction modifies the
substrate, then this should be carefully modelled. How? To answer this
question we might start by comparing the two different Oligomer 1
species that were yielded by the water-mediated and the cyanogen
bromide-mediated cleavage reactions: ``the hydrolysis-generated
Oligomer 1 is equal to the cyanogen bromide-generated Oligomer 1 +S1
+C1 +H2 -O1''; this is a big difference! The observations we have made so
far might be worded this way: \textsl{Whenever a protein undergoes a
  cyanogen bromide-mediated cleavage, the ``-C1H2S1+O1'' chemical
  reaction should be applied to the resulting oligomers \textit{if and
    only if} they have a methionine monomer at their right end}. In
\mXp's jargon, this logical condition is called a \emph{cleavage rule}
(described later; see page~\pageref{sect:cleavespecif}).
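
\medskip

\noindent The arithmetic behind this rule can be checked on the
right end residue alone. The methionyl residue formula used here,
$\mathrm{C_5H_9O_1N_1S_1}$ (methionine less one water molecule), is
standard chemistry and is not taken from this manual. Applying the
``-C1H2S1+O1'' reaction gives:
\[
  \mathrm{C_5H_9O_1N_1S_1} - \mathrm{C_1H_2S_1} + \mathrm{O_1}
  \;=\; \mathrm{C_4H_7O_2N_1}
\]
which is the formula of a homoseryl residue. With the usual
monoisotopic masses (C~=~12.000000, H~=~1.007825, O~=~15.994915,
S~=~31.972071), the corresponding mass shift is approximately:
\[
  \Delta m \;=\; 15.994915 - (12.000000 + 2 \times 1.007825 + 31.972071)
  \;=\; -29.992806
\]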
584
Well, all this sounds reasonable. But what about the ``normal'' case,
when the cleavage is done using water? Nothing special: the mass of
the oligomer is calculated by summing the mass of each monomer in the
oligomer (since the monomers are not modified, this is easily done)
and the masses corresponding to the left and right caps (these are
defined in the polymer chemistry definition; in our present case it
would be a proton on the left end, and a hydroxyl on the right end).
In this way, the oligomer complies with its definition, which states
that it is a true polymer made of monomers and that it is in its
finished state.
595
Yes, but then how will \mXp\ manage to calculate the mass of the
modified oligomer, like our Oligomer 1 in the case of the cyanogen
bromide-mediated cleavage? Simple enough: in a first step it proceeds
exactly as for the unmodified oligomer. Next, each oligomer is checked
for the presence or absence of a methionine residue at its right end.
If a methionine is found, the mass change corresponding to the
``-C1H2S1+O1'' chemical reaction is applied. And that's it.
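
The two steps can be condensed into a single expression. The notation
is assumed for illustration only: $LC$ and $RC$ are the masses of the
left and right caps, $M_i$ the monomer masses, $n$ the number of
monomers in the oligomer, $\Delta m$ the mass change of the
``-C1H2S1+O1'' reaction, and $\delta$ equals 1 when the right end
monomer is a methionine and 0 otherwise:
\[oligomer = LC + \sum_{1}^{n} M_i + RC + \delta \times \Delta m\]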
603
In the previous cyanogen bromide example, the logical condition
involved the identity of the oligomers' right end monomer, but other
examples can involve the left end monomer instead, if some chemical
modification were to occur on the monomer sitting right of the
cleavage location. In this case the user would have to analyse the
situation and provide \mXp\ with the proper chemical reaction by
stating something analogous to: \textsl{\textit{if and only if} they
  have a Xyz monomer at their left end}. This introduction to polymer
cleavage abstraction should be enough to later delve into the
cleavage specification definition as \mXp\ conceives it, which is
thoroughly detailed at page~\pageref{sect:cleavespecif}.
615
616
617\renewcommand{\sectitle}{Polymer Fragmentation}
618\subsection*{\textcolor{sectioningcolor}{\sectitle}}
619\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
620\label{sect:polymer-fragmentation}
621\index{fragmentation}
622
In a fragmentation process, the bond that is broken is not necessarily
the inter-monomer bond. Indeed, fragmentations are often high-energy
chemical processes that can affect bonds that belong to the
monomers' internal structure. This is one of the reasons why
fragmentations differ from cleavages: they are specific to the
polymer type in which they occur. Hydrolyzing a protein and
hydrolyzing an oligosaccharide are the same process, from a chemical
point of view. But fragmenting a protein and fragmenting an
oligosaccharide are truly different processes, because the way the
fragmentation happens in the polymer sequence depends so much on the
nature of each monomer that makes it up.
634
Another peculiarity of fragmentations, compared with the cleavages
that were described above, is that there is no cleaving
molecule starting the process. Instead, a fragmentation process is
often initiated by an intramolecular electron doublet rearrangement
that propagates more or less far in the polymer structure to
eventually break it. Fragmentations are mainly gas-phase processes,
not reactions that happen in solution as a result of bringing the
polymer into contact with some reagent. It is precisely because no
cleaving molecule is involved in the fragmentation process that the
fragments are not necessarily capped like a normal polymer should be;
and this is another really important difference between cleavage and
fragmentation. The following examples should illustrate these
concepts: protein and nucleic acid fragmentation.
648
649
650\subsubsection*{Protein Fragmentation}
651\index{fragmentation!protein}
652
A fairly large number of different kinds of fragments
can be generated upon fragmentation of peptides. We are going to
detail the most common ones; the user is invited to use \mXp's
fragmentation-specification grammar to add less frequent (or newly
discovered) fragmentation types. Note that the fragmentation schemes
below apply to positively-charged precursor ions. To compute the
product ions' masses obtained in negative mode fragmentation
experiments, simply remove as many protons as required. For
example, starting from a fragment positively charged once (+H),
remove a first proton to go back to the uncharged state and then
remove another proton to yield the deprotonated (thus singly
negatively charged) product ion. The requirement to be able to
compute masses for both positively- and negatively-charged product
ions imposes a specific way to define fragmentation
specifications in the \xpd{} module (to be detailed later in this
manual).
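
In terms of masses, and assuming that each proton is counted as one
hydrogen atom of monoisotopic mass 1.007825 (the electron mass is
neglected in this sketch), the conversion just described amounts to
the following, where $m^{+}$ and $m^{-}$ are illustrative symbols for
the masses of the singly positively and singly negatively charged
product ions:
\[m^{-} = m^{+} - 2 \times 1.007825 = m^{+} - 2.015650\]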
669
670\begin{figure}
671  \begin{center}
672    \includegraphics[scale=0.2]{figures/prot-fragmentation.png}
673  \end{center}
674  \caption[Protein fragmentation]{\textbf{Protein fragmentation
      patterns most widely encountered.} A hexapeptide is fragmented
676    in the seven most widely encountered manners, such as to generate
677    a, b, c, x, y, z and immonium fragment ions. The figure
678    illustrates the position of the cleavage for each kind of fragment
679    (exemplified using the case of the smallest fragment possible) and
680    the mass calculation method is described for each fragment kind;
681    consider that each fragment bears only \emph{one positive}
682    charge.}
683  \label{fig:prot-fragmentation}
684\end{figure}
685
686As can be seen from Figure~\ref{fig:prot-fragmentation}, the
687fragmentations do generate fragments of three categories: the ones
688that include the left end of the precursor polymer (a, b, c), the ones
689that include the right end of the precursor polymer (x, y, z), and
690finally the special case in which the fragment is an \emph{internal
691  fragment}, like the immonium ions. When looking at the
692fragmentations described in the figure it becomes immediately clear
693why a fragmentation cannot be mistaken for a cleavage: the ionization
of the fragment is not necessarily due to the capture of a proton by
the fragment. Furthermore, we can also see that a fragmentation is not
a cleavage because the fragment that is generated is not necessarily
what we call a polymer, in the sense that the fragment
might not be capped the same way as the precursor polymer is (that is,
the fragment is not in its finished polymerization state).
700
701The two observations above should make clear to the reader that
702calculating masses for fragments is a more difficult process than what
703was described above for the oligomers. Indeed, while it was simple to
704calculate the mass of an oligomer (by simply adding the masses of its
705constitutive monomer units, plus the left and right caps, plus
706ionization), here there is no chemical formalism generally applicable
707to all the fragment types. This is why the specification of the
708fragmentation is left to the user's responsibility.
709
By looking at Figure~\ref{fig:prot-fragmentation}, the reader should
have noticed that the fragment naming scheme takes into consideration
whether the fragment bears the left end or the right end of the
precursor polymer (or neither). Indeed, the numbering of fragments
holding the left end of the precursor polymer sequence begins at the
left end, and for fragments that hold the right end, at the right end.
Thus the third fragment of series \emph{a} (\emph{a3}) would involve
monomers [1$\rightarrow$3]; and the third fragment of series
\emph{y} (\emph{y3}) would involve monomers [6$\rightarrow$4] (in
the figure, these left-to-right and right-to-left directions are
symbolized using arrows). When specifying a fragmentation, it is
therefore important to clearly
indicate from which end of the precursor polymer the fragment is
generated (in \mXp's jargon this is ``LE'' for left end, ``RE'' for
right end and ``NE'' for no end). \mXp\ knows what action it should
take when it encounters one of these three specifications; for
example, if ``LE'' is found in a given fragmentation
specification, \mXp\ adds to the fragment's mass the mass
corresponding to the left cap of the precursor polymer.
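
The effect of the three ``from which end'' values on capping, as it
is described in this chapter, can be summarized as follows. This is
only a sketch; the symbol $cap$ is introduced here for illustration,
with $LC$ and $RC$ standing for the masses of the left and right caps
of the precursor polymer:
\[\mathrm{LE:}\ cap = LC \qquad \mathrm{RE:}\ cap = RC \qquad
\mathrm{NE:}\ cap = 0\]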
729
730
\paragraph{\emph{a} fragment series} If we take the \emph{a} fragment
series, Figure~\ref{fig:prot-fragmentation} indicates that the
fragments include the left end and that their last monomer lacks its
carbonyl group (see, at the top of Figure~\ref{fig:prot-fragmentation},
that the \emph{a1} arrow goes between the C$\alpha$H and the CO of
monomer 1). So the mass of each fragment of the \emph{a} series is
obtained by summing the masses of its monomers and applying the
following chemical treatments: 1)
addition of the mass corresponding to the left cap (proton), 2)
removal of the mass corresponding to the lacking CO group. For
fragment \emph{a1}, the sum reduces to the mass of monomer 1. If we
were interested in the
fragment \emph{a4} we would sum the masses of monomers 1 to 4,
add the mass of the left cap, and finally remove the mass of a CO.
The mass calculation is thus mathematically expressed \[a_i = LC +
\sum_{1}^{i} M_i - CO\]
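
As a hedged numerical illustration, consider a hypothetical peptide
whose first two monomers are glycine and alanine. Assuming the usual
monoisotopic masses (Gly residue 57.021464, Ala residue 71.037114,
H 1.007825, CO 27.994915) and taking the left cap as one hydrogen atom
(the electron mass is neglected in this sketch), the formula gives:
\[a_2 = 1.007825 + (57.021464 + 71.037114) - 27.994915 = 101.071488\]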
745
746\paragraph{\emph{b} fragment series} Similarly, the mass calculation
747is mathematically expressed \[b_i = LC + \sum_{1}^{i} M_i\]
748
749\paragraph{\emph{c} fragment series} The mass calculation is
750mathematically expressed \[c_i = LC + \sum_{1}^{i} M_i + NH_3\]
751
\paragraph{\emph{x} fragment series} For this series of fragments we
no longer add the left cap, but the right cap instead,
since the fragments hold the right end of the precursor polymer. Note
also that the numbering of the monomers using the variable \emph{i} in
the following mathematical expressions goes from right to left
(contrary to what happened for the \emph{a, b, c} fragment series). All
the fragments that hold the precursor polymer right end are numbered
this way, so this applies to fragments \emph{x, y, z}. The mass
calculation is mathematically expressed \[x_i = RC + \sum_{1}^{i} M_i
+ CO\]
762
763\paragraph{\emph{y} fragment series} The calculation is mathematically
764expressed \[y_i = RC + \sum_{1}^{i} M_i + H_2\]
765
766\paragraph{\emph{z} fragment series} In low energy CID, the \emph{z}
767fragments are expressed this way: \[z_i = RC + \sum_{1}^{i} M_i - NH\]
768which is equivalent to \emph{y-$NH_3$}; in high energy CID an
769additional proton is often measured: \[z_i = RC + \sum_{1}^{i} M_i -
770NH + H\]
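
The stated equivalence between the low energy \emph{z} fragment and
\emph{y-$NH_3$} can be checked term by term from the \emph{y} formula
given above:
\[y_i - NH_3 = RC + \sum_{1}^{i} M_i + H_2 - NH_3 = RC + \sum_{1}^{i}
M_i - NH = z_i\]
Assuming the usual monoisotopic masses (N 14.003074, H 1.007825), the
$NH$ term weighs 15.010899 and $NH_3$ weighs 17.026549.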
771
\paragraph{\emph{immonium} fragment series} These fragments are
internal fragments in the sense that they hold neither of the
two precursor polymer's ends. \mXp\ understands that the user is
speaking of this kind of fragment when the ``from which end'' piece of
data in the fragmentation specification states ``NE'' instead of
``LE'' or ``RE'' (see page~\pageref{sect:fragspecif}). The mass
calculation for these fragments does not take into account the
monomers surrounding the one for which the calculation is done. The
mass for an immonium ion at position \emph{i} in the precursor
polymer is the mass of the monomer at position \emph{i}, less
the mass of a CO, plus the mass of a proton. The mass calculation for
these special internal fragments is expressed \[imm_i = M_i + H - CO\]
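
As a hedged numerical illustration, take a glycine monomer at position
\emph{i}. Assuming the usual monoisotopic masses (Gly residue
57.021464, H 1.007825, CO 27.994915) and counting the proton as one
hydrogen atom (the electron mass is neglected in this sketch), the
formula gives:
\[imm_i = 57.021464 + 1.007825 - 27.994915 = 30.034374\]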
784
785
786\subsubsection*{Nucleic Acid Fragmentation}
787\index{fragmentation!nucleic~acid}
788
The fragmentations that can be obtained with nucleic acids are
numerous, and describing them fully is more complicated than with
proteins. The main reason for this is that there is a large number
of fragmentation combinations because of the loss of nitrogenous bases
from the skeleton. The mechanisms by which this loss happens are
fairly complex, and I am not going to detail any of them.
Figure~\vref{fig:dna-fragmentation} shows the most common fragmentations
(without taking into consideration the potential loss of bases). An
example fragment is given for each fragment series (in much the same
way as we did before for proteins). Note that the fragment
representations are aimed at helping the reader to figure out what the
product ion is, not at showing where the negative charge lies
on the fragment, since this charge can sit on any
deprotonatable group. All the fragments shown bear one and only one
negative charge.
804
Another remark pertaining to the ionization mode of the product ions:
the fragmentation schemes below apply to negatively-charged precursor
ions (by loss of a proton, typically). To compute the product ions'
masses obtained in positive mode fragmentation experiments,
simply add as many protons (or any other cationic ionization agent) as
required. For example, starting from a fragment negatively charged
once (-H), add a first proton to go back to the uncharged state
and then add another proton to yield the monoprotonated (thus singly
positively charged) product ion. The requirement to be able to
compute masses for both positively- and negatively-charged product
ions imposes a specific way to define fragmentation
specifications in the \xpd{} module (to be detailed later in this
manual).
818
The reader might have noticed at the bottom of
Figure~\vref{fig:dna-fragmentation} that a provision is made for the
case in which the
fragmented molecular species are not 5' end-phosphorylated but 5'
end-hydroxylated. Indeed, the canonical monomer is such that, upon
polymerization and left capping, the 5' end is phosphorylated.
However, oligonucleotides are often synthesized chemically
without the 5' end phosphate group, thus ending in a hydroxyl. This
special case should be accounted for by applying to all the fragments
that bear the left end of the precursor polymer the following chemical
reaction: $\mathrm{-HPO_3}$. This chemical reaction should be applied
\emph{in addition} to the chemical reaction that yields the fragment
\emph{per se}.
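
For reference, the mass change carried by this reaction can be worked
out as follows, assuming the usual monoisotopic masses (H 1.007825,
P 30.973762, O 15.994915):
\[\mathrm{HPO_3} = 1.007825 + 30.973762 + 3 \times 15.994915 =
79.966332\]
Under these assumptions, each left end-bearing fragment of a 5'
end-hydroxylated precursor is 79.966332 mass units lighter than its
5' end-phosphorylated counterpart.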
831
832\begin{figure}
833  \begin{center}
834    \includegraphics[scale=0.2]{figures/dna-fragmentation.png}
835  \end{center}
836  \caption[DNA fragmentation]{\textbf{DNA fragmentation patterns most
837      widely encountered.} A short DNA sequence is fragmented in the
838    eight most widely encountered manners, such as to generate a, b,
839    c, d, w, x, y, z fragment ions. The figure illustrates the
840    position of the cleavage for each kind of fragment (exemplified
    using the case of the smallest fragment possible) and the mass
842    calculation method is described for each fragment kind;
    consider that each fragment bears only \emph{one negative} charge.}
844  \label{fig:dna-fragmentation}
845\end{figure}
846
847Exactly as done earlier for the protein fragments, the mathematical
848expressions used to calculate the mass of different series of nucleic
849acid fragments are provided; in these calculations it is assumed that
850the left end of the precursor polymer is phosphorylated (5'P) and the
851reader should bear in mind that this precise phosphate might itself be
expelled by the fragmentation. The fragment naming scheme detailed
853earlier for proteins applies to nucleic acids in the very same manner.
854
855\paragraph{\emph{a} fragment series}
856These fragments most often appear with base loss. \[a_i = LC +
857\sum_{1}^{i} M_i - O\]
858
859\paragraph{\emph{b} fragment series}
860\[b_i = LC + \sum_{1}^{i} M_i\]
861
862\paragraph{\emph{c} fragment series}
863\[c_i = LC + \sum_{1}^{i} M_i - HPO_2\]
864
865\paragraph{\emph{d} fragment series}
866\[d_i = LC + \sum_{1}^{i} M_i - HPO_3\]
867
868\paragraph{\emph{w} fragment series}
869\[w_i = RC + \sum_{1}^{i} M_i + O\]
870
871\paragraph{\emph{x} fragment series}
872\[x_i = RC + \sum_{1}^{i} M_i\]
873
874\paragraph{\emph{y} fragment series}
875\[y_i = RC + \sum_{1}^{i} M_i - HPO_2\]
876
877\paragraph{\emph{z} fragment series}
878\[z_i = RC + \sum_{1}^{i} M_i - HPO_3\]
879
880There are also a variety of fragments for which a base is lost.
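
Taken together, the formulas above imply fixed mass differences
between series at a given position $i$. The relations below are
derived only from those formulas; the numerical values assume the
usual monoisotopic masses (H 1.007825, O 15.994915, P 30.973762):
\[a_i = b_i - O \qquad c_i = b_i - HPO_2 \qquad d_i = b_i - HPO_3\]
\[w_i = x_i + O \qquad y_i = x_i - HPO_2 \qquad z_i = x_i - HPO_3\]
with $O = 15.994915$, $HPO_2 = 63.971417$ and $HPO_3 = 79.966332$.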
881
882
883\subsubsection*{More Complex Patterns Of Fragmentation}
884
885
886Before finishing with fragmentations, it is necessary to describe a
887powerful feature of the fragmentation specification grammar available
888in \mXp. This feature was required for the fragmentation of
889oligosaccharides and also sometimes for proteins. When the
890fragmentation (the bond breakage reaction itself) occurs at the level
891of certain monomers, it might be necessary to be able to specify some
892particular chemistry that would arise on the monomer in question.
893
We have seen in the cleavage documentation that, upon cleavage of a
protein sequence with cyanogen bromide, for example, a particular
chemical reaction had to be applied to the oligomers that were
generated with a methionine monomer as their right end monomer. Well,
in a fragmentation specification it is possible to apply comparable
chemical reactions but in a more thorough manner. Indeed, while in the
cleavage it was possible to say something like \textsl{``apply a given
  chemical reaction to the oligomer if the right end monomer is
  Xyz''}, in the fragmentation the logical condition can be bound not
only to the identity of the currently fragmented monomer, but also
(optionally) to the identity of the previous and/or next monomer in
the precursor polymer sequence. For example: \textsl{``apply a
  given chemical reaction if fragmentation occurs at the level of
  a `Xyz' monomer, but only if it is preceded by a `Yxz' monomer and
  followed by a `Zyx' monomer''}.
909
910These logical conditions are called \emph{fragmentation rules}. A
911\emph{fragmentation specification} can hold as many rules as
912necessary. All of this is described in great detail at
913page~\pageref{sect:fragspecif}.
914
915\subsubsection*{To Sum Up}
916
917
To sum up what we have seen so far about polymer chain-disrupting
chemistries:
920
921\begin{itemize}
922\item A polymer sequence gets cleaved into oligomers when a chemical
923  reaction occurs in it at the level of one or more inter-monomer
924  bond(s); monomer-specific chemical reactions can be modelled into
  the cleavage specification using at most one leftrightrule;
926\item A polymer sequence gets fragmented into fragments when a bond
927  breakage occurs, without the help of any exterior molecule, at any
928  level of the polymer structure, with no limitation to the
929  inter-monomer bond; monomer-specific chemical reactions can be
930  modelled into the fragmentation specification using any number of
931  fragrules;
932\item Oligomers are automatically capped---\emph{on both ends}---using
933  the rules described in the precursor polymer's definition;
934\item Fragments are capped automatically only---\emph{on the end they
935    hold, if any}---using the rules described in the precursor
936  polymer's definition;
937\item Oligomers are automatically ionized (if required by the user)
938  using the rules described in the precursor polymer's definition;
939\item Fragments are never ionized automatically; ionization (gain/loss
940  of a charged group) is necessarily integrated in the fragmentation
941  specification.
942\end{itemize}
943
944
945\cleardoublepage
946
947
948%%% Local Variables:
949%%% mode: latex
950%%% TeX-master: "massxpert"
951%%% End:
952