\chapter[Basics in Polymer Chemistry]{Basics in Polymer Chemistry}
\label{chap:basics-polymer-chemistry}

This chapter introduces the basics of polymer
chemistry\index{polymer~chemistry}. The coverage is admittedly biased
towards mass spectrometry and biological polymers. Moreover, the aim
of this chapter is to provide the reader with the specialized
vocabulary that will later be used to describe and explain the (inner)
workings of the \mXp\ program. This manual is not a ``crash course''
in biochemistry.

\renewcommand{\sectitle}{Polymers? Where? Everywhere!}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

Indeed, polymers are everywhere. If you ask somebody to show you
something polymeric, they will point to the first plastic object
in the vicinity. Right, plastic materials are made of hydrocarbon
polymers. We also have many different polymers in our body. Proteins
are polymers, complex sugars are polymers, DNA (the so-called
``molecule of heredity'') is a \emph{huge} polymer. There are polymers
in wine, in wood\dots\ Where? Everywhere!

\bigskip

\noindent The \textsl{Oxford Advanced Learner's Dictionary of Current
  English} gives the following definition for \emph{polymer}:
\textit{natural or artificial compound made up of large molecules
  which are themselves made from combinations of small simple
  molecules}.

\bigskip

\noindent A polymer is indeed made by covalently linking small simple
molecules together. These small simple molecules are called
\emph{monomer}s, and it follows immediately that a \emph{polymer} is
made of a number of monomers. A general term to describe the process
that leads to the formation of a polymer is \emph{polymerization}. It
should be noted that there are many ways to polymerize monomers
together. For example, a polymer might be either linear or branched.
A polymer is linear if each of the monomers that are polymerized can
be joined at most two times. The first junction links the monomer to
an elongating polymer (thus making it the new end of the elongating
polymer, which is now longer by one unit) and the second junction
links this new end to another monomer. This process goes on until the
reaction is stopped, the point at which the polymer reaches its
\emph{finished state}\index{finished~state}. A branched polymer is a
polymer in which at least one monomer is able to form more than two
bonds. It is thus clear that a single monomer linked three times to
other monomers will yield a ``T-structure'', which is nothing but a
branched structure.

In the following sections we'll describe a number of different kinds
of polymers. Each time, we will first detail the structure of their
constitutive monomers and then describe the formation of the polymer.
At each step we shall try to set forth each polymer's characteristics
in such a manner as to introduce the way \mXp\ ``thinks polymers'' and
to introduce specialized terminology. Once the basic chemistries (of
the different polymers) have all been described, we will enter a more
complex subject that is of enormous importance to the mass
spectrometry specialist: polymer chain disrupting chemistry. We shall
see that this terminology actually covers two kinds of chemistries:
cleavage, on the one hand, and fragmentation, on the other hand.

While \mXp\ is basically oriented to linear single-stranded polymer
chemistries, it can also be used to simulate highly complex polymer
chemistries. Biological polymers are the main focus of this manual;
however, all the concepts described here may be applied with no
modification to synthetic polymer chemistries.


\renewcommand{\sectitle}{Various Biopolymer Structures}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

Biopolymers are amongst the most sophisticated and complex polymers on
earth and it certainly is not a mistake to take them as examples of
how monomers (be these complex or not) can assemble covalently into
life-enabling polymers. In this section we will visit three different
polymers encountered in the living world: proteins, nucleic acids and
polysaccharides. We shall be concerned with 1) the monomers'
structure, 2) the polymerization reaction and 3) the final end-capping
reaction responsible for putting the polymer in its finished state.

\renewcommand{\sectitle}{Proteins}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\index{protein}

These biopolymers are made of amino acids. There are twenty major
amino acids in nature, and each protein is made of a number of these
amino acids\index{amino~acid}. The number of possible combinations is
virtually unlimited, providing enormous diversity of proteins to the
living world.

A protein is a polar polymer: it has a left end and a right
end\index{protein!left/right ends}, and polymerization actually occurs
from left to right (from N-terminus to C-terminus, see
below). Figure~\ref{fig:peptbond-formation} shows that the chemical
reaction at the basis of protein synthesis is a
\emph{condensation}\index{condensation}. A protein is the result of
the condensation of amino acids with each other in an orderly polar
fashion. A protein has a left end, called \emph{N-terminus; amino
  terminal end} and a right end, called \emph{C-terminus; carboxyl
  terminal end}. The left end is an amino group ($\mathrm{H_2N}$--)
corresponding to the non-reacted amino group of the first amino acid.
Upon condensation of a new amino acid onto the first one, the carboxyl
group of the first amino acid reacts with the amino group of the
second amino acid. A water molecule is released, and the formation of
an amide bond between the two amino acids yields a dipeptide. The
right end of the dipeptide is a carboxyl group (--COOH) corresponding
to the unreacted carboxyl group of the last amino acid to have
``polymerized in''.

The bond formed by condensation of two amino acids is an amide
bond\index{protein!amide~bond}, also called, in protein chemistry, a
\emph{peptidic bond}. The elongation of the protein is a simple
repetition of the condensation reaction shown in
Figure~\ref{fig:peptbond-formation}, given that the elongation
\emph{always} proceeds in the described direction (a new monomer
arrives at the right end of the elongating polymer, and elongation
proceeds from left to right).

\begin{figure}
  \begin{center}
    \includegraphics[width=0.4\textwidth]{figures/peptbond-formation.png}
  \end{center}
  \caption[Peptidic bond formation]{\textbf{Peptidic bond formation by
      condensation.} The left end monomer $\mathrm{R_1}$ is condensed
    to the right end monomer $\mathrm{R_2}$ to yield a peptidic bond.
    A water molecule is lost during the process.}
  \label{fig:peptbond-formation}
\end{figure}

Now we should point out a terminology issue specific to protein
chemistry: we have seen that a protein is a polymer made of a number
of monomers, called amino acids\index{amino~acid}. In protein
chemistry, there is a subtlety: once a monomer is polymerized into a
protein it is no longer called a monomer, it is called a
\emph{residue}\index{residue}. We may say that a residue is an amino
acid less a water molecule.
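
As a worked illustration of this definition (the glycine formula is
standard chemistry, not taken from this manual), removing one water
molecule from the glycine monomer gives the glycyl residue:
\[
  \underbrace{\mathrm{C_2H_5O_2N_1}}_{\mbox{\scriptsize glycine}}
  \;-\; \mathrm{H_2O_1}
  \;=\;
  \underbrace{\mathrm{C_2H_3O_1N_1}}_{\mbox{\scriptsize glycyl residue}}
\]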

From what we have seen until now, we may define a protein this way:
\textsl{``A protein is a chain of residues linked together in an
  orderly polar fashion, with the residues being numbered starting
  from 1 and ending at n, from the first residue on the left end to
  the last one on the right end''}. This definition is still partly
inexact, however. Indeed, from what is shown in
Figure~\ref{fig:prot-polymer}, there is still a problem with the
extremities of the residual chain: what about the amino group on the
left end of a protein (the amino group sits on the first amino acid of
the protein), and what about the carboxyl group of the right end of a
protein (the carboxyl group sits on the last amino acid of the
protein)? Because these groups lie at the extremities of the residual
chain, they remained unreacted during the polymerization process. But
because we are simulating a residual chain using residues and not
amino acids, we still need to put the residual chain in its finished
state: by \emph{capping} the left end with a proton \emph{cap} (so as
to complete the amino group) and the right end with a hydroxyl cap (so
as to complete the carboxyl
group)\index{protein!left/right~caps}. The capping of the residual
chain extremities ensures that the polymer is in its finished state,
and that it cannot be elongated anymore. The proton is the \emph{left
  cap} of the protein polymer and the hydroxyl is the \emph{right cap}
of the protein polymer.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.4\textwidth]{figures/prot-polymer.png}
  \end{center}
  \caption[End capping chemistry of the protein polymer]{\textbf{End
      capping chemistry of the protein polymer.} A protein is made of
    a chain of residues and of two caps. The left cap is the
    N-terminal proton and the right cap is the C-terminal hydroxyl.
    Altogether, the residual chain (enclosed here in the blue
    polygon) and both the H and OH red-colored caps form a
    complete protein polymer in its finished state.}
  \label{fig:prot-polymer}
\end{figure}

Now comes the question of unambiguously defining the structure of a
protein. It is commonly accepted that the simple ordered sequence of
each residue code in the protein, from left to right, constitutes an
unambiguous description of the protein's primary structure (that is,
its sequence). Of course, proteins have three-dimensional structures,
but this is of no interest to a program like \mXp, which is aimed at
calculating masses of polymers. To state the sequence of a protein
unambiguously, one would use a notation like this:
\smallskip

\begin{mynoindent}
  {\footnotesize using the 3-letter code of the amino acids:}\\
  Ala Gly Trp Tyr Glu Gly Lys\\
  {\footnotesize or, using the 1-letter code of the amino acids:}\\
  A G W Y E G K\\
  Alanine is thus residue 1 and Lysine is the last residue (n =
  7).
\end{mynoindent}


\renewcommand{\sectitle}{Nucleic Acids}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\index{nucleic~acid}

These biopolymers are more complex than proteins, mainly because they
are composed of monomers \emph{(nucleotides)}\index{nucleotide} that
have three different chemical parts, and because those parts differ in
DNA and RNA. A nucleotide\index{nucleic~acid!left/right~ends} is the
nucleic acid's building block: \emph{a nucleotide consists of a
  nitrogenous base combined with a ribose/deoxyribose sugar and with a
  phosphate group}. There are two different kinds of nucleic acids:
deoxyribonucleic acid (DNA, the sugar is a deoxyribose) and
ribonucleic acid (RNA, the sugar is a ribose).
DNA is most often found in its double-stranded form, while RNA is most
often found in single-stranded form. There are four nitrogenous bases
for each: Adenine, Thymine, Guanine and Cytosine for DNA; in RNA only
one of these bases changes: Thymine is replaced by Uracil. As with
proteins, nucleic acids are polar polymers: the polymerization process
is polar, from left to right (sometimes left is up and right is down
in certain vertical representations found mainly in textbooks).

This manual is not meant to teach biochemistry, which is why the
structure of the monomers is not described in atomic detail. However,
since it is important to understand how the polymerization occurs,
Figure~\ref{fig:nucacbond-formation} represents the polymerization
reaction mechanism between a nucleotide and another one, to yield a
dinucleotide. That reaction is a \emph{trans-esterification}. A
nucleic acid has a left end (\emph{5' end; often this end is
  phosphorylated}) and a right end (\emph{3' end; hydroxyl end}). The
trans-esterification reaction is the attack of the phosphorus of the
new (deoxy)nucleotide triphosphate by the 3'OH of the right end of the
elongating nucleotidic chain. Upon
trans-esterification\index{trans-esterification}, an \emph{inorganic
  pyrophosphate} ($\mathrm{PP_i}$) is released, and the formation of a
phosphodiester bond\index{nucleic~acid!phosphodiester~bond} between the two
nucleotides yields a dinucleotide. The elongation of the nucleic acid
polymer is a simple repetition of this trans-esterification reaction
so that the chain growth is always in the 5'$\Longrightarrow$3'
direction. This is achieved in living cells by what is called the
\emph{5'$\Longrightarrow$3' polymerase enzymatic activity}.
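
The elemental bookkeeping of one elongation step can be sketched as
follows, assuming that all species are counted in their neutral, fully
protonated form and taking deoxyadenosine triphosphate as the incoming
monomer (both formul{\ae} on the left are standard chemistry, not
taken from this manual):
\[
  \underbrace{\mathrm{C_{10}H_{16}O_{12}N_5P_3}}_{\mbox{\scriptsize dATP}}
  \;-\;
  \underbrace{\mathrm{H_4O_7P_2}}_{\mbox{\scriptsize pyrophosphate}}
  \;=\;
  \underbrace{\mathrm{C_{10}H_{12}O_5N_5P_1}}_{\mbox{\scriptsize residue added to the chain}}
\]
In this accounting, each elongation step adds one such residue to the
nucleotidic chain.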

\begin{figure}
  \begin{center}
    \includegraphics[width=0.8\textwidth]{figures/nucacbond-formation.png}
  \end{center}
  \caption[Phosphodiester bond formation]{\textbf{Phosphodiester bond
      formation by esterification.} The arriving monomer (on the
    right) has its triphosphate on the 5' carbon of the sugar
    esterified by nucleophilic attack of the first phosphorus by the
    alcohol function borne by the 3' carbon of the (deoxy)ribose
    sugar ring of the left monomer. The bond that is formed is a
    phosphodiester bond, with release of a pyrophosphate group
    ($\mathrm{PP_i}$). Note that the sugar and nitrogenous bases are
    schematically represented in this figure.}
  \label{fig:nucacbond-formation}
\end{figure}


The conventional representation of a nucleic acid involves showing the
5' end on the left, and the 3' end on the right, horizontally.
Sometimes, to clearly indicate that the left end is phosphorylated,
while the right end is not, the ends are indicated as ``5'P'' and
``3'OH''. Figure~\ref{fig:nucac-polymer} shows a simple way to
formalize what a nucleic acid polymer is. The molecule represented on
the left is the ``monomer'' in the sense that the polymer is made of n
monomers. On the right side of that figure, the polymer made of n
monomers is shown as a residual chain (inside the blue polygon box)
that is capped with OH on its left end and H on its right end
(red-colored atoms)\index{nucleic~acid!left/right~caps}. Thus, in the
case of the nucleic acid polymers, the left cap is a hydroxyl and the
right cap is a proton. This incidentally happens to be the exact
converse of what was described earlier for proteins.
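
As a hedged worked example of this capping scheme, consider a
hypothetical chain reduced to a single deoxyadenylate residue
($\mathrm{C_{10}H_{12}O_5N_5P_1}$). Adding the two caps gives
\[
  \underbrace{\mathrm{OH}}_{\mbox{\scriptsize left cap}}
  \;+\; \mathrm{C_{10}H_{12}O_5N_5P_1}
  \;+\; \underbrace{\mathrm{H}}_{\mbox{\scriptsize right cap}}
  \;=\; \mathrm{C_{10}H_{14}O_6N_5P_1},
\]
which is the formula of deoxyadenosine monophosphate, that is, the
finished polymer made of one monomer.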

\begin{figure}
  \begin{center}
    \includegraphics[width=0.5\textwidth]{figures/nucac-polymer.png}
  \end{center}
  \caption[A nucleic acid is a capped nucleotide chain]{\textbf{End
      capping chemistry of the nucleic acid polymer.} A nucleic acid
    is made of a chain of nucleotides (left formula) and of two caps.
    The left cap is the hydroxyl group that belongs to the terminal
    phosphate of the 5' carbon of the sugar. The right cap is the
    proton that belongs to the hydroxyl group of the 3' carbon of the
    sugar ring (right formula). Altogether, a finished nucleic acid
    polymer is made of the nucleotidic chain (enclosed here in the
    blue polygon), made of the repetitive elements (one of which is
    shown on the left), and of the two caps (red-colored OH and H, out
    of the box on the right).}
  \label{fig:nucac-polymer}
\end{figure}

Now comes the question of unambiguously defining the structure of a
nucleic acid. It is commonly accepted that the listing of the named
nitrogenous bases in the nucleic acid, from left (5' end) to right
(3' end), constitutes an unambiguous description of the nucleic acid
sequence. To state the sequence of a gene, one would use a
notation like this:
\smallskip

\begin{mynoindent}
  {\footnotesize for a DNA, using the 1-letter code of the nitrogenous
    bases:} A T G C A G T C\\
  {\footnotesize for an RNA, using the 1-letter code of the
    nitrogenous bases:} A U G C A G U C\\
  Adenine is thus base 1 and Cytosine is the last base (n = 8).
\end{mynoindent}


\renewcommand{\sectitle}{Saccharides}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\index{saccharide}

These biopolymers are certainly amongst the most complex ones in the
living world.
This is mainly due to the fact that saccharides are usually heavily
modified in living cells with a huge variety of chemical
modifications. Furthermore, branching of the polymer structure is the
rule rather than the exception. Interestingly, these molecules are
first thought of as the ``fuel'' for the cell, which is certainly far
from being total nonsense, but it is also beyond doubt that their
structural role is extremely important (often in combination with
proteinaceous material). Another interesting aspect of their ability
to form complex structures is their use as ``key'' systems for
identification processes: a number of complex sugars are located on
the cell walls and provide ``recognition patterns'' for the other
cells to deal with\dots

Nonetheless, the general picture is not that complex, if the way
monomers are polymerized together is the only concern (which is the
case in this manual). As far as we are concerned, in fact, the
polymerization mechanism is a simple condensation (much like what has
been described for proteins), yielding a sugar
bond\index{saccharide!sugar~bond}. Indeed, some people use the same
terminology: a monomeric sugar becomes a residue once polymerized in
the saccharidic chain. There are two main different kinds of sugars:
\emph{pentoses} (in $\mathrm{C_5}$) and \emph{hexoses} (in
$\mathrm{C_6}$); it should be noted, however, that there is a variety
of other common molecules, like \emph{sialic acids},
\emph{heptoses}\dots

As already seen for proteins and nucleic acids, a saccharidic
polymer is polar: it has a left end and a right
end\index{saccharide!left/right~ends}.
The terminology regarding the ends of a saccharidic polymer is rather
unexpected at first sight: the left end is said to be the
\emph{non-reducing end}\index{non-reducing~end} while the right end is
said to be the \emph{reducing end}\index{saccharide!reducing~end}.
Historically this was observed with monosaccharides (also called
\emph{monoses}\index{monose}), which reduced cupric
($\mathrm{Cu^{2+}}$) ions, thus getting oxidized themselves on the
carbonyl (when in the open ring aldehydic form).

Figure~\vref{fig:sacchbond-formation} shows the polymerization
reaction between a sugar and another one (2 glucose monomers,
actually), to yield a maltose disaccharide. The polymerization
mechanism is a simple condensation. The elongation of the saccharidic
polymer is a simple repetition of this condensation reaction so that
the chain growth is always in the same orientation, from the
non-reducing end to the reducing end. The conventional representation
of a polysaccharide involves showing the non-reducing end on the left,
and the reducing end on the right, horizontally.
Figure~\vref{fig:sacch-polymer} shows a simple way to formalize what a
saccharidic polymer is. The top formula is the representation of the
monomer. The bottom formula represents a polysaccharide, with the
repetitive elements boxed (there are n monomers polymerized). The
atoms shown in red (outside the boxed repetitive elements) are the
saccharidic polymer caps. Thus, we see clearly that in the case of
polysaccharides, the left cap is a proton and the right cap is a
hydroxyl\index{saccharide!left/right~caps}. This incidentally happens
to be identical to proteins and the exact converse of what we
described previously for nucleic acids.
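
A short worked example, using the standard formula of glucose
($\mathrm{C_6H_{12}O_6}$, hence a glucosyl residue of
$\mathrm{C_6H_{10}O_5}$; neither formula is taken from this manual),
shows that the capped-chain description of maltose agrees with the
condensation reaction:
\[
  \underbrace{\mathrm{H}}_{\mbox{\scriptsize left cap}}
  \;+\; 2 \times \mathrm{C_6H_{10}O_5}
  \;+\; \underbrace{\mathrm{OH}}_{\mbox{\scriptsize right cap}}
  \;=\; \mathrm{C_{12}H_{22}O_{11}}
  \;=\; 2 \times \mathrm{C_6H_{12}O_6} \;-\; \mathrm{H_2O}
\]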



\begin{figure}
  \begin{center}
    \includegraphics[width=0.8\textwidth]{figures/sacchbond-formation.png}
  \end{center}
  \caption[Osidic bond formation]{\textbf{Osidic bond formation by
      condensation.} The two monomers are subject to condensation with
    loss of one molecule of water.}
  \label{fig:sacchbond-formation}
\end{figure}

\begin{figure}
  \begin{center}
    \includegraphics[width=0.4\textwidth]{figures/sacch-polymer.png}
  \end{center}
  \caption[A saccharidic polymer is a capped osidic residue
  chain]{\textbf{End capping chemistry of the polysaccharidic
      polymer.} A polysaccharide is made of a chain of osidic residues
    (blue-boxed formula) and of two caps (red-colored atoms). The left
    cap is the proton that belongs to the non-reducing end of
    the polymer. The right cap is the hydroxyl group that belongs to
    the reducing end of the polymer.}
  \label{fig:sacch-polymer}
\end{figure}

Now comes the question of unambiguously defining the structure of a
saccharidic polymer. It is commonly accepted that the simple ordered
sequence of the named monoses in the saccharidic polymer, from left
(non-reducing end) to right (reducing end), constitutes an unambiguous
description of the glycan sequence. To state the sequence of a
glycan, one would use a notation like this:

\begin{mynoindent}
  {\footnotesize using a 3-letter code:}\\
  Ara Gal Xyl Glc Hep Man Fru\\
  Arabinose is thus monose 1 and Fructose is the last monose (n =
  7).
\end{mynoindent}

\bigskip

\noindent Incidentally, this is where the ability of \mXp\ to handle monomer
codes of unlimited length comes in handy!

\renewcommand{\sectitle}{To Sum Up}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

We made a rapid overview of the three major polymers in the living
world.
A great many other polymers exist around us.
Table~\vref{tab:three-biopolym-exples} tries to sum up all the
information gathered so far. Note that the formul{\ae} given for the
monomers are the ``residual'' ones. For example, the formula of the
glycyl residue corresponds to the formula of the Glycine monomer less
one molecule of water. Many synthetic polymers are much simpler than
the ones we have rapidly reviewed, and it should be clear that, if
\mXp\ can deal with the complex biopolymers described so far, it
certainly will be very proficient with less complex synthetic
polymers. Describing the formation of polymers is one thing, but we
also have to describe how to disrupt polymers. This is what we shall
do in the next section.


\begin{table}
  \begin{small}
    \begin{tabular}{c|ccccc}\hline
      polymer & name & code & formula & left cap & right cap \\
      \hline
      protein & & & & H & OH \\
      & Glycine & G & $\mathrm{C_2H_3O_1N_1}$ & & \\
      & Alanine & A & $\mathrm{C_3H_5O_1N_1}$ & & \\
      & Tyrosine & Y & $\mathrm{C_9H_9O_2N_1}$ & & \\
      nucleic acid& & & & OH & H \\
      & Adenine & A & $\mathrm{C_{10}H_{12}O_5N_5P_1}$ & & \\
      & Cytosine & C & $\mathrm{C_9H_{12}O_6N_3P_1}$ & & \\
      saccharide & & & & H & OH \\
      & Arabinose & Ara & $\mathrm{C_5H_8O_4}$ & & \\
      & Heptose & Hep & $\mathrm{C_7H_{12}O_6}$ & & \\
      \hline
      \multicolumn{6}{c}{Note: LC=left cap; RC= right cap}\\
      \hline
    \end{tabular}
    \caption[Comparison of three common biopolymers]{\textbf{Quick comparison of three biopolymers with examples of monomers}}\label{tab:three-biopolym-exples}
  \end{small}
\end{table}


\renewcommand{\sectitle}{Polymer Chain Disrupting Chemistry}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\label{sect:pol-chain-disrupt-chem}

The ``polymer chain disrupting chemistry'' was mentioned earlier as a
complex subject that was of \emph{enormous} importance to the mass
spectrometrist. This is why that subject will be treated in a pretty
thorough manner. First of all it should be noted that a chemical
modification of a polymer does not necessarily involve the
perturbation of the chain structure of the polymer. Here, however, we
are concerned specifically with two kinds of chemical modifications
that do perturb the polymer chain: \emph{cleavage} and
\emph{fragmentation}:
\smallskip

\begin{mynoindent}
  \textsc{A cleavage is a chemical process}\index{cleavage} by which a
  cleaving agent acts directly on the polymer chain, making it fall
  into at least two separated pieces (the \emph{oligomers}). As a
  result of the cleavage reaction, groups originating in the cleaving
  molecule remain attached to the polymer at the precise cleavage
  location;

\smallskip

\textsc{A fragmentation is a chemical process}\index{fragmentation} by
which the polymer structure is disrupted into separated pieces (the
\emph{fragments}) mainly because of energy-dependent electron doublet
rearrangements leading to bond breakage.
\end{mynoindent}



\renewcommand{\sectitle}{Polymer Cleavage}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\label{sect:polymer-cleavage}
\index{cleavage}

We said above that, upon cleavage of a polymer, the cleaving molecule
reacts with it, and by doing so directly or indirectly
``\emph{dissolves}'' an inter-monomer bond. A polymer cleavage always
occurs in such a way as to generate a set of \emph{true} polymers
(smaller in size than the parent polymer, evidently, which is why they
are called \emph{oligomers}).
Indeed, let us take the example shown in
Figure~\ref{fig:prot-cleavage}, where a tripeptide (a very small
protein, containing a methionyl residue at position 2) is submitted
either to a water-mediated cleavage (hydrolysis, upper panel) or to a
cyanogen bromide-mediated cleavage (lower panel). The two cases
presented in this figure are similar in some respects and different in
others:

\begin{itemize}

\item In the first case the molecule that is responsible for the
  cleavage is water, while in the second case it is cyanogen bromide;

\item In both cases the bond that is cleaved is the inter-monomer bond
  (in protein chemistry this is a peptidic bond);

\item In both cases the Oligomer 2 has the same structure;

\item The structures of the Oligomer 1 species differ when produced
  using water or cyanogen bromide as the cleaving molecule.

\end{itemize}
%
%
The difference between hydrolysis and cyanogen bromide cleavage is in
the generation of the Oligomer 1 species: the cyanogen bromide
cleavage has a side effect of generating a homoserine as the right end
monomer of Oligomer 1, while hydrolysis generates a genuine methionine
monomer. This is because water reverses in a very symmetrical manner
what polymerization did (hydrolysis is the converse of condensation),
while cyanogen bromide\index{cyanogen~bromide} chemically modifies
the generated Oligomer 1 species.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.8\textwidth]{figures/prot-cleavage.png}
  \end{center}
  \caption[Protein cleavage by water and cyanogen
  bromide]{\textbf{Protein cleavage by water and cyanogen bromide.} A
    tripeptide is cleaved at position 1 either by hydrolysis (top) or
    by cyanogen bromide (bottom). Cyanogen bromide cleaves
    specifically on the right of a methionine monomer.
    Upon cleavage,
    the methionyl monomer gets converted into homoserine by the
    cyanogen bromide reagent.}
  \label{fig:prot-cleavage}
\end{figure}

Nonetheless, the reader might have noted that, interestingly, all
four oligomers do have their left cap (a proton) and
their right cap (the hydroxyl). This means that in both water- and
cyanogen bromide-mediated cleavages, all the generated oligomers are
indeed true polymers in the sense that: 1) they are a chain of
monomers (modified or not) and 2) they are correctly capped
(\textit{i.e.} they are polymers in their finished state). This is
important because it is the basis on which we shall make the
difference between a cleavage process and a fragmentation process.
Thus, the \mXp\ definition of an oligomer might be: \emph{an oligomer
  is a polymer (of at least one monomer) in its finished state that
  was generated upon cleavage of a longer polymer}.


When the polymer cleavage reaction precisely reverses the reaction
that was performed for the same polymer's synthesis, there is no
special difficulty. But when the cleavage reaction modifies the
substrate, this should be carefully modelled. How? To answer this
question we might start by comparing the two different Oligomer 1
species that were yielded upon the water-mediated and the cyanogen
bromide-mediated cleavage reactions: ``the hydrolysis-generated
Oligomer 1 is equal to the cyanogen bromide-generated Oligomer 1 +S1
+C1 +H2 -O1''; this is a big difference! The observations we made so
far might be worded this way: \textsl{Whenever a protein undergoes a
  cyanogen bromide-mediated cleavage, the ``-C1H2S1+O1'' chemical
  reaction should be applied to the resulting oligomers \textit{if and
    only if} they have a methionine monomer at their right end}.
In \mXp's jargon, this logical condition is called a \emph{cleavage
  rule} (described later; see page~\pageref{sect:cleavespecif}).

Well, all this sounds reasonable. But what about the ``normal'' case,
when the cleavage is done using water? Nothing special: the mass of
the oligomer is calculated by summing the mass of each monomer in the
oligomer (since the monomers are not modified, this is easily done)
and the masses corresponding to the left and right caps (these are
defined in the polymer chemistry definition; in our present case it
would be a proton on the left end, and a hydroxyl on the right end).
In this way, the oligomer complies with its definition, which states
that it is a faithful polymer made of monomers and that it is in its
finished state.

Yes, but then how will \mXp\ manage to calculate the mass of the
modified oligomer, like our Oligomer 1 in the case of the cyanogen
bromide-mediated cleavage? Simple enough: in a first step it proceeds
exactly as for the unmodified oligomer. Next, each
oligomer is checked for presence or absence of a methionine residue on
its right end. If a methionine is found, the mass corresponding to the
``-C1H2S1+O1'' chemical reaction is applied. And that's it.

In the previous cyanogen bromide example, the logical condition
involved the identity of the oligomers' right end monomer, but other
examples can involve not the right end monomer, but the left end
monomer, if some chemical modification were to occur on the monomer
sitting to the right of the cleavage location. In this case the user
would have to analyse the situation and provide \mXp\ with the proper
chemical reaction by stating something analogous to: \textsl{\textit{if
    and only if} they have a Xyz monomer at their left end}.
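
The cyanogen bromide rule discussed above can be written out as a
worked calculation. The notation is only a sketch: $m_{\mathrm{LC}}$,
$m_{\mathrm{RC}}$ and $m_i$ are hypothetical symbols for the masses of
the left cap, the right cap and the $i$-th monomer of the oligomer,
and the monoisotopic atomic masses are standard values, not taken from
this manual.
\[
  m_{\mathrm{oligomer}} \;=\; m_{\mathrm{LC}} \;+\; \sum_{i} m_i
  \;+\; m_{\mathrm{RC}} \;+\; \Delta m_{\mathrm{rule}},
\]
where $\Delta m_{\mathrm{rule}}$ is added only if the right end
monomer is a methionine, and is zero otherwise. At the formula level,
the ``-C1H2S1+O1'' reaction turns the methionyl residue into the
homoseryl residue,
\[
  \mathrm{C_5H_9O_1N_1S_1} \;-\; \mathrm{C_1H_2S_1} \;+\; \mathrm{O_1}
  \;=\; \mathrm{C_4H_7O_2N_1},
\]
and the corresponding monoisotopic mass shift is
\[
  \Delta m_{\mathrm{rule}} \;=\; -12.000000 \;-\; 2 \times 1.007825
  \;-\; 31.972071 \;+\; 15.994915 \;\approx\; -29.9928 .
\]
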
This introduction to polymer cleavage abstraction should be enough to
later delve into the cleavage specification definition as \mXp\
conceives it, which is thoroughly detailed on
page~\pageref{sect:cleavespecif}.


\renewcommand{\sectitle}{Polymer Fragmentation}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\label{sect:polymer-fragmentation}
\index{fragmentation}

In a fragmentation process, the bond that is broken is not necessarily
the inter-monomer bond. Indeed, fragmentations are often high-energy
chemical processes that can affect bonds that belong to the
monomers' internal structure. This is one of the reasons why
fragmentations differ from cleavages: they are specific to the
polymer type in which they occur. Hydrolyzing a protein or an
oligosaccharide is the same process, from a chemical point of
view. But fragmenting a protein and fragmenting an oligosaccharide are
truly different processes because the way the fragmentation happens in
the polymer sequence depends so much on the nature of each
monomer that makes it up.

Another peculiarity of fragmentations, compared with the cleavages
that were described above, is the fact that there is no cleaving
molecule starting the process. Instead, a fragmentation process is
often initiated by an intramolecular electron doublet rearrangement
that propagates more or less in the polymer structure to eventually
break it. Fragmentations are mainly a gas phase process, not some
reaction that happens in solution as a result of putting in contact
the polymer and some reagent. It is precisely because no cleaving
molecule is involved in the fragmentation process that the fragments
are not necessarily capped like a normal polymer should be; and this
is another really important difference between cleavage and
fragmentation.
The following examples should illustrate these concepts: protein and
nucleic acid fragmentation.


\subsubsection*{Protein Fragmentation}
\index{fragmentation!protein}

There is a fairly large number of different kinds of fragments that
can be generated upon fragmentation of peptides. We are going to
detail the most common ones; the user is invited to use the \mXp\
fragmentation-specification grammar to add less frequent (or newly
discovered) fragmentation types. Note that the fragmentation schemes
below apply to positively-charged precursor ions. To compute the
masses of the product ions obtained in negative mode fragmentation
experiments, simply remove as many protons as required. For example,
starting from a fragment that is positively charged once (+H), remove
a first proton to go back to the uncharged state, and then remove
another proton to yield the deprotonated (thus singly negatively
charged) product ion. The requirement to be able to compute masses for
both the positively- and negatively-charged product ions imposes a
specific way to define fragmentation specifications in the \xpd{}
module (to be detailed later in this manual).

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.2]{figures/prot-fragmentation.png}
  \end{center}
  \caption[Protein fragmentation]{\textbf{Protein fragmentation
      patterns most widely encountered.} A hexapeptide is fragmented
    in the seven most widely encountered manners, so as to generate
    a, b, c, x, y, z and immonium fragment ions.
    The figure
    illustrates the position of the cleavage for each kind of fragment
    (exemplified using the case of the smallest fragment possible) and
    the mass calculation method is described for each fragment kind;
    consider that each fragment bears only \emph{one positive}
    charge.}
  \label{fig:prot-fragmentation}
\end{figure}

As can be seen from Figure~\ref{fig:prot-fragmentation}, the
fragmentations generate fragments of three categories: the ones that
include the left end of the precursor polymer (a, b, c), the ones that
include the right end of the precursor polymer (x, y, z), and finally
the special case in which the fragment is an \emph{internal fragment},
like the immonium ions. When looking at the fragmentations described
in the figure it becomes immediately clear why a fragmentation cannot
be mistaken for a cleavage: the ionization of the fragment is not
necessarily due to the capture of a proton by the fragment.
Furthermore, we can also see that a fragmentation is not a cleavage
because the fragment that is generated is \emph{absolutely} not
necessarily what we call a polymer, in the sense that the fragment
might not be capped the same way as the precursor polymer is (that is,
the fragment is not in its finished polymerization state).

The two observations above should make clear to the reader that
calculating masses for fragments is a more difficult process than what
was described above for the oligomers. Indeed, while it was simple to
calculate the mass of an oligomer (by simply adding the masses of its
constitutive monomer units, plus the left and right caps, plus
ionization), here there is no chemical formalism generally applicable
to all the fragment types. This is why the specification of the
fragmentation is left to the user's responsibility.
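
The oligomer calculation recalled above can be made concrete with a
small worked example. This is only a sketch: the tripeptide
Gly-Ala-Gly, the symbol $m$, and the rounded monoisotopic masses used
below (Gly~=~57.021464, Ala~=~71.037114, H~=~1.007825,
OH~=~17.002740) are assumptions chosen for illustration, with H as the
left cap and OH as the right cap, as stated earlier for proteins. The
three monomer masses sum to 185.080042, so that, before ionization,
\[m = LC + \sum_{j=1}^{3} M_j + RC\]
\[m = 1.007825 + 185.080042 + 17.002740 = 203.090607\]
No such single expression holds for fragments, since the terms to add
or to remove depend on the fragment series.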

By looking at Figure~\ref{fig:prot-fragmentation}, the reader should
have noticed that the fragment naming scheme takes into consideration
the fact that the fragment bears the left end or the right end of the
precursor polymer (or neither). Indeed, the numbering of fragments
holding the left end of the precursor polymer sequence begins at the
left end, and for fragments that hold the right end, at the right end.
Thus the third fragment of series \emph{a} (\emph{a3}) would involve
monomers [1$\rightarrow$3]; and the third fragment of series \emph{y}
(\emph{y3}) would involve monomers [6$\rightarrow$4] (in the figure,
these left-to-right and right-to-left directions are symbolized using
arrows). Therefore, it should be apparent to the reader how important
it is, when specifying a fragmentation, to clearly indicate from which
end of the precursor polymer the fragment is generated (in \mXp's
jargon this is ``LE'' for left end, ``RE'' for right end and ``NE''
for no end). \mXp\ knows what action it should take when it encounters
one of these three specifications; for example, if a ``LE''
specification is found for a given fragmentation specification, \mXp\
adds to the fragment's mass the mass corresponding to the left cap of
the precursor polymer.


\paragraph{\emph{a} fragment series} If we take the \emph{a} fragment
series, Figure~\ref{fig:prot-fragmentation} indicates that the
fragments include the left end and that their last monomer lacks its
carbonyl group (note, at the top of
Figure~\ref{fig:prot-fragmentation}, that the \emph{a1} arrow passes
between the C$\alpha$H and the CO of monomer 1). So we would say that
each fragment of the \emph{a} series should be subjected to the
following chemical treatments: 1) addition of the mass corresponding
to the left cap (proton), 2) removal of the mass corresponding to the
lacking CO group.
This way we have the mass of fragment \emph{a1}. If we were interested
in the fragment \emph{a4} we would have summed the masses of monomers
1 to 4, added the mass of the left cap, and finally removed the mass
of a CO. The mass calculation is thus mathematically expressed
\[a_i = LC + \sum_{1}^{i} M_i - CO\]

\paragraph{\emph{b} fragment series} Similarly, the mass calculation
is mathematically expressed \[b_i = LC + \sum_{1}^{i} M_i\]

\paragraph{\emph{c} fragment series} The mass calculation is
mathematically expressed \[c_i = LC + \sum_{1}^{i} M_i + NH_3\]

\paragraph{\emph{x} fragment series} For this series of fragments we
do not add the left cap anymore, but replace it with the right cap,
since the fragments hold the right end of the precursor polymer. Note
also that the numbering of the monomers using the variable \emph{i} in
the following mathematical expressions goes from right to left
(contrary to what happened for the \emph{a, b, c} fragment series).
All the fragments that hold the precursor polymer right end are
numbered this way, so this applies to fragments \emph{x, y, z}. The
mass calculation is mathematically expressed
\[x_i = RC + \sum_{1}^{i} M_i + CO\]

\paragraph{\emph{y} fragment series} The calculation is mathematically
expressed \[y_i = RC + \sum_{1}^{i} M_i + H_2\]

\paragraph{\emph{z} fragment series} In low energy CID, the \emph{z}
fragments are expressed this way: \[z_i = RC + \sum_{1}^{i} M_i - NH\]
which is equivalent to \emph{y-$NH_3$}; in high energy CID an
additional proton is often measured:
\[z_i = RC + \sum_{1}^{i} M_i - NH + H\]

\paragraph{\emph{immonium} fragment series} These fragments are
internal fragments in the sense that they hold neither of the two
precursor polymer's ends.
\mXp\ understands that the user is speaking of this kind of fragment
when the ``from which end'' piece of data, in the fragmentation
specification, states ``NE'' instead of ``LE'' or ``RE'' (see
page~\pageref{sect:fragspecif}). The mass calculation for these
fragments does not take into account the monomers surrounding the one
for which the calculation is done. The mass for an immonium ion, at
position \emph{i} in the precursor polymer, will be the mass of the
monomer at position \emph{i}, less the mass of a CO, plus the mass of
a proton. The mass calculation for these special internal fragments is
expressed \[imm_i = M_i + H - CO\]


\subsubsection*{Nucleic Acid Fragmentation}
\index{fragmentation!nucleic~acid}

The fragmentations that can be obtained with nucleic acids are
numerous and it is more complicated than with proteins to describe
them fully. The main reason for this is that there is a large number
of fragmentation combinations because of the loss of nitrogenous bases
from the skeleton. The mechanisms by which this loss happens are
fairly complex, and I am not going to detail any of them.
Figure~\vref{fig:dna-fragmentation} shows the most common
fragmentations (without taking into consideration the potential loss
of bases). An example fragment is given for each fragment series (in
much the same way as was done above for proteins). Note that the
fragment representations are aimed at helping the reader to figure out
what the product ion is, not taking into account where the negative
charge lies on the fragment, since this charge can reside on any
deprotonatable group. All the fragments shown bear one and only one
negative charge.

Another remark pertaining to the ionization mode of the product ions:
the fragmentation schemes below apply to negatively-charged precursor
ions (by loss of a proton, typically).
To compute the masses of the product ions obtained in positive mode
fragmentation experiments, simply add as many protons (or any other
cationic ionization agent) as required. For example, starting from a
fragment that is negatively charged once (-H), add a first proton to
go back to the uncharged state, and then add another proton to yield
the monoprotonated (thus singly positively charged) product ion. The
requirement to be able to compute masses for both the positively- and
negatively-charged product ions imposes a specific way to define
fragmentation specifications in the \xpd{} module (to be detailed
later in this manual).

The reader might have noticed at the bottom of
Figure~\vref{fig:dna-fragmentation} that a provision is made for the
case in which the fragmented molecular species are not 5'
end-phosphorylated but 5' end-hydroxylated. Indeed, the canonical
monomer is such that, upon polymerization and left capping, the 5' end
is phosphorylated. However, oligonucleotides are often synthesized
chemically without the 5' end phosphate group, thus ending in
hydroxyl. This special case should be accounted for by applying to all
the fragments that bear the left end of the precursor polymer the
following chemical reaction: $\mathrm{-HPO_3}$. This chemical reaction
should be applied \emph{in addition} to the chemical reaction that
yields the fragment \emph{per se}.

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.2]{figures/dna-fragmentation.png}
  \end{center}
  \caption[DNA fragmentation]{\textbf{DNA fragmentation patterns most
      widely encountered.} A short DNA sequence is fragmented in the
    eight most widely encountered manners, so as to generate a, b,
    c, d, w, x, y, z fragment ions. The figure illustrates the
    position of the cleavage for each kind of fragment (exemplified
    using the case of the smallest fragment possible)
    and the mass
    calculation method is described for each fragment kind;
    considering that each fragment is protonated only once (+1).}
  \label{fig:dna-fragmentation}
\end{figure}

Exactly as done earlier for the protein fragments, the mathematical
expressions used to calculate the mass of the different series of
nucleic acid fragments are provided; in these calculations it is
assumed that the left end of the precursor polymer is phosphorylated
(5'P) and the reader should bear in mind that this precise phosphate
might itself be expelled by the fragmentation. The fragment naming
scheme detailed earlier for proteins applies to nucleic acids in the
very same manner.

\paragraph{\emph{a} fragment series}
These fragments most often appear with base loss.
\[a_i = LC + \sum_{1}^{i} M_i - O\]

\paragraph{\emph{b} fragment series}
\[b_i = LC + \sum_{1}^{i} M_i\]

\paragraph{\emph{c} fragment series}
\[c_i = LC + \sum_{1}^{i} M_i - HPO_2\]

\paragraph{\emph{d} fragment series}
\[d_i = LC + \sum_{1}^{i} M_i - HPO_3\]

\paragraph{\emph{w} fragment series}
\[w_i = RC + \sum_{1}^{i} M_i + O\]

\paragraph{\emph{x} fragment series}
\[x_i = RC + \sum_{1}^{i} M_i\]

\paragraph{\emph{y} fragment series}
\[y_i = RC + \sum_{1}^{i} M_i - HPO_2\]

\paragraph{\emph{z} fragment series}
\[z_i = RC + \sum_{1}^{i} M_i - HPO_3\]

There is also a variety of fragments for which a base is lost.


\subsubsection*{More Complex Patterns Of Fragmentation}


Before finishing with fragmentations, it is necessary to describe a
powerful feature of the fragmentation specification grammar available
in \mXp. This feature was required for the fragmentation of
oligosaccharides and also sometimes for proteins.
When the fragmentation (the bond breakage reaction itself) occurs at
the level of certain monomers, it might be necessary to be able to
specify some particular chemistry that would arise on the monomer in
question.

We have seen in the cleavage documentation that, upon cleavage of a
protein sequence with cyanogen bromide, for example, a particular
chemical reaction had to be applied to the oligomers that were
generated with a methionine monomer as their right end monomer. Well,
in a fragmentation specification it is possible to apply comparable
chemical reactions but in a more thorough manner. Indeed, while in the
cleavage it was possible to say something like ``\textsl{apply a given
  chemical reaction to the oligomer if the right end monomer is
  Xyz}'', in the fragmentation the logical condition can be bound not
only to the identity of the currently fragmented monomer, but also
(optionally) to the identity of the previous and/or next monomer in
the precursor polymer sequence. For example: \textsl{``Apply a given
  chemical reaction if fragmentation occurs at the level of a `Xyz'
  monomer, but only if it is preceded by a `Yxz' monomer and followed
  by a `Zyx' monomer''}.

These logical conditions are called \emph{fragmentation rules}. A
\emph{fragmentation specification} can hold as many rules as
necessary. All of this is described in great detail at
page~\pageref{sect:fragspecif}.
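
Such a rule can be sketched as a conditional mass correction. The
notation below is only illustrative (it is not the \mXp\ fragmentation
specification grammar, and all the symbols are introduced here as
assumptions): $S_i$ stands for the identity of the monomer at which
the fragmentation occurs, $S_{i-1}$ and $S_{i+1}$ for the identities
of the previous and next monomers in the precursor polymer sequence,
and $\Delta_{\mathrm{rule}}$ for the mass change of the chemical
reaction attached to the rule:
\[\Delta_{\mathrm{rule}} \mbox{ is applied} \iff
S_{i-1} = \mathrm{Yxz} \;\wedge\; S_i = \mathrm{Xyz} \;\wedge\;
S_{i+1} = \mathrm{Zyx}\]
Since the conditions on the previous and next monomers are optional,
leaving both of them out reduces the rule to the single test
$S_i = \mathrm{Xyz}$.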

\subsubsection*{To Sum Up}


To sum up all that we have seen so far with polymer chain disrupting
chemistries:

\begin{itemize}
\item A polymer sequence gets cleaved into oligomers when a chemical
  reaction occurs in it at the level of one or more inter-monomer
  bond(s); monomer-specific chemical reactions can be modelled into
  the cleavage specification using at most one leftrighrule;
\item A polymer sequence gets fragmented into fragments when a bond
  breakage occurs, without the help of any exterior molecule, at any
  level of the polymer structure, with no limitation to the
  inter-monomer bond; monomer-specific chemical reactions can be
  modelled into the fragmentation specification using any number of
  fragrules;
\item Oligomers are automatically capped, \emph{on both ends}, using
  the rules described in the precursor polymer's definition;
\item Fragments are automatically capped only \emph{on the end they
    hold, if any}, using the rules described in the precursor
  polymer's definition;
\item Oligomers are automatically ionized (if required by the user)
  using the rules described in the precursor polymer's definition;
\item Fragments are never ionized automatically; ionization (gain/loss
  of a charged group) is necessarily integrated in the fragmentation
  specification.
\end{itemize}


\cleardoublepage


%%% Local Variables:
%%% mode: latex
%%% TeX-master: "massxpert"
%%% End: