\chapter[\xpe] {\xpe: \\A Powerful Editor and Simulation Center}
\label{chap:xpertedit}
\index{\xpe|(}

After having completed this chapter you will be able to perform
sophisticated polymer chemistry simulations on polymer
sequences---that can be edited in place---along with automatic mass
recalculations.

\renewcommand{\sectitle}{\xpe\ Invocation}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\index{\xpe!module~invocation}

The \xpe\ module is called by pulling down the \guimenu{\xpe}
menu item from the \mXp\ program's menu. The user may start the \xpe\
module by:
\smallskip

\begin{itemize}

\item Opening a sample polymer sequence;

\item Creating a new polymer sequence;

\item Loading a polymer sequence from disk.

\end{itemize}


\renewcommand{\sectitle}{\xpe\ Operation: \textit{In Medias Res}}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\index{\xpe!open~sequence}

The first way to start an \xpe\ session is by opening a sample
sequence from the list of sequences that were shipped along with
\mXp. The \guimenu{\xpe}\guimenuitem{Open Sample Sequence} menu item
opens the dialog box shown in
Figure~\vref{fig:xpertedit-select-sample-sequence}. The drop-down
widget in this dialog window lists all the polymer sequence files that
were shipped along with \mXp. Simply select one item and click
\guilabel{OK}. To select another polymer sequence file, click
\guilabel{Cancel}, which will trigger the system's file selection
dialog to open for you to browse to the location where the polymer
sequence file is stored. The process is identical to the normal
polymer sequence file opening (see below).
\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-select-sample-sequence.png}
  \end{center}
  \caption[Selection of a sample polymer sequence]{\textbf{Selection
      of a sample polymer sequence.} \mXp\ ships with a number of
    sample polymer sequences which are designed to allow easy
    demonstration of the \xpe\ features. This selection dialog lists
    all the polymer sequence files that were shipped along with \mXp.}
  \label{fig:xpertedit-select-sample-sequence}
\end{figure}


The second way to start an \xpe\ session is by creating a new polymer
sequence\index{\xpe!create~sequence} (\guimenu{\xpe}\guimenuitem{New
  Sequence} menu). The program immediately asks the user to select a
polymer chemistry definition, as shown in
Figure~\ref{fig:xpertedit-choose-pol-chem-def}. The drop-down widget
lists all the polymer chemistry definitions currently registered on
the system. If the polymer chemistry definition is not listed,
clicking \guilabel{Cancel} will let the user browse the disk in
search of a polymer chemistry definition file.\footnote{Note that
  once the sequence is saved, the polymer chemistry definition file
  \emph{must} be registered or the sequence file will not be
  loadable. This is described in a later chapter.} Once the polymer
chemistry definition has been selected and successfully parsed by the
program, the user is presented with an empty sequence editor.

The third way to start an \xpe\ session is by opening an existing
polymer sequence file. Once the sequence file has been opened, the user is
presented with a sequence editor as represented in
Figure~\ref{fig:xpertedit-protein-main-view}. At this point, when the
user starts editing a sequence, the characters entered at the
keyboard, or pasted from the clipboard, will be interpreted using the
polymer chemistry definition that was selected in the initialization
window described above.
\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-choose-pol-chem-def.png}
  \end{center}
  \caption[Selection of the polymer chemistry
  definition]{\textbf{Selection of the polymer chemistry definition.}
    When creating a new polymer sequence, it is necessary to first
    indicate which polymer chemistry definition the polymer sequence
    is based on. This window lists all the polymer chemistry definitions
    currently available on the system.}
  \label{fig:xpertedit-choose-pol-chem-def}
\end{figure}


\begin{figure}
  \begin{center}
    \includegraphics[width=0.8\textwidth]
    {figures/xpertedit-protein-main-view.png}
  \end{center}
  \caption[The \xpe\ module]{\textbf{The \xpe\ module.} This figure shows
    a polymer sequence displayed in an {\xpe}or window.}
  \label{fig:xpertedit-protein-main-view}
\end{figure}



Now, of course, editing a polymer sequence is not enough for a mass
spec\-trome\-tric-ori\-ented software suite; what we want is to
\emph{compute masses!}\index{\xpe!mass~calculation} The result of the
mass calculation is immediately visible on the right hand side of
the sequence editor shown in
Figure~\ref{fig:xpertedit-protein-main-view}. The \guilabel{Masses}
frame~box widget contains two items: \smallskip

\begin{itemize}

\item \guilabel{Whole
    Sequence}\index{\xpe!mass~calculation!whole~sequence} A frame~box
  widget displaying the \guilabel{Mono} and \guilabel{Avg} masses of
  the whole polymer sequence, irrespective of the current selection;

\item \guilabel{Selected Sequence}\index{\xpe!mass~calculation!selected~region}
  A frame~box widget displaying the \guilabel{Mono} and \guilabel{Avg}
  masses of the currently selected region of the polymer sequence.
\end{itemize}
%
The user may change the mass calculation engine configuration at any
point in time using the widgets in the \guilabel{Calculation
  Engine}\index{\xpe!mass~calculation~engine} tool~box that
contains the following configurable parameters: \smallskip

\begin{itemize}

\item \guilabel{Polymer}

  \begin{itemize}

  \item \guilabel{Left
      Cap}\index{\xpe!mass~calculation~engine!left~cap} If checked,
    the left cap of the polymer sequence will be taken into account;

  \item \guilabel{Right
      Cap}\index{\xpe!mass~calculation~engine!right~cap} If checked,
    the right cap of the polymer sequence will be taken into
    account;

  \item \guilabel{Left
      Modif}\index{\xpe!mass~calculation~engine!left~modif} If
    checked, the modification of the polymer sequence's left end will
    be taken into account.
    Note that if \guilabel{Force} is checked
    also, then the modification is taken into account even when
    selecting a region of the sequence that does not encompass the
    left end monomer;

  \item \guilabel{Right
      Modif}\index{\xpe!mass~calculation~engine!right~modif} Same as
    above, but for the right end modification;

  \end{itemize}

\item \guilabel{Selections and regions}

  \begin{itemize}

  \item
    \guilabel{Multi-region}\index{\xpe!mass~calculation~engine!multi-region}
    If checked, the sequence editor allows more than one region to be
    selected at any given time (no limitation on the number of
    selected regions);

  \item
    \guilabel{Multi-selection}\index{\xpe!mass~calculation~engine!multi-selection}
    If checked, the sequence editor allows not only the selection of
    multiple regions at any given time, but also the selection of
    totally or partially overlapping regions;

  \item
    \guilabel{Oligomers}\index{\xpe!mass~calculation~engine!oligomers}
    When multiple regions are selected, each selected region behaves
    like an oligomer, that is, it gets its left and right end caps
    added (if the corresponding calculation engine configuration item
    is activated);

  \item \guilabel{Residual
      chains}\index{\xpe!mass~calculation~engine!residual~chains}
    When multiple regions are selected, the different regions behave
    like residual chains: the left and right end caps are added only once
    (if the corresponding calculation engine configuration item is
    activated).
  \end{itemize}

\item \guilabel{Monomers}

  \begin{itemize}

  \item
    \guilabel{Modifications}\index{\xpe!mass~calculation~engine!modifications}
    If checked, the monomer modifications will be taken into account;

  \item
    \guilabel{Cross-links}\index{\xpe!mass~calculation~engine!cross-links}
    If checked, the cross-links in the polymer sequence will be taken
    into account. Note that \emph{only cross-links fully encompassed
      by the selected sequence region(s)} will be taken into account
    for the \guilabel{Selected sequence} mass calculations. If any
    number of cross-links are not fully encompassed by the currently
    selected sequence region, then that number is displayed along with
    the following label visible in the \guilabel{Selected sequence}
    group box: \guilabel{Incomplete cross-links:}.

  \end{itemize}

\item \guilabel{Ionization}\index{\xpe!mass~calculation~engine!ionization}

  \begin{itemize}

  \item \guivalue{+H} This formula represents the ionization agent
    formula (that is, a protonation);

  \item \guilabel{Unitary charge} \guivalue{1} Charge brought by the
    ionization agent. In the example, a protonation brings a positive
    charge;

  \item \guilabel{Ionization level} \guivalue{1} Level of the
    ionization requested. In the example, a single ionization is
    requested, that is, a monoprotonation.

  \end{itemize}

\end{itemize}
%
When any parameter listed above is changed, the recalculation of the
masses---for both the \guilabel{Whole sequence} and the
\guilabel{Selected sequence}---is triggered and the new masses are
updated in their respective line~edit widgets, described earlier. The
fact that the user can specify ionization rules should make it clear
that the values that are displayed are actually \mz ratios (as long as
at least one ionization is requested).
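
\noindent The relation between a neutral mass and the displayed value
can be sketched as follows. This is only an illustration that assumes
the usual definition of a mass-to-charge ratio; it is not a
description of the program's internals. The symbols are introduced
here for the example: $M$ is the mass of the non-ionized analyte,
$m_{\mathrm{agent}}$ the mass of the ionization agent, $z$ the unitary
charge and $n$ the ionization level.
\[
  \frac{m}{z} \;=\; \frac{M + n\,m_{\mathrm{agent}}}{n\,z}
\]
For a hypothetical analyte of monoisotopic mass $M = 1000.0000$,
ionized with \guivalue{+H} (taking $m_{\mathrm{agent}} \approx 1.0078$
for a hydrogen atom) and $z = 1$, an ionization level of $n = 1$ would
give $m/z \approx 1001.0078$, while $n = 2$ would give
$m/z \approx (1000.0000 + 2 \times 1.0078)/2 = 501.0078$.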
\renewcommand{\sectitle}{The Editor Window Menu}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\index{\xpe!editor~window}

The menu bar in the polymer sequence editor displays a number of menu
items, reviewed below: \smallskip

\begin{itemize}

  %%%%%%% FILE
\item \guimenu{File} (Figure~\ref{fig:xpertedit-file-menu})

  \begin{itemize}

  \item \guimenu{File}\guimenuitem{Close} Closes the sequence;

  \item \guimenu{File}\guimenuitem{Save} Saves the sequence. If the
    sequence has no filename yet, the user is invited to select a
    filename;

  \item \guimenu{File}\guimenuitem{Save As} Saves the sequence in a new
    file;

  \item \guimenu{File}\guimenuitem{Import
      Raw}\index{\xpe!sequence~editor!sequence~import} Opens a text file
    and tries to import the sequence. If invalid monomer code
    characters are found, the user is given a chance to revise the
    imported sequence;

  \item \guimenu{File}\guimenuitem{Export to
      Clipboard}\index{\xpe!sequence~editor!sequence~export} Copies the
    sequence and all the data (masses and calculation options) to the
    clipboard, in the form of simple text;

  \item \guimenu{File}\guimenuitem{Export to File} Writes the
    sequence and all the data (masses and calculation options) to a
    file, in the form of simple text. If no export file was
    selected yet, the user is invited to select a file into
    which the data are to be written;

  \item \guimenu{File}\guimenuitem{Select export file} Invites the
    user to select a file into which the data are to be written.

  \end{itemize}

  %%%%%%% EDIT
\item \guimenu{Edit}

  \begin{itemize}

  \item \guimenu{Edit}\guimenuitem{Copy} Copies the currently selected
    region(s) (if any) to the clipboard.
    If more than one
    region is currently selected, then the user is informed that the
    copied sequence will correspond to the selected regions joined
    together. \emph{Be aware that the order in which the region
      sequences are joined is the order in which the regions were
      selected, and not the order in which the sequences appear in
      the whole polymer sequence};

  \item \guimenu{Edit}\guimenuitem{Cut} Copies the current selection
    (if any) to the clipboard and removes it from the sequence. Note
    that it is not yet possible to cut more than one selected region
    in one single operation;

  \item \guimenu{Edit}\guimenuitem{Paste} Pastes the sequence from the
    clipboard into the sequence at point (that is, the current cursor
    location). If the pasted sequence is found to contain characters
    not valid for the current polymer chemistry definition, the user
    is given a chance to revise the pasted sequence. If one sequence
    region was selected, it is replaced with the pasted sequence. If
    more than one sequence region was selected, the operation cannot
    be performed and the user is informed;

  \item \guimenu{Edit}\guimenuitem{Find
      Sequence}\index{\xpe!sequence~editor!find~sequence~motif} Finds
    a sequence motif in the polymer sequence.
  \end{itemize}

  %%%%%%% CHEMISTRY
\item
  \guimenu{Chemistry}\index{\xpe!sequence~editor!chemical~simulations}
  (Figure~\ref{fig:xpertedit-chemistry-menu})

  \begin{itemize}

  \item \guimenu{Chemistry}\guimenuitem{Modify Monomer(s)} Modify (or
    unmodify) one or more monomers in the polymer sequence;

  \item \guimenu{Chemistry}\guimenuitem{Modify Polymer} Set (or unset)
    the left (or right, or both) modification of the polymer sequence;

  \item \guimenu{Chemistry}\guimenuitem{Cross-link Monomers} Set
    cross-links between monomers of the polymer sequence;

  \item \guimenu{Chemistry}\guimenuitem{Cleave} Perform a
    chemical/enzymatic cleavage of the polymer sequence;

  \item \guimenu{Chemistry}\guimenuitem{Fragment} Perform the gas
    phase fragmentation of the currently selected oligomer;

  \item \guimenu{Chemistry}\guimenuitem{Mass Search} Search for any
    sequence having a mass matching the searched mass;

  \item \guimenu{Chemistry}\guimenuitem{Compute m/z Ratios} Starting
    from a given \mz ratio and a given ionization status, calculate a
    range of \mz ratios with a given ionization agent;

  \item \guimenu{Chemistry}\guimenuitem{Determine Compositions}
    Calculate the monomeric/elemental composition of the whole polymer
    sequence or of the current selection;

  \item \guimenu{Chemistry}\guimenuitem{pKa pH pI} Perform acidity, pH
    and isoelectric point calculations on the whole sequence or on the
    current selection.

  \end{itemize}

  %%%%%%% OPTIONS
\item
  \guimenu{Options}\index{\xpe!sequence~editor!number~display}

  \begin{itemize}

  \item \guimenu{Options}\guimenuitem{Decimal places} Set the number
    of decimal places to be used to display the numerical values.
  \end{itemize}


\end{itemize}


\begin{figure}
  \begin{center}
    \includegraphics[width=0.33\textwidth]
    {figures/xpertedit-file-menu.png}
  \end{center}
  \caption[The \xpe\ window File menu]{\textbf{The \xpe\ window File
      menu.} This figure shows the File menu as a drop-down menu
    in the polymer sequence window.}
  \label{fig:xpertedit-file-menu}
\end{figure}


\begin{figure}
  \begin{center}
    \includegraphics[width=0.33\textwidth]
    {figures/xpertedit-chemistry-menu.png}
  \end{center}
  \caption[The \xpe\ window Chemistry menu]{\textbf{The \xpe\ window
      Chemistry menu.} This figure shows the Chemistry menu as a
    drop-down menu in the polymer sequence window.}
  \label{fig:xpertedit-chemistry-menu}
\end{figure}




\renewcommand{\sectitle}{Editing Polymer
  Sequences}\index{\xpe!sequence~editor!sequence~editing}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

As described earlier, in the chapter about the \xpd\ module, a polymer
chemistry definition may allow more than one character to qualify the
codes of the monomers (see chapter~\ref{chap:xpertdef},
section~\vref{sect:monomers}). It was also noted that setting the
number of allowed characters to \cfgval{3}, for example, does not
mean that all the monomer codes of the polymer chemistry definition must be
defined using three characters: \cfgval{3} is the \emph{maximum}
number of characters that may be used.
\renewcommand{\sectitle}{Multi-Character Monomer Codes}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\index{\xpe!multi-character~monomer~code}

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.75]
    {figures/xpertedit-3-letter-code-whole-process.png}
  \end{center}
  \caption[Multi-character code sequence editing in
  \xpe]{\textbf{Multi-character code sequence editing in \xpe.} This
    figure shows the process by which it is made possible to edit
    polymer sequences with a monomer code set that allows more than
    one character per code.}
  \label{fig:xpertedit-3-letter-code-whole-process}
\end{figure}

This section deals with the editing of a polymer sequence for which
monomer codes can be made of more than one character.
Figure~\vref{fig:xpertedit-3-letter-code-whole-process} shows the case
of a polymer sequence for which the polymer chemistry definition
allows three characters to define monomer codes. The example is based
on the following real-world situation: the user wants to edit the
sequence by insertion---at the cursor point---of a new ``Aspartate''
monomer, of which the user knows only that its code starts with an
`A'. The cursor is located after the first ``Ala'' monomer at position
1 (panel~1st).

After keying-in \kbdKey{A} (panel~1st), no sequence modification is
visible in the sequence editor. Instead, an `A' character is now
displayed in the left line~edit widget under the sequence. The reason
for this apparently odd behaviour is that the polymer chemistry
definition allows up to 3 characters to describe a monomer code. If no
monomer vignette is displayed in the polymer sequence, that means that
more than one monomer code starts with an `A' character: \xpe\ cannot
figure out which monomer code was actually meant by the user when
keying-in \kbdKey{A}.

There is a way, called \emph{code
  completion}\index{\xpe!code~completion}, to know which monomer
code(s)---in the current polymer chemistry definition---start with
the keyed-in character(s) (currently, `A'). The user can always enter
the \emph{code completion mode} by hitting the \kbdKey{ENTER}~key.
This is what is shown in the panel~1st, right hand side
\guilabel{Monomer List} listview widget (click on that
\guilabel{Monomer List} label to show that list if it is not already
visible). We see that, in the current polymer chemistry definition,
four monomer codes start with an `A' character, and these are ``Ala'',
``Arg'', ``Asp'' and ``Asn'' (as highlighted in the code completion
monomer list).

Because we now know that the code we are to key-in is ``Asp'', we
key-in an \kbdKey{s}. The result is shown in panel~2nd. What we see
here is that, this time also, nothing changed in the polymer sequence.
What changed is that the character string in the left line~edit widget
below the sequence is now ``As''. Let's press the
\kbdKey{ENTER}~key once more. This time, only two items are highlighted
in the code completion monomer list: ``Asp'' and ``Asn'' (panel~2nd).
This is easy to understand: there are only two monomer codes that
start with the two letters `A' and `s' (``As'') that we have keyed-in
so far. We now key-in a last character: \kbdKey{p}. At this
point, the monomer is effectively inserted in the polymer sequence, as
the ``Asp'' monomer left of the cursor, as shown in panel~3rd.
\renewcommand{\sectitle}{Unambiguous Single-/Multi-Character Monomer Codes}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}

Let's imagine that we have a polymer chemistry definition that allows
up to 3 characters for the definition of monomer codes, but that we
have one of these monomer codes (let's say the one for the
``Glutamate'' monomer) that is one-letter-long: `E'. This monomer code
`E' is the only one in the polymer chemistry definition to start with
an `E' character. In this case, when we key-in \kbdKey{E}, we'll
observe that the monomer code is immediately validated and that its
corresponding monomer vignette is also immediately inserted in the
polymer sequence. This is because, \emph{if there is no ambiguity,
  \xpe\ will immediately validate the code being edited}.

The mechanism described above means that the user is absolutely free
to define \emph{only single-character monomer codes} in a polymer
chemistry definition; the program then behaves
exactly as if the multi-character code feature were nonexistent:
each time a new uppercase letter is keyed-in, it is
automatically validated and the corresponding monomer is created in
the sequence.


\renewcommand{\sectitle}{Erroneous Monomer Codes}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\index{\xpe!monomer~code~errors}

The typing error detection system triggers immediate alerts whenever
the code being keyed-in is incorrect. This is described in
Figure~\vref{fig:xpertedit-sequence-editor-bad-char}.
If the user
enters an uppercase character not matching any monomer code currently
defined in the polymer chemistry definition, or a lowercase character
as the first character of a monomer code, the program immediately
complains in the right line~edit widget below the sequence. In this
case, the character is not put into the left line~edit widget, which
means it is simply ignored.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.8\textwidth]
    {figures/xpertedit-sequence-editor-bad-char.png}
  \end{center}
  \caption[Bad code character in \xpe\ sequence editor]{\textbf{Bad
      code character in \xpe\ sequence editor.} This figure shows the
    feedback that the user is provided by the code editing engine,
    when a bad code character is keyed-in.}
  \label{fig:xpertedit-sequence-editor-bad-char}
\end{figure}

If the user starts keying-in valid monomer code characters, as we
did earlier with ``As'', and then wants to erase these
characters because she changed her mind, she \emph{must not} use the
\kbdKey{BACKSPACE} key, because this key will erase the monomer left
of the cursor point in the polymer sequence! To
remove the characters currently displayed in the left line~edit
widget below the sequence, the user presses the \kbdKey{Esc} key once for
each character. For example, let's say she has already keyed-in
\kbdKey{A} and \kbdKey{s}. In this case the left line~edit widget
displays these two characters: ``As''. Now, if the user changes her
mind and wants to enter the ``Gly'' monomer code instead of ``Asp'',
all she has to do is to press the \kbdKey{Esc} key once for
the `s' character (which disappears) and once more to remove the
remaining `A' character. At this point it is possible to start fresh
with the ``Gly'' monomer code by keying-in sequentially \kbdKey{G},
\kbdKey{l} and finally \kbdKey{y}.
\renewcommand{\sectitle}{Simplified Editing}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}

When the monomer codes of a given polymer chemistry definition are too
numerous or too long to remember, one simplified editing strategy is
to use the list of available monomers located on the right side of
the sequence editor (widget labelled \guilabel{Monomer list}). The
items in the list are active: when an item is double-clicked, its
corresponding monomer code is inserted in the sequence at the current
cursor location. This list thus makes it easy to ``visually'' edit the
polymer sequence without having to remember all the codes in the
polymer chemistry definition.


\renewcommand{\sectitle}{Finding Sequence Motifs}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\index{\xpe!sequence~editor!find~sequence~motif}

Finding sequence motifs in the polymer sequence is performed by
selecting the \guimenu{Edit}\guimenuitem{Find Sequence} menu item. The
dialog window is shown in
Figure~\vref{fig:xpertedit-find-sequence-dlg}. When performing the
first search in a polymer sequence, the \guilabel{Find} button should
be used. This will trigger a search starting at the beginning of the
polymer sequence. For each successive search, the \guilabel{Next}
button should be used.

Each searched sequence motif will be stored in a history list that is
made available by dropping down the combo box widget where the
sequence motif is entered. The \guilabel{Clear history} button will
erase all the searched sequence motifs from the history, thus
resetting it.
\begin{figure}
  \begin{center}
    \includegraphics[width=0.5\textwidth]
    {figures/xpertedit-find-sequence-dlg.png}
  \end{center}
  \caption[Finding a sequence motif in the polymer
  sequence]{\textbf{Finding a sequence motif in the polymer sequence.}
    The first iteration should be performed by clicking the
    \guilabel{Find} button, and each following iteration should be
    performed using the \guilabel{Next} button.}
  \label{fig:xpertedit-find-sequence-dlg}
\end{figure}


\renewcommand{\sectitle}{Importing Sequences}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\index{\xpe!sequence~editor!sequence~import}

Very often, the user will make a sequence search on the web and be
provided with a polymer sequence that is cluttered with non-code
characters. That web output might either be saved in a text file for
future reference or copied to the clipboard for immediate use in \mXp.
The two cases are reviewed below.


\renewcommand{\sectitle}{Importing From The Clipboard}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}

\xpe\ provides a convenient way to spot invalid characters in a text
and to let the user ``purify'' the imported sequence. A
clipboard-imported sequence is systematically parsed. When invalid
characters are found, the window depicted in
Figure~\vref{fig:xpertedit-sequence-editor-sequence-import-errors-first}
is presented to the user for her to make appropriate adjustments (in
this example we tried to paste from the clipboard the following sequence:
``\texttt{!100 ATGCATGC ATGCATGC ATGCATGC ATGCAUGC
  anotherSilly-Text;}'').
\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-sequence-editor-sequence-import-errors-first.png}
  \end{center}
  \caption[Clipboard-imported sequence
  error-checking]{\textbf{Clipboard-imported sequence error-checking.}
    If a sequence that is imported through the clipboard to the \xpe\
    sequence editor contains invalid characters, the user is provided
    with a facility to ``purify'' the sequence. This facility is
    provided to the user through the window depicted in this figure.}
  \label{fig:xpertedit-sequence-editor-sequence-import-errors-first}
\end{figure}

As soon as a character does not correspond to any valid monomer code,
it is tagged, and the sequence is presented to the user in a text~edit
widget (\guilabel{Initial Sequence}) with all the improper
characters tagged by underlining. At that point, if the user clicks
the \guilabel{Remove Tagged From Initial} button, all the tagged
characters will be automatically removed and the purified sequence
will show up in the \guilabel{Purified Sequence} text~edit widget.

Also, the user is provided with automatic ``purification'' procedures
whereby it is possible to remove one or more classes of characters
from the imported sequence (\guilabel{Purification Options} frame
widget). Checking one or more of the \guilabel{Numerals},
\guilabel{Spaces}, \guilabel{Punctuation}, \guilabel{LowerCase} or
\guilabel{Uppercase} checkbuttons, or even entering other
user-specified regular expressions in the \guilabel{Other (RegExp)}
line~edit widget, will cause the matching characters to be removed
from the imported sequence
after the user clicks the \guilabel{Purify Initial (Options)} button.
\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-sequence-editor-sequence-import-errors-second.png}
  \end{center}
  \caption[Clipboard-imported sequence
  purification]{\textbf{Clipboard-imported sequence purification.}
    There are a number of ways to purify a sequence. Here the
    \guilabel{Remove Tagged From Initial} button was clicked. The
    purified sequence shows up in the \guilabel{Purified Sequence}
    text~edit widget.}
  \label{fig:xpertedit-sequence-editor-sequence-import-errors-second}
\end{figure}


When the user is confident that almost all the erroneous characters
have been removed
(Figure~\vref{fig:xpertedit-sequence-editor-sequence-import-errors-second}),
she can click the \guilabel{Test Purified} button, which will trigger
a ``re-reading'' of the sequence in the \guilabel{Purified Sequence}
text~edit widget. If erroneous characters are still found, they are
tagged.

Note that, for maximum flexibility, the user is allowed an immediate
and direct editing of the purified sequence in the \guilabel{Purified
  Sequence} text~edit widget (that is, that text~edit widget is
\emph{not} read-only).

Once the sequence is finally purged of all the invalid characters,
the user can select it in the text~edit widget and paste it in the
\xpe\ sequence editor. This time, the paste operation will be
error-free. Note that if any sequence portion is currently selected,
it will be replaced by the one that is being pasted into the editor.


\renewcommand{\sectitle}{Importing From Raw Text Files}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}


It might be of interest to be able to import a sequence from a raw
file.
To this end, the user is provided the menu
\guimenu{File}\guimenuitem{Import Raw} that opens up a file selection
window from which to choose the file to import. The program then
iterates over the lines of that file and checks their contents for
validity. If errors are found, then the same process as described
earlier for clipboard-imported sequences is started. The user can then
purify the sequence imported from the file and finally integrate that
sequence in the polymer sequence currently edited. Note that if any
sequence portion is currently selected, it will be replaced by the one
that is being imported.


\renewcommand{\sectitle}{Multi-region
  Selections}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\index{\xpe!multi-region~selections}
\index{\xpe!sequence~editor!multi-region~selection}

\mXp\ implements a sophisticated multi-region selection model. Two
selection modes are available: \smallskip

\begin{itemize}

\item \emph{Multi-region selection mode:}\/ In this mode, it is
  possible to select more than one region in the polymer sequence. In
  all cases below, make sure that the \guilabel{Multi-region}
  checkbutton is checked in the \guilabel{Selections and regions} group
  box. This is how these selections are performed:

  \begin{itemize}

  \item \textsl{With the
      mouse:}\index{\xpe!sequence~editor!mouse~selections}
    Left-click and drag to make the first selection. Move the mouse
    cursor to the beginning of the new selection, hold the \kbdKey{Ctrl}
    key down while left-clicking and dragging to perform the second
    region selection.
Continue as many times as necessary;

  \item \textsl{With the
      keyboard:}\index{\xpe!sequence~editor!keyboard~selections}
    Position the cursor at the beginning of the first region to be
    selected, hold the \kbdKey{Ctrl}+\kbdKey{Shift} keys down while
    moving the cursor with the direction keys (\kbdKey{$\leftarrow$},
    \kbdKey{$\rightarrow$}, \kbdKey{$\uparrow$},
    \kbdKey{$\downarrow$}). Hold the \kbdKey{Ctrl} key down and use
    the direction keys to go to the beginning of the new region
    selection, then press the \kbdKey{Shift} key and hold it down while
    moving the cursor with the direction keys to actually perform the
    region selection.

  \end{itemize}

\item \emph{Multi-selection region mode:}\/ In this mode (which
  requires the multi-region selection mode to be enabled), it is
  possible to perform selections that overlap. For example, one could
  select the sequence ``MAMISGM'' and then select the sequence
  ``SGMSGRKAS''. The overlapping sequence is thus ``SGM''.

\end{itemize}

\noindent Being able to select multiple regions, or to select the
same region multiple times, requires some configuration as far as
the calculation of the relevant masses is concerned. Indeed, whatever
the selection mode that is enabled, each time one selection (overlapping
with another or not) is added or removed, masses are recalculated for
the current selection.\footnote{``Selection'', here, is thus used to
  collectively represent all multi-region selections and
  multi-selection regions at any given time in the polymer sequence
  editor.} The way the multi-region selections and the multi-selection
regions are handled, from the mass calculation standpoint, is
configured as follows:\\

\begin{itemize}

\item \emph{Regions are oligomers:} In this configuration, each
  selection behaves as an oligomer, and thus should normally be capped
  on both its left and right ends.
This is typically the situation
  when the user wants to simulate the formation of a cross-linked
  species arising from the cross-linking of two oligomers: each
  oligomer is capped on both its ends;

\item \emph{Regions are residual chains:} In this configuration, each
  selection behaves as a residual chain, and thus the oligomer
  resulting from the multi-region selections is capped on its left and
  right ends only once. This situation is typically encountered when
  simulating partial cleavages by first selecting an oligomer,
  checking its mass and then extending the selection to simulate a
  longer oligomer resulting from a partial cleavage. The situation
  might also be encountered when there are multiple repeated sequence
  motifs in a polymer sequence and mass data are difficult to analyze.

\end{itemize}


\renewcommand{\sectitle}{Polymer Sequence Modification}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

It very often happens that the (bio)chemist uses chemical
reactions to modify the polymer sequence she is working on. Mass
spectrometry is then often used to check whether the reaction
proceeded properly. Further, in nature, chemical modifications of
biopolymer sequences are very often encountered. For example, protein
sequences are often modified as a means to regulate their function
(phosphorylations, for example, or acetylations, methylations\dots).
Nucleic acid sequences are very often and extensively modified with
modifications such as methylation\dots

It is thus crucial that \mXp\ be able to model with high precision and
flexibility the various chemical reactions that can be either made in
the chemistry lab or found in nature.
The \mXp\ program provides two
different chemical modification processes:

\begin{itemize}
\item A process by which monomers belonging to the polymer sequence
  can be individually modified;
\item A process by which the whole polymer sequence can be modified,
  either on its left end or on its right end or even on both ends.
\end{itemize}

\renewcommand{\sectitle}{Selected Monomer(s) Modification}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\label{subsect:chemical-modification-monomers}
\index{\xpe!simulations!monomer~modification}

There are a number of ways in which monomers can be modified in a
polymer sequence. Figure~\vref{fig:xpertedit-modify-monomer} shows the
simplest one: the user first selects the monomer vignette to be
modified and calls the \guimenu{Chemistry}\guimenuitem{Modify
  Monomer(s)} menu. A window shows up where all the modifications
currently available in the polymer chemistry definition are listed.
Because a monomer vignette was initially selected in the editor
window, the \guilabel{Selected Monomer} target radiobutton is on by
default.\footnote{Note that if a sequence region was selected when the
  monomer modification task was started, then selecting
  \guilabel{Current selection} is required to modify all the
  monomers in the selection. Alternatively, if this is not what is
  required, re-selecting the right monomer in the sequence and
  selecting \guilabel{Current selection} will ensure the modification
  applies only to the currently selected monomer.} It is then simply
a matter of choosing the right modification from the
\guilabel{Available modifications} list and clicking the
\guilabel{Modify} button.
The target(s) of a given modification (as
selected in the \guilabel{Target} frame widget) can be identified
according to: \smallskip

\begin{figure}
  \begin{center}
    \includegraphics[width=1\textwidth]
    {figures/xpertedit-modify-monomer.png}
  \end{center}
  \caption[Modification of a monomer in a polymer
  sequence]{\textbf{Modification of a monomer in a polymer sequence.}
    This figure shows how the chemical modification of monomer(s) can
    be performed.}
  \label{fig:xpertedit-modify-monomer}
\end{figure}

\begin{itemize}

\item \guilabel{Selected Monomer}: this frame displays data in its
  two line~edit widgets if a single monomer vignette was selected at
  the time the monomer modification action was invoked (exactly as in
  Figure~\vref{fig:xpertedit-modify-monomer}). Only the monomer
  whose code and position are displayed will be modified (even
  if it is no longer selected, or if the sequence has changed and the
  monomer at the displayed position is not the same anymore);

\item \guilabel{Current Selection}: the modification is performed on
  all the monomers that are \textit{currently} selected, that is, if
  the selection changed after the modification window was displayed,
  the new selection is modified, not the old one;

\item \guilabel{Monomers Of Same Code}: if a monomer code is
  displayed in the \guilabel{Selected Monomer} frame, all the monomers
  in the sequence that have that code are modified;

\item \guilabel{Monomers From The List}: all the monomers in the
  polymer sequence having a code corresponding to any code selected in
  the \guilabel{Available Monomers} list are modified;

\item \guilabel{All Monomers}: all the monomers of the polymer sequence
  are modified.

\end{itemize}
%
Note that there is one checkbox widget (\guilabel{Override target
  limitations}) that
requires explanation. In the chapter about the
definition of polymer chemistries (chapter~\vref{chap:xpertdef}) the
definition of modifications was detailed, and the notion of target was
explained. If, during a monomer modification, \mXp\ detects that the
user is trying to modify a monomer that is not a target of the
modification at hand, it will complain, as shown in the
\guilabel{Messages} text~edit widget of
Figure~\vref{fig:xpertedit-modify-monomer}. In this example, indeed,
the user tried to modify monomer \emph{Isoleucine} with
\emph{Phosphorylation}, which is not possible because modification
\emph{Phosphorylation} has been defined as not having monomer
\emph{Isoleucine} as any of its targets. Another situation where
target limitations might show up is when trying to modify a monomer
more times than authorized by \guilabel{Max. count}, the number of
times that monomer might be modified at once with that modification.
For example, when working on the methylation of proteins, it might
happen that lysyl residues get methylated more than once
(tri-methylation occurs often in histones). If the chemical
modification was defined in \xpd\ with a max count of 2 and a third
chemical modification is asked on a given target monomer, then the
program refuses to perform the modification. To override this
limitation, check the \guilabel{Override target limitations} checkbox
widget.


The general idea is this: the \guilabel{Override target
  limitations} checkbox widget is unchecked by default so that the
user does not make mistakes unknowingly. However, flexibility is
desirable, and the \guilabel{Override target limitations} checkbox
widget can be checked if required.

As a result of the monomer modification, the monomer vignette gets
modified.
Figure~\vref{fig:xpertedit-modify-monomer} shows one
phosphorylated seryl residue at position 8: a transparent graphics
object (a red `P') was overlaid onto the corresponding seryl monomer
vignette. If the user modifies a monomer with a modification that has
no corresponding \fileformat{svg} file defined for its graphical
rendering in file \filename{modification\_dictionary}, then a default
modification rendering is used.

The user is responsible for correctly reading the messages that might
be published in the \guilabel{Messages} text~edit widget. It is
important to understand that, when a monomer is modified, its previous
modification (if any) is overwritten with the new one. The user is
invited to experiment a bit with the monomer modification process, so
as to be confident of the results that she is going to obtain when
real polymer chemistry work is to be modelled in \mXp.

If the modification to be applied is not readily available in the list
of modifications defined in the polymer chemistry definition, then it
is possible to define a modification manually by checking the
\guilabel{Define modification} check button widget. This procedure
leads to the modification of the target monomer(s) exactly as if the
modification had been selected from the list of available
modifications. But, because the modification has a name not known to
the polymer chemistry definition, the editor cannot modify the monomer
vignette with a predefined transparent raster image. Thus, as seen in
Figure~\ref{fig:xpertedit-modify-monomer-manually-defined-modif}, the
modified residue gets visually modified using the default transparent
raster image (four question marks, one at each corner of the monomer
vignette square).
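
The target and maximum-count checks described earlier in this section
can be sketched in a few lines of Python. This is only an illustrative
sketch, not the actual \mXp\ implementation: the names
\texttt{Modification} and \texttt{can\_modify}, the target lists and the
max count values are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modification:
    """Hypothetical stand-in for a modification of the polymer chemistry definition."""
    name: str
    targets: frozenset  # monomer codes this modification may be applied to
    max_count: int      # how many times one monomer may carry this modification


def can_modify(monomer_code, current_mods, modif, override=False):
    """Return (allowed, message) for applying `modif` to a single monomer.

    `current_mods` lists the names of the modifications already set on the monomer.
    """
    if override:
        # Mimics checking the "Override target limitations" checkbox.
        return True, "target limitations overridden"
    if monomer_code not in modif.targets:
        return False, f"{monomer_code} is not a target of {modif.name}"
    if current_mods.count(modif.name) >= modif.max_count:
        return False, f"{modif.name} already set {modif.max_count} time(s)"
    return True, "ok"


# Assumed definitions, for illustration only.
phospho = Modification("Phosphorylation", frozenset({"S", "T", "Y"}), 1)
methyl = Modification("Methylation", frozenset({"K", "R"}), 2)

print(can_modify("I", [], phospho))
print(can_modify("K", ["Methylation", "Methylation"], methyl))
print(can_modify("K", ["Methylation", "Methylation"], methyl, override=True))
```

The first call is refused because the monomer is not a target, the
second because the assumed max count of 2 is already reached, and the
third succeeds only because the override flag is set.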

\begin{figure}
  \begin{center}
    \includegraphics[width=0.66\textwidth]
    {figures/xpertedit-modify-monomer-manually-defined-modif.png}
  \end{center}
  \caption[Rendering of a monomer modification in a polymer
  sequence]{\textbf{Rendering of a monomer modification in a polymer
      sequence.} This figure shows how the chemical modification of
    monomer(s) is graphically rendered. The `K' residue is modified
    using an ``Acetylation'' modification. The `S' residue is modified
    with a modification that has no associated graphical vignette. The
    default vignette is thus used.}
  \label{fig:xpertedit-modify-monomer-manually-defined-modif}
\end{figure}

It is perfectly feasible to modify a single monomer more than once
(with the same modification or not; for example, a tri-methylation
with a methylation modification). This is why, when the window depicted
in Figure~\ref{fig:xpertedit-modify-monomer} shows up, the two lists
at the right hand side show the monomers currently modified and the
modification(s) that are currently set on these modified
monomers. Selecting one item from the \guilabel{Modified monomers}
list will show only the modifications set on that monomer in the
\guilabel{Modifications} list. If all the modifications in the polymer
sequence are to be displayed, then checking the \guilabel{All
  modifications} check box widget will trigger the display of all the
modifications set on any monomer in the whole polymer sequence.

Unmodification of monomers is easily performed by selecting any number
of items from the \guilabel{Modifications} list and clicking the
\guilabel{Unmodify} button.

\fbox{\parbox{0.9\textwidth}{\textsl{It should be noted that once a
      monomer modification dialog window has been opened, the polymer
      sequence should not be edited.
This is because the
      modification/unmodification process takes for granted that the
      polymer sequence still is identical to what it was when the
      monomer modification dialog was opened. Mechanisms are in place
      to ensure that nothing irreparable happens, but this warning is
      in order.}}}


\renewcommand{\sectitle}{Whole Sequence Modification}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\index{\xpe!simulations!polymer~modification}

As described above, it is possible to modify any monomer in the
polymer sequence; when any modified monomer is removed, the
modification associated with it disappears also. The modifications that
we describe here are not of this kind. They can be applied to either
the left end of the polymer sequence or its right end (or both ends at
any given time). But these modifications do belong to the polymer
sequence \textit{per se} and are not removed from it---even if the
polymer sequence is edited by removing the left end monomer or the
right end monomer. This is why these modifications are \emph{polymer
  modifications} and not monomer modifications.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.66\textwidth]
    {figures/xpertedit-modify-polymer.png}
  \end{center}
  \caption[Modification of the left end of a polymer
  sequence]{\textbf{Modification of the left end of a polymer
      sequence.} This figure shows how simple it is to permanently
    modify a polymer sequence on either or both its left/right ends.}
  \label{fig:xpertedit-modify-polymer}
\end{figure}

Modifying a polymer sequence using \emph{polymer
  modifications} is much easier than the previous \emph{monomer
  modifications} case. The modification window is opened by choosing
the \guimenu{Chemistry}\guimenuitem{Modify Polymer} menu.
Figure~\vref{fig:xpertedit-modify-polymer} shows that window. The
modification is easy to perform, with clear feedback
provided to the user (by listing the permanent modifications in two
line~edit widgets located in front of the \guilabel{Target}
checkbuttons \guilabel{Left End} and \guilabel{Right End}).

Note that, as a convenience for the user, it is possible to modify the
polymer sequence using an arbitrary modification in the form of a
combination of a name and a formula (check the \guilabel{Define
  modification} checkbox to that effect). The modification object
used is created on-the-fly by the program and gets saved in the file
as if the user had selected a modification out of the list of
available modifications. In the example
(Figure~\vref{fig:xpertedit-modify-polymer}), the polymer sequence was
modified on its left end using the ``Acetylation'' modification
available in the polymer chemistry definition and was amidated
(formula \guivalue{-OH+NH2}) with a manually-defined modification
called \guivalue{MyModif}. The polymer sequence editor window displays
the left end and right end modifications as labels of buttons located
in the \guilabel{Polymer modifications} groupbox.


\renewcommand{\sectitle}{Monomer Cross-linking}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{subsect:monomer-cross-link}
\index{\xpe!monomer~cross-linking}

A cross-link is a covalent bond that links a monomer with one
or more other monomers. A monomer might be cross-linked more than once.
The dialog window in which the user might define cross-links is shown
in Figure~\ref{fig:xpertedit-cross-link-monomers}.

\begin{figure}
  \begin{center}
    \includegraphics[width=1\textwidth]
    {figures/xpertedit-cross-link-monomers.png}
  \end{center}
  \caption[Cross-linking of monomers]{\textbf{Cross-linking of
      monomers.} This figure shows the window in which monomers can
    be cross-linked together. A cross-link (as defined in the current
    polymer chemistry definition) is selected and the targets are
    specified in the \guilabel{Targets' positions} text line edit
    widget in the form of monomer positions separated by
    semicolons (`;').}
  \label{fig:xpertedit-cross-link-monomers}
\end{figure}

Cross-linkers were defined in the section about \xpd\ (see
page~\pageref{sect:cross-linkers}). A cross-linker might either define
no modification to be applied to the cross-linked monomers or the same
number of modifications as there are monomers cross-linked. For
example, fluorescent proteins have a chromophore that is made by
reaction of three residues (Threonyl [or Seryl]--Tryptophanyl [or
Tyrosinyl or Phenylalanyl]--Glycyl), as shown in
Figure~\ref{fig:xpertedit-cross-linked-monomers}. When cross-linking
with the fluorescent protein cross-linker, there must be three
monomers involved because three modifications are defined in the
cross-linker.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.4\textwidth]
    {figures/xpertedit-cross-linked-monomers.png}
  \end{center}
  \caption[Graphical rendering of cross-linked
  monomers]{\textbf{Graphical rendering of cross-linked monomers.}
    This figure shows the three monomers (TWG) from cyan fluorescent
    protein cross-linked together.}
  \label{fig:xpertedit-cross-linked-monomers}
\end{figure}

When any monomer involved in a cross-link is removed from a polymer
sequence, the cross-link(s) it was involved in are automatically
dissolved and destroyed.
A cross-link can also be destroyed explicitly
by selecting it in the \guilabel{Cross-links}
list widget at the right hand side of the dialog window depicted in
Figure~\ref{fig:xpertedit-cross-link-monomers} and by clicking the
\guilabel{Uncross-link} button.



\renewcommand{\sectitle}{Sequence Cleavage}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:cleave-polymer-sequences}
\index{\xpe!simulations!sequence~cleavage}

It happens very often that polymer sequences get cleaved in a
sequence-specific manner. These specific cleavages occur very often
in nature, and are made by enzymes that cleave biopolymer
sequences, like the glycosidases (cleaving saccharides), the proteases
(cleaving proteins) or the nucleases (cleaving nucleic acids). But the
scientist also uses purified enzymes or chemicals to perform such
cleavages in the test tube. \mXp\ must be able to perform those
cleavages \textit{in silico}.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.9\textwidth]
    {figures/xpertedit-cleavages.png}
  \end{center}
  \caption[Polymer sequence cleavage window]{\textbf{Polymer sequence
      cleavage window.} This figure shows the window in which polymer
    sequence cleavage is performed. One cleavage specification is
    selected and the number of allowed partial cleavages is set. The
    results are displayed in the same window. The cleavage might be
    performed on the currently selected polymer sequence region or the
    whole sequence. It is possible to stack oligomers from different
    cleavage simulations in the same window.}
  \label{fig:xpertedit-cleavages}
\end{figure}

Cleaving a sequence is a matter of having a polymer sequence opened in
an editor window
and selecting the \guimenu{Chemistry}\guimenuitem{Cleave} menu.
The
user is provided with a window where a number of cleavage
specifications are listed (Figure~\ref{fig:xpertedit-cleavages},
page~\pageref{fig:xpertedit-cleavages}) along with options that allow
customizing the production of oligomers. The cleavage specifications
are listed in the \guilabel{Available cleavage agents} list widget by
looking into the polymer chemistry definition corresponding to the
polymer sequence to be cleaved. The program knows, for example, that
the polymer sequence to be cleaved is of the ``protein-1-letter''
chemistry type, and thus will list all the cleavage specifications
that were defined in that polymer chemistry definition.

The user selects the cleavage specification of interest and sets other
useful parameters, such as the number of partial cleavages that the
cleaving agent may yield. Entering \guivalue{0} means
that the cleavage reaction will yield the set of oligomers
corresponding to a total cleavage of the polymer sequence (no missed
cleavages, that is, zero partial cleavages). Also, the user might
indicate that the
oligomers computed during the cleavage should be ionized according to
the current ionization rule (displayed in the main window) and in the
specified range. Finally, when the window is opened, the
\guilabel{Oligomer coordinates} group box widget lists the coordinates
of the currently selected region of the polymer sequence. Either leave
the values as they are shown or check the \guilabel{Whole sequence}
check box widget. In the first case, the cleavage will occur only
inside the selected region of the polymer sequence (that is, taking
that region to be the actual polymer sequence of interest); in the
second case, the cleavage will take place in the whole polymer
sequence whatever the currently selected polymer sequence region.
This feature, which was introduced in version 2.3.0, is useful
to simulate a first cleavage of a polymer sequence and then a second
cleavage of a selected oligomer using a different cleavage agent. In
protein chemistry, that would be useful to explore possibilities of
double sequential cleavages of a protein, first with EndoAspN, for
example, and then with Trypsin.

The user might want to generate oligomers for different kinds of
cleavages. For example, it might be interesting to have in the same
tree view widget the oligomers generated using first trypsin and then
cyanogen bromide. In order to add new oligomers to pre-existing ones,
it is simply required to check the \guilabel{Stack oligomers} check
button widget prior to clicking the \guilabel{Cleave} button again
with the new cleavage settings.

The \guilabel{Details} frame widget at the bottom of the window
displays a number of informative data. In particular, the
\guilabel{Sequence} tab widget displays the sequence of the oligomer
currently selected in the \guilabel{Oligomers} table view along with the
name of the cleavage agent which it arose from. The \guilabel{Cleavage
  Details} tab widget displays the mass calculation engine
configuration at the time the \emph{last} cleavage was performed (a
red LED means that the related feature was off; conversely, a green LED
means that the feature was on). In our example, the mass calculation
for the oligomers did not account for the monomer modifications, nor
for the left/right ends of the polymer, nor for the cross-links.

When the user triggers a cleavage, the mass calculation engine
configuration currently set in the sequence editor is used for the
calculation of the mass of the oligomers obtained from the
cleavage.
This process allows an easy change in the mass calculation
engine configuration between one cleavage and another so as to allow
comparison of masses obtained for the same cleavage but with different
mass calculation engine configurations.

Finally, one last note: if the list of monoisotopic or average masses
is desired in the form of a text list, right-clicking onto the table
view widget will allow copying to the clipboard either the monoisotopic
or the average masses. Also, it is possible to either export the data
to the clipboard or to a file or even to drag the displayed oligomer
items into a text editor. Only the selected items in the tree view
widget will be exported.

For oligomer data filtering, please refer to
section~\ref{sect:oligomer-data-filtering},
page~\pageref{sect:oligomer-data-filtering}.

\renewcommand{\sectitle}{Spectrum calculation}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}
\index{\xpe!simulations!spectrum-calculation}

It is possible to create a full spectrum simulation based on the
oligomers presented in the \guilabel{Oligomers} table widget. For
that, choose the \guilabel{Create spectrum} item in the drop-down
menu. Choosing that item opens the window shown
in Figure~\ref{fig:xpertedit-spectrum-creation-from-cleavages}.
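
To give an idea of what simulating a spectrum from a list of oligomers
involves in the simplest, single-peak-per-oligomer case, here is a
minimal Python sketch. It is not the algorithm used by \mXp: the
Gaussian peak shape, the resolving power of 10000, the grid step, the
protonation-based ionization and the neutral mass used below are all
assumptions made for illustration, as are the function names
\texttt{mz}, \texttt{gaussian\_peak} and \texttt{simulate\_spectrum}.

```python
import math


def mz(neutral_mass, charge, proton=1.007276):
    """m/z of an ion, assuming ionization by protonation (assumption of this sketch)."""
    return (neutral_mass + charge * proton) / charge


def gaussian_peak(x, centre, fwhm):
    """Unit-height Gaussian centred on `centre` with the given full width at half maximum."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))


def simulate_spectrum(centres, resolving_power=10000.0, step=0.01, margin=1.0):
    """Return the (x, y) pairs of a spectrum with one Gaussian peak per centroid.

    Each peak width is derived as centroid / resolving_power (assumed relation).
    """
    start = min(centres) - margin
    n_points = int(round((max(centres) + margin - start) / step)) + 1
    pairs = []
    for i in range(n_points):
        x = start + i * step
        y = sum(gaussian_peak(x, c, c / resolving_power) for c in centres)
        pairs.append((x, y))
    return pairs


# Hypothetical neutral monoisotopic mass, ionized over charge levels 1 to 3.
centres = [mz(1814.893, z) for z in (1, 2, 3)]
spectrum = simulate_spectrum(centres)
print(len(spectrum), "points from", spectrum[0][0], "to", spectrum[-1][0])
```

The resulting list of (x, y) pairs is the kind of data that can be
written to a file and loaded into a spectrum viewer.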


\begin{figure}
  \begin{center}
    \includegraphics[scale=1]
    {figures/xpertedit-spectrum-creation-from-cleavages.png}
  \end{center}
  \caption[Spectrum simulation for cleavage-obtained
  oligomers]{\textbf{Spectrum simulation for cleavage-obtained
      oligomers.} This figure shows how to configure the calculation
    of a spectrum for a set of oligomers obtained after the cleavage
    of a polymer sequence.}
  \label{fig:xpertedit-spectrum-creation-from-cleavages}
\end{figure}

If the \guilabel{Isotopic cluster} check box is not checked, then the
spectrum will not contain the isotopic cluster for each
oligomer. Instead, a single peak will be calculated, based either on
the monoisotopic or on the average mass of the oligomer, which is used
as the peak centroid. When the \guilabel{Isotopic cluster} check box
is checked, the starting mass is necessarily monoisotopic, as the
isotopic cluster is calculated starting from that mass. Note that the
other parameters have been explained earlier
(see section~\ref{sect:xpertcalc-isotopic-pattern-calculator},
page~\pageref{sect:xpertcalc-isotopic-pattern-calculator}).

Selecting a file to write the results to (that is, the (x y) pairs
making up the spectrum) is recommended. Otherwise, when the calculation
is finished, refer to the \guilabel{Results} tab page widget for the same
spectrum (x y) pairs.

During the calculation, the \guilabel{Log} tab page widget shows the
details of the running calculation. For example, the following is the
log for the first two oligomers of a set of 123:

{\small
\begin{verbatim}

Simulating a spectrum with calculation of
an isotopic cluster for each oligomer.

There are 123 oligomers. Calculating sub-spectrum for each

Computing isotopic cluster for oligomer 1
    formula: C82H123N22O25.
    Validating formula... Success.
    mono m/z: 1815.9
    charge: 1
    fwhm: 0.18159
    increment: 0.024212

    Done computing the cluster

Computing isotopic cluster for oligomer 2
    formula: C82H124N22O25.
    Validating formula... Success.
    mono m/z: 908.455
    charge: 2
    fwhm: 0.0908455
    increment: 0.00605637

    Done computing the cluster
\end{verbatim}
}

The previous example dealt with the horse apomyoglobin that was
cleaved with trypsin, with 1 partial cleavage and charge levels from 1
to 3. That cleavage simulation yielded 123 oligomers, for which a
spectrum was calculated which spans the [49.7--3418] m/z
range. Figure~\ref{fig:xpertedit-spectrum-simulation-cleavage-oligomers}
shows that spectrum, zoomed in the region [744--759]. Four distinct
isotopic clusters are visible:

\begin{figure}
  \begin{center}
    \includegraphics[width=\textwidth]
    {figures/xpertedit-spectrum-simulation-cleavage-oligomers.png}
  \end{center}
  \caption[Simulated spectrum for cleavage-obtained
  oligomers]{\textbf{Simulated spectrum for cleavage-obtained
      oligomers.} This spectrum (zoomed portion viewed in
    \progname{mMass}) has been simulated starting from a list of
    oligomers obtained by cleaving the horse apomyoglobin protein with
    trypsin.}
  \label{fig:xpertedit-spectrum-simulation-cleavage-oligomers}
\end{figure}

\begin{tabbing}
mono m/z \phantom{room} \= Peptide sequence\phantom{still some roooom here} \= charge\\[2mm]

744.70 \> HPGDFGADAQGAMTKALELFR \> 3+\\[2mm]
748.44 \> ALELFR \> 1+\\[2mm]
751.84 \> HPGDFGADAQGAMTK \> 2+\\[2mm]
753.98 \> KHGTVVLTALGGILK \> 2+\\[2mm]
\> HGTVVLTALGGILKK \> 2+\\[2mm]
\end{tabbing}


Computing a full spectrum starting from oligomers which might have
large masses ($>$~6000) will require a large amount of CPU time.
The above
apomyoglobin example could be handled in $\approx$\,20~s on a rather
powerful laptop (albeit with a single processor used throughout the
task).


\renewcommand{\sectitle}{Oligomer Fragmentation}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:fragment-polymer-sequence}
\index{\xpe!simulations!oligomer~fragmentation}

It happens very often that polymer sequences need to be fragmented in
the gas phase (in the mass spectrometer) so that structure
characterizations may be performed. In protein chemistry, this
is very often done in order to get sequence information for a given
peptide ion selected in the gas phase. \mXp\ must be able to perform
those fragmentations \textit{in silico}. Let us see how an oligomer
can be fragmented using \mXp.

\begin{figure}
  \begin{center}
    \includegraphics[scale=1]
    {figures/xpertedit-fragmentation.png}
  \end{center}
  \caption[Oligomer fragmentation window]{\textbf{Oligomer
      fragmentation window.} This figure shows the window in which
    oligomer fragmentation is performed. One or more fragmentation
    patterns might be selected in one fragmentation step.}
  \label{fig:xpertedit-fragmentation}
\end{figure}

Fragmenting an oligomer is a matter of having a polymer sequence opened
in an editor window
and selecting the sequence region to be fragmented. Once this is done,
the user selects the \guimenu{Chemistry}\guimenuitem{Fragment} menu.
The user is provided with a window where a number of fragmentation
specifications are listed (Figure~\vref{fig:xpertedit-fragmentation}).
As detailed for the cleavage of polymers, these fragmentation
specifications are listed by looking into the polymer chemistry
definition corresponding to the polymer sequence of which an oligomer
is to be fragmented.

The user selects the fragmentation specification(s) of interest, sets
the ionization range required for the generated fragment oligomers
(the same as for polymer cleavage) and clicks the \guilabel{Fragment}
button. Upon successful termination of the fragmentation reaction,
the generated fragments are displayed in the \guilabel{Oligomers}
table view widget.

As detailed for the cleavage of polymer sequences, the
\guilabel{Details} frame widget displays data about the fragments
generated and the way masses were calculated for them.

It is possible to take into account cross-links that are borne by
monomers contained in the oligomer. Only cross-links that are fully
contained in the oligomer are taken into account. Partial cross-links,
that is, cross-links that have at least one involved monomer outside
of the oligomer, are ignored.

Figure~\ref{fig:xpertedit-cfp-chromophore-disulfide-bond-no-account-cross-links}
shows the \xpe\ module with the cyan fluorescent protein. The
chromophore is shown as an internal cross-link between residues T, W
and G (net mass change: $-20$~Da). There is also a disulfide bond
involving two cysteine residues (net mass change: $-2$~Da). In this
example, the mass calculation engine did not take into account the
cross-links (see the unchecked \guilabel{Cross-links} check box). When
that check box is checked, the mass calculation engine yields mass
data with a differential of $-22$~Da ($-20 - 2$): both cross-links have
now been taken into account.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-cfp-chromophore-disulfide-bond-no-account-cross-links.png}
  \end{center}
  \caption[Two cross-links in the cyan fluorescent protein
  sequence]{\textbf{Two cross-links in the cyan fluorescent protein
      sequence.} This figure shows two cross-links (T--W--G and C--C)
    set to the cyan fluorescent protein. Here, the mass calculation
    engine is configured not to take these cross-links into account.}
  \label{fig:xpertedit-cfp-chromophore-disulfide-bond-no-account-cross-links}
\end{figure}


\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-cfp-chromophore-disulfide-bond-account-cross-links.png}
  \end{center}
  \caption[Calculations when cross-links are accounted
  for]{\textbf{Calculations when cross-links are accounted for.} This
    figure shows that the two cross-links shown in
    Figure~\ref{fig:xpertedit-cfp-chromophore-disulfide-bond-no-account-cross-links}
    are now taken into account, which translates into a mass decrease
    of 22~Da.}
  \label{fig:xpertedit-cfp-chromophore-disulfide-bond-account-cross-links}
\end{figure}


If we select the oligomer region [38--77] and ask for a
fragmentation, the fragmentation results will take the cross-links
into account only if the generated fragments fully encompass
them.

The following calculation rationale applies:

\begin{itemize}

\item Fragments b (left end) from b$_1$ (D) to b$_{12}$ (up to I) do
  not take into account the cross-links as both are outside of their
  scope;

\item Fragments b$_{13}$ (up to C) to b$_{34}$ (up to Q) do not take
  into account the cross-links because the outer cross-link (disulfide
  bond between cysteine residues) is not complete (the second cysteine
  is left out of the fragment);

\item Fragments b$_{35}$ (up to C) to b$_{40}$ (up to P) do take into
  account both cross-links because both are contained in the fragments;

\item Likewise, the only y fragments (right end) that do take into
  account the cross-links are the fragments y$_{28}$ (up to C) and all
  the larger ones, as for these fragments, the cross-links are both
  fully contained.

\end{itemize}


\begin{figure}
  \begin{center}
    \includegraphics[width=0.75\textwidth]
    {figures/xpertedit-fragmentation-cross-linked-oligomer.png}
  \end{center}
  \caption[Complicated cross-linking situation]{\textbf{Complicated
      cross-linking situation.} This figure shows a complicated
    cross-linking situation with an oligomer that has five
    cross-links, four of which are fully encompassed by the oligomer
    and one that involves a monomer outside of the oligomer.}
  \label{fig:xpertedit-fragmentation-cross-linked-oligomer}
\end{figure}


The calculation of the fragments for this oligomer involves the
following steps:

\begin{itemize}

\item Calculate the regions of the oligomer that involve cross-links,
  whether these overlap or not. The regions are thus the following:
  [3--5], [8--11] and [13--15].
  Note that the cross-link involving
  monomer~12 is never taken into account as it also involves a monomer
  outside of the oligomer;

\item For fragments that have the left end of the oligomer (``Left-end
  nomenclature''), the following rationale is used:

  \begin{itemize}

  \item Fragments $\rightarrow$1 and $\rightarrow$2 do not have any
    cross-link;

  \item Fragments $\rightarrow$3 to $\rightarrow$4 do not account for
    cross-link~a because that cross-link is not fully encompassed by
    the fragments;

  \item Fragments $\rightarrow$5 to $\rightarrow$10 account only for
    the cross-link~a as this is the only cross-linked region to be
    fully encompassed by these fragments;

  \item Fragments $\rightarrow$11 to $\rightarrow$14 account for
    cross-links~a, b and c as they are all fully encompassed in the
    fragments;

  \item Fragments $\rightarrow$15 to $\rightarrow$16 account for all
    cross-links, a, b, c, d as they are all fully encompassed in the
    fragments;

  \end{itemize}

\item For fragments that have the right end of the oligomer (``Right-end
  nomenclature''), the following rationale is used:

  \begin{itemize}

  \item Fragments 1$\leftarrow$ and 2$\leftarrow$ do not have any
    cross-link;

  \item Fragments 3$\leftarrow$ and 4$\leftarrow$ do not account for
    cross-link~d because that cross-link is not fully encompassed by
    the fragments;

  \item Fragments 5$\leftarrow$ and 6$\leftarrow$ account for
    cross-link~d because it is fully encompassed in these fragments;

  \item Fragments 7$\leftarrow$ to 9$\leftarrow$ only account for
    cross-link~d because cross-links~b and c (which make one
    cross-linked region) are not fully encompassed by these fragments;

  \item Fragments 10$\leftarrow$ to 14$\leftarrow$ account for
    cross-links~d, c and b, but not for cross-link~a as this last
    cross-link is not fully encompassed in these fragments;

  \item Fragments 15$\leftarrow$ and 16$\leftarrow$ account for all
    the cross-links of the oligomer.

  \end{itemize}

\end{itemize}

\noindent It is worth repeating that cross-links
that involve monomer(s) outside of the oligomer are ignored. The user
is alerted whenever this situation is encountered.

Finally, one last note: if the list of monoisotopic or average masses
is desired in the form of a text list, right-clicking onto the table
view widget will allow copying to the clipboard either the
monoisotopic or the average masses. Also, it is possible to either
export the data to the clipboard or to a file or even to drag the
displayed oligomer items into a text editor.

For oligomer data filtering, please refer to
section~\ref{sect:oligomer-data-filtering}, page
\pageref{sect:oligomer-data-filtering}.


\renewcommand{\sectitle}{Mass Searching}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:search-masses-polymer-sequence}
\index{\xpe!mass~searching}

It may happen that the scientist needs to know if some arbitrary
sequence region would have a given mass. \mXp\ allows for mass
searching operations in the polymer sequence. This is done by using
the menu \guimenu{Chemistry}\guimenuitem{Mass Search}. The window
illustrated in Figure~\vref{fig:xpertedit-mass-search} shows up and
the user enters masses to search for. A number of parameters need to be
detailed:
\smallskip

\begin{itemize}

\item \guilabel{Targets} Should the masses be searched for in the
  whole sequence or in the currently selected region?

\item \guilabel{Ionization} When calculating masses for the potential
  oligomers matching the searched mass, should different levels of
  ionization be calculated? For example, one finds in an electrospray
  ionization experiment mass spectrum a peak at \mz{1245}. It is not
  possible to know the ionization level for that ion a priori. One could
  imagine that this value is for a monoprotonated or for a multiprotonated
  species. If we wanted to assess this, we might ask that the mass be
  searched for by computing a range of possible ionization levels
  between \guilabel{Start} \guivalue{1} and \guilabel{End} \guivalue{4}
  (admitting that for that experiment this is what one would expect).

\end{itemize}
%
Once the masses have been searched for, if results are found they are
displayed in the same window in the \guilabel{Oligomers} table view
widgets (the left one for the mono masses and the right one for the
avg masses).


\begin{figure}
  \begin{center}
    \includegraphics[scale=0.75]
    {figures/xpertedit-mass-search.png}
  \end{center}
  \caption[Searching masses in a polymer sequence]{\textbf{Searching
      masses in a polymer sequence.} This figure shows the window in
    which to search for masses in a polymer sequence.}
  \label{fig:xpertedit-mass-search}
\end{figure}


Finally, one last note: if the list of monoisotopic or average masses
is desired in the form of a text list, right-clicking onto the table
view widget will allow copying to the clipboard either the
monoisotopic or the average masses. Also, it is possible to either
export the data to the clipboard or to a file or even to drag the
displayed oligomer items into a text editor.

For oligomer data filtering, please refer to
section~\ref{sect:oligomer-data-filtering}, page
\pageref{sect:oligomer-data-filtering}.
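
As a worked illustration of the ionization range discussed in this
section, the relation between an observed m/z value and the neutral
mass $M$ can be sketched as follows. This sketch assumes that
ionization occurs by protonation and uses $m_{\mathrm{H^+}} \approx
1.0073$~Da for the mass added per charge; both are assumptions made
for the sake of the example, since the ionization rule actually
applied is the one set in the polymer chemistry definition. For an ion
of charge $z$:
\[
  \frac{m}{z} = \frac{M + z\,m_{\mathrm{H^+}}}{z}
  \quad\Longleftrightarrow\quad
  M = z \left( \frac{m}{z} - m_{\mathrm{H^+}} \right).
\]
For the peak at m/z~1245 and ionization levels 1 to 4, this gives:
\[
  \begin{array}{cl}
    z = 1 & M \approx 1243.99~\mathrm{Da} \\
    z = 2 & M \approx 2487.99~\mathrm{Da} \\
    z = 3 & M \approx 3731.98~\mathrm{Da} \\
    z = 4 & M \approx 4975.97~\mathrm{Da}
  \end{array}
\]
Asking for an ionization range of 1 to 4 can thus be pictured as
testing sequence regions against each of these four candidate neutral
masses.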


\renewcommand{\sectitle}{Oligomer Data Filtering}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:oligomer-data-filtering}
\index{\xpe!data~filtering}

Oligomer-generating simulations, like polymer sequence cleavages or
fragmentations or mass searches, produce a very large amount of
data. It is often desirable to be able to quickly filter some specific
data out of this mass of data.

In all three simulations mentioned above, the results that are
displayed in the corresponding dialog windows are easily filtered
using the mechanism illustrated in
Figure~\ref{fig:xpertedit-filtering-oligomer-data}.

\begin{figure}
  \begin{center}
    \includegraphics[width=1\textwidth]
    {figures/xpertedit-filtering-oligomer-data.png}
  \end{center}
  \caption[Oligomer data filtering]{\textbf{Oligomer data filtering.}
    This figure shows how oligomer data can be filtered. The
    \guilabel{Filtering options} group box contains four line edit
    widgets where filtering might be triggered: \guilabel{Partial},
    \guilabel{Mono}, \guilabel{Avg}, \guilabel{Charge}. The filtered
    data are displayed in the same window (this example is for polymer
    sequence-cleavage oligomer data).}
  \label{fig:xpertedit-filtering-oligomer-data}
\end{figure}


Filtering on the data is easily performed by entering the options in
the \guilabel{Filtering options} group box
(Figure~\ref{fig:xpertedit-filtering-oligomer-data},
page~\pageref{fig:xpertedit-filtering-oligomer-data}). For any
filtering operation, only one criterion can be used, that is, for
example, filtering can occur only on the basis of the monoisotopic
mass or of the average mass, but not on both masses.
For example, if
one wanted to filter a huge set of data against a specific
monoisotopic mass of 850 plus or minus 3 atomic mass units, it would
simply be a matter of setting the monoisotopic mass to
\guivalue{850} with a tolerance of \guivalue{3 AMU} in the
corresponding line edit widgets contained in the \guilabel{Filtering
  options} group box. To perform that filtering action, first set the
tolerance value (\guivalue{3}) in its line edit widget and next set
the monoisotopic mass value to \guivalue{850} in the corresponding
line edit widget. While the cursor \emph{is still} in the
\guilabel{Mono} line edit where \guivalue{850} was entered, press the
keyboard key combination \kbdKey{Ctrl}+\kbdKey{ENTER}. The filtering
will be immediate and the table view will show the data that passed
the filter. Note that the combo box widget holding the unit of the
tolerance (in the example, that unit is \guilabel{AMU}, that is
``atomic mass unit'') and the line edit widget where the tolerance
value proper is set (\guivalue{3} in the example) do not trigger any
filtering by themselves; these widgets are only useful in conjunction
with other oligomer data: \guilabel{Mono}, \guilabel{Avg},
\guilabel{Error} line edit widgets (depending on the dialog window in
which the filtering occurs: cleavage, fragmentation or mass search). In our
example, thus, the filtering would be spoken like this:
\textsl{``Only show the oligomers for which the monoisotopic mass
  is 850 plus or minus 3 atomic mass units''}.

To exit the data filtering mode, simply uncheck the
\guilabel{Filtering options} check box, and all the initial data will
be displayed, irrespective of any data in the line edit boxes
described above.
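
The example filter described above can be restated as an inequality.
This is only a restatement for clarity: the symbol
$m_{\mathrm{mono}}$ is introduced here for the monoisotopic mass of an
oligomer and is not a name used by the program, and it is assumed that
the bounds are inclusive. With a target of 850 and a tolerance of
3~AMU, an oligomer passes the filter when
\[
  \left| m_{\mathrm{mono}} - 850 \right| \le 3
  \quad\Longleftrightarrow\quad
  847 \le m_{\mathrm{mono}} \le 853 .
\]
An oligomer with a monoisotopic mass of 851.4 would thus be shown,
while one with a monoisotopic mass of 853.5 would be hidden.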


\renewcommand{\sectitle}{m/z Ratio Calculation}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:m-over-z-ratio-calculation}
\index{\xpe!simulations!m/z~calculations}

In electrospray ionization, a given polymer sequence might be charged
a large number of times. The tool shown in
Figure~\vref{fig:xpertedit-mz-ratio-calculator} shows how to compute a
range of m/z ratios starting from one m/z value for a given charge and
a given ionization agent. It is also possible to switch ionization
agent on-the-fly.

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.8]
    {figures/xpertedit-mz-ratio-calculator.png}
  \end{center}
  \caption[Calculation of ranges of m/z ratios]{\textbf{Calculation of
      ranges of m/z ratios.} This figure shows the window in which to
    perform the calculation of different m/z ratios starting from one
    m/z value with a given ionization agent.}
  \label{fig:xpertedit-mz-ratio-calculator}
\end{figure}


\renewcommand{\sectitle}{Monomeric And Elemental Compositions}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:monomeric-elemental-compositions}
\index{\xpe!elemental~composition}
\index{\xpe!monomeric~composition}

The \guimenu{Chemistry}\guimenuitem{Determine Compositions} menu
triggers the window shown in Figure~\ref{fig:xpertedit-compositions}.
The elemental composition is determined using the calculations engine
configuration currently set in the polymer sequence editor window.

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.9]
    {figures/xpertedit-compositions.png}
  \end{center}
  \caption[Determination of the compositions]{\textbf{Determination of
      the compositions.} This figure shows how to determine the
    monomeric and elemental compositions for the whole sequence or the
    current selection.}
  \label{fig:xpertedit-compositions}
\end{figure}



\renewcommand{\sectitle}{pKa, pH, pI and Charges}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
\label{sect:acido-basic-calculations}
\index{\xpe!pKa}
\index{\xpe!pH}
\index{\xpe!pI}

When preparing biochemical experiments, very often users need to know
how many charges a given polymer sequence will bear at any given pH.
Equally important is the ability to know at which pH value the polymer
sequence will have a net charge close to zero. The pH value for which a
given polymer sequence has a net charge close to zero (typically this
means that the number of positive charges equals the number of
negative charges) is called the isoelectric point (pI).

Such computations are rather computer-intensive and require a very
precise knowledge of the chemical structure of the different monomers
that take part in the definition of the polymer chemistry. A file
called \filename{pka\_ph\_pi.xml} is located in the polymer chemistry
definition directory. This file lists all the chemical groups that are
possibly charged; each monomer of the polymer definition is
represented by a \verb|<monomer>| element in which data are defined
for any chemical group of that monomer that might bear a charge at any
given pH. You can find the listing of the \filename{pka\_ph\_pi.xml}
file in chapter~\vref{chap:appendices}.
We'll discuss every aspect of
this file's contents in the next sections with enough detail that the
user will be able to write one such file for her specific polymer
chemistry.

At the moment, two entities in the polymer chemistry definition might
have chemical groups bearing charges: monomers and modifications.
We will first review monomers, and modifications next.

\renewcommand{\sectitle}{Ionized Group(s) In Monomers}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}

Monomers are the building blocks of polymer sequences. These blocks
must have at least two reactive groups so that they can be polymerized
into a polymer sequence thread. Reactive groups are often chargeable
groups; for example, the amino group of amino-acids
gets protonated (positively charged) at a pH lower than its pKa.
Similarly, the carboxylic acid group of amino-acids is deprotonated
(negatively charged) at physiological pH.

\subsubsection*{Some Theory First}

\begin{figure}
  \begin{center}
    \includegraphics[scale=2]
    {figures/protein-monomer-acidobasic-data.png}
  \end{center}
  \caption[Different pKa values for a number of amino-acids' chemical
  groups]{\textbf{Different pKa values for a number of amino-acids'
      chemical groups.} All of the twenty amino-acids are represented
    here, with each amino-acid's lateral chain fully represented.
    The pKa value is indicated above each chemical group for which
    the value makes sense from a biological perspective.}
  \label{fig:protein-monomer-acidobasic-data}
\end{figure}

For the non-biochemist reader, amino-acids involved in the formation
of proteins always have at least two chemical groups that are of
opposite electrical charge at physiological pH values (see
Figure~\ref{fig:protein-monomer-acidobasic-data}):

\begin{itemize}
\item The amino group (called $\rm \alpha NH_2$) has a typical pKa
  value of 9.6. This means that, at physiological pH values (between
  6.5 and 7.5), the amino group will find the environment rather
  acidic, and will thus be protonated, leading to a positively-charged
  species ($\rm \alpha NH_3^+$);
\item The carboxylic group (called $\rm \alpha COOH$) has a typical pKa
  value of 2.35. This means that, at physiological pH values, the
  carboxylic group will be in a rather basic environment, and will
  thus be deprotonated, leading to a negatively-charged species ($\rm
  \alpha COO^-$).
\end{itemize}

\noindent It should be clear that, at physiological pH values, the two
$\rm \alpha$ chemical groups together have a net charge of 0. But proteins are
charged, and this is because some of the twenty common amino-acids
have other chemical groups beyond the two already described.
Indeed, some amino-acids have lateral chains that bear groups that
might be charged depending on the pH: seryl residues have an alcohol
group that has a pKa of 13, for example; that means that it is almost
always uncharged (form ROH at physiological pH values). The lateral
chain of lysine has a pKa of 10.53, which means that at pH values
below this pKa value, the $\rm \epsilon NH_2$ gets protonated,
introducing a positive charge in the protein.
Similarly, amino-acids
glutamate and aspartate have a lateral chain ending with a $\rm
\gamma COOH$ and a $\rm \beta COOH$, respectively. Their pKa values
are below 4.5, and thus the groups are negatively charged at
physiological pH values.

When the net charge of a polymer sequence has to be computed for a
given pH condition, the program iterates over the sequence, and for each
monomer checks which of its chemical group(s) is possibly
charged. For this to happen, it is required that a number of data be
known for each monomer's chemical group that might play a role in the
determination of the polymer sequence's electrical charge. Thus, for
each chemical group a number of data should be listed in the
\filename{pka\_ph\_pi.xml} file (please see that file in
chapter~\vref{chap:appendices}):

\begin{itemize}
\item the chemical group's \verb|<name>| element is required.
  {\footnotesize Examples: ``$\rm \alpha NH_2$'' or ``$\rm \epsilon
    NH_2$'' or ``$\alpha$COOH'';}
\item the chemical group's \verb|<pka>| element is optional, but is
  the basis for the charge calculation. {\footnotesize Examples: 9.6
    for the ``$\alpha$NH$\rm _2$'' or 2.35 for ``$\alpha$COOH'';}
\item the \verb|<acidcharged>| element is required if the \verb|<pka>|
  element is given. This element is responsible for telling if the
  chemical group is charged (positively) when the pH is lower than pKa
  (that is when the medium is acidic with respect to the pKa).
  {\footnotesize Examples: an amine is positively charged when it is
    in its acidic form (protonated); a carboxylic acid is \emph{not}
    charged when it is in its acidic form;}
\item there can be none, one or more \verb|<polrule>| element(s) for
  each chemgroup.
  The \verb|<polrule>| element gives information
  about the way the chemical group at hand might be ``trapped'' (or
  not) in the formation of inter-monomer bonds (while the monomer is
  polymerized into the polymer sequence). The value ``left\_trapped''
  means that the chemical group ceases to be involved in charge
  calculations as soon as it has a monomer at its left end. The value
  ``right\_trapped'' means the same as above, but when a monomer is
  polymerized at its right end. For a chemical group that is
  ``left\_trapped'', we understand that it is only effectively
  evaluated if it is at the left end of the polymer sequence, since in
  this case it does not have a monomer at its left side. Conversely, a
  chemical group that has a \verb|<polrule>| element with value
  ``right\_trapped'' will be evaluated only if the monomer is
  actually the right end monomer in the polymer sequence. Finally, the
  typical lateral chains of amino-acids have a \verb|<polrule>|
  element with a value ``never\_trapped'', as these chemical groups do
  not take part in the formation of the inter-monomer bond;
\item there can be none, one or more \verb|<chemgrouprule>| element(s)
  for each chemgroup. A \verb|<chemgrouprule>| element should contain the
  following:
  \begin{itemize}
  \item there must be an \verb|<entity>| element that indicates which
    chemical entity is being dealt with in the current chemgroup
    element. Valid values for this element are ``LE\_PLM\_MODIF'',
    ``RE\_PLM\_MODIF'' or ``MNM\_MODIF'';
  \item there must be a \verb|<name>| element naming the chemical
    entity properly;
  \item there must be an \verb|<outcome>| element telling what action
    should be taken when encountering the \verb|<entity>| on the
    chemgroup. Valid values are either ``LOST'' or ``PRESERVED''.
  \end{itemize}
\end{itemize}


\subsubsection*{Understanding By Example}

Let us take some examples in order to make sure we actually understand
the process of describing how an electrical net charge is calculated
for a given polymer sequence and at any given pH value.

Let us see the example of the aspartate amino-acid, of which the
lateral chain is nothing but $\rm CH_2COOH$:

\begin{alltt}
  <monomer>
    <code>D</code>
    <mnmchemgroup>
      <name>N-term NH2</name>
      <pka>9.6</pka>
      <acidcharged>TRUE</acidcharged>
      <polrule>left_trapped</polrule>
      <chemgrouprule>
        <entity>LE_PLM_MODIF</entity>
        <name>Acetylation</name>
        <outcome>LOST</outcome>
      </chemgrouprule>
    </mnmchemgroup>
    <mnmchemgroup>
      <name>C-term COOH</name>
      <pka>2.36</pka>
      <acidcharged>FALSE</acidcharged>
      <polrule>right_trapped</polrule>
    </mnmchemgroup>
    <mnmchemgroup>
      <name>Lateral COOH</name>
      <pka>3.65</pka>
      <acidcharged>FALSE</acidcharged>
      <polrule>never_trapped</polrule>
      <chemgrouprule>
        <entity>MONOMER_MODIF</entity>
        <name>AmidationAsp</name>
        <outcome>LOST</outcome>
      </chemgrouprule>
    </mnmchemgroup>
  </monomer>
\end{alltt}

\noindent We see that the code of the monomer for which acid-basic
data are being defined is `D' and that this monomer has three chemical
groups that might bring electrical charges. These chemical groups are
described by three \verb|<mnmchemgroup>| elements that we will review in
detail below (see Figure~\vref{fig:protein-monomer-acidobasic-data}).

\medskip

The first \verb|<mnmchemgroup>| element is related to the $\rm \alpha
NH_2$ amino group of the amino-acid:

\begin{itemize}
\item \verb|<name>N-term NH2</name>| The name of the chemical group is
  not immediately useful, but will be used when reports are to be
  prepared for the calculation;
\item \verb|<pka>9.6</pka>| This element is optional. However, of
  course, if the chemical group might be electrically charged, the pKa
  value will be essential in order to compute the charge that is
  brought by this chemical group at any given pH;
\item \verb|<acidcharged>TRUE</acidcharged>| This element is also
  optional; however, if the previous element is given, then this one
  is compulsory. Telling if the conjugated acid form is charged (that
  is, protonated) is essential in order to know what sign the charge
  has to be when the chemical group is ionized. The value ``TRUE''
  indicates that when the pH is lower than the pKa, the chemical group
  is charged, thus protonated (in the form $\rm NH_3^+$).
  Consequently, if the pH is higher than the pKa, then the chemical
  group is neutral (in the form $\rm NH_2$);
\item \verb|<polrule>left_trapped</polrule>| This element indicates
  that the chemical group should only be taken into account
  if the monomer bearing it (code `D') is the left end
  monomer of the polymer sequence.
  This can easily be understood, as
  this chemical group is responsible for the establishment of the
  inter-monomer bond towards the left end of the polymer sequence;
\item \verb|<chemgrouprule>| This element provides further details on
  the chemistry that this chemical group might be involved in:
  \begin{itemize}
  \item \verb|<entity>LE_PLM_MODIF</entity>| This element indicates
    that the supplementary data in the current \verb|<chemgrouprule>|
    element pertain to the $\rm \alpha NH_2$ chemical group
    \emph{only} in case the polymer sequence is left end-modified
    (that is, with a permanent left end modification) and the monomer
    (code `D') is located at the left end of the polymer sequence
    (that is: it is the first monomer of the sequence for which the
    electrical charge (or pI) calculation is to be performed).
  \item \verb|<name>Acetylation</name>| This element goes further in
    the detail of the potential chemistry of the $\rm \alpha NH_2$
    chemical group: if the left end permanent modification is
    ``Acetylation'', then the current chemgrouprule element can be
    further processed, otherwise it should be abandoned;
  \item \verb|<outcome>LOST</outcome>| This element actually indicates
    what should be done with the chemical group for which the
    chemgrouprule is being defined.
    What we see here is:
    \textsl{``If the $\rm \alpha NH_2$ chemical group, belonging to
      a `D' monomer located at the left end of a polymer sequence, is
      modified permanently with an `Acetylation' left end
      modification, it should not be taken into account when computing
      the charge that it could bring to the polymer sequence.''}
  \end{itemize}
\end{itemize}

\noindent The second \verb|<mnmchemgroup>| element is related to the
$\rm \alpha COOH$ carboxylic group of the amino-acid:

\begin{itemize}
\item \verb|<name>C-term COOH</name>| Same remark as above;
\item \verb|<pka>2.36</pka>| Same remark as above;
\item \verb|<acidcharged>FALSE</acidcharged>| Same remark as above.
  However, as we can see, the value indicates that the acid conjugate
  (form $\rm COOH$) does not bring any charge. This means that when
  the basic conjugate is predominant (that is, when pH~$>$~pKa), it
  brings a negative charge: the form is $\rm COO^-$;
\item \verb|<polrule>right_trapped</polrule>| The chemical group
  should not be evaluated if a monomer is linked to it at its right
  side. That means that the current chemical group is only evaluated
  if the monomer bearing it is located at the right end of the polymer
  sequence. This is easily understood, as the $\rm \alpha COOH$
  chemical group is involved in the formation of the inter-monomer
  bond towards the right end of the polymer sequence.
\end{itemize}

\noindent The third \verb|<mnmchemgroup>| element is related to the
$\rm \beta COOH$ carboxylic group of the amino-acid:

\begin{itemize}
\item \verb|<name>Lateral COOH</name>|;
\item \verb|<pka>3.65</pka>|;
\item \verb|<acidcharged>FALSE</acidcharged>|;
\item \verb|<polrule>never_trapped</polrule>| This element indicates
  that, whatever the position of the monomer bearing the chemical
  group in the polymer sequence (left end, right end or middle), the
  chemical group is to be evaluated;
\item \verb|<chemgrouprule>| This element provides further details on
  the chemistry that the chemical group at hand ($\rm \beta COOH$)
  might be involved in:
  \begin{itemize}
  \item \verb|<entity>MONOMER_MODIF</entity>| This element indicates
    that the supplementary data in the current \verb|<chemgrouprule>|
    element pertain to the $\rm \beta COOH$ chemical group
    \emph{only} in case the monomer bearing the chemical group is
    chemically modified;
  \item \verb|<name>AmidationAsp</name>| This is the modification by
    which the monomer should be modified in order to have the
    \verb|<chemgrouprule>| element effectively evaluated;
  \item \verb|<outcome>LOST</outcome>| This element actually indicates
    that if the monomer bearing the chemical group is modified with an
    ``AmidationAsp'' chemical modification, then the chemical group
    should not be evaluated any more for the electrical charge (or
    pI) calculations, since reacting a carboxylate group with an
    amino group produces an amide group which is not easily chargeable
    at physiological pH values.
  \end{itemize}
\end{itemize}

\noindent At this point it should be clear how the charge
calculations can be configured for the different monomers in the
polymer chemistry definition.
As usual, the more sophisticated the polymer chemistry definition,
the more sophisticated the computations that it allows.


\renewcommand{\sectitle}{Ionized Group(s) In Modifications}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}


In the excerpt from the \filename{pka\_ph\_pi.xml} file below, we see
that chemical modifications can also bring charges. The example of the
chemical modification ``Phosphorylation'' shows that when a monomer is
phosphorylated, two chemical groups are brought in: the first has a
pKa value of 1.2 (that is, it will always be deprotonated at
physiological pH values); the second has a pKa value of 6.5 (that is,
at a pH equal to this pKa, it will be equally distributed between a
protonated (not charged) form and a deprotonated (negatively charged)
form, leading to a net electrical charge of $-0.5$).

\begin{alltt}
  <modif>
    <name>Phosphorylation</name>
    <mdfchemgroup>
      <name>none_set</name>
      <pka>1.2</pka>
      <acidcharged>FALSE</acidcharged>
    </mdfchemgroup>
    <mdfchemgroup>
      <name>none_set</name>
      <pka>6.5</pka>
      <acidcharged>FALSE</acidcharged>
    </mdfchemgroup>
  </modif>
\end{alltt}

\noindent At this point we are ready to study the way computations
are actually performed in the \xpe\ module.


\renewcommand{\sectitle}{pH, pI and Charge Calculations}
\subsection*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{subsection}{\numberline{}\sectitle}

A user who wishes to compute charges (positive, negative, net) or the
isoelectric point for the current polymer sequence uses the
\guimenu{Chemistry}\guimenuitem{pKa pH pI} menu item, which opens the
window shown in
Figure~\vref{fig:xpertedit-net-charge-pka-ph-pi}.
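
\noindent As a sketch of the arithmetic that such a computation
involves, assume that each chemical group is treated independently
using the standard Henderson--Hasselbalch relation. This is an
assumption made here for illustration only: the exact procedure used
by the program is not detailed in this section, and the symbols $q$,
$q_1$ and $q_2$ are introduced for this example. For a group described
with \verb|<acidcharged>FALSE</acidcharged>|, the acid form is neutral
and the basic form bears one negative charge, so that the mean charge
$q$ brought by the group would be:

\[
  q = \frac{-1}{1 + 10^{\,\mathrm{p}K_a - \mathrm{pH}}}
\]

\noindent Symmetrically, for a group described with
\verb|<acidcharged>TRUE</acidcharged>|, the acid form would bear one
positive charge and the basic form would be neutral:

\[
  q = \frac{+1}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_a}}
\]

\noindent Applying the first relation to the two chemical groups
listed in the ``Phosphorylation'' excerpt (pKa values of 1.2 and 6.5)
at pH~7 gives:

\[
  q_1 = \frac{-1}{1 + 10^{\,1.2 - 7}} \approx -1.00,
  \qquad
  q_2 = \frac{-1}{1 + 10^{\,6.5 - 7}} = \frac{-1}{1 + 0.316}
  \approx -0.76
\]

\noindent Under these assumptions, the modification would thus
contribute a net charge of about $-1.76$ at pH~7. Whenever the pH
equals the pKa of a group, the exponent is zero and the group
contributes exactly half a charge ($q = -0.5$ for the groups above).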

\begin{figure}
  \begin{center}
    \includegraphics[width=0.66\textwidth]
    {figures/xpertedit-net-charge-pka-ph-pi.png}
  \end{center}
  \caption[Acid-base computations: net charges]{\textbf{Acid-base
      computations: net charges.} This figure shows the options that
    can be set for the calculation of the charges borne by the
    polymer sequence.}
  \label{fig:xpertedit-net-charge-pka-ph-pi}
\end{figure}

This figure shows that the user can calculate the charges (positive,
negative and net) borne by the polymer sequence (either the whole
sequence or the current selection) by setting the \guilabel{pH} value
at which the computation should take place. It is also possible to
calculate the isoelectric point by clicking the
\guilabel{Isoelectric Point} button.

Note that the computations might involve the permanent left/right
modifications of the polymer sequence, as well as the monomer chemical
modifications. To configure the way net charge (or pI) calculations
are performed, use the calculations engine configuration of the
sequence editor window.
\index{\xpe|)}


\renewcommand{\sectitle}{General Options}
\section*{\textcolor{sectioningcolor}{\sectitle}}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

One of the options that users value most is the ability to set the
number of decimal places used to display numbers. The settings apply
separately to each kind of entity for which numerical values are
displayed.
The following are the
default (and recommended) values:

\begin{itemize}

\item Atoms and all related entities (isotopic masses, isotopic
  abundances): 10;

\item pKa, pH, pI: 2;

\item Oligomers (obtained \textit{via} mass searches, polymer
  cleavages, oligomer fragmentations): 5;

\item Polymers: 3.

\end{itemize}

\noindent Note that modified values take effect immediately, without
needing to restart the program. However, data that are already
displayed are not refreshed: only triggering a new cleavage or a new
fragmentation will update the data display according to the new
options. These options are stored on disk and persist across sessions.

\cleardoublepage


%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "polyxmass"
%%% End: 