Function mode for Search & replace in the Editor
================================================

The :guilabel:`Search & replace` tool in the editor supports a *function mode*.
In this mode, you can combine regular expressions (see :doc:`regexp`) with
arbitrarily powerful Python functions to do all sorts of advanced text
processing.

In the standard *regexp* mode for search and replace, you specify both a
regular expression to search for and a template that is used to replace
all found matches. In function mode, instead of using a fixed template, you
specify an arbitrary function, in the
`Python programming language <https://docs.python.org>`_. This allows
you to do lots of things that are not possible with simple templates.

Techniques for using function mode and the syntax will be described by means of
examples, showing you how to create functions to perform progressively more
complex tasks.


.. image:: images/function_replace.png
    :alt: The Function mode
    :align: center

Automatically fixing the case of headings in the document
---------------------------------------------------------

Here, we will leverage one of the built-in functions in the editor to
automatically change the case of all text inside heading tags to title case::

    Find expression: <([Hh][1-6])[^>]*>.+?</\1>

For the function, simply choose the :guilabel:`Title-case text (ignore tags)` built-in
function. This will change titles that look like ``<h1>some TITLE</h1>`` to
``<h1>Some Title</h1>``. It will work even if there are other HTML tags inside
the heading tags.


Your first custom function - smartening hyphens
-----------------------------------------------

The real power of function mode comes from being able to create your own
functions to process text in arbitrary ways.
The Smarten Punctuation tool in
the editor leaves individual hyphens alone, so you can use this function to
replace them with em-dashes.

To create a new function, click the :guilabel:`Create/edit` button and copy
in the Python code below.

.. code-block:: python

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        return match.group().replace('--', '—').replace('-', '—')

Every :guilabel:`Search & replace` custom function must have a unique name and consist of a
Python function named ``replace`` that accepts all the arguments shown above.
For the moment, we won't worry about all the different arguments to
the ``replace()`` function. Just focus on the ``match`` argument. It represents a
match when running a search and replace. Its full documentation is available
`here <https://docs.python.org/library/re.html#match-objects>`_.
``match.group()`` simply returns all the matched text and all we do is replace
hyphens in that text with em-dashes, first replacing double hyphens and
then single hyphens.

Use this function with the find regular expression::

    >[^<>]+<

It will then replace all hyphens with em-dashes, but only in actual text and not
inside HTML tag definitions.


The power of function mode - using a spelling dictionary to fix mis-hyphenated words
------------------------------------------------------------------------------------

Often, e-books created from scans of printed books contain mis-hyphenated words:
words that were split at the end of the line on the printed page. We will
write a simple function to automatically find and fix such words.

.. code-block:: python

    import regex
    from calibre import replace_entities, prepare_string_for_xml

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):

        def replace_word(wmatch):
            # Try to remove the hyphen and replace the words if the resulting
            # hyphen-free word is recognized by the dictionary
            without_hyphen = wmatch.group(1) + wmatch.group(2)
            if dictionaries.recognized(without_hyphen):
                return without_hyphen
            return wmatch.group()

        # Search for words split by a hyphen
        text = replace_entities(match.group()[1:-1])  # Handle HTML entities like &amp;
        corrected = regex.sub(r'(\w+)\s*-\s*(\w+)', replace_word, text, flags=regex.VERSION1 | regex.UNICODE)
        return '>%s<' % prepare_string_for_xml(corrected)  # Put back required entities

Use this function with the same find expression as before, namely::

    >[^<>]+<

It will then fix every mis-hyphenated word in the text of the book whose
hyphen-free form the dictionary recognizes. The main trick is to use one of
the useful extra arguments to the replace function, ``dictionaries``. This
refers to the dictionaries the editor itself uses to spell check text in the
book. The function looks for words separated by a hyphen, removes the hyphen
and checks whether the dictionary recognizes the composite word. If it does,
the original words are replaced by the hyphen-free composite word.

Note that one limitation of this technique is that it will only work for
monolingual books, because, by default, ``dictionaries.recognized()`` uses the
main language of the book.


Auto numbering sections
-----------------------

Now we will see something a little different. Suppose your HTML file has many
sections, each with a heading in an :code:`<h2>` tag that looks like
:code:`<h2>Some text</h2>`.
You can create a custom function that will
automatically number these headings with consecutive section numbers, so that
they look like :code:`<h2>1. Some text</h2>`.

.. code-block:: python

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        section_number = '%d. ' % number
        return match.group(1) + section_number + match.group(2)

    # Ensure that when running over multiple files, the files are processed
    # in the order in which they appear in the book
    replace.file_order = 'spine'

Use it with the find expression::

    (?s)(<h2[^<>]*>)(.+?</h2>)

Place the cursor at the top of the file and click :guilabel:`Replace all`.

This function uses another of the useful extra arguments to ``replace()``: the
``number`` argument. When doing a :guilabel:`Replace all`, ``number`` is
automatically incremented for every successive match.

Another new feature is the use of ``replace.file_order``: setting it to
``'spine'`` means that if this search is run on multiple HTML files, the files
are processed in the order in which they appear in the book. See
:ref:`file_order_replace_all` for details.


Auto create a Table of Contents
-------------------------------

Finally, let's try something a little more ambitious. Suppose your book has
headings in ``h1`` and ``h2`` tags that look like
``<h1 id="someid">Some Text</h1>``. We will auto-generate an HTML Table of
Contents based on these headings. Create the custom function below:

.. code-block:: python

    from calibre import replace_entities
    from calibre.ebooks.oeb.polish.toc import TOC, toc_to_html
    from calibre.gui2.tweak_book import current_container
    from calibre.ebooks.oeb.base import xml2str

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        if match is None:
            # All matches found, output the resulting Table of Contents.
            # The argument metadata is the metadata of the book being edited
            if 'toc' in data:
                toc = data['toc']
                root = TOC()
                for (entry_file, tag_name, anchor, text) in toc:
                    # Nest h2 entries under the most recent top-level entry
                    parent = root.children[-1] if tag_name == 'h2' and root.children else root
                    parent.add(text, entry_file, anchor)
                toc = toc_to_html(root, current_container(), 'toc.html', 'Table of Contents for ' + metadata.title, metadata.language)
                print(xml2str(toc))
            else:
                print('No headings found to build a ToC from')
        else:
            # Add an entry corresponding to this match to the Table of Contents
            if 'toc' not in data:
                # The entries are stored in the data object, which will persist
                # for all invocations of this function during a 'Replace all' operation
                data['toc'] = []
            tag_name, anchor, text = match.group(1), replace_entities(match.group(2)), replace_entities(match.group(3))
            data['toc'].append((file_name, tag_name, anchor, text))
            return match.group()  # We don't want to make any actual changes, so return the original matched text

    # Ensure that we are called once after the last match is found so we can
    # output the ToC
    replace.call_after_last_match = True
    # Ensure that when this function is run over multiple files, the files are
    # processed in the order in which they appear in the book
    replace.file_order = 'spine'

Use it with the find expression::

    <(h[12])[^<>]*\sid=['"]([^'"]+)['"][^<>]*>([^<>]+)

Run the search on :guilabel:`All text files`, and at the end of the search,
a window titled "Debug output from your function" will pop up, containing the
HTML Table of Contents, ready to be pasted into :file:`toc.html`.

The function above is heavily commented, so it should be easy to follow. The
key new feature is the use of another useful extra argument to the
``replace()`` function, the ``data`` object. The ``data`` object is a Python
*dict* that persists between all successive invocations of ``replace()`` during
a single :guilabel:`Replace all` operation.

Another new feature is the use of ``call_after_last_match``: setting it to
``True`` on the ``replace()`` function means that the editor will call
``replace()`` one extra time after all matches have been found. For this extra
call, the match object will be ``None``.

This was just a demonstration of the power of function mode. If you really
needed to generate a Table of Contents from headings in your book, you would be
better off using the dedicated Table of Contents tool in
:guilabel:`Tools->Table of Contents`.

The API for the function mode
-----------------------------

All function mode functions must be Python functions named ``replace``, with the
following signature::

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        return a_string

When a find/replace is run, the ``replace()`` function is called for every
match that is found, and it must return the replacement string for that match.
If no replacements are to be done, it should return ``match.group()``, which is
the original string. The various arguments to the ``replace()`` function are
documented below.

The ``match`` argument
^^^^^^^^^^^^^^^^^^^^^^

The ``match`` argument represents the currently found match. It is a
`Python Match object <https://docs.python.org/library/re.html#match-objects>`_.
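
Because ``match`` is an ordinary Python match object, you can try out the logic
of a ``replace()`` function without the editor by calling it from ``re.sub()``.
The following is a minimal sketch under stated assumptions, not how calibre
itself runs your function: ``run_replace_all()`` is a hypothetical helper that
only imitates the behaviour described in this document (``number`` starting at
1, one ``data`` dict shared by all calls, and an extra call with ``None`` when
``call_after_last_match`` is set), and it passes ``None`` for ``metadata``,
``dictionaries`` and ``functions``, which only exist inside the editor. The
``replace()`` used here reuses the logic of the auto numbering example.

.. code-block:: python

    import re


    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        if match is None:
            # Extra call made after the last match (call_after_last_match)
            print('Numbered %d headings' % data.get('count', 0))
            return None
        # data is shared by every call, so it can carry state between matches
        data['count'] = number
        # Same logic as the auto numbering example: prefix each <h2> heading
        return match.group(1) + '%d. ' % number + match.group(2)


    replace.call_after_last_match = True


    def run_replace_all(func, pattern, text, file_name='text/chapter1.html'):
        # Hypothetical helper, not part of calibre: imitates Replace all by
        # numbering matches from 1 and sharing a single data dict between calls.
        # metadata, dictionaries and functions are passed as None.
        data = {}
        state = {'number': 0}

        def substitute(match):
            state['number'] += 1
            return func(match, state['number'], file_name, None, None, data, None)

        result = re.sub(pattern, substitute, text)
        if getattr(func, 'call_after_last_match', False):
            # The number passed on this extra call is an arbitrary choice here
            func(None, state['number'], file_name, None, None, data, None)
        return result


    html = '<h2>Intro</h2><p>Text</p><h2>Method</h2>'
    print(run_replace_all(replace, r'(?s)(<h2[^<>]*>)(.+?</h2>)', html))

Running this prints ``Numbered 2 headings`` from the extra final call, followed
by ``<h2>1. Intro</h2><p>Text</p><h2>2. Method</h2>``. A function that relies
on ``metadata``, ``dictionaries`` or ``functions`` would need stand-ins for
those arguments before it could be exercised this way.
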

The most useful method of the match object is ``group()``: called with no
arguments it returns all the matched text, and called with a group number it
returns the text matched by that capture group in the search regular
expression.

The ``number`` argument
^^^^^^^^^^^^^^^^^^^^^^^

The ``number`` argument is the number of the current match. When you run
:guilabel:`Replace all`, every successive match will cause ``replace()`` to be
called with an increasing number. The first match has number 1.

The ``file_name`` argument
^^^^^^^^^^^^^^^^^^^^^^^^^^

This is the filename of the file in which the current match was found. When
searching inside marked text, the ``file_name`` is empty. The ``file_name`` is
in canonical form, a path relative to the root of the book, using ``/`` as the
path separator.

The ``metadata`` argument
^^^^^^^^^^^^^^^^^^^^^^^^^

This represents the metadata of the current book, such as title, authors,
language, etc. It is an object of class :class:`calibre.ebooks.metadata.book.base.Metadata`.
Useful attributes include ``title``, ``authors`` (a list of authors) and
``language`` (the language code).

The ``dictionaries`` argument
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This represents the collection of dictionaries used for spell checking the
current book. Its most useful method is ``dictionaries.recognized(word)``,
which returns ``True`` if the passed-in word is recognized by the dictionary
for the current book's language.

The ``data`` argument
^^^^^^^^^^^^^^^^^^^^^

This is a simple Python ``dict``. When you run
:guilabel:`Replace all`, every successive match will cause ``replace()`` to be
called with the same ``dict`` as ``data``. You can thus use it to store arbitrary
data between invocations of ``replace()`` during a :guilabel:`Replace all`
operation.

The ``functions`` argument
^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``functions`` argument gives you access to all other user-defined
functions.
This is useful for code re-use. You can define utility functions in
one place and re-use them in all your other functions. For example, suppose you
create a function named ``My Function`` like this:

.. code-block:: python

    def utility():
        # do something
        pass

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        ...

Then, in another function, you can access the ``utility()`` function like this:

.. code-block:: python

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        utility = functions['My Function']['utility']
        ...

You can also use the ``functions`` object to store persistent data that can be
re-used by other functions. For example, you could have one function that, when
run with :guilabel:`Replace all`, collects some data, and another function that
uses it when it is run afterwards. Consider the following two functions:

.. code-block:: python

    # Function One
    persistent_data = {}

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        ...
        persistent_data['something'] = 'some data'
        return match.group()

    # Function Two
    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        persistent_data = functions['Function One']['persistent_data']
        ...

Debugging your functions
^^^^^^^^^^^^^^^^^^^^^^^^

You can debug the functions you create by using the standard ``print()``
function from Python. The output of ``print()`` will be displayed in a popup
window after the Find/replace has completed. You saw an example of using
``print()`` to output an entire table of contents above.

.. _file_order_replace_all:

Choose file order when running on multiple HTML files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When you run a :guilabel:`Replace all` on multiple HTML files, the order in
which the files are processed depends on what files you have open for editing.
You can force the search to process files in the order in which they appear in
the book by setting the ``file_order`` attribute on your function, like this:

.. code-block:: python

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        ...

    replace.file_order = 'spine'

``file_order`` accepts two values, ``spine`` and ``spine-reverse``, which cause
the search to process multiple files in the order they appear in the book,
either forwards or backwards, respectively.

Having your function called an extra time after the last match is found
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sometimes, as in the auto generate table of contents example above, it is
useful to have your function called an extra time after the last match is
found. You can do this by setting the ``call_after_last_match`` attribute on your
function, like this:

.. code-block:: python

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        ...

    replace.call_after_last_match = True


Appending the output from the function to marked text
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When running search and replace on marked text, it is sometimes useful to
append some text to the end of the marked text. You can do that by setting
the ``append_final_output_to_marked`` attribute on your function (note that you
also need to set ``call_after_last_match``), like this:

.. code-block:: python

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        ...
        return 'some text to append'

    replace.call_after_last_match = True
    replace.append_final_output_to_marked = True

Suppressing the result dialog when performing searches on marked text
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can also suppress the result dialog (which can slow down the repeated
application of a search/replace on many blocks of text) by setting
the ``suppress_result_dialog`` attribute on your function, like this:

.. code-block:: python

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        ...

    replace.suppress_result_dialog = True


More examples
-------------

More useful examples, contributed by calibre users, can be found in the
`calibre E-book editor forum <https://www.mobileread.com/forums/showthread.php?t=237181>`_.
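
To experiment with the logic of the spelling dictionary example outside the
editor, you can substitute a stand-in for calibre's ``dictionaries`` object.
The sketch below rests on several assumptions: ``StubDictionaries`` is a
hypothetical class that only recognizes a fixed list of words, the standard
``re`` module takes the place of the ``regex`` module, and the
``replace_entities()`` and ``prepare_string_for_xml()`` calls are left out, so
it only handles text without HTML entities.

.. code-block:: python

    import re


    class StubDictionaries:
        # Hypothetical stand-in for the editor's dictionaries object: it only
        # recognizes the words it was given, ignoring case
        def __init__(self, words):
            self.words = {w.lower() for w in words}

        def recognized(self, word):
            return word.lower() in self.words


    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        def replace_word(wmatch):
            # Join the two halves and keep the result only if the dictionary knows it
            without_hyphen = wmatch.group(1) + wmatch.group(2)
            if dictionaries.recognized(without_hyphen):
                return without_hyphen
            return wmatch.group()

        # Strip the > and < that the find expression >[^<>]+< matches around the text
        text = match.group()[1:-1]
        corrected = re.sub(r'(\w+)\s*-\s*(\w+)', replace_word, text)
        return '>%s<' % corrected


    dictionaries = StubDictionaries(['example', 'together'])
    html = '<p>An exam- ple of words put to-gether, and a well-known phrase.</p>'
    # Call replace() for every match of the find expression, as a Replace all would
    fixed = re.sub(r'>[^<>]+<', lambda m: replace(m, 1, None, None, dictionaries, {}, None), html)
    print(fixed)

Running it prints
``<p>An example of words put together, and a well-known phrase.</p>``: the two
split words are joined because the stub recognizes them, while ``well-known``
keeps its hyphen because ``wellknown`` is not in the word list.
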