.. include:: global.inc

.. role:: raw-html(raw)
   :format: html

:raw-html:`<style> .red {color:red} </style>`

.. role:: red


########################################
Latest Changes
########################################

Major Features added to Ruffus

.. note::

    See :ref:`To do list <todo>` for future enhancements to Ruffus


********************************************************************
version 2.8.1
********************************************************************

* compatibility with gevent >= 1.2

********************************************************************
version 2.8.1
********************************************************************

* Ctrl-C will kill drmaa jobs
* python3.7 compatibility, thanks to @jbarlow83, @QuLogic

********************************************************************
version 2.6.3
********************************************************************

    21st April 2015

=====================================================================================================================
Bug fixes and minor enhancements
=====================================================================================================================

    * `@transform(..., suffix("xxx"),` :red:`output_dir` `= "/new/output/path")` works even when the output has more than one file `(github) <https://github.com/bunbun/ruffus/issues/43>`__
    * `@subdivide( ..., suffix("xxx"),` :red:`output_dir` `= "/new/output/path")` works in exactly the same way as `@transform(..., output_dir="xxx")` `(github) <https://github.com/bunbun/ruffus/issues/42>`__
    * `ruffus.drmaa_wrapper.run_job()` works with python3 `(github) <https://github.com/bunbun/ruffus/issues/46>`__
      Fixed issue with byte and text streams.
    * `ruffus.drmaa_wrapper.run_job()` allows env (environment) to be set for jobs run locally as well as those on the cluster `(github) <https://github.com/bunbun/ruffus/issues/44>`__
    * New object-orientated style syntax works seamlessly with Ruffus command line support `ruffus.cmdline.run` `(github) <https://github.com/bunbun/ruffus/issues/48>`__.




********************************************************************
version 2.6.2
********************************************************************

    12th March 2015

=====================================================================================================================
1) Bug fixes
=====================================================================================================================

    * ``pipeline_printout_graph()`` incompatibility with python3 fixed
    * checkpointing did not work correctly with :ref:`@split(...) <decorators.split>` and :ref:`@subdivide(...) <decorators.subdivide>`


=====================================================================================================================
2) `@transform(..., suffix("xxx"),` :red:`output_dir` `= "/new/output/path")`
=====================================================================================================================

    Thanks to the suggestion of Milan Simonovic.

    :ref:`@transform(..., suffix(...) ) <decorators.transform>` has easy to understand syntax and takes care of all the common use cases
    of Ruffus.

    However, when we need to place the output in a different directory, we suddenly have to plunge into the deep end and parse file paths using
    :ref:`regex() <decorators.regex>` or :ref:`formatter() <new_manual.formatter>`.

    Now, :ref:`@transform <decorators.transform>` takes an optional ``output_dir`` named parameter so that we can continue to use :ref:`suffix() <new_manual.suffix>` even when the output needs
    to go into a new directory.

    .. <<Python

    .. code-block:: python
        :emphasize-lines: 2,3,9

        #
        #   input/a.fasta -> output/a.sam
        #   input/b.fasta -> output/b.sam
        #
        starting_files = ["input/a.fasta","input/b.fasta"]
        @transform(starting_files,
                   suffix('.fasta'),
                   '.sam',
                   output_dir = "output")
        def map_dna_sequence(input_file, output_file) :
            pass

    ..
        Python

    See example ``test\test_suffix_output_dir.py``

=====================================================================================================================
3) Named parameters
=====================================================================================================================

    Decorators can take named parameters.

    These are self-documenting and improve clarity.

    Note that the usual Python rules for function parameters apply:

    * Positional arguments must precede named arguments
    * Named arguments cannot be used to fill in for "missing" positional arguments


    For example, the following two task definitions are equivalent:

    **Positional parameters:**

    .. <<Python

    .. code-block:: python

        @merge(prev_task, ["a.summary", "b.summary"], 14, "extra_info", {"a":45, "b":5})
        def merge_task(inputs, outputs, extra_num, extra_str, extra_dict):
            pass

    ..
        Python

    **Named parameters:**

    .. <<Python

    .. code-block:: python

        # new style is a bit clearer
        @merge(input   = prev_task,
               output  = ["a.summary", "b.summary"],
               extras  = [14, "extra_info", {"a":45, "b":5}]
               )
        def merge_task(inputs, outputs, extra_num, extra_str, extra_dict):
            pass

    ..
        Python

    .. warning::

        ``extras=`` takes all the *extras* parameters (``14, "extra_info", {"a":45, "b":5}``) as a single list

    * :ref:`@split(...) <decorators.split>` and :ref:`@merge(...) <decorators.merge>`
        * *input*
        * *output*
        * [*extras*\ ]
    * :ref:`@transform(...) <decorators.transform>` and :ref:`@mkdir(...) <decorators.mkdir>`
        * *input*
        * *filter*
        * [*replace_inputs* or *add_inputs*\ ]
        * *output*
        * [*extras*\ ]
        * [*output_dir*\ ]
    * :ref:`@collate(...) <decorators.collate>` and :ref:`@subdivide(...) <decorators.subdivide>`
        * *input*
        * *filter*
        * *output*
        * [*extras*\ ]
    * :ref:`@originate(...) <decorators.originate>`
        * *output*
        * [*extras*\ ]
    * :ref:`@product(...) <decorators.product>`, :ref:`@permutations(...) <decorators.permutations>`, :ref:`@combinations(...) <decorators.combinations>`, and :ref:`@combinations_with_replacement(...) <decorators.combinations_with_replacement>`
        * *input*
        * *filter*
        * [*input2...NNN*\ ] (only for ``product``)
        * [*filter2...NNN*\ ] (only for ``product``) where NNN is an incrementing number
        * *tuple_size* (except for ``product``)
        * [*replace_inputs* or *add_inputs*\ ]
        * *output*
        * [*extras*\ ]



=============================================
4) New object orientated syntax for Ruffus
=============================================

    Ruffus Pipelines can now be created directly using the new ``Pipeline`` and ``Task`` objects instead of via decorators.

    .. <<python

    .. code-block:: python
        :emphasize-lines: 9

        # make ruffus pipeline
        my_pipeline = Pipeline(name = "test")
        my_pipeline.transform(task_func  = map_dna_sequence,
                              input      = starting_files,
                              filter     = suffix('.fasta'),
                              output     = '.sam',
                              output_dir = "output")

        my_pipeline.run()

    ..
        python

    This new syntax is fully compatible and inter-operates with traditional Ruffus syntax using decorators.

    Apart from cosmetic changes, the new syntax allows different instances of modular Ruffus sub-pipelines
    to be defined separately, in different python modules, and then joined together flexibly at runtime.

    The new syntax and discussion are introduced :ref:`here <new_syntax>`.



********************************************************************
version 2.5
********************************************************************

    6th August 2014

============================================================================================================================================================
1) Python3 compatibility (but at least python 2.6 is now required)
============================================================================================================================================================

    Ruffus v2.5 is now python3 compatible. This has required surprisingly many changes to the codebase. Please report any bugs to me.

    .. note::

        **Ruffus now requires at least python 2.6**

        It proved to be impossible to support python 2.5 and python 3.x at the same time.

============================================================================================================================================================
2) Ctrl-C interrupts
============================================================================================================================================================

    Ruffus now mostly(!) terminates gracefully when interrupted by Ctrl-C.

    Please send me bug reports for when this doesn't work with a minimally reproducible case.

    This means that, in general, if an ``Exception`` is thrown during your pipeline but you don't want to wait for the rest of the jobs to complete, you can still press Ctrl-C at any point.
    Note that you may still need to clean up spawned processes, for example, using ``qdel`` if you are using ``ruffus.drmaa_wrapper``

============================================================================================================================================================
3) Customising flowcharts in pipeline_printout_graph() with ``@graphviz``
============================================================================================================================================================

    *Contributed by Sean Davis, with improved syntax via Jake Biesinger*

    The graphic for each task can have its own attributes (URL, shape, colour etc.) by adding
    `graphviz attributes <http://www.graphviz.org/doc/info/attrs.html>`__
    using the ``@graphviz`` decorator.

    * This allows HTML formatting in the task names (using the ``label`` parameter as in the following example).
      HTML labels **must** be enclosed in ``<`` and ``>``. E.g.

      .. code-block:: python

        label = "<Line <BR/> wrapped task_name()>"

    * You can also opt to keep the task name and wrap it with a prefix and suffix:

      .. code-block:: python

        label_suffix = "??? ", label_prefix = ": What is this?"

    * The ``URL`` attribute allows the generation of clickable svg, and also client / server
      side image maps usable in web pages.
      See `Graphviz documentation <http://www.graphviz.org/content/output-formats#dimap>`__


    Example:

    .. code-block:: python


        @graphviz(URL='"http://cnn.com"', fillcolor = '"#FFCCCC"',
                        color = '"#FF0000"', pencolor='"#FF0000"', fontcolor='"#4B6000"',
                        label_suffix = "???", label_prefix = "What is this?<BR/> ",
                        label = "<What <FONT COLOR=\"red\">is</FONT>this>",
                        shape= "component", height = 1.5, peripheries = 5,
                        style="dashed")
        def Up_to_date_task2(infile, outfile):
            pass

        #   Can use dictionary if you wish...
        graphviz_params = {"URL":"http://cnn.com", "fontcolor": '"#FF00FF"'}
        @graphviz(**graphviz_params)
        def myTask(input,output):
            pass

    .. **

    .. image:: images/history_html_flowchart.png
       :scale: 30


============================================================================================================================================================
4) Consistent verbosity levels
============================================================================================================================================================

    The verbosity levels are now more fine-grained and consistent between ``pipeline_printout`` and ``pipeline_run``.
    Note that at verbosity > 2, ``pipeline_run`` outputs lists of up-to-date tasks before running the pipeline.
    Many users who defaulted to using a verbosity of 3 may want to move up to ``verbose = 4``.

    * **level 0** : *Nothing*
    * **level 1** : *Out-of-date Task names*
    * **level 2** : *All Tasks (including any task function docstrings)*
    * **level 3** : *Out-of-date Jobs in Out-of-date Tasks, no explanation*
    * **level 4** : *Out-of-date Jobs in Out-of-date Tasks, with explanations and warnings*
    * **level 5** : *All Jobs in Out-of-date Tasks, (include only list of up-to-date tasks)*
    * **level 6** : *All jobs in All Tasks whether out of date or not*
    * **level 10**: *Logs messages useful only for debugging ruffus pipeline code*

    * Defaults to **level 4** for pipeline_printout: *Out of date jobs, with explanations and warnings*
    * Defaults to **level 1** for pipeline_run: *Out-of-date Task names*

============================================================================================================================================================
5) Allow abbreviated paths from ``pipeline_run`` or ``pipeline_printout``
============================================================================================================================================================

    .. note::

        Please contact me with suggestions if you find the abbreviations useful but "aesthetically challenged"!

    Some pipelines produce interminable lists of long filenames. It would be nice to be able to abbreviate these
    to just enough information to follow the progress.

    Ruffus now allows either

    1) Only the innermost N levels of nested directories to be included
    2) The message to be truncated to a specified number of characters (to fit on a line, for example)

    Note that the number of characters specified applies separately to the input and to the output parameters,
    not to the entire message. You may need to specify a smaller limit than you expect (e.g. ``60`` rather than ``80``)

    .. code-block:: python

        pipeline_printout(verbose_abbreviated_path = NNN)
        pipeline_run(verbose_abbreviated_path = -MMM)


    The ``verbose_abbreviated_path`` parameter restricts the length of input / output file paths to either

    * NNN levels of nested paths
    * A total of MMM characters, MMM is specified by setting ``verbose_abbreviated_path`` to -MMM (i.e. negative values)

    ``verbose_abbreviated_path`` defaults to ``2``


    For example:

    Given ``["aa/bb/cc/dddd.txt", "aaa/bbbb/cccc/eeed/eeee/ffff/gggg.txt"]``


    .. code-block:: python
        :emphasize-lines: 1,4,8,19

        # Original relative paths
        "[aa/bb/cc/dddd.txt, aaa/bbbb/cccc/eeed/eeee/ffff/gggg.txt]"

        # Full abspath
        verbose_abbreviated_path = 0
        "[/test/ruffus/src/aa/bb/cc/dddd.txt, /test/ruffus/src/aaa/bbbb/cccc/eeed/eeee/ffff/gggg.txt]"

        # Specified level of nested directories
        verbose_abbreviated_path = 1
        "[.../dddd.txt, .../gggg.txt]"

        verbose_abbreviated_path = 2
        "[.../cc/dddd.txt, .../ffff/gggg.txt]"

        verbose_abbreviated_path = 3
        "[.../bb/cc/dddd.txt, .../eeee/ffff/gggg.txt]"


        # Truncated to MMM characters
        verbose_abbreviated_path = -60
        "<???> /bb/cc/dddd.txt, aaa/bbbb/cccc/eeed/eeee/ffff/gggg.txt]"


    If you are using ``ruffus.cmdline``, the abbreviated path lengths can be specified on
    the command line as an extension to the verbosity:

    .. code-block:: bash
        :emphasize-lines: 4,7

        # verbosity of 4
        yourscript.py --verbose 4

        # display three levels of nested directories
        yourscript.py --verbose 4:3

        # restrict input and output parameters to 60 letters
        yourscript.py --verbose 4:-60


    The number after the colon is the abbreviated path length


============================================================================================================================================================
Other changes
============================================================================================================================================================

    * BUG FIX: Output producing wild cards was not saved in the checksum files!!!
    * BUG FIX: @mkdir bug under Windows. Thanks to Sean Turley. (Aargh! Different exceptions are thrown in Windows vs. Linux for the same condition!)
    * Added :ref:`pipeline_get_task_names(...) <pipeline_functions.pipeline_get_task_names>` which returns all task names as a list of strings. Thanks to Clare Sloggett


********************************************************************
version 2.4.1
********************************************************************

    26th April 2014

    * Breaking changes to drmaa API suggested by Bernie Pope to ensure portability across different drmaa implementations (SGE, SLURM etc.)

********************************************************************
version 2.4
********************************************************************

    4th April 2014

============================================================================================================================================================
Additions to ``ruffus`` namespace
============================================================================================================================================================

    * :ref:`formatter() <new_manual.formatter>` (:ref:`syntax <decorators.formatter>`)
    * :ref:`originate() <new_manual.originate>` (:ref:`syntax <decorators.originate>`)
    * :ref:`subdivide() <new_manual.subdivide>` (:ref:`syntax <decorators.subdivide>`)

============================================================================================================================================================
Installation: use pip
============================================================================================================================================================

    ::

        sudo pip install ruffus --upgrade

============================================================================================================================================================
1) Command Line support
============================================================================================================================================================

    The optional ``ruffus.cmdline`` module provides support for a set of common command
    line arguments which make writing *Ruffus* pipelines much more pleasant.
    See :ref:`manual <new_manual.cmdline>`

============================================================================================================================================================
2) Check pointing
============================================================================================================================================================

    * Contributed by **Jake Biesinger**
    * See :ref:`Manual <new_manual.checkpointing>`
    * Uses a fault resistant sqlite database file to log i/o files, and additional checksums
    * defaults to checking file timestamps stored in the current directory (``ruffus_utility.RUFFUS_HISTORY_FILE = '.ruffus_history.sqlite'``)
    * :ref:`pipeline_run(..., checksum_level = N, ...) <pipeline_functions.pipeline_run>`

       * level 0 = CHECKSUM_FILE_TIMESTAMPS      : Classic mode. Use only file timestamps (no checksum file will be created)
       * level 1 = CHECKSUM_HISTORY_TIMESTAMPS   : Also store timestamps in a database after successful job completion
       * level 2 = CHECKSUM_FUNCTIONS            : As above, plus a checksum of the pipeline function body
       * level 3 = CHECKSUM_FUNCTIONS_AND_PARAMS : As above, plus a checksum of the pipeline function default arguments and the additional arguments passed in by task decorators

       * defaults to level 1

    * Can speed up trivial tasks: Previously Ruffus always added an extra 1 second pause between tasks
      to guard against file systems (Ext3, FAT, some NFS) with low timestamp granularity.
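
    The idea behind ``checksum_level = 2`` (``CHECKSUM_FUNCTIONS``) can be illustrated with a minimal
    sketch that uses only the standard library. This is *not* how Ruffus computes its checksums:
    ``function_checksum``, ``align_v1`` and ``align_v2`` are hypothetical names chosen for illustration,
    and hashing the compiled bytecode is only one plausible way of detecting that a task function has been edited.

    .. code-block:: python

        import hashlib

        def function_checksum(func):
            # Hypothetical helper: hash the compiled bytecode plus the constants
            # of a function so that edits to its body change the digest.
            code = func.__code__
            payload = code.co_code + repr(code.co_consts).encode("utf-8")
            return hashlib.sha256(payload).hexdigest()

        def align_v1(input_file, output_file):
            return "bwa"

        def align_v2(input_file, output_file):
            return "bowtie"

        # The same function always hashes to the same value ...
        unchanged = function_checksum(align_v1) == function_checksum(align_v1)
        # ... whereas an edited body gives a different digest, which a
        # checkpointing scheme could treat as "this task is out of date".
        changed = function_checksum(align_v1) != function_checksum(align_v2)

    A scheme like this would store the digest next to the recorded timestamps and compare it on the next run.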
466 467 468============================================================================================================================================================ 4693) :ref:`subdivide() <new_manual.subdivide>` (:ref:`syntax <decorators.subdivide>`) 470============================================================================================================================================================ 471 472 * Take a list of input jobs (like :ref:`@transform <decorators.transform>`) but further splits each into multiple jobs, i.e. it is a **many->even more** relationship 473 * synonym for the deprecated ``@split(..., regex(), ...)`` 474 475======================================================================================================================================================================================================================================================================================================================== 4764) :ref:`mkdir() <new_manual.mkdir>` (:ref:`syntax <decorators.mkdir>`) with :ref:`formatter() <new_manual.formatter>`, :ref:`suffix() <decorators.suffix>` and :ref:`regex() <decorators.regex>` 477======================================================================================================================================================================================================================================================================================================================== 478 479 * allows directories to be created depending on runtime parameters or the output of previous tasks 480 * behaves just like :ref:`@transform <decorators.transform>` but with its own (internal) function which does the actual work of making a directory 481 * Previous behavior is retained:``mkdir`` continues to work seamlessly inside :ref:`@follows <decorators.follows>` 482 
483============================================================================================================================================================ 4845) :ref:`originate() <new_manual.originate>` (:ref:`syntax <decorators.originate>`) 485============================================================================================================================================================ 486 487 * Generates output files without dependencies from scratch (*ex nihilo*!) 488 * For first step in a pipeline 489 * Task function obviously only takes output and not input parameters. (There *are* no inputs!) 490 * synonym for :ref:`@split(None,...) <decorators.split>` 491 * See :ref:`Summary <decorators.originate>` / :ref:`Manual <new_manual.originate>` 492 493======================================================================================================================================================================================================================================================================================================================== 4946) New flexible :ref:`formatter() <new_manual.formatter>` (:ref:`syntax <decorators.formatter>`) alternative to :ref:`regex() <decorators.regex>` & :ref:`suffix() <decorators.suffix>` 495======================================================================================================================================================================================================================================================================================================================== 496 497 * Easy manipulation of path subcomponents in the style of `os.path.split() <http://docs.python.org/2/library/os.path.html#os.path.split>`__ 498 * Regular expressions are no longer necessary for path manipulation 499 * Familiar python syntax 500 * Optional regular expression matches 501 * Can refer to any in the list of N input files (not only the first file as for ``regex(...)``) 502 * Can even 
refer to individual letters within a match 503 504============================================================================================================================================================ 5057) Combinatorics (all vs. all decorators) 506============================================================================================================================================================ 507 508 * :ref:`@product <new_manual.product>` (See `itertools.product <http://docs.python.org/2/library/itertools.html#itertools.product>`__) 509 * :ref:`@permutations <new_manual.permutations>` (See `itertools.permutations <http://docs.python.org/2/library/itertools.html#itertools.permutations>`__) 510 * :ref:`@combinations <new_manual.combinations>` (See `itertools.combinations <http://docs.python.org/2/library/itertools.html#itertools.combinations>`__) 511 * :ref:`@combinations_with_replacement <new_manual.combinations_with_replacement>` (See `itertools.combinations_with_replacement <http://docs.python.org/2/library/itertools.html#itertools.combinations_with_replacement>`__) 512 * in optional :ref:`combinatorics <new_manual.combinatorics>` module 513 * Only :ref:`formatter() <new_manual.formatter>` provides the necessary flexibility to construct the output. (:ref:`suffix() <decorators.suffix>` and :ref:`regex() <decorators.regex>` are not supported.) 
    * See :ref:`Summary <decorators.combinatorics>` / :ref:`Manual <new_manual.combinatorics>`



============================================================================================================================================================
8) drmaa support and multithreading:
============================================================================================================================================================

    * :ref:`ruffus.drmaa_wrapper.run_job() <new_manual.ruffus.drmaa_wrapper.run_job>` (:ref:`syntax <drmaa_wrapper.run_job>`)
    * Optional helper module allows jobs to dispatch work to a computational cluster and wait until it completes.
    * Requires ``multithread`` rather than ``multiprocess``

============================================================================================================================================================
9) ``pipeline_run(...)`` and exceptions
============================================================================================================================================================

    See :ref:`Manual <new_manual.exceptions>`

    * Optionally terminate pipeline after first exception
    * Display exceptions without delay


============================================================================================================================================================
10) Miscellaneous
============================================================================================================================================================

    Better error messages for ``formatter()``, ``suffix()`` and ``regex()`` for ``pipeline_printout(..., verbose >= 3, ...)``

    * Error messages show the mismatching regular expression and the offending file name
    * Wrong capture group names or out of range indices will raise an informative exception

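
    The combinatorics decorators in section 7 above are each paired with the ``itertools`` function of the same name.
    The following sketch uses ``itertools`` directly (not Ruffus) to show which groupings of inputs each name refers to.
    The file names are hypothetical, and how Ruffus turns each tuple into a job is not shown here.

    .. code-block:: python

        import itertools

        # Hypothetical input files, for illustration only
        samples = ["a.fastq", "b.fastq", "c.fastq"]
        references = ["hg19.fa", "mm10.fa"]

        # Unordered pairs of different files
        pairs = list(itertools.combinations(samples, 2))

        # Ordered pairs of different files: (a, b) and (b, a) are both included
        ordered_pairs = list(itertools.permutations(samples, 2))

        # Unordered pairs in which a file may also be paired with itself
        pairs_with_self = list(itertools.combinations_with_replacement(samples, 2))

        # Every file in one list against every file in another list
        all_vs_all = list(itertools.product(samples, references))

    With three samples this gives 3 combinations, 6 permutations and 6 combinations with replacement; the product with two references has 6 entries.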
********************************************************************
version 2.3
********************************************************************

    1st September, 2013

    * ``@active_if`` turns off tasks at runtime
        The design and initial implementation were contributed by Jacob Biesinger

        Takes one or more parameters which can be either booleans or functions or callable objects which return True / False::

            run_if_true_1 = True
            run_if_true_2 = False

            @active_if(run_if_true_1, lambda: run_if_true_2)
            def this_task_might_be_inactive():
                pass

        The expressions inside @active_if are evaluated each time
        ``pipeline_run``, ``pipeline_printout`` or ``pipeline_printout_graph`` is called.

        Dormant tasks behave as if they are up to date and have no output.

    * Command line parsing
        * Supports both argparse (python 2.7) and optparse (python 2.6):
        * ``ruffus.cmdline`` module is optional.
        * See :ref:`manual <new_manual.cmdline>`
    * Optionally terminate pipeline after first exception
        To have all exceptions interrupt immediately::

                pipeline_run(..., exceptions_terminate_immediately = True)

        By default ruffus accumulates ``NN`` errors before interrupting the pipeline prematurely. ``NN`` is the specified parallelism for ``pipeline_run(..., multiprocess = NN)``.

        Otherwise, a pipeline will only be interrupted immediately if exceptions of type ``ruffus.JobSignalledBreak`` are thrown.

    * Display exceptions without delay

        By default, Ruffus re-throws exceptions in ensemble after pipeline termination.

        To see exceptions as they occur::

                pipeline_run(..., log_exceptions = True)

        ``logger.error(...)`` will be invoked with the string representation of each exception, and associated stack trace.

        The default logger prints to sys.stderr, but this can be changed to any class from the logging module or compatible object via ``pipeline_run(..., logger = ???)``

    * Improved ``pipeline_printout()``

            * ``@split`` operations now show the 1->many output in pipeline_printout

                This makes it clearer that ``@split`` is creating multiple output parameters (rather than a single output parameter consisting of a list)::

                        Task = split_animals
                            Job = [None
                                  -> cows
                                  -> horses
                                  -> pigs
                                    , any_extra_parameters]

            * File date and time are displayed in human readable form and out of date files are flagged with asterisks.



********************************************************************
version 2.2
********************************************************************

    22nd July, 2010

    * Simplifying **@transform** syntax with **suffix(...)**

        Regular expressions within ruffus are very powerful, and can allow files to be moved
        from one directory to another and renamed at will.

        However, using consistent file extensions and
        ``@transform(..., suffix(...))`` makes the code much simpler and easier to read.

        Previously, ``suffix(...)`` did not cooperate well with ``inputs(...)``.
        For example, finding the corresponding header file (".h") for the matching input
        required a complicated ``regex(...)`` regular expression and ``inputs(...)``. This simple case,
        e.g. matching "something.c" with "something.h", is now much easier in Ruffus.


        For example:
        ::

            source_files = ["something.c", "more_code.c"]
            @transform(source_files, suffix(".c"), add_inputs(r"\1.h", "common.h"), ".o")
            def compile(input_files, output_file):
                ( source_file,
                  header_file,
                  common_header) = input_files
                # call compiler to make object file

        This is equivalent to calling:

        ::

            compile(["something.c", "something.h", "common.h"], "something.o")
            compile(["more_code.c", "more_code.h", "common.h"], "more_code.o")

        The ``\1`` matches everything *but* the suffix and will be applied to both ``glob``\ s and file names.

        For simplicity and compatibility with previous versions, there is always an implied ``r"\1"`` before
        the output parameters. I.e. output parameter strings are *always* substituted.


    * Tasks and glob in **inputs(...)** and **add_inputs(...)**

        ``glob``\ s and tasks can be added as the prerequisites / input files using
        ``inputs(...)`` and ``add_inputs(...)``. ``glob`` expansions will take place when the task
        is run.

    * Advanced form of **@split** with **regex**:

        The standard ``@split`` divided one set of inputs into multiple outputs (the number of which
        can be determined at runtime).

        This is a ``one->many`` operation.


        An advanced form of ``@split`` has been added which can split each of several files further.

        In other words, this is a ``many->"many more"`` operation.

        For example, given three starting files:
        ::

            original_files = ["original_0.file",
                              "original_1.file",
                              "original_2.file"]

        We can split each into its own set of sub-sections:
        ::

            @split(original_files,
                   regex(r"original_(\d+).file"),   # match starting files
                   r"files.split.\1.*.fa",          # glob pattern
                   r"\1")                           # index of original file
            def split_files(input_file, output_files, original_index):
                """
                    Code to split each input_file
                        "original_0.file" -> "files.split.0.*.fa"
                        "original_1.file" -> "files.split.1.*.fa"
                        "original_2.file" -> "files.split.2.*.fa"
                """


        This is, conceptually, the reverse of the ``@collate(...)`` decorator.

    * Ruffus will complain about unescaped regular expression special characters:

        Ruffus uses "\\1" and "\\2" in regular expression substitutions. Even seasoned python
        users may not remember that these have to be 'escaped' in strings. The best option is
        to use 'raw' python strings e.g.

        ::

            r"\1_substitutes\2correctly\3four\4times"

        Ruffus will throw an exception if it sees an unescaped "\\1" or "\\2" in a file name,
        which should catch most of these bugs.

    * Prettier output from *pipeline_printout_graph*

        Changed to nicer colours, symbols etc. for a more professional look.
        ``@split`` and ``@merge`` tasks now look different from ``@transform``.
        Colours, size and resolution are now fully customisable::

            pipeline_printout_graph( #...
                                     user_colour_scheme = {
                                                            "colour_scheme_index": 1,
                                                            "Task to run"  : {"fillcolor": "blue"}},
                                     pipeline_name = "My flowchart",
                                     size          = (11,8),
                                     dpi           = 120)

        An SVG bug in firefox has been worked around so that font sizes are displayed correctly.
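
The escaping problem described above for "\\1" substitutions can be reproduced with nothing but the standard ``re`` module. The following is a minimal sketch and not Ruffus code: ``header_for`` is a hypothetical helper introduced only for illustration, and the pattern it uses is an assumption chosen to mimic a ``.c`` to ``.h`` substitution.

```python
import re

# In an ordinary string literal, "\1" is an octal escape for the control
# character chr(1), not a backslash followed by the digit 1.
unescaped = "\1.h"
# In a raw string the backslash survives, so re sees a group reference.
raw = r"\1.h"


def header_for(source_file, replacement):
    """Hypothetical helper: rewrite a name ending in ".c" using `replacement`."""
    return re.sub(r"(.+)\.c$", replacement, source_file)


print(repr(unescaped))                              # '\x01.h'
print(repr(raw))                                    # '\\1.h'
print(header_for("something.c", raw))               # something.h
print(repr(header_for("something.c", unescaped)))   # '\x01.h'
```

With the raw string, the first captured group is substituted as intended. With the ordinary string, the whole file name is silently replaced by a control character followed by ``.h``, which is the kind of mistake the exception described above is meant to catch.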




********************************************************************
version 2.1.1
********************************************************************
    * **@transform(..., add_inputs(...))**
        ``add_inputs(...)`` allows the addition of extra input dependencies / parameters for each job.

        Unlike ``inputs(...)``, the original input parameter is retained:
        ::

            from ruffus import *
            @transform(["a.input", "b.input"], suffix(".input"), add_inputs("just.1.more","just.2.more"), ".output")
            def task(i, o):
                ""

        Produces:
        ::

            Job = [[a.input, just.1.more, just.2.more] ->a.output]
            Job = [[b.input, just.1.more, just.2.more] ->b.output]


        Like ``inputs``, ``add_inputs`` accepts strings, tasks and ``glob``\ s.
        This minor syntactic change promises to add much clarity to Ruffus code.
        ``add_inputs()`` is available for ``@transform``, ``@collate`` and ``@split``.


********************************************************************
version 2.1.0
********************************************************************
    * **@jobs_limit**
        Some tasks are resource intensive and too many jobs should not be run at the
        same time. Examples include disk intensive operations such as unzipping, or
        downloading from FTP sites.

        Adding::

            @jobs_limit(4)
            @transform(new_data_list, suffix(".big_data.gz"), ".big_data")
            def unzip(i, o):
                "unzip code goes here"

        would limit the unzip operation to 4 jobs at a time, even if the rest of the
        pipeline runs highly in parallel.

        (Thanks to Rob Young for suggesting this.)

********************************************************************
version 2.0.10
********************************************************************
    * **touch_files_only** option for **pipeline_run**

        When the pipeline runs, task functions will not be run.
        Instead, the output files for
        each job (in each task) will be ``touch``\ -ed if necessary.
        This can be useful for simulating a pipeline run so that all files look as
        if they are up-to-date.

        Caveats:

        * This may not work correctly where output files are only determined at runtime, e.g. with **@split**
        * Only the output from pipelined jobs which are currently out-of-date will be ``touch``\ -ed.
          In other words, the pipeline runs *as normal*, the only difference is that the
          output files are ``touch``\ -ed instead of being created by the python task functions
          which would otherwise have been called.

    * Parameter substitution for **inputs(...)**

        The **inputs(...)** parameter in **@transform** and **@collate** can now take tasks and ``glob``\ s,
        and these will be expanded appropriately (after regular expression replacement).

        For example::

            @transform("dir/a.input", regex(r"(.*)\/(.+).input"),
                          inputs((r"\1/\2.other", r"\1/*.more")), r"elsewhere/\2.output")
            def task1(i, o):
                """
                Some pipeline task
                """

        Is equivalent to calling::

            task1(("dir/a.other", "dir/1.more", "dir/2.more"), "elsewhere/a.output")

        \

        Here::

            r"\1/*.more"

        is first converted to::

            r"dir/*.more"

        which matches::

            "dir/1.more"
            "dir/2.more"


********************************************************************
version 2.0.9
********************************************************************

    * Better display of logging output
    * Advanced form of **@split**
        This is an experimental feature.

        Hitherto, **@split** took only one set of inputs (tasks/files/``glob``\ s) and split these
        into an indeterminate number of outputs.

        This is a one->many operation.

        Sometimes it is desirable to take multiple input files, and split each of them further.

        This is a many->many (more) operation.

        It is possible to hack something together using **@transform** but downstream tasks would not
        be aware that each job in **@transform** produces multiple outputs (rather than one input,
        one output per job).

        The syntax looks like::

            @split(get_files, regex(r"(.+).original"), r"\1.*.split")
            def split_files(i, o):
                pass

        If ``get_files()`` returned ``A.original``, ``B.original`` and ``C.original``,
        ``split_files()`` might lead to the following operations::

            A.original
                    -> A.1.split
                    -> A.2.split
                    -> A.3.split
            B.original
                    -> B.1.split
                    -> B.2.split
            C.original
                    -> C.1.split
                    -> C.2.split
                    -> C.3.split
                    -> C.4.split
                    -> C.5.split

        Note that each input (``A/B/C.original``) can produce a number of outputs, the exact
        number of which does not have to be pre-determined.
        This is similar to **@split**.

        Tasks following ``split_files`` will have ten inputs, one for each of the
        outputs from ``split_files``.

        If **@transform** was used instead of **@split**, then tasks following ``split_files``
        would only have 3 inputs.

********************************************************************
version 2.0.8
********************************************************************

    * File names can be in unicode
    * File systems with 1 second timestamp granularity no longer cause problems.

********************************************************************
version 2.0.2
********************************************************************

    * Much prettier / more useful output from :ref:`pipeline_printout <pipeline_functions.pipeline_printout>`
    * New tutorial / manual



********************************************************************
version 2.0
********************************************************************
    * Revamped documentation:

        * Rewritten tutorial
        * Comprehensive manual
        * New syntax help

    * Major redesign. New decorators include

        * :ref:`@split <new_manual.split>`
        * :ref:`@transform <new_manual.transform>`
        * :ref:`@merge <new_manual.merge>`
        * :ref:`@collate <new_manual.collate>`

    * Major redesign. Decorator *inputs* can mix

        * Output from previous tasks
        * |glob|_ patterns e.g. ``*.txt``
        * File names
        * Any other data type

********************************************************************
version 1.1.4
********************************************************************
    Tasks can get their input by automatically chaining to the output from one or more parent tasks using :ref:`@files_re <decorators.files_re>`.

********************************************************************
version 1.0.7
********************************************************************
    Added `proxy_logger` module for accessing a shared log across multiple jobs in different processes.

********************************************************************
version 1.0
********************************************************************

    Initial Release in Oxford
