1.. include:: global.inc
2
3.. role:: raw-html(raw)
4   :format: html
5
6:raw-html:`<style> .red {color:red} </style>`
7
8.. role:: red
9
10
11########################################
12Latest Changes
13########################################
14
15Major Features added to Ruffus
16
17.. note::
18
19    See :ref:`To do list <todo>` for future enhancements to Ruffus
20
21
22********************************************************************
23version 2.8.1
24********************************************************************
25
26* compatibility with gevent >= 1.2
27
28********************************************************************
29version 2.8.1
30********************************************************************
31
32* Ctrl-C will kill drmaa jobs
33* python3.7 compatibility, thanks to @jbarlow83, @QuLogic
34
35********************************************************************
36version 2.6.3
37********************************************************************
38    21st April 2015
39
40=====================================================================================================================
41Bug fixes and minor enhancements
42=====================================================================================================================
43
44    * `@transform(..., suffix("xxx"),` :red:`output_dir` `= "/new/output/path")` works even when the ouput has more than one file  `(github)  <https://github.com/bunbun/ruffus/issues/43>`__
45    * `@subdivide( ..., suffix("xxx"),` :red:`output_dir` `= "/new/output/path")` works in exactly the same way as `@transform(..., outputdir="xxx")`  `(github)  <https://github.com/bunbun/ruffus/issues/42>`__
46    * `ruffus.drmaa_wrapper.run_job()` works with python3 `(github)  <https://github.com/bunbun/ruffus/issues/46>`__
47      Fixed issue with byte and text streams.
48    * `ruffus.drmaa.wrapper.run_job()` allows env (environment) to be set for jobs run locally as well as those on the cluster `(github)  <https://github.com/bunbun/ruffus/issues/44>`__
49    * New object-orientated style syntax works seamlessly with Ruffus command line support `ruffus.cmdline.run` `(github)  <https://github.com/bunbun/ruffus/issues/48>`__.
50
51
52
53
54********************************************************************
55version 2.6.2
56********************************************************************
57
58    12th March 2015
59
60=====================================================================================================================
611) Bug fixes
62=====================================================================================================================
63
64    * ``pipeline_printout_graph()`` incompatibility with python3 fixed
65    * checkpointing did not work correctly with :ref:`@split(...) <decorators.split>` and :ref:`@subdivide(...) <decorators.subdivide>`
66
67
68=====================================================================================================================
692) `@transform `(..., suffix("xxx"),` :red:`output_dir` `= "/new/output/path")`
70=====================================================================================================================
71
72    Thanks to the suggestion of Milan Simonovic.
73
74    :ref:`@transform(..., suffix(...) ) <decorators.transform>` has easy to understand syntax and takes care of all the common use cases
75    of Ruffus.
76
77    However, when we need to place the output in a different directories, we suddenly have to plunge into the deep end and parse file paths using
78    :ref:`regex() <decorators.regex>` or :ref:`formatter() <new_manual.formatter>`.
79
80    Now, :ref:`@transform <decorators.transform>` takes an optional ``output_dir`` named parameter so that we can continue to use :ref:`suffix() <new_manual.suffix>` even when the output needs
81    to go into a new directory.
82
83        .. <<Python
84
85        .. code-block:: python
86            :emphasize-lines: 2,3,9
87
88            #
89            #   input/a.fasta -> output/a.sam
90            #   input/b.fasta -> output/b.sam
91            #
92            starting_files = ["input/a.fasta","input/b.fasta"]
93            @transform(starting_files,
94                       suffix('.fasta'),
95                       '.sam',
96                       output_dir = "output")
97            def map_dna_sequence(input_file, output_file) :
98                pass
99
100        ..
101            Python
102
103    See example ``test\test_suffix_output_dir.py``
104
105=====================================================================================================================
1062) Named parameters
107=====================================================================================================================
108
109    Decorators can take named parameters.
110
111    These are self documenting, and improve clarity.
112
113    Note that the usual Python rules for function parameters apply:
114
115    * Positional arguments must precede named arguments
116    * Named arguments cannot be used to fill in for "missing" positional arguments
117
118
119    For example the following two functions are identical:
120
121    **Positional parameters:**
122
123        .. <<Python
124
125        .. code-block:: python
126
127            @merge(prev_task, ["a.summary", "b.summary"], 14, "extra_info", {"a":45, "b":5})
128            def merge_task(inputs, outputs, extra_num, extra_str, extra_dict):
129                pass
130        ..
131            Python
132
133    **Named parameters:**
134
135        .. <<Python
136
137        .. code-block:: python
138
139            # new style is a bit clearer
140            @merge(input   = prev_task,
141                   output  = ["a.summary", "b.summary"],
142                   extras  = [14, "extra_info", {"a":45, "b":5}]
143                   )
144            def merge_task(inputs, outputs, extra_num, extra_str, extra_dict):
145                pass
146        ..
147            Python
148
149    .. warning::
150
151        ``,extras=`` takes all the *extras* parameters (``14, "extra_info", {"a":45, "b":5}``) as a single list
152
153    * :ref:`@split(...) <decorators.split>` and :ref:`@merge(...) <decorators.merge>`
154        * *input*
155        * *output*
156        * [*extras*\ ]
157    * :ref:`@transform(...) <decorators.transform>` and :ref:`@mkdir(...) <decorators.mkdir>`
158        * *input*
159        * *filter*
160        * [*replace_inputs* or *add_inputs*\ ]
161        * *output*
162        * [*extras*\ ]
163        * [*output_dir*\ ]
164    * :ref:`@collate(...) <decorators.collate>` and :ref:`@subdivide(...) <decorators.collate>`
165        * *input*
166        * *filter*
167        * *output*
168        * [*extras*\ ]
169    * :ref:`@originate(...) <decorators.originate>`
170        * *output*
171        * [*extras*\ ]
172    * :ref:`@product(...) <decorators.product>`, :ref:`@permutations(...) <decorators.permutations>`, :ref:`@combinations(...) <decorators.combinations>`, and :ref:`@combinations_with_replacement(...) <decorators.combinations_with_replacement>`
173        * *input*
174        * *filter*
175        * [*input2...NNN*\ ] (only for ``product``)
176        * [*filter2...NNN*\ ] (only for ``product``) where NNN is an incrementing number
177        * *tuple_size* (except for ``product``)
178        * [*replace_inputs* or *add_inputs*\ ]
179        * *output*
180        * [*extras*\ ]
181
182
183
184=============================================
1853) New object orientated syntax for Ruffus
186=============================================
187
188    Ruffus Pipelines can now be created directly using the new ``Pipeline`` and ``Task`` objects instead of via decorators.
189
190        .. <<python
191
192        .. code-block:: python
193            :emphasize-lines: 9
194
195            # make ruffus pipeline
196            my_pipeline = Pipeline(name = "test")
197            my_pipeline.transform(task_func  = map_dna_sequence,
198                                  input      = starting_files,
199                                  filter     = suffix('.fasta'),
200                                  output     = '.sam',
201                                  output_dir = "output")
202
203            my_pipeline.run()
204        ..
205            python
206
207    This new syntax is fully compatible and inter-operates with traditional Ruffus syntax using decorators.
208
209    Apart from cosmetic changes, the new syntax allows different instances of modular Ruffus sub-pipelines
210    to be defined separately, in different python modules and then joined together flexible at runtime.
211
212    The new syntax and discussion are introduced :ref:`here <new_syntax>`.
213
214
215
216********************************************************************
217version 2.5
218********************************************************************
219
220    6th August 2014
221
222============================================================================================================================================================
2231) Python3 compatability (but at least python 2.6 is now required)
224============================================================================================================================================================
225
226    Ruffus v2.5 is now python3 compatible. This has required surprisingly many changes to the codebase. Please report any bugs to me.
227
228    .. note::
229
230        **Ruffus now requires at least python 2.6**
231
232        It proved to be impossible to support python 2.5 and python 3.x at the same time.
233
234============================================================================================================================================================
2352) Ctrl-C interrupts
236============================================================================================================================================================
237
238    Ruffus now mostly(!) terminates gracefully when interrupted by Ctrl-C .
239
240    Please send me bug reports for when this doesn't work with a minimally reproducible case.
241
242    This means that, in general, if an ``Exception`` is thrown during your pipeline but you don't want to wait for the rest of the jobs to complete, you can still press Ctrl-C at any point.
243    Note that you may still need to clean up spawned processes, for example, using ``qdel`` if you are using ``Ruffus.drmaa_wrapper``
244
245============================================================================================================================================================
2463) Customising flowcharts in pipeline_printout_graph() with ``@graphviz``
247============================================================================================================================================================
248
249    *Contributed by Sean Davis, with improved syntax via Jake Biesinger*
250
251    The graphics for each task can have its own attributes (URL, shape, colour) etc. by adding
252    `graphviz attributes  <http://www.graphviz.org/doc/info/attrs.html>`__
253    using the ``@graphviz`` decorator.
254
255    * This allows HTML formatting in the task names (using the ``label`` parameter as in the following example).
256      HTML labels **must** be enclosed in ``<`` and ``>``. E.g.
257
258      .. code-block:: python
259
260        label = "<Line <BR/> wrapped task_name()>"
261
262    * You can also opt to keep the task name and wrap it with a prefix and suffix:
263
264      .. code-block:: python
265
266        label_suffix = "??? ", label_prefix = ": What is this?"
267
268    * The ``URL`` attribute allows the generation of clickable svg, and also client / server
269      side image maps usable in web pages.
270      See `Graphviz documentation  <http://www.graphviz.org/content/output-formats#dimap>`__
271
272
273    Example:
274
275        .. code-block:: python
276
277
278            @graphviz(URL='"http://cnn.com"', fillcolor = '"#FFCCCC"',
279                            color = '"#FF0000"', pencolor='"#FF0000"', fontcolor='"#4B6000"',
280                            label_suffix = "???", label_prefix = "What is this?<BR/> ",
281                            label = "<What <FONT COLOR=\"red\">is</FONT>this>",
282                            shape= "component", height = 1.5, peripheries = 5,
283                            style="dashed")
284            def Up_to_date_task2(infile, outfile):
285                pass
286
287            #   Can use dictionary if you wish...
288            graphviz_params = {"URL":"http://cnn.com", "fontcolor": '"#FF00FF"'}
289            @graphviz(**graphviz_params)
290            def myTask(input,output):
291                pass
292
293        .. **
294
295        .. image:: images/history_html_flowchart.png
296           :scale: 30
297
298
299============================================================================================================================================================
3004. Consistent verbosity levels
301============================================================================================================================================================
302
303    The verbosity levels are now more fine-grained and consistent between pipeline_printout and pipeline_run.
304    Note that At verbosity > 2, ``pipeline_run``  outputs lists of up-to-date tasks before running the pipeline.
305    Many users who defaulted to using a verbosity of 3 may want to move up to ``verbose = 4``.
306
307        * **level 0** : *Nothing*
308        * **level 1** : *Out-of-date Task names*
309        * **level 2** : *All Tasks (including any task function docstrings)*
310        * **level 3** : *Out-of-date Jobs in Out-of-date Tasks, no explanation*
311        * **level 4** : *Out-of-date Jobs in Out-of-date Tasks, with explanations and warnings*
312        * **level 5** : *All Jobs in Out-of-date Tasks,  (include only list of up-to-date tasks)*
313        * **level 6** : *All jobs in All Tasks whether out of date or not*
314        * **level 10**: *Logs messages useful only for debugging ruffus pipeline code*
315
316    * Defaults to **level 4** for pipeline_printout: *Out of date jobs, with explanations and warnings*
317    * Defaults to **level 1** for pipeline_run: *Out-of-date Task names*
318
319============================================================================================================================================================
3205. Allow abbreviated paths from ``pipeline_run`` or ``pipeline_printout``
321============================================================================================================================================================
322
323    .. note ::
324
325            Please contact me with suggestions if you find the abbreviations useful but "aesthetically challenged"!
326
327    Some pipelines produce interminable lists of long filenames. It would be nice to be able to abbreviate this
328    to just enough information to follow the progress.
329
330    Ruffus now allows either
331        1) Only the nth top level sub-directories to be included
332        2) The message to be truncated to a specified number of characters (to fit on a line, for example)
333
334           Note that the number of characters specified is the separate length of the input and output parameters,
335           not the entire message. You many need to specify a smaller limit that you expect (e.g. ``60`` rather than `80`)
336
337        .. code-block:: python
338
339            pipeline_printout(verbose_abbreviated_path = NNN)
340            pipeline_run(verbose_abbreviated_path = -MMM)
341
342
343    The ``verbose_abbreviated_path`` parameter restricts the length of input / output file paths to either
344
345        * NNN levels of nested paths
346        * A total of MMM characters, MMM is specified by setting ``verbose_abbreviated_path`` to -MMM (i.e. negative values)
347
348        ``verbose_abbreviated_path`` defaults to ``2``
349
350
351    For example:
352
353        Given ``["aa/bb/cc/dddd.txt", "aaa/bbbb/cccc/eeed/eeee/ffff/gggg.txt"]``
354
355
356        .. code-block:: python
357           :emphasize-lines: 1,4,8,19
358
359            # Original relative paths
360            "[aa/bb/cc/dddd.txt, aaa/bbbb/cccc/eeed/eeee/ffff/gggg.txt]"
361
362            # Full abspath
363            verbose_abbreviated_path = 0
364            "[/test/ruffus/src/aa/bb/cc/dddd.txt, /test/ruffus/src/aaa/bbbb/cccc/eeed/eeee/ffff/gggg.txt]"
365
366            # Specifed level of nested directories
367            verbose_abbreviated_path = 1
368            "[.../dddd.txt, .../gggg.txt]"
369
370            verbose_abbreviated_path = 2
371            "[.../cc/dddd.txt, .../ffff/gggg.txt]"
372
373            verbose_abbreviated_path = 3
374            "[.../bb/cc/dddd.txt, .../eeee/ffff/gggg.txt]"
375
376
377            # Truncated to MMM characters
378            verbose_abbreviated_path = -60
379            "<???> /bb/cc/dddd.txt, aaa/bbbb/cccc/eeed/eeee/ffff/gggg.txt]"
380
381
382    If you are using ``ruffus.cmdline``, the abbreviated path lengths can be specified on
383    the command line as an extension to the verbosity:
384
385        .. code-block:: bash
386           :emphasize-lines: 4,7
387
388            # verbosity of 4
389            yourscript.py --verbose 4
390
391            # display three levels of nested directories
392            yourscript.py --verbose 4:3
393
394            # restrict input and output parameters to 60 letters
395            yourscript.py --verbose 4:-60
396
397
398        The number after the colon is the abbreviated path length
399
400
401============================================================================================================================================================
402Other changes
403============================================================================================================================================================
404    * BUG FIX: Output producing wild cards was not saved in the checksum files!!!
405    * BUG FIX: @mkdir bug under Windows. Thanks to Sean Turley. (Aargh! Different exceptions are thrown in Windows vs. Linux for the same condition!)
406    * Added :ref:`pipeline_get_task_names(...) <pipeline_functions.pipeline_get_task_names>` which returns all task name as a list of strings. Thanks to Clare Sloggett
407
408
409********************************************************************
410version 2.4.1
411********************************************************************
412
413    26th April 2014
414
415    * Breaking changes to drmaa API suggested by Bernie Pope to ensure portability across different drmaa implementations (SGE, SLURM etc.)
416
417********************************************************************
418version 2.4
419********************************************************************
420
421    4th April 2014
422
423============================================================================================================================================================
424Additions to ``ruffus`` namespace
425============================================================================================================================================================
426
427    * :ref:`formatter() <new_manual.formatter>` (:ref:`syntax <decorators.formatter>`)
428    * :ref:`originate() <new_manual.originate>` (:ref:`syntax <decorators.originate>`)
429    * :ref:`subdivide() <new_manual.subdivide>` (:ref:`syntax <decorators.subdivide>`)
430
431============================================================================================================================================================
432Installation: use pip
433============================================================================================================================================================
434
435    ::
436
437        sudo pip install ruffus --upgrade
438
439============================================================================================================================================================
4401) Command Line support
441============================================================================================================================================================
442
443    The optional ``Ruffus.cmdline`` module provides support for a set of common command
444    line arguments which make writing *Ruffus* pipelines much more pleasant.
445    See :ref:`manual <new_manual.cmdline>`
446
447============================================================================================================================================================
4482) Check pointing
449============================================================================================================================================================
450
451    * Contributed by **Jake Biesinger**
452    * See :ref:`Manual <new_manual.checkpointing>`
453    * Uses a fault resistant sqlite database file to log i/o files, and additional checksums
454    * defaults to checking file timestamps stored in the current directory (``ruffus_utilility.RUFFUS_HISTORY_FILE = '.ruffus_history.sqlite'``)
455    * :ref:`pipeline_run(..., checksum_level = N, ...) <pipeline_functions.pipeline_run>`
456
457       * level 0 = CHECKSUM_FILE_TIMESTAMPS      : Classic mode. Use only file timestamps (no checksum file will be created)
458       * level 1 = CHECKSUM_HISTORY_TIMESTAMPS   : Also store timestamps in a database after successful job completion
459       * level 2 = CHECKSUM_FUNCTIONS            : As above, plus a checksum of the pipeline function body
460       * level 3 = CHECKSUM_FUNCTIONS_AND_PARAMS : As above, plus a checksum of the pipeline function default arguments and the additional arguments passed in by task decorators
461
462       * defaults to level 1
463
464    * Can speed up trivial tasks: Previously Ruffus always added an extra 1 second pause between tasks
465      to guard against file systems (Ext3, FAT, some NFS) with low timestamp granularity.
466
467
468============================================================================================================================================================
4693) :ref:`subdivide() <new_manual.subdivide>` (:ref:`syntax <decorators.subdivide>`)
470============================================================================================================================================================
471
472    * Take a list of input jobs (like :ref:`@transform <decorators.transform>`) but further splits each into multiple jobs, i.e. it is a **many->even more** relationship
473    * synonym for the deprecated ``@split(..., regex(), ...)``
474
475========================================================================================================================================================================================================================================================================================================================
4764) :ref:`mkdir() <new_manual.mkdir>` (:ref:`syntax <decorators.mkdir>`) with :ref:`formatter() <new_manual.formatter>`, :ref:`suffix() <decorators.suffix>`  and :ref:`regex() <decorators.regex>`
477========================================================================================================================================================================================================================================================================================================================
478
479    * allows directories to be created depending on runtime parameters or the output of previous tasks
480    * behaves just like :ref:`@transform <decorators.transform>` but with its own (internal) function which does the actual work of making a directory
481    * Previous behavior is retained:``mkdir`` continues to work seamlessly inside :ref:`@follows <decorators.follows>`
482
483============================================================================================================================================================
4845) :ref:`originate() <new_manual.originate>` (:ref:`syntax <decorators.originate>`)
485============================================================================================================================================================
486
487    * Generates output files without dependencies from scratch  (*ex nihilo*!)
488    * For first step in a pipeline
489    * Task function obviously only takes output and not input parameters. (There *are* no inputs!)
490    * synonym for :ref:`@split(None,...) <decorators.split>`
491    * See :ref:`Summary <decorators.originate>` / :ref:`Manual <new_manual.originate>`
492
493========================================================================================================================================================================================================================================================================================================================
4946) New flexible :ref:`formatter() <new_manual.formatter>` (:ref:`syntax <decorators.formatter>`) alternative to :ref:`regex() <decorators.regex>`  & :ref:`suffix() <decorators.suffix>`
495========================================================================================================================================================================================================================================================================================================================
496
497    * Easy manipulation of path subcomponents in the style of `os.path.split()  <http://docs.python.org/2/library/os.path.html#os.path.split>`__
498    * Regular expressions are no longer necessary for path manipulation
499    * Familiar python syntax
500    * Optional regular expression matches
501    * Can refer to any in the list of N input files (not only the first file as for ``regex(...)``)
502    * Can even refer to individual letters within a match
503
504============================================================================================================================================================
5057) Combinatorics (all vs. all decorators)
506============================================================================================================================================================
507
508    * :ref:`@product <new_manual.product>`  (See `itertools.product  <http://docs.python.org/2/library/itertools.html#itertools.product>`__)
509    * :ref:`@permutations <new_manual.permutations>`  (See `itertools.permutations  <http://docs.python.org/2/library/itertools.html#itertools.permutations>`__)
510    * :ref:`@combinations <new_manual.combinations>` (See `itertools.combinations  <http://docs.python.org/2/library/itertools.html#itertools.combinations>`__)
511    * :ref:`@combinations_with_replacement <new_manual.combinations_with_replacement>` (See `itertools.combinations_with_replacement  <http://docs.python.org/2/library/itertools.html#itertools.combinations_with_replacement>`__)
512    * in optional :ref:`combinatorics <new_manual.combinatorics>` module
513    * Only :ref:`formatter() <new_manual.formatter>` provides the necessary flexibility to construct the output. (:ref:`suffix() <decorators.suffix>` and :ref:`regex() <decorators.regex>` are not supported.)
514    * See :ref:`Summary <decorators.combinatorics>` / :ref:`Manual <new_manual.combinatorics>`
515
516
517
518============================================================================================================================================================
5198) drmaa support and multithreading:
520============================================================================================================================================================
521
522    * :ref:`ruffus.drmaa_wrapper.run_job() <new_manual.ruffus.drmaa_wrapper.run_job>` (:ref:`syntax <drmaa_wrapper.run_job>`)
523    * Optional helper module allows jobs to dispatch work to a computational cluster and wait until it completes.
524    * Requires ``multithread`` rather than ``multiprocess``
525
526============================================================================================================================================================
5279) ``pipeline_run(...)`` and exceptions
528============================================================================================================================================================
529    See :ref:`Manual <new_manual.exceptions>`
530
531    * Optionally terminate pipeline after first exception
532    * Display exceptions without delay
533
534
535============================================================================================================================================================
53610) Miscellaneous
537============================================================================================================================================================
538
539    Better error messages for ``formatter()``, ``suffix()`` and ``regex()`` for ``pipeline_printout(..., verbose >= 3, ...)``
540        * Error messages for showing mismatching regular expression and offending file name
541        * Wrong capture group names or out of range indices will raise informative Exception
542
543********************************************************************
544version 2.3
545********************************************************************
546    1st September, 2013
547
548    * ``@active_if`` turns off tasks at runtime
549        The Design and initial implementation were contributed by Jacob Biesinger
550
551        Takes one or more parameters which can be either booleans or functions or callable objects which return True / False::
552
553                run_if_true_1 = True
554                run_if_true_2 = False
555
556                @active_if(run_if_true, lambda: run_if_true_2)
557                def this_task_might_be_inactive():
558                    pass
559
560        The expressions inside @active_if are evaluated each time
561        ``pipeline_run``, ``pipeline_printout`` or ``pipeline_printout_graph`` is called.
562
563        Dormant tasks behave as if they are up to date and have no output.
564
565    * Command line parsing
566        * Supports both argparse (python 2.7) and optparse (python 2.6):
567        * ``Ruffus.cmdline`` module is optional.
568        * See :ref:`manual <new_manual.cmdline>`
569    * Optionally terminate pipeline after first exception
570        To have all exceptions interrupt immediately::
571
572                pipeline_run(..., exceptions_terminate_immediately = True)
573
574        By default ruffus accumulates ``NN`` errors before interrupting the pipeline prematurely. ``NN`` is the specified parallelism for ``pipeline_run(..., multiprocess = NN)``.
575
576        Otherwise, a pipeline will only be interrupted immediately if exceptions of type ``ruffus.JobSignalledBreak`` are thrown.
577
578    * Display exceptions without delay
579
580        By default, Ruffus re-throws exceptions in ensemble after pipeline termination.
581
582        To see exceptions as they occur::
583
584                pipeline_run(..., log_exceptions = True)
585
586        ``logger.error(...)`` will be invoked with the string representation of the each exception, and associated stack trace.
587
588        The default logger prints to sys.stderr, but this can be changed to any class from the logging module or compatible object via ``pipeline_run(..., logger = ???)``
589
590    * Improved ``pipeline_printout()``
591
592            * `@split` operations now show the 1->many output in pipeline_printout
593
594                This make it clearer that ``@split`` is creating multiple output parameters (rather than a single output parameter consisting of a list)::
595
596                        Task = split_animals
597                             Job = [None
598                                   -> cows
599                                   -> horses
600                                   -> pigs
601                                    , any_extra_parameters]
602            * File date and time are displayed in human readable form and out of date files are flagged with asterisks.
603
604
605
606********************************************************************
607version 2.2
608********************************************************************
609    22nd July, 2010
610
611    * Simplifying **@transform** syntax with **suffix(...)**
612
613        Regular expressions within ruffus are very powerful, and can allow files to be moved
614        from one directory to another and renamed at will.
615
616        However, using consistent file extensions and
617        ``@transform(..., suffix(...))`` makes the code much simpler and easier to read.
618
619        Previously, ``suffix(...)`` did not cooperate well with ``inputs(...)``.
620        For example, finding the corresponding header file (".h") for the matching input
621        required a complicated ``regex(...)`` regular expression and ``input(...)``. This simple case,
622        e.g. matching "something.c" with "something.h", is now much easier in Ruffus.
623
624
625        For example:
626          ::
627
628            source_files = ["something.c", "more_code.c"]
629            @transform(source_files, suffix(".c"), add_inputs(r"\1.h", "common.h"), ".o")
630            def compile(input_files, output_file):
631                ( source_file,
632                  header_file,
633                  common_header) = input_files
634                # call compiler to make object file
635
636          This is equivalent to calling:
637
638            ::
639
640              compile(["something.c", "something.h", "common.h"], "something.o")
641              compile(["more_code.c", "more_code.h", "common.h"], "more_code.o")
642
643        The ``\1`` matches everything *but* the suffix and will be applied to both ``glob``\ s and file names.
644
645    For simplicity and compatibility with previous versions, there is always an implied r"\1" before
646    the output parameters. I.e. output parameters strings are *always* substituted.
647
648
649    * Tasks and glob in **inputs(...)** and **add_inputs(...)**
650
651        ``glob``\ s and tasks can be added as the prerequisites / input files using
652        ``inputs(...)`` and ``add_inputs(...)``. ``glob`` expansions will take place when the task
653        is run.
654
655    * Advanced form of **@split** with **regex**:
656
657        The standard ``@split`` divided one set of inputs into multiple outputs (the number of which
658        can be determined at runtime).
659
660        This is a ``one->many`` operation.
661
662
663        An advanced form of ``@split`` has been added which can split each of several files further.
664
665        In other words, this is a ``many->"many more"`` operation.
666
667        For example, given three starting files:
668            ::
669
670                original_files = ["original_0.file",
671                                  "original_1.file",
672                                  "original_2.file"]
673        We can split each into its own set of sub-sections:
674            ::
675
676                @split(original_files,
677                   regex(r"starting_(\d+).fa"),                         # match starting files
678                         r"files.split.\1.*.fa"                         # glob pattern
679                         r"\1")                                         # index of original file
680                def split_files(input_file, output_files, original_index):
681                    """
682                        Code to split each input_file
683                            "original_0.file" -> "files.split.0.*.fa"
684                            "original_1.file" -> "files.split.1.*.fa"
685                            "original_2.file" -> "files.split.2.*.fa"
686                    """
687
688
689        This is, conceptually, the reverse of the @collate(...) decorator
690
691    * Ruffus will complain about unescaped regular expression special characters:
692
693        Ruffus uses "\\1" and "\\2" in regular expression substitutions. Even seasoned python
694        users may not remember that these have to be 'escaped' in strings. The best option is
695        to use 'raw' python strings e.g.
696
697            ::
698
699                r"\1_substitutes\2correctly\3four\4times"
700
701        Ruffus will throw an exception if it sees an unescaped "\\1" or "\\2" in a file name,
702        which should catch most of these bugs.
703
704    * Prettier output from *pipeline_printout_graph*
705
706        Changed to nicer colours, symbols etc. for a more professional look.
707        @split and @merge tasks now look different from @transform.
708        Colours, size and resolution are now fully customisable::
709
710            pipeline_printout_graph( #...
711                                     user_colour_scheme = {
712                                                            "colour_scheme_index":1,
713                                                            "Task to run"  : {"fillcolor":"blue"},
714                                                             pipeline_name : "My flowchart",
715                                                             size          : (11,8),
716                                                             dpi           : 120)})
717
718        An SVG bug in firefox has been worked around so that font size are displayed correctly.
719
720
721
722
723********************************************************************
724version 2.1.1
725********************************************************************
726    * **@transform(.., add_inputs(...))**
727        ``add_inputs(...)`` allows the addition of extra input dependencies / parameters for each job.
728
729        Unlike ``inputs(...)``, the original input parameter is retained:
730            ::
731
732                from ruffus import *
733                @transform(["a.input", "b.input"], suffix(".input"), add_inputs("just.1.more","just.2.more"), ".output")
734                def task(i, o):
735                ""
736
737        Produces:
738            ::
739
740                Job = [[a.input, just.1.more, just.2.more] ->a.output]
741                Job = [[b.input, just.1.more, just.2.more] ->b.output]
742
743
744        Like ``inputs``, ``add_inputs`` accepts strings, tasks and ``glob`` s
745        This minor syntactic change promises add much clarity to Ruffus code.
746        ``add_inputs()`` is available for ``@transform``, ``@collate`` and ``@split``
747
748
749********************************************************************
750version 2.1.0
751********************************************************************
752    * **@jobs_limit**
753      Some tasks are resource intensive and too many jobs should not be run at the
754      same time. Examples include disk intensive operations such as unzipping, or
755      downloading from FTP sites.
756
757      Adding::
758
759          @jobs_limit(4)
760          @transform(new_data_list, suffix(".big_data.gz"), ".big_data")
761          def unzip(i, o):
762            "unzip code goes here"
763
764      would limit the unzip operation to 4 jobs at a time, even if the rest of the
765      pipeline runs highly in parallel.
766
767      (Thanks to Rob Young for suggesting this.)
768
769********************************************************************
770version 2.0.10
771********************************************************************
772    * **touch_files_only** option for **pipeline_run**
773
774      When the pipeline runs, task functions will not be run. Instead, the output files for
775      each job (in each task) will be ``touch``\ -ed if necessary.
776      This can be useful for simulating a pipeline run so that all files look as
777      if they are up-to-date.
778
779      Caveats:
780
781        * This may not work correctly where output files are only determined at runtime, e.g. with **@split**
782        * Only the output from pipelined jobs which are currently out-of-date will be ``touch``\ -ed.
783          In other words, the pipeline runs *as normal*, the only difference is that the
784          output files are ``touch``\ -ed instead of being created by the python task functions
785          which would otherwise have been called.
786
787    * Parameter substitution for **inputs(...)**
788
789      The **inputs(...)** parameter in **@transform**, **@collate** can now take tasks and ``glob`` s,
790      and these will be expanded appropriately (after regular expression replacement).
791
792      For example::
793
794          @transform("dir/a.input", regex(r"(.*)\/(.+).input"),
795                        inputs((r"\1/\2.other", r"\1/*.more")), r"elsewhere/\2.output")
796          def task1(i, o):
797            """
798            Some pipeline task
799            """
800
801      Is equivalent to calling::
802
803            task1(("dir/a.other", "dir/1.more", "dir/2.more"), "elsewhere/a.output")
804
805      \
806
807          Here::
808
809                r"\1/*.more"
810
811          is first converted to::
812
813                r"dir/*.more"
814
815          which matches::
816
817                "dir/1.more"
818                "dir/2.more"
819
820
821********************************************************************
822version 2.0.9
823********************************************************************
824
825    * Better display of logging output
826    * Advanced form of **@split**
827      This is an experimental feature.
828
829      Hitherto, **@split** only takes 1 set of input (tasks/files/``glob`` s) and split these
830      into an indeterminate number of output.
831
832          This is a one->many operation.
833
834      Sometimes it is desirable to take multiple input files, and split each of them further.
835
836          This is a many->many (more) operation.
837
838      It is possible to hack something together using **@transform** but downstream tasks would not
839      aware that each job in **@transform** produces multiple outputs (rather than one input,
840      one output per job).
841
842      The syntax looks like::
843
844           @split(get_files, regex(r"(.+).original"), r"\1.*.split")
845           def split_files(i, o):
846                pass
847
848      If ``get_files()`` returned ``A.original``, ``B.original`` and ``C.original``,
849      ``split_files()`` might lead to the following operations::
850
851            A.original
852                    -> A.1.original
853                    -> A.2.original
854                    -> A.3.original
855            B.original
856                    -> B.1.original
857                    -> B.2.original
858            C.original
859                    -> C.1.original
860                    -> C.2.original
861                    -> C.3.original
862                    -> C.4.original
863                    -> C.5.original
864
865      Note that each input (``A/B/C.original``) can produce a number of output, the exact
866      number of which does not have to be pre-determined.
867      This is similar to **@split**
868
869      Tasks following ``split_files`` will have ten inputs corresponding to each of the
870      output from ``split_files``.
871
872      If **@transform** was used instead of **@split**, then tasks following ``split_files``
873      would only have 3 inputs.
874
875********************************************************************
876version 2.0.8
877********************************************************************
878
879    * File names can be in unicode
880    * File systems with 1 second timestamp granularity no longer cause problems.
881
882********************************************************************
883version 2.0.2
884********************************************************************
885
886    * Much prettier /useful output from :ref:`pipeline_printout <pipeline_functions.pipeline_printout>`
887    * New tutorial / manual
888
889
890
891********************************************************************
892version 2.0
893********************************************************************
894    * Revamped documentation:
895
896        * Rewritten tutorial
897        * Comprehensive manual
898        * New syntax help
899
900    * Major redesign. New decorators include
901
902        * :ref:`@split <new_manual.split>`
903        * :ref:`@transform <new_manual.transform>`
904        * :ref:`@merge <new_manual.merge>`
905        * :ref:`@collate <new_manual.collate>`
906
907    * Major redesign. Decorator *inputs* can mix
908
909        * Output from previous tasks
910        * |glob|_ patterns e.g. ``*.txt``
911        * Files names
912        * Any other data type
913
914********************************************************************
915version 1.1.4
916********************************************************************
917    Tasks can get their input by automatically chaining to the output from one or more parent tasks using :ref:`@files_re <decorators.files_re>`
918
919********************************************************************
920version 1.0.7
921********************************************************************
922    Added `proxy_logger` module for accessing a shared log across multiple jobs in different processes.
923
924********************************************************************
925version 1.0
926********************************************************************
927
928    Initial Release in Oxford
929
930