1:mod:`difflib` --- Helpers for computing deltas
2===============================================
3
4.. module:: difflib
5   :synopsis: Helpers for computing differences between objects.
6
7.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
8.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
9.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>
10
11**Source code:** :source:`Lib/difflib.py`
12
13.. testsetup::
14
15   import sys
16   from difflib import *
17
18--------------
19
20This module provides classes and functions for comparing sequences. It
21can be used for example, for comparing files, and can produce information
22about file differences in various formats, including HTML and context and unified
23diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
24
25
26.. class:: SequenceMatcher
27   :noindex:
28
29   This is a flexible class for comparing pairs of sequences of any type, so long
30   as the sequence elements are :term:`hashable`.  The basic algorithm predates, and is a
31   little fancier than, an algorithm published in the late 1980's by Ratcliff and
32   Obershelp under the hyperbolic name "gestalt pattern matching."  The idea is to
33   find the longest contiguous matching subsequence that contains no "junk"
34   elements; these "junk" elements are ones that are uninteresting in some
35   sense, such as blank lines or whitespace.  (Handling junk is an
36   extension to the Ratcliff and Obershelp algorithm.) The same
37   idea is then applied recursively to the pieces of the sequences to the left and
38   to the right of the matching subsequence.  This does not yield minimal edit
39   sequences, but does tend to yield matches that "look right" to people.
40
41   **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst
42   case and quadratic time in the expected case. :class:`SequenceMatcher` is
43   quadratic time for the worst case and has expected-case behavior dependent in a
44   complicated way on how many elements the sequences have in common; best case
45   time is linear.
46
47   **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that
48   automatically treats certain sequence items as junk. The heuristic counts how many
49   times each individual item appears in the sequence. If an item's duplicates (after
50   the first one) account for more than 1% of the sequence and the sequence is at least
51   200 items long, this item is marked as "popular" and is treated as junk for
52   the purpose of sequence matching. This heuristic can be turned off by setting
53   the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.
54
55   .. versionadded:: 3.2
56      The *autojunk* parameter.
57
58
59.. class:: Differ
60
61   This is a class for comparing sequences of lines of text, and producing
62   human-readable differences or deltas.  Differ uses :class:`SequenceMatcher`
63   both to compare sequences of lines, and to compare sequences of characters
64   within similar (near-matching) lines.
65
66   Each line of a :class:`Differ` delta begins with a two-letter code:
67
68   +----------+-------------------------------------------+
69   | Code     | Meaning                                   |
70   +==========+===========================================+
71   | ``'- '`` | line unique to sequence 1                 |
72   +----------+-------------------------------------------+
73   | ``'+ '`` | line unique to sequence 2                 |
74   +----------+-------------------------------------------+
75   | ``'  '`` | line common to both sequences             |
76   +----------+-------------------------------------------+
77   | ``'? '`` | line not present in either input sequence |
78   +----------+-------------------------------------------+
79
80   Lines beginning with '``?``' attempt to guide the eye to intraline differences,
81   and were not present in either input sequence. These lines can be confusing if
82   the sequences contain tab characters.
83
84
85.. class:: HtmlDiff
86
87   This class can be used to create an HTML table (or a complete HTML file
88   containing the table) showing a side by side, line by line comparison of text
89   with inter-line and intra-line change highlights.  The table can be generated in
90   either full or contextual difference mode.
91
92   The constructor for this class is:
93
94
95   .. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)
96
97      Initializes instance of :class:`HtmlDiff`.
98
99      *tabsize* is an optional keyword argument to specify tab stop spacing and
100      defaults to ``8``.
101
102      *wrapcolumn* is an optional keyword to specify column number where lines are
103      broken and wrapped, defaults to ``None`` where lines are not wrapped.
104
105      *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff`
106      (used by :class:`HtmlDiff` to generate the side by side HTML differences).  See
107      :func:`ndiff` documentation for argument default values and descriptions.
108
109   The following methods are public:
110
111   .. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \
112                         numlines=5, *, charset='utf-8')
113
114      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
115      is a complete HTML file containing a table showing line by line differences with
116      inter-line and intra-line changes highlighted.
117
118      *fromdesc* and *todesc* are optional keyword arguments to specify from/to file
119      column header strings (both default to an empty string).
120
121      *context* and *numlines* are both optional keyword arguments. Set *context* to
122      ``True`` when contextual differences are to be shown, else the default is
123      ``False`` to show the full files. *numlines* defaults to ``5``.  When *context*
124      is ``True`` *numlines* controls the number of context lines which surround the
125      difference highlights.  When *context* is ``False`` *numlines* controls the
126      number of lines which are shown before a difference highlight when using the
127      "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
128      the next difference highlight at the top of the browser without any leading
129      context).
130
131      .. note::
132         *fromdesc* and *todesc* are interpreted as unescaped HTML and should be
133         properly escaped while receiving input from untrusted sources.
134
135      .. versionchanged:: 3.5
136         *charset* keyword-only argument was added.  The default charset of
137         HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``.
138
139   .. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)
140
141      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
142      is a complete HTML table showing line by line differences with inter-line and
143      intra-line changes highlighted.
144
145      The arguments for this method are the same as those for the :meth:`make_file`
146      method.
147
148   :file:`Tools/scripts/diff.py` is a command-line front-end to this class and
149   contains a good example of its use.
150
151
152.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')
153
154   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
155   generating the delta lines) in context diff format.
156
157   Context diffs are a compact way of showing just the lines that have changed plus
158   a few lines of context.  The changes are shown in a before/after style.  The
159   number of context lines is set by *n* which defaults to three.
160
161   By default, the diff control lines (those with ``***`` or ``---``) are created
162   with a trailing newline.  This is helpful so that inputs created from
163   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
164   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
165   newlines.
166
167   For inputs that do not have trailing newlines, set the *lineterm* argument to
168   ``""`` so that the output will be uniformly newline free.
169
170   The context diff format normally has a header for filenames and modification
171   times.  Any or all of these may be specified using strings for *fromfile*,
172   *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
173   expressed in the ISO 8601 format. If not specified, the
174   strings default to blanks.
175
176      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
177      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
178      >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py', tofile='after.py'))
179      *** before.py
180      --- after.py
181      ***************
182      *** 1,4 ****
183      ! bacon
184      ! eggs
185      ! ham
186        guido
187      --- 1,4 ----
188      ! python
189      ! eggy
190      ! hamster
191        guido
192
193   See :ref:`difflib-interface` for a more detailed example.
194
195
196.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)
197
198   Return a list of the best "good enough" matches.  *word* is a sequence for which
199   close matches are desired (typically a string), and *possibilities* is a list of
200   sequences against which to match *word* (typically a list of strings).
201
202   Optional argument *n* (default ``3``) is the maximum number of close matches to
203   return; *n* must be greater than ``0``.
204
205   Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
206   Possibilities that don't score at least that similar to *word* are ignored.
207
208   The best (no more than *n*) matches among the possibilities are returned in a
209   list, sorted by similarity score, most similar first.
210
211      >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
212      ['apple', 'ape']
213      >>> import keyword
214      >>> get_close_matches('wheel', keyword.kwlist)
215      ['while']
216      >>> get_close_matches('pineapple', keyword.kwlist)
217      []
218      >>> get_close_matches('accept', keyword.kwlist)
219      ['except']
220
221
222.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)
223
224   Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
225   delta (a :term:`generator` generating the delta lines).
226
227   Optional keyword parameters *linejunk* and *charjunk* are filtering functions
228   (or ``None``):
229
230   *linejunk*: A function that accepts a single string argument, and returns
231   true if the string is junk, or false if not. The default is ``None``. There
232   is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines
233   without visible characters, except for at most one pound character (``'#'``)
234   -- however the underlying :class:`SequenceMatcher` class does a dynamic
235   analysis of which lines are so frequent as to constitute noise, and this
236   usually works better than using this function.
237
238   *charjunk*: A function that accepts a character (a string of length 1), and
239   returns if the character is junk, or false if not. The default is module-level
240   function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
241   blank or tab; it's a bad idea to include newline in this!).
242
243   :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
244
245      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
246      ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
247      >>> print(''.join(diff), end="")
248      - one
249      ?  ^
250      + ore
251      ?  ^
252      - two
253      - three
254      ?  -
255      + tree
256      + emu
257
258
259.. function:: restore(sequence, which)
260
261   Return one of the two sequences that generated a delta.
262
263   Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract
264   lines originating from file 1 or 2 (parameter *which*), stripping off line
265   prefixes.
266
267   Example:
268
269      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
270      ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
271      >>> diff = list(diff) # materialize the generated delta into a list
272      >>> print(''.join(restore(diff, 1)), end="")
273      one
274      two
275      three
276      >>> print(''.join(restore(diff, 2)), end="")
277      ore
278      tree
279      emu
280
281
282.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')
283
284   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
285   generating the delta lines) in unified diff format.
286
287   Unified diffs are a compact way of showing just the lines that have changed plus
288   a few lines of context.  The changes are shown in an inline style (instead of
289   separate before/after blocks).  The number of context lines is set by *n* which
290   defaults to three.
291
292   By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
293   created with a trailing newline.  This is helpful so that inputs created from
294   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
295   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
296   newlines.
297
298   For inputs that do not have trailing newlines, set the *lineterm* argument to
299   ``""`` so that the output will be uniformly newline free.
300
301   The context diff format normally has a header for filenames and modification
302   times.  Any or all of these may be specified using strings for *fromfile*,
303   *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
304   expressed in the ISO 8601 format. If not specified, the
305   strings default to blanks.
306
307
308      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
309      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
310      >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
311      --- before.py
312      +++ after.py
313      @@ -1,4 +1,4 @@
314      -bacon
315      -eggs
316      -ham
317      +python
318      +eggy
319      +hamster
320       guido
321
322   See :ref:`difflib-interface` for a more detailed example.
323
324.. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\\n')
325
326   Compare *a* and *b* (lists of bytes objects) using *dfunc*; yield a
327   sequence of delta lines (also bytes) in the format returned by *dfunc*.
328   *dfunc* must be a callable, typically either :func:`unified_diff` or
329   :func:`context_diff`.
330
331   Allows you to compare data with unknown or inconsistent encoding. All
332   inputs except *n* must be bytes objects, not str. Works by losslessly
333   converting all inputs (except *n*) to str, and calling ``dfunc(a, b,
334   fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``. The output of
335   *dfunc* is then converted back to bytes, so the delta lines that you
336   receive have the same unknown/inconsistent encodings as *a* and *b*.
337
338   .. versionadded:: 3.5
339
340.. function:: IS_LINE_JUNK(line)
341
342   Return ``True`` for ignorable lines.  The line *line* is ignorable if *line* is
343   blank or contains a single ``'#'``, otherwise it is not ignorable.  Used as a
344   default for parameter *linejunk* in :func:`ndiff` in older versions.
345
346
347.. function:: IS_CHARACTER_JUNK(ch)
348
349   Return ``True`` for ignorable characters.  The character *ch* is ignorable if *ch*
350   is a space or tab, otherwise it is not ignorable.  Used as a default for
351   parameter *charjunk* in :func:`ndiff`.
352
353
354.. seealso::
355
356   `Pattern Matching: The Gestalt Approach <http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_
357      Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
358      was published in `Dr. Dobb's Journal <http://www.drdobbs.com/>`_ in July, 1988.
359
360
361.. _sequence-matcher:
362
363SequenceMatcher Objects
364-----------------------
365
366The :class:`SequenceMatcher` class has this constructor:
367
368
369.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
370
371   Optional argument *isjunk* must be ``None`` (the default) or a one-argument
372   function that takes a sequence element and returns true if and only if the
373   element is "junk" and should be ignored. Passing ``None`` for *isjunk* is
374   equivalent to passing ``lambda x: False``; in other words, no elements are ignored.
375   For example, pass::
376
377      lambda x: x in " \t"
378
379   if you're comparing lines as sequences of characters, and don't want to synch up
380   on blanks or hard tabs.
381
382   The optional arguments *a* and *b* are sequences to be compared; both default to
383   empty strings.  The elements of both sequences must be :term:`hashable`.
384
385   The optional argument *autojunk* can be used to disable the automatic junk
386   heuristic.
387
388   .. versionadded:: 3.2
389      The *autojunk* parameter.
390
391   SequenceMatcher objects get three data attributes: *bjunk* is the
392   set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of
393   non-junk elements considered popular by the heuristic (if it is not
394   disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
395   of positions where they occur. All three are reset whenever *b* is reset
396   with :meth:`set_seqs` or :meth:`set_seq2`.
397
398   .. versionadded:: 3.2
399      The *bjunk* and *bpopular* attributes.
400
401   :class:`SequenceMatcher` objects have the following methods:
402
403   .. method:: set_seqs(a, b)
404
405      Set the two sequences to be compared.
406
407   :class:`SequenceMatcher` computes and caches detailed information about the
408   second sequence, so if you want to compare one sequence against many
409   sequences, use :meth:`set_seq2` to set the commonly used sequence once and
410   call :meth:`set_seq1` repeatedly, once for each of the other sequences.
411
412
413   .. method:: set_seq1(a)
414
415      Set the first sequence to be compared.  The second sequence to be compared
416      is not changed.
417
418
419   .. method:: set_seq2(b)
420
421      Set the second sequence to be compared.  The first sequence to be compared
422      is not changed.
423
424
425   .. method:: find_longest_match(alo, ahi, blo, bhi)
426
427      Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
428
429      If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
430      ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
431      <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
432      k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
433      <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
434      all maximal matching blocks, return one that starts earliest in *a*, and
435      of all those maximal matching blocks that start earliest in *a*, return
436      the one that starts earliest in *b*.
437
438         >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
439         >>> s.find_longest_match(0, 5, 0, 9)
440         Match(a=0, b=4, size=5)
441
442      If *isjunk* was provided, first the longest matching block is determined
443      as above, but with the additional restriction that no junk element appears
444      in the block.  Then that block is extended as far as possible by matching
445      (only) junk elements on both sides. So the resulting block never matches
446      on junk except as identical junk happens to be adjacent to an interesting
447      match.
448
449      Here's the same example as before, but considering blanks to be junk. That
450      prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
451      second sequence directly.  Instead only the ``'abcd'`` can match, and
452      matches the leftmost ``'abcd'`` in the second sequence:
453
454         >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
455         >>> s.find_longest_match(0, 5, 0, 9)
456         Match(a=1, b=0, size=4)
457
458      If no blocks match, this returns ``(alo, blo, 0)``.
459
460      This method returns a :term:`named tuple` ``Match(a, b, size)``.
461
462
463   .. method:: get_matching_blocks()
464
465      Return list of triples describing non-overlapping matching subsequences.
466      Each triple is of the form ``(i, j, n)``,
467      and means that ``a[i:i+n] == b[j:j+n]``.  The
468      triples are monotonically increasing in *i* and *j*.
469
470      The last triple is a dummy, and has the value ``(len(a), len(b), 0)``.  It
471      is the only triple with ``n == 0``.  If ``(i, j, n)`` and ``(i', j', n')``
472      are adjacent triples in the list, and the second is not the last triple in
473      the list, then ``i+n < i'`` or ``j+n < j'``; in other words, adjacent
474      triples always describe non-adjacent equal blocks.
475
476      .. XXX Explain why a dummy is used!
477
478      .. doctest::
479
480         >>> s = SequenceMatcher(None, "abxcd", "abcd")
481         >>> s.get_matching_blocks()
482         [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
483
484
485   .. method:: get_opcodes()
486
487      Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is
488      of the form ``(tag, i1, i2, j1, j2)``.  The first tuple has ``i1 == j1 ==
489      0``, and remaining tuples have *i1* equal to the *i2* from the preceding
490      tuple, and, likewise, *j1* equal to the previous *j2*.
491
492      The *tag* values are strings, with these meanings:
493
494      +---------------+---------------------------------------------+
495      | Value         | Meaning                                     |
496      +===============+=============================================+
497      | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
498      |               | ``b[j1:j2]``.                               |
499      +---------------+---------------------------------------------+
500      | ``'delete'``  | ``a[i1:i2]`` should be deleted.  Note that  |
501      |               | ``j1 == j2`` in this case.                  |
502      +---------------+---------------------------------------------+
503      | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
504      |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
505      |               | this case.                                  |
506      +---------------+---------------------------------------------+
507      | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
508      |               | are equal).                                 |
509      +---------------+---------------------------------------------+
510
511      For example::
512
513        >>> a = "qabxcd"
514        >>> b = "abycdf"
515        >>> s = SequenceMatcher(None, a, b)
516        >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
517        ...     print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
518        ...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
519        delete    a[0:1] --> b[0:0]      'q' --> ''
520        equal     a[1:3] --> b[0:2]     'ab' --> 'ab'
521        replace   a[3:4] --> b[2:3]      'x' --> 'y'
522        equal     a[4:6] --> b[3:5]     'cd' --> 'cd'
523        insert    a[6:6] --> b[5:6]       '' --> 'f'
524
525
526   .. method:: get_grouped_opcodes(n=3)
527
528      Return a :term:`generator` of groups with up to *n* lines of context.
529
530      Starting with the groups returned by :meth:`get_opcodes`, this method
531      splits out smaller change clusters and eliminates intervening ranges which
532      have no changes.
533
534      The groups are returned in the same format as :meth:`get_opcodes`.
535
536
537   .. method:: ratio()
538
539      Return a measure of the sequences' similarity as a float in the range [0,
540      1].
541
542      Where T is the total number of elements in both sequences, and M is the
543      number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
544      sequences are identical, and ``0.0`` if they have nothing in common.
545
546      This is expensive to compute if :meth:`get_matching_blocks` or
547      :meth:`get_opcodes` hasn't already been called, in which case you may want
548      to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
549      upper bound.
550
551      .. note::
552
553         Caution: The result of a :meth:`ratio` call may depend on the order of
554         the arguments. For instance::
555
556            >>> SequenceMatcher(None, 'tide', 'diet').ratio()
557            0.25
558            >>> SequenceMatcher(None, 'diet', 'tide').ratio()
559            0.5
560
561
562   .. method:: quick_ratio()
563
564      Return an upper bound on :meth:`ratio` relatively quickly.
565
566
567   .. method:: real_quick_ratio()
568
569      Return an upper bound on :meth:`ratio` very quickly.
570
571
572The three methods that return the ratio of matching to total characters can give
573different results due to differing levels of approximation, although
574:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
575:meth:`ratio`:
576
577   >>> s = SequenceMatcher(None, "abcd", "bcde")
578   >>> s.ratio()
579   0.75
580   >>> s.quick_ratio()
581   0.75
582   >>> s.real_quick_ratio()
583   1.0
584
585
586.. _sequencematcher-examples:
587
588SequenceMatcher Examples
589------------------------
590
591This example compares two strings, considering blanks to be "junk":
592
593   >>> s = SequenceMatcher(lambda x: x == " ",
594   ...                     "private Thread currentThread;",
595   ...                     "private volatile Thread currentThread;")
596
597:meth:`ratio` returns a float in [0, 1], measuring the similarity of the
598sequences.  As a rule of thumb, a :meth:`ratio` value over 0.6 means the
599sequences are close matches:
600
601   >>> print(round(s.ratio(), 3))
602   0.866
603
604If you're only interested in where the sequences match,
605:meth:`get_matching_blocks` is handy:
606
607   >>> for block in s.get_matching_blocks():
608   ...     print("a[%d] and b[%d] match for %d elements" % block)
609   a[0] and b[0] match for 8 elements
610   a[8] and b[17] match for 21 elements
611   a[29] and b[38] match for 0 elements
612
613Note that the last tuple returned by :meth:`get_matching_blocks` is always a
614dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
615tuple element (number of elements matched) is ``0``.
616
617If you want to know how to change the first sequence into the second, use
618:meth:`get_opcodes`:
619
620   >>> for opcode in s.get_opcodes():
621   ...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
622    equal a[0:8] b[0:8]
623   insert a[8:8] b[8:17]
624    equal a[8:29] b[17:38]
625
626.. seealso::
627
628   * The :func:`get_close_matches` function in this module which shows how
629     simple code building on :class:`SequenceMatcher` can be used to do useful
630     work.
631
632   * `Simple version control recipe
633     <https://code.activestate.com/recipes/576729/>`_ for a small application
634     built with :class:`SequenceMatcher`.
635
636
637.. _differ-objects:
638
639Differ Objects
640--------------
641
642Note that :class:`Differ`\ -generated deltas make no claim to be **minimal**
643diffs. To the contrary, minimal diffs are often counter-intuitive, because they
644synch up anywhere possible, sometimes accidental matches 100 pages apart.
645Restricting synch points to contiguous matches preserves some notion of
646locality, at the occasional cost of producing a longer diff.
647
648The :class:`Differ` class has this constructor:
649
650
651.. class:: Differ(linejunk=None, charjunk=None)
652   :noindex:
653
654   Optional keyword parameters *linejunk* and *charjunk* are for filter functions
655   (or ``None``):
656
657   *linejunk*: A function that accepts a single string argument, and returns true
658   if the string is junk.  The default is ``None``, meaning that no line is
659   considered junk.
660
661   *charjunk*: A function that accepts a single character argument (a string of
662   length 1), and returns true if the character is junk. The default is ``None``,
663   meaning that no character is considered junk.
664
665   These junk-filtering functions speed up matching to find
666   differences and do not cause any differing lines or characters to
667   be ignored.  Read the description of the
668   :meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
669   parameter for an explanation.
670
671   :class:`Differ` objects are used (deltas generated) via a single method:
672
673
674   .. method:: Differ.compare(a, b)
675
676      Compare two sequences of lines, and generate the delta (a sequence of lines).
677
678      Each sequence must contain individual single-line strings ending with
679      newlines.  Such sequences can be obtained from the
680      :meth:`~io.IOBase.readlines` method of file-like objects.  The delta
681      generated also consists of newline-terminated strings, ready to be
682      printed as-is via the :meth:`~io.IOBase.writelines` method of a
683      file-like object.
684
685
686.. _differ-examples:
687
688Differ Example
689--------------
690
691This example compares two texts. First we set up the texts, sequences of
692individual single-line strings ending with newlines (such sequences can also be
693obtained from the :meth:`~io.BaseIO.readlines` method of file-like objects):
694
695   >>> text1 = '''  1. Beautiful is better than ugly.
696   ...   2. Explicit is better than implicit.
697   ...   3. Simple is better than complex.
698   ...   4. Complex is better than complicated.
699   ... '''.splitlines(keepends=True)
700   >>> len(text1)
701   4
702   >>> text1[0][-1]
703   '\n'
704   >>> text2 = '''  1. Beautiful is better than ugly.
705   ...   3.   Simple is better than complex.
706   ...   4. Complicated is better than complex.
707   ...   5. Flat is better than nested.
708   ... '''.splitlines(keepends=True)
709
710Next we instantiate a Differ object:
711
712   >>> d = Differ()
713
714Note that when instantiating a :class:`Differ` object we may pass functions to
715filter out line and character "junk."  See the :meth:`Differ` constructor for
716details.
717
718Finally, we compare the two:
719
720   >>> result = list(d.compare(text1, text2))
721
722``result`` is a list of strings, so let's pretty-print it:
723
724   >>> from pprint import pprint
725   >>> pprint(result)
726   ['    1. Beautiful is better than ugly.\n',
727    '-   2. Explicit is better than implicit.\n',
728    '-   3. Simple is better than complex.\n',
729    '+   3.   Simple is better than complex.\n',
730    '?     ++\n',
731    '-   4. Complex is better than complicated.\n',
732    '?            ^                     ---- ^\n',
733    '+   4. Complicated is better than complex.\n',
734    '?           ++++ ^                      ^\n',
735    '+   5. Flat is better than nested.\n']
736
737As a single multi-line string it looks like this:
738
739   >>> import sys
740   >>> sys.stdout.writelines(result)
741       1. Beautiful is better than ugly.
742   -   2. Explicit is better than implicit.
743   -   3. Simple is better than complex.
744   +   3.   Simple is better than complex.
745   ?     ++
746   -   4. Complex is better than complicated.
747   ?            ^                     ---- ^
748   +   4. Complicated is better than complex.
749   ?           ++++ ^                      ^
750   +   5. Flat is better than nested.
751
752
753.. _difflib-interface:
754
755A command-line interface to difflib
756-----------------------------------
757
758This example shows how to use difflib to create a ``diff``-like utility.
759It is also contained in the Python source distribution, as
760:file:`Tools/scripts/diff.py`.
761
762.. literalinclude:: ../../Tools/scripts/diff.py
763