1# Code dedicated to the computation and properties of "stable ranges"
2#
3# These stable ranges are use for obsolescence markers discovery
4#
5# Copyright 2017 Pierre-Yves David <pierre-yves.david@ens-lyon.org>
6#
7# This software may be used and distributed according to the terms of the
8# GNU General Public License version 2 or any later version.
9r"""stable range
10
11General Goals and Properties
12---------------------------
13
14Stable-ranges get useful when some logic needs a recursive way to slice the
15history of a repository in smaller and smaller group of revisions. Here is
16example of such use cases:
17
18* **bundle caching:**
19  With an easy way to slice any subsets of history into stable-ranges, we
20  can cache a small number of bundles covering these ranges and reuse them for
21  other pull operations. Even if the pull operation have different boudaries.
22
23* **metadata discovery:**
24  With a simple way to recursively look at smaller and smaller ranges, an
25  algorithm can do fine-grained discovery of area of history where some mutable
26  metadata differ from one repository to another. Such meta data can be
27  obsolescence markers, CI status, lightweight stag, etc...
28
29To fix these use cases best, stable-ranges need some important properties:
30
31* the total number of ranges needed to cover a full repository is well bounded.
32* the minimal number of ranges to cover an arbitrary subset of the history is well bounded
33* for the same section of history, the range will be the same on any
34  repositories,
35* the ranges are cheap to compute iteratively, each new revisions re-uses the
36  ranges previous revisions uses.
37
38Simple introduction to the Concepts
39-----------------------------------
40
41To keep things simple, let us look at the issue on a linear history::
42
43  A -> B -> C -> D -> E -> F -> G -> H
44
45To make sure we have range that cover each part of the history with a good
46granularity we use some binary recursion. The standard stable range will be:
47
48 [A -> B -> C -> D -> E -> F -> G -> H] size 8
49 [A -> B -> C -> D]  [E -> F -> G -> H] size 4
50 [A -> B]  [C -> D]  [E -> F]  [G -> H] size 2
51 [A]  [B]  [C]  [D]  [E]  [F]  [G]  [H] size 1
52
53Well bounded total number of ranges:
54````````````````````````````````````
55
56This binary slicing make sure we keep the total number of stable ranges under control.
57
58As you can see, we have N size 1 ranges. They are trivial and we don't care
59about them. Then we have: N/2 size 2 ranges + N/4 size 4 ranges + N/8 size 8
60ranges, etc... So a total of about "length(repo)" standard ranges.
61
62
63Well bounded number of range to cover a subset:
64```````````````````````````````````````````````
65
66Any subset of the history can be expressed with this standard ranges.
67
68For example, [A, F] subset, can be covered with 2 ranges::
69
70  [A ->[B -> C -> D]  [E -> F]
71
72A less strivial example [B, F], still requires a small number of ranges (3)::
73
74  [B]  [C -> D]  [E -> F]
75
76In practice, any subset can be expressed in at most "2 x log2(length(subset))"
77stable range, well bounded value.
78
79Cheap incremental updates
80`````````````````````````
81
82The scheme describe above result in 2N subranges for a repository is size N. We
83do not want to have to recompute these 2N stable-ranges whenever a new revision
84is added to the repository. To achieve these, the stable-ranges are defined by
85**fixed boundaries** that are independant from the total size of the
86repository. Here is how it looks like in our example.
87
88We start with a repository having only [A, F]. Notice how we still creates
89power of two sized stable range::
90
91  [A -> B -> C -> D]
92  [A -> B]  [C -> D]  [E -> F]
93  [A]  [B]  [C]  [D]  [E]  [F]
94
95This we simply adds a new revision G, we reuse more the range we already have::
96
97  [A -> B -> C -> D]
98  [A -> B]  [C -> D]  [E -> F]
99  [A]  [B]  [C]  [D]  [E]  [F]  [G]
100
101Adding H is a bigger even as we read a new boundary.
102
103  [A -> B -> C -> D -> E -> F -> G -> H]
104  [A -> B -> C -> D]  [E -> F -> G -> H]
105  [A -> B]  [C -> D]  [E -> F]  [G -> H]
106  [A]  [B]  [C]  [D]  [E]  [F]  [G]  [H]
107
108At most, adding a new revision `R` will introduces `log2(length(::R))` new
109stable ranges.
110
111More advanced elements
112----------------------
113
114Of course, the history of repository is not as simple as our linear example. So
115how do we deal with the branching and merging? To do so, we leverage the
116"stable sort" algorithm defined in the `stablesort.py` module. To define the
117stable range that compose a set of revison `::R`, we linearize the space by
118sorting it. The stable sort algorithm has two important property:
119
120First, it give the same result on different repository. So we can use it to
121algorithm involving multiple peers.
122
123Second, in case of merge, it reuse the same order as the parents as much as possible.
124This is important to keep reusing existing stable range as the repository grow.
125
126How are ranges defined?
127```````````````````````
128
129To keep things small, and simple, a stable range always contains the final part
130of a `stablesort(::R)` set of revision. It is defined by two items:
131
132* its head revision, the R in `stablesort(::R)`
133* the size of that range... well almost, for implementation reason, it uses the
134  index of the first included item. Or in other word, the number of excluded
135  initial item in `stablesort(::R)`.
136
137Lets look at a practical case. In our initial example, `[A, B, C, D, E, F, G,
138H]` is H-0; `[E, F, G, H]` is H-4; `[G, H]` is H-6 and `[H]` is H-7.
139
140Let us look at a non linar graph::
141
142 A - B - C - E
143       |    /
144        -D
145
146and assume that `stablesort(::E) = [A, B, C, D, E]`. Then `[A, B, C]` is C-0,
147`[A, B, D]` is D-0; `[D, E]` is E-3, `[E]` is E-4, etc...
148
149Slicing in a non linear context
150```````````````````````````````
151
152Branching can also affect the way we slice things.
153
154The small example above offers a simple example. For a size 5 (starting at the
155root), standard slicing will want a size 4 part and size 1 part. So, in a
156simple linear space `[A, B, C, D, E]` would be sliced as `[A, B, C, D] + [E]`.
157However, in our non-linear case, `[A, B, C, D]` has two heads (C and D) and
158cannot be expressed with a single range. As a result the range will be sliced
159into more sub ranges::
160
161    stdslice(A-0) = [A, B, C] + [D] + [E] = C-0 + D-2 + A-4
162
163Yet, this does not mean ranges containing a merge will always result in slicing
164with many revision. the sub ranges might also silently contains them. Let us
165look at an exemple::
166
167  A - B - C - D - E --- G - H
168        |             /
169         ---------- F
170
171with::
172
173  `stablesort(::H) == [A, B, C, D, E, F, G, H]`
174
175then::
176
177  stdslice(H-0) = [A, B, C, D] + [E, F, G, H] = D-0 + H-4
178
179As a result the non linearity will increase the number of subranges involved,
180but in practice the impact stay limited.
181
182The total number of standard subranges stay under control with about
183`O(log2(N))` new stable range introduced for each new revision. In practice the
184total number of stableranges we have is about `O(length(repo))`
185
186In addition, it is worth nothing that the head of the extra ranges we have to
187use will match the destination of the "jump" cached by the stablesort
188algorithm. So, all this slicing can usually be done without iterating over the
189stable sorted revision.
190
191Caching Strategy
192----------------
193
194The current caching strategy use a very space inefficient sqlite database.
195testing show it often take 100x more space than what basic binary storage would
196take. The sqlite storage was very useful at the proof of concept stage.
197
198Since all new stable-ranges introduced by a revision R will be "R headed". So we
199could easily store their standard subranges alongside the revision information
200to reuse the existing revision index.
201
202Subrange information can be efficiently stored, a naive approach storing all
203stable ranges and their subranges would requires just 2 integer per range + 2
204integer for any extra sub-ranges other than the first and last ones.
205
206We can probably push efficiency further by taking advantage of the large
207overlap in subranges for one non-merge revision to the next. This is probably a
208premature optimisation until we start getting actual result for a naive binary
209storage.
210
211To use this at a large scale, it would be important to compute these data at
212commit time and to exchange them alongside the revision over the network. This
213is similar to what we do for other cached data.
214
215It is also important to note that the smaller ranges can probably be computed
216on the fly instead of being cached. The exact tradeoff would requires some
217field testing.
218
219Performance
220-----------
221
222The current implementation has not been especially optimized for performance.
223The goal was mostly to get the order of magnitude of the algorithm complexity.
224The result are convincing: medium repository get a full cache warming in a
225couple of seconds and even very large and branchy repository get a fully warmed
226in the order of tens of minutes. We do not observes a complexity explosion
227making the algorithm unusable of large repositories.
228
229A better cache implementation combined with an optimized version of the
230algorithm should give much faster performance. Combined with commit-time
231computation and exchange over the network, the overall impact of this should be
232invisible to the user.
233
234The stable range is currently successfully used in production for 2 use cases:
235* obsolescence markers discovery,
236* caching precomputed bundle while serving pulls
237
238practical data
239--------------
240
241The evolve repository:
242
243    number of revisions:          4833
244    number of heads:                15
245    number of merge:               612 ( 12%)
246    number of range:              4826
247      with   2 subranges:         4551 ( 94%)
248      with   3 subranges:          255 (  5%)
249      with   4 subranges:           12 (  0%)
250      with   5 subranges:            7 (  0%)
251      with   8 subranges:            1 (  0%)
252    average range/revs:              0.99
253
254    Estimated approximative size of a naive compact storage:
255           41 056 bytes
256    Current size of the sqlite cache (for comparison):
257        5 312 512 bytes
258
259The mercurial repository:
260
261    number of revisions:         42849
262    number of heads:                 2
263    number of merge:              2647 (  6%)
264    number of range:             41279
265      with   2 subranges:        39740 ( 96%)
266      with   3 subranges:         1494 (  3%)
267      with   4 subranges:           39 (  0%)
268      with   5 subranges:            5 (  0%)
269      with   7 subranges:            1 (  0%)
270    average range/revs:              0.96
271    Estimated approximative size of a naive compact storage:
272           342 968 bytes
273    Current size of the sqlite cache (for comparison):
274        62 803 968 bytes
275
276The pypy repository (very brancy history):
277
278    number of revisions:         97409
279    number of heads:               183
280    number of merge:              8371 (  8%)
281    number of range:            107025
282      with   2 subranges:       100166 ( 93%)
283      with   3 subranges:         5839 (  5%)
284      with   4 subranges:          605 (  0%)
285      with   5 subranges:          189 (  0%)
286      with   6 subranges:           90 (  0%)
287      with   7 subranges:           38 (  0%)
288      with   8 subranges:           18 (  0%)
289      with   9 subranges:            9 (  0%)
290      with  10 subranges:           15 (  0%)
291      with  11 subranges:            4 (  0%)
292      with  12 subranges:            6 (  0%)
293      with  13 subranges:            7 (  0%)
294      with  14 subranges:            6 (  0%)
295      with  15 subranges:            1 (  0%)
296      with  16 subranges:            2 (  0%)
297      with  17 subranges:            2 (  0%)
298      with  18 subranges:            3 (  0%)
299      with  19 subranges:            2 (  0%)
300      with  20 subranges:            3 (  0%)
301      with  25 subranges:            1 (  0%)
302      with  27 subranges:            1 (  0%)
303      with  31 subranges:            3 (  0%)
304      with  32 subranges:            2 (  0%)
305      with  33 subranges:            1 (  0%)
306      with  35 subranges:            1 (  0%)
307      with  43 subranges:            1 (  0%)
308      with  44 subranges:            1 (  0%)
309      with  45 subranges:            2 (  0%)
310      with  47 subranges:            1 (  0%)
311      with  51 subranges:            1 (  0%)
312      with  52 subranges:            1 (  0%)
313      with  57 subranges:            1 (  0%)
314      with  65 subranges:            1 (  0%)
315      with  73 subranges:            1 (  0%)
316      with  79 subranges:            1 (  0%)
317    average range/revs:              1.10
318    Estimated approximative size of a naive compact storage:
319          934 176 bytes
320    Current size of the sqlite cache (for comparison):
321      201 236 480 bytes
322
323A private and branchy repository:
324
325    number of revisions:        605011
326    number of heads:             14061
327    number of merge:            118109 ( 19%)
328    number of range:            747625
329      with   2 subranges:       595985 ( 79%)
330      with   3 subranges:       130196 ( 17%)
331      with   4 subranges:        14093 (  1%)
332      with   5 subranges:         4090 (  0%)
333      with   6 subranges:          741 (  0%)
334      with   7 subranges:          826 (  0%)
335      with   8 subranges:         1313 (  0%)
336      with   9 subranges:           83 (  0%)
337      with  10 subranges:           22 (  0%)
338      with  11 subranges:            9 (  0%)
339      with  12 subranges:           26 (  0%)
340      with  13 subranges:            5 (  0%)
341      with  14 subranges:            9 (  0%)
342      with  15 subranges:            3 (  0%)
343      with  16 subranges:          212 (  0%)
344      with  18 subranges:            6 (  0%)
345      with  19 subranges:            3 (  0%)
346      with  24 subranges:            1 (  0%)
347      with  27 subranges:            1 (  0%)
348      with  32 subranges:            1 (  0%)
349    average range/revs:              1.23
350    Estimated approximative size of a naive compact storage:
351        7 501 928 bytes
352    Current size of the sqlite cache (for comparison):
353    1 950 310 400 bytes
354"""
355
356import abc
357import functools
358import heapq
359import math
360import time
361
362from mercurial import (
363    error,
364    node as nodemod,
365    scmutil,
366    util,
367)
368
369from mercurial.i18n import _
370
371from . import (
372    exthelper,
373    firstmergecache,
374    stablesort,
375    utility,
376)
377
378filterparents = utility.filterparents
379
380eh = exthelper.exthelper()
381
382
383#################################
384### Stable Range computation  ###
385#################################
386
387def _hlp2(i):
388    """return highest power of two lower than 'i'"""
389    return 2 ** int(math.log(i - 1, 2))
390
391def subrangesclosure(repo, stablerange, heads):
392    """set of all standard subrange under heads
393
394    This is intended for debug purposes. Range are returned from largest to
395    smallest in terms of number of revision it contains."""
396    subranges = stablerange.subranges
397    toproceed = [(r, 0, ) for r in heads]
398    ranges = set(toproceed)
399    while toproceed:
400        entry = toproceed.pop()
401        for r in subranges(repo, entry):
402            if r not in ranges:
403                ranges.add(r)
404                toproceed.append(r)
405    ranges = list(ranges)
406    n = repo.changelog.node
407    rangelength = stablerange.rangelength
408    ranges.sort(key=lambda r: (-rangelength(repo, r), n(r[0])))
409    return ranges
410
411_stablerangemethodmap = {
412    b'branchpoint': lambda repo: stablerange(),
413    b'default': lambda repo: repo.stablerange,
414    b'basic-branchpoint': lambda repo: stablerangebasic(),
415    b'basic-mergepoint': lambda repo: stablerangedummy_mergepoint(),
416    b'mergepoint': lambda repo: stablerange_mergepoint(),
417}
418
419@eh.command(
420    b'debugstablerange',
421    [
422        (b'r', b'rev', [], b'operate on (rev, 0) ranges for rev in REVS'),
423        (b'', b'subranges', False, b'recursively display data for subranges too'),
424        (b'', b'verify', False, b'checks subranges content (EXPENSIVE)'),
425        (b'', b'method', b'mergepoint',
426         b'method to use, one of "branchpoint", "mergepoint"')
427    ],
428    _(b''))
429def debugstablerange(ui, repo, **opts):
430    """display standard stable subrange for a set of ranges
431
432    Range as displayed as '<node>-<index> (<rev>, <depth>, <length>)', use
433    --verbose to get the extra details in ().
434    """
435    short = nodemod.short
436    revs = scmutil.revrange(repo, opts['rev'])
437    if not revs:
438        raise error.Abort(b'no revisions specified')
439    if ui.verbose:
440        template = b'%s-%d (%d, %d, %d)'
441
442        def _rangestring(repo, rangeid):
443            return template % (
444                short(node(rangeid[0])),
445                rangeid[1],
446                rangeid[0],
447                depth(unfi, rangeid[0]),
448                length(unfi, rangeid)
449            )
450    else:
451        template = b'%s-%d'
452
453        def _rangestring(repo, rangeid):
454            return template % (
455                short(node(rangeid[0])),
456                rangeid[1],
457            )
458    # prewarm depth cache
459    unfi = repo.unfiltered()
460    node = unfi.changelog.node
461
462    method = opts['method']
463    getstablerange = _stablerangemethodmap.get(method)
464    if getstablerange is None:
465        raise error.Abort(b'unknown stable sort method: "%s"' % method)
466
467    stablerange = getstablerange(unfi)
468    depth = stablerange.depthrev
469    length = stablerange.rangelength
470    subranges = stablerange.subranges
471    stablerange.warmup(repo, max(revs))
472
473    if opts['subranges']:
474        ranges = subrangesclosure(unfi, stablerange, revs)
475    else:
476        ranges = [(r, 0) for r in revs]
477
478    for r in ranges:
479        subs = subranges(unfi, r)
480        subsstr = b', '.join(_rangestring(unfi, s) for s in subs)
481        rstr = _rangestring(unfi, r)
482        if opts['verify']:
483            status = b'leaf'
484            if 1 < length(unfi, r):
485                status = b'complete'
486                revs = set(stablerange.revsfromrange(unfi, r))
487                subrevs = set()
488                for s in subs:
489                    subrevs.update(stablerange.revsfromrange(unfi, s))
490                if revs != subrevs:
491                    status = b'missing'
492            ui.status(b'%s [%s] - %s\n' % (rstr, status, subsstr))
493        else:
494            ui.status(b'%s - %s\n' % (rstr, subsstr))
495
496class abstractstablerange(object):
497    """The official API for a stablerange"""
498
499    __metaclass__ = abc.ABCMeta
500
501    @abc.abstractmethod
502    def subranges(self, repo, rangeid):
503        """return the stable sub-ranges of a rangeid"""
504        raise NotImplementedError()
505
506    @abc.abstractmethod
507    def revsfromrange(self, repo, rangeid):
508        """return revision contained in a range"""
509        raise NotImplementedError()
510
511    @abc.abstractmethod
512    def depthrev(self, repo, rev):
513        """depth a revision"""
514        # Exist to allow basic implementation to ignore the depthcache
515        # Could be demoted to _depthrev.
516        raise NotImplementedError()
517
518    @abc.abstractmethod
519    def warmup(self, repo, upto=None):
520        """warmup the stable range cache"""
521        raise NotImplementedError()
522
523    @abc.abstractmethod
524    def rangelength(self, repo, rangeid):
525        """number of revision in <range>"""
526        raise NotImplementedError()
527
528    def _slicepoint(self, repo, rangeid):
529        """find the standard slicing point for a range"""
530        rangedepth = self.depthrev(repo, rangeid[0])
531        step = _hlp2(rangedepth)
532        standard_start = 0
533        while standard_start < rangeid[1] and 0 < step:
534            if standard_start + step < rangedepth:
535                standard_start += step
536            step //= 2
537        if rangeid[1] == standard_start:
538            slicesize = _hlp2(self.rangelength(repo, rangeid))
539            slicepoint = rangeid[1] + slicesize
540        else:
541            assert standard_start < rangedepth
542            slicepoint = standard_start
543        return slicepoint
544class stablerangebasic(abstractstablerange):
545    """a very dummy implementation of stablerange
546
547    the implementation is here to lay down the basic algorithm in the stable
548    range in a inefficient but easy to read manners. It should be used by test
549    to validate output."""
550
551    __metaclass__ = abc.ABCMeta
552
553    def _sortfunction(self, repo, headrev):
554        return stablesort.stablesort_branchpoint(repo, [headrev])
555
556    def warmup(self, repo, upto=None):
557        # no cache to warm for basic implementation
558        pass
559
560    def depthrev(self, repo, rev):
561        """depth a revision"""
562        return len(repo.revs(b'::%d', rev))
563
564    def revsfromrange(self, repo, rangeid):
565        """return revision contained in a range
566
567        The range `(<head>, <skip>)` contains all revisions stable-sorted from
568        <head>, skipping the <index>th lower revisions.
569        """
570        headrev, index = rangeid[0], rangeid[1]
571        revs = self._sortfunction(repo, headrev)
572        return revs[index:]
573
574    def rangelength(self, repo, rangeid):
575        """number of revision in <range>"""
576        return len(self.revsfromrange(repo, rangeid))
577
578    def subranges(self, repo, rangeid):
579        """return the stable sub-ranges of a rangeid"""
580        headrev, index = rangeid[0], rangeid[1]
581        if self.rangelength(repo, rangeid) == 1:
582            return []
583        slicepoint = self._slicepoint(repo, rangeid)
584
585        # search for range defining the lower set of revision
586        #
587        # we walk the lower set from the top following the stable order of the
588        # current "head" of the lower range.
589        #
590        # As soon as the revision in the lowerset diverges from the one in the
591        # range being generated, we emit the range and start a new one.
592        result = []
593        lowerrevs = self.revsfromrange(repo, rangeid)[:(slicepoint - index)]
594        head = None
595        headrange = None
596        skip = None
597        for rev in lowerrevs[::-1]:
598            if head is None:
599                head = rev
600                headrange = self.revsfromrange(repo, (head, 0))
601                skip = self.depthrev(repo, rev) - 1
602            elif rev != headrange[skip - 1]:
603                result.append((head, skip))
604                head = rev
605                headrange = self.revsfromrange(repo, (head, 0))
606                skip = self.depthrev(repo, rev) - 1
607            else:
608                skip -= 1
609        result.append((head, skip))
610
611        result.reverse()
612
613        # top part is trivial
614        top = (headrev, slicepoint)
615        result.append(top)
616
617        # double check the result
618        initialrevs = self.revsfromrange(repo, rangeid)
619        subrangerevs = sum((self.revsfromrange(repo, sub) for sub in result),
620                           [])
621        assert initialrevs == subrangerevs
622        return result
623
624class stablerangedummy_mergepoint(stablerangebasic):
625    """a very dummy implementation of stablerange use 'mergepoint' sorting
626    """
627
628    def _sortfunction(self, repo, headrev):
629        return stablesort.stablesort_mergepoint_head_basic(repo, [headrev])
630
631class stablerangecached(abstractstablerange):
632    """an implementation of stablerange using caching"""
633
634    __metaclass__ = abc.ABCMeta
635
636    def __init__(self):
637        # cache the standard stable subranges or a range
638        self._subrangescache = {}
639        super(stablerangecached, self).__init__()
640
641    def depthrev(self, repo, rev):
642        return repo.depthcache.get(rev)
643
644    def rangelength(self, repo, rangeid):
645        """number of revision in <range>"""
646        headrev, index = rangeid[0], rangeid[1]
647        return self.depthrev(repo, headrev) - index
648
649    def subranges(self, repo, rangeid):
650        assert 0 <= rangeid[1] <= rangeid[0], rangeid
651        cached = self._getsub(rangeid)
652        if cached is not None:
653            return cached
654        value = self._subranges(repo, rangeid)
655        self._setsub(rangeid, value)
656        return value
657
658    def _getsub(self, rev):
659        """utility function used to access the subranges cache
660
661        This mostly exist to help the on disk persistence"""
662        return self._subrangescache.get(rev)
663
664    def _setsub(self, rev, value):
665        """utility function used to set the subranges cache
666
667        This mostly exist to help the on disk persistence."""
668        self._subrangescache[rev] = value
669
670class stablerange_mergepoint(stablerangecached):
671    """Stablerange implementation using 'mergepoint' based sorting
672    """
673
674    def __init__(self):
675        super(stablerange_mergepoint, self).__init__()
676
677    def warmup(self, repo, upto=None):
678        # no cache to warm for basic implementation
679        pass
680
681    def revsfromrange(self, repo, rangeid):
682        """return revision contained in a range
683
684        The range `(<head>, <skip>)` contains all revisions stable-sorted from
685        <head>, skipping the <index>th lower revisions.
686        """
687        limit = self.rangelength(repo, rangeid)
688        return repo.stablesort.get(repo, rangeid[0], limit=limit)
689
690    def _stableparent(self, repo, headrev):
691        """The parent of the changeset with reusable subrange
692
693        For non-merge it is simple, there is a single parent. For Mercurial we
694        have to find the right one. Since the stable sort use merge-point, we
695        know that one of REV parents stable sort is a subset of REV stable
696        sort. In other word:
697
698            sort(::REV) = sort(::min(parents(REV))
699                          + sort(only(max(parents(REV)), min(parents(REV)))
700                          + [REV]
701
702        We are looking for that `min(parents(REV))`. Since the subrange are
703        based on the sort, we can reuse its subrange as well.
704        """
705        ps = filterparents(repo.changelog.parentrevs(headrev))
706        if not ps:
707            return nodemod.nullrev
708        elif len(ps) == 1:
709            return ps[0]
710        else:
711            tiebreaker = stablesort._mergepoint_tie_breaker(repo)
712            return min(ps, key=tiebreaker)
713
714    def _parentrange(self, repo, rangeid):
715        stable_parent = self._stableparent(repo, rangeid[0])
716        stable_parent_depth = self.depthrev(repo, stable_parent)
717        stable_parent_range = (stable_parent, rangeid[1])
718        return stable_parent_depth, stable_parent_range
719
720    def _warmcachefor(self, repo, rangeid, slicepoint):
721        """warm cache with all the element necessary"""
722        stack = []
723        depth, current = self._parentrange(repo, rangeid)
724        while current not in self._subrangescache and slicepoint < depth:
725            stack.append(current)
726            depth, current = self._parentrange(repo, current)
727        while stack:
728            current = stack.pop()
729            self.subranges(repo, current)
730
731    def _subranges(self, repo, rangeid):
732        headrev, initial_index = rangeid
733        # size 1 range can't be sliced
734        if self.rangelength(repo, rangeid) == 1:
735            return []
736        # find were we need to slice
737        slicepoint = self._slicepoint(repo, rangeid)
738
739        ret = self._slicesrangeat(repo, rangeid, slicepoint)
740
741        return ret
742
743    def _slicesrangeat(self, repo, rangeid, slicepoint):
744        headrev, initial_index = rangeid
745        self._warmcachefor(repo, rangeid, slicepoint)
746
747        stable_parent_data = self._parentrange(repo, rangeid)
748        stable_parent_depth, stable_parent_range = stable_parent_data
749
750        # top range is always the same, so we can build it early for all
751        top_range = (headrev, slicepoint)
752
753        # now find out about the lower range, if we are lucky there is only
754        # one, otherwise we need to issue multiple one to cover every revision
755        # on the lower set. (and cover them only once).
756        if slicepoint == stable_parent_depth:
757            # luckly shot, the parent is actually the head of the lower range
758            subranges = [
759                stable_parent_range,
760                top_range,
761            ]
762        elif slicepoint < stable_parent_depth:
763            # The parent is above the slice point,
764            # it's lower subrange will be the same so we just get them,
765            # (and the top range is always the same)
766            subranges = self.subranges(repo, stable_parent_range)[:]
767            parenttop = subranges.pop()
768            lenparenttop = self.rangelength(repo, parenttop)
769            skimfromparent = stable_parent_depth - slicepoint
770            if lenparenttop < skimfromparent:
771                # dropping the first subrange of the stable parent range is not
772                # enough to skip what we need to skip, change in approach is needed
773                subranges = self._slicesrangeat(repo, stable_parent_range, slicepoint)
774                subranges.pop()
775            elif lenparenttop > skimfromparent:
776                # The first subrange of the parent is longer that what we want
777                # to drop, we need to keep some of it.
778                midranges = self._slicesrangeat(repo, parenttop, slicepoint)
779                subranges.extend(midranges[:-1])
780            subranges.append(top_range)
781        elif initial_index < stable_parent_depth < slicepoint:
782            # the parent is below the range we are considering, we need to
783            # compute these uniques subranges
784            subranges = [stable_parent_range]
785            subranges.extend(self._unique_subranges(repo, headrev,
786                                                    stable_parent_depth,
787                                                    slicepoint))
788            subranges.append(top_range)
789        else:
790            # we cannot reuse the parent range at all
791            subranges = list(self._unique_subranges(repo, headrev,
792                                                    initial_index,
793                                                    slicepoint))
794            subranges.append(top_range)
795
796        ### slow code block to validated the slicing works as expected
797        # toprevs = self.revsfromrange(repo, rangeid)
798        # subrevs = []
799        # for s in subranges:
800        #     subrevs.extend(self.revsfromrange(repo, s))
801        # assert toprevs == subrevs, (rangeid, slicepoint, stable_parent_range, stable_parent_depth, toprevs, subrevs)
802
803        return subranges
804
805    def _unique_subranges(self, repo, headrev, initial_index, slicepoint):
806        """Compute subrange unique to the exclusive part of merge"""
807        result = []
808        depth = repo.depthcache.get
809        nextmerge = repo.firstmergecache.get
810        walkfrom = functools.partial(repo.stablesort.walkfrom, repo)
811        getjumps = functools.partial(repo.stablesort.getjumps, repo)
812        skips = depth(headrev) - slicepoint
813        tomap = slicepoint - initial_index
814
815        jumps = getjumps(headrev)
816        # this function is only caled if headrev is a merge
817        # and initial_index is above its lower parents
818        assert jumps is not None
819        jumps = iter(jumps)
820        assert 0 < skips, skips
821        assert 0 < tomap, (tomap, (headrev, initial_index), slicepoint)
822
823        # utility function to find the next changeset with jump information
824        # (and the distance to it)
825        def nextmergedata(startrev):
826            merge = nextmerge(startrev)
827            depthrev = depth(startrev)
828            if merge == startrev:
829                return 0, startrev
830            elif merge == nodemod.nullrev:
831                return depthrev, None
832            depthmerge = depth(merge)
833            return depthrev - depthmerge, merge
834
835        # skip over all necesary data
836        mainjump = None
837        jumpdest = headrev
838        while 0 < skips:
839            jumphead = jumpdest
840            currentjump = next(jumps)
841            skipped = size = currentjump[2]
842            jumpdest = currentjump[1]
843            if size == skips:
844                jumphead = jumpdest
845                mainjump = next(jumps)
846                mainsize = mainjump[2]
847            elif skips < size:
848                revs = walkfrom(jumphead)
849                next(revs)
850                for i in range(skips):
851                    jumphead = next(revs)
852                    assert jumphead is not None
853                skipped = skips
854                size -= skips
855                mainjump = currentjump
856                mainsize = size
857            skips -= skipped
858        assert skips == 0, skips
859
860        # exiting from the previous block we should have:
861        # jumphead: first non-skipped revision (head of the high subrange)
862        # mainjump: next jump coming jump on main iteration
863        # mainsize: len(mainjump[0]::jumphead)
864
865        # Now we need to compare walk on the main iteration with walk from the
866        # current subrange head. Instead of doing a full walk, we just skim
867        # over the jumps for each iteration.
868        rangehead = jumphead
869        refjumps = None
870        size = 0
871        while size < tomap:
872            assert mainjump is not None
873            if refjumps is None:
874                dist2merge, merge = nextmergedata(jumphead)
875                if (mainsize <= dist2merge) or merge is None:
876                    refjumps = iter(())
877                    ref = None
878                else:
879                    # advance counters
880                    size += dist2merge
881                    mainsize -= dist2merge
882                    jumphead = merge
883                    refjumps = iter(getjumps(merge))
884                    ref = next(refjumps, None)
885            elif ref is not None and mainjump[0:2] == ref[0:2]:
886                # both follow the same path
887                size += mainsize
888                jumphead = mainjump[1]
889                mainjump = next(jumps, None)
890                mainsize = mainjump[2]
891                ref = next(refjumps, None)
892                if ref is None:
893                    # we are doing with section specific to the last merge
894                    # reset `refjumps` to trigger the logic that search for the
895                    # next merge
896                    refjumps = None
897            else:
898                size += mainsize
899                if size < tomap:
900                    subrange = (rangehead, depth(rangehead) - size)
901                    assert subrange[1] < depth(subrange[0])
902                    result.append(subrange)
903                    tomap -= size
904                    size = 0
905                    jumphead = rangehead = mainjump[1]
906                    mainjump = next(jumps, None)
907                    mainsize = mainjump[2]
908                    refjumps = None
909
910        if tomap:
911            subrange = (rangehead, depth(rangehead) - tomap)
912            assert subrange[1] < depth(subrange[0]), (rangehead, depth(rangehead), tomap)
913            result.append(subrange)
914        result.reverse()
915        return result
916
917class stablerange(stablerangecached):
918
919    def __init__(self, lrusize=2000):
920        # The point up to which we have data in cache
921        self._tiprev = None
922        self._tipnode = None
923        # To slices merge, we need to walk their descendant in reverse stable
924        # sort order. For now we perform a full stable sort their descendant
925        # and then use the relevant top most part. This order is going to be
926        # the same for all ranges headed at the same merge. So we cache these
927        # value to reuse them accross the same invocation.
928        self._stablesortcache = util.lrucachedict(lrusize)
929        # something useful to compute the above
930        # mergerev -> stablesort, length
931        self._stablesortprepared = util.lrucachedict(lrusize)
932        # caching parent call # as we do so many of them
933        self._parentscache = {}
934        # The first part of the stable sorted list of revision of a merge will
935        # shared with the one of others. This means we can reuse subranges
936        # computed from that point to compute some of the subranges from the
937        # merge.
938        self._inheritancecache = {}
939        super(stablerange, self).__init__()
940
941    def warmup(self, repo, upto=None):
942        """warm the cache up"""
943        repo = repo.unfiltered()
944        repo.depthcache.update(repo)
945        cl = repo.changelog
946        # subrange should be warmed from head to range to be able to benefit
947        # from revsfromrange cache. otherwise each merge will trigger its own
948        # stablesort.
949        #
950        # we use the revnumber as an approximation for depth
951        ui = repo.ui
952        starttime = util.timer()
953
954        if upto is None:
955            upto = cl.tiprev()
956        if self._tiprev is None:
957            revs = cl.revs(stop=upto)
958            nbrevs = upto + 1
959        else:
960            assert cl.node(self._tiprev) == self._tipnode
961            if upto <= self._tiprev:
962                return
963            revs = cl.revs(start=self._tiprev + 1, stop=upto)
964            nbrevs = upto - self._tiprev
965        rangeheap = []
966        progress = ui.makeprogress(_(b"filling depth cache"), _(b"changesets"), nbrevs)
967        for idx, r in enumerate(revs):
968            if not idx % 1000:
969                progress.update(idx)
970            # warm up depth
971            self.depthrev(repo, r)
972            rangeheap.append((-r, (r, 0)))
973        progress.complete()
974
975        heappop = heapq.heappop
976        heappush = heapq.heappush
977        heapify = heapq.heapify
978
979        original = set(rangeheap)
980        seen = 0
981        # progress report is showing up in the profile for small and fast
982        # repository so we build that complicated work around.
983        progress_each = 100
984        progress_last = time.time()
985        heapify(rangeheap)
986        progress = ui.makeprogress(_(b"filling stablerange cache"), _(b"changesets"), nbrevs)
987        while rangeheap:
988            value = heappop(rangeheap)
989            if value in original:
990                if not seen % progress_each:
991                    # if a lot of time passed, report more often
992                    progress_new = time.time()
993                    if (1 < progress_each) and (0.1 < progress_new - progress_last):
994                        progress_each /= 10
995                    progress.update(seen)
996                    progress_last = progress_new
997                seen += 1
998                original.remove(value) # might have been added from other source
999            __, rangeid = value
1000            if self._getsub(rangeid) is None:
1001                for sub in self.subranges(repo, rangeid):
1002                    if self._getsub(sub) is None:
1003                        heappush(rangeheap, (-sub[0], sub))
1004        progress.complete()
1005
1006        self._tiprev = upto
1007        self._tipnode = cl.node(upto)
1008
1009        duration = util.timer() - starttime
1010        repo.ui.log(b'evoext-cache', b'updated stablerange cache in %.4f seconds\n',
1011                    duration)
1012
1013    def subranges(self, repo, rangeid):
1014        cached = self._getsub(rangeid)
1015        if cached is not None:
1016            return cached
1017        value = self._subranges(repo, rangeid)
1018        self._setsub(rangeid, value)
1019        return value
1020
1021    def revsfromrange(self, repo, rangeid):
1022        headrev, index = rangeid
1023        rangelength = self.rangelength(repo, rangeid)
1024        if rangelength == 1:
1025            revs = [headrev]
1026        else:
1027            # get all revs under heads in stable order
1028            #
1029            # note: In the general case we can just walk down and then request
1030            # data about the merge. But I'm not sure this function will be even
1031            # call for the general case.
1032
1033            allrevs = self._stablesortcache.get(headrev)
1034            if allrevs is None:
1035                allrevs = self._getrevsfrommerge(repo, headrev)
1036                if allrevs is None:
1037                    mc = self._filestablesortcache
1038                    sorting = stablesort.stablesort_branchpoint
1039                    allrevs = sorting(repo, [headrev], mergecallback=mc)
1040                self._stablesortcache[headrev] = allrevs
1041            # takes from index
1042            revs = allrevs[index:]
1043        # sanity checks
1044        assert len(revs) == rangelength
1045        return revs
1046
1047    def _parents(self, rev, func):
1048        parents = self._parentscache.get(rev)
1049        if parents is None:
1050            parents = func(rev)
1051            self._parentscache[rev] = parents
1052        return parents
1053
1054    def _getsub(self, rev):
1055        """utility function used to access the subranges cache
1056
1057        This mostly exist to help the on disk persistence"""
1058        return self._subrangescache.get(rev)
1059
1060    def _setsub(self, rev, value):
1061        """utility function used to set the subranges cache
1062
1063        This mostly exist to help the on disk persistence."""
1064        self._subrangescache[rev] = value
1065
1066    def _filestablesortcache(self, sortedrevs, merge):
1067        if merge not in self._stablesortprepared:
1068            self._stablesortprepared[merge] = (sortedrevs, len(sortedrevs))
1069
1070    def _getrevsfrommerge(self, repo, merge):
1071        prepared = self._stablesortprepared.get(merge)
1072        if prepared is None:
1073            return None
1074
1075        mergedepth = self.depthrev(repo, merge)
1076        allrevs = prepared[0][:prepared[1]]
1077        nbextrarevs = prepared[1] - mergedepth
1078        if not nbextrarevs:
1079            return allrevs
1080
1081        anc = repo.changelog.ancestors([merge], inclusive=True)
1082        top = []
1083        counter = nbextrarevs
1084        for rev in reversed(allrevs):
1085            if rev in anc:
1086                top.append(rev)
1087            else:
1088                counter -= 1
1089                if counter <= 0:
1090                    break
1091
1092        bottomidx = prepared[1] - (nbextrarevs + len(top))
1093        revs = allrevs[:bottomidx]
1094        revs.extend(reversed(top))
1095        return revs
1096
1097    def _inheritancepoint(self, repo, merge):
1098        """Find the inheritance point of a Merge
1099
1100        The first part of the stable sorted list of revision of a merge will shared with
1101        the one of others. This means we can reuse subranges computed from that point to
1102        compute some of the subranges from the merge.
1103
1104        That point is latest point in the stable sorted list where the depth of the
1105        revisions match its index (that means all revision earlier in the stable sorted
1106        list are its ancestors, no dangling unrelated branches exists).
1107        """
1108        value = self._inheritancecache.get(merge)
1109        if value is None:
1110            revs = self.revsfromrange(repo, (merge, 0))
1111            i = reversed(revs)
1112            next(i) # pop the merge
1113            expected = len(revs) - 1
1114            # Since we do warmup properly, we can expect the cache to be hot
1115            # for everythin under the merge we investigate
1116            cache = repo.depthcache
1117            # note: we cannot do a binary search because element under the
1118            # inherited point might have mismatching depth because of inner
1119            # branching.
1120            for rev in i:
1121                if cache.get(rev) == expected:
1122                    break
1123                expected -= 1
1124            value = (expected - 1, rev)
1125            self._inheritancecache[merge] = value
1126        return value
1127
1128    def _subranges(self, repo, rangeid):
1129        if self.rangelength(repo, rangeid) == 1:
1130            return []
1131        slicepoint = self._slicepoint(repo, rangeid)
1132
1133        # make sure we have cache for all relevant parent first to prevent
1134        # recursion (python is bad with recursion
1135        stack = []
1136        current = rangeid
1137        while current is not None:
1138            current = self._cold_reusable(repo, current, slicepoint)
1139            if current is not None:
1140                stack.append(current)
1141        while stack:
1142            # these call will directly compute the subranges
1143            self.subranges(repo, stack.pop())
1144        return self._slicesrangeat(repo, rangeid, slicepoint)
1145
1146    def _cold_reusable(self, repo, rangeid, slicepoint):
1147        """return parent range that it would be useful to prepare to slice
1148        rangeid at slicepoint
1149
1150        This function also have the important task to update the revscache of
1151        the parent rev s if possible and needed"""
1152        ps = filterparents(self._parents(rangeid[0], repo.changelog.parentrevs))
1153        if not ps:
1154            return None
1155        elif len(ps) == 1:
1156            # regular changesets, we pick the parent
1157            reusablerev = ps[0]
1158        else:
1159            # merge, we try the inheritance point
1160            # if it is too low, it will be ditched by the depth check anyway
1161            index, reusablerev = self._inheritancepoint(repo, rangeid[0])
1162
1163        # if we reached the slicepoint, no need to go further
1164        if self.depthrev(repo, reusablerev) <= slicepoint:
1165            return None
1166
1167        reurange = (reusablerev, rangeid[1])
1168        # if we have an entry for the current range, lets update the cache
1169        # if we already have subrange for this range, no need to prepare it.
1170        if self._getsub(reurange) is not None:
1171            return None
1172
1173        # look like we found a relevent parentrange with no cache yet
1174        return reurange
1175
1176    def _slicesrangeat(self, repo, rangeid, globalindex):
1177        ps = self._parents(rangeid[0], repo.changelog.parentrevs)
1178        if len(ps) == 1:
1179            reuserev = ps[0]
1180        else:
1181            index, reuserev = self._inheritancepoint(repo, rangeid[0])
1182            if index < globalindex:
1183                return self._slicesrangeatmerge(repo, rangeid, globalindex)
1184
1185        assert reuserev != nodemod.nullrev
1186
1187        reuserange = (reuserev, rangeid[1])
1188        top = (rangeid[0], globalindex)
1189
1190        if rangeid[1] + self.rangelength(repo, reuserange) == globalindex:
1191            return [reuserange, top]
1192        # This will not initiate a recursion since we took appropriate
1193        # precaution in the caller of this method to ensure it will be so.
1194        # It the parent is a merge that will not be the case but computing
1195        # subranges from a merge will not recurse.
1196        reusesubranges = self.subranges(repo, reuserange)
1197        slices = reusesubranges[:-1] # pop the top
1198        slices.append(top)
1199        return slices
1200
1201    def _slicesrangeatmerge(self, repo, rangeid, globalindex):
1202        localindex = globalindex - rangeid[1]
1203
1204        result = []
1205        allrevs = self.revsfromrange(repo, rangeid)
1206        bottomrevs = allrevs[:localindex]
1207
1208        if globalindex == self.depthrev(repo, bottomrevs[-1]):
1209            # simple case, top revision in the bottom set contains exactly the
1210            # revision we needs
1211            result.append((bottomrevs[-1], rangeid[1]))
1212        else:
1213            head = None
1214            headrange = None
1215            skip = None
1216            for rev in bottomrevs[::-1]:
1217                if head is None:
1218                    head = rev
1219                    headrange = self.revsfromrange(repo, (head, 0))
1220                    skip = self.depthrev(repo, rev) - 1
1221                elif rev != headrange[skip - 1]:
1222                    result.append((head, skip))
1223                    head = rev
1224                    headrange = self.revsfromrange(repo, (head, 0))
1225                    skip = self.depthrev(repo, rev) - 1
1226                else:
1227                    skip -= 1
1228            result.append((head, skip))
1229
1230            result.reverse()
1231
1232        # top part is trivial
1233        top = (rangeid[0], globalindex)
1234        result.append(top)
1235        return result
1236
1237# merge later for outer layer wrapping
1238eh.merge(stablesort.eh)
1239eh.merge(firstmergecache.eh)
1240