# Code dedicated to the computation of stable sorting
#
# This stable sorting is used by stable ranges
#
# Copyright 2017 Pierre-Yves David <pierre-yves.david@ens-lyon.org>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

r"""Stable sorting for the Mercurial graph

The goal is to provide an efficient, revnum-independent way to sort revisions
topologically. Having it independent from revnums is important to make it
stable from one repository to another, unlocking various capabilities. For
example it can be used for discovery purposes.

This docstring describes the currently preferred solution:

Problem definition
------------------

We want a way to order revisions in the graph. For a linear history things are simple::

  A -> B -> C -> D -> E -> F -> G -> H

  stablesort(::H) = [A, B, C, D, E, F, G, H]

However, things become more complicated when the graph is not linear::

  A -> B -> C -> D -> G -> H
         \         /
          > E -> F

  stablesort(::A) = [A]
  stablesort(::B) = [A, B]
  stablesort(::C) = [A, B, C]
  stablesort(::D) = [A, B, C, D]
  stablesort(::E) = [A, B, E]
  stablesort(::F) = [A, B, E, F]
  stablesort(::G) = [A, B, C, D, E, F, G]
  stablesort(::H) = [A, B, C, D, E, F, G, H]

Basic principle
---------------

We are always talking about sets of revisions defined by a single head
(eg: `stablesort(::r)`).

For non-merge revisions, the definition is simple::

  stablesort(::r) == stablesort(p1(r)) + r

This is visible in some of the examples above::

  stablesort(::B) = stablesort(::A) + [B]
  stablesort(::E) = stablesort(::B) + [E]
  stablesort(::H) = stablesort(::G) + [H]

For merge revisions, we reuse as much as possible of the parents' order::

    pl = stablemin(parents(m))
    ph = stablemax(parents(m))
    stablesort(::m) == stablesort(pl)
                       + [i for i in stablesort(ph) if i in ph % pl]
                       + m

This is visible in the example above::

    stablesort(::G) = stablesort(::D) + [stablesort(::F) - ::D] + [G]
    stablesort(::G) = [A, B, C, D] + ([A, B, E, F] - [A, B, C, D]) + [G]
    stablesort(::G) = [A, B, C, D] + [E, F] + [G]
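
The recursive definition above can be sketched as a small, naive Python
function over a toy ``{node: parents}`` mapping. This is an illustration only:
`toy_ancestors` and `toy_stablesort` are hypothetical helpers, not part of this
module, and plain alphabetical order stands in for the real
`stablemin`/`stablemax` tie breaker:

```python
def toy_ancestors(node, parents):
    # return the set of ancestors of `node`, `node` included
    seen = {node}
    stack = [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def toy_stablesort(node, parents):
    # naive recursive version of stablesort(::node)
    ps = parents[node]
    if not ps:
        return [node]
    if len(ps) == 1:
        return toy_stablesort(ps[0], parents) + [node]
    # merge: the lower parent's order first, then the exclusive part of
    # the higher parent, in the higher parent's own stable order
    pl, ph = sorted(ps)  # alphabetical stand-in for the real tie breaker
    exclusive = toy_ancestors(ph, parents) - toy_ancestors(pl, parents)
    return (toy_stablesort(pl, parents)
            + [n for n in toy_stablesort(ph, parents) if n in exclusive]
            + [node])
```

On the first example graph this reproduces the orders listed above, eg
`stablesort(::G) = [A, B, C, D, E, F, G]`.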
72
73To decide which parent goes first in the stablesort, we need to order them. The
74`stablemin/stablemax` function express this. The actual order used is an
75implementation details (eg: could be node-order, in the example it is
76alphabetical order)
77
78The `ph % pl` set of revision is called the "exclusive part". It correspond to
79all revisions ancestors of `ph` (`ph` included) that are not ancestors of `pl`
80(`pl` included).  In this area we try to reuse as much as the stable-sorted
81order for `ph`. In simple case, the `[i for i in stablesort(ph) if i in ph %
82pl]` is just the contiguous final range of `stablesort(ph)`. This is the case
83in the example we have looked at so far::
84
85    stablesort(::F) - ::D = [A, B, E, F] - [A, B, C ,D] = stablesort(::F)[-2:]
86

However, in more advanced cases, this will not be contiguous and we'll need to
skip over multiple parts of `stablesort(ph)` to cover `ph % pl`. Let's have a
look at an example of such a case::

  A - B ----- F - H
   \    \   /   /
    \     E - G
     \  /   /
      C - D

We have the following stable sorts::

  stablesort(::A) = [A]
  stablesort(::B) = [A, B]
  stablesort(::C) = [A, C]
  stablesort(::D) = [A, C, D]
  stablesort(::E) = [A, B, C, E]
  stablesort(::F) = [A, B, C, E, F]
  stablesort(::G) = [A, C, D, B, E, G]
  stablesort(::H) = [A, B, C, E, F, D, G, H]

The stable order of `stablesort(::H)` matches our definition::

  stablesort(::H) = [A, B, C, E, F] + [D, G] + [H]
  stablesort(::F) = [A, B, C, E, F]
  stablesort(::G) - ::F = [A, C, D, B, E, G] - [A, B, C, E, F] = [D, G]

In this order, we reuse all of `stablesort(::F)`, but the subset of
`stablesort(::G)` we reuse is not contiguous: we had to skip over 'B' and 'E',
which are already contained inside ::F.

Usage
-----

An important detail is that, in practice, the sorted revisions are always
walked backward, from the head of the set of revisions.

Preexisting cached data
-----------------------

The stable sort assumes we already have two important properties cached for
each changeset:

1) changeset depth == len(::r)
2) first merge == max(merge() and ::r)
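
The depth property, for instance, can be computed with a simple memoized
traversal. The sketch below is illustration only, over the same kind of toy
``{node: parents}`` mapping as above (`toy_depths` is a hypothetical helper;
the real extension keeps this data in a dedicated depth cache):

```python
import functools

def toy_depths(parents):
    # compute len(::r) for every node of a toy {node: parents} mapping

    @functools.lru_cache(maxsize=None)
    def ancestors(node):
        # ancestors of `node`, `node` included
        result = {node}
        for p in parents[node]:
            result |= ancestors(p)
        return frozenset(result)

    return {node: len(ancestors(node)) for node in parents}
```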

Caching strategy
----------------

Since we always walk from the head, the iteration mostly has to follow the
unique parent of non-merge revisions. For merge revisions, we need to iterate
over the revisions accessible only through one of the parents before
eventually coming back to the other parent.

To efficiently cache the revision path we need to walk, we record "jumps". A
jump is a revision where the next revision (in the stable sort) will not be
the parent we would normally follow, but another revision in the graph.

In the first (simple) example above, we had::

  A -> B -> C -> D -> G -> H
         \         /
          > E -> F

  stablesort(::D) = [A, B, C, D]
  stablesort(::F) = [A, B, E, F]
  stablesort(::G) = [A, B, C, D, E, F, G]

In this case, when caching stable sort data for `G`, we need to record the
`E -> D` jump. This corresponds to the point where we are done iterating over
the revisions accessible through `F` and need to "jump back to the other
parent" of `G`: `D`.

In the second (more advanced) example above, we had::

  A - B ----- F - H
   \    \   /   /
    \     E - G
     \  /   /
      C - D

  stablesort(::F) = [A, B, C, E, F]
  stablesort(::G) = [A, C, D, B, E, G]
  stablesort(::H) = [A, B, C, E, F, D, G, H]

In this case, when caching stable sort data for `H`, we need to record the
`G -> D` and the `D -> F` jumps.

Jumps are recorded using the following format::

    (jump-point, jump-destination, section-size)

* jump-point is the last revision number we should iterate over before jumping,
* jump-destination is the next revision we should iterate over after the jump point,
* section-size is the number of revisions to be iterated over before reaching jump-point.

The section-size is not directly used when doing a stable-sorted walk. However
it is useful for higher level pieces of code to make decisions without having
to actually walk the graph (see the stable range documentation).

For each merge, we store the set of jumps that cover the exclusive side.

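A backward walker consuming these records could look like the sketch below.
This is illustration only: `parent_of` and `jumps_of` are hypothetical lookup
helpers, it assumes every merge it meets already has its jump records, and the
real logic lives in `stablesortcache.walkfrom`:

```python
def walk_with_jumps(head, parent_of, jumps_of):
    # walk backward from `head`, honoring (point, destination, size) records;
    # parent_of(rev) returns the parent normally followed (None for a root),
    # jumps_of(rev) returns the jump records of a merge, or None otherwise.
    current = head
    while current is not None:
        jumps = jumps_of(current)
        if jumps:
            for point, destination, _size in jumps:
                # iterate normally until the jump point, then jump
                while current != point:
                    yield current
                    current = parent_of(current)
                yield current
                current = destination
        else:
            yield current
            current = parent_of(current)
```

On the first example graph, walking from `G` with its single `(E, D, 3)` record
yields G, F, E, D, C, B, A, ie `stablesort(::G)` reversed.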
Practical data
--------------

The Mercurial repository has simple branching and few jumps:

    number of revisions:        69771
    number of merge:             2734
    number of jumps:             2950
    average jumps:                  1.079
    median jumps:                   1
    90% jumps:                      1
    99% jumps:                      3
    max jumps:                      6
    jump cache size:           35 400 bytes

Mozilla's branching is fairly simple too:

    number of revisions:       435078
    number of merge:            21035
    number of jumps:            31434
    average jumps:                  1.494
    median jumps:                   1
    90% jumps:                      2
    99% jumps:                      9
    max jumps:                    169
    jump cache size:          377 208 bytes

Pypy has a more complicated branching history, but the jump cache remains
reasonable:

    number of revisions:        95010
    number of merge:             7911
    number of jumps:            24326
    average jumps:                  3.075
    median jumps:                   1
    90% jumps:                      5
    99% jumps:                     40
    max jumps:                    329
    jump cache size:          291 912 bytes

This still applies to a larger private project:

    number of revisions:       605011
    number of merge:           118109
    number of jumps:           314925
    average jumps:                  2.667
    median jumps:                   1
    90% jumps:                      3
    99% jumps:                     34
    max jumps:                    660
    jump cache size:        3 779 100 bytes

It is worth noting that the last jump could be computed from other information,
removing one stored jump per merge. However this does not seem to be an issue
worth the trouble for now.
"""

import array
import collections
import struct

from mercurial.i18n import _
from mercurial import (
    commands,
    error,
    localrepo,
    logcmdutil,
    node as nodemod,
    pycompat,
    scmutil,
)

from mercurial.utils.stringutil import forcebytestr

from . import (
    depthcache,
    exthelper,
    utility,
    genericcaches,
)

filterparents = utility.filterparents

eh = exthelper.exthelper()

def _mergepoint_tie_breaker(repo):
    """the key used to tie-break merge parents

    Exists as a function to help playing with different approaches.

    Possible other factors are:
        * depth of node,
        * number of exclusive merges,
        * number of jump points,
        * <insert-your-idea>
    """
    node = repo.changelog.node
    depth = repo.depthcache.get

    def key(rev):
        return (-depth(rev), node(rev))
    return key

@eh.command(
    b'debugstablesort',
    [
        (b'r', b'rev', [], b'heads to start from'),
        (b'', b'method', b'branchpoint', b"method used for sorting, one of: "
         b"branchpoint, basic-mergepoint, basic-headstart, headstart, "
         b"headcached and headondisk"),
        (b'l', b'limit', b'', b'number of revisions to display (default: all)')
    ] + commands.formatteropts,
    _(b''))
def debugstablesort(ui, repo, **opts):
    """display the ::REVS set topologically sorted in a stable way
    """
    revs = scmutil.revrange(repo, opts['rev'])

    method = opts['method']
    sorting = _methodmap.get(method)
    if sorting is None:
        valid_method = b', '.join(sorted(_methodmap))
        raise error.Abort(b'unknown sorting method: "%s"' % method,
                          hint=b'pick one of: %s' % valid_method)

    displayer = logcmdutil.changesetdisplayer(ui, repo,
                                              pycompat.byteskwargs(opts),
                                              buffered=True)
    kwargs = {}
    if opts['limit']:
        kwargs['limit'] = int(opts['limit'])
    for r in sorting(repo, revs, **kwargs):
        ctx = repo[r]
        displayer.show(ctx)
        displayer.flush(ctx)
    displayer.close()

@eh.command(
    b'debugstablesortcache',
    [] + commands.formatteropts,
    _(b''))
def debugstablesortcache(ui, repo, **opts):
    """display data about the stable sort cache of a repository
    """
    unfi = repo.unfiltered()
    revs = unfi.revs('all()')
    nbrevs = len(revs)
    ui.write(b'number of revisions: %12d\n' % nbrevs)
    merge = unfi.revs('merge()')
    nbmerge = len(merge)
    cache = unfi.stablesort
    cache.save(repo)
    ui.write(b'number of merge:     %12d\n' % nbmerge)
    alljumps = []
    alljumpssize = []
    for r in merge:
        jumps = cache.getjumps(unfi, r)
        if jumps is None:
            continue # not treated as a merge (eg: a parent was filtered out)
        jumps = list(jumps)
        alljumps.append(jumps)
        alljumpssize.append(len(jumps))
    nbjumps = sum(alljumpssize)
    ui.write(b'number of jumps:     %12d\n' % nbjumps)
    if not nbjumps:
        return 0
    avgjumps = nbjumps / float(len(alljumpssize))
    ui.write(b'average jumps:                 %6.3f\n' % avgjumps)
    alljumpssize.sort()
    medianjumps = alljumpssize[len(alljumpssize) // 2]
    ui.write(b'median jumps:        %12d\n' % medianjumps)
    tensjumps = alljumpssize[len(alljumpssize) * 9 // 10]
    ui.write(b'90%% jumps:           %12d\n' % tensjumps)
    centsjumps = alljumpssize[len(alljumpssize) * 99 // 100]
    ui.write(b'99%% jumps:           %12d\n' % centsjumps)
    ui.write(b'max jumps:           %12d\n' % max(alljumpssize))
    ui.write(b'jump cache size:     %12d bytes\n' % (nbjumps * 12))

def stablesort_branchpoint(repo, revs, mergecallback=None):
    """return '::revs' topologically sorted in "stable" order

    This is a depth first traversal starting from 'nullrev', using node as a
    tie breaker.
    """
    # Various notes:
    #
    # * Bitbucket is using dates as a tie breaker; that might be a good idea.
    #
    # * It seems we can traverse in the same order from (one) head to bottom,
    #   if we record the following data for each merge:
    #
    #  - highest (stablesort-wise) common ancestors,
    #  - order of parents (stablesort-wise)
    cl = repo.changelog
    parents = cl.parentrevs
    nullrev = nodemod.nullrev
    n = cl.node
    # step 1: We need a parents -> children mapping for 2 reasons:
    #
    # * we build the order from nullrev to tip
    #
    # * we need to detect branching
    children = collections.defaultdict(list)
    for r in cl.ancestors(revs, inclusive=True):
        ps = filterparents(parents(r))
        if not ps:
            children[nullrev].append(r)
        for p in ps:
            children[p].append(r)
    # step 2: walk back up
    # * pick lowest node in case of branching
    # * stack disregarded part of the branching
    # * process merge when both parents are yielded

    # track which changesets have been seen
    seen = [0] * (max(revs) + 2)
    seen[nullrev] = True # nullrev is known
    # start from the repository roots
    # reuse the list from the mapping as we won't need it again anyway
    stack = children[nullrev]
    if not stack:
        return []
    if 1 < len(stack):
        stack.sort(key=n, reverse=True)

    # list of revs; maybe we should yield, but since we built a children
    # mapping we are 'O(N)' already
    result = []

    current = stack.pop()
    while current is not None or stack:
        if current is None:
            # previous iteration reached a merge or an unready merge
            current = stack.pop()
            if seen[current]:
                current = None
                continue
        ps = filterparents(parents(current))
        if not all(seen[p] for p in ps):
            # we cannot process this merge yet because one of its parents is
            # not yielded yet (and we are topo sorting); we can discard it for
            # now because it will be reached again via its other parent.
            current = None
            continue
        assert not seen[current]
        seen[current] = True
        result.append(current) # could be yield, cf earlier comment
        if mergecallback is not None and 2 <= len(ps):
            mergecallback(result, current)
        cs = children[current]
        if not cs:
            current = None
        elif 1 == len(cs):
            current = cs[0]
        else:
            cs.sort(key=n, reverse=True)
            current = cs.pop() # proceed on smallest
            stack.extend(cs)   # stack the rest for later
    assert len(result) == len(set(result))
    return result

def stablesort_mergepoint_multirevs(repo, revs):
    """return '::revs' topologically sorted in "stable" order

    This is a depth first traversal starting from 'revs' (toward root), using
    node as a tie breaker.
    """
    cl = repo.changelog
    tiebreaker = _mergepoint_tie_breaker(repo)
    if not revs:
        return []
    elif len(revs) == 1:
        heads = list(sorted(revs))
    else:
        # keep heads only
        heads = sorted(repo.revs(b'sort(heads(%ld::%ld))', revs, revs),
                       key=tiebreaker)

    results = []
    while heads:
        h = heads.pop()
        if revs:
            bound = cl.findmissingrevs(common=heads, heads=[h])
        else:
            bound = cl.ancestors([h], inclusive=True)
        results.append(stablesort_mergepoint_bounded(repo, h, bound))
    if len(results) == 1:
        return results[0]
    finalresults = []
    for r in results[::-1]:
        finalresults.extend(r)
    return finalresults

def stablesort_mergepoint_bounded(repo, head, revs):
    """return 'revs' topologically sorted in "stable" order.

    The 'revs' set MUST have 'head' as its one and unique head.
    """
    # Various notes:
    #
    # * Bitbucket is using dates as a tie breaker; that might be a good idea.
    cl = repo.changelog
    parents = cl.parentrevs
    nullrev = nodemod.nullrev
    tiebreaker = _mergepoint_tie_breaker(repo)
    # step 1: We need a parents -> children mapping to detect dependencies
    children = collections.defaultdict(set)
    parentmap = {}
    for r in revs:
        ps = filterparents(parents(r))
        if 2 <= len(ps):
            ps = tuple(sorted(ps, key=tiebreaker))
        parentmap[r] = ps
        for p in ps:
            children[p].add(r)
        if not ps:
            children[nullrev].add(r)
    # step 2: walk again, from head toward the roots
    stack = [head]
    resultset = set()
    result = []

    def add(current):
        resultset.add(current)
        result.append(current)

    while stack:
        current = stack.pop()
        add(current)
        parents = parentmap[current]
        for p in parents:
            if 1 < len(children[p]) and not children[p].issubset(resultset):
                # we need the other children to be yielded first
                continue
            if p in revs:
                stack.append(p)

    result.reverse()
    assert len(result) == len(resultset)
    return result

def stablesort_mergepoint_head_basic(repo, revs, limit=None):
    heads = repo.revs(b'sort(heads(%ld))', revs)
    if not heads:
        return []
    elif 2 < len(heads):
        raise error.Abort(b'cannot use head based merging, %d heads found'
                          % len(heads))
    head = heads.first()
    revs = stablesort_mergepoint_bounded(repo, head, repo.revs(b'::%d', head))
    if limit is None:
        return revs
    return revs[-limit:]

def stablesort_mergepoint_head_debug(repo, revs, limit=None):
    heads = repo.revs(b'sort(heads(%ld))', revs)
    if not heads:
        return []
    elif 2 < len(heads):
        raise error.Abort(b'cannot use head based merging, %d heads found'
                          % len(heads))
    head = heads.first()
    revs = stablesort_mergepoint_head(repo, head)
    if limit is None:
        return revs
    return revs[-limit:]

def stablesort_mergepoint_head(repo, head):
    """return '::head' topologically sorted in "stable" order

    This is a depth first traversal starting from 'head' (toward root), using
    node as a tie breaker.
    """
    cl = repo.changelog
    parents = cl.parentrevs
    tiebreaker = _mergepoint_tie_breaker(repo)

    top = [head]
    mid = []
    bottom = []

    ps = filterparents(parents(head))
    while len(ps) == 1:
        top.append(ps[0])
        ps = filterparents(parents(ps[0]))
    top.reverse()
    if len(ps) == 2:
        ps = sorted(ps, key=tiebreaker)

        # get the part from the highest parent. This is the part that changes
        mid_revs = repo.revs(b'only(%d, %d)', ps[1], ps[0])
        if mid_revs:
            mid = stablesort_mergepoint_bounded(repo, ps[1], mid_revs)

        # and follow up with the part of the other parent we can inherit from
        bottom = stablesort_mergepoint_head(repo, ps[0])

    return bottom + mid + top

def stablesort_mergepoint_head_cached(repo, revs, limit=None):
    heads = repo.revs(b'sort(heads(%ld))', revs)
    if not heads:
        return []
    elif 2 < len(heads):
        raise error.Abort(b'cannot use head based merging, %d heads found'
                          % len(heads))
    head = heads.first()
    cache = stablesortcache()
    first = list(cache.get(repo, head, limit=limit))
    second = list(cache.get(repo, head, limit=limit))
    if first != second:
        repo.ui.warn(b'stablesort-cache: initial run different from re-run:\n'
                     b'    %s\n'
                     b'    %s\n' % (first, second))
    return second

class stablesortcache(object):

    def __init__(self):
        self._jumps = {}
        super(stablesortcache, self).__init__()

    def get(self, repo, rev, limit=None):
        result = []
        for r in self.walkfrom(repo, rev):
            result.append(r)
            if limit is not None and limit <= len(result):
                break
        result.reverse()
        return result

    def getjumps(self, repo, rev):
        if not self._hasjumpsdata(rev):
            parents = filterparents(repo.changelog.parentrevs(rev))
            if len(parents) <= 1:
                self._setjumps(rev, None)
            else:
                # merge! walk from it to warm the jump cache
                tiebreaker = _mergepoint_tie_breaker(repo)
                minparent = sorted(parents, key=tiebreaker)[0]
                for r in self.walkfrom(repo, rev):
                    if r == minparent:
                        break
        return self._getjumps(rev)

    def _hasjumpsdata(self, rev):
        return rev in self._jumps

    def _getjumps(self, rev):
        return self._jumps.get(rev)

    def _setjumps(self, rev, jumps):
        self._jumps[rev] = jumps

    def walkfrom(self, repo, head):
        tiebreaker = _mergepoint_tie_breaker(repo)
        cl = repo.changelog
        parentsfunc = cl.parentrevs

        def parents(rev):
            return filterparents(parentsfunc(rev))

        def oneparent(rev):
            ps = parents(rev)
            if not ps:
                return None
            if len(ps) == 1:
                return ps[0]
            return max(ps, key=tiebreaker)

        current = head
        previous_current_1 = object()
        previous_current_2 = object()

        while current is not None:
            previous_current_2 = previous_current_1
            previous_current_1 = current
            assert previous_current_1 is not previous_current_2

            jumps = self._getjumps(current)
            if jumps is not None:
                # we have enough cached information to directly iterate over
                # the exclusive side.
                for j in jumps:
                    jump_point = j[0]
                    jump_dest = j[1]
                    if current == jump_point:
                        yield current
                    else:
                        while current != jump_point:
                            yield current
                            current = oneparent(current)
                            assert current is not None
                        yield current
                    current = jump_dest
                continue

            yield current

            ps = parents(current)
            if not ps:
                current = None # break
            elif len(ps) == 1:
                current = ps[0]
            elif len(ps) == 2:
                lower_parent, higher_parent = sorted(ps, key=tiebreaker)

                rev = current
                jumps = []

                def recordjump(source, destination, size):
                    jump = (source, destination, size)
                    jumps.append(jump)
                process = self._process_exclusive_side
                for rev in process(lower_parent, higher_parent, cl, parents,
                                   tiebreaker, recordjump):
                    yield rev

                if rev == current:
                    recordjump(rev, lower_parent, 1)

                self._setjumps(current, jumps)

                current = lower_parent

    def _process_exclusive_side(self, lower, higher, cl, parents, tiebreaker,
                                recordjump):

        exclusive = cl.findmissingrevs(common=[lower], heads=[higher])

        def popready(stack):
            """pop the topmost ready item in the list"""
            for idx in range(len(stack) - 1, -1, -1):
                if children[stack[idx]].issubset(seen):
                    return stack.pop(idx)
            return None

        hardstack = []
        previous = None
        seen = set()
        current = higher
        children = collections.defaultdict(set)
        bound = set(exclusive)
        if exclusive:
            for r in exclusive:
                for p in parents(r):
                    children[p].add(r)

            hardstack.append(higher)
        nextjump = False
        size = 1 # take the merge point into account
        while hardstack:
            current = popready(hardstack)
            if current in seen:
                continue
            softstack = []
            while current in bound and current not in seen:
                if nextjump:
                    recordjump(previous, current, size)
                    nextjump = False
                    size = 0
                yield current
                size += 1
                previous = current
                seen.add(current)

                all_parents = parents(current)

                # search for the next parent to walk through
                fallback, next = None, None
                if 1 == len(all_parents):
                    next = all_parents[0]
                elif 2 <= len(all_parents):
                    fallback, next = sorted(all_parents, key=tiebreaker)

                # skip parents that are not ready (some children not emitted)
                while next is not None and not children[next].issubset(seen):
                    nextjump = True
                    next = fallback
                    fallback = None

                # stack management
                if next is None:
                    next = popready(softstack)
                    if next is not None:
                        nextjump = True
                elif fallback is not None:
                    softstack.append(fallback)

                # get ready for next iteration
                current = next

            # any unprocessed head has to go on the hard stack
            nextjump = True
            hardstack.extend(softstack)

        if previous is not None:
            recordjump(previous, lower, size)

def stablesort_mergepoint_head_ondisk(repo, revs, limit=None):
    heads = repo.revs(b'sort(heads(%ld))', revs)
    if not heads:
        return []
    elif 2 < len(heads):
        raise error.Abort(b'cannot use head based merging, %d heads found'
                          % len(heads))
    head = heads.first()
    unfi = repo.unfiltered()
    cache = unfi.stablesort
    cache.save(unfi)
    return cache.get(repo, head, limit=limit)

S_INDEXSIZE = struct.Struct(b'>I')

class ondiskstablesortcache(stablesortcache, genericcaches.changelogsourcebase):

    _filepath = b'evoext-stablesortcache-00'
    _cachename = b'evo-ext-stablesort'

    def __init__(self):
        super(ondiskstablesortcache, self).__init__()
        self._index = array.array(r'l')
        self._data = array.array(r'l')
        del self._jumps

    def getjumps(self, repo, rev):
        if len(self._index) < rev:
            msg = b'stablesortcache must be warmed before use (%d < %d)'
            msg %= (len(self._index), rev)
            raise error.ProgrammingError(msg)
        return self._getjumps(rev)

    def _getjumps(self, rev):
        # very first revision
        if rev == 0:
            return None
        # no data yet
        if len(self._index) <= rev:
            return None
        index = self._index
        # non-merge revision
        if index[rev] == index[rev - 1]:
            return None
        data = self._data
        # merge revision

        def jumps():
            for idx in range(index[rev - 1], index[rev]):
                i = idx * 3
                yield tuple(data[i:i + 3])
        return jumps()

    def _setjumps(self, rev, jumps):
        assert len(self._index) == rev, (len(self._index), rev)
        if rev == 0:
            self._index.append(0)
            return
        end = self._index[rev - 1]
        if jumps is None:
            self._index.append(end)
            return
        assert len(self._data) == end * 3, (len(self._data), end)
        for j in jumps:
            self._data.append(j[0])
            self._data.append(j[1])
            self._data.append(j[2])
            end += 1
        self._index.append(end)

    def _updatefrom(self, repo, data):
        repo = repo.unfiltered()

        total = len(data)

        progress = repo.ui.makeprogress(b'updating stablesort cache',
                                        _(b'changesets'), total)
        progress.update(0)
        for idx, rev in enumerate(data):
            parents = filterparents(repo.changelog.parentrevs(rev))
            if len(parents) <= 1:
                self._setjumps(rev, None)
            else:
                # merge! walk from it to warm the jump cache
                tiebreaker = _mergepoint_tie_breaker(repo)
                minparent = sorted(parents, key=tiebreaker)[0]
                for r in self.walkfrom(repo, rev):
                    if r == minparent:
                        break
            if not (idx % 1000): # updating progress every time costs too much
                revstr = b'' if rev is None else (b'rev %d' % rev)
                progress.update(idx, item=revstr)
        progress.complete()

    def clear(self, reset=False):
        super(ondiskstablesortcache, self).clear()
        self._index = array.array(r'l')
        self._data = array.array(r'l')

    def load(self, repo):
        """load data from disk

        (crude version, read everything all the time)
        """
        assert repo.filtername is None

        data = repo.cachevfs.tryread(self._filepath)
        self._cachekey = self.emptykey
        self._index = array.array(r'l')
        self._data = array.array(r'l')
        if data:
            headerdata = data[:self._cachekeysize]
            cachekey = self._deserializecachekey(headerdata)
            offset = self._cachekeysize
            indexsizedata = data[offset:offset + S_INDEXSIZE.size]
            indexsize = S_INDEXSIZE.unpack(indexsizedata)[0]
            offset += S_INDEXSIZE.size
            if (indexsize % self._datastruct.size == 0
                    and len(data) - offset >= indexsize):
                index = list(self._deserializedata(data[offset:offset + indexsize]))
                offset += indexsize
                expected = index[-1] * self._datastruct.size * 3
                data = data[offset:]
            else:
                # the index cannot be read, so we need to abort somehow
                expected = None
            if len(data) == expected:
                self._index.extend(index)
                self._data.extend(self._deserializedata(data))
                self._cachekey = cachekey
            else:
                repo.ui.debug(b'stablesortcache file seems to be corrupted, '
                              b'it will be rebuilt from scratch\n')
        self._ondiskkey = self._cachekey

    def save(self, repo):
        """save the data to disk

        (crude version, rewrite everything all the time)
        """
        if self._cachekey is None or self._cachekey == self._ondiskkey:
            return
        try:
            cachefile = repo.cachevfs(self._filepath, b'w', atomictemp=True)

            # data to write
            headerdata = self._serializecachekey()
            indexdata = self._serializedata(self._index)
            data = self._serializedata(self._data)
            indexsize = S_INDEXSIZE.pack(len(indexdata))

            # writing
            cachefile.write(headerdata)
            cachefile.write(indexsize)
            cachefile.write(indexdata)
            cachefile.write(data)
            cachefile.close()
            self._ondiskkey = self._cachekey
        except (IOError, OSError) as exc:
            repo.ui.log(b'stablesortcache',
                        b'could not write update %s\n' % forcebytestr(exc))
            repo.ui.debug(b'stablesortcache: could not write update %s\n'
                          % forcebytestr(exc))

@eh.reposetup
def setupcache(ui, repo):

    class stablesortrepo(repo.__class__):

        @localrepo.unfilteredpropertycache
        def stablesort(self):
            cache = ondiskstablesortcache()
            cache.update(self)
            return cache

        @localrepo.unfilteredmethod
        def destroyed(self):
            if r'stablesort' in vars(self):
                self.stablesort.clear()
            super(stablesortrepo, self).destroyed()

        @localrepo.unfilteredmethod
        def updatecaches(self, tr=None, **kwargs):
            if utility.shouldwarmcache(self, tr):
                self.stablesort.update(self)
                self.stablesort.save(self)
            super(stablesortrepo, self).updatecaches(tr, **kwargs)

    repo.__class__ = stablesortrepo

_methodmap = {
    b'branchpoint': stablesort_branchpoint,
    b'basic-mergepoint': stablesort_mergepoint_multirevs,
    b'basic-headstart': stablesort_mergepoint_head_basic,
    b'headstart': stablesort_mergepoint_head_debug,
    b'headcached': stablesort_mergepoint_head_cached,
    b'headondisk': stablesort_mergepoint_head_ondisk,
}

# merge last so that repo setup wraps after this one.
eh.merge(depthcache.eh)