1# Code dedicated to the computation of stable sorting 2# 3# These stable sorting are used stable ranges 4# 5# Copyright 2017 Pierre-Yves David <pierre-yves.david@ens-lyon.org> 6# 7# This software may be used and distributed according to the terms of the 8# GNU General Public License version 2 or any later version. 9 10r"""Stable sorting for the mercurial graph 11 12The goal is to provided an efficient, revnum independant way, to sort revisions 13in a topologicaly. Having it independant from revnum is important to make it 14stable from one repository to another, unlocking various capabilities. For 15example it can be used for discovery purposes. 16 17This docstring describe the currently preferred solution: 18 19Probleme definition 20------------------- 21 22We want a way to order revision in the graph. For a linear history things are simple:: 23 24 A -> B -> C -> D -> E -> F -> G -> H 25 26 stablesort(::H) = [A, B, C, D, E, F, G, H] 27 28However, things become more complicated when the graph is not linear:: 29 30 A -> B -> C -> D -> G -> H 31 \ / 32 > E -> F 33 34 stablesort(::A) = [A] 35 stablesort(::B) = [A, B] 36 stablesort(::C) = [A, B, C] 37 stablesort(::D) = [A, B, C, D] 38 stablesort(::E) = [A, B, E] 39 stablesort(::F) = [A, B, E, F] 40 stablesort(::G) = [A, B, C, D, E, F, G] 41 stablesort(::H) = [A, B, C, D, E, F, G, H] 42 43Basic principle: 44---------------- 45 46We are always talking about set of revision defined by a single heads 47(eg: `stablesort(::r)`) 48 49For non merge revisions, the definition is simple:: 50 51 stablesort(::r) == stablesort(p1(r)) + r 52 53This is visible in some of the example above: 54 55 stablesort(::B) = stablesort(::A) + [B] 56 stablesort(::E) = stablesort(::B) + [E] 57 stablesort(::H) = stablesort(::G) + [H] 58 59For merge revision, we reuse as much as possible of the parents order: 60 61 pl = stablemin(parents(m)) 62 ph = stablemax(parents(m)) 63 stablesort(::m) == stablesort(pl) 64 + [i for i in stablesort(ph) if i in ph % pl] 65 + m 66 67This is visible in the example above: 68 69 stablesort(::G) = stablesort(::D) + [stablesort(::F) - ::D] + [G] 70 stablesort(::G) = [A, B, C, D] + ([A, B, E, F] - [A, B, C ,D]) + [G] 71 stablesort(::G) = [A, B, C, D] + [E, F] + [G] 72 73To decide which parent goes first in the stablesort, we need to order them. The 74`stablemin/stablemax` function express this. The actual order used is an 75implementation details (eg: could be node-order, in the example it is 76alphabetical order) 77 78The `ph % pl` set of revision is called the "exclusive part". It correspond to 79all revisions ancestors of `ph` (`ph` included) that are not ancestors of `pl` 80(`pl` included). In this area we try to reuse as much as the stable-sorted 81order for `ph`. In simple case, the `[i for i in stablesort(ph) if i in ph % 82pl]` is just the contiguous final range of `stablesort(ph)`. This is the case 83in the example we have looked at so far:: 84 85 stablesort(::F) - ::D = [A, B, E, F] - [A, B, C ,D] = stablesort(::F)[-2:] 86 87 88However in more advance case, this will not be contiguous and we'll need to 89skip over multiple parts of `stablesort(ph)` to cover `ph % pl`.Let's have a 90look at an example of such case:: 91 92 A - B ----- F - H 93 \ \ / / 94 \ E - G 95 \ / / 96 C - D 97 98We have the following stablesort: 99 100 stablesort(::A) = [A] 101 stablesort(::B) = [A, B] 102 stablesort(::C) = [A, C] 103 stablesort(::D) = [A, C, D] 104 stablesort(::E) = [A, B, C, E] 105 stablesort(::F) = [A, B, C, E, F] 106 stablesort(::G) = [A, B, C, D, E, G] 107 stablesort(::H) = [A, B, C, E, F, D, G, H] 108 109The stable order of `stablesort(::H)` match our definition:: 110 111 112 stablesort(::H) = [A, B, C, E, F] + [D, G] + [H] 113 stablesort(::F) = [A, B, C, E, F] 114 stablesort(::G) - ::F = [A, B, C, D, E, G] - [A, B, C, E, F] = [D, G] 115 116In this order, we reuse all of `stablesort(::F)`, but the subset of 117`stablesort(::G)` we reuse is not contiguous, we had to skip over 'E' that is 118already contained inside ::F. 119 120Usage 121----- 122 123An important details is that, in practice, the sorted revision are always 124walked backward, from the head of the set of revisions. 125 126preexisting cached data 127----------------------- 128 129The stable sort assume we already have 2 important property cached for each 130changesets: 131 1321) changeset depth == len(::r) 1332) first merge == max(merge() and ::r) 134 135Caching strategy 136---------------- 137 138Since we always walk from the head, the iteration mostly have to follow the 139unique parent of non merge revision. For merge revision, we need to iterate 140over the revisions accessible only through one of the parent before coming back 141to the other parent eventually. 142 143To efficiently cache the revision path we need to walk, we records "jumps". A 144jump is a revision where the next revision (in the stable sort) will not be a 145parent of the current revision, but another revision in the graph. 146 147In the first (simple) example above, we had:: 148 149 A -> B -> C -> D -> G -> H 150 \ / 151 > E -> F 152 153 stablesort(::D) = [A, B, C, D] 154 stablesort(::F) = [A, B, E, F] 155 stablesort(::G) = [A, B, C, D, E, F, G] 156 157In this case, when caching stable sort data for `G`, we need to record the `E 158-> D` jump. This correspond to point were we are done iterating over the 159revision accessible through `F` and we need to "jump back to the other parent" 160of `G`: `D`. 161 162 163 164In the second (more advance) example above, we had:: 165 166 A - B ----- F - H 167 \ \ / / 168 \ E - G 169 \ / / 170 C - D 171 172 stablesort(::F) = [A, B, C, E, F] 173 stablesort(::G) = [A, B, C, D, E, G] 174 stablesort(::H) = [A, B, C, E, F, D, G, H] 175 176In this case, when caching stable sort data for `G`, we need to record the `G 177-> D` and the `D -> F` jumps. 178 179Jumps are recorded using the following formats: 180 181 (jump-point, jump-destination, section-size) 182 183* jump-point is the last revision number we should iterate over before jumping, 184* jump-destination is the next revision we should iterate over after the jump point, 185* section-size is the number of revision to be iterated before reaching jump-point. 186 187the section-size is not directly used when doing a stable-sorted walk. However 188it is useful for higher level piece of code to take decision without having to 189actually walk the graph, (see stable range documentation). 190 191For each merge, we store the set of jumps that cover the exclusive side. 192 193Practical data 194-------------- 195 196The mercurial repository has simple branching and few jumps: 197 198 number of revisions: 69771 199 number of merge: 2734 200 number of jumps: 2950 201 average jumps: 1.079 202 median jumps: 1 203 90% jumps: 1 204 99% jumps: 3 205 max jumps: 6 206 jump cache size: 35 400 bytes 207 208Mozilla's branching is fairly simple too: 209 210 number of revisions: 435078 211 number of merge: 21035 212 number of jumps: 31434 213 average jumps: 1.494 214 median jumps: 1 215 90% jumps: 2 216 99% jumps: 9 217 max jumps: 169 218 jump cache size: 377 208 bytes 219 220Pypy has a more complicated branching history but jumps cache remains reasonable 221 222 number of revisions: 95010 223 number of merge: 7911 224 number of jumps: 24326 225 average jumps: 3.075 226 median jumps: 1 227 90% jumps: 5 228 99% jumps: 40 229 max jumps: 329 230 jump cache size: 291 912 bytes 231 232This still apply to larger private project: 233 234 number of revisions: 605011 235 number of merge: 118109 236 number of jumps: 314925 237 average jumps: 2.667 238 median jumps: 1 239 90% jumps: 3 240 99% jumps: 34 241 max jumps: 660 242 jump cache size: 3 779 100 bytes 243 244It is worth noting that the last jump could be computed form other information, 245removing one jump storage per merge. However this does not seems to be an issue 246worth the troubles for now. 247""" 248 249import array 250import collections 251import struct 252 253from mercurial.i18n import _ 254from mercurial import ( 255 commands, 256 error, 257 localrepo, 258 logcmdutil, 259 node as nodemod, 260 pycompat, 261 scmutil, 262) 263 264from mercurial.utils.stringutil import forcebytestr 265 266from . import ( 267 depthcache, 268 exthelper, 269 utility, 270 genericcaches, 271) 272 273filterparents = utility.filterparents 274 275eh = exthelper.exthelper() 276 277def _mergepoint_tie_breaker(repo): 278 """the key use to tie break merge parent 279 280 Exists as a function to help playing with different approaches. 281 282 Possible other factor are: 283 * depth of node, 284 * number of exclusive merges, 285 * number of jump points. 286 * <insert-your-idea> 287 """ 288 node = repo.changelog.node 289 depth = repo.depthcache.get 290 291 def key(rev): 292 return (-depth(rev), node(rev)) 293 return key 294 295@eh.command( 296 b'debugstablesort', 297 [ 298 (b'r', b'rev', [], b'heads to start from'), 299 (b'', b'method', b'branchpoint', b"method used for sorting, one of: " 300 b"branchpoint, basic-mergepoint and basic-headstart"), 301 (b'l', b'limit', b'', b'number of revision display (default to all)') 302 ] + commands.formatteropts, 303 _(b'')) 304def debugstablesort(ui, repo, **opts): 305 """display the ::REVS set topologically sorted in a stable way 306 """ 307 revs = scmutil.revrange(repo, opts['rev']) 308 309 method = opts['method'] 310 sorting = _methodmap.get(method) 311 if sorting is None: 312 valid_method = b', '.join(sorted(_methodmap)) 313 raise error.Abort(b'unknown sorting method: "%s"' % method, 314 hint=b'pick one of: %s' % valid_method) 315 316 displayer = logcmdutil.changesetdisplayer(ui, repo, 317 pycompat.byteskwargs(opts), 318 buffered=True) 319 kwargs = {} 320 if opts['limit']: 321 kwargs['limit'] = int(opts['limit']) 322 for r in sorting(repo, revs, **kwargs): 323 ctx = repo[r] 324 displayer.show(ctx) 325 displayer.flush(ctx) 326 displayer.close() 327 328@eh.command( 329 b'debugstablesortcache', 330 [] + commands.formatteropts, 331 _(b'')) 332def debugstablesortcache(ui, repo, **opts): 333 """display data about the stable sort cache of a repository 334 """ 335 unfi = repo.unfiltered() 336 revs = unfi.revs('all()') 337 nbrevs = len(revs) 338 ui.write(b'number of revisions: %12d\n' % nbrevs) 339 merge = unfi.revs('merge()') 340 nbmerge = len(merge) 341 cache = unfi.stablesort 342 cache.save(repo) 343 ui.write(b'number of merge: %12d\n' % nbmerge) 344 alljumps = [] 345 alljumpssize = [] 346 for r in merge: 347 jumps = cache.getjumps(unfi, r) 348 if jumps is None: 349 continue # not a merge 350 jumps = list(jumps) 351 alljumps.append(jumps) 352 alljumpssize.append(len(jumps)) 353 nbjumps = sum(alljumpssize) 354 ui.write(b'number of jumps: %12d\n' % nbjumps) 355 if not nbjumps: 356 return 0 357 avgjumps = nbjumps / float(len(alljumpssize)) 358 ui.write(b'average jumps: %6.3f\n' % avgjumps) 359 alljumpssize.sort() 360 medianjumps = alljumpssize[len(alljumpssize) // 2] 361 ui.write(b'median jumps: %12d\n' % medianjumps) 362 tensjumps = alljumpssize[len(alljumpssize) * 9 // 10] 363 ui.write(b'90%% jumps: %12d\n' % tensjumps) 364 centsjumps = alljumpssize[len(alljumpssize) * 99 // 100] 365 ui.write(b'99%% jumps: %12d\n' % centsjumps) 366 ui.write(b'max jumps: %12d\n' % max(alljumpssize)) 367 ui.write(b'jump cache size: %12d bytes\n' % (nbjumps * 12)) 368 369def stablesort_branchpoint(repo, revs, mergecallback=None): 370 """return '::revs' topologically sorted in "stable" order 371 372 This is a depth first traversal starting from 'nullrev', using node as a 373 tie breaker. 374 """ 375 # Various notes: 376 # 377 # * Bitbucket is used dates as tie breaker, that might be a good idea. 378 # 379 # * It seemds we can traverse in the same order from (one) head to bottom, 380 # if we the following record data for each merge: 381 # 382 # - highest (stablesort-wise) common ancestors, 383 # - order of parents (tablesort-wise) 384 cl = repo.changelog 385 parents = cl.parentrevs 386 nullrev = nodemod.nullrev 387 n = cl.node 388 # step 1: We need a parents -> children mapping for 2 reasons. 389 # 390 # * we build the order from nullrev to tip 391 # 392 # * we need to detect branching 393 children = collections.defaultdict(list) 394 for r in cl.ancestors(revs, inclusive=True): 395 ps = filterparents(parents(r)) 396 if not ps: 397 children[nullrev].append(r) 398 for p in ps: 399 children[p].append(r) 400 # step two: walk back up 401 # * pick lowest node in case of branching 402 # * stack disregarded part of the branching 403 # * process merge when both parents are yielded 404 405 # track what changeset has been 406 seen = [0] * (max(revs) + 2) 407 seen[nullrev] = True # nullrev is known 408 # starts from repository roots 409 # reuse the list form the mapping as we won't need it again anyway 410 stack = children[nullrev] 411 if not stack: 412 return [] 413 if 1 < len(stack): 414 stack.sort(key=n, reverse=True) 415 416 # list of rev, maybe we should yield, but since we built a children mapping we are 'O(N)' already 417 result = [] 418 419 current = stack.pop() 420 while current is not None or stack: 421 if current is None: 422 # previous iteration reached a merge or an unready merge, 423 current = stack.pop() 424 if seen[current]: 425 current = None 426 continue 427 ps = filterparents(parents(current)) 428 if not all(seen[p] for p in ps): 429 # we can't iterate on this merge yet because other child is not 430 # yielded yet (and we are topo sorting) we can discard it for now 431 # because it will be reached from the other child. 432 current = None 433 continue 434 assert not seen[current] 435 seen[current] = True 436 result.append(current) # could be yield, cf earlier comment 437 if mergecallback is not None and 2 <= len(ps): 438 mergecallback(result, current) 439 cs = children[current] 440 if not cs: 441 current = None 442 elif 1 == len(cs): 443 current = cs[0] 444 else: 445 cs.sort(key=n, reverse=True) 446 current = cs.pop() # proceed on smallest 447 stack.extend(cs) # stack the rest for later 448 assert len(result) == len(set(result)) 449 return result 450 451def stablesort_mergepoint_multirevs(repo, revs): 452 """return '::revs' topologically sorted in "stable" order 453 454 This is a depth first traversal starting from 'revs' (toward root), using node as a 455 tie breaker. 456 """ 457 cl = repo.changelog 458 tiebreaker = _mergepoint_tie_breaker(repo) 459 if not revs: 460 return [] 461 elif len(revs) == 1: 462 heads = list(sorted(revs)) 463 else: 464 # keeps heads only 465 heads = sorted(repo.revs(b'sort(heads(%ld::%ld))', revs, revs), key=tiebreaker) 466 467 results = [] 468 while heads: 469 h = heads.pop() 470 if revs: 471 bound = cl.findmissingrevs(common=heads, heads=[h]) 472 else: 473 bound = cl.ancestors([h], inclusive=True) 474 results.append(stablesort_mergepoint_bounded(repo, h, bound)) 475 if len(results) == 1: 476 return results[0] 477 finalresults = [] 478 for r in results[::-1]: 479 finalresults.extend(r) 480 return finalresults 481 482def stablesort_mergepoint_bounded(repo, head, revs): 483 """return 'revs' topologically sorted in "stable" order. 484 485 The 'revs' set MUST have 'head' as its one and unique head. 486 """ 487 # Various notes: 488 # 489 # * Bitbucket is using dates as tie breaker, that might be a good idea. 490 cl = repo.changelog 491 parents = cl.parentrevs 492 nullrev = nodemod.nullrev 493 tiebreaker = _mergepoint_tie_breaker(repo) 494 # step 1: We need a parents -> children mapping to detect dependencies 495 children = collections.defaultdict(set) 496 parentmap = {} 497 for r in revs: 498 ps = filterparents(parents(r)) 499 if 2 <= len(ps): 500 ps = tuple(sorted(ps, key=tiebreaker)) 501 parentmap[r] = ps 502 for p in ps: 503 children[p].add(r) 504 if not ps: 505 children[nullrev].add(r) 506 # step two: walk again, 507 stack = [head] 508 resultset = set() 509 result = [] 510 511 def add(current): 512 resultset.add(current) 513 result.append(current) 514 515 while stack: 516 current = stack.pop() 517 add(current) 518 parents = parentmap[current] 519 for p in parents: 520 if 1 < len(children[p]) and not children[p].issubset(resultset): 521 # we need other children to be yield first 522 continue 523 if p in revs: 524 stack.append(p) 525 526 result.reverse() 527 assert len(result) == len(resultset) 528 return result 529 530def stablesort_mergepoint_head_basic(repo, revs, limit=None): 531 heads = repo.revs(b'sort(heads(%ld))', revs) 532 if not heads: 533 return [] 534 elif 2 < len(heads): 535 raise error.Abort(b'cannot use head based merging, %d heads found' 536 % len(heads)) 537 head = heads.first() 538 revs = stablesort_mergepoint_bounded(repo, head, repo.revs(b'::%d', head)) 539 if limit is None: 540 return revs 541 return revs[-limit:] 542 543def stablesort_mergepoint_head_debug(repo, revs, limit=None): 544 heads = repo.revs(b'sort(heads(%ld))', revs) 545 if not heads: 546 return [] 547 elif 2 < len(heads): 548 raise error.Abort(b'cannot use head based merging, %d heads found' 549 % len(heads)) 550 head = heads.first() 551 revs = stablesort_mergepoint_head(repo, head) 552 if limit is None: 553 return revs 554 return revs[-limit:] 555 556def stablesort_mergepoint_head(repo, head): 557 """return '::rev' topologically sorted in "stable" order 558 559 This is a depth first traversal starting from 'rev' (toward root), using node as a 560 tie breaker. 561 """ 562 cl = repo.changelog 563 parents = cl.parentrevs 564 tiebreaker = _mergepoint_tie_breaker(repo) 565 566 top = [head] 567 mid = [] 568 bottom = [] 569 570 ps = filterparents(parents(head)) 571 while len(ps) == 1: 572 top.append(ps[0]) 573 ps = filterparents(parents(ps[0])) 574 top.reverse() 575 if len(ps) == 2: 576 ps = sorted(ps, key=tiebreaker) 577 578 # get the part from the highest parent. This is the part that changes 579 mid_revs = repo.revs(b'only(%d, %d)', ps[1], ps[0]) 580 if mid_revs: 581 mid = stablesort_mergepoint_bounded(repo, ps[1], mid_revs) 582 583 # And follow up with part othe parent we can inherit from 584 bottom = stablesort_mergepoint_head(repo, ps[0]) 585 586 return bottom + mid + top 587 588def stablesort_mergepoint_head_cached(repo, revs, limit=None): 589 heads = repo.revs(b'sort(heads(%ld))', revs) 590 if not heads: 591 return [] 592 elif 2 < len(heads): 593 raise error.Abort(b'cannot use head based merging, %d heads found' 594 % len(heads)) 595 head = heads.first() 596 cache = stablesortcache() 597 first = list(cache.get(repo, head, limit=limit)) 598 second = list(cache.get(repo, head, limit=limit)) 599 if first != second: 600 repo.ui.warn(b'stablesort-cache: initial run different from re-run:\n' 601 b' %s\n' 602 b' %s\n' % (first, second)) 603 return second 604 605class stablesortcache(object): 606 607 def __init__(self): 608 self._jumps = {} 609 super(stablesortcache, self).__init__() 610 611 def get(self, repo, rev, limit=None): 612 result = [] 613 for r in self.walkfrom(repo, rev): 614 result.append(r) 615 if limit is not None and limit <= len(result): 616 break 617 result.reverse() 618 return result 619 620 def getjumps(self, repo, rev): 621 if not self._hasjumpsdata(rev): 622 parents = filterparents(repo.changelog.parentrevs(rev)) 623 if len(parents) <= 1: 624 self._setjumps(rev, None) 625 else: 626 # merge ! warn the cache 627 tiebreaker = _mergepoint_tie_breaker(repo) 628 minparent = sorted(parents, key=tiebreaker)[0] 629 for r in self.walkfrom(repo, rev): 630 if r == minparent: 631 break 632 return self._getjumps(rev) 633 634 def _hasjumpsdata(self, rev): 635 return rev in self._jumps 636 637 def _getjumps(self, rev): 638 return self._jumps.get(rev) 639 640 def _setjumps(self, rev, jumps): 641 self._jumps[rev] = jumps 642 643 def walkfrom(self, repo, head): 644 tiebreaker = _mergepoint_tie_breaker(repo) 645 cl = repo.changelog 646 parentsfunc = cl.parentrevs 647 648 def parents(rev): 649 return filterparents(parentsfunc(rev)) 650 651 def oneparent(rev): 652 ps = parents(rev) 653 if not ps: 654 return None 655 if len(ps) == 1: 656 return ps[0] 657 return max(ps, key=tiebreaker) 658 659 current = head 660 previous_current_1 = object() 661 previous_current_2 = object() 662 663 while current is not None: 664 previous_current_2 = previous_current_1 665 previous_current_1 = current 666 assert previous_current_1 is not previous_current_2 667 668 jumps = self._getjumps(current) 669 if jumps is not None: 670 # we have enough cached information to directly iterate over 671 # the exclusive size. 672 for j in jumps: 673 jump_point = j[0] 674 jump_dest = j[1] 675 if current == jump_point: 676 yield current 677 else: 678 while current != jump_point: 679 yield current 680 current = oneparent(current) 681 assert current is not None 682 yield current 683 current = jump_dest 684 continue 685 686 yield current 687 688 ps = parents(current) 689 if not ps: 690 current = None # break 691 if len(ps) == 1: 692 current = ps[0] 693 elif len(ps) == 2: 694 lower_parent, higher_parent = sorted(ps, key=tiebreaker) 695 696 rev = current 697 jumps = [] 698 699 def recordjump(source, destination, size): 700 jump = (source, destination, size) 701 jumps.append(jump) 702 process = self._process_exclusive_side 703 for rev in process(lower_parent, higher_parent, cl, parents, 704 tiebreaker, recordjump): 705 yield rev 706 707 if rev == current: 708 recordjump(rev, lower_parent, 1) 709 710 self._setjumps(current, jumps) 711 712 current = lower_parent 713 714 def _process_exclusive_side(self, lower, higher, cl, parents, tiebreaker, 715 recordjump): 716 717 exclusive = cl.findmissingrevs(common=[lower], heads=[higher]) 718 719 def popready(stack): 720 """pop the top most ready item in the list""" 721 for idx in range(len(stack) - 1, -1, -1): 722 if children[stack[idx]].issubset(seen): 723 return stack.pop(idx) 724 return None 725 726 hardstack = [] 727 previous = None 728 seen = set() 729 current = higher 730 children = collections.defaultdict(set) 731 bound = set(exclusive) 732 if exclusive: 733 for r in exclusive: 734 for p in parents(r): 735 children[p].add(r) 736 737 hardstack.append(higher) 738 nextjump = False 739 size = 1 # take the merge point into account 740 while hardstack: 741 current = popready(hardstack) 742 if current in seen: 743 continue 744 softstack = [] 745 while current in bound and current not in seen: 746 if nextjump: 747 recordjump(previous, current, size) 748 nextjump = False 749 size = 0 750 yield current 751 size += 1 752 previous = current 753 seen.add(current) 754 755 all_parents = parents(current) 756 757 # search or next parent to walk through 758 fallback, next = None, None 759 if 1 == len(all_parents): 760 next = all_parents[0] 761 elif 2 <= len(all_parents): 762 fallback, next = sorted(all_parents, key=tiebreaker) 763 764 # filter parent not ready (children not emitted) 765 while next is not None and not children[next].issubset(seen): 766 nextjump = True 767 next = fallback 768 fallback = None 769 770 # stack management 771 if next is None: 772 next = popready(softstack) 773 if next is not None: 774 nextjump = True 775 elif fallback is not None: 776 softstack.append(fallback) 777 778 # get ready for next iteration 779 current = next 780 781 # any in processed head has to go in the hard stack 782 nextjump = True 783 hardstack.extend(softstack) 784 785 if previous is not None: 786 recordjump(previous, lower, size) 787 788def stablesort_mergepoint_head_ondisk(repo, revs, limit=None): 789 heads = repo.revs(b'sort(heads(%ld))', revs) 790 if not heads: 791 return [] 792 elif 2 < len(heads): 793 raise error.Abort(b'cannot use head based merging, %d heads found' 794 % len(heads)) 795 head = heads.first() 796 unfi = repo.unfiltered() 797 cache = unfi.stablesort 798 cache.save(unfi) 799 return cache.get(repo, head, limit=limit) 800 801S_INDEXSIZE = struct.Struct(b'>I') 802 803class ondiskstablesortcache(stablesortcache, genericcaches.changelogsourcebase): 804 805 _filepath = b'evoext-stablesortcache-00' 806 _cachename = b'evo-ext-stablesort' 807 808 def __init__(self): 809 super(ondiskstablesortcache, self).__init__() 810 self._index = array.array(r'l') 811 self._data = array.array(r'l') 812 del self._jumps 813 814 def getjumps(self, repo, rev): 815 if len(self._index) < rev: 816 msg = b'stablesortcache must be warmed before use (%d < %d)' 817 msg %= (len(self._index), rev) 818 raise error.ProgrammingError(msg) 819 return self._getjumps(rev) 820 821 def _getjumps(self, rev): 822 # very first revision 823 if rev == 0: 824 return None 825 # no data yet 826 if len(self._index) <= rev: 827 return None 828 index = self._index 829 # non merge revision 830 if index[rev] == index[rev - 1]: 831 return None 832 data = self._data 833 # merge revision 834 835 def jumps(): 836 for idx in range(index[rev - 1], index[rev]): 837 i = idx * 3 838 yield tuple(data[i:i + 3]) 839 return jumps() 840 841 def _setjumps(self, rev, jumps): 842 assert len(self._index) == rev, (len(self._index), rev) 843 if rev == 0: 844 self._index.append(0) 845 return 846 end = self._index[rev - 1] 847 if jumps is None: 848 self._index.append(end) 849 return 850 assert len(self._data) == end * 3, (len(self._data), end) 851 for j in jumps: 852 self._data.append(j[0]) 853 self._data.append(j[1]) 854 self._data.append(j[2]) 855 end += 1 856 self._index.append(end) 857 858 def _updatefrom(self, repo, data): 859 repo = repo.unfiltered() 860 861 total = len(data) 862 863 progress = repo.ui.makeprogress(b'updating stablesort cache', _(b'changesets'), total) 864 progress.update(0) 865 for idx, rev in enumerate(data): 866 parents = filterparents(repo.changelog.parentrevs(rev)) 867 if len(parents) <= 1: 868 self._setjumps(rev, None) 869 else: 870 # merge! warn the cache 871 tiebreaker = _mergepoint_tie_breaker(repo) 872 minparent = sorted(parents, key=tiebreaker)[0] 873 for r in self.walkfrom(repo, rev): 874 if r == minparent: 875 break 876 if not (idx % 1000): # progress as a too high performance impact 877 revstr = b'' if rev is None else (b'rev %d' % rev) 878 progress.update(idx, item=revstr) 879 progress.complete() 880 881 def clear(self, reset=False): 882 super(ondiskstablesortcache, self).clear() 883 self._index = array.array(r'l') 884 self._data = array.array(r'l') 885 886 def load(self, repo): 887 """load data from disk 888 889 (crude version, read everything all the time) 890 """ 891 assert repo.filtername is None 892 893 data = repo.cachevfs.tryread(self._filepath) 894 self._cachekey = self.emptykey 895 self._index = array.array(r'l') 896 self._data = array.array(r'l') 897 if data: 898 headerdata = data[:self._cachekeysize] 899 cachekey = self._deserializecachekey(headerdata) 900 offset = self._cachekeysize 901 indexsizedata = data[offset:offset + S_INDEXSIZE.size] 902 indexsize = S_INDEXSIZE.unpack(indexsizedata)[0] 903 offset += S_INDEXSIZE.size 904 if indexsize % self._datastruct.size == 0 and len(data) - offset >= indexsize: 905 index = list(self._deserializedata(data[offset:offset + indexsize])) 906 offset += indexsize 907 expected = index[-1] * self._datastruct.size * 3 908 data = data[offset:] 909 else: 910 # index cannot be read, so we need to abort somehow 911 expected = None 912 if len(data) == expected: 913 self._index.extend(index) 914 self._data.extend(self._deserializedata(data)) 915 self._cachekey = cachekey 916 else: 917 repo.ui.debug(b'stablesortcache file seems to be corrupted, ' 918 b'it will be rebuilt from scratch\n') 919 self._ondiskkey = self._cachekey 920 pass 921 922 def save(self, repo): 923 """save the data to disk 924 925 (crude version, rewrite everything all the time) 926 """ 927 if self._cachekey is None or self._cachekey == self._ondiskkey: 928 return 929 try: 930 cachefile = repo.cachevfs(self._filepath, b'w', atomictemp=True) 931 932 # data to write 933 headerdata = self._serializecachekey() 934 indexdata = self._serializedata(self._index) 935 data = self._serializedata(self._data) 936 indexsize = S_INDEXSIZE.pack(len(indexdata)) 937 938 # writing 939 cachefile.write(headerdata) 940 cachefile.write(indexsize) 941 cachefile.write(indexdata) 942 cachefile.write(data) 943 cachefile.close() 944 self._ondiskkey = self._cachekey 945 except (IOError, OSError) as exc: 946 repo.ui.log(b'stablesortcache', b'could not write update %s\n' % forcebytestr(exc)) 947 repo.ui.debug(b'stablesortcache: could not write update %s\n' % forcebytestr(exc)) 948 949@eh.reposetup 950def setupcache(ui, repo): 951 952 class stablesortrepo(repo.__class__): 953 954 @localrepo.unfilteredpropertycache 955 def stablesort(self): 956 cache = ondiskstablesortcache() 957 cache.update(self) 958 return cache 959 960 @localrepo.unfilteredmethod 961 def destroyed(self): 962 if r'stablesort' in vars(self): 963 self.stablesort.clear() 964 super(stablesortrepo, self).destroyed() 965 966 @localrepo.unfilteredmethod 967 def updatecaches(self, tr=None, **kwargs): 968 if utility.shouldwarmcache(self, tr): 969 self.stablesort.update(self) 970 self.stablesort.save(self) 971 super(stablesortrepo, self).updatecaches(tr, **kwargs) 972 973 repo.__class__ = stablesortrepo 974 975_methodmap = { 976 b'branchpoint': stablesort_branchpoint, 977 b'basic-mergepoint': stablesort_mergepoint_multirevs, 978 b'basic-headstart': stablesort_mergepoint_head_basic, 979 b'headstart': stablesort_mergepoint_head_debug, 980 b'headcached': stablesort_mergepoint_head_cached, 981 b'headondisk': stablesort_mergepoint_head_ondisk, 982} 983 984# merge last so that repo setup wrap after that one. 985eh.merge(depthcache.eh) 986