1# Code dedicated to the computation and properties of "stable ranges" 2# 3# These stable ranges are use for obsolescence markers discovery 4# 5# Copyright 2017 Pierre-Yves David <pierre-yves.david@ens-lyon.org> 6# 7# This software may be used and distributed according to the terms of the 8# GNU General Public License version 2 or any later version. 9r"""stable range 10 11General Goals and Properties 12--------------------------- 13 14Stable-ranges get useful when some logic needs a recursive way to slice the 15history of a repository in smaller and smaller group of revisions. Here is 16example of such use cases: 17 18* **bundle caching:** 19 With an easy way to slice any subsets of history into stable-ranges, we 20 can cache a small number of bundles covering these ranges and reuse them for 21 other pull operations. Even if the pull operation have different boudaries. 22 23* **metadata discovery:** 24 With a simple way to recursively look at smaller and smaller ranges, an 25 algorithm can do fine-grained discovery of area of history where some mutable 26 metadata differ from one repository to another. Such meta data can be 27 obsolescence markers, CI status, lightweight stag, etc... 28 29To fix these use cases best, stable-ranges need some important properties: 30 31* the total number of ranges needed to cover a full repository is well bounded. 32* the minimal number of ranges to cover an arbitrary subset of the history is well bounded 33* for the same section of history, the range will be the same on any 34 repositories, 35* the ranges are cheap to compute iteratively, each new revisions re-uses the 36 ranges previous revisions uses. 37 38Simple introduction to the Concepts 39----------------------------------- 40 41To keep things simple, let us look at the issue on a linear history:: 42 43 A -> B -> C -> D -> E -> F -> G -> H 44 45To make sure we have range that cover each part of the history with a good 46granularity we use some binary recursion. The standard stable range will be: 47 48 [A -> B -> C -> D -> E -> F -> G -> H] size 8 49 [A -> B -> C -> D] [E -> F -> G -> H] size 4 50 [A -> B] [C -> D] [E -> F] [G -> H] size 2 51 [A] [B] [C] [D] [E] [F] [G] [H] size 1 52 53Well bounded total number of ranges: 54```````````````````````````````````` 55 56This binary slicing make sure we keep the total number of stable ranges under control. 57 58As you can see, we have N size 1 ranges. They are trivial and we don't care 59about them. Then we have: N/2 size 2 ranges + N/4 size 4 ranges + N/8 size 8 60ranges, etc... So a total of about "length(repo)" standard ranges. 61 62 63Well bounded number of range to cover a subset: 64``````````````````````````````````````````````` 65 66Any subset of the history can be expressed with this standard ranges. 67 68For example, [A, F] subset, can be covered with 2 ranges:: 69 70 [A ->[B -> C -> D] [E -> F] 71 72A less strivial example [B, F], still requires a small number of ranges (3):: 73 74 [B] [C -> D] [E -> F] 75 76In practice, any subset can be expressed in at most "2 x log2(length(subset))" 77stable range, well bounded value. 78 79Cheap incremental updates 80````````````````````````` 81 82The scheme describe above result in 2N subranges for a repository is size N. We 83do not want to have to recompute these 2N stable-ranges whenever a new revision 84is added to the repository. To achieve these, the stable-ranges are defined by 85**fixed boundaries** that are independant from the total size of the 86repository. Here is how it looks like in our example. 87 88We start with a repository having only [A, F]. Notice how we still creates 89power of two sized stable range:: 90 91 [A -> B -> C -> D] 92 [A -> B] [C -> D] [E -> F] 93 [A] [B] [C] [D] [E] [F] 94 95This we simply adds a new revision G, we reuse more the range we already have:: 96 97 [A -> B -> C -> D] 98 [A -> B] [C -> D] [E -> F] 99 [A] [B] [C] [D] [E] [F] [G] 100 101Adding H is a bigger even as we read a new boundary. 102 103 [A -> B -> C -> D -> E -> F -> G -> H] 104 [A -> B -> C -> D] [E -> F -> G -> H] 105 [A -> B] [C -> D] [E -> F] [G -> H] 106 [A] [B] [C] [D] [E] [F] [G] [H] 107 108At most, adding a new revision `R` will introduces `log2(length(::R))` new 109stable ranges. 110 111More advanced elements 112---------------------- 113 114Of course, the history of repository is not as simple as our linear example. So 115how do we deal with the branching and merging? To do so, we leverage the 116"stable sort" algorithm defined in the `stablesort.py` module. To define the 117stable range that compose a set of revison `::R`, we linearize the space by 118sorting it. The stable sort algorithm has two important property: 119 120First, it give the same result on different repository. So we can use it to 121algorithm involving multiple peers. 122 123Second, in case of merge, it reuse the same order as the parents as much as possible. 124This is important to keep reusing existing stable range as the repository grow. 125 126How are ranges defined? 127``````````````````````` 128 129To keep things small, and simple, a stable range always contains the final part 130of a `stablesort(::R)` set of revision. It is defined by two items: 131 132* its head revision, the R in `stablesort(::R)` 133* the size of that range... well almost, for implementation reason, it uses the 134 index of the first included item. Or in other word, the number of excluded 135 initial item in `stablesort(::R)`. 136 137Lets look at a practical case. In our initial example, `[A, B, C, D, E, F, G, 138H]` is H-0; `[E, F, G, H]` is H-4; `[G, H]` is H-6 and `[H]` is H-7. 139 140Let us look at a non linar graph:: 141 142 A - B - C - E 143 | / 144 -D 145 146and assume that `stablesort(::E) = [A, B, C, D, E]`. Then `[A, B, C]` is C-0, 147`[A, B, D]` is D-0; `[D, E]` is E-3, `[E]` is E-4, etc... 148 149Slicing in a non linear context 150``````````````````````````````` 151 152Branching can also affect the way we slice things. 153 154The small example above offers a simple example. For a size 5 (starting at the 155root), standard slicing will want a size 4 part and size 1 part. So, in a 156simple linear space `[A, B, C, D, E]` would be sliced as `[A, B, C, D] + [E]`. 157However, in our non-linear case, `[A, B, C, D]` has two heads (C and D) and 158cannot be expressed with a single range. As a result the range will be sliced 159into more sub ranges:: 160 161 stdslice(A-0) = [A, B, C] + [D] + [E] = C-0 + D-2 + A-4 162 163Yet, this does not mean ranges containing a merge will always result in slicing 164with many revision. the sub ranges might also silently contains them. Let us 165look at an exemple:: 166 167 A - B - C - D - E --- G - H 168 | / 169 ---------- F 170 171with:: 172 173 `stablesort(::H) == [A, B, C, D, E, F, G, H]` 174 175then:: 176 177 stdslice(H-0) = [A, B, C, D] + [E, F, G, H] = D-0 + H-4 178 179As a result the non linearity will increase the number of subranges involved, 180but in practice the impact stay limited. 181 182The total number of standard subranges stay under control with about 183`O(log2(N))` new stable range introduced for each new revision. In practice the 184total number of stableranges we have is about `O(length(repo))` 185 186In addition, it is worth nothing that the head of the extra ranges we have to 187use will match the destination of the "jump" cached by the stablesort 188algorithm. So, all this slicing can usually be done without iterating over the 189stable sorted revision. 190 191Caching Strategy 192---------------- 193 194The current caching strategy use a very space inefficient sqlite database. 195testing show it often take 100x more space than what basic binary storage would 196take. The sqlite storage was very useful at the proof of concept stage. 197 198Since all new stable-ranges introduced by a revision R will be "R headed". So we 199could easily store their standard subranges alongside the revision information 200to reuse the existing revision index. 201 202Subrange information can be efficiently stored, a naive approach storing all 203stable ranges and their subranges would requires just 2 integer per range + 2 204integer for any extra sub-ranges other than the first and last ones. 205 206We can probably push efficiency further by taking advantage of the large 207overlap in subranges for one non-merge revision to the next. This is probably a 208premature optimisation until we start getting actual result for a naive binary 209storage. 210 211To use this at a large scale, it would be important to compute these data at 212commit time and to exchange them alongside the revision over the network. This 213is similar to what we do for other cached data. 214 215It is also important to note that the smaller ranges can probably be computed 216on the fly instead of being cached. The exact tradeoff would requires some 217field testing. 218 219Performance 220----------- 221 222The current implementation has not been especially optimized for performance. 223The goal was mostly to get the order of magnitude of the algorithm complexity. 224The result are convincing: medium repository get a full cache warming in a 225couple of seconds and even very large and branchy repository get a fully warmed 226in the order of tens of minutes. We do not observes a complexity explosion 227making the algorithm unusable of large repositories. 228 229A better cache implementation combined with an optimized version of the 230algorithm should give much faster performance. Combined with commit-time 231computation and exchange over the network, the overall impact of this should be 232invisible to the user. 233 234The stable range is currently successfully used in production for 2 use cases: 235* obsolescence markers discovery, 236* caching precomputed bundle while serving pulls 237 238practical data 239-------------- 240 241The evolve repository: 242 243 number of revisions: 4833 244 number of heads: 15 245 number of merge: 612 ( 12%) 246 number of range: 4826 247 with 2 subranges: 4551 ( 94%) 248 with 3 subranges: 255 ( 5%) 249 with 4 subranges: 12 ( 0%) 250 with 5 subranges: 7 ( 0%) 251 with 8 subranges: 1 ( 0%) 252 average range/revs: 0.99 253 254 Estimated approximative size of a naive compact storage: 255 41 056 bytes 256 Current size of the sqlite cache (for comparison): 257 5 312 512 bytes 258 259The mercurial repository: 260 261 number of revisions: 42849 262 number of heads: 2 263 number of merge: 2647 ( 6%) 264 number of range: 41279 265 with 2 subranges: 39740 ( 96%) 266 with 3 subranges: 1494 ( 3%) 267 with 4 subranges: 39 ( 0%) 268 with 5 subranges: 5 ( 0%) 269 with 7 subranges: 1 ( 0%) 270 average range/revs: 0.96 271 Estimated approximative size of a naive compact storage: 272 342 968 bytes 273 Current size of the sqlite cache (for comparison): 274 62 803 968 bytes 275 276The pypy repository (very brancy history): 277 278 number of revisions: 97409 279 number of heads: 183 280 number of merge: 8371 ( 8%) 281 number of range: 107025 282 with 2 subranges: 100166 ( 93%) 283 with 3 subranges: 5839 ( 5%) 284 with 4 subranges: 605 ( 0%) 285 with 5 subranges: 189 ( 0%) 286 with 6 subranges: 90 ( 0%) 287 with 7 subranges: 38 ( 0%) 288 with 8 subranges: 18 ( 0%) 289 with 9 subranges: 9 ( 0%) 290 with 10 subranges: 15 ( 0%) 291 with 11 subranges: 4 ( 0%) 292 with 12 subranges: 6 ( 0%) 293 with 13 subranges: 7 ( 0%) 294 with 14 subranges: 6 ( 0%) 295 with 15 subranges: 1 ( 0%) 296 with 16 subranges: 2 ( 0%) 297 with 17 subranges: 2 ( 0%) 298 with 18 subranges: 3 ( 0%) 299 with 19 subranges: 2 ( 0%) 300 with 20 subranges: 3 ( 0%) 301 with 25 subranges: 1 ( 0%) 302 with 27 subranges: 1 ( 0%) 303 with 31 subranges: 3 ( 0%) 304 with 32 subranges: 2 ( 0%) 305 with 33 subranges: 1 ( 0%) 306 with 35 subranges: 1 ( 0%) 307 with 43 subranges: 1 ( 0%) 308 with 44 subranges: 1 ( 0%) 309 with 45 subranges: 2 ( 0%) 310 with 47 subranges: 1 ( 0%) 311 with 51 subranges: 1 ( 0%) 312 with 52 subranges: 1 ( 0%) 313 with 57 subranges: 1 ( 0%) 314 with 65 subranges: 1 ( 0%) 315 with 73 subranges: 1 ( 0%) 316 with 79 subranges: 1 ( 0%) 317 average range/revs: 1.10 318 Estimated approximative size of a naive compact storage: 319 934 176 bytes 320 Current size of the sqlite cache (for comparison): 321 201 236 480 bytes 322 323A private and branchy repository: 324 325 number of revisions: 605011 326 number of heads: 14061 327 number of merge: 118109 ( 19%) 328 number of range: 747625 329 with 2 subranges: 595985 ( 79%) 330 with 3 subranges: 130196 ( 17%) 331 with 4 subranges: 14093 ( 1%) 332 with 5 subranges: 4090 ( 0%) 333 with 6 subranges: 741 ( 0%) 334 with 7 subranges: 826 ( 0%) 335 with 8 subranges: 1313 ( 0%) 336 with 9 subranges: 83 ( 0%) 337 with 10 subranges: 22 ( 0%) 338 with 11 subranges: 9 ( 0%) 339 with 12 subranges: 26 ( 0%) 340 with 13 subranges: 5 ( 0%) 341 with 14 subranges: 9 ( 0%) 342 with 15 subranges: 3 ( 0%) 343 with 16 subranges: 212 ( 0%) 344 with 18 subranges: 6 ( 0%) 345 with 19 subranges: 3 ( 0%) 346 with 24 subranges: 1 ( 0%) 347 with 27 subranges: 1 ( 0%) 348 with 32 subranges: 1 ( 0%) 349 average range/revs: 1.23 350 Estimated approximative size of a naive compact storage: 351 7 501 928 bytes 352 Current size of the sqlite cache (for comparison): 353 1 950 310 400 bytes 354""" 355 356import abc 357import functools 358import heapq 359import math 360import time 361 362from mercurial import ( 363 error, 364 node as nodemod, 365 scmutil, 366 util, 367) 368 369from mercurial.i18n import _ 370 371from . import ( 372 exthelper, 373 firstmergecache, 374 stablesort, 375 utility, 376) 377 378filterparents = utility.filterparents 379 380eh = exthelper.exthelper() 381 382 383################################# 384### Stable Range computation ### 385################################# 386 387def _hlp2(i): 388 """return highest power of two lower than 'i'""" 389 return 2 ** int(math.log(i - 1, 2)) 390 391def subrangesclosure(repo, stablerange, heads): 392 """set of all standard subrange under heads 393 394 This is intended for debug purposes. Range are returned from largest to 395 smallest in terms of number of revision it contains.""" 396 subranges = stablerange.subranges 397 toproceed = [(r, 0, ) for r in heads] 398 ranges = set(toproceed) 399 while toproceed: 400 entry = toproceed.pop() 401 for r in subranges(repo, entry): 402 if r not in ranges: 403 ranges.add(r) 404 toproceed.append(r) 405 ranges = list(ranges) 406 n = repo.changelog.node 407 rangelength = stablerange.rangelength 408 ranges.sort(key=lambda r: (-rangelength(repo, r), n(r[0]))) 409 return ranges 410 411_stablerangemethodmap = { 412 b'branchpoint': lambda repo: stablerange(), 413 b'default': lambda repo: repo.stablerange, 414 b'basic-branchpoint': lambda repo: stablerangebasic(), 415 b'basic-mergepoint': lambda repo: stablerangedummy_mergepoint(), 416 b'mergepoint': lambda repo: stablerange_mergepoint(), 417} 418 419@eh.command( 420 b'debugstablerange', 421 [ 422 (b'r', b'rev', [], b'operate on (rev, 0) ranges for rev in REVS'), 423 (b'', b'subranges', False, b'recursively display data for subranges too'), 424 (b'', b'verify', False, b'checks subranges content (EXPENSIVE)'), 425 (b'', b'method', b'mergepoint', 426 b'method to use, one of "branchpoint", "mergepoint"') 427 ], 428 _(b'')) 429def debugstablerange(ui, repo, **opts): 430 """display standard stable subrange for a set of ranges 431 432 Range as displayed as '<node>-<index> (<rev>, <depth>, <length>)', use 433 --verbose to get the extra details in (). 434 """ 435 short = nodemod.short 436 revs = scmutil.revrange(repo, opts['rev']) 437 if not revs: 438 raise error.Abort(b'no revisions specified') 439 if ui.verbose: 440 template = b'%s-%d (%d, %d, %d)' 441 442 def _rangestring(repo, rangeid): 443 return template % ( 444 short(node(rangeid[0])), 445 rangeid[1], 446 rangeid[0], 447 depth(unfi, rangeid[0]), 448 length(unfi, rangeid) 449 ) 450 else: 451 template = b'%s-%d' 452 453 def _rangestring(repo, rangeid): 454 return template % ( 455 short(node(rangeid[0])), 456 rangeid[1], 457 ) 458 # prewarm depth cache 459 unfi = repo.unfiltered() 460 node = unfi.changelog.node 461 462 method = opts['method'] 463 getstablerange = _stablerangemethodmap.get(method) 464 if getstablerange is None: 465 raise error.Abort(b'unknown stable sort method: "%s"' % method) 466 467 stablerange = getstablerange(unfi) 468 depth = stablerange.depthrev 469 length = stablerange.rangelength 470 subranges = stablerange.subranges 471 stablerange.warmup(repo, max(revs)) 472 473 if opts['subranges']: 474 ranges = subrangesclosure(unfi, stablerange, revs) 475 else: 476 ranges = [(r, 0) for r in revs] 477 478 for r in ranges: 479 subs = subranges(unfi, r) 480 subsstr = b', '.join(_rangestring(unfi, s) for s in subs) 481 rstr = _rangestring(unfi, r) 482 if opts['verify']: 483 status = b'leaf' 484 if 1 < length(unfi, r): 485 status = b'complete' 486 revs = set(stablerange.revsfromrange(unfi, r)) 487 subrevs = set() 488 for s in subs: 489 subrevs.update(stablerange.revsfromrange(unfi, s)) 490 if revs != subrevs: 491 status = b'missing' 492 ui.status(b'%s [%s] - %s\n' % (rstr, status, subsstr)) 493 else: 494 ui.status(b'%s - %s\n' % (rstr, subsstr)) 495 496class abstractstablerange(object): 497 """The official API for a stablerange""" 498 499 __metaclass__ = abc.ABCMeta 500 501 @abc.abstractmethod 502 def subranges(self, repo, rangeid): 503 """return the stable sub-ranges of a rangeid""" 504 raise NotImplementedError() 505 506 @abc.abstractmethod 507 def revsfromrange(self, repo, rangeid): 508 """return revision contained in a range""" 509 raise NotImplementedError() 510 511 @abc.abstractmethod 512 def depthrev(self, repo, rev): 513 """depth a revision""" 514 # Exist to allow basic implementation to ignore the depthcache 515 # Could be demoted to _depthrev. 516 raise NotImplementedError() 517 518 @abc.abstractmethod 519 def warmup(self, repo, upto=None): 520 """warmup the stable range cache""" 521 raise NotImplementedError() 522 523 @abc.abstractmethod 524 def rangelength(self, repo, rangeid): 525 """number of revision in <range>""" 526 raise NotImplementedError() 527 528 def _slicepoint(self, repo, rangeid): 529 """find the standard slicing point for a range""" 530 rangedepth = self.depthrev(repo, rangeid[0]) 531 step = _hlp2(rangedepth) 532 standard_start = 0 533 while standard_start < rangeid[1] and 0 < step: 534 if standard_start + step < rangedepth: 535 standard_start += step 536 step //= 2 537 if rangeid[1] == standard_start: 538 slicesize = _hlp2(self.rangelength(repo, rangeid)) 539 slicepoint = rangeid[1] + slicesize 540 else: 541 assert standard_start < rangedepth 542 slicepoint = standard_start 543 return slicepoint 544class stablerangebasic(abstractstablerange): 545 """a very dummy implementation of stablerange 546 547 the implementation is here to lay down the basic algorithm in the stable 548 range in a inefficient but easy to read manners. It should be used by test 549 to validate output.""" 550 551 __metaclass__ = abc.ABCMeta 552 553 def _sortfunction(self, repo, headrev): 554 return stablesort.stablesort_branchpoint(repo, [headrev]) 555 556 def warmup(self, repo, upto=None): 557 # no cache to warm for basic implementation 558 pass 559 560 def depthrev(self, repo, rev): 561 """depth a revision""" 562 return len(repo.revs(b'::%d', rev)) 563 564 def revsfromrange(self, repo, rangeid): 565 """return revision contained in a range 566 567 The range `(<head>, <skip>)` contains all revisions stable-sorted from 568 <head>, skipping the <index>th lower revisions. 569 """ 570 headrev, index = rangeid[0], rangeid[1] 571 revs = self._sortfunction(repo, headrev) 572 return revs[index:] 573 574 def rangelength(self, repo, rangeid): 575 """number of revision in <range>""" 576 return len(self.revsfromrange(repo, rangeid)) 577 578 def subranges(self, repo, rangeid): 579 """return the stable sub-ranges of a rangeid""" 580 headrev, index = rangeid[0], rangeid[1] 581 if self.rangelength(repo, rangeid) == 1: 582 return [] 583 slicepoint = self._slicepoint(repo, rangeid) 584 585 # search for range defining the lower set of revision 586 # 587 # we walk the lower set from the top following the stable order of the 588 # current "head" of the lower range. 589 # 590 # As soon as the revision in the lowerset diverges from the one in the 591 # range being generated, we emit the range and start a new one. 592 result = [] 593 lowerrevs = self.revsfromrange(repo, rangeid)[:(slicepoint - index)] 594 head = None 595 headrange = None 596 skip = None 597 for rev in lowerrevs[::-1]: 598 if head is None: 599 head = rev 600 headrange = self.revsfromrange(repo, (head, 0)) 601 skip = self.depthrev(repo, rev) - 1 602 elif rev != headrange[skip - 1]: 603 result.append((head, skip)) 604 head = rev 605 headrange = self.revsfromrange(repo, (head, 0)) 606 skip = self.depthrev(repo, rev) - 1 607 else: 608 skip -= 1 609 result.append((head, skip)) 610 611 result.reverse() 612 613 # top part is trivial 614 top = (headrev, slicepoint) 615 result.append(top) 616 617 # double check the result 618 initialrevs = self.revsfromrange(repo, rangeid) 619 subrangerevs = sum((self.revsfromrange(repo, sub) for sub in result), 620 []) 621 assert initialrevs == subrangerevs 622 return result 623 624class stablerangedummy_mergepoint(stablerangebasic): 625 """a very dummy implementation of stablerange use 'mergepoint' sorting 626 """ 627 628 def _sortfunction(self, repo, headrev): 629 return stablesort.stablesort_mergepoint_head_basic(repo, [headrev]) 630 631class stablerangecached(abstractstablerange): 632 """an implementation of stablerange using caching""" 633 634 __metaclass__ = abc.ABCMeta 635 636 def __init__(self): 637 # cache the standard stable subranges or a range 638 self._subrangescache = {} 639 super(stablerangecached, self).__init__() 640 641 def depthrev(self, repo, rev): 642 return repo.depthcache.get(rev) 643 644 def rangelength(self, repo, rangeid): 645 """number of revision in <range>""" 646 headrev, index = rangeid[0], rangeid[1] 647 return self.depthrev(repo, headrev) - index 648 649 def subranges(self, repo, rangeid): 650 assert 0 <= rangeid[1] <= rangeid[0], rangeid 651 cached = self._getsub(rangeid) 652 if cached is not None: 653 return cached 654 value = self._subranges(repo, rangeid) 655 self._setsub(rangeid, value) 656 return value 657 658 def _getsub(self, rev): 659 """utility function used to access the subranges cache 660 661 This mostly exist to help the on disk persistence""" 662 return self._subrangescache.get(rev) 663 664 def _setsub(self, rev, value): 665 """utility function used to set the subranges cache 666 667 This mostly exist to help the on disk persistence.""" 668 self._subrangescache[rev] = value 669 670class stablerange_mergepoint(stablerangecached): 671 """Stablerange implementation using 'mergepoint' based sorting 672 """ 673 674 def __init__(self): 675 super(stablerange_mergepoint, self).__init__() 676 677 def warmup(self, repo, upto=None): 678 # no cache to warm for basic implementation 679 pass 680 681 def revsfromrange(self, repo, rangeid): 682 """return revision contained in a range 683 684 The range `(<head>, <skip>)` contains all revisions stable-sorted from 685 <head>, skipping the <index>th lower revisions. 686 """ 687 limit = self.rangelength(repo, rangeid) 688 return repo.stablesort.get(repo, rangeid[0], limit=limit) 689 690 def _stableparent(self, repo, headrev): 691 """The parent of the changeset with reusable subrange 692 693 For non-merge it is simple, there is a single parent. For Mercurial we 694 have to find the right one. Since the stable sort use merge-point, we 695 know that one of REV parents stable sort is a subset of REV stable 696 sort. In other word: 697 698 sort(::REV) = sort(::min(parents(REV)) 699 + sort(only(max(parents(REV)), min(parents(REV))) 700 + [REV] 701 702 We are looking for that `min(parents(REV))`. Since the subrange are 703 based on the sort, we can reuse its subrange as well. 704 """ 705 ps = filterparents(repo.changelog.parentrevs(headrev)) 706 if not ps: 707 return nodemod.nullrev 708 elif len(ps) == 1: 709 return ps[0] 710 else: 711 tiebreaker = stablesort._mergepoint_tie_breaker(repo) 712 return min(ps, key=tiebreaker) 713 714 def _parentrange(self, repo, rangeid): 715 stable_parent = self._stableparent(repo, rangeid[0]) 716 stable_parent_depth = self.depthrev(repo, stable_parent) 717 stable_parent_range = (stable_parent, rangeid[1]) 718 return stable_parent_depth, stable_parent_range 719 720 def _warmcachefor(self, repo, rangeid, slicepoint): 721 """warm cache with all the element necessary""" 722 stack = [] 723 depth, current = self._parentrange(repo, rangeid) 724 while current not in self._subrangescache and slicepoint < depth: 725 stack.append(current) 726 depth, current = self._parentrange(repo, current) 727 while stack: 728 current = stack.pop() 729 self.subranges(repo, current) 730 731 def _subranges(self, repo, rangeid): 732 headrev, initial_index = rangeid 733 # size 1 range can't be sliced 734 if self.rangelength(repo, rangeid) == 1: 735 return [] 736 # find were we need to slice 737 slicepoint = self._slicepoint(repo, rangeid) 738 739 ret = self._slicesrangeat(repo, rangeid, slicepoint) 740 741 return ret 742 743 def _slicesrangeat(self, repo, rangeid, slicepoint): 744 headrev, initial_index = rangeid 745 self._warmcachefor(repo, rangeid, slicepoint) 746 747 stable_parent_data = self._parentrange(repo, rangeid) 748 stable_parent_depth, stable_parent_range = stable_parent_data 749 750 # top range is always the same, so we can build it early for all 751 top_range = (headrev, slicepoint) 752 753 # now find out about the lower range, if we are lucky there is only 754 # one, otherwise we need to issue multiple one to cover every revision 755 # on the lower set. (and cover them only once). 756 if slicepoint == stable_parent_depth: 757 # luckly shot, the parent is actually the head of the lower range 758 subranges = [ 759 stable_parent_range, 760 top_range, 761 ] 762 elif slicepoint < stable_parent_depth: 763 # The parent is above the slice point, 764 # it's lower subrange will be the same so we just get them, 765 # (and the top range is always the same) 766 subranges = self.subranges(repo, stable_parent_range)[:] 767 parenttop = subranges.pop() 768 lenparenttop = self.rangelength(repo, parenttop) 769 skimfromparent = stable_parent_depth - slicepoint 770 if lenparenttop < skimfromparent: 771 # dropping the first subrange of the stable parent range is not 772 # enough to skip what we need to skip, change in approach is needed 773 subranges = self._slicesrangeat(repo, stable_parent_range, slicepoint) 774 subranges.pop() 775 elif lenparenttop > skimfromparent: 776 # The first subrange of the parent is longer that what we want 777 # to drop, we need to keep some of it. 778 midranges = self._slicesrangeat(repo, parenttop, slicepoint) 779 subranges.extend(midranges[:-1]) 780 subranges.append(top_range) 781 elif initial_index < stable_parent_depth < slicepoint: 782 # the parent is below the range we are considering, we need to 783 # compute these uniques subranges 784 subranges = [stable_parent_range] 785 subranges.extend(self._unique_subranges(repo, headrev, 786 stable_parent_depth, 787 slicepoint)) 788 subranges.append(top_range) 789 else: 790 # we cannot reuse the parent range at all 791 subranges = list(self._unique_subranges(repo, headrev, 792 initial_index, 793 slicepoint)) 794 subranges.append(top_range) 795 796 ### slow code block to validated the slicing works as expected 797 # toprevs = self.revsfromrange(repo, rangeid) 798 # subrevs = [] 799 # for s in subranges: 800 # subrevs.extend(self.revsfromrange(repo, s)) 801 # assert toprevs == subrevs, (rangeid, slicepoint, stable_parent_range, stable_parent_depth, toprevs, subrevs) 802 803 return subranges 804 805 def _unique_subranges(self, repo, headrev, initial_index, slicepoint): 806 """Compute subrange unique to the exclusive part of merge""" 807 result = [] 808 depth = repo.depthcache.get 809 nextmerge = repo.firstmergecache.get 810 walkfrom = functools.partial(repo.stablesort.walkfrom, repo) 811 getjumps = functools.partial(repo.stablesort.getjumps, repo) 812 skips = depth(headrev) - slicepoint 813 tomap = slicepoint - initial_index 814 815 jumps = getjumps(headrev) 816 # this function is only caled if headrev is a merge 817 # and initial_index is above its lower parents 818 assert jumps is not None 819 jumps = iter(jumps) 820 assert 0 < skips, skips 821 assert 0 < tomap, (tomap, (headrev, initial_index), slicepoint) 822 823 # utility function to find the next changeset with jump information 824 # (and the distance to it) 825 def nextmergedata(startrev): 826 merge = nextmerge(startrev) 827 depthrev = depth(startrev) 828 if merge == startrev: 829 return 0, startrev 830 elif merge == nodemod.nullrev: 831 return depthrev, None 832 depthmerge = depth(merge) 833 return depthrev - depthmerge, merge 834 835 # skip over all necesary data 836 mainjump = None 837 jumpdest = headrev 838 while 0 < skips: 839 jumphead = jumpdest 840 currentjump = next(jumps) 841 skipped = size = currentjump[2] 842 jumpdest = currentjump[1] 843 if size == skips: 844 jumphead = jumpdest 845 mainjump = next(jumps) 846 mainsize = mainjump[2] 847 elif skips < size: 848 revs = walkfrom(jumphead) 849 next(revs) 850 for i in range(skips): 851 jumphead = next(revs) 852 assert jumphead is not None 853 skipped = skips 854 size -= skips 855 mainjump = currentjump 856 mainsize = size 857 skips -= skipped 858 assert skips == 0, skips 859 860 # exiting from the previous block we should have: 861 # jumphead: first non-skipped revision (head of the high subrange) 862 # mainjump: next jump coming jump on main iteration 863 # mainsize: len(mainjump[0]::jumphead) 864 865 # Now we need to compare walk on the main iteration with walk from the 866 # current subrange head. Instead of doing a full walk, we just skim 867 # over the jumps for each iteration. 868 rangehead = jumphead 869 refjumps = None 870 size = 0 871 while size < tomap: 872 assert mainjump is not None 873 if refjumps is None: 874 dist2merge, merge = nextmergedata(jumphead) 875 if (mainsize <= dist2merge) or merge is None: 876 refjumps = iter(()) 877 ref = None 878 else: 879 # advance counters 880 size += dist2merge 881 mainsize -= dist2merge 882 jumphead = merge 883 refjumps = iter(getjumps(merge)) 884 ref = next(refjumps, None) 885 elif ref is not None and mainjump[0:2] == ref[0:2]: 886 # both follow the same path 887 size += mainsize 888 jumphead = mainjump[1] 889 mainjump = next(jumps, None) 890 mainsize = mainjump[2] 891 ref = next(refjumps, None) 892 if ref is None: 893 # we are doing with section specific to the last merge 894 # reset `refjumps` to trigger the logic that search for the 895 # next merge 896 refjumps = None 897 else: 898 size += mainsize 899 if size < tomap: 900 subrange = (rangehead, depth(rangehead) - size) 901 assert subrange[1] < depth(subrange[0]) 902 result.append(subrange) 903 tomap -= size 904 size = 0 905 jumphead = rangehead = mainjump[1] 906 mainjump = next(jumps, None) 907 mainsize = mainjump[2] 908 refjumps = None 909 910 if tomap: 911 subrange = (rangehead, depth(rangehead) - tomap) 912 assert subrange[1] < depth(subrange[0]), (rangehead, depth(rangehead), tomap) 913 result.append(subrange) 914 result.reverse() 915 return result 916 917class stablerange(stablerangecached): 918 919 def __init__(self, lrusize=2000): 920 # The point up to which we have data in cache 921 self._tiprev = None 922 self._tipnode = None 923 # To slices merge, we need to walk their descendant in reverse stable 924 # sort order. For now we perform a full stable sort their descendant 925 # and then use the relevant top most part. This order is going to be 926 # the same for all ranges headed at the same merge. So we cache these 927 # value to reuse them accross the same invocation. 928 self._stablesortcache = util.lrucachedict(lrusize) 929 # something useful to compute the above 930 # mergerev -> stablesort, length 931 self._stablesortprepared = util.lrucachedict(lrusize) 932 # caching parent call # as we do so many of them 933 self._parentscache = {} 934 # The first part of the stable sorted list of revision of a merge will 935 # shared with the one of others. This means we can reuse subranges 936 # computed from that point to compute some of the subranges from the 937 # merge. 938 self._inheritancecache = {} 939 super(stablerange, self).__init__() 940 941 def warmup(self, repo, upto=None): 942 """warm the cache up""" 943 repo = repo.unfiltered() 944 repo.depthcache.update(repo) 945 cl = repo.changelog 946 # subrange should be warmed from head to range to be able to benefit 947 # from revsfromrange cache. otherwise each merge will trigger its own 948 # stablesort. 949 # 950 # we use the revnumber as an approximation for depth 951 ui = repo.ui 952 starttime = util.timer() 953 954 if upto is None: 955 upto = cl.tiprev() 956 if self._tiprev is None: 957 revs = cl.revs(stop=upto) 958 nbrevs = upto + 1 959 else: 960 assert cl.node(self._tiprev) == self._tipnode 961 if upto <= self._tiprev: 962 return 963 revs = cl.revs(start=self._tiprev + 1, stop=upto) 964 nbrevs = upto - self._tiprev 965 rangeheap = [] 966 progress = ui.makeprogress(_(b"filling depth cache"), _(b"changesets"), nbrevs) 967 for idx, r in enumerate(revs): 968 if not idx % 1000: 969 progress.update(idx) 970 # warm up depth 971 self.depthrev(repo, r) 972 rangeheap.append((-r, (r, 0))) 973 progress.complete() 974 975 heappop = heapq.heappop 976 heappush = heapq.heappush 977 heapify = heapq.heapify 978 979 original = set(rangeheap) 980 seen = 0 981 # progress report is showing up in the profile for small and fast 982 # repository so we build that complicated work around. 983 progress_each = 100 984 progress_last = time.time() 985 heapify(rangeheap) 986 progress = ui.makeprogress(_(b"filling stablerange cache"), _(b"changesets"), nbrevs) 987 while rangeheap: 988 value = heappop(rangeheap) 989 if value in original: 990 if not seen % progress_each: 991 # if a lot of time passed, report more often 992 progress_new = time.time() 993 if (1 < progress_each) and (0.1 < progress_new - progress_last): 994 progress_each /= 10 995 progress.update(seen) 996 progress_last = progress_new 997 seen += 1 998 original.remove(value) # might have been added from other source 999 __, rangeid = value 1000 if self._getsub(rangeid) is None: 1001 for sub in self.subranges(repo, rangeid): 1002 if self._getsub(sub) is None: 1003 heappush(rangeheap, (-sub[0], sub)) 1004 progress.complete() 1005 1006 self._tiprev = upto 1007 self._tipnode = cl.node(upto) 1008 1009 duration = util.timer() - starttime 1010 repo.ui.log(b'evoext-cache', b'updated stablerange cache in %.4f seconds\n', 1011 duration) 1012 1013 def subranges(self, repo, rangeid): 1014 cached = self._getsub(rangeid) 1015 if cached is not None: 1016 return cached 1017 value = self._subranges(repo, rangeid) 1018 self._setsub(rangeid, value) 1019 return value 1020 1021 def revsfromrange(self, repo, rangeid): 1022 headrev, index = rangeid 1023 rangelength = self.rangelength(repo, rangeid) 1024 if rangelength == 1: 1025 revs = [headrev] 1026 else: 1027 # get all revs under heads in stable order 1028 # 1029 # note: In the general case we can just walk down and then request 1030 # data about the merge. But I'm not sure this function will be even 1031 # call for the general case. 1032 1033 allrevs = self._stablesortcache.get(headrev) 1034 if allrevs is None: 1035 allrevs = self._getrevsfrommerge(repo, headrev) 1036 if allrevs is None: 1037 mc = self._filestablesortcache 1038 sorting = stablesort.stablesort_branchpoint 1039 allrevs = sorting(repo, [headrev], mergecallback=mc) 1040 self._stablesortcache[headrev] = allrevs 1041 # takes from index 1042 revs = allrevs[index:] 1043 # sanity checks 1044 assert len(revs) == rangelength 1045 return revs 1046 1047 def _parents(self, rev, func): 1048 parents = self._parentscache.get(rev) 1049 if parents is None: 1050 parents = func(rev) 1051 self._parentscache[rev] = parents 1052 return parents 1053 1054 def _getsub(self, rev): 1055 """utility function used to access the subranges cache 1056 1057 This mostly exist to help the on disk persistence""" 1058 return self._subrangescache.get(rev) 1059 1060 def _setsub(self, rev, value): 1061 """utility function used to set the subranges cache 1062 1063 This mostly exist to help the on disk persistence.""" 1064 self._subrangescache[rev] = value 1065 1066 def _filestablesortcache(self, sortedrevs, merge): 1067 if merge not in self._stablesortprepared: 1068 self._stablesortprepared[merge] = (sortedrevs, len(sortedrevs)) 1069 1070 def _getrevsfrommerge(self, repo, merge): 1071 prepared = self._stablesortprepared.get(merge) 1072 if prepared is None: 1073 return None 1074 1075 mergedepth = self.depthrev(repo, merge) 1076 allrevs = prepared[0][:prepared[1]] 1077 nbextrarevs = prepared[1] - mergedepth 1078 if not nbextrarevs: 1079 return allrevs 1080 1081 anc = repo.changelog.ancestors([merge], inclusive=True) 1082 top = [] 1083 counter = nbextrarevs 1084 for rev in reversed(allrevs): 1085 if rev in anc: 1086 top.append(rev) 1087 else: 1088 counter -= 1 1089 if counter <= 0: 1090 break 1091 1092 bottomidx = prepared[1] - (nbextrarevs + len(top)) 1093 revs = allrevs[:bottomidx] 1094 revs.extend(reversed(top)) 1095 return revs 1096 1097 def _inheritancepoint(self, repo, merge): 1098 """Find the inheritance point of a Merge 1099 1100 The first part of the stable sorted list of revision of a merge will shared with 1101 the one of others. This means we can reuse subranges computed from that point to 1102 compute some of the subranges from the merge. 1103 1104 That point is latest point in the stable sorted list where the depth of the 1105 revisions match its index (that means all revision earlier in the stable sorted 1106 list are its ancestors, no dangling unrelated branches exists). 1107 """ 1108 value = self._inheritancecache.get(merge) 1109 if value is None: 1110 revs = self.revsfromrange(repo, (merge, 0)) 1111 i = reversed(revs) 1112 next(i) # pop the merge 1113 expected = len(revs) - 1 1114 # Since we do warmup properly, we can expect the cache to be hot 1115 # for everythin under the merge we investigate 1116 cache = repo.depthcache 1117 # note: we cannot do a binary search because element under the 1118 # inherited point might have mismatching depth because of inner 1119 # branching. 1120 for rev in i: 1121 if cache.get(rev) == expected: 1122 break 1123 expected -= 1 1124 value = (expected - 1, rev) 1125 self._inheritancecache[merge] = value 1126 return value 1127 1128 def _subranges(self, repo, rangeid): 1129 if self.rangelength(repo, rangeid) == 1: 1130 return [] 1131 slicepoint = self._slicepoint(repo, rangeid) 1132 1133 # make sure we have cache for all relevant parent first to prevent 1134 # recursion (python is bad with recursion 1135 stack = [] 1136 current = rangeid 1137 while current is not None: 1138 current = self._cold_reusable(repo, current, slicepoint) 1139 if current is not None: 1140 stack.append(current) 1141 while stack: 1142 # these call will directly compute the subranges 1143 self.subranges(repo, stack.pop()) 1144 return self._slicesrangeat(repo, rangeid, slicepoint) 1145 1146 def _cold_reusable(self, repo, rangeid, slicepoint): 1147 """return parent range that it would be useful to prepare to slice 1148 rangeid at slicepoint 1149 1150 This function also have the important task to update the revscache of 1151 the parent rev s if possible and needed""" 1152 ps = filterparents(self._parents(rangeid[0], repo.changelog.parentrevs)) 1153 if not ps: 1154 return None 1155 elif len(ps) == 1: 1156 # regular changesets, we pick the parent 1157 reusablerev = ps[0] 1158 else: 1159 # merge, we try the inheritance point 1160 # if it is too low, it will be ditched by the depth check anyway 1161 index, reusablerev = self._inheritancepoint(repo, rangeid[0]) 1162 1163 # if we reached the slicepoint, no need to go further 1164 if self.depthrev(repo, reusablerev) <= slicepoint: 1165 return None 1166 1167 reurange = (reusablerev, rangeid[1]) 1168 # if we have an entry for the current range, lets update the cache 1169 # if we already have subrange for this range, no need to prepare it. 1170 if self._getsub(reurange) is not None: 1171 return None 1172 1173 # look like we found a relevent parentrange with no cache yet 1174 return reurange 1175 1176 def _slicesrangeat(self, repo, rangeid, globalindex): 1177 ps = self._parents(rangeid[0], repo.changelog.parentrevs) 1178 if len(ps) == 1: 1179 reuserev = ps[0] 1180 else: 1181 index, reuserev = self._inheritancepoint(repo, rangeid[0]) 1182 if index < globalindex: 1183 return self._slicesrangeatmerge(repo, rangeid, globalindex) 1184 1185 assert reuserev != nodemod.nullrev 1186 1187 reuserange = (reuserev, rangeid[1]) 1188 top = (rangeid[0], globalindex) 1189 1190 if rangeid[1] + self.rangelength(repo, reuserange) == globalindex: 1191 return [reuserange, top] 1192 # This will not initiate a recursion since we took appropriate 1193 # precaution in the caller of this method to ensure it will be so. 1194 # It the parent is a merge that will not be the case but computing 1195 # subranges from a merge will not recurse. 1196 reusesubranges = self.subranges(repo, reuserange) 1197 slices = reusesubranges[:-1] # pop the top 1198 slices.append(top) 1199 return slices 1200 1201 def _slicesrangeatmerge(self, repo, rangeid, globalindex): 1202 localindex = globalindex - rangeid[1] 1203 1204 result = [] 1205 allrevs = self.revsfromrange(repo, rangeid) 1206 bottomrevs = allrevs[:localindex] 1207 1208 if globalindex == self.depthrev(repo, bottomrevs[-1]): 1209 # simple case, top revision in the bottom set contains exactly the 1210 # revision we needs 1211 result.append((bottomrevs[-1], rangeid[1])) 1212 else: 1213 head = None 1214 headrange = None 1215 skip = None 1216 for rev in bottomrevs[::-1]: 1217 if head is None: 1218 head = rev 1219 headrange = self.revsfromrange(repo, (head, 0)) 1220 skip = self.depthrev(repo, rev) - 1 1221 elif rev != headrange[skip - 1]: 1222 result.append((head, skip)) 1223 head = rev 1224 headrange = self.revsfromrange(repo, (head, 0)) 1225 skip = self.depthrev(repo, rev) - 1 1226 else: 1227 skip -= 1 1228 result.append((head, skip)) 1229 1230 result.reverse() 1231 1232 # top part is trivial 1233 top = (rangeid[0], globalindex) 1234 result.append(top) 1235 return result 1236 1237# merge later for outer layer wrapping 1238eh.merge(stablesort.eh) 1239eh.merge(firstmergecache.eh) 1240