// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Garbage collector (GC).
//
// The GC runs concurrently with mutator threads, is type accurate (aka precise), and allows multiple
// GC threads to run in parallel. It is a concurrent mark and sweep that uses a write barrier. It is
// non-generational and non-compacting. Allocation is done using size segregated per P allocation
// areas to minimize fragmentation while eliminating locks in the common case.
//
// The algorithm decomposes into several steps.
// This is a high-level description of the algorithm being used. For an overview of GC a good
// place to start is Richard Jones' gchandbook.org.
//
// The algorithm's intellectual heritage includes Dijkstra's on-the-fly algorithm, see
// Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. 1978.
// On-the-fly garbage collection: an exercise in cooperation. Commun. ACM 21, 11 (November 1978),
// 966-975.
// For journal-quality proofs that these steps are complete, correct, and terminate see
// Hudson, R., and Moss, J.E.B. Copying Garbage Collection without stopping the world.
// Concurrency and Computation: Practice and Experience 15(3-5), 2003.
//
// 1. GC performs sweep termination.
//
//    a. Stop the world. This causes all Ps to reach a GC safe-point.
//
//    b. Sweep any unswept spans. There will only be unswept spans if
//    this GC cycle was forced before the expected time.
//
// 2. GC performs the mark phase.
//
//    a. Prepare for the mark phase by setting gcphase to _GCmark
//    (from _GCoff), enabling the write barrier, enabling mutator
//    assists, and enqueueing root mark jobs. No objects may be
//    scanned until all Ps have enabled the write barrier, which is
//    accomplished using STW.
//
//    b. Start the world. From this point, GC work is done by mark
//    workers started by the scheduler and by assists performed as
//    part of allocation. The write barrier shades both the
//    overwritten pointer and the new pointer value for any pointer
//    writes (see mbarrier.go for details). Newly allocated objects
//    are immediately marked black.
//
//    c. GC performs root marking jobs. This includes scanning all
//    stacks, shading all globals, and shading any heap pointers in
//    off-heap runtime data structures. Scanning a stack stops a
//    goroutine, shades any pointers found on its stack, and then
//    resumes the goroutine.
//
//    d. GC drains the work queue of grey objects, scanning each grey
//    object to black and shading all pointers found in the object
//    (which in turn may add those pointers to the work queue).
//
//    e. Because GC work is spread across local caches, GC uses a
//    distributed termination algorithm to detect when there are no
//    more root marking jobs or grey objects (see gcMarkDone). At this
//    point, GC transitions to mark termination.
//
// 3. GC performs mark termination.
//
//    a. Stop the world.
//
//    b. Set gcphase to _GCmarktermination, and disable workers and
//    assists.
//
//    c. Perform housekeeping like flushing mcaches.
//
// 4. GC performs the sweep phase.
//
//    a. Prepare for the sweep phase by setting gcphase to _GCoff,
//    setting up sweep state and disabling the write barrier.
//
//    b. Start the world. From this point on, newly allocated objects
//    are white, and allocating sweeps spans before use if necessary.
//
//    c. GC does concurrent sweeping in the background and in response
//    to allocation. See description below.
//
// 5. When sufficient allocation has taken place, replay the sequence
// starting with 1 above. See discussion of GC rate below.

// Concurrent sweep.
//
// The sweep phase proceeds concurrently with normal program execution.
// The heap is swept span-by-span both lazily (when a goroutine needs another span)
// and concurrently in a background goroutine (this helps programs that are not CPU bound).
// At the end of STW mark termination all spans are marked as "needs sweeping".
//
// The background sweeper goroutine simply sweeps spans one-by-one.
//
// To avoid requesting more OS memory while there are unswept spans, when a
// goroutine needs another span, it first attempts to reclaim that much memory
// by sweeping. When a goroutine needs to allocate a new small-object span, it
// sweeps small-object spans for the same object size until it frees at least
// one object. When a goroutine needs to allocate a large-object span from the heap,
// it sweeps spans until it frees at least that many pages into the heap. There is
// one case where this may not suffice: if a goroutine sweeps and frees two
// nonadjacent one-page spans to the heap, it will allocate a new two-page
// span, but there can still be other one-page unswept spans which could be
// combined into a two-page span.
//
// It's critical to ensure that no operations proceed on unswept spans (that would corrupt
// mark bits in the GC bitmap). During GC all mcaches are flushed into the central cache,
// so they are empty. When a goroutine grabs a new span into the mcache, it sweeps it.
// When a goroutine explicitly frees an object or sets a finalizer, it ensures that
// the span is swept (either by sweeping it, or by waiting for the concurrent sweep to finish).
// The finalizer goroutine is kicked off only when all spans are swept.
// When the next GC starts, it sweeps all not-yet-swept spans (if any).

// GC rate.
// The next GC runs after we've allocated an extra amount of memory proportional to
// the amount already in use. The proportion is controlled by the GOGC environment variable
// (100 by default). If GOGC=100 and we're using 4M, we'll GC again when we get to 8M
// (this mark is tracked in the next_gc variable). This keeps the GC cost in linear
// proportion to the allocation cost. Adjusting GOGC just changes the linear constant
// (and also the amount of extra memory used).

// Oblets
//
// In order to prevent long pauses while scanning large objects and to
// improve parallelism, the garbage collector breaks up scan jobs for
// objects larger than maxObletBytes into "oblets" of at most
// maxObletBytes. When scanning encounters the beginning of a large
// object, it scans only the first oblet and enqueues the remaining
// oblets as new scan jobs.

package runtime

import (
	"internal/cpu"
	"runtime/internal/atomic"
	"unsafe"
)

const (
	_DebugGC         = 0
	_ConcurrentSweep = true
	_FinBlockSize    = 4 * 1024

	// debugScanConservative enables debug logging for stack
	// frames that are scanned conservatively.
	debugScanConservative = false

	// sweepMinHeapDistance is a lower bound on the heap distance
	// (in bytes) reserved for concurrent sweeping between GC
	// cycles.
	sweepMinHeapDistance = 1024 * 1024
)

// heapminimum is the minimum heap size at which to trigger GC.
// For small heaps, this overrides the usual GOGC*live set rule.
//
// When there is a very small live set but a lot of allocation, simply
// collecting when the heap reaches GOGC*live results in many GC
// cycles and high total per-GC overhead. This minimum amortizes this
// per-GC overhead while keeping the heap reasonably small.
//
// During initialization this is set to 4MB*GOGC/100. In the case of
// GOGC==0, this will set heapminimum to 0, resulting in constant
// collection even when the heap size is small, which is useful for
// debugging.
var heapminimum uint64 = defaultHeapMinimum

// defaultHeapMinimum is the value of heapminimum for GOGC==100.
const defaultHeapMinimum = 4 << 20

// Initialized from $GOGC. GOGC=off means no GC.
var gcpercent int32

func gcinit() {
	if unsafe.Sizeof(workbuf{}) != _WorkbufSize {
		throw("size of Workbuf is suboptimal")
	}

	// No sweep on the first cycle.
	mheap_.sweepdone = 1

	// Set a reasonable initial GC trigger.
	memstats.triggerRatio = 7 / 8.0

	// Fake a heap_marked value so it looks like a trigger at
	// heapminimum is the appropriate growth from heap_marked.
	// This will go into computing the initial GC goal.
	memstats.heap_marked = uint64(float64(heapminimum) / (1 + memstats.triggerRatio))

	// Set gcpercent from the environment. This will also compute
	// and set the GC trigger and goal.
	_ = setGCPercent(readgogc())

	work.startSema = 1
	work.markDoneSema = 1
}

func readgogc() int32 {
	p := gogetenv("GOGC")
	if p == "off" {
		return -1
	}
	if n, ok := atoi32(p); ok {
		return n
	}
	return 100
}

// gcenable is called after the bulk of the runtime initialization,
// just before we're about to start letting user code run.
// It kicks off the background sweeper goroutine, the background
// scavenger goroutine, and enables GC.
func gcenable() {
	// Kick off sweeping and scavenging.
	c := make(chan int, 2)
	expectSystemGoroutine()
	go bgsweep(c)
	expectSystemGoroutine()
	go bgscavenge(c)
	<-c
	<-c
	memstats.enablegc = true // now that runtime is initialized, GC is okay
}

//go:linkname setGCPercent runtime..z2fdebug.setGCPercent
func setGCPercent(in int32) (out int32) {
	// Run on the system stack since we grab the heap lock.
	systemstack(func() {
		lock(&mheap_.lock)
		out = gcpercent
		if in < 0 {
			in = -1
		}
		gcpercent = in
		heapminimum = defaultHeapMinimum * uint64(gcpercent) / 100
		// Update pacing in response to gcpercent change.
		gcSetTriggerRatio(memstats.triggerRatio)
		unlock(&mheap_.lock)
	})
	// Pacing changed, so the scavenger should be awoken.
	wakeScavenger()

	// If we just disabled GC, wait for any concurrent GC mark to
	// finish so we always return with no GC running.
	if in < 0 {
		gcWaitOnMark(atomic.Load(&work.cycles))
	}

	return out
}

// Garbage collector phase.
// Indicates to the write barrier and synchronization task what to perform.
var gcphase uint32

// The compiler knows about this variable.
// If you change it, you must change gofrontend/wb.cc, too.
// If you change the first four bytes, you must also change the write
// barrier insertion code.
var writeBarrier struct {
	enabled bool    // compiler emits a check of this before calling write barrier
	pad     [3]byte // compiler uses 32-bit load for "enabled" field
	needed  bool    // whether we need a write barrier for current GC phase
	cgo     bool    // whether we need a write barrier for a cgo check
	alignme uint64  // guarantee alignment so that compiler can use a 32 or 64-bit load
}

// gcBlackenEnabled is 1 if mutator assists and background mark
// workers are allowed to blacken objects. This must only be set when
// gcphase == _GCmark.
var gcBlackenEnabled uint32

const (
	_GCoff             = iota // GC not running; sweeping in background, write barrier disabled
	_GCmark                   // GC marking roots and workbufs: allocate black, write barrier ENABLED
	_GCmarktermination        // GC mark termination: allocate black, P's help GC, write barrier ENABLED
)

//go:nosplit
func setGCPhase(x uint32) {
	atomic.Store(&gcphase, x)
	writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination
	writeBarrier.enabled = writeBarrier.needed || writeBarrier.cgo
}

// gcMarkWorkerMode represents the mode that a concurrent mark worker
// should operate in.
//
// Concurrent marking happens through four different mechanisms. One
// is mutator assists, which happen in response to allocations and are
// not scheduled. The other three are variations in the per-P mark
// workers and are distinguished by gcMarkWorkerMode.
type gcMarkWorkerMode int

const (
	// gcMarkWorkerDedicatedMode indicates that the P of a mark
	// worker is dedicated to running that mark worker. The mark
	// worker should run without preemption.
	gcMarkWorkerDedicatedMode gcMarkWorkerMode = iota

	// gcMarkWorkerFractionalMode indicates that a P is currently
	// running the "fractional" mark worker. The fractional worker
	// is necessary when GOMAXPROCS*gcBackgroundUtilization is not
	// an integer. The fractional worker should run until it is
	// preempted and will be scheduled to pick up the fractional
	// part of GOMAXPROCS*gcBackgroundUtilization.
	gcMarkWorkerFractionalMode

	// gcMarkWorkerIdleMode indicates that a P is running the mark
	// worker because it has nothing else to do. The idle worker
	// should run until it is preempted and account its time
	// against gcController.idleMarkTime.
	gcMarkWorkerIdleMode
)

// gcMarkWorkerModeStrings are the string labels of gcMarkWorkerModes
// to use in execution traces.
var gcMarkWorkerModeStrings = [...]string{
	"GC (dedicated)",
	"GC (fractional)",
	"GC (idle)",
}

// gcController implements the GC pacing controller that determines
// when to trigger concurrent garbage collection and how much marking
// work to do in mutator assists and background marking.
//
// It uses a feedback control algorithm to adjust the memstats.gc_trigger
// trigger based on the heap growth and GC CPU utilization each cycle.
// This algorithm optimizes for heap growth to match GOGC and for CPU
// utilization between assist and background marking to be 25% of
// GOMAXPROCS. The high-level design of this algorithm is documented
// at https://golang.org/s/go15gcpacing.
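//
// As a hedged illustration (an editor's addition, not a quote from the
// design document): within a cycle the controller's main output is the
// assist ratio computed in revise, roughly
//
//	assistWorkPerByte = scanWorkRemaining / (heapGoal - heap_live)
//
// e.g. 10MB of expected scan work left and 20MB of headroom to the
// goal means each allocated byte must pay for about 0.5 bytes of
// scanning, either directly or by stealing background credit.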
//
// All fields of gcController are used only during a single mark
// cycle.
var gcController gcControllerState

type gcControllerState struct {
	// scanWork is the total scan work performed this cycle. This
	// is updated atomically during the cycle. Updates occur in
	// bounded batches, since it is both written and read
	// throughout the cycle. At the end of the cycle, this is how
	// much of the retained heap is scannable.
	//
	// Currently this is the bytes of heap scanned. For most uses,
	// this is an opaque unit of work, but for estimation the
	// definition is important.
	scanWork int64

	// bgScanCredit is the scan work credit accumulated by the
	// concurrent background scan. This credit is accumulated by
	// the background scan and stolen by mutator assists. This is
	// updated atomically. Updates occur in bounded batches, since
	// it is both written and read throughout the cycle.
	bgScanCredit int64

	// assistTime is the nanoseconds spent in mutator assists
	// during this cycle. This is updated atomically. Updates
	// occur in bounded batches, since it is both written and read
	// throughout the cycle.
	assistTime int64

	// dedicatedMarkTime is the nanoseconds spent in dedicated
	// mark workers during this cycle. This is updated atomically
	// at the end of the concurrent mark phase.
	dedicatedMarkTime int64

	// fractionalMarkTime is the nanoseconds spent in the
	// fractional mark worker during this cycle. This is updated
	// atomically throughout the cycle and will be up-to-date if
	// the fractional mark worker is not currently running.
	fractionalMarkTime int64

	// idleMarkTime is the nanoseconds spent in idle marking
	// during this cycle. This is updated atomically throughout
	// the cycle.
	idleMarkTime int64

	// markStartTime is the absolute start time in nanoseconds
	// that assists and background mark workers started.
	markStartTime int64

	// dedicatedMarkWorkersNeeded is the number of dedicated mark
	// workers that need to be started. This is computed at the
	// beginning of each cycle and decremented atomically as
	// dedicated mark workers get started.
	dedicatedMarkWorkersNeeded int64

	// assistWorkPerByte is the ratio of scan work to allocated
	// bytes that should be performed by mutator assists. This is
	// computed at the beginning of each cycle and updated every
	// time heap_scan is updated.
	assistWorkPerByte float64

	// assistBytesPerWork is 1/assistWorkPerByte.
	assistBytesPerWork float64

	// fractionalUtilizationGoal is the fraction of wall clock
	// time that should be spent in the fractional mark worker on
	// each P that isn't running a dedicated worker.
	//
	// For example, if the utilization goal is 25% and there are
	// no dedicated workers, this will be 0.25. If the goal is
	// 25%, there is one dedicated worker, and GOMAXPROCS is 5,
	// this will be 0.05 to make up the missing 5%.
	//
	// If this is zero, no fractional workers are needed.
	fractionalUtilizationGoal float64

	_ cpu.CacheLinePad
}

// startCycle resets the GC controller's state and computes estimates
// for a new GC cycle. The caller must hold worldsema.
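//
// As an illustrative sketch (an editor's addition tracing the worker
// computation in the body, not additional API): with GOMAXPROCS=6 and
// the 25% background goal,
//
//	total := 6 * 0.25               // 1.5 Ps of marking wanted
//	dedicated := int64(total + 0.5) // rounds to 2, >30% over the goal
//	// one dedicated worker is dropped and the remainder becomes
//	fractional := (1.5 - 1) / 6     // ~0.083 of each P's time
//
// leaving one dedicated worker plus a fractional worker targeting
// roughly 8% of a P.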
func (c *gcControllerState) startCycle() {
	c.scanWork = 0
	c.bgScanCredit = 0
	c.assistTime = 0
	c.dedicatedMarkTime = 0
	c.fractionalMarkTime = 0
	c.idleMarkTime = 0

	// Ensure that the heap goal is at least a little larger than
	// the current live heap size. This may not be the case if GC
	// start is delayed or if the allocation that pushed heap_live
	// over gc_trigger is large or if the trigger is really close to
	// GOGC. Assist is proportional to this distance, so enforce a
	// minimum distance, even if it means going over the GOGC goal
	// by a tiny bit.
	if memstats.next_gc < memstats.heap_live+1024*1024 {
		memstats.next_gc = memstats.heap_live + 1024*1024
	}

	// Compute the background mark utilization goal. In general,
	// this may not come out exactly. We round the number of
	// dedicated workers so that the utilization is closest to
	// 25%. For small GOMAXPROCS, this would introduce too much
	// error, so we add fractional workers in that case.
	totalUtilizationGoal := float64(gomaxprocs) * gcBackgroundUtilization
	c.dedicatedMarkWorkersNeeded = int64(totalUtilizationGoal + 0.5)
	utilError := float64(c.dedicatedMarkWorkersNeeded)/totalUtilizationGoal - 1
	const maxUtilError = 0.3
	if utilError < -maxUtilError || utilError > maxUtilError {
		// Rounding put us more than 30% off our goal. With
		// gcBackgroundUtilization of 25%, this happens for
		// GOMAXPROCS<=3 or GOMAXPROCS=6. Enable fractional
		// workers to compensate.
		if float64(c.dedicatedMarkWorkersNeeded) > totalUtilizationGoal {
			// Too many dedicated workers.
			c.dedicatedMarkWorkersNeeded--
		}
		c.fractionalUtilizationGoal = (totalUtilizationGoal - float64(c.dedicatedMarkWorkersNeeded)) / float64(gomaxprocs)
	} else {
		c.fractionalUtilizationGoal = 0
	}

	// In STW mode, we just want dedicated workers.
	if debug.gcstoptheworld > 0 {
		c.dedicatedMarkWorkersNeeded = int64(gomaxprocs)
		c.fractionalUtilizationGoal = 0
	}

	// Clear per-P state
	for _, p := range allp {
		p.gcAssistTime = 0
		p.gcFractionalMarkTime = 0
	}

	// Compute initial values for controls that are updated
	// throughout the cycle.
	c.revise()

	if debug.gcpacertrace > 0 {
		print("pacer: assist ratio=", c.assistWorkPerByte,
			" (scan ", memstats.heap_scan>>20, " MB in ",
			work.initialHeapLive>>20, "->",
			memstats.next_gc>>20, " MB)",
			" workers=", c.dedicatedMarkWorkersNeeded,
			"+", c.fractionalUtilizationGoal, "\n")
	}
}

// revise updates the assist ratio during the GC cycle to account for
// improved estimates. This should be called either under STW or
// whenever memstats.heap_scan, memstats.heap_live, or
// memstats.next_gc is updated (with mheap_.lock held).
//
// It should only be called when gcBlackenEnabled != 0 (because this
// is when assists are enabled and the necessary statistics are
// available).
func (c *gcControllerState) revise() {
	gcpercent := gcpercent
	if gcpercent < 0 {
		// If GC is disabled but we're running a forced GC,
		// act like GOGC is huge for the below calculations.
		gcpercent = 100000
	}
	live := atomic.Load64(&memstats.heap_live)

	// Assume we're under the soft goal. Pace GC to complete at
	// next_gc assuming the heap is in steady-state.
	heapGoal := int64(memstats.next_gc)

	// Compute the expected scan work remaining.
504 // 505 // This is estimated based on the expected 506 // steady-state scannable heap. For example, with 507 // GOGC=100, only half of the scannable heap is 508 // expected to be live, so that's what we target. 509 // 510 // (This is a float calculation to avoid overflowing on 511 // 100*heap_scan.) 512 scanWorkExpected := int64(float64(memstats.heap_scan) * 100 / float64(100+gcpercent)) 513 514 if live > memstats.next_gc || c.scanWork > scanWorkExpected { 515 // We're past the soft goal, or we've already done more scan 516 // work than we expected. Pace GC so that in the worst case it 517 // will complete by the hard goal. 518 const maxOvershoot = 1.1 519 heapGoal = int64(float64(memstats.next_gc) * maxOvershoot) 520 521 // Compute the upper bound on the scan work remaining. 522 scanWorkExpected = int64(memstats.heap_scan) 523 } 524 525 // Compute the remaining scan work estimate. 526 // 527 // Note that we currently count allocations during GC as both 528 // scannable heap (heap_scan) and scan work completed 529 // (scanWork), so allocation will change this difference 530 // slowly in the soft regime and not at all in the hard 531 // regime. 532 scanWorkRemaining := scanWorkExpected - c.scanWork 533 if scanWorkRemaining < 1000 { 534 // We set a somewhat arbitrary lower bound on 535 // remaining scan work since if we aim a little high, 536 // we can miss by a little. 537 // 538 // We *do* need to enforce that this is at least 1, 539 // since marking is racy and double-scanning objects 540 // may legitimately make the remaining scan work 541 // negative, even in the hard goal regime. 542 scanWorkRemaining = 1000 543 } 544 545 // Compute the heap distance remaining. 546 heapRemaining := heapGoal - int64(live) 547 if heapRemaining <= 0 { 548 // This shouldn't happen, but if it does, avoid 549 // dividing by zero or setting the assist negative. 550 heapRemaining = 1 551 } 552 553 // Compute the mutator assist ratio so by the time the mutator 554 // allocates the remaining heap bytes up to next_gc, it will 555 // have done (or stolen) the remaining amount of scan work. 556 c.assistWorkPerByte = float64(scanWorkRemaining) / float64(heapRemaining) 557 c.assistBytesPerWork = float64(heapRemaining) / float64(scanWorkRemaining) 558} 559 560// endCycle computes the trigger ratio for the next cycle. 561func (c *gcControllerState) endCycle() float64 { 562 if work.userForced { 563 // Forced GC means this cycle didn't start at the 564 // trigger, so where it finished isn't good 565 // information about how to adjust the trigger. 566 // Just leave it where it is. 567 return memstats.triggerRatio 568 } 569 570 // Proportional response gain for the trigger controller. Must 571 // be in [0, 1]. Lower values smooth out transient effects but 572 // take longer to respond to phase changes. Higher values 573 // react to phase changes quickly, but are more affected by 574 // transient changes. Values near 1 may be unstable. 575 const triggerGain = 0.5 576 577 // Compute next cycle trigger ratio. First, this computes the 578 // "error" for this cycle; that is, how far off the trigger 579 // was from what it should have been, accounting for both heap 580 // growth and GC CPU utilization. We compute the actual heap 581 // growth during this cycle and scale that by how far off from 582 // the goal CPU utilization we were (to estimate the heap 583 // growth if we had the desired CPU utilization). The 584 // difference between this estimate and the GOGC-based goal 585 // heap growth is the error. 
	goalGrowthRatio := gcEffectiveGrowthRatio()
	actualGrowthRatio := float64(memstats.heap_live)/float64(memstats.heap_marked) - 1
	assistDuration := nanotime() - c.markStartTime

	// Assume background mark hit its utilization goal.
	utilization := gcBackgroundUtilization
	// Add assist utilization; avoid divide by zero.
	if assistDuration > 0 {
		utilization += float64(c.assistTime) / float64(assistDuration*int64(gomaxprocs))
	}

	triggerError := goalGrowthRatio - memstats.triggerRatio - utilization/gcGoalUtilization*(actualGrowthRatio-memstats.triggerRatio)

	// Finally, we adjust the trigger for next time by this error,
	// damped by the proportional gain.
	triggerRatio := memstats.triggerRatio + triggerGain*triggerError

	if debug.gcpacertrace > 0 {
		// Print controller state in terms of the design
		// document.
		H_m_prev := memstats.heap_marked
		h_t := memstats.triggerRatio
		H_T := memstats.gc_trigger
		h_a := actualGrowthRatio
		H_a := memstats.heap_live
		h_g := goalGrowthRatio
		H_g := int64(float64(H_m_prev) * (1 + h_g))
		u_a := utilization
		u_g := gcGoalUtilization
		W_a := c.scanWork
		print("pacer: H_m_prev=", H_m_prev,
			" h_t=", h_t, " H_T=", H_T,
			" h_a=", h_a, " H_a=", H_a,
			" h_g=", h_g, " H_g=", H_g,
			" u_a=", u_a, " u_g=", u_g,
			" W_a=", W_a,
			" goalΔ=", goalGrowthRatio-h_t,
			" actualΔ=", h_a-h_t,
			" u_a/u_g=", u_a/u_g,
			"\n")
	}

	return triggerRatio
}

// enlistWorker encourages another dedicated mark worker to start on
// another P if there are spare worker slots. It is used by putfull
// when more work is made available.
//
//go:nowritebarrier
func (c *gcControllerState) enlistWorker() {
	// If there are idle Ps, wake one so it will run an idle worker.
	// NOTE: This is suspected of causing deadlocks. See golang.org/issue/19112.
	//
	// if atomic.Load(&sched.npidle) != 0 && atomic.Load(&sched.nmspinning) == 0 {
	// 	wakep()
	// 	return
	// }

	// There are no idle Ps. If we need more dedicated workers,
	// try to preempt a running P so it will switch to a worker.
	if c.dedicatedMarkWorkersNeeded <= 0 {
		return
	}
	// Pick a random other P to preempt.
	if gomaxprocs <= 1 {
		return
	}
	gp := getg()
	if gp == nil || gp.m == nil || gp.m.p == 0 {
		return
	}
	myID := gp.m.p.ptr().id
	for tries := 0; tries < 5; tries++ {
		id := int32(fastrandn(uint32(gomaxprocs - 1)))
		if id >= myID {
			id++
		}
		p := allp[id]
		if p.status != _Prunning {
			continue
		}
		if preemptone(p) {
			return
		}
	}
}

// findRunnableGCWorker returns the background mark worker for _p_ if it
// should be run. This must only be called when gcBlackenEnabled != 0.
func (c *gcControllerState) findRunnableGCWorker(_p_ *p) *g {
	if gcBlackenEnabled == 0 {
		throw("gcControllerState.findRunnable: blackening not enabled")
	}
	if _p_.gcBgMarkWorker == 0 {
		// The mark worker associated with this P is blocked
		// performing a mark transition. We can't run it
		// because it may be on some other run or wait queue.
		return nil
	}

	if !gcMarkWorkAvailable(_p_) {
		// No work to be done right now. This can happen at
		// the end of the mark phase when there are still
		// assists tapering off. Don't bother running a worker
		// now because it'll just return immediately.
		return nil
	}

	decIfPositive := func(ptr *int64) bool {
		if *ptr > 0 {
			if atomic.Xaddint64(ptr, -1) >= 0 {
				return true
			}
			// We lost a race
			atomic.Xaddint64(ptr, +1)
		}
		return false
	}

	if decIfPositive(&c.dedicatedMarkWorkersNeeded) {
		// This P is now dedicated to marking until the end of
		// the concurrent mark phase.
		_p_.gcMarkWorkerMode = gcMarkWorkerDedicatedMode
	} else if c.fractionalUtilizationGoal == 0 {
		// No need for fractional workers.
		return nil
	} else {
		// Is this P behind on the fractional utilization
		// goal?
		//
		// This should be kept in sync with pollFractionalWorkerExit.
		delta := nanotime() - gcController.markStartTime
		if delta > 0 && float64(_p_.gcFractionalMarkTime)/float64(delta) > c.fractionalUtilizationGoal {
			// Nope. No need to run a fractional worker.
			return nil
		}
		// Run a fractional worker.
		_p_.gcMarkWorkerMode = gcMarkWorkerFractionalMode
	}

	// Run the background mark worker.
	gp := _p_.gcBgMarkWorker.ptr()
	casgstatus(gp, _Gwaiting, _Grunnable)
	if trace.enabled {
		traceGoUnpark(gp, 0)
	}
	return gp
}

// pollFractionalWorkerExit reports whether a fractional mark worker
// should self-preempt. It assumes it is called from the fractional
// worker.
func pollFractionalWorkerExit() bool {
	// This should be kept in sync with the fractional worker
	// scheduler logic in findRunnableGCWorker.
	now := nanotime()
	delta := now - gcController.markStartTime
	if delta <= 0 {
		return true
	}
	p := getg().m.p.ptr()
	selfTime := p.gcFractionalMarkTime + (now - p.gcMarkWorkerStartTime)
	// Add some slack to the utilization goal so that the
	// fractional worker isn't behind again the instant it exits.
	return float64(selfTime)/float64(delta) > 1.2*gcController.fractionalUtilizationGoal
}

// gcSetTriggerRatio sets the trigger ratio and updates everything
// derived from it: the absolute trigger, the heap goal, mark pacing,
// and sweep pacing.
//
// This can be called any time. If GC is in the middle of a
// concurrent phase, it will adjust the pacing of that phase.
//
// This depends on gcpercent, memstats.heap_marked, and
// memstats.heap_live. These must be up to date.
//
// mheap_.lock must be held or the world must be stopped.
func gcSetTriggerRatio(triggerRatio float64) {
	// Compute the next GC goal, which is when the allocated heap
	// has grown by GOGC/100 over the heap marked by the last
	// cycle.
	goal := ^uint64(0)
	if gcpercent >= 0 {
		goal = memstats.heap_marked + memstats.heap_marked*uint64(gcpercent)/100
	}

	// Set the trigger ratio, capped to reasonable bounds.
	if gcpercent >= 0 {
		scalingFactor := float64(gcpercent) / 100
		// Ensure there's always a little margin so that the
		// mutator assist ratio isn't infinity.
		maxTriggerRatio := 0.95 * scalingFactor
		if triggerRatio > maxTriggerRatio {
			triggerRatio = maxTriggerRatio
		}

		// If we let triggerRatio go too low, then if the application
		// is allocating very rapidly we might end up in a situation
		// where we're allocating black during a nearly always-on GC.
		// The result of this is a growing heap and ultimately an
		// increase in RSS. By capping us at a point >0, we're essentially
		// saying that we're OK using more CPU during the GC to prevent
		// this growth in RSS.
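		//
		// (Editor's illustration of the clamp, not from the original
		// comment: with GOGC=100 the scaling factor is 1, so the
		// trigger ratio ends up in [0.6, 0.95], i.e. the next cycle
		// starts once the heap has grown by 60-95% over the last
		// marked heap.)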
		//
		// The current constant was chosen empirically: given a sufficiently
		// fast/scalable allocator with 48 Ps that could drive the trigger ratio
		// to <0.05, this constant causes applications to retain the same peak
		// RSS compared to not having this allocator.
		minTriggerRatio := 0.6 * scalingFactor
		if triggerRatio < minTriggerRatio {
			triggerRatio = minTriggerRatio
		}
	} else if triggerRatio < 0 {
		// gcpercent < 0, so just make sure we're not getting a negative
		// triggerRatio. This case isn't expected to happen in practice,
		// and doesn't really matter because if gcpercent < 0 then we won't
		// ever consume triggerRatio further on in this function, but let's
		// just be defensive here; the triggerRatio being negative is almost
		// certainly undesirable.
		triggerRatio = 0
	}
	memstats.triggerRatio = triggerRatio

	// Compute the absolute GC trigger from the trigger ratio.
	//
	// We trigger the next GC cycle when the allocated heap has
	// grown by the trigger ratio over the marked heap size.
	trigger := ^uint64(0)
	if gcpercent >= 0 {
		trigger = uint64(float64(memstats.heap_marked) * (1 + triggerRatio))
		// Don't trigger below the minimum heap size.
		minTrigger := heapminimum
		if !isSweepDone() {
			// Concurrent sweep happens in the heap growth
			// from heap_live to gc_trigger, so ensure
			// that concurrent sweep has some heap growth
			// in which to perform sweeping before we
			// start the next GC cycle.
			sweepMin := atomic.Load64(&memstats.heap_live) + sweepMinHeapDistance
			if sweepMin > minTrigger {
				minTrigger = sweepMin
			}
		}
		if trigger < minTrigger {
			trigger = minTrigger
		}
		if int64(trigger) < 0 {
			print("runtime: next_gc=", memstats.next_gc, " heap_marked=", memstats.heap_marked, " heap_live=", memstats.heap_live, " initialHeapLive=", work.initialHeapLive, " triggerRatio=", triggerRatio, " minTrigger=", minTrigger, "\n")
			throw("gc_trigger underflow")
		}
		if trigger > goal {
			// The trigger ratio is always less than GOGC/100, but
			// other bounds on the trigger may have raised it.
			// Push up the goal, too.
			goal = trigger
		}
	}

	// Commit to the trigger and goal.
	memstats.gc_trigger = trigger
	memstats.next_gc = goal
	if trace.enabled {
		traceNextGC()
	}

	// Update mark pacing.
	if gcphase != _GCoff {
		gcController.revise()
	}

	// Update sweep pacing.
	if isSweepDone() {
		mheap_.sweepPagesPerByte = 0
	} else {
		// Concurrent sweep needs to sweep all of the in-use
		// pages by the time the allocated heap reaches the GC
		// trigger. Compute the ratio of in-use pages to sweep
		// per byte allocated, accounting for the fact that
		// some might already be swept.
		heapLiveBasis := atomic.Load64(&memstats.heap_live)
		heapDistance := int64(trigger) - int64(heapLiveBasis)
		// Add a little margin so rounding errors and
		// concurrent sweep are less likely to leave pages
		// unswept when GC starts.
		heapDistance -= 1024 * 1024
		if heapDistance < _PageSize {
			// Avoid setting the sweep ratio extremely high
			heapDistance = _PageSize
		}
		pagesSwept := atomic.Load64(&mheap_.pagesSwept)
		pagesInUse := atomic.Load64(&mheap_.pagesInUse)
		sweepDistancePages := int64(pagesInUse) - int64(pagesSwept)
		if sweepDistancePages <= 0 {
			mheap_.sweepPagesPerByte = 0
		} else {
			mheap_.sweepPagesPerByte = float64(sweepDistancePages) / float64(heapDistance)
			mheap_.sweepHeapLiveBasis = heapLiveBasis
			// Write pagesSweptBasis last, since this
			// signals concurrent sweeps to recompute
			// their debt.
			atomic.Store64(&mheap_.pagesSweptBasis, pagesSwept)
		}
	}

	gcPaceScavenger()
}

// gcEffectiveGrowthRatio returns the current effective heap growth
// ratio (GOGC/100) based on heap_marked from the previous GC and
// next_gc for the current GC.
//
// This may differ from gcpercent/100 because of various upper and
// lower bounds on gcpercent. For example, if the heap is smaller than
// heapminimum, this can be higher than gcpercent/100.
//
// mheap_.lock must be held or the world must be stopped.
func gcEffectiveGrowthRatio() float64 {
	egogc := float64(memstats.next_gc-memstats.heap_marked) / float64(memstats.heap_marked)
	if egogc < 0 {
		// Shouldn't happen, but just in case.
		egogc = 0
	}
	return egogc
}

// gcGoalUtilization is the goal CPU utilization for
// marking as a fraction of GOMAXPROCS.
const gcGoalUtilization = 0.30

// gcBackgroundUtilization is the fixed CPU utilization for background
// marking. It must be <= gcGoalUtilization. The difference between
// gcGoalUtilization and gcBackgroundUtilization will be made up by
// mark assists. The scheduler will aim to use within 50% of this
// goal.
//
// Setting this to < gcGoalUtilization avoids saturating the trigger
// feedback controller when there are no assists, which allows it to
// better control CPU and heap growth. However, the larger the gap,
// the more mutator assists are expected to happen, which impacts
// mutator latency.
const gcBackgroundUtilization = 0.25

// gcCreditSlack is the amount of scan work credit that can
// accumulate locally before updating gcController.scanWork and,
// optionally, gcController.bgScanCredit. Lower values give a more
// accurate assist ratio and make it more likely that assists will
// successfully steal background credit. Higher values reduce memory
// contention.
const gcCreditSlack = 2000

// gcAssistTimeSlack is the nanoseconds of mutator assist time that
// can accumulate on a P before updating gcController.assistTime.
const gcAssistTimeSlack = 5000

// gcOverAssistWork determines how many extra units of scan work a GC
// assist does when an assist happens. This amortizes the cost of an
// assist by pre-paying for this many bytes of future allocations.
const gcOverAssistWork = 64 << 10

var work struct {
	full  lfstack          // lock-free list of full blocks workbuf
	empty lfstack          // lock-free list of empty blocks workbuf
	pad0  cpu.CacheLinePad // prevents false-sharing between full/empty and nproc/nwait

	wbufSpans struct {
		lock mutex
		// free is a list of spans dedicated to workbufs, but
		// that don't currently contain any workbufs.
		free mSpanList
		// busy is a list of all spans containing workbufs on
		// one of the workbuf lists.
		busy mSpanList
	}

	// Restore 64-bit alignment on 32-bit.
	_ uint32

	// bytesMarked is the number of bytes marked this cycle. This
	// includes bytes blackened in scanned objects, noscan objects
	// that go straight to black, and permagrey objects scanned by
	// markroot during the concurrent scan phase. This is updated
	// atomically during the cycle. Updates may be batched
	// arbitrarily, since the value is only read at the end of the
	// cycle.
	//
	// Because of benign races during marking, this number may not
	// be the exact number of marked bytes, but it should be very
	// close.
	//
	// Put this field here because it needs 64-bit atomic access
	// (and thus 8-byte alignment even on 32-bit architectures).
	bytesMarked uint64

	markrootNext uint32 // next markroot job
	markrootJobs uint32 // number of markroot jobs

	nproc  uint32
	tstart int64
	nwait  uint32
	ndone  uint32

	// Number of roots of various root types. Set by gcMarkRootPrepare.
	nFlushCacheRoots                    int
	nDataRoots, nSpanRoots, nStackRoots int

	// Each type of GC state transition is protected by a lock.
	// Since multiple threads can simultaneously detect the state
	// transition condition, any thread that detects a transition
	// condition must acquire the appropriate transition lock,
	// re-check the transition condition and return if it no
	// longer holds or perform the transition if it does.
	// Likewise, any transition must invalidate the transition
	// condition before releasing the lock. This ensures that each
	// transition is performed by exactly one thread and threads
	// that need the transition to happen block until it has
	// happened.
	//
	// startSema protects the transition from "off" to mark or
	// mark termination.
	startSema uint32
	// markDoneSema protects transitions from mark to mark termination.
	markDoneSema uint32

	bgMarkReady note   // signal background mark worker has started
	bgMarkDone  uint32 // cas to 1 when at a background mark completion point
	// Background mark completion signaling

	// mode is the concurrency mode of the current GC cycle.
	mode gcMode

	// userForced indicates the current GC cycle was forced by an
	// explicit user call.
	userForced bool

	// totaltime is the CPU nanoseconds spent in GC since the
	// program started if debug.gctrace > 0.
	totaltime int64

	// initialHeapLive is the value of memstats.heap_live at the
	// beginning of this GC cycle.
	initialHeapLive uint64

	// assistQueue is a queue of assists that are blocked because
	// there was neither enough credit to steal nor enough work to
	// do.
	assistQueue struct {
		lock mutex
		q    gQueue
	}

	// sweepWaiters is a list of blocked goroutines to wake when
	// we transition from mark termination to sweep.
	sweepWaiters struct {
		lock mutex
		list gList
	}

	// cycles is the number of completed GC cycles, where a GC
	// cycle is sweep termination, mark, mark termination, and
	// sweep. This differs from memstats.numgc, which is
	// incremented at mark termination.
	cycles uint32

	// Timing/utilization stats for this cycle.
	stwprocs, maxprocs                 int32
	tSweepTerm, tMark, tMarkTerm, tEnd int64 // nanotime() of phase start

	pauseNS    int64 // total STW time this cycle
	pauseStart int64 // nanotime() of last STW

	// debug.gctrace heap sizes for this cycle.
	heap0, heap1, heap2, heapGoal uint64
}

// GC runs a garbage collection and blocks the caller until the
// garbage collection is complete. It may also block the entire
// program.
func GC() {
	// We consider a cycle to be: sweep termination, mark, mark
	// termination, and sweep. This function shouldn't return
	// until a full cycle has been completed, from beginning to
	// end. Hence, we always want to finish up the current cycle
	// and start a new one. That means:
	//
	// 1. In sweep termination, mark, or mark termination of cycle
	// N, wait until mark termination N completes and transitions
	// to sweep N.
	//
	// 2. In sweep N, help with sweep N.
	//
	// At this point we can begin a full cycle N+1.
	//
	// 3. Trigger cycle N+1 by starting sweep termination N+1.
	//
	// 4. Wait for mark termination N+1 to complete.
	//
	// 5. Help with sweep N+1 until it's done.
	//
	// This all has to be written to deal with the fact that the
	// GC may move ahead on its own. For example, when we block
	// until mark termination N, we may wake up in cycle N+2.

	// Wait until the current sweep termination, mark, and mark
	// termination complete.
	n := atomic.Load(&work.cycles)
	gcWaitOnMark(n)

	// We're now in sweep N or later. Trigger GC cycle N+1, which
	// will first finish sweep N if necessary and then enter sweep
	// termination N+1.
	gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1})

	// Wait for mark termination N+1 to complete.
	gcWaitOnMark(n + 1)

	// Finish sweep N+1 before returning. We do this both to
	// complete the cycle and because runtime.GC() is often used
	// as part of tests and benchmarks to get the system into a
	// relatively stable and isolated state.
	for atomic.Load(&work.cycles) == n+1 && sweepone() != ^uintptr(0) {
		sweep.nbgsweep++
		Gosched()
	}

	// Callers may assume that the heap profile reflects the
	// just-completed cycle when this returns (historically this
	// happened because this was a STW GC), but right now the
	// profile still reflects mark termination N, not N+1.
	//
	// As soon as all of the sweep frees from cycle N+1 are done,
	// we can go ahead and publish the heap profile.
	//
	// First, wait for sweeping to finish. (We know there are no
	// more spans on the sweep queue, but we may be concurrently
	// sweeping spans, so we have to wait.)
	for atomic.Load(&work.cycles) == n+1 && atomic.Load(&mheap_.sweepers) != 0 {
		Gosched()
	}

	// Now we're really done with sweeping, so we can publish the
	// stable heap profile. Only do this if we haven't already hit
	// another mark termination.
	mp := acquirem()
	cycle := atomic.Load(&work.cycles)
	if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) {
		mProf_PostSweep()
	}
	releasem(mp)
}

// gcWaitOnMark blocks until GC finishes the Nth mark phase. If GC has
// already completed this mark phase, it returns immediately.
func gcWaitOnMark(n uint32) {
	for {
		// Disable phase transitions.
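		// (Editor's gloss, hedged: holding work.sweepWaiters.lock is
		// what makes the check below and the park atomic with respect
		// to the wakeup in gcMarkTermination, which injects
		// work.sweepWaiters.list while holding the same lock.)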
		lock(&work.sweepWaiters.lock)
		nMarks := atomic.Load(&work.cycles)
		if gcphase != _GCmark {
			// We've already completed this cycle's mark.
			nMarks++
		}
		if nMarks > n {
			// We're done.
			unlock(&work.sweepWaiters.lock)
			return
		}

		// Wait until sweep termination, mark, and mark
		// termination of cycle N complete.
		work.sweepWaiters.list.push(getg())
		goparkunlock(&work.sweepWaiters.lock, waitReasonWaitForGCCycle, traceEvGoBlock, 1)
	}
}

// gcMode indicates how concurrent a GC cycle should be.
type gcMode int

const (
	gcBackgroundMode gcMode = iota // concurrent GC and sweep
	gcForceMode                    // stop-the-world GC now, concurrent sweep
	gcForceBlockMode               // stop-the-world GC now and STW sweep (forced by user)
)

// A gcTrigger is a predicate for starting a GC cycle. Specifically,
// it is an exit condition for the _GCoff phase.
type gcTrigger struct {
	kind gcTriggerKind
	now  int64  // gcTriggerTime: current time
	n    uint32 // gcTriggerCycle: cycle number to start
}

type gcTriggerKind int

const (
	// gcTriggerHeap indicates that a cycle should be started when
	// the heap size reaches the trigger heap size computed by the
	// controller.
	gcTriggerHeap gcTriggerKind = iota

	// gcTriggerTime indicates that a cycle should be started when
	// it's been more than forcegcperiod nanoseconds since the
	// previous GC cycle.
	gcTriggerTime

	// gcTriggerCycle indicates that a cycle should be started if
	// we have not yet started cycle number gcTrigger.n (relative
	// to work.cycles).
	gcTriggerCycle
)

// test reports whether the trigger condition is satisfied, meaning
// that the exit condition for the _GCoff phase has been met. The exit
// condition should be tested when allocating.
func (t gcTrigger) test() bool {
	if !memstats.enablegc || panicking != 0 || gcphase != _GCoff {
		return false
	}
	switch t.kind {
	case gcTriggerHeap:
		// Non-atomic access to heap_live for performance. If
		// we are going to trigger on this, this thread just
		// atomically wrote heap_live anyway and we'll see our
		// own write.
		return memstats.heap_live >= memstats.gc_trigger
	case gcTriggerTime:
		if gcpercent < 0 {
			return false
		}
		lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
		return lastgc != 0 && t.now-lastgc > forcegcperiod
	case gcTriggerCycle:
		// t.n > work.cycles, but accounting for wraparound.
		return int32(t.n-work.cycles) > 0
	}
	return true
}

// gcStart starts the GC. It transitions from _GCoff to _GCmark (if
// debug.gcstoptheworld == 0) or performs all of GC (if
// debug.gcstoptheworld != 0).
//
// This may return without performing this transition in some cases,
// such as when called on a system stack or with locks held.
func gcStart(trigger gcTrigger) {
	// Since this is called from malloc and malloc is called in
	// the guts of a number of libraries that might be holding
	// locks, don't attempt to start GC in non-preemptible or
	// potentially unstable situations.
	mp := acquirem()
	if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
		releasem(mp)
		return
	}
	releasem(mp)
	mp = nil

	// Pick up the remaining unswept/not being swept spans concurrently
	//
	// This shouldn't happen if we're being invoked in background
	// mode since proportional sweep should have just finished
	// sweeping everything, but rounding errors, etc, may leave a
	// few spans unswept. In forced mode, this is necessary since
	// GC can be forced at any point in the sweeping cycle.
	//
	// We check the transition condition continuously here in case
	// this G gets delayed into the next GC cycle.
	for trigger.test() && sweepone() != ^uintptr(0) {
		sweep.nbgsweep++
	}

	// Perform GC initialization and the sweep termination
	// transition.
	semacquire(&work.startSema)
	// Re-check transition condition under transition lock.
	if !trigger.test() {
		semrelease(&work.startSema)
		return
	}

	// For stats, check if this GC was forced by the user.
	work.userForced = trigger.kind == gcTriggerCycle

	// In gcstoptheworld debug mode, upgrade the mode accordingly.
	// We do this after re-checking the transition condition so
	// that multiple goroutines that detect the heap trigger don't
	// start multiple STW GCs.
	mode := gcBackgroundMode
	if debug.gcstoptheworld == 1 {
		mode = gcForceMode
	} else if debug.gcstoptheworld == 2 {
		mode = gcForceBlockMode
	}

	// Ok, we're doing it! Stop everybody else
	semacquire(&worldsema)

	if trace.enabled {
		traceGCStart()
	}

	// Check that all Ps have finished deferred mcache flushes.
	for _, p := range allp {
		if fg := atomic.Load(&p.mcache.flushGen); fg != mheap_.sweepgen {
			println("runtime: p", p.id, "flushGen", fg, "!= sweepgen", mheap_.sweepgen)
			throw("p mcache not flushed")
		}
	}

	gcBgMarkStartWorkers()

	systemstack(gcResetMarkState)

	work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
	if work.stwprocs > ncpu {
		// This is used to compute CPU time of the STW phases,
		// so it can't be more than ncpu, even if GOMAXPROCS is.
		work.stwprocs = ncpu
	}
	work.heap0 = atomic.Load64(&memstats.heap_live)
	work.pauseNS = 0
	work.mode = mode

	now := nanotime()
	work.tSweepTerm = now
	work.pauseStart = now
	if trace.enabled {
		traceGCSTWStart(1)
	}
	systemstack(stopTheWorldWithSema)
	// Finish sweep before we start concurrent scan.
	systemstack(func() {
		finishsweep_m()
	})
	// clearpools before we start the GC. If we wait, the memory will not be
	// reclaimed until the next GC cycle.
	clearpools()

	work.cycles++

	gcController.startCycle()
	work.heapGoal = memstats.next_gc

	// In STW mode, disable scheduling of user Gs. This may also
	// disable scheduling of this goroutine, so it may block as
	// soon as we start the world again.
	if mode != gcBackgroundMode {
		schedEnableUser(false)
	}

	// Enter concurrent mark phase and enable
	// write barriers.
	//
	// Because the world is stopped, all Ps will
	// observe that write barriers are enabled by
	// the time we start the world and begin
	// scanning.
	//
	// Write barriers must be enabled before assists are
	// enabled because they must be enabled before
	// any non-leaf heap objects are marked. Since
	// allocations are blocked until assists can
	// happen, we want to enable assists as early as
	// possible.
	setGCPhase(_GCmark)

	gcBgMarkPrepare() // Must happen before assist enable.
	gcMarkRootPrepare()

	// Mark all active tinyalloc blocks. Since we're
	// allocating from these, they need to be black like
	// other allocations. The alternative is to blacken
	// the tiny block on every allocation from it, which
	// would slow down the tiny allocator.
	gcMarkTinyAllocs()

	// At this point all Ps have enabled the write
	// barrier, thus maintaining the no white to
	// black invariant. Enable mutator assists to
	// put back-pressure on fast allocating
	// mutators.
	atomic.Store(&gcBlackenEnabled, 1)

	// Assists and workers can start the moment we start
	// the world.
	gcController.markStartTime = now

	// Concurrent mark.
	systemstack(func() {
		now = startTheWorldWithSema(trace.enabled)
		work.pauseNS += now - work.pauseStart
		work.tMark = now
	})
	// In STW mode, we could block the instant systemstack
	// returns, so don't do anything important here. Make sure we
	// block rather than returning to user code.
	if mode != gcBackgroundMode {
		Gosched()
	}

	semrelease(&work.startSema)
}

// gcMarkDoneFlushed counts the number of P's with flushed work.
//
// Ideally this would be a captured local in gcMarkDone, but forEachP
// escapes its callback closure, so it can't capture anything.
//
// This is protected by markDoneSema.
var gcMarkDoneFlushed uint32

// debugCachedWork enables extra checks for debugging premature mark
// termination.
//
// For debugging issue #27993.
const debugCachedWork = false

// gcWorkPauseGen is for debugging the mark completion algorithm.
// gcWork put operations spin while gcWork.pauseGen == gcWorkPauseGen.
// Only used if debugCachedWork is true.
//
// For debugging issue #27993.
var gcWorkPauseGen uint32 = 1

// gcMarkDone transitions the GC from mark to mark termination if all
// reachable objects have been marked (that is, there are no grey
// objects and can be no more in the future). Otherwise, it flushes
// all local work to the global queues where it can be discovered by
// other workers.
//
// This should be called when all local mark work has been drained and
// there are no remaining workers. Specifically, when
//
//	work.nwait == work.nproc && !gcMarkWorkAvailable(p)
//
// The calling context must be preemptible.
//
// Flushing local work is important because idle Ps may have local
// work queued. This is the only way to make that work visible and
// drive GC to completion.
//
// It is explicitly okay to have write barriers in this function. If
// it does transition to mark termination, then all reachable objects
// have been marked, so the write barrier cannot shade any more
// objects.
func gcMarkDone() {
	// Ensure only one thread is running the ragged barrier at a
	// time.
	semacquire(&work.markDoneSema)

top:
	// Re-check transition condition under transition lock.
	//
	// It's critical that this checks the global work queues are
	// empty before performing the ragged barrier. Otherwise,
	// there could be global work that a P could take after the P
	// has passed the ragged barrier.
	if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
		semrelease(&work.markDoneSema)
		return
	}

	// Flush all local buffers and collect flushedWork flags.
	gcMarkDoneFlushed = 0
	systemstack(func() {
		gp := getg().m.curg
		// Mark the user stack as preemptible so that it may be scanned.
		// Otherwise, our attempt to force all P's to a safepoint could
		// result in a deadlock as we attempt to preempt a worker that's
		// trying to preempt us (e.g. for a stack scan).
		casgstatus(gp, _Grunning, _Gwaiting)
		forEachP(func(_p_ *p) {
			// Flush the write barrier buffer, since this may add
			// work to the gcWork.
			wbBufFlush1(_p_)
			// For debugging, shrink the write barrier
			// buffer so it flushes immediately.
			// wbBuf.reset will keep it at this size as
			// long as throwOnGCWork is set.
			if debugCachedWork {
				b := &_p_.wbBuf
				b.end = uintptr(unsafe.Pointer(&b.buf[wbBufEntryPointers]))
				b.debugGen = gcWorkPauseGen
			}
			// Flush the gcWork, since this may create global work
			// and set the flushedWork flag.
			//
			// TODO(austin): Break up these workbufs to
			// better distribute work.
			_p_.gcw.dispose()
			// Collect the flushedWork flag.
			if _p_.gcw.flushedWork {
				atomic.Xadd(&gcMarkDoneFlushed, 1)
				_p_.gcw.flushedWork = false
			} else if debugCachedWork {
				// For debugging, freeze the gcWork
				// until we know whether we've reached
				// completion or not. If we think
				// we've reached completion, but
				// there's a paused gcWork, then
				// that's a bug.
				_p_.gcw.pauseGen = gcWorkPauseGen
				// Capture the G's stack.
				for i := range _p_.gcw.pauseStack {
					_p_.gcw.pauseStack[i].pc = 0
				}
				callers(1, _p_.gcw.pauseStack[:])
			}
		})
		casgstatus(gp, _Gwaiting, _Grunning)
	})

	if gcMarkDoneFlushed != 0 {
		if debugCachedWork {
			// Release paused gcWorks.
			atomic.Xadd(&gcWorkPauseGen, 1)
		}
		// More grey objects were discovered since the
		// previous termination check, so there may be more
		// work to do. Keep going. It's possible the
		// transition condition became true again during the
		// ragged barrier, so re-check it.
		goto top
	}

	if debugCachedWork {
		throwOnGCWork = true
		// Release paused gcWorks. If there are any, they
		// should now observe throwOnGCWork and panic.
		atomic.Xadd(&gcWorkPauseGen, 1)
	}

	// There was no global work, no local work, and no Ps
	// communicated work since we took markDoneSema. Therefore
	// there are no grey objects and no more objects can be
	// shaded. Transition to mark termination.
	now := nanotime()
	work.tMarkTerm = now
	work.pauseStart = now
	getg().m.preemptoff = "gcing"
	if trace.enabled {
		traceGCSTWStart(0)
	}
	systemstack(stopTheWorldWithSema)
	// The gcphase is _GCmark, it will transition to _GCmarktermination
	// below. The important thing is that the wb remains active until
	// all marking is complete. This includes writes made by the GC.
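	//
	// (Editor's summary of the non-debug path below, hedged rather
	// than normative: with the world stopped, every P's write barrier
	// buffer and gcWork are re-checked; if any leftover work is found,
	// the world is restarted and control returns to "top" to repeat
	// the ragged barrier.)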

	if debugCachedWork {
		// For debugging, double check that no work was added after we
		// went around above and disable write barrier buffering.
		for _, p := range allp {
			gcw := &p.gcw
			if !gcw.empty() {
				printlock()
				print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork)
				if gcw.wbuf1 == nil {
					print(" wbuf1=<nil>")
				} else {
					print(" wbuf1.n=", gcw.wbuf1.nobj)
				}
				if gcw.wbuf2 == nil {
					print(" wbuf2=<nil>")
				} else {
					print(" wbuf2.n=", gcw.wbuf2.nobj)
				}
				print("\n")
				if gcw.pauseGen == gcw.putGen {
					println("runtime: checkPut already failed at this generation")
				}
				throw("throwOnGCWork")
			}
		}
	} else {
		// For unknown reasons (see issue #27993), there is
		// sometimes work left over when we enter mark
		// termination. Detect this and resume concurrent
		// mark. This is obviously unfortunate.
		//
		// Switch to the system stack to call wbBufFlush1,
		// though in this case it doesn't matter because we're
		// non-preemptible anyway.
		restart := false
		systemstack(func() {
			for _, p := range allp {
				wbBufFlush1(p)
				if !p.gcw.empty() {
					restart = true
					break
				}
			}
		})
		if restart {
			getg().m.preemptoff = ""
			systemstack(func() {
				now := startTheWorldWithSema(true)
				work.pauseNS += now - work.pauseStart
			})
			goto top
		}
	}

	// Disable assists and background workers. We must do
	// this before waking blocked assists.
	atomic.Store(&gcBlackenEnabled, 0)

	// Wake all blocked assists. These will run when we
	// start the world again.
	gcWakeAllAssists()

	// Likewise, release the transition lock. Blocked
	// workers and assists will run when we start the
	// world again.
	semrelease(&work.markDoneSema)

	// In STW mode, re-enable user goroutines. These will be
	// queued to run after we start the world.
	schedEnableUser(true)

	// endCycle depends on all gcWork cache stats being flushed.
	// The termination algorithm above ensured this, up to
	// allocations made since the ragged barrier.
	nextTriggerRatio := gcController.endCycle()

	// Perform mark termination. This will restart the world.
	gcMarkTermination(nextTriggerRatio)
}

func gcMarkTermination(nextTriggerRatio float64) {
	// World is stopped.
	// Start marktermination which includes enabling the write barrier.
	atomic.Store(&gcBlackenEnabled, 0)
	setGCPhase(_GCmarktermination)

	work.heap1 = memstats.heap_live
	startTime := nanotime()

	mp := acquirem()
	mp.preemptoff = "gcing"
	_g_ := getg()
	_g_.m.traceback = 2
	gp := _g_.m.curg
	casgstatus(gp, _Grunning, _Gwaiting)
	gp.waitreason = waitReasonGarbageCollection

	// Run gc on the g0 stack. We do this so that the g stack
	// we're currently running on will no longer change. Cuts
	// the root set down a bit (g0 stacks are not scanned, and
	// we don't need to scan gc's internal state). We also
	// need to switch to g0 so we can shrink the stack.
	systemstack(func() {
		gcMark(startTime)
		// Must return immediately.
		// The outer function's stack may have moved
		// during gcMark (it shrinks stacks, including the
		// outer function's stack), so we must not refer
		// to any of its variables. Return to the
		// non-system stack to pick up the new addresses
		// before continuing.
	})

	systemstack(func() {
		work.heap2 = work.bytesMarked
		if debug.gccheckmark > 0 {
			// Run a full non-parallel, stop-the-world
			// mark using checkmark bits, to check that we
			// didn't forget to mark anything during the
			// concurrent mark process.
			gcResetMarkState()
			initCheckmarks()
			gcw := &getg().m.p.ptr().gcw
			gcDrain(gcw, 0)
			wbBufFlush1(getg().m.p.ptr())
			gcw.dispose()
			clearCheckmarks()
		}

		// marking is complete so we can turn the write barrier off
		setGCPhase(_GCoff)
		gcSweep(work.mode)
	})

	_g_.m.traceback = 0
	casgstatus(gp, _Gwaiting, _Grunning)

	if trace.enabled {
		traceGCDone()
	}

	// all done
	mp.preemptoff = ""

	if gcphase != _GCoff {
		throw("gc done but gcphase != _GCoff")
	}

	// Record next_gc and heap_inuse for scavenger.
	memstats.last_next_gc = memstats.next_gc
	memstats.last_heap_inuse = memstats.heap_inuse

	// Update GC trigger and pacing for the next cycle.
	gcSetTriggerRatio(nextTriggerRatio)

	// Pacing changed, so the scavenger should be awoken.
	wakeScavenger()

	// Update timing memstats
	now := nanotime()
	sec, nsec, _ := time_now()
	unixNow := sec*1e9 + int64(nsec)
	work.pauseNS += now - work.pauseStart
	work.tEnd = now
	atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user
	atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us
	memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS)
	memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow)
	memstats.pause_total_ns += uint64(work.pauseNS)

	// Update work.totaltime.
	sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm)
	// We report idle marking time below, but omit it from the
	// overall utilization here since it's "free".
	markCpu := gcController.assistTime + gcController.dedicatedMarkTime + gcController.fractionalMarkTime
	markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm)
	cycleCpu := sweepTermCpu + markCpu + markTermCpu
	work.totaltime += cycleCpu

	// Compute overall GC CPU utilization.
	totalCpu := sched.totaltime + (now-sched.procresizetime)*int64(gomaxprocs)
	memstats.gc_cpu_fraction = float64(work.totaltime) / float64(totalCpu)

	// Reset sweep state.
	sweep.nbgsweep = 0
	sweep.npausesweep = 0

	if work.userForced {
		memstats.numforcedgc++
	}

	// Bump GC cycle count and wake goroutines waiting on sweep.
	lock(&work.sweepWaiters.lock)
	memstats.numgc++
	injectglist(&work.sweepWaiters.list)
	unlock(&work.sweepWaiters.lock)

	// Finish the current heap profiling cycle and start a new
	// heap profiling cycle. We do this before starting the world
	// so events don't leak into the wrong cycle.
	mProf_NextCycle()

	systemstack(func() { startTheWorldWithSema(true) })

	// Flush the heap profile so we can start a new cycle next GC.
	// This is relatively expensive, so we don't do it with the
	// world stopped.
	mProf_Flush()

	// Prepare workbufs for freeing by the sweeper. We do this
	// asynchronously because it can take non-trivial time.
	prepareFreeWorkbufs()

	// Ensure all mcaches are flushed. Each P will flush its own
	// mcache before allocating, but idle Ps may not. Since this
	// is necessary to sweep all spans, we need to ensure all
	// mcaches are flushed before we start the next GC cycle.
	systemstack(func() {
		forEachP(func(_p_ *p) {
			_p_.mcache.prepareForSweep()
		})
	})

	// Print gctrace before dropping worldsema. As soon as we drop
	// worldsema another cycle could start and smash the stats
	// we're trying to print.
	if debug.gctrace > 0 {
		util := int(memstats.gc_cpu_fraction * 100)

		var sbuf [24]byte
		printlock()
		print("gc ", memstats.numgc,
			" @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ",
			util, "%: ")
		prev := work.tSweepTerm
		for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} {
			if i != 0 {
				print("+")
			}
			print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev))))
			prev = ns
		}
		print(" ms clock, ")
		for i, ns := range []int64{sweepTermCpu, gcController.assistTime, gcController.dedicatedMarkTime + gcController.fractionalMarkTime, gcController.idleMarkTime, markTermCpu} {
			if i == 2 || i == 3 {
				// Separate mark time components with /.
				print("/")
			} else if i != 0 {
				print("+")
			}
			print(string(fmtNSAsMS(sbuf[:], uint64(ns))))
		}
		print(" ms cpu, ",
			work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ",
			work.heapGoal>>20, " MB goal, ",
			work.maxprocs, " P")
		if work.userForced {
			print(" (forced)")
		}
		print("\n")
		printunlock()
	}

	semrelease(&worldsema)
	// Careful: another GC cycle may start now.

	releasem(mp)
	mp = nil

	// now that gc is done, kick off finalizer thread if needed
	if !concurrentSweep {
		// give the queued finalizers, if any, a chance to run
		Gosched()
	}
}

// gcBgMarkStartWorkers prepares background mark worker goroutines.
// These goroutines will not run until the mark phase, but they must
// be started while the world is not stopped and from a regular G
// stack. The caller must hold worldsema.
func gcBgMarkStartWorkers() {
	// Background marking is performed by per-P G's. Ensure that
	// each P has a background GC G.
	for _, p := range allp {
		if p.gcBgMarkWorker == 0 {
			expectSystemGoroutine()
			go gcBgMarkWorker(p)
			notetsleepg(&work.bgMarkReady, -1)
			noteclear(&work.bgMarkReady)
		}
	}
}

// gcBgMarkPrepare sets up state for background marking.
// Mutator assists must not yet be enabled.
func gcBgMarkPrepare() {
	// Background marking will stop when the work queues are empty
	// and there are no more workers (note that, since this is
	// concurrent, this may be a transient state, but mark
	// termination will clean it up). Between background workers
	// and assists, we don't really know how many workers there
	// will be, so we pretend to have an arbitrarily large number
	// of workers, almost all of which are "waiting". While a
	// worker is working it decrements nwait. If nproc == nwait,
	// there are no workers.
	work.nproc = ^uint32(0)
	work.nwait = ^uint32(0)
}

func gcBgMarkWorker(_p_ *p) {
	setSystemGoroutine()

	gp := getg()

	type parkInfo struct {
		m      muintptr // Release this m on park.
		attach puintptr // If non-nil, attach to this p on park.
	}
	// We pass park to a gopark unlock function, so it can't be on
	// the stack (see gopark). Prevent deadlock from recursively
	// starting GC by disabling preemption.
	gp.m.preemptoff = "GC worker init"
	park := new(parkInfo)
	gp.m.preemptoff = ""

	park.m.set(acquirem())
	park.attach.set(_p_)
	// Inform gcBgMarkStartWorkers that this worker is ready.
	// After this point, the background mark worker is scheduled
	// cooperatively by gcController.findRunnable. Hence, it must
	// never be preempted, as this would put it into _Grunnable
	// and put it on a run queue. Instead, when the preempt flag
	// is set, this puts itself into _Gwaiting to be woken up by
	// gcController.findRunnable at the appropriate time.
	notewakeup(&work.bgMarkReady)

	for {
		// Go to sleep until woken by gcController.findRunnable.
		// We can't releasem yet since even the call to gopark
		// may be preempted.
		gopark(func(g *g, parkp unsafe.Pointer) bool {
			park := (*parkInfo)(parkp)

			// The worker G is no longer running, so it's
			// now safe to allow preemption.
			releasem(park.m.ptr())

			// If the worker isn't attached to its P,
			// attach now. During initialization and after
			// a phase change, the worker may have been
			// running on a different P. As soon as we
			// attach, the owner P may schedule the
			// worker, so this must be done after the G is
			// stopped.
			if park.attach != 0 {
				p := park.attach.ptr()
				park.attach.set(nil)
				// cas the worker because we may be
				// racing with a new worker starting
				// on this P.
				if !p.gcBgMarkWorker.cas(0, guintptr(unsafe.Pointer(g))) {
					// The P got a new worker.
					// Exit this worker.
					return false
				}
			}
			return true
		}, unsafe.Pointer(park), waitReasonGCWorkerIdle, traceEvGoBlock, 0)

		// Loop until the P dies and disassociates this
		// worker (the P may later be reused, in which case
		// it will get a new worker) or we failed to associate.
		if _p_.gcBgMarkWorker.ptr() != gp {
			break
		}

		// Disable preemption so we can use the gcw. If the
		// scheduler wants to preempt us, we'll stop draining,
		// dispose the gcw, and then preempt.
		park.m.set(acquirem())

		if gcBlackenEnabled == 0 {
			throw("gcBgMarkWorker: blackening not enabled")
		}

		startTime := nanotime()
		_p_.gcMarkWorkerStartTime = startTime

		decnwait := atomic.Xadd(&work.nwait, -1)
		if decnwait == work.nproc {
			println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc)
			throw("work.nwait was > work.nproc")
		}

		systemstack(func() {
			// Mark our goroutine preemptible so its stack
			// can be scanned. This lets two mark workers
			// scan each other (otherwise, they would
			// deadlock). We must not modify anything on
			// the G stack. However, stack shrinking is
			// disabled for mark workers, so it is safe to
			// read from the G stack.
			casgstatus(gp, _Grunning, _Gwaiting)
			switch _p_.gcMarkWorkerMode {
			default:
				throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
			case gcMarkWorkerDedicatedMode:
				gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
				if gp.preempt {
					// We were preempted. This is
					// a useful signal to kick
					// everything out of the run
					// queue so it can run
					// somewhere else.
					lock(&sched.lock)
					for {
						gp, _ := runqget(_p_)
						if gp == nil {
							break
						}
						globrunqput(gp)
					}
					unlock(&sched.lock)
				}
				// Go back to draining, this time
				// without preemption.
				gcDrain(&_p_.gcw, gcDrainFlushBgCredit)
			case gcMarkWorkerFractionalMode:
				gcDrain(&_p_.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
			case gcMarkWorkerIdleMode:
				gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
			}
			casgstatus(gp, _Gwaiting, _Grunning)
		})

		// Account for time.
		duration := nanotime() - startTime
		switch _p_.gcMarkWorkerMode {
		case gcMarkWorkerDedicatedMode:
			atomic.Xaddint64(&gcController.dedicatedMarkTime, duration)
			atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 1)
		case gcMarkWorkerFractionalMode:
			atomic.Xaddint64(&gcController.fractionalMarkTime, duration)
			atomic.Xaddint64(&_p_.gcFractionalMarkTime, duration)
		case gcMarkWorkerIdleMode:
			atomic.Xaddint64(&gcController.idleMarkTime, duration)
		}

		// Was this the last worker and did we run out
		// of work?
		incnwait := atomic.Xadd(&work.nwait, +1)
		if incnwait > work.nproc {
			println("runtime: p.gcMarkWorkerMode=", _p_.gcMarkWorkerMode,
				"work.nwait=", incnwait, "work.nproc=", work.nproc)
			throw("work.nwait > work.nproc")
		}

		// If this worker reached a background mark completion
		// point, signal the main GC goroutine.
		if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
			// Make this G preemptible and disassociate it
			// as the worker for this P so
			// findRunnableGCWorker doesn't try to
			// schedule it.
			_p_.gcBgMarkWorker.set(nil)
			releasem(park.m.ptr())

			gcMarkDone()

			// Disable preemption and prepare to reattach
			// to the P.
			//
			// We may be running on a different P at this
			// point, so we can't reattach until this G is
			// parked.
			park.m.set(acquirem())
			park.attach.set(_p_)
		}
	}
}

// gcMarkWorkAvailable reports whether executing a mark worker
// on p is potentially useful. p may be nil, in which case it only
// checks the global sources of work.
func gcMarkWorkAvailable(p *p) bool {
	if p != nil && !p.gcw.empty() {
		return true
	}
	if !work.full.empty() {
		return true // global work available
	}
	if work.markrootNext < work.markrootJobs {
		return true // root scan work available
	}
	return false
}

// gcMark runs the mark (or, for concurrent GC, mark termination).
// All gcWork caches must be empty.
// STW is in effect at this point.
func gcMark(start_time int64) {
	if debug.allocfreetrace > 0 {
		tracegc()
	}

	if gcphase != _GCmarktermination {
		throw("in gcMark expecting to see gcphase as _GCmarktermination")
	}
	work.tstart = start_time

	// Check that there's no marking work remaining.
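	// (As in gcMarkWorkAvailable above, the two global sources of mark
	// work are work.full, the queue of full mark work buffers, and
	// markrootNext/markrootJobs, which count root-marking jobs claimed
	// versus created; both must be exhausted here.)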
	if work.full != 0 || work.markrootNext < work.markrootJobs {
		print("runtime: full=", hex(work.full), " next=", work.markrootNext, " jobs=", work.markrootJobs, " nDataRoots=", work.nDataRoots, " nSpanRoots=", work.nSpanRoots, " nStackRoots=", work.nStackRoots, "\n")
		panic("non-empty mark queue after concurrent mark")
	}

	if debug.gccheckmark > 0 {
		// This is expensive when there's a large number of
		// Gs, so only do it if checkmark is also enabled.
		gcMarkRootCheck()
	}
	if work.full != 0 {
		throw("work.full != 0")
	}

	// Clear out buffers and double-check that all gcWork caches
	// are empty. This should be ensured by gcMarkDone before we
	// enter mark termination.
	//
	// TODO: We could clear out buffers just before mark if this
	// has a non-negligible impact on STW time.
	for _, p := range allp {
		// The write barrier may have buffered pointers since
		// the gcMarkDone barrier. However, since the barrier
		// ensured all reachable objects were marked, all of
		// these must be pointers to black objects. Hence we
		// can just discard the write barrier buffer.
		if debug.gccheckmark > 0 || throwOnGCWork {
			// For debugging, flush the buffer and make
			// sure it really was all marked.
			wbBufFlush1(p)
		} else {
			p.wbBuf.reset()
		}

		gcw := &p.gcw
		if !gcw.empty() {
			printlock()
			print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork)
			if gcw.wbuf1 == nil {
				print(" wbuf1=<nil>")
			} else {
				print(" wbuf1.n=", gcw.wbuf1.nobj)
			}
			if gcw.wbuf2 == nil {
				print(" wbuf2=<nil>")
			} else {
				print(" wbuf2.n=", gcw.wbuf2.nobj)
			}
			print("\n")
			throw("P has cached GC work at end of mark termination")
		}
		// There may still be cached empty buffers, which we
		// need to flush since we're going to free them. Also,
		// there may be non-zero stats because we allocated
		// black after the gcMarkDone barrier.
		gcw.dispose()
	}

	throwOnGCWork = false

	cachestats()

	// Update the marked heap stat.
	memstats.heap_marked = work.bytesMarked

	// Update other GC heap size stats. This must happen after
	// cachestats (which flushes local statistics to these) and
	// flushallmcaches (which modifies heap_live).
	memstats.heap_live = work.bytesMarked
	memstats.heap_scan = uint64(gcController.scanWork)

	if trace.enabled {
		traceHeapAlloc()
	}
}

// gcSweep must be called on the system stack because it acquires the heap
// lock. See mheap for details.
//go:systemstack
func gcSweep(mode gcMode) {
	if gcphase != _GCoff {
		throw("gcSweep being done but phase is not GCoff")
	}

	lock(&mheap_.lock)
	mheap_.sweepgen += 2
	mheap_.sweepdone = 0
	if mheap_.sweepSpans[mheap_.sweepgen/2%2].index != 0 {
		// We should have drained this list during the last
		// sweep phase. We certainly need to start this phase
		// with an empty swept list.
		throw("non-empty swept list")
	}
	mheap_.pagesSwept = 0
	mheap_.sweepArenas = mheap_.allArenas
	mheap_.reclaimIndex = 0
	mheap_.reclaimCredit = 0
	unlock(&mheap_.lock)

	if !_ConcurrentSweep || mode == gcForceBlockMode {
		// Special case synchronous sweep.
		// Record that no proportional sweeping has to happen.
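		// (sweepPagesPerByte is the proportional-sweep pacing ratio:
		// pages that must be swept per byte allocated. Zeroing it
		// disables that pacing, which is fine here because every span
		// is swept eagerly below before this function returns.)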
		lock(&mheap_.lock)
		mheap_.sweepPagesPerByte = 0
		unlock(&mheap_.lock)
		// Sweep all spans eagerly.
		for sweepone() != ^uintptr(0) {
			sweep.npausesweep++
		}
		// Free workbufs eagerly.
		prepareFreeWorkbufs()
		for freeSomeWbufs(false) {
		}
		// All "free" events for this mark/sweep cycle have
		// now happened, so we can make this profile cycle
		// available immediately.
		mProf_NextCycle()
		mProf_Flush()
		return
	}

	// Background sweep.
	lock(&sweep.lock)
	if sweep.parked {
		sweep.parked = false
		ready(sweep.g, 0, true)
	}
	unlock(&sweep.lock)
}

// gcResetMarkState resets global state prior to marking (concurrent
// or STW) and resets the stack scan state of all Gs.
//
// This is safe to do without the world stopped because any Gs created
// during or after this will start out in the reset state.
//
// gcResetMarkState must be called on the system stack because it acquires
// the heap lock. See mheap for details.
//
//go:systemstack
func gcResetMarkState() {
	// This may be called during a concurrent phase, so make sure
	// allgs doesn't change.
	lock(&allglock)
	for _, gp := range allgs {
		gp.gcscandone = false // set to true in gcphasework
		gp.gcAssistBytes = 0
	}
	unlock(&allglock)

	// Clear page marks. This is just 1MB per 64GB of heap, so the
	// time here is pretty trivial.
	lock(&mheap_.lock)
	arenas := mheap_.allArenas
	unlock(&mheap_.lock)
	for _, ai := range arenas {
		ha := mheap_.arenas[ai.l1()][ai.l2()]
		for i := range ha.pageMarks {
			ha.pageMarks[i] = 0
		}
	}

	work.bytesMarked = 0
	work.initialHeapLive = atomic.Load64(&memstats.heap_live)
}

// Hooks for other packages

var poolcleanup func()

//go:linkname sync_runtime_registerPoolCleanup sync.runtime_registerPoolCleanup
func sync_runtime_registerPoolCleanup(f func()) {
	poolcleanup = f
}

func clearpools() {
	// clear sync.Pools
	if poolcleanup != nil {
		poolcleanup()
	}

	// Clear central sudog cache.
	// Leave per-P caches alone, they have strictly bounded size.
	// Disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	lock(&sched.sudoglock)
	var sg, sgnext *sudog
	for sg = sched.sudogcache; sg != nil; sg = sgnext {
		sgnext = sg.next
		sg.next = nil
	}
	sched.sudogcache = nil
	unlock(&sched.sudoglock)

	// Clear central defer pools.
	// Leave per-P pools alone, they have strictly bounded size.
	lock(&sched.deferlock)
	// disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	var d, dlink *_defer
	for d = sched.deferpool; d != nil; d = dlink {
		dlink = d.link
		d.link = nil
	}
	sched.deferpool = nil
	unlock(&sched.deferlock)
}

// Timing

// itoaDiv formats val/(10**dec) into buf.
func itoaDiv(buf []byte, val uint64, dec int) []byte {
	i := len(buf) - 1
	idec := i - dec
	for val >= 10 || i >= idec {
		buf[i] = byte(val%10 + '0')
		i--
		if i == idec {
			buf[i] = '.'
			i--
		}
		val /= 10
	}
	buf[i] = byte(val + '0')
	return buf[i:]
}

// fmtNSAsMS nicely formats ns nanoseconds as milliseconds.
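// For example, given the truncation below, 1234567ns formats as "1.2"
// and 250000ns as "0.25".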
func fmtNSAsMS(buf []byte, ns uint64) []byte {
	if ns >= 10e6 {
		// Format as whole milliseconds.
		return itoaDiv(buf, ns/1e6, 0)
	}
	// Format two digits of precision, with at most three decimal places.
	x := ns / 1e3
	if x == 0 {
		buf[0] = '0'
		return buf[:1]
	}
	dec := 3
	for x >= 100 {
		x /= 10
		dec--
	}
	return itoaDiv(buf, x, dec)
}
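
// For reference, the gctrace line assembled in gcMarkTermination above has
// the following shape (an illustrative example, not captured output):
//
//	gc 7 @1.025s 2%: 0.12+1.8+0.029 ms clock, 0.48+0.26/1.5/3.1+0.11 ms cpu, 4->4->2 MB, 5 MB goal, 4 P
//
// that is: cycle number, seconds since program start, GC CPU fraction,
// sweep-termination+concurrent-mark+mark-termination wall clock, then
// sweep-termination+assist/dedicated+fractional/idle+mark-termination CPU,
// then heap size at mark start -> at mark end -> marked (live) heap, the
// heap goal, and the number of Ps, with " (forced)" appended for
// user-forced cycles.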