Buffer mapping patterns
-----------------------

There are two main strategies the driver has for CPU access to GL buffer
objects. One is that the GL calls allocate temporary storage and blit to the GPU
at
``glBufferSubData()``/``glBufferData()``/``glFlushMappedBufferRange()``/``glUnmapBuffer()``
time. This makes it easy to match the ordering GL requires between uploads and
draws. However, it may be more costly than direct mapping of the GL BO on some
platforms, and is essentially not available to tiling GPUs (since tiling
involves running through the command stream multiple times). The other strategy
is direct mapping of the BO, and GL has additional interfaces that let apps
access memory directly while avoiding implicit blocking on the GPU rendering
from those BOs.

Rendering engines have a variety of knobs to set on those GL interfaces for data
upload, and as a whole they seem to take just about every path available. Let's
look at some examples to see how they might constrain GL driver buffer upload
behavior.

Portal 2
========

.. code-block:: console

  1030842 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)
  1030876 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 65536, data = NULL, usage = GL_DYNAMIC_DRAW)
  1030877 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, size = 576, data = blob(576))
  1030896 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 526, count = 252, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
  1030915 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 19657, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x1f8, basevertex = 0)
  1030917 glBufferDataARB(target = GL_ARRAY_BUFFER, size = 1572864, data = NULL, usage = GL_DYNAMIC_DRAW)
  1030918 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 128, data = blob(128))
  1030919 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 576, size = 12, data = blob(12))
  1030936 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x240, basevertex = 0)
  1030937 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 128, size = 128, data = blob(128))
  1030938 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 588, size = 12, data = blob(12))
  1030940 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 4, end = 7, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x24c, basevertex = 0)
  [... repeated draws at increasing offsets]
  1033097 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)

From this sequence, we can see that it is important that the driver either
implement ``glBufferSubData()`` as a blit from a streaming uploader in sequence
with the ``glDraw*()`` calls (a common behavior for non-tiled GPUs, particularly
those with dedicated memory), or that you do both of the following:

1) Track the valid range of the buffer so that you don't have to flush the draws
   and synchronize on each following ``glBufferSubData()``.

2) Reallocate the buffer storage on ``glBufferData()`` so that your first
   ``glBufferSubData()`` of the frame doesn't stall on the last frame's
   rendering completing.

You can't just empty your valid range on ``glBufferData()`` unless you know that
the GPU access from the previous frame has completed. This pattern of
incrementing ``glBufferSubData()`` offsets interleaved with draws from that data
is common among newer Valve games.
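
The valid range tracking from the first requirement can be sketched in C as
follows. This is a minimal illustration, not Mesa code: ``struct sketch_buffer``
and both helper functions are hypothetical names, and the valid range is
simplified to a single byte count starting at offset 0.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical per-buffer driver state (not a Mesa structure). */
  struct sketch_buffer {
     size_t valid_end; /* bytes [0, valid_end) hold data that draws may read */
     bool gpu_busy;    /* queued or running GPU work references the storage */
  };

  /* Would a CPU write to [offset, offset + size) have to wait for the GPU? */
  static bool
  sketch_write_needs_sync(const struct sketch_buffer *buf, size_t offset,
                          size_t size)
  {
     /* Writes entirely past the valid range cannot race with any draw. */
     return buf->gpu_busy && size > 0 && offset < buf->valid_end;
  }

  /* Grow the valid range after data has been written. */
  static void
  sketch_mark_valid(struct sketch_buffer *buf, size_t offset, size_t size)
  {
     if (offset + size > buf->valid_end)
        buf->valid_end = offset + size;
  }

Applied to the element array buffer in the trace, the uploads at offsets 576
and 588 land past the valid range and need no synchronization, while rewriting
offset 0 after the first draw would.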

.. code-block:: console

  [ during setup ]

  679259 glGenBuffersARB(n = 1, buffers = &1314)
  679260 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
  679261 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 3072, data = NULL, usage = GL_STATIC_DRAW)
  679264 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
  679269 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072)
  679270 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

  [... setup of other buffers on this binding point]

  679343 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
  679344 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
  679346 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679347 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
  679348 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 768, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384300
  679350 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679351 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
  679352 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 1536, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384600
  679354 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679355 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
  679356 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 2304, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384900
  679358 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679359 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

  [... setup completes and we start drawing later]

  761845 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
  761846 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 323, count = 384, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
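
None of these setup-time maps is unsynchronized, so each one may wait on the
GPU. This suggests that, for non-blitting drivers, resetting your "might be used
on the GPU" range once a stall has completed (the buffer is then known to be
idle) could save you a number of additional GPU stalls during setup.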

Terraria
========

.. code-block:: console

  167581 glXSwapBuffers(dpy = 0x3004630, drawable = 25165844)

  167585 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
  167586 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 1728, data = blob(1728))
  167588 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 71, count = 108, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
  167589 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
  167590 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 27456, data = blob(27456))
  167592 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 7, count = 12, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
  167594 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 8)
  167596 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 12)
  [...]

In this game, ``glBufferData(NULL)`` is called on the same array buffer before
each upload to get new storage, so that the following ``glBufferSubData()``
doesn't cause synchronization.
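
One way a non-blitting driver might handle this ``glBufferData(NULL)`` pattern
is sketched below. All names are hypothetical, and lifetime management of the
old storage (which has to stay alive until the GPU is done with it) is left
out.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical backing storage and GL buffer object (not Mesa types). */
  struct sketch_storage {
     bool gpu_busy;
  };

  struct sketch_buffer {
     struct sketch_storage *storage;
     size_t valid_end;
  };

  /* Called for glBufferData(..., NULL, ...); `fresh` is newly allocated
   * storage. Returns true if the buffer switched to the fresh storage. */
  static bool
  sketch_orphan(struct sketch_buffer *buf, struct sketch_storage *fresh)
  {
     bool swapped = false;

     if (buf->storage->gpu_busy) {
        /* The GPU may still read the old contents: leave them alone. */
        buf->storage = fresh;
        swapped = true;
     }

     /* Either way, nothing in the current storage is defined any more. */
     buf->valid_end = 0;
     return swapped;
  }

When the old storage is idle, the sketch keeps it and only empties the valid
range, which avoids an allocation per call.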

Don't Starve
============

.. code-block:: console

  7251917 glGenBuffers(n = 1, buffers = &115052)
  7251918 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
  7251919 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
  7251921 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
  7251928 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  7251930 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 114872)
  7251936 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 18)
  7251938 glGenBuffers(n = 1, buffers = &115053)
  7251939 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
  7251940 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
  7251942 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
  7251949 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  7251973 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)
  [... drawing next frame]
  7252388 glDeleteBuffers(n = 1, buffers = &115052)
  7252389 glDeleteBuffers(n = 1, buffers = &115053)
  7252390 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)

In this game we have a lot of tiny ``glBufferData()`` calls (144 bytes each, on
freshly generated buffer names), suggesting that we could see working set wins
and possibly CPU overhead reduction by packing small GL buffers in the same BO.
Interestingly, the deletes of the temporary buffers always happen at the end of
the next frame.
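
As a rough illustration of such packing, a bump suballocator could hand out
ranges of one larger BO to small GL buffers. The slab size, the alignment, and
all names below are assumptions chosen for the example; freeing and reuse of
ranges are not shown.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>

  /* Assumed values for the example, not taken from any driver. */
  #define SKETCH_SLAB_SIZE 65536u
  #define SKETCH_ALIGN 64u
  #define SKETCH_SLAB_FULL ((size_t)-1)

  /* Hypothetical slab: one BO shared by many small GL buffers. */
  struct sketch_slab {
     size_t used; /* bytes handed out so far, including alignment padding */
  };

  /* Returns the offset of a new range inside the slab, or SKETCH_SLAB_FULL
   * if the request does not fit. */
  static size_t
  sketch_slab_alloc(struct sketch_slab *slab, size_t size)
  {
     size_t align_mask = (size_t)SKETCH_ALIGN - 1;
     size_t start = (slab->used + align_mask) & ~align_mask;

     if (size > SKETCH_SLAB_SIZE || start > SKETCH_SLAB_SIZE - size)
        return SKETCH_SLAB_FULL;

     slab->used = start + size;
     return start;
  }

With these assumed values, three 144-byte buffers like the ones in the trace
would land at offsets 0, 192, and 384 of a single BO.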

Euro Truck Simulator
====================

.. code-block:: console

  [usage of VBO 14,15]
  [...]
  885199 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
  885203 glInvalidateBufferData(buffer = 14)
  885204 glInvalidateBufferData(buffer = 15)
  [...]
  889330 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
  889334 glInvalidateBufferData(buffer = 12)
  889335 glInvalidateBufferData(buffer = 16)
  [...]
  893461 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
  893462 glClientWaitSync(sync = 0x77eee10, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  893463 glDeleteSync(sync = 0x780a630)
  893464 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x78ec730
  893465 glInvalidateBufferData(buffer = 13)
  893466 glInvalidateBufferData(buffer = 17)
  893505 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
  893506 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1000
  893508 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
  893509 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 15)
  893510 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 32, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034e5df000
  893512 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
  893532 glBindVertexBuffers(first = 0, count = 2, buffers = {10, 15}, offsets = {0, 0}, strides = {52, 16})
  893552 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
  893609 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  893732 glBindVertexBuffers(first = 0, count = 1, buffers = &14, offsets = &0, strides = &48)
  893733 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 14)
  893744 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0xf0, basevertex = 0)
  893759 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x2e0, basevertex = 6)
  893786 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 600, type = GL_UNSIGNED_SHORT, indices = 0xe87b0, basevertex = 21515)
  893822 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  893845 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
  893846 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 788, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1314
  893848 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
  893886 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
  893943 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)

At the start of this frame, buffers 14 and 15 haven't been used in the previous
2 frames, and the ``GL_ARB_sync`` fence has ensured that the GPU has at least
started frame n-1 as the CPU starts the current frame. The first map is
``offset = 0, INVALIDATE_BUFFER | UNSYNCHRONIZED``, which suggests that the
driver should reallocate storage for the mapping even in the ``UNSYNCHRONIZED``
case. Here, however, the buffer is definitely going to be idle, making
reallocation unnecessary (you may need to empty your valid range, though, to
prevent unnecessary batch flushes).

Also note the use of a totally unrelated binding point for the mapping of the
vertex array: you can't effectively use it as a hint for any buffer placement
in memory. The game does also use ``glCopyBufferSubData()``, but only on a
different buffer.

Plague Inc
==========

.. code-block:: console

  1640732 glXSwapBuffers(dpy = 0xb218f20, drawable = 23068674)
  1640733 glClientWaitSync(sync = 0xb4141430, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  1640734 glDeleteSync(sync = 0xb4141430)
  1640735 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0xb4141430

  1640780 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 78)
  1640787 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 79)
  1640788 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
  1640795 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
  1640813 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640814 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4000
  1640815 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640816 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998000
  1640817 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640819 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
  1640820 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640821 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640823 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
  1640824 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640825 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 1096)
  1640831 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1091)
  1640832 glDrawElements(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL)

  1640847 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640848 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 352, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4160
  1640849 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640850 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 88, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998058
  1640851 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640853 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
  1640854 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640855 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640857 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
  1640858 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640863 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x58, basevertex = 4)

At the start of this frame, the VBOs haven't been used in about 6 frames, and
the ``GL_ARB_sync`` fence has ensured that the GPU has started frame n-1.

Note the use of ``glFlushMappedBufferRange()`` on a small fraction of the size
of the VBO (352 of the 67584 mapped bytes): it is important that a blitting
driver make use of the flush ranges when in explicit mode.
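
A sketch of how a blitting driver might derive the upload range at unmap time
follows. It relies on the GL rule that the ``glFlushMappedBufferRange()``
offset is relative to the start of the mapped range. The structure and function
names are hypothetical, and merging all flushes into one range is a
simplification.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical record of one mapping (not a Mesa structure). */
  struct sketch_transfer {
     size_t map_offset;   /* start of the mapping within the buffer */
     size_t map_length;
     bool flush_explicit; /* mapped with GL_MAP_FLUSH_EXPLICIT_BIT */
     size_t dirty_start;  /* union of flushed ranges, as buffer offsets */
     size_t dirty_end;    /* dirty_start == dirty_end: nothing flushed yet */
  };

  /* Record a flush; rel_offset is relative to the start of the mapping. */
  static void
  sketch_flush_region(struct sketch_transfer *t, size_t rel_offset,
                      size_t length)
  {
     size_t start = t->map_offset + rel_offset;
     size_t end = start + length;

     if (length == 0)
        return;

     if (t->dirty_start == t->dirty_end) {
        t->dirty_start = start;
        t->dirty_end = end;
     } else {
        if (start < t->dirty_start)
           t->dirty_start = start;
        if (end > t->dirty_end)
           t->dirty_end = end;
     }
  }

  /* Bytes to blit at unmap time; *start receives the buffer offset. */
  static size_t
  sketch_upload_range(const struct sketch_transfer *t, size_t *start)
  {
     if (t->flush_explicit) {
        *start = t->dirty_start;
        return t->dirty_end - t->dirty_start;
     }

     *start = t->map_offset;
     return t->map_length;
  }

For the second map of buffer 1096 in the trace (offset 352, length 67584, one
flush of 352 bytes at relative offset 0), this gives a 352-byte upload at
buffer offset 352 rather than the full 67584 bytes.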

Darkest Dungeon
===============

.. code-block:: console

  938384 glXSwapBuffers(dpy = 0x377fcd0, drawable = 23068692)

  938385 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938386 glBufferData(target = GL_ARRAY_BUFFER, size = 1048576, data = NULL, usage = GL_STREAM_DRAW)
  938511 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938512 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
  938514 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 512)
  938515 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
  938523 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
  938524 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938525 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = NULL)
  938527 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938528 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
  938530 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 512, length = 512)
  938531 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
  938539 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
  938540 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938541 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x30)
  [... more maps and draws at increasing offsets]

An interesting note for this game: after the initial ``glBufferData()`` in the
frame to reallocate the storage, it maps the whole buffer unsynchronized each
time and just changes which region it flushes. The same GL buffer name is used
in every frame.

Tabletop Simulator
==================

.. code-block:: console

  1287594 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)
  1287595 glClientWaitSync(sync = 0x7abf554e37b0, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  1287596 glDeleteSync(sync = 0x7abf554e37b0)
  1287597 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7abf56647490

  1287614 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
  1287615 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7abf2e79a000
  1287642 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 614)
  1287650 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 5)
  1287651 glBufferSubData(target = GL_COPY_WRITE_BUFFER, offset = 0, size = 1088, data = blob(1088))
  1287652 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 615)
  1287653 glDrawElements(mode = GL_TRIANGLES, count = 1788, type = GL_UNSIGNED_SHORT, indices = NULL)
  [... more draw calls]
  1289055 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
  1289057 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384)
  1289058 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1289059 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 480)
  1289066 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 12, count = 4)
  1289068 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 8, count = 4)
  1289553 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)

In this app, buffer 480 gets used like this every other frame. The
``GL_ARB_sync`` fence ensures that frame n-1 has started on the GPU before CPU
work starts on the current frame, so the unsynchronized access to the buffers
is safe.

Hollow Knight
=============

.. code-block:: console

  1873034 glXSwapBuffers(dpy = 0x28609d0, drawable = 23068692)
  1873035 glClientWaitSync(sync = 0x7b1a5ca6e130, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  1873036 glDeleteSync(sync = 0x7b1a5ca6e130)
  1873037 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7b1a5ca6e130
  1873038 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873039 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c7e000
  1873040 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873041 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a07430000
  1873065 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873067 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640)
  1873068 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873069 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873071 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720)
  1873072 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873073 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873074 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 8640, length = 576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c801c0
  1873075 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873076 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 720, length = 72, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a074302d0
  1873077 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873079 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 576)
  1873080 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873081 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873083 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 72)
  1873084 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873085 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 29)
  1873096 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 30)
  1873097 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x2d0, basevertex = 240)

In this app, buffers 29 and 30 get used like this, starting from offset 0,
every other frame. The ``GL_ARB_sync`` fence is used to make sure that the GPU
has reached the start of the previous frame before the app writes,
unsynchronized, over the buffer from frame n-2.
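
The reasoning behind this alternating scheme can be modeled with frame numbers
alone. In the sketch below (hypothetical names, no GL calls), the fence checked
at the start of frame n is assumed to have been inserted at the start of frame
n-1, so once it has signaled, all GPU work from frame n-2 and earlier has
completed.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>

  /* Which of the two alternating buffer sets a given frame writes to. */
  static unsigned
  sketch_buffer_set(unsigned frame)
  {
     return frame & 1;
  }

  /* After the fence has signaled at the start of `current_frame` (which must
   * be >= 1), is a buffer last used by the GPU in `last_use_frame` known to
   * be idle? */
  static bool
  sketch_known_idle(unsigned last_use_frame, unsigned current_frame)
  {
     /* The fence was inserted at the start of this frame. */
     unsigned fence_frame = current_frame - 1;

     return last_use_frame < fence_frame;
  }

A buffer set written in frame n was last used in frame n-2, which the model
reports as idle, whereas a buffer used in frame n-1 is not known to be idle.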

Borderlands 2
=============

.. code-block:: console

  3561998 glFlush()
  3562004 glXSwapBuffers(dpy = 0xbaf0f90, drawable = 23068705)
  3562006 glClientWaitSync(sync = 0x231c2ab0, flags = GL_SYNC_FLUSH_COMMANDS_BIT, timeout = 10000000000) = GL_ALREADY_SIGNALED
  3562007 glDeleteSync(sync = 0x231c2ab0)
  3562008 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x231aadc0

  3562050 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
  3562051 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1792, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xde056000
  3562053 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
  3562054 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1194)
  3562055 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1280, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xd9426000
  3562057 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
  [... unrelated draws]
  3563051 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
  3563064 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 875)
  3563065 glDrawElementsInstancedARB(mode = GL_TRIANGLES, count = 72, type = GL_UNSIGNED_SHORT, indices = NULL, instancecount = 28)

The ``GL_ARB_sync`` fence ensures that the GPU has started frame n-1 before the
CPU starts on the current frame.

This sequence of buffer uploads appears in each frame with the same buffer
names, so you do need to handle ``GL_MAP_INVALIDATE_BUFFER_BIT`` as a
reallocation if the buffer is GPU-busy (it wasn't in this trace capture) to
avoid stalls on the n-1 frame completing.

Note that this is just one small buffer. Most of the vertex data goes through a
``glBufferSubData()``/``glDraw*()`` path with the VBO used across multiple
frames, with a ``glBufferData()`` when needing to wrap.
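
The handling of ``GL_MAP_INVALIDATE_BUFFER_BIT`` for the small mapped buffers
in this trace can be summarized as a decision function. This is a sketch under
simplifying assumptions: the flag and action names are hypothetical stand-ins
rather than Gallium definitions, and valid range tracking is ignored.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>

  /* Hypothetical stand-ins for the map access bits. */
  enum sketch_map_flags {
     SKETCH_MAP_UNSYNCHRONIZED = 1 << 0,
     SKETCH_MAP_INVALIDATE_BUFFER = 1 << 1,
  };

  enum sketch_map_action {
     SKETCH_MAP_DIRECT,  /* map the existing storage right away */
     SKETCH_MAP_REALLOC, /* give the buffer fresh storage, then map it */
     SKETCH_MAP_WAIT,    /* wait for the GPU, then map */
  };

  static enum sketch_map_action
  sketch_choose_map_action(unsigned flags, bool gpu_busy)
  {
     /* An idle buffer never needs new storage or a wait. */
     if (!gpu_busy)
        return SKETCH_MAP_DIRECT;

     /* The old contents are disposable, so sidestep the busy storage. */
     if (flags & SKETCH_MAP_INVALIDATE_BUFFER)
        return SKETCH_MAP_REALLOC;

     /* The app has promised to handle synchronization itself. */
     if (flags & SKETCH_MAP_UNSYNCHRONIZED)
        return SKETCH_MAP_DIRECT;

     return SKETCH_MAP_WAIT;
  }

Checking the invalidate bit before the unsynchronized bit is a design choice
of this sketch: it prefers reallocation whenever the app has declared the old
contents disposable.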

Buffer mapping conclusions
--------------------------

* Non-blitting drivers must track the valid range of a freshly allocated buffer
  as it gets uploaded in ``pipe_transfer_map()`` and avoid stalling on the GPU
  when mapping an undefined portion of the buffer when ``glBufferSubData()`` is
  interleaved with drawing.

* Non-blitting drivers must reallocate storage on ``glBufferData(NULL)`` so that
  the following ``glBufferSubData()`` won't stall. That ``glBufferData(NULL)``
  call will appear in the driver as an ``invalidate_resource()`` call if
  ``PIPE_CAP_INVALIDATE_BUFFER`` is available. (If that flag is not set, then
  mesa/st will create a new ``pipe_resource`` for you.) Storage reallocation may
  be skipped if you for some reason know that the buffer is idle, in which case
  you can just empty the valid region.

* Blitting drivers must use the ``transfer_flush_region()`` region
  instead of the mapped range when ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid
  blitting too much data. (When that bit is unset, you just blit the whole
  mapped range at unmap time.)

* Buffer valid range tracking in non-blitting drivers must use the
  ``transfer_flush_region()`` region instead of the mapped range when
  ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid excess stalls.

* Buffer valid range tracking doesn't need to be fancy: "number of bytes
  valid starting from 0" is sufficient for all examples found.

* Use the ``pipe_debug_callback`` to report stalls on buffer mapping to ease
  debugging.

* Buffer binding points are not useful for tuning buffer placement (see all the
  ``GL_COPY_WRITE_BUFFER`` instances); you have to track the actual usage
  history of a GL BO name. mesa/st does this for optimizing its state updates
  on reallocation in the ``!PIPE_CAP_INVALIDATE_BUFFER`` case, and if you set
  ``PIPE_CAP_INVALIDATE_BUFFER`` then you have to flag your own internal state
  updates (VBO addresses, XFB addresses, texture buffer addresses, etc.) on
  reallocation based on usage history.