Off Main Thread Painting
========================

OMTP, or ‘off main thread painting’, is the component of Gecko that
allows us to paint web content off of the main thread. This leaves more
time on the main thread for JavaScript, layout, display list building,
and other tasks, which improves responsiveness.

Take a look at this `blog
post <https://mozillagfx.wordpress.com/2017/12/05/off-main-thread-painting/>`__
for an introduction.

Background
----------

Painting (or rasterization) is the last operation that happens in a
layer transaction before we forward it to the compositor. At this point,
all display items have been assigned to a layer and invalid regions have
been calculated and assigned to each layer.

The painted layer uses a content client to acquire a buffer for
painting. The main purpose of the content client is to allow us to
retain already painted content when we are scrolling a layer. We have
two main strategies for this: rotated buffer and tiling.

These are implemented with two class hierarchies: ``ContentClient`` for
rotated buffer and ``TiledContentClient`` for tiling. Additionally, we
have two different painted layer implementations, ``ClientPaintedLayer``
and ``ClientTiledPaintedLayer``.

The main distinction between rotated buffer and tiling is the number of
graphics surfaces required. Rotated buffer uses just a single buffer
for a frame but potentially requires painting into it multiple times.
Tiling uses multiple buffers but doesn’t require painting into any of
them more than once.

Once the painted layer has a surface (or surfaces, with tiling) to paint
into, each surface is wrapped in a ``DrawTarget`` of some form and a
callback to ``FrameLayerBuilder`` is called. This callback uses the
assigned display items and invalid regions to trigger rasterization.
Each ``nsDisplayItem`` has its ``Paint`` method called with the provided
``DrawTarget`` that represents the surface, and paints into it.

High level
----------

The key abstraction that allows us to paint off of the main thread is
``DrawTargetCapture`` [1]_. ``DrawTargetCapture`` is a special
``DrawTarget`` which records all draw commands for replaying to another
draw target in the local process. This is similar to
``DrawTargetRecording``, but only holds a reference to resources instead
of copying them into the command stream. This allows the command stream
to be much more lightweight than that of ``DrawTargetRecording``.

OMTP works by instrumenting the content clients to use a capture target
for all painting [2]_ [3]_ [4]_ [5]_. This capture draw target records all
the operations that would normally be performed directly on the
surface’s draw target. Once we have all of the commands, we send the
capture and surface draw target to the ``PaintThread`` [6]_ where the
commands are replayed onto the surface. Once the rasterization is done,
we forward the layer transaction to the compositor.

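The record-then-replay idea can be illustrated with a minimal,
standalone sketch. It does not use the real Moz2D classes: every
``Toy*`` type and ``PaintOffMainThread`` are hypothetical names invented
for this example, and a plain ``std::thread`` stands in for the paint
thread.

```cpp
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a surface-backed draw target. It only logs
// the operations that reach it.
struct ToySurfaceTarget {
  std::vector<std::string> ops;

  void SetTransform(int aDx, int aDy) {
    ops.push_back("transform " + std::to_string(aDx) + "," +
                  std::to_string(aDy));
  }
  void FillRect(int aX, int aY, int aWidth, int aHeight) {
    ops.push_back("fill " + std::to_string(aX) + "," + std::to_string(aY) +
                  " " + std::to_string(aWidth) + "x" +
                  std::to_string(aHeight));
  }
};

// Hypothetical stand-in for a capture target: same drawing interface, but
// each call is stored as a command instead of being rasterized.
class ToyCaptureTarget {
 public:
  void SetTransform(int aDx, int aDy) {
    mCommands.push_back(
        [=](ToySurfaceTarget& aTarget) { aTarget.SetTransform(aDx, aDy); });
  }
  void FillRect(int aX, int aY, int aWidth, int aHeight) {
    mCommands.push_back([=](ToySurfaceTarget& aTarget) {
      aTarget.FillRect(aX, aY, aWidth, aHeight);
    });
  }
  // Replays every recorded command, in order, onto a real target.
  void ReplayTo(ToySurfaceTarget& aTarget) const {
    for (const auto& command : mCommands) {
      command(aTarget);
    }
  }

 private:
  std::vector<std::function<void(ToySurfaceTarget&)>> mCommands;
};

// Records on the calling thread, then replays on a second thread that
// stands in for the paint thread. Joining models "rasterization is done".
std::vector<std::string> PaintOffMainThread() {
  ToyCaptureTarget capture;
  capture.SetTransform(10, 20);
  capture.FillRect(0, 0, 100, 50);

  ToySurfaceTarget surface;
  std::thread paintThread([&] { capture.ReplayTo(surface); });
  paintThread.join();
  return surface.ops;
}
```

In this sketch the recording thread blocks on ``join()`` only to keep the
example short; the point of OMTP is that the main thread does not wait
this way.
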
Tiling and parallel painting
----------------------------

We can make one additional improvement if we are using tiling as our
content client backend.

When we are tiling, the screen is subdivided into a grid of equally
sized surfaces and draw commands are performed on the tiles they affect.
Each tile is independent of the others, so we’re able to parallelize
painting by using a worker thread pool and dispatching a task for each
tile individually.

This is commonly referred to as P-OMTP or parallel painting.

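As a rough illustration of why independent tiles parallelize well, the
sketch below paints a small tile grid with a fixed number of worker
threads that each claim the next unpainted tile. All names here
(``ToyTile``, ``MakeTileGrid``, ``PaintTilesInParallel``, the tile size)
are assumptions made for the example and are not Gecko APIs.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

constexpr int kToyTileSize = 4;  // Hypothetical tile edge length in pixels.

struct ToyTile {
  int column;
  int row;
  std::vector<uint32_t> pixels;
};

// Builds a grid of equally sized, zero-initialized tiles in row-major order.
std::vector<ToyTile> MakeTileGrid(int aColumns, int aRows) {
  std::vector<ToyTile> tiles;
  const std::size_t pixelCount =
      static_cast<std::size_t>(kToyTileSize) * kToyTileSize;
  for (int row = 0; row < aRows; ++row) {
    for (int column = 0; column < aColumns; ++column) {
      tiles.push_back({column, row, std::vector<uint32_t>(pixelCount, 0u)});
    }
  }
  return tiles;
}

// A color derived from the tile position, so each tile's result differs.
uint32_t ToyTileColor(const ToyTile& aTile) {
  return 0xFF000000u | (static_cast<uint32_t>(aTile.column) << 8) |
         static_cast<uint32_t>(aTile.row);
}

// "Paints" one tile. It touches only that tile's own buffer, which is
// what makes it safe to run for different tiles at the same time.
void PaintTile(ToyTile& aTile) {
  const uint32_t color = ToyTileColor(aTile);
  for (uint32_t& pixel : aTile.pixels) {
    pixel = color;
  }
}

// Runs aWorkerCount threads; each repeatedly claims the next tile index
// from a shared atomic counter until every tile has been painted.
void PaintTilesInParallel(std::vector<ToyTile>& aTiles,
                          unsigned aWorkerCount) {
  std::atomic<std::size_t> nextTile{0};
  auto workerLoop = [&aTiles, &nextTile] {
    for (std::size_t i = nextTile.fetch_add(1); i < aTiles.size();
         i = nextTile.fetch_add(1)) {
      PaintTile(aTiles[i]);
    }
  };

  std::vector<std::thread> workers;
  for (unsigned i = 0; i < aWorkerCount; ++i) {
    workers.emplace_back(workerLoop);
  }
  for (std::thread& worker : workers) {
    worker.join();
  }
}
```

No locking is needed around the pixel buffers because no two workers
ever receive the same tile index. Note that passing a worker count of
zero leaves every tile unpainted in this sketch.
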
Main thread rasterization
-------------------------

Even with OMTP, it’s still possible for the main thread to perform
rasterization. A common pattern for painting code is to create a
temporary draw target, perform drawing with it, take a snapshot, and
then draw the snapshot onto the main draw target. This is done for
blurs, box shadows, text shadows, and with the basic layer manager
fallback.

If the temporary draw target is not a draw target capture, then this
will perform rasterization on the main thread. This can be bad as it
lowers our parallelism and can cause contention with content backends,
like Direct2D, that use locking around shared resources.

To work around this, we changed the main thread painting code to use a
draw target capture for these operations and added a source surface
capture [7]_ which only resolves the painting of the draw commands when
needed on the paint thread.

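The deferred resolution described above can be pictured as a surface
that holds on to the work needed to produce its pixels and only runs it
on first use. This is a simplified sketch under that assumption;
``ToyLazySurface`` and ``ToyPixels`` are hypothetical and do not mirror
the real ``SourceSurfaceCapture`` interface.

```cpp
#include <functional>
#include <utility>
#include <vector>

using ToyPixels = std::vector<int>;

// Hypothetical lazily resolved surface: it stores the rasterization work
// and runs it at most once, the first time the pixels are requested.
class ToyLazySurface {
 public:
  explicit ToyLazySurface(std::function<ToyPixels()> aRasterize)
      : mRasterize(std::move(aRasterize)) {}

  bool IsResolved() const { return mResolved; }

  // Intended to be called from whichever thread actually needs the
  // pixels (the paint thread, in the scenario described above).
  const ToyPixels& Resolve() {
    if (!mResolved) {
      mPixels = mRasterize();
      mResolved = true;
    }
    return mPixels;
  }

 private:
  std::function<ToyPixels()> mRasterize;
  ToyPixels mPixels;
  bool mResolved = false;
};
```

Creating the surface costs nothing beyond storing the callable, so code
that builds a temporary shadow or blur on the main thread pays for
rasterization only where ``Resolve()`` is eventually called. The sketch
is not thread safe on its own.
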
There are still cases where main thread rasterization can occur, but we
try to address them as they come up.

Out of memory issues
--------------------

The web is very complex, and so we can sometimes have a large number of
draw commands for a content paint. We’ve observed OOM errors for capture
command lists that have grown to 200 MiB.

We initially tried to mitigate this by lowering the overhead of capture
command lists. We do this by filtering commands that don’t actually
change the draw target state and folding consecutive transform changes,
but that was not always enough. So we added the ability for our draw
target captures to flush their command lists to the surface draw target
while we are capturing on the main thread [8]_.

This is triggered by a configurable memory limit. Because flushing
introduces a new source of main thread rasterization, the limit is a
trade-off: set it too low and we suffer poor performance, set it too
high and we suffer crashes.

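A minimal sketch of a limit-triggered flush follows. The class name,
the per-command byte estimate, and the choice to flush when the estimate
exceeds the limit are all assumptions for illustration; the real
accounting in ``DrawTargetCapture`` may differ.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using ToySurface = std::vector<int>;
using ToyCommand = std::function<void(ToySurface&)>;

// Hypothetical capture that replays its pending commands synchronously
// once their estimated size exceeds a configurable limit.
class ToyFlushingCapture {
 public:
  ToyFlushingCapture(ToySurface& aSurface, std::size_t aFlushLimitBytes)
      : mSurface(aSurface), mFlushLimitBytes(aFlushLimitBytes) {}

  // Records a command along with a caller-supplied size estimate.
  void Record(ToyCommand aCommand, std::size_t aApproxBytes) {
    mCommands.push_back(std::move(aCommand));
    mPendingBytes += aApproxBytes;
    if (mPendingBytes > mFlushLimitBytes) {
      // This replay happens on the recording thread, which is the cost
      // the limit has to be balanced against.
      Flush();
      ++mSyncFlushes;
    }
  }

  // Replays all pending commands onto the surface and frees them.
  void Flush() {
    for (const ToyCommand& command : mCommands) {
      command(mSurface);
    }
    mCommands.clear();
    mPendingBytes = 0;
  }

  std::size_t PendingCommands() const { return mCommands.size(); }
  std::size_t SyncFlushes() const { return mSyncFlushes; }

 private:
  ToySurface& mSurface;
  std::size_t mFlushLimitBytes;
  std::size_t mPendingBytes = 0;
  std::size_t mSyncFlushes = 0;
  std::vector<ToyCommand> mCommands;
};
```

With a limit of 100 bytes and commands estimated at 40 bytes each, the
third ``Record`` call pushes the estimate to 120 and triggers one
synchronous flush, after which the pending list is empty again.
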
Synchronization
---------------

OMTP is conceptually simple, but in practice it relies on subtle code to
ensure thread safety. This was arguably the most difficult part of the
project.

There are roughly four areas that are critical.

1. Compositor message ordering

   Immediately after we queue the async paints, we have a problem. We
   need to forward the layer transaction at some point, but the
   compositor cannot process the transaction until all async paints
   have finished. If it did, it could access unfinished painted content.

   We obviously can’t block on the async paints completing as that would
   defeat the whole point of OMTP. We also can’t hold off on sending the
   layer transaction to ``IPDL``, as we’d trigger race conditions for
   messages sent after the layer transaction is built but before it is
   forwarded. Reftests and other code assume that messages sent after a
   layer transaction to the compositor are processed after that layer
   transaction is processed.

   The solution is to forward the layer transaction to the compositor
   over ``IPDL``, but flag the message channel to start postponing
   messages [9]_. Then once all async paints have completed, we unflag
   the message channel and all postponed messages are sent [10]_. This
   allows us to keep our message ordering guarantees and not have to
   worry about scheduling a runnable in the future.

2. Texture clients

   The backing store for content surfaces is managed by texture clients.
   While async paints are executing, it’s possible for shutdown or any
   number of things to happen that could cause the layer manager, all
   layers, all content clients, and therefore all texture clients to be
   destroyed. Therefore it’s important that we keep these texture
   clients alive throughout async painting. Texture clients also manage
   IPC resources and must be destroyed on the main thread, so we are
   careful to do that [11]_.

3. Double buffering

   We currently double buffer our content painting: our content clients
   only ever have zero or one texture that is available to be painted
   into at any moment.

   This implies that we cannot start async painting a layer tree while
   previous async paints are still active, as this would lead to awful
   races. We also don’t support multiple nested sets of postponed IPC
   messages to allow sending the first layer transaction to the
   compositor, but not the second.

   To prevent issues with this, we flush all active async paints before
   we begin to paint a new layer transaction [12]_.

   There was some initial debate about implementing triple buffering for
   content painting, but we have not seen evidence it would help us
   significantly.

4. Moz2D thread safety

   Finally, most Moz2D objects were not thread safe. We had to insert
   special locking into draw target and source surface as they have a
   special copy on write relationship that must be consistent even if
   they are on different threads.

   Some platform specific resources like fonts needed locking added in
   order to be thread safe. We also did some work to make filter nodes
   work with multiple threads executing them at the same time.

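The message postponing described in the first item above can be sketched
as a channel with a postpone flag. ``ToyMessageChannel`` is a
hypothetical class, not the ``IPDL`` channel, and the sketch assumes
that the layer transaction message itself is among the messages held
back until the async paints finish.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical channel that preserves send order while postponing.
class ToyMessageChannel {
 public:
  // Queues the message while postponing, otherwise delivers it at once.
  void Send(std::string aMessage) {
    if (mPostponing) {
      mPostponed.push_back(std::move(aMessage));
    } else {
      mDelivered.push_back(std::move(aMessage));
    }
  }

  // Called when async paints have been queued for the transaction.
  void BeginPostponing() { mPostponing = true; }

  // Called once all async paints have completed: releases the held
  // messages in the order they were originally sent.
  void EndPostponing() {
    mPostponing = false;
    for (std::string& message : mPostponed) {
      mDelivered.push_back(std::move(message));
    }
    mPostponed.clear();
  }

  const std::vector<std::string>& Delivered() const { return mDelivered; }

 private:
  bool mPostponing = false;
  std::vector<std::string> mPostponed;
  std::vector<std::string> mDelivered;
};
```

Because senders keep calling ``Send`` as usual, a message sent after the
layer transaction can never overtake it, and nothing has to be
rescheduled for later by the caller.
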
Browser process
---------------

Currently only content processes are able to use OMTP.

This restriction was added because of concern about message ordering
between ``APZ`` and OMTP. It might be possible to lift it in the future.

Important bugs
--------------

1. `OMTP Meta <https://bugzilla.mozilla.org/show_bug.cgi?id=omtp>`__
2. `Enable on
   Windows <https://bugzilla.mozilla.org/show_bug.cgi?id=1403935>`__
3. `Enable on
   OSX <https://bugzilla.mozilla.org/show_bug.cgi?id=1422392>`__
4. `Enable on
   Linux <https://bugzilla.mozilla.org/show_bug.cgi?id=1432531>`__
5. `Parallel
   painting <https://bugzilla.mozilla.org/show_bug.cgi?id=1425056>`__

Code links
----------

.. [1] `DrawTargetCapture <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/2d/DrawTargetCapture.h#22>`__
.. [2] `Creating DrawTargetCapture for rotated
    buffer <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/client/ContentClient.cpp#185>`__
.. [3] `Dispatch DrawTargetCapture for rotated
    buffer <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/client/ClientPaintedLayer.cpp#99>`__
.. [4] `Creating DrawTargetCapture for
    tiling <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/client/TiledContentClient.cpp#714>`__
.. [5] `Dispatch DrawTargetCapture for
    tiling <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/client/MultiTiledContentClient.cpp#288>`__
.. [6] `PaintThread <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/PaintThread.h#53>`__
.. [7] `SourceSurfaceCapture <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/2d/SourceSurfaceCapture.h#19>`__
.. [8] `Sync flushing draw
    commands <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/2d/DrawTargetCapture.h#165>`__
.. [9] `Postponing messages for
    PCompositorBridge <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/ipc/CompositorBridgeChild.cpp#1319>`__
.. [10] `Releasing messages for
    PCompositorBridge <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/ipc/CompositorBridgeChild.cpp#1303>`__
.. [11] `Releasing texture clients on main
    thread <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/ipc/CompositorBridgeChild.cpp#1170>`__
.. [12] `Flushing async
    paints <https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/gfx/layers/client/ClientLayerManager.cpp#289>`__