1.. highlight:: c
2
3RCU
4===
5
6Introduction
7------------
8
9RCU (Read-Copy-Update) is, fundamentally, a paradigm of multithreaded
10operation (and not a set of APIs.)  The core ideas are:
11
12* longer, complicated updates to structures are made only on private,
13  "invisible" copies.  Other threads, when they access the structure, see an
14  older (but consistent) copy.
15
16* once done, the updated copy is swapped in in a single operation so that
17  other threads see either the old or the new data but no inconsistent state
18  between.
19
20* the old instance is only released after making sure that it is impossible
21  any other thread might still be reading it.
22
23For more information, please search for general or Linux kernel RCU
24documentation; there is no way this doc can be comprehensive in explaining the
25interactions:
26
27* https://en.wikipedia.org/wiki/Read-copy-update
28* https://www.kernel.org/doc/html/latest/kernel-hacking/locking.html#avoiding-locks-read-copy-update
29* https://lwn.net/Articles/262464/
30* http://www.rdrop.com/users/paulmck/RCU/rclock_OLS.2001.05.01c.pdf
31* http://lse.sourceforge.net/locking/rcupdate.html
32
33RCU, the TL;DR
34^^^^^^^^^^^^^^
35
36#. data structures are always consistent for reading.  That's the "R" part.
37#. reading never blocks / takes a lock.
38#. rcu_read_lock is not a lock in the traditional sense.  Think of it as a
39   "reservation";  it notes what the *oldest* possible thing the thread might
40   be seeing is, and which thus can't be deleted yet.
41#. you create some object, finish it up, and then publish it.
42#. publishing is an ``atomic_*`` call with ``memory_order_release``, which
43   tells the compiler to make sure prior memory writes have completed before
44   doing the atomic op.
45#. ``ATOMLIST_*`` ``add`` operations do the ``memory_order_release`` for you.
46#. you can't touch the object after it is published, except with atomic ops.
47#. because you can't touch it, if you want to change it you make a new copy,
48   work on that, and then publish the new copy.  That's the "CU" part.
49#. deleting the object is also an atomic op.
50#. other threads that started working before you published / deleted an object
51   might not see the new object / still see the deleted object.
52#. because other threads may still see deleted objects, the ``free()`` needs
53   to be delayed.  That's what :c:func:`rcu_free()` is for.
54
55
56When (not) to use RCU
57^^^^^^^^^^^^^^^^^^^^^
58
59RCU is designed for read-heavy workloads where objects are updated relatively
60rarely, but frequently accessed.  Do *not* indiscriminately replace locking by
61RCU patterns.
62
63The "copy" part of RCU implies that, while updating, several copies of a given
64object exist in parallel.  Even after the updated copy is swapped in, the old
65object remains queued for freeing until all other threads are guaranteed to
66not be accessing it anymore, due to passing a sequence point.  In addition to
67the increased memory usage, there may be some bursted (due to batching) malloc
68contention when the RCU cleanup thread does its thing and frees memory.
69
70Other useful patterns
71^^^^^^^^^^^^^^^^^^^^^
72
73In addition to the full "copy object, apply changes, atomically update"
74approach, there are 2 "reduced" usage cases that can be done:
75
76* atomically updating single pieces of a particular object, e.g. some flags
77  or configuration piece
78
79* straight up read-only / immutable objects
80
81Both of these cases can be considered RCU "subsets".  For example, when
82maintaining an atomic list of items, but these items only have a single
83integer value that needs to be updated, that value can be atomically updated
84without copying the entire object.  However, the object still needs to be
85free'd through :c:func:`rcu_free()` since reading/updating and deleting might
86be happening concurrently.  The same applies for immutable objects;  deletion
87might still race with reading so they need to be free'd through RCU.
88
89FRR API
90-------
91
92Before diving into detail on the provided functions, it is important to note
93that the FRR RCU API covers the **cleanup part of RCU, not the read-copy-update
94paradigm itself**.  These parts are handled by standard C11 atomic operations,
95and by extension through the atomic data structures (ATOMLIST, ATOMSORT & co.)
96
97The ``rcu_*`` functions only make sense in conjunction with these RCU access
98patterns.  If you're calling the RCU API but not using these, something is
99wrong.  The other way around is not necessarily true;  it is possible to use
100atomic ops & datastructures with other types of locking, e.g. rwlocks.
101
102.. c:function:: void rcu_read_lock()
103.. c:function:: void rcu_read_unlock()
104
105   These functions acquire / release the RCU read-side lock.  All access to
106   RCU-guarded data must be inside a block guarded by these.  Any number of
107   threads may hold the RCU read-side lock at a given point in time, including
108   both no threads at all and all threads.
109
110   The functions implement a depth counter, i.e. can be nested.  The nested
111   calls are cheap, since they only increment/decrement the counter.
112   Therefore, any place that uses RCU data and doesn't have a guarantee that
113   the caller holds RCU (e.g. ``lib/`` code) should just have its own
114   rcu_read_lock/rcu_read_unlock pair.
115
116   At the "root" level (e.g. un-nested), these calls can incur the cost of one
117   syscall (to ``futex()``).  That puts them on about the same cost as a
118   mutex lock/unlock.
119
120   The ``thread_master`` code currently always holds RCU everywhere, except
121   while doing the actual ``poll()`` syscall.  This is both an optimization as
122   well as an "easement" into getting RCU going.  The current implementation
123   contract is that any ``struct thread *`` callback is called with a RCU
124   holding depth of 1, and that this is owned by the thread so it may (should)
125   drop and reacquire it when doing some longer-running work.
126
127   .. warning::
128
129      The RCU read-side lock must be held **continuously** for the entire time
130      any piece of RCU data is used.  This includes any access to RCU data
131      after the initial ``atomic_load``.  If the RCU read-side lock is
132      released, any RCU-protected pointers as well as the data they refer to
133      become invalid, as another thread may have called :c:func:`rcu_free` on
134      them.
135
136.. c:type:: struct rcu_head
137.. c:type:: struct rcu_head_close
138.. c:type:: struct rcu_action
139
140   The ``rcu_head`` structures are small (16-byte) bits that contain the
141   queueing machinery for the RCU sweeper/cleanup mechanisms.
142
143   Any piece of data that is cleaned up by RCU needs to have a matching
144   ``rcu_head`` embedded in it.  If there is more than one cleanup operation
145   to be done (e.g. closing a file descriptor), more than one ``rcu_head`` may
146   be embedded.
147
148   .. warning::
149
150      It is not possible to reuse a ``rcu_head``.  It is owned by the RCU code
151      as soon as ``rcu_*`` is called on it.
152
153   The ``_close`` variant carries an extra ``int fd`` field to store the fd to
154   be closed.
155
156   To minimize the amount of memory used for ``rcu_head``, details about the
157   RCU operation to be performed are moved into the ``rcu_action`` structure.
158   It contains e.g. the MTYPE for :c:func:`rcu_free` calls.  The pointer to be
159   freed is stored as an offset relative to the ``rcu_head``, which means it
160   must be embedded as a struct field so the offset is constant.
161
162   The ``rcu_action`` structure is an implementation detail.  Using
163   ``rcu_free`` or ``rcu_close`` will set it up correctly without further
164   code needed.
165
166   The ``rcu_head`` may be put in an union with other data if the other data
167   is only used during "life" of the data, since the ``rcu_head`` is used only
168   for the "death" of data.  But note that other threads may still be reading
169   a piece of data while a thread is working to free it.
170
171.. c:function:: void rcu_free(struct memtype *mtype, struct X *ptr, field)
172
173   Free a block of memory after RCU has ensured no other thread can be
174   accessing it anymore.  The pointer remains valid for any other thread that
175   has called :c:func:`rcu_read_lock` before the ``rcu_free`` call.
176
177   .. warning::
178
179      In some other RCU implementations, the pointer remains valid to the
180      *calling* thread if it is holding the RCU read-side lock.  This is not
181      the case in FRR, particularly when running single-threaded.  Enforcing
182      this rule also allows static analysis to find use-after-free issues.
183
184   ``mtype`` is the libfrr ``MTYPE_FOO`` allocation type to pass to
185   :c:func:`XFREE`.
186
187   ``field`` must be the name of a ``struct rcu_head`` member field in ``ptr``.
188   The offset of this field (which must be constant) is used to reduce the
189   memory size of ``struct rcu_head``.
190
191   .. note::
192
193      ``rcu_free`` (and ``rcu_close``) calls are more efficient if they are
194      put close to each other.  When freeing several RCU'd resources, try to
195      move the calls next to each other (even if the data structures do not
196      directly point to each other.)
197
198      Having the calls bundled reduces the cost of adding the ``rcu_head`` to
199      the RCU queue;  the RCU queue is an atomic data structure whose usage
200      will require the CPU to acquire an exclusive hold on relevant cache
201      lines.
202
203.. c:function:: void rcu_close(struct rcu_head_close *head, int fd)
204
205   Close a file descriptor after ensuring no other thread might be using it
206   anymore.  Same as :c:func:`rcu_free`, except it calls ``close`` instead of
207   ``free``.
208
209Internals
210^^^^^^^^^
211
212.. c:type:: struct rcu_thread
213
214   Per-thread state maintained by the RCU code, set up by the following
215   functions.  A pointer to a thread's own ``rcu_thread`` is saved in
216   thread-local storage.
217
218.. c:function:: struct rcu_thread *rcu_thread_prepare(void)
219.. c:function:: void rcu_thread_unprepare(struct rcu_thread *rcu_thread)
220.. c:function:: void rcu_thread_start(struct rcu_thread *rcu_thread)
221
222   Since the RCU code needs to have a list of all active threads, these
223   functions are used by the ``frr_pthread`` code to set up threads.  Teardown
224   is automatic.  It should not be necessary to call these functions.
225
226   Any thread that accesses RCU-protected data needs to be registered with
227   these functions.  Threads that do not access RCU-protected data may call
228   these functions but do not need to.
229
230   Note that passing a pointer to RCU-protected data to some library which
231   accesses that pointer makes the library "access RCU-protected data".  In
232   that case, either all of the library's threads must be registered for RCU,
233   or the code must instead pass a (non-RCU) copy of the data to the library.
234
235.. c:function:: void rcu_shutdown(void)
236
237   Stop the RCU sweeper thread and make sure all cleanup has finished.
238
239   This function is called on daemon exit by the libfrr code to ensure pending
240   RCU operations are completed.  This is mostly to get a clean exit without
241   memory leaks from queued RCU operations.  It should not be necessary to
242   call this function as libfrr handles this.
243
244FRR specifics and implementation details
245----------------------------------------
246
247The FRR RCU infrastructure has the following characteristics:
248
249* it is Epoch-based with a 32-bit wrapping counter.  (This is somewhat
250  different from other Epoch-based approaches which may be designed to only
251  use 3 counter values, but works out to a simple implementation.)
252
253* instead of tracking CPUs as the Linux kernel does, threads are tracked.  This
254  has exactly zero semantic impact, RCU just cares about "threads of
255  execution", which the kernel can optimize to CPUs but we can't.  But it
256  really boils down to the same thing.
257
258* there are no ``rcu_dereference`` and ``rcu_assign_pointer`` - use
259  ``atomic_load`` and ``atomic_store`` instead.  (These didn't exist when the
260  Linux RCU code was created.)
261
262* there is no ``synchronize_rcu``; this is a design choice but may be revisited
263  at a later point.  ``synchronize_rcu`` blocks a thread until it is guaranteed
264  that no other threads might still be accessing data structures that they may
265  have access to at the beginning of the function call.  This is a blocking
266  design and probably not appropriate for FRR.  Instead, ``rcu_call`` can be
267  used to have the RCU sweeper thread make a callback after the same constraint
268  is fulfilled in an asynchronous way.  Most needs should be covered by
269  ``rcu_free`` and ``rcu_close``.
270