.. _process-architecture:

Process Architecture
====================

FRR inherited its overall design architecture from Quagga. The model chosen
for Quagga is that of a suite of independent daemons that perform IPC over
Unix domain sockets. Within each daemon, the architecture follows the
event-driven model. FRR has inherited this model as well. As FRR is deployed
at larger scales and gains ever more features, each adding to the overall
processing workload, we are approaching the saturation point for a single
thread per daemon. In light of this, there are ongoing efforts to introduce
multithreading to various components of FRR. This document describes the
current design choices and the overall model for integrating the event-driven
and multithreaded architectures into a cohesive whole.

Terminology
-----------
Because this document describes the architecture for true kernel threads as
well as the event system, a digression on terminology is in order here.

Historically Quagga's event system was viewed as an implementation of
userspace threading. Because of this design choice, the names for various
datastructures within the event system are variations on the term "thread".
The primary context datastructure in this system is called a "threadmaster".
What would today be called an 'event' or 'task' in systems such as libevent is
called a "thread", and the datastructure for it is ``struct thread``. To add
to the confusion, these "threads" have various types, one of which is "event".
To hopefully avoid some of this confusion, this document refers to these
"threads" as 'tasks', except where the datastructures are explicitly named.
When they are explicitly named, they will be formatted ``like this`` to
differentiate from the conceptual names. When speaking of kernel threads, the
term used will be "pthread", since FRR's kernel threading implementation uses
POSIX threads.

.. This should be broken into its own document under :ref:`libfrr`
.. _event-architecture:

Event Architecture
------------------
This section presents a brief overview of the event model as currently
implemented in FRR. This doc should be expanded and broken off into its own
section. For now it provides basic information necessary to understand the
interplay between the event system and kernel threads.

The core event system is implemented in :file:`lib/thread.[ch]`. The primary
structure is ``struct thread_master``, hereafter referred to as a
``threadmaster``. A ``threadmaster`` is a global state object, or context,
that holds all the tasks currently pending execution as well as statistics on
tasks that have already executed. The event system is driven by adding tasks
to this data structure and then calling a function to retrieve the next task
to execute. At initialization, a daemon will typically create one
``threadmaster``, add a small set of initial tasks, and then run a loop to
fetch each task and execute it.

These tasks have various types corresponding to their general action. The
types are given by integer macros in :file:`thread.h` and are:

``THREAD_READ``
   Task which waits for a file descriptor to become ready for reading and then
   executes.

``THREAD_WRITE``
   Task which waits for a file descriptor to become ready for writing and then
   executes.

``THREAD_TIMER``
   Task which executes after a certain amount of time has passed since it was
   scheduled.

``THREAD_EVENT``
   Generic task that executes with high priority and carries an arbitrary
   integer indicating the event type to its handler. These are commonly used
   to implement the finite state machines typically found in routing
   protocols.

``THREAD_READY``
   Type used internally for tasks on the ready queue.

``THREAD_UNUSED``
   Type used internally for ``struct thread`` objects that aren't being used.
   The event system pools ``struct thread`` objects to avoid heap
   allocations; this is the type they have when they're in the pool.

``THREAD_EXECUTE``
   Just before a task is run, its type is changed to this. This is used to
   show ``X`` as the type in the output of :clicmd:`show thread cpu`.

The programmer never has to work with these types explicitly. Each type of
task is created and queued via special-purpose functions (actually macros,
but that's irrelevant for the time being) for the specific type. For example,
to add a ``THREAD_READ`` task, you would call

::

   thread_add_read(struct thread_master *master,
                   int (*handler)(struct thread *), void *arg, int fd,
                   struct thread **ref);

The ``struct thread`` is then created and added to the appropriate internal
datastructure within the ``threadmaster``.

The Event Loop
^^^^^^^^^^^^^^
To use the event system, after creating a ``threadmaster`` the program adds
an initial set of tasks. As these tasks execute, they add more tasks that
execute at some point in the future. This sequence of tasks drives the
lifecycle of the program. When no more tasks are available, the program dies.
Typically at startup the first task added is an I/O task for VTYSH as well as
any network sockets needed for peerings or IPC.

To retrieve the next task to run, the program calls ``thread_fetch()``.
``thread_fetch()`` internally computes which task to execute next based on
rudimentary priority logic. Events (type ``THREAD_EVENT``) execute with the
highest priority, followed by expired timers and finally I/O tasks (types
``THREAD_READ`` and ``THREAD_WRITE``). When scheduling a task, a function and
an arbitrary argument are provided. The task returned from ``thread_fetch()``
is then executed with ``thread_call()``.

The following diagram illustrates a simplified version of this
infrastructure.

.. todo: replace these with SVG
.. figure:: ../figures/threadmaster-single.png
   :align: center

   Lifecycle of a program using a single threadmaster.

The series of "task" boxes represents the current ready task queue. The
various queues for the other task types are not shown. The fetch-execute loop
is illustrated at the bottom.

Mapping the general names used in the figure to specific FRR functions:

- ``task`` is ``struct thread *``
- ``fetch`` is ``thread_fetch()``
- ``exec()`` is ``thread_call()``
- ``cancel()`` is ``thread_cancel()``
- ``schedule()`` is any of the various task-specific ``thread_add_*``
  functions

Adding tasks is done with various task-specific function-like macros. These
macros wrap underlying functions in :file:`thread.c` to provide additional
information added at compile time, such as the line number the task was
scheduled from, that can be accessed at runtime for debugging, logging and
informational purposes. Each task type has its own specific scheduling
function following the naming convention ``thread_add_<type>``; see
:file:`thread.h` for details.

There are some gotchas to keep in mind:

- I/O tasks are keyed off the file descriptor associated with the I/O
  operation. This means that for any given file descriptor, only one of each
  type of I/O task (``THREAD_READ`` and ``THREAD_WRITE``) can be scheduled.
  For example, scheduling two write tasks one after the other will overwrite
  the first task with the second, resulting in total loss of the first task
  and difficult-to-diagnose bugs.

- Timer tasks are only as accurate as the monotonic clock provided by the
  underlying operating system.

- Memory management of the arbitrary handler argument passed in the schedule
  call is the responsibility of the caller.


Kernel Thread Architecture
--------------------------
Efforts have begun to introduce kernel threads into FRR to improve
performance and stability.
Naturally a kernel thread architecture has long been seen as orthogonal to an
event-driven architecture, but the two have significant overlap in terms of
design choices. Since the event model is tightly integrated into FRR, careful
thought has been put into how pthreads are introduced, what role they fill,
and how they will interoperate with the event model.

Design Overview
^^^^^^^^^^^^^^^
Each kernel thread behaves as a lightweight process within FRR, sharing the
same process memory space. On the other hand, the event system is designed to
run in a single process and drive serial execution of a set of tasks. With
this consideration, a natural choice is to implement the event system within
each kernel thread. This allows us to leverage the event-driven execution
model with the currently existing task and context primitives. In this way
the familiar execution model of FRR gains the ability to execute tasks
simultaneously while preserving the existing model for concurrency.

The following figure illustrates the architecture with multiple pthreads,
each running their own ``threadmaster``-based event loop.

.. todo: replace these with SVG
.. figure:: ../figures/threadmaster-multiple.png
   :align: center

   Lifecycle of a program using multiple pthreads, each running their own
   ``threadmaster``

Each roundrect represents a single pthread running the same event loop
described under :ref:`event-architecture`. Note the arrow from the ``exec()``
box on the right to the ``schedule()`` box in the middle pthread. This
illustrates code running in one pthread scheduling a task onto another
pthread's ``threadmaster``. A global lock for each ``threadmaster`` is used
to synchronize these operations. The pthread names are examples.


.. This should be broken into its own document under :ref:`libfrr`
.. _kernel-thread-wrapper:

Kernel Thread Wrapper
^^^^^^^^^^^^^^^^^^^^^
The basis for the integration of pthreads and the event system is a
lightweight wrapper for both systems implemented in
:file:`lib/frr_pthread.[ch]`. The header provides a core datastructure,
``struct frr_pthread``, that encapsulates structures from both POSIX threads
and :file:`thread.[ch]`. In particular, this datastructure has a pointer to a
``threadmaster`` that runs within the pthread. It also has fields for a name
as well as start and stop functions that have signatures similar to the POSIX
arguments for ``pthread_create()``.

Calling ``frr_pthread_new()`` creates and registers a new ``frr_pthread``.
The returned structure has a pre-initialized ``threadmaster``, and its
``start`` and ``stop`` functions are initialized to defaults that will run a
basic event loop with the given ``threadmaster``. Calling
``frr_pthread_run()`` starts the thread with the ``start`` function. From
there, the model is the same as the regular event model. To schedule tasks on
a particular pthread, simply use the regular :file:`thread.c` functions as
usual and provide the ``threadmaster`` pointed to from the ``frr_pthread``.
As part of implementing the wrapper, the :file:`thread.c` functions were made
thread-safe. Consequently, it is safe to schedule events on a
``threadmaster`` belonging either to the calling pthread or to *any other
pthread*. This serves as the basis for inter-thread communication and boils
down to a slightly more complicated method of message passing, where the
messages are the regular task events used in the event-driven model. The only
difference is thread cancellation, which requires calling
``thread_cancel_async()`` instead of ``thread_cancel()`` to cancel a task
currently scheduled on a ``threadmaster`` belonging to a different pthread.
Asynchronous cancellation is necessary to avoid race conditions in the
specific case where one pthread wants to guarantee that a task on another
pthread is cancelled before proceeding.

In addition, the existing commands to show statistics and other information
for tasks within the event-driven model have been expanded to handle multiple
pthreads; running :clicmd:`show thread cpu` will display the usual event
breakdown, but it will do so for each pthread running in the program. For
example, :ref:`bgpd` runs a dedicated I/O pthread and shows the following
output for :clicmd:`show thread cpu`:

::

   frr# show thread cpu

   Thread statistics for bgpd:

   Showing statistics for pthread main
   ------------------------------------
                        CPU (user+system):        Real (wall-clock):
   Active  Runtime(ms)  Invoked  Avg uSec  Max uSecs  Avg uSec  Max uSecs  Type  Thread
        0     1389.000       10    138900     248000    135549     255349  T     subgroup_coalesce_timer
        0        0.000        1         0          0        18         18  T     bgp_startup_timer_expire
        0      850.000       18     47222     222000     47795     233814  T     work_queue_run
        0        0.000       10         0          0         6         14  T     update_subgroup_merge_check_thread_cb
        0        0.000        8         0          0       117        160  W     zclient_flush_data
        2        2.000        1      2000       2000       831        831  R     bgp_accept
        0        1.000        1      1000       1000      2832       2832  E     zclient_connect
        1    42082.000   240574       174      37000       178      72810  R     vtysh_read
        1      152.000     1885        80       2000        96       6292  R     zclient_read
        0   549346.000  2997298       183       7000       153      20242  E     bgp_event
        0     2120.000      300      7066      14000      6813      22046  T     (bgp_holdtime_timer)
        0        0.000        2         0          0        57         59  T     update_group_refresh_default_originate_route_map
        0       90.000        1     90000      90000     73729      73729  T     bgp_route_map_update_timer
        0     1417.000     9147       154      48000       132      61998  T     bgp_process_packet
      300    71807.000  2995200        23       3000        24      11066  T     (bgp_connect_timer)
        0     1894.000    12713       148      45000       112      33606  T     (bgp_generate_updgrp_packets)
        0        0.000        1         0          0       105        105  W     vtysh_write
        0       52.000      599        86       2000       138       6992  T     (bgp_start_timer)
        1        1.000        8       125       1000       164        593  R     vtysh_accept
        0       15.000      600        25       2000        15        153  T     (bgp_routeadv_timer)
        0       11.000      299        36       3000        53       3128  RW    bgp_connect_check


   Showing statistics for pthread BGP I/O thread
   ----------------------------------------------
                        CPU (user+system):        Real (wall-clock):
   Active  Runtime(ms)  Invoked  Avg uSec  Max uSecs  Avg uSec  Max uSecs  Type  Thread
        0     1611.000     9296       173      13000       188      13685  R     bgp_process_reads
        0     2995.000    11753       254      26000       182      29355  W     bgp_process_writes


   Showing statistics for pthread BGP Keepalives thread
   -----------------------------------------------------
                        CPU (user+system):        Real (wall-clock):
   Active  Runtime(ms)  Invoked  Avg uSec  Max uSecs  Avg uSec  Max uSecs  Type  Thread
   No data to display yet.

Attentive readers will notice that there is a third pthread, the Keepalives
thread. This thread is responsible for -- surprise -- generating keepalives
for peers. However, there are no statistics showing for that thread. Although
the pthread uses the ``frr_pthread`` wrapper, it opts not to use the embedded
``threadmaster`` facilities. Instead it replaces the ``start`` and ``stop``
functions with custom functions. This was done because the ``threadmaster``
facilities introduce a small but significant amount of overhead relative to
the pthread's task. In this case, since the pthread does not need the
event-driven model and does not need to receive tasks from other pthreads, it
is simpler and more efficient to implement it outside of the provided event
facilities. The point to take away from this example is that while the
facilities to make using pthreads within FRR easy are already implemented,
the wrapper is flexible and allows usage of other models while still
integrating with the rest of the FRR core infrastructure.
Starting and stopping this pthread works the same way as it does for any
other ``frr_pthread``; the only difference is that event statistics are not
collected for it, because there are no events.

Notes on Design and Documentation
---------------------------------
Because of the choice to embed the existing event system into each pthread
within FRR, at this time there is no integrated support for other models of
pthread use, such as divide and conquer. Similarly, there is no explicit
support for thread pooling or similar higher-level constructs. The currently
existing infrastructure is designed around the concept of long-running worker
threads responsible for specific jobs within each daemon. This is not to say
that divide and conquer, thread pooling, etc. could not be implemented in the
future. However, designs in this direction must be very careful to take into
account the existing codebase. Introducing kernel threads into programs that
have been written under the assumption of a single thread of execution must
be done very carefully to avoid insidious errors and to ensure the program
remains understandable and maintainable.

In keeping with these goals, future work on kernel threading should be
extensively documented here, and FRR developers should be very careful with
their design choices, as poorly chosen tight integrations can prove
catastrophic for future development efforts.