# Scaling synapse via workers

For small instances it is recommended to run Synapse in the default monolith
mode. For larger instances where performance is a concern it can be helpful to
split out functionality into multiple separate Python processes. These
processes are called 'workers', and are (eventually) intended to scale
horizontally independently.

Synapse's worker support is under active development and subject to change as
we attempt to rapidly scale ever larger Synapse instances. However, we are
documenting it here to help admins needing a highly scalable Synapse instance
similar to the one running `matrix.org`.

All processes continue to share the same database instance, and as such,
workers only work with PostgreSQL-based Synapse deployments. SQLite should only
be used for demo purposes and any admin considering workers should already be
running PostgreSQL.

See also the [Matrix.org blog post](https://matrix.org/blog/2020/11/03/how-we-fixed-synapses-scalability)
for a higher-level overview.

## Main process/worker communication

The processes communicate with each other via a Synapse-specific protocol called
'replication' (analogous to MySQL- or Postgres-style database replication) which
feeds streams of newly written data between processes so they can be kept in
sync with the database state.

When configured to do so, Synapse uses a
[Redis pub/sub channel](https://redis.io/topics/pubsub) to send the replication
stream between all configured Synapse processes. Additionally, processes may
make HTTP requests to each other, primarily for operations which need to wait
for a reply ─ such as sending an event.

Redis support was added in v1.13.0 and became the recommended method in
v1.18.0. It replaced the old direct TCP connections (which are deprecated as of
v1.18.0) to the main process: with Redis, rather than all the workers connecting
to the main process, all the workers and the main process connect to Redis,
which relays replication commands between processes. This can give a significant
CPU saving on the main process and will be a prerequisite for upcoming
performance improvements.

If Redis support is enabled, Synapse will use it as a shared cache, as well as a
pub/sub mechanism.

See the [Architectural diagram](#architectural-diagram) section at the end for
a visualisation of what this looks like.


## Setting up workers

A Redis server is required to manage the communication between the processes.
The Redis server should be installed following the normal procedure for your
distribution (e.g. `apt install redis-server` on Debian). It is safe to use an
existing Redis deployment if you have one.

Once installed, check that Redis is running and accessible from the host running
Synapse, for example by executing `echo PING | nc -q1 localhost 6379` and seeing
a response of `+PONG`.

The appropriate dependencies must also be installed for Synapse. If using a
virtualenv, these can be installed with:

```sh
pip install "matrix-synapse[redis]"
```

Note that these dependencies are included when synapse is installed with `pip
install matrix-synapse[all]`. They are also included in the debian packages from
`matrix.org` and in the docker images at
https://hub.docker.com/r/matrixdotorg/synapse/.

To make effective use of the workers, you will need to configure an HTTP
reverse-proxy such as nginx or haproxy, which will direct incoming requests to
the correct worker, or to the main synapse instance. See
[the reverse proxy documentation](reverse_proxy.md) for information on setting
up a reverse proxy.

When using workers, each worker process has its own configuration file which
contains settings specific to that worker, such as the HTTP listener that it
provides (if any), logging configuration, etc.

Normally, the worker processes are configured to read from a shared
configuration file as well as the worker-specific configuration files. This
makes it easier to keep common configuration settings synchronised across all
the processes.

The main process is somewhat special in this respect: it does not normally
need its own configuration file and can take all of its configuration from the
shared configuration file.


### Shared configuration

Normally, only a couple of changes are needed to make an existing configuration
file suitable for use with workers. First, you need to enable an "HTTP replication
listener" for the main process; and secondly, you need to enable redis-based
replication. Optionally, a shared secret can be used to authenticate HTTP
traffic between workers. For example:

```yaml
# extend the existing `listeners` section. This defines the ports that the
# main process will listen on.
listeners:
  # The HTTP replication port
  - port: 9093
    bind_address: '127.0.0.1'
    type: http
    resources:
     - names: [replication]

# Add a random shared secret to authenticate traffic.
worker_replication_secret: ""

redis:
    enabled: true
```

See the sample config for the full documentation of each option.

Under **no circumstances** should the replication listener be exposed to the
public internet; it has no authentication and is unencrypted.


### Worker configuration

In the config file for each worker, you must specify the type of worker
application (`worker_app`), and you should specify a unique name for the worker
(`worker_name`). The currently available worker applications are listed below.
You must also specify the HTTP replication endpoint that the worker should talk
to on the main synapse process: `worker_replication_host` should specify the
host of the main synapse and `worker_replication_http_port` should point to the
HTTP replication port. If the worker will handle HTTP requests then the
`worker_listeners` option should be set with an `http` listener, in the same way
as the `listeners` option in the shared config.

For example:

```yaml
worker_app: synapse.app.generic_worker
worker_name: worker1

# The replication listener on the main synapse process.
worker_replication_host: 127.0.0.1
worker_replication_http_port: 9093

worker_listeners:
 - type: http
   port: 8083
   resources:
     - names:
       - client
       - federation

worker_log_config: /home/matrix/synapse/config/worker1_log_config.yaml
```

...is a full configuration for a generic worker instance, which will expose a
plain HTTP endpoint on port 8083, separately serving various endpoints, e.g.
`/sync`, which are listed below.

Obviously you should configure your reverse-proxy to route the relevant
endpoints to the worker (`localhost:8083` in the above example).


### Running Synapse with workers

Finally, you need to start your worker processes. This can be done with either
`synctl` or your distribution's preferred service manager such as `systemd`. We
recommend the use of `systemd` where available: for information on setting up
`systemd` to start synapse workers, see
[Systemd with Workers](systemd-with-workers). To use `synctl`, see
[Using synctl with Workers](synctl_workers.md).


## Available worker applications

### `synapse.app.generic_worker`

This worker can handle API requests matching the following regular
expressions:

    # Sync requests
    ^/_matrix/client/(v2_alpha|r0|v3)/sync$
    ^/_matrix/client/(api/v1|v2_alpha|r0|v3)/events$
    ^/_matrix/client/(api/v1|r0|v3)/initialSync$
    ^/_matrix/client/(api/v1|r0|v3)/rooms/[^/]+/initialSync$

    # Federation requests
    ^/_matrix/federation/v1/event/
    ^/_matrix/federation/v1/state/
    ^/_matrix/federation/v1/state_ids/
    ^/_matrix/federation/v1/backfill/
    ^/_matrix/federation/v1/get_missing_events/
    ^/_matrix/federation/v1/publicRooms
    ^/_matrix/federation/v1/query/
    ^/_matrix/federation/v1/make_join/
    ^/_matrix/federation/v1/make_leave/
    ^/_matrix/federation/v1/send_join/
    ^/_matrix/federation/v2/send_join/
    ^/_matrix/federation/v1/send_leave/
    ^/_matrix/federation/v2/send_leave/
    ^/_matrix/federation/v1/invite/
    ^/_matrix/federation/v2/invite/
    ^/_matrix/federation/v1/query_auth/
    ^/_matrix/federation/v1/event_auth/
    ^/_matrix/federation/v1/exchange_third_party_invite/
    ^/_matrix/federation/v1/user/devices/
    ^/_matrix/federation/v1/get_groups_publicised$
    ^/_matrix/key/v2/query
    ^/_matrix/federation/unstable/org.matrix.msc2946/spaces/
    ^/_matrix/federation/(v1|unstable/org.matrix.msc2946)/hierarchy/

    # Inbound federation transaction request
    ^/_matrix/federation/v1/send/

    # Client API requests
    ^/_matrix/client/(api/v1|r0|v3|unstable)/createRoom$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/publicRooms$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/joined_members$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/context/.*$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/members$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state$
    ^/_matrix/client/unstable/org.matrix.msc2946/rooms/.*/spaces$
    ^/_matrix/client/(v1|unstable/org.matrix.msc2946)/rooms/.*/hierarchy$
    ^/_matrix/client/unstable/im.nheko.summary/rooms/.*/summary$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/account/3pid$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/devices$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/keys/query$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/keys/changes$
    ^/_matrix/client/versions$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/voip/turnServer$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/joined_groups$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/publicised_groups$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/publicised_groups/
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/event/
    ^/_matrix/client/(api/v1|r0|v3|unstable)/joined_rooms$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/search$

    # Registration/login requests
    ^/_matrix/client/(api/v1|r0|v3|unstable)/login$
    ^/_matrix/client/(r0|v3|unstable)/register$
    ^/_matrix/client/unstable/org.matrix.msc3231/register/org.matrix.msc3231.login.registration_token/validity$

    # Event sending requests
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/redact
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/send
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state/
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/(join|invite|leave|ban|unban|kick)$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/join/
    ^/_matrix/client/(api/v1|r0|v3|unstable)/profile/


Additionally, the following REST endpoints can be handled for GET requests:

    ^/_matrix/federation/v1/groups/

Pagination requests can also be handled, but all requests for a given
room must be routed to the same instance. Additionally, care must be taken to
ensure that the purge history admin API is not used while pagination requests
for the room are in flight:

    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/messages$

Additionally, the following endpoints should be included if Synapse is configured
to use SSO (you only need to include the ones for whichever SSO provider you're
using):

    # for all SSO providers
    ^/_matrix/client/(api/v1|r0|v3|unstable)/login/sso/redirect
    ^/_synapse/client/pick_idp$
    ^/_synapse/client/pick_username
    ^/_synapse/client/new_user_consent$
    ^/_synapse/client/sso_register$

    # OpenID Connect requests.
    ^/_synapse/client/oidc/callback$

    # SAML requests.
    ^/_synapse/client/saml2/authn_response$

    # CAS requests.
    ^/_matrix/client/(api/v1|r0|v3|unstable)/login/cas/ticket$

Ensure that all SSO logins go to a single process. The SSO endpoints do not
work correctly when handled by multiple workers; for details, see
[#7530](https://github.com/matrix-org/synapse/issues/7530) and
[#9427](https://github.com/matrix-org/synapse/issues/9427).

Note that an HTTP listener with `client` and `federation` resources must be
configured in the `worker_listeners` option in the worker config.

#### Load balancing

It is possible to run multiple instances of this worker app, with incoming requests
being load-balanced between them by the reverse-proxy. However, different endpoints
have different characteristics and so admins may wish to run multiple groups of
workers handling different endpoints so that load balancing can be done in
different ways.

For `/sync` and `/initialSync` requests it will be more efficient if all
requests from a particular user are routed to a single instance. Extracting a
user ID from the access token or `Authorization` header is currently left as an
exercise for the reader. Admins may additionally wish to separate out `/sync`
requests that have a `since` query parameter from those that don't (and
`/initialSync`): requests without a `since` parameter are "initial syncs",
which happen when a user logs in on a new device and can be *very* resource
intensive, so isolating these requests will stop them from interfering with
other users' ongoing syncs.

Federation and client requests can be balanced via simple round robin.

The inbound federation transaction request `^/_matrix/federation/v1/send/`
should be balanced by source IP so that transactions from the same remote server
go to the same process.

Registration/login requests can be handled separately purely to help ensure that
unexpected load doesn't affect new logins and sign ups.

Finally, event sending requests can be balanced by the room ID in the URI (or
the full URI, or even just round robin); the room ID is the path component after
`/rooms/`. If there is a large bridge connected that is sending or may send lots
of events, then a dedicated set of workers can be provisioned to limit the
effects of bursts of events from that bridge on events sent by normal users.

#### Stream writers

Additionally, there is *experimental* support for moving writing of specific
streams (such as events) off of the main process to a particular worker. (This
is only supported with Redis-based replication.)

Currently supported streams are `events` and `typing`.

To enable this, the worker must have an HTTP replication listener configured,
have a `worker_name` and be listed in the `instance_map` config. For example, to
move event persistence off to a dedicated worker, the shared configuration would
include:

```yaml
instance_map:
    event_persister1:
        host: localhost
        port: 8034

stream_writers:
    events: event_persister1
```
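
The same pattern applies to the `typing` stream. As a sketch, the shared
configuration below moves typing notifications to a dedicated worker; the
worker name `typing_writer` and port `8035` are illustrative, and the named
worker must have an HTTP replication listener on that port:

```yaml
instance_map:
    typing_writer:
        host: localhost
        port: 8035

stream_writers:
    typing: typing_writer
```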

The `events` stream also experimentally supports having multiple writers, where
work is sharded between them by room ID. Each writer must likewise appear in the
`instance_map`, and you *must* restart all worker instances when adding or
removing event persisters. An example `stream_writers` configuration with
multiple writers:

```yaml
stream_writers:
    events:
        - event_persister1
        - event_persister2
```
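
For completeness, a sketch of the `instance_map` entries matching the two
writers above (the hosts and ports are illustrative):

```yaml
instance_map:
    event_persister1:
        host: localhost
        port: 8034
    event_persister2:
        host: localhost
        port: 8035
```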

#### Background tasks

There is also *experimental* support for moving background tasks to a separate
worker. Background tasks are run periodically or started via replication. Exactly
which tasks are configured to run depends on your Synapse configuration (e.g. if
stats is enabled).

To enable this, the worker must have a `worker_name`, and the shared
configuration must name it as the background-tasks worker. For example, to move
background tasks to a dedicated worker, the shared configuration would include:

```yaml
run_background_tasks_on: background_worker
```
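
A minimal configuration file for such a worker might look like the following
sketch. It assumes the generic worker app; the `worker_name` must match the
`run_background_tasks_on` value above, and the log config path is illustrative:

```yaml
worker_app: synapse.app.generic_worker
worker_name: background_worker

# The replication listener on the main synapse process.
worker_replication_host: 127.0.0.1
worker_replication_http_port: 9093

# No `worker_listeners` are needed: this worker serves no REST endpoints.
worker_log_config: /home/matrix/synapse/config/background_worker_log_config.yaml
```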

You might also wish to investigate the `update_user_directory` and
`media_instance_running_background_jobs` settings, which are described below.

### `synapse.app.pusher`

Handles sending push notifications to sygnal and email. Doesn't handle any
REST endpoints itself, but you should set `start_pushers: False` in the
shared configuration file to stop the main synapse sending push notifications.

To run multiple instances at once the `pusher_instances` option should list all
pusher instances by their worker name, e.g.:

```yaml
pusher_instances:
    - pusher_worker1
    - pusher_worker2
```
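
Each pusher instance then gets its own configuration file. A sketch for
`pusher_worker1`, assuming the replication listener from the shared
configuration above (the log config path is illustrative):

```yaml
worker_app: synapse.app.pusher
worker_name: pusher_worker1

# The replication listener on the main synapse process.
worker_replication_host: 127.0.0.1
worker_replication_http_port: 9093

worker_log_config: /home/matrix/synapse/config/pusher_worker1_log_config.yaml
```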


### `synapse.app.appservice`

Handles sending output traffic to Application Services. Doesn't handle any
REST endpoints itself, but you should set `notify_appservices: False` in the
shared configuration file to stop the main synapse sending appservice notifications.

Note this worker cannot be load-balanced: only one instance should be active.


### `synapse.app.federation_sender`

Handles sending federation traffic to other servers. Doesn't handle any
REST endpoints itself, but you should set `send_federation: False` in the
shared configuration file to stop the main synapse sending this traffic.

If running multiple federation senders then you must list each
instance in the `federation_sender_instances` option by its `worker_name`.
All instances must be stopped and started when adding or removing instances.
For example:

```yaml
federation_sender_instances:
    - federation_sender1
    - federation_sender2
```

### `synapse.app.media_repository`

Handles the media repository. It can handle all endpoints starting with:

    /_matrix/media/

... and the following regular expressions matching media-specific administration APIs:

    ^/_synapse/admin/v1/purge_media_cache$
    ^/_synapse/admin/v1/room/.*/media.*$
    ^/_synapse/admin/v1/user/.*/media.*$
    ^/_synapse/admin/v1/media/.*$
    ^/_synapse/admin/v1/quarantine_media/.*$
    ^/_synapse/admin/v1/users/.*/media$

You should also set `enable_media_repo: False` in the shared configuration
file to stop the main synapse running background jobs related to managing the
media repository. Note that doing so will prevent the main process from being
able to handle the above endpoints.

In the `media_repository` worker configuration file, configure the http listener to
expose the `media` resource. For example:

```yaml
worker_listeners:
 - type: http
   port: 8085
   resources:
     - names:
       - media
```
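
Putting this together, a sketch of a full configuration file for this worker,
folding in the listener above (the worker name and log config path are
illustrative):

```yaml
worker_app: synapse.app.media_repository
worker_name: media_repository1

# The replication listener on the main synapse process.
worker_replication_host: 127.0.0.1
worker_replication_http_port: 9093

worker_listeners:
 - type: http
   port: 8085
   resources:
     - names:
       - media

worker_log_config: /home/matrix/synapse/config/media_repository1_log_config.yaml
```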

Note that if running multiple media repositories they must be on the same server
and you must configure a single instance to run the background tasks, e.g.:

```yaml
media_instance_running_background_jobs: "media-repository-1"
```

Note that if a reverse proxy is used, then `/_matrix/media/` must be routed for
both inbound client and federation requests (if they are handled separately).

### `synapse.app.user_dir`

Handles searches in the user directory. It can handle REST endpoints matching
the following regular expressions:

    ^/_matrix/client/(api/v1|r0|v3|unstable)/user_directory/search$

When using this worker you must also set `update_user_directory: False` in the
shared configuration file to stop the main synapse running background
jobs related to updating the user directory.
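
Since this worker serves a client endpoint, it needs an `http` listener with
the `client` resource. A sketch of a matching worker configuration file (the
port and log config path are illustrative):

```yaml
worker_app: synapse.app.user_dir
worker_name: user_dir1

# The replication listener on the main synapse process.
worker_replication_host: 127.0.0.1
worker_replication_http_port: 9093

worker_listeners:
 - type: http
   port: 8086
   resources:
     - names:
       - client

worker_log_config: /home/matrix/synapse/config/user_dir1_log_config.yaml
```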

### `synapse.app.frontend_proxy`

Proxies some frequently-requested client endpoints to add caching and remove
load from the main synapse. It can handle REST endpoints matching the following
regular expressions:

    ^/_matrix/client/(api/v1|r0|v3|unstable)/keys/upload

If `use_presence` is False in the homeserver config, it can also handle REST
endpoints matching the following regular expressions:

    ^/_matrix/client/(api/v1|r0|v3|unstable)/presence/[^/]+/status

This "stub" presence handler will pass through `GET` requests but make the
`PUT` effectively a no-op.

It will proxy any requests it cannot handle to the main synapse instance. It
must therefore be configured with the location of the main instance, via
the `worker_main_http_uri` setting in the `frontend_proxy` worker configuration
file. For example:

```yaml
worker_main_http_uri: http://127.0.0.1:8008
```
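
Putting this together, a sketch of a full configuration file for this worker
(the port and log config path are illustrative):

```yaml
worker_app: synapse.app.frontend_proxy
worker_name: frontend_proxy1

# The replication listener on the main synapse process.
worker_replication_host: 127.0.0.1
worker_replication_http_port: 9093

# The main process, to which unhandled requests are proxied.
worker_main_http_uri: http://127.0.0.1:8008

worker_listeners:
 - type: http
   port: 8087
   resources:
     - names:
       - client

worker_log_config: /home/matrix/synapse/config/frontend_proxy1_log_config.yaml
```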

### Historical apps

*Note:* Historically there used to be more apps, however they have been
amalgamated into a single `synapse.app.generic_worker` app. The remaining apps
are ones that do specific processing unrelated to requests, e.g. the `pusher`
that handles sending out push notifications for new events. The intention is for
all these to be folded into the `generic_worker` app and to use config to define
which processes handle the various processing such as push notifications.


## Migration from old config

There are two main independent changes that have been made: introducing Redis
support and merging apps into `synapse.app.generic_worker`. Both these changes
are backwards compatible and so no changes to the config are required; however,
server admins are encouraged to plan to migrate to Redis as the old style direct
TCP replication config is deprecated.

To migrate to Redis, add the `redis` config as above, and optionally remove the
TCP `replication` listener from the master and `worker_replication_port` from
the worker config.

To migrate apps to use `synapse.app.generic_worker`, simply update the
`worker_app` option in the worker configs, and wherever the workers are started
(e.g. in systemd service files; this is not required for synctl).


## Architectural diagram

The following shows an example setup using Redis and a reverse proxy:

```
                     Clients & Federation
                              |
                              v
                        +-----------+
                        |           |
                        |  Reverse  |
                        |  Proxy    |
                        |           |
                        +-----------+
                            | | |
                            | | | HTTP requests
        +-------------------+ | +-----------+
        |                 +---+             |
        |                 |                 |
        v                 v                 v
+--------------+  +--------------+  +--------------+  +--------------+
|   Main       |  |   Generic    |  |   Generic    |  |  Event       |
|   Process    |  |   Worker 1   |  |   Worker 2   |  |  Persister   |
+--------------+  +--------------+  +--------------+  +--------------+
      ^    ^          |   ^   |         |   ^   |          ^    ^
      |    |          |   |   |         |   |   |          |    |
      |    |          |   |   |  HTTP   |   |   |          |    |
      |    +----------+<--|---|---------+   |   |          |    |
      |                   |   +-------------|-->+----------+    |
      |                   |                 |                   |
      |                   |                 |                   |
      v                   v                 v                   v
====================================================================
                                                         Redis pub/sub channel
```