# Scaling synapse via workers

For small instances it is recommended to run Synapse in the default monolith mode.
For larger instances where performance is a concern it can be helpful to split
out functionality into multiple separate python processes. These processes are
called 'workers', and are (eventually) intended to scale horizontally
independently.

Synapse's worker support is under active development and subject to change as
we attempt to rapidly scale ever larger Synapse instances. However, we are
documenting it here to help admins needing a highly scalable Synapse instance
similar to the one running `matrix.org`.

All processes continue to share the same database instance, and as such,
workers only work with PostgreSQL-based Synapse deployments. SQLite should only
be used for demo purposes and any admin considering workers should already be
running PostgreSQL.

See also the [Matrix.org blog post](https://matrix.org/blog/2020/11/03/how-we-fixed-synapses-scalability)
for a higher level overview.

## Main process/worker communication

The processes communicate with each other via a Synapse-specific protocol called
'replication' (analogous to MySQL- or Postgres-style database replication) which
feeds streams of newly written data between processes so they can be kept in
sync with the database state.

When configured to do so, Synapse uses a
[Redis pub/sub channel](https://redis.io/topics/pubsub) to send the replication
stream between all configured Synapse processes. Additionally, processes may
make HTTP requests to each other, primarily for operations which need to wait
for a reply ─ such as sending an event.

Redis support was added in v1.13.0 with it becoming the recommended method in
v1.18.0. It replaced the old direct TCP connections (which are deprecated as of
v1.18.0) to the main process.
With Redis, rather than all the workers connecting
to the main process, all the workers and the main process connect to Redis,
which relays replication commands between processes. This can give a significant
CPU saving on the main process and will be a prerequisite for upcoming
performance improvements.

If Redis support is enabled, Synapse will use it as a shared cache, as well as a
pub/sub mechanism.

See the [Architectural diagram](#architectural-diagram) section at the end for
a visualisation of what this looks like.


## Setting up workers

A Redis server is required to manage the communication between the processes.
The Redis server should be installed following the normal procedure for your
distribution (e.g. `apt install redis-server` on Debian). It is safe to use an
existing Redis deployment if you have one.

Once installed, check that Redis is running and accessible from the host running
Synapse, for example by executing `echo PING | nc -q1 localhost 6379` and seeing
a response of `+PONG`.

The appropriate dependencies must also be installed for Synapse. If using a
virtualenv, these can be installed with:

```sh
pip install "matrix-synapse[redis]"
```

Note that these dependencies are included when synapse is installed with `pip
install matrix-synapse[all]`. They are also included in the debian packages from
`matrix.org` and in the docker images at
https://hub.docker.com/r/matrixdotorg/synapse/.

To make effective use of the workers, you will need to configure an HTTP
reverse-proxy such as nginx or haproxy, which will direct incoming requests to
the correct worker, or to the main synapse instance. See
[the reverse proxy documentation](reverse_proxy.md) for information on setting up a reverse
proxy.
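Conceptually, the reverse proxy's job is first-match routing of request paths against the endpoint regexes listed later in this document, falling back to the main process. A minimal Python sketch of that decision, where the worker address and the two example patterns are purely illustrative:

```python
import re

# Illustrative routing table: two of the endpoint patterns listed later in
# this document, each mapped to a hypothetical generic worker address.
ROUTES = [
    (re.compile(r"^/_matrix/client/(v2_alpha|r0|v3)/sync$"), "http://localhost:8083"),
    (re.compile(r"^/_matrix/federation/v1/send/"), "http://localhost:8083"),
]
MAIN_PROCESS = "http://localhost:8008"  # anything unmatched goes to the main process


def route(path: str) -> str:
    """Return the upstream that should handle the given request path."""
    for pattern, upstream in ROUTES:
        if pattern.match(path):
            return upstream
    return MAIN_PROCESS
```

In practice the same table is expressed as nginx `location` blocks or haproxy ACLs, as described in the reverse proxy documentation.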
When using workers, each worker process has its own configuration file which
contains settings specific to that worker, such as the HTTP listener that it
provides (if any), logging configuration, etc.

Normally, the worker processes are configured to read from a shared
configuration file as well as the worker-specific configuration files. This
makes it easier to keep common configuration settings synchronised across all
the processes.

The main process is somewhat special in this respect: it does not normally
need its own configuration file and can take all of its configuration from the
shared configuration file.


### Shared configuration

Normally, only a couple of changes are needed to make an existing configuration
file suitable for use with workers. First, you need to enable an "HTTP replication
listener" for the main process; and secondly, you need to enable redis-based
replication. Optionally, a shared secret can be used to authenticate HTTP
traffic between workers. For example:


```yaml
# extend the existing `listeners` section. This defines the ports that the
# main process will listen on.
listeners:
  # The HTTP replication port
  - port: 9093
    bind_address: '127.0.0.1'
    type: http
    resources:
      - names: [replication]

# Add a random shared secret to authenticate traffic.
worker_replication_secret: ""

redis:
  enabled: true
```

See the sample config for the full documentation of each option.

Under **no circumstances** should the replication listener be exposed to the
public internet; it has no authentication and is unencrypted.


### Worker configuration

In the config file for each worker, you must specify the type of worker
application (`worker_app`), and you should specify a unique name for the worker
(`worker_name`). The currently available worker applications are listed below.
You must also specify the HTTP replication endpoint that it should talk to on
the main synapse process. `worker_replication_host` should specify the host of
the main synapse and `worker_replication_http_port` should point to the HTTP
replication port. If the worker will handle HTTP requests then the
`worker_listeners` option should be set with an `http` listener, in the same way
as the `listeners` option in the shared config.

For example:

```yaml
worker_app: synapse.app.generic_worker
worker_name: worker1

# The replication listener on the main synapse process.
worker_replication_host: 127.0.0.1
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8083
    resources:
      - names:
        - client
        - federation

worker_log_config: /home/matrix/synapse/config/worker1_log_config.yaml
```

...is a full configuration for a generic worker instance, which will expose a
plain HTTP endpoint on port 8083 separately serving various endpoints, e.g.
`/sync`, which are listed below.

Obviously you should configure your reverse-proxy to route the relevant
endpoints to the worker (`localhost:8083` in the above example).


### Running Synapse with workers

Finally, you need to start your worker processes. This can be done with either
`synctl` or your distribution's preferred service manager such as `systemd`. We
recommend the use of `systemd` where available: for information on setting up
`systemd` to start synapse workers, see
[Systemd with Workers](systemd-with-workers). To use `synctl`, see
[Using synctl with Workers](synctl_workers.md).
## Available worker applications

### `synapse.app.generic_worker`

This worker can handle API requests matching the following regular
expressions:

    # Sync requests
    ^/_matrix/client/(v2_alpha|r0|v3)/sync$
    ^/_matrix/client/(api/v1|v2_alpha|r0|v3)/events$
    ^/_matrix/client/(api/v1|r0|v3)/initialSync$
    ^/_matrix/client/(api/v1|r0|v3)/rooms/[^/]+/initialSync$

    # Federation requests
    ^/_matrix/federation/v1/event/
    ^/_matrix/federation/v1/state/
    ^/_matrix/federation/v1/state_ids/
    ^/_matrix/federation/v1/backfill/
    ^/_matrix/federation/v1/get_missing_events/
    ^/_matrix/federation/v1/publicRooms
    ^/_matrix/federation/v1/query/
    ^/_matrix/federation/v1/make_join/
    ^/_matrix/federation/v1/make_leave/
    ^/_matrix/federation/v1/send_join/
    ^/_matrix/federation/v2/send_join/
    ^/_matrix/federation/v1/send_leave/
    ^/_matrix/federation/v2/send_leave/
    ^/_matrix/federation/v1/invite/
    ^/_matrix/federation/v2/invite/
    ^/_matrix/federation/v1/query_auth/
    ^/_matrix/federation/v1/event_auth/
    ^/_matrix/federation/v1/exchange_third_party_invite/
    ^/_matrix/federation/v1/user/devices/
    ^/_matrix/federation/v1/get_groups_publicised$
    ^/_matrix/key/v2/query
    ^/_matrix/federation/unstable/org.matrix.msc2946/spaces/
    ^/_matrix/federation/(v1|unstable/org.matrix.msc2946)/hierarchy/

    # Inbound federation transaction request
    ^/_matrix/federation/v1/send/

    # Client API requests
    ^/_matrix/client/(api/v1|r0|v3|unstable)/createRoom$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/publicRooms$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/joined_members$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/context/.*$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/members$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state$
    ^/_matrix/client/unstable/org.matrix.msc2946/rooms/.*/spaces$
    ^/_matrix/client/(v1|unstable/org.matrix.msc2946)/rooms/.*/hierarchy$
    ^/_matrix/client/unstable/im.nheko.summary/rooms/.*/summary$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/account/3pid$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/devices$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/keys/query$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/keys/changes$
    ^/_matrix/client/versions$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/voip/turnServer$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/joined_groups$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/publicised_groups$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/publicised_groups/
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/event/
    ^/_matrix/client/(api/v1|r0|v3|unstable)/joined_rooms$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/search$

    # Registration/login requests
    ^/_matrix/client/(api/v1|r0|v3|unstable)/login$
    ^/_matrix/client/(r0|v3|unstable)/register$
    ^/_matrix/client/unstable/org.matrix.msc3231/register/org.matrix.msc3231.login.registration_token/validity$

    # Event sending requests
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/redact
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/send
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state/
    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/(join|invite|leave|ban|unban|kick)$
    ^/_matrix/client/(api/v1|r0|v3|unstable)/join/
    ^/_matrix/client/(api/v1|r0|v3|unstable)/profile/


Additionally, the following REST endpoints can be handled for GET requests:

    ^/_matrix/federation/v1/groups/

Pagination requests can also be handled, but all requests for a given
room must be routed to the same instance.
Additionally, care must be taken to
ensure that the purge history admin API is not used while pagination requests
for the room are in flight:

    ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/messages$

Additionally, the following endpoints should be included if Synapse is configured
to use SSO (you only need to include the ones for whichever SSO provider you're
using):

    # for all SSO providers
    ^/_matrix/client/(api/v1|r0|v3|unstable)/login/sso/redirect
    ^/_synapse/client/pick_idp$
    ^/_synapse/client/pick_username
    ^/_synapse/client/new_user_consent$
    ^/_synapse/client/sso_register$

    # OpenID Connect requests.
    ^/_synapse/client/oidc/callback$

    # SAML requests.
    ^/_synapse/client/saml2/authn_response$

    # CAS requests.
    ^/_matrix/client/(api/v1|r0|v3|unstable)/login/cas/ticket$

Ensure that all SSO logins go to a single process.
For multiple workers not handling the SSO endpoints properly, see
[#7530](https://github.com/matrix-org/synapse/issues/7530) and
[#9427](https://github.com/matrix-org/synapse/issues/9427).

Note that an HTTP listener with `client` and `federation` resources must be
configured in the `worker_listeners` option in the worker config.

#### Load balancing

It is possible to run multiple instances of this worker app, with incoming requests
being load-balanced between them by the reverse-proxy. However, different endpoints
have different characteristics and so admins
may wish to run multiple groups of workers handling different endpoints so that
load balancing can be done in different ways.

For `/sync` and `/initialSync` requests it will be more efficient if all
requests from a particular user are routed to a single instance. Extracting a
user ID from the access token or `Authorization` header is currently left as an
exercise for the reader.
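One workable approximation, sketched below in Python, is to hash the opaque bearer token itself: this pins each device's `/sync` stream to a single worker without needing to recover the user ID. The worker names here are illustrative:

```python
import hashlib

SYNC_WORKERS = ["sync1", "sync2", "sync3"]  # hypothetical pool of sync workers


def pick_sync_worker(authorization_header: str) -> str:
    """Deterministically map a bearer token to one sync worker.

    Matrix access tokens are opaque strings, so without a server-side lookup
    we cannot recover the user ID; hashing the token at least keeps each
    device's /sync requests on the same worker.
    """
    token = authorization_header.split("Bearer ", 1)[-1].strip()
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return SYNC_WORKERS[int.from_bytes(digest[:8], "big") % len(SYNC_WORKERS)]
```

Some load balancers (e.g. haproxy's `balance hdr(Authorization)`) can perform an equivalent header hash without custom code.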
Admins may additionally wish to separate out `/sync`
requests that have a `since` query parameter from those that don't (and
`/initialSync`): requests without `since` are "initial syncs", which happen
when a user logs in on a new device and can be *very* resource intensive, so
isolating these requests will stop them from interfering with other users'
ongoing syncs.

Federation and client requests can be balanced via simple round robin.

The inbound federation transaction request `^/_matrix/federation/v1/send/`
should be balanced by source IP so that transactions from the same remote server
go to the same process.

Registration/login requests can be handled separately purely to help ensure that
unexpected load doesn't affect new logins and sign-ups.

Finally, event sending requests can be balanced by the room ID in the URI (or
the full URI, or even just round robin); the room ID is the path component after
`/rooms/`. If there is a large bridge connected that is sending or may send lots
of events, then a dedicated set of workers can be provisioned to limit the
effects of bursts of events from that bridge on events sent by normal users.

#### Stream writers

Additionally, there is *experimental* support for moving writing of specific
streams (such as events) off the main process to a particular worker. (This
is only supported with Redis-based replication.)

Currently supported streams are `events` and `typing`.

To enable this, the worker must have an HTTP replication listener configured,
have a `worker_name` and be listed in the `instance_map` config.
For example, to
move event persistence off to a dedicated worker, the shared configuration would
include:

```yaml
instance_map:
  event_persister1:
    host: localhost
    port: 8034

stream_writers:
  events: event_persister1
```

The `events` stream also experimentally supports having multiple writers, where
work is sharded between them by room ID. Note that you *must* restart all worker
instances when adding or removing event persisters. An example `stream_writers`
configuration with multiple writers:

```yaml
stream_writers:
  events:
    - event_persister1
    - event_persister2
```

#### Background tasks

There is also *experimental* support for moving background tasks to a separate
worker. Background tasks are run periodically or started via replication. Exactly
which tasks are configured to run depends on your Synapse configuration (e.g. if
stats is enabled).

To enable this, the worker must have a `worker_name` and can be configured to run
background tasks. For example, to move background tasks to a dedicated worker,
the shared configuration would include:

```yaml
run_background_tasks_on: background_worker
```

You might also wish to investigate the `update_user_directory` and
`media_instance_running_background_jobs` settings.

### `synapse.app.pusher`

Handles sending push notifications to sygnal and email. Doesn't handle any
REST endpoints itself, but you should set `start_pushers: False` in the
shared configuration file to stop the main synapse sending push notifications.

To run multiple instances at once the `pusher_instances` option should list all
pusher instances by their worker name, e.g.:

```yaml
pusher_instances:
  - pusher_worker1
  - pusher_worker2
```


### `synapse.app.appservice`

Handles sending output traffic to Application Services.
Doesn't handle any
REST endpoints itself, but you should set `notify_appservices: False` in the
shared configuration file to stop the main synapse sending appservice notifications.

Note this worker cannot be load-balanced: only one instance should be active.


### `synapse.app.federation_sender`

Handles sending federation traffic to other servers. Doesn't handle any
REST endpoints itself, but you should set `send_federation: False` in the
shared configuration file to stop the main synapse sending this traffic.

If running multiple federation senders then you must list each
instance in the `federation_sender_instances` option by their `worker_name`.
All instances must be stopped and started when adding or removing instances.
For example:

```yaml
federation_sender_instances:
  - federation_sender1
  - federation_sender2
```

### `synapse.app.media_repository`

Handles the media repository. It can handle all endpoints starting with:

    /_matrix/media/

... and the following regular expressions matching media-specific administration APIs:

    ^/_synapse/admin/v1/purge_media_cache$
    ^/_synapse/admin/v1/room/.*/media.*$
    ^/_synapse/admin/v1/user/.*/media.*$
    ^/_synapse/admin/v1/media/.*$
    ^/_synapse/admin/v1/quarantine_media/.*$
    ^/_synapse/admin/v1/users/.*/media$

You should also set `enable_media_repo: False` in the shared configuration
file to stop the main synapse running background jobs related to managing the
media repository. Note that doing so will prevent the main process from being
able to handle the above endpoints.

In the `media_repository` worker configuration file, configure the http listener to
expose the `media` resource.
For example:

```yaml
worker_listeners:
  - type: http
    port: 8085
    resources:
      - names:
        - media
```

Note that if running multiple media repositories they must be on the same server
and you must configure a single instance to run the background tasks, e.g.:

```yaml
media_instance_running_background_jobs: "media-repository-1"
```

Note that if a reverse proxy is used, then `/_matrix/media/` must be routed for
both inbound client and federation requests (if they are handled separately).

### `synapse.app.user_dir`

Handles searches in the user directory. It can handle REST endpoints matching
the following regular expressions:

    ^/_matrix/client/(api/v1|r0|v3|unstable)/user_directory/search$

When using this worker you must also set `update_user_directory: False` in the
shared configuration file to stop the main synapse running background
jobs related to updating the user directory.

### `synapse.app.frontend_proxy`

Proxies some frequently-requested client endpoints to add caching and remove
load from the main synapse. It can handle REST endpoints matching the following
regular expressions:

    ^/_matrix/client/(api/v1|r0|v3|unstable)/keys/upload

If `use_presence` is False in the homeserver config, it can also handle REST
endpoints matching the following regular expressions:

    ^/_matrix/client/(api/v1|r0|v3|unstable)/presence/[^/]+/status

This "stub" presence handler will pass through `GET` requests but make the
`PUT` effectively a no-op.

It will proxy any requests it cannot handle to the main synapse instance. It
must therefore be configured with the location of the main instance, via
the `worker_main_http_uri` setting in the `frontend_proxy` worker configuration
file.
For example:

```yaml
worker_main_http_uri: http://127.0.0.1:8008
```

### Historical apps

*Note:* Historically there used to be more apps, however they have been
amalgamated into a single `synapse.app.generic_worker` app. The remaining apps
are ones that do specific processing unrelated to requests, e.g. the `pusher`
that handles sending out push notifications for new events. The intention is for
all these to be folded into the `generic_worker` app and to use config to define
which processes handle the various processing such as push notifications.


## Migration from old config

There are two main independent changes that have been made: introducing Redis
support and merging apps into `synapse.app.generic_worker`. Both these changes
are backwards compatible and so no changes to the config are required; however,
server admins are encouraged to plan to migrate to Redis as the old-style direct
TCP replication config is deprecated.

To migrate to Redis, add the `redis` config as above, and optionally remove the
TCP `replication` listener from master and `worker_replication_port` from worker
config.

To migrate apps to use `synapse.app.generic_worker` simply update the
`worker_app` option in the worker configs, and wherever the workers are started
(e.g. in systemd service files, but not required for synctl).
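As a concrete sketch, the Redis migration amounts to a small diff against the configuration files (the port value here is illustrative):

```yaml
# Shared configuration: enable Redis-based replication.
redis:
  enabled: true

# Worker configuration: the old direct-TCP replication option can be
# removed once Redis is in use.
#worker_replication_port: 9092
```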
## Architectural diagram

The following shows an example setup using Redis and a reverse proxy:

```
                     Clients & Federation
                              |
                              v
                        +-----------+
                        |           |
                        |  Reverse  |
                        |   Proxy   |
                        |           |
                        +-----------+
                            | | |
                            | | | HTTP requests
        +-------------------+ | +-----------+
        |                 +---+             |
        |                 |                 |
        v                 v                 v
+--------------+  +--------------+  +--------------+  +--------------+
|     Main     |  |   Generic    |  |   Generic    |  |    Event     |
|   Process    |  |   Worker 1   |  |   Worker 2   |  |  Persister   |
+--------------+  +--------------+  +--------------+  +--------------+
      ^    ^          |   ^   |         |   ^   |          ^    ^
      |    |          |   |   |         |   |   |          |    |
      |    |          |   |   |  HTTP   |   |   |          |    |
      |    +----------+<--|---|---------+   |   |          |    |
      |                   |   +-------------|-->+----------+    |
      |                   |                 |                    |
      |                   |                 |                    |
      v                   v                 v                    v
====================================================================
                                                 Redis pub/sub channel
```