sbroker
=======

`sbroker` is a library that provides the building blocks for creating a pool
and/or a load regulator. The main goals of the library are to minimise upper
percentile latency by smart queuing, easily change the feature set live with
minimal changes, easily inspect a live system and provide confidence with
property based testing.

Example
-------

Add a broker to the `sbroker` application env `brokers` and it will be started
when the application starts. Below we use a CoDel queue for the `ask` side, a
timeout queue for the `ask_r` side and no meters. Processes then call
`sbroker:ask/1` and `sbroker:ask_r/1` to find a match. A process calling
`sbroker:ask/1` will only match with a process that calls `sbroker:ask_r/1` and
vice versa.
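
```erlang
%% Configure one locally registered broker before starting the application.
ok = application:load(sbroker),
Broker = broker,
%% {Name, {AskQueueSpec, AskRQueueSpec, Meters}}
Brokers = [{{local, Broker},
            {{sbroker_codel_queue, #{}}, {sbroker_timeout_queue, #{}}, []}}],
ok = application:set_env(sbroker, brokers, Brokers),
{ok, _} = application:ensure_all_started(sbroker).

%% The spawned process calls the `ask_r` side; this process calls the `ask`
%% side and is matched with it, receiving the spawned process's pid.
Pid = spawn_link(fun() -> {go, Ref, _, _, _} = sbroker:ask_r(Broker) end),
{go, Ref, Pid, _, _} = sbroker:ask(Broker).
```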

Matches can also be requested without queuing, asynchronously or using a dynamic
approach that is synchronous but becomes asynchronous if a match isn't
immediately available.
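
A rough sketch of these three variants follows. It assumes the broker from the example above is running, and that the functions are named `nb_ask/1`, `async_ask/1`, `dynamic_ask/1` and `await/2` as in the hexdocs API reference. The return shapes and the 5000 ms timeout are assumptions for illustration and should be checked against that reference.

```erlang
%% Assumes Broker is the broker started in the example above.

%% Without queuing: returns straight away with either a match or a drop.
case sbroker:nb_ask(Broker) of
    {go, _Ref1, _Pid1, _, _} -> matched;
    {drop, _} -> no_match
end.

%% Asynchronous: returns a tag at once and the result is awaited later.
%% (Assumed shape; the request could also be dropped instead of queued.)
{await, Tag, _} = sbroker:async_ask(Broker),
AsyncResult = sbroker:await(Tag, 5000).

%% Dynamic: a match is returned directly if one is available, otherwise
%% the request is queued and handled like the asynchronous case.
case sbroker:dynamic_ask(Broker) of
    {go, _Ref2, _Pid2, _, _} = Go -> Go;
    {await, Tag2, _} -> sbroker:await(Tag2, 5000);
    {drop, _} = Drop -> Drop
end.
```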

Requirements
------------

The minimum OTP version supported is 18.0.

The `sasl` application is required to start the `sbroker` application. The
`sasl` `error_logger` handler can be disabled by setting the `sasl` application
env `sasl_error_logger` to `false`.
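
For instance, in a release this could be set through a `sys.config` file. This is a minimal fragment, and the file name and location are assumptions that depend on your build setup:

```erlang
%% sys.config: disable the sasl error_logger handler
[{sasl, [{sasl_error_logger, false}]}].
```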

Installing
----------

For rebar3 add `sbroker` as a dependency in `rebar.config`:

```erlang
{deps, [sbroker]}.
```

Other build tools may work if they support `rebar3` dependencies but are not
directly supported.

Testing
-------

```
$ rebar3 ct
```

Documentation
-------------

Documentation is hosted on hex: http://hexdocs.pm/sbroker/

Motivation
----------

The main roles of a pool are: dispatching, back pressure, load shedding,
worker supervision and resizing.

Existing pooling solutions assume that if a worker is alive it is ready to
handle work. If a worker isn't ready, a client must wait for it to be ready, or
error immediately, when another worker might be ready to successfully handle
the request. If workers explicitly control when they are available then the
pool can always dispatch to workers that are ready.

Therefore in an ideal situation clients are requesting workers and workers are
requesting clients. This is the broker pattern, where both parties are
requesting a match with the counterparty. For simplicity the same API can be
used for both and so to the broker both parties are clients.

Existing pooling solutions that support back pressure use a timeout mechanism
where clients are queued for a length of time and then give up. Once clients
start timing out, the next client in the queue is likely to have waited close to
the timeout. This leads to the situation where clients are all queued for
approximately the timeout, either giving up or getting a worker. If clients
that give up could give up sooner then all clients would spend less time waiting
but the same number would be served.

Therefore in an ideal situation a target queue time would be chosen that keeps
the system feeling responsive and clients would give up at a rate such that in
the long term clients spend up to the target time in the queue. This is sojourn
(queue waiting) time active queue management. CoDel and PIE are two state of the
art active queue management algorithms with a target sojourn time, so we should
use those with defaults that keep systems feeling responsive to a user.
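
To give a feel for how such an algorithm paces drops, here is a minimal sketch of the published CoDel control law, where the next drop is scheduled at the current time plus the interval divided by the square root of the drop count. The module and function names are hypothetical, and this is an illustration of the general algorithm rather than the code in `sbroker_codel_queue`.

```erlang
-module(codel_sketch).
-export([next_drop/3]).

%% Time of the next drop, given the time of the current drop (Now), the
%% configured Interval and the number of drops so far in this dropping
%% episode (Count >= 1). Now and Interval share one integer time unit.
%%
%% With Interval = 100 the first drop schedules the next one 100 units
%% later and the fourth drop only 50 units later, so drops speed up
%% while the queue stays above its target sojourn time.
next_drop(Now, Interval, Count) when is_integer(Count), Count >= 1 ->
    Now + round(Interval / math:sqrt(Count)).
```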

Existing pooling solutions that support load shedding do not support back
pressure. These use ETS as a lock system and choose a worker to try. However
other workers might be available but are not tried, or busy waiting is used to
retry multiple times to gain a lock. If clients could use ETS to determine
whether a worker is likely to be available we could use existing dispatch and
back pressure mechanisms.

Therefore we want to limit access to the dispatching process by implementing a
sojourn time active queue management algorithm using ETS in front of the
dispatching process. Fortunately this is possible with the basic version of PIE.

Existing pooling solutions either don't support resizing or grow the pool when
no workers are immediately available. However a new worker may need to set up
an expensive resource and is unlikely to be ready immediately. If workers are
started early then the pool will be less likely to have no workers available.

The same pools that start workers "too late" also start a new worker for
every client that tries to check out when no workers are available. However old
workers will become available again, perhaps before new workers are ready. This
often leads to too many workers getting started and wastes resources until they
are reaped for being idle. If workers are started at intervals then temporary
bursts would not start too many workers but persistent increases would still
cause adequate growth.

Therefore we want workers to be started when worker availability is running low
but with intervals between starting workers. This can be achieved by sampling
the worker queue at intervals and starting a worker based on the reading. This
is the load regulator pattern, where the concurrency limit of tasks changes
based on sampling. For simplicity the same API as the broker could be used,
where the regulator is also the counterparty to the workers.
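
The sampling idea can be sketched as a pure decision function. This is a minimal illustration under assumed names (the module, the `sample/2` function and the state keys are all hypothetical) and is not how `sregulator` is implemented.

```erlang
-module(regulator_sketch).
-export([sample/2]).

%% State is a hypothetical map with the keys:
%%   last_start - time (ms) at which the last worker was started
%%   interval   - minimum time (ms) between worker starts
%%   low_mark   - start a worker when idle workers are at or below this
%% Idle is the reading taken from the worker queue at time Now (ms).
%%
%% Example: with #{last_start => 0, interval => 100, low_mark => 0},
%% a reading of {100, 0} starts a worker, a reading of {150, 0} straight
%% after does not (burst), and a reading of {200, 0} starts another
%% (persistent shortage).
sample({Now, Idle}, #{last_start := Last, interval := Interval,
                      low_mark := LowMark} = State)
  when Idle =< LowMark, Now - Last >= Interval ->
    %% Availability is low and enough time has passed: grow by one worker.
    {start_worker, State#{last_start := Now}};
sample({_Now, _Idle}, State) ->
    %% Workers are available, or a worker was started too recently.
    {noop, State}.
```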

Existing pooling solutions that also support resizing use a temporary
supervisor and keep restarting workers if they crash, equivalent to using max
restarts infinity. Unfortunately these pools can't recover from faults due to
bad state because the error does not bubble up the supervision tree and trigger
restarts. They are "too fault tolerant" because the error does not spread far
enough to trigger recovery. A pool where workers crash every time is not useful.

Therefore we want workers to be supervised using supervisors with any
configuration so the user can decide exactly how to handle failures. Fortunately
using both the broker and regulator patterns allows workers to be started under
user defined supervisors.

License
-------

This project is licensed under the Apache License, 2.0.

Roadmap
-------

* 1.1 - Add circuit breaker sregulator valves
* 1.2+ - Add improved queue management algorithms when possible, if at all