1.. _torflow_aggr:
2
3Torflow aggregation and scaling
4===============================
5
6.. seealso:: :ref:`generator` and :ref:`differences`.
7
8Torflow aggregation or scaling goal is:
9
10From Torflow's `README.spec.txt`_ (section 2.2)::
11
12    In this way, the resulting network status consensus bandwidth values
13    are effectively re-weighted proportional to how much faster the node
14    was as compared to the rest of the network.
15
16Initialization
17--------------
18
19Constants in consensus that Torflow uses and don't change::
20
21  bandwidth-weights Wbd=0 Wbe=0 [] Wbm=10000 Wdb=10000 Web=10000 Wed=10000 Wee=10000 Weg=10000 Wem=10000 Wgb=10000 Wgd=0 Wgg=5852 [] Wmb=10000 Wmd=0 Wme=0 [] Wmm=10000
22
23  params [] bwauthpid=1
24
25Constants in the code::
26
27  IGNORE_GUARD = 0
28  GUARD_SAMPLE_RATE = 2*7*24*60*60 # 2wks
29  MAX_AGE = 2*GUARD_SAMPLE_RATE;  # 4 weeks
30
31  K_p = 1.0
32  T_i = 0
33  T_i_decay = 0
34  T_d = 0
35
36Initialization ``ConsensusJunk``::
37
38  self.bwauth_pid_control = True
39  self.group_by_class = False
40  self.use_pid_tgt = False
41  self.use_circ_fails = False
42  self.use_best_ratio = True
43  self.use_desc_bw = True
44  self.use_mercy = False
45  self.guard_sample_rate = GUARD_SAMPLE_RATE
46  self.pid_max = 500.0
47  self.K_p = K_p = 1.0
48  self.T_i = T_i = 0
49  self.T_d = T_d = 0
50  self.T_i_decay = T_i_decay = 0
51
52  self.K_i = 0
53  self.K_d = self.K_p*self.T_d = 0
54
55Initialization ``Node``::
56
57    self.sbw_ratio = None
58    self.fbw_ratio = None
59    self.pid_bw = 0
60    self.pid_error = 0
61    self.prev_error = 0
62    self.pid_error_sum = 0
63    self.pid_delta = 0
64    self.ratio = None
65    self.new_bw = None
66    self.use_bw = -1
67    self.flags = ""
68
69    # measurement vars from bwauth lines
70    self.measured_at = 0
71    self.strm_bw = 0
72    self.filt_bw = 0
73    self.ns_bw = 0
74    self.desc_bw = 0
75    self.circ_fail_rate = 0
76    self.strm_fail_rate = 0
77    self.updated_at = 0
78
79.. image:: ./images/activity_torflow_aggr.svg
80
81
82.. _relay-descriptors-bandwidth:
83
84Descriptor values for each relay
85--------------------------------
86
87From `TorCtl.py`_ code, it is the minimum of all the descriptor bandwidth
88values::
89
90    bws = map(int, g)
91    bw_observed = min(bws)
92
93    [snip]
94
95    return Router(ns.idhex, ns.nickname, bw_observed, dead, exitpolicy,
96    ns.flags, ip, version, os, uptime, published, contact, rate_limited,
97    ns.orhash, ns.bandwidth, extra_info_digest, ns.unmeasured)
98
99``ns.bandwidth`` is the consensus bandwidth, already multiplied by 1000::
100
101  yield NetworkStatus(*(m.groups()+(flags,)+(int(w.group(1))*1000,))+(unmeasured,))
102
103Because of the matched regular expression, ``bws`` is **not** all the descriptor
104bandwidth values, but the average bandwidth and the observed bandwidth, ie., it
105does not take the average burst, what seems to be a bug in Torflow.
106
107Eg. ``bandwidth`` line in a descriptor::
108
109  bandwidth 1536000 4096000 1728471
110
111Only takes the first and last values, so::
112
113  bw_observed = min(bandwidth-avg, bandwidth-observed)
114
115This is passed to ``Router``, in which the descriptors bandwidth is assigned to
116the consensus bandwidth when there is no consensus bandwidth::
117
118    (idhex, name, bw, down, exitpolicy, flags, ip, version, os, uptime,
119       published, contact, rate_limited, orhash,
120       ns_bandwidth,extra_info_digest,unmeasured) = args
121
122    [snip]
123
124    if ns_bandwidth != None:
125      self.bw = max(ns_bandwidth,1) # Avoid div by 0
126    else:
127      self.bw = max(bw,1) # Avoid div by 0
128
129    [snip]
130
131    self.desc_bw = max(bw,1) # Avoid div by 0
132
133So::
134
135  self.bw = ns_bwandwidth or min(bandwidth-avg, bandwidth-observed) or 1
136  desc_bw = min(bandwidth-avg, bandwidth-observed) or 1
137
138And written by `SQLSupport.py`_ as descriptor and conensus bandwidth::
139
140      f.write(" desc_bw="+str(int(cvt(s.avg_desc_bw,0))))
141      f.write(" ns_bw="+str(int(cvt(s.avg_bw,0)))+"\n")
142
143.. _relay-descriptor-bandwidth-pid:
144
145Descriptor bandwidth with PID control
146~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
147
148Even though `README.spec.txt`_ talks about the consensus bandwidth, in
149`aggregate.py`_ code, the consensus bandwidth is never used, since
150``use_desc_bw`` is initialized to True and never changed::
151
152    if cs_junk.bwauth_pid_control:
153      if cs_junk.use_desc_bw:
154        n.use_bw = n.desc_bw
155      else:
156        n.use_bw = n.ns_bw
157
158So::
159
160  n.use_bw = n.desc_bw = min(bandwidth-avg, bandwidth-observed) or 1
161
162
163Scaling the raw measurements
164----------------------------
165
166.. _overview:
167
168Overview
169~~~~~~~~
170
171This diagram also includes
172:ref:`relay-descriptor-bandwidth-pid`,
173:ref:`relay-bandwidth-ratio` and :ref:`relay-scaled-bandwidth-pid`.
174
175.. image:: ./images/activity_torflow_scaling_simplified1.svg
176
177Simplified image from:
178
179.. image:: ./images/activity_torflow_scaling_simplified.svg
180
181`<./_images/activity_torflow_scaling_simplified.svg>`_
182
183.. image:: ./images/activity_torflow_scaling.svg
184
185`<./_images/activity_torflow_scaling.svg>`_
186
187
188Stream and filtered bandwidth for each relay
189~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
190
191They are calculated in the same way whether or not `PID controller`_ feedback
192is used.
193
194From Torflow's `README.spec.txt`_ (section 1.6)::
195
196    The strm_bw field is the average (mean) of all the streams for the relay
197    identified by the fingerprint field.
198
199    The filt_bw field is computed similarly, but only the streams equal to
200    or greater than the strm_bw are counted in order to filter very slow
201    streams due to slow node pairings.
202
203In the code, `SQLSupport.py`_, ``strm_bw`` is ``sbw`` and
204``filt_bw`` is ``filt_sbws``::
205
206    for rs in RouterStats.query.filter(stats_clause).\
207          options(eagerload_all('router.streams.circuit.routers')).all():
208      tot_sbw = 0
209      sbw_cnt = 0
210      for s in rs.router.streams:
211        if isinstance(s, ClosedStream):
212          skip = False
213          #for br in badrouters:
214          #  if br != rs:
215          #    if br.router in s.circuit.routers:
216          #      skip = True
217          if not skip:
218            # Throw out outliers < mean
219            # (too much variance for stddev to filter much)
220            if rs.strm_closed == 1 or s.bandwidth() >= rs.sbw:
221              tot_sbw += s.bandwidth()
222              sbw_cnt += 1
223
224    if sbw_cnt: rs.filt_sbw = tot_sbw/sbw_cnt
225    else: rs.filt_sbw = None
226
227This is also expressed in pseudocode in the `bandwidth file spec`_, section B.4
228step 1.
229
230Calling ``bwstrm_i`` to ``strm_bw`` and ``bwfilt_i`` to ``filt_bw``,
231if ``bw_j`` is a measurement for a relay ``i``, then:::
232
233  bwstrm_i = mean(bw_j)  # for a relay, the average of all its measurements
234  bwfilt_i = mean(max(bwstrm_i, bw_j))
235
236.. _stream-and-filtered-bandwidth-for-all-relays:
237
238Stream and filtered bandwidth for all relays
239~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
240
241From `README.spec.txt`_ (section 2.1)::
242
243    Once we have determined the most recent measurements for each node, we
244    compute an average of the filt_bw fields over all nodes we have measured.
245
246In Torflow's `aggregate.py`_ code::
247
248  for cl in ["Guard+Exit", "Guard", "Exit", "Middle"]:
249    c_nodes = filter(lambda n: n.node_class() == cl, nodes.itervalues())
250    if len(c_nodes) > 0:
251      true_filt_avg[cl] = sum(map(lambda n: n.filt_bw, c_nodes))/float(len(c_nodes))
252      true_strm_avg[cl] = sum(map(lambda n: n.strm_bw, c_nodes))/float(len(c_nodes))
253      true_circ_avg[cl] = sum(map(lambda n: (1.0-n.circ_fail_rate),
254                            c_nodes))/float(len(c_nodes))
255
256The following code it's actually used later to set the ``filt_avg`` and
257``strm_avg`` for each class::
258
259    filt_avg = sum(map(lambda n: n.filt_bw, nodes.itervalues()))/float(len(nodes))
260    strm_avg = sum(map(lambda n: n.strm_bw, nodes.itervalues()))/float(len(nodes))
261
262Because ``cs_junk.group_by_class`` is False, it runs::
263
264      for cl in ["Guard+Exit", "Guard", "Exit", "Middle"]:
265        true_filt_avg[cl] = filt_avg
266        true_strm_avg[cl] = strm_avg
267        true_circ_avg[cl] = circ_avg
268        pid_tgt_avg[cl] = pid_avg
269
270So ``filt_avg`` and ``strm_avg`` are calculated **not** by class in either case,
271with and without PID control.
272
273Calling ``bwstrm`` to ``strm_avg`` and ``bwfilt`` to ``fitl_avg``, without
274taking into account the different types of nodes::
275
276  bwstrm = mean(bwstrm_i)
277  bwfilt = mean(bwfilt_i)
278
279This is also expressed in pseudocode in the `bandwidth file spec`_, section B.4
280step 2.
281
282.. _relay-bandwidth-ratio:
283
284Ratio for each relay
285~~~~~~~~~~~~~~~~~~~~
286
287From `README.spec.txt`_ (section 2.2)::
288
289    These averages are used to produce ratios for each node by dividing the
290    measured value for that node by the network average.
291
292In Torflow's `aggregate.py`_ code::
293
294    for n in nodes.itervalues():
295        n.fbw_ratio = n.filt_bw/true_filt_avg[n.node_class()]
296        n.sbw_ratio = n.strm_bw/true_strm_avg[n.node_class()]
297
298    [snip]
299
300    # Choose the larger between sbw and fbw
301      if n.sbw_ratio > n.fbw_ratio:
302        n.ratio = n.sbw_ratio
303      else:
304        n.ratio = n.fbw_ratio
305
306So::
307
308  n.ratio = max(n.sbw_ratio, n.fbw_ratio)
309
310This is also expressed in pseudocode in the `bandwidth file spec`_, section B.4
311step 2 and 3.
312
313.. relay-scaled-no-pid:
314
315Scaled bandwidth for each relay without PID control
316~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
317
318From `README.spec.txt`_ (section 2.2)::
319
320    These ratios are then multiplied by the most recent observed descriptor
321    bandwidth we have available for each node, to produce a new value for
322    the network status consensus process.
323
324In `aggregate.py`_ code::
325
326    n.new_bw = n.desc_bw*n.ratio
327
328So::
329
330    n.new_bw = (
331        min(bandwidth-avg, bandwidth-observed) or 1 \
332        * max(bwstrm_i / bwstrm, bwfilt_i / bwfilt)
333    )
334
335This is also expressed in pseudocode in the `bandwidth file spec`_, section B.4
336step 5.
337
338.. _relay-scaled-bandwidth-pid:
339
340Scaled bandwidth for each relay with PID control
341~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
342
343From `README.spec.txt`_ section 3.1::
344
345   The bandwidth authorities measure F_node: the filtered stream
346   capacity through a given node (filtering is described in Section 1.6).
347
348   [snip]
349
350   pid_error = e(t) = (F_node - F_avg)/F_avg.
351
352   [snip]
353
354   new_consensus_bw = old_consensus_bw +
355                        old_consensus_bw * K_p * e(t) +
356                        old_consensus_bw * K_i * \integral{e(t)} +
357                        old_consensus_bw * K_d * \derivative{e(t)}
358
359   [snip]
360
361   For the case where K_p = 1, K_i=0, and K_d=0, it can be seen that this
362   system is equivalent to the one defined in 2.2, except using consensus
363   bandwidth instead of descriptor bandwidth:
364
365       new_bw = old_bw + old_bw*e(t)
366       new_bw = old_bw + old_bw*(F_node/F_avg - 1)
367       new_bw = old_bw*F_node/F_avg
368       new_bw = old_bw*ratio
369
370In Torflow's code, this is actually the case and most of the code is not
371executed because the default ``K`` values.
372
373It seems then that ``F_node`` is ``filt_bw`` in Torflow's code or ``bwfilt_i``
374here, and ``F_avg`` is ``filt_avg`` in Torflow's code or ``bwfilt`` here.
375
376In `aggregate.py`_ code, pid error also depends on which of the ratios is
377greater::
378
379    if cs_junk.use_best_ratio and n.sbw_ratio > n.fbw_ratio:
380            n.pid_error = (n.strm_bw - true_strm_avg[n.node_class()]) \
381                            / true_strm_avg[n.node_class()]
382            else:
383            n.pid_error = (n.filt_bw - true_filt_avg[n.node_class()]) \
384                            / true_filt_avg[n.node_class()]
385
386    [snip]
387
388    n.new_bw = n.use_bw + cs_junk.K_p*n.use_bw*n.pid_error
389
390So::
391
392  if (bwstrm_i / bwstrm) > (bwfilt_i / bwfilt):
393      pid_error = (bwstrm_i - bwstrm) / bwstrm = (bwstrm_i / bwstrm) - 1
394  else:
395      pid_error = (bwfilt_i - bwfilt_i) / bwfilt = (bwfilt_i / bwfilt) - 1
396
397  new_bw = use_bw + use_bw * pid_error
398
399Or::
400
401  if (bwstrm_i / bwstrm) > (bwfilt_i / bwfilt):
402      new_bw = use_bw + use_bw * ((bwstrm_i / bwstrm) - 1)
403      new_bw = use_bw + use_bw * (bwstrm_i / bwstrm) - use_bw
404      new_bw = use_bw * (bwstrm_i / bwstrm)
405  else:
406      new_bw = use_bw + use_bw * ((bwfilt_i / bwfilt) - 1)
407      new_bw = use_bw + use_bw * (bwfilt_i / bwfilt) - use_bw
408      new_bw = use_bw * (bwfilt_i / bwfilt)
409
410Or::
411
412  new_bw = use_bw * max(bwstrm_i / bwstrm, bwfilt_i / bwfilt)
413  new_bw = (
414      min(bandwidth-avg, bandwidth-observed) or 1
415      * max(bwstrm_i / bwstrm, bwfilt_i / bwfilt)
416  )
417
418.. note::
419    So, the new scaled bandwidth is the same for both cases with and without
420    PID controller!
421
422Other pid KeyValues in the Bandwidth File
423-----------------------------------------
424
425.. note::
426
427  From the :ref:`overview` it seems that the only variable needed to
428  calculate the new scaled bandwidth is the ``pid_error``, and from
429  :ref:`relay-descriptor-bandwidth-pid`, it can be substituted
430  by the stream and filtered bandwidths.
431
432This are the variables that can then be ignored::
433
434  pid_error_sum
435  pid_delta
436  prev_error
437
438Limit scaled bandwidth for each relay
439-------------------------------------
440
441It's calculated the same with and without PID control
442
443Once each relay bandwidth is scaled, it is limited to a maximum, that is
444calculated as the sum of all the relays in the current consensus scaled
445bandwidth per 0.05.
446
447From `aggregate.py`_ code::
448
449    NODE_CAP = 0.05
450
451    [snip]
452
453    if n.idhex in prev_consensus:
454      if prev_consensus[n.idhex].bandwidth != None:
455        prev_consensus[n.idhex].measured = True
456        tot_net_bw += n.new_bw
457
458    [snip]
459
460    if n.new_bw > tot_net_bw*NODE_CAP:
461      [snip]
462      n.new_bw = int(tot_net_bw*NODE_CAP)
463
464
465Round the scaled bandwidth for each relay
466-----------------------------------------
467
468Finally, the new scaled bandwidth is expressed in kilobytes and rounded a number
469of digits.
470
471
472.. _README.spec.txt: https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt
473.. _PID Controller: https://en.wikipedia.org/wiki/PID_controller
474.. _SQLSupport.py: https://gitweb.torproject.org/pytorctl.git/tree/SQLSupport.py#n493
475.. _bandwidth file spec: https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt
476.. _aggregate.py: https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/aggregate.py
477.. _TorCtl.py: https://gitweb.torproject.org/pytorctl.git/tree/TorCtl.py
478