1.. _torflow_aggr: 2 3Torflow aggregation and scaling 4=============================== 5 6.. seealso:: :ref:`generator` and :ref:`differences`. 7 8Torflow aggregation or scaling goal is: 9 10From Torflow's `README.spec.txt`_ (section 2.2):: 11 12 In this way, the resulting network status consensus bandwidth values 13 are effectively re-weighted proportional to how much faster the node 14 was as compared to the rest of the network. 15 16Initialization 17-------------- 18 19Constants in consensus that Torflow uses and don't change:: 20 21 bandwidth-weights Wbd=0 Wbe=0 [] Wbm=10000 Wdb=10000 Web=10000 Wed=10000 Wee=10000 Weg=10000 Wem=10000 Wgb=10000 Wgd=0 Wgg=5852 [] Wmb=10000 Wmd=0 Wme=0 [] Wmm=10000 22 23 params [] bwauthpid=1 24 25Constants in the code:: 26 27 IGNORE_GUARD = 0 28 GUARD_SAMPLE_RATE = 2*7*24*60*60 # 2wks 29 MAX_AGE = 2*GUARD_SAMPLE_RATE; # 4 weeks 30 31 K_p = 1.0 32 T_i = 0 33 T_i_decay = 0 34 T_d = 0 35 36Initialization ``ConsensusJunk``:: 37 38 self.bwauth_pid_control = True 39 self.group_by_class = False 40 self.use_pid_tgt = False 41 self.use_circ_fails = False 42 self.use_best_ratio = True 43 self.use_desc_bw = True 44 self.use_mercy = False 45 self.guard_sample_rate = GUARD_SAMPLE_RATE 46 self.pid_max = 500.0 47 self.K_p = K_p = 1.0 48 self.T_i = T_i = 0 49 self.T_d = T_d = 0 50 self.T_i_decay = T_i_decay = 0 51 52 self.K_i = 0 53 self.K_d = self.K_p*self.T_d = 0 54 55Initialization ``Node``:: 56 57 self.sbw_ratio = None 58 self.fbw_ratio = None 59 self.pid_bw = 0 60 self.pid_error = 0 61 self.prev_error = 0 62 self.pid_error_sum = 0 63 self.pid_delta = 0 64 self.ratio = None 65 self.new_bw = None 66 self.use_bw = -1 67 self.flags = "" 68 69 # measurement vars from bwauth lines 70 self.measured_at = 0 71 self.strm_bw = 0 72 self.filt_bw = 0 73 self.ns_bw = 0 74 self.desc_bw = 0 75 self.circ_fail_rate = 0 76 self.strm_fail_rate = 0 77 self.updated_at = 0 78 79.. image:: ./images/activity_torflow_aggr.svg 80 81 82.. _relay-descriptors-bandwidth: 83 84Descriptor values for each relay 85-------------------------------- 86 87From `TorCtl.py`_ code, it is the minimum of all the descriptor bandwidth 88values:: 89 90 bws = map(int, g) 91 bw_observed = min(bws) 92 93 [snip] 94 95 return Router(ns.idhex, ns.nickname, bw_observed, dead, exitpolicy, 96 ns.flags, ip, version, os, uptime, published, contact, rate_limited, 97 ns.orhash, ns.bandwidth, extra_info_digest, ns.unmeasured) 98 99``ns.bandwidth`` is the consensus bandwidth, already multiplied by 1000:: 100 101 yield NetworkStatus(*(m.groups()+(flags,)+(int(w.group(1))*1000,))+(unmeasured,)) 102 103Because of the matched regular expression, ``bws`` is **not** all the descriptor 104bandwidth values, but the average bandwidth and the observed bandwidth, ie., it 105does not take the average burst, what seems to be a bug in Torflow. 106 107Eg. ``bandwidth`` line in a descriptor:: 108 109 bandwidth 1536000 4096000 1728471 110 111Only takes the first and last values, so:: 112 113 bw_observed = min(bandwidth-avg, bandwidth-observed) 114 115This is passed to ``Router``, in which the descriptors bandwidth is assigned to 116the consensus bandwidth when there is no consensus bandwidth:: 117 118 (idhex, name, bw, down, exitpolicy, flags, ip, version, os, uptime, 119 published, contact, rate_limited, orhash, 120 ns_bandwidth,extra_info_digest,unmeasured) = args 121 122 [snip] 123 124 if ns_bandwidth != None: 125 self.bw = max(ns_bandwidth,1) # Avoid div by 0 126 else: 127 self.bw = max(bw,1) # Avoid div by 0 128 129 [snip] 130 131 self.desc_bw = max(bw,1) # Avoid div by 0 132 133So:: 134 135 self.bw = ns_bwandwidth or min(bandwidth-avg, bandwidth-observed) or 1 136 desc_bw = min(bandwidth-avg, bandwidth-observed) or 1 137 138And written by `SQLSupport.py`_ as descriptor and conensus bandwidth:: 139 140 f.write(" desc_bw="+str(int(cvt(s.avg_desc_bw,0)))) 141 f.write(" ns_bw="+str(int(cvt(s.avg_bw,0)))+"\n") 142 143.. _relay-descriptor-bandwidth-pid: 144 145Descriptor bandwidth with PID control 146~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 147 148Even though `README.spec.txt`_ talks about the consensus bandwidth, in 149`aggregate.py`_ code, the consensus bandwidth is never used, since 150``use_desc_bw`` is initialized to True and never changed:: 151 152 if cs_junk.bwauth_pid_control: 153 if cs_junk.use_desc_bw: 154 n.use_bw = n.desc_bw 155 else: 156 n.use_bw = n.ns_bw 157 158So:: 159 160 n.use_bw = n.desc_bw = min(bandwidth-avg, bandwidth-observed) or 1 161 162 163Scaling the raw measurements 164---------------------------- 165 166.. _overview: 167 168Overview 169~~~~~~~~ 170 171This diagram also includes 172:ref:`relay-descriptor-bandwidth-pid`, 173:ref:`relay-bandwidth-ratio` and :ref:`relay-scaled-bandwidth-pid`. 174 175.. image:: ./images/activity_torflow_scaling_simplified1.svg 176 177Simplified image from: 178 179.. image:: ./images/activity_torflow_scaling_simplified.svg 180 181`<./_images/activity_torflow_scaling_simplified.svg>`_ 182 183.. image:: ./images/activity_torflow_scaling.svg 184 185`<./_images/activity_torflow_scaling.svg>`_ 186 187 188Stream and filtered bandwidth for each relay 189~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 190 191They are calculated in the same way whether or not `PID controller`_ feedback 192is used. 193 194From Torflow's `README.spec.txt`_ (section 1.6):: 195 196 The strm_bw field is the average (mean) of all the streams for the relay 197 identified by the fingerprint field. 198 199 The filt_bw field is computed similarly, but only the streams equal to 200 or greater than the strm_bw are counted in order to filter very slow 201 streams due to slow node pairings. 202 203In the code, `SQLSupport.py`_, ``strm_bw`` is ``sbw`` and 204``filt_bw`` is ``filt_sbws``:: 205 206 for rs in RouterStats.query.filter(stats_clause).\ 207 options(eagerload_all('router.streams.circuit.routers')).all(): 208 tot_sbw = 0 209 sbw_cnt = 0 210 for s in rs.router.streams: 211 if isinstance(s, ClosedStream): 212 skip = False 213 #for br in badrouters: 214 # if br != rs: 215 # if br.router in s.circuit.routers: 216 # skip = True 217 if not skip: 218 # Throw out outliers < mean 219 # (too much variance for stddev to filter much) 220 if rs.strm_closed == 1 or s.bandwidth() >= rs.sbw: 221 tot_sbw += s.bandwidth() 222 sbw_cnt += 1 223 224 if sbw_cnt: rs.filt_sbw = tot_sbw/sbw_cnt 225 else: rs.filt_sbw = None 226 227This is also expressed in pseudocode in the `bandwidth file spec`_, section B.4 228step 1. 229 230Calling ``bwstrm_i`` to ``strm_bw`` and ``bwfilt_i`` to ``filt_bw``, 231if ``bw_j`` is a measurement for a relay ``i``, then::: 232 233 bwstrm_i = mean(bw_j) # for a relay, the average of all its measurements 234 bwfilt_i = mean(max(bwstrm_i, bw_j)) 235 236.. _stream-and-filtered-bandwidth-for-all-relays: 237 238Stream and filtered bandwidth for all relays 239~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 240 241From `README.spec.txt`_ (section 2.1):: 242 243 Once we have determined the most recent measurements for each node, we 244 compute an average of the filt_bw fields over all nodes we have measured. 245 246In Torflow's `aggregate.py`_ code:: 247 248 for cl in ["Guard+Exit", "Guard", "Exit", "Middle"]: 249 c_nodes = filter(lambda n: n.node_class() == cl, nodes.itervalues()) 250 if len(c_nodes) > 0: 251 true_filt_avg[cl] = sum(map(lambda n: n.filt_bw, c_nodes))/float(len(c_nodes)) 252 true_strm_avg[cl] = sum(map(lambda n: n.strm_bw, c_nodes))/float(len(c_nodes)) 253 true_circ_avg[cl] = sum(map(lambda n: (1.0-n.circ_fail_rate), 254 c_nodes))/float(len(c_nodes)) 255 256The following code it's actually used later to set the ``filt_avg`` and 257``strm_avg`` for each class:: 258 259 filt_avg = sum(map(lambda n: n.filt_bw, nodes.itervalues()))/float(len(nodes)) 260 strm_avg = sum(map(lambda n: n.strm_bw, nodes.itervalues()))/float(len(nodes)) 261 262Because ``cs_junk.group_by_class`` is False, it runs:: 263 264 for cl in ["Guard+Exit", "Guard", "Exit", "Middle"]: 265 true_filt_avg[cl] = filt_avg 266 true_strm_avg[cl] = strm_avg 267 true_circ_avg[cl] = circ_avg 268 pid_tgt_avg[cl] = pid_avg 269 270So ``filt_avg`` and ``strm_avg`` are calculated **not** by class in either case, 271with and without PID control. 272 273Calling ``bwstrm`` to ``strm_avg`` and ``bwfilt`` to ``fitl_avg``, without 274taking into account the different types of nodes:: 275 276 bwstrm = mean(bwstrm_i) 277 bwfilt = mean(bwfilt_i) 278 279This is also expressed in pseudocode in the `bandwidth file spec`_, section B.4 280step 2. 281 282.. _relay-bandwidth-ratio: 283 284Ratio for each relay 285~~~~~~~~~~~~~~~~~~~~ 286 287From `README.spec.txt`_ (section 2.2):: 288 289 These averages are used to produce ratios for each node by dividing the 290 measured value for that node by the network average. 291 292In Torflow's `aggregate.py`_ code:: 293 294 for n in nodes.itervalues(): 295 n.fbw_ratio = n.filt_bw/true_filt_avg[n.node_class()] 296 n.sbw_ratio = n.strm_bw/true_strm_avg[n.node_class()] 297 298 [snip] 299 300 # Choose the larger between sbw and fbw 301 if n.sbw_ratio > n.fbw_ratio: 302 n.ratio = n.sbw_ratio 303 else: 304 n.ratio = n.fbw_ratio 305 306So:: 307 308 n.ratio = max(n.sbw_ratio, n.fbw_ratio) 309 310This is also expressed in pseudocode in the `bandwidth file spec`_, section B.4 311step 2 and 3. 312 313.. relay-scaled-no-pid: 314 315Scaled bandwidth for each relay without PID control 316~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 317 318From `README.spec.txt`_ (section 2.2):: 319 320 These ratios are then multiplied by the most recent observed descriptor 321 bandwidth we have available for each node, to produce a new value for 322 the network status consensus process. 323 324In `aggregate.py`_ code:: 325 326 n.new_bw = n.desc_bw*n.ratio 327 328So:: 329 330 n.new_bw = ( 331 min(bandwidth-avg, bandwidth-observed) or 1 \ 332 * max(bwstrm_i / bwstrm, bwfilt_i / bwfilt) 333 ) 334 335This is also expressed in pseudocode in the `bandwidth file spec`_, section B.4 336step 5. 337 338.. _relay-scaled-bandwidth-pid: 339 340Scaled bandwidth for each relay with PID control 341~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 342 343From `README.spec.txt`_ section 3.1:: 344 345 The bandwidth authorities measure F_node: the filtered stream 346 capacity through a given node (filtering is described in Section 1.6). 347 348 [snip] 349 350 pid_error = e(t) = (F_node - F_avg)/F_avg. 351 352 [snip] 353 354 new_consensus_bw = old_consensus_bw + 355 old_consensus_bw * K_p * e(t) + 356 old_consensus_bw * K_i * \integral{e(t)} + 357 old_consensus_bw * K_d * \derivative{e(t)} 358 359 [snip] 360 361 For the case where K_p = 1, K_i=0, and K_d=0, it can be seen that this 362 system is equivalent to the one defined in 2.2, except using consensus 363 bandwidth instead of descriptor bandwidth: 364 365 new_bw = old_bw + old_bw*e(t) 366 new_bw = old_bw + old_bw*(F_node/F_avg - 1) 367 new_bw = old_bw*F_node/F_avg 368 new_bw = old_bw*ratio 369 370In Torflow's code, this is actually the case and most of the code is not 371executed because the default ``K`` values. 372 373It seems then that ``F_node`` is ``filt_bw`` in Torflow's code or ``bwfilt_i`` 374here, and ``F_avg`` is ``filt_avg`` in Torflow's code or ``bwfilt`` here. 375 376In `aggregate.py`_ code, pid error also depends on which of the ratios is 377greater:: 378 379 if cs_junk.use_best_ratio and n.sbw_ratio > n.fbw_ratio: 380 n.pid_error = (n.strm_bw - true_strm_avg[n.node_class()]) \ 381 / true_strm_avg[n.node_class()] 382 else: 383 n.pid_error = (n.filt_bw - true_filt_avg[n.node_class()]) \ 384 / true_filt_avg[n.node_class()] 385 386 [snip] 387 388 n.new_bw = n.use_bw + cs_junk.K_p*n.use_bw*n.pid_error 389 390So:: 391 392 if (bwstrm_i / bwstrm) > (bwfilt_i / bwfilt): 393 pid_error = (bwstrm_i - bwstrm) / bwstrm = (bwstrm_i / bwstrm) - 1 394 else: 395 pid_error = (bwfilt_i - bwfilt_i) / bwfilt = (bwfilt_i / bwfilt) - 1 396 397 new_bw = use_bw + use_bw * pid_error 398 399Or:: 400 401 if (bwstrm_i / bwstrm) > (bwfilt_i / bwfilt): 402 new_bw = use_bw + use_bw * ((bwstrm_i / bwstrm) - 1) 403 new_bw = use_bw + use_bw * (bwstrm_i / bwstrm) - use_bw 404 new_bw = use_bw * (bwstrm_i / bwstrm) 405 else: 406 new_bw = use_bw + use_bw * ((bwfilt_i / bwfilt) - 1) 407 new_bw = use_bw + use_bw * (bwfilt_i / bwfilt) - use_bw 408 new_bw = use_bw * (bwfilt_i / bwfilt) 409 410Or:: 411 412 new_bw = use_bw * max(bwstrm_i / bwstrm, bwfilt_i / bwfilt) 413 new_bw = ( 414 min(bandwidth-avg, bandwidth-observed) or 1 415 * max(bwstrm_i / bwstrm, bwfilt_i / bwfilt) 416 ) 417 418.. note:: 419 So, the new scaled bandwidth is the same for both cases with and without 420 PID controller! 421 422Other pid KeyValues in the Bandwidth File 423----------------------------------------- 424 425.. note:: 426 427 From the :ref:`overview` it seems that the only variable needed to 428 calculate the new scaled bandwidth is the ``pid_error``, and from 429 :ref:`relay-descriptor-bandwidth-pid`, it can be substituted 430 by the stream and filtered bandwidths. 431 432This are the variables that can then be ignored:: 433 434 pid_error_sum 435 pid_delta 436 prev_error 437 438Limit scaled bandwidth for each relay 439------------------------------------- 440 441It's calculated the same with and without PID control 442 443Once each relay bandwidth is scaled, it is limited to a maximum, that is 444calculated as the sum of all the relays in the current consensus scaled 445bandwidth per 0.05. 446 447From `aggregate.py`_ code:: 448 449 NODE_CAP = 0.05 450 451 [snip] 452 453 if n.idhex in prev_consensus: 454 if prev_consensus[n.idhex].bandwidth != None: 455 prev_consensus[n.idhex].measured = True 456 tot_net_bw += n.new_bw 457 458 [snip] 459 460 if n.new_bw > tot_net_bw*NODE_CAP: 461 [snip] 462 n.new_bw = int(tot_net_bw*NODE_CAP) 463 464 465Round the scaled bandwidth for each relay 466----------------------------------------- 467 468Finally, the new scaled bandwidth is expressed in kilobytes and rounded a number 469of digits. 470 471 472.. _README.spec.txt: https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt 473.. _PID Controller: https://en.wikipedia.org/wiki/PID_controller 474.. _SQLSupport.py: https://gitweb.torproject.org/pytorctl.git/tree/SQLSupport.py#n493 475.. _bandwidth file spec: https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt 476.. _aggregate.py: https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/aggregate.py 477.. _TorCtl.py: https://gitweb.torproject.org/pytorctl.git/tree/TorCtl.py 478