.. include:: header.rst

.. contents:: Table of contents
   :depth: 2
   :backlinks: none

tuning libtorrent
=================

libtorrent exposes most parameters used in the bittorrent engine for
customization through the settings_pack. This makes it possible to
test and tweak the parameters for certain algorithms to make a client
that fits a wide range of needs, from low memory embedded devices to
servers seeding thousands of torrents. The default settings in libtorrent
are tuned for an end-user bittorrent client running on a normal desktop
computer.

This document describes techniques to benchmark libtorrent performance
and how parameters are likely to affect it.

profiling
=========

libtorrent is instrumented with a number of counters and gauges that you can
access via the ``session_stats_alert``. First, enable these alerts in the
alert mask::

    // ses is the session object whose statistics you want to log
    settings_pack p;
    p.set_int(settings_pack::alert_mask, alert::stats_notification);
    ses.apply_settings(p);

Then print the alerts, for instance to standard output, which can be
redirected to a file::

    std::vector<alert*> alerts;
    ses.pop_alerts(&alerts);

    // one line per alert
    for (auto* a : alerts) {
        std::cout << a->message() << "\n";
    }

If you want to separate generic alerts from session stats, you can filter on the
alert category in the alert, ``alert::category()``.

The alerts with data will have the type session_stats_alert and there is one
session_log_alert that will be posted on startup containing the column names
for all metrics. Logging this line will greatly simplify interpreting the output.

The Python script in ``tools/parse_session_stats.py`` can parse the resulting
file and produce graphs of relevant stats. It requires gnuplot_.

.. _gnuplot: http://www.gnuplot.info

reducing memory footprint
=========================

These are things you can do to reduce the memory footprint of libtorrent.
You get some of this by basing your default settings_pack on the
min_memory_usage() setting preset function.

Keep in mind that lowering memory usage will affect performance. Always profile
and benchmark your settings to determine if it's worth the trade-off.

The typical buffer usage of libtorrent, for a single download, with the cache
size set to 256 blocks (256 * 16 kiB = 4 MiB) is::

    read cache:      128.6 (2058 kiB)
    write cache:     103.5 (1656 kiB)
    receive buffers: 7.3   (117 kiB)
    send buffers:    4.8   (77 kiB)
    hash temp:       0.001 (19 Bytes)

The receive buffer usage is proportional to the number of connections we make,
and is limited by the total number of connections in the session (default is 200).

The send buffer usage is proportional to the number of upload slots that are
allowed in the session. The default is auto configured based on the observed
upload rate.

The read and write cache can be controlled (see section below).

The "hash temp" entry size depends on whether hashing is optimized for
speed or memory usage. In this test run it was optimized for memory usage.

disable disk cache
------------------

The bulk of the memory libtorrent will use is used for the disk cache. To save
the greatest amount of memory, you can disable the cache by setting
settings_pack::cache_size to 0. You might want to consider keeping the cache
and only disabling the caching of read operations. You do this by setting
settings_pack::use_read_cache to false. This is the main factor in how much
memory will be used by the client. Keep in mind that you will degrade performance
by disabling the cache. You should benchmark the disk access in order to make an
informed trade-off.

remove torrents
---------------

Torrents that have been added to libtorrent will inevitably use up memory, even
when they are paused. A paused torrent will not use any peer connection objects or
any send or receive buffers though.
Any added torrent holds the entire .torrent
file in memory. It also remembers the entire list of peers that it has heard about
(which can be fairly long unless it's capped). It also retains information about
which blocks and pieces we have on disk, which can be significant for torrents
with many pieces.

If you need to minimize the memory footprint, consider removing torrents from
the session rather than pausing them. This will likely only make a difference
when you have a very large number of torrents in a session.

The downside of removing them is that they will no longer be auto-managed. Paused
auto managed torrents are scraped periodically, to determine which torrents are
in the greatest need of seeding, and libtorrent will prioritize seeding those.

socket buffer sizes
-------------------

You can make libtorrent explicitly set the kernel buffer sizes of all its peer
sockets. If you set this to a low number, you may see reduced throughput, especially
for high latency connections. It is however an opportunity to save memory per
connection, and might be worth considering if you have a very large number of
peer connections. This memory will not be visible in your process; the setting
controls the amount of kernel memory used for your sockets.

Change this by setting settings_pack::recv_socket_buffer_size and
settings_pack::send_socket_buffer_size.

peer list size
--------------

The default maximum for the peer list is 4000 peers. For IPv4 peers, each peer
entry uses 32 bytes, which ends up using 128 kB per torrent. If seeding 4 popular
torrents, the peer lists alone use about half a megabyte.

The default limit is the same for paused torrents as well, so if you have a
large number of paused torrents (that are popular) it will be even more
significant.

If you're short of memory, you should consider lowering the limit. 500 is probably
enough.
You can do this by setting settings_pack::max_peerlist_size to
the max number of peers you want in a torrent's peer list. This limit applies per
torrent. For 5 torrents, the total number of peers in peer lists will be 5 times
the setting.

You should also lower the corresponding limit for paused torrents. It might even
make sense to set that one lower still, since you only need a few peers to start up
while waiting for the tracker and DHT to give you fresh ones. The max peer list
size for paused torrents is set by settings_pack::max_paused_peerlist_size.

The drawback of lowering this number is that if you end up in a position where
the tracker is down for an extended period of time, your only hope of finding live
peers is to go through your list of all peers you've ever seen. Having a large
peer list will also help increase performance when starting up, since the torrent
can start connecting to peers in parallel with connecting to the tracker.

send buffer watermark
---------------------

The send buffer watermark controls when libtorrent will ask the disk I/O thread
to read blocks from disk and append them to a peer's send buffer.

When the number of bytes in the send buffer is less than or equal to
settings_pack::send_buffer_watermark, the peer will ask the disk I/O thread
for more data to send. The trade-off here is between wasting memory by having too
much data in the send buffer, and hurting send rate by starving out the socket,
waiting for the disk read operation to complete.

If your main objective is memory usage and you're not concerned about being able
to achieve high send rates, you can set the watermark to 9 bytes. This will guarantee
that no more than a single (16 kiB) block will be in the send buffer at a time, for
all peers. This is the least amount of memory possible for the send buffer.

You should benchmark your max send rate when adjusting this setting.
If you have
a very fast disk, you are less likely to see a performance hit.

reduce executable size
----------------------

Compilers generally add a significant number of bytes to executables that make use
of C++ exceptions. By disabling exceptions (-fno-exceptions on GCC), you can
reduce the executable size by up to 45%. In order to build without exception
support, you need to patch parts of boost.

Also make sure to optimize for size when compiling.

Another way of reducing the executable size is to disable code that isn't used.
There are a number of ``TORRENT_*`` macros that control which features are included
in libtorrent. If these macros are used to strip down libtorrent, make sure the same
macros are defined when building libtorrent as when linking against it. If they
differ, the structures will look different from the libtorrent side and from
the client side, and memory corruption will follow.

One macro that is probably safe to define is ``TORRENT_NO_DEPRECATE``, which removes
all deprecated functions and struct members. As long as no deprecated functions are
relied upon, this should be a simple way to eliminate a little bit of code.

For all available options, see the `building libtorrent`_ section.

.. _`building libtorrent`: building.html

high performance seeding
========================

In the case of a high volume seed, there are two main concerns: performance and scalability.
This translates into high send rates, and low memory and CPU usage per peer connection.

file pool
---------

libtorrent keeps an LRU file cache. Each file that is opened is kept in the cache. The main
reason for this is anti-virus software that hooks file-open and file-close to
scan the file. Anti-virus software that does that will significantly increase the cost of
opening and closing files.
However, for a high performance seed, the file open/close might
be so frequent that it becomes a significant cost. It might therefore be a good idea to allow
a large file descriptor cache. Adjust this through settings_pack::file_pool_size.

Don't forget to set a high rlimit for file descriptors in your process as well. This limit
must be high enough to keep all connections and files open.

disk cache
----------

You typically want to set the cache size as high as possible. The
settings_pack::cache_size is specified in 16 kiB blocks. Since you're seeding,
the cache would be useless unless you also set settings_pack::use_read_cache
to true.

In order to increase the possibility of read cache hits, set the
settings_pack::cache_expiry to a large number. This won't degrade anything as
long as the client is only seeding, and not downloading any torrents.

There's a *guided cache* mode. This means the size of the read cache line that's
stored in the cache is determined based on the upload rate to the peer that
triggered the read operation. The idea is that slow peers don't use up a
disproportionate amount of space in the cache. This is enabled through
settings_pack::guided_read_cache.

In cases where the assumption is that the cache is only used as a read-ahead, and that no
other peer will ever request the same block while it's still in the cache, the read
cache can be set to be *volatile*. This means that every block that is requested out of
the read cache is removed immediately. This saves a significant amount of cache space
which can be used as read-ahead for other peers. To enable volatile read cache, set
settings_pack::volatile_read_cache to true.

uTP-TCP mixed mode
------------------

libtorrent supports uTP_, which has a delay based congestion controller.
In order to
avoid having a single TCP bittorrent connection completely starve out any uTP connection,
there is a mixed mode algorithm. This attempts to detect congestion on the uTP peers and
throttle TCP to avoid it taking over all bandwidth. This balances the bandwidth resources
between the two protocols. When running on a network where bandwidth is so
abundant that it's virtually infinite, this algorithm is no longer necessary, and might
even be harmful to throughput. It is advised to experiment with the
settings_pack::mixed_mode_algorithm, setting it to settings_pack::prefer_tcp.
This setting entirely disables the balancing and un-throttles all connections. On a typical
home connection, this would mean that none of the benefits of uTP would be preserved
(the modem's send buffer would be full at all times) and uTP connections would for the most
part be squashed by the TCP traffic.

.. _`uTP`: utp.html

send buffer low watermark
-------------------------

libtorrent uses a low watermark for send buffers to determine when a new piece should
be requested from the disk I/O subsystem, to be appended to the send buffer. The low
watermark is determined based on the send rate of the socket. It needs to be large
enough to not drain the socket's send buffer before the disk operation completes.

The watermark is bounded by a max value, to avoid buffer sizes growing out of control.
The default max send buffer size might not be enough to sustain very high upload rates,
and you might have to increase it. It's specified in bytes in
settings_pack::send_buffer_watermark.

peers
-----

First of all, in order to allow many connections, set the global connection limit
high, settings_pack::connections_limit. Also set the upload rate limit to
infinite, settings_pack::upload_rate_limit, 0 means infinite.

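A high connection limit only helps if the process is allowed to open that many
sockets. As noted in the file pool section, the file descriptor rlimit must cover
all connections and open files. The following is a minimal POSIX sketch of that
bookkeeping and is not part of libtorrent: ``required_fds()`` and
``raise_fd_limit()`` are hypothetical helper names, and the margin for other
descriptors is an assumption to size for your own process.

```cpp
#include <sys/resource.h>
#include <algorithm>

// Hypothetical helper: descriptors needed for peer sockets, pooled files
// and a margin for everything else the process keeps open.
rlim_t required_fds(rlim_t connections_limit, rlim_t file_pool_size,
                    rlim_t margin)
{
    return connections_limit + file_pool_size + margin;
}

// Hypothetical helper: raise the soft RLIMIT_NOFILE towards `wanted`,
// clamped to the hard limit. Returns the soft limit in effect afterwards,
// or 0 if the current limit could not be queried.
rlim_t raise_fd_limit(rlim_t wanted)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return 0;

    rlim_t const previous = rl.rlim_cur;
    rlim_t const target = std::min(wanted, rl.rlim_max);
    if (target > previous) {
        rl.rlim_cur = target;
        // if the kernel refuses, keep reporting the old soft limit
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0) return previous;
    }
    return rl.rlim_cur;
}
```

With illustrative numbers, a seed configured for 8000 connections and a file pool
of 500 could call ``raise_fd_limit(required_fds(8000, 500, 100))`` at startup and
lower its libtorrent limits if the returned value is smaller than requested.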

When dealing with a large number of peers, it might be a good idea to have slightly
stricter timeouts, to get rid of lingering connections as soon as possible.

There are a couple of relevant settings: settings_pack::request_timeout,
settings_pack::peer_timeout and settings_pack::inactivity_timeout.

For seeds that are critical for a delivery system, you most likely want to allow
multiple connections from the same IP. That way two people from behind the same NAT
can use the service simultaneously. This is controlled by
settings_pack::allow_multiple_connections_per_ip.

In order to always unchoke peers, turn off automatic unchoke by setting
settings_pack::choking_algorithm to settings_pack::fixed_slots_choker and set the number
of upload slots to a large number via settings_pack::unchoke_slots_limit,
or use -1 (which means infinite).

torrent limits
--------------

To seed thousands of torrents, you need to increase the settings_pack::active_limit
and settings_pack::active_seeds.

SHA-1 hashing
-------------

When downloading at very high rates, it is possible to have the CPU be the
bottleneck for passing every downloaded byte through SHA-1. In order to enable
calculating SHA-1 hashes in parallel, on multi-core systems, set
settings_pack::aio_threads to the number of threads libtorrent should
perform I/O and do SHA-1 hashing in. It only makes sense to increase the number
of threads if the existing thread is close to saturating one core.

scalability
===========

In order to make more efficient use of the libtorrent interface when running a large
number of torrents simultaneously, one can use the ``session::get_torrent_status()`` call
together with ``session::post_torrent_updates()``.
Keep in mind that every call into
libtorrent that returns some value has to block your thread while posting a message to
the main network thread and then wait for a response. Calls that don't return any data
will simply post the message and then immediately return, performing the work
asynchronously. The time this takes might become significant once you reach a
few hundred torrents, depending on how many calls you make to each torrent and how often.
``session::get_torrent_status()`` lets you query the status of all torrents in a single call.
This will actually loop through all torrents and run a provided predicate function to
determine whether or not to include each one in the returned vector.

To use ``session::post_torrent_updates()`` torrents need to have the ``flag_update_subscribe``
flag set. When post_torrent_updates() is called, a state_update_alert
is posted, with all the torrents that have updated since the last time this
function was called. The client has to keep its own state of all torrents, and
update it based on this alert.

benchmarking
============

There is a bunch of built-in instrumentation of libtorrent that can be used to get an insight
into what it's doing and how well it performs. This instrumentation is enabled by defining
preprocessor symbols when building.

There are also a number of scripts that parse the log files and generate graphs (requires
gnuplot and python).

understanding the disk threads
==============================

*This section is somewhat outdated, there is potentially more than one disk
thread*

All disk operations are funneled through a separate thread, referred to as the
disk thread. The main interface to the disk thread is a queue where disk jobs
are posted, and the results of these jobs are then posted back on the main
thread's io_service.

A disk job is essentially one of:

1. write this block to disk, i.e. a write job. For the most part this is just a
   matter of sticking the block in the disk cache, but if we've run out of
   cache space or completed a whole piece, we'll also flush blocks to disk.
   This is typically very fast, since the OS just sticks these buffers in its
   write cache which will be flushed at a later time, presumably when the drive
   head will pass the place on the platter where the blocks go.

2. read this block from disk. The first thing that happens is we look in the
   cache to see if the block is already in RAM. If it is, we'll return
   immediately with this block. If it's a cache miss, we'll have to hit the
   disk. Here we decide to defer this job. We find the physical offset on the
   drive for this block and insert the job in an ordered queue, sorted by the
   physical location. At a later time, once we don't have any more non-read
   jobs left in the queue, we pick one read job out of the ordered queue and
   service it. The order we pick jobs out of the queue is according to an
   elevator cursor moving up and down along the ordered queue of read jobs. If
   we have enough space in the cache we'll read read_cache_line_size number of
   blocks and stick those in the cache. This defaults to 32 blocks. If the
   system supports asynchronous I/O (Windows, Linux, Mac OS X, BSD, Solaris for
   instance), jobs will be issued immediately to the OS. This especially
   increases read throughput, since the OS has a much greater flexibility to
   reorder the read jobs.

Other disk jobs consist of operations that need to be synchronized with the
disk I/O, like renaming files, closing files, flushing the cache, updating the
settings etc. These are relatively rare though.

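The elevator ordering described for read jobs can be sketched in a few lines.
This is a simplified illustration and not libtorrent's actual implementation:
the ``elevator_queue`` class and its members are hypothetical names, a job is
reduced to its physical offset, and the cursor is assumed to start at offset 0
moving upwards.

```cpp
#include <cstdint>
#include <iterator>
#include <set>

// Hypothetical illustration of an elevator-ordered queue of read jobs.
class elevator_queue {
public:
    // Queue a read job, identified here only by its physical offset.
    void push(std::int64_t physical_offset) { jobs_.insert(physical_offset); }

    bool empty() const { return jobs_.empty(); }

    // Pick the next job in the current sweep direction. When no job is left
    // ahead of the cursor, reverse the direction and sweep back.
    // Returns false if the queue is empty.
    bool pop(std::int64_t& out)
    {
        if (jobs_.empty()) return false;
        if (going_up_) {
            auto it = jobs_.lower_bound(cursor_);   // first job at or above cursor
            if (it == jobs_.end()) { going_up_ = false; return pop(out); }
            out = take(it);
            return true;
        }
        auto it = jobs_.upper_bound(cursor_);       // first job above cursor
        if (it == jobs_.begin()) { going_up_ = true; return pop(out); }
        out = take(std::prev(it));                  // last job at or below cursor
        return true;
    }

private:
    // Remove the job from the queue and move the cursor to its offset.
    std::int64_t take(std::multiset<std::int64_t>::iterator it)
    {
        cursor_ = *it;
        jobs_.erase(it);
        return cursor_;
    }

    std::multiset<std::int64_t> jobs_; // kept sorted by physical offset
    std::int64_t cursor_ = 0;
    bool going_up_ = true;
};
```

Sweeping in one direction until nothing is left ahead of the cursor keeps
consecutive reads close together on the drive. A job that arrives behind the
cursor is not serviced immediately; it waits for the return sweep.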

contributions
=============

If you have added instrumentation for some part of libtorrent that is not
covered here, or if you have improved any of the parser scripts, please consider
contributing it back to the project.

If you have run tests and found that some algorithm or default value in
libtorrent is suboptimal, please contribute that knowledge back as well, to
allow us to improve the library.

If you have additional suggestions on how to tune libtorrent for any specific
use case, please let us know and we'll update this document.