# htcat #

`htcat` is a utility to perform parallel, pipelined execution of a
single HTTP `GET`.  `htcat` is intended for incantations like:

    htcat https://host.net/file.tar.gz | tar -zx

It is tuned (and only really useful) for faster interconnects:

    $ htcat http://test.com/file | pv -a > /dev/null
    [ 109MB/s]

This was measured on a gigabit network, between an AWS EC2 instance
and S3.  It represents 91% use of the theoretical maximum of gigabit
(119.2 MiB/s).

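The 91% figure can be reproduced with a little arithmetic.  The sketch
below assumes that the `MB/s` reported by `pv` is in binary units
(MiB/s) and that protocol overhead is ignored; `linkMiBPerSec` is a
name made up for this illustration.

```go
package main

import "fmt"

// linkMiBPerSec converts a raw link speed in bits per second to MiB/s.
// It ignores all protocol overhead (an assumption of this sketch).
func linkMiBPerSec(bitsPerSec float64) float64 {
	return bitsPerSec / 8 / (1024 * 1024)
}

func main() {
	limit := linkMiBPerSec(1e9) // gigabit link
	fmt.Printf("theoretical maximum: %.1f MiB/s\n", limit)
	fmt.Printf("utilization at 109 MiB/s: %.0f%%\n", 109/limit*100)
}
```
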
## Installation ##

This program depends on Go 1.1 or later.  One can use `go get` to
download and compile it from source:

    $ go get github.com/htcat/htcat/cmd/htcat

## Help and Reporting Bugs ##

For correspondence of all sorts, write to <htcat@googlegroups.com>.
Bugs can be filed at
[htcat's GitHub Issues page](https://github.com/htcat/htcat/issues).

## Approach ##

`htcat` works by determining the size of the resource at the URL
passed from its `Content-Length`, and then partitioning the work into
a series of `GET`s that use the `Range` header in the request.  The
notable exception is the first issued `GET`, which has no `Range`
header and is used both to start the transfer and to attempt to
determine the size of the resource.

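As a rough illustration of `Range`-based partitioning, here is a
minimal sketch in Go.  It is not taken from `htcat`'s source: the
names `fragment`, `partition` and `newRangeRequest` are hypothetical,
and the `offset` parameter stands in for however many bytes the first,
un-ranged `GET` is assumed to cover.

```go
package main

import (
	"fmt"
	"net/http"
)

// fragment is a hypothetical half-open byte interval [start, end).
type fragment struct{ start, end int64 }

// rangeHeader renders the fragment as an HTTP Range header value.
// HTTP byte ranges are inclusive at both ends, hence end-1.
func (f fragment) rangeHeader() string {
	return fmt.Sprintf("bytes=%d-%d", f.start, f.end-1)
}

// partition splits [offset, total) into fragments of at most size bytes.
func partition(offset, total, size int64) []fragment {
	if size <= 0 {
		return nil
	}
	var frags []fragment
	for start := offset; start < total; start += size {
		end := start + size
		if end > total {
			end = total
		}
		frags = append(frags, fragment{start, end})
	}
	return frags
}

// newRangeRequest builds (but does not send) a GET for one fragment.
func newRangeRequest(url string, f fragment) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", f.rangeHeader())
	return req, nil
}

func main() {
	// Pretend the first GET covers bytes [0, 4) of a 10-byte resource.
	for _, f := range partition(4, 10, 4) {
		req, err := newRangeRequest("https://host.net/file.tar.gz", f)
		if err != nil {
			panic(err)
		}
		fmt.Println(req.Method, req.URL, req.Header.Get("Range"))
	}
}
```
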
Unlike most programs that do similar `Range`-based splitting, `htcat`
limits the requests performed in parallel to some bytes ahead of the
data emitted so far, instead of splitting the entire byte stream
evenly.  The purpose of this is to emit those bytes as soon as
reasonably possible, so that pipelined execution of another tool can,
too, proceed in parallel.

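One way to picture the bounded look-ahead is a window that only hands
out a new fragment while its start lies within some byte budget of the
emitted offset.  The sketch below is an assumption-laden illustration,
not `htcat`'s actual scheduler; the `window` type and its fields are
invented for the example.

```go
package main

import "fmt"

// span is a hypothetical half-open byte interval [start, end).
type span struct{ start, end int64 }

// window tracks how far requests may run ahead of the emitted output.
type window struct {
	total     int64 // size of the whole resource
	fragSize  int64 // size of each ranged GET; must be positive
	lookahead int64 // how far ahead of emitted a fragment may start
	emitted   int64 // bytes already written to the output
	next      int64 // start offset of the next fragment to request
}

// take returns the next fragment to request.  It reports false when the
// resource is exhausted or the fragment would start beyond the look-ahead.
func (w *window) take() (span, bool) {
	if w.next >= w.total || w.next >= w.emitted+w.lookahead {
		return span{}, false
	}
	end := w.next + w.fragSize
	if end > w.total {
		end = w.total
	}
	s := span{w.next, end}
	w.next = end
	return s, true
}

func main() {
	w := window{total: 100, fragSize: 10, lookahead: 30}
	for {
		s, ok := w.take()
		if !ok {
			break
		}
		fmt.Println("request", s)
	}
	// Once the first fragment has been emitted, the window slides.
	w.emitted = 10
	s, ok := w.take()
	fmt.Println("after emitting 10 bytes:", s, ok)
}
```

An even split, by contrast, would request fragments spread over the
entire resource up front, so bytes near the start would not necessarily
arrive first.
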
These requests may complete slightly out of order, and are held in
reserve until contiguous bytes can be emitted by a defragmentation
routine that catenates together the complete, consecutive payloads in
memory for emission.

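The idea of holding fragments until they are contiguous can be
sketched as follows.  This is a simplified illustration and should not
be read as the contents of `defrag.go`; the `defragmenter` type, its
`emit` callback and the map keyed by start offset are all assumptions
made for the example.

```go
package main

import (
	"fmt"
	"os"
)

// defragmenter is a hypothetical reassembly buffer for fragments that
// complete out of order.
type defragmenter struct {
	next    int64                // offset of the next byte to emit
	pending map[int64][]byte     // completed fragments keyed by start offset
	emit    func(p []byte) error // sink for in-order bytes
}

func newDefragmenter(emit func(p []byte) error) *defragmenter {
	return &defragmenter{pending: make(map[int64][]byte), emit: emit}
}

// add records a completed fragment, then emits every held fragment that
// is now contiguous with what has already been emitted.
func (d *defragmenter) add(start int64, payload []byte) error {
	d.pending[start] = payload
	for {
		p, ok := d.pending[d.next]
		if !ok {
			return nil // gap: wait for the missing fragment
		}
		if err := d.emit(p); err != nil {
			return err
		}
		delete(d.pending, d.next)
		d.next += int64(len(p))
	}
}

func main() {
	d := newDefragmenter(func(p []byte) error {
		_, err := os.Stdout.Write(p)
		return err
	})
	// The second fragment arrives first and is held until offset 0 shows up.
	arrivals := []struct {
		start   int64
		payload string
	}{{6, "world"}, {0, "hello "}}
	for _, a := range arrivals {
		if err := d.add(a.start, []byte(a.payload)); err != nil {
			panic(err)
		}
	}
	fmt.Println()
}
```
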
Tweaking the number of simultaneous transfers and the size of each
`GET` makes a trade-off between latency to fill the output pipeline,
memory usage, and churn in requests and connections, which incurs
their associated start-up costs.

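A back-of-the-envelope model can make the trade-off concrete.  The
sketch below assumes that every in-flight fragment may be fully
buffered in memory and that each fragment costs one request; the
function names and the example sizes are illustrative only and are not
`htcat`'s defaults.

```go
package main

import "fmt"

// requestCount is the number of ranged GETs needed to cover total bytes
// with fragments of fragSize bytes (rounded up).
func requestCount(total, fragSize int64) int64 {
	return (total + fragSize - 1) / fragSize
}

// maxBuffered is a rough upper bound on bytes held in memory when
// parallelism fragments of fragSize bytes are in flight at once.
func maxBuffered(parallelism int, fragSize int64) int64 {
	return int64(parallelism) * fragSize
}

func main() {
	const total = 1 << 30 // a 1 GiB resource, chosen for illustration
	for _, fragSize := range []int64{1 << 20, 16 << 20, 64 << 20} {
		fmt.Printf("fragment %2d MiB: %4d requests, up to %3d MiB buffered with 5 in flight\n",
			fragSize>>20, requestCount(total, fragSize), maxBuffered(5, fragSize)>>20)
	}
}
```

Under these assumptions, smaller fragments mean more requests and less
memory, while larger fragments mean the reverse.
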
If `htcat`'s peer on the server side processes `Range` requests more
slowly than regular `GET` without a `Range` header, then `htcat`'s
performance can suffer relative to a simpler, single-stream `GET`.

## Numbers ##

These measurements fall well short of real benchmarks; they are
intended to give a rough sense of the performance improvements that
may be useful to you.  They were taken via an AWS EC2 instance
connecting to S3, and there is definitely some variation between
runs, sometimes very significant, especially at the higher speeds.

|Tool       | TLS | Rate     |
|-----------|-----|----------|
|htcat      | no  | 109 MB/s |
|curl       | no  | 36 MB/s  |
|aria2c -x5 | no  | 113 MB/s |
|htcat      | yes | 59 MB/s  |
|curl       | yes | 5 MB/s   |
|aria2c -x5 | yes | 17 MB/s  |

On somewhat small files, the situation changes: `htcat` chooses
smaller parts, so as to still get some parallelism.

Below are results while performing a 13MB transfer from S3 (Seattle)
to an EC2 instance in Virginia.  Notably, TLS being on or off did not
seem to matter; perhaps in this case it was not a bottleneck.

| Tool   | Time     |
|--------|----------|
| curl   | 5.20s    |
| curl   | 7.75s    |
| curl   | 6.36s    |
| htcat  | 2.69s    |
| htcat  | 2.50s    |
| htcat  | 3.25s    |

Results while performing a transfer of the same 13MB file from S3 to
EC2, but all within Virginia:

| Tool       | TLS | Time     |
|------------|-----|----------|
| curl       | no  | 0.29s    |
| curl       | no  | 0.75s    |
| curl       | no  | 0.44s    |
| htcat      | no  | 0.30s    |
| htcat      | no  | 0.30s    |
| htcat      | no  | 0.48s    |
| curl       | yes | 2.69s    |
| curl       | yes | 2.69s    |
| curl       | yes | 2.62s    |
| htcat      | yes | 1.37s    |
| htcat      | yes | 0.45s    |
| htcat      | yes | 0.59s    |

Results while performing a 4.6MB transfer on a fast (same-region)
link.  This file is small enough that `htcat` disables multi-request
parallelism.  Given that, it's unclear why `htcat` performs markedly
better on the TLS tests than `curl`.

| Tool       | TLS | Time     |
|------------|-----|----------|
| curl       | no  | 0.14s    |
| curl       | no  | 0.13s    |
| curl       | no  | 0.14s    |
| htcat      | no  | 0.23s    |
| htcat      | no  | 0.16s    |
| htcat      | no  | 0.17s    |
| curl       | yes | 0.95s    |
| curl       | yes | 0.97s    |
| curl       | yes | 0.99s    |
| htcat      | yes | 0.38s    |
| htcat      | yes | 0.34s    |
| htcat      | yes | 0.24s    |