| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| .. | 03-May-2022 | - | | |
| afl/ | 30-Mar-2020 | - | 56 | 42 |
| LICENSE.md | 30-Mar-2020 | 755 | 15 | 11 |
| Makefile | 30-Mar-2020 | 1.5 KiB | 75 | 60 |
| README.md | 30-Mar-2020 | 15.8 KiB | 399 | 321 |
| TODO.md | 30-Mar-2020 | 2.3 KiB | 59 | 44 |
| blocks.c | 30-Mar-2020 | 13 KiB | 545 | 341 |
| client.c | 30-Mar-2020 | 2.8 KiB | 109 | 63 |
| compats.c | 30-Mar-2020 | 75.3 KiB | 2,688 | 1,587 |
| configure | 30-Mar-2020 | 73.1 KiB | 2,389 | 2,045 |
| downloader.c | 30-Mar-2020 | 13.7 KiB | 577 | 347 |
| extern.h | 30-Mar-2020 | 12.5 KiB | 393 | 277 |
| fargs.c | 30-Mar-2020 | 3.4 KiB | 138 | 93 |
| flist.c | 30-Mar-2020 | 35.6 KiB | 1,570 | 1,068 |
| hash.c | 30-Mar-2020 | 2.4 KiB | 98 | 55 |
| ids.c | 30-Mar-2020 | 7 KiB | 319 | 205 |
| io.c | 30-Mar-2020 | 14.9 KiB | 728 | 448 |
| log.c | 30-Mar-2020 | 4 KiB | 193 | 128 |
| main.c | 30-Mar-2020 | 13.8 KiB | 549 | 385 |
| md4.c | 30-Mar-2020 | 7.3 KiB | 266 | 162 |
| md4.h | 30-Mar-2020 | 1.4 KiB | 46 | 14 |
| misc.c | 30-Mar-2020 | 2.3 KiB | 84 | 45 |
| mkpath.c | 30-Mar-2020 | 2.3 KiB | 79 | 34 |
| mktemp.c | 30-Mar-2020 | 7.7 KiB | 334 | 227 |
| openrsync.1 | 30-Mar-2020 | 7.3 KiB | 306 | 305 |
| receiver.c | 30-Mar-2020 | 11.4 KiB | 489 | 319 |
| rsync.5 | 30-Mar-2020 | 12.9 KiB | 526 | 525 |
| rsyncd.5 | 30-Mar-2020 | 3.9 KiB | 136 | 135 |
| sender.c | 30-Mar-2020 | 16.5 KiB | 695 | 414 |
| server.c | 30-Mar-2020 | 3.8 KiB | 166 | 101 |
| session.c | 30-Mar-2020 | 3.7 KiB | 161 | 108 |
| socket.c | 30-Mar-2020 | 11.2 KiB | 504 | 322 |
| symlinks.c | 30-Mar-2020 | 2.4 KiB | 105 | 67 |
| tests.c | 30-Mar-2020 | 11.8 KiB | 660 | 514 |
| uploader.c | 30-Mar-2020 | 23.9 KiB | 1,029 | 674 |

README.md

# Introduction

**This system has been merged into OpenBSD base.  If you'd like to
contribute to openrsync, please mail your patches to tech@openbsd.org.
This repository is simply the OpenBSD version plus some glue for
portability.**

This is an implementation of [rsync](https://rsync.samba.org/) with a
BSD (ISC) license.  It's compatible with a modern rsync (3.1.3 is used
for testing, but any version supporting protocol 27 will do), but accepts
only a subset of rsync's command-line arguments.

Its officially-supported operating system is OpenBSD, but it will
compile and run on other UNIX systems.  See [Portability](#Portability)
for details.

The canonical documentation for openrsync is its manual pages.  See
[rsync(5)](https://github.com/kristapsdz/openrsync/blob/master/rsync.5)
and
[rsyncd(5)](https://github.com/kristapsdz/openrsync/blob/master/rsyncd.5)
for protocol details or utility documentation in
[openrsync(1)](https://github.com/kristapsdz/openrsync/blob/master/openrsync.1).
If you'd like to write your own rsync implementation, the protocol
manpages should have all the information required.

The [Architecture](#Architecture) and [Algorithm](#Algorithm) sections
on this page serve to introduce developers to the source code.  They are
non-canonical.

## Project background

openrsync is written as part of the
[rpki-client(1)](https://medium.com/@jobsnijders/a-proposal-for-a-new-rpki-validator-openbsd-rpki-client-1-15b74e7a3f65)
project, an
[RPKI](https://en.wikipedia.org/wiki/Resource_Public_Key_Infrastructure)
validator for OpenBSD.  openrsync was funded by
[NetNod](https://www.netnod.se), [IIS.SE](https://www.iis.se),
[SUNET](https://www.sunet.se) and [6connect](https://www.6connect.com).

# Installation

On an up-to-date UNIX system, simply download and run:

```
% ./configure
% make
# make install
```

This will install the openrsync utility and manual pages.
It's ok to have an installation of rsync at the same time: the two will
not collide in any way.

If you upgrade your sources and want to re-install, just run the same.
If you'd like to uninstall the sources:

```
# make uninstall
```

If you'd like to interact with openrsync as a server, you can run
the following:

```
% rsync --rsync-path=openrsync src/* dst
% openrsync --rsync-path=openrsync src/* dst
```

If you'd like openrsync and rsync to interact, it's important to use
command-line flags available on both.
See
[openrsync(1)](https://github.com/kristapsdz/openrsync/blob/master/openrsync.1)
for a listing.

# Algorithm

For a robust description of the rsync algorithm, see "[The rsync
algorithm](https://rsync.samba.org/tech_report/)", by Andrew Tridgell
and Paul Mackerras.
Andrew Tridgell's PhD thesis, "[Efficient Algorithms for Sorting and
Synchronization](https://www.samba.org/~tridge/phd_thesis.pdf)", covers the
topics in more detail.
This section gives a description suitable for delving into the source code.

The rsync algorithm has two components: the *sender* and the *receiver*.
The sender manages source files; the receiver manages the destination.
In the first invocation below, the sender is host *remote* and the
receiver is the local host; in the second, the roles are reversed.

```
% openrsync -lrtp remote:foo/bar ~/baz/xyzzy
% openrsync -lrtp ~/foo/bar remote:baz/xyzzy
```

The algorithm hinges upon a file list of names and metadata (e.g., mode,
mtime, etc.) shared between components.
The file list describes all source files of the update and is generated
by the sender.
The sharing is implemented in
[flist.c](https://github.com/kristapsdz/openrsync/blob/master/flist.c).

After sharing this list, both the receiver and sender independently sort
the entries by the filenames' lexicographical order.
This allows the file list to be sent and received out of order.
The ordering preserves a directory-first order, so directories are
processed before their contained files.
Moreover, once sorted, both sender and receiver may refer to file
entries by their position in the sorted array.
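
The sort-then-index idea can be sketched in a few lines of C.
This is an illustration only: `struct fentry`, `flist_sort()` and `flist_index()` are hypothetical names, not the ones used in flist.c, and the sketch assumes plain byte-wise `strcmp(3)` ordering.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical file-list entry: only the fields this sketch needs. */
struct fentry {
	const char	*path;	/* path relative to the transfer root */
	unsigned int	 mode;	/* file mode, as carried in the metadata */
};

/* qsort(3)/bsearch(3) comparator: byte-wise order of the path names. */
static int
fentry_cmp(const void *a, const void *b)
{
	const struct fentry *fa = a, *fb = b;

	return strcmp(fa->path, fb->path);
}

/* Sort the list in place; afterwards an entry is named by its index. */
void
flist_sort(struct fentry *fl, size_t n)
{
	assert(fl != NULL);
	qsort(fl, n, sizeof(*fl), fentry_cmp);
}

/* Return the index of "path" in a sorted list, or -1 if absent. */
long
flist_index(const struct fentry *fl, size_t n, const char *path)
{
	struct fentry key = { path, 0 };
	const struct fentry *hit;

	hit = bsearch(&key, fl, n, sizeof(*fl), fentry_cmp);
	return hit == NULL ? -1 : (long)(hit - fl);
}
```

Because a directory name is a strict prefix of the paths beneath it, byte-wise ordering places `dir` before `dir/a`, which is the directory-first property described above.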

After the receiver reads the list, it iterates through each file in
the list, passing information to the sender so that the sender may send
back instructions to update the file.
This is called the "block exchange" and is the mainstay of the rsync
algorithm.
During the block exchange, the sender waits to receive a request for
update or an end-of-sequence message; once a request is received, it scans
for new blocks to send to the receiver.

Once the block exchange is complete, the files are all up to date.

The receiver is implemented in
[receiver.c](https://github.com/kristapsdz/openrsync/blob/master/receiver.c);
the sender, in
[sender.c](https://github.com/kristapsdz/openrsync/blob/master/sender.c).
A great deal of the block exchange happens in
[blocks.c](https://github.com/kristapsdz/openrsync/blob/master/blocks.c).

## Block exchange

The block exchange sequence differs depending on whether the file is a
directory, symbolic link, or regular file.

For symbolic links, the information required by the receiver is already
encoded in the file list metadata.
The symbolic link is updated to point to the correct target.
No update is requested from the sender.

For directories, the directory is created if it does not already exist.
No update is requested from the sender.

Regular files are handled as follows.
First, the file is checked to see if it's up to date.
A file is considered up to date if its size and last modification time
are the same as those in the file list.
If so, no update is requested from the sender.

Otherwise, the receiver examines each file in blocks of a fixed size.
See [Block sizes](#block-sizes) for details.
(The terminal block may be smaller if the file size is not divisible by
the block size.)
If the file is empty or does not exist, it will have zero blocks.
Each block is hashed twice: first, with a fast Adler-32 type 4-byte
hash; second, with a slower MD4 16-byte hash.
These hashes are implemented in
[hash.c](https://github.com/kristapsdz/openrsync/blob/master/hash.c).
The receiver sends the file's block hashes to the sender.
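
The block arithmetic described above can be written down directly.
The sketch below is illustrative: `struct blklayout` and `blk_layout()` are hypothetical names, and it only computes how many blocks a file has and how long the terminal block is.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of how a file divides into blocks. */
struct blklayout {
	uint64_t	 count;	/* number of blocks, terminal one included */
	uint64_t	 last;	/* length of the terminal block (0 if none) */
};

struct blklayout
blk_layout(uint64_t filesz, uint64_t blksz)
{
	struct blklayout l;

	assert(blksz > 0);
	l.count = filesz / blksz;
	l.last = filesz % blksz;
	if (l.last > 0)
		l.count++;		/* short terminal block */
	else if (l.count > 0)
		l.last = blksz;		/* file divides evenly */
	return l;
}
```

For a 1500-byte file and 700-byte blocks this yields three blocks with a 100-byte terminal block; an empty file yields zero blocks.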

Once accepted, the sender examines the corresponding file with the given
blocks.
For each byte in the source file, the sender computes a fast hash given
the block size.
It then looks for matching fast hashes in the sent block information.
If it finds a match, it then computes and checks the slow hash.
If no match is found, it continues to the next byte.
The matching (and indeed all block operations) is implemented in
[blocks.c](https://github.com/kristapsdz/openrsync/blob/master/blocks.c).
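
Computing a hash at every byte offset is affordable because an Adler-32 type checksum can be "rolled" forward one byte in constant time.
The sketch below follows the weak checksum described in the Tridgell and Mackerras report; `weak_sum()` and `weak_roll()` are hypothetical names, and the result is not claimed to be bit-compatible with the fast hash in hash.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Weak checksum of buf[0..len-1]: low 16 bits hold a, high 16 bits b. */
uint32_t
weak_sum(const unsigned char *buf, size_t len)
{
	uint32_t a = 0, b = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + buf[i]) & 0xffff;	/* running sum of bytes */
		b = (b + a) & 0xffff;		/* running sum of the sums */
	}
	return a | (b << 16);
}

/*
 * Slide a window of "len" bytes forward by one: "out" leaves at the
 * front, "in" enters at the back.  Costs O(1) rather than O(len).
 */
uint32_t
weak_roll(uint32_t sum, size_t len, unsigned char out, unsigned char in)
{
	uint32_t a = sum & 0xffff, b = sum >> 16;

	assert(len > 0);
	a = (a - out + in) & 0xffff;
	b = (b - (uint32_t)len * out + a) & 0xffff;
	return a | (b << 16);
}
```

A sender built this way would call `weak_sum()` once for the first window and `weak_roll()` for each following byte, computing the slow hash only when the weak value matches one sent by the receiver.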

When a match is found, the data prior to the match is first sent as a
stream of bytes to the receiver.
This is followed by an identifier for the found block, or zero if no
more data is forthcoming.

The receiver writes the stream of bytes first, then copies the data in
the identified block if one has been specified.
This continues until the end of file, at which point the file has been
fully reconstituted.
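
The receiver's side of this exchange can be sketched as a loop over "literal bytes, then optional block" instructions.
The `struct token` representation and `rebuild()` are hypothetical and operate on in-memory buffers; the wire encoding of these instructions is specified in rsync(5) and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one sender instruction. */
struct token {
	const char	*lit;	/* literal bytes preceding the block */
	size_t		 litsz;	/* number of literal bytes */
	long		 blk;	/* basis block to copy, or -1 for none */
};

/*
 * Rebuild a file into "out" from the old copy ("basis") and the
 * sender's tokens.  Returns the number of bytes written, or
 * (size_t)-1 if "out" is too small or a block index is out of range.
 */
size_t
rebuild(char *out, size_t outsz, const char *basis, size_t basissz,
    size_t blksz, const struct token *tok, size_t ntok)
{
	size_t i, off, len, pos = 0;

	assert(blksz > 0);
	for (i = 0; i < ntok; i++) {
		/* First the literal stream... */
		if (tok[i].litsz > outsz - pos)
			return (size_t)-1;
		if (tok[i].litsz > 0)
			memcpy(out + pos, tok[i].lit, tok[i].litsz);
		pos += tok[i].litsz;

		/* ...then the matched block, if one was named. */
		if (tok[i].blk < 0)
			continue;
		off = (size_t)tok[i].blk * blksz;
		if (off >= basissz)
			return (size_t)-1;
		len = basissz - off < blksz ? basissz - off : blksz;
		if (len > outsz - pos)
			return (size_t)-1;
		memcpy(out + pos, basis + off, len);
		pos += len;
	}
	return pos;
}
```

When the basis file is missing, every token carries only literal data and no block index, which matches the case where the whole file is sent as a stream of bytes.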

If the file does not exist on the receiver side---the basis case---the
entire file is sent as a stream of bytes.

Following this, the whole file is hashed using an MD4 hash.
These hashes are then compared; and on success, the algorithm continues
to the next file.

## Block sizes

The block size algorithm plays a crucial role in the protocol
efficiency.
In general, the block size is the rounded square root of the total file
size.
The minimum block size, however, is 700 B.
Otherwise, the square root computation is simply
[sqrt(3)](https://man.openbsd.org/sqrt.3) followed by
[ceil(3)](https://man.openbsd.org/ceil.3).

For reasons unknown, the square root result is rounded up to the nearest
multiple of eight.
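
A minimal sketch of this computation follows.
It makes two assumptions beyond the text: files smaller than 700 × 700 bytes simply get the minimum block size, and the square root is taken with integer arithmetic (to avoid linking against libm) rather than with `sqrt(3)` and `ceil(3)`.
`block_size()` and `BLKSZ_MIN` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define BLKSZ_MIN	700	/* minimum block size in bytes */

/* Smallest r such that r * r >= n, i.e. ceil(sqrt(n)), in integers. */
static uint64_t
ceil_sqrt(uint64_t n)
{
	uint64_t lo = 0, hi = (uint64_t)1 << 32, mid;

	/* Binary search; mid stays below 2^32, so mid * mid cannot overflow. */
	while (lo < hi) {
		mid = lo + (hi - lo) / 2;
		if (mid * mid >= n)
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}

uint64_t
block_size(uint64_t filesz)
{
	uint64_t len;

	if (filesz < (uint64_t)BLKSZ_MIN * BLKSZ_MIN)
		return BLKSZ_MIN;
	len = ceil_sqrt(filesz);
	len = (len + 7) & ~(uint64_t)7;	/* round up to a multiple of 8 */
	assert(len >= BLKSZ_MIN && len % 8 == 0);
	return len;
}
```

Under these assumptions a 1,000,000-byte file gets 1000-byte blocks, while a file of 1,000,001 bytes has a rounded-up root of 1001 and therefore 1008-byte blocks.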

# Architecture

Each openrsync session is divided into a running *server* and *client*
process.
The client openrsync process is executed by the user.

```
% openrsync -rlpt host:path/to/source dest
```

The server openrsync is executed on a remote host either on-demand over
[ssh(1)](https://man.openbsd.org/ssh.1) or as a persistent network
daemon.
If executed over [ssh(1)](https://man.openbsd.org/ssh.1), the server
openrsync is distinguished from a client (user-started) openrsync by the
**--server** flag.

Once the client or server openrsync process starts, it examines the
command-line arguments to determine whether it's in *receiver* or
*sender* mode.
(The daemon is sent the command-line arguments in a protocol-specific
way described in
[rsyncd(5)](https://github.com/kristapsdz/openrsync/blob/master/rsyncd.5),
but otherwise does the same thing.)
The receiver is the destination for files; the sender is the origin.
There is always one receiver and one sender.

The server process is explicitly instructed that it is a sender with the
**--sender** command-line flag, otherwise it is a receiver.
The client process implicitly determines its status by looking at the
files passed on the command line for whether they are local or remote.

```
openrsync path/to/source host:destination
openrsync host:source path/to/destination
```

In the first example, the client is the sender: it *sends* data from
itself to the server.
In the second, the opposite is true in that it *receives* data.

The client's command-line files may have any of the following host
specifications that determine locality.

- local: *../path/to/source ../another*
- remote server: *host:path/to/source :path/to/another*
- remote daemon: *rsync://host/module/path ::another*

Host specifications must be consistent: sources must all be local or all
be remote on the same host.  The source and destination may not both be
remote.  (**Aside**: it's technically possible to do this.  I'm not sure
why the GPL rsync is limited to one or the other.)
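
The three host specifications above can be told apart with a little string inspection.
The sketch below is an approximation: it assumes a colon names a host only when it appears before the first slash, and `classify()` and `client_is_sender()` are hypothetical names rather than the routines in main.c or fargs.c.

```c
#include <assert.h>
#include <string.h>

enum locality { LOC_LOCAL, LOC_REMOTE, LOC_DAEMON };

/* Classify one command-line file argument by its host specification. */
enum locality
classify(const char *arg)
{
	size_t pos;

	assert(arg != NULL);
	if (strncmp(arg, "rsync://", 8) == 0)
		return LOC_DAEMON;

	/* Assumed rule: a colon counts only before the first slash. */
	pos = strcspn(arg, ":/");
	if (arg[pos] != ':')
		return LOC_LOCAL;
	return arg[pos + 1] == ':' ? LOC_DAEMON : LOC_REMOTE;
}

/*
 * Return 1 if the client sends, 0 if it receives, and -1 if both
 * source and destination are remote (not permitted).  A purely local
 * copy is reported as 1 by this sketch.
 */
int
client_is_sender(const char *src, const char *dst)
{
	int rsrc = classify(src) != LOC_LOCAL;
	int rdst = classify(dst) != LOC_LOCAL;

	if (rsrc && rdst)
		return -1;
	return !rsrc;
}
```

With the two invocations shown earlier, `client_is_sender("path/to/source", "host:destination")` reports that the client sends, and swapping the roles reports that it receives.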

If the source or destination is on a remote server, the client then
[fork(2)](https://man.openbsd.org/fork.2)s and starts the server
openrsync on the remote host over
[ssh(1)](https://man.openbsd.org/ssh.1).
The client and the server subsequently communicate over
[socketpair(2)](https://man.openbsd.org/socketpair.2) pipes.
If on a remote daemon, the client does *not* fork, but instead connects
to the standalone server with a network
[socket(2)](https://man.openbsd.org/socket.2).
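
The fork-and-socketpair arrangement might look roughly like the following.
This is a generic sketch of the pattern, not the project's code: `spawn_child()` is a hypothetical helper, and the command to run (for instance an `ssh` invocation that starts `openrsync --server` on the remote host) is left to the caller.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <assert.h>
#include <unistd.h>

/*
 * Start argv[0] as a child whose stdin and stdout are one end of a
 * socketpair(2).  Returns the parent's end, or -1 on failure; the
 * child's pid is stored in *pid.
 */
int
spawn_child(char *const argv[], pid_t *pid)
{
	int fds[2];

	assert(argv != NULL && argv[0] != NULL && pid != NULL);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
		return -1;

	if ((*pid = fork()) == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (*pid == 0) {
		/* Child: wire the socket to stdin/stdout, then exec. */
		close(fds[0]);
		if (dup2(fds[1], STDIN_FILENO) == -1 ||
		    dup2(fds[1], STDOUT_FILENO) == -1)
			_exit(127);
		if (fds[1] > STDOUT_FILENO)
			close(fds[1]);
		execvp(argv[0], argv);
		_exit(127);	/* reached only if the exec failed */
	}

	/* Parent: keep our end, drop the child's. */
	close(fds[1]);
	return fds[0];
}
```

The parent then reads and writes the returned descriptor as if it were a network connection, and should eventually `waitpid(2)` for the child.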

The server's command-line, whether passed to an openrsync spawned on-demand
over an [ssh(1)](https://man.openbsd.org/ssh.1) session or passed to the daemon,
differs from the client's.

```
openrsync --server [--sender] . files...
```

The files given are either the single destination directory when in receiver
mode, or the list of sources when in sender mode.
The standalone full-stop is a mystery to me.

Locality detection and routing to client and server run-times are
handled in
[main.c](https://github.com/kristapsdz/openrsync/blob/master/main.c).
The client for a server is implemented in
[client.c](https://github.com/kristapsdz/openrsync/blob/master/client.c)
and the server in
[server.c](https://github.com/kristapsdz/openrsync/blob/master/server.c).
The client for a network daemon is in
[socket.c](https://github.com/kristapsdz/openrsync/blob/master/socket.c).
Invocation of the remote server openrsync is managed in
[child.c](https://github.com/kristapsdz/openrsync/blob/master/child.c).

Once the client and server begin, they start to negotiate the transfer
of files over the connected socket.
The protocol used is specified in
[rsync(5)](https://github.com/kristapsdz/openrsync/blob/master/rsync.5).
For daemon connections, the
[rsyncd(5)](https://github.com/kristapsdz/openrsync/blob/master/rsyncd.5)
protocol is also used for handshaking.

The receiver side is managed in
[receiver.c](https://github.com/kristapsdz/openrsync/blob/master/receiver.c)
and the sender in
[sender.c](https://github.com/kristapsdz/openrsync/blob/master/sender.c).

The receiver side technically has two functions: not only must it upload
block metadata to the sender, it must also handle data writes as they
are sent by the sender.
The rsync protocol is designed so that the sender receives block
requests and continuously sends data to the receiver.

To accomplish this, the receiver multitasks as the *uploader* and
*downloader*.  These roles are implemented in
[uploader.c](https://github.com/kristapsdz/openrsync/blob/master/uploader.c)
and
[downloader.c](https://github.com/kristapsdz/openrsync/blob/master/downloader.c),
respectively.
The multitasking takes place by a finite state machine driven by data
coming from the sender and files on disc as they are ready to be
checksummed and uploaded.

The uploader scans through the list of files and asynchronously opens
files to process blocks.
While it waits for the files to open, it relinquishes control to the
event loop.
When files are available, it hashes and checksums blocks and uploads to
the sender.

The downloader waits on data from the sender.
When data is ready (and prefixed by the file it will update), the
downloader asynchronously opens the existing file to perform any block
copying.
When the file is available for reading, it then continues to read data
from the sender and copy from the existing file.
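
One way to picture the event loop is a single `poll(2)` call that decides which role runs next.
The sketch below is a simplification with hypothetical names (`next_event()`, `enum event`); serving the sender's descriptor first is an arbitrary choice made here, not a statement about how receiver.c prioritises work.

```c
#include <assert.h>
#include <poll.h>

enum event { EV_NONE, EV_SENDER, EV_FILE, EV_ERROR };

/*
 * Wait up to "timeout_ms" for work.  EV_SENDER means the downloader
 * has data to read; EV_FILE means a local file is ready for the
 * uploader.  Either descriptor may be -1 to leave it out.
 */
enum event
next_event(int fd_sender, int fd_file, int timeout_ms)
{
	struct pollfd pfd[2];
	int rc;

	assert(timeout_ms >= -1);
	pfd[0].fd = fd_sender;		/* poll(2) ignores negative fds */
	pfd[0].events = POLLIN;
	pfd[0].revents = 0;
	pfd[1].fd = fd_file;
	pfd[1].events = POLLIN;
	pfd[1].revents = 0;

	rc = poll(pfd, 2, timeout_ms);
	if (rc == -1)
		return EV_ERROR;
	if (rc == 0)
		return EV_NONE;		/* timed out with nothing to do */
	if (pfd[0].revents & (POLLIN | POLLHUP))
		return EV_SENDER;
	if (pfd[1].revents & (POLLIN | POLLHUP))
		return EV_FILE;
	return EV_ERROR;
}
```

A receiver loop built on this would call the downloader's step function on `EV_SENDER` and the uploader's on `EV_FILE`, returning to `next_event()` whenever either would otherwise block.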

## Differences from rsync

The design of rsync involves another mode running alongside the
receiver: the generator.
This is implemented as another process
[fork(2)](https://man.openbsd.org/fork.2)ed from the receiver, and
communicating with the receiver and sender.

In openrsync, the generator and receiver are one process, and an event
loop is used for speedy responses to read and write requests.

# Security

Besides the usual defensive programming, openrsync makes significant use
of native security features.

The system operations available to executing code are foremost limited
by OpenBSD's [pledge(2)](https://man.openbsd.org/pledge.2).  The pledges
given depend upon the operating mode.  For example, the receiver needs
write access to the disc---but only when not in dry-run mode (**-n**).
The daemon client needs DNS and network access, but only to a point.
[pledge(2)](https://man.openbsd.org/pledge.2) allows available resources
to be limited over the course of operation.

The second tool is OpenBSD's
[unveil(2)](https://man.openbsd.org/unveil.2), which limits access to
the file-system.  This protects against rogue attempts to "break out" of
the destination.  It's an attractive alternative to
[chroot(2)](https://man.openbsd.org/chroot.2) because it doesn't require
root permissions to execute.

On the receiver side, the file-system is
[unveil(2)](https://man.openbsd.org/unveil.2)ed at and beneath the
destination directory.
After the creation of the destination directory, only targets within
that directory may be accessed or modified.

On the sender side, input files (and directories) are
[unveil(2)](https://man.openbsd.org/unveil.2)ed.
After the generation of the file list, only sources specified or within
specified directories may be accessed.
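
For readers unfamiliar with these calls, a receiver-side confinement could be sketched as below.
The promise and permission strings are illustrative assumptions, not the sets openrsync actually uses, and `sandbox_receiver()` is a hypothetical helper; on systems other than OpenBSD the sketch compiles to a no-op.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Confine a receiver to the destination directory "dest".
 * Returns 0 on success and -1 on failure.  Outside OpenBSD this is
 * a no-op that always succeeds.
 */
int
sandbox_receiver(const char *dest, int dry_run)
{
	assert(dest != NULL);
#ifdef __OpenBSD__
	/* Expose only the destination; read-only for a dry run. */
	if (unveil(dest, dry_run ? "r" : "rwc") == -1)
		return -1;

	/*
	 * Omitting the "unveil" promise here also forbids any further
	 * unveil(2) calls, so the view of the file-system is final.
	 */
	if (pledge(dry_run ? "stdio rpath" :
	    "stdio rpath wpath cpath fattr chown", NULL) == -1)
		return -1;
#else
	(void)dry_run;
#endif
	return 0;
}
```

The point of the sketch is the ordering: unveil the paths that will be needed, then pledge down to the smallest set of operations the mode requires.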

Lastly, the MD4 hashes are seeded with
[arc4random(3)](https://man.openbsd.org/arc4random.3) instead of with
[time(3)](https://man.openbsd.org/time.3).  (This function is provided
on a number of operating systems.) This is only applicable when running
openrsync in server mode, as the server generates the seed.

# Portability

Many have asked about portability.

The only officially-supported operating system is OpenBSD, as this has
considerable security features.  openrsync does, however, use
[oconfigure](https://github.com/kristapsdz/oconfigure) for compilation
on non-OpenBSD systems.  This is to encourage porting.

The actual work of porting is matching the security features provided by
OpenBSD's [pledge(2)](https://man.openbsd.org/pledge.2) and
[unveil(2)](https://man.openbsd.org/unveil.2).  These are critical
elements to the functionality of the system.  Without them, your system
accepts arbitrary data from the public network.

This is possible (I think?) with FreeBSD's
[Capsicum](https://man.freebsd.org/capsicum(4)), but Linux's security
facilities are a mess, and will take an expert hand to properly secure.

**rsync has specific running modes for the super-user**.
It also pumps arbitrary data from the network onto your file-system.
openrsync is about 10 000 lines of C code: do you trust me not to make
mistakes?