    Dolly - A program to clone disks / partitions
    ---------------------------------------------

            Version 0.58C
            24 March 2005
           Felix Rauch <rauch@inf.ethz.ch>


This document describes the program "dolly", its purpose and the
format of the required config file.


Purpose
-------

Dolly is used to clone the installation of one machine to (possibly
many) other machines. It can distribute image files (even bzipped),
partitions or whole hard disk drives to other partitions or hard disk
drives. As it forms a "virtual TCP ring" to distribute data, it works
best with fast switched networks (we were able to clone a 2 GB Windows
NT partition to 15 machines in our cluster over Gigabit Ethernet in
less than 4 minutes).

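As a rough check of that figure, the sketch below computes the dataflow on each link of the ring and the aggregate dataflow (data written on all clients together per second). It assumes that "2 GB" means 2048 MB and uses exactly 240 s for "less than 4 minutes", so the results are lower bounds; both function names are hypothetical and not taken from dolly.c.

```c
/* Hypothetical helpers, not taken from dolly.c.
 * Every link in the ring carries the whole image once, so the
 * per-link dataflow is simply size / time. */
double dataflow_mb_s(double megabytes, double seconds)
{
    return megabytes / seconds;
}

/* Aggregate dataflow: the image is written on every client, so the
 * total amount of data delivered is size * clients. */
double agg_dataflow_mb_s(double megabytes, int clients, double seconds)
{
    return megabytes * clients / seconds;
}
```

With 2048 MB, 15 clients and 240 s this gives about 8.5 MB/s per link and 128 MB/s in aggregate; since the run took less than 4 minutes, the real values were somewhat higher.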
As dolly clones whole partitions block-wise, it works for most
filesystems. We used it to clone partitions of the following types:
Linux, Windows NT, Oberon, Solaris (most of our machines have
multi-boot setups). We have a small (additional) Linux installation
on all of our machines or use a small one-floppy-disk Linux (e.g.
muLinux) to do the cloning. On newer machines we use PXE to boot a
small system in a RAM disk. From that system we then clone the hard
disks in the machines.


How it works
------------

Setting up or upgrading a cluster of PCs typically leads to the
problem that many machines need the exact same files. There are
different approaches to distribute the setup of one "master" machine
to all the other machines in the cluster. Our approach is not
sophisticated, but simple and fast (at least for fast switched
networks). We send the data around in a "virtual TCP ring" from the
server to all the clients, which store the received data on their
local disks.

One machine is the master and distributes the data to the others. The
master can be a machine of the cluster or some other machine (in the
current version of dolly it should have the same architecture,
though). It either stores the image of the partition or disk to be
cloned or has the partition on a local disk. Like all the other
machines, the server should be on a fast switched network for fast
cloning.

All other machines are clients. They receive the data from the ring,
store it to the local disk and send it to the next machine in the
ring. It is important to note that all of this happens at the same
time.

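The receive-store-forward step of each client can be sketched with plain POSIX read() and write(). This is a minimal illustration, not dolly's actual code: relay() and write_all() are hypothetical names, the 64 KiB buffer size is arbitrary, and passing -1 as next_fd stands for the last client in the ring, which only stores the data.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes, retrying on short writes
 * and on EINTR. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical client loop: read a block from the previous machine,
 * store it on the local disk, then forward it to the next machine.
 * next_fd == -1 marks the last client in the ring.
 * Returns the number of bytes relayed, or -1 on error. */
long long relay(int in_fd, int disk_fd, int next_fd)
{
    char buf[64 * 1024];
    long long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)                 /* previous machine closed: end of data */
            return total;
        if (write_all(disk_fd, buf, (size_t)n) < 0)
            return -1;
        if (next_fd >= 0 && write_all(next_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```

Because every client forwards each block as soon as it has stored it, all machines in the ring are busy at once, which is why the ring is not much slower than a single copy.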
The cloning process is depicted in the following two figures. Usually
there are more than two clients, but you get the idea:

      +--------+  +----------+ +----------+
      | Master |  | Client 1 | | Client 2 |
      +----+---+  +---|------+ +----+-----+
            \         |            /
             \    +---+----+      /
              +---+ Switch |-----+
                  +--------+

        Cloning process, physical network


     +--------+  Data   +----------+  Data  +----------+
     | Master |-------->| Client 1 |------->| Client 2 |
     +--------+         +----------+        +----------+
         ^                   |                   |
         | Data              | Data              | Data
         |                   V                   V
      +------+            +------+            +------+
      | Disk |            | Disk |            | Disk |
      +------+            +------+            +------+

     Cloning process, virtual network with TCP connections


We chose this method instead of a multicast scheme because it is
simple to implement, doesn't require writing a reliable multicast
protocol and works quite well with existing technologies. One could
also use the master as an NFS server and copy the data to each client,
but this puts quite a high load on the server and makes it the
bottleneck. Furthermore, it would not be possible to directly clone
partitions that contain no filesystem from one machine to others.


Different cloning possibilities
-------------------------------

There are different possibilities to clone your master machine:

- You already have an image of the partition which you want to clone
  on your master (raw or compressed). In this case you need Linux
  (some other UNIX might also work, but we haven't tested that yet) on
  your master and a Linux on each client.

- You want to clone a partition which is on a local disk of your
  master. In this case you need Linux (or probably another UNIX, we
  haven't tried that) on your master as well as on all the clients.
  You can use any Linux installation as long as it's not the one you
  want to clone (i.e. you cannot clone the Linux which you are
  currently running; see the warning below).

- You want to clone a whole disk including all the partitions. In this
  case you either need a second disk on all machines, on which the
  Linux used for the cloning process runs (not the one you want to
  clone), or you need a small one-floppy-disk Linux which you boot on
  all machines. In the latter case you also need dolly on all machines
  (copy it to your floppy disk or mount it with NFS) and the config
  file on the master.

WARNING: You can NOT clone an OS which is currently in use. That is why
         we have a small second Linux installation on all of our machines
         (or a small system that can be booted over the network by PXE),
         which we can boot to clone our regular Linux partition.


Changes since version 0.2
-------------------------

We have applied some changes to Dolly since version 0.2. Most of them
are not very important.

- Dolly as a benchmarking tool.
  Dolly can now be used to benchmark your network. In dummy mode,
  Dolly will not access the hard disk, neither for reading nor for
  writing. It just transfers data between your machines. This might be
  useful for testing the throughput of your switch. The running time
  for such a run can be specified with the "-t" option on the command
  line. With the "-o" option you can specify a logfile where Dolly
  will write some statistical information.

- Using extra network interfaces.
  It's now possible to use multiple network interfaces for the data
  transfer. This is mostly useful if you have multiple network
  interfaces with similar speeds, e.g. two Fast Ethernet networks (one
  for administration/logins and the other for your applications'
  communication). For example, if your machines are connected with two
  Fast Ethernet links, then you should be able to increase the
  throughput of the cloning process from 10 to 20 MB/s, thereby
  cutting the cloning time in half.
  You need the "add" option in the config file to use this feature.
  WARNING: This feature has only been tested with the linear network
  topology (no fanout option or "fanout 1" option in the config file).

- Different networking topologies.
  We tried different topologies (binary trees, ternary trees, ...) to
  get some more results for a paper, but the initial multi-drop chain
  (virtual TCP ring) is still the best. You will most likely not need
  this feature.


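The tree topologies mentioned above (selected with the "fanout" keyword of the configuration file) can be illustrated with heap-style numbering: the server is node 0, the clients are nodes 1..n, and node i sends to nodes fanout*i+1 through fanout*i+fanout. This numbering is an assumption made for illustration, and tree_children() is a hypothetical name; dolly.c may assign the links differently. With a fanout of 1 the tree degenerates to the linear chain (virtual TCP ring).

```c
/* Hypothetical helper, not taken from dolly.c.
 * Nodes are numbered 0 (server) .. nnodes-1 (clients). With heap-style
 * numbering, node i forwards data to nodes fanout*i+1 .. fanout*i+fanout.
 * Writes the successors of node i into out[] (which must hold at least
 * fanout entries) and returns how many there are (0 for a leaf). */
int tree_children(int i, int fanout, int nnodes, int out[])
{
    int count = 0;

    for (int k = 1; k <= fanout; k++) {
        int child = fanout * i + k;
        if (child >= nnodes)
            break;
        out[count++] = child;
    }
    return count;
}
```

In a 16-node setup, a fanout of 1 gives each node exactly one successor up to the last node, while a fanout of 2 lets the server feed two clients directly; the deeper the tree, the more outgoing streams each inner node has to serve.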
Change in version 0.57
----------------------

Besides some bug fixes and smaller improvements, it's now possible to
split an image into multiple files for archival and send the
multiple-file image to the clients. This allows you to store
arbitrarily long partitions on file systems with a file size limit.
For details and examples, see the section about the configuration
file below (parameters infile and outfile).


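Splitting can be sketched with two small helpers: one builds the name of the n-th part (base name, underscore, part number, counting from 0 as in the da_all.bz2_0 example of the configuration section), and one turns a size such as "2G" into bytes. Both helpers are hypothetical; in particular, treating k, M, G and T as powers of 1024 is an assumption, since this README does not say whether they are binary or decimal multipliers.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: name of part number idx of a split image,
 * e.g. "/images/cloneimages/da" and 3 -> "/images/cloneimages/da_3".
 * Returns 0 on success, -1 if the buffer is too small. */
int split_part_name(char *buf, size_t len, const char *base, unsigned idx)
{
    int n = snprintf(buf, len, "%s_%u", base, idx);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical parser for "<n>(k|M|G|T)" as used after "split" in an
 * outfile entry. ASSUMPTION: multipliers are powers of 1024.
 * Returns 0 and stores the size in *bytes, or -1 on a malformed value. */
int parse_split_size(const char *s, unsigned long long *bytes)
{
    char *end;
    unsigned long long n = strtoull(s, &end, 10);
    int shift;

    if (end == s || n == 0)
        return -1;
    switch (*end) {
    case 'k': shift = 10; break;
    case 'M': shift = 20; break;
    case 'G': shift = 30; break;
    case 'T': shift = 40; break;
    default:  return -1;            /* the syntax requires a multiplier */
    }
    if (end[1] != '\0' || n > (~0ULL >> shift))
        return -1;                  /* trailing characters or overflow */
    *bytes = n << shift;
    return 0;
}
```

In this sketch's terms, a client writing byte offset off of the stream would put it into part number off / size.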
Change in version 0.58
----------------------

Thanks to David Mathog, dolly is now able to read data from its
standard input or write data to its standard output. That means that
you can e.g. pipe a tar stream through dolly. Whether that feature is
useful or not depends on your situation. By using tar (instead of
cloning the whole partition) your disks' reads and writes will be
slower, but you only transfer the data that is actually needed. This
feature might be most useful in situations where e.g. your
disks/partitions are mostly empty or have different sizes/geometries.

Please note that version 0.58 has not yet been thoroughly tested (I'm
no longer working with clusters). E.g. it is not yet clear what
happens when somebody tries to reach you with the "write", "talk" or
"wall" commands while dolly is running (which might potentially
interfere with your stdin/stdout, see below).

Note also that since all of dolly's output is now written to stderr
(instead of stdout as before), some third-party scripts might no
longer work.

To use the feature, you should specify /dev/stdin as your infile
and/or /dev/stdout as your outfile.


Change in version 0.58C
-----------------------

Again, thanks to David Mathog, dolly can now be run without an explicit
sync() at the end of the cloning process (option "-n"). This can speed
up dolly's runtime considerably when cloning smaller files, but there
is no guarantee that the data actually made it to the disk if there is
e.g. a power loss right after dolly finished.


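The effect of "-n" can be pictured as skipping a single call at the end of a run. In this minimal sketch, finish_output() is a hypothetical name rather than a function from dolly.c; the output descriptor is closed after an optional sync(). On Linux sync() waits until the writeback is done, whereas POSIX only requires it to schedule the writes.

```c
#include <unistd.h>

/* Hypothetical end-of-run helper. Unless no_sync is set (option -n),
 * flush all dirty file system buffers before closing the output. */
int finish_output(int fd, int no_sync)
{
    if (!no_sync)
        sync();             /* can take long with a large page cache */
    return close(fd);
}
```

A narrower alternative would be fsync(fd), which flushes only the file or device behind fd instead of all file systems.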
Configuration file
------------------

You need a configuration file for the cloning process. Its format is
strict, but easy. It contains the following entries (note that the
order of the entries is fixed):
(The text after "Syntax:" explains the syntax of the entry, the lines
following "EG:" are example lines.)

1. The file/partition you want to clone, preceded by the keyword
   "infile", or "compressed infile" in case of a compressed image.
   This file or partition needs to be available on the master only.
   Dolly will warn you if you try to use a compressed infile which
   does not end with ".bz2". The compressed keyword is important so
   that the master can inform the clients when they have to use bzcat
   before writing a file. The optional keyword "split" after the
   filename instructs Dolly to read all files with the given name and
   an appended number, separated by an underscore.
   Syntax: [compressed] infile <input file or device> [split]
   EG: infile /dev/da10
       Will just send the partition /dev/da10 to all clients.
   EG: compressed infile /images/cloneimages/da10_WinNTRes.bz2
       Will send the given file compressed to all the clients,
       instructing them to uncompress the image before writing it.
   EG: infile /images/cloneimages/da split
       Will send all files of the form /images/cloneimages/da_<number>
       in order to the clients.
   EG: compressed infile /images/cloneimages/da.bz2 split
       Will send all files of the form /images/cloneimages/da.bz2_<number>
       in order to the clients, instructing them to decompress the
       incoming stream before writing it.

2. The file or partition you want to write (usually it's a partition,
   but you can also write to a file) after the keyword "outfile". This
   file needs to be available on the clients only. The optional
   keyword "compressed" instructs the server to compress the data
   before sending it, so the client will store the data
   compressed. The optional keyword "split" after the filename,
   followed by a number and a multiplier, instructs the client to
   write the data in chunks of no more than the given size. This is
   useful if the file system on your client does not allow files
   greater than a certain size. The files will be stored with the
   given name and an appended "_<number>".
   Syntax: [compressed] outfile <output file or device> [split <n>(k|M|G|T)]
   EG: outfile /dev/da10
       Will store the incoming data stream to the partition da10.
   EG: compressed outfile /images/cloneimages/da10_SuSE81.bz2
       Will store the compressed data stream in the given file.
   EG: compressed outfile /images/cloneimages/da_all.bz2 split 2G
       Will store the incoming compressed data stream in the directory
       /images/cloneimages/ in files da_all.bz2_0, da_all.bz2_1 and so on.

-. Instead of the first two entries ("infile" and "outfile") it is
   also possible to use the single line "dummy [<MB>]", where <MB> is
   the number of megabytes to transfer in dummy mode. If <MB> is set
   to 0, then the clients will just terminate. This is useful when
   benchmarking with different options, so the clients can run all the
   time. To finally terminate them on all clients, just set dummy to 0.
   NOTE: It is probably better to use the newer "-t" switch on the
   server to specify the number of seconds the benchmarks should
   run. In that case you can leave <MB> blank.
   Syntax: dummy [<MB>]
   EG: dummy 128

-. The optional keyword "segsize" is mostly used to benchmark
   switches. It specifies the maximal size of TCP segments during the
   network transfer. Usually you don't need to specify this option at
   all.
   Syntax: segsize <TCP_MAXSEG size>
   EG: segsize 128

-. With the optional keyword "add" it is possible to add more
   interfaces to use. The network traffic is then evenly distributed
   across the interfaces. This option is useful if you have, for
   example, two Fast Ethernet interfaces in your machines: one for
   administrative purposes and one for your main application on the
   cluster. This option is not so useful if you have multiple
   interfaces with different bandwidths. In this case just use the
   fastest available.
   You have to specify the number of additional interfaces and the
   suffixes of those interfaces. For example, in a cluster where the
   machines are named slave0..slave15 on their default interfaces and
   all the machines have a second interface named
   slave0-fast..slave15-fast, you should use the line specified below
   (EG).
   Syntax: add <nr>:<suffix>{:<suffix>}
   EG: add 1:-fast

-. The optional keyword "fanout" was mostly used during performance
   tests of different network topologies. You will rarely need it in
   practice. Fanout specifies the number of outlinks from the server
   and the following machines (except the leaves). A fanout of 1 is a
   linear list (the default behaviour of Dolly and usually the
   fastest), 2 is a binary tree, 3 is a ternary tree, etc. Dolly
   automatically connects all the specified clients with the desired
   topology.
   Syntax: fanout <fanout>
   EG: fanout 1

-. The optional keyword "hyphennormal" instructs Dolly to treat the '-'
   character in hostnames as any other character. By default the
   hyphen is used to separate the base hostnames from the names of the
   different interfaces (e.g. "node12-giga"). You might use this
   parameter if your hostnames include a hyphen (like e.g. "node-12").
   Syntax: hyphennormal
   EG: hyphennormal

3. After the keyword "server" follows the hostname of the server (or
   master). This is required for the last machine in the ring to be
   able to send the end-acknowledge back to the server.
   Syntax: server <master machine>
   EG: server cluster-master

4. This entry has the keyword "firstclient" followed by the hostname
   of the first client in the ring. You should use the hostname of the
   machine here, not the name of the interface where you want to
   connect.
   Syntax: firstclient <name of first machine>
   EG: firstclient cluster-1

5. This entry has the keyword "lastclient" followed by the hostname of
   the last client in the ring. You should use the hostname of the
   machine here, not the name of the interface where you want to
   connect.
   Syntax: lastclient <name of last machine>
   EG: lastclient cluster-9

6. This entry specifies how many clients are in the ring. The keyword
   is "clients" followed by the actual number of clients. This number
   does not include the master.
   Syntax: clients <number of clients>
   EG: clients 9

7. The following lines contain the interface names of the client
   machines. The number of machines must match the above number of
   clients (see 6.). You should use the name of the interface on
   which the machines will receive the data.
   Syntax: <name of client 1>
           <name of client 2>
           [...]
           <name of client n>
   EG: cluster-1-giga
       cluster-2-giga
       [...]
       cluster-9-giga

8. The last entry in the config file consists of the keyword
   "endconfig" and marks the end of the configuration file.
   Syntax: endconfig
   EG: endconfig


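The hostname rules of the "add" and "hyphennormal" entries can be sketched as three string helpers: one parses the value of an "add" line into its suffixes, one appends a suffix to a machine name, and one strips an interface suffix such as "-giga" unless hyphennormal is set. All three are hypothetical and only illustrate the rules described above; in particular, cutting the name at the first hyphen is an assumption about how dolly separates base names from interface names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for the value of an "add" entry, e.g. "2:-fast:-giga".
 * Splits spec in place; suffixes[] then points into spec.
 * Returns the number of suffixes, or -1 if it does not match <nr>. */
int parse_add(char *spec, char *suffixes[], int max)
{
    char *end;
    long nr = strtol(spec, &end, 10);
    int count = 0;

    if (end == spec || *end != ':' || nr < 1 || nr > max)
        return -1;
    for (char *tok = strtok(end + 1, ":"); tok != NULL; tok = strtok(NULL, ":")) {
        if (count == max)
            return -1;
        suffixes[count++] = tok;
    }
    return count == nr ? count : -1;
}

/* Hypothetical helper: name of an additional interface, e.g.
 * "slave3" + "-fast" -> "slave3-fast".
 * Returns 0 on success, -1 if dst is too small. */
int iface_name(char *dst, size_t len, const char *host, const char *suffix)
{
    int n = snprintf(dst, len, "%s%s", host, suffix);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: base name of a host, e.g. "node12-giga" -> "node12".
 * With hyphennormal set, '-' is an ordinary character and the name is
 * kept whole ("node-12" stays "node-12").
 * ASSUMPTION: the base name ends at the first hyphen.
 * Returns 0 on success, -1 if dst is too small. */
int base_hostname(char *dst, size_t len, const char *name, int hyphennormal)
{
    size_t keep = hyphennormal ? strlen(name) : strcspn(name, "-");
    int n = snprintf(dst, len, "%.*s", (int)keep, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The last helper shows why "hyphennormal" exists: without it, a host called "node-12" would be reduced to "node".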
Note on nodes' hostnames
------------------------

On some machines (e.g. with very small maintenance installations),
gethostbyname() does not return the hostname (I don't know why). If
you have that problem, you should make sure that the environment
variables MYNODENAME or HOST are set accordingly. Dolly first tries to
get the environment variable MYNODENAME, then HOST, then it tries
gethostbyname(). This feature was introduced in dolly version 0.58.


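The lookup order can be sketched as follows. my_nodename() is a hypothetical name, and treating an empty variable like an unset one is an assumption of this sketch. As the final fallback it calls POSIX gethostname(), the usual way for a program to obtain its own machine name, so the exact call in dolly.c (the text above mentions gethostbyname()) may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical sketch of the lookup order: MYNODENAME, then HOST,
 * then the system hostname. Empty variables are skipped (assumption).
 * Returns 0 on success, -1 on failure or if buf is too small. */
int my_nodename(char *buf, size_t len)
{
    const char *vars[] = { "MYNODENAME", "HOST" };

    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *v = getenv(vars[i]);
        if (v != NULL && *v != '\0') {
            int n = snprintf(buf, len, "%s", v);
            return (n < 0 || (size_t)n >= len) ? -1 : 0;
        }
    }
    if (len == 0 || gethostname(buf, len) != 0)
        return -1;
    buf[len - 1] = '\0';    /* POSIX leaves truncated names unterminated */
    return 0;
}
```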
Dolly options
-------------

Dolly has a few options which are explained here:

  -h
    Prints a short help and exits.

  -V
    Prints the version number as well as the date of that version and exits.

  -v
    This switches to verbose mode in which dolly prints out a little
    bit more information. This option is recommended if you want to
    know what's going on during cloning and it might be helpful during
    debugging.

  -s
    This option specifies the server machine and should only be used
    on the master. Dolly will warn you if the config file specifies
    another master than the machine on which this option is set.
    This option must be specified before the "-f" option!

  -S
    Same as "-s", but dolly will not warn you if the server's hostname
    and the name specified in the config file do not match.

  -q
    Usually dolly will print a warning when the select() system call
    is interrupted by a signal. This option suppresses these warnings.

  -c
    With this option it is possible to specify the uncompressed size
    of a compressed file. It's only needed for performance statistics
    at the end of a cloning process and not important if you are not
    interested in the statistics.

  -d
    The "Dummy" option disables all disk accesses. It can be used to
    benchmark the throughput of your system (computers, network,
    switches). This option must be specified before the "-f" option!

  -t <seconds>
    When in dummy mode, this option lets you specify approximately how
    long the test run should take. Since the dummy mode is mostly
    used for benchmarking purposes and single runs might result in
    different speeds (especially with many machines and bad switches
    or with small TCP segment sizes), it's more convenient to specify
    the run length in seconds, as the benchmark time becomes more
    predictable.

  -f <config file>
    This option is used to select the config file for this cloning
    process. This option only makes sense on the master machine and
    the configuration file must exist on the master.

  -o <logfile>
    This option specifies the logfile. Dolly will write some
    statistical information into the logfile. It is mostly
    used when benchmarking switches. The format of the lines in the
    logfile is as follows:
    Trans. data  Segsize Clients Time      Dataflow  Agg. dataflow
    [MB]         [Byte]  [#]     [s]       [MB/s]    [MB/s]

  -a <timeout>
    Sometimes it might be useful if Dolly terminated instead of
    waiting indefinitely in case something goes wrong. This option
    lets you specify this timeout. If dolly could not transfer any
    data after <timeout> seconds, then it will simply print an error
    message and terminate. This feature might be especially useful for
    scripted and automatic installations (such as "CloneSys"), where
    you don't want to have dolly processes hang around if a machine
    hangs.

  -n
    Do not sync() before exit. Thus, dolly will exit sooner, but data
    may not make it to disk if power fails soon after dolly exits.


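Options "-a" and "-q" both revolve around select(). The following minimal sketch shows how a timeout on a single descriptor could be implemented; wait_readable() and read_timeout() are hypothetical names, and restarting the wait after EINTR (the case "-q" silences warnings about) is a design choice of this sketch, not necessarily what dolly.c does. Each restart begins the full timeout again.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_s seconds until fd is readable.
 * Returns 1 if data (or end of file) is available, 0 on timeout and
 * -1 on error. If select() is interrupted by a signal, a warning is
 * printed unless quiet is set (compare "-q"), and the wait restarts
 * with the full timeout. */
int wait_readable(int fd, long timeout_s, int quiet)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv = { .tv_sec = timeout_s, .tv_usec = 0 };
        int r;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        r = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (r >= 0)
            return r > 0 ? 1 : 0;
        if (errno != EINTR)
            return -1;
        if (!quiet)
            fprintf(stderr, "warning: select() interrupted by a signal\n");
    }
}

/* Hypothetical caller in the spirit of "-a <timeout>": give up if no
 * data arrives within timeout_s seconds.
 * Returns bytes read (0 at end of file), -1 on error, -2 on timeout. */
ssize_t read_timeout(int fd, void *buf, size_t len, long timeout_s, int quiet)
{
    int r = wait_readable(fd, timeout_s, quiet);

    if (r <= 0)
        return r == 0 ? -2 : -1;
    return read(fd, buf, len);
}
```

On a -2 result, a caller would print an error message and exit, which matches the behaviour the "-a" description promises.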
Starting the process
--------------------

To start the cloning, you need to start dolly on each machine. It is
recommended to start it with the "-v" (verbose) option. The order in
which you start the programs on the master and the clients doesn't
matter. You must give the "-s" (server) option on exactly one machine
(the master).

When the machines have found each other and the ring is completed, the
cloning starts. Dolly will print some progress information every
10 MBytes.


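Printing progress every 10 MBytes amounts to checking whether the running byte count has crossed a 10 MB boundary since the last block. In this sketch, crossed_report_step() and report_progress() are hypothetical, the message text is made up, and "MByte" is assumed to mean 1024 * 1024 bytes.

```c
#include <stdio.h>

#define REPORT_STEP (10ULL * 1024 * 1024)   /* ASSUMPTION: 10 MiB */

/* Hypothetical check: returns 1 if adding a block moved the total
 * from before to after across at least one 10 MB boundary. */
int crossed_report_step(unsigned long long before, unsigned long long after)
{
    return after / REPORT_STEP > before / REPORT_STEP;
}

/* Hypothetical progress line; written to stderr, where dolly sends
 * all of its output since version 0.58. */
void report_progress(unsigned long long total)
{
    fprintf(stderr, "Transferred %llu MB\n", total / (1024 * 1024));
}
```

Comparing block counts rather than testing total % REPORT_STEP == 0 keeps the report working when block sizes do not divide 10 MB evenly.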
Example
-------

In this example we assume a cluster of 16 machines, named
node0..node15. We want to clone the partition da5 from node0 to all
other nodes. The configuration file (let's name it dollytab.cfg)
should then look as follows:
  infile /dev/da5
  outfile /dev/da5
  server node0
  firstclient node1
  lastclient node15
  clients 15
  node1
  node2
  node3
  node4
  node5
  node6
  node7
  node8
  node9
  node10
  node11
  node12
  node13
  node14
  node15
  endconfig
Next, we start Dolly on all the clients. No options are required for
the clients (but you might want to add the "-v" option for verbose
progress reports). Finally, Dolly is started on the server as follows:
  dolly -v -s -f dollytab.cfg
That's all.


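For larger clusters, typing the client list by hand is error-prone. The following sketch writes the mandatory entries in the fixed order described in the configuration section, for clients named <prefix><first> through <prefix><last>. write_dollytab() and its parameters are hypothetical; it emits no optional keywords (split, add, fanout, ...) and assumes, as in the example, that hostnames and receiving interface names are the same.

```c
#include <stdio.h>

/* Hypothetical generator for a dolly config file.
 * Emits infile, outfile, server, firstclient, lastclient, clients,
 * the client list and endconfig, in that order.
 * Returns 0 on success, -1 on invalid arguments or a write error. */
int write_dollytab(FILE *f, const char *infile, const char *outfile,
                   const char *server, const char *prefix, int first, int last)
{
    if (first > last)
        return -1;
    fprintf(f, "infile %s\n", infile);
    fprintf(f, "outfile %s\n", outfile);
    fprintf(f, "server %s\n", server);
    fprintf(f, "firstclient %s%d\n", prefix, first);
    fprintf(f, "lastclient %s%d\n", prefix, last);
    fprintf(f, "clients %d\n", last - first + 1);
    for (int i = first; i <= last; i++)
        fprintf(f, "%s%d\n", prefix, i);
    fprintf(f, "endconfig\n");
    return ferror(f) ? -1 : 0;
}
```

Calling write_dollytab(f, "/dev/da5", "/dev/da5", "node0", "node", 1, 15) produces the same 22 lines as the dollytab.cfg shown above.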
Bibliography
------------

Felix Rauch, Christian Kurmann, Thomas M. Stricker: "Optimizing the
distribution of large data sets in theory and practice". Concurrency
and Computation: Practice and Experience, volume 14, issue 3, pages
165-181, April 2002. (c) John Wiley & Sons, Ltd.

Maintained by Felix Rauch.
http://www.cs.inf.ethz.ch/~rauch/