$Id: README,v 1.11 2008/02/27 19:35:46 garbled Exp $

Welcome to clusterit-2.5!

This is a collection of clustering tools, to turn your ordinary
everyday pile of UNIX workstations into a speedy parallel beast.

To get started quickly, please read the file INSTALL.

Initially this work was based on the work of IBM's PSSP, and copied
heavily from the ideas there.  It's also lightly based on the work
pioneered in GLUnix.  I've decided to simplify, and complexify it
however:

GLUnix is a monstrosity.  It allows better control over the
individual nodes, and much better load sharing.  However I'm convinced
a lot of the speed advantages of having a parallel cluster are lost with
the incredible overhead of running the GLUnix master and daemon services
on a host.  GLUnix does however offer a real parallel programming
environment, something which is totally beyond the scope of this package.

PSSP is also a very powerful set of tools.  Not much more than a bunch
of staples written in Perl, they provide an incredible tool for tying
an unwieldy number of UNIX machines into one fast demon of an MPP.

The advantage of both systems is central control of a large number of
machines.  Unfortunately, they both have drawbacks, as does my solution.

What my solution provides:

*Fast* parallel execution of remote commands.
        C vs. Perl.  You do the math.

Heterogeneous cluster makeup.
        This makes it very easy to administer a large number of machines,
of varying architectures, and operating systems.  The fact that my tools are
completely architecture independent makes it possible to dsh commands out
to machines that aren't even running the same OS!  This can be useful for a
variety of mass administration tasks an admin may have to undertake.

Choice of authentication.
        IBM forces you to use Kerberos 4 for authentication on the SPs.
This is actually fine for a closed environment like an SP, but for something
to be run on just a stack of otherwise useful boxes, you need more freedom.
This suite allows you to do whatever you like: ssh, Kerberos, .rhosts.
Whatever suits your security and speed requirements best.

Sequential node, and random node execution
        The idea here is that these dsh-like programs allow you to do something
akin to load balanced scripting.  For example one could set up an NFS shared
build directory, and issue the command
make -j4 CC='seq gcc'
which would execute a build in parallel, on 4 nodes in your cluster, assigning
processes to each node in sequence.  The run command, which picks a node at
random instead, is equivalent to saying:
"I don't care where you run, just run and tell me how things turned out."

Job Scheduled Shell:

The jsd/jsh pair of programs was specifically designed for parallel
compiling.  The idea is that the user sets up a benchmark program of some
sort, which is executed by the jsd program.  This benchmark then ranks
the machines in the cluster by performance.  When the jsh command is run,
the fastest machine will be given the command to execute.  At the same
time, jsd keeps track of which nodes are in use, and refuses to give a
busy node another command until its current one completes.  In this way,
you can avoid the problem where a single slower machine tends to accumulate
much work because it isn't finishing quickly enough.  It also tends to favor
the fastest machine in a cluster, giving it most of the work in a parallel
compile.

Barrier sync for shell scripting.
        This is a new idea.  The barrier mechanism consists of a daemon run on
a host, and a client which scripts use to synchronize.  An example of use
would be:

#!/bin/sh
do something
barrier -h host -k token -s 5
do something else

Then, you would dsh the execution of this script to your hosts.
The barrier makes sure that all hosts have completed the first "something"
before they continue on to the next something.  The -s is the level of
parallelism for the script, i.e. how many processes to wait for before
continuing.

dvt:

This is a parallel interactive execution environment.  The user is given
windows for each host in the cluster, and a central management window.
Keystrokes typed in the central management window will be relayed to all
of the subordinate windows.  This allows the user to vi a file on 20
machines simultaneously, for example.  You can also select a window, and
use it like a normal xterm, to perform actions on just that host.

What my solution does not provide:

A parallel programming API
        Use MPI, or PVM, or whatever for that.  That's outside the scope of
this suite.

Please visit the ClusterIt homepage for more information
http://clusterit.sourceforge.net/
Tim Rightnour <root@garbled.net>
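
A local sketch of the barrier idea:

The barrier semantics described earlier can be tried out on a single machine
without any of the tools from this suite.  The sketch below is only an
illustration and is not how the barrier client and daemon are implemented: it
replaces the daemon with files in a temporary directory, and the names
local_barrier, worker, SIZE, DIR and LOG are all made up for the example.  SIZE
plays the role of the -s argument, i.e. how many processes must check in before
any of them continues.

```sh
#!/bin/sh
# Local stand-in for the barrier idea.  This is NOT the ClusterIt
# barrier implementation: it uses marker files instead of a daemon.

# local_barrier DIR SIZE ID (hypothetical helper)
# Check in as participant ID, then block until SIZE participants
# have checked in.
local_barrier() {
    : > "$1/arrived.$3"
    while [ "$(ls "$1" | grep -c '^arrived\.')" -lt "$2" ]; do
        sleep 1
    done
}

# worker DIR SIZE ID LOG (hypothetical helper)
# Same shape as the example script: work, barrier, more work.
worker() {
    echo "first $3" >> "$4"          # do something
    local_barrier "$1" "$2" "$3"
    echo "second $3" >> "$4"         # do something else
}

SIZE=3                               # level of parallelism
DIR=$(mktemp -d)                     # shared state for this run
LOG="$DIR/log"

# Start SIZE workers in the background, then wait for all of them.
i=1
while [ "$i" -le "$SIZE" ]; do
    worker "$DIR" "$SIZE" "$i" "$LOG" &
    i=$((i + 1))
done
wait

cat "$LOG"                           # remove "$DIR" by hand when done
```

Every "first" line is written before the matching worker checks in, so no
"second" line can appear in the log until all of the "first" lines are there.
The order within each group is not fixed.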