$Id: README,v 1.11 2008/02/27 19:35:46 garbled Exp $

Welcome to clusterit-2.5!

This is a collection of clustering tools, to turn your ordinary
everyday pile of UNIX workstations into a speedy parallel beast.

To get started quickly, please read the file INSTALL.

Initially this work was based on IBM's PSSP, and copied heavily from
the ideas there.  It's also lightly based on the work pioneered in
GLUnix.  I've decided to both simplify it and complexify it,
however:

GLUnix is a monstrosity.  It allows better control over the
individual nodes, and much better load sharing.  However, I'm convinced
a lot of the speed advantages of having a parallel cluster are lost to
the incredible overhead of running the GLUnix master and daemon services
on a host.  GLUnix does, however, offer a real parallel programming
environment, something which is totally beyond the scope of this package.

PSSP is also a very powerful set of tools.  Not much more than a bunch
of staples written in Perl, they provide an incredible tool for tying
an unwieldy number of UNIX machines into one fast demon of an MPP.

The advantage of both systems is central control of a large number of
machines.  Unfortunately, they both have drawbacks, as does my solution.

What my solution provides:

*Fast* parallel execution of remote commands.
	C vs. Perl.  You do the math.

Heterogeneous cluster makeup.
	This makes it very easy to administer a large number of machines
of varying architectures and operating systems.  The fact that my tools are
completely architecture independent makes it possible to dsh commands out
to machines that aren't even running the same OS!  This can be useful for a
variety of mass administration tasks an admin may have to undertake.

Choice of authentication.
	IBM forces you to use Kerberos 4 for authentication on the SPs.
This is actually fine for a closed environment like an SP, but for something
to be run on just a stack of otherwise useful boxes, you need more freedom.
This suite allows you to do whatever you like: ssh, Kerberos, .rhosts.
Whatever suits your security and speed requirements best.

Sequential node, and random node execution.
	The idea here is that these dsh-like programs allow you to do something
akin to load balanced scripting.  For example, one could set up an NFS shared
build directory, and issue the command
make -j4 CC='seq gcc'
which would execute a build in parallel, on 4 nodes in your cluster, assigning
processes to each node in sequence.  The run command, which picks a node at
random instead, is equivalent to saying:
"I don't care where you run, just run and tell me how things turned out."
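
To make "assigning processes to each node in sequence" concrete, here is a
minimal sketch of round-robin assignment in plain sh.  It does not call seq,
and the node names and the pick_node function are made up for illustration;
how seq itself remembers its position between invocations is not shown.

```sh
#!/bin/sh
# Illustration only: this does NOT call the real seq program.
# It shows round-robin assignment of jobs over a hypothetical node list.
NODES="node1 node2 node3 node4"   # hypothetical host names

# pick_node JOB_INDEX: print the node for a 0-based job index,
# wrapping back to the first node after the last one.
pick_node() {
    job=$1
    # Load the node names into the positional parameters so they
    # can be counted with $# and indexed with shift.
    set -- $NODES
    shift $(( job % $# ))
    echo "$1"
}

# Show where six consecutive compile jobs would land.
for job in 0 1 2 3 4 5; do
    echo "job $job -> $(pick_node "$job")"
done
```

With four nodes, the fifth job wraps around to the first node again, which is
the behaviour the make example above relies on.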

Job Scheduled Shell:

The jsd/jsh pair of programs was specifically designed for parallel
compiling.  The idea is that the user sets up a benchmark program of some
sort, which is executed by the jsd program.  This benchmark then ranks
the machines in the cluster by performance.  When the jsh command is run,
the fastest machine will be given the command to execute.  At the same
time, jsd keeps track of the node being in use, and refuses to give other
commands to it, until it completes.  In this way, you can avoid the
problem where a single slower machine tends to accumulate much work
because it isn't finishing quickly enough.  It also tends to favor the
fastest machine in a cluster, giving it most of the work in a parallel
compile.
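
The scheduling rule described above (hand the job to the fastest node that is
not already busy) can be sketched in plain sh.  This is only a model of the
policy, not jsd itself; the node names, the RANKED and BUSY variables, and the
helper functions are all invented for the example.

```sh
#!/bin/sh
# Illustration only: models the policy, it is not jsd.
# Hypothetical ranking, fastest first, as a benchmark might produce.
RANKED="fast1 mid1 slow1"
BUSY=""    # space-separated list of nodes currently running a job

# next_free: print the fastest idle node, or fail if all are busy.
next_free() {
    for node in $RANKED; do
        case " $BUSY " in
            *" $node "*) ;;                 # busy, skip it
            *) echo "$node"; return 0 ;;    # first idle node wins
        esac
    done
    return 1
}

# claim NODE: mark the node busy.
claim() { BUSY="$BUSY $1"; }

# release NODE: mark the node idle again.
release() {
    new=""
    for n in $BUSY; do
        [ "$n" = "$1" ] || new="$new $n"
    done
    BUSY=$new
}

n1=$(next_free); claim "$n1"    # fastest node takes the first job
n2=$(next_free); claim "$n2"    # next fastest, since the first is busy
release "$n1"                   # first job completes
n3=$(next_free)                 # fastest node is eligible again
echo "$n1 $n2 $n3"
```

The slowest node only receives work when every faster node is occupied, which
is why the fastest machine ends up with most of a parallel compile.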

Barrier sync for shell scripting.
	This is a new idea.  The barrier mechanism consists of a daemon run on
a host, and a client which is used to sync against it.  An example of use
would be:

#!/bin/sh
do something
barrier -h host -k token -s 5
do something else

Then, you would dsh the execution of this script to your hosts.  The barrier
makes sure that all hosts have completed the first "something" before they
continue on to the next something.  The -s is the level of parallelism for
the script, i.e. how many processes to wait for before continuing.
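
To see the barrier semantics without a cluster, the following sketch fakes
them on one machine.  It is an assumption-laden stand-in: a scratch directory
plays the role of the daemon, marker files play the role of the client, and
the workers are local background jobs rather than hosts.  None of the names
here come from the suite.

```sh
#!/bin/sh
# Illustration only: a file-based stand-in for the barrier daemon and
# client, run as local background jobs instead of on separate hosts.
SIZE=3                  # how many processes must arrive (like -s)
dir=$(mktemp -d)        # scratch directory standing in for the daemon

worker() {
    echo "phase1 $1" >> "$dir/log"    # "do something"
    : > "$dir/arrived.$1"             # announce arrival at the barrier
    # Block until SIZE workers have announced themselves.
    while [ "$(ls "$dir" | grep -c '^arrived\.')" -lt "$SIZE" ]; do
        sleep 1
    done
    echo "phase2 $1" >> "$dir/log"    # "do something else"
}

for id in 1 2 3; do
    worker "$id" &
done
wait                    # let every background worker finish
cat "$dir/log"          # the scratch directory is left for inspection
```

However the workers are scheduled, every phase1 line lands in the log before
any phase2 line, which is the guarantee the barrier gives the script above.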

dvt:

This is a parallel interactive execution environment.  The user is given
windows for each host in the cluster, and a central management window.
Keystrokes typed in the central management window will be relayed to all
of the subordinate windows.  This allows the user to vi a file on 20
machines simultaneously, for example.  You can also select a window, and
use it like a normal xterm, to perform actions on just that host.

What my solution does not provide:

A parallel programming API.
	Use MPI, or PVM, or whatever for that; that's outside the scope of
this suite.

Please visit the ClusterIt homepage for more information:
http://clusterit.sourceforge.net/
Tim Rightnour <root@garbled.net>