Hadoop On Demand
================

1. Introduction:
================

The Hadoop On Demand (HOD) project is a system for provisioning and
managing independent Hadoop MapReduce instances on a shared cluster
of nodes. HOD uses a resource manager for allocation. At present it
supports Torque (http://www.clusterresources.com/pages/products/torque-resource-manager.php)
out of the box.

2. Feature List:
================

The following are the features provided by HOD:

2.1 Simplified interface for managing MapReduce clusters:

The MapReduce user interacts with the cluster through a simple
command line interface, the HOD client. HOD brings up a virtual
MapReduce cluster with the required number of nodes, which the
user can use for running Hadoop jobs. When the user is done, HOD
automatically cleans up the resources and makes the nodes available
again.

2.2 Automatic installation of Hadoop:

With HOD, Hadoop does not even need to be installed on the cluster.
The user can provide a Hadoop tarball that HOD will automatically
distribute to all the nodes in the cluster.

2.3 Configuring Hadoop:

Dynamic parameters of the Hadoop configuration, such as the NameNode
and JobTracker addresses and ports, and file system temporary
directories, are generated by HOD and distributed automatically to all
nodes in the cluster.

In addition, HOD allows the user to configure Hadoop parameters
at both the server (e.g. JobTracker) and client (e.g. JobClient)
level, including 'final' parameters, which were introduced with
Hadoop 0.15.

2.4 Auto-cleanup of unused clusters:

HOD has an automatic timeout so that users cannot hold on to resources
they aren't using. The timeout applies only when there is no MapReduce
job running.

2.5 Log services:

HOD can be used to collect all MapReduce logs to a central location
for archiving and inspection after the job is completed.

3. HOD Components
=================

This is a brief overview of the various components of HOD and how they
interact to provision Hadoop.

HOD Client: The HOD client is a Unix command that users run to allocate
Hadoop MapReduce clusters. The command provides other options to list
allocated clusters and deallocate them. The HOD client generates the
hadoop-site.xml in a user-specified directory. The user can point to
this configuration file while running MapReduce jobs on the allocated
cluster.

RingMaster: The RingMaster is a HOD process that is started on one node
of every allocated cluster. It is submitted as a 'job' to the resource
manager by the HOD client. It controls which Hadoop daemons start on
which nodes. It provides this information to other HOD processes,
such as the HOD client, so users can also determine this information.
The RingMaster is responsible for hosting and distributing the
Hadoop tarball to all nodes in the cluster. It also automatically
cleans up unused clusters.

HodRing: The HodRing is a HOD process that runs on every allocated node
in the cluster. These processes are run by the RingMaster through the
resource manager, using its parallel execution facility. The HodRings
are responsible for launching Hadoop commands on the nodes to bring up
the Hadoop daemons. They get the commands to launch from the RingMaster.

Hodrc / HOD configuration file: An INI-style configuration file in which
users configure various options for the HOD system, including
install locations of different software, resource manager parameters,
log and temp file directories, parameters for their MapReduce jobs,
etc.

Submit Nodes: Nodes where the HOD client is run, and from which jobs are
submitted to the resource manager system for allocating and running
clusters.

Compute Nodes: Nodes which get allocated by a resource manager,
and on which the Hadoop daemons are provisioned and started.

4. Next Steps:
==============

- Read getting_started.txt to get an idea of how to get started with
  installing, configuring and running HOD.

- Read config.txt to get more details on configuration options for HOD.
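
As a rough picture of what "INI-style" means for the hodrc file described
in section 3, the sketch below parses a small sample with Python's standard
configparser module. It is only an illustration: the section names, option
names and paths in the sample are assumptions made for this example, and
load_hodrc is a hypothetical helper, not part of HOD. config.txt remains
the reference for the options HOD actually accepts.

```python
import configparser

# Hypothetical hodrc contents. The section names, option names and paths
# are illustrative assumptions; config.txt lists the real options.
SAMPLE_HODRC = """
[hod]
temp-dir = /tmp/hod
log-dir = /tmp/hod/logs

[resource_manager]
id = torque
batch-home = /usr/local/torque

[ringmaster]
temp-dir = /tmp/hod

[hodring]
temp-dir = /tmp/hod
"""


def load_hodrc(text):
    """Parse INI-style text into a {section: {option: value}} dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser.items(name)) for name in parser.sections()}


config = load_hodrc(SAMPLE_HODRC)
for section, options in config.items():
    print(section, options)
```

Grouping options into one section per component (client, resource manager,
RingMaster, HodRing) mirrors the component list in section 3; whether a
given HOD release uses exactly this layout should be checked against
config.txt.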