Changes between Version 16 and Version 17 of ComputeStartDefault
- Timestamp: 2010-11-09T10:43:28+01:00
ComputeStartDefault
Why do we need two different kinds of nodes in the system if these nodes have the same functionality? A workflow operation is an execution of a bioinformatics analysis tool, which is invoked from a command line. The usual output is files plus a standard command-line output and/or error. The difference between the two kinds of Worker nodes lies in the way analysis tools are invoked from them. Resident Worker starts a job by submitting a shell script to the cluster job scheduler. In contrast to Resident Worker, Extra Worker invokes an analysis tool directly; in this way, the cluster scheduler can be circumvented.

Extra Workers are pre-started and stopped by Resident Worker. Resident Worker receives a command from Resource Manager and starts Extra Workers by submitting a start-up script to the cluster scheduler. After being started, Extra Workers contact Job Manager and register themselves. In practice, pre-starting many Extra Workers for direct parallel execution of analysis operations can take more time than submitting the same operations as scripts to the cluster scheduler. Furthermore, running many Extra Workers in the system increases the network load on the Job Manager node.
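The two invocation styles can be sketched as follows. This is a minimal illustration only: the function names, the tool arguments, and the use of `qsub` as the scheduler submission command are assumptions for the sketch, not part of the framework's actual API.

```python
import subprocess

def resident_worker_run(job_script: str) -> str:
    """Resident Worker style: hand a shell script to the cluster
    scheduler and let it decide when and where the tool runs.
    (Assumes a qsub-style scheduler front end.)"""
    result = subprocess.run(["qsub", job_script],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()  # scheduler typically returns a job id

def extra_worker_run(tool: str, args: list[str]) -> tuple[str, str]:
    """Extra Worker style: invoke the analysis tool directly,
    bypassing the scheduler; stdout/stderr come straight back."""
    result = subprocess.run([tool, *args], capture_output=True, text=True)
    return result.stdout, result.stderr
```

The direct style avoids scheduler queueing latency for each operation, at the cost of having to keep the Extra Worker processes themselves alive.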
Still, Extra Workers can be used efficiently given an advanced pre-starting strategy, which is planned to be developed in the future.

Resource Manager is required only if a computational cluster is used in the system. Its logic is rather simple and directly depends on the policies of the cluster used. We tested our framework on the [http://www.rug.nl/cit/hpcv/faciliteiten/HPCCluster?lang=en Millipede HPC cluster], which appears in the TOP500 supercomputers list. This cluster enforces a policy that no job may run longer than ten days, to assure availability of cluster resources to all users. This means that Resident Worker cannot run longer than ten days either. In our current implementation, to keep a cluster as part of our computational cloud, Resident Worker starts a new Resident Worker node some time before it will be removed by the cluster administrator, e.g. two days before the end of the ten-day period. The request to start a new Resident Worker is passed to the cluster scheduler and is processed after a delay that depends on the cluster load. In this way we assure that at least one Resident Worker is always running on the cluster.
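The renewal policy can be summarized in a short sketch. The ten-day limit and the two-day margin come from the text above; everything else (names, the use of wall-clock timestamps) is a hypothetical illustration, not the framework's actual implementation.

```python
from datetime import datetime, timedelta

WALLTIME_LIMIT = timedelta(days=10)  # cluster's per-job limit
SAFETY_MARGIN = timedelta(days=2)    # resubmit this long before removal

def must_renew(started_at: datetime, now: datetime) -> bool:
    """True once less than SAFETY_MARGIN of the job's walltime remains,
    i.e. it is time to submit a replacement Resident Worker to the
    cluster scheduler."""
    return now - started_at >= WALLTIME_LIMIT - SAFETY_MARGIN
```

For example, a Resident Worker started 8.5 days ago must already have submitted its replacement, while one started yesterday need not. The margin exists because the replacement request sits in the scheduler queue for an unpredictable time, so it must be submitted well before the old worker is removed.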