Changes between Version 13 and Version 14 of GCCluster


Timestamp: 2013-01-09T15:56:01+01:00
Author: Patrick
= GCC cluster =
[[TOC()]]

The GCC has its own 480 core cluster. The main workhorses are 10 servers, each with:
 * 48 cores
 * 256 GB RAM
 * a 1 Gbit management NIC
 * a 10 Gbit NIC providing a dedicated fast IO connection to
 * 2 PB of shared GPFS storage

= For users =

== Login to the User Interface server ==

To submit jobs, check their status, test scripts, etc. you need to log in to the user interface server, a.k.a. cluster.gcc.rug.nl, using SSH.
Please note that cluster.gcc.rug.nl is only reachable from within certain RUG/UMCG subnets. From outside you need a double hop: first log in to the proxy:
{{{
$> ssh [your_account]@proxy.gcc.rug.nl
}}}
followed by:
{{{
$> ssh [your_account]@cluster.gcc.rug.nl
}}}
If you are inside certain subnets of the RUG/UMCG network, you can skip the proxy and log in to cluster.gcc.rug.nl directly.[[BR]]
If you are outside, you can automate the double hop via the proxy as documented here: [http://wiki.gcc.rug.nl/wiki/TransparentMultiHopSSH TransparentMultiHopSSH]
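As a sketch of what such automation can look like (the TransparentMultiHopSSH page is authoritative; the exact directives there may differ), OpenSSH's ''ProxyCommand'' makes the double hop transparent. The stanza is written to a sample file here for illustration, but normally belongs in ~/.ssh/config; ''your_account'' is a placeholder:

```shell
# Hypothetical ~/.ssh/config stanza (OpenSSH >= 5.4): with this in place,
# "ssh cluster.gcc.rug.nl" hops via proxy.gcc.rug.nl automatically.
cat > ssh_config.sample <<'EOF'
Host cluster.gcc.rug.nl
    User your_account
    ProxyCommand ssh -W %h:%p your_account@proxy.gcc.rug.nl
EOF
# Sanity check: the stanza contains exactly one ProxyCommand line.
grep -c 'ProxyCommand' ssh_config.sample
```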

== Available queues ==

To quickly test jobs you are allowed to run them directly on cluster.gcc.rug.nl, outside the scheduler. Please think twice before you hit enter though: if you crash cluster.gcc.rug.nl, others can no longer submit or monitor their jobs, which is pretty annoying. On the other hand it is not a disaster: the scheduler and execution daemons run on physically different servers and are therefore not affected by a crash of cluster.gcc.rug.nl.

To test how your jobs perform on an execution node and get an idea of the typical resource requirements for your analysis, you should submit a few jobs to the test queues first. The test queues run on a dedicated execution node, so if your jobs accidentally make that server run out of disk space, run out of memory, or do other nasty things, the production queues and nodes are not affected.

Once you have tested your job scripts and are sure they behave nicely and perform well, you can submit jobs to the production queue named ''gcc''. If you are part of the gaf group and need to process high priority sequence data for the Genome Analysis Facility, you can also use the ''gaf'' queue.

||**Queue**||**Job type**||**Limits**||
||test-short||debugging||10 minutes max. walltime per job; limited to a single test node / 48 cores||
||test-long||debugging||max. 4 jobs running simultaneously per user; limited to half the test node / 24 cores||
||gcc||production - default priority||none||
||gaf||production - high priority||only available to users from the gaf group||

== Useful commands ==

Please refer to the Torque manuals for a complete overview. Some examples:

=== Submitting jobs ===
Simple submission of a job script to the default queue, which routes your job to the ''gcc'' production queue:
{{{
$> qsub myScript.sh
}}}
Submitting a job with a job name different from the filename of the submitted script (the default) and with a dependency on a previously submitted job.
This job will not start before the dependency has completed successfully:
{{{
$> qsub -N [nameOfYourJob] -W depend=afterok:[ID of a previously submitted job] myScript.sh
}}}
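When chaining jobs like this, qsub prints the new job's ID on stdout, so it can be captured in a variable instead of copied by hand. A minimal sketch (the qsub calls are commented out because they need the cluster environment; the ID format shown is the usual Torque number.server form):

```shell
# Sketch of automating the dependency chain; script names are illustrative:
#   first_id=$(qsub stepOne.sh)
#   qsub -N stepTwo -W depend=afterok:"${first_id}" stepTwo.sh
# Torque job IDs look like '12345.scheduler01'; if only the numeric part
# is needed, the server suffix can be stripped with shell expansion:
first_id='12345.scheduler01'   # assumed example of qsub output
numeric_id="${first_id%%.*}"   # drop everything from the first dot
echo "${numeric_id}"
```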
Instead of providing arguments to qsub on the command line, you can also add them using the ''#PBS'' syntax as a special type of comment in your (bash) job script, like this:
{{{
#!/bin/bash
#PBS -N jobName
#PBS -q test-short
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:06:00
#PBS -l mem=10mb
#PBS -e /some/path/to/your/testScript1.err
#PBS -o /some/path/to/your/testScript1.out

[Your actual work...]
}}}
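Such scripts can also be generated and checked from the shell. A small sketch (file and job names are illustrative) that writes a directive-only test script and counts the ''#PBS'' lines qsub would parse:

```shell
# Write a minimal job script with #PBS directives (names are illustrative):
cat > testJob.sh <<'EOF'
#!/bin/bash
#PBS -N testJob
#PBS -q test-short
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:06:00
#PBS -l mem=10mb
echo "running on $(hostname)"
EOF
# qsub reads the #PBS comment lines at the top of the script:
grep -c '^#PBS' testJob.sh
```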

=== Checking the status of your jobs ===
Default output for all users:
{{{
$> qstat
}}}
Long job names:
{{{
$> wqstat
}}}
Limit output to your own jobs:
{{{
$> wqstat -u [your account]
}}}
Get "full" a.k.a. detailed output for a specific job (you probably don't want that for all jobs):
{{{
$> qstat -f [jobID]
}}}
Get other detailed status info for a specific job:
{{{
$> checkjob [jobID]
}}}

=== List jobs based on priority, i.e. who is next in the queue ===
{{{
$> diagnose -p
}}}

=== List available nodes ===
{{{
$> pbsnodes
}}}

= For admins =

== Servers ==

||**Function**||**DNS**||**IP**||**Daemons**||**Comments**||
||User interface node||cluster.gcc.rug.nl||195.169.22.156||- (clients only)||Login node to submit and inspect jobs. [[BR]] Relatively powerful machine. [[BR]] Users can run code outside the scheduler for debugging purposes.||
||Scheduler VM||scheduler01||195.169.22.214||''pbs_server''[[BR]]''maui''||Dedicated scheduler. [[BR]] No user logins if this one is currently the production scheduler.||
||Scheduler VM||scheduler02||195.169.22.190||''pbs_server''[[BR]]''maui''||Dedicated scheduler. [[BR]] No user logins if this one is currently the production scheduler.||
||Execution node||targetgcc01||192.168.211.191||''pbs_mom''||Dedicated test node: only the ''test-short'' and ''test-long'' queues run on this node. [[BR]] Crashing the test node does not affect production!||
||Execution node||targetgcc02||192.168.211.192||''pbs_mom''||Redundant production node: only the default ''gcc'' and high priority ''gaf'' queues run on this node.||
||Execution node||targetgcc03||192.168.211.193||''pbs_mom''||Redundant production node: only the default ''gcc'' and high priority ''gaf'' queues run on this node.||
||Execution node||targetgcc04||192.168.211.194||''pbs_mom''||Redundant production node: only the default ''gcc'' and high priority ''gaf'' queues run on this node.||
||Execution node||targetgcc05||192.168.211.195||''pbs_mom''||Redundant production node: only the default ''gcc'' and high priority ''gaf'' queues run on this node.||
||Execution node||targetgcc06||192.168.211.196||''pbs_mom''||Redundant production node: only the default ''gcc'' and high priority ''gaf'' queues run on this node.||
||Execution node||targetgcc07||192.168.211.197||''pbs_mom''||Redundant production node: only the default ''gcc'' and high priority ''gaf'' queues run on this node.||
||Execution node||targetgcc08||192.168.211.198||''pbs_mom''||Redundant production node: only the default ''gcc'' and high priority ''gaf'' queues run on this node.||
||Execution node||targetgcc09||192.168.211.199||''pbs_mom''||Redundant production node: only the default ''gcc'' and high priority ''gaf'' queues run on this node.||
||Execution node||targetgcc10||192.168.211.200||''pbs_mom''||Redundant production node: only the default ''gcc'' and high priority ''gaf'' queues run on this node.||

== PBS software / flavour ==

The current setup uses the resource manager **Torque 2.5.12** combined with the scheduler **Maui 3.3.1**.

=== Maui ===

Maui runs only on the schedulers, with config files in $MAUI_HOME:
{{{
/usr/local/maui/
}}}

=== Torque ===

Torque clients are available on all servers.[[BR]]
Torque's pbs_server daemon runs only on the schedulers.[[BR]]
Torque's pbs_mom daemon runs only on the execution nodes, where the real work is done.[[BR]]
Torque config files are installed in $TORQUE_HOME:
{{{
/var/spool/torque/
}}}

== Dual scheduler setup for seamless cluster upgrades ==

We use two schedulers: scheduler01 and scheduler02. These alternate as production and test scheduler. The production scheduler is hooked up to cluster.gcc.rug.nl and does not allow direct user logins; hence you cannot submit jobs from the production scheduler itself, but only from cluster.gcc.rug.nl. The other is the test scheduler, which does not have a dedicated user interface machine and does allow direct user logins. You will need to log in to the test scheduler in order to submit test jobs. When it is time to upgrade software or tweak the !Torque/Maui configs:

 * We drain a few nodes: running jobs are allowed to finish, but no new ones will start.[[BR]]
   On the production scheduler as root:
{{{
$> qmgr -c 'set node targetgcc[0-9][0-9] state = offline'
}}}
 * Once ''idle'', move the drained nodes from the production to the test scheduler.[[BR]]
   Change the name of the scheduler in both these files on each node to be moved:
{{{
$TORQUE_HOME/server_name
$TORQUE_HOME/mom_priv/config
}}}
   On each execution node where the config changed, run as root:
{{{
$> service pbs_mom restart
}}}
   On the test scheduler as root:
{{{
$> qmgr -c 'set node targetgcc[0-9][0-9] state = online'
}}}
 * Check the change in available execution nodes using:
{{{
$> pbsnodes
}}}
 * Test the new setup.
 * Disable direct logins to the test scheduler.
 * Enable direct logins to the production scheduler.
 * Disable job submission from cluster.gcc.rug.nl on the production scheduler.
 * Take cluster.gcc.rug.nl offline.
 * Make cluster.gcc.rug.nl the user interface and submit host for the test scheduler.
 * Take cluster.gcc.rug.nl back online: the test scheduler is now the new production scheduler and vice versa.
 * Drain additional nodes and move them to the new production scheduler.
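The drain step relies on qmgr's `[0-9][0-9]` wildcard; an equivalent explicit approach is to generate one command per node. A sketch (written to a helper file rather than executed here, since qmgr needs the live scheduler; node names match the table above):

```shell
# Generate one 'offline' command per execution node targetgcc01..10;
# review drain_nodes.sh, then run it as root on the production scheduler.
for i in $(seq -w 1 10); do
  echo "qmgr -c 'set node targetgcc${i} state = offline'"
done > drain_nodes.sh
# One line per node:
wc -l < drain_nodes.sh
```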

== Installation details ==

=== /etc/hosts files ===

Extremely important: make sure hosts are named consistently in the /etc/hosts files on all hosts that are part of the cluster. More explicitly:
 * there shall be only one line per IP address;
 * in case of multiple names/aliases for the same IP address, these shall all be listed on that single line, on all hosts, and in exactly the same order.
Inconsistent naming of hosts will result in miscommunication between the Torque and Maui daemons.
A typical symptom of inconsistent host names is qsub failing to register job dependencies.
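The one-line-per-IP rule can be checked mechanically. A sketch using a deliberately broken sample file (in practice you would read /etc/hosts on each host):

```shell
# Sample hosts file with a duplicated IP address (deliberately wrong;
# the '.local' alias is illustrative):
cat > hosts.sample <<'EOF'
195.169.22.214  scheduler01
192.168.211.191 targetgcc01
192.168.211.191 targetgcc01.local
EOF
# Print any IP that occurs on more than one line; no output means OK:
awk '{print $1}' hosts.sample | sort | uniq -d
```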

=== Our current config files ===
 * $TORQUE_HOME/mom_priv/[[attachment:gcc_pbs_mom.config.txt|config]]
 * $MAUI_HOME/[[attachment:gcc_maui.cfg.txt|maui.cfg]]
 * Other Torque settings can be loaded from a [[attachment:gcc_torque.txt|file]] using qmgr.[[BR]]
   To export/inspect the settings use:
{{{
$> qmgr -c 'p s'
}}}

=== Init scripts ===

Both the Torque and Maui source downloads contain a contrib folder with /etc/init.d/ scripts to start/stop the daemons:
 * /etc/init.d/pbs_server  [[attachment:suse.pbs_server|SuSE flavor]] | [[attachment:redhat.pbs_server|Redhat/CentOS/Fedora flavor]]
 * /etc/init.d/maui        [[attachment:suse.maui|SuSE flavor]] | [[attachment:redhat.maui|Redhat/CentOS/Fedora flavor]]
 * /etc/init.d/pbs_mom     [[attachment:suse.pbs_mom|SuSE flavor]]
We use versions patched for:
 * the location where the daemons are installed;
 * the run levels at which the daemons should be started or stopped;
 * dependencies: GPFS is explicitly defined as a service required for starting/stopping the Torque and Maui daemons.
Make sure to check whether your scheduler runs on a SuSE or Redhat/CentOS/Fedora VM.

To install:

On scheduler[01|02] as root:
{{{
$> cp *.pbs_server /etc/init.d/pbs_server; chkconfig --add pbs_server; service pbs_server status
$> cp *.maui       /etc/init.d/maui;       chkconfig --add maui;       service maui status
}}}

On targetgcc![01-10] as root:
{{{
$> cp *.pbs_mom    /etc/init.d/pbs_mom;    chkconfig --add pbs_mom;    service pbs_mom status
}}}

=== wqstat ===

We have patched torque-2.5.12/src/cmds/[[attachment:qstat.c]] and recompiled the clients to create ''wqstat'', which reports long job names up to 40 characters instead of the default 16. As a normal user:
{{{
$> cd torque-2.5.12
$> ./configure --with-default-server=scheduler01 --disable-server --disable-mom --prefix=/some/other/location/cluster_clients/
$> make
$> make install
}}}
As root:
{{{
$> cp /some/other/location/cluster_clients/bin/qstat /usr/local/bin/wqstat
}}}

Please see [[GCCCluster]]