GCC cluster
The GCC has its own 480-core cluster. The main workhorses are 10 servers, each with 48 cores, 256 GB of memory, a 1 Gbit management NIC, and a 10 Gbit NIC for a dedicated I/O connection to a 2 PB shared GPFS file system for storage.
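As a quick sanity check, the cluster totals follow from the per-server figures. This is only a sketch: the variable names are illustrative, and it assumes the 256 GB figure refers to memory per server.

```python
# Per-server figures as quoted in the text; names are illustrative.
SERVERS = 10
CORES_PER_SERVER = 48
MEMORY_GB_PER_SERVER = 256  # assumed to be memory per server

total_cores = SERVERS * CORES_PER_SERVER
total_memory_gb = SERVERS * MEMORY_GB_PER_SERVER
memory_per_core_gb = MEMORY_GB_PER_SERVER / CORES_PER_SERVER

# Prints: 480 cores, 2560 GB in total, about 5.3 GB per core
print(f"{total_cores} cores, {total_memory_gb} GB in total, "
      f"about {memory_per_core_gb:.1f} GB per core")
```

The per-core figure is a rough guide when deciding how much memory a job can request per core without starving the other jobs on the same node.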
Servers
Function | DNS | IP | Daemons | Comments |
User interface node | cluster.gcc.rug.nl | 195.169.22.156 | - (clients only) | Login node to submit and inspect jobs. Relatively powerful machine. Users can run code outside the scheduler for debugging purposes. |
Scheduler VM | scheduler01 | 195.169.22.214 | pbs_server maui | Dedicated scheduler. No user logins if this one is currently the production scheduler. |
Scheduler VM | scheduler02 | 195.169.22.190 | pbs_server maui | Dedicated scheduler. No user logins if this one is currently the production scheduler. |
Execution node | targetgcc01 | 192.168.211.191 | pbs_mom | Dedicated test node: only the test-short and test-long queues run on this node. Crashing the test node must not affect production. |
Execution node | targetgcc02 | 192.168.211.192 | pbs_mom | Redundant production node: only the default gcc and priority gaf queues run on this node. |
Execution node | targetgcc03 | 192.168.211.193 | pbs_mom | Redundant production node: only the default gcc and priority gaf queues run on this node. |
Execution node | targetgcc04 | 192.168.211.194 | pbs_mom | Redundant production node: only the default gcc and priority gaf queues run on this node. |
Execution node | targetgcc05 | 192.168.211.195 | pbs_mom | Redundant production node: only the default gcc and priority gaf queues run on this node. |
Execution node | targetgcc06 | 192.168.211.196 | pbs_mom | Redundant production node: only the default gcc and priority gaf queues run on this node. |
Execution node | targetgcc07 | 192.168.211.197 | pbs_mom | Redundant production node: only the default gcc and priority gaf queues run on this node. |
Execution node | targetgcc08 | 192.168.211.198 | pbs_mom | Redundant production node: only the default gcc and priority gaf queues run on this node. |
Execution node | targetgcc09 | 192.168.211.199 | pbs_mom | Redundant production node: only the default gcc and priority gaf queues run on this node. |
Execution node | targetgcc10 | 192.168.211.200 | pbs_mom | Redundant production node: only the default gcc and priority gaf queues run on this node. |
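The queue-to-node mapping in the table determines where a job lands, so it helps to choose the queue explicitly when submitting. The following is a minimal sketch, not the site's official submission procedure. It assumes the queue names are literally test-short, test-long, gcc and gaf as the table suggests. The helper names (build_job_script, submit) and the resource defaults are hypothetical. The #PBS directives and qsub reading a script from standard input are standard Torque behaviour.

```python
import shutil
import subprocess

# Queue names as they appear in the table (assumed to be the literal names).
QUEUES = {"test-short", "test-long", "gcc", "gaf"}


def build_job_script(command, queue="test-short", name="example",
                     cores=1, walltime="00:10:00"):
    """Return the text of a Torque job script for the given queue."""
    if queue not in QUEUES:
        raise ValueError(f"unknown queue: {queue}")
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -N {name}",                # job name
        f"#PBS -q {queue}",               # target queue
        f"#PBS -l nodes=1:ppn={cores}",   # one node, this many cores
        f"#PBS -l walltime={walltime}",   # maximum run time
        command,
        "",
    ])


def submit(script_text):
    """Pipe the script to qsub if it is installed; return the job id or None."""
    if shutil.which("qsub") is None:
        return None  # not on a machine with the Torque clients
    result = subprocess.run(["qsub"], input=script_text, text=True,
                            capture_output=True, check=True)
    return result.stdout.strip()


script = build_job_script("echo hello from $(hostname)")
print(script)
```

Defaulting to test-short keeps experiments on the dedicated test node (targetgcc01). Pass queue="gcc" or queue="gaf" only once a job is known to behave.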
PBS software / flavour
The current setup uses the resource manager Torque 2.5.12 combined with the scheduler Maui 3.3.1.
Maui
Maui runs only on the schedulers. Its config files are installed in
/usr/local/maui/
Torque
Torque clients are available on all servers.
Torque's pbs_server daemon runs only on the schedulers.
Torque's pbs_mom daemon runs only on the execution nodes where the real work is done.
Torque config files are installed in
/var/spool/torque/
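The division of daemons across server roles described above can be checked with a small script. This is a hedged sketch: the role labels and function names are hypothetical, and the process check relies on pgrep -x being available. It also assumes both schedulers keep pbs_server and maui running, as the table lists them. Whether a standby scheduler actually does so is not stated here.

```python
import subprocess

# Expected daemons per server role, as described in the text and table.
# The role labels themselves are illustrative.
EXPECTED_DAEMONS = {
    "user-interface": [],                 # Torque clients only
    "scheduler": ["pbs_server", "maui"],
    "execution": ["pbs_mom"],
}


def is_running(daemon):
    """True if a process with exactly this name exists (uses pgrep -x)."""
    try:
        result = subprocess.run(["pgrep", "-x", daemon],
                                stdout=subprocess.DEVNULL)
    except FileNotFoundError:
        return False  # pgrep is not installed on this machine
    return result.returncode == 0


def missing_daemons(role, running=is_running):
    """Return the daemons expected for this role that are not running."""
    return [d for d in EXPECTED_DAEMONS[role] if not running(d)]


if __name__ == "__main__":
    # On an execution node this should print an empty list.
    print(missing_daemons("execution"))
```

The running parameter lets the check be exercised without a live cluster, for example by passing a function that pretends only maui is up.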
Dual scheduler setup
Installation details
Attachments (9)
- gcc_pbs_mom.config.txt (805 bytes): pbs_mom config
- suse.maui (1.2 KB): /etc/init.d/ script for maui
- suse.pbs_mom (1.8 KB): /etc/init.d/ script for pbs_mom
- suse.pbs_server (1.7 KB): /etc/init.d/ script for pbs_server
- qstat.c (55.1 KB): patched qstat.c for long job names
- gcc_maui.cfg.txt (659 bytes): maui.cfg
- redhat.maui (565 bytes): /etc/init.d/ script for maui (Red Hat/CentOS/Fedora flavour)
- redhat.pbs_server (2.3 KB): /etc/init.d/ script for pbs_server (Red Hat/CentOS/Fedora flavour)
- gcc_torque.txt (2.5 KB): Torque setup commands to be imported with qmgr