Platform for Scientific Computing

WR Cluster Usage



Users have accounts that are valid on all cluster nodes. Passwords can be changed with the passwd command on wr0. It can take up to several minutes until such a change is visible on all nodes. Please be aware that account names are the same as your university account names, but the accounts themselves are separate, including their passwords.

File Systems and I/O

Each node (server as well as cluster nodes) has its own operating system on a local disc. Certain shared directory subtrees are exported from wr0 to all cluster nodes. This includes user data (e.g. $HOME = /home/username) as well as commonly used application software (e.g. /usr/local).

The /tmp directory is guaranteed to be located on a fast node-local filesystem on all nodes. Within a batch job, the environment variable $TMPDIR contains the name of a job-private fast local directory somewhere in /tmp on the node. This directory is created at each job start with a job-specific name and removed at job termination (see also the section Temporary Files below). If possible, use this dynamically set environment variable to write to and read from temporary files that are used only in one job run.
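
Inside a job script, the use of $TMPDIR described above might look like the following minimal sketch. The fallback to /tmp (for runs outside the batch system) and the file name prefix intermediate are assumptions made for illustration only.

```shell
#!/bin/sh
# Use the job-private directory if the batch system set $TMPDIR,
# otherwise fall back to /tmp (assumption: interactive test run).
workdir="${TMPDIR:-/tmp}"

# mktemp creates a uniquely named, empty file inside that directory
tmpfile="$(mktemp "$workdir/intermediate.XXXXXX")"

echo "intermediate result" > "$tmpfile"   # write temporary data
content="$(cat "$tmpfile")"               # read it back later in the same job
echo "$content"

rm -f "$tmpfile"                          # clean up explicitly when done
```

Data that must survive the end of the job has to be copied to /home or /scratch before the script finishes, because the directory is removed at job termination.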

The /scratch directory can be used for larger amounts of data that needs to be available longer than a batch job run and/or needs to be shared between nodes. The /scratch directory is shared between all nodes and access to it is slow.

The /scratch2 directory is reserved for users with high I/O demands. It is shared between all nodes, has a medium capacity and offers fast access from most nodes (about 10x faster than /scratch). Get in contact with us if you have high I/O demands.

Please be aware that data on /tmp filesystems may be deleted without notice after a certain period of time. Also be aware that there is no backup for the scratch file systems!

mount point  purpose                         located        shared on all nodes  daily backup  capacity  access time  default soft quota
/            operating system                local          no                   no            -         -            -
/tmp         node-local temporary user data  local          no                   no            small     fast         10 GB
/usr/local   application software            remote server  yes                  yes           -         -            -
/home        user data                       remote server  yes                  yes           medium    medium       50 GB
/scratch     user data                       remote server  yes                  no            large     slow         5 TB
/scratch2    user data                       remote server  yes                  no            large     fast         on request

We have established quotas on the file systems. Users can query their own quota with the command quota -s --show-mntpoint. By default, the maximum number of files per file system is restricted to 1 million / 2 million (soft / hard limit). For the /scratch filesystem, the limits are 5 million / 6 million files.

Besides the standard solutions for most users, we have individual nodes with special local I/O features. If you have special, fast or large I/O demands, contact the system administrator to find the best individual solution.

File and Directory Names

Don't use spaces or umlauts in file or directory names, e.g. when copying from an MS Windows system. Otherwise you may run into trouble due to, e.g., different character encodings.

Software Packages

Besides a set of standard software packages, users can extend their package list with additional software packages or package versions. Users need to do this themselves with the module command, which has several subcommands. A software environment is called a module. Loading a module usually means that the search paths for commands, libraries etc. are extended internally. For a full command reference of the module system, read the documentation.


The examples below give a short overview of some (sub-)commands. A module may exist in several versions, and the user can choose to work with one specific version. If no version is specified when loading, a default version is used. It is good practice to always use the default version of a module, even if the concrete version behind the default may change over time. Most modules are backward compatible, so this should not cause problems, and you will always get the most advanced, fastest and least error-prone version of a module.

Example: Instead of

user@wr0: module load gcc/10.1.0
just use

user@wr0: module load gcc


user@wr0: module avail

---------------------------------------------- /usr/local/modules/modulesfiles ----------------------------------------------
amd/default            gcc/8.1.0              intel-mpi/2018         metis/5.1.0-32         pin/default
aocc/2.1.0             gcc/8.2.0              intel-mpi/2019         metis/5.1.0-64         python/2.7.15
aocc/2.2.0             gcc/9.1.0              intel-mpi/2020         mpitools/default       python/2.7.15-dg
aocc/default           gcc/default            intel-mpi/default      nvtop/default          python/default
atop/2.3.0             gnuplot/5.2.3          intel-tools/2018       octave/4.4.0           python3/3.6.5
atop/2.4.0             gnuplot/default        intel-tools/2019       octave/default         python3/3.7.0
atop/2.5.0             hwloc/1.11.10          intel-tools/2020       ompp/0.8.5             python3/3.8.1
atop/default           hwloc/1.11.11          intel-tools/default    ompp/default           python3/3.9.0
boost/1.72.0           hwloc/1.11.13          java/10.0.1            openmpi/gnu            python3/default
cmake/3.11.1           hwloc/2.0.1            java/14.0.1            openmpi/intel          sage/8.2
cuda/10.2              hwloc/2.0.4            java/default           papi/5.6.0             sage/default
cuda/9.2               hwloc/2.1.0            libFHBRS/3.1           papi/5.7.0             slurm/18.08.3
cuda/default           hwloc/2.2.0            libFHBRS/default       papi/6.0.0             slurm/19.05.5
dinero4/4.7            hwloc/2.3.0            likwid/4.3.2           papi/default           slurm/default
dinero4/default        hwloc/default          likwid/5.0.1           pgi/18.7               texlive/2018
ffmpeg/4.0             intel-compiler/2018    likwid/default         pgi/19.4               texlive/default
ffmpeg/default         intel-compiler/2019    matlab/default         pgi/20.1               valgrind/3.13.0
gcc/10.1.0             intel-compiler/2020    matlab/R2018a          pgi/default            valgrind/default
gcc/7.3.0              intel-compiler/default matlab/R2019b          pin/3.7

---------------------------------------------- /usr/share/Modules/modulefiles -----------------------------------------------
dot         module-git  module-info modules     null        use.own

----------------------------------------------------- /etc/modulefiles ------------------------------------------------------
mpi/compat-openmpi16-x86_64  mpi/mvapich2-2.0-psm-x86_64  mpi/mvapich2-2.2-x86_64      mpi/openmpi-x86_64
mpi/mpich-3.0-x86_64         mpi/mvapich2-2.0-x86_64      mpi/mvapich2-psm-x86_64
mpi/mpich-3.2-x86_64         mpi/mvapich2-2.2-psm2-x86_64 mpi/mvapich2-x86_64
mpi/mpich-x86_64             mpi/mvapich2-2.2-psm-x86_64  mpi/openmpi3-x86_64

user@wr0: module whatis gcc
gcc                  : GNU compiler suite version 10.1.0

# check current compiler version (system default without loading a module)
user@wr0: gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)

# load default version
user@wr0: module load gcc
user@wr0: gcc --version
gcc (GCC) 10.1.0

# unload default version
user@wr0: module unload gcc
user@wr0: gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)

Available Modules

A list of selected software packages (possibly in several versions) that are handled using the module command is:
name            purpose
aocc            AMD compiler
cmake           CMake system
cuda            CUDA development and runtime environment
gcc             GNU compiler suite
gnuplot         plot program
hwloc           detect hardware properties
intel-compiler  Intel Compiler environment
intel-mpi       Intel MPI
intel-tools     Intel development tools
java            Oracle Java environment
likwid          development tools
matlab          Matlab mathematical software with toolboxes
metis           graph partitioning package
octave          GNU octave
ompp            OpenMP tool
opencl          OpenCL
openmpi         OpenMPI environment
papi            Papi performance counter library
pgi             PGI compiler suite
python          Python 2
python3         Python 3
sage            Mathematical software system sage
slurm           batch system
texlive         TeX distribution
valgrind        Valgrind software analysis tool

Initial Module Environment Setup

If you always need the same modules, you may include the load commands in your .bash_profile (executed once per session) or .bashrc (executed once per shell) file in your home directory. Example $HOME/.bashrc file:

module load intel-compiler openmpi/intel

Using the Batch System

A batch system is used on HPC systems to manage the work of many users. Users submit their requests for computational work together with the (hardware) requirements that are necessary for the execution. The batch system then looks for resources that fulfill the requirements and starts the job as soon as such resources become available, which might be immediately or later. We use Slurm as a batch system, and we ask you to use the batch system for all your work on all cluster nodes other than wr0. Slurm has a command line interface and additionally an X11-based graphical interface to display certain batch system states. To work with batch jobs, a user usually performs the sequence of steps described below.

Usual Steps

1) Specify What Should Be Done

The first thing to do is to specify the work that has to be done by the job. This specification is done with a shell script (a file). Such a batch job script is a shell script that is submitted to and started by the batch system. In a batch script you specify all actions that should be done in your job, either sequentially or in parallel. The execution of the script later starts in the directory from which you submitted the job.

Sequential Job

An example of such a batch script /home/user/ is:

# start sequential program
# change directory and execute another sequential program
cd subdir

OpenMP Job

An example of such a batch script /home/user/ is:

# set the number of threads
# start OpenMP program
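
The two steps named in the comments above could look like the following minimal sketch. It assumes that Slurm exports SLURM_CPUS_ON_NODE inside the job (the number of CPUs available to the job on the node, which may count hardware threads); outside of a job the sketch falls back to one thread. The program name in the last comment is purely hypothetical.

```shell
#!/bin/sh
# set the number of threads from the job allocation
# (assumption: SLURM_CPUS_ON_NODE is set by Slurm; default to 1 otherwise)
export OMP_NUM_THREADS="${SLURM_CPUS_ON_NODE:-1}"
echo "running with $OMP_NUM_THREADS OpenMP threads"

# start OpenMP program (hypothetical name, uncomment in a real job script)
# ./my_openmp_program.exe
```

Deriving the thread count from the allocation keeps the script consistent with the resource request instead of hard-coding a number in two places.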


MPI Job

An example of such a batch script /home/user/ is:

# load the OpenMPI environment
module load openmpi/gnu

# start here your MPI program
mpirun ./test_mpi.exe

2) Specify Which Resources You Need

Additionally, the beginning of a job script describes which resources are needed for the execution. The syntax is a sequence of lines starting with #SBATCH (which is a special form of a shell comment). Each line specifies a certain part of the request. See the documentation of Slurm sbatch for a list of all available options. Only an example is given here; more options are summarized later. An example of such a resource request is:
#SBATCH --partition=any          # partition (queue)
#SBATCH --nodes=4                # number of nodes
#SBATCH --ntasks-per-node=32     # number of tasks per node
#SBATCH --mem=4G                 # memory per node in MB (different units with suffix K|M|G|T)
#SBATCH --time=2:00              # total runtime of job allocation (format D-HH:MM:SS; first parts optional)
#SBATCH --output=slurm.%j.out    # filename for STDOUT (%N: nodename, %j: job-ID)
#SBATCH --error=slurm.%j.err     # filename for STDERR

# here comes the part with the description of the computational work, for example:
# load the OpenMPI environment
module load openmpi/gnu

# start here your MPI program
mpirun ./test_mpi.exe
The meaning of the lines in this example is as follows: a job can only be started if all requested resource specifications can be fulfilled. In the example, this means that 4 nodes in the partition any are available with 32 cores and 4 GB memory each, for 2 minutes.

Instead of requesting 4 nodes with 32 cores each, it is also possible to request a certain number of cores / hardware threads that may be spread arbitrarily over several nodes. The example given above adapted to that is:
#SBATCH --partition=any          # partition (queue)
#SBATCH --ntasks=80              # number of tasks     <---------- this is different to above
#SBATCH --mem=4G                 # memory per node in MB (different units with suffix K|M|G|T)
#SBATCH --time=2:00              # total runtime of job allocation (format D-HH:MM:SS; first parts optional)
#SBATCH --output=slurm.%j.out    # filename for STDOUT (%N: nodename, %j: job-ID)
#SBATCH --error=slurm.%j.err     # filename for STDERR

# here comes the part with the description of the computational work, for example:
# load the OpenMPI environment
module load openmpi/gnu

# start here your MPI program
mpirun ./test_mpi.exe
In this example, 80 parallel execution units are requested. This can be fulfilled by 4 x 20-core nodes. But this request may also be fulfilled by one node with 80 cores or 80 nodes with one core used on each (and other cores on a node left for other jobs). This specification gives more freedom to the batch system to find resources. But the programming model is (usually) restricted to MPI as a program run may be spread over several nodes.

3) Submit the Job

After you specified in a file the requested resources and the work that should be done, you submit this job script to the batch system. This is done with the sbatch command using the job script filename as an argument. Example:

user@wr0: sbatch
If the system accepts the request (i.e., no syntax error in the script etc.) the batch system prints a job ID that may be used to refer to this job.

Please be aware that all modules loaded in your interactive session (where you execute the sbatch command) are also loaded when your submitted batch job starts. A batch job may therefore behave differently when it is submitted from interactive sessions with different modules loaded!

4) Check Job Status

After submission you may check the status of your/all your jobs with several commands depending on the amount of information you want.
  1. You can view the status of all batch jobs in a web browser. The page gets updated periodically.
  2. You can show the status of all of your non-finished jobs in a shell window with the squeue command.
    user@wr0: squeue
                    55       any user     PD       0:00      2 (Resources)
                    56       any user     PD       0:00      2 (Priority)
                    54       any user      R       0:08      2 wr[50,51]
    In the example the user has 3 jobs submitted that are either running or still waiting. The column ST marks the job state (R=running, PD=waiting).

5) Get Results

Output to stdout / stderr in your program is redirected to 2 files, which you find in the directory where you submitted the job once the job has finished. The file names can be specified in the job script with the options shown above. It is a good idea to include at least the job ID in the filename.

user@wr0: ls -l
-rw------- 1 user fb02    316 Mar  9 07:27 slurm.51.out
-rw------- 1 user fb02  11484 Mar  9 07:27 slurm.52.err

Selected Batch System Commands

A summary of useful commands is given in the following table. See appropriate man pages or the Slurm documentation for all available options and a full description.
command                    meaning
sbatch <shell-script>      submit the shell-script to the batch system
scancel <jobid>            delete a job with the given job ID, that may be either in running or waiting state
squeue                     show the state of own jobs in queues
sinfo [options]            show the state of partitions or nodes
scontrol show job <jobid>  show more details for the job

Partitions, Resource Limits and Job Priorities

We have established several partitions with different behaviour and restrictions. See the output of the sinfo command for a list of available partitions. Each partition has certain associated policies (hardware properties, maximum number of jobs in queue, maximum runtime per job, scheduling priority, maximum physical memory, special hardware features).

Resource Limits

As part of a job submission, you can request more main memory than the default of 1 GB. Be aware that on a node not all of the main memory given in the hardware overview table can be allocated to your job. For example, the operating system needs some memory for itself, and memory is pinned for efficient communication with a GPU. It might therefore happen that on a system with 128 GB main memory only 120 GB are available for a job. The advice is to specify resource requests that fit your job's needs and not to request the maximum available resources of a node.

A list of the most important queues is:

queue name  maximum time per job  usable memory        default virt.memory/process  nodes used
any         72 hours              (dependent on node)  1 GB                         any node
hpc         72 hours              185 GB               1 GB                         wr50-wr99
hpc3        72 hours              185 GB               1 GB                         wr50-wr99
gpu         72 hours              185 GB               1 GB                         wr12,wr15-wr19
gpu4        72 hours              185 GB               1 GB                         wr15
wr14        72 hours              120 GB               1 GB                         wr14
wr43        72 hours              750 GB               1 GB                         wr43
wr44        72 hours              1 TB                 1 GB                         wr44

Job Priorities and Job Scheduling

Jobs are mainly scheduled based on their calculated job priority. Many factors contribute to a job's priority; the main ones are the waiting time and the resource consumption during the last 14 days (fairshare). Additionally, the scheduling system uses a backfill strategy. You can contribute to fair scheduling and efficient utilization of the whole system by specifying precisely which resources your job needs (instead of the maximum resources).

Environment Variables and Modules

The batch system defines certain environment variables that you may use in your batch job script.

variable name             purpose                                        example
$SLURM_SUBMIT_DIR         working directory where the job was submitted  /home/user/testdir
$SLURM_JOB_ID             Job ID given to the job                        65
$SLURM_JOB_NAME           Job name given to the job                      testjob
$SLURM_JOB_NUM_NODES      number of nodes assigned to this job           2
$SLURM_JOB_CPUS_PER_NODE  number of cores per node assigned to this job  32(x5) (32 cores, on 5 nodes)
$SLURM_JOB_NODELIST       node names of assigned nodes                   wr[50,51]
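
The compact notation of $SLURM_JOB_CPUS_PER_NODE is not directly usable as a number. The following sketch shows one way to turn it into a total core count inside a job script. The function name total_cores is hypothetical, and the sketch assumes the value is a comma-separated list of entries of the form N or N(xM), as in the example column above.

```shell
#!/bin/sh
# Sum up the cores described by a string such as "32(x5)" or "32(x2),16".
# The function body runs in a subshell, so IFS and the helper variables
# do not leak into the calling script.
total_cores() (
    IFS=','
    total=0
    for entry in $1; do
        cores=${entry%%\(*}                 # part before "(": cores per node
        case "$entry" in
            *\(x*) count=${entry##*\(x}     # part after "(x": number of nodes
                   count=${count%\)} ;;
            *)     count=1 ;;               # no "(xM)" suffix: a single node
        esac
        total=$(( $total + $cores * $count ))
    done
    echo "$total"
)

total_cores "32(x5)"       # prints 160
total_cores "32(x2),16"    # prints 80
```

In a real job script the call would typically be total_cores "$SLURM_JOB_CPUS_PER_NODE".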

Special Requests

If you do not want to use Hyperthreads (i.e. use only real cores), additionally specify in your job request: #SBATCH --ntasks-per-core=1
Example: 1 hpc3 node is requested that has 32 cores / 64 hardware threads. The program starts with 32 OpenMP (software) threads spread over all cores and does not use Hyperthreading.
#SBATCH --partition=hpc3         # partition
#SBATCH --nodes=1                # number of nodes
#SBATCH --ntasks-per-core=1      # use only real cores
#SBATCH --time=2:00              # total runtime of job allocation

Hybrid Programming Models

If you want to use hybrid programming models (e.g. MPI+OpenMP), you can influence the mapping of MPI processes to the requested hardware in several ways.

Use Specific Nodes

If you want to use specific nodes (e.g. wr73), this can be specified in the resource specification part of the batch job.
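
As a sketch, assuming the standard sbatch option --nodelist (short form -w), a request for the node wr73 mentioned above could look as follows. The partition and time values are only placeholders taken from the earlier examples.

```shell
#SBATCH --partition=any          # partition (queue); placeholder from the earlier examples
#SBATCH --nodelist=wr73          # request this specific node (assumption: it belongs to the partition)
#SBATCH --time=2:00              # total runtime of job allocation
```

A job restricted to specific nodes can only start once exactly those nodes are free, so it may wait longer than a job that leaves the choice to the batch system.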

Temporary Files

For a batch job, an environment variable $TMPDIR is defined with the name of a temporary directory (with fast access) that should be used for fast temporary file storage within the scope of the job. The directory is created at job start and deleted when the job finishes. Example of how to use the environment variable within a program:

/* requires <stdio.h> and <stdlib.h> */
const char *basedir = getenv("TMPDIR");   /* set by the batch system */
if (basedir != NULL) {
    const char *filename = "test.dat";
    char allname[1024];
    /* build "<TMPDIR>/test.dat" without overflowing the buffer */
    snprintf(allname, sizeof(allname), "%s/%s", basedir, filename);
    FILE *f = fopen(allname, "w");
    /* ... use f for temporary data, then fclose(f) ... */
}

Information about Job Runs

Sometimes it is necessary to get some information about a job execution, e.g. the maximum amount of main memory used during the execution of the job, in order to choose reasonable values for the resource specification in a job script. The Slurm command sstat helps with that for running jobs.

user@wr0: sstat --format=jobid,maxvmsize,MaxDiskRead 123456.batch
       JobID  MaxVMSize  MaxDiskRead
------------ ---------- ------------
123456.batch  47654040K     39789920
where 123456 is the job number of the running job.

If you need such information for already finished jobs, use the command sacct.

user@wr0: sacct -j 123456.batch --format="jobid,CPUTime,MaxVMSize,MaxDiskRead"
       JobID    CPUTime  MaxVMSize  MaxDiskRead
------------ ---------- ---------- ------------
123456.batch   01:37:04  24173824K      828.59M
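
To turn such a reported size into a value for a --mem request, a small helper like the following sketch may be useful. The function name kb_to_gb is hypothetical, and the sketch assumes the value carries the K suffix shown in the outputs above. Note that MaxVMSize reports virtual memory; the MaxRSS field of sstat and sacct (resident memory) is usually closer to what a job really needs.

```shell
#!/bin/sh
# Convert a size in kilobytes (e.g. "24173824K") to whole gigabytes, rounded up.
kb_to_gb() {
    kb=${1%K}                                   # strip the trailing unit letter
    echo $(( ($kb + 1048575) / 1048576 ))       # 1048576 KB = 1 GB, round up
}

kb_to_gb 24173824K    # prints 24
kb_to_gb 47654040K    # prints 46
```

The result could then be used, with some safety margin, as #SBATCH --mem=24G in the next run of the same job.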

GPU Nodes

There are several nodes with different types of GPUs to speed up certain computations. Please use GPU nodes only if you can utilize them in an appropriate way. If you need assistance in choosing the appropriate GPU, please contact us.

GPU nodes are in the following queues:
batch queue  number of nodes  GPU cards       with tensor cores  CUDA compute capability
gpu          6                Nvidia V100     yes                7.0
gpu4         1                4x Nvidia V100  yes                7.0
wr14         1                Nvidia K80      no                 3.7

Since CUDA 11.x, GPUs with a compute capability below 5.2 are no longer supported by default (marked as deprecated). If you want to compile CUDA code that runs on all of our GPU platforms, either work with a CUDA version below 11 (e.g. module load cuda/10.2) or, with CUDA 11.2, compile the code explicitly for all of our platforms with:

user@wr0: nvcc -gencode arch=compute_37,code=sm_37 -gencode arch=compute_70,code=sm_70 ...
i.e. for compute capability 3.7 and 7.0.

To optimize the throughput on our GPU nodes and to minimize waiting times for all GPU users, please follow these rules:

At the beginning of your work, use node wr14, which you can access interactively, i.e. ssh wr14. Start a short but representative program run there. Afterwards you can get information about the GPU utilization of this program with the command:

user@wr14: nvidia-smi -q -d ACCOUNTING
which results in an output like:

==============NVSMI LOG==============

Timestamp                           : Fri Nov 20 16:54:34 2020
Driver Version                      : 440.33.01
CUDA Version                        : 10.2

Attached GPUs                       : 2
GPU 00000000:84:00.0
    Accounting Mode                 : Enabled
    Accounting Mode Buffer Size     : 4000
    Accounted Processes
        Process ID                  : 48147
            GPU Utilization         : 85 %
            Memory Utilization      : 81 %
            Max memory usage        : 171 MiB
            Time                    : 6936 ms
            Is Running              : 0

GPU 00000000:85:00.0
    Accounting Mode                 : Enabled
    Accounting Mode Buffer Size     : 4000
    Accounted Processes             : None
The lines below Process ID give you valuable information about the GPU utilization. If more than one process is listed in the output, the last entry usually relates to the most recent execution on the GPU. You can also monitor the GPU usage of your running program on a batch node with the command:

user@wr0: srun -s --jobid your-running-job-id --pty nvidia-smi
where your-running-job-id is the job ID of your running GPU program on a batch node. The output looks similar to

user@wr0> srun -s --jobid 123456 --pty nvidia-smi
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:01:00.0 Off |                  Off |
| N/A   28C    P0              85W / 500W |    592MiB / 81920MiB |     85%      Default |
|                                         |                      |             Disabled |

| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|    0   N/A  N/A    386553      C   ./vectorproduct.exe                         534MiB |
which shows that the current utilization of the GPU is 85% and that the program that runs kernels on the GPU is called ./vectorproduct.exe and uses 534 MiB of GPU memory.

Software Development

Interactive Development

To speed up development cycles, you can use some nodes interactively. Additionally, you can use the srun command: srun --x11 --pty /bin/bash, optionally with additional options.


All main development tools are available. Among them are compilers (C, C++, Java, Fortran) and parallel programming environments (OpenMP, MPI, CUDA, OpenCL, OpenACC). Application software is in the responsibility of users.
compiler            name       module command              documentation  safe optimization  debug option  compiler feedback            version
GNU C               cc / gcc   module load gcc             man gcc        -O2                -g            -ftree-vectorizer-verbose=2  --version
Intel C (oneAPI)    icx        module load intel-compiler  man icx        -O2                -g            -qopt-report                 --version
PGI C               pgcc       module load pgi             man pgcc       -O2                -g            -Minfo=vec                   --version
GNU C++             g++        module load gcc             man g++        -O2                -g            -ftree-vectorizer-verbose=2  --version
Intel C++ (oneAPI)  icpx       module load intel-compiler  man icpx       -O2                -g            -qopt-report                 --version
PGI C++             pgc++      module load pgi             man pgc++      -O2                -g            -Minfo=vec                   --version
GNU Fortran         gfortran   module load gcc             man gfortran   -O2                -g            -ftree-vectorizer-verbose=2  --version
Intel Fortran       ifort      module load intel-compiler  man ifort      -O2                -g            -vec-report=2 (or higher)    --version
PGI Fortran         pgfortran  module load pgi             man pgfortran  -O2                -g            -Minfo=vec                   --version
Oracle Java         javac      module load java                           -O                 -g            n.a.                         -version


On wr14, the whole PGI compiler infrastructure with compilers and the profiler pgprof is additionally installed. Documentation is available under /usr/local/PGI/. The tool infrastructure can be used only on wr5. The generated code may be executed on all nodes. Exception: if you use the accelerator functionality of the PGI compiler, the code can be executed only on nodes with a GPU.

Base Software

The following base software is installed:

Intel MKL

The Intel Math Kernel Library (MKL) is installed. You can use this software after a module load intel-compiler, which extends the include file search paths and library search paths accordingly. It should preferably be used on Intel-based systems, but it also works on AMD systems. The library contains basic mathematical functions (BLAS, LAPACK, FFT, ...). If you use any of the Intel compilers, just add the flag -mkl as a compiler and linker flag. Otherwise, check the MKL documentation for the appropriate version and corresponding flags. Example for a Makefile:

CC      = icx
CFLAGS  = -mkl
LDLIBS  = -mkl
By default MKL uses all available cores. You can restrict this number by setting the environment variable MKL_NUM_THREADS before you start an MKL-based program.
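
A minimal sketch of such a restriction in a shell or job script is shown below. The value 4 is an arbitrary example, and the program name in the comment is hypothetical.

```shell
#!/bin/sh
# limit MKL to 4 threads (arbitrary example value)
export MKL_NUM_THREADS=4
echo "MKL will use at most $MKL_NUM_THREADS threads"

# start the MKL-based program afterwards (hypothetical name)
# ./my_mkl_program.exe
```

The variable only affects programs started from this shell after the export.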

Parallel Programming

There are different approaches for parallel programming today: shared memory parallel programming based on OpenMP, distributed memory programming based on MPI, and GPGPU computing based on CUDA, OpenCL, OpenACC or OpenMP 4++.


compiler                     name               module command              documentation    version
GNU OpenMP C/C++             gcc/g++ -fopenmp   module load gcc             man gcc          --version
Intel OpenMP C/C++ (oneAPI)  icx/icpx -qopenmp  module load intel-compiler  man icx / icpx   --version
PGI OpenMP C/C++             pgcc/pgCC -mp      module load pgi             man pgcc / pgCC  --version
Intel OpenMP Fortran         ifort -qopenmp     module load intel-compiler  man ifort        --version
GNU OpenMP Fortran           gfortran -fopenmp  module load gcc             man gfortran     --version
PGI Fortran                  pgfortran -mp      module load pgi             man pgfortran    --version

Example: Compile and run an OpenMP C file:

module load intel-compiler
icx -qopenmp -O2 t.c


compiler                              name      module command             documentation  version
MPI C (based on gcc)                  mpicc     module load openmpi/gnu    see gcc        --version
MPI C++ (based on gcc)                mpic++    module load openmpi/gnu    see g++        --version
MPI Fortran (based on gfortran)       mpif90    module load openmpi/gnu    see gfortran   --version
MPI C (based on Intel oneAPI icx)     mpiicx    module load openmpi/intel  see icx        --version
MPI C++ (based on Intel oneAPI icpx)  mpiicpx   module load openmpi/intel  see icpx       --version
MPI Fortran (based on ifort)          mpiifort  module load openmpi/intel  see ifort      --version

Which MPI compilers are used can be influenced through the module command: with module load openmpi/gnu you use the GNU compiler environment (gcc, g++, gfortran), and with module load openmpi/intel you use the Intel compiler environment (icx, icpx, ifort). Be aware that even with module load openmpi/intel the MPI compiler names mpicc etc. are mapped to the GNU compilers. To use an Intel compiler, you need to specify Intel's own names, i.e., mpiicx, mpiicpx, mpiifort.

All options discussed in the compiler section also apply here, e.g. optimization.

Example: Compile an MPI C file and generate optimised code:

module load openmpi/intel
mpiicx -O2 t.c

The MPI implementation we use (OpenMPI) has options to influence the communication medium used. Within one node, MPI processes can communicate through shared memory, Omni-Path, or Ethernet with TCP/IP. Between nodes, Omni-Path or Ethernet with TCP/IP is possible. OpenMPI usually chooses the most appropriate medium, which means you don't need to specify anything. But if you want to choose a specific and applicable medium, you may specify this in the call to mpirun through the --mca btl specifier: mpirun --mca btl communication-channels ... where communication-channels is a comma-separated list of communication media. Possible values are: sm for shared memory, openib for Omni-Path / Infiniband, and tcp for Ethernet. The last specifier must be self.


mpirun --mca btl tcp,self -np 4 mpi.exe

OpenCL and CUDA

Some nodes have an NVIDIA GPU installed (V100, K80). Program development can be done interactively on wr14 (i.e. ssh wr14), as all necessary drivers are installed locally on that system. Production runs on any Tesla card should be done using the appropriate batch queues. Use module load cuda to load the CUDA environment (including certain possible versions). Use module load opencl/nvidia or module load opencl/intel to load the OpenCL environment for Nvidia GPUs or Intel processors, respectively. With both modules, the standard environment variables CPATH for include files and LIBRARY_PATH for libraries are set accordingly, to be used e.g. in a makefile.

To compile an OpenCL program on a node with the appropriate software environment installed proceed as follows:

module load opencl
cc opencltest.c -lOpenCL
To compile a CUDA project use the following Makefile template:

# defines
CC              = cc
CUDA_CC         = nvcc
LDLIBS          = -lcudart

# default rules based on suffixes
#       C
%.o: %.c
        $(CC) -c $(CFLAGS) -o $@ $<

#       CUDA
%.o: %.cu
        $(CUDA_CC) -c $(CUDA_CFLAGS) -o $@ $<

myprogram.exe: myprogram.o kernel.o
        $(CC) -o $@ $^ $(LDLIBS)
Here the CUDA kernel and host part is in a file kernel.cu and the non-CUDA part of your program is in a file myprogram.c.


Directive-based GPU programming is available through the PGI compiler. See /usr/local/PGI/doc for documentation. Use wr14 only interactively to compile such programs. The generated code can be executed on wr14-wr27. You can specify the compute capability as a compiler option. Important: by default the PGI compiler generates debug code, which in general is very slow. If you want fast code, add the nodebug option. Example:

module load pgi
pgcc -acc -ta=nvidia,cc3.5,nodebug openacctest.c
where 3.5 corresponds to the compute capability of the target GPU.



Resource Requirements

If you want to find out the memory requirements of a non-MPI job, use:

/usr/bin/time -f "%M KB" command
which prints out the peak memory consumption in kilobytes of the command execution.

Usage Examples

Sequential C program

C-program named test_sequential.c

#include <stdio.h>
int main(int argc, char **argv) {
    printf("Hello world\n");
    return 0;
}

Makefile

CC     = cc

#default rules
%.o: %.c
        $(CC) $(CFLAGS) -c $<
%.exe: %.o
        $(CC) -o $@ $< $(LDLIBS)

default:: test_sequential.exe

Batch script

#!/bin/bash
#SBATCH --output=slurm.%j.out    # STDOUT (%j: job-ID)
#SBATCH --error=slurm.%j.err     # STDERR
#SBATCH --partition=any          # partition (queue)
#SBATCH --ntasks=1               # use 1 task
#SBATCH --mem=100                # memory per node in MB (other units with suffix K|M|G|T)
#SBATCH --time=2:00              # total runtime of job allocation (format D-HH:MM:SS; leading parts optional)

# start program
./test_sequential.exe

OpenMP C program

C-program named test_openmp.c

#include <stdio.h>
#include <omp.h>
int main(int argc, char **argv) {
#pragma omp parallel
    printf("I am the %d. thread of  %d threads\n", omp_get_thread_num(), omp_get_num_threads());
    return 0;
}

Makefile

CC     = gcc -fopenmp

#default rules
%.o: %.c
        $(CC) $(CFLAGS) -c $<
%.exe: %.o
        $(CC) -o $@ $< $(LDLIBS)

default:: test_openmp.exe

Batch script

#!/bin/bash
#SBATCH --output=slurm.%j.out    # STDOUT (%N: nodename, %j: job-ID)
#SBATCH --error=slurm.%j.err     # STDERR
#SBATCH --partition=any          # partition (queue)
#SBATCH --nodes=1                # number of nodes
#SBATCH --ntasks-per-node=32     # number of tasks per node
#SBATCH --mem=1G                 # memory per node (default unit MB; other units with suffix K|M|G|T)
#SBATCH --time=2:00              # total runtime of job allocation (format D-HH:MM:SS; leading parts optional)

# start program (with 24 threads, in total 32 threads were requested by the job)
export OMP_NUM_THREADS=24
./test_openmp.exe
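
Instead of hard-coding the thread count, it could be derived from the job request. The following is a minimal sketch, assuming the job was submitted with --ntasks-per-node as in the script above, in which case Slurm sets the environment variable SLURM_NTASKS_PER_NODE. The function name omp_thread_count and the fallback to 1 thread are illustrative assumptions.

```shell
# Hypothetical helper: choose the OpenMP thread count for a single-node job.
# Uses an explicit argument if given, otherwise the per-node task count
# requested from Slurm, otherwise 1.
omp_thread_count() {
    echo "${1:-${SLURM_NTASKS_PER_NODE:-1}}"
}
```

In a batch script this could be used as export OMP_NUM_THREADS=$(omp_thread_count) before starting the program, or as export OMP_NUM_THREADS=$(omp_thread_count 24) to use fewer threads than were requested.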

MPI C program

C-program named test_mpi.c :

#include <stdio.h>
#include <unistd.h>
#include <mpi.h>
int main(int argc, char **argv) {
  int size, rank;
  char hostname[80];

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  gethostname(hostname, 80);
  printf("Hello world from %d (on node %s) of size %d!\n", rank, hostname, size);
  MPI_Finalize();
  return 0;
}

Makefile

CC     = mpicc

#default rules
%.o: %.c
        $(CC) $(CFLAGS) -c $<
%.exe: %.o
        $(CC) -o $@ $< $(LDLIBS)

default:: test_mpi.exe

Batch script

#!/bin/bash
#SBATCH --output=slurm.%j.out    # STDOUT (%N: nodename, %j: job-ID)
#SBATCH --error=slurm.%j.err     # STDERR
#SBATCH --partition=any          # partition (queue)
#SBATCH --nodes=5                # number of nodes
#SBATCH --ntasks-per-node=32     # number of tasks per node
#SBATCH --mem=4G                 # memory per node (default unit MB; other units with suffix K|M|G|T)
#SBATCH --time=2:00              # total runtime of job allocation (format D-HH:MM:SS; leading parts optional)

module load gcc openmpi/gnu

# start program (with maximum parallelism as specified in job request, for this example 5*32=160)
mpirun ./test_mpi.exe


This section briefly describes how to use some of the installed application programs.


Besides the basic Matlab program, several Matlab toolboxes are installed.

Using Matlab interactively

To run Matlab interactively on wr0, do the following:
user@wr0: module load matlab
user@wr0: matlab
This starts the Matlab shell. If you logged in from an X-server-capable computer and used ssh -Y to log in to wr0, the graphical panel appears on your computer instead of the text panel (see the section X11 applications for details of X-server usage).

Using Matlab with the Batch System

Inside your batch job start Matlab without display:
    module load matlab
    matlab -nodisplay -nosplash -nodesktop -r "m-file"
where m-file is the name of your Matlab script. The script file itself has the suffix .m, but the name is passed to -r without the suffix.
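
Because -r evaluates its argument as a Matlab command, a script stored in a file such as myscript.m is started as myscript. The following shell sketch builds the command line accordingly. The function name matlab_batch_cmd and the script name myscript.m are hypothetical, and the script is assumed to be located in the current directory.

```shell
# Hypothetical helper: print the Matlab batch command for a script file.
# A trailing ".m" is removed because -r expects a command or script name,
# not a file name.
matlab_batch_cmd() {
    local script="${1%.m}"
    printf 'matlab -nodisplay -nosplash -nodesktop -r "%s"\n' "$script"
}
```

Calling matlab_batch_cmd myscript.m prints matlab -nodisplay -nosplash -nodesktop -r "myscript".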

Pitfalls Using Matlab

Matlab is very sensitive to memory allocation / administration.


As there are several groups of OpenFOAM users, we try to bring them together to coordinate the installation of one (or several) OpenFOAM versions. Please contact us if you are interested.

X11 applications

X11 applications can be run only on wr0. To use X11 applications that open a display on your local X server (e.g. xterm, ...), you need to redirect the X11 output to your local X server and allow another computer to open a window on your computer.
  1. The easiest way to enable this is to log in to the WR cluster with ssh and use the ssh option -Y (or, with older ssh versions, -X), which enables X11 tunneling through your ssh connection. If your login path goes over multiple computers, be sure to use the -Y option for every intermediate host on the path.
    user@another_host:  ssh -Y
    On your local computer (i.e. where the X server is running) you must allow wr0 to open a window. To do so, execute the following in a shell on your local computer: xhost +
  2. Another possibility is to set the DISPLAY variable on the cluster and to allow other computers (i.e. the WR cluster) to open a window on your local X server.
    Please be aware that newer X server versions by default do not listen on TCP ports but only on Unix domain sockets, so this second method does not work with them.
You can test your X11 setup by running xterm in an ssh shell on wr0. A window with a shell on wr0 should pop up on your local computer.
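
Before starting an X11 application, it can help to check whether a display is configured in the current shell at all. The following is a minimal sketch with a hypothetical function name. It only inspects the DISPLAY variable, which ssh normally sets automatically when X11 tunneling is active, and does not verify that a window can actually be opened.

```shell
# Hypothetical helper: report whether an X11 display is configured in this shell.
x11_display_status() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY is set to $DISPLAY"
    else
        echo "DISPLAY is not set; X11 applications cannot open a window" >&2
        return 1
    fi
}
```

If x11_display_status reports that DISPLAY is not set after a login with ssh -Y, the X11 tunneling was most likely not established on some part of the login path.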