Passwords can be changed with the passwd command on wr0. It takes some time (up to several minutes) until such a change is seen by all nodes. Please be aware that account names are the same as for your university accounts, but that the accounts themselves are separate, including their passwords.
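For example, to change your cluster password:
user@wr0: passwd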
Several directories are exported from wr0 to all cluster nodes. This includes user data (e.g. $HOME = /home/username) as well as commonly used application software (e.g. /usr/local).
The /tmp directory is guaranteed to be located on a fast node-local filesystem on all nodes. Within a batch job, the environment variable $TMPDIR contains the name of a job-private fast local directory somewhere in /tmp on a node. This directory is created at job start with a job-specific name and removed at job termination. See the additional description here. If possible, use this dynamically set environment variable to write to and read from temporary files used only in one job run.
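A minimal job script sketch that uses $TMPDIR in this way (the program and file names are only placeholders):
#!/bin/bash
#SBATCH --partition=any        # partition (queue)
#SBATCH --ntasks=1             # use 1 task
#SBATCH --time=10:00           # total runtime of job allocation
# copy the input data to the fast job-private directory
cp input.dat $TMPDIR
cd $TMPDIR
# run the program on the node-local copy (placeholder names)
$SLURM_SUBMIT_DIR/my_program.exe input.dat result.dat
# copy results back before the job ends ($TMPDIR is removed at job termination)
cp result.dat $SLURM_SUBMIT_DIR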
The /scratch directory can be used for larger amounts of data that need to be available longer than a single batch job run and/or need to be shared between nodes. The /scratch directory is shared between all nodes and access to it is slow.
The /scratch2 directory is reserved for users with high I/O demands. It is shared between all nodes, has a medium capacity, and has fast access from most nodes (about 10x faster than /scratch). Get in contact with us if you have high I/O demands.
Please be aware that data on the /tmp filesystems may be deleted without any notice after a certain period of time, and that there is no backup for the scratch filesystems!
mount point | purpose | location | shared on all nodes | daily backup | capacity | access speed | default soft quota |
---|---|---|---|---|---|---|---|
/ | operating system | local | no | no | - | - | - |
/tmp | node-local temporary user data | local | no | no | small | fast | 10 GB |
/usr/local | application software | remote server | yes | yes | - | - | - |
/home | user data | remote server | yes | yes | medium | medium | 50 GB |
/scratch | user data | remote server | yes | no | large | slow | 5 TB |
/scratch2 | user data | remote server | yes | no | large | fast | on request |
We have established quotas on the file systems. Users can query their own quota with the command quota -s --show-mntpoint. The maximum number of files is by default restricted to 1 million / 2 million (soft / hard limit) files per file system. For the /scratch filesystem, the limits are 5 million / 6 million files.
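For example, to display your current quotas:
user@wr0: quota -s --show-mntpoint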
Besides the standard solutions for most users, we have individual nodes with special local I/O features. If you have special, fast, or large I/O demands, contact the system administrator to find the best individual solution.
Software environments are managed with the module command and its several subcommands. A software environment is called a module. Loading a module usually means that internally the search paths for commands, libraries etc. are extended.
For a full command reference of the module system, read
the
documentation.
- module avail: shows a list of available modules
- module whatis: shows verbose information on a module
- module load: loads a named module
- module list: shows all currently loaded modules
- module unload: removes a named module

If you omit the version, the default version of a module is loaded. Example: Instead of
user@wr0: module load gcc/10.1.0
just use
user@wr0: module load gcc
user@wr0: module avail
---------------------------------------------- /usr/local/modules/modulesfiles ----------------------------------------------
amd/default gcc/8.1.0 intel-mpi/2018 metis/5.1.0-32 pin/default
aocc/2.1.0 gcc/8.2.0 intel-mpi/2019 metis/5.1.0-64 python/2.7.15
aocc/2.2.0 gcc/9.1.0 intel-mpi/2020 mpitools/default python/2.7.15-dg
aocc/default gcc/default intel-mpi/default nvtop/default python/default
atop/2.3.0 gnuplot/5.2.3 intel-tools/2018 octave/4.4.0 python3/3.6.5
atop/2.4.0 gnuplot/default intel-tools/2019 octave/default python3/3.7.0
atop/2.5.0 hwloc/1.11.10 intel-tools/2020 ompp/0.8.5 python3/3.8.1
atop/default hwloc/1.11.11 intel-tools/default ompp/default python3/3.9.0
boost/1.72.0 hwloc/1.11.13 java/10.0.1 openmpi/gnu python3/default
cmake/3.11.1 hwloc/2.0.1 java/14.0.1 openmpi/intel sage/8.2
cuda/10.2 hwloc/2.0.4 java/default papi/5.6.0 sage/default
cuda/9.2 hwloc/2.1.0 libFHBRS/3.1 papi/5.7.0 slurm/18.08.3
cuda/default hwloc/2.2.0 libFHBRS/default papi/6.0.0 slurm/19.05.5
dinero4/4.7 hwloc/2.3.0 likwid/4.3.2 papi/default slurm/default
dinero4/default hwloc/default likwid/5.0.1 pgi/18.7 texlive/2018
ffmpeg/4.0 intel-compiler/2018 likwid/default pgi/19.4 texlive/default
ffmpeg/default intel-compiler/2019 matlab/default pgi/20.1 valgrind/3.13.0
gcc/10.1.0 intel-compiler/2020 matlab/R2018a pgi/default valgrind/default
gcc/7.3.0 intel-compiler/default matlab/R2019b pin/3.7
---------------------------------------------- /usr/share/Modules/modulefiles -----------------------------------------------
dot module-git module-info modules null use.own
----------------------------------------------------- /etc/modulefiles ------------------------------------------------------
mpi/compat-openmpi16-x86_64 mpi/mvapich2-2.0-psm-x86_64 mpi/mvapich2-2.2-x86_64 mpi/openmpi-x86_64
mpi/mpich-3.0-x86_64 mpi/mvapich2-2.0-x86_64 mpi/mvapich2-psm-x86_64
mpi/mpich-3.2-x86_64 mpi/mvapich2-2.2-psm2-x86_64 mpi/mvapich2-x86_64
mpi/mpich-x86_64 mpi/mvapich2-2.2-psm-x86_64 mpi/openmpi3-x86_64
user@wr0: module whatis gcc
gcc : GNU compiler suite version 10.1.0
# check current compiler version (system default without loading a module)
user@wr0: gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
# load default version
user@wr0: module load gcc
user@wr0: gcc --version
gcc (GCC) 10.1.0
# unload default version
user@wr0: module unload gcc
user@wr0: gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
A list of the most important software available through the module command is:
name | purpose |
---|---|
aocc | AMD compiler |
cmake | CMake system |
cuda | CUDA development and runtime environment |
gcc | GNU compiler suite |
gnuplot | plot program |
hwloc | detect hardware properties |
intel-compiler | Intel Compiler environment |
intel-mpi | Intel MPI |
intel-tools | Intel development tools |
java | Oracle Java environment |
likwid | development tools |
matlab | Matlab mathematical software with toolboxes |
metis | graph partitioning package |
octave | GNU octave |
ompp | OpenMP tool |
opencl | OpenCL |
openmpi | OpenMPI environment |
papi | Papi performance counter library |
pgi | PGI compiler suite |
python | Python 2 |
python3 | Python 3 |
sage | Mathematical software system sage |
slurm | batch system |
texlive | TeX distribution |
valgrind | Valgrind software analysis tool |
To load modules automatically at login, add the corresponding module load commands to the .bash_profile (executed once per login session) or .bashrc (executed once per shell) file in your home directory.
Example $HOME/.bashrc file:
module load intel-compiler openmpi/intel
Batch jobs are managed by Slurm and are submitted on wr0. Slurm has a command line interface and additionally an X11-based graphical interface to display certain batch system state.
To work with batch jobs, a user usually does a sequence of steps
described below.
An example job script for a sequential program /home/user/job_sequential.sh is:
#!/bin/sh
# start sequential program
./test_sequential.exe
# change directory and execute another sequential program
cd subdir
./another_program.exe
An example job script for an OpenMP program /home/user/job_openmp.sh is:
#!/bin/sh
# set the number of threads
export OMP_NUM_THREADS=16
# start OpenMP program
./test_openmp.exe
An example job script for an MPI program /home/user/job_mpi.sh is:
#!/bin/sh
# load the OpenMPI environment
module load openmpi/gnu
# start here your MPI program
mpirun ./test_mpi.exe
Resource requests are specified in the job script in lines that start with #SBATCH (which is a special form of a shell comment). In each line a certain part of the request can be specified. See the documentation of Slurm sbatch for a list of all available options. Here, only an example is given; more options are summarized later.
An example for such a resource request is:
#!/bin/bash
#SBATCH --partition=any # partition (queue)
#SBATCH --nodes=4 # number of nodes
#SBATCH --ntasks-per-node=32 # number of tasks per node
#SBATCH --mem=4G # memory per node in MB (different units with suffix K|M|G|T)
#SBATCH --time=2:00 # total runtime of job allocation (format D-HH:MM:SS; first parts optional)
#SBATCH --output=slurm.%j.out # filename for STDOUT (%N: nodename, %j: job-ID)
#SBATCH --error=slurm.%j.err # filename for STDERR
# here comes the part with the description of the computational work, for example:
# load the OpenMPI environment
module load openmpi/gnu
# start here your MPI program
mpirun ./test_mpi.exe
The meaning of the lines in this example is:
- --partition=any requests the partition named any. A partition is a class of hardware nodes; for most partitions, all nodes have the same or similar hardware properties.
- --mem=4G asks for 4 GB of main memory on each of the nodes.
- --time=2:00 asks for 2 minutes of usage of the requested resources.

Altogether, the request asks whether 4 nodes of the partition any are available with 32 cores and 4 GB memory each, for 2 minutes.
An alternative way to specify such a request is:
#!/bin/bash
#SBATCH --partition=any # partition (queue)
#SBATCH --ntasks=80 # number of tasks <---------- this is different to above
#SBATCH --mem=4G # memory per node in MB (different units with suffix K|M|G|T)
#SBATCH --time=2:00 # total runtime of job allocation (format D-HH:MM:SS; first parts optional)
#SBATCH --output=slurm.%j.out # filename for STDOUT (%N: nodename, %j: job-ID)
#SBATCH --error=slurm.%j.err # filename for STDERR
# here comes the part with the description of the computational work, for example:
# load the OpenMPI environment
module load openmpi/gnu
# start here your MPI program
mpirun ./test_mpi.exe
In this example, 80 parallel execution units are requested. This can
be fulfilled by 4 x 20-core nodes. But this request may
also be fulfilled by one node with 80 cores or 80 nodes with one core
used on each (and other cores on a node left for other jobs). This
specification gives more freedom to the batch system to find
resources. But the programming model is (usually) restricted to MPI as
a program run may be spread over several nodes.
A job is submitted to the batch system with the sbatch command, using the job script filename as an argument.
Example:
user@wr0: sbatch jobscript.sh
If the system accepts the request (i.e., no syntax error in the script
etc.) the batch system prints a job ID that may be used to refer to
this job.
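For example (the job ID shown is arbitrary):
user@wr0: sbatch jobscript.sh
Submitted batch job 4711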
Please be aware that all modules loaded in your interactive session (where you execute the sbatch command) are also loaded when your submitted batch job starts. The same batch job may therefore behave differently depending on which modules are loaded in the interactive session!
The state of your submitted jobs can be checked with the squeue command.
user@wr0: squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
55 any test2.sh user PD 0:00 2 (Resources)
56 any test3.sh user PD 0:00 2 (Priority)
54 any test1.sh user R 0:08 2 wr[50,51]
In the example the user has 3 jobs submitted that are either running
or still waiting. The column ST marks the job state (R=running, PD=waiting).
After a job has finished, the files specified for STDOUT and STDERR contain the job output, e.g.:
user@wr0: ls -l
-rw------- 1 user fb02 316 Mar 9 07:27 slurm.51.out
-rw------- 1 user fb02 11484 Mar 9 07:27 slurm.52.err
command | meaning |
---|---|
sbatch <shell-script> | submit the shell-script to the batch system |
scancel <jobid> | delete a job with the given job ID, that may be either in running or waiting state |
squeue | show the state of own jobs in queues |
sinfo [options] | show the state of partitions or nodes |
scontrol show job <jobid> | show more details for the job |
Use the sinfo command for a list of available partitions. Associated with each partition are certain policies (hardware properties, maximum number of jobs in queue, maximum runtime per job, scheduling priority, maximum physical memory, special hardware features).
A list of the most important queues is:
queue name | maximum time per job | usable memory | default virt.memory/process | nodes used |
---|---|---|---|---|
any | 72 hours | (dependent on node) | 1 GB | any node |
hpc | 72 hours | 185 GB | 1 GB | wr50-wr99 |
hpc3 | 72 hours | 185 GB | 1 GB | wr50-wr99 |
gpu | 72 hours | 185 GB | 1 GB | wr12,wr15-wr19 |
gpu4 | 72 hours | 185 GB | 1 GB | wr15 |
wr14 | 72 hours | 120 GB | 1 GB | wr14 |
wr43 | 72 hours | 750 GB | 1 GB | wr43 |
wr44 | 72 hours | 1 TB | 1 GB | wr44 |
variable name | purpose | example |
---|---|---|
$SLURM_SUBMIT_DIR | working directory where the job was submitted | /home/user/testdir |
$SLURM_JOB_ID | job ID given to the job | 65 |
$SLURM_JOB_NAME | job name given to the job | testjob |
$SLURM_JOB_NUM_NODES | number of nodes assigned to this job | 2 |
$SLURM_JOB_CPUS_PER_NODE | number of cores per node assigned to this job | 32(x5) (32 cores, on 5 nodes) |
$SLURM_JOB_NODELIST | node names of assigned nodes | wr[50,51] |
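A sketch of how these variables might be used inside a job script (the echo lines are only illustrative):
#!/bin/bash
#SBATCH --partition=any        # partition (queue)
#SBATCH --nodes=2              # number of nodes
#SBATCH --time=2:00            # total runtime of job allocation
# log some information about the allocation into the job output
echo "job $SLURM_JOB_ID ($SLURM_JOB_NAME) submitted from $SLURM_SUBMIT_DIR"
echo "running on $SLURM_JOB_NUM_NODES node(s): $SLURM_JOB_NODELIST"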
For OpenMP jobs, use #SBATCH --ntasks-per-core=1 to use only physical cores (no hyperthreading). Example:
#!/bin/bash
#SBATCH --partition=hpc3 # partition
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-core=1 # use only real cores
#SBATCH --time=2:00 # total runtime of job allocation
export OMP_NUM_THREADS=32
./test_openmp.exe
For MPI jobs, useful options are:
- #SBATCH --cpus-per-task=X to reserve X CPUs per (MPI) task
- #SBATCH --ntasks-per-node=X to spread tasks equally over nodes (X tasks per node in the example)

Example:
#!/bin/bash
#SBATCH --partition=any # partition
#SBATCH --nodes=4 # number of nodes
#SBATCH --ntasks-per-node=32 # number of cores per node
#SBATCH --time=2:00 # total runtime of job allocation
module load openmpi/gnu
mpirun ./test_mpi.exe
To request a specific node, use e.g. #SBATCH --nodelist=wr73 to ask for node wr73.

Within a batch job, the environment variable $TMPDIR is defined with the name of a temporary directory (with fast access) that should be used for fast temporary file storage within the job's scope. The directory is created at job start and deleted when the job finishes.
Example on how to use the environment variable within a program:
char *basedir = getenv("TMPDIR");   /* job-private fast temporary directory */
if (basedir != NULL)
{
    char *filename = "test.dat";
    char allname[1024];
    snprintf(allname, sizeof(allname), "%s/%s", basedir, filename);
    FILE *f = fopen(allname, "w");
    /* ... write to and read from the temporary file ... */
}
If you want information about the resource usage of your jobs, the command sstat helps with that for running jobs.
user@wr0: sstat --format=jobid,maxvmsize,MaxDiskRead 123456.batch
JobID MaxVMSize MaxDiskRead
------------ ---------- ------------
123456.batch 47654040K 39789920
where 123456
is the job number of the running job.
If you need such information for already finished jobs, use the command sacct.
Example:
user@wr0: sacct -j 123456.batch --format="jobid,CPUTime,MaxVMSize,MaxDiskRead"
JobID CPUTime MaxVMSize MaxDiskRead
------------ ---------- ---------- ------------
123456.batch 01:37:04 24173824K 828.59M
GPU nodes are in the following queues:
batch queue | number of nodes | GPU cards | with tensor cores | CUDA compute capability |
---|---|---|---|---|
gpu | 6 | Nvidia V100 | yes | 7.0 |
gpu4 | 1 | 4x Nvidia V100 | yes | 7.0 |
wr14 | 1 | Nvidia K80 | no | 3.7 |
Since CUDA 11.x, GPUs with compute capability less than 5.2 are by default no longer supported (marked as deprecated).
If you want to compile CUDA code that runs on all of our GPU platforms, either work with a CUDA version below 11 (e.g. module load cuda/10.2) or, with CUDA 11.2, compile the code explicitly for all of our platforms with:
user@wr0: nvcc -gencode arch=compute_37,code=sm_37 -gencode arch=compute_70,code=sm_70 ...
i.e. for compute capability 3.7 and 7.0.
To optimize the throughput on our GPU nodes and to minimize waiting times for all GPU users, please follow these rules: use the gpu4 partition only for jobs that really need several GPU cards, and use gpu instead of gpu4 otherwise. Before submitting larger GPU jobs, check the GPU utilization of your program on wr14, which you can access interactively, i.e. ssh wr14.
There you can start a short but representative program run. Afterwards you can get information about the GPU utilization of this program with the command:
user@wr14: nvidia-smi -q -d ACCOUNTING
which results in output like:
==============NVSMI LOG==============
Timestamp : Fri Nov 20 16:54:34 2020
Driver Version : 440.33.01
CUDA Version : 10.2
Attached GPUs : 2
GPU 00000000:84:00.0
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Accounted Processes
Process ID : 48147
GPU Utilization : 85 %
Memory Utilization : 81 %
Max memory usage : 171 MiB
Time : 6936 ms
Is Running : 0
GPU 00000000:85:00.0
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Accounted Processes : None
The lines below Process ID
give you valuable information about the GPU utilization. If more than one process is listed in the output, usually the last entry relates to the last execution on the GPU.
You can also monitor the GPU usage of your running program on a batch node with the command:
user@wr0: srun -s --jobid your-running-job-id --pty nvidia-smi
where your-running-job-id is the job ID of your running GPU program on a batch node. The output looks similar to
user@wr0> srun -s --jobid 123456 --pty nvidia-smi
...
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:01:00.0 Off | Off |
| N/A 28C P0 85W / 500W | 592MiB / 81920MiB | 85% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 386553 C ./vectorproduct.exe 534MiB |
+---------------------------------------------------------------------------------------+
which shows that the current utilization of the GPU used is 85% and that the program that runs kernels on the GPU is called ./vectorproduct.exe and uses 534 MiB of GPU memory.
For interactive development use:
- wr0 for all development where you do not need special hardware (e.g. an accelerator) or want to use MPI
- wr14 for GPU development, i.e. CUDA, OpenCL, and MPI tests with small data sets and a small number of MPI processes. Do a ssh -Y wr14 to work interactively on wr14.
Alternatively, an interactive session on a compute node can be requested through the batch system with the srun command:
srun --x11 --pty /bin/bash
with additional options possible.
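For example, to get an interactive shell on a node of the partition any for 30 minutes (the values are only examples):
user@wr0: srun --partition=any --time=30:00 --x11 --pty /bin/bash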
compiler | name | module command | documentation | safe optimization | debug option | compiler feedback | version |
---|---|---|---|---|---|---|---|
GNU C | cc / gcc | module load gcc | man gcc | -O2 | -g | -ftree-vectorizer-verbose=2 | --version |
Intel C (oneAPI) | icx | module load intel-compiler | man icx | -O2 | -g | -qopt-report | --version |
PGI C | pgcc | module load pgi | man pgcc | -O2 | -g | -Minfo=vec | --version |
GNU C++ | g++ | module load gcc | man g++ | -O2 | -g | -ftree-vectorizer-verbose=2 | --version |
Intel C++ (oneAPI) | icpx | module load intel-compiler | man icpx | -O2 | -g | -qopt-report | --version |
PGI C++ | pgc++ | module load pgi | man pgc++ | -O2 | -g | -Minfo=vec | --version |
GNU Fortran | gfortran | module load gcc | man gfortran | -O2 | -g | -ftree-vectorizer-verbose=2 | --version |
Intel Fortran | ifort | module load intel-compiler | man ifort | -O2 | -g | -vec-report=2 (or higher) | --version |
PGI Fortran | pgfortran | module load pgi | man pgfortran | -O2 | -g | -Minfo=vec | --version |
Oracle Java | javac | module load java | | -O | -g | n.a. | -version |
Examples:
cc -O2 t.c
module load intel-compiler; ifort -O2 t.f
On wr14 there is additionally the whole PGI compiler infrastructure with compilers and the profiler pgprof installed. Documentation is available under /usr/local/PGI/. The tool infrastructure can be used only on wr14. The generated code may be executed on all nodes. Exception: if you use the accelerator functionality of the PGI compiler, the code can be executed only on nodes with a GPU.
The Intel Math Kernel Library (MKL) is available after module load intel-compiler, which expands the include file search paths and library search paths accordingly. It should be used preferably on Intel-based systems, but works also on AMD systems. The library contains basic mathematical functions (BLAS, LAPACK, FFT, ...).
If you use any of the Intel compilers, just add the flag -mkl
as a
compiler and linker flag. Otherwise,
check this page
for the appropriate version and corresponding flags.
Example for Makefile:
CC = icx
CFLAGS = -mkl
LDLIBS = -mkl
By default MKL uses all available cores. You can restrict this number with the
environment variable MKL_NUM_THREADS
, e.g.
export MKL_NUM_THREADS=1
before you start an MKL-based program.
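A minimal sketch of a C program that calls an MKL BLAS routine (compiled e.g. with icx -mkl t.c; the routine cblas_ddot is part of MKL's CBLAS interface):
#include <stdio.h>
#include <mkl.h>   /* MKL header; search paths are set by module load intel-compiler */

int main(void) {
    double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {4.0, 5.0, 6.0};
    /* dot product computed by the MKL BLAS routine */
    double d = cblas_ddot(3, x, 1, y, 1);
    printf("dot product = %f\n", d);
    return 0;
}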
compiler | name | module command | documentation | version |
---|---|---|---|---|
GNU OpenMP C/C++ | gcc/g++ -fopenmp | module load gcc | man gcc | --version |
Intel OpenMP C/C++ (oneAPI) | icx/icpx -qopenmp | module load intel-compiler | man icx / man icpx | --version |
PGI OpenMP C/C++ | pgcc/pgCC -mp | module load pgi | man pgcc / man pgCC | --version |
Intel OpenMP Fortran | ifort -qopenmp | module load intel-compiler | man ifort | --version |
GNU OpenMP Fortran | gfortran -fopenmp | module load gcc | man gfortran | --version |
PGI OpenMP Fortran | pgfortran -mp | module load pgi | man pgfortran | --version |
Example: Compile and run an OpenMP C file:
module load intel-compiler
icx -qopenmp -O2 t.c
export OMP_NUM_THREADS=8
./a.out
compiler | name | module command | documentation | version |
---|---|---|---|---|
MPI C (based on gcc) | mpicc | module load openmpi/gnu | see gcc | --version |
MPI C++ (based on gcc) | mpic++ | module load openmpi/gnu | see g++ | --version |
MPI Fortran (based on gfortran) | mpif90 | module load openmpi/gnu | see gfortran | --version |
MPI C (based on Intel oneAPI icx) | mpiicx | module load openmpi/intel | see icx | --version |
MPI C++ (based on Intel oneAPI icpx) | mpiicpx | module load openmpi/intel | see icpx | --version |
MPI Fortran (based on ifort) | mpiifort | module load openmpi/intel | see ifort | --version |
Which MPI compilers are used can be influenced through the module command: with module load openmpi/gnu you can use the GNU compiler environment (gcc, g++, gfortran), and with module load openmpi/intel you can use the Intel compiler environment (icc, icpc, ifort). Be aware that even with module load openmpi/intel the MPI compiler names mpicc etc. are mapped to the GNU compilers. To use an Intel compiler you need to specify Intel's own names, i.e. mpiicx, mpiicpx, mpiifort.
All options discussed in the compiler section also apply here, e.g. optimization.
Example: Compile an MPI C file and generate optimized code:
module load openmpi/intel
mpiicx -O2 t.c
The MPI implementation we use (OpenMPI) has options to influence the communication medium used.
Within one node, MPI processes can communicate through shared memory, Omni-Path, or Ethernet with TCP/IP.
Between nodes, Omni-Path or Ethernet with TCP/IP is possible. OpenMPI usually chooses the most appropriate medium, which means you don't need to specify anything.
But if you want to choose a specific and applicable medium you may specify this
in the call to mpirun
through the --mca btl
specifier:
mpirun --mca btl communication-channels ...
where communication-channels is a comma-separated list of communication media. Possible values are: sm for shared memory, openib for Omni-Path / InfiniBand, and tcp for Ethernet. The last specifier must be self.
mpirun --mca btl tcp,self -np 4 mpi.exe
For CUDA and OpenCL development, log in to wr14 interactively (ssh wr14), as all necessary drivers are installed locally on that system. Production runs on any Tesla card should be done using the appropriate batch queues.
Use module load cuda to load the CUDA environment (specific versions can be selected as well).
Use module load opencl/nvidia
or module load
opencl/intel
to load the OpenCL environment, for Nvidia GPUs or
Intel processors, respectively.
With both modules, the standard environment variables CPATH (for include files) and LIBRARY_PATH (for libraries) are set accordingly, to be used e.g. in a makefile.
module load opencl
cc opencltest.c -lOpenCL
./a.out
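A minimal sketch of what opencltest.c could look like (it only queries the number of available OpenCL platforms):
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_uint num_platforms = 0;
    /* query how many OpenCL platforms are visible on this node */
    cl_int err = clGetPlatformIDs(0, NULL, &num_platforms);
    if (err != CL_SUCCESS) {
        printf("clGetPlatformIDs failed with error %d\n", err);
        return 1;
    }
    printf("found %u OpenCL platform(s)\n", num_platforms);
    return 0;
}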
To compile a CUDA project use the following Makefile template:
# defines
CC = cc
CUDA_CC = nvcc
LDLIBS = -lcudart
# default rules based on suffices
# C
%.o: %.c
$(CC) -c $(CFLAGS) -o $@ $<
# CUDA
%.o: %.cu
$(CUDA_CC) -c $(CUDA_CFLAGS) -o $@ $<
myprogram.exe: myprogram.o kernel.o
$(CC) -o $@ $^ $(LDLIBS)
Here the CUDA kernel and host part is in a file kernel.cu and the non-CUDA part of your program is in a file myprogram.c.
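A minimal sketch of how the two files could look (the kernel add_one and the wrapper run_kernel are only illustrative; the extern "C" wrapper makes the CUDA part callable from the plain C host part):
kernel.cu:
#include <cuda_runtime.h>

// simple CUDA kernel: add 1.0 to every vector element
__global__ void add_one(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

// C-callable wrapper so the plain C host part can start the kernel
extern "C" void run_kernel(float *host_x, int n)
{
    float *dev_x;
    cudaMalloc(&dev_x, n * sizeof(float));
    cudaMemcpy(dev_x, host_x, n * sizeof(float), cudaMemcpyHostToDevice);
    add_one<<<(n + 255) / 256, 256>>>(dev_x, n);
    cudaMemcpy(host_x, dev_x, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_x);
}
myprogram.c:
#include <stdio.h>

/* implemented in kernel.cu */
extern void run_kernel(float *x, int n);

int main(void)
{
    float x[4] = {0.0f, 1.0f, 2.0f, 3.0f};
    run_kernel(x, 4);
    printf("%f %f %f %f\n", x[0], x[1], x[2], x[3]);
    return 0;
}
Depending on the CUDA version it may additionally be necessary to add -lstdc++ to LDLIBS when linking with the plain C compiler.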
For OpenACC programming with the PGI compiler, see /usr/local/PGI/doc for documentation.
Compile such programs interactively on wr14 only. The generated code can be executed on wr14-wr27. You can specify the compute capability as a compiler option. Important: by default the PGI compiler generates debug code that is in general very slow. If you want fast code, add the nodebug option.
Example:
module load pgi
pgcc -acc -ta=nvidia,cc3.5,nodebug openacctest.c
./a.out
where 3.5
corresponds to the compute capability of the target GPU.
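A minimal sketch of what openacctest.c could look like (vector addition with an OpenACC parallel loop; the array size is arbitrary):
#include <stdio.h>
#define N 1000000

int main(void)
{
    static float a[N], b[N], c[N];
    int i;
    for (i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2.0f * i;
    }
    /* this loop can be offloaded to the GPU by the OpenACC compiler */
    #pragma acc parallel loop copyin(a, b) copyout(c)
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];
    printf("c[1] = %f\n", c[1]);
    return 0;
}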
To measure the peak memory consumption of a program run, you can use
/usr/bin/time -f "%M KB" command
which prints out the peak memory consumption in kilobytes of the command execution.
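For example, to measure the sequential example program shown below:
user@wr0: /usr/bin/time -f "%M KB" ./test_sequential.exe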
Example sequential program test_sequential.c:
#include <stdio.h>
int main(int argc, char **argv) {
printf("Hello world\n");
return 0;
}
CC = cc
CFLAGS = -O
#default rules
%.o: %.c
$(CC) $(CFLAGS) -c $<
%.exe: %.o
$(CC) -o $@ $< $(LDLIBS)
default:: test_sequential.exe
#!/bin/bash
#SBATCH --output=slurm.%j.out # STDOUT
#SBATCH --error=slurm.%j.err # STDERR
#SBATCH --partition=any # partition (queue)
#SBATCH --ntasks=1 # use 1 task
#SBATCH --mem=100 # memory per node in MB (different units with suffix K|M|G|T)
#SBATCH --time=2:00 # total runtime of job allocation (format D-HH:MM:SS; first parts optional)
# start program
./test_sequential.exe
Example OpenMP program test_openmp.c:
#include <stdio.h>
#include <omp.h>
int main(int argc, char **argv) {
#pragma omp parallel
printf("I am the %d. thread of %d threads\n", omp_get_thread_num(), omp_get_num_threads());
return 0;
}
CC = gcc -fopenmp
CFLAGS = -O
#default rules
%.o: %.c
$(CC) $(CFLAGS) -c $<
%.exe: %.o
$(CC) -o $@ $< $(LDLIBS)
default:: test_openmp.exe
#!/bin/bash
#SBATCH --output=slurm.%j.out # STDOUT (%N: nodename, %j: job-ID)
#SBATCH --error=slurm.%j.err # STDERR
#SBATCH --partition=any # partition (queue)
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=32 # number of tasks per node
#SBATCH --mem=1G # memory per node in MB (different units with suffix K|M|G|T)
#SBATCH --time=2:00 # total runtime of job allocation (format D-HH:MM:SS; first parts optional)
# start program (with 24 threads, in total 32 threads were requested by the job)
export OMP_NUM_THREADS=24
./test_openmp.exe
Example MPI program test_mpi.c:
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>
int main(int argc, char **argv) {
int size, rank;
char hostname[80];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
gethostname(hostname, 80);
printf("Hello world from %d (on node %s) of size %d!\n", rank, hostname, size);
MPI_Finalize();
return 0;
}
CC = mpicc
CFLAGS = -O
#default rules
%.o: %.c
$(CC) $(CFLAGS) -c $<
%.exe: %.o
$(CC) -o $@ $< $(LDLIBS)
default:: test_mpi.exe
#!/bin/bash
#SBATCH --output=slurm.%j.out # STDOUT (%N: nodename, %j: job-ID)
#SBATCH --error=slurm.%j.err # STDERR
#SBATCH --partition=any # partition (queue)
#SBATCH --nodes=5 # number of nodes
#SBATCH --ntasks-per-node=32 # number of tasks per node
#SBATCH --mem=4G # memory per node in MB (different units with suffix K|M|G|T)
#SBATCH --time=2:00 # total runtime of job allocation (format D-HH:MM:SS; first parts optional)
module load gcc openmpi/gnu
# start program (with maximum parallelism as specified in job request, for this example 5*32=160)
mpirun ./test_mpi.exe
user@wr0: module load matlab
user@wr0: matlab

This starts the Matlab shell. If you logged in from an X server capable computer and used ssh -Y username@wr0.wr.inf.h-brs.de to log in to wr0, the graphical panel appears on your computer instead of the text panel (see here for details on X server usage).
To run Matlab in batch mode (e.g. within a job script), use:
module load matlab
matlab -nodisplay -nosplash -nodesktop -r "m-file"
where m-file is the name of your Matlab script (a file with the suffix .m).
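A minimal job script sketch for such a batch Matlab run (partition, memory, and runtime are only example values; myscript.m is a hypothetical script):
#!/bin/bash
#SBATCH --partition=any        # partition (queue)
#SBATCH --ntasks=1             # use 1 task
#SBATCH --mem=4G               # memory per node
#SBATCH --time=30:00           # total runtime of job allocation
module load matlab
# run the hypothetical Matlab script myscript.m without a GUI
matlab -nodisplay -nosplash -nodesktop -r "myscript"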
It may help to set export MALLOC_ARENA_MAX=4 before starting Matlab. This influences / restricts Matlab's memory allocation in a multithreaded environment.

To use graphical programs, log in with the ssh option -Y (or, with older ssh versions, also -X), which enables X11 tunneling through your ssh connection.
If your login path goes over multiple computers please be sure to use the -Y
option
for every intermediate host on the path.
user@another_host: ssh -Y user@wr0.wr.inf.h-brs.de

On your local computer (i.e. where the X server is running) you must allow wr0 to open a window. Execute on your local computer in a shell:
xhost +
If necessary, point the DISPLAY variable on wr0 to your local computer:
export DISPLAY=mycomputer.mydomain:0.0
Be careful with a plain xhost + (which would allow any computer to open a window on your X server). To test the setup, start xterm on wr0. A window on your local computer must pop up with a shell on wr0.