You can change your password with the passwd command on wr0. Passwords must comply with the password rules of the university (e.g. at least 10 characters, ...).
Several file systems are exported from wr0 to all cluster nodes. This includes user data (e.g. $HOME = /home/username) as well as commonly used application software (e.g. /usr/local).
The /tmp directory is guaranteed to be located on a fast node-local filesystem on all nodes. Within a batch job, the environment variable $TMPDIR contains the path of a job-private fast local directory somewhere in /tmp on a node. This directory is created with a job-specific name at job start and removed at job termination. See the additional description here. If possible, use this dynamically set environment variable to write to and read from temporary files that are used only within one job run.
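For example, a job script might stage its scratch data through $TMPDIR like this (the program and file names are only placeholders):
#!/bin/bash
#SBATCH --partition=any
#SBATCH --ntasks=1
#SBATCH --time=10:00
# work in the fast node-local directory provided by the batch system
cd $TMPDIR
# run a (hypothetical) program that writes its scratch files here
$SLURM_SUBMIT_DIR/my_program.exe > result.out
# copy results back before the job ends; $TMPDIR is removed afterwards
cp result.out $SLURM_SUBMIT_DIR/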
The /work directory can be used for larger amounts of data that need to be available longer than a batch job run and/or need to be shared between nodes. The /work directory is shared between all nodes, and access to it is slower than to a local /tmp-based file.
Please be aware that data on /tmp filesystems may be deleted without any notice after a certain period of time. And be aware that there is no backup for the /tmp and /work file systems!
mount point | purpose | located | shared on all nodes | daily backup | capacity | access time | default quota |
---|---|---|---|---|---|---|---|
/ | operating system | local | no | no | - | - | - |
/tmp | node-local temporary user data | local | no | no | small | fast | 10 GB |
/usr/local | application software | remote server | yes | yes | - | - | - |
/home | user data | remote server | yes | yes | medium | medium | 100 GB |
/work | user data | remote server | yes | no | large | medium | 10 TB |
We have established quotas on the file systems. Users can query their own quota with the command quota -s --show-mntpoint. The maximum number of files is by default restricted to 1 million / 2 million (soft / hard limit) files per file system. For the /work filesystem, the limits are 5 million / 6 million.
Besides the standard solutions for most users, we have individual nodes with special local I/O features. If you have special, fast or large I/O demands, contact the system administrator to find the best individual solution.
Software environments are managed with the module command and its subcommands. A software environment is called a module. Loading a module usually means that internally the search paths for commands, libraries etc. are extended.
For a full command reference of the module system, read
the
documentation.
- module avail shows a list of available modules
- module whatis shows verbose information on a module
- module load loads a named module
- module list shows all currently loaded modules
- module unload removes a named module

Example: Instead of
user@wr0: module load gcc/13.2.0
just use
user@wr0: module load gcc
user@wr0: module avail
aocc/4.1.0 cuda/default gnuplot/default libFHBRS/default openmpi/4.1.5 pin/default
aocc/default dinero4/4.7 intel-compiler/2023 likwid/5.2.2 openmpi/default scalasca/2.6.1
aocl/aocl-aocc-4.1.0 dinero4/default intel-compiler/default likwid/default openmpi/gnu slurm/23.02.4
aocl/aocl-gcc-4.1.0 ffmpeg/6.0 intel-mpi/2023 metis/5.2.1-32 oracle-java/20 slurm/default
aocl/default ffmpeg/default intel-mpi/default metis/5.2.1-64 oracle-java/default texlive/2023
cmake/3.27.1 gcc/13.2.0 intel-tools/2023 openjdk/20.0.2 pgi/20.1 texlive/default
cmake/default gcc/default intel-tools/default openjdk/default pgi/default valgrind/3.21.0
cuda/12.2 gnuplot/5.4.8 libFHBRS/3.2 openmpi-system/gnu pin/3.28 valgrind/default
------------------------------------------------- /usr/share/Modules/modulefiles -------------------------------------------------
dot module-git module-info modules null use.own
----------------------------------------------------- /usr/share/modulefiles -----------------------------------------------------
mpi/mpich-x86_64 mpi/openmpi-x86_64
Key:
loaded modulepath
user@wr0: module whatis gcc
gcc : GNU compiler suite version 10.1.0
# check current compiler version (system default without loading a module)
user@wr0: gcc --version
gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4)
# load default version
user@wr0: module load gcc
user@wr0: gcc --version
gcc (GCC) 13.2.0
# unload default version
user@wr0: module unload gcc
user@wr0: gcc --version
gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4)
A list of the most important software available through the module command is:
name | purpose |
---|---|
aocc | AMD compiler (called with clang,...) |
aocl | AMD optimized libraries (e.g. BLAS) |
cmake | CMake system |
cuda | Nvidia CUDA development and runtime environment |
gcc | GNU compiler suite |
gnuplot | plot program |
intel-compiler | Intel compiler environment |
intel-tools | Intel development tools |
matlab | Matlab mathematical software with toolboxes |
metis | graph partitioning package |
openmpi | OpenMPI MPI environment |
openjdk | OpenJDK Java development kit |
oracle-java | Oracle Java development kit |
pgi | PGI compiler suite |
slurm | batch system |
texlive | TeX distribution |
valgrind | Valgrind software analysis tool |
If you want modules to be loaded automatically at login, add the corresponding module commands to your .bash_profile (executed once per session) or .bashrc (executed once per shell) file in your home directory.
Example $HOME/.bashrc
file:
module load gcc openmpi
Slurm has a command line interface and additionally an X11-based graphical interface to display certain batch system states.
To work with batch jobs, a user usually performs the sequence of steps described below, step by step.
An example job script /home/user/job_sequential.sh for a sequential program is:
#!/bin/sh
# start sequential program
./test_sequential.exe
# change directory and execute another sequential program
cd subdir
./another_program.exe
An example job script /home/user/job_openmp.sh for an OpenMP program is:
#!/bin/sh
# set the number of threads
export OMP_NUM_THREADS=16
# start OpenMP program
./test_openmp.exe
An example job script /home/user/job_mpi.sh for an MPI program is:
#!/bin/sh
# load the OpenMPI environment
module load openmpi/gnu
# start here your MPI program
mpirun ./test_mpi.exe
Resources are requested with lines starting with #SBATCH (which is a special form of a shell comment). In each line a certain part of the request can be specified. See the documentation of Slurm sbatch for a list of all options that are available. Here, only an example is given. More options are given in a summary later.
An example for such a resource request is:
#!/bin/bash
#SBATCH --partition=any # partition / wait queue
#SBATCH --nodes=4 # number of nodes
#SBATCH --ntasks-per-node=32 # number of tasks per node
#SBATCH --mem=4G # memory per node in MB (different units with suffix K|M|G|T)
#SBATCH --time=2:00 # total runtime of job allocation (format D-HH:MM:SS; first parts optional)
#SBATCH --output=slurm.%j.out # filename for STDOUT (%N: nodename, %j: job-ID)
#SBATCH --error=slurm.%j.err # filename for STDERR
# here comes the part with the description of the computational work, for example:
# load the OpenMPI environment
module load openmpi/gnu
# start here your MPI program
mpirun ./test_mpi.exe
The meaning of the lines in this example is:
- The partition any is requested. A partition is a class of hardware nodes with the same or similar properties. For most partitions, all nodes in the partition have the same or similar hardware properties.
- --mem=4G asks for 4 GB of main memory on each of the nodes.
- --time=2:00 asks for a maximum of 2 minutes of usage for the requested resources.
In total, the job asks for 4 nodes from partition any with 32 free cores and 4 GB of memory each, for 2 minutes.
#!/bin/bash
#SBATCH --partition=any # partition (queue)
#SBATCH --tasks=80 # number of tasks <---------- this is different from above
#SBATCH --mem=4G # memory per node in MB (different units with suffix K|M|G|T)
#SBATCH --time=2:00 # total runtime of job allocation (format D-HH:MM:SS; first parts optional)
#SBATCH --output=slurm.%j.out # filename for STDOUT (%N: nodename, %j: job-ID)
#SBATCH --error=slurm.%j.err # filename for STDERR
# here comes the part with the description of the computational work, for example:
# load the OpenMPI environment
module load openmpi/gnu
# start here your MPI program
mpirun ./test_mpi.exe
In this example, 80 parallel execution units are requested. This can
be fulfilled by 4 x 20-core nodes. But this request may
also be fulfilled by one node with 80 cores or 80 nodes with one core
used on each (and other cores on a node left for other jobs). This
specification gives more freedom to the batch system to find
resources. But the programming model is (usually) restricted to MPI as
a program run may be spread over several nodes.
The job script is submitted to the batch system with the sbatch command, using the job script filename as an argument.
Example:
user@wr0: sbatch jobscript.sh
If the system accepts the request (i.e., no syntax error in the script, the requested resources can in principle be fulfilled at some time etc.) the batch system prints a job ID that may be used to refer to
this job.
Please be aware that all modules loaded in your interactive session (where you execute the sbatch command) are also loaded when your submitted batch job starts. A batch job may therefore behave differently depending on which modules were loaded in the interactive session!
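To make a job independent of the interactively loaded modules, you can reset the module environment at the beginning of the job script; a minimal sketch (the modules loaded here are just examples):
#!/bin/bash
#SBATCH --partition=any
#SBATCH --ntasks=1
#SBATCH --time=2:00
# start from a clean module environment, then load exactly what the job needs
module purge
module load gcc
./test_sequential.exe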
The state of submitted jobs can be checked with the squeue command.
user@wr0: squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
55 any test2.sh user PD 0:00 2 (Resources)
56 any test3.sh user PD 0:00 2 (Priority)
54 any test1.sh user R 0:08 2 wr[50,51]
In the example, the user has 3 jobs submitted that are either running or still waiting. The column ST shows the job state (R = running, PD = pending/waiting).
user@wr0: ls -l
-rw------- 1 user fb02 316 Mar 9 07:27 slurm.51.out
-rw------- 1 user fb02 11484 Mar 9 07:27 slurm.52.err
command | meaning |
---|---|
sbatch <shell-script> | submit the shell-script to the batch system |
scancel <jobid> | delete a job with the given job ID, that may be either in running or waiting state |
squeue | show the state of own jobs in queues |
sinfo [options] | show the state of partitions or nodes |
scontrol show job <jobid> | show more details for the job |
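For example, with the job IDs from the squeue listing above you could inspect the running job or remove one of the waiting jobs (the IDs are only illustrative):
user@wr0: scontrol show job 54
user@wr0: scancel 55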
Use the sinfo command for a list of available partitions. With each partition certain policies are associated (hardware properties, maximum number of jobs in the queue, maximum runtime per job, scheduling priority, maximum physical memory, special hardware features).
Preferably use a more general partition, e.g. any, unless
you have special hardware/software requirements. Job requests for a
more general partition have higher priority than job requests for a
more specialized partition.
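To get a quick overview of the partitions with their time limits, node counts and node lists, sinfo's format option can be used; the format string below is just one possible choice:
user@wr0: sinfo -o "%P %l %D %N"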
A list of the most important queues is:
queue name | maximum time per job | usable memory | default virt. memory/process (on CPU) | nodes used |
---|---|---|---|---|
any | 72 hours | (dependent on node) | 1 GB | any node |
hpc | 72 hours | 370 GB (AMD part) or 185 GB (Intel part) | 1 GB | wr50-wr106 |
hpc1 | 72 hours | 370 GB | 1 GB | wr50-wr74 |
hpc3 | 72 hours | 185 GB | 1 GB | wr75-wr106 |
gpu | 72 hours | 185 GB | 1 GB | wr15-wr19 |
gpu4 | 72 hours | 185 GB (wr14) or 470 GB (wr20-wr25) | 1 GB | wr14,wr20-wr25 |
bigmem | 72 hours | 750 GB / 1 TB | 1 GB | wr43 / wr44 |
variable name | purpose | example |
---|---|---|
$SLURM_SUBMIT_DIR | working directory where the job was submitted | /home/user/testdir |
$SLURM_JOB_ID | job ID given to the job | 65 |
$SLURM_JOB_NAME | job name given to the job | testjob |
$SLURM_JOB_NUM_NODES | number of nodes assigned to this job | 2 |
$SLURM_JOB_CPUS_PER_NODE | number of cores per node assigned to this job | 32(x5) (32 cores, on 5 nodes) |
$SLURM_JOB_NODELIST | node names of assigned nodes | wr[50,51] |
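A job script can use these variables, e.g. to log where and on how many nodes it ran; a small sketch:
#!/bin/bash
#SBATCH --partition=any
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --time=2:00
# print some information about the resources assigned to this job
echo "job $SLURM_JOB_ID ($SLURM_JOB_NAME) was submitted from $SLURM_SUBMIT_DIR"
echo "running on $SLURM_JOB_NUM_NODES node(s): $SLURM_JOB_NODELIST"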
For OpenMP jobs, request a single node and add #SBATCH --ntasks-per-core=1 to use only physical (real) cores, as in the following example:
#!/bin/bash
#SBATCH --partition=hpc3 # partition
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-core=1 # use only real cores
#SBATCH --time=2:00 # total runtime of job allocation
export OMP_NUM_THREADS=32
./test_openmp.exe
For MPI jobs, additional useful options are:
- #SBATCH --cpus-per-task=X to reserve X CPUs per (MPI) task
- #SBATCH --ntasks-per-node=X to spread tasks equally over nodes (X tasks per node in the example)
#!/bin/bash
#SBATCH --partition=any # partition
#SBATCH --nodes=4 # number of nodes
#SBATCH --ntasks-per-node=32 # number of cores per node
#SBATCH --time=2:00 # total runtime of job allocation
module load openmpi
mpirun ./test_mpi.exe
To request a specific node, use e.g. #SBATCH --nodelist=wr73 to ask for node wr73.
Within every batch job, the environment variable $TMPDIR is defined with the name of a temporary directory (with fast access) that should be used for fast temporary file storage within the job scope. The directory is created on job start and deleted when the job finishes.
Example on how to use the environment variable within a program:
#include <stdio.h>
#include <stdlib.h>

char *basedir = getenv("TMPDIR");
if(basedir != NULL)
{
  /* build the full path of a temporary file inside $TMPDIR */
  char *filename = "test.dat";
  char allname[1024];
  snprintf(allname, sizeof(allname), "%s/%s", basedir, filename);
  FILE *f = fopen(allname, "w");
  /* ... write and read temporary data, then close the file ... */
}
To check the resource usage (e.g. memory, I/O) of a job, the command sstat helps for running jobs.
user@wr0: sstat --format=jobid,maxvmsize,MaxDiskRead 123456.batch
JobID MaxVMSize MaxDiskRead
------------ ---------- ------------
123456.batch 47654040K 39789920
where 123456 is the job number of the running job.
If you need such information for already finished jobs, use the command sacct.
Example:
user@wr0: sacct -j 123456.batch --format="jobid,CPUTime,MaxVMSize,MaxDiskRead"
JobID CPUTime MaxVMSize MaxDiskRead
------------ ---------- ---------- ------------
123456.batch 01:37:04 24173824K 828.59M
GPU nodes are in the following queues:
batch queue | #GPUs available | GPU(s) on node | available GPU memory |
---|---|---|---|
gpu | 5 | Nvidia V100 | 16 GB |
gpu4 | 30 | 4x Nvidia A100 / 4x Nvidia V100 | 80 GB / 16 GB |
To use one or more GPUs on a GPU node (one of wr14-wr25), add --gres=gpu to your job request. In this default case, you ask for 1 GPU on the requested node(s). For the nodes wr14 and wr20-25 you may ask for up to 4 GPUs each; to request more than one GPU, add e.g. --gres=gpu:4 to your job request to ask for 4 GPUs. If you don't use the --gres=... option, your job is started on the requested node without access to any GPU, i.e. CPU-only!
Example for a batch job file:
#!/bin/bash
#SBATCH --partition=gpu4 # GPU partition
#SBATCH --nodes=1 # number of nodes
#SBATCH --gres=gpu:4 # ask for a node with 4 GPUs
#SBATCH --time=24:00 # total runtime of job allocation
module load cuda
./my_cuda_program.exe
When you use the --gres=... option, the environment variable CUDA_VISIBLE_DEVICES is set by the batch system to the reserved GPU number(s) on a node. This environment variable is respected by CUDA: all your CUDA programs in your job run on those devices that are assigned to your job.
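As a sanity check you can query from within a CUDA program how many devices are actually visible to it; a small sketch (compile with nvcc after module load cuda):
#include <stdio.h>
#include <cuda_runtime.h>
int main(void) {
  int count = 0;
  // reports only the devices listed in CUDA_VISIBLE_DEVICES
  cudaGetDeviceCount(&count);
  printf("visible GPUs: %d\n", count);
  return 0;
}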
In the beginning of your GPU work, use node wr14, which you can access interactively: from wr0 do a ssh wr14. There you can start a short but representative program run to test your CUDA application. To simulate a batch job program run, set the environment variable accordingly:
user@wr14: export CUDA_VISIBLE_DEVICES=0
After the test run has finished, you can get information about the GPU utilization of this program run with the command:
user@wr14: nvidia-smi -q -d ACCOUNTING
which results in an output like:
==============NVSMI LOG==============
Timestamp : Fri Nov 17 16:54:34 2023
Driver Version : 440.33.01
CUDA Version : 12.3
Attached GPUs : 2
GPU 00000000:84:00.0
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Accounted Processes
Process ID : 48147
GPU Utilization : 85 %
Memory Utilization : 81 %
Max memory usage : 171 MiB
Time : 6936 ms
Is Running : 0
GPU 00000000:85:00.0
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Accounted Processes : None
The lines below Process ID
give you valuable information
about the GPU utilization. If more than one process is listed in the
output, usually the last entry in the output relates to the last
execution on the GPU.
You can also monitor the GPU usage of your running program on a batch node with the command:
user@wr0: srun -s --jobid your-running-job-id --pty nvidia-smi
where your-running-job-id is the job ID of your running GPU program on a batch node. The output looks similar to
user@wr0> srun -s --jobid 123456 --pty nvidia-smi
...
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:01:00.0 Off | Off |
| N/A 28C P0 85W / 500W | 592MiB / 81920MiB | 85% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 386553 C ./vectorproduct.exe 534MiB |
+---------------------------------------------------------------------------------------+
which shows that the current utilization of the GPU used is 85% and that the program that runs a kernel on the GPU is called ./vectorproduct.exe
and utilizes 534 MiB of GPU memory.
See apptainer help for the available CLI commands of apptainer; many of them are compatible with docker, including the usage of docker images.
For interactive work use:
- wr0 for all development where you do not need special hardware (e.g. accelerator) or want to use MPI
- wr14 for GPU development, i.e., CUDA, OpenCL, OpenACC. Additionally you can use this system for MPI tests with small data sets and a small number of MPI processes. From wr0, do a ssh -Y wr14 to work interactively on wr14. Don't start production runs on this test system; use the batch system for that instead.
Interactive sessions on other nodes are possible with the srun command:
srun --x11 --pty /bin/bash
with additional options as needed.
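If you need an interactive shell with a GPU reserved, partition and GPU options can be added to srun; the values here are only an example:
srun --partition=gpu --gres=gpu:1 --pty /bin/bash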
compiler | name | module command | documentation | safe optimization | debug option | compiler feedback | version |
---|---|---|---|---|---|---|---|
GNU C | cc / gcc | module load gcc |
man gcc | -O2 | -g | -ftree-vectorizer-verbose=2 | --version |
AMD aocc | clang | module load aocc |
man clang | -O2 | -g | -ftree-vectorizer-verbose=2 | --version |
Intel C (oneAPI) | icx | module load intel-compiler |
man icx | -O2 | -g | -qopt-report | --version |
Nvidia C | nvc | module load nvidia-hpc |
man nvc | -O2 | -g | -Minfo | --version |
GNU C++ | g++ | module load gcc |
man g++ | -O2 | -g | -ftree-vectorizer-verbose=2 | --version |
AMD aoc++ | clang | module load aocc |
man clang | -O2 | -g | -ftree-vectorizer-verbose=2 | --version |
Intel C++ (oneAPI) | icpx | module load intel-compiler |
man icpx | -O2 | -g | -qopt-report | --version |
Nvidia C++ | nvc++ | module load nvidia-hpc |
man nvc++ | -O2 | -g | -Minfo | --version |
GNU Fortran | gfortran | module load gcc |
man gfortran | -O2 | -g | -ftree-vectorizer-verbose=2 | --version |
AMD Fortran | flang | module load aocc |
man flang | -O2 | -g | -ftree-vectorizer-verbose=2 | --version |
Intel Fortran | ifort | module load intel-compiler |
man ifort | -O2 | -g | -vec-report=2 (or higher) | --version |
Nvidia Fortran | nvfortran | module load nvidia-hpc |
man nvfortran | -O2 | -g | -Minfo | --version |
Oracle Java | javac | module load oracle-java |
- | -O | -g | n.a. | -version |
Examples:
cc -O2 t.c
module load intel-compiler; ifort -O2 t.f
The Intel Math Kernel Library (MKL) is available after module load intel-compiler, which expands include file search paths and library search paths accordingly. It should preferably be used on Intel-based systems, but also works on AMD systems. The library contains basic mathematical functions (BLAS, LAPACK, FFT, ...).
If you use any of the Intel compilers, just add the flag -qmkl as a compiler and linker flag. Otherwise, check this page for the appropriate version and corresponding flags.
Example for Makefile:
CC = icx
CFLAGS = -qmkl
LDLIBS = -qmkl
By default, MKL uses all available cores. You can restrict this number with the environment variable MKL_NUM_THREADS, e.g.
export MKL_NUM_THREADS=1
before you start an MKL-based program.
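A minimal sketch of an MKL-based program and how to build it with -qmkl (cblas_ddot is just an example routine):
#include <stdio.h>
#include <mkl.h>
int main(void) {
  double x[3] = {1.0, 2.0, 3.0};
  double y[3] = {4.0, 5.0, 6.0};
  // dot product computed by MKL's CBLAS interface
  double d = cblas_ddot(3, x, 1, y, 1);
  printf("dot = %f\n", d);
  return 0;
}
Build it e.g. with: module load intel-compiler; icx -qmkl dot.c -o dot.exe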
The AMD optimized libraries (e.g. BLAS) are available with module load aocl. See the AMD documentation on how to use this software.
compiler | name | module command | documentation | version |
---|---|---|---|---|
GNU OpenMP C/C++ (same for AMD aocc clang) | gcc/g++ -fopenmp | module load gcc |
man gcc | --version |
Intel OpenMP C/C++ (oneAPI) | icx/icpx -qopenmp | module load intel-compiler |
man icx / icpx | --version |
Nvidia OpenMP C/C++ | nvc/nvc++ -mp | module load nvidia-hpc |
man nvc / nvc++ | --version |
Intel OpenMP Fortran | ifort -qopenmp | module load intel-compiler |
man ifort | --version |
GNU OpenMP Fortran (same for AMD aocc flang) | gfortran -fopenmp | module load gcc |
man gfortran | --version |
Nvidia Fortran | nvfortran -mp | module load nvidia-hpc |
man nvfortran | --version |
Example: Compile and run an OpenMP C file:
module load gcc
gcc -fopenmp -O2 t.c
export OMP_NUM_THREADS=8
./a.out
compiler | name | module command | documentation | version |
---|---|---|---|---|
MPI C (based on gcc) | mpicc | module load openmpi/gnu |
see gcc | --version |
MPI C++ (based on gcc) | mpic++ | module load openmpi/gnu |
see g++ | --version |
MPI Fortran (based on gfortran) | mpif90 | module load openmpi/gnu |
see gfortran | --version |
MPI C (based on Intel oneAPI icx) | mpiicx | module load openmpi/intel |
see icx | --version |
MPI C++ (based on Intel oneAPI icpx) | mpiicpx | module load openmpi/intel |
see icpx | --version |
MPI Fortran (based on ifort) | mpiifort | module load openmpi/intel |
see ifort | --version |
Which MPI compilers are used can be influenced through the module command: with module load openmpi/gnu you use the GNU compiler environment (gcc, g++, gfortran), and with module load openmpi/intel you use the Intel compiler environment (icc, icpc, ifort). Be aware that even with module load openmpi/intel the MPI compiler names mpicc etc. are mapped to the GNU compilers. To use an Intel compiler you need to specify Intel's own names, i.e., mpiicx, mpiicpx, mpiifort. We don't recommend Intel MPI.
All options discussed in the compiler section also apply here, e.g. optimization.
Example: Compile a MPI C file and generate optimised code:
module load openmpi/intel
mpiicx -O2 t.c
Develop and test CUDA programs interactively on wr14 (from wr0 do a ssh wr14), as all necessary drivers are installed locally on that system. Production runs on any GPU should be done using the appropriate batch queues.
Use module load cuda
to load the CUDA environment (including certain possible versions available).
To compile a CUDA project you can use the following Makefile template:
# defines
CC = cc
CUDA_CC = nvcc
LDLIBS = -lcudart
# default rules based on suffices
# C
%.o: %.c
$(CC) -c $(CFLAGS) -o $@ $<
# CUDA
%.o: %.cu
$(CUDA_CC) -c $(CUDA_CFLAGS) -o $@ $<
myprogram.exe: myprogram.o kernel.o
$(CC) -o $@ $^ $(LDLIBS)
Here the CUDA kernel and host part is in a file kernel.cu
and the
non-CUDA part of your program is in a file myprogram.c
.
The compilers to use are nvc / nvcc. Use wr14 interactively to compile such programs. The generated code can be executed on all GPU nodes. You can specify the compute capability as a compiler option.
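For example, the compute capability can be selected with the -arch option of nvcc (sm_70 corresponds to the V100 and sm_80 to the A100 GPUs mentioned above; adjust as needed):
module load cuda
nvcc -c -O2 -arch=sm_80 kernel.cu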
To determine the memory consumption of a program run, you can use
/usr/bin/time -f "%M KB" command
which prints out the peak memory consumption in kilobytes of the command execution.
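For example, applied to the sequential test program used below (the executable name is just the one from the examples):
user@wr0: /usr/bin/time -f "%M KB" ./test_sequential.exe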
The source file test.c:
#include <stdio.h>
int main(int argc, char **argv) {
printf("Hello world\n");
return 0;
}
CC = cc
CFLAGS = -O
#default rules
%.o: %.c
$(CC) $(CFLAGS) -c $<
%.exe: %.o
$(CC) -o $@ $< $(LDLIBS)
default:: test_sequential.exe
#!/bin/bash
#SBATCH --output=slurm.%j.out # STDOUT
#SBATCH --error=slurm.%j.err # STDERR
#SBATCH --partition=any # partition (queue)
#SBATCH --ntasks=1 # use 1 task
#SBATCH --mem=100 # memory per node in MB (different units with suffix K|M|G|T)
#SBATCH --time=2:00 # total runtime of job allocation (format D-HH:MM:SS; first parts optional)
# start program
./test_sequential.exe
The source file test_openmp.c:
#include <stdio.h>
#include <omp.h>
int main(int argc, char **argv) {
#pragma omp parallel
printf("I am the %d. thread of %d threads\n", omp_get_thread_num(), omp_get_num_threads());
return 0;
}
CC = gcc -fopenmp
CFLAGS = -O
#default rules
%.o: %.c
$(CC) $(CFLAGS) -c $<
%.exe: %.o
$(CC) -o $@ $< $(LDLIBS)
default:: test_openmp.exe
#!/bin/bash
#SBATCH --output=slurm.%j.out # STDOUT (%N: nodename, %j: job-ID)
#SBATCH --error=slurm.%j.err # STDERR
#SBATCH --partition=any # partition (queue)
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=32 # number of tasks per node
#SBATCH --mem=1G # memory per node in MB (different units with suffix K|M|G|T)
#SBATCH --time=2:00 # total runtime of job allocation (format D-HH:MM:SS; first parts optional)
# start program with 24 threads (in total 32 cores were requested by the job)
export OMP_NUM_THREADS=24
./test_openmp.exe
The source file test_mpi.c:
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>
int main(int argc, char **argv) {
int size, rank;
char hostname[80];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
gethostname(hostname, 80);
printf("Hello world from %d (on node %s) of size %d!\n", rank, hostname, size);
MPI_Finalize();
return 0;
}
CC = mpicc
CFLAGS = -O
#default rules
%.o: %.c
$(CC) $(CFLAGS) -c $<
%.exe: %.o
$(CC) -o $@ $< $(LDLIBS)
default:: test_mpi.exe
#!/bin/bash
#SBATCH --output=slurm.%j.out # STDOUT (%N: nodename, %j: job-ID)
#SBATCH --error=slurm.%j.err # STDERR
#SBATCH --partition=any # partition (queue)
#SBATCH --nodes=5 # number of nodes
#SBATCH --ntasks-per-node=32 # number of tasks per node
#SBATCH --mem=4G # memory per node in MB (different units with suffix K|M|G|T)
#SBATCH --time=2:00 # total runtime of job allocation (format D-HH:MM:SS; first parts optional)
module load gcc openmpi/gnu
# start program (with maximum parallelism as specified in job request, for this example 5*32=160)
mpirun ./test_mpi.exe
user@wr0: module load matlab
user@wr0: matlab
This starts the Matlab shell. If you logged in from an X-server-capable computer and used ssh -Y username@wr0.wr.inf.h-brs.de to log in to wr0, the graphical panel appears on your computer instead of the text panel (see here for details of X-server usage).
To run a Matlab script non-interactively, use:
...
module load matlab
matlab -nodisplay -nosplash -nodesktop -r "m-file"
where m-file is the name of your Matlab script with the suffix .m
If Matlab runs into memory problems, set export MALLOC_ARENA_MAX=4. This influences / restricts Matlab's memory allocation in a multithreaded environment.
For graphical output, log in with the ssh option -Y (or with older ssh versions also -X), which enables X11 tunneling through your ssh connection. If your login path goes over multiple computers, please be sure to use the -Y option for every intermediate host on the path.
user@another_host: ssh -Y user@wr0.wr.inf.h-brs.de
On your local computer (i.e. where the X-server is running) you must allow wr0 to open a window. Execute on your local computer in a shell:
xhost +
and on wr0 point the DISPLAY variable to your local computer:
export DISPLAY=mycomputer.mydomain:0.0
Be aware that xhost + allows any computer to open a window on your X-server. To test the setup, start xterm on wr0. A window on your local computer must pop up with a shell on wr0.