one front-end node wr0 for interactive access and administration, as well as several servers
a sub-cluster WR-I with 50 nodes wr50-wr99, each with two 16-core, 2-way hyperthreaded Intel Xeon Gold 6130 processors (in total 3,200-way parallelism), connected with Omni-Path
a sub-cluster WR-II with 15 nodes wr28-wr42 with 12-core, 2-way hyperthreaded Intel Xeon E5-2680 v3 processors (in total 624-way parallelism), connected with 10 Gb Ethernet
a sub-cluster WR-III with 8 nodes wr20-wr27, each with two 8-core, 2-way hyperthreaded Intel Xeon E5-2670 processors (in total 256-way parallelism) and an Nvidia K20m GPU (2,496 GPU cores), connected with 10 Gb Ethernet
L3 cache: 30 MB, 20-way set associative, write-back, inclusive cache
4 memory channels per processor, theoretical memory bandwidth 59.7 GB/s per processor
peak performance: 21.6 GigaFlops per processor core thread
Cluster Nodes WR-I
The WR-I cluster has 50 cluster nodes wr50-wr99, based on the Dell PowerEdge C6420 barebone; four nodes are grouped in one PowerEdge C6400 chassis.
The specification for each cluster node is:
Intel Xeon Gold 6130 and Intel Xeon Gold 6130F processors at 2.1 GHz, in total 64 hardware threads per node
192 GB DDR4-2666 memory
480 GB SSD
100 Gb/s Intel Omni-Path, attached through the Xeon Gold 6130F's integrated fabric
Some technical details for the processor:
L1 data cache: 32 KB, 8-way set associative, write-back, 64 bytes/line
Cluster Nodes WR-III
The specification for each cluster node is:
2 Intel Xeon E5-2670 processors at 2.6 GHz (TurboBoost up to 3.3 GHz)
128 GB memory
120 GB Intel 520 SSD
10 Gb/s Ethernet Intel Server Adapter X520-SR1
Accumulated peak performance for all WR-III cluster nodes without the GPUs is 2.662 TeraFlops (256 hardware threads, each with 10.4 GigaFlops peak performance). The HPL benchmark delivers more than 1.6 TeraFlops for 8 cluster nodes (without GPUs) and n = 326,000.
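The accumulated peak value can be checked with a little arithmetic, assuming 8 double-precision flops per cycle per core (AVX on Sandy Bridge):

8 nodes × 2 processors × 8 cores × 2.6 GHz × 8 flops/cycle = 2662.4 GFlops ≈ 2.662 TeraFlops

Divided by the 256 hardware threads this gives the quoted 10.4 GigaFlops per thread (two hyperthreads share one core's execution units, so each thread contributes half of a core's 20.8 GigaFlops peak).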
Some technical details for the Intel E5-2670 (Sandy Bridge microarchitecture) processor:
L1 data cache per core: 32 KB, 8-way set associative, write-back, 64 bytes/line
L2 unified cache per core: 256 KB, 8-way set associative, write-back, 64 bytes/line, exclusive cache
L3 shared unified cache per processor: 20 MB, 20-way set associative, write-back, 64 bytes/line, inclusive cache
There are several hardware restrictions when using this card:
Device 0: "Tesla K20m"
CUDA Driver Version / Runtime Version 9.2 / 9.2
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 5063 MBytes (5308743680 bytes)
(13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores
GPU Max Clock rate: 706 MHz (0.71 GHz)
Memory Clock rate: 2600 MHz
Memory Bus Width: 320-bit
L2 Cache Size: 1310720 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
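All limits in the listing above can also be queried at run time instead of being hard-coded. Below is a minimal sketch in C using the CUDA runtime API (error handling omitted for brevity); it prints the launch-relevant limits for every device in a node:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        /* the same limits as in the listings above */
        printf("Device %d: %s (compute capability %d.%d)\n",
               dev, prop.name, prop.major, prop.minor);
        printf("  max threads per block:   %d\n", prop.maxThreadsPerBlock);
        printf("  shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("  registers per block:     %d\n", prop.regsPerBlock);
        printf("  warp size:               %d\n", prop.warpSize);
    }
    return 0;
}

A kernel launch that exceeds one of these limits fails at launch time, so checking them once at startup is cheap insurance when the same binary runs on the K20m, K80 and V100 nodes.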
The HPL benchmark delivers more than 2.5 TeraFlops for 8 cluster nodes with GPUs and n = 244,000.
Shared Memory Parallel Systems
Big Memory / Shared Memory Parallelism (wr44)
For shared-memory jobs that demand large main memory, a high degree of parallelism, and/or high local I/O bandwidth, a many-core shared-memory server based on the Gigabyte R182-Z92 barebone is available.
motherboard Gigabyte MZ92-FS0
2x AMD EPYC 7702 processors at 2 GHz (boost up to 3.35 GHz)
1 TB DDR4-2933 memory
4x Micron 9300 MAX-3DWPD (U.2, 3.2 TB) and 1x Samsung 970 EVO (M.2, 500 GB)
10 Gb/s Ethernet Intel Server Adapter X520-SR2
Some technical details for each processor:
64 cores
L1 data cache per core: 32 KB, 8-way set associative, write-back, 64 bytes/line
L2 unified cache per core: 512 KB, 8-way set associative, write-back, 64 bytes/line
L3 unified victim cache shared by all cores: 256 MB, 16-way set associative, 64 bytes/line
theoretical memory bandwidth 187.71 GB/s per processor (with our DDR4-2933 memory)
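The bandwidth figure follows from the EPYC 7702's 8 DDR4 memory channels at the installed DDR4-2933 speed, with 8 bytes per transfer and channel:

8 channels × 2933 MT/s × 8 bytes = 187.7 GB/s per processor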
Big Memory / Shared Memory Parallelism (wr43)
For shared-memory jobs that demand large main memory and/or a high degree of parallelism, a many-core shared-memory server based on the Supermicro 8027T-TRF+ barebone is available.
4-way motherboard Supermicro X9QR7-TF+
4 Intel Xeon E5-4657L processors at 2.4 GHz (TurboBoost up to 2.9 GHz)
768 GB DDR3-1866 memory
500 GB Micron MX200 SSD
10 Gb/s Ethernet Intel Server Adapter X520-SR1
Some technical details for the processors:
12 cores, 2-way hyperthreading
L1 data cache per core: 32 KB, 8-way set associative, write-back, 64 bytes/line
L2 unified cache per core: 256 KB, 8-way set associative, write-back, 64 bytes/line
L3 unified cache shared by all cores: 30 MB, 128-way set associative, write-back, 64 bytes/line
theoretical memory bandwidth 59.7 GB/s per processor
Accelerator
GPU Computing (wr12, wr16-wr19)
Each of these nodes has an Nvidia Tesla V100 PCIe GPU.
Barebone Dell PowerEdge R740
2 Intel Xeon Gold 6130 processors at 2.1 GHz, in total 64 hardware threads
Nvidia Tesla V100 PCIe GPU with 5120 CUDA cores and 640 Tensor cores
peak performance: 14 TFlops (32-bit) / 7 TFlops (64-bit) / 112 TFlops (tensor floating point)
16 GB HBM2 memory
900 GB/s bandwidth to onboard memory
system interface PCIe 3.0 x16
The GPU implements the Nvidia Volta architecture.
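The three peak numbers can be reproduced from the Volta SM configuration: 80 SMs with 64 FP32 units, 32 FP64 units and 8 tensor cores each, where one fused multiply-add counts as 2 flops and one tensor core performs 64 FMA operations per cycle (all at the 1380 MHz boost clock):

FP32: 5120 units × 2 flops × 1.38 GHz ≈ 14.1 TFlops
FP64: 2560 units × 2 flops × 1.38 GHz ≈ 7.1 TFlops
Tensor: 640 cores × 64 FMA × 2 flops × 1.38 GHz ≈ 113 TFlops

matching the quoted values within rounding.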
There are several hardware restrictions when using this card:
Device 0: "Tesla V100-PCIE-16GB"
CUDA Driver Version / Runtime Version 9.2 / 9.2
CUDA Capability Major/Minor version number: 7.0
Total amount of global memory: 16160 MBytes (16945512448 bytes)
(80) Multiprocessors, ( 64) CUDA Cores/MP: 5120 CUDA Cores
GPU Max Clock rate: 1380 MHz (1.38 GHz)
Memory Clock rate: 877 MHz
Memory Bus Width: 4096-bit
L2 Cache Size: 6291456 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 7 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 59 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
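Note that the V100 supports compute preemption and cooperative kernel launches, which the older K20m (see above) does not. Code that relies on such features should probe for them rather than assume them; a small C sketch using cudaDeviceGetAttribute:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int dev = 0, coop = 0, preempt = 0;
    /* both attributes simply read as 0 on devices without the feature */
    cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);
    cudaDeviceGetAttribute(&preempt, cudaDevAttrComputePreemptionSupported, dev);
    printf("cooperative launch: %s\n", coop ? "yes" : "no");
    printf("compute preemption: %s\n", preempt ? "yes" : "no");
    return 0;
}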
GPU Computing (wr15)
This node has 4 Nvidia Tesla V100 SXM2 GPUs connected with NVLink.
Barebone Dell PowerEdge C4140
2 Intel Xeon Gold 6130 processors at 2.1 GHz, in total 64 hardware threads
192 GB DDR4-2666 memory
480 GB SSD
4 Nvidia Tesla V100 SXM2 GPUs, each with 16 GB memory, connected by NVLink
Nvidia Tesla V100 SXM2 GPU with 5120 CUDA cores and 640 Tensor cores
peak performance: 15.7 TFlops (32-bit) / 7.8 TFlops (64-bit) / 125 TFlops (tensor floating point)
16 GB HBM2 memory
900 GB/s bandwidth to onboard memory
system interface PCIe 3.0 x16
300 GB/s NVLink interconnect bandwidth
The GPUs implement the Nvidia Volta architecture.
There are several hardware restrictions when using these cards (in total 4 devices):
Device 0: "Tesla V100-SXM2-16GB"
CUDA Driver Version / Runtime Version 9.2 / 9.2
CUDA Capability Major/Minor version number: 7.0
Total amount of global memory: 16160 MBytes (16945512448 bytes)
(80) Multiprocessors, ( 64) CUDA Cores/MP: 5120 CUDA Cores
GPU Max Clock rate: 1530 MHz (1.53 GHz)
Memory Clock rate: 877 MHz
Memory Bus Width: 4096-bit
L2 Cache Size: 6291456 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 5 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 26 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
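With four devices in one node, data can be moved directly between the GPUs over NVLink once peer access is enabled. A minimal sketch in C (error checks omitted; the device loop mirrors the 4-GPU layout of wr15):

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);              /* 4 on wr15 */
    for (int a = 0; a < count; ++a) {
        cudaSetDevice(a);                    /* enabling applies to the current device */
        for (int b = 0; b < count; ++b) {
            if (a == b) continue;
            int ok = 0;
            cudaDeviceCanAccessPeer(&ok, a, b);
            if (ok)
                cudaDeviceEnablePeerAccess(b, 0);
            printf("peer access %d -> %d: %s\n", a, b, ok ? "enabled" : "not available");
        }
    }
    return 0;
}

Afterwards, transfers with cudaMemcpyPeer() between two cards bypass host memory and can use the NVLink fabric.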
GPU Computing (wr14)
For batch GPU computing, this node is equipped with an Nvidia Tesla K80.
Barebone Supermicro SYS-2028GR-TR with X10DRG-H mainboard
2 Intel Xeon E5-2697 v3 processors at 2.6 GHz, in total 56 hardware threads
128 GB DDR4-2133 memory
Nvidia Tesla K80 with 2x GK210 GPUs and 2x12 GB memory
Some technical details for the GPU:
Nvidia Tesla K80 GPU with 2x2496 = 4992 cores
peak performance: 5.6 TFlops (32-bit) / 1.87 TFlops (64-bit), without boost clocks
0.560 GHz processor clock (boost up to 0.875 GHz)
24 GB memory (2x 12 GB; 6.25% reserved for ECC)
aggregated 480 GB/s bandwidth to onboard GDDR5 memory
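Both peak values follow from the core count and the base clock; the GK210 chip has a 1:3 double- to single-precision ratio:

4992 cores × 2 flops × 0.560 GHz ≈ 5.59 TFlops (32-bit), and about a third of that, ≈ 1.87 TFlops (64-bit)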
There are several hardware restrictions when using this card (each of the two GPU chips appears as a CUDA device of its own):
Device 0: "Tesla K80"
CUDA Driver Version / Runtime Version 9.2 / 9.2
CUDA Capability Major/Minor version number: 3.7
Total amount of global memory: 12207 MBytes (12799574016 bytes)
(13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores
GPU Max Clock rate: 824 MHz (0.82 GHz)
Memory Clock rate: 2505 MHz
Memory Bus Width: 384-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 132 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Manycore Computing (wr13)
This node has an Intel Xeon Phi Knights Landing 7250 (standalone) processor.
global MCDRAM: 16 GB (configurable as a last-level cache or as additional fast memory)
68 cores with 4-way hyperthreading
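When the MCDRAM is configured as additional fast memory (flat mode) it shows up as a separate NUMA node, and allocations can be placed there explicitly. One common way is the memkind library's hbw_malloc interface; a sketch in C (assumes the memkind library is installed, link with -lmemkind):

#include <stdio.h>
#include <hbwmalloc.h>   /* high-bandwidth memory interface of the memkind library */

int main(void)
{
    size_t n = 1 << 20;
    if (hbw_check_available() != 0) {   /* no MCDRAM visible, e.g. in cache mode */
        fprintf(stderr, "no high-bandwidth memory available\n");
        return 1;
    }
    double *a = hbw_malloc(n * sizeof *a);   /* buffer lands in MCDRAM */
    if (!a)
        return 1;
    for (size_t i = 0; i < n; ++i)
        a[i] = (double)i;
    printf("a[42] = %f\n", a[42]);
    hbw_free(a);                             /* hbw buffers need hbw_free */
    return 0;
}

In cache mode no code changes are needed, since the MCDRAM then acts as a transparent cache in front of the DDR4 memory; alternatively, a whole process can be bound to the MCDRAM NUMA node with numactl.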
Networks
There are two networks; every node is connected to one or both of them:
a fast interconnection network for communication in parallel applications (100 Gb/s Omni-Path)
a Gigabit Ethernet network for services
Two central IBM G8264 switches connect most Ethernet nodes with 10 Gb/s or 1 Gb/s. The two switches are linked with two 40 Gb/s ports.
The Omni-Path network is realized with two 48-port Omni-Path switches with a 3:1 blocking factor.
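A 3:1 blocking factor means three times as many node-facing ports as uplink ports per switch; a plausible split for a 48-port switch (an assumption, the exact cabling is not documented here) is:

36 node ports : 12 uplink ports = 3 : 1

so in the worst case three nodes share one uplink's bandwidth when communicating across the two switches.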