NIC4

Introduction

NIC4 is the massively parallel High Performance Computing (HPC) cluster of the University of Liège, installed in the framework of the Consortium des Équipements de Calcul Intensif (CÉCI) and funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11.

Hosted at the SEGI facility (Sart Tilman B26), it features 128 compute nodes, each with two 8-core Intel E5-2650 processors at 2.0 GHz and 64 GB of RAM (4 GB/core), interconnected with a QDR Infiniband network (2:1 blocking factor), and with exclusive access to a fast 144 TB FHGFS parallel filesystem.

Access and General Documentation

As with the 5 other CÉCI clusters, the main source of information on accessing and using NIC4 is the CÉCI website.

Anyone who is officially affiliated with a member university of the CÉCI consortium (ULiège, UCLouvain, ULB, UNamur and UMons), holds an official university email address and is endorsed by a supervisor with a permanent academic or scientific position can request access to the NIC4 supercomputer and to the whole CÉCI supercomputing infrastructure.

The first step is to create a CÉCI account by visiting the login.ceci-hpc.be website (your computer must be connected to your university network, see below) and entering your email address. The full procedure is explained here.

The clusters can only be accessed from a computer connected to a university network, either directly (cable or Wi-Fi), through a gateway, or through a VPN. They must be accessed through a secure shell (SSH). More information here.

ULiège members should install and use the preconfigured "Pulse Secure" VPN client software. On Linux, you can also install the "openconnect" software with your preferred package manager and use it with: 'sudo openconnect --juniper -u sNNNNN vpn.gw.ulg.ac.be'. On a Mac, "openconnect" can also be easily installed with Homebrew.

To start working, you will need to write a submission script describing the resources you need and the operations you want to perform, and submit that script to the resource manager/job scheduler. The one installed on the CÉCI clusters is named Slurm. Find more information on how to do that here.
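As an illustration, a minimal submission script for an MPI job could look like the sketch below. The job name, the program name ('my_mpi_program'), the number of tasks and the time limit are placeholders, not recommended values, and depending on the MPI installation 'srun' may be needed instead of 'mpirun'. The snippet writes the script to a file named 'job.sh'.

```shell
# Write a minimal Slurm submission script to job.sh.
# All values below are placeholders to adapt to your own job.
cat > job.sh <<'EOF'
#!/bin/bash
# Hypothetical job name and output file (%j is replaced by the job ID).
#SBATCH --job-name=my_test_job
#SBATCH --output=my_test_job_%j.out
# Number of MPI processes.
#SBATCH --ntasks=32
# Memory per core, in MB.
#SBATCH --mem-per-cpu=4000
# Wall time, as days-hours:minutes:seconds.
#SBATCH --time=1-00:00:00

# Load an MPI module and launch the (hypothetical) executable.
module load openmpi/qlc/gcc/64/1.6.4
mpirun ./my_mpi_program
EOF

# Submit it on the cluster with:
#   sbatch job.sh
```

Once submitted, the job waits in the queue until the requested cores are available.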

Please read the FAQ and HowTo sections carefully before sending your question by email to nicadm@segi.ulg.ac.be.

Specific Documentation

Preferred kind of jobs and queue configuration

NIC4 is ideally suited for massively parallel jobs (MPI, several dozen cores) with heavy communication and/or a lot of (parallel) disk I/O operations.

Accordingly, the default queue is configured to allow jobs of at most 2 days. If your job takes longer, increase the degree of parallelism or implement checkpointing, a mechanism that allows stopping a computation at a given moment and restarting it at a later time (more on this here and here). The maximum number of cores per user is currently set to 448 (these settings are indicative only, and may change depending on the load on the cluster).
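The idea of checkpointing can be sketched with a small shell function. This is only an illustration under simple assumptions: the computation is a sequence of numbered steps, the state to save is just the number of the last completed step, and the function name 'run_with_checkpoint' and its arguments are hypothetical. A real application would save its own data instead.

```shell
# run_with_checkpoint STATE_FILE TOTAL_STEPS MAX_STEPS_THIS_RUN
# Resumes from the step recorded in STATE_FILE, runs at most
# MAX_STEPS_THIS_RUN further steps, and saves progress after each one.
run_with_checkpoint() {
    state_file=$1
    total_steps=$2
    max_steps=$3

    # Restart: read the last completed step, or start from 0.
    if [ -f "$state_file" ]; then
        step=$(cat "$state_file")
    else
        step=0
    fi

    done_now=0
    while [ "$step" -lt "$total_steps" ] && [ "$done_now" -lt "$max_steps" ]; do
        step=$((step + 1))
        # ... one unit of real work for this step would go here ...

        # Checkpoint: write to a temporary file, then rename it, so that
        # an interruption never leaves a half-written state file behind.
        printf '%s\n' "$step" > "$state_file.tmp"
        mv "$state_file.tmp" "$state_file"
        done_now=$((done_now + 1))
    done

    echo "completed $step of $total_steps steps"
}
```

Each job would call the function with the same state file, so that a computation too long for one job is spread over several consecutive jobs, each staying within the time limit.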

Each of the 128 nodes has 64 GB of RAM, and thus 4 GB per core. To maximize the occupation rate of all the cores of the cluster, do not launch MPI jobs requesting more than 4 GB per core.

A small number of SMP/OpenMP parallel jobs running on one node are allowed, but they should also try to respect the limit of 4 GB of RAM per core. If your jobs need more RAM or a longer running time, use other CÉCI clusters. And if the parallel efficiency of your OpenMP job is not ideal, don't forget that running 2 jobs with 8 cores each should be more efficient than a single job with 16 cores!

A small number of serial jobs with no more than 16 GB of RAM are also allowed. If your jobs need more RAM or a longer running time, use other CÉCI clusters.

To enforce these guidelines, the maximum number of concurrently running jobs per user is set to 32, and the maximum number of jobs a user can have submitted at the same time is set to 64 (sum of the RUNNING and PENDING jobs).
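The arithmetic behind this recommendation can be sketched as follows. The function name 'memory_core_equivalents' is hypothetical, and the sketch assumes a share of exactly 4096 MB per core (64 GB / 16 cores); the memory actually usable by jobs on a node may be somewhat lower.

```shell
# memory_core_equivalents NTASKS MEM_PER_TASK_MB
# Prints how many cores' worth of memory a job consumes, assuming a
# share of 4096 MB per core (64 GB / 16 cores).
memory_core_equivalents() {
    ntasks=$1
    mem_per_task_mb=$2
    share_mb=4096
    total_mb=$((ntasks * mem_per_task_mb))
    # Integer division, rounded up.
    echo $(( (total_mb + share_mb - 1) / share_mb ))
}

memory_core_equivalents 16 4096   # 16 tasks at 4 GB each
memory_core_equivalents 16 8192   # 16 tasks at 8 GB each
```

The first call prints 16 and the second prints 32: a 16-task job requesting 8 GB per task uses the memory share of 32 cores, which leaves 16 cores without memory for other jobs.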
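To see where you stand with respect to these limits, you can count your own jobs by state. The helper below is a sketch (the name 'count_jobs' is hypothetical); it assumes its input contains one job state per line, as printed by 'squeue' with the '%T' output format.

```shell
# count_jobs: reads one job state per line on standard input and prints
# the number of RUNNING and PENDING jobs and their sum.
count_jobs() {
    awk '
        $1 == "RUNNING" { running++ }
        $1 == "PENDING" { pending++ }
        END { printf "running=%d pending=%d total=%d\n", running, pending, running + pending }
    '
}

# On the cluster, feed it the states of your own jobs:
#   squeue -h -u "$USER" -o "%T" | count_jobs
```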

See also Which cluster should I use?

Storage

The home directories of the users ($HOME on the /home/ partition) are hosted on a 70 TB NFS server, with a quota of 20 GB per user (which can be increased upon justified request). Check your quota with the 'quota' command.

The $HOME directories ARE NOT BACKED UP! It is your responsibility to keep a copy of your important files and directories elsewhere. Your home directory is well suited to hold your configuration files, your programs (sources and executables), your input data and your important result files (if they fit within your quota). Do not launch your jobs directly in your home directory; instead, work in your $SCRATCH directory (see below).

NIC4 has a second, independent storage system: a very fast 144 TB parallel distributed FHGFS file system. Each user automatically has a $SCRATCH = $GLOBALSCRATCH directory on that /scratch partition, from which jobs should be launched. There is no quota on this partition.
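One possible way to organise a job around the scratch space is sketched below. The function name 'run_in_scratch' and the layout of one sub-directory per job are assumptions made for the illustration, not a site convention; only $GLOBALSCRATCH comes from the cluster environment, and $SLURM_JOB_ID is set by Slurm inside a job.

```shell
# run_in_scratch INPUT_DIR RESULT_DIR COMMAND [ARGS...]
# Copies INPUT_DIR to a fresh directory under $GLOBALSCRATCH, runs COMMAND
# there, then copies the whole working directory back to RESULT_DIR.
run_in_scratch() {
    input_dir=$1
    result_dir=$2
    shift 2

    # Hypothetical layout: one sub-directory per job ID (or per process
    # ID when run outside Slurm).
    work_dir="$GLOBALSCRATCH/job_${SLURM_JOB_ID:-$$}"
    mkdir -p "$work_dir" "$result_dir" || return 1
    cp -r "$input_dir"/. "$work_dir"/ || return 1

    # Run the command from inside the scratch directory.
    ( cd "$work_dir" && "$@" )
    status=$?

    # Bring the files back and clean up the scratch space.
    cp -r "$work_dir"/. "$result_dir"/
    rm -rf "$work_dir"
    return "$status"
}

# Example use inside a submission script (hypothetical paths and program):
#   run_in_scratch "$HOME/my_case/input" "$HOME/my_case/results" ./my_program
```

Copying everything back is only reasonable when the results fit within the home quota; otherwise, copy back only the important files.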

Moreover, the CÉCI common filesystem is fully available on the 6 CÉCI clusters, and the /CECI/ partition is directly accessible from all the login and compute nodes on all the clusters. Make sure to try it out by just typing 'cd $CECIHOME ; pwd'. More information here.

Modules

'module available' shows you the list of all the installed software and applications. If you need more information about a module, use 'module show software_name' and/or 'module help software_name'.

BLAS and LAPACK: If you need the BLAS and/or LAPACK libraries, use the 'openblas/0.2.20' module which provides an optimized and multithreaded implementation of the BLAS and LAPACK libraries (the number of threads is controlled by the $OPENBLAS_NUM_THREADS environment variable which is set by default to 1), or use the Intel MKL library 'intel/mkl/64/11.1/2013_sp1.1.106'.
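For a multithreaded job, the number of OpenBLAS threads can be matched to the cores allocated by Slurm. The sketch below assumes the job was submitted with the '--cpus-per-task' option, in which case Slurm sets $SLURM_CPUS_PER_TASK; the function name 'set_blas_threads' is hypothetical.

```shell
# Load OpenBLAS when the 'module' command exists (i.e. on the cluster).
if command -v module >/dev/null 2>&1; then
    module load openblas/0.2.20
fi

# set_blas_threads: use as many OpenBLAS threads as cores allocated to
# each task, falling back to 1 (the module's default) when
# SLURM_CPUS_PER_TASK is not set.
set_blas_threads() {
    export OPENBLAS_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_blas_threads
echo "OpenBLAS threads: $OPENBLAS_NUM_THREADS"
```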

MPI (Message Passing Interface): first try OpenMPI version 1.6.4, compiled with GCC ('openmpi/qlc/gcc/64/1.6.4') or with the Intel compilers ('openmpi/qlc/intel/64/1.6.4').

Python: The basic version of Python installed with the system is 2.6.6. If you need a more recent version, there is a 'python/2.7.6' module. If you need a very recent Python2, or Python3, see the EasyBuild section below.

Matlab/Octave: MATLAB licenses are expensive and not easy to manage at the ULiège level for a cluster available to users from other universities, so MATLAB will not be installed on NIC4. However, we provide a 100% free and mostly compatible alternative, Octave ('octave/3.8.0'), as well as Scilab ('scilab/5.4.1'). Moreover, we installed modules ('mcr/R201*_v*') for different versions of the MATLAB Compiler/MATLAB Runtime, which enables the execution of compiled MATLAB applications or components on computers that do not have MATLAB installed.

EasyBuild: loading the 'EasyBuild' module gives you access to an experimental and untested list of additional software ('module load EasyBuild ; module avail'). EasyBuild modules using MPI may have some problems. Do not try to mix modules coming from the main '---- /cm/shared/modulefiles ----' section with modules coming from the '---- /home/easybuild/Modules/modulefiles ----' section.

 

Last edit: September 21st, 2018 by David Colignon

 

Contact: David Colignon

Page updated on 2018-09-21