NIC4 (old webpage)
The "official" webpages for the old NIC4 and the new NIC5 clusters have moved; please go to:
Following the security incident of 05/06/2020, this section gives important information on the status, access and usage of NIC4 (it is simply a copy of what is on the new official page http://www.campus.uliege.be/nic4 ):
NIC4 is the old (February 2014!) High Performance Computing (HPC) massively parallel cluster of the University of Liège, installed in the framework of the Consortium des Équipements de Calcul Intensif ( CÉCI ) and funded by the Fonds de la Recherche Scientifique de Belgique ( F.R.S.-FNRS ) under Grant No. 2.5020.11.
Hosted at the SEGI facility (Sart Tilman B26), it features 128 compute nodes, each with two 8-core Intel E5-2650 processors at 2.0 GHz and 64 GB of RAM (4 GB/core), interconnected by a QDR InfiniBand network (2:1 blocking factor) and with exclusive access to a fast 144 TB FHGFS parallel filesystem.
Anyone who is officially affiliated with a member university of the CÉCI consortium (ULiège, UCLouvain, ULB, UNamur and UMons), holds an official university email address and is endorsed by a supervisor with a permanent academic or scientific position can request access to the NIC4 supercomputer and to the whole CÉCI supercomputing infrastructure.
The first step is to create a CÉCI account by visiting the login.ceci-hpc.be website (your computer must be connected to your university network, see below) and entering your email address. The full procedure is explained here.
The clusters must be accessed with a secure shell (SSH) through a gateway. More information here. ULiège users working from outside the University network must use the latest ULiège VPN https://my.segi.uliege.be/cms/c_11650735/fr/mysegi-new-vpn to connect to the gwceci.uliege.be gateway. Using a PC or server inside the University, instead of the VPN, to reach the gateway is strongly discouraged for obvious security reasons.
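The gateway hop described above could be captured in an SSH client configuration. The sketch below only writes an example file and does not touch ~/.ssh/config. Only gwceci.uliege.be comes from the text; the login name jdoe, the cluster host name nic4.example.org, the key path ~/.ssh/id_rsa.ceci and the output file name are hypothetical placeholders to replace with your own values. ProxyJump requires OpenSSH 7.3 or later.

```shell
#!/bin/bash
# Sketch: write an example SSH client configuration that reaches a cluster
# login node through the CECI gateway.

GATEWAY="gwceci.uliege.be"        # gateway named in the text
CECI_USER="jdoe"                  # hypothetical login name
CLUSTER_HOST="nic4.example.org"   # hypothetical placeholder for the login node
KEY_FILE="~/.ssh/id_rsa.ceci"     # assumed location of your CECI private key
OUT="ceci_ssh_config.example"     # hypothetical output file

# The tilde stays literal here; ssh expands it when it reads the file.
cat > "$OUT" <<EOF
Host ceci-gateway
    HostName ${GATEWAY}
    User ${CECI_USER}
    IdentityFile ${KEY_FILE}

Host nic4
    HostName ${CLUSTER_HOST}
    User ${CECI_USER}
    IdentityFile ${KEY_FILE}
    ProxyJump ceci-gateway
EOF

echo "Wrote $OUT"
```

Once the two entries are merged into ~/.ssh/config with real values, 'ssh nic4' would open the connection through the gateway in a single command.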
To start working, you will need to write a submission script describing the resources you need and the operations you want to perform, and submit that script to the resource manager/job scheduler. The one installed on the CÉCI clusters is named Slurm. Find more information on how to do that here.
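As an illustration of what such a submission script might look like, the sketch below writes a minimal one to a file. The job name, the resource values, the file names and the program ./my_mpi_program are illustrative assumptions, and whether 'srun' or 'mpirun' is the right launcher depends on how the MPI library was built.

```shell
#!/bin/bash
# Sketch: create a minimal Slurm submission script.
# All names and resource values below are illustrative assumptions.

cat > submit_example.sh <<'EOF'
#!/bin/bash
# Resources: 32 MPI tasks, 2000 MB of RAM per core, 12 hours of walltime.
#SBATCH --job-name=example_mpi
#SBATCH --ntasks=32
#SBATCH --mem-per-cpu=2000
#SBATCH --time=0-12:00:00
#SBATCH --output=example_mpi_%j.out

# Operations: load the software environment, then start the program.
module load openmpi/qlc/gcc/64/1.6.4
srun ./my_mpi_program
EOF

echo "Wrote submit_example.sh"
```

The script would then be handed to the scheduler with 'sbatch submit_example.sh', and 'squeue -u $USER' lists the resulting job while it is pending or running.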
Preferred kind of jobs and queue configuration
NIC4 is ideally suited for massively parallel jobs (MPI, several dozen cores) with heavy communication and/or a lot of (parallel) disk I/O operations.
Accordingly, the default queue allows jobs of at most 3 days. If your job takes longer, increase the degree of parallelism or implement checkpointing, a mechanism that allows stopping a computation at a given moment and restarting it at a later time (more on this here). The maximum number of cores per user is currently set to 480 (these settings are indicative only and may change depending on the load on the cluster).
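The idea behind checkpointing can be sketched with a toy loop that records the last completed step in a file and resumes from it on the next run. This is only an illustration of the principle, not the mechanism used by any particular application; the file name checkpoint.txt and the variables TOTAL_STEPS and MAX_STEPS_PER_RUN (which stands in for the walltime limit) are hypothetical.

```shell
#!/bin/bash
# Sketch of application-level checkpointing with a toy step counter.

CHECKPOINT_FILE="${CHECKPOINT_FILE:-checkpoint.txt}"   # hypothetical file name
TOTAL_STEPS="${TOTAL_STEPS:-10}"                       # steps in the whole computation
MAX_STEPS_PER_RUN="${MAX_STEPS_PER_RUN:-4}"            # stands in for the walltime limit

run_with_checkpoint() {
    local step=0
    local done_this_run=0

    # Resume from the last completed step if a checkpoint exists.
    if [ -f "$CHECKPOINT_FILE" ]; then
        step=$(cat "$CHECKPOINT_FILE")
    fi

    while [ "$step" -lt "$TOTAL_STEPS" ] && [ "$done_this_run" -lt "$MAX_STEPS_PER_RUN" ]; do
        step=$((step + 1))
        # ... the real work for this step would go here ...
        echo "$step" > "$CHECKPOINT_FILE"   # record progress after each step
        done_this_run=$((done_this_run + 1))
    done

    # Print the last completed step.
    echo "$step"
}
```

Calling 'run_with_checkpoint' repeatedly advances the computation by at most MAX_STEPS_PER_RUN steps each time, so a long computation can be split over several jobs that each fit within the queue limit.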
The 128 nodes have 64 GB of RAM and thus 4 GB per core. To maximize the occupation rate of all the cores of the cluster, do not launch MPI jobs with more than 4 GB per core.
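The arithmetic behind this guideline can be sketched as follows. The helper 'usable_tasks_per_node' is hypothetical and idealized: it works in whole GB and ignores the memory reserved for the operating system.

```shell
#!/bin/bash
# Sketch: how the memory requested per task limits the cores usable on a node.

NODE_RAM_GB=64       # RAM per node, from the text
CORES_PER_NODE=16    # two 8-core processors per node
RAM_PER_CORE_GB=$((NODE_RAM_GB / CORES_PER_NODE))

# usable_tasks_per_node MEM_PER_TASK_GB
# Prints how many single-core tasks fit on one node (whole GB only).
usable_tasks_per_node() {
    local mem_per_task_gb="$1"
    local by_memory=$((NODE_RAM_GB / mem_per_task_gb))
    if [ "$by_memory" -gt "$CORES_PER_NODE" ]; then
        echo "$CORES_PER_NODE"
    else
        echo "$by_memory"
    fi
}

echo "RAM per core: ${RAM_PER_CORE_GB} GB"
```

Under these assumptions, tasks asking for 4 GB each fill all 16 cores of a node, while tasks asking for 8 GB each leave 8 of the 16 cores idle.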
A small number of SMP/OpenMP parallel jobs running on a single node are allowed, but they should also try to respect the limit of 4 GB of RAM per core. If your jobs need more RAM or a longer running time, use other CÉCI clusters. And if the parallel efficiency of your OpenMP job is not ideal, don't forget that running 2 jobs with 8 cores each should be more efficient than running a single job with 16 cores!
A small number of serial jobs using no more than 16 GB of RAM are also allowed. If your jobs need more RAM or a longer running time, use other CÉCI clusters (Hercules2, Dragon2).
To enforce these guidelines, the maximum number of concurrently running jobs per user is set to 128, and the maximum number of jobs a user can have submitted at the same time is set to 256 (sum of the RUNNING and PENDING jobs). Again, these settings are indicative only and may change depending on the load on the cluster.
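Before submitting a large batch of jobs, the remaining headroom under these limits could be estimated as sketched below. The function names are hypothetical, the limits are the indicative values quoted above, and 'count_my_jobs' only works where the Slurm 'squeue' command is available.

```shell
#!/bin/bash
# Sketch: how many more jobs can be submitted under the per-user limit.

MAX_SUBMITTED=256   # RUNNING + PENDING jobs per user (indicative)

# count_my_jobs STATE
# Counts your jobs in the given state (e.g. RUNNING or PENDING).
count_my_jobs() {
    squeue -u "$USER" -h -t "$1" | wc -l
}

# remaining_submissions RUNNING PENDING
# Prints how many additional jobs fit under MAX_SUBMITTED (never below 0).
remaining_submissions() {
    local running="$1"
    local pending="$2"
    local left=$((MAX_SUBMITTED - running - pending))
    if [ "$left" -lt 0 ]; then
        left=0
    fi
    echo "$left"
}
```

On the cluster, 'remaining_submissions "$(count_my_jobs RUNNING)" "$(count_my_jobs PENDING)"' would print the number of jobs that can still be submitted.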
See also Which cluster should I use?
The home directories of the users ( $HOME on the /home/ partition) are hosted on a 70 TB NFS server, with a quota of 20 GB per user (which can be increased upon justified request). Check your local quota with the 'quota' command or with the more general 'ceci-quota' command.
The $HOME directories ARE NOT BACKED UP! It is your responsibility to keep a copy of your important files and directories. Your home directory is well suited to hold your configuration files, your programs (sources and executables), your input data and your important result files (if they fit within your quota). Don't launch your jobs directly in your home directory; instead, work in your $SCRATCH directory (see below).
NIC4 has a second, independent storage system: a very fast 144 TB parallel distributed FHGFS file system. Each user automatically has a $SCRATCH = $GLOBALSCRATCH directory on that /scratch partition, from which jobs should be launched. There is no quota on this partition.
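A job following this advice might work in a per-job directory on the scratch file system and copy only the important results back afterwards. The sketch below assumes a directory layout of its own (job_<id> and results_<id> are hypothetical names), replaces the real computation with a placeholder, and falls back to a temporary directory when $GLOBALSCRATCH is not defined, for instance when tried outside the cluster.

```shell
#!/bin/bash
# Sketch: run in a per-job scratch directory, then copy results back.

SCRATCH_ROOT="${GLOBALSCRATCH:-$(mktemp -d)}"   # temporary dir outside the cluster
JOB_TAG="${SLURM_JOB_ID:-demo}"                 # Slurm job ID when run as a job
WORKDIR="${SCRATCH_ROOT}/job_${JOB_TAG}"        # hypothetical layout
RESULTS_DIR="${SLURM_SUBMIT_DIR:-$PWD}/results_${JOB_TAG}"

mkdir -p "$WORKDIR" "$RESULTS_DIR"
cd "$WORKDIR" || exit 1

# Placeholder for the real computation, which writes its output on scratch.
echo "result of job ${JOB_TAG}" > output.dat

# Copy back only what must be kept.
cp output.dat "$RESULTS_DIR/"
cd - > /dev/null || exit 1

echo "Results copied to $RESULTS_DIR"
```

Keeping large intermediate files on the scratch partition avoids filling the home quota, but since neither location is described as backed up, important results still need a copy elsewhere.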
Moreover, the CÉCI common filesystem is fully available on the 6 CÉCI clusters, and the /CECI/ partition is directly accessible from all the login and compute nodes on all the clusters. Make sure to try it out by just typing 'cd $CECIHOME ; pwd'. More information here.
'module available' shows you the list of all the installed software and applications. If you need more information about a module, use 'module show software_name' and/or 'module help software_name'.
BLAS and LAPACK: If you need the BLAS and/or LAPACK libraries, use the 'openblas/0.2.20' module which provides an optimized and multithreaded implementation of the BLAS and LAPACK libraries (the number of threads is controlled by the $OPENBLAS_NUM_THREADS environment variable which is set by default to 1), or use the Intel MKL library 'intel/mkl/64/11.1/2013_sp1.1.106'.
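For a multithreaded run, the OpenBLAS thread count could be matched to the cores allocated by Slurm, as in the sketch below. It assumes the job was submitted with the '--cpus-per-task' option, which is what makes Slurm define $SLURM_CPUS_PER_TASK; otherwise the value falls back to the default of 1. The 'module' call is skipped where that command is not available.

```shell
#!/bin/bash
# Sketch: set the OpenBLAS thread count from the Slurm allocation.

# Load the module named in the text, if the module command exists here.
if command -v module > /dev/null 2>&1; then
    module load openblas/0.2.20
fi

# Use the cores allocated per task, or 1 thread if none were requested.
export OPENBLAS_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

echo "OpenBLAS will use ${OPENBLAS_NUM_THREADS} thread(s)"
```

Setting more threads than allocated cores would oversubscribe the node, which is why the sketch derives the value from the allocation instead of hard-coding it.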
MPI (Message Passing Interface): try first OpenMPI version 1.6.4, compiled with GCC ('openmpi/qlc/gcc/64/1.6.4') or with Intel compilers ('openmpi/qlc/intel/64/1.6.4').
Python: The basic version of Python installed with the system is 2.6.6. If you need a more recent version, there is a 'python/2.7.6' module. If you need a very recent Python2, or Python3, see the EasyBuild section below.
Matlab/Octave: Matlab licenses are expensive and not easy to manage at the ULg level for a cluster available to users from other universities, so Matlab will not be installed on NIC4. But we provide a 100% free and mostly compatible alternative: Octave! ('module load EasyBuild Octave/4.4.1-foss-2018b ; octave --gui'). There is also Scilab ('scilab/5.4.1'). Moreover, we installed modules ('mcr/R201*_v*') for different versions of the MATLAB Compiler/MATLAB Runtime, which enables the execution of compiled MATLAB applications or components on computers that do not have MATLAB installed (see also https://indico.cism.ucl.ac.be/event/19/ ).
EasyBuild: loading the 'EasyBuild' module gives you access to an experimental and untested list of additional software ('module load EasyBuild ; module avail'). EasyBuild modules using MPI may have some problems. Do not try to mix modules coming from the main '---- /cm/shared/modulefiles ----' section with modules coming from the '---- /home/easybuild/Modules/modulefiles ----' section.
Last edit: March 3rd 2021 at 16:00 by David Colignon