CHPC Software: Math Libraries
In this document, a math library means a software package that provides functions
for particular mathematical operations. The term is very broad, so the list below
is not exhaustive, but it covers the math functions most commonly used in
scientific and engineering computations.
Math libraries can be roughly divided into general libraries, which provide a wide range
of functionality, and specialized libraries, which provide specific functionality. The
general libraries include the Intel Math Kernel Library (MKL) and the GNU Scientific
Library (GSL). The specialized libraries include the BLAS and LAPACK linear algebra libraries,
the FFTW Fast Fourier Transform library, etc. The general libraries often provide optimized
implementations of the specialized libraries, or use them underneath.
The libraries listed below are the most common ones that we provide. If you don't
see the one you need on the list, please contact us.
Intel Math Kernel Library
MKL contains highly optimized math routines. It includes fully optimized BLAS, LAPACK, sparse solvers, a vector math library, random number generators, and fast Fourier transform routines (including FFTW wrappers). For more information, consult the Intel Math Kernel Library Documentation.
MKL is supplied as a separate module that depends on a compiler module. For example, to use MKL with the Intel compiler, load both the compiler and the MKL module:
module load intel-oneapi-compilers intel-oneapi-mkl
Compilation instructions:
The examples below (diagonalization of a symmetric matrix) require the source files lapack1.f90 and lapack1.c
Intel Fortran (using dynamic linking)
ifort lapack1.f90 -o lapack1_ifort -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -Wl,-rpath=$MKLROOT/lib/intel64
Intel C/C++ (using dynamic linking)
icc lapack1.c -o lapack1_icc -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -Wl,-rpath=$MKLROOT/lib/intel64
If you use the C++ compiler, replace icc with icpc and change the suffix .c to .cc in the previous command.
It is also possible to incorporate OpenMP-threaded MKL into an OpenMP or mixed MPI/OpenMP code. To do so, parallelize your code with OpenMP but leave the MKL calls unthreaded, and instead link the threaded MKL library, for example:
icc lapack1.c -o lapack1_icc_mt -L$MKLROOT/lib/intel64 -Wl,-rpath=$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
Then run as you usually would with OMP_NUM_THREADS set; the MKL calls will run over that many threads as well.
For distributed (MPI) parallel linear algebra, ScaLAPACK is also fully implemented inside MKL, and we recommend using it instead of the reference ScaLAPACK distribution. Since release 11.2 (2015), MKL also includes cluster sparse matrix solvers based on PARDISO. Large sparse eigenproblems can be solved, to a given tolerance, using PRIMME, which can be linked to MKL for LAPACK/BLAS.
These and other advanced MKL routines require relatively complex link lines, which
are best generated with the MKL Link Line Advisor page. The MKL Link Line Advisor also
lets you define link flags for the GNU and PGI compilers, which we recommend because
MKL generally provides superior performance. To use the GNU or PGI compilers with MKL,
first load the intel module, then load the GNU or PGI module, and then any other
libraries to use with the GNU or PGI compiler.
MKL also includes an interface for FFTW, a commonly used Fast Fourier Transform library.
This interface is especially advantageous when building multi CPU architecture
binaries with the -ax Intel compiler flag. The header files for the FFTW interface
are at $MKLROOT/include/fftw.
GNU Scientific Library (GSL)
GSL is a numerical library for C/C++ that provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total, with an extensive test suite. While GSL is not parallel, it is reasonably thread safe and its routines should be callable from parallel code sections. One can also link a parallel BLAS library such as MKL or ACML and utilize the shared memory parallelism it provides.
GNU gcc
module load gcc/8.5.0 gsl
gcc source.c -o executable -I$GSL_ROOT/include -L$GSL_ROOT/lib -lgsl -lcblas -Wl,-rpath=$GSL_ROOT/lib
This links with the generic, unoptimized version of BLAS. $GSL_ROOT is an environment variable defined in the gsl module.
Intel C/C++
module load intel-oneapi-compilers gsl intel-oneapi-mkl
icc (or icpc) source.c -O3 -axCORE-AVX2,AVX,SSE4.2 -o executable -I$GSL_ROOT/include -L$GSL_ROOT/lib -lgsl -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -Wl,-rpath=$GSL_ROOT/lib -Wl,-rpath=$MKLROOT/lib/intel64
This links with the threaded MKL BLAS library for optimal performance and OpenMP parallelism.
OpenBLAS Library
OpenBLAS is an optimized BLAS library based on GotoBLAS2. Its advantage is relative simplicity;
its disadvantage is low maturity. Some of the applications we build link to OpenBLAS
for simplicity, but we recommend that everyone use MKL instead. OpenBLAS is available
via module load, e.g. module load gcc/8.5.0 openblas. Linking is relatively simple: add
the following to the link line: -Wl,-rpath=$OPENBLAS_ROOT/lib -L$OPENBLAS_ROOT/lib -lopenblas.
LAPACK Library
LAPACK (Linear Algebra PACKage) provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. It runs on a single processor only. The CentOS 7 operating system comes with reference LAPACK (and BLAS), but we highly recommend using Intel MKL, which includes full LAPACK, for optimal performance. Linking LAPACK with MKL is the same as linking BLAS, described above.
ScaLAPACK Library
The ScaLAPACK (or Scalable LAPACK) library includes a subset of LAPACK routines redesigned for distributed memory MIMD parallel computers. It is written in a Single-Program-Multiple-Data style using explicit message passing for interprocessor communication. It assumes matrices are laid out in a two-dimensional block cyclic decomposition.
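To make the two-dimensional block cyclic decomposition concrete, the sketch below maps a global matrix entry to the process that owns it and to its position in that process's local array. It assumes 0-based indices and that the first block lives on process (0,0); the type and function names (bc_location, bc_owner, bc_local, bc_locate) are hypothetical and are not part of ScaLAPACK.

```c
/* Position of one matrix entry in a 2D block cyclic layout. */
typedef struct {
    int prow, pcol;  /* coordinates of the owning process in the grid */
    int lrow, lcol;  /* 0-based indices inside that process's local array */
} bc_location;

/* Process coordinate that owns global index g (0-based) along one
   dimension, for block size nb and nprocs processes. Blocks are dealt
   out round-robin, starting on process 0. */
static int bc_owner(int g, int nb, int nprocs)
{
    return (g / nb) % nprocs;
}

/* Local index of global index g on its owning process: the number of
   full local blocks that precede it, times nb, plus the offset inside
   its own block. */
static int bc_local(int g, int nb, int nprocs)
{
    return (g / nb / nprocs) * nb + g % nb;
}

/* Locate global entry (i, j) for mb x nb blocks on an nprow x npcol grid. */
bc_location bc_locate(int i, int j, int mb, int nb, int nprow, int npcol)
{
    bc_location loc;
    loc.prow = bc_owner(i, mb, nprow);
    loc.pcol = bc_owner(j, nb, npcol);
    loc.lrow = bc_local(i, mb, nprow);
    loc.lcol = bc_local(j, nb, npcol);
    return loc;
}
```

For example, with 2x2 blocks on a 2x2 process grid, bc_locate(5, 2, 2, 2, 2, 2) places global entry (5, 2) on process (0, 1) at local position (3, 0). Rows and columns are treated independently, which is why the same two helper functions serve both dimensions.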
The fundamental building blocks of the ScaLAPACK library are distributed memory versions (PBLAS) of the Level 1, 2 and 3 BLAS, and a set of Basic Linear Algebra Communication Subprograms (BLACS) for communication tasks that arise frequently in parallel linear algebra computations. In the ScaLAPACK routines, all interprocessor communication occurs within the PBLAS and the BLACS. One of the design goals of ScaLAPACK was to have the ScaLAPACK routines resemble their LAPACK equivalents as much as possible.
To compile and link with the MKL ScaLAPACK:
module load intel-oneapi-compilers intel-oneapi-mpi intel-oneapi-mkl
mpiifort -qopenmp -o executable program.f90 -I$MKLROOT/include -Wl,-rpath=$MKLROOT/lib/intel64 -L$MKLROOT/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm
The ilp64 libraries in this link line use 64-bit integers, so the program must be compiled accordingly (for example with -i8), or the lp64 variants of these libraries should be used instead.
FFTW Library
Fastest Fourier Transform in the West (FFTW) is a high performance Fast Fourier Transform (FFT) library. Apart from being optimized
for most PC architectures, it also includes OpenMP and MPI parallelism. The latest serial
and threaded OpenMP builds with the three compilers that we support (GNU, Intel and
NVHPC) can be accessed through their respective modules. To link serial FFTW with
e.g. the Intel compiler, simply add -L$FFTW_ROOT/lib -lfftw3 to the link line. To link
OpenMP FFTW, add -lfftw3_omp to the serial link line.
For example, for the Intel compiler with OpenMP:
module load intel-oneapi-compilers fftw
icc myprog.c -o myprog.exe -qopenmp -I$FFTW_ROOT/include -L$FFTW_ROOT/lib -Wl,-rpath=$FFTW_ROOT/lib -lfftw3_omp -lfftw3 -lm
Please note that FFTW version 2, which is incompatible with FFTW 3, is still used in
some codes. It is available as module fftw/2.1.5.
Also note that Intel MKL includes FFTW wrappers, with FFT performance on par with FFTW. For information on how to link them, see our MKL documentation.