Cannot get MKL-backed multithreaded BLAS to run on more than one logical core
The software I am developing for my master's degree uses SLEPc (and therefore PETSc) as its eigensolver, and I have been trying to link PETSc against Intel MKL in order to use MKL's BLAS libraries, which in my experience are faster than the system-provided ones.
On my personal computer (a MacBook Pro running macOS) everything works just fine. However, I wanted to use a server to speed up some computations, and I have tried everything (well, probably not everything), but my solver will not run on more than one logical core on that server (checked with htop).
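As a cross-check on what htop shows, the thread count and the set of CPUs a process is allowed to run on can be read from /proc on Linux. The following is only a minimal sketch, assuming a Linux server; the helper names `parse_proc_status`, `expand_cpu_list` and `read_status` are made up for illustration and are not part of PETSc or MKL.

```python
def expand_cpu_list(spec):
    """Turn a kernel CPU list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def parse_proc_status(text):
    """Extract (thread count, allowed CPUs) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    threads = int(fields.get("Threads", "0"))
    cpus = expand_cpu_list(fields.get("Cpus_allowed_list", ""))
    return threads, cpus


def read_status(pid):
    """Read and parse the status file of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_proc_status(fh.read())
```

Calling `read_status(pid)` with the solver's process ID while it is running returns both numbers. A single thread would suggest that BLAS is running sequentially, whereas several threads with only one allowed CPU would point to process binding rather than to MKL itself.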
I do not have much experience in this area, so I will list all the information that I think might be useful.
Number of physical cores: 12
Number of logical cores: 24
PETSc version: 3.8.4
SLEPc version: 3.8.2
MKL version: 2017.4.196
Relevant environment variables:
MKLROOT=/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl
OPENBLAS_NUM_THREADS=12
OMP_NUM_THREADS=12
MKL_NUM_THREADS=12
MKL_DOMAIN__NUM_THREADS="MKL_BLAS=12"
MKL_THREADING_LAYER=intel
OMP_NESTED=TRUE
MKL_DYNAMIC="FALSE"
OMP_DYNAMIC="FALSE"
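Since a misspelled variable name is simply never read, it may be worth checking the names that are actually set against the documented ones. The sketch below does this for a short, non-exhaustive list of MKL, OpenMP and OpenBLAS variables; the function name `check_threading_env` and the `KNOWN` list are assumptions made for illustration, and a name landing in the "unrecognised" group only means it is not in this list.

```python
import os

# Short, non-exhaustive list of documented thread-control variables.
KNOWN = {
    "MKL_NUM_THREADS", "MKL_DOMAIN_NUM_THREADS", "MKL_DYNAMIC",
    "MKL_THREADING_LAYER", "MKL_INTERFACE_LAYER", "MKL_VERBOSE",
    "OMP_NUM_THREADS", "OMP_DYNAMIC", "OMP_NESTED",
    "OMP_PROC_BIND", "OMP_PLACES", "OPENBLAS_NUM_THREADS",
}
PREFIXES = ("MKL_", "OMP_", "OPENBLAS_")


def check_threading_env(env):
    """Split threading-related variables into recognised and unrecognised."""
    recognised, unrecognised = {}, {}
    for name, value in env.items():
        if not name.startswith(PREFIXES):
            continue  # unrelated variable (MKLROOT, PATH, ...)
        target = recognised if name in KNOWN else unrecognised
        target[name] = value
    return recognised, unrecognised


if __name__ == "__main__":
    ok, odd = check_threading_env(os.environ)
    for name, value in sorted(ok.items()):
        print(f"recognised:   {name}={value}")
    for name, value in sorted(odd.items()):
        print(f"not in list:  {name}={value}")
```

Run in the same shell that launches the solver, this prints every matching variable so that the spelling of each name can be compared with the library documentation.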
PETSc petscvariables file:
MPICXX_SHOW = g++ -I/home/orlandini/open-mpi/install/include -pthread -Wl,-rpath -Wl,/home/orlandini/open-mpi/install/lib -Wl,--enable-new-dtags -L/home/orlandini/open-mpi/install/lib -lmpi
C_DEPFLAGS = -MMD -MP
FC_DEFINE_FLAG = -D
MPICC_SHOW = gcc -I/home/orlandini/open-mpi/install/include -pthread -Wl,-rpath -Wl,/home/orlandini/open-mpi/install/lib -Wl,--enable-new-dtags -L/home/orlandini/open-mpi/install/lib -lmpi
AR_FLAGS = cr
CXX_DEPFLAGS = -MMD -MP
FC_DEPFLAGS = -MMD -MP
MPIFC_SHOW = gfortran -I/home/orlandini/open-mpi/install/include -pthread -I/home/orlandini/open-mpi/install/lib -Wl,-rpath -Wl,/home/orlandini/open-mpi/install/lib -Wl,--enable-new-dtags -L/home/orlandini/open-mpi/install/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
FAST_AR_FLAGS = Scq
FC_MODULE_OUTPUT_FLAG = -J
PETSC_LANGUAGE = CONLY
FC_LINKER_FLAGS = -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O
LIBNAME = ${INSTALL_LIB_DIR}/libpetsc.${AR_LIB_SUFFIX}
SL_LINKER = /home/orlandini/open-mpi/install/bin/mpicc
PETSC_BUILD_USING_CMAKE = 1
CC_FLAGS = -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O
MKL_SPARSE_INCLUDE =
PETSC_PRECISION = double
PETSC_LIB_BASIC = -lpetsc
MKL_SPARSE_OPTIMIZE_LIB =
FC_FLAGS = -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O
BLASLAPACK_LIB = -Wl,-rpath,/opt/intel/mkl/lib/intel64 -L/opt/intel/mkl/lib/intel64 -lmkl_rt
PETSC_MAT_LIB = ${C_SH_LIB_PATH} ${PETSC_WITH_EXTERNAL_LIB}
PCC = /home/orlandini/open-mpi/install/bin/mpicc
SL_LINKER_LIBS = ${PETSC_EXTERNAL_LIB_BASIC}
MPI_LIB =
MKL_SPARSE_LIB =
PETSC_EXTERNAL_LIB_BASIC = -Wl,-rpath,/opt/intel/mkl/lib/intel64 -L/opt/intel/mkl/lib/intel64 -Wl,-rpath,/home/orlandini/open-mpi/install/lib -L/home/orlandini/open-mpi/install/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/opt/intel/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/opt/intel/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -lmkl_rt -lm -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lstdc++ -lm -ldl -lmpi -lgcc_s -lpthread -ldl
SL_LINKER_FLAGS = ${PCC_LINKER_FLAGS}
CC_SUFFIX = o
PETSC_LIB = ${C_SH_LIB_PATH} ${PETSC_WITH_EXTERNAL_LIB}
SHLIBS = libpetsc
CONFIGURE_OPTIONS = --with-scalar-type=complex --with-cmake=1 --with-shared-libraries=1 --with-mpi-dir=/home/orlandini/open-mpi/install/ --with-debugging=0 --with-blaslapack-lib=/opt/intel/mkl/lib/intel64/libmkl_rt.so
PETSC_CHARACTERISTIC_LIB = ${C_SH_LIB_PATH} ${PETSC_WITH_EXTERNAL_LIB}
PTHREAD_LIB =
PETSC_SCALAR = complex
PETSC_FC_INCLUDES = -I/home/orlandini/petsc/include -I/home/orlandini/petsc/arch-linux2-c-opt/include -I/home/orlandini/open-mpi/install/include
CPP_FLAGS =
PETSC_KSP_LIB_BASIC = -lpetsc
FPP_FLAGS =
FC_LINKER = /home/orlandini/open-mpi/install/bin/mpif90
MKL_SPARSE_OPTIMIZE_INCLUDE =
PETSC_KSP_LIB = ${C_SH_LIB_PATH} ${PETSC_WITH_EXTERNAL_LIB}
CXX_FLAGS = -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O -fPIC
PCC_LINKER_FLAGS = -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O
PETSC_CONTRIB = ${C_SH_LIB_PATH} ${PETSC_WITH_EXTERNAL_LIB}
PETSC_CC_INCLUDES = -I/home/orlandini/petsc/include -I/home/orlandini/petsc/arch-linux2-c-opt/include -I/home/orlandini/open-mpi/install/include
PCC_LINKER = /home/orlandini/open-mpi/install/bin/mpicc
PETSC_SYS_LIB = ${C_SH_LIB_PATH} ${PETSC_WITH_EXTERNAL_LIB}
PCC_FLAGS = -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O
PTHREAD_INCLUDE =
PETSC_TS_LIB = ${C_SH_LIB_PATH} ${PETSC_WITH_EXTERNAL_LIB}
PETSC_TAO_LIB_BASIC = -lpetsc
BLASLAPACK_INCLUDE =
PETSC_TS_LIB_BASIC = -lpetsc
PETSC_VEC_LIB = ${C_SH_LIB_PATH} ${PETSC_WITH_EXTERNAL_LIB}
CC_LINKER_SUFFIX =
SL_LINKER_SUFFIX = so
PETSC_DM_LIB = ${C_SH_LIB_PATH} ${PETSC_WITH_EXTERNAL_LIB}
DESTDIR = /home/orlandini/petsc/arch-linux2-c-opt
FC_MODULE_FLAG = -I
wPETSC_DIR = /home/orlandini/petsc
PETSC_WITH_EXTERNAL_LIB = -L/home/orlandini/petsc/arch-linux2-c-opt/lib -Wl,-rpath,/opt/intel/mkl/lib/intel64 -L/opt/intel/mkl/lib/intel64 -Wl,-rpath,/home/orlandini/open-mpi/install/lib -L/home/orlandini/open-mpi/install/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/opt/intel/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/opt/intel/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -lpetsc -lmkl_rt -lm -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lstdc++ -lm -Wl,-rpath,/home/orlandini/open-mpi/install/lib -L/home/orlandini/open-mpi/install/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/opt/intel/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/opt/intel/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -ldl -Wl,-rpath,/home/orlandini/open-mpi/install/lib -lmpi -lgcc_s -lpthread -ldl
PETSC_TAO_LIB = ${C_SH_LIB_PATH} ${PETSC_WITH_EXTERNAL_LIB}
MPI_INCLUDE = -I/home/orlandini/open-mpi/install/include
FC_SUFFIX = o
PETSC_SNES_LIB = ${C_SH_LIB_PATH} ${PETSC_WITH_EXTERNAL_LIB}
SHELL = /bin/sh
GREP = /bin/grep
MV = /bin/mv
PYTHON = /usr/bin/python
MKDIR = /bin/mkdir -p
SEDINPLACE = /bin/sed -i
SED = /bin/sed
DIFF = /usr/bin/diff -w
GZIP = /bin/gzip
RM = /bin/rm -f
CP = /bin/cp
CC_LINKER_SLFLAG = -Wl,-rpath,
CC = /home/orlandini/open-mpi/install/bin/mpicc
RANLIB = /usr/bin/ranlib
DYNAMICLINKER = /home/orlandini/open-mpi/install/bin/mpicc
CXX = /home/orlandini/open-mpi/install/bin/mpicxx
FC = /home/orlandini/open-mpi/install/bin/mpif90
CXXCPP = /home/orlandini/open-mpi/install/bin/mpicxx -E
FC_LINKER_SLFLAG = -Wl,-rpath,
CPP = /home/orlandini/open-mpi/install/bin/mpicc -E
AR_LIB_SUFFIX = a
LD_SHARED = /home/orlandini/open-mpi/install/bin/mpicc
AR = /usr/bin/ar
DIR = /home/orlandini/petsc
PETSC_SCALAR_SIZE = 64
PETSC_INDEX_SIZE = 32
MAKE_IS_GNUMAKE = 1
MAKE_NP = 18
NPMAX = 24
OMAKE_PRINTDIR = /usr/bin/make --print-directory
MAKE = /usr/bin/make
MAKE_PAR_OUT_FLG = --output-sync=recurse
OMAKE = /usr/bin/make --no-print-directory
GIT = git
SL_LINKER_FUNCTION = -shared -Wl,-soname,$(call SONAME_FUNCTION,$(notdir $(1)),$(2))
SONAME_FUNCTION = $(1).so.$(2)
BUILDSHAREDLIB = yes
GDB = /usr/bin/gdb
DSYMUTIL = true
MPIEXEC = /home/orlandini/open-mpi/install/bin/mpiexec
CMAKE = /home/orlandini/cmake/bin/cmake
CTEST = /home/orlandini/cmake/bin/ctest
TEST_RUNS = C C_Info C_NotSingle Fortran Fortran_NotSingle F90_NotSingle Cxx F90 F90_Complex F2003 Fortran_Complex C_Complex
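Most of the file above is unrelated to threading, so the few BLAS-related entries can be pulled out programmatically. This is a rough sketch that assumes the file holds one `NAME = value` pair per line; `parse_petscvariables` and `linked_libs` are hypothetical helper names, not PETSc tools.

```python
def parse_petscvariables(text):
    """Parse 'NAME = value' lines into a dict; empty values become ''."""
    variables = {}
    for line in text.splitlines():
        # Split on the first '=' only, since values may contain '=' too.
        name, sep, value = line.partition("=")
        if sep and name.strip():
            variables[name.strip()] = value.strip()
    return variables


def linked_libs(value):
    """Return only the '-l<name>' tokens of a linker flag string."""
    return [tok for tok in value.split() if tok.startswith("-l")]


def blas_summary(path):
    """Report which libraries BLASLAPACK_LIB links and how PETSc was configured."""
    with open(path) as fh:
        variables = parse_petscvariables(fh.read())
    return {
        "blas_libs": linked_libs(variables.get("BLASLAPACK_LIB", "")),
        "configure": variables.get("CONFIGURE_OPTIONS", ""),
    }
```

For the file shown above, the BLAS entry would reduce to `-lmkl_rt`, which confirms that PETSc itself links only the MKL single dynamic library and leaves the choice of threading layer to run time.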
Although PETSc is configured to work with MPI, I am not using it at the moment (i.e., there is only one MPI process).
I have also tried linking against libiomp5.so and libmkl_intel_thread.so instead of libmkl_rt.so, but with no luck.
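To see which BLAS and OpenMP runtime libraries a given build actually resolves to, the output of `ldd` can be filtered. The sketch below assumes a Linux system with `ldd` available; `threading_libs` and `inspect_binary` are illustrative names, and the list of name fragments is only a guess at what is relevant here.

```python
import subprocess

# Name fragments that usually indicate a BLAS/LAPACK or OpenMP runtime library.
RUNTIME_HINTS = ("mkl", "iomp", "gomp", "openblas", "blas", "lapack")


def threading_libs(ldd_output):
    """Return (name, resolved_path) pairs for BLAS/OpenMP-related libraries."""
    found = []
    for line in ldd_output.splitlines():
        line = line.strip()
        if not line:
            continue
        name = line.split()[0]
        if not any(hint in name.lower() for hint in RUNTIME_HINTS):
            continue
        path = ""
        if "=>" in line:
            # Drop the load address in parentheses, keep the path (or 'not found').
            path = line.split("=>", 1)[1].split("(")[0].strip()
        found.append((name, path))
    return found


def inspect_binary(path):
    """Run ldd on an executable or shared library and filter its output."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return threading_libs(result.stdout)
```

Running `inspect_binary` on both the solver executable and libpetsc.so shows whether libiomp5.so or libgomp is pulled in. Note that with libmkl_rt.so the threading layer is, as far as I understand, loaded at run time, so libmkl_intel_thread.so may not appear in this list even when it is used.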
I am sorry for posting this here, but I am completely lost, and I would really appreciate any advice you can give me.