Gromacs 2019 install request

Hello @team.software,

Would it be possible to install Gromacs version 2019 on the core cluster?

Gromacs version 2020 is already available, but I need the 2019 version for a few of my biomolecular simulations. It would really be a great help!

I checked the installed version (2020) of Gromacs and it lacks MPI support. Without MPI, running even a few nanoseconds of simulation takes almost forever, even for a small system.
MPI support can be enabled at build time through the cmake command:

cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=OFF -DREGRESSIONTEST_PATH=/PATH/WHERE/regressiontests-2019 -DGMX_MPI=ON -DGMX_GPU=ON
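For completeness, the full build sequence I have in mind is roughly the following (just a sketch: the archive location is a placeholder and the install prefix is the project directory I would use):

# unpack the 2019 sources and prepare an out-of-source build
tar xfz gromacs-2019.tar.gz
cd gromacs-2019
mkdir build
cd build

# same configuration as above, plus an install prefix in my project space
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=OFF -DREGRESSIONTEST_PATH=/PATH/WHERE/regressiontests-2019 -DGMX_MPI=ON -DGMX_GPU=ON -DCMAKE_INSTALL_PREFIX=/shared/projects/free_energy_computation/gromacs

make -j 8
make install
source /shared/projects/free_energy_computation/gromacs/bin/GMXRC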

I also tried installing the 2019 version of Gromacs myself, but I encountered the following error:

nvcc fatal   : Unsupported gpu architecture 'compute_30'

CMake Error at libgromacs_generated_pme-gather.cu.o.Release.cmake:220 (message):
  Error generating
  /shared/projects/free_energy_computation/gromacs/gromacs-2019/build/src/gromacs/CMakeFiles/libgromacs.dir/ewald/./libgromacs_generated_pme-gather.cu.o

Please let me know if you can either install it as a module on the cluster or help me solve this error so I can install it myself. That would be a great help.
Thank you in advance!

Hello @team.ifbcorecluster,

I tried several things, and the same issue is described here (Installation issue: gromacs fail due to cuda compute_30 (dropped in cuda11) · Issue #18239 · spack/spack · GitHub).
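What seems to get past the compute_30 error is telling CMake explicitly which CUDA architectures to build for, so nvcc never sees the dropped compute_30 target. A sketch (the architecture list is an assumption and has to match the actual GPU cards on the nodes):

# rebuild with an explicit list of CUDA targets still supported by CUDA 11 (values assumed)
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DGMX_MPI=ON -DGMX_GPU=ON -DGMX_CUDA_TARGET_SM="70;75" -DGMX_CUDA_TARGET_COMPUTE="70"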

But now I'm getting the following error:

*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[gpu-node-01.ifb.local:00443] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

It would be great if you could help me fix this one.
Thank you in advance!

Hi,

There are not many GPU nodes, so I don't think it's really useful to use MPI in this case.

Have you tried using a full GPU node without MPI?
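As a rough example (the partition name, core count and memory below are assumptions; the exact values are in the cluster GPU documentation), a job script for one full GPU card without MPI could look like this:

#!/bin/bash
#SBATCH --partition=gpu          # GPU partition: check the documentation for the exact name
#SBATCH --gres=gpu:1             # one full GPU card
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G

# thread-MPI build: restrict Gromacs to the allocated cores (replace the -deffnm prefix with yours)
gmx mdrun -ntomp ${SLURM_CPUS_PER_TASK} -v -deffnm my_run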

Hello @dbenaben,

I tried several commands, and with each of them it's taking a very long time: for example, 400 steps would only be completed by Sep 17, 2023.

I tried the following commands:

1. gmx mdrun -v -deffnm md_0_10

2. gmx mdrun -ntmpi 6 -ntomp 12 -nb gpu -pme gpu -npme -1 -bonded gpu -v -deffnm md_0_10

3. gmx mdrun -gputasks 0001 -nb gpu -pme gpu -npme 1 -ntmpi 4 -v -deffnm md_0_10

4. mpirun -np 1 gmx_mpi mdrun -ntomp 6 -nb gpu -gputasks 00 -v -deffnm md_0_10

5. gmx mdrun -ntmpi 2 -nb gpu -pme gpu -bonded gpu -v -deffnm md_0_10

where md_0_10 is the default file name prefix (-deffnm) of the input files.

I don't think gmx used the full GPU node with any of these commands. How can I do that?

I don't know how Gromacs works, but I would first try it on CPU and check the CPU efficiency with seff.

If it's too slow, I would then try with the GPU. You can use different GPU "profiles" with Slurm/NVIDIA:
first a light profile (1g.5gb) and, if necessary, a full GPU card (7g.40gb).
https://ifb-elixirfr.gitlab.io/cluster/doc/slurm/slurm_GPU/
https://ifb-elixirfr.gitlab.io/cluster/doc/slurm/slurm_at/#gpu-nodes

But please, each time, check that the resources are really used (especially the GPU): https://ifb-elixirfr.gitlab.io/cluster/doc/troubleshooting/#slurm-how-to-use-resources-wisely
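For example (a sketch; the exact gres/profile names are in the pages above), you could start with a light MIG profile and then check how the job really used its resources:

# interactive test on a light MIG slice (gres name: see the GPU documentation above)
srun --gres=gpu:1g.5gb:1 --cpus-per-task=4 --pty bash

# once the job has finished, check the CPU/memory efficiency
seff <jobid>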

Hello @dbenaben,

I checked the resource usage and found that the GPUs are not detected:

GROMACS:      gmx mdrun, version 2019
Executable:   /shared/projects/free_energy_computation/gromacs/bin/gmx
Data prefix:  /shared/projects/free_energy_computation/gromacs
Working dir:  /shared/ifbstor1/projects/free_energy_computation/simulation/secA_LTS18945
Process ID:   18088
Command line:
  gmx mdrun -v -deffnm md_0_10

GROMACS version:    2019
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  NONE
FFT library:        fftw-3.3.8
RDTSCP usage:       disabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
C compiler:         /usr/bin/cc GNU 4.8.5
C compiler flags:        -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  
C++ compiler:       /usr/bin/c++ GNU 4.8.5
C++ compiler flags:     -std=c++11   -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  
CUDA compiler:      /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2021 NVIDIA Corporation;Built on Sun_Aug_15_21:14:11_PDT_2021;Cuda compilation tools, release 11.4, V11.4.120;Build cuda_11.4.r11.4/compiler.30300941_0
CUDA compiler flags:-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_70,code=compute_70;-use_fast_math;;; ;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:        11.40
CUDA runtime:       N/A

NOTE: Detection of GPUs failed. The API reported:
      no CUDA-capable device is detected
      GROMACS cannot run tasks on a GPU.

Running on 1 node with total 64 cores, 64 logical cores, 0 compatible GPUs
Hardware detected:
  CPU info:
    Vendor: AMD
    Brand:  AMD EPYC 7343 16-Core Processor                
    Family: 25   Model: 1   Stepping: 1
    Features: aes amd apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf misalignsse mmx msr nonstop_tsc pcid pclmuldq pdpe1gb popcnt pse rdrnd rdtscp sha sse2 sse3 sse4a sse4.1 sse4.2 ssse3 x2apic
  Hardware topology: Only logical processor count

Highest SIMD level requested by all nodes in run: AVX2_128
SIMD instructions selected at compile time:       None
This program was compiled for different hardware than you are running on,
which could influence performance.

The current CPU can measure timings more accurately than the code in
gmx mdrun was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding gmx mdrun with the GMX_USE_RDTSCP=ON CMake option.

I'm not sure why this is happening.

I don't know. Maybe a CUDA version compatibility issue?
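One quick check before digging into CUDA versions (a sketch; adjust the gres to what you normally request): confirm that a GPU is actually visible inside the job allocation:

# if no GPU is allocated to the job, Gromacs will report "no CUDA-capable device is detected"
srun --gres=gpu:1 --pty nvidia-smi

If nvidia-smi shows no card, the job simply has no GPU allocated (for example a missing --gres option), which would also explain the detection failure.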

Also, be careful with the CPU usage. Some software detects the total number of cores on the node ("Running on 1 node with total 64 cores") and tries to use all of them instead of the CPUs/cores allocated by Slurm. So, for example, if you ask for 6 CPUs and Gromacs detects 64, Gromacs could launch 64 processes while your job can only use 6 CPUs, which is counterproductive.
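To keep Gromacs within the allocation, you can pass the Slurm core count explicitly instead of letting it auto-detect the 64 cores of the node (a sketch):

# in the job script: request the cores...
#SBATCH --cpus-per-task=6

# ...and tell Gromacs to use exactly those, not the 64 it detects
gmx mdrun -ntomp ${SLURM_CPUS_PER_TASK} -v -deffnm md_0_10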