Hi,
For the past few days I have been unable to run my MPI script properly. I am starting with 1 node / 8 tasks (not particularly large resources), and my job fails almost every time. Looking at the error log I get:
--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:
Bind to: CORE
Node: cpu-node-91
#processes: 2
#cpus: 1
You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------
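To compare what Slurm actually allocates on the node with the 8 ranks I ask mpirun to start, I can drop a quick check like the one below into the job script (just a sketch, using only standard Slurm environment variables and commands):
# Sketch: compare the requested tasks with what Slurm really allocated.
echo "SLURM_NTASKS            = ${SLURM_NTASKS}"
echo "SLURM_CPUS_ON_NODE      = ${SLURM_CPUS_ON_NODE}"
echo "SLURM_JOB_CPUS_PER_NODE = ${SLURM_JOB_CPUS_PER_NODE}"
nproc   # CPUs visible from inside the allocation
scontrol show job "${SLURM_JOB_ID}" | grep -E 'NumNodes|NumCPUs|NumTasks|CPUs/Task'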
Here is my Slurm script:
#!/bin/bash
#SBATCH --job-name=genomatch_job # Job name
#SBATCH --output=output_%j.log # Standard output log (%j is replaced with the jobID)
#SBATCH --error=error_%j.log # Standard error log
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks=8 # Total number of tasks (should match the number of tasks per node)
#SBATCH --ntasks-per-node=8 # Number of tasks per node
#SBATCH --cpus-per-task=1 # Number of CPUs per task
#SBATCH --mem=32G # Total memory allocation (adjust as needed)
#SBATCH --time=04:00:00 # Time limit (D-HH:MM:SS, adjust as needed)
#SBATCH --partition=fast # Partition to use
#SBATCH --mail-type=BEGIN,END,FAIL # Notifications for job start, end, and fail
#SBATCH --mail-user=nicolas.mendiboure@ens-lyon.fr # Change this to your email
# Set home directory and data directories
DATADIR=/shared/projects/genomatch/data
INPUTDIR=$DATADIR/inputs
GENOME=$INPUTDIR/S288c-Lys2.fa
SPARSE=$INPUTDIR/AD265-266/AD265_AD266_merged_S288c_DSB_chr3_rDNA_cutsite_q20.txt
FRAGS=$INPUTDIR/AD265-266/fragments_list_S288c_chr3_DpnIIHinfI.txt
CHROM=$INPUTDIR/AD265-266/info_contigs_S288c_chr3_DpnIIHinfI.txt
K=8
source ~/.bashrc
# Activate the conda environment
conda activate genomatch_env
module load openmpi
# Log some useful information
echo "Job started on $(hostname) at $(date)"
echo "Running on ${SLURM_NTASKS} total tasks"
echo "Running ${SLURM_TASKS_PER_NODE} tasks per node"
echo "Allocated memory: ${SLURM_MEM_PER_NODE} MB"
# Run the script
mpirun --bind-to core -np ${SLURM_NTASKS} genomatch kmerize -g $GENOME -s $SPARSE -f $FRAGS -c $CHROM -k $K -b 20kb -F
'genomatch' is the name of my Python module, with its subcommand kmerize (and its arguments).
I tried with and without the "--bind-to" and "--map-by" options; nothing changed.
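My understanding (from reading the Open MPI mpirun man page, not something I have verified on the cluster) is that the override suggested in the error message would be spelled like this, although I assume it only silences the check rather than fixing the CPU/task mismatch:
# Assumed syntax for the "overload-allowed" qualifier the error message mentions
# (an Open MPI binding modifier; untested here, and it probably just allows
# oversubscribing the bound cores rather than fixing the allocation itself).
mpirun --bind-to core:overload-allowed -np ${SLURM_NTASKS} genomatch kmerize -g $GENOME -s $SPARSE -f $FRAGS -c $CHROM -k $K -b 20kb -F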
On my personal computer the mpirun (or mpiexec) command works fine, but as soon as I switch to the IFB cluster and use a Slurm script, it does not work.
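I have also wondered whether launching the ranks through srun instead of mpirun would behave differently on the cluster; a sketch of that variant, assuming the site's Slurm and Open MPI are built with PMIx (or PMI2) support, which I have not checked:
# Untested variant: let Slurm itself place the 8 ranks (picked up from the
# #SBATCH directives, so no explicit -np), assuming PMIx support is available.
srun --mpi=pmix genomatch kmerize -g $GENOME -s $SPARSE -f $FRAGS -c $CHROM -k $K -b 20kb -F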
I am very new to Slurm and HPC in general, so it is quite possible that I am missing something about how to define my Slurm script properly.
Thank you in advance