Hello team!
I'm currently testing Helixer in GPU mode.
I ran apptainer pull docker://gglyptodon/helixer-docker:helixer_v0.3.5_cuda_12.2.2-cudnn8
and then launched this script:
#!/bin/bash
#SBATCH -J helixer_chamadorea
#SBATCH -p gpu
#SBATCH --gres=gpu:7g.40gb:1
module load singularity
cd /shared/projects/kmexplore/chamadorea/OUTPUT/helixir_test
ASSEMBLY="chamadorea-FLYE-ASSEMBLY.fasta"
GFF="chamadorea-FLYE-ASSEMBLY.gff3"
singularity run --nv helixer-docker_helixer_v0.3.5_cuda_12.2.2-cudnn8.sif Helixer.py --fasta-path chamadorea-FLYE-ASSEMBLY.fasta --lineage land_plant --gff-output-path chamadorea-FLYE-ASSEMBLY.gff3
My job gets killed during the training phase:
Total params: 2161736 (8.25 MB)
Trainable params: 2161160 (8.24 MB)
Non-trainable params: 576 (2.25 KB)
__________________________________________________________________________________________________
/var/spool/slurm/slurmd/job60475889/slurm_script: line 12: 480136 Killed singularity run --nv helixer-docker_helixer_v0.3.5_cuda_12.2.2-cudnn8.sif Helixer.py --fasta-path $ASSEMBLY --lineage land_plant --gff-output-path $GFF
slurmstepd-gpu-node-01: error: Detected 1 oom_kill event in StepId=60475889.batch. Some of the step tasks have been OOM Killed.
Any ideas?
Thanks!!
Julie