HELP launch failed requeued held

Hello Julie,

A job can be "requeued" after its launch fails.
In your case, as sometimes happens, a node was in an error state (up, but not running correctly).
Slurm tried to run your job on this idle node, but the launch failed, so the job was put in the "requeued held" state.
The failed node (cpu-node-35) has been rebooted and is working correctly now.

So there is nothing for you to do. It was an error on the server side.
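
If you ever want to check whether any of your jobs are sitting in this state, here is a minimal sketch. It assumes `squeue` is on your PATH and that the reason is reported exactly as "launch failed requeued held". The function names and the `|` separator are my own choices for illustration, not anything provided by the cluster.

```python
import subprocess

# Reason text as it appears in this thread; assumed to match squeue's output.
HELD_REASON = "launch failed requeued held"


def query_squeue(user):
    """Return raw squeue output for one user, one 'jobid|reason' pair per line."""
    # -h: no header, -u: filter by user, -o: output format
    # (%i = job ID, %r = reason the job is in its current state)
    result = subprocess.run(
        ["squeue", "-h", "-u", user, "-o", "%i|%r"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def held_after_failed_launch(squeue_output):
    """Return the job IDs whose reason is 'launch failed requeued held'."""
    job_ids = []
    for line in squeue_output.splitlines():
        parts = line.strip().split("|", 1)
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        job_id, reason = parts
        if reason.strip().lower() == HELD_REASON:
            job_ids.append(job_id.strip())
    return job_ids
```

You could call it as `held_after_failed_launch(query_squeue("your_login"))`. If a job were still held, `scontrol release <job_id>` is the usual Slurm command to release it, but that should not be needed here.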

Thanks for reporting this.
