ReqNodeNotAvail, UnavailableNodes:cpu-node-[11-12,25,30,34-37,51]

Dear IFB Core Cluster support team,

We submitted a job to the long partition, but the job is stuck in the PD state.
The job indicates the "ReqNodeNotAvail" reason, as shown in the screenshot below.

However, we did not request a specific node for the job.

Do you mind giving us some suggestions on how to solve this problem?

Thank you in advance!

Best,

Hello,

There is currently no node available for your job, which is why it is in the pending state.
You can check the status of the nodes in the long partition with the following command:

sinfo -Nl -p long

We are currently performing kernel upgrades on our nodes. Some more nodes should be available in the long partition soon.
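
The `sinfo -Nl` listing can be long, so a small filter that keeps only the nodes that cannot currently run jobs may help. The following is a rough sketch, not an official tool: `list_unavailable_nodes` is a hypothetical helper name, and it assumes the column layout of `sinfo -Nl` (state in column 4, reason from column 11 onwards) and that only the states `idle`, `mixed` and `allocated` count as usable.

```shell
# Hypothetical helper (not part of Slurm): reads `sinfo -Nl` output on stdin
# and prints "node state reason" for every node whose state is not exactly
# idle, mixed or allocated. Assumes the default `sinfo -Nl` column layout.
list_unavailable_nodes() {
  awk '
    $1 == "NODELIST" { seen_header = 1; next }  # header row: start parsing after it
    !seen_header     { next }                   # skip the date line above the header
    NF < 11          { next }                   # skip blank or truncated lines
    $4 != "idle" && $4 != "mixed" && $4 != "allocated" {
      reason = $11                              # the reason may contain spaces,
      for (i = 12; i <= NF; i++)                # so join all remaining fields
        reason = reason " " $i
      print $1, $4, reason
    }
  '
}

# Usage on the cluster (requires the Slurm client tools):
#   sinfo -Nl -p long | list_unavailable_nodes
```

Slurm's own `sinfo -R` option also lists the reasons recorded for nodes that are down or drained, which may be enough on its own.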

Regards,

Julien

Thanks for the information!

Best,

Same thing this morning. Wasn't your maintenance supposed to be tomorrow?

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
14451953      fast  TEcount mhennion PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:cpu-node-[11,30,34-37,51])

[mhennion @ cpu-node-33 10:54]$ DFAM : sinfo -Nl -p fast
Tue Dec 15 10:58:26 2020
NODELIST     NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
cpu-node-6       1     fast*       mixed   54   2:27:1 257689        0      1   (null) none                
cpu-node-7       1     fast*        idle   54   2:27:1 257689        0      1   (null) none                
cpu-node-8       1     fast*        idle   54   2:27:1 257689        0      1   (null) none                
cpu-node-9       1     fast*        idle   54   2:27:1 257689        0      1   (null) none                
cpu-node-10      1     fast*       mixed   54   2:27:1 257689        0      1   (null) none                
cpu-node-11      1     fast*     drained   54   2:27:1 257671        0      1   (null) kernel upg          
cpu-node-12      1     fast*       mixed   54   2:27:1 257689        0      1   (null) none                
cpu-node-13      1     fast*       mixed   54   2:27:1 257689        0      1   (null) none                
cpu-node-14      1     fast*        idle   54   2:27:1 257689        0      1   (null) none            
...

Thanks!