AlphaFold 2.1 on IFB cluster (?)

Hello,

I would like to know if there is a plan to provide AlphaFold 2.1 to the IFB community in the near future.

I am personally interested in using AlphaFold-Multimer, but I don't currently have the hardware required to run some tests. Poking @team.alphafold :slight_smile:

Have a nice day,

Andreas

Hello,

AlphaFold is installed, but we don't know yet whether it works well or not.

So we can't provide support for it for now.

To try it, you will have to request access to a GPU node.
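
A minimal submission script could look like the sketch below. Treat it as a starting point only: the partition name, module name and resource values are assumptions to adapt, and the run_alphafold.py flags shown are the upstream ones, not necessarily the exact wrapper invocation on the cluster (it may need additional --*_database_path flags; see the documentation linked below).

#!/bin/bash
# Assumption: the GPU partition is named "gpu"; adapt resources as needed.
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64GB
#SBATCH --time=24:00:00

# Assumption: AlphaFold 2.1.1 is exposed as an environment module.
module load alphafold/2.1.1

run_alphafold.py \
    --fasta_paths=test.fasta \
    --data_dir=/shared/bank/alphafold2/current \
    --output_dir="$PWD"/alphafold_output \
    --max_template_date=2022-02-01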

We have started a doc here: Alphafold2 - IFB Core Cluster Documentation

Any help improving this documentation or getting AlphaFold to work well is welcome.


Thank you for your reply. I will definitely give it a try very soon and provide feedback.

Andreas

Thank you @Francois for your reply and the test script!

I have a problem running AlphaFold: it hangs at the HHblits step.

The submission script is here: /shared/home/akiselev2/aphanoclust/SSP/AlphaFold_test.sh
The slurm log file is here: /shared/home/akiselev2/aphanoclust/SSP/slurm-21105284.out
I stopped the job after two hours at this step.

Thank you for your help

I am facing a similar issue using the multimer option.

What is the exact error message that you get?

Best,

I don't get an error message; nothing happens after the HHblits step is started.

Could you attach your file /shared/home/akiselev2/aphanoclust/SSP/slurm-21105284.out or cut and paste its content?

Maybe your job was cancelled after reaching the default time limit on the gpu partition (DefaultTime=04:00:00)?
In that case you must submit your job with the option --time=24:00:00 (the maximum time allowed on the gpu partition).
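
For example, using the script name from this thread:

sbatch --partition=gpu --gres=gpu:1 --time=24:00:00 AlphaFold_test.sh

or, equivalently, inside the submission script itself:

#SBATCH --time=24:00:00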

The job was terminated manually after two hours at the HHblits step. I feel this step shouldn't take more than 10-15 minutes for the short protein I used.

I0203 12:09:52.191925 47838050918080 templates.py:857] Using precomputed obsolete pdbs /shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat.
I0203 12:09:53.635205 47838050918080 xla_bridge.py:230] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: 
I0203 12:09:55.071061 47838050918080 xla_bridge.py:230] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I0203 12:10:03.425242 47838050918080 run_alphafold.py:384] Have 5 models: ['model_1', 'model_2', 'model_3', 'model_4', 'model_5']
I0203 12:10:03.425383 47838050918080 run_alphafold.py:397] Using random seed 1407284561204456532 for the data pipeline
I0203 12:10:03.425539 47838050918080 run_alphafold.py:150] Predicting test
I0203 12:10:03.430382 47838050918080 jackhmmer.py:130] Launching subprocess "/shared/ifbstor1/software/miniconda/envs/alphafold-2.1.1/bin/jackhmmer -o /dev/null -A /tmp/tmpq6_afi14/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 test.fasta /shared/bank/alphafold2/current/uniref90/uniref90.fasta"
I0203 12:10:03.454846 47838050918080 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0203 12:15:22.877326 47838050918080 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 319.422 seconds
I0203 12:15:22.881907 47838050918080 jackhmmer.py:130] Launching subprocess "/shared/ifbstor1/software/miniconda/envs/alphafold-2.1.1/bin/jackhmmer -o /dev/null -A /tmp/tmp8q_l428p/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 test.fasta /shared/bank/alphafold2/current/mgnify/mgy_clusters_2018_12.fa"
I0203 12:15:22.892377 47838050918080 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I0203 12:21:48.899630 47838050918080 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 386.007 seconds
I0203 12:21:48.911220 47838050918080 hhsearch.py:85] Launching subprocess "/shared/ifbstor1/software/miniconda/envs/alphafold-2.1.1/bin/hhsearch -i /tmp/tmp_irgnxbd/query.a3m -o /tmp/tmp_irgnxbd/output.hhr -maxseq 1000000 -d /shared/bank/alphafold2/current/pdb70/pdb70"
I0203 12:21:48.977049 47838050918080 utils.py:36] Started HHsearch query
I0203 12:26:04.244283 47838050918080 utils.py:40] Finished HHsearch query in 255.267 seconds
I0203 12:26:04.259516 47838050918080 hhblits.py:128] Launching subprocess "/shared/ifbstor1/software/miniconda/envs/alphafold-2.1.1/bin/hhblits -i test.fasta -cpu 4 -oa3m /tmp/tmph4er2s0m/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /shared/bank/alphafold2/current/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0203 12:26:04.319489 47838050918080 utils.py:36] Started HHblits query
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** JOB 21105284 ON gpu-node-03 CANCELLED AT 2022-02-03T14:06:12 ***
srun: got SIGCONT
srun: forcing job termination
slurmstepd: error: *** STEP 21105284.0 ON gpu-node-03 CANCELLED AT 2022-02-03T14:06:12 ***
srun: error: gpu-node-03: task 0: Terminated
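
For completeness: Slurm accounting distinguishes a timeout from a manual cancellation. Assuming sacct is enabled on the cluster, the job's recorded state, elapsed time and time limit can be checked with:

sacct -j 21105284 --format=JobID,JobName,State,Elapsed,Timelimit

A TIMEOUT state would point to the partition's time limit, while a CANCELLED state indicates a manual scancel.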

I think you should wait a little longer.
In one of my tests with a small protein (300 aa), the HHblits step took 3h30:

I0127 16:51:21.289265 46912823561920 utils.py:36] Started HHblits query
I0127 20:19:23.106022 46912823561920 utils.py:40] Finished HHblits query in 12481.771 seconds

If you use the option --db_preset=full_dbs, the HHblits step can take several hours.
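
For quicker tests there is also --db_preset=reduced_dbs, which replaces the HHblits search of the BFD with a much faster jackhmmer search of a small BFD subset, at some cost in MSA depth. A sketch, assuming the small_bfd file is installed under the same bank directory (standard AlphaFold download layout):

run_alphafold.py \
    --db_preset=reduced_dbs \
    --small_bfd_database_path=/shared/bank/alphafold2/current/small_bfd/bfd-first_non_consensus_sequences.fasta
# plus the usual --fasta_paths/--data_dir/--output_dir/--max_template_date flags;
# the bfd and uniclust30 database paths are not used with reduced_dbs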