Maximum allowed job run time

Hello,
I'm new to W4M. I ran a PCA on a big dataset (300 samples), which worked well. However, my PLS analysis exceeded the walltime.

Could you help?

Thank you
Simon

Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/ethevenot/multivariate/Multivariate/2.3.10
Galaxy Tool Version: 2.3.10
Tool Version: None
Tool Standard Output: stdout
Tool Standard Error: stderr
Tool Exit Code: None
History Content API ID: 6eece17699c62b74
Job API ID: 777ca7cf1c191b30
History API ID: 49f747565f9af515
UUID: 85bac157-afee-42a2-8da7-c0f754be2cb5

@ethevenot, the current setting is 12h.
Is it normal that it takes so long?

I do not know how many features you have but this running time seems definitely too long (it should take < 1 min). Feel free to share your history with me (etienne.thevenot@cea.fr) or send me the 3 .tsv tables if you wish.

Best wishes,

Etienne.


Which is the response feature you are trying to predict by PLS(-DA)?

Hello Etienne,

I'm trying to predict the Family

I ran a few checks on your data; here are some comments.

  1. I reproduce your results: the PCA completes in a few seconds, whereas the PLS-DA predicting the 'family' response has still not converged after many seconds.
  2. Your dataset contains ~100x more features than samples (hence a high risk of overfitting), and two of your 5 classes contain fewer than 7 samples (which is very few; in any case, the cross-validation argument, which is 7 by default, should be lowered).
  3. I therefore focused on the 2 classes with the most samples, randomly selected 1/10 of the features, and kept only two components. This gives a significant model in a few seconds. I could also obtain a significant model with the 3 main classes and 2 (or 3) components in a few minutes.
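
For illustration, the subsetting in point 3 could be sketched as below. This is only a rough sketch in plain Python, not what the Multivariate tool does internally: the function name, its arguments, and the dictionary layout are hypothetical. Capping the number of cross-validation segments at the size of the smallest kept class is an assumption about how the default of 7 might be lowered, not a rule from the tool.

```python
import random
from collections import Counter


def subset_for_quick_plsda(sample_classes, feature_names,
                           n_classes=2, feature_divisor=10, seed=0):
    """Keep the largest classes and a random fraction of the features.

    sample_classes: dict mapping sample name -> class label (hypothetical layout)
    feature_names:  list of feature (variable) names
    """
    counts = Counter(sample_classes.values())
    # Classes with the most samples (2 by default, as in point 3).
    kept_classes = [label for label, _ in counts.most_common(n_classes)]
    kept_samples = [s for s, label in sample_classes.items()
                    if label in kept_classes]

    # Randomly keep 1/feature_divisor of the features (1/10 by default).
    rng = random.Random(seed)
    n_kept = max(1, len(feature_names) // feature_divisor)
    kept_features = rng.sample(list(feature_names), n_kept)

    # Assumption: never use more cross-validation segments than there are
    # samples in the smallest kept class (the tool default is 7).
    smallest_class = min(counts[label] for label in kept_classes)
    cv_segments = min(7, smallest_class)

    return kept_samples, kept_features, cv_segments
```

The returned sample and feature names would then be used to filter the three .tsv tables before rerunning the PLS-DA with two components.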

Thank you very much! I've been struggling with that dataset for a while!

So do you suggest that I should:

  • Increase my detection threshold to keep fewer features during pre-processing?
  • Replace the classes with few samples by a class "other" in my sampleMetadata file?

Thanks again

I assume that some (many) of your features are noise, which should indeed be filtered out during preprocessing or quality control.
I would first analyze your 3 main classes in the classical way. You might also group the remaining samples in a 4th class.
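
If you go for the 4th class, the relabelling could look like the sketch below. It is a minimal illustration in plain Python: the function name and the "other" label are hypothetical, and it assumes you have the class labels as a list in the same order as the rows of your sampleMetadata file.

```python
from collections import Counter


def merge_small_classes(labels, n_main=3, other_label="other"):
    """Keep the n_main largest classes; relabel every other sample."""
    main_classes = {label for label, _ in Counter(labels).most_common(n_main)}
    return [label if label in main_classes else other_label
            for label in labels]
```

The returned list has the same length and order as the input, so it can be written back as a new column next to the original one rather than overwriting it.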