Hello,
I'm new to W4M. I ran a PCA on a large dataset (300 samples) and it worked well. However, my PLS analysis exceeded the walltime.
Could you help?
Thank you
Simon
Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/ethevenot/multivariate/Multivariate/2.3.10
Galaxy Tool Version: 2.3.10
Tool Version: None
Tool Standard Output: stdout
Tool Standard Error: stderr
Tool Exit Code: None
History Content API ID: 6eece17699c62b74
Job API ID: 777ca7cf1c191b30
History API ID: 49f747565f9af515
UUID: 85bac157-afee-42a2-8da7-c0f754be2cb5
@ethevenot, the current walltime setting is 12h.
Is it normal for this analysis to take so long?
I do not know how many features you have, but this running time definitely seems too long (it should take < 1 min). Feel free to share your history with me (etienne.thevenot@cea.fr) or send me the 3 .tsv tables if you wish.
Best wishes,
Etienne.
Which response are you trying to predict by PLS(-DA)?
Hello Etienne,
I'm trying to predict the Family.
I did a few checks on your data and here are some comments.
- I reproduce your results: the PCA completes in a few seconds, whereas the PLS-DA predicting the 'family' response has still not converged after many seconds.
- Your dataset contains ~100x more features than samples (so there is a high risk of overfitting), and two of your 5 classes contain fewer than 7 samples (which is very few; in any case, the cross-validation argument, which is 7 by default, should be lowered).
- I therefore focused on the 2 classes containing most of the samples, randomly selected 1/10 of the features, and kept only two components. This gives a significant model in a few seconds. I could also obtain a significant model with the 3 main classes and 2 (or 3) components in a few minutes.
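For readers who want to try the same kind of reduction on their own tables before launching the PLS-DA, here is a minimal Python sketch of the subsetting described above. It is not the code used in this thread: the function name, its arguments, and the rule of capping the number of cross-validation segments at the size of the smallest retained class are all assumptions made for illustration.

```python
import random
from collections import Counter


def subset_for_quick_plsda(sample_classes, feature_names, n_classes=2,
                           feature_fraction=0.1, default_cv=7, seed=0):
    """Hypothetical helper: shrink a dataset before a trial PLS-DA run.

    sample_classes: dict mapping sample name -> class label
    feature_names:  list of feature (variable) names
    """
    counts = Counter(sample_classes.values())

    # Keep only the n_classes classes with the most samples.
    kept_classes = [label for label, _ in counts.most_common(n_classes)]
    kept_samples = [s for s, label in sample_classes.items()
                    if label in kept_classes]

    # Randomly keep a fraction of the features (1/10 in the thread).
    rng = random.Random(seed)
    n_keep = max(1, round(len(feature_names) * feature_fraction))
    kept_features = sorted(rng.sample(feature_names, n_keep))

    # Assumption: never use more cross-validation segments than the
    # smallest retained class has samples (the default is 7).
    smallest = min(counts[label] for label in kept_classes)
    cv_segments = min(default_cv, smallest)

    return kept_samples, kept_features, cv_segments


# Toy data: 5 classes of unequal size, 100 features.
sizes = {"A": 10, "B": 8, "C": 5, "D": 3, "E": 2}
toy_classes = {f"{label}{i}": label
               for label, n in sizes.items() for i in range(n)}
toy_features = [f"feat{i}" for i in range(100)]

samples, features, cv = subset_for_quick_plsda(toy_classes, toy_features)
print(len(samples), len(features), cv)
```

The reduced sample and feature lists can then be used to filter the three .tsv tables before rerunning the model with two components.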
Thank you very much! I've been struggling with that dataset for a while!
So do you suggest that I should:
- Increase my detection threshold to keep fewer features during pre-processing?
- Replace the classes with few samples by a class "other" in my sampleMetadata file?
Thanks again
I assume that some (many) of your features are noise, which should indeed be filtered out during preprocessing or quality control.
I would first analyze your 3 main classes in the classical way. You might also group the remaining samples in a 4th class.
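Grouping the remaining samples into a 4th class can be done by relabelling the class column of the sampleMetadata file. A minimal Python sketch, assuming the labels have already been read into a list (the function name and the "other" label are illustrative choices, not part of W4M):

```python
from collections import Counter


def regroup_minor_classes(labels, n_main=3, other_label="other"):
    """Keep the n_main most frequent classes; relabel the rest."""
    main = {label for label, _ in Counter(labels).most_common(n_main)}
    return [label if label in main else other_label for label in labels]


# Toy labels: 5 classes, two of them with very few samples.
toy_labels = ["A"] * 10 + ["B"] * 8 + ["C"] * 5 + ["D"] * 3 + ["E"] * 2
regrouped = regroup_minor_classes(toy_labels)
print(Counter(regrouped))
```

The relabelled column keeps the same order as the input, so it can be written back next to the original sample names.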