No information from univariate statistical analysis despite good multivariate results, on metabolomic data

Hello everyone,
I used W4M for processing my metabolomic data.

Background:
-My experimental design is not perfect: only 6 subjects, each sampled 2 times (in 2 different conditions).
-I have around 4 000 features detected.
-Using an in-house library, I can only identify 200 metabolites.

Statistics procedure:
-I decided to run a PCA on all the detected features to see whether my conditions separate. I get a nice separation into 2 groups, one per condition. The components do not explain a huge amount of the variance, but still: component 1: 20% and component 2: 16%.
-I then tried to run a univariate analysis on all the detected features: Wilcoxon on a 2-level qualitative variable (my conditions), with FDR correction.
I do not obtain any statistically significant difference between my conditions.

My plan was to use the statistics to reduce the set of features I am interested in, and to perform identification afterwards.

Question:
Given my PCA, I was expecting some differences in the univariate analysis. I think I am losing the significant features because of the correction over all features and my small number of subjects.
Do you agree with that explanation, or did I miss something?

Alternative:
I may run the identification on all features and use only the identified features to perform statistics.

I thank you in advance for any input on this :slight_smile:
Wish you a nice day,

best regards,
Maëlle

Ping @team.w4m :slight_smile:

Hello Maelle,

Which univariate test did you apply? Did you apply multiple test correction, FDR for example?
Are your data paired, i.e. were the samples collected from the same subjects at different dates?
Best

Marie

Hello Marie,
thank you for answering,

I used a Wilcoxon test on my 2-level qualitative variable (condition), and applied an FDR correction. I guess the correction over all the features (4 000 detected) is one reason why I don't reach the significance level. Of the 4 000 detected features, I can only identify a very small part (100) with my in-house library.

Indeed, my data are paired: same subjects at 2 different time points.

Best regards,

You should use tests for paired data. I think that a Wilcoxon test exists for this type of data.

I did use a Wilcoxon test for the univariate analysis, in Univariate statistics (Galaxy Version 2.2.4). On W4M, I don't know if there is a way to specify that the data are paired for the Wilcoxon test. Or should I export to R to perform a paired Wilcoxon test?

Hi,
The Wilcoxon test provided in the Univariate module is not the paired one. It provides the Wilcoxon rank-sum test, also known as the Mann-Whitney-Wilcoxon test, while the paired test you need is the Wilcoxon signed-rank test.
As an R user, you can use the wilcox.test() function with the "paired=TRUE" argument (please refer to the R function manual for more information about how to use this function appropriately).
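For readers who want to see what the paired test actually computes, here is a minimal pure-Python sketch of the exact two-sided Wilcoxon signed-rank test. It is only an illustration, not what W4M or R runs internally: the function name `signed_rank_exact` and the example intensities are made up, and the sketch assumes no zero differences and no tied absolute differences. For real analyses, R's wilcox.test() remains the reference.

```python
from itertools import product


def signed_rank_exact(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Hypothetical helper for illustration. Assumes no zero differences
    and no ties among the absolute differences.
    Returns (W+, p_value).
    """
    diffs = [a - b for b, a in zip(before, after)]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    abs_sorted = sorted(abs(d) for d in diffs)
    if len(set(abs_sorted)) != len(abs_sorted):
        raise ValueError("tied absolute differences are not handled in this sketch")

    n = len(diffs)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    ranks = {value: i + 1 for i, value in enumerate(abs_sorted)}
    # W+ is the sum of the ranks attached to positive differences.
    w_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    total = n * (n + 1) // 2
    observed = min(w_plus, total - w_plus)

    # Under the null hypothesis every sign pattern is equally likely,
    # so enumerate all 2^n patterns and count those at least as extreme.
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(rank for rank, positive in zip(range(1, n + 1), signs) if positive)
        if min(w, total - w) <= observed:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Made-up intensities of one feature for 6 subjects in the 2 conditions.
condition_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
condition_b = [1.5, 3.0, 4.5, 6.0, 7.5, 9.0]
print(signed_rank_exact(condition_a, condition_b))  # -> (21, 0.03125)
```

In this example all 6 subjects change in the same direction, which is the most extreme outcome possible, and the p-value is still only 2/64.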
Additional note: as you mentioned, your experiment is limited in its number of subjects. Even if a paired test is more appropriate for your design, having so few individuals is still a huge limitation. Combined with the multiple-testing correction you have to apply, it may not be surprising to find no features below the p-value threshold you defined, even with a substantial effect in your dataset as seen in the PCA.
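To put rough numbers on this limitation, the sketch below computes the smallest two-sided p-value an exact Wilcoxon test can return with 6 subjects, and how many features would all have to reach that minimum before a Benjamini-Hochberg FDR correction could declare any of them significant. The assumptions are mine, not from the thread: exact tests without ties, Benjamini-Hochberg as the FDR method, a 0.05 threshold, and hypothetical function names.

```python
from fractions import Fraction
from math import ceil, comb


def min_p_signed_rank(n_pairs):
    """Smallest two-sided p-value of the exact signed-rank test (no ties)."""
    return Fraction(2, 2 ** n_pairs)


def min_p_rank_sum(n1, n2):
    """Smallest two-sided p-value of the exact rank-sum test (no ties)."""
    return Fraction(2, comb(n1 + n2, n1))


def min_hits_for_bh(p_min, n_features, alpha=Fraction(5, 100)):
    """Smallest k such that k features at p_min pass Benjamini-Hochberg.

    BH rejects the k smallest p-values when p_(k) <= alpha * k / m, so
    k must be at least p_min * m / alpha. Returns None if that exceeds
    the number of features tested.
    """
    k = ceil(p_min * n_features / alpha)
    return k if k <= n_features else None


paired = min_p_signed_rank(6)      # 1/32, about 0.031
unpaired = min_p_rank_sum(6, 6)    # 1/462, about 0.0022

print(min_hits_for_bh(paired, 4000))    # -> 2500
print(min_hits_for_bh(unpaired, 4000))  # -> 174
print(min_hits_for_bh(paired, 200))     # -> 125
print(min_hits_for_bh(unpaired, 200))   # -> 9
```

Under these assumptions, with 4 000 features and 6 pairs, a single strongly changing feature cannot survive the correction on its own, whichever Wilcoxon variant is used. Reducing the number of tested features lowers the bar but does not remove it.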
Concerning alternatives for analysing your data: with the little information you gave, one can imagine options such as the one you mentioned, or even using the result of your PCA to select features of interest. Nonetheless, to be sure the alternative you choose is relevant, the best approach is (as always) to discuss it with a specialist (in statistics and/or metabolomics) to whom you could explain the objectives of your experiment in more detail.
Best regards
