Use the OPLS-DA dataset to create an S-plot

tco22 · January 22, 2025, 1:14

Hello W4M team.

I was wondering whether I am using the right variables to create an S-plot after performing OPLS-DA with the multivariate tool in Galaxy. I created the S-plot with in-house code outside the Galaxy environment.

From the variableMetadata output file, I used the "OPLSDA_XLOAD-h1" column as the loadings of the predictive component (x axis), and the "OPLSDA_COEFF" column as the regression coefficients (relationship between the variables and the predictive component; y axis).
Is this choice correct, or do I need to modify something?

Best regards,
Thomas

yguitton · January 23, 2025, 9:04

Hello @tco22

I think that an R script or function exists for creating an S-plot from ropls outputs, but I still have to find it. Maybe @ethevenot, @melpetera or @mtremblayfranco can help?

Best

ethevenot · January 23, 2025, 12:08

Hi,

To generate the S-plot, you need to compute the covariance and the correlation between the scores and the variables.
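For reference, the two quantities can be written out as follows. This is a sketch in my own notation (not taken from the tool's documentation): t is the vector of predictive-component scores over the n samples, x_j is the column of the data matrix for variable j, and s denotes the sample standard deviation. The n - 1 denominator matches what R's cov() uses.

```latex
\[
\operatorname{cov}(t, x_j) = \frac{1}{n-1} \sum_{i=1}^{n} (t_i - \bar{t})\,(x_{ij} - \bar{x}_j),
\qquad
\operatorname{cor}(t, x_j) = \frac{\operatorname{cov}(t, x_j)}{s_t \, s_{x_j}}
\]
```

The covariance (x axis of the S-plot) reflects the magnitude of a variable's contribution, while the correlation (y axis) is scale-free and lies between -1 and 1.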

Below is the code to generate the S-plot in R.

First, download your two data files, 'dataMatrix.tsv' and 'sampleMetadata.tsv', from W4M to your computer.

Then read the two tables into your local R session:

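# W4M stores variables as rows and samples as columns,
# so transpose to get a samples x variables matrix
dataMatrix <- t(as.matrix(read.table("dataMatrix.tsv",
                                     check.names = FALSE,
                                     header = TRUE,
                                     row.names = 1,
                                     sep = "\t",
                                     stringsAsFactors = FALSE)))

# Sample metadata: one row per sample
sampleMetadata <- read.table("sampleMetadata.tsv",
                             check.names = FALSE,
                             header = TRUE,
                             row.names = 1,
                             sep = "\t",
                             stringsAsFactors = FALSE)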

Get the scores of the predictive component. They are in the sampleMetadata column named "<factor of interest>_<model>_XSCOR-p1" (or "-h1" in the case of OPLS); here I use "gender_PLSDA_XSCOR-p1" as an example:

scoreVn <- sampleMetadata[, "gender_PLSDA_XSCOR-p1"]

Compute the covariance between the scores and the variables:

covVn <- cov(scoreVn, dataMatrix)

Compute the correlation between the scores and the variables:

corVn <- cor(scoreVn, dataMatrix)

Generate the S-plot:

dev.new()
plot(covVn, corVn, main = "S-plot",
     xlab = "cov(t, X)",
     ylab = "cor(t, X)")

You can then identify the variables of interest (e.g. those with both high correlation and high covariance) interactively by clicking on the corresponding points in the plot:

selVi <- identify(covVn, corVn, labels = colnames(dataMatrix))
colnames(dataMatrix)[selVi]
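If you prefer a non-interactive selection (for instance when running the script in batch mode), a possible alternative is to filter on both axes with cutoffs. The sketch below is only an illustration: it runs on simulated data standing in for your own 'dataMatrix' and 'scoreVn', and the cutoffs (|cor| >= 0.8 and the top 10 % of |cov|) as well as the names 'corCutoff', 'covCutoff', 'selVl' and 'selVc' are arbitrary choices of mine, to be adapted to your study.

```r
## Simulated stand-in data (hypothetical): 20 samples x 50 variables
set.seed(1)
dataMatrix <- matrix(rnorm(20 * 50), nrow = 20,
                     dimnames = list(paste0("s", 1:20), paste0("v", 1:50)))
scoreVn <- rnorm(20)  # stand-in for the predictive-component scores

## Make variable "v1" strongly related to the scores
dataMatrix[, "v1"] <- 3 * scoreVn + rnorm(20, sd = 0.1)

## cov() and cor() return 1 x p matrices; drop() turns them
## into vectors named after the variables
covVn <- drop(cov(scoreVn, dataMatrix))
corVn <- drop(cor(scoreVn, dataMatrix))

## Illustrative cutoffs (assumptions, not recommended defaults)
corCutoff <- 0.8
covCutoff <- quantile(abs(covVn), 0.9)

## Keep variables that are extreme on both axes
selVl <- abs(corVn) >= corCutoff & abs(covVn) >= covCutoff
selVc <- names(corVn)[selVl]

## Highlight and label the selected variables on the S-plot
plot(covVn, corVn, main = "S-plot",
     xlab = "cov(t, X)", ylab = "cor(t, X)",
     col = ifelse(selVl, "red", "grey40"), pch = 16)
if (any(selVl))
    text(covVn[selVl], corVn[selVl], labels = selVc, pos = 2)

print(selVc)
```

With your own data, skip the simulation part and start from the 'covVn' and 'corVn' lines, keeping the drop() calls so that the variable names are carried along.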

Best wishes

Etienne.

tco22 · January 23, 2025, 4:39

Hi @ethevenot,

Many thanks for the script and the quick reply. It works perfectly.

Best regards,
Thomas