Bernd Jagla · 2022-06-10, 03:26 PM

Sorry for writing in English; feel free to answer in French if that is more comfortable ;)

Hi,

hope this finds you well.

I am interested in discussing transformations for cytometry data analysis.

Specifically, the recommendation I have found is that, mainly for reproducibility reasons, one should use the arcsinh transformation and adjust the cofactor so that there is only one population around zero.

Question: why not shift all values to be >= 1 and then apply either a log or an arcsinh transformation, this time adjusting the cofactor to optimize the separation of the different clusters?
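To make the two options concrete, here is a minimal NumPy sketch. The function names, the default cofactor of 5, and the choice of shifting by the per-sample minimum are assumptions made for illustration only; nothing in this thread fixes them.

```python
import numpy as np

def arcsinh_transform(x, cofactor=5.0):
    """Usual approach: arcsinh(x / cofactor), defined for negative values too."""
    x = np.asarray(x, dtype=float)
    return np.arcsinh(x / cofactor)

def shift_to_min_one(x):
    """Shift intensities so that the smallest value becomes exactly 1.

    Assumption: the shift is taken from the sample's own minimum, so it is
    sample-specific and sensitive to a single extreme negative event.
    """
    x = np.asarray(x, dtype=float)
    return x - x.min() + 1.0

def shift_then_log(x):
    """Proposed alternative 1: shift to >= 1, then log10."""
    return np.log10(shift_to_min_one(x))

def shift_then_arcsinh(x, cofactor=5.0):
    """Proposed alternative 2: shift to >= 1, then arcsinh with a free cofactor."""
    return np.arcsinh(shift_to_min_one(x) / cofactor)
```

Both alternatives stay monotonic, so the rank order of events is preserved. What changes is how much of the axis each population occupies, and the shift itself differs from sample to sample, which is the MFI concern raised in the next paragraph.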

I believe that one could be worried about shifting the means/MFIs of the individual populations, and this might be important for comparing between samples. But in my experience, comparing between samples is not possible without a proper alignment of the distributions.
Some biologists say that a shift in distributions/populations/MFIs is biologically significant, and I would agree. But counting/identifying cell types and measuring shifts in MFIs are different questions and should be handled separately (IMHO).

I have already spoken to a few people about this but without a clear and convincing answer.

Would anyone be willing to discuss this?
If this interests many people, we might organize an afternoon to discuss it.

Looking forward to your answers.

Best,

Bernd

Bernd Jagla · 2022-06-15, 01:47 PM

Would it be interesting to develop this a bit further and write either a manuscript or blog post about this?

Or just put it on bioRxiv?

Please reply if you are interested in participating.

Best,
Bernd

***SGranjeaud*** · 2022-06-15, 03:15 PM

Here is Antonio Cosma's answer to Bernd's initial message:

Dear Bernd,

I think this is a really interesting question.

I fully agree about the arcsinh transformation with cofactor adjustment.

Some time ago, I created a dashboard to show how arcsinh transformation may affect clustering and z-score: https://public.tableau.com/app/profile/a...clustering

I always wondered whether moving a cytometry dataset far from 0 could improve the analysis in some way. We should also separate flow from mass cytometry data since their relation with the space close to zero is quite different.

Happy to discuss the subject
Best
Antonio

***SGranjeaud*** · 2022-06-15, 03:39 PM

Dear all,

I am always happy to discuss and share points of view.

I am not sure that it is easy to split markers into two categories: one category of markers with clearly 2 or 3 peaks that are used to identify populations, and one category of markers that are used for evaluating shifts in MFI. Manual gating strategies also use markers without clear peaks. Maybe there are better alternatives, but I have seen such usages.

IIUC, Bernd, you aim at optimizing the separation between 2 (or 3) peaks in a set of markers in order to get a better clustering. It sounds interesting.

Moving intensities away from the evil zero sounds interesting; I have never tried it. It might be useful to remember that the intensities we analyze are the result of the compensation process, so they could be shifted back to their original (uncompensated) range. There should be 1 or 2 articles in which the authors tried to cluster uncompensated intensities, but I can't remember which ones. Antonio might know about these articles.

The optimization might also depend on the features that are important to the clustering algorithm. Is the separation the most important feature? Is the width of the distribution critical or not? Should it be the same across all clusters?
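One way to make these questions testable is to pick an explicit score and scan the cofactor. The sketch below is only an illustration under stated assumptions: it supposes the negative and positive events of a marker are already known (for example from a manual gate), and it uses a hypothetical score, the gap between medians divided by the sum of robust widths, which is one possible definition of "separation" among many.

```python
import numpy as np

def robust_sd(v):
    """Robust width: scaled median absolute deviation (MAD)."""
    med = np.median(v)
    return 1.4826 * np.median(np.abs(v - med))

def separation_score(neg, pos, cofactor):
    """Gap between the two medians relative to the summed robust widths,
    both measured after arcsinh(x / cofactor)."""
    t_neg = np.arcsinh(np.asarray(neg, dtype=float) / cofactor)
    t_pos = np.arcsinh(np.asarray(pos, dtype=float) / cofactor)
    gap = np.median(t_pos) - np.median(t_neg)
    width = robust_sd(t_neg) + robust_sd(t_pos)
    return gap / width

def scan_cofactors(neg, pos, cofactors):
    """Return the cofactor with the highest score, plus all scores."""
    cofactors = np.asarray(cofactors, dtype=float)
    scores = np.array([separation_score(neg, pos, c) for c in cofactors])
    return cofactors[int(np.argmax(scores))], scores
```

Because the widths enter the score explicitly, a variant that penalizes unequal widths between the two peaks would be a small change, which is one way to explore whether width matters for a given clustering algorithm.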

Best,
Samuel

Antonio · 2022-06-15, 11:48 PM

Algorithms such as t-SNE, FlowSOM, etc. work better on non-compensated data: populations are better resolved. The problem is that you need compensated data to give a biological meaning to a population. One possible workflow is to compensate the dataset, run the algorithm on the uncompensated axes, and use the compensated axes to characterize biologically the populations found in the non-compensated data. If you use a deterministic algorithm, you can in theory skip compensation once a biologically relevant population has been assigned to a region of the multidimensional space.
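The workflow described here can be sketched as follows. This is a toy illustration, not an established pipeline: a small seeded k-means stands in for t-SNE or FlowSOM, the spillover matrix is assumed to follow the convention observed = true @ spillover, and all function names and the cofactor of 150 are hypothetical.

```python
import numpy as np

def compensate(raw, spillover):
    """Undo spillover, assuming raw = true @ spillover (events in rows)."""
    return np.linalg.solve(spillover.T, raw.T).T

def kmeans(x, k, n_iter=50, seed=0):
    """Tiny k-means with a fixed seed; a stand-in for FlowSOM and similar."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center as is
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def cluster_uncompensated(raw, k, cofactor=150.0):
    """Step 1: cluster on arcsinh-transformed, uncompensated intensities."""
    return kmeans(np.arcsinh(raw / cofactor), k)

def describe_clusters(compensated, labels, k):
    """Step 2: per-cluster medians on the compensated axes, for annotation."""
    out = np.full((k, compensated.shape[1]), np.nan)
    for j in range(k):
        if np.any(labels == j):
            out[j] = np.median(compensated[labels == j], axis=0)
    return out
```

The labels come only from the uncompensated data, and the compensated values are used only to read off which markers each cluster expresses. The fixed seed plays the role of the deterministic algorithm mentioned above.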

There is a manuscript in which they use spectral flow cytometry data and apply t-SNE (?) on the raw data (64 channels, without unmixing). I should search for it.
