Guillermo Prol-Castelo
@gprolcastelo.bsky.social
Bioinformatics predoctoral researcher at @bsc-cns.bsky.social and @upf.edu #AI #bio https://github.com/gprolcastelo
Reposted by Guillermo Prol-Castelo
evenflowproject.bsky.social
🎉 Congratulations to EVENFLOW partner @gprolcastelo.bsky.social on the publication of his second PhD pre-print! 📄

A contribution to understanding VAEs in cancer progression research, supported by the @evenflowproject.bsky.social project.

Learn more 👇

#CancerResearch #DeepLearning
gprolcastelo.bsky.social
New research alert 📝❗❗❗

In our latest pre-print, 2nd of my PhD, we performed a Systematic Literature Review on the use of Deep Representational Learning (DRL), especially the Variational Autoencoder (VAE), in cancer progression research.

This thread explains our main findings.

(1 minute read)
gprolcastelo.bsky.social
I would also like to acknowledge the @evenflowproject.bsky.social project for its support.
gprolcastelo.bsky.social
I would like to thank Davide Cirillo and @alfonsovalencia.bsky.social for their supervision.
gprolcastelo.bsky.social
Studying cancer progression with DRL is challenging due to limited longitudinal data and the absence of methods for real-time tracking. Still, solving these problems could lead to major breakthroughs in personalized treatment.
gprolcastelo.bsky.social
Single-cell omics data are commonly used to infer pseudo-time trajectories of cancer cells. However, these trajectories carry no actual temporal information, neither real time nor disease stage.
gprolcastelo.bsky.social
It is also not clear how to study cancer in a temporal manner, given cancer progression differs from patient to patient. From the literature, we see that stages may be used as a proxy time-unit to study cancer’s time dimension.
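To make the idea of stages as a proxy time unit concrete, here is a minimal sketch. It assumes a four-level staging scheme (I to IV) and that each sample already has a latent vector; the name `stage_centroids` and the staging labels are hypothetical and not taken from the pre-print. It simply orders per-stage latent averages along the stage axis.

```python
# Assumed staging scheme, used as an ordinal "time" axis (illustrative only).
STAGE_ORDER = {"I": 1, "II": 2, "III": 3, "IV": 4}

def stage_centroids(samples):
    """Average latent vectors per stage and return them ordered by stage.

    samples: list of (stage_label, latent_vector) pairs.
    Returns a list of (stage_label, centroid) pairs, earliest stage first.
    """
    grouped = {}
    for stage, vec in samples:
        grouped.setdefault(stage, []).append(vec)
    ordered = sorted(grouped, key=STAGE_ORDER.__getitem__)
    return [
        (stage, [sum(col) / len(col) for col in zip(*grouped[stage])])
        for stage in ordered
    ]
```

The ordered centroids give one coarse trajectory through latent space; they say nothing about how long a patient spends in each stage.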
gprolcastelo.bsky.social
Given the difficulties of performing a longitudinal study in human patients, there is a lack of follow-up data, especially in cancer.
gprolcastelo.bsky.social
We found that the most common uses of VAEs to study cancer include diagnosis, prognosis, and subtyping.
Word cloud of exclusion reasons. Most publications use deep and representation learning applied to cancer for subtyping, diagnosis, and studying prognosis and survival.
gprolcastelo.bsky.social
Cancer is a highly complex and dynamic disease, making it well-suited for analysis using DRL methods and VAEs.

We wanted to elucidate the most common uses of DRL and the VAE in the study of cancer, paying special attention to the temporal component of cancer, which remains understudied.
gprolcastelo.bsky.social
DRL methods are used to learn a low-dimensional embedding from data. Specifically, the VAE learns this representation, called the latent space, while preserving non-linear relationships in the original data. It is also a generative method: it can create new, synthetic data that resemble the original data.
Summary figure of the most common methodologies involving representation learning and cancer studies.
The left side of the figure shows the most common types of data used to train VAEs in particular, and DRL methods more generally, in cancer studies. After being passed to the encoder, data is embedded into a lower-dimensional representation, the latent space. The latent space has been commonly used for subtyping, survival analyses (commonly as part of prognostic analyses), and, when it comes to single-cell data, also for pseudo-time trajectory inference. Possible further applications, shown on the right-hand side of the figure, include leveraging the decoder’s generative capabilities to reconstruct noisy data and to align and infer out-of-sample data, which may be used to study cancer progression in time. However, such applications of the decoder remain underexplored in biomedical research.
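For readers who want to see what sits between the encoder and the decoder, here is a minimal sketch of the two pieces a standard VAE adds on top of a plain autoencoder: the reparameterized latent sample and the KL term of the loss. It assumes a diagonal Gaussian posterior, a standard normal prior, and a squared-error reconstruction term; the function names and the `beta` weight are illustrative choices, not details from the pre-print.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(sigma^2)) to N(0, I), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def squared_error(x, x_hat):
    """Reconstruction term: sum of squared differences between input and output."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction error plus beta-weighted KL term (beta is an assumed knob)."""
    return squared_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)
```

In a real model `mu` and `log_var` come from the encoder network and `x_hat` from the decoder; here they are plain lists so the arithmetic is easy to follow.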
Reposted by Guillermo Prol-Castelo
biorxivpreprint.bsky.social
10 Years of Variational Autoencoder: Insights from Cancer Temporal Progression Studies, a Systematic Literature Review https://www.biorxiv.org/content/10.1101/2025.05.29.656750v1
Reposted by Guillermo Prol-Castelo
aixbiobot.bsky.social
10 Years of Variational Autoencoder: Insights from Cancer Temporal Progression Studies, a Systematic Literature Review [new]
VAE cancer omics analysis reveals temporal modeling gap (limited data). Proposes VAEs for cancer staging.
Reposted by Guillermo Prol-Castelo
alfonsovalencia.bsky.social
Happy to be part of the amazing new world of synthetic data by the hand of @gprolcastelo.bsky.social in Davide Cirillo’s group!!
gprolcastelo.bsky.social
New Insights into Medulloblastoma! 🧠

I am very happy to announce the first paper in my Ph.D. thesis journey:

Exploring the Boundaries of Medulloblastoma Subgroups with Synthetic Data Generation -> www.biorxiv.org/content/10.1...

Let’s dive into our findings with this thread! 🧵⤵️
Exploring the Boundaries of Medulloblastoma Subgroups with Synthetic Data Generation
Medulloblastoma is a childhood brain tumor traditionally classified into four molecular subgroups. Recent evidence suggests that Groups 3 and 4 represent a biological continuum rather than distinct en...
www.biorxiv.org
gprolcastelo.bsky.social
Shout out to all the authors: Alejandro Tejada-Lapuerta, Beatriz Urda-García, Iker Núñez-Carpintero, @alfonsovalencia.bsky.social, and Davide Cirillo. Thanks to the #Evenflow project, and my institution @bsc-cns.bsky.social
gprolcastelo.bsky.social
5. How is this helpful for MB research?

We believe our contributions will help develop better treatments for MB: a patient’s subgroup label determines the treatment strategy, so assigning the most appropriate label is essential for an optimal recovery.
gprolcastelo.bsky.social
4.2. We found about 2,500 genes whose expression is unique to the G3-G4 subgroup, some of which are commonly mutated in MB: KMT2C, MYC, SNCAIP, SYNCRIP, and TP53.
gprolcastelo.bsky.social
4. What do we find?

4.1. By identifying and augmenting the patients in the G3-G4 subgroup, we achieved high classification performance, reinforcing that this intermediate group displays distinct features in comparison to G3 and G4.
Boxplot of classification results for G3, G4, and G3-G4, augmented with synthetic data.
gprolcastelo.bsky.social
3. How can we study a rare occurrence of a rare disease?

We have obtained the data from the largest repository available on MB [7] and used the VAE's [8] generative ability to amplify the G3-G4 subgroup. This means we can learn from real patient data to generate new, synthetic patients.
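As a rough illustration of how a generative model can augment a small subgroup, the sketch below samples new latent points around the latent codes of real patients and decodes them into synthetic profiles. This is a simplified stand-in, not the pipeline from the paper: it fits an independent Gaussian per latent dimension, and `sample_synthetic` and `toy_decode` are hypothetical names, with `toy_decode` replacing a trained VAE decoder.

```python
import random
import statistics

def sample_synthetic(latents, decode, n_new, rng):
    """Generate n_new synthetic profiles from the latent codes of real patients.

    latents: list of latent vectors (one per real patient in the subgroup).
    decode:  function mapping a latent vector to a data-space profile.
    """
    dims = list(zip(*latents))
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    synthetic = []
    for _ in range(n_new):
        # Sample each latent dimension independently (a simplifying assumption).
        z = [rng.gauss(m, s) for m, s in zip(means, stds)]
        synthetic.append(decode(z))
    return synthetic

def toy_decode(z):
    """Hypothetical linear decoder from 2 latent dimensions to 3 features."""
    return [z[0] + z[1], z[0] - z[1], 2.0 * z[0]]
```

A call such as `sample_synthetic(latents, toy_decode, 100, random.Random(0))` would return 100 synthetic three-feature profiles; with a trained decoder the outputs would live in gene-expression space instead.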
gprolcastelo.bsky.social
2.2. Research has suggested that G3 and G4 may form a continuum [4], and also that an additional subgroup exists between G3 and G4 [5, 6], sometimes referred to as G3-G4. However, the limited number of patients in this intermediate subgroup has made gaining relevant insights a challenge.
gprolcastelo.bsky.social
2. Why study Medulloblastoma Subgroups?

2.1. G3 and G4 subgroups tend to be closely clustered. This tight relationship is reflected in the latest consensus classification of MB, dividing the disease into WNT, SHH, and non-WNT/non-SHH subgroups [3].
gprolcastelo.bsky.social
1. What is Medulloblastoma?

Medulloblastoma (MB) is a childhood brain tumor that is classically divided into four molecular subgroups [1]: Wingless (WNT), Sonic Hedgehog (SHH), Group 3 (G3), and Group 4 (G4). It is a rare disease, with 5 cases per million in the pediatric population [2].