Abdul Muntakim Rafi
muntakimrafi.bsky.social
PhD candidate @SBME_UBC | Machine Learning | Gene regulation
Reposted by Abdul Muntakim Rafi
New (and hotly anticipated - at least by me) preprint from my group describing a better way to partition training data for genomic-trained models to solve the long-neglected problem of homology-based data leakage. Thread from first author @muntakimrafi.bsky.social 👇
January 27, 2025 at 11:48 PM
0/ Essential reading for anyone training or using sequence-function models trained on genomic sequences! 🚨 In our new preprint, we explore the ways homology within genomes can cause leakage when training sequence-based models and ways to prevent it
January 27, 2025 at 11:04 PM
Had a lot of fun at the CSHL Biological Data Science conference.

Thanks to the James P. Taylor Foundation for Open Science for the scholarship that made it possible.

#cshl
November 17, 2024 at 11:02 PM
I am attending the Biological Data Science Meeting at CSHL. I will be giving a talk this Friday morning on the results from the Random Promoter DREAM Challenge, and presenting a poster on recent work in which we address homology-based leakage in genome-trained models.
November 14, 2024 at 7:13 AM
Thrilled to share our research at the recent @KipoiZoo seminar! 🧬 We showed how splitting the genome by chromosome can cause train-test leakage through sequence homology, and proposed a scalable solution to tackle it. Preprint coming soon!

youtu.be/0_08qB0wLoM?...
Kipoi Seminar - Abdul Muntakim Rafi (University of British Columbia)
November 14, 2024 at 6:31 AM
1/ If you're training ML models on DNA sequences, you need to take a look at our new paper in @NatureBiotech! It contains analysis done by over 300 researchers, tells the story of how we built state-of-the-art models for short regulatory DNA and developed a framework to keep improving
November 14, 2024 at 6:25 AM