Sameed Siddiqui
@sameedms.bsky.social
Californian lost in the Northeast ☀️. PhD @ MIT Computational and Systems Biology | MBA Fellow at MIT Sloan. @SabetiLab member
sameedms.bsky.social
Finally, thanks to the team! A huge shoutout to my friend and mentee @krithik-bs.bsky.social. Also infinitely grateful to #AlbertGu for his advice, and to #MichaelMitzenmacher and @pardissabeti.bsky.social for their mentorship and leadership. So much laughter while making this paper, can't wait for more.
sameedms.bsky.social
This work shows that principled mathematical insights, like approximation of epistatic interactions, can provide an accessible and performant alternative to large foundation models—suggesting broader applicability beyond biological sequences.
sameedms.bsky.social
We are excited about Lyra's potential to accelerate discoveries in molecular biology, therapeutic development, and protein engineering.
sameedms.bsky.social
Lyra makes cutting-edge biological modeling accessible to labs without extensive compute resources. Instead of relying on massive GPU clusters, Lyra empowers researchers to train state-of-the-art models directly on their own laptops.
sameedms.bsky.social
Lyra’s subquadratic O(N log N) complexity dramatically reduces memory use (125x–2600x less than Evo and ESM-1b) and accelerates inference: up to 239x faster than ESM-1b, while processing sequences up to 1M in length.
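For readers wondering where an O(N log N) cost can come from: a long causal convolution can be computed with FFTs instead of a direct O(N^2) sum. The sketch below is a generic illustration in plain Python, not Lyra's implementation; the function names, the toy input, and the decaying kernel are all assumptions made for the example.

```python
import cmath

def fft(x, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_causal_conv(u, kernel):
    """Causal convolution y[t] = sum_k kernel[k] * u[t-k], via zero-padded FFTs."""
    n = len(u)
    size = 1
    while size < 2 * n:  # pad so circular convolution equals linear convolution
        size *= 2
    k = list(kernel[:n])
    fu = fft([complex(v) for v in u] + [0j] * (size - n))
    fk = fft([complex(v) for v in k] + [0j] * (size - len(k)))
    spectrum = [p * q for p, q in zip(fu, fk)]
    y = fft(spectrum, invert=True)  # unnormalized inverse transform
    return [(v / size).real for v in y[:n]]

def direct_causal_conv(u, kernel):
    """Reference O(N^2) implementation of the same causal convolution."""
    return [sum(kernel[k] * u[t - k] for k in range(min(t + 1, len(kernel))))
            for t in range(len(u))]

# Toy input and a hypothetical exponentially decaying kernel
u = [1.0, 0.0, 2.0, 1.0, -1.0, 0.5, 0.0, 3.0]
kernel = [0.5 ** k for k in range(len(u))]
fast = fft_causal_conv(u, kernel)
slow = direct_causal_conv(u, kernel)
```

Both functions return the same sequence up to floating-point error; only the FFT version scales as O(N log N) in the sequence length.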
sameedms.bsky.social
RNA-dependent RNA polymerases (RdRPs) are essential markers for RNA virus detection. Lyra achieves a near-perfect 0.998 true positive rate, matching LucaProt-ESM with over 60,000x fewer parameters, accelerating pathogen discovery without needing large-scale GPU infrastructure.
sameedms.bsky.social
Lyra achieves SOTA results in 6 out of 7 intrinsically disordered protein region tasks, with an average AUC of 0.89, outperforming a ProtT5-based model (avg AUC 0.86). Lyra accomplishes this using only 55K parameters, compared to ProtT5’s 3 billion parameters—a >50,000-fold reduction in model size.
sameedms.bsky.social
Lyra’s consistently strong performance across different tasks using orders of magnitude fewer parameters allows researchers to spend less time optimizing models and more time generating biological insights.
sameedms.bsky.social
Lyra sets records in 5/9 RNA BEACON benchmarking tasks tested, including nearly solving the splice-site prediction dataset (98.89% accuracy vs previous best 50.55%) and almost doubling performance on structural score imputation (0.73 vs 0.42).
sameedms.bsky.social
We tested Lyra on 101 diverse biological tasks spanning:

1. Proteomics

2. Genomics

3. CRISPR guide efficacy

Lyra set new performance records in 79 out of 101 tasks, w/ substantially smaller models than competing architectures.
sameedms.bsky.social
We designed Lyra with two simple components: Projected Gated Convolutions (PGC), which enhance local feature extraction, and diagonalized State Space Models (S4D), which capture global epistatic interactions. In doing so, Lyra efficiently captures both global and local epistatic relationships.
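A rough sketch of how a local gated convolution and a diagonal state space layer might be chained. This is not the paper's PGC or S4D code: the gating form, the function names, and every weight below are assumptions chosen only to make the two-stage idea concrete.

```python
def causal_conv(x, w):
    """Short causal convolution: out[t] = sum_k w[k] * x[t-k]."""
    return [sum(w[k] * x[t - k] for k in range(len(w)) if t - k >= 0)
            for t in range(len(x))]

def gated_local_conv(u, w_value, w_gate):
    """Elementwise product of two short convolutions of the same input.

    The product introduces second-order terms u[i] * u[j] between nearby
    positions (a generic gating pattern, assumed here for illustration).
    """
    value = causal_conv(u, w_value)
    gate = causal_conv(u, w_gate)
    return [v * g for v, g in zip(value, gate)]

def diagonal_ssm(u, a, b, c):
    """Diagonal SSM: h_n[t] = a_n * h_n[t-1] + b_n * u[t]; y[t] = sum_n c_n * h_n[t]."""
    h = [0.0] * len(a)
    y = []
    for u_t in u:
        h = [a_n * h_n + b_n * u_t for a_n, h_n, b_n in zip(a, h, b)]
        y.append(sum(c_n * h_n for c_n, h_n in zip(c, h)))
    return y

# Hypothetical toy input and weights
u = [1.0, 0.0, 2.0, 1.0]
local = gated_local_conv(u, w_value=[1.0, 0.5], w_gate=[0.5, 0.5])
# The SSM state carries information from every earlier position
y = diagonal_ssm(local, a=[0.5, 0.25], b=[1.0, 1.0], c=[1.0, 2.0])
```

Here `local` only mixes neighbouring positions, while each entry of `y` depends on every earlier entry of `local` through the decaying hidden state.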
sameedms.bsky.social
We drew a mathematical connection between State Space Models (SSMs) and polynomial approximation, showing how their hidden states can naturally approximate the polynomial terms that govern epistatic relationships. This makes SSMs ideal for modeling biological functions as multilinear polynomials.
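For reference, the standard unrolling of a discrete linear SSM is written out below. This is textbook material rather than the paper's derivation, and the symbols (A, B, C, \lambda_n, b_n, c_n, K_k) are notation assumed for this note.

```latex
% Discrete linear SSM with zero initial state h_{-1} = 0
h_t = A\,h_{t-1} + B\,u_t, \qquad y_t = C\,h_t
% Unrolled: the output is a causal convolution of the input with a kernel K
y_t = \sum_{k=0}^{t} K_k\,u_{t-k}, \qquad K_k = C A^{k} B
% Diagonal case A = \mathrm{diag}(\lambda_1, \dots, \lambda_d)
K_k = \sum_{n=1}^{d} c_n\,\lambda_n^{k}\,b_n
```

Each hidden state is a weighted sum of past inputs. How the paper connects these states to the polynomial terms of an epistatic function is in the preprint and is not reproduced here.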
sameedms.bsky.social
This perspective provides a principled mathematical framework for modeling sequence-function relationships.
sameedms.bsky.social
To unify biological sequence modeling across DNA, RNA, and proteins into a single computational framework, we revisited epistasis—the phenomenon where mutations influence each other—which can be characterized by multilinear polynomials.
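A toy illustration of what "epistasis as a multilinear polynomial" means: fitness is a sum over subsets of sites, with each term multiplying the 0/1 mutation indicators of the sites in that subset. The three-site setup and all coefficients below are made up for the example and are not taken from the paper.

```python
from math import prod

def multilinear_fitness(x, coeffs):
    """Evaluate f(x) = sum_S coeffs[S] * prod(x[i] for i in S).

    x is a tuple of 0/1 mutation indicators, one per site;
    coeffs maps a frozenset of site indices to its weight.
    """
    return sum(c * prod(x[i] for i in sites) for sites, c in coeffs.items())

# Hypothetical coefficients for three sites
coeffs = {
    frozenset(): 1.0,         # wild-type baseline
    frozenset({0}): 0.5,      # additive effect of mutating site 0
    frozenset({1}): -0.25,    # additive effect of mutating site 1
    frozenset({0, 1}): -1.0,  # pairwise epistasis between sites 0 and 1
}

def pairwise_epistasis(coeffs, i, j, n_sites):
    """Deviation from additivity for sites i and j on the wild-type background."""
    def f(mutated):
        x = tuple(1 if s in mutated else 0 for s in range(n_sites))
        return multilinear_fitness(x, coeffs)
    return f({i, j}) - f({i}) - f({j}) + f(set())

double_mutant = multilinear_fitness((1, 1, 0), coeffs)
interaction_01 = pairwise_epistasis(coeffs, 0, 1, n_sites=3)
interaction_02 = pairwise_epistasis(coeffs, 0, 2, n_sites=3)
```

With these numbers the double mutant scores 0.25 rather than the additive expectation of 1.25, and the difference is exactly the coefficient of the `{0, 1}` term. Sites 0 and 2 share no term, so their interaction is zero.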
sameedms.bsky.social
Breaking down how biological sequences encode molecular functions remains a central challenge in computational biology. For example, given a GFP sequence, can we predict its fluorescence brightness?
sameedms.bsky.social
🧬 Meet Lyra, a new paradigm for accessible, powerful modeling of biological sequences. Lyra is a lightweight SSM achieving SOTA performance across DNA, RNA, and protein tasks—yet up to 120,000x smaller than foundation models (ESM, Evo). Bonus: you can train it on your Mac.
arxiv.org/abs/2503.16351