Artem Moskalev
@artemmoskalev.bsky.social
1.2K followers 520 following 40 posts
Re-imagining drug discovery with AI 🧬. Deep Learning ⚭ Geometry. Previously PhD at the University of Amsterdam. https://amoskalev.github.io/
Pinned
artemmoskalev.bsky.social
ICML Spotlight 🚨 Equivariance is too slow and expensive, especially when you need global context. It makes us wonder whether it's even worth it. We present Geometric Hyena Networks — a simple equivariant model orders of magnitude more memory- and compute-efficient for high-dimensional data.

1/8
artemmoskalev.bsky.social
Job alert 🚨 Our team is looking for an ML Research Scientist to join Johnson & Johnson research. We work on geometric deep learning and LLMs in drug discovery. 🧬🤓 Drop me a message if you're interested, or share this if you know someone who's a great fit!

Multiple locations available.
artemmoskalev.bsky.social
Our code for Hierarchical RNA Language Models is out! Multiple training regimes, architectures, and evaluations. Check it out!

Code: github.com/johnsonandjo...
artemmoskalev.bsky.social
Interested in efficient equivariance for long context? Visit our Geometric Hyena poster at ICML! ⭐️Spotlight⭐️

When: 11 a.m.–1:30 p.m., Wed July 16
Where: East Exhibition Hall A-B #E-3103
artemmoskalev.bsky.social
In SF June 23–25 🇺🇸 and Boston June 29–30 🇺🇸

Let me know if you're around! Happy to meet and chat about life and AI for bio!
artemmoskalev.bsky.social
Joint work with brilliant Mangal Prakash, Junjie Xu, Tianyu Cui, Rui Liao, and Tommaso Mansi.

8/8
artemmoskalev.bsky.social
Notably, while the equivariant transformer runs out of memory on sequences over 37k tokens, our model handles up to 2.7M tokens on a single A10G GPU, providing up to 72× longer context within the same computational budget.

7/8
artemmoskalev.bsky.social
We test the proposed geometric long convolution on multiple large-molecule property and dynamics prediction tasks for RNA and protein biomolecules. Geometric Hyena is on par with or better than equivariant self-attention at a fraction of its computational cost.

6/8
artemmoskalev.bsky.social
To evaluate the long geometric context capabilities of our models, we introduce a geometric extension of the mechanistic interpretability suite. Specifically, we evaluate equivariant models on equivariant associative recall tasks of increasing complexity.

5/8
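(A toy illustration, not the paper's actual suite: in a vector-valued associative recall instance, the model stores key-value pairs whose values are 3D vectors, is queried with one key, and must return the paired vector; rotating every stored vector should rotate the recalled answer the same way. All names below are my own.)

```python
# Toy sketch of a vector-valued (geometric) associative recall instance.
# Assumption: keys are discrete tokens, values are 3D vectors; the model
# must recall the vector paired with the queried key.
import torch

def make_recall_instance(num_pairs=8, vocab=32, seed=0):
    g = torch.Generator().manual_seed(seed)
    keys = torch.randperm(vocab, generator=g)[:num_pairs]  # distinct key tokens
    values = torch.randn(num_pairs, 3, generator=g)        # 3D vector values
    q = int(torch.randint(num_pairs, (1,), generator=g))   # index of queried key
    return keys, values, q

keys, values, q = make_recall_instance()
target = values[q]

# Equivariance check: applying a global rotation R to all stored vectors
# must rotate the correct answer by the same R.
R, _ = torch.linalg.qr(torch.randn(3, 3))  # random orthogonal matrix
assert torch.allclose((values @ R.T)[q], target @ R.T)
```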
artemmoskalev.bsky.social
Inspired by the recent success of long-context models, long convolutions, and the Hyena hierarchy, we propose their geometric counterpart. We rely on the FFT to push the computational complexity to O(N log N), adapting it to operate on vectors. The implementation is simple – just 50 lines of code!

4/8
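(For intuition, a minimal sketch of the FFT trick, assuming a circular long convolution applied component-wise to a sequence of 3D vectors; this is my own illustration, not the paper's code.)

```python
# Minimal sketch: FFT-based long convolution over a sequence of 3D vectors.
# x is (N, 3), h is a learned length-N filter; convolving each spatial
# component with the same scalar filter costs O(N log N) via the FFT,
# instead of O(N^2) for explicit all-to-all mixing.
import torch

def fft_long_conv(x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    N = x.shape[0]
    Xf = torch.fft.rfft(x, n=N, dim=0)           # (N//2+1, 3)
    Hf = torch.fft.rfft(h, n=N).unsqueeze(-1)    # (N//2+1, 1), broadcast over xyz
    return torch.fft.irfft(Xf * Hf, n=N, dim=0)  # circular convolution, (N, 3)

x, h = torch.randn(4096, 3), torch.randn(4096)
y = fft_long_conv(x, h)

# Rotation equivariance: a scalar filter weights all components equally,
# so the convolution commutes with any global rotation R.
R, _ = torch.linalg.qr(torch.randn(3, 3))
assert torch.allclose(fft_long_conv(x @ R.T, h), y @ R.T, atol=1e-2)
```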
artemmoskalev.bsky.social
In many biological and physical systems, we need equivariance + global context. This leads to quadratic complexity with respect to system size, multiplied by the cost of equivariance. Existing equivariant models are not equipped to work at that scale 🫠.

3/8
artemmoskalev.bsky.social
Our solution is a data-controlled geometric long convolution. It provides global (all-to-all) context akin to self-attention, but at O(N log N) cost! No low-rank approximations or coarsening.

Paper: arxiv.org/abs/2505.22560
Code: coming soon, stay tuned!

2/8
Geometric Hyena Networks for Large-scale Equivariant Learning
Processing global geometric context while preserving equivariance is crucial when modeling biological, chemical, and physical systems. Yet, this is challenging due to the computational demands of equi...
arxiv.org
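("Data-controlled" is the Hyena-style idea that the mixing operator itself depends on the input. A hedged sketch of the pattern, with illustrative layer names that are not from the paper:)

```python
# Hedged sketch of a data-controlled long convolution: input-dependent
# gates modulate the output of an FFT long convolution, so the effective
# linear operator depends on the data (like attention) at O(N log N).
import torch
import torch.nn as nn

class DataControlledConv(nn.Module):
    def __init__(self, dim: int, max_len: int):
        super().__init__()
        self.gate = nn.Linear(dim, dim)                                # input-dependent gate
        self.filter = nn.Parameter(torch.randn(max_len, dim) * 0.02)  # long learned filter

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, dim), N <= max_len
        N = x.shape[0]
        Xf = torch.fft.rfft(x, n=N, dim=0)
        Hf = torch.fft.rfft(self.filter[:N], n=N, dim=0)
        y = torch.fft.irfft(Xf * Hf, n=N, dim=0)          # global mixing, O(N log N)
        return torch.sigmoid(self.gate(x)) * y            # data-controlled modulation

layer = DataControlledConv(dim=16, max_len=8192)
out = layer(torch.randn(8192, 16))  # (8192, 16)
```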
artemmoskalev.bsky.social
ICML Spotlight 🚨 Equivariance is too slow and expensive, especially when you need global context. It makes us wonder whether it's even worth it. We present Geometric Hyena Networks — a simple equivariant model orders of magnitude more memory- and compute-efficient for high-dimensional data.

1/8
artemmoskalev.bsky.social
- HARMONY: A Multi-Representation Framework for RNA Property Prediction. ORAL at AI4NA workshop. Monday. openreview.net/forum?id=TvBuXU1J2K
artemmoskalev.bsky.social
- InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference. ORAL at AI4NA workshop. Monday. openreview.net/forum?id=nzUsRhtnBa
artemmoskalev.bsky.social
- Beyond Sequence: Impact of Geometric Context for RNA Property Prediction. Saturday 10-12:30. Hall 3 + Hall 2B #5. openreview.net/forum?id=9htTvHkUhh
artemmoskalev.bsky.social
Attending ICLR 🇸🇬 Always happy to chat about LLMs and geometric deep learning for drug discovery and bio!

We are presenting two conference papers and two workshop ORALs:

- HELM: Hierarchical Encoding for mRNA Language Modeling. Thursday 10-12:30. Hall 3 + Hall 2B #6. openreview.net/forum?id=MMHqnUOnl0
Reposted by Artem Moskalev
chaitjo.bsky.social
Introducing All-atom Diffusion Transformers

— towards Foundation Models for generative chemistry, from my internship with the FAIR Chemistry team

There are a couple of ML ideas in here which I think are new and exciting 👇
Reposted by Artem Moskalev
maxxxzdn.bsky.social
🤹 Excited to share Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems

joint work with @wellingmax.bsky.social and @jwvdm.bsky.social

preprint: arxiv.org/abs/2502.17019
code: github.com/maxxxzdn/erwin
artemmoskalev.bsky.social
1. Hyperbolic NNs for sequence modeling: jobs.jnj.com/en/jobs/2506234550w

2. Causal Inference and Bayesian Optimization: jobs.jnj.com/en/jobs/2506234553w

3. Multi-modal Sequence, Structure & Interaction modeling: jobs.jnj.com/en/jobs/2506234539w

Apply and reach out to me if interested! 😁
artemmoskalev.bsky.social
AI/ML Internships in Drug Discovery 🚨

Our team is hiring PhD research interns for summer 2025 in the US. Come work with us on cutting-edge drug discovery projects! We have 3 openings:
artemmoskalev.bsky.social
If you need to pick a neural network and train it on RNA data, our work provides guidelines on which method works best under which conditions.
artemmoskalev.bsky.social
What did we learn? In the presence of severe noise, a simple sequence transformer without any geometry works best, but it requires much more data to converge! At the same time, 3D geometric GNNs are the most vulnerable to geometric noise.
artemmoskalev.bsky.social
We study different neural networks on various RNA representations: 1D vs 2D vs 3D. We evaluate property prediction performance, noise robustness, data efficiency, and OOD noise generalization.
artemmoskalev.bsky.social
Accepted to ICLR 🚨 Does using more geometry always help with molecule property prediction? In practice, we deal with imperfect geometries, which introduce structural noise.

In our work arxiv.org/abs/2410.11933, we investigate when and how geometric information is useful (or not) for RNA molecules.
Beyond Sequence: Impact of Geometric Context for RNA Property Prediction
Accurate prediction of RNA properties, such as stability and interactions, is crucial for advancing our understanding of biological processes and developing RNA-based therapeutics. RNA structures can ...
arxiv.org