Artem Moskalev
@artemmoskalev.bsky.social
1.2K followers 520 following 40 posts
Re-imagining drug discovery with AI 🧬. Deep Learning ⚭ Geometry. Previously PhD at the University of Amsterdam. https://amoskalev.github.io/
Pinned
artemmoskalev.bsky.social
ICML Spotlight 🚨 Equivariance is too slow and expensive, especially when you need global context. It makes us wonder whether it's even worth it. We present Geometric Hyena Networks — a simple equivariant model orders of magnitude more memory- and compute-efficient for high-dimensional data.

1/8
artemmoskalev.bsky.social
Job alert 🚨 Our team is looking for an ML Research Scientist to join Johnson & Johnson research. We work on geometric deep learning and LLMs in drug discovery. 🧬🤓 Drop me a message if you're interested, or share this if you know someone who's a great fit!

Multiple locations available.
artemmoskalev.bsky.social
Our code for Hierarchical RNA Language Models is out! Multiple training regimes, architectures, and evaluations. Check it out!

Code: github.com/johnsonandjo...
artemmoskalev.bsky.social
Interested in efficient equivariance for long context? Visit our Geometric Hyena poster at ICML! ⭐️Spotlight⭐️

When: 11 a.m.–1:30 p.m., Wed July 16
Where: East Exhibition Hall A-B #E-3103
artemmoskalev.bsky.social
In SF June 23–25 🇺🇸 and Boston June 29–30 🇺🇸

Let me know if you're around! Happy to meet and chat about life and AI for bio!
artemmoskalev.bsky.social
Joint work with brilliant Mangal Prakash, Junjie Xu, Tianyu Cui, Rui Liao, and Tommaso Mansi.

8/8
artemmoskalev.bsky.social
Notably, while the equivariant transformer runs out of memory on sequences over 37k tokens, our model handles up to 2.7M tokens on a single A10G GPU, providing up to 72× longer context within the same computational budget.

7/8
artemmoskalev.bsky.social
We test the proposed geometric long convolution on multiple large-molecule property and dynamics prediction tasks for RNA and protein biomolecules. Geometric Hyena is on par with or better than equivariant self-attention at a fraction of its computational cost.

6/8
artemmoskalev.bsky.social
To evaluate the long geometric context capabilities of our models, we introduce a geometric extension of the mechanistic interpretability suite. Specifically, we evaluate equivariant models on equivariant associative recall tasks of increasing complexity.

5/8
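(A toy illustration, not the paper's actual suite: in a vector-valued associative recall instance, the model stores key-value pairs whose values are 3D vectors, is queried with one key, and must return the paired vector; rotating every stored vector should rotate the recalled answer the same way. All names below are my own.)

```python
# Toy sketch of a vector-valued (geometric) associative recall instance.
# Assumption: keys are discrete tokens, values are 3D vectors; the model
# must recall the vector paired with the queried key.
import torch

def make_recall_instance(num_pairs=8, vocab=32, seed=0):
    g = torch.Generator().manual_seed(seed)
    keys = torch.randperm(vocab, generator=g)[:num_pairs]  # distinct key tokens
    values = torch.randn(num_pairs, 3, generator=g)        # 3D vector values
    q = int(torch.randint(num_pairs, (1,), generator=g))   # index of queried key
    return keys, values, q

keys, values, q = make_recall_instance()
target = values[q]

# Equivariance check: applying a global rotation R to all stored vectors
# must rotate the correct answer by the same R.
R, _ = torch.linalg.qr(torch.randn(3, 3))  # random orthogonal matrix
assert torch.allclose((values @ R.T)[q], target @ R.T)
```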
artemmoskalev.bsky.social
Inspired by the recent success of long-context models, long convolutions, and the Hyena hierarchy, we propose their geometric counterpart. We rely on the FFT to push the computational complexity to O(N log N), adapting it to operate on vectors. The implementation is simple – just 50 lines of code!

4/8
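(For intuition, a minimal sketch of the FFT trick, assuming a circular long convolution applied component-wise to a sequence of 3D vectors; this is my own illustration, not the paper's code.)

```python
# Minimal sketch: FFT-based long convolution over a sequence of 3D vectors.
# x is (N, 3), h is a learned length-N filter; convolving each spatial
# component with the same scalar filter costs O(N log N) via the FFT,
# instead of O(N^2) for explicit all-to-all mixing.
import torch

def fft_long_conv(x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    N = x.shape[0]
    Xf = torch.fft.rfft(x, n=N, dim=0)           # (N//2+1, 3)
    Hf = torch.fft.rfft(h, n=N).unsqueeze(-1)    # (N//2+1, 1), broadcast over xyz
    return torch.fft.irfft(Xf * Hf, n=N, dim=0)  # circular convolution, (N, 3)

x, h = torch.randn(4096, 3), torch.randn(4096)
y = fft_long_conv(x, h)

# Rotation equivariance: a scalar filter weights all components equally,
# so the convolution commutes with any global rotation R.
R, _ = torch.linalg.qr(torch.randn(3, 3))
assert torch.allclose(fft_long_conv(x @ R.T, h), y @ R.T, atol=1e-2)
```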
artemmoskalev.bsky.social
In many biological and physical systems, we need equivariance + global context. This leads to quadratic complexity with respect to system size, multiplied by the cost of equivariance. Existing equivariant models are not equipped to work at that scale 🫠.

3/8
artemmoskalev.bsky.social
Our solution is a data-controlled geometric long convolution. It provides global (all-to-all) context akin to self-attention, but at O(N log N) cost! No low-rank approximations or coarsening.

Paper: arxiv.org/abs/2505.22560
Code: coming soon, stay tuned!

2/8
Geometric Hyena Networks for Large-scale Equivariant Learning
Processing global geometric context while preserving equivariance is crucial when modeling biological, chemical, and physical systems. Yet, this is challenging due to the computational demands of equi...
arxiv.org
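("Data-controlled" is the Hyena-style idea that the mixing operator itself depends on the input. A hedged sketch of the pattern, with illustrative layer names that are not from the paper:)

```python
# Hedged sketch of a data-controlled long convolution: input-dependent
# gates modulate the output of an FFT long convolution, so the effective
# linear operator depends on the data (like attention) at O(N log N).
import torch
import torch.nn as nn

class DataControlledConv(nn.Module):
    def __init__(self, dim: int, max_len: int):
        super().__init__()
        self.gate = nn.Linear(dim, dim)                                # input-dependent gate
        self.filter = nn.Parameter(torch.randn(max_len, dim) * 0.02)  # long learned filter

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, dim), N <= max_len
        N = x.shape[0]
        Xf = torch.fft.rfft(x, n=N, dim=0)
        Hf = torch.fft.rfft(self.filter[:N], n=N, dim=0)
        y = torch.fft.irfft(Xf * Hf, n=N, dim=0)          # global mixing, O(N log N)
        return torch.sigmoid(self.gate(x)) * y            # data-controlled modulation

layer = DataControlledConv(dim=16, max_len=8192)
out = layer(torch.randn(8192, 16))  # (8192, 16)
```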
artemmoskalev.bsky.social
ICML Spotlight 🚨 Equivariance is too slow and expensive, especially when you need global context. It makes us wonder whether it's even worth it. We present Geometric Hyena Networks — a simple equivariant model orders of magnitude more memory- and compute-efficient for high-dimensional data.

1/8
artemmoskalev.bsky.social
- HARMONY: A Multi-Representation Framework for RNA Property Prediction. ORAL at AI4NA workshop. Monday. openreview.net/forum?id=TvBuXU1J2K
artemmoskalev.bsky.social
- InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference. ORAL at AI4NA workshop. Monday. openreview.net/forum?id=nzUsRhtnBa
artemmoskalev.bsky.social
- Beyond Sequence: Impact of Geometric Context for RNA Property Prediction. Saturday 10-12:30. Hall 3 + Hall 2B #5. openreview.net/forum?id=9htTvHkUhh
artemmoskalev.bsky.social
Attending ICLR 🇸🇬 Always happy to chat about LLMs and geometric deep learning for drug discovery and bio!

We are presenting two conference papers and two workshop ORALs:

- HELM: Hierarchical Encoding for mRNA Language Modeling. Thursday 10-12:30. Hall 3 + Hall 2B #6. openreview.net/forum?id=MMHqnUOnl0
Reposted by Artem Moskalev
chaitjo.bsky.social
Introducing All-atom Diffusion Transformers

— towards Foundation Models for generative chemistry, from my internship with the FAIR Chemistry team

There are a couple of ML ideas in here which I think are new and exciting 👇
Reposted by Artem Moskalev
maxxxzdn.bsky.social
🤹 Excited to share Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems

joint work with @wellingmax.bsky.social and @jwvdm.bsky.social

preprint: arxiv.org/abs/2502.17019
code: github.com/maxxxzdn/erwin
artemmoskalev.bsky.social
1. Hyperbolic NNs for sequence modeling: jobs.jnj.com/en/jobs/2506234550w

2. Causal Inference and Bayesian Optimization: jobs.jnj.com/en/jobs/2506234553w

3. Multi-modal Sequence, Structure & Interaction modeling: jobs.jnj.com/en/jobs/2506234539w

Apply and reach out to me if interested! 😁
artemmoskalev.bsky.social
AI/ML Internships in Drug Discovery 🚨

Our team is hiring PhD research interns for summer 2025 in the US. Come work with us on cutting-edge drug discovery projects! We have 3 openings:
artemmoskalev.bsky.social
If you need to pick a neural network and train it on RNA data, our work provides guidelines on which method works best under which conditions.
artemmoskalev.bsky.social
What did we learn? In the presence of severe noise, a simple sequence transformer without any geometry works best, but it requires much more data to converge! At the same time, 3D geometric GNNs are the most vulnerable to geometric noise.
artemmoskalev.bsky.social
We study different neural networks on various RNA representations: 1D vs 2D vs 3D. We evaluate property prediction performance, noise robustness, data efficiency, and OOD noise generalization.
artemmoskalev.bsky.social
Accepted to ICLR 🚨 Does using more geometry always help with molecule property prediction? In practice, we deal with imperfect geometries, which introduce structural noise.

In our work arxiv.org/abs/2410.11933, we investigate when and how geometric information is useful (or not) for RNA molecules.
Beyond Sequence: Impact of Geometric Context for RNA Property Prediction
Accurate prediction of RNA properties, such as stability and interactions, is crucial for advancing our understanding of biological processes and developing RNA-based therapeutics. RNA structures can ...
arxiv.org