https://scholar.google.com/citations?user=I80vy5cAAAAJ
Since the dawn of time, people have been messing with (or dropping entirely) these pesky time-dependent loss scaling terms, mostly because the models train better without them.
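For anyone curious what that looks like concretely, here is a minimal sketch, assuming the post refers to the per-timestep weights that diffusion-style objectives prescribe (as in the variational bound), where the popular "simple" loss of Ho et al. 2020 just sets the weight to 1. The function name and toy weight are illustrative, not from the thread:

```python
import torch

def diffusion_loss(eps_pred, eps_true, t, weight_fn=None):
    """Noise-prediction loss with an optional time-dependent scaling term.

    weight_fn=None drops the weighting entirely, reproducing the widely
    used "simple" objective the post alludes to.
    """
    # Per-sample squared error between predicted and true noise
    per_sample = ((eps_pred - eps_true) ** 2).flatten(1).mean(dim=1)
    if weight_fn is not None:
        # Time-dependent scaling term, e.g. an SNR-derived weight
        per_sample = per_sample * weight_fn(t)
    return per_sample.mean()

# toy usage: batch of 4, data dim 8
eps = torch.randn(4, 8)
pred = torch.randn(4, 8)
t = torch.randint(0, 1000, (4,))
print(diffusion_loss(pred, eps, t))                                      # unweighted
print(diffusion_loss(pred, eps, t, weight_fn=lambda t: 1.0 / (t + 1)))   # toy weight
```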
bsky.app/profile/benj...
This was a team effort from a few people in my lab, including @antonoresten.bsky.social and others (not sure who is on this app)
These large values are where RoPE has the slowest(?) effect. Why?
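One way to see which dimensions those are: in the standard RoPE schedule (Su et al.), pair i of each head rotates at angular frequency theta_i = base^(-2i/d), so the high-index pairs barely rotate at all. A minimal sketch, assuming "large values" means activations sitting in those low-frequency pairs:

```python
import torch

def rope_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE: pair i rotates with angular frequency
    # theta_i = base^(-2i / head_dim); high-index pairs rotate slowest.
    i = torch.arange(0, head_dim, 2, dtype=torch.float32)
    return base ** (-i / head_dim)

freqs = rope_frequencies(128)
print(freqs[0].item())   # 1.0      -> fastest pair, one radian per position
print(freqs[-1].item())  # ~1.2e-4  -> slowest pair, barely moves per position
```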