Caio
@caiocorro.bsky.social
NLP researcher
Why not Instagram?
November 6, 2025 at 12:26 PM
You need this fixed logsumexp function, otherwise you will get NaN gradients in your neural network. I personally came across this bug when building a CRF with the transition structure used for discontinuous named entity recognition, see here: aclanthology.org/2024.emnlp-m...
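A sketch of what such a fixed logsumexp can look like, following the workaround pattern from the PyTorch issue linked later in the thread; the names SafeLogSumExp and safe_logsumexp are mine, not from the original post:

```python
import torch

class SafeLogSumExp(torch.autograd.Function):
    """logsumexp whose backward returns null (instead of NaN) gradients
    for slices where every input is -inf."""

    @staticmethod
    def forward(ctx, x, dim):
        out = torch.logsumexp(x, dim=dim)  # the forward value is already correct
        ctx.save_for_backward(x, out)
        ctx.dim = dim
        return out

    @staticmethod
    def backward(ctx, grad_output):
        x, out = ctx.saved_tensors
        # the gradient of logsumexp is the softmax of its input; on a fully
        # masked slice this computes exp(-inf - (-inf)) = exp(nan) = nan
        probs = (x - out.unsqueeze(ctx.dim)).exp()
        probs = torch.nan_to_num(probs, nan=0.0)  # null gradient for fully masked slices
        return grad_output.unsqueeze(ctx.dim) * probs, None

def safe_logsumexp(x, dim):
    return SafeLogSumExp.apply(x, dim)
```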
November 4, 2025 at 9:12 AM
The use case: conditional random fields with forbidden transitions between tags. Implementing the forward algorithm is straightforward in PyTorch, see for example here: github.com/FilippoC/lig...
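For context, the forward algorithm with forbidden transitions looks roughly like this; a minimal sketch rather than the code from the linked repository, using the safe_logsumexp sketched above:

```python
import torch

def crf_log_partition(emissions, transitions):
    # emissions: (seq_len, num_tags) emission log-scores for one sequence
    # transitions: (num_tags, num_tags) transition log-scores,
    #              with forbidden transitions set to -inf
    alpha = emissions[0]
    for t in range(1, emissions.size(0)):
        # scores[i, j]: reach tag j at step t coming from tag i at step t - 1
        scores = alpha.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        # if tag j is unreachable at step t, its whole column is -inf:
        # torch.logsumexp would backprop NaNs there, hence the custom function
        alpha = safe_logsumexp(scores, dim=0)
    return safe_logsumexp(alpha, dim=0)
```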
November 4, 2025 at 9:12 AM
But now the backward pass works as expected, and you get null gradients for w1.
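For instance, a completely masked input now gets the expected null gradients (using the safe_logsumexp sketch from above):

```python
x = torch.randn(3, requires_grad=True)
mask = torch.full((3,), -float("inf"))  # mask everything
safe_logsumexp(x + mask, dim=0).backward()
print(x.grad)  # tensor([0., 0., 0.]) instead of NaNs
```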
November 4, 2025 at 9:12 AM
Yes: implement your own logsumexp function that fixes this bug. I found the workaround in this GitHub issue: github.com/pytorch/pyto...
The forward pass is basically the same, but using the custom logsumexp function.
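On inputs without fully masked slices, the custom function should match the builtin in both value and gradient; a quick sanity check, assuming the SafeLogSumExp sketch from above:

```python
x = torch.randn(6, dtype=torch.double, requires_grad=True)
assert torch.allclose(safe_logsumexp(x, dim=0), torch.logsumexp(x, dim=0))
# gradcheck compares the custom backward against numerical differentiation
torch.autograd.gradcheck(lambda t: SafeLogSumExp.apply(t, 0), (x,))
```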
November 4, 2025 at 9:12 AM
But this will give you NaN gradients for w1! If you look at the gradient of w2, the masked values have a null gradient, as expected. But for w1, instead of a vector of null gradients, we get a vector of NaNs. This completely breaks backpropagation and gradient descent. So, is there a nice solution?
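The bug in its smallest form:

```python
import torch

x = torch.full((3,), -float("inf"), requires_grad=True)
torch.logsumexp(x, dim=0).backward()
print(x.grad)  # tensor([nan, nan, nan]), where a null gradient is expected
```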
November 4, 2025 at 9:12 AM
The input of the first logsumexp is completely masked, the second one is partially masked, and the last one has no mask applied to it. Obviously, the first logit is equal to -inf, which means a "masked" output probability after the softmax. We can then just compute a loss and backpropagate the gradient.
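A hypothetical reconstruction of this setup (the variable names, sizes, and additive masking are my choices; the original code is not in the thread):

```python
import torch

ninf = -float("inf")
w1 = torch.randn(4, requires_grad=True)
w2 = torch.randn(4, requires_grad=True)
w3 = torch.randn(4, requires_grad=True)

mask1 = torch.full((4,), ninf)                # completely masked
mask2 = torch.tensor([ninf, ninf, 0.0, 0.0])  # partially masked
# w3: no mask applied

logits = torch.stack([
    torch.logsumexp(w1 + mask1, dim=0),       # equals -inf
    torch.logsumexp(w2 + mask2, dim=0),
    torch.logsumexp(w3, dim=0),
])
# logits[0] == -inf, so the first output probability is 0 after softmax
loss = -torch.log_softmax(logits, dim=0)[2]   # e.g. NLL of the third class
loss.backward()
print(w1.grad)  # tensor([nan, nan, nan, nan]) with torch.logsumexp
print(w2.grad)  # null gradient at the two masked positions, as expected
```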
November 4, 2025 at 9:12 AM
What's it going to be like when they discover the existence of the Manifesto, or The State and Revolution. 🙃
October 2, 2025 at 8:07 PM
I haven't tried it yet, but the latest Mafia looks really cool
August 30, 2025 at 11:49 PM
Reposted by Caio
For updates on AI, I increasingly just advise people to pick a discord they like and stick with it. Twitter stopped having interesting science chat ages ago, it’s just companies announcing their products and getting a bunch of meme QTs.
August 23, 2025 at 2:29 PM
2 weeks is always unreasonable. 🙃
August 14, 2025 at 12:10 AM