Caio
@caiocorro.bsky.social
NLP researcher
Why not Instagram?
November 6, 2025 at 12:26 PM
You need this fixed logsumexp function, otherwise you will get NaN gradients in your neural network. I personally came across this bug when building a CRF with the transition structure used for discontinuous named entity recognition, see here: aclanthology.org/2024.emnlp-m...
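A sketch of what such a fixed logsumexp can look like, following the workaround pattern from the PyTorch issue linked later in the thread; the names SafeLogSumExp and safe_logsumexp are mine, not from the original post:

```python
import torch

class SafeLogSumExp(torch.autograd.Function):
    """logsumexp whose backward returns null (instead of NaN) gradients
    for slices where every input is -inf."""

    @staticmethod
    def forward(ctx, x, dim):
        out = torch.logsumexp(x, dim=dim)  # the forward value is already correct
        ctx.save_for_backward(x, out)
        ctx.dim = dim
        return out

    @staticmethod
    def backward(ctx, grad_output):
        x, out = ctx.saved_tensors
        # the gradient of logsumexp is the softmax of its input; on a fully
        # masked slice this computes exp(-inf - (-inf)) = exp(nan) = nan
        probs = (x - out.unsqueeze(ctx.dim)).exp()
        probs = torch.nan_to_num(probs, nan=0.0)  # null gradient for fully masked slices
        return grad_output.unsqueeze(ctx.dim) * probs, None

def safe_logsumexp(x, dim):
    return SafeLogSumExp.apply(x, dim)
```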
November 4, 2025 at 9:12 AM
The use case: conditional random fields with forbidden transitions between tags. Implementing the forward algorithm is straightforward in PyTorch, see for example here: github.com/FilippoC/lig...
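For context, the forward algorithm with forbidden transitions looks roughly like this; a minimal sketch rather than the code from the linked repository, using the safe_logsumexp sketched above:

```python
import torch

def crf_log_partition(emissions, transitions):
    # emissions: (seq_len, num_tags) emission log-scores for one sequence
    # transitions: (num_tags, num_tags) transition log-scores,
    #              with forbidden transitions set to -inf
    alpha = emissions[0]
    for t in range(1, emissions.size(0)):
        # scores[i, j]: reach tag j at step t coming from tag i at step t - 1
        scores = alpha.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        # if tag j is unreachable at step t, its whole column is -inf:
        # torch.logsumexp would backprop NaNs there, hence the custom function
        alpha = safe_logsumexp(scores, dim=0)
    return safe_logsumexp(alpha, dim=0)
```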
November 4, 2025 at 9:12 AM
But now the backward pass works as expected, and you get null gradients for w1.
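For instance, a completely masked input now gets the expected null gradients (using the safe_logsumexp sketch from above):

```python
x = torch.randn(3, requires_grad=True)
mask = torch.full((3,), -float("inf"))  # mask everything
safe_logsumexp(x + mask, dim=0).backward()
print(x.grad)  # tensor([0., 0., 0.]) instead of NaNs
```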
November 4, 2025 at 9:12 AM
Yes: implement your own logsumexp function that fixes this bug. I found the workaround in this GitHub issue: github.com/pytorch/pyto...
The forward pass is basically the same, but using the custom logsumexp function.
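On inputs without fully masked slices, the custom function should match the builtin in both value and gradient; a quick sanity check, assuming the SafeLogSumExp sketch from above:

```python
x = torch.randn(6, dtype=torch.double, requires_grad=True)
assert torch.allclose(safe_logsumexp(x, dim=0), torch.logsumexp(x, dim=0))
# gradcheck compares the custom backward against numerical differentiation
torch.autograd.gradcheck(lambda t: SafeLogSumExp.apply(t, 0), (x,))
```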
November 4, 2025 at 9:12 AM
But this will give you NaN gradients for w1! If you look at the gradient of w2, the masked values have a null gradient, as expected. But for w1, instead of a vector of null gradients, we get a vector of NaNs. This completely breaks backpropagation and gradient descent. So, is there a nice solution?
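The bug in its smallest form:

```python
import torch

x = torch.full((3,), -float("inf"), requires_grad=True)
torch.logsumexp(x, dim=0).backward()
print(x.grad)  # tensor([nan, nan, nan]), where a null gradient is expected
```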
November 4, 2025 at 9:12 AM
The input of the first logsumexp is completely masked, the second one is partially masked, and the last one has no mask applied to it. Obviously, the first logit is equal to -inf, which means a "masked" output probability after the softmax. We can then just compute a loss and backpropagate the gradient.
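A hypothetical reconstruction of this setup (the variable names, sizes, and additive masking are my choices; the original code is not in the thread):

```python
import torch

ninf = -float("inf")
w1 = torch.randn(4, requires_grad=True)
w2 = torch.randn(4, requires_grad=True)
w3 = torch.randn(4, requires_grad=True)

mask1 = torch.full((4,), ninf)                # completely masked
mask2 = torch.tensor([ninf, ninf, 0.0, 0.0])  # partially masked
# w3: no mask applied

logits = torch.stack([
    torch.logsumexp(w1 + mask1, dim=0),       # equals -inf
    torch.logsumexp(w2 + mask2, dim=0),
    torch.logsumexp(w3, dim=0),
])
# logits[0] == -inf, so the first output probability is 0 after softmax
loss = -torch.log_softmax(logits, dim=0)[2]   # e.g. NLL of the third class
loss.backward()
print(w1.grad)  # tensor([nan, nan, nan, nan]) with torch.logsumexp
print(w2.grad)  # null gradient at the two masked positions, as expected
```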
November 4, 2025 at 9:12 AM
What's it going to be like when they discover the existence of the Manifesto, or The State and Revolution. 🙃
October 2, 2025 at 8:07 PM
I haven't tried it yet, but the latest Mafia looks really cool
August 30, 2025 at 11:49 PM
Reposted by Caio
For updates on AI, I increasingly just advise people to pick a discord they like and stick with it. Twitter stopped having interesting science chat ages ago, it’s just companies announcing their products and getting a bunch of meme QTs.
August 23, 2025 at 2:29 PM
2 weeks is always unreasonable. 🙃
August 14, 2025 at 12:10 AM