Our work also raises a deeper question: which of the attention or MLP blocks have to be adapted to steer a model to your specific downstream application? Tons of open questions to explore!
This amazing work was led by
@pszwnzl.bsky.social , Wojciech Jasiński, Marek Śmieja and Bartosz Zielinski