Jaap Jumelet
@jumelet.bsky.social
700 followers 270 following 32 posts
Postdoc @rug.nl with Arianna Bisazza. Interested in NLP, interpretability, syntax, language acquisition and typology.
Pinned
jumelet.bsky.social
✨New paper ✨

Introducing 🌍MultiBLiMP 1.0: A Massively Multilingual Benchmark of Minimal Pairs for Subject-Verb Agreement, covering 101 languages!

We present over 125,000 minimal pairs and evaluate 17 LLMs, finding that support is still lacking for many languages.

🧵⬇️
jumelet.bsky.social
As kids (in Breda) we often played "1 keer tets", where you were allowed to let a football bounce at most once; I also had no idea that was a Brabantian word.
jumelet.bsky.social
Happening now at the SIGTYP poster session! Come talk to Leonie and me about MultiBLiMP!
Reposted by Jaap Jumelet
wzuidema.bsky.social
I'll be in Vienna only from tomorrow, but today my star PhD student Marianne is already presenting some of our work:

BLIMP-NL, in which we create a large new dataset for syntactic evaluation of Dutch LLMs, and learn a lot about dataset creation, LLM evaluation and grammatical abilities on the way.
mdhk.net
Next week I’ll be in Vienna for my first *ACL conference! 🇦🇹✨

I will present our new BLiMP-NL dataset for evaluating language models on Dutch syntactic minimal pairs and human acceptability judgments ⬇️

🗓️ Tuesday, July 29th, 16:00-17:30, Hall X4 / X5 (Austria Center Vienna)
The BLiMP-NL dataset consists of 84 Dutch minimal pair paradigms covering 22 syntactic phenomena, and comes with graded human acceptability ratings & self-paced reading times. 

An example minimal pair:
A. Ik bekijk de foto van mezelf in de kamer (I watch the photograph of myself in the room; grammatical)
B. Wij bekijken de foto van mezelf in de kamer (We watch the photograph of myself in the room; ungrammatical)

Differences in human acceptability ratings between sentences correlate with differences in model syntactic log-odds ratio scores.
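A minimal sketch of the kind of score behind that correlation, assuming the standard SLOR formulation (sentence log-probability corrected for a unigram baseline and length); the numbers below are illustrative, not from the paper:

```python
def slor(model_logprob, unigram_logprob, n_tokens):
    """Syntactic log-odds ratio: the model's log-probability of a
    sentence, normalized by a unigram baseline and sentence length."""
    return (model_logprob - unigram_logprob) / n_tokens

# Illustrative numbers: the grammatical member of a minimal pair
# typically receives a higher SLOR than the ungrammatical member.
good = slor(model_logprob=-18.0, unigram_logprob=-30.0, n_tokens=9)
bad = slor(model_logprob=-24.0, unigram_logprob=-30.0, n_tokens=9)
delta = good - bad  # this difference is what correlates with human ratings
```

The unigram correction matters because raw log-probabilities penalize sentences with rare words, which would otherwise swamp the syntactic signal.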
jumelet.bsky.social
Congrats and good luck in Canada!
Reposted by Jaap Jumelet
casperalbers.nl
I don't understand why there isn't more of an uproar about this:

Bringing in American scientists is being paid for by denying Dutch academics an inflation correction on their salaries.

1/2
jumelet.bsky.social
Ohh cool! Nice to see the interactions-as-structure idea I had back in 2021 is still being explored!
Reposted by Jaap Jumelet
catherinearnett.bsky.social
My paper with @tylerachang.bsky.social and @jamichaelov.bsky.social will appear at #ACL2025NLP! The updated preprint is available on arxiv. I look forward to chatting about bilingual models in Vienna!
catherinearnett.bsky.social
✨New pre-print✨ Crosslingual transfer allows models to leverage their representations for one language to improve performance on another language. We characterize the acquisition of shared representations in order to better understand how and when crosslingual transfer happens.
Reposted by Jaap Jumelet
mdlhx.bsky.social
Interested in multilingual tokenization in #NLP? Lisa Beinborn and I are hiring!

PhD candidate position in Göttingen, Germany: www.uni-goettingen.de/de/644546.ht...

PostDoc position in Leuven, Belgium:
www.kuleuven.be/personeel/jo...

Deadline 6th of June
Reposted by Jaap Jumelet
blackboxnlp.bsky.social
BlackboxNLP, the leading workshop on interpretability and analysis of language models, will be co-located with EMNLP 2025 in Suzhou this November! 📆

This edition will feature a new shared task on circuits/causal variable localization in LMs, details here: blackboxnlp.github.io/2025/task
Reposted by Jaap Jumelet
lchoshen.bsky.social
Close your books, test time!
The evaluation pipelines are out, baselines are released & the challenge is on

There is still time to join and
We are excited to learn from you on pretraining and human-model gaps

*Don't forget to fastEval on checkpoints
github.com/babylm/evalu...
📈🤖🧠
#AI #LLMS
jumelet.bsky.social
Sharply written and I fully agree, but it's a bit ironic that the message sits behind a 450-euro paywall :') (thanks for the screenshots!)
Reposted by Jaap Jumelet
jiruiqi.bsky.social
✨ New Paper ✨
[1/] Retrieving passages from many languages can boost retrieval augmented generation (RAG) performance, but how good are LLMs at dealing with multilingual contexts in the prompt?

📄 Check it out: arxiv.org/abs/2504.00597
(w/ @arianna-bis.bsky.social @Raquel_Fernández)

#NLProc
jumelet.bsky.social
That is definitely possible indeed, and a potential confounding factor. In RuBLiMP, a Russian benchmark, they defined a way to validate this based on LM probabilities, but we left that open for future work. The poor performance on low-resource languages shows they're definitely not trained on all of UD though!
Reposted by Jaap Jumelet
arianna-bis.bsky.social
Modern LLMs "speak" hundreds of languages... but do they really?
Multilinguality claims are often based on downstream tasks like QA & MT, while *formal* linguistic competence remains hard to gauge in lots of languages

Meet MultiBLiMP!
(joint work w/ @jumelet.bsky.social & @weissweiler.bsky.social)
jumelet.bsky.social
Person agreement is easier to model than Gender or Number. Sentences with higher overall perplexity lead to less accurate judgements, and models are more likely to pick the wrong inflection if it is split into more tokens. Surprisingly, subject-verb distance has no effect.
jumelet.bsky.social
We find that boosting specific languages works, but only if you pre-train, not post-train: EuroLLM outperforms same-size Llama3 on its target languages, but Aya is not significantly better. Neither of them significantly outperforms Llama3 on languages not intentionally included.
jumelet.bsky.social
We evaluate 17 Language Models, among them Llama 3, Aya, and Gemma 3.
Overall, Llama3 70B and Gemma 27B perform best, but the monolingual 500M Goldfish models significantly outperform them in 14 languages!

Base models consistently outperform their instruction-tuned counterparts.
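The evaluation itself boils down to a forced choice between the two members of each pair. A minimal sketch, with a toy dictionary standing in for a real LM scorer (the sentences and scores here are illustrative):

```python
def minimal_pair_accuracy(pairs, logprob):
    """Share of pairs where the model assigns a higher log-probability
    to the grammatical sentence than to its ungrammatical twin."""
    correct = sum(logprob(good) > logprob(bad) for good, bad in pairs)
    return correct / len(pairs)

# Toy stand-in for a real LM scorer (the actual models are Llama 3 etc.).
toy_scores = {
    "Ik bekijk de foto van mezelf": -20.0,
    "Wij bekijken de foto van mezelf": -26.0,
    "She walks home": -12.0,
    "She walk home": -11.0,  # a toy failure case: the bad form scores higher
}
pairs = [
    ("Ik bekijk de foto van mezelf", "Wij bekijken de foto van mezelf"),
    ("She walks home", "She walk home"),
]
acc = minimal_pair_accuracy(pairs, toy_scores.get)
```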
jumelet.bsky.social
We create 125,000 pairs for 101 languages and six types of agreement, resulting in high diversity across phenomena, typological families, geography, amount of resources available, sentence length, and word frequencies. 43 of our languages are not Indo-European.
jumelet.bsky.social
MultiBLiMP is created automatically using Universal Dependencies and Universal Morphology.

We search for subject-verb or -participle pairs with our target features Number, Person, and Gender in UD, then insert the word with the opposite feature value to form a minimal pair.
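A hypothetical sketch of that pipeline on a tiny CoNLL-U sentence, with a toy UniMorph-style inflection table; the column layout follows the CoNLL-U spec (FORM=1, LEMMA=2, FEATS=5, HEAD=6, DEPREL=7), but the helper names and example data are illustrative, not the paper's actual code:

```python
def parse_conllu(block):
    """Parse a CoNLL-U sentence block into token dicts."""
    rows = []
    for line in block.strip().split("\n"):
        cols = line.split("\t")
        feats = (dict(f.split("=") for f in cols[5].split("|"))
                 if cols[5] != "_" else {})
        rows.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                     "feats": feats, "head": int(cols[6]), "deprel": cols[7]})
    return rows

def make_minimal_pair(rows, inflections, feature="Number"):
    """Find a verb with an nsubj dependent, then swap in the form with
    the opposite feature value to create the ungrammatical sentence."""
    for tok in rows:
        if tok["deprel"] == "nsubj":
            verb = next(t for t in rows if t["id"] == tok["head"])
            value = verb["feats"].get(feature)
            if value in ("Sing", "Plur"):
                flipped = "Plur" if value == "Sing" else "Sing"
                bad_form = inflections[(verb["lemma"], flipped)]
                good = " ".join(t["form"] for t in rows)
                bad = " ".join(bad_form if t is verb else t["form"]
                               for t in rows)
                return good, bad

sentence = (
    "1\tShe\tshe\tPRON\t_\tNumber=Sing|Person=3\t2\tnsubj\t_\t_\n"
    "2\twalks\twalk\tVERB\t_\tNumber=Sing|Person=3\t0\troot\t_\t_"
)
inflections = {("walk", "Plur"): "walk"}  # toy UniMorph-style lookup
pair = make_minimal_pair(parse_conllu(sentence), inflections)
```

The same flip generalizes to Person and Gender by changing the `feature` argument, given an inflection table covering those values.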