Lightnews — Scholar-powered news

Computational Linguistics @UPF @colt-upf.bsky.social · 18h

Great work by Xixian Liao, Thomas Brochhagen, @gboleda.bsky.social and @laiamayol.bsky.social!

Computational Linguistics @UPF @colt-upf.bsky.social · 18h

Do you use a pronoun more often when the entity you’re talking about is more predictable?

Previous work offers diverging answers, so we conducted a meta-analysis combining data from 20 studies across 8 different languages.

Now out in Language: muse.jhu.edu/article/969615

Reposted by Computational Linguistics @UPF

Facultat de Traducció i Ciències del Llenguatge de la UPF @traduccioupf.bsky.social · 12d

📢 Research seminar organized by COLT-URLING, "LLM and human language: representations, judgments, and historical change".

📆 29/09/2025
🕦 15:30
🎤 Adele Goldberg (Princeton University)
🚩55.410, Tànger building, Poblenou Campus - UPF
ℹ️ ja.cat/wi2t7

@colt-upf.bsky.social

Reposted by Computational Linguistics @UPF

Facultat de Traducció i Ciències del Llenguatge de la UPF @traduccioupf.bsky.social · 9d

📢 Research seminar organized by COLT-URLING, "Associative memory in psycholinguistics and in AI architectures".

📆 01/10/2025
🕦 12:00
🎤 Jakub Dotlačil
🚩55.410, Tànger building, Poblenou Campus - UPF
ℹ️ ja.cat/U5xH2

@colt-upf.bsky.social

Reposted by Computational Linguistics @UPF

Gemma Boleda @gboleda.bsky.social · 8d

New paper! 🚨 I argue that LLMs represent a synthesis between distributed and symbolic approaches to language, because, when exposed to language, they develop highly symbolic representations and processing mechanisms in addition to distributed ones.
arxiv.org/abs/2502.11856

Sigmoid function. Non-linearities in a neural network allow it to behave in distributed and near-symbolic fashions.

Reposted by Computational Linguistics @UPF

Desmond Elliott @delliott.bsky.social · Jul 7

📢I am hiring a Postdoc to work on post-training methods for low-resource languages. Apply by August 15 employment.ku.dk/faculty/?sho....
Let's talk at #ACL2025NLP in Vienna if you want to know more about the position and life in Denmark.

Postdoc in Natural Language Processing

employment.ku.dk

Reposted by Computational Linguistics @UPF

Alexander Hoyle @alexanderhoyle.bsky.social · Jul 8

Evaluating topic models (and document clustering methods) is hard. In fact, since our paper critiquing standard evaluation practices four years ago, there hasn't been a good replacement metric.

That ends today (we hope)! Our new ACL paper introduces an LLM-based evaluation protocol 🧵

Screenshot of first page of paper. It is here: https://arxiv.org/pdf/2507.00828

Abstract: Topic model and document-clustering evaluations either use automated metrics that align poorly with human preferences or require expert labels that are intractable to scale. We design a scalable human evaluation protocol and a corresponding automated approximation that reflect practitioners' real-world usage of models. Annotators -- or an LLM-based proxy -- review text items assigned to a topic or cluster, infer a category for the group, then apply that category to other documents. Using this protocol, we collect extensive crowdworker annotations of outputs from a diverse set of topic models on two datasets. We then use these annotations to validate automated proxies, finding that the best LLM proxies are statistically indistinguishable from a human annotator and can therefore serve as a reasonable substitute in automated evaluations.

Computational Linguistics @UPF @colt-upf.bsky.social · Jul 8

🎉New paper "Prediction Hubs are Context-Informed Frequent Tokens in LLMs" from our lab, accepted at ACL 2025!

If you're interested in representational geometry, come find Beatrix Nielsen and Marco Baroni at the poster :)

Beatrix M. G. Nielsen @beatrixmgn.bsky.social · Jul 7

Our paper "Prediction Hubs are Context-Informed Frequent Tokens in LLMs" has been accepted at ACL 2025!

Main points:
1. Hubness is not a problem when language models do next-token prediction.
2. Nuisance hubness can appear when other comparisons are made.

Computational Linguistics @UPF @colt-upf.bsky.social · Jun 2

Today at UPF Campus de la Ciutadella at 2:30 pm! Come slightly earlier to check in!

Sala Polivalent 24.S18

maps.app.goo.gl/n1hBxiviKcLW...

Computational Linguistics @UPF @colt-upf.bsky.social · May 20

⭐ Registration open until May 27th! ⭐
Website: www.upf.edu/web/colt/sym...

June 2nd, UPF

𝗦𝗽𝗲𝗮𝗸𝗲𝗿 𝗹𝗶𝗻𝗲𝘂𝗽:
Arianna Bisazza (language acquisition with NNs)
Naomi Saphra (emergence in LLM training dynamics)
Jean-Rémi King (TBD)
Louise McNally (pitfalls of contextual/formal accounts of semantics)

Computational Linguistics @UPF @colt-upf.bsky.social · May 13

Announcing the COLT Symposium on June 2nd!

𝗘𝗺𝗲𝗿𝗴𝗲𝗻𝘁 𝗳𝗲𝗮𝘁𝘂𝗿𝗲𝘀 𝗼𝗳 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗶𝗻 𝗺𝗶𝗻𝗱𝘀 𝗮𝗻𝗱 𝗺𝗮𝗰𝗵𝗶𝗻𝗲𝘀

What properties of language are emerging from work in experimental and theoretical linguistics, neuroscience & LLM interpretability?

Info: tinyurl.com/colt-site
Register: tinyurl.com/colt-register

🧵1/3

Computational Linguistics @UPF @colt-upf.bsky.social · May 29

📢 𝗟𝗼𝗰𝗮𝘁𝗶𝗼𝗻 𝗰𝗵𝗮𝗻𝗴𝗲📢

UPF Campus de la Ciutadella
Sala Polivalent 24.S18

Thank you for bearing with us!

Computational Linguistics @UPF @colt-upf.bsky.social · May 26

Last day to sign up for the COLT Symposium!
Register: tinyurl.com/colt-register

📢 𝗟𝗼𝗰𝗮𝘁𝗶𝗼𝗻 𝗰𝗵𝗮𝗻𝗴𝗲📢
June 2nd, 14:30 - 19:00

UPF Campus de la Ciutadella
Room 40.101

maps.app.goo.gl/1216LJRsWmTE...

Computational Linguistics @UPF @colt-upf.bsky.social · May 14

Updated website: www.upf.edu/web/colt/sym...

Computational Linguistics @UPF @colt-upf.bsky.social · May 13

𝗚𝗲𝘁𝘁𝗶𝗻𝗴 𝘁𝗵𝗲𝗿𝗲:

𝗪𝗵𝗲𝗻: 2nd June 2025, 14:30 - 19:00
𝗪𝗵𝗲𝗿𝗲: UPF Poblenou, Auditori (enter via Roc Boronat building) maps.app.goo.gl/2WMt21hR5L9r...

In-person only, with mandatory registration:
tinyurl.com/colt-register

See you there!

🧵3/3

Computational Linguistics @UPF @colt-upf.bsky.social · May 13

Our speakers span a wide range of expertise across AI, linguistics, and neuroscience.

14:30 Arianna Bisazza (Uni. Groningen)
15:30 Naomi Saphra (Harvard)

-- coffee break --

17:00 Jean-Rémi King (Meta AI)
18:00 Louise McNally (UPF)

Abstracts: tinyurl.com/colt-site

🧵2/3

Computational Linguistics @UPF @colt-upf.bsky.social · Apr 22

Please find us at #ICLR2025! We will present our work on intrinsic dimension as a cue for stages of language processing in LLMs.

Saturday morning, Poster session 5
Hall 3 + Hall 2B #563
iclr.cc/virtual/2025...

Arxiv: arxiv.org/abs/2405.15471

Reposted by Computational Linguistics @UPF

ERCbravenewword @ercbravenewword.bsky.social · Mar 3

📢 Upcoming Seminar

Words are weird? On the role of lexical ambiguity in language
🗣 Gemma Boleda (Universitat Pompeu Fabra, Spain)
Why is language so ambiguous? Discover how ambiguity balances cognitive simplicity and communicative complexity through large-scale studies.
📍 UniMiB, Room U6-01C, Milan

Computational Linguistics @UPF @colt-upf.bsky.social · Feb 24

⚡New position paper from Gemma Boleda: is it time to make peace between symbolic and continuous approaches to language?

Gemma Boleda @gboleda.bsky.social · Feb 24

new pre-print: LLMs as a synthesis between symbolic and continuous approaches to language arxiv.org/abs/2502.11856

LLMs as a synthesis between symbolic and continuous approaches to language

Since the middle of the 20th century, a fierce battle is being fought between symbolic and continuous approaches to language and cognition. The success of deep learning models, and LLMs in particular,...

arxiv.org

Reposted by Computational Linguistics @UPF

Beatrix M. G. Nielsen @beatrixmgn.bsky.social · Feb 24

The project I did with Marco Baroni and Iuri Macocco while I was in Barcelona is now on Arxiv: arxiv.org/abs/2502.10201 🎉

TLDR below 👇

Prediction hubs are context-informed frequent tokens in LLMs

Hubness, the tendency for few points to be among the nearest neighbours of a disproportionate number of other points, commonly arises when applying standard distance measures to high-dimensional data,...

arxiv.org

Reposted by Computational Linguistics @UPF

Gemma Boleda @gboleda.bsky.social · Feb 5

This year, CoNLL will be accepting *non-archival* (as well as archival) submissions! www.conll.org #CoNLL2025

Follow CoNLL at
@conll-conf.bsky.social

CoNLL 2025 | CoNLL

www.conll.org

Reposted by Computational Linguistics @UPF

Emily Cheng @emcheng.bsky.social · Feb 2

Here's our work accepted to #ICLR2025!

We look at how intrinsic dimension evolves over LLM layers, spotting a universal high-dimensional phase.

This ID peak is where:

- linguistic features are built
- different LLMs are most similar,

with implications for task transfer

🧵 1/6

Reposted by Computational Linguistics @UPF

Deep Learning Barcelona @dlbcnai.bsky.social · Dec 16

What is deep learning?

@marionamec.bsky.social of @neurofregides.bsky.social explains it to us on the occasion of the Deep Learning Barcelona Symposium 2024 (@dlbcn.ai), this Thursday, December 19.

#deeplearning #ciencia #català #barcelona

www.youtube.com/shorts/R4u_Z...

What is deep learning? - La Dimoni de Maxwell #deeplearning #ciencia #català #barcelona

YouTube video by Deep Learning Barcelona

www.youtube.com

Computational Linguistics @UPF @colt-upf.bsky.social · Dec 2

Conclusion: for communication in-context,

Lexical systems with a soft mapping between referents and names let speakers maximize communication accuracy while minimizing complexity.

Paper: aclanthology.org/2024.emnlp-m...

3/3

Computational Linguistics @UPF @colt-upf.bsky.social · Dec 2

We explored, for a color naming task, why a soft mapping between referents and words is a good solution for communication...

...by taking into account
1⃣ in-context communication
2⃣ the hierarchical structure of the lexicon

2/3