Introducing 🌍MultiBLiMP 1.0: A Massively Multilingual Benchmark of Minimal Pairs for Subject-Verb Agreement, covering 101 languages!
We present over 125,000 minimal pairs and evaluate 17 LLMs, finding that support is still lacking for many languages.
🧵⬇️
consumerresearcher.com/academic-fre...
theconversation.com/does-made-wi...
We ran a field experiment on X/Twitter (N=1,256) using LLMs to rerank content in real-time, adjusting exposure to polarizing posts. Result: Algorithmic ranking impacts feelings toward the political outgroup! 🧵⬇️