Angelika Romanou
@agromanou.bsky.social
500 followers 310 following 19 posts
PhD candidate at EPFL doing research in #NLProc 👩🏻‍💻 https://agromanou.github.io/
Pinned
agromanou.bsky.social
🚀 Introducing INCLUDE 🌍: A multilingual LLM evaluation benchmark spanning 44 languages!

Contains *newly-collected* data, prioritizing *regional knowledge*.
Setting the stage for truly global AI evaluation.
Ready to see how your model measures up?
#AI #Multilingual #LLM #NLProc
Reposted by Angelika Romanou
bayazitdeniz.bsky.social
1/🚨 New preprint

How do #LLMs’ inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints—opening a new lens on training dynamics beyond loss curves & benchmarks.

#interpretability
agromanou.bsky.social
Proud to have been part of the team behind #Apertus 🌍✨ an open multilingual LLM.

Trained on open data, supporting 1,800+ languages, and built with transparency, compliance & responsible AI in mind.

🤖 Try Apertus models: huggingface.co/collections/...
agromanou.bsky.social
If you’re at @iclr-conf.bsky.social this week, come check out our spotlight poster INCLUDE during the Thursday 3:00–5:30pm session!

I will be there to chat about all things multilingual & multicultural evaluation.

Feel free to reach out anytime during the conference. I’d love to connect!
Reposted by Angelika Romanou
silingao.bsky.social
NEW PAPER ALERT: Generating visual narratives to illustrate textual stories remains an open challenge, due to the lack of knowledge to constrain faithful and self-consistent generations. Our #CVPR2025 paper proposes a new benchmark, VinaBench, to address this challenge.
Reposted by Angelika Romanou
abosselut.bsky.social
Lots of great news out of the EPFL NLP lab these last few weeks. We'll be at @iclr-conf.bsky.social and @naaclmeeting.bsky.social in April / May to present some of our work in training dynamics, model representations, reasoning, and AI democratization. Come chat with us during the conference!
Reposted by Angelika Romanou
bkhmsi.bsky.social
🚨 New Paper!

Can neuroscience localizers uncover brain-like functional specializations in LLMs? 🧠🤖

Yes! We analyzed 18 LLMs and found units mirroring the brain's language, theory of mind, and multiple demand networks!

w/ @gretatuckute.bsky.social, @abosselut.bsky.social, @mschrimpf.bsky.social
🧵👇
Reposted by Angelika Romanou
smamooler.bsky.social
🚀 Introducing PICLe: a framework for in-context named-entity detection (NED) using pseudo-annotated demonstrations.
🎯 No human labeling needed—yet it outperforms few-shot learning with human annotations!
#AI #NLProc #LLMs #ICL #NER
agromanou.bsky.social
Introducing Global-MMLU🌍: A multilingual benchmark featuring MMLU translations in 42 languages crafted with:
✅ Human curation
✅ Extensive metadata
✅ Insights into cultural sensitivity

Proud to have collaborated with Shivalika Singh, @sarahooker.bsky.social and Cohere For AI!
sarahooker.bsky.social
Is MMLU Western-centric? 🤔

As part of a massive cross-institutional collaboration:
🗽 Find MMLU is heavily overfit to Western culture
🔍 Professional annotation of cultural sensitivity data
🌍 Release improved Global-MMLU in 42 languages

📜 Paper: arxiv.org/pdf/2412.03304
📂 Data: hf.co/datasets/Coh...
Reposted by Angelika Romanou
abosselut.bsky.social
1/ 📘 Could ChatGPT get an engineering degree? Spoiler, yes! In our new @pnas.org article, we explore how AI assistants like GPT-4 perform in STEM university courses — and on average they pass a staggering 91.7% of core courses. 🧵 #AI #HigherEd #STEM #LLMs #NLProc
agromanou.bsky.social
👏 As well as the fantastic multilingual research community that helped us collect and validate INCLUDE!
agromanou.bsky.social
🙏 We thank our amazing core team and advisors:
@negarforoutan.bsky.social, Anna Sotnikova, @eric-zemingchen.bsky.social, Sree Harsha Nelaturu, Shivalika Singh, Rishabh Maheshwary, Micol Altomare, Mohamed A Haggag, Imanol Schlag, @mziizm.bsky.social, @sarahooker.bsky.social, @abosselut.bsky.social
agromanou.bsky.social
For easy evaluation, we provide the following subsets:
INCLUDE-base: up to 550 samples per language, totaling ~23K questions
🤗 : huggingface.co/datasets/Coh...
INCLUDE-lite: up to 250 samples per language, totaling ~11K questions
🤗 : huggingface.co/datasets/Coh...
agromanou.bsky.social
🤝 Information is transferred across languages of the same script, though untrained languages might also excel due to potential data contamination.

🌎 Models can struggle with non-English instructions, entangling knowledge evaluation with other factors such as task formatting.
agromanou.bsky.social
Analysis shows:
📚 Models have a long way to go in capturing the regional knowledge reflected in languages.

💪 Model scale improves regional knowledge understanding, but other techniques like CoT or instruction tuning have minimal or negative impacts.
agromanou.bsky.social
To build INCLUDE, we collected ~200K multiple-choice questions (MCQs) in 44 languages and 58 knowledge domains from local sources in 52 countries, representing a rich array of cultural and regional knowledge.
agromanou.bsky.social
🤔 Why is regional knowledge so important?

Users expect #LLMs to know information relevant to their environments: customs, culture, etc.
To be relevant & relatable, LLMs need to know these nuances. It's not just global knowledge; it's about meeting user needs where they are.
agromanou.bsky.social
🌍 First, what is regional knowledge?

It's the local info, culture & practices of a regional context. US law is a great topic, but it is less relevant for multilingual LLMs serving other regions.

For INCLUDE, we collect regional knowledge rather than translating Western-centric benchmarks.
Reposted by Angelika Romanou
abosselut.bsky.social
[email protected] is hiring for multiple positions in CS (including one open call): www.epfl.ch/about/workin...

Apply to come join us in beautiful Lausanne!