Amir Hossein Kargaran
kargaranamir.bsky.social
PhD Student at @cislmu.bsky.social
Multilingual NLP and LLMs
Twitter: https://x.com/amir_nlp
Homepage: https://kargaranamir.github.io
Are you working on multilingual, multicultural #LLM? Interested in diverse & inclusive language modeling?

😎 Stay tuned for our MELT workshop at #COLM2025

🔗 melt-workshop.github.io

We welcome 2-page (extended abstract), 4-page (short), and 8-page (long) papers, as well as talented reviewers:

🔗 forms.gle/MYcXED7RLJDS...
June 5, 2025 at 8:39 AM
New paper: How does pretraining on programming languages + English shape LLMs' concept space?
🔍 Do LLMs use English or a programming language as a kind of pivot language?
🧠 Are neurons language-specific or shared across programming languages and English?
🔗 arxiv.org/abs/2506.01074
June 3, 2025 at 5:22 PM
Thanks to everyone who stopped by to see our work! I’ll be at the conference until the closing night and would love to meet and connect with more people. Feel free to DM me here or on the Whova app.
December 13, 2024 at 4:18 AM
🇨🇦 I'll be in Montreal December 4–8, then in Vancouver for NeurIPS to present our work on pretraining data for minority languages (arxiv.org/abs/2410.23825). Looking forward to reconnecting and meeting new people. DM me if you want to meet in the coming days! :)
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
The need for large text corpora has increased with the advent of pretrained language models and, in particular, the discovery of scaling laws for these models. Most available corpora have sufficient d...
December 1, 2024 at 9:18 PM