TLDR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation.📑🧵⬇️ 1/8
#NLProc #LLM #AIResearch
(I should have printed more stickers as they were more popular than I anticipated😅)
📢 Check out DialUp, a technique to make your MT model robust to the dialect continua of its training languages, including unseen dialects.
arxiv.org/abs/2501.16581
Case in point, we are looking to expand the research/foundation models team at Orby AI and are looking for highly motivated researchers and ML/Research engineers. Please reach out if you're interested in learning more!
/fin
I'll try my best and see if I can get 100% of my reviews to be 'great' this round.
If you didn't see it already, ARR publishes how many of your reviews are considered to be 'great': stats.aclrollingreview.org
Join me for the challenge :)
I will be presenting at #NeurIPS2024 and am happy to chat in-person or digitally!
I work on developing AI agents that can collaborate and communicate robustly with us and each other.
More at: esteng.github.io and in thread below
🧵👇
As part of a massive cross-institutional collaboration:
🗽Find that MMLU is heavily overfit to Western culture
🔍 Professional annotation of cultural sensitivity data
🌍 Release improved Global-MMLU in 42 languages
📜 Paper: arxiv.org/pdf/2412.03304
📂 Data: hf.co/datasets/Coh...
I'm searching for faculty positions/postdocs in multilingual/multicultural NLP, vision+language models, and eval for genAI!
I'll be at #NeurIPS2024 presenting our work on meta-evaluation for text-to-image faithfulness! Let's chat there!
Papers in🧵, see more: saxon.me
🐟 7B and 13B weights, trained on up to 4-5T tokens, with fully open data, code, etc.
🐠 better architecture and recipe for training stability
🐡 staged training, with new data mix Dolmino🍕 added during annealing
🦈 state-of-the-art OLMo 2 Instruct models
#nlp #mlsky
links below👇
Please reply or DM me if you're doing research at CLSP and would like to be added - I'm still trying to find out which of us are on here so far.
go.bsky.app/JtWKca2