You can try out recipes👩‍🍳 iterate on ✨vibes✨ but we can't actually test all possible combos of tweaks,,, right?? 🙅‍♂️WRONG! arxiv.org/abs/2410.15661 (1/n) 🧵
Paper: arxiv.org/abs/2511.02817
Dataset: huggingface.co/oolongbench
Code: github.com/abertsch72/o...
Leaderboard: oolongbench.github.io
We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
✍️Thanks to my illustrious coauthors @clarana.bsky.social @jaredfern.bsky.social timdettmers.com @strubell.bsky.social @jessedodge.bsky.social, 'twas a fun project 🌏
🤞means luck in US but deeply offensive in Vietnam 🚨
📣 We introduce MC-SIGNS, a test bed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!
📜: arxiv.org/abs/2502.17710
today @akshitab.bsky.social @natolambert.bsky.social and I are giving our #neurips2024 tutorial on language model development.
everything from data, training, adaptation. published or not, no secrets 🫡
tues, 12/10, 9:30am PT ☕️
neurips.cc/virtual/2024...
It isn’t just about making models reusable. If the origin of data is opaque, if labor is hidden & exploited, if frameworks are dominated by Big Tech, if computational power is mastered by an oligopoly…‘open’ is just a label.
Meredith Whittaker & friends in Nature.
@smw.bsky.social, @davidthewid.bsky.social & I correct the record👇
nature.com/articles/s41...
Students do different research, go on the job market, and recruit other students. Ping me and I'll add you!
🌟 In our new paper, we rethink how we should be controlling for these factors 🧵:
✨linguistically+cognitively motivated evaluation
✨NLP for low-resource+endangered languages
✨figuring out what features of language data LMs are *actually* learning
I'll be presenting two posters 🧵:
aclanthology.org/2024.emnlp-m...
Speaking for myself and my "early career" goals, the anonymity deadlines are incredibly stressful and (as far as I can tell) not beneficial to me.