Najoung Kim
@najoung.bsky.social
520 followers 110 following 31 posts
https://najoung.kim language
Pinned
najoung.bsky.social
Seeing an experiment and thinking "but have they tried X? what if we do Y?" is a key part of research and a start to new discoveries. RExBench tests if coding agents can implement new extensions.

It complements recent evals (eg PaperBench from OpenAI) on replication! See 👇 for details
sebschu.bsky.social
Can coding agents autonomously implement AI research extensions?

We introduce RExBench, a benchmark that tests if a coding agent can implement a novel experiment based on existing research and code.

Finding: Most agents we tested had a low success rate, but there is promise!
Screenshot of the RExBench preprint title page.
najoung.bsky.social
Required qualifications: BA, hands-on engineering experience, and interest in research. Please help spread the word and share with people who might be a good fit! 🪄

Unfortunately only open to US citizens/permanent residents/independent work auth holders.
najoung.bsky.social
The first part of this effort was RExBench: bsky.app/profile/najo...

There will be some freedom of scope for the second part of the project within the theme of research agent evaluation - the RA will contribute to scoping the project along with the team as well.
najoung.bsky.social
👾 Full-time research assistant position (1 year) with @sebschu.bsky.social and me! 👾

We're looking for someone to join the research agent evaluation team, starting Fall 2025. Application link to be available soon, but feel free to send us your CV and/or come talk to us at #ACL2025. 🧵
Reposted by Najoung Kim
lizzieloo.bsky.social
I have a sabbatical coming up and I'm going to Nepal! Why Nepal? You can read about it at my first blog post about this trip.

TL;DR: linguistic diversity, writing systems (and pretty scripts), classifiers, and a school for Newar kids in Kathmandu.

sites.bu.edu/lislab/2025/...
Field trip to Nepal! | Linguistic Semantics Lab (LiSLab)
sites.bu.edu
najoung.bsky.social
ever since VLMs were a thing i've been interested in how the additional visual modality changes language in meaningful ways. after negative findings after negative findings, excited to report this result! proud of our junior authors for digging into this 🐸
yuluqin.bsky.social
Does vision training change how language is represented and used in meaningful ways?🤔The answer is a nuanced yes! Comparing VLM-LM minimal pairs, we find that while the taxonomic organization of the lexicon is similar, VLMs are better at _deploying_ this knowledge. [1/9]
najoung.bsky.social
green carded finally 💚💚
najoung.bsky.social
Excited this is finally out and ❤️ to our team @nedwards99.bsky.social @yukyunglee.bsky.social Audrey Mao and Yulu Qin! of course @sebschu great as always 🙏🙏
Reposted by Najoung Kim
koyena.bsky.social
🚨 Registration is live! 🚨

The New England Mechanistic Interpretability (NEMI) Workshop is happening Aug 22nd 2025 at Northeastern University!

A chance for the mech interp community to nerd out on how models really work 🧠🤖

🌐 Info: nemiconf.github.io/summer25/
📝 Register: forms.gle/v4kJCweE3UUH...
NEMI 2024 (Last Year)
najoung.bsky.social
i'll be in copenhagen for a few days, lmk if you want to get coffee! will be around most of Thurs and early Sat. alternatively you can also come see me give a talk (at a museum apparently) on Fri:

cphnlp.github.io
Copenhagen NLP Symposium 2025
symposium website
cphnlp.github.io
Reposted by Najoung Kim
annarogers.bsky.social
📢 The Copenhagen NLP Symposium on June 20th!

- Invited talks by @loubnabnl.hf.co (HF) @mziizm.bsky.social (Cohere) @najoung.bsky.social (BU) @kylelo.bsky.social (AI2) Yohei Oseki (UTokyo)
- Exciting posters by other participants

Register to attend and/or present your poster at cphnlp.github.io /1
Copenhagen NLP Symposium 2025
symposium website
cphnlp.github.io
Reposted by Najoung Kim
najoung.bsky.social
thanks Matt, means a lot!! 😊
najoung.bsky.social
hello NAACL friends I'm giving a keynote today at RepL4NLP at 1:30PM local time, come say hi! I'll mostly be musing about things with light research discussions
Screenshot of a slide that says "what does it take to convince ourselves that a system is exhibiting compositionality?" with a side comment "mostly AI, but humans too!!" for the word system
najoung.bsky.social
we are scheming to make everyone's lab name a recursive acronym!
najoung.bsky.social
i think SNAIL is excellent
najoung.bsky.social
in liminal state, as correctly described by a colleague
najoung.bsky.social
so very excited that naomi is joining!!! a huge win for cds 💖
nsaphra.bsky.social
Life update: I'm starting as faculty at Boston University @bucds.bsky.social in 2026! BU has SCHEMES for LM interpretability & analysis, I couldn't be more pumped to join a burgeoning supergroup w/ @najoung.bsky.social @amuuueller.bsky.social. Looking for my first students, so apply and reach out!
CDS building which looks like a jenga tower
najoung.bsky.social
"Gaming Linguists" good bigram
Korean to English Gaming Linguists Urgently Required
najoung.bsky.social
Repost appreciated! 🙏

ACL 2025 Ling theory & Cognitive modeling track is looking for emergency reviewers. The emergency review period is between 3/18-26, and these reviewers will be excluded from the ARR cycle. If you're interested, please sign up here! docs.google.com/forms/d/1fH7...
ACL 2025 Ling theory & Cognitive modeling track emergency reviewer volunteer form
The Linguistic Theories, Cognitive Modeling, and Psycholinguistics track at ACL 2025 is looking for emergency reviewers. The emergency reviews will take place between 18th to 26th of March, 2025. Thes...
docs.google.com
najoung.bsky.social
have many tasks but immobilized by car
fluffy car on lap
Reposted by Najoung Kim
benlipkin.bsky.social
Lots of folks talking about scaling LLM inference over this last year

Internally, I’ve been developing and using a library that makes this extremely easy, and I decided to open-source it
Meet the decoding library: github.com/benlipkin/de...

1/7
GitHub - benlipkin/decoding: Composable inference algorithms with LLMs and programmable logic
Composable inference algorithms with LLMs and programmable logic - benlipkin/decoding
github.com
najoung.bsky.social
this is a first... suffering maximized...
The password must NOT contain a non alpha character in the last two positions of the password!
najoung.bsky.social
why say "bad" when you can say "ungood"