Joe Stacey
@joestacey.bsky.social
2.5K followers 2.1K following 140 posts
NLP PhD student at Imperial College London and Apple AI/ML Scholar.
Pinned
joestacey.bsky.social
We have a fun new #NLProc paper on arXiv about improving the robustness of fine-tuned NLI models!

Have a look :)
arxiv.org/abs/2505.20209
Reposted by Joe Stacey
lisaalaz.bsky.social
We have released #AgentCoMa, an agentic reasoning benchmark where each task requires a mix of commonsense and math to be solved 🧐

LLM agents performing real-world tasks should be able to combine these different types of reasoning, but are they fit for the job? 🤔

🧵⬇️
joestacey.bsky.social
Congratulations!! Awesome that you’ll be in Europe!
joestacey.bsky.social
The bad:

- the chocolate here is terrible for no good reason
- hotel breakfasts never have any baked beans, which are way underappreciated here (they are delicious and add much-needed moisture to a cooked breakfast)
- the temperature in summer is inhumane

Think that covers the main stuff 😍
joestacey.bsky.social
Here’s my review of the US after a few days here. Did I miss anything? 🤔

The good:

- Americans are the most charming, friendly and hospitable people
- it’s super fun how the country is split into states that all have different laws and stuff, with different vibes state to state
joestacey.bsky.social
Any chance Keir Starmer can reshuffle himself in as foreign secretary, and shuffle in another prime minister who actually has some vague idea about what they want to achieve? 🙏🤦‍♂️
joestacey.bsky.social
Finally the heatwave has ended, and the UK is once again a bearable place to be 😍😍

If you have any UK-based collaborations, their productivity is about to increase like tenfold
joestacey.bsky.social
This work was really fun and a great last paper for my PhD. Check it out 🙂 Massive thanks to all my amazing collaborators!

arxiv.org/abs/2505.20209

P.S. if you know about a paper improving NLI model robustness not already in our related work appendix, I would love to hear about it 🥰
How to Improve the Robustness of Closed-Source Models on NLI
Closed-source Large Language Models (LLMs) have become increasingly popular, with impressive performance across a wide range of natural language tasks. These models can be fine-tuned to further improv...
arxiv.org
joestacey.bsky.social
5) The best way to improve performance on the hardest OOD data was to choose more challenging training examples

Our best method (Uncertainty Sampling) picked examples with the most uncertain predictions. This identified challenging examples without introducing too much label noise
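
A minimal sketch of what this kind of uncertainty sampling could look like, assuming entropy over an initial model's predicted NLI label probabilities as the uncertainty score (the paper's exact scoring and selection details may differ):

```python
import numpy as np

def uncertainty_sample(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k candidate examples with the most
    uncertain predictions (highest entropy over NLI labels)."""
    # probs: (n_examples, 3) predicted probabilities over
    # entailment / neutral / contradiction from an initial model
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-k:]

# Toy example: confident predictions are likely easy and get skipped,
# uncertain ones are selected as challenging training examples
probs = np.array([
    [0.98, 0.01, 0.01],  # confident -> skipped
    [0.40, 0.35, 0.25],  # uncertain -> selected
    [0.90, 0.05, 0.05],
    [0.34, 0.33, 0.33],  # near-uniform -> selected
])
print(uncertainty_sample(probs, k=2))  # [1 3]
```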
joestacey.bsky.social
4) Creating more complex synthetic data avoids a loss in performance on harder OOD datasets

We find that generating more challenging synthetic data (Long & Complex Generation) helps retain performance on harder OOD datasets, while still achieving gains on easier OOD data
joestacey.bsky.social
3) Replacing some training examples with LLM-generated data proved very effective on less challenging OOD data

See Standard-OOD scores below (avg), where the simplest LLM-generated data (Short & Simple Generation) performed best, with substantial improvements
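
For illustration, here's a sketch of how the two generation styles (Short & Simple vs Long & Complex) might be prompted; the wording below is hypothetical, not the paper's actual prompts or pipeline:

```python
# Illustrative prompt templates only; the real generation prompts
# and filtering used in the paper may look quite different.
SHORT_SIMPLE = (
    "Write a one-sentence premise and a short hypothesis whose "
    "relation to the premise is: {label}.\n"
    "Format: PREMISE: ... HYPOTHESIS: ..."
)

LONG_COMPLEX = (
    "Write a 3-4 sentence premise with several clauses and named "
    "entities, then a hypothesis whose relation to the premise is: "
    "{label}. The hypothesis should require combining information "
    "from different parts of the premise.\n"
    "Format: PREMISE: ... HYPOTHESIS: ..."
)

for label in ("entailment", "neutral", "contradiction"):
    print(SHORT_SIMPLE.format(label=label))
    print(LONG_COMPLEX.format(label=label))
```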
joestacey.bsky.social
2) We experiment with 6+ ways to improve robustness:

These include sampling methods to choose more complex training examples, and generating new synthetic examples

Some methods were pretty fun, e.g. asking an LLM to assess the difficulty of training examples
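
A rough sketch of what that LLM-based difficulty scoring could look like; the prompt wording and the 1-5 scale here are assumptions, not necessarily what the paper used:

```python
def difficulty_prompt(premise: str, hypothesis: str, label: str) -> str:
    """Build a prompt asking an LLM to rate an NLI training example's
    difficulty; scale and wording are illustrative only."""
    return (
        "Rate the difficulty of this NLI training example from 1 "
        "(trivial) to 5 (requires multi-step reasoning).\n"
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        f"Label: {label}\n"
        "Difficulty:"
    )

print(difficulty_prompt(
    "All of the musicians left the stage after the encore.",
    "The stage was empty of musicians.",
    "entailment",
))
```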
joestacey.bsky.social
1) It's time to stop using fine-tuned encoder models:

We find that fine-tuned LLMs are substantially more robust than commonly used encoder models, despite being fine-tuned on 50x less data.

This is especially the case on challenging OOD datasets (see Challenge-OOD avg below)
joestacey.bsky.social
The paper tries to improve the robustness of closed-source LLMs fine-tuned on NLI, assuming a realistic training budget of 10k training examples.

Here's a 45-second rundown of what we found!
joestacey.bsky.social
I’d personally just love to see more negative results from nice ideas that didn’t quite work out. I feel like there’s probably a bunch of cool stuff people have tried out and discarded that could be made to work across multiple papers. Would be fun and interesting too
joestacey.bsky.social
Was worried it was just me hating on it so much 🤣
joestacey.bsky.social
I’d love to see more diversity in the field, what kind of things were you thinking?
joestacey.bsky.social
Should I use an LLM to help refine my paper writing for the ARR deadline? 🤔🤔

It will improve the paper for sure, but it will probably also make the tone a whole lot more annoying
Reposted by Joe Stacey
juand-r.bsky.social
If you're at #NAACL2025 and want to hear about similarity effects for property inheritance in LMs, please stop by!

I will be presenting this work on Wednesday at the 11-12:30 poster session on Interpretability & analysis for language models (Hall 3).

aclanthology.org/2025.naacl-l...
joestacey.bsky.social
Looks so cool! I’m insanely jealous
joestacey.bsky.social
I’m not a fan of Musk, but imo there’s some really nice work here 🙂

Interested in the Washington post article, would you mind sharing a link?
Reposted by Joe Stacey
imperial-nlp.bsky.social
Excited to share our ICLR and NAACL papers! Please come and say hi, we're super friendly :)
joestacey.bsky.social
That’s an awesome paper 👍👍
joestacey.bsky.social
Wow, the old ITV Agatha Christie’s Poirot is brilliant. Some TV for 1989…

Gonna go binge-watch the 13 seasons now 😍
joestacey.bsky.social
Congratulations! It’s definitely worth experimenting with more concise responses in the future to see what kind of reaction you get.

Best of luck with your meta-reviews! 🤞