Yoav Artzi
@yoavartzi.com
LM/NLP/ML researcher ¯\_(ツ)_/¯

yoavartzi.com / associate professor @ Cornell CS + Cornell Tech campus @ NYC / nlp.cornell.edu / associate faculty director @ arXiv.org / researcher @ ASAPP / starting @colmweb.org / building RecNet.io
Hence, this is an interesting and important benchmark. Through a simple environment, it exposes a fairly fundamental flaw in current models
December 29, 2025 at 2:21 AM
This is not surprising, and aligns with other findings in the literature regarding visual reasoning and manipulation
December 29, 2025 at 2:20 AM
The prompts do provide a rudimentary illustration. The stateful version lets the model see the outcomes of its own actions, technically allowing it to infer the physics. Generally, though, the result for LLMs out of the box is negative.
December 29, 2025 at 2:20 AM
Most of the experiments are not with VLMs, but with a diverse set of RL methods.

Do LLMs understand physics? They definitely generate outputs that seem to indicate so.
December 29, 2025 at 2:20 AM
Reposted by Yoav Artzi
Submit to COLM! Deadline of March 31. This llama gets to enjoy his holidays and isn't stressed out just yet...
COLM 2026 is just around the corner! Mark your calendars for:

💡 Abstract deadline: Thursday, March 26, 2026
📄 Full paper submission deadline: Tuesday, March 31, 2026

Call for papers (website coming soon):
docs.google.com/document/d/1...
December 16, 2025 at 3:36 PM
Zoe presented this paper at NeurIPS D+B: it's all knots(🪢🪢🪢!?), no language tokens were harmed (or reinforced) in the process

It's such a fun and creative paper, a real mind twist ;)

You really get to think carefully about visual intelligence looking at these knots 🪢
🧩Natural language isn’t all you need.

We’re great at evaluating text-based reasoning (MATH, AIME…) but what about long-horizon visual reasoning?

Enter 𝗞𝗻𝗼𝘁𝗚𝘆𝗺: a minimalistic testbed for evaluating agents on spatial reasoning along a difficulty ladder
December 13, 2025 at 3:33 PM
Reposted by Yoav Artzi
Hi all, I will be at #NeurIPS2025 to present my work on stress-testing looooooong visual reasoning with KnotGym🥨
Let's talk, whether or not your VLM can see 14 million possible futures like Doctor Strange
November 28, 2025 at 4:08 PM
Reposted by Yoav Artzi
COLM is going to San Francisco for 2026!

🗓️Dates: October 6-9, 2026
🏨Venue: Hilton San Francisco Union Square

Website and CFPs for papers and workshops coming up soon!
November 11, 2025 at 7:30 PM
This maybe runs counter to the original intention of just indexing the chaos to make it accessible. I guess that ideal of search softened a long time ago
November 10, 2025 at 3:57 PM
That's definitely part of it, though this dynamic has a deeper history. Search engine indexing also seems just easier, so companies opt for it, even pre AI-overview-everything
November 10, 2025 at 3:57 PM
Re peer-rev --> pre-print servers: arXiv is a simple, uniform place to store papers. Indexing engines love it, so if you want something to be searchable, nothing is better. To make things worse, at times it seems like journals/proceedings almost play a game of hide-and-seek with PDFs
November 10, 2025 at 3:50 PM
Re position papers: I don't think anyone can deny how effective some of these papers became for citation counts
November 10, 2025 at 3:50 PM
Is this all just a big practical joke for ChatGPT? I have been told god doesn't play dice with the world, but I guess AGI does :)
November 6, 2025 at 8:52 PM
It's a Thursday though ....
November 5, 2025 at 2:17 PM
All available here:
lm-class.org

ChangeLog here:
lm-class.org/CHANGELOG.md
LM-class
LM-class is an education resource for contemporary language modeling, broadly construed.
lm-class.org
November 3, 2025 at 3:54 PM
Pushed a big update to LM-class (v2025.2) -- this second version makes it a much more mature resource

Many refinements of lecture slides + significant improvements to the assignments

Many thanks to @ch272h.bsky.social, Yilun Hua, and Shankar Padmanabhan for their work on the assignments
November 3, 2025 at 3:54 PM
This kind of ad-hoc adaptation is hard for LLMs in general, but you can post-train for it to some degree
arxiv.org/abs/2508.06482

I suspect contemporary ASR models have the same backbone, so this may be applicable there too

More broadly, there is a lot of interesting stuff to do in this space of adaptation
Post-training for Efficient Communication via Convention Formation
Humans communicate with increasing efficiency in multi-turn interactions, by adapting their language and forming ad-hoc conventions. In contrast, prior work shows that LLMs do not naturally show this ...
arxiv.org
November 3, 2025 at 3:50 PM
I am potentially recruiting a postdoctoral fellow through this program. If interested, name me as a mentor, and ping me to let me know that you are applying! The process includes some sort of interview, so I can try to squeeze in a few of these in advance (it will help a lot!)
Cornell is recruiting for multiple postdoctoral positions in AI as part of two programs: Empire AI Fellows and Foundational AI Fellows. Positions are available in NYC and Ithaca.

Deadline for full consideration is Nov 20, 2025!
academicjobsonline.org/ajo/jobs/30971
October 28, 2025 at 6:46 PM
Reposted by Yoav Artzi
Cornell (NYC and Ithaca) is recruiting AI postdocs, apply by Nov 20, 2025! If you're interested in working with me on technical approaches to responsible AI (e.g., personalization, fairness), please email me.

academicjobsonline.org/ajo/jobs/30971
Cornell University, Empire AI Fellows Program
Job #AJO30971, Postdoctoral Fellow, Empire AI Fellows Program, Cornell University, New York, New York, US
academicjobsonline.org
October 28, 2025 at 6:19 PM
Wild
October 28, 2025 at 1:51 PM
There's the legit gaming, which is just optimizing for the metrics and breaking them. Then there's the really fake stuff, like citation rings. You would think citations translate to bitcoins, given the level of creativity and effort people put into it
October 27, 2025 at 6:41 PM
The top citer has >1k papers, with a PhD from 2007. That's one hell of a steady rate ¯\_(ツ)_/¯
October 27, 2025 at 6:39 PM
It's pretty crazy how the entire citation game has been manipulated. It's enough to take a quick look at Semantic Scholar for Bengio, whom GScholar just credited with 1M citations. SScholar gives 0.5M, but it's not only the number, it's the top citers
October 27, 2025 at 6:36 PM