Yoav Artzi
@yoavartzi.com
LM/NLP/ML researcher ¯\_(ツ)_/¯

yoavartzi.com / associate professor @ Cornell CS + Cornell Tech campus @ NYC / nlp.cornell.edu / associate faculty director @ arXiv.org / researcher @ ASAPP / starting @colmweb.org / building RecNet.io
Hence, this is an interesting and important benchmark. Through a simple environment, it exposes a fairly fundamental flaw in current models
December 29, 2025 at 2:21 AM
This is not surprising, and aligns with other findings in the literature regarding visual reasoning and manipulation
December 29, 2025 at 2:20 AM
The prompts do provide a rudimentary illustration. The stateful version lets the model see the outcomes of its own actions, technically allowing it to infer the physics. Generally, though, the result for LLMs out of the box is negative.
December 29, 2025 at 2:20 AM
Most of the experiments are not with VLMs, but with a diverse set of RL methods.

Do LLMs understand physics? They definitely generate outputs that seem to indicate so.
December 29, 2025 at 2:20 AM
Reposted by Yoav Artzi
Submit to COLM! Deadline of March 31. This llama gets to enjoy his holidays and isn't stressed out just yet...
COLM 2026 is just around the corner! Mark your calendars for:

💡 Abstract deadline: Thursday, March 26, 2026
📄 Full paper submission deadline: Tuesday, March 31, 2026

Call for papers (website coming soon):
docs.google.com/document/d/1...
December 16, 2025 at 3:36 PM
Zoe presented this paper at NeurIPS D+B: it's all knots(🪢🪢🪢!?), no language tokens were harmed (or reinforced) in the process

It's such a fun and creative paper, a real mind twist ;)

You really get to think carefully about visual intelligence looking at these knots 🪢
🧩Natural language isn’t all you need.

We’re great at evaluating text-based reasoning (MATH, AIME…) but what about long-horizon visual reasoning?

Enter 𝗞𝗻𝗼𝘁𝗚𝘆𝗺: a minimalistic testbed for evaluating agents on spatial reasoning along a difficulty ladder
December 13, 2025 at 3:33 PM
Reposted by Yoav Artzi
Hi all, I will be at #NeurIPS2025 to present my work on stress-testing looooooong visual reasoning with KnotGym🥨
Let's talk, whether or not your VLM can see 14 million possible futures like Doctor Strange
November 28, 2025 at 4:08 PM
Reposted by Yoav Artzi
COLM is going to San Francisco for 2026!

🗓️Dates: October 6-9, 2026
🏨Venue: Hilton San Francisco Union Square

Website and CFPs for papers and workshops coming up soon!
November 11, 2025 at 7:30 PM
This maybe runs counter to the original intention of just indexing the chaos to make it accessible. I guess that ideal of search softened a long time ago
November 10, 2025 at 3:57 PM
That's definitely part of it, though this dynamic has a deeper history. Search engine indexing also seems just easier, so companies opt for it, even pre AI-overview-everything
November 10, 2025 at 3:57 PM
Re peer-rev --> pre-print servers: arXiv is a simple, uniform place to store papers. Indexing engines love it, so if you want something to be searchable, nothing is better. To make things worse, at times it seems like journals/proceedings almost play a game of hide-and-seek with PDFs
November 10, 2025 at 3:50 PM
Re position papers: I don't think anyone can deny how effective some of these papers became for citation counts
November 10, 2025 at 3:50 PM
Is this all just a big practical joke for ChatGPT? I have been told god doesn't play dice with the world, but I guess AGI does :)
November 6, 2025 at 8:52 PM
It's a Thursday though ....
November 5, 2025 at 2:17 PM
All available here:
lm-class.org

ChangeLog here:
lm-class.org/CHANGELOG.md
LM-class
LM-class is an education resource for contemporary language modeling, broadly construed.
lm-class.org
November 3, 2025 at 3:54 PM
Pushed a big update to LM-class (v2025.2) -- this second version makes it a much more mature resource

Many refinements of lecture slides + significant improvements to the assignments

Many thanks to @ch272h.bsky.social, Yilun Hua, and Shankar Padmanabhan for their work on the assignments
November 3, 2025 at 3:54 PM
This kind of ad-hoc adaptation is hard for LLMs in general, but you can post-train for it to some degree
arxiv.org/abs/2508.06482

I suspect contemporary ASR models have the same backbone, so this may be applicable there too

More broadly, there is a lot of interesting stuff to do in this space of adaptation
Post-training for Efficient Communication via Convention Formation
Humans communicate with increasing efficiency in multi-turn interactions, by adapting their language and forming ad-hoc conventions. In contrast, prior work shows that LLMs do not naturally show this ...
arxiv.org
November 3, 2025 at 3:50 PM
I am potentially recruiting a postdoctoral fellow through this program. If interested, name me as a mentor, and ping me to let me know that you are applying! The process includes some sort of interview, so I can try to squeeze in a few of these in advance (it will help a lot!)
Cornell is recruiting for multiple postdoctoral positions in AI as part of two programs: Empire AI Fellows and Foundational AI Fellows. Positions are available in NYC and Ithaca.

Deadline for full consideration is Nov 20, 2025!
academicjobsonline.org/ajo/jobs/30971
October 28, 2025 at 6:46 PM
Reposted by Yoav Artzi
Cornell (NYC and Ithaca) is recruiting AI postdocs, apply by Nov 20, 2025! If you're interested in working with me on technical approaches to responsible AI (e.g., personalization, fairness), please email me.

academicjobsonline.org/ajo/jobs/30971
Cornell University, Empire AI Fellows Program
Job #AJO30971, Postdoctoral Fellow, Empire AI Fellows Program, Cornell University, New York, New York, US
academicjobsonline.org
October 28, 2025 at 6:19 PM
Wild
October 28, 2025 at 1:51 PM
There's the legit gaming, which is just optimizing for the metrics and breaking them. Then there's the really fake stuff, like citation rings. You would think citations translate to bitcoins, given the level of creativity and effort people put into it
October 27, 2025 at 6:41 PM
The top citer has >1k papers, with a PhD from 2007. That's one hell of a steady rate ¯\_(ツ)_/¯
October 27, 2025 at 6:39 PM
It's pretty crazy how the entire citation game has been manipulated. It's enough to take a quick look at Semantic Scholar for Bengio, whom GScholar just credited with 1M citations. SScholar gives 0.5M, but it's not only the number, it's the top citers
October 27, 2025 at 6:36 PM