utherwayn.bsky.social
@utherwayn.bsky.social
Just a bunny loving game developer
@simonwillison.net I'm not trying to be an LLM denier here, but man, this paper hit home for me as someone who isn't an ML person, and I'd love to see your take on it.

[2506.21521] Potemkin Understanding in Large Language Models share.google/W9cKIwYoWI5W...

Coherence seems like an important metric.
Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This...
June 28, 2025 at 7:20 PM