Vojtech Cahlik
vojtechcahlik.bsky.social
I am a freelance software & machine learning engineer and a PhD student at the Czech Technical University in Prague.
This paper compares the capabilities of LLMs across benchmarks by assigning Elo-style scores to individual models as well as to the benchmarks themselves. According to the authors, this can be used to measure long-term progress in AI research:
epoch.ai/blog/a-roset...
A Rosetta Stone for AI benchmarks
Most benchmarks saturate too quickly to study long-run AI trends. We solve this using a statistical framework that stitches benchmarks together, with big implications for algorithmic progress and AI f...
December 5, 2025 at 6:20 PM
Dan Hendrycks et al. have released an interesting evaluation of present-day AI capabilities on the path to human-level AI. This is another piece of evidence that the AI revolution has yet to slow down.

www.agidefinition.ai
October 30, 2025 at 7:34 AM
Historical breakthroughs in NLP:

1950: The Turing Test is proposed
2017: Attention Is All You Need
2022: Release of ChatGPT
2025: A functional 5M-parameter LLM built with Minecraft redstone

www.youtube.com/watch?v=VaeI...
I built ChatGPT with Minecraft redstone!
YouTube video by sammyuri
October 10, 2025 at 3:24 PM
I'm happy to speculate that our general technique for grounding explanations in LLM reasoning, presented at last week's XAI 2025 conference, could pave the way for finally cracking natural language explanations.
arxiv.org/abs/2503.11248
Reasoning-Grounded Natural Language Explanations for Language Models
We propose a large language model explainability technique for obtaining faithful natural language explanations by grounding the explanations in a reasoning process. When converted to a sequence of to...
July 14, 2025 at 10:08 AM