Vojtech Cahlik

@vojtechcahlik.bsky.social

2 followers 11 following 7 posts

I am a freelance software & machine learning engineer and a PhD student at the Czech Technical University in Prague.

This paper compares the capabilities of LLMs across benchmarks by assigning Elo-style scores both to individual models and to the benchmarks themselves. According to the authors, this can be used to measure long-term progress in AI research:
epoch.ai/blog/a-roset...

A Rosetta Stone for AI benchmarks

Most benchmarks saturate too quickly to study long-run AI trends. We solve this using a statistical framework that stitches benchmarks together, with big implications for algorithmic progress and AI f...

December 5, 2025 at 6:20 PM

Dan Hendrycks et al. have released an interesting evaluation of present-day AI capabilities on the path to human-level AI. This is another piece of evidence that the AI revolution has yet to slow down.

www.agidefinition.ai

October 30, 2025 at 7:34 AM

Historical breakthroughs in NLP:

1950: The Turing Test is proposed
2017: Attention Is All You Need
2022: Release of ChatGPT
2025: A functional 5M-parameter LLM built with Minecraft redstone

www.youtube.com/watch?v=VaeI...

I built ChatGPT with Minecraft redstone!

YouTube video by sammyuri

October 10, 2025 at 3:24 PM

I'm happy to speculate that our general technique for grounding explanations in LLM reasoning, presented at last week's XAI 2025 conference, could pave the way for finally cracking natural language explanations.
arxiv.org/abs/2503.11248

Reasoning-Grounded Natural Language Explanations for Language Models

We propose a large language model explainability technique for obtaining faithful natural language explanations by grounding the explanations in a reasoning process. When converted to a sequence of to...

July 14, 2025 at 10:08 AM