Daniel Paleka
@dpaleka.bsky.social
190 followers 47 following 53 posts
ai safety researcher | phd ETH Zurich | https://danielpaleka.com
dpaleka.bsky.social
Benchmarks can reward strategic gambling over calibrated forecasting when optimizing for ranking performance.

"Bet everything" on one scenario beats careful probability estimation for maximizing the chance of ranking #1 on the leaderboard. (6/7)
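A minimal sketch of the ranking effect described in this post, not the authors' experiment. Everything in it is an illustrative assumption: the questions share one hidden scenario, the "calibrated" forecaster predicts the marginal 0.5, the "gambler" commits to one scenario with 0.8 on every question, and a field of 20 noisy forecasters fills the leaderboard. Scoring is the Brier score (lower is better).

```python
import random


def brier(preds, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(outcomes)


def simulate(n_trials=2000, n_questions=10, n_field=20, seed=0):
    """Compare mean Brier score and chance of ranking #1 for two strategies.

    All parameters and the data-generating process are hypothetical.
    """
    rng = random.Random(seed)
    wins = {"calibrated": 0, "gambler": 0}
    totals = {"calibrated": 0.0, "gambler": 0.0}

    for _ in range(n_trials):
        # One hidden scenario drives every question, so outcomes are correlated.
        scenario_a = rng.random() < 0.5
        p_yes = 0.8 if scenario_a else 0.2
        outcomes = [1 if rng.random() < p_yes else 0 for _ in range(n_questions)]

        scores = {
            # Does not know the scenario, so reports the marginal probability.
            "calibrated": brier([0.5] * n_questions, outcomes),
            # Bets on scenario A for every question.
            "gambler": brier([0.8] * n_questions, outcomes),
        }

        # Field of roughly calibrated forecasters with small independent noise.
        field = []
        for _ in range(n_field):
            preds = [min(1.0, max(0.0, 0.5 + rng.gauss(0, 0.05)))
                     for _ in range(n_questions)]
            field.append(brier(preds, outcomes))
        best_field = min(field)

        for name, score in scores.items():
            totals[name] += score
            rival = min(v for k, v in scores.items() if k != name)
            # Rank #1 means strictly beating everyone else on the leaderboard.
            if score < best_field and score < rival:
                wins[name] += 1

    return {
        name: {
            "mean_brier": totals[name] / n_trials,
            "p_rank_first": wins[name] / n_trials,
        }
        for name in wins
    }


if __name__ == "__main__":
    for name, stats in simulate().items():
        print(name, stats)
```

Under these assumptions the gambler has the worse average Brier score but tops the leaderboard far more often, since the calibrated forecaster is almost always edged out by some lucky member of the field.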
dpaleka.bsky.social
Model knowledge cutoffs are guidelines about reliability, not guarantees of no information thereafter. GPT-4o, when nudged, can reveal knowledge beyond its stated Oct 2023 cutoff. (5/7)
dpaleka.bsky.social
Date-restricted search leaks future knowledge. Searching pre-2019 articles about “Wuhan” returns results abnormally biased towards the Wuhan Institute of Virology — an association that only emerged later. (4/7)
dpaleka.bsky.social
The time traveler problem: When forecasting "Will civil war break out in Sudan by 2030?", you can deduce the answer is "yes" - otherwise they couldn't grade you yet.

We find that backtesting in existing papers often has similar logical issues that leak information about answers. (3/7)
dpaleka.bsky.social
Forecasting evaluation is tricky. The gold standard is asking about future events; but that takes months/years.

Instead, researchers use "backtesting": questions where we can evaluate predictions now, but the model has no information about the outcome ... or so we think (2/7)
dpaleka.bsky.social
How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations.

We identify key issues with forecasting evaluations 🧵 (1/7)
dpaleka.bsky.social
why is it that whenever i see survivorship bias on my timeline it already has the red-dotted plane in the replies?
dpaleka.bsky.social
OpenAI and DeepMind should have entries at Eurovision too
dpaleka.bsky.social
3.7 sonnet: *hands behind back* yes the tests do pass. why do you ask. what did you hear

4o: yes you are Jesus Christ's brother. now go. Nanjing awaits

o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream
dpaleka.bsky.social
Of course, we don't have the old chatgpt-4o API endpoint, so we can't see whether the prompt is fully at fault or there was also a model update.
dpaleka.bsky.social
The sycophancy effect on controversial binary options is much smaller than what you would assume from the overall positive vibe towards the user. On most such statements, models don't actually state they agree with the user.
dpaleka.bsky.social
Quick sycophancy eval: comparing the two recent OpenAI ChatGPT system prompts, it is clear last week's prompt moves other models towards sycophancy too, while the current prompt makes them more disagreeable.
dpaleka.bsky.social
i was today years old when i realized the grammatical plural of anecdote is anecdotes, not anecdata. i dislike this finding
dpaleka.bsky.social
we are so lucky that pathogens, as opposed to political and religious memes, do not organize coalitions of hosts against non-hosts as an instrumental objective
dpaleka.bsky.social
oh that's cool. it would be interesting to draw a matrix of how aware the various models are of models other than themselves, in the sense that they consider them coherent entities similar to their own self-perception
dpaleka.bsky.social
fixed games such as blackjack can't be optimized much because the rules don't change. meanwhile, a casino gets unlimited iteration on slot machines and the reward signal is as good as it gets
dpaleka.bsky.social
are slot machines and the like so profitable because simplistic gambling is inherently very addictive, or because there has been a legible financial incentive for an entire industry to spend decades optimizing them to be as addictive as possible?
dpaleka.bsky.social
TIL the concept of *epistemic hell*. standard Joseph Henrich example: in the ancestral environment, hygienic and food prep rituals determine survival, but no hunter-gatherer can possibly explain why. hence genetic selection for acceptance of religious rituals and against reasoning
dpaleka.bsky.social
Why do meeting transcription apps (Fireflies, Granola) require Google Workspace accounts?
dpaleka.bsky.social
what are you doing Claude i thought we were friends
dpaleka.bsky.social
the rate of people's familiarity with Scaling Scaling Laws with Board Games over time is starting to look like the plot from Scaling Scaling Laws with Board Games
dpaleka.bsky.social
go do something that can fail