Jesper N. Wulff
@jnwulff.bsky.social

Professor @AarhusUni doing research on organizational research methods and teaching deep neural networks in our MSc BI program. https://sites.google.com/view/jesperwulff/bio


Yes!!
There still seems to be a lot of confusion about significance testing in psych. No, p-values *don’t* become useless at large N. This flawed point also used to be framed as "too much power". But power isn't the problem – it's 1) unbalanced error rates and 2) the (lack of a) SESOI. 1/ >
But here's the thing: p-values and significance become useless at such large sample sizes. When you're dividing the coefficient by the SE and the sample size is in the tens of thousands, EVERYTHING IS SIGNIFICANT. All you're testing is whether the coefficient is different from zero.

Reposted by Jesper Wulff

New blog post on Gelman's recent claim that Type S and M errors are intended as a 'rhetorical tool', and on whether I was wrong to believe they were recommended more routinely in our recent preprint criticizing the idea of Type S and M errors. daniellakens.blogspot.com/2025/09/type...
Type S and M errors as a “rhetorical tool”
We recently posted a preprint criticizing the idea of Type S and M errors ( https://osf.io/2phzb_v1 ). From our abstract: “While these conce...
daniellakens.blogspot.com

What corresponds to the Z-test in this analogy? If the P-curve is the W-test then what is the Z-test?

Reposted by Jesper Wulff

More examples of faked institutional email addresses from @deevybee.bsky.social here deevybee.blogspot.com/2022/10/what...

9  Equivalence Testing and Interval Hypotheses – Improving Your Statistical Inferences share.google/tZRu9HekIBdY...
This open educational resource contains information to improve statistical inferences, design better experiments, and report scientific research more transparently.
share.google

Absolutely! I'm planning on getting MET into the stats curriculum in our undergrad business adm program. My favorite resource is Lakens' online book.

If it makes sense to test a hypothesis, do minimum effect testing and/or set alpha as a function of sample size.
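
A rough sketch of what that advice could look like in practice, assuming a one-sample z-test on a standardized effect with unit variance. The function names, the SESOI of 0.1, and the square-root rule for shrinking alpha are illustrative assumptions, not something taken from the posts above.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def nil_p(d_hat, n):
    """Two-sided p-value for H0: d = 0 (z-test, unit variance assumed)."""
    z = d_hat * sqrt(n)
    return 2 * (1 - Z.cdf(abs(z)))


def minimum_effect_p(d_hat, n, sesoi):
    """One-sided p-value for H0: d <= sesoi against H1: d > sesoi."""
    z = (d_hat - sesoi) * sqrt(n)
    return 1 - Z.cdf(z)


def alpha_for_n(n, alpha_ref=0.05, n_ref=100):
    """Hypothetical rule: lower alpha in proportion to sqrt(n / n_ref)."""
    return alpha_ref / sqrt(n / n_ref)


if __name__ == "__main__":
    d_hat, n, sesoi = 0.02, 50_000, 0.1  # tiny observed effect, large sample
    print("nil-null p:       ", nil_p(d_hat, n))
    print("minimum-effect p: ", minimum_effect_p(d_hat, n, sesoi))
    print("alpha scaled to n:", alpha_for_n(n))
```

With a tiny observed effect and N in the tens of thousands, the nil-null test rejects while the minimum-effect test does not, which is the point about the SESOI. The scaled alpha simply gets stricter as N grows.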
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825

Reposted by Jesper Wulff

"OpenAI is making “small steps that are good, but I don’t think we’re anywhere near where we need to be”, says Mark Steyvers, a cognitive science and AI researcher at UC Irvine. “It’s not frequent enough that GPT says ‘I don’t know’.”" www.nature.com/articles/d41...
Can researchers stop AI making up citations?
OpenAI’s GPT-5 hallucinates less than previous models do, but cutting hallucination completely might prove impossible.
www.nature.com
➡️ Deadline approaching—only one month left to send in your papers and presentation proposals for #CDSM2025!

🚨 𝗖𝗮𝗹𝗹 𝗳𝗼𝗿 𝗣𝗮𝗽𝗲𝗿𝘀: 𝗖𝗮𝘂𝘀𝗮𝗹 𝗗𝗮𝘁𝗮 𝗦𝗰𝗶𝗲𝗻𝗰𝗲 𝗠𝗲𝗲𝘁𝗶𝗻𝗴 𝟮𝟬𝟮𝟱 🚨
📅 𝗡𝗼𝘃 𝟭𝟮–𝟭𝟯, 𝟮𝟬𝟮𝟱 (𝗩𝗶𝗿𝘁𝘂𝗮𝗹)
📥 Submission Deadline: 𝗦𝗲𝗽𝘁 𝟯𝟬, 𝟮𝟬𝟮𝟱
Ever stared at a table of regression coefficients & wondered what you're doing with your life?

Very excited to share this gentle introduction to another way of making sense of statistical models (w @vincentab.bsky.social)
Preprint: doi.org/10.31234/osf...
Website: j-rohrer.github.io/marginal-psy...

Reposted by Jesper Wulff

"Being Bayesian in a Frequentist World"

New post on "Bayesian dynamic borrowing" in R 📚

Link 👇
If you are preparing your bachelor statistics course and would like to add optional material for students to better understand statistics on a conceptual level (see topics in the screenshot), my free textbook provides a state-of-the-art overview. lakens.github.io/statistical_...
My video about how LLMs are not search engines has led to many, MANY comments telling me that I should be using Perplexity. Some insisting that Perplexity does not hallucinate.

Out of a list of 26 papers it just provided me (in "Research" mode) 4 were real. FOUR. 85% hallucination rate.
TIL the original paper describing CRISPR, by Francisco Mojica, was rejected by 4 journals and took 2 years to be published
Just in case there was any doubt, ChatGPT 5.0 still makes up completely random citations that don't exist and should not be used for literature search.
‼️Cool new paper‼️

Finds that journal data policies in psychology boost sharing statements to ~100%, but only about half of datasets are complete, understandable, reusable.

Open: open.lnu.se/index.php/me...
5. Most frequentist methods are just *fine* and there's no need to always go full luxury bayesian in every application.

Reposted by Jesper Wulff

When power is derived from lies, data become the enemy.
www.nytimes.com/2025/08/01/b...
Trump, Claiming Weak Jobs Numbers Were ‘Rigged,’ Fires Labor Official
www.nytimes.com

Will the videos be released to a broad audience after the conference?

Reposted by Jesper Wulff

Inspiring PDW on using sensitivity analysis in empirical management research. My contribution is to present the sensemakr package by Cinelli & Hazlett (2020) for observational designs. Thanks a lot to the organizers for putting this fantastic session together. #AOM2025
I’ve long used FiveThirtyEight’s interactive “Hack Your Way To Scientific Glory” to illustrate the idea of p-hacking when I teach statistics. But ABC/Disney killed the site earlier this month :(

So I made my own with #rstats and Observable and #QuartoPub ! stats.andrewheiss.com/hack-your-way/
In my latest (and last!) column for Science’s Expert Voices series, I write about the reasons behind AI chatbots’ “deceptive” behaviors (and why Claude threatened a fictional CEO with blackmail).

www.science.org/doi/10.1126/...
Why AI chatbots lie to us
A few weeks ago, a colleague of mine needed to collect and format some data from a website, and he asked the latest version of Anthropic’s generative AI system, Claude, for help. Claude cheerfully agr...
www.science.org
This is fascinating: www.reddit.com/r/OpenAI/s/I...

Someone “worked on a book with ChatGPT” for weeks and then sought help on Reddit when they couldn’t download the file. Redditors helped them realize ChatGPT had just been roleplaying/lying and there was no file/book…
From the OpenAI community on Reddit
Explore this post and more from the OpenAI community
www.reddit.com
Using time series graphs to make causal claims be like

Reposted by Jesper Wulff

The people call and I answer.

Here are my thoughts on that developer RCT and the "AI slows down developers" claim.

www.fightforthehuman.com/are-develope...
Are developers slowed down by AI? Evaluating an RCT (?) and what it tells us about developer productivity
Seven different people texted or otherwise messaged me about this study which claims to measure “the impact of early-2025 AI on experienced open-source developer productivity.” You know, when I decide...
www.fightforthehuman.com

Thick vs thin causality
It's a distinction from @kph3k.bsky.social's Genetic Lottery. I used it before even without talking about genetics; it catches a lot of the misconceptions that students have about what it means to call something a cause. Those misconceptions also led to a blog post: www.the100.ci/2024/06/26/s...
After all these reports of authors adding language instructions for LLM reviews in their papers I wanted to check this myself and I downloaded the .tex source from one of these papers.

Here is an example.
(I will not share the identity of the paper)