Jesper N. Wulff
@jnwulff.bsky.social
660 followers 950 following 37 posts
Professor @AarhusUni doing research on organizational research methods and teaching deep neural networks in our MSc BI program. https://sites.google.com/view/jesperwulff/bio
Reposted by Jesper N. Wulff
lakens.bsky.social
New blog post on Gelman's recent claim that Type S and M errors are intended as a 'rhetorical tool', and on whether I was wrong to believe they were recommended more routinely in our recent preprint criticizing the idea of Type S and M errors. daniellakens.blogspot.com/2025/09/type...
Type S and M errors as a “rhetorical tool”
We recently posted a preprint criticizing the idea of Type S and M errors ( https://osf.io/2phzb_v1 ). From our abstract: “While these conce...
daniellakens.blogspot.com
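The Type S/M framing under debate above can be made concrete with a small simulation in the spirit of Gelman & Carlin's retrodesign calculation. This is a sketch with hypothetical numbers (true effect 0.1, standard error 0.3), not code from the preprint or the blog post:

```python
import random

random.seed(1)

def retrodesign(true_effect, se, sims=100_000):
    """Monte Carlo version of the Type S / Type M calculation:
    simulate estimates ~ Normal(true_effect, se), keep the significant
    ones, and ask how often their sign is wrong (Type S) and how
    exaggerated their magnitude is on average (Type M)."""
    z_crit = 1.959963984540054  # two-sided 5% critical value
    estimates = [random.gauss(true_effect, se) for _ in range(sims)]
    significant = [e for e in estimates if abs(e) > z_crit * se]
    power = len(significant) / sims
    type_s = sum(1 for e in significant if e * true_effect < 0) / len(significant)
    type_m = sum(abs(e) for e in significant) / len(significant) / abs(true_effect)
    return power, type_s, type_m

power, type_s, type_m = retrodesign(true_effect=0.1, se=0.3)
print(power, type_s, type_m)  # low power, noticeable Type S, large exaggeration
```

With these hypothetical inputs the study is badly underpowered, so significant estimates frequently have the wrong sign and overestimate the true effect several-fold — the pattern the Type S/M framework is meant to highlight.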
Reposted by Jesper N. Wulff
joachimbaumann.bsky.social
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825
We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation".
We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks.
For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations.
Then, we collect 13 million LLM annotations across plausible LLM configurations.
These annotations feed into 1.4 million regressions testing the hypotheses. 
For a hypothesis with no true effect (ground truth p > 0.05), different LLM configurations yield conflicting conclusions.
Checkmarks indicate correct statistical conclusions matching ground truth; crosses indicate LLM hacking, i.e., incorrect conclusions due to annotation errors.
Across all experiments, LLM hacking occurs in 31-50% of cases even with highly capable models.
Since minor configuration changes can flip scientific conclusions from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.
jnwulff.bsky.social
What corresponds to the Z-test in this analogy? If the P-curve is the W-test then what is the Z-test?
Reposted by Jesper N. Wulff
jnwulff.bsky.social
Absolutely! I'm planning on getting MET (minimum-effect testing) into the stats curriculum in our undergrad business adm program. My favorite resource is Lakens' online book.
jnwulff.bsky.social
If it makes sense to test a hypothesis, do minimum effect testing and/or set alpha as a function of sample size.
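One concrete version of "set alpha as a function of sample size" is to standardize the threshold to a reference sample size, a rule often attributed to I. J. Good. The scaling below is one choice among several, shown only as a sketch:

```python
import math

def alpha_for_n(n, alpha0=0.05, n0=100):
    """Shrink the significance threshold as n grows, standardizing
    to a reference sample size n0: alpha_n = alpha0 * sqrt(n0 / n)."""
    return alpha0 * math.sqrt(n0 / n)

for n in (100, 1000, 10000):
    print(n, round(alpha_for_n(n), 4))
# 100 -> 0.05, 1000 -> 0.0158, 10000 -> 0.005
```

The point of such a rule is that with very large samples, tiny and practically meaningless effects clear a fixed 0.05 bar; scaling alpha down keeps the evidential standard roughly comparable across sample sizes.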
Reposted by Jesper N. Wulff
mvugt.bsky.social
"OpenAI is making “small steps that are good, but I don’t think we’re anywhere near where we need to be”, says Mark Steyvers, a cognitive science and AI researcher at UC Irvine. “It’s not frequent enough that GPT says ‘I don’t know’.”" www.nature.com/articles/d41...
Can researchers stop AI making up citations?
OpenAI’s GPT-5 hallucinates less than previous models do, but cutting hallucination completely might prove impossible.
www.nature.com
Reposted by Jesper N. Wulff
p-hunermund.com
➡️ Deadline approaching—only one month left to send in your papers and presentation proposals for #CDSM2025!

🚨 𝗖𝗮𝗹𝗹 𝗳𝗼𝗿 𝗣𝗮𝗽𝗲𝗿𝘀: 𝗖𝗮𝘂𝘀𝗮𝗹 𝗗𝗮𝘁𝗮 𝗦𝗰𝗶𝗲𝗻𝗰𝗲 𝗠𝗲𝗲𝘁𝗶𝗻𝗴 𝟮𝟬𝟮𝟱 🚨
📅 𝗡𝗼𝘃 𝟭𝟮–𝟭𝟯, 𝟮𝟬𝟮𝟱 (𝗩𝗶𝗿𝘁𝘂𝗮𝗹)
📥 Submission Deadline: 𝗦𝗲𝗽𝘁 𝟯𝟬, 𝟮𝟬𝟮𝟱
Reposted by Jesper N. Wulff
dingdingpeng.the100.ci
Ever stared at a table of regression coefficients & wondered what you're doing with your life?

Very excited to share this gentle introduction to another way of making sense of statistical models (w @vincentab.bsky.social)
Preprint: doi.org/10.31234/osf...
Website: j-rohrer.github.io/marginal-psy...
Models as Prediction Machines: How to Convert Confusing Coefficients into Clear Quantities

Abstract
Psychological researchers usually make sense of regression models by interpreting coefficient estimates directly. This works well enough for simple linear models, but is more challenging for more complex models with, for example, categorical variables, interactions, non-linearities, and hierarchical structures. Here, we introduce an alternative approach to making sense of statistical models. The central idea is to abstract away from the mechanics of estimation, and to treat models as “counterfactual prediction machines,” which are subsequently queried to estimate quantities and conduct tests that matter substantively. This workflow is model-agnostic; it can be applied in a consistent fashion to draw causal or descriptive inference from a wide range of models. We illustrate how to implement this workflow with the marginaleffects package, which supports over 100 different classes of models in R and Python, and present two worked examples. These examples show how the workflow can be applied across designs (e.g., observational study, randomized experiment) to answer different research questions (e.g., associations, causal effects, effect heterogeneity) while facing various challenges (e.g., controlling for confounders in a flexible manner, modelling ordinal outcomes, and interpreting non-linear models).
Figure illustrating model predictions. On the x-axis, the predictor: annual gross income in euros. On the y-axis, the outcome: predicted life satisfaction. A solid line marks the curve of predictions, on which individual data points are marked as model-implied outcomes at incomes of interest. Comparing two such predictions gives us a comparison. We can also fit a tangent to the line of predictions, which illustrates the slope at any given point of the curve.

A figure illustrating various ways to include age as a predictor in a model. On the x-axis, age (the predictor); on the y-axis, the outcome (model-implied importance of friends, including confidence intervals).

Illustrated are
1. age as a categorical predictor, resulting in predictions that bounce around a lot with wide confidence intervals,
2. age as a linear predictor, which forces a straight line through the data points and has a very tight confidence band, and
3. age splines, which lie somewhere in between: they smoothly follow the data but have more uncertainty than the straight line.
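The "prediction machine" workflow from the preprint above can be sketched without any special package: fit a model whose coefficients are hard to read directly, then query it for predictions, a comparison, and a slope. The data, variable names, and income values here are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: life satisfaction as a nonlinear function of income.
income = rng.uniform(10_000, 100_000, 500)
satisfaction = 2 + 1.2 * np.log(income) + rng.normal(0, 0.5, 500)

# Fit a cubic polynomial -- its raw coefficients are hard to interpret.
model = np.polynomial.Polynomial.fit(income, satisfaction, 3)

# Query the fitted model as a counterfactual prediction machine:
# 1) model-implied outcomes at incomes of interest,
# 2) a comparison (difference between two predictions),
# 3) a slope (numerical derivative, the tangent in the figure).
lo, hi = model(30_000), model(60_000)
comparison = hi - lo
eps = 1.0
slope_at_30k = (model(30_000 + eps) - model(30_000 - eps)) / (2 * eps)
print(comparison, slope_at_30k)
```

The same three queries (predictions, comparisons, slopes) apply unchanged to logistic, ordinal, or hierarchical models, which is the model-agnostic point the abstract makes; the marginaleffects package automates exactly this, with standard errors.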
Reposted by Jesper N. Wulff
theissbendixen.bsky.social
"Being Bayesian in a Frequentist World"

New post on "Bayesian dynamic borrowing" in R 📚

Link 👇
Reposted by Jesper N. Wulff
lakens.bsky.social
If you are preparing your bachelor statistics course and would like to add optional material for students to better understand statistics on a conceptual level (see topics in the screenshot) my free textbook provides a state of the art overview. lakens.github.io/statistical_...
Reposted by Jesper N. Wulff
cfiesler.bsky.social
My video about how LLMs are not search engines has led to many, MANY comments telling me that I should be using Perplexity. Some insisting that Perplexity does not hallucinate.

Out of a list of 26 papers it just provided me (in "Research" mode), only 4 were real. FOUR. An 85% hallucination rate.
Reposted by Jesper N. Wulff
scientificdiscovery.dev
TIL the original paper describing CRISPR, by Francisco Mojica, was rejected by 4 journals and took 2 years to be published
CRISPR as a microbial immune system

In 2003, Mojica wrote the first paper suggesting that CRISPR was an innate microbial immune system. The paper was rejected by a series of high-profile journals, including Nature, Proceedings of the National Academy of Sciences, Molecular Microbiology and Nucleic Acids Research, before finally being accepted by Journal of Molecular Evolution in February, 2005.[3][4]
Reposted by Jesper N. Wulff
carlbergstrom.com
Just in case there was any doubt, ChatGPT 5.0 still makes up completely random citations that don't exist and should not be used for literature search.
1. David Ackerly (UC Berkeley)

While his most-cited work is on leaf size and SLA, he also wrote explicitly about plasticity in leaf traits, including shape, in the context of ecological strategies.

Example: Ackerly (1997), “Allocation, leaf display, and growth in fluctuating light environments: A comparative study of deciduous and evergreen species” (Oecologia). This emphasizes how plasticity in leaf traits mediates adaptation to light.
Reposted by Jesper N. Wulff
carlislerainey.bsky.social
‼️Cool new paper‼️

Finds that journal data policies in psychology boost sharing statements to ~100%, but only about half of the datasets are complete, understandable, and reusable.

Open: open.lnu.se/index.php/me...
Reposted by Jesper N. Wulff
zetaof1.bsky.social
5. Most frequentist methods are just *fine* and there's no need to always go full-luxury Bayesian in every application.
jnwulff.bsky.social
Will the videos be released to a broad audience after the conference?
Reposted by Jesper N. Wulff
p-hunermund.com
Inspiring PDW on using sensitivity analysis in empirical management research. My contribution is to present the sensemakr package by Cinelli & Hazlett (2020) for observational designs. Thanks a lot to the organizers for putting this fantastic session together. #AOM2025
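The core quantity that sensemakr reports can be computed by hand from standard regression output. Below is a sketch of the robustness value formula from Cinelli & Hazlett (2020), not the package itself; the t-statistic and degrees of freedom are hypothetical:

```python
import math

def partial_r2(t, df):
    """Partial R2 of the treatment with the outcome, from its t-statistic."""
    return t**2 / (t**2 + df)

def robustness_value(t, df, q=1.0):
    """Robustness value RV_q (Cinelli & Hazlett 2020): the strength of
    association an unobserved confounder would need with both treatment
    and outcome to reduce the estimate by 100*q percent."""
    f = q * abs(t) / math.sqrt(df)
    return 0.5 * (math.sqrt(f**4 + 4 * f**2) - f**2)

# Hypothetical regression output: t = 4.2 on 783 degrees of freedom.
print(round(partial_r2(4.2, 783), 4), round(robustness_value(4.2, 783), 4))
```

A robustness value of, say, 0.14 means a confounder would need to explain about 14% of the residual variance of both treatment and outcome to wipe out the estimate — a single number that makes the sensitivity discussion concrete.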
Reposted by Jesper N. Wulff
cfiesler.bsky.social
This is fascinating: www.reddit.com/r/OpenAI/s/I...

Someone “worked on a book with ChatGPT” for weeks and then sought help on Reddit when they couldn’t download the file. Redditors helped them realize ChatGPT had just been roleplaying/lying and there was no file/book…
Reposted by Jesper N. Wulff
khoavuumn.bsky.social
Using time series graphs to make causal claims be like
jnwulff.bsky.social
Thick vs thin causality
dingdingpeng.the100.ci
It's a distinction from @kph3k.bsky.social's Genetic Lottery. I used it before even without talking about genetics; it catches a lot of the misconceptions that students have about what it means to call something a cause. Those misconceptions also led to a blog post: www.the100.ci/2024/06/26/s...
Thick and Thin Causation In the course of ordinary social science and medicine, we are quite comfortable calling something a cause, even when (a) we don’t understand the mechanisms by which the cause exerts its effects, (b) the cause is probabilistically but not deterministically associated with effects, and (c) the cause is of uncertain portability across time and space. “All” that is required to assert that you have identified a cause is to demonstrate evidence that the average outcome for a group of people would have been different if they had experienced X instead of Not-X. And the most convincing evidence that you know what might have been is to assign people randomly to X or Not-X. (The word “all” is in scare quotes here, because as any scientist of human behavior and society knows, actually isolating the variable of interest from the web of potential confounds, so that one can make an inference about causation, turns out to be an incredibly difficult and delicate operation.) I’m going to call this a “thin” model of causation.

We can contrast the “thin” model of causation with the type of “thick” causation we see in monogenic genetic disorders or chromosomal abnormalities. Take Down’s syndrome, for instance. Down’s syndrome is defined by a single, deterministic, portable cause. To have three copies of chromosome 21, instead of two, is the necessary, sufficient, and sole cause of Down’s syndrome. The causal relationship between having three copies of chromosome 21 and Down’s is one-to-one, with the result that forward and reverse inferences work equally well. The cause of Down’s is chromosome 21 trisomy; the effect of chromosome 21 trisomy is Down’s. Having three copies of chromosome 21 doesn’t raise your probability of having Down’s; it is deterministic of the condition.
And this causal relationship operates as a “law of nature,” in the sense that we expect the trisomy-Down’s relationship to operate more or less in the same way, regardless of the social mili…