stephenekhansen.bsky.social
@stephenekhansen.bsky.social
More generally, establishing procedures for valid inference in the growing world of AI-generated indicators is a major future challenge.

arxiv.org/abs/2402.15585
Inference for Regression with Variables Generated by AI or Machine Learning
It has become common practice for researchers to use AI-powered information retrieval algorithms or other machine learning methods to estimate variables of economic interest, then use these estimates ...
December 12, 2024 at 10:44 AM
We also show that an IV strategy that uses a human-labeled sample to purge the measurement error in generated variables works poorly when the number of labels is small relative to the amount of unlabeled data.
December 12, 2024 at 10:44 AM
We provide an illustration of how bias correction increases the estimated impact of remote work on wages across occupations.
December 12, 2024 at 10:44 AM
We provide a simple bias correction formula that applied researchers can easily use. This restores valid inference and has quantitatively important effects even when the AI/LLM is extremely accurate.
December 12, 2024 at 10:44 AM
We consider the realistic case where algorithms become more precise as the sample size increases. In this setting, **point estimates are biased** but **standard errors are correct**.

This is the opposite of the typical generated regressor problem.
December 12, 2024 at 10:44 AM
Suppose we treat an AI-generated variable as "data" in a regression model.

One intuition is that measurement error biases coefficient estimates. Another is that ignoring uncertainty biases standard errors. Which is it?
December 12, 2024 at 10:44 AM
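
The first intuition, that measurement error in a generated variable biases the coefficient, can be checked with a small simulation. This is only a rough sketch under assumptions that are not from the paper: a binary regressor, a hypothetical classifier that flips each label independently with a fixed probability `flip_rate`, and a plain OLS slope. It does not implement the paper's bias correction formula, and it holds accuracy fixed rather than letting it improve with the sample size.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def simulate_slopes(n=2000, flip_rate=0.05, beta=1.0, reps=500, seed=0):
    """Average OLS slope using the true regressor vs. a noisy generated one.

    Assumption (illustrative only): the generated label equals the true
    binary label except that it is flipped with probability `flip_rate`.
    """
    rng = np.random.default_rng(seed)
    true_slopes = np.empty(reps)
    gen_slopes = np.empty(reps)
    for r in range(reps):
        x = rng.integers(0, 2, size=n).astype(float)  # true binary variable
        flips = rng.random(n) < flip_rate             # classifier mistakes
        x_hat = np.where(flips, 1.0 - x, x)           # generated variable
        y = beta * x + rng.normal(size=n)             # outcome depends on true x
        true_slopes[r] = ols_slope(x, y)
        gen_slopes[r] = ols_slope(x_hat, y)
    return true_slopes.mean(), gen_slopes.mean()


if __name__ == "__main__":
    true_mean, gen_mean = simulate_slopes()
    print(f"mean slope with true x:      {true_mean:.3f}")
    print(f"mean slope with generated x: {gen_mean:.3f}")
```

With a balanced binary regressor and symmetric flips, the slope on the generated variable shrinks toward zero by a factor of roughly `1 - 2 * flip_rate`, so even a classifier that is 95% accurate gives a visibly attenuated estimate in this toy setup.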