andrea wang
@andreawwenyi.bsky.social
1.9K followers 56 following 15 posts
phd @ cornell infosci https://andreawwenyi.github.io
Reposted by andrea wang
jennahgosciak.bsky.social
I am presenting a new 📝 “Bias Delayed is Bias Denied? Assessing the Effect of Reporting Delays on Disparity Assessments” at @facct.bsky.social on Thursday, with @aparnabee.bsky.social, Derek Ouyang, @allisonkoe.bsky.social, @marzyehghassemi.bsky.social, and Dan Ho. 🔗: arxiv.org/abs/2506.13735
(1/n)
"Bias Delayed is Bias Denied? Assessing the Effect of Reporting Delays on Disparity Assessments"

Conducting disparity assessments at regular time intervals is critical for surfacing potential biases in decision-making and improving outcomes across demographic groups. Because disparity assessments fundamentally depend on demographic information, their efficacy is limited by the availability and consistency of demographic identifiers. While prior work has considered the impact of missing data on fairness, little attention has been paid to the role of delayed demographic data. Delayed data, while eventually observed, might be missing at the critical point of monitoring and action -- and delays may be unequally distributed across groups in ways that distort disparity assessments. We characterize such impacts in healthcare, using electronic health records of over 5M patients across primary care practices in all 50 states. Our contributions are threefold. First, we document the high rate of race and ethnicity reporting delays in a healthcare setting and demonstrate widespread variation in the rates at which demographics are reported across different groups. Second, through a set of retrospective analyses using real data, we find that such delays affect disparity assessments, and hence the conclusions drawn, across a range of consequential healthcare outcomes, particularly at the more granular state and practice levels. Third, we find that conventional methods for imputing missing race have limited ability to mitigate the effects of reporting delays on the accuracy of timely disparity assessments. Our insights and methods generalize to many domains of algorithmic fairness where delays in the availability of sensitive information may confound audits, thus deserving closer attention within a pipeline-aware machine learning framework.

Figure contrasting a conventional, static approach to conducting disparity assessments with the analysis we conduct in this paper. Our analysis (1) uses comprehensive health data from over 1,000 primary care practices and 5 million patients across the U.S., (2) timestamped information on the reporting of race to measure delay, and (3) retrospective analyses of disparity assessments under varying levels of delay.
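A minimal sketch of the point-in-time comparison the abstract describes, using toy data and hypothetical column names (race, race_reported_date, outcome); this is not the authors' code, only an illustration of how records whose race is reported after the assessment date drop out of a timely disparity estimate.

```python
# Toy illustration (not the paper's code): race reporting delays can change a
# point-in-time disparity assessment. Column names are hypothetical.
import pandas as pd

def rate_gap(df: pd.DataFrame, group_a: str, group_b: str) -> float:
    """Difference in mean outcome rate between two demographic groups."""
    rates = df.groupby("race")["outcome"].mean()
    return rates.get(group_a, float("nan")) - rates.get(group_b, float("nan"))

# Toy records: race for some patients is only reported after the assessment date.
df = pd.DataFrame({
    "race":               ["white", "white", "black", "black", "black"],
    "race_reported_date": pd.to_datetime(
        ["2023-01-05", "2023-02-01", "2023-01-10", "2023-09-01", "2023-10-15"]),
    "outcome":            [1, 0, 1, 0, 0],
})

assessment_date = pd.Timestamp("2023-06-30")
timely = rate_gap(df[df["race_reported_date"] <= assessment_date], "white", "black")
eventual = rate_gap(df, "white", "black")
print(f"gap seen at assessment time: {timely:+.2f}; gap after all reports arrive: {eventual:+.2f}")
```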
Reposted by andrea wang
emmharv.bsky.social
I am so excited to be in 🇬🇷Athens🇬🇷 to present "A Framework for Auditing Chatbots for Dialect-Based Quality-of-Service Harms" by me, @kizilcec.bsky.social, and @allisonkoe.bsky.social, at #FAccT2025!!

🔗: arxiv.org/pdf/2506.04419
A screenshot of our paper's title, authors, and abstract:

Title: A Framework for Auditing Chatbots for Dialect-Based Quality-of-Service Harms
Authors: Emma Harvey, Rene Kizilcec, Allison Koenecke
Abstract: Increasingly, individuals who engage in online activities are expected to interact with large language model (LLM)-based chatbots. Prior work has shown that LLMs can display dialect bias, which occurs when they produce harmful responses when prompted with text written in minoritized dialects. However, whether and how this bias propagates to systems built on top of LLMs, such as chatbots, is still unclear. We conduct a review of existing approaches for auditing LLMs for dialect bias and show that they cannot be straightforwardly adapted to audit LLM-based chatbots due to issues of substantive and ecological validity. To address this, we present a framework for auditing LLM-based chatbots for dialect bias by measuring the extent to which they produce quality-of-service harms, which occur when systems do not work equally well for different people. Our framework has three key characteristics that make it useful in practice. First, by leveraging dynamically generated instead of pre-existing text, our framework enables testing over any dialect, facilitates multi-turn conversations, and represents how users are likely to interact with chatbots in the real world. Second, by measuring quality-of-service harms, our framework aligns audit results with the real-world outcomes of chatbot use. Third, our framework requires only query access to an LLM-based chatbot, meaning that it can be leveraged equally effectively by internal auditors, external auditors, and even individual users in order to promote accountability. To demonstrate the efficacy of our framework, we conduct a case study audit of Amazon Rufus, a widely-used LLM-based chatbot in the customer service domain. Our results reveal that Rufus produces lower-quality responses to prompts written in minoritized English dialects.
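A minimal sketch of the query-access audit pattern the abstract describes, under stated assumptions: `query_chatbot` and `quality_score` are hypothetical stand-ins (not from the paper) for the black-box chatbot and a quality-of-service rubric, and the toy usage at the end is for illustration only.

```python
# Sketch of a query-access dialect audit: compare response quality across paired
# prompts written in a standard vs. a minoritized dialect.
from statistics import mean
from typing import Callable

def audit_dialect_gap(
    paired_prompts: list[tuple[str, str]],   # (standard, minoritized) prompt pairs
    query_chatbot: Callable[[str], str],     # black-box query access only
    quality_score: Callable[[str], float],   # e.g. a 0-1 helpfulness rating
) -> float:
    """Mean quality gap; positive means lower quality for the minoritized dialect."""
    gaps = [
        quality_score(query_chatbot(standard)) - quality_score(query_chatbot(minoritized))
        for standard, minoritized in paired_prompts
    ]
    return mean(gaps)

# Toy stand-ins; replace with real chatbot calls and a real quality rubric.
toy_gap = audit_dialect_gap(
    [("Where is my package?", "Where my package at?")],
    query_chatbot=lambda p: f"echo: {p}",
    quality_score=lambda r: len(r) / 100,
)
```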
Reposted by andrea wang
johngmarks.com
Worth noting today that the entire budget of the NEH is about $200M.
emptywheel.bsky.social
According to acting DOD Comptroller Bryn McDonnell it'll cost $134M for the deployment of the Guard to Los Angeles.
Reposted by andrea wang
lucy3.bsky.social
I'm joining Wisconsin CS as an assistant professor in fall 2026!! There, I'll continue working on language models, computational social science, & responsible AI. 🌲🧀🚣🏻‍♀️ Apply to be my PhD student!

Before then, I'll postdoc for a year in the NLP group at another UW 🏔️ in the Pacific Northwest
Wisconsin-Madison's tree-filled campus, next to a big shiny lake.
A computer render of the interior of the new computer science, information science, and statistics building. A staircase crosses an open atrium with visibility across multiple floors.
Reposted by andrea wang
mariaa.bsky.social
Slightly paraphrasing @oms279.bsky.social during his talk at #COMPTEXT2025:

"The single most important use case for LLMs in sociology is turning unstructured data into structured data."

Discussing his recent work on codebooks, prompts, and information extraction: osf.io/preprints/so...
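A minimal sketch of the unstructured-to-structured pattern mentioned above; the codebook keys and the `call_llm` stub are hypothetical (not from the talk or preprint), and a real pipeline would add validation and retries.

```python
# Sketch: use an LLM to turn unstructured text into structured records.
import json

PROMPT = """You are coding sociological field notes.
Extract a JSON object with keys: "actor", "action", "location", "date".
Use null for anything not stated. Return only JSON.

Text: {text}"""

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call (OpenAI, Anthropic, a local model, etc.)."""
    raise NotImplementedError("wire this to your model API of choice")

def extract_record(text: str) -> dict:
    raw = call_llm(PROMPT.format(text=text))
    return json.loads(raw)   # in practice: validate keys and retry on parse errors
```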
Reposted by andrea wang
simonaliao.bsky.social
Hi everyone, I am excited to share our large-scale survey study of 800+ researchers, which reveals how researchers use and perceive LLMs as research tools, and how that usage and those perceptions differ across demographics.

See results in comments!

🔗 Arxiv link: arxiv.org/abs/2411.05025
LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions
The rise of large language models (LLMs) has led many researchers to consider their usage for scientific work. Some have found benefits using LLMs to augment or automate aspects of their research pipe...
arxiv.org
andreawwenyi.bsky.social
China is a nation with over a hundred minority languages and many ethnic groups. What does this say about China’s 21st century AI policy?
andreawwenyi.bsky.social
This suggests a break from China’s past stance of using inclusive language policy as a way to build a multiethnic nation. We see no evidence of socio-political pressure or carrots for Chinese AI groups to dedicate resources for linguistic inclusivity.
andreawwenyi.bsky.social
In fact, many LLMs from China fail to even recognize some lower resource Chinese languages such as Uyghur.
andreawwenyi.bsky.social
LLMs from China are highly correlated with Western LLMs in multilingual performance (0.93–0.99) on tasks such as reading comprehension.
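For concreteness, a toy sketch of the statistic being described: a Pearson correlation between two models' per-language benchmark scores. The language codes and score values below are made up for illustration; they are not the paper's numbers.

```python
# Toy example: correlation of per-language performance between two models.
import numpy as np

languages   = ["cmn", "yue", "ug", "bo", "fr", "en"]            # illustrative only
chinese_llm = np.array([0.88, 0.55, 0.30, 0.28, 0.82, 0.90])    # made-up scores
western_llm = np.array([0.85, 0.52, 0.33, 0.30, 0.84, 0.92])    # made-up scores

r = np.corrcoef(chinese_llm, western_llm)[0, 1]  # Pearson correlation
print(f"per-language performance correlation: {r:.2f}")
```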
andreawwenyi.bsky.social
[New preprint!] Do Chinese AI Models Speak Chinese Languages? Not really. Chinese LLMs like DeepSeek are better at French than Cantonese. Joint work with Unso Jo and @dmimno.bsky.social. Link to paper: arxiv.org/pdf/2504.00289
🧵
Reposted by andrea wang
sungkim.bsky.social
You’ve probably heard about how AI/LLMs can solve Math Olympiad problems (deepmind.google/discover/blo...).

So naturally, some people put it to the test — hours after the 2025 US Math Olympiad problems were released.

The result: They all sucked!
Reposted by andrea wang
travislloydphd.bsky.social
*NEW DATASET AND PAPER* (CHI2025): How are online communities responding to AI-generated content (AIGC)? We study this by collecting and analyzing the public rules of 300,000+ subreddits in 2023 and 2024. 1/
Reposted by andrea wang
cfiesler.bsky.social
hey it's that time of year again, when people start to wonder whether AIES is actually happening and when this year’s paper deadline might be if so! anyone know anything about the ACM/AAAI conference on AI Ethics & Society for 2025?

(I used to ask about this every year on Twitter haha.)
Reposted by andrea wang
dmimno.bsky.social
Best Student Paper at #AIES 2024 went to @andreawwenyi.bsky.social! Annotating gender-biased narratives in the courtroom is a complex, nuanced task with frequent subjective decision-making by legal experts. We asked: What do experts desire from a language model in this annotation process?
andreawwenyi.bsky.social
Lots of exciting open questions from this work, e.g. 1) The effect of pre-training and model architectures on representations of languages and 2) The applications of cross-lingual representations embedded in language models.
andreawwenyi.bsky.social
Embedding geometries are similar across model families and scales, as measured by canonical angles. XLM-R models are extremely similar to each other, as are mT5-small and mT5-base. All models are far from random (0.14–0.27).
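A rough sketch, under assumptions, of one way to compare embedding geometries with canonical (principal) angles: take embedding matrices whose rows are aligned to the same tokens, reduce each to a top singular subspace, and measure the angles between them. The exact procedure and summary statistic in the paper may differ; the data below is synthetic.

```python
# Sketch: canonical angles between the token-space subspaces of two embedding matrices.
import numpy as np
from scipy.linalg import subspace_angles

def token_subspace(emb: np.ndarray, k: int = 50) -> np.ndarray:
    """Orthonormal basis (vocab_size x k) spanned by the top-k left singular vectors."""
    u, _, _ = np.linalg.svd(emb - emb.mean(axis=0), full_matrices=False)
    return u[:, :k]

# Synthetic stand-ins for two models' embeddings over an aligned vocabulary;
# the embedding dimensions need not match, only the token rows do.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(1000, 256))
emb_b = emb_a @ rng.normal(size=(256, 128)) * 0.1 + rng.normal(size=(1000, 128))

angles = subspace_angles(token_subspace(emb_a), token_subspace(emb_b))
print(f"mean cosine of canonical angles: {np.cos(angles).mean():.2f}")  # 1.0 = identical geometry
```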
andreawwenyi.bsky.social
The diversity of neighborhoods in mT5 varies by category. For tokens in the two Japanese writing systems, KATAKANA, used for words of foreign origin, has more diverse neighbors than HIRAGANA, used for native Japanese words.
andreawwenyi.bsky.social
The nearest neighbors of mT5 tokens are often translations. NLP spent 10 years trying to make word embeddings align across languages. mT5 embeddings find cross-lingual semantic alignment without even being asked!
andreawwenyi.bsky.social
mT5 embedding neighborhoods are more linguistically diverse: the 50 nearest neighbors of a token represent an average of 7.61 writing systems, compared to 1.64 for XLM-R embeddings.
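A small sketch of how one might count writing systems among a token's nearest neighbors, assuming row-aligned token strings and embeddings; the script labeling here is a crude Unicode-name heuristic, not necessarily the paper's method.

```python
# Sketch: number of distinct writing systems among a token's nearest neighbors.
import numpy as np
import unicodedata

def script_of(token: str) -> str:
    """Crude writing-system label: first word of the Unicode name of the first letter."""
    for ch in token:
        if ch.isalpha():
            return unicodedata.name(ch, "UNKNOWN").split()[0]
    return "OTHER"

def neighbor_script_count(embeddings: np.ndarray, tokens: list[str], query_idx: int, k: int = 50) -> int:
    """Distinct scripts among the k nearest neighbors by cosine similarity."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = emb @ emb[query_idx]
    neighbors = np.argsort(-sims)[1:k + 1]          # skip the query token itself
    return len({script_of(tokens[i]) for i in neighbors})

# usage (hypothetical): neighbor_script_count(model_embeddings, vocab_tokens, query_idx=1234)
```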
andreawwenyi.bsky.social
Tokens in different writing systems can be linearly separated with an average accuracy of 99.2% for XLM-R. Even in high-dimensional space, mT5 embeddings are less separable.
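And a sketch, under assumptions, of the linear-separability measurement: fit a linear classifier to predict a token's writing system from its embedding and report held-out accuracy. The data below is synthetic and the classifier choice is mine, not necessarily the paper's.

```python
# Sketch: linear separability of writing systems in an embedding space.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def script_separability(embeddings: np.ndarray, script_labels: np.ndarray) -> float:
    """Mean held-out accuracy of a linear classifier predicting script from embedding."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, embeddings, script_labels, cv=5).mean()

# Illustrative stand-in data: two 'scripts' offset in embedding space.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 1, (200, 64)), rng.normal(2, 1, (200, 64))])
labels = np.array([0] * 200 + [1] * 200)
print(f"linear separability: {script_separability(emb, labels):.3f}")
```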