Divya Shanmugam
@dmshanmugam.bsky.social
120 followers 190 following 27 posts
trying to build reliable models from unreliable data · postdoc @ Cornell Tech · phd @ MIT · dmshanmugam.github.io
dmshanmugam.bsky.social
can't recommend highly enough!
emmapierson.bsky.social
🚨 New postdoc position in our lab at Berkeley EECS! 🚨

(please reshare)

We seek applicants with experience in language modeling who are excited about high-impact applications in the health and social sciences!

More info in thread

1/3
Reposted by Divya Shanmugam
monicaagrawal.bsky.social
Excited to be at #ICML2025 to present our paper on 'pragmatic misalignment' in (deployed!) RAG systems: narrowly "accurate" responses that can be profoundly misinterpreted by readers.

It's especially dangerous for consequential domains like medicine! arxiv.org/pdf/2502.14898
A person searching for risks of surgery. A traditional search engine would surface websites that would likely include both pros and cons of the surgery. However, RAG results only excerpt the cons.
Reposted by Divya Shanmugam
reniebird.bsky.social
I'll be presenting a position paper about consumer protection and AI in the US at ICML. I have a surprisingly optimistic take: our legal structures are stronger than I anticipated when I went to work on this issue in Congress.

Is everything broken rn? Yes. Will it stay broken? That's on us.
A poster for the paper "Position: Strong Consumer Protection is an Inalienable Defense for AI Safety in the United States"
Reposted by Divya Shanmugam
allisonkoe.bsky.social
🎉Excited to present our paper tomorrow at @facct.bsky.social, “Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese”, with @brucelyu17.bsky.social, Jiebo Luo and Jian Kang, revealing 🤖 LLM performance disparities. 📄 Link: arxiv.org/abs/2505.22645
"Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese" Abstract:

While the capabilities of Large Language Models (LLMs) have been studied in both Simplified and Traditional Chinese, it is yet unclear whether LLMs exhibit differential performance when prompted in these two variants of written Chinese. This understanding is critical, as disparities in the quality of LLM responses can perpetuate representational harms by ignoring the different cultural contexts underlying Simplified versus Traditional Chinese, and can exacerbate downstream harms in LLM-facilitated decision-making in domains such as education or hiring. To investigate potential LLM performance disparities, we design two benchmark tasks that reflect real-world scenarios: regional term choice (prompting the LLM to name a described item which is referred to differently in Mainland China and Taiwan), and regional name choice (prompting the LLM to choose who to hire from a list of names in both Simplified and Traditional Chinese). For both tasks, we audit the performance of 11 leading commercial LLM services and open-sourced models -- spanning those primarily trained on English, Simplified Chinese, or Traditional Chinese. Our analyses indicate that biases in LLM responses are dependent on both the task and prompting language: while most LLMs disproportionately favored Simplified Chinese responses in the regional term choice task, they surprisingly favored Traditional Chinese names in the regional name choice task. We find that these disparities may arise from differences in training data representation, written character preferences, and tokenization of Simplified and Traditional Chinese. These findings highlight the need for further analysis of LLM biases; as such, we provide an open-sourced benchmark dataset to foster reproducible evaluations of future LLM behavior across Chinese language variants (this https URL).

Figure: three different LLMs (GPT-4o, Qwen-1.5, and Taiwan-LLM) may answer a prompt about pineapples differently when asked in Simplified Chinese vs. Traditional Chinese.
Figure: LLMs disproportionately answer questions about region-specific terms (like the word for "pineapple," which differs in Simplified and Traditional Chinese) correctly when prompted in Simplified Chinese as opposed to Traditional Chinese.
Figure: LLMs vary widely in how well they adhere to prompt instructions, favoring Traditional Chinese names over Simplified Chinese names in a benchmark task regarding hiring.
Reposted by Divya Shanmugam
shaily99.bsky.social
🖋️ Curious how writing differs across (research) cultures?
🚩 Tired of “cultural” evals that don't consult people?

We engaged with interdisciplinary researchers to identify & measure ✨cultural norms✨in scientific writing, and show that❗LLMs flatten them❗

📜 arxiv.org/abs/2506.00784

[1/11]
An overview of the work “Research Borderlands: Analysing Writing Across Research Cultures” by Shaily Bhatt, Tal August, and Maria Antoniak. The overview describes that we survey and interview interdisciplinary researchers (§3) to develop a framework of writing norms that vary across research cultures (§4) and operationalise them using computational metrics (§5). We then use this evaluation suite for two large-scale quantitative analyses: (a) surfacing variations in writing across 11 communities (§6); (b) evaluating the cultural competence of LLMs when adapting writing from one community to another (§7).
dmshanmugam.bsky.social
and... here is the actual GIF 🙈
dmshanmugam.bsky.social
it brings me tremendous joy you noticed!!!
dmshanmugam.bsky.social
Last but not least, thanks to Helen Lu, @swamiviv1, and John Guttag, my wonderful collaborators on this work! One of my last from the PhD 🥹
dmshanmugam.bsky.social
Empirically, TTA reduces prediction set sizes by 10-14% on average, with larger improvements for (1) classes with the largest prediction set sizes and (2) stronger coverage guarantees.
dmshanmugam.bsky.social
We also present a new finding that explains TTA's value to conformal prediction: it raises the predicted probability of the true class even when that class is initially ranked as unlikely, which matters for conformal scores that rely on orderings over predicted probabilities (e.g., APS, RAPS)!
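To see why the ordering matters, here is a rough sketch of an APS-style score (my own simplification, not code from the paper): the score sums the predicted probabilities of every class ranked at or above the true class, so pushing the true class up the ranking directly shrinks it.

```python
import numpy as np

def aps_score(probs, label):
    """Simplified APS nonconformity score (randomized tie-breaking omitted).

    Sums the predicted probabilities of all classes ranked at or above the
    true class. Because the score depends on the ordering of probabilities,
    promoting the true class up the ranking (as TTA tends to do) lowers it.
    """
    order = np.argsort(-probs)                  # classes from most to least likely
    rank = int(np.where(order == label)[0][0])  # position of the true class
    return probs[order][: rank + 1].sum()
```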
dmshanmugam.bsky.social
We show that test-time augmentation (TTA)—a classic vision technique—is a simple and surprisingly effective way to shrink sets while maintaining coverage. TTA aggregates predictions over transformations of an input (a neat way to create an ensemble out of a single classifier!)
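For concreteness, a minimal sketch of what TTA looks like for an image classifier, assuming a PyTorch model and torchvision; the flips and rotations here are illustrative choices, not necessarily the augmentation policy used in the paper.

```python
import torch
import torchvision.transforms.functional as TF

def tta_probs(model, image, angles=(-10, 0, 10)):
    """Average softmax outputs over flipped/rotated views of one image.

    `image` is a (C, H, W) tensor; the particular set of views is illustrative.
    """
    model.eval()
    views = [TF.rotate(img, angle)
             for img in (image, TF.hflip(image))
             for angle in angles]
    batch = torch.stack(views)                  # (n_views, C, H, W)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=-1)
    return probs.mean(dim=0)                    # an ensemble from a single classifier
```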
dmshanmugam.bsky.social
New work 🎉: conformal classifiers return sets of classes for each example, with a probabilistic guarantee the true class is included. But these sets can be too large to be useful.

In our #CVPR2025 paper, we propose a method to make them more compact without sacrificing coverage.
A gif explaining the value of test-time augmentation to conformal classification. It begins with an illustration of TTA reducing the size of the predicted set of classes for a dog image, and goes on to explain that this is because TTA raises the true class's predicted probability, even when that class is predicted to be unlikely.
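To make the setup concrete, here is a minimal split-conformal sketch using the simple 1 - p(true class) score; the paper works with ordering-based scores like APS/RAPS, but the calibration recipe is the same, and the probabilities passed in could be plain softmax outputs or TTA-averaged ones.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction with the 1 - p_y nonconformity score (sketch).

    cal_probs:  (n, K) predicted probabilities on a held-out calibration split
    cal_labels: (n,)   true labels for the calibration split
    test_probs: (m, K) predicted probabilities for test examples
    Returns one prediction set (array of class indices) per test example,
    covering the true class with probability >= 1 - alpha.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]       # calibration scores
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)   # finite-sample correction
    qhat = np.quantile(scores, q_level, method="higher")
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]  # classes under threshold
```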
dmshanmugam.bsky.social
One place you can find me is Poster Session 4 on Saturday, at 5PM, presenting recent work on how you can use test-time augmentation to reduce the size of sets produced by conformal prediction. Full paper thread coming shortly :) here is the paper in the meantime: arxiv.org/abs/2505.22764
Test-time augmentation improves efficiency in conformal prediction
A conformal classifier produces a set of predicted classes and provides a probabilistic guarantee that the set includes the true class. Unfortunately, it is often the case that conformal classifiers p...
arxiv.org
dmshanmugam.bsky.social
I’m in Nashville this week for #CVPR2025! DM me to chat about conformal prediction, test-time adaptation, or model reliability. Excited to see new work and to catch up with friends old and new!!
Reposted by Divya Shanmugam
jennwv.bsky.social
Please help us spread the word! 📣

FATE is hiring a pre-doc research assistant! We're looking for candidates who will have completed their bachelor's degree (or equivalent) by summer 2025 and want to advance their research skills before applying to PhD programs.
Reposted by Divya Shanmugam
ericachiang.bsky.social
I really enjoyed (and learned a LOT from) working on this project with these wonderful co-authors:
@dmshanmugam.bsky.social
Ashley Beecy
Gabriel Sayer
@destrin.bsky.social
@nkgarg.bsky.social
@emmapierson.bsky.social
7/7
dmshanmugam.bsky.social
Erica’s new paper on a method to both measure *and* correct for three types of disparities associated with disease progression is now out! Check out the thread for more detail + findings from a case study on heart failure. Congratulations!!!
ericachiang.bsky.social
I’m really excited to share the first paper of my PhD, “Learning Disease Progression Models That Capture Health Disparities” (accepted at #CHIL2025)! ✨ 1/

📄: arxiv.org/abs/2412.16406
dmshanmugam.bsky.social
my friend jonah made a fun game that i now play every day: guessten.com! please enjoy and send me your scores
GuessTen
guessten.com
dmshanmugam.bsky.social
just used this to source citations with great success - a very nice tool!!
ai2.bsky.social · Mar 26
Meet Ai2 Paper Finder, an LLM-powered literature search system.

Searching for relevant work is a multi-step process that requires iteration. Paper Finder mimics this workflow — and helps researchers find more papers than ever 🔍
Screenshot of the Ai2 Paper Finder interface
dmshanmugam.bsky.social
i’ve been wondering this too! thanks for asking
dmshanmugam.bsky.social
kenny had the great idea to spend a whole day analyzing dogs — so so fun! i like health data but turns out i love dog data
kennypeng.bsky.social
Our lab had a #dogathon 🐕 yesterday where we analyzed NYC Open Data on dog licenses. We learned a lot of dog facts, which I’ll share in this thread 🧵

1) Geospatial trends: Cavalier King Charles Spaniels are common in Manhattan; the opposite is true for Yorkshire Terriers.
Reposted by Divya Shanmugam
gsagostini.bsky.social
Migration data lets us study responses to environmental disasters, social change patterns, policy impacts, etc. But public data is too coarse, obscuring these important phenomena!

We build MIGRATE: a dataset of yearly flows between 47 billion pairs of US Census Block Groups. 1/5