Lightnews — Scholar-powered news

Simona Liao

@simonaliao.bsky.social

63 followers 2 following 6 posts

She/They · UW CSE Master Graduate on Human Computer Interaction · Product Manager @ Microsoft

Simona Liao @simonaliao.bsky.social · Dec 4

Thanks for your question, Sung! Here is a pie chart representing the racial identities of non-native English speakers. White folks make up ~50% (most are from European countries), followed by Asian folks at ~20%.

Pie chart representing the racial groups, with count in brackets
51.4% White/Caucasian (127)
17.8% Asian (44)
8.9% Prefer not to disclose (22)
8.1% Prefer to self-describe (20)
8.1% Hispanic and Latino (20)
3.6% Black or African American (9)
2.0% Middle Eastern (5)
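
As a quick sanity check on the chart, the percentages can be recomputed from the bracketed counts. This is a minimal sketch, not the authors' analysis code: the function name `shares` is hypothetical, and the total of 247 is an assumption obtained by summing the listed counts (the post does not state it).

```python
# Counts copied from the pie chart description above.
counts = {
    "White/Caucasian": 127,
    "Asian": 44,
    "Prefer not to disclose": 22,
    "Prefer to self-describe": 20,
    "Hispanic and Latino": 20,
    "Black or African American": 9,
    "Middle Eastern": 5,
}


def shares(counts):
    """Return each category's share of the total, in percent, to one decimal."""
    total = sum(counts.values())  # assumed denominator: sum of listed counts
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


percentages = shares(counts)
for label, pct in percentages.items():
    print(f"{pct:.1f}% {label} ({counts[label]})")
```

Under that assumption, the recomputed shares agree with the percentages listed in the chart description.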

Simona Liao @simonaliao.bsky.social · Dec 2

5. Researchers in computer science fields are more comfortable disclosing their LLM usage and have lower ethical concerns compared to researchers in other disciplines (see fig 4).

Simona Liao @simonaliao.bsky.social · Dec 2

4. Women and non-binary researchers have greater ethical concerns, as do those with more years of research experience (see fig 4).

Simona Liao @simonaliao.bsky.social · Dec 2

2. Non-White, non-native English-speaking, and junior researchers both use LLMs more frequently and perceive higher benefits and lower risks (see fig 4).

3. Equity was a large theme in respondents’ discussions of the benefits of LLMs.

Figure 4: A collection of 36 heatmaps. The y-axis or each row represents a demographic breakdown (from top to bottom: Race, Gender, Language, Experience, Field of Study, and All participants). Each column, from left to right, represents LLM usage frequency, perception of risk, benefits, ethics, the comfort level of disclosing to peers, and of disclosing to reviewers. For each individual heatmap, the x-axis includes the six types of LLM usage (from left to right: information seeking, editing, ideation & framing, direct writing, data cleaning & analysis, and data generation).

Simona Liao @simonaliao.bsky.social · Dec 2

Our Key Takeaways:

1. 81% of researchers we surveyed have used LLMs in one or more places in their research pipeline, with the tasks of Information Seeking and Editing reported most frequently and Data Analysis and Generation reported least frequently (see fig 2).

Fig 2: Overview of Usage Frequency Divided by LLM Usage Type (N=816). The left diverging bar chart displays the distribution of usage frequency across different types of LLM usage, with each type represented by a separate row. The frequency levels, from left to right, are: Very Rarely, Rarely, Occasionally, Frequently, and Very Frequently, with the midpoint of the chart centered at "Occasionally." The grey bar chart on the right indicates the percentage of responses that report "Never" using LLMs for each corresponding type. From this plot, we can tell that researchers report using LLMs for Information Seeking and Editing most frequently, and for Data Cleaning & Analysis and Data Generation the least frequently.

Simona Liao @simonaliao.bsky.social · Dec 2

Hi everyone, I am excited to share our large-scale survey study with 800+ researchers, which reveals researchers’ usage and perceptions of LLMs as research tools, and how the usage and perceptions differ based on demographics.

See results in comments!

🔗 arXiv link: arxiv.org/abs/2411.05025

LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions

The rise of large language models (LLMs) has led many researchers to consider their usage for scientific work. Some have found benefits using LLMs to augment or automate aspects of their research pipe...