Lightnews — Scholar-powered news

Prolific

@joinprolific.bsky.social

270 followers 1.4K following 160 posts

The ultimate human data platform to power world-changing AI and research. 🔗 www.prolific.com

Pinned

Prolific @joinprolific.bsky.social · Aug 21

🔍 Audience Finder is here.

Instantly check if the participants you need for your research are on Prolific. No set-up required.

Just type who you’re looking for (“Bilingual psychologists who speak Mandarin”) and see live results in seconds.

➡️ www.prolific.com/audience-fin... | #AcademicSky

Prolific @joinprolific.bsky.social · 19d

AI benchmarks show that models like GPT-5 might ace scientific reasoning, or that Gemini and Claude adapt to new concepts. But which LLMs offer the best user experience?

@tomsguide.com explains how Prolific's AI leaderboard HUMAINE is a big step forward.

#ArtificialIntelligence #MachineLearning #LLM

27 AI models were ranked by the public and ChatGPT came 8th — these are the models that beat it

A surprising set of results for the big AI chatbots

www.tomsguide.com

Reposted by Prolific

Andrew Gordon @andrewgordon.bsky.social · 21d

Super excited to share this work with the world!

We wanted to put human preference at the heart of our AI model leaderboard, and to do so in a principled and representative manner.

The result: HUMAINE

Check out the data here and get in touch with any questions: huggingface.co/spaces/Proli...

Prolific @joinprolific.bsky.social · 21d

Get more insights, compare models, and give us a like on Hugging Face ➡️ huggingface.co/spaces/Proli...

Follow for updates on requested models.

HUMAINE Leaderboard - a Hugging Face Space by ProlificAI

This application helps you analyze human feedback to evaluate AI models. You provide feedback data, and it gives you insights to improve your models.

Prolific @joinprolific.bsky.social · 21d

The Probability of Practical Superiority (PPS) Comparison Matrix is available under "Model Comparison" for estimating how likely one model is to outperform another in real-world usage.

Values >55% suggest that one model meaningfully outperforms the other, beyond random chance.
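
As a rough illustration of reading such a matrix, the sketch below computes raw head-to-head win rates from pairwise comparisons and flags pairs above the 55% mark. This is an assumption-laden stand-in, not HUMAINE's method: the thread describes hierarchical modeling, whereas this uses simple win shares, and the model names, counts, and function names are all hypothetical.

```python
from collections import Counter

def pairwise_win_rates(comparisons):
    """Map each ordered pair (a, b) to the share of a-vs-b comparisons that a won.

    `comparisons` is an iterable of (winner, loser) tuples. Raw win share is a
    simplification used for illustration, not the leaderboard's actual model.
    """
    wins = Counter(comparisons)  # (winner, loser) -> number of wins
    rates = {}
    for a, b in list(wins):
        total = wins[(a, b)] + wins[(b, a)]  # missing keys count as 0
        rates[(a, b)] = wins[(a, b)] / total
        rates[(b, a)] = wins[(b, a)] / total
    return rates

def meaningful_leads(rates, threshold=0.55):
    """Return the ordered pairs whose win rate exceeds the threshold."""
    return sorted(pair for pair, rate in rates.items() if rate > threshold)

# Hypothetical data: model_a beats model_b 60/40 and model_c 52/48.
comparisons = (
    [("model_a", "model_b")] * 60 + [("model_b", "model_a")] * 40
    + [("model_a", "model_c")] * 52 + [("model_c", "model_a")] * 48
)
rates = pairwise_win_rates(comparisons)
leads = meaningful_leads(rates)
print(leads)
```

With this made-up data, only the model_a over model_b pairing clears the threshold. The 52/48 split stays below 55% and would be read as too close to call.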

Prolific @joinprolific.bsky.social · 21d

HUMAINE uses rigorous methodology to ensure representative and reliable results.

Our methodology includes:

- Comparative assessment
- Statistically rigorous hierarchical modeling
- Four core dimensions of user experience and overall winner
- Commitment to demographic representation

Prolific @joinprolific.bsky.social · 21d

🏆 Gemini-2.5-Pro takes the #1 spot decisively, with DeepSeek-V3 emerging as the surprise runner-up.

⚖️ Age groups show highest disagreement on model rankings.

📊 27 models tested across 100K+ human comparisons - one of the largest public AI evaluations to date.

Prolific @joinprolific.bsky.social · 21d

Our human-centered evaluation and leaderboard show how people actually experience AI models in the real world.

Rich, multi-dimensional feedback from a diverse, representative sample of real users from our pool - revealing not just which model they prefer, but why.

Prolific @joinprolific.bsky.social · 21d

Introducing HUMAINE: the LLM benchmark that puts real human experience first 🎯

21,352 human evaluators. 27 models. 22 demographic groups. 5 evaluation dimensions.

In partnership with @hf.co. See insights: huggingface.co/spaces/Proli...

More in 🧵 #ArtificialIntelligence #MLSky #LLM #Developer

Visual of Prolific's human-centered leaderboard for Large Language Models (LLMs). There are four visible bar graphs labeled gemini-2.5-pro, deepseek-chat-v3-0324, magistral-medium-2506, and grok-4. The bars appear on the branded Prolific blue, pink, and white background. The title reads "The benchmark with human experience at the center."

Prolific @joinprolific.bsky.social · 27d

We've also launched new features like regionally-stratified representative samples for the US and UK, upgraded AI-powered audience search, and more.

See everything that's new here: www.prolific.com/resources/wh...

What's New at Prolific: regional rep samples and more | Prolific

Discover Prolific's September 2025 updates: regional rep samples, new skilled participants, and more

www.prolific.com

Prolific @joinprolific.bsky.social · 27d

New: Periodic identity confirmation checks with video recording to ensure authentic, human feedback in your academic and industry research.

Along with bank-grade ID verification, this ensures participants are who they say they are—not bots, duplicates, or fraud.

#AcademicSky #ResearchIntegrity

Prolific @joinprolific.bsky.social · Sep 8

AI/ML researchers in San Francisco, we have a few spaces left at our community event 🇺🇸

Join us Sep 11 for this month’s theme: Researchers building real-world AI tooling. With speakers Prolific CEO Phelim Bradley and Jiaxin Pei, Postdoc at @stanfordhai.bsky.social. 👇

#MLSky #ArtificialIntelligence

Prolific Meetup #1 – Researchers Building Real-World AI Tooling · Luma

About the Event Prolific is bringing the AI community together in San Francisco to explore the evolving role of humans in post-training, alignment, evaluation,…

Prolific @joinprolific.bsky.social · Aug 18

👉 Model-based screening during participant onboarding through Protocol, our proprietary data protection system, where we run internal models to flag LLM-generated responses with high confidence.

More information and tips in this transparent session: youtu.be/MBo50M6etCk

Keeping Research Real in the Age of AI: LLM Detection, Data Quality, and Fraud Prevention | Prolific

YouTube video by Prolific

Prolific @joinprolific.bsky.social · Aug 18

👉 Bi-weekly data quality audits where a specialized team of human reviewers benchmarks data quality across honesty, transparency, verbosity, and attention.

👉 Authenticity checks, our own tool that detects LLM-generated responses using advanced behavioral analysis—with 98.7% accuracy.

Prolific @joinprolific.bsky.social · Aug 18

In the age of AI, authenticity matters more than ever in academic research. LLM-generated responses, false demographics, and more pose a threat.

VP of Product Sara details tools and internal measures we’ve implemented to help maintain research integrity 🧵

#AcademicSky #ResearchIntegrity

Prolific @joinprolific.bsky.social · Aug 6

Absolutely. We continuously adapt our systems to ensure high-quality data. Our LLM detection tool uses advanced behavioral analysis (98.7% accuracy), our data protection system also catches LLM use at the onboarding stage, and more. Please do report low-quality IDs in-app and get in touch with feedback.

Prolific @joinprolific.bsky.social · Aug 4

Interesting findings around how agents compete with humans for partnerships by @yaominj.bsky.social, @levinbrinkmann.bsky.social, Anne-Marie Nussberger, Ivan Soraperra, @jfbonnefon.bsky.social, @iyadrahwan.bsky.social, @mpib-berlin.bsky.social.

Taskers sourced via Prolific.

#AI #MLSky #AcademicSky

Prolific @joinprolific.bsky.social · Aug 4

"Through three experiments (N = 975), we found that bots, though more prosocial than humans and linguistically distinguishable, were not selected preferentially when their identity was hidden. Instead, humans misattributed bots’ behavior to humans and vice versa."

arxiv.org/abs/2507.13524

Reposted by Prolific

Michael Cohen @indecisionwins.bsky.social · Jul 28

New preprint -- we (me, @klempert.bsky.social, Dave Wolk, and Joe Kable) examined whether characteristic aging-related changes in cognition and personality show up on six online platforms (3 crowdsourcing sites -- MTurk, CloudResearch Toolkit, & Prolific, and 3 panels). osf.io/preprints/ps... 1/

Reposted by Prolific

Andrew Gordon @andrewgordon.bsky.social · Jul 28

Sharing a piece I wrote for The AI Journal on who should govern AI: aijourn.com/a-fine-balan...

Our recent @joinprolific.bsky.social polling shows 69.7% of people think AI investment will primarily benefit corporations not the public. They're probably right. [1/3]

#AI #AIGovernance #TechPolicy

Prolific @joinprolific.bsky.social · Jul 24

Great analysis and always a pleasure to support.

Prolific @joinprolific.bsky.social · Jul 24

Original thread: bsky.app/profile/kobi...

Congratulations Kobi and team. Pleased to have supported this project.

Kobi Hackenburg @kobihackenburg.bsky.social · Jul 21

Today (w/ @ox.ac.uk @stanford @MIT @LSE) we’re sharing the results of the largest AI persuasion experiments to date: 76k participants, 19 LLMs, 707 political issues.

We examine “levers” of AI persuasion: model scale, post-training, prompting, personalization, & more!

🧵:

Prolific @joinprolific.bsky.social · Jul 24

The largest investigation of AI persuasion with 76,977 participants across 3 large-scale experiments. Excellent work by @ox.ac.uk PhD candidate @kobihackenburg.bsky.social.

19 LLMs. 707 political issues. 466,769 fact-checkable claims evaluated.

arxiv.org/abs/2507.13919

#AcademicSky #MLSky #PhDSky

A still image of the PhD candidate Kobi Hackenburg's AI persuasion research paper titled "The Levers of Political Persuasion with Conversational AI."

Prolific @joinprolific.bsky.social · Jul 18

Sorry to hear this, Jim. We're keen to investigate this jump. You can report it quickly within the platform ("Action" tab), but alternatively could you submit a data quality report ASAP, if you haven't already, so we can investigate the IDs more urgently? Thank you! forms.prolific.com/to/Zv6w7ZWt

Prolific @joinprolific.bsky.social · Jul 10

4) Report data quality concerns in app

If you do experience data quality issues, we want to investigate quickly. To easily report it, click the "Action" dropdown on the submissions page to flag any concerns.

Full list of updates: www.prolific.com/resources/wh...