Collective Intelligence Project
banner
cip.org
Collective Intelligence Project
@cip.org
We're on a mission to steer transformative technology for the collective good.

cip.org
- Why "uncommon ground" beats common ground every time

- Sci-fi book recommendations

- And much more
August 15, 2025 at 2:08 PM
- Our work bringing 100K+ people into AI development through globaldialogues.ai

- How we're building evaluation benchmarks from lived experiences, not just lab tests

- Digital twins that could represent your values without taking up all your evenings
Global Dialogues
Exploring humanity's vision for artificial intelligence through global conversations and collective intelligence.
globaldialogues.ai
August 15, 2025 at 2:08 PM
What you'll find in this episode:

- How Taiwan crowdsourced anti-deepfake legislation in 24 hours (and it worked)

- Why 1 in 3 adults now use AI for daily emotional support, and what that means for democracy
August 15, 2025 at 2:08 PM
@divya.bsky.social and @audreyt.org joined @reidhoffman.bsky.social and Aria Finger, hosts of the Possible Podcast, to talk about how democracy and AI can bring out the best of each other.

Apple: podcasts.apple.com/us/podcast/a...

Spotify: open.spotify.com/episode/6UDj...
Audrey Tang and Divya Siddarth on Outfitting Democracy for the AI Era
Podcast Episode · Possible · 08/13/2025 · 52m
podcasts.apple.com
August 15, 2025 at 2:08 PM
We're asking the a global sample of the world: "𝖯𝖾𝗋𝗌𝗈𝗇𝖺𝗅𝗅𝗒, 𝗐𝗈𝗎𝗅𝖽 𝗒𝗈𝗎 𝖾𝗏𝖾𝗋 𝖼𝗈𝗇𝗌𝗂𝖽𝖾𝗋 𝗁𝖺𝗏𝗂𝗇𝗀 𝖺 𝗋𝗈𝗆𝖺𝗇𝗍𝗂𝖼 𝗋𝖾𝗅𝖺𝗍𝗂𝗈𝗇𝗌𝗁𝗂𝗉 𝗐𝗂𝗍𝗁 𝖺𝗇 𝖠𝖨, 𝗂𝖿 𝗍𝗁𝖾 𝖠𝖨 𝗐𝖺𝗌 𝖺𝖽𝗏𝖺𝗇𝖼𝖾𝖽 𝖾𝗇𝗈𝗎𝗀𝗁?"

Prediction time: What % do you think will say yes?

Tell us your response in the comments!
May 26, 2025 at 5:59 PM
10/10: Read the piece to learn more about this under-explored issue.

It includes specific strategies to address these biases and provides access to the full Github suite.

www.cip.org/blog/llm-jud...
LLM Judges Are Unreliable — The Collective Intelligence Project
When Large Language Models are used as judges for decision-making across various sensitive domains, they consistently exhibit unpredictable and hidden measurement biases, making their verdicts unrelia...
www.cip.org
May 23, 2025 at 5:27 PM
9/10: We built a Github suite to systematically test and quantify these biases.

It lets you:
May 23, 2025 at 5:27 PM
8/10: To improve reliability: Neutralize labels, vary order, empirically validate all prompt components, and optimize scoring mechanics. Diversify your model portfolio and critically evaluate human baselines.
May 23, 2025 at 5:27 PM
7/10: These aren't just minor quirks. LLMs lack the mechanistic precision of traditional software. Their architecture means system prompts and input material exist in the same context, leading to unpredictable interactions.
May 23, 2025 at 5:27 PM
6/10: Rubric-based scoring is also affected. We observed 'recency bias' where criteria scored later received lower averages. Holistic vs. isolated evaluation dramatically shifted scores too.
May 23, 2025 at 5:27 PM
5/10: For example, in pairwise choices, LLMs favored "Response B" 60-69% of the time, a significant deviation from random. Even explicit "de-biasing" prompts sometimes increased bias.
May 23, 2025 at 5:27 PM
4/10: LLMs exhibit cognitive biases similar to humans: serial position, framing, anchoring. Our tests across frontier models from Google, Mistral, Anthropic, and OpenAI consistently show these biases in judgment contexts.
May 23, 2025 at 5:27 PM
3/10: "Prompt engineering" often relies on untested folklore. We found even minor prompt changes, like "Response A" vs. "Response B" labeling, significantly bias LLM choices.
May 23, 2025 at 5:27 PM
2/10: This is important because LLMs are increasingly deployed for evaluation tasks, ranking, decision-making, and judgement in many critical domains.
May 23, 2025 at 5:27 PM
1/10: LLM Judges Are Unreliable.

Our latest blog post from @j11y.io shows that positional preferences, order effects, and prompt sensitivity fundamentally undermine the reliability of LLM judges.
May 23, 2025 at 5:27 PM
Reposted by Collective Intelligence Project
The Collective Intelligence Project @cip.org has launched the Global Dialogues Challenge, an open call to explore global perspectives on the future of artificial intelligence.

A $10,000 prize fund will be distributed among the winning entrants.

www.cip.org/challenge
Global Dialogues Challenge — The Collective Intelligence Project
www.cip.org
May 21, 2025 at 2:56 PM
Reposted by Collective Intelligence Project
We're really thrilled to be able to have such a juicy prize fund. If you're feeling a sassiness with data and want to build something small to explore or inspire better AI for humans, take a look and enter. cip.org/challenge

Step 1. Grab the data.
Step 2. Build something cool.

<3
May 20, 2025 at 12:39 PM
Details and how to apply: cip.org/challenge
Global Dialogues Challenge — The Collective Intelligence Project
cip.org
May 19, 2025 at 5:56 PM
Submissions will be judged by an amazing panel:

@audreyt.org (Cyber Ambassador-at-large for Taiwan)

@nabiha.bsky.social (Executive Director of @mozilla.org )

Zoe Hitzig (Research Scientist at OpenAI and Poet)
May 19, 2025 at 5:56 PM
The challenge runs from Monday, May 19th through Friday, July 11th.

A $10,000 prize fund will be distributed among the winning submissions.
May 19, 2025 at 5:56 PM
This is an open call to explore global perspectives on AI using the public datasets sourced from our globaldialogues.ai project.

Participants can submit benchmarks, visualizations, artistic responses, or analytical reflections.
Global Dialogues
Exploring humanity's vision for artificial intelligence through global conversations and collective intelligence.
Globaldialogues.ai
May 19, 2025 at 5:56 PM
We're officially launching the Global Dialogues Challenge!
May 19, 2025 at 5:56 PM
“We have been sort of stuck with outdated notions of what fairness and bias means for a long time,” says
@divya.bsky.social, “we have to be aware of differences, even if that becomes somewhat uncomfortable.”

Read the full @technologyreview.com article on new approaches to evaluating AI ⬇️
March 14, 2025 at 6:36 PM