Nick Vincent
@nickmvincent.bsky.social
290 followers 300 following 69 posts
Studying people and computers (https://www.nickmvincent.com/) Blogging about data and steering AI (https://dataleverage.substack.com/)
Pinned
nickmvincent.bsky.social
Hi Bluesky (+ many friendly familiar faces). I'm a researcher in HCI + ML, assistant prof at Simon Fraser University up in BC, and working on "healthy data flow". Doing a quick thread recapping some recent writing (blogs, pre-prints, etc.) that capture the things I work on and talk about!
nickmvincent.bsky.social
Is anyone compiling discussions/thoughts on emerging licensing schemes and preference signals, e.g. rslstandard.org and github.com/creativecomm... ? I'm externalizing some notes at datalicenses.org, but want to find where these discussions are happening!
RSL: Really Simple Licensing
The open content licensing standard for the AI-first Internet
rslstandard.org
nickmvincent.bsky.social
Excited to be giving a talk on data leverage to the Singapore AI Safety Hub. Trying to capture updated thoughts from recent years, and have long wanted to better connect leverage/collective bargaining to the safety context.
nickmvincent.bsky.social
About a week away from the deadline to submit to the

✨ Workshop on Algorithmic Collective Action (ACA) ✨

acaworkshop.github.io

at NeurIPS 2025!
nickmvincent.bsky.social
(1) ongoing challenges in benchmarking, (2) challenges in communicating benchmarks to the public, (3) dataset documentation, and (4) post-hoc dataset "reverse engineering"

The original post: dataleverage.substack.com/p/selling-ag...
nickmvincent.bsky.social
who paid that Dr for a verified attestation with provenance can use this attestation as a quality signal; a promise to consumers about the exact nature of the evaluation. A "9/10 dentists recommend" for a chatbot.

More generally, I think there are interesting connections between current discourse &
nickmvincent.bsky.social
For some types of info, we can maybe treat it as open and focus on selling convenient/"nice" packages (à la Wikimedia Enterprise)

But attestations provide another object to transact over. Valuable info (a Dr giving thumbs up/down on medical responses) may leak, but the AI developer
nickmvincent.bsky.social
So in a post-AI world, to help people transact over work that produces information, we likely need:
- individual property-ish rights over info (not a great way to go, IMO)
- rights that enable collective bargaining (good!)
- or...
nickmvincent.bsky.social
The core challenge: many inputs into AI are information, and thus hard to design efficient markets for. Info is hard to exclude others from using (pre-training data remains very hard to exclude, but even post-training data may be hard to exclude without sufficient effort)
nickmvincent.bsky.social
It looks like some skepticism was warranted (not much progress towards this vision yet). I do think "dataset details as quality signals" is still possible though, and could play a key role in addressing looming information economics challenges.
nickmvincent.bsky.social
🧵In several recent posts, I speculated that eventually, dataset details may become an important quality signal for consumers choosing AI products.

"This model is good for asking health questions, because 10,000 doctors attested to supporting training and/or eval". Etc.
nickmvincent.bsky.social
[FAccT-related link round-up]: It was great to present on measuring Attentional Agency with Zachary Wojtowicz at FAccT. Here's our paper on ACM DL: dl.acm.org/doi/10.1145/...

On Thurs Aditya Karan will present on collective action dl.acm.org/doi/10.1145/... at 10:57 (New Stage A)
Algorithmic Collective Action with Two Collectives | Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency
dl.acm.org
nickmvincent.bsky.social
“Attentional agency”: talk in New Stage B at FAccT, in the session happening right now!
nickmvincent.bsky.social
Off to FAccT; excited to see faces old and new!
nickmvincent.bsky.social
Do some aspects seem wrong (in the next 2 posts, I get into how these ideas interact w/ reinforcement learning)?
nickmvincent.bsky.social
This has implications for Internet policy, for understanding where the value in AI comes from, and for thinking about why we might even consider a certain model to be "good"!

This first post leans heavily on recent work with Zachary Wojtowicz and Shrey Jain, to appear at this upcoming FAccT
nickmvincent.bsky.social
New data leverage post: "Google and TikTok rank bundles of information; ChatGPT ranks grains."

dataleverage.substack.com/p/google-and...

This will be post 1/3 in a series about viewing many AI products as all competing around the same task: ranking bundles or grains of records made by people.
Google and others solve our attentional problem by ranking discrete bundles of information, whereas ChatGPT ranks more granular chunks. This lens can help us reason about AI policy.
dataleverage.substack.com