Lightnews — Scholar-powered news

Sauvik Das

@sauvik.me

450 followers 110 following 120 posts

I work on human-centered {security|privacy|computing}. Associate Professor (w/o tenure) at @hcii.cmu.edu. Director of the SPUD (Security, Privacy, Usability, and Design) Lab. Non-Resident Fellow @cendemtech.bsky.social


Pinned

Sauvik Das @sauvik.me · Nov 20

I created a starter pack for researchers who work at the nexus of HCI & cybersecurity / privacy here.

Please do let me know if you would like to be added to the list! I'm sure I've missed many folks.

go.bsky.app/RGsu5jn

10 23 28

Sauvik Das @sauvik.me · 2d

So are you telling me that if people are no longer afraid about whether they’ll be able to afford housing and food and other basic necessities that they will actually be willing and able to…buy things?

1 1

Sauvik Das @sauvik.me · 2d

upsell.feedback.gtfo

1 1

Sauvik Das @sauvik.me · 2d

#chi2026 reviewing season is here! Friendly reminder that it is possible to write thoughtful reviews without nitpicking and without it taking a lot of time. I've actually completed my seven 2AC reviews already.

Some (updated) tips I wrote about this:
sauvik-das.medium.com/a-quick-gui...

A quick guide to reviewing papers quick

Last year, the world was on fire [1]. While it burned, I reviewed. A lot. More than I care to review again in a single year.

sauvik-das.medium.com

2 3

Sauvik Das @sauvik.me · 11d

3) Finally, there is...very very little documentation associated with these datasets which made this audit much harder than it needed to be. To help improve documentation practices, we extended datasheets for datasets w/ audio-specific questions

Sauvik Das @sauvik.me · 11d

2) Most datasets pay little attention to representation — with the exception being Mozilla Common Voice. So, unsurprisingly, most audio data is in English and there is little attempt to ensure vocal representation from a broad set of individuals.

Sauvik Das @sauvik.me · 11d

1) While there is a lot of data that may be copyrighted, to circumvent copyright issues some datasets just comprise a lot of "old" audio data, e.g., sentences read from old newspapers and books that are now in the public domain.

Sauvik Das @sauvik.me · 11d

Our audit was broad: we included sound, voice, and music. We explored content, audio quality, language representation, toxicity, bias, and licensing adherence. Lots to unpack but three key findings:

Sauvik Das @sauvik.me · 11d

ML models are only as good as the data they are trained on, and there is understandably a lot of concern around how the data that powers these models is sourced.

Through a broad review of recent gen audio papers, we identified the most commonly used datasets and audited them.

Sauvik Das @sauvik.me · 11d

Large audio models power a broad suite of new applications: they can continue unfinished audio, clone voices, provide an expressive range of text-to-speech voices, and can even create entire songs from simple text-based prompts. But what are they trained on?

Sauvik Das @sauvik.me · 11d

📣 Accepted to #AIES2025: What do the audio datasets powering generative audio models actually contain? (led by @willie-agnew.bsky.social)

Answer: Lots of old audio content that is mostly English, often biased, and of dubious copyright / permissioning status.

Paper: www.sauvik.me/papers/65/s...

1 1 3

Sauvik Das @sauvik.me · 12d

@kyzyl.me will be presenting this at the Privacy session at #UIST2025 next Wednesday!

programs.sigchi.org/uist/2025/pr...

Please check it out if you'll be there :)

Conference Programs

programs.sigchi.org

Sauvik Das @sauvik.me · 12d

Importantly, we found that Imago Obscura helps *address* privacy risks without impacting sharing intent.

People believed it *greatly* reduced privacy risks for images they previously wanted to share but did not share for privacy reasons, with no difference in sharing intent.

1 1

Sauvik Das @sauvik.me · 12d

In a summative evaluation, we found that Imago Obscura effectively improved users' awareness of / motivation to address / ability to address key privacy risks in images they wanted to share online.

1 1

Sauvik Das @sauvik.me · 12d

It can identify when the background of an image might uniquely identify a user's location and replace it to mitigate that risk.

It can recognize when bystanders may be in the background of a photo and provides simple mechanisms to replace those bystanders.

1 1

Sauvik Das @sauvik.me · 12d

Examples: Imago Obscura flags content that could uniquely identify an individual, like an obvious tattoo, and can generate realistic and less identifiable replacements.

It also identifies information that a user might consider confidential, and can obfuscate it via, e.g., blurring.

1 1

Sauvik Das @sauvik.me · 12d

A longstanding usable privacy challenge is helping users understand and address privacy risks in personal images they share online, from location risks to bystander risks.

Imago Obscura lets users bring-their-own-threat model, and then helps identify and address specific risks.

1 1

Sauvik Das @sauvik.me · 12d

🔐 New #UIST2025 paper by @kyzyl.me

Imago Obscura uses #vision #language #models to understand user #privacy concerns, improve their awareness of image privacy risks, and their ability to address these risks.

📜: sauvik.me/papers/66/se...
🔗: cmu-spuds.github.io/imago-obscura/

#PrivacySky

1 1 4

Sauvik Das @sauvik.me · 20d

Honestly, now I want to write a paper titled: "[mention a specific relevant paper by them]"

Reposted by Sauvik Das

Ordinary Notes @hystericalblkns.bsky.social · 21d

If you’re on academia dot edu, let me suggest that you strongly consider deleting your account.

The new TOC from academia dot edu.

By creating an Account with Academia.edu, you grant us a worldwide, irrevocable, non-exclusive, transferable license, permission, and consent for Academia.edu to use your Member Content and your personal information (including, but not limited to, your name, voice, signature, photograph, likeness, city, institutional affiliations, citations, mentions, publications, and areas of interest) in any manner, including for the purpose of advertising, selling, or soliciting the use or purchase of Academia.edu's Services.

83 1.3K 2.1K

Sauvik Das @sauvik.me · 26d

CHI reviews? 😱

1 1

Sauvik Das @sauvik.me · 28d

well then, that's clearly going to be an example in my Designing Human-Centered Software class.

Sauvik Das @sauvik.me · Aug 15

Wait okay minor correction on the above. The macroeconomic environment is worse for the majority of people, but of course the richest people continue to feast

Sauvik Das @sauvik.me · Aug 15

Still worth paying attention to, of course, because if our students are not getting jobs, then that's a big problem we need to help them solve.

1 2

Sauvik Das @sauvik.me · Aug 15

The article notes that "AI" is probably not the issue. It's a handy bogeyman. I use AI coding assistants and I code more now, not less...

The market is worse relative to a few years ago, but that's probably largely because the macroeconomic environment is worse relative to a few years ago.

1 1

Sauvik Das @sauvik.me · Aug 14

having a latte at a local coffee shop with some salmon on toast while they’re blaring Soundgarden’s “Outshined” and it's raining just a lil?

1 1