Sauvik Das
@sauvik.me
450 followers 110 following 120 posts
I work on human-centered {security|privacy|computing}. Associate Professor (w/o tenure) at @hcii.cmu.edu. Director of the SPUD (Security, Privacy, Usability, and Design) Lab. Non-Resident Fellow @cendemtech.bsky.social
Pinned
sauvik.me
I created a starter pack for researchers who work at the nexus of HCI & cybersecurity / privacy here.

Please do let me know if you would like to be added to the list! I'm sure I've missed many folks.

go.bsky.app/RGsu5jn
sauvik.me
So are you telling me that if people are no longer afraid of whether they'll be able to afford housing and food and other basic necessities, they will actually be willing and able to…buy things?
sauvik.me
upsell.feedback.gtfo
sauvik.me
#chi2026 reviewing season is here! Friendly reminder that it is possible to write thoughtful reviews without nitpicking and without it taking a lot of time. I've actually completed my seven 2AC reviews already.

Some (updated) tips I wrote about this:
sauvik-das.medium.com/a-quick-gui...
A quick guide to reviewing papers quick
Last year, the world was on fire [1]. While it burned, I reviewed. A lot. More than I care to review again in a single year.
sauvik.me
3) Finally, there is...very very little documentation associated with these datasets, which made this audit much harder than it needed to be. To help improve documentation practices, we extended datasheets for datasets w/ audio-specific questions.
sauvik.me
2) Most datasets pay little attention to representation — with the exception being Mozilla Common Voice. So, unsurprisingly, most audio data is in English and there is little attempt to ensure vocal representation from a broad set of individuals.
sauvik.me
1) While a lot of the data may be copyrighted, some datasets circumvent copyright issues by just comprising a lot of "old" audio data, e.g., sentences read from old newspapers and books that are now in the public domain.
sauvik.me
Our audit was broad: we included sound, voice, and music. We explored content, audio quality, language representation, toxicity, bias, and licensing adherence. Lots to unpack but three key findings:
sauvik.me
ML models are only as good as the data they are trained on, and there is understandably a lot of concern around how the data powering these models is sourced.

Through a broad review of recent gen audio papers, we identified the most commonly used datasets and audited them.
sauvik.me
Large audio models power a broad suite of new applications: they can continue unfinished audio, clone voices, provide an expressive range of text-to-speech voices, and can even create entire songs from simple text-based prompts. But what are they trained on?
sauvik.me
📣 Accepted to #AIES2025: What do the audio datasets powering generative audio models actually contain? (led by @willie-agnew.bsky.social)

Answer: Lots of old audio content that is mostly English, often biased, and of dubious copyright / permissioning status.

Paper: www.sauvik.me/papers/65/s...
sauvik.me
@kyzyl.me will be presenting this at the Privacy session at #UIST2025 next Wednesday!

programs.sigchi.org/uist/2025/pr...

Please check it out if you'll be there :)
sauvik.me
Importantly, we found that Imago Obscura helps *address* privacy risks without impacting sharing intent.

People believed it *greatly* reduced privacy risks for images they previously wanted to share but did not share for privacy reasons, with no difference in sharing intent.
sauvik.me
In a summative evaluation, we found that Imago Obscura effectively improved users' awareness of / motivation to address / ability to address key privacy risks in images they wanted to share online.
sauvik.me
It can identify when the background of an image might uniquely identify a user's location and replace it to mitigate that risk.

It can recognize when bystanders may be in the background of a photo and provide simple mechanisms to replace those bystanders.
sauvik.me
Examples: Imago Obscura flags content that could uniquely identify an individual, like an obvious tattoo, and can generate realistic and less identifiable replacements.

It also identifies information that a user might consider confidential, and can obfuscate it via, e.g., blurring.
sauvik.me
A longstanding usable privacy challenge is helping users understand and address privacy risks in personal images they share online, from location risks to bystander risks.

Imago Obscura lets users bring their own threat model, and then helps identify and address specific risks.
sauvik.me
🔐 New #UIST2025 paper by @kyzyl.me

Imago Obscura uses #vision #language #models to understand user #privacy concerns, improve their awareness of image privacy risks, and improve their ability to address these risks.

📜: sauvik.me/papers/66/se...
🔗: cmu-spuds.github.io/imago-obscura/

#PrivacySky
sauvik.me
Honestly, now I want to write a paper titled: "[mention a specific relevant paper by them]"
Reposted by Sauvik Das
hystericalblkns.bsky.social
If you’re on academia dot edu, let me suggest that you strongly consider deleting your account.
The new TOC from academia dot edu. 

By creating an Account with Academia.edu, you grant us a worldwide, irrevocable, non-exclusive, transferable license, permission, and consent for Academia.edu to use your Member Content and your personal information (including, but not limited to, your name, voice, signature, photograph, likeness, city, institutional affiliations, citations, mentions, publications, and areas of interest) in any manner, including for the purpose of advertising, selling, or soliciting the use or purchase of Academia.edu's Services.
sauvik.me
CHI reviews? 😱
sauvik.me
well then, that's clearly going to be an example in my Designing Human-Centered Software class.
sauvik.me
Wait, okay, minor correction on the above. The macroeconomic environment is worse for the majority of people, but of course the richest people continue to feast.
sauvik.me
Still worth paying attention to, of course, because if our students are not getting jobs, then that's a big problem we need to help them solve.
sauvik.me
The article notes that "AI" is probably not the issue. It's a handy bogeyman. I use AI coding assistants and I code more now, not less...

The market is worse relative to a few years ago, but that's probably largely because the macroeconomic environment is worse relative to a few years ago.
sauvik.me
having a latte at a local coffee shop with some salmon on toast while they’re blaring Soundgarden’s “Outshined" and it's raining just a lil?