@prakharg.bsky.social
prakharg.bsky.social
Four case studies on the gap between the reality of model use and the sandboxed evaluations used in audits... Definitely need to take a deeper dive. Great presentation by Emily Black!
prakharg.bsky.social
Evaluations under the conditions in which the model would actually be deployed, vs evaluations only in controlled, unrealistic settings!
prakharg.bsky.social
Allowing companies to run isolated audits can lead to d-hacking (cherry-picking favorable discrimination results, much like p-hacking)!! More robust testing is needed...
prakharg.bsky.social
Legal frameworks tend to govern allocative decisions (yes/no outcomes), which map well onto traditional ML systems... but not onto GenAI systems
prakharg.bsky.social
Zollo et al.: Towards Effective Discrimination Testing for Generative AI
#FAccT2025
prakharg.bsky.social
The nuance of stereotype errors is so important for understanding their true harms... Insightful presentation by @angelinawang.bsky.social
prakharg.bsky.social
Women tend to report stereotype-reinforcing errors as more harmful, while men tend to report stereotype-violating errors as more harmful...
prakharg.bsky.social
Some items are more associated with men vs women (not surprising), but not all of these associations are equally harmful!!
prakharg.bsky.social
Three ways to measure harms ('pragmatic harms'): cognitive beliefs, attitudes, and behaviours...
prakharg.bsky.social
Are all errors equally harmful? No! Stereotype-reinforcing errors vs stereotype-violating errors
prakharg.bsky.social
Our understanding of stereotypes sometimes isn't indicative of reality... they can appear in both directions, or might exist without causing harm
prakharg.bsky.social
Wang et al.: Measuring Machine Learning Harms from Stereotypes Requires Understanding Who Is Harmed by Which Errors in What Ways
#FAccT2025
prakharg.bsky.social
Clear narrative and a great presentation by Cecilia Panigutti
prakharg.bsky.social
Risk-measuring studies - Back to risk measurement, but this time with a clearly defined objective rather than the open-ended risk-uncovering from before... Not just whether a risk exists, but how severe it is
prakharg.bsky.social
Interface-design studies - Focus on the UI design elements that shape user interaction
prakharg.bsky.social
Reverse-engineering studies - Narrower-scope, in-depth studies of how algorithms work... Methodological precision is the key!
prakharg.bsky.social
Risk-uncovering studies - Typically start from anecdotal evidence and help surface new risks
prakharg.bsky.social
A review organized not by data collection technique, but by DSA risk management framework categories
prakharg.bsky.social
Narrative review of algorithmic auditing studies, practical recommendations for best practices, and a mapping to DSA obligations...
prakharg.bsky.social
Panigutti et al.: How to investigate algorithmic-driven risks in online platforms and search engines? A narrative review through the lens of the EU Digital Services Act
#FAccT2025
prakharg.bsky.social
Such a broad topic... Excellent presentation by @feliciajing.bsky.social
prakharg.bsky.social
Historical methods, working alongside the many other ways of auditing these models, can help us take advantage of the broader scope of historical evaluation...
prakharg.bsky.social
AI audits have moved from bottom-up external evaluations to new-age 'auditing companies'. While this has increased speed and scale, it has significantly narrowed the scope of auditing.
prakharg.bsky.social
Why the history of AI assessments? Studying AI through the lens of historical methods can help us understand neglected areas of auditing.
prakharg.bsky.social
Sandoval and Jing: Historical Methods for AI Evaluations, Assessments, and Audits
#FAccT2025