Reposted
Hanna Wallach
@hannawallach.bsky.social
· Jun 15
Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge
The measurement tasks involved in evaluating generative AI (GenAI) systems lack sufficient scientific rigor, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges com...
arxiv.org
Reposted
Yoav Goldberg
@yoavgo.bsky.social
· Dec 17
Reposted
Edward Grefenstette
@egrefen.bsky.social
· Nov 19
Reposted
Marco
@mcognetta.bsky.social
· Nov 11