We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs
on different prompts, domains, tokens, models...
Join our community effort to expand it with YOUR model predictions & become a co-author!
While regulators rely on benchmarking 📊, we show why it cannot guarantee AI behavior:
arxiv.org/pdf/2501.15693
Excited about this multidisciplinary collaboration!
@gabistanovsky.bsky.social,
@rkeydar.bsky.social, Gadi Perl