Evals, metrics, multilinguality, multiculturality, multimodality, and (dabbling in) reasoning
https://saxon.me/
Interestingly, this is true only for some multilingual models. Aya knows China best in Chinese, but LLaMA always does best in English.
www.youtube.com/watch?v=B9hG...
This lecture is open to the public. If you're interested in joining, here's a GCal invite link: calendar.google.com/calendar/eve...
Check out the tutorial I'm giving with @marstin.bsky.social and @xiangyue96.bsky.social, "The Science of Benchmarking: What's Measured, What's Missing, What's Next", on December 2 from 1:30 to 4:00pm.
benchmarking.science
What will we cover?
1/3
Are the equations supporting an argument or are they just a fancy way to express something simple? Do introduced terms do anything or get referenced anywhere?
I find the answer is usually no in the kinds of papers I review.
a 🧵 1/n
Drain: arxiv.org/abs/2511.04820
Strain: direct.mit.edu/qss/article/...
Oligopoly: direct.mit.edu/qss/article/...
More than anything my PhD taught me this.
We’re testing new systems to improve reply quality. See what’s coming: bsky.social/about/blog/1...
FYI the blog post for the updated policy is out. Our LLM future is dire :/
Also, check out the cool bsky comment integration I've added to the blog! Engagement with this post will go under the blogpost on my site as comments!
saxon.me/blog/2024/gr...
Turning the replies to a Bluesky post into the comment section for a blog post is a small, concrete way to support the ecosystem: future visitors who want to add comments are incentivized to interact on the platform.
Also, it's very easy to do:
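For readers who want to try this, here is a rough sketch of one way it could work, not necessarily how the blog linked above does it. It assumes the public `app.bsky.feed.getPostThread` endpoint on `public.api.bsky.app`, which returns a post together with its nested replies. The helper names `fetch_thread` and `flatten_replies` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public, unauthenticated AppView endpoint for reading a thread.
API = "https://public.api.bsky.app/xrpc/app.bsky.feed.getPostThread"


def fetch_thread(at_uri, depth=6):
    """Fetch the thread rooted at an at:// post URI (makes a network call)."""
    query = urllib.parse.urlencode({"uri": at_uri, "depth": depth})
    with urllib.request.urlopen(f"{API}?{query}") as resp:
        return json.load(resp)["thread"]


def flatten_replies(thread, level=0):
    """Walk the nested reply tree and return a flat list of comment dicts."""
    comments = []
    for reply in thread.get("replies", []):
        post = reply.get("post")
        if post is None:
            # Blocked or missing posts come back without a "post" key; skip them.
            continue
        comments.append({
            "author": post["author"]["handle"],
            "text": post["record"].get("text", ""),
            "level": level,  # nesting depth, useful for indenting in the page
        })
        comments.extend(flatten_replies(reply, level + 1))
    return comments
```

To use it, pass the post's own `at://<did>/app.bsky.feed.post/<rkey>` URI to `fetch_thread`, then render the list from `flatten_replies` under the blog post, indenting each comment by its `level`.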
youtu.be/6i2I3dkZ5-M