@boydgraber.bsky.social
190 followers 350 following 37 posts
boydgraber.bsky.social
We had come at it more from the position of trying to use as few dev examples as possible (to keep them secret). I.e., pick the best items you can and evaluate every model on the exact same set. But it makes sense to use the adaptive testing scenario if you don't mind potentially exposing more of the dev set.
boydgraber.bsky.social
In 2021, we proposed using IRT to find bad examples and to create more targeted leaderboards (Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?).

From my reading, the big difference seems to be that they're also using the agent's skill, which is super cool!
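
For readers unfamiliar with IRT (item response theory), here is a minimal sketch of the idea behind both approaches, assuming a standard two-parameter logistic (2PL) model. This is not necessarily the model used in either paper, and the item names and parameter values are hypothetical. It shows how an item's informativeness depends on the skill of the subject being tested, which is why adaptive testing can choose different items for different models, and how a very low discrimination value can flag a suspect example.

```python
import math

def p_correct(theta, a, b):
    """2PL IRT: probability that a subject with skill theta answers correctly.

    a is the item's discrimination, b is its difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at skill level theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical items for illustration: name -> (discrimination a, difficulty b)
ITEMS = {
    "easy": (1.0, -2.0),
    "medium": (1.5, 0.0),
    "hard": (1.2, 2.0),
    "noisy": (0.1, 0.0),  # barely separates strong from weak subjects
}

def rank_items(theta, items=ITEMS):
    """Order items from most to least informative for a subject of skill theta."""
    return sorted(items, key=lambda name: information(theta, *items[name]), reverse=True)

def low_discrimination(items=ITEMS, threshold=0.3):
    """Flag items whose discrimination is below an (assumed) threshold."""
    return [name for name, (a, _) in items.items() if a < threshold]

if __name__ == "__main__":
    print(rank_items(0.0))   # ranking for an average-skill subject
    print(rank_items(2.0))   # ranking for a strong subject
    print(low_discrimination())
```

A fixed evaluation set corresponds to calling `rank_items` once at a single reference skill and reusing the top items for every model; adaptive testing re-ranks with each model's current skill estimate, at the cost of exposing more items overall.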
boydgraber.bsky.social
We also found that it's helpful for improving uncertainty estimation of models:

arxiv.org/abs/2205.12507
boydgraber.bsky.social
If it said that 1990 was "about 10 years ago", I would say that it has reached tenured faculty-level intelligence.
boydgraber.bsky.social
Today's the deadline to apply for an AI-specific teaching track position at UMD:

umd.wd1.myworkdayjobs.com/UMCP/job/Uni...

Please join us!
boydgraber.bsky.social
A couple of weeks ago I left my family behind at a cable car station to finish climbing to the peak of a mountain because they were too scared to continue. When I reached the top, my phone gave a notification: new podcasts available for download. Apparently LMU has an observatory on Wendelstein.
boydgraber.bsky.social
Do you mean salary, physical facilities, work environment, or funding ecosystem?
boydgraber.bsky.social
At the risk of picking out one of my favorite children, this was the paper with our best traditional video of this cycle (thanks to Jon May for playing along):

t.co/QQlgwzo6jf
t.co/2G6kwAAPMy
https://youtu.be/L_hcHQep3fc
boydgraber.bsky.social
My students and I are presenting three papers on Monday at #ACL2025 and this thread will recap them (including their videos).
boydgraber.bsky.social
The precursor to this paper, "The Incoherence of Coherence", had our most-watched paper video ever, so I thought we had to surpass it somehow ... so we decided to do a song parody (of Roxanne, obviously):

youtu.be/87OBxEM8a9E
boydgraber.bsky.social
Sara’s Crias went 4-2 to win the tournament (and $150). Noah Sheidlower’s music packet was the most difficult for computers, and Jame Carlson’s Spatial Reasoning was the fan favorite. We’ll announce writer and computer prizes after our online mirror. (And also post the packets.)
boydgraber.bsky.social
We had our first human–computer cooperative AI tournament at UMD. Key takeaways: 1) computers are getting better at trivia; 2) they still suck at calibration; 3) our teaming mechanic kept the games competitive and mostly fun (at least that’s what the players said).
Human-Computer AI Collaborative Tournament Gameplay
boydgraber.bsky.social
Today is the deadline to sign up for our Human-Computer trivia competition held on June 14, 2024 in College Park, MD. $150 prize for the team who can answer the most questions with the help of an AI.
QANTA: Question Answering is not a Trivial Activity Logo [Two humans working with a computer to answer a question]