Siqi Liu (刘思奇)
@liusiqi.bsky.social
63 followers 150 following 3 posts
Staff Research Engineer @ DeepMind
liusiqi.bsky.social
We have exciting (and unconventional) stuff cooking, and we are hiring a strong research engineer for the GDM Game Theory team in London.

Consider applying if you are interested in the intersection of game theory, multiagent systems and LLMs!
job-boards.greenhouse.io/deepmind/job...
Research Engineer, Game Theory & Multi-Agent Systems
London, UK
job-boards.greenhouse.io
liusiqi.bsky.social
Frontier models are often compared on crowdsourced user prompts, but those prompts can be low-quality, biased and redundant, making "performance on average" hard to trust.

Come find us at #ICLR2025 to discuss game-theoretic evaluation (shorturl.at/0QtBj)! See you in Singapore!
Re-evaluating Open-Ended Evaluation of Large Language Models
A case study using the livebench.ai leaderboard.
shorturl.at
Reposted by Siqi Liu (刘思奇)
lukemarris.bsky.social
[🧵1/N] Thrilled to share our work "Re-evaluating Open-Ended Evaluation of Large Language Models"! 🚀 Popular LLM leaderboards (think Elo/Chatbot Arena) are useful, but are they telling the whole story? We find issues w/ redundancy & bias. 🤔
Paper @ ICLR 2025: arxiv.org/abs/2502.20170 #LLM #ICLR2025
Reposted by Siqi Liu (刘思奇)
jeffdean.bsky.social
🥁Introducing Gemini 2.5, our most intelligent model with impressive capabilities in advanced reasoning and coding.

Now integrating thinking capabilities, 2.5 Pro Experimental is our most performant Gemini model yet. It’s #1 on the LM Arena leaderboard. 🥇
Reposted by Siqi Liu (刘思奇)