Manfred Diaz
@manfreddiaz.bsky.social
2.3K followers 750 following 39 posts
Ph.D. Candidate at Mila and the University of Montreal, interested in AI/ML connections with economics, game theory, and social choice theory. https://manfreddiaz.github.io
manfreddiaz.bsky.social
It hasn't disappointed thus far!
manfreddiaz.bsky.social
I was following this one during the COVID pandemic, but it has been inactive for quite some time. The original talks' recordings are amazing, though!
manfreddiaz.bsky.social
Yeah, it's been a busy period for all of us simultaneously! I have also been pretty busy with thesis/job search. Hopefully, it will be back up and running in the Fall term!
manfreddiaz.bsky.social
@aamasconf.bsky.social 2025 was very special for us! We had the opportunity to present a tutorial on general evaluation of AI agents, and we got a best paper award! Congrats, @sharky6000.bsky.social and the team! 🎉
sharky6000.bsky.social
That's a wrap for day #4 @aamasconf.bsky.social. I did not present anything today, but I am honored that we received the best paper award!

Thanks to everyone who made it happen! 👇 1/2
Reposted by Manfred Diaz
jzleibo.bsky.social
First LessWrong post! Inspired by Richard Rorty, we argue for a different view of AI alignment, where the goal is "more like sewing together a very large, elaborate, polychrome quilt", than it is "like getting a clearer vision of something true and deep"
www.lesswrong.com/posts/S8KYwt...
Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt — LessWrong
We can just drop the axiom of rational convergence.
www.lesswrong.com
manfreddiaz.bsky.social
The quality of London's museums is just amazing! Enjoy!
Reposted by Manfred Diaz
sharky6000.bsky.social
Our new evaluation method, Soft Condorcet Optimization, is now available open-source! 👍

Both the sigmoid (smooth Kendall-tau) and Fenchel-Young (perturbed optimizers) versions.

Also, an optimized C++ implementation that is ~40X faster than the Python one. 🤩⚡

github.com/google-deepm...
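
For intuition, here is a minimal, hypothetical Python sketch of the sigmoid (smooth Kendall-tau) idea mentioned above: the hard count of discordant pairs is relaxed with a sigmoid so it can be minimized by gradient descent. Function and parameter names are illustrative only, not the open-source repo's API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_kendall_tau(ratings, votes, temperature=1.0):
    """Sigmoid-relaxed count of discordant pairs: each observed
    (winner, loser) vote contributes ~1 when the loser is rated above
    the winner and ~0 otherwise."""
    return sum(
        sigmoid((ratings[loser] - ratings[winner]) / temperature)
        for winner, loser in votes
    )

def gradient_step(ratings, votes, lr=0.5, temperature=1.0):
    """One plain gradient-descent step on the smoothed objective."""
    grad = np.zeros_like(ratings)
    for winner, loser in votes:
        s = sigmoid((ratings[loser] - ratings[winner]) / temperature)
        g = s * (1.0 - s) / temperature  # derivative of the sigmoid wrt its argument
        grad[loser] += g
        grad[winner] -= g
    return ratings - lr * grad

# Toy example: agent 0 beats 1 and 2, agent 1 beats 2.
ratings = np.zeros(3)
votes = [(0, 1), (0, 2), (1, 2)]
for _ in range(500):
    ratings = gradient_step(ratings, votes)
print(np.argsort(-ratings))  # expected ranking: [0, 1, 2]
```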
Reposted by Manfred Diaz
sharky6000.bsky.social
Working at the intersection of social choice and learning algorithms?

Check out the 2nd Workshop on Social Choice and Learning Algorithms (SCaLA) at @ijcai.bsky.social this summer.

Submission deadline: May 9th.

I attended last year at AAMAS and loved it! 👍

sites.google.com/corp/view/sc...
SCaLA-25
A workshop connecting research topics in social choice and learning algorithms.
sites.google.com
manfreddiaz.bsky.social
The AAMAS website might be a good reference for this, but it may not be; I'm uncertain atm.
manfreddiaz.bsky.social
Come to understand ML evaluation from first principles! We have put together a great AAMAS tutorial covering statistics, probabilistic models, game theory, and social choice theory.

Bonus: a unifying perspective of the problem leveraging decision-theoretic principles!

Join us on May 19th!
manfreddiaz.bsky.social
Re #2: The key finding there is that the stationary points of SCO contain the margin matrix but, as I said in the note, there is still more work to do!
manfreddiaz.bsky.social
Thanks! I have been meaning to update the manuscript so it stands alone without the main paper, but instead I may change the content to a different format 😉. Coming soon!
manfreddiaz.bsky.social
Ah, I see the confusion... I never used the "identically distributed assumption," only the independence assumption (from 8 to 9).
manfreddiaz.bsky.social
I'm not sure if I understood your question correctly, but yes? As the post you shared says, "Voila! We have shown that minimizing the KL divergence amounts to finding the maximum likelihood estimate of θ." Maybe I am missing your point 😬
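
For reference, the standard derivation behind that quote, written in generic notation (with \hat{p}_{\text{data}} the empirical distribution of N samples):

```latex
\begin{align*}
\arg\min_{\theta}\, \mathrm{KL}\!\left(\hat{p}_{\text{data}} \,\middle\|\, p_{\theta}\right)
  &= \arg\min_{\theta}\, \mathbb{E}_{x \sim \hat{p}_{\text{data}}}\!\left[\log \hat{p}_{\text{data}}(x) - \log p_{\theta}(x)\right] \\
  &= \arg\max_{\theta}\, \mathbb{E}_{x \sim \hat{p}_{\text{data}}}\!\left[\log p_{\theta}(x)\right]
     && \text{(the entropy term does not depend on } \theta\text{)} \\
  &= \arg\max_{\theta}\, \frac{1}{N} \sum_{n=1}^{N} \log p_{\theta}(x_n)
   \;=\; \hat{\theta}_{\mathrm{MLE}}.
\end{align*}
```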
manfreddiaz.bsky.social
Elo drives most LLM evaluations, but we often overlook its assumptions, benefits, and limitations. While working on SCO, we wanted to understand the SCO-Elo distinction, so I took a closer look, uncovered some intriguing findings, and documented them in these notes. I hope you find them valuable!
sharky6000.bsky.social
Btw, if you stare at the derivation of Elo as logistic regression, SCO is really quite close to Elo. The difference is that Elo uses a classification objective (cross entropy loss) on top of the output of the sigmoid.

@manfreddiaz.bsky.social dug even deeper: manfreddiaz.github.io/assets/pdf/s...
manfreddiaz.github.io
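
A rough Python sketch of that contrast (illustrative names only, not the paper's notation): Elo's per-game loss is a cross-entropy on top of the sigmoid win probability, while an SCO-style per-vote loss is the sigmoid itself, used as a smooth discordance indicator.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elo_loss(ratings, winner, loser):
    """Logistic-regression view of Elo: cross-entropy (negative
    log-likelihood) of the observed outcome under sigmoid(r_w - r_l)."""
    return -np.log(sigmoid(ratings[winner] - ratings[loser]))

def sco_loss(ratings, winner, loser):
    """SCO-style term: the sigmoid itself, i.e. a smoothed indicator of
    the pair being discordant with the observed outcome."""
    return sigmoid(ratings[loser] - ratings[winner])

# Both losses shrink as the winner's rating pulls ahead of the loser's,
# but they penalize mistakes differently: the cross-entropy grows without
# bound for confident wrong predictions, while the sigmoid saturates at 1.
r = np.array([1.0, 0.0])
print(elo_loss(r, 0, 1), sco_loss(r, 0, 1))
```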
Reposted by Manfred Diaz
sharky6000.bsky.social
Looking for a principled evaluation method for ranking of *general* agents or models, i.e., ones that get evaluated across a myriad of different tasks?

I’m delighted to tell you about our new paper, Soft Condorcet Optimization (SCO) for Ranking of General Agents, to be presented at AAMAS 2025! 🧵 1/N
manfreddiaz.bsky.social
I had the convexity results for the online pairwise update (Section B.1.1.1) in my notes (manfreddiaz.github.io/assets/pdf/s...), but it is not clear to me if they hold for the other non-online settings. Worth taking a more detailed pass over the paper!
manfreddiaz.github.io
manfreddiaz.bsky.social
That's a nice finding, @sacha2.bsky.social! @sharky6000.bsky.social I skimmed it, and it seems neat! There is an important distinction, though: they work in the "online" Elo regime, departing from the traditional gradient/batch gradient-descent updates (e.g., FIDE doesn't use online updates).
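
To make that distinction concrete, here is a hypothetical Python sketch of the two regimes for fitting ratings under the same logistic model: an online rule that updates immediately after each game (so game order matters) versus a single batch gradient-descent step over all games at once. Details are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def online_updates(ratings, games, k=0.1):
    """Online regime: update after each (winner, loser) game; the final
    ratings depend on the order in which games arrive."""
    ratings = ratings.copy()
    for winner, loser in games:
        p_win = sigmoid(ratings[winner] - ratings[loser])
        ratings[winner] += k * (1.0 - p_win)
        ratings[loser] -= k * (1.0 - p_win)
    return ratings

def batch_gradient_step(ratings, games, lr=0.1):
    """Batch regime: one gradient-descent step on the summed
    cross-entropy loss over all games, evaluated at the current ratings."""
    grad = np.zeros_like(ratings)
    for winner, loser in games:
        p_win = sigmoid(ratings[winner] - ratings[loser])
        grad[winner] -= 1.0 - p_win
        grad[loser] += 1.0 - p_win
    return ratings - lr * grad
```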