Eytan Adar
@eytan.adar.prof
1.6K followers 2.3K following 350 posts
Michigan faculty, http://www.cond.org
eytan.adar.prof
That's an interesting point, but that's not really due to recorded music, is it? We sit around a TV, not a record player or stereo (though maybe those technologies were the bridge).
eytan.adar.prof
I should also add that you don't need the LLM/VLM to do this... you can have humans do some (or all) of the algorithm. It's just that the LLM/VLM solution makes it scale and can surface images that editors might not have considered. (6/5)
eytan.adar.prof
Images selected in this way are better representatives of the main properties of the concept, while also highlighting what makes it different from other, related concepts. The paper with more details is here: arxiv.org/abs/2509.15059 (5/5)
QuizRank: Picking Images by Quizzing VLMs
Images play a vital role in improving the readability and comprehension of Wikipedia articles by serving as "illustrative aids." However, not all images are equally effective and not all Wikipedia edi...
arxiv.org
eytan.adar.prof
We also extended this idea to build a Contrastive QuizRank. Instead of just considering what is interesting about the concept (e.g., the Western Bluebird), we let the LLM figure out what makes it different from a "distractor" (e.g., the Mountain Bluebird). (4/5)
Image of a mountain bluebird which is entirely blue
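For intuition, here's a rough sketch of what that contrastive question step could look like; the `llm` callable and the prompt wording are hypothetical stand-ins, not the paper's actual interface:

```python
# Hypothetical sketch (not the paper's code): ask an LLM for questions whose
# answers differ between the target concept and a distractor concept.
def contrastive_questions(llm, target, distractor, article):
    prompt = (
        f"Here is an article about the {target}:\n{article}\n\n"
        f"Write visual questions whose correct answer for the {target} "
        f"differs from the answer for the {distractor}, one per line."
    )
    # Keep non-empty lines; each line is one question.
    return [q for q in llm(prompt).splitlines() if q.strip()]
```

So contrastive_questions(llm, "Western Bluebird", "Mountain Bluebird", article) should yield questions like "What color is the chest?" that separate the two species.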
eytan.adar.prof
So if I ask, "What is the chest color of the Western Bluebird?", a good image helps you answer correctly (orange). QuizRank uses an LLM to generate questions (based on the article) and a VLM to take the test. Images that help the VLM do well on the test are better and are ranked more highly. (3/5)
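A minimal sketch of that ranking loop under stated assumptions: `questions` are (question, answer) pairs an LLM generated from the article, and `vlm_answer` is a hypothetical call that has a VLM answer a question about an image (not the paper's implementation):

```python
# QuizRank-style sketch: each image "takes the quiz," and images are ranked
# by the VLM's accuracy when shown that image.
def quizrank(images, questions, vlm_answer):
    def quiz_score(image):
        correct = sum(vlm_answer(image, q) == a for q, a in questions)
        return correct / len(questions)
    return sorted(images, key=quiz_score, reverse=True)
```

For the Western Bluebird, `questions` might look like [("What is the chest color?", "orange"), ...].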
eytan.adar.prof
For example, given the 234 different images of the Western Bluebird, which should I pick? Our intuition is that with a "good" instructional image, someone should be able to answer questions about important visual properties of the concept better than with a "bad" image. (2/5)
eytan.adar.prof
Fun paper from a recent project with UMich alum Tenghao Ji. Can we pick better images for Wikipedia articles? Given all the choices in Wikimedia Commons, which image is best as an "instructional aid"? (1/5)
An image of a male western bluebird. The chest is a brown color and the back and head are blue.
eytan.adar.prof
And by *cool, I mean the research. Not what it implies
eytan.adar.prof
Very cool... We found something related in privacy: affordances/design → shifts in perceived norms around sharing (which deviate from actual norms) → increased sharing by the individual → increased sharing by the community
eytan.adar.prof
I'm not that mean... I only hate your single-author papers... :) (but seriously, good luck. We're too old to stay up this late)
eytan.adar.prof
to arXiv? yes... probably
eytan.adar.prof
We need something more technical sounding than "co-evolution" to describe this... RHLAF: reinforcing human learning through AI feedback? HAAF: human adaptation through AI feedback? :)
timkellogg.me
As LLMs Improve, People Adapt Their Prompts

a study shows that a lot of the real-world performance gains that people see are actually because people learn how to use the model better

arxiv.org/abs/2407.14333
The chart presents the decomposition of the Average Treatment Effect (ATE) on cosine similarity into two components: a Model Effect (red) and a Prompting Effect (blue).
• Y-axis: Δ cosine similarity (change in similarity).
• X-axis: the source of the prompts (top labels) and the replay model used (bottom labels).
• Points and error bars: mean effects with 95% confidence intervals, bootstrapped and clustered by participant.

Breakdown:
1. DALL-E 2 prompts on DALL-E 2 (baseline): Δ cosine similarity is ~0, establishing the reference point.
2. DALL-E 2 prompts replayed on DALL-E 3: shows the Model Effect (an increase of ~0.007–0.008). This isolates the improvement attributable to the newer model given the same prompts.
3. DALL-E 3 prompts on DALL-E 3 vs. DALL-E 2 prompts on DALL-E 3: the additional boost is the Prompting Effect (~0.006–0.007).
4. Total ATE (black bracket): when prompts written for DALL-E 3 are used on DALL-E 3, the improvement in cosine similarity reaches ~0.016–0.018.
5. DALL-E 3 prompts replayed on DALL-E 2: the effect is small, close to baseline, showing the limited benefit of improved prompts without the newer model.

Summary (from the caption):
• ATE (black) = Model Effect (red) + Prompting Effect (blue).
• Both model upgrades (DALL-E 3 vs. DALL-E 2) and better prompt design contribute to improved performance.
• Prompting alone offers some gains, but most of the improvement comes from model advancements.
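Spelling out the additive decomposition from the caption, with the approximate ranges read off the chart description above (not exact values from the paper):

```latex
\[
\underbrace{\mathrm{ATE}}_{\approx\, 0.016\text{--}0.018}
  \;=\; \underbrace{\Delta_{\mathrm{model}}}_{\approx\, 0.007\text{--}0.008}
  \;+\; \underbrace{\Delta_{\mathrm{prompt}}}_{\approx\, 0.006\text{--}0.007}
\]
\]
```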
eytan.adar.prof
I'm very curious... Are cheese curlers often used in the bathtub? Was the art director like, "How do I show that our cheese curler is rust-proof?"
An Amazon ad for a cheese curler being used in a bathtub
eytan.adar.prof
It's just a mess of a reviewing process. Forced reviewing leads to non-expert, low-quality, short reviews that arrive late (well into the rebuttal period), etc. They make a slightly random process very random.
eytan.adar.prof
If you depend on EMNLP/ARR for your publishing or for hiring/promotion, I feel bad for you.
eytan.adar.prof
I'm not sure who recommended it to me, but let me pass it on since it was a fun read: "Get the Picture" by Bianca Bosker
Cover of the book Get the Picture
eytan.adar.prof
One more for your collection: Leçons de Statique Graphique, Favaro, 1885
eytan.adar.prof
Lotka, 1926... so "contemporary" :)
3D plot of a model of the age distribution of the US population
eytan.adar.prof
I still like it... ;) I'm just ACing some conferences, so the struggle is on my mind
eytan.adar.prof
I like it, but I've been noticing an increasing number of conflicts with some communities doing lots of co-authoring and big vision papers that include everyone.