Arkadiy Saakyan
@asaakyan.bsky.social
400 followers 200 following 6 posts
PhD student at Columbia University working on human-AI collaboration, AI creativity and explainability. prev. intern @GoogleDeepMind, @AmazonScience asaakyan.github.io
Pinned
asaakyan.bsky.social
Can vision-language models understand figurative meaning in multimodal inputs, like visual metaphors, sarcastic captions or memes? Come find out at our #NAACL2025 poster on Friday at 9am!

New task & dataset of images and captions with figurative phenomena like metaphor, idiom, sarcasm, and humor.
Reposted by Arkadiy Saakyan
danielsc4.it
📢 New paper: Applied interpretability 🤝 MT personalization!

We steer LLM generations to mimic human translator styles on literary novels in 7 languages. 📚

SAE steering can beat few-shot prompting, leading to better personalization while maintaining quality.

🧵1/
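Not from the paper itself, but the general recipe behind SAE steering looks roughly like this: pick a feature whose SAE decoder direction tracks the target translator's style and add a scaled copy of it to the residual stream while decoding. A minimal sketch, where the model, layer index, steering strength, and steering vector are all placeholders (a real run would use the SAE's decoder column for the chosen feature):

```python
# Illustrative sketch of SAE-style activation steering, not the paper's implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper steers larger MT-capable LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

layer, alpha = 6, 4.0                      # placeholder layer index and steering strength
d_model = model.config.hidden_size
style_direction = torch.randn(d_model)     # placeholder: in practice, the SAE decoder
style_direction /= style_direction.norm()  # column for the target "translator style" feature

def steering_hook(module, inputs, output):
    # GPT-2 blocks return a tuple; the first element holds the hidden states.
    hidden = output[0] + alpha * style_direction.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[layer].register_forward_hook(steering_hook)
prompt = "Translate into French: The old house stood silent at the edge of the woods."
ids = tok(prompt, return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40, do_sample=False)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```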
asaakyan.bsky.social
Even powerful models achieve only a 50% explanation adequacy rate, suggesting difficulties in reasoning about figurative inputs. Hallucination & unsound reasoning are the most prominent error categories.
asaakyan.bsky.social
Our main results are:
1. VLMs struggle to generalize from literal to figurative meaning understanding (training on e-ViL alone achieves only chance-level F1 on our task)
2. Figurative meaning in the image is harder to explain compared to when it is in the text
3. VLMs benefit from image data during fine-tuning
asaakyan.bsky.social
Via human-AI collaboration, we augment existing datasets for multimodal metaphors, sarcasm, and humor with entailed/contradicted captions and textual explanations. The figurative part can be in the image, the caption, or both. We benchmark a variety of models on the resulting data.
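Roughly, each resulting instance pairs an image premise with a caption hypothesis, a gold label, and a textual explanation. A sketch of that shape (field names are illustrative, not the released schema):

```python
# Illustrative only: a guess at the shape of one dataset instance.
from dataclasses import dataclass
from typing import Literal

@dataclass
class FigurativeEntailmentInstance:
    image_path: str                                      # premise image
    caption: str                                         # hypothesis caption
    label: Literal["entailment", "contradiction"]        # gold visual-entailment label
    explanation: str                                     # gold textual explanation
    phenomenon: Literal["metaphor", "idiom", "sarcasm", "humor"]
    figurative_in: Literal["image", "caption", "both"]   # where the figurative meaning lives
```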
asaakyan.bsky.social
We frame the multimodal figurative meaning understanding problem as an explainable visual entailment task between an image (premise) and its caption (hypothesis). The VLM predicts whether the image entails or contradicts the caption, and shows the reasoning steps in a textual explanation.
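In practice this can be posed to any VLM as a prompt over the image and caption; a minimal sketch, where `query_vlm` is a hypothetical stand-in for whatever VLM inference call (API or local model) is available:

```python
# Illustrative sketch of the task format only: image premise, caption hypothesis,
# and a VLM asked for an entail/contradict label plus a free-text explanation.
from typing import Callable, Tuple

PROMPT_TEMPLATE = (
    "Does the image entail or contradict the caption?\n"
    'Caption: "{caption}"\n'
    "Answer with 'entailment' or 'contradiction' on the first line, "
    "then explain your reasoning."
)

def predict_entailment(
    query_vlm: Callable[[str, str], str],  # hypothetical: (image_path, prompt) -> response text
    image_path: str,
    caption: str,
) -> Tuple[str, str]:
    response = query_vlm(image_path, PROMPT_TEMPLATE.format(caption=caption))
    first_line, _, rest = response.partition("\n")
    label = "entailment" if "entail" in first_line.lower() else "contradiction"
    return label, rest.strip()
```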
Reposted by Arkadiy Saakyan
gsagostini.bsky.social
Migration data lets us study responses to environmental disasters, social change patterns, policy impacts, etc. But public data is too coarse, obscuring these important phenomena!

We build MIGRATE: a dataset of yearly flows between 47 billion pairs of US Census Block Groups. 1/5
Reposted by Arkadiy Saakyan
jennarussell.bsky.social
People often claim they know when ChatGPT wrote something, but are they as accurate as they think?

Turns out that while the general population is unreliable, those who frequently use ChatGPT for writing tasks can spot even "humanized" AI-generated text with near-perfect accuracy 🎯