Martin Ziqiao Ma
@marstin.bsky.social
160 followers 62 following 40 posts
phd(UMich); ex({MIT_IBM_Watson, Adobe, Amazon}); Make the community better @ACLMentorship @GrowAILikeChild Herborium Lover, Fortune Teller, Pokémon Trainer, Szechuan Cuisine Chef. https://mars-tin.github.io
marstin.bsky.social
Regrettably can’t attend #COLM2025 due to deadlines, but Jane and Joyce will be presenting our work. :)

Jane is an exceptional undergraduate researcher and a great collaborator! Go meet her at COLM if you’re curious about her work on mechanistic interpretability, multimodality, & pragmatics!
marstin.bsky.social
Vision-Language Models are not yet pragmatically optimal.

We identify 3 key failures of pragmatic competence in referring expression generation with VLMs: (1) failing to refer uniquely to the referent, (2) including excessive or irrelevant information, and (3) misaligning with human pragmatic preferences.
Reposted by Martin Ziqiao Ma
fredashi.bsky.social
🚀 ACL ARR is looking for a Co-CTO to join me in leading our amazing tech team and driving the future of our workflow. If you’re interested or know someone who might be, let’s connect!

RTs & recommendations appreciated.
aclrollingreview.bsky.social
🚨 ARR is looking for a volunteer Co-CTO to help improve tech infrastructure!
🛠️ Preferred:
• 5+ years in NLP research
• Git, CLI tools, Python, and basic HTML
• 2-year role, overlapping with current Co-CTO
Interested? DM @fredashi.bsky.social or email [email protected]
#ARR #ACL #NLProc
marstin.bsky.social
Unfortunately, I’ll be missing #ACL2025NLP this year — but here are a few things I’m excited about! 👇
marstin.bsky.social
📣 Excited to announce SpaVLE: #NeurIPS2025 Workshop on Space in Vision, Language, and Embodied AI!

Join us in San Diego to push the frontiers of spatial understanding and reasoning across CV, NLP, and robotics!

👉 space-in-vision-language-embodied-ai.github.io
Reposted by Martin Ziqiao Ma
hokin.bsky.social
#CoreCognition #LLM #multimodal #GrowAI We spent 3 years curating 1,503 classic experiments spanning 12 core concepts in human cognitive development, then evaluated 230 MLLMs with 11 different prompts, 5 times each, collecting over 3.8 million inference data points.

A thread (1/n) - #ICML2025
Reposted by Martin Ziqiao Ma
hokin.bsky.social
New Paper Alert ‼️ Current VLMs completely fail human gaze understanding 🙀 and scaling does NOT help ‼️

However, humans, from an extremely early age 🧒, are extremely sensitive to other people's gaze 🙄 👀

No mentors, no labs, only pre-doc students, 111 VLMs, and we did it 😎
Reposted by Martin Ziqiao Ma
jhucompsci.bsky.social
& @tianminshu.bsky.social (+ @marstin.bsky.social, @zhitinghu.bsky.social, ‪@lianhui.bsky.social & more) will present “SimWorld: A World Simulator for Scaling Photorealistic Multi-Agent Interactions,” an @unrealengine.bsky.social-based sim that generates unlimited/diverse urban environments: (13/14)
SimWorld
SimWorld: A World Simulator for Scaling Photorealistic Multi-Agent Interactions
simworld-cvpr2025.maitrix.org
marstin.bsky.social
At Albuquerque Now :)
marstin.bsky.social
See you at #NAACL2025! I will talk about grounded lexicon acquisition and scaling mechanistically grounded vision language models. Happy to chat if you are around :)
fredashi.bsky.social
On my way to NAACL✈️! If you're also there and interested in grounding, don't miss our tutorial on "Learning Language through Grounding"!
Mark your calendar: May 3rd, 14:00-17:30, Ballroom A.

Another exciting collaboration with @marstin.bsky.social @kordjamshidi.bsky.social, Jiayuan, and Joyce!
marstin.bsky.social
We introduce RefOI, a new dataset of 1.5k objects, each with 3 written and 2 spoken human-produced referring expressions. We also release RefOI-TLHF, a large dataset of token-level human feedback for 10.6k referring expressions.

👀https://vlm-reg.github.io/
📄https://arxiv.org/abs/2504.16060
VLMs Are Not Pragmatically Competent in Referring Expression Generation
VLMs fail to refer like humans. Our study reveals widespread pragmatic issues in GPT-4o, LLaVA, and others, showing how their expressions often violate Gricean maxims.
vlm-reg.github.io
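For concreteness, here is a minimal, hypothetical sketch of how one RefOI record and one RefOI-TLHF record could be represented in Python; the field names and encodings are illustrative assumptions, not the released schema.

```python
from dataclasses import dataclass, field

@dataclass
class RefOIRecord:
    """Hypothetical layout of one RefOI entry (field names are illustrative,
    not the dataset's actual schema). Each of the 1.5k objects comes with
    3 written and 2 spoken human-produced referring expressions."""
    image_id: str                     # image containing the target object
    object_id: str                    # the intended referent in that image
    written_expressions: list[str] = field(default_factory=list)  # 3 per object
    spoken_expressions: list[str] = field(default_factory=list)   # 2 per object (transcribed)

@dataclass
class TLHFRecord:
    """Hypothetical layout of one RefOI-TLHF entry: a referring expression
    paired with token-level human feedback (10.6k expressions in total)."""
    expression_tokens: list[str]      # tokenized referring expression
    token_feedback: list[int]         # assumed encoding: 1 = keep, 0 = flagged by annotators
```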
marstin.bsky.social
Vision-Language Models are not yet pragmatically optimal.

We identify 3 key failures of pragmatic competence in referring expression generation with VLMs: (1) failing to refer uniquely to the referent, (2) including excessive or irrelevant information, and (3) misaligning with human pragmatic preferences.
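As a hedged illustration of how failures (1) and (2) can be operationalized, here is a minimal sketch: a listener-based uniqueness check and a token-flag proxy for over-informativeness. The helpers `listener_pick` and `flagged_tokens` are assumed stand-ins, not the paper's actual evaluation protocol.

```python
def refers_uniquely(expression, candidate_objects, target, listener_pick):
    """Sketch of a listener-based check for failure (1).

    An expression counts as pragmatically adequate here only if a listener,
    given the expression and all candidate objects in the scene, resolves it
    to the intended target. `listener_pick(expression, candidates)` is a
    hypothetical helper returning the chosen candidate."""
    return listener_pick(expression, candidate_objects) == target

def is_overinformative(expression_tokens, flagged_tokens):
    """Sketch of a proxy for failure (2): any token annotators mark as
    excessive or irrelevant makes the expression over-informative."""
    return any(tok in flagged_tokens for tok in expression_tokens)
```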
marstin.bsky.social
🔹 ICLR BiAlign Workshop:
We’re hosting the Bidirectional Human-AI Alignment Workshop (BiAlign).
🗓 Apr 28, (Garnet 216–214)

Website: bialign-workshop.github.io

I’ll join remotely — huge thanks to @huashen.bsky.social for leading this!
marstin.bsky.social
🔹 ICLR Oral Paper:
Do Vision-Language Models Represent Space and How?

🗓 Oral: Apr 25, 3:42–3:54 a.m. (Session 4C)
🗓 Poster: Thu, Apr 24, 10 p.m.–12:30 a.m. (Hall 3 + 2B, #212)

Website: spatial-comfort.github.io

Big thanks to @fredashi.bsky.social for presenting on site!
marstin.bsky.social
I won’t be attending #ICLR2025 in person since #NAACL2025 follows right after, but here are a few things I’m excited about (all times in EDT) ⬇️
marstin.bsky.social
🎉 Out of these, 72 papers were accepted, including 5 tiny papers. 10 papers were selected for oral presentations: 2 at CHI and 8 at ICLR. Award winners will be announced during the workshop!
marstin.bsky.social
📬 We received over 100 submissions, each reviewed by 2–4 expert reviewers, with ethical assessments included when appropriate. Our program committee features leading researchers in NLP, RL, HCI, ML, and AI/ML Ethics, carefully selected based on scholarly merit and expertise.
marstin.bsky.social
🙏 Special thanks to Tammy Masterson, Technical Partnerships Lead at the AI Security Institute, who will be joining us as a panelist.
marstin.bsky.social
🙏 We are grateful to our gold sponsors, Prolific and Layer 6 AI of TD Bank Group, for their generous support in funding paper awards and travel grants.
marstin.bsky.social
#ICLR2025 and #CHI2025 are just around the corner!

We warmly invite you to join us at our ICLR Workshop and CHI SIG on Bidirectional Human-AI Alignment (Bi-Align), a space for rigorous and reflective conversations about alignment research.
Reposted by Martin Ziqiao Ma
aclmentorship.bsky.social
📢 Join us for the ACL Mentorship Session
@naaclmeeting.bsky.social #NAACL2025

Mentors:
@amuuueller.bsky.social
@fredashi.bsky.social
• Jiayuan Mao
@marstin.bsky.social
• Oana Ignat
• Weijia Shi
@zhijingjin.bsky.social
marstin.bsky.social
Meet VEGGIE 🥦

VEGGIE is an instructional video generative model trained solely with a diffusion loss, designed for both video concept grounding and instruction-based editing. It handles diverse video editing tasks effectively through pixel-level grounded training in a multi-task learning setup. ⬇️
shoubin.bsky.social
Introducing VEGGIE 🥦—a unified, end-to-end, and versatile instructional video generative model.

VEGGIE supports 8 skills, from object addition/removal/changing, and stylization to concept grounding/reasoning. It exceeds SoTA and shows 0-shot multimodal instructional & in-context video editing.