Benjamin Feuer
@benjaminfeuer.bsky.social
87 followers 58 following 23 posts
PhD researcher at NYU, working on LLMs, VLMs, and tabular foundation models from a data-centric perspective. Father of two, NYC diehard.
benjaminfeuer.bsky.social
* A submission = a curated reasoning dataset on @huggingface with 1k or 10k samples and a scalable, reproducible curation strategy you document in a write-up
* You don’t need to train a model
* You can submit with nothing more than a free Colab or Kaggle account for basic testing

🧵 5 / n
benjaminfeuer.bsky.social
💪anyone can compete for free 💪: Thanks to our sponsor @LambdaAPI we offer three free submissions for up to 500 teams. This is unprecedented in data-centric research, which tends to be very expensive because you have to train lots of models!

🧵 4 / n
benjaminfeuer.bsky.social
🤖 open-models 🤖: every model we present results for will have open weights, and one of those models will be Molmo-O from @allen_ai (a recent best paper honorable mention from @cvpr at #CVPR2025), trained on open data.

🧵 3 / n
benjaminfeuer.bsky.social
DCVLR is data-centric: we train an ~7B VLM on your dataset. The best performer (on benchmarks like MathVista, VMCBench and LiveXiv) will be eligible to win $1500 and a talk at #NeurIPS2025!

We also have a few twists compared to prior data-centric competitions:

🧵 2 / n
benjaminfeuer.bsky.social
So excited to announce the DCVLR (Data Curation for Vision-Language Reasoning) competition at #NeurIPS2025, led by @oumi-pbc.bsky.social and Lambda AI!

🌟open-data 🌟
🤖 open-models 🤖
💻 open-source 💻
💪anyone can compete for free 💪

dcvlr-neurips.github.io

🧵 1 / n
DCVLR: Data Curation for Vision Language Reasoning - NeurIPS 2025 Competition
Join the DCVLR NeurIPS 2025 Competition. Advance visual reasoning in VLMs through data curation.
benjaminfeuer.bsky.social
Co-organizing with wonderful collaborators from MIT, NYU, Stanford and UW: @thaottn.bsky.social, @sewoong79.bsky.social, @sarameghanbeery.bsky.social, @yuhuiz.bsky.social!
benjaminfeuer.bsky.social
We are excited to be sponsored by @datologyai.com, who will be providing prizes for best paper awards 🏆
benjaminfeuer.bsky.social
🚀We welcome any submission that discusses domain-specific data curation pipelines and/or generalizable curation principles, getting us closer to building data-centric methods that are robust, efficient, and adaptable across domains.

Refer to our website for the call for papers!
benjaminfeuer.bsky.social
📢 Announcing our data-centric workshop at ICML 2025 on unifying data curation frameworks across domains!

📅 Deadline: May 24, AoE
🔗 Website: dataworldicml2025.github.io

We have an amazing lineup of speakers + panelists from various institutions and application areas!
ICML 2025 Workshop on Unifying Data Curation Frameworks Across Domains
benjaminfeuer.bsky.social
That's not what they did; they used gpt-4o for program synthesis, which is fundamentally different from asking the LLM to provide the correct response in the prompt.
benjaminfeuer.bsky.social
Thanks for sharing! FWIW, I sensed mostly optimism and excitement at NeurIPS -- the people I spoke to were eager to talk about their research and learn about mine. Let's meet up in the new year and compare notes @kyunghyuncho.bsky.social
benjaminfeuer.bsky.social
That does seem like a sound rule! Although, interestingly, they did not apply it to me. 😅
benjaminfeuer.bsky.social
Unpopular opinion: the #ICLR2025 reviews were better quality than in the last few years.

I think it's mainly because they had people review fewer papers.

Opinions?
benjaminfeuer.bsky.social
I feel like my MacBook Pro battery is starting to go; it used to last all day, now it's dead by the afternoon. The thing is only 2.5 years old. 🤨
benjaminfeuer.bsky.social
Excited to be making my first post on Bluesky! Let's talk AI research.

@eugenevinitsky.bsky.social, can I get a who's who on here? :-)