Scholar

Juan Carlos Niebles

H-index: 53
Fields & subjects: Computer science 91%, Engineering 6%
jcniebles.bsky.social
📢📢 Exciting news!

Our paper, "Exploring Diffusion Transformer Designs via Grafting," has been accepted as an Oral at #NeurIPS2025, with only 77 out of 21k submissions receiving this honor.

📄Paper: arxiv.org/abs/2506.05340
🌎Website: grafting.stanford.edu
🧑🏻‍💻Code: github.com/keshik6/graf...
Exploring Diffusion Transformer Designs via Grafting
grafting.stanford.edu
jcniebles.bsky.social
Congrats Chaitanya on winning the BEST PAPER AWARD 🥇 🏆

Check out details of our work:

arxiv.org/abs/2504.12513
jcniebles.bsky.social
Our first #cvpr2025 poster is up!

🕐Come check it out, on display now until 13:00

“AdaVid: Adaptive Video-Language Pretraining”

🪧ExHall D, Poster #203

📝 arxiv.org/abs/2504.12513
jcniebles.bsky.social
Just finished a day at the #CVPR2025 Area Chair workshop. Lots of interesting discussions and ideas, and a chance to reconnect with colleagues and friends.

Had the chance to present our ViUnit poster to fellow ACs. If you missed it, come to our Sunday poster session.

See details in the 🧵⬇️
jcniebles.bsky.social
If you're at #CVPR2025, please stop by my posters and say hello! I'd love to chat about our work and all things computer vision. See you in Nashville! 👋
jcniebles.bsky.social
Kicking things off on June 11th by participating in the #CVPR2025 Area Chair workshop! Eager to connect with fellow ACs and colleagues. Let's make this an impactful conference!
jcniebles.bsky.social
Excited to attend #CVPR2025 in Nashville! 🤠 Looking forward to a fantastic week of cutting-edge computer vision research and connecting with the community.
@cvprconference.bsky.social
jcniebles.bsky.social
This RL approach effectively aligns VLMs with the demands of interactive decision-making. It's a powerful new pathway for developing more capable and adaptable visual agents using readily available VLM tech.
jcniebles.bsky.social
We tested our approach on PaliGemma, xGen-MM, and MoonDream2 across Gym Cards, BabyAI, and MiniWoB. Results? Substantial improvements in valid action syntax accuracy and task success rates, even starting from noisy data!
jcniebles.bsky.social
This approach works great for offline-to-online fine-tuning, learning from static datasets (even random actions!) and then smoothly transitioning to online learning where the agent gathers new data to refine its policy. Self-improvement is key!
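
A minimal sketch of what that schedule could look like in Python; collect_episode, policy_update, and the step counts are illustrative assumptions, not the paper's actual API:

```python
import random

def offline_to_online(policy, offline_data, collect_episode, policy_update,
                      offline_steps=1000, online_steps=1000, batch_size=32):
    """Warm-start on a static dataset (which may contain random or
    suboptimal actions), then keep training on episodes the agent
    collects with its own improving policy."""
    buffer = list(offline_data)
    for step in range(offline_steps + online_steps):
        if step >= offline_steps:                   # switch to the online phase
            buffer.append(collect_episode(policy))  # gather fresh trajectories
        batch = random.sample(buffer, k=min(batch_size, len(buffer)))
        policy_update(policy, batch)                # e.g. the advantage-filtered
                                                    # update sketched below
    return policy
```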
jcniebles.bsky.social
AFSFT helps VLMs overcome challenges like strict action syntax and suboptimal data. It learns from demonstrations while filtering out tokens that would lead to invalid syntax or poor action choices, and it explicitly penalizes invalid syntax.
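
The token-filtering step itself might look roughly like this PyTorch sketch (the function name, tensor shapes, and zero threshold are assumptions, not the paper's code):

```python
import torch.nn.functional as F

def afsft_loss(logits, target_ids, advantages, threshold=0.0):
    """SFT cross-entropy over demonstration tokens, masked so that only
    tokens whose estimated advantage clears the threshold contribute;
    tokens leading to invalid syntax or poor actions are filtered out."""
    # Per-token negative log-likelihood of the demonstrated tokens.
    # logits: (batch, seq, vocab); target_ids, advantages: (batch, seq)
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        target_ids.reshape(-1),
        reduction="none",
    ).view_as(target_ids)
    keep = (advantages > threshold).float()  # the advantage filter
    return (nll * keep).sum() / keep.sum().clamp(min=1.0)
```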
jcniebles.bsky.social
Enter Reinforcement Learning (RL)! Our paper introduces an "offline-to-online" RL technique called Advantage-Filtered Supervised Fine-Tuning (AFSFT) that allows VLMs to learn through trial and error, improving even with imperfect initial data.
jcniebles.bsky.social
Traditional supervised fine-tuning (SFT) has limits: it can't go beyond its training data, and an imperfect dataset means replicating its flaws. What if we don't have perfect examples or a good initial VLM?
jcniebles.bsky.social
The catch? VLMs can struggle with the precise rules and structured outputs many agent tasks require, unlike LLMs which excel at function calling and specific syntax. Think describing a button vs. knowing the exact command to click it.
jcniebles.bsky.social
Large Language Models (LLMs) are great for agents, but what happens when we give them "eyes"? VLMs extend this power to process visual info, opening up new possibilities like robotic control and automating tasks by "seeing" your screen.
jcniebles.bsky.social
Just dropped a new blog post: "Level up your Agents: Teaching Vision-Language Models to Play by the Rules"! We're exploring how to make Vision-Language Models (VLMs) even smarter at interactive tasks.

blog: www.niebles.net/blog/2025/vl...

arxiv: arxiv.org/abs/2505.03181
#multimodalAI #agents #VLM
jcniebles.bsky.social
Check out this great intro to Large Action Models, the key engine powering the AI Agent revolution. 🤖

By @salesforce.com AI Research’s Shelby Heinecke.

See video here:
youtube.com/watch?v=vlvv...
What Are Large Action Models? | The AI Research Lab - Explained
YouTube video by Salesforce
youtube.com

Reposted by: Juan Carlos Niebles

baxterkb.bsky.social
@salesforce.com #AI Research has a new series called "AI Explained."
🎬 "The AI Research Lab - Explained" debuts with our groundbreaking work on Large Action Models! Sr. Mgr Shelby Heinecke reveals how we're training these specialized models to generate precise, executable actions. t.co/XLhlN2EZyk
https://bit.ly/4kfipp4
cvprconference.bsky.social
Behind every great conference is a team of dedicated reviewers. Congratulations to this year’s #CVPR2025 Outstanding Reviewers!

cvpr.thecvf.com/Conferences/...
jcniebles.bsky.social
Will AI be a "bicycle for the mind" boosting our creativity, or could it overshadow our own abilities? 🤔

📝 My latest blog explores this fascinating question!

Read more here: www.niebles.net/blog/2025/cr...

#AI #creativity #artificialintelligence
jcniebles.bsky.social
Unfortunately not my experience…

bsky.app/profile/jcni...
jcniebles.bsky.social
You are lucky. I still need to chase reviewers and have had to assign 12 emergency reviews for my pile!
jcniebles.bsky.social
With AI models trained on colossal datasets, does the traditional concept of “generalization” (performing well on *unseen* data) still hold?
My latest blog explores this critical question. Join the discussion! #AI #MachineLearning #Generalization

www.niebles.net/blog/2025/ga...
