Spandana Gella
@spandanagella.bsky.social
130 followers 210 following 7 posts
Sr Mgr & Research Scientist @ServiceNowRSRCH, Montreal
spandanagella.bsky.social
Our team is hiring an intern to work on discrete diffusion of text and/or code. Please apply!
pierreandrenoel.bsky.social
ServiceNow Research is offering an 8-month, full-time research internship on discrete diffusion of text and/or code. Funded through Mitacs Accelerate; applicants must be at a Canadian university for the whole duration of the internship. Details and application form: forms.gle/pd9Hco79hpxM...
Reposted by Spandana Gella
patricebechard.bsky.social
🚀 New paper from our team at @servicenowresearch.bsky.social!⁣

💫𝐒𝐭𝐚𝐫𝐅𝐥𝐨𝐰: 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐧𝐠 𝐒𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞𝐝 𝐖𝐨𝐫𝐤𝐟𝐥𝐨𝐰 𝐎𝐮𝐭𝐩𝐮𝐭𝐬 𝐅𝐫𝐨𝐦 𝐒𝐤𝐞𝐭𝐜𝐡 𝐈𝐦𝐚𝐠𝐞𝐬⁣
We use VLMs to turn 𝘩𝘢𝘯𝘥-𝘥𝘳𝘢𝘸𝘯 𝘴𝘬𝘦𝘵𝘤𝘩𝘦𝘴 and diagrams into executable workflows 🖍️→⚙️⁣

🔗 arxiv.org/abs/2503.218...
📝 tinyurl.com/3utdbn97%E2%...
#Sketch2Flow #AI #VLM
Reposted by Spandana Gella
edwardjian.bsky.social
🚀 Excited to share that UI-Vision has been accepted at ICML 2025! 🎉

We have also released the UI-Vision grounding datasets. Test your agents on them now! 🚀

🤗 Dataset: huggingface.co/datasets/Ser...

#ICML2025 #AI #DatasetRelease #Agents
spandanagella.bsky.social
Very excited to announce our GUI benchmarking dataset, UI-Vision: uivision.github.io

Our evals reveal that current GUI models struggle with grounding small elements and dense UIs, and have limited domain, spatial, and motion understanding.

Watch this space for more exciting stuff from us!
spandanagella.bsky.social
Web agents powered by LLMs can solve complex tasks, but our analysis shows that they can also be easily misused to automate harmful tasks.

See the thread below for more details on our new web agent safety benchmark, SafeArena, and the Agent Risk Assessment (ARIA) framework.
xhluca.bsky.social
Agents like OpenAI Operator can solve complex computer tasks, but what happens when users use them to cause harm, e.g. spread misinformation?

To find out, we introduce SafeArena (safearena.github.io), a benchmark to assess the capabilities of web agents to complete harmful web tasks. A thread 👇
Reposted by Spandana Gella
karstanczak.bsky.social
📢New Paper Alert!🚀

Human alignment balances social expectations, economic incentives, and legal frameworks. What if LLM alignment worked the same way?🤔

Our latest work explores how social, economic, and contractual alignment can address incomplete contracts in LLM alignment🧵
Reposted by Spandana Gella
aarashfeizi.bsky.social
🚨 Excited to introduce PairBench! 🚨

💡 TL;DR: VLM-judges can fail at data comparison!

✅ PairBench helps you pick the right one by testing alignment, symmetry, smoothness & controllability—ensuring reliable auto-evaluation.

📄 Paper: arxiv.org/abs/2502.15210

🧵 Thread: 👇
Reposted by Spandana Gella
alex-lacoste.bsky.social
We’re really excited to release this large collaborative work for unifying web agent benchmarks under the same roof.

In this TMLR paper, we take an in-depth look at #BrowserGym and #AgentLab. We also present some unexpected performance results from Claude 3.5-Sonnet.
spandanagella.bsky.social
If you want to know all about the exciting stuff we do with web agents @servicenowresearch.bsky.social, register here and interact with our team, including the amazing @alex-lacoste.bsky.social and @adrouinenv.bsky.social
alex-lacoste.bsky.social
Join us for a co-hosted Happy Hour
NeurIPS 2024
with ServiceNow and IMean.ai
as we explore the cutting edge of WebAgent development!

📅 Date: Dec 13th 6:00pm PST
📍 Location: 15 min walk from NeurIPS; see details after RSVP
🎉 RSVP Here: lu.ma/rw9x9vc6
spandanagella.bsky.social
We would be delighted to come and see you ;)
spandanagella.bsky.social
Thrilled to launch BigDocs, an open multimodal dataset set to transform document understanding! It is our contribution to the VLM community, supporting transparency in multimodal document reasoning. Proud to work with the most passionate and amazing team @servicenowresearch.bsky.social!
joanrod.bsky.social
🎉 Excited to introduce BigDocs!
An open, transparent multimodal dataset designed for:
📄 Documents
🌐 Web content
🖥️ GUI understanding
👨‍💻 Code generation from images
We’re also launching BigDocs-Bench:
➡️ Document, Web, GUI Visual reasoning
➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more!
Reposted by Spandana Gella
alex-lacoste.bsky.social
🧵-1
We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package which supports 10 different benchmarks, including #WebArena.
AgentLab diagram.

The image describes AgentLab, a framework for efficient parallel experiments with agents. It highlights:

Core Agent Features:
Dynamic Prompting and a Unified LLM API for interacting with large language models.

BrowserGym Platform:
A tool for testing agents on benchmarks like WebArena, WorkArena, MiniWoB, and others.

Key Features:
Reproducibility, a Unified Leaderboard, an analysis tool called Xray, and a Dataset for sharing agent traces.

Blue elements represent AgentLab components.