Lightnews — Scholar-powered news

Spandana Gella

@spandanagella.bsky.social

Our team is hiring an intern to work on discrete diffusion of text and/or code. Please apply!

Pierre-André Noël @pierreandrenoel.bsky.social · Jun 17

ServiceNow Research is offering an 8-month, full-time research internship on discrete diffusion of text and/or code. Funded through Mitacs Accelerate; applicants must be at a Canadian university for the whole duration of the internship. Details and application form: forms.gle/pd9Hco79hpxM...

June 17, 2025 at 2:32 PM

Reposted by Spandana Gella

Patrice Bechard

@patricebechard.bsky.social

🚀 New paper from our team at @servicenowresearch.bsky.social!

💫 StarFlow: Generating Structured Workflow Outputs From Sketch Images
We use VLMs to turn hand-drawn sketches and diagrams into executable workflows 🖍️→⚙️

🔗 arxiv.org/abs/2503.218...
📝 tinyurl.com/3utdbn97%E2%...
#Sketch2Flow #AI #VLM

May 29, 2025 at 3:34 AM

Reposted by Spandana Gella

Xiangru (Edward) Jian

@edwardjian.bsky.social

🚀 Excited to share that UI-Vision has been accepted at ICML 2025! 🎉

We have also released the UI-Vision grounding datasets. Test your agents on them now! 🚀

🤗 Dataset: huggingface.co/datasets/Ser...

#ICML2025 #AI #DatasetRelease #Agents

May 15, 2025 at 2:14 PM

Spandana Gella

@spandanagella.bsky.social

Very excited to announce our GUI benchmarking dataset UI-Vision: uivision.github.io

Our evals reveal that current GUI models struggle to ground small elements and dense UIs, and have limited domain, spatial, and motion understanding.

Watch this space for more exciting stuff from us!

March 24, 2025 at 5:17 PM

Spandana Gella

@spandanagella.bsky.social

Web agents powered by LLMs can solve complex tasks, but our analysis shows that they can also be easily misused to automate harmful tasks.

See the thread below for more details on our new web agent safety benchmark, SafeArena, and the Agent Risk Assessment (ARIA) framework.

Xing Han Lu @xhluca.bsky.social · Mar 10

Agents like OpenAI Operator can solve complex computer tasks, but what happens when users use them to cause harm, e.g. spread misinformation?

To find out, we introduce SafeArena (safearena.github.io), a benchmark to assess the capabilities of web agents to complete harmful web tasks. A thread 👇

March 10, 2025 at 8:11 PM

Reposted by Spandana Gella

Karolina Stańczak

@karstanczak.bsky.social

📢New Paper Alert!🚀

Human alignment balances social expectations, economic incentives, and legal frameworks. What if LLM alignment worked the same way?🤔

Our latest work explores how social, economic, and contractual alignment can address incomplete contracts in LLM alignment🧵

March 4, 2025 at 4:08 PM

Reposted by Spandana Gella

Aarash Feizi

@aarashfeizi.bsky.social

🚨 Excited to introduce PairBench! 🚨

💡 TL;DR: VLM-judges can fail at data comparison!

✅ PairBench helps you pick the right one by testing alignment, symmetry, smoothness & controllability—ensuring reliable auto-evaluation.

📄 Paper: arxiv.org/abs/2502.15210

🧵 Thread: 👇

February 27, 2025 at 7:50 PM

Reposted by Spandana Gella

Alexandre Lacoste

@alex-lacoste.bsky.social

We’re really excited to release this large collaborative work for unifying web agent benchmarks under the same roof.

In this TMLR paper, we take an in-depth look at #BrowserGym and #AgentLab. We also present some unexpected performance results from Claude 3.5 Sonnet.

December 12, 2024 at 5:55 PM

Spandana Gella

@spandanagella.bsky.social

If you want to know all about the exciting stuff we do with web agents @servicenowresearch.bsky.social, register here and interact with our team, including the amazing @alex-lacoste.bsky.social and @adrouinenv.bsky.social

Alexandre Lacoste @alex-lacoste.bsky.social · Dec 12

Join us for a NeurIPS 2024 Happy Hour co-hosted by ServiceNow and IMean.ai as we explore the cutting edge of WebAgent development!

📅 Date: Dec 13th 6:00pm PST
📍 Location: 15-minute walk from NeurIPS (see details after RSVP)
🎉 RSVP Here: lu.ma/rw9x9vc6

December 12, 2024 at 5:28 PM

Spandana Gella

@spandanagella.bsky.social

Thrilled to launch BigDocs, an open multimodal dataset set to transform document understanding! Our contribution to the VLM community, supporting transparency in multimodal document reasoning. Proud to work with the most passionate and amazing team @servicenowresearch.bsky.social!

Juan Rodriguez @joanrod.bsky.social · Dec 10

🎉 Excited to introduce BigDocs!
An open, transparent multimodal dataset designed for:
📄 Documents
🌐 Web content
🖥️ GUI understanding
👨‍💻 Code generation from images
We’re also launching BigDocs-Bench:
➡️ Document, Web, GUI Visual reasoning
➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more!

December 10, 2024 at 8:08 PM

Reposted by Spandana Gella

Alexandre Lacoste

@alex-lacoste.bsky.social

🧵-1
We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package, which supports 10 different benchmarks, including #WebArena.

AgentLab diagram.

The image describes AgentLab, a framework for efficient parallel experiments with agents. It highlights:

Core Agent Features: Dynamic Prompting and a Unified LLM API for interacting with large language models.

BrowserGym Platform: A tool for testing agents on benchmarks like WebArena, WorkArena, MiniWoB, and others.

Key Features: Reproducibility, a Unified Leaderboard, an analysis tool called Xray, and a Dataset for sharing agent traces.

Blue elements represent AgentLab components.

December 3, 2024 at 9:02 PM
