Patrice Bechard
@patricebechard.bsky.social
Applied Research Scientist working on LLMs at @ServiceNow. Opinions are my own.
Pinned
patricebechard.bsky.social
🚀 New paper from our team at @servicenowresearch.bsky.social!⁣

💫𝐒𝐭𝐚𝐫𝐅𝐥𝐨𝐰: 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐧𝐠 𝐒𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞𝐝 𝐖𝐨𝐫𝐤𝐟𝐥𝐨𝐰 𝐎𝐮𝐭𝐩𝐮𝐭𝐬 𝐅𝐫𝐨𝐦 𝐒𝐤𝐞𝐭𝐜𝐡 𝐈𝐦𝐚𝐠𝐞𝐬⁣
We use VLMs to turn 𝘩𝘢𝘯𝘥-𝘥𝘳𝘢𝘸𝘯 𝘴𝘬𝘦𝘵𝘤𝘩𝘦𝘴 and diagrams into executable workflows 🖍️→⚙️⁣

🔗 arxiv.org/abs/2503.218...
📝 tinyurl.com/3utdbn97%E2%...
#Sketch2Flow #AI #VLM
patricebechard.bsky.social
🔍 Extra findings:

• Models struggle most with handwritten & whiteboard sketches
• UI screenshots are easiest
• End-to-end generation beats decomposed pipelines
• Finetuning on diverse sketch data is key to generalization
patricebechard.bsky.social
📊 We benchmarked top VLMs (GPT-4o, Claude, Gemini) vs. open-weight models (Qwen, LLaMA, Pixtral).

📈 Finetuned open models outperform proprietary ones:

Qwen2.5-VL-7B → FlowSim: 0.614
GPT-4o → FlowSim: 0.786
𝐐𝐰𝐞𝐧𝟐.𝟓-𝐕𝐋-𝟕𝐁 (𝐟𝐢𝐧𝐞𝐭𝐮𝐧𝐞𝐝) → 𝐅𝐥𝐨𝐰𝐒𝐢𝐦: 𝟎.𝟗𝟓𝟕
patricebechard.bsky.social
🧠 We built a large dataset (22K+ samples) of workflow diagrams:

• Synthetic (Graphviz)
• Manual (hand-drawn)
• Whiteboard
• Digital
• UI screenshots

These were paired with structured JSON workflow outputs for training and evaluation.
patricebechard.bsky.social
𝐖𝐡𝐲?

Workflow automation is powerful—but authoring flows is still complex, even with low-code tools.
💫𝐒𝐭𝐚𝐫𝐅𝐥𝐨𝐰 explores a simpler interface: 𝐣𝐮𝐬𝐭 𝐝𝐫𝐚𝐰 𝐢𝐭.

Imagine sketching a workflow on a whiteboard and getting a runnable flow in return.
patricebechard.bsky.social
🌟 Key Features:

* One retriever for many use cases
* Works across languages! 🌍
* Handles structured data like workflows
* Lightweight & fast for production
* Generalizes to new domains & tasks
patricebechard.bsky.social
📊 Our Results:

Multi-task instruction fine-tuning FTW! Our approach beats both BM25 and strong off-the-shelf encoder models across all retrieval tasks (in-distribution and out-of-distribution).
patricebechard.bsky.social
💡 The Challenge:

* RAG needs domain-specific knowledge
* Multiple apps = multiple retrievers = 💰
* Different types of data (steps, tables, fields, ...)
patricebechard.bsky.social
🚀 Excited to share our new work on making RAG actually work for enterprise applications!
We present a recipe to build a custom retriever that handles multiple retrieval tasks simultaneously for domain-specific RAG applications 🧵
Reposted by Patrice Bechard
alex-lacoste.bsky.social
We’re really excited to release this large collaborative work for unifying web agent benchmarks under the same roof.

In this TMLR paper, we take an in-depth look at #BrowserGym and #AgentLab. We also present some unexpected performance results from Claude 3.5 Sonnet
Reposted by Patrice Bechard
joanrod.bsky.social
🎉 Excited to introduce BigDocs!
An open, transparent multimodal dataset designed for:
📄 Documents
🌐 Web content
🖥️ GUI understanding
👨‍💻 Code generation from images
We’re also launching BigDocs-Bench:
➡️ Document, Web, GUI Visual reasoning
➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more!
patricebechard.bsky.social
Finally, we outline trade-offs and practical considerations, from latency improvements to deployment strategies. If you’re designing GenAI systems, this is a goldmine of insights!
patricebechard.bsky.social
Evaluation was key: we developed a novel tree-based metric, Flow Similarity, to assess workflow correctness. Plus, we measured each sub-task and RAG component separately for fine-grained insights.
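To give a rough feel for what a tree-based comparison of workflows can look like: the post does not spell out how Flow Similarity is computed, so the sketch below swaps in a much simpler stand-in, the Jaccard overlap of root-to-node paths. The tree encoding, the step labels, and the function names are all assumptions made for illustration, not the paper's metric.

```python
from collections import Counter

# Assumed encoding: a workflow tree is (label, [child trees]).


def paths(tree, prefix=()):
    """Yield the root-to-node path of every node in the tree."""
    label, children = tree
    here = prefix + (label,)
    yield here
    for child in children:
        yield from paths(child, here)


def path_similarity(expected, predicted):
    """Jaccard overlap of root-to-node paths (a stand-in, NOT Flow Similarity)."""
    pe, pp = Counter(paths(expected)), Counter(paths(predicted))
    shared = sum((pe & pp).values())   # paths present in both trees
    total = sum((pe | pp).values())    # paths present in either tree
    return shared / total if total else 1.0


# Hypothetical workflows that differ in a single leaf step.
expected = ("flow", [("if", [("send_email", [])]), ("log", [])])
predicted = ("flow", [("if", [("send_sms", [])]), ("log", [])])

print(path_similarity(expected, expected))   # 1.0
print(path_similarity(expected, predicted))  # 0.6
```

The two trees share three of five distinct paths, hence 0.6. A real tree metric would likely also weigh step inputs and partial matches, which this toy version ignores.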
patricebechard.bsky.social
We dive deep into dataset creation, discussing how Task Decomposition guided our labeling efforts. By focusing on smaller tasks, we sped up labeling, reduced costs, and iteratively improved our system.
patricebechard.bsky.social
RAG enhances the system by grounding the generation process in real-time data from the environment. This reduces hallucinations and ensures that the generated workflows are accurate and context-aware.
patricebechard.bsky.social
Task Decomposition allows us to split the workflow generation into two sub-tasks:

1. Outlining the workflow structure
2. Populating inputs for each step

Each sub-task is easier to solve and test, boosting the system’s modularity and maintainability.
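A minimal sketch of the two sub-tasks above, assuming each one sits behind its own function so it can be tested in isolation. The `Step` type, the function names, and the `fake_plan` / `fake_fill` stand-ins (which replace real model calls) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    inputs: dict[str, str] = field(default_factory=dict)


def outline_workflow(requirement: str,
                     plan: Callable[[str], list[str]]) -> list[Step]:
    """Sub-task 1: decide which steps the workflow contains, in order."""
    return [Step(name=step_name) for step_name in plan(requirement)]


def populate_inputs(requirement: str, steps: list[Step],
                    fill: Callable[[str, str], dict[str, str]]) -> list[Step]:
    """Sub-task 2: fill in the inputs of each step, one step at a time."""
    return [Step(name=s.name, inputs=fill(requirement, s.name)) for s in steps]


def generate_workflow(requirement, plan, fill) -> list[Step]:
    """Chain the two sub-tasks; each can be swapped or tested on its own."""
    return populate_inputs(requirement, outline_workflow(requirement, plan), fill)


# Hypothetical stand-ins for the model calls behind each sub-task.
def fake_plan(requirement: str) -> list[str]:
    return ["look_up_record", "send_email"]


def fake_fill(requirement: str, step_name: str) -> dict[str, str]:
    return {"source": requirement} if step_name == "send_email" else {}


workflow = generate_workflow("notify the owner", fake_plan, fake_fill)
for step in workflow:
    print(step.name, step.inputs)
```

Because the outline and the input filling are separate callables, each can be evaluated against its own labeled examples, which is the modularity benefit the post describes.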
patricebechard.bsky.social
We tackle a real-world use case: Workflow Generation. Given a user requirement in natural language, our system generates complex workflows step by step. This involves breaking the problem into smaller, manageable tasks.
patricebechard.bsky.social
Looking to build an LLM-powered app but finding it hard to make it robust? We’ve got you covered! Our new paper explores how Task Decomposition and Retrieval-Augmented Generation (RAG) can help you create reliable systems. 🧵👇