Sebastian
@mersault.bsky.social
420 followers 550 following 23 posts
Professor at BIFOLD & TU Berlin, research on data engineering for ML. Previously at UvA, NYU, Amazon, Twitter. Opinions are my own. https://deem.berlin
Reposted by Sebastian
emilienschultz.bsky.social
What a banger is skrub @skrub-data.bsky.social !

Big thumbs up for the sklearn team & the maintainer of this package
Reposted by Sebastian
pydatalondon.bsky.social
It looks like a data frame, but Skrub stores the whole transformation pipeline in the magic skb attribute!
Reposted by Sebastian
ptenigma.bsky.social
🔥CAN YOU BUILD AI MODELS that give you (verifiable) uncertainty estimates in their outputs? Cool talk on ML, classifiers, + calibration www.youtube.com/watch?v=SI6b... by scikit-learn architect @gaelvaroquaux.bsky.social
*with ninja-level modeling of variance you probably didn't know existed !
Reposted by Sebastian
oovcharenko.bsky.social
✨ Excited to present our workshop paper at DataWorld at #ICML2025 tomorrow 🇨🇦

We introduce the problem of detecting cross-modal errors in tabular data that originate from other modalities.

Visit our poster:
📅 Saturday, July 19, 10:05 AM - 11:20 AM
📍 West Meeting Room 208-209
mersault.bsky.social
On Saturday, @oovcharenko.bsky.social will present a poster on "Towards Cross-Modal Error Detection with Tables and Images" at the DataWorld workshop. The work focuses on finding errors in tables by inspecting corresponding image data:

olgaovcharenko.github.io/_pages/MERIT...

(3/3)
mersault.bsky.social
On Thursday, @oovcharenko.bsky.social will present her research on "scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data". This paper is joint work with ETH Zurich and was selected as a spotlight poster:

icml.cc/virtual/2025...

(2/3)
mersault.bsky.social
The DEEM Lab is at ICML this week for the first time, with two contributions!

(1/3)
Reposted by Sebastian
oovcharenko.bsky.social
Our paper "Towards Cross-Modal Error Detection with Tables and Images" was accepted for the DataWorld workshop at ICML'25! 🥳

Thanks to @mersault.bsky.social!
Reposted by Sebastian
stefan-grafberger.com
Our demo "mlidea: Interactively Improving ML Data Preparation Code via 'Shadow Pipelines'" was accepted at VLDB! 🥳

We demo suggestions for ML pipelines, similar to IntelliJ code inspections or Grammarly suggestions

youtu.be/ePGm1J6S2qk

Joint work w/ @mersault.bsky.social @p-groth.bsky.social
Reposted by Sebastian
duckdb.org
DuckDB @duckdb.org · May 28
📢 We are hosting a DuckDB meetup in Berlin during the week of the SIGMOD conference.

📍 The meetup will take place on June 26 (Thursday) south of the Tiergarten and will feature talks by Amine Mhedhbi, David Justen and dltHub!

📝 If you plan to attend, please register at duckdb.org/events/2025/...
mersault.bsky.social
We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on data preparation pipelines that optimize ML models along responsibility objectives.

This is a fully-funded position at @bifold.berlin, co-supervised by Julia Stoyanovich from NYU.

Details: deem.berlin#jobs-17725
Reposted by Sebastian
smaglia.bsky.social
One more week to apply to this exciting position... and another position on #CausalRepresentationLearning and #ReinforcementLearning for learning provably correct #concepts from raw data opening up soon!
smaglia.bsky.social
Exciting new PhD position at Utrecht University on the #causal effects of communication in #multi-agent #RL with Shihan Wang, Mehdi Dastani and me 🎉

This is part of www.hybrid-intelligence-centre.nl, which aims at combining human and machine intelligence.

Deadline 20 May

www.uu.nl/en/organisat...
mersault.bsky.social
We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on data preparation for ML/AI systems.

This is a fully-funded position with salary level E13 at the newly founded DEEM Lab, as part of @bifold.berlin.

Details available at deem.berlin#jobs-2225
Reposted by Sebastian
oovcharenko.bsky.social
📢 Our extended benchmark on self-supervised learning for single-cell data, scSSL-Bench 🧬, is now accepted at ICML (spotlight)!

Thanks to all collaborators from @bifold.berlin and @ethzurich.bsky.social!
oovcharenko.bsky.social
📢 Our benchmark on self-supervised learning for single-cell data🧬 is accepted at the #NeurIPS2024 SSL workshop. We take a first step towards establishing best practices for SSL methods for single-cell data, and benchmark 8 SSL methods on 3 downstream tasks across 8 datasets.
mersault.bsky.social
I was invited to review for the "Journal of Pipeline Systems Engineering and Practice". Seems our work on ML pipelines is finally recognised by other communities as well ;D
mersault.bsky.social
@recsys.bsky.social Quick question, is the full list of accepted workshops already published somewhere? I am looking for a target venue for the early work of a student of mine. Thx.
mersault.bsky.social
We have openings for student assistants in the DEEM Lab at @bifold.berlin. This is a great opportunity to work with PhD students, implement cool stuff, gather research experience and become a co-author of scientific publications :)

deem.berlin#jobs-193487