Koyena Pal
koyena.bsky.social
CS Ph.D. Candidate @ Northeastern | Interpretability + Data Science | BS/MS @ Brown

koyenapal.github.io
The other key tasks are model search and benchmarking, with important applications like document generation and auditing.

Read more in our paper (with @davidbau.bsky.social and Renée Miller) here: arxiv.org/abs/2403.02327

Excited to share that this is accepted to #EDBT2025! 🎉

🧵5/5
Model Lakes
Given a set of deep learning models, it can be hard to find models appropriate to a task, understand the models, and characterize how models are different one from another. Currently, practitioners re...
March 5, 2025 at 6:28 PM
The second is model versioning, where the aim is to map a model's position within a lake of models, capturing the relationships between models using directed model graphs.

Other tasks, like model tree heritage recovery and differentiating outputs from various LLMs, are part of model versioning.

🧵4/5
March 5, 2025 at 6:28 PM
We see four major tasks for Model Lakes.

The first is model attribution — tracing & understanding a model's output through attack techniques like model inversion (recovering user inputs) and interpretability methods like reverse engineering to analyze model behavior.

bsky.app/profile/srus...

🧵3/5
March 5, 2025 at 6:28 PM
A Model Lake is a system containing numerous heterogeneous pre-trained models and related data in their natural formats. This concept is inspired by data lakes, which collect raw, unstructured data at scale.

By addressing shared challenges across research, we can unlock meaningful solutions. 👇

🧵2/5
March 5, 2025 at 6:28 PM