Dr Juulia Suvilehto
@juulia.bsky.social
230 followers 460 following 72 posts
Your friendly neighbourhood (data & neuro-) scientist. Interested in healthcare, big data, social interaction, tea, and cats. 🇫🇮 living in 🇸🇪
juulia.bsky.social
Hahaha, ok so it's an avalanche of spam and very little ham it sounds like. That sounds better than the inverse to me actually 😂
juulia.bsky.social
My best tip is to be very ruthless in having rules for email that automatically gets cleaned out of the inbox, e.g. automatic emails from systems, stuff that gets sent out as FYI, weekly XYZ digests etc. They all go into their own folders and I look at them if/when I feel like it. Maybe never.
juulia.bsky.social
In my old job I would get around 20-30 emails per day that required me specifically to react to them (so, no automated emails, no mailing list stuff). In my new job, so far ✨almost nothing✨. But I think email volume is likely a function of your tenure and/or network centrality in an organisation.
juulia.bsky.social
Thank you everyone for your suggestions! This is how I ended up plotting my LLM system output vs human annotation vs LLM-as-a-judge evals. Extra thanks to @libbyheeren.bsky.social for boosting my original question. #databs #dataviz
Four bar plots arranged like a confusion matrix. Bars have two colors stacked, based on evaluator output. Plot shows that the evaluator (DSPy optimized + gpt-5-nano) flags many false positives and false negatives and does not flag that many true positives or true negatives.
A barplot with three series of two bars each. The series are different evaluation frameworks: Phoenix, deepeval, and DSPy. Of the pairs of bars, the left one displays evaluation output when original rating was incorrect and the right one displays evaluation output when original rating was correct. The color of the bar shows evaluator output. The plot shows that DSPy (non-optimized) performs best, i.e. tags most incorrect responses and does not tag many correct responses.
juulia.bsky.social
Wow, thank you so much @thoughtfulnz.bsky.social ! That's really pretty & so nice of you to make up a toy example! 🤩🤩🤩
juulia.bsky.social
Ooh, that's a thought! Thanks! I think it's good for the audience to have a sense of how prevalent true vs faulty answers are, but that doesn't necessarily need to be in the same plot as the comparisons 🤔
juulia.bsky.social
Thanks Libby! Something like this is what I'm leaning towards but with four sections (so that I can show false positives and false negatives separately, they have different business implications)
juulia.bsky.social
Fair point! Right now I'm just going with: the right answer is the one a human gave, and false positives and false negatives have different costs (false negatives are much more costly than false positives)
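One way to make the "different cost" idea concrete is a cost-weighted error total. This is only a minimal sketch: the function name and the weights (1 per false positive, 5 per false negative) are hypothetical placeholders, not values from the thread, and the counts are invented toy numbers.

```python
def weighted_error_cost(fp, fn, fp_cost=1.0, fn_cost=5.0):
    """Total cost of errors, with false negatives weighted more heavily.

    The default weights are illustrative assumptions only.
    """
    return fp * fp_cost + fn * fn_cost


# Toy counts: system_a makes more errors overall (12 vs 8),
# but system_b makes more of the costly false negatives.
system_a = weighted_error_cost(fp=10, fn=2)
system_b = weighted_error_cost(fp=4, fn=4)
print(system_a, system_b)
```

With these made-up weights the system with fewer total errors still comes out as the more expensive one, which is the reason to look at the two error types separately.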
juulia.bsky.social
Oh hey thanks, I hadn't really thought of radar plot for this! That's definitely worth considering!
juulia.bsky.social
I think the problem is that data science is >10% thinking and you can't really externalize the thinking, just the execution. And even for execution you need to split in suitable sized chunks that you quality control yourself.
juulia.bsky.social
I’m really loving it for making and tweaking plots where I kind of know the plotting library but not well enough to remember all the functions and params by heart. @hadley.nz had a wonderful demo of this at the Posit conf earlier this month, it might be on YouTube soon!
juulia.bsky.social
Maybe sometime down the line! Right now I have a number of evals coming from many different models & frameworks and need to figure out how to best visualize them for our dev team.
juulia.bsky.social
Any references for good visualizations of LLM evals? We have orig system output (yes/no), a human annotator's y/n and a number of different y/n evaluations. The aim is to see how good our original system is and how many of the issues the different LLM evals catch. How would you visualize this? #databs
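Before choosing a plot, the three kinds of y/n labels can be tallied into confusion-matrix cells split by evaluator verdict. The sketch below is a rough illustration under stated assumptions: the record layout, the evaluator names `eval_a` and `eval_b`, and the convention that an evaluator returns `True` when it flags the system output as wrong are all hypothetical, and the records are invented toy data.

```python
from collections import Counter

# Toy records, invented for illustration only (not real results).
# "flags" maps a hypothetical evaluator name to True if that evaluator
# flagged the system output as wrong (an assumed convention).
records = [
    {"system": "yes", "human": "yes", "flags": {"eval_a": False, "eval_b": False}},
    {"system": "yes", "human": "no",  "flags": {"eval_a": True,  "eval_b": False}},
    {"system": "no",  "human": "yes", "flags": {"eval_a": False, "eval_b": True}},
    {"system": "no",  "human": "no",  "flags": {"eval_a": True,  "eval_b": False}},
    {"system": "yes", "human": "no",  "flags": {"eval_a": True,  "eval_b": False}},
]


def cell(system, human):
    """Confusion-matrix cell of the system output, with the human label as truth."""
    if system == "yes":
        return "TP" if human == "yes" else "FP"
    return "FN" if human == "yes" else "TN"


def tally(records):
    """Count (evaluator, cell, flagged) combinations across all records."""
    counts = Counter()
    for r in records:
        c = cell(r["system"], r["human"])
        for name, flagged in r["flags"].items():
            counts[(name, c, flagged)] += 1
    return counts


def catch_rate(counts, name):
    """Share of system errors (FP + FN) that the given evaluator flagged."""
    caught = sum(counts[(name, c, True)] for c in ("FP", "FN"))
    total = sum(counts[(name, c, f)] for c in ("FP", "FN") for f in (True, False))
    return caught / total if total else 0.0


counts = tally(records)
for name in ("eval_a", "eval_b"):
    print(name, round(catch_rate(counts, name), 2))
```

The resulting counts are the raw material for stacked bars per cell (one color per evaluator verdict), and keeping FP and FN as separate cells leaves room to treat them differently later.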
juulia.bsky.social
My mentor once said: strongest results are the ones where you don't even need stats to figure out if there was an effect, you just plot the raw data and look at it with your two human eyes. If it's obvious then, it's a ding-dang strong effect.
Reposted by Dr Juulia Suvilehto
randyau.com
#DataBS Conf 2025 preshow! We have two talks that we couldn't fit into the schedule but the speakers pre-recorded their talk for us to share before the main event next week!

Both are really good and give me lots of excitement about what we'll see next week.

ti.to/databsconf/d... <- free tix

🧵1/3
DataBS Conf
"Data, Behind the Scenes" is a free-to-attend online-only, single track conference centered on the real stories of data work from the folks in the trenches. We’re not here for the latest AI hype, perf...
ti.to
Reposted by Dr Juulia Suvilehto
randyau.com
This week on Counting Stuff, the #dataBS conference attendee registration form is open! Also an intro to the 6 of 14 talks we have confirmed!

Tickets are free/pay-what-you-want and there's technically a limit on attendees so grab yours today!

www.counting-stuff.com/databs-conf-...
DataBS Conf attendee registration is open!
Also the first batch of confirmed speakers listed
www.counting-stuff.com
Reposted by Dr Juulia Suvilehto
databsconf.com
The weekend is here!

Perfect time to submit your #dataBS talk.

We want your lessons-learned stories from data work, everything from "my data pipeline: the unsung hero" to "how we got that AI system working" to "what we learned when it fell over."

Details and sign-up:
bit.ly/dataBSconf-cfs
Data Behind the Scenes Conf - Call for Speakers
What This Conference Is About "Data, Behind the Scenes" is a (free) online-only, single track conference centered on the real stories of data work from the folks in the trenches. We’re not here for th...
bit.ly
juulia.bsky.social
Hi! Just wanted to let you know that there seems to be something wrong with the link - I get a 404 page when trying to open it and can't find the blog even in the blog tab.
Reposted by Dr Juulia Suvilehto
randyau.com
Hey #dataBS! Got some stories about how data is messily done to accomplish real things? Have you learned a painful lesson about how not to do something? Then you've got something to share and should submit a talk to the Data Behind the Scenes Conference! (Form closes July 31)

bit.ly/dataBSconf-cfs
Reposted by Dr Juulia Suvilehto
randyau.com
The #dataBS (Data Behind the Scenes) Conference Call for speakers is out! We're gonna do this!!

All online, single track, free to attend. Come talk about your messy experiences doing data stuff. At work, personal projects, whatever. A space to commiserate about nerdy things!

bit.ly/dataBSconf-cfs
juulia.bsky.social
Academics, get thee some fuck off skills. Marketable skills that can help you fuck off from academia and get employed elsewhere. I see folx do multiple postdocs because they don't know what else they could do. Don't let that be you - make sure you have the ability to fuck off #academia #AcademicSky
juulia.bsky.social
I seem to re-learn this truth at regular intervals: if you're looking for an explanation of how something slightly niche works, find a master's thesis related to the topic. Students (at least in the Nordics) are expected to explain the whole thing and not leave steps out (unlike academic articles).
juulia.bsky.social
Ok that's a very good point! Could it be that since knowing how to do something in python is such a small part of a data scientist's expertise it's just not something that is a strong part of our identity? Like, I wear clothes every day but I would never call myself "a real fashionista." Cause 🤷‍♀️
juulia.bsky.social
I think there's a subtle difference between saying "I can bake" and saying "I'm a real baker". Like, the latter feels like taking ownership of something that maybe someone else has a stronger claim over (pro bakers) whereas I can make a birthday cake but I'm not a professional baker.