Lightnews — Scholar-powered news

Dr Juulia Suvilehto @juulia.bsky.social · 5h

Hahaha, ok so it's an avalanche of spam and very little ham it sounds like. That sounds better than the inverse to me actually 😂

Dr Juulia Suvilehto @juulia.bsky.social · 8h

My best tip is to be very ruthless in having rules for email that automatically gets cleaned out of the inbox, e.g. automatic emails from systems, stuff that gets sent out as FYI, weekly XYZ digests etc. They all go into their own folders and I look at them if/when I feel like it. Maybe never.

Dr Juulia Suvilehto @juulia.bsky.social · 8h

In my old job I would get around 20-30 emails per day that required me specifically to react to them (so, no automated emails, no mailing list stuff). In my new job, so far ✨almost nothing✨. But I think email volume is likely a function of your tenure and/or network centrality in an organisation.

Dr Juulia Suvilehto @juulia.bsky.social · 8h

Thank you everyone for your suggestions! This is how I ended up plotting my LLM system output vs human annotation vs LLM-as-a-judge evals. Extra thanks to @libbyheeren.bsky.social for boosting my original question. #databs #dataviz

Four bar plots arranged like a confusion matrix. Each bar is stacked in two colors based on evaluator output. The plot shows that the evaluator (DSPy optimized + gpt-5-nano) flags many false positives and false negatives and does not flag that many true positives or true negatives.

A bar plot with three series of two bars each. The series are different evaluation frameworks: Phoenix, deepeval, and DSPy. In each pair, the left bar shows evaluator output when the original rating was incorrect and the right bar shows evaluator output when the original rating was correct. The color of the bar shows evaluator output. The plot shows that DSPy (non-optimized) performs best, i.e. it tags most incorrect responses and does not tag many correct responses.
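
The counts behind a confusion-matrix-style plot like the first one could be tallied roughly as follows. This is only a minimal sketch, not the code used for the plots above: the record layout (system output, human annotation, evaluator flag), the function names, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def confusion_cell(system: bool, human: bool) -> str:
    """Name the confusion-matrix cell, treating the human answer as ground truth."""
    if system and human:
        return "TP"
    if system and not human:
        return "FP"
    if not system and human:
        return "FN"
    return "TN"

def tally(records):
    """Count evaluator flags per confusion-matrix cell.

    Each record is (system_output, human_annotation, evaluator_flagged),
    a hypothetical layout chosen for this sketch.
    """
    counts = Counter()
    for system, human, flagged in records:
        segment = "flagged" if flagged else "not flagged"
        counts[(confusion_cell(system, human), segment)] += 1
    return counts

# Toy data, invented for illustration only.
records = [
    (True, True, False),    # TP, evaluator stays quiet
    (True, False, True),    # FP, evaluator catches it
    (True, False, False),   # FP, evaluator misses it
    (False, True, True),    # FN, evaluator catches it
    (False, True, False),   # FN, evaluator misses it
    (False, False, False),  # TN, evaluator stays quiet
]

counts = tally(records)
for cell in ("TP", "FP", "FN", "TN"):
    print(cell, counts[(cell, "flagged")], counts[(cell, "not flagged")])
```

Each cell then maps to one panel, with the flagged and not-flagged counts as the two stacked segments of the bar.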

Dr Juulia Suvilehto @juulia.bsky.social · 7d

Wow, thank you so much @thoughtfulnz.bsky.social ! That's really pretty & so nice of you to make up a toy example! 🤩🤩🤩

Dr Juulia Suvilehto @juulia.bsky.social · 8d

Ooh, that's a thought! Thanks! I think it's good for the audience to have a sense of how prevalent true vs faulty answers are, but that doesn't necessarily need to be in the same plot as the comparisons 🤔

Dr Juulia Suvilehto @juulia.bsky.social · 8d

Thanks Libby! Something like this is what I'm leaning towards, but with four sections so that I can show false positives and false negatives separately (they have different business implications).

Dr Juulia Suvilehto @juulia.bsky.social · 8d

Fair point! Right now I'm just going with: the right answer is the one a human gave, and false positives and false negatives have different costs (false negatives are much more costly than false positives).
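
One way to make the unequal costs concrete is to weight the errors an evaluator fails to catch. The sketch below is an illustration under loud assumptions: the 5:1 cost ratio, the evaluator names, and the miss counts are all hypothetical, since the post only says that false negatives cost much more than false positives.

```python
# Hypothetical relative costs. The only grounded fact is FN_COST > FP_COST;
# the 5:1 ratio is an assumption for this sketch.
FP_COST = 1.0
FN_COST = 5.0

def missed_error_cost(missed_fp: int, missed_fn: int,
                      fp_cost: float = FP_COST, fn_cost: float = FN_COST) -> float:
    """Total cost of the system errors that an evaluator did not flag."""
    return missed_fp * fp_cost + missed_fn * fn_cost

# Invented counts of (missed false positives, missed false negatives).
missed = {
    "evaluator_a": (10, 2),
    "evaluator_b": (4, 5),
}

costs = {name: missed_error_cost(fp, fn) for name, (fp, fn) in missed.items()}
best = min(costs, key=costs.get)

for name, cost in costs.items():
    print(name, cost)
print("lowest weighted cost:", best)
```

In this toy data the evaluator that misses more errors in total still comes out cheaper, because it misses fewer of the expensive false negatives.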

Dr Juulia Suvilehto @juulia.bsky.social · 8d

Oh hey thanks, I hadn't really thought of radar plot for this! That's definitely worth considering!

Dr Juulia Suvilehto @juulia.bsky.social · 8d

I think the problem is that data science is >10% thinking and you can't really externalize the thinking, just the execution. And even for execution you need to split the work into suitably sized chunks that you quality control yourself.

Dr Juulia Suvilehto @juulia.bsky.social · 9d

I’m really loving it for making and tweaking plots where I kind of know the plotting library but not well enough to remember all the functions and params by heart. @hadley.nz had a wonderful demo of this at the Posit conf earlier this month, it might be on youtube soon!

Dr Juulia Suvilehto @juulia.bsky.social · 9d

Maybe sometime down the line! Right now I have a number of evals coming from many different models & frameworks and need to figure out how to best visualize them for our dev team.

Dr Juulia Suvilehto @juulia.bsky.social · 9d

Any references for good visualizations of LLM evals? We have the original system output (yes/no), a human annotator's y/n, and a number of different y/n evaluations. The aim is to see how good our original system is and how many of the issues the different LLM evals catch. How would you visualize this? #databs

Dr Juulia Suvilehto @juulia.bsky.social · 14d

My mentor once said: strongest results are the ones where you don't even need stats to figure out if there was an effect, you just plot the raw data and look at it with your two human eyes. If it's obvious then, it's a ding-dang strong effect.

Reposted by Dr Juulia Suvilehto

Randy Au @randyau.com · 20d

#DataBS Conf 2025 preshow! We have two talks that we couldn't fit into the schedule but the speakers pre-recorded their talk for us to share before the main event next week!

Both are really good and give me lots of excitement about what we'll see next week.

ti.to/databsconf/d... <- free tix

🧵1/3

DataBS Conf

"Data, Behind the Scenes" is a free-to-attend online-only, single track conference centered on the real stories of data work from the folks in the trenches. We’re not here for the latest AI hype, perf...

ti.to

Reposted by Dr Juulia Suvilehto

Randy Au @randyau.com · Aug 19

This week on Counting Stuff, the #dataBS conference attendee registration form is open! Also an intro to the 6 of 14 talks we have confirmed!

Tickets are free/pay-what-you-want and there's technically a limit on attendees so grab yours today!

www.counting-stuff.com/databs-conf-...

DataBS Conf attendee registration is open!

Also the first batch of confirmed speakers listed

www.counting-stuff.com

Reposted by Dr Juulia Suvilehto

Data Behind the Scenes Conference @databsconf.com · Jul 19

The weekend is here!

Perfect time to submit your #dataBS talk.

We want your lessons-learned stories from data work, everything from "my data pipeline: the unsung hero" to "how we got that AI system working" to "what we learned when it fell over."

Details and sign-up:
bit.ly/dataBSconf-cfs

Data Behind the Scenes Conf - Call for Speakers

What This Conference Is About "Data, Behind the Scenes" is a (free) online-only, single track conference centered on the real stories of data work from the folks in the trenches. We’re not here for th...

bit.ly

Dr Juulia Suvilehto @juulia.bsky.social · Jul 18

Hi! Just wanted to let you know that there seems to be something wrong with the link: I get a 404 page when trying to open it and can't find the blog even in the blog tab.

Reposted by Dr Juulia Suvilehto

Randy Au @randyau.com · Jul 16

Hey #dataBS! Got some stories about how data is messily done to accomplish real things? Have you learned a painful lesson about how not to do something? Then you've got something to share and should submit a talk to the Data Behind the Scenes Conference! (Form closes July 31)

bit.ly/dataBSconf-cfs

Reposted by Dr Juulia Suvilehto

Randy Au @randyau.com · Jul 1

The #dataBS (Data Behind the Scenes) Conference Call for speakers is out! We're gonna do this!!

All online, single track, free to attend. Come talk about your messy experiences doing data stuff. At work, personal projects, whatever. A space to commiserate about nerdy things!

bit.ly/dataBSconf-cfs

Dr Juulia Suvilehto @juulia.bsky.social · Jun 11

ALT: kermit the frog is wearing a black hood and kermit the frog is wearing a hood .

media.tenor.com

Dr Juulia Suvilehto @juulia.bsky.social · Jun 5

Academics, get thee some fuck off skills. Marketable skills that can help you fuck off from academia and get employed elsewhere. I see folx do multiple postdocs because they don't know what else they could do. Don't let that be you - make sure you have the ability to fuck off #academia #AcademicSky

Dr Juulia Suvilehto @juulia.bsky.social · May 12

I seem to re-learn this truth at regular intervals: if you're looking for an explanation of how something slightly niche works, find a master's thesis related to the topic. Students (at least in the Nordics) are expected to explain the whole thing and not leave steps out (unlike academic articles).

Dr Juulia Suvilehto @juulia.bsky.social · Apr 23

Ok that's a very good point! Could it be that since knowing how to do something in Python is such a small part of a data scientist's expertise, it's just not something that is a strong part of our identity? Like, I wear clothes every day but I would never call myself "a real fashionista." Cause 🤷‍♀️

Dr Juulia Suvilehto @juulia.bsky.social · Apr 23

I think there's a subtle difference between saying "I can bake" and saying "I'm a real baker". Like, the latter feels like taking ownership of something that maybe someone else has a stronger claim over (pro bakers) whereas I can make a birthday cake but I'm not a professional baker.
