datascienceweekly.bsky.social
Reposted
tmap or ggplot2 for maps? 🗺️

David O’Sullivan breaks down the trade-offs in a blog post.

URL: dosull.github.io/posts/2024-1...

#RStats #RSpatial #Maps #tmap #ggplot2
tmap vs. ggplot2 for mapping – Geospatial Stuff
For me at least the choice between ggplot2 and tmap is an ongoing question. Here are my latest thoughts on the subject (with code).
dosull.github.io
November 12, 2025 at 2:02 PM
Reposted
Sparse Circuits

a new mech interp paper from OpenAI proposes a way to train models so that they’re natively easier to understand

openai.com/index/unders...
Understanding neural networks through sparse circuits
We trained models to think in simpler, more traceable steps—so we can better understand how they work.
openai.com
November 13, 2025 at 6:37 PM
Reposted
This paper introduces the LEGO Database, a large natural dataset that can be used to teach Structured Query Language (SQL) and relational database concepts.
ERIC - EJ1468081 - Using LEGO® Brick Data to Teach SQL and Relational Database Concepts, Information Systems Education Journal, 2025
This paper introduces the LEGO® Database, a large natural dataset that can be used to teach Structured Query Language (SQL) and relational database concepts. This dataset is well-suited for introductory and advanced database assignments and end-of-semester group projects. The data is freely available from Kaggle.com and contains eight tables with 633,250 rows of data on 11,673 LEGO® sets sold between 1950 and 2017. As a guiding example, I introduce an example group project assignment designed to provide students hands-on experience with database management and SQL queries. I also discuss tips, suggestions, and lessons learned from using the data for group projects over the past five years. While LEGO® bricks have been widely used in educational settings, including college and computer classrooms, this is the first work to discuss the use of LEGO® data in a college database course.
eric.ed.gov
October 19, 2025 at 12:37 AM
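A minimal sketch of the kind of introductory query such a dataset supports, run through Python's built-in sqlite3 module. The two-table schema (`themes`, `sets`) and every row below are made up for illustration; they are assumptions, not the actual Kaggle tables or data.

```python
import sqlite3

# In-memory database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE themes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sets (
    set_num  TEXT PRIMARY KEY,
    name     TEXT NOT NULL,
    year     INTEGER,
    theme_id INTEGER REFERENCES themes(id)
);
-- Made-up demo rows, not real LEGO data.
INSERT INTO themes VALUES (1, 'Space'), (2, 'Castle');
INSERT INTO sets VALUES
    ('demo-1', 'Demo Rocket', 1980, 1),
    ('demo-2', 'Demo Rover',  1985, 1),
    ('demo-3', 'Demo Keep',   1990, 2);
""")

# Join the two tables and count sets per theme.
rows = conn.execute("""
SELECT t.name, COUNT(*) AS n_sets
FROM sets AS s
JOIN themes AS t ON t.id = s.theme_id
GROUP BY t.name
ORDER BY n_sets DESC, t.name
""").fetchall()

print(rows)
```

A join plus an aggregate like this exercises the core relational ideas (keys, joins, grouping) that an introductory assignment would likely cover.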
Reposted
This year I'm teaching an advanced stats course for our psych grad students, and I want to squeeze as much causal stuff as I can - but there's just too much!

ATE, DAGs, confounder selection, table 1 & 2 fallacies, collider bias, ...

What else should I squeeze in there?
November 13, 2025 at 11:30 AM
Reposted
We're hiring an open-source #python developer focused on modeling APIs!

tidyverse.org/blog/2025/11...

#numpy #scipy #scikitlearn
Python Open-Source Developer
Posit is hiring a Python open-source developer to create more data analysis tools.
tidyverse.org
November 12, 2025 at 5:46 PM
Reposted
I spent several days working on this blog post and video. Behold everything I know (aka everything @joseph-barbier.bsky.social has ever taught me) about making high-quality PDFs using Quarto and Typst. #rstats
How to Make High-Quality PDFs with Quarto and Typst
Learn to create high-quality PDFs using Quarto and Typst with this in-depth tutorial, including custom templates, branding, and advanced techniques.
rfortherestofus.com
November 13, 2025 at 1:36 PM
Reposted
Colocating Input Partitions with Kafka Streams When Consuming Multiple Topics: Sub-Topology Matters! - Vishal Sharma medium.com/expedia-grou...
Colocating Input Partitions with Kafka Streams When Consuming Multiple Topics: Sub-Topology Matters!
Understanding how sub-topology design influences partition co-location
medium.com
November 13, 2025 at 4:27 PM
Reposted
One of my favorite conf talks of all time was Ryan Timpe's talk about learning R via hilarious side projects, so I'm pretty freaking pumped to say he's gonna be the featured guest at the Data Science Hangout tomorrow 🤩 If you haven't seen that 2020 talk yet, here it is: youtu.be/oOG-aXP_ICI
Ryan Timpe | Learning R with humorous side projects | RStudio (2020)
YouTube video by Posit PBC
youtu.be
November 13, 2025 at 12:32 AM
Reposted
We’re hiring a Junior Systems Administrator! If you like solving problems, working with Linux, and keeping systems running smoothly, you’ll fit right in at Jumping Rivers.

Find out more on our website.

#Hiring #TechJobs #Linux #SystemsAdmin #JumpingRivers
Junior Systems Administrator: Jumping Rivers is hiring!
Jumping Rivers is hiring a Junior Systems Administrator at its Newcastle Upon Tyne offices.
jumping-rivers.welcomekit.co
November 13, 2025 at 11:16 AM
Reposted
To be effective, data science agents need to be able to read plots reliably. @sara-altman.bsky.social and I wrote about some concerning findings on LLMs' ability to interpret plots when the content contradicts their expectations on the @posit.co blog.

posit.co/blog/introdu...
When plotting, LLMs see what they expect to see - Posit
Data science agents need to accurately read plots even when the content contradicts their expectations. Our testing shows today's LLMs still struggle here.
posit.co
November 13, 2025 at 3:07 PM
Reposted
I've done a lot of work in Python this fall, and it hasn't endeared me to the language at all. Why does stuff have to be so complicated when you're doing it in Python?
blog.genesmindsmachines.com/p/python-is-...
Python is not a great language for data science. Part 1: The experience
It may be a good language for data science, but it’s not a great one.
blog.genesmindsmachines.com
November 13, 2025 at 4:16 PM
Reposted
Do you teach #rstats? Do your students complain about how lame and old-fashioned dplyr is? Don't worry: I have the solution for you: github.com/hadley/genzp....

genzplyr is dplyr, but bussin fr fr no cap.
GitHub - hadley/genzplyr: dplyr but make it bussin fr fr no cap
dplyr but make it bussin fr fr no cap. Contribute to hadley/genzplyr development by creating an account on GitHub.
github.com
November 6, 2025 at 11:25 PM
Data Science Weekly - Issue 624, by @DataSciNews open.substack.com/pub/datascie...
Data Science Weekly - Issue 624
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
open.substack.com
November 6, 2025 at 11:57 PM
Reposted
There are a lot of great posts out there that aren't very highly ranked.

Don't rely on Bluesky to find you great content; you can find it on your own! Here's how:

#Rstats via @northeasternu.bsky.social's Storybench

www.storybench.org/how-to-analy...
How to Analyze bluesky Posts and Trends with R - Storybench
If all you're doing on bluesky is scrolling, liking and posting, then you're riding a bike with training wheels. Here are simple tools using its open-source skeleton.
www.storybench.org
November 5, 2025 at 7:58 PM
Reposted
I'm excited to share side::kick(), an experimental open-source coding agent for RStudio built entirely in R. It can interact with your files, communicate with your active #rstats session, and run code.

Check it out: github.com/simonpcouch/...
November 5, 2025 at 3:57 PM
Reposted
In our latest blog post we compare the syntax of two Python libraries, Pandas and Polars, for standard data-manipulation tasks.

#python #polars #pandas
Polars and Pandas - Working with the Data-Frame
In our latest blog post we compare the Pandas and Polars syntax for standard data-manipulation tasks.
www.jumpingrivers.com
November 6, 2025 at 2:12 PM
Reposted
New blog post: open-source software packages have surprising problems with the way they calculate weighted medians and other quantiles.

www.practicalsignificance.com/posts/weight...

#rstats #julialang
Weighted Quantile Weirdness and Bugs – Practical Significance
Computing quantiles is surprisingly complicated. It gets much weirder when you use weights, and popular software behaves in surprising ways that might trouble you.
www.practicalsignificance.com
November 5, 2025 at 4:30 PM
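One reason weighted quantiles can surprise is that several reasonable definitions exist. Here is a minimal sketch of just one of them, the inverse of the weighted empirical CDF with no interpolation. The helper name `weighted_quantile` is hypothetical, and this is not necessarily the definition used by the linked post or by any particular package.

```python
import statistics


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q.

    This is the inverse weighted empirical CDF, one of several
    possible definitions of a weighted quantile.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= q * total:
            return value
    return max(values)  # fallback for floating-point round-off


values = [1, 2, 3, 4]
equal = weighted_quantile(values, [1, 1, 1, 1], 0.5)
heavy_tail = weighted_quantile(values, [1, 1, 1, 5], 0.5)
plain = statistics.median(values)

print(equal, heavy_tail, plain)
```

With equal weights this definition returns 2, while the conventional unweighted median of the same data is 2.5, so even the simplest case depends on which definition is chosen.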
Reposted
Last post on causal inference: DID
Plus: I finally added the "copy to clipboard" button 😁

thestippe.github.io/statistics/d...
Difference in difference
Causal inference from 1850
thestippe.github.io
November 6, 2025 at 9:08 PM
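For readers new to difference in differences (DID), here is a minimal sketch of the basic two-group, two-period estimate. The outcome values are made up, the function name `diff_in_diff` is hypothetical, and the estimate only has a causal reading under the parallel-trends assumption.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group's mean minus change in the control group's mean."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up outcomes for illustration only.
estimate = diff_in_diff(
    treated_pre=[10.0, 12.0],
    treated_post=[15.0, 17.0],
    control_pre=[8.0, 10.0],
    control_post=[9.0, 11.0],
)

print(estimate)
```

Here the treated group rises by 5 and the control group by 1, so the estimate is 4: the part of the treated group's change not shared with the control group.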
Reposted
Still got my head in the clouds about this. The paper is really out now 😍
October 30, 2025 at 10:04 PM
Data Science Weekly - Issue 623, by @DataSciNews open.substack.com/pub/datascie...
Data Science Weekly - Issue 623
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
open.substack.com
October 30, 2025 at 10:23 PM
Reposted
new blog post:

Of course, someone has to write imperative code to build reproducible data science pipelines. It doesn’t have to be you.

brodrigues.co/posts/2025-1...
October 29, 2025 at 3:52 PM
Reposted
A Geometric Analysis of PCA

What property of the data distribution determines the excess risk of principal component analysis? In this paper, they provide a precise answer to this question.

arxiv.org/abs/2510.20978
October 29, 2025 at 12:29 PM