Lightnews — Scholar-powered news

Reposted

Jakub Nowosad

@jakubnowosad.com

tmap or ggplot2 for maps? 🗺️

David O’Sullivan breaks down the trade-offs in a blog post.

URL: dosull.github.io/posts/2024-1...

#RStats #RSpatial #Maps #tmap #ggplot2

tmap vs. ggplot2 for mapping – Geospatial Stuff

For me at least the choice between ggplot2 and tmap is an ongoing question. Here are my latest thoughts on the subject (with code).

dosull.github.io

November 12, 2025 at 2:02 PM

Reposted

Tim Kellogg

@timkellogg.me

Sparse Circuits

a new mech interp paper from OpenAI proposes a way to train models so that they’re natively easier to understand

openai.com/index/unders...

Understanding neural networks through sparse circuits

We trained models to think in simpler, more traceable steps—so we can better understand how they work.

openai.com

November 13, 2025 at 6:37 PM

Reposted

Esther Schindler

@estherschindler.bsky.social

This paper introduces the LEGO Database, a large natural dataset that can be used to teach Structured Query Language (SQL) and relational database concepts.

ERIC - EJ1468081 - Using LEGO® Brick Data to Teach SQL and Relational Database Concepts, Information Systems Education Journal, 2025

This paper introduces the LEGO® Database, a large natural dataset that can be used to teach Structured Query Language (SQL) and relational database concepts. This dataset is well-suited for introductory and advanced database assignments and end-of-semester group projects. The data is freely available from Kaggle.com and contains eight tables with 633,250 rows of data on 11,673 LEGO® sets sold between 1950 and 2017. As a guiding example, I introduce an example group project assignment designed to provide students hands-on experience with database management and SQL queries. I also discuss tips, suggestions, and lessons learned from using the data for group projects over the past five years. While LEGO® bricks have been widely used in educational settings, including college and computer classrooms, this is the first work to discuss the use of LEGO® data in a college database course.

eric.ed.gov

October 19, 2025 at 12:37 AM

Reposted

Mattan S. Ben-Shachar

@mattansb.msbstats.info

This year I'm teaching an advanced stats course for our psych grad students, and I want to squeeze in as much causal stuff as I can - but there's just too much!

ATE, DAGs, confounder selection, table 1 & 2 fallacies, collider bias, ...

What else should I squeeze in there?

November 13, 2025 at 11:30 AM

Reposted

Max Kuhn

@topepo.bsky.social

We're hiring an open-source #python developer focused on modeling APIs!

tidyverse.org/blog/2025/11...

#numpy #scipy #scikitlearn

Python Open-Source Developer

Posit is hiring a Python open-source developer to create more data analysis tools.

tidyverse.org

November 12, 2025 at 5:46 PM

Reposted

David Keyes

@dgkeyes.com

I spent several days working on this blog post and video. Behold everything I know (aka everything @joseph-barbier.bsky.social has ever taught me) about making high-quality PDFs using Quarto and Typst. #rstats

How to Make High-Quality PDFs with Quarto and Typst

Learn to create high-quality PDFs using Quarto and Typst with this in-depth tutorial, including custom templates, branding, and advanced techniques.

rfortherestofus.com

November 13, 2025 at 1:36 PM

Reposted

rmoff 🏃‍♂️🫖🥓

@rmoff.net

Colocating Input Partitions with Kafka Streams When Consuming Multiple Topics: Sub-Topology Matters! - Vishal Sharma medium.com/expedia-grou...

Colocating Input Partitions with Kafka Streams When Consuming Multiple Topics: Sub-Topology Matters!

Understanding how sub-topology design influences partition co-location

medium.com

November 13, 2025 at 4:27 PM

Reposted

Libby Heeren

@libbyheeren.bsky.social

One of my favorite conf talks of all time was Ryan Timpe's talk about learning R via hilarious side projects, so I'm pretty freaking pumped to say he's gonna be the featured guest at the Data Science Hangout tomorrow 🤩 If you haven't seen that 2020 talk yet, here it is: youtu.be/oOG-aXP_ICI

Ryan Timpe | Learning R with humorous side projects | RStudio (2020)

YouTube video by Posit PBC

youtu.be

November 13, 2025 at 12:32 AM

Reposted

Jumping Rivers

@jumpingrivers.com

We’re hiring a Junior Systems Administrator! If you like solving problems, working with Linux, and keeping systems running smoothly, you’ll fit right in at Jumping Rivers.

Find out more on our website.

#Hiring #TechJobs #Linux #SystemsAdmin #JumpingRivers

Junior Systems Administrator: Jumping Rivers is hiring!

Jumping Rivers is hiring a Junior Systems Administrator at its Newcastle upon Tyne offices.

jumping-rivers.welcomekit.co

November 13, 2025 at 11:16 AM

Reposted

Simon P. Couch

@simonpcouch.com

To be effective, data science agents need to be able to read plots reliably. @sara-altman.bsky.social and I wrote on the @posit.co blog about some concerning findings on LLMs' ability to interpret plots when the content contradicts their expectations.

posit.co/blog/introdu...

When plotting, LLMs see what they expect to see - Posit

Data science agents need to accurately read plots even when the content contradicts their expectations. Our testing shows today's LLMs still struggle here.

posit.co

November 13, 2025 at 3:07 PM

Reposted

Claus Wilke

@clauswilke.com

I've done a lot of work in Python this fall, and it hasn't endeared me to the language at all. Why does stuff have to be so complicated when you're doing it in Python?
blog.genesmindsmachines.com/p/python-is-...

Python is not a great language for data science. Part 1: The experience

It may be a good language for data science, but it’s not a great one.

blog.genesmindsmachines.com

November 13, 2025 at 4:16 PM

Reposted

Hadley Wickham

@hadley.nz

Do you teach #rstats? Do your students complain about how lame and old-fashioned dplyr is? Don't worry: I have the solution for you: github.com/hadley/genzp....

genzplyr is dplyr, but bussin fr fr no cap.

GitHub - hadley/genzplyr: dplyr but make it bussin fr fr no cap

dplyr but make it bussin fr fr no cap. Contribute to hadley/genzplyr development by creating an account on GitHub.

github.com

November 6, 2025 at 11:25 PM

datascienceweekly.bsky.social

@datascienceweekly.bsky.social

Data Science Weekly - Issue 624, by @DataSciNews open.substack.com/pub/datascie...

Data Science Weekly - Issue 624

Curated news, articles and jobs related to Data Science, AI, & Machine Learning

open.substack.com

November 6, 2025 at 11:57 PM

Reposted

Michael Friendly

@datavisfriendly.bsky.social

Nice use of #rstats #quarto #closeread in this post!

The R Data Scientist @rstats.blaze.email · 10d

An Introduction to Writing Your Own ggplot2 Geoms

https://rworks.dev/posts/ggplot2-extensions/

#rstats #datascience

An Introduction to Writing Your Own ggplot2 Geoms

Summary: Explores creating custom ggplot2 extensions with recipes for geoms and stats in R using ggproto, make_constructor, and base R workflows.

rworks.dev

November 4, 2025 at 2:22 AM

Reposted

Lucia Walinchus

@walinchus.bsky.social

There are a lot of great posts out there that aren't very highly ranked.

Don't rely on Bluesky to find you great content; you can find it on your own! Here's how:

#Rstats via @northeasternu.bsky.social's Storybench

www.storybench.org/how-to-analy...

How to Analyze bluesky Posts and Trends with R - Storybench

If all you're doing on Bluesky is scrolling, liking and posting, then you're riding a bike with training wheels. Here are simple tools using its open-source skeleton.

www.storybench.org

November 5, 2025 at 7:58 PM

Reposted

Simon P. Couch

@simonpcouch.com

I'm excited to share side::kick(), an experimental open-source coding agent for RStudio built entirely in R. It can interact with your files, communicate with your active #rstats session, and run code.

Check it out: github.com/simonpcouch/...

November 5, 2025 at 3:57 PM

Reposted

Jumping Rivers

@jumpingrivers.com

In our latest blog post we compare the syntax of two Python libraries, Pandas and Polars for standard data-manipulation tasks.

#python #polars #pandas

Polars and Pandas - Working with the Data-Frame

In our latest blog post we compare the Pandas and Polars syntax for standard data-manipulation tasks.

www.jumpingrivers.com

November 6, 2025 at 2:12 PM

Reposted

Ben Schneider

@bschneidr.bsky.social

New blog post: open-source software packages have surprising problems with the way they calculate weighted medians and other quantiles.

www.practicalsignificance.com/posts/weight...

#rstats #julialang

Weighted Quantile Weirdness and Bugs – Practical Significance

Computing quantiles is surprisingly complicated. It gets much weirder when you use weights, and popular software behaves in surprising ways that might trouble you.

www.practicalsignificance.com

November 5, 2025 at 4:30 PM

Reposted

Stefano

@stippe87.bsky.social

Last post on causal inference: DID
Plus: I finally added the "copy to clipboard" button 😁

thestippe.github.io/statistics/d...

Difference in difference

Causal inference from 1850

thestippe.github.io

November 6, 2025 at 9:08 PM

Reposted

Sophie Huiberts

@sophie.huiberts.me

Still got my head in the clouds about this. The paper is really out now 😍

Sophie Huiberts @sophie.huiberts.me · 17d

The simplex algorithm is super efficient. 80 years of experience says it runs in linear time. Nobody can explain _why_ it is so fast.

We invented a new algorithm analysis framework to find out.

Beyond Smoothed Analysis: Analyzing the Simplex Method by the Book

Narrowing the gap between theory and practice is a longstanding goal of the algorithm analysis community. To further progress our understanding of how algorithms work in practice, we propose a new alg...

arxiv.org

October 30, 2025 at 10:04 PM

datascienceweekly.bsky.social

@datascienceweekly.bsky.social

Data Science Weekly - Issue 623, by @DataSciNews open.substack.com/pub/datascie...

Data Science Weekly - Issue 623

Curated news, articles and jobs related to Data Science, AI, & Machine Learning

open.substack.com

October 30, 2025 at 10:23 PM

Reposted

GFZ Helmholtz-Zentrum für Geoforschung

@gfz.bsky.social

News story EN 👉 www.gfz.de/en/press/new...

Full paper 👉 www.nature.com/articles/s41...

Turning Smartphones into Earthquake Sensors

Turning Smartphones into Earthquake Sensors: Thousands of mobile phone accelerometers provide data that enable a new step for high resolution shake-maps and safer cities – as a study based on a Citize...

www.gfz.de

October 29, 2025 at 2:43 PM

Reposted

Bruno Rodrigues

@brodriguesco.bsky.social

new blog post:

Of course, someone has to write imperative code to build reproducible data science pipelines. It doesn’t have to be you.

brodrigues.co/posts/2025-1...

October 29, 2025 at 3:52 PM

Reposted

Steven P. Sanderson II, MPH

@spsanderson.com

Today's blog post goes over the use of #ragnar #ollama in #R for document summarization, specifically of health insurance payer policies.

Post: www.spsanderson.com/steveondata/...

#RStats #Blog #tidyverse #ellmer #ragnar #ollama

RAG with Ollama and ragnar in R: A Practical Guide for R Programmers – Steve’s Data Tips and Tricks

Learn how to build a privacy-preserving Retrieval-Augmented Generation (RAG) workflow in R using Ollama and the ragnar package. Discover step-by-step methods for summarizing health insurance policy do...

www.spsanderson.com

October 29, 2025 at 10:54 PM

Reposted

Sung Kim

@sungkim.bsky.social

A Geometric Analysis of PCA

What property of the data distribution determines the excess risk of principal component analysis? In this paper, they provide a precise answer to this question.

arxiv.org/abs/2510.20978

October 29, 2025 at 12:29 PM
