David Robinson
@drob.bsky.social
6.7K followers 85 following 17 posts
Director of Engineering at Heap. #rstats fan. Dad x2. He/him
drob.bsky.social
He doesn’t believe that “brekkie” is real; fair enough I barely believe it myself
drob.bsky.social
My son came up with a silly language where you add -ie to the end of each word. Like a shirt is a “shirtie” or milk is “milkie”

So I told him about Australia and he absolutely lost it
drob.bsky.social
Blown away

Using OpenAI’s Deep Research is like collaborating with a PhD student

(It told me it would get right on it then ghosted me)
drob.bsky.social
Thanks for sharing! I might get back into them!
drob.bsky.social
By convention, I use all caps for any function or infix operator I want to be passed to SQL

Because that avoids the possibility of conflicting with an R function, which will lead to an error when dbplyr finds it and tries to apply it

(E.g. lowercase extract() would have had a conflict from tidyr)
drob.bsky.social
That’s right, %FROM% isn’t from a package; dbplyr turns any unrecognized infix operator directly into SQL (much like it does with variable names)

Fun fact: %FrOm% would work too
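A minimal sketch of the pass-through behavior described in the two posts above, not the code from the original thread. It assumes the dplyr and dbplyr packages are installed. The table, its columns, and the FOOFIFY function are hypothetical names made up for illustration. lazy_frame() with simulate_dbi() stands in for a real database connection, so nothing is sent to a server.

```r
library(dplyr)
library(dbplyr)

# A simulated remote table; the columns x and y are hypothetical.
lf <- lazy_frame(x = "a", y = 1, con = simulate_dbi())

# %LIKE% is not an R operator and not from any package.
# dbplyr does not recognize it, so it is emitted as a SQL infix operator.
like_query <- lf %>%
  filter(x %LIKE% "%foo%") %>%
  sql_render()

# FOOFIFY is a made-up function name. Writing it in all caps keeps it
# from matching an R function that dbplyr already knows how to translate.
fun_query <- lf %>%
  transmute(z = FOOFIFY(x, y)) %>%
  sql_render()

print(like_query)
print(fun_query)
```

Printing the two queries shows the operator and the function name carried into the generated SQL. Whether the database accepts them is up to the database, since dbplyr does not check that they exist.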
drob.bsky.social
Being able to read is OP for a dad

I take my son around the Natural History Museum, he asks about anything, and I rattle off what the plaque says

He thinks I’m a goddamn genius
Reposted by David Robinson
ikashnitsky.phd
Here is a talk by @drob.bsky.social at @posit.co's conf 2019. The ideas shaped and voiced there are priceless. I've been suggesting this talk to my @datavizartskill.ikashnitsky.phd students ever since, and hopefully some of them found it as useful and motivating /3
youtu.be/th79W4rv67g?...
David Robinson | The unreasonable effectiveness of public work | RStudio (2019)
YouTube video by Posit PBC
drob.bsky.social
"How did you spend 2024?"
"I'll tell you how I spent 5% of it"
Apple Music Replay '24

#1 Taylor Swift: 25,520
drob.bsky.social
My #rstats solution to Day 2 of #adventofcode

* I feel like half of Advent of Code puzzles need a diff(), especially in the early days!
* Didn't use much tidyverse today (except map_lgl and between, but those could have easily been replaced)
library(tidyverse)
library(adventdrob)

input <- advent_input(2, 2024)
x <- input$x

is_safe <- function(report) {
  d <- diff(report)
  return((all(d > 0) || all(d < 0)) && all(between(abs(d), 1, 3)))
}

is_safe_part2 <- function(report) {
  return(is_safe(report) ||
           any(map_lgl(seq_along(report), \(i) is_safe(report[-i]))))
}

input$x %>%
  str_split(" ") %>%
  map(as.numeric) %>%
  map_lgl(is_safe_part2) %>%
  sum()
drob.bsky.social
When I woke up I realized my Part 1 could have been WAY shorter with sort 🤦‍♂️
input %>%
  separate(x, c("first", "second"), convert = TRUE) %>%
  summarize(sum(abs(sort(second) - sort(first))))
drob.bsky.social
My #rstats solution to Day 1 of #adventofcode

* Fun use of gather and spread (I know I'm supposed to be using pivot_longer and pivot_wider, but old-dog-new-tricks)
* One step I got stuck on was setting a ties.method in rank()
library(tidyverse)
library(adventdrob)

input <- advent_input(1, 2024)

separated <- input %>%
  separate(x, c("first", "second"), convert = TRUE)

# Part 1
separated %>%
  gather(type, value) %>%
  group_by(type) %>%
  mutate(rank = rank(value, ties.method = "first")) %>%
  ungroup() %>%
  spread(type, value) %>%
  summarize(sum(abs(second - first)))

# Part 2
separated %>%
  count(first = second, sort = TRUE) %>%
  inner_join(separated, by = "first") %>%
  summarize(sum(first * n))
drob.bsky.social
Who is doing #rstats Advent of Code this year? ❄️🎄
drob.bsky.social
My wife Dana is hiring a full-time Data Engineer at her company!

Great role for someone with strong experience in Python, dbt, and Snowflake who wants to join a growing startup in the government data space

Please forward to strong data folks you know!

www.linkedin.com/jobs/view/40...
Upfront (allupfront.com) is fixing the childcare industry through accurate, complete data so the government, parents, and providers can all make decisions and operate with optimal results. Our SaaS platform cleans, validates, and provides vital insights on childcare data (price, hours, location, availability, etc.) and serves as a central hub for every stakeholder. A Techstars portfolio company, Upfront has seen rapid growth with customers such as the states of Maryland, Arizona, and North Carolina. 

We are looking for an engineer to maintain our system, ingest data from our clients, clean and enrich that data, and integrate it into our production system.

Responsibilities:

Build a system for ingesting daily batches of data from a client’s API
Develop and maintain ETL scripts for processing and enriching that data
Manage data quality and accuracy, such as developing automated tests


We expect you to have:

The ability to architect a data ingestion platform from the ground up
Extensive experience with at least one platform for scheduled ETL pipelines 
Extensive experience in dbt, AWS and Snowflake
Intermediate to advanced proficiency in Python
Attention to detail and proactivity when it comes to data quality


Extra points for:

Skill at visualizing and drawing insights from data
Experience pulling data from public websites
Scrappy mindset: we're a small but smart team, and nothing is above or below our job title
drob.bsky.social
Was there a period where you were using the early tools personally / in teaching before you uploaded them to CRAN?

Did that change over the course of reshape, reshape2, ggplot, ggplot2?
Reposted by David Robinson
isabelizimm.bsky.social
`rm -rf` ❌
• remove rf??? what does that even mean???? (nothing)
• boring
• hard to remember

`rm -fr` ✅
• remove forreal 💅🏼💅🏼
• makes u smile every time
• will never forget
Reposted by David Robinson
emilyriederer.bsky.social
#rstats is actually fighting about Base versus Tidyverse again on this platform. We are so back
Reposted by David Robinson
tylermw.com
ggplot(data, aes(x,y)) +
geom_jitter(velocity = units(17000, "mph"))

help, my data is stuck in LEO

#RStats #ggplot #dataviz
drob.bsky.social
Tired: P-hacking
Wired: Querymandering
elsabirch.bsky.social
in haste I misread something as ‘querymandering’ so if that wasn’t in the lexicon already it is now
drob.bsky.social
I’ve been thinking about this too!!

I think one important shift in the last ten years is that data analysts are much more likely to use SQL + scripting, so “analysts that can program” is no longer a niche that gets its own title
Reposted by David Robinson
apreshill.com
Question for #databs folks:

I am searching for a recent write up of how data careers/titles are evolving. Has anyone written or read something that resonated on this lately?

I’m hoping for a boots on the ground point-of-view of basically “where have all the data scientists gone” 🤠
drob.bsky.social
Remember in Squid Game where the contestants in mortal danger barely managed to reach safety through a popular vote

And then later the dull banality of regular life drove them to voluntarily re-enter mortal danger

Dunno what made me think of that
Reposted by David Robinson
jeremy-data.bsky.social
My fav starter packs so far, a thread:

stats: go.bsky.app/Ki7PjpS
stats: go.bsky.app/7TBN5rX
causal inference: go.bsky.app/FdemGAZ
package devs: go.bsky.app/N1569Qh
data peeps: go.bsky.app/8TdEfdK
medical stats: go.bsky.app/ArqEz36
bioinformatics: go.bsky.app/Ha64Gmv
r-ladies: go.bsky.app/Vgxwa2F
Reposted by David Robinson
juliasilge.com
We've got a brand new, baby website for Positron! Take a look if you are interested in getting started, and please let us know how it goes:
positron.posit.co
Positron
A next-generation data science IDE