Jeremy Allen

@jeremy-data.bsky.social

5.1K followers 4.3K following 1.1K posts

I help @posit.co deliver the best R and Python data science solutions to the public sector in the US and elsewhere. Fan of cats, coffee, whiskey, tea, photography, philosophy, and Atlanta. Personal account. https://18chains.com/

Pinned

Jeremy Allen @jeremy-data.bsky.social · Jan 12

Work peeps convinced me to share my bird photographs on a blog. Luckily, I had a long-dead blog I could dust off. I added a new portfolio just for birding. My older portfolios are still there, too. My rstats content is still elsewhere. This is all personal and photo stuff. 18chains.com

18 Chains

Photographs and thoughts on being, by Jeremy Allen. Sometimes whiskey, cats, and rstats.

18chains.com

Reposted by Jeremy Allen

sarah @sarahjohnson.io · Aug 21

I was loving Claude Code... until I tried it with #rstats. Constant errors, wouldn't use the tidyverse even when asked, "optimized" functions were slower.

Frustrated, I started a session just to teach R to Claude and summarize what it learned into a CLAUDE.md file gist.github.com/sj-io/3828d6...

Claude R Tidyverse Expert

Claude R Tidyverse Expert. GitHub Gist: instantly share code, notes, and snippets.

gist.github.com

Reposted by Jeremy Allen

Emil Hvitfeldt @emilhvitfeldt.bsky.social · 4d

Excited to watch the webinar on "Modular, interoperable, and extensible topological data analysis in R" next week by Jason Cory Brunson and Aymeric Stamm

r-consortium.org/webinars/mod...
#tidymodels

Modular, interoperable, and extensible topological data analysis in R – R Consortium

r-consortium.org

Jeremy Allen @jeremy-data.bsky.social · 4d

New mission: "Posit’s mission is to create free and open source software for data science, scientific research, and technical communication EVEN IF IT HURTS" - Hadley 2025

Jeremy Allen @jeremy-data.bsky.social · 5d

And nearby you will see a headline about how some seemingly helpless group of people are left concerned by what they are experiencing

Jeremy Allen @jeremy-data.bsky.social · 5d

I am now data bald on top and data grey in the beard

Jeremy Allen @jeremy-data.bsky.social · 5d

2️⃣

Jeremy Allen @jeremy-data.bsky.social · 6d

it prints hexagonally!

Jeremy Allen @jeremy-data.bsky.social · 6d

do you even spelling bee?

bee <- function(max_vowels = 2) {
  vowels <- c("a", "e", "i", "o", "u")
  seven <- sample(letters, 7, replace = FALSE)
  num_vowels <- length(intersect(seven, vowels))
  while (num_vowels == 0 || num_vowels > max_vowels) {
    seven <- sample(letters, 7, replace = FALSE)
    num_vowels <- length(intersect(seven, vowels))
  }
  pattern <- "\n\n %s\n%s %s\n %s\n%s %s\n %s\n\n"
  cat(sprintf(pattern, seven[1], seven[2], seven[3], seven[4], seven[5], seven[6], seven[7]))
}

bee()

Jeremy Allen @jeremy-data.bsky.social · 6d

do it well and you can earn venure, or vibenure, or tibe, or whatever

Jeremy Allen @jeremy-data.bsky.social · 6d

Maybe @pewresearch.org is working on a sequel?

Jeremy Allen @jeremy-data.bsky.social · 7d

I followed that up with this prompt and got really good results.

Look at the case names column and the full name column. Tell me where a defendant is listed more than once for the same case but at different times in the same case. For example, if a shooter is recorded here twice, once for their indictment once for their sentencing, I want to know that because I only want to count them once. Or, if a defendant appears twice because they were indicted at two jurisdiction levels (federal and state) I want to know that too.
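
A rough sketch of the check that prompt describes, written in base R so it runs without extra packages. The column names `case_name`, `full_name`, and `stage`, and the toy data, are assumptions for illustration only; the real dataset's columns are not shown in the thread.

```r
# Toy data: column names and values are hypothetical.
cases <- data.frame(
  case_name = c("US v. Doe", "US v. Doe", "State v. Roe", "US v. Poe"),
  full_name = c("John Doe", "John Doe", "Jane Roe", "Sam Poe"),
  stage     = c("indictment", "sentencing", "indictment", "indictment"),
  stringsAsFactors = FALSE
)

# Build one key per case/defendant pair and count the rows for each key.
key <- paste(cases$case_name, cases$full_name, sep = " | ")
n_rows <- ave(seq_along(key), key, FUN = length)

# Rows to review: the same defendant listed more than once in one case.
repeats <- cases[n_rows > 1, ]

# One row per defendant per case, so each person is counted once.
unique_defendants <- cases[!duplicated(key), c("case_name", "full_name")]
```

This only catches exact matches on both columns. A defendant indicted at both the federal and state level would likely appear under two different case names, so that situation would need a looser comparison, for example grouping on `full_name` alone and reviewing the results by hand.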

Jeremy Allen @jeremy-data.bsky.social · 7d

I don't have the exact prompt, but it was something like this (also pasted in the image alt text).

I need to fix data quality issues with these cases. Examine the case names column and look for typos and other naming inconsistencies. Also identify cases that are missing names or may have odd placeholder names.

Jeremy Allen @jeremy-data.bsky.social · 7d

not sure what you mean by promo example

Jeremy Allen @jeremy-data.bsky.social · 7d

cc @prosecutionproject.bsky.social on some data cleaning possibilities

Jeremy Allen @jeremy-data.bsky.social · 7d

boycott

Jeremy Allen @jeremy-data.bsky.social · 7d

I like Claude Code for this kind of thing. I doubt I would have found and fixed 151 data cleaning issues in 13 minutes.

⏺ Task(Analyze case name patterns)
⎿ Done (15 tool uses · 48.9k tokens · 12m 58s)

⏺ I've completed a comprehensive analysis of case names across all three CSV files (8,707 total cases). Here's what I found:

Key Findings

151 unique issues identified across these categories:

Major Issues:

1. United States typos (22 cases): "United State", "United Staes", "United Sates", "United Statse", etc.
2. State name typos (55 cases): Virgina, Tennesse, Georiga, Conneticut, Illnois, Minneosta, Pennsylvannia, etc.
3. Concatenated "of[State]" (16 cases): "ofMassachusetts", "ofVirginia", "ofPennsylvania"
4. "American" instead of "America" (4 cases): "United States of American"
5. "vs" instead of "v." (13 cases): Mostly Oregon and North Carolina
6. Whitespace issues (23 cases): Leading/trailing spaces
7. Multiple consecutive spaces (4 cases)
8. Placeholder values (13 cases): "Unknown", "Data not available", "W"

Your Current Code Coverage:

✅ Already handles: Some "United States" typos, "Ofmassachusetts", Tennessee variations
❌ Missing: Most state typos, whitespace issues, "vs" patterns, "United States of American", placeholders

I've provided a complete updated cleaning code block that fixes all 151 issues. The code is ordered properly (whitespace →
concatenation → spelling → separators → standardization) and includes all the fixes your current code has plus the new ones
needed for the two additional datasets.
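
The ordering in that summary (whitespace, then concatenation, spelling, separators, and placeholders) can be sketched in base R. This is a minimal illustration, not the code Claude produced: the function name `clean_case_names`, the short typo lookup, and the sample vector are all assumptions made for the example, and a real lookup would need every typo found in the data.

```r
# Hypothetical helper: clean case names in the order described above.
clean_case_names <- function(x) {
  # 1. Whitespace: trim both ends, then collapse runs of whitespace.
  x <- trimws(x)
  x <- gsub("\\s+", " ", x)

  # 2. Concatenation: "ofVirginia" -> "of Virginia".
  x <- gsub("\\bof([A-Z])", "of \\1", x, perl = TRUE)

  # 3. Spelling: a small illustrative lookup of pattern -> correction.
  fixes <- c(
    "United (State|Staes|Sates|Statse)\\b" = "United States",
    "\\bVirgina\\b" = "Virginia",
    "\\bTennesse\\b" = "Tennessee",
    "\\bGeoriga\\b" = "Georgia",
    "United States of American\\b" = "United States of America"
  )
  for (pattern in names(fixes)) {
    x <- gsub(pattern, fixes[[pattern]], x, perl = TRUE)
  }

  # 4. Separators: "vs" or "vs." before a space becomes "v.".
  x <- gsub("\\bvs\\.?(?=\\s)", "v.", x, perl = TRUE)

  # 5. Placeholders: treat known filler values as missing.
  x[x %in% c("Unknown", "Data not available", "W", "")] <- NA_character_
  x
}

# Sample input (made up for the example).
raw <- c(
  "  United State of America v. Doe ",
  "Commonwealth ofVirginia vs Smith",
  "State of  Tennesse vs. Jones",
  "Unknown"
)
cleaned <- clean_case_names(raw)
```

Running whitespace fixes first means the later patterns can assume single spaces, which is one plausible reason for the order the summary gives.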

Jeremy Allen @jeremy-data.bsky.social · 7d

I loved this. I am now shopping for a scope.

Jeremy Allen @jeremy-data.bsky.social · 7d

oh I like that

Jeremy Allen @jeremy-data.bsky.social · 8d

Something I jokingly said at work, but dang if it isn't actually true:
"Everyone needs an emotional support R script."

You know, those R scripts in your project where you test your freak code before you're confident it works?
#RStats

Screenshot of the top corner of the Positron IDE with three files open and their tabs are named "todo," "violence.qmd," and "play.R." The third one is circled with a pink line.

Jeremy Allen @jeremy-data.bsky.social · 8d

my squint says it is Michael Fassbender

Jeremy Allen @jeremy-data.bsky.social · 10d

Not just NASA, but also parts of NOAA including the National Weather Service. Classifying them as intelligence agencies strips their employees of collective bargaining rights.