Jeremy Allen
@jeremy-data.bsky.social
5.1K followers 4.3K following 1.1K posts
I help @posit.co deliver the best R and Python data science solutions to the public sector in the US and elsewhere. Fan of cats coffee whiskey tea photography philosophy Atlanta - personal account. https://18chains.com/
Pinned
jeremy-data.bsky.social
Work peeps convinced me to share my bird photographs on a blog. Luckily, I had a long-dead blog I could dust off. I added a new portfolio just for birding. My older portfolios are still there, too. My rstats content is still elsewhere. This is all personal and photo stuff. 18chains.com
18 Chains
Photographs and thoughts on being, by Jeremy Allen. Sometimes whiskey, cats, and rstats.
18chains.com
Reposted by Jeremy Allen
sarahjohnson.io
I was loving Claude Code... until I tried it with #rstats. Constant errors, wouldn't use the tidyverse even when asked, "optimized" functions were slower.

Frustrated, I started a session just to teach R to Claude and summarize what it learned into a CLAUDE.md file gist.github.com/sj-io/3828d6...
Claude R Tidyverse Expert
gist.github.com
Reposted by Jeremy Allen
emilhvitfeldt.bsky.social
Excited to watch the webinar on "Modular, interoperable, and extensible topological data analysis in R" next week by Jason Cory Brunson and Aymeric Stamm

r-consortium.org/webinars/mod...
#tidymodels
Modular, interoperable, and extensible topological data analysis in R – R Consortium
r-consortium.org
jeremy-data.bsky.social
New mission: "Posit’s mission is to create free and open source software for data science, scientific research, and technical communication EVEN IF IT HURTS" - Hadley 2025
jeremy-data.bsky.social
And nearby you will see a headline about how some seemingly helpless group of people are left concerned by what they are experiencing
jeremy-data.bsky.social
I am now data bald on top and data grey in the beard
jeremy-data.bsky.social
do you even spelling bee?
bee <- function(max_vowels = 2) {
  vowels <- c("a", "e", "i", "o", "u")
  seven <- sample(letters, 7, replace = FALSE)
  num_vowels <- length(intersect(seven, vowels))
  
  # Redraw until the seven letters include at least one vowel and at most max_vowels
  while (num_vowels == 0 || num_vowels > max_vowels) {
    seven <- sample(letters, 7, replace = FALSE)
    num_vowels <- length(intersect(seven, vowels))
  }
  # Lay the seven letters out in a honeycomb-style grid
  pattern <- "\n\n   %s\n%s     %s\n   %s\n%s     %s\n   %s\n\n"
  cat(sprintf(pattern, seven[1], seven[2], seven[3], seven[4], 
              seven[5], seven[6], seven[7]))
}

bee()
jeremy-data.bsky.social
do it well and you can earn venure, or vibenure, or tibe, or whatever
jeremy-data.bsky.social
I followed that up with this prompt and got really good results.
Look at the case names column and the full name column. Tell me where a defendant is listed more than once for the same case but at different times in the same case. For example, if a shooter is recorded here twice, once for their indictment once for their sentencing, I want to know that because I only want to count them once. Or, if a defendant appears twice because they were indicted at two jurisdiction levels (federal and state) I want to know that too.
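A minimal base R sketch of the check that prompt asks for, not the code Claude actually produced. The data frame, its column names (`case_name`, `full_name`, `stage`), and the rows are all made up for illustration; the post does not show the real data.

```r
# Hypothetical sample data; the real columns and values are not shown in the post.
cases <- data.frame(
  case_name = c("United States v. Doe", "United States v. Doe",
                "State v. Roe", "United States v. Roe"),
  full_name = c("John Doe", "John Doe", "Jane Roe", "Jane Roe"),
  stage     = c("indictment", "sentencing", "indictment", "indictment"),
  stringsAsFactors = FALSE
)

# Same defendant recorded more than once within the same case
# (for example, once at indictment and once at sentencing).
key <- paste(cases$case_name, cases$full_name, sep = " | ")
rows_per_key <- ave(seq_along(key), key, FUN = length)
repeated <- cases[rows_per_key > 1, ]

# Keep one row per defendant per case so each is counted once.
counted_once <- cases[!duplicated(key), ]

# Same defendant under more than one case name
# (for example, indicted at both the federal and the state level).
cases_per_name <- tapply(cases$case_name, cases$full_name,
                         function(z) length(unique(z)))
multi_case <- names(cases_per_name)[cases_per_name > 1]

print(repeated)
print(multi_case)
```

Exact matching like this only works after the names are cleaned; a typo in either column would hide a duplicate.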
jeremy-data.bsky.social
I don't have the exact prompt, but it was something like this (also pasted in the image alt text).
I need to fix data quality issues with these cases. Examine the case names column and look for typos and other naming inconsistencies. Also identify cases that are missing names or may have odd placeholder names.
jeremy-data.bsky.social
not sure what you mean by promo example
jeremy-data.bsky.social
I like Claude Code for this kind of thing. I doubt I would have found and fixed 151 data cleaning issues in 13 minutes.
⏺ Task(Analyze case name patterns)
  ⎿  Done (15 tool uses · 48.9k tokens · 12m 58s)                                                                            

⏺ I've completed a comprehensive analysis of case names across all three CSV files (8,707 total cases). Here's what I found:

  Key Findings

  151 unique issues identified across these categories:

  Major Issues:

  1. United States typos (22 cases): "United State", "United Staes", "United Sates", "United Statse", etc.
  2. State name typos (55 cases): Virgina, Tennesse, Georiga, Conneticut, Illnois, Minneosta, Pennsylvannia, etc.
  3. Concatenated "of[State]" (16 cases): "ofMassachusetts", "ofVirginia", "ofPennsylvania"
  4. "American" instead of "America" (4 cases): "United States of American"
  5. "vs" instead of "v." (13 cases): Mostly Oregon and North Carolina
  6. Whitespace issues (23 cases): Leading/trailing spaces
  7. Multiple consecutive spaces (4 cases)
  8. Placeholder values (13 cases): "Unknown", "Data not available", "W"

  Your Current Code Coverage:

  ✅ Already handles: Some "United States" typos, "Ofmassachusetts", Tennessee variations
  ❌ Missing: Most state typos, whitespace issues, "vs" patterns, "United States of American", placeholders

  I've provided a complete updated cleaning code block that fixes all 151 issues. The code is ordered properly (whitespace →
  concatenation → spelling → separators → standardization) and includes all the fixes your current code has plus the new ones
  needed for the two additional datasets.
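The ordering described above (whitespace → concatenation → spelling → separators → standardization) can be sketched in base R roughly like this. It is not the code Claude generated, which the post does not show. The sample strings, the `spelling_fixes` lookup, and the function name `clean_case_names` are assumptions for illustration, and treating placeholders as `NA` is one guess at what the "standardization" step covers.

```r
# Hypothetical inputs modeled on the issue categories listed above.
case_names <- c(
  "  United Staes of American v. Smith ",
  "Commonwealth ofMassachusetts vs Jones",
  "State of Virgina  v.  Doe",
  "Unknown"
)

# Illustrative lookup only; a real one would cover every typo found.
# Entries are applied in order, so later patterns see earlier fixes.
spelling_fixes <- c(
  "United (State|Staes|Sates|Statse)\\b" = "United States",
  "United States of American\\b"         = "United States of America",
  "\\bVirgina\\b"                        = "Virginia"
)
placeholders <- c("Unknown", "Data not available", "W")

clean_case_names <- function(x) {
  # 1. Whitespace: trim both ends, collapse runs of spaces.
  x <- gsub("\\s+", " ", trimws(x), perl = TRUE)
  # 2. Concatenation: "ofMassachusetts" becomes "of Massachusetts".
  x <- gsub("\\bof(?=[A-Z])", "of ", x, perl = TRUE)
  # 3. Spelling: apply each fix from the lookup in turn.
  for (i in seq_along(spelling_fixes)) {
    x <- gsub(names(spelling_fixes)[i], spelling_fixes[[i]], x, perl = TRUE)
  }
  # 4. Separators: "vs" or "vs." becomes "v.".
  x <- gsub("\\bvs\\.?(?=\\s)", "v.", x, perl = TRUE)
  # 5. Placeholders: treat them as missing.
  x[x %in% placeholders] <- NA_character_
  x
}

print(clean_case_names(case_names))
```

The order matters: whitespace and concatenation fixes run first so the spelling patterns can match on predictable word boundaries.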
jeremy-data.bsky.social
I loved this. I am now shopping for a scope.
jeremy-data.bsky.social
Something I jokingly said at work, but dang if it isn't actually true:
"Everyone needs an emotional support R script."

You know, those R scripts in your project where you test your freak code before you're confident it works?
#RStats
Screenshot of the top corner of the Positron IDE with three files open and their tabs are named "todo," "violence.qmd," and "play.R." The third one is circled with a pink line.
jeremy-data.bsky.social
my squint says it is Michael Fassbender
jeremy-data.bsky.social
As a toddler I yearned to spend my brightest years parsing date strings
jeremy-data.bsky.social
Yeah it was quite a scene
jeremy-data.bsky.social
New living room at home.

jk

Antique store
Man sitting on a gold-patterned vintage sofa inside an antique store, surrounded by ornate lamps, framed artwork, glass displays, vintage clothing, and eclectic decorative objects. The room is warmly lit with a cozy, maximalist atmosphere
jeremy-data.bsky.social
Data tables were different in 1870. This is the Dept. of Agriculture’s 1870 report.
Interior page from the Report of the Commissioner of Agriculture for 1870 showing a statistical table of agricultural products exported from the United States. The table lists categories such as animal matter, leather, wool, breadstuffs, cotton, wood products, ashes, and fruits, along with quantities and values in pounds, bushels, barrels, cords, and dollars. Page heading reads ‘Report of the Statistician,’ page number 63 visible in the top right.
Title page of the Report of the Commissioner of Agriculture for the Year 1870, published by the Government Printing Office in Washington, 1871. The page has centered black text on aged paper with some slight discoloration along the edges.