Collin Berke
@collinberke.bsky.social
110 followers 220 following 180 posts
Media Research Analyst | #rstats | data enthusiast | news, sports, and podcast aficionado
Website: https://www.collinberke.com/
GitHub: https://github.com/collinberke
LinkedIn: https://www.linkedin.com/in/collinberke/
Pinned
collinberke.bsky.social
Hi, I'm Collin! 👋

Here are a few things about me (in no particular order):
- 8-year media research analyst (i.e., a data analyst)
- #rstats user & developer | #datavis enthusiast
- dslc.io member
- Likely fixing my #neovim config
- Sporadic blogger
- Occasional university course instructor
collinberke.bsky.social
An option for 24-hour time formatting as well!

Yeah, we're going to turn that configuration on right now.
collinberke.bsky.social
TIL: Microsoft Teams can be configured to use ISO 8601 date formatting.

I don't know why this makes me so excited. Perhaps it's the analyst in me who's had to wrestle with and wrangle various date formats in the past ...

Obligatory @xkcd.com link: xkcd.com/1179/

#dataBS
ISO 8601
xkcd.com
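For anyone else wrangling dates in #rstats, here's a minimal base R sketch: parse a US-style date string and write it back out as ISO 8601, then format a date-time in 24-hour time. The input strings are made-up examples, and the date-time is pinned to UTC so the output doesn't depend on your local time zone.

```r
# Hypothetical US-style date string (month/day/year)
us_date <- "10/16/2025"

# Parse with an explicit format, then write it out as ISO 8601 (YYYY-MM-DD)
parsed <- as.Date(us_date, format = "%m/%d/%Y")
iso_date <- format(parsed, "%Y-%m-%d")
print(iso_date)

# Date-times: %H is the 24-hour clock; tz = "UTC" keeps the result reproducible
dt <- as.POSIXct("2025-10-16 16:00:00", tz = "UTC")
iso_datetime <- format(dt, "%Y-%m-%dT%H:%M:%S")
time_24h <- format(dt, "%H:%M")
print(iso_datetime)
print(time_24h)
```

Once a value is a `Date`, `as.character()` already gives you ISO 8601, since that's R's default date format.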
collinberke.bsky.social
Note to self: It's been a while since I've needed to use back indexing to return a single row of data in #RStats.

So, here's a Base R and dplyr refresher 👇

#dataBS
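A minimal sketch of what that refresher might cover, assuming "back indexing" means selecting a row counted from the end of a data frame (e.g., the last row). It uses the built-in `mtcars` data purely for illustration, and the dplyr part only runs if dplyr is installed.

```r
df <- mtcars  # built-in data set, used only for illustration

# Base R: use nrow() to index from the back
last_row <- df[nrow(df), ]

# Offset from the end to get, e.g., the second-to-last row
second_to_last <- df[nrow(df) - 1, ]

# tail() wraps the same idea: return the last n rows
last_row_tail <- tail(df, n = 1)

# Careful: a negative index in R *drops* rows rather than counting from the end
without_last <- df[-nrow(df), ]

# dplyr equivalent (only if dplyr is installed)
if (requireNamespace("dplyr", quietly = TRUE)) {
  last_row_dplyr <- dplyr::slice_tail(df, n = 1)
  print(last_row_dplyr)
}

print(last_row)
```

The negative-index gotcha is the one that tends to bite when coming back to R after a while: `df[-1, ]` removes the first row instead of returning the last one.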
collinberke.bsky.social
If you're still following this thread, I highly suggest checking out the full post: www.counting-stuff.com/data-cleanin....

It's full of other great takeaways.

9/9
collinberke.bsky.social
"Unless you know what people are going to use the data for, you won’t know. Even if you know, you’d still must guess at what is likely to help their analysis and not hurt it. By far the most important thing to do is to fully document every cleaning decision ..."

8/9
collinberke.bsky.social
I'm a big fan of the practical advice at the end:
* Save a copy of the original data; avoid permanent changes.
* Leave a paper trail.
* Fix what's needed to get your analysis to work.
* Reduce unwanted variation.
* Eliminate bias where possible.

7/9
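The first two tips (keep the original, leave a paper trail) translate pretty directly into code. Here's a minimal base R sketch, assuming a small made-up data frame and a hypothetical `log_step()` helper that records each cleaning decision along with how many rows it touched.

```r
# Made-up raw data standing in for an imported data set
raw <- data.frame(
  id = 1:4,
  state = c("NE", "ne ", " NE", NA),
  stringsAsFactors = FALSE
)

# Tip 1: work on a copy so `raw` is never modified
clean <- raw

# Tip 2: a paper trail, one row per cleaning decision
cleaning_log <- data.frame(
  step = character(),
  rows_affected = integer(),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append a step and the number of rows it changed
log_step <- function(log, step, rows_affected) {
  rbind(log, data.frame(step = step, rows_affected = rows_affected,
                        stringsAsFactors = FALSE))
}

# Step 1: trim stray whitespace (NAs are left alone and not counted)
changed <- !is.na(clean$state) & clean$state != trimws(clean$state)
clean$state <- trimws(clean$state)
cleaning_log <- log_step(cleaning_log, "trim whitespace in state", sum(changed))

# Step 2: standardize case
changed <- !is.na(clean$state) & clean$state != toupper(clean$state)
clean$state <- toupper(clean$state)
cleaning_log <- log_step(cleaning_log, "uppercase state", sum(changed))

print(cleaning_log)
```

Because `raw` stays untouched, any step can be revisited later, and `cleaning_log` doubles as documentation of every decision.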
collinberke.bsky.social
Data practitioners aren't safe from change: we work in environments built on modern distributed-systems infrastructure, and those environments keep shifting. So, cleaning operations will need to be revisited constantly.

6/9
collinberke.bsky.social
"The problem with automation is that it can divorce understanding the underlying data from the analysis and interpretation of the data."

5/9
collinberke.bsky.social
"If you delegate that responsibility [data cleaning] to someone else, whether it’s another human or a machine, you put yourself at risk of doing something dangerous with your data."

4/9
collinberke.bsky.social
"The real reason for all this cleaning work is the signal-to-noise ratio in raw data is too poor for [the] purpose we intend. We need to improve data quality to amp the signal for our tools to find."

3/9
collinberke.bsky.social
"We’re doing cleaning because we want to extract the useful signal from the noise, and we decide certain bits of noise “correctable” at the data point level for that purpose."

2/9
collinberke.bsky.social
During the #dataBS Conf this week, I was introduced to @randyau.com's 'Data Cleaning IS Analysis, Not Grunt Work' post (www.counting-stuff.com/data-cleanin...). I just finished it, and it was a great read.

Here are some quotes and thoughts I'm walking away with 👇

1/9 #RStats
Data Cleaning IS Analysis, Not Grunt Work
Also, most data cleaning articles suck
www.counting-stuff.com
collinberke.bsky.social
Probably should have mentioned the date and time:

October 16, 2025, 9am PT / 12pm ET
collinberke.bsky.social
Just registered for the @rconsortium.bsky.social's 'How to use `pointblank` to understand, validate, and document your data' online workshop. I'm looking forward to learning more from @richmeister.bsky.social.

Sign up here: r-consortium.org/webinars/how...

#RStats #DataBS
How to use pointblank to understand, validate, and document your data – R Consortium
r-consortium.org
collinberke.bsky.social
Perhaps to give the user the option to write their own custom handling behavior? 🤷‍♂️
collinberke.bsky.social
The ‘just work without error’ design decisions are something I appreciate from time to time.

I’ll also never stop loving discovering the little problems base R has already solved. You just have to explore a little bit.
collinberke.bsky.social
TIL: #Rstats base::system.file() has a `mustWork` argument. Set it to TRUE and it throws an error if the file isn't found, instead of quietly returning an empty string. Useful.
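A quick sketch of the difference. By default, `system.file()` returns an empty string (`""`) when nothing matches; with `mustWork = TRUE` it signals an error instead. The file name `not-a-real-file.txt` is made up for illustration.

```r
# An existing file: every installed package, including base, ships a DESCRIPTION file
found <- system.file("DESCRIPTION", package = "base")

# Default behavior: a missing file quietly returns ""
missing_default <- system.file("not-a-real-file.txt", package = "base")

# mustWork = TRUE turns the silent "" into an error, caught here for inspection
strict_result <- tryCatch(
  system.file("not-a-real-file.txt", package = "base", mustWork = TRUE),
  error = function(e) e
)
if (inherits(strict_result, "error")) {
  message("mustWork = TRUE raised: ", conditionMessage(strict_result))
}

# Custom handling with the default: check for the empty string yourself
if (!nzchar(missing_default)) {
  message("File not found; falling back to a default.")
}
```

The quiet `""` default is what leaves room for writing your own handling behavior, while `mustWork = TRUE` is the fail-fast option.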
collinberke.bsky.social
I often go beyond 80 characters per line ...

The world continues to turn.
collinberke.bsky.social
Great resource! 👀
Thanks, @emilhvitfeldt.bsky.social!
emilhvitfeldt.bsky.social
I'm excited to announce a new resource about making slides with Quarto and revealjs. This book is the combination of all the work I have done in this area, reordered and polished up.

There isn't a lot of new information yet, but this format allows me to add more easily.

slidecrafting-book.com
#quarto
Screenshot of first page of slidecrafting-book.com website
Reposted by Collin Berke
thomasp85.com
I am beyond excited to announce that ggplot2 4.0.0 has just landed on CRAN.

It's not every day we have a new major #ggplot2 release but it is a fitting 18 year birthday present for the package.

Get an overview of the release in this blog post and be on the lookout for more in-depth posts #rstats
ggplot2 4.0.0
A new major version of ggplot2 has been released on CRAN. Find out what is new here.
www.tidyverse.org
collinberke.bsky.social
Clearly, I don't know how link shortening works on Bluesky. If anyone has trouble opening the links above, just send me a DM and I'll point you in the right direction.
collinberke.bsky.social
Seeing as you have marketing listed in your description, @drsundar.bsky.social, you might find this dataset useful: developers.google.com/analytics/bi...

I also wrote a blog post on this topic. Check it out: www.collinberke.com/blog/posts/2....
developers.google.com
collinberke.bsky.social
Yes, of course! BigQuery is a cloud-based data warehouse: cloud.google.com/bigquery. bigrquery is an #rstats package that bridges BigQuery and R (e.g., it lets you download data stored in the warehouse into R). BigQuery also has public datasets to experiment with: cloud.google.com/bigquery/pub...
BigQuery | AI data platform | Lakehouse | EDW
BigQuery is the autonomous data and AI platform, automating the entire data lifecycle so you can go from data to AI to action faster.
cloud.google.com