Gordon Forbes
@gforb.bsky.social
98 followers 100 following 55 posts
I love Netflix for their data science blog and The BBC for their ggplot2 resources.
gforb.bsky.social
How is AI going to make this all easier?

And relatedly, how can we stop the increase in productivity from AI leading to an overwhelming amount of methodologically questionable research?
gforb.bsky.social
To me, this is what age as categorical means, e.g. age 0-18, 19-65, 65+.

I would describe using individual integer ages (i.e. 1, 2, 3, 4, 5, 6, ...) as discrete age.

The 'big category' approach is used worryingly often - sometimes due to restrictions on the data.
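A rough sketch of the distinction, in Python. The band edges come from the example above; the function name, the labels, and the choice to put age 65 in the middle band are assumptions made for illustration (the example lists 65 in two bands).

```python
from bisect import bisect_left

# Inclusive upper bounds of the first two bands (assumed: 65 goes in the middle band).
BAND_UPPER_BOUNDS = [18, 65]
BAND_LABELS = ["0-18", "19-65", "66+"]


def age_band(age: int) -> str:
    """Map an integer (discrete) age to one of three coarse categories."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_left returns the index of the first upper bound that is >= age,
    # or len(BAND_UPPER_BOUNDS) when age is above every bound.
    return BAND_LABELS[bisect_left(BAND_UPPER_BOUNDS, age)]


# Discrete ages keep 19 and 65 distinct; the categorical version cannot.
discrete_ages = [3, 18, 19, 65, 66, 90]
categorical_ages = [age_band(a) for a in discrete_ages]
```

Ages 19 and 65 end up in the same band, which is the information a model loses when only the 'big category' version is available.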
Reposted by Gordon Forbes
gdeejay.bsky.social
Hey #rstats,

What's your rule for splitting R scripts that form part of a wider analysis pipeline / project?

I usually write a single script which includes sections for each step from data cleaning to the final results, but it can become unwieldy when the script becomes long.
...
Reposted by Gordon Forbes
gforb.bsky.social
On tabular health data, time and time again, I see linear (or generalised linear) models perform as well as or better than machine learning algorithms that avoid linearity assumptions.

I am surprised by this, as the linearity assumption is unlikely to be true.

Does anyone else see this? Why is this?
The optimal machine learning model was linear regression
Reposted by Gordon Forbes
dingdingpeng.the100.ci
Georgi Baklicharov asks: can treatment effect testing in trials with intercurrent events be nearly assumption-free? #EuroCIM2025
gforb.bsky.social
Hard disagree! It is acceptable and probably preferable for a group to write a paper without understanding the details of each other's work.

e.g. I want to be able to write "the model was estimated with restricted maximum likelihood" in a stats section without explaining REML to my collaborators.
Reposted by Gordon Forbes
rmkubinec.bsky.social
Happy to see that ordered beta regression reached 100 citations on Google Scholar!

The model has citations from work in climate science, ecology, medicine, psychology, & political science, just to name a few.

Thanks to all of you for using ordbetareg (or glmmTMB)!!

#rstats
Reposted by Gordon Forbes
dingdingpeng.the100.ci
I'm working with data that I'm not allowed to share -- I'd like to generate synthetic data so that others at least have something that they can run my code on!

Any pointers to tutorials, favorite packages etc.?
ALT: a man wearing a mask is playing a keyboard with the words love synths written above him
Reposted by Gordon Forbes
juliasilge.com
Check out my new screencast, where I walk through how I use #Positron for #rstats package development work. I decided to release a new version of an R package to CRAN ✨live✨ this time around!

youtu.be/uL3NZQIMrpk
Release an R package with Positron
YouTube video by Julia Silge
Reposted by Gordon Forbes
matloff.bsky.social
In coursework, the contrast between ridge regression and the LASSO is really emphasized. After all, the latter actually does feature selection, by virtue of having a sparse solution to the minimum l1 problem, versus ridge's nonsparse solution in l2, pretty cool. 1/2
gforb.bsky.social
I totally agree that, theoretically, it makes no sense to shrink parameters to zero.

In practice, if it means a model can be applied without collecting mostly irrelevant data, this can be a huge win. Especially in health when that extra data can involve invasive or expensive tests.
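A minimal sketch of why the LASSO can drop a predictor entirely while ridge cannot. It assumes the textbook special case of an orthonormal design, where each penalised coefficient is a simple function of its least-squares estimate `b`. The function names and the penalty scaling (stated in the comments) are illustrative assumptions, not anything from the posts above.

```python
def ridge_shrink(b: float, lam: float) -> float:
    """Ridge coefficient under an orthonormal design.

    Assumed objective: 0.5 * ||y - X beta||^2 + 0.5 * lam * ||beta||_2^2.
    The estimate is scaled towards zero but never reaches it unless b is zero.
    """
    return b / (1.0 + lam)


def lasso_shrink(b: float, lam: float) -> float:
    """LASSO coefficient under an orthonormal design (soft-thresholding).

    Assumed objective: 0.5 * ||y - X beta||^2 + lam * ||beta||_1.
    Any estimate with |b| <= lam is set exactly to zero.
    """
    magnitude = max(abs(b) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0
    return magnitude if b > 0 else -magnitude


# Hypothetical least-squares estimates for three predictors.
ols = [3.0, -2.0, 0.3]
ridge = [ridge_shrink(b, 0.5) for b in ols]
lasso = [lasso_shrink(b, 0.5) for b in ols]
# The third LASSO coefficient is exactly zero, so that variable would not
# need to be collected to apply the model; the ridge coefficient is small
# but non-zero, so it would.
```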
gforb.bsky.social
R packages for CONSORT diagrams
cghlewis.bsky.social
Lots of Word Doc templates out there if you just search "consort diagram template".

Also, several R packages to help develop these.
www.riinu.me/2024/02/cons...
Reposted by Gordon Forbes
statsepi.bsky.social
The term "digital twin" as it is now used in medicine has no real relationship to how the term is used in engineering. Yet every paper on the former talks about the success of the latter as if that's relevant. They are not the same!
gforb.bsky.social
My spell check is trying to change bootstrap to Boomer.

As in Boomer p-values.

I didn't know the generation wars had made it to statistical inference. What next? Millennial credible intervals?
gforb.bsky.social
Surely, do what is computationally feasible.

For a random forest on a medium-sized data set, there is no reason not to use a resampling approach or cross-validation.

If you have just developed an LLM using data scraped from the whole Internet, you are not going to be running cross-validation.
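To make the cost argument concrete, here is a bare-bones sketch of k-fold splitting in Python. The function name and the seed are assumptions for illustration. The point is only that k folds mean k separate model fits, which is cheap for a random forest on a medium-sized data set and out of reach for a model that takes weeks to train once.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation on n rows."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each of the 5 iterations would require fitting the model once on `train`
# and scoring it on `test`.
splits = list(k_fold_indices(n=10, k=5))
```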
Reposted by Gordon Forbes
statsepi.bsky.social
I think you mean "non-specific".
gforb.bsky.social
When I've worked with REDCap databases I've always had to rely on a data manager pulling an extract and emailing it to me.

This could be a game changer.
Reposted by Gordon Forbes
timpmorris.bsky.social
A minor stylistic preference I’ve recently found myself using: When introducing a key initialism or acronym in a paper, put the compressed version in the text and its expansion in parentheses.

Instead of ‘under missing at random (MAR)’, use ‘under MAR (missing at random)’.
1/
Reposted by Gordon Forbes
hadley.nz
What advice do folks have for organising projects that will be deployed to production? How do you organise your directories? What do you do if you're deploying multiple "things" (e.g. an app and an api) from the same project?