Scott James Perry
@sjperry.bsky.social
SSHRC postdoctoral fellow at Universitat Pompeu Fabra. Phonetician interested in spontaneous speech, bilingualism, and lexical processing. Big fan of quantitative methods and properly extracted coffee!
I was hired to analyze some data a while back and the professors were surprised at the end when I sent a folder with all of the scripts + a README file. Apparently the consultant before me never shared scripts because he said "then the next time you'll just use the script and not have to hire me".
September 23, 2025 at 3:16 PM
The numbers on sharing these things later "upon reasonable request" are pretty bleak, so if you don't share it publicly in a way that is linked with the paper somehow, that information is/will be gone forever. If you can't share the evidence for a claim, the claims made by the paper are weaker.
July 16, 2025 at 1:52 PM
I always go through scripts shared during peer review and make sure they are clear and run properly, and I check them for common statistical errors. I have found everything from silly mistakes in coding to serious errors in data cleaning that invalidated the claims of the paper.
July 16, 2025 at 1:50 PM
b) Unless there is a valid reason for the data NOT to be shared (and there are several), sharing should be the default, because it's hard to verify what was done with a script and no data. Or at least it's harder to tell if what was done was reasonable.
July 16, 2025 at 1:47 PM
IMO data cleaning scripts also need to be in the open materials, although I've had spirited discussions about this with co-authors before who think that only the bare minimum code to replicate the numbers in the paper needs to be shared.
July 16, 2025 at 1:46 PM
a) It stands as a record of what you did, including details that will never make it into a paper. As analyses get larger and more unwieldy, fewer details can be reported in papers, which sometimes have strict word limits.
July 16, 2025 at 1:45 PM
An exception: you want cheddar in Spain and all the locally-made stuff is white and the Kerrygold is yellow.
July 1, 2025 at 5:59 AM
I honestly think that I've had to cite this paper in more than half of the reviews I've ever done. At least in linguistics, it seems to be a super common misinterpretation.
June 3, 2025 at 7:13 AM
Yeah, I taught a class in a new department during the pandemic when masks were mandatory. When the mandate lifted and they all stopped wearing them, I realized that I couldn't recognize most students because I had filled in the lower half of their faces and was not accurate even once.
May 21, 2025 at 8:34 AM
To be fair I built and troubleshot the model/priors on simulated data a few months ago before we built the survey, but I always feel more comfortable when something goes wrong and I dig around to fix it. Now I feel like a cartoon character that's walked off a cliff but hasn't looked down yet.
May 20, 2025 at 12:59 PM
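A rough sketch of that "simulate first, fit later" workflow in brms, with placeholder priors, model structure, and variable names (not the actual survey model): generate data with known parameters, fit the planned model, and check that the posteriors recover them before any real data exist.

```r
# Sketch only: placeholder model, priors, and variable names.
library(brms)

set.seed(123)
n_subj  <- 60
n_items <- 20
d <- expand.grid(subj = factor(1:n_subj), item = factor(1:n_items))
d$group <- ifelse(as.integer(d$subj) <= n_subj / 2, -0.5, 0.5)  # sum-coded predictor

# "True" parameters used to generate the responses
b0 <- 0.5
b1 <- 0.8
subj_int <- rnorm(n_subj, 0, 0.5)
item_int <- rnorm(n_items, 0, 0.3)
eta <- b0 + b1 * d$group +
  subj_int[as.integer(d$subj)] + item_int[as.integer(d$item)]
d$resp <- rbinom(nrow(d), 1, plogis(eta))

fit <- brm(
  resp ~ group + (1 | subj) + (1 | item),
  data   = d,
  family = bernoulli(),
  prior  = c(prior(normal(0, 1), class = Intercept),
             prior(normal(0, 1), class = b),
             prior(exponential(1), class = sd)),
  chains = 4, cores = 4, seed = 123
)
summary(fit)  # do the posteriors recover b0 = 0.5 and b1 = 0.8?
```

If the known parameters don't come back, the problem is in the model or the priors rather than the data, which is exactly the kind of thing you want to find out before the survey runs.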
I'm still very novice in terms of Stan code (simple models I can code up - hierarchical models kick my butt), but if your target audience is brms people looking to learn more Stan then that's me and I'd be happy to work through drafts and give feedback.
May 20, 2025 at 12:38 PM
Tagging @dingdingpeng.the100.ci and @vincentab.bsky.social because I told them I'd keep them in the loop. Also, I'll be writing up the code and examples used in this class into a blog post directed towards linguists who want to start using the package marginaleffects!
April 11, 2025 at 1:45 PM
For the few it didn't work for, I'm trying to keep in mind that the previous version had been honed for 4 years, while this was my first time teaching this. I'm sure I have lots of room for improvement in walking others through this material. Hopefully this can mitigate the confusion in future years.
April 11, 2025 at 1:42 PM
Overall, I think I will pivot to teaching marginaleffects for model interpretation. It seems like it was positive for many students, neutral for quite a few, and potentially discouraging for a small minority.
April 11, 2025 at 1:42 PM
A couple students said they understand LMMs and GLMs worse than before the course. This was hard to read, because I want to motivate students to learn stats, not discourage them. I'm hoping this is a "I'm confused because I'm paying attention" thing, but maybe this change didn't work for everyone.
April 11, 2025 at 1:42 PM
We also used non-aggregated predictions and comparisons to help us understand what is actually happening in a mixed-effects model (i.e., varying intercepts and slopes). Of those who already had experience with these, more than half said that doing this made them understand the models better.
April 11, 2025 at 1:41 PM
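A minimal sketch of what that looks like, using lme4's sleepstudy data rather than the course materials: non-aggregated predictions() and comparisons() keep one row per observation (or per subject), so the varying intercepts and slopes stay visible instead of being averaged away.

```r
library(lme4)
library(marginaleffects)

# Classic varying-intercept, varying-slope model
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# One prediction per observation, including the subject-level random effects
p <- predictions(m, re.form = NULL)
head(p)

# One slope estimate for Days per subject: the varying slopes, directly
comparisons(m, variables = "Days", by = "Subject", re.form = NULL)
```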
For predictions/comparisons, I asked how comfortable people felt from "1" (I don't feel comfortable using this at all) to "5" (I could use this in my own research). Most answered between 2 and 4. Those with previous stats knowledge answered higher. predictions() was rated higher than comparisons().
April 11, 2025 at 1:41 PM
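For anyone who hasn't used the two functions the ratings refer to, a toy example with a logistic regression on mtcars (not the course data): predictions() returns fitted probabilities, while comparisons() returns differences in those probabilities, which are usually easier to report than log-odds.

```r
library(marginaleffects)

dat <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
m <- glm(vs ~ wt + am, data = dat, family = binomial)

# Average predicted probability of a straight engine, by transmission type
avg_predictions(m, by = "am")

# Average difference in that probability, manual minus automatic
avg_comparisons(m, variables = "am")
```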
25% of students took this course previously. Of that 25%, half thought that model interpretation was more straightforward with marginaleffects compared to summary tables. The other half said both were equally hard to understand. No one thought summary tables were easier, or that both were easy.
April 11, 2025 at 1:40 PM