Pete Bachant
@petebachant.me
200 followers 760 following 130 posts
Bicycles, fluid dynamics, Python, open source, open science, reproducibility. https://petebachant.me | https://calkit.org
petebachant.me
1. Generate evidence to support some claims
2. Don't automate the creation of said evidence

Congratulations, you've just contributed to the reproducibility crisis!

#reproducibility #openscience
Reposted by Pete Bachant
pyopensci.org
So much brilliant work never makes it into a paper.
The code, the data, the long nights helping others debug.
At pyOpenSci, we believe that code, data, and community are the pulse.
Research advances quickly when we build together & openly.
Join us. 💛 bit.ly/pyos-volunteer
#openscience #opensource
petebachant.me
Don't be ashamed of "messy" code. If it works, it's good. Share it.

#openscience #reproducibility
Messy reproducible code is better than "clean" but irreproducible code.
petebachant.me
I'd also argue that things might be getting worse. Researchers have access to even more complex tooling when it comes to doing computations: bigger ML models, GPUs, etc. Keeping workflows reproducible is a huge task. I'd argue it's worth it, but only if it can be done cheaply.
petebachant.me
Reading through some slides from 2013 titled "how to succeed in reproducible research without really trying". It's true we have all the tools needed for researchers to build their own reproducible workflows, but many still do not. Maybe the tools are still too hard to learn and use!
petebachant.me
Programming tip: Name classes after the data they encapsulate, not the actions they perform on that data. For example, instead of SchemaProcessor, just call it Schema:

processed_schema = Schema().process()

#programming #oop #softwareengineering
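
A minimal sketch of the naming tip above, in Python. The `fields` attribute and the normalization done in `process()` are hypothetical, chosen only to make the one-liner runnable; the point is that the class is named for the data it holds and the action lives in a method.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Named after the data it encapsulates (field name -> type name)."""

    # Hypothetical payload for illustration only.
    fields: dict[str, str] = field(default_factory=dict)

    def process(self) -> "Schema":
        # Illustrative "processing": normalize field and type names.
        cleaned = {
            name.strip().lower(): type_name.strip().lower()
            for name, type_name in self.fields.items()
        }
        # Return a new Schema rather than a separate "processor" object.
        return Schema(fields=cleaned)


processed_schema = Schema({" ID ": "Int", "Name": "STR"}).process()
print(processed_schema.fields)
```

Compared with a `SchemaProcessor`, the caller only ever handles one kind of object, and the result of processing is still a `Schema`.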
petebachant.me
Hot take: Notebooks are fine in production as long as they're part of a reproducible pipeline

docs.calkit.org/notebooks/

#reproducibility #datascience #openscience
petebachant.me
Still trying to learn what makes my requests so unreasonable
Reposted by Pete Bachant
plos.org
PLOS @plos.org · 22d
In a newly released arXiv preprint, we explore how open science practices like sharing data, code and preprints relate to citation impact in French-authored research over a 3-year period.

Thanks to @ouvrirlascience.bsky.social for highlighting its national importance.

🔗 Read more: plos.io/3Vmykrj
Promotional image for PLOS featuring a butterfly on a flower on the left and a graphic with colorful lines and text about open science practice on the right.
petebachant.me
In other words, it's fine to arrive at something you like interactively, but if you want to share it, put it into the pipeline first.
petebachant.me
Reproducibility tip: Any figure, dataset, ML model, etc., should not be shared until it is produced with an automated, version-controlled pipeline.

#reproducibility #openscience
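
One way to picture the "automated" part of the tip above is a stage runner that regenerates an output only when its inputs change. This is a rough sketch under stated assumptions, not how Calkit or any other tool actually works: `run_stage`, `file_hash`, and the JSON lock file are hypothetical names. The "version-controlled" part would come from committing the script and the lock file to Git.

```python
import hashlib
import json
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_stage(name, inputs, output, func, lock_path=Path("pipeline.lock.json")):
    """Run func(inputs, output) only if an input changed or the output is missing.

    Returns True if the stage ran and False if it was already up to date.
    """
    lock = json.loads(lock_path.read_text()) if lock_path.exists() else {}
    current = {str(p): file_hash(Path(p)) for p in inputs}
    if lock.get(name) == current and Path(output).exists():
        return False  # Inputs unchanged and output present: nothing to do.
    func(inputs, output)
    # Record the input hashes that produced this output.
    lock[name] = current
    lock_path.write_text(json.dumps(lock, indent=2, sort_keys=True))
    return True
```

With something like this in place, a figure or dataset is only ever produced by calling the stage, so the recorded hashes say which inputs it came from.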
petebachant.me
While profiling some CUDA code on a SLURM cluster, I realized I was not working in a very reproducible way. That could become a problem down the road if I ever needed to know how a certain result was generated, so Calkit now has SLURM integration: docs.calkit.org/pipeline/slu...
SLURM integration - Calkit
docs.calkit.org
petebachant.me
That is what I do, but an easier way might be nice :)
petebachant.me
Julia should have an option that automatically does the same thing as:

export JULIA_LOAD_PATH=@:@stdlib
julia --project=. -e 'using Pkg; Pkg.instantiate()'

before running any command.

Maybe an enhanced reproducibility mode option, like --repro?

#julialang #reproducibility
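
Until something like that exists, the two steps above can be wrapped in a small script. The sketch below is in Python; `build_repro_invocation` and `run_julia_repro` are hypothetical helper names, and passing `--project=.` to the final command as well is an assumption about what such a mode would do. The environment variable and the instantiate command are taken from the post.

```python
import os
import subprocess


def build_repro_invocation(julia_args, base_env=None):
    """Build the environment and commands for a 'repro mode' Julia run."""
    env = dict(os.environ if base_env is None else base_env)
    # Restrict the load path to the active project and the standard library.
    env["JULIA_LOAD_PATH"] = "@:@stdlib"
    # Install the exact dependencies recorded for the project.
    instantiate = ["julia", "--project=.", "-e", "using Pkg; Pkg.instantiate()"]
    # Assumption: the user's command also runs against the local project.
    command = ["julia", "--project=.", *julia_args]
    return env, [instantiate, command]


def run_julia_repro(julia_args):
    """Instantiate the project, then run Julia with the given arguments."""
    env, commands = build_repro_invocation(julia_args)
    for cmd in commands:
        subprocess.run(cmd, env=env, check=True)
```

For example, `run_julia_repro(["script.jl"])` would instantiate the project and then run `script.jl`, assuming `julia` is on the PATH.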
petebachant.me
Just sent you an email!
petebachant.me
I don't know why, but I always found it hard to remember the process for adding an SSH key to GitHub, so I made a wizard for it:

calkit config github-ssh

(might be buggy, but still an improvement over manually running commands from the docs)
petebachant.me
Moving pieces of code farther apart from each other (into different packages, modules, repos) doesn't guarantee you've decoupled them. In fact, if you haven't, you probably just made your life a whole lot harder.

#softwareengineering
petebachant.me
If you're a leader of knowledge workers, you should be giving teams fewer, vaguer goals. Handing out well-defined projects and tasks to individuals is a waste of their abilities.
petebachant.me
How much "waste" do you have in your scientific workflow? For example, do you manually rerun plotting scripts/notebooks after updating processing logic? Do you then manually re-upload these figures to Overleaf? Want to automate this stuff away? Reach out and I will help!
petebachant.me
Code, data, config files, etc. all must be shared in order to describe computational methods with sufficient detail.

#reproducibility #openscience
petebachant.me
Anyone have any good references that examine the relationship between computational reproducibility and time to publication? I'd assume more automated, reproducible workflows will help studies get through peer review more quickly.

#openscience #reproducibility