Paul Harrison
@paulfharrison.bsky.social
Bioinformatician at Monash University, Melbourne, Australia. I also use Mastodon: @pfh@mastodon.online https://mastodon.online/@pfh My homepage is: https://logarithmic.net/pfh/ On Twitter I was: @paulfharrison
paulfharrison.bsky.social
"... a fundamental question that often remains overlooked is whether or not model parameters can be confidently estimated from the available data."
Reposted by Paul Harrison
tommytang.bsky.social
Heatmap in ggplot2
yunuuuu.github.io/ggalign/ind...
I always use ComplexHeatmap, but this seems to be a good alternative if you want to stay within ggplot2.
paulfharrison.bsky.social
If you've ever wondered why medical research has so many Chesterton's Fences, or what it would actually take to "do your own research", this would be a good starting point. (The target audience of the book is doctors seeking to use published medical research.)
paulfharrison.bsky.social
I did get the AMS. Not quite sure what I'm doing with it. Maybe some interesting possibilities mixing soft and hard materials.

I've heard good things about marble PLA.
paulfharrison.bsky.social
uvx demakein

I've finally updated my wind instrument design program to Python 3. It only took me 10 years to get around to it. I was pleased to find there is now a fairly solid Python library for 3D boolean operations (manifold3d).

github.com/pfh/demakein
A picture of a 3D printed whistle in front of a 3D printer.
paulfharrison.bsky.social
grug brain bioinformatician not trust maximum a posteriori estimate. big brained bioinformatics shaman develop map estimate. danger! noise demon hide deeper in data! grug prefer count matrix. grug know what to do when have count matrix.
Reposted by Paul Harrison
robert.bio
Excited to announce our first interactive article on sandbox.bio, about genomic ranges: sandbox.bio/concepts/gen...

Move & resize the ranges to see how that affects bedtools operations like merge and intersect in real time!
Reposted by Paul Harrison
macsys.bsky.social
🚨 Exciting PhD Opportunities with MACSYS! The MACSYS team at Monash University is offering multiple fully funded #PhD scholarships for students eager to explore the cutting edge of computational biology, microbiology, and systems modelling.
👉More info/apply: macsys.org/monash-phd-s...
Reposted by Paul Harrison
mixomics.org
👩‍💻 We’re hiring! Lê Cao Lab at Uni Melbourne @mig-unimelb.bsky.social needs an R dev to power the next-gen of mixOmics 🚀
Love #RStats, #Bioconductor & multi-omics? Help expand mixOmics, run workshops & publish cutting-edge methods.
Apply: unimelb.wd105.myworkdayjobs.com/en-US/UoM_Ex...
Reposted by Paul Harrison
davisjmcc.bsky.social
📢 PostDoc opportunity in our Bioinformatics & Cellular Genomics lab at SVI! 🧬

You’d join a welcoming, supportive, and brilliant team.

Why not spend a few years in Melbourne and be part of something exciting?

Apply here: www.seek.com.au/job/84737876

#ScienceCareers #PostDoc #Bioinformatics
Research Officer - Bioinformatics Job in Fitzroy, Melbourne VIC - SEEK
Seeking a Postdoc to develop computational toolkits to enable large-scale studies of single-cell and spatial 'omics and statistical genetics
paulfharrison.bsky.social
Parquet format and the arrow library have been life changing.

(I suspect I should be getting on board with duckdb one of these days too.)
paulfharrison.bsky.social
(Well, not exactly fine. A change to any single species abundance alters all of these ratios, so the null hypothesis could then be quite correctly rejected for all species!)
paulfharrison.bsky.social
To be clear, I'm arguing semantics. I believe the software does *something* useful, it's just not being described clearly.

For example, it might look at the ratio of each species to the geometric mean as a baseline. This is fine, but that baseline's appropriateness needs to be checked.
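
A minimal sketch of what a geometric-mean baseline looks like, using the centered log ratio (CLR) and made-up abundances. It does not reproduce any particular package. The names `clr`, `before` and `after` are illustrative only. It shows one reason the baseline needs checking: changing a single species moves the geometric mean, so every ratio shifts.

```python
import math

def clr(abundances):
    """Log of each abundance relative to the sample's geometric mean."""
    logs = [math.log(a) for a in abundances]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]

# Hypothetical abundances for four species. Only the last species changes (10-fold).
before = [10.0, 20.0, 30.0, 40.0]
after = [10.0, 20.0, 30.0, 400.0]

# Change in each species' CLR value between the two samples.
shift = [b - a for a, b in zip(clr(before), clr(after))]

for i, s in enumerate(shift):
    print(f"species {i}: CLR shift {s:+.3f}")

# The three unchanged species each shift by -ln(10)/4, because the
# log geometric mean rose by ln(10)/4 when the fourth species changed.
```

With a baseline like this, a test of "no change relative to baseline" is a statement about the whole community, not about one species in isolation.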
paulfharrison.bsky.social
I continue to be astounded at the number of compositional data analysis packages that will happily report differential abundance of individual species.

Did they not understand the concept of compositional data? How is it possible to publish methods with this premise?
paulfharrison.bsky.social
Putting it through its paces. However, I have more testing to do to really get to know this transformation.

logarithmic.net/varistran/ar...
Samesum transformation
paulfharrison.bsky.social
Transform counts like log2(count/scale+1), with a scale chosen per sample such that each sample adds to the same total.

It's similar to CLR with a pseudocount, but all zeros transform to the same value.
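
A rough sketch of the idea in plain Python, not the varistran implementation. The `target` total, the bisection solver, and the function names are all assumptions made for illustration. For each sample it searches for the scale at which the values log2(count/scale+1) add up to the chosen total.

```python
import math

def transformed_sum(counts, scale):
    """Sum of log2(count/scale + 1) over one sample."""
    return sum(math.log2(c / scale + 1.0) for c in counts)

def find_scale(counts, target):
    """Find the scale at which the transformed sample sums to `target`.

    The sum decreases steadily as the scale grows, so bisection works.
    No guards for extreme targets: this is a sketch.
    """
    if target <= 0 or not any(c > 0 for c in counts):
        raise ValueError("need a positive target and at least one non-zero count")
    lo = hi = 1.0
    while transformed_sum(counts, lo) < target:  # widen the bracket downward
        lo /= 2.0
    while transformed_sum(counts, hi) > target:  # widen the bracket upward
        hi *= 2.0
    for _ in range(200):  # bisect on a log scale
        mid = math.sqrt(lo * hi)
        if transformed_sum(counts, mid) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def samesum(counts, target):
    """Return (scale, transformed values) for one sample."""
    scale = find_scale(counts, target)
    return scale, [math.log2(c / scale + 1.0) for c in counts]

# Hypothetical counts: the second sample is the first sequenced 20 times deeper.
samples = [[10, 0, 5, 85], [200, 0, 100, 1700]]
target = 8.0  # assumed total; any positive value works for the demonstration

results = [samesum(s, target) for s in samples]
scales = [r[0] for r in results]
transformed = [r[1] for r in results]

for scale, values in results:
    print(round(scale, 3), [round(v, 3) for v in values])
```

In this toy example the two samples differ only in depth, so the fitted scales differ by a factor of 20 and the transformed values agree. Zero counts map to exactly 0 in both samples.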
paulfharrison.bsky.social
Normalization and log transformation of count data. Pseudocounts, library size adjustment, Centered Log Ratios (CLR), Variance Stabilizing Transformation, and all that. Many variations on a similar task. Here's something I haven't seen done:
paulfharrison.bsky.social
Some conventional flow matching:
A computer generated image that looks like daubs of colored oil-paint.
paulfharrison.bsky.social
Yes, the link above goes on to talk about that in the next section.

I do hesitate a bit at having a single duplicate correlation value across all genes. There's a package called variancePartition without this limitation, but I haven't tried it.
paulfharrison.bsky.social
The first time I saw this it took me a couple of weeks to get my head around it.

~timepoint*treatment+id has multicollinearity because id nests within treatment. We could use a mixed model ~timepoint*treatment+(1|id), but many popular tools only support fixed effects models.
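
To make the nesting concrete, here is a small sketch that builds the fixed-effects design matrix by hand for a made-up experiment (four individuals, two per treatment, two timepoints each) and checks its rank. The subjects, the level names, and the dummy coding with the first level as reference are all assumptions for illustration.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank by Gaussian elimination, using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical design: each id belongs to exactly one treatment (nesting).
subjects = {"a": "control", "b": "control", "c": "treated", "d": "treated"}
timepoints = ["t1", "t2"]
id_levels = sorted(subjects)[1:]  # "a" is the reference level

columns = (["(Intercept)", "timepoint_t2", "treatment_treated",
            "timepoint_t2:treatment_treated"]
           + ["id_" + s for s in id_levels])

design = []
for subject, treatment in subjects.items():
    for tp in timepoints:
        t = int(tp == "t2")
        g = int(treatment == "treated")
        design.append([1, t, g, t * g] + [int(subject == s) for s in id_levels])

rank = matrix_rank(design)
print(f"{len(columns)} columns, rank {rank}")

# The treatment column is the sum of the id columns for treated individuals,
# so it carries no information beyond what the id columns already hold.
treatment_is_redundant = all(row[2] == row[5] + row[6] for row in design)
print("treatment == id_c + id_d:", treatment_is_redundant)
```

The rank comes out one short of the number of columns, which is the multicollinearity described above: once every individual has its own term, the treatment main effect cannot be estimated separately.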