Austin Richardson
@agdr.org
350 followers 320 following 24 posts
Metagenomicist @onecodex.bsky.social. My cat’s name is Nancy
Pinned
agdr.org
NCBI's Taxonomy changes over time. We built Taxonomy Time Machine to track these changes:

🕰️ app: taxonomy.onecodex.com
📄 pre-print: www.biorxiv.org/content/10.1...
A screenshot of a webpage titled "Taxonomy Time Machine" showing the taxonomic lineage and descendants of the Influenza A Virus
Reposted by Austin Richardson
jlsteenwyk.bsky.social
Apple's approach to protein structure is great for accessibility - & potentially biological realism - reasons.

Eg, prediction could be achieved w/ smaller compute & the generative nature of prediction allows for multiple conformations

A summary here: genomely.substack.com/p/simplefold...
SimpleFold and the Future of Protein Folding
A Generative Shift in Protein Folding
genomely.substack.com
Reposted by Austin Richardson
If you're wondering why we're hosting the pre-print via Dropbox, it's because arXiv (and bioRxiv) did not accept it (because it is a review). It's a bit disconcerting, because a review is precisely the type of paper that would benefit a lot from pre-publication dissemination and feedback.
Thank you folks for your feedback on our survey about Hash functions in genomic sequence analysis. We've updated the paper and you can see the new version here: tinyurl.com/4kk9ccmt.
agdr.org
Closed my eyes for a sec and summoned another earthquake
agdr.org
they should invent a type of volatile memory that gets heavier the more data it contains
Reposted by Austin Richardson
bedec.bsky.social
Blogged about how zstd --long fills the gap between fast and slow-but-high-ratio genome compression methods log.bede.im/2025/09/12/z...
agdr.org
you can just pour milk over trail mix and eat it like cereal
agdr.org
"You are standing in an open field west of a white house, with a boarded front door."
Reposted by Austin Richardson
curiouscoding.nl
Little writeup on the speed of fasta parsers, at last.

Basically: both needletail and paraseq process input linearly, and thus have a limit around 4 GB/s.

By giving each thread its own slice of the input file, we're limited by RAM bandwidth instead :)

curiouscoding.nl/posts/fasta-...
With 3 threads, the middle thread processes the reads starting in the middle third of the fasta file.
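A minimal sketch of the slicing idea described above, not the code from the linked writeup. It assumes an uncompressed FASTA file and a simple ownership rule: each worker handles the records whose header line starts inside its byte range. The function names are hypothetical. Python threads will not reach RAM bandwidth; the point is only how a worker finds the first record in its slice.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_records_in_slice(path, start, end):
    """Count FASTA records whose header line begins in the byte range [start, end)."""
    count = 0
    with open(path, "rb") as fh:
        if start > 0:
            # Step back one byte and read to the next newline, so that we land
            # on the first line that starts at an offset >= start.
            fh.seek(start - 1)
            fh.readline()
        while True:
            pos = fh.tell()
            if pos >= end:
                break  # this line belongs to the next slice
            line = fh.readline()
            if not line:
                break  # end of file
            if line.startswith(b">"):
                count += 1
    return count


def count_fasta_records(path, n_threads=3):
    """Split the file into n_threads byte slices and count records in each."""
    size = os.path.getsize(path)
    bounds = [size * i // n_threads for i in range(n_threads + 1)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [
            pool.submit(count_records_in_slice, path, bounds[i], bounds[i + 1])
            for i in range(n_threads)
        ]
        return sum(f.result() for f in futures)
```

With `n_threads=3`, the middle worker seeks to one third of the file, skips the partial line it lands in, and then handles every header that starts before the two-thirds mark, so no record is counted twice or missed.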
Reposted by Austin Richardson
punkrockscience.bsky.social
I do not enjoy that we now live in a world where seeing this banner at the top of PubMed makes me nervous.
Red banner from the top of PubMed, saying "Service Alert: Planned Maintenance beginning July 25th. Most services will be unavailable for 24+ hours starting 9pm EDT. Learn more about the maintenance."
Reposted by Austin Richardson
acritschristoph.bsky.social
TIL the EBV genome is *included in the hg38 assembly* so that EBV reads are not erroneously mapped elsewhere to the human genome. That's certainly .... an interesting solution ... 🤯

But it enabled this extremely cool work:
caleblareau.bsky.social
In a cool twist of fate, the EBV contig is in hg38 to mop up unscrupulous EBV reads, a by-product of immortalizing lymphoblastic cell lines (used in 1000 Genomes Project, etc.). Hence, a simple `samtools view` could get a measure of persistent EBV DNA in large WGS cohorts, e.g., UK Biobank. 4/
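A rough sketch of what such a measurement could look like, working on SAM text records like those printed by `samtools view`. Two things here are assumptions and not taken from the posts: the contig name `chrEBV`, and the normalisation (EBV reads as a share of mapped primary reads). The function name is hypothetical, and this is not the method used in the quoted work.

```python
def ebv_read_counts(sam_lines, ebv_contig="chrEBV"):
    """Return (ebv_reads, mapped_reads) over primary alignments in SAM text lines."""
    ebv = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        mapped += 1
        if fields[2] == ebv_contig:  # RNAME column; contig name is an assumption
            ebv += 1
    return ebv, mapped


def ebv_fraction(sam_lines, ebv_contig="chrEBV"):
    """Fraction of mapped primary reads assigned to the EBV contig (0.0 if none mapped)."""
    ebv, mapped = ebv_read_counts(sam_lines, ebv_contig)
    return ebv / mapped if mapped else 0.0
```

Any iterable of lines works as input, for example an open text file or a pipe read line by line, so the whole alignment file never needs to sit in memory.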
Reposted by Austin Richardson
philippbayer.bsky.social
This is a bad take
stevensalzberg.substack.com/p/i-know-gen...

Saying that DNA data is like your browsing data and can therefore be leaked is a false equivalence. Thing A is on fire so it's fine for thing B to be on fire, too-style argumentation.
I know genomes. Don't delete your DNA
Too many people are panicking about 23andMe.
stevensalzberg.substack.com
agdr.org
Q: what do viruses and potatoes have in common?
A: both are "acellular root"
Reposted by Austin Richardson
bioinformer.bsky.social
Are you attending #ASMicrobe this week? Stop by my talk on Friday morning (10AM) and say hello! 👋 If you can’t make it and want to meet up - just drop me a DM!

I love this meeting and connecting with so many friends and colleagues over the years has made it really a special meeting.
agdr.org
🌳 Taxonomy Time Machine now supports batch lookups! Quickly resolve lists of names/TaxIDs to their current NCBI taxonomy → taxonomy.onecodex.com/bulk-resolver
Reposted by Austin Richardson
bioinformer.bsky.social
🧵 The ATCC Genome Portal hit 5,500 authenticated microbial genomes (>2,600 species)! 🎉🥳 We've sequenced, assembled, annotated 4,538 bacteria, 479 viruses, 479 fungi, and 4 protists! All NGS in-house @ ATCC under ISO, and >90% on BOTH @nanoporetech.com and #Illumina 😎 www.atcc.org/applications...
Discover the ATCC Genome Portal | ATCC
www.atcc.org
agdr.org
Something happened to my $PATH and now nothing works

Trisolarans: “the Sophons have succeeded in disrupting science”
agdr.org
Bad day for VCF files
Reposted by Austin Richardson
kblin.bsky.social
It's clearly a DNS issue, but overall, the NCBI is the least reliable I've ever experienced in my career. And I've been in this long enough to remember the Entrez API giving you only part of the file every 50-100th time.
agdr.org
using github copilot to fail at github workflows aka boiling the ocean
agdr.org
I call it the London Smaug (Tension Tamer + espresso latte)
agdr.org
How many arginines are in mStrawberry?