Ben J Woodcroft
@benjwoodcroft.bsky.social
380 followers 140 following 60 posts
Yet another microbial bioinformatician, group leader, dad github.com/wwood https://research.qut.edu.au/cmr/team/ben-woodcroft/
Pinned
benjwoodcroft.bsky.social
Out in @natbiotech.nature.com: Metagenome taxonomy profilers usually ignore unknown species. SingleM is an accurate profiler which doesn't, even detecting phyla with no MAGs. Profiles of 700,000 metagenomes at sandpiper.qut.edu.au. A 🧵
Logo for the Sandpiper website
benjwoodcroft.bsky.social
Yes normal behaviour. See wwood.github.io/singlem/FAQ for the formula - most windows are 60bp, and so if your reads are uniform length you get that.

But you are looking at the OTU table there, perhaps you want the taxonomic profile output (which is a more final output)?
FAQ
Documentation for SingleM
wwood.github.io
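A rough sketch of the arithmetic behind that reply. The linked FAQ is the authoritative source. The formula used here, coverage = n * L / (L - k + 1) with n reads of length L spanning a window of length k, is an assumption of this sketch, and `otu_coverage` is a made-up name, not part of SingleM.

```python
def otu_coverage(num_reads: int, read_length: int, window_length: int = 60) -> float:
    """Estimate coverage from the number of reads that span a window.

    Assumed formula (check the SingleM FAQ): n * L / (L - k + 1).
    A read of length L can fully span a k-bp window from only
    L - k + 1 start positions, so each spanning read is scaled up.
    """
    if read_length < window_length:
        raise ValueError("a read shorter than the window cannot span it")
    return num_reads * read_length / (read_length - window_length + 1)


# With uniform 150 bp reads, every coverage value is a whole-number
# multiple of the single-read value (150 / 91, about 1.65).
per_read = otu_coverage(1, 150)
three_reads = otu_coverage(3, 150)
```

Under that assumption, an OTU table built from uniform-length reads would show only multiples of one constant in the coverage column, which would fit the "normal behaviour" described above.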
benjwoodcroft.bsky.social
Trimmed reads are bad news when they become short, but if they remain 100bp+ then you should be fine I reckon.
2/2
benjwoodcroft.bsky.social
Thanks - strange that your Lyrebird experience wasn't good. Please report errors (what did you do?) at github.com/wwood/single... or just via email. We test installation inclusive of DB download at github.com/wwood/single...

But a new version of the lyrebird DB incoming btw.
1/2
wwood/singlem
Novelty-inclusive microbial (and now dsDNA phage) community profiling of shotgun metagenomes - wwood/singlem
github.com
Reposted by Ben J Woodcroft
caleblareau.bsky.social
Excited to share a new preprint from the lab with @ryandhindsa.bsky.social ! www.biorxiv.org/content/10.1...

Led by @sherrynyeo.bsky.social, @erinmayc.bsky.social, and friends, we continue our journey to find viral DNA in our favorite place-- the overlooked and discarded reads in existing data! 1/
the treasure trove of all sequencing datasets
benjwoodcroft.bsky.social
Thanks for kind words. By UCEs you mean e.g. 16S? It actually does this already, and tests pass. But it isn't the most efficient and code is a bit crusty and db is out of date, since it doesn't get used much. See wwood.github.io/singlem/FAQ
benjwoodcroft.bsky.social
This is great @titus.idyll.org (though to be picky it's SingleM or singlem, not singleM). We wrote a few parsers for other formats at github.com/wwood/single... - it'd be nice if not everyone needed to reinvent (and use standard names for things like coverage inclusive vs exclusive of children).
singlem-benchmarking/bin at main · wwood/singlem-benchmarking
Contribute to wwood/singlem-benchmarking development by creating an account on GitHub.
github.com
benjwoodcroft.bsky.social
I wonder if AI could do a good job of that integration. I'd love to learn some Haskell actually, just need to find the time..
benjwoodcroft.bsky.social
Good good, or could be better?
benjwoodcroft.bsky.social
Good q. I imagine new-chemistry nanopore reads should be fine. You can check by running the supplemented package on your genomes and making sure the expected number of markers is detected. Should be in line with MAG completeness.
benjwoodcroft.bsky.social
cheers Daan - here's a thread explaining some of the deets bsky.app/profile/benj...
benjwoodcroft.bsky.social
Thanks for spreading the word @jcamthrash.bsky.social - there's an explanatory thread at bsky.app/profile/benj...
benjwoodcroft.bsky.social
Thanks for considering it for publication. There's an explanation thread at bsky.app/profile/benj...
benjwoodcroft.bsky.social
Thanks also to the reviewers including Alice McHardy - very fair and helpful we thought.
benjwoodcroft.bsky.social
Many many to thank, particularly @aroneys.bsky.social, @rossenzhao.bsky.social, Mitchell Cunningham, Linda Blackall, Gene Tyson, @cmrqut.bsky.social and dozens of people who have helped with the software, ms, and everyone who tolerated my enthusiasm.
benjwoodcroft.bsky.social
SingleM is BYO genome, you can add your MAGs to the refDB to get profiles which include both known species and your novel MAGs. wwood.github.io/singlem/tool...
SingleM supplement
Documentation for SingleM
wwood.github.io
benjwoodcroft.bsky.social
Novel lineage detection + 700k profiles makes it possible to recover novel MAGs from taxa you care about. We recovered new genera from the underrepresented Muirbacteria, Wallbacteria, Riflebacteria and Fusobacteria phyla by assembling the right metagenomes.
benjwoodcroft.bsky.social
@ace-gtdb.bsky.social R226-based profiles from 700k public metagenomes are at sandpiper.qut.edu.au. Search for your fave microbe by GTDB taxonomy there to see its prevalence and community profiles. Got something novel? Get in touch.
Screenshot of the sandpiper website front page
benjwoodcroft.bsky.social
A new @rust-lang.org approach also helps - conserved regions are already aligned to each other so distance calcs become a vector similarity search problem. github.com/wwood/smafa Big distance => novel species. Props to @viralinstruction.bsky.social for awesome PR.
GitHub - wwood/smafa: Biological sequence aligner for pre-aligned sequences
Biological sequence aligner for pre-aligned sequences - wwood/smafa
github.com
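To illustrate the idea in that post (not smafa's actual implementation, which is written in Rust): once sequences are pre-aligned to the same columns, each can be one-hot encoded as a vector, and the number of matching positions is then just a dot product. Every name below is hypothetical and the toy sequences are invented for the example.

```python
ALPHABET = "ACGT"


def one_hot(seq: str) -> list[int]:
    """Encode a pre-aligned sequence as a flat 0/1 vector, 4 slots per column.

    Gaps or ambiguous bases get all zeros, so they never count as a match.
    """
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == letter else 0 for letter in ALPHABET)
    return vec


def dot(a: list[int], b: list[int]) -> int:
    return sum(x * y for x, y in zip(a, b))


def distance(query: str, reference: str) -> int:
    """Mismatch count between two equal-length aligned sequences."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    matches = dot(one_hot(query), one_hot(reference))
    return len(query) - matches


def nearest(query: str, references: list[str]) -> tuple[int, str]:
    """Brute-force nearest neighbour: (distance, reference sequence)."""
    return min((distance(query, ref), ref) for ref in references)


refs = ["ACGTACGT", "ACGTTTTT", "GGGGACGT"]
best_distance, best_ref = nearest("ACGTACGA", refs)
```

A large `best_distance` relative to the window length is the "big distance => novel species" signal mentioned above. A real tool would use a much faster search than this brute-force loop.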
benjwoodcroft.bsky.social
Fast and RAM-efficient since most raw reads are swiftly ignored. We optimise an up-front DIAMOND BLASTX-based method. Thanks @bbuchfink.bsky.social / Serratus for makeidx
Walltime and RAM usage of SingleM.
benjwoodcroft.bsky.social
Perhaps most strikingly, it detects microbes that aren't in the ref db, correctly weighting their relative abundance.
Benchmarking of SingleM when challenged by novel phyla, genera and species. Performance of SingleM substantially greater than other tools, particularly for higher levels of novelty.
benjwoodcroft.bsky.social
It's accurate on communities of known species / non-rep strains (though can struggle with low abundance species where coverage <1X)
Benchmarking results of several community profilers, showing good performance of SingleM.
benjwoodcroft.bsky.social
SingleM is a new approach to metagenome profiling. It uses conserved regions within marker genes (20aa) spanned by individual short reads. Concentrating analysis on these regions makes things easier.
A cartoon of a sequence alignment, showing how only reads which fully cover the 20aa window of the HMM are retained for further downstream analysis.
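A minimal sketch of the filtering step that the cartoon describes, assuming each read's match to a marker has already been reduced to start/end coordinates in amino acids along the marker. The `Hit` type, the coordinates and the function names are all hypothetical; this is not SingleM's code.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    read_id: str
    start: int  # first marker position (aa, 0-based) covered by the read
    end: int    # one past the last marker position covered


def spans_window(hit: Hit, window_start: int, window_length: int = 20) -> bool:
    """True if the read covers every position of the window."""
    window_end = window_start + window_length
    return hit.start <= window_start and hit.end >= window_end


def retained(hits: list[Hit], window_start: int, window_length: int = 20) -> list[Hit]:
    """Keep only reads that fully span the window; partial overlaps are dropped."""
    return [h for h in hits if spans_window(h, window_start, window_length)]


hits = [
    Hit("read1", 90, 140),   # covers positions 100..119 entirely
    Hit("read2", 105, 155),  # starts inside the window
    Hit("read3", 60, 110),   # ends inside the window
]
kept = retained(hits, window_start=100)
```

Here only `read1` survives, since the other two reads overlap the 20 aa window without spanning it.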