Lightnews — Scholar-powered news

Florian Trigodet

@floriantrigodet.bsky.social

440 followers 210 following 22 posts

Computational microbiologist. Senior scientist at the Helmholtz Institute for Functional Marine Biodiversity, Oldenburg. Working in Meren lab.

Pinned

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

I am very happy (and anxious) to share with you our most recent work in which we evaluated four of the most popular long-read assemblers,

www.biorxiv.org/content/10.1...

and tell you just a little bit about it in the following 🧵

Assemblies of long-read metagenomes suffer from diverse errors

Genomes from metagenomes have revolutionised our understanding of microbial diversity, ecology, and evolution, propelling advances in basic science, biomedicine, and biotechnology. Assembly algorithms...

www.biorxiv.org

Florian Trigodet @floriantrigodet.bsky.social · 10h

See this tutorial by Iva Veseli: anvio.org/tutorials/fm...
It contains everything you described above

An exercise on metabolic reconstruction in anvi'o

A tutorial on how to run metabolism estimation and enrichment in anvi'o

anvio.org

Florian Trigodet @floriantrigodet.bsky.social · 10h

With anvi'o you can annotate genomes with multiple annotation sources, including KEGG KOfams.
Anvi'o also includes a set of tools to compute metabolic module completeness and copy numbers (useful for metagenomics).
Better: there is a program to compare functional and metabolic enrichment.
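To make "module completeness" and "copy number" concrete, here is a deliberately simplified sketch. It assumes a module can be modelled as a list of steps, each satisfiable by any one of several alternative KOs. This is not anvi'o's actual algorithm (anvi'o's stepwise and pathwise calculations handle KEGG module definitions in much more detail), and the KO identifiers and function names are hypothetical placeholders.

```python
# Simplified, hypothetical model of KEGG module completeness and copy number.
# This is NOT anvi'o's implementation; KO identifiers below are placeholders.


def module_completeness(steps: list[set[str]], present_kos: set[str]) -> float:
    """Fraction of module steps satisfied by at least one annotated KO."""
    if not steps:
        return 0.0
    satisfied = sum(1 for alternatives in steps if alternatives & present_kos)
    return satisfied / len(steps)


def module_copy_number(steps: list[set[str]], ko_counts: dict[str, int]) -> int:
    """Crude copy number: the least-covered step limits how many copies of the module exist."""
    if not steps:
        return 0
    return min(sum(ko_counts.get(ko, 0) for ko in alternatives) for alternatives in steps)


if __name__ == "__main__":
    # A toy three-step module; step 2 can be filled by either of two KOs.
    toy_module = [{"K_A"}, {"K_B", "K_C"}, {"K_D"}]
    print(module_completeness(toy_module, {"K_A", "K_C"}))  # 2 of 3 steps present
    print(module_copy_number(toy_module, {"K_A": 2, "K_C": 1, "K_D": 3}))
```

In a metagenome, gene counts pooled across many genomes can exceed one per step, which is why a copy-number estimate adds information beyond completeness.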

Florian Trigodet @floriantrigodet.bsky.social · 14d

I briefly used myloasm on a small project and found the same read-clipping issues we reported for other assemblers like metaMDBG. I haven't had the time to run a full-scale analysis like the one in our preprint.

Reposted by Florian Trigodet

Daan Speth @daanspeth.bsky.social · Jun 10

I'm happy to announce the latest release of the GlobDB, available at globdb.org.

The GlobDB is a database of "species dereplicated" microbial genomes, and as of release 226 it contains twice as many species-representative genomes (306,260) as the latest GTDB release.

home | GlobDB

globdb.org

Reposted by Florian Trigodet

A. Murat Eren (Meren) @merenbey.bsky.social · Jun 5

We have a new 3-year postdoc position in our group at the @hifmb.de to study plasmids and plasmid systems of the marine environment and survey their utility in microbial responses to environmental change.

Please see the official job ad here, and spread the word:

jobs.awi.de/Vacancies/20...

PostDoc in ecology and evolution of plasmids in polar waters at HIFMB (f/d/m)

jobs.awi.de

Reposted by Florian Trigodet

Svetlana Ugarcina Perovic @svetlanaup.bsky.social · Apr 30

New #MicrobiomeDigest: microbiomedigest.com/2025/04/30/a...

•Herrgårds cheese @jrotwitguez.bsky.social
•Asian Skin Microbiome Program @cherrychengchenli.bsky.social
•long-read assemblers @floriantrigodet.bsky.social
•argNorm @vedanthramji.bsky.social

CU next week at @microbiomevif.bsky.social!

April 30, 2025

See you next week at MVIF! Skin microbiome Large-scale skin metagenomics reveals extensive prevalence, coordination, and functional adaptation of skin microbiome dermotypes across body sites – Che…

microbiomedigest.com

Florian Trigodet @floriantrigodet.bsky.social · Apr 29

In the end, the truth should be in the reads: if multiple long (or short) reads support the joining of two genomic sequences with different GC content, skew, etc., then I would be inclined to trust that the join is real.
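One way to put "the truth should be in the reads" into practice is to count reads that cross the join without interruption. The sketch below assumes read alignments have already been reduced to contiguous, unclipped (start, end) intervals on the contig; the function names, the 500 bp flank and the two-read minimum are hypothetical choices, not values from our workflow.

```python
def spanning_reads(alignments: list[tuple[int, int]], junction: int, flank: int = 500) -> int:
    """Count reads whose alignment extends at least `flank` bp on both sides of `junction`.

    `alignments` holds 0-based, half-open (start, end) coordinates on the contig;
    `junction` is the position where the two putatively joined sequences meet.
    """
    return sum(
        1
        for start, end in alignments
        if start <= junction - flank and end >= junction + flank
    )


def junction_supported(alignments: list[tuple[int, int]], junction: int,
                       flank: int = 500, min_reads: int = 2) -> bool:
    """Treat a join as supported only when several reads span it."""
    return spanning_reads(alignments, junction, flank) >= min_reads
```

Requiring a flank on both sides avoids counting reads that merely touch the junction, which is exactly where clipped reads tend to pile up.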

Florian Trigodet @floriantrigodet.bsky.social · Apr 29

(If all of them happen at the same genomic locus, I would have no doubt that it is a case of a chimeric contig.)

Florian Trigodet @floriantrigodet.bsky.social · Apr 29

But all of them can occur naturally: recent genomic rearrangements create shifts in GC skew; GC content can change with HGT or non-coding genes like rRNAs; non-specific read recruitment and hypervariable regions (insertion/deletion of genes) create drops in coverage.

Florian Trigodet @floriantrigodet.bsky.social · Apr 29

GC skew is a great idea, and I think a combination of GC skew, a sharp change in GC content and a drop in coverage would be great indicators for finding chimeric sequences, especially if they all occur at the same genomic location.
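A minimal sketch of that combined check, assuming a contig sequence and a per-base coverage list are already in hand. Because each signal can arise naturally on its own, the sketch only reports window boundaries where all three coincide. The window size, the 0.10 GC jump, treating a sign flip in GC skew as the "shift", and the "below half the median coverage" dip are all hypothetical thresholds chosen for illustration.

```python
from statistics import median


def gc_content(seq: str) -> float:
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); 0.0 when the window has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def window_stats(seq: str, window: int) -> list[tuple[int, float, float]]:
    """(start, GC content, GC skew) for consecutive, non-overlapping windows."""
    return [
        (start, gc_content(seq[start:start + window]), gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, window)
    ]


def candidate_chimera_breakpoints(seq: str, coverage: list[int], window: int = 1000,
                                  gc_jump: float = 0.10, cov_ratio: float = 0.5,
                                  flank: int = 50) -> list[int]:
    """Window boundaries where a GC content jump, a GC skew sign flip and a
    local coverage dip (below cov_ratio times the median) all coincide."""
    baseline = median(coverage)
    stats = window_stats(seq, window)
    hits = []
    for (_, gc1, skew1), (boundary, gc2, skew2) in zip(stats, stats[1:]):
        gc_shift = abs(gc2 - gc1) >= gc_jump
        skew_flip = skew1 * skew2 < 0
        local_min = min(coverage[max(0, boundary - flank):boundary + flank])
        coverage_dip = local_min < cov_ratio * baseline
        if gc_shift and skew_flip and coverage_dip:
            hits.append(boundary)
    return hits
```

Positions returned here are only candidates; they still need to be checked against the reads themselves.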

Florian Trigodet @floriantrigodet.bsky.social · Apr 29

Thanks a lot for going through it in your journal club! The details of my BLAST search can be found here: tinyurl.com/ynxwsvwc
In short, I removed the DUST filter and asked BLAST to report only the first hit. I'm not sure how it could report no hits because there were too many hits?

A reproducible workflow for Trigodet et al., 2025

A bioinformatics workflow for our study of long-read assemblers

tinyurl.com
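For readers who want to try a similar search, a hypothetical BLAST+ invocation along those lines is sketched below; the exact command we used is in the linked workflow, and the file and database names here are placeholders. `-dust no` turns off low-complexity masking for `blastn`, and `-max_target_seqs 1` keeps a single target sequence per query. That target is, as far as we understand, the first one found during the search rather than necessarily the best-scoring one.

```python
import shlex
import subprocess


def blastn_args(query_fasta: str, reads_db: str, out_tsv: str) -> list[str]:
    """blastn command with DUST disabled and one target reported per query."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", reads_db,
        "-dust", "no",            # do not mask low-complexity regions
        "-max_target_seqs", "1",  # report a single target sequence per query
        "-outfmt", "6",           # tabular output
        "-out", out_tsv,
    ]


def run_blastn(query_fasta: str, reads_db: str, out_tsv: str) -> None:
    """Requires BLAST+ on PATH and a nucleotide database built from the reads."""
    subprocess.run(blastn_args(query_fasta, reads_db, out_tsv), check=True)


if __name__ == "__main__":
    # Placeholder file names; prints the command instead of running it.
    print(shlex.join(blastn_args("regions.fa", "long_reads_db", "hits.tsv")))
```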

Reposted by Florian Trigodet

The Banfield Lab @banfieldlab.bsky.social · Apr 28

Genomes from long-read metagenomic assemblies contain rampant errors, highlighting the pressing need for stricter evaluation methods in long-read assembly algorithms. Read more in our paper with the Eren group. @floriantrigodet.bsky.social @merenbey.bsky.social

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

Misreporting non-circular elements as circular can quickly deteriorate public genome databases. We hope we can work together to ensure assemblers include stricter checks, or offer modes that prioritize caution. We would love to hear your experiences or thoughts!

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

We hope to help the community understand potential issues they may run into, and help developers see different perspectives. We have a fully reproducible bioinformatics workflow, and it is easy to add another assembler or new datasets to it:

merenlab.org/data/benchma...

A reproducible workflow for Trigodet et al., 2025

A bioinformatics workflow for our study of long-read assemblers

merenlab.org

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

We are aware that developing assembly algorithms, especially for metagenomes, is a notoriously complex and difficult task, and we have a deep appreciation of those who invest their time and skills in creating and maintaining them. A heartfelt THANK YOU. We're here to help, nothing more.

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

And we observed an astonishing number of repeats in the results. Repeats are common in nature, and the improved ability to resolve them is one of the strengths of long-read sequencing. But the repeats we found didn't look convincing and likely pointed to other underlying issues.

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

Accurate reconstruction of genomic variation is essential to associate within-population structural differences to ecological or evolutionary phenotypes. But we observed serious haplotyping errors, where assemblers created chimeric constructs or did things biologists wouldn't expect.

Figure 4. Prototypical mapping artifacts and their putative origin. (A) A chimeric sequence assembled from two subpopulations. At a conserved locus, two subpopulations existed, each with its own distinct sequence. The assembled contig contains all or part of each subpopulation-specific sequence, resulting in a chimeric construct. (B) Another example of a variable genomic site, but in this example the contig sequence contains the sequence of a very minor subpopulation, supported by only one read. (C) Duplicated sequence found only in the assembly and not supported by any long reads. (D) Two contigs assembled by metaMDBG (left) and metaFlye (right) presenting large regions with no coverage. We blasted these regions back to the long reads and found no hits. Coverage visualization was exported from the anvi’o interactive interface (left) or the IGV software (right), and the read mapping visualization was from IGV as well. Indels smaller than 150 bp, as well as mismatches, are not displayed in the mapping. Red markers at the end of reads indicate read clipping.
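Regions like those in panel (D) can be located before any visual inspection by scanning per-base coverage for runs of zero depth. The sketch below assumes coverage is already available as a list of integers (for example, exported from a mapping); the function name and the `min_length` cut-off are hypothetical. Each returned interval is a candidate to BLAST back against the reads.

```python
def zero_coverage_runs(coverage: list[int], min_length: int = 1) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) runs of zero coverage at least min_length long."""
    runs = []
    start = None
    for i, depth in enumerate(coverage):
        if depth == 0 and start is None:
            start = i                      # a zero-coverage run begins
        elif depth != 0 and start is not None:
            if i - start >= min_length:    # run ended; keep it if long enough
                runs.append((start, i))
            start = None
    if start is not None and len(coverage) - start >= min_length:
        runs.append((start, len(coverage)))  # run extends to the contig end
    return runs
```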

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

We observed cases of premature circularization. VERY OFTEN. Catching premature circularization can be easy in some cases, but very difficult in others.

Figure 3. Premature circularization of a Methanothrix genome. (A) Frequency of circular contigs under 500 kbp with a minimum of 3 ribosomal proteins. Each point represents one assembly. (B) A pangenomic analysis of all publicly available Methanothrix genomes from NCBI’s RefSeq database, supplemented with the so-called circular genome of Methanothrix assembled from the sample AD Sludge by hifiasm-meta (light blue), a contig from the same assembly that corresponds to the rest of the missing Methanothrix genome (medium blue), and the combination of these two contigs (dark blue). (C) KEGG metabolic module completion of all genomes and contigs in (B). (D) A schematic representation of the reads mapping over a transposase gene in the prematurely circularized contig (light blue in B and C), showing the lack of read support around the gene. The full figure is available in Supplementary Figure 2.
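The selection behind panel (A) can be sketched as a simple filter: circular contigs shorter than 500 kbp that still carry at least 3 ribosomal proteins. Our reading of the intuition is that such contigs look chromosome-derived yet are small for a complete genome, so they deserve a closer look rather than automatic rejection. The `Contig` record and the function name are hypothetical; the two thresholds come from the caption.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int               # bp
    circular: bool            # as flagged by the assembler
    ribosomal_proteins: int   # ribosomal protein genes found on the contig


def small_circular_with_rps(contigs: list[Contig], max_length: int = 500_000,
                            min_ribosomal: int = 3) -> list[Contig]:
    """Circular contigs under max_length carrying at least min_ribosomal ribosomal proteins."""
    return [
        c for c in contigs
        if c.circular and c.length < max_length and c.ribosomal_proteins >= min_ribosomal
    ]
```

Counting the hits per assembly (`len(small_circular_with_rps(contigs))`) gives one value per assembly.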

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

We observed chimeric contigs where the assembly software reported a single contig that brought together sequences from two distinct taxa, sometimes three or more, and even sequences that belonged to distinct domains of life.

Figure 2. Multi-domain and multi-phyla contigs. Six contigs from metaMDBG, metaFlye and hifiasm-meta. For each contig, we display the GC content, the coverage by the metagenomic reads used for its assembly, and gene-level taxonomy. For each assembly breakpoint, we display a zoomed-in detail of the read mapping from IGV. In these subplots, red arrows at the end of mapped reads indicate clipping, the coloring at the end of these reads indicates that the following portion of the read mapped to another contig, and similar colors indicate that multiple reads continue to map to the same contig. Blue markers indicate large indels (> 150 bp).

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

Unlike traditional evaluations of assembly software, our evaluation made heavy use of unassembled long reads to quantify how well the assembled sequences matched them. Assemblers generally worked great, but then sometimes they didn't at all. Here are a few things we observed:

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

But then we learned that Jill Banfield's group was dealing with similar issues in soil samples. At that point we decided to take a deeper look at long-read assemblers, so we used the datasets they had used and added some novel ones into the mix to re-evaluate them.

A schematic representation of long reads mapping to a contig with multiple types of read disagreement with the reference, including indels and single nucleotide variants representing more than half or all of the coverage, and clipping events spanning the entire coverage
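The disagreement types in that schematic can be expressed as per-position rules. This is a minimal sketch, assuming per-position counts (depth, reads carrying a SNV or indel, reads clipped at that position) have already been extracted from the read mapping; the `PileupColumn` record, the flag names and the function names are hypothetical. The "more than half" and "entire coverage" criteria follow the schematic's description.

```python
from dataclasses import dataclass


@dataclass
class PileupColumn:
    depth: int        # reads covering this contig position
    disagreeing: int  # reads carrying a SNV or indel relative to the contig
    clipped: int      # reads whose alignment is clipped at this position


def flag_column(col: PileupColumn, majority: float = 0.5) -> list[str]:
    """Flag a position where most reads disagree or every read is clipped."""
    if col.depth == 0:
        return ["no_coverage"]
    flags = []
    if col.disagreeing / col.depth > majority:
        flags.append("majority_disagreement")
    if col.clipped == col.depth:
        flags.append("clipping_across_all_reads")
    return flags


def flagged_positions(columns: list[PileupColumn]) -> dict[int, list[str]]:
    """Map contig position to its flags, keeping only positions with at least one flag."""
    return {i: flags for i, col in enumerate(columns) if (flags := flag_column(col))}
```

A single flagged position is not proof of an assembly error (a real subpopulation can produce a majority variant), but clusters of such positions are a cheap way to decide where to look in IGV.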

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

Our pangenomes were certainly raising some red flags. But we were not sure if fractions of genomes as circular elements were a feature of nature that we missed due to the years of short-read assemblies. With changing technology, you sometimes learn things you didn't even know you were missing.

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

While examining the assembly results, we were initially extremely happy with the very large number of giant and occasionally circular contigs. However, we quickly realised that many of the circular contigs did not make any sense.

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

Assemblers play a significant role in turning individual reads into long genomic segments, and have tremendous implications for downstream work. Last year we were very excited to apply some of the new assemblers to our PacBio datasets from marine samples.

Florian Trigodet @floriantrigodet.bsky.social · Apr 28

With technologies such as PacBio and ONT, genome-resolved metagenomics is experiencing its second coming. Complete and circular genomes from all domains of life, as well as viruses and plasmids, all WITHOUT binning, seem right around the corner. That is, if we can actually assemble them.
