Florian Trigodet
@floriantrigodet.bsky.social
440 followers 210 following 22 posts
Computational microbiologist. Senior scientist at the Helmholtz Institute for Functional Marine Biodiversity, Oldenburg. Working in Meren lab.
floriantrigodet.bsky.social
With anvi'o you can annotate genomes with multiple annotation sources, including KEGG KOfams.
Anvi'o also includes a set of tools to compute metabolic module completeness and copy numbers (useful for metagenomics).
Even better: there is a program to compare functional and metabolic enrichment.
floriantrigodet.bsky.social
I briefly used myloasm on a small project and found the same read clipping issues that we reported for other assemblers like metaMDBG. I haven't had the time to run a full-scale analysis like we did in our preprint.
Reposted by Florian Trigodet
daanspeth.bsky.social
I'm happy to announce the latest release of the GlobDB, available at globdb.org.

The GlobDB is a database of "species dereplicated" microbial genomes, and as of release 226 contains twice as many species-representative genomes (306,260) as the latest GTDB release.
home | GlobDB
globdb.org
Reposted by Florian Trigodet
merenbey.bsky.social
We have a new 3-year postdoc position in our group at the @hifmb.de to study plasmids and plasmid systems of the marine environment and to survey their utility in microbial responses to environmental change.

Please see the official job ad here, and spread the word:

jobs.awi.de/Vacancies/20...
PostDoc in ecology and evolution of plasmids in polar waters at HIFMB (f/d/m)
jobs.awi.de
floriantrigodet.bsky.social
In the end, the truth should be in the reads, and if multiple long (or short) reads support the joining of two genomic sequences with different GC content, skew, etc., then I would be inclined to trust that the junction is real.
floriantrigodet.bsky.social
(if all of them happen at the same genomic locus, I would have no doubt that it is a case of a chimeric contig)
floriantrigodet.bsky.social
But all of them can occur naturally: recent genomic rearrangements create shifts in GC skew; GC content can change with HGT or non-coding genes like rRNA; non-specific read recruitment and hypervariable regions (insertion/deletion of genes) create drops in coverage.
floriantrigodet.bsky.social
GC skew is a great idea, and I think a combination of GC skew, a sharp change in GC content and a drop in coverage would be a great indicator for finding chimeric sequences. Especially if they all occur at the same genomic location.
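To make the combined signal concrete, here is a minimal Python sketch. It is an illustration only, not a method from the preprint: the function names, the non-overlapping window scheme and all thresholds (`window`, `gc_jump`, `cov_ratio`, `flank`) are assumptions chosen for the example. It flags a window boundary only when a GC content jump, a GC skew sign flip and a local coverage drop coincide.

```python
def window_gc_stats(seq, window=1000):
    """Return (start, gc_content, gc_skew) for each non-overlapping window."""
    seq = seq.upper()
    stats = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / len(chunk)
        # GC skew is (G - C) / (G + C); defined as 0 when the window has no G or C.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        stats.append((start, gc_content, gc_skew))
    return stats


def candidate_breakpoints(seq, coverage, window=1000, gc_jump=0.10,
                          cov_ratio=0.2, flank=50):
    """Flag window boundaries where all three signals coincide.

    coverage is a per-base list of read depth with the same length as seq.
    All thresholds are hypothetical defaults, not published values.
    """
    stats = window_gc_stats(seq, window)
    mean_cov = sum(coverage) / len(coverage)
    flagged = []
    for (_, gc_a, skew_a), (pos, gc_b, skew_b) in zip(stats, stats[1:]):
        gc_shift = abs(gc_b - gc_a) >= gc_jump          # sharp change in GC content
        skew_flip = skew_a * skew_b < 0                 # GC skew changes sign
        local = coverage[max(0, pos - flank):pos + flank]
        cov_drop = min(local) <= cov_ratio * mean_cov   # coverage dip at the boundary
        if gc_shift and skew_flip and cov_drop:
            flagged.append(pos)
    return flagged
```

Positions returned this way are only candidates for manual inspection of the read mapping, since each signal on its own can have a natural origin.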
floriantrigodet.bsky.social
Thanks a lot for going through it in your journal club! The details of my blast search can be found here: tinyurl.com/ynxwsvwc
In short, I removed the DUST filter and asked BLAST to report only the first hit. I don't know how it would report no hits if there were too many hits?
A reproducible workflow for Trigodet et al., 2025
A bioinformatics workflow for our study of long-read assemblers
tinyurl.com
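For readers who want a starting point before opening the workflow, here is a hedged sketch of what such a search could look like. The exact command is in the linked workflow and may differ; this is an assumption about how the two settings map onto standard `blastn` options (`-dust no` to disable the low-complexity filter, `-max_target_seqs 1` to keep one target per query). The helper name and file paths are hypothetical.

```python
import shlex


def build_blastn_command(query_fasta, reads_db, out_tsv):
    """Build a blastn argument list with DUST masking off and one target kept.

    Assumption: '-max_target_seqs 1' stands in for "only report the first hit";
    the original workflow may use a different option for this.
    """
    return [
        "blastn",
        "-query", query_fasta,      # regions of the contig with no coverage
        "-db", reads_db,            # BLAST database built from the long reads
        "-dust", "no",              # disable the DUST low-complexity filter
        "-max_target_seqs", "1",    # keep a single target sequence per query
        "-outfmt", "6",             # tabular output
        "-out", out_tsv,
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed as a shell-ready command line.
    cmd = build_blastn_command("regions.fa", "long_reads_db", "hits.tsv")
    print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed.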
Reposted by Florian Trigodet
banfieldlab.bsky.social
Genomes from long-read metagenomic assemblies contain rampant errors, highlighting the pressing need for stricter evaluation methods in long-read assembly algorithms. Read more in our paper with the Eren group. @floriantrigodet.bsky.social @merenbey.bsky.social
floriantrigodet.bsky.social
Misreporting non-circular elements as circular can quickly deteriorate public genome databases. We hope we can work together to ensure assemblers include stricter checks, or offer modes that prioritize caution. We would love to hear your experiences or thoughts!
floriantrigodet.bsky.social
We hope to help the community to understand potential issues they may run into, and help the developers to see different perspectives. We have a fully reproducible bioinformatics workflow, and it is easy to add one more assembler to it, or new datasets:

merenlab.org/data/benchma...
A reproducible workflow for Trigodet et al., 2025
A bioinformatics workflow for our study of long-read assemblers
merenlab.org
floriantrigodet.bsky.social
We are aware that developing assembly algorithms, especially for metagenomes, is a notoriously complex and difficult task, and we have a deep appreciation of those who invest their time and skills in creating and maintaining them. A heartfelt THANK YOU. We're here to help, nothing more.
floriantrigodet.bsky.social
And we observed an astonishing number of repeats in the results. Repeats are common in nature, and the improved ability to resolve repeats is one of the strengths of long-read sequencing. But the repeats we found didn't look convincing and likely pointed to other underlying issues.
floriantrigodet.bsky.social
Accurate reconstruction of genomic variation is essential to associate within-population structural differences with ecological or evolutionary phenotypes. But we observed serious haplotyping errors, where assemblers created chimeric constructs or did things biologists wouldn't expect.
Figure 4. Prototypical mapping artifacts and their putative origin. (A) A chimeric sequence assembled from two subpopulations. At a conserved locus, two subpopulations existed, each with its own distinct sequence. The assembled contig contains all or a part of each subpopulation-specific sequence, resulting in a chimeric construct. (B) Another example of a variable genomic site, but in this example the contig sequence contains the sequence of a very minor subpopulation, supported by only one read. (C) Duplicated sequence only found in the assembly, not supported by any long reads. (D) Two contigs assembled from metaMDBG (left) and metaFlye (right) presenting large regions with no coverage. We blasted these regions back to the long reads and found no hits. Coverage visualization was exported from the anvi’o interactive interface (left) or the IGV software (right) and the read mapping visualization was from IGV as well. Indels smaller than 150 bp as well as mismatches are not displayed in the mapping. Red markers at the end of reads indicate read clipping.
floriantrigodet.bsky.social
We observed cases of premature circularization. VERY OFTEN. Catching premature circularization can be easy in some cases, but very difficult in others.
Figure 3. Premature circularization of a Methanothrix genome. (A) Frequency of circular contigs under 500 kbp with a minimum of 3 ribosomal proteins. Each point represents one assembly. (B) A pangenomic analysis of all publicly available Methanothrix genomes from the NCBI’s RefSeq database, supplemented with the so-called circular genome of Methanothrix assembled from the sample AD Sludge by hifiasm-meta (light blue), as well as a contig from the same assembly which corresponds to the rest of the missing Methanothrix genome (medium blue) and the combination of these two contigs (dark blue). (C) KEGG metabolic module completion of all genomes and contigs in (B). (D) A schematic representation of the reads mapping over a transposase gene in the prematurely circularized contig (light blue in B and C), showing the lack of read support around the gene. The full figure is available in Supplementary Figure 2.
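The criterion in panel (A) can be written as a simple filter. The sketch below is only an illustration under assumptions: the `Contig` record and its fields are hypothetical, and how circularity and ribosomal protein counts are obtained (assembler output, gene annotation) is left out. The reasoning behind the filter, as I read it, is that a small circular contig carrying several ribosomal proteins is more likely a fragment of a chromosome than a plasmid or virus.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    # Hypothetical record; field names are assumptions for this example.
    name: str
    length: int               # contig length in bp
    circular: bool            # reported as circular by the assembler
    ribosomal_proteins: int   # number of ribosomal protein genes annotated


def suspicious_circular_contigs(contigs, max_length=500_000, min_ribosomal=3):
    """Return circular contigs under max_length bp with at least min_ribosomal ribosomal proteins."""
    return [
        c for c in contigs
        if c.circular
        and c.length < max_length
        and c.ribosomal_proteins >= min_ribosomal
    ]
```

Contigs returned by this filter are candidates for premature circularization and still need checking against the reads, for example by looking at read support around the circularization point.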
floriantrigodet.bsky.social
We observed chimeric contigs where the assembly software reported a single contig that brought together sequences from two distinct taxa, sometimes three or more, and even sequences that belonged to distinct domains of life.
Figure 2. Multi-domain and multi-phyla contigs. Six contigs from metaMDBG, metaFlye and hifiasm-meta. For each contig we display the GC content, the coverage by the metagenomic reads used for its assembly, and the gene-level taxonomy. For each assembly breakpoint, we display a zoomed-in detail of the read mapping from IGV. In these subplots, red arrows at the end of the mapped read indicate clipping, the coloring at the end of these reads indicates that the following portion of the read mapped to another contig, and similar colors indicate that multiple reads continue to map on the same contig. Blue markers indicate large indels (> 150 bp).
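Read clipping of the kind marked in these mapping views can also be read directly from the CIGAR string of each alignment in a SAM/BAM file. The sketch below is a minimal illustration, not the workflow's code: the function names and the `min_clip` threshold are assumptions. It sums soft-clipped (`S`) and hard-clipped (`H`) bases at each end of a read.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_lengths(cigar):
    """Return (left_clip, right_clip) in bases for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    left = 0
    for n, op in ops:
        if op not in "SH":
            break
        left += n
    right = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        right += n
    return left, right


def is_clipped(cigar, min_clip=100):
    """True if either end of the read is clipped by at least min_clip bases.

    min_clip is a hypothetical threshold, not a value from the study.
    """
    left, right = clip_lengths(cigar)
    return max(left, right) >= min_clip
```

For example, `clip_lengths("5H10S80M3I20M200S")` gives 15 clipped bases on the left and 200 on the right. Many reads clipped at the same contig position point to a junction that the reads do not support.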
floriantrigodet.bsky.social
Unlike traditional evaluations of assembly software, our evaluation made heavy use of unassembled long reads to quantify how well the assembled sequences matched the long reads. The assemblers generally worked great, but sometimes they didn't at all. Here are a few things we observed:
floriantrigodet.bsky.social
But then we learned that Jill Banfield's group was dealing with similar issues in soil samples. At that point we decided to take a deeper look at our long-read assemblers, and used the datasets they used, and added some novel ones into the mix to re-evaluate them.
A schematic representation of long reads mapping to a contig with multiple types of read disagreement with the reference, including indel and single nucleotide variants representing more than half or all the coverage, and clipping events spanning the entire coverage
floriantrigodet.bsky.social
Our pangenomes were certainly raising some red flags. But we were not sure if fractions of genomes as circular elements were a feature of nature that we missed due to the years of short-read assemblies. With changing technology, you sometimes learn things you didn't even know you were missing.
floriantrigodet.bsky.social
While examining the assembly results, we were initially extremely happy with the very large number of giant and occasionally circular contigs. But we quickly realised that many of the circular contigs did not make any sense.
floriantrigodet.bsky.social
Assemblers play a significant role in turning individual reads into long genomic segments, and have tremendous implications for downstream work. Last year we were very excited to apply some of the new assemblers to our PacBio datasets from marine samples.
floriantrigodet.bsky.social
With technologies such as PacBio and ONT, genome-resolved metagenomics is experiencing its second coming. Complete and circular genomes from all domains of life, as well as viruses and plasmids, all WITHOUT binning, seem right around the corner. That is, if we can actually assemble them.