Florian Huber
@me-datapoint.bsky.social
2.1K followers 590 following 48 posts
Professor of data science at HSD, @zdd-hsd.bsky.social | ML fan & critic | current research mostly #datascience, #machinelearning, #cheminformatics #dataviz #nlp | ✨ #openscience #openaccess #rse | living data point 🚲
Pinned
me-datapoint.bsky.social
New preprint out!
1/4

@julianpollmann.bsky.social and I went down several rabbit holes to assess some commonly used molecular fingerprints.

Bottom line: For large datasets, make an effort to select suitable settings. "We used Tanimoto" is not good enough.

--> www.biorxiv.org/content/10.1...
Sketch of count/binary fingerprints and weighting options.
me-datapoint.bsky.social
Impressive milestone by @europarl.europa.eu to ban "veggie-burger" and other great dangers to humanity. 100 million confused meat-eaters can now finally navigate the menus again.
me-datapoint.bsky.social
New #matchms release (0.31)🚀

With functionalities that were on our TODO list for a looooong time: Flash Entropy and BLINK scores! The new "FlashSimilarity" allows computing modified cosine, spectral entropy etc., about 100x faster (or more if you use Linux).

#Python #opensource #massspec
Reposted by Florian Huber
jjjvanderhooft.bsky.social
Ready for the 4th International Summer 🌞 School on Non-Target Metabolomics at DTU - Technical University of Denmark #Copenhagen organized by Martin Hansen & Scott Jarmusch with a team of local and international helpers and instructors 😎
Thanks Lone Gram for opening the school 🙌
#CompMetabolomics
me-datapoint.bsky.social
Great post!

We also noted the same thing, which triggered us to point out some pitfalls of various fingerprints --> www.biorxiv.org/content/10.1...
www.biorxiv.org
Reposted by Florian Huber
bobehayes.bsky.social
BREAKING NEWS: #AI coding may not be helping as much as you think

"But for now, the disconnect between what coders thought they would get out of the tools efficiency-wise and what they actually did get out of them is cause for reevaluation." ~ @garymarcus.bsky.social

garymarcus.substack....
BREAKING NEWS: AI coding may not be helping as much as you think
Coding has been the strongest use case. But a new study from METR just dropped.
garymarcus.substack.com
me-datapoint.bsky.social
Here in @duesseldorf.bsky.social, for now every single parking space is still being defended...

(and unfortunately not only here)
annikajoeres.bsky.social
Paris is betting on green roofs, fewer parking spaces, and urban forests, and is recording markedly cooler temperatures there during heat waves.
Mayor Anne Hidalgo is saving lives because she takes the climate crisis seriously.
That's just how it is.
www.derstandard.at/story/300000...
Paris cools down during the heat wave thanks to greening
The greening of Paris under Mayor Anne Hidalgo can lower temperatures markedly, as is evident above all during the summer heat
www.derstandard.at
Reposted by Florian Huber
aoc.bsky.social
I don’t think anyone is prepared for what they just did w/ ICE.

This is not a simple budget increase. It is an explosion - making ICE bigger than the FBI, US Bureau of Prisons, DEA, & others combined.

It is setting up to make what’s happening now look like child’s play. And people are disappearing.
Reposted by Florian Huber
d-64.org
D64 @d-64.org · Jul 1
Hey, public administration digitalizers! On July 17 we are launching a new working group on the digitalization of public administration. Your expertise from the public sector is needed! Together we will shape the future of public, digital administration 💪

d-64.org/veranstaltun...
Shaping the administration of digitalization: new working group launches! | D64 – Zentrum für digitalen Fortschritt
On July 17, 2025, we are founding a new working group on the digitalization of public administration. Here we bring digital expertise and policy-making together.
d-64.org
Reposted by Florian Huber
me-datapoint.bsky.social
@jorainer.bsky.social and @philouail.bsky.social gave a great overview of the ecosystem around #RforMassSpectrometry and #XCMS!

#MetSoc25
I am super glad they now also provide options to combine with #Python and #matchms (thanks🙏)
Reposted by Florian Huber
jorainer.bsky.social
📢 Poster 1001 at #MetSoc2025: Marilyn De Graeve on our #SpectriPy #rstats package to integrate #python and #rstats packages for #MassSpec data analysis. TODAY
Reposted by Florian Huber
sneumann.bsky.social
Hi, in case your phone didn't pick up the QR code to the slides of my Hitch-Hikers Guide to Computational Metabolomics talk this morning at #Metabolomics2025, featuring #xcms, #massbank, not #metfrag but #CASMI and #MetFamily, please find them at doi.org/10.5281/zeno...
me-datapoint.bsky.social
Great keynote by @sneumann.bsky.social at #MetSoc25, strongly advocating for #opensource , data-sharing, and making things interoperable.

Glad to also spot #matchms in this universe :)
Slide from presentation of Steffen Neumann
Reposted by Florian Huber
jjjvanderhooft.bsky.social
Proud of Niek de Jonge who did a fantastic job in presenting his work on cross-ion mode spectral similarity scoring! 😎 👏
Work with Florian Huber @me-datapoint.bsky.social

#metabolomics #CompMetabolomics #MetSoc25 #MS2DeepScore
me-datapoint.bsky.social
4/4
We also highlight options for count fingerprints, such as log-counts and IDF-weighted counts. The latter can be used to adjust the bit importance to a dataset of your choice.

An example use case is chemical space visualization.

Preprint: www.biorxiv.org/content/10.1...
Chemical Space Visualizations using UMAP and various molecular fingerprints.
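
A minimal sketch of the two weighting options named in this post, log-counts and IDF-weighted counts, on toy sparse count fingerprints. Everything here is an assumption made for illustration: the `{bit_id: count}` representation, the function names, the plain `log(N / n_bit)` form of IDF, and the data. It is not the preprint's implementation.

```python
import math
from collections import Counter


def log_counts(fp):
    """Dampen large counts: c -> log(1 + c)."""
    return {bit: math.log1p(count) for bit, count in fp.items()}


def idf_weights(fingerprints):
    """IDF per bit: log(N / n_bit), where n_bit is the number of
    fingerprints in the reference dataset that contain the bit."""
    n = len(fingerprints)
    doc_freq = Counter(bit for fp in fingerprints for bit in fp)
    return {bit: math.log(n / df) for bit, df in doc_freq.items()}


def apply_idf(fp, weights):
    """Scale each count by its IDF weight.
    Bits never seen in the reference dataset get weight 0 here,
    which is a simplification chosen for this sketch."""
    return {bit: count * weights.get(bit, 0.0) for bit, count in fp.items()}


# Hypothetical toy dataset of sparse count fingerprints.
dataset = [
    {1: 3, 7: 1},
    {1: 1, 9: 2},
    {1: 2, 7: 4, 9: 1},
]

weights = idf_weights(dataset)
weighted = [apply_idf(log_counts(fp), weights) for fp in dataset]
```

Bit 1 occurs in every toy fingerprint, so its IDF weight is zero and it stops contributing, while the rarer bits 7 and 9 keep a positive weight. Swapping the reference dataset changes the weights, which is what adjusting bit importance to a dataset amounts to in this sketch.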
me-datapoint.bsky.social
3/4
A huge issue is bit collisions.
Fingerprints with a high bit occupation (RDKit, MAP4) often lead to (1) arbitrary misinterpretations, (2) shifts to high Tanimoto scores, (3) very different handling of small and large molecules.

--> Consider using sparse fingerprints!
--> Morgan >> MAP4 / RDKit
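
A toy illustration of how bit collisions can inflate Tanimoto scores. The feature IDs are made up, and folding by simple modulo is an assumption standing in for the hashing and folding that real fingerprint implementations use.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fold(features, n_bits):
    """Fold sparse feature IDs into a fixed-size bit vector (toy: modulo)."""
    return {f % n_bits for f in features}


# Two hypothetical molecules that share no substructure feature at all.
mol_a = {10, 21, 35, 48, 77}
mol_b = {18, 29, 43, 56, 90}

sparse_score = tanimoto(mol_a, mol_b)                     # unfolded features
wide_score = tanimoto(fold(mol_a, 1024), fold(mol_b, 1024))  # low occupation
narrow_score = tanimoto(fold(mol_a, 8), fold(mol_b, 8))      # high occupation
```

With 1024 bits nothing collides and the score stays at 0, as for the unfolded features. With only 8 bits, every feature of `mol_b` lands on a bit already used by `mol_a`, so two unrelated molecules end up with identical folded fingerprints and a score of 1.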
me-datapoint.bsky.social
2/4
We focused on weaknesses of the fingerprints.
Many show frequent duplicates, i.e., the same fingerprint for different compounds. Most problematic: this can include *very* different compounds ending up with identical fingerprints.

- MAP4 >> Morgan-type >> daylight
- count >> binary

#cheminformatics
Benchmarking plot on fingerprint duplications.
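
One way to check a dataset for the fingerprint duplicates described in this post is to group compounds by their exact fingerprint. The sketch below assumes sparse `{bit_id: count}` fingerprints, hypothetical compound names, and toy data; it does not reproduce the preprint's benchmark.

```python
from collections import defaultdict


def duplicate_groups(fingerprints):
    """Return groups of compound names that share an identical fingerprint."""
    groups = defaultdict(list)
    for name, fp in fingerprints.items():
        key = tuple(sorted(fp.items()))  # hashable form of the fingerprint
        groups[key].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def binarize(fp):
    """Drop the counts and keep only which bits are set."""
    return {bit: 1 for bit in fp}


# Hypothetical toy count fingerprints.
count_fps = {
    "compound_a": {4: 2, 11: 1},
    "compound_b": {4: 5, 11: 1},
    "compound_c": {4: 2, 11: 1},
    "compound_d": {8: 1},
}
binary_fps = {name: binarize(fp) for name, fp in count_fps.items()}

count_dupes = duplicate_groups(count_fps)
binary_dupes = duplicate_groups(binary_fps)
```

In this toy data the count fingerprints still separate `compound_b` from the other two, while the binarized versions collapse all three into one duplicate group.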
me-datapoint.bsky.social
Good start for me at #metabolomics2025 with a hands-on workshop on MS2LDA by Jonas Dietrich, Rosina Torres Ortega and @jjjvanderhooft.bsky.social.
me-datapoint.bsky.social
Went by train to #Prague for #metabolomics2025.

These are the kinds of moments that remind me how great the European project is. No border controls, no visas. Just a train following a river to the neighboring country.
Elbe river seen from a train somewhere after Dresden.
Reposted by Florian Huber
garymarcus.bsky.social
Orwell’s 1984, but with LLMs
me-datapoint.bsky.social
All results here: fahrradklima-test.adfc.de/ergebnisse

(Unfortunately, it looks pretty sad especially from the Ruhr area down to Cologne)
Results
fahrradklima-test.adfc.de