Leland McInnes
@lelandmcinnes.bsky.social
2.6K followers 300 following 39 posts
A Mathematician dabbling in Data Science, especially unsupervised learning and data exploration. UMAP, HDBSCAN, PyNNDescent, DataMapPlot. (He/Him)
Reposted by Leland McInnes
bschmidt.bsky.social
Despite the gutting of the National Center for Education Statistics, the dept of Ed *did* manage to release 2024 college major counts in the usual format, so I can run it through the same code I do every year. First off, the change since their peak for the largest fields: another year of drops.
A line chart captioned "The big humanities majors were mostly still falling in 2024", showing drops since 2008 for most humanities fields, ranging from 10% (Study of the Arts) to 68% (religion), with history, English, and foreign languages all clustered around 50-55%
Reposted by Leland McInnes
patcon.bsky.social
I'm very much a learner, but perhaps you're asking whether aspects of matrix factorisation approaches to dimensionality reduction apply here. LocalMAP, however, is a KNN approach with a matrix factorisation initialisation. h/t @lelandmcinnes.bsky.social for his attempts to describe these youtu.be/9iol3Lk6kyU
A Bluffer's Guide to Dimension Reduction - Leland McInnes (YouTube video by PyData)
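KNN-based methods such as UMAP (and, per the post, LocalMAP) start by building a k-nearest-neighbour graph over the input points. A minimal brute-force sketch in Python, assuming Euclidean distance and small inputs; it costs O(n²) distance computations, whereas libraries such as PyNNDescent use approximate search instead. The function name `knn_graph` is hypothetical:

```python
import math


def knn_graph(points: list[list[float]], k: int) -> list[list[int]]:
    """For each point, return the indices of its k nearest other points (exact, brute force)."""
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    graph = []
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point, paired with the index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        # Tuples sort by distance first; equal distances fall back to the smaller index.
        dists.sort()
        graph.append([j for _, j in dists[:k]])
    return graph


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 9.0]]
print(knn_graph(points, 2))  # [[1, 2], [0, 2], [0, 1], [1, 2]]
```

Note that the outlier at (10, 9) still gets k neighbours; that always-connected neighbourhood is part of why KNN approaches focus on local structure.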
Reposted by Leland McInnes
unireps.bsky.social
📢 Save the date!
Join us for the next @ellis.eu x UniReps Speaker Series!
📅 27th August – 16:00 CEST
📍https://ethz.zoom.us/j/66426188160
🎙️ Speakers: Keynote by @lelandmcinnes.bsky.social & Flash Talk by Yu (Demi) Qin
🔔 Stay updated by joining our Google group: groups.google.com/u/2/g/ellis-...
Reposted by Leland McInnes
domoritz.de
🚀 We've just open-sourced Embedding Atlas – a tool for exploring large embedding spaces through rich, interactive visualizations 📊.
Screenshot of Embedding Atlas showing the embedding view on the left, a table at the bottom, and charts on the right.
Reposted by Leland McInnes
astroarxiv.bsky.social
Meteoroid stream identification with HDBSCAN unsupervised clustering algorithm. Eloy Peña-Asensio et al. https://arxiv.org/abs/2507.01501
Reposted by Leland McInnes
bendavidsteel.bsky.social
Ever wanted to pan through the latent🌌 space of TikTok videos? Made using the amazing toponymy and datamapplot from @lelandmcinnes.bsky.social and data from @jurgenpfeffer.bsky.social's and my first complete TikTok slice. Link below.
Reposted by Leland McInnes
scipyconf.bsky.social
🎤 Speaker Spotlight: Leland McInnes
Join Leland at #SciPy2025 for his talk "DataMapPlot: Rich Tools for UMAP Visualizations." 📊

Discover powerful new ways to explore high-dimensional data!
🔗 scipy2025.scipy.org
lelandmcinnes.bsky.social
I'll be giving a talk about DataMapPlot for visualizing data maps at SciPy this year. I would love to meet potential users and chat about where to go next.

cfp.scipy.org/scipy2025/ta...
SciPy 2025 poster advertising a talk by Leland McInnes about "DataMapPlot: Rich Tools for UMAP Visualizations"
Reposted by Leland McInnes
lelandmcinnes.bsky.social
I also updated the arXiv data map example to make use of new features in datamapplot.
lmcinnes.github.io/datamapplot_...

You can tweak parameters and build your own version:
gist.github.com/lmcinnes/e11...
Reposted by Leland McInnes
msbr89.bsky.social
OMG I am so glad someone finally did this.

Thank you 🙏 @lelandmcinnes.bsky.social

This will now consume hours and hours of my time.

lmcinnes.github.io/datamapplot_...
Reposted by Leland McInnes
datavisfriendly.bsky.social
Great idea. Did no one think of this before?
lelandmcinnes.bsky.social
It should be possible to build one in French following this: gist.github.com/lmcinnes/951...

It pulls the data from "hf://datasets/Cohere/wikipedia-2023-11-embed-multilingual-v3/en/*.parquet", but you can swap in "hf://datasets/Cohere/wikipedia-2023-11-embed-multilingual-v3/fr/*.parquet" instead.
Interactive Data Map of Wikipedia (GitHub Gist)
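The language swap described in that post amounts to changing one path segment. A minimal Python sketch, assuming the only difference between language subsets is the directory name (`en`, `fr`, and so on); the helper `wikipedia_parquet_glob` is hypothetical and not taken from the gist, and loading the files still needs a reader that understands `hf://` URLs:

```python
# Dataset name as given in the post; only the language directory changes.
DATASET = "Cohere/wikipedia-2023-11-embed-multilingual-v3"


def wikipedia_parquet_glob(lang: str) -> str:
    """Return the hf:// glob pattern for one language subset (hypothetical helper)."""
    if not lang or "/" in lang:
        raise ValueError(f"unexpected language code: {lang!r}")
    return f"hf://datasets/{DATASET}/{lang}/*.parquet"


print(wikipedia_parquet_glob("fr"))
# hf://datasets/Cohere/wikipedia-2023-11-embed-multilingual-v3/fr/*.parquet
```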
lelandmcinnes.bsky.social
You can build your own: gist.github.com/lmcinnes/951...

Replace "en" with "fr" and everything should work.

(Please excuse my Google Translate errors!)
lelandmcinnes.bsky.social
Thank you to all the people who contributed to DataMapPlot and Toponymy! Toponymy is still very much in development, so please check it out, and if you have ideas or features to add, consider contributing.
Documentation for Toponymy is coming soon.
lelandmcinnes.bsky.social
But most importantly you can build this yourself using open source tools. A notebook with full end-to-end code is here: gist.github.com/lmcinnes/951...

You can use the same tools and techniques to build a map for your own data.
lelandmcinnes.bsky.social
It does provide a novel way to explore Wikipedia though.

You can see the scope of all of English language Wikipedia at once.

There are surprising clusters (Every Polish village; Japanese railway stations; etc.), dense topics, and surprising connections to be found.
lelandmcinnes.bsky.social
All of this is really just a tech demo for the tools backing it: Toponymy for creating topics and topic labels, and DataMapPlot for creating the interactive visualization.

github.com/TutteInstitu...
github.com/TutteInstitu...
GitHub - TutteInstitute/toponymy
lelandmcinnes.bsky.social
Explore Wikipedia through a data map. Pages are grouped by semantic similarity into topic clusters.
Hover to see details, zoom to explore more fine-grained topics, click to go to a page. Search by page name to find interesting starting points for exploration.

lmcinnes.github.io/datamapplot_...
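"Grouped by semantic similarity" means pages whose embedding vectors point in similar directions end up near each other. A minimal Python sketch of one common similarity measure, cosine similarity; the post does not name the metric, so cosine is an assumption here, and the three-dimensional "embeddings" below are made-up toy values (real text embeddings typically have hundreds of dimensions):

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v: 1.0 = same direction, 0.0 = orthogonal."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def most_similar(query: str, embeddings: dict[str, list[float]]) -> str:
    """Return the page (other than query) whose embedding is most similar to query's."""
    q = embeddings[query]
    others = (name for name in embeddings if name != query)
    return max(others, key=lambda name: cosine_similarity(q, embeddings[name]))


# Hypothetical toy vectors, not real embeddings.
pages = {
    "Kraków": [0.9, 0.1, 0.0],
    "Warsaw": [0.8, 0.2, 0.1],
    "Shinjuku Station": [0.1, 0.9, 0.3],
}
print(most_similar("Kraków", pages))  # Warsaw
```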
Reposted by Leland McInnes
scipyconf.bsky.social
🔥 Meet our Keynote Speakers for #SciPy2025!

Dr Malvika Sharan, co-Director of Open Life Science (OLS) and a senior researcher at The Alan Turing Institute, will be sharing her expertise with us at our favorite conference.

You can't miss her ➡️ hubs.la/Q03sdlsb0
Reposted by Leland McInnes
scipyconf.bsky.social
🔥 Meet our Keynote Speakers for #SciPy2025!

Hon. Dr. Kathryn D. Huff 🇺🇸, nuclear engineer, policy leader, and former Assistant Secretary for the Office of Nuclear Energy will be joining us in Tacoma! 🙌

Don't miss her talk, grab your ticket now: hubs.la/Q03sdlsb0