Glenn K. Lockwood
glennklockwood.com
@glennklockwood.com
I am a supercomputing enthusiast, but I usually don't know what I'm talking about. I post about large-scale infrastructure for #HPC and #AI.
NERSC just got their first rack of the Doudna early access system from Dell--36 nodes of GB200 NVL4 in a 50OU rack. It's TALL!

#HPC

Source: www.linkedin.com/posts/nation...
January 14, 2026 at 12:15 AM
This feels like the same thing as when cosmetics companies all started saying they don't test on animals. Sounds good, but the reality is that the market is saturated. Rural municipalities don't need tax breaks or other incentives to attract datacenter buildout anymore.
January 14, 2026 at 12:11 AM
Finding that there's no intuitive way to understand how key/value vectors can/can't be cached during inference w/o also knowing the basics of autoregressive decode. It's proving to be a great filter for thought leader types who don't actually know what they're talking about.

#schadenfreude
January 13, 2026 at 6:26 PM
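A minimal sketch of why the two ideas are linked, assuming a toy single-head attention layer in pure Python. The `project` function is a hypothetical stand-in for learned query/key/value projections, and the token embeddings are supplied up front rather than sampled. During autoregressive decode, each step adds one new token, and earlier tokens' keys and values never change under causal attention, so they can be stored and reused instead of recomputed.

```python
import math

def project(x, scale):
    """Hypothetical stand-in for a learned projection (W_q, W_k, or W_v)."""
    return [scale * xi for xi in x]

def attend(q, keys, values):
    """Softmax attention of one query over every key/value seen so far."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]

def decode_with_cache(token_embeddings):
    """Each step computes K and V only for the newest token."""
    k_cache, v_cache, outputs = [], [], []
    for x in token_embeddings:
        k_cache.append(project(x, 0.5))
        v_cache.append(project(x, 2.0))
        outputs.append(attend(project(x, 1.0), k_cache, v_cache))
    return outputs

def decode_without_cache(token_embeddings):
    """Each step recomputes K and V for the whole prefix."""
    outputs = []
    for t in range(1, len(token_embeddings) + 1):
        prefix = token_embeddings[:t]
        keys = [project(x, 0.5) for x in prefix]
        values = [project(x, 2.0) for x in prefix]
        outputs.append(attend(project(prefix[-1], 1.0), keys, values))
    return outputs

tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
print(decode_with_cache(tokens) == decode_without_cache(tokens))
```

Both functions produce the same outputs; the cached version just does one projection per step instead of one per prefix token. Only the newest token's query is ever needed, which is why keys and values are cached and queries are not.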
The idea is that VAST will qualify its software on whatever hardware is already in someone's datacenter, providing more usable capacity on the same SSDs than whatever software was originally installed on it. Relies on VAST's lower EC overhead (146+4) and fancy global data reduction algorithms.
January 13, 2026 at 5:22 PM
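A rough sketch of the capacity arithmetic behind the erasure coding claim, reading "146+4" in the usual N+M sense of 146 data strips plus 4 parity strips. The 8+2 layout and the 1,000 TB raw capacity are hypothetical comparison points, not figures from the post, and data reduction is not modeled at all.

```python
def usable_fraction(data_strips, parity_strips):
    """Fraction of raw capacity left for data under an N+M erasure code."""
    return data_strips / (data_strips + parity_strips)

RAW_TB = 1000.0  # hypothetical raw SSD capacity

layouts = {
    "146+4 (from the post)": (146, 4),
    "8+2 (hypothetical narrower stripe)": (8, 2),
}

for name, (n, m) in layouts.items():
    frac = usable_fraction(n, m)
    print(f"{name}: {frac:.1%} usable, {RAW_TB * frac:.0f} TB of {RAW_TB:.0f} TB raw")
```

The wider stripe spends 4 of every 150 strips on parity, versus 2 of every 10 in the narrower one, which is where the extra usable capacity on the same SSDs would come from.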
Reposted by Glenn K. Lockwood
TACC is proud to congratulate Stacyann Nelson on receiving the 2025 Joseph A. Johnson Award of Excellence. 🎉

Stacyann was an inaugural participant in the Advanced Computing Student Collaborative in 2016. ACSC is presented by the NSF LCCF at TACC.

Learn more: bit.ly/3NiNZab
January 13, 2026 at 2:08 PM
Last year, DOE issued an RFP to build commercial AI datacenters on DOE-owned land. That resulted in announcements including Argonne's 100K-GPU Solstice system and the Genesis Mission. Looks like DOD is following the same pattern; let's see what massive #AI systems arise from this request.

#HPC
January 12, 2026 at 5:57 PM
The CFP for CUG 2026, to be held in Nice, France, is now open! Wish I could go. CUG was one of my favorite annual #HPC conferences.

cug.org/cug-2026/
CUG 2026 – CUG
cug.org
January 12, 2026 at 2:51 PM
Reposted by Glenn K. Lockwood
I don't get the chance to talk about Arkouda—a Python library for doing exploratory data analytics interactively on supercomputers—nearly as often as I do Chapel, so this talk at the University of Washington eScience Institute (uwescience.bsky.social) that wrapped up 2025 was a fun one for me.
Interested in doing exploratory analytics interactively on data sets that exceed your workstation’s capacity? Learn about Arkouda, and how it compares to Pandas, Dask, and Spark, in this talk and demo given at the @uwescience.bsky.social Data Science Seminar:

www.youtube.com/watch?si=c_o...
UW Data Science Seminar: Brad Chamberlain
YouTube video by UW eScience Institute
www.youtube.com
January 9, 2026 at 4:42 PM
The problem with saying "just tier to HDD if flash prices are too high" is HDDs are scarce too. HDD manufacturers opted not to invest in expanding fab capacity b/c they'd never recover the capex given declining HDD sales. HDDs are a myopic investment right now.

blocksandfiles.com/2026/01/09/s...
Sky-high flash prices: data reduction or tier to disk?
As SSD prices rise we wonder whether we’re seeing a panic-buying storage media price rise bubble or is AI-driven demand real? In either case, what should we do about it? SSDs cost more than disks so a...
blocksandfiles.com
January 10, 2026 at 1:35 AM
Yeah, so this doesn’t feel good. Seems like it’s preying on people’s ignorance. Maybe that’s what a lot of marketing winds up being in any case.
They are not... they just generate expectations and good PR. One should look into the business they generate by being and operating there (e.g. buying electricity) and the employment they generate, but the initial capex does not stay in the state in any meaningful form.
January 9, 2026 at 5:57 PM
When these companies pledge $XX billion investment, doesn’t that include the XX-1 billion that winds up going to NVIDIA? Unclear how these represent investments in the region
January 9, 2026 at 4:31 PM
Is there anyone smarter than me who can explain what NVIDIA's new Inference Context Management Storage actually is? I collated everything I could find across all the press releases (glennklockwood.com/garden/ICMS), and it just looks like parallel storage attached over nonblocking frontend Ethernet.
NVIDIA ICMS
NVIDIA’s Inference Context Management Storage (ICMS) appears to be an architectural blueprint whereby Pooled NVMe (file? block?) is attached to the frontend (north-south) network ...
glennklockwood.com
January 8, 2026 at 10:48 PM
Back at GTC'25, NVIDIA said they would change the NVL## nomenclature to reflect die count, and that Vera Rubin would be NVL144 despite having 72 GPUs. But all the CES-related announcements still refer to NVL72. Did they reverse course on the name change? Or is this a junior version? So confusing.
January 8, 2026 at 9:46 PM
I somehow learned both a great deal and nothing at all after reading this.
January 8, 2026 at 12:09 AM
It's 2026, and Google Docs still lacks very basic features that are essential to editing documents in a grown-up work environment. Exhibit A: you cannot hide suggested deletions, making a doc with suggestions completely unreadable.

support.google.com/docs/thread/...
how can I hide deletions. I'm not used to working in Docs and I've been hunting to no avail. - Google Docs Editors Community
support.google.com
January 7, 2026 at 6:13 PM
Little bit of good news - Congress is quietly pushing for NSF/NASA/DOE budgets that ignore the dramatic cuts proposed by the earlier budget blueprint: www.science.org/content/arti...
Congress set to reject Trump’s major budget cuts to NSF, NASA, and energy science
Appropriators agree instead to keep this year’s spending nearly level
www.science.org
January 5, 2026 at 11:26 PM
I use this read-it-later app that continually resurfaces old things I’ve read and found interesting. Headed into 2026 in a job that is all #AI all the time, I feel like this quote is more meaningful than ever. Everyone is an expert, yet so few actually understand.

Source: fs.blog/two-types-of...
December 31, 2025 at 7:42 PM
Reposted by Glenn K. Lockwood
Interested in features, design, history or migration to #FluxFramework? We've updated our admin guide for you! flux-framework.readthedocs.io/projects/flu...

Have questions? Let us know! We are aiming to better engage with all of you, our community, in 2026 🎉 starting with joining @hpsf.bsky.social!
December 30, 2025 at 6:48 PM
What dingdong decided that the first week of the new year is a good time to host what has become a major annual technology conference?

Also, what dingdong decided that datacenter products should be announced at something originally called the “Consumer Electronics Show?”

#grumble #happynewyear
December 31, 2025 at 7:31 PM
I pitch in and so should you. Plus there are stickers.
December 21, 2025 at 6:04 PM
Sync training across geos isn’t new, tho doing it b/c of training data governance is. But training across AMD+NVIDIA is new; leave it to DOE to demonstrate such odd methods!

Unclear what separates “federated learning” from multicluster training tho.

www.sandia.gov/labnews/2025...

#AI
Three national security laboratories, one AI model – LabNews
Sandia, Los Alamos and Lawrence Livermore national laboratories have proven that it's possible to share a large language model without compromising sensitive data from each lab.
www.sandia.gov
December 19, 2025 at 3:22 PM
Very last minute, but I'm giving a talk online tomorrow (Thurs Dec 18) about my analysis of over 85K model training checkpoints and implications for system design. Punchline is "less bandwidth makes training go faster."

Registration required: www.vastdata.com/events/vast-...

#AI #storage
Smarter, Not Faster: The Storage Reality Hidden in 85,000 AI Checkpoints - VAST Data
Stop chasing multi-terabyte-per-second performance for your global storage. Focus on "checkpoint overlap," not raw bandwidth. Invest your budget in what matters most: GPUs.
www.vastdata.com
December 18, 2025 at 12:32 AM
NERSC recently did a wholesale replacement of its FDR InfiniBand storage fabric with RoCE. The IB was a greenfield installation back when I started in 2015, and replacing it with a competing technology in production is quite the feat. Glad to hear it succeeded.

www.nersc.gov/news-and-eve...
Network Upgrades Pave the Way to a Faster Future | NERSC
The National Energy Research Scientific Computing Center (NERSC), a U. S.
www.nersc.gov
December 17, 2025 at 12:19 AM
Idle CPUs in GPU clusters do not “exceed $250 million per year in electricity.” This is crazy math. What are they using as fuel here??
December 17, 2025 at 12:08 AM
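One way to sanity-check a figure like this is to convert the annual bill into the continuous power draw it implies. The sketch below does only that conversion. The electricity prices are assumed values chosen for illustration, not numbers from the post or its source.

```python
HOURS_PER_YEAR = 24 * 365

def implied_megawatts(annual_cost_usd, usd_per_kwh):
    """Continuous power draw (MW) that would cost annual_cost_usd per year."""
    kwh_per_year = annual_cost_usd / usd_per_kwh
    return kwh_per_year / HOURS_PER_YEAR / 1000.0

ANNUAL_COST_USD = 250e6  # the figure quoted in the post

# Assumed electricity prices in USD per kWh (hypothetical)
for price in (0.05, 0.08, 0.12):
    mw = implied_megawatts(ANNUAL_COST_USD, price)
    print(f"${price:.2f}/kWh -> {mw:.0f} MW drawn around the clock")
```

At any of these assumed prices the result is hundreds of megawatts of round-the-clock draw attributed to idle CPUs alone.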
Does this mean no more dirt-cheap NRE from Slurm? Or will Slurm development no longer be coin-operated? Would love to see serious engineering effort go into modernizing Slurm, but this could go in many directions.
As hybrid #HPC + #AI + #Quantum workflows become more prevalent, orchestration of complex systems is becoming a key component of successful deployment. Looking forward to accelerating these capabilities for the open-source community.
NVIDIA Acquires Open-Source Workload Management Provider SchedMD
NVIDIA will continue to distribute SchedMD’s open-source, vendor-neutral Slurm software, ensuring wide availability for high-performance computing and AI.
blogs.nvidia.com
December 15, 2025 at 5:40 PM