Suhaib Khan
@suhaibkhan.bsky.social
Interested in HPC & large storage systems
Reposted by Suhaib Khan
A brave stake in the ground that defines what is (and isn’t) a parallel file system. I generally agree with Chris’ explanation. But I’m sure he’ll get hate from parallel storage elitists who don’t like how inclusive his take is.
November 26, 2025 at 4:50 PM
DOE to build an integrated discovery platform by linking together #supercomputers and other facilities at its 17 National Laboratories with industry and academia drawing on the expertise of roughly 40,000 scientists, engineers, and technical staff.

www.theregister.com/2025/11/25/t...
Trump orders nationwide AI Genesis Mission to drive science
DOE told to build a unified research platform linking federal compute, datasets, and national labs
www.theregister.com
November 25, 2025 at 4:58 PM
Reposted by Suhaib Khan
NVIDIA will introduce liquid-cooled busbars into the racks for its Vera Rubin platform, part of a broader evolution of data center racks and power design to support more powerful AI computing.
open.substack.com/pub/datacent...
As Densities Soar, AI Racks Add Liquid-Cooled Busbars
NVIDIA, Meta Bring Liquid Cooling into Power Chain for Extreme-Density Racks
open.substack.com
November 24, 2025 at 2:34 PM
Intel: We’re simplifying the Diamond Rapids platform with a focus on 16 Channel processors and extending its benefits down the stack to support a range of unique customers and their use cases. (Source: Intel Spokesperson to STH)

www.servethehome.com/intel-cancel...

@servethehome.com.web.brid.gy
Intel Cancels its Mainstream Next-Gen Xeon Server Processors
A major next-generation Intel Xeon platform has been removed from the company's roadmap. We have the details on the Diamond Rapids shift
www.servethehome.com
November 16, 2025 at 9:11 PM
Reposted by Suhaib Khan
Silly thread for a Saturday: some of the #HPC clusters I’ve worked on over the years.

First up is Cielo, a Cray XE6 I worked on at LANL! Which might actually be the prettiest supercomputer I’ve worked on.
November 15, 2025 at 5:55 PM
Reposted by Suhaib Khan
A random interesting fact about the Cray XE6 racks: air exhausted up from the top of the rack!

The result was that the lights above Cielo all started failing over time due to being blasted with hot air. And no good way to access them due to the size of the cluster.

Clearly an early photo 😁
November 15, 2025 at 7:35 PM
Reposted by Suhaib Khan
Exclusive: Amazon.com is joining Microsoft in supporting legislation that threatens to further limit Nvidia’s ability to export to China, a rare split between the chip designer and two of its biggest customers.
Amazon and Microsoft Back Effort That Would Restrict Nvidia’s Exports to China
The legislation in Washington would give tech leaders preferential access to chips at their data centers around the world.
on.wsj.com
November 14, 2025 at 7:27 AM
Reposted by Suhaib Khan
Exclusive: Samsung hikes memory chip prices by up to 60% as shortage worsens, sources say reut.rs/49QjvFI
Exclusive: Samsung hikes memory chip prices by up to 60% as shortage worsens, sources say
Samsung Electronics this month raised prices of certain memory chips - now in short supply due to the global race to build AI data centres - by as much as 60% compared to September, two people with knowledge of the hikes said.
reut.rs
November 14, 2025 at 9:00 AM
Reposted by Suhaib Khan
Very nice overview of the emerging UALink standard with nice features such as splitting packets in switches, in-network computing, high energy efficiency, and lowest silicon overhead: buff.ly/AgLvC1g

I'll be joining a panel at SC25 contrasting UALink and UEC next Wed: buff.ly/BeCMFcL Join us there
Introducing the UALink 200G 1.0 Specification Webinar
The Ultra Accelerator Link™ (UALink™) Consortium is an open industry standard group dedicated to advancing the UALink specification. The Consortium recently released the UALink 200G 1.0…
buff.ly
November 14, 2025 at 6:00 AM
DARPA’s Next-Generation Microelectronics Manufacturing (NGMM) program is building a packaging plant in Austin that is dedicated to 3D heterogeneous integration (3DHI).

spectrum.ieee.org/3d-heterogen...

@spectrum.ieee.org @darpa.mil
Why Is DARPA Betting on 3D Heterogeneous Integration?
Can a 1980s-era fab in Austin transform the future of microelectronics with 3D heterogeneous integration?
spectrum.ieee.org
November 13, 2025 at 10:14 PM
If you can read an analog clock correctly, you are still outperforming #AI in that regard.

spectrum.ieee.org/large-langua...

@spectrum.ieee.org
AI Struggles to Read Analog Clocks Correctly
AI struggles with analog clocks. What does this reveal about its limitations in image analysis?
spectrum.ieee.org
November 13, 2025 at 10:10 PM
Reposted by Suhaib Khan
Uhm, there is a typo in the headline, remove the "s" from insane. That should fix it.
November 13, 2025 at 12:06 AM
Reposted by Suhaib Khan
Paderborn's new Otus #supercomputer features 142,656 processor cores, including AMD “Turin” and #Nvidia H100 #GPUs, and 5PB of storage managed with IBM Spectrum Scale (formerly GPFS) file system. ow.ly/mnlX50XqGAs
‘Otus’ Now Open for Business at Germany's PC2 - HPCwire
The Paderborn Center for Parallel Computing (PC2) in Germany this week opened its newest and largest supercomputer for business. Otus, which sports more than 142,000 processor cores, will be used to r...
ow.ly
November 12, 2025 at 9:41 PM
Andrew Ng: #AI has stark limitations, and despite rapid improvements, it will remain limited compared to humans for a long time.

#AI is amazing, but it has unfortunately been hyped up to be even more amazing than it is.

www.deeplearning.ai/the-batch/is...
Safer (and Sexier) Chatbots, Better Images Through Reasoning, The Dawn of Industrial AI, and more...
The Batch AI News and Insights: I recently received an email titled “An 18-year-old’s dilemma: Too late to contribute to AI?” Its author, who gave...
www.deeplearning.ai
November 12, 2025 at 10:09 PM
Reposted by Suhaib Khan
Europe takes a major step in research connectivity! A new terabit network will link supercomputers across the continent, including EuroHPC’s @lumi-supercomputer.eu located in CSC’s data center in Kajaani 🚀

🔗 csc.fi/en/news/tera...
November 12, 2025 at 2:46 PM
Reposted by Suhaib Khan
Are 2030 AI hyperscalers capital constrained, power constrained, DRAM constrained, flash constrained, compute constrained, software constrained, or :-) demand constrained?
November 11, 2025 at 2:30 PM
Reposted by Suhaib Khan
Racks filled with GPUs and liquid cooling gear can now weigh 6,000 pounds or more, requiring new approaches to address human safety and investment protection. Google, Meta, and Microsoft are turning to robotics to safely move these huge racks.
open.substack.com/pub/datacent...
Data Centers Turn to Robots to Haul Multi-Ton Racks
Hyperscalers, OCP Ramp Up Robotics Teams for Worker Safety, Productivity
open.substack.com
November 11, 2025 at 12:47 PM
Reposted by Suhaib Khan
Scammers have a new way of getting into your pockets: by targeting your #AI assistant. They use prompt engineering, embedding code in emails that trick AI tools into taking malicious actions. Learn how to protect your digital presence. spectrum.ieee.org/ai-agent-phi...
November 9, 2025 at 4:01 PM
Reposted by Suhaib Khan
Can we build an #AI #Climate Scientist? Asked at the ADIA Lab Symposium in Abu Dhabi last week - now online at buff.ly/6igSeyg :-).

Much work to be done - this is outlining some directions of indicative results with a lot of potential to accelerate AI for Science.
November 9, 2025 at 9:24 AM
Reposted by Suhaib Khan
AI excels in complex tasks but falters at reading analog clocks—what does this tell us about its limitations?
AI Struggles to Read Analog Clocks Correctly
AI struggles with analog clocks. What does this reveal about its limitations in image analysis?
spectrum.ieee.org
November 8, 2025 at 2:01 PM
Reposted by Suhaib Khan
Nvidia's biggest scale-up domain is 72 GPUs. Google's is 9,216 TPUs.

Historically TPUs have trailed on FLOPS, memory, & bandwidth. That's no longer the case with Ironwood.

Google has a Blackwell-class TPU with absurd scale. More on @theregister.com ⬇️

www.theregister.com/2025/11/06/g...
TPU v7, Google's answer to Nvidia's Blackwell is nearly here
Chocolate Factory's homegrown silicon boasts Blackwell-level perf at massive scale
www.theregister.com
November 7, 2025 at 4:16 PM
Reposted by Suhaib Khan
Exclusive: Intel is losing a data center AI executive who previously helped lead the company’s Gaudi accelerator chip efforts and is now headed for a job at AMD, CRN has learned. www.crn.com/news/compone...
Exclusive: Intel Is Losing A Data Center AI Executive To AMD
Intel is losing a data center AI executive who previously helped lead the company’s Gaudi accelerator chip efforts and is now headed for a job at AMD, CRN has learned.
www.crn.com
November 6, 2025 at 9:04 PM
Reposted by Suhaib Khan
Collaborator and friend Dan Alistarh talks at ETH about using the new NvFP4 and MXFP4 block formats for inference.

Some going from "terrible" accuracy to acceptable using micro rotations to smoothen outliers in blocks.

arxiv.org/abs/2509.23202

Great collaboration and cool stuff
November 5, 2025 at 8:32 AM
Reposted by Suhaib Khan
Google recently posted a promo for using their managed Lustre service to accelerate inferencing via KV caching. Raises questions:

1. What ever happened to Google Managed DAOS (ParallelStore)? It performs better than Lustre.

2. Does Gemini use this? Unlikely. See glennklockwood.com/garden/atten...
attention
Attention is the mathematical operation within a transformer that allows different parts of the input to figure out how important they are to each other ...
glennklockwood.com
November 4, 2025 at 4:38 PM
OpenAI spreads the imaginary wealth beyond Microsoft with $38B AWS deal

Amazon deal still dwarfed by $250B Azure commitment made as part of OpenAI's for-profit transformation

www.theregister.com/2025/11/03/o...
OpenAI signs $38B cloud computing deal with AWS
Amazon deal still dwarfed by $250B Azure commitment made as part of OpenAI's for-profit transformation
www.theregister.com
November 3, 2025 at 6:56 PM