Stephan Hoyer
@stephanhoyer.com
1.7K followers 470 following 33 posts
Building AI climate models at Google. I also contribute to the scientific Python ecosystem, including Xarray, NumPy and JAX. Opinions are my own, not my employer's.
stephanhoyer.com
Do you take it yourself?
stephanhoyer.com
I think the problem is the algorithm. BlueSky's lack of a recommendation engine means that if you're not posting all the time, your stuff doesn't get seen.
stephanhoyer.com
The "ungamable impact" of OSS really resonates with me:
www.thonking.ai/i/158277004/...

Sadly it does not necessarily align with what makes for a successful career in Big Tech. But it's worth trying anyways! :)
Why PyTorch is an amazing place to work... and Why I'm Joining Thinking Machines
In which I convince you to join either PyTorch or Thinking Machines!
www.thonking.ai
stephanhoyer.com
I think it's just about readability with small font, the same reason why printed newspapers use many columns.
stephanhoyer.com
The losses here should be marked as millions not billions, right?
stephanhoyer.com
Pretty much anything that you can write in high-level array code like NumPy is very fast in JAX. Only intrinsically very loopy code is (relatively) slow, but JAX has excellent support for writing custom kernels in lower-level languages.
stephanhoyer.com
AD-compatible Python is at the cutting edge of performance these days, given its central role in large-scale AI training.

In my experience (mostly geophysical fluid dynamics) JAX has comparable perf to modern Fortran on CPUs, with a much easier path to GPUs and multi-device code.
stephanhoyer.com
Those are tiny chunks! Does that reduce max throughput for analytics use-cases compared to larger chunks?
stephanhoyer.com
Such exciting news!

For anyone who has tried the new sharding feature -- do you have any guidance on optimal shard sizes, if I want more flexibility in access patterns but still optimal throughput?
Reposted by Stephan Hoyer
weatherwest.bsky.social
Is there a link between #ClimateChange & increasing risk/severity of #wildfire in California--including the still-unfolding disaster? Yes. Is climate change the only factor at play? No, of course not. So what's really going on? [Thread] #CAfire #CAwx #LAfires iopscience.iop.org/a...
stephanhoyer.com
This is a huge milestone for cloud-native big scientific data!
zarr.dev
Zarr @zarr.dev · Jan 9
🎉 Zarr-Python 3 is here! 🎉

- Full support for Zarr v3 spec
- Chunk-sharding for more efficient data storage
- Major performance boosts with async I/O & parallel compression

💻 pip install --upgrade zarr
💻 conda install --channel conda-forge zarr

Blog post: https://buff.ly/3C3OwYw
stephanhoyer.com
We have a few other updates to share as well, which can be found in the inaugural edition of the NeuralGCM newsletter:
groups.google.com/g/neuralgcm-...

The biggest one is that NeuralGCM models are now freely available for everyone to use, including for commercial purposes!
NeuralGCM update: new models, new license, new datasets
groups.google.com
stephanhoyer.com
Interested in AI weather/climate modeling at #AGU24?

I'll be giving an overview talk on NeuralGCM at 11:30am Wed at the Google booth, and a talk on modeling precipitation with NeuralGCM at 4:25pm Wed in the session A34A.
stephanhoyer.com
When I hear "ML" I tend to think of old school (i.e., scikit-learn) machine learning, which is great but much less powerful than deep learning. So I would opt for "AI weather models" though that misses quite a bit of nuance.
stephanhoyer.com
This diagram is accurate historically, but recently AI seems to have become synonymous with deep learning.
stephanhoyer.com
The bottleneck for traditional models is data movement within the CPU, not data transfer to disk -- physics-based simulations do too little compute per byte (low arithmetic intensity) to fully utilize modern hardware.

AI is way better in this respect. It's easy to use lots of FLOPs on big matmuls!
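
To put rough numbers on "compute per byte", here is a back-of-the-envelope sketch. It is not taken from any real model: the function names, the 5-point stencil, the float32 element size, and the idealized memory traffic (each array moved through memory exactly once) are all assumptions chosen for illustration.

```python
def matmul_intensity(n: int, bytes_per_element: int = 4) -> float:
    """FLOPs per byte for multiplying two n x n matrices.

    Assumes (idealized) that A and B are each read once and C is written once.
    """
    flops = 2 * n**3  # n^3 multiply-add pairs, counted as 2 FLOPs each
    bytes_moved = 3 * n**2 * bytes_per_element  # read A, read B, write C
    return flops / bytes_moved


def stencil_intensity(points: int = 5, bytes_per_element: int = 4) -> float:
    """FLOPs per byte for one sweep of a `points`-point stencil update.

    Assumes (idealized) perfect cache reuse of neighbors, so each grid cell
    costs one read and one write to main memory.
    """
    flops = 2 * points - 1  # `points` multiplies plus `points - 1` adds per cell
    bytes_moved = 2 * bytes_per_element  # one read and one write per cell
    return flops / bytes_moved


if __name__ == "__main__":
    print(f"matmul (n=4096): {matmul_intensity(4096):.1f} FLOPs/byte")
    print(f"5-point stencil: {stencil_intensity():.3f} FLOPs/byte")
```

Under these assumptions a 4096 x 4096 float32 matmul comes out around 683 FLOPs per byte, while the stencil sweep stays near 1.1 no matter how large the grid is. Real kernels will land somewhere else, but the gap is the point.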
stephanhoyer.com
Unlimited potential, zero bugs!
stephanhoyer.com
There's nothing like the feeling of starting a codebase from scratch.