Arseny Kapoulkine
@zeux.io
1.3K followers 75 following 190 posts
Previously: technical fellow at Roblox
meshoptimizer, pugixml, volk, calm, niagara, qgrep, Luau
https://github.com/zeux https://zeux.io
zeux.io
New blog post! In "Billions of triangles in minutes" we'll walk through hierarchical cluster level of detail generation of, well, billions of triangles in minutes. Reposts welcome!

zeux.io/2025/09/30/b...
zeux.io
Yeah, if you can process two triangles at once, you can do this without weird offset math. However, for mesh shaders, certain *ahem* vendors really want you to output one triangle per thread, so that option stops being appealing...
zeux.io
Unlike Vulkan, DX12 doesn't support loading 8-bit types from byte address buffers; this seems like it would be a problem for working with 8-bit triangle indices, but don't worry, you can just
zeux.io
~3m instead of ~4m now :) It's actually ~2m40s when not run under the profiler, for whatever reason.

While I *can* make the green bars completely solid I've already spent way longer than I should on this exercise so this will have to do!
zeux.io
#ScreenshotSaturday
Reposted by Arseny Kapoulkine
zeux.io
Not just from tiny_ocl.h, no.
zeux.io
I would recommend updating the documentation, as tiny_bvh requires C++17 now (digit separators like 1'000'000 are C++14, static inline variables are C++17); this is not very obvious from the readme.
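A minimal illustration of the two language features mentioned, assuming these are the constructs in question (the post does not show tiny_bvh's actual code, and `kBillion` and `Stats` are hypothetical names). The first compiles as C++14; the second needs C++17.

```cpp
#include <cassert>
#include <cstdint>

// Digit separators (C++14): the apostrophes are ignored by the compiler.
constexpr uint64_t kBillion = 1'000'000'000;

struct Stats
{
    // Inline static data member (C++17): defined directly in the class body,
    // so no separate out-of-class definition is needed in a .cpp file.
    static inline uint64_t triangles_processed = 0;
};

// Adds to the shared counter; asserts the count does not wrap around.
static void record_triangles(uint64_t count)
{
    assert(Stats::triangles_processed + count >= Stats::triangles_processed);
    Stats::triangles_processed += count;
}
```

Headers that use the second feature fail to compile (or compile only with extension warnings) under `-std=c++14`, which is why the minimum standard is worth stating in a readme.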
zeux.io
It's kinda ironic: on one hand, it's probably enough; on the other hand, Epic had to write a custom compute micropoly rasterizer because it was in fact not enough :)

2080 had 6 GPCs @ 1.5 GHz, 5070 has 5 GPCs @ 2.3 GHz. So just in general fairly close, as long as they didn't increase tri/GPC rate.
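The GPC comparison can be made concrete with a back-of-envelope model. This sketch assumes (as the post does) that peak triangle rate scales with GPC count times clock, i.e. a fixed triangles-per-clock rate per GPC; `GpuConfig`, `gpc_ghz` and `expected_speedup` are hypothetical names, and the numbers are the ones quoted above.

```cpp
#include <cassert>

// Hypothetical model: peak primitive rate ~ (GPC count) x (clock),
// assuming an unchanged triangles-per-clock rate per GPC.
struct GpuConfig
{
    int gpcs;         // number of graphics processing clusters
    double clock_ghz; // clock in GHz, as quoted in the post
};

// Relative geometry throughput in "GPC-GHz" units under the model above.
static double gpc_ghz(const GpuConfig& gpu)
{
    assert(gpu.gpcs > 0 && gpu.clock_ghz > 0.0);
    return gpu.gpcs * gpu.clock_ghz;
}

// Expected speedup of `b` over `a` if the per-GPC triangle rate is unchanged.
static double expected_speedup(const GpuConfig& a, const GpuConfig& b)
{
    return gpc_ghz(b) / gpc_ghz(a);
}

constexpr GpuConfig kRtx2080{6, 1.5}; // figures from the post
constexpr GpuConfig kRtx5070{5, 2.3};
```

With these figures the model gives 9.0 vs 11.5 GPC-GHz, so at an unchanged per-GPC rate the 5070 would reach roughly 1.28x the 2080's peak. This is only a ceiling estimate for the fixed-function front end.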
zeux.io
Obviously if I ran this on a 5090 I'd expect higher performance... and many more watts. But I don't have a 5090.

And probably from the architectural perspective, pure rasterization bottlenecks were squeezed dry 7 years ago and there's not much else to do, and not much need - 19B tri/sec is enough
zeux.io
So this suggests no progress in pure rasterization performance in... 7 years? In fact a noticeable regression in tri/sec/W.

Of course, when people say "rasterization", they usually mean modern ALU heavy rendering pipelines - not pure geometry stress test. Still!

Caveat: no 2080 to retest again :)
zeux.io
... couple minor tweaks to meshlet function interfaces, as they were extremely experimental at the time. Everything else worked as is.
- Curiously, the commit log said it ran at ~19B tri/sec on RTX 2080. On my RTX 5070 now, I get ~17B tri/sec on the same mesh. 5070 is 250W, my 2080 was a 215W model.
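The per-watt comparison behind the "regression in tri/sec/W" remark works out as follows. This is a minimal sketch using the quoted figures; it assumes the wattages are rated board power rather than power measured during the test, and `Measurement` and `efficiency` are hypothetical names.

```cpp
#include <cassert>

// Throughput and power as quoted in the post; power is assumed to be the
// card's rated board power, not a measured draw during the benchmark.
struct Measurement
{
    double tris_per_sec; // triangles per second
    double watts;        // rated board power
};

// Triangles per second per watt.
static double efficiency(const Measurement& m)
{
    assert(m.watts > 0.0);
    return m.tris_per_sec / m.watts;
}

constexpr Measurement kRtx2080{19e9, 215.0};
constexpr Measurement kRtx5070{17e9, 250.0};
```

This gives about 88M vs 68M triangles per second per watt, i.e. roughly 23% fewer triangles per watt on the 5070.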
zeux.io
Spent some time resurrecting niagara from 2018 to test some NV_mesh_shader stuff. This was fun!

- The code almost compiles and runs... needed a couple of SPIR-V fixes to align with the latest Vulkan SDK for some reason.
- This was using meshoptimizer v0.9 (17 versions ago!)
- Update to v0.25 needed a...
zeux.io
Starting in ~20 minutes!
zeux.io
Upcoming niagara stream!

Tomorrow (Saturday, Aug 23) at 11 AM PDT (6 PM GMT), we will talk about and work on simplifying Vulkan synchronization code, following recent developments in the ecosystem like unified layouts.

youtube.com/live/0rqWe1M...
niagara: Simplifying synchronization
YouTube video by Arseny Kapoulkine
zeux.io
While it's quite silly to literally run the simplifier every frame on the UI thread in the browser, it's fun to see that it can indeed work at 60 FPS!
Reposted by Arseny Kapoulkine
jasonschreier.bsky.social
BREAKING: Silksong will be out on September 4. Two weeks from today. Really.

Often, games that take 7+ years to make are plagued by mismanagement and painful burnout. But for Silksong? Team Cherry was having a blast. They still are.

This is their story: www.bloomberg.com/news/newslet...
Why ‘Silksong’ Took Seven Years to Make
The highly anticipated indie game has been in production for so long that it’s become an internet meme
zeux.io
@akien.bsky.social Any sense as to the timeline for the 4.5 release? Did not expect 2 months of betas 😅
zeux.io
now I can finally relax a little 😩
zeux.io
Same as in binary-identical positions? That's guaranteed :)
zeux.io
A standard VB/IB, so 6 topologically disconnected patches in one mesh if I understood your description correctly, with 3 copies of each corner vertex.
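To make the mesh layout concrete, here is a sketch assuming the mesh in question is a box whose faces each carry their own vertices (the reply only says "6 topologically disconnected patches... with 3 copies of each corner vertex"). `Mesh`, `make_split_box` and `count_position_copies` are hypothetical names; the box has 24 vertices and 36 indices, and every one of its 8 corner positions appears 3 times.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Position = std::array<float, 3>;

struct Mesh
{
    std::vector<Position> positions; // vertex buffer (positions only)
    std::vector<uint32_t> indices;   // index buffer, 3 indices per triangle
};

// Builds a unit box where each of the 6 faces has its own 4 vertices, so faces
// share no vertex indices and form 6 topologically disconnected patches.
// Note: winding is not made consistent between opposite faces here.
static Mesh make_split_box()
{
    Mesh mesh;
    for (int axis = 0; axis < 3; ++axis)
        for (int side = 0; side < 2; ++side)
        {
            uint32_t base = uint32_t(mesh.positions.size());
            int u = (axis + 1) % 3, v = (axis + 2) % 3;
            for (int corner = 0; corner < 4; ++corner)
            {
                Position p{};
                p[axis] = float(side);
                p[u] = float(corner & 1);
                p[v] = float(corner >> 1);
                mesh.positions.push_back(p);
            }
            const uint32_t quad[6] = {0, 1, 3, 0, 3, 2}; // two triangles per face
            for (uint32_t i : quad)
                mesh.indices.push_back(base + i);
        }
    assert(mesh.positions.size() == 24 && mesh.indices.size() == 36);
    return mesh;
}

// Counts how many vertices share each bit-identical position.
static std::map<Position, int> count_position_copies(const Mesh& mesh)
{
    std::map<Position, int> counts;
    for (const Position& p : mesh.positions)
        counts[p]++;
    return counts;
}
```

Because each corner is written from the same exact float values, the three copies are bit-identical, which is what lets a position-based comparison recognize them as the same point.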
zeux.io
Thanks to Valve for sponsoring most of the work on the core library in this release!

The documentation has seen significant further structural improvements in multiple sections and I'm now quite happy with it :) Give it a read if you haven't yet!

meshoptimizer.org
zeux.io
... improvements to appearance by default and more non-simplification work!

gltfpack now supports permissive simplification as well as WebP texture compression. See release notes for even more library & gltfpack changes.

GitHub stars and boosts are appreciated!

github.com/zeux/meshopt...
Release v0.25 · zeux/meshoptimizer
This release contains many improvements to the meshoptimizer library, with a particular focus on simplification algorithms, as well as several new gltfpack features. Highlights: New simplification...
zeux.io
meshoptimizer v0.25 is out! Featuring new simplification function that optimizes positions and attributes for appearance, experimental permissive mode to simplify faceted regions with selective seam preservation, regularization option for improved tessellation quality and deformation, multiple ...