Nigel J W
@nigeljw.bsky.social
GPU Software Architect at Arm. Previously, Unity, PlayStation, and Qualcomm.

My opinions do not represent any of my current or past employers.
I like your default error buffer suggestion. It both leverages and validates the feature.
December 30, 2025 at 9:57 PM
One other modern aspect that is a bit of a pain point at the moment is the state object addition flow. The only existing content covers RT, but the feature is independent of RT, and it was a bit painful to get it set up for sparse nodes with work graphs without explicit references to follow.
December 30, 2025 at 12:52 PM
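Roughly the flow I mean, as an untested sketch. The CD3DX12 work graph helper names and the exact flags here are assumptions based on the Agility SDK preview headers and may differ between versions:

// Minimal sketch (untested): build an executable state object for a work
// graph, allowing it to be grown later via AddToStateObject, with no RT
// content involved at all.
#include <d3d12.h>
#include "d3dx12.h"  // Agility SDK preview helpers assumed
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12StateObject> CreateWorkGraphStateObject(
    ID3D12Device9* device, const D3D12_SHADER_BYTECODE& nodeLibrary)
{
    CD3DX12_STATE_OBJECT_DESC desc(D3D12_STATE_OBJECT_TYPE_EXECUTABLE);

    // Allow later additions (e.g. extra node libraries) via AddToStateObject.
    auto* config = desc.CreateSubobject<CD3DX12_STATE_OBJECT_CONFIG_SUBOBJECT>();
    config->SetFlags(D3D12_STATE_OBJECT_FLAG_ALLOW_STATE_OBJECT_ADDITIONS);

    // The DXIL library containing the node shaders.
    auto* lib = desc.CreateSubobject<CD3DX12_DXIL_LIBRARY_SUBOBJECT>();
    lib->SetDXILLibrary(&nodeLibrary);

    // The work graph program itself; pull in every node the library exports.
    auto* workGraph = desc.CreateSubobject<CD3DX12_WORK_GRAPH_SUBOBJECT>();
    workGraph->SetProgramName(L"MyWorkGraph");
    workGraph->IncludeAllAvailableNodes();

    ComPtr<ID3D12StateObject> stateObject;
    device->CreateStateObject(desc, IID_PPV_ARGS(&stateObject));
    return stateObject;
}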
The main problem, as you implied, is that the correct root flags need to be defined in shader land, as well as the correct buffer indexing. As part of the header, some simple golden-reference high-level shader examples could be provided alongside the client-side header.
December 30, 2025 at 12:07 PM
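To make that concrete, an untested sketch of the kind of pairing I have in mind; the function name is just illustrative, and the matching SM6.6 shader side is only summarized in the comments:

// Minimal sketch (untested): a bindless root signature whose flags must agree
// with the SM6.6 shader side. In HLSL the shader would index the heaps
// directly, e.g.
//   Texture2D tex = ResourceDescriptorHeap[texIndex];
//   SamplerState smp = SamplerDescriptorHeap[samplerIndex];
// and read only the indices from a small root-constant block.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12RootSignature> CreateBindlessRootSignature(ID3D12Device* device)
{
    // One root-constant slot for descriptor indices; everything else comes
    // from the directly indexed CBV/SRV/UAV and sampler heaps.
    D3D12_ROOT_PARAMETER1 rootConstants = {};
    rootConstants.ParameterType = D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS;
    rootConstants.Constants.ShaderRegister = 0;
    rootConstants.Constants.RegisterSpace = 0;
    rootConstants.Constants.Num32BitValues = 4;
    rootConstants.ShaderVisibility = D3D12_SHADER_VISIBILITY_ALL;

    D3D12_VERSIONED_ROOT_SIGNATURE_DESC desc = {};
    desc.Version = D3D_ROOT_SIGNATURE_VERSION_1_1;
    desc.Desc_1_1.NumParameters = 1;
    desc.Desc_1_1.pParameters = &rootConstants;
    // These two flags are the ones that must match the shader's
    // ResourceDescriptorHeap / SamplerDescriptorHeap usage.
    desc.Desc_1_1.Flags =
        D3D12_ROOT_SIGNATURE_FLAG_CBV_SRV_UAV_HEAP_DIRECTLY_INDEXED |
        D3D12_ROOT_SIGNATURE_FLAG_SAMPLER_HEAP_DIRECTLY_INDEXED;

    ComPtr<ID3DBlob> blob, error;
    D3D12SerializeVersionedRootSignature(&desc, &blob, &error);

    ComPtr<ID3D12RootSignature> rootSignature;
    device->CreateRootSignature(0, blob->GetBufferPointer(),
                                blob->GetBufferSize(),
                                IID_PPV_ARGS(&rootSignature));
    return rootSignature;
}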
It can also be challenging (mainly due to the lack of references) to extract the layout from the compiled binary through the compiler functions and feed it into the root signature API. Having a helper header for that specifically sounds like a great idea. It would remove so much potential for error.
December 30, 2025 at 11:55 AM
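Something along these lines is what I picture the helper doing, as an untested sketch; the function name is illustrative and error handling is omitted:

// Minimal sketch (untested): pull the resource layout out of a DXC compile
// result via reflection, which a helper header could then translate into
// root parameters and descriptor ranges automatically.
#include <dxcapi.h>
#include <d3d12shader.h>
#include <vector>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

std::vector<D3D12_SHADER_INPUT_BIND_DESC> ReflectBindings(
    IDxcUtils* utils, IDxcResult* compileResult)
{
    // The reflection blob is a separate output of IDxcCompiler3::Compile.
    ComPtr<IDxcBlob> reflectionBlob;
    ComPtr<IDxcBlobUtf16> outputName;
    compileResult->GetOutput(DXC_OUT_REFLECTION,
                             IID_PPV_ARGS(&reflectionBlob), &outputName);

    DxcBuffer buffer = {};
    buffer.Ptr = reflectionBlob->GetBufferPointer();
    buffer.Size = reflectionBlob->GetBufferSize();

    ComPtr<ID3D12ShaderReflection> reflection;
    utils->CreateReflection(&buffer, IID_PPV_ARGS(&reflection));

    // Walk every bound resource (CBVs, SRVs, UAVs, samplers); a helper could
    // turn these straight into a root signature description.
    D3D12_SHADER_DESC shaderDesc = {};
    reflection->GetDesc(&shaderDesc);

    std::vector<D3D12_SHADER_INPUT_BIND_DESC> bindings(shaderDesc.BoundResources);
    for (UINT i = 0; i < shaderDesc.BoundResources; ++i)
        reflection->GetResourceBindingDesc(i, &bindings[i]);
    return bindings;
}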
I don't think that going bindless is overly opinionated. Having to manually define the layouts is very burdensome, and the canyon of potential discrepancy it creates between the CPU-side layouts and the shader definitions is the biggest pain point, especially for new devs.
December 30, 2025 at 11:52 AM
I completely agree with vfig about how the heavy abstractions in most reference samples and frameworks make it very hard to learn the API or customize the sample, which is true across almost all APIs, even console reference examples.
December 30, 2025 at 11:46 AM
I think you could refer to it as a minimal-abstraction golden-reference helper header, where the correct modern paths are used. One awful pain point with existing DX sample references is the lack of IDxcCompiler3 entrypoint usage, given that the older paths are already deprecated. 😓
December 30, 2025 at 11:44 AM
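For reference, an untested sketch of the modern compile path through IDxcCompiler3; the entry point, target profile, and arguments are just placeholders:

// Minimal sketch (untested): compiling HLSL through the current
// IDxcCompiler3::Compile entry point instead of the deprecated paths.
#include <dxcapi.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<IDxcBlob> CompileComputeShader(const char* source, size_t sourceSize)
{
    ComPtr<IDxcUtils> utils;
    ComPtr<IDxcCompiler3> compiler;
    DxcCreateInstance(CLSID_DxcUtils, IID_PPV_ARGS(&utils));
    DxcCreateInstance(CLSID_DxcCompiler, IID_PPV_ARGS(&compiler));

    DxcBuffer buffer = {};
    buffer.Ptr = source;
    buffer.Size = sourceSize;
    buffer.Encoding = DXC_CP_UTF8;

    // Target SM 6.6+ so bindless heap indexing is available.
    LPCWSTR args[] = { L"-T", L"cs_6_6", L"-E", L"main", L"-Zi" };
    const UINT32 argCount = sizeof(args) / sizeof(args[0]);

    ComPtr<IDxcResult> result;
    compiler->Compile(&buffer, args, argCount, nullptr,
                      IID_PPV_ARGS(&result));

    // Grab the DXIL object; reflection and errors are separate outputs.
    ComPtr<IDxcBlob> object;
    ComPtr<IDxcBlobUtf16> outputName;
    result->GetOutput(DXC_OUT_OBJECT, IID_PPV_ARGS(&object), &outputName);
    return object;
}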
Separately, I want to experiment more with Halide. I spent a few months over a year ago toying with Triton for some basic computational fluids, but the lack of array indexing made it limited for general-purpose compute. Halide's declarative style is a shift for me, though.
December 28, 2025 at 5:05 PM
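The canonical illustration of that declarative split, roughly as in the Halide tutorials (an untested sketch; buffer sizes and schedule parameters are just illustrative):

// Minimal sketch of Halide's separation between the algorithm and the
// schedule: the classic separable blur.
#include "Halide.h"
using namespace Halide;

int main() {
    Buffer<float> input(1024, 1024);  // contents left uninitialized for brevity

    Var x("x"), y("y"), xi("xi"), yi("yi");
    Func blur_x("blur_x"), blur_y("blur_y");

    // Algorithm: pure functional definitions over integer domains.
    blur_x(x, y) = (input(x, y) + input(x + 1, y) + input(x + 2, y)) / 3.0f;
    blur_y(x, y) = (blur_x(x, y) + blur_x(x, y + 1) + blur_x(x, y + 2)) / 3.0f;

    // Schedule: tiling, vectorization, and parallelism are described
    // separately from what is computed.
    blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
    blur_x.compute_at(blur_y, x).vectorize(x, 8);

    Buffer<float> output = blur_y.realize({1022, 1022});
    return 0;
}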
"I'm not saying it outperforms all experts, but that it can in future, with enough compute." This is exactly what I disagree with, the idea that if we just throw more compute at the problem, then it will get better, which is a fallacy. I don't disagree with everything you have said though.
December 28, 2025 at 4:38 PM
I'm on a train with a spotty connection and a 5-year-old sleeping on me, and this is a bit hard to debate with such a limited number of characters.
December 28, 2025 at 4:35 PM
I'm just not convinced that our future contributions will be exclusive to high-level logic, as LLMs desperately struggle with basic logic. They barely understand positional relations (up versus down, left versus right). Hardware is too complex, proprietary, and continuously evolving.
December 28, 2025 at 4:31 PM
I'm not dismissing using AI for autotuning, as I think that is a hero use case for AI. At the moment, I still feel that I spend more time building the context for an AI prompt and fixing the output than I do writing the code myself. I do agree that there will be a shift as we are already seeing.
December 28, 2025 at 4:26 PM
There is almost always a human in the loop with agentic flows, and pretty much all metrics used in research are pure self-bias. Any metric struggles to properly represent actual reality (perception distortion paradox).
Production environments are a completely different beast.
December 28, 2025 at 4:20 PM
Autotuning existing generic kernels with explicit constraints is very different from designing new SOTA compute algorithms. A conditioned agentic flow for autotuning (CudaForge) is interesting, but implying it outperforms all experts on all problems is dangerous nonsense. GenAI == GenericAI
December 28, 2025 at 4:09 PM
As explicit examples of significant weaknesses, even the latest world models do not understand the mapping between high-level shader code and the intermediate representation. They also inject imaginary intrinsics that do not exist when they don't understand how to solve a specific compute fragment.
December 27, 2025 at 9:08 PM
Specialized compute is also the backend of GenAI.
December 27, 2025 at 9:04 PM
As you point out, it is unlikely that world models will get access to proper RTL designs, and the same holds for the ever-increasing complexity of continuously developed proprietary compute algorithms, which are software houses' core IP. Specialized compute still remains out of reach for GenAI.
December 27, 2025 at 8:59 PM
This is such a tease for me right now. Hard to find decent mincemeat tarts in Copenhagen. I did not realize it was possible to get the mix in a jar.
December 27, 2025 at 8:43 PM
We have open positions across Europe with Cambridge, Lund, and Trondheim being the main locations focused on graphics: careers.arm.com
December 20, 2025 at 8:13 PM
That's cool. Are you doing the light binning pass on the GPU or the CPU side?
December 16, 2025 at 6:40 PM
I really miss the internal yearly Christmas clearance sale. I think my wife is glad that I don't come home each December with more t-shirts and other merchandise anymore...
December 9, 2025 at 6:28 PM
That's great to hear. I went last year and really enjoyed it. It was a good mix of people across the industry with a really fantastic representation from games studios. A lot of interesting perspectives.
November 25, 2025 at 7:22 PM