Raph Levien
@raphlinus.bsky.social
810 followers 96 following 74 posts
Doing fundamental research in UI and 2D graphics
raphlinus.bsky.social
I gave a seminar entitled "How Rust won: The quest for performant, reliable software" at the Topos Institute on Jun 3, and the video (youtu.be/k_-6KI3m31M) is now published. I hope people enjoy it!
[Berkeley Seminar] Raph Levien | How Rust won: the quest for performant, reliable software
YouTube video by Topos Institute
youtu.be
raphlinus.bsky.social
Is this a microcontroller that has a 1-cycle multiply instruction, like RP2040 or better? In that case, FNV might be a good choice. If not, Jenkins is a possibility. You might also take a look at github.com/ztanml/fast-....
GitHub - ztanml/fast-hash: Fast hash function learned using genetic programming
Fast hash function learned using genetic programming - ztanml/fast-hash
github.com
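For reference, a minimal sketch of 32-bit FNV-1a in Rust, assuming the FNV-1a variant with the standard published offset basis and prime. The function name `fnv1a_32` is made up for illustration. The inner loop is one XOR and one 32-bit multiply per byte, which is why a 1-cycle multiplier matters.

```rust
// 32-bit FNV-1a. The constants are the standard published FNV parameters;
// the function name is hypothetical, chosen for this sketch.
const FNV_OFFSET_BASIS: u32 = 0x811c_9dc5;
const FNV_PRIME: u32 = 0x0100_0193;

pub fn fnv1a_32(data: &[u8]) -> u32 {
    let mut hash = FNV_OFFSET_BASIS;
    for &byte in data {
        // FNV-1a: XOR the byte in first, then multiply by the prime.
        hash ^= u32::from(byte);
        // Wrapping multiply, since the hash is defined modulo 2^32.
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}
```

Calling `fnv1a_32(b"")` returns the offset basis unchanged, which is a quick sanity check on any port.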
raphlinus.bsky.social
I've subscribed. We need more explorations of this space, especially from a human rather than business-centered perspective. I'm eager to see where it goes.
raphlinus.bsky.social
We have our usual monthly update for Linebender: linebender.org/blog/tmil-19/

A lot of good progress, especially on the renderer, and fearless_simd is coming along well. Also some personal news: I'm leaving Google and taking a new role at Canva, moving to Australia in January.
Linebender in July 2025
linebender.org
Reposted by Raph Levien
opalescentopal.bsky.social
With Tom Lehrer's passing, I suppose this is a moment to share the story of the prank he played on the National Security Agency, and how it went undiscovered for nearly 60 years.
raphlinus.bsky.social
The pardon is the carrot. The stick, not just for Ghislaine, is the threat of being accused of treason (or some other offense with capital punishment) if you've got factual information and facilitate that being released to the public, as is being done with the Russia scandal. Timing no coincidence.
Reposted by Raph Levien
georgetakei.bsky.social
When I was little, the U.S. military came to our home at gunpoint and took me and my family away. We were imprisoned for years in barbed wire camps simply because we were Japanese American. I have spent my life telling that story, hoping it would never be repeated.
raphlinus.bsky.social
Cool to see this work happening, and look forward to some nice GPU compute acceleration for path rendering!
raphlinus.bsky.social
Topnotch reporting from Marisa Kabas on a very important topic. It's maddening that mainstream news sources are not meeting the moment. Support independent journalism by subscribing.
marisakabas.bsky.social
NEW from me — I wrote about FEMA Acting Administrator David Richardson, who has been completely silent and out of view since deadly floods devastated Texas last week, and spoke to staffers about what it's like to respond to a disaster with no leader.

Read here:
Have you seen this man?
In the wake of deadly floods in Texas, FEMA Acting Administrator David Richardson is nowhere to be found.
www.thehandbasket.co
raphlinus.bsky.social
Took me a little while to place, but yeah. Thanks!
raphlinus.bsky.social
That was a fun Saturday morning exercise. I did it on pen and paper, got it mostly right (missed a sign, which is a weakness), then validated it in Desmos. I personally choose to consider this not cheating; it's my standard toolset for such things, but maybe I should practice doing it from scratch.
raphlinus.bsky.social
But (though I was kinda expecting to), I still can't repro. I just get an undefined behavior error in debug, and the intuitively correct answer in release. I'm afraid Rust just doesn't have the expressive power for eldritch horror that C does.
raphlinus.bsky.social
fn function(a: u16, b: u16) {
    let c = unsafe { (a as i32).unchecked_mul(b as i32) as u32 };
    if c < 2147483648 {
        println!("{c} is less than 2147483648");
    } else {
        println!("{c} is greater than or equal to 2147483648");
    }
}

fn main() {
    function(65535, 65535);
}
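As a point of comparison, here is a sketch of the same multiplication using only defined-behavior operations. The helper name `mul_as_i32` is hypothetical and not part of the original thread. It shows the arithmetic behind the overflow: 65535 * 65535 = 4294836225, which exceeds `i32::MAX` (2147483647), so `checked_mul` reports failure, while `wrapping_mul` followed by a cast to `u32` recovers the full product.

```rust
// Hypothetical helper: multiply two u16 values as i32 without invoking UB.
// Returns the checked result (None on overflow) alongside the wrapped
// result reinterpreted as u32.
pub fn mul_as_i32(a: u16, b: u16) -> (Option<i32>, u32) {
    let (x, y) = (a as i32, b as i32);
    // checked_mul returns None when the product does not fit in i32.
    let checked = x.checked_mul(y);
    // wrapping_mul is defined to wrap modulo 2^32; casting to u32
    // reinterprets the same bits as an unsigned value.
    let wrapped = x.wrapping_mul(y) as u32;
    (checked, wrapped)
}
```

For the inputs in the post, `mul_as_i32(65535, 65535)` gives `(None, 4294836225)`.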
raphlinus.bsky.social
I tried porting this to Rust and am unable to repro. What am I doing wrong?
raphlinus.bsky.social
Four here. I was hoping to get 3 by doing a paper in collaboration with Norm Megill, back when I was working on Metamath, but sadly he has passed. Another reminder that the time we have here is precious, not to get maudlin on your thread.
raphlinus.bsky.social
...being able to write super-performant libraries and then sufficient abstraction to use those libraries. In the limit, you get something like PyTorch (also where MLIR is going), where you write very high level stuff and magic happens to run it on your GPU efficiently.
raphlinus.bsky.social
Ah, makes sense. I think this question is conflating pedagogy ("how do you teach modern CPU and GPU performance") with language design ("how do you write programs that are maintainable and also exploit available performance"). The latter pulls you in a different direction, mostly an emphasis on...
raphlinus.bsky.social
If your question is, "how could CPUs be designed for radically better performance," then I think the answer is GPU-like parallelism but without the impoverished execution model. The problem is that we have no idea how to program such a beast, and it would be weak on existing workloads.
raphlinus.bsky.social
Depending on the exact workload of course, I suspect many problems could get a 2x or more speedup by adopting data oriented design and exploiting SIMD. Especially on modern SIMD with predication (AVX-512, SVE, RVV). Of course, in some domains (media codecs) this is already done.
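A rough illustration of what a data-oriented layout looks like, as a sketch only: the `Particles` type and its fields are invented for this example. Each field is stored in its own contiguous array (structure of arrays), so the update loop is a plain elementwise pass that a compiler can often auto-vectorize. Whether it actually does depends on the target and optimization settings.

```rust
// Hypothetical structure-of-arrays layout: one contiguous Vec per field,
// rather than a Vec of per-particle structs.
pub struct Particles {
    pub x: Vec<f32>,
    pub vx: Vec<f32>,
}

impl Particles {
    // Advance every position by velocity * dt. The loop reads and writes
    // contiguous f32 data, which is the access pattern SIMD units favor.
    pub fn step(&mut self, dt: f32) {
        for (x, vx) in self.x.iter_mut().zip(self.vx.iter()) {
            *x += *vx * dt;
        }
    }
}
```

The design choice is the layout, not the loop: with an array of structs, the same update would stride over unrelated fields and make vector loads harder to use.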
raphlinus.bsky.social
I think more explicit control over fine-grained scheduling has been tried and failed (Itanic). I doubt more explicit control over memory hierarchy would buy much. Explicit SIMD parallelism is, in my mind, the biggest underexploited potential performance gain, and "just" needs language support.
raphlinus.bsky.social
I'm not seeing the distinction. Assembly exactly mirrors the capabilities of the hardware; the only problems are ergonomics and portability.
raphlinus.bsky.social
No, but you can't do that in assembler either. There are some things you can do in assembler but not C, like constant time crypto, and certain goto variants such as tail calls and coroutines (the latter of which is now available in C++).
raphlinus.bsky.social
On a CPU, there's not much about the memory hierarchy you can't address from C, modulo wider SIMD load/store which is neither standard nor portable. Neon has another twist, which is built-in permutation.
raphlinus.bsky.social
On a GPU, you do need that, and it's (mostly) explicit, especially "workgroup shared memory." One missing feature in IRs (much less higher level languages) is the ability to explicitly address scratch memory; you usually only get that implicitly from register spills.
raphlinus.bsky.social
Answering the question for GPUs would take a blog post or a research program. There are huge missing pieces at the low level; everybody is chasing the higher levels. C++ variants are only ok, and have a big impedance mismatch on non-Nvidia hardware.