Dad, husband, brother, son. Working on Dart at Google, ex-game dev at EA, wrote "Game Programming Patterns" and "Crafting Interpreters". http://stuffwithstuff.com/
Ah, you're right. I assumed that the JSON spec went down to the encoding level, but it doesn't. (Though it does do a funny little dance to acknowledge that JSON string literals may contain UTF-16 surrogate pairs but then punts on whether an implementation treats them as a single code point or not.)
October 22, 2025 at 10:36 PM
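To see one implementation's answer to the surrogate pair question in the post above, here is a small sketch using Python's standard `json` module. It shows only what that one library does, not what the JSON spec requires; the variable names are just for illustration.

```python
import json

# JSON text containing two \u escapes that together form a surrogate pair.
escaped_pair = '"\\ud83e\\udd84"'
decoded = json.loads(escaped_pair)
# Python's decoder joins the pair into a single code point.
print(len(decoded), hex(ord(decoded)))

# A lone high surrogate is accepted and kept as-is by this decoder.
lone = json.loads('"\\ud83e"')
print(len(lone), hex(ord(lone)))

# Going the other way, the default encoder escapes non-ASCII text
# as UTF-16 surrogate pairs.
encoded = json.dumps("\U0001F984")
print(encoded)
```

Other JSON libraries are free to make different choices here, which is the gap the spec leaves open.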
The unicorn emoji 🦄 is 1 code point (0x0001f984), 2 code units in UTF-16 (0xd83e, 0xdd84), and 4 bytes in UTF-16 (0xd8, 0x3e, 0xdd, 0x84). The answer is different for encodings like UTF-8 or UTF-32 (JSON is UTF-16). And there is a whole separate ball of complexity around grapheme clusters.
October 6, 2025 at 5:40 PM
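The counts in the unicorn emoji post above can be checked with a few lines of Python. This is only a sketch: the helper name `describe` is made up for illustration, and big-endian byte order is assumed for UTF-16 so the bytes match the ones listed in the post.

```python
def describe(text: str) -> dict:
    """Count one string several ways (hypothetical helper, for illustration only)."""
    utf16 = text.encode("utf-16-be")  # big-endian, no byte order mark
    # Each UTF-16 code unit is two bytes.
    code_units = [
        int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)
    ]
    return {
        "code_points": [ord(c) for c in text],
        "utf16_code_units": code_units,
        "utf16_bytes": list(utf16),
        "utf8_bytes": list(text.encode("utf-8")),
        "utf32_bytes": list(text.encode("utf-32-be")),
    }


info = describe("\U0001F984")  # the unicorn emoji
for name, values in info.items():
    print(f"{name}: {len(values)} -> {[hex(v) for v in values]}")
```

Python's own `len()` counts code points, so it reports 1 for this string, while the `length` of a string in JavaScript or Dart counts UTF-16 code units and reports 2.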
Is "length" here bytes, code points, or code units? :) I hate that I can't read anything about string length without that question immediately entering my mind.
September 22, 2025 at 8:11 PM
There are a bunch of reports in the issue trackers for both books already, but I'm trying to not touch them at all. (It's easy to fix the online versions, but then I worry about them getting out of sync with the print versions, and about not noticing that when I later update the print editions.)
July 16, 2025 at 12:58 AM
Yeah, having written a parser for it... Markdown is just not a great language. CommonMark helps regularize it some, but in the early days, it was wild.
I don't have advice that would fit in a bsky post beyond the general "avoid inventing a language unless you have to (or want to)". :)
July 2, 2025 at 12:55 AM
@mrale.ph would definitely know better than me. Compiler folks who work on optimizations are by nature cagey about definitive answers to questions like this because they want their future selves to have the freedom to change optimizations without breaking users who rely on them.
May 31, 2025 at 1:23 AM