Keenan Crane
@keenancrane.bsky.social
2.8K followers · 120 following · 260 posts
Digital Geometer, Associate Professor of Computer Science & Robotics at Carnegie Mellon University. There are four lights. https://www.cs.cmu.edu/~kmcrane/
keenancrane.bsky.social
“Fair dice” might make you think of perfect cubes with equal frequencies (say, 1/6 on all sides) 🎲

But “fair” really just means you get the frequencies you expect (say, 1/4, 1/4 & 1/2)

We can now design fair dice with any frequencies—and any shape! 🐉

hbaktash.github.io/projects/put...
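A quick Monte Carlo sanity check of that notion of “fair” (my own Python sketch with made-up target frequencies, not the method from the paper):

```python
import random
from collections import Counter

# Hypothetical target frequencies for a three-outcome die:
# "fair" here means matching these targets, not uniform 1/3 each.
targets = {"A": 0.25, "B": 0.25, "C": 0.50}

faces, weights = zip(*targets.items())
counts = Counter(random.choices(faces, weights=weights, k=100_000))

for face, p in targets.items():
    print(face, counts[face] / 100_000, "target:", p)
```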
keenancrane.bsky.social
Tangent-point energy works for (2).

To incorporate (1) I might (strongly) penalize the distance from each data point p to the *closest* point on the curve. This encourages at least one point of the curve to pass through each data point, without pulling on the whole curve.
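Roughly, in illustrative Python/NumPy (my own sketch: the curve is discretized into samples, and the penalty weight is picked by hand):

```python
import numpy as np

def closest_point_penalty(curve_pts, data_pts, weight=1e3):
    # curve_pts: (m, d) samples along the curve
    # data_pts:  (k, d) points the curve should pass through
    # For each data point, compute squared distances to every
    # curve sample and keep only the *closest* one...
    d2 = ((data_pts[:, None, :] - curve_pts[None, :, :])**2).sum(-1)
    # ...so only the nearest piece of curve feels the pull.
    return weight * np.sqrt(d2.min(axis=1)).sum()
```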
keenancrane.bsky.social
Thanks for the thought-provoking example. 😊
keenancrane.bsky.social
Reminds me of the Kahneman and Tversky experiments (“Steve is more likely to be a librarian than a farmer.”) If LLMs are trained on human-generated text, it doesn’t seem reasonable to expect them to be smarter than the average text-generating human. (Though they sometimes are anyway.)
keenancrane.bsky.social
On the other hand, I was too dumb to recognize the subtlety on first glance. So maybe the model is “just as bad as a human?”
keenancrane.bsky.social
So, in the absence of any priors or additional information, 1/3 is a reasonable-ish approximation. But I agree it would be far better if the model simply said “that’s hard to answer because there are many ambiguous factors” (as I have).
keenancrane.bsky.social
This one’s not so clear-cut: “baby” is an ambiguous age range, and a baby can be a twin or triplet, born in any order. Even a newborn could have younger step-siblings in rare cases.

We’re also presuming it’s a human baby, whereas other species have different life spans.
keenancrane.bsky.social
Not seeing it. What’s wrong with this answer? (There are six possible permutations, but the other two siblings are interchangeable…)
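Spelling out the count (assuming the question upthread is whether the baby is youngest; B is the baby, S1/S2 the interchangeable siblings):

```python
from itertools import permutations

# All 6 birth orders of three children; the baby comes last
# in 2 of them, and S1/S2 are interchangeable: 2/6 = 1/3.
orders = list(permutations(["B", "S1", "S2"]))
print(sum(o[-1] == "B" for o in orders), "/", len(orders))
```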
keenancrane.bsky.social
I adapted Unicodeit! (See the acknowledgments section on GitHub; I also meant to mention that in the footer.)

I had been using your website for years, but wanted something more integrated.

Thank you for contributing to open source. 😁
keenancrane.bsky.social
I got tired of mashing together tools to write long threads with 𝐫𝐢𝐜𝐡 𝑓𝑜𝑟𝑚𝑎𝑡𝑡𝑖𝑛𝑔 and ℳα†ℏ—so I wrote La𝑇𝑤𝑒𝑒𝑡!

It converts Markdown and LaTeX to Unicode that can be used in “tweets”, and automatically splits long threads. Try it out!

keenancrane.github.io/LaTweet/
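LaTweet actually adapts Unicodeit for the conversion, but the core trick is offsetting codepoints into Unicode's Mathematical Alphanumeric Symbols block. Roughly (illustrative Python, not LaTweet's actual code):

```python
def to_math_bold(s):
    # Shift A-Z and a-z into the Mathematical Bold range
    # (U+1D400 onward); everything else passes through.
    def shift(c):
        if "A" <= c <= "Z":
            return chr(ord(c) - ord("A") + 0x1D400)  # 𝐀-𝐙
        if "a" <= c <= "z":
            return chr(ord(c) - ord("a") + 0x1D41A)  # 𝐚-𝐳
        return c
    return "".join(shift(c) for c in s)

print(to_math_bold("rich formatting"))  # 𝐫𝐢𝐜𝐡 𝐟𝐨𝐫𝐦𝐚𝐭𝐭𝐢𝐧𝐠
```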
keenancrane.bsky.social
(More seriously: if the geometry of the apples was well-captured by the artist, and the color is unique to that geometry, I would be willing to bet the answer is “yes.”)
keenancrane.bsky.social
If it began life as a drawing, is that question even well-posed?
keenancrane.bsky.social
Oh, you wrote a book on this stuff. I guess I didn't need to be quite so didactic in my response! ;-)
keenancrane.bsky.social
(But I take your point: it's hard to get all these different nuances across precisely in diagrams. That's why we also have mathematical notation to go along with the diagrams! :-) )
keenancrane.bsky.social
Well, f maps *any* point of the data space to the latent space, and g maps *any* point of the latent space to the data space. I.e.,

f : ℝⁿ → ℝᵏ,
g : ℝᵏ → ℝⁿ.

The point x is just one example. So it might in fact be misleading to imply that f gets applied only to x, or that the round trip ends only at x̂.
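Concretely, as a toy sketch (made-up dimensions and random weights, just to show the domains and codomains):

```python
import numpy as np

n, k = 8, 2  # data dimension, latent dimension (arbitrary)
W_enc = np.random.randn(k, n)
W_dec = np.random.randn(n, k)

f = lambda x: np.tanh(W_enc @ x)  # f : ℝⁿ → ℝᵏ
g = lambda z: W_dec @ z           # g : ℝᵏ → ℝⁿ

x = np.random.randn(n)  # one particular data point...
z = np.random.randn(k)  # ...but g accepts *any* latent
print(g(f(x)).shape, g(z).shape)  # both land in ℝⁿ
```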
keenancrane.bsky.social
P.S. I should also mention that these diagrams were significantly improved via feedback from many folks from here and elsewhere.

Hopefully they account for some of the gripes—if not, I'm ready for the next batch! 😉

bsky.app/profile/keen...
keenancrane.bsky.social
I can't fathom why the top picture, and not the bottom picture, is the standard diagram for an autoencoder.

The whole idea of an autoencoder is that you complete a round trip and seek cycle consistency—why lay out the network linearly?
keenancrane.bsky.social
Of course, there will be those who say that the representation diagram is “obvious,” and “that's what everyone has in their head anyway.”

If so… good for you! If not, I hope this alternative picture provides some useful insight as you hack in this space. 😉

[End 🧵]
keenancrane.bsky.social
If you want to use or repurpose these diagrams, the source files (as PDF) can be found at

cs.cmu.edu/~kmcrane/Aut...

(Licensed under CC0 1.0 Universal)
keenancrane.bsky.social
Likewise, here's a simpler “implementation” diagram that still retains the most important idea of an *auto*-encoder, namely, that you're comparing the output against *itself*.
keenancrane.bsky.social
Personally, I find both of these diagrams a little bit crowded—here's a simpler “representation” diagram with fewer annotations (which might anyway be better explained in accompanying text).
keenancrane.bsky.social
Finally, a natural question raised by this picture is: how do I sample/generate new latents z? For a “vanilla” autoencoder, there's no simple a priori description of the high-density regions.

This situation motivates *variational* autoencoders (which are a whole other story…).
keenancrane.bsky.social
It should also be clear that, unless the reconstruction loss is exactly zero, the learned manifold M only approximates (rather than interpolates) the given data. For instance, x does not sit on M, even though x̂ does.

(If M does interpolate all xᵢ, you're probably overfitting)
keenancrane.bsky.social
Another thing made clear by this picture is that, no matter what the true dimension of the data might be, the manifold M predicted by the decoder generically has the same dimension as the latent space: it's the image of ℝᵏ under g.

So, the latent dimension is itself a prior.
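Easiest to see for a linear decoder (an illustrative special case, not how real decoders look):

```python
import numpy as np

# If g(z) = W z for an n-by-k matrix W, the image g(ℝᵏ) is a
# linear subspace of dimension at most k, whatever the data was.
n, k = 8, 2
W = np.random.randn(n, k)
print(np.linalg.matrix_rank(W))  # generically k
```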
keenancrane.bsky.social
In regions where we don't have many samples, the decoder g isn't reliable: we're basically extrapolating (i.e., guessing) what the true data manifold looks like.

The diagram suggests this idea by “cutting off” the manifold—but in reality there’s no clear, hard cutoff.