Keenan Crane
@keenancrane.bsky.social
2.8K followers · 120 following · 260 posts
Digital Geometer, Associate Professor of Computer Science & Robotics at Carnegie Mellon University. There are four lights. https://www.cs.cmu.edu/~kmcrane/
keenancrane.bsky.social
“Fair dice” might make you think of perfect cubes with equal frequencies (say, 1/6 on all sides) 🎲

But “fair” really just means you get the frequencies you expect (say, 1/4, 1/4 & 1/2)

We can now design fair dice with any frequencies—and any shape! 🐉

hbaktash.github.io/projects/put...
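A quick Monte Carlo sanity check of that notion of “fair” (my own Python sketch with made-up target frequencies, not the method from the paper):

```python
import random
from collections import Counter

# Hypothetical target frequencies for a three-outcome die:
# "fair" here means matching these targets, not uniform 1/3 each.
targets = {"A": 0.25, "B": 0.25, "C": 0.50}

faces, weights = zip(*targets.items())
counts = Counter(random.choices(faces, weights=weights, k=100_000))

for face, p in targets.items():
    print(face, counts[face] / 100_000, "target:", p)
```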
keenancrane.bsky.social
Tangent-point energy works for (2).

To incorporate (1) I might (strongly) penalize the distance from each data point p to the *closest* point on the curve. This encourages at least one point of the curve to pass through each data point, without pulling on the whole curve.
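Roughly, in illustrative Python/NumPy (my own sketch: the curve is discretized into samples, and the penalty weight is picked by hand):

```python
import numpy as np

def closest_point_penalty(curve_pts, data_pts, weight=1e3):
    # curve_pts: (m, d) samples along the curve
    # data_pts:  (k, d) points the curve should pass through
    # For each data point, compute squared distances to every
    # curve sample and keep only the *closest* one...
    d2 = ((data_pts[:, None, :] - curve_pts[None, :, :])**2).sum(-1)
    # ...so only the nearest piece of curve feels the pull.
    return weight * np.sqrt(d2.min(axis=1)).sum()
```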
keenancrane.bsky.social
Thanks for the thought-provoking example. 😊
keenancrane.bsky.social
Reminds me of the Kahneman and Tversky experiments (“Steve is more likely to be a librarian than a farmer.”) If LLMs are trained on human-generated text, it doesn’t seem reasonable to expect them to be smarter than the average text-generating human. (Though they sometimes are anyway.)
keenancrane.bsky.social
On the other hand, I was too dumb to recognize the subtlety on first glance. So maybe the model is “just as bad as a human?”
keenancrane.bsky.social
So, in the absence of any priors or additional information, 1/3 is a reasonable-ish approximation. But I agree it would be far better if the model simply said “that’s hard to answer because there are many ambiguous factors” (as I have).
keenancrane.bsky.social
This one’s not so clear-cut: “baby” is an ambiguous age range, and a baby can be a twin or triplet, born in any order. Even a newborn could have younger step-siblings in rare cases.

We’re also presuming it’s a human baby, whereas other species have different life spans.
keenancrane.bsky.social
Not seeing it. What’s wrong with this answer? (There are six possible permutations, but the other two siblings are interchangeable…)
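Spelling out the count (assuming the question upthread is whether the baby is youngest; B is the baby, S1/S2 the interchangeable siblings):

```python
from itertools import permutations

# All 6 birth orders of three children; the baby comes last
# in 2 of them, and S1/S2 are interchangeable: 2/6 = 1/3.
orders = list(permutations(["B", "S1", "S2"]))
print(sum(o[-1] == "B" for o in orders), "/", len(orders))
```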
keenancrane.bsky.social
I adapted Unicodeit! (See the acknowledgments section on GitHub; I also meant to mention that in the footer.)

I had been using your website for years, but wanted something more integrated.

Thank you for contributing to open source. 😁
keenancrane.bsky.social
I got tired of mashing together tools to write long threads with 𝐫𝐢𝐜𝐡 𝑓𝑜𝑟𝑚𝑎𝑡𝑡𝑖𝑛𝑔 and ℳα†ℏ—so I wrote La𝑇𝑤𝑒𝑒𝑡!

It converts Markdown and LaTeX to Unicode that can be used in “tweets”, and automatically splits long threads. Try it out!

keenancrane.github.io/LaTweet/
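LaTweet actually adapts Unicodeit for the conversion, but the core trick is offsetting codepoints into Unicode's Mathematical Alphanumeric Symbols block. Roughly (illustrative Python, not LaTweet's actual code):

```python
def to_math_bold(s):
    # Shift A-Z and a-z into the Mathematical Bold range
    # (U+1D400 onward); everything else passes through.
    def shift(c):
        if "A" <= c <= "Z":
            return chr(ord(c) - ord("A") + 0x1D400)  # 𝐀-𝐙
        if "a" <= c <= "z":
            return chr(ord(c) - ord("a") + 0x1D41A)  # 𝐚-𝐳
        return c
    return "".join(shift(c) for c in s)

print(to_math_bold("rich formatting"))  # 𝐫𝐢𝐜𝐡 𝐟𝐨𝐫𝐦𝐚𝐭𝐭𝐢𝐧𝐠
```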
keenancrane.bsky.social
(More seriously: if the geometry of the apples was well-captured by the artist, and the color is unique to that geometry, I would be willing to bet the answer is “yes.”)
keenancrane.bsky.social
If it began life as a drawing, is that question even well-posed?
keenancrane.bsky.social
Oh, you wrote a book on this stuff. I guess I didn't need to be quite so didactic in my response! ;-)
keenancrane.bsky.social
(But I take your point: it's hard to get all these different nuances across precisely in diagrams. That's why we also have mathematical notation to go along with the diagrams! :-) )
keenancrane.bsky.social
Well, f maps *any* point of the data space to the latent space, and g maps *any* point of the latent space to the data space. I.e.,

f : ℝⁿ → ℝᵏ,
g : ℝᵏ → ℝⁿ.

The point x is just one example. So it might in fact be misleading to imply that f gets applied only to x, or that the round trip ends only at x̂.
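Concretely, as a toy sketch (made-up dimensions and random weights, just to show the domains and codomains):

```python
import numpy as np

n, k = 8, 2  # data dimension, latent dimension (arbitrary)
W_enc = np.random.randn(k, n)
W_dec = np.random.randn(n, k)

f = lambda x: np.tanh(W_enc @ x)  # f : ℝⁿ → ℝᵏ
g = lambda z: W_dec @ z           # g : ℝᵏ → ℝⁿ

x = np.random.randn(n)  # one particular data point...
z = np.random.randn(k)  # ...but g accepts *any* latent
print(g(f(x)).shape, g(z).shape)  # both land in ℝⁿ
```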
keenancrane.bsky.social
P.S. I should also mention that these diagrams were significantly improved via feedback from many folks from here and elsewhere.

Hopefully they account for some of the gripes—if not, I'm ready for the next batch! 😉

bsky.app/profile/keen...
keenancrane.bsky.social
I can't fathom why the top picture, and not the bottom picture, is the standard diagram for an autoencoder.

The whole idea of an autoencoder is that you complete a round trip and seek cycle consistency—why lay out the network linearly?
keenancrane.bsky.social
Of course, there will be those who say that the representation diagram is “obvious,” and “that's what everyone has in their head anyway.”

If so… good for you! If not, I hope this alternative picture provides some useful insight as you hack in this space. 😉

[End 🧵]
keenancrane.bsky.social
If you want to use or repurpose these diagrams, the source files (as PDF) can be found at

cs.cmu.edu/~kmcrane/Aut...

(Licensed under CC0 1.0 Universal)
keenancrane.bsky.social
Likewise, here's a simpler “implementation” diagram that still retains the most important idea of an *auto*-encoder, namely, that you're comparing the output against *itself*.
keenancrane.bsky.social
Personally, I find both of these diagrams a little bit crowded—here's a simpler “representation” diagram with fewer annotations (which might anyway be better explained in accompanying text).
keenancrane.bsky.social
Finally, a natural question raised by this picture is: how do I sample/generate new latents z? For a “vanilla” autoencoder, there's no simple a priori description of the high-density regions.

This situation motivates *variational* autoencoders (which are a whole other story…).
keenancrane.bsky.social
It should also be clear that, unless the reconstruction loss is exactly zero, the learned manifold M only approximates (rather than interpolates) the given data. For instance, x does not sit on M, even though x̂ does.

(If M does interpolate all xᵢ, you're probably overfitting)
keenancrane.bsky.social
Another thing made clear by this picture is that, no matter what the true dimension of the data might be, the manifold M predicted by the decoder generically has the same dimension as the latent space: it's the image of ℝᵏ under g.

So, the latent dimension is itself a prior.
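Easiest to see for a linear decoder (an illustrative special case, not how real decoders look):

```python
import numpy as np

# If g(z) = W z for an n-by-k matrix W, the image g(ℝᵏ) is a
# linear subspace of dimension at most k, whatever the data was.
n, k = 8, 2
W = np.random.randn(n, k)
print(np.linalg.matrix_rank(W))  # generically k
```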
keenancrane.bsky.social
In regions where we don't have many samples, the decoder g isn't reliable: we're basically extrapolating (i.e., guessing) what the true data manifold looks like.

The diagram suggests this idea by “cutting off” the manifold—but in reality there’s no clear, hard cutoff.