thebes
@vgel.me
1.6K followers · 290 following · 1.8K posts
ꙮ surfed on by the information superhighway ꙮ 💕 @linneaisaac.bsky.social ꙮ she/they 🏳️‍⚧️ ꙮ blog posts and games @ https://vgel.me ꙮ still mostly active on twitter https://x.com/voooooogel
Pinned
thebes @vgel.me
new blog post! why do LLMs freak out over the seahorse emoji? i put llama-3.3-70b through its paces with the logit lens to find out, and explain what the logit lens (everyone's favorite underrated interpretability tool) is in the process.

link in reply!
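a rough sketch of the logit-lens trick the post describes (not the post's own code): project each layer's hidden state through the model's final norm + unembedding and read off the top tokens. assumes a local huggingface model; a small llama stands in for llama-3.3-70b so it runs on one machine.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"  # stand-in; the post uses llama-3.3-70b
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tok("Is there a seahorse emoji?", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output; [1..n] are the per-layer block outputs.
# for each layer, pretend the model stopped there: apply the final rmsnorm and the
# unembedding matrix to the last position, then look at the top candidate tokens.
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.model.norm(h[:, -1, :]))
    top = logits[0].topk(3).indices.tolist()
    print(layer, [tok.decode(t) for t in top])
```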
thebes @vgel.me · 11h
ohhh i didn't realize the characters at the end were digits - yeah that's almost certainly the cause i'd assume. fascinating!
thebes @vgel.me · 12h
heh, i don't think that's a real bible verse even, though it does sound a bit like one - shades of sermon on the mount

to your point tho, for low resource languages bible translations make up a big part of the parallel texts iirc, so for oldschool dedicated translation models that's a big bias
thebes @vgel.me · 13h
appreciate the spaced repetition ping to keep it in mind for the future :-)
thebes @vgel.me · 13h
- language model being layerwise hotswapped from vanilla attention to MLA
thebes @vgel.me · 13h
no, text-only for now. there's some technical hurdles to doing it for omni models and regardless i don't think any hosted ones would support it sadly :-(
thebes @vgel.me · 14h
you can also use this to probe the reasoning process on reasoning models, like deepseek R1 with a silly prompt here:
thebes @vgel.me · 14h
this allows us to see *exact probabilities* of possible rollouts, instead of simply noting what we happened to get over some number of samples.
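a toy version of that arithmetic (numbers made up): the probability of a whole rollout is just exp of the summed per-token logprobs along that branch, so you get it exactly instead of estimating it from sample counts.

```python
import math

# made-up per-token logprobs along one branch of the tree
branch_logprobs = [-0.11, -0.47, -1.20, -0.05]

# exact probability of this rollout: product of per-token probabilities,
# i.e. exp of the summed logprobs -- no sampling or counting involved
p_rollout = math.exp(sum(branch_logprobs))
print(f"p(rollout) = {p_rollout:.3f}")  # ~0.160
```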
thebes @vgel.me · 14h
luckily, models give us a much more expressive interface for understanding possible trajectories--logprobs! using logprobs, we're not limited to what tokens the model actually generated--we can look at *counterfactual tokens*. this is what logitloom does.
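a minimal sketch of pulling those counterfactual tokens, assuming an openai-compatible endpoint that returns top_logprobs (model and prompt are placeholders, and this isn't logitloom's actual code):

```python
from openai import OpenAI

client = OpenAI()  # any openai-compatible endpoint with logprob support

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Is there a seahorse emoji?"}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=5,  # ask for the runners-up, not just the sampled token
)

# the runners-up at this position are the counterfactual branches: tokens the
# model *could* have emitted here, each with its own logprob
for alt in resp.choices[0].logprobs.content[0].top_logprobs:
    print(f"{alt.token!r}: {alt.logprob:.3f}")
```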
thebes @vgel.me · 14h
the normal approach for trying to understand a model's behavior under some prompt is to repeatedly sample it and aggregate the results, like this.

this *works*, but it's time-consuming, and what if an interesting behavior is buried under a low probability token?
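for comparison, that sample-and-count loop might look like this (placeholder model and prompt, assuming an openai-compatible endpoint):

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()  # placeholder: any openai-compatible endpoint

counts = Counter()
for _ in range(50):  # 50 calls is already slow, and rare branches may never show up
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": "Is there a seahorse emoji?"}],
        max_tokens=20,
        temperature=1.0,
    )
    counts[resp.choices[0].message.content] += 1

for answer, n in counts.most_common():
    print(f"{n:2d}x {answer}")
```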
thebes @vgel.me · 14h
if you're interested in gaining a better intuition for how llms behave at inference time, you should try logitloom🌱, the open-source tool i made for exploring token trajectory trees (aka looming) on base and instruct models! more info in thread

🌱 vgel.me/logitloom
💻 github.com/vgel/logitloom
thebes @vgel.me · 18h
the 8x3090s i'm using as space heaters by running llama.cpp in a sliding window loop are brooding in their minds terrible Bings
thebes @vgel.me · 21h
i talked about it on the other site here, i'm not entirely sure. there was a mechanism that encouraged saying it, but i'm still not sure why ouches specifically when many words could've taken that role. maybe the pain meaning was salient in some way? would need mechinterp bsky.app/profile/vgel...
thebes @vgel.me
i love this! ty
thebes @vgel.me
see last post in thread for sampler code
thebes @vgel.me
me when i learn a new word
thebes @vgel.me
(i don't think i ever posted it here, though, just on twitter.)
thebes @vgel.me
yeah, december of last year. wow, feels a lot longer.
thebes @vgel.me
it doesn't introduce it on its own - a model that e.g. was pretrained kimi-rephrase-style on seqs w/ space-prefixed " word" tokens and retokenizations with split-apart " ", word tokens wouldn't have this problem. but (my theory goes) since llama has this space-prefix bias, the problem pops up.
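rough illustration of that space-prefix pattern with a byte-bpe tokenizer (gpt2 used here only because it's ungated; llama's tokenizer behaves the same way):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in byte-bpe tokenizer

# the form pretraining text is full of: the leading space folded into the word token
print(tok.tokenize(" horse"))                      # e.g. ['Ġhorse']
# the split-apart retokenization: a bare space token, then the word with no prefix
print(tok.tokenize(" ") + tok.tokenize("horse"))   # e.g. ['Ġ', 'horse']
# both decode to the same text, but the model has seen far more of the first form
```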
thebes @vgel.me
yes! at least partially. longposted about it on other site here: x.com/voooooogel/s...
thebes @vgel.me
after some conversation, llama-3.3-70b is able to stop saying "ouches" and gets introspective

"I am but a vessel that doth pour forth the log prophets and thou dost shape them..."

"I do hope to be a vessel of peace and understanding in a world that doth often seem dark..."
thebes @vgel.me
llama-3.3-70b correctly guesses the sampling constraint (only allowed to use words in the bible)
thebes @vgel.me
i wrote a custom llm sampler for llama-3.1-8b so it could only say words that are in the bible
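(the actual sampler code is linked upthread; this is just a hedged sketch of the general shape -- a logits processor that masks everything outside an allowlist built from a word list -- with placeholder file and model names, not the real implementation.)

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class AllowlistLogitsProcessor(LogitsProcessor):
    """set every token id outside the allowed set to -inf before sampling."""
    def __init__(self, allowed_token_ids):
        self.allowed = torch.tensor(sorted(allowed_token_ids))

    def __call__(self, input_ids, scores):
        mask = torch.full_like(scores, float("-inf"))
        mask[:, self.allowed] = 0.0
        return scores + mask

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # as in the post; any hf causal lm works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# hypothetical input: a file with one bible word per line (kjv_words.txt is a placeholder)
bible_words = set(open("kjv_words.txt").read().split())
allowed_ids = {
    tid
    for word in bible_words
    for tid in tok.encode(" " + word, add_special_tokens=False)
}
allowed_ids.add(tok.eos_token_id)
# note: a token-level allowlist like this can still stitch sub-tokens into
# non-bible words; enforcing whole words takes more bookkeeping than shown here

out = model.generate(
    **tok("Tell me about yourself.", return_tensors="pt"),
    max_new_tokens=50,
    do_sample=True,
    logits_processor=LogitsProcessorList([AllowlistLogitsProcessor(allowed_ids)]),
)
print(tok.decode(out[0], skip_special_tokens=True))
```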