_ - \.
@crumb.bsky.social
lauren (or crumb) // machine // She-E-Ey
hf.co/crumb
it's supposed to be, like, a bug
November 11, 2025 at 8:03 PM
high pass@k is awesome cause if you actually care about solving problems and getting the best possible solutions, it is actually relevant. but if you only care about a "product" then obviously it's not worth your time to think about
November 11, 2025 at 8:01 PM
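for context, pass@k is usually computed with the standard unbiased estimator (the post doesn't show one, so this sketch assumes it): draw k samples out of n generated, with c of the n correct, and ask the probability at least one draw is correct.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n total (c of them correct) is correct.
    pass@k = 1 - C(n-c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: a correct one is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# with 10 samples and 3 correct, pass@1 looks weak but pass@5 is high
print(round(pass_at_k(10, 3, 1), 3))  # 0.3
print(round(pass_at_k(10, 3, 5), 3))  # 0.917
```

the gap between pass@1 and pass@k is exactly the "best possible solution vs. product" point: sampling more and keeping the best recovers ability a single greedy answer hides.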
i hope everyone that had a hand in making assistants the norm for what "language models" are goes to hell no matter what
October 2, 2025 at 4:16 PM
have been revisiting this a lot
youtu.be/0BVM0UC28nY
September 30, 2025 at 1:43 AM
friggin massive shout out to openinference hosting deepseek v3.1 on openrouter for free
even tho we trained on filtered data generated by deepseek v3 base, our desc2doc model didn't follow prompts as well as we'd hoped. so last night i pounded out a rubric-based trainer using deepseek v3.1 (:free) as judge. it is now running. yaaay
September 29, 2025 at 7:05 PM
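a minimal sketch of the scoring side of such a rubric-based judge setup. the rubric items, weights, and judge interface here are all hypothetical — the post doesn't show the actual implementation — and the stub judge stands in for a real call to the judge model:

```python
from typing import Callable, List, Tuple

# hypothetical rubric: (criterion, weight) pairs, not the author's actual rubric
RUBRIC: List[Tuple[str, float]] = [
    ("follows the prompt's requested structure", 2.0),
    ("covers every field of the description", 1.0),
    ("contains no hallucinated details", 1.0),
]

def rubric_score(doc: str,
                 judge: Callable[[str, str], float],
                 rubric: List[Tuple[str, float]] = RUBRIC) -> float:
    """Weighted mean of per-criterion judge scores in [0, 1]; this is
    the number a trainer would use as its reward signal."""
    total_w = sum(w for _, w in rubric)
    return sum(w * judge(doc, crit) for crit, w in rubric) / total_w

# stand-in judge; a real one would prompt the judge model per criterion
stub_judge = lambda doc, crit: 1.0 if "json" in doc else 0.5
print(rubric_score('{"a": 1}  # a json doc', stub_judge))  # 1.0
```

keeping the judge behind a plain callable makes it easy to swap the free judge endpoint for anything else without touching the trainer loop.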
took you long enough Dumb Ass
September 18, 2025 at 3:29 AM
i think... working towards a set goal like "agi" is not really conducive to finding out what this specific tech stack could be the best at
September 16, 2025 at 6:52 PM
Check out some visualizers like this here:
midwestern-simulation.neocities.org/main/library...

Check out the embedding model we created for them here:
hf.co/midwestern-s...
September 16, 2025 at 5:49 PM
12 embedding tokens seems to be a sweet spot between reconstruction quality and ability to do math to the embeddings before decoding for our 3b model
September 15, 2025 at 7:21 AM
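"do math to the embeddings before decoding" presumably covers ops like interpolation in the embedding space; a toy numpy sketch of just the arithmetic (the encoder/decoder are the trained model and aren't shown — the shapes here are illustrative):

```python
import numpy as np

def interpolate(emb_a: np.ndarray, emb_b: np.ndarray, t: float) -> np.ndarray:
    """Linear interpolation between two (num_tokens, dim) text embeddings;
    decoding the result should blend the two source texts."""
    return (1.0 - t) * emb_a + t * emb_b

# toy 12-token, 4-dim embeddings standing in for the real model's output
a = np.zeros((12, 4))
b = np.ones((12, 4))
mid = interpolate(a, b, 0.5)
print(mid.shape, float(mid.mean()))  # (12, 4) 0.5
```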
September 13, 2025 at 10:49 PM
we r gonna post like 10 of these in a little bloggy thing to show off the latest essence 3b when it is done training.. more toys... more toys.....
September 11, 2025 at 7:33 PM
🐱
September 10, 2025 at 9:48 PM
lol
September 10, 2025 at 9:47 PM
lets go man fuck em up 𝔱𝔬𝔲𝔤𝔥-𝔡𝔯𝔞𝔤𝔬𝔫-₂₅₈
ETA 83:50:08
September 3, 2025 at 11:33 PM
subtracting the "lamb" embed from the "mary had a little lamb" embed then decoding... it tries to say it but it just cant get it right... that's so silly...
September 2, 2025 at 5:17 AM
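the arithmetic behind that experiment is just vector subtraction on the pooled embeddings; a toy numpy version with random stand-ins (real embeddings would come from the essence model, which isn't reproduced here):

```python
import numpy as np

# stand-ins for model embeddings: (num_embedding_tokens, dim)
rng = np.random.default_rng(0)
sentence = rng.normal(size=(32, 256))   # embed("mary had a little lamb")
lamb = rng.normal(size=(32, 256))       # embed("lamb")

# decode(edited) would then be asked to say the sentence minus the concept
edited = sentence - lamb
assert edited.shape == sentence.shape
```

the shape is preserved, so the edited embedding can go straight back into the decoder; whether the concept actually disappears is exactly what the post is testing.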
trying strange things
September 2, 2025 at 4:54 AM
okokokok it's on HF as it is RN, it seems really good but it will keep on improving for a little while.
i encourage you to try it out and see if you can figure out any fun things to use it for
hf.co/crumb/essenc...
September 2, 2025 at 4:40 AM
it apparently generalizes to any number of embedding tokens for any level of detail, from only training on 4,8,16,32,64... even inferencing at 256 doesn't show total degeneration, same w odd nums like 19
September 1, 2025 at 5:41 PM
eheheh
August 28, 2025 at 7:02 PM
what crumb is hoping is the coolest use case is turning any text-in text-out system into a reservoir computer (need to train a VAE on the embeddings first)
August 28, 2025 at 6:48 PM
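a minimal echo-state-network sketch of the reservoir-computer idea: a fixed nonlinear dynamical system (here a random tanh reservoir; in the post's proposal, the frozen text2vec2text loop) is driven by inputs, and only a linear readout over its states is trained. everything below is illustrative, not the author's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, res = 8, 64  # toy input and reservoir sizes

W_in = rng.normal(size=(res, dim)) * 0.5
W = rng.normal(size=(res, res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))  # spectral radius < 1 (echo state property)

def run_reservoir(inputs: np.ndarray) -> np.ndarray:
    """Drive the fixed reservoir with a sequence of input vectors and
    collect its states; the reservoir itself is never trained."""
    x = np.zeros(res)
    states = []
    for u in inputs:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x)
    return np.stack(states)

seq = rng.normal(size=(20, dim))
states = run_reservoir(seq)

# the only trained part: a ridge-regression readout predicting the next input
S, Y = states[:-1], seq[1:]
W_out = np.linalg.solve(S.T @ S + 1e-2 * np.eye(res), S.T @ Y)
```

the "preserves temporal dynamics by design" property is what makes the swap plausible: the embedding of a rolling text state has to carry its history the way the reservoir state `x` does here.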
trying 2 figure out things to test... like... add embed of structured text (json) to unstructured text, will it structure it? you could jitter an embed a bit to get synthetic data super close to the original? what if you subtracted mean embed of a char's lines from a script, does it remove the char?
August 28, 2025 at 6:46 PM
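the "jitter an embed a bit" idea is just adding small gaussian noise in embedding space before decoding; a toy sketch (noise scale and shapes are assumptions, and the decode step is the model itself, not shown):

```python
import numpy as np

def jitter(emb: np.ndarray, sigma: float = 0.02, n: int = 4, seed: int = 0):
    """Return n noisy copies of an embedding; decoding each copy is
    hoped to yield near-paraphrases of the original text."""
    rng = np.random.default_rng(seed)
    return [emb + rng.normal(scale=sigma, size=emb.shape) for _ in range(n)]

base = np.zeros((32, 128))  # stand-in for a real 32-token embedding
variants = jitter(base)
print(len(variants), variants[0].shape)  # 4 (32, 128)
```

the same machinery covers the other experiments in the post: structuring is addition of a structured-text embed, and character removal is subtraction of a mean embed.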
this one is for the freaks, have u ever wanted a text2vec2text that 1 doesn't rely on api embeddings and 2 preserves temporal dynamics by design?

crumb has found crumbself in a position in need of some of these, so crumb is jst building them. 32 token embedding. total 6b model system (WIP results)
August 28, 2025 at 6:33 PM
Reposted by _ - \.
Oddly good heuristic for what it’s safe to give AI control of
you wouldn't let a demon or fae have access to your bank account either
August 25, 2025 at 3:46 PM
crumb found a trove of stuff crumb generated in 2019
August 25, 2025 at 3:36 PM