crumb.bsky.social
@crumb.bsky.social
100 followers 110 following 92 posts
https://midwestern-simulation.neocities.org/
i hope everyone that had a hand in making assistants the norm for what "language models" are goes to hell no matter what
have been revisiting this a lot
youtu.be/0BVM0UC28nY
idk if you're supposed to use it like this but you can
friggin massive shout out to openinference for hosting deepseek v3.1 on openrouter for free
even tho we trained on filtered data generated by deepseek v3 base, our desc2doc model didn't follow prompts as well as we'd hoped. so last night i pounded out a rubric-based trainer using deepseek v3.1 (:free) as judge. it is now running. yaaay
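a minimal sketch of what the judge half of that could look like, assuming openrouter's openai-compatible chat endpoint; the model id, rubric text, and 1-5 scale here are guesses for illustration, not what actually got used:

```python
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "deepseek/deepseek-chat-v3.1:free"  # assumed model id on openrouter

# hypothetical rubric; the real one isn't in the post
RUBRIC = ("score the DOCUMENT for how well it follows the DESCRIPTION. "
          "1 = ignores it, 5 = follows it exactly. reply with one integer.")

def judge(description: str, document: str, api_key: str) -> int:
    """ask the judge model for a 1-5 rubric score."""
    resp = requests.post(
        OPENROUTER_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": MODEL,
            "messages": [
                {"role": "system", "content": RUBRIC},
                {"role": "user",
                 "content": f"DESCRIPTION:\n{description}\n\nDOCUMENT:\n{document}"},
            ],
        },
        timeout=120,
    )
    text = resp.json()["choices"][0]["message"]["content"]
    digits = [c for c in text if c.isdigit()]
    return int(digits[0]) if digits else 1  # worst score if the judge rambles
```

the score would then drive whatever update the trainer does (rejection sampling, reward weighting, etc.); the post doesn't say which.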
if you don't write your trainers from scratch, tailored specifically to the needs of every new task, you're... well, probably normal, but it's really fun once you get in the habit. plus you internalize how it works a lot better. plus lots of time to listen to new music
Crumb, you're being a little hard on the model, you are pushing information through a really tight bottleneck into channels it isn't used to utilizing, y'know you could really- doooooont care *shakes butt*
there are thousands of steps where tens of steps happen... and there are tens of steps where thousands of steps happen...
took you long enough Dumb Ass
like we are not in a 90s movie about The Future we are in the real world
i think there are much more profound endings we can reach if we just follow where the tech wants to go on its own
exploratory research vs Product Building research
i think... working towards a set goal like "agi" is not really conducive to finding out what this specific tech stack could be the best at
Check out some visualizers like this here:
midwestern-simulation.neocities.org/main/library...

Check out the embedding model we created for them here:
hf.co/midwestern-s...
12 embedding tokens seems to be a sweet spot between reconstruction quality and the ability to do math on the embeddings before decoding, at least for our 3b model
expecting a lot of fun word embedding type arithmetic stuff to be possible here*
*once we train a vae so we can sample on the manifold
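the arithmetic in question is the classic analogy trick, sketched here with hypothetical encode/decode wrappers (not real function names from the released model):

```python
def analogy(encode, decode, a: str, b: str, c: str) -> str:
    """decode(encode(a) - encode(b) + encode(c)), the classic analogy trick.

    encode: str -> array of shape [n_embed_tokens, d_model]
    decode: the inverse of encode

    without the planned vae, the combined point may land off the embedding
    manifold and decode into something degenerate, hence the footnote above.
    """
    return decode(encode(a) - encode(b) + encode(c))
```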
we r gonna post like 10 of these in a little bloggy thing to show off the latest essence 3b when it is done training.. more toys... more toys.....
+ hopefully after this run and a context extension, the broader system for wrapping human beings as reservoirs comes into clearer view; this model should be able to represent state as clearly as i want
+ a higher, more thorough range of n embed tokens explored: 1-128 instead of [4, 8, 16, 32, 64], after seeing the first model exhibited some generalization to any n anyway
+ this is one unified model instead of two separate models, preliminary testing showed "meh it should work probably"
this is another decoder -> embedding sequence model conversion, but in addition to reconstruction it's being given auxiliary tasks: span corruption and masked language modelling (rough sketch below)
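a rough sketch of what those auxiliary objectives typically look like, assuming a t5-style sentinel scheme for span corruption and bert-style masking for mlm; the actual token formats for this run aren't in the post:

```python
import random

def span_corrupt(tokens: list[str], span_len: int = 3, rate: float = 0.15):
    """t5-style span corruption: replace random spans with sentinels,
    and train the model to emit the dropped spans as the target."""
    corrupted, target, i, sid = [], [], 0, 0
    while i < len(tokens):
        # a rate/span_len start probability keeps ~rate of tokens corrupted
        if random.random() < rate / span_len:
            corrupted.append(f"<extra_id_{sid}>")
            target += [f"<extra_id_{sid}>"] + tokens[i:i + span_len]
            sid += 1
            i += span_len
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, target

def mlm_mask(tokens: list[str], rate: float = 0.15) -> list[str]:
    """bert-style mlm input: the model must recover the original tokens."""
    return [t if random.random() > rate else "<mask>" for t in tokens]

# per example, also sample how many embedding tokens to compress into,
# matching the 1-128 range mentioned above
n_embed = random.randint(1, 128)
```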
lets go man fuck em up 𝔱𝔬𝔲𝔤𝔥-𝔡𝔯𝔞𝔤𝔬𝔫-₂₅₈
ETA 83:50:08