crumb.bsky.social
@crumb.bsky.social
100 followers 110 following 92 posts
https://midwestern-simulation.neocities.org/
i hope everyone that had a hand in making assistants the norm for what "language models" are goes to hell no matter what
have been revisiting this a lot
youtu.be/0BVM0UC28nY
idk if you're supposed to use it like this but you can
friggin massive shout out to openinference for hosting deepseek v3.1 on openrouter for free
even tho we trained on filtered data generated by deepseek v3 base, our desc2doc model didn't follow prompts as well as we'd hoped. so last night i pounded out a rubric-based trainer using deepseek v3.1 (:free) as judge. it is now running. yaaay
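a minimal sketch of what the judge half of that could look like, assuming openrouter's openai-compatible chat endpoint; the model id, rubric text, and 1-5 scale here are guesses for illustration, not what actually got used:

```python
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "deepseek/deepseek-chat-v3.1:free"  # assumed model id on openrouter

# hypothetical rubric; the real one isn't in the post
RUBRIC = ("score the DOCUMENT for how well it follows the DESCRIPTION. "
          "1 = ignores it, 5 = follows it exactly. reply with one integer.")

def judge(description: str, document: str, api_key: str) -> int:
    """ask the judge model for a 1-5 rubric score."""
    resp = requests.post(
        OPENROUTER_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": MODEL,
            "messages": [
                {"role": "system", "content": RUBRIC},
                {"role": "user",
                 "content": f"DESCRIPTION:\n{description}\n\nDOCUMENT:\n{document}"},
            ],
        },
        timeout=120,
    )
    text = resp.json()["choices"][0]["message"]["content"]
    digits = [c for c in text if c.isdigit()]
    return int(digits[0]) if digits else 1  # worst score if the judge rambles
```

the score would then drive whatever update the trainer does (rejection sampling, reward weighting, etc.); the post doesn't say which.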
if you don't write your trainers from scratch, tailored specifically to the needs of every new task, you're... well, probably normal, but it's really fun once you get in the habit. plus you internalize how it works a lot better. plus lots of time to listen to new music
Crumb, you're being a little hard on the model, you are pushing information through a really tight bottleneck into channels it isn't used to utilizing, y'know you could really- doooooont care *shakes butt*
there are thousands of steps where tens of steps happen... and there are tens of steps where thousands of steps happen...
took you long enough Dumb Ass
like we are not in a 90s movie about The Future we are in the real world
i think there are much more profound endings we can reach if we just follow where the tech wants to go on its own
exploratory research vs Product Building research
i think... working towards a set goal like "agi" is not really conducive to finding out what this specific tech stack could be the best at
Check out some visualizers like this here:
midwestern-simulation.neocities.org/main/library...

Check out the embedding model we created for them here:
hf.co/midwestern-s...
12 embedding tokens seems to be a sweet spot between reconstruction quality and the ability to do math on the embeddings before decoding, at least for our 3b model
expecting a lot of fun word embedding type arithmetic stuff to be possible here*
*once we train a vae so we can sample on the manifold
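the arithmetic in question is the classic analogy trick, sketched here with hypothetical encode/decode wrappers (not real function names from the released model):

```python
def analogy(encode, decode, a: str, b: str, c: str) -> str:
    """decode(encode(a) - encode(b) + encode(c)), the classic analogy trick.

    encode: str -> array of shape [n_embed_tokens, d_model]
    decode: the inverse of encode

    without the planned vae, the combined point may land off the embedding
    manifold and decode into something degenerate, hence the footnote above.
    """
    return decode(encode(a) - encode(b) + encode(c))
```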
we r gonna post like 10 of these in a little bloggy thing to show off the latest essence 3b when it is done training.. more toys... more toys.....
+ hopefully after this run and a context extension, the broader system for wrapping human beings as reservoirs comes into clearer view; this model should be able to represent state as clearly as i want
+ a higher, more thorough range of n embed tokens explored: 1-128 instead of [4, 8, 16, 32, 64], after seeing the first model exhibited some generalization to any n anyway
+ this is one unified model instead of two separate models, preliminary testing showed "meh it should work probably"
this is another decoder -> embedding sequence model conversion, but in addition to reconstruction it's being given auxiliary tasks: span corruption and masked language modelling (rough sketch below)
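a rough sketch of what those auxiliary objectives typically look like, assuming a t5-style sentinel scheme for span corruption and bert-style masking for mlm; the actual token formats for this run aren't in the post:

```python
import random

def span_corrupt(tokens: list[str], span_len: int = 3, rate: float = 0.15):
    """t5-style span corruption: replace random spans with sentinels,
    and train the model to emit the dropped spans as the target."""
    corrupted, target, i, sid = [], [], 0, 0
    while i < len(tokens):
        # a rate/span_len start probability keeps ~rate of tokens corrupted
        if random.random() < rate / span_len:
            corrupted.append(f"<extra_id_{sid}>")
            target += [f"<extra_id_{sid}>"] + tokens[i:i + span_len]
            sid += 1
            i += span_len
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, target

def mlm_mask(tokens: list[str], rate: float = 0.15) -> list[str]:
    """bert-style mlm input: the model must recover the original tokens."""
    return [t if random.random() > rate else "<mask>" for t in tokens]

# per example, also sample how many embedding tokens to compress into,
# matching the 1-128 range mentioned above
n_embed = random.randint(1, 128)
```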
lets go man fuck em up 𝔱𝔬𝔲𝔤𝔥-𝔡𝔯𝔞𝔤𝔬𝔫-₂₅₈
ETA 83:50:08