aria 🍊
@aurelium.me
450 followers 340 following 550 posts
she/her software infra @ arcee, opinions my own
if you lean too heavily on ICL, you're going to eat into the benefits of DSA-style top-k sparse attention. you'll need to keep increasing your top-k value just to pull in enough context to answer a question
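a minimal sketch of the trade-off, assuming a generic single-head top-k attention (not DeepSeek's actual DSA indexer or kernels):

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k):
    """Each query attends only to its top_k highest-scoring keys,
    so per-query cost scales with top_k rather than sequence length."""
    scores = q @ k.T / k.shape[-1] ** 0.5        # (T, T) raw scores
    idx = scores.topk(top_k, dim=-1).indices     # keep top_k per query
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, idx, 0.0)                  # unmask selected keys only
    return F.softmax(scores + mask, dim=-1) @ v

# the failure mode in the post: if answering needs more than top_k tokens
# of in-context material, relevant keys get masked out, and the only fix
# is raising top_k, which erodes the sparsity speedup.
```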
probably, but I am inclined to believe that an 80B-A1B model with a standard amount of stuff memorized will be a much better experience than even an ideal 1B "cognitive core" model with web search, at the cost of a bit of SSD space
imo, the key to the "cognitive core" is less fitting more and more capabilities into fewer parameters and more cranking sparsity up to a degree that makes total parameter count irrelevant
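rough arithmetic, with illustrative numbers rather than any real config, for why active parameters dominate per-token cost in a sparse MoE:

```python
# hypothetical "80B-A1B"-style MoE: 80B total params, ~1B active per token
total_params  = 80e9   # what you pay for in SSD / RAM
active_params = 1e9    # dense layers plus the few experts routed per token

# forward-pass compute tracks active params (~2 FLOPs per active param),
# so total param count barely touches decode latency
flops_per_token = 2 * active_params
print(f"active fraction: {active_params / total_params:.1%}")    # 1.2%
print(f"~{flops_per_token / 1e9:.0f} GFLOPs per decoded token")  # ~2
```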
i don't entirely like that this is the case, tbh. Qwen's inflation of benchmark scores by training on billions of tokens of "technically not the test set" is, imo, dishonest and does bad things to their models' downstream performance while also disadvantaging anyone who doesn't do the same
at Arcee our post-training library has custom model implementations built in for two architectures: Qwen and our own

they have fully taken the mantle of the Standard Open Model from llama at this point
i work a schedule I call "???10". i work an unspecified amount of time a day at random times, 90% of which is waiting idly for things to finish
can confirm. right now I am "working on machine learning" by reading posts and eating potato chips while I wait 30 minutes to see if a bug manifests on an earlier checkpoint
they're the kind of lab that drops a ~solution to the problem of escalating compute costs for long-context inference in a release called "V3.2"
Reposted by aria 🍊
claude sonnet 3.6's yellowstone vacation
the base text-completion LLM definitely *can* be genuinely bigoted in a convincing way. it's not outside of its capabilities

I just think that if you optimize in post-training for "be a useful chatbot", the model will naturally kick itself out of basins associated with misanthropy and hatred
like them or not, the AI-safetyists and AGI true believers are the ones that stop OpenAI (and to a lesser extent Anthropic) from becoming a slightly-less-partisan xAI
that and synthetic porn for the most pathetic men alive
it's not as if he revealed any new information. anyone paying attention to anything besides idle speculation about secret proprietary model performance would not find this especially novel
I disagree vehemently with the vision of the world that most of these frontier labs have, but this is a far cry from that
the only big AI lab whose employees you regularly see publicly arguing to repeal women's suffrage is xAI. at this point anyone who survived the initial exodus(es) is really suspect to me
holy shit that is a lot of people. and I was one of them, probably!
Seattle No Kings from the monorail
finding myself in this picture was like a game of Where's Waldo
Reposted by aria 🍊
Taking off my balaclava to reveal a second, identical balaclava underneath. Everyone at the fusion intelligence center groans at my shit
the RevComs in cap hill succeeded at confusing a bunch of people about where the protest was

that's an achievement I guess
Seattle No Kings was amazing, and now I will collapse unconscious at home
finally invented The VRAM Torturer from the hit novel "The VRAM Torturer is a pretty good quality-of-life feature for your training library"
besides the public safety benefits, there would be a lot of social benefits to putting drunk drivers in prison the first time around. think of all the domestic abuse that could be stopped!
this video is how I find out a DUI is only a misdemeanor in FL (and many other states)... how the fuck? explains why so many people die from drunk drivers I guess
i do think the officer was named "Lane" or something, bc he says it a lot and it sounds more like "Lane, don't do this to me"