@spellbanisher.bsky.social
This one wasn't too bad
December 1, 2025 at 12:35 AM
Same prompt, but on a foggy car window instead of a chalkboard
November 29, 2025 at 2:13 PM
Using AI to write this article made it tedious to read. The 'rule of three' (lists of attributes in 3 clauses) is especially egregious here: it is used 5 times in 4 sentences.
July 5, 2025 at 1:41 PM
It is actually based on the training set. Another study found that the two-shot average on the evaluation set was 60%, with a high of 98%. They also estimated that an ensemble of 10 randomly selected people online would score 100%.
arxiv.org/html/2409.01...
December 23, 2024 at 9:06 PM
A smaller open-source model running on less than $0.10 per task managed 56% on ARC-AGI. o3 used 30,000x as much compute to get 88%. I wouldn't be surprised if it used similar methods, with the difference being compute. OpenAI did train the model for this domain.
December 21, 2024 at 2:58 PM
From Google Labs ImageFX
December 18, 2024 at 2:54 AM