Dan Saattrup Smart
@saattrupdan.com
Researcher and consultant in low-resource NLP, with a focus on evaluation. saattrupdan.com
If we dig into the more granular evaluations, we see that the main differences between the two models are that o3-mini achieves higher text classification performance, whereas gpt-4o performs better at common-sense reasoning.

(3/4)
February 10, 2025 at 4:33 PM
Overall, the gpt-4o model achieves a slightly better rank score of 1.46, compared to o3-mini's 1.51. Here lower is better, with 1 being the best score possible (indicating that the model beats all other models at all tasks).

We use the default 'medium' reasoning effort of o3-mini here.

(2/4)
February 10, 2025 at 4:33 PM
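To make the "lower is better, 1 is best" reading concrete, here is a minimal sketch of one simple way to get a rank-style score: average each model's per-task rank. This is an assumption for illustration only, not necessarily how the leaderboard's rank score is computed; the function name, model names, task names and numbers are all hypothetical.

```python
def mean_rank_scores(scores):
    """Average per-task rank for each model (illustrative, not the leaderboard's method).

    scores: {task_name: {model_name: score}}, where a higher score is better.
    Returns {model_name: mean_rank}, where lower is better and 1.0 means the
    model was not beaten by any other model on any task.
    """
    models = sorted({model for task in scores.values() for model in task})
    totals = {model: 0.0 for model in models}
    for task_scores in scores.values():
        for model in models:
            # Rank on this task = 1 + number of models that scored strictly higher.
            rank = 1 + sum(task_scores[other] > task_scores[model] for other in models)
            totals[model] += rank
    return {model: totals[model] / len(scores) for model in models}


# Hypothetical scores, made up purely for this example.
example_scores = {
    "text_classification": {"model_a": 0.80, "model_b": 0.70},
    "common_sense_reasoning": {"model_a": 0.60, "model_b": 0.90},
    "summarisation": {"model_a": 0.75, "model_b": 0.65},
}
example_ranks = mean_rank_scores(example_scores)
```

In this toy setup a model that wins every task would score exactly 1.0, and a model can win overall while still losing on individual tasks.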
However, for Icelandic, Faroese and Norwegian, it's not quite there yet.
January 20, 2025 at 2:01 PM
For Danish, Swedish, Dutch, German and English, it turns out that it is roughly on par with GPT-4-turbo!
January 20, 2025 at 2:01 PM
Loving this Neovim plugin ❄️

Source: github.com/marcussimons...
December 13, 2024 at 5:32 PM