Dan Saattrup Smart

@saattrupdan.com

Researcher and consultant in low-resource NLP, with a focus on evaluation. saattrupdan.com

If we dig into more granular evaluations, we see that the main differences between the two models are that o3-mini achieves higher text classification performance, whereas gpt-4o performs better at common-sense reasoning.

(3/4)

February 10, 2025 at 4:33 PM

Overall, the gpt-4o model achieves a slightly better rank score of 1.46, compared to o3-mini's 1.51. Here lower is better, with 1 being the best score possible (indicating that the model beats all other models at all tasks).

We use the default 'medium' reasoning effort of o3-mini here.

(2/4)
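
To make the rank score in (2/4) concrete, here is a minimal sketch of one way such a score could be computed: rank the models on each task, then average each model's ranks. This is a simplified illustration and not necessarily how the benchmark computes it. The function name `mean_rank_scores`, the model and task names, and all numbers below are hypothetical.

```python
def mean_rank_scores(scores):
    """Return each model's average rank across tasks (lower is better).

    `scores` maps model name -> {task name -> score}, where a higher
    score is better. Assumption: a plain mean of per-task ranks; the
    actual benchmark may handle ties and significance differently.
    """
    models = list(scores)
    tasks = list(next(iter(scores.values())))
    rank_sums = {model: 0.0 for model in models}
    for task in tasks:
        for model in models:
            # Rank is 1 plus the number of models strictly better on this task.
            better = sum(scores[other][task] > scores[model][task] for other in models)
            rank_sums[model] += 1 + better
    return {model: rank_sums[model] / len(tasks) for model in models}


# Hypothetical scores, invented purely for illustration.
example = {
    "model_a": {"classification": 0.80, "reasoning": 0.60},
    "model_b": {"classification": 0.70, "reasoning": 0.90},
    "model_c": {"classification": 0.60, "reasoning": 0.50},
}
ranks = mean_rank_scores(example)
print(ranks)
```

Under this simplified definition, a model that beats every other model on every task gets rank 1 on each task and therefore a rank score of exactly 1, which matches the "best score possible" described above.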